November 29, 2018 2:00 pm - 4:00 pm EST
Title: On the Privacy Implications of Real Time Bidding
Speaker: Muhammad Ahmad Bashir
Location: ISEC 655
The massive growth of online advertising has created a need for commensurate amounts of user tracking. Advertising companies track online users extensively to serve them targeted advertisements. On the surface, this seems like a simple process: a tracker places a unique cookie in the user’s browser, repeatedly observes the same cookie as the user surfs the web, and finally uses the accrued data to select targeted ads.
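The simple single-tracker process described above can be sketched as follows. This is a minimal illustration, not any real tracker's implementation: the `Tracker` class, its method names, and the page topics are all hypothetical, and ad selection is assumed to be a plain "most frequently seen topic" rule.

```python
import uuid
from collections import Counter, defaultdict


class Tracker:
    """Toy third-party tracker; every name here is hypothetical."""

    def __init__(self):
        # cookie id -> list of page topics observed for that cookie
        self.history = defaultdict(list)

    def observe(self, cookie, topic):
        """Record a page view; issue a new cookie if the browser has none."""
        if cookie is None:
            cookie = uuid.uuid4().hex  # unique identifier stored in the browser
        self.history[cookie].append(topic)
        return cookie

    def select_ad(self, cookie):
        """Pick an ad topic from the most frequently observed interest."""
        topics = self.history.get(cookie)
        if not topics:
            return "generic"  # no accrued data for this cookie
        return Counter(topics).most_common(1)[0][0]


tracker = Tracker()
cookie = tracker.observe(None, "shoes")       # first visit: cookie is placed
for topic in ["news", "shoes", "travel"]:     # later visits: same cookie is seen
    cookie = tracker.observe(cookie, topic)
ad = tracker.select_ad(cookie)                # targeted ad from accrued data
```

In this sketch a single party both collects the data and selects the ad, which is exactly the assumption that RTB breaks.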
However, the reality is much more complex. The rise of Real Time Bidding (RTB) has forced advertising companies to collaborate more closely with each other via cookie matching. Because of RTB, tracking data is not just observed by trackers embedded directly into web pages; it is also funneled through the advertising ecosystem via complex networks of exchanges and auctions. Additionally, to gain a complete picture of users’ browsing behavior and interests across all devices (e.g., laptops, smartphones, and IoT devices), Advertising and Analytics (A&A) companies actively try to link all devices associated with a user through cross-device tracking.
Numerous surveys have shown that web users are not completely aware of the amount of data sharing that occurs between A&A companies, and thus underestimate the privacy risks associated with online tracking. In order to quantify users’ true digital footprints, we need to take into account information sharing during RTB and cross-device tracking. However, measuring these flows of tracking information is challenging. Although there has been recent work on detecting information sharing (cookie matching) between ad exchanges, these studies are based on brittle heuristics that cannot detect all forms of information sharing, especially under adversarial conditions (e.g., obfuscation). Furthermore, since tracking mechanisms vary across different devices, these studies cannot be effectively used to study cross-device tracking. This limits our view of the privacy landscape and hinders the development of effective privacy tools.
In this thesis, I propose a content-agnostic methodology that is able to detect client- and server-side information flows between arbitrary ad exchanges using retargeted ads. Intuitively, this methodology works because it relies on the semantics of how exchanges serve ads, rather than focusing on specific cookie matching mechanisms. Using crawled data on 35,448 ad impressions, we show that this methodology can successfully categorize four different kinds of information sharing behavior between ad exchanges, including cases where existing heuristic methods fail.
Since our methodology does not look for patterns or identifiers in network traffic, but rather relies on causal inference, I plan to use it to understand cross-device tracking. By conducting controlled experiments, I propose to investigate which ad exchanges are involved in cross-device tracking and which identifiers they leverage to track users across devices.
Our methods allow us to collect a novel and accurate dataset of the relationships between online advertisers and trackers. Using this dataset, I propose to investigate the privacy implications of ubiquitous online tracking. Our data can be used to represent the online advertising ecosystem as a graph; on this graph I plan to run simulations to understand the diffusion of users’ tracking data across the advertising ecosystem. These simulations will allow us to quantify users’ true digital footprints, as well as evaluate the relative effectiveness of privacy-preserving tools (e.g., ad and tracker blockers).
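As a rough illustration of the graph-based idea, the sketch below treats the ecosystem as a directed graph and computes which nodes could observe a user's visits, with and without a blocker. It is a deliberately simplified, worst-case reachability model rather than the simulation proposed in the thesis: the graph, the node names, and the assumption that every node forwards data to all of its neighbors are hypothetical.

```python
from collections import deque

# Hypothetical sharing graph: an edge A -> B means A can expose or forward
# a user's impression to B. All node names are made up for illustration.
GRAPH = {
    "publisher1": ["exchangeA"],
    "publisher2": ["exchangeB"],
    "exchangeA": ["dspX", "exchangeB"],
    "exchangeB": ["dspY"],
    "dspX": [],
    "dspY": [],
}


def reachable(graph, start, blocked=frozenset()):
    """Return the nodes that can learn of a visit to `start`, skipping any
    node in `blocked` (a crude model of an ad or tracker blocker)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in blocked or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen


def footprint(graph, visits, blocked=frozenset()):
    """Union of nodes that observe at least one of the user's visits."""
    observers = set()
    for site in visits:
        observers |= reachable(graph, site, blocked)
    return observers


unprotected = footprint(GRAPH, ["publisher1", "publisher2"])
protected = footprint(GRAPH, ["publisher1", "publisher2"], blocked={"exchangeA"})
```

Comparing the sizes of `unprotected` and `protected` gives a simple measure of how much a given blocklist shrinks the footprint under these assumptions. A more realistic model would attach forwarding probabilities to edges instead of assuming that every edge is always used.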
The overall goal of my thesis is to bridge the divide between the actual privacy landscape and our understanding of it. My thesis proposes techniques that will give users a more realistic view of the online advertising ecosystem and a more accurate view of their digital footprints. Furthermore, the results from this thesis can be used to build better privacy-preserving tools or to enhance existing ones.
About the Speaker
Ahmad is a Ph.D. candidate in Computer Science at Northeastern University’s College of Computer and Information Science, advised by Prof. Christo Wilson. Ahmad is broadly interested in security and privacy aspects of large-scale systems. His current research focuses on understanding the online advertising ecosystem with an emphasis on privacy implications for end users. Ahmad received his bachelor’s degree in Computer Science from LUMS, Pakistan.
Committee
- Prof. Christo Wilson (Thesis Advisor)
- Prof. William Robertson
- Prof. Dave Choffnes
- Prof. Arvind Narayanan (External Member)