805 Columbus Avenue
615 Interdisciplinary Science and Engineering Complex (ISEC)
Boston, MA 02120
ATTN: Christo Wilson, 435 ISEC
360 Huntington Avenue
Boston, MA 02115
- Algorithm auditing, specifically using controlled experiments to understand whether black-box algorithmic systems are unfair or discriminatory
- Understanding online tracking and developing techniques to improve online privacy
- Measuring and improving public key infrastructures like SSL/TLS and DNSSEC
- PhD in Computer Science, University of California, Santa Barbara
- MS in Computer Science, College of Engineering at University of California, Santa Barbara
- BS in Computer Science, College of Creative Studies at University of California, Santa Barbara
Christo Wilson is an associate professor in the Khoury College of Computer Sciences at Northeastern University. He is a founding member of the Cybersecurity and Privacy Institute at Northeastern and serves as director of the BS in Cybersecurity program. He received his PhD in 2012 from the University of California, Santa Barbara, where he worked under Professor Ben Y. Zhao.
Wilson’s research lies at the intersection of Big Data, security, and privacy, drawing on methods from the computer, social, political, and economic sciences. He is a 2019 Sloan Fellow and a 2019-2020 Fellow at the Berkman Klein Center for Internet & Society. His work is supported by an NSF CAREER Award, the Sloan Foundation, the Mozilla Foundation, the Knight Foundation, the Democracy Fund, the Data Transparency Lab, the European Commission, Google, and Verisign Labs.
Wilson’s research has earned widespread recognition. He has received best paper awards at SIGCOMM, NDSS, and ICWSM, and honorable mentions at CHI and CSCW. His work on improving TLS security was recognized with an IEEE Cybersecurity Award for Innovation, and his work on understanding the impact of policy on DNSSEC deployment was honored with an IRTF/Internet Society Applied Networking Research Prize. Additionally, Wilson’s work on modeling the privacy implications of online advertising received a Privacy Papers for Policymakers Award from the Future of Privacy Forum. His work has been covered extensively in the press, including the CBS Evening News, Good Morning America, The Wall Street Journal, The Boston Globe, and The Washington Post.
Professor Wilson is an active member of several academic communities. In 2018, he served as co-General Chair of the inaugural ACM Conference on Fairness, Accountability, and Transparency (FAT*), and he continues to serve on the conference’s executive committee. He regularly serves on the program committees for conferences such as IMC, WWW, ICWSM, IEEE Security and Privacy, and PETS.
Towards Methodologies and Tools for Conducting Algorithm Audits
This project will develop methodologies and tools for conducting algorithm audits. An algorithm audit uses controlled experiments to examine an algorithmic system, such as an online service or big data information archive, and ascertain (1) how it functions, and (2) whether it may cause harm.
Examples of documented harms by algorithms include discrimination, racism, and unfair trade practices. Although there is rising awareness of the potential for algorithmic systems to cause harm, actually detecting this harm in practice remains a key challenge. Given that most algorithms of concern are proprietary and non-transparent, there is a clear need for methods to conduct black-box analyses of these systems. Numerous regulators and governments have expressed concerns about algorithms, as well as a desire to increase transparency and accountability in this area.
This research will develop methodologies to audit algorithms in three domains that impact many people: online markets, hiring websites, and financial services. Auditing algorithms in these three domains will require solving fundamental methodological challenges, such as how to analyze systems with large, unknown feature sets, and how to estimate feature values without ground-truth data. To address these broad challenges, the research will draw on insights from prior experience auditing personalization algorithms. Each domain also brings unique challenges that will be addressed individually. For example, novel auditing tools will be constructed that leverage extensive online and offline histories. These new tools will allow examination of systems that were previously inaccessible to researchers, including financial services companies. Methodologies, open-source code, and datasets will be made available to other academic researchers and regulators.

This project includes two integrated educational objectives: (1) to create a new computer science course on big data ethics, teaching students how to identify and mitigate the harmful side-effects of big data technologies, and (2) to produce web-based versions of the auditing tools that are accessible and informative to the general public. These public tools will increase transparency around specific, prominent algorithmic systems and promote general education about the proliferation and impact of algorithmic systems.
Towards Confederated Web-Based Services
Users today have access to a broad range of free, web-based social services. All of these services operate under a similar model: users entrust the service provider with their personal information and content, and in return, the service provider makes the service available for free by monetizing the user-provided information and selling the results to third parties (e.g., advertisers). In essence, users pay for these services by providing their data (i.e., giving up their privacy) to the provider.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers. All user data is encrypted and not exposed to any third-parties, users retain control over their information, and users access the service via a web browser as normal.
The incredible popularity of today’s web-based services has led to significant concerns over privacy and user control over data. Addressing these concerns requires a re-thinking of current web-based business models, and, unfortunately, existing providers are disincentivized from doing so. The impact of this project will potentially be felt by the millions of people who use today’s popular services, by providing them with an alternative to the prevailing business models.
Towards Transparency of Personalization on the Web
This project will develop new research methods to map and quantify the ways in which online search engines, social networks and e-commerce sites use sophisticated algorithms to tailor content to each individual user. This “personalization” may often be of value for the user, but it also has the potential to distort search results and manipulate the perceptions and behavior of the user. Given the popularity of personalization across a variety of Web-based services, this research has the potential for extremely broad impact. Being able to quantify the extent to which Web-based services are personalized will lead to greater transparency for users, and the development of tools to identify personalized content will allow users to access information that may be hard to access today.
Personalization is now a ubiquitous feature on many Web-based services. In many cases, personalization provides advantages for users, because personalization algorithms are likely to return results that are relevant to the user. At the same time, the increasing levels of personalization in Web search and other systems are leading to growing concerns over the Filter Bubble effect, where users are only given results that the personalization algorithm thinks they want, while other important information remains inaccessible. From a computer science perspective, personalization is simply a tool that is applied to information retrieval and ranking problems. However, sociologists, philosophers, and political scientists argue that personalization can result in inadvertent censorship and “echo chambers.” Similarly, economists warn that unscrupulous companies can leverage personalization to steer users towards higher-priced products, or even implement price discrimination, charging different users different prices for the same item. As the pervasiveness of personalization on the Web grows, it is clear that techniques must be developed to understand and quantify personalization across a variety of Web services.
This research has four primary thrusts: (1) To develop methodologies to measure personalization of mobile content. The increasing popularity of browsing the Web from mobile devices presents new challenges, as these devices have access to sensitive content like the user’s geolocation and contacts. (2) To develop systems and techniques for accurately measuring the prevalence of several personalization trends on a large number of e-commerce sites. Recent anecdotal evidence has shown instances of problematic sales tactics, including price steering and price discrimination. (3) To develop techniques to identify and quantify personalized political content. (4) To measure the extent to which financial and health information is personalized based on location and socio-economic status. All four of these thrusts will develop new research methodologies that may prove effective in other areas of research as well.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis.” Science, v.343, 2014, p. 1203.
R. Epstein, R. Robertson, D. Lazer, and C. Wilson. “Suppressing the Search Engine Manipulation Effect (SEME).” Proceedings of the ACM on Human-Computer Interaction, November 2017.
Internet search rankings have a significant impact on consumer choices, mainly because users trust and choose higher-ranked results more than lower-ranked results. Given the apparent power of search rankings, we asked whether they could be manipulated to alter the preferences of undecided voters in democratic elections. Here we report the results of five relevant double-blind, randomized controlled experiments, using a total of 4,556 undecided voters representing diverse demographic characteristics of the voting populations of the United States and India. The fifth experiment is especially notable in that it was conducted with eligible voters throughout India in the midst of India’s 2014 Lok Sabha elections just before the final votes were cast. The results of these experiments demonstrate that (i) biased search rankings can shift the voting preferences of undecided voters by 20% or more, (ii) the shift can be much higher in some demographic groups, and (iii) search ranking bias can be masked so that people show no awareness of the manipulation. We call this type of influence, which might be applicable to a variety of attitudes and beliefs, the search engine manipulation effect. Given that many elections are won by small margins, our results suggest that a search engine company has the power to influence the results of a substantial number of elections with impunity. The impact of such manipulations would be especially large in countries dominated by a single search engine company.
James Larisch, David Choffnes, Dave Levin, Bruce M. Maggs, Alan Mislove, and Christo Wilson. “CRLite: A Scalable System for Pushing All TLS Revocations to All Browsers.” In Proceedings of the IEEE Symposium on Security and Privacy (Oakland 2017). San Jose, CA, May 2017.
Currently, no major browser fully checks for TLS/SSL certificate revocations. This is largely due to the fact that the deployed mechanisms for disseminating revocations (CRLs, OCSP, OCSP Stapling, CRLSet, and OneCRL) are each either incomplete, insecure, inefficient, slow to update, not private, or some combination thereof. In this paper, we present CRLite, an efficient and easily-deployable system for proactively pushing all TLS certificate revocations to browsers. CRLite servers aggregate revocation information for all known, valid TLS certificates on the web, and store them in a space-efficient filter cascade data structure. Browsers periodically download and use this data to check for revocations of observed certificates in real-time. CRLite does not require any additional trust beyond the existing PKI, and it allows clients to adopt a fail-closed security posture even in the face of network errors or attacks that make revocation information temporarily unavailable. We present a prototype of CRLite that processes TLS certificates gathered by Rapid7, the University of Michigan, and Google’s Certificate Transparency on the server-side, with a Firefox extension on the client-side. Comparing CRLite to an idealized browser that performs correct CRL/OCSP checking, we show that CRLite reduces latency and eliminates privacy concerns. Moreover, CRLite has low bandwidth costs: it can represent all certificates with an initial download of 10 MB (less than 1 byte per revocation) followed by daily updates of 580 KB on average. Taken together, our results demonstrate that complete TLS/SSL revocation checking is within reach for all clients.
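The filter cascade idea behind CRLite can be illustrated with a short sketch. This is a toy, not CRLite's actual implementation (the real system uses carefully tuned filter parameters and compressed encodings); the `BloomFilter` class, the sizing heuristic, and all names here are assumptions made for the example. The core trick: level 0 is a Bloom filter over the revoked set; level 1 holds the valid certificates that falsely match level 0; level 2 holds the revoked certificates that falsely match level 1; and so on until no false positives remain, so every certificate in the known universe is classified exactly.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter (illustrative, not production-quality)."""

    def __init__(self, items, size, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray((size + 7) // 8)
        for item in items:
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def build_cascade(revoked, valid):
    """Build alternating filter levels until no false positives remain."""
    filters = []
    include, exclude = set(revoked), set(valid)
    while include:
        bf = BloomFilter(include, size=max(8 * len(include), 64))
        filters.append(bf)
        # Members of the opposite set that falsely match become the
        # next level's contents; the sets swap roles each round.
        false_positives = {x for x in exclude if x in bf}
        include, exclude = false_positives, include
    return filters

def is_revoked(cert, filters):
    """Walk the cascade; the level where the lookup first misses decides."""
    for i, bf in enumerate(filters):
        if cert not in bf:
            # Missing at an even level means "not revoked"; odd means "revoked".
            return i % 2 == 1
    # Survived every level: membership in the final level decides.
    return len(filters) % 2 == 1
```

Because the cascade is built until the known universe produces no false positives, lookups are exact for every certificate the server has seen, while the total size stays far below an explicit revocation list.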
Zhenhua Li, Weiwei Wang, Tianyin Xu, Xin Zhong, Xiang-Yang Li, Yunhao Liu, Christo Wilson, and Ben Y. Zhao. In Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2016). Santa Clara, CA, March 2016.
As mobile cellular devices and traffic continue their rapid growth, providers are taking larger steps to optimize traffic, with the hopes of improving user experiences while reducing congestion and bandwidth costs. This paper presents the design, deployment, and experiences with Baidu TrafficGuard, a cloud-based mobile proxy that reduces cellular traffic using a network-layer VPN. The VPN connects a client-side proxy to a centralized traffic processing cloud. TrafficGuard works transparently across heterogeneous applications, and effectively reduces cellular traffic by 36% and overage instances by 10.7 times for roughly 10 million Android users in China. We discuss a large-scale cellular traffic analysis effort, how the resulting insights guided the design of TrafficGuard, and our experiences with a variety of traffic optimization techniques over one year of deployment.
Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, and Christo Wilson. “Tracing Information Flows Between Ad Exchanges Using Retargeted Ads.” In Proceedings of USENIX Security. Austin, TX, August 2016.
Numerous surveys have shown that Web users are concerned about the loss of privacy associated with online tracking. Alarmingly, these surveys also reveal that people are also unaware of the amount of data sharing that occurs between ad exchanges, and thus underestimate the privacy risks associated with online tracking.
In reality, the modern ad ecosystem is fueled by a flow of user data between trackers and ad exchanges. Although recent work has shown that ad exchanges routinely perform cookie matching with other exchanges, these studies are based on brittle heuristics that cannot detect all forms of information sharing, especially under adversarial conditions.
In this study, we develop a methodology that is able to detect client- and server-side flows of information between arbitrary ad exchanges. Our key insight is to leverage retargeted ads as a tool for identifying information flows. Intuitively, our methodology works because it relies on the semantics of how exchanges serve ads, rather than focusing on specific cookie matching mechanisms. Using crawled data on 35,448 ad impressions, we show that our methodology can successfully categorize four different kinds of information sharing behavior between ad exchanges, including cases where existing heuristic methods fail.
We conclude with a discussion of how our findings and methodologies can be leveraged to give users more control over what kind of ads they see and how their information is shared between ad exchanges.
Elleen Pan, Jingjing Ren, Martina Lindorfer, Christo Wilson, and David Choffnes. “Panoptispy: Characterizing Audio and Video Exfiltration from Android Applications.” In Proceedings of the 2018 Privacy Enhancing Technologies Symposium (PETS 2018), July 2018.
The high-fidelity sensors and ubiquitous internet connectivity offered by mobile devices have facilitated an explosion in mobile apps that rely on multimedia features. However, these sensors can also be used in ways that may violate users’ expectations and personal privacy. For example, apps have been caught taking pictures without the user’s knowledge and passively listening for inaudible, ultrasonic audio beacons. The developers of mobile device operating systems recognize that sensor data is sensitive, but unfortunately existing permission models only mitigate some of the privacy concerns surrounding multimedia data.
In this work, we present the first large-scale empirical study of media permissions and leaks from Android apps, covering 17,260 apps from Google Play, AppChina, Mi.com, and Anzhi. We study the behavior of these apps using a combination of static and dynamic analysis techniques. Our study reveals several alarming privacy risks in the Android app ecosystem, including apps that over-provision their media permissions and apps that share image and video data with other parties in unexpected ways, without user knowledge or consent. We also identify a previously unreported privacy risk that arises from third-party libraries that record and upload screenshots and videos of the screen without informing the user and without requiring any permissions.
X. Yang, Q. Yang, and C. Wilson. “Penny for Your Thoughts: Searching for the 50 Cent Party on Sina Weibo.” In Proceedings of ICWSM, 2015.
Evidence suggests that the Chinese government employs “Internet Commentators” to post propaganda on social media. This group is pejoratively nicknamed the “50 cent party” or Wumao. In this study, we make the first attempt to quantify the size and behavior of the Wumao. Our study leverages a large corpus of data from Sina Weibo (China’s equivalent of Twitter) that includes 26M tweets and comments from 2.7M users over the span of one year. Unfortunately, detecting the Wumao is difficult because there is no ground truth information about them. To overcome this challenge, we apply a series of unsupervised techniques to filter our dataset and isolate suspicious users who exhibit characteristics indicative of being Wumao.
Aniko Hannak, Claudia Wagner, David Garcia, Alan Mislove, Markus Strohmaier, and Christo Wilson. “Bias in Online Freelance Marketplaces: Evidence from TaskRabbit and Fiverr.” In Proceedings of the 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2017). Portland, OR, February 2017.
Online freelancing marketplaces have grown quickly in recent years. In theory, these sites offer workers the ability to earn money without the obligations and potential social biases associated with traditional employment frameworks. In this paper, we study whether two prominent online freelance marketplaces – TaskRabbit and Fiverr – are impacted by racial and gender bias. From these two platforms, we collect 13,500 worker profiles and gather information about workers’ gender, race, customer reviews, ratings, and positions in search rankings. In both marketplaces, we find evidence of bias: we find that gender and race are significantly correlated with worker evaluations, which could harm the employment opportunities afforded to the workers. We hope that our study fuels more research on the presence and implications of discrimination in online environments.