805 Columbus Avenue
614 Interdisciplinary Science and Engineering Complex (ISEC)
Boston, MA 02120
ATTN: Alan Mislove, 435 ISEC
360 Huntington Avenue
Boston, MA 02115
- Network measurement
- Privacy and security issues associated with online social networks
- PhD in Computer Science, Rice University
- MS in Computer Science, Rice University
- BA in Computer Science, Rice University
Alan Mislove is a professor and the associate dean and director of Undergraduate Programs in the Khoury College of Computer Sciences at Northeastern University. He received his BA, MS, and PhD from Rice University in 2002, 2005, and 2009, respectively. Prior to joining Northeastern, he was a co-founder and Chief Data Architect at Janys Analytics in Boston, Massachusetts.
Mislove’s primary field of interest concerns distributed systems and networks, with a focus on using social networks to enhance the security, privacy, and efficiency of newly emerging systems. He is a core faculty member of the Cybersecurity and Privacy Institute, which forges partnerships with experts in industry, government, and academia worldwide.
In 2019, Mislove won the IRTF Applied Networking Research Prize for his IMC ’17 paper on understanding the role of registrars in DNSSEC deployment. His work has been funded by Amazon Web Services, the Army Research Office, the Data Transparency Lab, Facebook, Google, and the NSF. He received the NSF CAREER Award in 2011, and his work has been covered by The Wall Street Journal, The New York Times, and CBS Evening News.
Chen, Le, Alan Mislove, and Christo Wilson. "Peeking Beneath the Hood of Uber." Proceedings of the 2015 ACM Conference on Internet Measurement Conference. ACM, 2015.
Recently, Uber has emerged as a leader in the “sharing economy”. Uber is a “ride sharing” service that matches willing drivers with customers looking for rides. However, unlike other open marketplaces (e.g., AirBnB), Uber is a black box: they do not provide data about supply or demand, and prices are set dynamically by an opaque “surge pricing” algorithm. The lack of transparency has led to concerns about whether Uber artificially manipulates prices, and whether dynamic prices are fair to customers and drivers. In order to understand the impact of surge pricing on passengers and drivers, we present the first in-depth investigation of Uber. We gathered four weeks of data from Uber by emulating 43 copies of the Uber smartphone app and distributing them throughout downtown San Francisco (SF) and midtown Manhattan. Using our dataset, we are able to characterize the dynamics of Uber in SF and Manhattan, as well as identify key implementation details of Uber’s surge price algorithm. Our observations about Uber’s surge price algorithm raise important questions about the fairness and transparency of this system.
Zhang, Liang, et al. "Analysis of SSL certificate reissues and revocations in the wake of Heartbleed." Proceedings of the 2014 Conference on Internet Measurement Conference. ACM, 2014.
Central to the secure operation of a public key infrastructure (PKI) is the ability to revoke certificates. While much of users’ security rests on this process taking place quickly, in practice, revocation typically requires a human to decide to reissue a new certificate and revoke the old one. Thus, having a proper understanding of how often systems administrators reissue and revoke certificates is crucial to understanding the integrity of a PKI. Unfortunately, this is typically difficult to measure: while it is relatively easy to determine when a certificate is revoked, it is difficult to determine whether and when an administrator should have revoked.
In this paper, we use a recent widespread security vulnerability as a natural experiment. Publicly announced in April 2014, the Heartbleed OpenSSL bug potentially (and undetectably) revealed servers’ private keys. Administrators of servers that were susceptible to Heartbleed should have revoked their certificates and reissued new ones, ideally as soon as the vulnerability was publicly announced.
Using a set of all certificates advertised by the Alexa Top 1 Million domains over a period of six months, we explore the patterns of reissuing and revoking certificates in the wake of Heartbleed. We find that over 73% of vulnerable certificates had yet to be reissued and over 87% had yet to be revoked three weeks after Heartbleed was disclosed. Moreover, our results show a drastic decline in revocations on the weekends, even immediately following the Heartbleed announcement. These results are an important step in understanding the manual processes on which users rely for secure, authenticated communication.
Hannak, Aniko, et al. "Measuring price discrimination and steering on e-commerce web sites." Proceedings of the 2014 Conference on Internet Measurement Conference. ACM, 2014.
Today, many e-commerce websites personalize their content, including Netflix (movie recommendations), Amazon (product suggestions), and Yelp (business reviews). In many cases, personalization provides advantages for users: for example, when a user searches for an ambiguous query such as “router,” Amazon may be able to suggest the woodworking tool instead of the networking device. However, personalization on e-commerce sites may also be used to the user’s disadvantage by manipulating the products shown (price steering) or by customizing the prices of products (price discrimination). Unfortunately, today, we lack the tools and techniques necessary to be able to detect such behavior.
In this paper, we make three contributions towards addressing this problem. First, we develop a methodology for accurately measuring when price steering and discrimination occur and implement it for a variety of e-commerce web sites. While it may seem conceptually simple to detect differences between users’ results, accurately attributing these differences to price discrimination and steering requires correctly addressing a number of sources of noise. Second, we use the accounts and cookies of over 300 real-world users to detect price steering and discrimination on 16 popular e-commerce sites. We find evidence for some form of personalization on nine of these e-commerce sites. Third, we investigate the effect of user behaviors on personalization. We create fake accounts to simulate different user features including web browser/OS choice, owning an account, and history of purchased or viewed products. Overall, we find numerous instances of price steering and discrimination on a variety of top e-commerce sites.
Zhang, Liang, and Alan Mislove. "Building confederated web-based services with Priv.io." Proceedings of the first ACM conference on Online social networks. ACM, 2013.
With the increasing popularity of Web-based services, users today have access to a broad range of free sites, including social networking, microblogging, and content sharing sites. In order to offer a service for free, service providers typically monetize user content, selling results to third parties such as advertisers. As a result, users have little control over their data or privacy. A number of alternative approaches to architecting today’s Web-based services have been proposed, but they suffer from limitations such as relying on the creation and installation of additional client-side software, providing insufficient reliability, or imposing an excessive monetary cost on users.
In this paper, we present Priv.io, a new approach to building Web-based services that offers users greater control and privacy over their data. We leverage the fact that today, users can purchase storage, bandwidth, and messaging from cloud providers at fine granularity: In Priv.io, each user provides the resources necessary to support their use of the service using cloud providers such as Amazon Web Services. Users still access the service using a Web browser, all computation is done within users’ browsers, and Priv.io provides rich and secure support for third-party applications. An implementation demonstrates that Priv.io works today with unmodified versions of common Web browsers on both desktop and mobile devices, is both practical and feasible, and is cheap enough for the vast majority of users.
Taking a Long Look at QUIC: An Approach for Rigorous Evaluation of Rapidly Evolving Transport Protocols
Arash Molavi Kakhki, Samuel Jero, David Choffnes, Alan Mislove, and Cristina Nita-Rotaru. In Proceedings of ACM Internet Measurement Conference (IMC'17), London, United Kingdom, Nov 2017.
Google’s QUIC protocol, which implements TCP-like properties at the application layer atop a UDP transport, is now used by the vast majority of Chrome clients accessing Google properties but has no formal state machine specification, limited analysis, and ad-hoc evaluations based on snapshots of the protocol implementation in a small number of environments. Further frustrating attempts to evaluate QUIC is the fact that the protocol is under rapid development, with extensive rewriting of the protocol occurring over the scale of months, making individual studies of the protocol obsolete before publication. Given this unique scenario, there is a need for alternative techniques for understanding and evaluating QUIC when compared with previous transport-layer protocols. First, we develop an approach that allows us to conduct analysis across multiple versions of QUIC to understand how code changes impact protocol effectiveness. Next, we instrument the source code to infer QUIC’s state machine from execution traces. With this model, we run QUIC in a large number of environments that include desktop and mobile, wired and wireless environments and use the state machine to understand differences in transport- and application-layer performance across multiple versions of QUIC and in different environments. QUIC generally outperforms TCP, but we also identified performance issues related to window sizes, re-ordered packets, and multiplexing a large number of small objects; further, we identify that QUIC’s performance diminishes on mobile devices and over cellular networks.
Venkatadri, Giridhari et al. “Investigating sources of PII used in Facebook's targeted advertising.” PoPETs 2019 (2018): 227-244.
Online social networking services have become the gateway to the Internet for millions of users, accumulating rich databases of user data that form the basis of their powerful advertising platforms. Today, these services frequently collect various kinds of personally identifying information (PII), such as phone numbers, email addresses, and names and dates of birth. Since this PII often represents extremely accurate, unique, and verified user data, these services have the incentive to exploit it for other purposes, including to provide advertisers with more accurate targeting. Indeed, most popular services have launched PII-based targeting features that allow advertisers to target users with ads directly by uploading the intended targets’ PII. Unfortunately, these services often do not make such usage clear to users, and it is often impossible for users to determine how they are actually being targeted by advertisers.
In this paper, we focus on Facebook and investigate the sources of PII used for its PII-based targeted advertising feature. We develop a novel technique that uses Facebook’s advertiser interface to check whether a given piece of PII can be used to target some Facebook user, and use this technique to study how Facebook’s advertising service obtains users’ PII. We investigate a range of potential sources of PII, finding that phone numbers and email addresses added as profile attributes, those provided for security purposes such as two-factor authentication, those provided to the Facebook Messenger app for the purpose of messaging, and those included in friends’ uploaded contact databases are all used by Facebook to allow advertisers to target users. These findings hold despite all the relevant privacy controls on our test accounts being set to their most private settings. Overall, our paper highlights the need for the careful design of usable privacy controls for, and detailed disclosure about, the use of sensitive PII in targeted advertising.
Taejoong Chung, Jay Lok, Balakrishnan Chandrasekaran, David Choffnes, Dave Levin, Bruce M. Maggs, Alan Mislove, John Rula, Nick Sullivan, and Christo Wilson. 2018. Is the Web Ready for OCSP Must-Staple? In 2018 Internet Measurement Conference (IMC ’18), October 31-November 2, 2018, Boston, MA, USA. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3278532.3278543
TLS, the de facto standard protocol for securing communications over the Internet, relies on a hierarchy of certificates that bind names to public keys. Naturally, ensuring that the communicating parties are using only valid certificates is a necessary first step in order to benefit from the security of TLS. To this end, most certificates and clients support OCSP, a protocol for querying a certificate’s revocation status and confirming that it is still valid. Unfortunately, however, OCSP has been criticized for its slow performance, unreliability, soft-failures, and privacy issues. To address these issues, the OCSP Must-Staple certificate extension was introduced, which requires web servers to provide OCSP responses to clients during the TLS handshake, making revocation checks low-cost for clients. Whether all of the players in the web’s PKI are ready to support OCSP Must-Staple, however, remains an open question.
In this paper, we take a broad look at the web’s PKI and determine if all components involved (namely, certificate authorities, web server administrators, and web browsers) are ready to support OCSP Must-Staple. We find that none of the components yet fully supports OCSP Must-Staple: OCSP responders are still not fully reliable, and most major web browsers and web server implementations do not fully support OCSP Must-Staple. On the bright side, only a few players need to take action to make it possible for web server administrators to begin relying on certificates with OCSP Must-Staple. Thus, we believe a much wider deployment of OCSP Must-Staple is a realistic and achievable goal.
G. Venkatadri et al., "Privacy Risks with Facebook's PII-Based Targeting: Auditing a Data Broker's Advertising Interface," 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, 2018, pp. 89-107.
Sites like Facebook and Google now serve as de facto data brokers, aggregating data on users for the purpose of implementing powerful advertising platforms. Historically, these services allowed advertisers to select which users see their ads via targeting attributes. Recently, most advertising platforms have begun allowing advertisers to target users directly by uploading the personal information of the users who they wish to advertise to (e.g., their names, email addresses, phone numbers, etc.); these services are often known as custom audiences. Custom audiences effectively represent powerful linking mechanisms, allowing advertisers to leverage any PII (e.g., from customer data, public records, etc.) to target users.
In this paper, we focus on Facebook’s custom audience implementation and demonstrate attacks that allow an adversary to exploit the interface to infer users’ PII as well as to infer their activity. Specifically, we show how the adversary can infer users’ full phone numbers knowing just their email address, determine whether a particular user visited a website, and de-anonymize all the visitors to a website by inferring their phone numbers en masse. These attacks can be conducted without any interaction with the victim(s), cannot be detected by the victim(s), and do not require the adversary to spend money or actually place an ad. We propose a simple and effective fix to the attacks based on reworking the way Facebook de-duplicates uploaded information. Facebook’s security team acknowledged the vulnerability and has put into place a fix that is a variant of the fix we propose. Overall, our results indicate that advertising platforms need to carefully consider the privacy implications of their interfaces.
Liang Zhang, Fangfei Zhou, Alan Mislove, and Ravi Sundaram. 2013. Maygh: building a CDN from client web browsers. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys '13). ACM, New York, NY, USA, 281-294. DOI=http://dx.doi.org/10.1145/2465351.2465379
Over the past two decades, the web has provided dramatic improvements in the ease of sharing content. Unfortunately, the costs of distributing this content are largely incurred by web site operators; popular web sites are required to make substantial monetary investments in serving infrastructure or cloud computing resources—or must pay other organizations (e.g., content distribution networks)—to help serve content. Previous approaches to offloading some of the distribution costs onto end users have relied on client-side software or web browser plug-ins, providing poor user incentives and dramatically limiting their scope in practice.
In this paper, we present Maygh, a system that builds a content distribution network from client web browsers, without the need for additional plug-ins or client-side software. The result is an organically scalable system that distributes the cost of serving web content across the users of a web site. Through simulations based on real-world access logs from Etsy (a large e-commerce web site that is the 50th most popular web site in the U.S.), microbenchmarks, and a small-scale deployment, we demonstrate that Maygh provides substantial savings to site operators, imposes only modest costs on clients, and can be deployed on the web sites and browsers of today. In fact, if Maygh were deployed to Etsy, it would reduce network bandwidth due to static content by 75% and require only a single coordinating server.
A Longitudinal, End-to-End View of the DNSSEC Ecosystem
Taejoong Chung, Roland van Rijswijk-Deij, Balakrishnan Chandrasekaran, David Choffnes, Dave Levin, Bruce M. Maggs, Alan Mislove, and Christo Wilson. In Proceedings of USENIX Security Symposium (USENIX Security'17), Vancouver, Canada, Aug 2017.
The Domain Name System’s Security Extensions (DNSSEC) allow clients and resolvers to verify that DNS responses have not been forged or modified inflight. DNSSEC uses a public key infrastructure (PKI) to achieve this integrity, without which users can be subject to a wide range of attacks. However, DNSSEC can operate only if each of the principals in its PKI properly performs its management tasks: authoritative name servers must generate and publish their keys and signatures correctly, child zones that support DNSSEC must be correctly signed with their parent’s keys, and resolvers must actually validate the chain of signatures.
This paper performs the first large-scale, longitudinal measurement study into how well DNSSEC’s PKI is managed. We use data from all DNSSEC-enabled subdomains under the .com, .org, and .net TLDs over a period of 21 months to analyze DNSSEC deployment and management by domains; we supplement this with active measurements of more than 59K DNS resolvers worldwide to evaluate resolver-side validation.
Our investigation reveals pervasive mismanagement of the DNSSEC infrastructure. For example, we found that 31% of domains that support DNSSEC fail to publish all relevant records required for validation; 39% of the domains use insufficiently strong key-signing keys; and although 82% of resolvers in our study request DNSSEC records, only 12% of them actually attempt to validate them. These results highlight systemic problems, which motivate improved automation and auditing of DNSSEC management.
Liu, Yabing, et al. "Analyzing Facebook privacy settings: user expectations vs. reality." Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
The sharing of personal data has emerged as a popular activity over online social networking sites like Facebook. As a result, the issue of online social network privacy has received significant attention in both the research literature and the mainstream media. Our overarching goal is to improve defaults and provide better tools for managing privacy, but we are limited by the fact that the full extent of the privacy problem remains unknown; there is little quantification of the incidence of incorrect privacy settings or the difficulty users face when managing their privacy.
In this paper, we focus on measuring the disparity between the desired and actual privacy settings, quantifying the magnitude of the problem of managing privacy. We deploy a survey, implemented as a Facebook application, to 200 Facebook users recruited via Amazon Mechanical Turk. We find that 36% of content remains shared with the default privacy settings. We also find that, overall, privacy settings match users’ expectations only 37% of the time, and when incorrect, almost always expose content to more users than expected. Finally, we explore how our results have the potential to assist users in selecting appropriate privacy settings by examining the user-created friend lists. We find that these have significant correlation with the social network, suggesting that information from the social network may be helpful in implementing new tools for managing privacy.