Anupam Datta

Associate Professor, CSD, ECE

Email: danupam@andrew.cmu.edu

My research goal is to develop science and technology to account for information flows in complex systems, including big data systems and cryptographic protocols.

Accountability in Big Data Systems/Machine Learning

A specific focus is on accountability in big data systems that employ machine learning. We are developing theories and tools that can be used to provide oversight of complex information processing ecosystems (including big data systems) to ensure that they respect privacy and other values in the personal data protection area, such as fairness and transparency. This includes foundations, methods, and tools for detecting violations, explaining decisions made by machine learning systems, attributing responsibility for violations, and correcting responsible entities to avoid future violations. The technical work is informed by, and applied to, significant practical privacy problems in a broad range of sectors, including Web and healthcare privacy.

Significant recent results include the following:

  • Algorithmic transparency via Quantitative Input Influence -- an approach to measuring the causal influence of features on the decisions of a machine-learned classifier [IEEE S & P 2016]
  • The first statistically rigorous methodology for information flow experiments to discover personal data use by black-box Web services [CSF 2015]; the AdFisher tool, which implements an augmented version of this methodology to enable discovery at scale; and its application in the first study to demonstrate statistically significant evidence of discrimination in online behavioral advertising [PETS 2015] (see also the FAQ on this study and AdFisher)
  • The first automated privacy compliance analysis of the production code of an Internet-scale system -- the big data analytics pipeline for Bing, Microsoft's search engine; the analysis leverages our usable privacy policy language, Legalease, and an information flow analysis methodology (joint work with Microsoft Research) [IEEE S & P 2014]
  • The first complete logical specification of all disclosure-related clauses of the HIPAA Privacy Rule for healthcare privacy [WPES 2010], and audit algorithms that apply to it and, more generally, to a rich class of policies (fragments of metric first-order temporal logic) [CCS 2011, CAV 2014, CCS 2015]
  • The first formal semantics for purpose restrictions on information use, and associated audit algorithms [IEEE S & P 2012, ESORICS 2013]
  • A formalization of privacy as contextual integrity [IEEE S & P 2006] (see also the White House's Consumer Privacy Bill of Rights)