Talk Abstracts

Defining Privacy, Why and How
Ilya Mironov - Google

We survey the practical and theoretical attacks on disclosure mechanisms that led to the definition of differential privacy. We motivate the definition and discuss its properties, and conclude with a discussion of relaxations of differential privacy.
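For reference (a standard formulation, not part of the original abstract): a randomized mechanism M satisfies (ε, δ)-differential privacy if, for every pair of datasets D and D' differing in one individual's record and every set of outputs S,

\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \,\Pr[M(D') \in S] + \delta .
\]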

 

Privacy in personal-data networks
Elena Zheleva - University of Illinois at Chicago

In an age when data about human activity are collected at unprecedented levels across multiple digital platforms, protecting individual privacy is harder than ever. Many real-world data sources can be represented as heterogeneous information graphs, in which data about individuals are encoded in nodes, and edges represent connections between individuals on the same or different platforms. In this talk, I will provide an overview of some of the technical and algorithmic issues related to privacy in graphs: common statistical-inference techniques that can be used to infer potentially sensitive information in graphs, privacy mechanisms for graph data publishing, and techniques for modeling, evaluating, and managing individual users' privacy risk.

 

Link prediction and privacy in networks
Aaron Clauset - University of Colorado, Boulder

The predictability of missing information is a key measure of how private that information may be. That is, my private information is necessarily vulnerable to inference because of correlations with data that you observe or I disclose. Networks can enhance these vulnerabilities because of social homophily, i.e., the tendency for individual attributes (disclosed or private) to correlate across connections. Hence, knowing who connects to whom can make missing information more predictable, and the more complete the knowledge of the network, the more accurate such predictions tend to be. In this talk, I will describe a broad empirical analysis of the degree to which missing network connections are themselves predictable from network data alone. These results arise from a comparison of 11 state-of-the-art "community detection" algorithms, which decompose a network into its constituent structural building blocks, a task qualitatively similar to clustering on a graph. We evaluate the accuracies of these algorithms for predicting missing links using a large corpus of 572 structurally diverse real-world networks, representing a variety of social, biological, technological, economic, and transportation systems. Across algorithms, we find wide variation in their link prediction accuracies, as a result of different tradeoffs they make in over- or under-fitting to network data. And, confirming the recent No Free Lunch theorem for community detection, we find that no algorithm is always the best across all inputs at this task. However, comparing link prediction accuracies across different types of networks, we find that missing links in social networks are generally the easiest to predict. This robust behavior suggests that privacy in social networks may be fundamentally limited.
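A minimal sketch of the kind of evaluation described, under simplifying assumptions of my own (toy co-membership scoring, uniform edge holdout): hide a fraction of edges, score node pairs by whether they fall in the same inferred community, and estimate the AUC, i.e. how often a held-out edge outranks a random non-edge.

    import random

    def link_prediction_auc(nodes, edges, community, holdout_frac=0.2, trials=2000):
        """Estimate link-prediction AUC: the probability that a held-out edge
        scores higher than a random non-edge, scoring each pair by whether the
        two endpoints share an inferred community label."""
        edges = [(min(u, v), max(u, v)) for u, v in edges]
        random.shuffle(edges)
        n_hold = max(1, int(holdout_frac * len(edges)))
        held_out = edges[:n_hold]
        all_edges = set(edges)

        def score(u, v):
            # toy score: 1 if the endpoints share a community label, else 0
            return 1.0 if community[u] == community[v] else 0.0

        def random_non_edge():
            while True:
                u, v = random.sample(nodes, 2)
                pair = (min(u, v), max(u, v))
                if pair not in all_edges:
                    return pair

        wins = 0.0
        for _ in range(trials):
            u, v = random.choice(held_out)
            a, b = random_non_edge()
            s_pos, s_neg = score(u, v), score(a, b)
            wins += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
        return wins / trials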

 

Privacy-preserving friend suggestions revisited
Aleksandra Korolova - University of Southern California

TBD

 

Privacy-first Graph Data Management Systems
Amol Deshpande - University of Maryland

Recent regulations like the EU GDPR and the California Consumer Privacy Act raise the intriguing possibility of fundamentally rethinking the end-to-end data lifecycle and developing new data-centric systems that ensure privacy, accountability, and transparency by design. I will discuss some of the challenges in building such systems for managing, querying, and analyzing large volumes of graph data. In particular, I will focus on the crucial role of fine-grained provenance in such a system and present our recent work on capturing and analyzing provenance and context information for collaborative data science activities.

 

Composition of privacy
Thomas Steinke - IBM Research, Almaden

Differential privacy has been widely accepted as a high standard of privacy protection, and the key to its success is composition -- the property that the combination of information released by multiple independent differentially private algorithms remains predictably differentially private. This is especially important in our highly interconnected world, where personal information is spread across many locations. In this talk I will discuss concentrated differential privacy and its variants, which reformulate differential privacy in information-theoretic terms to provide sharp tools for analyzing composition.
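As a point of reference (standard background, not text from the abstract): zero-concentrated differential privacy (zCDP) requires that for all neighboring datasets D, D' and all orders \alpha > 1,

\[
D_{\alpha}\bigl(M(D) \,\|\, M(D')\bigr) \;\le\; \rho\,\alpha ,
\]

where D_\alpha is the Rényi divergence of order \alpha. Under this reformulation, composition is simply additive: running a \rho_1-zCDP algorithm followed by a \rho_2-zCDP algorithm is (\rho_1 + \rho_2)-zCDP.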

 

Can Meaningful Graph Queries on Social Media Data be Private?
Amarnath Gupta - San Diego Supercomputer Center

We will present a use case in which social media data are obtained through deliberate filtering and stored in a property graph database. The application poses aggregate queries using a query language that supports Extended Conjunctive Regular Path Queries. Can this data be made private and yet yield meaningful results for the application? The talk will present the use case in detail and ask how privacy techniques can be applied to it.

 

Privacy in Cyber-Physical Networks
Alvaro Cardenas - University of Texas at Dallas

Our physical infrastructures are being modernized to meet the energy and efficiency challenges of the 21st century, from modernizing the power grid to creating intelligent transportation systems that improve the efficiency of our limited resources. To achieve these goals, we are collecting data at unprecedented levels of granularity, and because sensing is passive, users are generally unaware of their exposure. In this talk we will discuss some of the privacy challenges, threats, and unique privacy-enhancing algorithms in this space. Topics range from achieving data minimization when synchronizing smart grids through their computer-network graph and their physical graph (the properties of these graphs determine how much data is needed for coordination), to the unique challenges and opportunities that cyber-physical systems present for the use of differential privacy.

 

Differential privacy for relational data
Ashwin Machanavajjhala - Duke University

Differential privacy has traditionally been defined for flat tables, where the privacy object (the individual) contributes a small number of records (usually one) to the table. However, real data are relational and can seldom be fully captured as flat files. In this talk, I will describe ongoing work on building a relational database engine that can support differentially private query answering. I will highlight challenges in defining privacy in the presence of constraints, computing the sensitivity of queries, and handling inference from noisy answers.
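As a minimal illustration of the sensitivity issue (a sketch under my own assumptions, not the engine described in the talk): the Laplace mechanism answers a counting query by adding noise scaled to the query's sensitivity, which is 1 for a flat table with one row per person but can be much larger, and harder to bound, once relational constraints let one individual contribute many rows.

    import numpy as np

    def laplace_count(true_count, sensitivity, epsilon):
        """Differentially private counting query via the Laplace mechanism.
        sensitivity = the most the count can change when one individual's data
        changes: 1 for one-row-per-person tables, but potentially much larger
        when foreign-key constraints tie many rows to a single individual."""
        return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Hypothetical example: a person can own up to 10 order rows, so sensitivity 10.
    # private_answer = laplace_count(true_count=5234, sensitivity=10, epsilon=0.5)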

 

Privacy preserving prediction
Vitaly Feldman - Google

Ensuring differential privacy of models learned from sensitive user data is an important goal that has been studied extensively in recent years. It is now known that for some basic learning problems, especially those involving high-dimensional data, producing an accurate private model requires much more data than learning without privacy. At the same time, in many applications it is not necessary to expose the model itself. Instead users may be allowed to query the prediction model on their inputs only through an appropriate interface. Here we formulate the problem of ensuring privacy of individual predictions and investigate the overheads required to achieve it in several standard models of classification and regression.

 

Analyzing Graphs with Node Differential Privacy
Shiva Kasiviswanathan - Amazon

Many types of data can be represented as graphs, where nodes correspond to individuals and edges capture relationships between them. On one hand, such datasets contain sensitive information about individuals. On the other hand, global information that can be gleaned from their analysis can provide significant benefits to society. In this talk, we will discuss algorithms for analyzing graph data that satisfy a rigorous notion of privacy, called node differential privacy. We will survey existing results in this area and present some tools for designing node differentially private algorithms.
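For concreteness (a standard formulation, not taken from the abstract): an algorithm M is ε-node differentially private if for every pair of graphs G, G' that differ by the removal of one node together with all its incident edges, and every set of outputs S,

\[
\Pr[M(G) \in S] \;\le\; e^{\varepsilon} \,\Pr[M(G') \in S] .
\]

Edge differential privacy is the analogous condition for graphs differing in a single edge; the node version is strictly stronger, since deleting a node can change many edges at once.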

 

Per-instance differential privacy
Yu-Xiang Wang - University of California, Santa Barbara

I will talk about recent work on “per-instance differential privacy” (pDP), which captures the privacy of a specific individual with respect to a fixed data set. This allows us to inspect the privacy footprint of a randomized algorithm at a more refined level. pDP can also be used as an analytical tool that enables data-dependent analysis of DP and data-dependent algorithm design. I will also talk about why pDP is a particularly promising direction for privacy-preserving graph algorithms. Specifically, pDP lets us measure the precise privacy loss incurred by a particular edge or a particular node when the current graph is fed into a randomized algorithm. This avoids relying only on the DP guarantee, a crude upper bound that depends on a hypothetical worst-case graph (a chain graph) and the largest possible perturbation (adding or removing Justin Bieber from Twitter).
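One way to state the idea (my paraphrase, so the paper's exact formulation may differ): fix a dataset Z and a target individual z. A randomized algorithm A satisfies (ε, δ)-pDP for the pair (Z, z) if, for all events S,

\[
\Pr[A(Z) \in S] \;\le\; e^{\varepsilon} \Pr[A(Z \setminus \{z\}) \in S] + \delta
\quad\text{and}\quad
\Pr[A(Z \setminus \{z\}) \in S] \;\le\; e^{\varepsilon} \Pr[A(Z) \in S] + \delta ,
\]

i.e., the worst case over all datasets in the DP definition is replaced by the particular dataset and individual at hand.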

 

Regression with relational data 
Bailey Fosdick - Colorado State University

Relational data represent interactions or associations between pairs of actors, often in varied contexts or over time. Such data appear as, for example, trade flows between countries, financial transactions between individuals, contact frequencies between school children in classrooms, and dynamic protein-protein interactions. A common goal in the analysis of such data is to model the relations as a linear function of observable covariates. Valid statistical inference for the model parameters relies on accurate uncertainty estimates, which must account for both heterogeneity across actors and dependence arising from relations involving the same actor. Statisticians often make various independence assumptions to perform inference and to establish theoretical properties, such as consistency, of their estimators. In addition, they leverage the dependence among relations to make predictions for missing data. Thus, relational dependence is an advantage for prediction, yet it hinders estimators from having good properties. In this talk, we begin a discussion of this tradeoff and its connections with privacy.
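A common starting point for such models (a generic illustration, not necessarily the exact specification used in the talk) is a regression with additive sender and receiver effects:

\[
y_{ij} = \beta^{\top} x_{ij} + a_i + b_j + \epsilon_{ij},
\]

where the row effects a_i and column effects b_j capture heterogeneity across actors, and correlation between \epsilon_{ij} and \epsilon_{ji} (and between errors sharing an actor) captures the dependence that both aids prediction and complicates uncertainty quantification.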

 

Privacy-preserving Data Mining and Analytics at LinkedIn: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi - LinkedIn

Preserving the privacy of users is a key requirement of web-scale data mining, machine learning, and analytics applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. In this talk, we will focus on the application of privacy-preserving data mining techniques in practice by presenting two case studies at LinkedIn: differential privacy for LinkedIn analytics applications (https://arxiv.org/pdf/1809.07754) and the LinkedIn Salary privacy design (https://arxiv.org/pdf/1705.06976). We will then reflect on the privacy challenges associated with applications enabled by the LinkedIn Economic Graph.

 

Adventures in Practical Machine Learning with Privacy
Ulfar Erlingsson - Google

For the last several years, Google has been leading the development and real-world deployment of state-of-the-art, practical techniques for learning statistics and ML models with strong privacy guarantees for the data involved. I'll give an overview of our work, and the practical techniques we've developed for training Deep Neural Networks with strong privacy guarantees. In particular, I'll cover recent results that show how local differential privacy guarantees can be strengthened by the addition of anonymity, and explain the motivation for that work. I'll also cover recent work on uncovering and measuring privacy problems due to unintended memorization in machine learning models.

 

Private Selection from Private Candidates
Kunal Talwar - Google

Differentially Private algorithms often need to select the best amongst many candidate options. Classical works on this selection problem require that the candidates' goodness, measured as a real-valued score function, does not change by much when one person's data changes. In many applications such as hyperparameter optimization, this stability assumption is much too strong. In this work, we consider the selection problem under a much weaker stability assumption on the candidates, namely that the score functions are differentially private. Under this assumption, we present algorithms that are near-optimal along the three relevant dimensions: privacy, utility and computational efficiency.
Our result can be seen as a generalization of the exponential mechanism and its existing generalizations. We also develop an online version of our algorithm, which can be seen as a generalization of the sparse vector technique to this weaker stability assumption. We show how our results imply better algorithms for hyperparameter selection in differentially private machine learning, as well as for adaptive data analysis. (Joint work with Jingcheng Liu.)
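For background (classical material, not a result of this work): the exponential mechanism selects candidate i with probability

\[
\Pr[\text{output } i] \;\propto\; \exp\!\left(\frac{\varepsilon\, q_i(D)}{2\Delta}\right),
\]

where \Delta bounds how much any score q_i can change when one person's data changes. The work described here replaces that global-sensitivity requirement with the weaker assumption that each score is itself computed by a differentially private algorithm.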

 

Differentially Private estimation of Exponential Random Graph Models
Vishesh Karwa - Temple University

Motivated by a real-life problem of sharing social network data that contain sensitive personal information, we propose an approach to releasing and analyzing synthetic graphs that protects the privacy of individual relationships captured by the social network while maintaining the validity of statistical results. We focus on estimation of exponential random graph models, which are among the models most widely used by social scientists. We use a simple yet effective randomized response mechanism to generate synthetic networks under ε-edge differential privacy, and then use likelihood-based inference for missing data and Markov chain Monte Carlo techniques to fit exponential-family random graph models to the generated synthetic networks.
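A minimal sketch of this kind of randomized-response release (my own illustration; the function and parameter names are assumptions): each potential edge is flipped independently with probability 1/(1 + e^ε), which gives the released graph ε-edge differential privacy.

    import math
    import random

    def randomized_response_graph(n, edges, epsilon):
        """Release a synthetic undirected graph on nodes 0..n-1 under epsilon-edge
        differential privacy: flip each potential edge independently with
        probability 1 / (1 + e^epsilon)."""
        flip_prob = 1.0 / (1.0 + math.exp(epsilon))
        edge_set = {(min(u, v), max(u, v)) for u, v in edges}
        noisy_edges = set()
        for u in range(n):
            for v in range(u + 1, n):
                present = (u, v) in edge_set
                if random.random() < flip_prob:
                    present = not present  # randomized response: flip the bit
                if present:
                    noisy_edges.add((u, v))
        return noisy_edges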

 

Graph Querying, Mining, and Privacy Issues
Xifeng Yan - University of California, Santa Barbara

In this talk, I will give an overview of a set of graph querying and mining tasks we have worked on, including knowledge graph querying, graph anomaly detection, graph correlation, graph iceberg, and graph summarization. I will raise a number of privacy issues related to these tasks and invite discussion of their solutions.

 

Composition, Verification, and Differential Privacy
Justin Hsu - University of Wisconsin-Madison

Differential Privacy's strong composition properties have been fundamental to its immense impact. We briefly recall these properties, commenting on how they enable formal verification for privacy and how they can suggest further improvements to the definition of differential privacy.

 

Symmetric graph properties have independent edges
Dimitris Achlioptas - University of California, Santa Cruz

Random graph models face a tradeoff between realism and tractability, the latter typically enabled by independence assumptions. In this work we initiate an effort to bridge this gap by developing tools that allow us to work with independence without assuming it. Concretely, let Gn be the set of all graphs on n vertices and let S be an arbitrary subset of it, e.g., the set of all graphs with m edges. What are general sufficient conditions for the uniform distribution on S to be well-approximated by a product distribution? (Joint work with Paris Siminelakis.)

 

Inferring Graph Properties with Limited Access to the Graph
Ali Pinar - Sandia National Laboratories 

In many applications of network analysis, we do not have the full graph structure and must infer properties of the graph with only limited access to the data. In this talk, I will give an example of finding nodes with high betweenness centrality in an active graph without full access to the graph.

 

Privacy in (Social) Context: The Map is Not the Territory
Luke Stark - Microsoft Research Montreal/Harvard University

Scholarship both recent and established in sociology, information science, HCI and media studies has highlighted the importance of social context to human understandings of privacy. In this brief talk, I'll provide an overview of recent conversations and developments in conceptualizing privacy and studying it via digital means. In particular, I'll highlight the utility of mixed method studies and engagement with interdisciplinary teams as a means to distinguish tidy computational maps from messy human territories where privacy is concerned.

Poster Abstracts

Protein Interface Prediction Using Graph Convolutional Network
Alex Fout - Colorado State University

We consider the prediction of interfaces between proteins, a challenging problem with important applications in drug discovery and design, and examine the performance of existing and newly proposed spatial graph convolution operators for this task. By performing convolution over a local neighborhood of a node of interest, we are able to stack multiple layers of convolution and learn effective latent representations that integrate information across the graph representing the three-dimensional structure of the protein of interest. An architecture that combines the learned features across pairs of proteins is then used to classify pairs of amino acid residues as part of an interface or not. In our experiments, several graph convolution operators yielded better accuracy than the state-of-the-art SVM method on this task.
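A minimal sketch of a spatial graph convolution layer (my simplification, not the authors' exact operator): each node's new representation combines its own features with an aggregate of its neighbors' features, and such layers can be stacked.

    import numpy as np

    def graph_conv_layer(node_feats, neighbor_idx, W_self, W_neigh, b):
        """One spatial graph convolution layer.
        node_feats:   (num_nodes, d_in) array of node features
        neighbor_idx: list of neighbor-index lists, one per node
        W_self, W_neigh: (d_in, d_out) weight matrices; b: (d_out,) bias."""
        out = []
        for v in range(len(node_feats)):
            neigh = neighbor_idx[v]
            neigh_mean = (node_feats[neigh].mean(axis=0) if neigh
                          else np.zeros(node_feats.shape[1]))
            h = node_feats[v] @ W_self + neigh_mean @ W_neigh + b
            out.append(np.maximum(h, 0.0))  # ReLU nonlinearity
        return np.stack(out)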

 

APEx: Accuracy-Aware Differentially Private Data Exploration    
Xi He - Duke University

Organizations are increasingly interested in allowing external data scientists to explore their sensitive datasets. Due to the popularity of differential privacy, data owners want the data exploration to ensure provable privacy guarantees. However, current systems for differentially private query answering place an inordinate burden on the data analysts to understand differential privacy, manage their privacy budget and even implement new algorithms for noisy query answering. Moreover, current systems do not provide any guarantees to the data analyst on the quantity they care about, namely accuracy of query answers. In this poster, we present APEx, a novel system that allows data analysts to pose adaptively chosen sequences of queries along with required accuracy bounds. By translating queries and accuracy bounds into differentially private algorithms with the least privacy loss, APEx returns query answers to the data analyst that meet the accuracy bounds, and proves to the data owner that the entire data exploration process is differentially private. Our comprehensive experimental study on real datasets demonstrates that APEx can answer a variety of queries accurately with moderate to small privacy loss, and can support data exploration for performing entity resolution tasks with high accuracy under reasonable privacy settings.

 

Model-Agnostic Private Learning     
Om Dipakbhai Thakkar - Boston University

TBD

 

Co-Location Networks:  Construction, Privacy Concerns, and Latent Structure Identification    
Wenna Xi - The Ohio State University

Co-location networks are a special case of two-mode networks, or bipartite graphs, where ties can only exist between the two modes: individuals and geographic locations. The goal of analyzing co-location networks is to simultaneously study the "dual" association between individuals and the geographic locations where they spend time. Observed co-location networks present significant privacy concerns because the potential for identity disclosure increases with the number of links to specific geographic locations. In this poster presentation, we will discuss privacy concerns related to co-location networks and present preliminary results obtained by applying a new statistical methodology to analyze them, implemented within a secure data computing environment. Specifically, we present a Bayesian hierarchical model that allows us to study the dual association between individuals and locations. Our approach can be viewed as a model-based analogue to non-negative matrix factorization (NMF). In particular, we assume that there exists a collection of latent variables which define "communities" that simultaneously identify groups of individuals who frequent the same locations and groups of locations that are visited by the same people. We illustrate our model via simulation and using data from the Adolescent Health and Development in Context Study.