Page 1: Polisis: Automated Analysis and Presentation of Privacy ... · fine-grained privacy classes for each privacy policy segment. To build these fine-grained classifiers, we leverage

Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning

Technical Report, EPFL, October 2017

Hamza Harkous
EPFL, Switzerland

Kassem Fawaz
University of Wisconsin, USA

Rémi Lebret
EPFL, Switzerland

Florian Schaub
University of Michigan, USA

Kang G. Shin
University of Michigan, USA

Karl Aberer
EPFL, Switzerland

Abstract—Privacy policies are the primary channel through which companies inform users about their data collection and sharing practices. In their current form, policies remain long and difficult to comprehend, thus merely serving the goal of legally protecting the companies. Short notices based on information extracted from privacy policies have been shown to be useful and more usable, but face a significant scalability hurdle, given the number of policies and their evolution over time. Companies, users, researchers, and regulators still lack usable and scalable tools to cope with the breadth and depth of privacy policies. To address these hurdles, we propose Polisis, an automated framework for privacy policies analysis. It enables scalable, dynamic, and multi-dimensional queries on privacy policies. At the core of Polisis is a privacy-centric language model, built with 130K privacy policies, and a novel hierarchy of neural-network classifiers that caters to the high-level aspects and the fine-grained details of privacy practices. We demonstrate Polisis's modularity and utility with two robust applications that support structured and free-form querying. The structured querying application is the automated assignment of privacy icons from the privacy policies. With Polisis, we can achieve an accuracy of 88.4% on this task, when evaluated against earlier annotations by a group of three legal experts. The second application is PriBot, the first free-form question-answering system for privacy policies. We show that PriBot can produce a correct answer among its top-3 results for 82% of the test questions.

I. INTRODUCTION

Privacy policies are one of the most common ways of providing notice and choice online. They intend to inform users how companies collect, store, and manage their personal information. Although some service providers have improved the comprehensibility and readability of their privacy policies, these policies remain excessively long and difficult to follow [1], [2], [3], [4], [5]. In 2008, McDonald and Cranor [4] estimated that it would take an average user 201 hours to read all the privacy policies encountered in a year. Since then, we have witnessed a smartphone revolution and the rise of the Internet of Things (IoT). As a result, in 2017, users are likely to spend substantially more time reading the privacy policy of each website, app, device, or service they interact with [6]. In addition to the increasing number of privacy policies, emerging technologies brought along new forms of user interfaces (UIs), such as voice-controlled devices or wearables, for which existing techniques for presenting privacy policies are not suitable [3], [6], [7], [8].

Problem Description: For the above reasons, users, researchers, and regulators are not well-equipped to process and understand the content of privacy policies. Users are surprised by data practices that do not meet their expectations [9], hidden in long, vague, and ambiguous policies. Researchers have to employ expert annotators to analyze and reason about a subset of the available privacy policies [10], [11]. Regulators, such as the U.S. Department of Commerce, rely on companies to self-certify their compliance with privacy practices (e.g., the Privacy Shield Framework [12] between the US and EU to manage transatlantic personal data transfers). The problem lies in that stakeholders lack usable and scalable tools to deal with the breadth and depth of privacy policies.

Several proposals have aimed at alternative methods and UIs for presenting privacy notices [7], including nutrition labels [13], privacy icons (recently recommended by the EU [14]), short notices [15], conversational UIs [16], and even physical drones [17]. Unfortunately, these approaches have faced a significant scalability hurdle: the human effort needed to retrofit the new notices to existing policies and maintain them over time is tremendous. Moreover, the existing research towards automating this process has been limited to a handful of "queries," e.g., whether the policy mentions data encryption or whether it provides an opt-out choice from third-party tracking [15], [18].

Our Framework: We address this scalability hurdle by proposing an automatic and comprehensive framework for privacy policies analysis (Polisis). It divides the privacy policy into smaller and self-contained fragments of text, referred to as segments. Polisis automatically annotates, with high accuracy, each segment with a set of labels describing its privacy practices. Unlike previous research in automatic labeling/analysis of privacy policies, we do not design Polisis to just predict a handful of classes given the entire policy content. Instead, Polisis annotates the privacy policy at a much finer-grained scale. It predicts for each segment the set of classes that account for both the high-level aspects and the fine-grained classes of embedded privacy information. Polisis uses these classes to enable scalable, dynamic, and multi-dimensional queries on privacy policies, in a way that was not possible in earlier proposed approaches.

At the core of Polisis is a novel hierarchy of neural-network classifiers that predict up to 10 high-level and 122 fine-grained privacy classes for each privacy policy segment. To build these fine-grained classifiers, we leverage techniques such as subword embeddings and multi-label classification. We further seeded these classifiers with a custom, privacy-specific language model that we trained on our corpus of more than 130,000 privacy policies.

Polisis will provide the underlying intelligence that allows researchers and regulators to focus their efforts on merely designing a set of queries that power their applications. We emphasize, however, that Polisis is not intended to replace the privacy policy – as a legal document – with an automated interpretation. Similar to existing approaches on privacy policy analysis and presentation, it decouples the legally binding functionality of privacy policies from their informational utility.

Applications: We demonstrate and evaluate Polisis's modularity and utility with two robust applications that support structured and free-form querying of a privacy policy.

The structured querying application involves extracting short notices in the form of privacy icons from the privacy policies. As a case study, we investigate the Disconnect privacy icons [19]. By composing a set of simple rules on top of Polisis, we show a solution that can automatically select appropriate privacy icons from a privacy policy. We go a step further to study the practice of companies assigning icons to privacy policies at scale. We empirically demonstrate that existing privacy-compliance companies, such as TRUSTe (now rebranded as TrustArc), adopt highly permissive policies when assigning such privacy icons. Our findings are consistent with anecdotal controversies and manually investigated privacy certification and compliance processes [20], [21].

The second application illustrates the power of free-form querying in Polisis. We design, implement, and evaluate PriBot, the first automated question-answering (QA) system for privacy policies. To build PriBot, we overcame the absence of a public privacy-specific QA dataset by casting the problem as a ranking problem that could be solved using the classification results of Polisis. As a result, PriBot answers free-form user questions from a previously unseen privacy policy, in real time and with high accuracy. Hence, it demonstrates a more intuitive and user-friendly way to present privacy notices and controls. We evaluate PriBot using a new test dataset, based on real-world questions that have been asked by consumers on Twitter.

Contributions: This paper makes the following main contributions:

• We propose Polisis, which automatically annotates a previously unseen privacy policy with a group of high-level and fine-grained labels from a pre-specified taxonomy (Sec. II, III, IV, and V).

• We demonstrate how Polisis can be used to assign privacy icons to a privacy policy with an average accuracy of 88.4%. This accuracy is computed by comparing icons assigned with Polisis's automatic labels to icons assigned with manual annotations by three legal experts from the OPP-115 dataset [11] (Sec. VI).

Fig. 1: A high-level overview of Polisis. (Figure: three layers, the Application Layer (Query Module, Class Comparison, App), the ML Layer (Segment Classifier, Query Analyzer, Privacy Taxonomy), and the Data Layer (Segmenter), with dataflows carrying the privacy policy link, policy segments, the user query, query classes, and annotated segments.)

• We design, implement, and evaluate PriBot, a QA system that answers free-form user questions from privacy policies (Sec. VII). Our accuracy evaluation shows that PriBot can produce at least one correct answer (as indicated by privacy experts) in its top three for 82% of the test questions, and as the top one for 68% of the test questions.

• We provide two web services demonstrating our applications: an app giving a visual overview of the different aspects of each privacy policy (available from the web and via browser extensions) and a chatbot for answering user questions in real time. Our apps will be available at pribot.org.

II. FRAMEWORK OVERVIEW

Fig. 1 shows a high-level overview of Polisis. It comprises three layers: the Application Layer, the Data Layer, and the Machine Learning (ML) Layer. Polisis treats a privacy policy as a list of semantically coherent segments (i.e., groups of consecutive sentences). It also utilizes a taxonomy of privacy data practices. One example of such a taxonomy was introduced by Wilson et al. [11] (detailed later with Fig. 3 in Sec. IV).

Application layer (Sec. V, VI & VII): The Application Layer provides fine-grained information about the privacy policy, thus providing the users with high modularity in posing their queries. In this layer, a Query Module receives the user query about a privacy policy (Step 1 in Fig. 1). These inputs are forwarded to lower layers, which extract the privacy classes embedded within the query and the policy's segments. To resolve the user query, the Class-Comparison module identifies the segments with privacy classes matching those of the query. Then, it passes the matched segments (with their predicted classes) back to the application.

Data layer (Sec. III): The Data Layer first scrapes the policy's webpage. Then it partitions the policy into semantically coherent and adequately sized segments (using the Segmenter component in Step 2 of Fig. 1). Each of the resulting segments can be independently consumed by both humans and programming interfaces.

Fig. 2: List merging during the policy segmentation. (Figure: an excerpt of nested lists from Google's privacy policy; short inner-list items are merged with the list's introductory statement into a single paragraph, while each longer outer-list item becomes its own paragraph with the introductory statement prepended.)

Machine Learning layer (Sec. IV): In order to enable a multitude of applications to be built around Polisis, the ML layer is responsible for producing rich and fine-grained annotations of the data segments. This layer takes as input the privacy policy segments from the Data Layer (Step 2) and the user query (Step 1) from the Application Layer. The Segment Classifier probabilistically assigns each segment a set of class–value pairs describing its privacy practices. For example, an element in this set can be: information-type=location with probability p = 0.65. Similarly, the Query Analyzer extracts the privacy classes from the user's query. Finally, the class–value pairs of both the segments and the query are passed back to the Class Comparison module of the Application Layer (Steps 3 and 4).
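To make the dataflow concrete, the matching performed by the Class-Comparison module can be sketched as follows. The dictionary representation of class probabilities, the match_segments name, and the ranking by maximum matching-class probability are our illustrative assumptions, not the paper's implementation; the 0.5 threshold is borrowed from Sec. IV.

```python
def match_segments(query_classes, segments, threshold=0.5):
    """Return segments whose predicted privacy classes overlap the query's
    classes, ranked by the strongest matching class probability.

    query_classes: set of class names extracted from the user query.
    segments: dict mapping segment id -> {class name: probability}.
    """
    matches = []
    for seg_id, seg_classes in segments.items():
        # classes the segment is predicted to contain
        present = {c for c, p in seg_classes.items() if p >= threshold}
        overlap = present & query_classes
        if overlap:
            score = max(seg_classes[c] for c in overlap)
            matches.append((seg_id, sorted(overlap), score))
    return sorted(matches, key=lambda m: m[2], reverse=True)
```

A query about data retention would thus surface only the segments whose Segment Classifier output contains a retention-related class above threshold.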

III. DATA LAYER

To pre-process the privacy policy, the Data Layer employs a Segmenter module in three stages: extraction, list handling, and segmentation. Other than the link to the privacy policy, the Data Layer requires no other information or prior knowledge.

Policy Extraction: Given the URL of the privacy policy, the segmenter employs Firefox in headless mode (without UI) to scrape the policy's webpage. It waits for the page to fully load, which happens after all the JavaScript has been downloaded and executed. Then, the segmenter removes all irrelevant HTML elements, including the scripts, header, footer, side/navigation menus, comments, and CSS.

Although several online privacy policies contain dynamically viewable content (e.g., accordion toggles and collapsible/expandable paragraphs), the "dynamic" content is already part of the loaded webpage in almost all cases. For example, when the user expands a collapsible paragraph, a local JavaScript exposes an offline HTML snippet; no further downloading takes place.

We confirmed this through the privacy policies of the top 200 global websites from Alexa.com. For each privacy policy link, we compared the segmenter's scraped content to that extracted from our manual navigation of the same policy (while accounting for all the dynamically viewable elements of the webpage). Using a fuzzy string matching library,1 we found that the segmenter's scraped policy covers, on average, 99.08% of the content of the manually fetched policy. This result indicates that our segmenter almost perfectly extracts the privacy policy content during the extraction step.
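This kind of coverage comparison can be approximated with the standard library's difflib rather than the fuzzywuzzy library the authors used; coverage_ratio is a hypothetical helper, and difflib's similarity score is not identical to fuzzywuzzy's.

```python
import difflib

def coverage_ratio(scraped: str, manual: str) -> float:
    """Rough similarity in [0, 1] between the scraped policy text and the
    manually fetched one, using difflib's SequenceMatcher ratio."""
    return difflib.SequenceMatcher(None, scraped, manual).ratio()
```

A ratio near 1.0 would indicate, as in the paper's experiment, that the scraper captured essentially all the policy content.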

1 https://pypi.python.org/pypi/fuzzywuzzy

List Aggregation: Second, the segmenter handles any ordered or unordered lists inside the policy. Lists require a special treatment since counting an entire lengthy list, possibly covering diverse data practices, as a single segment could result in noisy annotations. On the other hand, treating each list item as an independent segment is problematic, as list elements are typically not self-contained, resulting in missed annotations. See Fig. 2 from Google's privacy policy as an example.2

Our handling of the lists involves two techniques: one for short list items (e.g., the inner list of Fig. 2) and another for longer list items (e.g., the outer list of Fig. 2). For short list items (maximum of 20 words per element), the segmenter combines the elements with the introductory statement of the list into a single paragraph element (with a <p> tag). The rest of the lists, with long items, are transformed into a set of paragraphs. Each paragraph is a distinct list element prepended by the list's introductory statement (Step 3 in Fig. 2).
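A minimal sketch of these two merging rules follows; the merge_list helper and its plain-string output are our simplifications, as the actual segmenter operates on HTML elements rather than raw strings.

```python
def merge_list(intro, items, short_limit=20):
    """Sketch of the two list-handling rules:
    - if every item is short (<= short_limit words), merge all items with
      the introductory statement into one paragraph;
    - otherwise, turn each item into its own paragraph prefixed by the
      introductory statement."""
    if all(len(item.split()) <= short_limit for item in items):
        return [intro + " " + " ".join(items)]      # single <p>-like paragraph
    return [intro + " " + item for item in items]   # intro prepended per item
```

Either way, each output paragraph is self-contained, which is what the downstream classifiers need.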

Policy Segmentation: The segmenter performs an initial coarse segmentation by breaking down the policy according to the HTML <div> and <p> tags. The output of this step is an initial set of policy segments. As some of the resulting segments might still be long, we subdivide them further with another technique. We use GraphSeg [22], a recently proposed unsupervised algorithm that generates semantically coherent segments, without making assumptions about the structure of the input text. It relies on word embeddings to generate segments as cliques of related (semantically similar) sentences. For that purpose, we use custom, domain-specific word embeddings that we pre-trained on our corpus of 130K privacy policies (cf. Sec. IV). Finally, the segmenter outputs a series of fine-grained segments to the Machine Learning layer, where they are automatically analyzed.
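The initial coarse pass can be illustrated with the standard library's HTML parser. This sketch only splits text at <div> and <p> boundaries and omits the GraphSeg refinement entirely; CoarseSegmenter and coarse_segments are our illustrative names.

```python
from html.parser import HTMLParser

class CoarseSegmenter(HTMLParser):
    """Collect text runs delimited by <div> and <p> tags as coarse segments."""
    def __init__(self):
        super().__init__()
        self.segments, self._buf = [], []

    def _flush(self):
        text = " ".join("".join(self._buf).split())
        if text:
            self.segments.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("div", "p"):
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("div", "p"):
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)

def coarse_segments(html):
    parser = CoarseSegmenter()
    parser.feed(html)
    parser._flush()  # capture any trailing text
    return parser.segments
```

Long segments produced here would then be subdivided by the semantic (GraphSeg) pass described above.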

IV. MACHINE LEARNING LAYER

In this section, we describe the components of Polisis's Machine Learning layer. We proceed in two stages: (1) an unsupervised stage, in which we build domain-specific word vectors (i.e., word embeddings) for privacy policies from unlabeled data, and (2) a supervised stage, in which we train a novel hierarchy of privacy text classifiers, based on neural networks, that leverages the word vectors. These classifiers power the Segment Classifier and Query Analyzer modules of Fig. 1. We use word embeddings and neural networks due to their proven advantages in text classification [23], compared to traditional classification techniques.

A. Specialized Word Embeddings for Privacy Policies

Traditional text classifiers use the words and their frequencies as the building block for their features. They, however, have a limited generalization power, especially when the training datasets are limited in size and scope. A traditional classifier has no inherent way to account for word similarity. For example, replacing the word "erase" by the word "delete" can significantly change the classification result if "delete" was not in the classifier's training set.

2 https://www.google.com/policies/privacy, last modified on Aug. 29, 2016, retrieved on Feb. 14, 2017


Word embeddings solve this issue by extracting generic word vectors from a large corpus, in an unsupervised manner, and enabling their use in new classification problems (a technique termed Transfer Learning). The features in the classifiers become the word vectors instead of the words themselves. Hence, two text segments composed of semantically similar words would be represented by two groups of word vectors (i.e., features) that are close in the vector space. This allows the text classifier to account for words outside the training set, as long as they are part of the large corpus used to train the word vectors.

While general-purpose pre-trained embeddings, such as Word2vec [24] and GloVe [25], do exist, domain-specific embeddings are well known to result in superior classification accuracy [26]. Thus, we trained custom word embeddings for the privacy policies' domain. To that end, we created a new corpus of 130K privacy policies collected from apps on the Google Play Store. The vast majority of these policies are not limited to the respective mobile app; they typically describe a company's overall data practices.

We crawled the metadata of more than 1.4 million Android apps available via the PlayDrone project [27]. From this metadata, we found the links to 199,186 privacy policies. We crawled the web pages for these policies, retrieving the 130,326 policies which returned an HTTP status code of 200. Then, we extracted the textual content from their HTML using the policy crawler described in Sec. III. We will refer to this corpus as the Policies Corpus. Using this corpus, we trained a word-embeddings model using fastText [28]. We henceforth call this model the Policies Embeddings. A major advantage of using fastText is that it allows training vectors for subwords (or character n-grams of sizes 3 to 6) in addition to words. Hence, even if we have words outside our corpus, we can assign them vectors by combining the vectors of their constituent subwords. This is very useful in accounting for spelling mistakes that occur in applications that involve free-form user queries.
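The subword mechanism can be illustrated by enumerating the character n-grams that fastText associates with a word. The angle-bracket boundary markers follow fastText's convention; char_ngrams itself is our illustrative helper, not part of fastText's API.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of sizes minn..maxn for a word, with the '<' and '>'
    boundary markers fastText uses. An out-of-vocabulary (or misspelled) word
    can then be assigned a vector by averaging its subword vectors."""
    token = "<" + word + ">"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams
```

For example, a misspelling such as "eraze" still shares the subwords "<er" and "era" with "erase", so its composed vector lands near the correct one.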

B. Classification Dataset

Our Policies Embeddings provide a solid starting point to build powerful classifiers. However, training the classifiers to detect fine-grained labels of privacy policies' segments requires a labeled dataset. For that purpose, we leverage the Online Privacy Policies (OPP-115) dataset, introduced by Wilson et al. [11]. This dataset contains 115 privacy policies manually annotated by skilled annotators (law students). In total, the dataset has 23K annotated data practices. The annotations were at two levels. First, paragraph-sized segments were annotated according to one or more of the 10 high-level categories in Fig. 3 (e.g., First Party Collection, Data Retention). Then annotators selected parts of the segment and annotated them using attribute–value pairs, e.g., Information Type: Location, Purpose: Advertising, etc. In total, there were 20 distinct attributes and 138 distinct values across all attributes. Of these values, 122 had more than 20 labels. In Fig. 3, we only show the mandatory attributes that should be present in all segments. For space limitations, we only show samples of the values for selected attributes in Fig. 3.

C. Hierarchical Multi-label Classification

To account for the multiple granularity levels in the policies' text, we build a hierarchy of classifiers that are individually trained on handling specific parts of the problem.

At the top level, a classifier predicts one or more high-level categories of the input segment x (categories are the shaded boxes of Fig. 3). We train a multi-label classifier that provides us with the probability p(ci|x) of the occurrence of each high-level category ci, taken from the set of all categories C. In addition to allowing multiple categories per segment, using a multi-label classifier makes it possible to determine whether a category is present in a segment by simply comparing its classification probability to a threshold of 0.5.

At the lower level, a set of classifiers predicts one or more values for each privacy attribute (the leaves in the taxonomy of Fig. 3). We train a set of multi-label classifiers at the attribute level. Each classifier produces the probabilities p(vj|x) for the values vj ∈ V(a) of a single attribute a. For example, given the attribute a = Information Type, the corresponding classifier outputs the probabilities for elements in V(a): {financial, location, user profile, health, demographics, cookies, contact information, generic personal information, unspecified, . . .}.

An important consequence of this hierarchy is that interpreting the output of the attribute-level classifiers depends on the categories' probabilities. For example, the values' probabilities of the attribute "retention period" are irrelevant when the dominant high-level category is "policy change". Hence, for a category ci, one would only consider the attributes descending from it in the hierarchy. We denote these attributes as A(ci), and we denote the set of all values across these attributes as V(ci).
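This interpretation rule can be sketched as follows. The category-to-attribute mapping shown is a small illustrative excerpt of the taxonomy (our naming, not the dataset's exact identifiers), and the 0.5 threshold is the one used for the category classifier.

```python
# Illustrative excerpt of A(c): which attributes descend from each category.
CATEGORY_ATTRIBUTES = {
    "data_retention": ["retention_period", "retention_purpose", "information_type"],
    "policy_change": ["change_type", "user_choice", "notification_type"],
}

def relevant_attribute_values(category_probs, attribute_probs, threshold=0.5):
    """Keep attribute-value probabilities only for attributes descending from
    categories whose predicted probability clears the threshold."""
    selected = {c for c, p in category_probs.items() if p >= threshold}
    out = {}
    for c in selected:
        for attr in CATEGORY_ATTRIBUTES.get(c, []):
            if attr in attribute_probs:
                out[attr] = attribute_probs[attr]
    return out
```

So when "data retention" clears the threshold but "policy change" does not, the "retention period" value probabilities are kept and the "change type" ones are discarded.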

We use Convolutional Neural Networks (CNNs) internally within all the classifiers for two main reasons. First, they enable us to integrate pre-trained word embeddings, which provide the classifiers with generalization capabilities. Second, they recognize when a certain set of tokens is a good indicator of the class, in a way that is invariant to their position within the input segment. We use a similar CNN architecture for classifiers on both levels, as shown in Fig. 4. The embeddings layer extracts the embeddings from the terms composing an input segment (pre-processed with traditional tokenization techniques). We chose to freeze the embeddings of the input words during the training process by preventing their weights from being updated. This allows us to preserve the semantic similarity between words in the training set on one hand and words in the Policies Corpus on the other hand.

Next, the word vectors pass through a convolutional layer. This layer applies a nonlinear function (a rectified linear unit (ReLU) in our case) to each window of k words in a phrase. Thus, it transforms each k words into a dc-dimensional vector, which accounts for the co-occurrences of words in this window. The max-pooling layer combines these vectors by taking the maximum value observed in each of the dc channels


Fig. 3: The privacy taxonomy of Wilson et al. [11]. The top level of the hierarchy (shaded blocks) defines the high-level privacy categories. The lower level defines a set of privacy attributes, each assuming a set of values. For space consideration, we show the set of values for a handful of the attributes. (Figure: the ten categories are 1st Party Collection, 3rd Party Collection, Access/Edit/Delete, Data Retention, Data Security, Specific Audiences, Do Not Track, Policy Change, User Choice/Control, and Other, each with attributes such as Information Type, Purpose, Choice Type, and Retention Period; sample values include financial, health, contact, location, demographic (Information Type); opt-in, opt-out, opt-out-link (Choice Type); advertising, marketing, analytics/research, legal requirement (Purpose); and stated period, limited, indefinitely, unspecified, other (Retention Period).)

Fig. 4: Components of the CNN-based classifier we use. (Figure: a segment's words w1, w2, . . . enter an embeddings layer, followed by a convolutional layer with max-pooling and ReLU, a first dense layer with ReLU, a second dense layer, and a sigmoid producing the class probabilities.)

over all the windows (to detect the most important features). This vector passes through the first dense (i.e., fully connected) layer, which is again followed by a max-pooling operation and a ReLU activation function. Finally, the vector arrives at the second dense layer. A sigmoid operation is applied to the output of this layer to obtain the probabilities for the possible output classes. We used multi-label cross-entropy loss as the classifier's objective function.

Models' Training: In total, we trained 20 classifiers at the attribute level (including the optional attributes). We also trained two classifiers at the category level: one for classifying segments and the other for classifying free-form queries. For the former, we include all the classes in Fig. 3. For the latter, we ignore the "Other" category as it is mainly for introductory sentences or uncovered practices, which are not applicable to users' queries. For training the classifiers, we used the data from 65 policies in the OPP-115 dataset, and we kept 50 policies as a testing set. The hyper-parameters for each classifier were obtained by running a randomized grid search. It is worth noting that, unlike in [11], we do not train the category classifiers at the paragraph level, but at the sentence level. For example, if a phrase is labeled at the attribute level as Retention Period: Unlimited, we label its containing segment with the parent category: Data Retention. We empirically noticed that this results in higher overall accuracy.

In Table I, we present the evaluation metrics on the testing set for the category classifier intended for free-form queries. In addition to the precision, recall, and F1 scores (macro-averaged per label3), we also show the top-1 precision metric, representing the fraction of segments where the top predicted

3A successful multilabel classifier should not only predict the presence of a label, but also its absence. Otherwise, a model that predicts that all labels are present would have 100% precision and recall. For that, the precision in the table represents the macro-average of the precision in predicting the presence of each label and predicting its absence (similarly for the recall and F1 metrics).

TABLE I: Classification results for user queries at the category level. The hyperparameters used were as follows: (Embeddings size: 300, Number of filters: 200, Filter size: 3, Dense layer size: 100, Batch size: 40).

Category              Prec.  Recall  F1    Top-1 Prec.  Support
1st Party Collection  0.80   0.80    0.80  0.80         1267
3rd Party Sharing     0.81   0.81    0.81  0.86         963
User Choice/Control   0.76   0.73    0.75  0.81         455
Data Security         0.87   0.86    0.87  0.77         202
Specific Audiences    0.95   0.94    0.95  0.91         156
Access, Edit, Delete  0.94   0.75    0.82  0.97         134
Policy Change         0.96   0.89    0.92  0.93         120
Data Retention        0.79   0.67    0.71  0.60         93
Do Not Track          0.97   0.97    0.97  0.94         16

Average               0.87   0.83    0.84  0.84

category label occurs in the annotators' ground-truth labels. As evident in the table, our classifiers can predict the top-level privacy category with high accuracy. Although we consider the problem in the multilabel setting, these metrics are significantly higher than those of the models presented in the original OPP-115 paper [11]. These results are also consistent with our attribute-level classifiers, for which the results are omitted due to space constraints. The subsequently described applications further highlight their accuracy through queries that directly leverage their output.

V. APPLICATION LAYER

Leveraging the power of the ML Layer's classifiers, Polisis supports two types of queries: structured and free-form. A structured query is a combination of first-order logic predicates over the predicted privacy classes and the policy segments, such as: ∃s (s ∈ policy ∧ information type(s) = location ∧ purpose(s) = marketing ∧ user choice(s) = opt-out). On the other hand, a free-form query is simply a natural language question posed directly by the users, such as "do you share my location with third parties?". The response to the query is a set of segments satisfying the predicates in the case of a structured query or matching the user's question in the case of a free-form query.
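In essence, a structured query is a conjunction of attribute predicates evaluated over labeled segments. A minimal sketch, in which the segment dictionaries and attribute names are illustrative rather than Polisis's actual data model:

```python
# Hypothetical segment annotations, shaped like the labels the ML Layer
# might emit; the predicate mirrors the example structured query above.
segments = [
    {"id": 0, "category": "1st-party-collection",
     "information_type": "location", "purpose": "marketing",
     "user_choice": "opt-out"},
    {"id": 1, "category": "data-security",
     "security_measure": "encryption"},
]

def structured_query(policy, **predicates):
    """Return the segments satisfying every attribute=value predicate."""
    return [s for s in policy
            if all(s.get(attr) == val for attr, val in predicates.items())]

hits = structured_query(segments, information_type="location",
                        purpose="marketing", user_choice="opt-out")
```

A non-empty result list corresponds to the existential quantifier ∃s being satisfied.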

The Application Layer builds on these query types to enable an array of applications for the different privacy stakeholders (users, researchers, regulators, etc.). In the coming sections, we



will delve deep into two applications of this framework. Then, we will expand on the framework's potential applications in Sec. X.

VI. PRIVACY ICONS

Our first application shows the efficacy of Polisis in resolving structured queries to privacy policies. As a case study, we investigate the Disconnect privacy icons [19], described in the first three columns of Table II. These icons evolved from a Mozilla-led working group that included the Electronic Frontier Foundation, Center for Democracy and Technology, and the W3C. The database powering these icons originated from TRUSTe (rebranded later as TrustArc), a privacy compliance company, which carried out the task of manually analyzing and labeling privacy policies.

In what follows, we first establish the accuracy of Polisis's automatic assignment of privacy icons, using the Disconnect icons as a proof of concept. We perform a direct comparison between assigning these icons via Polisis and assigning them based on annotations by law students from [11]. Second, we leverage Polisis to investigate the level of permissiveness of the icons that Disconnect assigns based on the TRUSTe dataset. Our findings are consistent with the series of concerns raised around compliance-checking companies over the years [20], [29], [30]. This illustrates the power of Polisis in scalable, automated auditing of privacy compliance checks.

A. Predicting Privacy Icons

Given that the rules behind the Disconnect icons are not precisely defined, we translated their description into explicit first-order logic queries to enable automatic processing. Table II shows the original description and color assignment provided by Disconnect. We also show our interpretation of each icon in terms of labels present in the OPP-115 dataset and the automated assignment of colors based on these labels. Our goal is not to reverse-engineer the logic behind the creation of these icons but to show that we can automatically assign these icons with high accuracy, given a plausible interpretation. Hence, this represents our best effort to reproduce the icons, but these rules could easily be adapted later if needed.
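As an illustration, the Expected Use rule of Table II (Columns 4 and 5) can be sketched as follows. The segment representation here is hypothetical; the label names follow the OPP-115 vocabulary:

```python
# Choice types that count as giving the user an opt-in/opt-out option.
OPT_CHOICES = {"opt-in", "opt-out-link", "opt-out-via-contacting-company"}

def expected_use_icon(segments):
    """Assign the Expected Use icon color per our Table II interpretation.
    Each segment is a dict; `extra_categories` and the key names are an
    assumed encoding of multi-category labels, not Polisis's actual one."""
    # S: first-party collection segments whose purpose is advertising.
    S = [s for s in segments
         if s.get("category") == "first-party-collection-use"
         and s.get("purpose") == "advertising"]
    if not S:                                        # S = ∅ -> Green
        return "green"
    if all("user-choice-control" in s.get("extra_categories", ())
           and s.get("choice_type") in OPT_CHOICES for s in S):
        return "yellow"                              # all of S offer choice
    return "red"                                     # otherwise

policy = [{"category": "first-party-collection-use",
           "purpose": "advertising",
           "extra_categories": ("user-choice-control",),
           "choice_type": "opt-out-link"}]
icon = expected_use_icon(policy)
```

The other icons follow the same pattern with their respective label sets from Table II.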

To evaluate the efficacy of automatically selecting appropriate privacy icons, we compare the icons produced with Polisis's automatic labels to the icons produced with the law students' annotations from the OPP-115 dataset. We perform the evaluation over the set of 50 privacy policies which we did not use to train Polisis (i.e., kept aside as a testing set). Each segment in the OPP-115 dataset has been labeled by three experts. Hence, we take the union of the experts' labels on one hand and the predicted labels from Polisis on the other hand. Then, we run the logic presented in Table II (Columns 4 and 5) to assign a set of icons to each policy based on each set of labels.

Table III shows the accuracy obtained per icon, measured as the fraction of policies where the icon based on automatic labels matched the icon based on the experts' labels. The average accuracy across icons is 88.4%, showing the efficacy of

our approach in matching the experts' aggregated annotations. To put this result in context, it is similar to the agreement level between trained human judges assessing privacy policies (cf. Miyazaki and Krishnamurthy [20]). We also show Cohen's κ, an agreement measure that accounts for agreement due to random chance. In our case, the values indicate substantial to almost perfect agreement [31]. Finally, we show the distribution of icons based on the experts' labels alongside the Hellinger distance, which measures the difference between that distribution and the one produced using the automatic labels. This distance assumes small values, illustrating that the distributions are very close. Overall, these results support the potential of automatically assigning privacy icons with Polisis.
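The Hellinger distance between two discrete distributions can be computed as below; the two icon-color distributions are illustrative stand-ins, not the paper's numbers:

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions.
    Ranges from 0 (identical) to 1 (disjoint support)."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q))) / math.sqrt(2)

# Illustrative icon-color distributions: fractions of red/green/yellow
# icons under the expert labels vs. the automatic labels.
expert  = [0.82, 0.16, 0.02]
polisis = [0.78, 0.18, 0.04]
d = hellinger(expert, polisis)
```

Small values of d, as in Table III, indicate that the two icon distributions are close.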

B. Automated Assessment of the Industry Compliance Metrics

Given that we achieve a high accuracy in assigning privacy icons, it is intuitive to investigate how they compare to the icons assigned by Disconnect and TRUSTe. An important consideration in this regard is that several concerns have been raised earlier around the level of leniency of TRUSTe and other compliance companies [30], [29], [21]. In 2000, the FTC conducted a study on privacy seals, including those of TRUSTe, and found that, of the 27 sites with a privacy seal, approximately one-half implement, at least in part, all four of the fair information practice principles and that only 63% implement Notice and Choice. Hence, we pose the following question: "Can we automatically provide evidence of the level of leniency of the Disconnect icons using Polisis?". To answer this question, we designed an experiment to compare the icons extracted by Polisis's automatic labels to the icons assigned by Disconnect on real policies.

One obstacle we faced is that the Disconnect icons were announced in June 2014 [32]; many privacy policies have likely been updated since then. To ensure that the privacy policies we consider are within a similar time frame to those used by Disconnect, we make use of Ramanath et al.'s ACL/COLING 2014 dataset [33]. This dataset contains the body of 1,010 privacy policies extracted between December 2013 and January 2014. We obtained the icons for the same set of sites using the Disconnect privacy icons extension [19]. Of these, 354 policies had been (at least partially) annotated in the Disconnect dataset. We automatically assign the icons for these sites by passing their policy contents into Polisis and applying the rules in Table II to the generated automatic labels. We report the results for the Expected Use and Expected Collection icons as they are directly interpretable by Polisis. We do not report the rest of the icons because the location information label in the OPP-115 taxonomy included non-precise location (e.g., zip codes), and there was no label that distinguishes the exact retention period. Moreover, the Children Privacy icon is assigned through a certification process that does not solely rely on the privacy policy.

Fig. 5 shows the distribution of automatically extracted icons vs. the distribution of icons from Disconnect, when they were available. The discrepancy between the two distributions is obvious: the vast majority of the Disconnect icons have



TABLE II: The list of Disconnect icons with their description, our interpretation, and Polisis's queries.

Expected Use
- Disconnect description: Discloses whether data it collects about you is used in ways other than you would reasonably expect given the site's service.
- Disconnect color assignment: Red: Yes, w/o choice to opt out, or undisclosed. Yellow: Yes, with choice to opt out. Green: No.
- Interpretation as labels: Let S be the segments with category: first-party-collection-use and purpose: advertising.
- Automated color assignment: Yellow: all segments in S have category: user-choice-control and choice-type ∈ [opt-in, opt-out-link, opt-out-via-contacting-company]. Green: S = ∅. Red: otherwise.

Expected Collection
- Disconnect description: Discloses whether it allows other companies like ad providers and analytics firms to track users on the site.
- Disconnect color assignment: Red: Yes, w/o choice to opt out, or undisclosed. Yellow: Yes, with choice to opt out. Green: No.
- Interpretation as labels: Let S be the segments with category: third-party-sharing-collection, purpose ∈ [advertising, analytics-research], and action-third-party ∈ [track-on-first-party-website-app, collect-on-first-party-website-app].

Precise Location
- Disconnect description: Discloses whether the site or service tracks a user's actual geolocation.
- Disconnect color assignment: Red: Yes, possibly w/o choice. Yellow: Yes, with choice. Green: No.
- Interpretation as labels: Let S be the segments with personal-information-type: location.

Data Retention
- Disconnect description: Discloses how long they retain your personal data.
- Disconnect color assignment: Red: No data retention policy. Yellow: 12+ months.
- Interpretation as labels: Let S be the segments with category: data-retention.
- Automated color assignment: Green: all segments in S have retention-period ∈ [stated-period, limited]. Red: S = ∅. Yellow: otherwise.

Children Privacy
- Disconnect description: Has this website received TrustArc's Children's Privacy Certification?
- Disconnect color assignment: Green: Yes. Gray: No.
- Interpretation as labels: Let S be the segments with category: international-and-specific-audiences and audience-type: children.
- Automated color assignment: Green: length(S) > 0. Red: otherwise.

[Figure content: Figs. 5–7 each show, for (a) Expected Use and (b) Expected Collection, the distribution (in percent) of icons assigned based on the TRUSTe dataset vs. those assigned by Polisis.]

Fig. 5: Conservative interpretation of the icons.

Fig. 6: Permissive interpretation of the icons.

Fig. 7: Very permissive interpretation of the icons.

TABLE III: Prediction accuracy and Cohen's κ for icon prediction, along with the distribution of icons of each color based on OPP-115 labels.

Icon              Accuracy  Cohen's κ  Hellinger distance  N(R)  N(G)  N(Y)
Exp. Use          92%       0.76       0.12                41    8     1
Exp. Collection   88%       0.69       0.19                35    12    3
Precise Location  84%       0.68       0.21                32    14    4
Data Retention    80%       0.63       0.13                29    16    5
Children Privacy  98%       0.95       0.02                12    38    NA

a yellow label, indicating that the policies offer the user an opt-out choice (from unexpected use or collection). The Hellinger distances between those distributions are 0.71 and 0.61 for Expected Use and Collection, respectively (i.e., 3–5x the distances in Table III).

This discrepancy might stem from our icon-assignment strategy in Table II, where we assign a yellow label only when "all segments in S (the concerned subset)" include the opt-in/opt-out choice, which could be considered conservative. In Fig. 6, we show the icon distributions when relaxing the yellow-icon condition to become: "at least one segment in S" includes the opt-in/opt-out choice. Although the number of yellow icons increases slightly, the icons with the new permissive strategy are significantly red-dominated. The Hellinger distances between those distributions drop to 0.47 and 0.50 for Expected Use and Collection, respectively. This result indicates that the majority of the policies do not provide the users a choice within the same segments describing data usage for advertising or data collection by third parties.

We go one step further and follow an even more permissive (and potentially unrealistic) strategy where we assign the yellow label to any policy with S ≠ ∅, given that there is at least one segment in the whole policy (i.e., even outside S) with an opt-in/opt-out choice. For example, a policy where third-party advertising is mentioned in the middle of the policy while the opt-out choice about another action is mentioned at the end of the policy would still receive a yellow label. The icon distributions in this case are illustrated in Fig. 7, with Hellinger distances of 0.22 for Expected Use and 0.19 for Expected Collection. Only in this highly unrealistic interpretation of the icons would the distributions of Disconnect and Polisis come into reasonable proximity. This finding suggests that the icons assigned by Disconnect based on TRUSTe's database might be highly permissive.



C. Discussion

There was no loss of generality in considering only two of the icons; they provided the needed evidence of Disconnect following a permissive strategy when assigning icons to policies. A developer could still utilize Polisis to extract the rest of the icons, either by augmenting the existing taxonomy or by performing additional natural language processing on the segments Polisis returns.

Furthermore, by automatically generating icons, we do not intend to push the human completely out of the loop, especially in situations where legal liability issues might arise. Polisis can assist human annotators by providing initial answers to their queries and the supporting evidence. In other words, it accurately flags the segments of interest to an annotator's query so that the annotator can make the final decision.

VII. FREE-FORM QUESTION-ANSWERING

Our second application of Polisis is PriBot, a system that enables free-form queries (in the form of user questions) on privacy policies. PriBot is primarily motivated by the rise of conversation-first devices, such as voice-activated digital assistants (e.g., Amazon Alexa and Google Assistant) and smartwatches. For those devices, the existing techniques of linking to a privacy policy or reading it aloud are not usable; they might require the user to access privacy-related information and controls on a different device, which is not desirable in the long run [7].

The inadequacy of current methods for privacy notice delivery is emerging in another domain: customer support. As a new trend in the industry [34], automated customer support allows companies to respond to customer inquiries in real time and around the clock. Through chatbots and other automated interfaces (e.g., Twitter bots), companies interact with their customers on a variety of topics, including privacy.

To support these new forms of services, devices, and interfaces, we present PriBot as an intuitive and user-friendly method to communicate privacy information. PriBot answers free-form user questions from a previously unseen privacy policy, in real time and with high accuracy. In what follows, we formalize the problem of free-form QA and then describe how we leverage Polisis to build PriBot.

A. Problem Formulation

The input to PriBot consists of a user question q about a privacy policy. PriBot passes q to the ML Layer and the policy's link to the Data Layer. The ML Layer probabilistically annotates q and each of the policy's segments with the privacy categories and attribute-value pairs of Fig. 3.

The segments in the privacy policy constitute the pool of candidate answers {a1, a2, ..., aM}. A subset G of the answer pool is the ground truth. We consider an answer ak as correct if ak ∈ G and as incorrect if ak ∉ G. If G is empty, then no answers exist in the privacy policy. We formulate the problem as a one-shot question-answering problem, in which PriBot returns the best answer without requesting further question refinements from the user. This formulation does not preclude conducting a dialog with our system, provided that users pose standalone questions.

B. PriBot Ranking Algorithm

Ranking Score: In order to answer the question, PriBot ranks each potential answer4 a by computing a proximity score s(q, a) between a and the question q. This is within the Class Comparison module of the Application Layer, which works in two stages. First, given the output of the Segment Classifier, an answer is represented as a vector:

α = {p(ci|a)² × p(vj|a) | ∀ci ∈ C, vj ∈ V(ci)}

for categories ci ∈ C and values vj ∈ V(ci) descending from ci. Similarly, given the output of the Query Analyzer, the question is represented as:

β = {p(ci|q)² × p(vj|q) | ∀ci ∈ C, vj ∈ V(ci)}

The category probability in both α and β is squared to put more weight on the categories at comparison time. Next, we compute a certainty measure of the answer's high-level categorization based on the entropy of the normalized probability distribution (pn) of the predicted categories:

cer(a) = 1 − (−∑i pn(ci|a) × ln(pn(ci|a))) / ln(|C|)    (1)
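A minimal sketch of this certainty measure, assuming the input is the list of per-category probabilities (normalized inside the function):

```python
import math

def cer(category_probs):
    """Certainty per Eq. (1): one minus the normalized entropy of the
    normalized category distribution. 0 = fully ambiguous, 1 = certain."""
    total = sum(category_probs)
    pn = [p / total for p in category_probs]
    entropy = -sum(p * math.log(p) for p in pn if p > 0)
    return 1.0 - entropy / math.log(len(category_probs))

c_peaked = cer([0.90, 0.05, 0.05])      # peaked: higher certainty
c_uniform = cer([1 / 3, 1 / 3, 1 / 3])  # uniform: zero certainty
```

A segment confidently placed in one category thus contributes more to the ranking than one spread across many categories.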

Akin to a dot product between two vectors, we compute the score s(q, a) as:

s(q, a) = (∑i βi × min(βi, αi)) / (∑i βi²) × cer(a)    (2)

As answers are typically longer than the question and involve a higher number of significant features, this score prioritizes the answers containing significant features that are also significant in the question. The min function and the denominator are used to normalize the score within the range [0, 1]. From an implementation perspective, we found it effective to exclude features below a certain threshold (0.2) from the score computation. This exclusion minimizes the cases where small differences between insignificant category-value combinations could lead to unwanted changes in the answer ranking.
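Under these definitions, the score of Eq. (2) with the 0.2 threshold can be sketched as follows; exactly how the threshold is applied to α and β is our assumption, and the feature vectors are illustrative:

```python
THRESH = 0.2  # features below this value are excluded (assumed handling)

def ranking_score(beta, alpha, cer_a):
    """Sketch of Eq. (2): beta/alpha are the question's and answer's
    category-value feature vectors; cer_a is the answer's certainty."""
    num = sum(b * min(b, a)
              for b, a in zip(beta, alpha) if b >= THRESH and a >= THRESH)
    den = sum(b * b for b in beta if b >= THRESH)
    return (num / den) * cer_a if den else 0.0

beta = [0.8, 0.3, 0.05]    # question features (illustrative)
alpha = [0.7, 0.4, 0.6]    # answer features (illustrative)
s = ranking_score(beta, alpha, cer_a=0.9)
```

When the answer matches the question's significant features exactly and cer(a) = 1, the score reaches its maximum of 1.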

To illustrate the strength of PriBot and its answer-ranking approach, we consider the following question (posed by a Twitter user): "Under what circumstances will you release [sic] to 3rd parties?"

We then consider two examples of segments ranked by PriBot. The first has a ranking score of 0.63: "Personal information will not be used or disclosed for purposes other than those for which it was collected, except with the consent of the individual or as required by law..." The second has a ranking score of 0: "All personal information collected by the TTC will be protected by using appropriate safeguards against loss, theft and unauthorized access, disclosure, copying, use or modification."

4For notational simplicity, we henceforth use a to indicate an answer instead of ak.



Although both example segments share terms such as "personal" and "information", PriBot ranks them differently. It accounts for the fact that the question and the first example share the same high-level category, 3rd Party Collection, while the second example is categorized under Data Security.

Confidence Indicator: The ranking score is an internal metric that specifies how close each segment is to the question, but it does not convey PriBot's certainty in reporting a correct answer to a user. Intuitively, the confidence in an answer should be low when (1) the answer is semantically far from the question (i.e., s(q, a) is low), (2) the question is interpreted ambiguously by Polisis (i.e., classified into multiple categories, resulting in a high classification entropy), or (3) the question contains unknown words (e.g., in a non-English language or with too many spelling mistakes). Taking these criteria into consideration, we compute a confidence indicator as follows:

conf(q, a) = s(q, a) × (cer(q) + frac(q)) / 2    (3)

where the categorization certainty measure cer(q) is computed similarly to cer(a) in Eq. (1), and s(q, a) is computed according to Eq. (2). The fraction of known words frac(q) is based on the presence of the question's words in the vocabulary of our Policies Embeddings corpus.
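A sketch of Eq. (3), with a toy stand-in vocabulary for the Policies Embeddings corpus and an assumed whitespace tokenization:

```python
# Hypothetical stand-in for the Policies Embeddings corpus vocabulary.
VOCAB = {"do", "you", "share", "my", "location", "with", "third", "parties"}

def frac_known(question):
    """Fraction of the question's words found in the corpus vocabulary."""
    words = question.lower().rstrip("?").split()
    return sum(w in VOCAB for w in words) / len(words)

def confidence(s_qa, cer_q, question):
    """Confidence indicator of Eq. (3)."""
    return s_qa * (cer_q + frac_known(question)) / 2

c = confidence(0.8, 0.9, "do you share my location with third parties?")
```

Questions full of unknown words (misspellings, other languages) drag frac(q) toward 0 and thus lower the reported confidence, as intended.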

Potential Answer Conflicts: Another challenge is displaying potentially conflicting answers to the users. One answer could describe a general sharing clause while another specifies an exception (e.g., one answer specifies "share" and another specifies "do not share"). To mitigate this issue, we used the same CNN classifier of Sec. IV and exploited the fact that the OPP-115 dataset had optional labels of the form "does" vs. "does not" to indicate the presence or absence of sharing/collection. Our classifier had a cross-validation F1 score of 95%. Hence, we can use this classifier to detect potential discrepancies between the top-ranked answers. The UI of PriBot can thus highlight the potentially conflicting answers to the user (cf. Fig. 20 in the Appendix).

Abstractive Answers: Finally, we briefly propose an approach to reduce the complexity of the answers returned by PriBot, turning them into shorter, more user-friendly answers. Our approach consists of generating abstractive versions of these answers based on Polisis, as opposed to the answers extracted directly from the policy itself. Fig. 8 illustrates our proposed approach, where we show an existing answer alongside the probabilistically assigned categories and attribute labels from Polisis. In the first step, we remove the low-quality labels, i.e., the ones with probability lower than a threshold. Then, we group the labels under the high-level category they belong to, as shown in the figure. Next, for each high-level category, we generate a summary based on all the labels (attribute-value pairs) descending from it. This grouping serves to preserve the coherence of the generated content.

The summary consists of one or more sentences and is generated based on our custom grammar. Our grammar consists of an optional introductory sentence about the high-level category, followed by a statement about each label. Statements describing the same attributes (e.g., information type) follow a similar structure; they start with an optional introductory phrase, followed by a specific phrase about the value (e.g., user profile information). Example summaries are shown in Fig. 8. We rank these answers according to the high-level category present in each of them. The main benefit of this approach is responding back to the user in a simplified language while still reflecting the essence of the extracted answer. These properties are highly desirable in conversational settings, such as voice-activated devices. In this report, we focus on the evaluation of the directly extracted answers. We leave the full development and evaluation of this abstractive approach for future work.

VIII. PriBot EVALUATION

In this report, we assess the performance of PriBot by evaluating the predictive accuracy (Sec. VIII-C) of its QA ranking model.

A. Twitter Dataset

In order to evaluate PriBot with realistic privacy questions, we created a new privacy QA dataset. It is worth noting that we utilize this dataset for the sole purpose of testing PriBot, not for training it. Our requirements for this dataset were that it: (1) must include free-form questions about the privacy policies of different companies and (2) must have a ground-truth answer for each question from the associated policy.

To this end, we collected, from Twitter, privacy-related questions that users had tweeted at companies. This approach avoids subject bias, which is likely to arise when eliciting privacy-related questions from individuals, who would not be posing them out of genuine need. In our collection methodology, we aimed for a QA test set of between 100 and 200 QA pairs, as is the convention in similar human-annotated QA evaluation domains, such as the Text REtrieval Conference (TREC) and SemEval-2015 [35], [36], [37], [38].

To avoid searching for questions via biased keywords, we started by searching for reply tweets that direct users to a company's privacy policy (e.g., using queries such as "filter:replies our privacy policy" and "filter:replies our privacy statement"). We then backtracked these reply tweets to the (parent) question tweets asked by customers, obtaining a set of 4,743 pairs of tweets, containing privacy questions but also substantial noise due to the backtracking approach. Following the best practices of noise reduction in computational social science [39], we automatically filtered the tweets to keep those containing question marks, at least four words (excluding links, hashtags, mentions, numbers, and stop words), and a link to the privacy policy, leaving 260 pairs of question-reply tweets. This is an example of a tweet pair which was removed by the automatic filtering:

Question: "@Nixxit your site is very suspicious."
Answer: "@elitelinux Updated it with our privacy policy. Apologies, but we're not fully up yet and running shoe string."
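The filtering step described above can be sketched as follows; the stop-word list and the policy-link check are simplified stand-ins for what was actually used:

```python
import re

# A tiny illustrative stop-word list; the real filter used a standard one.
STOP = {"is", "the", "a", "an", "our", "your", "to", "of", "and", "with"}

def keep_pair(question, reply):
    """Keep a question-reply pair if the question has a question mark and
    at least four content words, and the reply links to the privacy
    policy (crudely approximated here by 'privacy' plus a link)."""
    if "?" not in question:
        return False
    # Drop links, hashtags, mentions, numbers, and stop words.
    tokens = [t for t in re.split(r"\s+", question.lower())
              if t and not t.startswith(("http", "#", "@"))
              and not t.isdigit() and t.strip("?.,!'\"") not in STOP]
    has_policy_link = "privacy" in reply.lower() and "http" in reply.lower()
    return len(tokens) >= 4 and has_policy_link

kept = keep_pair("Do you share my location data with advertisers?",
                 "Please see our privacy policy: http://example.com/privacy")
dropped = keep_pair("@Nixxit your site is very suspicious.",
                    "Updated it with our privacy policy.")
```

The @Nixxit example above fails the filter at the first check: it contains no question mark.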

Next, two of the authors independently validated each ofthe tweets to remove question tweets (a) that were not related



[Figure content: an example policy segment ("Float may also transfer personal data to Float-affiliated companies in other countries. These may be outside the European Economic Area and may not have adequate laws that protect the rights and freedoms of data subjects in relation to the processing of personal data. Where this is done, Float shall take necessary steps to adequately protect the information transferred. In addition, we will transfer/share your personal data as required by law, such as to comply with a subpoena, or similar legal process.") with probabilistically assigned labels: 3rd Party Collection (Action: Shared with, 0.24; Purpose: Legal Requirement, 0.23; Info Type: User profile, 0.12), Specific Audiences (Audience Type: Citizens from other countries, 0.14), Data Security (Security measure: Privacy/Security program, 0.05), and three generated summaries:

Summary 1: "We share data with third parties. Our data sharing might involve personal information (like your user profile info). We need this to respond to requests from legal authorities (i.e. the government)."
Summary 2: "We have a special section in the policy on how we handle the data of citizens of other countries."
Summary 3: "We take security measures to guard your data. We particularly adhere to a special privacy and security framework."]

Fig. 8: The flow diagram of the proposed abstractive approach of PriBot.

to privacy policies, (b) to which the replies are not from the official company account, and (c) with inaccessible privacy policy links in their replies. The level of agreement (Cohen's kappa) between the two annotators for the labels valid vs. invalid was almost perfect (κ = 0.84) [31]. The two annotators agreed on 231 of the 260 question tweets, tagging 182 as valid and 49 as invalid. This is an example of a tweet pair which was annotated as invalid:

Question: "What is your worth then? You can't do it? Nuts."
Answer: "@skychief26 3/3 You can view our privacy policy at http://t.co/ksmaIK1WaY. Thanks."

As we will evaluate the answers to these questions with human experts, our estimates of an adequately sized study led us to randomly sample 120 tweets out of the tweets which both annotators labeled as valid questions. We provide these tweets in the Appendix, and we henceforth refer to them as the Twitter QA Dataset.

B. QA Baselines

We compare PriBot's QA model against three baseline approaches that we developed: (1) Retrieval, which reflects the state of the art in term-matching retrieval algorithms; (2) SemVec, representing a single neural-network classifier; and (3) Random, a control approach in which questions are answered with randomly chosen segments from the policy.

Our first baseline, Retrieval, builds on the BM25 algorithm [40], which is the state of the art in ranking models employing term matching. It has been used successfully across a range of search tasks, such as the TREC evaluations [41]. We improve on the basic BM25 model by computing the inverse document frequency on the Policies Corpus of Sec. IV-B instead of on a single policy. Retrieval ranks the segments in the policy according to their similarity score with the user's question. This score depends on the presence of distinctive words that link a user's question to an answer.
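A compact sketch of BM25 scoring with corpus-level IDF values; the tokenized segments, IDF numbers, and parameter defaults k1 = 1.5 and b = 0.75 are illustrative:

```python
import math
from collections import Counter

def bm25_scores(query, segments, idf, k1=1.5, b=0.75):
    """Score each tokenized policy segment against a query with BM25.
    `idf` maps terms to inverse document frequencies computed on a
    large policy corpus rather than on the single policy."""
    avg_len = sum(len(s) for s in segments) / len(segments)
    scores = []
    for seg in segments:
        tf = Counter(seg)
        score = sum(idf.get(t, 0.0) * tf[t] * (k1 + 1)
                    / (tf[t] + k1 * (1 - b + b * len(seg) / avg_len))
                    for t in query)
        scores.append(score)
    return scores

# Illustrative tokenized segments and corpus-level IDF values.
segments = [["we", "share", "location", "with", "partners"],
            ["we", "use", "cookies", "for", "analytics"]]
idf = {"share": 2.1, "location": 2.5, "cookies": 1.8}
scores = bm25_scores(["share", "location"], segments, idf)
```

A segment sharing no distinctive terms with the question scores zero, which is exactly the failure mode that motivates PriBot's class-based ranking.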

Our second baseline, SemVec, employs a single classifier trained to distinguish among all the (mandatory) attribute values (with > 20 annotations) from the OPP-115 dataset (81 classes in total). An example segment is "geographic location information or other location-based information about you and your device", which was labeled as {Information Type: Location}. We obtain a micro-average precision of 0.56 (i.e., the classifier on average predicts the right label across the 81 classes in 56% of the cases, compared to 3.6% precision for a random classifier). After training this model, we extract a "semantic vector", a representation vector that accounts for the distribution of attribute values in the input text. We extract this vector as the output of the ReLU activation layer preceding the second dense layer (as shown in Fig. 4). SemVec ranks the similarity between a question and a policy segment using the Euclidean distance between their semantic vectors. This approach is similar to what has been applied previously in image retrieval, where image representations learned from a large-scale image classification task were effective in visual search applications [42].

C. Predictive Accuracy Evaluation

Here, we evaluate the predictive accuracy of PriBot's QA model by comparing its predicted answers against expert-generated ground-truth answers for the questions of the Twitter QA Dataset.

1) Ground-truth Generation: Two of the authors generated the ground-truth answers to the questions from the Twitter QA Dataset. They were given a user's question (tweet) and the segments of the corresponding policy. Each policy consists of 45 segments on average (min=12, max=344, std=37). Each annotator independently selected the subset of these segments that they consider as best responding to the user's question. This annotation took place prior to generating the answers using our models to avoid any bias. While deciding on the answers, the annotators accounted for the fact that multiple segments of the policy might answer a question.

After finishing the individual annotations, the two annotators consolidated the differences in their labels to reach an agreed-on set of segments, each assumed to be answering the question. We call this the ground-truth set for each question. The annotators agreed on at least one answer in 88% of the questions for which they found matching segments, thus signifying a substantial overlap. Cohen's κ, measuring the agreement on one or more answers, was 0.65, indicating substantial agreement [31].
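For binary "answers the question / does not" labels, Cohen's κ can be computed as in this small sketch (two annotators labeling the same set of segments; names and the toy data are ours):

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over binary labels
    (1 = 'segment answers the question', 0 = otherwise)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's marginals.
    p_a1 = sum(labels_a) / n
    p_b1 = sum(labels_b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_o - p_e) / (1 - p_e)
```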

We generated, for each question, the predicted ranked list of answers according to each QA model (PriBot and the three baselines). In what follows, we evaluate the predictive accuracy of these models.

2) Top-k Score: We first report the top-k score, a widely used and easily interpretable metric, which denotes the portion of questions having at least one correct answer in the top k returned answers. It is desirable to achieve a high top-k score for low values of k, so that the user has to process less information before reaching a correct answer.

Fig. 9: Predictive evaluation metrics as a function of k: (a) top-k score and (b) NDCG, for Random, Retrieval, SemVec, and PriBot.

We show in Fig. 9a how the top-k score varies as a function of k. PriBot's model outperforms the other three models by a large margin, especially at low values of k. For example, at k = 1, PriBot has a top-k score of 0.68, which is significantly larger than the scores of 0.39 (Retrieval), 0.27 (SemVec), and 0.08 (Random) (p-value < 0.05 according to pairwise Fisher's exact tests, corrected with the Bonferroni method for multiple comparisons). PriBot further reaches top-k scores of 0.75, 0.83, and 0.87 for k = 2, 3, and 4, respectively. To put these numbers in the wider context of free-form QA systems, we note that the top-1 accuracy reported by IBM Watson's team on a large insurance-domain dataset (a training set of 12,889 questions and 21,325 answers) was 0.65 in 2015 [43] and was later improved to 0.69 in 2016 [44]. Given that PriBot had to overcome the absence of publicly available QA datasets, our top-1 accuracy of 0.68 is on par with such systems. We also observe that the Retrieval model outperforms the SemVec model. This result is not entirely surprising, since we seeded Retrieval with a large corpus of 130K unsupervised policies, thus improving its performance on answers with matching terms.
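The top-k score described above can be computed as in this sketch (function and variable names are ours):

```python
def top_k_score(ranked_answers, ground_truth, k):
    """Fraction of questions with at least one correct answer
    among the top k ranked answers.

    ranked_answers: per-question list of ranked answer ids.
    ground_truth:   per-question set of correct answer ids.
    """
    hits = sum(
        1 for ranked, truth in zip(ranked_answers, ground_truth)
        if any(a in truth for a in ranked[:k])
    )
    return hits / len(ranked_answers)
```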

3) Policy Length: We then assess the impact of the policy length on PriBot's accuracy. First, we report the Normalized Discounted Cumulative Gain (NDCG) [45]. Intuitively, it captures the notion that a relevant answer's usefulness decreases logarithmically with its rank, reflecting how presenting users with more choices affects their experience as they need to process more text. It is also not biased by the length of the policy. The DCG part of the metric is computed as DCG_k = Σ_{i=1}^{k} rel_i / log2(i+1), where rel_i is 1 if answer a_i is correct and 0 otherwise. NDCG at k is obtained by normalizing DCG_k by the maximum possible DCG_k. We show in Fig. 9b the average NDCG across questions for each value of k. PriBot's model consistently exhibits superior NDCG, indicating that PriBot is poised to perform better in a system where low values of k matter most.
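A binary-relevance NDCG@k can be sketched as follows; here we normalize by the ideal DCG over the top k ranks, one common formulation of the normalization step:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k where relevances[i] is 1 if the answer at rank i+1
    is correct and 0 otherwise (binary relevance)."""
    def dcg(rels):
        # rank i (0-based) contributes rels[i] / log2(i + 2)
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = sorted(relevances, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(relevances[:k]) / denom if denom > 0 else 0.0
```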

Second, to further focus on the effect of policy length, we categorize the policy lengths (#segments) into short, medium, and long, based on the 33rd and 66th percentiles (i.e., corresponding to 28 and 46 segments, respectively). We then compute a metric independent of k, namely the Mean Average Precision (MAP), which is the mean of the area under the precision-recall curve across all questions. Informally, MAP indicates whether all the correct answers are highly ranked. We see from Fig. 10 that, for short policies, the Retrieval model is within 15% of the MAP of PriBot's model, which makes sense given the smaller number of potential answers. With medium-sized policies, PriBot's model is better by a large margin. This margin remains considerable with long policies.
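Average precision per question, and its mean across questions, can be sketched as follows (a standard interpolation-free formulation; names are ours):

```python
def average_precision(ranked, truth):
    """Area under the precision-recall curve for one question:
    mean of precision@i at each rank i where a correct answer
    appears, divided by the number of correct answers."""
    hits, precisions = 0, []
    for i, a in enumerate(ranked, start=1):
        if a in truth:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(truth) if truth else 0.0

def mean_average_precision(all_ranked, all_truth):
    """MAP across questions."""
    aps = [average_precision(r, t)
           for r, t in zip(all_ranked, all_truth)]
    return sum(aps) / len(aps)
```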

4) Confidence Indicator: Comparing the confidence (using the indicator from Eq. (3)) of incorrect answers predicted by PriBot (mean=0.37, variance=0.04) with the confidence of correct answers (mean=0.49, variance=0.05) shows that PriBot places lower confidence in the answers that turn out to be incorrect. Hence, we can use the confidence indicator to filter out the incorrect answers. For example, by setting the condition conf(q, a) ≥ 0.6 to accept PriBot's answers, we can increase the top-1 accuracy to 70%.
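The filtering rule can be sketched as follows; `predictions` pairs each question's top-1 correctness flag with its confidence value (the names and toy data are ours):

```python
def top1_accuracy_with_confidence(predictions, threshold):
    """Top-1 accuracy restricted to questions whose top answer
    reaches the confidence threshold.

    predictions: list of (is_correct, confidence) pairs, one per
    question's top-ranked answer.
    Returns (accuracy over accepted answers, number accepted).
    """
    accepted = [(ok, c) for ok, c in predictions if c >= threshold]
    if not accepted:
        return 0.0, 0
    accuracy = sum(ok for ok, _ in accepted) / len(accepted)
    return accuracy, len(accepted)
```

Raising the threshold trades coverage (fewer accepted answers) for precision, which is the effect exploited above.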

This indicator delivers another advantage: its components are independently interpretable by the application logic. If the score s(q, a) of the top-1 answer is too low, the user can be notified that the policy might not contain an answer to the question. A low value of cer(q) indicates that the user might have asked an ambiguous question; the system can then ask the user for a clarification. We show actual examples of these cases from PriBot in Fig. 17 and Fig. 18 of the Appendix.

5) Pre-trained Embeddings Choice: As discussed in Sec. IV, we utilize our custom Policies Embeddings, which have the two properties of (1) being domain-specific and (2) using subword embeddings to handle out-of-vocabulary words. We test the efficacy of this choice by studying three variants of pre-trained embeddings. For the first variant, we start from our Policies Embeddings (PE) and disable the subwords mode, thus only satisfying the first property; we call it PE-NoSub. The second variant is the fastText Wikipedia Embeddings from [46], trained on the English Wikipedia, thus only satisfying the second property; we denote it as WP. The third variant is WP with the subword mode disabled, thus satisfying neither property; we call it WP-NoSub. In Fig. 11, we show the top-k score of PriBot on our Twitter QA dataset with each of the four pre-trained embeddings. First, we can see that our Policies Embeddings outperform the other models for all values of k, scoring 14% and 5% more than the closest variant at k = 1 and k = 2, respectively. As expected, the domain-specific model without subword embeddings (PE-NoSub) has a weaker performance by a significant margin, especially for the top-1 answer. Interestingly, the difference is much narrower between the two Wikipedia embeddings, since their vocabulary already covers more than 2.5M tokens; hence, subword embeddings play a less pronounced role there. In sum, the advantage of using subword embeddings with the PE model originates from their domain specificity and their ability to compensate for words missing from the vocabulary (cf. Fig. 19 in the Appendix for an example).
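The subword mechanism that lets fastText-style embeddings cover out-of-vocabulary words can be illustrated with this simplified sketch. Real fastText hashes character n-grams into a fixed number of buckets; here we use a plain dictionary of n-gram vectors for clarity, and the toy vectors are purely illustrative:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with boundary markers,
    in the spirit of fastText's subword model."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def oov_vector(word, ngram_vecs, dim):
    """Compose a vector for an out-of-vocabulary word by
    averaging the vectors of its known character n-grams."""
    grams = [g for g in char_ngrams(word) if g in ngram_vecs]
    if not grams:
        return [0.0] * dim
    vec = [0.0] * dim
    for g in grams:
        for j, x in enumerate(ngram_vecs[g]):
            vec[j] += x
    return [x / len(grams) for x in vec]
```

A misspelled or unseen domain term still maps near its neighbors as long as some of its character n-grams were observed during training, which is the property the PE model exploits.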

Fig. 10: Variation of MAP across policy lengths (Random, Retrieval, SemVec, and PriBot).

Fig. 11: top-k score of PriBot with different pre-trained embeddings (WP-NoSub, WP, PE-NoSub, and PE).

D. User-Perceived Utility Evaluation

In order to get further insights into the performance of the QA approach, we are currently performing a user study to assess the user-perceived utility of the provided answers. This is motivated by research on the evaluation of recommender systems, where the model with the best accuracy is not always rated as the most helpful by users (see the work by Knijnenburg et al. [47]). We leave the details of this study and its results for upcoming versions of this manuscript.

IX. DEPLOYMENT AND LEGAL ASPECTS

Deployment: We will provide two prototype web applications for end-users. The first is an application that visualizes the different aspects of the privacy policy, powered by the annotations from Polisis (available as a web application and a browser extension for Chrome and Firefox). The second is a chatbot implementation of PriBot for answering questions about privacy policies in a conversational interface. These applications will be available at pribot.org. We provide the corresponding screenshots in the Appendix.

We envision that, by making these applications accessible to the public and by engaging with the different privacy stakeholders, we will be contributing to a greater ecosystem for the automated analysis of privacy policies. Our future deployment goals include integrating PriBot within a platform like Amazon Alexa or Google Home as a conversational method for asking about the privacy practices of third-party apps.

Legal Aspects: We want to highlight, however, that Polisis is not intended to replace the legally binding privacy policy. Rather, it offers a complementary interface for the privacy stakeholders to easily inquire about the contents of a privacy policy. Following the trend of automation in legal advice [48], insurance claim resolution [49], and privacy policy presentation [15], [50], third parties, such as automated legal services firms or regulators, can deploy Polisis as a solution for their users. As is the standard in such situations, these parties should amend Polisis with a disclaimer specifying that it is based on automatic analysis and does not represent the actual service provider [51]. Using confidence scores, similar to that of Eq. (3), can also assist in conveying Polisis's (un)certainty about a reported result, whether it is an answer, an icon, or another form of short notice.

Companies and service providers can internally deploy an application similar to PriBot as an assistance tool for their customer support agents to handle privacy-related inquiries. Putting the human in the loop allows for a favorable trade-off between the utility of Polisis and its legal implications. For a wider discussion of the issues surrounding automated legal analysis, we refer the interested reader to the works of McGinnis and Pearce [52] and Pasquale [53].

X. FUTURE APPLICATIONS OF Polisis

Although we presented our current realization of Polisis above, we believe that many extensions are worth investigating. In this section, we take an exemplification approach to illustrate the potential of Polisis, discussing the applications that can be built at the level of users, researchers, and regulators.

Users: Polisis can automatically populate several of the previously proposed short notices for privacy policies, such as nutrition tables and privacy icons [54], [55], [19], [3]. This task can be achieved by mapping these notices to a set of structured queries (cf. Sec. VI). Another possible application is privacy-centered comparative shopping [56]. A user can build on Polisis's output to automatically quantify the privacy utility of a certain policy. For example, such a privacy metric could be a combination of positive scores describing privacy-protecting features (e.g., the policy containing a segment with the label Retention period: stated period) and negative scores describing privacy-infringing features (e.g., the policy containing a segment with the label Retention period: unlimited). A major advantage of automatically generating notices is that they can be seamlessly refreshed when policies are updated or when the rules used to generate these notices are modified. Otherwise, discrepancies between policies and notices might arise over time, which might deter companies from adopting the short notices in the first place.
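Such a privacy metric could be sketched as a weighted sum over the labels Polisis predicts for a policy's segments; the specific label names and weights below are purely illustrative, not prescribed by the paper:

```python
def privacy_score(segment_labels, weights):
    """Toy privacy metric: sum the (positive or negative) weights
    of the labels predicted across a policy's segments.

    segment_labels: per-segment list of predicted label strings.
    weights:        label -> score; unknown labels count as 0.
    """
    return sum(weights.get(label, 0)
               for labels in segment_labels
               for label in labels)
```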

By answering free-form queries with relevant policy segments, Polisis can remove the interface barrier between the policy and the users, especially in conversational interfaces (e.g., voice assistants and chatbots). Going a step further, Polisis's output could serve to automatically rephrase the answer segments in simpler language. A rule engine can generate text based on the combination of predicted classes of an answer segment (e.g., "We share data with third parties. This concerns our users' information, like your online activities. We need this to respond to requests from legal authorities.").

Researchers: Whether in law, journalism, or computer science, the analysis of privacy policies at scale can provide researchers with valuable insights about the data practices of specific companies or industry sectors. For instance, researchers interested in analyzing apps that admit collecting health data [57], [58] could utilize Polisis to query a dataset of app policies. For example, a respective query to a specific policy can be formed by joining the label Personal Information Type: health with the category of First Party Collection or Third Party Sharing.

Regulators: Numerous studies from regulators and law and public policy researchers have manually analyzed the permissiveness of compliance checks [59], [20]. The number of privacy policies assessed in these studies is typically in the range of tens. For instance, the Norwegian Consumer Council has investigated the level of ambiguity in defining personal information within only 20 privacy policies [59]. Polisis can scale such studies by processing the regulator's queries on large datasets. For example, with Polisis, policies can be ranked according to an automated ambiguity metric by using the Personal Information Type attribute and differentiating between the label generic personal information and the other labels specifying the type of data collected. Similarly, this applies to frameworks such as Privacy Shield [12] and the GDPR [14], where issues such as limiting the data usage purposes should be investigated.

Finally, Polisis might be limited by the employed privacy taxonomy. Although the OPP-115 taxonomy covers a wide variety of privacy practices [11], there are certain types of applications that it does not fully capture. One mitigation is to use Polisis as an initial step to filter the relevant data at a high level before applying additional, application-specific text processing. Another mitigation is to leverage Polisis's modularity by amending it with new categories/attributes and training these new classes on the relevant annotated datasets.

XI. RELATED WORK

Privacy Policy Analysis: There have been numerous attempts to create easy-to-navigate and alternative presentations of privacy policies. Kelley et al. [54] studied using nutrition labels as a paradigm for displaying privacy notices. Icons representing privacy policies have also been proposed [60], [55]. Others have proposed standards to push service providers to encode privacy policies in a machine-readable format, such as P3P [61], but these suffered from a lack of adoption by browser developers and service providers. Polisis has the potential to automate the generation of many of these notices, without relying on the respective parties to do it themselves.

Recently, several researchers have explored the potential of automated analysis of privacy policies. For example, Liu et al. [50] have used deep learning to model the vagueness of words in privacy policies. Zimmeck et al. [62] have been able to show significant inconsistencies between app practices and their privacy policies via automated analysis. These studies, among others [63], [64], have been largely enabled by the release of the OPP-115 dataset by Wilson et al. [11], containing 115 privacy policies extensively annotated by law students. Our work is the first to provide a generic system for the automated analysis of privacy policies. In terms of the comprehensiveness and accuracy of the approach, Polisis provides a major improvement on the state of the art. It allows transitioning from labeling policies with a few practices (e.g., the works by Zimmeck and Bellovin [15] and Sathyendra et al. [18]) to a much more fine-grained annotation (up to 10 high-level and 122 fine-grained classes for each privacy segment), thus enabling a richer set of applications.

Evaluating the Compliance Industry: Regulators and researchers are continuously scrutinizing the practices of the privacy compliance industry [20], [30], [29]. Miyazaki and Krishnamurthy [20] found no support that participating in a seal program is an indicator of following privacy practice standards. The FTC has found discrepancies between the practical behaviors of companies, as reported in their privacy policies, and the privacy seals they have been granted [29]. Polisis can be used by these researchers and regulators to automatically and continuously perform such checks at scale. It can provide initial evidence to be processed by skilled experts afterward, thus reducing the analysis time and cost.

Automated Question Answering: Our QA system, PriBot, is focused on non-factoid questions, which are usually complex and open-ended. Over the past few years, deep learning has yielded results superior to traditional retrieval techniques in this domain [43], [44], [65]. Our main contribution is that we build a QA system, without a dataset of questions and answers, while achieving results on par with the state of the art in other domains. We envision that our approach could be transplanted to other problems that face similar issues.

XII. CONCLUSION

We proposed Polisis, the first generic framework that provides automated analysis of privacy policies. It can assist users, researchers, and regulators in processing and understanding the content of privacy policies at scale. To build Polisis, we developed a new hierarchy of neural networks that extracts both the high-level privacy practices and fine-grained information from privacy policies. Using this extracted information, Polisis enables several applications. In this paper, we demonstrated two examples of such applications: structured and free-form querying. In the first example, we used Polisis's output to extract short notices from the privacy policy in the form of privacy icons and to audit TRUSTe's policy analysis approach. In the second example, we built PriBot, which answers users' free-form questions in real time and with high accuracy. Our evaluation of both applications reveals that Polisis matches the accuracy of expert analysis of privacy policies. Beyond these applications, Polisis opens opportunities for further innovative privacy-policy presentation mechanisms, including summarizing policies into simpler language. It can also enable comparative shopping applications that advise consumers by comparing the privacy aspects of the multiple applications they want to choose from.

REFERENCES

[1] F. H. Cate, "The limits of notice and choice," IEEE Security & Privacy, vol. 8, no. 2, pp. 59–62, March 2010.

[2] Federal Trade Commission, "Protecting Consumer Privacy in an Era of Rapid Change," March 2012.

[3] J. Gluck, F. Schaub, A. Friedman, H. Habib, N. Sadeh, L. F. Cranor, and Y. Agarwal, "How short is too short? Implications of length and framing on the effectiveness of privacy notices," in Twelfth Symposium on Usable Privacy and Security (SOUPS 2016). Denver, CO: USENIX Association, 2016, pp. 321–340. [Online]. Available: https://www.usenix.org/conference/soups2016/technical-sessions/presentation/gluck

[4] A. M. McDonald and L. F. Cranor, "The cost of reading privacy policies," ISJLP, vol. 4, p. 543, 2008.

[5] President's Council of Advisors on Science and Technology, "Big data and privacy: A technological perspective. Report to the President, Executive Office of the President," May 2014.


[6] F. Schaub, R. Balebako, and L. F. Cranor, "Designing effective privacy notices and controls," IEEE Internet Computing, vol. 21, no. 3, pp. 70–77, 2017.

[7] F. Schaub, R. Balebako, A. L. Durity, and L. F. Cranor, "A design space for effective privacy notices," in Eleventh Symposium On Usable Privacy and Security (SOUPS 2015). Ottawa: USENIX Association, 2015, pp. 1–17. [Online]. Available: https://www.usenix.org/conference/soups2015/proceedings/presentation/schaub

[8] Federal Trade Commission, "Internet of Things, Privacy & Security in a Connected World," Jan. 2015.

[9] A. Rao, F. Schaub, N. Sadeh, A. Acquisti, and R. Kang, "Expecting the unexpected: Understanding mismatched privacy expectations online," in Twelfth Symposium on Usable Privacy and Security (SOUPS 2016). Denver, CO: USENIX Association, 2016, pp. 77–96. [Online]. Available: https://www.usenix.org/conference/soups2016/technical-sessions/presentation/rao

[10] S. Wilson, F. Schaub, R. Ramanath, N. Sadeh, F. Liu, N. A. Smith, and F. Liu, "Crowdsourcing annotations for websites' privacy policies: Can it really work?" in Proceedings of the 25th International Conference on World Wide Web, ser. WWW '16. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee, 2016, pp. 133–143. [Online]. Available: https://doi.org/10.1145/2872427.2883035

[11] S. Wilson, F. Schaub, A. A. Dara, F. Liu, S. Cherivirala, P. G. Leon, M. S. Andersen, S. Zimmeck, K. M. Sathyendra, N. C. Russell, T. B. Norton, E. H. Hovy, J. R. Reidenberg, and N. M. Sadeh, "The creation and analysis of a website privacy policy corpus," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, 2016. [Online]. Available: http://aclweb.org/anthology/P/P16/P16-1126.pdf

[12] U.S. Department of Commerce, "Privacy shield program overview," https://www.privacyshield.gov/Program-Overview, 2017, accessed: 10-01-2017.

[13] P. G. Kelley, J. Bresee, L. F. Cranor, and R. W. Reeder, "A 'nutrition label' for privacy," in Proceedings of the 5th Symposium on Usable Privacy and Security, ser. SOUPS '09. New York, NY, USA: ACM, 2009, pp. 4:1–4:12. [Online]. Available: http://doi.acm.org/10.1145/1572532.1572538

[14] "Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation)," Official Journal of the European Union, vol. L119, pp. 1–88, May 2016. [Online]. Available: http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L:2016:119:TOC

[15] S. Zimmeck and S. M. Bellovin, "Privee: An architecture for automatically analyzing web privacy policies," in USENIX Security, vol. 14, 2014.

[16] H. Harkous, K. Fawaz, K. G. Shin, and K. Aberer, "PriBots: Conversational privacy with chatbots," in Workshop on the Future of Privacy Notices and Indicators, SOUPS 2016, Denver, CO, USA, June 22, 2016. USENIX Association, 2016.

[17] F. Schaub and P. Knierim, "Drone-based privacy interfaces: Opportunities and challenges," in Twelfth Symposium on Usable Privacy and Security (SOUPS 2016). Denver, CO: USENIX Association, 2016. [Online]. Available: https://www.usenix.org/conference/soups2016/workshop-program/wfpn/presentation/schaub

[18] K. M. Sathyendra, S. Wilson, F. Schaub, S. Zimmeck, and N. Sadeh, "Identifying the provision of choices in privacy policy text," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2764–2769.

[19] Disconnect, "Privacy Icons," https://web.archive.org/web/20170709022651/disconnect.me/icons, accessed: 07-01-2017.

[20] A. D. Miyazaki and S. Krishnamurthy, "Internet seals of approval: Effects on online privacy policies and consumer perceptions," Journal of Consumer Affairs, vol. 36, no. 1, pp. 28–49, 2002.

[21] T. Foremski, "TRUSTe responds to Facebook privacy problems..." http://www.zdnet.com/article/truste-responds-to-facebook-privacy-problems/, 2017, accessed: 10-01-2017.

[22] G. Glavas, F. Nanni, and S. P. Ponzetto, "Unsupervised text segmentation using semantic relatedness graphs," in *SEM 2016: The Fifth Joint Conference on Lexical and Computational Semantics, August 11-12 2016, Berlin, Germany. Stroudsburg, Pa.: Association for Computational Linguistics, 2016, pp. 125–130. [Online]. Available: http://ub-madoc.bib.uni-mannheim.de/41341/

[23] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, 2014, pp. 1746–1751. [Online]. Available: http://aclweb.org/anthology/D/D14/D14-1181.pdf

[24] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in neural information processing systems, 2013, pp. 3111–3119.

[25] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global vectors for word representation," in Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543. [Online]. Available: http://www.aclweb.org/anthology/D14-1162

[26] D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin, "Learning sentiment-specific word embedding for Twitter sentiment classification," in ACL (1), 2014, pp. 1555–1565.

[27] N. Viennot, E. Garcia, and J. Nieh, "A measurement study of Google Play," in ACM SIGMETRICS Performance Evaluation Review, vol. 42, no. 1. ACM, 2014, pp. 221–233.

[28] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," arXiv preprint arXiv:1607.04606, 2016.

[29] R. Pitofsky, S. Anthony, M. Thompson, O. Swindle, and T. Leary, "Privacy online: Fair information practices in the electronic marketplace," Statement of the Federal Trade Commission before the Committee on Commerce, Science and Transportation, United States Senate, Washington, DC, 2000.

[30] E. M. Caudill and P. E. Murphy, "Consumer online privacy: Legal and ethical issues," Journal of Public Policy & Marketing, vol. 19, no. 1, pp. 7–19, 2000.

[31] J. R. Landis and G. G. Koch, "The measurement of observer agreement for categorical data," Biometrics, pp. 159–174, 1977.

[32] TRUSTe, "TRUSTe & Disconnect Introduce Visual Icons to Help Consumers Understand Privacy Policies," http://www.trustarc.com/blog/2014/06/23/truste-disconnect-introduce-visual-icons-to-help-consumers-understand-privacy-policies/, June 2013, accessed: 01-07-2017.

[33] R. Ramanath, F. Liu, N. M. Sadeh, and N. A. Smith, "Unsupervised alignment of privacy policies using hidden markov models," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 2: Short Papers, 2014, pp. 605–610. [Online]. Available: http://aclweb.org/anthology/P/P14/P14-2099.pdf

[34] C. Schneider, "10 reasons why AI-powered, automated customer service is the future," https://www.ibm.com/blogs/watson/2017/10/10-reasons-ai-powered-automated-customer-service-future, October 2017, accessed: 10-01-2017.

[35] M. Wang, N. A. Smith, and T. Mitamura, "What is the Jeopardy model? A quasi-synchronous grammar for QA," in EMNLP-CoNLL, vol. 7, 2007, pp. 22–32.

[36] E. M. Voorhees and L. Buckland, "Overview of the TREC 2003 question answering track," in TREC, vol. 2003, 2003, pp. 54–68.

[37] H. T. Dang, D. Kelly, and J. J. Lin, "Overview of the TREC 2007 question answering track," in TREC, vol. 7, 2007, p. 63.

[38] H. Llorens, N. Chambers, N. Mostafazadeh, J. Allen, and J. Pustejovsky, "QA TempEval: Evaluating temporal information understanding with QA."

[39] A. Olteanu, "Probing the limits of social data: Biases, methods, and domain knowledge," Ph.D. dissertation, École Polytechnique Fédérale de Lausanne, 2016.

[40] S. Robertson, "Understanding inverse document frequency: on theoretical arguments for IDF," Journal of Documentation, vol. 60, pp. 503–520, 2004.

[41] M. Beaulieu, M. Gatford, X. Huang, S. Robertson, S. Walker, and P. Williams, "Okapi at TREC-5," NIST SPECIAL PUBLICATION SP, pp. 143–166, 1997.

[42] A. S. Razavian, J. Sullivan, S. Carlsson, and A. Maki, "Visual instance retrieval with deep convolutional networks," arXiv preprint arXiv:1412.6574, 2014.

[43] M. Feng, B. Xiang, M. R. Glass, L. Wang, and B. Zhou, "Applying deep learning to answer selection: A study and an open task," in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, 2015, pp. 813–820. [Online]. Available: http://dx.doi.org/10.1109/ASRU.2015.7404872

[44] M. Tan, C. dos Santos, B. Xiang, and B. Zhou, "Improved representation learning for question answer matching," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016.

[45] K. Järvelin and J. Kekäläinen, "Cumulated gain-based evaluation of IR techniques," ACM Transactions on Information Systems (TOIS), vol. 20, no. 4, pp. 422–446, 2002.

[46] Facebook, "Wiki word vectors," https://fasttext.cc/docs/en/pretrained-vectors.html, 2017, accessed: 10-01-2017.

[47] B. P. Knijnenburg, M. C. Willemsen, and S. Hirtbach, "Receiving recommendations and providing feedback: The user-experience of a recommender system," in International Conference on Electronic Commerce and Web Technologies. Springer, 2010, pp. 207–216.

[48] J. Goodman, "Legal technology: the rise of the chatbots," https://www.lawgazette.co.uk/features/legal-technology-the-rise-of-the-chatbots/5060310.article, 2017, accessed: 04-27-2017.

[49] A. Levy, "Microsoft CEO Satya Nadella: for the future of chat bots, look at the insurance industry," http://www.cnbc.com/2017/01/09/microsoft-ceo-satya-nadella-bots-in-insurance-industry.html, 2017, accessed: 04-27-2017.

[50] F. Liu, N. L. Fella, and K. Liao, "Modeling language vagueness in privacy policies using deep neural networks," in 2016 AAAI Fall Symposium Series, 2016.

[51] T. Hwang, "The laws of (legal) robotics," Robot, Robot & Hwang LLP, Tech. Rep., 2013. [Online]. Available: http://www.robotandhwang.com/wp-content/uploads/2013/06/The-Laws-of-Legal-Robotics.pdf

[52] J. O. McGinnis and R. G. Pearce, "The great disruption: How machine intelligence will transform the role of lawyers in the delivery of legal services," Fordham L. Rev., vol. 82, pp. 3041–3481, 2014.

[53] F. Pasquale and G. Cashwell, "Four futures of legal automation," UCLA L. Rev. Discourse, vol. 63, p. 26, 2015.

[54] P. G. Kelley, J. Bresee, L. F. Cranor, and R. W. Reeder, "A nutrition label for privacy," in Proceedings of the 5th Symposium on Usable Privacy and Security. ACM, 2009, p. 4.

[55] L. F. Cranor, P. Guduru, and M. Arjula, "User interfaces for privacy agents," ACM Transactions on Computer-Human Interaction (TOCHI), vol. 13, no. 2, pp. 135–178, 2006.

[56] J. Y. Tsai, S. Egelman, L. Cranor, and A. Acquisti, "The effect of online privacy information on purchasing behavior: An experimental study," Information Systems Research, vol. 22, no. 2, pp. 254–268, 2011.

[57] A. Aktypi, J. Nurse, and M. Goldsmith, "Unwinding Ariadne's identity thread: Privacy risks with fitness trackers and online social networks," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017.

[58] E. Steel and A. Dembosky, "Health apps run into privacy snags," Financial Times, 2013.

[59] Norwegian Consumer Council, "Appfail report: threats to consumers in mobile apps," Norwegian Consumer Council, Tech. Rep., 2016. [Online]. Available: https://www.forbrukerradet.no/undersokelse/2015/appfail-threats-to-consumers-in-mobile-apps/

[60] L.-E. Holtz, H. Zwingelberg, and M. Hansen, "Privacy policy icons," in Privacy and Identity Management for Life. Springer, 2011, pp. 279–285.

[61] L. Cranor, Web privacy with P3P. O'Reilly Media, Inc., 2002.

[62] S. Zimmeck, Z. Wang, L. Zou, R. Iyengar, B. Liu, F. Schaub, S. Wilson, N. Sadeh, S. M. Bellovin, and J. Reidenberg, "Automated analysis of privacy requirements for mobile apps," in 24th Annual Network and Distributed System Security Symposium, NDSS 2017, 2017.

[63] K. M. Sathyendra, F. Schaub, S. Wilson, and N. Sadeh, "Automatic extraction of opt-out choices from privacy policies," in 2016 AAAI Fall Symposium Series, 2016.

[64] F. Liu, S. Wilson, F. Schaub, and N. Sadeh, "Analyzing vocabulary intersections of expert annotations and topic models for data practices in privacy policies," in 2016 AAAI Fall Symposium Series, 2016.

[65] J. Rao, H. He, and J. Lin, "Noise-contrastive estimation for answer selection with deep neural networks," in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, ser. CIKM '16. New York, NY, USA: ACM, 2016, pp. 1913–1916. [Online]. Available: http://doi.acm.org/10.1145/2983323.2983872

APPENDIX

SERVICES’ SCREENSHOTS

In this appendix, we first show screenshots from our web application for navigating the results produced by Polisis (Fig. 12 to Fig. 14). This application is available at pribot.org/polisis. Also, we show screenshots of PriBot's web app, answering questions about multiple companies (Fig. 15 to Fig. 20). The chatbot web application is available at pribot.org/bot.



Fig. 12: Here, we show a case where our web app visualizes the result produced by Polisis. The app shows the flow of the data being collected, the reasons behind that, and the choices given to the user in the privacy policy. The user can check the policy statements for each link by hovering over it.

Fig. 13: In this case, the security aspects of the policy are highlighted based on the labels extracted by Polisis. The user has the option to see the related statement by expanding each item in the app.



Fig. 14: Here, the user is presented with the possible choices, automatically retrieved by Polisis.



Fig. 15: The first answer from our chatbot implementation of PriBot about third-party sharing in the case of Bose.com. Answers are annotated with a header mentioning the high-level category (e.g., context of sharing with third parties). The confidence metric is also highlighted in the interface.

Fig. 16: The first answer about data retention in the case of Medium. Notice the semantic matching in the absence of common terms. Notice also that only one answer is shown, as it is the only one with high confidence; hence, the user is not distracted by irrelevant answers.

Fig. 17: The answer given the question “do you bla bla bla”, showcasing the power of the confidence metric, which accounts for unknown words in the question and relays that particular reason to the user.

Fig. 18: This case illustrates the scenario where PriBot finds no answer in the policy and explains the reason based on the automatically detected high-level category (explanations are preset in the application).



Fig. 19: This case illustrates the power of subword embeddings. Given a significantly misspelled question “how much shoud i wait for you to delet my data”, PriBot still finds the most relevant answer.

Fig. 20: This case, with the policy of Oyoty.com, illustrates automatic accounting for discrepancies across segments (Sec. VII-A) by warning the user about them.



TWITTER QA DATASET

In this appendix, we list the 120 privacy-related questions that we used in our evaluation (collected from Twitter and agreed upon during annotation by two of the authors).

1) @Monsterjobs uk Are you legally responsible for any loss of claimants' data that may occur on Universal Job Match?

2) @Kenshoo I know I can just check your website, but are you taking any personal data while you are looking for our search queries?

3) Just tried to change my details at @theregister and it now wants my address and phone number. Why?

4) @NorthumbrianH2O thanks. Do you pass on customer addresses to 3rd parties? Got interiors catalogue addressed to me here. How did they know?

5) @EE May I please request what companies my details have been passed on to? I am getting calls from various elec suppliers. Many thanks.

6) @creditkarma Does cancelling an account also delete all associated data (especially SSN) from your system? Want to know before I sign up. :)

7) @yewknee @Simplify does it simplify sharing your banking data with advertisers?

8) @TechSmith Do you collect information and provide it to third parties?

9) @moneysupermktUK Hi - if I use your service will there be ANY telephone calls from you or 3rd party company or is it 100% web based?

10) @Viber So, can everyone in your contacts see the photos you have shared on Viber even if not originally shared with them? @fit gurl

11) @TradeMe Isn't releasing information under the Privacy Act voluntary? I.e to protect users you could make the Police follow formal process?

12) @FitbitSupport is data stored on the cloud? I heard of leaks from the cloud.

13) @SparkMailApp Is there more information about sync settings via cloud. Security? Possible to delete what's been synced?

14) @nswpolice what sort of data do you track outside the questions for visitors to the link?

15) @weebly shocked at the unsolicited emails and calls I'm getting since I signed up with your web service. Did you REALLY sell my information?

16) @Prezi Guys, small question: the email addresses you collect from Prezi accounts, are they being used for third parties? If so, why?

17) @myen Are Evernote notes encrypted at rest?

18) @quip quick question if I connect my accounts can you access all my info? Or it still remains just for me to see?

19) .@AngiesList so do you sell your mailing list to everyone??? My junk email has increased exponentially since joining. #sheesh

20) @fullcontact do you have a warrant canary statement that you've never provided users' address books to authorities? If not, can you?

21) @getspeedify, does Speedify encrypt my traffic or is it an unencrypted VPN? Also, do you keep logs of users? If so, for how long?

22) @duckduckgo You don't log user info, but what bout cookies ? Do you use them every time we log on ?

23) @AskSubaruCanada So that makes it ok for you to give them personal info to spam customers with every day? Where was the opt out option?

24) @skulpt me Very interested in your device! Can you tell us how our personal info and data are used and/or resold with your app?

25) @loseit how about you guys? Do you #share the data we log in your app? #Privacy https://t.co/qTH6Ir905A

26) @EuclidAnalytics are you able to isolate a MAC address' data and provide it to law enforcement?

27) @threatspikelabs what data do u store, if any, how long u store for? 3rd party compliance? If served with warrant what's ur steps to protect

28) @Adobe do you sell Mail Adresses??!!

29) @FreePPICheck do you keep or pass over any personal information after completing your ppi check??

30) Hey @Optus why am I getting calls from people wanting to sell me funeral insurance? Have you sold my phone number to a call centre?

31) @nest Do you keep your customer's emails private after you have obtained an email during installation setup?

32) @smallpdf are your applications HIPAA compliant?

33) @SpotifyCares where is the opt out option for shareing personal info. If I opt out of the terms I am told that I cannot use Spodify?

34) @msg @ProductHunt @service Curious how personal info is protected, assuming u have to give it out for most cust serv resolutions?

35) Also, @HotDocOnline you make no mention on your site of how patient data is secured. Would you like to elaborate in public?

36) Does anybody know if @EE sell on emails? I've been inundated with junk mail since I got my new contract the other week...

37) @carmillaseries how secure is the merch store? I want to buy myself something for my birthday but I'm afraid to use my credit card online.

38) @carshare hey folks, what's the best way to reach you about security disclosure of your service and potential access to customer data?

39) @submittable why do I suddely need to enable cookies? Is there an opt out, or am I switching back to stamps?

40) Latest @bankofireland iPhone app update wants constant background access to my location. For security, marketing, or something else?

41) MT @Remind101 @cellyme ...What do you do with the phone numbers that are archived? Your current policy?

42) @AirsideInsider Could you provide some technical docs about the security used for the #MobilePassport apps? Saved locally, encryption, etc.

43) Ok @HRBlock, why would @Ghostery report over 16 advertising trackers from your supposedly secure online tax application?

44) @angrybirds is this true? http://t.co/gCBoFIcZ you send people's contacts to 3rd parties without permission?

45) @LinkedInHelp so someone is selling my email address from you and signing me up? Are you saying that you've nothing to do with this email?

46) @floatapp how secure are your servers? Can you direct me to the security info on your website please?

47) @Gopit Search @PrivacyMatters what's your privacy statement and what data do your store from your users?

48) .@AskTarget ok thanks but I assume that means yes you all do sell patient names and addresses?

49) @opendns I'd like to know if there's any security concern, like whether the DNS provider can track my browsing. What are the privacy issues?

50) @swiftkey can your app NOT collect my passwords and credit card numbers? How is that legal?

51) @HootSuite Help If I sign up: Where's the option on your website to opt out of your sharing my personal info?

52) Also, can anyone at @TrustifyPI guarantee that the emails people put in to check against this “list” won't be sold off? No TOS on app.

53) @SagiGidali Hi Sagi, what's your stance on keeping customer logs, and where is your company/customer data based for legal reasons?

54) @Telstra just wondering what's the extent of your monitoring on customers (me). Link me to pds if possible?

55) @TTChelps Yes, how will you manage my travel records and contact info, under what circumstances will you release to 3rd parties?

56) @AirbnbHelp you already have my phone number, my linkedin and my profile pic + feedback from previous hosts. So why you want my ID? @Airbnb

57) @hushmail Q: does hushmail collaborate with NSA to spy on its users' emails? http://t.co/aZqiuL6sja looking for alternative to Google mail

58) @EtsyHelp is it safe putting my banking info to etsy?? I have unpaid student loans, and I don't want them paying off @Etsy for my info :P

59) @troyhunt @FreedomeVPN do they log traffic ? Port open for upload ? Rhanks :)

60) @onavo Can I ask - why is it free? and how do u guarantee that data is anonymised? Ta :)

61) So @stripe has entered the UK market (nice) shouldn't they declare if they share user data with any one?

62) @asiaelle @graceishuman I like Evernote for some things but I worry about data security. Who can see my pages ?

63) The @nest needs to collect your in and out patterns for all your family. Who owns that data? Can it be subpoenaed? federated? @mdrasch #IoT

64) @tigerVPN Can't wait for the IOS app! It seem that I can't find info if u log or no log to guarantee our privacy?

65) @duckduckgo –is it really private. Does duckduckgo–follow me and watch me and spy on me? Can I truly search in total privacy?

66) @Telus @Shawhelp @Shawinfo Can you verify if you provide customer information without a warrant to law enforcement but upon request? #bcpoli

67) @nest Is Nest sharing data with Google still optional? Will it remain that way for the foreseeable future?

68) @duggan My money is on marketing plus hubris. @bankofireland, how confident are you that data won't leak?

69) .@fitbit What are you doing to protect customers' privacy?

70) @TMobileHelp where's the part of my contract where I gave you permission to log my urls and location?

71) @Netflixhelps The Perfect World Peanut Labs offer to earn zen for signing up. Will my information be confidential or is it shared with PWE?

72) Hey @indeed, do you sell info to 3rd party sites?

73) @Jawbone how do you protect all the information tracked by your wearables? #CIS210

74) I understand that @airbnb want to see my ID's before booking, but can I know what they are doing with that data ? #privacy #matters

75) @privatewifi Will do more research, but how do I know VPN software isn't gathering my data/personal info?

76) @mysms Can you elaborate on prvcy polcy? Can employees access sms? If so, when/when would they? What safeguards exist to prevent abuse? Thx!

77) @mrgunn @colwizSupport What kind of data do you collect from users? What do you do with it? Who stands to gain from it financially? How?

78) @Viber is that truth you are spying on users's calls ??

79) @22seven - do you, would you, could you ever share personal data with SARS? Is there anything in your T&C's prohibiting you to do so?

80) Hey @GoDaddy why do you guys sell my information every time I buy a domain from you? I'm assuming I gave you permission at some point?

81) @sprintcare @sprint - What's up with the new privacy contract you guys just did? Giving away our info to random businesses ? #Sketchy #Smh

82) @opera Do you track and store people's data like Google, Microsoft, Facebook and all their other cronies? If you don't I'll switch to you.

83) Does @Official GDC have a privacy statement anywhere re: the personal data of attendees & how it's shared w/ exhibitors, speakers, etc.?

84) Is this correct? @VodafoneIreland cc rep said they don't retain customer information predating your existing or last contract. cc: @ComReg

85) Hey @HostGator can you not sell my phone number to telemarketers? Paying for privacy protection should protect me from YOU TOO

86) With the rapid rise in so called encrypted messaging apps, how do you feel @viber competes on security? #cgc14

87) @UnrollmeHelp is there a way i can be sure that you're not just the nsa reading my mails? #faq

88) @evernotehelps are my notes being saved encrypted on your servers per default? Or is only manually encrypted text encrypted?

89) @Telstra So nothing more than absolutely necessary to meet legal interception and not used unless required under law?

90) @Viber if I sign up with viber, do you load my iphone contacts into your servers? Thanks #viber

91) @AutomaticHelp hi! how is the sensitive data my Automatic collects encrypted and stored?

92) @truecallerhelp good, thx. The question was different: will they be removed from your servers or not? Which is the procedure to remove them?

93) Sean: What Mobile Apps Know & Transmit About You: @AngryBirds sends my contacts to third parties? #WTF #FAIL http://t.co/IKVYc6l7

94) @simpletaxca What personal information do you retain after I do my taxes on your site?

95) @MailChimp Does Mail Chimp retain our list and use or sell them elsewhere ?

96) @FreePPICheck if I were to give you my phone #, how many people will you sell it to?

97) @TripCase how safe is the personal info?

98) @fitbit @FitbitSupport Where can I go to see who you sold my private health data too? http://t.co/Rd64dKWGFb

99) @VentraChicago How do you use our personal data once we've registered with Ventra? #AskVentra

100) @bitly Thanks for the response. Does bitly have access to the links? What if I wanted to send a file to a friend and it's personal?

101) @jobsdotie Also, no data privacy guarantee, or info on who gets access to my CV (is it just the advertising company or also jobs.ie staff?).

102) @nest Privacy question, does Nest share usage patterns or anything with third parties? cc @joshmend

103) The @tapjoy ad framework uploads my UDID *and* my MAC address? Is that *really* necessary? :—

104) Dear @eBay I don't appreciate my buyers having access to my phone number! Why is this person calling me at 9pm? #totallyinapprpriate

105) Soooo, I joined @mint about a week ago and today I've received 7 credit card offers in the mail... selling my info much?

106) Hi @auspost how can a person opt out on Australia Post storing phone numbers ?

107) .@MGMResortsIntl do you sell your mlife member contact information? Receiving calls from sports betting tip line since staying @ NYNY.

108) Im conflicted 2 clicking “accept” to your policy changes. Why do you need my birth date, friends info, etc.? @SpotifyUSA #TaylorMightBeRight

109) @AdblockPlus https://t.co/3Awum5BwRF How much control will users have over their personal data? #tracking #adblock

110) @clue just curious. do you share users information with third parties? i'm getting targeted ads for tampons etc. after using your app.

111) @theTunnelBear nice! Given the recent U.K. change in law, do you have any details about your logging privacy etc?

112) Is customer credit card info stored encrypted? Even if it is storing password in plain text kind of kills that. @RSComponents

113) @davidsbridal when people sign up for your site, do you sell their info? I have received numerous unsolicited calls on my cell phone.

114) @Cabelas Do you or your partners sell, share, or otherwise disseminate your customer's mailing addresses directly or indirectly to the @NRA?

115) @Spotify Due to the complete lack of respect for privacy in the new T&C I wonder where to delete my account? stopped subscription already.

116) @hubspot Why must I enable third party cookies in your browser settings? Is there a workaround? DM pls

117) @Viber Hi, is your service secured against spying by the nsa and gchq?

118) .@automatic Who owns my driving data? Will my driving behavior be aggregated and sold?

119) @ProtonMail @ProtonMailHelp E.g.: If I use my IOS to access emails, what data is stored by you guys? Is there a link to explain?

120) @Nosgoth - is it mandated to link our steam account to Square in order to play the game ? What securities are in place for protection of acc


