
Privacy in a Demographic Database

Date post: 22-Feb-2016

Milestone #1
Razi Mukatren, Golan Salman

We started the privacy analysis of the data.
We manually generated more than 40 tables from the Israel Central Bureau of Statistics (CBS) website.
Understanding the specific technique that the CBS uses on its website.
From the pulled data, we studied the tables and manually looked for intersections between them in order to understand more about the surveys.
Next step: pulling the data/tables from the website using a script.

The privacy analysis of the system
We ran tests manually and saw that it is possible to derive information about a specific participant in the survey.

For example: taking the data of all 7,500 participants and keeping only those who:

1) Studied a subject related to education.
2) Have an income of more than 24,000 NIS per month.

For example: we generated 10 tables using the following filters: Arab villages, and religion: Muslim.
The filters reduce the size of each table, so we get only the information related to the filters above.
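The first filter example above (field of study plus income) could be sketched roughly as follows. The records, field names, and values here are synthetic stand-ins chosen for illustration; they are not the CBS data or its real schema.

```python
# Synthetic stand-in records (hypothetical field names and values).
RECORDS = [
    {"id": 1, "field_of_study": "education", "monthly_income_nis": 26000},
    {"id": 2, "field_of_study": "education", "monthly_income_nis": 9000},
    {"id": 3, "field_of_study": "engineering", "monthly_income_nis": 30000},
    {"id": 4, "field_of_study": "education", "monthly_income_nis": 24000},
]


def filter_participants(records, field, min_income):
    """Keep records with the given field of study and income strictly above min_income."""
    return [
        r for r in records
        if r["field_of_study"] == field and r["monthly_income_nis"] > min_income
    ]


matches = filter_participants(RECORDS, "education", 24000)
print(len(matches), [r["id"] for r in matches])
```

When the combined filters leave only a handful of records, every further table generated under the same filters describes that same small group.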

The survey has only 12 people who are Muslim and live in Arab villages (we can learn this from Table #1). Six of them are men and six are women. We can also see the ages of those 12 people in the tables below.

Now we'll look at the tables that include 12 participants in total, since they are certain to include all 12 participants from Table #1.

Tables 5, 7, 9, and 10 include all 12 participants.

From table #5 we can learn, for example, that of the two participants aged 20-24, one has a height of 120-124 and the other 185-189. If we go back to table #1, we see that one is a man and one is a woman. To see who is who, we generate a new table with the same filters and add a second column for gender. We will call it table 11. From table 11 we can see that the woman's height is 160-164 and the man's is 185-189.
Let's focus only on these 2 participants, for example, because one of them appears in all 10 tables (the 20-24 age group is in all 10 tables).
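The step of adding a gender column to an existing table can be illustrated with a small cross-tabulation sketch. The participant list is synthetic and the column names are assumptions; only the idea (a finer table splits a cell of two people into two cells of one person each) comes from the slides.

```python
from collections import Counter

# Synthetic participants (hypothetical values, not the survey data).
PARTICIPANTS = [
    {"age": "20-24", "gender": "man", "height": "185-189"},
    {"age": "20-24", "gender": "woman", "height": "160-164"},
    {"age": "25-29", "gender": "man", "height": "170-174"},
    {"age": "25-29", "gender": "woman", "height": "160-164"},
]


def cross_tabulate(records, *columns):
    """Count participants for every combination of values in the given columns."""
    return Counter(tuple(r[c] for c in columns) for r in records)


# Age x height: two people aged 20-24, but we cannot tell who is who.
by_age_height = cross_tabulate(PARTICIPANTS, "age", "height")

# Age x gender x height: each cell of size 1 is now tied to a gender.
by_age_gender_height = cross_tabulate(PARTICIPANTS, "age", "gender", "height")

for cell, count in sorted(by_age_gender_height.items()):
    print(cell, count)
```

Any cell with a count of 1 in the finer table pins a value to a single participant.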

From table #2 we can see that one of them is a hired worker. Let's generate a new table (called table 12) and check whether the hired worker is the man or the woman. We can see from table 12 that the man is the hired worker.
So far we know about the man: his age is 20-24, he is Muslim, from an Arab village, his height is 185-189, and he is a hired worker.

From table #3 and table #4 we can learn that he works in construction and lives about a 15-30 minute drive from his work.
From table #6, both the man and the woman studied for 11-12 years.

From table #7, one of them weighs 90-94 and the other 65-69. Let's generate a new table (13) and check which one is the man. From table 13 we can see that the man weighs between 90-94 kg.

From table #8, he earns 5K-6K NIS gross.
From table #9, he is from the north.
For table #10 we need to generate a new table, #14. From table 14 we can see that his family includes more than 7 members.

In conclusion:

We know about the man:
His age is 20-24
He is Muslim
He is from an Arab village
His height is 185-189
He is a hired worker
His distance from work is a 15-30 minute drive
His years of study: 11-12
He weighs 90-94 kg
His salary is 5K-6K NIS gross per month
He is from the north
His family includes more than 7 members

Where are we going from here
Next steps
Two major points (the plan is to finish them by milestone 2):

Automatically extracting and generating survey tables from the CBS website (this will be the first script).

Start working on the algorithm that searches the data for the 1s (table cells that contain a single participant) and tries to find intersections between these pieces of information (this will be the second script).
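One possible shape for the second script is sketched below, under stated assumptions. The table layout (a tuple of column names plus a mapping from cell values to counts), the function names, and all the data are hypothetical. The merge is only sound when the chosen key columns are already known to identify one person, as the manual walkthrough established for age 20-24 plus gender.

```python
# Hypothetical in-memory tables: column names plus cell -> participant count.
TABLES = [
    {"columns": ("age", "gender", "height"),
     "cells": {("20-24", "man", "185-189"): 1,
               ("20-24", "woman", "160-164"): 1,
               ("25-29", "man", "170-174"): 3}},
    {"columns": ("age", "gender", "employment"),
     "cells": {("20-24", "man", "hired worker"): 1,
               ("25-29", "man", "hired worker"): 2}},
    {"columns": ("age", "gender", "weight"),
     "cells": {("20-24", "man", "90-94"): 1,
               ("20-24", "woman", "65-69"): 1}},
]


def singleton_cells(table):
    """Yield each cell with a count of exactly 1 as a {column: value} dict."""
    for values, count in table["cells"].items():
        if count == 1:
            yield dict(zip(table["columns"], values))


def link_singletons(tables, key_columns):
    """Merge singleton cells that share the same values in key_columns."""
    profiles = {}
    for table in tables:
        for cell in singleton_cells(table):
            if not all(k in cell for k in key_columns):
                continue  # this table cannot be linked on the chosen key
            key = tuple(cell[k] for k in key_columns)
            profiles.setdefault(key, {}).update(cell)
    return profiles


profiles = link_singletons(TABLES, ("age", "gender"))
for key, profile in profiles.items():
    print(key, profile)
```

Each resulting profile accumulates one attribute per table, which mirrors how the manual analysis built up the description of a single participant.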

The first script and major issues
The website supports only IE. We thought we could use a macro script in FF or Chrome, but since the IL government's sites support only IE, we can't use macro scripts.
Now we are testing alternatives:
Either Scrapy (http://scrapy.org/), which is used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Or curl in bash, or Java with JTidy (http://jtidy.sourceforge.net/), a Java port of HTML Tidy.
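The slides leave the choice of tool open, so the following is only a sketch of the extraction step itself. It uses Python's standard-library html.parser as a stand-in for Scrapy or JTidy, and the HTML fragment is a made-up example, not the CBS site's actual markup.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment with made-up counts.
SAMPLE_HTML = """
<table>
  <tr><th>Age</th><th>Men</th><th>Women</th></tr>
  <tr><td>20-24</td><td>1</td><td>1</td></tr>
  <tr><td>25-29</td><td>2</td><td>3</td></tr>
</table>
"""

extractor = TableExtractor()
extractor.feed(SAMPLE_HTML)
print(extractor.rows)
```

In a real run the HTML would come from whatever fetch mechanism ends up working with the site (for example the output of curl), and the parsed rows would feed the table-searching script.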
