Date post: | 19-Jan-2016 |
Using geolocated Twitter traces to infer residence and mobility
Nigel Swier, Bence Kormaniczky, and Ben Clapperton
Background
• ONS Big Data Project: This is one of four pilots exploring the use of big data for official statistics
• Users tweeting from a smartphone have an option to provide a GPS location
• 300,000-plus such tweets sent daily within GB
• Data is relatively accessible
• Can these data be used to infer residence and mobility patterns?
Age Distribution of UK Twitter Users
Data Acquisition
• Target data: All geolocated tweets sent within Great Britain between 1 April 2014 and 31 October 2014
• Combination of Twitter API and procured data (GNIP)
• 81.4 million tweets
• Stored as JSON files in MongoDB
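As a rough illustration of the acquisition step, the sketch below pulls the GPS point out of each stored tweet and keeps only those falling inside Great Britain. It assumes Twitter API v1.1-style tweet objects (a GeoJSON `coordinates` field, longitude first), one JSON object per line; the bounding box values and the function names `extract_point` and `load_points` are hypothetical and are not taken from the project.

```python
import json

# Rough bounding box around Great Britain. These limits are an assumption
# for illustration only, not the boundary used in the study.
GB_BOUNDS = {"min_lon": -8.7, "max_lon": 1.8, "min_lat": 49.8, "max_lat": 60.9}


def extract_point(tweet):
    """Return (user_id, created_at, lon, lat) for a geolocated tweet, else None."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None  # tweet carries no GPS point
    lon, lat = coords["coordinates"]  # GeoJSON order: longitude, then latitude
    in_gb = (GB_BOUNDS["min_lon"] <= lon <= GB_BOUNDS["max_lon"]
             and GB_BOUNDS["min_lat"] <= lat <= GB_BOUNDS["max_lat"])
    if not in_gb:
        return None
    return tweet["user"]["id"], tweet["created_at"], lon, lat


def load_points(lines):
    """Parse newline-delimited tweet JSON and keep points inside the box."""
    points = []
    for line in lines:
        point = extract_point(json.loads(line))
        if point is not None:
            points.append(point)
    return points
```

A bounding box is only a coarse filter; a real pipeline would more likely test points against a proper boundary polygon.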
Distribution of user activity
Distribution of persistence levels
Axes: user frequency, count (users with geolocated tweets on just one day not shown)
Geo-located Twitter volumes by Device Type, Great Britain, 15 August to 31 October 2014
Lots of activity in different places but where does this person* live?
* This example is based on real data but has been altered to prevent identification
DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• ε (eps) = distance (radius)
• minpts = minimum points to define a cluster
Developed by Ester et al. (1996)
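To make the two parameters concrete, here is a minimal pure-Python sketch of DBSCAN applied to one user's tweet locations. It assumes the points are already projected to eastings and northings in metres, so plain Euclidean distance is meaningful. The names `region_query`, `dbscan` and `centroids` are illustrative, the brute-force neighbour search is chosen for readability, and none of this is claimed to be the project's actual implementation.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (itself included)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, minpts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbours = region_query(points, i, eps)
        if len(neighbours) < minpts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= minpts:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels


def centroids(points, labels):
    """Map cluster id -> (mean x, mean y, point count), ignoring noise."""
    sums = {}
    for (x, y), label in zip(points, labels):
        if label == NOISE:
            continue
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n, n) for label, (sx, sy, n) in sums.items()}


# Toy data: two dense groups and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (100, 100), (101, 100), (100, 101), (500, 500)]
labels = dbscan(points, eps=2, minpts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
print(centroids(points, labels))
```

The centroid and count of each cluster correspond to the kind of per-cluster summary a table of anchor points would hold; the isolated point is left as noise.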
Legend: Raw Data, Cluster Centroid, Noise
Cluster_id  Northing  Easting  Count  Type
60033_1     105?31    530?02   28     Residential
60022_2     104?41    530?94   4      Residential
60033_6     182?46    532?10   13     Commercial
60033_13    104?56    531?17   3      Commercial
60033_15    179?30    533?95   3      Commercial
60033_21    165?47    532?51   3      Commercial
Most likely lives here: “Dominant Residential Cluster”
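One plausible reading of "dominant" is the residential cluster with the most tweets. The sketch below applies that rule to the cluster ids, counts and types from the table above; coordinates are left out because they are partly masked in the source. The selection rule and the function name `dominant_residential_cluster` are assumptions, and the slides do not spell out the exact criterion.

```python
def dominant_residential_cluster(clusters):
    """Return the residential cluster with the highest tweet count, or None."""
    residential = [c for c in clusters if c["type"] == "Residential"]
    if not residential:
        return None  # user has no cluster classified as residential
    return max(residential, key=lambda c: c["count"])


# Cluster ids, counts and types as listed in the example table.
clusters = [
    {"cluster_id": "60033_1", "count": 28, "type": "Residential"},
    {"cluster_id": "60022_2", "count": 4, "type": "Residential"},
    {"cluster_id": "60033_6", "count": 13, "type": "Commercial"},
    {"cluster_id": "60033_13", "count": 3, "type": "Commercial"},
    {"cluster_id": "60033_15", "count": 3, "type": "Commercial"},
    {"cluster_id": "60033_21", "count": 3, "type": "Commercial"},
]

home = dominant_residential_cluster(clusters)
print(home["cluster_id"])  # 60033_1
```

Note that the busiest commercial cluster (13 tweets) is ignored even though it outranks the second residential cluster, because only residential clusters are candidates for the inferred home location.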
Time of day profile by address type
Geolocated penetration rates* by local authority
* Dominant residential cluster with date range of at least one month
Student mobility
Conclusions
• Twitter may be useful for identifying short-term mobility patterns
• DBSCAN can identify anchor points and AddressBase can classify them
• Results are indicators, NOT estimates, but it may be possible to produce new population statistics on a de facto basis
• Twitter could help inform public policy but we need to be extremely alert to source changes.
Next Steps
• Technical Report to be published shortly
• Developing methods for inferring socio-demographic characteristics
• Development of an estimation framework (including a benchmarking survey)