Date post: | 06-Apr-2017 |
Category: |
Data & Analytics |
Upload: | ben-carls |
@L_Tron_CTA: A Friendly Bot with an Eye on Chicago’s ‘L’
Ben Carls
The Chicago Transit Authority (CTA) operates the ‘L’ (elevated)
• Overwhelming amount of data exists for describing the system
• CTA Twitter account is still operated by a person in a control room
• Could we do better?
A Twitter bot sends out timely information
Pulls data from sources
Analyzes data, finds what’s important
Creates sentence and posts it to Twitter
Famous examples
Wealth of structured data exists for the ‘L’ amongst other things in Chicago
Okay! Okay! Most of this is irrelevant! How do I quickly find out what actually matters?
What kinds of events impact train travel and are worth mentioning? Chicago Cubs’ games?
Daily ridership for Addison Stop (Red), right where the Chicago Cubs play
A random forest model of ridership showed that baseball matters, so the bot tweets about it
Trained on 2011-2013, tested on 2014-2015
Features here: day of the week and day of the year
Same random forest model and split (trained on 2011-2013, tested on 2014-2015), now with three features: day of the week, day of the year, and whether there was a Cubs game that day
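The three features and the year-based split described on these slides could be prepared roughly as follows. This is only a sketch: the function names, the `game_days` set, and the ridership numbers are made up for illustration and are not taken from the talk.

```python
from datetime import date

def build_features(day, game_days):
    """Return [day_of_week, day_of_year, is_game_day] for one calendar date."""
    return [day.weekday(), day.timetuple().tm_yday, int(day in game_days)]

def split_by_year(rows, train_years, test_years):
    """Split (date, ridership) rows into train and test sets by calendar year."""
    train = [row for row in rows if row[0].year in train_years]
    test = [row for row in rows if row[0].year in test_years]
    return train, test

# Made-up values purely for illustration, not real CTA ridership.
rows = [
    (date(2012, 7, 14), 9500),
    (date(2013, 1, 8), 4100),
    (date(2014, 7, 12), 9800),
]
game_days = {date(2012, 7, 14), date(2014, 7, 12)}  # hypothetical home games

# Train on 2011-2013, test on 2014-2015, as on the slide.
train, test = split_by_year(rows, range(2011, 2014), range(2014, 2016))
X_train = [build_features(day, game_days) for day, _ in train]
y_train = [riders for _, riders in train]
```

Lists like `X_train` and `y_train` could then be handed to a random forest regressor (for example scikit-learn's `RandomForestRegressor`, if that is the library in use; the slides do not say). Dropping the third column gives the two-feature variant for comparison.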
‘L’ Tron works 24/7 on an EC2 instance
Thread 1: Every 5 minutes
• Query CTA server for data via API
• Compare data to timetable and look for delays
• Search for other events (e.g. baseball), compare to ridership model
Thread 2: Someone talks to ‘L’ Tron
• Find what the person wants
• Look for data from Thread 1 to respond with
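One way to picture the two threads is a poller that refreshes a shared snapshot and a responder that reads from it. The sketch below assumes a lock-protected shared object; `fetch_train_data` is a hypothetical stand-in that returns canned data instead of calling the real CTA API, and the reply wording is invented.

```python
import threading

class SharedState:
    """Latest polled data, guarded by a lock so both threads can use it."""
    def __init__(self):
        self._lock = threading.Lock()
        self._latest = None

    def update(self, data):
        with self._lock:
            self._latest = data

    def latest(self):
        with self._lock:
            return self._latest

def fetch_train_data():
    """Hypothetical stand-in for the CTA API query; returns canned data."""
    return {"Blue": {"destination": "O'Hare", "delay_minutes": 12}}

def poll_forever(state, stop, interval_seconds=300):
    """Thread 1: refresh the shared data, then wait five minutes."""
    while True:
        state.update(fetch_train_data())
        if stop.wait(interval_seconds):  # True as soon as stop is set
            break

def respond(state, route_name):
    """Thread 2: answer a question about one route from Thread 1's data."""
    data = state.latest()
    if data is None or route_name not in data:
        return f"No data for the {route_name} line yet."
    info = data[route_name]
    return (f"{route_name} line toward {info['destination']}: "
            f"about {info['delay_minutes']} min late.")

state = SharedState()
stop = threading.Event()
poller = threading.Thread(target=poll_forever, args=(state, stop), daemon=True)
poller.start()
stop.set()     # end the demo after the first polling cycle
poller.join()
print(respond(state, "Blue"))
```

In a long-running bot the stop event would stay unset, so the poller keeps cycling while replies are served from whatever snapshot is current.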
What should I tweet to my audience?
• Find a line delay > 5 minutes? Yes: Tweet it out! No: check the next question
• Is there a baseball game? Yes: Tweet it out! No: check the next question
• Does the system look okay? Yes: Tweet it out! No: Tweet it out!
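The decision flow on this slide can be read as a short chain of checks. The sketch below assumes the checks run in the order delay, baseball game, overall system status, and that every branch ends in a tweet; the function name and the returned labels are hypothetical.

```python
def choose_tweet_topic(max_delay_minutes, game_today, system_ok):
    """Walk the checks in order and return which kind of tweet to send."""
    if max_delay_minutes > 5:   # a line is more than 5 minutes behind
        return "delay"
    if game_today:              # e.g. a Cubs home game
        return "game"
    # No delay and no game: report on the system as a whole.
    return "all_clear" if system_ok else "system_issue"

print(choose_tweet_topic(12, game_today=True, system_ok=True))
```

Putting the delay check first means a delayed line takes priority over a game-day notice, which matches the order in which the questions appear on the slide.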
Following from Thread 1:
Language generation starts with a large, human-written corpus
"[route_name] line trains on their way toward [destination] are running roughly [delay_minutes] [minute_s] late."
"[route_name] line trains on their way toward [destination] have fallen roughly [delay_minutes] [minute_s] behind schedule."
"[route_name] line trains on their way to [destination] are running roughly [delay_minutes] [minute_s] late."
"[destination] headed [route_name] line trains have fallen roughly [delay_minutes] [minute_s] behind schedule."
"[destination] bound [route_name] line trains are running roughly [delay_minutes] [minute_s] behind schedule."
"[destination] bound [route_name] line trains have fallen roughly [delay_minutes] [minute_s] behind schedule."
Each tweet template is categorized for a particular use case
A template is chosen at random and filled in as needed
"[destination] bound [route_name] line trains are running about [delay_minutes] [minute_s] behind schedule."
"O’Hare bound Blue line trains are running about 12 minutes behind schedule."
If a delay of 12 minutes is found on the O’Hare bound Blue line, those details are inserted into the template
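Picking and filling a template can be sketched with plain string replacement. The placeholder names come from the slides; the function names, the two-template list, and the singular/plural handling of `[minute_s]` are assumptions for illustration.

```python
import random

# Two of the delay templates, using the bracketed placeholders from the slides.
DELAY_TEMPLATES = [
    "[destination] bound [route_name] line trains are running about "
    "[delay_minutes] [minute_s] behind schedule.",
    "[route_name] line trains on their way to [destination] are running "
    "roughly [delay_minutes] [minute_s] late.",
]

def fill_template(template, route_name, destination, delay_minutes):
    """Replace each [placeholder] in the template with its value."""
    values = {
        "route_name": route_name,
        "destination": destination,
        "delay_minutes": str(delay_minutes),
        # Assumed meaning of [minute_s]: singular or plural unit.
        "minute_s": "minute" if delay_minutes == 1 else "minutes",
    }
    text = template
    for key, value in values.items():
        text = text.replace(f"[{key}]", value)
    return text

def compose_delay_tweet(route_name, destination, delay_minutes, rng=random):
    """Choose a template at random and fill it in."""
    template = rng.choice(DELAY_TEMPLATES)
    return fill_template(template, route_name, destination, delay_minutes)

print(fill_template(DELAY_TEMPLATES[0], "Blue", "O'Hare", 12))
```

Passing a seeded `random.Random` instance as `rng` makes the template choice repeatable, which is handy when checking the wording of each variant.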
‘L’ Tron - CTA is alive and tweeting!
I lived here
I worked here
High-resolution imaging detectors for particles and 3D data visualizations
Looking for the Higgs boson at Fermilab