Real-time Stream Processing Architecture for Comcast IP Video
Strata Conference + Hadoop World 2013
Chris Lintz
Gabriel Commeau
o Comcast VIPER Overview
o Architecture Overview
o Q & A
Agenda
Comcast Video IP Engineering and Research (VIPER)
Packaging
Origination
Storage
Transcoding
iOS
Android
Xbox Live
Samsung
Storm
Why Do We Focus on Real-time?
• Proactively diagnose issues
• Form real-time intelligence
• Help deliver best possible video experience
Prime Time
Viewership
Video Player Analytics Protocol
• Live and On Demand
• JSON event objects
• Key metrics
• Bitrate
• Frame rate
• Fragments
• Errors
We collect and use all data in accordance with best consumer privacy practices and applicable laws
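The slide lists what the protocol carries but not what a single event looks like. The following is a minimal sketch of one possible JSON event object and a parser that checks it. Every field name here (`session_id`, `bitrate_kbps`, and so on) is an assumption made for illustration, not the actual protocol schema.

```python
import json

# Hypothetical event shape: every field name below is an assumption for
# illustration, not the actual Video Player Analytics Protocol schema.
REQUIRED_FIELDS = {"session_id", "timestamp_ms", "stream_type", "event_type"}


def parse_event(raw: str) -> dict:
    """Decode one JSON event and check that the required fields are present."""
    event = json.loads(raw)
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event is missing fields: {sorted(missing)}")
    if event["stream_type"] not in ("live", "on_demand"):
        raise ValueError(f"unknown stream_type: {event['stream_type']!r}")
    return event


sample = json.dumps({
    "session_id": "abc-123",
    "timestamp_ms": 1382976000000,
    "stream_type": "live",
    "event_type": "bitrate_change",
    "bitrate_kbps": 3500,
    "frame_rate_fps": 29.97,
    "fragments_downloaded": 42,
    "errors": [],
})

event = parse_event(sample)
```

Carrying a session identifier on every event is what would let downstream tiers group events into player sessions.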
Player Sessions: Key In Understanding Video Experience
High Level Architecture And Data Flow
o Collect, aggregate and move large amounts of data
o Distributed, scalable, reliable, customizable
o Multi-tier architecture
Flume: Data collection Tier
Storm: Stream Processing Tier
o Sessions in Flume?
  • Technical issues: consistent hash and exactly-once semantics
  • Design goals
  • Separation of concerns
o Session write-through rate?
Player Sessions in Real-time
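Building sessions requires every event of a given session to reach the same worker, which is where consistent hashing comes in. The sketch below shows the general technique with a small hash ring. It is not the implementation from the talk, and the worker names and the `HashRing` class are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, session_id: str) -> str:
        """Return the owner of the first ring point clockwise from the key."""
        idx = bisect.bisect(self._points, _hash(session_id)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["worker-1", "worker-2", "worker-3"])
owner = ring.node_for("session-abc-123")
```

The useful property is that removing a worker only moves the sessions that worker owned; every other session keeps its owner.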
o Analytics events over HTTPS
o HTTP Source
o Re-batch with inner sink and source
Flume Edge Tier: Video Player Analytics End Point
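One plausible reading of "re-batch" is that many small posts from players are regrouped into larger, uniform batches before being forwarded. The sketch below illustrates that idea in plain Python under that assumption. It does not use Flume APIs, and the `rebatch` function and the batch size are hypothetical.

```python
from typing import Iterable, Iterator, List


def rebatch(batches: Iterable[List[dict]], batch_size: int) -> Iterator[List[dict]]:
    """Regroup incoming batches of any size into batches of `batch_size` events."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer: List[dict] = []
    for batch in batches:
        for event in batch:
            buffer.append(event)
            if len(buffer) == batch_size:
                yield buffer
                buffer = []
    if buffer:
        yield buffer  # flush the final partial batch


incoming = [[{"n": 1}], [{"n": 2}, {"n": 3}], [{"n": 4}], [{"n": 5}]]
outgoing = list(rebatch(incoming, batch_size=2))
```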
o Video Player Event processing
  • Geo-location, asset metadata, validation, to-storm
o Replication channel processor:
  • HDFS sink
  • Storm sink
Flume Mid Tier: Processing and Routing Data
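The mid tier does two things: it validates and enriches each event, then replicates it so that both the HDFS sink and the Storm sink receive a copy. The sketch below mimics that flow in plain Python rather than with Flume APIs. The lookup tables, field names, and the `enrich` and `replicate` functions are all hypothetical stand-ins.

```python
from typing import Dict, List, Optional

# Hypothetical lookup tables standing in for geo-location and asset-metadata services.
GEO_BY_IP_PREFIX = {"10.1.": "Philadelphia, PA", "10.2.": "Denver, CO"}
ASSET_TITLES = {"asset-1": "Evening News"}


def enrich(event: dict) -> Optional[dict]:
    """Validate an event, then attach geo-location and asset metadata."""
    if "session_id" not in event or "client_ip" not in event:
        return None  # invalid events are dropped
    enriched = dict(event)
    enriched["geo"] = next(
        (place for prefix, place in GEO_BY_IP_PREFIX.items()
         if event["client_ip"].startswith(prefix)),
        "unknown",
    )
    enriched["asset_title"] = ASSET_TITLES.get(event.get("asset_id"), "unknown")
    return enriched


def replicate(event: dict, channels: Dict[str, List[dict]]) -> None:
    """Replicating fan-out: every channel receives its own copy of the event."""
    for queue in channels.values():
        queue.append(dict(event))


channels = {"hdfs": [], "storm": []}
raw_events = [
    {"session_id": "s1", "client_ip": "10.1.0.7", "asset_id": "asset-1"},
    {"client_ip": "10.2.0.9"},  # no session_id, so validation rejects it
]
for raw in raw_events:
    enriched = enrich(raw)
    if enriched is not None:
        replicate(enriched, channels)
```

Replicating rather than routing means the archival path and the real-time path see the same enriched events.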
o Service discovery
o Distributed, scalable and reliable
o Low latency
Bridging Flume to Storm: Flume2Storm Connector
Simplified Video Player Storm Topology
o Functionality beyond key/value stores
o Real-time and historic window queries
o Speed of in-memory writes and durability of disk
Requirements for Read/Writes from Storm Bolts
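A window query is the kind of read that a plain key/value store does not handle well. The sketch below shows a real-time window and a historic window computed with the same SQL. It uses Python's built-in in-memory SQLite purely as a stand-in for the SQL store, and the `player_events` table, its columns, and the timestamps are hypothetical.

```python
import sqlite3

# In-memory SQLite is only a stand-in here; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_events ("
    " session_id TEXT, ts INTEGER, bitrate_kbps INTEGER)"
)
conn.executemany(
    "INSERT INTO player_events VALUES (?, ?, ?)",
    [
        ("s1", 100, 1000),  # historic
        ("s2", 950, 2000),  # inside the last 60 seconds
        ("s1", 990, 4000),  # inside the last 60 seconds
    ],
)


def avg_bitrate(conn, start_ts, end_ts):
    """Average bitrate over the half-open time window [start_ts, end_ts)."""
    row = conn.execute(
        "SELECT AVG(bitrate_kbps) FROM player_events WHERE ts >= ? AND ts < ?",
        (start_ts, end_ts),
    ).fetchone()
    return row[0]


now = 1000
realtime = avg_bitrate(conn, now - 60, now)  # recent window
historic = avg_bitrate(conn, 0, now - 60)    # everything older
```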
Utilizing MemSQL for Persistence
• Distributed in-memory SQL database
• ACID, highly available, fault tolerant
• Aggregators route queries to leaves
• Leaves are auto-sharded
• Solves our intense read/writes
Isolated Analysts and Ingest Aggregators
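The aggregator and leaf roles can be pictured as scatter-gather over shards. The toy model below is a sketch of that idea only: the `Leaf` and `Aggregator` classes, the CRC32 shard function, and the row fields are assumptions for illustration and say nothing about how MemSQL implements sharding internally.

```python
import zlib


class Leaf:
    """Holds one shard of the rows."""

    def __init__(self):
        self.rows = []

    def count_errors(self):
        return sum(1 for row in self.rows if row["event_type"] == "error")


class Aggregator:
    """Routes writes to a leaf by shard key and fans reads out to all leaves."""

    def __init__(self, leaves):
        self.leaves = leaves

    def insert(self, row):
        # Hash the shard key so rows of one session always land on the same leaf.
        shard = zlib.crc32(row["session_id"].encode("utf-8")) % len(self.leaves)
        self.leaves[shard].rows.append(row)

    def count_errors(self):
        # Scatter the query to every leaf, then combine the partial results.
        return sum(leaf.count_errors() for leaf in self.leaves)


cluster = Aggregator([Leaf() for _ in range(4)])
for i in range(100):
    cluster.insert({
        "session_id": f"s{i % 10}",
        "event_type": "error" if i % 25 == 0 else "heartbeat",
    })
total_errors = cluster.count_errors()
```

Keeping separate aggregators for analysts and for ingest, as the slide title suggests, would let heavy ad hoc queries run without competing with the write path at the routing layer.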
Achievements In Utilizing MemSQL
• Complex queries in milliseconds
• Fault-tolerant Storm bolt state
• Joins now available outside of Storm bolts
• Foreign key shards
• Complex data streams
• Dynamic alters without locks or downtime
• JSON type
Wrapping Up
o Real-time at Comcast scale
  • Millions of video players
  • Horizontal scale everywhere
  • Aggregated metrics across US and complex analysis
  • Real-time API
o Builds foundation
  • Advanced real-time analytics
  • Better platform for innovation
    – Alerts on complex objects
    – Supplemental real-time data back to clients
    – Popularity-based CDN