Date posted: 19-Jan-2016
Data Management: Challenges and Answers
Andras Benczur
Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI)
[email protected]
http://datamining.sztaki.hu
November 11, 2025, Jövő Internet Konferencia (Future Internet Conference)

Growth of Data vs. Growth of Data Analysts
http://www.delphianalytics.net/wp-content/uploads/2013/04/GrowthOfDataVsDataAnalysts.png
2012.06: http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
2013.02: http://www.slideshare.net/mjft01/big-data-big-deal-a-big-data-101-presentation

SAP HANA Demo: Meryl Streep, Oscar 2012
Image: http://mirror.co.uk

What was The Challenge in Analytics?
One-year collection of 1 billion Tweets, 100 GB
Ad hoc queries (Meryl Streep) may have 100,000+ hits
Fast response needed to support the analyst

Solutions
In-memory databases (such as SAP HANA)
Customized approximate data structures (Bloom filters, MinHash fingerprints)
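
To illustrate the kind of approximate data structure named above, here is a minimal Bloom filter sketch in Python. It is not the implementation used in the demo. The bit-array size, the number of hash functions, the SHA-256 double-hashing scheme and the sample terms are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        # num_bits and num_hashes are illustrative defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage: remember which terms occurred in a tweet collection.
terms = "meryl streep wins best actress".split()
seen = BloomFilter()
for term in terms:
    seen.add(term)
```

A lookup such as `"streep" in seen` touches only a few bits, whatever the size of the collection. The price is a tunable false-positive rate.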
Information spread prediction in time
Collaboration w/ Ericsson on Spark Piggybacking
Hard to predict resource usage: overestimation hurts Spark
Peaks may cause out-of-memory errors: Spark will fail
[Figure: resources over time, showing the resource allocated by the job, the current consumption, the out-of-memory error, and under-utilization (waste of resources)]

Data Stream Analytics: Mobile Session Drop
Periodic radio channel physical parameter measurements
Predict abnormal termination

Best features selected by AdaBoost:

Feature                              Drop       Not drop
1. RLC NACK ratio Upl. Max           0.93447    0.12676
2. RLC NACK ratio Upl. Mean          0.11787    -0.11571
3. Harq NACK ratio Downl. Max        0.02061    0.00619
4. Delta RLC NACK ratio Upl. Mean    0.19277    0.18110
5. Signal-Interf+Noise Mean          1.92105    6.61538

Dynamic Time Warping (figure with index labels i and i+2)
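
As a rough illustration of Dynamic Time Warping, the sketch below computes the textbook DTW distance between two numeric series with an absolute-difference cost. The slides do not say which variant or cost function was used for the session-drop features, so this is an assumption. The sample series are made up.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping, O(len(a) * len(b)) time and memory."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier point of b
                cost[i][j - 1],      # b[j-1] also matches an earlier point of a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Made-up series: the second is the first with one sample repeated in time.
original = [0.0, 1.0, 2.0]
delayed = [0.0, 0.0, 1.0, 2.0]
shift_distance = dtw_distance(original, delayed)
```

Because one point may be matched to several points of the other series, the delayed copy is at distance zero from the original. A point-by-point comparison would not even be defined for series of different lengths.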
AUC: 0.93
FPR: 0.03, TPR: 0.7
FPR: 0.2, TPR: 0.89
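
Each (FPR, TPR) pair above corresponds to one decision threshold on the classifier score. The sketch below shows how such an operating point is computed. The labels, scores and threshold are toy values invented for illustration, not the session-drop data.

```python
def operating_point(labels, scores, threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted as a drop.

    Assumes labels contains at least one positive (1) and one negative (0).
    """
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_drop = score >= threshold
        if label == 1 and predicted_drop:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_drop:
            fp += 1
        else:
            tn += 1
    fpr = fp / (fp + tn)  # share of normal sessions falsely flagged
    tpr = tp / (tp + fn)  # share of dropped sessions that were caught
    return fpr, tpr


# Toy data: 1 = session dropped, 0 = session ended normally.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.05]
fpr, tpr = operating_point(toy_labels, toy_scores, threshold=0.5)
```

Lowering the threshold moves along the ROC curve toward a higher TPR at the cost of a higher FPR. That is the trade-off between the two operating points listed above.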

STREAMLINE Magic Triangle
New initiative on top of Apache Flink
DFKI (DE)
SICS (SE)
Portugal Telecom (PT)
Internet Memory (FR)
Rovio (FI)
NMusic (PT)
SZTAKI (HU)

Batch and Streaming in Flink
A "use-case complete" framework to unify batch and stream processing
[Figure: event logs and historic data feeding ETL, relational, graph analysis, machine learning and streaming analysis]
[Figure: Flink at the center. Inputs: real-time data streams and event logs (Kafka, RabbitMQ, ...), historic data (HDFS, JDBC, ...). Workloads: ETL, graphs, machine learning, relational; low-latency windowing, aggregations, ...]
Batch and stream: same execution engine
An engine that puts equal emphasis on streaming and batch

Bike Share Challenge: one more month to go!

Questions?
New architecture for unified batch + stream needed
Apache Flink has the potential
New machine learning is needed
We participate in turning research codes to open source software