
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks

Date post: 28-Jul-2015
Upload: mapr-technologies
© 2015 MapR Technologies
Exploring Enterprise Networks with Familiar BI Tools
June 2015
Nick Amato, Director, Technical Marketing, MapR Technologies
Transcript

1. Exploring Enterprise Networks with Familiar BI Tools

2. On the Menu
  - Discovery: why Hadoop + BI tools for analyzing networks?
  - Network analysis in a BI context
  - Apache Drill
  - Connecting BI tools to network data
  - Practical examples with Drill and BI
  - Querying packets with Tableau
  - Troubleshooting with SAP Lumira
  - Gaining insight into customer experience across multiple sources
  - Using built-in Drill features for faster analysis
  - Summary, conclusions, more resources

3. Topics not covered in detail
  - Packet capture architectures
  - Ways to capture packets effectively
  - Large-scale packet processing (others have done this)
  - Comparison of BI tools
  - Survey of the best SQL-on-Hadoop technology

4. There's a lot happening in your network
  - Packets, logs, interconnections
  - Many layers (L1-L7), L8
  - Network data is multi-faceted
  - It's serialized and highly structured
  - It facilitates communication between heterogeneous devices via common protocols
  - But it's not structured to be stored and analyzed
  - The application often doesn't care
  - Consequently, specialized tooling and software is required

5. Why Hadoop + BI tools?
  - What does Hadoop enable that makes it a powerful tool for network analytics?
  - What's new that wasn't previously possible/desirable?
  - How does it augment existing solutions?
  - It's many things:
    - New ways of accessing semi-structured data from the network
    - Offloading of existing data warehouses and tools
    - Combining, joining, blending network captures with other sources
  - Many network tools cannot answer questions about your business and customers
  - You can use SQL to get a lot of the answers you need

6. New Data Sources Unlock New Insights & Apps
  - Existing structured data: well-defined and well-understood schema
    - OLTP data
    - Data warehouse data
    - End user data stores (e.g., Excel)
  - New multi-structured data: typically un-modeled, different in format
    - Network data
    - Clickstream data
    - Sensor data
    - Rich media (e.g., audio, video)
    - Documents
  - Both types are needed today for deeper insights

7. Network data, like other data, is increasingly stored in non-relational datastores
  Timeline axis: 1980, 1990, 2000, 2010, 2020

                Relational databases                      Non-relational datastores
  Volume        GBs-TBs                                   TBs-PBs
  Database      Fixed schema; DBA controls structure      Dynamic / flexible schema; application controls structure
  Structure     Structured                                Structured, semi-structured and unstructured
  Development   Planned (release cycle = months-years)    Iterative (release cycle = days-weeks)

8. Apache Drill Brings Flexibility & Performance
  - Access to any data type, any data source: relational, nested data, schema-less
  - Rapid time to insights: query data in-situ, no schemas required, easy to get started
  - Integration with existing tools: ANSI SQL, BI tool integration
  - Scale in all dimensions: TB-PB of scale, 1000s of users, 1000s of nodes
  - Granular security: authentication, row/column level controls, de-centralized

9. Granular security permissions through Drill views

  Raw file (/raw/cards.csv). Owner: Admins. Permission: Admins.

  Name   City       State   Credit Card #
  Dave   San Jose   CA      1374-7914-3865-4817
  John   Boulder    CO      1374-9735-1794-9711

  Data Scientist view (/views/maskedcards.csv), not a physical data copy. Owner: Admins. Permission: Data Scientists.

  Name   City       State   Credit Card #
  Dave   San Jose   CA      1374-1111-1111-1111
  John   Boulder    CO      1374-1111-1111-1111

  Business Analyst view. Owner: Admins. Permission: Business Analysts.

  Name   City       State
  Dave   San Jose   CA
  John   Boulder    CO

10. Self-Service Data Exploration
  - Direct access to Hadoop data from familiar BI / analytics tools; ANSI SQL compatible
  - Ad-hoc reporting queries
  - Raw data exploration
  - Day Zero queries

11. Drill is a Distributed SQL query engine
  - Diagram: three nodes, each with a drillbit and a DataNode/RegionServer; three ZooKeeper instances
  - Scale out
  - Columnar and vectorized execution
  - Optimistic and pipelined execution (no MR, Spark, Tez)
  - Late binding
  - Extensible

12. Run SQL on Captures Directly
  SELECT * FROM dfs.router1.`captures.json`
  - Storage plugin instance (dfs): DFS (Text, Parquet, JSON); HBase/MapR-DB; Hive Metastore/HCatalog; easy API to go beyond Hadoop
  - Workspace (router1): sub-directory; HBase namespace; Hive database
  - Table (`captures.json`): pathnames; Hive table; HBase table

13. Network Analytics in a BI Context
  - Getting results from BI tools requires:
    - SQL expertise
    - Analytic techniques, visualizations, dashboarding
    - Proprietary information about your operations
    - Making sense of sources quickly
  - New SQL-on-Hadoop technologies (like Drill) enable leveraging this:
    - To find new areas to gain value from combining your own proprietary data with network sources
    - Augment the analysis you're doing now via use cases for packet data you're already storing in Hadoop
    - Use data in real-time that's too large to fit into memory and/or hits BI tool limitations for analysis directly

14. Hadoop Packet Processing Ecosystem
  - Translating to various formats: JSON, CSV, Parquet, others
  - Packet ingestion: Flume tcpdump source; direct from hardware vendors
  - Northbound APIs: OpenStack and OpenDaylight
  - More open source tools: packet processing in Pig, etc.

15. Network Data Sources
  - Data sources in the network are growing, changing
    - Existing: tcpdump, SPAN, pcap
    - New and more: SDN, NFV, REST APIs
  - Often not suitable for analysis directly
    - Requires building a schema, ETL
    - Structure is changing and evolving: ongoing management
    - Large size, too big for memory

16. REST APIs and JSON
  - Self-describing data is common with REST APIs: JSON
  - Northbound APIs on almost everything in the network
  - Enables access to many operational views
  - But requires development work to pull it together
  - SQL queries directly on the data are difficult
  - Requires transformations, scripting, parsing

17. View Drillbits information in the cluster

18. Manage storage plugin instances through Web UI

19. Monitor and manage Drill queries

20. See details of the query

21. SAP Lumira and Wireshark Example: Scenario
  - Overview:
    - Sensor data in JSON format being gathered multiple times daily from remote locations
    - Done over an IP network; each sensor has an IP address
  - Problem:
    - One sensor is experiencing reading failures
    - Network connectivity issues are suspected
  - Solution approach:
    - Take packet captures where we are reading sensors (central location)
    - CSV-formatted Wireshark file
    - Observe whether there are many TCP retransmissions happening between the source and destination
    - Ultimately, determine if the network is the problem and take action

22. (no text on this slide)

23. Summary
  - Using Drill from SAP Lumira, and the JDBC driver
  - We compared data across multiple sources
  - Notice we didn't do any ETL
  - Or define any schema for the network data
  - Using existing ANSI SQL knowledge to query the data without transformations
  - Not just on the network data, but combined with other sources
  - Self-service

24. (no text on this slide)

25. Network Routing, OpenStack, JSON
  - Link-state routing protocols (OSPF, IS-IS, TRILL)
    - Each participating node knows the topology of the entire network
    - A dump of the database shows all nodes and adjacencies
    - Physical and logical topology
    - Other information (MPLS, etc.)
  - OpenStack: pull networks, subnets, ports via REST API
  - Use Drill Explorer to build a view
  - Combine the data with device or customer information
  - Enables visualizing the entire network quickly

26. OpenStack Networking APIs Example
  - JSON-formatted responses
  - Run queries without any data preparation
  - Use of FLATTEN() for arbitrary maps

27. FLATTEN()
  - FLATTEN() is useful for exploration of data that is repeated
  - Used on arrays
  - Columns are repeated as necessary to maintain association with each element of the array
  - Example:
      "host_routes": [
        { "destination": "0.0.0.0/0", "nexthop": "10.10.10.1" },
        { "destination": "192.168.10.0/24", "nexthop": "192.168.0.1" }
      ]

28. (no text on this slide)

29. TCP Round-Trip Times Example
  - TCP RTT can affect customer experience in many ways
  - Not just loading pages
  - Also interactive, AJAX, forms, etc.
  - Much of this can be calculated with other tools, then visualized
  - Complex to calculate on your own
  - Only a part of the overall performance story, but helpful
  - Example: switching network providers, adding caches or optimizers

30. (no text on this slide)

31. Summary and Conclusions
  - New SQL-on-Hadoop technologies enable network analysis in a BI context
    - Less time making schema, fewer requirements
    - Easily supplement existing analysis
    - Less need for specialized tools
  - Apache Drill reduces the time required to get answers
    - JSON analysis in place
    - Interactive queries and dashboards
  - Integrated with BI tools out of the box: Tableau, MicroStrategy, QlikView, others
  - More examples on GitHub: mapr-demos
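
The FLATTEN() slide says that columns are repeated as necessary so that each array element keeps its association with the rest of the record. A minimal Python sketch of that behaviour follows. It is an illustration of the idea only, not Drill's implementation: the `flatten` helper and the `name` field are hypothetical, and the record is modelled on the host routes example from the slide.

```python
import json

# Hypothetical subnet record modelled on the host routes example in the slides.
# The "name" field is an assumption added so there is a column to repeat.
SUBNET_JSON = """
{
  "name": "subnet-a",
  "host_routes": [
    {"destination": "0.0.0.0/0", "nexthop": "10.10.10.1"},
    {"destination": "192.168.10.0/24", "nexthop": "192.168.0.1"}
  ]
}
"""


def flatten(record, array_key):
    """Return one row per element of record[array_key].

    Every other column is copied into each row, which is the
    "columns are repeated as necessary" behaviour the slide describes.
    """
    other_columns = {k: v for k, v in record.items() if k != array_key}
    return [{**other_columns, array_key: element} for element in record[array_key]]


rows = flatten(json.loads(SUBNET_JSON), "host_routes")
for row in rows:
    route = row["host_routes"]
    print(row["name"], route["destination"], route["nexthop"])
```

Each output row carries the subnet name alongside a single route, which is the shape a BI tool needs in order to treat the routes as ordinary table rows.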

