Interactively Search and Visualize Your Big Data

Posted on 02-Jul-2015


description

Open up your data to your whole user base! Unlike programming or SQL, search is something almost everybody knows how to do. Through an interactive demo based on the open source Hue, this talk shows how users can graphically search their data in Hadoop. It explains the underlying technical details of the application and how it interacts with Apache Solr. The session details how to start indexing data in just a few clicks and walks through several data analysis scenarios. Attendees will see how to explore and visualize data for quick answers, all from a web browser. The new search dashboard in Hue, with its draggable charts and dynamic interface, lets any non-technical user look for documents or patterns. Attendees of this talk will learn how to get started with interactive search visualization on their Hadoop cluster.

transcript

Interactively search and visualize your data
Romain Rigaux

Goals

- Build a Web app
- Quickly explore data

… with Solr

Hue: make Solr / Hadoop easier to use


Architecture

REST

“Just a view” on top of the standard Solr API
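Because the app is "just a view" on the standard Solr API, every search ultimately becomes an ordinary Solr /select request. The sketch below builds such a URL in Python; it is an illustration, not Hue's code, and the helper name `solr_select_url`, the local address on Solr's default port 8983 and the `jobs_demo` collection are assumptions.

```python
from urllib.parse import urlencode


def solr_select_url(solr_base, collection, q="*:*", fqs=(), rows=10):
    """Build a plain Solr /select URL (hypothetical helper, not part of Hue)."""
    params = [("q", q), ("wt", "json"), ("rows", str(rows))]
    # Each filter query becomes its own repeated fq parameter.
    params += [("fq", fq) for fq in fqs]
    return f"{solr_base.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Assumed local Solr on its default port and the jobs_demo collection.
url = solr_select_url(
    "http://localhost:8983/solr", "jobs_demo",
    q="Chrome", fqs=["bytes:[900000 TO 1800000]"],
)
print(url)
```

Keeping filters as a list of `fq` values, rather than folding them into `q`, mirrors how facet selections are usually sent to Solr: each one can be cached and removed independently.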

History: v1 User

History: v1 Admin

Architecture: Next!

Lots of learning; a UX boost was needed

Simple: users don’t need to know it is Solr

History: v2 User

History: v2 Admin

Architecture

/select /admin/collections /get /luke...

/add_widget /zoom_in /select_facet /select_range...

REST AJAX Templates

+ JS Model

www….

Architecture: UI for Facets

- Layout: all the 2D positioning (cell ids), visuals, drag & drop
- Collection: dashboard, fields, template, widgets (ids)
- Query: search terms, selected facets (q, fqs)
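The Layout / Collection / Query split can be pictured as three small data structures. In this sketch, which uses Python dataclasses with hypothetical class, field and collection names, the Query part is what becomes Solr's `q` and `fq` parameters, while the Layout stays purely on the UI side.

```python
from dataclasses import dataclass, field


@dataclass
class Layout:
    # Purely visual: cell id -> widget id; never sent to Solr.
    cells: dict = field(default_factory=dict)


@dataclass
class Collection:
    # Dashboard definition: collection name and the Solr field each widget uses.
    name: str
    widget_fields: dict = field(default_factory=dict)  # widget id -> Solr field


@dataclass
class Query:
    # What the user searched for and which facet values are selected.
    q: str = "*:*"
    fqs: list = field(default_factory=list)  # (field, value) pairs

    def to_solr_params(self):
        """The Query part becomes q plus one fq per selected facet value."""
        params = [("q", self.q)]
        params += [("fq", f'{name}:"{value}"') for name, value in self.fqs]
        return params


layout = Layout(cells={"cell-1": "widget-1"})
collection = Collection(name="taxi", widget_fields={"widget-1": "rate_code"})
query = Query(q="*:*", fqs=[("rate_code", "1")])
print(query.to_solr_params())  # [('q', '*:*'), ('fq', 'rate_code:"1"')]
```

Separating the three means dragging a widget only touches the Layout, while clicking a facet only touches the Query, so only the latter needs a new Solr round trip.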

Adding a widget: life cycle

- Load the initial page
- Edit mode and drag & drop

/solr/zookeeper/clusterstate.json /solr/admin/luke…

/get_collection
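A minimal sketch of reading the two Solr calls made when the page loads, assuming the Solr 4.x `clusterstate.json` layout (one top-level key per collection, each holding a `shards` object) and the `fields` object returned by `/admin/luke`. The helper names and the sample data are hypothetical.

```python
import json


def list_collections(clusterstate_json):
    """Map collection name -> number of shards.

    Assumes the Solr 4.x clusterstate.json layout, where each top-level key
    is a collection holding a "shards" object.
    """
    state = json.loads(clusterstate_json)
    return {name: len(info.get("shards", {})) for name, info in state.items()}


def list_fields(luke_response):
    """Map field name -> Solr field type from a parsed /admin/luke JSON response."""
    return {name: props.get("type") for name, props in luke_response.get("fields", {}).items()}


# Tiny hand-written samples, not real cluster output.
clusterstate = '{"jobs_demo": {"shards": {"shard1": {}, "shard2": {}}}}'
luke = {"fields": {"bytes": {"type": "long"}, "agent": {"type": "text_general"}}}
print(list_collections(clusterstate))  # {'jobs_demo': 2}
print(list_fields(luke))  # {'bytes': 'long', 'agent': 'text_general'}
```

The field types are what later let the UI decide whether a new widget should offer a range facet (numbers, dates) or a plain field facet.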

Adding a widget: life cycle (continued)

/solr/select?stats=true /new_facet

- Select the field
- Guess ranges (numbers or dates)
- Rounding (numbers or dates)
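Guessing and rounding a numeric range can be sketched from the min and max that `stats=true` returns. This version aims for about ten buckets and rounds the gap up to one significant digit, then snaps start and end to multiples of the gap; that rule is an assumption, not necessarily Hue's exact logic, and dates are left out. Helper names and the sample response are hypothetical.

```python
import math


def stats_min_max(solr_response, field_name):
    """Read min/max for one field from a parsed /select?stats=true&stats.field=... response."""
    field_stats = solr_response["stats"]["stats_fields"][field_name]
    return field_stats["min"], field_stats["max"]


def guess_range(stat_min, stat_max, buckets=10):
    """Return (start, end, gap) for a numeric range facet.

    The gap aims for about `buckets` bars and is rounded up to one significant
    digit; start and end are then snapped to multiples of the gap.
    """
    if stat_max <= stat_min:
        return stat_min, stat_min + 1, 1  # degenerate field: single value
    raw_gap = (stat_max - stat_min) / buckets
    magnitude = 10 ** math.floor(math.log10(raw_gap))
    gap = math.ceil(raw_gap / magnitude) * magnitude
    start = math.floor(stat_min / gap) * gap
    end = math.ceil(stat_max / gap) * gap
    return start, end, gap


# Hand-written stats response for the bytes field.
response = {"stats": {"stats_fields": {"bytes": {"min": 0.0, "max": 9000000.0}}}}
print(guess_range(*stats_min_max(response, "bytes")))  # (0, 9000000, 900000)
```

With these sample stats the sketch yields the same start, end and gap as the `bytes` range facet in this deck's query example; rounding the gap keeps the bucket labels readable.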

Adding a widget: life cycle, query part 1

Query part 2

Augment Solr response

facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10

q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]
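These query strings can be generated rather than hand-written. In the sketch below (hypothetical helper names), the `{!tag=bytes}` filter narrows the results, while `{!ex=bytes}` tells Solr to ignore that tagged filter when counting the `bytes` ranges, so the histogram keeps showing every bucket even after one is selected.

```python
from urllib.parse import urlencode


def range_facet_params(field_name, start, end, gap):
    """Per-field facet.range parameters; {!ex=...} ignores the filter tagged with the same name."""
    prefix = f"f.{field_name}.facet"
    return [
        ("facet", "true"),
        ("facet.range", f"{{!ex={field_name}}}{field_name}"),
        (f"{prefix}.range.start", str(start)),
        (f"{prefix}.range.end", str(end)),
        (f"{prefix}.range.gap", str(gap)),
        (f"{prefix}.mincount", "0"),
    ]


def range_filter(field_name, lower, upper):
    """Tagged fq for the selected bucket, so the field's own facet can exclude it."""
    return ("fq", f"{{!tag={field_name}}}{field_name}:[{lower} TO {upper}]")


params = [("q", "Chrome"), range_filter("bytes", 900000, 1800000)]
params += range_facet_params("bytes", 0, 9000000, 900000)
print(urlencode(params))
```

Using the field name as the tag is just a convenient convention here: one tag per widget is enough for each widget to exclude only its own selection.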

{
  'facet_counts': {
    'facet_ranges': {
      'bytes': {
        'start': 10000,
        'counts': [
          '900000', 3423,
          '1800000', 339,
          ...
        ]
      }
    }
  }
}

{
  ...,
  'normalized_facets': [
    {
      'extraSeries': [],
      'label': 'bytes',
      'field': 'bytes',
      'counts': [
        {
          'from': '900000',
          'to': '1800000',
          'selected': True,
          'value': 3423,
          'field': 'bytes',
          'exclude': False
        }
      ],
      ...
    }
  ]
}
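The augmentation step essentially pairs up Solr's flat range `counts` list (alternating bucket start and count, the shape of Solr's default JSON output) and marks the buckets the user has selected. This sketch covers numeric ranges only, assumes a fixed integer gap, and fills just the `field`, `from`, `to`, `value` and `selected` keys; `exclude` and `extraSeries` are left out because the slides do not spell out how they are computed. The function name is hypothetical.

```python
def normalize_range_counts(field_name, counts, gap, selected=()):
    """Pair Solr's flat range-facet list ['900000', 3423, '1800000', 339, ...] into buckets.

    Numeric ranges only; `selected` holds (from, to) string pairs currently used as filters.
    """
    selected = set(selected)
    buckets = []
    for i in range(0, len(counts), 2):
        start, value = counts[i], counts[i + 1]
        end = str(int(start) + gap)  # each bucket spans one gap
        buckets.append({
            "field": field_name,
            "from": start,
            "to": end,
            "value": value,
            "selected": (start, end) in selected,
        })
    return buckets


buckets = normalize_range_counts(
    "bytes", ["900000", 3423, "1800000", 339], gap=900000,
    selected=[("900000", "1800000")],
)
print(buckets[0])
```

Doing this pairing on the server means the JavaScript widgets receive one uniform list of objects instead of having to know Solr's response layout.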

JSON to Widget

{
  "field": "rate_code",
  "counts": [
    {
      "count": 97797,
      "exclude": true,
      "selected": false,
      "value": "1",
      "cat": "rate_code"
    },
    ...
  ]
}

{
  "field": "medallion",
  "counts": [
    {
      "count": 159,
      "exclude": true,
      "selected": false,
      "value": "6CA28FC49A4C49A9A96",
      "cat": "medallion"
    },
    ...
  ]
}

{
  "extraSeries": [],
  "label": "trip_time_in_secs",
  "field": "trip_time_in_secs",
  "counts": [
    {
      "from": "0",
      "to": "10",
      "selected": false,
      "value": 527,
      "field": "trip_time_in_secs",
      "exclude": true
    },
    ...
  ]
}

{
  "field": "passenger_count",
  "counts": [
    {
      "count": 74766,
      "exclude": true,
      "selected": false,
      "value": "1",
      "cat": "passenger_count"
    },
    ...
  ]
}
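Field facets (such as `rate_code` or `passenger_count`) come back from Solr's `facet_fields` as the same kind of flat value/count list. A minimal sketch of turning one into the widget shape, with a hypothetical function name and a hand-written sample; as with the range sketch, only `cat`, `value`, `count` and `selected` are filled, and `exclude` is left out.

```python
def normalize_field_facet(field_name, flat_counts, selected_values=()):
    """Turn Solr's flat facet_fields list ["1", 97797, "2", 100, ...] into widget entries.

    Only cat/value/count/selected are filled; the slides' "exclude" flag is left out.
    """
    selected_values = set(selected_values)
    counts = [
        {
            "cat": field_name,
            "value": flat_counts[i],
            "count": flat_counts[i + 1],
            "selected": flat_counts[i] in selected_values,
        }
        for i in range(0, len(flat_counts), 2)
    ]
    return {"field": field_name, "counts": counts}


# Hand-written sample shaped like a facet_fields entry, not real output.
facet_fields = {"rate_code": ["1", 97797, "2", 100]}
widget = normalize_field_facet("rate_code", facet_fields["rate_code"], selected_values=["2"])
print(widget["counts"][0])
```

Since every widget consumes the same `{"field", "counts"}` envelope, adding a new chart type mostly means writing a new renderer, not new Solr parsing.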

Repeat…

Enterprise features

- Access to the Search app is configurable, with LDAP/SAML authentication
- Share by link
- SolrCloud (or non-Cloud)
- Proxy user
  /solr/jobs_demo/select?user.name=hue&doAs=romain&q=
- Security
  - Kerberos
  - Sentry: collection level, Solr calls like /admin, /query, Solr UI, ZooKeeper
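The proxy-user URL shows Hue calling Solr as the `hue` service user while passing the logged-in user in `doAs`, presumably so that Solr applies that end user's permissions. A minimal sketch of building such a URL; the helper name and the localhost address are hypothetical, and on a Kerberized cluster the service identity would normally come from Kerberos (SPNEGO) rather than from the `user.name` parameter.

```python
from urllib.parse import urlencode


def proxied_select_url(solr_base, collection, end_user, q="*:*", service_user="hue"):
    """Query Solr as the service user while asking it to impersonate the logged-in user."""
    # user.name identifies the caller; doAs names the user being impersonated.
    params = {"user.name": service_user, "doAs": end_user, "q": q}
    return f"{solr_base.rstrip('/')}/{collection}/select?{urlencode(params)}"


print(proxied_select_url("http://localhost:8983/solr", "jobs_demo", "romain", q=""))
```

Impersonation lets a single trusted service account serve many users without storing each user's credentials in Hue.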

Demo: index and visualize taxi data

http://chriswhong.com/open-data/foil_nyc_taxi/
https://archive.org/details/nycTaxiTripData2013 (the torrent download is the better option)
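Indexing the taxi trips amounts to turning each CSV row into a Solr document. A minimal sketch, assuming a header row whose column names resemble those of the 2013 trip files (only a subset is shown), that `passenger_count` and `trip_time_in_secs` are integers, and that pickup times are UTC; the function name, the typing rules and the sample row are hypothetical. The resulting list could then be posted as JSON to the collection's `/update` handler.

```python
import csv
import io
import json

# Hypothetical typing rules for a subset of the 2013 taxi trip columns.
INT_FIELDS = {"passenger_count", "trip_time_in_secs"}


def rows_to_solr_docs(csv_text):
    """Convert CSV rows (with a header line) into Solr documents as plain dicts."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {}
        for name, raw in row.items():
            name, raw = name.strip(), (raw or "").strip()  # header cells may carry stray spaces
            if not raw:
                continue  # leave empty cells out of the document
            if name in INT_FIELDS:
                doc[name] = int(raw)
            elif name.endswith("_datetime"):
                # "2013-01-01 15:11:48" -> Solr date format, assuming the times are UTC
                doc[name] = raw.replace(" ", "T") + "Z"
            else:
                doc[name] = raw
        docs.append(doc)
    return docs


# Hand-written sample row, not real trip data.
sample = (
    "medallion,rate_code,passenger_count,trip_time_in_secs,pickup_datetime\n"
    "6CA28FC49A4C49A9A96,1,1,382,2013-01-01 15:11:48\n"
)
payload = json.dumps(rows_to_solr_docs(sample))  # body for a JSON POST to /update
print(payload)
```

Keeping `rate_code` as a string matches how it appears as a category (`"value": "1"`) in the field facet output, while numeric fields stay numeric so they can feed range facets.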

Missed it?

http://demo.gethue.com/search

What’s next?

- Map Pivot Facets
- Autocomplete
- Analytics range facets
- Easier indexing
- … ?

Thank you!

http://gethue.com/blog/search
https://github.com/cloudera/hue