Accumulo Summit - 4/28/2015
Event-Driven Big Data with Accumulo
Leveraging Big Data in Motion…
John Hebeler, Lockheed Martin Inc.
“It is a capital mistake to theorize before one has data.” Sherlock Holmes
Plan…
✴ Brief Event-Driven Overview
✴ Accumulo Event Management
✴ Demonstration/Access to EC2
Events
❖ Events drive our world - it is our context
❖ Data processing often reflects these events but with batch latency, poor resolution, longitudinal conflicts, and pull-type architectures
❖ If you don’t ask - no one hears…
❖ Event consequences are delayed and possibly lost
❖ Especially true “In Context” with related events
❖ Time plays a critical factor - before, after, simultaneous…
❖ Focus on Accumulo Role and Implementation
Event-Driven Architecture
❖ Events drive to consequences
❖ Multiple Levels/Iterations
❖ Clients (or downstream events) analyze the consequences in near real-time
❖ Stateless except for Big Data (Accumulo) which makes it possible!
❖ Resolution, Fidelity, Query, …
Accumulo Data Model
❖ Decomposable, Flexible Key
❖ Lexicographical Index (only) from Row ID
❖ Family and Qualifier can be “Columns” or Row/Key “Enrichment”
❖ Visibility controls flexible cell-level “security”
❖ Timestamp usually automatic and allows “versions”
❖ Value
❖ Anything but not really “searchable”
❖ Any above can be quite huge
❖ Atomic only at Row Level
Key: Row ID | Column (Family, Qualifier, Visibility) | Timestamp
Value
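The key layout can be sketched with plain Java collections. This is an in-memory stand-in, not the Accumulo client API: `SketchKey` and `KeyOrderSketch` are hypothetical names. It mimics the sort order of Accumulo keys, ascending on row, family, qualifier, and visibility, and descending on timestamp so the newest version of a cell comes first.

```java
import java.util.TreeMap;

// Hypothetical stand-in for an Accumulo key; not the real client class.
final class SketchKey implements Comparable<SketchKey> {
    final String row, family, qualifier, visibility;
    final long timestamp;

    SketchKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }

    // Text fields sort ascending; timestamp sorts descending (newest first).
    @Override
    public int compareTo(SketchKey o) {
        int c = row.compareTo(o.row);
        if (c == 0) c = family.compareTo(o.family);
        if (c == 0) c = qualifier.compareTo(o.qualifier);
        if (c == 0) c = visibility.compareTo(o.visibility);
        if (c == 0) c = Long.compare(o.timestamp, timestamp);
        return c;
    }
}

final class KeyOrderSketch {
    // Builds a tiny sorted "table" with two versions of one cell.
    static TreeMap<SketchKey, String> sample() {
        TreeMap<SketchKey, String> table = new TreeMap<>();
        table.put(new SketchKey("event2", "meta", "type", "public", 10L), "login");
        table.put(new SketchKey("event1", "meta", "type", "public", 10L), "old value");
        table.put(new SketchKey("event1", "meta", "type", "public", 20L), "new value");
        return table;
    }
}
```

Iterating `KeyOrderSketch.sample()` visits both versions of the `event1` cell, newest first, before reaching `event2`.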
Events and Context
❖ Store events for easy retrieval
❖ Events continue to grow; Context reaches steady state
❖ Proper interpretation of an event within its context
❖ Idempotence
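One way to read the idempotence bullet: applying the same event twice should leave the context exactly as it was after the first application. A minimal sketch, assuming the context is keyed by event ID (the class and method names are hypothetical, and a `HashMap` stands in for a context table):

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a context table; all names here are hypothetical.
final class IdempotentContext {
    private final Map<String, String> latestByEventId = new HashMap<>();

    // The event is stored under its own ID, so a replayed or duplicated
    // event overwrites itself and the context does not change.
    void apply(String eventId, String payload) {
        latestByEventId.put(eventId, payload);
    }

    int size() {
        return latestByEventId.size();
    }

    String get(String eventId) {
        return latestByEventId.get(eventId);
    }
}
```

The same property falls out of writing to Accumulo with an identical key: a repeated write replaces the value rather than adding a second entry to the latest view.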
Categories
1. Direct Accumulo Operations
2. Event Programming
3. Event Management with Accumulo
Direct Accumulo Operations
Query
❖ Key constructs - Packed fields vs Column based - your choice
❖ The lexicographical index is the only index (another way of saying: build a new table)
❖ A scan on “a” finds “a.a.a.b”
❖ Not usually practical to search in the Value
❖ Query for the past values (versions)
❖ Time
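import java.util.ArrayList;
import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.admin.TableOperations;
import org.apache.accumulo.core.data.Range;

// Scan several ranges at once
ArrayList<Range> ranges = new ArrayList<Range>();
// Populate ranges
BatchScanner bs = conn.createBatchScanner(table, …);
bs.setRanges(ranges);

// Keep up to N versions per key (N is passed as a String)
TableOperations to = conn.tableOperations();
to.setProperty(tableName, "table.iterator.scan.vers.opt.maxVersions", N);
to.setProperty(tableName, "table.iterator.majc.vers.opt.maxVersions", N);
to.setProperty(tableName, "table.iterator.minc.vers.opt.maxVersions", N);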
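The “a finds a.a.a.b” point follows from the sorted key space: a prefix lookup is just a range over neighbouring keys. A minimal sketch using a `TreeMap` as a stand-in for a table sorted by row ID (`PrefixScanSketch` is a hypothetical name; with the real client, `Range.prefix` builds a comparable row range):

```java
import java.util.SortedMap;
import java.util.TreeMap;

final class PrefixScanSketch {
    // Returns every entry whose row ID starts with the given prefix.
    // This works only because the map is sorted lexicographically by row ID.
    static SortedMap<String, String> scanPrefix(TreeMap<String, String> table, String prefix) {
        // '\uffff' sorts after any ordinary character that can follow the prefix,
        // so the range covers the prefix itself and its extensions.
        return table.subMap(prefix, true, prefix + '\uffff', false);
    }
}
```

Anything that cannot be expressed as such a range over the row ID needs its own table keyed the other way, which is the point of the “build a new table” remark.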
Event Update
❖ Store events for easy retrieval
❖ Maintain context surrounding the event
❖ Write with same key - updates value
RowID | Family | Qualifier | Value
EventID1 | EventID2 | EventID3 | Event*
* JSON or Serialized Object
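Writing with the same key, combined with a maximum number of versions, can be pictured as follows. This is an in-memory sketch with hypothetical names, not Accumulo's versioning iterator: each write to an existing key adds a new version, reads return the newest one, and older versions beyond the limit are dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VersionedStoreSketch {
    private final int maxVersions;
    private final Map<String, Deque<String>> versionsByKey = new HashMap<>();

    VersionedStoreSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Writing with an existing key adds a new version; the oldest version is
    // dropped once more than maxVersions are held.
    void put(String key, String value) {
        Deque<String> versions = versionsByKey.computeIfAbsent(key, k -> new ArrayDeque<String>());
        versions.addFirst(value);
        while (versions.size() > maxVersions) {
            versions.removeLast();
        }
    }

    // The newest version is what an ordinary read sees.
    String latest(String key) {
        Deque<String> versions = versionsByKey.get(key);
        return versions == null ? null : versions.peekFirst();
    }

    // All retained versions, newest first.
    List<String> history(String key) {
        Deque<String> versions = versionsByKey.get(key);
        return versions == null ? new ArrayList<String>() : new ArrayList<String>(versions);
    }
}
```

With `maxVersions` set to 1 the store behaves like a plain update; larger values keep a short history of the event available for queries on past values.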
Event Cursor
❖ Accumulo Cursor automatically buffers responses to conserve memory
❖ Events constructed directly from an Accumulo row do not
❖ If not careful, out of memory exceptions (especially true in big data)
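import java.util.Iterator;
import java.util.Map;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;

class EventCursor {
    private final Iterator<Map.Entry<Key, Value>> rowIterator;

    public EventCursor(Scanner s) {
        // Keep the scanner's buffered iterator instead of loading every row up front
        rowIterator = s.iterator();
    }

    public Event next() {
        // Convert one entry at a time, reusing the stored iterator
        return row2Event(rowIterator.next());
    }
}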
A Word About Accumulo Visibility…
❖ Different
❖ (part of the key)
Event Programming
Exception based Programming
❖ Don’t ask for permission but plan for exceptions…
❖ Faster and more efficient
❖ Program to expect that they won’t happen and if they do, handle it
❖ Watch out for thread contention - can use Lock
// Optional - openLock.lock();
while (true) {
    try {
        wr = aClient.createBatchWriter(EVENT_CONTEXT_TABLE, new BatchWriterConfig());
        break;
    } catch (TableNotFoundException e) {
        // Create Table and retry - also need to catch TableExistsException
        aClient.tableOperations().create(EVENT_CONTEXT_TABLE);
    }
}
// Optional - openLock.unlock();
Avoid Transactions
❖ Big data transactions expensive (and difficult)
❖ Make the need rare and solution lazy
❖ Distributed partial state dilemma
❖ Appending to or updating a single row does not require formal transactions
❖ Race conditions: recognize and repair them lazily
❖ Accumulo guarantees atomicity only at the row level (still valuable, since each field can hold a lot of data)
❖ Event conclusions that fall too close together in time are simply reprocessed or properly thread-bundled
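Row-level atomicity is enough when everything that must change together lives in one row. The sketch below is an analogy rather than Accumulo code: `ConcurrentHashMap.compute` is atomic per key, so it plays the role of a single-row mutation, and several threads can append to the same "row" without a transaction. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

final class RowAtomicSketch {
    private final ConcurrentHashMap<String, List<String>> rows = new ConcurrentHashMap<>();

    // compute() runs atomically for a given key, standing in for a row-level mutation.
    void append(String rowId, String entry) {
        rows.compute(rowId, (id, existing) -> {
            List<String> updated = existing == null
                    ? new ArrayList<String>()
                    : new ArrayList<String>(existing);
            updated.add(entry);
            return updated;
        });
    }

    int count(String rowId) {
        List<String> row = rows.get(rowId);
        return row == null ? 0 : row.size();
    }

    // Several threads append to the same row; returns how many entries it ends up with.
    static int appendConcurrently(int threads, int perThread) {
        RowAtomicSketch store = new RowAtomicSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.append("event-1", "t" + id + "-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.count("event-1");
    }
}
```

No appends are lost because each update to the row is applied as one atomic step; anything spanning several rows would still need the lazy recognition and repair described above.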
Progressive Provenience
❖ Retrieve origin of event combinations
❖ Maintain context surrounding the event
❖ Use same key in different tables for rapid traversal
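Tracing a derived event back to its origins might look like the sketch below. It assumes a second "table" that maps each derived event ID to the IDs it was built from, using the same IDs that key the event table. The structure and names are hypothetical, and a `HashMap` stands in for the table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ProvenanceSketch {
    // Derived event ID -> IDs of the events it was derived from.
    private final Map<String, List<String>> sources = new HashMap<>();

    void recordDerivation(String derivedId, String... sourceIds) {
        sources.put(derivedId, Arrays.asList(sourceIds));
    }

    // Walks back through the derivations to the events that have no recorded sources.
    Set<String> origins(String eventId) {
        Set<String> result = new TreeSet<>();
        collect(eventId, result, new HashSet<String>());
        return result;
    }

    private void collect(String id, Set<String> result, Set<String> seen) {
        if (!seen.add(id)) {
            return; // already visited; guards against cycles
        }
        List<String> from = sources.get(id);
        if (from == null || from.isEmpty()) {
            result.add(id); // an original event
            return;
        }
        for (String sourceId : from) {
            collect(sourceId, result, seen);
        }
    }
}
```

Because every hop is a direct lookup by key, the traversal never needs a search inside event values.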
Test Events
❖ Test Flag allows In-Stream Test and Validation
❖ Availability
❖ Performance
❖ Quality
❖ What Ifs
❖ Flag indicates different storage table, queues, …
Event Management with Accumulo
Turning an Event Off
❖ Event assertion no longer supported (but was)
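One reading of "no longer supported (but was)" is that turning an event off adds a new state rather than erasing the old one, so earlier points in time still see the assertion as supported. A minimal sketch under that assumption, with hypothetical names and a `TreeMap` keyed by timestamp standing in for versioned storage:

```java
import java.util.Map;
import java.util.TreeMap;

final class AssertionHistorySketch {
    // Timestamp -> whether the assertion is supported from that time on.
    private final TreeMap<Long, Boolean> history = new TreeMap<>();

    void turnOn(long timestamp) {
        history.put(timestamp, Boolean.TRUE);
    }

    // Turning off appends a new state; the earlier "on" entry is kept.
    void turnOff(long timestamp) {
        history.put(timestamp, Boolean.FALSE);
    }

    // The state in force at a given time is the most recent entry at or before it.
    boolean isSupportedAt(long timestamp) {
        Map.Entry<Long, Boolean> state = history.floorEntry(timestamp);
        return state != null && state.getValue();
    }
}
```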
Forgetting an Event (Error)
❖ Store events for easy retrieval
❖ Maintain context surrounding the event
Time Travel
❖ Rerun (Time) Events due to corrupted data, out-of-order events, event error, event correction, or “what if” scenarios
❖ Develop context surrounding the event
❖ Remixing the cake
Example: Topic X needs to be rerun from last October because of an error at that time
// Collect all events for Topic since October (already in time order)
// Clear Topic X Context
// Rerun collected events in order (all corrected now!)
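The three steps above can be sketched with an in-memory log. This is an illustration under stated assumptions: one topic, a time-ordered `TreeMap` as the event store, a running total as the derived context, and a cutoff early enough to cover every event the context depends on. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class ReplaySketch {
    // Stored events for one topic, sorted by timestamp (timestamp -> amount).
    private final TreeMap<Long, Integer> eventLog = new TreeMap<>();
    // Derived context: here simply a running total of the amounts.
    private int contextTotal = 0;

    // Normal processing: store the event and update the context.
    void record(long timestamp, int amount) {
        eventLog.put(timestamp, amount);
        contextTotal += amount;
    }

    // Corrects a stored event; the context is stale until the events are rerun.
    void correct(long timestamp, int amount) {
        eventLog.put(timestamp, amount);
    }

    void rerunSince(long since) {
        // Collect all events since the cutoff (already in time order).
        List<Integer> collected = new ArrayList<>(eventLog.tailMap(since, true).values());
        // Clear the context.
        contextTotal = 0;
        // Rerun the collected events in order.
        for (int amount : collected) {
            contextTotal += amount;
        }
    }

    int contextTotal() {
        return contextTotal;
    }
}
```

The rerun only works because the events themselves were kept; the context can always be rebuilt from them.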
Future Events
❖ Future Events (Expiring State, Travel Plans, …)
❖ May not happen or change…
❖ Store event as always
❖ Schedule timer (or interval timer) to ignite future events
❖ Events are easily removed by an update; the timer then finds nothing
❖ Requires careful consideration of index/RowId
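A minimal sketch of the store-then-timer pattern, assuming the row ID begins with the due time so that "everything due by now" is a single range scan. A `TreeMap` stands in for the table, the timer is reduced to an explicit `fire(now)` call, and all names are hypothetical. For brevity the due time alone is the key, so two events due at the same instant would collide; a real row ID would need a unique suffix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FutureEventSketch {
    // Due time -> future event, sorted so due events form a leading range.
    private final TreeMap<Long, String> futureEvents = new TreeMap<>();

    // Store the future event as any other event.
    void schedule(long dueTime, String event) {
        futureEvents.put(dueTime, event);
    }

    // Plans changed: remove the stored event; the timer will find nothing.
    void cancel(long dueTime) {
        futureEvents.remove(dueTime);
    }

    // Timer callback: ignite whatever is due at or before 'now'.
    List<String> fire(long now) {
        Map<Long, String> due = futureEvents.headMap(now, true);
        List<String> ignited = new ArrayList<>(due.values());
        due.clear(); // clearing the view removes the fired entries from the store
        return ignited;
    }
}
```

The timer carries no state of its own; whether an event still fires is decided entirely by what is in the store when the timer runs.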
Extra Extra
❖ Analytics
❖ Events create a rich foundation for longitudinal analytics - but must consider the data model for efficient queries (proper indexing)
❖ Backup/Recovery
❖ Take advantage of Accumulo clone and pause processing
❖ Hybrid Systems
❖ Semantic Web
❖ Related NoSQL - MongoDB and Neo4J
❖ Map Reduce
❖ Gotcha
❖ Accumulo built upon Hadoop, Zookeeper…
Follow Up
❖ Email for EC2 accumulo and event driven prototype
❖ Questions any time
❖ Play - free micro computer one year