Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Date post: 07-Aug-2015
© 2015 IBM Corporation Consistent Region in Specialized Toolkits IBM InfoSphere Streams 4.0 Samantha Chan Team Lead, Streams Toolkits Team For questions about this presentation contact: [email protected]
Transcript
Page 1: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation

Consistent Region in Specialized Toolkits

IBM InfoSphere Streams 4.0

Samantha Chan

Team Lead, Streams Toolkits Team

For questions about this presentation contact: [email protected]

Page 2: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Page 3: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Agenda

Requirements for operators to participate in a consistent region

Compile Errors and Warnings

Specialized Toolkits Support

Page 4: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Three Kinds of Operators in a Consistent Region

Start Operator
– Allows the user to start a consistent region at this operator
– Operator can persist state into a checkpoint
  • Internal state: state variables, windows, fields that can change over time
  • External state: external models, file systems, DBs, etc.
– Operator can restore state upon reset
– Is a source operator that can replay tuples upon reset

Page 5: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Three Kinds of Operators in a Consistent Region

Middle Operator
– Processing operator that can participate in a consistent region
– Operator can persist state into a checkpoint
  • Internal state: state variables, windows, fields that can change over time
  • External state: external models, file systems, DBs, etc.
– Operator can restore state upon reset

Page 6: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Three Kinds of Operators in a Consistent Region

End Operator
– Represents the end of a consistent region
– Can be a sink operator with no output port
– Can be annotated as autonomous
– Operator can persist state into a checkpoint
  • Internal state: state variables, windows, fields that can change over time
  • External state: external models, file systems, DBs, etc.
– Operator can restore state upon reset
– Writing duplicate tuples to external systems has no detrimental or unexpected effect
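
The last requirement matters because a reset replays tuples, so an end operator may see some of them twice. The toy Python sketch below is not Streams code. The function `replay_with_failure` and the two stand-in "external systems" (a list and a dict) are hypothetical. It only illustrates why a keyed write tolerates duplicates while an append-style write does not.

```python
def replay_with_failure(tuples, checkpoint_at, fail_at, write):
    """Deliver tuples to `write`, fail once at index `fail_at`, then
    replay everything from index `checkpoint_at` (at-least-once delivery)."""
    for i in range(fail_at):
        write(tuples[i])
    # Reset: the region rolls back to the checkpoint and replays from there.
    for i in range(checkpoint_at, len(tuples)):
        write(tuples[i])


tuples = [("k1", 10), ("k2", 20), ("k3", 30), ("k4", 40)]

# Append-style external system: replayed tuples show up as duplicates.
log = []
replay_with_failure(tuples, checkpoint_at=1, fail_at=3, write=log.append)

# Keyed (upsert-style) external system: writing the same tuple twice is harmless.
table = {}
replay_with_failure(tuples, checkpoint_at=1, fail_at=3,
                    write=lambda t: table.__setitem__(t[0], t[1]))

print(len(log))    # 6: k2 and k3 were written twice
print(len(table))  # 4: one entry per key
```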

Page 7: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Compile Errors and Warnings

For each operator
– We determine if the operator can participate in a consistent region
  • We will provide a compile warning/error if an operator cannot be part of a consistent region or start of region

CDISP9163W WARNING: The following operator is not supported in a consistent region: com.ibm.streams.timeseries.modeling::AutoForecaster2. The operator does not checkpoint or reset its internal state. If an application failure occurs, the operator might produce unexpected results even if it is part of a consistent region.

Page 8: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Adding Consistent Region Support in Operator

For operators that can participate in a consistent region, we did the following:
– Become a StateHandler – the operator is called by the runtime to drain -> checkpoint -> reset
– Drain – called before checkpoint. Empties all internal buffers and submits any pending tuples
– Checkpoint – persists the operator's internal state
– Reset – upon application failure, resets the operator's internal state to the checkpointed state
– Reset to Initial – called if an application failure is detected before the first checkpoint can be taken
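
The callbacks above can be sketched as a toy Python model. This is not the Streams operator API. The class `BufferingOperator`, its method names, and the plain dict used as checkpoint storage are all hypothetical and only mirror the order of calls described on this slide.

```python
import copy


class BufferingOperator:
    """Toy operator that batches tuples and counts what it has submitted."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.buffer = []      # internal buffer of pending tuples
        self.submitted = []   # stands in for the output port
        self.count = 0        # internal state that changes over time

    def process(self, tup):
        self.buffer.append(tup)
        if len(self.buffer) >= self.batch_size:
            self._flush()

    def _flush(self):
        self.submitted.extend(self.buffer)
        self.count += len(self.buffer)
        self.buffer.clear()

    def drain(self):
        """Called before checkpoint: submit anything still buffered."""
        self._flush()

    def checkpoint(self, store, seq_id):
        """Persist internal state under a checkpoint sequence id."""
        store[seq_id] = copy.deepcopy({"count": self.count})

    def reset(self, store, seq_id):
        """After a failure: restore internal state from the checkpoint."""
        self.buffer.clear()
        self.count = store[seq_id]["count"]

    def reset_to_initial(self):
        """Failure before the first checkpoint: return to the starting state."""
        self.buffer.clear()
        self.count = 0


store = {}                      # hypothetical checkpoint storage
op = BufferingOperator()
for t in ["a", "b", "c", "d"]:
    op.process(t)
op.drain()                      # "d" was still buffered; it is submitted now
op.checkpoint(store, seq_id=1)
op.process("e")                 # work done after the checkpoint
op.reset(store, seq_id=1)       # failure: roll back to the checkpoint
print(op.count, op.buffer)      # 4 []
```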

Page 9: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Toolkits Support of Consistent Region

Consistent region support is added to the following toolkits:
– CEP
– Data Explorer
– DB
– RProject
– Rules
– Text
– HBase
– HDFS
– Messaging

Consistent region is not supported by the following toolkits:
– Geospatial – plan to enable consistent region in a future release
– Timeseries – plan to enable consistent region in a future release
– Financial
– Mining
– Inet

Page 10: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Consistent Region Changes for Specialized Toolkits

For details on consistent region behavior for an operator, refer to its SPLDoc

Page 11: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

R Toolkit

RScript operator
– Spawns a new process for an R session
– Executes the R script
– Parses output from the R session and submits it as tuples

RScript can participate in a consistent region, but cannot be a start operator.

The operator itself has no state, but the R environment that executes the R scripts has state that can change during the lifetime of the operator.
– Checkpoint – saves the R environment to a file in the data directory
– Reset – calls R to restore the R environment from the file
– Files are deleted when checkpoints are retired
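
The file-based checkpoint scheme described here can be sketched in Python. This is an illustration only, not the toolkit's implementation. A dict stands in for the R environment, and the class `EnvCheckpointer`, the JSON format, and the `env-<id>.json` file naming are assumptions made for the example.

```python
import json
import os
import tempfile


class EnvCheckpointer:
    """Saves an environment (a dict standing in for the R environment)
    to one file per checkpoint inside a data directory."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def _path(self, seq_id):
        return os.path.join(self.data_dir, f"env-{seq_id}.json")

    def checkpoint(self, env, seq_id):
        with open(self._path(seq_id), "w") as f:
            json.dump(env, f)

    def reset(self, seq_id):
        with open(self._path(seq_id)) as f:
            return json.load(f)

    def retire(self, seq_id):
        # The checkpoint is no longer needed, so its file is deleted.
        os.remove(self._path(seq_id))


data_dir = tempfile.mkdtemp()
ckpt = EnvCheckpointer(data_dir)

env = {"model_weight": 0.5, "rows_seen": 100}
ckpt.checkpoint(env, seq_id=1)

env["rows_seen"] = 250          # the environment keeps changing
env = ckpt.reset(seq_id=1)      # failure: restore the environment from the file
print(env["rows_seen"])         # 100

ckpt.retire(seq_id=1)
print(os.listdir(data_dir))     # []
```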

Page 12: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Messaging Toolkit

Supported in a consistent region:
– JMSSink
– KafkaProducer
– MQTTSink
– XMSSink

These operators can participate in a consistent region, but cannot be the start of a consistent region.

MQTTSink
– Control input port is not supported in a consistent region
– Messages with QoS=1 or QoS=2 will be delivered to an MQTT provider at least once
– Messages with QoS=0 can still be lost, as messages can be lost in transit

Not allowed in a consistent region:
– JMSSource
– KafkaConsumer
– MQTTSource
– XMSSource

To enable a consistent region, use the ReplayableStart operator.

Page 13: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

HDFS Toolkit

Toolkit is supported in a consistent region.

HDFS2DirectoryScan
– Scans a directory in HDFS and submits filenames as output

Consistent Region Behavior
– Can be the start operator of a consistent region if there is no input port
– Drain – does nothing; the operator has no internal buffer
– Checkpoint – saves the last submitted filename and its modification timestamp
– Reset – restores the last submitted filename and modification timestamp
– When processing resumes:
  • Finds all files on the file system
  • Submits only filenames that had not been submitted as of the checkpoint
  • This algorithm allows us to support exactly-once processing
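
A minimal Python sketch of this resume logic follows. It is not the operator's code. The class `DirectoryScan` is hypothetical, a dict of `{filename: modification_time}` stands in for an HDFS listing, and ordering files by (timestamp, name) is an assumption made so that "the last submitted file" is well defined.

```python
class DirectoryScan:
    """Toy scanner over a listing of {filename: modification_time}."""

    def __init__(self):
        self.last = None  # (mtime, name) of the last submitted file

    def scan(self, listing):
        """Return filenames that sort after the last submitted one, oldest first."""
        candidates = sorted((mtime, name) for name, mtime in listing.items())
        new = [c for c in candidates if self.last is None or c > self.last]
        if new:
            self.last = new[-1]
        return [name for _, name in new]

    def checkpoint(self):
        return self.last

    def reset(self, saved):
        self.last = saved


listing = {"a.txt": 100, "b.txt": 200}
scanner = DirectoryScan()
first = scanner.scan(listing)          # a.txt and b.txt are submitted
saved = scanner.checkpoint()

listing["c.txt"] = 300
scanner.scan(listing)                  # c.txt is submitted, then a failure occurs
scanner.reset(saved)
after_reset = scanner.scan(listing)    # only c.txt is submitted again

print(first)        # ['a.txt', 'b.txt']
print(after_reset)  # ['c.txt']
```

Files submitted before the checkpoint (`a.txt`, `b.txt`) are never resubmitted. Only the file whose submission was rolled back by the reset is replayed.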

Page 14: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

HDFS Toolkit

HDFS2FileSource
– Reads file content from HDFS and submits it as output

Consistent Region Behavior:
– Can be the start operator if it does not have an input port
– Supports both operator-driven and periodic policies
– If operator-driven, a checkpoint is established after the file is fully read
– Drain – the operator flushes its internal buffer
– Checkpoint – the operator saves the current filename and file cursor location
– Reset – the operator resets the cursor location and starts reading again when processing is resumed
– This allows the operator to support exactly-once processing, as content submitted before the checkpoint will not be sent again
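
The cursor-based checkpoint can be illustrated with a short Python sketch. It uses an in-memory `io.StringIO` in place of an HDFS file, and the class `FileSource` and its methods are hypothetical names for the example, not the operator's API.

```python
import io


class FileSource:
    """Toy line reader over an in-memory file."""

    def __init__(self, name, content):
        self.name = name
        self.stream = io.StringIO(content)

    def read_line(self):
        return self.stream.readline().rstrip("\n")

    def checkpoint(self):
        # Save the current filename and the cursor position within it.
        return {"filename": self.name, "cursor": self.stream.tell()}

    def reset(self, saved):
        # Move the cursor back to where it was at the checkpoint.
        self.stream.seek(saved["cursor"])


src = FileSource("data.txt", "one\ntwo\nthree\n")
out = [src.read_line()]        # "one" is submitted before the checkpoint
saved = src.checkpoint()
src.read_line()                # "two" is read, then a failure occurs
src.reset(saved)
out += [src.read_line(), src.read_line()]
print(out)                     # ['one', 'two', 'three']
```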

Page 15: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

HDFS Toolkit

HDFS2FileSink
– Writes data to files in HDFS

Consistent Region Behavior:
– HDFS must be configured properly with APPEND enabled
– Cannot support exactly-once processing because HDFS does not support random writes
– Drain – flushes any internal buffer and writes content to the file system. The operator also forces a flush of content from the HDFS client
– Checkpoint – saves the current filename, file size, tuple count, file number, etc. to the checkpoint
– Reset – the operator closes the current file and resets the various counters and properties. It regenerates the filename and opens the file in APPEND mode
– When processing is resumed, content is appended to the end of the file that was reset to
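
The following Python sketch shows why an append-only sink ends up with at-least-once rather than exactly-once output. It writes to a local temporary file instead of HDFS, models only the filename and tuple count from the checkpoint, and the class `AppendSink` is a hypothetical stand-in for the operator.

```python
import os
import tempfile


class AppendSink:
    """Toy sink that writes lines to a local file opened in append mode."""

    def __init__(self, path):
        self.path = path
        self.tuple_count = 0
        self.file = open(path, "a")

    def write(self, line):
        self.file.write(line + "\n")
        self.tuple_count += 1

    def drain(self):
        self.file.flush()

    def checkpoint(self):
        return {"path": self.path, "tuple_count": self.tuple_count}

    def reset(self, saved):
        self.file.close()
        self.path = saved["path"]
        self.tuple_count = saved["tuple_count"]
        # Append mode: data written after the checkpoint cannot be removed.
        self.file = open(self.path, "a")

    def close(self):
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "out.txt")
sink = AppendSink(path)
sink.write("t1")
sink.drain()
saved = sink.checkpoint()
sink.write("t2")        # written after the checkpoint, then a failure occurs
sink.reset(saved)
sink.write("t2")        # the replayed tuple is appended a second time
sink.write("t3")
sink.close()

with open(path) as f:
    lines = f.read().split()
print(lines)            # ['t1', 't2', 't2', 't3']
```

The counter is restored to its checkpointed value, but the file still contains the tuple written before the failure, so `t2` appears twice.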

Page 16: Consistent Regions in Specialized Toolkits for IBM InfoSphere Streams V4.0

Questions?

