Big Data and Spark Streaming. Oil production sensors data monitoring

Post on 14-Apr-2017


Big Data, Spark Streaming, Oil and Gas

Kyiv, Ukraine. 07 June 2016

Oil production sensors data monitoring

Yaroslav Nedashkovsky, System Architect

Shell data project

“Next year, BP will connect 650 wells to the Industrial Internet. If all goes according to plan, the companies will expand the scope to 4,000 BP subsea wells around the world”

Digital Oil Field

How could we handle such a huge data flow?

What kind of streaming technology could we use?

Stream processing systems

- Apache Spark Streaming
- Apache Storm
- Apache Samza
- Azure Stream Analytics
- Google Dataflow
- Heron
- Apache Flink

What do we need from a streaming system?

- scalable
- fault-tolerant
- low latency
- data distribution
- distributed computations
- good API
- “exactly-once” guarantees, or maybe “at most once” or “at least once” will be enough?
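The three delivery guarantees in the last bullet can be illustrated with a small plain-Python simulation. This is only a sketch under stated assumptions: the `deliver` and `dedupe` functions, the loss rate, and the message format are hypothetical and are not part of any streaming framework. It shows that "at most once" can lose messages, "at least once" can duplicate them, and that an "exactly-once" result is commonly approximated by combining at-least-once delivery with deduplication on a message id.

```python
import random


def deliver(messages, mode, loss_rate=0.3, seed=7):
    """Simulate a lossy channel (hypothetical model, not a real broker).

    messages: list of (msg_id, value) pairs.
    mode: "at-most-once" or "at-least-once".
    """
    rng = random.Random(seed)
    received = []
    for msg_id, value in messages:
        if mode == "at-most-once":
            # Send once and never retry: a lost message stays lost.
            if rng.random() >= loss_rate:
                received.append((msg_id, value))
        elif mode == "at-least-once":
            # The message always arrives, but the ack may be lost,
            # so the sender retries and the receiver sees duplicates.
            while True:
                received.append((msg_id, value))
                if rng.random() >= loss_rate:  # ack got through
                    break
        else:
            raise ValueError(f"unknown mode: {mode}")
    return received


def dedupe(received):
    """Keep the first copy of each msg_id (idempotent processing)."""
    seen = set()
    unique = []
    for msg_id, value in received:
        if msg_id not in seen:
            seen.add(msg_id)
            unique.append((msg_id, value))
    return unique


if __name__ == "__main__":
    msgs = [(i, 20.0 + i) for i in range(10)]
    print(len(deliver(msgs, "at-most-once")), "received with at-most-once")
    at_least = deliver(msgs, "at-least-once")
    print(len(at_least), "received with at-least-once")
    print(len(dedupe(at_least)), "after deduplication")
```

Whether the extra bookkeeping is worth it depends on the use case: for a dashboard of sensor averages an occasional duplicate may be acceptable, while for billing or alarm counts it usually is not.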

(near real time, but not real time)

- high-level API (windows, joins, etc.)

- exactly-once semantics (?!), fault tolerant, scalable

- integration with SQL, DataFrames, MLlib, GraphX
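The "near real time, but not real time" remark and the window operations can be illustrated with a plain-Python sketch of the micro-batch idea: events are grouped into fixed-interval batches, and a window is computed over the last few batches. This is not the Spark API. The function names, the `(timestamp, sensor_id, value)` event shape, and the intervals are assumptions made for illustration only.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, sensor_id, value) events into fixed-interval batches.

    Timestamps are assumed to be non-negative seconds. Empty intervals
    produce empty batches so that window positions stay aligned in time.
    """
    if not events:
        return []
    last = int(max(ts for ts, _, _ in events) // batch_interval)
    batches = [[] for _ in range(last + 1)]
    for ts, sensor_id, value in events:
        batches[int(ts // batch_interval)].append((sensor_id, value))
    return batches


def windowed_mean(batches, window_length):
    """After each batch, average every sensor over the last `window_length` batches."""
    results = []
    for end in range(len(batches)):
        start = max(0, end - window_length + 1)
        sums = defaultdict(lambda: [0.0, 0])  # sensor_id -> [total, count]
        for batch in batches[start:end + 1]:
            for sensor_id, value in batch:
                sums[sensor_id][0] += value
                sums[sensor_id][1] += 1
        results.append({s: total / n for s, (total, n) in sums.items()})
    return results


if __name__ == "__main__":
    events = [(0.2, "p1", 10.0), (0.8, "p1", 12.0),
              (1.1, "p1", 14.0), (2.5, "p1", 20.0)]
    batches = micro_batches(events, batch_interval=1.0)
    for i, means in enumerate(windowed_mean(batches, window_length=2)):
        print(f"batch {i}: {means}")
```

A result is only emitted once a batch closes, so the latency is bounded below by the batch interval. That is the sense in which such a system is near real time rather than real time.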

How does this work?

Spark 2.0: Structured Streaming

Oil location data flow monitor

“Christmas tree”

IoT (MQTT) + Spark Streaming + Visualization
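As a rough illustration of the processing step that sits between the MQTT feed and the visualization, the sketch below parses a sensor message and flags out-of-range values. It is not the implementation from the talk. The topic layout `wells/<well_id>/<metric>`, the JSON payload fields, and the limits are all hypothetical, and the functions are plain Python that a streaming job could call for each record.

```python
import json

# Hypothetical safe operating ranges; not taken from the talk.
LIMITS = {
    "pressure_bar": (50.0, 350.0),
    "temperature_c": (-10.0, 120.0),
}


def parse_message(topic, payload):
    """Turn an MQTT-style (topic, payload) pair into a flat reading.

    Assumes topics of the form "wells/<well_id>/<metric>" and a JSON
    payload with "value" and "ts" fields (both are assumptions).
    """
    _, well_id, metric = topic.split("/")
    body = json.loads(payload)
    return {
        "well": well_id,
        "metric": metric,
        "value": float(body["value"]),
        "ts": body["ts"],
    }


def check(reading, limits=LIMITS):
    """Return an alert string if the reading is out of range, else None."""
    low, high = limits[reading["metric"]]
    if not low <= reading["value"] <= high:
        return "ALERT well={} {}={}".format(
            reading["well"], reading["metric"], reading["value"]
        )
    return None


if __name__ == "__main__":
    reading = parse_message(
        "wells/w-17/pressure_bar", '{"value": 410.5, "ts": 1465300000}'
    )
    print(check(reading))
```

Keeping parsing and range checks as small pure functions makes them easy to test without a broker or a cluster, and they can then be applied per record inside whichever streaming engine is chosen.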

Let's look at the “monitor” implementation and see how it works

Contacts:

email: bigdata@softelegance.com