Building A Scalable Big Data System for the Internet of Things (IoT)

Date post: 14-Apr-2017
Upload: nasscom-product-connect

The Internet of Things Company

About Altizon

• Vinay Nathan, CEO: 15 years of varied experience across sales, marketing, engineering and PM. Most recently, VP Sales at Persistent Systems.
• Yogesh Kulkarni, COO: 16 years of product engineering experience in global product companies. Most recently, Director - Product Development at BMC Software.
• Ranjit Nair, CTO: 16 years of software architecture and engineering experience. Most recently, Engineering Manager at Amazon.

And this is what we do

My Motivation

IoT

IoT is the integration of the physical world into the computing world

IoT is a Big-Data problem

• Massive amounts of data
  • Machines are commonly sampled for data at millisecond intervals
  • Volume, variety and velocity
• That need to be analyzed in real time
  • Condition-based monitoring
  • Anomaly detection
• That need to be analyzed for actionable insights
  • Efficiency, utilization, machine health
• That need supervised and unsupervised machine learning
  • Predictive maintenance, proactive support
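To put "massive amounts of data" in numbers, here is a back-of-the-envelope estimate for millisecond sampling. The 1,000 channels and 16 bytes per sample are hypothetical figures chosen for illustration, not numbers from the talk, and the estimate ignores compression and protocol overhead.

```latex
\begin{aligned}
\text{samples/day per channel} &= \frac{1}{10^{-3}\,\text{s}} \times 86\,400\,\text{s} = 8.64\times10^{7}\\
\text{samples/day for } 1\,000 \text{ channels} &= 8.64\times10^{10}\\
\text{raw bytes/day at } 16\,\text{B/sample} &= 1.3824\times10^{12}\,\text{B} \approx 1.38\,\text{TB}
\end{aligned}
```

Spread over a day, that is a sustained ingest rate of about 16 MB/s (1,000 channels × 1,000 samples/s × 16 B).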

Topology

[Diagram: the Cloud at the top, several Cloud Edge nodes below it, and Edge devices below those]

The Edge

Sensors
• Network and connectivity
  • Wi-Fi, BLE, Zigbee, 6LoWPAN
• Protocols
  • MQTT, CoAP, AMQP
• Low complexity
• Security
• Upgrades

Edge
• Network protocols
• Higher complexity
• Bidirectional communication
• Integration
• Device context
• Security

Be cloud agnostic

The Cloud Edge

• Protocol adapters
  • Edge-to-cloud protocols
• Filtering rules and aggregations
• Batching
• Local controller
• Highly available
• Load balanced
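The filtering, aggregation and batching duties listed for the Cloud Edge can be sketched in a few lines of Scala. This is a minimal sketch under stated assumptions: the `Reading` type, the range-based filter rule, per-window averaging and fixed-size batches are all hypothetical choices, since the deck does not say which rules or aggregations Altizon uses.

```scala
// Hypothetical reading type; the deck does not specify a message format.
final case class Reading(deviceId: String, metric: String, ts: Long, value: Double)

object CloudEdge {
  // Filtering rule (assumed): keep only readings inside a plausible range.
  def passesFilter(r: Reading, min: Double, max: Double): Boolean =
    r.value >= min && r.value <= max

  // Aggregation (assumed): average per (device, metric, time window).
  def aggregate(readings: Seq[Reading], windowMs: Long): List[Reading] =
    readings
      .groupBy(r => (r.deviceId, r.metric, r.ts / windowMs))
      .toList
      .map { case ((dev, metric, window), rs) =>
        Reading(dev, metric, window * windowMs, rs.map(_.value).sum / rs.size)
      }
      .sortBy(r => (r.deviceId, r.metric, r.ts))

  // Batching: split the aggregates into fixed-size uploads to the cloud.
  def batches(readings: Seq[Reading], batchSize: Int): List[Seq[Reading]] =
    readings.grouped(batchSize).toList

  // Filter, then aggregate, then batch.
  def process(raw: Seq[Reading], min: Double, max: Double,
              windowMs: Long, batchSize: Int): List[Seq[Reading]] =
    batches(aggregate(raw.filter(passesFilter(_, min, max)), windowMs), batchSize)
}
```

Usage: `CloudEdge.process(raw, min = 0.0, max = 100.0, windowMs = 1000L, batchSize = 500)` returns the list of batches to send upstream. Averaging before upload trades raw resolution for bandwidth.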

Event Ingestion at Scale

• Device auto-discovery
  • Metadata-driven device discovery
• Device telemetry data
  • Time-series data
  • Which can be out of sequence
• Alerts and logs
• Event validation
• Bandwidth and backpressure
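Telemetry that "can be out of sequence" is commonly handled with a reorder buffer and a watermark. The Scala sketch below shows that technique together with simple event validation and a bounded buffer that signals backpressure. It is an illustration, not the deck's implementation: `Event`, `ReorderBuffer`, `allowedLatenessMs`, `capacity` and the validation rule are all hypothetical.

```scala
import scala.collection.mutable

final case class Event(deviceId: String, ts: Long, value: Double)

sealed trait Offer
case object Accepted extends Offer
case object Invalid extends Offer
case object TooLate extends Offer
case object Full extends Offer

// Sketch: reorder out-of-sequence telemetry using a watermark and a bounded buffer.
// allowedLatenessMs and capacity are hypothetical tuning knobs.
final class ReorderBuffer(allowedLatenessMs: Long, capacity: Int) {
  // PriorityQueue is a max-heap, so reverse the ordering to dequeue the oldest event first.
  private val pending =
    mutable.PriorityQueue.empty[Event](Ordering.by((e: Event) => e.ts).reverse)
  private var maxSeenTs = Long.MinValue
  private var emittedUpTo = Long.MinValue

  // Event validation (assumed rule): a device id and a finite value are required.
  private def isValid(e: Event): Boolean =
    e.deviceId.nonEmpty && java.lang.Double.isFinite(e.value)

  def offer(e: Event): Offer =
    if (!isValid(e)) Invalid
    else if (e.ts <= emittedUpTo) TooLate   // events up to this time were already released
    else if (pending.size >= capacity) Full // backpressure: the sender must slow down or retry
    else {
      pending.enqueue(e)
      maxSeenTs = math.max(maxSeenTs, e.ts)
      Accepted
    }

  // Release, in timestamp order, every buffered event at or below the watermark.
  def drain(): List[Event] =
    if (maxSeenTs == Long.MinValue) Nil // nothing seen yet; avoid Long underflow
    else {
      val watermark = maxSeenTs - allowedLatenessMs
      val out = List.newBuilder[Event]
      while (pending.nonEmpty && pending.head.ts <= watermark) out += pending.dequeue()
      emittedUpTo = math.max(emittedUpTo, watermark)
      out.result()
    }
}
```

The watermark trails the newest timestamp seen by `allowedLatenessMs`, so a larger value tolerates more disorder at the cost of latency. A `Full` result is the point where an ingestion endpoint might reject the request or slow the sender down.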

Docker

• Portable deployment of applications as a single object, versus process sandboxing
• Application-centric versus machine/server-centric
• Support for automatic container builds
• Built-in version tracking
• Reusable components
• Public registry for sharing containers
• A growing tools ecosystem from the published API

https://www.docker.com/what-docker

HAProxy

# Send /event* requests to the event servers and everything else to the API servers.
frontend http
    bind *:80
    mode http
    acl events path_beg /event
    use_backend datonis-events if events
    default_backend datonis-api

# An http frontend needs http backends, so this backend also declares mode http.
backend datonis-events
    mode http
    balance source
    server event1 event1.datonis.io:80 check
    server event2 event2.datonis.io:80 check

backend datonis-api
    mode http
    balance roundrobin
    server api1 api1.datonis.io:80 check
    server api2 api2.datonis.io:80 check
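The two `balance` algorithms in the configuration above behave quite differently. The Scala model below is simplified: real HAProxy also accounts for server weights and health checks, and `String.hashCode` stands in for HAProxy's own hash of the source address. The `Balancers` object is hypothetical.

```scala
import java.util.concurrent.atomic.AtomicLong

// Simplified models of two HAProxy balancing algorithms; not HAProxy's actual code.
object Balancers {
  // "balance roundrobin": each request goes to the next server in turn.
  final class RoundRobin(servers: Vector[String]) {
    private val next = new AtomicLong(0)
    def pick(): String = servers((next.getAndIncrement() % servers.size).toInt)
  }

  // "balance source": hash the client IP so that a client keeps hitting the same server.
  def bySource(servers: Vector[String], clientIp: String): String =
    servers(Math.floorMod(clientIp.hashCode, servers.size))
}
```

Round robin spreads stateless API calls evenly, while source hashing keeps a given client (for example, a gateway posting events) on the same event server for as long as the server list does not change. That stickiness is one plausible motivation for the choice, though the deck does not state it.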

Kafka

• Entities
  • Broker, topic, producer, consumer
• A sharded write-ahead log
  • Contiguous memory allocation
  • Index and offset
• Messages are not deleted on read
  • But according to an SLA (retention period)
  • This enables data reloads
• Log replication for fault tolerance
• Making reads faster
• Kafka-Spark consumer
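The log properties on this slide (partitioned append-only storage, offsets, reads that do not delete, and time-based retention that makes reloads possible) can be modelled in plain Scala. This in-memory `PartitionedLog` is hypothetical and is not the Kafka client API; real Kafka also replicates partitions across brokers and deletes whole log segments rather than individual records.

```scala
import scala.collection.mutable.ArrayBuffer

// In-memory model of a partitioned, append-only log (Kafka-like); not the Kafka API.
final class PartitionedLog(numPartitions: Int) {
  private final case class Record(offset: Long, timestamp: Long, key: String, value: String)
  private val partitions = Vector.fill(numPartitions)(ArrayBuffer.empty[Record])
  private val startOffsets = Array.fill(numPartitions)(0L) // first offset still retained

  // Records with the same key always land in the same partition.
  def partitionFor(key: String): Int = Math.floorMod(key.hashCode, numPartitions)

  // Append returns (partition, offset); offsets grow monotonically per partition.
  def append(key: String, value: String, timestamp: Long): (Int, Long) = {
    val p = partitionFor(key)
    val offset = startOffsets(p) + partitions(p).size
    partitions(p) += Record(offset, timestamp, key, value)
    (p, offset)
  }

  // Reading does not delete: any consumer can re-read from an older offset.
  def read(partition: Int, fromOffset: Long, max: Int): List[(Long, String)] = {
    val idx = (fromOffset - startOffsets(partition)).max(0L).toInt
    partitions(partition).slice(idx, idx + max).map(r => (r.offset, r.value)).toList
  }

  // Retention: drop records older than the SLA, whether or not they were read.
  // Assumes timestamps are non-decreasing within a partition.
  def enforceRetention(now: Long, retentionMs: Long): Unit =
    for (p <- 0 until numPartitions) {
      val buf = partitions(p)
      val expired = buf.segmentLength(_.timestamp < now - retentionMs, 0)
      if (expired > 0) {
        buf.remove(0, expired)
        startOffsets(p) += expired
      }
    }
}
```

Because a consumer only tracks an offset, a data reload amounts to reading again from an older offset, as long as retention has not removed it yet.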

Redis

• Caching
  • Redis can be used in the same manner as Memcached
• Counting stuff
  • Atomic counters
• Show latest items
  • This is a live in-memory cache and is very fast
• Deletion and filtering
  • If a cached article is deleted, it can be removed from the cache
• Leaderboards and related problems
• Implementing expiry on items
• Unique N items in a given amount of time
• Pub/Sub
• Queues
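Several of these Redis use cases map to a few commands: INCR for atomic counters, LPUSH followed by LTRIM to keep only the latest N items, and EXPIRE (or PEXPIRE for milliseconds) for expiry. The Scala class below only imitates those semantics in memory to make the patterns concrete. `MiniCache` is hypothetical and is not a Redis client.

```scala
import scala.collection.mutable

// In-memory imitation of three Redis usage patterns; a real system would send
// INCR, LPUSH + LTRIM and PEXPIRE to a Redis server instead.
final class MiniCache(clock: () => Long) {
  private val counters = mutable.Map.empty[String, Long]
  private val lists = mutable.Map.empty[String, List[String]]
  private val expiresAt = mutable.Map.empty[String, Long]

  // Drop a key whose deadline has passed (checked lazily, on access).
  private def purgeIfExpired(key: String): Unit =
    expiresAt.get(key).foreach { t =>
      if (clock() >= t) { counters -= key; lists -= key; expiresAt -= key }
    }

  // Atomic counter (INCR): synchronized because mutable maps are not thread-safe.
  def incr(key: String): Long = synchronized {
    purgeIfExpired(key)
    val v = counters.getOrElse(key, 0L) + 1
    counters(key) = v
    v
  }

  // "Show latest items" (LPUSH then LTRIM 0 n-1): keep only the newest n entries.
  def pushLatest(key: String, item: String, n: Int): Unit = synchronized {
    purgeIfExpired(key)
    lists(key) = (item :: lists.getOrElse(key, Nil)).take(n)
  }

  def latest(key: String): List[String] = synchronized {
    purgeIfExpired(key)
    lists.getOrElse(key, Nil)
  }

  // Expiry (PEXPIRE key ms): the key disappears once the deadline passes.
  def expire(key: String, ttlMs: Long): Unit = synchronized {
    expiresAt(key) = clock() + ttlMs
  }
}
```

Injecting the clock keeps expiry testable. Redis itself expires keys both lazily on access and through a periodic background sweep; this sketch only does the former.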

Real-time CEP

• Apache Spark
  • Unified stream processing, batch processing and machine learning
• RDDs
  • Immutable, resilient, distributed collections of records
• DStreams
  • A continuous sequence of RDDs

Spark:

// sc is the SparkContext provided by spark-shell
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

Spark Streaming:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("NetworkWordCount")
// Each 1-second micro-batch becomes one RDD in the DStream
val ssc = new StreamingContext(sparkConf, Seconds(1))
// args(0) is the host and args(1) the port; socketTextStream expects the port as an Int
val lines = ssc.socketTextStream(args(0), args(1).toInt)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
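To see why one code path can serve both batch and streaming, here is a plain Scala collections model of the word count above: a batch is one collection, and a DStream is a sequence of micro-batches to which the same function is applied. `MicroBatchModel` is a hypothetical local stand-in, not Spark; `groupBy` followed by `sum` plays the role of `reduceByKey`.

```scala
// Plain Scala collections stand in for RDDs here; this is a local model, not Spark.
object MicroBatchModel {
  // Same pipeline as the Spark examples: flatMap -> map -> reduce by key.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  // A DStream is a sequence of micro-batches; apply the same function to each one.
  def perBatch(batches: Seq[Seq[String]]): Seq[Map[String, Int]] =
    batches.map(b => wordCount(b))

  // Summing per-batch counts reproduces the whole-data-set (batch) result.
  def merge(counts: Seq[Map[String, Int]]): Map[String, Int] =
    counts.foldLeft(Map.empty[String, Int]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (w, n)) => a.updated(w, a.getOrElse(w, 0) + n) }
    }
}
```

As in Spark Streaming, reducing by key on a DStream counts within each micro-batch only. A running total across batches needs explicit state (in Spark, for example, `updateStateByKey`), which `merge` imitates here.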

Actually this is CEP

Why do we love Spark

• Common logic for stream and batch processing
  • No separate architectures and approaches
  • Storm would have been appropriate for absolute real-time
• A hit with data scientists
  • Rapid iterations on large data sets
• Language support
  • Python, Java, Scala and R
  • R syntax is extremely baffling (or maybe I’m just too old)
• Spark MLlib
  • Statistics, classification, filtering, clustering, feature extraction
  • The list is constantly growing

Persistence

• Why Mongo?
  • Concerns around separate databases for transactional data and event data
  • Premature optimization
• Path
  • Started with 2.x: database-level locking
  • Now at 3.2: document-level locking
  • WiredTiger storage engine: 5x with Snappy compression
• Extreme convenience for configuration objects
• Design patterns for time-series data
• Great toolsets
• Shout out to Mongoid
• Easy data migration
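One widely used design pattern for time-series data in MongoDB is bucketing: instead of one document per sample, store one document per device per time window. The Scala sketch below computes such hourly buckets. The `HourBucket` layout, the one-hour window and the per-second slots are assumptions for illustration, not the schema Altizon uses.

```scala
// Hypothetical bucket layout: one document per (device, hour) with a per-second slot map.
final case class HourBucket(deviceId: String, hourStartMs: Long, values: Map[Int, Double])

object TimeSeriesBuckets {
  val HourMs: Long = 3600L * 1000L

  // Start of the hour containing tsMs, and the second within that hour.
  def bucketStart(tsMs: Long): Long = Math.floorDiv(tsMs, HourMs) * HourMs
  def secondOfHour(tsMs: Long): Int = (Math.floorMod(tsMs, HourMs) / 1000L).toInt

  // Fold raw (deviceId, timestampMs, value) samples into hourly bucket documents.
  def toBuckets(samples: Seq[(String, Long, Double)]): List[HourBucket] =
    samples
      .groupBy { case (dev, ts, _) => (dev, bucketStart(ts)) }
      .toList
      .map { case ((dev, start), rows) =>
        HourBucket(dev, start, rows.map { case (_, ts, v) => secondOfHour(ts) -> v }.toMap)
      }
      .sortBy(b => (b.deviceId, b.hourStartMs))
}
```

In MongoDB each sample could then be written as an upsert keyed on the device and hour start, using `$set` on a dotted field such as `values.5`, so that one document absorbs up to 3,600 samples. If two samples land in the same second, the later one wins in this sketch.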

Replica Sets

https://docs.mongodb.org/manual/core/replication-introduction/

• Multiple copies on servers
  • Provides fault tolerance
  • All writes go to the primary
• Secondaries replicate the primary's oplog
  • Asynchronous replication
• Improved read performance
  • You can specify reading from a replica (read preference)
• Automatic failover
  • An election is held if the primary goes down
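Automatic failover relies on an election, and a primary can only be elected by a strict majority of voting members. The small Scala helper below (hypothetical names) works out what that rule implies for different replica set sizes.

```scala
object ReplicaSetMath {
  // A primary can only be elected by a strict majority of voting members.
  def majority(votingMembers: Int): Int = votingMembers / 2 + 1

  def canElectPrimary(reachableVoters: Int, votingMembers: Int): Boolean =
    reachableVoters >= majority(votingMembers)

  // How many voting members can be lost while an election is still possible.
  def faultTolerance(votingMembers: Int): Int = votingMembers - majority(votingMembers)
}
```

The helper gives a fault tolerance of 1 for three members, still 1 for four, and 2 for five, which is why replica sets are usually deployed with an odd number of voting members.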

Sharding

https://docs.mongodb.org/manual/core/sharding-introduction/

• Horizontal scaling
  • Divides and distributes data over shards
• Entities
  • Shards store the data
  • Query routers route requests to shards
  • Config servers hold metadata about the shards
• Shard keys
  • Range-based sharding: efficient querying
  • Hash-based sharding: efficient distribution
• Maintenance
  • Chunk splitting and the balancer
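The trade-off between the two shard-key strategies is easiest to see with monotonically increasing keys such as timestamps. In the Scala sketch below, the chunk boundaries are hypothetical and MurmurHash3 is a stand-in for MongoDB's own hashing of shard-key values.

```scala
import scala.util.hashing.MurmurHash3

object ShardRouting {
  // Range-based: sorted chunk boundaries (hypothetical) split the key space into
  // contiguous ranges; a key below boundaries(i) and at or above boundaries(i - 1) goes to shard i.
  def rangeShard(key: Long, boundaries: Vector[Long]): Int = {
    val i = boundaries.indexWhere(key < _)
    if (i == -1) boundaries.size else i
  }

  // Hash-based: hash the key first, so consecutive keys scatter across shards.
  def hashShard(key: Long, numShards: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(key.toString), numShards)
}
```

With range sharding, consecutive timestamps land in the same chunk. That keeps range queries on one shard but concentrates all new writes there. Hashing spreads the writes, at the cost of range queries having to visit every shard.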

http://xkcd.com/

Questions


Altizon is hiring

