Performance is not an Option - gRPC and Cassandra

Posted on 16-Apr-2017


Copyright 2016 Expero, Inc. All Rights Reserved

Abstract

In today's development ecosystem, building a service-oriented architecture based on microservices is common practice. With the rise of Big Data and Internet of Things applications, making these services highly performant is no longer optional. To meet the scalability and performance requirements that customers expect, we need to think differently about how we architect and build these applications.

This session will demonstrate a method for creating a highly performant service-based application using Google’s gRPC and Apache Cassandra in .NET. We will show how you can use gRPC to minimize communication overhead while leveraging Cassandra to optimize storage of time series data. We will explore these concepts by building an Internet of Things (IoT) application that shows how to effectively meet the performance and scalability challenges posed by this new breed of applications.

Performance is not an Option: Building High Performance Web Services with gRPC and Cassandra

June 9th, 2016

#build4prfrmnc

What I hope you take away

● What is gRPC
● What is Cassandra
● How to build a simple gRPC Microservice
● How to persist time series data in Cassandra
● Why you might want to use gRPC/Cassandra instead of a more traditional REST/RDBMS stack for a Time-Series IoT application

About me

Dave Bechberger, Senior Architect
dave@experoinc.com
@bechbd
https://www.linkedin.com/in/davebechberger

Expero - Bringing challenging product ideas to reality

Architecture & Development
Product Strategy
User Experience
Domain Expert

Expero - Select Clients

Austin (HQ) • Houston • New York City • Founded 2003

What is gRPC?

● gRPC is a general-purpose RPC framework
● Built on standards (HTTP/2)
● Free and Open Source
● Built for distributed systems

gRPC Architecture

● Allows clients to call methods on the server as if they were local
● Built for low-latency, highly scalable microservices
● Payload agnostic
● Bi-Directional Streaming
● Pluggable and Extensible
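Bidirectional streaming means the client and the server each send a stream of messages over a single call, so responses can start flowing back before the client has finished sending. The in-process sketch below only mimics the shape of such a handler with Python generators (an iterator of requests in, an iterator of responses out). It uses no network and no gRPC library, runs in lockstep rather than truly concurrently, and the ack_readings/client_readings names and message fields are hypothetical.

```python
from typing import Dict, Iterator, List

Message = Dict[str, int]

def ack_readings(requests: Iterator[Message]) -> Iterator[Message]:
    """Handler shape for a bidirectional stream: requests in, responses out.

    Each acknowledgement is yielded as soon as its reading arrives, so
    responses flow back before the request stream has ended.
    """
    for reading in requests:
        yield {"truck_id": reading["truck_id"], "ack_seq": reading["seq"]}

def client_readings(truck_id: int, count: int, log: List[str]) -> Iterator[Message]:
    """Client-side request stream; logs each message as it is produced."""
    for seq in range(count):
        log.append(f"send {seq}")
        yield {"truck_id": truck_id, "seq": seq, "rpm": 1800 + seq}

log: List[str] = []
for ack in ack_readings(client_readings(truck_id=7, count=3, log=log)):
    log.append(f"ack {ack['ack_seq']}")
print(log)  # ['send 0', 'ack 0', 'send 1', 'ack 1', 'send 2', 'ack 2']
```

The interleaved log is the point: each acknowledgement is produced right after its reading, not after the whole stream.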

Simple Model and Service Definition

Optimized Speed and Performance
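One concrete reason gRPC payloads are small is that gRPC serializes messages with Protocol Buffers by default. The protobuf binary wire format sends field numbers instead of field names and packs integers as base-128 varints. The sketch below hand-encodes a few unsigned integer fields in that format (wire type 0) and compares the size with the same data as JSON. The reading, its field numbers and the helper names are hypothetical; real code would use generated message classes rather than encoding by hand.

```python
import json

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)         # high bit clear: last byte
            return bytes(out)

def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: key = (field_number << 3) | wire type 0, then the value."""
    return encode_varint(field_number << 3) + encode_varint(value)

# Hypothetical engine reading; field numbers 1-3 are assumptions, not from the talk.
reading = {"truck_id": 4217, "timestamp": 1465459200, "rpm": 1850}
proto_bytes = (
    encode_varint_field(1, reading["truck_id"])
    + encode_varint_field(2, reading["timestamp"])
    + encode_varint_field(3, reading["rpm"])
)
json_bytes = json.dumps(reading).encode("utf-8")
print(len(proto_bytes), len(json_bytes))  # 12 56
```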

Code Generation
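From a .proto service definition, protoc with the gRPC plugin generates a client stub and a server-side base class for each service, which is what lets a client call a remote method as if it were local. The hand-written sketch below imitates that split in a single process: the stub serializes the request and sends it over a channel, and the server plumbing deserializes it and dispatches to the developer's implementation. The Telemetry service, the RecordReading method, the JSON payload and the in-process channel are all hypothetical stand-ins. Only the /package.Service/Method path format mirrors what gRPC uses on the wire.

```python
import json
from typing import Callable, Dict

Handler = Callable[[bytes], bytes]

class InProcessChannel:
    """Stand-in for a network channel: routes serialized requests by method path."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, path: str, handler: Handler) -> None:
        self._handlers[path] = handler

    def unary_call(self, path: str, request: bytes) -> bytes:
        return self._handlers[path](request)

RECORD_READING = "/telemetry.Telemetry/RecordReading"  # hypothetical method path

class TelemetryServicer:
    """Server side: the part a developer implements."""

    def record_reading(self, reading: dict) -> dict:
        return {"accepted": True, "truck_id": reading["truck_id"]}

def add_telemetry_servicer(servicer: TelemetryServicer, channel: InProcessChannel) -> None:
    """Server plumbing (normally generated): deserialize, dispatch, serialize."""
    def handler(raw: bytes) -> bytes:
        response = servicer.record_reading(json.loads(raw))
        return json.dumps(response).encode("utf-8")
    channel.register(RECORD_READING, handler)

class TelemetryStub:
    """Client side (normally generated): makes the remote method look local."""

    def __init__(self, channel: InProcessChannel) -> None:
        self._channel = channel

    def record_reading(self, reading: dict) -> dict:
        raw = self._channel.unary_call(RECORD_READING, json.dumps(reading).encode("utf-8"))
        return json.loads(raw)

channel = InProcessChannel()
add_telemetry_servicer(TelemetryServicer(), channel)
stub = TelemetryStub(channel)
print(stub.record_reading({"truck_id": 7, "rpm": 1850}))  # {'accepted': True, 'truck_id': 7}
```

Because the payload only crosses the channel as bytes, swapping JSON for protobuf changes nothing else, which is what "payload agnostic" refers to.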

What is Cassandra?

● Distributed Datastore
● Open Source Apache Project
● No Single Point of Failure
● Scalable

CAP Theorem - Pick 2

● Consistency - all nodes see the same data at the same time
● Availability - every request receives a response
● Partition Tolerance - the system continues to operate even when network failures partition the cluster

ACID vs. BASE

RDBMS World

● Atomic - transactions are “all or nothing”
● Consistent - a completed transaction leaves all data in a valid state
● Isolated - transactions do not interfere with one another
● Durable - results of a committed transaction are permanent

NoSQL World

● Basically Available - the datastore works most of the time
● Soft State - stores are not write-consistent; data can differ between replicas
● Eventually Consistent - stores become consistent over time

Cassandra Architecture

Hash Ring Architecture

● All nodes own a portion of the token ring
● All nodes know which token ranges belong to which nodes
● The partitioner generates a token from the partition key
● Tokens determine where data is located on the ring

How Tokens Work

Diagram: Client Driver (PK: Expero) -> Partitioner (Token: 12) -> Data Written
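A minimal sketch of token ownership on the ring, assuming a toy token space of 0-99 so the numbers stay at the diagram's scale. Cassandra's default Murmur3Partitioner uses tokens from -2^63 to 2^63-1. The MD5-based token_for here is only a stand-in (closer in spirit to the older RandomPartitioner), so it will not reproduce the diagram's Token 12 for "Expero". Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around. The node names and tokens are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple

def token_for(partition_key: str, ring_size: int = 100) -> int:
    """Stand-in partitioner: hash the partition key onto a toy ring of ring_size tokens."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size

class TokenRing:
    """Each node owns (previous node's token, its own token], wrapping around the ring."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        # Sort (token, node) pairs so they read clockwise around the ring.
        self._entries: List[Tuple[int, str]] = sorted(
            (token, node) for node, token in node_tokens.items()
        )
        self._tokens = [token for token, _ in self._entries]

    def _index_for(self, token: int) -> int:
        # First node whose token is >= the data's token; past the last node, wrap to the first.
        return bisect.bisect_left(self._tokens, token) % len(self._tokens)

    def owner(self, token: int) -> str:
        return self._entries[self._index_for(token)][1]

    def replicas(self, token: int, replication_factor: int) -> List[str]:
        """SimpleStrategy-style placement: the owner, then the next nodes clockwise."""
        start = self._index_for(token)
        count = min(replication_factor, len(self._entries))
        return [self._entries[(start + i) % len(self._entries)][1] for i in range(count)]

ring = TokenRing({"node-a": 0, "node-b": 25, "node-c": 50, "node-d": 75})
print(ring.owner(12))        # node-b (owns tokens 1..25)
print(ring.replicas(12, 3))  # ['node-b', 'node-c', 'node-d']
print(ring.owner(token_for("Expero")))
```

Because routing only needs the sorted token list, any node that knows the ring can work out where a key lives, which is what lets every node act as a coordinator.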

Data Replication

● Data replication is automatic
● The number of replicas is called the Replication Factor or RF
● Data can be replicated between Data Centers (NetworkTopologyStrategy)
● Hinted Handoff - if a replica is down, the coordinator stores a hint and replays the write when the replica comes back

Data Replication in Action

Diagram: Client -> Driver (Write A, PK: Expero) -> Partitioner (Token: 12) -> Coordinator -> Data Written, Replica Written, Replica Written
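A simplified in-memory sketch of what the coordinator does with a write: forward it to every replica for the partition, count acknowledgements against the number the consistency level requires, and, for a replica that is down, keep a hint that is replayed when the replica returns (hinted handoff). The class names are hypothetical. Real Cassandra also stops collecting hints for a node that has been down longer than a configurable window and handles timeouts, which this sketch ignores.

```python
from typing import Dict, List, Tuple

class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.rows: Dict[str, str] = {}

class Coordinator:
    def __init__(self) -> None:
        # Hints waiting for a down replica: replica name -> [(key, value), ...]
        self.hints: Dict[str, List[Tuple[str, str]]] = {}

    def write(self, replicas: List[Replica], key: str, value: str, required_acks: int) -> bool:
        """Send the write to all replicas; succeed if enough of them acknowledge."""
        acks = 0
        for replica in replicas:
            if replica.up:
                replica.rows[key] = value
                acks += 1
            else:
                # Hinted handoff: remember the write for the unavailable replica.
                self.hints.setdefault(replica.name, []).append((key, value))
        return acks >= required_acks

    def replay_hints(self, replica: Replica) -> int:
        """Deliver stored hints to a replica that is back up; return how many were replayed."""
        pending = self.hints.pop(replica.name, [])
        for key, value in pending:
            replica.rows[key] = value
        return len(pending)

replicas = [Replica("node-b"), Replica("node-c"), Replica("node-d")]
replicas[2].up = False
coordinator = Coordinator()
ok = coordinator.write(replicas, "Expero", "reading A", required_acks=2)  # RF=3, QUORUM=2
print(ok, replicas[2].rows)  # True {}
replicas[2].up = True
print(coordinator.replay_hints(replicas[2]), replicas[2].rows)  # 1 {'Expero': 'reading A'}
```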

What does it mean to be Eventually Consistent?

● Data will “eventually” match on all replicas, usually within milliseconds
● Consistency Level or CL (11 levels for writes, 10 for reads)
● Tuning consistency affects performance and availability
● CL can be tuned for read/write performance on a per-query basis using CQL
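A common rule of thumb for tuning CL: with replication factor RF, QUORUM needs floor(RF/2) + 1 replicas, and a read is guaranteed to overlap the latest acknowledged write when the replicas read (R) plus the replicas written (W) exceed RF, because the two sets must then share at least one replica. A small sketch of that arithmetic, assuming a single data center and ignoring the LOCAL_* and EACH_* levels; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM: a strict majority of RF."""
    return replication_factor // 2 + 1

def tolerated_down(replication_factor: int, required: int) -> int:
    """Replicas that can be unavailable while a request needing `required` acks still succeeds."""
    return replication_factor - required

def read_sees_latest_write(replication_factor: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3
print(quorum(rf), tolerated_down(rf, quorum(rf)))          # 2 1
print(read_sees_latest_write(rf, quorum(rf), quorum(rf)))  # True  (QUORUM reads + QUORUM writes)
print(read_sees_latest_write(rf, 1, 1))                    # False (ONE + ONE can miss the write)
```

Lowering CL to ONE tolerates more unavailable replicas and answers faster, at the cost of possibly stale reads; that is the performance/availability tradeoff the bullets describe.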

Why use Cassandra over your RDBMS?

● Performance
● Linearly Scalable
● Natively built as a distributed datastore
● Always-On Architecture

How does DataStax differ from Apache Cassandra?

● Certified Cassandra - delivers the highest-quality Cassandra software, for confidence and peace of mind in production environments
● Enterprise Security - full protection for sensitive data
● Automatic Management Services - automate key maintenance functions to keep the database running smoothly
● OpsCenter - advanced management and monitoring functionality for production applications
● Expert Support - answers and assistance from Cassandra experts for all production needs

The Problem - Engine Monitoring

Setup

● Truck Engine Monitoring Software
● Currently ~1,000 trucks taking readings every 10 seconds
● Web API REST services on a SQL Server 2014 database

The Requirements

● You recently landed a huge new client, Expero Trucking Inc.
● Sensor readings now arrive once per second and also include geolocation (lat/long) data
● Adding 10,000 trucks
● Minimize costs with zero downtime

The Problem

● Load grows from 100 measurements/second to 22,000 measurements/second
● Data load grows from ~35 MB/day to ~2.2 GB/day
● Your architecture needs to change
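The measurement rates follow from the setup: 1,000 trucks reporting every 10 seconds is 100 measurements per second. The 22,000 figure is consistent with 11,000 trucks (the existing 1,000 plus 10,000 new) each sending two measurements per second, one sensor reading and one geolocation reading. That split is an assumption, since the slides do not spell it out. A quick check, also converting to measurements per day:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Today: 1,000 trucks, one reading every 10 seconds.
current_trucks = 1_000
current_rate = current_trucks // 10            # 100 measurements/second

# Assumption: 1,000 existing + 10,000 new trucks, each sending one sensor
# reading and one geolocation reading per second (2 measurements/second).
future_trucks = current_trucks + 10_000
future_rate = future_trucks * 2                # 22,000 measurements/second

growth = future_rate // current_rate           # 220x more measurements
print(f"{current_rate:,}/s -> {future_rate:,}/s ({growth}x)")
print(f"{current_rate * SECONDS_PER_DAY:,}/day -> {future_rate * SECONDS_PER_DAY:,}/day")
# 100/s -> 22,000/s (220x)
# 8,640,000/day -> 1,900,800,000/day
```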

The Solution - gRPC and Cassandra

Code available here: https://github.com/experoinc/NDC-Oslo-2016/tree/master/NDC.Oslo

Proposed Solution

● Replace the SQL Server Database with a Cassandra Cluster
● Replace REST-based services with gRPC services

Defining a Model and Service

Generating Client/Server Stubs

Panels: Model Definition; Service Definition

Creating a Cassandra Keyspace and Table
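As an illustration only, here is a common Cassandra time-series layout modeled in memory. The partition key is assumed to be (truck_id, day) so each partition stays bounded in size, and rows within a partition are clustered by timestamp, newest first, so "latest readings for a truck" reads a single partition. In CQL this is roughly what PRIMARY KEY ((truck_id, day), reading_time) with CLUSTERING ORDER BY (reading_time DESC) would express. The bucket choice, names and row shape are assumptions, not the schema from the talk.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import DefaultDict, Dict, List, Tuple

PartitionKey = Tuple[int, str]        # (truck_id, day): assumed bucketing
Row = Tuple[int, float, float, int]   # (epoch_seconds, lat, lon, rpm): hypothetical columns

class TimeSeriesTable:
    """In-memory model of a table partitioned by (truck_id, day) and clustered by time."""

    def __init__(self) -> None:
        # partition key -> {clustering timestamp -> row}
        self._partitions: DefaultDict[PartitionKey, Dict[int, Row]] = defaultdict(dict)

    @staticmethod
    def partition_key(truck_id: int, epoch_seconds: int) -> PartitionKey:
        """Bucket a reading into one partition per truck per UTC day."""
        day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")
        return (truck_id, day)

    def insert(self, truck_id: int, row: Row) -> None:
        # Same partition key + timestamp overwrites the old row, like a Cassandra upsert.
        self._partitions[self.partition_key(truck_id, row[0])][row[0]] = row

    def latest(self, truck_id: int, day: str, limit: int) -> List[Row]:
        """Newest-first rows from one partition (sorted here; Cassandra keeps them sorted on disk)."""
        partition = self._partitions.get((truck_id, day), {})
        return sorted(partition.values(), key=lambda row: row[0], reverse=True)[:limit]

table = TimeSeriesTable()
midnight = 1465430400  # 2016-06-09 00:00:00 UTC
for second in range(5):
    table.insert(7, (midnight + second, 59.91, 10.75, 1800 + second))
table.insert(7, (midnight + 86_400, 59.91, 10.75, 1900))  # next day: a new partition

print(table.latest(7, "2016-06-09", 2))  # rows at +4 s and +3 s (rpm 1804, 1803)
```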

Connecting to Apache Cassandra Using the DataStax Driver

DataStax Open Source C# Driver - https://github.com/datastax/csharp-driver

Writing Data to Cassandra

Reading Data From Cassandra

Time to See Some Running Code

Demo panels: Average Ping Time; Running in Oregon

Tradeoffs of gRPC and Cassandra

Tradeoffs of using gRPC

● Not for Browsers
● Chunk “Big” (>1 MB) Data
● No Nullable Data Types (proto3 scalar fields always have a default value)
● Not Production-Ready Yet (pre-1.0 at the time of this talk)
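The advice to chunk "big" (>1 MB) data means splitting a large payload into smaller messages rather than sending one huge message. A minimal sketch of the idea: split the payload into numbered chunks of at most 1 MB, which could be sent as a stream of messages, and reassemble them on the receiving side. The 1 MB size comes from the slide; the function names and the (sequence number, bytes) chunk shape are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB, the threshold named on the slide

def to_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, chunk) pairs, each chunk at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for seq, start in enumerate(range(0, len(payload), chunk_size)):
        yield seq, payload[start:start + chunk_size]

def reassemble(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the payload; sorting by sequence number is a defensive step."""
    return b"".join(chunk for _, chunk in sorted(chunks))

payload = bytes(range(256)) * 10_000          # 2,560,000 bytes of sample data
chunks = list(to_chunks(payload))
print(len(chunks), [len(c) for _, c in chunks])  # 3 [1048576, 1048576, 462848]
```

Within a single gRPC stream messages arrive in order, so the sort in reassemble only matters if chunks are gathered some other way.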

Tradeoffs of using Cassandra

● No joins between tables
● No Ad-Hoc queries
● Minimal Aggregations
● Complexity
● Cassandra is Not Relational

Learning More

● gRPC
  ○ http://www.grpc.io/
  ○ https://developers.google.com/protocol-buffers/docs/proto3
● Cassandra
  ○ http://cassandra.apache.org/
  ○ http://www.planetcassandra.org/
  ○ https://academy.datastax.com/
  ○ http://www.datastax.com/

Thank you! Any questions?