Toronto jaspersoft meetup

Date post: 26-Jan-2015
Upload: patrick-mcfadin
Transcript
Page 1: Toronto jaspersoft meetup

©2012 DataStax

Move. Faster.

Toronto Jaspersoft User Group

Patrick McFadin, Principal Solution Architect, @PatrickMcFadin

Page 2

About Me/Moi?

• Principal Solution Architect at DataStax, THE Cassandra company

• Cassandra user since 0.7

• Previously:

- Chief Architect at Hobsons

- Started a software services company, Link-11

• Follow me here: @PatrickMcFadin

Page 3

Who is DataStax?

• We employ most of the Cassandra committers
• 24/7 support
• Consulting
• DataStax Enterprise

Page 4

And beer!

And cupcakes! (??)

Page 5

Our Solution

DataStax Enterprise allows you to focus on your Big Data applications instead of battling your underlying infrastructure:

•Velocity

•Volume

•Variety

•Complexity

•Distribution

Page 6

DataStax Enterprise also includes:

•Log4j application log integration
•A single graphical management tool
•World-class support

Page 7

Cassandra as the real-time foundation

•Continuous availability
•Extreme scale
•Multi-datacenter support
•Cloud enablement
•Operational simplicity

Page 8

Hadoop in the same system:

•Batch analytics
•Reduced data movement, fewer ETL operations
•No complex architectures
•Integrated Mahout, Sqoop, Hive, Pig, etc.

Page 9

And we integrate Solr:

•Enterprise search
•Always indexed data
•Scalable performance
•Mission-critical dependability

Page 10

Can we just talk about Cassandra

... and aliens.

Page 11

Roots

Dynamo

BigTable

Page 12

Core concepts: Shared Nothing

Page 13

Core concepts: Replicated

Page 14

Core concepts: WAN Replication

Page 15

Core concepts: Scaling

• Need more write throughput? - add nodes

• Need more read throughput? - add nodes

• Cassandra scales in a linear fashion

• Massive number of ops/sec

Page 16

Core concepts: Scaling

Source: Solving big data challenges for enterprise application performance management. Proceedings of the VLDB Endowment, Volume 5, Issue 12, August 2012, pages 1724-1735

Page 17

Core concepts: CAP Theorem

Consistency - Eventual, but Cassandra will not lose your data.

Partition tolerance - Nodes can't see each other, but the cluster is still up.

Availability - Max uptime for clients.

[Diagram: CAP triangle] Cassandra lives here (Availability + Partition tolerance)...

...and sometimes lives here (Consistency + Partition tolerance).

It’s your choice!
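
"It’s your choice!" refers to tunable consistency: each request decides how many replicas must respond. A minimal Python sketch of the usual overlap rule, assuming replication factor RF, R replicas consulted on read and W replicas acknowledged on write: when R + W > RF, every read touches at least one replica that saw the latest write. QUORUM is a majority, RF // 2 + 1. The function names are hypothetical and not part of any driver.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas, which is what QUORUM waits for."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when any read set must overlap any write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True: QUORUM reads + QUORUM writes
print(reads_see_latest_write(1, 1, rf))                    # False: ONE + ONE is eventually consistent
```

Choosing ONE for both sides trades that guarantee for lower latency and higher availability; QUORUM on both sides keeps it as long as a majority of each row's replicas is reachable.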

Page 18

Core concepts: Availability

Continuous Availability > High Availability

Your infrastructure will fail ...deal with it.

Page 19

Data Model Basics

Page 20

Data Model Basics: Cluster

Cluster - Multiple nodes acting together, even over a WAN.

Keyspace - Logical collection of column families. Stores the replication strategy.

Column Family (Table) - Stores rows of data.
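
A keyspace's replication strategy decides which nodes hold copies of each row; in CQL 3 it is declared when the keyspace is created, for example CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}; (the keyspace name is hypothetical). The Python sketch below models SimpleStrategy-style placement under simplifying assumptions: nodes are listed in ring (token) order, the row's primary owner is already known, and replicas are the owner plus the next nodes clockwise. The Keyspace class and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Keyspace:
    name: str
    replication_factor: int  # the only option SimpleStrategy takes


def simple_strategy_replicas(ring_nodes: list[str], primary_index: int,
                             keyspace: Keyspace) -> list[str]:
    """Primary owner plus the next (replication_factor - 1) nodes clockwise on the ring."""
    rf = keyspace.replication_factor
    if rf > len(ring_nodes):
        raise ValueError("replication_factor is larger than the number of nodes")
    return [ring_nodes[(primary_index + i) % len(ring_nodes)] for i in range(rf)]


demo = Keyspace("demo", replication_factor=3)
nodes = ["node1", "node2", "node3", "node4"]    # hypothetical nodes in token order
print(simple_strategy_replicas(nodes, 3, demo))  # ['node4', 'node1', 'node2']
```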

Page 21

Data Model Basics: Rows

• Unique in column family

• Hashed

• Randomly assigned to node*

• Indexed for speed

*You pick the partitioner. Please pick random. Please. Please. Please
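
To make "hashed, randomly assigned to node" concrete, here is a toy token ring in Python. It assumes a RandomPartitioner-style MD5 hash folded into the range [0, 2**127), and that each node owns the token range ending at its own token. The Ring class and node names are hypothetical, and the sketch only finds the first owner; real Cassandra also places further replicas.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 127  # RandomPartitioner-style token range [0, 2**127)


def token_for(row_key: str) -> int:
    """Hash a row key to a token with MD5, folded into TOKEN_SPACE."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


class Ring:
    """Each node owns the token range (previous node's token, its own token]."""

    def __init__(self, node_tokens: dict[str, int]):
        pairs = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in pairs]
        self.nodes = [node for _, node in pairs]

    def node_for_token(self, token: int) -> str:
        idx = bisect.bisect_left(self.tokens, token)
        return self.nodes[idx % len(self.nodes)]  # past the last token: wrap to the first node

    def node_for(self, row_key: str) -> str:
        return self.node_for_token(token_for(row_key))


# Four hypothetical nodes with evenly spaced tokens.
ring = Ring({f"node{i}": i * TOKEN_SPACE // 4 for i in range(4)})
for key in ["ctodd", "pmcfadin", "jsmith"]:
    print(key, "->", ring.node_for(key))
```

Because the hash spreads keys uniformly, evenly spaced tokens give each node roughly the same share of rows, which is why the slide pleads for the random partitioner.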

Page 22

Data Model Basics: Columns

• Assigned to a row

• Column Name: ByteArray, up to 64KB

• Column Value: ByteArray, up to 2GB (!!)

• Timestamp of when set

• Optional: Expire TTL

• Dynamic

[Diagram: a Row ... holding columns, each with Column Name | Column Value | Timestamp | TTL]
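
A minimal sketch of the per-column metadata above, assuming microsecond timestamps and last-write-wins reconciliation between two versions of the same column. The Column and reconcile names are illustrative only; the sketch measures TTL expiry from the write timestamp, which is a simplification, and its tie-break on equal timestamps is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int             # write time, microseconds
    ttl: Optional[int] = None  # seconds until the column expires; None means never

    def is_live(self, now_micros: int) -> bool:
        """A column with a TTL disappears once ttl seconds have passed since its write."""
        if self.ttl is None:
            return True
        return now_micros < self.timestamp + self.ttl * 1_000_000


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: the version with the newer timestamp survives."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b  # tie-break on value (an assumption of this sketch)


old = Column("email", "old@example.com", timestamp=1_000)
new = Column("email", "new@example.com", timestamp=2_000, ttl=60)
winner = reconcile(old, new)
print(winner.value)                        # new@example.com
print(winner.is_live(2_000 + 59_000_000))  # True
print(winner.is_live(2_000 + 60_000_000))  # False
```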

Page 23

Data Model Basics: Wide Rows

• How wide? 2 Billion columns!!!

• No schema needed

• Row key, many columns

• Add columns as needed per row
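
Wide rows stay cheap to read because columns are stored sorted by name, so a slice is a contiguous range. The toy WideRow class below (hypothetical, in-memory only) shows that idea with sorted column names and a start/end slice that can be reversed.

```python
import bisect


class WideRow:
    """Toy wide row: one row key, any number of columns kept sorted by name."""

    def __init__(self, key: str):
        self.key = key
        self._names: list[str] = []        # sorted column names
        self._values: dict[str, str] = {}

    def insert(self, name: str, value: str) -> None:
        """Add or overwrite a column; new names need no schema change."""
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start: str, end: str, reverse: bool = False) -> list[tuple[str, str]]:
        """Return columns whose names fall in [start, end], in sorted (or reversed) order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        names = self._names[lo:hi]
        if reverse:
            names.reverse()
        return [(name, self._values[name]) for name in names]


row = WideRow("sensor42")  # hypothetical row key
for minute, reading in [("10:00", "21.5"), ("10:15", "21.7"), ("10:30", "22.0"), ("10:45", "22.4")]:
    row.insert(minute, reading)
print(row.slice("10:10", "10:40"))                # [('10:15', '21.7'), ('10:30', '22.0')]
print(row.slice("10:10", "10:40", reverse=True))  # [('10:30', '22.0'), ('10:15', '21.7')]
```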

Page 24

Data Model Basics: Data Access

Thrift

• Cassandra's client API built entirely on top of Thrift*

• Provides for manipulation of Data Model and Data

• Almost all current clients implement this API

CQL

• Cassandra Query Language

• New binary driver as of 1.2

• Extends functionality beyond Thrift

Page 25

Data Model Basics: Data Access

More about CQL

• Rapidly evolving spec

- Version 1 since Cassandra 0.8

- Version 2 since Cassandra 1.0

- Version 3 since Cassandra 1.1

- Final cut in 1.2

• Offers more features than Thrift

• DataStax Drivers

Page 26

Data Model Basics: Fixed schema

CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email varchar,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);

• Similar to an RDBMS table. Fairly fixed columns
• This example: row key = username, which is unique
• Use secondary indexes on firstname and lastname for lookup
• Adding columns with Cassandra is super easy (no downtime)

CREATE INDEX user_firstname ON users (firstname);
CREATE INDEX user_lastname ON users (lastname);

Page 27

Data Model Basics: One-to-many

CREATE TABLE comments (
  videoid uuid,
  username varchar,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (videoid, username, comment_ts)
);

• Videos have many comments
• Comments have many users
• Rows are ordered by the clustering columns (username, then comment_ts); reversible if needed
• Use getSlice() to pull some or all of the comments

Page 28

Data Model Basics: One-to-many pt2

[Diagram: wide row, time ordered]

• Underlying storage model is still wide rows

• CQL presents as a table

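• Within a videoid partition, username and comment_ts are filterable

-- the videoid value below is a hypothetical example
SELECT comment FROM comments
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0
  AND username = 'ctodd'
  AND comment_ts > '2012-07-12 10:30:00';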
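
Why this filter is efficient: rows inside one videoid partition are sorted by (username, comment_ts), so "username = ? AND comment_ts > ?" selects one contiguous range. A toy Python model of a single partition, assuming an in-memory sorted list in place of the real storage engine; the class name and sample data are hypothetical.

```python
import bisect
from datetime import datetime


class CommentsPartition:
    """Toy model of one videoid partition, rows sorted by (username, comment_ts)."""

    def __init__(self):
        self._keys: list[tuple[str, datetime]] = []  # clustering keys, kept sorted
        self._comments: dict[tuple[str, datetime], str] = {}

    def insert(self, username: str, comment_ts: datetime, comment: str) -> None:
        key = (username, comment_ts)
        if key not in self._comments:
            bisect.insort(self._keys, key)
        self._comments[key] = comment

    def comments_after(self, username: str, after: datetime) -> list[str]:
        """Like WHERE username = ? AND comment_ts > ? within this partition."""
        start = bisect.bisect_right(self._keys, (username, after))  # first key past (username, after)
        result = []
        for user, ts in self._keys[start:]:
            if user != username:
                break  # left the contiguous block of rows for this username
            result.append(self._comments[(user, ts)])
        return result


video = CommentsPartition()
video.insert("ctodd", datetime(2012, 7, 12, 9, 0), "First!")
video.insert("ctodd", datetime(2012, 7, 12, 11, 0), "Great video")
video.insert("jdoe", datetime(2012, 7, 12, 12, 0), "Agreed")
print(video.comments_after("ctodd", datetime(2012, 7, 12, 10, 30)))  # ['Great video']
```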

Page 29

Data Model Basics: Query Tables

• No joins in Cassandra

• Filtering and scans can be expensive
• Tag is unique regardless of video
• Great for “List videos with X tag”
• Tags have to be updated in Video and Tag at the same time
• Index integrity is maintained in app logic

CREATE TABLE tag_index (
  tag varchar,
  videoid varchar,
  timestamp timestamp,
  PRIMARY KEY (tag, videoid)
);

Powerful performance tool!
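
Because the index is maintained in application logic, every tag change must touch both the video record and tag_index. A minimal in-memory sketch of that dual write, with hypothetical dicts standing in for the two tables; against a real cluster the two writes would typically be grouped, for example in a batch.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two tables.
videos: dict[str, set[str]] = {}                   # videoid -> its tags
tag_index: dict[str, set[str]] = defaultdict(set)  # tag -> videoids


def set_video_tags(videoid: str, new_tags: set[str]) -> None:
    """Write the video's tags and the tag_index query table together."""
    old_tags = videos.get(videoid, set())
    for tag in old_tags - new_tags:  # tags removed from this video
        tag_index[tag].discard(videoid)
        if not tag_index[tag]:
            del tag_index[tag]
    for tag in new_tags - old_tags:  # tags added to this video
        tag_index[tag].add(videoid)
    videos[videoid] = set(new_tags)


def videos_with_tag(tag: str) -> set[str]:
    """Answer 'list videos with X tag' from one index entry: no join, no scan."""
    return set(tag_index.get(tag, set()))


set_video_tags("v1", {"cassandra", "nosql"})
set_video_tags("v2", {"cassandra"})
set_video_tags("v1", {"nosql"})              # retag: v1 must leave the "cassandra" entry
print(sorted(videos_with_tag("cassandra")))  # ['v2']
```

If the application skipped the removal loop, "cassandra" would still list v1: that drift is exactly what "index integrity is maintained in app logic" warns about.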

Page 30

Data Model Basics: Loading data

> 1 million rows

• BI tools - Talend, Pentaho, JasperSoft
• Custom code - My personal favorite
• sstableloader - Only for specific file types; requires files to be in SSTable format

sstableloader -d 10.0.0.100 /home/pmcfadin/dbfiles

Page 31

Data Model Basics: Loading data

< 1 million rows

• Everything that worked for 1 million+
• CQL COPY command
• Loads a delimited file into a table

COPY customers (Card_ID, Registration_Date, Gender, Birth_Date) FROM 'Customers_File.txt' WITH HEADER=true AND DELIMITER=',';
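
Roughly what the COPY command does with HEADER=true and DELIMITER=',': skip the header line and map each remaining line's fields onto the listed columns, in order. A Python sketch using the standard csv module; parse_delimited and the sample data are hypothetical, and cqlsh's real COPY additionally converts types and writes the rows to the table.

```python
import csv
import io


def parse_delimited(source, columns, header=True, delimiter=","):
    """Map each delimited line onto `columns`, in order, like COPY ... FROM."""
    reader = csv.reader(source, delimiter=delimiter)
    if header:
        next(reader, None)  # HEADER=true: the first line is column names, not data
    rows = []
    for fields in reader:
        if len(fields) != len(columns):
            raise ValueError(f"line {reader.line_num}: expected {len(columns)} fields, got {len(fields)}")
        rows.append(dict(zip(columns, fields)))
    return rows


# Hypothetical sample file contents.
sample = io.StringIO(
    "Card_ID,Registration_Date,Gender,Birth_Date\n"
    "1001,2012-03-01,F,1980-05-17\n"
    "1002,2012-04-15,M,1975-11-02\n"
)
rows = parse_delimited(sample, ["Card_ID", "Registration_Date", "Gender", "Birth_Date"])
print(len(rows), rows[0]["Card_ID"])  # 2 1001
```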

Page 32

Cassandra 1.2 Data Access

•Collections (maps, sets, lists)
•Support for virtual nodes (vnodes)
•Query profiler
•Atomic batches
•Enhanced JBOD support
•Native binary CQL transport (no Thrift)
•Parallel leveled compactions
•Off-heap bloom filters

Page 33

Collections

•Structure to column values

•Insert and update

•Map

•List

•Set

cqlsh> CREATE TABLE users (
           user_id text PRIMARY KEY,
           first_name text,
           last_name text,
           emails set<text>
       );

http://www.datastax.com/dev/blog/cql3_collections

Page 34

Request tracing

•Automatically stored for 24h

•Full path trace

•Includes node info

cqlsh> tracing on;
Now tracing requests.

cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
                  execute_cql3_query | 00:02:37,015 | 127.0.0.1 |              0
                   Parsing statement | 00:02:37,015 | 127.0.0.1 |             81
                 Preparing statement | 00:02:37,015 | 127.0.0.1 |            273
   Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 |            540
       Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 |            779
   Messsage received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |             63
                   Applying mutation | 00:02:37,016 | 127.0.0.2 |            220
                Acquiring switchLock | 00:02:37,016 | 127.0.0.2 |            250
              Appending to commitlog | 00:02:37,016 | 127.0.0.2 |            277
                  Adding to memtable | 00:02:37,016 | 127.0.0.2 |            378
    Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            710
       Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            888
   Messsage received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2334
 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
                    Request complete | 00:02:37,017 | 127.0.0.1 |           2581

http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
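
One way to read a trace like the one on this slide is to find, per node, the step with the largest source_elapsed (elapsed microseconds on that node). A small Python sketch that parses pipe-separated trace rows copied from cqlsh output; the function name is hypothetical and it assumes exactly four columns per row.

```python
def slowest_step_per_source(trace_text: str) -> dict[str, tuple[str, int]]:
    """For each source node, return the activity with the largest source_elapsed."""
    result: dict[str, tuple[str, int]] = {}
    for line in trace_text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # header, separator, or blank line
        activity, _timestamp, source, elapsed = parts
        micros = int(elapsed)
        if source not in result or micros > result[source][1]:
            result[source] = (activity, micros)
    return result


# A few rows copied from the trace on this slide.
TRACE = """
 activity                          | timestamp    | source    | source_elapsed
-----------------------------------+--------------+-----------+----------------
                execute_cql3_query | 00:02:37,015 | 127.0.0.1 |              0
     Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 |            779
                Adding to memtable | 00:02:37,016 | 127.0.0.2 |            378
     Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            888
                  Request complete | 00:02:37,017 | 127.0.0.1 |           2581
"""
for source, (activity, micros) in slowest_step_per_source(TRACE).items():
    print(source, activity, micros)
```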

Page 35

Virtual Nodes (vnodes)

•Many virtual nodes (token ranges) per JVM

•Tokens are auto-assigned (!!!)

•Faster...

✓repair

✓bootstrap

✓decommission

http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
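
With vnodes each node gets many random tokens instead of one hand-picked token; the count comes from the num_tokens setting in cassandra.yaml. A toy Python sketch, assuming a Murmur3-sized token space shifted to start at 0 and a fixed random seed, that assigns tokens and computes how much of the ring each node owns. Both function names are hypothetical.

```python
import random

TOKEN_SPACE = 2 ** 64  # Murmur3 tokens span [-2**63, 2**63); shifted here to [0, 2**64)


def assign_tokens(nodes, num_tokens, seed=0):
    """Give every node num_tokens random tokens; return the ring sorted by token."""
    rng = random.Random(seed)
    ring = [(rng.randrange(TOKEN_SPACE), node) for node in nodes for _ in range(num_tokens)]
    ring.sort()
    return ring


def ownership(ring):
    """Fraction of the token space each node owns; a token owns (previous token, token]."""
    if len(ring) == 1:
        return {ring[0][1]: 1.0}
    owned = {}
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps around to the last token
        owned[node] = owned.get(node, 0) + (token - previous) % TOKEN_SPACE
    return {node: span / TOKEN_SPACE for node, span in owned.items()}


ring = assign_tokens(["node1", "node2", "node3"], num_tokens=256)
for node, share in sorted(ownership(ring).items()):
    print(f"{node}: {share:.1%} of the ring")
```

With many small ranges per node, shares tend to even out, and a joining or leaving node exchanges small slices with every other node rather than one large range with a single neighbor.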

Page 36

Data Model Basics: Data Access

DEMO

