Time Series Data with InfluxDB

Post on 12-Aug-2015

Working with time series data with InfluxDB

Paul Dix @pauldix

paul@influxdb.com

What is time series data?

Stock trades and quotes

Metrics

Analytics

Events

Sensor data

Two kinds of time series data…

Regular time series

t0 t1 t2 t3 t4 t6 t7

Samples at regular intervals

Irregular time series

t0 t1 t2 t3 t4 t6 t7

Events whenever they come in

Inducing a regular time series from an irregular one

query: select count(customer_id) from events where time > now() - 1h group by time(1m), customer_id
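As a rough illustration of what the query above computes, here is a minimal Python sketch that floors irregular event timestamps into 1-minute buckets and counts them per customer. The function name and the sample events are made up for illustration, and this sketch only reports buckets that contain at least one event.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def count_per_bucket(events, bucket=timedelta(minutes=1)):
    """Count events per (bucket start, customer_id).

    `events` is an iterable of (timestamp, customer_id) pairs arriving at
    irregular times; the result is keyed by regular bucket boundaries.
    """
    counts = Counter()
    for ts, customer_id in events:
        # Floor the timestamp to the start of its bucket.
        whole_buckets = (ts - EPOCH) // bucket  # timedelta // timedelta -> int
        counts[(EPOCH + whole_buckets * bucket, customer_id)] += 1
    return counts

# Hypothetical sample data: four events at irregular offsets.
base = datetime(2015, 8, 12, 10, 0, tzinfo=timezone.utc)
events = [
    (base + timedelta(seconds=5), "a"),
    (base + timedelta(seconds=42), "a"),
    (base + timedelta(seconds=61), "b"),
    (base + timedelta(seconds=130), "a"),
]
counts = count_per_bucket(events)
for (start, customer), n in sorted(counts.items()):
    print(start.isoformat(), customer, n)
```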

Data that you ask questions about over time

InfluxDB is an open source distributed* time series database

* still working on the distributed part

Why would you want a database for time series data?

Scale

Example from DevOps

• 2,000 servers, VMs, containers, or sensor units

• 200 measurements per server/unit

• every 10 seconds

• = 3,456,000,000 distinct points per day
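The arithmetic behind that figure, spelled out as a short Python check (the variable names are only for illustration):

```python
servers = 2_000                  # servers, VMs, containers, or sensor units
measurements_per_server = 200
interval_seconds = 10

# One sample per measurement every 10 seconds.
samples_per_day = 24 * 60 * 60 // interval_seconds   # 8,640 samples per day
points_per_day = servers * measurements_per_server * samples_per_day

print(f"{points_per_day:,}")  # 3,456,000,000
```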

Sharding data: usually requires application level code

Data retention: application level code and sharding

Rollups and aggregation

InfluxDB features

SQL style query language

Retention policies: automatically managed data retention

Continuous queries: for rollups and aggregation

HTTP API - 2 endpoints

Write: HTTP POST /write?db=mydb&rp=foo

Read: HTTP GET /query?db=mydb&rp=foo&q=
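A minimal sketch of building the two endpoint URLs in Python. The base address and the helper names are assumptions for illustration; only the paths and the db, rp, and q parameters come from the slides. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Assumption: a local InfluxDB instance on the default port.
BASE_URL = "http://localhost:8086"

def write_url(db, rp=None):
    """URL to POST points to; rp (retention policy) is optional."""
    params = {"db": db}
    if rp is not None:
        params["rp"] = rp
    return f"{BASE_URL}/write?{urlencode(params)}"

def query_url(db, q, rp=None):
    """URL to GET query results from; the query text is URL-encoded."""
    params = {"db": db}
    if rp is not None:
        params["rp"] = rp
    params["q"] = q
    return f"{BASE_URL}/query?{urlencode(params)}"

print(write_url("mydb", "foo"))
print(query_url("mydb", "select * from cpu", "foo"))
```

The query string has to be URL-encoded because it contains spaces and other reserved characters, which is why `urlencode` is used rather than plain string concatenation.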

InfluxDB Schema

• Measurements (e.g. cpu, temperature, event, memory)

• Tags (e.g. region=uswest, host=serverA, sensor=23)

• Fields (e.g. value=23.2, info='this is some extra stuff', present=true)

• Timestamp (nano-second epoch)

All data is indexed by measurement, tagset, and time
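To make the schema concrete, here is a hedged Python sketch that turns one point into a single text line of the form `measurement,tags fields timestamp`. That layout follows InfluxDB's line protocol as far as I know it; the slides do not show it, so treat the exact format as an assumption. The helper names are hypothetical, integers are not handled, and no escaping of spaces or commas in names is attempted.

```python
def series_key(measurement, tags):
    """Measurement plus sorted tagset: the part of a point that is indexed."""
    return measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))

def format_field(value):
    # bool must be checked before float: True is also a number in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    return str(float(value))

def to_line(measurement, tags, fields, timestamp_ns):
    """Render one point; assumes names contain no spaces or commas."""
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    return f"{series_key(measurement, tags)} {field_part} {timestamp_ns}"

line = to_line(
    "cpu",
    {"region": "uswest", "host": "serverA"},
    {"value": 23.2, "present": True},
    1439337600000000000,  # nanoseconds since the Unix epoch
)
print(line)
```

Tags are sorted so that the same measurement and tagset always produce the same series key, regardless of the order in which the tags were supplied.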

Influx CLI

$ ./influx
Connected to http://localhost:8086 version 0.9
InfluxDB shell 0.9
>

Create a database

CREATE DATABASE foo

Create a retention policy

CREATE RETENTION POLICY <rp-name> ON <db-name> DURATION <duration> REPLICATION <n> [DEFAULT]

CREATE RETENTION POLICY high_precision ON mydb DURATION 7d REPLICATION 3 DEFAULT

Writes will go into this RP unless otherwise specified
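As a toy model of what a retention policy's DURATION means, the sketch below parses a duration such as 7d and keeps only points newer than the cutoff. It filters individual points for clarity; how InfluxDB expires data internally is not covered by the slides, and the function names and unit table are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; the slides only show "7d".
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}

def parse_duration(text):
    """Turn a string like '7d' or '10m' into a timedelta."""
    return timedelta(**{UNITS[text[-1]]: int(text[:-1])})

def enforce_retention(points, duration, now):
    """Keep only (timestamp, value) points within `duration` of `now`."""
    cutoff = now - parse_duration(duration)
    return [(ts, value) for ts, value in points if ts >= cutoff]

now = datetime(2015, 8, 12, tzinfo=timezone.utc)
points = [
    (now - timedelta(days=8), 1.0),   # older than 7 days: dropped
    (now - timedelta(days=6), 2.0),
    (now - timedelta(hours=1), 3.0),
]
kept = enforce_retention(points, "7d", now)
print(kept)
```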

Discovery

Inverted index of measurements and tags

SHOW MEASUREMENTS

SHOW MEASUREMENTS WHERE host = 'serverA'

SHOW TAG KEYS

SHOW TAG KEYS FROM cpu

SHOW TAG VALUES FROM cpu WITH KEY = 'region'

SHOW SERIES

SHOW SERIES WHERE service = 'redis'
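The discovery statements above can be answered from an inverted index that maps measurements and tag pairs back to series. The Python sketch below is a toy in-memory model of that idea, not InfluxDB's actual implementation; the class, its methods, and the sample series are all hypothetical.

```python
from collections import defaultdict

class SeriesIndex:
    """Toy inverted index: measurement and (tag key, tag value) -> series keys."""

    def __init__(self):
        self.series = {}                        # series key -> (measurement, tags)
        self.by_measurement = defaultdict(set)  # measurement -> series keys
        self.by_tag = defaultdict(set)          # (tag key, tag value) -> series keys

    def add(self, measurement, tags):
        key = measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))
        self.series[key] = (measurement, dict(tags))
        self.by_measurement[measurement].add(key)
        for pair in tags.items():
            self.by_tag[pair].add(key)
        return key

    def _keys(self, where=None):
        # `where` is an optional (tag key, tag value) equality filter.
        return set(self.series) if where is None else self.by_tag.get(where, set())

    def show_measurements(self, where=None):
        return sorted({self.series[k][0] for k in self._keys(where)})

    def show_tag_keys(self, measurement=None):
        keys = set(self.series) if measurement is None else self.by_measurement.get(measurement, set())
        return sorted({tag for k in keys for tag in self.series[k][1]})

    def show_tag_values(self, measurement, tag_key):
        keys = self.by_measurement.get(measurement, set())
        return sorted({self.series[k][1][tag_key] for k in keys if tag_key in self.series[k][1]})

    def show_series(self, where=None):
        return sorted(self._keys(where))

idx = SeriesIndex()
idx.add("cpu", {"host": "serverA", "region": "uswest"})
idx.add("cpu", {"host": "serverB", "region": "useast"})
idx.add("memory", {"host": "serverB", "region": "useast"})
idx.add("requests", {"host": "serverA", "service": "redis"})

print(idx.show_measurements(("host", "serverA")))
print(idx.show_tag_keys("cpu"))
print(idx.show_tag_values("cpu", "region"))
print(idx.show_series(("service", "redis")))
```

Because lookups go from tag pair to series rather than scanning every series, a filter such as host = 'serverA' only touches the series that carry that tag.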

Queries

SQL-ish

select * from some_series where time > now() - 1h

Aggregates

select percentile(90, value) from cpu where time > now() - 1d group by time(10m)

select percentile(90, value) from cpu where time > now() - 1d group by time(10m), region

Group by a tag
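To show what grouping by both a time window and a tag produces, here is a small Python sketch that computes a 90th percentile per (10-minute window, region). It uses a nearest-rank percentile, which is an assumption; InfluxDB's exact selection rule may differ. Timestamps are plain integer seconds and the sample points are invented.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (p between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percentile_by_window_and_tag(points, p, window_s=600):
    """Group (timestamp_s, region, value) points, then aggregate each group."""
    groups = defaultdict(list)
    for ts, region, value in points:
        window_start = ts - ts % window_s  # floor to the 10-minute boundary
        groups[(window_start, region)].append(value)
    return {key: percentile(vals, p) for key, vals in groups.items()}

# Hypothetical data: ten uswest samples in one window, two useast samples
# that fall into different windows.
points = [(i * 60, "uswest", i + 1) for i in range(10)]
points += [(30, "useast", 50), (620, "useast", 70)]

result = percentile_by_window_and_tag(points, 90)
for key in sorted(result):
    print(key, result[key])
```

Each distinct tag value gets its own output series, so adding region to the group by clause multiplies the number of result rows by the number of regions present in the time range.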

Where against Regex (field)

select value from some_log_series where value =~ /.*ERROR.*/ and time > '2014-03-01' and time < '2014-03-03'

Where against Regex (tag)

select value from some_log_series where host =~ /.*asdf.*/ and time > '2014-03-01' and time < '2014-03-03' group by host

Functions

min, max, percentile, first, last, stddev, mean, count, sum, median, distinct, count(distinct)

more soon: difference, histogram, moving_average

Continuous queries

CREATE CONTINUOUS QUERY "10m_event_count"
ON mydb
BEGIN
  SELECT count(value)
  INTO "6_months".events
  FROM events
  GROUP BY time(10m)
END;

Other tools

Telegraf: data collection

Chronograf

Grafana

More coming

• Compression

• Clustering

• Custom functions

Thank you!

Paul Dix @pauldix

paul@influxdb.com