Date posted: 20-Aug-2015
Category: Technology
Uploaded by: pydata
• Writing a simple application with Bokeh
• Packaging our application with Docker
• Orchestrating our application with Ferry
Technical material can be found at: https://github.com/jhorey/pydata
U.S. Census
http://api.census.gov/data/2011/acs5?get=DP03_0062E&for=county:*&in=state:06
Median income (DP03_0062E) for all counties in California (state FIPS code 06)
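The query above can be built and its response parsed in a few lines of Python. This is a minimal sketch that assumes the API's usual JSON shape: an array of arrays whose first row holds the column names, with every value returned as a string. The function names `build_county_query` and `parse_census_json` are hypothetical helpers, not part of any Census client library.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

# Endpoint taken from the query above; newer Census API releases may expose
# data-profile (DP) variables under a different path.
ACS5_2011 = "http://api.census.gov/data/2011/acs5"


def build_county_query(variable: str, state_fips: str) -> str:
    """Build a URL requesting one variable for every county in one state."""
    params = {"get": variable, "for": "county:*", "in": "state:" + state_fips}
    # Keep ':' and '*' unescaped so the URL matches the one shown above.
    return ACS5_2011 + "?" + urlencode(params, safe=":*")


def parse_census_json(payload: str) -> List[Dict[str, str]]:
    """Turn the array-of-arrays JSON (header row first) into one dict per county."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]
```

To fetch live data, pass the URL to `urllib.request.urlopen` and hand `resp.read().decode("utf-8")` to `parse_census_json`. The numeric values still need an explicit `int(...)` conversion before plotting.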
Let’s install Bokeh:
$ pip install bokeh
>> Downloading/unpacking bokeh
>> SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.
$ apt-get install python-dev && pip install bokeh
>> gcc: error trying to exec 'cc1plus': execvp: No such file or directory
$ apt-get install g++
$ pip install bokeh
>> RuntimeError: bokeh sample data directory does not exist, please execute bokeh.sampledata.download()
$ python
>>> import bokeh.sampledata
>>> bokeh.sampledata.download()
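Both install failures above come from missing build prerequisites. The first is the Python headers (`Python.h`, from python-dev). The second is the C++ compiler front end (`cc1plus`, from g++). A hypothetical preflight check like the sketch below could report both before `pip` runs. The function name is illustrative. The check only inspects the current interpreter's include directory and the `PATH`; it does not install anything.

```python
import os
import shutil
import sysconfig
from typing import Dict


def build_prerequisites() -> Dict[str, bool]:
    """Report whether the pieces the failed `pip install bokeh` runs needed are present."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Missing on Debian/Ubuntu until the python-dev package is installed.
        "Python.h": os.path.exists(os.path.join(include_dir, "Python.h")),
        # gcc needs the C++ front end (cc1plus), which ships with g++.
        "g++": shutil.which("g++") is not None,
        "pip": shutil.which("pip") is not None,
    }


if __name__ == "__main__":
    for name, ok in build_prerequisites().items():
        print(f"{name:10} {'ok' if ok else 'MISSING'}")
```

Even a check like this only reports problems on the machine it runs on. That is part of the argument for shipping the environment itself in a container.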
Let’s share
#!/bin/bash
# Make sure we have 'pip' installed
apt-get install python-pip
# Install packages in the right order
apt-get --yes install g++ python-dev
pip install bokeh
# Now download the data
python geography.py data/
python population economic Kentucky data/
# Start the web server
python webserver data/
• Your script didn’t work
• Oh, I was supposed to run this as sudo?
• OK, it still didn’t work
• I get this funny error
• Oh yeah, I’m running Red Hat
• OK, I’m at my desk, just use my computer
Docker
• Encapsulates applications in isolated containers
• Makes it easy and safe to distribute applications
• Easy to get started
Our Dockerfile:
• Start from a clean Ubuntu Precise (12.04) image
• Install the dependencies (pip, g++, python-dev, Bokeh)
• Add our files
• Set the command to run when the container starts
$ docker build -t ferry/pydata .
$ docker push ferry/pydata
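The build-and-push step above can also be driven from a Python script. This is a sketch, not part of the talk's material. The helper names `build_commands` and `run_all` are hypothetical. Passing argument lists to `subprocess.run` avoids shell quoting, and `check=True` stops at the first failing step. Note that pushing to Docker Hub requires being logged in (`docker login`) to an account with write access to the image's namespace.

```python
import subprocess
from typing import List


def build_commands(image: str, context: str = ".") -> List[List[str]]:
    """The `docker build` and `docker push` invocations above, as argv lists."""
    return [
        ["docker", "build", "-t", image, context],
        ["docker", "push", image],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order; raise CalledProcessError at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(build_commands("ferry/pydata"))` performs the same two steps as the shell commands.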
Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata
[Figure: container p1 running on the host kernel and hardware]
Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata
$ docker run -p 8001:8000 --name p2 -d ferry/pydata
$ docker run -p 8002:8000 --name p3 -d ferry/pydata
[Figure: containers p1, p2 and p3 sharing one host kernel and hardware]
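The three `docker run` commands above follow one pattern. Each replica gets its own name and its own host port, while every container still listens on port 8000 internally. The sketch below generates that pattern for any number of replicas. The function name and the base-port defaults are illustrative choices, not part of Docker.

```python
from typing import List


def replica_run_commands(image: str, count: int, host_base_port: int = 8000,
                         container_port: int = 8000) -> List[List[str]]:
    """Build `docker run` argv lists for p1..pN, each published on its own host port."""
    commands = []
    for i in range(count):
        commands.append([
            "docker", "run",
            "-p", f"{host_base_port + i}:{container_port}",  # host:container
            "--name", f"p{i + 1}",
            "-d",  # run detached
            image,
        ])
    return commands
```

Each list can be passed to `subprocess.run(argv, check=True)`. Because the host ports differ, the replicas can coexist on one machine even though they are identical inside.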
• Containers share the host’s kernel and hardware
• No hardware virtualization: no hypervisor, no guest OS
• Containers are isolated from each other
• Access them via port forwarding (e.g. host port 8001 to container port 8000)
You can run these commands now!
Cassandra
• Highly scalable and fault-tolerant
• Great for storing streaming data (sensors, messages)
CREATE KEYSPACE census
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

USE census;

CREATE TABLE acs_economic_data (
  state_cd TEXT,
  state_name TEXT,
  county_cd TEXT,
  county_name TEXT,
  median INT,
  mean INT,
  capita INT,
  PRIMARY KEY (county_cd, state_cd)
);
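The sketch below shows one way rows could be written into this table from Python. It uses the DataStax `cassandra-driver` package (`Cluster`, `connect`, `Session.execute`), which takes `%s` placeholders for non-prepared statements. The helpers `to_params` and `load` are hypothetical. The record layout assumes the census fields have already been renamed to the table's column names. The driver is imported inside `load`, so the rest of the code runs without it installed.

```python
from typing import Dict, Iterable, Tuple

# Column order follows the acs_economic_data table defined above.
COLUMNS = ("state_cd", "state_name", "county_cd", "county_name",
           "median", "mean", "capita")
INT_COLUMNS = {"median", "mean", "capita"}

INSERT_CQL = "INSERT INTO acs_economic_data ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS)))


def to_params(record: Dict[str, str]) -> Tuple:
    """Order a record's fields to match COLUMNS, converting the INT columns."""
    return tuple(int(record[c]) if c in INT_COLUMNS else record[c] for c in COLUMNS)


def load(records: Iterable[Dict[str, str]], host: str) -> None:
    """Insert records into the census keyspace (requires `pip install cassandra-driver`)."""
    from cassandra.cluster import Cluster  # lazy import: only needed when loading
    cluster = Cluster([host])
    session = cluster.connect("census")
    try:
        for record in records:
            session.execute(INSERT_CQL, to_params(record))
    finally:
        cluster.shutdown()
```

Since `county_cd` is the partition key, rows for the same county code from different states end up in the same partition. They are distinguished by the `state_cd` clustering column.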
Orchestration
[Figure: Web and DB in separate containers]
[Figure: Web + DB in a single container]
• Simple
• Full control
• More work for you

• Simpler Dockerfile
• More extensible
• How to orchestrate?
• Specify the containers that constitute your application in YAML
• Support for Hadoop, Cassandra, GlusterFS, and OpenMPI
• It’s a little bit like pip for your Docker-based runtime environment
Ferry
http://ferry.opencore.io
Our Application
backend:
   - storage:
        personality: "cassandra"
        instances: 1
connectors:
   - personality: "ferry/pydata-cassandra"
     ports: ["8000:8000"]
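Ferry reads this file itself. The sketch below only shows how the file's structure maps onto plain Python data, which is handy for sanity checks such as spotting host-port clashes before running `ferry start`. `APP` is a hand-written stand-in for what `yaml.safe_load` (PyYAML) would return for the file above. The helper names are hypothetical and not part of Ferry.

```python
from typing import Any, Dict, List

# Hand-written equivalent of yaml.safe_load() applied to the file above.
APP = {
    "backend": [{"storage": {"personality": "cassandra", "instances": 1}}],
    "connectors": [{"personality": "ferry/pydata-cassandra",
                    "ports": ["8000:8000"]}],
}


def published_ports(app: Dict[str, Any]) -> List[int]:
    """Collect the host-side ports every connector asks to publish."""
    ports = []
    for connector in app.get("connectors", []):
        for mapping in connector.get("ports", []):
            host, _, _container = mapping.partition(":")
            ports.append(int(host))
    return ports


def storage_summary(app: Dict[str, Any]) -> List[str]:
    """Describe each storage backend as 'personality x instances'."""
    return [f"{b['storage']['personality']} x {b['storage']['instances']}"
            for b in app.get("backend", []) if "storage" in b]
```

With PyYAML installed, `yaml.safe_load(open("cassandra.yml"))` could replace the hand-written `APP`.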
# The cassandra-client base comes with the various drivers
# pre-installed.
FROM ferry/cassandra-client
NAME ferry/pydata-cassandra

# Place the start scripts in the event directories so they
# are started when the connector is brought up.
ADD ./scripts/startcas.sh /service/runscripts/start/
ADD ./scripts/restartcas.sh /service/runscripts/restart/
RUN chmod a+x /service/runscripts/start/startcas.sh
RUN chmod a+x /service/runscripts/restart/restartcas.sh
Easy to share (again)
$ ferry start cassandra.yml
sa-df8d0aa6
$ ferry ps
UUID         Storage      Compute  Connectors   Status   Base           Time
----         -------      -------  ----------   ------   ----           ----
sa-df8d0aa6  se-54ed4e93           se-a5350a8d  running  cassandra.yml
$ ferry ssh sa-df8d0aa6
root@client-se-a5350a8d:~# ps -eaf | grep python
root  144  1  0 19:49 ?  00:00:00 python /home/ferry/pydata/bokeh/webserver.py /home/ferry/pydata/data
What’s it doing?
$ ferry start cassandra.yml
[Figure: the Web connector container linked to the Cassandra (C*) storage containers; Ferry generates the configuration]
root@client-se-a5350a8d:~# env | grep BACK
BACKEND_STORAGE_TYPE=cassandra
BACKEND_STORAGE_IP=10.1.0.12
What’s it doing?
$ ferry start yarn
[Figure: a Client container linked to YARN (Y) compute containers and GlusterFS (G) storage containers]
root@client-se-b597cb21:~# env | grep BACK
BACKEND_STORAGE_TYPE=gluster
BACKEND_STORAGE_IP=10.1.0.18
BACKEND_COMPUTE_TYPE=yarn
BACKEND_COMPUTE_IP=10.1.0.15
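The `env` output above shows how a connector learns where its backends are: Ferry exports `BACKEND_<ROLE>_TYPE` and `BACKEND_<ROLE>_IP` variables. The sketch below is one hypothetical way an application could read them instead of hard-coding addresses. The `Backend` type and `read_backend` helper are illustrative, and the variable names are taken from the output shown. In the Cassandra stack only the storage variables were shown, so a missing role returns `None`.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Backend(NamedTuple):
    kind: str  # e.g. "cassandra", "gluster", "yarn"
    ip: str


def read_backend(role: str, environ: Mapping[str, str] = os.environ) -> Optional[Backend]:
    """Read BACKEND_<ROLE>_TYPE and BACKEND_<ROLE>_IP for role 'storage' or 'compute'."""
    prefix = f"BACKEND_{role.upper()}_"
    kind = environ.get(prefix + "TYPE")
    ip = environ.get(prefix + "IP")
    if kind is None or ip is None:
        return None  # this role is not part of the running stack
    return Backend(kind, ip)
```

Inside the Cassandra connector, `read_backend("storage").ip` would give the address to use as the database contact point.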
• Even simple applications can be complicated to install and run
• Docker helps quite a bit with this
• Ferry helps build out big data applications