Date posted: 19-Jan-2015
Category: Technology
Uploaded by: charles-care
Data, dev-ops, and cloud services
Building a distributed data-platform
Charles Care
Engineering Team, Kasabi / Talis
Talk overview
● About me...
● What Kasabi is,
● what we are trying to do
● how we are working to achieve that
● a quick walk-through
● Discussion of the Kasabi platform team
● Our technology / architecture
● Our engineering culture
● Lessons learnt
Views are mine...
…and not necessarily those of my (current/past) employers
About me...
● 2001-2004 – BSc Computer Science (Warwick)
● 2004-2008 – PhD Computer Science (Warwick)
● 2007-2011 – BT Plc
● Technical risk analyst – BT Global MPLS Network
● Software Engineer – Infrastructure for Financial Markets
● Senior Software Engineer – Central software standards and tools
● 2011-Present – Talis/Kasabi
● Software Engineer – Semantic web platform
About Kasabi
● Data marketplace
● Bringing together data...
● owners
● consumers
● Lowering the barrier for data-driven apps to enter the market
● Enabling new opportunities for aggregating and mixing data
Data licensing today
Data Owners Data Consumers
Bespoke, expensive, contracts
Kasabi as a data platform
Data Owners
Third-party services
Application Developers
Data enthusiasts
Data engineers
API developers
About Kasabi
● Publish datasets using standard APIs
● Access data using standard APIs
● Query a dataset using SPARQL
● Search a dataset using a simple full-text search
● Define, contribute, and share your own APIs
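As a rough illustration of "query a dataset using SPARQL" over a standard HTTP API, the sketch below builds a GET request URL in the style of the SPARQL protocol, which carries the query text in a `query` parameter. The endpoint URL and the `apikey` parameter name are assumptions for illustration only; the slides do not show Kasabi's actual URL scheme.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the real Kasabi URL scheme is not shown in the slides.
ENDPOINT = "http://api.example.org/dataset/my-dataset/apis/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def build_sparql_url(endpoint: str, query: str, api_key: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    # `apikey` is an assumed parameter name, used here only for illustration.
    params = {"query": query, "apikey": api_key}
    return f"{endpoint}?{urlencode(params)}"


url = build_sparql_url(ENDPOINT, QUERY, "YOUR-KEY")

# Decoding the URL again shows what the service would receive.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

Any HTTP client can then fetch `url`; nothing here depends on a particular client library.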
A dataset
Access data using standard APIs
Contribute custom APIs
Example – contributed APIs
Current organisation
● Product development
● Data engineering
● Customer operations
● Platform development
Platform architecture
Data Platform
Load balancing and routing
Update services Search services Query services
Datasets
● Need to store and update datasets
● Access data via various services
● Must scale with load and increasing data
● Must be tolerant to failure
● Extensible
● Should be easy to add new services over time
To distribute...
...or not to distribute
Distributed Platform
[Diagram: a routing layer in front of several update, search, and SPARQL service nodes (with room for a new service), alongside a sequence service, a storage service, and monitoring services, all connected by a dynamic gossip network.]
Distributed Platform – updates
[Diagram: a routing layer in front of several update, search, and SPARQL service nodes (with room for a new service), alongside a sequence service, a storage service, and monitoring services, all connected by a dynamic gossip network.]
- Updates are sequenced
- Data stored in distributed storage
Distributed Platform – updates
[Diagram: a routing layer in front of several update, search, and SPARQL service nodes (with room for a new service), alongside a sequence service, a storage service, and monitoring services, all connected by a dynamic gossip network.]
- Updates are gossiped around network
- Here a SPARQL node realises that it should apply the update
Distributed Platform – query
[Diagram: a routing layer in front of several update, search, and SPARQL service nodes (with room for a new service), alongside a sequence service, a storage service, and monitoring services, all connected by a dynamic gossip network.]
SPARQL queries will now reflect the update that was submitted
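The update flow in these slides (updates are sequenced, gossiped around the network, then applied by the nodes that care about them) can be sketched as a small in-process simulation. This is a toy model under stated assumptions, not the Kasabi implementation: the class names, the single global sequence counter, and the rule that a node buffers out-of-order messages until the gap is filled are all illustrative choices.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    seq: int       # position assigned by the (hypothetical) sequence service
    dataset: str   # dataset the update belongs to
    payload: str   # the change itself, kept as a plain string here


class SequenceService:
    """Hands out strictly increasing sequence numbers, starting at 1."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_seq(self) -> int:
        return next(self._counter)


class ServiceNode:
    """A service node (e.g. SPARQL) that applies updates in sequence order."""

    def __init__(self, name: str, datasets: set) -> None:
        self.name = name
        self.datasets = datasets   # datasets this node hosts
        self.applied = []          # payloads applied so far, in order
        self._pending = {}         # out-of-order updates, keyed by seq
        self._next = 1             # next sequence number we expect

    def on_gossip(self, update: Update) -> None:
        # Gossip may deliver duplicates; ignore anything already seen.
        if update.seq < self._next or update.seq in self._pending:
            return
        self._pending[update.seq] = update
        # Drain the buffer while there is no gap in the sequence.
        while self._next in self._pending:
            ready = self._pending.pop(self._next)
            if ready.dataset in self.datasets:
                self.applied.append(ready.payload)
            self._next += 1


sequencer = SequenceService()
u1 = Update(sequencer.next_seq(), "films", "add A")
u2 = Update(sequencer.next_seq(), "books", "add B")
u3 = Update(sequencer.next_seq(), "films", "add C")

node = ServiceNode("sparql-1", {"films"})
for message in (u3, u2, u1, u3):   # out of order, with one duplicate
    node.on_gossip(message)
print(node.applied)
```

The node ends up with the two "films" updates applied in sequence order even though the messages arrived shuffled, which is the sense in which queries eventually reflect a submitted update.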
Monolithic vs distributed
● Monolithic
● Easy to synchronise events and data
● Consistent views and queries
● Less inter-process communication / less network overhead
● Easier to optimise for high throughput
● Single code-base
● Fewer processes to monitor
● Distributed
● Service-oriented - separate concerns run in isolated processes (and can be scaled independently)
● Development is component-based
– Changes are more focussed / helps avoid scope-creep
● Deployment can be localised to avoid downtime
● Failure is more likely – so you need to plan for it
● Easier to integrate out-of-the-box software – e.g. using standard Apache Solr
Distributed data platform
● Separate services for each API
● Communication via Gossip messages
● Have to manage eventual consistency
● Highly scalable
● Easy to add new services
● Use standard protocols and open-source components
● HTTP libraries / REST / ZeroMQ / Apache Thrift
● RDF and SPARQL using Apache Jena
● Search using Apache Solr
● Avoid modification and forks
● Deploy into Amazon EC2 (also using: S3, EMR, and ELB)
Benefits of using cloud services
Consider a start-up in 2002
● Have an idea...
● Get funding (development, op-ex, cap-ex)
● Acquire servers
● Set-up your servers
– mail, web, source code repo, build systems
– development, staging, live
● Some 'cloud' services
– …, SourceForge, shared servers, etc
● Build and go to market
● Probably embedding open-source components
● Delivery based on full-stack, monolithic, architectures
Consider a start-up in 2012
● Have an idea...
● Get funding (development capital, op-ex)
● you will probably not get cap-ex
● Use cloud services... rent rather than buy
● SaaS – Software as a Service
– Why would you run your own (chat/email etc)?
– Host your code in GitHub/BitBucket etc
● PaaS – Platform as a Service
– Do you need to control the full stack?
– Could you leverage platforms like Heroku, Joyent, AppEngine etc?
– Amazon RDS
● IaaS – Infrastructure as a Service
– Cloud services to provide 'bare metal'
● Build and go to market quickly
● Scale elastically over time
But what about the enterprise?
● Benefits of cloud services are already transforming the enterprise
● Private clouds
● Virtual appliances
● Cloud bursting
● Independent scaling
● Separation of concerns
● SOA architecture
● And in future...
● Appetite for IaaS is growing
● PaaS and SaaS will follow.
● Perimeter security will be replaced by localised security boundaries
So how do we build this stuff...?
How it all happens
● Constantly iterating through...
● Requirements
● Development (Test-driven)
● Testing/Review
● Deployment
● Operation
● We're an Agile, dev-ops team... so all the above is a shared responsibility
Being a dev-ops team...
● Removing barriers between development and operations
● Shared responsibilities rather than distrust
● Everyone has root access
● Developers are responsible for operating systems they build
● Everyone is free to make changes
...and responsible for managing the roll-out of those changes
● Ops/Deployment/Monitoring are automated
● Everyone should have full-stack awareness
● Read more...
● http://dev2ops.org/blog/2010/2/22/what-is-devops.html
● http://www.jedi.be/blog/
● http://en.wikipedia.org/wiki/Devops
● http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
Life-cycle of a change
Requirements and Planning
● Identification of requirement
● Planning
● Break down big changes into smaller tasks
– Can the change be deployed in small steps?
– Can the change be dark-deployed?
● Understand the wider impact
● Find middle ground between generic and specific
● Team is self-organising
● People pull work from the prioritised, planned stories
Branch based development
● One branch per change, squash before merge
Writing the code
● Work on a branch
● don't know if/when you'll merge
● Test-driven
● Unit tests first
● Do acceptance tests need to change?
● What technology? Which tool-sets?
● Smoke testing
● How do you know it works?
● What's different in production?
● What are the risks of failure?
● Feature flags?
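One way to read the "Feature flags?" and "dark-deployed" questions above: new code ships to production but stays switched off until someone flips a flag. The sketch below is a minimal illustration that reads flags from environment variables; the `FEATURE_` prefix, the flag name, and the `search` function are hypothetical and not taken from the Kasabi code-base.

```python
import os


def flag_enabled(name: str, env=os.environ) -> bool:
    """Return True if an environment variable such as FEATURE_NEW_SEARCH is switched on."""
    value = env.get(f"FEATURE_{name.upper()}", "0")
    return value.lower() in {"1", "true", "on"}


def search(query: str, env=os.environ) -> str:
    """Route to the new code path only when its flag is enabled."""
    if flag_enabled("new_search", env):
        return f"new-search:{query}"   # dark-deployed path, off by default
    return f"old-search:{query}"       # existing behaviour remains the default


# Passing a plain dict instead of os.environ makes both paths easy to unit test.
print(search("jena", {}))
print(search("jena", {"FEATURE_NEW_SEARCH": "true"}))
```

Because the default is "off", merging and deploying the change is decoupled from releasing the behaviour, which also gives a quick way back if the new path misbehaves in production.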
Tests run: 110, Failures: 0, Errors: 0, Skipped: 2
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 39 seconds
[INFO] Finished at: Sat Feb 18 15:20:36 GMT 2012
[INFO] Final Memory: 33M/240M
[INFO] ------------------------------------------------------------------------
Writing the code
● Avoid unnecessary scope-creep
● “I'll just fix this...”
● “It would be much cleaner if I re-factored this...”
● “It would be neat if I also added this...”
● …however, these observations can be written as new stories
● …and sometimes it's good to fix things before they cause pain
● …if extra changes are really necessary, can they be implemented separately?
● …team should be empowered to fix technical debt
● ...managing scope-creep is a shared responsibility
● Be prepared to abandon a change if it's taking too long; maybe it needs more planning?
● Should you be pairing?
● Should you demo your work?
Code review
● Code review possible with tools for distributed teams (e.g. Gerrit or ReviewBoard)
● If you're not following a strict pairing policy, code-review is vital
● Useful to make others aware of changes
● Gerrit
● Build agent automatically builds your change and runs tests – verify +/- 1
● Invite others to review your code, they can give it a score between -2 and +2.
● Can only deploy code once at least one person has given a +2
● Work-flow is customisable
● Self-organising... anyone can review
$> git commit
$> git review
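The gating rule described on this slide (a build agent votes verify +/- 1, reviewers score between -2 and +2, and code can only go out once someone has given a +2) can be written down as a small predicate. This is a sketch of the rule as stated, not Gerrit's own code; treating a -2 as a veto is an assumption about a typical configuration, and as the slide notes the work-flow is customisable.

```python
from typing import Sequence


def can_submit(verified: Sequence[int], code_review: Sequence[int]) -> bool:
    """Decide whether a change may be merged, given the votes it has received.

    verified:    votes from the build agent, each -1 or +1
    code_review: votes from human reviewers, each between -2 and +2
    """
    build_ok = any(v == 1 for v in verified) and all(v != -1 for v in verified)
    approved = any(score == 2 for score in code_review)
    # Assumption: a -2 blocks the change even if someone else gave +2.
    vetoed = any(score == -2 for score in code_review)
    return build_ok and approved and not vetoed


print(can_submit(verified=[1], code_review=[1, 2]))
print(can_submit(verified=[1], code_review=[1, 1]))
```

The second call is rejected because two +1 votes do not add up to a +2: at least one reviewer has to take full responsibility for approving the change.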
Code review (2)
Code review (3)
Merge / Deployment
● Merge & Deployment
● One-click deployment
● Developer should press the button
● Code is merged into the master/release branch
● Build server automatically checks out the code and builds, tags, and uploads the release to an artefact repository
● Package is automatically deployed on all servers
– Extra orchestration for external-facing services to avoid “thundering-herd” problems
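One form the "extra orchestration" could take is a staggered roll-out: restart servers in small batches with some random jitter, so that clients of an external-facing service do not all reconnect at the same moment. The sketch below only computes a schedule. The function name, batch size, and delays are hypothetical, and the slides do not say how Kasabi actually orchestrated its deployments.

```python
import random


def rollout_plan(servers, batch_size, base_delay_s, jitter_s, rng=None):
    """Return a list of (start_offset_seconds, batch) pairs for a staggered deploy."""
    rng = rng or random.Random()
    plan = []
    for i in range(0, len(servers), batch_size):
        batch = servers[i:i + batch_size]
        # Each batch starts one base delay after the previous one,
        # plus a random offset so restarts do not line up exactly.
        start = (i // batch_size) * base_delay_s + rng.uniform(0, jitter_s)
        plan.append((start, batch))
    return plan


servers = ["web-1", "web-2", "web-3", "web-4", "web-5"]
for start, batch in rollout_plan(servers, batch_size=2, base_delay_s=60, jitter_s=10):
    print(f"t+{start:.0f}s deploy to {batch}")
```

Internal services can often take the simpler "deploy everywhere at once" path; the staggering matters most where a restart is visible to outside callers.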
Managing infrastructure
● Puppet or Chef
● Build packages (e.g. DEB or RPM)
● Centralise configuration management
● Utilising cloud compute infrastructure
● Amazon EC2
● Amazon S3
● Elastic load balancers
● Elastic Map-Reduce
● Application monitoring
● Metrics
● Log analysis
● Internal monitoring
● External checks
Lessons learnt
(again, my views!)
Technical lessons learnt
● Use distributed SOA-based services to reduce tight-coupling
● Monitor everything...
● Leverage cloud offerings
● wrap them with well-defined interfaces to avoid lock-in
● Design systems to scale
● Use open and unmodified components where possible
● Standard components fronting external APIs
● E.g. Jena, Solr, HAProxy, Apache
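The advice to wrap cloud offerings with well-defined interfaces can be illustrated with a small storage abstraction. Application code depends only on the interface, so a provider-specific implementation (for example one backed by Amazon S3) can be swapped for another, or for an in-memory stand-in during tests. The `BlobStore` name, its methods, and the `archive_dataset` helper are hypothetical and chosen for this sketch.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """The only storage interface the rest of the code is allowed to use."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under the given key."""


class InMemoryBlobStore(BlobStore):
    """Stand-in used for tests; a cloud-backed class would offer the same methods."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_dataset(store: BlobStore, name: str, triples: bytes) -> str:
    """Application code: knows about BlobStore, not about any provider."""
    key = f"datasets/{name}.nt"
    store.put(key, triples)
    return key


store = InMemoryBlobStore()
key = archive_dataset(store, "films", b"<s> <p> <o> .\n")
print(key, store.get(key))
```

Moving to a different provider then means writing one new class that implements `put` and `get`, rather than touching every call site.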
Practices that have helped us
● Dev-ops culture
● Pragmatic approach to agile development
● Task allocation should be 'pull', rather than 'push'
● Teams should be self-organising
● Pairing when working on new problems
● Test-Driven-Development (TDD)
● Continuous integration
● Peer-review of code
● Continuous deployment
…so, in summary...
Conclusion
● Isolate your design into components
● Empower your team to release small changes frequently
● Leverage hosted/cloud offerings
Thanks for listening!
Credits
● Thanks for the invite to speak
● Thanks to Kasabi / Talis Systems Ltd
● Sign up at http://www.kasabi.com
Graphics from http://www.iconarchive.com/, http://www.oxygen-icons.org and http://www.icons-land.com
Questions?