Date posted: 11-May-2015
Category: Technology
Uploaded by: tony-tam
Inside Wordnik's Architecture
Tony Tam (@fehguy)
Who is Wordnik?
•Founded in 2008 by Erin McKean
•"Understand meaning of words automatically"
•Patented "Free-Range Definition" technology
•Constructed largest (known) English Word Graph
We do Discovery
It's all about Data!
Data?
•Word Graph is built by data
•Runtime answers needed fast
50M+ Nodes!
80 ms reads!
80M+ Edges!
What we do with Data
•Update the Graph constantly
•Augment our NLP pipeline
•"Reality-based Annotation" with current, real-world data
Language is NOT static
Twitter?
Tumblr?
WordPress?
Next???
Is a 20 year-old corpus good enough?
How we do it
•Amazon EC2-based deployment
•Efficiency through constraint-based architecture
• Small is Big!
•Horizontal scaling by adding servers!
• Yea, we can always go vertical
•Blah, blah, more details!
Micro Services
•Services are stand-alone building blocks
•Increase capacity through a "more like this" button
Micro Services
•Big application => micro services
Monolithic application
"Isn't this just SOA?"
Not PO-SOA
•This is different
• No proprietary message bus
• Decoupled objects
• Dedicated storage***
•Speak REST
• Develop your services in…
• Java
• Scala
• Ruby
• PHP
All valid!
Speak REST?
•Sounds good but…
• REST semantics vary wildly
• HATEOAS vs. practical REST?
/api/pet.json/1?delete (GET)
/api/pet.json/1 (DELETE)
/api/pet.json/1 (POST empty)
All valid!
So…
API Styleguide!
Peer Review!
Better Docs!
API Council!
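The three delete styles above all work, which is exactly why a style guide is needed. Here is a minimal sketch of how a style-guide check might flag them. It assumes a hypothetical rule: deletes must use the DELETE method and must not encode the action in the query string. The rule and the name `check_delete_style` are illustrative, not taken from Wordnik's actual guide.

```python
from urllib.parse import urlsplit


def check_delete_style(method, url):
    """Return a list of style violations for a request meant to delete a resource."""
    problems = []
    parts = urlsplit(url)
    if method.upper() != "DELETE":
        problems.append("use the DELETE method, not " + method.upper())
    if parts.query:
        problems.append("do not encode the action in the query string")
    return problems


# The three variants from the slide.
candidates = [
    ("GET", "/api/pet.json/1?delete"),
    ("DELETE", "/api/pet.json/1"),
    ("POST", "/api/pet.json/1"),
]

for method, url in candidates:
    print(method, url, check_delete_style(method, url) or "ok")
```

A check like this can run during peer review, so the style guide is enforced by tooling rather than by memory.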
mSOA Creates New Challenges
•It's all about communication (not easy)
•Need a contract between consumer and provider
•This was the driving force behind creating Swagger
What is Swagger?
•Swagger is…
• Spec for declaring and documenting an API
• A framework for auto-generating the spec
• A library for client library generation
• A JSON-based test framework
•It's open source!
• http://swagger.wordnik.com
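To make the "consumer & provider contract" idea concrete, here is a minimal sketch. The dictionary layout is a simplified, hypothetical stand-in for a machine-readable API declaration and is not the actual Swagger spec format. `missing_operations` is likewise an illustrative helper.

```python
# Hypothetical, simplified API declaration published by a provider.
PET_API = {
    "apiVersion": "1.0",
    "operations": [
        {"method": "GET", "path": "/pet.json/{petId}"},
        {"method": "DELETE", "path": "/pet.json/{petId}"},
    ],
}


def missing_operations(declaration, needed):
    """Return the (method, path) pairs a consumer needs that the provider does not declare."""
    declared = {(op["method"], op["path"]) for op in declaration["operations"]}
    return [pair for pair in needed if pair not in declared]


# A consumer states what it relies on; the check fails fast if the provider lacks it.
consumer_needs = [("GET", "/pet.json/{petId}"), ("POST", "/pet.json")]
print(missing_operations(PET_API, consumer_needs))
```

Once the declaration is machine-readable, the same data can drive documentation, client generation, and compatibility checks.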
How?
•Swagger Codegen
• Creates a client based on your Swagger Spec
scala src/main/scala/Codegen.scala \
${swagger-spec-url}
Scala
Ruby
In the Wordnik Workflow
•Jenkins will…
• Build a service library
• Build a stand-alone application distro
• Build an installable image (RPM)
• Build a compatible client library
•Consumers will…
• Declare dependency on a service version
• Use a client for that version
• Be given a list of compatible services, by cluster, version
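The last bullet can be sketched as a registry lookup. Everything here is an assumption made for illustration: the `ServiceInstance` record, the sample addresses, and the compatibility rule (same major version, minor version at least the one requested). The deck does not say how Wordnik decides compatibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    name: str
    version: str  # "major.minor.patch"
    cluster: str
    url: str


def compatible(instances, name, wanted_version, cluster):
    """Return instances of `name` in `cluster` that satisfy the assumed rule:
    same major version, minor version >= the wanted minor version."""
    want_major, want_minor = [int(p) for p in wanted_version.split(".")[:2]]
    result = []
    for inst in instances:
        major, minor = [int(p) for p in inst.version.split(".")[:2]]
        if (inst.name == name and inst.cluster == cluster
                and major == want_major and minor >= want_minor):
            result.append(inst)
    return result


# Hypothetical registry contents.
REGISTRY = [
    ServiceInstance("word-service", "1.2.0", "prod", "http://10.0.0.11:8001"),
    ServiceInstance("word-service", "1.3.1", "prod", "http://10.0.0.12:8001"),
    ServiceInstance("word-service", "2.0.0", "prod", "http://10.0.0.13:8001"),
    ServiceInstance("word-service", "1.3.1", "staging", "http://10.0.1.12:8001"),
]

for inst in compatible(REGISTRY, "word-service", "1.3.0", "prod"):
    print(inst.version, inst.url)
```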
Back to Data
•Micro services have small(ish) databases
• Share nothing across services
• YES to replica sets
•Deployed to ephemeral storage
• (more in a bit)
• Small by design
•How to keep them small?
Keeping Databases Small
•Some easy tricks
• Schema-less => "schema per document"
• Keep field names short!
db.foo.save({user_name:"Tony"})
db.foo.save({un:"Tony"})
•Indexes
• They can get *huge*
• Make _id matter!
Repeat 10e9 times!
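Field names are stored in every document, so the savings add up. The sketch below computes the BSON size of the two documents by hand, following the BSON layout for a flat document with only string values. It ignores the _id field that MongoDB adds on insert, and the helper name is illustrative.

```python
def bson_size_flat_strings(doc):
    """BSON size in bytes of a flat document whose values are all strings."""
    size = 4 + 1  # int32 document length + trailing 0x00 terminator
    for name, value in doc.items():
        size += 1                                # element type byte (0x02 = string)
        size += len(name.encode("utf-8")) + 1    # field name as a NUL-terminated cstring
        size += 4 + len(value.encode("utf-8")) + 1  # int32 length + bytes + NUL
    return size


long_doc = {"user_name": "Tony"}
short_doc = {"un": "Tony"}

saved_per_doc = bson_size_flat_strings(long_doc) - bson_size_flat_strings(short_doc)
repeats = 10 * 10**9  # "10e9" read literally

print(bson_size_flat_strings(long_doc), bson_size_flat_strings(short_doc))
print(saved_per_doc * repeats / 10**9, "GB saved")
```

The 7 bytes saved per document are exactly the difference in field-name length. Nothing else in the encoding changes.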
Keeping Databases Small
•Don't make _id just an "auto increment"
You're stuck with it! Be smart
• User collection? Try _id: username
• Email collection? Try _id: email
• Date-driven collection? How about _id: "20120502"
• db.logins.find({_id:/^201205/})
Be lazy until you can't anymore!
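A date-shaped _id pays off because an anchored prefix match over an ordered index can be answered as a range scan. The sketch below illustrates the idea with a sorted Python list standing in for the _id index. It is an analogy and not MongoDB's implementation, and the sample ids are made up.

```python
from bisect import bisect_left


def prefix_range(sorted_ids, prefix):
    """Return the ids that start with `prefix` using two binary searches.

    `prefix` must be non-empty and `sorted_ids` must already be sorted.
    """
    lo = bisect_left(sorted_ids, prefix)
    # Smallest string that sorts after every string with this prefix:
    # bump the last character of the prefix by one.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(sorted_ids, upper)
    return sorted_ids[lo:hi]


login_days = sorted(["20120430", "20120501", "20120502", "20120531", "20120601"])
print(prefix_range(login_days, "201205"))
```

No secondary index on a date field is needed, because the key you are already "stuck with" does the work.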
Keeping Databases Small
•DAO or die!
• Fancy index scheme => control access to collections
NO!!!!
Yes
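A minimal sketch of the DAO idea, assuming an in-memory dictionary as a stand-in for the collection. `UserDao` and its field mapping are hypothetical. The point is that callers see readable names, while the short keys and the meaningful _id stay private to the DAO.

```python
class UserDao:
    """Hypothetical DAO: the only code that knows how users are stored."""

    def __init__(self):
        self._collection = {}  # stand-in for a database collection keyed by _id

    def save(self, user_name, email):
        # Storage layout: user name doubles as _id, email uses a short key.
        self._collection[user_name] = {"_id": user_name, "em": email}

    def find_by_name(self, user_name):
        doc = self._collection.get(user_name)
        if doc is None:
            return None
        # Translate the compact stored form back to readable names.
        return {"user_name": doc["_id"], "email": doc["em"]}


dao = UserDao()
dao.save("tony", "tony@example.com")
print(dao.find_by_name("tony"))
```

If the index or key scheme changes later, only the DAO changes.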
Keeping Databases Small
•If/when you need to shard…
Don't make your clients do this!
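One way to keep sharding away from clients is to hide the routing behind the service's own data layer. The sketch below assumes hash-based routing over in-memory dictionaries. `ShardedStore` is a hypothetical name, and this is not a description of how Wordnik or MongoDB shards data.

```python
import hashlib


class ShardedStore:
    """Callers use put/get only; which shard holds a key is an internal detail."""

    def __init__(self, shard_count):
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Stable hash so the same key always routes to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)


store = ShardedStore(shard_count=3)
for word in ["apple", "banana", "cherry", "date"]:
    store.put(word, len(word))
print(store.get("banana"))
```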
Keeping Databases Small
•Again, why keep them small?
•Starting a new replica
• Initial sync
• Index rebuilding
•Backups
•Index Compaction
•Speed
•TCO
Everything is easier
This can take DAYS
Ephemeral Storage?
•Every EC2 instance type has some (except micro)
•Only available via EC2 API
•Less prone to issues than EBS
•Faster ***
•Included in cost of server
But dies on host reboot!
Keeping Data Safe
Which Zone? Which Region?
Arbiter handles external connectivity issue detection
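The zone question matters because a MongoDB replica set needs a strict majority of voting members to elect a primary. The sketch below checks which single-zone outages a layout survives. The zone names and layouts are hypothetical, and this is only one reading of why the arbiter sits where it does.

```python
def has_majority(members_up, total_members):
    """A primary can be elected only with a strict majority of voting members."""
    return members_up > total_members // 2


def survives_zone_loss(members_by_zone):
    """For each zone, report whether the set keeps a majority if that zone goes dark."""
    total = sum(members_by_zone.values())
    return {
        zone: has_majority(total - count, total)
        for zone, count in members_by_zone.items()
    }


# Two data nodes plus an arbiter, each in its own zone (hypothetical layout).
spread = {"zone-a": 1, "zone-b": 1, "zone-c": 1}
# Same three voters, but two of them share a zone.
stacked = {"zone-a": 2, "zone-b": 1}

print(survives_zone_loss(spread))
print(survives_zone_loss(stacked))
```

Spreading the voters means no single zone failure can take the set below a majority.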
How does this really stack up?
•Tuned indexes & access, split with services
• Was: 3 DAS Devices w/18 TB disk
• Now: 21 M1.large + M1.xlarge instances
• 3 Zones, 2 regions
•The Gory Details
• blog.wordnik.com/with-software-small-is-the-new-big
As for Services
•~1,000 requests/sec via Swagger-enabled micro services
•Direct to Consumer via SwaggerSocket
What's Next
•Migrating all services to SwaggerSocket
• OSS WebSocket subprotocol
https://github.com/wordnik/swaggersocket
• 25%-100% speed increase (sync & async)
•Discovery via Wordnik
If you're Interested…
See more:
developer.wordnik.com
swagger.wordnik.com
github.com/wordnik
Questions?