
Inside Wordnik's Architecture

Date post: 11-May-2015
Upload: tony-tam
Description:
Slides about Wordnik's arch
Page 1: Inside Wordnik's Architecture

Inside Wordnik's Architecture

Tony Tam (@fehguy)

Page 2: Inside Wordnik's Architecture

Who is Wordnik?

•Founded in 2008 by Erin McKean

•"Understand meaning of words automatically"

•Patented "Free-Range Definition" technology

•Constructed largest (known) English Word Graph

We do Discovery

Page 3: Inside Wordnik's Architecture

It's all about Data!

Page 4: Inside Wordnik's Architecture

Data?

•Word Graph is built by data

•Runtime answers needed fast

50M+ Nodes!

80 ms reads!

80M+ Edges!

Page 7: Inside Wordnik's Architecture

What we do with Data

•Update the Graph constantly

•Augment our NLP pipeline

•"Reality-based Annotation" with current, real-world data

Language is NOT static

Twitter?

Tumblr?

WordPress

Next???

Page 8: Inside Wordnik's Architecture

Is a 20 year-old corpus good enough?

Page 9: Inside Wordnik's Architecture

How we do it

•Amazon EC2-based deployment

•Efficiency through constraint-based architecture

• Small is Big!

•Horizontal scaling by adding servers!

• Yea, we can always go vertical

•Blah, blah, more details!

Page 10: Inside Wordnik's Architecture

Micro Services

•Services are stand-alone building blocks

•Increase capacity through a "more like this" button

Page 11: Inside Wordnik's Architecture

Micro Services

•Big application => micro services

Monolithic application

"Isn't this just SOA?"

Page 15: Inside Wordnik's Architecture

Not PO-SOA

•This is different

• No proprietary message bus

• Decoupled objects

• Dedicated storage***

•Speak REST

• Develop your services in…

• Java

• Scala

• Ruby

• PHP

Page 17: Inside Wordnik's Architecture

Speak REST?

•Sounds good but…

• REST semantics vary wildly

• HATEOAS vs. practical REST?

/api/pet.json/1?delete (GET)

/api/pet.json/1 (DELETE)

/api/pet.json/1 (POST empty)

All valid!

So…

API Styleguide!

Peer Review!

Better Docs!

API Council!
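One way a style guide removes the ambiguity above is to pick a single convention for each operation and check it mechanically. The sketch below assumes a hypothetical rule set for deletes (DELETE method only, no action flags in the query string, no body). Neither the rules nor `checkDeleteRequest` come from Wordnik's actual style guide.

```javascript
// Hypothetical style-guide check for delete operations (illustrative rules only).
function checkDeleteRequest({ method, url, body = "" }) {
  const problems = [];
  if (method !== "DELETE") problems.push(`use DELETE, not ${method}`);
  if (url.includes("?")) problems.push("no action flags in the query string");
  if (body.length > 0) problems.push("DELETE requests carry no body");
  return problems;
}

// The three variants from the slide, all of which some API accepts as "delete pet 1".
const candidates = [
  { method: "GET", url: "/api/pet.json/1?delete" },
  { method: "DELETE", url: "/api/pet.json/1" },
  { method: "POST", url: "/api/pet.json/1", body: "" },
];

for (const candidate of candidates) {
  const problems = checkDeleteRequest(candidate);
  const verdict = problems.length === 0 ? "ok" : problems.join("; ");
  console.log(`${candidate.method} ${candidate.url}: ${verdict}`);
}
```

Under these assumed rules only the second variant passes, which is the point of writing the convention down once instead of debating it per service.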

Page 18: Inside Wordnik's Architecture

mSOA makes new Challenges

•It's communication (not easy)

•Need a consumer & provider contract

•Driving force to create Swagger

Page 19: Inside Wordnik's Architecture

What is Swagger?

•Swagger is…

• Spec for declaring and documenting an API

• A framework for auto-generating the spec

• A library for client library generation

• A JSON-based test framework

•It's open source!

• http://swagger.wordnik.com
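To make "a spec for declaring and documenting an API" concrete, here is a minimal sketch of a machine-readable declaration and one thing a tool can do with it. The object is only loosely modeled on the shape of an early Swagger resource declaration; the host, the Pet operations, and the exact field names are assumptions and should be checked against the Swagger spec itself.

```javascript
// Illustrative declaration; field names and values are assumptions, not the official spec.
const declaration = {
  apiVersion: "1.0",
  basePath: "http://api.example.com/api", // hypothetical host
  apis: [
    {
      path: "/pet.{format}/{petId}",
      description: "Operations about pets",
      operations: [
        { httpMethod: "GET", nickname: "getPetById", responseClass: "Pet" },
        { httpMethod: "DELETE", nickname: "deletePet", responseClass: "void" },
      ],
    },
  ],
};

// Flatten the declaration into one line per operation, the kind of summary
// a documentation page or a client generator would start from.
function listOperations(decl) {
  const lines = [];
  for (const api of decl.apis) {
    for (const op of api.operations) {
      lines.push(`${op.httpMethod} ${api.path} -> ${op.nickname}(): ${op.responseClass}`);
    }
  }
  return lines;
}

console.log(listOperations(declaration).join("\n"));
```

Because the declaration is plain data, the same input can drive documentation, client generation, and tests, which is the contract between consumer and provider that the previous slide asks for.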

Page 20: Inside Wordnik's Architecture

How?

•Swagger Codegen

• Creates a client based on your Swagger Spec

scala src/main/scala/Codegen.scala \
  ${swagger-spec-url}

Scala

Ruby

Page 21: Inside Wordnik's Architecture

In the Wordnik Workflow

•Jenkins will…

• Build a service library

• Build a stand-alone application distro

• Build an installable image (RPM)

• Build a compatible client library

•Consumers will…

• Declare dependency on a service version

• Use a client for that version

• Be given a list of compatible services, by cluster, version
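The consumer side of this workflow can be sketched as a lookup: a consumer declares the service version its client was built against and gets back the compatible instances, grouped by cluster. Everything in this sketch is hypothetical: the registry entries, the service names, and the compatibility rule (same major version, equal or newer minor version) are assumptions, not Wordnik's actual scheme.

```javascript
// Hypothetical registry; services, versions, clusters and hosts are made up.
const registry = [
  { service: "word-service", version: "1.2.0", cluster: "us-west", host: "10.0.0.11:8001" },
  { service: "word-service", version: "1.3.1", cluster: "us-west", host: "10.0.0.12:8001" },
  { service: "word-service", version: "2.0.0", cluster: "us-east", host: "10.1.0.21:8001" },
  { service: "user-service", version: "1.0.4", cluster: "us-west", host: "10.0.0.31:8002" },
];

// Assumed rule: same major version, minor version at least the one the client targets.
function compatibleServices(dependency, entries) {
  const [wantMajor, wantMinor] = dependency.version.split(".").map(Number);
  return entries.filter((entry) => {
    if (entry.service !== dependency.service) return false;
    const [major, minor] = entry.version.split(".").map(Number);
    return major === wantMajor && minor >= wantMinor;
  });
}

// Group the matches by cluster so a consumer can prefer nearby instances.
function groupByCluster(entries) {
  const byCluster = {};
  for (const entry of entries) {
    (byCluster[entry.cluster] = byCluster[entry.cluster] || []).push(entry);
  }
  return byCluster;
}

const dependency = { service: "word-service", version: "1.2.0" };
const matches = compatibleServices(dependency, registry);
console.log(groupByCluster(matches));
```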

Page 22: Inside Wordnik's Architecture

Back to Data

•Micro services have small(ish) databases

• Share nothing across services

• YES to replica sets

•Deployed to ephemeral storage

• (more in a bit)

• Small by design

•How to keep them small?

Page 23: Inside Wordnik's Architecture

Keeping Databases Small

•Some easy tricks

• Schema-less => "schema per document"

• Keep field names short!

db.foo.save({user_name:"Tony"})

db.foo.save({un:"Tony"})

•Indexes

• They can get *huge*

• Make _id matter!

Repeat 10e9 times!
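The arithmetic behind the short-field-name advice is simple, because the field name is stored again in every document. A rough sketch, counting only the UTF-8 bytes of the name and ignoring any storage-level compression or padding:

```javascript
// Bytes saved per document when one field name is shortened.
function bytesSavedPerDocument(longName, shortName) {
  return Buffer.byteLength(longName, "utf8") - Buffer.byteLength(shortName, "utf8");
}

const perDocument = bytesSavedPerDocument("user_name", "un"); // 9 - 2 bytes
const documents = 10e9; // the slide's "repeat 10e9 times", i.e. 1e10 documents
const totalBytes = perDocument * documents;

console.log(`${perDocument} bytes per document`);
console.log(`${totalBytes / 1e9} GB across ${documents} documents`);
```

That is 7 bytes per document and about 70 GB at the slide's document count, for a single field.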

Page 25: Inside Wordnik's Architecture

Keeping Databases Small

•Don't make _id just an "auto increment"

You're stuck with it! Be smart

• User collection? Try _id: username

• Email collection? Try _id: email

• Date-driven collection? How about _id: "20120502"

• db.logins.find({_id:/^201205/})

Be lazy until you can't anymore!
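The date-string `_id` works because an anchored prefix such as `/^201205/` corresponds to one contiguous range of sorted keys. The sketch below shows that equivalence on an in-memory array standing in for the `logins` collection; the documents and the helper names are made up for illustration, and the prefix is assumed to contain only digits (no regex metacharacters).

```javascript
// In-memory stand-in for a collection keyed by date strings (made-up data).
const logins = [
  { _id: "20120430", count: 12 },
  { _id: "20120501", count: 17 },
  { _id: "20120502", count: 15 },
  { _id: "20120601", count: 27 },
];

// Prefix match, as in db.logins.find({_id: /^201205/}).
function findByIdPrefix(docs, prefix) {
  const pattern = new RegExp("^" + prefix);
  return docs.filter((doc) => pattern.test(doc._id));
}

// The same prefix expressed as a half-open key range: "201205" <= _id < "201206".
function prefixRange(prefix) {
  const lastCode = prefix.charCodeAt(prefix.length - 1);
  return { lower: prefix, upper: prefix.slice(0, -1) + String.fromCharCode(lastCode + 1) };
}

function findByIdRange(docs, { lower, upper }) {
  return docs.filter((doc) => doc._id >= lower && doc._id < upper);
}

const byPrefix = findByIdPrefix(logins, "201205").map((doc) => doc._id);
const byRange = findByIdRange(logins, prefixRange("201205")).map((doc) => doc._id);
console.log(byPrefix, byRange);
```

Both lookups return the two May 2012 documents, so the `_id` index alone can answer a month query with no extra date index to store.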

Page 26: Inside Wordnik's Architecture

Keeping Databases Small

•DAO or die!

• Fancy index scheme => control access to collections

NO!!!!

Yes
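A minimal sketch of the DAO idea: callers work with readable field names, and only the DAO knows about the short stored names and the meaningful `_id`. The class, the field map, and the in-memory `Map` standing in for a collection are all hypothetical.

```javascript
// Hypothetical mapping from readable names to short stored names.
const FIELD_MAP = { email: "em", lastLogin: "ll" };

class UserDao {
  constructor() {
    this.docs = new Map(); // in-memory stand-in for a database collection
  }

  // Translate to the compact stored form: username becomes _id, other names shrink.
  toStored(user) {
    const doc = { _id: user.userName };
    for (const [longName, shortName] of Object.entries(FIELD_MAP)) {
      if (user[longName] !== undefined) doc[shortName] = user[longName];
    }
    return doc;
  }

  // Translate back so callers never see the short names.
  fromStored(doc) {
    const user = { userName: doc._id };
    for (const [longName, shortName] of Object.entries(FIELD_MAP)) {
      if (doc[shortName] !== undefined) user[longName] = doc[shortName];
    }
    return user;
  }

  save(user) {
    const doc = this.toStored(user);
    this.docs.set(doc._id, doc);
  }

  findByUserName(userName) {
    const doc = this.docs.get(userName);
    return doc ? this.fromStored(doc) : null;
  }
}

const dao = new UserDao();
dao.save({ userName: "tony", email: "tony@example.com" });
console.log(dao.docs.get("tony"));        // compact stored form
console.log(dao.findByUserName("tony"));  // readable form for callers
```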

Page 27: Inside Wordnik's Architecture

Keeping Databases Small

•If/when you need to shard…

Don't make your clients do this!
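If sharding does become necessary, the routing can live behind the same kind of data-access layer so that clients keep calling `save` and `findById`. The sketch below is an assumption-heavy illustration: the class, the string hash, and the in-memory `Map` per shard are hypothetical and are not how MongoDB or Wordnik route requests.

```javascript
// Hypothetical DAO that hides shard selection from its callers.
class ShardedLoginDao {
  constructor(shardCount) {
    // One in-memory Map per shard, standing in for separate databases.
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }

  // Deterministic string hash (illustrative); the same id always maps to the same shard.
  shardFor(id) {
    let hash = 0;
    for (const ch of id) {
      hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
    }
    return this.shards[hash % this.shards.length];
  }

  save(doc) {
    this.shardFor(doc._id).set(doc._id, doc);
  }

  findById(id) {
    return this.shardFor(id).get(id) || null;
  }
}

const loginDao = new ShardedLoginDao(3);
for (const id of ["20120501", "20120502", "20120601"]) {
  loginDao.save({ _id: id });
}
console.log(loginDao.findById("20120502"));
console.log(loginDao.shards.map((shard) => shard.size));
```

Callers never name a shard, so the shard count or the hashing scheme can change inside the DAO without touching any client.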

Page 29: Inside Wordnik's Architecture

Keeping Databases Small

•Again, why keep them small?

•Starting a new replica

• Initial sync

• Index rebuilding

•Backups

•Index Compaction

•Speed

•TCO

Everything is easier

This can take DAYS

Page 31: Inside Wordnik's Architecture

Ephemeral Storage?

•Every EC2 instance type has some (except micro)

•Only available via EC2 API

•Less prone to issues than EBS

•Faster ***

•Included in cost of server

But dies on host reboot!

Page 32: Inside Wordnik's Architecture

Keeping Data Safe

Page 34: Inside Wordnik's Architecture

Which Zone? Which Region?

Arbiter handles external connectivity issue detection
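Since the original diagram is not in the transcript, here is a hedged sketch of what such a layout might look like as a MongoDB replica set configuration: data-bearing members in different zones plus an arbiter elsewhere. The set name, hosts, and zone assignments are hypothetical, and the check assumes every member keeps its default single vote.

```javascript
// Replica set configuration in the shape passed to rs.initiate(); all values are made up.
const replicaSetConfig = {
  _id: "words",
  members: [
    { _id: 0, host: "db-a.example.internal:27017" },
    { _id: 1, host: "db-b.example.internal:27017" },
    { _id: 2, host: "arb.example.internal:27017", arbiterOnly: true },
  ],
};

// Hypothetical placement of each host; kept outside the config on purpose.
const zoneByHost = {
  "db-a.example.internal:27017": "us-west-1a",
  "db-b.example.internal:27017": "us-west-1b",
  "arb.example.internal:27017": "us-east-1a",
};

// Summarize how well the layout survives the loss of a single zone.
function summarizeLayout(config, zones) {
  const dataMembers = config.members.filter((member) => !member.arbiterOnly);
  const dataZones = new Set(dataMembers.map((member) => zones[member.host]));
  return {
    voters: config.members.length,
    oddVoterCount: config.members.length % 2 === 1,
    dataZones: dataZones.size,
  };
}

console.log(summarizeLayout(replicaSetConfig, zoneByHost));
```

With data copies in two zones and an odd number of voters, losing any one zone still leaves a majority to elect a primary, which matters when the data itself sits on ephemeral storage.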

Page 35: Inside Wordnik's Architecture

How does this really stack up?

•Tuned indexes & access, split with services

• Was: 3 DAS Devices w/18 TB disk

• Now: 21 m1.large + m1.xlarge instances

• 3 Zones, 2 regions

•The Gory Details

blog.wordnik.com/with-software-small-is-the-new-big

Page 36: Inside Wordnik's Architecture

As for Services

•~1,000 requests/sec via Swagger-enabled micro services

•Direct to Consumer via SwaggerSocket

Page 37: Inside Wordnik's Architecture

What's Next

•Migrating all services to SwaggerSocket

• OSS WebSocket subprotocol

https://github.com/wordnik/swaggersocket

• 25%-100% speed increase (sync & async)

•Discovery via Wordnik

Page 38: Inside Wordnik's Architecture

If you're Interested…

Page 45: Inside Wordnik's Architecture

See more:

developer.wordnik.com

swagger.wordnik.com

github.com/wordnik

Questions?

