Scalable File System In 14 Days

Jeff Hoffer, Software Architect
Alex Zherdev, Sr. Software Engineer

Our Background

In the beginning...: YouTube for Documents
Today: Make every small business better
- Professional Documents
- Custom Documents
- Business Licenses

The Team: Jason Nazar, Alon Shwartz
Our Product: www.docstoc.com

Initial Approach

SELECT `text_data` FROM `documents` WHERE `doc_id` = 8675309;

Pros:
- Existing libraries used
- Reliable storage
- Replication

Cons:
- Hard to scale out
- Replication can't keep up
- Taxed all data

IIS HTTP Based Solution

HTTP GET http://docs.api/text/160717/8675309.txt

Pros:
- HTTP GET
- IIS Static Content Cache
- 5TB = Years of Growth
- Easy Setup & Deploy

Cons:
- Not scalable
- NTFS & 30M small files
- Replication In-House
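
The URL on this slide suggests one static text file per document, addressed by a numeric folder segment and the document id. A minimal sketch of how a client might build such a URL follows. The names `text_url`, `base_url`, and `folder` are hypothetical, and the slides do not say how the middle segment (160717) is derived, so it is passed in as-is rather than computed.

```python
def text_url(base_url: str, folder: int, doc_id: int) -> str:
    """Build a text-file URL of the shape shown on the slide.

    Assumption: the path is /text/<folder>/<doc_id>.txt. How <folder>
    is chosen is not stated in the slides, so the caller supplies it.
    """
    return f"{base_url.rstrip('/')}/text/{folder}/{doc_id}.txt"


if __name__ == "__main__":
    print(text_url("http://docs.api", 160717, 8675309))
```

Keeping the base URL as a parameter means a caller only has to change one value if the service behind it is replaced.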

Importance of Performance

- IIS Source Failed early 2013
- Page speed heavily influenced our traffic and SEO
- MongoDB solution implemented within 2 weeks and results immediately felt

[Chart: Speed and Views]

Requirements

- Sharded: horizontal scale out of reads and writes
- Replication: no single point of failure for core business data
- Doc Page: Peak Read Load of 200 / second, < 4s
- REST Interface: switch only requires changing URL
- Easy to Maintain: maintenance cost of no more than 1 FTE / day / month
- 99.9% uptime
- Can handle # of our current set of text files: 43 M
- Production Rollout within 3 weeks
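
To make the 99.9% uptime requirement concrete, the sketch below converts an uptime percentage into a downtime allowance. The function name and the 30-day month are assumptions chosen for illustration; the slides state only the percentage.

```python
def downtime_budget_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of allowed downtime over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


if __name__ == "__main__":
    # 99.9% over a 30-day month leaves roughly 43 minutes of downtime.
    print(round(downtime_budget_minutes(99.9), 1))
```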
MongoDB FTW
Test Setup

Mongo Collection Structure

{
  id: {document_id},
  body: {text_content},
  created: {date_time}
}

- Simple Structure
- Object Size 50KB
- Shard on hashed id
- Rarely modified
- Heavy Reads
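
The structure above can be sketched as plain Python data, together with the MongoDB admin commands that would shard a collection on a hashed key. This is an illustration under stated assumptions, not the team's actual setup: the slide's `id` is assumed to map to MongoDB's `_id` field, and the database and collection names passed in are hypothetical.

```python
from datetime import datetime, timezone


def make_text_document(document_id: int, text_content: str) -> dict:
    """Build one document shaped like the slide's structure.

    Assumption: the slide's `id` is stored as MongoDB's `_id`.
    """
    return {
        "_id": document_id,
        "body": text_content,
        "created": datetime.now(timezone.utc),
    }


def sharding_commands(database: str, collection: str) -> list[dict]:
    """Admin commands that enable sharding and shard on a hashed _id.

    `enableSharding` and `shardCollection` are standard MongoDB admin
    commands; the database and collection names are placeholders.
    """
    return [
        {"enableSharding": database},
        {"shardCollection": f"{database}.{collection}", "key": {"_id": "hashed"}},
    ]


if __name__ == "__main__":
    print(make_text_document(8675309, "example text"))
    for command in sharding_commands("docs", "text"):
        print(command)
```

With a driver such as PyMongo, each command dictionary would typically be sent through the admin database of a connection to a `mongos` router. A hashed key spreads sequential ids evenly across shards, which suits a workload of point reads by id.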

Tests: Reads (100/sec), Writes (100/sec), Read+Writes (200/sec)**

Client               Server                 MongoDB                Duration
JMeter               Ruby REST Server       Empty Collection       20 min (3x)
JMeter               ASP.NET REST Server*   Empty Collection       20 min (3x)
JMeter               ASP.NET REST Server*   Seeded Collection 2M   30 min (3x)
.NET Console Loader  ASP.NET REST Server*   Seeded Collection 2M   1 hour (3x)
.NET Console Loader  ASP.NET REST Server*   Seeded Collection 6M   Overnight (10 hrs)

*ASP.NET MVC 4 Web API
**10x peak load
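
For a sense of scale, the request volume of each test run follows from rate times duration. The sketch below does that arithmetic; the function name is hypothetical, and reading "(3x)" as three repetitions of a run is an assumption, since the slides do not define it.

```python
def total_requests(rate_per_sec: int, minutes: float, repeats: int = 1) -> int:
    """Requests issued at a constant rate over `minutes`, repeated `repeats` times."""
    return int(rate_per_sec * minutes * 60 * repeats)


if __name__ == "__main__":
    # One 20-minute run at 100 requests/second.
    print(total_requests(100, 20))
    # Three such runs, assuming "(3x)" means three repetitions.
    print(total_requests(100, 20, repeats=3))
    # A 10-hour overnight run at the combined 200 requests/second rate.
    print(total_requests(200, 10 * 60))
```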
Production

In Conclusion

It's Good Enough, It's Fast Enough, and Doggone It, Developers Like It!

- Fast Prototype
- Low Maintenance
- Quick Deployment
- Scale Out
- Stable
- Linux, Windows, Mac
- Excellent Support