Transcript
  • Scalable File System in 14 Days. Jeff Hoffer, Software Architect; Alex Zherdev, Sr. Software Engineer
  • Our Background. In the beginning: "YouTube for Documents". Today: "Make every small business better" (Professional Documents, Custom Documents, Business Licenses). Jason Nazar, Alon Shwartz, The Team.
  • Our Product: www.docstoc.com
  • Initial Approach
      Pros: existing libraries used; reliable storage; replication
      Cons: hard to scale out; replication can't keep up; taxed all data
      Example: SELECT `text_data` FROM `documents` WHERE `doc_id` = 8675309;
  • IIS HTTP-Based Solution
      Pros: HTTP GET; IIS static content cache; 5TB = years of growth; easy setup & deploy
      Cons: not scalable; NTFS & 30M small files; replication in-house
      Example: HTTP GET http://docs.api/text/160717/8675309.txt
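The GET URL on this slide suggests a client that only needs a base URL plus a document path. The following is a minimal sketch under that assumption; `text_url`, `fetch_text`, and the parameter name `group` are hypothetical (the slides do not say what the `160717` path segment represents), and `http://new-store.example` is a placeholder host.

```python
from urllib.request import urlopen


def text_url(base_url, group, doc_id):
    """Build a document text URL in the shape shown on the slide.

    `group` stands for the middle path segment (160717 in the slide);
    its real meaning is not stated in the talk, so the name is a guess.
    """
    return f"{base_url.rstrip('/')}/text/{group}/{doc_id}.txt"


def fetch_text(base_url, group, doc_id, timeout=4.0):
    """Fetch the document text over plain HTTP GET and decode it as UTF-8."""
    with urlopen(text_url(base_url, group, doc_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Pointing clients at a different backend is then a one-value change.
OLD_BASE = "http://docs.api"
NEW_BASE = "http://new-store.example"  # placeholder, not from the talk
```

With this shape, moving readers from one storage backend to another means changing only the base URL, as long as the new service keeps the same path layout.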
  • Importance of Performance. The IIS source failed in early 2013. Page speed heavily influenced our traffic and SEO. The MongoDB solution was implemented within 2 weeks, and the results were felt immediately. [Chart: Speed, Views]
  • Requirements
      • Sharded: horizontal scale out of reads and writes
      • Replication: no single point of failure for core business data
      • Doc page: peak read load of 200 / second, < 4s
      • REST interface: switch only requires changing URL
      • Easy to maintain: maintenance cost of no more than 1 FTE / day / month
      • 99.9% uptime
      • Can handle the number of text files in our current set: 43 M
      • Production rollout within 3 weeks
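Two of these requirements translate into simple arithmetic. The sketch below works out the downtime budget implied by 99.9% uptime and a rough data volume for 43 M text files. The 50KB average object size is taken from the collection structure slide later in the deck; the 30-day month and the decimal units (1 TB = 10^9 KB) are assumptions made here for illustration.

```python
def allowed_downtime_minutes(uptime, days):
    """Minutes of downtime permitted over `days` at a given uptime fraction."""
    return (1.0 - uptime) * days * 24 * 60


def total_size_tb(num_files, avg_size_kb):
    """Approximate total size in decimal terabytes (1 TB = 1e9 KB)."""
    return num_files * avg_size_kb / 1e9


if __name__ == "__main__":
    # 99.9% uptime over an assumed 30-day month: about 43.2 minutes.
    print(round(allowed_downtime_minutes(0.999, 30), 1))
    # 43 M files at an assumed 50KB average: about 2.15 TB.
    print(round(total_size_tb(43_000_000, 50), 2))
```

In other words, the uptime target leaves well under an hour of outage per month, and the current text set is on the order of a couple of terabytes before replication.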
  • MongoDB FTW
  • Test Setup
  • Mongo Collection Structure: { id: {document_id}, body: {text_content}, created: {date_time} }. Simple structure; object size 50KB; shard on hashed id; rarely modified; heavy reads.
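To illustrate why hashing the id helps here, the sketch below builds a document in the shape shown on the slide and assigns it to a shard by hashing its id. This is a simplified stand-in, not MongoDB's implementation: MongoDB assigns ranges of hashed key values to chunks rather than taking a modulo, and the shard count, `make_document`, `shard_for`, and `distribution` are all assumptions made for this example.

```python
import hashlib
from collections import Counter
from datetime import datetime, timezone

NUM_SHARDS = 3  # assumed shard count, not stated in the talk


def make_document(doc_id, text):
    """Build a document with the three fields shown on the slide."""
    return {"id": doc_id, "body": text, "created": datetime.now(timezone.utc)}


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from a hash of the id (illustrative modulo scheme)."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(doc_ids, num_shards=NUM_SHARDS):
    """Count how many of the given ids land on each shard."""
    return Counter(shard_for(doc_id, num_shards) for doc_id in doc_ids)


if __name__ == "__main__":
    # Sequential ids spread roughly evenly instead of piling onto one shard.
    print(distribution(range(8675309, 8675309 + 3000)))
```

The design point is that monotonically increasing ids, which would otherwise send all new writes to the same shard, get scattered by the hash, which suits a collection that is written once and then read heavily by id.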
  • Tests: Reads (100/sec), Writes (100/sec), Read+Writes (200/sec)**

      Client                Server                 MongoDB                Duration
      JMeter                Ruby REST Server       Empty Collection       20 min (3x)
      JMeter                ASP.NET REST Server*   Empty Collection       20 min (3x)
      JMeter                ASP.NET REST Server*   Seeded Collection 2M   30 min (3x)
      .NET Console Loader   ASP.NET REST Server*   Seeded Collection 2M   1 hour (3x)
      .NET Console Loader   ASP.NET REST Server*   Seeded Collection 6M   Overnight (10 hrs)

      *ASP.NET MVC 4 Web API
      **10x peak load
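The fixed-rate tests above can be approximated with a small paced loop. The sketch below is not the JMeter plan or the .NET console loader used in the talk; it is a hypothetical single-threaded Python stand-in that issues calls at a target rate and records each call's latency. `run_paced` and `requests_in_run` are names invented for this example, and the clock and sleep functions are injectable so the pacing can be checked without waiting.

```python
import time


def requests_in_run(rate_per_sec, duration_min):
    """Total requests issued by one run at a constant rate."""
    return rate_per_sec * duration_min * 60


def run_paced(task, rate_per_sec, total_requests,
              clock=time.monotonic, sleep=time.sleep):
    """Call `task` at a fixed rate and return the per-call latencies.

    Each call is scheduled relative to the start time, so a slow call
    does not push back the schedule of the calls that follow it.
    """
    interval = 1.0 / rate_per_sec
    start = clock()
    latencies = []
    for n in range(total_requests):
        delay = (start + n * interval) - clock()
        if delay > 0:
            sleep(delay)
        began = clock()
        task()
        latencies.append(clock() - began)
    return latencies


if __name__ == "__main__":
    # One 20-minute run at 200/sec issues 240,000 requests.
    print(requests_in_run(200, 20))
    # Short demo: 50 no-op calls at 100/sec.
    print(len(run_paced(lambda: None, 100, 50)))
```

A real load test would issue requests concurrently and report percentiles rather than raw latencies; this loop only shows the pacing idea behind a constant requests-per-second target.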
  • Production
  • In Conclusion: It's Good Enough, It's Fast Enough, and Doggone It, Developers Like It! Fast prototype; low maintenance; quick deployment; scale out; stable; Linux, Windows, Mac; excellent support.
