Urbanesia - Development History

Post on 02-Jul-2015


description

Urbanesia's brief development history for Business Connect - 29 October 2012

transcript

URBANESIA

Development History

Business Connect – 29 October 2012

Prepared by: Batista Harahap

URBANESIA BETA V0

The first public iteration of Urbanesia

PROS

• Data structures in MySQL

• Effective memory caching implementations

• Effective SEO implementations

• Effective search server implementations

• Urbanesia is successfully consumed as a Directory

CONS

• No effective separation of Backend & Frontend web applications

• Source code = spaghetti code

• Storing low value, high volume data in MySQL

• Many queries using GROUP BY on heavily populated tables

• After a warm boot, any page takes 20+ seconds to generate

• Difficult to scale horizontally & vertically

• Very low concurrency

• The product’s identity is weak

• Many features are left unused by users

WHAT WE LEARNED

• Do NOT use MySQL as session storage

• Use a NoSQL database for low value, high volume data

• Separate the backend & frontend web applications; create APIs for the backends

• Use output caching where available

• When using PHP-APC, make sure apc.stat = 0

• Increase concurrency by reverse proxying requests to Apache
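The output caching advice above can be sketched in a few lines. This is a minimal Python illustration, not the PHP/CodeIgniter code Urbanesia ran; `OutputCache`, `render_page`, the URL key and the 60-second lifetime are all hypothetical names and values chosen for the example.

```python
import time


class OutputCache:
    """Tiny in-memory output cache with a time-to-live (illustrative only)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # cache key -> (expiry time, rendered output)

    def get_or_render(self, key, render):
        """Return cached output for key, calling render() only on a miss."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        output = render()
        self._store[key] = (now + self.ttl, output)
        return output


render_calls = []


def render_page():
    # Stand-in for an expensive page build (queries, templating).
    render_calls.append(1)
    return "<html>listing page</html>"


cache = OutputCache(ttl_seconds=60.0)
first = cache.get_or_render("/jakarta/restaurants", render_page)
second = cache.get_or_render("/jakarta/restaurants", render_page)
```

The second call is served from the cache, so the expensive render runs once. A shared cache such as Memcache would play the same role across several web servers.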

CHALLENGES

• Handle Googlebot traffic of over 1 TB/month with only 2 servers

• Do output caching with CodeIgniter

• Achieve sub-second page generation even after warm boots

• Redesign the backend by creating an API for our native apps

URBANESIA V1

The second iteration, based on refined code and infrastructure design

PROS

• Achieved sub-second page generation after warm boots

• Aggressive & effective caching mechanism

• Optimized MY_Controller

• Session storage handled by Memcache

• MySQL read/write access lowered from ~400 qps to only 1 qps

• Lean memory usage on the database server

• Created an OAuth-enabled API

• Concurrency increased by using nginx as a reverse proxy

• The same server setup can theoretically handle 10x the current traffic without scaling horizontally

• Googlebot crawling is now limited only by bandwidth, not by code efficiency

• Proper indexing in MySQL

• Replaced stock MySQL with a custom-built MySQL alternative: Percona Server

CONS

• Source code = spaghetti code

• Unpredictable code behavior inherited from V0; as tables fill with more rows, queries become bottlenecks

• Subqueries still exist

• Everything is still synchronous, no message queue yet

• The end product fails to give users the illusion of speed

• New hires face a steeper learning curve because of the inherited complexity, compounded by V1’s own complexity

• Still difficult to scale horizontally & vertically

WHAT WE LEARNED

• CodeIgniter enables fast product delivery, but the optimization & efficiency of the resulting code are questionable at best

• Need to enable an asynchronous architecture

• Do not do things in realtime; offload them to message queues instead

• To give users the illusion of speed, JavaScript must be thoroughly implemented

• Do not handle email ourselves; use third-party email solutions like AWS SES

• Offload server-side international bandwidth to clients; for Facebook, use the Facebook JS SDK instead of the PHP SDK

• The product gains more engagement with content that is more focused (thematic)

• Speed of content delivery is important to engagement metrics
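The message queue lesson above can be illustrated with a small Python sketch. An in-process `queue.Queue` and a single worker thread stand in for a real message broker here; `handle_signup`, the job format and the email addresses are hypothetical, not taken from Urbanesia's code.

```python
import queue
import threading

jobs = queue.Queue()
sent = []  # records what the worker processed


def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        # Stand-in for slow work such as sending an email.
        sent.append(f"email to {job['to']}")
        jobs.task_done()


def handle_signup(email):
    """Request handler: enqueue the slow work and return immediately."""
    jobs.put({"to": email})
    return "signup accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_signup(addr) for addr in ("a@example.com", "b@example.com")]

jobs.put(None)  # tell the worker to stop
jobs.join()     # wait until everything queued has been processed
thread.join()
```

The handler returns as soon as the job is queued, so the user never waits on the slow step. In production the queue would normally live outside the web process so that jobs survive a restart.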

CHALLENGES

• Build a third iteration with a strong identity based on users’ personas

• Focus more on verticals, create the illusion of a discovery/recommendation platform

• Progressive disclosure of content

• A JavaScript framework that is light and fast, with minimal dependencies

• Make everything asynchronous and message/event based

• Redefine Urbanesia’s atomic data structure

• Do MySQL JOINs on the server side

• Get the data first FAST, compute later

PRODUCTS & TECHNOLOGIES

Does the product make the technology, or does the technology make the product?

THE PRODUCT MAKES THE TECHNOLOGY!

REAL WORLD EXAMPLES

• We need to know which part of Urbanesia will really work for users

• Store the preferences behind each user’s dynamic activity

• Make calculations of other contents a user might consume

• Present the content unobtrusively

• Do it fast and almost realtime

TECHNICAL SPEAK

We need to know which part of Urbanesia will really work for users

• Mine every user’s data on each visit, including anonymous users

• Log everything FAST and asynchronously

• Low value & high volume data

• Avoid MySQL at all cost

• Model data based on the chosen NoSQL database model

TECHNICAL SPEAK

Introducing Redis

• Reads/writes data from memory

• Persists data to disk

• Key/value store, similar to Memcache

• Performs atomic operations without having to worry about state

• Redis’ primitive data types are very simple

• Ideal for low value/high volume data

• Less is more!

TECHNICAL SPEAK

Store the preferences behind each user’s dynamic activity

• Simple increments

• Perfect for Sorted Sets in Redis

• Keep them sorted so that analytics functions are supported natively by Redis == high performance

• Fire & forget: consider using async frameworks like Node.js & trigger using JavaScript

• Why trigger with JavaScript? To make sure, at the very least, that it’s actually users accessing the page
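The "simple increments, kept sorted" idea above maps onto Redis sorted sets. The sketch below uses an in-memory stand-in so it runs without a Redis server; `SortedSetStandIn`, `track`, the key layout `user:<id>:categories` and the sample categories are hypothetical. Ties are broken alphabetically here, which is a simplification of what Redis does.

```python
from collections import defaultdict


class SortedSetStandIn:
    """In-memory stand-in for Redis sorted sets (illustrative only)."""

    def __init__(self):
        self._sets = defaultdict(dict)  # key -> {member: score}

    def zincrby(self, key, amount, member):
        """Add amount to member's score, creating it at 0 if missing."""
        scores = self._sets[key]
        scores[member] = scores.get(member, 0.0) + amount
        return scores[member]

    def top(self, key, n):
        """Return the n highest-scored members with their scores."""
        scores = self._sets.get(key, {})
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


store = SortedSetStandIn()


def track(user_id, category):
    """Fire-and-forget style counter bump for one page view."""
    store.zincrby(f"user:{user_id}:categories", 1, category)


for category in ["cafe", "sushi", "cafe", "bar", "cafe", "sushi"]:
    track(42, category)

favourites = store.top("user:42:categories", 2)
```

Against a real server, the redis-py client exposes the matching commands as `zincrby(name, amount, value)` and `zrevrange(name, start, end, withscores=True)`, so the ranking is maintained by Redis itself rather than by application code.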

TECHNICAL SPEAK

Node.js & Socket.io

• Node.js is a Network ready daemon with Chrome’s V8 JavaScript engine inside

• Node.js is asynchronous by default (event based)

• Socket.io is the transport used for data

• Socket.io is abstracted to fall back gracefully between WebSocket, Flash and plain AJAX

• JavaScript clients should only subscribe to onFailed events to minimize overhead

TECHNICAL SPEAK

Make calculations of other contents a user might consume

• Use machine learning algorithms to learn users’ behavior

• Naïve Bayes Classifier to the rescue

• Assumes each keyword is independent of the others

• Proven algorithm used by many big websites

TECHNICAL SPEAK

Naïve Bayes Classifier

• There are no right or wrong assumptions, only accuracy

• Accuracy is increased with more data and better classifications

• Relatively easy to code

• Lots of libraries out there in different languages
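To show why the classifier is "relatively easy to code", here is a minimal Python sketch of a multinomial Naive Bayes over keywords with add-one smoothing. It is not the code from Urbanesia's PHP library; the class name, method names and training phrases are invented for illustration. Each label is scored as log P(label) plus the sum of log P(keyword | label), which is exactly the per-keyword independence assumption.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial Naive Bayes over keywords (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> keyword counts
        self.doc_counts = Counter()              # label -> training examples
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return log P(label) + sum of log P(keyword | label) per label."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for label, docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            log_prob = math.log(docs / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen keywords from zeroing the product.
                log_prob += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[label] = log_prob
        return result

    def classify(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train("sushi ramen sashimi", "japanese")
nb.train("ramen udon tempura", "japanese")
nb.train("espresso latte croissant", "cafe")
nb.train("latte cappuccino cake", "cafe")

guess = nb.classify("ramen and latte with sashimi")
```

With this toy training set the mixed query leans towards "japanese", because two of its known keywords come from that label and only one from "cafe". More training data per label is what moves the accuracy.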

TECHNICAL SPEAK

Present the content unobtrusively

• Giving users the illusion that we understand them

• Do not make this feature dominant

• Show it where you want the content to look smart

TECHNICAL SPEAK

Do it fast and almost realtime

• Fast is an illusion

• Realtime is overrated

• If you don’t have enough resources to do so, schedule it and pre-generate content

• Scale vertically

Talk is cheap, show me the CODE!

URBANESIA @ Github

https://github.com/Urbanesia


https://github.com/Urbanesia/Simple-Naive-Bayes-Classifier-for-PHP

NAÏVE BAYES CLASSIFIER

First Iteration:

• Took ~1000 seconds to classify 1 keyword

• MySQL as storage

• No micro optimizations

NAÏVE BAYES CLASSIFIER

Second Iteration:

• Took ~400 seconds to classify 1 keyword

• MongoDB as storage

• Macro optimization trimmed 600 of 1000 seconds

• No micro optimizations

NAÏVE BAYES CLASSIFIER

Third Iteration:

• Took ~1 second to classify 1 keyword

• Redis as storage

• Insane macro optimization boost

• No micro optimizations

NAÏVE BAYES CLASSIFIER

Fourth Iteration:

• Took 0.01428 seconds to classify 1 keyword

• Redis as storage

• Reworked classification algorithm

• Get the data first and compute later

• More memory usage, faster execution time
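The slides do not show how the rework was done, so the following Python sketch is only one plausible reading of "get the data first and compute later": fetch every needed count in a single batched call, then do all arithmetic in memory. `CountingStore`, the `count:<keyword>` key layout and the sample numbers are hypothetical.

```python
class CountingStore:
    """Fake key/value store that counts round trips (illustrative only)."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key, 0)

    def get_many(self, keys):
        # One round trip for the whole batch, like a multi-get.
        self.round_trips += 1
        return [self._data.get(key, 0) for key in keys]


def score_one_by_one(store, keywords):
    """Fetch and compute interleaved: one round trip per keyword."""
    return sum(store.get(f"count:{word}") for word in keywords)


def score_fetch_first(store, keywords):
    """Fetch everything up front, then compute purely in memory."""
    counts = store.get_many([f"count:{word}" for word in keywords])
    return sum(counts)


data = {"count:sushi": 5, "count:ramen": 3, "count:latte": 7}
keywords = ["sushi", "ramen", "latte", "udon"]

slow_store = CountingStore(data)
fast_store = CountingStore(data)
slow_total = score_one_by_one(slow_store, keywords)
fast_total = score_fetch_first(fast_store, keywords)
```

Both functions return the same total, but the batched version makes one round trip instead of four. It holds the whole batch in memory at once, which matches the trade-off noted on the slide: more memory usage, faster execution time.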

NAÏVE BAYES CLASSIFIER

Fifth Iteration:

• Reworked the trainer methods

• Created deTrain method to update data

• Created helpers to do keyword blacklists

• Consistent performance from CLI or HTTP
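The repository's own deTrain implementation is not reproduced in the slides. As a rough illustration of the idea, the Python sketch below keeps keyword counts per label, lets a later call undo an earlier training example, and filters keywords through a blacklist. `TrainingStore`, `BLACKLIST` and the sample sentences are hypothetical.

```python
from collections import Counter, defaultdict

BLACKLIST = {"the", "and", "di", "yang"}  # hypothetical stop words


def keywords(text):
    """Lowercase, split, and drop blacklisted keywords."""
    return [w for w in text.lower().split() if w not in BLACKLIST]


class TrainingStore:
    """Keyword counts per label with train/detrain (illustrative only)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, text, label):
        self.counts[label].update(keywords(text))

    def detrain(self, text, label):
        """Undo an earlier train() call for the same text and label."""
        self.counts[label].subtract(keywords(text))
        # Unary plus keeps only keywords whose count is still positive.
        self.counts[label] = +self.counts[label]


store = TrainingStore()
store.train("the best sushi and ramen", "japanese")
store.train("ramen di Senayan", "japanese")
store.detrain("the best sushi and ramen", "japanese")
remaining = dict(store.counts["japanese"])
```

After the detrain call only the counts contributed by the second sentence remain, and blacklisted words never enter the counts at all.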

NAÏVE BAYES CLASSIFIER

What we learned:

• Always be open to new things

• Geek Talk with peers from the industry

• Very talented people will always come up with smarter and better ways to do something

• Decide: get smart or get smarter?

• Algorithms are the engine, but they mean nothing without implementation

• Consider opening up source code for others to examine; the smarter the population, the better the products we create

• Focus on USERS instead of technology

Geekball

Every Tuesday, 17.00 – 19.00

Basket Hall C, Senayan

OUR PRODUCTS

Urbanesia’s product lineup

URBANESIA.COM

URBANESIA.COM SEARCH

M.URBANESIA.COM

URBAN’S NOTES

URBANESIA WINDOWS 8

http://urho.me/vkND6

URBANESIA ANDROID

http://urho.me/BSsqR

JAJAN


Jajan is open source, get the source code:

• Blackberry - https://github.com/Urbanesia/Jajan-Blackberry

• Android - https://github.com/Urbanesia/Jajan

• HTML5 - https://github.com/Urbanesia/jajan-html5

Platforms:

• Blackberry - https://appworld.blackberry.com/webstore/content/54742/

• Android - https://play.google.com/store/apps/details?id=com.bango.jajan

• iOS - https://itunes.apple.com/us/app/jajan/id527278768?mt=8

• HTML5 - https://jajan5.urbanesia.com/

URBANESIA BALI

http://urho.me/HPLT9

WHAT’S NEXT

Our third iteration of Urbanesia.com

WHAT’S NEXT

• A rework from scratch both in Product Design and Technical Implementation

• Focusing more on users and our RICH content

• A social network useful for everyday city life

• Machine learning implementation for our recommendation engine

WHAT’S NEXT

Live Beta opening soon!

Email to dev@urbanesia.com for access

KEY TAKEAWAYS

Summary

KEY TAKEAWAYS

• Empower people working with you

• Invest in company culture

• Focus on USERS, not technology

• Macro to Micro optimizations & scaling

• Be open to new ideas (things)

• Geek Talks over anything, like basketball or beer

• Good is not Great

• Whatever WORKS

Hi! From Urbanesia

THANK YOU

Email me: batista@bango29.com

Twitter: @tista

Github: tistaharahap

Blog: www.bango29.com