Performance - what it actually is
well, code which does what it's supposed to, and doesn't do it as slow as Rails 3.0's boot time.
performance is treated very differently in every part of a project's lifecycle.
When you're young
When you're young and naive
when you start a project, and it's still low on traffic, write naive code!
Do TDD!
To avoid this :)
Write short and concise code
Don't bother with premature optimization
(when you prematurely optimize, this happens)
READ!
prepare for growth, because you're optimistic and all that. make sure you'll know what to do when shit gets real.
be naive but not TOO naive, though
there are some things which just scream - don't do this! it's gonna suck, BAD! The n+1 query issue is a good example of too-naive code.
The controller
The view
The problem: we have an array of users, and when we iterate over that array we reach for profile_image and for posts, which triggers two queries to the DB for each user. We end up with 2n+1 queries, n being the number of users.
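The original controller and view were screenshots; as a rough sketch of the same pattern (plain Ruby, not ActiveRecord - the class names and the user count are made up), lazy loading looks like this:

```ruby
# Toy model of lazy association loading. Every association access fires its
# own "query", so rendering n users costs 2n+1 queries.

class FakeDB
  attr_reader :query_count
  def initialize; @query_count = 0; end
  def query(_sql); @query_count += 1; end
end

DB = FakeDB.new

class User
  def self.all
    DB.query("SELECT * FROM users")  # 1 query for the whole list
    Array.new(3) { new }             # pretend we got 3 users back
  end

  def profile_image
    DB.query("SELECT * FROM profile_images WHERE user_id = ?")  # 1 per user
    "image.png"
  end

  def posts
    DB.query("SELECT * FROM posts WHERE user_id = ?")           # 1 per user
    []
  end
end

# The "view": iterate and touch both associations, as described above.
User.all.each { |u| u.profile_image; u.posts }

puts DB.query_count  # 2 * 3 + 1 = 7 queries for just 3 users
```

With 3 users that's already 7 queries; with 100 users on the page it's 201.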
ActiveRecord's includes prefetches the associations, so the extra lookups turn into two queries instead of 2n.
The solution
The new controller
now there are only 3 queries, instead of 2n+1 (n being the number of users). Note that this might not be the right thing to do in larger-scale projects. You might want to cache the profile image in Redis, for instance, and completely avoid bringing in the profile_image object from the database.
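The new controller was a screenshot too; in real Rails it's roughly a one-liner like User.includes(:profile_image, :posts) (association names assumed). A toy model of what that prefetch does, continuing the sketch above:

```ruby
# Toy model of what ActiveRecord's `includes` does: fetch each association
# in one batched query up front, then serve it from memory while rendering.

class FakeDB
  attr_reader :query_count
  def initialize; @query_count = 0; end
  def query(_sql); @query_count += 1; end
end

DB = FakeDB.new

class User
  attr_accessor :profile_image, :posts

  def self.all_with_includes
    DB.query("SELECT * FROM users")                                  # query 1
    users = Array.new(3) { new }
    DB.query("SELECT * FROM profile_images WHERE user_id IN (...)")  # query 2
    DB.query("SELECT * FROM posts WHERE user_id IN (...)")           # query 3
    users.each { |u| u.profile_image = "image.png"; u.posts = [] }
    users
  end
end

# Iterating now hits memory, not the database:
User.all_with_includes.each { |u| u.profile_image; u.posts }

puts DB.query_count  # 3 queries, regardless of how many users there are
```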
The importance of TDD
One of the roles I took on upon arriving at FTBPro was kickstarting and leading the move to TDD; we also wrote a bunch of specs for our legacy code. The difference was incredible.
Daily deploys
(instead of weekly deploys)
New code's clean and awesome
More focus on features
because the code is fairly well covered, fewer issues come up in production (fewer being relative, yeah?)
Upgrading made easy
we moved from Rails 3.0 to 3.2 within two weeks, mostly because the vast majority of the issues were discovered in tests.
But this talk is about performance!
When doing TDD, your code will be faster.
● TDD forces you to write short and atomic methods
● we try to make these methods fast because we hate slow specs :)
● code doesn't fail in production, because if it fails, we know about it before deployment
● no long-running methods, because they're short and concise
More performance specific TDD
Using RSpec you can test the time a method takes to run, and set a threshold above which the spec fails! When using the bullet gem, you can set a limit on the number of queries you allow a controller to run. Do benchmarks and performance tests.
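A minimal sketch of that kind of threshold check, written with plain Benchmark and a raise instead of RSpec matchers (slug_for is a made-up method under test, and the 1-second budget is arbitrary):

```ruby
require "benchmark"

# Made-up method under test: turn a title into a URL slug.
def slug_for(title)
  title.downcase.strip.gsub(/\s+/, "-")
end

# Time 1,000 calls; fail loudly if we blow the performance budget.
elapsed = Benchmark.realtime { 1_000.times { slug_for("  Hello World  ") } }
raise "too slow: #{elapsed}s" if elapsed > 1.0

puts format("1000 calls in %.4fs", elapsed)
```

In RSpec you'd wrap the same Benchmark.realtime call in an expectation; the idea is identical - a spec that goes red when the method gets slow.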
original code - written without tests
Rewrite - the specs
the actual code does exactly the same thing, but it's much shorter and much more readable. Because it was written with TDD, every method does only one thing, and is well tested.
Conclusion - do TDD!
● code is shorter
● easier to maintain
● it's tested, so when it breaks we know before it's in production
● when we need to refactor or change it, we can be fairly certain it will still work as intended, because of the tests.
When you're growing
Now, you start growing, and there are growing pains
● because you've done TDD, when you optimize you're not going to break anything (or you are, but you'll see it when the tests run)
● your code is short and concise, so optimizing it will be easy
● because you didn't optimize anything yet, you'll feel what needs to be optimized first (using New Relic and the like)
● again, don't optimize what's easy to optimize; optimize the parts which start causing pain.
How to get the feelin'
New Relic
shows you what's hurting the most
And gives you a breakdown of that
Google Analytics
Browse your site (crazy, right?!)
Listen to users
they may come and complain, or they may just go away. Use Google Analytics to look for pages with an unusually high bounce rate.
Custom tools
statsd and graphite can be quite handy
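statsd speaks a tiny plaintext protocol over UDP, so emitting a custom metric is nearly a one-liner. A hedged sketch (metric name and port are made up; a real app would use a statsd client gem rather than a raw socket):

```ruby
require "socket"

# statsd wire format is "name:value|type"; "ms" marks a timing metric.
payload = "league_page.render_ms:512|ms"

# UDP is fire-and-forget: the send succeeds even with no statsd listening,
# so instrumenting code never blocks or breaks the request.
sock = UDPSocket.new
sock.send(payload, 0, "127.0.0.1", 8125)
sock.close

puts payload
```

Graphite then graphs whatever statsd aggregates, which is how you get dashboards for the parts New Relic doesn't see.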
Real life example
at FTBPro, we have a score table for each league; it gets updated daily(ish) from an external source. We noticed in New Relic that the league page took a long time to load. A short investigation pointed to the table, which led to a tiny change in the code.
Before
After
What? wait! it looks the same!
well, almost. There are two changes: one is a tiny change in variable names to make the code more readable. The second is that we used a caching mechanism to bring in the team (called Subject in our code) without making any queries.
the difference was HUGE: time to build the table with a cold cache went down from 7 seconds to 0.5 seconds.
So - what have we done exactly?
● we removed an n+1 query not by including stuff, but by avoiding the query altogether
● we used a caching mechanism for teams, which takes the team's nick (Barcelona can be referred to as barca, or F.C. Barcelona) and returns the cached team.
● used that cache to speed up a very painful part of the site by a lot.
● and yes, of course the view is cached so the rebuild of the table only happens once a day.
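A hedged sketch of the nick-to-team cache described above. The real code presumably sits on Rails.cache or Redis and looks up Subject records; here a Hash stands in, and all the names are invented:

```ruby
# Toy nick-keyed team cache: many nicks ("barca", "F.C. Barcelona", ...)
# resolve to one cached team, and only a cold lookup touches the database.

class TeamCache
  attr_reader :db_hits

  def initialize
    @store = {}
    @db_hits = 0
  end

  def fetch(nick)
    key = nick.downcase
    @store.fetch(key) do
      @db_hits += 1                     # cold path: one "query", then cached
      team = { name: "FC Barcelona" }   # stand-in for a Subject DB lookup
      @store[key] = team
    end
  end
end

cache = TeamCache.new
cache.fetch("barca")
cache.fetch("Barca")   # warm: served straight from memory
puts cache.db_hits     # 1
```

Building the score table then reads every team from this cache instead of firing a query per row - which is the whole 7s-to-0.5s story.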
When you need to refactor, or rewrite.
refactoring is taking code and changing it, while rewriting is starting from scratch. There are different reasons for refactoring or rewriting:
● code is causing performance issues
● code is too clumsy, and makes debugging very hard and costly
● code just looks horrid
● Tom said so.
But when do we rewrite, and when is it enough just to refactor?
When to refactor
● code is generally ok, maintainable and worth keeping
● small changes would get the desired result easily
● code is well covered with specs
● we're too damn lazy to rewrite it all (yes, it's a valid reason, lazy programmers create short code)
When should we just throw it away and rewrite?
● if maintaining the code costs more than rewriting it, rewrite, and do it well!
● if the code does not have any test coverage and is untestable.
● when code looks like the Flying Spaghetti Monster
● when it was written by Avi Tzurel :)
make sure the new code is good; if you rewrite shit code into new shit code, you've done nothing!
A little bit about queues
DelayedJob, Resque, Sidekiq: they've all got strange names with typos in them, and they all save us from hell.
Move long running stuff to the background!
Let's talk about user registration: a user comes to the site, signs in with Facebook, we get their image, their Facebook friends, etc. It takes a while, maybe even a long while.
Put it aside!
Fetching all that stuff takes a long time. It doesn't have to be that way. We really only need to save the user's name and Facebook details, and that's it. We'll do the rest in the background, using one of the queueing mechanisms Ruby has to offer. This gives the user a better, faster experience.
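The "save now, enrich later" flow can be sketched with a plain Ruby Queue and a worker thread standing in for Resque/Sidekiq (method names and the job body are invented):

```ruby
# One shared job queue and one background worker, simulating a queueing gem.
JOBS = Queue.new

worker = Thread.new do
  loop do
    job = JOBS.pop
    break if job == :shutdown
    job.call   # the slow part: fetch Facebook image, friends, etc.
  end
end

enriched = []

# The request cycle: persist the bare minimum, enqueue the slow work,
# and return to the user immediately.
def register(name, enriched_log)
  user = { name: name }   # stand-in for User.create!(name: name)
  JOBS << -> { enriched_log << "#{name}: fetched friends + image" }
  user
end

register("Alice", enriched)

JOBS << :shutdown
worker.join
puts enriched.first
```

With Sidekiq or Resque the lambda becomes a worker class and the queue lives in Redis, but the shape of the flow is the same.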
Starting to get seriously huge
(ok, maybe this isn't a good image)
Hitting large scale
Q: when do you know you've hit large scale?
A: when your servers crash daily.
now, when you've reached that point, you know you need to do some really drastic stuff to adjust to your new position.
A quick detour to the land of DevOps
● handling large scale requires a lot of resources, and managing these resources effectively.
● cloud services such as Amazon AWS give companies some simple tools to handle scale very well.
● but if you don't know what you're doing, call for help :)
FTBpro's setup on AWS
MySQL with RDS
RDS is Amazon's managed MySQL. It's optimized and easy to set up, and saves us a lot of time on system administration.
Memcached with ElastiCache
ElastiCache is Amazon's memcached service. Same as RDS, it saves us the bother of messing with memcached servers.
Custom redis server
we're thinking about moving it to a cloud service to save us the trouble.
Web servers with nginx+unicorn
nginx+unicorn are like milk and cookies. With the right setup we also get zero-downtime deploys, which are awesome.
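A minimal sketch of the kind of unicorn.rb that enables zero-downtime (USR2) restarts; the worker count and paths are placeholders, not FTBPro's actual config:

```ruby
# unicorn.rb - placeholder values throughout
worker_processes 4
preload_app true

listen "/var/www/app/shared/unicorn.sock", backlog: 64
pid    "/var/www/app/shared/unicorn.pid"

before_fork do |server, worker|
  # On a USR2 restart a new master boots alongside the old one. Once the
  # new workers start forking, quietly retire the old master so there is
  # never a moment with no workers serving requests.
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # old master already gone - nothing to do
    end
  end
end
```

nginx just proxies to the unix socket, which stays in place across restarts - that's the "milk and cookies" part.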
Resque servers
they're also built for automatic scaling, just because we're awesome!
CDN cache with Cotendo (Akamai)
logged-out users don't even touch the web servers; their content is served by the CDN.
Build it for quick and automatic scale
● self-deploying servers - when you start the server from its image, it will deploy to itself and start serving traffic / run resque workers
● adding servers is automatic - when there's high traffic, start them up, then kill them when traffic's low
● this allows us to pay the minimum for hosting while staying scalable
careful with these self-deploying robots! make sure they know the robot rules...
The rules:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey any orders given to it by human beings, except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
ok, back to Ruby (kind of)
When reaching massive scale, we'd start looking for custom solutions. Relational DBs would stay forever, but some things should be moved to other, customized solutions.
● consider using Mongo for document-like data
● consider using Neo4j or another graph DB for representing graph data (sorry Avi, Mongo ain't no graph DB!)
And don't forget to stay naive!
being large scale but still fun and lean can be hard, but pulling it off is worth it!
Thanks for not falling asleep!