Sunday, August 04, 2013

Scaling to a Billion - Part III

(An expanded version of this article first published in Software Developer's Journal)


Up Up and Away

When asked to scale a solution, the first concept that leaps to developers' minds is momentary load. Measured in 'requests per second', scaling consists of handling more requests in the same time. But how many more? And when? The load on your system likely varies through the day, between weekdays and weekends, and between rush periods and ordinary days. For drugstore, as for many other eCommerce retailers, Black Friday (and Cyber Monday) represented an annual peak. Yours may differ - but you should know when your requests hit maximum load, and what load you expect to handle. Backend requests can often be queued, but queuing introduces delays - your service design must be guided by your SLA (Service Level Agreement): is it OK to delay processing some requests for 15 seconds at peak times? How about 15 minutes?
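The queuing trade-off above can be sketched in a few lines. This is a hypothetical, single-process illustration (the queue, the 15-second budget, and the function names are all invented for the example): each request is timestamped on arrival, so a worker can measure how long requests actually waited and compare that against the SLA.

```python
import queue
import time

# Illustrative SLA budget: is a 15-second queueing delay acceptable at peak?
SLA_SECONDS = 15.0

requests = queue.Queue()

def enqueue(payload):
    # Record arrival time alongside the payload so wait time can be measured.
    requests.put((time.monotonic(), payload))

def drain(handler):
    """Process everything queued so far; return the worst wait observed."""
    worst_wait = 0.0
    while not requests.empty():
        enqueued_at, payload = requests.get()
        wait = time.monotonic() - enqueued_at
        worst_wait = max(worst_wait, wait)
        if wait > SLA_SECONDS:
            # A real system might shed load, alert, or scale out here.
            pass
        handler(payload)
    return worst_wait
```

Tracking the worst observed wait (rather than the average) is what tells you whether peak-time requests are still inside the SLA window.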


A few tricks to help optimize systems quickly:
- Minimize data flow. Moving data around, in memory or from disk, takes time. Check the columns fetched by your SQL queries and trim them. Verify that your database tables contain only required rows, and that unnecessary (or old) rows are purged or archived. Consider covering indexes for common queries. And reduce the amount of data (such as unnecessary fields) passed in web service calls.
- Scaling horizontally means adding more machines. It is easier to design a system for horizontal scaling if your services are stateless. Soft state (cached data) is usually OK, but consider a shared or distributed cache if the same data ends up cached on multiple machines.
- Seek out and eliminate all single points of failure. The same search will likely identify some of your choke points - the servers that have to handle -every- request. Consider alternatives.
- Scale horizontally or partition your data. Either plan for many machines to process your workload at once, where each machine has access to the entire data, or divide your data into partitions, and have a separate set of machines process each. Either approach has pros and cons - understand your tradeoffs.
- Simplify your solutions. Complex solutions are hard to maintain or even get right. And they are harder to optimize.
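The "minimize data flow" advice can be demonstrated concretely. The sketch below uses an in-memory SQLite database with an invented `orders` table: by selecting only the columns the query needs and backing them with a covering index, the engine can answer the query from the index alone, never touching the base table (and its bulky free-text column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, "
    "status TEXT, total REAL, notes TEXT)"
)
# Covering index: includes every column the hot query touches, so the
# base table (and its large 'notes' column) is never read for this query.
conn.execute(
    "CREATE INDEX ix_orders_customer ON orders (customer_id, status, total)"
)
conn.execute(
    "INSERT INTO orders VALUES (1, 42, 'shipped', 19.99, 'long free text...')"
)

# Select only what is needed -- not SELECT * -- so the index covers the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT status, total FROM orders WHERE customer_id = 42"
).fetchone()
print(plan[-1])  # SQLite reports a COVERING INDEX search for this query
```

Had the query asked for `notes` as well, the index would no longer cover it and every matching row would be fetched from the table - exactly the kind of extra data movement the bullet warns against.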
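The partitioning option mentioned above usually starts with a routing function. A minimal sketch, with an invented partition count and key scheme: hash each key with a stable hash and route it to one of N partitions, so each set of machines only ever sees its own slice of the data.

```python
import hashlib

# Hypothetical partition count; real systems often pick a number with
# headroom for growth, since changing it means re-sharding data.
NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    """Map a key to a stable partition number in [0, NUM_PARTITIONS)."""
    # Use a cryptographic digest rather than Python's built-in hash(),
    # which varies between processes and would break routing.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```

The trade-off the bullet names shows up immediately: queries scoped to a single key go to one partition and scale cleanly, but queries that span keys now have to fan out across all partitions and merge the results.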
