Scaling to a Billion (Part 5)
Nothing is Free
Optimizing engineering metrics is never sufficient. To support a
high-transaction-value system, the architects must understand and optimize the
metrics the business cares about. In some businesses, a low error rate is
crucial. In others, low turnaround time. Some businesses care about average
response time, while others cannot tolerate slow responses even for a small
fraction of requests. The metrics measured - average or P90 latency (the P90
is the latency that 90% of requests beat, so it reflects the experience of the
slowest 10% of requests), request error rate, or downtime - must fit the
business the company is in.
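To make the difference between the average and the P90 concrete, here is a
minimal sketch in Python. The latency numbers are made up for illustration,
and the nearest-rank percentile definition used here is one common convention
among several; monitoring tools may interpolate differently.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 85 fast requests and 15 slow ones.
latencies_ms = [100] * 85 + [2000] * 15

avg = mean(latencies_ms)            # pulled up by the slow requests
p50 = percentile(latencies_ms, 50)  # what a typical request sees
p90 = percentile(latencies_ms, 90)  # what the slowest 10% see, at best

print(f"average={avg} ms, P50={p50} ms, P90={p90} ms")
```

With this made-up data the average is 385 ms, a value that no single request
actually experienced, while the P90 of 2000 ms exposes the slow tail. Which of
the two numbers matters is a business decision, not an engineering one.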
And above all, understand your assumptions and state them clearly. Handling
‘five times the order volume’ may sound specific - but do your tests assume
that each order has fewer than 10 items? That orders arrive at a constant rate
over an hour rather than in bursts due to batching in other systems? Or that
other systems do not cause load or lock contention on the database?
Misunderstanding your requirements or assumptions may result in perfectly
engineered systems that do not help the business grow as planned.
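The arrival-pattern assumption above can be illustrated with a toy queue
model. This is only a sketch under invented numbers: 6000 orders per hour, a
system that can process 120 orders per minute, and an upstream system that
(hypothetically) releases orders in four batches. It is not a substitute for
a real load test.

```python
def peak_backlog(arrivals_per_minute, capacity_per_minute):
    """Return the largest number of orders left waiting at the end of any
    minute, given per-minute arrivals and a fixed processing capacity."""
    backlog = 0
    peak = 0
    for arrived in arrivals_per_minute:
        # Orders not processed in this minute carry over to the next one.
        backlog = max(0, backlog + arrived - capacity_per_minute)
        peak = max(peak, backlog)
    return peak


# Both patterns deliver the same hourly volume (all numbers are hypothetical).
steady = [100] * 60                # 100 orders every minute
bursty = ([1500] + [0] * 14) * 4   # a batch of 1500 orders every 15 minutes
capacity = 120                     # orders processed per minute

steady_peak = peak_backlog(steady, capacity)
bursty_peak = peak_backlog(bursty, capacity)

print(f"steady peak backlog: {steady_peak} orders")
print(f"bursty peak backlog: {bursty_peak} orders")
```

In this model the steady stream never queues, while the bursty stream leaves
1380 orders waiting after each batch, so the last of them waits more than
eleven minutes even though the hourly volume and the capacity are identical.
A test that only states "6000 orders per hour" hides that difference.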
Finally, remember that scaling a system is hard. Systems do not scale
linearly, and in many cases, handling twice the load requires more than twice
the computing resources, developer time, and stabilization period. Advantages
of scale exist - but they take time to materialize.
About the Author
Yaniv Pessach is a software architect living in Bellevue, WA. He has worked
for multiple S&P 500 companies as well as smaller firms over the years, and
received his graduate degree from Harvard University, where his research
focused on distributed systems. You can find out more about Yaniv on his
website, www.yanivpessach.com, or contact him through his LinkedIn page at
www.linkedin.com/in/yanivpessach