Tuesday, July 02, 2013

Scaling to a Billion - Part I

(An expanded version of this article first published in Software Developer's Journal)


In 2011, the late-stage startup I was with sold and fulfilled eCommerce orders at an annual rate of half a billion dollars. After the company was purchased by a major brick-and-mortar retailer, our backend fulfillment systems were enhanced to fulfill not only the orders placed on our own site but also most of the orders placed on the retailer's site. With multiple sources of orders, we had to be ready for considerably more orders to be placed and processed. As the Principal Engineer of the group responsible for the Order Processing and Payment systems, I led the technical design that moved our systems toward the elusive and quite challenging goal of supporting one billion dollars in annual sales. This article covers some of the lessons the team and I learned in the process.

Reducing Uncertainty Under Pressure

When your application is directly responsible for revenue, you and your team will find yourselves constantly under a microscope. When issues occur, escalations are distracting and time-consuming; therefore, part of the effort in designing a system must go toward minimizing outliers and the escalations they can trigger. A stable, reliable system may be preferable to a more efficient one that runs in fits and starts. When looking at performance and SLAs, it is not enough to minimize the average response or cycle time; minimizing the variance of those numbers is crucial.
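To illustrate why the average alone can hide problems, here is a minimal sketch comparing two hypothetical latency samples. The numbers are invented for illustration and are not measurements from the systems described here; the `summarize` helper is likewise hypothetical. Both samples have the same mean, but one has a much larger spread and tail.

```python
import statistics


def summarize(latencies_ms):
    """Return the mean, population standard deviation, and nearest-rank p99."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), computed with
    # integer arithmetic to avoid floating-point rounding surprises.
    rank = max(1, (99 * n + 99) // 100)
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p99": ordered[rank - 1],
    }


# Hypothetical samples of 100 requests each (illustrative only).
steady = [100] * 100             # every request takes 100 ms
bursty = [50] * 90 + [550] * 10  # usually fast, occasionally very slow

for name, samples in (("steady", steady), ("bursty", bursty)):
    print(name, summarize(samples))
```

Both samples average 100 ms, yet the bursty one has a standard deviation of 150 ms and a 99th percentile of 550 ms. If an SLA were stated as, say, "99% of requests under 200 ms", only the steady system would meet it, even though their averages are identical.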

(To Be Continued)
