Scaling up your platform is a necessary part of the evolution of your company. Moreover, scaling should be something that’s always on your mind.
Only three and a half years ago, Bigabid’s platform handled fewer than ten campaigns, with 5K QPS from one data source, in one data center. Today, we run hundreds of campaigns with roughly 1M QPS from four data sources, across three data centers!
If scaling hadn’t been constantly on our minds, we might not be where we are today. Here are a few tips to ensure that you’re prepared to scale, and that you’re set up for success when you do.
Questions to Ask Before Starting
Before you attempt to scale, answer the following questions:
- What part of your system are you going to scale? Is there a surge of new users? Will an upcoming holiday drive a lot of new traffic your way? Did you recently deploy a resource-intensive feature? Exploring these questions will help you determine which part of your system (if any) you should scale first.
- How much of the scale is predictable? You should always overshoot when preparing to scale. Will the platform need to handle twice the amount of traffic? If so, aim for even more than that. How much more will depend on how many resources (including money) you have for scaling.
- How frequently are you going to scale? You’d be wasting your time spending a few days every month working out how to scale your platform. You’d also be wasting your time putting a month’s worth of effort into making your platform scalable for the next three years. Instead, regularly revisit the circumstances under which you’ll need to scale, so that you’re prepared when the time comes.
Case in point: At one point, each of our data centers had only one Redis instance running. We knew this would become an issue, but creating a cluster when it wasn’t yet 100% necessary would have been an unnecessary expense. We foresaw the day we’d need to upgrade (more on that below), and when that day came, we upgraded to a Redis cluster, but only in the one data center that had enough traffic to call for it. The other data centers will stay on a single Redis instance until their time for an upgrade comes.
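To make the "overshoot" advice from the questions above concrete, here is a minimal sketch of the arithmetic. The headroom factor, the per-instance capacity, and the function name are all assumptions chosen for illustration; your own numbers will depend on your budget and on what your metrics tell you.

```python
import math


def instances_needed(expected_peak_qps, headroom_factor, qps_per_instance):
    """Return how many instances cover the expected peak plus headroom.

    All three inputs are hypothetical planning numbers, not measurements.
    """
    target_qps = expected_peak_qps * headroom_factor
    return math.ceil(target_qps / qps_per_instance)


# Example: traffic is expected to double from 5K to 10K QPS.
expected_peak = 2 * 5_000

# Sizing for exactly the expected peak leaves no margin for error.
exact_fit = instances_needed(expected_peak, 1.0, 2_000)

# Overshooting by an assumed 50% buys room for surprises.
with_headroom = instances_needed(expected_peak, 1.5, 2_000)

print(exact_fit, with_headroom)
```

The gap between the two results is the price of the overshoot, which is the number to weigh against the resources you have available.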
Metrics Are Your Best Friend
You need to be working with metrics before you scale, and after! In fact, metrics should always be on your mind.
We could easily dedicate an entire article to metrics—how to use them to ensure that our platform is working well, and how to use them for scouting out problems before they happen. But for now, we’ll put it simply: if it has a pulse, put a metric on it.
Here are some tips you should take into account regarding your performance metrics:
- Start with the basics. CPU, memory, network activity—for every instance you run, including managed instances such as databases.
- Monitor your scaling features. The need to scale should never come as a surprise. With the proper metrics in place, you’ll be able to accurately foresee when scaling is needed.
- Every operation should be timed. You need to know how much time it takes to complete an operation. If you have a complex operation that’s composed of multiple smaller operations, you should measure each one as well as the total. Imagine a process that loads a cache from a database at startup and then updates the cache at a regular interval. In this scenario we definitely want to know how long it took to load or update the cache, but we’ll also want to measure how long it actually took to query the database. One timer is great for alerts, the other for investigating.
- Consider third-party components. If you’re integrated with third-party components, take into account that any one of them can break at any time, and plan accordingly.
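The cache example from the tips above can be sketched with two nested timers. This is only an illustration: the metric names, the in-memory `TIMINGS` store, and the `query_database` stand-in are hypothetical, and a real system would send the samples to whatever metrics backend it already uses.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: metric name -> list of durations in seconds.
TIMINGS = defaultdict(list)


@contextmanager
def timed(metric_name):
    """Record how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TIMINGS[metric_name].append(time.perf_counter() - start)


def query_database():
    """Stand-in for the real database query (hypothetical)."""
    time.sleep(0.01)
    return {"campaign_1": "active", "campaign_2": "paused"}


def refresh_cache(cache):
    """Load or update the cache, timing the whole operation and the query inside it."""
    with timed("cache.refresh.total"):        # the timer you alert on
        with timed("cache.refresh.db_query"):  # the timer you investigate with
            rows = query_database()
        cache.clear()
        cache.update(rows)
    return cache


cache = refresh_cache({})
for name, samples in TIMINGS.items():
    print(f"{name}: {samples[-1]:.4f}s")
```

Because the inner timer runs entirely inside the outer one, the total is never smaller than the query time, and the difference between the two tells you how much of a slow refresh is the database's fault.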
Needless to say, metrics should be monitored, and you’ll also need to set up proper alerts. When doing so, aim for general alerts. If you have a complex operation, you don’t need a separate alert for each of its individual components. An alert on the complex operation as a whole should point you in the direction you need to investigate, which is where the metrics you put in place for the smaller components will come in very handy.
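One way to picture this split between a general alert and component metrics is the sketch below. It alerts only on the top-level metric and, when that fires, uses the component metrics to suggest where to look. The metric names, sample values, threshold, and the nearest-rank percentile are all assumptions for illustration, not a description of any particular monitoring tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_alert(timings, alert_metric, threshold_seconds, pct=95):
    """Alert on one general metric; use the other metrics only to guide investigation.

    Returns None when the general metric is within the threshold.
    """
    observed = percentile(timings[alert_metric], pct)
    if observed <= threshold_seconds:
        return None
    components = {
        name: percentile(samples, pct)
        for name, samples in timings.items()
        if name != alert_metric
    }
    suspect = max(components, key=components.get) if components else None
    return {"metric": alert_metric, "observed": observed, "suspect": suspect}


# Hypothetical samples, in seconds.
timings = {
    "cache.refresh.total": [0.20, 0.25, 2.40],
    "cache.refresh.db_query": [0.15, 0.18, 2.30],
    "cache.refresh.deserialize": [0.03, 0.04, 0.05],
}

alert = check_alert(timings, "cache.refresh.total", threshold_seconds=1.0)
print(alert)
```

Only one threshold has to be tuned and maintained here, yet the alert still names the component that most likely caused the slowdown.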
The Data Gold Rush
Saving and analyzing data is all the rage. You’re inclined to save every micro-action your users took or even thought about taking because… well, who knows? This might be the best predictor of user profitability.
Unfortunately, saving too much data has become a common and costly issue. Data saved for “maybe one day” purposes more often than not winds up being unused data that becomes a burden, both in terms of expense (saving it costs bandwidth and server performance, plus the effort needed to sustain ETL processes) and in terms of the people needed to manage it.
Let’s take a step back; after all, we’re not Google. Be smart about the data you’re saving. Figure out if you’ll be able to use it before it goes stale. Being a hoarder won’t help, and with time you’ll feel increasingly uneasy deleting old data. So, best to avoid the issue before it happens.
Scaling is largely about preparing, and about being honest and realistic about your scaling goals. Indeed, if a team expects to scale successfully, the topic must be part of everyday conversation long before scaling actually happens. Only with proper preparation, which comes from constantly discussing scaling and from setting up the correct metrics and alerts, will you be able to scale successfully.
Once you’ve laid the foundation for scaling, you should be sure to adopt proper habits for handling a scaled system, and have a plan in place for when something goes wrong. More on that in Are You Prepared to Scale: Part 2.