Are You Prepared to Scale Your App: Part 2
By Amir Schlezinger
You’re now managing a scaled system. Congratulations! But wait—don’t throw your feet up just yet. You need to make sure you’re properly handling your scaled system, and that you have a plan in place for what you’ll do when something goes wrong (don’t worry; it’ll happen eventually!).
Habits You Should Adopt
One of the keys to scaling successfully is for scaling to be part of the daily conversation happening in your team. But saying you should be thinking and talking about scaling daily is easier than actually doing it.
With that in mind, here are some tips to keep you talking and thinking about scaling every day:
Check your metrics daily. Ask yourself how the metrics you’re seeing today differ from the ones you saw yesterday, a week ago, a month ago, etc. Are you seeing a positive trend, or is there something that needs to be addressed before things start falling apart? Side note: Remember the Redis story from Part 1 of this article? Because we were monitoring daily and were alerted long before the issue became pressing, we were able to take an entire week to manage the upgrade, without any pressure. That well-thought-out execution relied on the regular monitoring we’d done beforehand.
Anytime you deploy a new feature, ask yourself this question: How could this potentially break our system? It’ll help to examine which part(s) of your system the new feature interacts with, and consider the different ways in which it could affect your system with varying load.
Aim for small, incremental releases. It’s much simpler to account for changes when you have an isolated environment for testing new features and seeing how they affect your platform.
Don’t do it alone. Scaling should not be a one-man job. If you’re leading a team of developers, make sure you’re regularly discussing how their work might affect your platform’s ability to scale.
Take note of your technical debt. You’ll inevitably accrue it, so be sure to write it down. Then, take a peek every so often to see whether now is the right time to address it.
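To make the daily metrics check concrete, here is a minimal sketch of comparing today’s reading against yesterday, a week ago, and a month ago. Everything in it is an assumption for illustration: the function names, the idea of storing one reading per day in a dictionary, and the 20% threshold are hypothetical and not taken from any particular monitoring tool.

```python
from datetime import date, timedelta


def percent_change(current, previous):
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100.0


def trend_report(history, today, lookbacks=(1, 7, 30)):
    """Compare today's reading with the readings from N days ago.

    history maps a date to one reading (for example, peak memory use in %).
    Lookbacks with no reading, or with a zero baseline, are skipped.
    """
    current = history[today]
    report = {}
    for days in lookbacks:
        past = history.get(today - timedelta(days=days))
        if past:
            report[days] = percent_change(current, past)
    return report


def needs_attention(report, threshold_pct=20.0):
    """True if any comparison grew by at least threshold_pct (hypothetical cutoff)."""
    return any(change >= threshold_pct for change in report.values())


# Hypothetical readings: peak memory use of a cache server, in percent.
today = date(2024, 3, 31)
history = {
    today: 72.0,
    today - timedelta(days=1): 70.0,
    today - timedelta(days=7): 60.0,
    today - timedelta(days=30): 45.0,
}

report = trend_report(history, today)
for days, change in sorted(report.items()):
    print(f"vs {days:>2} days ago: {change:+.1f}%")
print("needs attention:", needs_attention(report))
```

In this made-up data the day-over-day change looks harmless, while the month-over-month change does not. That is the reason to look at several time ranges rather than only at yesterday.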
What Do I Do When Something Goes Wrong?
It’s crucial to prepare for the likelihood that something will go wrong. After all, it’s inevitable that at some point, some part of your system will fail. The good news is that you can prepare for that rainy day, and even mitigate some of the damage!
Here are tips for preparing for the moment when something goes wrong:
Pinpoint what’s most important to you. If multiple fires suddenly start burning, you need to be clear on which one you want to put out first. Be clear about this before it happens.
Define your system’s SLA. Understand what it is that could potentially kill your business, and what could be a mere nuisance. Does your system require 24/7 uptime? Is data integrity your top priority? Would an increase of one second in your response time create huge losses for your company?
Define the parts of your system that scale easily, and the ones that don’t. By doing this, you’ll be able to tell which parts of your system need to be handled first. Some questions to consider:
Can you simply increase the number of servers you use? Can you pay a bit more and deal with the problem that way for now?
If a certain component fails, will that require manual intervention? Does everyone on your team know how to handle that scenario?
Do you have a single point of failure?
Would moving from unmanaged to managed services mean taking a large financial hit?
Have a rollback plan. Ask yourself how easy it would be to roll back this version, and what would happen if the data structure changed. Make sure you can revert to your old schema and migrate new data to the old format, if possible, to prevent data loss.
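One way to picture the rollback advice is to write every schema change as a pair: an upgrade and a matching downgrade. The sketch below is only an illustration under assumed names. The record layout, the "schema" version field, and the functions upgrade, downgrade, and roll_back are all hypothetical, and a real system would run the equivalent steps through its own database migration tooling.

```python
def upgrade(record):
    """Old format -> new format: split the single 'name' field in two."""
    first, _, last = record["name"].partition(" ")
    return {"id": record["id"], "first_name": first, "last_name": last, "schema": 2}


def downgrade(record):
    """New format -> old format: rebuild 'name' from its two parts."""
    parts = (record["first_name"], record["last_name"])
    name = " ".join(part for part in parts if part)
    return {"id": record["id"], "name": name, "schema": 1}


def roll_back(records):
    """Convert any new-format records back; leave old-format records alone."""
    return [downgrade(r) if r.get("schema") == 2 else r for r in records]


# A record that existed before the release, and one created after it.
old_record = {"id": 1, "name": "Ada Lovelace", "schema": 1}
new_record = {"id": 2, "first_name": "Grace", "last_name": "Hopper", "schema": 2}

# Round trip: upgrading and then downgrading should give back the original.
assert downgrade(upgrade(old_record)) == old_record

# Rolling back keeps the data written while the new version was live.
restored = roll_back([old_record, new_record])
print(restored)
```

A round-trip check like the assertion above is cheap to run before a release. If the downgrade cannot reproduce the old record, you know in advance that a rollback would lose data.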
For many, scaling is very intimidating, but it doesn’t need to be. If you’re well prepared, you can take scaling in stride. Part of this involves accepting the fact that at some point, your operation will fail. Your system will crash. But there’s a bright side: you will improve, and the process of scaling will become easier with time. Talk about it daily. Think about it always.