Every developer encounters a situation where he/she pushes code to production, and at a critical moment, that code breaks everything. This can happen to people from all seniority levels and can be caused by silly or sophisticated reasons.
In this blog, I’ll teach you how to avoid dumb SQL syntax errors automatically. We will check the validity of each of our queries before we merge the code into production. The code for this article can be found on GitHub.
My Name is Kobi Balilty, I’m a Senior BI Data Engineer. Bigabid is a company of elite Programmatic User Acquisition and Retargeting DSP for Mobile Apps. Two of the greatest challenges in BI are the data’s scale and the data’s complexity.
During software development, there are many validations that we want to go through before they “hit” the customer. We can automate these processes and block “bad” behaviors, through a process called the CI/CD. This result in a more stable product, increase productivity, provide faster release cycles and allow a better experience for our customers.
“CI/CD bridges the gaps between development and operation activities and teams by enforcing automation in the building, testing, and deployment of applications”
In addition, developers hate spending their time on checks and always will prefer automated processes that do it for them. More than that, the process maintains the energy and morale of the developers and allows them to spend less time troubleshooting and more time building quality products.
It’s pretty common to see ETL failing due to silly SQL query syntax. This can lead to downstream failure and incorrect data in dashboards. For example, one of our developers pushed a critical change of the query and unfortunately, he/she was not concentrated enough and pushed a silly syntax error with an extra comma, which leads to an error in the ETL and downtime of several minutes to all of our dashboards.
In the next section, we are going to show how to eliminate those.
We want to make sure syntax issues are detected in our CI/CD. When prepending a query with Explain clause it will return the execution plan. In case the query is broken it will result in an error.
So to validate all queries are syntax is ok we need to do some pre-steps first in our purest:
Since in Bigabid we are heavy users of GitHub actions, we triggered our CI/CD pipeline from there (including test_queries_synatx). In GitHub action, we block merge/push into the master branch in case of any invalid queries.
This technique that I just described has some prominent pros:
Like everything interesting in life, it does have some limitations:
In my opinion, those limitations pale in comparison to the value that we got and keep getting.
In this article, we began by explaining why it’s important to have a CI/CD on our queries. It helps both beginners and senior software engineers to avoid silly SQL syntax errors and increase the ETLs availability.
I hope I was able to share my enthusiasm for this fascinating topic and that you find it useful. You’re more than welcome to drop me a line via email or LinkedIn.
Thanks to Eyal Trabelsi, for reviewing this post and making it much clearer.