Developing a modern SaaS product is not limited to writing features that customers can use; an entire Software Development Life Cycle (SDLC) needs to be defined and implemented. The SDLC has too many parts to cover here. In this latest installment of the “SaaS Matters” series, we give an overview of the Continuous Integration and Continuous Delivery (CI/CD) pipelines in place at Clumio, which allow our engineering organization to provide a scalable, reliable, and secure SaaS backup experience to our customers.
Every single change to our Master branch has to go through a thorough validation process. To land code on Master, our development teams create a Pull Request (PR), which is reviewed by one or more peers and automatically triggers a PR pipeline.
The CI and PR pipelines run a series of validations:
These checks give us greater confidence that the code will work once landed on Master.
Checks are timeboxed so that results come back quickly and reviewers have ample time to analyze them and assess the changes. They are also designed to be reliable: they avoid calling third-party APIs.
For more thorough validation, we have additional long-running CI pipelines, which we run hourly against our Master branch when there are code changes.
Our Quality Engineering team has several sets of Integration and Regression Test Pipelines. We use a set of one-hour tests that build and deploy all our services to a Dev/QA environment and exercise the product in a simulated customer environment.
These run hourly for the Master branch, and our developers are encouraged to run them against their development branch whenever they make infrastructure or API-related changes. These tests make real calls to the Public Cloud and third-party APIs and ensure that the product is working as expected from end to end.
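To illustrate what timeboxing a check can look like, here is a minimal Python sketch. It is not our actual pipeline code: the function name run_timeboxed, the status strings, and the 60-second budget are assumptions made for this example. It relies only on the standard library's subprocess.run, which stops the child process when the timeout expires.

```python
import subprocess
import sys


def run_timeboxed(cmd: list[str], timeout_s: float) -> str:
    """Run one check and return 'passed', 'failed', or 'timed_out'."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "timed_out"
    return "passed" if result.returncode == 0 else "failed"


# Hypothetical check: a stand-in command that exits successfully.
status = run_timeboxed([sys.executable, "-c", "print('lint ok')"], timeout_s=60)
```

Treating a timeout as its own status, separate from a failure, makes it easier to tell a slow check apart from a broken change.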
Just as our product services run in a serverless model, so do our pipelines. They run on Kubernetes, which scales up and down automatically based on demand. We parallelize our tests and split them up further so that we do not have to choose between testing and getting things done.
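One common way to split a test suite across parallel workers is to balance the shards by expected runtime. The sketch below is an assumption about how such a split could be done, not a description of our pipelines: the function shard_tests, the test names, and the durations are all hypothetical. It places tests longest first, always onto the currently least-loaded shard.

```python
import heapq


def shard_tests(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Assign tests to shards, longest first, onto the least-loaded shard."""
    if shards < 1:
        raise ValueError("shards must be >= 1")
    # Heap entries are (total_seconds, shard_index); the smallest load pops first.
    heap = [(0.0, i) for i in range(shards)]
    heapq.heapify(heap)
    plan: list[list[str]] = [[] for _ in range(shards)]
    # Sort by duration descending, then by name, so the plan is deterministic.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        plan[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    return plan


# Hypothetical test names with historical durations in seconds.
plan = shard_tests({"a": 30, "b": 20, "c": 20, "d": 10}, shards=2)
# plan == [["a", "d"], ["b", "c"]], i.e. 40 seconds of work per shard
```

With balanced shards, the wall-clock time of a run is close to the total test time divided by the number of workers, which is what makes adding workers worthwhile.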
Our SaaS model has allowed us to adopt a consistent weekly release cycle.
In this model, a commit to Master is no more than 2 weeks away from being live in Production. Instead of long-lived Feature Branches, we prefer to use Feature Flags. This permits us to avoid the overhead of merge conflict resolution associated with long-lived branches. It also allows us to try new features internally and then in production with selected customers before we fully enable them. If a new feature turns out to be unstable, we do not need to perform a new deployment – we just disable the flag. The same goes for when we feel a feature is ready for general use.
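The staged rollout described above can be sketched as a small flag object. This is a simplified illustration under our own assumptions, not the flag system we run in production: the names Rollout, FeatureFlag, is_enabled, the flag name "fast_restore", and the customer IDs are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Rollout(Enum):
    OFF = "off"            # disabled for everyone
    INTERNAL = "internal"  # internal accounts only
    SELECTED = "selected"  # internal accounts plus an allowlist of customers
    ALL = "all"            # generally available


@dataclass
class FeatureFlag:
    name: str
    rollout: Rollout = Rollout.OFF
    allowlist: set[str] = field(default_factory=set)

    def is_enabled(self, customer_id: str, internal: bool = False) -> bool:
        if self.rollout is Rollout.OFF:
            return False
        if self.rollout is Rollout.ALL:
            return True
        if internal:
            # INTERNAL and SELECTED both include internal accounts.
            return True
        return self.rollout is Rollout.SELECTED and customer_id in self.allowlist


flag = FeatureFlag("fast_restore", Rollout.SELECTED, {"cust-42"})
enabled_for_pilot = flag.is_enabled("cust-42")   # pilot customer sees the feature
enabled_for_other = flag.is_enabled("cust-7")    # everyone else does not
flag.rollout = Rollout.OFF                       # kill switch: no redeployment needed
```

Because the flag state is data rather than code, moving a feature from internal use to selected customers to general availability, or turning it off, is a configuration change instead of a release.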
The short release cycle means that the scope of potential problems is smaller and that the authors of the changes should still have a relatively fresh memory of the implementation if we need to troubleshoot.
Finally, this allows our product team to rapidly shift priorities based on customer feedback without disrupting the current sprints while maintaining high quality standards.
When issues arise, the engineering teams work together to address the problems as quickly as possible. If specific microservices are misbehaving, the DRIs (Directly Responsible Individuals) for these services get pulled in quickly to analyze the situation.
By allowing our developers to work closely with our Operations team and giving them the same sense of ownership from laptop to production, we foster software development practices that include TDD (Test Driven Development), defensive programming, and building in fault tolerance. Finally, we perform blameless postmortems in which the goal is to establish how our platform can automatically recover from the same root cause in the future.
To recap, we have built our SDLC with our customers in mind. The short release cycle allows us to take feedback from our customers more rapidly and continuously improve our product.
Our automated tests let us deliver releases with a lower bug rate while still implementing improvements quickly. Better quality means not only that our customers enjoy a product that works, but also that our developers spend more time developing features and less time debugging production issues. We are far from done. The CI/CD strategies we used when the company started 2 years ago have been updated to scale to our current size. As we keep building new features and growing the engineering organization, our tools and procedures will evolve accordingly. At the end of the day, delivering the highest quality product to our customers is what drives us.
This post is part of our SaaS Matters series.