Introduction
At the start of 2018, when I joined Kubric, all of our services ran on Google App Engine. We used to run a gcloud shell script to push the repository to the deployment environment. As deployments became more complex and the team grew, we created a unified deployment model: the deploy.sh script. It served as a standard way to deploy a service to App Engine or a compute machine. For any application, one just had to run the deploy script, without having to remember all the minor details. This was the first baby step in improving our deployment process.
As we grew from 3 developers to 12 and from 5 microservices to 20, and migrated from an auto-managed App Engine cluster to a self-managed Kubernetes cluster, we realized that a deploy script run from a developer's local environment could no longer give us the right visibility into the system. Everyone was deploying their feature branches to production. Whenever a deployment failed, it was hard to debug. Replicating the App Engine environment locally also became a nightmare. That is when we decided we needed a central deployment setup that would also give us visibility into deployments and easy rollbacks. We decided to manage all deployments using CircleCI.
Wait, but why?
We wanted to integrate with CI/CD for the following reasons:
Traceability
Previously, when we deployed by running scripts on our local machines, we had little insight into each deployment. The logs lived on a local machine, and if something went wrong, it was difficult to quickly figure out who had deployed. With CircleCI, we have logs for every run, with details about when it was triggered and who triggered it.
Decoupling Deployment from Development Process
We could now merge with confidence, knowing that the CircleCI deployer would automatically move every merged change to the production environment.
Confident Deployments
Integrating with CircleCI helped us set up a proper deployment process:
Code Changes -> PR Review -> Merge to master -> Auto trigger Test cases -> Auto Deployment.
This gave us more confidence in our deployments and reduced chances of error.
Refresher — The Basics of CircleCI
Before we jump into how we went about integrating with CircleCI, I would like to quickly take you through some of the basic concepts of CircleCI, which we had to know to set up the first version of our integration. You can skip this section if you are already aware of these concepts.
Job: A job in CircleCI defines the set of commands to run on a machine. For each job, CircleCI provisions a fresh environment and executes all the steps defined in the job. We can create a job for running test cases, for building and pushing Docker images, or for deployments.
Workflows: Through workflows, we can define the order in which jobs execute. For example, we can specify a job that runs test cases and, only once it has finished, run the job for build and deployment. This is a simple workflow, but CircleCI allows you to build complex workflows as well.
Executor Type: An executor type defines the underlying technology or environment in which to run a job. CircleCI enables you to run jobs in one of four environments: docker, machine, macOS, and windows.
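To make these three concepts concrete, here is a minimal sketch written as a shell function that scaffolds a CircleCI configuration with two jobs and one workflow. The job names, the run_tests.sh script, the machine image tag, and the branch filter are assumptions for illustration, not our actual configuration.

```shell
# Scaffold a minimal CircleCI config inside the given project directory.
write_circleci_config() {
  project_dir="$1"
  mkdir -p "$project_dir/.circleci"
  cat > "$project_dir/.circleci/config.yml" <<'EOF'
version: 2.1
jobs:
  test:
    machine:                        # executor type
      image: ubuntu-2204:current    # assumed image tag
    steps:
      - checkout
      - run: ./run_tests.sh         # hypothetical test script
  deploy:
    machine:
      image: ubuntu-2204:current
    steps:
      - checkout
      - run: ./deploy.sh
workflows:
  test-and-deploy:
    jobs:
      - test
      - deploy:
          requires:
            - test                  # deploy runs only after test succeeds
          filters:
            branches:
              only: master
EOF
}

# Example: write_circleci_config .
```

The requires key is what expresses "run the tests first, and only then build and deploy".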
Challenges
As we started the integration with CircleCI, we faced a few issues that I believe any beginner may face. (Having successfully done this integration, these issues now feel trivial to me, but during my first integration they were the reason for my occasional bad mood!)
Some of the major challenges in migrating from local deployment to auto-deployment through CircleCI were the following:
- How to store secrets and credential files?
We had a few credential files required by the code, but we could not check these files into a git repository, and since deployment was now remote, we could not keep them locally either.
- How to clone private repositories?
Since several of our services depended on shared code that lived in our own private repositories, we needed a way to clone these private repositories as well.
Getting past the challenges
Before getting down to solving these two challenges, we decided to get the basic setup done: creating the .circleci/config.yml file and triggering a build. Here we faced our first blocker.
In config.yml, the CircleCI configuration file, we were using docker as the executor type. But when we ran the build, it had trouble accessing the code. Searching online, we found that with docker as the executor type, our build did not have access to the checked-out code. To solve this, we had to use “machine” as the executor type.
Now let’s talk about how we went about solving the two major challenges.
“How to store secrets and credential files”
CircleCI has the concept of environment variables, which can be used to define private keys or secret values for your project. We used these to store our secrets and the contents of our credential files, and accessed them as variables during deployment. CircleCI also has the concept of a Context, which lets us share the same environment variables across projects.
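One common way to carry a multi-line credential file in an environment variable is to store it base64-encoded and decode it during the job. The sketch below shows that idea under stated assumptions: the function name and the SERVICE_KEY_B64 variable are hypothetical, and this is an illustration rather than the exact mechanism we used.

```shell
# Decode a base64-encoded secret and write it to a file only the owner can read.
write_credential_file() {
  encoded="$1"
  target="$2"
  if [ -z "$encoded" ]; then
    echo "no secret value supplied" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  # umask 077 in a subshell so the file is created with mode 600
  ( umask 077 && printf '%s' "$encoded" | base64 --decode > "$target" )
}

# Example inside a CircleCI run step (SERVICE_KEY_B64 is a hypothetical variable):
# write_credential_file "$SERVICE_KEY_B64" "$HOME/credentials/service-key.json"
```

The value itself would be set once in the project settings or in a Context, so it never appears in the repository.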
“How to clone private repositories”
The solution was to add the private repository as a dependency in the requirements.txt file. But we ran into a bunch of issues before it worked.
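For illustration, pip accepts a requirement that points directly at a Git repository over SSH. The sketch below formats such a line; the host, organisation, repository name, and tag are hypothetical placeholders, and it assumes the package name matches the repository name.

```shell
# Print a pip requirement line that points at a private Git repository over SSH.
private_requirement() {
  host="$1"
  org="$2"
  repo="$3"
  ref="$4"   # tag, branch, or commit to pin
  printf '%s @ git+ssh://git@%s/%s/%s.git@%s\n' "$repo" "$host" "$org" "$repo" "$ref"
}

# Example: append the line to requirements.txt
# private_requirement git.example.com my-org shared-utils v1.2.0 >> requirements.txt
```

Because the URL uses SSH, whatever runs pip install needs an SSH key with read access to that repository, which is where the issues below came from.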
Issue:
We were using Docker builds for deployments. While installing the private repository dependencies listed in requirements.txt, the Docker build needed access to the repository. To provide SSH access, we added the lines below to the Dockerfile to copy the SSH key from the CircleCI machine into the Docker build.
ARG SSH_PRIVATE_KEY
RUN mkdir /root/.ssh/
RUN echo "${SSH_PRIVATE_KEY}" > /root/.ssh/id_rsa
Here the SSH_PRIVATE_KEY variable was passed in from the docker build command.
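Passing the key at build time can be sketched as follows. The function name, key path, and image tag are hypothetical; the DOCKER override exists only so the command can be dry-run with echo.

```shell
# Build an image, passing the contents of an SSH key file as the
# SSH_PRIVATE_KEY build argument expected by the Dockerfile.
build_with_ssh_key() {
  key_file="$1"
  image_tag="$2"
  context_dir="${3:-.}"
  if [ ! -r "$key_file" ]; then
    echo "cannot read key file: $key_file" >&2
    return 1
  fi
  # DOCKER can be set to "echo" to print the command instead of running it.
  "${DOCKER:-docker}" build \
    --build-arg SSH_PRIVATE_KEY="$(cat "$key_file")" \
    -t "$image_tag" \
    "$context_dir"
}

# Example (hypothetical tag): build_with_ssh_key "$HOME/.ssh/id_rsa" my-service:latest .
```

Keep in mind that build argument values can be seen with docker history by anyone who has the image, so an image built this way should not be shared outside the team.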
Still Issue?
Even after adding the id_rsa key, we still could not download the private repo. Digging further, we realized that the id_rsa file copied into the Docker build had open permissions (644). The ssh man page (which I had never needed to look at before!) clearly states that the file should be accessible only by its owner:
~/.ssh/id_rsa
Contains the private key for authentication. These files contain sensitive data and should be readable by the user but not accessible by others (read/write/execute). ssh will simply ignore a private key file if it is accessible by others.
We then changed the permissions of the file by adding the line below to our Dockerfile:
RUN chmod 600 /root/.ssh/id_rsa
Still Issue?
Now the file had the correct permissions, but docker build was still failing with the same issue: it couldn't access the repo.
After SSHing into the machine and running the clone command manually, we found that it prompted for a “yes” or “no” to confirm the authenticity of the host. Since a CircleCI deployment is an automated process, the unanswered prompt blocked the build. To get past this, we had to disable host key checking. We added the command below, which turns off the check when connecting to our particular host:
RUN ssh -o "StrictHostKeyChecking=no" <your-host>
With these three steps (copy the required SSH key, set the right file permissions, and disable host key checking), we were finally able to clone the private repository.
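The same three steps can also be sketched as a small shell function. This is a variant and not what our Dockerfile does: instead of running ssh once with the option on the command line, it writes a StrictHostKeyChecking entry for the host into the ssh config file. The function name, directory, and host are hypothetical.

```shell
# Install a private key for non-interactive Git access to a single host.
install_deploy_key() {
  ssh_dir="$1"
  private_key="$2"
  host="$3"
  # Step 1: copy the key into place
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  printf '%s\n' "$private_key" > "$ssh_dir/id_rsa"
  # Step 2: ssh ignores private keys that others can access
  chmod 600 "$ssh_dir/id_rsa"
  # Step 3: skip the interactive host authenticity prompt for this host only
  printf 'Host %s\n  StrictHostKeyChecking no\n' "$host" >> "$ssh_dir/config"
  chmod 600 "$ssh_dir/config"
}

# Example (hypothetical host): install_deploy_key /root/.ssh "$SSH_PRIVATE_KEY" git.example.com
```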
Final Leg
Once we figured out all these issues and successfully deployed one service with CircleCI integrated, the next task was to do the same across 20 different microservices. The whole team came into action and we were quickly able to integrate all services with CircleCI. Now all our deployments are automated, the process is transparent and less error-prone. The nightmares of broken deployments and unclear production branches are gone and now we can confidently move to the next steps in building a great tech product!
PS: If you have any comments or any suggestions, please feel free to drop a note to me at paroksh@kubric.io