AWS DevOps Exam preparation: CodeCommit

Previously we would do these things by hand, but now we will automate them using these tools.

What is CI?

So first, the CI in CICD stands for continuous integration, and that means that developers push their code very often into a central code repository.

And so that could be GitHub, which is a third-party service outside of AWS, or CodeCommit, which is an AWS service, or Bitbucket, which is also a third-party service. So developers push the code to the code repo, and then a test or build server checks that the code is correct and working as soon as it's pushed into the code repository.

So this could be CodeBuild if it's on AWS, or Jenkins if you want an open source tool, for example. So the build server fetches the code and tests it, and then, as developers, we get feedback as to whether the tests or checks have passed or failed. And so we get the build and test results, and we've saved time. With this, we find bugs early and we fix them early, because we are testing the code as soon as it is pushed into the code repository.

So the developer doesn't need to test the code on their machine. They can just push the code and wait for the build server to do it while they do some other tasks.

Therefore, the code is going to be delivered faster, because it's already tested. Thanks to this, we're going to be able to deploy often, because as soon as the code is tested and ready, we can deploy it. And we get happier developers, because they have a healthier development cycle.

What is CD?

So we saw CI and now there is CD, which could stand for continuous delivery, for example. So let's take an example. Here we have application servers running version one, and as developers we want our code to be pushed all the way to these application servers. The way we do it is that we use continuous delivery.

Anytime we push code into the code repo, it's going to be deployed onto our application servers, provided it's tested appropriately. So the developer pushes the code, and the code is tested by the build server. This is the continuous integration part. And then after the build has passed, so it is green and fully tested, there's going to be a deployment server.

And this deployment server is going to deploy our application onto our application servers. So version one at first. But then if we push a new version of our code into our code repo, we're going to have application servers running version two. So with continuous delivery, we ensure that deployments happen often and that they're very quick. We shift away from a mentality of "let's do one release every three months", which takes very long and is error prone because it doesn't happen very often, to "let's automate, you know, up to five releases a day".

Because every time we push code into the code repo, the code is going to go live on some application servers. So to do continuous delivery, we need automated deployment tools such as CodeDeploy, which is an AWS service, or Jenkins CD, or Spinnaker, or other tools.

So if you look at the tech stack for CICD on AWS, we have the code and the code can live in CodeCommit or GitHub or Bitbucket or any third party code repository.

Then we have the build phase. The build phase and the test phase could be done by CodeBuild on AWS, but there's an open source alternative to CodeBuild called Jenkins CI, and any third-party CI server will do as well.

Then we go to the deploy phase, where we can use CodeDeploy. CodeDeploy can deploy to EC2 instances, on-premises servers, Lambda functions, and ECS. Or, if you also wanted to provision the infrastructure, you could use Elastic Beanstalk as an alternative to CodeDeploy to do both the deployment and the provisioning of the infrastructure.

And then to orchestrate all these things to define exactly what happens in our CICD process, then we can orchestrate everything using AWS CodePipeline. So that's it for an overview of CICD.

AWS CodeCommit

So the concept we need to introduce is version control. It's the ability to understand the various changes that happen to code over time and possibly roll back.

So to have version control means that you can see what happened in the past: who committed some code, what changed, what was added, what was removed, and so on, and then roll back. And to get version control, there is an underlying technology that's very, very popular nowadays called Git. A Git repository can be synchronized on your computer, but it's also very often uploaded onto a central online repository.

And the benefit of having a central online repo for a Git repo is that you can collaborate with other developers. So it allows, you know, organizations with up to maybe hundreds or thousands of developers to work on the same code at the same time, which is amazing. It also makes sure that the code is backed up somewhere.

So the code lives in the cloud and not just on someone's computer. It is also fully viewable and auditable, so we can see who committed which line of code and when, and we can revert changes. You can roll back. You can do a lot of good things with code repositories. And so with CodeCommit, we have a code repository in AWS, and our developers, for example Emma and John, can collaborate and push and pull code from our code repo.

So why do we want to use CodeCommit? Well, hosting Git repositories can be quite expensive. There is a whole industry offering this as a third-party service, such as GitHub, GitLab, Bitbucket and so on, but the bill could be pretty high. By using CodeCommit, which is on AWS, you get private Git repositories, because your code actually lives and stays within your own account in the AWS cloud.

There's no size limit on the repo. That means that you can scale to, you know, gigabytes of code if you want to. It's fully managed and highly available. And the code, as I said, lives only in the AWS cloud. So that means increased security and compliance, because it may, for example, be unacceptable for you to have your code anywhere other than on AWS. CodeCommit is also secure: it's encrypted, you have access control using IAM, and so on. And CodeCommit integrates with industry-standard CI tools such as Jenkins or CodeBuild, which makes it a great choice to store your code.

So interactions are done using the standard Git command line, but then you have authentication on top of it. It could be using SSH keys, in which case, as a user, you configure your SSH keys to be able to access a Git repo. Or it could be HTTPS, if you want to access the Git repo using a standard login and password. For authorization, IAM policies are used to manage the permissions of users and roles on specific repos, which is nice because it means you have only one way of managing security in AWS.
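
As a small illustration of the two connection methods above, here is a sketch that builds the HTTPS and SSH clone URLs for a repository. The region and the repository name `my-demo-repo` are placeholders made up for this example, and the helper name is hypothetical.

```python
def codecommit_clone_urls(region: str, repo_name: str) -> dict:
    """Build the HTTPS and SSH clone URLs of a CodeCommit repository."""
    host = f"git-codecommit.{region}.amazonaws.com"
    return {
        # Used with a login and password (or a credential helper).
        "https": f"https://{host}/v1/repos/{repo_name}",
        # Used with an SSH key pair configured for your user.
        "ssh": f"ssh://{host}/v1/repos/{repo_name}",
    }


# Placeholder values, not taken from a real account.
urls = codecommit_clone_urls("us-east-1", "my-demo-repo")
print(urls["https"])
print(urls["ssh"])
```

Either URL can then be handed to a plain `git clone`; only the authentication behind it differs.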

Encryption: So your code is ultimately going to be encrypted at rest using KMS, and that means that no one but you can retrieve it. And also, while you push your code to CodeCommit, you have encryption in transit, because you use the HTTPS or SSH protocols, which are both secure.

And then in case of cross-account access, of course you would not share your SSH keys or your credentials with someone else. Instead, you would create an IAM role in your account, and the other account would then use the STS AssumeRole API to get access to the CodeCommit repo.
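
To make the cross-account setup more concrete, here is a minimal sketch of the two pieces involved: the trust policy of the role created in the account that owns the repo, and the parameters the other account would pass to STS AssumeRole. The account IDs, the role name `CodeCommitCrossAccount` and the session name are all made up for illustration. With boto3, the request dictionary could be passed as `sts_client.assume_role(**request)`, but the sketch stops before making any AWS call.

```python
import json


def cross_account_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy letting principals of another account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assume_role_request(repo_account_id: str, role_name: str, session_name: str) -> dict:
    """Parameters for the STS AssumeRole call made from the other account."""
    return {
        "RoleArn": f"arn:aws:iam::{repo_account_id}:role/{role_name}",
        "RoleSessionName": session_name,
    }


# Hypothetical account IDs and names, for illustration only.
trust_policy = cross_account_trust_policy("222222222222")
request = assume_role_request("111111111111", "CodeCommitCrossAccount", "dev-session")
print(json.dumps(trust_policy, indent=2))
print(request)
```

The role itself would still need an IAM policy granting the CodeCommit actions on the repo; the trust policy only controls who may assume it.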

So just to finish this high-level overview, here are a few more CodeCommit features worth knowing, even if you are already very familiar with GitHub.

The first one is that you can monitor events happening in CodeCommit through EventBridge. So anytime a pull request is created, or a pull request status changes, or a new reference is created, or a new comment is created, you can react to that in EventBridge. This gives you some cool automation opportunities: say you wanted to react to a new pull request, you could, through EventBridge, invoke SNS, Lambda, or CodePipeline.
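
As a sketch of what reacting to pull requests could look like, here is an EventBridge event pattern for CodeCommit pull request events, together with a very simplified local matcher. The matcher is only an illustration written for this example (it handles lists of allowed values and nested objects, nothing more); it is not how EventBridge itself is implemented, and the sample event is trimmed down and uses made-up values.

```python
# Event pattern: pull requests that are created or whose status changes.
PULL_REQUEST_PATTERN = {
    "source": ["aws.codecommit"],
    "detail-type": ["CodeCommit Pull Request State Change"],
    "detail": {"event": ["pullRequestCreated", "pullRequestStatusChanged"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: every pattern key must be present in the event,
    and the event value must be one of the allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed-down sample event with made-up values.
sample_event = {
    "source": "aws.codecommit",
    "detail-type": "CodeCommit Pull Request State Change",
    "region": "us-east-1",
    "detail": {"event": "pullRequestCreated", "pullRequestId": "1"},
}

print(matches(PULL_REQUEST_PATTERN, sample_event))
```

In a real setup the pattern would be attached to an EventBridge rule whose target is the SNS topic, Lambda function, or pipeline you want to invoke.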

Also, how do you migrate a repository from one place to CodeCommit?

Well, if your repository is hosted on another place such as GitHub or GitLab, then you can push it to CodeCommit very simply. The way you do it is that first you need to create your CodeCommit repository, but then you do a git clone. Now, this git clone command is going to take the entire content of your Git repository from the server and put it on your local computer.

That means all project files, all commits, everything. And then once the project is cloned onto your local computer, you can push it to a different URL. So instead of pushing it back to the Git server, you can push it to a new Git place such as your CodeCommit repository. And that's how you migrate a Git repo.
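
The migration steps above can be sketched as a short list of Git commands. This is one possible approach, assuming a mirror clone (`git clone --mirror`) so that all branches and tags come along; the source URL, target URL and working directory name are placeholders, and `run_migration` is only defined here, not executed.

```python
import subprocess


def migration_commands(source_url: str, codecommit_url: str, workdir: str = "repo-mirror") -> list:
    """Git commands that copy a repository from another host to CodeCommit."""
    return [
        # 1. Clone everything (all refs and commits) from the current host.
        ["git", "clone", "--mirror", source_url, workdir],
        # 2. Push all branches to the new CodeCommit repository.
        ["git", "-C", workdir, "push", codecommit_url, "--all"],
        # 3. Push all tags as well.
        ["git", "-C", workdir, "push", codecommit_url, "--tags"],
    ]


def run_migration(commands: list) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Placeholder URLs: replace them with your own repositories.
commands = migration_commands(
    "https://github.com/example-org/example-repo.git",
    "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/example-repo",
)
for argv in commands:
    print(" ".join(argv))
```

The CodeCommit repository has to exist before the push, and Git must already be able to authenticate against both hosts.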

So how can we achieve a cross-region replication in CodeCommit?

Well, we would want to do cross-region replication, for example, to have lower latency for pulls for global developers, or to have a backup of a repo. So say, for example, we want a copy of a repo in us-east-1 replicated into eu-west-2. So how does that work? Well, whenever we push to an existing branch, or we create or delete a branch, CodeCommit emits an event in EventBridge, such as referenceCreated or referenceUpdated. So this is the type of event that will appear. And from there, EventBridge can trigger, for example, an ECS task. The ECS task will do a git clone of the CodeCommit repository, and then replicate it to the target repository in eu-west-2. So we could use an ECS task, but a CodeBuild build could most likely do the same job if we wanted to. And so this is how, thanks to EventBridge, we can achieve cross-region replication of a CodeCommit repo.
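
Here is a minimal sketch of the logic such a replication task could run. It assumes the target repository already exists in the target region under the same name, that replication uses a mirror clone followed by `git push --mirror`, and that the event carries the repository name under `detail.repositoryName`; the function names and the sample event values are made up for illustration.

```python
# Events that should trigger a replication, as named in the text above.
REPLICATION_EVENTS = {"referenceCreated", "referenceUpdated"}


def codecommit_url(region: str, repo_name: str) -> str:
    """HTTPS URL of a CodeCommit repository in a given region."""
    return f"https://git-codecommit.{region}.amazonaws.com/v1/repos/{repo_name}"


def plan_replication(event: dict, target_region: str):
    """Return the Git commands to replicate the repo, or None to skip."""
    detail = event.get("detail", {})
    if event.get("source") != "aws.codecommit":
        return None
    if detail.get("event") not in REPLICATION_EVENTS:
        return None
    repo = detail["repositoryName"]
    source = codecommit_url(event["region"], repo)
    target = codecommit_url(target_region, repo)
    return [
        # Clone all refs from the source region...
        ["git", "clone", "--mirror", source, repo],
        # ...and mirror them into the repository of the target region.
        ["git", "-C", repo, "push", "--mirror", target],
    ]


# Trimmed-down sample event with made-up values.
sample_event = {
    "source": "aws.codecommit",
    "detail-type": "CodeCommit Repository State Change",
    "region": "us-east-1",
    "detail": {"event": "referenceUpdated", "repositoryName": "my-demo-repo"},
}

plan = plan_replication(sample_event, "eu-west-2")
for argv in plan:
    print(" ".join(argv))
```

Whether this runs in an ECS task or a CodeBuild build, the container needs Git and credentials allowed to pull from the source repo and push to the target repo.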

So what about branch security in CodeCommit?

Well, as soon as you grant a user push permission on a CodeCommit repo, they can contribute to any branch they want. And so to restrict which branches they can contribute to, you need to restrict the users. For this, we use IAM policies. So say we have a CodeCommit repo with production, staging, and development branches, and we want to allow only the senior developers to push code to the production branch. So we allow it for them through IAM policies, but the IAM policies we apply to the junior developers will prevent them from pushing into production.

So the IAM policy for this has a Deny effect, and if we look at the condition, it says: deny if it looks like you're pushing into the main or the prod branches. This IAM policy must be attached to your junior developer group, for example, and then they won't be able to push into prod. So you may say, well, what about resource policies? What about a policy attached directly to your CodeCommit repository? Well, for now, that is not supported. So the only way to do branch security is to deal with it at the user level or the group level directly in IAM.
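
Here is a sketch of what such a Deny policy can look like, built as a Python dictionary and printed as JSON. The repository ARN is a placeholder, and the list of denied actions is one reasonable selection for this example (pushes, branch deletion, and merges); adjust it to what you actually want to block.

```python
import json


def deny_push_policy(repo_arn: str, protected_branches: list) -> dict:
    """IAM policy denying pushes and merges into the protected branches."""
    refs = [f"refs/heads/{branch}" for branch in protected_branches]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": [
                    "codecommit:GitPush",
                    "codecommit:DeleteBranch",
                    "codecommit:PutFile",
                    "codecommit:MergeBranchesByFastForward",
                    "codecommit:MergeBranchesBySquash",
                    "codecommit:MergeBranchesByThreeWay",
                    "codecommit:MergePullRequestByFastForward",
                    "codecommit:MergePullRequestBySquash",
                    "codecommit:MergePullRequestByThreeWay",
                ],
                "Resource": repo_arn,
                "Condition": {
                    # Deny only when the request targets a protected branch.
                    "StringEqualsIfExists": {"codecommit:References": refs},
                    # Ignore requests that carry no reference at all.
                    "Null": {"codecommit:References": "false"},
                },
            }
        ],
    }


# Placeholder ARN with a made-up account ID and repository name.
policy = deny_push_policy(
    "arn:aws:codecommit:us-east-1:111111111111:my-demo-repo",
    ["main", "prod"],
)
print(json.dumps(policy, indent=2))
```

Attached to the junior developers' group, a policy like this overrides any Allow they have for these actions on the main and prod branches, while leaving the other branches untouched.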

Now, if there is a pull request, so a proposed change into a branch, then you can set up approval rules. These approval rules ensure that your code has been reviewed and approved by a certain number of people before the PR, the pull request, is merged. So with a pull request approval rule, we specify a pool of users who can approve, and the number of users who must approve the PR. So for example, we have a pull request on our CodeCommit repo and we've defined a pool of five users that can approve this request, but we need two of them to accept it. So maybe user one will review it and accept it, and user three will review it and accept it. Therefore, your pull request will be approved, and once approved, it can be merged into the repository.

So to specify who can approve a pull request, we can specify IAM principal ARNs, such as users, federated users, IAM roles, IAM groups, and so on. Also, if you wanted to automatically create this kind of approval rule on any pull request, you can use an approval rule template and say, okay, apply these approval rules directly to any pull request on the dev and the prod branches.
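
To illustrate, here is a sketch of the content of an approval rule template for the example above: a pool of five users, two approvals needed, applied to the dev and prod branches. The account ID and user names are made up, and `is_approved` is a small local helper written for this example to show the counting logic; it is not part of CodeCommit.

```python
import json


def approval_rule_template_content(approvals_needed: int, pool_member_arns: list, branches: list) -> dict:
    """Content of an approval rule template for the given branches."""
    return {
        "Version": "2018-11-08",
        # Branches whose pull requests get the rule automatically.
        "DestinationReferences": [f"refs/heads/{branch}" for branch in branches],
        "Statements": [
            {
                "Type": "Approvers",
                "NumberOfApprovalsNeeded": approvals_needed,
                "ApprovalPoolMembers": list(pool_member_arns),
            }
        ],
    }


def is_approved(approver_arns: list, pool_member_arns: list, approvals_needed: int) -> bool:
    """Local illustration: count distinct approvers that belong to the pool."""
    return len(set(approver_arns) & set(pool_member_arns)) >= approvals_needed


# Hypothetical pool of five IAM users in a made-up account.
pool = [f"arn:aws:iam::111111111111:user/user{i}" for i in range(1, 6)]
content = approval_rule_template_content(2, pool, ["dev", "prod"])
print(json.dumps(content, indent=2))

# User one and user three approve: two approvals from the pool are enough.
print(is_approved([pool[0], pool[2]], pool, 2))
```

The JSON content would be supplied when creating the template, and the template is then associated with the repositories it should apply to.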

Okay, so that's it for CodeCommit.