Best approach for migrating Maven development projects into AWS CodePipeline - amazon-web-services

We are trying to migrate several of our Java/Maven projects into AWS CodePipeline, and we could not find a good, reasonable migration approach (our current architecture already uses AWS for production). Specifically, we are interested in several things:
How to cache Maven dependencies so that build tasks do not download the same packages over and over again.
There are several approaches possible, for example:
a) Use CodeArtifact, but then the Maven projects are tied to a specific AWS account.
b) Use S3 buckets, but then third-party modules (Maven Wagons) need to be used.
c) Use an EC2 instance for building.
d) Use a Docker container created specifically for build purposes.
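For illustration, if the build stage runs on AWS CodeBuild, its built-in cache can also be pointed at the local Maven repository; a minimal buildspec sketch, assuming the CodeBuild project has local or S3 caching enabled:

```yaml
# buildspec.yml - minimal sketch; the CodeBuild project itself must have
# caching (local custom cache or S3 cache) enabled for this to take effect.
version: 0.2

phases:
  build:
    commands:
      # The default local Maven repository lives under /root/.m2, cached below
      - mvn -B clean package

cache:
  paths:
    - '/root/.m2/**/*'
```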
It is not really clear whether Jenkins or CodePipeline is the recommended CI/CD product on AWS. We have seen some examples where CodePipeline is used together with Jenkins. What is the purpose of such a setup?
Thank you,

Related

Should I share GCP Artifact Registry instances between different stages like production and test?

I'm setting up an application in GCP.
For the application resources (e.g. databases, backend services) I plan to create separate GCP projects for the stages. I plan to start with a production stage and a non-production stage, e.g. development.
I would like to store the Docker images of our backend services in GCP Artifact Registry.
Should I create two separate Artifact Registry instances, one for each stage? Or should I use a shared one and deploy to production the same image that was tested before on the non-production stages? Are there best practices for this?
Personally, I prefer having a dedicated GCP project with the Artifact Registry repositories, shared across all other projects.
A service account (SA) of the project that hosts the registries can publish images and packages (Python packages, for example).
The SAs of the other projects are only readers of the registries in that project.
When we work with a Nexus repository, for example, we don't have one repository per environment (dev, uat, prd).
We have a shared repository, and we can generate intermediate snapshot versions for testing purposes in a dev environment.
I use the same principle for Artifact Registry.
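For illustration, a hypothetical IAM policy file for a repository in the shared registry project (the SA emails, repository name and location are placeholders) could look like this:

```yaml
# policy.yaml - hypothetical bindings for a repository in the shared project;
# apply with:
#   gcloud artifacts repositories set-iam-policy backend-images policy.yaml \
#     --project=shared-registry-project --location=europe-west1
bindings:
  # The CI service account of the shared project can publish images and packages
  - role: roles/artifactregistry.writer
    members:
      - serviceAccount:ci-builder@shared-registry-project.iam.gserviceaccount.com
  # The service accounts of the other (dev/prod) projects are readers only
  - role: roles/artifactregistry.reader
    members:
      - serviceAccount:runtime@dev-project.iam.gserviceaccount.com
      - serviceAccount:runtime@prod-project.iam.gserviceaccount.com
```

In practice, gcloud artifacts repositories add-iam-policy-binding is often simpler, since set-iam-policy replaces the whole policy.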

Cloud Build with GitLab at module level

I was working with GitHub and GCP (Cloud Build for deployments) and it was working well. Below are the steps:
Created multiple Cloud Functions that share the same GitHub repository.
Created a separate Cloud Build trigger for each Cloud Function, with a separate cloudbuild.yml in each Cloud Function's folder in the repository.
Each trigger runs when there are changes in the respective Cloud Function's scripts.
Now I need to integrate Cloud Build with GitLab.
I have gone through the documentation and found that a webhook is the only option, and the trigger fires on changes to the whole repository. That would require a separate repository for each Cloud Function or Cloud Run service, and there is no option to select the repository itself in the trigger configuration.
Can experts guide me on how to do this integration? We are planning to have one repository with multiple services/applications stored in it, and we want CI to run in the GCP environment itself.
Personally, I found GitLab to be the worst, compared to GitHub and Bitbucket, in terms of integration with GCP Cloud Build (to run the deployment within GCP).
I don't know of an ideal solution, but I have two ideas. Neither of them is good from my point of view.
1/ Mirror the GitLab repository into a GCP repository as described in 'Mirroring GitLab repositories to Cloud Source Repositories'. One of the biggest drawbacks from my point of view: the integration is based on personal credentials, so there has to be a person to keep it working -
Mirroring stops working if the Google Account is closed or loses access rights to the Git repository in Cloud Source Repositories
Once mirroring is in place, you can probably work with the GCP-based repository in the ordinary way and trigger Cloud Build jobs as usual. A separate question is how to provide the deployment logs to whoever initiated the deployment...
2/ Use webhooks. That does not depend on any personal account, but it is not very granular - as you mentioned, the push event applies to the whole repository. To overcome that limitation, there can be a rather tricky (inline) YAML file executed by the Cloud Build trigger. In that YAML file we not only fetch the code, but also parse all the changes (all commits) in the push to find out which subdirectories (and therefore which separate components - Cloud Functions) were potentially modified. Then, for each affected (modified) subdirectory, we can trigger (asynchronously) another Cloud Build job, with its own YAML file located inside that subdirectory. A rough sketch of such a dispatcher is shown below.
An obvious drawback: it is not clear who should get the logs from all those deployments (and how), especially if something goes wrong, and developing (and maintaining) such a deployment process can be time-consuming and far from easy.
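A rough sketch of that dispatcher config, assuming a webhook trigger that binds the clone URL and the before/after commit SHAs to the hypothetical substitutions _REPO_URL, _BEFORE_SHA and _AFTER_SHA, and assuming the repository is reachable by Cloud Build:

```yaml
steps:
  # Fetch the code (the webhook payload only tells us that something was pushed)
  - name: gcr.io/cloud-builders/git
    args: ['clone', '${_REPO_URL}', 'repo']
  # Work out which top-level directories changed and start one build per component
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: bash
    dir: repo
    env:
      - 'BEFORE_SHA=${_BEFORE_SHA}'
      - 'AFTER_SHA=${_AFTER_SHA}'
    args:
      - -c
      - |
        for dir in $$(git diff --name-only "$$BEFORE_SHA" "$$AFTER_SHA" | cut -d/ -f1 | sort -u); do
          if [ -f "$$dir/cloudbuild.yaml" ]; then
            # Asynchronously trigger the component's own build definition
            gcloud builds submit "$$dir" --config="$$dir/cloudbuild.yaml" --async
          fi
        done
```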

From GitLab CI/CD to AWS EC2

It's been some time since I started trying to figure out a really easy way to do this.
I am using GitLab CI/CD and want to move the build output from there to AWS EC2. The problem is that I found two ways, both of which are really bad ideas:
Building the project in GitLab CI/CD, then SSHing into the AWS instance, pulling the project again from there, and running the npm scripts. This is really wrong and I won't go into details why.
I also saw 'How to deploy with GitLab CI to EC2 using AWS CodeDeploy/CodePipeline/S3', but it's so big and complex.
Isn't there an easier way to copy built files from GitLab CI/CD to AWS EC2?
I use GitLab as well, and what has worked for me is configuring my runners on EC2 instances. A few options come to mind:
I'd suggest managing your own runners (rather than shared runners) and giving them permission to drop built files in S3, then having your instances pick them up from there. You can trigger SSM commands from the runner, targeting your instances (preferably by tags), and they will download the built files; see the sketch below.
You could also look into S3 notifications. I've used them to trigger Lambda functions on object uploads: it's pretty fast and offers retry mechanisms. The Lambda could then push SSM commands to the instances. https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

Skaffold vs Spinnaker

I've read about two approaches (there are probably more) for implementing continuous delivery pipelines in GCP:
Skaffold
Spinnaker + Container Builder
I've worked with both a little bit in Qwiklabs. If someone has real experience with both, could you please share their pros and cons compared to each other? Why did you choose one over the other?
Pipeline using Skaffold (from the docs https://skaffold.dev/docs/pipeline-stages/):
Detect source code changes
Build artifacts
Test artifacts
Tag artifacts
Render manifests
Deploy manifests
Tail logs & Forward ports
Cleanup images and resources
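For reference, a minimal skaffold.yaml along these lines (the image name and manifest path are placeholders, and the apiVersion depends on the Skaffold release):

```yaml
apiVersion: skaffold/v2beta29
kind: Config
build:
  artifacts:
    # Built from the Dockerfile in the current directory
    - image: gcr.io/my-project/my-app
deploy:
  kubectl:
    manifests:
      - k8s/*.yaml
```

Running skaffold dev then drives the change-detect/build/deploy/log-tail loop against the current cluster context, while skaffold run does a one-off build and deploy.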
Pipeline using Spinnaker + Container Builder:
Developer:
Change code
Create a git tag and push to repo
Container Builder:
Detect new git tag
Build Docker image
Run unit tests
Push Docker image
Spinnaker (from the docs https://www.spinnaker.io/concepts/):
Detect new image
Deploy Canary
Cutover manual approval
Deploy PROD (blue/green)
Tear down Canary
Destroy old PROD
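For the Container Builder part, a cloudbuild.yaml sketch assuming a trigger that fires on new git tags (the image name and test command are placeholders):

```yaml
steps:
  # Build the image, tagged with the git tag that triggered the build
  - name: gcr.io/cloud-builders/docker
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$TAG_NAME', '.']
  # Run the unit tests inside the freshly built image
  - name: gcr.io/cloud-builders/docker
    args: ['run', '--rm', 'gcr.io/$PROJECT_ID/my-app:$TAG_NAME', 'npm', 'test']
# Push the image so that Spinnaker's registry trigger can detect it
images:
  - 'gcr.io/$PROJECT_ID/my-app:$TAG_NAME'
```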
I have worked with both, and in my experience Skaffold is good only for local development and testing; if you want to scale to production and pre-production use cases, it is better to use a Spinnaker pipeline. Spinnaker provides clear advantages over Skaffold:
Sophisticated/complex deployment strategies: you can define deployment strategies such as deploying service 1 before service 2, etc.
Multi-cluster deployments: deployments to multiple clusters can easily be configured from the UI.
Visualization: it provides a rich UI that shows the status of any deployment or pod across clusters, regions, namespaces and cloud providers.
I'm not a real power user of either, but my understanding is that:
Skaffold is great for the dev environment and for developers (the build, test, deploy, debug loop).
Spinnaker is more oriented towards continuous delivery on automated platforms (CI/CD); that's why you can perform canary and blue/green deployments and the like, which are useless during the development phase.
Skaffold is also oriented towards Kubernetes environments, whereas Spinnaker is more agnostic and can deploy elsewhere.
Skaffold is for fast local Kubernetes development. Skaffold handles the workflow for building, pushing and deploying your application.
This makes it different from Spinnaker, which is more oriented towards CI/CD with full production environments.

Continuous Integration on AWS EMR

We have a long-running EMR cluster that has multiple libraries installed on it using bootstrap actions. Some of these libraries are under continuous development, and their codebase is on GitHub.
I've been looking to hook Travis CI up to AWS EMR in a way similar to Travis and CodeDeploy. The idea is to get the code on GitHub tested and deployed automatically to EMR, while using bootstrap actions to install the updated libraries on all of EMR's nodes.
A solution I came up with is to use an EC2 instance in the middle, where Travis and CodeDeploy can first be used to deploy the code on the instance. After that, a launch script on the instance is triggered to create a new EMR cluster with the updated libraries.
However, the above solution means we need to create a new EMR cluster every time we deploy a new version of the system.
Any other suggestions?
You definitely don't want to maintain an EC2 instance to orchestrate a CI/CD process like that. First of all, it introduces a number of challenges: you then need to deal with an entire server instance, keep it maintained, deal with networking, and apply monitoring and alerts to handle availability issues, and even then you won't have availability guarantees, which may cause other problems. Most of all, maintaining an EC2 instance for a purpose like that is simply unnecessary.
I recommend that you investigate using AWS CodePipeline together with an AWS Step Functions state machine backed by Lambda.
The state machine can be used to orchestrate the provisioning of your EMR cluster in a fully serverless environment. With CodePipeline, you can set up a webhook into your GitHub repo to pull your code and spin up a new deployment automatically whenever changes are committed to your master GitHub branch (or whatever branch you specify). You can use EMRFS to sync an S3 bucket or folder to your EMR file system for your cluster, and then obtain the security benefits of IAM as well as the additional consistency guarantees that come with EMRFS. With Lambda, you also get seamless integration with other services, such as Kinesis, DynamoDB, and CloudWatch, among many others, which will simplify many administrative and development tasks and enable more sophisticated automation with minimal effort.
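As an illustration, the build stage of such a pipeline could be a CodeBuild buildspec that publishes the updated bootstrap artifacts and then starts the state machine (bucket, prefix and state machine ARN are placeholders, and the state machine that actually creates or replaces the EMR cluster is assumed to exist already):

```yaml
# buildspec.yml - sketch of the CodePipeline build stage for this approach
version: 0.2

phases:
  build:
    commands:
      # Publish the updated bootstrap scripts / library artifacts to S3
      - aws s3 sync bootstrap/ s3://my-emr-bucket/bootstrap/
      # Kick off the Step Functions workflow that provisions the EMR cluster
      - >
        aws stepfunctions start-execution
        --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:provision-emr-cluster
        --input '{"bootstrapPrefix": "s3://my-emr-bucket/bootstrap/"}'
```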
There are some great resources and tutorials for using CodePipeline with EMR, as well as in general. Here are some examples:
https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-apache-spark-applications-using-aws/
https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-ecs-ecr-codedeploy.html
https://chalice-workshop.readthedocs.io/en/latest/index.html
There are also great tutorials for orchestrating applications with Step Functions and Lambda, including the use of EMR. Here are some examples:
https://aws.amazon.com/blogs/big-data/orchestrate-apache-spark-applications-using-aws-step-functions-and-apache-livy/
https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
https://github.com/DavidWells/serverless-workshop/tree/master/lessons-code-complete/events/step-functions
https://github.com/aws-samples/lambda-refarch-imagerecognition
https://github.com/aws-samples/aws-serverless-workshops
In the very worst case, if all of those options fail (for example, if you need very strict control over the startup process after the EMR cluster completes its bootstrapping), you can always create a Java JAR that is run as a final step and use it either to execute a shell script or to run your provisioning commands through the various AWS Java libraries. Even in that case, you still have no need to maintain your own EC2 instance for orchestration purposes (which, in my opinion, would still be hard to justify even if it were running in a Docker container in Kubernetes), because you can easily maintain that deployment process with a fully serverless approach as well.
There are many great videos from the Amazon re:Invent conferences that you may want to watch to get a jump start before you dive into the workshops. For example:
https://www.youtube.com/watch?v=dCDZ7HR7dms
https://www.youtube.com/watch?v=Xi_WrinvTnM&t=1470s
Many more such videos are available on YouTube.
Travis CI also supports Lambda deployment, as mentioned here: https://docs.travis-ci.com/user/deployment/lambda/