AWS EFS too slow when I use git & npm install

I'm using AWS S3 storage and CloudFront for dozens of static websites.
I'm also using AWS Lambda with Node.js, and EFS for git repositories, node_modules and build cache files.
When I run git clone, npm install and npm run build on EFS, they are too slow.
When I run them from the Lambda /tmp folder instead, they are about 10x faster than on EFS storage.
I need a storage like EFS because I store the git repositories, node packages and cache files of dozens of websites. So how can I increase EFS performance?

If you have used the default settings for EFS, you are likely using Bursting Throughput mode, which relies on burst credits that are depleted as you read and write data.
Depending on the file sizes and the number of changes on the EFS mount, you may be depleting the available credits, which would cause performance problems for any application attached to the EFS mount. You can detect this by looking at the BurstCreditBalance CloudWatch metric; also keep an eye out for any flatlining of TotalIOBytes, as this might suggest the file system has reached its maximum throughput.
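If you prefer checking the credit balance from a script rather than the console, here is a minimal sketch. It assumes a boto3 CloudWatch client (boto3.client("cloudwatch")) is passed in, and the file-system ID is a placeholder; it only reads the BurstCreditBalance metric (reported in bytes) over the last day and flags whether it ever hit a threshold you choose.

```python
from datetime import datetime, timedelta, timezone


def burst_credit_datapoints(cloudwatch, file_system_id, hours=24):
    """Return hourly minimum BurstCreditBalance values (bytes), oldest first.

    `cloudwatch` is assumed to be boto3.client("cloudwatch"); it is injected
    so the logic can be exercised without AWS credentials.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EFS",
        MetricName="BurstCreditBalance",
        Dimensions=[{"Name": "FileSystemId", "Value": file_system_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=3600,
        Statistics=["Minimum"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Minimum"] for p in points]


def credits_depleted(balances, threshold_bytes=0.0):
    """True if the balance fell to or below the threshold at any point."""
    return any(b <= threshold_bytes for b in balances)
```

A balance that keeps trending towards zero during your git/npm runs is the pattern described above.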
When you perform the git clone, you could also use --depth 1 to create a shallow clone. This fetches only the latest commit, rather than the entire git history.
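If the clone is driven from a script, the shallow clone is just an extra pair of arguments. A minimal sketch, assuming git is on the PATH; the repository URL, destination and branch are placeholders. Per the git documentation, --depth implies --single-branch unless --no-single-branch is given.

```python
import subprocess


def shallow_clone_cmd(repo_url, dest, branch=None):
    """Build a `git clone` argv that fetches only the latest commit."""
    cmd = ["git", "clone", "--depth", "1"]
    if branch:
        # Choose which branch tip to fetch; --depth already limits it to one branch.
        cmd += ["--branch", branch]
    cmd += [repo_url, dest]
    return cmd


def shallow_clone(repo_url, dest, branch=None):
    """Run the shallow clone, raising CalledProcessError if git fails."""
    subprocess.run(shallow_clone_cmd(repo_url, dest, branch), check=True)
```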
As an improvement to this workflow, I would suggest reconsidering the technologies you use. Rather than a Lambda function, create a CodePipeline pipeline that triggers a CodeBuild job. This CodeBuild job would be responsible for running the npm install task for you, as well as any other actions.
Part of CodePipeline's flow is that it stores the build artifacts in S3 along the way, so you have a copy of each one. CodePipeline can also deploy to your S3 bucket at the end.
A couple of links that might be useful for you:
AWS EFS Performance
Tutorial: Create a pipeline that uses Amazon S3 as a deployment provider

Related

Run java -jar inside AWS Glue job

I have a relatively simple task to do, but I'm struggling to find the best mix of AWS services to accomplish it:
I have a simple Java program (provided by a 3rd party; I can't modify it, just use it) that I can run anywhere with java -jar --target-location "path on local disc". Once executed, the program creates a CSV file on the local disc in the path defined in --target-location.
Once the file is created, I need to upload it to S3.
The way I'm doing it currently is with a dedicated EC2 instance with Java installed: the first point is covered by java -jar ... and the second by the aws s3 cp ... command.
I'm looking for a better way of doing that (preferably serverless). I'm wondering if the above points can be accomplished with an AWS Glue job of type Python Shell. The second point (copying the local file to S3) I can most likely cover with boto3, but for the first (java -jar execution) I'm not sure.
Am I forced to use an EC2 instance, or do you see a smarter way with AWS Glue?
Or would the most effective option be to build a Docker image (containing these two instructions), register it in ECR and run it with AWS Batch?
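Both steps can be expressed in one Python script. The sketch below assumes a java binary is on the PATH, which I have not verified for the Glue Python Shell environment; if it is not, the same script would run unchanged inside the Docker image on AWS Batch. It also assumes --target-location accepts a directory. The jar path, bucket and prefix are placeholders, and the S3 client is expected to be boto3.client("s3").

```python
import os
import subprocess
import tempfile


def build_java_cmd(jar_path, target_dir):
    # Mirrors `java -jar <jar> --target-location <dir>`; jar_path is a placeholder.
    return ["java", "-jar", jar_path, "--target-location", target_dir]


def run_and_upload(jar_path, s3_client, bucket, prefix, run=subprocess.run):
    """Run the jar into a temporary directory, then upload every CSV it wrote.

    `run` is injectable for testing. Returns the S3 keys that were written.
    """
    uploaded = []
    with tempfile.TemporaryDirectory() as target_dir:
        run(build_java_cmd(jar_path, target_dir), check=True)
        for name in sorted(os.listdir(target_dir)):
            if not name.endswith(".csv"):
                continue
            key = f"{prefix.rstrip('/')}/{name}"
            # boto3: upload_file(Filename, Bucket, Key)
            s3_client.upload_file(os.path.join(target_dir, name), bucket, key)
            uploaded.append(key)
    return uploaded
```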
"I'm looking for a better way of doing that (preferably serverless)."
I cannot tell whether a serverless option is better; however, an EC2 instance will do the job just fine. Assuming you have CentOS on your instance, you could do it in one of the following ways:
aaPanel GUI
Some useful web panels offer cron-scheduled tasks, such as backing up files from a local directory to an S3 directory. I will use aaPanel as an example.
Install aaPanel
Install AWS S3 plugin
Configure the credentials in the plugin.
Cron
Add a scheduled task to back up files from "path on local disc" to AWS S3.
Rclone
A web panel goes beyond the scope of this question. Rclone is another useful tool I use to back up files from local disk to OneDrive, S3, etc.
Installation
curl https://rclone.org/install.sh | sudo bash
Sync
Sync a directory to the remote bucket, deleting any excess files in the bucket.
rclone sync -i /home/local/directory remote:bucket
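To make "deleting any excess files" concrete, here is a simplified model of a one-way sync (not rclone's actual code). Each side is a mapping from relative path to a fingerprint such as (size, mtime); the function decides what would be uploaded and what would be deleted. delete=True corresponds to rclone sync, while delete=False is closer to aws s3 sync without its --delete flag.

```python
def plan_sync(local, remote, delete=True):
    """Return (uploads, deletes) for a one-way sync from local to remote.

    local / remote map relative paths to a fingerprint, e.g. (size, mtime).
    A file is uploaded if it is new or its fingerprint differs.
    """
    uploads = sorted(path for path, fp in local.items() if remote.get(path) != fp)
    # Files present only at the destination are removed when mirroring.
    deletes = sorted(set(remote) - set(local)) if delete else []
    return uploads, deletes
```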

On-premises file backup to AWS

Use case:
I have one directory on-premises, and I want to back it up, let's say, every midnight, and restore it if something goes wrong.
It doesn't seem like a complicated task, but reading through the AWS documentation, even this can be cumbersome and costly. Setting up Storage Gateway locally seems unnecessarily complex for a simple task like this, and setting it up on EC2 is costly as well.
What I have done:
Reading through this + some other blog posts:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
https://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html
What I have found:
1. Setting up a file gateway (locally or as an EC2 instance):
It just mounts the files to S3, and that's it. So my on-premises app will constantly write to this S3 bucket. The documentation doesn't mention anything about scheduled backup and recovery.
2. Setting up a volume gateway:
Here I can schedule a synchronization/backup to S3, but using a whole volume for it would be a big overhead.
3. Standalone S3:
Just use a bare S3 bucket and copy my backup there through the AWS API/SDK with a manually created scheduled job.
Solutions:
Using point 1 from above, enable versioning, and the versions of the files will serve as recovery points.
Using point 3.
I think I am looking for a mix of file and volume gateway: working at the file level and taking asynchronous scheduled snapshots.
How should this be handled? Isn't there a really easy way to just send a backup of a directory to AWS?
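For the versioning option (point 1), a recovery point amounts to "the newest version of each key that existed at time T". A minimal sketch, assuming a boto3 S3 client is passed in and the bucket and prefix are placeholders; keys whose newest entry at T is a delete marker are treated as not existing at that time.

```python
from datetime import datetime


def enable_versioning(s3_client, bucket):
    """Turn on versioning so overwritten or deleted objects keep their old versions."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def versions_as_of(s3_client, bucket, prefix, point_in_time: datetime):
    """Return {key: VersionId} for objects that existed at point_in_time."""
    latest = {}  # key -> (entry, is_delete_marker)
    paginator = s3_client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        entries = [(v, False) for v in page.get("Versions", [])]
        entries += [(m, True) for m in page.get("DeleteMarkers", [])]
        for entry, is_marker in entries:
            if entry["LastModified"] > point_in_time:
                continue  # written after the recovery point
            current = latest.get(entry["Key"])
            if current is None or entry["LastModified"] > current[0]["LastModified"]:
                latest[entry["Key"]] = (entry, is_marker)
    # Drop keys that had been deleted as of the recovery point.
    return {key: e["VersionId"] for key, (e, is_marker) in latest.items() if not is_marker}
```

Each returned version can then be downloaded with s3_client.download_file(bucket, key, local_path, ExtraArgs={"VersionId": version_id}).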
The easiest way to backup a directory to Amazon S3 would be:
Install the AWS Command-Line Interface (CLI)
Provide credentials via the aws configure command
When required run the aws s3 sync command
For example
aws s3 sync folder1 s3://bucketname/folder1/
This will copy any files from the source to the destination. It will only copy files that have been added or changed since a previous sync.
Documentation: sync — AWS CLI Command Reference
If you want to be fancier and keep multiple backups, you could copy to a different target directory each time, create a zip file first and upload it, or even use a backup program like CloudBerry Backup, which knows how to use S3 and can do traditional-style backups.
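The "zip first, then upload" variant can be sketched in a few lines. This assumes a boto3 S3 client is passed in; the bucket, prefix and key layout (<prefix>/<folder>-<UTC timestamp>.zip) are hypothetical choices that give every run its own object, so older backups are kept.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone


def backup_key(prefix, name, when):
    # e.g. backups/folder1-20240102T030405Z.zip (hypothetical naming scheme)
    return f"{prefix.rstrip('/')}/{name}-{when.strftime('%Y%m%dT%H%M%SZ')}.zip"


def backup_directory(s3_client, source_dir, bucket, prefix, when=None):
    """Zip source_dir into a temp file and upload it under a timestamped key."""
    when = when or datetime.now(timezone.utc)
    name = os.path.basename(os.path.normpath(source_dir))
    key = backup_key(prefix, name, when)
    with tempfile.TemporaryDirectory() as tmp:
        # make_archive returns the path of the created .zip file.
        archive = shutil.make_archive(os.path.join(tmp, name), "zip", root_dir=source_dir)
        s3_client.upload_file(archive, bucket, key)
    return key
```

Scheduled from cron with 0 0 * * *, this gives one zip per night.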

From GitLab CI/CD to AWS EC2

I've been trying to figure out a really easy way for some time now.
I am using GitLab CI/CD and want to move the built data from there to AWS EC2. The problem is that the two ways I found are both really bad ideas:
Building the project on GitLab CI/CD, then SSH-ing into AWS, pulling the project there again, and running the npm scripts. This is really wrong, and I won't go into the details of why.
I saw the following: How to deploy with Gitlab-Ci to EC2 using AWS CodeDeploy/CodePipeline/S3, but it's so big and complex.
Isn't there an easier way to copy built files from GitLab CI/CD to AWS EC2?
I use GitLab as well, and what has worked for me is configuring my runners on EC2 instances. A few options come to mind:
I'd suggest managing your own runners (vs. shared runners) and giving them permissions to drop built files in S3, then having your instances pick them up from there. You could trigger SSM commands from the runner targeting your instances (preferably by tags), and they'll download the built files.
You could also look into S3 notifications. I've used them to trigger Lambda functions on object uploads: it's pretty fast and offers retry mechanisms. The Lambda function could then push SSM commands to the instances. https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
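Both options end with the same SSM call, so they can share one helper. A sketch under several assumptions: an SSM client (boto3.client("ssm")) is passed in; the instances run the SSM agent, have the AWS CLI and an instance role that can read the bucket; the build is a .tar.gz; and the tag Role=web and target directory /var/www/app are hypothetical. From the runner you would call send_deploy_command right after uploading; for the S3-notification route, make_s3_event_handler produces the Lambda handler.

```python
import shlex
from urllib.parse import unquote_plus


def deploy_commands(bucket, key, target_dir="/var/www/app"):
    """Shell commands each instance runs (hypothetical .tar.gz layout)."""
    src = shlex.quote(f"s3://{bucket}/{key}")
    dest = shlex.quote(target_dir)
    return [
        f"aws s3 cp {src} /tmp/build.tar.gz",
        f"mkdir -p {dest}",
        f"tar -xzf /tmp/build.tar.gz -C {dest}",
    ]


def send_deploy_command(ssm_client, bucket, key, tag_key="Role", tag_value="web"):
    """Run the download/unpack commands on every instance carrying the tag."""
    response = ssm_client.send_command(
        Targets=[{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": deploy_commands(bucket, key)},
    )
    return response["Command"]["CommandId"]


def make_s3_event_handler(ssm_client):
    """Lambda handler for S3 ObjectCreated notifications."""
    def handler(event, context):
        command_ids = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Keys in S3 event notifications are URL-encoded (spaces become '+').
            key = unquote_plus(record["s3"]["object"]["key"])
            command_ids.append(send_deploy_command(ssm_client, bucket, key))
        return {"commandIds": command_ids}
    return handler
```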

AWS S3 as Docker volume?

I'm trying to set up a GitLab CI pipeline using Docker on AWS EC2. Everything worked as expected until I hit the storage cap on my EC2 instance (8GB). As I quickly learned, a pipeline can easily use 1-2 GB of data. With four of them on the server, everything stopped.
Granted, I could look into optimising Docker storage usage, e.g. by using Alpine; however, I need a more permanent solution because 8GB would hardly suffice.
I have been trying to use an S3 bucket, mounted with s3fs, as a Docker volume to handle my data-hungry pipelines, but to no avail. Docker volumes use hard links, which are not supported by s3fs.
Is it possible to configure Docker to use symlinks instead? Or, alternatively, are there other packages that mount S3 as a "true" filesystem?

CodeDeploy to S3

I have a site in an S3 bucket, configured for web access, for which I run an aws s3 sync command every time I push to a specific git repository (I'm using GitLab at the moment).
So if I push to the stable branch, a GitLab runner performs the npm start build command to build the site, and then aws s3 sync to synchronize it to a specific bucket.
I want to migrate to CodeCommit and use pure AWS tools to do the same.
So far I have been able to set up the repository successfully and create a CodeBuild project for building the artifact, and the artifact is being stored (not deployed) in an S3 bucket. The difference is that I can't get it to deploy to the root folder of the bucket instead of a subfolder; the process doesn't seem to be made for that. I need it in the root folder because of how web access is configured.
For the deployment process, I was looking at CodeDeploy, but it doesn't actually let me deploy to an S3 bucket; it only uses the bucket as an intermediary for deployment to an EC2 instance. So far I get the feeling CodeDeploy is only useful for deployments involving EC2.
This tutorial, with a requirement similar to mine, uses CodePipeline and CodeBuild, but the deployment step is actually an aws s3 sync command (the same as I was doing on GitLab), and the actual deployment stage in CodePipeline is disabled.
I was looking for a solution that uses AWS features made for this specific purpose, but I can't find any.
I'm also aware of LambCI, but to me it looks like it does what CodePipeline / CodeBuild is doing: storing artifacts (not deploying to the root folder of the bucket). Plus, I'm looking for an option that doesn't require me to learn or deploy new configuration files (outside AWS config files).
Is this possible with the current state of AWS features?
Today AWS announced a new feature: the ability to target S3 in the deployment stage of CodePipeline. The announcement is out, and the documentation contains a tutorial.
Using your CodeBuild/CodePipeline approach, you should now be able to choose S3 as the deployment provider in the deployment stage rather than performing the sync in your build script. To configure the action, you provide an S3 bucket name, specify whether to extract the contents of the artifact zip, and if so, provide an optional path for the extraction. This should allow you to deploy your content directly to the root of a bucket by omitting the path.
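In API terms, the three settings above map onto the configuration of a Deploy action whose provider is S3. A sketch of the action declaration as a Python dict (the shape boto3's create_pipeline/update_pipeline take inside a stage); the bucket and artifact names are placeholders. My assumption, to check against the S3 deploy action reference: with Extract set to "true", ObjectKey acts as the optional deployment path, and with Extract set to "false" it is required and names the object.

```python
def s3_deploy_action(bucket, input_artifact, extract=True, path=None, name="DeployToS3"):
    """Deploy-stage action using the S3 provider.

    With extract=True and no path, the artifact's contents land at the bucket root.
    """
    configuration = {"BucketName": bucket, "Extract": "true" if extract else "false"}
    if path:
        # Assumption: deployment path when extracting, object name otherwise.
        configuration["ObjectKey"] = path
    elif not extract:
        raise ValueError("ObjectKey is required when Extract is false")
    return {
        "name": name,
        "actionTypeId": {"category": "Deploy", "owner": "AWS", "provider": "S3", "version": "1"},
        "inputArtifacts": [{"name": input_artifact}],
        "configuration": configuration,
        "runOrder": 1,
    }


def deploy_stage(bucket, input_artifact):
    """A stage containing only the S3 deploy action, extracting to the bucket root."""
    return {"name": "Deploy", "actions": [s3_deploy_action(bucket, input_artifact)]}
```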
I was dealing with a similar issue, and as far as I was able to find out, there is no service suitable for deploying an app to S3.
AWS CodeDeploy is indeed for deploying code that runs on a server.
My solution was to use CodePipeline with three stages:
Source which takes source code from AWS CodeCommit
Build with AWS CodeBuild
A custom Lambda function which, after a successful build, takes the artifact from the S3 artifact store, unzips it and copies the files to my S3 website host.
I used this AWS Lambda function from SeamusJ: https://github.com/SeamusJ/deploy-build-to-s3
Several changes had to be made: I used node-unzip-2 instead of unzip-stream for unzipping the artifact from S3.
I also had to change the ACLs in the website.ts file.
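For readers who want the same third stage without the Node.js project, here is a rough Python equivalent of the idea (not a port of SeamusJ's code). It assumes boto3 S3 and CodePipeline clients are passed in and that the Lambda's own role can read the artifact store; CodePipeline also supplies temporary artifactCredentials in the job data, which this sketch ignores. The website bucket name is a placeholder, and no ACLs are set.

```python
import io
import mimetypes
import zipfile


def copy_artifact_to_site(s3_client, src_bucket, src_key, dest_bucket):
    """Unzip a build artifact from S3 and upload every file to dest_bucket's root."""
    body = s3_client.get_object(Bucket=src_bucket, Key=src_key)["Body"].read()
    written = []
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # S3 has no real directories; the key carries the path
            content_type = mimetypes.guess_type(info.filename)[0] or "application/octet-stream"
            s3_client.put_object(
                Bucket=dest_bucket,
                Key=info.filename,
                Body=archive.read(info),
                ContentType=content_type,
            )
            written.append(info.filename)
    return written


def make_pipeline_handler(s3_client, codepipeline_client, dest_bucket):
    """Lambda handler for a CodePipeline Invoke action; reports the job result back."""
    def handler(event, context):
        job = event["CodePipeline.job"]
        location = job["data"]["inputArtifacts"][0]["location"]["s3Location"]
        try:
            copy_artifact_to_site(
                s3_client, location["bucketName"], location["objectKey"], dest_bucket
            )
        except Exception as exc:  # tell the pipeline instead of letting it time out
            codepipeline_client.put_job_failure_result(
                jobId=job["id"],
                failureDetails={"type": "JobFailed", "message": str(exc)},
            )
            return
        codepipeline_client.put_job_success_result(jobId=job["id"])
    return handler
```

Setting ContentType matters for a website bucket; without it, browsers may download pages instead of rendering them.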
Uploading from CodeBuild is currently the best solution available.
There are some suggestions on how to orchestrate this deployment via CodePipeline in this answer.