I'm trying to set up a GitLab CI pipeline using Docker on AWS EC2. Everything worked as expected until I hit the storage cap on my EC2 instance (8GB). As I quickly learned, a pipeline can easily use up 1-2 GB of data; with 4 of them on the server, everything stops.
Granted, I could look into optimising Docker storage usage, e.g. by using Alpine, but I do need a more permanent solution because 8GB would hardly suffice anyway.
I have been trying to use an S3 bucket, mounted with s3fs, as a Docker volume to handle my data-hungry pipelines, but to no avail: Docker volumes use hard links, which are not supported by s3fs.
Is it possible to configure Docker to use symlinks instead? Alternatively, are there other packages which mount S3 as a "true" filesystem?
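For reference, this is roughly what I tried (the bucket name, mount point and credentials file below are placeholders):

# Mount the bucket with s3fs, then hand it to a container as a bind mount
sudo mkdir -p /mnt/s3-cache
s3fs my-ci-bucket /mnt/s3-cache -o passwd_file=/etc/passwd-s3fs -o allow_other
docker run --rm -v /mnt/s3-cache:/cache alpine sh -c "echo test > /cache/test.txt"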
I'm using AWS S3 storage and CloudFront for my dozens of static web sites.
I'm also using AWS Lambda with Node.js and EFS for the git, node_modules and build cache files.
When I run git clone, npm install and npm run build from EFS, it is working too slowly.
But when I run them from the Lambda /tmp folder, it is about 10x faster than from the EFS storage.
I need storage like EFS because I store the git, node package and cache files of dozens of web sites. So how can I increase EFS performance?
If you have used the standard settings for EFS, you will be utilizing burstable credits, which are depleted the more file changes you make.
Depending on the file size and the number of changes on the EFS mount, you may be depleting the available credits, which would cause performance problems for any application attached to the EFS mount. You can detect this by looking at the BurstCreditBalance CloudWatch metric; also keep an eye out for any flatlining of TotalIOBytes, as this might suggest the file system has reached its maximum throughput.
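For example, a quick way to check the credit balance from the CLI (the file system ID and region below are placeholders):

# Minimum BurstCreditBalance over the last 24 hours for one file system
aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name BurstCreditBalance \
  --dimensions Name=FileSystemId,Value=fs-12345678 \
  --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Minimum \
  --region eu-west-1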
When you perform the git clone you could also use --depth with a value of 1 to create a shallow clone. This option gets only the latest commit, as opposed to cloning the entire git history too.
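For example (the repository URL is a placeholder):

# Shallow clone: fetch only the latest commit, not the whole history
git clone --depth 1 https://github.com/your-org/your-site.git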
As an improvement to this workflow, I would suggest reconsidering the technologies you use to implement it. Rather than a Lambda function, create a CodePipeline pipeline that triggers a CodeBuild job. This CodeBuild job would be responsible for running the npm install task for you, as well as any other actions.
As part of CodePipeline's flow, it will store the artifacts in S3 along the way, so that you have a copy of them. The pipeline can also deploy to your S3 bucket at the end.
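As a rough sketch, the build phase of such a CodeBuild job could simply run the following (npm ci here is an assumption in place of npm install; the deploy to S3 can be left to CodePipeline's S3 deploy action):

# Install dependencies reproducibly, then build the site
npm ci
npm run build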
A couple of links that might be useful for you:
AWS EFS Performance
Tutorial: Create a pipeline that uses Amazon S3 as a deployment provider
AWS provides a config setting to limit the upload bandwidth when copying files to S3 from EC2 instances. It can be set with the AWS CLI as below (the value shown is just an example):
aws configure set default.s3.max_bandwidth 50MB/s
Once this config is set and I run an AWS CLI command to cp files to S3, the bandwidth is limited as expected.
But when I run the s3_sync Ansible module on the same EC2 instance, that limitation is not applied. Is there any possible workaround to apply the limitation to Ansible as well?
Not sure if this is possible, because botocore may not support this setting.
Mostly it is up to Amazon to fix their Python API.
For example, the Docker module works fine by sharing configuration between the CLI and the Python API.
Obviously, I assumed you ran this command locally as the same user, because otherwise the AWS config you made would clearly not be used.
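As a quick sanity check (assuming the default config location), you can confirm that the user Ansible runs as actually sees the setting:

# Run these as the same user that executes the Ansible task
whoami
cat ~/.aws/config   # the [default] section should contain the s3 max_bandwidth entry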
I am transferring an existing PHP application to Elastic Beanstalk and have a newbie question. My application has a data folder that grows and changes over time and can get quite large; currently the folder is a subfolder in the root directory of the application. In the traditional development model I just upload the changed PHP files and carry on using the same data folder. How can I do this in Elastic Beanstalk?
I don't want to have to download and upload the data folder every time I deploy a new version of the application. What is the best practice for this in AWS Elastic Beanstalk?
TIA
Peter
This is a question of continuous deployment.
Elastic Beanstalk supports CD from AWS CodePipeline: https://aws.amazon.com/getting-started/tutorials/continuous-deployment-pipeline/
To address "grows and changes over time and can grow quite large currently the folder is a subfolder in the root directory of the application", you can use CodeCommit to version your code using Git. If you version the data folder with your application, the deployment will include it.
If the data is something you can offload to an object store (S3) or a database (RDS/DynamoDB/etc), it would be a better practice to do so.
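For the S3 route, a minimal sketch of the one-off migration might look like this (the bucket name is a placeholder; the application itself would then read and write objects through the AWS SDK instead of the local folder):

# Copy the existing data folder into an S3 bucket once
aws s3 sync ./data s3://my-app-data-bucket/data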
As per the AWS documentation here, Elastic Beanstalk applications run on EC2 instances that have no persistent local storage. As a result, your EB application should be as stateless as possible and should use persistent storage from one of the storage offerings provided by AWS.
A common strategy for persistent storage is to use Amazon EFS (Elastic File System). As noted in the documentation for using EFS with Elastic Beanstalk here:
Your application can treat a mounted Amazon EFS volume like local storage, so you don't have to change your application code to scale up to multiple instances.
Your EFS drive is essentially a mounted network drive. Files stored in EFS will be accessible across any instances that have the file system mounted, and will persist beyond instance termination and/or scaling events.
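For illustration, mounting an EFS file system by hand looks something like the below (the file system ID, region and mount point are placeholders); with Elastic Beanstalk you would normally set this up through an .ebextensions configuration rather than manually:

# Mount an EFS file system over NFS (requires the NFS client, e.g. nfs-utils, on the instance)
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs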
You can learn more about EFS here, and using EFS with Elastic Beanstalk here.
I have the end goal of deploying a Docker container on AWS Fargate. As it happens, my Dockerfile has no local dependencies and my upload connection is very slow, so I want to build the image in the cloud. What would be the easiest way to build the image on AWS? Creating an EC2 Linux instance, installing Docker and the AWS CLI on it, building the image and then uploading it to AWS ECR, if that's possible?
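In other words, something like this on the instance (the region, account ID and repository name below are placeholders; assumes a recent AWS CLI):

# Build the image, authenticate Docker to ECR, then tag and push
docker build -t my-image .
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag my-image:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest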
The easiest way is by using AWS CodeBuild - it will do everything for you, even push it to AWS ECR.
Basic instructions: here
I'm following this tutorial and it works 100%, like a charm:
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/awsgsg-wah-linux.pdf
But in that tutorial only Amazon EC2 and RDS are used. I was wondering: what if my servers scale up into multiple EC2 instances and then I need to update my PHP files?
Do I have to distribute them manually across those instances? Because, as far as I know, those instances are not synced with each other.
So I decided to use S3 as a replacement for my /var/www, so the PHP files are now centralised in one place.
That way, whenever those EC2 instances scale up, the files remain in one place and I don't need to upload to multiple EC2 instances.
Is having a centralised file server (S3) for /var/www the best practice? Because currently I'm still having permission issues when it's mounted using s3fs.
thank you.
You have to put your /var/www/ in S3, and when your instances scale up they have to run 'aws s3 sync' from your bucket; you can do that in the user data. You also have to select a 'master' instance where you make changes: a sync script uploads the changes to S3 and, with rsync, copies them to your live front ends. This is needed because if you have 3 front ends that downloaded /var/www/ from S3 and you want to make a new change, you would otherwise have to run an s3 sync on all your instances.
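A minimal user data sketch for the newly launched front ends (the bucket name is a placeholder; assumes the instance profile allows reading the bucket):

#!/bin/bash
# Pull the current site content from S3 when the instance boots
aws s3 sync s3://my-www-bucket /var/www/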
You can manage changes on your 'master' instance with inotify. Inotify can detect a change in /var/www/ and execute two commands: one could be aws s3 sync, followed by an rsync to the rest of your instances. You can get the list of your instances from the ELB through the AWS API.
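A rough sketch of such a watcher (the bucket name and front-end addresses are placeholders; in practice you would resolve the addresses from the ELB via the AWS API, rsync assumes SSH access between instances, and inotifywait comes from the inotify-tools package):

#!/bin/bash
# Watch /var/www/ on the 'master'; on each change, push to S3 and rsync to the other front ends
FRONTENDS="10.0.1.11 10.0.1.12"

inotifywait -m -r -e modify,create,delete,move /var/www/ | while read -r dir event file; do
  aws s3 sync /var/www/ s3://my-www-bucket --delete
  for host in $FRONTENDS; do
    rsync -az --delete /var/www/ "$host":/var/www/
  done
done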
The last thing: check that instance termination protection is enabled on your 'master' instance.
Your architecture should look like the one here: http://www.markomedia.com.au/scaling-wordpress-in-amazon-cloud/
Good luck!!