I am running Task Runner to perform the defined task, and while running it I get an exception saying it can't upload the log files to S3. After debugging the Task Runner application I found that it uses the ACL option when uploading its log files to S3; due to some restrictions I am not allowed to use the ACL option when uploading files to S3.
Is there anything I can do to resolve this without configuring ACLs on the objects?
Do you mean the computational resource owner cannot have write permissions on the S3 log path? You will need to grant write permissions on the log path (through ACLs) if you want Task Runner to upload the logs to S3 automatically.
If you don't want to push the Task Runner log files to S3, you can disable logging by not specifying "logUri" when starting Task Runner. In that case, Task Runner will not try to upload the log files and should not fail.
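For example, a Task Runner launch command of roughly this form (the jar name, credentials file, worker group, and region below are placeholders rather than values from your setup) simply leaves out the --logUri option:
java -jar TaskRunner-1.0.jar --config ~/credentials.json --workerGroup=myWorkerGroup --region=us-east-1
Task Runner should still write its logs locally; they just won't be pushed to S3.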
I have a Nextflow pipeline executed in AWS Batch. Recently, I tried to add a process that uploads files from the local machine to an S3 bucket so I don't have to upload the files manually before each run. I wrote a Python script that handles the upload and wrapped it into a Nextflow process. Since I am uploading from the local machine, I want to run the upload process with
executor 'local'
This requires the Fusion filesystem to be enabled in order to have a work dir in S3. But when I enable the Fusion filesystem I don't have access to my local filesystem. In my understanding, when the Fusion filesystem is enabled, the task runs in a Wave container without access to the host filesystem. Does anyone have experience with running Nextflow with FusionFS enabled, and how do you access the host filesystem? Thanks!
I don't think you need to manage a hybrid workload here. Pipeline inputs can be stored either locally or in an S3 bucket. If your files are stored locally and you specify a working directory in S3, Nextflow will already try to upload them into the staging area for you. For example, if you specify your working directory in S3 using -work-dir 's3://mybucket/work', Nextflow will try to stage the input files under s3://mybucket/work/stage-<session-uuid>. Once the files are in the staging area, Nextflow can then begin to submit jobs that require them.
Note that a Fusion file system is not strictly required to have your working directory in S3. Nextflow includes support for S3. Either include your AWS access and secret keys in your pipeline configuration or use an IAM role to allow your EC2 instances full access to S3 storage.
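If you go the credentials route, a minimal nextflow.config along these lines is usually enough for Nextflow's built-in S3 support (the key values and region below are placeholders for your own):
aws {
    accessKey = '<YOUR_ACCESS_KEY>'
    secretKey = '<YOUR_SECRET_KEY>'
    region    = 'us-east-1'
}
With an IAM role attached to the Batch instances you can drop the keys and keep only the region, then point the pipeline at the bucket with -work-dir 's3://mybucket/work' as above.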
Use case:
I have one directory on-premises that I want to back up, let's say every midnight, and I want to be able to restore it if something goes wrong.
It doesn't seem like a complicated task, but reading through the AWS documentation, even this can be cumbersome and costly. Setting up a Storage Gateway locally seems unnecessarily complex for a simple task like this, and setting one up on EC2 is costly as well.
What I have done:
Reading through this + some other blog posts:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
https://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html
What I have found:
1. Setting up a file gateway (locally or as an EC2 instance):
It just mounts the files to an S3 bucket, and that's it. So my on-premises app would constantly write to this S3 bucket. The documentation doesn't mention anything about scheduled backup and recovery.
2. Setting up a volume gateway:
Here I can make a scheduled synchronization/backup to S3, but using a whole volume for it would be a big overhead.
3. Standalone S3:
Just using a bare S3 bucket and copying my backups there via the AWS API/SDK with a manually created scheduled job.
Solutions:
Using point 1 from above: enable versioning, and the versions of the files will serve as recovery points.
Using point 3
I think I am looking for a mix of the file and volume gateways: working at the file level while taking asynchronous, scheduled snapshots of the files.
How should this be handled? Isn't there a really easy way that just sends a backup of a directory to AWS?
The easiest way to back up a directory to Amazon S3 would be:
Install the AWS Command-Line Interface (CLI)
Provide credentials via the aws configure command
When required, run the aws s3 sync command
For example:
aws s3 sync folder1 s3://bucketname/folder1/
This will copy any files from the source to the destination. It will only copy files that have been added or changed since a previous sync.
Documentation: sync — AWS CLI Command Reference
If you want to be more fancy and keep multiple backups, you could copy to a different target directory, or create a zip file first and upload the zip file, or even use a backup program like Cloudberry Backup that knows how to use S3 and can do traditional-style backups.
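Since the use case here is a nightly backup, the sync can be driven by cron on the on-premises machine. A sketch (the paths and bucket name are placeholders), with a second variant that keeps one dated copy per night instead of a single rolling mirror:
# rolling mirror, every night at midnight
0 0 * * * aws s3 sync /path/to/folder1 s3://bucketname/folder1/
# or: one dated copy per night (note the escaped % that crontab requires)
0 0 * * * aws s3 cp /path/to/folder1 s3://bucketname/backups/$(date +\%F)/ --recursive
With versioning enabled on the bucket, even the plain sync keeps older versions of changed files around as additional restore points.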
Is there any way to stream/push Docker app logs to an S3 bucket?
I know of the following two ways:
Configure CloudWatch Logs / log streams - all logs (both info and error logs) get merged together in this approach
Configure Graylog2 to collect every log message and push it to an S3 bucket - this means maintaining the Graylog2 app
I am looking for an easy way to push Docker app/error logs to an S3 bucket
Thanks
A possible solution, though it's hard to tell for your case, is to run Logstash in a separate container and have your app direct its logs to Logstash. Since Logstash's own logging framework is based on the Log4j 2 framework, it will likely be familiar to you. A plugin already exists for Logstash to push logs to S3 on your behalf.
You can configure your existing Log4j 2 setup to emit to a port that Logstash is listening on.
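As a rough sketch of that setup, the Logstash pipeline takes a TCP input and uses the S3 output plugin; the port, bucket, region, and prefix below are assumptions, not values from your environment:
input {
  tcp {
    port  => 5000
    codec => json_lines
  }
}
output {
  s3 {
    bucket => "my-app-logs"
    region => "us-east-1"
    prefix => "docker-app/"
    codec  => "json_lines"
  }
}
On the application side, a Log4j 2 Socket appender with a JSON layout pointed at that host and port is enough to forward the logs, and info and error messages can then be separated by filtering on log level in Logstash or by writing them under different prefixes.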
If even this is considered too much maintenance for you, your best solution is probably just setting up a cron job to run rsync.
I have a site in a S3 bucket, configured for web access, for which I run an aws s3 sync command every time I push on a specific git repository (I'm using Gitlab at the moment).
So if I push to the stable branch, a Gitlab runner runs the npm start build command to build the site and then runs aws s3 sync to synchronize it to a specific bucket.
I want to migrate to CodeCommit and use pure AWS tools to do the same.
So far I have been able to successfully set up the repository and create a CodeBuild project for building the artifact, and the artifact is being stored (not deployed) in an S3 bucket. The difference is that I can't get it to deploy to the root folder of the bucket instead of a subfolder; it seems like the process is not made for that. I need it to be in the root folder because of how the web access is configured.
For the deployment process, I was taking a look at CodeDeploy, but it doesn't actually let me deploy to an S3 bucket; it only uses the bucket as an intermediary for deployment to an EC2 instance. So far I get the feeling CodeDeploy is useful only for deployments involving EC2.
This tutorial, with a similar requirement to mine, uses CodePipeline and CodeBuild, but the deployment step is actually an aws s3 sync command (the same as I was doing on Gitlab), and the actual deployment step in CodePipeline is disabled.
I was looking into a solution which involves using AWS features made for this specific purpose, but I can't find any.
I'm also aware of LambCI, but to me it looks like it does what CodePipeline / CodeBuild is already doing: storing artifacts (not deploying to the root folder of the bucket). Plus, I'm looking for an option which doesn't require me to learn or deploy new configuration files (outside AWS config files).
Is this possible with the current state of AWS features?
Today AWS announced a new feature: the ability to target S3 in the deployment stage of CodePipeline. The announcement is here, and the documentation contains a tutorial available here.
Using your CodeBuild/CodePipeline approach, you should now be able to choose S3 as the deployment provider in the deployment stage rather than performing the sync in your build script. To configure the action, you provide an S3 bucket name, specify whether to extract the contents of the artifact zip, and, if so, provide an optional path for the extraction. This should allow you to deploy your content directly to the root of a bucket by omitting the path.
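In the pipeline definition itself, the deploy action ends up looking roughly like the following; the bucket and artifact names are placeholders, and it is worth checking the current action reference for the full set of configuration keys:
{
  "name": "Deploy",
  "actionTypeId": {
    "category": "Deploy",
    "owner": "AWS",
    "provider": "S3",
    "version": "1"
  },
  "configuration": {
    "BucketName": "my-website-bucket",
    "Extract": "true"
  },
  "inputArtifacts": [ { "name": "BuildOutput" } ]
}
With Extract set to true and no extraction path given, the contents of the build artifact land at the root of the bucket.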
I was dealing with a similar issue, and as far as I was able to find out, there is no service suitable for deploying an app to S3.
AWS CodeDeploy is indeed for deploying code that runs on a server.
My solution was to use CodePipeline with three stages:
Source which takes source code from AWS CodeCommit
Build with AWS CodeBuild
A custom Lambda function which, after a successful build, takes the artifact from the S3 artifact store, unzips it, and copies the files to my S3 website host.
I used this AWS Lambda function from SeamusJ: https://github.com/SeamusJ/deploy-build-to-s3
Several changes had to be made: I used node-unzip-2 instead of unzip-stream for unzipping the artifact from S3.
I also had to change the ACLs in the website.ts file. A rough sketch of what this kind of deployment function does is shown below.
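For reference, here is a minimal Python sketch of what such a deployment function does; it is not the SeamusJ code, and the website bucket name and content-type handling are simplified assumptions. It pulls the build artifact referenced in the CodePipeline job event, unzips it, copies each file to the website bucket, and reports the result back to CodePipeline:
import io
import mimetypes
import zipfile

import boto3

s3 = boto3.client('s3')
codepipeline = boto3.client('codepipeline')

WEBSITE_BUCKET = 'my-website-bucket'  # assumption: the bucket hosting the site

def handler(event, context):
    job_id = event['CodePipeline.job']['id']
    try:
        # Location of the zipped build artifact produced by CodeBuild
        location = event['CodePipeline.job']['data']['inputArtifacts'][0]['location']['s3Location']
        artifact = s3.get_object(Bucket=location['bucketName'], Key=location['objectKey'])
        archive = zipfile.ZipFile(io.BytesIO(artifact['Body'].read()))

        # Unzip in memory and copy every file to the root of the website bucket
        for name in archive.namelist():
            if name.endswith('/'):
                continue
            content_type = mimetypes.guess_type(name)[0] or 'binary/octet-stream'
            s3.put_object(
                Bucket=WEBSITE_BUCKET,
                Key=name,
                Body=archive.read(name),
                ContentType=content_type,
            )

        codepipeline.put_job_success_result(jobId=job_id)
    except Exception as exc:
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={'type': 'JobFailed', 'message': str(exc)},
        )
The function's execution role needs s3:GetObject on the artifact bucket, s3:PutObject on the website bucket, and the codepipeline:PutJobSuccessResult / codepipeline:PutJobFailureResult permissions.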
Uploading from CodeBuild is currently the best solution available.
There are some suggestions on how to orchestrate this deployment via CodePipeline in this answer.
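Concretely, the CodeBuild route boils down to putting the sync into the buildspec. A sketch, assuming an npm build that writes its output to build/ and a bucket name you would replace with your own:
version: 0.2
phases:
  install:
    commands:
      - npm ci
  build:
    commands:
      - npm run build
  post_build:
    commands:
      # copy the built site to the root of the website bucket
      - aws s3 sync build/ s3://my-website-bucket/ --delete
The CodeBuild service role needs permission to write to the target bucket for the sync step to succeed.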
I have an application deployed on EB that needs to download a file from a remote server and then serve it to visitors.
As I understand it, it's recommended to save files to S3 instead and then grant users access to these files. However, I believe there is no option for S3 to initiate the download of a file from a remote server, therefore the process would be:
EB application gets the files => EB application uploads the files to S3.
That would double the wait time for users.
Should I save the files directly to the application directory instead, given that I will only use 200-300 MB at most and clean it daily?
Is there any risk or better approach to this problem?
Why would it double the time? The upload to S3 would be extremely quick. You could even stream the file to S3 as it is being downloaded.
Saving the files to the server will prevent you from scaling your application beyond a single server.
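As a sketch of the streaming idea (the URL, bucket, and key below are placeholder assumptions), the remote download can be piped straight through to S3 without buffering the whole file on the EB instance:
import boto3
import requests

def stream_remote_file_to_s3(url: str, bucket: str, key: str) -> None:
    # Download a remote file and stream it into S3 without holding it all in memory.
    s3 = boto3.client('s3')
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        # upload_fileobj reads from the file-like response body and performs
        # a multipart upload for large files under the hood
        s3.upload_fileobj(response.raw, bucket, key)

# example usage with placeholder values
stream_remote_file_to_s3('https://example.com/report.pdf', 'my-app-files', 'downloads/report.pdf')
Once the object is in S3, the application can hand visitors a pre-signed URL instead of serving the file itself.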