I have a requirement. We have a web application, and we currently download its logs manually by clicking a Download button. After the download, we upload the logs to S3 using the AWS CLI and then process the data.
Can we automate this? Please help me automate it if it is possible.
Thanks in advance.
You can create a Lambda function that assumes an IAM role (an EC2/Lambda role), collects the logs, and moves them into an S3 bucket, even adding a timestamp to the key. You can also schedule it with CloudWatch if you want the log backup taken at a specific time.
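As a rough illustration of the Lambda approach, here is a minimal Python (boto3) sketch that pulls a log file and writes it to S3 under a timestamped key. The bucket name and the log download URL are placeholders, since how your application exposes its logs will differ:

    import datetime
    import urllib.request

    import boto3

    s3 = boto3.client("s3")

    # Placeholders -- replace with your real bucket and log source.
    BUCKET = "my-log-archive-bucket"
    LOG_URL = "https://my-web-app.example.com/logs/download"  # hypothetical endpoint

    def handler(event, context):
        # Fetch the log file from the application (assumes it is reachable over HTTP).
        with urllib.request.urlopen(LOG_URL) as resp:
            body = resp.read()

        # Key includes a timestamp so each scheduled run keeps its own copy.
        key = "app-logs/{}.log".format(
            datetime.datetime.utcnow().strftime("%Y-%m-%d-%H-%M-%S")
        )
        s3.put_object(Bucket=BUCKET, Key=key, Body=body)
        return {"uploaded": key}

You would then attach a CloudWatch Events (EventBridge) schedule rule to this function so the backup runs at the time you want.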
You can also use Ansible or Jenkins for the task. In Jenkins you can create a job (an S3 plugin is even available) and simply run the job to copy the logs to your S3 bucket.
Is it possible to get per-file statistics (or at least a download count) for files in Google Cloud Storage?
I want to find the number of downloads for a JS plugin file to get an idea of how frequently it is used (in client pages).
Yes, it is possible, but it has to be enabled.
The official recommendation is to create another bucket for the logs generated by the main bucket that you want to trace.
gsutil mb gs://<some-unique-prefix>-example-logs-bucket
then assign Cloud Storage the roles/storage.legacyBucketWriter role for the bucket:
gsutil iam ch group:cloud-storage-analytics@google.com:legacyBucketWriter gs://<some-unique-prefix>-example-logs-bucket
and finally enable the logging for your main bucket:
gsutil logging set on -b gs://<some-unique-prefix>-example-logs-bucket gs://<main-bucket>
Generate some activity on your main bucket, then wait (up to an hour, since the usage reports are generated hourly and the storage reports daily). You will then be able to browse these events in the logs bucket created in the first step.
More info can be found at https://cloud.google.com/storage/docs/access-logs
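Once the usage reports start landing, counting downloads per object is just a matter of aggregating the rows in those CSVs. Here is a rough sketch with the google-cloud-storage client; the object-name filter and the cs_operation / cs_object field names are assumptions based on the usage-log format described at the link above:

    import csv
    import io
    from collections import Counter

    from google.cloud import storage  # pip install google-cloud-storage

    LOGS_BUCKET = "<some-unique-prefix>-example-logs-bucket"

    client = storage.Client()
    downloads = Counter()

    # Usage-log objects have "_usage_" in their names; each is a CSV whose header
    # row names fields such as cs_operation and cs_object (see the docs above).
    for blob in client.list_blobs(LOGS_BUCKET):
        if "_usage_" not in blob.name:
            continue
        reader = csv.DictReader(io.StringIO(blob.download_as_text()))
        for row in reader:
            if row.get("cs_operation") == "GET_Object":
                downloads[row.get("cs_object")] += 1

    print(downloads.most_common(10))  # top 10 most-downloaded objects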
In most cases, using Cloud Audit Logs is now recommended instead of using legacyBucketWriter.
Logging to a separate Cloud Storage bucket with legacyBucketWriter produces CSV files, which you would then have to load into BigQuery yourself to make them actionable, and that is far from real time. Cloud Audit Logs are easier to set up and work with by comparison, and log entries are delivered almost instantly.
I am working on a requirement where I am doing a multipart upload of a CSV file from an on-prem server to an S3 bucket.
To achieve this, I create a presigned URL using AWS Lambda and use that URL to upload the CSV file. Once the file is in S3, I want it moved to an AWS RDS Oracle DB. Initially I was planning to use AWS Lambda for this as well.
So once the file is in S3, it triggers a Lambda (S3 event) and the Lambda pushes the file to RDS. The issue with this approach is the file size (600 MB).
I am looking for some other way: whenever a file is uploaded to S3, it should trigger some AWS service, and that service should push the CSV file to RDS. I have gone through AWS DMS / Data Pipeline but was not able to find a way to automate this migration.
I need to automate this migration on every S3 upload, and it should also be cost effective.
Set up S3 Integration and build SPROCs to help automate the load. Details can be found here.
UPDATE:
Looks like you don't even need to create a SPROC. You can just use the RDS procedure as outlined here. You would then create an event-driven Lambda function that is triggered on a given S3 event (e.g. on object PUT, POST, COPY, etc.) and receives the S3 metadata needed to access the event object. Here is a simple Python example of what that Lambda and its configuration might look like. You would then use the metadata passed on the trigger event, as outlined in the Python example, to dynamically build your procedure call and execute it. You can also add the ensuing workflow logic that meets your requirements (TASK_ID fetch and operational handling, monitoring, etc.) to the same Lambda function, or separate those concerns into additional Lambdas. Hope this helps!
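For illustration only, here is a minimal sketch of such an event-driven Lambda, assuming RDS for Oracle S3 integration is already enabled and the function can reach the database. The connection details, the DATA_PUMP_DIR directory, and the cx_Oracle packaging are all assumptions you would adapt:

    import cx_Oracle  # must be packaged with the Lambda (e.g. as a layer)

    # Hypothetical connection details -- pull these from environment variables
    # or Secrets Manager in a real deployment.
    DSN = cx_Oracle.makedsn("my-rds-endpoint.rds.amazonaws.com", 1521, service_name="ORCL")

    def handler(event, context):
        # S3 PUT/POST/COPY event: extract bucket and key of the uploaded CSV.
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        with cx_Oracle.connect(user="admin", password="secret", dsn=DSN) as conn:
            cur = conn.cursor()
            # rdsadmin_s3_tasks.download_from_s3 copies the object from S3 into an
            # Oracle directory on the RDS instance and returns a task id.
            cur.execute(
                """
                SELECT rdsadmin.rdsadmin_s3_tasks.download_from_s3(
                           p_bucket_name    => :bucket,
                           p_s3_prefix      => :key,
                           p_directory_name => 'DATA_PUMP_DIR')
                  FROM dual
                """,
                bucket=bucket, key=key,
            )
            task_id = cur.fetchone()[0]

        # From here you could poll the task log and kick off the actual load of
        # the CSV into your table -- omitted in this sketch.
        return {"task_id": task_id, "bucket": bucket, "key": key}

The returned task id is what you would feed into the follow-up workflow (task-log polling, monitoring, and the load of the CSV into your table).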
How can I get notified when a Device Farm run is finished ?
Is it possible to get the report into s3 bucket ? So it can be use as a source trigger in CodePipeline ?
How can I get notified when a Device Farm run is finished?
One way to do that would be to have a small program which continuously calls get-run and checks the status. There are no waiters in boto3 (assuming you're using it) for Device Farm at the time of writing.
https://github.com/boto/botocore/tree/develop/botocore/data/devicefarm/2015-06-23
Is it possible to get the report into s3 bucket ?
Device Farm's artifacts are already in S3; however, they are in the Device Farm account and not in the account the run was scheduled from. We can see they're already in S3 from the create-upload command, which returns an S3 presigned URL.
So it can be use as a source trigger in CodePipeline ?
That would be cool, but it's not something the service does on our behalf at the moment. You would need to write a script that checks whether the run is finished, pulls the artifacts, and then re-uploads them to another S3 bucket.
Here are the links to the APIs needed in boto3:
get_run
list_artifacts
upload files example
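Putting those together, a rough polling-and-copy sketch might look like the following; the run ARN and destination bucket are placeholders, and note that the devicefarm client is only available in us-west-2:

    import time
    import urllib.request

    import boto3

    devicefarm = boto3.client("devicefarm", region_name="us-west-2")
    s3 = boto3.client("s3")

    RUN_ARN = "arn:aws:devicefarm:us-west-2:123456789012:run:EXAMPLE"  # placeholder
    DEST_BUCKET = "my-devicefarm-artifacts"  # placeholder

    # Poll until the run finishes (no boto3 waiter exists for Device Farm yet).
    while True:
        run = devicefarm.get_run(arn=RUN_ARN)["run"]
        if run["status"] == "COMPLETED":
            break
        time.sleep(60)

    # Pull each file artifact via its presigned URL and re-upload it to our own bucket.
    artifacts = devicefarm.list_artifacts(arn=RUN_ARN, type="FILE")["artifacts"]
    for artifact in artifacts:
        body = urllib.request.urlopen(artifact["url"]).read()
        key = "device-farm/{}.{}".format(artifact["name"], artifact["extension"])
        s3.put_object(Bucket=DEST_BUCKET, Key=key, Body=body)

Once the artifacts are in your own bucket, that upload could serve as the source trigger for CodePipeline.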
Some images are already uploaded to an AWS S3 bucket, and of course there are a lot of them. I want to edit and replace those images, and I want to do it on the AWS side; here I want to use AWS Lambda.
I can already do this job from my local PC, but it takes a very long time, so I want to do it on a server.
Is it possible?
Unfortunately, directly editing a file in S3 is not supported (check out the thread). To work around this, you need to download the file to a server/local machine, edit it, and re-upload it to the S3 bucket. You can also enable versioning.
For Node.js you can use Jimp.
For Java: ImageIO.
For Python: Pillow.
Or you can use any technology to edit the image and later upload it using the AWS SDK (a rough sketch follows below).
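As a concrete example, here is a minimal Python sketch of that download-edit-reupload loop using Pillow and boto3; the bucket name and the resize step are placeholders for whatever edit you actually need:

    import io

    import boto3
    from PIL import Image  # Pillow

    s3 = boto3.client("s3")
    BUCKET = "my-image-bucket"  # placeholder

    def edit_image(bucket, key):
        # Download the original image into memory.
        obj = s3.get_object(Bucket=bucket, Key=key)
        img = Image.open(io.BytesIO(obj["Body"].read()))

        # Example edit: shrink to at most 800x800 -- replace with your own processing.
        img.thumbnail((800, 800))

        # Re-upload, overwriting the original key (old copies kept if versioning is on).
        buf = io.BytesIO()
        img.save(buf, format=img.format or "JPEG")
        buf.seek(0)
        s3.put_object(Bucket=bucket, Key=key, Body=buf)

    def handler(event, context):
        # Works when invoked directly with {"key": "..."} or wired to an S3 event.
        key = event.get("key") or event["Records"][0]["s3"]["object"]["key"]
        edit_image(BUCKET, key)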
For the Lambda function you can use the Serverless Framework: https://serverless.com/
I made some YouTube videos a while back on how to get started with AWS Lambda and Serverless:
https://www.youtube.com/watch?v=uXZCNnzSMkI
You can trigger a Lambda using the AWS SDK.
Write a Lambda to process a single image and deploy it.
Then locally use the AWS SDK to list the images in the bucket and invoke the Lambda (asynchronously) for each file using invoke. I would also save somewhere which files have been processed so you can continue if something fails.
Note that the default limit for Lambda is 1000 concurrent executions, so to avoid reaching the limit you can send messages to an SQS queue (which then triggers the Lambda) or just retry when invoke throws an error.
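A small driver sketch for that approach, run locally with boto3; the function name and bucket are placeholders, and InvocationType="Event" is what makes each invocation asynchronous:

    import json

    import boto3

    s3 = boto3.client("s3")
    lam = boto3.client("lambda")

    BUCKET = "my-image-bucket"      # placeholder
    FUNCTION_NAME = "edit-image"    # placeholder

    processed = set()  # persist this (file/DynamoDB) in a real run so you can resume

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key in processed:
                continue
            # Fire-and-forget: InvocationType="Event" invokes the Lambda asynchronously.
            lam.invoke(
                FunctionName=FUNCTION_NAME,
                InvocationType="Event",
                Payload=json.dumps({"key": key}),
            )
            processed.add(key)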
My use case is to process S3 access logs (with their 18 fields) periodically and push them into a table in RDS. I'm using AWS Data Pipeline for this task, running every day to process the previous day's logs.
I decided to split the task into two activities:
1. Shell Command Activity: process the S3 access logs and create a CSV file.
2. Hive Activity: read data from the CSV file and insert it into the RDS table.
My input S3 bucket has lots of log files, so the first activity fails with an out-of-memory error while staging. However, I don't want to stage all the logs; staging the previous day's logs is enough for me. I searched around the internet but didn't find any solution. How do I achieve this? Is my solution the optimal one, or does a better one exist? Any suggestions will be helpful.
Thanks in advance.
You can define your S3 data node to use timestamps. For example, you can say the directory path is
s3://yourbucket/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}
since your log files should have a timestamp in the name (or they could be organized into timestamped directories).
This will stage only the files matching that pattern.
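If you end up filtering the input yourself instead (for example in the shell step that builds the CSV), the same idea applies: list only the previous day's objects by their date prefix. A rough boto3 sketch, assuming the access logs are delivered with a key prefix like logs/YYYY-MM-DD-:

    import datetime

    import boto3

    s3 = boto3.client("s3")

    BUCKET = "my-access-log-bucket"   # placeholder
    TARGET_PREFIX = "logs/"           # placeholder: the log delivery target prefix

    # S3 access-log keys look like <TargetPrefix>YYYY-mm-DD-HH-MM-SS-<id>,
    # so yesterday's logs can be listed with a date-based prefix.
    yesterday = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()
    prefix = "{}{}-".format(TARGET_PREFIX, yesterday)

    paginator = s3.get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix)
        for obj in page.get("Contents", [])
    ]
    print(len(keys), "log objects for", yesterday)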
You may be recreating a solution that is already done by Logstash (or more precisely the ELK stack).
http://logstash.net/docs/1.4.2/inputs/s3
Logstash can consume S3 files.
Here is a thread on reading access logs from S3
https://groups.google.com/forum/#!topic/logstash-users/HqHWklNfB9A
We use Splunk (not free), which has the same capabilities through its AWS plugin.
May I ask why you are pushing the access logs to RDS?
ELK might be a great solution for you. You can build it on your own or use ELK-as-a-service from Logz.io (I work for Logz.io).
It enables you to easily define an S3 bucket, have all your logs read regularly from the bucket and ingested into ELK, and view them in preconfigured dashboards.