Setting up AWS for data processing: S3 or EBS? - amazon-web-services

Hey there, I'm new to AWS and trying to piece together the best way to do this.
I have thousands of photos I'd like to upload and process on AWS. The software is Agisoft Photoscan and it runs in stages, so for the first stage I'd like to use an instance geared towards CPU/memory, and for the second stage one geared towards GPU/memory.
What is the best way to do this? Do I create a new EBS volume for each project and attach that volume to each instance when I need it? I see people saying to use S3 - do I just create a bucket for each project and then attach the bucket to my instances?
Sorry for the basic questions; the more I read, the more questions I seem to have.

I'd recommend starting with S3 and seeing if it works - it will be cheaper and easier to set up. Switch to EBS volumes if you need to, but I doubt you will need to.
You could create a bucket for each project, or you could create a single bucket and segregate the images based on a file-name prefix (e.g. project1-image001.jpg).
You don't 'attach' buckets to EC2. Instead, assign an IAM role to the instances as you create them, and then grant that IAM role permission to access the S3 bucket(s) of your choice.
Since you don't have a lot of AWS experience, keep things simple, and using S3 is about as simple as it gets.
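As a rough illustration of the single-bucket, prefix-per-project approach, here is a minimal boto3 sketch; the bucket name, project prefix, and local file layout are all made up for the example.

```python
import glob
import os
import boto3

s3 = boto3.client("s3")

BUCKET = "photoscan-projects"   # hypothetical bucket name
PROJECT = "project1"            # one prefix per project

# Upload every local photo under a per-project prefix, e.g. project1/IMG_0001.jpg.
for path in glob.glob("photos/*.jpg"):
    key = f"{PROJECT}/{os.path.basename(path)}"
    s3.upload_file(path, BUCKET, key)
    print(f"uploaded s3://{BUCKET}/{key}")
```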

You can go with AWS S3 to upload the photos. AWS S3 is similar to Google Drive in that sense.
If you want to use AWS EBS volumes instead of S3, the problem you may face is this:
an EBS volume is only accessible within a single Availability Zone, not across the whole Region, which means you have to create snapshots to move it to another Availability Zone. S3 buckets, by contrast, are reachable from any zone (and from outside AWS).
EBS volumes are not designed to be a general store for multimedia files; a volume is more like a hard drive, and once you launch an EC2 instance you need to attach the volume to it.
As a best practice, use AWS S3.
For your case, you can create a bucket for each project, or use a single bucket with one folder (prefix) per project to identify them.
Create an AWS IAM role with S3 access permissions and attach it to the EC2 instance. There is no need to put AWS credentials in the project: the instance will use the role to access S3, and the role has no permanent credentials - its temporary credentials are rotated automatically.
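To illustrate the "role instead of credentials" point, here is a minimal sketch of code running on the instance. There are deliberately no access keys anywhere; bucket and prefix names are hypothetical.

```python
import boto3

# No access keys configured in code, environment, or config files: on an EC2
# instance with an IAM role attached, boto3 fetches (and refreshes) temporary
# credentials from the instance metadata service automatically.
s3 = boto3.client("s3")

resp = s3.list_objects_v2(Bucket="photoscan-projects", Prefix="project1/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```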

Related

Upload files to Amazon EC2 in a private network from Github Actions

As part of our workflow, we want to upload files to our Amazon EC2 instance automatically.
It currently only allows whitelisted IP ranges to connect over SSH, and since we are running GitHub Actions, it seems odd to whitelist roughly 1500 IP ranges.
Does anyone have an intelligent solution for this?
Whether we use SCP and/or rsync doesn't matter to us.
It's merely getting access that I need help with.
I have access to the ssh key, and I can get a hold of an admin to get temporary access to the AWS Console should I need it.
Since the EC2 instance is in a private network, the hurdles to getting GitHub Actions SSH access to it are many.
I would go with a decoupled architecture. Have the GitHub Action upload the files to S3.
Then either
a Lambda function loads the file onto the EC2 instance (S3 event trigger for Lambda),
OR
a process running on the EC2 instance watches for new-object events on the S3 bucket (e.g. via S3 event notifications to SNS) and pulls them down.
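As one possible sketch of the "Lambda loads the file onto the instance" option: the answer above doesn't say how the Lambda should reach the instance, so this example uses SSM Run Command (my assumption, not part of the answer), which avoids SSH entirely. The instance ID, destination path, and the requirement that the instance runs the SSM agent and has an instance profile allowing S3 reads are all assumptions.

```python
import boto3
from urllib.parse import unquote_plus

ssm = boto3.client("ssm")

INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical target instance
DEST_DIR = "/opt/deploy/"             # hypothetical destination path on the instance

def handler(event, context):
    # Invoked by an S3 put-object event trigger on the upload bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        # Ask the instance (via the SSM agent) to pull the object down itself;
        # the instance's own IAM role must allow s3:GetObject on the bucket.
        ssm.send_command(
            InstanceIds=[INSTANCE_ID],
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": [f'aws s3 cp "s3://{bucket}/{key}" {DEST_DIR}']},
        )
```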

Copy AWS Snapshot to S3 bucket using python lambda

I am looking to build a Lambda function, as part of a forensics workflow, that will copy a particular EBS snapshot to a manually created S3 bucket, in order to store it for short- and long-term forensics requirements. Looking for any pointers!
The copy_snapshot operation is not helpful here. It copies an EBS snapshot to an AWS-controlled S3 bucket (in a different region); it does not copy to an S3 bucket under your control, and you have no direct access to that bucket.
If you genuinely want to export an EBS snapshot to your own S3 bucket, or even to some storage device external to AWS, then you need to do it manually.
One way is as follows (some details are thanks to this serverfault answer):
launch an EC2 instance
create an EBS volume from your EBS snapshot
attach, but do not mount, the EBS volume to your instance
export the data (to S3 or elsewhere) using a tool such as dd
There may be tools available that actually implement this series of steps for you, though I was not able to locate any with a quick search.
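As a rough sketch of the volume-creation half of those steps (snapshot ID, instance ID, Availability Zone, and device name are placeholders; the actual export still has to run on the instance itself):

```python
import boto3

ec2 = boto3.client("ec2")

SNAPSHOT_ID = "snap-0123456789abcdef0"   # the snapshot to export (placeholder)
INSTANCE_ID = "i-0123456789abcdef0"      # a worker instance in the same AZ (placeholder)

# 1. Create a volume from the snapshot, in the same AZ as the worker instance.
vol = ec2.create_volume(SnapshotId=SNAPSHOT_ID, AvailabilityZone="us-east-1a")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# 2. Attach it to the instance (but do not mount it there).
ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=INSTANCE_ID, Device="/dev/sdf")

# 3. On the instance, stream the raw device straight into S3, for example:
#    sudo dd if=/dev/xvdf bs=1M status=progress | aws s3 cp - s3://my-forensics-bucket/evidence.img
```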

Duplicity/Duply Backup to S3 without API Keys?

Goal: Automated full and incremental backups of an AWS EFS filesystem to an S3 bucket.
I have been looking at Duplicity/Duply to accomplish this, and it looks like it could work. I do have one concern: you would have to store API keys in the clear on the AMI for this to work. Is there any way to accomplish this using a role?
I do backups exactly the way you want to, and it can be done, since duplicity has support for instance profiles. Make sure to give appropriate access to your role and attach it to your instance.
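For the "give appropriate access to your role" part, here is a minimal sketch of what the role's S3 policy could look like, granted via boto3; the role name and bucket name are made up, and the exact set of actions duplicity needs may vary with your setup.

```python
import json
import boto3

iam = boto3.client("iam")

BACKUP_BUCKET = "my-duply-backups"   # hypothetical bucket
ROLE_NAME = "efs-backup-role"        # hypothetical role already attached to the instance

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # duplicity needs to list the bucket...
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BACKUP_BUCKET}",
        },
        {   # ...and read/write/delete the backup objects inside it.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BACKUP_BUCKET}/*",
        },
    ],
}

iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="duplicity-s3-backup",
    PolicyDocument=json.dumps(policy),
)
```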

File storage for social media application

I am launching a mobile application with a PHP backend hosted on 4 instances of AWS Elastic Beanstalk. For media storage (images and videos) I am not sure whether S3 is the better option or whether an EC2 instance with a shared directory would be fine.
My consideration is based on performance and throughput. For S3, I never came across any documentation or reference that gives the throughput between EC2 and S3.
For your use case S3 is the best option as far as image durability goes, and data transfer between an EC2 instance and S3 is fast enough that you don't have to worry about it.
If you do run into latency in data transfer between the EC2 instance and S3 because the instance and the S3 bucket are in different regions, AWS recently introduced S3 Transfer Acceleration: http://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html
So using S3 for image and video storage is the most durable and reliable option for your use case.
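If you do reach for Transfer Acceleration, a minimal sketch of enabling and using it looks roughly like this in boto3 (the AWS SDK for PHP exposes the equivalent operations); the bucket name is made up:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# One-time: switch Transfer Acceleration on for the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-media-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Clients that should use the accelerated endpoint opt in explicitly.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("photo.jpg", "my-media-bucket", "uploads/photo.jpg")
```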

How to download a file from s3 using an EC2 instance?

I have an AMI that will be used for auto scaling. Every EC2 instance launched from the AMI is supposed to download some files from an S3 bucket (they are all in the same VPC), and the S3 bucket is supposed to be private (not open to the public).
How can this be done?
There are lots of ways. You could use the AWS CLI (S3 Command) or you could use the SDK for the language of your choice. You will also probably want to use IAM to establish the credentials for accessing the resources. The CLI is probably the quickest way to get up and running.
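For example, a minimal sketch of the SDK route that could run from a user-data/boot script on each instance, relying on the instance's IAM role rather than stored keys (bucket, key, and local path are hypothetical); the CLI equivalent is shown as a comment:

```python
import boto3

# Equivalent CLI one-liner (also uses the instance role automatically):
#   aws s3 cp s3://my-private-bucket/config/app-settings.json /opt/app/

s3 = boto3.client("s3")   # credentials come from the instance's IAM role
s3.download_file("my-private-bucket", "config/app-settings.json",
                 "/opt/app/app-settings.json")
```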