AWS storage architecture help is needed - amazon-web-services

I have recently joined a company who have a single working AWS environment, manually created in the console, running a few Java apps that use data in an S3 bucket using a Boto-esque library.
I want to use Terraform create arbitrary clones of the environment on demand, in different AWS accounts, and I am stuck on copying big S3 buckets (~600Gb of files).
How have other people solved this problem in the past?

Related

Using AWS Lambda to copying S3 files to on-premise LAN folder

Problem:
We need to perform a task under which we have to transfer all files ( CSV format) stored in AWS S3 bucket to a on-premise LAN folder using the Lambda functions. This will be a scheduled tasks which will be carried out after every 1 hour, and the file will again be transferred from S3 to on-premise LAN folder while replacing the existing ones. Size of these files is not large (preferably under few MBs).
I am not able to find out any AWS managed service to accomplish this task.
I am a newbie to AWS, any solution to this problem is most welcome.
Thanks,
Actually, I am looking for a solution by which I can push S3 files to on-premise folder automatically
For that you need to make the on-premise network visible to the logic (lambda, whatever..) "pushing" the content. The default solution is using the AWS site-to-site VPN.
There are multiple options for setting up the VPN, you could choose based on the needs.
Then the on-premise network will look just like another subnet.
However - VPN has its complexity and cost. In most of the cases it is much easier to "pull" data from the on-premise environment.
To sync data there are multiple options. For a managed service, I could point out the S3 Gateway which based on your description sounds like an insane overkill.
Maybe you could start with a simple cron job (or a task timer if working with windows) and run a CLI command to sync the S3 content or just copy specified files.
Check out S3 Sync, I think it will help you accomplish this task: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/sync.html#examples
To run any AWS CLI in your computer, you will need to setup credentials, and the setup account/roles should have permissions to do the task (e.g. access S3)
Check out AWS CLI setup here: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html

Best practices to Initialize and populate a serverless PostgreSQL RDS instance by a CloudFormation stack deployment

We are successfully spinning up an AWS CloudFormation stack that includes a serverless RDS PostgreSQL instance. Once the PostgreSQL instance is in place, we're automatically restoring a PostgreSQL database dump (in binary format) that was created using pg_dump on a local development machine on the instance PostgreSQL instance just created by CloudFormation.
We're using a Lambda function (instantiated by the CloudFormation process) that includes a build of the pg_restore executable within a Lambda layer and we've also packaged our database dump file within the Lambda.
The above approached seems complicated for something that presumably has been solved many times... but Google searches have revealed almost nothing that corresponds to our scenario. ​We may be thinking about our situation in the wrong way, so please feel free to offer a different approach (e.g., is there a CodePipeline/CodeBuild approach that would automate everything). Our preference would be to stick with the AWS toolset as much as possible.
This process will be run every time we deploy to a new environment (e.g., development, test, stage, pre-production, demonstration, alpha, beta, production, troubleshooting) potentially per release as part of our CI/CD practices.
Does anyone have advice or a blog post that illustrates another way to achieve our goal?
If you have provisioned everything via IaC (Infrastructure as Code) anyway that is most of the time-saving done and you should already be able to replicate your infrastructure to other accounts/ stacks via different roles in your AWS credentials and config files by passing the —profile some-profile flag. I would recommend AWS SAM (Serverless application model) over Cloudformation though as I find I only need to write ~1/2 the code (roles & policies are created for you mostly) and much better & faster feedback via the console. I would also recommend sam sync (it is in beta currently but worth using) so you don’t need to create a ‘change set’ on code updates, purely the code is updated so deploys take 3-4 secs. Some AWS SAM examples here, check both videos and patterns: https://serverlessland.com (official AWS site)
In terms of the restore of RDS, I’d probably create a base RDS then take a snapshot of that and restore all other RDS instances from snapshots rather than manually creating it all the time you are able to copy snapshots and in fact automate backups to snapshots cross-account (and cross-region if required) which should be a little cleaner
There is also a clean solution for replicating data instantaneously across AWS Accounts using AWS DMS (Data Migration Services) and CDC (Change Data Capture). So, process is you have a source DB and a target DB and also a 'Replication Instance' (eg EC2 Micro) that monitors your source DB and can replicate that across to different instances (so you are always in sync on data, for example if you have several devs working on separate 'stacks' to not pollute each others' logs, you can replicate data and changes out seamlessly if required from one DB) - so this works with Source DB and Destination DB both already in AWS however you can also use AWS DMS of course for local DB migration over to AWS, some more info here: https://docs.aws.amazon.com/dms/latest/sbs/chap-manageddatabases.html

Extract Entire AWS Setup into storable Files or Deployment Package(s)

Is there some way to 'dehydrate' or extract an entire AWS setup? I have a small application that uses several AWS components, and I'd like to put the project on hiatus so I don't get charged every month.
I wrote / constructed the app directly through the various services' sites, such as VPN, RDS, etc. Is there some way I can extract my setup into files so I can save these files in Version Control, and 'rehydrate' them back into AWS when I want to re-setup my app?
I tried extracting pieces from Lambda and Event Bridge, but it seems like I can't just 'replay' these files using the CLI to re-create my application.
Specifically, I am looking to extract all code, settings, connections, etc. for:
Lambda. Code, Env Variables, layers, scheduling thru Event Bridge
IAM. Users, roles, permissions
VPC. Subnets, Route tables, Internet gateways, Elastic IPs, NAT Gateways
Event Bridge. Cron settings, connections to Lambda functions.
RDS. MySQL instances. Would like to get all DDL. Data in tables is not required.
Thanks in advance!
You could use Former2. It will scan your account and allow you to generate CloudFormation, Terraform, or Troposphere templates. It uses a browser plugin, but there is also a CLI for it.
What you describe is called Infrastructure as Code. The idea is to define your infrastructure as code and then deploy your infrastructure using that "code".
There are a lot of options in this space. To name a few:
Terraform
Cloudformation
CDK
Pulumi
All of those should allow you to import already existing resources. At least Terraform has a import command to import an already existing resource into your IaC project.
This way you could create a project that mirrors what you currently have in AWS.
Excluded are things that are strictly taken not AWS resources, like:
Code of your Lambdas
MySQL DDL
Depending on the Lambdas deployment "strategy" the code is either on S3 or was directly deployed to the Lambda service. If it is the first, you just need to find the S3 bucket etc and download the code from there. If it is the second you might need to copy and paste it by hand.
When it comes to your MySQL DDL you need to find tools to export that. But there are plenty tools out there to do this.
After you did that, you should be able to destroy all the AWS resources and then deploy them later on again from your new IaC.

AWS: How to transfer files from ec2 instance (Windows Server) to S3 daily?

Can someone explain me whats the best way to transfer data from a harddrive on an EC2 Instance (running Windows Server 2012) to an S3 Bucket for the same AWS Account on a daily basis?
Backround idea to this:
I'm generating a .csv file for one of our Business partners daily at 11:00 am and I want to deliver it to S3 (he has access to our S3 Bucket).
After that he can pull it out of S3 manually or automatically whenever he wants.
Hope you can help me, I only found manually solutions with the CLI, but no automated way for daily transfers.
Best Regards
You can directly mount S3 buckets as mounted drives on your EC2 instances. This way you don't even need some sort of triggers/daily task scheduler along with third party service as objects would be directly available in the S3 bucket.
For Linux typically you would use Filesystem in Userspace (FUSE). Take a look at this repo if you need it for Linux: https://github.com/s3fs-fuse/s3fs-fuse.
Regarding Windows, there is this tool:
https://tntdrive.com/mount-amazon-s3-bucket.aspx
If these tools don't suit you or if you don't want to mount directly the s3 bucket, here is another option: Whatever you can do with the CLI you should be able to do with the SDK. Therefore if you are able to code in one of the various language AWS Lambda proposes - C#/Java/Go/Powershell/Python/Node.js/Ruby - you could automate that using a Lambda function along with a daily task scheduler triggering at 11a.m.
Hope this helps!
Create a small application that uploads your file to an S3 bucket (there are a some example here). Then use Task Scheduler to execute your application on a regular basis.

How can i create s3 bucket and object (like upload an shell script file) pro grammatically

I have to do this for almost 100 account so planning to create using something infra as code. Cloud formation does not support creating object.. can anyone help
There are several strategies, depending on the client environment.
The aws-cli may be used for shell scripting, aws-sdk for JavaScript environments, or Boto3 for python environments.
If you provide the client environment, creating an S3 object is almost a one-liner holding equal s3 bucket security and lifecycle matters.
As Rich Andrew said, there are several different technologies. If you are trying to do infrastructure as code and attach policies and roles I would suggest you look into Terraform or Serverless.
I frequently combine two of the techniques already mentioned above.
For infrastructure setup - Terraform. This tool is always ahead of competition (Ansible, etc.) in terms of cloud modules. You can use it to create bucket, create bucket policies, users, their IAM policies for bucket access, upload initial files to bucket and much more.
It will keep state file containing record of those resources, so you can use the same workflow to destroy all that's created if necessary with very little modifications.
Very easy to get started, but not flexible and you can be caught out if scope change in middle of project suddenly requires feature that's not there.
To get started check out Terrafrom module registry - https://registry.terraform.io/.
It has quite a few S3 modules available to get started even quicker.
For interaction with aws resources - Python Boto3. In your case that would be subsequent file uploads, deletions in S3 bucket.
You can use Boto3 to set up infrastructure - just like Terraform, but it will require more work on your side (like handling exceptions and errors).