Ingest files from GCP Cloud Storage to AWS S3 bucket dynamically

I am working on a multi-cloud (AWS and GCP) pet project built on a serverless architecture.
Files are generated by the business logic within GCP (using Cloud Functions and Pub/Sub) and stored in GCP Cloud Storage. I want to ingest these files dynamically from Cloud Storage into an AWS S3 bucket.
One possible way is to use the gsutil command-line tool (Exporting data from Google Cloud Storage to Amazon S3), but that would require a compute instance and running the gsutil commands manually, which I want to avoid.

In answering this I'm reminded a bit of a Rube Goldberg type setup but I don't think this is too bad.
From the Google side you would create a Cloud Function that is notified when a new file is created. You would use the Object Finalize event. This function would get the information about the file and then call an AWS Lambda fronted by AWS API Gateway.
The GCP function would pass the bucket and file information to the AWS Lambda. On the AWS side, the Lambda would use your GCP credentials and the GCP client API to download the file and then upload it to S3.
Something like:
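A rough sketch of the GCP side (Python; the API Gateway URL and API key are placeholders you would replace with your own, and the event shape assumes a background Cloud Function triggered by google.storage.object.finalize):

    # Hypothetical GCP Cloud Function triggered by the Object Finalize event.
    # It forwards the bucket and object name to an API Gateway endpoint on AWS.
    import json
    import urllib.request

    API_GATEWAY_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/ingest"  # placeholder
    API_KEY = "replace-with-your-api-key"  # placeholder

    def notify_aws(event, context):
        """Background function for google.storage.object.finalize events."""
        payload = json.dumps({
            "bucket": event["bucket"],
            "name": event["name"],
            "contentType": event.get("contentType"),
        }).encode("utf-8")

        request = urllib.request.Request(
            API_GATEWAY_URL,
            data=payload,
            headers={"Content-Type": "application/json", "x-api-key": API_KEY},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            print("AWS responded with status", response.status)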
This is all serverless on both GCP and AWS. Testing isn't bad since you can keep the two sides separate: make sure GCP is sending what you want, and make sure AWS is parsing it and doing the correct thing. There is likely some authentication that needs to happen from the GCP Cloud Function to API Gateway. Additionally, the API Gateway can be eliminated if you're OK with pulling the AWS client libraries into the GCP function; since you already have to pull the GCP libraries into the AWS Lambda, this shouldn't be much of a problem.
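On the AWS side, the Lambda behind API Gateway might look roughly like this. It is only a sketch: it assumes a Lambda proxy integration, that the google-cloud-storage library is bundled into the deployment package, and that a GCP service account key file is available to the function; the bucket name, key path, and environment variable are placeholders.

    import json
    import os

    import boto3
    from google.cloud import storage  # bundled with the Lambda deployment package

    DEST_BUCKET = os.environ["DEST_BUCKET"]  # target S3 bucket, set as an env var

    s3 = boto3.client("s3")
    gcs = storage.Client.from_service_account_json("/var/task/gcp-key.json")  # placeholder path

    def handler(event, context):
        body = json.loads(event["body"])  # payload sent by the GCP Cloud Function
        src_bucket, src_name = body["bucket"], body["name"]

        # Download from GCS into /tmp (the Lambda's writable scratch space), then upload to S3.
        local_path = "/tmp/" + os.path.basename(src_name)
        gcs.bucket(src_bucket).blob(src_name).download_to_filename(local_path)
        s3.upload_file(local_path, DEST_BUCKET, src_name)

        return {"statusCode": 200, "body": json.dumps({"copied": src_name})}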

Related

Using AWS Rekognition in a google cloud function

I am very new to AWS and was wondering if it was possible to use AWS Rekognition in a Google Cloud Function that pulls images from Google Cloud Storage? If so, what are the key steps to getting this to work?
Thanks
Many Amazon Rekognition API calls involve interaction with files stored in Amazon S3. These API calls will not work with Google Cloud Storage.
Some API calls allow images to be passed inline (as bytes) as part of the call. These will work from any environment.
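For example, Rekognition's DetectLabels accepts image bytes directly. A minimal sketch of calling it from a Cloud Function (assuming boto3 is added to the function's dependencies and AWS credentials are provided via environment variables; the event fields match a Cloud Storage trigger):

    import boto3
    from google.cloud import storage

    def detect_labels(event, context):
        # Pull the image bytes out of Cloud Storage...
        gcs = storage.Client()
        image_bytes = gcs.bucket(event["bucket"]).blob(event["name"]).download_as_bytes()

        # ...and pass them inline to Rekognition instead of referencing an S3 object.
        rekognition = boto3.client("rekognition", region_name="us-east-1")
        response = rekognition.detect_labels(Image={"Bytes": image_bytes}, MaxLabels=10)
        for label in response["Labels"]:
            print(label["Name"], label["Confidence"])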

How to migrate from cellar clever cloud to S3 of Amazon

I am still new to the cloud and when I first started I used Clever Cloud.
But now I want to migrate to AWS, and I have data that I want to move from Cellar to Amazon S3.
I am not sure what the conventions or best practices are for this; any documentation or explanation of how I can proceed would be very much appreciated.
Thank you very much.
Clever Cloud Cellar is an Amazon S3 compatible service. This means it operates pretty much the same as S3.
Clever Cloud is not able to communicate directly to Amazon S3, and Amazon S3 is not able to communicate directly to Cellar. Therefore, you will need to:
Download the files from Cellar using s3cmd or the AWS Command-Line Interface (CLI) (see instructions on CleverCloud Cellar website)
Upload the files to an Amazon S3 bucket using the AWS CLI and your AWS credentials
This activity would be most efficient if performed from an Amazon EC2 instance since it has high bandwidth connectivity to Amazon S3.
Note that there will be Data Transfer costs from Clever Cloud for "Outbound traffic".
I suggest you start by getting s3cmd or the AWS CLI working with Cellar to download a single file, and then get the AWS CLI working with Amazon S3 to upload a single file. You can then use the sync command to copy whole directories of files.
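Because Cellar speaks the S3 API, the same copy can also be scripted. A rough sketch with boto3 (the endpoint, keys, and bucket names are placeholders taken from your Cellar add-on dashboard and AWS account):

    import boto3

    # Client for Clever Cloud Cellar: same API, different endpoint and credentials.
    cellar = boto3.client(
        "s3",
        endpoint_url="https://cellar-c2.services.clever-cloud.com",  # placeholder endpoint
        aws_access_key_id="CELLAR_KEY_ID",          # placeholder
        aws_secret_access_key="CELLAR_KEY_SECRET",  # placeholder
    )

    # Regular Amazon S3 client using your AWS credentials from the environment.
    s3 = boto3.client("s3")

    # Copy every object in the Cellar bucket to the S3 bucket, streaming each one through.
    paginator = cellar.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-cellar-bucket"):
        for obj in page.get("Contents", []):
            body = cellar.get_object(Bucket="my-cellar-bucket", Key=obj["Key"])["Body"]
            s3.upload_fileobj(body, "my-s3-bucket", obj["Key"])
            print("copied", obj["Key"])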

Regularly pull files from On-Prem server to S3 using AWS Transfer family

I'm trying to prepare a flow where we can regularly pull newly available files from a third party's on-prem server into our S3 bucket using AWS Transfer Family.
I read this documentation, https://aws.amazon.com/blogs/storage/how-discover-financial-secures-file-transfers-with-aws-transfer-family/, but it was not clear about setting up and configuring the process.
Can someone share any clear documentation or reference links on using AWS Transfer Family to pull files from external on-prem server to our S3?
@Sampath, I think you misunderstood the available features of the AWS Transfer service. That service acts as a serverless SFTP endpoint (it now supports FTP and FTPS as well) with Amazon S3 as the backend storage, to which clients connect via the SFTP protocol. You can either PUSH data to S3 or PULL data from S3 via the AWS Transfer service. You cannot PULL data into S3 from anywhere else via the AWS Transfer service alone.
You may have to use some other solution, such as a Python script running on AWS EC2, for that purpose.
Another option would be to have the external third-party server connect to the AWS Transfer service and PUSH files to S3 through it.
For your use case, I think you need a simple solution that connects to the external third-party server and copies files from it to the AWS S3 bucket. That can be done with a Python script, and you can run it on AWS EC2, AWS ECS, AWS Lambda, AWS Batch, etc., depending on the specifications and requirements.
I have used AWS Transfer once; I found it to be very expensive and went with AWS EC2 instead. With AWS EC2 you can even buy reserved instances to further reduce the cost. If the task is just copying files from an external server to S3 and the copy job will never take more than 10 minutes, it is better to run it on AWS Lambda.
In short, you cannot PULL data from any server into S3 using the AWS Transfer service. You can only PUSH data to or PULL data from S3 using the AWS Transfer service.
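As a rough illustration of the script approach, here is a sketch that pulls files from a third-party SFTP server and copies them into S3 (Python with paramiko and boto3; the host, credentials, paths, and bucket are placeholders, and a production version would also track which files have already been copied):

    import io

    import boto3
    import paramiko

    SFTP_HOST = "sftp.example-partner.com"   # placeholder
    SFTP_USER = "ingest"                     # placeholder
    SFTP_KEY_PATH = "/path/to/private_key"   # placeholder
    REMOTE_DIR = "/outgoing"                 # placeholder
    DEST_BUCKET = "my-ingest-bucket"         # placeholder

    def pull_files():
        s3 = boto3.client("s3")
        transport = paramiko.Transport((SFTP_HOST, 22))
        transport.connect(username=SFTP_USER,
                          pkey=paramiko.RSAKey.from_private_key_file(SFTP_KEY_PATH))
        sftp = paramiko.SFTPClient.from_transport(transport)
        try:
            for name in sftp.listdir(REMOTE_DIR):
                buffer = io.BytesIO()
                sftp.getfo(REMOTE_DIR + "/" + name, buffer)   # download into memory
                buffer.seek(0)
                s3.upload_fileobj(buffer, DEST_BUCKET, name)  # then push to S3
                print("copied", name)
        finally:
            sftp.close()
            transport.close()

    if __name__ == "__main__":
        pull_files()

The same function could be wrapped in a Lambda handler and triggered on a schedule, as long as the job stays within Lambda's time limit.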
References to some informative blogs:
Centralize data access using AWS Transfer Family and AWS Storage Gateway
How Discover Financial secures file transfers with AWS Transfer Family
Moving external site data to AWS for file transfers with AWS Transfer Family
Easy SFTP Setup with AWS Transfer Family
With the AWS Transfer Family service you can create servers that use the SFTP, FTPS, and FTP protocols for your file transfers, and use Amazon S3 or EFS as the domain to store and access your files.
To connect your on-premises servers with the Transfer Family server you will need to use a service like File Gateway/Storage Gateway and connect to S3 via HTTPS to sync your files.
Your architecture will be something like this: on-premises server -> File Gateway (AWS Storage Gateway) syncing over HTTPS -> Amazon S3 <- AWS Transfer Family server and its SFTP/FTPS/FTP clients.
If you want more details of how to connect with your on-premises servers with the AWS S3/Transfer Family services take a look on this blog post: Centralize data access using AWS Transfer Family and AWS Storage Gateway
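If you go the Transfer Family route, creating the server itself is a single API call. A minimal sketch with boto3 (assuming the caller has the necessary IAM permissions; users and their IAM roles still need to be added separately):

    import boto3

    transfer = boto3.client("transfer")

    server = transfer.create_server(
        Protocols=["SFTP"],               # FTPS and FTP are also supported
        Domain="S3",                      # store files in Amazon S3 (EFS is the alternative)
        IdentityProviderType="SERVICE_MANAGED",
        EndpointType="PUBLIC",
    )
    print("Server ID:", server["ServerId"])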

How to copy an aws private bucket to azure storage

I want to copy a folder with large files in it to azure storage.
I found this article that shows how to copy a public aws bucket to azure: https://microsoft.github.io/AzureTipsAndTricks/blog/tip220.html
But how can I do this if the AWS bucket is private? How can I pass the credentials to azcopy so it can copy my files from the AWS bucket to Azure directly?
From Copy data from Amazon S3 to Azure Storage by using AzCopy | Microsoft Docs:
AzCopy is a command-line utility that you can use to copy blobs or files to or from a storage account. This article helps you copy objects, directories, and buckets from Amazon Web Services (AWS) S3 to Azure blob storage by using AzCopy.
The article explains how to provide both Azure and AWS credentials.
One thing to note is that Amazon S3 cannot 'send' data to Azure Blob Storage. Therefore, something will need to call GetObject() on S3 to retrieve the data, and then send it to Azure. I'm assuming that Azure Blob Storage cannot directly request data from Amazon S3, so it means that the data will be 'downloaded' from S3, then 'uploaded' to Azure. To improve efficiency, run the AzCopy command either from an AWS or an Azure virtual machine, to reduce the latency of sending via your own computer.
One solution, albeit not an ideal one, is to export your bucket data with an AWS Snowball, copy it onto your own drives, and then use the Azure Import/Export service to ship those drives to Azure for ingestion.
Have you tried generating a pre-signed URL with a limited TTL on it for the duration of the copy?
https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
That way you can just execute azcopy <aws_presigned_url> <azure_sas_url>; the URL contains the information you need for authn/authz on both sides.
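For the AWS side of that command, a small sketch of generating the pre-signed URL with boto3 (bucket and key are placeholders):

    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-private-bucket", "Key": "large-folder/file.bin"},
        ExpiresIn=3600,  # limited TTL: one hour, long enough to cover the copy
    )
    print(url)  # use this as the source argument to azcopy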

Accessing AWS S3 from within google GCP

We were doing most of our cloud processing using AWS (and still do). However, we now also have some credits on GCP that we would like to use, and we want to explore interoperability between the cloud providers.
In particular, I was wondering if it is possible to use AWS S3 from within GCP. I am not talking about migrating the data but whether there is some API which will allow AWS S3 to work seamlessly from within GCP. We have a lot of data and databases that are hosted on AWS S3 and would prefer to keep everything there as it still does the bulk of our compute.
I guess one way would be to transfer the AWS keys to the GCP VM and then use the boto3 library to download content from AWS S3 but I was wondering if GCP, by itself, provides some other tools for this.
From an AWS perspective, an application running on GCP should appear logically as an on-premises computing environment. This means that you should be able to leverage the services of AWS that can be invoked from an on-premises solution. The GCP environment will have Internet connectivity to AWS which should be a pretty decent link.
In addition, there is a migration service (the Storage Transfer Service) that will move S3 storage to GCS, but this is distinct from what you were asking.
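A minimal sketch of what this looks like in practice (boto3 on a GCP VM; the bucket and key are placeholders, and AWS credentials are assumed to be configured via environment variables or ~/.aws/credentials):

    import boto3

    s3 = boto3.client("s3")  # picks up AWS credentials from the environment

    # Download an object to local disk on the GCP VM...
    s3.download_file("my-aws-bucket", "datasets/input.csv", "/tmp/input.csv")

    # ...or stream it without touching disk.
    body = s3.get_object(Bucket="my-aws-bucket", Key="datasets/input.csv")["Body"].read()
    print(len(body), "bytes read from S3")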
See also:
Getting started with Amazon S3
Storage Transfer Service Overview