I am a bit confused between AWS S3 and AWS Storage Gateway, as both seem to serve the same function of storing data. Can anyone explain, with an example, the exact difference between these two services offered by Amazon?
AWS S3 is the data repository.
AWS Storage Gateway connects on-premises storage to the S3 repository.
You would use Storage Gateway for a number of reasons:
You want to stop purchasing storage devices and use S3 to back your enterprise storage. In this case, your company would save to a location defined on the Storage Gateway device, which would then handle local caching and offload the less frequently accessed data to S3.
You want to use it as a backup system, whereby Storage Gateway would snapshot the data into S3.
To take advantage of the newly released virtual tape library, which would allow you to transition from tape storage to S3/Glacier storage without losing your existing tape software and cataloging investment.
1. AWS S3 is object storage that behaves like a network disk. For people with no cloud experience, you can treat it like Dropbox.
2. AWS Storage Gateway is a virtual interface (in practice, a virtual machine running on your server) that lets you read and write data to/from AWS S3 or other AWS storage services transparently.
You can think of S3 as Dropbox itself, accessible through the web or an API, and AWS Storage Gateway as the Dropbox client on your PC, which presents Dropbox as a local drive (actually a network drive in the real case).
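To make the analogy concrete, here is a minimal sketch (Python with boto3) of creating an NFS file share on an existing File Gateway, backed by an S3 bucket. All ARNs below are hypothetical placeholders; substitute your own gateway, IAM role, and bucket:

import uuid
import boto3

sgw = boto3.client('storagegateway')

# All ARNs below are made-up examples.
response = sgw.create_nfs_file_share(
    ClientToken=str(uuid.uuid4()),  # idempotency token
    GatewayARN='arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE',
    Role='arn:aws:iam::123456789012:role/StorageGatewayS3Access',
    LocationARN='arn:aws:s3:::my-example-bucket',
)
print(response['FileShareARN'])  # mount this share from your on-premises clients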
I think the above answers are explanatory enough, but here's a quick recap.
Why would I store data on AWS S3?
Easy to use
Cost-effective
High durability and availability
No limit on the total amount of data stored; the only constraint is that a single object cannot exceed 5 TB (see the sketch below)
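To illustrate that 5 TB object ceiling in practice: large objects are uploaded in parts, and with boto3, upload_file switches to a multipart upload automatically past a configurable threshold. A minimal sketch; the file and bucket names are assumptions:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Upload in 100 MB parts once the file exceeds 100 MB.
config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                        multipart_chunksize=100 * 1024 * 1024)

s3.upload_file('backup.tar', 'my-example-bucket', 'backups/backup.tar',
               Config=config)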
Why would I use AWS Storage Gateway?
I have a large amount of important data stored in my data centre and I want to store it on the cloud (AWS) for "obvious" reasons.
I need a mechanism to transfer my important data from the data centre to AWS S3.
I need to store my old, "not-so-useful" but "may-be-needed-in-future" data, so I will store it on AWS Glacier (a lifecycle sketch follows this answer).
Now, I need a mechanism to implement this successfully. AWS Storage Gateway is provided to fulfil this requirement.
AWS Storage Gateway provides a VM that is installed in your data centre and transfers that data.
That's it. (y)
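For the "may-be-needed-in-future" data mentioned above, a common pattern is an S3 lifecycle rule that transitions objects to Glacier after a set period. A minimal sketch with boto3; the bucket name and prefix are assumptions:

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='my-example-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-old-data',
            'Status': 'Enabled',
            'Filter': {'Prefix': 'archive/'},
            # Move objects to Glacier 90 days after creation.
            'Transitions': [{'Days': 90, 'StorageClass': 'GLACIER'}],
        }]
    },
)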
Related
Let's suppose I need a NAS-equivalent share on AWS to replace my on-prem NAS server. I see that both solutions, FSx and S3 File Gateway, provide an SMB protocol interface, so they present themselves to clients in the same way.
Costs would be much smaller using Storage Gateway backed by S3 for large volumes, if slower performance is acceptable. Is this the only difference?
What are the differences, from a practical perspective, to use one solution over the other?
I'm not mentioning the specific use case on purpose, just want to keep the discussion at a general level.
FSx is a file system service and S3 is object storage. File Gateway can "trick" your OS into "thinking" that S3 is a file system, but it isn't.
Try creating an S3 bucket and an FSx file system; the options are very different. If you use S3 through File Gateway, I would look mostly at what happens with the data after it is uploaded to AWS, i.e. what you will do with it next. If it's just a backup and you want an unlimited-space network drive attached to your device, I would pick S3.
In S3 you pick storage classes and don't worry about capacity. In FSx you do worry about those things: you pick SSD/HDD and you set capacity, with a minimum of 32 GiB, so you over-provision by the nature of the technology. You also have a ceiling on how much data you can put into a file system (65,536 GiB). I would always pick S3, except when you have specific requirements that rule it out, since it has lifecycle rules, storage classes, versioning, and security built in, and it's a true serverless cloud service with the peace of mind that it just works; you don't run into traditional issues like running out of disk space.
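To illustrate the "pick a storage class, not capacity" point: in S3 the class is just a per-object choice at write time, with nothing to provision. A sketch with boto3; the bucket and file names are hypothetical:

import boto3

s3 = boto3.client('s3')

# No capacity planning: simply choose a storage class per object.
s3.upload_file('report.pdf', 'my-example-bucket', 'reports/report.pdf',
               ExtraArgs={'StorageClass': 'STANDARD_IA'})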
I read the documentation on the official website, but it does not give me a clear picture.
Why would I need to use AWS Transfer Family when AWS DataSync can achieve the same result?
I notice the protocol differences, but am not quite sure about the data migration use case.
Why would we pick one over the other?
Why would I need to use AWS Transfer Family when AWS DataSync can achieve the same result?
It depends on what you mean by achieving the same result.
If it means transferring data to & from AWS, then yes - both achieve the same result.
However, the main difference is that AWS Transfer Family is practically an always-on server endpoint enabled for SFTP, FTPS, and/or FTP.
If you need to maintain compatibility for current users and applications that rely on SFTP, FTPS, and/or FTP, then using AWS Transfer Family is a must, as it ensures the contract is not broken and that they can continue to be used without any modifications. Existing transfer workflows for your end users are preserved, and existing client-side configurations are maintained.
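For instance, standing up that always-on endpoint is a single API call. A minimal sketch with boto3, assuming a service-managed SFTP server (user creation, DNS, and logging are omitted):

import boto3

transfer = boto3.client('transfer')

# An always-on managed SFTP endpoint; existing SFTP clients keep working unchanged.
response = transfer.create_server(
    Protocols=['SFTP'],
    IdentityProviderType='SERVICE_MANAGED',
    EndpointType='PUBLIC',
)
print(response['ServerId'])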
On the other hand, AWS DataSync is ideal for transferring data between on-premises & AWS or between AWS storage services. A few use-cases that AWS suggests are migrating active data to AWS, archiving data to free up on-premises storage capacity, replicating data to AWS for business continuity, or transferring data to the cloud for analysis and processing.
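By contrast, a DataSync transfer is modelled as a task between two locations. A hedged sketch with boto3, moving data from an on-premises NFS share (via a deployed DataSync agent) to S3; every hostname and ARN below is a placeholder:

import boto3

ds = boto3.client('datasync')

# Source: an on-premises NFS export, reached through a DataSync agent.
source = ds.create_location_nfs(
    ServerHostname='nfs.example.internal',
    Subdirectory='/exports/data',
    OnPremConfig={'AgentArns': [
        'arn:aws:datasync:us-east-1:123456789012:agent/agent-EXAMPLE']},
)

# Destination: an S3 bucket, written via an IAM role DataSync can assume.
destination = ds.create_location_s3(
    S3BucketArn='arn:aws:s3:::my-example-bucket',
    S3Config={'BucketAccessRoleArn':
              'arn:aws:iam::123456789012:role/DataSyncS3Access'},
)

task = ds.create_task(SourceLocationArn=source['LocationArn'],
                      DestinationLocationArn=destination['LocationArn'])
ds.start_task_execution(TaskArn=task['TaskArn'])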
At the core, both can be used to transfer data to & from AWS but serve different business purposes.
Your exact question in the AWS DataSync FAQs:
Q: When do I use AWS DataSync and when do I use AWS Transfer Family?
A: If you currently use SFTP to exchange data with third parties, AWS Transfer Family provides a fully managed SFTP, FTPS, and FTP transfer directly into and out of Amazon S3, while reducing your operational burden.
If you want an accelerated and automated data transfer between NFS servers, SMB file shares, self-managed object storage, AWS Snowcone, Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server, you can use AWS DataSync. DataSync is ideal for customers who need online migrations for active data sets, timely transfers for continuously generated data, or replication for business continuity.
Also see: AWS Transfer Family FAQs - Q: Why should I use the AWS Transfer Family?
As-Is:
We are currently uploading files to Amazon S3.
These files are processed by a lambda function which then writes a file back to Amazon S3.
Problem:
We are processing critical data. So the data must not be stored in the cloud according to the compliance team.
It shall be stored on-premise on our own file servers.
Question:
How can we easily replace S3 so that our Lambda function accesses the file on the on-premises file server?
(The files must not be stored on S3, even for a millisecond.)
(Alternatively, the file might be provided by a user, e.g. via a GUI.)
If the data can't be transmitted to the cloud, then you can't use a Lambda function in the cloud to process it - if the code is not running on your servers, then it has to receive a copy of the data somehow, which means the data is leaving your network.
If you really want to have the same experience as running in AWS but with on-premises hardware, you can get AWS Outposts, which is like your own little piece of AWS.
Alternatively, just run the code that would have been in Lambda on your own servers, perhaps using an open-source package that gives you Lambda-like execution using local containers.
So the data must not be stored in the cloud according to the compliance team.
If your only concern is that you don't want to store data on S3, you can put your Lambda in a VPC and have a Site-to-Site VPN from your on-premises network to the AWS VPC.
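As a minimal sketch of what that could look like, assuming a hypothetical internal HTTP file server reachable over the VPN (the hostname and paths are made up for illustration):

import urllib.request

def handler(event, context):
    # Reachable only because this Lambda runs inside the VPC,
    # with a Site-to-Site VPN route to the on-premises network.
    url = 'https://fileserver.corp.internal/files/input.csv'
    with urllib.request.urlopen(url) as response:
        data = response.read()

    result = data.upper()  # stand-in for your real processing logic

    # Write the result back to the on-premises server, never to S3.
    req = urllib.request.Request(
        'https://fileserver.corp.internal/files/output.csv',
        data=result, method='PUT')
    urllib.request.urlopen(req)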
Usually compliance is not limited to long-term storage like S3. You should check whether your data is allowed to leave your local network at all. For Lambda to process your data, the data has to be stored temporarily in the cloud, and it does leave your local network. If there are compliance restrictions on either of these points, Lambda is probably not the best option.
I have a requirement to transfer data (one time) from on-prem to AWS S3. The data size is around 1 TB. I was looking at AWS DataSync, Snowball, etc., but those managed services seem better suited to migrations where the data is in petabytes. Can someone suggest the best way to transfer the data securely and cost-effectively?
You can use the AWS Command-Line Interface (CLI). This command will copy data to Amazon S3:
aws s3 sync c:/MyDir s3://my-bucket/
If there is a network failure or timeout, simply run the command again. It only copies files that are not already present in the destination.
The time taken will depend upon the speed of your Internet connection.
You could also consider using AWS Snowball, which is a piece of hardware that is sent to your location. It can hold 50TB of data and costs $200.
If you have no specific requirements (apart from the fact that it needs to be encrypted and the total size is 1 TB), then I would suggest you stick to something plain and simple. S3 supports an object size of up to 5 TB, so you wouldn't run into trouble. I don't know if your data is made up of many smaller files or one big file (or zip), but in essence it's all the same. Since the endpoints are all encrypted, you should be fine (if you're worried, you can encrypt your files beforehand, and they will then be encrypted while stored, if it's a backup of something). To get to the point: you can use API tools for the transfer, or file-explorer-type tools that also have S3 connectivity (e.g. https://www.cloudberrylab.com/explorer/amazon-s3.aspx). One other point: the cost-effectiveness of storage/transfer depends on how frequently you need the data; if it's just a backup or a just-in-case copy, archiving to Glacier is much cheaper.
1 TB is large but it's not so large that it'll take you weeks to get your data onto S3. However if you don't have a good upload speed, use Snowball.
https://aws.amazon.com/snowball/
Snowball is a device shipped to you which can hold up to 100TB. You load your data onto it and ship it back to AWS and they'll upload it to the S3 bucket you specify when loading the data.
This can be done in multiple ways:
Using the AWS CLI, we can copy files from local storage to S3 (a boto3 sketch follows this list).
AWS Transfer Family using FTP or SFTP (AWS SFTP); please refer to the AWS documentation.
There are tools like the CloudBerry clients, which have a UI interface.
You can use the AWS DataSync tool.
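If you go the CLI/API route, the boto3 equivalent of a one-time recursive copy is a short script. A sketch under the assumption of a local directory and a bucket you own (both names are placeholders):

import os
import boto3

s3 = boto3.client('s3')

SOURCE_DIR = '/data/export'   # hypothetical local directory
BUCKET = 'my-example-bucket'  # hypothetical bucket

for root, _dirs, files in os.walk(SOURCE_DIR):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, SOURCE_DIR)
        # Server-side encryption at rest; the transfer itself goes over TLS.
        s3.upload_file(path, BUCKET, key,
                       ExtraArgs={'ServerSideEncryption': 'AES256'})
        print('uploaded', key)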
I am planning to store objects in S3 Standard storage. Each object could be around 100 MB in size, so monthly the total could grow to 1 TB. I will use a single region to store these objects in S3.
I want to create a mobile app to store and fetch these objects using POST/GET APIs,
and then show them in the app.
S3 pricing has different components; I understand the storage and request (POST/GET) pricing.
My question is about data transfer in/out pricing: in my case above, will I be billed for data transfer in/out? If not, why not?
Yes, you will be billed, because your mobile app will connect from the internet. Even when connecting from within AWS, there are fees associated with the number of requests and the data transferred (inside or outside the region).
You can use the AWS calculator to get an estimate of the associated cost: https://calculator.s3.amazonaws.com/index.html
All traffic FROM mobile phones to S3 or EC2 is free.
All traffic TO mobile phones from S3/CloudFront is billed according to a selected region. Take a look at https://aws.amazon.com/s3/pricing/.
Keep in mind that incoming traffic (to S3) is free only if you're NOT using S3 Transfer Acceleration.
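As a concrete example of the traffic that gets billed: a common mobile pattern is to hand the app time-limited presigned URLs, and every download through one counts as data transfer out to the internet. A sketch with boto3; the bucket and key are hypothetical:

import boto3

s3 = boto3.client('s3')

# The app performs a plain GET on this URL; the download is billed
# as S3 data transfer out to the internet.
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-example-bucket', 'Key': 'photos/img-001.jpg'},
    ExpiresIn=3600,  # valid for one hour
)
print(url)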