I am managing a group of users on Amazon WorkSpaces; they have terabytes of data that they want to start working with on their WorkSpaces.
I am wondering what the best way to handle the data upload process is. Can everything just be downloaded from Google Drive or Dropbox? Or should I use something like AWS Snowball, which is built specifically for migration?
While something like AWS Snowball is probably the safest bet, I'm hesitant to add another AWS product to the mix, which is why I might just have everything uploaded to and then downloaded from Google Drive / Dropbox. Then again, I am building an AWS environment that will be used long term, and long term Google Drive / Dropbox won't be a solution.
Any thoughts on how to architect this, both short term and long term?
Why would you be hesitant to include more AWS products in the mix? Generally speaking, if you aren't combining multiple AWS products to build your solutions then you aren't making very good use of AWS.
For the specific task at hand I would look into Amazon WorkDocs, which integrates very well with Amazon WorkSpaces. If that doesn't suit your needs, I would suggest placing the data files on Amazon S3.
You can use FileZilla Pro to upload your data to an Amazon S3 bucket, and then use FileZilla Pro within the WorkSpaces instance to download the files.
Problem:
We need to perform a task in which we have to transfer all files (CSV format) stored in an AWS S3 bucket to an on-premises LAN folder using Lambda functions. This will be a scheduled task carried out every hour, and on each run the files will again be transferred from S3 to the on-premises LAN folder, replacing the existing ones. These files are not large (typically under a few MB).
I have not been able to find an AWS managed service to accomplish this task.
I am a newbie to AWS, so any solution to this problem is most welcome.
Thanks,
Actually, I am looking for a solution that lets me push S3 files to the on-premises folder automatically.
For that you need to make the on-premises network visible to the logic (Lambda or whatever else) that is "pushing" the content. The default solution is an AWS Site-to-Site VPN.
There are multiple options for setting up the VPN; you can choose based on your needs.
The on-premises network will then look just like another subnet.
However, a VPN has its own complexity and cost. In most cases it is much easier to "pull" data from the on-premises environment.
To sync the data there are multiple options. As a managed service I could point to the S3 File Gateway (part of AWS Storage Gateway), which based on your description sounds like serious overkill.
Maybe you could start with a simple cron job (or a scheduled task if working with Windows) and run an AWS CLI command to sync the S3 content or just copy the specified files.
Check out the aws s3 sync command; I think it will help you accomplish this task: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/sync.html#examples
To run the AWS CLI on your computer you will need to set up credentials, and the configured account/role must have permission to do the task (e.g. access S3).
Check out AWS CLI setup here: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
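As a rough sketch of the "pull" approach described above: a small boto3 script, run hourly from cron or Task Scheduler on an on-premises machine, that copies the CSVs down and overwrites the existing copies. The bucket, prefix, and target folder below are placeholders, not values from your setup.

```python
# sync_csvs.py -- pull CSV files from an S3 prefix into a local folder,
# overwriting any existing copies (a minimal sketch, not production code).
import os
import boto3

BUCKET = "my-data-bucket"               # hypothetical bucket name
PREFIX = "exports/"                     # hypothetical prefix holding the CSVs
LOCAL_DIR = r"\\fileserver\share\csv"   # on-premises LAN folder (placeholder)

s3 = boto3.client("s3")

def pull_csvs():
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".csv"):
                continue
            target = os.path.join(LOCAL_DIR, os.path.basename(key))
            s3.download_file(BUCKET, key, target)  # replaces the existing file

if __name__ == "__main__":
    pull_csvs()
```

Schedule it hourly (a cron entry such as 0 * * * * python /path/to/sync_csvs.py, or an hourly task in Windows Task Scheduler). The aws s3 sync command from the link above achieves much the same thing without any code.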
Can someone explain to me what the best way is to transfer data from a hard drive on an EC2 instance (running Windows Server 2012) to an S3 bucket in the same AWS account on a daily basis?
Background idea to this:
I'm generating a .csv file for one of our business partners daily at 11:00 am and I want to deliver it to S3 (he has access to our S3 bucket).
After that he can pull it out of S3 manually or automatically whenever he wants.
Hope you can help me; I have only found manual solutions with the CLI, but no automated way to do daily transfers.
Best Regards
You can directly mount S3 buckets as drives on your EC2 instances. This way you don't even need triggers or a daily task scheduler alongside a third-party service, as the objects would be directly available in the S3 bucket.
For Linux typically you would use Filesystem in Userspace (FUSE). Take a look at this repo if you need it for Linux: https://github.com/s3fs-fuse/s3fs-fuse.
Regarding Windows, there is this tool:
https://tntdrive.com/mount-amazon-s3-bucket.aspx
If these tools don't suit you, or if you don't want to mount the S3 bucket directly, here is another option: whatever you can do with the CLI you should be able to do with the SDK. Therefore, if you can code in one of the various languages AWS Lambda supports - C#/Java/Go/PowerShell/Python/Node.js/Ruby - you could automate this with a Lambda function and a daily schedule that triggers at 11 a.m.
Hope this helps!
Create a small application that uploads your file to an S3 bucket (there are some examples here). Then use Task Scheduler to execute your application on a regular basis.
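A minimal sketch of such a small application, assuming Python with boto3; the bucket name, local path, and key layout are placeholders, and the instance needs an IAM role or credentials with s3:PutObject on the bucket.

```python
# upload_report.py -- upload the daily CSV to S3 so the partner can pull it
# (a sketch; bucket name, local path, and key layout are placeholders).
from datetime import date
import boto3

BUCKET = "partner-exchange-bucket"            # hypothetical shared bucket
LOCAL_FILE = r"C:\exports\report.csv"         # the file generated at 11:00 am
KEY = f"reports/report-{date.today():%Y-%m-%d}.csv"

s3 = boto3.client("s3")
s3.upload_file(LOCAL_FILE, BUCKET, KEY)
print(f"Uploaded {LOCAL_FILE} to s3://{BUCKET}/{KEY}")
```

Point a daily Task Scheduler job at this script shortly after the CSV is generated.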
I have about 35 GB of photos in my Amazon Cloud Drive. I'm trying to run a convolutional neural network using the Deep Learning Linux EC2 AMI on AWS. Is there a way I could use the images in my Amazon Cloud Drive for this purpose? Maybe access them in my Python script or something?
Or is there another way of storing the 35 GB of data so that it can be used with AWS?
Interesting question, and I can see the use case. Note that S3 is the ideal location for enterprise use cases, while Amazon Drive is an alternative to Google Drive and is more of a consumer-focused B2C solution. The primary purpose of Amazon Drive is to keep photos/documents in sync across a customer's devices and, secondly, to auto-upload photos as they are taken.
Having said that, of course you can use S3 to store your images, and there are SDKs in many languages through which your EC2 instance can interact with S3.
But if you want to use Amazon Drive, you can refer to https://developer.amazon.com/public/apis/experience/cloud-drive/content/restful-api
Note that because S3 is focused on the enterprise, you have far more options and easier APIs than with Amazon Drive. You could also try uploading the images to Google Drive; it has much better APIs, and your EC2 instance can talk to the Google Drive APIs.
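If you do copy the photos into S3, reading them from a training script is straightforward. A minimal sketch with boto3 and Pillow (assumed to be available on the Deep Learning AMI); the bucket and key are placeholders:

```python
# read_image_from_s3.py -- load one photo straight from S3 into memory
# inside the training script (a sketch; bucket and key are placeholders).
import io
import boto3
from PIL import Image

BUCKET = "my-photo-bucket"    # hypothetical bucket the photos were copied into
KEY = "photos/cat_0001.jpg"   # hypothetical object key

s3 = boto3.client("s3")
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
img = Image.open(io.BytesIO(body))
print(img.size)
```

For 35 GB it is often simpler to copy the whole bucket to the instance's disk once (for example with aws s3 sync) and let the training loop read local files.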
You cannot access Amazon Cloud Drive. Amazon does not provide public access to its only API: Amazon Cloud Drive has no WebDAV or (S)FTP support, the only supported API is a poorly documented REST API, and Amazon has not been authorizing new API keys for at least a year.
Even if you got a whitelisted access key a year ago or earlier, the API and the service itself are so awkward and buggy that they create more problems than they solve: constant TooManyRequests errors even if you haven't sent any requests all day, and weird undocumented errors.
I'm trying to figure out which AWS services I need for the mobile application I'm working on with my startup. The application should go into the App Store / Play Store later this year, so we need a "best practice" solution for our case. It must be highly scalable, so that if there are thousands of requests to the server it remains stable and fast. We may also want to deploy a website on it.
Currently we are using Uberspace (link) servers with a Node.js application and MongoDB running on them. Everything works fine, but for the release version we want to go with AWS. What we need is something we can run Node.js / MongoDB (or something similar to MongoDB) on, and something to store images like profile pictures that can be requested by the users.
I have already read some information about AWS on their website, but that didn't help a lot. There are so many services and we don't know which of them fit our needs.
A friend told me to just use AWS EC2 for the Node.js server + MongoDB and S3 to store images, but on some websites I have read that it is better to use this architecture:
We would be glad if there is someone who can share his/her knowledge with us!
To run code: you can use Lambda, but be careful: the benefit is that you don't have to worry about servers; the downside is that Lambda is sometimes unreasonably slow. If you need it really fast, then you need EC2 with auto scaling. If you tune it up properly it works like a charm.
To store data: DynamoDB if you want it really fast (single-digit milliseconds regardless of load and DB size) and built according to best practices. It REQUIRES a proper schema or it will cost you a fortune; otherwise use MongoDB on EC2.
If you need an RDBMS, then RDS (benefits: scalability, availability, no maintenance headaches).
Cache: they have both Redis and Memcached (ElastiCache).
S3: to store static assets.
I do not suggest CloudFront; there are other CDNs on the market with better prices/capabilities.
API Gateway: yes, if you have an API.
Depending on your app, you may need SQS.
Cognito is a good service if you want to authenticate your users via Google/Facebook/etc.
CloudWatch: if you're a metrics addict then it's not for you, and a standalone monitoring stack on EC2 may be better; but for most people CloudWatch is absolutely OK. Create all the necessary alarms (CPU overload etc.).
You should use IAM roles to allow access to your S3/DB from Lambda/other AWS services.
You should not use the root account, but create a separate IAM user instead.
Create a billing alarm: you'll know if you're about to break your budget.
Create Lambda functions to back up your EBS volumes (and whatever else you may need to back up). There's no problem if the backup starts a second late, so Lambda is OK here (see the sketch after this list).
Run Trusted Advisor now and then.
It would be better for you to set this up using a CloudFormation stack: you'll be able to deploy the same infrastructure with ease in another region if/when needed, and infrastructure-as-code is easier to manage than infrastructure built manually.
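Here is a minimal sketch of such an EBS backup function, assuming Python/boto3, a CloudWatch Events (EventBridge) schedule as the trigger, and a hypothetical "Backup=true" tag marking the volumes to snapshot; the execution role needs ec2:DescribeVolumes and ec2:CreateSnapshot.

```python
# lambda_backup_ebs.py -- snapshot every EBS volume tagged Backup=true
# (a sketch; the tag convention is an assumption, not an AWS default).
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
    )["Volumes"]
    for vol in volumes:
        snap = ec2.create_snapshot(
            VolumeId=vol["VolumeId"],
            Description="Scheduled backup of " + vol["VolumeId"],
        )
        print("Started snapshot", snap["SnapshotId"], "for", vol["VolumeId"])
```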
If you want a highly scalable application, you may need to use a serverless architecture with AWS Lambda.
There is a framework called Serverless that helps you manage and organize all your Lambda functions and put them behind Amazon API Gateway.
For storage you can use AWS EC2 and install MongoDB, or you can go with AWS DynamoDB as your NoSQL storage.
If you want a frontend, both web and mobile, you may want to look at the React Native approach.
I hope I've been helpful.
I have a service hosted on Amazon Web Services. There I have multiple EC2 instances running with the exact same setup and data, managed by an Elastic Load Balancer and scaling groups.
Those instances are web servers running web applications based on PHP. So currently there are the very same files etc. placed on every instance. But when the ELB / scaling group launches a new instance based on load rules etc., the files might not be up-to-date.
Additionally, I'd rather use a shared file system for PHP sessions etc. than sticky sessions.
So, my question is: for those reasons, and maybe more coming up in the future, I would like to have a shared file system that I can attach to my EC2 instances.
What way would you suggest to resolve this? Are there any solutions offered by AWS directly, so I can rely on their services rather than doing it on my own with DRBD and so on? What is the easiest approach? DRBD, NFS, ...? Is S3 also feasible for these purposes?
Thanks in advance.
As mentioned in a comment, AWS has announced EFS (http://aws.amazon.com/efs/) a shared network file system. It is currently in very limited preview, but based on previous AWS services I would hope to see it generally available in the next few months.
In the meantime there are a couple of third party shared file system solutions for AWS such as SoftNAS https://aws.amazon.com/marketplace/pp/B00PJ9FGVU/ref=srh_res_product_title?ie=UTF8&sr=0-3&qid=1432203627313
S3 is possible but not always ideal; the main blocker is that it does not natively support any filesystem protocols, so all interactions need to go via the AWS API or HTTP calls. Additionally, when looking at using it for session stores, the 'eventually consistent' model will likely cause issues.
That being said - if all you need is updated resources, you could create a simple script to run either as a cron or on startup that downloads the files from s3.
Finally in the case of static resources like css/images don't store them on your webserver in the first place - there are plenty of articles covering the benefit of storing and accessing static web resources directly from s3 while keeping the dynamic stuff on your server.
From what we can tell at this point, EFS is expected to provide basic NFS file sharing on SSD-backed storage. Once available, it will be a v1.0 proprietary file system. There is no encryption and it's AWS-only. The data is completely under AWS control.
SoftNAS is a mature, proven advanced ZFS-based NAS Filer that is full-featured, including encrypted EBS and S3 storage, storage snapshots for data protection, writable clones for DevOps and QA testing, RAM and SSD caching for maximum IOPS and throughput, deduplication and compression, cross-zone HA and a 100% up-time SLA. It supports NFS with LDAP and Active Directory authentication, CIFS/SMB with AD users/groups, iSCSI multi-pathing, FTP and (soon) AFP. SoftNAS instances and all storage is completely under your control and you have complete control of the EBS and S3 encryption and keys (you can use EBS encryption or any Linux compatible encryption and key management approach you prefer or require).
The ZFS filesystem is a proven filesystem that is trusted by thousands of enterprises globally. Customers are running more than 600 million files in production on SoftNAS today - ZFS is capable of scaling into the billions.
SoftNAS is cross-platform, and runs on cloud platforms other than AWS, including Azure, CenturyLink Cloud, Faction cloud, VMware vSphere/ESXi, VMware vCloud Air and Hyper-V, so your data is not limited to or locked into AWS. More platforms are planned. It provides cross-platform replication, making it easy to migrate data between any supported public cloud, private cloud, or premises-based data center.
SoftNAS is backed by industry-leading technical support from cloud storage specialists (it's all we do), something you may need or want.
Those are some of the more noteworthy differences between EFS and SoftNAS. For a more detailed comparison chart:
https://www.softnas.com/wp/nas-storage/softnas-cloud-aws-nfs-cifs/how-does-it-compare/
If you are willing to roll your own HA NFS cluster, and be responsible for its care, feeding and support, then you can use Linux and DRBD/corosync or any number of other Linux clustering approaches. You will have to support it yourself and be responsible for whatever happens.
There's also GlusterFS. It does well up to 250,000 files (in our testing) and has been observed to suffer from an IOPS brownout when approaching 1 million files, and IOPS blackouts above 1 million files (according to customers who have used it). For smaller deployments it reportedly works reasonably well.
Hope that helps.
CTO - SoftNAS
For keeping your webserver sessions in sync you can easily switch to Redis or Memcached as your session handler. This is a simple setting in php.ini, and all servers can then use the same Redis or Memcached server for sessions. You can use Amazon ElastiCache, which will manage the Redis or Memcached instance for you.
http://phpave.com/redis-as-a-php-session-handler/ <- explains how to set up Redis with PHP pretty easily
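The relevant php.ini settings look roughly like this, assuming the phpredis extension is installed; the endpoint is a placeholder for your own ElastiCache node:

```ini
; php.ini -- use Redis for PHP sessions (sketch; endpoint is hypothetical)
session.save_handler = redis
session.save_path = "tcp://my-sessions.abc123.0001.use1.cache.amazonaws.com:6379"
```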
Keeping your files in sync is a little bit more complicated.
How do I push new code changes to all my webservers?
You could use Git. When you deploy, you can set up multiple servers as remotes and push your branch (master) to all of them, so every new build goes out to every webserver.
What about new machines that launch?
I would set up new machines to run an rsync script against a trusted source, your master web server. That way they sync their web folders with the master when they boot and will be identical even if the AMI had old web files in it.
What about files that change and need to be live updated?
Store any user-uploaded files in S3. If a user uploads a document on server 1, the file is stored in S3 and its location is stored in a database. Then if a different user is on server 2, he can see the same file and access it as if it were on server 2: the file is retrieved from S3 and served to the client.
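As a rough sketch of that flow with boto3 (bucket, key, and paths are placeholders; the presigned URL is just one way to serve the object back to the client):

```python
# shared_uploads.py -- store a user upload in S3 and serve it from any webserver
# (a sketch; bucket, key, and file paths are hypothetical).
import boto3

BUCKET = "my-user-uploads"      # hypothetical bucket shared by all webservers
KEY = "uploads/42/report.pdf"   # e.g. derived from the user id and filename

s3 = boto3.client("s3")

# Server 1: store the uploaded file and record BUCKET/KEY in the database.
s3.upload_file("/tmp/report.pdf", BUCKET, KEY)

# Server 2: look up BUCKET/KEY in the database and hand the client a
# time-limited presigned URL so it can fetch the file straight from S3.
url = s3.generate_presigned_url(
    "get_object", Params={"Bucket": BUCKET, "Key": KEY}, ExpiresIn=3600
)
print(url)
```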
GlusterFS is also an open source distributed file system used by many to create shared storage across EC2 instances.
Until Amazon EFS hits production the best approach in my opinion is to build a storage backend exporting NFS from EC2 instances, maybe using Pacemaker/Corosync to achieve HA.
You could create an EBS volume that stores the files and instruct Pacemaker to unmount/detach it and then attach/mount the EBS volume on the healthy NFS cluster node.
Hi, we currently use a product called SoftNAS in our AWS environment. It allows us to choose between both EBS- and S3-backed storage. It has built-in replication as well as a high-availability option. It may be something you can check out. I believe they offer a free trial you can try out on AWS.
We are using ObjectiveFS and it is working well for us. It uses S3 for storage and is straight forward to set up.
They've also written a doc on how to share files between EC2 instances.
http://objectivefs.com/howto/how-to-share-files-between-ec2-instances