Does AWS Elasticsearch Supports force zoneawareness
“forced awareness“ functionality is available in Elastic Cloud service.
https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html
Yes, when you create a new cluster, you can pick whether the cluster should be deployed in multiple availability zones or not (usually when creating production cluster).
When you do so, AWS takes care of setting up zone awareness for you and properly distributes everything so as to minimize service disruptions. You shouldn't need to have to do it yourself and there aren't a million ways to do it properly anyway.
This is not a duplicate question. I am just confused in Iaas,Saas with respect to AWS services like Dynamo, RDS, RedShift and Kinesis etc. They helps users to create database So, should we categorize them in Iaas or Saas?
Thanks
To help you understand, SaaS is Software as a Service. It's more like an on demand application where you don't have to worry about configurations, accesses, whitelisting etc. For instance, Google Maps (or Google Apps).
IaaS or Infra as a Service gives you more flexibility in terms of spawning of nodes and clusters, to deal with security services at IP and Port levels, manage access control and authentication etc. On AWS, you may specify what all private or public IPs will have access to your system, whether you prefer to go with dense storage or dense compute nodes for your warehouse, rotate your log files etc.
A page on Amazon RDS reads -
When you buy a server, you get CPU, memory, storage, and IOPS, all
bundled together. With Amazon RDS, these are split apart so that you
can scale them independently.
So, in short... Services like AWS and Azure are mostly now either IaaS or PaaS.
When trying to use Amazon Redshift to create a datasource for my Machine Learning model, I encountered the following error when testing the access of my IAM role:
There is no '' cluster, or the cluster is not in the same region as your Amazon ML service. Specify a cluster in the same region as the Amazon ML service.
Is there anyway around this, as this would be a huge pain since all of our development team's data is stored in a region that Machine Learning doesn't work in?
That's an interesting situation to be in.
What probably you can do :
1) Wait for Amazon Web Services to support AWS ML in your preferred Region. (That's a long wait though).
2) OR what else you can do is Create a backup plan for your Redshift data.
Amazon Redshift provides you some by Default tools to back up your
cluster via snapshot to Amazon Simple Storage Service (Amazon S3).
These snapshots can be restored in any AZ in that region or
transferred automatically to other regions wherever you want (In your
case where your ML is running).
There is (Probably) no other way around to use your ML with Redshift being in different regions.
Hope it will help !
I have data in an AWS RDS, and I would like to pipe it over to an AWS ES instance, preferably updating once an hour, or similar.
On my local machine, with a local mysql database and Elasticsearch database, it was easy to set this up using Logstash.
Is there a "native" AWS way to do the same thing? Or do I need to set up an EC2 server and install Logstash on it myself?
You can achieve the same thing with your local Logstash, simply point your jdbc input to your RDS database and the elasticsearch output to your AWS ES instance. If you need to run this regularly, then yes, you'd need to setup a small instance to run Logstash on it.
A more "native" AWS solution to achieve the same thing would include the use of Amazon Kinesis and AWS Lambda.
Here's a good article explaining how to connect it all together, namely:
how to stream RDS data into a Kinesis Stream
configuring a Lambda function to handle the stream
push the data to your AWS ES instance
Take a look at Amazon DMS. Its usually used for DB migrations, however, it also supports continuous data replication. This might simplify the process and be cost-effective.
You can use AWS Database Migration Service to perform continuous data replication. Continuous data replication has a multitude of use cases including Disaster Recovery instance synchronization, geographic database distribution and Dev/Test environment synchronization. You can use DMS for both homogeneous and heterogeneous data replications for all supported database engines. The source or destination databases can be located in your own premises outside of AWS, running on an Amazon EC2 instance, or it can be an Amazon RDS database. You can replicate data from a single database to one or more target databases or data from multiple source databases can be consolidated and replicated to one or more target databases.
https://aws.amazon.com/dms/
I have a service hosted on Amazon Web Services. There I have multiple EC2 instances running with the exact same setup and data, managed by an Elastic Load Balancer and scaling groups.
Those instances are web servers running web applications based on PHP. So currently there are the very same files etc. placed on every instance. But when the ELB / scaling group launches a new instance based on load rules etc., the files might not be up-to-date.
Additionally, I'd rather like to use a shared file system for PHP sessions etc. than sticky sessions.
So, my question is, for those reasons and maybe more coming up in the future, I would like to have a shared file system entity which I can attach to my EC2 instances.
What way would you suggest to resolve this? Are there any solutions offered by AWS directly so I can rely on their services rather than doing it on my on with a DRBD and so on? What is the easiest approach? DRBD, NFS, ...? Is S3 also feasible for those intends?
Thanks in advance.
As mentioned in a comment, AWS has announced EFS (http://aws.amazon.com/efs/) a shared network file system. It is currently in very limited preview, but based on previous AWS services I would hope to see it generally available in the next few months.
In the meantime there are a couple of third party shared file system solutions for AWS such as SoftNAS https://aws.amazon.com/marketplace/pp/B00PJ9FGVU/ref=srh_res_product_title?ie=UTF8&sr=0-3&qid=1432203627313
S3 is possible but not always ideal, the main blocker being it does not natively support any filesystem protocols, instead all interactions need to be via an AWS API or via http calls. Additionally when looking at using it for session stores the 'eventually consistent' model will likely cause issues.
That being said - if all you need is updated resources, you could create a simple script to run either as a cron or on startup that downloads the files from s3.
Finally in the case of static resources like css/images don't store them on your webserver in the first place - there are plenty of articles covering the benefit of storing and accessing static web resources directly from s3 while keeping the dynamic stuff on your server.
From what we can tell at this point, EFS is expected to provide basic NFS file sharing on SSD-backed storage. Once available, it will be a v1.0 proprietary file system. There is no encryption and its AWS-only. The data is completely under AWS control.
SoftNAS is a mature, proven advanced ZFS-based NAS Filer that is full-featured, including encrypted EBS and S3 storage, storage snapshots for data protection, writable clones for DevOps and QA testing, RAM and SSD caching for maximum IOPS and throughput, deduplication and compression, cross-zone HA and a 100% up-time SLA. It supports NFS with LDAP and Active Directory authentication, CIFS/SMB with AD users/groups, iSCSI multi-pathing, FTP and (soon) AFP. SoftNAS instances and all storage is completely under your control and you have complete control of the EBS and S3 encryption and keys (you can use EBS encryption or any Linux compatible encryption and key management approach you prefer or require).
The ZFS filesystem is a proven filesystem that is trusted by thousands of enterprises globally. Customers are running more than 600 million files in production on SoftNAS today - ZFS is capable of scaling into the billions.
SoftNAS is cross-platform, and runs on cloud platforms other than AWS, including Azure, CenturyLink Cloud, Faction cloud, VMware vSPhere/ESXi, VMware vCloud Air and Hyper-V, so your data is not limited or locked into AWS. More platforms are planned. It provides cross-platform replication, making it easy to migrate data between any supported public cloud, private cloud, or premise-based data center.
SoftNAS is backed by industry-leading technical support from cloud storage specialists (it's all we do), something you may need or want.
Those are some of the more noteworthy differences between EFS and SoftNAS. For a more detailed comparison chart:
https://www.softnas.com/wp/nas-storage/softnas-cloud-aws-nfs-cifs/how-does-it-compare/
If you are willing to roll your own HA NFS cluster, and be responsible for its care, feeding and support, then you can use Linux and DRBD/corosync or any number of other Linux clustering approaches. You will have to support it yourself and be responsible for whatever happens.
There's also GlusterFS. It does well up to 250,000 files (in our testing) and has been observed to suffer from an IOPS brownout when approaching 1 million files, and IOPS blackouts above 1 million files (according to customers who have used it). For smaller deployments it reportedly works reasonably well.
Hope that helps.
CTO - SoftNAS
For keeping your webserver sessions in sync you can easily switch to Redis or Memcached as your session handler. This is a simple setting in the PHP.ini and they can all access the same Redis or Memcached server to do sessions. You can use Amazon's Elasticache which will manage the Redis or Memcache instance for you.
http://phpave.com/redis-as-a-php-session-handler/ <- explains how to setup Redis with PHP pretty easily
For keeping your files in sync is a little bit more complicated.
How to I push new code changes to all my webservers?
You could use Git. When you deploy you can setup multiple servers and it will push your branch (master) to the multiple servers. So every new build goes out to all webserver.
What about new machines that launch?
I would setup new machines to run a rsync script from a trusted source, your master web server. That way they sync their web folders with the master when they boot and would be identical even if the AMI had old web files in it.
What about files that change and need to be live updated?
Store any user uploaded files in S3. So if user uploads a document on Server 1 then the file is stored in s3 and location is stored in a database. Then if a different user is on server 2 he can see the same file and access it as if it was on server 2. The file would be retrieved from s3 and served to the client.
GlusterFS is also an open source distributed file system used by many to create shared storage across EC2 instances
Until Amazon EFS hits production the best approach in my opinion is to build a storage backend exporting NFS from EC2 instances, maybe using Pacemaker/Corosync to achieve HA.
You could create an EBS volume that stores the files and instruct Pacemaker to umount/dettach and then attach/mount the EBS volume to the healthy NFS cluster node.
Hi we currently use a product called SoftNAS in our AWS environment. It allows us to chooses between both EBS and S3 backed storage. It has built in replication as well as a high availability option. May be something you can check out. I believe they offer a free trial you can try out on AWS
We are using ObjectiveFS and it is working well for us. It uses S3 for storage and is straight forward to set up.
They've also written a doc on how to share files between EC2 instances.
http://objectivefs.com/howto/how-to-share-files-between-ec2-instances