How do I split shard ids across EC2 instances as env variables? - amazon-web-services

I have two EC2 instances running via ECS. In code, I have a client that is instantiated with a shard_count and shard_ids. I need the list of shard_ids to be different per process/instance.
e.g.
instance1:
Client(shard_count=5, env.get(shard_ids) -> [1, 2])
instance2:
Client(shard_count=5, env.get(shard_ids) -> [3, 4, 5])
I'm assuming some kind of approach with env variables is the way to go, but I'm not sure if these are assigned at the instance level, or if there's some sophisticated approach to split up a collection across instances.
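A minimal sketch of the env-variable idea from the question, assuming each process is started with its own value (for example, through a separate ECS task definition per shard group). The variable name SHARD_IDS and the comma-separated format are assumptions, not something the question specifies.

```python
import os


def parse_shard_ids(raw):
    """Turn a comma-separated string such as "3,4,5" into [3, 4, 5]."""
    return [int(part) for part in raw.split(",") if part.strip()]


# SHARD_IDS is a hypothetical variable name; each process would get its own value.
shard_ids = parse_shard_ids(os.environ.get("SHARD_IDS", ""))
print(shard_ids)
```

The resulting list would then be passed to the client in place of env.get(shard_ids) in the example above.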

Related

Amazon web services autoscale

How to use lambda python function to name my group instances?
I want to name them in increasing order, like hello1, hello2, hello3, etc. Can anyone tell me how to use a Lambda function to name the instances in my Auto Scaling group?
I want a function that creates the instances and gives them Name tags: the first instance should be tagged "hello1", the second "hello2", and so on. If an instance is terminated, say hello2, and the Auto Scaling group (minimum size 2) launches a replacement, the new instance should be named hello2.
One way to do this would be to write a script that gets executed when the instance is started. Put the script in the User Data that automatically gets run when an instance starts.
The script would:
Call DescribeInstances() to obtain a list of EC2 instances
Filter the list down to the instances within the Auto Scaling group
Count the number of instances (including itself)
Perform the necessary logic to figure out which number should be assigned
Create a Name tag on the new instance (effectively tagging itself)
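The "figure out which number" step can be sketched as follows. This is only one possible rule (pick the lowest number not already in use), and the function name and the "hello" prefix handling are illustrative. The list of existing names is assumed to have been collected from the Name tags returned by DescribeInstances.

```python
import re


def next_free_number(existing_names, prefix="hello"):
    """Return the lowest positive integer N such that prefix + N is unused."""
    pattern = re.compile(r"^" + re.escape(prefix) + r"(\d+)$")
    used = set()
    for name in existing_names:
        match = pattern.match(name)
        if match:
            used.add(int(match.group(1)))
    number = 1
    while number in used:
        number += 1
    return number


# Names of the other instances in the group (hypothetical sample data).
names = ["hello1", "hello4"]
print("hello" + str(next_free_number(names)))  # hello2
```

Two instances that start at the same moment could still pick the same number, so this sketch does not guarantee uniqueness.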
Please note that the numbers might not be continuous. For example:
Start 4 instances (1, 2, 3, 4)
Auto Scaling might remove instances 2 & 3
Auto Scaling might add an instance (call it #2)
The current instances are: 1, 2, 4
Bottom line: You really shouldn't get fixated over numbering instances that are ephemeral (that is, that can be removed at any time). Simply be aware of how many instances are in the Auto Scaling group. If you really do need a unique ID, use the InstanceId.

AWS - Aurora replicas

Scenario:
I have two reader-aurora replicas.
I make many calls to my system (high load)
I see only one replica working at 99.30%, but the other one is not doing anything at all.
Why? Is it because the second replica exists ONLY to take over if the first one fails? Is it not possible to make both share the load?
In your RDS console, you should be able to look at each of the 3 instances
aurora-databasecluster-xxx.cluster-yyy.us-east-1.rds.amazonaws.com:3306
zz0.yyy.us-east-1.rds.amazonaws.com:3306
zz1.yyy.us-east-1.rds.amazonaws.com:3306
If you look at the cluster tab you will see two endpoints; the second is the following:
aurora-databasecluster-xxx.cluster-ro-yyy.us-east-1.rds.amazonaws.com
Aurora lets you connect explicitly to a specific read replica through its instance endpoint. This would allow one set of read-only nodes for OLTP performance and another set for data analysis, with long-running queries that won't impact performance.
If you use the -ro endpoint, it should balance across all read-only nodes, or you can have your code take a list of read-only connection strings and do your own randomizer. I would have expected the -ro endpoint to be better, but I am not yet familiar with its load-balancing technique (fewest connections, round robin, etc.).
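A rough sketch of the "do your own randomizer" option. The hostnames reuse the placeholder instance endpoints listed above; which of them are actually readers is an assumption, and pick_replica is a hypothetical helper name.

```python
import random

# Placeholder endpoints from the list above; assumed here to be the read replicas.
READ_REPLICAS = [
    "zz0.yyy.us-east-1.rds.amazonaws.com",
    "zz1.yyy.us-east-1.rds.amazonaws.com",
]


def pick_replica(endpoints, rng=random):
    """Return one endpoint chosen uniformly at random for a new connection."""
    if not endpoints:
        raise ValueError("no read replica endpoints configured")
    return rng.choice(endpoints)


host = pick_replica(READ_REPLICAS)
print(host)
```

The choice is made once per new connection, so a long-lived connection pool would need to call it every time it opens a connection for the load to spread.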

Automatic creation of snapshots using AWS Lambda

I have completed the automatic creation of snapshots using the following link :
https://blog.powerupcloud.com/2016/02/15/automate-ebs-snapshots-using-lambda-function/
As written in the code, filtering is done based on the tags of the VMs. Instead of tagging each VM with a Backup or backup tag, I want to create snapshots of all VMs except those with certain names.
I do not want to add extra tags to VMs. Instead, I want to write an if condition in my filters. I would provide the names of my Test VMs and if the VM tag matches that, snapshot would not be created. If it does not match, snapshots have to be created. Can I do that?
Ex : I have four VMs in my account.
VM 1 --> Prod1,
VM 2 --> Prod2,
VM 3 --> Prod3,
VM 4 --> Test1.
According to the example, I need to be able to write an if condition that checks for my test VM's tag, 'Test1'. If the tag matches, the snapshot should not be created; if it does not match, the snapshot should be created.
So, for doing this, how should I change my code?
You just need to create a tag for all your three servers with key 'Backup'. The script is filtering the instances on the key names only.
The piece of code that picks up which VMs need to be backed up is this:
reservations = ec.describe_instances(
    Filters=[
        {'Name': 'tag-key', 'Values': ['Backup', 'True']},
    ]
).get(
    'Reservations', []
)
As you can see, it uses boto's describe_instances, and a filter limits which instances are processed. If you would like to back up everything except the non-prod instances in your environment, consider tagging your non-prod instances with something like Backup=NO.
To back up all servers except those marked with a tag:
Get a list of all servers
Get a list of servers with the 'do not backup' flag and remove them from the first list
Do the backup
It will require two calls to describe_instances().
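The two-call approach could look roughly like this. It is a sketch, not the blog's code: the helper names are made up, the Backup=NO tag is the example value suggested above, and ec2_client is assumed to be a boto3 EC2 client (for example boto3.client("ec2")). Pagination of large result sets is not handled here.

```python
def instance_ids(ec2_client, filters=None):
    """Collect instance IDs from describe_instances, optionally filtered."""
    kwargs = {"Filters": filters} if filters else {}
    response = ec2_client.describe_instances(**kwargs)
    return {
        instance["InstanceId"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    }


def instances_to_back_up(ec2_client):
    """All instances minus those tagged Backup=NO (assumed tag convention)."""
    everything = instance_ids(ec2_client)
    excluded = instance_ids(
        ec2_client, [{"Name": "tag:Backup", "Values": ["NO"]}]
    )
    return everything - excluded
```

The returned set of IDs would then feed the existing snapshot loop in place of the tag-filtered reservations.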

Filtering ec2-instances with boto

I use tags to keep track of my EC2 instances, such as (Project, Environment). I have a use case where I need to filter only those instances that belong to a specific project and to a specific environment.
When I use a filter with boto and pass these two values, I get a result that does an OR rather than an AND of the filters, so I receive a list of instances that belong to different projects but the same environment.
Now I can use two lists and then compare the instances in each to get the desired set of instances, but is there a better way of getting this done?
Here is what I am doing:
conn = ec2.EC2Connection('us-east-1',aws_access_key_id='XXX',aws_secret_access_key='YYY')
reservations = conn.get_all_instances(filters={"tag-key":"project","tag-value":<project-name>,"tag-key":"env","tag-value":<env-name>})
instances = [i for r in reservations for i in r.instances]
Now the instance list that I am getting gives all the instances from the specified project irrespective of the environment and all the instances from the specified environment irrespective of the project.
You can use the tag:key=value syntax to do an AND search on your filters.
import boto.ec2
conn = boto.ec2.connect_to_region('us-east-1',aws_access_key_id='xx', aws_secret_access_key='xx')
reservations = conn.get_all_instances(filters={"tag:Name" : "myName", "tag:Project" : "B"})
instances = [i for r in reservations for i in r.instances]
print(instances)
See EC2 API for details
http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiReference-query-DescribeInstances.html
The problem with the syntax you used is that a Python dict has unique keys, so the second tag-key entry overwrites the first one :-(
Seb
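The dict-key point can be checked without touching AWS. In this small illustration the tag names come from the question, and the values "alpha" and "prod" are made up.

```python
# Duplicate keys in a dict literal: the last value for each key wins.
collapsed = {
    "tag-key": "project", "tag-value": "alpha",
    "tag-key": "env", "tag-value": "prod",
}
print(collapsed)  # {'tag-key': 'env', 'tag-value': 'prod'}

# The tag:<key> form gives every tag its own dict key, so both survive.
distinct = {"tag:project": "alpha", "tag:env": "prod"}
print(len(distinct))  # 2
```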
While the documentation does not specifically say what happens with multiple filters, the ORing may be by design. In this case, pass the required attributes in sequence to the function and pass in the result of the previous invocation into the next one (using the instance_ids parameter). This will restrict the results in each step with the additional filter. The attributes are then applied in sequence returning the ANDed result you desire.

Amazon RDS batch insert slower than local database?

I wrote a Node program that batch-inserts into a database and logs to the console after every completed insert.
function insert(){
    var sql = "insert into todo (user, content) values (xx, xx);" +
        "insert into todo (user, content) values (xx, xx);" +
        "insert into todo (user, content) values (xx, xx);" +
        (.... 4000 lines of insert)
    db.insert(sql,function success(){
        console.log('done');
    });
}
for(i=0;i<10000;i++){
    insert();
}
I have two setups of this:
1) Local machine to local DB.
2) Amazon EC2 Micro Instance to Amazon RDS Micro Instance in the same region.
*Both my.cnf files are left at their defaults, with only max_allowed_packet=500m set.
The result is that by the time RDS completes one insert, my local machine has completed 24 inserts. I tried upgrading my RDS to a small instance; it made no difference.
My question is: why is Amazon RDS slower in this case? Is there any solution for this?
I think the problem might be related to the micro instance performance. After some testing, we moved completely away from micro instances and switched to small instances. On the other hand, I have not found any problems regarding the RDS speed, even with small instances.
Have a look at how the EC2 micro instances work and where they should be used:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts_micro_instances.html
As I understand your test case, your setup mostly looks like the second figure, which is not a suitable workload for micro instances. Try using a small instance and compare the results there. Even if it is still slower than on your local machine, you will have comparable results then.