Filtering ec2-instances with boto - amazon-web-services

I use tags to keep track of my EC2 instances, such as (Project, Environment). I have a use case where I need to filter only those instances that belong to a specific project and to a specific environment.
When I use filters with boto and pass these two values, the result looks like an OR rather than an AND of the filters, so I receive a list of instances that belong to different projects but the same environment.
Now I can use two lists and then compare the instances in each to get the desired set of instances, but is there a better way of getting this done?
Here is what I am doing:
conn = ec2.EC2Connection('us-east-1',aws_access_key_id='XXX',aws_secret_access_key='YYY')
reservations = conn.get_all_instances(filters={"tag-key":"project","tag-value":<project-name>,"tag-key":"env","tag-value":<env-name>})
instances = [i for r in reservations for i in r.instances]
Now the instance list that I am getting gives all the instances from the specified project irrespective of the environment and all the instances from the specified environment irrespective of the project.

You can use the tag:key=value syntax to do an AND search on your filters.
import boto.ec2
conn = boto.ec2.connect_to_region('us-east-1',aws_access_key_id='xx', aws_secret_access_key='xx')
reservations = conn.get_all_instances(filters={"tag:Name" : "myName", "tag:Project" : "B"})
instances = [i for r in reservations for i in r.instances]
print(instances)
See EC2 API for details
http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiReference-query-DescribeInstances.html
The problem with the syntax you used is that a Python dict has unique keys, so the second tag-key entry overwrites the first one :-(
Seb
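The key collision described above is easy to reproduce without touching AWS. A minimal sketch, using made-up tag values ("alpha", "prod") that are not from the original question:

```python
# Duplicate keys in a dict literal: the last value given for each key wins.
# "alpha" and "prod" are placeholder values chosen for illustration.
broken = {
    "tag-key": "project", "tag-value": "alpha",
    "tag-key": "env", "tag-value": "prod",
}
print(broken)  # only one "tag-key" and one "tag-value" entry survive

# The tag:<key> form gives every tag its own dict key, so nothing is lost.
fixed = {"tag:project": "alpha", "tag:env": "prod"}
print(fixed)
```

With the broken dict, the project filter never reaches the API at all, which is why the results are not restricted by project.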

While the documentation does not specifically say what happens with multiple filters, the ORing may be by design. In that case, apply the required attributes one at a time: call the function with the first filter, then pass the instance IDs from that result into the next call (using the instance_ids parameter) together with the next filter. Each step restricts the previous result further, so the final list is the ANDed result you want.
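The sequential approach could be sketched roughly as follows. This assumes a boto 2 connection whose get_all_instances() accepts instance_ids and filters; the helper name filter_in_sequence is hypothetical. Note the early return: an empty ID list would otherwise be treated as "no restriction" on the next call.

```python
def filter_in_sequence(conn, filter_steps):
    """Apply each filter dict in turn, narrowing by the previous step's IDs.

    conn is assumed to be a boto 2 EC2 connection; filter_steps is a list
    of filter dicts such as [{"tag:Project": "B"}, {"tag:env": "prod"}].
    """
    instance_ids = None  # None means "no ID restriction" on the first call
    instances = []
    for step in filter_steps:
        reservations = conn.get_all_instances(
            instance_ids=instance_ids, filters=step)
        instances = [i for r in reservations for i in r.instances]
        if not instances:
            # Stop here: passing an empty ID list would lift the restriction.
            return []
        instance_ids = [i.id for i in instances]
    return instances
```

This costs one API call per filter, so the single-call tag:key form is usually the simpler option when it works.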

Related

If I call the EC2 client's "describe_instances" function with no MaxResults, will it return all instances?

If I call the boto3 EC2 client's describe_instances function with no MaxResults parameter, will it return all instances in the initial call? There is a parameter that allows one to specify MaxResults, but it is not required. If I don't specify this MaxResults parameter, will the response contain all instances or will it still chunk them into groups using the NextToken of the response?
The documentation says
"Describes the specified instances or all of AWS account's
instances...If you do not specify instance IDs, Amazon EC2 returns
information for all relevant instances."
But it is not clear whether I still need to expect that things could be returned in chunks if my account has a lot of instances. The MaxResults parameter can be set to "between 5 and 1000," which implies 1000 may be the default MaxResults.
If you do not specify MaxResults, then the server-side API will limit the response to either a maximum number of results/items (for example 1000) or a maximum size of response payload (e.g. 256 MB). Which it does is not typically documented, and potentially varies from API call to API call and from service to service.
If NextToken is present in the response and is not NULL, then you should re-issue the API call, with the NextToken, to get the next 'page' of results. Rinse and repeat until you have all results.
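The "rinse and repeat" loop might look like the sketch below. It takes the client as an argument (for example the object returned by boto3.client('ec2')); the function name list_all_instances is hypothetical.

```python
def list_all_instances(ec2_client, **describe_kwargs):
    """Collect every instance by re-issuing the call until NextToken is gone."""
    instances = []
    params = dict(describe_kwargs)
    while True:
        response = ec2_client.describe_instances(**params)
        for reservation in response.get("Reservations", []):
            instances.extend(reservation.get("Instances", []))
        token = response.get("NextToken")
        if not token:
            break  # no token means this was the last page
        params["NextToken"] = token  # ask for the next page
    return instances

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   all_instances = list_all_instances(boto3.client("ec2"))
```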
If you know you have only a handful of EC2 instances (say < 100), most programmers don't typically check the response's NextToken. They probably should, but they don't.
Note that the above relates to the boto3 Client interface. You can also use the describe-instances paginator.
If you are purely interested in EC2 instances within a given VPC, then you can use the VPC's instances collection. This is part of boto3's Resource-level interface. The instances are lazily-loaded and you don't need to paginate or mess with next tokens. See differences between Client and Resource.
Modified
Let us assume that we call describe_instances() without setting MaxResults.
Then, the response will contain the list of instances. There can be NextToken or not. If NextToken exists, the response is showing only some part of all instances. If NextToken is not present, then the response shows all instances.
Not setting the MaxResults does not mean that the response will show all instances.
Original
Once you receive a response from describe_instances() that has no NextToken, the result shows all instances, even if you didn't set MaxResults. You only need to check the response of describe_instances().
Or use the paginator to get all results without handling NextToken yourself. Here is my sample code for snapshots.
import boto3
# Use a separate name for the session so the boto3 module is not shadowed.
session = boto3.session.Session(region_name='ap-northeast-2')
ec2 = session.client('ec2')
page_iterator = ec2.get_paginator('describe_snapshots').paginate()
for page in page_iterator:
    for snapshot in page['Snapshots']:
        print(snapshot['SnapshotId'], snapshot['StartTime'])
This prints the ID and start time of every snapshot.
There are two options for calling describe instances:
A simple direct API call such as describe_instances, which accepts a NextToken argument: pass the token from the previous response to continue from where that response stopped. If you have only a few instances, a single call may return all of them, and in that case the response contains no NextToken value.
Using a paginator (see the command reference): once you have a paginator.paginate() object, you can iterate over it with a for loop and it will return all instances. This way we don't have to worry about MaxItems or NextToken.
Simple example that illustrates how to use a paginator
I would recommend using paginators whenever possible.
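For describe_instances specifically, a paginator-based version could look like this sketch. The client is passed in (for example boto3.client('ec2')), and the generator name iter_instances is hypothetical.

```python
def iter_instances(ec2_client, **describe_kwargs):
    """Yield every instance dict, letting the paginator follow NextToken."""
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate(**describe_kwargs):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                yield instance

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   for instance in iter_instances(boto3.client("ec2")):
#       print(instance["InstanceId"])
```

Any keyword arguments, such as Filters, are forwarded to every page request.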

List AMI using boto3 in order by time creation day?

I'm using the boto3 API to call describe_images on AWS. My AMIs all share the same name, with a suffix that AWS generates automatically.
I wonder whether there is any option to make describe_images return the list in order of creation time.
Currently I have to sort programmatically on that return dict.
Any help would be appreciated.
No. There is no capability to request the data back in a particular order.
You can use a Filter to limit the results, but not to sort the results.
You would need to programmatically sort the results to identify your desired AMI.
For reference, sorting Images by creation date can be done as follows
import boto3
def image_sort(elem):
    return elem.get('CreationDate')
ec2_client = boto3.client('ec2')
images = ec2_client.describe_images(ImageIds=["my", "ami", "ids"])
# The ec2 describe_images call returns a dict with a key of
# 'Images' that holds a list of image definitions.
images = images.get('Images')
# After getting the images, we can use python's list.sort
# method, and pass a simple function that gets the item to sort on.
images.sort(key=image_sort)
print(images)
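If only the most recent AMI is needed, the same idea can be reduced to a newest-first sort. The sketch below runs on hand-written sample data (the image IDs and dates are made up) in the shape that describe_images returns under 'Images'.

```python
from operator import itemgetter

def newest_first(images):
    """Return image dicts sorted by CreationDate, most recent first."""
    # CreationDate is an ISO 8601 string (e.g. "2019-03-01T10:00:00.000Z"),
    # so plain string comparison matches chronological order.
    return sorted(images, key=itemgetter("CreationDate"), reverse=True)

# Made-up sample data standing in for response["Images"].
sample = [
    {"ImageId": "ami-aaa", "CreationDate": "2019-01-05T08:00:00.000Z"},
    {"ImageId": "ami-bbb", "CreationDate": "2019-03-01T10:00:00.000Z"},
]
latest = newest_first(sample)[0]
print(latest["ImageId"])
```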

Amazon web services autoscale

How can I use a Python Lambda function to name the instances in my group?
I want to name them in increasing order, like hello1, hello2, hello3, etc. Can anyone tell me how to use a Lambda function to name the instances in my Auto Scaling groups?
I want a function that creates the instances and gives them Name tags: the first instance should be tagged "hello1", the second "hello2", and so on. If an instance is terminated, say hello2, and the Auto Scaling group (minimum size 2) launches a replacement, the new instance should be named hello2.
One way to do this would be to write a script that gets executed when the instance is started. Put the script in the User Data that automatically gets run when an instance starts.
The script would:
Call DescribeInstances() to obtain a list of EC2 instances
Filter the list down to the instances within the Auto Scaling group
Count the number of instances (including itself)
Perform the necessary logic to figure out which number should be assigned
Create a Name tag on the new instance (effectively tagging itself)
Please note that the numbers might not be continuous. For example:
Start 4 instances (1, 2, 3, 4)
Auto Scaling might remove instances 2 & 3
Auto Scaling might add an instance (call it #2)
The current instances are: 1, 2, 4
Bottom line: You really shouldn't get fixated over numbering instances that are ephemeral (that is, that can be removed at any time). Simply be aware of how many instances are in the Auto Scaling group. If you really do need a unique ID, use the InstanceId.
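The "figure out which number should be assigned" step can be sketched in plain Python. This assumes the Name tags of the other instances in the group have already been collected into a list; the function name next_free_name and the "hello" prefix are illustrative only.

```python
import re

def next_free_name(existing_names, prefix="hello"):
    """Return prefix + the lowest positive number not already in use."""
    pattern = re.compile(r"^%s(\d+)$" % re.escape(prefix))
    used = set()
    for name in existing_names:
        match = pattern.match(name)
        if match:
            used.add(int(match.group(1)))
    number = 1
    while number in used:
        number += 1
    return "%s%d" % (prefix, number)

# Instances 1 and 4 are still running, so the new instance takes number 2.
print(next_free_name(["hello1", "hello4"]))
```

Two instances that start at the same moment could both pick the same number, which is one more reason not to rely on these names as unique identifiers.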

Automatic creation of snapshots using AWS Lambda

I have completed the automatic creation of snapshots using the following link :
https://blog.powerupcloud.com/2016/02/15/automate-ebs-snapshots-using-lambda-function/
As written in the code, filtering is done based on tags of VMs. Instead of tagging each VM with a Backup or backup tag, I want to create snapshots of all VMs except for those with certain names.
I do not want to add extra tags to VMs. Instead, I want to write an if condition in my filters. I would provide the names of my Test VMs and if the VM tag matches that, snapshot would not be created. If it does not match, snapshots have to be created. Can I do that?
Ex : I have four VMs in my account.
VM 1 --> Prod1,
VM 2 --> Prod2,
VM 3 --> Prod3,
VM 4 --> Test1.
According to the example, I need to be able to write an if condition that checks for my test VM's tag 'Test1'. If the tag matches, the snapshot should not be created. If it does not match, the snapshot has to be created.
So, for doing this, how should I change my code?
You just need to create a tag for all your three servers with key 'Backup'. The script is filtering the instances on the key names only.
The piece of code that picks up which VMs need to be backed up is this:
reservations = ec.describe_instances(
    Filters=[
        {'Name': 'tag-key', 'Values': ['Backup', 'True']},
    ]
).get(
    'Reservations', []
)
As you can see, it uses boto's describe_instances, and a filter limits the instances that will be processed. If you would like to back up everything except the non-prod instances in your environment, you should consider tagging your non-prod instances with something like Backup=NO.
To back up all servers except those marked with a tag:
Get a list of all servers
Get a list of servers with the 'do not backup' flag and remove them from the first list
Do the backup
It will require two calls to describe_instances().
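The two-call approach could be sketched as below. It assumes the test VMs are identified by their Name tag (as in the Test1 example) and takes the boto3 EC2 client as an argument; the helper names are hypothetical, and pagination is left out for brevity.

```python
def instance_ids(ec2_client, filters=None):
    """Return the set of instance IDs matching the optional filters."""
    kwargs = {"Filters": filters} if filters else {}
    response = ec2_client.describe_instances(**kwargs)
    return {
        instance["InstanceId"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    }

def ids_to_back_up(ec2_client, excluded_names):
    """All instance IDs minus those whose Name tag is in excluded_names."""
    everything = instance_ids(ec2_client)                 # first call
    excluded = instance_ids(                              # second call
        ec2_client,
        [{"Name": "tag:Name", "Values": list(excluded_names)}],
    )
    return everything - excluded

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   ids = ids_to_back_up(boto3.client("ec2"), ["Test1"])
```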

Dataproc client : googleapiclient : method to get list of all jobs (running, stopped, etc.) in a cluster

We are using Google Cloud Dataproc to run sparkJobs.
We have a requirement to get a list of all jobs and their states for a cluster.
I can get the status of a job if I know the job_id, as below:
res = dpclient.dataproc.projects().regions().jobs().get(
    projectId=project,
    region=region,
    jobId="ab4f5d05-e890-4ff5-96ef-017df2b5c0bc").execute()
But what if I don't know the job_id and want to know the status of all the jobs?
To list jobs in a cluster, you can use the list() method:
clusterName = 'cluster-1'
res = dpclient.dataproc.projects().regions().jobs().list(
    projectId=project,
    region=region,
    clusterName=clusterName).execute()
However, note that this currently only supports listing by clusters that still exist. Even though you pass in a clusterName, it is resolved to a unique cluster_uuid under the hood. This also means that if you create multiple clusters with the same name, each incarnation is still considered a different cluster, so job listing is only performed on the currently running version of the clusterName. This is by design: clusterName is often reused by people for different purposes (especially when using the default generated names created in cloud.google.com/console), and logically the jobs submitted to different actual cluster instances may not be related to each other.
In the future there will be more filter options for job listings.