boto EC2 find all Ubuntu images? - amazon-web-services

How do I find all Ubuntu images available in my region?
Attempt:
from boto.ec2 import connect_to_region
conn = connect_to_region(**dict(region_name=region, **aws_keys))
if not filters: # Not as default arguments, as they should be immutable
    filters = {
        'architecture': 'x86_64',
        'name': 'ubuntu/images/ebs-*'
    }
print conn.get_all_images(owners=['amazon'], filters=filters)
I've also tried setting the name filter to ubuntu/images/ebs-ssd/ubuntu-trusty-14.04-amd64-server-20140927, ubuntu*, *ubuntu and *ubuntu*.

The AWS API does not accept globs in search filters as far as I am aware. You can use the owner ID to find the images instead. 'amazon' is not the owner of the Ubuntu images; Canonical is.
Change owners=['amazon'] to owners=['099720109477'].
There is no owner alias for Canonical as far as I can see, so you will have to use the owner ID instead.
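As a minimal sketch of the adjusted call, reusing the connection setup from the question (region and aws_keys as defined there; the name glob is dropped and the result narrowed by owner and architecture instead):
from boto.ec2 import connect_to_region

conn = connect_to_region(**dict(region_name=region, **aws_keys))

# Filter by Canonical's owner ID instead of 'amazon'
images = conn.get_all_images(
    owners=['099720109477'],            # Canonical
    filters={'architecture': 'x86_64'}  # narrow further as needed
)
for image in images:
    print image.id, image.name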
Hope this helps.

Related

MWAA can retrieve variable by ID but not connection from AWS Secrets Manager

We've set up AWS Secrets Manager as a secrets backend to Airflow (AWS MWAA) as described in their documentation. Unfortunately, nowhere is it explained where the secrets are to be found and how they are then to be used. When I supply conn_id to a task in a DAG, we can see two errors in the task logs: ValueError: Invalid IPv6 URL and airflow.exceptions.AirflowNotFoundException: The conn_id redshift_conn isn't defined. What's even more surprising is that retrieving variables stored the same way with Variable.get('my_variable_id') works just fine.
The question is: Am I wrongly expecting that the conn_id can be directly passed to operators as SomeOperator(conn_id='conn-id-in-secretsmanager')? Must I retrieve the connection manually each time I want to use it? I don't want to run something like read_from_aws_sm_fn in the code below every time beforehand...
Btw, neither the connection nor the variable show up in the Airflow UI.
Having stored a secret named airflow/connections/redshift_conn (and on the side one airflow/variables/my_variable_id), I expect the connection to be found and used when constructing RedshiftSQLOperator(task_id='mytask', redshift_conn_id='redshift_conn', sql='SELECT 1'). But this results in the above error.
I am able to retrieve the redshift connection manually in a DAG with a separate task, but I think that is not how SecretsManager is supposed to be used in this case.
The example DAG is below:
from airflow import DAG, settings, secrets
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
from airflow.models.baseoperator import chain
from airflow.models import Connection, Variable
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
from datetime import timedelta
sm_secret_id_name = f'airflow/connections/redshift_conn'
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(1),
    'retries': 1,
}

def read_from_aws_sm_fn(**kwargs):  # from AWS example code
    ### set up Secrets Manager
    hook = AwsBaseHook(client_type='secretsmanager')
    client = hook.get_client_type('secretsmanager')
    response = client.get_secret_value(SecretId=sm_secret_id_name)
    myConnSecretString = response["SecretString"]
    print(myConnSecretString[:15])
    return myConnSecretString

def get_variable(**kwargs):
    my_var_value = Variable.get('my_test_variable')
    print('variable:')
    print(my_var_value)
    return my_var_value

with DAG(
    dag_id=f'redshift_test_dag',
    default_args=default_args,
    dagrun_timeout=timedelta(minutes=10),
    start_date=days_ago(1),
    schedule_interval=None,
    tags=['example']
) as dag:
    read_from_aws_sm_task = PythonOperator(
        task_id="read_from_aws_sm",
        python_callable=read_from_aws_sm_fn,
        provide_context=True
    )  # works fine

    query_redshift = RedshiftSQLOperator(
        task_id='query_redshift',
        redshift_conn_id='redshift_conn',
        sql='SELECT 1;'
    )  # results in above errors :-(

    try_to_get_variable_value = PythonOperator(
        task_id='get_variable',
        python_callable=get_variable,
        provide_context=True
    )  # works fine!
The question is: Am I wrongly expecting that the conn_id can be directly passed to operators as SomeOperator(conn_id='conn-id-in-secretsmanager')? Must I retrieve the connection manually each time I want to use it? I don't want to run something like read_from_aws_sm_fn in the code below every time beforehand...
Using Secrets Manager as a backend, you don't need to change the way you use connections or variables. They work the same way: when looking up a connection/variable, Airflow follows a search path (secrets backend first, then environment variables, then the metastore).
Btw, neither the connection nor the variable show up in the Airflow UI.
The connection/variable will not show up in the UI, because it is not stored in Airflow's metastore database.
ValueError: Invalid IPv6 URL and airflow.exceptions.AirflowNotFoundException: The conn_id redshift_conn isn't defined
The 1st error is related to the secret itself, and the 2nd error is due to the connection not being found by Airflow at all.
There are two formats for storing connections in Secrets Manager, depending on the Amazon provider version installed (a connection URI string or a JSON dictionary); the IPv6 URL error could mean the stored secret is not being parsed as a connection correctly. The provider docs describe both formats.
The first step is defining the prefixes for connections and variables; if they are not defined, your secrets backend will not check for the secret:
secrets.backend_kwargs : {"connections_prefix" : "airflow/connections", "variables_prefix" : "airflow/variables"}
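For completeness, the backend class itself also has to be set next to those kwargs (in MWAA both are Airflow configuration options on the environment); the class path below comes from the Amazon provider package:
secrets.backend : airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend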
Then for the secrets/connections, you should store them in those prefixes, respecting the required fields for the connection.
For example, for the connection my_postgress_conn:
{
    "conn_type": "postgresql",
    "login": "user",
    "password": "pass",
    "host": "host",
    "extra": "{\"key\": \"val\"}"
}
You should store it at the path airflow/connections/my_postgress_conn, with the JSON dict as a string.
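Alternatively, in the URI format, the same connection could be stored as a single string; the example below mirrors the JSON above with placeholder host, port and schema, and extras become query parameters:
postgresql://user:pass@host:5432/schema?key=val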
And for the variables, you just need to store them in airflow/variables/<var_name>.
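As a rough sketch of how such secrets could be created with the AWS CLI (the names follow the prefixes above and the values are placeholders mirroring the examples, not commands for your exact setup):
aws secretsmanager create-secret \
    --name airflow/connections/my_postgress_conn \
    --secret-string '{"conn_type": "postgresql", "login": "user", "password": "pass", "host": "host", "extra": "{\"key\": \"val\"}"}'
aws secretsmanager create-secret \
    --name airflow/variables/my_variable_id \
    --secret-string 'my-value'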

AWS CLI Query to find Shared Security Groups

I am trying to write a query to return results of any referenced security group not owned by the current account.
This means I am trying to show security groups that are being used as part of a peering connection from another VPC.
There are a couple of restrictions.
Show the entire security group details (security group id, description)
Only show security groups where IpPermissions.UserIdGroupPairs has a Value and where that value is not equal to the owner of the security group
I am trying to write this using a single AWS CLI cmd vs a bash script or python script.
Any thoughts?
Here's what I have so far.
aws ec2 describe-security-groups --query "SecurityGroups[?IpPermissions.UserIdGroupPairs[*].UserId != '`aws sts get-caller-identity --query 'Account' --output text`']"
The following is a Python 3.8 based AWS Lambda function, but you can change it a bit to run as a plain Python script on any supported host machine.
import boto3
import json

config_service = boto3.client('config')

# Refactor to extract out duplicate code as a separate def
def lambda_handler(event, context):
    results = get_resource_details()
    for resource in results:
        # Each result is returned by AWS Config as a JSON string, so parse it first
        parsed = json.loads(resource)
        if "configuration" in parsed:
            config = parsed["configuration"]
            if "ipPermissionsEgress" in config:
                ipPermissionsEgress = config["ipPermissionsEgress"]
                for data in ipPermissionsEgress:
                    for userIdGroupPair in data["userIdGroupPairs"]:
                        if userIdGroupPair["userId"] != "123456789111":  # replace with your own account id
                            print(userIdGroupPair["groupId"])
            elif "ipPermissions" in config:
                ipPermissions = config["ipPermissions"]
                for data in ipPermissions:
                    for userIdGroupPair in data["userIdGroupPairs"]:
                        if userIdGroupPair["userId"] != "123456789111":  # replace with your own account id
                            print(userIdGroupPair["groupId"])

def get_resource_details():
    query = "SELECT configuration.ipPermissions.userIdGroupPairs.groupId,configuration.ipPermissionsEgress.userIdGroupPairs.groupId,configuration.ipPermissionsEgress.userIdGroupPairs.userId,configuration.ipPermissions.userIdGroupPairs.userId WHERE resourceType = 'AWS::EC2::SecurityGroup' AND configuration <> 'null'"
    results = config_service.select_resource_config(
        Expression=query,
        Limit=100
    )  # you might need to refactor to add support for a huge list of records using NextToken
    return results["Results"]
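If AWS Config is not enabled in the account, a similar check can be done directly against the EC2 API. This is a rough, untested sketch (not part of the answer above) that uses boto3 and STS to look up the current account ID and prints security groups referencing groups from other accounts:
import boto3

ec2 = boto3.client('ec2')
account_id = boto3.client('sts').get_caller_identity()['Account']

paginator = ec2.get_paginator('describe_security_groups')
for page in paginator.paginate():
    for sg in page['SecurityGroups']:
        # Collect referenced security groups owned by a different account
        foreign_refs = [
            pair for rule in sg.get('IpPermissions', []) + sg.get('IpPermissionsEgress', [])
            for pair in rule.get('UserIdGroupPairs', [])
            if pair.get('UserId') and pair['UserId'] != account_id
        ]
        if foreign_refs:
            print(sg['GroupId'], sg.get('Description', ''), foreign_refs)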

Does the GCP API support filtering list objects? - google-cloud-platform

I'm trying to filter instances from a specific network and I tried to use this filter:
networkInterfaces.network = 'NET_NAME'
And I got invalid expression.
I even tried the following - same results for all of them:
networkInterfaces[0].network = 'NET_NAME'
networkInterfaces[].network = 'NET_NAME'
[]networkInterfaces.network = 'NET_NAME'
I can't find anywhere that says whether this is even a supported filter.
I tried running these filters in their API explorer:
https://cloud.google.com/compute/docs/reference/rest/v1/instances/aggregatedList?
And also via their official Python client
You can parse the response and filter the required instances like below
import googleapiclient.discovery
project = "my-gcp-project"
zone = "us-central1-b"
network_name = "mynetwork"
compute = googleapiclient.discovery.build('compute', 'v1')
result = compute.instances().list(project=project, zone=zone).execute()
filtered_instances = []
for item in result['items']:
if "networkInterfaces" in item.keys():
for network_interface in item['networkInterfaces']:
if "network" in network_interface.keys():
if network_name in network_interface['network']:
filtered_instances.append(item['name'])
print(str(filtered_instances))
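The snippet above lists a single zone; if you need the same check across all zones, the aggregatedList method the question links to can be used in much the same way. This is an untested sketch reusing the same compute, project and network_name variables:
# Aggregated listing returns instances grouped by zone/region scope
agg = compute.instances().aggregatedList(project=project).execute()
all_zone_matches = []
for scope, data in agg.get('items', {}).items():
    for item in data.get('instances', []):  # scopes with no instances carry a 'warning' instead
        for network_interface in item.get('networkInterfaces', []):
            if network_name in network_interface.get('network', ''):
                all_zone_matches.append(item['name'])
print(all_zone_matches)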
Hope this helps.
Basically, filtering for 'Network tags' to list instances is not supported in this API. To view the list of supported filter/fields, you can click on 'Filter VM Instances' on your GCP console's main page showing VM instances. Unfortunately, the list does not include 'Network tags' to filter on. Alternatively, you can execute the following command in a cloud shell to list all instances with their network tags:
$ gcloud compute instances list --filter=tags:TAG_NAME
You can see this example in this documentation.

How to use Filters with boto3 VPC endpoint services?

I need to get the VPC endpoint service ID from a Python script, but I don't understand how to use boto3 Filters with a vpc-id or a subnet.
How do I use Filters?
This is the relevant part of the boto3 documentation:
> (dict) --
A filter name and value pair that is used to return a more specific list of results from a describe operation. Filters can be used to match a set of resources by specific criteria, such as tags, attributes, or IDs. The filters supported by a describe operation are documented with the describe operation. For example:
DescribeAvailabilityZones
DescribeImages
DescribeInstances
DescribeKeyPairs
DescribeSecurityGroups
DescribeSnapshots
DescribeSubnets
DescribeTags
DescribeVolumes
DescribeVpcs
Name (string) --
The name of the filter. Filter names are case-sensitive.
Values (list) --
The filter values. Filter values are case-sensitive.
(string) --
The easiest method would be to call it with no filters, and observe what comes back:
import boto3
ec2_client = boto3.client('ec2', region_name='ap-southeast-2')
response = ec2_client.describe_vpc_endpoint_services()
for service in response['ServiceDetails']:
    print(service['ServiceId'])
You can then either filter the results within your Python code, or use the Filters capability of the Describe command.
Feel free to print(response) to see the data that comes back.
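For example, a sketch of the Filters capability on the same client, assuming you already know the service name you are after (the service name below is just a placeholder):
response = ec2_client.describe_vpc_endpoint_services(
    Filters=[
        {'Name': 'service-name', 'Values': ['com.amazonaws.ap-southeast-2.s3']}
    ]
)
for service in response['ServiceDetails']:
    print(service['ServiceId'], service['ServiceName'])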
It depends on what you want to filter the results with. In my case, I use the snippet below to filter for a specific vpc-endpoint-id.
import boto3
vpc_client = boto3.client('ec2')
vpcEndpointId = "vpce-###"
vpcEndpointDetails = vpc_client.describe_vpc_endpoints(
    VpcEndpointIds=[vpcEndpointId],
    Filters=[
        {
            'Name': 'vpc-endpoint-id',
            'Values': [vpcEndpointId]
        },
    ])
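Since the question mentions filtering by VPC, a similar call can be filtered by vpc-id instead; this is a sketch with a placeholder VPC ID, and each matching endpoint exposes the service name it points at:
response = vpc_client.describe_vpc_endpoints(
    Filters=[
        {'Name': 'vpc-id', 'Values': ['vpc-0123456789abcdef0']}
    ]
)
for endpoint in response['VpcEndpoints']:
    print(endpoint['VpcEndpointId'], endpoint['ServiceName'])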

Configuring munin server for use with AWS autoscaling?

I am planning to use AWS autoscaling groups for my webservers. As a monitoring solution I am using munin at the moment. In the configuration file on the munin master server, you have to give IP addresses or host names for every host you want to monitor.
Now with autoscaling the number of instances will change frequently, and writing static information in the munin config does not seem to fit well in this environment. I could probably query all server addresses I want to monitor and write the munin master configuration file then, but this seems not like a good approach to me.
What is the preferred way of using munin in such an environment? Does someone use munin with autoscaling?
In general I would like to keep using munin and not switch to another monitoring solution because I wrote quite a lot of specific plugins that I rely on. However if you have another monitoring solution that will probably let me keep my plugins I am also open for that.
One year ago we used munin as an alternative monitoring system, and I will tell you one thing: I don't like it at all.
We had some automation for the autoscaling system in nagios too, but that is also an ugly way to monitor a large number of AWS instances, because nagios starts to lag/crash beyond a certain number of monitored instances.
If you have more than 150-200 instances to monitor, I suggest you use a commercial service like StackDriver or one of its alternatives.
I stumbled across this old topic because I was looking for a solution to the same problem. Finally I found a way that works for me which I would like to share with you. The tl;dr summary:
use the AWS Python API to get all instances in the same VPC the munin master is in
test if munin port 4949 is open on the instances found to detect munin nodes
create munin.conf from a munin.base.conf (without nodes) and append entries for all the nodes found
run the script on the munin master every 5 minutes via cron
Finally, here is my Python script which does all the magic:
#! /usr/bin/python
import boto3
import requests
import argparse
import shutil
import socket

socketTimeout = 2

ec2 = boto3.client('ec2')

def getVpcId():
    # Look up the VPC of the instance this script runs on via the metadata service
    response = requests.get('http://169.254.169.254/latest/meta-data/instance-id')
    instance_id = response.text
    response = ec2.describe_instances(
        Filters=[
            {
                'Name': 'instance-id',
                'Values': [instance_id]
            }
        ]
    )
    return response['Reservations'][0]['Instances'][0]['VpcId']

def findNodes(tag):
    # Find all instances in the same VPC that carry the given tag key
    result = []
    vpcId = getVpcId()
    response = ec2.describe_instances(
        Filters=[
            {
                'Name': 'tag-key',
                'Values': [tag]
            },
            {
                'Name': 'vpc-id',
                'Values': [vpcId]
            }
        ]
    )
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            result.append(instance)
    return result

def getInstanceTag(instance, tagName):
    for tag in instance['Tags']:
        if tag['Key'] == tagName:
            return tag['Value']
    return None

def isMuninNode(host):
    # A host counts as a munin node if it accepts connections on port 4949
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(socketTimeout)
    try:
        s.connect((host, 4949))
        s.shutdown(socket.SHUT_RDWR)
        return True
    except Exception:
        return False
    finally:
        s.close()

def appendNodesToConfig(nodes, target, tag):
    with open(target, "a") as file:
        for node in nodes:
            hostname = getInstanceTag(node, tag)
            if hostname is None:  # skip instances without the tag
                continue
            if hostname.endswith('.'):
                hostname = hostname[:-1]
            if isMuninNode(hostname):
                file.write('[' + hostname + ']\n')
                file.write('\taddress ' + hostname + '\n')
                file.write('\tuse_node_name yes\n\n')

parser = argparse.ArgumentParser("muninconf.py")
parser.add_argument("baseconfig", help="base munin config to append nodes to")
parser.add_argument("target", help="target munin config")
args = parser.parse_args()
base = args.baseconfig
target = args.target

shutil.copyfile(base, target)
nodes = findNodes('CNAME')
appendNodesToConfig(nodes, target, 'CNAME')
For the API calls to work you have to set up AWS API credentials or assign an IAM role with the required permissions (ec2:DescribeInstances as a bare minimum) to your munin master instance (which is my preferred method).
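To run it every 5 minutes via cron as mentioned in the summary above, a crontab entry on the munin master could look roughly like this (the script and config paths are assumptions for illustration):
*/5 * * * * /usr/local/bin/muninconf.py /etc/munin/munin.base.conf /etc/munin/munin.conf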
Some final implementation notes:
I have a tag named CNAME assigned to all my AWS instances which holds the internal DNS host name. Therefore I filter for this tag and use the value as the node name and address for the munin configuration. You probably have to change this for your setup.
Another option would be to assign a specific tag to all the instances you want to monitor with munin. You could then filter for this tag and probably also skip the check for the open munin port.
Hope this is of some help.
Cheers,
Oliver