I tried to add a DBClusterParameterGroup for the Aurora PostgreSQL engine. For that I used the code below.
db_cluster_params = self.add_resource(
    rds.DBClusterParameterGroup(
        "CustomAuroraRDSClusterParamGroup",
        Description="Aurora RDS Cluster Parameter Group",
        Family="aurora-postgresql13",
        Parameters={
            "time_zone": "UTC"
        }
    )
)
When I executed the above, I got the below error.
Invalid / Unmodifiable / Unsupported DB Parameter: time_zone
But based on [1] time_zone is a valid parameter. Could you please tell me what is the issue in my script?
[1] https://docs.amazonaws.cn/en_us/AWSCloudFormation/latest/UserGuide/aws-resource-rds-dbclusterparametergroup.html
Before applying parameters to a given Family of a DBClusterParameterGroup, we need to check the available parameters for that family.
You can use the following command for that.
aws rds describe-db-cluster-parameters --db-cluster-parameter-group-name default.aurora-postgresql13
Based on the output of the above, the troposphere code should be changed like this (the valid parameter name is timezone, not time_zone):
db_cluster_params = self.add_resource(
    rds.DBClusterParameterGroup(
        "CustomAuroraRDSClusterParamGroup{}".format(identifier),
        Description="Aurora RDS Cluster Parameter Group",
        Family="aurora-postgresql13",
        Parameters={
            "timezone": "UTC"
        }
    )
)
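If you prefer to check the family's parameters from Python rather than the CLI, a minimal boto3 sketch along these lines should work (it mirrors the CLI call above against the default.aurora-postgresql13 group and only uses describe_db_cluster_parameters; treat it as a sketch, not code from the question):

import boto3

rds = boto3.client('rds')

# Collect the names of all modifiable parameters of the family's default group,
# following the Marker to fetch every page of results
modifiable = []
marker = None
while True:
    kwargs = {'DBClusterParameterGroupName': 'default.aurora-postgresql13'}
    if marker:
        kwargs['Marker'] = marker
    page = rds.describe_db_cluster_parameters(**kwargs)
    for param in page['Parameters']:
        if param.get('IsModifiable'):
            modifiable.append(param['ParameterName'])
    marker = page.get('Marker')
    if not marker:
        break

# 'timezone' should appear in the list; 'time_zone' should not
print('timezone' in modifiable, 'time_zone' in modifiable)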
We've set up AWS Secrets Manager as a secrets backend for Airflow (AWS MWAA) as described in their documentation. Unfortunately, it is nowhere explained where the secrets are to be found and how they are to be used. When I supply conn_id to a task in a DAG, we see two errors in the task logs: ValueError: Invalid IPv6 URL and airflow.exceptions.AirflowNotFoundException: The conn_id redshift_conn isn't defined. What's even more surprising is that retrieving variables stored the same way with Variable.get('my_variable_id') works just fine.
The question is: Am I wrongly expecting that the conn_id can be directly passed to operators as SomeOperator(conn_id='conn-id-in-secretsmanager')? Must I retrieve the connection manually each time I want to use it? I don't want to run something like read_from_aws_sm_fn in the code below every time beforehand...
Btw, neither the connection nor the variable show up in the Airflow UI.
Having stored a secret named airflow/connections/redshift_conn (and on the side one airflow/variables/my_variable_id), I expect the connection to be found and used when constructing RedshiftSQLOperator(task_id='mytask', redshift_conn_id='redshift_conn', sql='SELECT 1'). But this results in the above error.
I am able to retrieve the redshift connection manually in a DAG with a separate task, but I think that is not how SecretsManager is supposed to be used in this case.
The example DAG is below:
from airflow import DAG, settings, secrets
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
from airflow.models.baseoperator import chain
from airflow.models import Connection, Variable
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
from datetime import timedelta

sm_secret_id_name = f'airflow/connections/redshift_conn'

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(1),
    'retries': 1,
}

def read_from_aws_sm_fn(**kwargs):  # from AWS example code
    ### set up Secrets Manager
    hook = AwsBaseHook(client_type='secretsmanager')
    client = hook.get_client_type('secretsmanager')
    response = client.get_secret_value(SecretId=sm_secret_id_name)
    myConnSecretString = response["SecretString"]
    print(myConnSecretString[:15])
    return myConnSecretString

def get_variable(**kwargs):
    my_var_value = Variable.get('my_test_variable')
    print('variable:')
    print(my_var_value)
    return my_var_value

with DAG(
    dag_id=f'redshift_test_dag',
    default_args=default_args,
    dagrun_timeout=timedelta(minutes=10),
    start_date=days_ago(1),
    schedule_interval=None,
    tags=['example']
) as dag:

    read_from_aws_sm_task = PythonOperator(
        task_id="read_from_aws_sm",
        python_callable=read_from_aws_sm_fn,
        provide_context=True
    )  # works fine

    query_redshift = RedshiftSQLOperator(
        task_id='query_redshift',
        redshift_conn_id='redshift_conn',
        sql='SELECT 1;'
    )  # results in above errors :-(

    try_to_get_variable_value = PythonOperator(
        task_id='get_variable',
        python_callable=get_variable,
        provide_context=True
    )  # works fine!
The question is: Am I wrongly expecting that the conn_id can be directly passed to operators as SomeOperator(conn_id='conn-id-in-secretsmanager')? Must I retrieve the connection manually each time I want to use it? I don't want to run something like read_from_aws_sm_fn in the code below every time beforehand...
Using Secrets Manager as a backend, you don't need to change the way you use connections or variables. They work the same way: when looking up a connection or variable, Airflow follows a search path.
Btw, neither the connection nor the variable show up in the Airflow UI.
The connection/variable will not show up in the UI.
ValueError: Invalid IPv6 URL and airflow.exceptions.AirflowNotFoundException: The conn_id redshift_conn isn't defined
The first error is related to the secret itself, and the second is due to the connection not being defined in Airflow.
There are two formats for storing connections in Secrets Manager (depending on the installed version of the Amazon provider); the Invalid IPv6 URL error could mean the connection is not being parsed correctly. See the Amazon provider documentation for details.
The first step is defining the prefixes for connections and variables; if they are not defined, your secrets backend will not check for the secret:
secrets.backend_kwargs : {"connections_prefix" : "airflow/connections", "variables_prefix" : "airflow/variables"}
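For completeness, that goes together with pointing Airflow at the Secrets Manager backend class; a typical setting is shown below, but treat the exact class path as an assumption that depends on your Amazon provider version:
secrets.backend : airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend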
Then for the secrets/connections, you should store them in those prefixes, respecting the required fields for the connection.
For example, for the connection my_postgress_conn:
{
    "conn_type": "postgresql",
    "login": "user",
    "password": "pass",
    "host": "host",
    "extra": '{"key": "val"}',
}
You should store it at the path airflow/connections/my_postgress_conn, with the JSON dict serialized as a string.
And for the variables, you just need to store them in airflow/variables/<var_name>.
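If you create the secrets programmatically, a minimal boto3 sketch could look like the following (the connection name my_postgress_conn and the field values simply reuse the example above; adjust them for your setup):

import boto3
import json

sm = boto3.client('secretsmanager')

conn = {
    "conn_type": "postgresql",
    "login": "user",
    "password": "pass",
    "host": "host",
    "extra": json.dumps({"key": "val"}),
}

# The secret name must live under the configured connections_prefix
sm.create_secret(
    Name='airflow/connections/my_postgress_conn',
    SecretString=json.dumps(conn),
)

Variables work the same way, e.g. Name='airflow/variables/my_variable_id' with the plain value as SecretString.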
In AWS, all the instances in a region which do not have the tag "os" with the value "linux" should be shut down.
Can someone please help with this? I tried, but was not able to achieve it.
I found how to stop instances with a specific tag, but I am unable to create a Lambda function to stop EC2 instances without a specific tag and value.
Could you get a list of all of your instances, then iterate through them, declining to terminate if they have the tag? In Python, something like:
import boto3

def has_tag(instance):
    # check if the instance has the tag "os" with the value "linux"
    for tag in instance.get('Tags', []):
        if tag['Key'] == 'os' and tag['Value'] == 'linux':
            return True
    return False

client = boto3.client('ec2')
# Filters are optional here; without them you get every instance in the region
response = client.describe_instances(
    Filters=[
        ...
    ],
)
# 'Reservations' is a list; each reservation holds a list of 'Instances'
for reservation in response['Reservations']:
    for instance in reservation['Instances']:
        if not has_tag(instance):
            # terminate (or stop) the instance here
            ...
You could use the Resource Groups Tagging API to get the list of resources that do not have that value.
This would get you a list of the ARN of all the EC2 instances that do not have "linux" as the value of any tag:
aws resourcegroupstaggingapi get-resources --resource-type-filters ec2:instance --query 'ResourceTagMappingList[?!(Tags[?Value==`linux`])].ResourceARN'
This assumes "linux" is a value used only for the "os" tag, so filtering out resources whose tags do not have that value is enough; you can adjust the JMESPath query as needed.
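Putting that together in Python, a rough boto3 sketch (assuming you want to stop, rather than terminate, every instance whose "os" tag is not "linux") might look like this:

import boto3

tagging = boto3.client('resourcegroupstaggingapi')
ec2 = boto3.client('ec2')

to_stop = []
paginator = tagging.get_paginator('get_resources')
for page in paginator.paginate(ResourceTypeFilters=['ec2:instance']):
    for mapping in page['ResourceTagMappingList']:
        tags = {t['Key']: t['Value'] for t in mapping.get('Tags', [])}
        if tags.get('os') != 'linux':
            # the instance id is the last segment of the ARN
            to_stop.append(mapping['ResourceARN'].split('/')[-1])

if to_stop:
    ec2.stop_instances(InstanceIds=to_stop)

Note that the tagging API only knows about resources that have (or once had) tags, so instances that were never tagged at all may need to be picked up with the describe_instances approach from the first answer.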
I'm trying to filter instances from a specific network and I tried to use this filter:
networkInterfaces.network = 'NET_NAME'
And I got invalid expression.
I even tried the following - same results for all of them:
networkInterfaces[0].network = 'NET_NAME'
networkInterfaces[].network = 'NET_NAME'
[]networkInterfaces.network = 'NET_NAME'
I can't find anywhere that says whether this is even a supported filter.
I tried running these filters in their API explorer:
https://cloud.google.com/compute/docs/reference/rest/v1/instances/aggregatedList?
And also via their official Python client
You can parse the response and filter the required instances as shown below:
import googleapiclient.discovery

project = "my-gcp-project"
zone = "us-central1-b"
network_name = "mynetwork"

compute = googleapiclient.discovery.build('compute', 'v1')
result = compute.instances().list(project=project, zone=zone).execute()

filtered_instances = []
for item in result['items']:
    if "networkInterfaces" in item.keys():
        for network_interface in item['networkInterfaces']:
            if "network" in network_interface.keys():
                if network_name in network_interface['network']:
                    filtered_instances.append(item['name'])

print(str(filtered_instances))
Hope this helps.
Basically, filtering for 'Network tags' to list instances is not supported in this API. To view the list of supported filter/fields, you can click on 'Filter VM Instances' on your GCP console's main page showing VM instances. Unfortunately, the list does not include 'Network tags' to filter on. Alternatively, you can execute the following command in a cloud shell to list all instances with their network tags:
$ gcloud compute instances list --filter=tags:TAG_NAME
You can see this example in this documentation.
Why is the list for EC2 different from the EMR list?
EC2: https://aws.amazon.com/ec2/spot/pricing/
EMR: https://aws.amazon.com/emr/pricing/
Why aren't all of the EC2 instance types available for EMR? How can I get this EMR-specific list?
In case your question is not about the Amazon console (then it would surely be closed as off-topic): as a programming solution, you are looking for something like this (using Python boto3):
import boto3

client = boto3.client('emr')
# list_instances requires a cluster id, e.g. one returned by client.list_clusters()
response = client.list_instances(ClusterId=cluster_id)
for instance in response['Instances']:
    print("Instance[%s] %s" % (instance['Id'], instance['Ec2InstanceId']))
This is what I use, although I'm not 100% sure it's accurate (because I couldn't find documentation to support some of my choices (-BoxUsage, etc.)).
It's worth looking through the responses from AWS in order to figure out what the different values are for different fields in the pricing client responses.
Use the following to get the list of responses:
import boto3

default_profile = boto3.session.Session(profile_name='default')
# Only us-east-1 has the pricing API
# - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/pricing.html
pricing_client = default_profile.client('pricing', region_name='us-east-1')

service_name = 'ElasticMapReduce'
# The pricing API's "location" field expects region display names, e.g. 'US East (N. Virginia)'
aws_region_name = 'US East (N. Virginia)'
product_filters = [
    {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': aws_region_name}
]

response_list = []
response = pricing_client.get_products(
    ServiceCode=service_name,
    Filters=product_filters,
    MaxResults=100
)
response_list.append(response)

while 'NextToken' in response:
    # re-query to get the next page of results
    response = pricing_client.get_products(
        ServiceCode=service_name,
        Filters=product_filters,
        MaxResults=100,
        NextToken=response['NextToken']
    )
    response_list.append(response)
Once you've gotten the list of responses, you can then filter out the actual instance info:
import json
from decimal import Decimal

emr_prices = {}
for response in response_list:
    for price_info_str in response['PriceList']:
        price_obj = json.loads(price_info_str)
        attributes = price_obj['product']['attributes']

        # Skip pricing info that doesn't specify an (EC2) instance type
        if 'instanceType' not in attributes:
            continue

        inst_type = attributes['instanceType']

        # AFAIK, only usagetype attributes that contain the string '-BoxUsage' hold the
        # prices that we would use (empirical research). Other examples of values are
        # <REGION-CODE>-M3BoxUsage, <REGION-CODE>-M5BoxUsage, <REGION-CODE>-M7BoxUsage
        # (no clue what that means..)
        if '-BoxUsage' not in attributes['usagetype']:
            continue

        if 'OnDemand' not in price_obj['terms']:
            continue

        on_demand_info = price_obj['terms']['OnDemand']
        price_dim = list(list(on_demand_info.values())[0]['priceDimensions'].values())[0]
        emr_price = Decimal(price_dim['pricePerUnit']['USD'])

        emr_prices[inst_type] = emr_price
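As a quick sanity check you can then look up a single type (m5.xlarge is just a placeholder here):

print(emr_prices.get('m5.xlarge'))  # Decimal EMR price for that instance type, or None if it wasn't found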
Realistically, it's straightforward enough to figure this out from the boto3 docs. In particular, the get_products documentation.
I am planning to use AWS autoscaling groups for my webservers. As a monitoring solution I am using munin at the moment. In the configuration file on the munin master server, you have to give IP addresses or host names for every host you want to monitor.
Now with autoscaling the number of instances will change frequently, and writing static information into the munin config does not seem to fit well in this environment. I could probably query all server addresses I want to monitor and then write the munin master configuration file, but that does not seem like a good approach to me.
What is the preferred way of using munin in such an environment? Does someone use munin with autoscaling?
In general I would like to keep using munin and not switch to another monitoring solution because I wrote quite a lot of specific plugins that I rely on. However if you have another monitoring solution that will probably let me keep my plugins I am also open for that.
A year ago we used munin as an alternative monitoring system, and I will tell you one thing: I don't like it at all.
We had some automation for the autoscaling setup in Nagios too, but that is also an ugly way to monitor a large number of AWS instances, because Nagios starts to lag/crash beyond a certain number of monitored instances.
If you have more than 150-200 instances to monitor, I suggest you use a commercial service like StackDriver or other alternatives.
I stumbled across this old topic because I was looking for a solution to the same problem. Finally I found a way that works for me which I would like to share with you. The tl;dr summary:
- use the AWS Python API to get all instances in the same VPC as the munin master
- test if munin port 4949 is open on the instances found to detect munin nodes
- create munin.conf from a munin.base.conf (without nodes) and append entries for all the nodes found
- run the script on the munin master every 5 minutes via cron
Finally, here is my Python script which does all the magic:
#! /usr/bin/python
import boto3
import requests
import argparse
import shutil
import socket

socketTimeout = 2

ec2 = boto3.client('ec2')

def getVpcId():
    # determine the VPC of the instance this script runs on via the metadata service
    response = requests.get('http://169.254.169.254/latest/meta-data/instance-id')
    instance_id = response.text
    response = ec2.describe_instances(
        Filters=[
            {
                'Name': 'instance-id',
                'Values': [instance_id]
            }
        ]
    )
    return response['Reservations'][0]['Instances'][0]['VpcId']

def findNodes(tag):
    # find all instances in the same VPC that carry the given tag key
    result = []
    vpcId = getVpcId()
    response = ec2.describe_instances(
        Filters=[
            {
                'Name': 'tag-key',
                'Values': [tag]
            },
            {
                'Name': 'vpc-id',
                'Values': [vpcId]
            }
        ]
    )
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            result.append(instance)
    return result

def getInstanceTag(instance, tagName):
    for tag in instance['Tags']:
        if tag['Key'] == tagName:
            return tag['Value']
    return None

def isMuninNode(host):
    # treat a host as a munin node if TCP port 4949 is open
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(socketTimeout)
    try:
        s.connect((host, 4949))
        s.shutdown(socket.SHUT_RDWR)
        return True
    except Exception:
        return False
    finally:
        s.close()

def appendNodesToConfig(nodes, target, tag):
    with open(target, "a") as file:
        for node in nodes:
            hostname = getInstanceTag(node, tag)
            if hostname is not None and isMuninNode(hostname):
                # strip a trailing dot from the DNS name
                if hostname.endswith('.'):
                    hostname = hostname[:-1]
                file.write('[' + hostname + ']\n')
                file.write('\taddress ' + hostname + '\n')
                file.write('\tuse_node_name yes\n\n')

parser = argparse.ArgumentParser("muninconf.py")
parser.add_argument("baseconfig", help="base munin config to append nodes to")
parser.add_argument("target", help="target munin config")
args = parser.parse_args()

base = args.baseconfig
target = args.target

shutil.copyfile(base, target)
nodes = findNodes('CNAME')
appendNodesToConfig(nodes, target, 'CNAME')
For the API calls to work you have to set up AWS API credentials or assign an IAM role with the required permissions (ec2:DescribeInstances as a bare minimum) to your munin master instance (which is my preferred method).
Some final implementation notes:
- I have a tag named CNAME assigned to all my AWS instances which holds the internal DNS host name. Therefore I filter for this tag and use the value as the node name and address for the munin configuration. You probably have to change this for your setup.
- Another option would be to assign a specific tag to all the instances you want to monitor with munin. You could then filter for this tag and probably also skip the check for the open munin port.
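For example, if you tagged your monitored instances with a dedicated key (the tag name munin-node below is purely hypothetical), the last two lines of the script would simply become:

nodes = findNodes('munin-node')
appendNodesToConfig(nodes, target, 'CNAME')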
Hope this is of some help.
Cheers,
Oliver