I'm using a Python script that fetches data from the Rally API, manipulates it, and sends it to Elasticsearch.
I'm trying to figure out how to check for my existing index(es) from the script code. My ES instance is quite simple:
es = Elasticsearch([{'host': 'myIP', 'port': 9200}])
I cannot find how to check which indices exist in this instance.
My goal is to set a condition for updating my index data.
Any idea?
Thanks
OK I found the solution and it's quite simple to use. Just type:
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'host_ip', 'port': 9200}])
if es.indices.exists(index='index_name'):
    # set a condition here
    pass
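For a bit more context, here is a minimal sketch of using that check to decide between creating the index and updating existing data. The index name rally_data and the document bodies are made-up examples, not names from the original script, and it assumes the elasticsearch-py 7.x client:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'host_ip', 'port': 9200}])

if es.indices.exists(index='rally_data'):
    # Index already exists: update/overwrite the document in place
    es.index(index='rally_data', id='doc-1', body={'status': 'updated'})
else:
    # First run: create the index, then load the data
    es.indices.create(index='rally_data')
    es.index(index='rally_data', id='doc-1', body={'status': 'new'})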
We've set up AWS Secrets Manager as a secrets backend for Airflow (AWS MWAA) as described in their documentation. Unfortunately, it is nowhere explained where the secrets are to be found and how they are to be used. When I supply conn_id to a task in a DAG, two errors appear in the task logs: ValueError: Invalid IPv6 URL and airflow.exceptions.AirflowNotFoundException: The conn_id redshift_conn isn't defined. What's even more surprising is that retrieving variables stored the same way with Variable.get('my_variable_id') works just fine.
The question is: Am I wrongly expecting that the conn_id can be directly passed to operators as SomeOperator(conn_id='conn-id-in-secretsmanager')? Must I retrieve the connection manually each time I want to use it? I don't want to run something like read_from_aws_sm_fn in the code below every time beforehand...
Btw, neither the connection nor the variable show up in the Airflow UI.
Having stored a secret named airflow/connections/redshift_conn (and, on the side, one named airflow/variables/my_variable_id), I expect the connection to be found and used when constructing RedshiftSQLOperator(task_id='mytask', redshift_conn_id='redshift_conn', sql='SELECT 1'). But this results in the errors above.
I am able to retrieve the redshift connection manually in a DAG with a separate task, but I think that is not how SecretsManager is supposed to be used in this case.
The example DAG is below:
from airflow import DAG, settings, secrets
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
from airflow.models.baseoperator import chain
from airflow.models import Connection, Variable
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
from datetime import timedelta

sm_secret_id_name = 'airflow/connections/redshift_conn'

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(1),
    'retries': 1,
}

def read_from_aws_sm_fn(**kwargs):  # from AWS example code
    ### set up Secrets Manager
    hook = AwsBaseHook(client_type='secretsmanager')
    client = hook.get_client_type('secretsmanager')
    response = client.get_secret_value(SecretId=sm_secret_id_name)
    myConnSecretString = response["SecretString"]
    print(myConnSecretString[:15])
    return myConnSecretString

def get_variable(**kwargs):
    my_var_value = Variable.get('my_test_variable')
    print('variable:')
    print(my_var_value)
    return my_var_value

with DAG(
    dag_id='redshift_test_dag',
    default_args=default_args,
    dagrun_timeout=timedelta(minutes=10),
    start_date=days_ago(1),
    schedule_interval=None,
    tags=['example']
) as dag:

    read_from_aws_sm_task = PythonOperator(
        task_id="read_from_aws_sm",
        python_callable=read_from_aws_sm_fn,
        provide_context=True
    )  # works fine

    query_redshift = RedshiftSQLOperator(
        task_id='query_redshift',
        redshift_conn_id='redshift_conn',
        sql='SELECT 1;'
    )  # results in above errors :-(

    try_to_get_variable_value = PythonOperator(
        task_id='get_variable',
        python_callable=get_variable,
        provide_context=True
    )  # works fine!
The question is: Am I wrongly expecting that the conn_id can be directly passed to operators as SomeOperator(conn_id='conn-id-in-secretsmanager')? Must I retrieve the connection manually each time I want to use it? I don't want to run something like read_from_aws_sm_fn in the code below every time beforehand...
Using Secrets Manager as a backend, you don't need to change the way you use connections or variables. They work the same way; when looking up a connection/variable, Airflow follows a search path.
Btw, neither the connection nor the variable show up in the Airflow UI.
The connection/variable will not show up in the UI.
ValueError: Invalid IPv6 URL and airflow.exceptions.AirflowNotFoundException: The conn_id redshift_conn isn't defined
The 1st error is related to the secret, and the 2nd error is due to the connection not existing in the Airflow UI.
There are two formats for storing connections in Secrets Manager (depending on the AWS provider version installed); the IPv6 URL error could mean that the connection is not being parsed correctly. Here is a link to the provider docs.
The first step is defining the prefixes for connections and variables; if they are not defined, your secrets backend will not check for the secret:
secrets.backend_kwargs : {"connections_prefix" : "airflow/connections", "variables_prefix" : "airflow/variables"}
Then for the secrets/connections, you should store them in those prefixes, respecting the required fields for the connection.
For example, for the connection my_postgress_conn:
{
    "conn_type": "postgresql",
    "login": "user",
    "password": "pass",
    "host": "host",
    "extra": "{\"key\": \"val\"}"
}
You should store it under the path airflow/connections/my_postgress_conn, with the JSON dict as a string.
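For completeness, here is a hypothetical way to create that secret with boto3; the secret name and field values are just the example above, not values from your setup:
import json
import boto3

sm = boto3.client('secretsmanager')
# Store the connection definition as a JSON string under the connections prefix
sm.create_secret(
    Name='airflow/connections/my_postgress_conn',
    SecretString=json.dumps({
        'conn_type': 'postgresql',
        'login': 'user',
        'password': 'pass',
        'host': 'host',
        'extra': '{"key": "val"}',
    }),
)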
And for the variables, you just need to store them in airflow/variables/<var_name>.
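Once the backend and prefixes are configured, the lookup is transparent. A minimal sketch, assuming Airflow 2.x and the names used in the question:
from airflow.hooks.base import BaseHook
from airflow.models import Variable

# Resolved from the secret airflow/connections/redshift_conn; operators that
# take redshift_conn_id='redshift_conn' do the same lookup internally
conn = BaseHook.get_connection('redshift_conn')
print(conn.conn_type, conn.host, conn.login)

# Resolved from the secret airflow/variables/my_variable_id
print(Variable.get('my_variable_id'))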
I'm new to Stack Overflow; apologies if I didn't format this right.
I'm currently using Terraform to provision Aurora RDS. The problem is that I shouldn't have the DB master password sitting as plaintext in the .tf file.
Initially I've been using this config with a plaintext password:
engine = "aurora-mysql"
engine_version = "5.7.12"
cluster_family = "aurora-mysql5.7"
cluster_size = "1"
namespace = "eg"
stage = "dev"
admin_user = "admin"
admin_password = "passwordhere"
db_name = "dbname"
db_port = "3306"
I'm looking for a solution where I can skip the plaintext password shown above and instead have the password auto-generated and included in the Terraform configuration. I also need to be able to retrieve the password so I can use it to configure the WordPress server.
https://gist.github.com/smiller171/6be734957e30c5d4e4b15422634f13f4
I came across this solution, but I'm not sure how to retrieve the password to use it on the server. I haven't deployed it yet either.
As you mentioned in your question, there is a workaround, which you haven't yet tried.
I suggest trying that first; if it is successful, then to retrieve the password use a Terraform output block:
output "db_password" {
  value       = random_string.db_master_pass.result
  description = "db password"
}
Once your Terraform run has completed, you can retrieve that value with terraform output db_password. If you want to use the password somewhere in the Terraform code itself, reference random_string.db_master_pass.result directly.
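If you need the generated password outside of Terraform, for example to configure the WordPress server, one option is to read the output from a script. A minimal Python sketch, assuming terraform apply has already been run in the working directory and the terraform binary is on the PATH:
import json
import subprocess

def get_terraform_output(name, workdir='.'):
    # 'terraform output -json <name>' prints the single output value as JSON
    raw = subprocess.check_output(['terraform', 'output', '-json', name], cwd=workdir)
    return json.loads(raw)

db_password = get_terraform_output('db_password')
print('retrieved a password of length', len(db_password))
Keep in mind that the generated password will still end up in the Terraform state in plaintext, so the state itself needs to be protected.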
Why is the list for EC2 different from the EMR list?
EC2: https://aws.amazon.com/ec2/spot/pricing/
EMR: https://aws.amazon.com/emr/pricing/
Why aren't all the EC2 instance types available for EMR? How can I get this EMR-specific list?
In case your question is not about the Amazon console (then it would surely be closed as off-topic): as a programming solution, you are looking for something like this (using Python boto3):
import boto3

client = boto3.client('emr')

# list_instances() requires the id of an existing cluster (placeholder below)
# and returns a dict describing the cluster's instances
response = client.list_instances(ClusterId='j-XXXXXXXXXXXX')
for instance in response['Instances']:
    print("Instance[%s] %s" % (instance['Id'], instance['InstanceType']))
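If you don't have a cluster id at hand, a hypothetical way to look one up (assuming at least one cluster is currently running or waiting):
import boto3

client = boto3.client('emr')
# List active clusters and take the first one's id
clusters = client.list_clusters(ClusterStates=['RUNNING', 'WAITING'])
cluster_id = clusters['Clusters'][0]['Id']
print(cluster_id)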
This is what I use, although I'm not 100% sure it's accurate, because I couldn't find documentation to support some of my choices (-BoxUsage, etc.).
It's worth looking through the responses from AWS in order to figure out what the different values are for different fields in the pricing client responses.
Use the following to get the list of responses:
import json
from decimal import Decimal

import boto3

default_profile = boto3.session.Session(profile_name='default')
# Only us-east-1 has the pricing API
# - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/pricing.html
pricing_client = default_profile.client('pricing', region_name='us-east-1')

service_name = 'ElasticMapReduce'
# The pricing API filters on the human-readable location name,
# e.g. 'US East (N. Virginia)', not the region code
aws_region_name = 'US East (N. Virginia)'
product_filters = [
    {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': aws_region_name}
]

response_list = []
response = pricing_client.get_products(
    ServiceCode=service_name,
    Filters=product_filters,
    MaxResults=100
)
response_list.append(response)
num_prices = 100
while 'NextToken' in response:
    # re-query to get the next page of results
    response = pricing_client.get_products(
        ServiceCode=service_name,
        Filters=product_filters,
        MaxResults=100,
        NextToken=response['NextToken']
    )
    response_list.append(response)
    num_prices += 100
Once you've gotten the list of responses, you can then filter out the actual instance info:
emr_prices = {}
for response in response_list:
    for price_info_str in response['PriceList']:
        price_obj = json.loads(price_info_str)
        attributes = price_obj['product']['attributes']

        # Skip pricing info that doesn't specify a (EC2) instance type
        if 'instanceType' not in attributes:
            continue
        inst_type = attributes['instanceType']

        # AFAIK, only usagetype attributes that contain the string '-BoxUsage'
        # are the ones that contain the prices we would use (empirical research).
        # Other examples of values are <REGION-CODE>-M3BoxUsage,
        # <REGION-CODE>-M5BoxUsage, <REGION-CODE>-M7BoxUsage (no clue what that means..)
        if '-BoxUsage' not in attributes['usagetype']:
            continue
        if 'OnDemand' not in price_obj['terms']:
            continue

        on_demand_info = price_obj['terms']['OnDemand']
        price_dim = list(list(on_demand_info.values())[0]['priceDimensions'].values())[0]
        emr_price = Decimal(price_dim['pricePerUnit']['USD'])
        emr_prices[inst_type] = emr_price
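As a quick sanity check, here is a hypothetical way to use the emr_prices dict built above; the values are the EMR surcharge per instance-hour in USD:
# Print a few of the collected EMR prices, sorted by instance type
for inst_type in sorted(emr_prices)[:10]:
    print("%s: $%s/hour" % (inst_type, emr_prices[inst_type]))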
Realistically, it's straightforward enough to figure this out from the boto3 docs. In particular, the get_products documentation.
db_conn.j2:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'db_name',
        'USER': 'db_user',
        'PASSWORD': 'db_pass',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
main.yml:
tasks:
  - name: Set DB settings
    template: src="/vagrant/ansible/templates/db_settings.j2" dest="{{ proj_dev }}/proj/settings.py"
    tags:
      - template
In my task, settings.py will be replaced by db_conn.j2.
But I need to change only the DATABASES option in the destination file (settings.py).
Can I do this via template? Or is it better to use replace?
Is there another way in Ansible to set Django settings?
The template module will overwrite the complete file. There is no option to only replace a specific section; that's the idea of a template.
You could move the DATABASES section out to another file, say database.py, and then from database import *, but then of course you'd have the same problem: you need to replace the DATABASES section with the import statement.
So yes, the replace module or the lineinfile module are generally better suited to replace a section of a file.
But you're in luck: Stouts has created a Django role.
You can install it in your project with:
ansible-galaxy install Stouts.django
The blockinfile module introduced in Ansible 2.0 does exactly what you want. It will create and manage a block with special starting and ending marks ("BEGIN/END ANSIBLE MANAGED BLOCK" by default) in your file.
How do I find all Ubuntu images available in my region?
Attempt:
from boto.ec2 import connect_to_region

conn = connect_to_region(**dict(region_name=region, **aws_keys))

if not filters:  # Not as default arguments, as they should be immutable
    filters = {
        'architecture': 'x86_64',
        'name': 'ubuntu/images/ebs-*'
    }

print conn.get_all_images(owners=['amazon'], filters=filters)
I've also tried setting the name filter to ubuntu/images/ebs-ssd/ubuntu-trusty-14.04-amd64-server-20140927, ubuntu*, *ubuntu and *ubuntu*.
The name filter does accept wildcards; the real problem is the owner: 'amazon' is not the owner of the Ubuntu images, Canonical is, so you need to filter by Canonical's owner ID.
Change owners=['amazon'] to owners=['099720109477'].
There is no owner alias for Canonical as far as I can see, so you will have to use the owner ID instead.
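For reference, here is a minimal sketch of the same lookup with boto3, the successor to boto; it assumes default credentials and region are configured, and the Trusty name pattern is just an illustrative example:
import boto3

ec2 = boto3.client('ec2')
response = ec2.describe_images(
    Owners=['099720109477'],  # Canonical
    Filters=[
        {'Name': 'architecture', 'Values': ['x86_64']},
        {'Name': 'name', 'Values': ['ubuntu/images/*ubuntu-trusty-14.04-amd64-server-*']},
    ],
)
# Sort by creation date so the most recent AMI comes last
images = sorted(response['Images'], key=lambda img: img['CreationDate'])
for image in images:
    print(image['ImageId'], image['Name'])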
Hope this helps.