I'm using a Lambda function, written in Python, as the backend for an AWS API Gateway method.
The API is complete, but now I have a new problem: it should be deployed to multiple environments (production, test, etc.), and each one should use a different configuration for the backend. Let's say I had this handler:
import json
import logging

import boto3

import settings

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def dummy_handler(event, context):
    logger.info('got event{}'.format(event))
    utils = Utils(event["stage"])
    response = utils.put_ticket_on_dynamodb(event["item"])
    return json.dumps(response)


class Utils:
    def __init__(self, stage):
        self.stage = stage

    def put_ticket_on_dynamodb(self, item):
        # Write record to DynamoDB
        try:
            dynamodb = boto3.resource('dynamodb')
            table = dynamodb.Table(settings.TABLE_NAME)
            table.put_item(Item=item)
        except Exception as e:
            logger.error("Fail to put item on DynamoDB: {0}".format(str(e)))
            raise
        logger.info("Item successfully written to DynamoDB")
        return item
Now, in order to use a different TABLE_NAME on each stage, I replaced the settings.py file with a settings package with this structure:
settings/
__init__.py
_base.py
_servers.py
development.py
production.py
testing.py
Following this answer here.
But I have no idea how to use it in my solution, considering that the stage (passed as a parameter to the Utils class) will match the settings file name inside the settings package. What should I change in my Utils class to make it work?
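Something like this is roughly what I have in mind, but I'm not sure it's the right approach (an untested sketch; it assumes each stage name matches a module inside the settings package):

import importlib

import boto3


class Utils:
    def __init__(self, stage):
        self.stage = stage
        # e.g. stage "production" -> settings/production.py
        self.settings = importlib.import_module('settings.{}'.format(stage))

    def put_ticket_on_dynamodb(self, item):
        table = boto3.resource('dynamodb').Table(self.settings.TABLE_NAME)
        table.put_item(Item=item)
        return item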
An alternative for handling this use case is to use API Gateway's stage variables and pass the settings which vary by stage as parameters to your Lambda function.
Stage variables are name-value pairs associated with a specific API deployment stage and act like environment variables for use in your API setup and mapping templates. For example, you can configure an API method in each stage to connect to a different backend endpoint by setting different endpoint values in your stage variables.
Here is a blog post on using stage variables.
Here is the full documentation on using stage variables.
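For example, with a Lambda proxy integration the stage variables of the invoked stage arrive on the event, so the handler can pick the table name from there. A minimal sketch, assuming a stage variable named TABLE_NAME is defined on each stage:

import json

import boto3


def dummy_handler(event, context):
    # With a Lambda proxy integration, API Gateway passes the stage variables
    # of the invoked stage under event['stageVariables'].
    stage_vars = event.get('stageVariables') or {}
    table_name = stage_vars.get('TABLE_NAME', 'defaultTable')  # TABLE_NAME is a stage variable you define
    item = json.loads(event.get('body') or '{}')
    boto3.resource('dynamodb').Table(table_name).put_item(Item=item)
    return {'statusCode': 200, 'body': json.dumps(item)}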
I finally used a different approach here. Instead of a Python package for the settings, I used a single settings script with a dictionary containing the configuration for each environment. I would still prefer a separate settings file per environment, but so far I haven't found how to do that.
So, now my settings file looks like this:
COUNTRY_CODE = 'CL'
TIMEZONE = "America/Santiago"
LOCALE = "es_CL"
DEFAULT_PAGE_SIZE = 20
ENV = {
    'production': {
        'TABLE_NAME': "dynamodbTable",
        'BUCKET_NAME': "sssBucketName"
    },
    'testing': {
        'TABLE_NAME': "dynamodbTableTest",
        'BUCKET_NAME': "sssBucketNameTest"
    },
    'test-invoke-stage': {
        'TABLE_NAME': "dynamodbTableTest",
        'BUCKET_NAME': "sssBucketNameTest"
    }
}
And my code:
def put_ticket_on_dynamodb(self, item):
    # Write record to DynamoDB
    try:
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.Table(settings.ENV[self.stage]["TABLE_NAME"])
        table.put_item(Item=item)
    except Exception as e:
        logger.error("Fail to put item on DynamoDB: {0}".format(str(e)))
        raise
    logger.info("Item successfully written to DynamoDB")
    return item
We've set up AWS Secrets Manager as a secrets backend for Airflow (AWS MWAA) as described in their documentation. Unfortunately, it is nowhere explained where the secrets are to be found and how they are to be used. When I supply a conn_id to a task in a DAG, we see two errors in the task logs: ValueError: Invalid IPv6 URL and airflow.exceptions.AirflowNotFoundException: The conn_id redshift_conn isn't defined. What's even more surprising is that retrieving variables stored the same way with Variable.get('my_variable_id') works just fine.
The question is: Am I wrongly expecting that the conn_id can be directly passed to operators as SomeOperator(conn_id='conn-id-in-secretsmanager')? Must I retrieve the connection manually each time I want to use it? I don't want to run something like read_from_aws_sm_fn in the code below every time beforehand...
Btw, neither the connection nor the variable show up in the Airflow UI.
Having stored a secret named airflow/connections/redshift_conn (and, alongside it, one named airflow/variables/my_variable_id), I expect the connection to be found and used when constructing RedshiftSQLOperator(task_id='mytask', redshift_conn_id='redshift_conn', sql='SELECT 1'). But this results in the errors above.
I am able to retrieve the redshift connection manually in a DAG with a separate task, but I think that is not how SecretsManager is supposed to be used in this case.
The example DAG is below:
from airflow import DAG, settings, secrets
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
from airflow.models.baseoperator import chain
from airflow.models import Connection, Variable
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
from datetime import timedelta

sm_secret_id_name = f'airflow/connections/redshift_conn'

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(1),
    'retries': 1,
}


def read_from_aws_sm_fn(**kwargs):  # from AWS example code
    ### set up Secrets Manager
    hook = AwsBaseHook(client_type='secretsmanager')
    client = hook.get_client_type('secretsmanager')
    response = client.get_secret_value(SecretId=sm_secret_id_name)
    myConnSecretString = response["SecretString"]
    print(myConnSecretString[:15])
    return myConnSecretString


def get_variable(**kwargs):
    my_var_value = Variable.get('my_test_variable')
    print('variable:')
    print(my_var_value)
    return my_var_value


with DAG(
    dag_id=f'redshift_test_dag',
    default_args=default_args,
    dagrun_timeout=timedelta(minutes=10),
    start_date=days_ago(1),
    schedule_interval=None,
    tags=['example']
) as dag:

    read_from_aws_sm_task = PythonOperator(
        task_id="read_from_aws_sm",
        python_callable=read_from_aws_sm_fn,
        provide_context=True
    )  # works fine

    query_redshift = RedshiftSQLOperator(
        task_id='query_redshift',
        redshift_conn_id='redshift_conn',
        sql='SELECT 1;'
    )  # results in above errors :-(

    try_to_get_variable_value = PythonOperator(
        task_id='get_variable',
        python_callable=get_variable,
        provide_context=True
    )  # works fine!
The question is: Am I wrongly expecting that the conn_id can be directly passed to operators as SomeOperator(conn_id='conn-id-in-secretsmanager')? Must I retrieve the connection manually each time I want to use it? I don't want to run something like read_from_aws_sm_fn in the code below every time beforehand...
Using Secrets Manager as a backend, you don't need to change the way you use connections or variables. They work the same way: when looking up a connection or variable, Airflow follows a search path (the configured secrets backend first, then environment variables, then the metastore database).
Btw, neither the connection nor the variable show up in the Airflow UI.
The connection/variable will not show up in the UI; the UI only lists entries stored in the Airflow metadata database.
ValueError: Invalid IPv6 URL and airflow.exceptions.AirflowNotFoundException: The conn_id redshift_conn isn't defined
The first error is related to how the secret is parsed, and the second is due to the connection not being resolved by Airflow.
There are two formats for storing connections in Secrets Manager (depending on the installed version of the AWS provider): a URI string and a JSON dict. The Invalid IPv6 URL error suggests the connection is not being parsed correctly. Here is a link to the provider docs.
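For reference, in the URI format the whole connection is stored as a single string, roughly like this (a sketch; the exact scheme and required fields depend on the provider version):

postgres://user:password@host:5432/dbname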
The first step is defining the prefixes for connections and variables; if they are not defined, the secrets backend will not check for the secret:
secrets.backend_kwargs : {"connections_prefix" : "airflow/connections", "variables_prefix" : "airflow/variables"}
Then, for the secrets/connections, store them under those prefixes, respecting the required fields for the connection.
For example, for the connection my_postgres_conn:
{
    "conn_type": "postgresql",
    "login": "user",
    "password": "pass",
    "host": "host",
    "extra": "{\"key\": \"val\"}"
}
You should store it at the path airflow/connections/my_postgres_conn, with the JSON dict serialized as a string.
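For example, with the AWS CLI (all values here are placeholders):

aws secretsmanager create-secret \
    --name airflow/connections/my_postgres_conn \
    --secret-string '{"conn_type": "postgresql", "login": "user", "password": "pass", "host": "host"}'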
And for the variables, you just need to store them in airflow/variables/<var_name>.
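On MWAA the backend itself also has to be enabled through the Airflow configuration options; a sketch of the two options (adjust the prefixes to your own naming):

secrets.backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
secrets.backend_kwargs = {"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables"}

With that in place, redshift_conn_id='redshift_conn' in the operator should resolve to the secret stored at airflow/connections/redshift_conn, without any manual retrieval in a separate task.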
I am using the SageMaker Python SDK for my inference job and following this guide. I am triggering my SageMaker inference job from Airflow with the Python callable below:
from sagemaker.tensorflow import TensorFlowModel


def transform(sage_role, inference_file_local_path, **kwargs):
    """
    Python callable to execute a Sagemaker SDK batch transform job. It takes infer_batch_output,
    infer_batch_input, model_artifact, instance_type and infer_file_name as run time parameters.
    :param inference_file_local_path: Local entry_point path for the inference file.
    :param sage_role: Sagemaker execution role.
    """
    # infer_file_name, model_artifact, instance_type, batch_input and batch_output
    # are resolved from the run time parameters mentioned in the docstring.
    model = TensorFlowModel(entry_point=infer_file_name,
                            source_dir=inference_file_local_path,
                            model_data=model_artifact,
                            role=sage_role,
                            framework_version="2.5.1")

    tensorflow_serving_transformer = model.transformer(
        instance_count=1,
        instance_type=instance_type,
        accept="text/csv",
        strategy="SingleRecord",
        max_payload=10,
        max_concurrent_transforms=10,
        output_path=batch_output)

    return tensorflow_serving_transformer.transform(data=batch_input, content_type='text/csv')
and my simple inference.py looks like this:
import json

import numpy as np
import pandas as pd


def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API
    Args:
        data (obj): the request data, in format of dict or string
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'application/x-npy':
        # very simple numpy handler
        payload = np.load(data.read().decode('utf-8'))
        x_user_feature = np.asarray(payload.item().get('test').get('feature_a_list'))
        x_channel_feature = np.asarray(payload.item().get('test').get('feature_b_list'))
        examples = []
        for index, elem in enumerate(x_user_feature):
            examples.append({'feature_a_list': elem, 'feature_b_list': x_channel_feature[index]})
        return json.dumps({'instances': examples})

    if context.request_content_type == 'text/csv':
        payload = pd.read_csv(data)
        print("Model name is ..............")
        model_name = context.model_name
        print(model_name)
        examples = []
        row_ch = []
        # config_exists / get_s3_json_file are helpers defined elsewhere in this project
        if config_exists(model_bucket, "{}{}".format(config_path, model_name)):
            config_keys = get_s3_json_file(model_bucket, "{}{}".format(config_path, model_name))
            feature_b_list = config_keys["feature_b_list"].split(",")
            row_ch = [float(ch_feature_str) for ch_feature_str in feature_b_list]
            if "column_names" in config_keys.keys():
                cols = config_keys["column_names"].split(",")
                payload.columns = cols
        for index, row in payload.iterrows():
            row_user = row['feature_a_list'].replace('[', '').replace(']', '').split()
            row_user = [float(x) for x in row_user]
            if not row_ch:
                row_ch = row['feature_b_list'].replace('[', '').replace(']', '').split()
                row_ch = [float(x) for x in row_ch]
            example = {'feature_a_list': row_user, 'feature_b_list': row_ch}
            examples.append(example)
        return json.dumps({'instances': examples})

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))
def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.
    Args:
        data (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details
    Returns:
        (bytes, string): data to return to client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type
It is working fine, however I want to pass custom arguments to inference.py so that I can modify the input data based on the requirement. I thought of using a config file per requirement and downloading it from S3 based on the model name, but since I am using model_data and passing model.tar.gz at runtime, context.model_name is always None.
Is there a way I can pass a runtime argument to inference.py that I can use for customization?
In the docs I see SageMaker provides custom_attributes, but I don't see any example of how to use it and access it in inference.py.
custom_attributes (string): content of ‘X-Amzn-SageMaker-Custom-Attributes’ header from the original request. For example, ‘tfs-model-name=half_plus_three,tfs-method=predict’
Currently CustomAttributes is supported in the InvokeEndpoint API call when using a realtime Endpoint.
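For a realtime endpoint, the attribute is set on the InvokeEndpoint call and then surfaces in the handler's context. A rough boto3 sketch (endpoint name and attribute value are illustrative):

import boto3

runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName='my-endpoint',                   # hypothetical endpoint name
    ContentType='text/csv',
    CustomAttributes='my_custom_arg=some_value',  # readable as context.custom_attributes in inference.py
    Body='1,2,3,4',
)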
As an example for a Transform Job, you can pass JSON Lines as input, where each line contains the input payload together with some custom arguments that you can consume in your inference.py file.
For example,
{
    "input": "1,2,3,4",
    "custom_args": "my_custom_arg"
}
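A sketch of how inference.py could consume such a record, assuming the transform job is run with content_type='application/jsonlines' and the input is split per line (all names here are illustrative):

import json


def input_handler(data, context):
    if context.request_content_type == 'application/jsonlines':
        # Each request carries one JSON line with both the payload and the custom arguments
        record = json.loads(data.read().decode('utf-8').splitlines()[0])
        custom_args = record.get('custom_args')  # e.g. "my_custom_arg", drives any custom pre-processing
        features = [float(x) for x in record['input'].split(',')]
        return json.dumps({'instances': [features]})
    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))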
I am keeping the application endpoint in SSM Parameter Store and I am able to access it from the Lambda environment.
Resources:
  M4IAcarsScheduler:
    Type: AWS::Serverless::Function
    Properties:
      Handler: not.used.in.provided.runtime
      Runtime: provided
      CodeUri: target/function.zip
      MemorySize: 512
      Timeout: 900
      FunctionName: Sample
      Environment:
        Variables:
          SamplePath: !Ref sample1path
          SampleId: !Ref sample1pathid

Parameters:
  sample1path:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Select existing security group for lambda function from Parameter Store
    Default: /sample/path
  sample1pathid:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Select existing security group for lambda function from Parameter Store
    Default: /sample/id
My issue is that when I update the SSM parameter, the Lambda environment variable is not updated dynamically; every time I need to redeploy.
Is there any way to handle it dynamically, so that when the value changes in SSM Parameter Store it is reflected without restarting or redeploying the Lambda?
When you use SSM parameters in a CloudFormation stack, the parameters are resolved when the stack is deployed. If the value in SSM subsequently changes, nothing updates the Lambda, so it will still have the value that was pulled from SSM at the moment the stack was deployed. The Lambda does not even know that the parameter came from SSM; it only sees a static environment variable.
Instead, to use SSM parameters in your Lambda, change your Lambda code so that it fetches the parameters itself. This AWS blog post shows a Python Lambda example of how to fetch the parameters from the Lambda code (when the Lambda runs):
import os, traceback, json, configparser, boto3
from aws_xray_sdk.core import patch_all

patch_all()

# Initialize boto3 client at global scope for connection reuse
client = boto3.client('ssm')
env = os.environ['ENV']
app_config_path = os.environ['APP_CONFIG_PATH']
full_config_path = '/' + env + '/' + app_config_path

# Initialize app at global scope for reuse across invocations
app = None


class MyApp:
    def __init__(self, config):
        """
        Construct new MyApp with configuration
        :param config: application configuration
        """
        self.config = config

    def get_config(self):
        return self.config


def load_config(ssm_parameter_path):
    """
    Load configparser from config stored in SSM Parameter Store
    :param ssm_parameter_path: Path to app config in SSM Parameter Store
    :return: ConfigParser holding loaded config
    """
    configuration = configparser.ConfigParser()
    try:
        # Get all parameters for this app
        param_details = client.get_parameters_by_path(
            Path=ssm_parameter_path,
            Recursive=False,
            WithDecryption=True
        )
        # Loop through the returned parameters and populate the ConfigParser
        if 'Parameters' in param_details and len(param_details.get('Parameters')) > 0:
            for param in param_details.get('Parameters'):
                param_path_array = param.get('Name').split("/")
                section_position = len(param_path_array) - 1
                section_name = param_path_array[section_position]
                config_values = json.loads(param.get('Value'))
                config_dict = {section_name: config_values}
                print("Found configuration: " + str(config_dict))
                configuration.read_dict(config_dict)
    except:
        print("Encountered an error loading config from SSM.")
        traceback.print_exc()
    finally:
        return configuration


def lambda_handler(event, context):
    global app
    # Initialize app if it doesn't yet exist
    if app is None:
        print("Loading config and creating new MyApp...")
        config = load_config(full_config_path)
        app = MyApp(config)

    return "MyApp config is " + str(app.get_config()._sections)
Here is a post with an example in Node, and similar examples exist for other languages too.
// parameter expected by SSM.getParameter
var parameter = {
"Name" : "/systems/"+event.Name+"/config"
};
responseFromSSM = await SSM.getParameter(parameter).promise();
console.log('SUCCESS');
console.log(responseFromSSM);
var value = responseFromSSM.Parameter.Value;
Basically, what I'm trying to do is create a message on Pub/Sub that triggers a GCF which creates an instance from a regional Managed Instance Group in whatever zone has capacity at the time.
The issue I'm trying to solve here is the rather recurrent ZONE_RESOURCE_POOL_EXHAUSTED error, which the regional MIG deals with.
Is this solution possible? I've tried using the createInstances method, but Logging just states PRECONDITION_FAILED.
The code snippet I'm using is as follows:
from googleapiclient import discovery


def launch_vm(project, region, igm, body):
    service = discovery.build('compute', 'v1')
    response = service.regionInstanceGroupManagers()\
        .createInstances(
            project=project,
            region=region,
            instanceGroupManager=igm,
            body=body)
    return response.execute()


request_body = {"instances": [{"name": "testinstance"}]}
launch_vm('project-name', 'us-central1', 'instance-group-name', request_body)
EDIT:
I just found out what happened: when I tried it on another project with a recently created instance group, I realized instance redistribution was enabled, which cannot be the case, as the CLI response shows:
ERROR: (gcloud.compute.instance-groups.managed.create-instance) CreateInstances can be used only when instance redistribution is disabled (set to NONE).
I disabled instance redistribution and now it works wonders :) Thanks everyone for the help!
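For anyone else hitting this, redistribution can be turned off on the regional MIG with something like the following (group and region names are placeholders):

gcloud compute instance-groups managed update instance-group-name \
    --region us-central1 \
    --instance-redistribution-type NONE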
I'm able to createInstance:
import os

from googleapiclient import discovery

PROJECT = os.environ["PROJECT"]
REGION = os.environ["REGION"]
NAME = os.environ["NAME"]

service = discovery.build('compute', 'v1')


def launch_vm(project, region, name, body):
    rqst = service.regionInstanceGroupManagers().createInstances(
        project=project,
        region=region,
        instanceGroupManager=name,
        body=body)
    return rqst.execute()


body = {
    "instances": [
        {
            "name": "testinstance"
        }
    ]
}

launch_vm(PROJECT, REGION, NAME, body)
What's the best way to store API access keys that you need in your settings.py but that you don't want to commit into git?
I use an environment file that stays on my computer and contains some variables linked to my environment.
In my Django settings.py (which is uploaded on github):
# MANDRILL API KEY
MANDRILL_KEY = os.environ.get('DJANGO_MANDRILL_KEY')
In my dev environment, my .env file (which is excluded from my Git repo) contains:
DJANGO_MANDRILL_KEY=PuoSacohjjshE8-5y-0pdqs
This is a "pattern" proposed by Heroku: https://devcenter.heroku.com/articles/config-vars
I suppose there is a simple way to set it up without using Heroku, though :)
To be honest, my primary goal is not security but environment separation, though it can help with both, I guess.
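To load that .env file locally without Heroku, one simple option is the python-dotenv package; a minimal sketch, reusing the variable name from the example above:

# settings.py
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from a .env file into os.environ, if the file exists

MANDRILL_KEY = os.environ.get('DJANGO_MANDRILL_KEY')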
I use something like this in settings.py:
import json

if DEBUG:
    secret_file = '/path/to/development/config.json'
else:
    secret_file = '/path/to/production/config.json'

with open(secret_file) as f:
    SECRETS = json.load(f)

secret = lambda n: str(SECRETS[n])

SECRET_KEY = secret('secret_key')
DATABASES['default']['PASSWORD'] = secret('db_password')
and the JSON file:
{
    "db_password": "foo",
    "secret_key": "bar"
}
This way you can omit the production config from git or move it outside your repository.