I have a data pipeline definition in JSON format, and I would like to 'put' it using Boto3 in Python.
I know you can do this via the AWS CLI using put-pipeline-definition, but Boto3 (and the AWS API) use a different format, splitting the definition into pipelineObjects, parameterObjects and parameterValues.
Do I need to write code to translate from a json definition to that expected by the API/Boto? If so, is there a library that does this?
The AWS CLI has code that does this translation, so I can borrow that!
You could convert from the Data Pipeline exported JSON format to the pipelineObjects format expected by boto3 using a Python function of the following form:
def convert_to_pipeline_objects(pipeline_definition_dict):
    objects_list = []
    for def_object in pipeline_definition_dict['objects']:
        new_object = {
            'id': def_object['id'],
            'name': def_object['name'],
            'fields': []
        }
        for key in def_object.keys():
            if key in ('id', 'name'):
                continue
            # Reference fields come through as {'ref': ...} dicts;
            # everything else is a plain string value.
            if isinstance(def_object[key], dict):
                new_object['fields'].append(
                    {
                        'key': key,
                        'refValue': def_object[key]['ref']
                    }
                )
            else:
                new_object['fields'].append(
                    {
                        'key': key,
                        'stringValue': def_object[key]
                    }
                )
        objects_list.append(new_object)
    return objects_list
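As a quick sanity check, the conversion can be exercised on a tiny definition before wiring it up to the API. A minimal sketch (the pipeline ID is hypothetical, and the conversion logic is restated compactly here so the snippet runs standalone):

```python
def to_pipeline_objects(definition):
    # Compact restatement of the conversion function above, for a standalone demo.
    objects = []
    for obj in definition["objects"]:
        fields = []
        for key, value in obj.items():
            if key in ("id", "name"):
                continue
            if isinstance(value, dict):
                fields.append({"key": key, "refValue": value["ref"]})
            else:
                fields.append({"key": key, "stringValue": value})
        objects.append({"id": obj["id"], "name": obj["name"], "fields": fields})
    return objects

def put_definition(pipeline_id, definition):
    # Requires AWS credentials and an existing pipeline; pipeline_id is
    # whatever create_pipeline returned (a hypothetical "df-..." ID here).
    import boto3
    client = boto3.client("datapipeline")
    return client.put_pipeline_definition(
        pipelineId=pipeline_id,
        pipelineObjects=to_pipeline_objects(definition),
    )

# Exported-JSON style input with one string field and one reference field:
sample = {"objects": [{"id": "Default", "name": "Default",
                       "scheduleType": "ondemand",
                       "schedule": {"ref": "DefaultSchedule"}}]}
converted = to_pipeline_objects(sample)
print(converted[0]["fields"])
```

The parameterObjects and parameterValues arguments of put_pipeline_definition are optional, so a definition without parameters can be put with pipelineObjects alone.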
Related
AWS CloudFront with Custom Cookies using Wildcards in a Lambda Function:
The problem:
To provide granular access control for AWS S3 storage, the preferred method is to use AWS CloudFront with signed URLs.
Here is a good example of how to set up CloudFront (a bit old, though, so you need to use the recommended settings, not
the legacy ones, and copy the generated policy down to S3):
https://medium.com/#himanshuarora/protect-private-content-using-cloudfront-signed-cookies-fd9674faec3
I have provided an example below of how to create one of these signed URLs using Python and the newest libraries.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-creating-signed-url-canned-policy.html
However, this requires the creation of a signed URL for each item in the S3 bucket. To give wildcard access to a
directory of items in the S3 bucket, you need to use what is called a custom policy. I could not find any working examples
of this in Python; many of the online examples use libraries that are deprecated. But attached is a working example.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-creating-signed-url-custom-policy.html
I had trouble getting the Python cryptography package to work when building the Lambda function on an Amazon Linux 2
instance on AWS EC2; it always failed with a missing-library error. So I used Klayers for AWS Lambda instead, and that worked:
https://github.com/keithrozario/Klayers/tree/master/deployments.
A working example of cookies for a canned policy (meaning a signed URL specific to each S3 file):
https://www.velotio.com/engineering-blog/s3-cloudfront-to-deliver-static-asset
My code below creates cookies for a custom policy (meaning a single policy statement with URL wildcards, etc.). You must follow the cryptography
package's examples, but note that the private_key.signer function was deprecated in favor of a new private_key.sign function that takes extra
arguments: https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/#signing
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding  # needed by sign_rsa
import base64
import datetime


class CFSigner:
    def sign_rsa(self, message):
        private_key = serialization.load_pem_private_key(
            self.keyfile, password=None, backend=default_backend()
        )
        signature = private_key.sign(
            message.encode("utf-8"), padding.PKCS1v15(), hashes.SHA1()
        )
        return signature

    def _sign_string(self, message, private_key_file=None, private_key_string=None):
        if private_key_file:
            self.keyfile = open(private_key_file, "rb").read()
        elif private_key_string:
            self.keyfile = private_key_string.encode("utf-8")
        return self.sign_rsa(message)

    def _url_base64_encode(self, msg):
        # CloudFront uses its own URL-safe base64 alphabet.
        msg_base64 = base64.b64encode(msg).decode("utf-8")
        msg_base64 = msg_base64.replace("+", "-")
        msg_base64 = msg_base64.replace("=", "_")
        msg_base64 = msg_base64.replace("/", "~")
        return msg_base64

    def generate_signature(self, policy, private_key_file=None):
        signature = self._sign_string(policy, private_key_file)
        encoded_signature = self._url_base64_encode(signature)
        return encoded_signature

    def create_signed_cookies2(self, url, private_key_file, keypair_id, expires_at):
        policy = self.create_custom_policy(url, expires_at)
        encoded_policy = self._url_base64_encode(policy.encode("utf-8"))
        signature = self.generate_signature(policy, private_key_file=private_key_file)
        cookies = {
            "CloudFront-Policy": encoded_policy,
            "CloudFront-Signature": signature,
            "CloudFront-Key-Pair-Id": keypair_id,
        }
        return cookies

    def create_signed_cookies(self, object_url, expires_at):
        cookies = self.create_signed_cookies2(
            url=object_url,
            private_key_file="xxx.pem",
            keypair_id="xxxxxxxxxx",
            expires_at=expires_at,
        )
        return cookies

    def create_custom_policy(self, url, expires_at):
        return (
            '{"Statement":[{"Resource":"'
            + url
            + '","Condition":{"DateLessThan":{"AWS:EpochTime":'
            + str(round(expires_at.timestamp()))
            + "}}}]}"
        )


def sign_to_cloudfront(object_url, expires_at):
    # Note: relies on a create_signed_url method (for the canned-policy
    # case) that is not shown here.
    cf = CFSigner()
    url = cf.create_signed_url(
        url=object_url,
        keypair_id="xxxxxxxxxx",
        expire_time=expires_at,
        private_key_file="xxx.pem",
    )
    return url


def lambda_handler(event, context):
    response = event["Records"][0]["cf"]["response"]
    headers = response.get("headers", None)
    cf = CFSigner()
    path = "https://www.example.com/*"
    expire = datetime.datetime.now() + datetime.timedelta(days=3)
    signed_cookies = cf.create_signed_cookies(path, expire)
    # Varying the case of "set-cookie" is a trick to emit three separate
    # Set-Cookie headers from Lambda@Edge.
    headers["set-cookie"] = [{
        "key": "set-cookie",
        "value": f"CloudFront-Policy={signed_cookies.get('CloudFront-Policy')}",
    }]
    headers["Set-cookie"] = [{
        "key": "Set-cookie",
        "value": f"CloudFront-Signature={signed_cookies.get('CloudFront-Signature')}",
    }]
    headers["Set-Cookie"] = [{
        "key": "Set-Cookie",
        "value": f"CloudFront-Key-Pair-Id={signed_cookies.get('CloudFront-Key-Pair-Id')}",
    }]
    print(response)
    return response
I'm currently using AWS Lambda to trigger an Amazon Comprehend job, but the code only runs one piece of text through sentiment analysis.
import boto3

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    bucket = "bucketName"
    key = "textName.txt"
    file = s3.get_object(Bucket=bucket, Key=key)
    analysisdata = str(file['Body'].read())
    comprehend = boto3.client("comprehend")
    sentiment = comprehend.detect_sentiment(Text=analysisdata, LanguageCode="en")
    print(sentiment)
    return 'Sentiment detected'
I want to run a file where each line in the text file is a new piece of text to analyze with sentiment analysis (that's an option if you manually enter text into Comprehend). Is there a way to alter this code to do that, and to have the output sentiment analysis file placed in that same S3 bucket? Thank you in advance.
It looks like you can use start_sentiment_detection_job():
response = client.start_sentiment_detection_job(
    InputDataConfig={
        'S3Uri': 'string',
        'InputFormat': 'ONE_DOC_PER_FILE'|'ONE_DOC_PER_LINE',
        'DocumentReaderConfig': {
            'DocumentReadAction': 'TEXTRACT_DETECT_DOCUMENT_TEXT'|'TEXTRACT_ANALYZE_DOCUMENT',
            'DocumentReadMode': 'SERVICE_DEFAULT'|'FORCE_DOCUMENT_READ_ACTION',
            'FeatureTypes': [
                'TABLES'|'FORMS',
            ]
        }
    },
    OutputDataConfig={
        'S3Uri': 'string',
        'KmsKeyId': 'string'
    },
    ...
)
It can read from an object in Amazon S3 (S3Uri) and store the output in an S3 object.
It looks like you could use 'InputFormat': 'ONE_DOC_PER_LINE' to meet your requirements.
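A hedged sketch of what that call might look like for this case (bucket name, key, output prefix, and role ARN are placeholders; note that start_sentiment_detection_job also requires a DataAccessRoleArn for a role Comprehend can assume, and a LanguageCode):

```python
def job_config(bucket, input_key, output_prefix):
    # Build the input/output configs: one document per line of the text
    # file, results written back to the same bucket.
    input_cfg = {"S3Uri": f"s3://{bucket}/{input_key}",
                 "InputFormat": "ONE_DOC_PER_LINE"}
    output_cfg = {"S3Uri": f"s3://{bucket}/{output_prefix}"}
    return input_cfg, output_cfg

def start_batch_sentiment_job(bucket, input_key, output_prefix, role_arn):
    # Requires AWS credentials and an IAM role Comprehend can assume.
    import boto3
    input_cfg, output_cfg = job_config(bucket, input_key, output_prefix)
    comprehend = boto3.client("comprehend")
    return comprehend.start_sentiment_detection_job(
        InputDataConfig=input_cfg,
        OutputDataConfig=output_cfg,
        DataAccessRoleArn=role_arn,
        LanguageCode="en",
    )

print(job_config("bucketName", "textName.txt", "comprehend-output/"))
```

The job runs asynchronously; its status can be polled with describe_sentiment_detection_job, and the output file lands under the OutputDataConfig S3Uri.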
The below-mentioned code was created to export all the findings from Security Hub to an S3 bucket using a Lambda function. The filters are set to export only CIS AWS Foundations Benchmark findings. There are more than 20 accounts added as members in Security Hub. The issue I'm facing is that even though I'm using the NextToken configuration, the output doesn't have information about all the accounts; instead, it just displays one account's data at random.
Can somebody look into the code and let me know what could be the issue, please?
import boto3
import json
from botocore.exceptions import ClientError
import time
import glob

client = boto3.client('securityhub')
s3 = boto3.resource('s3')
storedata = {}
_filter = {
    'GeneratorId': [
        {
            'Value': 'arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark',
            'Comparison': 'PREFIX'
        }
    ],
}

def lambda_handler(event, context):
    response = client.get_findings(
        Filters={
            'GeneratorId': [
                {
                    'Value': 'arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark',
                    'Comparison': 'PREFIX'
                },
            ],
        },
    )
    results = response["Findings"]
    while "NextToken" in response:
        response = client.get_findings(Filters=_filter, NextToken=response["NextToken"])
        results.extend(response["Findings"])
        storedata = json.dumps(response)
        print(storedata)
    save_file = open("/tmp/SecurityHub-Findings.json", "w")
    save_file.write(storedata)
    save_file.close()
    for name in glob.glob("/tmp/*"):
        s3.meta.client.upload_file(name, "xxxxx-security-hubfindings", name)
I am also now getting a TooManyRequestsException error.
The problem is in this code that paginates the security findings results:
while "NextToken" in response:
    response = client.get_findings(Filters=_filter, NextToken=response["NextToken"])
    results.extend(response["Findings"])
    storedata = json.dumps(response)
    print(storedata)
The value of storedata after the while loop has completed is the last page of security findings, rather than the aggregate of the security findings.
However, you're already aggregating the security findings in results, so you can use that:
save_file = open("/tmp/SecurityHub-Findings.json", "w")
save_file.write(json.dumps(results))
save_file.close()
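Put together, the whole fetch could be sketched with boto3's built-in paginator, which handles NextToken for you; the aggregation helper is kept pure so it is easy to check. For the TooManyRequestsException, botocore's retry configuration (the adaptive mode shown here) is the usual mitigation:

```python
def aggregate_pages(pages):
    # Collect Findings from every page, not just the last one.
    results = []
    for page in pages:
        results.extend(page["Findings"])
    return results

def fetch_all_findings(filters):
    # Requires AWS credentials; the adaptive retry mode backs off
    # automatically on throttling errors like TooManyRequestsException.
    import boto3
    from botocore.config import Config
    client = boto3.client(
        "securityhub",
        config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
    )
    paginator = client.get_paginator("get_findings")
    return aggregate_pages(paginator.paginate(Filters=filters))

# Pure check of the aggregation logic on fabricated pages:
pages = [{"Findings": ["a"]}, {"Findings": ["b", "c"]}]
print(aggregate_pages(pages))
```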
I made a serverless API backend on the AWS console which uses API Gateway, DynamoDB, and Lambda functions.
Upon creation I can add data to DynamoDB online by adding a JSON file, which looks like this:
{
"id": "4",
"k": "key1",
"v": "value1"
}
But when I try to add this using Postman, by putting the above JSON data in the body of the POST message, I get a positive response (i.e. no errors), but only the "id" field is added to the database, not "k" or "v".
What is missing?
I think that you need to check on your Lambda function.
As you are using Postman to make the API calls, the received event's body will look like this:
{
    'resource': ...,
    'body': '{\n\t"id": 1,\n\t"name": "ben"\n}',
    'isBase64Encoded': False
}
As you can see:
'body': '{\n\t"id": 1,\n\t"name": "ben"\n}'
For example, using Python 3, what we need to do is parse the body from JSON, and then we are able to use it:
result = json.loads(event['body'])
id = result['id']
name = result['name']
Then put them into DynamoDB:
item = table.put_item(
    Item={
        'id': str(id),
        'name': str(name)
    }
)
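Putting those pieces together, a minimal sketch of such a handler for the question's {id, k, v} items (the table name is hypothetical, and the parsing step is split out so it can be checked without AWS):

```python
import json

def parse_item(event):
    # API Gateway (proxy integration) delivers the POST body as a JSON string.
    body = json.loads(event["body"])
    return {"id": str(body["id"]), "k": str(body["k"]), "v": str(body["v"])}

def lambda_handler(event, context):
    item = parse_item(event)
    import boto3  # deferred so parse_item stays testable offline
    table = boto3.resource("dynamodb").Table("my-table")  # hypothetical name
    table.put_item(Item=item)
    return {"statusCode": 200, "body": json.dumps(item)}

event = {"body": '{"id": "4", "k": "key1", "v": "value1"}'}
print(parse_item(event))
```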
I am trying to use boto3 to run a set of queries, and I don't want to save the data to S3. Instead, I just want to get the results and work with them. I am trying the following:
import boto3

client = boto3.client('athena')
response = client.start_query_execution(
    QueryString='''SELECT * FROM mytable limit 10''',
    QueryExecutionContext={
        'Database': 'my_db'
    },
    ResultConfiguration={
        'OutputLocation': 's3://outputpath',
    }
)
print(response)
But here I don't want to give ResultConfiguration, because I don't want to write the results anywhere. If I remove the ResultConfiguration parameter, I get the following error:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in input: "ResultConfiguration"
So it seems that giving an S3 output location for writing is mandatory. Is there a way to avoid this and get the results only in the response?
The StartQueryExecution action indeed requires an S3 output location; the ResultConfiguration parameter is mandatory (unless your workgroup is configured with a default output location).
The alternative way to query Athena is via the JDBC or ODBC drivers. You could consider that route if you don't want to manage results in S3 yourself.
You will have to specify an S3 temp bucket location whenever running start_query_execution. However, you can get a result set (a dict) by calling the get_query_results method with the query execution ID.
The response (dict) will look like this:
{
    'UpdateCount': 123,
    'ResultSet': {
        'Rows': [
            {
                'Data': [
                    {
                        'VarCharValue': 'string'
                    },
                ]
            },
        ],
        'ResultSetMetadata': {
            'ColumnInfo': [
                {
                    'CatalogName': 'string',
                    'SchemaName': 'string',
                    'TableName': 'string',
                    'Name': 'string',
                    'Label': 'string',
                    'Type': 'string',
                    'Precision': 123,
                    'Scale': 123,
                    'Nullable': 'NOT_NULL'|'NULLABLE'|'UNKNOWN',
                    'CaseSensitive': True|False
                },
            ]
        }
    },
    'NextToken': 'string'
}
For more information, see boto3 client doc: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_results
You can then delete all files in the S3 temp bucket you've specified.
You still need to provide an S3 location as a temporary place for Athena to save the data, even though you want to process the data in Python. But you can page through the data as tuples using the pagination API; please refer to the example here. Hope that helps.
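For example, a hedged sketch of pulling the rows back into Python with the get_query_results paginator (the flattening helper is pure so it can be checked offline; note that for a typical SELECT, Athena includes the column-header row only at the top of the first page):

```python
def rows_from_page(page, skip_header=False):
    # Flatten one get_query_results page into lists of cell strings.
    rows = [[cell.get("VarCharValue") for cell in row["Data"]]
            for row in page["ResultSet"]["Rows"]]
    return rows[1:] if skip_header else rows

def fetch_query_rows(query_execution_id):
    # Requires AWS credentials; query_execution_id comes from the
    # start_query_execution response.
    import boto3
    paginator = boto3.client("athena").get_paginator("get_query_results")
    rows, first = [], True
    for page in paginator.paginate(QueryExecutionId=query_execution_id):
        rows.extend(rows_from_page(page, skip_header=first))
        first = False
    return rows

# Pure check on a fabricated single-column page:
page = {"ResultSet": {"Rows": [
    {"Data": [{"VarCharValue": "col"}]},
    {"Data": [{"VarCharValue": "a"}]},
]}}
print(rows_from_page(page, skip_header=True))
```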