Boto3 athena query without saving data to s3

Boto3 athena query without saving data to s3 - amazon-web-services

I am trying to use boto3 to run a set of queries and don't want to save the data to s3. Instead I just want to get the results and want to work with those results. I am trying to do the following
import boto3
client = boto3.client('athena')
response = client.start_query_execution(
QueryString='''SELECT * FROM mytable limit 10''',
QueryExecutionContext={
'Database': 'my_db'
}.
ResultConfiguration={
'OutputLocation': 's3://outputpath',
}
)
print(response)
But here I don't want to give ResultConfiguration because I don't want to write the results anywhere. But If I remove the ResultConfiguration parameter I get the following error
botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in input: "ResultConfiguration"
So it seems like giving s3 output location for writing is mendatory. So what could the way to avoid this and get the results only in response?

The StartQueryExecution action indeed requires a S3 output location. The ResultConfiguration parameter is mandatory.
The alternative way to query Athena is using JDBC or ODBC drivers. You should probably use this method if you don't want to store results in S3.

You will have to specify an S3 temp bucket location whenever running the 'start_query_execution' command. However, you can get a result set (a dict) by running the 'get_query_results' method using the query id.
The response (dict) will look like this:
{
'UpdateCount': 123,
'ResultSet': {
'Rows': [
{
'Data': [
{
'VarCharValue': 'string'
},
]
},
],
'ResultSetMetadata': {
'ColumnInfo': [
{
'CatalogName': 'string',
'SchemaName': 'string',
'TableName': 'string',
'Name': 'string',
'Label': 'string',
'Type': 'string',
'Precision': 123,
'Scale': 123,
'Nullable': 'NOT_NULL'|'NULLABLE'|'UNKNOWN',
'CaseSensitive': True|False
},
]
}
},
'NextToken': 'string'
}
For more information, see boto3 client doc: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_results
You can then delete all files in the S3 temp bucket you've specified.

You still need to provide s3 as temporary location for Athena to save the data although you want to process the data using python. But you can page through the data as tuple using Pagination API. please refer to the example here. Hope that helps

Related

I'm not getting the expected response from client.describe_image_scan_findings() using Boto3

I'm trying to use Boto3 to get the number of vulnerabilities from my images in my repositories. I have a list of repository names and image IDs that are getting passed into this function. Based off their documentation
I'm expecting a response like this when I filter for ['imageScanFindings']
'imageScanFindings': {
'imageScanCompletedAt': datetime(2015, 1, 1),
'vulnerabilitySourceUpdatedAt': datetime(2015, 1, 1),
'findingSeverityCounts': {
'string': 123
},
'findings': [
{
'name': 'string',
'description': 'string',
'uri': 'string',
'severity': 'INFORMATIONAL'|'LOW'|'MEDIUM'|'HIGH'|'CRITICAL'|'UNDEFINED',
'attributes': [
{
'key': 'string',
'value': 'string'
},
]
},
],
What I really need is the
'findingSeverityCounts' number, however, it's not showing up in my response. Here's my code and the response I get:
main.py
repo_names = ['cftest/repo1', 'your-repo-name', 'cftest/repo2']
image_ids = ['1.1.1', 'latest', '2.2.2']
def get_vuln_count(repo_names, image_ids):
container_inventory = []
client = boto3.client('ecr')
for n, i in zip(repo_names, image_ids):
response = client.describe_image_scan_findings(
repositoryName=n,
imageId={'imageTag': i}
)
findings = response['imageScanFindings']
print(findings)
Output
{'findings': []}
The only thing that shows up is findings and I was expecting findingSeverityCounts in the response along with the others, but nothing else is showing up.
THEORY
I have 3 repositories and an image in each repository that I uploaded. One of my theories is that I'm not getting the other responses, such as findingSeverityCounts because my images don't have vulnerabilities? I have inspector set-up to scan on push, but they don't have vulnerabilities so nothing shows up in the inspector dashboard. Could that be causing the issue? If so, how would I be able to generate a vulnerability in one of my images to test this out?

My theory was correct and when there are no vulnerabilities, the response completely omits certain values, including the 'findingSeverityCounts' value that I needed.
I created a docker image using python 2.7 to generate vulnerabilities in my scan to test out my script properly. My work around was to implement this if statement- if there's vulnerabilities it will return them, if there aren't any vulnerabilities, that means 'findingSeverityCounts' is omitted from the response, so I'll have it return 0 instead of giving me a key error.
Example Solution:
response = client.describe_image_scan_findings(
repositoryName=n,
imageId={'imageTag': i}
)
if 'findingSeverityCounts' in response['imageScanFindings']:
print(response['imageScanFindings']['findingSeverityCounts'])
else:
print(0)

Getting Lex bot checksum value

I am trying to update a Lex bot using the python SDK put_bot function. I need to pass a checksum value of the function as specified here. How do I get this value?
So far I have tried using below functions and the checksum values returned from these
The get_bot with the prod alias
The get_bot_alias function
The get_bot_aliases function
Is the checksum value

Lex Model Building Service:
checksum (string) --
Identifies a specific revision of the $LATEST version.
When you create a new bot, leave the checksum field blank. If you specify a checksum you get a BadRequestException exception.
When you want to update a bot, set the checksum field to the checksum of the most recent revision of the $LATEST version. If you don't specify the checksum field, or if the checksum does not match the $LATEST version, you get a PreconditionFailedException exception.
You should first retrieve the checksum of your bot if you want to update it.
You should be able to use the same checksum that is returned from get_bot_aliases().
This is the example response from get_bot_aliases() function.
{
'BotAliases': [
{
'name': 'string',
'description': 'string',
'botVersion': 'string',
'botName': 'string',
'lastUpdatedDate': datetime(2015, 1, 1),
'createdDate': datetime(2015, 1, 1),
'checksum': 'string', --checksum here
'conversationLogs': {
'logSettings': [
{
'logType': 'AUDIO'|'TEXT',
'destination': 'CLOUDWATCH_LOGS'|'S3',
'kmsKeyArn': 'string',
'resourceArn': 'string',
'resourcePrefix': 'string'
},
],
'iamRoleArn': 'string'
}
},
],
'nextToken': 'string'
}

If you are trying to update intent, first do get_intent and save the checksum and use it in put_intent, If you are using the put_bot api, then first get_bot and save the checksum in that , use it in put_bot api

I want to find out the total RAM size of AWS RDS through lambda python.I tried the code and got empty set.Is there any other way to find this?

import json
import boto3,datetime
def lambda_handler(event, context):
cloudwatch = boto3.client('cloudwatch',region_name=AWS_REGION)
response = cloudwatch.get_metric_data(
MetricDataQueries=[
{
'Id': 'memory',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/RDS',
'MetricName': 'TotalMemory',
'Dimensions': [
{
"Name": "DBInstanceIdentifier",
"Value": "mydb"
}]
},
'Period': 30,
'Stat': 'Average',
}
}
],
StartTime=(datetime.datetime.now() - datetime.timedelta(seconds=300)).timestamp(),
EndTime=datetime.datetime.now().timestamp()
)
print(response)
The result is like below:
{'MetricDataResults': [{'Id': 'memory', 'Label': 'TotalMemory', 'Timestamps': [], 'Values': [], 'StatusCode': 'Complete'}]

If you are looking to get the configured vCPU/Memory then it seems like we need to call DescribeDBInstances API to get DBInstanceClass, which contains the hardware information from here
You would need to use one of the CloudWatch metric names from https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MonitoringOverview.html#rds-metrics and it seems like we can retrieve the currently available memory metric using the FreeableMemory. I was able to get data (in bytes) as seen from the RDS' Monitoring console while using this metric name from your sample code.

You can check the total amount of memory and other useful information associated with the RDS in the CloudWatch console.
Step1: Go to the CloudWatch console. Navigate to Log groups.
Step2: Search for RDSOSMetrics in the search bar.
Step3: Click on the log stream. You will be able to find all the details in the JSON. Your total memory would be present in the field titled memory.total. Sample result would be like this
{
"engine": "MYSQL",
"instanceID": "dbName",
"uptime": "283 days, 21:08:36",
"memory": {
"writeback": 0,
"free": 171696,
"hugePagesTotal": 0,
"inactive": 1652000,
"pageTables": 19716,
"dirty": 324,
"active": 5850016,
"total": 7877180,
"buffers": 244312
}
}
I have intentionally reduced the message in the JSON because of the size, but there will be many other useful fields that you can find here.
You can use custom jq command-line utility to extract the field that you want from these log groups.
You can read more about this here cloudwatch enhanced monitoring.

Not able to add data into DynamoDB using API gateway POST method

I made a Serverless API backend on AWS console which uses API Gateway, DynamoDB, Lambda functions.
Upon creation I can add the data in dynamoDB online by adding a JSON file, which looks like this:
{
"id": "4",
"k": "key1",
"v": "value1"
}
But when I try to add this using "Postman", by adding the above JSON data in the body of POST message, I get a Positive return (i.e. no errors) but only the "id" field is added in the database and not the "k" or "v".
What is missing?

I think that you need to check on your Lambda function.
As you are using Postman to do the API calls, received event's body will be as follows:
{'resource':
...
}, 'body': '{\n\t"id": 1,\n\t"name": "ben"\n
}', 'isBase64Encoded': False
}
As you can see:
'body': '{\n\t"id": 1,\n\t"name": "ben"\n}'
For example, I will use Python 3 for this case, what I need to do is to load the body into JSON format then we are able to use it.
result = json.loads(event['body'])
id = result['id']
name = result['name']
Then update them into DynamoDB:
item = table.put_item(
Item={
'id': str(id),
'name': str(name)
}
)

Formatting for Firehose transformation output

I am using AWS Kinesis Firehose with a custom Data Transformation. The Lambda's written in Python 3.6 and returns strings that look like the following:
{
"records": [
{
"recordId": "...",
"result": "Ok",
"data": "..."
},
{
"recordId": "...",
"result": "Ok",
"data": "..."
},
{
"recordId": "...",
"result": "Ok",
"data": "..."
}
]
}
This Lambda is perfectly happy, and logs outputs that look like the above just before returning them to Firehose. However, the Firehose's S3 Logs then show an error:
Invalid output structure: Please check your function and make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed.
Looking at the examples for this spread across the web in JS and Java, it's not clear to me what I need to be doing differently; I'm quite confused.

If your data is a json object, you can try following
import base64
import json
def lambda_handler(event, context):
output = []
for record in event['records']:
# your own business logic.
json_object = {...}
output_record = {
'recordId': record['recordId'],
'result': 'Ok',
'data': base64.b64encode(json.dumps(json_object).encode('utf-8')).decode('utf-8')
}
output.append(output_record)
return {'records': output}
base64.b64encode function only works with b'xxx' string while 'data' attribute of output_record needs a normal 'xxx' string.

I've found the same error using Node.js.
Reading the documentation http://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html my mistake was not base64-encoding of the data field of every record.
I resolved doing this:
{
recordId: record.recordId,
result: 'Ok',
data: new Buffer(JSON.stringify(data)).toString('base64')
}

You can check the code in my repo.
https://github.com/hixichen/golang_lamda_decode_protobuf_firehose

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Boto3 athena query without saving data to s3 - amazon-web-services

The StartQueryExecution action indeed requires a S3 output location. The ResultConfiguration parameter is mandatory. The alternative way to query Athena is using JDBC or ODBC drivers. You should probably use this method if you don't want to store results in S3.

You still need to provide s3 as temporary location for Athena to save the data although you want to process the data using python. But you can page through the data as tuple using Pagination API. please refer to the example here. Hope that helps

Related

I'm not getting the expected response from client.describe_image_scan_findings() using Boto3

Getting Lex bot checksum value

I want to find out the total RAM size of AWS RDS through lambda python.I tried the code and got empty set.Is there any other way to find this?

Not able to add data into DynamoDB using API gateway POST method

Formatting for Firehose transformation output

Categories

Resources