How to use NextToken in Boto3 - amazon-web-services

The code below exports all findings from Security Hub to an S3 bucket using a Lambda function. The filters are set to export only the CIS AWS Foundations Benchmark findings. More than 20 accounts are added as members in Security Hub. The issue I'm facing is that, even though I'm using NextToken, the output doesn't contain information about all the accounts. Instead, it displays the data of just one account, seemingly at random.
Can somebody look into the code and let me know what could be the issue, please?
import boto3
import json
from botocore.exceptions import ClientError
import time
import glob
client = boto3.client('securityhub')
s3 = boto3.resource('s3')
storedata = {}
_filter = Filters={
    'GeneratorId': [
        {
            'Value': 'arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark',
            'Comparison': 'PREFIX'
        }
    ],
}
def lambda_handler(event, context):
    response = client.get_findings(
        Filters={
            'GeneratorId': [
                {
                    'Value': 'arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark',
                    'Comparison': 'PREFIX'
                },
            ],
        },
    )
    results = response["Findings"]
    while "NextToken" in response:
        response = client.get_findings(Filters=_filter, NextToken=response["NextToken"])
        results.extend(response["Findings"])
    storedata = json.dumps(response)
    print(storedata)
    save_file = open("/tmp/SecurityHub-Findings.json", "w")
    save_file.write(storedata)
    save_file.close()
    for name in glob.glob("/tmp/*"):
        s3.meta.client.upload_file(name, "xxxxx-security-hubfindings", name)
I'm now also getting a TooManyRequestsException error.

The problem is in this code that paginates the security findings results:
while "NextToken" in response:
    response = client.get_findings(Filters=_filter, NextToken=response["NextToken"])
    results.extend(response["Findings"])
storedata = json.dumps(response)
print(storedata)
The value of storedata after the while loop has completed is the last page of security findings, rather than the aggregate of the security findings.
However, you're already aggregating the security findings in results, so you can use that:
save_file = open("/tmp/SecurityHub-Findings.json", "w")
save_file.write(json.dumps(results))
save_file.close()
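For reference, here is a minimal sketch of the whole pagination loop, assuming the `_filter` dict from the question is passed as `filters`. It also retries a page with exponential backoff when the call fails with `TooManyRequestsException`, which the question mentions. The function name `fetch_all_findings` and the retry parameters are illustrative, not part of boto3; the client is passed in so the loop can be exercised without AWS access.

```python
import time


def fetch_all_findings(client, filters, max_retries=5, base_delay=1.0):
    """Return every finding that matches `filters`, across all pages.

    Follows NextToken until Security Hub stops returning one. If a call is
    throttled (error code TooManyRequestsException), the same page is
    retried after an exponentially growing sleep.
    """
    findings = []
    kwargs = {"Filters": filters, "MaxResults": 100}  # 100 is the per-page maximum
    while True:
        for attempt in range(max_retries + 1):
            try:
                response = client.get_findings(**kwargs)
                break
            except Exception as exc:  # boto3 raises botocore's ClientError here
                error = getattr(exc, "response", None) or {}
                code = error.get("Error", {}).get("Code")
                # Re-raise anything that is not throttling, or when retries run out
                if code != "TooManyRequestsException" or attempt == max_retries:
                    raise
                time.sleep(base_delay * 2 ** attempt)
        findings.extend(response["Findings"])  # aggregate, never overwrite
        token = response.get("NextToken")
        if not token:
            return findings
        kwargs["NextToken"] = token
```

In the Lambda you would call `fetch_all_findings(boto3.client('securityhub'), _filter)` and write `json.dumps(findings)` to the file. botocore also retries throttled calls on its own; raising its retry budget with `botocore.config.Config(retries={'max_attempts': 10, 'mode': 'standard'})` is an alternative to the hand-written backoff, and boto3 should also offer a ready-made paginator via `client.get_paginator('get_findings')`.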

Related

How do you setup AWS Cloudfront to provide custom access to S3 bucket with signed cookies using wildcards

AWS Cloudfront with Custom Cookies using Wildcards in Lambda Function:
The problem:
For granular access control to AWS S3 storage, the preferred method is to use AWS CloudFront with signed URLs.
Here is a good example of how to set up CloudFront. It is a bit old, though, so you need to use the recommended settings rather than the legacy ones, and copy the generated policy to the S3 bucket.
https://medium.com/@himanshuarora/protect-private-content-using-cloudfront-signed-cookies-fd9674faec3
I have provided an example below of how to create one of these signed URLs using Python and the newest libraries.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-creating-signed-url-canned-policy.html
However, this requires creating a signed URL for each item in the S3 bucket. To give wildcard access to a directory of items in the S3 bucket, you need to use what is called a custom policy. I could not find any working examples of this in Python; many of the online examples rely on libraries that are deprecated. A working example is attached below.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-creating-signed-url-custom-policy.html
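Building the custom policy with `json.dumps` instead of string concatenation avoids quoting mistakes in the resource URL. This is a minimal sketch, assuming the policy only needs a `DateLessThan` condition as in the code below; the helper names `build_custom_policy` and `cloudfront_b64` are hypothetical. The CloudFront documentation asks for the policy statement with whitespace removed, hence the compact separators, and the encoding step performs the same `+`/`=`/`/` substitutions as `_url_base64_encode`.

```python
import base64
import json


def build_custom_policy(resource_url, expires_at):
    """Return a CloudFront custom policy as a compact JSON string.

    resource_url may contain the * wildcard, e.g. "https://www.example.com/*".
    expires_at is a datetime; it is converted to whole epoch seconds.
    """
    policy = {
        "Statement": [
            {
                "Resource": resource_url,
                "Condition": {
                    "DateLessThan": {"AWS:EpochTime": int(expires_at.timestamp())}
                },
            }
        ]
    }
    # No whitespace between tokens, as the policy statement must not contain any
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode, then swap the characters CloudFront treats as invalid."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")
```

The encoded policy becomes the `CloudFront-Policy` cookie, while the raw JSON string (not the encoded one) is what gets signed with the private key.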
I had trouble getting the Python cryptography package to work when building the Lambda function on an Amazon Linux 2 instance on AWS EC2; it always failed with a missing-library error. So I used Klayers for AWS instead, which worked:
https://github.com/keithrozario/Klayers/tree/master/deployments
A working example of cookies for a canned policy (that is, a signature specific to each S3 file):
https://www.velotio.com/engineering-blog/s3-cloudfront-to-deliver-static-asset
My code below is for cookies with a custom policy (that is, a single policy statement with URL wildcards, etc.). It follows the cryptography package's examples, but note that the private_key.signer function was deprecated in favor of the new private_key.sign function, which takes the message as an extra argument: https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/#signing
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
import base64
import datetime
class CFSigner:
    def sign_rsa(self, message):
        private_key = serialization.load_pem_private_key(
            self.keyfile, password=None, backend=default_backend()
        )
        # CloudFront expects an RSA signature with PKCS#1 v1.5 padding and SHA-1
        signature = private_key.sign(message.encode(
            "utf-8"), padding.PKCS1v15(), hashes.SHA1())
        return signature
    def _sign_string(self, message, private_key_file=None, private_key_string=None):
        if private_key_file:
            with open(private_key_file, "rb") as f:
                self.keyfile = f.read()
        elif private_key_string:
            self.keyfile = private_key_string.encode("utf-8")
        return self.sign_rsa(message)
    def _url_base64_encode(self, msg):
        # Base64, then replace characters CloudFront does not accept
        msg_base64 = base64.b64encode(msg).decode("utf-8")
        msg_base64 = msg_base64.replace("+", "-")
        msg_base64 = msg_base64.replace("=", "_")
        msg_base64 = msg_base64.replace("/", "~")
        return msg_base64
    def generate_signature(self, policy, private_key_file=None):
        signature = self._sign_string(policy, private_key_file)
        encoded_signature = self._url_base64_encode(signature)
        return encoded_signature
    def create_signed_cookies2(self, url, private_key_file, keypair_id, expires_at):
        policy = self.create_custom_policy(url, expires_at)
        encoded_policy = self._url_base64_encode(
            policy.encode("utf-8"))
        signature = self.generate_signature(
            policy, private_key_file=private_key_file)
        cookies = {
            "CloudFront-Policy": encoded_policy,
            "CloudFront-Signature": signature,
            "CloudFront-Key-Pair-Id": keypair_id,
        }
        return cookies
    def create_signed_cookies(self, object_url, expires_at):
        cookies = self.create_signed_cookies2(
            url=object_url,
            private_key_file="xxx.pem",
            keypair_id="xxxxxxxxxx",
            expires_at=expires_at,
        )
        return cookies
    def create_custom_policy(self, url, expires_at):
        return (
            '{"Statement":[{"Resource":"'
            + url
            + '","Condition":{"DateLessThan":{"AWS:EpochTime":'
            + str(round(expires_at.timestamp()))
            + "}}}]}"
        )
def lambda_handler(event, context):
    response = event["Records"][0]["cf"]["response"]
    headers = response.setdefault("headers", {})
    cf = CFSigner()
    path = "https://www.example.com/*"
    expire = datetime.datetime.now() + datetime.timedelta(days=3)
    signed_cookies = cf.create_signed_cookies(path, expire)
    # Lambda@Edge uses lowercase header keys; several Set-Cookie headers
    # are sent as separate entries in the same list.
    headers["set-cookie"] = [
        {"key": "Set-Cookie", "value": f"CloudFront-Policy={signed_cookies['CloudFront-Policy']}"},
        {"key": "Set-Cookie", "value": f"CloudFront-Signature={signed_cookies['CloudFront-Signature']}"},
        {"key": "Set-Cookie", "value": f"CloudFront-Key-Pair-Id={signed_cookies['CloudFront-Key-Pair-Id']}"},
    ]
    print(response)
    return response

AWS Textract InvalidParameterException

I have a .NET Core client application that uses Amazon Textract with S3, SNS, and SQS, as described in the AWS documentation topic Detecting and Analyzing Text in Multipage Documents (https://docs.aws.amazon.com/textract/latest/dg/async.html).
I created an AWS role with the AmazonTextractServiceRole policy and added the following trust relationship, as per the documentation (https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "textract.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
I subscribed the SQS queue to the topic and gave the Amazon SNS topic permission to send messages to the Amazon SQS queue, as per the AWS documentation.
All resources, including the S3 bucket, SNS, and SQS, are in the same us-west-2 region.
The following method fails with a generic "InvalidParameterException":
Request has invalid parameters
But if the NotificationChannel line is commented out, the code works fine and returns the correct job ID.
The error message doesn't make clear which parameter is at fault. Any help is highly appreciated.
public async Task<string> ScanDocument()
{
    string roleArn = "aws:iam::xxxxxxxxxxxx:instance-profile/MyTextractRole";
    string topicArn = "aws:sns:us-west-2:xxxxxxxxxxxx:AmazonTextract-My-Topic";
    string bucketName = "mybucket";
    string filename = "mytestdoc.pdf";
    var request = new StartDocumentAnalysisRequest();
    var notificationChannel = new NotificationChannel();
    notificationChannel.RoleArn = roleArn;
    notificationChannel.SNSTopicArn = topicArn;
    var s3Object = new S3Object
    {
        Bucket = bucketName,
        Name = filename
    };
    request.DocumentLocation = new DocumentLocation
    {
        S3Object = s3Object
    };
    request.FeatureTypes = new List<string>() { "TABLES", "FORMS" };
    request.NotificationChannel = notificationChannel; /* Commenting out this line makes the code work */
    var response = await this._textractService.StartDocumentAnalysisAsync(request);
    return response.JobId;
}
Debugging Invalid AWS Requests
The AWS SDK validates your request object locally, before dispatching it to the AWS servers. This validation can fail with unhelpfully opaque errors, like the one in the question.
As the SDK is open source, you can inspect the source to help narrow down the invalid parameter.
Before we look at the code: The SDK (and documentation) are actually generated from special JSON files that describe the API, its requirements and how to validate them. The actual code is generated based on these JSON files.
I'm going to use the Node.js SDK as an example, but similar approaches should work for the other SDKs, including .NET.
In our case (AWS Textract), the latest API version is 2018-06-27. Sure enough, the JSON source file for it is on GitHub.
In my case, experimentation narrowed the issue down to the ClientRequestToken. The error was an opaque InvalidParameterException. I searched for it in the SDK source JSON file, and sure enough, on line 392:
"ClientRequestToken": {
"type": "string",
"max": 64,
"min": 1,
"pattern": "^[a-zA-Z0-9-_]+$"
},
A whole bunch of undocumented requirements!
In my case the token I was using violated the regex (pattern in the above source code). Changing my token code to satisfy the regex solved the problem.
I recommend this approach for these sorts of opaque type errors.
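Following that approach, the constraints quoted above can be checked locally before calling Textract. This is a minimal sketch with hypothetical helper names; the length limits and character set are copied from the API model snippet. Note that a file name such as `mytestdoc.pdf` already fails the pattern because of the `.`, so hashing the name is one way to derive a token that always fits.

```python
import hashlib
import re

# Same character set as the model's "^[a-zA-Z0-9-_]+$", used with fullmatch
_TOKEN_CHARS = re.compile(r"[a-zA-Z0-9_-]+")
_TOKEN_MIN, _TOKEN_MAX = 1, 64


def is_valid_client_request_token(token):
    """True if token satisfies the min/max length and pattern from the model."""
    return (_TOKEN_MIN <= len(token) <= _TOKEN_MAX
            and _TOKEN_CHARS.fullmatch(token) is not None)


def make_client_request_token(seed):
    """Derive a valid token from any string (e.g. an S3 object key).

    A SHA-256 hex digest is exactly 64 characters of [0-9a-f], so it always
    satisfies both the length limit and the pattern.
    """
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()
```

Because the same seed always yields the same token, resubmitting the same file reuses the token, which, given ClientRequestToken's role as an idempotency token, should return the same JobId rather than start a new job.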
After a long day analyzing the issue, I was able to resolve it. As per the documentation, the topic only required the SendMessage action on the SQS queue, but after changing it to allow all SQS actions, it started working. Still, the AWS error message is really misleading and confusing.
You would need to change the permissions to allow all SQS actions and then use code like the following:
import time
import boto3
textract = boto3.client('textract')
def startJob(s3BucketName, objectName):
    response = textract.start_document_text_detection(
        DocumentLocation={
            'S3Object': {
                'Bucket': s3BucketName,
                'Name': objectName
            }
        })
    return response["JobId"]
def isJobComplete(jobId):
    # For production use cases, use SNS based notification
    # Details at: https://docs.aws.amazon.com/textract/latest/dg/api-async.html
    time.sleep(5)
    response = textract.get_document_text_detection(JobId=jobId)
    status = response["JobStatus"]
    print("Job status: {}".format(status))
    while(status == "IN_PROGRESS"):
        time.sleep(5)
        response = textract.get_document_text_detection(JobId=jobId)
        status = response["JobStatus"]
        print("Job status: {}".format(status))
    return status
def getJobResults(jobId):
    pages = []
    response = textract.get_document_text_detection(JobId=jobId)
    pages.append(response)
    print("Resultset page received: {}".format(len(pages)))
    nextToken = None
    if('NextToken' in response):
        nextToken = response['NextToken']
    while(nextToken):
        response = textract.get_document_text_detection(JobId=jobId, NextToken=nextToken)
        pages.append(response)
        print("Resultset page received: {}".format(len(pages)))
        nextToken = None
        if('NextToken' in response):
            nextToken = response['NextToken']
    return pages
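`getJobResults` returns the raw response pages. If you only need the recognized text, one option is to keep the `LINE` blocks from each page's `Blocks` list. A minimal sketch; the helper name `lines_from_pages` is hypothetical:

```python
def lines_from_pages(pages):
    """Flatten Textract result pages into the detected LINE texts, in order."""
    lines = []
    for page in pages:
        for block in page.get("Blocks", []):
            # Other block types (PAGE, WORD, ...) are skipped
            if block.get("BlockType") == "LINE":
                lines.append(block["Text"])
    return lines
```

For example, `print("\n".join(lines_from_pages(getJobResults(jobId))))` prints the document text line by line.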
Invoking Textract with Python, I received the same error until I truncated the ClientRequestToken to 64 characters:
import json
import boto3
client = boto3.client('textract')
response = client.start_document_text_detection(
    DocumentLocation={
        'S3Object': {
            'Bucket': bucket,
            'Name': fileName
        }
    },
    ClientRequestToken=fileName[:64],
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:AccountID:AmazonTextractXYZ",
        "RoleArn": "arn:aws:iam::AccountId:role/TextractRole"
    }
)
print('Processing started : %s' % json.dumps(response))

How to query the CostController API for cost forecast using boto3

I am trying to query the AWS Cost Explorer API for the cost forecast using boto3. Here is the code:
import boto3
client = boto3.client('ce', region_name='us-east-1', aws_access_key_id=key_id, aws_secret_access_key=secret_key)
#the args object presents the filters
data = client.get_cost_forecast(**args)
The result is:
AttributeError: 'CostExplorer' object has no attribute 'get_cost_forecast'
But the actual documentation for the API says that it provides the get_cost_forecast() function.
The get_cost_forecast method is documented in the Boto3 reference; see Boto3 CostForecast for the parameters it expects.
For example:
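import datetime
import boto3
client = boto3.client('ce', region_name='us-east-1')
start = datetime.date.today()
end = start + datetime.timedelta(days=30)  # End is exclusive
response = client.get_cost_forecast(
    TimePeriod={
        'Start': start.isoformat(),
        'End': end.isoformat()
    },
    # Other values: 'BLENDED_COST', 'AMORTIZED_COST', 'NET_UNBLENDED_COST',
    # 'NET_AMORTIZED_COST', 'USAGE_QUANTITY', 'NORMALIZED_USAGE_AMOUNT'
    Metric='UNBLENDED_COST',
    Granularity='MONTHLY',  # or 'DAILY'
    PredictionIntervalLevel=80  # percentage, between 51 and 99
)
print(response['Total'])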
I figured out that the botocore version I was using, 1.8.45, does not support the get_cost_forecast() method; an upgrade to version 1.9.71 is needed (for example, pip install --upgrade boto3 botocore). I hope this helps other people facing this issue.

AWS S3 to Google Cloud Storage Transfer not working with Python Client Library because "precondition check failed"

I tried testing a Cloud Storage Transfer job that transfers an object from AWS S3 to GCS (as a one-off task), but I keep getting googleapiclient.errors.HttpError: <HttpError 400 when requesting https://storagetransfer.googleapis.com/v1/transferJobs?alt=json returned "Precondition check failed.">.
Here is the code:
import argparse
import datetime
import json
from pprint import pprint
import googleapiclient.discovery
def main(description, project_id, year, month, day, hours, minutes,
         source_bucket, access_key, secret_access_key, sink_bucket):
    """Create a one-off transfer from Amazon S3 to Google Cloud Storage."""
    storagetransfer = googleapiclient.discovery.build('storagetransfer', 'v1')
    # Edit this template with desired parameters.
    # Specify times below using US Pacific Time Zone.
    transfer_job = {
        'description': description,
        'status': 'ENABLED',
        'projectId': project_id,
        'schedule': {
            'scheduleStartDate': {
                'day': day,
                'month': month,
                'year': year
            },
            'scheduleEndDate': {
                'day': day,
                'month': month,
                'year': year
            },
            'startTimeOfDay': {
                'hours': hours,
                'minutes': minutes
            }
        },
        'transferSpec': {
            'awsS3DataSource': {
                'bucketName': source_bucket,
                'awsAccessKey': {
                    'accessKeyId': access_key,
                    'secretAccessKey': secret_access_key
                }
            },
            'gcsDataSink': {
                'bucketName': sink_bucket
            }
        }
    }
    result = storagetransfer.transferJobs().create(body=transfer_job).execute()
    print('Returned transferJob: {}'.format(
        json.dumps(result, indent=4)))
if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('description', help='Transfer description.')
    parser.add_argument('project_id', help='Your Google Cloud project ID.')
    parser.add_argument('date', help='Date YYYY/MM/DD.')
    parser.add_argument('time', help='Time (24hr) HH:MM.')
    parser.add_argument('source_bucket', help='Source bucket name.')
    parser.add_argument('access_key', help='Your AWS access key id.')
    parser.add_argument('secret_access_key', help='Your AWS secret access '
                        'key.')
    parser.add_argument('sink_bucket', help='Sink bucket name.')
    args = parser.parse_args()
    date = datetime.datetime.strptime(args.date, '%Y/%m/%d')
    time = datetime.datetime.strptime(args.time, '%H:%M')
    main(
        args.description,
        args.project_id,
        date.year,
        date.month,
        date.day,
        time.hour,
        time.minute,
        args.source_bucket,
        args.access_key,
        args.secret_access_key,
        args.sink_bucket)
And here is the command I used to execute it:
python cloud_transfer_test.py "cloudtransfer" "MY_PROJECT_ID" "2018/04/12" "09:17" "SOURCE_BUCKET" "ACCESS_KEY" "SECRET_ACCESS_KEY" "SINK_BUCKET"
*Note: ACCESS_KEY and SECRET_ACCESS_KEY actually had values, but I'm omitting them from this post. Also, the date and time values are configured so that the job runs as soon as I execute the script.
I suspect your problem will turn out to be the permissions for the not-obviously-needed service account xxx@storage-transfer-service.iam.gserviceaccount.com.
You can find the name of the account you need by using the googleServiceAccounts.get call in the API Explorer:
https://cloud.google.com/storage/transfer/reference/rest/v1/googleServiceAccounts/get
The needed permissions are documented at https://cloud.google.com/storage/docs/access-control/iam-transfer
Be careful setting these. When I had this problem I had set them, but apparently not correctly, and after I went back to them it still took me a couple of tries before it worked. If they are not set correctly on the destination bucket/project, you will get the "Precondition check failed" error.
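The googleServiceAccounts.get call can also be made from the same Python client used in the question. A minimal sketch, assuming `storagetransfer` is the object returned by `googleapiclient.discovery.build('storagetransfer', 'v1')`; the helper name is hypothetical. The returned address is the account that needs the permissions described in the IAM documentation linked above.

```python
def transfer_service_account_email(storagetransfer, project_id):
    """Return the Google-managed service account used by Storage Transfer."""
    request = storagetransfer.googleServiceAccounts().get(projectId=project_id)
    result = request.execute()
    # The response is a GoogleServiceAccount resource; its accountEmail
    # field holds the address to grant bucket/project permissions to.
    return result["accountEmail"]
```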

AWS billing with python

I want to extract my current billing from AWS using the boto3 library for Python, but I couldn't find any API command that does so.
When I try the previous version (boto2) with the FPS connection and the get_account_balance() method, I wait for a response that never comes.
What's the correct way of doing this?
What's the correct way of doing so?
You can get the current bill of your AWS account by using the CostExplorer API.
Below is an example:
import boto3
client = boto3.client('ce', region_name='us-east-1')
response = client.get_cost_and_usage(
TimePeriod={
'Start': '2018-10-01',
'End': '2018-10-31'
},
Granularity='MONTHLY',
Metrics=[
'AmortizedCost',
]
)
print(response)
I use the CloudWatch API to extract billing information. The "AWS/Billing" namespace has everything you need.
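For the CloudWatch route, total estimated charges are published as the `EstimatedCharges` metric in the `AWS/Billing` namespace with a `Currency` dimension. A minimal sketch, assuming billing alerts are enabled for the account (otherwise the metric is not published) and that the client is created in us-east-1, where billing metrics are stored; the helper name `estimated_charges_usd` is hypothetical:

```python
import datetime


def estimated_charges_usd(cloudwatch, now=None):
    """Return the latest EstimatedCharges value (USD) from AWS/Billing, or None."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        StartTime=now - datetime.timedelta(days=1),
        EndTime=now,
        Period=21600,  # 6-hour buckets; the metric is only updated a few times a day
        Statistics=["Maximum"],
    )
    # Datapoints are not guaranteed to be sorted, so pick the newest explicitly
    points = sorted(stats["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Maximum"] if points else None
```

Call it as `estimated_charges_usd(boto3.client('cloudwatch', region_name='us-east-1'))`. The metric is a running month-to-date estimate, so the newest datapoint is the current figure.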
captainblack has the right idea. Note, however, that the end date is exclusive. In the example provided, you would only retrieve cost data from 2018-10-01 through 2018-10-30. The end date needs to be the first day of the following month if you want the cost data for the entire month:
import boto3
client = boto3.client('ce', region_name='us-east-1')
response = client.get_cost_and_usage(
TimePeriod={
'Start': '2018-10-01',
'End': '2018-11-01'
},
Granularity='MONTHLY',
Metrics=[
'AmortizedCost',
]
)
print(response)
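Since `End` is exclusive, a small helper can produce the right `TimePeriod` for any calendar month, including December. A minimal sketch; the function name `month_time_period` is hypothetical:

```python
import datetime


def month_time_period(year, month):
    """Return a Cost Explorer TimePeriod covering one whole calendar month.

    End is the first day of the following month because Cost Explorer
    treats End as exclusive.
    """
    start = datetime.date(year, month, 1)
    if month == 12:
        end = datetime.date(year + 1, 1, 1)
    else:
        end = datetime.date(year, month + 1, 1)
    return {"Start": start.isoformat(), "End": end.isoformat()}
```

`client.get_cost_and_usage(TimePeriod=month_time_period(2018, 10), Granularity='MONTHLY', Metrics=['AmortizedCost'])` is then equivalent to the corrected example above.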