BigQuery Create a table from Query Results - python-2.7

I'm new to the BigQuery API. I'm trying to run a basic query and have it save the results to a table.
I am not sure what I am doing wrong with the code below (I have read the similar questions posted about this topic). I don't get an error, but it also doesn't save the results to a table like I want.
Any thoughts/advice?
import argparse

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)
query_request = bigquery_service.jobs()

query_data = {
    'query': (
        'SELECT * '
        'FROM [analytics.ddewber_acq_same_day] limit 5;'),
    'destinationTable': {
        'projectId': 'XXX-XXX-XXX',
        'datasetId': 'analytics',
        'tableId': 'ddewber_test12'
    },
    'createDisposition': 'CREATE_IF_NEEDED',
    'writeDisposition': 'WRITE_APPEND',
}

query_response = query_request.query(
    projectId='XXX-XXX-XXX',
    body=query_data).execute()

See the difference between the Jobs: query API (which you use in your example) and the Jobs: insert API (which you should use here). Jobs: query takes a plain QueryRequest body and ignores top-level fields like destinationTable, createDisposition, and writeDisposition; with Jobs: insert, those settings belong under configuration.query in the job resource.
Hope this gives you direction to fix your code.
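As a sketch of that approach (assuming the same project, dataset, and table IDs as in the question, and following the BigQuery v2 jobs.insert REST schema), the destination-table settings move into the configuration.query section of the job body:

```python
def build_query_job_body(project_id, dataset_id, table_id, query):
    """Build a jobs.insert request body whose query results are written
    to a destination table (BigQuery v2 JobConfigurationQuery fields)."""
    return {
        'configuration': {
            'query': {
                'query': query,
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
                'createDisposition': 'CREATE_IF_NEEDED',
                'writeDisposition': 'WRITE_APPEND',
            }
        }
    }

body = build_query_job_body(
    'XXX-XXX-XXX', 'analytics', 'ddewber_test12',
    'SELECT * FROM [analytics.ddewber_acq_same_day] LIMIT 5')
# then submit it as a job instead of a synchronous query:
# bigquery_service.jobs().insert(projectId='XXX-XXX-XXX', body=body).execute()
```

The job is asynchronous, so you would poll jobs().get() (or just inspect the table afterwards) to see when the write completes.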


How to use NextToken in Boto3

The code below exports all the findings from Security Hub to an S3 bucket using a Lambda function. The filters are set to export only CIS AWS Foundations Benchmark findings. More than 20 accounts are added as members in Security Hub. The issue I'm facing is that even though I'm using the NextToken configuration, the output doesn't contain information about all the accounts; instead, it just displays one account's data, seemingly at random.
Can somebody look into the code and let me know what the issue could be, please?
import boto3
import json
from botocore.exceptions import ClientError
import time
import glob

client = boto3.client('securityhub')
s3 = boto3.resource('s3')
storedata = {}

_filter = {
    'GeneratorId': [
        {
            'Value': 'arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark',
            'Comparison': 'PREFIX'
        }
    ],
}

def lambda_handler(event, context):
    response = client.get_findings(
        Filters={
            'GeneratorId': [
                {
                    'Value': 'arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark',
                    'Comparison': 'PREFIX'
                },
            ],
        },
    )
    results = response["Findings"]
    while "NextToken" in response:
        response = client.get_findings(Filters=_filter, NextToken=response["NextToken"])
        results.extend(response["Findings"])
        storedata = json.dumps(response)
        print(storedata)
    save_file = open("/tmp/SecurityHub-Findings.json", "w")
    save_file.write(storedata)
    save_file.close()
    for name in glob.glob("/tmp/*"):
        s3.meta.client.upload_file(name, "xxxxx-security-hubfindings", name)
I am also getting a TooManyRequestsException error now.
The problem is in this code that paginates the security findings results:
while "NextToken" in response:
    response = client.get_findings(Filters=_filter, NextToken=response["NextToken"])
    results.extend(response["Findings"])
    storedata = json.dumps(response)
    print(storedata)
The value of storedata after the while loop has completed is the last page of security findings, rather than the aggregate of the security findings.
However, you're already aggregating the security findings in results, so you can use that:
save_file = open("/tmp/SecurityHub-Findings.json", "w")
save_file.write(json.dumps(results))
save_file.close()
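If the hand-written loop also keeps tripping TooManyRequestsException, boto3's built-in paginator is a simpler alternative: it follows NextToken for you and applies the client's standard retry behaviour. A minimal sketch (the client is passed in, so it can be any configured securityhub client):

```python
def get_all_findings(client, filters):
    """Collect every page of Security Hub findings.

    Uses the client's built-in paginator for get_findings, which follows
    NextToken automatically instead of a hand-written while loop.
    """
    findings = []
    for page in client.get_paginator('get_findings').paginate(Filters=filters):
        findings.extend(page['Findings'])
    return findings

# usage (assumes AWS credentials and a region are configured):
# import boto3
# client = boto3.client('securityhub')
# results = get_all_findings(client, _filter)
```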

API for bigquery.tabledata.insertAll method

Below is my code for writing into a BigQuery table:
from google.cloud import bigquery
response = bigquery.tabledata.insertAll(
    projectId=PROJECT_ID,
    datasetId=DATASET_ID,
    tableId=TABLE_ID,
    body=data).execute()
However, I'm getting the following error:
no module tabledata in google.cloud.bigquery
Can anyone help me with this?
Which API should I use here?
Please check the Streaming data into BigQuery documentation. When using Python, you need to use the following function:
insert_rows(table, rows, selected_fields=None, **kwargs)
which inserts rows into a table via the streaming API. For more information, refer to the BigQuery Python API reference documentation.
You can check Python streaming insert example:
# TODO(developer): Import the client library.
# from google.cloud import bigquery

# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to fetch.
# table_id = "your-project.your_dataset.your_table"

table = client.get_table(table_id)  # Make an API request.

rows_to_insert = [(u"Phred Phlyntstone", 32), (u"Wylma Phlyntstone", 29)]

errors = client.insert_rows(table, rows_to_insert)  # Make an API request.
if errors == []:
    print("New rows have been added.")
You can also use the REST API, calling the tabledata.insertAll method, to see the request and response of the API. You need to specify projectId, datasetId, and tableId. Here is a JavaScript code snippet that performs the request:
function execute() {
  return gapi.client.bigquery.tabledata.insertAll({
    "projectId": "<your_projectId>",
    "datasetId": "<your_datasetId>",
    "tableId": "<your_tableId>",
    "resource": {}
  })
  .then(function(response) {
    // Handle the results here (response.result has the parsed body).
    console.log("Response", response);
  },
  function(err) { console.error("Execute error", err); });
}
Let me know about the results.

How to query the CostController API for cost forecast using boto3

I am trying to query the AWS Cost Explorer API for the cost forecast using boto3. Here is the code:
import boto3

client = boto3.client('ce', region_name='us-east-1',
                      aws_access_key_id=key_id,
                      aws_secret_access_key=secret_key)

# the args object contains the filters
data = client.get_cost_forecast(**args)
The result is:
AttributeError: 'CostExplorer' object has no attribute 'get_cost_forecast'
But the actual documentation for the API says that it provides the get_cost_forecast() function.
There is no get_cost_forecast method in your client; you can refer to the document below for getting a cost forecast:
Boto3 CostForecast
e.g.
import boto3

client = boto3.client('ce')
response = client.get_cost_forecast(
    TimePeriod={
        'Start': 'string',
        'End': 'string'
    },
    Metric='BLENDED_COST'|'UNBLENDED_COST'|'AMORTIZED_COST'|'NET_UNBLENDED_COST'|'NET_AMORTIZED_COST'|'USAGE_QUANTITY'|'NORMALIZED_USAGE_AMOUNT',
    Granularity='DAILY'|'MONTHLY',
    PredictionIntervalLevel=123
)
So, I figured out that the version of botocore I was using (1.8.45) does not support the get_cost_forecast() method. An upgrade to version 1.9.71 is needed. I hope this helps other people facing this issue.
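To check up front whether an installed botocore is new enough, a small (hypothetical) helper can compare version strings; the 1.9.71 threshold is taken from the resolution above:

```python
def version_tuple(version):
    """Parse a dotted version string like '1.8.45' into a comparable tuple."""
    return tuple(int(part) for part in version.split('.')[:3])

def supports_get_cost_forecast(botocore_version):
    """get_cost_forecast needs botocore >= 1.9.71 (per the fix above)."""
    return version_tuple(botocore_version) >= (1, 9, 71)

print(supports_get_cost_forecast('1.8.45'))  # False: upgrade required
print(supports_get_cost_forecast('1.9.71'))  # True

# check your own installation:
# import botocore
# print(botocore.__version__, supports_get_cost_forecast(botocore.__version__))
```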

Mocking DynamoDB using moto + serverless

I am trying to write tests for a serverless application using the AWS serverless framework. I am facing a weird issue. Whenever I try to mock S3 or DynamoDB using moto, it does not work. Instead of mocking, the boto3 call actually goes to my AWS account and tries to do things there.
This is not desirable behaviour. Could you please help?
Sample Code:
import datetime
import boto3
import uuid
import os

from moto import mock_dynamodb2
from unittest import mock, TestCase

from JobEngine.job_engine import check_duplicate

class TestJobEngine(TestCase):
    @mock.patch.dict(os.environ, {'IN_QUEUE_URL': 'mytemp'})
    @mock.patch('JobEngine.job_engine.logger')
    @mock_dynamodb2
    def test_check_duplicate(self, mock_logger):
        id = 'ABCD123'
        db = boto3.resource('dynamodb', 'us-east-1')
        table = db.create_table(
            TableName='my_table',
            KeySchema=[
                {
                    'AttributeName': 'id',
                    'KeyType': 'HASH'
                }
            ],
            AttributeDefinitions=[
                {
                    'AttributeName': 'id',
                    'AttributeType': 'S'
                }
            ],
            ProvisionedThroughput={
                'ReadCapacityUnits': 1,
                'WriteCapacityUnits': 1
            }
        )
        table.meta.client.get_waiter('table_exists').wait(TableName='my_table')
        table.put_item(
            Item={
                'id': {'S': id},
                ... other data ...
            }
        )
        res = check_duplicate(id)

        self.assertTrue(mock_logger.info.called)
        self.assertEqual(res, True, 'True')
Please see the code above: I am trying to insert a record into the table and then call a function that verifies whether the specified id is already present in the table. I get a "table already exists" error when I run this code.
If I disable the network, I get an error:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://dynamodb.us-east-1.amazonaws.com/"
I fail to understand why there is an attempt to connect to AWS if we are trying to mock.
I did some digging and have finally managed to solve this.
See https://github.com/spulec/moto/issues/1793
This issue was due to some incompatibilities between boto and moto. It turns out that everything works fine when botocore is downgraded to 1.10.84.

How do we query on a secondary index of dynamodb using boto3?

Is there a way at all to query on the global secondary index of DynamoDB using boto3? I can't find any online tutorials or resources.
You need to provide an IndexName parameter for the query function.
This is the name of the index, which is usually different from the name of the index attribute (the name of the index has an -index suffix by default, although you can change it during table creation). For example, if your index attribute is called video_id, your index name is probably video_id-index.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('videos')

video_id = 25
response = table.query(
    IndexName='video_id-index',
    KeyConditionExpression=Key('video_id').eq(video_id)
)
To check the index name, go to the Indexes tab of the table on the web interface of AWS. You'll need a value from the Name column.
For anyone using the boto3 client, the example below should work:
import boto3

# for production
client = boto3.client('dynamodb')

# for local development if running a local dynamodb server
client = boto3.client(
    'dynamodb',
    region_name='localhost',
    endpoint_url='http://localhost:8000'
)

resp = client.query(
    TableName='UsersTable',
    IndexName='MySecondaryIndexName',
    ExpressionAttributeValues={
        ':v1': {
            'S': 'some@email.com',
        },
    },
    KeyConditionExpression='emailField = :v1',
)

# will always return a list
items = resp.get('Items')
first_item = items[0]
Adding the updated technique:
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource(
    'dynamodb',
    region_name='localhost',
    endpoint_url='http://localhost:8000'
)
table = dynamodb.Table('userTable')

attributes = table.query(
    IndexName='UserName',
    KeyConditionExpression=Key('username').eq('jdoe')
)
if 'Items' in attributes and len(attributes['Items']) == 1:
    attributes = attributes['Items'][0]
There are so many questions like this because calling DynamoDB through boto3 is not intuitive. I use the dynamof library to make things like this a lot more common-sense. Using dynamof, the call looks like this:
from dynamof.operations import query
from dynamof.conditions import attr

query(
    table_name='users',
    conditions=attr('role').equals('admin'),
    index_name='role_lookup_index')
https://github.com/rayepps/dynamof
disclaimer: I wrote dynamof