How do we query on a secondary index of dynamodb using boto3? - amazon-web-services

Is there a way at all to query on a global secondary index of DynamoDB using boto3? I don't find any online tutorials or resources.

You need to provide an IndexName parameter for the query function.
This is the name of the index, which is usually different from the name of the index attribute (the name of the index has an -index suffix by default, although you can change it during table creation). For example, if your index attribute is called video_id, your index name is probably video_id-index.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('videos')

video_id = 25
response = table.query(
    IndexName='video_id-index',
    KeyConditionExpression=Key('video_id').eq(video_id)
)
To check the index name, go to the Indexes tab of the table on the web interface of AWS. You'll need a value from the Name column.
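If you prefer to check programmatically instead of using the console, DescribeTable also lists the index names. A minimal sketch with the low-level client (the 'videos' table name is just the example from above):

import boto3

client = boto3.client('dynamodb')
desc = client.describe_table(TableName='videos')

# 'GlobalSecondaryIndexes' is absent when the table has no GSIs
for gsi in desc['Table'].get('GlobalSecondaryIndexes', []):
    print(gsi['IndexName'])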

For anyone using the boto3 client, the example below should work:
import boto3

# for production
client = boto3.client('dynamodb')

# for local development if running a local dynamodb server
client = boto3.client(
    'dynamodb',
    region_name='localhost',
    endpoint_url='http://localhost:8000'
)

resp = client.query(
    TableName='UsersTable',
    IndexName='MySecondaryIndexName',
    ExpressionAttributeValues={
        ':v1': {
            'S': 'some#email.com',
        },
    },
    KeyConditionExpression='emailField = :v1',
)

# 'Items' is always a list (possibly empty)
items = resp.get('Items')
first_item = items[0]
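Note that a single Query call returns at most 1 MB of data. If the index holds more matching items than that, you have to follow LastEvaluatedKey; a sketch continuing from the client above, against the same hypothetical table and index:

items = []
kwargs = {
    'TableName': 'UsersTable',
    'IndexName': 'MySecondaryIndexName',
    'KeyConditionExpression': 'emailField = :v1',
    'ExpressionAttributeValues': {':v1': {'S': 'some#email.com'}},
}
while True:
    page = client.query(**kwargs)
    items.extend(page['Items'])
    if 'LastEvaluatedKey' not in page:
        break
    # resume the query where the previous page stopped
    kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']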

Adding the updated technique:
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource(
    'dynamodb',
    region_name='localhost',
    endpoint_url='http://localhost:8000'
)
table = dynamodb.Table('userTable')

attributes = table.query(
    IndexName='UserName',
    KeyConditionExpression=Key('username').eq('jdoe')
)
if 'Items' in attributes and len(attributes['Items']) == 1:
    attributes = attributes['Items'][0]

There are so many questions like this because calling DynamoDB through boto3 is not intuitive. I use the dynamof library to make things like this a lot more straightforward. Using dynamof, the call looks like this:
from dynamof.operations import query
from dynamof.conditions import attr

query(
    table_name='users',
    conditions=attr('role').equals('admin'),
    index_name='role_lookup_index'
)
https://github.com/rayepps/dynamof
disclaimer: I wrote dynamof

Related

Import and use a Tink AEAD KMS key in BigQuery

I've gone with the approach of using a kms_aead_key template to generate a DEK with the key in KMS acting as the KEK and then writing the encrypted key out for use in BigQuery as a variable.
I can’t seem to decrypt the DEK in BigQuery. It always fails with an error asking if I am referencing the right key, which I’m sure I am.
I’ve reproduced the problem in the script below. This script uses the same key_uri to encrypt in Python and decrypt in BigQuery, but I still get that error. I think the issue could be the format in which I am writing the key out and passing that to BQ, but I’m a little lost.
To get the script to run I had to download the roots.pem file from https://github.com/grpc/grpc/blob/master/etc/roots.pem, place it in the same directory as my script, and then set the environment variable GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=roots.pem.
If you can figure out what is wrong I would be very grateful.
Thanks
import io

import tink
from tink import aead, cleartext_keyset_handle
from tink.integration import gcpkms
from google.cloud import bigquery

key_uri = 'gcp-kms://projects/PROJECT/locations/europe-west2/keyRings/KEY_RING/cryptoKeys/KEY_NAME'

aead.register()
gcpkms.GcpKmsClient.register_client(key_uri=key_uri, credentials_path="")
template = aead.aead_key_templates.create_kms_aead_key_template(key_uri=key_uri)
keyset_handle = tink.KeysetHandle.generate_new(template)
kms_aead_primitive = keyset_handle.primitive(aead.Aead)

# Encrypt
encrypted_value = kms_aead_primitive.encrypt(
    plaintext='encrypt_me'.encode('utf-8'),
    associated_data='test'.encode('utf-8')
)

out = io.BytesIO()
writer = tink.BinaryKeysetWriter(out)
cleartext_keyset_handle.write(writer, keyset_handle)
out.seek(0)
key = out.read()

# Decrypt in BigQuery
bq_client = bigquery.Client(location='europe-west2')

sql = f"""
DECLARE kms_resource_name STRING;
DECLARE first_level_keyset BYTES;
DECLARE associated_data STRING;

SET kms_resource_name = '{key_uri}';
SET first_level_keyset = {key};
SET associated_data = 'test';

SELECT
  AEAD.DECRYPT_STRING(
    KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
    {encrypted_value},
    associated_data
  ) as Decrypt,
"""

job_config = bigquery.QueryJobConfig(
    priority=bigquery.QueryPriority.BATCH
)
job = bq_client.query(sql, job_config=job_config)
result = job.result()
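One thing that stands out is that the keyset and ciphertext are interpolated into the script as Python bytes reprs (b'...'). A cleaner way to hand raw bytes to BigQuery is to base64-encode them in Python and decode them in SQL with FROM_BASE64. This is only a sketch of that serialization step (key_uri, key and encrypted_value are the variables from the script above), not a confirmed fix for the decryption error itself:

import base64

# base64 avoids embedding a Python bytes repr directly in the SQL
keyset_b64 = base64.b64encode(key).decode('ascii')
ciphertext_b64 = base64.b64encode(encrypted_value).decode('ascii')

sql = f"""
SELECT
  AEAD.DECRYPT_STRING(
    KEYS.KEYSET_CHAIN('{key_uri}', FROM_BASE64('{keyset_b64}')),
    FROM_BASE64('{ciphertext_b64}'),
    'test'
  ) AS Decrypt
"""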

Dynamically Insert/Update Item in DynamoDB With Python Lambda using event['body']

I am working on a Lambda function that gets called from API Gateway and updates information in DynamoDB. I have half of this working really dynamically, and I'm a little stuck on updating. Here is what I'm working with:
A DynamoDB table with a partition key of guild_id
The dummy JSON I'm using:
{
    "guild_id": "126",
    "guild_name": "Posted Guild",
    "guild_premium": "true",
    "guild_prefix": "z!"
}
Finally, the Lambda code:
import json
import boto3

def lambda_handler(event, context):
    client = boto3.resource("dynamodb")
    table = client.Table("guildtable")
    itemData = json.loads(event['body'])
    guild = table.get_item(Key={'guild_id': itemData['guild_id']})

    # If guild exists, update
    if 'Item' in guild:
        table.update_item(Key=itemData)
        responseObject = {}
        responseObject['statusCode'] = 200
        responseObject['headers'] = {}
        responseObject['headers']['Content-Type'] = 'application/json'
        responseObject['body'] = json.dumps('Updated Guild!')
        return responseObject

    # New guild, insert guild
    table.put_item(Item=itemData)
    responseObject = {}
    responseObject['statusCode'] = 200
    responseObject['headers'] = {}
    responseObject['headers']['Content-Type'] = 'application/json'
    responseObject['body'] = json.dumps('Inserted Guild!')
    return responseObject
The insert part is working wonderfully. How would I accomplish a similar approach with update_item? I want this to be as dynamic as possible so I can throw any JSON (within reason) at it and have it stored in the database. I also want my update method to account for fields added down the road and handle those.
I get the following error:
Lambda execution failed with status 200 due to customer function error: An error occurred (ValidationException) when calling the UpdateItem operation: The provided key element does not match the schema.
A "The provided key element does not match the schema" error means something is wrong with Key (= primary key). Your schema's primary key is guild_id: string. Non-key attributes belong in the AttributeUpdate parameter. See the docs.
Your itemdata appears to include non-key attributes. Also ensure guild_id is a string "123" and not a number type 123.
goodKey={"guild_id": "123"}
table.update_item(Key=goodKey, UpdateExpression="SET ...")
The docs have a full update_item example.
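To keep the update as dynamic as the insert, one common pattern is to build the UpdateExpression from whatever non-key fields arrive in the payload. This is only a sketch (the build_update helper is hypothetical, and it assumes guild_id is the only key attribute):

def build_update(table, item_data, key_name='guild_id'):
    """Update every non-key attribute present in item_data."""
    key = {key_name: item_data[key_name]}
    attrs = {k: v for k, v in item_data.items() if k != key_name}
    if not attrs:
        return None  # nothing to update besides the key

    # placeholders avoid clashes with DynamoDB reserved words
    update_expr = 'SET ' + ', '.join(f'#n{i} = :v{i}' for i in range(len(attrs)))
    expr_names = {f'#n{i}': name for i, name in enumerate(attrs)}
    expr_values = {f':v{i}': value for i, value in enumerate(attrs.values())}

    return table.update_item(
        Key=key,
        UpdateExpression=update_expr,
        ExpressionAttributeNames=expr_names,
        ExpressionAttributeValues=expr_values,
    )

Calling build_update(table, itemData) in the "guild exists" branch would then update only the attributes present in the request body, including any fields you add later.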

DynamoDB Conditional Put_Item

Hi Stack Overflow, I'm trying to conditionally put an item into a DynamoDB table. The table has the following attributes:
ticker - Partition Key
price_date - Sort Key
price - Attribute
Every minute I'm calling an API which gives me a minute-by-minute list of dictionaries for all stock prices within the day so far. However, the data I receive from the API can sometimes be behind by a minute or two. I don't particularly want to overwrite all the records within the DynamoDB table every time I get new data. To achieve this I've tried to create a conditional expression to only use put_item when there is a match on ticker but a new price_date.
I've created a simplification of my code below to better illustrate my problem.
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('stock-intraday')

data = [
    {'ticker': 'GOOG', 'price_date': '2021-10-08T9:30:00.000Z', 'price': 100},
    {'ticker': 'GOOG', 'price_date': '2021-10-08T9:31:00.000Z', 'price': 101}
]

for item in data:
    dynamodb_response = table.put_item(
        Item=item,
        ConditionExpression=Attr("ticker").exists() & Attr("price_date").not_exists()
    )
However, when I run this code I get this error...
What is wrong with my conditional expression?
Found an answer to my own problem. DynamoDB was throwing an error because my code WAS working; it just needed some minor changes.
There needed to be a try/except block, and since the partition key is already part of the item being put, only price_date needed to be included in the condition expression.
import boto3
from boto3.dynamodb.conditions import Attr
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('stock-intraday')

data = [
    {'ticker': 'GOOG', 'price_date': '2021-10-08T9:30:00.000Z', 'price': 100},
    {'ticker': 'GOOG', 'price_date': '2021-10-08T9:31:00.000Z', 'price': 101}
]

for item in data:
    try:
        dynamodb_response = table.put_item(
            Item=item,
            ConditionExpression=Attr("price_date").not_exists()
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            pass

how to set table expiry time big query

Need help setting an expiry time for a new table in GBQ.
I am creating/uploading a new file as a table in GBQ using the code below:
def uploadCsvToGbq(self, table_name, jsonSchema, csvFile, delim):
    job_data = {
        'jobReference': {
            'projectId': self.project_id,
            'job_id': str(uuid.uuid4())
        },
        # "expires": str(datetime.now() + timedelta(seconds=60)),
        # "expirationTime": 20000,
        # "defaultTableExpirationMs": 20000,
        'configuration': {
            'load': {
                'writeDisposition': 'WRITE_TRUNCATE',
                'fieldDelimiter': delim,
                'skipLeadingRows': 1,
                'sourceFormat': 'CSV',
                'schema': {
                    'fields': jsonSchema
                },
                'destinationTable': {
                    'projectId': self.project_id,
                    'datasetId': self.dataset_id,
                    'tableId': table_name
                }
            }
        }
    }

    upload = MediaFileUpload(
        csvFile,
        mimetype='application/octet-stream',
        chunksize=1048576,
        # This enables resumable uploads.
        resumable=True
    )

    start = time.time()
    job_id = 'job_%d' % start

    # Create the job.
    return self.bigquery.jobs().insert(
        projectId=self.project_id,
        body=job_data,
        media_body=upload).execute()
This code works and uploads the file into GBQ as a new table. Now I need to set the expiry time for the table. I already tried setting expires, expirationTime and defaultTableExpirationMs (commented out above), but nothing works.
Does anyone have any idea?
You should use the Tables: patch API and set the expirationTime property.
The function below creates a table with an expirationTime, so as an alternative solution you can create the table first and insert the data later.
def createTableWithExpire(bigquery, dataset_id, table_id, expiration_time):
    """
    Creates a BQ table that will expire at the specified time.
    Expiration time is in milliseconds since the epoch, e.g. 1452627594000
    """
    table_data = {
        "expirationTime": expiration_time,
        "tableReference": {
            "tableId": table_id
        }
    }
    return bigquery.tables().insert(
        projectId=_PROJECT_ID,
        datasetId=dataset_id,
        body=table_data).execute()
Also answered by Mikhail in this SO question.
Thank you both, I combined both solutions, but made some modifications to get it working for my case.
As I am creating the table by uploading a CSV, I am setting the expirationTime by calling the patch method and passing the table ID to it:
def createTableWithExpire(bigquery, dataset_id, table_id, expiration_time):
    """
    Sets an expiration time on an existing BQ table.
    Expiration time is in milliseconds since the epoch, e.g. 1452627594000
    """
    table_data = {
        "expirationTime": expiration_time,
    }
    return bigquery.tables().patch(
        projectId=_PROJECT_ID,
        datasetId=dataset_id,
        tableId=table_id,
        body=table_data).execute()
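For reference, expirationTime is expressed in milliseconds since the epoch, so a value such as "expire 24 hours from now" can be computed as below. The helper name is just for illustration, and depending on the client library you may need to pass the value as a string, e.g. str(expiration_time):

import time

def expiration_ms_from_now(hours):
    """Milliseconds since the epoch for a point `hours` in the future."""
    return int((time.time() + hours * 3600) * 1000)

# e.g. pass this as expiration_time to the patch call above
expiration_time = expiration_ms_from_now(24)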
Another alternative is to set the expiration time after the table has been created:
from google.cloud import bigquery
import datetime

client = bigquery.Client()

table_ref = client.dataset('my-dataset').table('my-table')  # get table ref
table = client.get_table(table_ref)  # get Table object

# set datetime of expiration, must be a datetime type
table.expires = datetime.datetime.combine(
    datetime.date.today() + datetime.timedelta(days=2),
    datetime.time()
)
table = client.update_table(table, ['expires'])  # update table

BigQuery Create a table from Query Results

New to the BigQuery API. Trying to just do a basic query and have it saved to a table.
I am not sure what I am doing wrong with the code below (I have read the similar questions posted about this topic). I don't get an error, but it also doesn't save the results to a table like I want.
Any thoughts/advice?
import argparse

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)
query_request = bigquery_service.jobs()

query_data = {
    'query': (
        'SELECT * '
        'FROM [analytics.ddewber_acq_same_day] limit 5;'),
    'destinationTable': {
        "projectId": 'XXX-XXX-XXX',
        "datasetId": 'analytics',
        "tableId": "ddewber_test12"
    },
    "createDisposition": "CREATE_IF_NEEDED",
    "writeDisposition": "WRITE_APPEND",
}

query_response = query_request.query(
    projectId='XXX-XXX-XXX',
    body=query_data).execute()
See the difference between the Jobs: query API (which you use in your example) and the Jobs: insert API (which you should use). The Jobs: query request body has no destinationTable, createDisposition or writeDisposition fields, so those keys are ignored, which is why you get no error and no saved table; with Jobs: insert they go inside configuration.query.
Hope this gives you direction to fix your code.
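For illustration, a minimal sketch of what the Jobs: insert version might look like, reusing the bigquery_service client built above (the project/dataset/table IDs are the placeholders from the question, and useLegacySql is set because the query uses the legacy [dataset.table] syntax):

job_body = {
    'configuration': {
        'query': {
            'query': 'SELECT * FROM [analytics.ddewber_acq_same_day] LIMIT 5',
            'useLegacySql': True,  # the [dataset.table] syntax is legacy SQL
            'destinationTable': {
                'projectId': 'XXX-XXX-XXX',
                'datasetId': 'analytics',
                'tableId': 'ddewber_test12'
            },
            'createDisposition': 'CREATE_IF_NEEDED',
            'writeDisposition': 'WRITE_APPEND'
        }
    }
}

insert_response = bigquery_service.jobs().insert(
    projectId='XXX-XXX-XXX',
    body=job_body).execute()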