I have double-checked that the item exists in the DynamoDB table. id is the default hash key.
I want to retrieve the content by using the main function in this code:
import boto.dynamodb2
from boto.dynamodb2 import table

table = 'doc'
region = 'us-west-2'
aws_access_key_id = 'YYY'
aws_secret_access_key = 'XXX'

def get_db_conn():
    return boto.dynamodb2.connect_to_region(
        region,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key)

def get_table():
    return table.Table(table, get_db_conn())

def main():
    tbl = get_table()
    doc = tbl.get_item(id='4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30')
    print doc.keys()
However, I get this exception instead:
File "scripts/support/find_doc.py", line 31, in <module>
main()
File "scripts/support/find_doc.py", line 33, in main
doc = tbl.get_item(id='4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30')
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/table.py", line 504, in get_item
consistent_read=consistent
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 1065, in get_item
body=json.dumps(params))
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2731, in make_request
retry_handler=self._retry_handler)
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/connection.py", line 953, in _mexe
status = retry_handler(response, i, next_sleep)
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2774, in _retry_handler
data)
boto.exception.JSONResponseError: JSONResponseError: 400 Bad Request
{u'message': u'Requested resource not found', u'__type': u'com.amazonaws.dynamodb.v20120810#ResourceNotFoundException'}
Why am I getting this error message?
I am using boto version 2.34.
The problem is in this code:
def get_table():
    return table.Table(table, get_db_conn())
It should be:
def get_table():
    return table.Table(table, connection=get_db_conn())
Note the connection named parameter.
If you have a range key, you have to specify it in the get_item call as well, like so:
get_item(timestamp=Decimal('1444232509'), id='HASH_SHA1')
Here, on my table Packages, I have a hash key (id) and a range key (timestamp).
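Putting both fixes together, a minimal sketch of the corrected lookup (using the same table name, region, and placeholder credentials as in the question, with boto 2.x) might look like:
from boto.dynamodb2 import connect_to_region
from boto.dynamodb2.table import Table

table_name = 'doc'
region = 'us-west-2'

def get_table():
    conn = connect_to_region(
        region,
        aws_access_key_id='YYY',
        aws_secret_access_key='XXX')
    # Pass the connection as a named parameter.
    return Table(table_name, connection=conn)

def main():
    tbl = get_table()
    # If the table also has a range key, pass it here as well.
    doc = tbl.get_item(id='4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30')
    print doc.keys()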
I was getting this error because I was connecting to the wrong region.
To check your table's region, go to the Overview tab of your table and scroll down to the Amazon Resource Name (ARN) field.
My ARN starts with arn:aws:dynamodb:us-east-2:. Here, 'us-east-2' is the region I need to pass when initializing the boto3 client.
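For reference, a minimal boto3 sketch with the region passed explicitly (the table name and key below are just the ones from the question above; substitute the region from your own ARN):
import boto3

# Region taken from the table's ARN (arn:aws:dynamodb:us-east-2:...).
dynamodb = boto3.client('dynamodb', region_name='us-east-2')

item = dynamodb.get_item(
    TableName='doc',
    Key={'id': {'S': '4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30'}})
print(item.get('Item'))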
I have a specific use case where I want to upload an object to S3 at a specific prefix. A file already exists at that prefix, and I want to replace it with this new one. I am using boto3 to do this. Bucket versioning is turned off, so I expect the file to simply be overwritten. However, I get the following error:
{
"errorMessage": "An error occurred (InvalidRequest) when calling the CopyObject operation: This copy request is illegal because it is trying to copy an object to itself without changing the object's metadata, storage class, website redirect location or encryption attributes.",
"errorType": "ClientError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n s3.Object(bucket,product_key).copy_from(CopySource=bucket + '/' + product_key)\n",
" File \"/var/runtime/boto3/resources/factory.py\", line 520, in do_action\n response = action(self, *args, **kwargs)\n",
" File \"/var/runtime/boto3/resources/action.py\", line 83, in __call__\n response = getattr(parent.meta.client, operation_name)(*args, **params)\n",
" File \"/var/runtime/botocore/client.py\", line 386, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/runtime/botocore/client.py\", line 705, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}
This is what I have tried so far.
import boto3
import tempfile
import os

print('Loading function')

s3 = boto3.resource('s3')
glue = boto3.client('glue')
bucket = 'my-bucket'
bucket_prefix = 'my-prefix'

def lambda_handler(_event, _context):
    my_bucket = s3.Bucket(bucket)
    # Code to find the object name. There is going to be only one file.
    for object_summary in my_bucket.objects.filter(Prefix=bucket_prefix):
        product_key = object_summary.key
        print(product_key)
    # Using the product_key variable I am trying to copy the same file name to the same location, which is when I get the error.
    s3.Object(bucket, product_key).copy_from(CopySource=bucket + '/' + product_key)
    # Maybe the following line is not required
    s3.Object(bucket, bucket_prefix).delete()
I have a very specific reason to copy the same file to the same location. AWS Glue doesn't pick up the same file once it has bookmarked it. By copying the file again, I am hoping that the Glue bookmark will be dropped and the Glue job will consider this a new file.
I am not too tied to the name. If you can help me modify the above code to generate a new file at the same prefix level, that would work as well. There always has to be exactly one file here, though. Consider this file a static list of products that has been brought over from a relational DB into S3.
Thanks
From Tracking Processed Data Using Job Bookmarks - AWS Glue:
For Amazon S3 input sources, AWS Glue job bookmarks check the last modified time of the objects to verify which objects need to be reprocessed. If your input source data has been modified since your last job run, the files are reprocessed when you run the job again.
So, it seems your theory could work!
However, as the error message states, it is not permitted to copy an S3 object to itself "without changing the object's metadata, storage class, website redirect location or encryption attributes".
Therefore, you can add some metadata as part of the copy process (telling S3 to replace the metadata rather than copy it) and it will succeed. For example:
s3.Object(bucket, product_key).copy_from(CopySource=bucket + '/' + product_key, Metadata={'foo': 'bar'}, MetadataDirective='REPLACE')
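For context, a minimal sketch of the handler with that change applied might look like this (it reuses the bucket and prefix variables from the question; the metadata key and value are just a hypothetical marker):
import boto3

s3 = boto3.resource('s3')
bucket = 'my-bucket'
bucket_prefix = 'my-prefix'

def lambda_handler(_event, _context):
    my_bucket = s3.Bucket(bucket)
    for object_summary in my_bucket.objects.filter(Prefix=bucket_prefix):
        product_key = object_summary.key
        # Copy the object onto itself while replacing its metadata, which also
        # updates the last-modified time that Glue bookmarks check.
        s3.Object(bucket, product_key).copy_from(
            CopySource=bucket + '/' + product_key,
            Metadata={'touched': 'reprocess'},
            MetadataDirective='REPLACE')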
I've used the sample from the BQ documentation to read a BQ table into a pandas dataframe using this query:
query_string = """
SELECT
CONCAT(
'https://stackoverflow.com/questions/',
CAST(id as STRING)) as url,
view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
"""
dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)
print(dataframe.head())
url view_count
0 https://stackoverflow.com/questions/22879669 48540
1 https://stackoverflow.com/questions/13530967 45778
2 https://stackoverflow.com/questions/35159967 40458
3 https://stackoverflow.com/questions/10604135 39739
4 https://stackoverflow.com/questions/16609219 34479
However, the minute I try to use any other non-public dataset, I get the following error:
google.api_core.exceptions.FailedPrecondition: 400 there was an error creating the session: the table has a storage format that is not supported
Is there some setting I need to set in my table so that it can work with the BQ Storage API?
This works:
query_string = """SELECT funding_round_type, count(*) FROM `datadocs-py.datadocs.investments` GROUP BY funding_round_type order by 2 desc LIMIT 2"""
>>> bqclient.query(query_string).result().to_dataframe()
funding_round_type f0_
0 venture 104157
1 seed 43747
However, when I set it to use the bqstorageclient I get that error:
>>> bqclient.query(query_string).result().to_dataframe(bqstorage_client=bqstorageclient)
Traceback (most recent call last):
File "/Users/david/Desktop/V/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 533, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.FAILED_PRECONDITION
details = "there was an error creating the session: the table has a storage format that is not supported"
debug_error_string = "{"created":"#1565047973.444089000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"there was an error creating the session: the table has a storage format that is not supported","grpc_status":9}"
>
I experienced the same issue as of 06 Nov 2019, and it turns out that the error you are getting is a known issue with the Read API: it cannot currently handle result sets smaller than 10 MB. I came across this issue, which sheds some light on the problem:
GitHub.com - GoogleCloudPlatform/spark-bigquery-connector - FAILED_PRECONDITION: there was an error creating the session: the table has a storage format that is not supported #46
I have tested it with a query that returns a result set larger than 10 MB, and it works fine for me with an EU multi-regional location for the dataset I am querying.
Also, you will need to install fastavro in your environment for this functionality to work.
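Until the underlying issue is resolved, one possible workaround (just a sketch, reusing the bqclient and bqstorageclient objects from the question) is to catch the FailedPrecondition error and fall back to the regular download path. Note that fastavro and the BigQuery Storage client library must be installed for the fast path to work at all:
from google.api_core import exceptions

try:
    # Fast path: BigQuery Storage API (requires fastavro and the
    # google-cloud-bigquery-storage client to be installed).
    dataframe = (
        bqclient.query(query_string)
        .result()
        .to_dataframe(bqstorage_client=bqstorageclient)
    )
except exceptions.FailedPrecondition:
    # Small result sets (under roughly 10 MB) can fail to create a Storage API
    # session, so re-run and download over the regular BigQuery API instead.
    dataframe = bqclient.query(query_string).result().to_dataframe()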
I have a TensorFlow model deployed with AWS SageMaker and the endpoint exposed.
From a Python Lambda function, I am using the boto3 client to invoke the endpoint.
The TensorFlow model accepts 3 inputs, as follows:
{'input1': numpy array, 'input2': integer, 'input3': numpy array}
From Lambda, I am using runtime.invoke_endpoint to invoke the SageMaker endpoint.
I am getting a Parse Error when the API is invoked from the boto3 client.
I tried serializing the data into CSV format before calling the API endpoint.
Below is the code written in Lambda:
payload = {'input1': encoded_enc_inputstanza_in_batch,
           'input2': encoded_enc_inputstanza_in_batch.shape[1],
           'input3': np.reshape([[15]*20], 20)}

infer_file = io.StringIO()
writer = csv.writer(infer_file)
for key, value in payload.items():
    writer.writerow([key, value])

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=infer_file.getvalue())
Additional details:
The SageMaker model expects 3 fields as input:
- 'input1' - numpy array
- 'input2' - int data type
- 'input3' - numpy array
Actual result:
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 143, in lambda_handler
Body=infer_file.getvalue())
File "/var/runtime/botocore/client.py", line 320, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 623, in _make_api_call
raise error_class(parsed_response, operation_name)
END RequestId: fa70e1f3-763b-41be-ad2d-76ae80aefcd0
Expected result: successful invocation of the API endpoint.
For text/csv, the value of the Body argument to invoke_endpoint should be a string with commas separating the values for each feature. For example, a record for a model with four features might look like: 1.5,16.0,14,23.0. You can try the following:
dp = ','.join(str(a) for a in payload)
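For instance, a rough sketch of building such a body from the question's payload and invoking the endpoint (this flattens the arrays into a single comma-separated record, which assumes the model's serving input expects one flat CSV row; runtime, ENDPOINT_NAME, and payload are the objects from the question's code):
import numpy as np

# Flatten all feature values into one comma-separated record.
# The ordering and shape must match what the model's input function expects.
values = (list(np.asarray(payload['input1']).ravel())
          + [payload['input2']]
          + list(np.asarray(payload['input3']).ravel()))
body = ','.join(str(v) for v in values)

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=body)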
After converting the data to comma-delimited text/csv format, have you updated the endpoint to use the new model data? The input data needs to match the schema of the model.
Is there a comma in the "encoded_enc_inputstanza_in_batch" variable?
I am following this example to batch-insert records into a table, but modifying it to fit my specific case, as such:
sql = 'INSERT INTO CypressApp_grammatrix (name, row_num, col_num, gram_amount) VALUES {}'.format(', '.join(['(%s, %s, %s, %s)'] * len(gram_matrix)))
#print sql
params = []
for gram in gram_matrix:
    col_num = 1
    for g in gram:
        params.extend([(matrix_name, row_num, col_num, g)])
        col_num += 1
    row_num += 1
print params

with closing(connection.cursor()) as cursor:
    cursor.execute(sql, params)
However, upon doing so, I receive this error
return cursor._last_executed.decode('utf-8')
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 150, in __getattr__
return getattr(self.cursor, attr)
AttributeError: 'Cursor' object has no attribute '_last_executed'
I would like to know why I received this error and what I can do to fix it, although I suspect the problem is with this Django MySQL backend code, which I did not write:
def last_executed_query(self, cursor, sql, params):
    # With MySQLdb, cursor objects have an (undocumented) "_last_executed"
    # attribute where the exact query sent to the database is saved.
    # See MySQLdb/cursors.py in the source distribution.
    return cursor._last_executed.decode('utf-8')
So I don't know if I simply have an old copy of MySQLdb or what, but the problem appears to be with cursors.py. The only spot in that file where you can find _last_executed is here:
def _do_query(self, q):
    db = self._get_db()
    self._last_executed = q
    db.query(q)
    self._do_get_result()
    return self.rowcount
However, __init__ does not set this variable up as an instance attribute; it's missing completely. So I took the liberty of adding it myself and initializing it to some query string. I assumed any string would do, so I just added:
class BaseCursor(object):

    """A base for Cursor classes. Useful attributes:

    description
        A tuple of DB API 7-tuples describing the columns in
        the last executed query; see PEP-249 for details.

    description_flags
        Tuple of column flags for last query, one entry per column
        in the result set. Values correspond to those in
        MySQLdb.constants.FLAG. See MySQL documentation (C API)
        for more information. Non-standard extension.

    arraysize
        default number of rows fetchmany() will fetch
    """

    from _mysql_exceptions import MySQLError, Warning, Error, InterfaceError, \
        DatabaseError, DataError, OperationalError, IntegrityError, \
        InternalError, ProgrammingError, NotSupportedError

    def __init__(self, connection):
        from weakref import ref
        ...
        self._last_executed = "SELECT * FROM T"
        ...
Now the cursor object does have the attribute _last_executed, and when this function
def last_executed_query(self, cursor, sql, params):
    # With MySQLdb, cursor objects have an (undocumented) "_last_executed"
    # attribute where the exact query sent to the database is saved.
    # See MySQLdb/cursors.py in the source distribution.
    return cursor._last_executed.decode('utf-8')
in base.py is called, the attribute does exist and so this error
return cursor._last_executed.decode('utf-8')
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 150, in __getattr__
return getattr(self.cursor, attr)
AttributeError: 'Cursor' object has no attribute '_last_executed'
will not be encountered. At least that is how I believe it works. In any case, it fixed the situation for me.
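If you would rather not edit the installed package, the same default can be applied at runtime with a small patch (just a sketch, assuming the MySQLdb layout shown above, where BaseCursor lives in MySQLdb/cursors.py; run it early, before any cursors are created):
import MySQLdb.cursors

_original_init = MySQLdb.cursors.BaseCursor.__init__

def _patched_init(self, connection):
    _original_init(self, connection)
    # Same placeholder as above, so base.py always has something to decode.
    self._last_executed = "SELECT * FROM T"

MySQLdb.cursors.BaseCursor.__init__ = _patched_init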
I'm working on an app that will stream events into BQ. Since streaming inserts require the table to already exist, I'm running the following code to check whether the table exists, and then to create it if it doesn't:
TABLE_ID = "data" + single_date.strftime("%Y%m%d")

exists = False
request = bigquery.tables().list(projectId=PROJECT_ID,
                                 datasetId=DATASET_ID)
response = request.execute()

while response is not None:
    for t in response.get('tables', []):
        if t['tableReference']['tableId'] == TABLE_ID:
            exists = True
            break
    request = bigquery.tables().list_next(request, response)
    if request is None:
        break

if not exists:
    print("Creating Table " + TABLE_ID)
    dataset_ref = {'datasetId': DATASET_ID,
                   'projectId': PROJECT_ID}
    table_ref = {'tableId': TABLE_ID,
                 'datasetId': DATASET_ID,
                 'projectId': PROJECT_ID}
    schema_ref = SCHEMA
    table = {'tableReference': table_ref,
             'schema': schema_ref}
    table = bigquery.tables().insert(body=table, **dataset_ref).execute(http)
I'm running Python 2.7 and have installed the Google API client through pip.
When I try to run the script, I get the following error:
No handlers could be found for logger "oauth2client.util"
Traceback (most recent call last):
File "do_hourly.py", line 158, in <module>
main()
File "do_hourly.py", line 101, in main
body=table, **dataset_ref).execute(http)
File "build/bdist.linux-x86_64/egg/oauth2client/util.py", line 142, in positional_wrapper
File "/usr/lib/python2.7/site-packages/googleapiclient/http.py", line 721, in execute
resp, content = http.request(str(self.uri), method=str(self.method),
AttributeError: 'module' object has no attribute 'request'
I tried researching the issue, but all I could find was info about confusion between urllib, urllib2, and Python 2.7 / 3.
I'm not quite sure how to continue with this and would appreciate any help.
Thanks!
Figured out that the issue was in the following line, which I took from another SO thread:
table = bigquery.tables().insert(body=table, **dataset_ref).execute(http)
Once I removed the "http" variable, which doesn't exist in my scope, the exception disappeared.
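For anyone hitting the same thing, the corrected call simply lets the authorized service object handle the HTTP transport itself:
table = bigquery.tables().insert(body=table, **dataset_ref).execute()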