Exception when trying to create BigQuery table via Python API - python-2.7

I'm working on an app that will stream events into BQ. Since streaming inserts require the table to exist beforehand, I'm running the following code to check whether the table exists, and then to create it if it doesn't:
TABLE_ID = "data" + single_date.strftime("%Y%m%d")
exists = False
request = bigquery.tables().list(projectId=PROJECT_ID,
                                 datasetId=DATASET_ID)
response = request.execute()

while response is not None:
    for t in response.get('tables', []):
        if t['tableReference']['tableId'] == TABLE_ID:
            exists = True
            break
    request = bigquery.tables().list_next(request, response)
    if request is None:
        break

if not exists:
    print("Creating Table " + TABLE_ID)
    dataset_ref = {'datasetId': DATASET_ID,
                   'projectId': PROJECT_ID}
    table_ref = {'tableId': TABLE_ID,
                 'datasetId': DATASET_ID,
                 'projectId': PROJECT_ID}
    schema_ref = SCHEMA
    table = {'tableReference': table_ref,
             'schema': schema_ref}
    table = bigquery.tables().insert(body=table, **dataset_ref).execute(http)
I'm running Python 2.7, and have installed the Google API client library through pip.
When I try to run the script, I get the following error:
No handlers could be found for logger "oauth2client.util"
Traceback (most recent call last):
  File "do_hourly.py", line 158, in <module>
    main()
  File "do_hourly.py", line 101, in main
    body=table, **dataset_ref).execute(http)
  File "build/bdist.linux-x86_64/egg/oauth2client/util.py", line 142, in positional_wrapper
  File "/usr/lib/python2.7/site-packages/googleapiclient/http.py", line 721, in execute
    resp, content = http.request(str(self.uri), method=str(self.method),
AttributeError: 'module' object has no attribute 'request'
I tried researching the issue, but all I could find was information about confusion between urllib, urllib2, and Python 2.7 / 3.
I'm not quite sure how to proceed with this, and would appreciate any help.
Thanks!

Figured out that the issue was in the following line, which I took from another SO thread:
table = bigquery.tables().insert(body=table, **dataset_ref).execute(http)
Once I removed the "http" variable, which doesn't exist in my scope, the exception disappeared.
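For reference, here is a minimal sketch of the corrected call; it assumes the bigquery service object was built with credentials already attached (e.g. via googleapiclient.discovery.build), so execute() needs no explicit http argument:

# The authorized http client is baked into the service object,
# so execute() can be called with no arguments.
table = bigquery.tables().insert(body=table, **dataset_ref).execute()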

Send string gremlin query to Amazon Neptune database using TinkerPop's gremlinpython

We can do the following to create a connection, attach the connection to the graph traversal source g, and then use g to run Gremlin queries inline.
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
Create a GraphTraversalSource, which is the basis for all Gremlin traversals:
graph = Graph()
connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = graph.traversal().withRemote(connection)
g.V().limit(2).toList()
However, I want to submit a string Gremlin query like the one below:
connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
query = "g.V().limit(2).toList()"
connection.submit(query)
Then I get the following error. It looks like I did NOT call the submit() function correctly, and I can't find any docs or examples for this function. Please help.
[ERROR] AttributeError: 'str' object has no attribute 'source_instructions'
Traceback (most recent call last):
  File "/var/task/sentry_sdk/integrations/aws_lambda.py", line 152, in sentry_handler
    return handler(aws_event, aws_context, *args, **kwargs)
    response = remoteConn.submit(query)
  File "/var/task/gremlin_python/driver/driver_remote_connection.py", line 56, in submit
    result_set = self._client.submit(bytecode, request_options=self._extract_request_options(bytecode))
  File "/var/task/gremlin_python/driver/driver_remote_connection.py", line 81, in _extract_request_options
    options_strategy = next((x for x in bytecode.source_instructions
Here is an example of calling submit() from Gremlin Python; you need to create the connection in a slightly different way:
from gremlin_python.driver.client import Client

client = Client('ws://localhost:8182/gremlin', 'g')
query = """
g.V().hasLabel('airport').
  sample(30).
  order().by('code').
  local(__.values('code','city').fold()).
  toList()
"""
result = client.submit(query)
future_results = result.all()
results = future_results.result()
client.close()
The full example is here
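As a side note, Client.submit() also accepts a bindings dictionary, so a string script can be parameterized; a small standalone sketch (the binding name n is illustrative):

# 'n' is sent as a binding and substituted server-side when the
# Groovy script is evaluated.
result = client.submit("g.V().limit(n).toList()", {'n': 2})
print(result.all().result())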

Why am I getting this error "google.api_core.exceptions.ResourceExhausted: 429 received trailing metadata size exceeds limit"?

I am new to Google Cloud Platform. I have created an endpoint after uploading a model to Google Vertex AI. But when I run the prediction function (Python) suggested in the sample request, I get this error:
Traceback (most recent call last):
  File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\grpc_helpers.py", line 67, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "C:\Users\My\anaconda3\lib\site-packages\grpc\_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "C:\Users\My\anaconda3\lib\site-packages\grpc\_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.RESOURCE_EXHAUSTED
    details = "received trailing metadata size exceeds limit"
    debug_error_string = "{"created":"@1622724354.768000000","description":"Error received from peer ipv4:***.***.***.**","file":"src/core/lib/surface/call.cc","file_line":1063,"grpc_message":"received trailing metadata size exceeds limit","grpc_status":8}">

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "b.py", line 39, in <module>
    predict_custom_trained_model_sample(
  File "b.py", line 28, in predict_custom_trained_model_sample
    response = client.predict(
  File "C:\Users\My\anaconda3\lib\site-packages\google\cloud\aiplatform_v1\services\prediction_service\client.py", line 445, in predict
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\gapic_v1\method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\grpc_helpers.py", line 69, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.ResourceExhausted: 429 received trailing metadata size exceeds limit
The code that I executed for prediction is:
from typing import Dict

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

def predict_custom_trained_model_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for
    # multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    # The format of each instance should conform to the deployed model's
    # prediction input schema.
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]
    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    # The predictions are a google.protobuf.Value representation of the
    # model's predictions.
    predictions = response.predictions
    for prediction in predictions:
        print(" prediction:", dict(prediction))
After running this code, I got the error above.
If anyone knows about this issue, please help.
A few things to consider:
Profile your custom container model; make sure its predict API function isn't latent for some reason.
Allow your prediction service to serve using multiple workers.
Increase your number of replicas in Vertex, or set your machine types to stronger types, as long as you gain improvement.
However, there's something worth doing first on the client side, assuming most of your prediction calls go through successfully and it's not that frequent that the service is unavailable.
Configure your prediction client to use Retry (exponential backoff):
from google.api_core.retry import Retry, if_exception_type
import requests.exceptions
from google.auth import exceptions as auth_exceptions
from google.api_core import exceptions

if_error_retriable = if_exception_type(
    exceptions.GatewayTimeout,
    exceptions.TooManyRequests,
    exceptions.ResourceExhausted,
    exceptions.ServiceUnavailable,
    exceptions.DeadlineExceeded,
    requests.exceptions.ConnectionError,  # The last three might be overkill
    requests.exceptions.ChunkedEncodingError,
    auth_exceptions.TransportError,
)

def _get_retry_arg():
    return Retry(
        predicate=if_error_retriable,
        initial=1.0,     # Initial delay
        maximum=4.0,     # Maximum delay
        multiplier=2.0,  # Delay's multiplier
        deadline=9.0,    # After 9 secs it won't try again and will raise
    )
def predict_custom_trained_model_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    ...
    response = client.predict(
        endpoint=endpoint,
        instances=instances,
        parameters=parameters,
        timeout=SOME_VALUE_IN_SEC,
        retry=_get_retry_arg(),
    )
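For completeness, a hypothetical invocation of the patched sample (the project and endpoint values are placeholders, and the instance must match your model's input schema):

predict_custom_trained_model_sample(
    project="my-gcp-project",        # placeholder project ID
    endpoint_id="1234567890123456",  # placeholder endpoint ID
    instance_dict={"input": [1.0, 2.0, 3.0]},  # must match the model's schema
)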

Scrapy download files from FTP

I need to download a group of CSV files using Scrapy from an FTP server. But first I need to scrape a website (https://www.douglas.co.us/assessor/data-downloads/) in order to get the URLs of the CSV files on the FTP server. I read about how to download files in the documentation (Downloading and processing files and images).
settings
custom_settings = {
    'ITEM_PIPELINES': {
        'scrapy.pipelines.files.FilesPipeline': 1,
    },
    'FILES_STORE': os.path.dirname(os.path.abspath(__file__))
}
parse
def parse(self, response):
    self.logger.info("In parse method!!!")
    # Property Ownership
    property_ownership = response.xpath("//a[contains(., 'Property Ownership')]/@href").extract_first()
    # Property Location
    property_location = response.xpath("//a[contains(., 'Property Location')]/@href").extract_first()
    # Property Improvements
    property_improvements = response.xpath("//a[contains(., 'Property Improvements')]/@href").extract_first()
    # Property Value
    property_value = response.xpath("//a[contains(., 'Property Value')]/@href").extract_first()
    item = FiledownloadItem()
    self.insert_keyvalue(item, "file_urls",
                         [property_ownership, property_location,
                          property_improvements, property_value])
    yield item
But I got the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/pipelines/media.py", line 79, in process_item
    requests = arg_to_iter(self.get_media_requests(item, info))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/pipelines/files.py", line 382, in get_media_requests
    return [Request(x) for x in item.get(self.files_urls_field, [])]
  File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 25, in __init__
    self._set_url(url)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 58, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: [
The best explanation of my problem is the answer to the question scrapy error: exceptions.ValueError: Missing scheme in request url:, which explains that the problem is that the URLs to download are missing the "http://" scheme.
What should I do in my case? Can I use FilesPipeline, or do I need to do something different?
Thanks in advance.
ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: [
According to the traceback, scrapy thinks your file url is '['.
My best guess is that you have an error in the insert_keyvalue() method.
Also, why have a method for this? Simple assignment should work.
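Since insert_keyvalue() isn't shown in the question, here's a hedged sketch of the plain assignment, assuming FiledownloadItem declares a file_urls field:

item = FiledownloadItem()
# file_urls must be a list of URL strings, not a stringified list.
item['file_urls'] = [property_ownership, property_location,
                     property_improvements, property_value]
yield item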

Error using OAuth2 to connect to dropbox in Python

On my Raspberry Pi running Raspbian Jessie, I tried to go through the OAuth2 flow to connect a program to my Dropbox account using the Dropbox SDK for Python, which I installed via pip.
For a test, I copied the code from the documentation (and defined the app key and secret, of course):
from dropbox import DropboxOAuth2FlowNoRedirect

auth_flow = DropboxOAuth2FlowNoRedirect(APP_KEY, APP_SECRET)
authorize_url = auth_flow.start()
print "1. Go to: " + authorize_url
print "2. Click \"Allow\" (you might have to log in first)."
print "3. Copy the authorization code."
auth_code = raw_input("Enter the authorization code here: ").strip()

try:
    access_token, user_id = auth_flow.finish(auth_code)
except Exception, e:
    print('Error: %s' % (e,))
    return

dbx = Dropbox(access_token)
I was able to get the URL and click Allow. When I then entered the authorization code, however, it printed the following error:
Error: 'str' object has no attribute 'copy'
Using format_exc from the traceback module, I got the following information:
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    access_token, user_id = auth_flow.finish(auth_code)
  File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 180, in finish
    return self._finish(code, None)
  File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 50, in _finish
    url = self.build_url(Dropbox.HOST_API, '/oauth2/token')
  File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 111, in build_url
    return "https://%s%s" % (self._host, self.build_path(target, params))
  File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 89, in build_path
    params = params.copy()
AttributeError: 'str' object has no attribute 'copy'
It seems the build_path method expects a dict for params but receives a string instead. Any ideas?
Thanks to smarx for his comment. The error is a known issue and will be fixed in version 3.42 of the SDK. source
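To check whether your installed SDK is new enough to include the fix, a quick sketch (assuming the package exposes its version attribute, as recent releases do):

import dropbox
# Upgrade the package if this prints a version older than 3.42.
print(dropbox.__version__)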

Why did this DynamoDB query fail with 'Requested resource not found'?

I have double-checked that the item exists in the DynamoDB table. id is the default hash key.
I want to retrieve the content by using the main function in this code:
import boto.dynamodb2
from boto.dynamodb2 import table

table = 'doc'
region = 'us-west-2'
aws_access_key_id = 'YYY'
aws_secret_access_key = 'XXX'

def get_db_conn():
    return boto.dynamodb2.connect_to_region(
        region,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key)

def get_table():
    return table.Table(table, get_db_conn())

def main():
    tbl = get_table()
    doc = tbl.get_item(id='4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30')
    print doc.keys()
However I get this exception instead:
Traceback (most recent call last):
  File "scripts/support/find_doc.py", line 31, in <module>
    main()
  File "scripts/support/find_doc.py", line 33, in main
    doc = tbl.get_item(id='4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30')
  File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/table.py", line 504, in get_item
    consistent_read=consistent
  File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 1065, in get_item
    body=json.dumps(params))
  File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2731, in make_request
    retry_handler=self._retry_handler)
  File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/connection.py", line 953, in _mexe
    status = retry_handler(response, i, next_sleep)
  File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2774, in _retry_handler
    data)
boto.exception.JSONResponseError: JSONResponseError: 400 Bad Request
{u'message': u'Requested resource not found', u'__type': u'com.amazonaws.dynamodb.v20120810#ResourceNotFoundException'}
Why am I getting this error message?
I am using boto version 2.34.
The problem is in this code:
def get_table():
    return table.Table(table, get_db_conn())
It should be:
def get_table():
    return table.Table(table, connection=get_db_conn())
Note the connection named parameter.
If you have a range key, you have to specify it in get_item, like so:
get_item(timestamp=Decimal('1444232509'), id='HASH_SHA1')
On my table Packages, I have an index (id) and a range key (timestamp).
I was getting this error because I was connecting to the wrong region.
To check your table's region, go to the Overview tab of your table and scroll down to the Amazon Resource Name (ARN) field.
My ARN starts with arn:aws:dynamodb:us-east-2:. Here 'us-east-2' is the region I need to pass when initializing the boto3 client.
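For illustration, a hedged boto3 sketch that passes the region explicitly (us-east-2 mirrors this answer's ARN; use whatever region your table's ARN shows, and note the table and key names here are taken from the question):

import boto3

# Point the client at the region shown in the table's ARN.
dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
tbl = dynamodb.Table('doc')  # table name from the question
resp = tbl.get_item(Key={'id': '4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30'})
print(resp.get('Item'))  # None if the key doesn't exist in this region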