Querying a BigQuery table in AI Platform Notebooks - google-cloud-platform

I'm stuck using a query in my Jupyter notebook on GCP.
The query works fine in BigQuery when I run it there (see pic below).
When I run it in my notebook using this code:
query = """SELECT *, FROM [kaggle-competition-datasets:geotab_intersection_congestion.train] LIMIT 4"""
import google.datalab.bigquery as bq
train = bq.Query(query).execute().result().to_dataframe()
I get this error.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.5/dist-packages/google/datalab/bigquery/_query.py in execute_async(self, output_options, sampling, context, query_params)
278 try:
--> 279 destination = query_result['configuration']['query']['destinationTable']
280 table_name = (destination['projectId'], destination['datasetId'], destination['tableId'])
KeyError: 'destinationTable'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
<ipython-input-1-85e1ffdadde6> in <module>
1 query = """SELECT *, FROM [kaggle-competition-datasets:geotab_intersection_congestion.train] LIMIT 4"""
2 import google.datalab.bigquery as bq
----> 3 train = bq.Query(query).execute().result().to_dataframe()
/usr/local/lib/python3.5/dist-packages/google/datalab/bigquery/_query.py in execute(self, output_options, sampling, context, query_params)
337 """
338 return self.execute_async(output_options, sampling=sampling, context=context,
--> 339 query_params=query_params).wait()
340
341 @staticmethod
/usr/local/lib/python3.5/dist-packages/google/datalab/bigquery/_query.py in execute_async(self, output_options, sampling, context, query_params)
281 except KeyError:
282 # The query was in error
--> 283 raise Exception(_utils.format_query_errors(query_result['status']['errors']))
284
285 execute_job = _query_job.QueryJob(job_id, table_name, sql, context=context)
Exception: invalidQuery: Syntax error: Unexpected "[" at [1:16]. If this is a table identifier, escape the name with `, e.g. `table.name` rather than [table.name].
Of course I modified the query as suggested by the traceback, but nothing works. What is the problem? Do notebooks on GCP access BigQuery tables in a different manner?
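As the traceback indicates, the datalab client is parsing the query as standard SQL, which rejects both the legacy [project:dataset.table] identifier and the trailing comma after SELECT *. A minimal sketch of the same query rewritten in standard SQL (same table and client as above; adjust as needed):
import google.datalab.bigquery as bq

# Standard SQL: backtick-quoted project.dataset.table identifier,
# no trailing comma in the select list.
query = """
SELECT *
FROM `kaggle-competition-datasets.geotab_intersection_congestion.train`
LIMIT 4
"""
train = bq.Query(query).execute().result().to_dataframe()
train.head()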

Related

torchvision.datasets.mnist RuntimeError on JupyterLab

I'm trying to run the following sample code on JupyterLab (through GCP Vertex AI):
import torch
from torchvision import transforms
from torchvision import datasets
train_data = datasets.MNIST(root='data', train=True, download=True, transform=None)
print(train_data)
with versions:
torch-1.12.1+cu113
torchvision-0.13.1+cu113
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_10081/229378695.py in <module>
11 from torchvision import datasets
12
---> 13 train_data = datasets.MNIST(root='data', train=True, download=True, transform=None)
14 print(train_data)
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py in __init__(self, root, train, transform, target_transform, download)
102 raise RuntimeError("Dataset not found. You can use download=True to download it")
103
--> 104 self.data, self.targets = self._load_data()
105
106 def _check_legacy_exist(self):
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py in _load_data(self)
121 def _load_data(self):
122 image_file = f"{'train' if self.train else 't10k'}-images-idx3-ubyte"
--> 123 data = read_image_file(os.path.join(self.raw_folder, image_file))
124
125 label_file = f"{'train' if self.train else 't10k'}-labels-idx1-ubyte"
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py in read_image_file(path)
542
543 def read_image_file(path: str) -> torch.Tensor:
--> 544 x = read_sn3_pascalvincent_tensor(path, strict=False)
545 if x.dtype != torch.uint8:
546 raise TypeError(f"x should be of dtype torch.uint8 instead of {x.dtype}")
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py in read_sn3_pascalvincent_tensor(path, strict)
529
530 assert parsed.shape[0] == np.prod(s) or not strict
--> 531 return parsed.view(*s)
532
533
RuntimeError: shape '[60000, 28, 28]' is invalid for input of size 9437168
and I'm getting this strange error when trying to load MNIST.
I tried reproducing it in other environments but couldn't - it works great locally and on Colab.
I tried lots of other versions of torch and torchvision, but none of them works.
This error is often caused by an issue with the MNIST dataset files that were downloaded onto your system. Try deleting the MNIST dataset files in the data directory and then running the code again to download fresh copies. Try this code:
import os
import shutil
from torchvision import datasets

# Remove the (possibly corrupted) cached dataset so it is downloaded again.
mnist_folder = 'data/MNIST'
if os.path.exists(mnist_folder):
    shutil.rmtree(mnist_folder)

train_data = datasets.MNIST(root='data', train=True, download=True, transform=None)
If this method doesn't work, download the dataset files manually from this website and place them in the data/MNIST folder.
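If you want to confirm that the downloaded files are the culprit, a small sketch like the following lists the sizes of the raw IDX files (data/MNIST/raw is torchvision's default location; adjust the path if yours differs):
import os

raw_dir = 'data/MNIST/raw'  # torchvision's default location for the raw IDX files
for name in sorted(os.listdir(raw_dir)):
    print(name, os.path.getsize(os.path.join(raw_dir, name)), 'bytes')

# A complete train-images-idx3-ubyte is 47,040,016 bytes (16-byte header + 60000 * 28 * 28);
# a much smaller file points to a truncated or corrupted download.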

Getting `dtype of input object does not match expected dtype <U0` when invoking MLflow-deployed NLP model in SageMaker

I deployed a Hugging Face Transformer model in SageMaker using MLflow's sagemaker.deploy().
When logging the model I used infer_signature(np.array(test_example), loaded_model.predict(test_example)) to infer input and output signatures.
Model is deployed successfully. When trying to query the model I get ModelError (full traceback below).
To query the model, I am using precisely the same test_example that I used for infer_signature():
test_example = [['This is the subject', 'This is the body']]
The only difference is that when querying the deployed model, I am not wrapping the test example in np.array(), as that is not JSON-serializable.
To query the model I tried two different approaches:
import json
import boto3
import pandas as pd
SAGEMAKER_REGION = 'us-west-2'
MODEL_NAME = '...'
client = boto3.client("sagemaker-runtime", region_name=SAGEMAKER_REGION)
# Approach 1
client.invoke_endpoint(
EndpointName=MODEL_NAME,
Body=json.dumps(test_example),
ContentType="application/json",
)
# Approach 2
client.invoke_endpoint(
EndpointName=MODEL_NAME,
Body=pd.DataFrame(test_example).to_json(orient="split"),
ContentType="application/json; format=pandas-split",
)
but they result in the same error.
Will be grateful for your suggestions.
Thank you!
Note: I am using Python 3 and all strings are unicode.
---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
<ipython-input-89-d09862a5f494> in <module>
2 EndpointName=MODEL_NAME,
3 Body=test_example,
----> 4 ContentType="application/json; format=pandas-split",
5 )
~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
393 "%s() only accepts keyword arguments." % py_operation_name)
394 # The "self" in this scope is referring to the BaseClient.
--> 395 return self._make_api_call(operation_name, kwargs)
396
397 _api_call.__name__ = str(py_operation_name)
~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
723 error_code = parsed_response.get("Error", {}).get("Code")
724 error_class = self.exceptions.from_code(error_code)
--> 725 raise error_class(parsed_response, operation_name)
726 else:
727 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{"error_code": "BAD_REQUEST", "message": "dtype of input object does not match expected dtype <U0"}". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/bec-sagemaker-model-test-app in account 543052680787 for more information.
Environment info:
{'channels': ['defaults', 'conda-forge', 'pytorch'],
'dependencies': ['python=3.6.10',
'pip==21.3.1',
'pytorch=1.10.2',
'cudatoolkit=10.2',
{'pip': ['mlflow==1.22.0',
'transformers==4.17.0',
'datasets==1.18.4',
'cloudpickle==1.3.0']}],
'name': 'bert_bec_test_env'}
I encoded the strings into numbers before sending them to the model.
Next, I added code within the model wrapper that decodes the numbers back to strings. This workaround worked without issues.
In my understanding, this might indicate that there is a problem with MLflow's type checking for strings.
Added an issue here: https://github.com/mlflow/mlflow/issues/5474
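For reference, a rough sketch of that workaround; the encode/decode helpers below are illustrative names, not code from the deployed model:
import json

# Client side: turn each string into a list of integer code points so the
# payload no longer trips the "<U0" dtype check (illustrative workaround).
def encode_strings(rows):
    return [[[ord(ch) for ch in field] for field in row] for row in rows]

# Model-wrapper side: reverse the encoding before running the transformer.
def decode_strings(rows):
    return [[''.join(chr(code) for code in field) for field in row] for row in rows]

test_example = [['This is the subject', 'This is the body']]
payload = json.dumps(encode_strings(test_example))
assert decode_strings(encode_strings(test_example)) == test_example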

Google Cloud - Pytorch Deep Learning VM - Error: 'module' object has no attribute '_cuda_setDevice'

I have tried GCP following this tutorial https://cloud.google.com/deep-learning-vm/docs/pytorch_start_instance and ran my code using a Tesla K80 GPU. It gives me the following AttributeError: 'module' object has no attribute '_cuda_setDevice'. A few days before, I had another VM and it worked fine. I created another Deep Learning VM and now it gives me that error. I don't understand why it's failing now.
This is my stack trace:
AttributeError Traceback (most recent call last)
<ipython-input-3-cc76841a4302> in <module>()
45 # from config_training import config as config_training
46 torch.manual_seed(0)
---> 47 torch.cuda.set_device(0)
48
49 model = import_module(args.model)
/usr/local/lib/python2.7/dist-packages/torch/cuda/__init__.pyc in set_device(device)
262 device = _get_device_index(device)
263 if device >= 0:
--> 264 torch._C._cuda_setDevice(device)
265
266
AttributeError: 'module' object has no attribute '_cuda_setDevice'
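One common cause of this error is a torch build installed without CUDA support, in which case torch._C has no _cuda_setDevice attribute. A quick sanity check (a sketch, not part of the original post) before calling set_device:
import torch

print(torch.__version__)
print(torch.version.cuda)          # None on a CPU-only build
print(torch.cuda.is_available())   # False if CUDA is unavailable or not compiled in

if torch.cuda.is_available():
    torch.cuda.set_device(0)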

get the client from pyspark

I want to retrieve a list of files. I saw a post saying that these commands would do the job:
from hdfs import Config
client = Config().get_client('dev')
client.list('/*')
But actually, execution fails:
---------------------------------------------------------------------------
HdfsError Traceback (most recent call last)
<ipython-input-308-ab40dc16879a> in <module>()
----> 1 client = Config().get_client('dev')
/opt/cloudera/extras/anaconda3/lib/python3.5/site-packages/hdfs/config.py in get_client(self, alias)
117 break
118 else:
--> 119 raise HdfsError('Alias %r not found in %r.', alias, self.path)
120 return self._clients[alias]
121
HdfsError: Alias 'dev' not found in '/home/sbenet/.hdfscli.cfg'.
As you can see, it is trying to access the file /home/sbenet/.hdfscli.cfg, which does not exist.
If I want to use this method to retrieve the list of files, I need to fix this .hdfscli.cfg issue, or use another method, maybe with sc.
You have to create a configuration file first. Check this out:
[global]
default.alias = dev
[dev.alias]
url = http://dev.namenode:port
user = ann
[prod.alias]
url = http://prod.namenode:port
root = /jobs/
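With that file saved as ~/.hdfscli.cfg (the url and user values above are placeholders for your own cluster), the original lookup should resolve the alias; a minimal sketch:
from hdfs import Config

# Reads ~/.hdfscli.cfg, finds default.alias = dev, then the [dev.alias] section.
client = Config().get_client('dev')
print(client.list('/'))  # list the contents of the HDFS root directory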

Issue starting out with xlwings - AttributeError: Excel.Application.Workbooks

I was trying to use the package xlwings and ran into a simple error right from the start. I was able to run the example files they provided here without any major issues (except for multiple Excel books opening up upon running the code), but as soon as I tried to execute code via IPython I got the error AttributeError: Excel.Application.Workbooks. Specifically, I ran:
from xlwings import Workbook, Sheet, Range, Chart
wb = Workbook()
Range('A1').value = 'Foo 1'
and got
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-7436ba97d05d> in <module>()
1 from xlwings import Workbook, Sheet, Range, Chart
----> 2 wb = Workbook()
3 Range('A1').value = 'Foo 1'
PATH\xlwings\main.pyc in __init__(self, fullname, xl_workbook, app_visible)
139 else:
140 # Open Excel if necessary and create a new workbook
--> 141 self.xl_app, self.xl_workbook = xlplatform.new_workbook()
142
143 self.name = xlplatform.get_workbook_name(self.xl_workbook)
PATH\xlwings\_xlwindows.pyc in new_workbook()
103 def new_workbook():
104 xl_app = _get_latest_app()
--> 105 xl_workbook = xl_app.Workbooks.Add()
106 return xl_app, xl_workbook
107
PATH\win32com\client\dynamic.pyc in __getattr__(self, attr)
520
521 # no where else to look.
--> 522 raise AttributeError("%s.%s" % (self._username_, attr))
523
524 def __setattr__(self, attr, value):
AttributeError: Excel.Application.Workbooks
I noticed the examples have a .xlsm file already present in the folder with the Python code. Does the Python code only ever work if it's in the same location as an existing Excel file? Does this mean it can't create Excel files automatically? Apologies if this is basic.