Missing XLA configuration when running pytorch/xla - google-cloud-platform

I am trying to run a GCP TPU with PyTorch/XLA. I am using a VM with the debian-9-torch-xla-v20200818 image. I start the TPU and check that it is running with ctpu status, which shows that both the CPU and TPU are running. I then activate the torch-xla-nightly environment, but when I try to run this simple code:
import torch
import torch_xla
import torch_xla.core.xla_model as xm
dev = xm.xla_device()
t1 = torch.ones(3, 3, device = dev)
print(t1)
this error comes up:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda3/envs/torch-xla-nightly/lib/python3.6/site-packages/torch_xla/core/xla_model.py", line 231, in xla_device
devkind=devkind if devkind is not None else None)
File "/anaconda3/envs/torch-xla-nightly/lib/python3.6/site-packages/torch_xla/core/xla_model.py", line 136, in get_xla_supported_devices
xla_devices = _DEVICES.value
File "/anaconda3/envs/torch-xla-nightly/lib/python3.6/site-packages/torch_xla/utils/utils.py", line 32, in value
self._value = self._gen_fn()
File "/anaconda3/envs/torch-xla-nightly/lib/python3.6/site-packages/torch_xla/core/xla_model.py", line 18, in <lambda>
_DEVICES = xu.LazyProperty(lambda: torch_xla._XLAC._xla_get_devices())
RuntimeError: tensorflow/compiler/xla/xla_client/computation_client.cc:274 : Missing XLA configuration
I tried everything but nothing seems to work.

Take a look at this link as it seems to pertain to the issue. Maybe you didn't set up XRT_TPU_CONFIG:
(vm)$ export XRT_TPU_CONFIG="tpu_worker;0;$TPU_IP_ADDRESS:8470"
Follow the instructions here and you should be fine.

Another possibility, if XRT_TPU_CONFIG is set properly, is that you forgot to start your instance with the appropriate scopes:
gcloud compute instances create ... --scopes=https://www.googleapis.com/auth/cloud-platform
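If you prefer to keep the setting next to your script, here is a minimal sketch of the same fix done from Python. The IP below is a placeholder for your own $TPU_IP_ADDRESS; the variable only has to be set before torch_xla builds its XLA client:
import os
os.environ['XRT_TPU_CONFIG'] = 'tpu_worker;0;10.240.1.2:8470'  # placeholder TPU IP

import torch
import torch_xla.core.xla_model as xm

dev = xm.xla_device()
print(torch.ones(3, 3, device=dev))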

Related

PyMongo 3 and ServerSelectionTimeoutError while getting data from Mongodb

This seems like an old, solved problem (here, here and here), but I am still getting this error. I created my database on Docker and it worked only once. Since then I have re-created the db, set connect=False, added waits, downgraded pymongo, and tried the earlier solutions, but I am stuck.
Python 3.8.0, Pymongo 3.9.0
from pymongo import MongoClient
from pprint import pprint

client = MongoClient('mongodb://192.168.1.100:27017/',
                     username='admin',
                     password='psw',
                     authSource='myappdb',
                     authMechanism='SCRAM-SHA-1',
                     connect=False)
db = client['myappdb']
serverStatusResult = db.command("serverStatus")
pprint(serverStatusResult)
and I am getting ServerSelectionTimeoutError
Traceback (most recent call last):
File "C:\Users\ME\eclipse2019-workspace\exdjango\exdjango__init__.py",
line 12, in
serverStatusResult=db.command("serverStatus")
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\database.py",
line 610, in command
with self.client._socket_for_reads(
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\contextlib.py",
line 113, in __enter__
return next(self.gen)
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\mongo_client.py",
line 1099, in _socket_for_reads
server = topology.select_server(read_preference)
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py",
line 222, in select_server
return random.choice(self.select_servers(selector,
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py",
line 182, in select_servers
server_descriptions = self._select_servers_loop(
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py",
line 198, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: 192.168.1.100:27017: timed out
Your connection looks a little misconfigured. Firstly, you are mixing the connection-string format with the keyword-parameter format; I'd suggest you stick with one or the other.
Your auth database is usually separate from your actual databases (and it's usually called admin). Check this is correct.
There's no particular need to specify the authMechanism assuming you are using MongoDB 3.0 or later.
The connect=False is likely a red herring.
So I would try either:
client = MongoClient('mongodb://admin:psw@192.168.1.100:27017/myappdb?authSource=admin')
or
client = MongoClient(host='192.168.1.100',
                     port=27017,
                     username='admin',
                     password='psw',
                     authSource='admin')
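Either way, a quick sanity check that the server is reachable at all, independent of auth settings, is to send a ping (a small sketch; client is either of the clients above):
from pymongo.errors import ServerSelectionTimeoutError

try:
    client.admin.command('ping')
    print('MongoDB is reachable')
except ServerSelectionTimeoutError as exc:
    print('Cannot reach MongoDB: %s' % exc)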

Cloud composer issue with datasets in Australia region

I was trying to use Cloud Composer to schedule and orchestrate BigQuery jobs. The BigQuery tables are in the australia-southeast1 region. The Cloud Composer environment was created in the us-central1 region (as Composer is not available in the Australia region). When I run the command below, it throws a vague error. The same setup worked fine with datasets residing in the EU and US.
Command:
gcloud beta composer environments run bq-schedule --location us-central1 test -- my_bigquery_dag input_gl 8-02-2018
Error:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/usr/local/lib/airflow/airflow/bin/airflow", line 27, in <module>
args.func(args)
File "/usr/local/lib/airflow/airflow/bin/cli.py", line 528, in test
ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
File "/usr/local/lib/airflow/airflow/utils/db.py", line 50, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/airflow/airflow/models.py", line 1583, in run
session=session)
File "/usr/local/lib/airflow/airflow/utils/db.py", line 50, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/airflow/airflow/models.py", line 1492, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/airflow/airflow/contrib/operators/bigquery_operator.py", line 98, in execute
self.create_disposition, self.query_params)
File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 499, in run_query
return self.run_with_configuration(configuration)
File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 868, in run_with_configuration
err.resp.status)
Exception: ('BigQuery job status check failed. Final error was: %s', 404)
Is there any workaround to resolve this issue?
Because your dataset resides in australia-southeast1, BigQuery created the job in the same location by default, which is australia-southeast1. However, the Airflow in your Composer environment was trying to get the job's status without specifying the location field.
Reference: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
This has been fixed by my PR and it has been merged to master.
To work around this, you can extend the BigQueryCursor and override the run_with_configuration() function with location support.
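A minimal sketch of that workaround, assuming the Airflow 1.x contrib hook layout (airflow.contrib.hooks.bigquery_hook) and the google-api-python-client BigQuery v2 service; attribute names such as service, project_id and running_job_id follow that version and may differ in yours, and the class name below is hypothetical:
import time
from airflow.contrib.hooks.bigquery_hook import BigQueryCursor


class BigQueryCursorWithLocation(BigQueryCursor):
    """BigQueryCursor that inserts and polls jobs with an explicit location."""

    def __init__(self, service, project_id, location='australia-southeast1'):
        super(BigQueryCursorWithLocation, self).__init__(service, project_id)
        self.location = location

    def run_with_configuration(self, configuration):
        # Pin the job to the desired location when inserting it.
        job_data = {'configuration': configuration,
                    'jobReference': {'location': self.location}}
        job = self.service.jobs().insert(projectId=self.project_id,
                                         body=job_data).execute()
        self.running_job_id = job['jobReference']['jobId']

        # Poll jobs.get with the location field, which is what the stock
        # hook was missing for non-US/EU datasets.
        while True:
            job = self.service.jobs().get(projectId=self.project_id,
                                          jobId=self.running_job_id,
                                          location=self.location).execute()
            if job['status']['state'] == 'DONE':
                if 'errorResult' in job['status']:
                    raise Exception(job['status']['errorResult'])
                return self.running_job_id
            time.sleep(5)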

Unable to debug airflow error. Trying to use airflow to build data pipeline on GCP

I am trying to identify what might be causing the issue below (Airflow).
Basically, I have written a test DAG whose main task is to read data from BigQuery and write it into a new table. I tried searching for this but I am not able to find out what might be causing it. I'm not even sure if my GCP connection is working correctly, and I don't know how to test it.
Any help is greatly appreciated!
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
result = task_copy.execute(context=context)
File "/anaconda3/lib/python3.6/site-packages/airflow/operators/subdag_operator.py", line 103, in execute
executor=self.executor)
File "/anaconda3/lib/python3.6/site-packages/airflow/models.py", line 4214, in run
job.run()
File "/anaconda3/lib/python3.6/site-packages/airflow/jobs.py", line 203, in run
self._execute()
File "/anaconda3/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/anaconda3/lib/python3.6/site-packages/airflow/jobs.py", line 2547, in _execute
raise AirflowException(err)
airflow.exceptions.AirflowException: ---------------------------------------------------
Some task instances failed:
{('test_oscope.test_oscope', 'create_if_not_exists', datetime.datetime(2016, 6, 1, 0, 0, tzinfo=<Timezone [UTC]>), 1), ('test_oscope.test_oscope', 'fill', datetime.datetime(2016, 6, 1, 0, 0, tzinfo=<Timezone [UTC]>), 1)}
Hooks in Airflow are very useful on their own in interactive environments like IPython or a Jupyter Notebook.
For example:
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

gcs_hook = GoogleCloudStorageHook(google_cloud_storage_conn_id='google_conn_id')
gcs_hook.get_conn()  # This will check whether your GCP connection is working correctly or not
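Since the DAG in question reads from BigQuery, the same trick works with the BigQuery hook; a small sketch, assuming the Airflow 1.x contrib layout and a hypothetical connection id 'my_gcp_conn':
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

bq_hook = BigQueryHook(bigquery_conn_id='my_gcp_conn')
bq_hook.get_service()  # raises if the connection or its credentials are misconfigured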

While airflow initdb, AttributeError: 'module' object has no attribute 'client_auth'

I have recently installed Apache Airflow 1.8.1 and executed the following command:
airflow initdb
which returned the following error:
Traceback (most recent call last):
File "/usr/bin/airflow", line 18, in <module>
from airflow.bin.cli import CLIFactory
File "/usr/lib/python2.7/dist-packages/airflow/bin/cli.py", line 65, in <module>
auth=api.api_auth.client_auth)
AttributeError: 'module' object has no attribute 'client_auth'
I tried several solutions but none of them worked.
I figured out what we were doing wrong. The line auth_backend = airflow.contrib.auth.backends.password_auth needs to be under the webserver section, not under api. Add it there if it is not already present; note that auth_backend appears in more than one section, alongside fields such as authenticate.
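For reference, roughly what those sections of airflow.cfg look like after that change (option names as in Airflow 1.8; adjust backends to your installation):
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth

[api]
auth_backend = airflow.api.auth.backend.default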
I had the same error with Airflow 1.8.1 on Python 2.7.11.
I temporarily disabled the webserver auth and switched auth_backend back to its default value, which solved the issue.
The final configuration in my airflow.cfg is as follows:
auth_backend = airflow.api.auth.backend.default
authenticate = False

NotSupportedError when trying to build primary index in N1QL in Couchbase Python SDK

I'm trying to get into the new N1QL queries for Couchbase in Python.
I have my database set up in Couchbase 4.0.0.
My initial try was to retrieve all documents like this:
from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/default')
rv = bucket.n1ql_query('CREATE PRIMARY INDEX ON default').execute()
for row in bucket.n1ql_query('SELECT * FROM default'):
    print row
But this produces a NotSupportedError:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 2357, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1777, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/my_user/python_tests/test_n1ql.py", line 9, in <module>
rv = bucket.n1ql_query('CREATE PRIMARY INDEX ON default').execute()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 215, in execute
for _ in self:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 235, in __iter__
self._start()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 180, in _start
self._mres = self._parent._n1ql_query(self._params.encoded)
couchbase.exceptions.NotSupportedError: <RC=0x13[Operation not supported], Couldn't schedule n1ql query, C Source=(src/n1ql.c,82)>
Here are the version numbers of everything I use:
Couchbase Server: 4.0.0
couchbase python library: 2.0.2
cbc: 2.5.1
python: 2.7.8
gcc: 4.2.1
Does anyone have an idea what might have gone wrong here? I could not find any solution to this problem so far.
There was another ticket for node.js where the same issue happened, with a proposal to enable N1QL for the specific bucket first. Is this also needed in Python?
It would seem you didn't configure any cluster nodes with the Query or Index services. As such, the error returned is one that indicates no nodes are available.
I also got a similar error while trying to create a primary index.
Create a primary index...
Traceback (most recent call last):
File "post-upgrade-test.py", line 45, in <module>
mgr.n1ql_index_create_primary(ignore_exists=True)
File "/usr/local/lib/python2.7/dist-packages/couchbase/bucketmanager.py", line 428, in n1ql_index_create_primary
'', defer=defer, primary=True, ignore_exists=ignore_exists)
File "/usr/local/lib/python2.7/dist-packages/couchbase/bucketmanager.py", line 412, in n1ql_index_create
return IxmgmtRequest(self._cb, 'create', info, **options).execute()
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 160, in execute
return [x for x in self]
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 144, in __iter__
self._start()
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 132, in _start
self._cmd, index_to_rawjson(self._index), **self._options)
couchbase.exceptions.NotSupportedError: <RC=0x13[Operation not supported], Couldn't schedule ixmgmt operation, C Source=(src/ixmgmt.c,98)>
Adding query and index nodes to the cluster solved the issue.
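Once the cluster has nodes running the Query and Index services, the original snippet works as expected; a minimal sketch, assuming Couchbase Python SDK 2.x and a bucket named default as in the question:
from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/default')
bucket.bucket_manager().n1ql_index_create_primary(ignore_exists=True)
for row in bucket.n1ql_query('SELECT * FROM default LIMIT 5'):
    print row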