Airflow BigQueryHook ValueError: The project_id should be set

I am trying to fetch data from a BigQuery table using BigQueryHook, and I am getting this error:
  File "/home/abdul/etl-pipelines/venv/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 2061, in run_query
    raise ValueError("The project_id should be set")
ValueError: The project_id should be set
I tried exporting the AIRFLOW_CONN_BIGQUERY_DEFAULT environment variable, but it didn't help. This is how I set it:
export AIRFLOW_CONN_BIGQUERY_DEFAULT="google-cloud-platform://?extra__google_cloud_platform__project=<gcp_project_id>"
Here is my function
def get_data_from_bq(**kwargs):
    hook = BigQueryHook(delegate_to=None, use_legacy_sql=False)
    hook.run_query('SELECT max(created_date) FROM `dataset.table`')
I also tried passing project_id as an argument to BigQueryHook, but that gave me an "Unexpected argument" error.
I then tried setting it on the BigQueryHook object:
hook.project_id = project_id
I got the same ValueError.
I went through all the documentation I could find and still couldn't find a solution. Am I missing something?
Airflow BigQueryHook Docs

Automate raising ImproperlyConfigured if environment variable is missing

I have the following code in one of my views.py:
tl_key = os.getenv("TRANSLOADIT_KEY")
tl_secret = os.getenv("TRANSLOADIT_SECRET")
if not tl_key:
    logger.critical("TRANSLOADIT_KEY not set")
    raise ImproperlyConfigured
if not tl_secret:
    logger.critical("TRANSLOADIT_SECRET not set")
    raise ImproperlyConfigured
I know that if Django doesn't find SECRET_KEY or DEBUG environment variable, it will raise ImproperlyConfigured exception. Is there a way I can specify which env variables are required so that the said exception is raised automatically?
You can keep a list of the required variables and put this check in your main settings.py (or your dev settings):
import os
from django.core.exceptions import ImproperlyConfigured

required_env_items = ["TRANSLOADIT_KEY", "TRANSLOADIT_SECRET"]
for item in required_env_items:
    if not os.getenv(item):
        raise ImproperlyConfigured("please add {} in env file".format(item))
You can also add another environment variable that describes the environment, something like APP_ENVIRONMENT=dev|prod. That way you can check that variable and either raise ImproperlyConfigured or assign a default value in your code.
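A minimal sketch of both suggestions combined, in case it helps. The helper names require_env and load_transloadit_settings, the "dev-placeholder" value, and treating an unset APP_ENVIRONMENT as "prod" are all assumptions made for illustration. The try/except around the import is only there so the snippet runs without Django installed.

```python
import os

try:
    from django.core.exceptions import ImproperlyConfigured
except ImportError:
    # Stand-in so the sketch runs outside a Django project.
    class ImproperlyConfigured(Exception):
        pass


def require_env(names, environ=None):
    """Return {name: value}; raise once, naming every missing or empty variable."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise ImproperlyConfigured(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in names}


def load_transloadit_settings(environ=None):
    """Require the keys in prod; fall back to placeholder values elsewhere."""
    environ = os.environ if environ is None else environ
    names = ["TRANSLOADIT_KEY", "TRANSLOADIT_SECRET"]
    # Assumption: an unset APP_ENVIRONMENT is treated as "prod" (the strict case).
    if environ.get("APP_ENVIRONMENT", "prod") == "prod":
        return require_env(names, environ)
    return {name: environ.get(name, "dev-placeholder") for name in names}
```

Collecting all missing names before raising means one failed start-up reports every variable that still needs to be set, rather than one per restart.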

GCP/Python: Capturing actual error in subprocess.popen() while csv import from Hive to CloudSQL

I have Python 3.6.8 on GNU/Linux 3.10 on GCP and I'm trying to load data from Hive to CloudSQL.
gc_cmd_import_csv_p1 = subprocess.Popen(
    ['gcloud', 'sql', 'import', 'csv',
     '{}'.format(quote(cloudsql_instance)),
     '{}'.format(quote(load_csv_files)),
     '--database={}'.format(quote(cloudsql_db)),
     '--table={}'.format(quote(cloudsql_table_name)),
     '--user={}'.format(quote(db_user_name)),
     '--quiet'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True)
import_cmd_op, import_cmd_error = gc_cmd_import_csv_p1.communicate()
import_cmd_return_code = gc_cmd_import_csv_p1.returncode
if import_cmd_return_code:
    print("""[ERROR] Unable to import data from Hive to CloudSQL.
Error description: {}
Error Code(s): {}
Issue file name: {}
""".format(import_cmd_error, import_cmd_return_code, load_csv_files))
    sys.exit(9)
print("[INFO] Data Import completed from HIVE to CloudSQL.")
When an error occurs, I get a message like:
Error description: ERROR: (gcloud.sql.import.csv) HTTPError 403: The client is not authorized to make this request.Error Code(s): 1
But when I run the same import command directly, as shown below:
gcloud sql import csv test-cloud-sql-instance gs://test-server-12345/app1/data/lookup_table/000000_0 --database=test_db --table=name_lookup --user=test_user --quiet
I get the actual error:
ERROR: (gcloud.sql.import.csv) [ERROR_RDBMS] ERROR: extra data after last expected column CONTEXT: COPY name_lookup, line 16902:
I want this message (extra data after last expected column ... line 16902) to be shown by the Python script instead of the HTTPError 403 error. How can I capture it?
Please note: there is no authentication issue, despite what the HTTP error suggests.
After a long discussion with our GCP admin, we found the issue.
We ran the same import command using os.system() and got the HTTP error again. The admin then revisited the GCP IAM documentation and created a role for the P-SQL user. The issue is now resolved.
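For anyone checking the capture side: a small sketch showing that Popen with stderr=subprocess.PIPE returns whatever the child process wrote to stderr, unchanged. The helper name run_and_capture and the stand-in child command are made up for illustration; the child is a Python one-liner, not gcloud. If the captured text differs from what you see in a terminal, the command itself produced different output in that context, which fits the permissions fix described above.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; available on Python 3.6
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


# Stand-in child process (hypothetical): writes one line to stderr, exits with 1.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('ERROR: extra data after last expected column\\n'); sys.exit(1)",
]

code, out, err = run_and_capture(demo_cmd)
if code:
    print("[ERROR] return code {}: {}".format(code, err.strip()))
```

Because the arguments are passed as a list and no shell is involved, each list item reaches the program exactly as written.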

How to solve AttributeError in python active_directory?

Running the script below works for 60% of the entries in the MasterGroupList, but then it suddenly fails with the error below. Although my questions seem to be poor, you guys have been able to help me before. Any idea how I can avoid this error, or what is throwing off the script? The MasterGroupList looks like:
Groups Pulled from AD
SET00 POWERUSER
SET00 USERS
SEF00 CREATORS
SEF00 USERS
...another 300 entries...
Error:
Traceback (most recent call last):
  File "C:\Users\ks185278\OneDrive - NCR Corporation\Active Directory Access Script\test.py", line 44, in <module>
    print group.member
  File "C:\Python27\lib\site-packages\active_directory.py", line 805, in __getattr__
    raise AttributeError
AttributeError
Code:
from active_directory import *
import os
file = open("C:\Users\NAME\Active Directory Access Script\MasterGroupList.txt", "r")
fileAsList = file.readlines()
indexOfTitle = fileAsList.index("Groups Pulled from AD\n")
i = indexOfTitle + 1
while i <= len(fileAsList):
    fileLocation = 'C:\\AD Access\\%s\\%s.txt' % (fileAsList[i][:5], fileAsList[i][:fileAsList[i].find("\n")])
    #Creates the dir if it does not exist already
    if not os.path.isdir(os.path.dirname(fileLocation)):
        os.makedirs(os.path.dirname(fileLocation))
    fileGroup = open(fileLocation, "w+")
    #writes group members to the open file
    group = find_group(fileAsList[i][:fileAsList[i].find("\n")])
    print group.member
    for group_member in group.member: #this is line 44
        fileGroup.write(group_member.cn + "\n")
    fileGroup.close()
    i+=1
Disclaimer: I don't know python, but I know Active Directory fairly well.
If it's failing on this:
for group_member in group.member:
It could possibly mean that the group has no members.
Depending on how Python handles this, it could also mean that the group has only one member and group.member is a plain string rather than an array.
What does print group.member show?
The source code of active_directory.py is here: https://github.com/tjguk/active_directory/blob/master/active_directory.py
These are the relevant lines:
if name not in self._delegate_map:
    try:
        attr = getattr(self.com_object, name)
    except AttributeError:
        try:
            attr = self.com_object.Get(name)
        except:
            raise AttributeError
So it looks like it just can't find the attribute you're looking up, which in this case looks like the 'member' attribute.
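If the failing groups really are ones without a 'member' attribute, a defensive lookup along these lines might get the script past them. This is only a sketch under that assumption: members_of is a hypothetical helper, and FakeGroup is a stand-in that mimics the "raise AttributeError" behaviour quoted above so the snippet can run without Active Directory. How the real library represents a single member is also an assumption here.

```python
def members_of(group):
    """Return the group's members as a list, tolerating a missing or scalar attribute."""
    # getattr with a default swallows the AttributeError raised by __getattr__.
    members = getattr(group, "member", None)
    if members is None:
        return []
    if isinstance(members, (list, tuple)):
        return list(members)
    return [members]  # assumption: a lone member may come back as a single value


class FakeGroup(object):
    """Stand-in for an AD group object (hypothetical, for demonstration only)."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__.get("_attrs", {})[name]
        except KeyError:
            raise AttributeError


print(members_of(FakeGroup()))                    # group with no member attribute
print(members_of(FakeGroup(member="CN=alice")))   # single value
print(members_of(FakeGroup(member=("a", "b"))))   # several values
```

In the original loop this would replace the direct use of group.member, so an empty group just produces an empty file.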

Error with GDAL

I have tried and run this script from Rutger Kassies.
import gdal
import matplotlib.pyplot as plt
ds = gdal.Open('HDF4_SDS:sample:"A2002037045000.L2_LAC.SAMPLE.hdf":01')
data = ds.ReadAsArray()
ds = None
fig, ax = plt.subplots(figsize=(6,6))
ax.imshow(data[0,:,:], cmap=plt.cm.Greys, vmin=1000, vmax=6000)
But then an error always occurred:
Traceback (most recent call last):
  File "D:\path\to\python\stackoverflow.py", line 5, in <module>
    data = ds.ReadAsArray()
AttributeError: 'NoneType' object has no attribute 'ReadAsArray'
What's wrong with the script? Am I missing something? To install GDAL I followed these instructions: http://pythongisandstuff.wordpress.com/2011/07/07/installing-gdal-and-ogr-for-python-on-windows/
I am using Windows 7 (32-bit) with Python 2.7.
Thanks!
gdal.Open() is failing and returning None. This produces the sometimes counterintuitive message "'NoneType' object has no attribute ...". Quoting from "Python: Attribute Error - 'NoneType' object has no attribute 'something'": "NoneType means that instead of an instance of whatever Class or Object you think you're working with, you've actually got None. That usually means that an assignment or function call up above failed or returned an unexpected result."
Apparently GDAL is correctly installed. It could be that the file is not readable or that there is an issue with the HDF driver. Are you getting any error message like:
`HDF4_SDS:sample:"A2002037045000.L2_LAC.SAMPLE.hdf":01' does not exist in the file system, and is not recognised as a supported dataset name.
To get additional information you can try something like this instead of the gdal.Open() line in your script:
gdal.UseExceptions()
ds = None
try:
    ds = gdal.Open('HDF4_SDS:sample:"A2002037045000.L2_LAC.SAMPLE.hdf":01')
except RuntimeError, err:
    print "Exception: ", err
    exit(1)
Also, there's an extra '}' at the end of the script.
By default, osgeo.gdal returns None on error, and does not normally raise informative exceptions. You can change this with gdal.UseExceptions().
Try something like this:
from osgeo import gdal
gdal.UseExceptions()
source_path = r'HDF4_SDS:sample:"D:\path\to\file\A2002037045000.L2_LAC.SAMPLE.hdf":01'
try:
    ds = gdal.Open(source_path)
except RuntimeError as ex:
    raise IOError(ex)
The last bit just re-raises the exception as an IOError rather than a RuntimeError.
The fix is to change source_path to a working path to your data source. With a path that does not resolve, for example, I see:
IOError: `HDF4_SDS:sample:"A2002037045000.L2_LAC.SAMPLE.hdf":01' does not exist in the file system, and is not recognised as a supported dataset name.
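If you would rather not switch on exceptions globally, the "returns None on error" behaviour can also be handled with an explicit check. A rough sketch: open_or_raise is a hypothetical helper, written to take the opener as a parameter so it can be shown without GDAL installed. With GDAL you would call it as open_or_raise(gdal.Open, source_path).

```python
def open_or_raise(opener, path):
    """Call opener(path) and raise IOError if it returns None instead of a dataset."""
    dataset = opener(path)
    if dataset is None:
        raise IOError("could not open {!r}: opener returned None".format(path))
    return dataset


# Demonstration with a stand-in opener that always fails the way gdal.Open()
# does by default, i.e. by returning None.
try:
    open_or_raise(lambda p: None, "missing.hdf")
except IOError as ex:
    print("Exception:", ex)
```

This fails at the open call, with the path in the message, instead of later at ds.ReadAsArray().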

No indexers created by Djapian for Django

I am working through the tutorial for setting up Djapian and am trying to use the indexshell (as demonstrated in this step). When I run the command 'list' I get the following output:
Installed spaces/models/indexers:
- 0: 'global'
I therefore cannot run any queries:
>>> query
No index selected
Which leads me to attempt:
>>> use 0
Illegal index alias '0'. See 'list' command for available aliases
My index.py is as follows:
from djapian import space, Indexer, CompositeIndexer
from cms.models import Article

class ArticleIndexer(Indexer):
    fields = ['body']
    tags = [
        ('title', 'title'),
        ('author', 'author'),
        ('pub_date', 'pub_date',),
        ('category', 'category')
    ]

space.add_index(Article, ArticleIndexer, attach_as='indexer')
Update: I moved the djapian folder from site-packages to within my project folder, and I moved index.py from the project root to within the djapian folder. When I run 'list' in the indexshell, the following is now returned:
>>> list
Installed spaces/models/indexers:
- 0: 'global'
- 0.0 'cms.Article'
-0.0.0: 'djapian.space.defaultcmsarticleindexer'
I still cannot do anything though as when I try to select an index I still get the following error:
>>> use 0.0
Illegal index alias '0'. See 'list' command for available aliases
Update 2: I had a problem with my setting for DJAPIAN_DATABASE_PATH which is now fixed. I can select an indexer using the command 'use 0.0.0' but when I try to run a query it raises the following ValueError: "Empty slice".
Have you fixed the problem of the ValueError: Empty Slice?
I'm having the exact same problem using the Djapian tutorial. At first I wondered whether my database entries were right, but now I think it might have something to do with how the Xapian install is actually queried.
Since I never had to point to the install at all, I wonder whether I placed it in the right directory and whether Djapian knows where to find it.
-- Edit
I've found the solution, at least for me. The tutorial is not up to date, and the query command expects a number of results too. So if you use 'query mykeyword 5' you get 5 results and the ValueError: Empty Slice disappears. It's a known issue and, from what I read, it will be fixed soon.
Perhaps you're not loading indexes?
You could try placing the following in your main urls.py:
import djapian
djapian.load_indexes()
In a comment to your question you write that you've placed the index.py file in the project root. It should actually reside within an app, alongside models.py.
One more thing (which is very unlikely to be the cause of your problems); you've got a stray comma on the following line:
('pub_date', 'pub_date',),
                       ^