Unable to insert data into existing BigQuery Table? - python-2.7

I am trying to insert some data into a BigQuery table that already exists, but the data never makes it into the table.
I tried the standard example provided by Google (insert_rows), but no luck. I have also referred to this: https://github.com/googleapis/google-cloud-python/issues/5539
I have tried passing the data as a list of tuples as well, but I hit the same issue.
from google.cloud import bigquery
import datetime
bigquery_client = bigquery.Client()
dataset_ref = bigquery_client.dataset('my_dataset_id')
table_ref = dataset_ref.table('my_destination_table_id')
table = bigquery_client.get_table(table_ref)
rows_to_insert = [
    {u'jobName': 'writetobigquery'},
    {u'startDatetime': datetime.datetime.now().strftime('%Y-%m-%d-%H%M%S')},
    {u'jobStatus': 'Success'},
    {u'logMessage': 'NA'},
]
errors = bigquery_client.insert_rows(table, rows_to_insert)
When I execute this, I don't get an error, but it's not writing anything into the table. It would be really great if anyone could suggest something that would work for me. Thank you!

After making some modifications to your code I could make it work as expected. I changed your rows from being a list of dictionaries with one value each to a single dictionary holding all the columns of one row. I also changed the datetime format, as yours was invalid for BigQuery (the valid formats can be found here). The following snippet should work fine:
from google.cloud import bigquery
import datetime
bigquery_client = bigquery.Client()
dataset_ref = bigquery_client.dataset('my_dataset_id')
table_ref = dataset_ref.table('my_destination_table_id')
table = bigquery_client.get_table(table_ref)
rows_to_insert = [
    {u'jobName': 'writetobigquery',
     u'startDatetime': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
     u'jobStatus': 'Success',
     u'logMessage': 'NA'}
]
errors = bigquery_client.insert_rows(table, rows_to_insert)
print "Errors occurred:", errors

Shouldn't your rows be a list of dictionaries, one dictionary per row? Assuming your table schema is jobName, startDatetime, jobStatus, logMessage, then:
rows_to_insert = [
    {
        u'jobName': 'writetobigquery',
        u'startDatetime': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),  # BigQuery-compatible format
        u'jobStatus': 'Success',
        u'logMessage': 'NA'
    }
]
errors = bigquery_client.insert_rows(table, rows_to_insert)
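In both snippets, insert_rows returns a list of per-row errors, and an empty list means every row was accepted, so it is worth branching on the result rather than printing it blindly. A minimal sketch:
errors = bigquery_client.insert_rows(table, rows_to_insert)
if not errors:
    print "All rows were inserted successfully."
else:
    print "Encountered errors while inserting rows:", errors
Also note that streamed rows pass through BigQuery's streaming buffer: they are queryable almost immediately, but may take a while to show up in the table preview in the console.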

Related

Using Firestore with Flask/Jinja, correct method to query and display results to DOM

@app.route('/view_case/<case_name>')
def view_case(case_name):
    query = db.collection('cases').document(case_name).collection('documents').get()
    documents = []
    for _document in query:
        documents.append(_document)
    return render_template('views/view_case.html', documents=documents)
Is the above the correct way to query a group of documents and send them to the template as a list for Jinja to iterate over and display?
Side question: I notice the results don't include the document ID. Is there a way to attach the ID to each document?
Just amended to use a list comprehension:
from google.cloud import firestore
db = firestore.Client()
collection_ref = db.collection(u'collection').get()
documents = [doc.to_dict() for doc in collection_ref]
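For the side question: each item returned is a DocumentSnapshot, which exposes the ID as doc.id, so a sketch that attaches the ID to each document's data (assuming the same collection as above) could be:
documents = []
for doc in db.collection(u'collection').get():
    data = doc.to_dict()
    data['id'] = doc.id  # attach the document ID alongside its fields
    documents.append(data)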

How to page through QueryResults

I am getting result from BigQuery using the following code:
from google.oauth2 import service_account
from google.cloud import bigquery
credential = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE)
scoped_credential = credential.with_scopes(BIG_QUERY_SCOPE)
client = bigquery.Client(project="XX-XX",credentials=scoped_credential)
query_results = client.run_sync_query(query_detail)
query_results.use_legacy_sql = False
query_results.run()
iterator = query_results.fetch_data()
rows = iterator.query_result.rows
But it only returns up to 50,000 rows. I tried to paginate while fetching the data, but could not figure out how to do it:
page_token = query_results.page_token
iterator = query_results.fetch_data(max_results=500, page_token=page_token)
I could not find out how to get the updated page_token.
Thanks,
I think you are close. Try running this code now:
data = list(query_results.fetch_data())  # renamed the variable from `iterator` to `data`
The management of page tokens is done automatically for you.
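In newer versions of google-cloud-bigquery the run_sync_query API was replaced by client.query(), and there too the returned iterator fetches page tokens transparently. A minimal sketch under that assumption (the table name is hypothetical):
from google.cloud import bigquery
client = bigquery.Client()
query_job = client.query('SELECT name FROM `my_dataset.my_table`')  # hypothetical table
for row in query_job.result(page_size=500):  # the iterator requests each new page for you
    print(row)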

Bigquery : job is done but job.query_results().total_bytes_processed returns None

The following code :
import time
from google.cloud import bigquery
client = bigquery.Client()
query = """\
select 3 as x
"""
dataset = client.dataset('dataset_name')
table = dataset.table(name='table_name')
job = client.run_async_query('job_name_76', query)
job.write_disposition = 'WRITE_TRUNCATE'
job.destination = table
job.begin()
retry_count = 100
while retry_count > 0 and job.state != 'DONE':
    retry_count -= 1
    time.sleep(10)
    job.reload()
print job.state
print job.query_results().name
print job.query_results().total_bytes_processed
prints:
DONE
job_name_76
None
I do not understand why total_bytes_processed returns None, given that the job is done and the documentation says:
total_bytes_processed:
Total number of bytes processed by the query.
See
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#totalBytesProcessed
Return type: int, or NoneType
Returns: Count generated on the server (None until set by the server).
Looks like you are right. As you can see in the code, the current API does not process the bytes-processed data.
This has been reported in this issue, and as you can see in tseaver's PR, this feature has already been implemented and awaits review/merging, so we'll probably have this code in production quite soon.
In the meantime you could get the result from the _properties attribute of the job, like:
from google.cloud.bigquery import Client
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/key.json'
bc = Client()
query = 'your query'
job = bc.run_async_query('name', query)
job.begin()
wait_job(job)  # helper that polls until the job state is DONE (sketched below)
query_results = job._properties['statistics'].get('query')
query_results should have the totalBytesProcessed you are looking for.
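wait_job is not defined in the snippet above; a minimal polling implementation, following the retry loop from the question, could look like:
import time

def wait_job(job, retries=100, interval=10):
    # Poll the job until BigQuery reports it as DONE, mirroring the
    # retry loop used in the question.
    while retries > 0 and job.state != 'DONE':
        retries -= 1
        time.sleep(interval)
        job.reload()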

django retrieve specific data from a dictionary database field

I have a table that contains values saved as a dictionary.
FIELD_NAME: extra_data
VALUE:
{"code": null, "user_id": "103713616419757182414", "access_token": "ya29.IwBloLKFALsddhsAAADlliOoDeE-PD_--yz1i_BZvujw8ixGPh4zH-teMNgkIA", "expires": 3599}
I need to retrieve only the user_id value from the "extra_data" field, not the whole dictionary, like below.
event_list = Event.objects.filter(season_id=season_id, event_status_id=2).values('extra_data')
If you are storing a dictionary as text, you can easily convert it to a Python dictionary using eval, although I don't know why you'd want to, as it opens you up to all sorts of potential malicious code injection.
event_list = eval(Event.objects.filter(season_id=season_id, event_status_id=2).values_list('extra_data', flat=True).first())
user_id = event_list['user_id']
print user_id
Would give:
"103713616419757182414"
Edit:
On deeper inspection, that's not a Python dictionary (note the null), so you could use a JSON library to parse it, or declare what null is, like so:
null = None
event_list = eval(Event.objects.filter(season_id=season_id, event_status_id=2).values_list('extra_data', flat=True).first())
user_id = event_list['user_id']
Either way, the idea of storing structured data in a Django text field is fraught with danger that will come back to bite you. The best solution is to rethink your data structures.
This method worked for me. Note, however, that it requires the string to be JSON-compliant:
import json
json_obj = json.loads(event_list)  # json.loads already returns a dict
print json_obj['user_id']
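Putting the pieces together, a minimal sketch of the whole lookup, assuming extra_data holds JSON text and using json.loads instead of eval:
import json

raw = Event.objects.filter(season_id=season_id, event_status_id=2).values_list('extra_data', flat=True).first()
user_id = json.loads(raw)['user_id'] if raw else None  # None when no matching Event exists
print user_id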

Why will these strings not be quoted by psycopg2 for the query?

I am writing an app with Django, using the psycopg2 engine. It doesn't always seem to want to quote my strings. Here is a test case:
>>> from pypvs.search.models import Addr2zip
>>> kwargs = {
... 'street_name__iexact': 'Common Ground',
... 'state_id__iexact': 'MT',
... }
>>> addrMatch = Addr2zip.objects.extra(
... where = ['ctystate.zip5 = addr2zip.zip5 AND ctystate.city_name = \'%s\'' % 'Philipsburg'],
... tables = ['ctystate', 'addr2zip']
... ).filter(**kwargs).order_by('zip5', 'street_name', 'primary_address_low', 'secondary_address_low')
>>> print addrMatch.query
SELECT "addr2zip"."addr2zip_id", "addr2zip"."zip5", "addr2zip"."zip4_low", "addr2zip"."zip4_high", "addr2zip"."street_direction", "addr2zip"."street_name", "addr2zip"."street_suffix", "addr2zip"."street_post_direction", "addr2zip"."primary_address_low", "addr2zip"."primary_address_high", "addr2zip"."primary_address_parity", "addr2zip"."secondary_address", "addr2zip"."secondary_address_low", "addr2zip"."secondary_address_high", "addr2zip"."secondary_address_parity", "addr2zip"."state_id", "addr2zip"."county_code", "addr2zip"."municipality_key", "addr2zip"."urbanization_key", "addr2zip"."record_type" FROM "addr2zip" , "ctystate" WHERE (ctystate.zip5 = addr2zip.zip5 AND ctystate.city_name = Philipsburg AND UPPER("addr2zip"."state_id"::text) = UPPER(MT) AND UPPER("addr2zip"."street_name"::text) = UPPER(Common Ground) ) ORDER BY "addr2zip"."zip5" ASC, "addr2zip"."street_name" ASC, "addr2zip"."primary_address_low" ASC, "addr2zip"."secondary_address_low" ASC
What could be the reason that these strings are not quoted? For instance, 'Common Ground':
AND UPPER("addr2zip"."street_name"::text) = UPPER(Common Ground)
Not sure if the problem is in my implementation, psycopg2, or the django ORM. I'd appreciate any ideas.
str(query) only returns an approximate representation of the query. Are you trying to pass it to the database?
The query issued with iexact seems correct with Django 1.2.3. The above would result in WHERE UPPER("addr2zip"."street_name"::text) = UPPER(E'Common Ground'). Which version are you using?
To get the query to be executed, use something like:
from django.db import DEFAULT_DB_ALIAS
queryset.query.get_compiler(DEFAULT_DB_ALIAS).as_sql()
But you pasted it in like that yourself:
where = ['ctystate.zip5 = addr2zip.zip5 AND ctystate.city_name = %s' % 'Philipsburg'],
so you get a query string constructed with a syntax error:
AND ctystate.city_name = Philipsburg
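The robust fix is to let the driver quote the value by passing it through extra()'s params argument instead of interpolating it with %. A sketch of the queryset from the question under that change:
addrMatch = Addr2zip.objects.extra(
    where=['ctystate.zip5 = addr2zip.zip5 AND ctystate.city_name = %s'],
    params=['Philipsburg'],  # psycopg2 quotes and escapes this value itself
    tables=['ctystate', 'addr2zip'],
).filter(**kwargs).order_by('zip5', 'street_name', 'primary_address_low', 'secondary_address_low')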