Cloud function that queries Oracle database - google-cloud-platform

I need some support in building a Cloud Function that queries an Oracle database. I wrote the Python code, it lives in a repo, and the function is invoked with an HTTP trigger, so that part works.
To connect to Oracle, the Oracle Client Library is needed, and I uploaded it to a Cloud Storage bucket.
So now the repo, the bucket, and the function are all set up and in the same region, yet the function throws an error saying it can't configure the Oracle Client Library.
Here is the code, in case it's relevant:
import cx_Oracle

def queryOracleDatabase(request):
    # Oracle database connection
    username = 'x'
    password = 'y'
    connStr = '00.00.00.00:0000/abcd'
    try:
        conn = cx_Oracle.connect(username, password, connStr)
    except cx_Oracle.DatabaseError as e:
        error, = e.args
        print('Error: ', error.message)
        return

    # Execute the query
    try:
        cursor = conn.cursor()
        cursor.execute('select * from my_table')  # placeholder table name
        data = cursor.fetchall()
    except cx_Oracle.DatabaseError as e:
        error, = e.args
        print('Error: ', error.message)
        return

    # Clean up
    cursor.close()
    conn.close()
    return data
And this is the error it throws
Error: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory"
How to connect the function with the bucket?

Given the extra context provided in the comments, Cloud Functions doesn't fit this use case very well, since it doesn't give you a persistent disk where you can store your Oracle client library.
Cloud Functions is a very specialized service; that specialization gives it a very low adoption curve and makes it the best choice when the use case and tech stack fit it (e.g. no filesystem needed beyond /tmp, no need to customize the runtime/OS).
When the use case does require some degree of customization of the container where the code runs, Cloud Run comes into play. By defining a Docker container you can place the Oracle client library anywhere you need in the filesystem, and run your function reusing your current code almost as is.
I presume your tech stack is fairly standard, so I would check Docker Hub for an image based on Python, maybe even one that already includes the Oracle client you need. That would be an easy starting point.
About accessing the Oracle client hosted in a bucket: the Cloud Function could download it to /tmp, but I'm not sure you can actually load the library from there, and storing libraries in buckets is unusual in my (personal) experience.
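If you do want to experiment with the /tmp route anyway, here is a minimal sketch of what it could look like: download an Instant Client archive from the bucket, unpack it under /tmp, and point cx_Oracle at it with init_oracle_client. The bucket name, object name, and unpack path below are hypothetical, and this assumes cx_Oracle 8+ and the google-cloud-storage client are available in the function.

import zipfile

import cx_Oracle
from google.cloud import storage

ORACLE_LIB_DIR = '/tmp/instantclient'  # hypothetical unpack location

def _load_oracle_client():
    # Download the (hypothetical) Instant Client archive from the bucket to /tmp,
    # the only writable path in the Cloud Functions runtime.
    bucket = storage.Client().bucket('my-oracle-libs')        # assumed bucket name
    blob = bucket.blob('instantclient-basiclite-linux.zip')   # assumed object name
    blob.download_to_filename('/tmp/instantclient.zip')

    # Unpack the archive; this assumes the zip contains the libraries at its top level.
    with zipfile.ZipFile('/tmp/instantclient.zip') as zf:
        zf.extractall(ORACLE_LIB_DIR)

    # Tell cx_Oracle where the client libraries live.
    cx_Oracle.init_oracle_client(lib_dir=ORACLE_LIB_DIR)

Even then, DPI-1047 can persist because system packages such as libaio cannot be installed in that runtime, which is why the Cloud Run route described above is the more reliable option.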

Related

How to Load Data into remote Neo4j AWS instance?

I want to import data into a Neo4j instance brought up in AWS (community edition from the AWS marketplace). One option is to convert the data to CSV, run the LOAD CSV command in the Neo4j UI, and point it to a public HTTP address that reads from S3. This, however, means publicly exposing the file, which would expose sensitive data. How else can I import this data?
Thanks!
I would suggest you use any of the Neo4j drivers, such as the Python or Java one. Here is a Python example that I used in my posts:
from neo4j import GraphDatabase

# Driver setup; replace the URI and credentials with your own
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))

def store_to_neo4j(distances):
    data = [{'source': el[0], 'target': el[1], 'weight': distances[el]} for el in distances]
    with driver.session() as session:
        session.run("""
            UNWIND $data as row
            MERGE (c:Character{name:row.source})
            MERGE (t:Character{name:row.target})
            MERGE (c)-[i:INTERACTS]-(t)
            SET i.weight = coalesce(i.weight,0) + row.weight
            """, {'data': data})
You don't want to execute the import row by row; instead, batch, say, 1000 rows into a single parameter and then use the UNWIND operator to import the data.
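To illustrate that batching idea (this helper is mine, not part of the original answer, and the 1000-row chunk size is just an example), you could split the input and push each chunk through the function above in one UNWIND call:

def store_in_batches(distances, batch_size=1000):
    # Split the full dictionary into chunks and write each chunk in a single query
    items = list(distances.items())
    for i in range(0, len(items), batch_size):
        chunk = dict(items[i:i + batch_size])
        store_to_neo4j(chunk)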

How to specify dataset location using BigQuery API v0.27?

I am trying to figure out how to specify the dataset location in a BigQuery API query using v0.27 of the BigQuery API.
I have a dataset located in northamerica-northeast1 and the BigQuery API is returning 404 errors since this is not the default multi-regional location "US."
I am using the run_async_query method to execute my queries, but based on the documentation I am unsure how to attach a location to the request to make it location-aware.
I have also tried to previously update my client instantiation like this:
def _get_client(self):
    bigquery.Client.SCOPE = (
        'https://www.googleapis.com/auth/bigquery',
        'https://www.googleapis.com/auth/cloud-platform',
        'https://www.googleapis.com/auth/drive')
    client = bigquery.Client.from_service_account_json(_KEY_FILE)
    if self._params['bq_data_location'].strip():
        client.location = self._params['bq_data_location']
    return client
However, it does not appear that this is the correct way to inform the BigQuery API of a dataset location.
For additional context, in my SQL that I am passing to the BigQuery API, I am already specifying the PROJECT_ID.DATASET_ID.TABLE_ID, however, this does not seem to be sufficient to find regional data.
Furthermore, I am making this request from Google App Engine using the CRMint open source data flow platform.
Can you please help me with an example of how location can be added to the BigQuery API for v0.27 so that the API does not return 404?
Thank you!
From the code sample it seems you're likely talking about google-cloud-bigquery 0.27, which was released in Aug 2017 and predates location support (as well as many other features).
Your best bet is to update that dependency to something more recent.
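For reference, a minimal sketch of how this looks on a newer google-cloud-bigquery release (one recent enough to accept a location argument), reusing the _KEY_FILE and table naming from the question; the exact query is just an example:

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json(_KEY_FILE)

# Pass the dataset's region explicitly so the job runs in the right location
query_job = client.query(
    'SELECT * FROM `PROJECT_ID.DATASET_ID.TABLE_ID` LIMIT 10',
    location='northamerica-northeast1',
)
rows = list(query_job.result())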

Is there a way to pass credentials programmatically for using google documentAI without reading from a disk?

I am trying to run the demo code given for PDF parsing with GCP Document AI. Exporting the Google credentials environment variable on the command line works fine, but the problem comes when the code needs to run in an environment where no credential file can be read from disk. Is there a way to pass the credentials into the Document AI parsing function?
The sample code of google:
from google.cloud import documentai_v1beta2 as documentai  # client import used by this sample

def main(project_id='YOUR_PROJECT_ID',
         input_uri='gs://cloud-samples-data/documentai/invoice.pdf'):
    """Process a single document with the Document AI API, including
    text extraction and entity extraction."""
    client = documentai.DocumentUnderstandingServiceClient()
    gcs_source = documentai.types.GcsSource(uri=input_uri)

    # mime_type can be application/pdf, image/tiff,
    # and image/gif, or application/json
    input_config = documentai.types.InputConfig(
        gcs_source=gcs_source, mime_type='application/pdf')

    # Location can be 'us' or 'eu'
    parent = 'projects/{}/locations/us'.format(project_id)
    request = documentai.types.ProcessDocumentRequest(
        parent=parent,
        input_config=input_config)

    document = client.process_document(request=request)

    # All text extracted from the document
    print('Document Text: {}'.format(document.text))

    def _get_text(el):
        """Convert text offset indexes into text snippets."""
        response = ''
        # If a text segment spans several lines, it will
        # be stored in different text segments.
        for segment in el.text_anchor.text_segments:
            start_index = segment.start_index
            end_index = segment.end_index
            response += document.text[start_index:end_index]
        return response

    for entity in document.entities:
        print('Entity type: {}'.format(entity.type))
        print('Text: {}'.format(_get_text(entity)))
        print('Mention text: {}\n'.format(entity.mention_text))
When you run your workloads on GCP, you don't need a service account key file. You MUSTN'T use one!
Why? Two reasons:
It's useless, because every GCP product has at least a default service account, and most of the time you can customize it. Have a look at Cloud Functions identity in your case.
A service account key file is a file. That means a lot: you can copy it, send it by email, commit it to a Git repository... many people can end up with access to it and you lose control of this secret. And because it's a secret, you have to store it securely and rotate it regularly (at least every 90 days, per Google's recommendation). It's a nightmare! When you can, don't use service account key files!
What do the libraries do?
They check whether the GOOGLE_APPLICATION_CREDENTIALS env var exists.
They look in the "well-known" location (when you run gcloud auth application-default login to let a local application use your credentials to access Google resources, a file is created in a standard location on your computer).
If neither is set, they check whether the metadata server exists (only on GCP); this server provides the authentication information to the libraries.
Otherwise, they raise an error.
So, simply run your function with the correct service account and grant it the roles it needs to do what you want.
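To make that concrete (my sketch, not part of the original answer): inside a Cloud Function you can construct the client with no explicit credentials and it will pick up the function's service account from the metadata server. If you genuinely must supply credentials from memory (say, a JSON key fetched from Secret Manager into a variable), the client also accepts a credentials object built with from_service_account_info:

import json

from google.cloud import documentai_v1beta2 as documentai
from google.oauth2 import service_account

# Preferred on GCP: no key file at all; Application Default Credentials
# resolve to the function's own service account.
client = documentai.DocumentUnderstandingServiceClient()

# Only if you really must pass in-memory credentials:
def client_from_key_json(key_json):
    info = json.loads(key_json)
    credentials = service_account.Credentials.from_service_account_info(info)
    return documentai.DocumentUnderstandingServiceClient(credentials=credentials)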

Connection issues implementing Django Custom Storage backend with Box Cloud Storage API

I am working on a legacy Django application that has a lot of third party dependencies, one of which is the storage backend for file uploads. I was recently tasked with replacing our legacy third party cloud storage vendor with a newer cloud storage vendor (Box).
The cloud storage is implemented as a custom storage backend and used as the "storage" parameter in FileFields in models throughout the app. I'm basically trying to figure out what exactly happens in storage if you have a FileField in a model and you create a ModelForm based on that model, then you call "save" on the form.
It seems that a lot of stuff is going on and some of it is causing connection problems with the cloud storage API.
I tried reading the Django source to follow it and got all the way down to where the model is deciding if it should do an update (by doing an "exists" check in storage) or an insert.
Once it decides to do an insert, I noticed that a call to my cloud storage backend occurs to upload the file (presumably non-blocking?) while the insert SQL is being generated.
Somewhere in here, connections to the cloud storage begin to hang and become unresponsive. At least, all I see in the logs is
INFO requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): upload.box.com
and no further info or response.
Unlike the previous cloud storage, the new one issues JWTs per session instead of having a static auth token that you simply pass every time. If I do not use Django's ModelForm with its magical "save" method, but instead call methods directly on the models with the FileFields, I do not encounter the connection problem - I get responses just fine from the cloud storage API.
So I'm thinking there must be some kind of concurrency issue when calling "save" on a form that affects a model with a FileField? I'm a little stumped. The code is involved, so it is hard to copy here, but basically it comes down to:
class CustomStorage:
    def __init__(self):
        # set up the storage API client and
        # authenticate the client instance, etc.
        ...

    def _save(self, name, content):
        # call storage API methods to upload the file;
        # includes a retry loop with a file-renaming
        # algorithm to avoid name conflicts, as the
        # cloud API does not allow duplicate file names
        ...

    def exists(self, name):
        # call storage API methods to determine if a file name conflict exists
        ...

    def _open(self, name, mode='rb'):
        # call storage API methods to download the file
        ...

custom_storage = CustomStorage()

class ExampleModel(models.Model):
    name = models.CharField(max_length=255)
    # FPFileField because we also have a dependency on FilePicker, now called FileStack
    file_ref = FPFileField(upload_to="uploads", storage=custom_storage)

class ExampleModelForm(ModelForm):
    file_ref = CustomFilePickerField()

    class Meta:
        model = ExampleModel
        fields = ('name', 'file_ref')

form = ExampleModelForm()
model = form.save()  # --> connection problem with the cloud storage API starts here;
                     # if I were to call ExampleModel.objects.create(...),
                     # the storage upload process would work fine
Is there some gotcha I'm not aware of that Django experts would know about implementing custom storage backends for Django based on cloud storage APIs?
Turns out that the problem was our usage of a deprecated version of a FilePicker field for the uploaded document model field, combined with an old bug in a stale version of Requests. Occasionally, this field would return a Django File instance wrapped around a cStringIO.StringIO object instead of a vanilla file object. That ran into a bug in the Requests library which caused stalled responses when chunking a multipart POST whose payload is a StringIO instance. Because upgrading is not an option, I solved the issue by detecting whether the underlying Django FileField file is not a real file object, re-wrapping it in a ContentFile instance, and seeking to 0, in order to play nicely with the older version of Requests. If anyone knows of a better alternative, by all means, please let me know.
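For reference, a minimal sketch of that workaround (the helper name and the exact type check are my assumptions, not the poster's actual code); you would call it on the file object inside the storage backend's _save before handing it to the Box client:

from django.core.files.base import ContentFile

def _ensure_plain_file(content):
    # If the upload is backed by a StringIO-like object rather than a real
    # file, re-wrap its bytes in a ContentFile so the old Requests version
    # can chunk the multipart POST without stalling.
    underlying = getattr(content, 'file', content)
    if not hasattr(underlying, 'fileno'):  # crude "is this a real file?" check
        content.seek(0)
        content = ContentFile(content.read(), name=getattr(content, 'name', None))
    content.seek(0)
    return content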

Why LOAD DATA LOCAL INFILE will work from the CLI but not in application?

The problem:
My C++ application connects to a MySQL server, reads the first/header line of each db export .txt file, builds a CREATE TABLE statement to prepare for the import, and executes it against the database (no problem with that, the table appears just as intended) -- but when I try to execute the LOAD DATA LOCAL INFILE to import the data into the newly created table, I get the error "The used command is not allowed with this MySQL version". But this works on the CLI! When I execute this command on the CLI using mysql -u <user> -p<password> -e "LOAD DATA LOCAL INFILE 'myfile.txt' INTO TABLE mytable FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n';" it works flawlessly.
The Situation:
My company gets a large quantity of database exports (160 files / 10 GB of .txt files that are '|' delimited) from our vendors on a monthly basis, which have to replace the old vendor lists. I am working on a smallish C++ app to deal with it on my work desktop. The application is meant to set up the required tables, import the data, then execute a series of intermediate queries against multiple tables to assemble information in a series of final tables, which is then itself exported and uploaded to the production environment for use in the company's e-commerce website.
My Setup:
Ubuntu 12.04
MySQL Server v. 5.5.29 + MySQL Command Line client
Linux GNU C++ Compiler
libmysqlcppconn is installed and I have the required mysqlconn library linked in.
I have already overcome/tried the following issues/combinations:
1.) I have already discovered (the hard way) that LOAD DATA [LOCAL] INFILE statements must be enabled in the config -- I have the "local-infile" option set in the configuration files for both client and server. (fixed by updating the /etc/mysql/my.cnf with "local-infile" statements for the client and server. NOTE: I could have used the --local-infile=1 to restart the mysql-server, but this is my local dev environment so I just wanted it turned on permanently)
2.) LOAD DATA LOCAL INFILE seems to fail to perform the import (from the CLI) if the target import file does not have execute permissions enabled (fixed with chmod +x target_file.txt)
3.) I am using the mysql root account in my application code (because it's my localhost, not production, and this particular program will never run on a production server).
4.) I have tried executing my compiled binary program using the sudo command (no change, same error "The used command is not allowed with this MySQL version")
5.) I have tried changing the ownership of the binary file from my normal login to root (no change, same error "The used command is not allowed with this MySQL version")
6.) I know the libcppmysqlconn is working because I am able to connect and perform the CREATE TABLE call without a problem, and I can do other queries and execute statements
What am I missing? Any suggestions? Thanks in advance :)
After much diligent trial and error working with the /etc/mysql/my.cnf file (I know this is a permissions issue because it works on the command line, but not from the connector), and after much googling and finding some back-alley tech support posts, I've come to conclude that the MySQL C++ connector simply does not (for whatever reason) give developers a way to enable the local-infile=1 option.
Apparently some people have been able to hack/fork the MySQL C++ connector to expose the functionality, but no one posted their source code -- they only said it worked. Apparently there is a workaround in the MySQL C API: after you initialize the connection you would use this:
unsigned int enable_local_infile = 1;
mysql_options(&mysql, MYSQL_OPT_LOCAL_INFILE, &enable_local_infile);
which apparently allows the LOAD DATA LOCAL INFILE statements to work with the MySQL C API.
Here are some reference articles that led me to this conclusion:
1.) How can I get the native C API connection structure from MySQL Connector/C++?
2.) Mysql 5.5 LOAD DATA INFILE Permissions
3.) http://osdir.com/ml/db.mysql.c++/2004-04/msg00097.html
Essentially, if you want the ability to use LOAD DATA LOCAL INFILE from a programmatic connector API, you have to use the MySQL C API or hack/fork the existing MySQL C++ API to expose the connection structure. Or just stick to executing LOAD DATA LOCAL INFILE from the command line :(