Cannot connect to Cloud SQL using Apache-Beam JDBC - google-cloud-platform

I am trying to connect to Cloud SQL using the Python SDK's io.jdbc module, more specifically the ReadFromJdbc class, which is documented here: https://beam.apache.org/releases/pydoc/current/apache_beam.io.jdbc.html
Based on it and the instructions for connecting to Cloud SQL for MySQL over JDBC here: https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/blob/main/docs/jdbc-mysql.md I wrote the following code:
import os
import typing

import apache_beam as beam
import apache_beam.coders as coders
import apache_beam.io.jdbc as jdbc
from apache_beam.options.pipeline_options import PipelineOptions

pipeline_options = {
    'project': 'project-name',
    'runner': 'DataflowRunner',
    'region': 'europe-central2',
    'staging_location': "gs://temp",
    'temp_location': "gs://temp",
    'template_location': "gs://templates/temp_name"
}
pipeline_options = PipelineOptions.from_dictionary(pipeline_options)

serviceAccount = r'path\to\serviceaccount.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = serviceAccount

ExampleRow = typing.NamedTuple('ExampleRow',
                               [('id', int), ('migration', str)])
coders.registry.register_coder(ExampleRow, coders.RowCoder)

with beam.Pipeline(options=pipeline_options) as p:
    res = (
        p
        | "Read database list" >> jdbc.ReadFromJdbc(
            table_name='table',
            driver_class_name='com.mysql.jdbc.Driver',
            jdbc_url='jdbc:mysql:///<DATABASE_NAME>?cloudSqlInstance=<INSTANCE_CONNECTION_NAME>&socketFactory=com.google.cloud.sql.mysql.SocketFactory&user=<MYSQL_USER_NAME>&password=<MYSQL_USER_PASSWORD>',
            username='user',
            password='pass',
            query="select id, migration from db.table;",
            fetch_size=1,
            classpath=["com.google.cloud.sql:mysql-socket-factory-connector-j-8:1.7.2"],
            expansion_service='host:6666'
        )
        | "Print results" >> beam.io.WriteToText(r'gs://output/out.csv')
    )
For the expansion service I have set up a WSL2 Python environment as documented here: https://beam.apache.org/documentation/sdks/java-multi-language-pipelines/#advanced-start-an-expansion-service
Unfortunately, I get this error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:6666: WSA Error"
debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:6666: WSA Error {grpc_status:14, created_time:"2022-12-08T15:43:05.445755053+00:00"}"
I tried switching expansion_service to the specific IP reported by hostname -I inside WSL, but it produced the same result, even though the address is reachable (tested with ping and by hosting a web server on it).
Am I doing something completely wrong? I find it hard to believe that connecting to Cloud SQL is this hard, so I must be...

Transforms in the apache_beam.io.jdbc module are cross-language transforms implemented in the Beam Java SDK. During pipeline construction, the Python SDK therefore connects to a Java expansion service to expand these transforms; the instructions you followed create a Python expansion service instead.
I think the easiest thing to do is to use the default expansion service.
First, install a Java runtime on the computer from which the pipeline is constructed and make sure the java command is available.
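As a quick sanity check, you can confirm from Python that a java binary is on the PATH before constructing the pipeline (a minimal sketch, assuming the default expansion service is launched via the locally installed Java runtime):
import shutil

# Minimal sanity check (sketch): the default expansion service is launched
# with the local Java runtime, so `java` should resolve on this machine.
if shutil.which('java') is None:
    raise RuntimeError('No Java runtime found on PATH; install a JRE/JDK first')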
Then use the following transform to read from Cloud SQL:
p | "Read database list" >> jdbc.ReadFromJdbc(
table_name='table',
driver_class_name='com.mysql.jdbc.Driver',
jdbc_url='jdbc:mysql:///<DATABASE_NAME>?cloudSqlInstance=<INSTANCE_CONNECTION_NAME>&socketFactory=com.google.cloud.sql.mysql.SocketFactory&user=<MYSQL_USER_NAME>&password=<MYSQL_USER_PASSWORD>',
username='user',
password='pass',
query = "select id, migration from db.table;",
fetch_size=1,
classpath=["com.google.cloud.sql:mysql-socket-factory-connector-j-8:1.7.2"]
)
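With expansion_service omitted, the Python SDK starts the default Java expansion service automatically during pipeline construction, which is why a local Java runtime is the only extra requirement.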

Related

Trouble authenticating and writing to database locally

I'm having trouble authenticating and writing data to a Spanner database locally. All imports are up to date: google.cloud, google.auth2, etc. I have had someone else run this and it works fine for them, so the problem seems to be something on my end: something wrong or misconfigured on my computer, maybe where the credentials are stored?
Anyone have any ideas?
from google.cloud import spanner
from google.api_core.exceptions import GoogleAPICallError
from google.api_core.datetime_helpers import DatetimeWithNanoseconds
import datetime
from google.oauth2 import service_account

def write_to(database):
    record = [[
        1041613562310836275,
        'test_name'
    ]]
    columns = ("id", "name")
    insert_errors = []
    try:
        with database.batch() as batch:
            batch.insert_or_update(
                table="guild",
                columns=columns,
                values=record,
            )
    except GoogleAPICallError as e:
        print(f'error: {e}')
        insert_errors.append(e.message)
    return insert_errors

if __name__ == "__main__":
    credentials = service_account.Credentials.from_service_account_file(r'path\to\a.json')
    instance_id = 'instance-name'
    database_id = 'database-name'
    spanner_client = spanner.Client(project='project-name', credentials=credentials)
    print(f'spanner creds: {spanner_client.credentials}')
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)
    insert_errors = write_to(database)
some credential tests:
creds = service_account.Credentials.from_service_account_file(a_json)
<google.oauth2.service_account.Credentials at 0x...>
spanner_client.credentials
<google.auth.credentials.AnonymousCredentials at 0x...>
spanner_client.credentials.signer_email
AttributeError: 'AnonymousCredentials' object has no attribute 'signer_email'
creds.signer_email
'...#....iam.gserviceaccount.com'
spanner.Client().from_service_account_json(a_json).credentials
<google.auth.credentials.AnonymousCredentials object at 0x...>
The most common reason for this is that you have accidentally set (or forgot to unset) the environment variable SPANNER_EMULATOR_HOST. If this environment variable has been set, the client library will try to connect to the emulator instead of Cloud Spanner. This will cause the client library to wait for a long time while trying to connect to the emulator (assuming that the emulator is not running on your machine). Unset the environment variable to fix this problem.
Note: This environment variable only affects Cloud Spanner client libraries, which is why other Google Cloud products work on the same machine. The script will also in most cases work on other machines, as they are unlikely to have this environment variable set.
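A quick way to rule this out from inside Python (a small sketch; SPANNER_EMULATOR_HOST is the variable the client library checks):
import os

# If SPANNER_EMULATOR_HOST is set, spanner.Client() silently targets the
# emulator and uses AnonymousCredentials, which matches the output above.
if 'SPANNER_EMULATOR_HOST' in os.environ:
    print('SPANNER_EMULATOR_HOST =', os.environ['SPANNER_EMULATOR_HOST'])
    del os.environ['SPANNER_EMULATOR_HOST']  # unset for this process only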

Pytest on Flask based API - test by calling the remote API

New to using pytest on APIs. From my understanding, testing creates another instance of Flask. Additionally, the tutorials I have seen suggest creating a separate DB table instance to add, fetch and remove data for test purposes. However, I simply plan to use the remote API URL as the host and make the calls against it.
I set up my conftest like this, where the flag --testenv indicates that the get/post calls should go to the host listed below:
import pytest
import subprocess

def pytest_addoption(parser):
    """Add option to pass --testenv=api_server to pytest cli command"""
    parser.addoption(
        "--testenv", action="store", default="exodemo", help="my option: type1 or type2"
    )

@pytest.fixture(scope="module")
def testenv(request):
    return request.config.getoption("--testenv")

@pytest.fixture(scope="module")
def testurl(testenv):
    if testenv == 'api_server':
        return 'http://api_url:5000/'
    else:
        return 'http://localhost:5000'
And my test file is written like this:
import json

import pytest
from app import app
from flask import request

def test_nodes(app):
    t_client = app.test_client()
    truth = [
        {
            *body*
        }
    ]
    res = t_client.get('/topology/nodes')
    print(res)
    assert res.status_code == 200
    assert truth == json.loads(res.get_data())
I run the code using this:
python3 -m pytest --testenv api_server
The thing I expect is that the test simply makes a call to the remote API with the creds, fetches the data regardless of how it gets pulled in the remote code, and brings it back for assertion. However, I am getting a 400 BAD REQUEST error:
assert 400 == 200
E + where 400 = <WrapperTestResponse streamed [400 BAD REQUEST]>.status_code
single_test.py:97: AssertionError
--------------------- Captured stdout call ----------------------
{"timestamp": "2022-07-28 22:11:14,032", "level": "ERROR", "func": "connect_to_mysql_db", "line": 23, "message": "Error connecting to the mysql database (2003, \"Can't connect to MySQL server on 'mysql' ([Errno -3] Temporary failure in name resolution)\")"}
<WrapperTestResponse streamed [400 BAD REQUEST]>
Does this mean that the test is still trying to look up the database locally when fetching? I also cannot figure out which host the test URL is being sent to, so I am kind of stuck here. Looking to get some help here.
Thanks.
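For what it's worth, app.test_client() never sends anything over the network: it runs the app inside the test process, which is why the app's own database lookup (the mysql hostname) fails to resolve locally. To exercise the deployed service, the test has to issue a real HTTP request to the testurl fixture. A sketch, assuming the requests library and the /topology/nodes route from the question:
import requests

def test_nodes_remote(testurl):
    # Send a real HTTP request to the remote host from the testurl fixture
    # instead of using Flask's in-process test client.
    res = requests.get(testurl.rstrip('/') + '/topology/nodes')
    assert res.status_code == 200
    assert res.json()  # compare against the expected `truth` payload here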

Dialogflow: Agent metadata not found for agentId

I'm trying to use Dialogflow's detect_intent in Python and I keep getting:
404 com.google.apps.framework.request.NotFoundException: Agent metadata not found for agentId: ####-####-####-####-####
Here's a snippet of my code:
import os

import google.cloud.dialogflow as dialogflow
from CONFIG import DIALOGFLOW_PROJECT_ID

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = 'credentials/dialogflow.json'

def predict_intent(text, language):
    session_client = dialogflow.SessionsClient()
    session = session_client.session_path(DIALOGFLOW_PROJECT_ID, SESSION_ID)
    text_input = dialogflow.TextInput(text=text, language_code=language)
    query_input = dialogflow.QueryInput(text=text_input)
    response = session_client.detect_intent(session=session, query_input=query_input)  # ERROR
    return response.query_result.intent.display_name
I tried running the function multiple times; some calls succeed, but most fall into the exception.
I can train the bot using the same interface and it works fine.
I'm using Python 3.7 and the following Google Cloud modules: google-api-core==2.0.1, google-auth==2.0.2, google-cloud-dialogflow==2.7.1, googleapis-common-protos==1.53.0.
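One thing worth ruling out (an assumption on my part, not something confirmed in the question): if the agent lives in a non-global region, the client has to target that region's endpoint, or agent lookups can fail. A sketch with a placeholder region:
from google.api_core.client_options import ClientOptions
import google.cloud.dialogflow as dialogflow

# Hypothetical regional endpoint; replace europe-west1 with the agent's region.
client_options = ClientOptions(api_endpoint='europe-west1-dialogflow.googleapis.com')
session_client = dialogflow.SessionsClient(client_options=client_options)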

InvalidQueryException: Consistency level LOCAL_ONE is not supported for this operation. Supported consistency levels are: LOCAL_QUORUM

import org.apache.spark._
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("XXXX")
  .set("spark.cassandra.connection.host", "cassandra.us-east-2.amazonaws.com")
  .set("spark.cassandra.connection.port", "9142")
  .set("spark.cassandra.auth.username", "XXXXX")
  .set("spark.cassandra.auth.password", "XXXXX")
  .set("spark.cassandra.connection.ssl.enabled", "true")
  .set("spark.cassandra.connection.ssl.trustStore.path", "/home/nihad/.cassandra/cassandra_truststore.jks")
  .set("spark.cassandra.connection.ssl.trustStore.password", "XXXXX")
  .set("spark.cassandra.output.consistency.level", "LOCAL_QUORUM")

val connector = CassandraConnector(conf)
val session = connector.openSession()
session.execute("""INSERT INTO "covid19".delta_by_states (state_code, state_value, date) VALUES ('kl', 5, '2020-03-03');""")
session.close()
I am trying to write data to an AWS Keyspaces (Cassandra) table using a Spark app running on my local system.
The problem is that when I execute the code above, I get an exception like this:
"com.datastax.oss.driver.api.core.servererrors.InvalidQueryException:
Consistency level LOCAL_ONE is not supported for this operation.
Supported consistency levels are: LOCAL_QUORUM"
As you can see from the code above, I have already set spark.cassandra.output.consistency.level to LOCAL_QUORUM in the Spark conf, and I am using the DataStax Cassandra driver.
Reading data from AWS Keyspaces works fine, and I tried the same INSERT command in the AWS Keyspaces cqlsh, where it also works, so the query itself is valid.
Can someone help me set the consistency level via the DataStax CassandraConnector?
Cracked it.
Instead of setting the Cassandra consistency level via the Spark config, I created an application.conf file in the src/main/resources directory:
datastax-java-driver {
  basic.contact-points = ["cassandra.us-east-2.amazonaws.com:9142"]
  advanced.auth-provider {
    class = PlainTextAuthProvider
    username = "serviceUserName"
    password = "servicePassword"
  }
  basic.load-balancing-policy {
    local-datacenter = "us-east-2"
  }
  advanced.ssl-engine-factory {
    class = DefaultSslEngineFactory
    truststore-path = "yourPath/.cassandra/cassandra_truststore.jks"
    truststore-password = "trustorePassword"
  }
  basic.request.consistency = LOCAL_QUORUM
  basic.request.timeout = 5 seconds
}
and created the Cassandra session like below:
import com.datastax.oss.driver.api.core.config.DriverConfigLoader
import com.datastax.oss.driver.api.core.CqlSession

// Load the driver configuration from the classpath (src/main/resources)
val loader = DriverConfigLoader.fromClasspath("application.conf")
val session = CqlSession.builder().withConfigLoader(loader).build()
session.execute("""INSERT INTO "covid19".delta_by_states (state_code, state_value, date) VALUES ('kl', 5, '2020-03-03');""")
It finally worked; no need to mess with the Spark config.
DriverConfigLoader docs: https://docs.datastax.com/en/drivers/java/4.0/com/datastax/oss/driver/api/core/config/DriverConfigLoader.html#fromClasspath-java.lang.String-
DataStax driver configuration reference: https://docs.datastax.com/en/developer/java-driver/4.6/manual/core/configuration/reference/

Connect Python to H2

I'm trying to connect from Python 2.7 to H2 (h2-1.4.193.jar, the latest at the time).
H2 is running and available: java -Dh2.bindAddress=127.0.0.1 -cp "E:\Dir\h2-1.4.193.jar;%H2DRIVERS%;%CLASSPATH%" org.h2.tools.Server -tcpPort 15081 -baseDir E:\Dir\db
For Python I'm using jaydebeapi:
import jaydebeapi
conn = jaydebeapi.connect('org.h2.Driver', ['jdbc:h2:tcp://localhost:15081/db/test', 'sa', ''], 'E:\Path\to\h2-1.4.193.jar')
curs = conn.cursor()
curs.execute('create table PERSON ("PERSON_ID" INTEGER not null, "NAME" VARCHAR not null, primary key ("PERSON_ID"))')
curs.execute("insert into PERSON values (1, 'John')")
curs.execute("select * from PERSON")
data = curs.fetchall()
print(data)
As a result, every time I get the error: Process finished with exit code -1073741819 (0xC0000005)
Do you have any ideas about this case? Or maybe there is something else I can use instead of jaydebeapi?
Answering my own question:
First of all, I could not get anywhere with jaydebeapi.
I read that H2 supports the PostgreSQL network protocol, so my next step was to move both H2 and Python over to pgsql:
H2 pg:
java -Dh2.bindAddress=127.0.0.1 -cp h2.jar;postgresql-9.4.1212.jre6.jar org.h2.tools.Server -baseDir E:\Dir\h2\db
TCP server running at tcp://localhost:9092 (only local connections)
PG server running at pg://localhost:5435 (only local connections)
Web Console server running at http://localhost:8082 (only local connections)
The postgresql JAR was included to try connecting from the Web Console.
Python: psycopg2 instead of jaydebeapi:
import psycopg2
conn = psycopg2.connect("dbname=h2pg user=sa password='sa' host=localhost port=5435")
cur = conn.cursor()
cur.execute('create table PERSON ("PERSON_ID" INTEGER not null, "NAME" VARCHAR not null, primary key ("PERSON_ID"))')
As a result - it's working now. Connection was established and table was created.
Web Console settings:
Generic PostgreSQL
org.postgresql.Driver
jdbc:postgresql://localhost:5435/h2pg
name: sa, pass: sa
The Web Console did connect but did not show the table list and showed many errors instead ("CURRENT_SCHEMAS" is not found, etc.). pgAdmin 4 was also unable to connect. SQuirreL to the rescue: it connected to this DB and everything works fine there.
Perhaps a bit late for an update after 1.5 years, but the current version connects fine with H2, without having to use a postgres driver.
conn = jaydebeapi.connect("org.h2.Driver", "jdbc:h2:~/test", ["sa", ""], "/Users/angelo/websites/GEPR/h2/bin/h2-1.4.197.jar",)
source: https://pypi.org/project/JayDeBeApi/#usage
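For completeness, a minimal round trip with that newer signature (a sketch; the jar path is a placeholder, and the table comes from the question above):
import jaydebeapi

# Current jaydebeapi signature: driver class, JDBC URL, [user, password], jar path.
conn = jaydebeapi.connect(
    "org.h2.Driver",
    "jdbc:h2:~/test",
    ["sa", ""],
    "/path/to/h2-1.4.197.jar",
)
curs = conn.cursor()
curs.execute('create table if not exists PERSON ("PERSON_ID" INTEGER not null, "NAME" VARCHAR not null, primary key ("PERSON_ID"))')
curs.execute("insert into PERSON values (1, 'John')")
curs.execute("select * from PERSON")
print(curs.fetchall())
curs.close()
conn.close()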