I am using Python 2.7 with sqlite3 version 2.6.0. I am trying to create an in-memory database, attach to it from a physical database, insert data, and then query it back later. I am having issues; can anyone help?
The following two fail with the error message "unable to open database file"
con = sqlite3.connect(":memory:?cache=shared")
con = sqlite3.connect("file::memory:?cache=shared")
The following works until I attempt to access a table in the attached DB. I can do this with physical databases with no problem. I suspect the issue is not having the cache=shared.
con = sqlite3.connect(":memory:")
cursor = con.cursor()
cursor.executescript("create table table1 (columna int)")
cursor.execute("select * from table1")
con2 = sqlite3.connect("anotherdb.db")
cursor2 = con2.cursor()
cursor2.execute("attach database ':memory:' as 'foo'")
cursor2.execute("select * from foo.table1")
The error from the last select is "no such table: foo.table1".
Thanks in advance.
The SQLite library shipped with Python 2.x does not have URI file names enabled, so it is not possible to open an in-memory database in shared-cache mode.
You should switch to apsw, or Python 3.
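For reference, a minimal sketch of the shared-cache route on Python 3, where URI filenames are enabled (note that the in-memory database disappears as soon as its last connection closes):
import sqlite3
# both connections refer to the same shared-cache in-memory database
con = sqlite3.connect("file::memory:?cache=shared", uri=True)
con.execute("create table table1 (columna int)")
con.commit()
con2 = sqlite3.connect("file::memory:?cache=shared", uri=True)
print(con2.execute("select * from table1").fetchall())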
I want to copy certain data from a Vertica cluster (let's say a test cluster) to another Vertica cluster (let's say a QA cluster). Manually I can do this by dumping the result of a query into a CSV file and then importing it on the other cluster. But how can I do it in a Python script without using os or system commands? I want to do it purely using some Python module or adapter. As of now I am using the python-vertica adapter; I am able to connect to the test cluster and get the data into a Python list, but I am unable to export it to a CSV file natively using the adapter (i.e. without using the Python csv module). Also, how can I import the CSV file into my QA cluster using the same adapter (or a different Vertica module for Python)?
You can do it with COPY FROM VERTICA for simple cases; see the Vertica documentation on COPY FROM VERTICA for more info.
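A rough sketch of how the COPY FROM VERTICA route might look with vertica_python, run against the destination (QA) cluster; the host names, credentials, schemas, and table names below are placeholders:
import vertica_python
# connection info for the destination (QA) cluster -- placeholder values
DST_DB_INFO = {'host': 'qa-host', 'port': 5433, 'user': 'dbadmin', 'password': 'secret', 'database': 'qadb'}
connection = vertica_python.connect(**DST_DB_INFO)
cursor = connection.cursor()
# open a temporary link from the QA cluster back to the test cluster
cursor.execute("CONNECT TO VERTICA testdb USER dbadmin PASSWORD 'secret' ON 'test-host', 5433")
# pull the rows straight across; no intermediate CSV is needed
cursor.execute("COPY qa_schema.B FROM VERTICA testdb.test_schema.A")
cursor.execute("DISCONNECT testdb")
connection.commit()
cursor.close()
connection.close()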
Alternatively, you can use my template, which stages the rows through an in-memory CSV buffer:
Environment:
python=2.7.x
vertica-python==0.7.3
Vertica Analytic Database v8.1.1-10
Source code example:
#!/usr/bin/env python2
# coding: UTF-8
import csv
import cStringIO
import vertica_python
# connection info: username, password, etc
SRC_DB_INFO = {...}
DST_DB_INFO = {...}
csvbuffer = cStringIO.StringIO()
csvwriter = csv.writer(csvbuffer, delimiter='|', lineterminator='\n', quoting=csv.QUOTE_MINIMAL)
# establish connection to source database
connection = vertica_python.connect(**SRC_DB_INFO)
cursor = connection.cursor()
cursor.execute('SELECT * FROM A')
# convert data to csv format
for row in cursor.iterate():
    csvwriter.writerow(row)
# cleanup
cursor.close()
connection.close()
# establish connection to destination database
connection = vertica_python.connect(**DST_DB_INFO)
cursor = connection.cursor()
# copy data
cursor.copy('COPY B FROM STDIN ABORT ON ERROR', csvbuffer.getvalue())
connection.commit()
# cleanup
cursor.close()
connection.close()
I launched a fresh AWS EMR Spark cluster with Zeppelin to query a MySQL database. When I tried to add a MySQL interpreter in Zeppelin, the option did not exist. I googled for a way to get the interpreter to display, but I didn't find a solution. How can I get a MySQL interpreter in Zeppelin so I can query the MySQL database?
Spark SQL supports many features of SQL:2003 and SQL:2011 [1][2], so you may consider doing this via Spark on Zeppelin by adding a dependency:
Get a MySQL connector JAR with the proper version
Add it as a dependency to the Spark interpreter on Zeppelin (I put the jar on the master machine)
You should then be able to access MySQL tables. The following is an example using the Scala API:
/* Database Configuration*/
val jdbcURL = s"jdbc:mysql://${HOST}/${DATABASE}"
val jdbcUsername = s"${USERNAME}"
val jdbcPassword = s"${PASSWORD}"
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", jdbcUsername)
connectionProperties.put("password", jdbcPassword)
connectionProperties.put("driver", "com.mysql.cj.jdbc.Driver")
/* Read Data from MySQL */
val desiredData = spark.read.jdbc(jdbcURL, "${TABLE NAME}", connectionProperties)
desiredData.printSchema
/* Data Manipulation */
desiredData.createOrReplaceTempView("desiredData")
val query = s"""
SELECT COUNT(*) AS `Record Number`
FROM desiredData
"""
spark.sql(query).show
val query2 = s"""
SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column1, column2) AS column3
FROM desiredData
"""
spark.sql(query2).show
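If you prefer the %pyspark interpreter, a roughly equivalent sketch (the host, database, table, and credentials are placeholders, and spark is the session Zeppelin provides) would be:
jdbcURL = "jdbc:mysql://HOST/DATABASE"
desiredData = (spark.read
    .format("jdbc")
    .option("url", jdbcURL)
    .option("dbtable", "TABLE_NAME")
    .option("user", "USERNAME")
    .option("password", "PASSWORD")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .load())
desiredData.printSchema()
desiredData.createOrReplaceTempView("desiredData")
spark.sql("SELECT COUNT(*) AS `Record Number` FROM desiredData").show()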
Testing Notes:
EMR: emr-5.10.0 with Pig 0.17.0, Zeppelin 0.7.3, and Spark 2.2.0
MySQL: MariaDB 5.2.10
References
[1] Apache Hive (n.d.). Home. [online] Cwiki.apache.org. Available at: https://cwiki.apache.org/confluence/display/Hive/Home [Accessed 1 Dec. 2017].
[2] Apache Spark (n.d.). Compatibility with Apache Hive. [online] spark.apache.org. Available at: https://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive [Accessed 1 Dec. 2017].
I have some code to query a MYSQL database and send the output to a text file.
The code below prints out the first 7 columns of data and sends them to a text file called Test.txt.
My question is: how do I also obtain the column HEADINGS from the database to display in the text file?
I am using Python 2.7 with a MYSQL database.
import MySQLdb
import sys
connection = MySQLdb.connect(host="localhost", user="", passwd="", db="")
cursor = connection.cursor ()
cursor.execute ("select * from tablename")
data = cursor.fetchall ()
OutputFile = open("C:\Temp\Test.txt", "w")
for row in data:
    print>>OutputFile, row[0], row[1], row[2], row[3], row[4], row[5], row[6]
OutputFile.close()
cursor.close ()
connection.close ()
sys.exit()
The best way to get the column names is by using INFORMATION_SCHEMA:
SELECT `COLUMN_NAME`
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA`='yourdatabasename'
AND `TABLE_NAME`='yourtablename';
or by using MySQL's SHOW command:
SHOW COLUMNS FROM yourtablename;
This command is MySQL-specific.
Then, to get the data, use the .fetchall() function on the cursor.
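Applied to the script in the question, a minimal sketch (yourdatabasename and tablename are placeholders) might look like this: fetch the column names first, then write them as a header line before the rows.
import MySQLdb
connection = MySQLdb.connect(host="localhost", user="", passwd="", db="yourdatabasename")
cursor = connection.cursor()
# fetch the column headings in table order
cursor.execute(
    "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s ORDER BY ORDINAL_POSITION",
    ("yourdatabasename", "tablename"))
headings = [row[0] for row in cursor.fetchall()]
# fetch the data itself
cursor.execute("SELECT * FROM tablename")
data = cursor.fetchall()
OutputFile = open(r"C:\Temp\Test.txt", "w")
print >> OutputFile, " ".join(headings[:7])
for row in data:
    print >> OutputFile, " ".join(str(value) for value in row[:7])
OutputFile.close()
cursor.close()
connection.close()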
I'm trying to make a connection from Python 2.7 to H2 (h2-1.4.193.jar, the latest).
H2 is running and available: java -Dh2.bindAddress=127.0.0.1 -cp "E:\Dir\h2-1.4.193.jar;%H2DRIVERS%;%CLASSPATH%" org.h2.tools.Server -tcpPort 15081 -baseDir E:\Dir\db
For python I'm using jaydebeapi:
import jaydebeapi
conn = jaydebeapi.connect('org.h2.Driver', ['jdbc:h2:tcp://localhost:15081/db/test', 'sa', ''], 'E:\Path\to\h2-1.4.193.jar')
curs = conn.cursor()
curs.execute('create table PERSON ("PERSON_ID" INTEGER not null, "NAME" VARCHAR not null, primary key ("PERSON_ID"))')
curs.execute("insert into PERSON values (1, 'John')")
curs.execute("select * from PERSON")
data = curs.fetchall()
print(data)
As a result, every time I get an error: Process finished with exit code -1073741819 (0xC0000005)
Do you have any ideas about this case? Or maybe there is something else I can use instead of jaydebeapi?
Answering my own question:
First of all, I could not do anything through jaydebeapi.
I've read that H2 supports the PostgreSQL network protocol, so my next step was to move both H2 and Python over to the PostgreSQL protocol:
H2 pg:
java -Dh2.bindAddress=127.0.0.1 -cp h2.jar;postgresql-9.4.1212.jre6.jar org.h2.tools.Server -baseDir E:\Dir\h2\db
TCP server running at tcp://localhost:9092 (only local connections)
PG server running at pg://localhost:5435 (only local connections)
Web Console server running at http://localhost:8082 (only local connections)
postgresql.jar was included so I could try to connect from the Web Console.
Python: psycopg2 instead of jaydebeapi:
import psycopg2
conn = psycopg2.connect("dbname=h2pg user=sa password='sa' host=localhost port=5435")
cur = conn.cursor()
cur.execute('create table PERSON ("PERSON_ID" INTEGER not null, "NAME" VARCHAR not null, primary key ("PERSON_ID"))')
As a result - it's working now. Connection was established and table was created.
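A short continuation of the sketch above (same psycopg2 connection), just to show that inserts and queries go over H2's PG protocol too:
cur.execute("insert into PERSON values (%s, %s)", (1, 'John'))
conn.commit()
cur.execute("select * from PERSON")
print(cur.fetchall())
cur.close()
conn.close()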
Web Console settings:
Generic PostgreSQL
org.postgresql.Driver
jdbc:postgresql://localhost:5435/h2pg
name: sa, pass: sa
The Web Console did connect but did not show the table list and showed many errors instead ("CURRENT_SCHEMAS" is not found, etc.). pgAdmin 4 was also unable to connect. SQuirreL SQL came to the rescue: it connected to this database and everything works fine there.
Perhaps a bit late for an update after 1.5 years, but the current version of jaydebeapi connects fine with H2, without having to use a Postgres driver.
conn = jaydebeapi.connect("org.h2.Driver", "jdbc:h2:~/test", ["sa", ""], "/Users/angelo/websites/GEPR/h2/bin/h2-1.4.197.jar",)
source: https://pypi.org/project/JayDeBeApi/#usage
I can connect to a database in sqlite3, attach another database and run an inner join to retrieve records from two tables, one in each database. But when I try to do the same with a python script running on the command line, I get no results - the error reads that the table (in the attached database) does not exist.
import sqlite3 as lite
import sys

db_acts = '/full/path/to/activities.db'
db_sign = '/full/path/to/sign_up.db'

def join_tables():
    try:
        con = lite.connect(db_acts)
        cursor = con.cursor()
        cursor.execute("attach database 'db_sign' as 'sign_up'")
        cursor.execute("select users.ID, users.Email, users.TextMsg from sign_up.users INNER JOIN db_acts.alerts on sign_up.users.ID = db_acts.alerts.UID")
        rows = cursor.fetchall()
        for row in rows:
            print 'row', row
        con.commit()
        con.close()
    except lite.Error, e:
        print 'some error'
        sys.exit(1)
The response on localhost is the same as on the HostGator remote host where I just ran a test (it's a new site without user input at the moment). I have no problem reading rows from tables in the original database connection; only the tables in the attached database are not read. The attachment works at least partially: a statement attaching it again in the except clause shows that the database is already in use.