I have installed the following packages on the remote server to access Hive through Python:
Python 2.7.6,
Python development tools,
pyhs2,
sasl-0.1.3,
thrift-0.9.1,
PyHive-0.1.0
Here is the Python script to access Hive.
#!/usr/bin/env python
import pyhs2 as hive
import getpass
DEFAULT_DB = 'camp'
DEFAULT_SERVER = '10.25.xx.xx'
DEFAULT_PORT = 10000
DEFAULT_DOMAIN = 'xxx.xxxxxx.com'
# Get the username and password
u = raw_input('Enter PAM username: ')
s = getpass.getpass()
# Build the Hive Connection
connection = hive.connect(host=DEFAULT_SERVER, port=DEFAULT_PORT, authMechanism='LDAP', user=u + '#' + DEFAULT_DOMAIN, password=s)
# Hive query statement
statement = "select * from camp.test"
cur = connection.cursor()
# Run the Hive query and return the result as a list of lists
cur.execute(statement)
df = cur.fetchall()
Here is the output I got:
File "build/bdist.linux-x86_64/egg/pyhs2/__init__.py", line 7, in connect
File "build/bdist.linux-x86_64/egg/pyhs2/connections.py", line 46, in __init__
File "build/bdist.linux-x86_64/egg/pyhs2/cloudera/thrift_sasl.py", line 74, in open
File "build/bdist.linux-x86_64/egg/pyhs2/cloudera/thrift_sasl.py", line 92, in _recv_sasl_message
File "build/bdist.linux-x86_64/egg/thrift/transport/TTransport.py", line 58, in readAll
File "build/bdist.linux-x86_64/egg/thrift/transport/TSocket.py", line 118, in read
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
I don't see any error in the output after executing the script, but I don't see any query results on the screen either. I'm not sure why no results are displayed: the Hive server IP, port, user, and password are correct. I also verified connectivity between the Hive server and the remote server, and there are no issues.
Try using this code:
import pyhs2

with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Show databases
        print cur.getDatabases()
        # Execute query
        cur.execute("select * from table")
        # Return column info from query
        print cur.getSchema()
        # Fetch table results
        for i in cur.fetch():
            print i
I've managed to get access by using the following:
from pyhive import presto
DEFAULT_DB = 'XXXXX'
DEFAULT_SERVER = 'server.name.blah'
DEFAULT_PORT = 8000
# Username
u = "user"
# Build the Presto connection
connection = presto.connect(host=DEFAULT_SERVER, port=DEFAULT_PORT, username=u)
# Query statement
statement = "select * from public.dudebro limit 5"
cur = connection.cursor()
# Runs the query and returns the result as a list of lists
cur.execute(statement)
df = cur.fetchall()
print df
I'm using the psycopg2 module to run queries against QuestDB from Python. I have had some trouble using the cursor's copy_from() method to get CSV data into a table. What's the best way to get this data into the database?
I'm trying the following:
import pandas as pd
import numpy as np
import psycopg2
import os
conn = psycopg2.connect(user="admin",
                        password="quest",
                        host="127.0.0.1",
                        port="8812",
                        database="qdb")
cursor = conn.cursor()
dest_table = "eur_fr_bulk"
temp_dataframe = "./temp_dataframe.csv"
# input
df = pd.read_csv("./data/eur_fr.csv")
df.to_csv(temp_dataframe, index_label='id', header=False)
f = open(temp_dataframe, 'r')
cursor = conn.cursor()
try:
    cursor.copy_from(f, dest_table)
    conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
    os.remove(temp_dataframe)
    print("Error: %s" % error)
    conn.rollback()
    cursor.close()
cursor.close()
The copy_from() wrapper in psycopg2 executes SQL in the background that QuestDB does not yet support. Specifically, it runs:
COPY my_table FROM stdin WITH DELIMITER AS ' ' NULL AS '\\N'
The DELIMITER keyword is not yet implemented. As a workaround, you can either make the request over HTTP from Python, which might be the most convenient:
import requests
csv = {'data': ('my_table_import', open('./data/eur_fr.csv', 'r'))}
server = 'http://localhost:9000/imp'
response = requests.post(server, files=csv)
print(response.text)
or you can specify a copy directory in the server.conf file, which allows loading CSV files with COPY. This is described on the COPY documentation page.
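If the data already lives in memory, the HTTP route does not need a temporary file either. The following is a minimal sketch, written for Python 3, that serializes rows to CSV in a buffer and wraps them in the same files dictionary shape used above. The helper names build_import_payload and upload and the sample rows are made up for illustration; the endpoint and the 'data' field are taken from the answer above.

```python
import csv
import io


def build_import_payload(rows, table_name):
    """Serialize rows to CSV in memory and wrap them for requests' files= argument."""
    text = io.StringIO()
    csv.writer(text, lineterminator="\n").writerows(rows)
    data = io.BytesIO(text.getvalue().encode("utf-8"))
    # Same shape as the dict in the answer: field 'data', (name, file object).
    return {"data": (table_name, data)}


def upload(payload, server="http://localhost:9000/imp"):
    """Send the payload to the import endpoint and return the response text."""
    # Imported here so the payload helper can be used without requests installed.
    import requests
    response = requests.post(server, files=payload)
    return response.text


# Made-up sample rows; the first row is the header.
rows = [["id", "rate"], [0, 1.08], [1, 1.09]]
payload = build_import_payload(rows, "my_table_import")
# upload(payload) would send it to a running QuestDB instance.
```

Calling upload(payload) is left commented out because it needs a reachable server.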
I want to copy certain data from one Vertica cluster (let's say a test cluster) to another Vertica cluster (let's say a QA cluster). Manually, I can do this by dumping the result of a query into a CSV file and then importing it on the other cluster. But how can I do it in a Python script without using os or system commands? I want to do it purely with a Python module or adapter. Right now I am using the python-vertica adapter: I can connect to the test cluster and get the data into a Python list, but I am unable to export it to a CSV file natively using the adapter (i.e. without using the Python csv module). Also, how can I import the CSV file into my QA cluster using the same adapter (or a different Vertica module for Python)?
For simple cases, you can do it with COPY FROM VERTICA; see the Vertica documentation for more info.
For Python, you can use my template:
Environment:
python=2.7.x
vertica-python==0.7.3
Vertica Analytic Database v8.1.1-10
Source code example:
#!/usr/bin/env python2
# coding: UTF-8
import csv
import cStringIO
import vertica_python
# connection info: username, password, etc
SRC_DB_INFO = {...}
DST_DB_INFO = {...}
csvbuffer = cStringIO.StringIO()
csvwriter = csv.writer(csvbuffer, delimiter='|', lineterminator='\n', quoting=csv.QUOTE_MINIMAL)
# establish connection to source database
connection = vertica_python.connect(**SRC_DB_INFO)
cursor = connection.cursor()
cursor.execute('SELECT * FROM A')
# convert data to csv format
for row in cursor.iterate():
csvwriter.writerow(row)
# cleanup
cursor.close()
connection.close()
# establish connection to destination database
connection = vertica_python.connect(**DST_DB_INFO)
cursor = connection.cursor()
# copy data
cursor.copy('COPY B FROM STDIN ABORT ON ERROR', csvbuffer.getvalue())
connection.commit()
# cleanup
cursor.close()
connection.close()
I have some code that queries a MySQL database and sends the output to a text file.
The code below prints the first 7 columns of data and writes them to a text file called Test.
My question is: how do I also obtain the column headings from the database so that they appear in the text file as well?
I am using Python 2.7 with a MySQL database.
import MySQLdb
import sys
connection = MySQLdb.connect(host="localhost", user="", passwd="", db="")
cursor = connection.cursor()
cursor.execute("select * from tablename")
data = cursor.fetchall()
OutputFile = open(r"C:\Temp\Test.txt", "w")
for row in data:
    print >>OutputFile, row[0], row[1], row[2], row[3], row[4], row[5], row[6]
OutputFile.close()
cursor.close()
connection.close()
sys.exit()
The best way to get the column names is to query INFORMATION_SCHEMA:
SELECT `COLUMN_NAME`
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA`='yourdatabasename'
AND `TABLE_NAME`='yourtablename';
or to use MySQL's SHOW command:
SHOW columns FROM your-table;
This command is MySQL-specific.
Then, to get the data, call .fetchall() on the cursor after running your query.
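With the Python DB-API, the headings are also available straight from the cursor: after execute(), cursor.description holds one tuple per result column, and the first item of each tuple is the column name. Here is a minimal sketch, written for Python 3 and using the standard-library sqlite3 module as a stand-in so that it runs without a MySQL server (MySQLdb cursors follow the same DB-API convention). The demo table, its columns, and the write_with_headings helper are made up for illustration.

```python
import io
import sqlite3


def write_with_headings(cursor, query, out):
    """Run query, then write a heading line followed by the rows to out."""
    cursor.execute(query)
    # DB-API 2.0: description is a sequence of 7-item tuples, column name first.
    headings = [col[0] for col in cursor.description]
    out.write(" ".join(headings) + "\n")
    for row in cursor.fetchall():
        out.write(" ".join(str(value) for value in row) + "\n")
    return headings


# Hypothetical demo table standing in for "tablename".
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE tablename (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO tablename VALUES (1, 'alice')")

# An in-memory buffer keeps the sketch self-contained; a real file object
# opened for writing works the same way.
buffer = io.StringIO()
headings = write_with_headings(cursor, "SELECT * FROM tablename", buffer)
connection.close()
```

This avoids a second query against INFORMATION_SCHEMA and keeps the headings in the same order as the columns in each row.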
My requirement is to capture the log entries for a particular HTTP request from the project's server log file. I have written two functions and am trying to execute them in parallel using the multiprocessing module, but only one of them is executed, and I am not sure what is going wrong.
My two functions are run_remote_command, which uses the paramiko module to execute the tail command on the remote server (a Linux box) and redirect the output to a file, and send_request, which uses the requests module to make a POST request from the local system (a Windows laptop) to the server.
Code:
import multiprocessing as mp
import paramiko
import datetime
import requests
def run_remote_command():
    basename = "sampletrace"
    suffixname = datetime.datetime.now().strftime("%y%m%d_%H%M%S")
    filename = "_".join([basename, suffixname])
    print filename
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        ssh.connect(hostname='x.x.x.x', username='xxxx', password='xxxx')
    except Exception as e:
        print "SSH Connecting to Host failed"
        print e
        ssh.close()
    print ssh
    tail = "tail -1cf /var/opt/logs/myprojectlogFile.txt >"
    cmdStr = tail + " " + filename
    result = ''
    try:
        stdin, stdout, stderr = ssh.exec_command(cmdStr)
        print "error:" + str(stderr.readlines())
        print stdout
        #logger.info("return output : response=%s" %(self.resp_result))
    except Exception as e:
        print 'Run remote command failed cmd'
        print e
    ssh.close()

def send_request():
    request_session = requests.Session()
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    data = "some data "
    URL = "http://X.X.X.X:xxxx/request"
    request_session.headers.update(headers)
    resp = request_session.post(URL, data=data)
    print resp.status_code
    print resp.request.headers
    print resp.text

def runInParallel(*fns):
    proc = []
    for fn in fns:
        p = mp.Process(target=fn)
        p.start()
        proc.append(p)
    for p in proc:
        p.join()

if __name__ == '__main__':
    runInParallel(run_remote_command, send_request)
Output: only the function send_request is executed. Even when I check the process list on the server, no tail process has been created.
200
Edited the code per Ilja's comment.
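When one of two parallel functions seems not to run, it can help to collect each function's return value or exception in the parent instead of relying on printed output. The following is a minimal sketch of that idea, written for Python 3. Note that it swaps multiprocessing for threads from concurrent.futures, which is an assumption that fits here only because both functions are I/O-bound. fake_tail and fake_request are hypothetical placeholders for run_remote_command and send_request.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(*fns):
    """Run each function in its own thread; return {name: (status, value)}."""
    outcomes = {}
    with ThreadPoolExecutor(max_workers=len(fns)) as pool:
        futures = {pool.submit(fn): fn.__name__ for fn in fns}
        for future, name in futures.items():
            try:
                outcomes[name] = ("ok", future.result())
            except Exception as exc:
                # Keep the failure visible instead of losing it in a worker.
                outcomes[name] = ("failed", repr(exc))
    return outcomes


def fake_tail():
    # Placeholder that simulates a failing remote command.
    raise RuntimeError("SSH connection failed")


def fake_request():
    # Placeholder that simulates a successful HTTP request.
    return 200


outcomes = run_in_parallel(fake_tail, fake_request)
```

Keep in mind that a command such as tail -f never exits on its own, so a real remote-tail function would also need some way to stop it before its result can be collected.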
I can connect to a database in sqlite3, attach another database and run an inner join to retrieve records from two tables, one in each database. But when I try to do the same with a python script running on the command line, I get no results - the error reads that the table (in the attached database) does not exist.
import sqlite3 as lite
import sys

db_acts = '/full/path/to/activities.db'
db_sign = '/full/path/to/sign_up.db'

def join_tables():
    try:
        con = lite.connect(db_acts)
        cursor = con.cursor()
        cursor.execute("attach database 'db_sign' as 'sign_up'")
        cursor.execute("select users.ID, users.Email, users.TextMsg from sign_up.users INNER JOIN db_acts.alerts on sign_up.users.ID = db_acts.alerts.UID")
        rows = cursor.fetchall()
        for row in rows:
            print 'row', row
        con.commit()
        con.close()
    except lite.Error, e:
        print 'some error'
        sys.exit(1)
The response on localhost is the same as on the HostGator remote host, where I just ran a test (it's a new site without user input at the moment). I have no problem reading rows from tables in the original database connection; only the tables in the attached database are not read. The attachment works at least partially: a print statement to attach it in the except clause shows that the database is in use.
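A likely cause, judging from the code above: inside the SQL string, 'db_sign' is a literal file name, not the Python variable db_sign, so SQLite attaches (and creates) an empty file called db_sign in the working directory, and that file has no users table. Likewise, db_acts is a Python variable name, not a schema name; the database opened by connect() is called main. Here is a minimal sketch of the fix, written for Python 3, with temporary files and made-up rows standing in for the real databases:

```python
import os
import sqlite3
import tempfile

# Temporary stand-ins for the real database paths.
tmpdir = tempfile.mkdtemp()
db_acts = os.path.join(tmpdir, "activities.db")
db_sign = os.path.join(tmpdir, "sign_up.db")

# Build two small databases with made-up rows.
con = sqlite3.connect(db_sign)
con.execute("CREATE TABLE users (ID INTEGER, Email TEXT, TextMsg TEXT)")
con.execute("INSERT INTO users VALUES (1, 'a@example.com', 'hello')")
con.commit()
con.close()

con = sqlite3.connect(db_acts)
con.execute("CREATE TABLE alerts (UID INTEGER)")
con.execute("INSERT INTO alerts VALUES (1)")
con.commit()
con.close()


def join_tables(acts_path, sign_path):
    """Attach sign_path to the acts_path connection and join across both."""
    con = sqlite3.connect(acts_path)
    try:
        cursor = con.cursor()
        # Bind the path as a parameter so the variable's value is used,
        # not a literal file name.
        cursor.execute("ATTACH DATABASE ? AS sign_up", (sign_path,))
        # The originally connected database is addressed as "main".
        cursor.execute(
            "SELECT users.ID, users.Email, users.TextMsg "
            "FROM sign_up.users AS users "
            "INNER JOIN main.alerts AS alerts ON users.ID = alerts.UID"
        )
        return cursor.fetchall()
    finally:
        con.close()


rows = join_tables(db_acts, db_sign)
```

Printing the caught exception rather than a fixed 'some error' message would also make this kind of problem easier to spot.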