Python Sybase module vs subprocess isql, which one is better to use? - python-2.7

I found that isql (via subprocess) takes less time than the Sybase module in Python.
Could someone please advise whether I should use subprocess or the Sybase module?
Below is the small test script I used to compare the two.
import subprocess
from datetime import datetime
import Sybase

# mdbserver, muserid, mpassword, mdatabase, Delimiter and sybase_bin
# are assumed to be defined elsewhere in the script.
Query = 'select count(*) from my_table'

# Timing the Sybase module
start_time1 = datetime.now()
db = Sybase.connect(mdbserver, muserid, mpassword, mdatabase)
c = db.cursor()
c.execute(Query)
list1 = c.fetchall()
end_time1 = datetime.now()
print (end_time1 - start_time1)

# Timing isql via subprocess
start_time2 = datetime.now()
command = ("./isql -S " + mdbserver + " -U " + muserid + " -P " + mpassword +
           " -D " + mdatabase + " -s '" + Delimiter + "' --retserverror -w 99999" +
           " <<EOF\nSET NOCOUNT ON\n" + Query + "\ngo\nEOF")
proc = subprocess.Popen(
    command,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    shell=True,
    cwd=sybase_bin
)
output, error = proc.communicate()
end_time2 = datetime.now()
print (end_time2 - start_time2)

isql is intended for interactive access to the database, and it returns data formatted for screen output. There is additional padding and formatting that can't be directly controlled, and it does not work well when you are looking at binary/image or other non-varchar data.
The Python module pulls the data as expected, without additional formatting.
So as long as you are only pulling columns that aren't too wide and don't contain binary data, you can probably get away with using subprocess. The better solution, though, is to use the Python module.
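For illustration, a minimal sketch of the practical difference, reusing the connection variables and the output string from the timing script above:
# The Sybase module hands back rows as Python tuples,
# so nothing has to be stripped or parsed.
db = Sybase.connect(mdbserver, muserid, mpassword, mdatabase)
c = db.cursor()
c.execute('select count(*) from my_table')
count = c.fetchall()[0][0]  # a native Python value, no screen padding to strip

# By contrast, isql returns text that still has to be parsed, e.g.:
rows = [line.split(Delimiter) for line in output.splitlines() if line.strip()]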

Related

PVLIB: How can I add module and inverter specifications which are not present in CEC and SAM library?

I am working on a PV system installed in Amsterdam. The PVSystem code is as follows. I am getting good results with the inverter and the modules specified in the code, which are obtained with retrieve_sam.
import pvlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pvlib.temperature import TEMPERATURE_MODEL_PARAMETERS
from pandas.plotting import register_matplotlib_converters
from pvlib.modelchain import ModelChain

# Define location for the Netherlands
location = pvlib.location.Location(latitude=52.53, longitude=5.15, tz='UTC', altitude=50, name='amsterdam')

# Import the databases
module_database = pvlib.pvsystem.retrieve_sam(name='SandiaMod')
inverter_database = pvlib.pvsystem.retrieve_sam(name='cecinverter')

module = module_database.Canadian_Solar_CS5P_220M___2009_
# module = module_database.DMEGC_Solar_320_M6_120BB_ (I want to add this module)
inverter = inverter_database.ABB__PVI_3_0_OUTD_S_US__208V_
temperature_model_parameters = pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass']
modules_per_string = 10
inverter_per_string = 1

# Define the PV system characteristics
surface_tilt = 12.5
surface_azimuth = 180
system = pvlib.pvsystem.PVSystem(surface_tilt=surface_tilt, surface_azimuth=surface_azimuth, albedo=0.25,
                                 module=module, module_parameters=module,
                                 temperature_model_parameters=temperature_model_parameters,
                                 modules_per_string=modules_per_string, inverter_per_string=inverter_per_string,
                                 inverter=inverter, inverter_parameters=inverter, racking_model='open_rack')

# Define a weather file
def importPSMData():
    df = pd.read_csv('/Users/laxmikantradkar/Desktop/PVLIB/solcast_data1.csv', delimiter=';')
    # Rename the columns for input to PVLIB
    df.rename(columns={'Dhi': 'dhi', 'Dni': 'dni', 'Ghi': 'ghi', 'AirTemp': 'temp_air',
                       'WindSpeed10m': 'wind_speed'}, inplace=True)
    df.rename(columns={'Year': 'year', 'Month': 'month', 'Day': 'day', 'Hour': 'hour',
                       'Minute': 'minute'}, inplace=True)
    df['dt'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])
    df.set_index(df['dt'], inplace=True)
    # Rename data parameters to run to datetime
    # df.rename(columns={'PeriodEnd': 'period_end'}, inplace=True)
    # Drop unnecessary columns
    df = df.drop(columns=['PeriodStart', 'Period', 'Azimuth', 'CloudOpacity',
                          'DewpointTemp', 'Ebh', 'PrecipitableWater', 'SnowDepth',
                          'SurfacePressure', 'WindDirection10m', 'Zenith'])
    return df

mc = ModelChain(system=system, location=location)
weatherData = importPSMData()
mc.run_model(weather=weatherData)
ac_energy = mc.ac
# ac_energy.to_csv('/Users/laxmikantradkar/Desktop/ac_energy_netherlands.csv')
plt.plot(ac_energy)
plt.show()
Now I want to use a module and an inverter which are not present in the library. Could anyone please tell me how to do this?
Is it possible to access the library and manually add the row/column of the inverter and module? If yes, where is the library located?
Is it ../Desktop/PVLIB/venv/lib/python3.8/site-packages/pvlib/data/sam-library-sandia-modules-2015-6-30.csv?
When I try to change the module/inverter parameters from the above path, I receive the error 'DataFrame' object has no attribute 'Module name'.
I started working on PVLIB_python 2 days ago, so I am new to the language. I really appreciate your help. Feel free to correct me at any point.
I started working on PVLIB_python 2 days ago, so I am new to the language. I really appreciate your help. Feel free to correct me at any point.
Welcome to the community! If you haven't already, I encourage you to dig through the pvlib-python documentation and continue to learn Python basics by playing with the examples in the documentation. I also encourage you to check out the pandas tutorials and any other highly rated pandas learning material you can find to get yourself running with data science in Python.
When I try to change the module/inverter parameters from the above path, I receive the error 'DataFrame' object has no attribute 'Module name'.
This is because you're asking for a column in the DataFrame table that's not there. No worries, you can make your own module.
Now I want to use a module and an inverter which are not present in the library. Could anyone please tell me how to do this? Is it possible to access the library and manually add the row/column of the inverter and module? If yes, where is the library located?
It isn't necessary to change the library. You can construct a module yourself, since a module is just a Series from the pandas library. Here's an example showing how you can copy an existing module, change a couple of parameters and create your own module.
my_new_module = module.copy() # create your own copy of the module
print("Before:", my_new_module, sep="\n") # show module before
my_new_module["Notes"] = "This is how to change a field in the module. Do this for every field in the module."
my_new_module.name = "DMEGC_Solar_320_M6_120BB_" # rename the Series appropriately
print("\nAfter:", my_new_module, sep="\n") # show module after
Then you can just insert "my_new_module" into PVSystem:
system = pvlib.pvsystem.PVSystem(
    surface_tilt=surface_tilt,
    surface_azimuth=surface_azimuth,
    albedo=0.25,
    module=my_new_module,  # HERE'S THE NEW MODULE!
    module_parameters=module,
    temperature_model_parameters=temperature_model_parameters,
    modules_per_string=modules_per_string,
    inverter_per_string=inverter_per_string,
    inverter=inverter,
    inverter_parameters=inverter,
    racking_model='open_rack')
The hard part here is having the right coefficients that you can trust. You may have an easier time using module_database = pvlib.pvsystem.retrieve_sam(name='CECMod') and replacing those parameters, since they can be substituted more easily with data from the module spec sheet; a sketch follows below.
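A minimal sketch of that approach (the overridden values below are placeholders for illustration, not real DMEGC datasheet numbers):
cec_modules = pvlib.pvsystem.retrieve_sam(name='CECMod')
my_cec_module = cec_modules.iloc[:, 0].copy()  # start from any CEC module as a template
my_cec_module.name = 'DMEGC_Solar_320_M6_120BB_'
# Override fields with values from the module spec sheet (placeholder numbers)
my_cec_module['STC'] = 320        # rated DC power at STC, W
my_cec_module['V_oc_ref'] = 40.8  # open-circuit voltage, V
my_cec_module['I_sc_ref'] = 10.1  # short-circuit current, A
As with my_new_module above, pass the result to PVSystem once all the datasheet values are filled in.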
This should work identically for inverters as well.

Python SQLite Insert not working despite commit

This is the code I'm running in Python. The table has been created in the DB already. I'm doing a commit, so I don't know why it's not working.
The code executes just fine, but no data is inserted into the table. I ran the same insert statement directly via sqlite command line and it worked just fine.
import os
import sqlite3
current_dir = os.path.dirname(__file__)
db_file = os.path.join(current_dir, '../data/trips.db')
trips_db = sqlite3.connect(db_file)
c = trips_db.cursor()
print 'inserting data into aggregate tables'
c.execute(
'''
insert into route_agg_data
select
pickup_loc_id || ">" || dropoff_loc_id as ride_route,
count(*) as rides_count
from trip_data
group by
pickup_loc_id || ">" || dropoff_loc_id
'''
)
trips_db.commit
trips_db.close
I changed the last 2 lines of my code to this:
trips_db.commit()
trips_db.close()
Thanks @thesilkworm
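As an aside, sqlite3 connections can also be used as context managers, which commit on success and roll back on error; a minimal sketch of the same fix (close() is still needed):
with trips_db:
    trips_db.execute(
        '''
        insert into route_agg_data
        select
            pickup_loc_id || ">" || dropoff_loc_id as ride_route,
            count(*) as rides_count
        from trip_data
        group by pickup_loc_id || ">" || dropoff_loc_id
        '''
    )
trips_db.close()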
Could you write a stored procedure inside SQLite,
then try it in Python:
cur = connection.cursor()
cur.callproc('insert_into_route_agg_data', [request.data['value1'],
                                            request.data['value2'],
                                            request.data['value3']])
results = cur.fetchone()
cur.close()

Using pd.read_sql() to extract large data (>5 million records) from an Oracle database makes the SQL execution very slow

Initially I tried pd.read_sql(). Then I tried sqlalchemy and query objects, but none of these methods helped: the SQL keeps executing for a long time and never finishes. I also tried using hints.
I guess the problem is the following: pandas creates a cursor object in the background, and with cx_Oracle we cannot influence the "arraysize" parameter that this cursor uses, i.e. the default value of 100 is always used, which is far too small.
CODE:
import pandas as pd
import Configuration.Settings as CS
import DataAccess.Databases as SDB
import sqlalchemy
import cx_Oracle
dfs = []
DBM = SDB.Database(CS.DB_PRM,PrintDebugMessages=False,ClientInfo="Loader")
sql = '''
WITH
l AS
(
SELECT DISTINCT /*+ materialize */
hcz.hcz_lwzv_id AS lwzv_id
FROM
pm_mbt_materialbasictypes mbt
INNER JOIN pm_mpt_materialproducttypes mpt ON mpt.mpt_mbt_id = mbt.mbt_id
INNER JOIN pm_msl_materialsublots msl ON msl.msl_mpt_id = mpt.mpt_id
INNER JOIN pm_historycompattributes hca ON hca.hca_msl_id = msl.msl_id AND hca.hca_ignoreflag = 0
INNER JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_id = hca.hca_tpm_id
inner join pm_tin_testdefinsertions tin on tin.tin_id = tpm.tpm_tin_id
INNER JOIN pm_hcz_history_comp_zones hcz ON hcz.hcz_hcp_id = hca.hca_hcp_id
WHERE
mbt.mbt_name = :input1 and tin.tin_name = 'x1' and
hca.hca_testendday < '2018-5-31' and hca.hca_testendday > '2018-05-30'
),
TPL as
(
select /*+ materialize */
*
from
(
select
ut.ut_id,
ut.ut_basic_type,
ut.ut_insertion,
ut.ut_testprogram_name,
ut.ut_revision
from
pm_updated_testprogram ut
where
ut.ut_basic_type = :input1 and ut.ut_insertion = :input2
order by
ut.ut_revision desc
) where rownum = 1
)
SELECT /*+ FIRST_ROWS */
rcl.rcl_lotidentifier AS LOT,
lwzv.lwzv_wafer_id AS WAFER,
pzd.pzd_zone_name AS ZONE,
tte.tte_tpm_id||'~'||tte.tte_testnumber||'~'||tte.tte_testname AS Test_Identifier,
case when ppd.ppd_measurement_result > 1e15 then NULL else SFROUND(ppd.ppd_measurement_result,6) END AS Test_Results
FROM
TPL
left JOIN pm_pcm_details pcm on pcm.pcm_ut_id = TPL.ut_id
left JOIN pm_tin_testdefinsertions tin ON tin.tin_name = TPL.ut_insertion
left JOIN pm_tpr_testdefprograms tpr ON tpr.tpr_name = TPL.ut_testprogram_name and tpr.tpr_revision = TPL.ut_revision
left JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_tpr_id = tpr.tpr_id and tpm.tpm_tin_id = tin.tin_id
left JOIN pm_tte_testdeftests tte on tte.tte_tpm_id = tpm.tpm_id and tte.tte_testnumber = pcm.pcm_testnumber
cross join l
left JOIN pm_lwzv_info lwzv ON lwzv.lwzv_id = l.lwzv_id
left JOIN pm_rcl_resultschipidlots rcl ON rcl.rcl_id = lwzv.lwzv_rcl_id
left JOIN pm_pcm_zone_def pzd ON pzd.pzd_basic_type = TPL.ut_basic_type and pzd.pzd_pcm_x = lwzv.lwzv_pcm_x and pzd.pzd_pcm_y = lwzv.lwzv_pcm_y
left JOIN pm_pcm_par_data ppd ON ppd.ppd_lwzv_id = l.lwzv_id and ppd.ppd_tte_id = tte.tte_id
'''
#method1: using query objects
Q = DBM.getQueryObject(sql)
Q.execute({"input1": 'xxxx', "input2": 'yyyy'})
while not Q.AtEndOfResultset:
    print Q
#method2: using sqlalchemy
connectstring = ("oracle+cx_oracle://username:Password@"
                 "(description=(address_list=(address=(protocol=tcp)"
                 "(host=tnsconnectstring)(port=portnumber)))"
                 "(connect_data=(sid=xxxx)))")
engine = sqlalchemy.create_engine(connectstring, arraysize=10000)
df_p = pd.read_sql(sql, params={"input1": 'xxxx', "input2": 'yyyy'}, con=engine)
#method3: using pd.read_sql_query()
df_p = pd.read_sql_query(SQL_PCM, params={"input1": 'xxxx', "input2": 'yyyy'},
                         coerce_float=True, con=DBM.Connection)
It would be great if someone could help me out with this. Thanks in advance.
And yet another possibility to adjust the array size without needing to create oraaccess.xml as suggested by Chris. This may not work with the rest of your code as is, but it should give you an idea of how to proceed if you wish to try this approach!
import cx_Oracle
import pandas
import sqlalchemy

class Connection(cx_Oracle.Connection):
    def __init__(self):
        super(Connection, self).__init__("user/pw@dsn")

    def cursor(self):
        # Every cursor created from this connection gets a larger arraysize
        c = super(Connection, self).cursor()
        c.arraysize = 5000
        return c

engine = sqlalchemy.create_engine("oracle+cx_oracle://", creator=Connection)
pandas.read_sql(sql, engine)
Here's another alternative to experiment with.
Set a prefetch size by using the external configuration available to Oracle Call Interface programs like cx_Oracle. This overrides internal settings used by OCI programs. Create an oraaccess.xml file:
<?xml version="1.0"?>
<oraaccess xmlns="http://xmlns.oracle.com/oci/oraaccess"
           xmlns:oci="http://xmlns.oracle.com/oci/oraaccess"
           schemaLocation="http://xmlns.oracle.com/oci/oraaccess
                           http://xmlns.oracle.com/oci/oraaccess.xsd">
  <default_parameters>
    <prefetch>
      <rows>1000</rows>
    </prefetch>
  </default_parameters>
</oraaccess>
If you use tnsnames.ora or sqlnet.ora for cx_Oracle, then put the oraaccess.xml file in the same directory. Otherwise, create a new directory and set the environment variable TNS_ADMIN to that directory name.
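A minimal sketch of the second option from Python (the path is a placeholder; the environment variable must be set before the Oracle client libraries initialize):
import os

# Placeholder path: point TNS_ADMIN at the directory holding oraaccess.xml
os.environ["TNS_ADMIN"] = "/path/to/dir/with/oraaccess.xml"

import cx_Oracle  # imported after TNS_ADMIN is set
connection = cx_Oracle.connect("user/pw@dsn")  # prefetch now follows oraaccess.xml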
cx_Oracle needs to be using Oracle Client 12c, or later, libraries.
Experiment with different sizes.
See OCI Client-Side Deployment Parameters Using oraaccess.xml.

Python make script run at specified time daily

I want to make this script run automatically once or twice a day at a specified time. What would be the best way to approach this?
import re
import urllib

def get_data():
    """Reads the currency rates from cinkciarz.pl, prints them out and stores the pln/usd
    rate in the variable myRate"""
    sock = urllib.urlopen("https://cinkciarz.pl/kantor/kursy-walut-cinkciarz-pl/usd")
    htmlSource = sock.read()
    sock.close()
    currancyRate = re.findall(r'<td class="cur_down">(.*?)</td>', str(htmlSource))
    for eachTd in currancyRate:
        print(eachTd)
    print(currancyRate[0])
    myRate = currancyRate[0]
    print(myRate)
    return myRate
You can use crontab to run any script at regular intervals. See https://stackoverflow.com/a/8727991/1517864
To run a script once a day (at 12:00) you will need an entry like this in your crontab:
0 12 * * * python /path/to/script.py
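To run it twice a day instead, list both hours in the hour field, e.g. at 00:00 and 12:00:
0 0,12 * * * python /path/to/script.py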
Alternatively, you can run a simple bash loop (see the concrete example below):
while true; do <your_command>; sleep <interval_in_seconds>; done
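For example, to run the script every 12 hours with this approach (the script path is a placeholder):
while true; do python /path/to/script.py; sleep 43200; done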

Xively read data in Python

I have written a python 2.7 script to retrieve all my historical data from Xively.
Originally I wrote it in C#, and it works perfectly.
I am limiting each request to a 6-hour block in order to retrieve all stored data.
My version in Python is as follows:
requestString = 'http://api.xively.com/v2/feeds/41189/datastreams/0001.csv?key=YcfzZVxtXxxxxxxxxxxORnVu_dMQ&start=' + requestDate + '&duration=6hours&interval=0&per_page=1000'
response = urllib2.urlopen(requestString).read()
The request date is in the correct format; I compared the full C# requestString version and the Python one.
Using the above request, I only get 101 lines of data, which equates to a few minutes of results.
My suspicion is that it is the .read() function: it returns about 34k characters, which is far less than the C# version. I tried passing 100000 as an argument to the read function, but there was no change in the result.
Here is another solution, also written in Python 2.7.
In my case, I got data in 30-minute blocks, because many sensors sent values every minute and the Xively API limits a single request to half an hour of data at that send frequency.
It's a general module:
# datespan, feed, apikey_xively, interval, duration, id and f are
# defined in the full script linked below.
for day in datespan(start_datetime, end_datetime, deltatime):  # step from start_datetime to end_datetime by deltatime
    while True:  # retry until the request succeeds
        try:
            response = urllib2.urlopen(
                'https://api.xively.com/v2/feeds/' + str(feed) + '.csv?key=' + apikey_xively +
                '&start=' + day.strftime("%Y-%m-%dT%H:%M:%SZ") + '&interval=' + str(interval) +
                '&duration=' + duration)  # get data
            break
        except:
            time.sleep(0.3)  # wait briefly, then try again
    cr = csv.reader(response)  # return data in columns
    print '.'
    for row in cr:
        if row[0] in id:  # choose desired data
            f.write(row[0] + "," + row[1] + "," + row[2] + "\n")  # write "id,timestamp,value"
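For reference, datespan is defined in the full script linked below; a minimal sketch of what such a generator can look like (an assumption for illustration, not the script's exact code):
from datetime import timedelta

def datespan(start, end, delta=timedelta(minutes=30)):
    # Yield datetimes from start (inclusive) up to end (exclusive), stepping by delta
    current = start
    while current < end:
        yield current
        current += delta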
The full script can be found here: https://github.com/CarlosRufo/scripts/blob/master/python/retrievalDataXively.py
Hope this helps; I'm delighted to answer any questions :)