Python SQLite Insert not working despite commit - python-2.7

This is the code I'm running in Python. The table has already been created in the DB. I'm doing a commit, so I don't know why it isn't working.
The code executes just fine, but no data is inserted into the table. I ran the same insert statement directly via the sqlite command line and it worked just fine.
import os
import sqlite3
current_dir = os.path.dirname(__file__)
db_file = os.path.join(current_dir, '../data/trips.db')
trips_db = sqlite3.connect(db_file)
c = trips_db.cursor()
print 'inserting data into aggregate tables'
c.execute(
'''
insert into route_agg_data
select
pickup_loc_id || ">" || dropoff_loc_id as ride_route,
count(*) as rides_count
from trip_data
group by
pickup_loc_id || ">" || dropoff_loc_id
'''
)
trips_db.commit
trips_db.close

I changed the last 2 lines of my code to this:
trips_db.commit()
trips_db.close()
Thanks @thesilkworm
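An alternative that avoids forgetting the calls entirely is to use the connection as a context manager, which commits automatically when the block succeeds. A minimal sketch, where db_file and insert_sql stand for the path and the INSERT ... SELECT statement from the question:
import sqlite3

trips_db = sqlite3.connect(db_file)   # db_file as defined in the question
with trips_db:                        # commits on success, rolls back on an exception
    trips_db.execute(insert_sql)      # the same INSERT ... SELECT used above
trips_db.close()                      # the with block does not close the connection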

Could you write a stored procedure inside SQLite?
Then try in Python:
cur = connection.cursor()
cur.callproc('insert_into_route_agg_data', [request.data['value1'],
             request.data['value2'], request.data['value3']])
results = cur.fetchone()
cur.close()
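For what it's worth, the standard sqlite3 module has no callproc() and SQLite itself has no stored procedures, so the closest equivalent is a plain parameterized statement. A minimal sketch, assuming route_agg_data has the two columns produced by the query above and using hypothetical values:
c = trips_db.cursor()
c.execute('insert into route_agg_data (ride_route, rides_count) values (?, ?)',
          ('loc1>loc2', 42))   # hypothetical route string and count, bound via ? placeholders
trips_db.commit()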

Related

Django toolbar: not showing the results for query with filter with greater than

I have a simple query in Django. I have django-toolbar installed to check the SQL queries and the corresponding data.
My model:
class RecipePosition(models.Model):
    name = models.CharField(max_length=200, blank=True, help_text="If left blank will be same as Ingredient name Eg: Tomato pulp")
    mass_quantity = models.DecimalField(max_digits=19, decimal_places=10, null=True, blank=True, default=0, validators=[MinValueValidator(0)])
    title = models.CharField(max_length=200, blank=True)
    updated = models.DateTimeField(auto_now=True, auto_now_add=False)
    timestamp = models.DateTimeField(auto_now=False, auto_now_add=True)
I have the below Django query with a filter:
RecipePosition.objects.all().filter(mass_quantity__gt = 0)
Django gets all the objects whose mass_quantity is greater than 0.
But when I check the SQL in django-toolbar, it shows:
SELECT "recipes_recipeposition"."id",
"recipes_recipeposition"."name",
"recipes_recipeposition"."mass_quantity",
"recipes_recipeposition"."title",
"recipes_recipeposition"."updated",
"recipes_recipeposition"."timestamp"
FROM "recipes_recipeposition"
WHERE "recipes_recipeposition"."mass_quantity" > 'Decimal(''0'')'
ORDER BY "recipes_recipeposition"."sequence_number" ASC
I tried this SQL in the SQLite browser as well, but it didn't show any results.
Why is django-toolbar not showing the correct SQL?
In my opinion, the SQL should be:
SELECT "recipes_recipeposition"."id",
"recipes_recipeposition"."name",
"recipes_recipeposition"."mass_quantity",
"recipes_recipeposition"."title",
"recipes_recipeposition"."updated",
"recipes_recipeposition"."timestamp"
FROM "recipes_recipeposition"
WHERE "recipes_recipeposition"."mass_quantity" > 0
ORDER BY "recipes_recipeposition"."sequence_number" ASC
and this, when tested in the SQLite browser, shows the results.
Also, when I tested this in shell_plus with --print-sql --ipython, it shows:
$ python manage.py shell_plus --print-sql --ipython
System check identified some issues:
# Shell Plus Model Imports
from recipes.models import Recipe, RecipePosition
Python 3.6.4 (default, Jan 5 2018, 02:35:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: RecipePosition.objects.all().filter(mass_quantity__gt=0)
Out[1]: SELECT "recipes_recipeposition"."id",
"recipes_recipeposition"."name",
"recipes_recipeposition"."mass_quantity",
"recipes_recipeposition"."title",
"recipes_recipeposition"."updated",
"recipes_recipeposition"."timestamp"
FROM "recipes_recipeposition"
WHERE "recipes_recipeposition"."mass_quantity" > '0'
ORDER BY "recipes_recipeposition"."sequence_number" ASC
LIMIT 21
Only django-toolbar shows the Decimal() thing; in the Django shell it shows WHERE "recipes_recipeposition"."mass_quantity" > '0'.
I also tried debugsqlshell as mentioned in the django-toolbar documentation. It shows "recipes_recipeposition"."mass_quantity" > '0' rather than "recipes_recipeposition"."mass_quantity" > 'Decimal(''0'')'.
$ python manage.py debugsqlshell
Python 3.6.4 (default, Jan 5 2018, 02:35:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [2]: from recipes.models import Recipe, RecipePosition
In [3]: RecipePosition.objects.all().filter(mass_quantity__gt = 0)
Out[3]: SELECT "recipes_recipeposition"."id",
"recipes_recipeposition"."name",
"recipes_recipeposition"."mass_quantity",
"recipes_recipeposition"."title",
"recipes_recipeposition"."updated",
"recipes_recipeposition"."timestamp"
FROM "recipes_recipeposition"
WHERE "recipes_recipeposition"."mass_quantity" > '0'
ORDER BY "recipes_recipeposition"."sequence_number" ASC
LIMIT 21 [1.58ms]
I don't know why django-toolbar is using "recipes_recipeposition"."mass_quantity" > 'Decimal(''0'')' instead of "recipes_recipeposition"."mass_quantity" > '0'.
I want to rely on django-toolbar, but now I am worried.
Good news, I think you need to make the most minute of changes!
Instead of:
RecipePosition.objects.all().filter(mass_quantity__gt = 0)
You require:
RecipePosition.objects.all().filter(mass_quantity__gt=0.0)
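If the concern is what actually reaches the database, the parameters can also be checked without the toolbar by looking at Django's own query log; a minimal sketch (requires DEBUG=True):
from django.db import connection, reset_queries
from recipes.models import RecipePosition

reset_queries()
list(RecipePosition.objects.filter(mass_quantity__gt=0))   # force the queryset to execute
print(connection.queries[-1]['sql'])                       # the SQL Django reports it executed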
Finally, after a lot of struggle and going through the code, I made the following changes in the debug_toolbar source, and then everything worked the way I want.
####################
# File: http://127.0.0.1:8001/static/debug_toolbar/css/toolbar.css
# By doing this the SQL will show in multiple lines with indent
# (the word-break rule is disabled):
#djDebug .djDebugSql {
    /* word-break: break-word; */
    z-index: 100000002;
}
#####################
# replace \n with <br> and space with &nbsp;, and don't use the BoldKeywordFilter
# By doing this the SQL will show in multiple lines with indent
def reformat_sql(sql):
    stack = sqlparse.engine.FilterStack()
    options = formatter.validate_options({'reindent': True, 'indent_width': True})
    stack = formatter.build_filter_stack(stack, options)
    # stack.preprocess.append(BoldKeywordFilter())  # add our custom filter
    stack.postprocess.append(sqlparse.filters.SerializerUnicode())  # tokens -> strings
    # return swap_fields(''.join(stack.run(sql)))
    return swap_fields(''.join(stack.run(sql)).replace("\n", "<br/>").replace(" ", "&nbsp;"))
#####################
in file /lib/python3.6/site-packages/debug_toolbar/panels/sql/tracking.py
# because of this, the "greater than 0" was shown as Decimal(0.0)
# change: for Decimal, wrap p in rev_typecast_decimal
def _record(self, method, sql, params):
    start_time = time()
    try:
        return method(sql, params)
    finally:
        stop_time = time()
        duration = (stop_time - start_time) * 1000
        if dt_settings.get_config()['ENABLE_STACKTRACES']:
            stacktrace = tidy_stacktrace(reversed(get_stack()))
        else:
            stacktrace = []
        _params = ''
        try:
            _params = json.dumps([self._decode(rev_typecast_decimal(p)) for p in params])
            # _params = json.dumps([self._decode(p) for p in params])
###########################
hare = []
if params is not None:
    hare = [self._decode(rev_typecast_decimal(p)) for p in params]
try:
    _params = json.dumps([self._decode(rev_typecast_decimal(p)) for p in params])
    # _params = json.dumps([self._decode(p) for p in params])
except Exception:
    pass  # object not JSON serializable
template_info = get_template_info()
alias = getattr(self.db, 'alias', 'default')
conn = self.db.connection
vendor = getattr(conn, 'vendor', 'unknown')
params1 = {
    'vendor': vendor,
    'alias': alias,
    # 'sql': self.db.ops.last_executed_query(
    #     self.cursor, sql, self._quote_params(params)),
    'sql': self.db.ops.last_executed_query(
        self.cursor, sql, hare),
    'duration': duration,
    'raw_sql': sql,
    'params': _params,
    'stacktrace': stacktrace,
    'start_time': start_time,
    'stop_time': stop_time,
    'is_slow': duration > dt_settings.get_config()['SQL_WARNING_THRESHOLD'],
    'is_select': sql.lower().strip().startswith('select'),
    'template_info': template_info,
}
################################################
The final output looks like this, with Decimal(0.0) replaced by 0 and well-formatted SQL:
SELECT "recipes_recipeposition"."id",
"recipes_recipeposition"."name",
"recipes_recipeposition"."recipe_id",
"recipes_recipeposition"."ingredient_id",
"recipes_recipeposition"."recipeposition_slug",
"recipes_recipeposition"."cooking_unit",
"recipes_recipeposition"."mass_unit_id",
"recipes_recipeposition"."mass_quantity",
"recipes_recipeposition"."volume_unit_id",
"recipes_recipeposition"."volume_quantity",
"recipes_recipeposition"."pieces_unit_id",
"recipes_recipeposition"."pieces_quantity",
"recipes_recipeposition"."cooking_notes",
"recipes_recipeposition"."sequence_number",
"recipes_recipeposition"."title",
"recipes_recipeposition"."updated",
"recipes_recipeposition"."timestamp",
"ingredients_ingredient"."rate" AS "ingredient__rate",
CASE
WHEN "ingredients_ingredient"."munit" = 'kg' THEN 'kg'
WHEN "ingredients_ingredient"."munit" = 'ltr' THEN 'ltr'
WHEN "ingredients_ingredient"."munit" = 'pcs' THEN 'pcs'
ELSE 'False'
END AS "ingredient__cost_unit",
CASE
WHEN "ingredients_ingredient"."munit" = 'kg' THEN CASE
WHEN ("recipes_recipeposition"."mass_unit_id" IS NOT NULL
AND "recipes_recipeposition"."mass_quantity" IS NOT NULL
AND "recipes_recipeposition"."mass_quantity" > '0') THEN CAST(("recipes_recipeposition"."mass_quantity" * "single_measurements_singlemeasurements"."quantity") AS NUMERIC)
ELSE 'False'
END
WHEN "ingredients_ingredient"."munit" = 'ltr' THEN 'ltr'
WHEN "ingredients_ingredient"."munit" = 'pcs' THEN 'pcs'
ELSE 'False'
END AS "reciposition_cost_quantity"
FROM "recipes_recipeposition"
LEFT OUTER JOIN "ingredients_ingredient" ON ("recipes_recipeposition"."ingredient_id" = "ingredients_ingredient"."id")
LEFT OUTER JOIN "single_measurements_singlemeasurements" ON ("recipes_recipeposition"."mass_unit_id" = "single_measurements_singlemeasurements"."id")
WHERE "recipes_recipeposition"."recipe_id" = '1'
ORDER BY "recipes_recipeposition"."sequence_number" ASC
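As an aside, the reindenting that the patched reformat_sql() achieves can also be reproduced on its own with sqlparse (which django-debug-toolbar already depends on); a minimal sketch:
import sqlparse

raw = ('SELECT "recipes_recipeposition"."id", "recipes_recipeposition"."name" '
       'FROM "recipes_recipeposition" WHERE "recipes_recipeposition"."mass_quantity" > 0')
print(sqlparse.format(raw, reindent=True, keyword_case='upper'))   # prints the SQL over multiple indented lines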

using pd.read_sql() to extract large data (>5 million records) from oracle database, making the sql execution very slow

Initially I tried using pd.read_sql().
Then I tried using sqlalchemy and query objects, but none of these methods is useful, as the SQL keeps executing for a long time and never finishes. I also tried using optimizer hints.
I guess the problem is the following: pandas creates a cursor object in the background, and with cx_Oracle we cannot influence the "arraysize" parameter that this cursor will use, i.e. the default value of 100 will always be used, which is far too small.
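(For reference, when driving cx_Oracle directly, the fetch array size can be set on the cursor itself; a minimal sketch, assuming an existing cx_Oracle connection such as DBM.Connection from the code below:)
cur = DBM.Connection.cursor()
cur.arraysize = 10000                                    # rows fetched per round trip; default is 100
cur.execute(sql, {"input1": 'xxxx', "input2": 'yyyy'})   # same named binds as in the query below
rows = cur.fetchall()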
CODE:
import pandas as pd
import Configuration.Settings as CS
import DataAccess.Databases as SDB
import sqlalchemy
import cx_Oracle
dfs = []
DBM = SDB.Database(CS.DB_PRM,PrintDebugMessages=False,ClientInfo="Loader")
sql = '''
WITH
l AS
(
SELECT DISTINCT /*+ materialize */
hcz.hcz_lwzv_id AS lwzv_id
FROM
pm_mbt_materialbasictypes mbt
INNER JOIN pm_mpt_materialproducttypes mpt ON mpt.mpt_mbt_id = mbt.mbt_id
INNER JOIN pm_msl_materialsublots msl ON msl.msl_mpt_id = mpt.mpt_id
INNER JOIN pm_historycompattributes hca ON hca.hca_msl_id = msl.msl_id AND hca.hca_ignoreflag = 0
INNER JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_id = hca.hca_tpm_id
inner join pm_tin_testdefinsertions tin on tin.tin_id = tpm.tpm_tin_id
INNER JOIN pm_hcz_history_comp_zones hcz ON hcz.hcz_hcp_id = hca.hca_hcp_id
WHERE
mbt.mbt_name = :input1 and tin.tin_name = 'x1' and
hca.hca_testendday < '2018-5-31' and hca.hca_testendday > '2018-05-30'
),
TPL as
(
select /*+ materialize */
*
from
(
select
ut.ut_id,
ut.ut_basic_type,
ut.ut_insertion,
ut.ut_testprogram_name,
ut.ut_revision
from
pm_updated_testprogram ut
where
ut.ut_basic_type = :input1 and ut.ut_insertion = :input2
order by
ut.ut_revision desc
) where rownum = 1
)
SELECT /*+ FIRST_ROWS */
rcl.rcl_lotidentifier AS LOT,
lwzv.lwzv_wafer_id AS WAFER,
pzd.pzd_zone_name AS ZONE,
tte.tte_tpm_id||'~'||tte.tte_testnumber||'~'||tte.tte_testname AS Test_Identifier,
case when ppd.ppd_measurement_result > 1e15 then NULL else SFROUND(ppd.ppd_measurement_result,6) END AS Test_Results
FROM
TPL
left JOIN pm_pcm_details pcm on pcm.pcm_ut_id = TPL.ut_id
left JOIN pm_tin_testdefinsertions tin ON tin.tin_name = TPL.ut_insertion
left JOIN pm_tpr_testdefprograms tpr ON tpr.tpr_name = TPL.ut_testprogram_name and tpr.tpr_revision = TPL.ut_revision
left JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_tpr_id = tpr.tpr_id and tpm.tpm_tin_id = tin.tin_id
left JOIN pm_tte_testdeftests tte on tte.tte_tpm_id = tpm.tpm_id and tte.tte_testnumber = pcm.pcm_testnumber
cross join l
left JOIN pm_lwzv_info lwzv ON lwzv.lwzv_id = l.lwzv_id
left JOIN pm_rcl_resultschipidlots rcl ON rcl.rcl_id = lwzv.lwzv_rcl_id
left JOIN pm_pcm_zone_def pzd ON pzd.pzd_basic_type = TPL.ut_basic_type and pzd.pzd_pcm_x = lwzv.lwzv_pcm_x and pzd.pzd_pcm_y = lwzv.lwzv_pcm_y
left JOIN pm_pcm_par_data ppd ON ppd.ppd_lwzv_id = l.lwzv_id and ppd.ppd_tte_id = tte.tte_id
'''
#method1: using query objects.
Q = DBM.getQueryObject(sql)
Q.execute({"input1":'xxxx',"input2":'yyyy'})
while not Q.AtEndOfResultset:
    print Q
#method2: using sqlalchemy
connectstring = ("oracle+cx_oracle://username:Password@(description="
                 "(address_list=(address=(protocol=tcp)(host=tnsconnect string)"
                 "(port=portnumber)))(connect_data=(sid=xxxx)))")
engine = sqlalchemy.create_engine(connectstring, arraysize=10000)
df_p = pd.read_sql(sql, params={"input1": 'xxxx', "input2": 'yyyy'}, con=engine)
#method3: using pd.read_sql_query()
df_p = pd.read_sql_query(SQL_PCM, params={"input1": 'xxxx', "input2": 'yyyy'},
                         coerce_float=True, con=DBM.Connection)
It would be great if someone could help me out with this. Thanks in advance.
Yet another possibility is to adjust the array size without needing to create the oraaccess.xml suggested by Chris. This may not work with the rest of your code as is, but it should give you an idea of how to proceed if you wish to try this approach!
import cx_Oracle
import sqlalchemy
import pandas

class Connection(cx_Oracle.Connection):
    def __init__(self):
        super(Connection, self).__init__("user/pw@dsn")

    def cursor(self):
        c = super(Connection, self).cursor()
        c.arraysize = 5000
        return c

# create_engine still needs a dialect URL when a creator callable supplies the connection
engine = sqlalchemy.create_engine("oracle+cx_oracle://", creator=Connection)
pandas.read_sql(sql, engine)
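Separately from the array-size tuning above, if building one huge DataFrame is part of the problem, pandas can also stream the result in chunks; a minimal sketch reusing the engine and bind parameters from the question:
chunks = []
for chunk in pd.read_sql(sql, con=engine,
                         params={"input1": 'xxxx', "input2": 'yyyy'},
                         chunksize=50000):    # yields DataFrames of up to 50,000 rows each
    chunks.append(chunk)                      # or aggregate/process each chunk and discard it
df_p = pd.concat(chunks, ignore_index=True)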
Here's another alternative to experiment with.
Set a prefetch size by using the external configuration available to Oracle Call Interface programs like cx_Oracle. This overrides internal settings used by OCI programs. Create an oraaccess.xml file:
<?xml version="1.0"?>
<oraaccess xmlns="http://xmlns.oracle.com/oci/oraaccess"
           xmlns:oci="http://xmlns.oracle.com/oci/oraaccess"
           schemaLocation="http://xmlns.oracle.com/oci/oraaccess
                           http://xmlns.oracle.com/oci/oraaccess.xsd">
  <default_parameters>
    <prefetch>
      <rows>1000</rows>
    </prefetch>
  </default_parameters>
</oraaccess>
If you use tnsnames.ora or sqlnet.ora for cx_Oracle, then put the oraaccess.xml file in the same directory. Otherwise, create a new directory and set the environment variable TNS_ADMIN to that directory name.
cx_Oracle needs to be using Oracle Client 12c, or later, libraries.
Experiment with different sizes.
See OCI Client-Side Deployment Parameters Using oraaccess.xml.

Python import/insert a CSV (without headers) into an Oracle DB using cx_Oracle

Can anyone suggest a way to import a CSV file into an Oracle DB using cx_Oracle? The code below works, but I have to manually delete the CSV header row (row 1) before I run the Python script. Is there a way to change the code to ignore line 1 of the CSV file?
import cx_Oracle
import csv
connection = cx_Oracle.connect(USER,PASSWORD,'adhoc_serv')#DADs
cursor = connection.cursor()
insert = """
    INSERT INTO MUK (CODE, UNIT_NAME, GROUP_CODE, GROUP_NAME)
    VALUES (:1, :2, :3, :4)"""
# Initialize list that will serve as a container for bind values
L = []
reader = csv.reader(open(r'C:\Projects\MUK\MUK_Latest_PY.csv'),delimiter=',')
for row in reader:
    L.append(tuple(row))
# prepare insert statement
cursor.prepare(insert)
print insert
# execute insert with executemany
cursor.executemany(None, L)
# report number of inserted rows
print 'Inserted: ' + str(cursor.rowcount) + ' rows.'
# commit
connection.commit()
# close cursor and connection
cursor.close()
connection.close()
If you want to simply ignore line 1 of the CSV file, that is easily accomplished by performing this immediately after the reader has been created:
next(reader)
This will simply get the first row from the CSV file and discard it.
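In the context of the code above, the change is just one extra line after the reader is created (a sketch of the relevant lines only):
reader = csv.reader(open(r'C:\Projects\MUK\MUK_Latest_PY.csv'), delimiter=',')
next(reader)                 # read and discard the header row
for row in reader:
    L.append(tuple(row))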

Python Sybase module vs subprocess isql , which one is better to use?

I found that isql (via subprocess) takes less time compared to the Sybase module in Python.
Could someone please advise whether I should use subprocess or Sybase?
Below is the small test script I used to check this.
from datetime import datetime
import subprocess
import Sybase

Query = 'select count(*) from my_table'
start_time1 = datetime.now()
db = Sybase.connect(mdbserver,muserid,mpassword,mdatabase)
c = db.cursor()
c.execute(Query)
list1 = c.fetchall()
end_time1 = datetime.now()
print (end_time1-start_time1)
start_time2 = datetime.now()
command = "./isql -S "+mdbserver+" -U "+muserid+" -P "+mpassword+" -D "+mdatabase+" -s '"+Delimiter+"' --retserverror -w 99999 <<EOF\nSET NOCOUNT ON\n "+Query+"\ngo\nEOF"
proc = subprocess.Popen(
    command,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    shell=True,
    cwd=sybase_bin
)
output, error = proc.communicate()
end_time2 = datetime.now()
print (end_time2 - start_time2)
isql is intended for interactive access to the database, and it returns data formatted for screen output. There is additional padding and formatting that can't be directly controlled. It also does not work well when you are looking at binary/image or other non-varchar data.
The Python module will pull the data as expected, without additional formatting.
So as long as you are only pulling columns that aren't too wide and don't contain binary data, you can probably get away with using subprocess. The better solution, though, would be to use the Python module.

Cursor execution - Python

I am working on a Django project where I use a MySQL database. In my project, one of the methods uses two cursors to execute queries, and both are closed correctly.
The MySQL database has a table called GeometryTable with two geometry columns: GeomPolygon (polygon values) and GeomPoint (point data). I wrote a MySQL query in my Python project which returns the selected points and polygons within a given polygon. The table (GeometryTable) has 9 million rows.
When I run the query in MySQL Workbench, it takes a few seconds, but in the project it takes several minutes to return the values. Can anyone please help me optimize the code? Thanks.
The method is:
def GeometryShapes(polygon):
    geom = 'POLYGON((' + ','.join(['%s %s' % v for v in polygon]) + '))'
    query1 = 'SELECT GeomId, AsText(GeomPolygon)' \
             ' FROM GeometryTable' \
             ' WHERE MBRWithin(GeomPolygon, GeomFromText("%s")) AND length(GeomPolygon) < 10 million' % (geom)
    query2 = 'SELECT GeomId, AsText(GeomPoint)' \
             ' FROM GeometryTable' \
             ' WHERE MBRWithin(GeomPoint, GeomFromText("%s")) AND length(GeomPoint) < 10 million' % (geom)
    if query1:
        cursor = connection.cursor()
        cursor.execute(query1)
        .....
        cursor.close()
    if query2:
        cursor = connection.cursor()
        cursor.execute(query2)
        .......
        cursor.close()
Here, the cursor executions (cursor.execute(query1) and cursor.execute(query2)) take several minutes to complete. I have indexed both columns in the specified table.
Can anyone help me optimize the code?