I have a program that stores data in an SQLite database. Among other tables in the db, I have one created as follows:
conn.execute("CREATE TABLE {tn} ({cn} {ct})".format(tn=test, cn="STEP_NAME", ct="TEXT"))
The table thus created has several columns. One is added like this:
conn.execute("ALTER TABLE {tn} ADD COLUMN '{cn}' {ct} ".format(tn=test, cn=value, ct="TEXT"))
I'm trying to save data to it, but it's behaving in a way I can't explain. When I save 270113185308874890 to it, it comes back as 270113185308874890 when recalled. However, when I save 89014103258771944209, it comes back as 8.90141032588e+19.
How can I prevent this? I've tried different column types with no luck and really don't understand why it's converting it.
EDIT:
Here is the code I'm using to store it:
def store_result(conn, table_name, row_name, data):
    for k, v in data.iteritems():
        if isinstance(v, str):
            data[k] = v.replace('"', "'").rstrip(' \t\r\n\0')
    keys = data.keys()
    vals = data.values()
    # add test name column for everything but info call
    if table_name != "info":
        keys.insert(0, "STEP_NAME")
        vals.insert(0, str(row_name))
    # Make pretty for sqlite3 and its crazy param rules.
    sql_keys = ','.join(str(v) for v in keys)
    sql_vals = ','.join(str(v) for v in [x if str.isdigit(str(x)) else '"{}"'.format(x) for x in vals])
    # try to write or tell me why not.
    try:
        conn.execute("""INSERT into {table}({sql_keys}) values ({vals})""".format(table=table_name,
                                                                                  sql_keys=sql_keys,
                                                                                  vals=sql_vals))
        conn.commit()
    except Exception as e:
        logging.warn("DB ERROR:{}_{}_{}".format(e, table_name, row_name))
When you print the values after they are returned from the table, the type of the variable that holds them affects both how they're printed and their precision. As an example:
int1 = 270113185308874890
float1 = 270113185308874890.0
int2 = 89014103258771944209
float2 = 89014103258771944209.0

print 'int1 : ' + str(int1)
print 'float1: ' + str(float1)
print ''
print 'int2 : ' + str(int2)
print 'float2: ' + str(float2)
Will print:
int1 : 270113185308874890
float1: 2.70113185309e+17
int2 : 89014103258771944209
float2: 8.90141032588e+19
It seems likely that the column type in your SQLite table is TEXT, as in the example from the SQLite website (https://www.sqlite.org/datatype3.html) quoted below. You should use the typeof() function to verify that your data is actually being stored as TEXT.
Finally, consider using the INTEGER type rather than TEXT in your SQLite table if all of your numbers will be integers. Also, if you are using TEXT to try to preserve precision, make sure you are not limited by the calling code; unless you are working with Python's Decimal type, SQLite's REAL type will match the precision of Python's float type.
2.3 Column Affinity Behavior Example
The following SQL demonstrates how SQLite uses column affinity to do
type conversions when values are inserted into a table.
CREATE TABLE t1(
    t  TEXT,     -- text affinity by rule 2
    nu NUMERIC,  -- numeric affinity by rule 5
    i  INTEGER,  -- integer affinity by rule 1
    r  REAL,     -- real affinity by rule 4
    no BLOB      -- no affinity by rule 3
);
-- Values stored as TEXT, INTEGER, INTEGER, REAL, TEXT.
INSERT INTO t1 VALUES('500.0', '500.0', '500.0', '500.0', '500.0');
SELECT typeof(t), typeof(nu), typeof(i), typeof(r), typeof(no) FROM t1;
text|integer|integer|real|text
-- Values stored as TEXT, INTEGER, INTEGER, REAL, REAL.
DELETE FROM t1;
INSERT INTO t1 VALUES(500.0, 500.0, 500.0, 500.0, 500.0);
SELECT typeof(t), typeof(nu), typeof(i), typeof(r), typeof(no) FROM t1;
text|integer|integer|real|real
-- Values stored as TEXT, INTEGER, INTEGER, REAL, INTEGER.
DELETE FROM t1;
INSERT INTO t1 VALUES(500, 500, 500, 500, 500);
SELECT typeof(t), typeof(nu), typeof(i), typeof(r), typeof(no) FROM t1;
text|integer|integer|real|integer
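As a quick check of that advice, here is a minimal sketch (an illustrative in-memory table, not the asker's schema) showing what typeof() reports, and why binding the value as a string preserves it while formatting it into the SQL unquoted does not:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (STEP_NAME TEXT)")

# Bound as a Python string, the value keeps TEXT affinity and full precision.
conn.execute("INSERT INTO demo (STEP_NAME) VALUES (?)", ("89014103258771944209",))

# Formatted into the SQL unquoted, the literal is parsed as a number first; it
# overflows a 64-bit integer, becomes a REAL, and only then is converted to text.
conn.execute("INSERT INTO demo (STEP_NAME) VALUES (89014103258771944209)")

for row in conn.execute("SELECT STEP_NAME, typeof(STEP_NAME) FROM demo"):
    print(row)
# Expected output (the exact float rendering varies by SQLite version):
# ('89014103258771944209', 'text')
# ('8.90141032588719e+19', 'text')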
I created this function in Python using a pandas DataFrame, and I'd like to use it in Spark as well.
What I'm doing with this function is:
converting the df column to a list (t1)
converting the unique values of the column to a list (t2)
creating a list (t) for each unique value of each feature; this list takes the value 1 where the unique value is present in t1, and 0 otherwise.
in the end, the result is a dictionary with the unique values of each feature as keys, and as values a list with 1 where the key (the unique value) appears and 0 otherwise.
feat_list is just a list with all the column names.
def binary_dict(pandas_df, feat_list):
    dict_feature = dict()
    for col in feat_list:
        t1 = pandas_df[col].tolist()
        t2 = pandas_df[col].unique().tolist()
        for value in t2:
            t = []
            for i in range(0, len(t1)):
                if value == t1[i]:
                    t.append(1)
                else:
                    t.append(0)
            cc = str(col)
            vv = "_" + str(value)
            cv = cc + vv
            dict_feature[cv] = t
    return dict_feature
I tried using
t1 = df.select("col_name").rdd.flatMap(list).collect()
to create t1, but it took over 20 minutes to build the list for a single column, and I have something like 100 columns. Is there a way to convert this function to Spark efficiently?
Thanks everyone for the answers!
PS: I'm using Azure Synapse Analytics, Python 3.8, and PySpark 3.1.
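One direction that may help (a sketch only, not benchmarked on Synapse; it assumes df is your Spark DataFrame and feat_list your list of column names): build all the indicator columns with when/otherwise in a single select, then collect once, instead of collecting each column separately.
from pyspark.sql import functions as F

def binary_dict_spark(df, feat_list):
    exprs = []
    for col in feat_list:
        # one distinct() pass per column, instead of collecting the whole column
        uniques = [row[0] for row in df.select(col).distinct().collect()]
        for value in uniques:
            exprs.append(F.when(F.col(col) == value, 1).otherwise(0)
                          .alias("{}_{}".format(col, value)))
    # a single collect() brings back all indicator columns at once; note that
    # without an explicit sort, Spark does not guarantee row order
    rows = df.select(*exprs).collect()
    names = list(rows[0].asDict()) if rows else []
    return {name: [row[name] for row in rows] for name in names}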
Sorry for this newbie question.
I have a dict like this:
{'id':'1', 'Book':'21', 'Member':'3', 'Title':'Chameleon vol. 2',
'Author':'Jason Bridge'}
I want to convert that dict to:
{'id':1, 'Book':21, 'Member':3, 'Title':'Chameleon vol. 2',
'Author':'Jason Bridge'}
I need to convert only the first 3 values to int.
Thanks in advance
dict1 = {'id':'1', 'Book':'21', 'Member':'3', 'Title':'Chameleon vol. 2', 'Author':'Jason Bridge'}
y_dict = dict(list(dict1.items())[:3])
print(y_dict)  # the first 3 items, whose values will be converted
z_dict = dict(list(dict1.items())[3:])
print(z_dict)  # the remaining items, whose values will not be converted to integer
x_dict = {k: int(v) for k, v in y_dict.items()}
print(x_dict)  # the first 3 items with their values converted to integer
w_dict = {**x_dict, **z_dict}
print(w_dict)  # merge of the converted items and the rest of the dict, intact
w_dict is the result you are looking for. (Note that this relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on.)
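For what it's worth, a more compact sketch that produces the same result in one pass (assuming the three keys to convert are known in advance):
to_int = {'id', 'Book', 'Member'}
w_dict = {k: int(v) if k in to_int else v for k, v in dict1.items()}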
Let's say your dict is stored in a variable called book_data.
What does "first 3 keys" mean?
If the keys are static, you can list them manually:
for key in ['id', 'Book', 'Member']:
    book_data[key] = int(book_data[key])
If the keys can vary, you can slice the items instead:
for key, val in list(book_data.items())[:3]:
    book_data[key] = int(val)
The items() method lets you iterate over keys and values together.
I am trying to take a list of points and query a geospatial database to find all matching rows.
I have a computed SQL statement that looks like this:
cursor = connection.cursor()
cursor.execute(
    '''
    SELECT g.ident
    FROM (VALUES %s) AS v (lon, lat)
    LEFT JOIN customers g
      ON (ST_Within(ST_SetSRID(ST_MakePoint(v.lon, v.lat), %s), g.poly_home));
    ''', [AsIs(formatted_points), SRID]
)
Here is an example of what the formatted_points variable looks like:
(-115.062,38.485), (-96.295,43.771)
So, when that is inserted into the SQL expression, the VALUES expression reads:
(VALUES (-115.062,38.485), (-96.295,43.771)) AS v (lon, lat)
So far so good. However, when the list of points is empty, the VALUES expression looks like this:
(VALUES ) AS v (lon, lat)
...which causes me to get this error:
django.db.utils.ProgrammingError: syntax error at or near ")"
In other words, (VALUES ) is not legal SQL.
Here's the question: how do I represent an empty list using VALUES? I could special-case this and just return an empty list when the function is passed an empty list of points, but that doesn't seem very elegant.
I have looked at the PostgreSQL manual page for VALUES, but I don't understand how to construct an empty VALUES expression.
If you can put your lons and lats in separate arrays, you could use arrays with unnest:
select * from unnest(ARRAY[1, 2, 3]::int[], ARRAY[4, 5, 6]::int[]) as v(lon, lat);
 lon | lat
-----+-----
   1 |   4
   2 |   5
   3 |   6
(3 rows)

select * from unnest(ARRAY[]::int[], ARRAY[]::int[]) as v(lon, lat);
 lon | lat
-----+-----
(0 rows)
You'll have to cast the arrays to the appropriate type (probably not int[]). Postgres will guess the type if the arrays aren't empty, but it will throw an error if they are empty and you don't cast them to a specific type.
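Applied to the original query, a sketch could look like this (assuming psycopg2, which adapts Python lists to PostgreSQL arrays; the float8[] casts are illustrative and are what keep empty lists working):
# `points` is assumed to be the original list of (lon, lat) tuples
lons = [p[0] for p in points]
lats = [p[1] for p in points]
cursor.execute(
    '''
    SELECT g.ident
    FROM unnest(%s::float8[], %s::float8[]) AS v (lon, lat)
    LEFT JOIN customers g
      ON ST_Within(ST_SetSRID(ST_MakePoint(v.lon, v.lat), %s), g.poly_home);
    ''', [lons, lats, SRID]
)
An empty list of points then simply produces zero rows, with no special-casing needed.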
I have a problem when I pass a Python list:
self.notPermited = [2,3]
This is the Python method that calls my stored procedure:
def select_ids_entre_amistades(self, cod_us, ids_not):
    lista = []
    try:
        cursor = self.__cursor.var(cx_Oracle.CURSOR)
        print ids_not
        data = self.__cursor.arrayvar(cx_Oracle.NUMBER, ids_not)
        print data
        l_query = self.__cursor.callproc("SCHEMA.PROC_SELECT_IDS_ENT_AMISTADES", [cursor, cod_us, data])
        lista = l_query[0]
        return lista
    except cx_Oracle.DatabaseError as ex:
        error, = ex.args
        print(error.message)
        return lista
The problem is when I call that procedure using this:
self.select_ids_entre_amistades(int_id,self.notPermited)
I see the following message in the console:
PLS-00306: wrong number or types of arguments in call to 'PROC
In the database I created the array type like this:
CREATE TYPE SCHEMA.ARRAY_ID_FRIENDS AS TABLE OF INT;
The Oracle stored procedure starts like this:
CREATE OR REPLACE PROCEDURE FACEBOOK.PROC_SELECT_IDS_ENT_AMISTADES
(CONSULTA OUT SYS_REFCURSOR,COD_US IN INT, IDS_FRIEND IN SCHEMA.ARRAY_ID_FRIENDS)
I don't know what the problem is. I suspect cx_Oracle.NUMBER is not an integer type, but there is no other numeric type available. Thanks in advance.
Try using a PL/SQL (index-by) array for the procedure's parameter, and then copying its contents into a SQL array inside the procedure; the SQL array is the one you can use in a SQL statement within the procedure. This solved the problem for me on Oracle Database 11g; on 12c you don't need to copy the contents into a SQL array. The code could look like this:
def select_ids_entre_amistades(self, cod_us, ids_not):
    lista = []
    try:
        cursor = self.__cursor.var(cx_Oracle.CURSOR)
        varray = self.__cursor.arrayvar(cx_Oracle.NUMBER, ids_not)
        l_query = self.__cursor.callproc("PACKFACE.P_SELECT_IDBFRIENDS", [cursor, cod_us, varray])
        lista = l_query[0]
        return lista
    except cx_Oracle.DatabaseError as ex:
        error, = ex.args
        self.guardar_errores('dato ' + str(error.message))
        return lista
And the stored procedure like this. First, create a type:
CREATE OR REPLACE TYPE LIST_IDS AS TABLE OF INT;
Second, create the package:
CREATE OR REPLACE PACKAGE PACKFACE IS
    TYPE LISTADO_IDS IS TABLE OF INT INDEX BY PLS_INTEGER;
    PROCEDURE P_SELECT_IDBFRIENDS (CONSULTA OUT SYS_REFCURSOR, COD_US IN INT, IDS_NOT IN LISTADO_IDS);
END;
And finally, create the package body:
CREATE OR REPLACE PACKAGE BODY PACKFACE IS
    PROCEDURE P_SELECT_IDBFRIENDS (CONSULTA OUT SYS_REFCURSOR, COD_US IN INT, IDS_NOT IN LISTADO_IDS)
    IS
        num_array LIST_IDS;
    BEGIN
        num_array := LIST_IDS();
        FOR i IN 1 .. IDS_NOT.count
        LOOP
            num_array.extend(1);
            num_array(i) := IDS_NOT(i);
        END LOOP;
        OPEN CONSULTA FOR
            SELECT * FROM T_TABLE WHERE ID IN (SELECT COLUMN_VALUE FROM TABLE(num_array));
    END;
END;
I hope it helps.
When you look at the cx_Oracle documentation, it says you can create arrays like this:
Cursor.arrayvar(dataType, value[, size])
Create an array variable associated with the cursor of the given type and size and return a variable object (Variable Objects). The value is either an integer specifying the number of elements to allocate or it is a list and the number of elements allocated is drawn from the size of the list. If the value is a list, the variable is also set with the contents of the list. If the size is not specified and the type is a string or binary, 4000 bytes (maximum allowable by Oracle) is allocated. This is needed for passing arrays to PL/SQL (in cases where the list might be empty and the type cannot be determined automatically) or returning arrays from PL/SQL.
You may pass your arrays as long as the array types are compatible with your PL/SQL procedure's parameters. Here is a simple example that creates an array.
>>> myarray=cursor.arrayvar(cx_Oracle.NUMBER,range(0,10))
>>> myarray
<cx_Oracle.NUMBER with value [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]>
Here is a link (it is from 2005, so it may be outdated) showing how to create arrays on the PL/SQL side.
EDIT:
I added a complete example below showing how to pass arrayvar and other variable types. I tested the code with Oracle 10g and Python 2.7. I hope this helps.
from __future__ import print_function
import cx_Oracle as cxo

conn = cxo.connect("<YOUR TNS STRING>")
cursor = conn.cursor()

ref_cursor = cursor.var(cxo.CURSOR)
cod_us = cursor.var(cxo.NUMBER, 10)
ids_friend = cursor.arrayvar(cxo.NUMBER, range(0, 10))
ids_friend_sum = cursor.var(cxo.NUMBER)

cursor.execute('''
    DECLARE
        TYPE REF_CURSOR IS REF CURSOR;
        TYPE ARRAY_ID_FRIENDS IS TABLE OF INT INDEX BY BINARY_INTEGER;

        FUNCTION test(CONSULTA OUT REF_CURSOR,
                      COD_US IN INT,
                      IDS_FRIEND IN ARRAY_ID_FRIENDS) RETURN NUMBER
        IS
            sum_ NUMBER := 0;
        BEGIN
            OPEN CONSULTA FOR SELECT 1 FROM DUAL UNION SELECT 2 FROM DUAL;
            FOR i IN IDS_FRIEND.FIRST .. IDS_FRIEND.LAST LOOP
                sum_ := sum_ + IDS_FRIEND(i);
            END LOOP;
            RETURN sum_;
        END;
    BEGIN
        :ids_friend_sum := test(:ref_cursor, :cod_us, :ids_friend);
    END;
''', {"ref_cursor": ref_cursor, "cod_us": cod_us, "ids_friend": ids_friend,
      "ids_friend_sum": ids_friend_sum})

print("ref cursor=", end=" ")
for rec in ref_cursor.getvalue():
    print(rec, end="\t")
print("\nids_friend_sum=", ids_friend_sum.getvalue())
I have converted grid1 and grid2 into arrays, and I am using the following function, which iterates through the table and should return the corresponding value from the table when the grid1 and grid2 values match. But somehow the final output contains only 4 distinct integer values, which isn't correct. Any suggestion as to what could be wrong here?
def grid(grid1, grid2):
    table = {(10,1):61, (10,2):75, (10,3):83, (10,4):87,
             (11,1):54, (11,2):70, (11,3):80, (11,4):85,
             (12,1):61, (12,2):75, (12,3):83, (12,4):87,
             (13,1):77, (13,2):85, (13,3):90, (13,4):92}
    grid3 = np.zeros(grid1.shape, dtype=np.int)
    for k, v in table.iteritems():
        grid3[[grid1 == k[0]] and [grid2 == k[1]]] = v
    return grid3
I think what's happening is that [grid1 == k[0]] and [grid2 == k[1]] does not do what you expect. Python's `and` between two non-empty lists simply evaluates to the second operand, so the index used for the assignment is only [grid2 == k[1]], and the grid1 condition is ignored entirely. Since k[1] only takes the values 1 through 4, each of those four masks is overwritten by whichever key with that second component happens to be visited last, which is exactly why you end up with only 4 distinct values. Use the elementwise & operator on the two boolean arrays instead, with parentheses around each comparison, because & binds more tightly than ==.
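A corrected sketch (assuming grid1 and grid2 are equal-shape NumPy integer arrays; it also uses plain int instead of np.int, which was removed in NumPy 1.24, and items() so it runs on Python 3):
import numpy as np

def grid(grid1, grid2):
    table = {(10, 1): 61, (10, 2): 75, (10, 3): 83, (10, 4): 87,
             (11, 1): 54, (11, 2): 70, (11, 3): 80, (11, 4): 85,
             (12, 1): 61, (12, 2): 75, (12, 3): 83, (12, 4): 87,
             (13, 1): 77, (13, 2): 85, (13, 3): 90, (13, 4): 92}
    grid3 = np.zeros(grid1.shape, dtype=int)
    for (a, b), v in table.items():
        # elementwise AND of both boolean masks, not Python's `and`
        grid3[(grid1 == a) & (grid2 == b)] = v
    return grid3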