I'm using Python to read values from SQL Server (pypyodbc) and insert them into PostgreSQL (psycopg2)
A value in the NAME field has come up that is causing errors:
Montaño
The value exists in my MSSQL database just fine (SQL_Latin1_General_CP1_CI_AS collation), and can be inserted into my PostgreSQL database (UTF8) just fine using pgAdmin and an INSERT statement.
The problem is that selecting it using Python causes the value to be converted to:
Monta\xf1o
(0xf1 is the Latin-1 code point for 'Latin small letter n with tilde')
...which is causing the following error to be thrown when trying to insert into PostgreSQL:
invalid byte sequence for encoding "UTF8": 0xf1 0x6f 0x20 0x20
Is there any way to avoid the conversion of the input string to the string that is causing the error above?
Under Python 2 you actually do want to perform a conversion from a basic string to the unicode type. So, if your code looks something like
sql = """\
SELECT NAME FROM dbo.latin1test WHERE ID=1
"""
mssql_crsr.execute(sql)
row = mssql_crsr.fetchone()
name = row[0]
then you probably want to convert the basic latin1 string (retrieved from SQL Server) to the type unicode before using it as a parameter to the PostgreSQL INSERT, i.e., instead of
name = row[0]
you would do
name = unicode(row[0], 'latin1')
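For completeness, here is a minimal sketch of the whole round trip under Python 2; the PostgreSQL connection/cursor names (pg_conn, pg_crsr) and the target table are placeholders, and psycopg2's parameter binding takes care of encoding the unicode value as UTF-8:
# Python 2 sketch: decode the latin1 bytes from SQL Server into unicode,
# then let psycopg2 adapt the unicode value in a parameterized INSERT.
mssql_crsr.execute("SELECT NAME FROM dbo.latin1test WHERE ID=1")
row = mssql_crsr.fetchone()
name = unicode(row[0], 'latin1')   # u'Monta\xf1o'

# pg_conn / pg_crsr and the target table are hypothetical placeholders.
pg_crsr.execute("INSERT INTO person (name) VALUES (%s)", (name,))
pg_conn.commit()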
Related
void UpdateRecords(PGconn *conn, std::string &records)
{
    if (!records.empty())
    {
        std::string sql;
        sql.append("INSERT INTO data.record VALUES ");
        sql.append(records);

        PGresult *res = PQexec(conn, sql.c_str());
        // The result status should be checked (PQresultStatus), and the
        // result must always be freed to avoid leaking it:
        PQclear(res);
    }
}
Here records contains comma-separated data for each column. If the SQL string contains any special character, such as µ or m³, the record is not inserted into the database (the query fails).
Error message ERROR: invalid byte sequence for encoding "UTF8": 0xb5
Database Version 9.6.12
Database encoding: UTF8
The database is expecting you to send UTF8, but you are sending something else, probably an extended encoding such as LATIN1.
You can fix this by first doing a set client_encoding to latin1 on your connection, so the database knows what encoding you are sending to it.
You could also change records so that it actually contains UTF8 characters, but that seems harder. Or at least, I don't know how to make C++ do that off the top of my head.
I am attempting to copy data into Redshift from an S3 bucket; however, I am getting error 1204, 'char length exceeds DDL length'.
copy table_name from '[data source]'
access_key_id '[access key]'
secret_access_key '[secret access key]'
region 'us-east-1'
null as 'NA'
delimiter ','
removequotes;
The error occurs in the very first row, where it tries to put the state abbreviation 'GA' into the data_state column which is defined with the data type char(2). When I query the stl_load_errors table I get the following result:
line_number colname col_length type raw_field_value err_code err_reason
1 data_state 2 char GA 1204 Char length exceeds DDL length
As far as I can tell that shouldn't exceed the length as it is two characters and it is set to char(2). Does anyone know what could be causing this?
Got it to work by changing the data type to char(3) instead; however, I'm still not sure why char(2) wouldn't work.
Mine did this as well, for a state column too. Redshift defaults char to char(1), so I had to specify char(2). Are you sure it didn't default back to char(1)? Because mine did.
Open the file up with a hex editor, or use an online one here, and look at the GA value in the data_state column.
If it has three dots before it like so:
...GA
then the file (or whatever originally created it) was UTF-8-BOM, not just UTF-8: those three leading bytes (EF BB BF) are the byte order mark, and they get counted as part of the first field.
You can open the file in something like Notepad++, go to Encoding in the top bar, then select Convert to UTF-8.
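If you'd rather strip the BOM programmatically, here is a minimal Python sketch (the file names are made up); the 'utf-8-sig' codec silently drops a leading BOM if one is present:
import io

# Hypothetical file names: rewrite the CSV without the UTF-8 BOM.
with io.open('states_with_bom.csv', 'r', encoding='utf-8-sig') as src:
    data = src.read()
with io.open('states_without_bom.csv', 'w', encoding='utf-8') as dst:
    dst.write(data)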
I have a list of tuples like below -
[(float('inf'), 1.0), (270, 0.9002), (0, 0.0)]
I am looking for a simple serializer/deserializer that helps me store this tuple in a jsonb field in PostgreSQL.
I tried using JSONEncoder().encode(a_math_function), but it didn't help.
I am facing the following error while attempting to store the above list in the jsonb field -
django.db.utils.DataError: invalid input syntax for type json
LINE 1: ...", "a_math_function", "last_updated") VALUES (1, '[[Infinit...
DETAIL: Token "Infinity" is invalid.
Note: the field a_math_function is of type JSONField()
t=# select 'Infinity'::float;
float8
----------
Infinity
(1 row)
because
https://www.postgresql.org/docs/current/static/datatype-numeric.html#DATATYPE-FLOAT
In addition to ordinary numeric values, the floating-point types have
several special values:
Infinity
-Infinity
NaN
yet JSON has no such possible value (unless it's a string)
https://www.json.org/
value
string
number
object
array
true
false
null
thus:
t=# select '{"k":Infinity}'::json;
ERROR: invalid input syntax for type json
LINE 1: select '{"k":Infinity}'::json;
^
DETAIL: Token "Infinity" is invalid.
CONTEXT: JSON data, line 1: {"k":Infinity...
Time: 19.059 ms
so it's not a Django or Postgres limitation - Infinity is just an invalid JSON token, yet 'Infinity' is a valid string. So
t=# select '{"k":"Infinity"}'::json;
json
------------------
{"k":"Infinity"}
(1 row)
works... But Infinity here is "just a word". Of course you can save it as a string, not as a numeric value, and check every string to see whether it equals "Infinity"; if it does, have your program logic treat it as real Infinity... But in short, you can't do it, because the JSON specification does not support it... Same as you can't store, let's say, red #ff0000 as a colour in JSON - only as a string, to be caught and processed by your engine...
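As an illustration of that string round-trip, here is a minimal sketch (the helper functions are made up, not anything Django or psycopg2 provide):
import json
import math

def encode_value(v):
    # Map the special floats JSON cannot represent onto plain strings.
    if isinstance(v, float):
        if math.isinf(v):
            return "Infinity" if v > 0 else "-Infinity"
        if math.isnan(v):
            return "NaN"
    return v

def decode_value(v):
    # Reverse the mapping when reading a scalar value back.
    specials = {"Infinity": float("inf"), "-Infinity": float("-inf"), "NaN": float("nan")}
    return specials.get(v, v)

data = [(float("inf"), 1.0), (270, 0.9002), (0, 0.0)]
payload = json.dumps([[encode_value(x) for x in pair] for pair in data])
# payload == '[["Infinity", 1.0], [270, 0.9002], [0, 0.0]]' - valid JSON for the jsonb column
restored = [tuple(decode_value(x) for x in pair) for pair in json.loads(payload)]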
update:
Postgres will cast the float to text itself on to_json:
t=# select to_json(sub) from (select 'Infinity'::float) sub;
to_json
-----------------------
{"float8":"Infinity"}
(1 row)
update
https://www.postgresql.org/docs/current/static/datatype-json.html
When converting textual JSON input into jsonb, the primitive types
described by RFC 7159 are effectively mapped onto native PostgreSQL
types
...
number → numeric: NaN and infinity values are disallowed
Hi guys, I am having a problem with inserting a UTF-8 Unicode character into my database.
The Unicode value that I get from my form is u'AJDUK MARKO\u010d'. The next step is to encode it to UTF-8 with value.encode('utf-8'), which gives me the string 'AJDUK MARKO\xc4\x8d'.
When I try to update the database (it works the same for insert, by the way):
cur.execute( "UPDATE res_partner set %s = '%s' where id = %s;"%(columns, value, remote_partner_id))
The value gets inserted or updated in the database, but the problem is that it ends up in exactly the same format, AJDUK MARKO\xc4\x8d, and of course I want AJDUK MARKOČ. The database has UTF-8 encoding, so it is not that.
What am I doing wrong? Surprisingly, I couldn't really find anything useful on the forums.
\xc4\x8d is the UTF-8 encoded representation of č (and \xc4\x8c of Č). It looks like the insert has worked, but you're not printing the result correctly, probably by printing the whole row as a list, i.e.:
>>> print "Č"
"Č"
>>> print ["Č"] # a list with one string
['\xc4\x8c']
We need to see more code to validate this (it's always a good idea to give as much reproducible code as possible).
You could decode the result (result.decode("utf-8")), but you should avoid manually encoding or decoding. Psycopg2 already allows you to send Unicode strings, so you can do the following without encoding first:
cur.execute( u"UPDATE res_partner set %s = '%s' where id = %s;" % (columns, value, remote_partner_id))
- note the leading u
Psycopg2 can return Unicodes too by having strings automatically decoded:
import psycopg2
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
Edit:
SQL values should be passed as an argument to .execute(). See the big red box at: http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters
Instead, for example:
# Replace the columns field first.
# Strictly we should use http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql
sql = u"UPDATE res_partner set {} = %s where id = %s;".format(columns)
cur.execute(sql, (value, remote_partner_id))
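Putting those two pieces together, a minimal end-to-end sketch might look like this (the DSN, the name column, and the id are placeholders, not values from the question):
import psycopg2
import psycopg2.extensions

# Return unicode strings from SELECTs automatically.
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

conn = psycopg2.connect("dbname=mydb user=me")   # placeholder DSN
cur = conn.cursor()

value = u'AJDUK MARKO\u010d'
# The column name is interpolated (ideally via psycopg2.sql);
# the value and id are passed as query parameters.
cur.execute(u"UPDATE res_partner SET name = %s WHERE id = %s;", (value, 42))
conn.commit()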
I am parsing a CSV file (created in Windows) and trying to populate a database table using a model I've created.
I am getting this error:
pl = PriceList.objects.create(code=row[0], description=row[1],.........
Incorrect string value: '\xD0h:NAT...' for column 'description' at row 1
My table and the description field use UTF-8 encoding and utf8_general_ci collation.
The actual value I am trying to insert is this:
HOUSING:PS-187:1g\xd0h:NATURAL CO
I am not aware of any string processing I should do to get over this error.
I think I used a simple Python script before to populate the database using conn.escape_string() and it worked (if that helps).
Thanks
I've had trouble with the CSV reader and unicode before as well. In my case using the following got me past the errors.
From http://docs.python.org/library/csv.html
The csv module doesn’t directly support reading and writing Unicode, ...
unicode_csv_reader() below is a generator that wraps csv.reader to handle Unicode CSV data (a list of Unicode strings). utf_8_encoder() is a generator that encodes the Unicode strings as UTF-8, one string (or row) at a time. The encoded strings are parsed by the CSV reader, and unicode_csv_reader() decodes the UTF-8-encoded cells back into Unicode:
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
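A hypothetical way to use it for the price list above (assuming the Windows-created file is actually cp1252-encoded; adjust the source encoding to whatever the file really uses):
import codecs

# codecs.open yields unicode lines, which is what unicode_csv_reader expects.
with codecs.open('pricelist.csv', 'r', encoding='cp1252') as f:
    for row in unicode_csv_reader(f):
        PriceList.objects.create(code=row[0], description=row[1])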