MongoDB C++ string encoding error on accents when inserting a JSON string

I have a problem when I insert a JSON string into MongoDB from a C++ function. I am basically creating a big std::string formatted as JSON and putting my values into it.
Some of the strings in the data I put into the JSON contain accents, and I get an error when I try to view the document correctly in the DB afterwards.
This is my update/insert code:
mongodb_client_connector.update
(
    mongodb_database + "." + MONGODB_COLLECTION,
    Query(BSON(MONGODB_ID << OID(param_oid))),
    fromjson(The_JSON_I_Wrote)
);
The result is that the accented characters do not come through correctly.
How do I format the string correctly so I get the accents?
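BSON stores every string as UTF-8, so the fromjson() call will reject or garble accented characters that are still in a single-byte encoding such as Latin-1. A minimal sketch of a fix, assuming the input really is Latin-1 (latin1_to_utf8 is a hypothetical helper written for this answer, not part of the driver):
#include <string>

std::string latin1_to_utf8(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c); // ASCII passes through unchanged
        } else {
            // a Latin-1 code point above 0x7F becomes two UTF-8 bytes
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// then:
fromjson(latin1_to_utf8(The_JSON_I_Wrote))
If the source strings are already UTF-8 (for example, read from a UTF-8 encoded file), the conversion is unnecessary and the garbling is more likely on the viewing side.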

Related

Format timestamp inside a set-column sentence

I'm developing a Data Fusion pipeline. It contains a Wrangler node where I'm trying to create a new field that will contain the system date in timestamp format (yyyy-MM-dd'T'HH-mm-ss).
I've tried using the sentence:
set-column :sysdate (${logicalStartTime(yyyy-MM-dd'T'HH-mm-ss)})
But I receive the error:
Caused by: io.cdap.wrangler.api.DirectiveParseException: Error encountered while parsing 'set-column' : Error encountered while compiling '( 2022 -12-01T16-29-32 ) ' at line '1' and column '14'. Make sure a valid jexl transformation is provided.
Which would be the correct sentence?
I've tried:
set-column :sysdate (${logicalStartTime(yyyy-MM-ddHH-mm-ss)})
which results in something like "1877", because it subtracts the numbers. I also tried:
set-column :sysdate (${logicalStartTime(yyyyMMddHHmmss)})
but the format isn't correct and can only be written if the field is a String.
You have the correct method, just incorrect syntax. The syntax you are looking for is
set-column :sysdate ${logicalStartTime(yyyy-MM-dd'T'HH-mm-ss)}
that is, you have to remove the outer (). You can then convert the string into a datetime with this directive:
parse-as-datetime :sysdate "yyyy-MM-dd'T'HH-mm-ss"

Storing UTF-8 data in a text field in SQLite results in BLOB content

I'm trying to save some form data that is in UTF-8 format in an SQLite database. I'm binding the input using
int bindText(int index, const char *text, void func(void*)) {
return sqlite3_bind_text(comp_stat, index, text, -1, func);
}
and the compiled statement is a simple INSERT query.
"INSERT INTO `tbl` (`id`,...`,`title`) VALUES (?,...,?);
and the CREATE statement is
CREATE TABLE `tbl` (
`id` INTEGER NOT NULL,
...,
`title` TEXT
);
The query runs and everything is fine, except that if the input string is a non-ASCII UTF-8 string, I can't retrieve it afterwards, and DB Browser for SQLite shows that field as BLOB rather than TEXT. I also hopelessly tried sqlite3_bind_text16, as someone suggested; the result was of course not OK either, but this time it was garbled text in a completely different encoding, and the field was TEXT again, not BLOB.
If I use simple abcd characters everything works fine, so I guess there is no fundamental problem in my queries. Any help is appreciated.
OK. There was a problem in my tests, and now I have much more accurate info!
This works:
cs->bindText ( 12, u8"سلام", sfree);
but this fails (BLOB):
cs->bindText ( 12, (std::string(u8"سلام")).c_str(), sfree);
What is my problem? Thanks again!
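One plausible explanation, offered as an assumption rather than a verified diagnosis: std::string(u8"سلام") is a temporary that is destroyed at the end of the full expression. sqlite3_bind_text only copies the buffer immediately when the destructor argument is SQLITE_TRANSIENT; with any other destructor it may keep the pointer and read it later, by which time it dangles, and the garbage bytes are no longer valid UTF-8, which DB Browser for SQLite tends to display as BLOB. The u8"..." literal works because a string literal has static storage duration, so its pointer never dangles. A minimal sketch of a safer wrapper:
#include <sqlite3.h>
#include <string>

int bindText(int index, const std::string &text)
{
    // comp_stat is the compiled statement from the question.
    // SQLITE_TRANSIENT makes sqlite3_bind_text copy the bytes before
    // this call returns, so binding a temporary string is safe.
    return sqlite3_bind_text(comp_stat, index, text.c_str(),
                             static_cast<int>(text.size()), SQLITE_TRANSIENT);
}

// usage:
cs->bindText(12, std::string(u8"سلام"));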

How to avoid conversion to ASCII when reading

I'm using Python to read values from SQL Server (pypyodbc) and insert them into PostgreSQL (psycopg2).
A value in the NAME field has come up that is causing errors:
Montaño
The value exists in my MSSQL database just fine (SQL_Latin1_General_CP1_CI_AS collation), and can be inserted into my PostgreSQL database (UTF8) just fine using pgAdmin and an INSERT statement.
The problem is selecting it using python causes the value to be converted to:
Monta\xf1o
(0xf1 is the Latin-1 encoding of 'Latin small letter n with tilde')
...which is causing the following error to be thrown when trying to insert into PostgreSQL:
invalid byte sequence for encoding "UTF8": 0xf1 0x6f 0x20 0x20
Is there any way to avoid the conversion of the input string to the string that is causing the error above?
Under Python 2 you actually do want to perform a conversion from a byte string to the unicode type. So, if your code looks something like
sql = """\
SELECT NAME FROM dbo.latin1test WHERE ID=1
"""
mssql_crsr.execute(sql)
row = mssql_crsr.fetchone()
name = row[0]
then you probably want to convert the basic latin1 string (retrieved from SQL Server) to the type unicode before using it as a parameter to the PostgreSQL INSERT, i.e., instead of
name = row[0]
you would do
name = unicode(row[0], 'latin1')

Insert JSON format in MySQL query using C++

I am using JSON format to save data in my C++ program, and I want to send it to a MySQL database (the table tab has one column of type TEXT), but the query fails (I also tested VARCHAR and CHAR).
This is the relevant part of the code, since we are not interested in the rest:
string json_example = "{\"array\":[\"item1\",\"item2\"], \"not an array\": \"asdf\"}";
mysql_init(&mysql); //initialize database connection
string player="INSERT INTO tab values (\"";
player+= json_example;
player += "\")";
connection = mysql_real_connect(&mysql,HOST,USER,PASSWD,DB,0,NULL,0);
// save data to database
query_state=mysql_query(connection, player.c_str()); // use player.c_str()
To show the final query that will be used, cout << player gives:
INSERT INTO tab values ("{"array":["item1","item2"], "not an
array": "asdf"}")
Using, for example, string json_example = "some text"; works, but with the JSON content it does not. Maybe the problem comes from the use of curly brackets {} or double quotes "", but I haven't found a way to solve it.
I'm using:
mysql Ver 14.14 Distrib 5.5.44, for debian-linux-gnu (armv7l) under Raspberry Pi 2
Any help will be appreciated, thanks.
Use a prepared statement. See prepared statements documentation in the MySQL reference manual.
Prepared statements are more correct, safer, possibly faster, and keep your code cleaner. You get all those benefits and don't need to escape anything. There is hardly a reason not to use them.
Something like this might work. But take it with a grain of salt, because I have not tested or compiled it. It should just give you the general idea:
MYSQL_STMT* const statement = mysql_stmt_init(&mysql);
std::string const query = "INSERT INTO tab VALUES(?)";
// mysql_stmt_prepare takes a const char*, not a std::string
mysql_stmt_prepare(statement, query.c_str(), query.size());
MYSQL_BIND bind[1] = {};
bind[0].buffer_type = MYSQL_TYPE_STRING;
// MYSQL_BIND::buffer is a non-const void*, hence the const_cast
bind[0].buffer = const_cast<char*>(json_example.c_str());
bind[0].buffer_length = json_example.size();
mysql_stmt_bind_param(statement, bind);
mysql_stmt_execute(statement);
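A hedged aside: every mysql_stmt_* call above returns nonzero on failure, so a real program would check the results; for example, the bare execute at the end could become:
// report a failure through the statement handle, then release it
if (mysql_stmt_execute(statement) != 0)
    fprintf(stderr, "insert failed: %s\n", mysql_stmt_error(statement));
mysql_stmt_close(statement);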

Django DB insert incorrect string value

I am parsing a CSV file (created in Windows) and trying to populate a database table using a model I've created.
I am getting this error:
pl = PriceList.objects.create(code=row[0], description=row[1],.........
Incorrect string value: '\xD0h:NAT...' for column 'description' at row 1
My table and the description field use UTF-8 encoding and utf8_general_ci collation.
The actual value I am trying to insert is this:
HOUSING:PS-187:1g\xd0h:NATURAL CO
I am not aware of any string processing I should do to get over this error.
I think I used a simple Python script before to populate the database using conn.escape_string() and it worked (if that helps).
Thanks
I've had trouble with the CSV reader and unicode before as well. In my case using the following got me past the errors.
From http://docs.python.org/library/csv.html
The csv module doesn’t directly support reading and writing Unicode, ... unicode_csv_reader() below is a generator that wraps csv.reader to handle Unicode CSV data (a list of Unicode strings). utf_8_encoder() is a generator that encodes the Unicode strings as UTF-8, one string (or row) at a time. The encoded strings are parsed by the CSV reader, and unicode_csv_reader() decodes the UTF-8-encoded cells back into Unicode:
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')