I have an encoded character buffer of size 512 in C, and a varchar field in a MySQL database. Is it possible to store the encoded buffer in the varchar field?
I have tried this, but only part of the buffer gets stored in the database; the rest is ignored. What is the actual problem, and how do I solve it?
It is not clear what you mean by encoded.
If you mean that you have an arbitrary string of byte values, then varchar is a bad fit because it will attempt to trim trailing spaces. A better choice in such cases is to use varbinary fields.
If the string you are inserting contains control characters, you might be best off converting it to a hex string and inserting it as follows:
create table xx (
    v varbinary(512) not null
);
insert into xx values (0x68656C6C6F20776F726C64);
This will prevent any component in the tool chain from choking on NUL characters and so forth.
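For illustration, here is a minimal C++ sketch of the hex-encoding step (the function name and usage are my own, not from the question): it renders an arbitrary byte buffer as a literal like 0x68656C6C6F that can be spliced into the insert statement.

#include <cstdio>
#include <string>

// Render a byte buffer as a MySQL hex literal, e.g. {0x68, 0x65} -> "0x6865".
std::string toHexLiteral(const unsigned char *buf, size_t len)
{
    std::string out = "0x";
    char hex[3];
    for (size_t i = 0; i < len; ++i) {
        std::snprintf(hex, sizeof hex, "%02X", buf[i]);
        out += hex;
    }
    return out;
}

// Usage: sql = "insert into xx values (" + toHexLiteral(buffer, 512) + ");"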
What size is the varchar column declared as in your table?
Often varchar fields are set to 255, which was the maximum length before MySQL 5.0.3; starting with MySQL 5.0.3 you can have longer varchar fields. (Note that the declared length is measured in characters as of MySQL 4.1, in bytes before that.)
Sounds like you need a varchar(512) field; is that what you have?
See http://dev.mysql.com/doc/refman/5.0/en/char.html
I've got a table that I populate with tab-separated data from files whose encoding doesn't seem to be exactly UTF-8, like so:
CREATE TABLE tab (
    url varchar(2000),
    ...
);
COPY tab
FROM 's3://input.tsv';
After the copy has completed I run
SELECT
MAX(LEN(url))
FROM tab
which returns 1525. I figure, since I'm wasting space, I might as well shrink the column by almost a quarter by using varchar(1525) instead of varchar(2000). But neither redoing the COPY nor setting up a new table and inserting the already imported data works. In both cases I get
error: Value too long for character type
Why won't the column hold these values?
Your file might be in a multi-byte format.
From the LEN Function documentation:
The LEN function returns an integer indicating the number of characters in the input string. The LEN function returns the actual number of characters in multi-byte strings, not the number of bytes. For example, a VARCHAR(12) column is required to store three four-byte Chinese characters. The LEN function will return 3 for that same string.
The extra size of a VARCHAR will not waste disk space due to the compression methods used by Amazon Redshift, but it will waste in-memory buffer space when a block is read from disk and decompressed into memory.
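If you want to see the difference yourself, a small C++ sketch (the sample string is illustrative) shows how a UTF-8 value can have more bytes than characters, by counting code points, i.e. skipping continuation bytes of the form 10xxxxxx:

#include <cstdio>
#include <cstring>

// Count UTF-8 code points: every byte except a continuation byte starts a character.
size_t utf8CodePoints(const char *s)
{
    size_t count = 0;
    for (; *s; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)  // not a 10xxxxxx continuation byte
            ++count;
    return count;
}

int main()
{
    const char *url = "http://example.com/caf\xC3\xA9";  // "café": é takes 2 bytes
    std::printf("bytes: %zu, characters: %zu\n",
                std::strlen(url), utf8CodePoints(url));   // bytes: 24, characters: 23
    return 0;
}

LEN() reports the character count, but the column limit is enforced in bytes, which is why a value LEN() reports as 1525 characters can still overflow a varchar(2000).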
I'm trying to insert characters above ASCII 128 (the characters Ø and Å) using a C++ program. It works fine for ASCII characters below 128.
The data type used in the database is VARCHAR2.
Those characters are inserted into the DB as question marks (????).
If I set the field value in the DB to those characters through Toad and try to read it using the application, it also comes back as question marks (????).
Can someone please give me example code showing how to insert strings that contain those characters (values above 128)?
I think the problem is a character-set conversion. (At the application level, those characters display correctly before the insert. Also, if I set the field value through Toad and read it from the DB, it comes back as question marks. Since I can set the field value through Toad, the DB column itself can clearly hold those characters.)
I'm using the following define and bind calls in my application:
// Define the output buffer for column iPos of the result set,
// fetching it as a null-terminated string (SQLT_STR).
OCIDefineByPos(p_sql, &p_dfn, p_DBCon->p_err, iPos,
               (dvoid *)p->un_DataArray.pzValue, (sword)iSize, SQLT_STR,
               (dvoid *)p->un_FlagArray.pssValue, 0, 0, OCI_DEFAULT);

// Bind the input value to the named placeholder zName, again as SQLT_STR.
OCIBindByName(p_sql, &p_bnd, p_DBCon->p_err, (text *)zName, -1,
              (dvoid *)zValue, iSize, SQLT_STR, 0, 0, 0, 0, 0, OCI_DEFAULT);
Can someone help me?
Or, if you have a sample program that can insert character values up to 255, please share it with me.
WE8MSWIN1252 corresponds to "MS Windows Code Page 1252 8-bit West European".
You have to convert your strings to Windows code page 1252 before inserting them into the DB.
For instance, on Windows, if your strings are in UTF-8, convert them to UTF-16 with MultiByteToWideChar and then to code page 1252 with WideCharToMultiByte.
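A minimal sketch of that round trip, assuming Windows and UTF-8 input (error handling is reduced to returning an empty string; a real program should also decide how to handle characters with no CP1252 mapping):

#include <windows.h>
#include <string>

std::string Utf8ToCp1252(const std::string &utf8)
{
    // First hop: UTF-8 -> UTF-16.
    int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, NULL, 0);
    if (wlen == 0) return std::string();
    std::wstring wide(wlen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &wide[0], wlen);

    // Second hop: UTF-16 -> Windows code page 1252.
    int len = WideCharToMultiByte(1252, 0, wide.c_str(), -1, NULL, 0, NULL, NULL);
    if (len == 0) return std::string();
    std::string cp1252(len, '\0');
    WideCharToMultiByte(1252, 0, wide.c_str(), -1, &cp1252[0], len, NULL, NULL);

    cp1252.resize(len - 1);  // both lengths include the terminating NUL; drop it
    return cp1252;
}

Ø and Å both exist in code page 1252 (0xD8 and 0xC5), so they survive this conversion intact.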
This may be because there ARE no ASCII characters above 128: ASCII is a 7-bit encoding.
To store non-ASCII characters (there is no Ø or Å in ASCII), you'll need to use a different encoding. Most sane applications nowadays use UTF-8.
In a C++ program, I am trying to read data from an MSSQL database using OLE DB. The column I am trying to read is of type VARCHAR. The data in the column was imported from a multi-value database, and sometimes it contains a delimiter: a value marker (0xFD). I convert the data read from the table to a char * like this:
// Convert the fetched column data (assumed UTF-16) to UTF-8.
retcode = WideCharToMultiByte(CP_UTF8, 0, (WCHAR *)pDBColumnAccess[nCol].pData, -1, (char *)pReadBuf, pDBColumnAccess[nCol].cbDataLen, NULL, NULL);
Everything is fine if the data does not contain the delimiter (the 0xFD value marker). But when the delimiter is there, the value marker is replaced by junk characters in the converted data.
Shouldn't I do a conversion to char * in the case of VARCHAR? Is it enough to just copy the data as-is, without any conversion?
WideCharToMultiByte converts from UTF-16, yet there is no such thing as a one-byte 0xFD character in UTF-16: all characters are encoded as at least two bytes. Did you actually mean 0x00FD (or even 0xFD00)?
Also, UTF-8 (your "target" encoding since you specified CP_UTF8) does not guarantee that all characters will be encoded in just one byte.
According to UTF Converter:
UTF-16 00FD converts to UTF-8 C3 BD.
UTF-16 FD00 converts to UTF-8 EF B4 80.
Is that what you are getting?
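You can reproduce those numbers with WideCharToMultiByte itself; a quick sketch, assuming Windows and that the column data really is UTF-16:

#include <windows.h>
#include <cstdio>

int main()
{
    const wchar_t marker[] = { 0x00FD, 0 };  // the value marker as U+00FD
    char out[8] = { 0 };
    int n = WideCharToMultiByte(CP_UTF8, 0, marker, -1, out, sizeof out, NULL, NULL);
    for (int i = 0; i + 1 < n; ++i)               // n includes the terminating NUL
        std::printf("%02X ", (unsigned char)out[i]);  // prints: C3 BD
    return 0;
}

Those two bytes, C3 BD, are likely the "junk characters" replacing the marker: the marker is not lost, it is simply re-encoded.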
We are loading a Fixed width text file into a SAS dataset.
The character we are using to delimit multi valued field values is being interpreted as 2 characters by SAS. This breaks things, because the fields are of a fixed width.
We can use characters that appear on the keyboard, but obviously this isn't as safe, because our data could actually contain those characters.
The character we would like to use is '§'.
I'm guessing this may be an encoding issue, but I don't know what to do about it.
Could you specify the character by its hex code, like DLM='09'x, changing 09 to the right code? '§' is 'A7'x in the Latin-1 code pages, but note that in a UTF-8 session it is encoded as the two bytes C2 A7, which is probably why SAS sees it as two characters.
Is there an ASCII value I can put into a char in C++ that represents nothing? I tried 0, but it ends up corrupting my file so I can't read it.
ASCII 0 is NUL. Other than that, there are no "nothing" characters in traditional ASCII. If appropriate, you could use a control character like SOH (start of heading), STX (start of text), or ETX (end of text); their ASCII values are 1, 2, and 3 respectively.
For the full list of ASCII codes I used for this explanation, see this site.
Sure: use any character value that won't appear in your regular data. This is commonly referred to as a delimited text file. Popular choices of delimiter include spaces, tabs, commas, semicolons, vertical bars, and tildes.
In a C++ source file, '\0' represents a 0 byte. However, C++ strings are usually null-terminated, so '\0' represents the end of the string, which may be what is messing up your file.
If you really want to store a 0 byte in a data file, you need some other encoding. A simplistic one would use another character that doesn't appear in your data (0xFF, for example), or some length/data format, or something similar.
Whatever encoding you choose, the application writing the file and the one reading it need to agree on what the encoding is. And that is a whole new nightmare.
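As one concrete example of such an agreement, here is a sketch of the length/data format mentioned above (length-prefixed records), which avoids delimiters entirely so the payload can contain any byte, including 0. The files must be opened in binary mode ("wb"/"rb"), and writer and reader must share the same byte order for the header:

#include <cstdio>
#include <vector>

// Write one record: a fixed-size length header followed by the raw payload.
void writeRecord(std::FILE *f, const std::vector<unsigned char> &data)
{
    unsigned int len = (unsigned int)data.size();
    std::fwrite(&len, sizeof len, 1, f);
    std::fwrite(data.data(), 1, data.size(), f);
}

// Read one record back; returns false on clean EOF or a short read.
bool readRecord(std::FILE *f, std::vector<unsigned char> &data)
{
    unsigned int len = 0;
    if (std::fread(&len, sizeof len, 1, f) != 1)
        return false;
    data.resize(len);
    return std::fread(data.data(), 1, len, f) == len;
}

Because the reader knows exactly how many bytes follow each header, a 0 byte in the data is never mistaken for a terminator.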
The null character '\0' still takes up a byte.
Does your software recognize the null character as an end-of-file character?
If your software is reading in this file, you can define a placeholder character (one that doesn't occur in your data), but you'll also need to handle that character. Say '*' is your placeholder: you read the character in but don't add it to the structure that stores your data. It still takes up space in the file, but not in your data structure.
Am I answering your question or missing it?
Do you mean a value you can write which won't actually change the file? The answer is no.
Maybe post a little more about what you're trying to accomplish.
It would depend on what kind of file it is and who is parsing it.