What is the correct method to insert UTF-8 data into an OpenLDAP database? I have data in a std::wstring, which was converted from Latin1 with:
std::wstring converted = boost::locale::conv::to_utf<wchar_t>(line, "Latin1");
When the string needs to be added to an LDAPMod structure, I use this function:
std::string str8(const std::wstring& s) {
    return boost::locale::conv::utf_to_utf<char>(s);
}
to convert from wstring to string. This is used in my function to create an LDAPMod:
LDAPMod ** y::ldap::server::createMods(dataset& values) {
    LDAPMod ** mods = new LDAPMod*[values.elms() + 1];
    mods[values.elms()] = NULL;    // the mods array itself is NULL-terminated

    for(int i = 0; i < values.elms(); i++) {
        mods[i] = new LDAPMod;
        data & d = values.get(i);

        switch (d.getType()) {
            case NEW   : mods[i]->mod_op = 0;                break;
            case ADD   : mods[i]->mod_op = LDAP_MOD_ADD;     break;
            case MODIFY: mods[i]->mod_op = LDAP_MOD_REPLACE; break;
            case DELETE: mods[i]->mod_op = LDAP_MOD_DELETE;  break;
            default    : assert(false);
        }

        // attribute name (mod_type) as a NUL-terminated UTF-8 C string
        std::string type = str8(d.getValue(L"type"));
        mods[i]->mod_type = new char[type.size() + 1];
        std::copy(type.begin(), type.end(), mods[i]->mod_type);
        mods[i]->mod_type[type.size()] = '\0';

        // attribute values as a NULL-terminated array of UTF-8 C strings
        mods[i]->mod_vals.modv_strvals = new char*[d.elms(L"values") + 1];
        for(int j = 0; j < d.elms(L"values"); j++) {
            std::string value = str8(d.getValue(L"values", j));
            mods[i]->mod_vals.modv_strvals[j] = new char[value.size() + 1];
            std::copy(value.begin(), value.end(), mods[i]->mod_vals.modv_strvals[j]);
            mods[i]->mod_vals.modv_strvals[j][value.size()] = '\0';
        }
        mods[i]->mod_vals.modv_strvals[d.elms(L"values")] = NULL;
    }
    return mods;
}
The resulting LDAPMod array is passed to ldap_modify_ext_s and works as long as I only use ASCII characters. But if other characters are present in the string, I get an LDAP operations error.
I've also tried this with the function provided by the LDAP library (ldap_x_wcs_to_utf8s), but the result is the same as with the Boost conversion.
It's not the conversion itself that is wrong, because if I convert the modifications back to a std::wstring and show it in my program output, the encoding is still correct.
AFAIK OpenLDAP has supported UTF-8 for a long time, so I wonder if there's something else that must be done before this works?
I've looked into the OpenLDAP client/tools examples, but the UTF-8 functions provided by the library are never used in there.
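For completeness, the call site looks roughly like this (a sketch; the connection handle ld and the dn variable are assumptions, not copied from the project):

// Sketch of the call site, assuming an already-bound LDAP* handle `ld`
// and a std::string `dn`; errors are reported via ldap_err2string().
LDAPMod **mods = createMods(values);
int rc = ldap_modify_ext_s(ld, dn.c_str(), mods, nullptr, nullptr);
if (rc != LDAP_SUCCESS) {
    std::cerr << "ldap_modify_ext_s failed: " << ldap_err2string(rc) << std::endl;
}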
Update:
I noticed I can insert UTF-8 characters like é into LDAP with Apache Directory Studio, and I can retrieve these values from LDAP in my C++ program. But if I insert the same character again, without changing anything in that string, I get the LDAP operations error again.
It turns out that my code was not wrong at all. My modifications tried to store the full name in the 'displayName' field as well as in 'gecos'. But apparently 'gecos' cannot handle UTF-8 data.
We don't actually use gecos anymore. The value was only present because of some software we used years ago, so I removed it from the directory.
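For reference, the reason is the attribute syntax: in the stock nis.schema, gecos is declared with the IA5String (ASCII-only) syntax, while displayName uses the UTF-8 Directory String syntax. Roughly (quoted from memory, so check the schema files shipped with your own OpenLDAP):

# Syntax 1.3.6.1.4.1.1466.115.121.1.26 is IA5String (ASCII only);
# displayName instead uses 1.3.6.1.4.1.1466.115.121.1.15 (Directory String, UTF-8).
attributetype ( 1.3.6.1.1.1.1.2 NAME 'gecos'
    DESC 'The GECOS field; the common name'
    EQUALITY caseIgnoreIA5Match
    SUBSTR caseIgnoreIA5SubstringsMatch
    SYNTAX 1.3.6.1.4.1.1466.115.121.1.26 SINGLE-VALUE )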
What made it hard to find was that even though the loglevel was set to 'parse', this error was still not in the logs.
Because libldap can be such a hard nut to crack, I'll include a link to the complete code of the project I'm working on. It might serve as a starting point for other programmers. (Most of the code in tutorials I have found is outdated.)
https://github.com/yvanvds/yATools/tree/master/libadmintools/ldap
Related
I am trying to concatenate an array of strings into a character array, but one of the strings is in a foreign language (which is why I need UTF-8). I can see the UTF-8 string in its appropriate language in the debugger (Visual Studio) after I read it from the database and put it into the wxString array, but when I try to concatenate the string to the array, it never gets put in there.
I have tried variable.mb_str() and variable.mb_str().data(). Neither seems to work in the strcat for my language data. The other data is concatenated fine. All of the data comes from a MariaDB database call.
int i, numRows;
wxString query;
wxString sortby;
wxString group_list;
wxString *stringGroups;
char holdString[400];

/* Try UTF Force */
query.Printf(_("set names 'utf8'"));
mysql_query(mDb, query.mb_str());
result = mysql_store_result(mDb);
mysql_free_result(result);

query.Printf(_("select GROUP_NAME from USER_PERMS where USER_NAME = \"%s\" ORDER BY GROUP_NAME "),
             riv_getuser().c_str());
mysql_query(mDb, query.mb_str());
result = mysql_store_result(mDb);
numRows = mysql_num_rows(result);
stringGroups = new wxString[numRows + 1];

i = 0;
while ((row = mysql_fetch_row(result)))
{
    stringGroups[i] = wxString(row[0], wxConvUTF8);
    i++;
}
mysql_free_result(result);

i = 0;
strcpy(holdString, "IN (\'");
while (i < numRows)
{
    if (i != 0) strcat(holdString, "\', \'");
    strcat(holdString, (const char *)stringGroups[i].mb_str().data());
    i++;
}
strcat(holdString, " \')");
-- END OF CODE --
--ACTUAL stringGroup that fails -- Debugger Watch Output
stringGroups[2] {m_impl=L"文字化け"...
I expect to get:
IN ( 'test' , 'test' , '文字化け' )
what I get
IN ( 'test','test2','' )
Don't use strcpy() and strcat() with wxString, this is just needlessly error-prone. If you use wxString in the first place, build the entire string you need and then use the utf8_str() method to get a buffer containing the UTF-8 string contents, which you can then pass to whatever function you need.
Do keep in mind that this buffer is temporary, so you can't rely on it continuing to exist if you don't make a copy of it or at least extend its lifetime, i.e.
auto const& buf = some_wx_string.utf8_str();
// ... now you can use buf.data() safely until the end of scope ...
To get UTF8 from wxString you need to call ToUTF8(). Similarly, for getting UTF8 into wxString there is FromUTF8(). Both are members of wxString and documented.
wxString::mb_str() converts to a multi-byte string in your current locale. Presumably the characters in your string aren't representable in your locale so the conversion fails and an empty string is returned.
You should pass wxConvUTF8 as a parameter or simply call utf8_str or ToUTF8 instead.
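Applied to the loop from the question, a sketch along these lines (reusing the question's stringGroups, numRows and mDb names; not tested against the original project) avoids strcat entirely:

// Build the whole clause as a wxString, then convert to UTF-8 once at the end.
wxString clause = wxT("IN ('");
for (int i = 0; i < numRows; i++)
{
    if (i != 0)
        clause += wxT("', '");
    clause += stringGroups[i];
}
clause += wxT("')");

// utf8_str() returns a temporary buffer; keep it in a named variable while using data().
const wxScopedCharBuffer utf8 = clause.utf8_str();
mysql_query(mDb, utf8.data());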
I am doing an IoT-based project on displaying data on a connected display (I've used the MAX7219 module in this case) with the help of a NodeMCU. The idea is that the string stored in my Firebase database is to be displayed on the LED display.
I've had no trouble getting the value from the database to my NodeMCU, but there is this little problem with converting that string to a char array, since the code I am using (Max72xx_Message_serial, which was available as an example with the max72xx library) uses a char array, but I can only fetch the stored data in String format. I've modified that code to connect with Firebase, but the main issue is converting the String fetched from the database to a char array.
I tried toCharArray() but it still shows a conversion error.
void readfromfirebase(void)
{
    static uint8_t putIndex = 0;
    int n = 1;

    while (Firebase.available())
    {
        newMessage[putIndex] = (char)Firebase.getString("Submit Message"); // this line produces the error

        if ((newMessage[putIndex] == '\n') || (putIndex >= BUF_SIZE - 3)) // end of message character or full buffer
        {
            // put in a message separator and end the string
            newMessage[putIndex++] = ' ';
            newMessage[putIndex] = '\0';
            // restart the index for next filling spree and flag we have a message waiting
            putIndex = 0;
            newMessageAvailable = true;
        }
        else if (newMessage[putIndex] != '\r')
        {
            // Just save the next char in next location
            putIndex++;
        }
        n++;
    }
}
I think you are confusing the types.
getString returns a String object which can be converted to a char[] using the methods of the String class.
I assume your newMessage is of type char[] or char*.
Then I would advise you to go for the String.c_str() method, because it returns a C-style null-terminated string, meaning a char*.
See https://www.arduino.cc/reference/en/language/variables/data-types/string/functions/c_str/ for reference.
It also sets the last character of the string to 0, so methods like strlen, strcmp etc. will work.
Be careful not to modify the array returned by c_str(); if you want to modify it you should copy the char[] or use string.toCharArray(buf, len).
Your code might then look like the following.
String msg = Firebase.getString("Submit Message");
newMessage = msg.c_str();
// rest of your code
If newMessage is a buffer storing multiple messages, meaning char* newMessage[3].
String msg = Firebase.getString("Submit Message");
newMessage[putIndex] = msg.c_str();
// rest of your code
Be careful, because you are storing multiple characters in an array, so use strcmp to compare these arrays!
If you are new to C I would recommend reading:
https://www.cprogramming.com/tutorial/c/lesson9.html
https://www.arduino.cc/reference/en/language/variables/data-types/stringobject/ (as pointed out by #gre_gor)
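If newMessage is actually a fixed char buffer (e.g. char newMessage[BUF_SIZE], as the original loop suggests), a copying variant is safer than keeping the pointer returned by c_str(). A rough sketch; the length clamping is my assumption, so adapt it to how the rest of the display code consumes newMessage:

String msg = Firebase.getString("Submit Message");

// Clamp to the buffer size, leaving room for the separator and terminator.
unsigned int len = msg.length();
if (len > BUF_SIZE - 3)
    len = BUF_SIZE - 3;

// toCharArray() copies the characters into the buffer and appends a '\0'.
msg.toCharArray(newMessage, len + 1);
newMessageAvailable = true;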
This question probably has an easy answer. I believe I have some working code, but I feel like there is a much better solution.
In any case, this is the problem I'm having:
Basically a user enters some data into a terminal, and my program is monitoring the terminal. It's important to know that I'm not using cin, which I know can easily be manipulated to interpret incoming data as hexadecimal. Instead, my program uses an adapted version of conios.h for Linux and uses kbhit(). All user input is stored as a std::string until the user decides to submit it.
At that point, I have to interpret the string as hexadecimal - but there's a minor caveat. I have to save this string in a character array.
That said, this is what I have:
...
char bufferBytes[6144];
std::string bufferString = "";
...
for(i = 0; i < bufferString.length(); i = i+2)
{
    bufferBytes[i] = (stoi(bufferString.at(i), 0, 16) << 4);
    bufferBytes[i] = (stoi(bufferString.at(i+1), 0, 16));
}
I believe this will do the trick, but I feel like there's probably a better solution.
Any input would be appreciated.
EDIT:
Say a user enters 0123456789ABCDEF. This is stored as a std::string until the user decides to submit it. At this point, I need to interpret this std::string as hexadecimal numbers and store them in a character array. I believe the code I have above will work, but is there a better/more efficient way of doing what I described.
Here's a rough snippet that would probably do the job.
for (int i = 0; i < bufferString.length(); i += 2) {
    bufferBytes[i/2] = (bufferString[i] - '0') << 4;
    bufferBytes[i/2] |= bufferString[i+1] - '0';
}
If your intention is to store each hex digit as its own char element, then this should be the body of the loop:
    bufferBytes[i] = (bufferString[i] - '0') << 4;
    bufferBytes[i+1] = bufferString[i+1] - '0';
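Note that the digit arithmetic above only covers '0' to '9'; for the full range in the question (0123456789ABCDEF), a std::stoi-based version is one option. A sketch, reusing bufferString and bufferBytes from the question and assuming an even number of valid hex digits:

// Parse each pair of hex characters as one byte; base 16 handles 0-9, A-F and a-f.
for (std::size_t i = 0; i + 1 < bufferString.length(); i += 2)
{
    bufferBytes[i / 2] = static_cast<char>(
        std::stoi(bufferString.substr(i, 2), nullptr, 16));
}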
I have a problem with text that contains Polish diacritical marks (e.g. ą, ć, ę, ł, ń, ó, ś, ź, ż) obtained via libcurl from a server. I'm trying to display this text correctly in a Windows C++ console application.
I solved a similar problem with printing something like this to the console screen:
cout << "ąćęźół";
by switching the codepage of my source file to DOS codepage 852 (Central Europe). Unfortunately that doesn't work for text coming from libcurl; I think it only works for text written directly in the code. Could you give me some helpful information? I have no idea how to resolve this issue.
Well, I've written a temporary solution for my problem. It works fine, but I'm not content with this approach:
char* cpl(const char* input)
{
    size_t length = strlen(input);
    char* output = new char[length + 1];

    /* Order of the diacritics
       Ą ą Ć ć Ę ę
       Ł ł Ń ń Ó ó
       Ś ś Ź ź Ż ż
    */
    const size_t pld_in[] = {
        0xA1,0xB1,0xC6,0xE6,0xCA,0xEA,
        0xA3,0xB3,0xD1,0xF1,0xD3,0xF3,
        0xA6,0xB6,0xAC,0xBC,0xAF,0xBF,
    };
    const size_t pld_out[] = {
        0xA4,0xA5,0x8F,0x86,0xA8,0xA9,
        0x9D,0x88,0xE3,0xE4,0xE0,0xA2,
        0x97,0x98,0x8D,0xAB,0xBD,0xBE
    };

    for(size_t i = 0; i < length; i++)
    {
        bool modified = false;
        for(size_t j = 0; j < 18; j++)
        {
            if(*(input + i) == (*(pld_in + j)) + 0xFFFFFF00)
            {
                *(output + i) = *(pld_out + j);
                modified = true;
                break;
            }
        }
        if(!modified)
            *(output + i) = *(input + i);
    }
    *(output + length) = 0x00;
    return output;
}
Could you propose a better solution to this problem, one that doesn't require converting characters by hand?
The content of the web page returned by libcurl will use the character set of the web page. What's likely happening here is that it's not the character set used by your codepage, which I presume is the MS-Windows term for a locale.
libcurl should let you look at the headers of the HTTP response that was received from the server. Look at the Content-Type: header, which will indicate which character set the returned text uses; then look up which codepage uses the same character set.
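For example, libcurl exposes that header through curl_easy_getinfo(); a minimal sketch, assuming an easy handle named curl on which curl_easy_perform() has already succeeded:

// Ask libcurl for the Content-Type header of the last response.
// It usually carries a "charset=" parameter, e.g. "text/html; charset=UTF-8".
char *contentType = nullptr;
CURLcode rc = curl_easy_getinfo(curl, CURLINFO_CONTENT_TYPE, &contentType);
if (rc == CURLE_OK && contentType)
    printf("Content-Type: %s\n", contentType);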
My code is the following (reduced):
CComVariant* input is an input parameter
CString cstrPath(input->bstrVal);

const CHAR cInvalidChars[] = {"/*&#^°\"§$[]?´`\';|\0"};

for (unsigned int i = 0; i < strlen(cInvalidChars); i++)
{
    cstrPath.Replace(cInvalidChars[i], _T(''));
}
When debugging, the value of cstrPath is L"§" and the value of cInvalidChars[7] is -89 '§'.
I have tried to use .Remove() before, but the problem remains the same: when it comes to § or ´, the code table does not seem to match, so the char does not get recognized properly and will not be removed. Using a TCHAR array for invalidChars results in different problems ('§' -> 'ᄡ').
The problem seems to be that I am not using the correct code tables, but everything I have tried so far has not resulted in any success.
I want to successfully replace/delete any occurring '§'.
I also have had a look at several "delete character from string"-Posts but I did not find anything that helped me.
executable code:
CComVariant* pccovaValue = new CComVariant();
pccovaValue->bstrVal = L"§§";
const CHAR cInvalidChars[] = {"§"};
CString cstrPath(pccovaValue->bstrVal);

for (unsigned int i = 0; i < strlen(cInvalidChars); i++)
{
    cstrPath.Remove(cInvalidChars[i]);
}

cstrPath = cstrPath;

Just break on cstrPath = cstrPath; and inspect the result.
According to the comments you are mixing up Unicode and ANSI encodings. It seems that your application is targeting Unicode, which is good. You should stop using ANSI altogether.
Declare cInvalidChars like this:
CString cInvalidChars = L"/*&#^°\"§$[]?´`\';|";
The use of the L prefix means that the string literal is a wide character UTF-16 literal.
Then your loop can look like this:
for (int i = 0; i < cInvalidChars.GetLength(); i++)
    cstrPath.Remove(cInvalidChars[i]);