My code is the following (reduced):
// CComVariant* input is an input parameter
CString cstrPath(input->bstrVal);
const CHAR cInvalidChars[] = {"/*&#^°\"§$[]?´`\';|\0"};
for (unsigned int i = 0; i < strlen(cInvalidChars); i++)
{
    cstrPath.Replace(cInvalidChars[i], _T(''));
}
When debugging, the value of cstrPath is L"§" and the value of cInvalidChars[7] is -89 '§'.
I have tried using .Remove() before, but the problem remains the same: when it comes to § or ´, the code tables do not seem to match, so the character is not recognized properly and will not be removed. Using a TCHAR array for invalidChars results in different problems ('§' -> 'ᄡ').
The problem seems to be that I am not using the correct code tables, but nothing I have tried so far has resulted in any success.
I want to successfully replace/delete any occurring '§'.
I have also looked at several "delete character from string" posts, but I did not find anything that helped.
Executable code:
CComVariant* pccovaValue = new CComVariant();
pccovaValue->bstrVal = L"§§";
const CHAR cInvalidChars[] = {"§"};
CString cstrPath(pccovaValue->bstrVal);
for (unsigned int i = 0; i < strlen(cInvalidChars); i++)
{
    cstrPath.Remove(cInvalidChars[i]);
}
cstrPath = cstrPath;
Just set a breakpoint on cstrPath = cstrPath; and inspect the result.
According to the comments, you are mixing up Unicode and ANSI encodings. It seems that your application targets Unicode, which is good. You should stop using ANSI altogether.
Declare cInvalidChars like this:
CString cInvalidChars = L"/*&#^°\"§$[]?´`\';|";
The use of the L prefix means that the string literal is a wide character UTF-16 literal.
Then your loop can look like this:
for (int i = 0; i < cInvalidChars.GetLength(); i++)
    cstrPath.Remove(cInvalidChars[i]);
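Putting it together, a minimal self-contained sketch of the approach (the sample path is invented for illustration, and a Unicode build is assumed):

#include <atlstr.h> // CString

int main()
{
    CString cstrPath = L"my§file´name?.txt"; // hypothetical input
    CString cInvalidChars = L"/*&#^°\"§$[]?´`\';|";

    for (int i = 0; i < cInvalidChars.GetLength(); i++)
        cstrPath.Remove(cInvalidChars[i]);

    // cstrPath now contains L"myfilename.txt"
    return 0;
}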
I'm writing a dialog-based MFC application in Visual Studio 2017 in C++. In the dialog I added a list control where the user can change the values of the cells, as shown in the picture below:
After the user changes the values, I want to check whether those values are valid (so that an accidental wrong keypress is flagged). For this purpose I iterate over the cells of the list and extract the text of each cell into a CString variable. I want to check that this variable has only 8 characters, each of which is '1' or '0'. The problem with the code I've written is that I get weird values when I try to print the individual characters of the CString variable.
The Code for checking the validity of the CString:
void CEditableListControlDlg::OnBnClickedButton4()
{
    // TODO: Add your control notification handler code here
    // Iterate over the different cells
    int bit7Col = 2;
    int bit6Col = 3;
    int bit5Col = 4;
    int bit4Col = 5;
    int bit3Col = 6;
    int bit2Col = 7;
    int bit1Col = 8;
    int bit0Col = 9;
    for (int i = 0; i < m_EditableList.GetItemCount(); ++i) {
        CString bit7 = m_EditableList.GetItemText(i, bit7Col);
        CString bit6 = m_EditableList.GetItemText(i, bit6Col);
        CString bit5 = m_EditableList.GetItemText(i, bit5Col);
        CString bit4 = m_EditableList.GetItemText(i, bit4Col);
        CString bit3 = m_EditableList.GetItemText(i, bit3Col);
        CString bit2 = m_EditableList.GetItemText(i, bit2Col);
        CString bit1 = m_EditableList.GetItemText(i, bit1Col);
        CString bit0 = m_EditableList.GetItemText(i, bit0Col);
        CString cvalue = bit7 + bit6 + bit5 + bit4 + bit3 + bit1 + bit0;
        std::string value((LPCSTR)cvalue);
        int length = value.length();
        if (length != 7) {
            MessageBox("Register Value Is Too Long", "Error");
            return;
        }
        for (int i = 0; i < length; i++) {
            if (value[i] != static_cast<char>(0) || value[i] != static_cast<char>(1)) {
                char c = value[i];
                MessageBox(&c, "value"); // this is where I try to print the value
                return;
            }
        }
    }
}
Picture of what gets printed in the message box when I try to print one character of the variable value. I expect to see '1', but instead I see '1iiiiii`' in the message box:
I've tried extracting the characters directly from the variable cvalue of type CString like this:
cvalue[i]
and getting its length by using
strlen(cvalue[i])
but I got the same result. I've also tried accessing the characters in the variable cvalue of type CString as follows:
cvalue.GetAt(i)
and getting its length by using:
cvalue.GetLength()
But again, I got the same results.
Could anyone advise me how to check that the characters in the variable cvalue of type CString are '0' or '1'?
Thank you.
You don't need to use std::string to process your strings in this case: CString works fine.
Assuming that your CString cvalue is the string you want to check, you can write a simple loop like this:
// Check cvalue for characters different than '0' and '1'
for (int i = 0; i < cvalue.GetLength(); i++)
{
    TCHAR currChar = cvalue.GetAt(i);
    if ((currChar != _T('0')) && (currChar != _T('1')))
    {
        CString message;
        message.Format(_T("Invalid character at position %d : %c"), i, currChar);
        MessageBox(message, _T("Error"));
    }
}
The reason for the apparently weird output in your case is that you are passing a pointer to a character that is not followed by a null-terminator:
// Wrong code
char c = value[i];
MessageBox(&c, "value");
If you don't want to build a CString with a formatted message containing the offending character, like I did in the previous sample code, an alternative could be creating a simple raw char array storing the character you want to output followed by the null-terminator:
// This is an array storing two chars: value[i] followed by '\0' (null)
char s[2] = {value[i], '\0'};
MessageBox(s, "value");
P.S.
I used TCHAR in my code sample instead of char, to make the code more easily portable to Unicode builds. Think of TCHAR as a preprocessor macro that maps to char for ANSI/MBCS builds (which seems to be your current case), and to wchar_t for Unicode builds.
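For illustration, this is roughly what <tchar.h> does (a simplified sketch; the real header is more involved):

// Simplified sketch of the <tchar.h> mapping:
#ifdef _UNICODE
    typedef wchar_t TCHAR;   // Unicode builds
    #define _T(x) L ## x     // _T("0") becomes L"0"
#else
    typedef char TCHAR;      // ANSI/MBCS builds
    #define _T(x) x          // _T("0") stays "0"
#endif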
A Brief Note on Validating User Input
With the above answer, I tried to strictly address your specific problem with CString character validation. But, if you can take a look from a broader perspective, I would definitely consider validating the user input before storing it in the list-view control. For example, you could handle the LVN_ENDLABELEDIT notification from the list-view control, and reject invalid input values.
Or, considering that the only valid values for each bit are 0 and 1, you could let the user select them from a combo-box.
Doing that starting from the MFC's CListCtrl is non-trivial work; so, you may also consider using other open-source controls, like this CGridListCtrlEx control available from CodeProject.
As you write in your last paragraph, you want to "check that the characters in the variable cvalue of type CString are '0' or '1'".
That's exactly the point: '0' is the character zero. But your code checks for the integer 0, which is equal to the character '\0', the end-of-string terminator.
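A minimal illustration of the distinction:

char ch = '0';   // the character zero: code 0x30 (decimal 48)
// The integer 0 is the same value as '\0', the string terminator.
if (ch == '0') { /* true: compares against the character zero */ }
if (ch == 0)   { /* false: this only matches the terminator   */ }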
What is the correct method to insert UTF-8 data into an OpenLDAP database? I have data in a std::wstring, converted from Latin1 with:
std::wstring converted = boost::locale::conv::to_utf<wchar_t>(line, "Latin1");
When the string needs to be added to an LDAPMod structure, I use this function:
std::string str8(const std::wstring& s) {
    return boost::locale::conv::utf_to_utf<char>(s);
}
to convert from wstring to string. This is used in my function to create an LDAPMod:
LDAPMod** y::ldap::server::createMods(dataset& values) {
    LDAPMod** mods = new LDAPMod*[values.elms() + 1];
    mods[values.elms()] = NULL;
    for (int i = 0; i < values.elms(); i++) {
        mods[i] = new LDAPMod;
        data& d = values.get(i);
        switch (d.getType()) {
            case NEW   : mods[i]->mod_op = 0; break;
            case ADD   : mods[i]->mod_op = LDAP_MOD_ADD; break;
            case MODIFY: mods[i]->mod_op = LDAP_MOD_REPLACE; break;
            case DELETE: mods[i]->mod_op = LDAP_MOD_DELETE; break;
            default    : assert(false);
        }
        std::string type = str8(d.getValue(L"type"));
        mods[i]->mod_type = new char[type.size() + 1];
        std::copy(type.begin(), type.end(), mods[i]->mod_type);
        mods[i]->mod_type[type.size()] = '\0';
        mods[i]->mod_vals.modv_strvals = new char*[d.elms(L"values") + 1];
        for (int j = 0; j < d.elms(L"values"); j++) {
            std::string value = str8(d.getValue(L"values", j));
            mods[i]->mod_vals.modv_strvals[j] = new char[value.size() + 1];
            std::copy(value.begin(), value.end(), mods[i]->mod_vals.modv_strvals[j]);
            mods[i]->mod_vals.modv_strvals[j][value.size()] = '\0';
        }
        mods[i]->mod_vals.modv_strvals[d.elms(L"values")] = NULL;
    }
    return mods;
}
The resulting LDAPMod is passed on to ldap_modify_ext_s and works as long as I only use ASCII characters. But if other characters are present in the string, I get an LDAP operations error.
I've also tried this with the function provided by the LDAP library (ldap_x_wcs_to_utf8s), but the result is the same as with the Boost conversion.
It's not the conversion itself that is wrong, because if I convert the modifications back to a std::wstring and show them in my program output, the encoding is still correct.
AFAIK OpenLDAP has supported UTF-8 for a long time, so I wonder if there's something else that must be done before this works?
I've looked into the OpenLDAP client/tools examples, but the UTF-8 functions provided by the library are never used there.
Update:
I noticed I can insert UTF-8 characters like é into LDAP with Apache Directory Studio, and I can retrieve those values from LDAP in my C++ program. But if I insert the same character again, without changing anything in that string, I get the LDAP operations error again.
It turns out that my code was not wrong at all. My modifications tried to store the full name in the 'displayName' field as well as in 'gecos'. But apparently 'gecos' cannot hold UTF-8 data (its schema syntax is IA5String, which only allows ASCII).
We don't actually use gecos anymore. The value was only present because of some software we used years ago, so I removed it from the directory.
What made this hard to find was that even though the log level was set to 'parse', the error still did not show up in the logs.
Because libldap can be such a hard nut to crack, I'll include a link to the complete code of the project I'm working on. It might serve as a starting point for other programmers. (Most of the code in the tutorials I found is outdated.)
https://github.com/yvanvds/yATools/tree/master/libadmintools/ldap
Alright, so I have a BYTE array that I ultimately need to convert into an LPCWSTR or const WCHAR* to use in a built-in function. I have been able to print the BYTE array with printf, but now that I need to convert it into a string I am having problems... mainly that I have no idea how to convert something like this into a non-array type.
BYTE ba[0x10];
for (int i = 0; i < 0x10; i++)
{
    printf("%02X", ba[i]); // Outputs: F1BD2CC7F2361159578EE22305827ECF
}
So I need basically the same thing, but instead of printing the array I need it transformed into an LPCWSTR or WCHAR* or even a string. The main problem I am having is converting the array into a non-array form.
LPCWSTR represents a UTF-16 encoded string. The array contents you have shown are outside the 7-bit ASCII range, so unless the BYTE array is already encoded in UTF-16 (the array you showed is not, but if it were, you could just use a simple type-cast), you will need to convert it to UTF-16. You need to know the particular encoding of the array before you can do that conversion, for instance with the Win32 API MultiByteToWideChar() function, third-party libraries like iconv or ICU, or the built-in locale converters in C++11. So what is the actual encoding of the array, and where is the array data coming from? It is not UTF-8, for instance, so it has to be something else.
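For instance, if the bytes turned out to be in the system ANSI code page, a conversion sketch with MultiByteToWideChar() could look like this (the code page is only an assumption; substitute the array's real encoding):

#include <windows.h>
#include <string>

// Convert a byte buffer in the given code page to a UTF-16 string.
std::wstring ToUtf16(const char* src, int srcLen, UINT codePage /* e.g. CP_ACP */)
{
    // first call: query the required length in wide characters
    int needed = MultiByteToWideChar(codePage, 0, src, srcLen, NULL, 0);
    std::wstring result(needed, L'\0');
    // second call: perform the actual conversion
    MultiByteToWideChar(codePage, 0, src, srcLen, &result[0], needed);
    return result;
}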
Alright, I got it working. Now I can convert the BYTE array to a char* variable. Thanks for the help, guys; the formatting wasn't a large problem in this instance. I appreciate the help though, it's always nice to have some extra input.
// Helper function to convert one byte to two hex digits
void Char2Hex(unsigned char ch, char* szHex)
{
    unsigned char byte[2];
    byte[0] = ch / 16;   // high nibble
    byte[1] = ch % 16;   // low nibble
    for (int i = 0; i < 2; i++)
    {
        if (byte[i] <= 9)
            szHex[i] = '0' + byte[i];
        else
            szHex[i] = 'A' + byte[i] - 10;
    }
    szHex[2] = 0;
}

// Function used throughout the code to convert a byte array to a hex string
void CharStr2HexStr(unsigned char const* pucCharStr, char* pszHexStr, int iSize)
{
    int i;
    char szHex[3];
    pszHexStr[0] = 0;
    for (i = 0; i < iSize; i++)
    {
        Char2Hex(pucCharStr[i], szHex);
        strcat(pszHexStr, szHex);
    }
}
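For completeness, since the original goal was an LPCWSTR, here is a sketch of the same hex formatting writing directly into a wide-character buffer (CharStr2HexStrW is a made-up name; the caller must supply at least 2 * iSize + 1 wchar_ts):

// Format iSize bytes as hex digits into a wide-character buffer.
void CharStr2HexStrW(unsigned char const* pucCharStr, wchar_t* pszHexStr, int iSize)
{
    const wchar_t digits[] = L"0123456789ABCDEF";
    for (int i = 0; i < iSize; i++)
    {
        pszHexStr[2 * i]     = digits[pucCharStr[i] / 16]; // high nibble
        pszHexStr[2 * i + 1] = digits[pucCharStr[i] % 16]; // low nibble
    }
    pszHexStr[2 * iSize] = L'\0';
    // pszHexStr can now be passed wherever an LPCWSTR is expected
}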
For example, I need the code point of the 5th character here, which is ð:
const WCHAR* mystring = L"Þátíð";
I know that it has code point U+00F0, but how do I get this integer using C++?
WCHAR in Windows 2000 and later is UTF-16LE, so it is not necessarily safe to access a specific character in a string by index. You should use something like CharNext to walk the string to get correct handling of surrogate pairs and combining characters/diacritics.
In this specific example, Forgottn's answer depends on the compiler emitting precomposed versions of the á and í characters... (This is probably true for most Windows compilers; porting to Mac OS is probably problematic.)
const WCHAR myString[] = L"Þátíð";
size_t myStringLength = 0;
if (SUCCEEDED(StringCchLengthW(myString, STRSAFE_MAX_CCH, &myStringLength)))
{
    LPCWSTR myStringIterator = myString;
    for (size_t sz = 0; sz < myStringLength; ++sz)
    {
        unsigned int mySuperSecretUnicodeCharacter = *myStringIterator;
        LPCWSTR myNextIterator = CharNext(myStringIterator);
        std::vector<unsigned int> diacriticsOfMySuperSecretUnicodeCharacter(myStringIterator + 1, myNextIterator);
        myStringIterator = myNextIterator;
    }
}
Edit 1: made it actually work
Edit 2: made it actually look for all codepoints
I was wondering, is it safe to do the following?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
That is, if I am pretty sure the wide char will fall within the ASCII range.
Why not just use the library routine wcstombs()?
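A minimal sketch of that approach (the input string is arbitrary; the result depends on the current C locale):

#include <cstdlib>  // std::wcstombs
#include <clocale>  // std::setlocale

int main()
{
    std::setlocale(LC_ALL, "");  // pick up the environment's locale
    const wchar_t* wide = L"hello";
    char narrow[32];
    std::size_t n = std::wcstombs(narrow, wide, sizeof(narrow));
    if (n == static_cast<std::size_t>(-1))
    {
        // a wide character was not representable in the current locale
    }
    return 0;
}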
assert is for ensuring that something is true in a debug mode, without it having any effect in a release build. Better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char version.
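A sketch of such an if statement with a fallback (the '?' replacement is an arbitrary choice):

wchar_t wide = L'é'; // example value (0xE9, outside 7-bit ASCII)
char myChar;
if (wide < 128)
    myChar = static_cast<char>(wide); // 7-bit ASCII narrows safely everywhere
else
    myChar = '?'; // outside the safe range: substitute instead of asserting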
You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
wchar_t is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.
char* may be a convenient typedef for a character string, but it is a novice mistake to think of this as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters with nonsense like this:
for(int i = 0; s[i]; ++i) {
wchar_t wc = s[i];
char c = doit(wc);
out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wcstombs(), which is part of ANSI C already, but it's still wrong.
You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array), into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.
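A hedged sketch of the iconv() route; "WCHAR_T" as an encoding name is a glibc extension, so this assumes a glibc platform, and error handling is kept minimal:

#include <iconv.h>
#include <cstdio>

int main()
{
    wchar_t src[] = L"Þátíð";
    char dst[64] = {0};
    char* inbuf = reinterpret_cast<char*>(src);
    char* outbuf = dst;
    size_t inleft = sizeof(src) - sizeof(wchar_t); // exclude the terminator
    size_t outleft = sizeof(dst) - 1;

    iconv_t cd = iconv_open("UTF-8", "WCHAR_T"); // to-encoding, from-encoding
    if (cd == (iconv_t)-1)
        return 1;
    if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == static_cast<size_t>(-1))
        std::perror("iconv");
    iconv_close(cd);

    std::printf("%s\n", dst); // dst now holds the UTF-8 bytes
    return 0;
}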
An easy way is:
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
const char* your_wchar_in_char = your_wchar_in_str.c_str();
I've been using this method for years :)
A short function I wrote a while back to pack a wchar_t array into a char array. Characters outside the 7-bit ASCII range (0-127) are replaced by '?' characters, and it handles surrogate pairs correctly.
size_t to_narrow(const wchar_t* src, char* dest, size_t dest_len) {
    size_t i = 0;  // index into src
    size_t j = 0;  // index into dest
    while (src[i] != L'\0' && j < dest_len - 1) {
        wchar_t code = src[i];
        if (code < 128)
            dest[j++] = char(code);
        else {
            dest[j++] = '?';
            if (code >= 0xD800 && code <= 0xDBFF)
                // lead surrogate: also skip the trail code unit
                i++;
        }
        i++;
    }
    dest[j] = '\0';
    return j;  // number of chars written, excluding the terminator
}
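Example use (the input string is arbitrary):

wchar_t wide[] = L"Þátíð";
char narrow[16];
to_narrow(wide, narrow, sizeof(narrow));
// narrow now holds "??t??": every non-ASCII character became '?'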
Technically, 'char' could have the same range as either 'signed char' or 'unsigned char'. For the unsigned characters, your range is correct; theoretically, for signed characters, your condition is wrong. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last && in the assert is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.
Here's another way of doing it; remember to use free() on the result.
char* wchar_to_char(const wchar_t* pwchar)
{
    // get the number of characters in the string, including the terminator
    int charCount = 0;
    while (pwchar[charCount] != L'\0')
    {
        charCount++;
    }
    charCount++; // room for the '\0'

    // allocate a new block of memory sized in chars (1 byte) instead of wide chars
    char* filePathC = (char*)malloc(sizeof(char) * charCount);
    for (int i = 0; i < charCount; i++)
    {
        // narrow each code unit to one byte (lossy for values above 0x7F)
        filePathC[i] = (char)pwchar[i];
    }
    return filePathC;
}
One could also convert wchar_t -> wstring -> string -> char:
wchar_t wide = /* something */;
wstring wstrValue(1, wide); // build a one-character wstring
string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end()); // convert wstring to string
char char_value = strValue[0];
In general, no. int(wchar_t(255)) == int(char(255)) of course, but that just means they have the same int value. They may not represent the same characters.
You would see such a discrepancy on the majority of Windows PCs, even. For instance, on Windows code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).
Note that it does not even hold for the ASCII range, as C++ doesn't even require ASCII. On IBM systems in particular you may see that 'A' != 65.
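A small illustration of that discrepancy, using the code page 1250 example from above:

wchar_t wide = 0x00FF;                 // U+00FF, 'ÿ' (small y with diaeresis)
char narrow = static_cast<char>(wide); // the byte 0xFF
// Interpreted in code page 1250, the byte 0xFF is '˙' (U+02D9, dot above),
// so the narrowed byte no longer names the same character.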