I have to remove some characters from a string, but I am running into problems. I found this piece of code online, but it does not work well: it removes the characters, but it also removes the white spaces.
string messaggio = "{Questo e' un messaggio} ";
char chars[] = {'Ì', '\x1','\"','{', '}',':'};
for (unsigned int i = 0; i < strlen(chars); ++i)
{
messaggio.erase(remove(messaggio.begin(), messaggio.end(), chars[i]), messaggio.end());
}
Can someone tell me how this part of code works and why it even removes the white spaces?
Because you use strlen on your chars array. This function stops ONLY when it encounters a \0, and you inserted none. So you're reading memory past the end of your array, which is undefined behavior: the loop may pick up extra bytes (such as a space) as characters to remove, or it may even crash with a SEGFAULT.
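Also note that calling std::remove on its own is not enough: it only moves the kept characters to the front and returns the new logical end, so the erase call has to stay.
A correction could be:
char chars[] = {'Ì', '\x1','\"','{', '}',':'};
for (unsigned int i = 0; i < sizeof(chars); ++i)
{
    messaggio.erase(std::remove(messaggio.begin(), messaggio.end(), chars[i]), messaggio.end());
}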
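As a rough illustration of how std::remove and erase work together, here is a minimal sketch. The helper name stripChars is hypothetical, and the character set is limited to the ASCII entries from the question so that every entry is exactly one byte.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the checks in the usage note
#include <string>

// Hypothetical helper: removes a fixed set of ASCII characters from `text`.
std::string stripChars(std::string text)
{
    // No terminating '\0' here, so strlen must not be used on this array.
    static const char unwanted[] = {'\x1', '"', '{', '}', ':'};

    for (char ch : unwanted)   // range-for knows the array size by itself
    {
        // std::remove only moves the kept characters to the front and
        // returns the new logical end; erase then drops the leftover tail.
        text.erase(std::remove(text.begin(), text.end(), ch), text.end());
    }
    return text;
}
```

With the message from the question, stripChars("{Questo e' un messaggio} ") should give back the text without the braces while keeping every space.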
Wisblade's answer is more or less correct, it just lacks some technical details.
As mentioned, strlen searches for the terminating character '\0'.
Since chars does not contain such a character, this code invokes "undefined behavior" (a read past the end of the buffer).
"Undefined behavior" means anything can happen: the code may work, may crash, or may give invalid results.
So the first step is to drop strlen and use a different means to get the size of the array.
There is also another problem. Your code uses a non-ASCII character: 'Ì'.
I assume that you are using Windows and Visual Studio. By default the MSVC compiler assumes that the file is encoded using your system locale and uses the same locale to generate the executable. Windows by default uses a single-byte encoding specific to your language (to be compatible with very old software). Only in such a case does your code have a chance to work. On platforms or configurations with a multibyte encoding such as UTF-8, this code can't work even after Wisblade's fix.
Wisblade's fix can take this form (note that I changed the order of the loops: iteration over the characters to remove is now the inner loop):
#include <algorithm>
#include <iterator>
#include <string>
bool isCharToRemove(char ch)
{
    constexpr char skipChars[] = {'Ì', '\x1','\"','{', '}',':'};
    return std::find(std::begin(skipChars), std::end(skipChars), ch) != std::end(skipChars);
}
std::string removeMagicChars(std::string message)
{
    // remove_if moves the kept characters to the front; erase drops the tail.
    message.erase(
        std::remove_if(message.begin(), message.end(), isCharToRemove),
        message.end());
    return message;
}
Let me know if you need a solution that can handle more complex text encodings.
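For UTF-8 text, one possible direction is to remove byte sequences instead of single chars, since 'Ì' (U+00CC) is encoded as the two bytes 0xC3 0x8C there. The sketch below assumes the string really is UTF-8; the helper name removeAll is made up for this example.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <string>

// Hypothetical helper: erases every occurrence of the byte sequence
// `needle` from `text`.
std::string removeAll(std::string text, const std::string& needle)
{
    if (needle.empty())
        return text;   // nothing to remove, and avoids an endless loop

    std::string::size_type pos = 0;
    while ((pos = text.find(needle, pos)) != std::string::npos)
    {
        // Do not advance pos: the next match may start right here.
        text.erase(pos, needle.size());
    }
    return text;
}
```

Calling removeAll(message, "\xC3\x8C") would then strip every UTF-8 encoded 'Ì', and the single-byte characters can still be handled with the erase/remove_if approach.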
Related
Compiled and ran the code below and was surprised by the output.
# include <iostream>
int main()
{
const char* c="hello";
for (size_t i=0; i<116; ++i)
{
std::cout << *(c+i);
}
}
Output:
hellobasic_stringallocator<T>::allocate(size_t n) 'n' exceeds maximum supported sizeSt16nested_exception
Is this just undefined behavior or something else?
It is undefined behavior, caused by bad code that reads past the end of a buffer. Your c points to exactly 6 bytes (five letters plus the terminating '\0') in the read-only data section, yet the loop reads 116 bytes starting there.
What gets printed after "hello" is simply whatever happens to sit next to the literal in memory, here apparently other string constants from the standard library. Nothing guarantees this result; another compiler or build may print something else, or crash.
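A bounded version of that loop could look like the following sketch. The helper name firstChars is hypothetical; the point is only that the read never goes past the terminating '\0'.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns at most `limit` characters of a C string,
// never reading past its terminating '\0'.
std::string firstChars(const char* text, std::size_t limit)
{
    const std::size_t length = std::strlen(text);   // stops at the terminator
    return std::string(text, length < limit ? length : limit);
}
```

Here firstChars("hello", 116) yields just "hello", however large the requested count is.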
I want to remove only the first character in a string that is NOT a digit. The first character can be anything from ‘A’ to ‘Z’ or it may be a special character like ‘&’ or ‘#’. This legacy code is written in MFC. I've looked at the CString class but cannot figure out how to make this work.
I have strings that may look like any of the following:
J22008943452GF or 22008943452GF or K33423333333IF or 23000526987IF or #12000895236GF. You get the idea by now.
My dilemma is that I need to remove the character in the first position of each string, but not for strings that start with a digit. Strings that begin with a digit must be left alone. Also, none of the other characters in the string should be altered. For example, the ‘G’, ‘I’ or ‘F’ in the later part of the string should not be changed. The length of the string will always be 13 or 14 characters.
Here is what I have so far.
CString GAbsMeterCalibration::TrimMeterSNString (CString meterSN)
{
meterSN.MakeUpper();
CString TrimmedMeterSNString = meterSN;
int strlength = strlen(TrimmedMeterSNString);
if (strlength == 13)
{
// Check the first character anyway, even though it’s
// probably okay. If it is a digit, life’s good.
// Return unaltered TrimmedMeterSNString;
}
if (strlength == 14)
{
//Check the first character, it’s probably going
// to be wrong and is a character, not a digit.
// if I find a char in the first position of the
// string, delete it and shift everything to the
// left. Make this my new TrimmedMeterSNString
// return altered TrimmedMeterSNString;
}
}
The string lengths are checked and validated before the calls.
From my investigations, I’ve found that MFC does not have a regular expression
class. Nor does it have the substring methods.
How about:
CString GAbsMeterCalibration::TrimMeterSNString (CString meterSN)
{
    meterSN.MakeUpper();
    CString TrimmedMeterSNString = meterSN;
    if (std::isdigit(TrimmedMeterSNString.GetAt(0)) )
    {
        // The first character is a digit, life’s good.
        // Return unaltered TrimmedMeterSNString.
        return TrimmedMeterSNString;
    }
    // Otherwise drop the first character and keep the rest.
    return TrimmedMeterSNString.Mid(1);
}
From what I understand, you want to remove the first letter if it is not a digit. So you may make this function simpler:
CString GAbsMeterCalibration::TrimMeterSNString(CString meterSN)
{
meterSN.MakeUpper();
int length = meterSN.GetLength();
// if the first character is not a digit, remove it
if (length > 0 && unsigned(meterSN[0] - TCHAR('0')) > 9u)
{
return meterSN.Right(length - 1);
}
return meterSN;
}
I am using the conditional trick with unsigned instead of the isdigit function because CString uses TCHAR, which can be either char or wchar_t.
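The unsigned comparison can be shown in isolation. This is a minimal sketch with a hypothetical helper name, isAsciiDigit, written as a template so that it accepts both char and wchar_t.

```cpp
#include <cassert>   // only needed for the checks in the usage note

// Hypothetical helper: true only for the ASCII digits '0'..'9'.
// It uses plain arithmetic, so it works for char and wchar_t alike.
template <typename CharT>
bool isAsciiDigit(CharT ch)
{
    // For characters below '0' the subtraction goes negative, and the
    // conversion to unsigned wraps to a huge value, so a single
    // comparison against 9 covers both ends of the range.
    return static_cast<unsigned>(ch - CharT('0')) <= 9u;
}
```

For example, isAsciiDigit('5') and isAsciiDigit(L'0') are true, while isAsciiDigit('J') and isAsciiDigit('#') are false.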
The solution is fairly straightforward:
CString GAbsMeterCalibration::TrimMeterSNString(CString meterSN) {
meterSN.MakeUpper();
return _istdigit(meterSN.GetAt(0)) ? meterSN :
meterSN.Mid(1);
}
The implementation can be compiled for both ANSI and Unicode project settings by using _istdigit. This is required since you are using CString, which stores either MBCS or Unicode character strings. The desired substring is extracted using CStringT::Mid.
(Note that CString is a typedef for a specific CStringT template instantiation, depending on your project settings.)
CString test="12355adaddfca";
if((test.GetAt(0)>=48)&&(test.GetAt(0)<=57))
{
//48 and 57 are ascii values of 0&9, hence this is a digit
//do your stuff
//CString::GetBuffer may help here??
}
else
{
//it is not a digit, do your stuff
}
Compare the ASCII value of the first character in the string and you know whether it's a digit or not.
I don't know if you've tried this, but, it should work.
CString str = _T("#12000895236GF");
// check string to see if it starts with digit.
CString result = str.SpanIncluding(_T("0123456789"));
// if result is empty, string does not start with a number
// and we can remove the first character. Otherwise, string
// remains intact.
if (result.IsEmpty())
str = str.Mid(1);
Seems a little easier than what's been proposed.
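For readers without MFC at hand, the same rule can be sketched with std::string. This is not the MFC code from the answers: the function name trimMeterSN is hypothetical, substr(1) stands in for CString::Mid(1), and the upper-casing step is left out.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <string>

// Hypothetical std::string counterpart of TrimMeterSNString.
std::string trimMeterSN(const std::string& meterSN)
{
    if (meterSN.empty())
        return meterSN;   // nothing to inspect

    const char first = meterSN[0];
    const bool startsWithDigit = (first >= '0' && first <= '9');

    // substr(1) returns everything after the first character.
    return startsWithDigit ? meterSN : meterSN.substr(1);
}
```

With the sample values from the question, "J22008943452GF" and "#12000895236GF" lose their first character, while "22008943452GF" comes back unchanged.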
I'm trying to copy data that contains '\0'. I'm using C++.
When my search turned up nothing, I decided to write my own function to copy data from one char* to another char*. But it doesn't return the wanted result!
My attempt is the following :
#include <iostream>
char* my_strcpy( char* arr_out, char* arr_in, int bloc )
{
char* pc= arr_out;
for(size_t i=0;i<bloc;++i)
{
*arr_out++ = *arr_in++ ;
}
*arr_out = '\0';
return pc;
}
int main()
{
char * out= new char[20];
my_strcpy(out,"12345aa\0aaaaa  AA",20);
std::cout<<"output data: "<< out << std::endl;
std::cout<< "the length of my output data: " << strlen(out)<<std::endl;
system("pause");
return 0;
}
the result is here:
I don't understand what is wrong with my code.
Thank you for help in advance.
Your my_strcpy is working fine. When you write a char* to cout or calculate its length with strlen, they stop at \0 as per C string behaviour. By the way, you can use memcpy to copy a block of char regardless of \0.
If you know the length of the 'string' then use memcpy. strcpy will halt its copy when it meets a string terminator, the \0. memcpy will not; it will copy the \0 and anything that follows.
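A small sketch of the memcpy approach, assuming the caller knows the exact byte count. The helper name copyBlock and the use of std::vector<char> as the destination are choices made for this example only.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies exactly `count` bytes, embedded '\0' included.
std::vector<char> copyBlock(const char* source, std::size_t count)
{
    std::vector<char> destination(count);
    if (count > 0)
        std::memcpy(destination.data(), source, count);   // does not stop at '\0'
    return destination;
}
```

Copying the five bytes of "ab\0cd" this way keeps the zero byte in the middle and the two letters after it.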
(Note: For any readers who are unaware that \0 is a single-character byte with value zero in string literals in C and C++, not to be confused with the \\0 expression that results in a two-byte sequence of an actual backslash followed by an actual zero in the string... I will direct you to Dr. Rebmu's explanation of how to split a string in C for further misinformation.)
C++ strings maintain their length independently of any embedded \0, and they copy their contents based on this length. The only catch is that the constructor taking a C-string and no length relies on the null terminator to decide what you wanted the length to be.
To override this, you can pass in a length explicitly. Make sure the length is accurate, though. You have 17 bytes of data, and 18 if you want the null terminator in the string literal to make it into your string as part of the data.
#include <iostream>
#include <string>
using namespace std;
int main() {
    // note the two spaces before AA: 17 bytes of data plus the terminator
    string str ("12345aa\0aaaaa  AA", 18);
    string str2 = str;
    cout << str << '\n';
    cout << str2 << '\n';
    return 0;
}
(Try not to hardcode such lengths if you can avoid it. Note that you didn't count it right, and when I corrected another answer here they got it wrong as well. It's error prone.)
On my terminal that outputs:
12345aaaaaaa  AA
12345aaaaaaa  AA
But note that what you're doing here is actually streaming a 0 byte to the stdout. I'm not sure how formalized the behavior of different terminal standards is for dealing with that. Things outside of the printable range can be used for all kinds of purposes depending on the kind of terminal you're running: positioning the cursor on the screen, changing the color, etc. I wouldn't write out strings with embedded zeros like that unless I knew what the semantics were going to be on the stream receiving them.
If what you're dealing with are bytes rather than text, consider using a std::vector<char> instead, so as not to confuse the issue. Many libraries offer alternatives, such as Qt's QByteArray.
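One way to follow the advice about not hardcoding lengths is to let the compiler supply the size of the literal. The sketch below uses a hypothetical helper, fromLiteral, that takes the literal by reference to array so its size is known at compile time.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string from a string literal,
// keeping embedded '\0' characters but dropping the final terminator.
template <std::size_t N>
std::string fromLiteral(const char (&literal)[N])
{
    // N is the array size known to the compiler, terminator included.
    return std::string(literal, N - 1);
}
```

For instance, fromLiteral("ab\0cd") has length 5 and still contains the zero byte at index 2, with no count typed by hand.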
Your function is fine (except that you should pass to it 17 instead of 20). If you need to output null characters, one way is to convert the data to std::string:
std::string outStr(out, out + 17);
std::cout<< "output data: "<< outStr << std::endl;
std::cout<< "the length of my output data: " << outStr.length() <<std::endl;
I don't understand what is wrong with my code.
my_strcpy(out,"12345aa\0aaaaa  AA",20);
Your string contains the character '\', which is interpreted as the start of an escape sequence. To prevent this, you have to double the backslash:
my_strcpy(out,"12345aa\\0aaaaa  AA",20);
Test
output data: 12345aa\0aaaaa  AA
the length of my output data: 18
Your string is already terminated midway.
my_strcpy(out,"12345aa\0aaaaa  AA",20);
Why do you intend to have \0 in the middle like that? Use some other delimiter if you so desire.
Otherwise, since std::cout and strlen interpret a \0 as a string terminator, you get surprises.
What I mean is: follow the convention, i.e. '\0' as the string terminator.
I am reading input for my program in a loop using getline.
string temp(STR_SIZE, ' ');
string str_num(STR_SIZE, ' ');
...
getline(cin, temp, '\n');
After which, I use a function to find the next delimiter(white space) and assign all the characters before the white space to str_num. Looks something like this:
str_num.assign(temp, 0, next_white_space(0));
I have verified that this works well. The next step in my solution would be to convert str_num to an int(this part also works well), but I should check to make sure each character in str_num is a digit. Here's the best of what I've tried:
if(!isdigit(str_num[0] - '0')) {
cout << "Error: Not an appropriate value\n";
break; /* Leave control structure */
}
For some reason, this always prints the error message and exits the structure.
Why is that?
I've used operator[] for string objects before, and it seemed to work well. But, here, it's totally messing me up.
Thanks.
std::isdigit takes a char's integer value and checks it.
So, remove the - '0' and just pass str_num[index] to isdigit().
Note: because this function comes from C, the old style of treating chars as integers shows through in the method taking an int. However, chars can promote to int values, so a char becomes an int just fine and this works.
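To check every character rather than only the first, something along these lines could be used. The helper name allDigits is hypothetical, and treating an empty string as invalid is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the checks in the usage note
#include <cctype>
#include <string>

// Hypothetical helper: true when `text` is non-empty and consists only of digits.
bool allDigits(const std::string& text)
{
    return !text.empty() &&
           std::all_of(text.begin(), text.end(), [](unsigned char ch) {
               // isdigit expects the character code itself ('0' is 48, not 0).
               // Taking an unsigned char avoids undefined behavior for
               // negative char values.
               return std::isdigit(ch) != 0;
           });
}
```

In the question's loop, the test would then read if (!allDigits(str_num)) before converting to int.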
I'm trying to split a c++ string into a number of substrings (NUM_LINES) each with the length of CHAR_PER_LINE.
for(int i = 0; i < NUM_LINES; i++) {
lines[i] = totalstring.substr(i*CHAR_PER_LINE,CHAR_PER_LINE);
}
Works fine as long as there's no special character in the string. Otherwise substr() gets me a string that isn't CHAR_PER_LINE characters long, but stops right before a special character and exits the loop.
Any hints?
ok, edit:
1) I'm definitely not reaching the end of my string. If my totalstring.length() is 1000 and I have a special character in the first line (that is the first CHAR_PER_LINE (30) chars of the string) the loop exits.
2) Special characters I had problems with are for instance 'ö' and '–' (the long one)
EDIT 2:
std::string text = "aaaabbbbccccdödd";
std::string line[4];
for(int i = 0; i < 4; i++)
line[i] = text.substr(i*4,4);
for(int i = 0; i < 4; i++)
std::cout << line[i] << "\n";
This example works. I get a '%' for the ö.
So the problem wasn't substr(). Sorry. I'm using Cairo to create a gui and it seems my Cairo output is causing the troubles, not substr().
How about a hint of what special characters you're talking about?
My guess is that you reached the end of the string.
The STL doesn't care about special characters. If there are multibyte sequences (e.g. UTF-8), std::string treats them as a sequence of single one-byte characters. If you need proper Unicode handling, do not rely on the built-in substr or length.
You can, however, use std::wstring (from your posting it isn't clear whether you're already using it, but I guess not) - it holds wchar_t characters - large enough for the native character set of your target platform.
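If the text is UTF-8 and has to stay in a std::string, one option is to split on code point boundaries instead of byte offsets. The sketch below assumes valid UTF-8 input and a perLine value greater than zero; the helper name splitUtf8 is made up for this example.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits UTF-8 text into pieces of `perLine` code
// points each. Assumes `text` is valid UTF-8 and perLine > 0.
std::vector<std::string> splitUtf8(const std::string& text, std::size_t perLine)
{
    std::vector<std::string> lines;
    std::string current;
    std::size_t count = 0;

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const unsigned char byte = static_cast<unsigned char>(text[i]);
        // Continuation bytes have the bit pattern 10xxxxxx.
        const bool startsCodePoint = (byte & 0xC0) != 0x80;

        if (startsCodePoint && count == perLine)
        {
            lines.push_back(current);   // the current line is full
            current.clear();
            count = 0;
        }
        if (startsCodePoint)
            ++count;
        current += text[i];
    }
    if (!current.empty())
        lines.push_back(current);
    return lines;
}
```

With the example string from the question and four characters per line, the last piece holds "dödd" as five bytes, so the two bytes of 'ö' are never separated. Note that this counts code points, so combining characters would still be counted individually.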
What's happening is that you're running off the end of the string on the last line. It isn't exiting the loop after skipping characters. It exits the loop precisely when it should, and the last line contains the right number of characters; it's just that some of them are garbage, so your diagnostic printout shows the line as short.
The only way the loop could be exited early is if an exception were thrown.