garbage characters in buffer - c++

I have this function.
void cast(char *buf)
{
string str(buf);
string s=str.substr(0,5);
std::transform(s.begin(), s.end(), s.begin(),::toupper);
DemoInput=s;
}
The *buf is a message that the client sends. I'm trying to take that message and, no matter how long it is, truncate it to five characters and make it uppercase. This works if the message is longer than 5 characters, but if it is shorter than 5 there are garbage characters at the end of it.
ex: if buf is "long" then DemoInput becomes "LONG\r"
I thought about using regex ("[:upper:]") but think there must be an easier way to do this.
I find POSIX regex a bit more complicated than Python regex, for example.

If you only need the first 5 characters, don't copy the whole of buf. That just wastes space and time. Also, you shouldn't copy anything past the telnet control character \r.
void cast(char *buf)
{
size_t len = 0;
while (len < 5 && buf[len] != '\0' && buf[len] != '\r') {
++len;
}
string s(buf, len);
std::transform(s.begin(), s.end(), s.begin(),::toupper);
DemoInput=s;
}

Why don't you change the code that supplies buf to the cast function? Append '\0' to signify the end of the string, as it sounds as though it may not be null terminated.
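If the caller knows how many bytes were received, passing that count along avoids depending on a terminator at all. The following is only a sketch under that assumption: the helper name first_five_upper and the parameter n are hypothetical and do not appear in the original code, and the result is returned instead of being stored in DemoInput.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: data need not be null terminated,
// n is the number of valid bytes supplied by the caller.
std::string first_five_upper(const char* data, std::size_t n)
{
    std::size_t len = 0;
    // Stop after 5 characters, at the end of the data, or at a terminator / line ending.
    while (len < 5 && len < n &&
           data[len] != '\0' && data[len] != '\r' && data[len] != '\n') {
        ++len;
    }
    std::string s(data, len);
    // Cast to unsigned char: passing a negative char to toupper is undefined behaviour.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}
```

With this shape, cast could simply assign the returned string to DemoInput.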

Related

String Rev function, strange behavior for out of bounds exception (c++)

I was playing with string functions and wrote the following one. The first character is obviously written to a position in the ret string that is out of bounds, but instead of an exception I get a string that has one extra character.
std::string StringManipulations::rev(std::string s)
{
std::string ret(s.size(), ' ');
for (int i = 0; i < s.size(); i++)
{
std::string ch;
ch.push_back(s[i]);
int place = s.size() -i;
ret.replace(place,1,ch);
}
return ret;
}
By mistake, I write to a position that is one past the size I gave the string at the beginning of the function.
Why don't we get an error?
s = StringManipulations::rev("abcde");
std::cout << s.size();
std::cout << s;
output is : 6 _edcba
any help ?
solved: adding ch as a String adds a null terminator automatically, and by doing so we can get a new string with size+1.
C++ has a zero-overhead rule.
This means that no overhead (such as checking whether an index is in bounds) should be incurred unless you ask for it.
You don't get an exception because C++ simply doesn't verify that the index is valid.
The extra character might have something to do with (regular) C strings.
In C, strings are arrays of type char (char*) without a defined size.
The end of a string is denoted by a null terminator.
C++ strings are backwards compatible, meaning that they have a null terminator too.
It's possible that you replaced the terminator with another character but the next byte was also zero, meaning that you added one more char.
In addition to the information above about null terminators, another answer to your question is that the docs say replace will only throw if the position is greater than the string size, rather than when the replaced range runs beyond the end of the string. A position equal to the size is therefore valid, and the replacement text is appended.
string replace api
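The boundary behaviour of std::string::replace can be checked with a small sketch. The two function names below are hypothetical and exist only for illustration; the calls themselves are the standard replace(pos, len, const char*) overload, which throws std::out_of_range only when pos is greater than size().

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Size of a 5-character string after replace() is called with pos == size().
std::size_t size_after_replace_at_end()
{
    std::string ret(5, ' ');
    // pos == size() is allowed: there is nothing to replace, so "a" is appended.
    ret.replace(ret.size(), 1, "a");
    return ret.size();
}

// True if replace() throws std::out_of_range when pos > size().
bool replace_past_end_throws()
{
    std::string ret(5, ' ');
    try {
        ret.replace(ret.size() + 1, 1, "a");
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

This matches the rev function in the question: its first iteration calls replace at position s.size(), which appends a character and makes the result one longer than the input.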

C++: How to change multiple consecutive \n into only one \n?

CString str = _T("111\n\n\n222");
How to change the above multiple \n into only one \n?
Cannot use Replace directly, because the number of \n is not fixed
while (str.Replace(_T("\n\n"), _T("\n")) > 0)
;
You can use CString::GetBuffer to obtain a buffer that you can modify. The corresponding CString::ReleaseBuffer allows you to specify a new length for the string.
If you want to remove consecutive characters, you can do this easily by simply walking through the string and rewriting its characters. Any time you see a character that you wish to remove, simply don't write it and don't update the end-position of the string.
Here's a general-purpose function to remove some number of consecutive characters from a CString:
void LimitConsecutiveCharacters(CString& str, TCHAR ch, int maxConsecutive = 1)
{
// LPTSTR is already a pointer to TCHAR, so no extra * is needed.
LPTSTR begin = str.GetBuffer(0);
LPTSTR end = begin;
int consecutive = 0;
for (LPTSTR pos = begin; *pos != _T('\0'); ++pos)
{
if (*pos == ch)
{
if (consecutive >= maxConsecutive)
continue;
++consecutive;
}
else
{
consecutive = 0;
}
*end++ = *pos;
}
int newLength = static_cast<int>(end - begin);
str.ReleaseBuffer(newLength);
}
As you can see above, it keeps a count of how many consecutive values it has seen for the target character. If the maximum number of consecutive characters is reached, then it simply moves to the next loop iteration. Any time it sees some other character, the "consecutive" count resets.
The end tracks the position that is being written to, which might even be the same position you're reading from, if you've not removed any characters. At the end, some simple pointer arithmetic calculates the new string length and calls CString::ReleaseBuffer.
An example invocation would be:
CString str = _T("111\n\n\n222");
LimitConsecutiveCharacters(str, _T('\n'));
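The same read-position / write-position idea can be tried without MFC or ATL. This is a sketch only: it swaps CString for std::string so that it compiles anywhere, and the name limit_consecutive is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Keeps at most maxConsecutive copies of ch in a row, rewriting the string in place.
void limit_consecutive(std::string& str, char ch, int maxConsecutive = 1)
{
    std::size_t write = 0;   // next position to write to
    int consecutive = 0;     // length of the current run of ch
    for (std::size_t read = 0; read < str.size(); ++read) {
        if (str[read] == ch) {
            if (consecutive >= maxConsecutive)
                continue;    // run is already long enough: skip, do not advance write
            ++consecutive;
        } else {
            consecutive = 0;
        }
        str[write++] = str[read];
    }
    str.resize(write);       // plays the role of ReleaseBuffer(newLength)
}
```

Calling limit_consecutive(s, '\n') on "111\n\n\n222" leaves "111\n222" in s.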
You can convert your CString into a std::wstring, use regex_replace and then convert back to CString.
The patterns for the regular expression would be something like:
find what: L"\n+"
replace by: L"\n"
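A minimal sketch of the regex step, assuming the text is already in a std::wstring (the conversion to and from CString is left out, and the function name collapse_newlines is hypothetical):

```cpp
#include <regex>
#include <string>

// Replaces every run of one or more newlines with a single newline.
std::wstring collapse_newlines(const std::wstring& text)
{
    static const std::wregex runs(L"\n+");
    return std::regex_replace(text, runs, L"\n");
}
```

This is shorter than the buffer-walking version but builds a new string and pays for the regex engine, so it is mainly attractive when the pattern may become more complicated later.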

How to check the contents of a LPTSTR string?

I'm trying to understand why a segmentation fault (SIGSEGV) occurs during the execution of this piece of code. The error occurs when the condition of the while statement is tested, not at the first iteration but at the second.
LPTSTR arrayStr[STR_COUNT];
LPTSTR inputStr;
LPTSTR str;
// calls a function from external library
// in order to set the inputStr string
set_input_str(param1, (char*)&inputStr, param3);
str = inputStr;
while( *str != '\0' )
{
if( debug )
printf("String[%d]: %s\n", i, (char*)str);
arrayStr[i] = str;
str = str + strlen((char*)str) + 1;
i++;
}
After reading this answer, I have done some research on the internet and found this article, so I tried to modify the above code, using this piece of code read in this article (see below). However, this change did not solve the problem.
for (LPTSTR pszz = pszzStart; *pszz; pszz += lstrlen(pszz) + 1) {
... do something with pszz ...
}
As assumed in this answer, it seems that the code expects double null terminated arrays of string. Therefore, I wonder how I could check the contents of the inputStr string, in order to check if it actually contains only one null terminator char.
NOTE: the number of characters in the string printed from printf instruction is twice the value returned by the lstrlen(str) function call at the first iteration.
OK, now that you've included the rest of the code it is clear that it is indeed meant to parse a set of consecutive strings. The problem is that you're mixing narrow and wide string types. All you need to do to fix it is change the variable definitions (and remove the casts):
char *arrayStr[STR_COUNT];
char *inputStr;
char *str;
// calls a function from external library
// in order to set the inputStr string
set_input_str(param1, &inputStr, param3);
str = inputStr;
while( *str != '\0' )
{
if( debug )
printf("String[%d]: %s\n", i, str);
arrayStr[i] = str;
str = str + strlen(str) + 1;
i++;
}
Specifically, the issue was occurring on this line:
while( *str != '\0' )
since you hadn't cast str to char * the comparison was looking for a wide nul rather than a narrow nul.
str = str + strlen(str) + 1;
You go out of bounds, change to
str = str + 1;
or simply:
str++;
Of course, you are inconsistently using LPTSTR and strlen, the latter assuming TCHAR = char.
In any case, strlen returns the length of the string, which is the number of characters it contains not including the nul character.
Your arithmetic is out by one but you know you have to add one to the length of the string when you allocate the buffer.
Here however you are starting at position 0 and adding the length which means you are at position len which is the length of the string. Now the string runs from offset 0 to offset len - 1 and offset len holds the null character. Offset len + 1 is out of bounds.
Sometimes you might get away with reading it, if there is extra padding, but it is undefined behaviour and here you got a segfault.
This looks to me like code that expects double null terminated arrays of strings. I suspect that you are passing a single null terminated string.
So you are using something like this:
const char* inputStr = "blah";
but the code expects two null terminators. Such as:
const char* inputStr = "blah\0";
or perhaps an input value with multiple strings:
const char* inputStr = "foo\0bar\0";
Note that these final two strings are indeed double null terminated. Although only one null terminator is written explicitly at the end of the string, the compiler adds another one implicitly.
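For narrow strings, walking such a double null terminated list can be sketched as follows. The function name split_double_null is hypothetical, and the sketch assumes the input really does end with an empty string, as in the last two literals above.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Collects the strings of a double null terminated list of narrow strings.
std::vector<std::string> split_double_null(const char* list)
{
    std::vector<std::string> out;
    // An empty string (a null right where a string should start) ends the list.
    for (const char* p = list; *p != '\0'; p += std::strlen(p) + 1) {
        out.push_back(p);
    }
    return out;
}
```

Passing a buffer with only a single terminator makes the loop step past the end of the data after the first string, which is consistent with a crash on the second iteration.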
Your question edit throws a new spanner in the works. The cast in
strlen((char*)str)
is massively dubious. If you need to cast then the cast must be wrong. One wonders what LPTSTR expands to for you. Presumably it expands to wchar_t* since you added that cast to make the code compile. And if so, then the cast does no good. You are lying to the compiler (str is not char*) and lying to the compiler never ends well.
The reason for the segmentation fault is already given by Alter's answer. However, I'd like to add that the usual style of parsing a C-style string is more elegant and less verbose
while (char ch = *str++)
{
// other instructions
// ...
}
The scope of ch is limited to the body of the loop.
Aside: Either tag the question as C or C++ but not both, they're different languages.

Parsing a character array with several null terminated characters into different strings - C++

I asked this question before but with less information than I have now.
What I essentially have is a data block of type char. That block contains filenames that I need to format and put into a vector. I initially thought the filenames in this char block were separated by three spaces. Now I realize the separators are '\0' null characters. So the solution that was provided was fantastic for the example I gave when I thought that there were spaces rather than null chars.
Here is what the structure looks like. Also, I should point out I DO have the size of the character data block.
filename1.bmp\0\0\0brick.bmp\0\0\0toothpaste.gif\0\0\0
The way the best solution did it was this:
// The stringstream will do the dirty work and deal with the spaces.
std::istringstream iss(s);
// Your filenames will be put into this vector.
std::vector<std::string> v;
// Copy every filename to a vector.
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(v));
// They are now in the vector, print them or do whatever you want with them!
for(int i = 0; i < v.size(); ++i)
std::cout << v[i] << "\n";
This works fantastically for my original question, but not now that the separators turn out to be null chars instead of spaces. Is there any way to make the above example work? I tried replacing the null chars in the array with spaces but that didn't work.
Any ideas on the best way to format this char block into a vector of strings?
Thanks.
If you know your filenames don't have embedded "\0" characters in them, then this should work. (untested)
const char * buffer = "filename1.bmp\0\0\0brick.bmp\0\0\0toothpaste.gif\0\0\0";
int size_of_buffer = 1234; //Or whatever the real value is
const char * end_of_buffer = buffer + size_of_buffer;
std::vector<std::string> v;
while( buffer < end_of_buffer )
{
v.push_back( std::string(buffer) ); //Copies up to the first null
buffer = buffer + v.back().size() + 3; //Skip the filename and its three nulls
}
If they do have embedded null characters in the filename you'll need to be a little cleverer.
Something like this should work. (untested)
const char * start_of_filename = buffer;
while( start_of_filename < end_of_buffer )
{
//Create a cursor at the start of the current filename and move it until we hit three consecutive nulls
//(assumes every filename, including the last one, is followed by three nulls)
const char * scan_cursor = start_of_filename;
while( scan_cursor + 2 < end_of_buffer && !(scan_cursor[0]=='\0' && scan_cursor[1]=='\0' && scan_cursor[2]=='\0') )
{
++scan_cursor;
}
//From our start to the cursor is our word.
v.push_back( std::string(start_of_filename,scan_cursor) );
//Move on to the next word
start_of_filename = scan_cursor+3;
}
If spaces would be a suitable separator, you could just replace the null characters by spaces:
std::replace(s.begin(), s.end(), '\0', ' ');
... and go from there. However, I'd suspect that you really need to use the null characters as separators as file names typically can include spaces. In this case, you could either use std::getline() with '\0' as the end of line or use the find() and substr() members of the string itself. The latter would look something like this:
std::vector<std::string> v;
std::string const null(1, '\0');
for (std::string::size_type pos(0); (pos = s.find_first_not_of(null, pos)) != s.npos; )
{
std::string::size_type end = s.find(null, pos);
v.push_back(s.substr(pos, end - pos));
pos = end;
}
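The std::getline() alternative mentioned above can be sketched like this. The function name split_on_nulls is hypothetical; the sketch assumes the block and its size are available, as the question states, and it builds the string with an explicit length so the embedded nulls are kept.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a block of null separated filenames into a vector of strings.
std::vector<std::string> split_on_nulls(const char* data, std::size_t size)
{
    const std::string block(data, size);  // length-aware: embedded nulls are copied too
    std::istringstream iss(block);
    std::vector<std::string> names;
    std::string name;
    while (std::getline(iss, name, '\0')) {
        if (!name.empty())                // runs of nulls produce empty fields; skip them
            names.push_back(name);
    }
    return names;
}
```

Because empty fields are skipped, the same code works whether the names are separated by one null or by three.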

Trimming UTF8 buffer

I have a buffer with UTF8 data. I need to remove the leading and trailing spaces.
Here is the C code which does it (in place) for ASCII buffer:
char *trim(char *s)
{
while( isspace(*s) )
memmove( s, s+1, strlen(s) );
while( *s && isspace(s[strlen(s)-1]) )
s[strlen(s)-1] = 0;
return s;
}
How to do the same for UTF8 buffer in C/C++?
P.S.
Thanks for the performance tip regarding strlen(). Back to the UTF8 specifics: what if I need to remove all spaces altogether, not only at the beginning and at the tail? Also, I may need to remove all characters with ASCII code < 32. Is there anything specific to the UTF8 case here, like using mbstowcs()?
Do you want to remove all of the various Unicode spaces too, or just ASCII spaces? In the latter case you don't need to modify the code at all.
In any case, the method you're using that repeatedly calls strlen is extremely inefficient. It turns a simple O(n) operation into at least O(n^2).
Edit: Here's some code for your updated problem, assuming you only want to strip ASCII spaces and control characters:
unsigned char *in, *out;
for (out = in; *in; in++) if (*in > 32) *out++ = *in;
*out = 0;
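The byte-wise filter is safe for UTF-8 because every byte of a multibyte sequence has a value of 0x80 or higher, so it can never be mistaken for an ASCII space or control character. A sketch of the same loop on a std::string (the function name is hypothetical, and std::string is used here in place of the raw buffer):

```cpp
#include <string>

// Removes ASCII space and control bytes (values 0..32) in place.
void strip_ascii_space_and_control(std::string& s)
{
    std::string::size_type out = 0;
    for (std::string::size_type in = 0; in < s.size(); ++in) {
        const unsigned char byte = static_cast<unsigned char>(s[in]);
        // Keeps every byte above the space character, which includes
        // all UTF-8 lead and continuation bytes (0x80 and up).
        if (byte > 32)
            s[out++] = s[in];
    }
    s.resize(out);
}
```

Non-ASCII Unicode spaces such as U+00A0 are left untouched by this approach; removing those would require decoding the code points.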
strlen() scans to the end of the string, so calling it multiple times, as in your code, is very inefficient.
Try looking for the first non-space and the last non-space and then memmove the substring:
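char *trim(char *s)
{
char *first;
char *last;
first = s;
// Cast to unsigned char: UTF-8 bytes >= 0x80 are negative when char is signed,
// and passing a negative value to isspace is undefined behaviour.
while(isspace((unsigned char)*first))
++first;
// last points one past the final character that is kept
last = first + strlen(first);
while(last > first && isspace((unsigned char)last[-1]))
--last;
memmove(s, first, last - first);
s[last - first] = '\0';
return s;
}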
Also remember that the code modifies its argument.