c++ Remove everything in string prior to sequence including sequence - c++

I've been writing a program that receives data from other sources across a network, and I need to sanitize the data before I send it to be processed. Previously, I had been doing it based on size, as below:
char data[max_length];
boost::system::error_code error;
size_t length = sock->read_some( boost::asio::buffer( data ), error );
std::stringstream ss;
for( int i = 0; i < max_length; i++ ) {
ss << data[i];
}
std::vector<int> idata;
std::string s2 = ss.str();
s2.erase( 0, 255 );
But the headers I need to remove are of variable length. So after doing some digging, I found I could remove them by finding the sequence of characters I know they'll end in - in this case \r\n\r\n - and removing everything up until then using size_t like so:
size_t p = s2.find( "\r\n\r\n" );
s2.erase( 0, p );
But that still leaves the \r\n\r\n at the beginning of my string which, at best, throws off my data handling later, and at worst, might cause issues down the line, as there are segments of my program that don't respond well to whitespace.
So my question is this: Is there a better way to do this that removes everything up to and including the specified sequence of characters? Can I just do p = p + 4;? Is that even possible with the size_t type?

Yes, you can write p + 4, since size_t is an (unsigned) integer type.
By the way, you might also want to pass data directly into a std::string constructor, rather than use std::stringstream ss.
Edit: To explain in more detail, it would look something like this:
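char data[max_length];
// Read data and ensure that it is null-terminated ...
std::string s2(data); // Call the std::string constructor that inputs a null-terminated C string.
size_t p = s2.find("\r\n\r\n");
if (p != std::string::npos) { // find() returns npos when the sequence is absent
s2.erase(0, p + 4); // 4 is the length of "\r\n\r\n"
}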
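As a minimal sketch of the same idea, the delimiter length can be taken from the delimiter itself instead of hard-coding 4, and the string can be built from the number of bytes actually received rather than relying on a null terminator. The helper names `strip_through` and `body_from_buffer` are hypothetical, not part of the question or answer:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns everything after the first occurrence of
// `delimiter`, or the input unchanged if the delimiter is absent.
std::string strip_through(const std::string& input, const std::string& delimiter)
{
    const std::size_t pos = input.find(delimiter);
    if (pos == std::string::npos) {
        return input;  // no header terminator found; nothing to strip
    }
    return input.substr(pos + delimiter.size());
}

// Hypothetical helper: builds the string from exactly `length` received
// bytes (for example, the value returned by read_some), then strips the header.
std::string body_from_buffer(const char* data, std::size_t length)
{
    return strip_through(std::string(data, length), "\r\n\r\n");
}
```

With the question's variables, this would be called as body_from_buffer(data, length).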

Related

Convert from vector<unsigned char> to char* includes garbage data

I'm trying to base64 decode a string, then convert that value to a char array for later use. The decode works fine, but then I get garbage data when converting.
Here's the code I have so far:
std::string encodedData = "VGVzdFN0cmluZw=="; //"TestString"
std::vector<BYTE> decodedData = base64_decode(encodedData);
char* decodedChar;
decodedChar = new char[decodedData.size() +1]; // +1 for the final 0
decodedChar[decodedData.size() + 1] = 0; // terminate the string
for (size_t i = 0; i < decodedData.size(); ++i) {
decodedChar[i] = decodedData[i];
}
BYTE is a typedef for unsigned char, as taken from this SO answer. The base64 code is also from this answer (the most upvoted answer, not the accepted answer).
When I run this code, I get the following value in the VisualStudio Text Visualiser:
TestStringÍ
I've also tried other conversion methods, such as:
char* decodedChar = reinterpret_cast< char *>(&decodedData[0]);
Which gives the following:
TestStringÍÍÍýýýýÝÝÝÝÝÝÝ*b4d“
Why am I getting the garbage data at the end of the string? What am I doing wrong?
EDIT: clarified which answer in the linked question I'm using
char* decodedChar;
decodedChar = new char[decodedData.size() +1]; // +1 for the final 0
Why would you manually allocate a buffer and then copy to it when you have std::string available that does this for you?
Just do:
std::string encodedData = "VGVzdFN0cmluZw=="; //"TestString"
std::vector<BYTE> decodedData = base64_decode(encodedData);
std::string decodedString { decodedData.begin(), decodedData.end() };
std::cout << decodedString << '\n';
If you need a char * out of this, just use .c_str()
const char* cstr = decodedString.c_str();
If you need to pass this on to a function that takes char* as input, for example:
void someFunc(char* data);
//...
//call site
someFunc( &decodedString[0] );
We have a TON of functions, abstractions, and containers in C++ that were made to improve upon the C language, so that programmers wouldn't have to write things by hand and make the same mistakes every time they code. It is best to use those facilities wherever we can, instead of writing raw loops for simple conversions like this.
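You are writing beyond the last element of your allocated array, which is undefined behavior (according to the C++ standard, literally anything can happen). The array has decodedData.size() + 1 elements, so its last valid index is decodedData.size(). You need decodedChar[decodedData.size()] = 0;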
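Both answers can be combined into a small sketch. It assumes BYTE is unsigned char, as in the question; the helper names `bytes_to_string` and `bytes_to_c_buffer` are made up for illustration:

```cpp
#include <string>
#include <vector>

using BYTE = unsigned char;  // assumption: matches the typedef used in the question

// Copies the decoded bytes into a std::string; the string tracks its own
// length, so no manual terminator handling is needed.
std::string bytes_to_string(const std::vector<BYTE>& bytes)
{
    return std::string(bytes.begin(), bytes.end());
}

// If a writable, null-terminated buffer is really required, a vector<char>
// avoids the manual new[] and the off-by-one index.
std::vector<char> bytes_to_c_buffer(const std::vector<BYTE>& bytes)
{
    std::vector<char> buffer(bytes.begin(), bytes.end());
    buffer.push_back('\0');  // lands at index bytes.size(), the last valid slot
    return buffer;
}
```

The result of bytes_to_c_buffer can be passed to a function taking char* via buffer.data().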

C++ Send a file containing \0 via sockets without accidental closure

I serialize the file with the code below and send it over Winsock. This works fine with text files, but when I tried to send a JPG, the string contained \0 among its characters, so the socket only sent part of the string, treating \0 as the end. I was considering replacing \0 with something else, but say I replace it with 'xx' and then replace it back on the other end: what if the file had natural occurrences of 'xx' that then get corrupted? Sure, I could use a long, unlikely sequence, but that bloats the file.
Any help appreciated.
char* read_file(string path, int& len)
{
std::ifstream infile(path);
infile.seekg(0, infile.end);
size_t length = infile.tellg();
infile.seekg(0, infile.beg);
len = length;
char* buffer = new char[len]();
infile.read(buffer, length);
return buffer;
}
string load_to_buffer(string file)
{
char* img;
int ln;
img = read_file(file, ln);
string s = "";
for (int i = 1; i <= ln; i++){
char c = *(img + i);
s += c;
}
return s;
}
Probably somewhere in your code (not shown in what you posted) you use strlen() to compute how much data to send, or you pass std::string::c_str() to a function that expects a null-terminated string. This truncates the data, because such functions stop at the first \0. (std::string::length() itself does count embedded \0 characters.)
std::string can hold binary data, but std::vector<char> states the intent more clearly. Use it instead, remove the new[] stuff, and always pass the explicit size when sending.
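A minimal sketch of that suggestion, assuming the whole file fits in memory. The helper name `read_file_bytes` is hypothetical. Note that the stream is opened in binary mode, which the question's code does not do:

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file as raw bytes.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file_bytes(const std::string& path)
{
    std::ifstream infile(path, std::ios::binary);  // binary mode: no newline translation
    return std::vector<char>(std::istreambuf_iterator<char>(infile),
                             std::istreambuf_iterator<char>());
}
```

When sending, the length should come from bytes.size() rather than from strlen(), for example by passing bytes.data() and static_cast<int>(bytes.size()) to the socket's send call, so that embedded \0 bytes are transmitted like any other byte.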

string <-> byte[] conversions

I'm implementing a system that uses libcrafter and crypto++ to transmit specific frames on the network. But the problem I'm stuck with isn't at this level at all.
It's about conversion between types used in these libraries.
1) At the emission (solved)
I'm trying to convert the Crafter::byte array to a std::string, in order to put this message into a network frame (as an initialization vector for an AES encryption/decryption).
Moreover, the iv must be zeroed, and I can't initialize it properly (despite the answers here or there).
EDIT: to initialize it to the characters 00, I had to use the hexadecimal value 0x30 (the ASCII code for '0'). And to convert it to a std::string, I had to provide the length, i.e. ivLen2 (thanks for the answers).
Here's what I do :
const int ivLen2 = 2;
std::string encrypted_message("whatever");
Crafter::byte iv[ivLen2]={0x30, 0x30}; // crypto salt of 2 bytes at 0.
std::string ivStr( reinterpret_cast< char const* >(iv), ivLen2 ) ;
string mess2send = ivStr + encrypted_message;
And if I display them, with this :
cout<<"iv[0] : "<<iv[0]<<endl; // display 0
cout<<"mess2send : "<<mess2send<<endl; // display 00whatever
Why don't I simply create a zeroed string and send it? Because I want generic functions and reusable code.
2) At the reception (pending)
Unsurprisingly, I have to do the opposite. I get the IV and the message concatenated within a vector<byte>* payload, and I have to extract the IV as a byte array and the message as a string.
The message isn't actually the problem, given that std::string is close to vector.
Here's what I tried in order to retrieve the IV:
Crafter::byte iv[ ivLen2 ];
for (int i = 0; i < ivLen2; i++)
{
iv[i] = (byte)payload->at(i);
}
std::string iv_rcv( reinterpret_cast< char const* >(iv) ) ;
And to display them, I do (in the same loop) :
cout<<iv[i];
But it gives me a non-ASCII character.
I've also tried this (following this and this answers) :
Crafter::byte* iv;
std::string iv_rcv( payload->begin(), payload->begin()+ivLen2 ) ;
iv = (byte*)iv_rcv.c_str();
But it doesn't give me the supposed initialized values...
Does anybody have a clue ? Is my code wrong ?
I don't think this will work:
const int ivLen2 = 2;
std::string encrypted_message("whatever");
Crafter::byte iv[ivLen2]={0x00, 0x00}; // crypto salt of 2 bytes
std::string ivStr( reinterpret_cast< char const* >(iv) ) ;
How does the std::string know how much data to copy from the iv pointer?
You have to use a constructor that takes the length of the data like:
std::string ivStr( reinterpret_cast< char const* >(iv), ivLen2 ) ;
The pointer-only constructor is for strings that are terminated by a null character. A zeroed IV starts with a null byte, so that constructor would produce an empty string. Unless your data is a null-terminated string, you must pass the length.
Try this:
const int ivLen2 = 2;
std::string encrypted_message("whatever");
Crafter::byte iv[ivLen2]={0x00, 0x00}; // crypto salt of 2 bytes
std::string ivStr( reinterpret_cast< char const* >(iv), ivLen2 ) ;
std::string mess2send = ivStr + encrypted_message;
std::cout << (int)mess2send[0] << (int)mess2send[1] << mess2send.substr(2) << '\n';
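For the reception side, the same rule applies in reverse: copy by explicit length instead of relying on a terminator. The following is only a sketch; `Byte` stands in for Crafter::byte (assumed here to be unsigned char), and `SplitPayload` and `split_payload` are hypothetical names:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Byte = unsigned char;  // assumption: stands in for Crafter::byte

struct SplitPayload {
    std::vector<Byte> iv;  // first iv_len bytes of the payload
    std::string message;   // everything after the IV
};

// Splits a received payload into IV and message using explicit lengths,
// so zero bytes in the IV are preserved.
SplitPayload split_payload(const std::vector<Byte>& payload, std::size_t iv_len)
{
    if (payload.size() < iv_len) {
        throw std::invalid_argument("payload shorter than IV");
    }
    const auto split = payload.begin() + static_cast<std::ptrdiff_t>(iv_len);
    SplitPayload result;
    result.iv.assign(payload.begin(), split);
    result.message.assign(split, payload.end());
    return result;
}
```

When printing the IV for debugging, casting each byte to int (as in the answer above) shows its numeric value instead of a possibly unprintable character.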

Parsing a character array with several null terminated characters into different strings - C++

I asked this question before but with less information than I have now.
What I essentially have is a data block of type char. That block contains filenames that I need to extract and put into a vector. I initially thought the filenames in this block were separated by three spaces. Now I realize the separators are '\0' null characters. So the solution that was provided was fantastic for the example I gave, when I thought there were spaces rather than null chars.
Here is what the structure looks like. Also, I should point out I DO have the size of the character data block.
filename1.bmp\0\0\0brick.bmp\0\0\0toothpaste.gif\0\0\0
The way the best solution did it was this:
// The stringstream will do the dirty work and deal with the spaces.
std::istringstream iss(s);
// Your filenames will be put into this vector.
std::vector<std::string> v;
// Copy every filename to a vector.
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(v));
// They are now in the vector, print them or do whatever you want with them!
for(int i = 0; i < v.size(); ++i)
std::cout << v[i] << "\n";
This works fantastically for my original question, but not now that the separators are null chars instead of spaces. Is there any way to make the above example work? I tried replacing the null chars in the array with spaces, but that didn't work.
Any ideas on the best way to format this char block into a vector of strings?
Thanks.
If you know your filenames don't have embedded "\0" characters in them, then this should work. (untested)
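// The literal's implicit terminator supplies the third trailing '\0'.
const char buffer_data[] = "filename1.bmp\0\0\0brick.bmp\0\0\0toothpaste.gif\0\0";
const char * buffer = buffer_data;
const char * end_of_buffer = buffer + sizeof(buffer_data); //Or buffer + the real size
std::vector<std::string> v;
while( buffer < end_of_buffer )
{
std::string name( buffer ); // copies up to the first '\0'
v.push_back( name );
buffer += name.size() + 3; // skip the name and its three null terminators
}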
If they do have embedded null characters in the filename you'll need to be a little cleverer.
Something like this should work. (untested)
const char * start_of_filename = buffer;
while( start_of_filename < end_of_buffer )
{
//Create a cursor at the start of the current filename and move it until we hit three consecutive nulls
const char * scan_cursor = start_of_filename;
while( scan_cursor + 3 <= end_of_buffer && ( scan_cursor[0]!='\0' || scan_cursor[1]!='\0' || scan_cursor[2]!='\0' ) )
{
++scan_cursor;
}
//From our start to the cursor is our word.
v.push_back( std::string(start_of_filename,scan_cursor) );
//Move on to the next word
start_of_filename = scan_cursor+3;
}
If spaces would be a suitable separator, you could just replace the null characters by spaces:
std::replace(s.begin(), s.end(), '\0', ' ');
... and go from there. However, I'd suspect that you really need to use the null characters as separators as file names typically can include spaces. In this case, you could either use std::getline() with '\0' as the end of line or use the find() and substr() members of the string itself. The latter would look something like this:
std::vector<std::string> v;
std::string const null(1, '\0');
for (std::string::size_type pos(0); (pos = s.find_first_not_of(null, pos)) != s.npos; )
{
std::string::size_type end = s.find(null, pos); // npos if the last name is unterminated
v.push_back(s.substr(pos, end - pos));
pos = end;
}
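The std::getline() alternative mentioned above could look roughly like this. It is a sketch that assumes the block's size is known, as the question states; the function name `split_on_nulls` is hypothetical:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a block of null-separated names into a vector of strings.
std::vector<std::string> split_on_nulls(const char* data, std::size_t size)
{
    // Construct with an explicit size so embedded '\0' bytes are kept.
    const std::string block(data, size);
    std::istringstream iss(block);

    std::vector<std::string> names;
    std::string token;
    while (std::getline(iss, token, '\0')) {  // '\0' acts as the "end of line"
        if (!token.empty()) {  // consecutive nulls yield empty tokens; skip them
            names.push_back(token);
        }
    }
    return names;
}
```

Because every run of nulls is treated as one separator, this works whether the names are separated by one null or by three.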

How to capture length of sscanf'd string?

I'm parsing a string that follows a predictable pattern:
1 character
an integer (one or more digits)
1 colon
a string, whose length came from #2
For example:
s5:stuff
I can see easily how to parse this with PCRE or the like, but I'd rather stick to plain string ops for the sake of speed.
I know I'll need to do it in 2 steps because I can't allocate the destination string until I know its length. My problem is gracefully getting the offset for the start of said string. Some code:
unsigned start = 0;
char type = serialized[start++]; // get the type tag
int len = 0;
char* dest = NULL;
char format[20];
//...
switch (type) {
//...
case 's':
// Figure out the length of the target string...
sscanf(serialized + start, "%d", &len);
// <code type='graceful'>
// increment start by the STRING LENGTH of whatever %d was
// </code>
// Don't forget to skip over the colon...
++start;
// Build a format string which accounts for length...
sprintf(format, "%%%ds", len);
// Finally, grab the target string...
sscanf(serialized + start, format, string);
break;
//...
}
That code is roughly taken from what I have (which isn't complete because of the issue at hand), but it should get the point across. Maybe I'm taking the wrong approach entirely. What's the most graceful way to do this? The solution can be in either C or C++ (and I'd actually like to see the competing methods if there are enough responses).
You can use the %n conversion specifier, which doesn't consume any input - instead, it expects an int * parameter, and writes the number of characters consumed from the input into it:
int consumed;
sscanf(serialized + start, "%d%n", &len, &consumed);
start += consumed;
(But don't forget to check that sscanf() returned > 0!)
Use the %n format specifier to write the number of characters read so far to an integer argument.
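Putting %n to work on the whole record, a sketch might look like the following. The function name `parse_tagged` is hypothetical, and the sketch assumes the input is a null-terminated string of the form shown in the question:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Parses "<type><len>:<payload>", e.g. "s5:stuff". Returns true on success.
bool parse_tagged(const char* serialized, char& type, std::string& value)
{
    int len = 0;
    int consumed = 0;
    // %c reads the tag, %d the length, the literal ':' must match, and %n
    // stores the number of characters consumed so far. %n is not counted in
    // the return value, and it is never reached if ':' fails to match, so
    // consumed stays 0 in that case.
    if (std::sscanf(serialized, "%c%d:%n", &type, &len, &consumed) != 2 || consumed == 0) {
        return false;
    }
    // Reject negative lengths and lengths that run past the end of the input.
    if (len < 0 || std::strlen(serialized + consumed) < static_cast<std::size_t>(len)) {
        return false;
    }
    value.assign(serialized + consumed, static_cast<std::size_t>(len));
    return true;
}
```

Copying exactly len characters with assign() also avoids building a second format string with sprintf.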
Here's a C++ solution, it could be better, and is hard-coded specifically to deal with your example input, but shouldn't require much modification to get working.
std::stringstream ss;
char type;
unsigned length;
char dummy;
std::string value;
ss << "s5:Helloxxxxxxxxxxx";
ss >> type;
ss >> length;
ss >> dummy;
ss.width(length);
ss >> value;
std::cout << value << std::endl;
Disclaimer:
I'm a noob at C++.
You can probably just use atoi which will ignore the colon.
e.g. len = atoi(serialized + start);
The only thing with atoi is that if it returns zero it could mean either the conversion failed, or that the length was truly zero. So it's not always the most appropriate function.
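If that ambiguity matters, strtol is one alternative: it reports where parsing stopped, which both distinguishes "no digits" from a genuine zero and gives the offset the question asks for. A minimal sketch, with the hypothetical helper name `parse_length`:

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses a non-negative decimal length at `text`.
// On success, stores the value in `len`, points `rest` just past the
// digits, and returns true.
bool parse_length(const char* text, long& len, const char*& rest)
{
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text || errno == ERANGE || value < 0) {
        return false;  // no digits, out of range, or negative
    }
    len = value;
    rest = end;  // for "5:stuff" this points at the ':'
    return true;
}
```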
If you replace your colon with a space, scanf will stop on it. You can then read the size, malloc that many bytes, and run another scanf to get the rest of the string:
int main (int argc, const char * argv[]) {
char foo[20];
char *test;
scanf("%s",foo); //"hello world"
printf("foo = %s\n", foo);//prints hello
//get size
test = malloc(sizeof(char)* 10);//replace 10 with your string size
scanf("%s", test);
printf("test = %s\n", test);//prints world
return 0;
}
Seems like the format is overspecified... (using a variable length field to specify the length of a variable length field).
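If you're using glibc (for example, GCC on Linux), I'd suggest the GNU a modifier, which makes sscanf allocate the destination buffer itself (POSIX.1-2008 standardizes the same feature as %ms):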
if (sscanf(serialized,"%c%d:%as",&type,&len,&dest)<3) return -1;
/* use type, dest; ignore len */
free(dest);
return 0;