String issue with assert on erase - c++

I am developing a program in C++, using the string container , as in std::string to store network data from the socket (this is peachy), I receive the data in a maximum possible 1452 byte frame at a time, the protocol uses a header that contains information about the data area portion of the packets length, and header is a fixed 20 byte length. My problem is that a string is giving me an unknown debug assertion, as in , it asserts , but I get NO message about the string. Now considering I can receive more than a single packet in a frame at a any time, I place all received data into the string , reinterpret_cast to my data struct, calculate the total length of the packet, then copy the data portion of the packet into a string for regex processing, At this point i do a string.erase, as in mybuff.Erase(totalPackLen); <~ THIS is whats calling the assert, but totalpacklen is less than the strings size.
Is there some convention I am missing here? Or is it that the std::string really is an inappropriate choice here? Ty.
Fixed it on my own. Rolled my own VERY simple buffer with a few C calls :)
int ret = recv(socket,m_buff,0);
if(ret > 0)
{
BigBuff.append(m_buff,ret);
while(BigBuff.size() > 16){
Header *hdr = reinterpret_cast<Header*>(&BigBuff[0]);
if(ntohs(hdr->PackLen) <= BigBuff.size() - 20){
hdr->PackLen = ntohs(hdr->PackLen);
string lData;
lData.append(BigBuff.begin() + 20,BigBuff.begin() + 20 + hdr->PackLen);
Parse(lData); //regex parsing helper function
BigBuff.erase(hdr->PackLen + 20); //assert here when len is packlen is 235 and string len is 1458;
}
}
}

From the code snippet you provided it appears that your packet comprises a fixed-length binary header followed by a variable length ASCII string as a payload. Your first mistake is here:
BigBuff.append(m_buff,ret);
There are at least two problems here:
1. Why the append? You presumably have dispatched with any previous messages. You should be starting with a clean slate.
2. Mixing binary and string data can work, but more often than not it doesn't. It is usually better to keep the binary and ASCII data separate. Don't use std::string for non-string data.
Append adds data to the end of the string. The very next statement after the append is a test for a length of 16, which says to me that you should have started fresh. In the same vein you do that reinterpret cast from BigBuff[0]:
Header *hdr = reinterpret_cast<Header*>(&BigBuff[0]);
Because of your use of append, you are perpetually dealing with the header from the first packet received rather than the current packet. Finally, there's that erase:
BigBuff.erase(hdr->PackLen + 20);
Many problems here:
- If the packet length and the return value from recv are consistent the very first call will do nothing (the erase is at but not past the end of the string).
- There is something very wrong if the packet length and the return value from recv are not consistent. It might mean, for example, that multiple physical frames are needed to form a single logical frame, and that in turn means you need to go back to square one.
- Suppose the physical and logical frames are one and the same, you're still going about this all wrong. As noted, the first time around you are erasing exactly nothing. That append at the start of the loop is exactly what you don't want to do.
Serialization oftentimes is a low-level concept and is best treated as such.

Your comment doesn't make sense:
BigBuff.erase(hdr->PackLen + 20); //assert here when len is packlen is 235 and string len is 1458;
BigBuff.erase(hdr->PackLen + 20) will erase from hdr->PackLen + 20 onwards till the end of the string. From the description of the code - seems to me that you're erasing beyond the end of the content data. Here's the reference for std::string::erase() for you.
Needless to say that std::string is entirely inappropriate here, it should be std::vector.

Related

ZLib GZIP Returning Z_BUF_ERROR(-5)

I am using the zlib library (compiled from src) to deflate/inflate gzip/zlib/raw bytes. I have created a wrapper class for decompressing and compressing (Compressor/Decompresser). I have also created several test cases (GZIP, ZLib, Raw, Auto-Detect). The tests pass for Zlib/Raw/Auto-Detect(Zlib), but not for GZip (window bits of 15u | 16u).
Here is my compress function.
std::vector<char> out(zlib->avail_in + 8);
deflateInit2(zlib.get(), Z_DEFAULT_COMPRESSION, Z_DEFLATED, static_cast<int32_t>(mode), 8, Z_DEFAULT_STRATEGY);
zlib->avail_out = out.size();
zlib->next_out = reinterpret_cast<Bytef*>(out.data());
deflate(zlib.get(), Z_FINISH);
out.resize(zlib->total_out + 3);
deflateEnd(zlib.get());
return std::move(out);
And here is decompress
uIntf multiplier = 2;
uIntf currentSize = zlib->avail_in * (multiplier++) * 1000 /* Just to make sure enough output space(will implement loop) */;
std::vector<char> out(currentSize);
inflateInit2(zlib.get(), static_cast<int>(mode));
zlib->avail_out = out.size();
zlib->next_out = reinterpret_cast<Bytef*>(out.data());
inflate(zlib.get(), Z_FINISH);
out.resize(zlib->total_out);
inflateEnd(zlib.get());
return std::move(out);
Input is set in a different function (that is being called), that looks like this. (char* is not being deleted when compress/decompress is called)
zlib->next_in = reinterpret_cast<Bytef*>(bytes);
zlib->avail_in = static_cast<uIntf>(length);
I also have a mode enum
enum class Mode : int32_t {
AUTO = 15u | 32u, // Never used on compress
GZIP = 15u | 16u,
ZLIB = 15,
RAW = -15
};
Note: Test cases with the mode being AUTO (paired with zlib), ZLib, and RAW work. GZip fails the test case. (The test case is just a simple alphanum character array).
Also I debugged the output of the gzip decompress (after it failed) and the output is missing the last 3 characters (y, z, termination character)
Another note:
The constructor of the wrapper classes look like this
zlib->zalloc = Z_NULL;
zlib->zfree = Z_NULL;
zlib->opaque = Z_NULL;
zlib->avail_in = 0;
zlib->next_in = Z_NULL;
First off, a bunch of scattered code fragments with no context makes it impossible to see what's happening. See How to create a Minimal, Reproducible Example for how to provide a decent example.
Second, you are not saying what is returning Z_BUF_ERROR. There aren't even any places in your code fragments where you retain the return values of deflate() or inflate(), so it's not even possible for you to see a Z_BUF_ERROR! You need to at least do something like int ret = deflate(zlib.get(), Z_FINISH); and then check the value of ret.
Third, I cannot tell in your code fragments where or even if you set the input pointer and length. Is the length set to zero before the inits? Or is it set to the data? Or is the data pointer and length set after the inits? See the MRE link above.
Fourth, we don't have the example data that you're using. So we cannot reproduce the error. Again, see the MRE link.
Ok, so making a stab in the dark here, I will guess that deflate() is returning the error. Then the problem is likely that you have not provided enough output space, and you have asked for Z_FINISH, which is telling deflate() you have provided enough output space. In that case, deflate() returning Z_BUF_ERROR means that you didn't. Compression can expand the data if it is not compressible, and gzip adds more header and trailer information than zlib. Your + 8 is inadequate to account for those two things. A zlib header and trailer is six bytes, whereas a gzip header and trailer is at least 18 bytes. The expansion is a multiplier on the input, adding some part of a percent, where you have no multiplier on the length at all.
zlib provides a function for just this purpose, deflateBound(). You would call that after deflateInit() with the size of your input, and it will return the maximum size of the compressed output.
However it is better to call deflate() multiple times in a loop. For most practical applications, it is necessary to call inflate() multiple times in a loop. This is seen in your comment, as well as in your attempt (also inadequate) to account for the possible size of the inflated data by multiplying by a thousand.
You can find a heavily commented example of how to use the zlib functions properly, with loops, at zlib Usage Example.

C++ Winsock Download File Cut off HTTP Header

I'm downloading the bytes of a file from the web using winsock2. so good so far.
I have the problem that I download my bytes including the http header which I don't need and which causes troubles in my files bytecodes.
Example:
I know I can find the position where the header is ending by finding "\r\n\r\n".
But somehow I can't find or at least cut it... :(
int iResponseBytes = 0;
ofstream ofDownloadedFile;
ofDownloadedFile.open(pathonclient, ios::binary);
do {
iResponseBytes = recv(this->Socket, responseBuffer, pageBufferSize, 0);
if (iResponseBytes > 0) // if bytes received
{
ofDownloadedFile.write(responseBuffer, pageBufferSize);
}
else if (iResponseBytes == 0) //Done
{
break;
}
else //fail
{
cout << "Error while downloading" << endl;
break;
}
} while (iResponseBytes > 0);
I tried searching the array / the pointer using strncmp etc.
Hopefully someone can help me.
Best greetings
You have no guarantees, whatsoever, that the \r\n\r\n sequence will be received completely within a single recv() call.
For example, the first recv() call could end up reading everything up until the first two characters of the sequence, \r\n, then your code runs around the loop again, and the second time recv() gets called it receives the remaining \r\n for the initial two bytes received (followed by the first part of the actual content). A small possibility that this might happen, but it cannot be ignored, and must be correctly handled.
If your goal is to trim everything up until the \r\n\r\n, your current approach is not going to work very well.
Instead, what you should do is invest some time studying how file stream buffering actually works. Pontificate, for a moment, how std::istream/std::ostream read/write large chunks of data at a time, but they provide a character-oriented interface. std::istream, for example, reads a buffer's full of file data at a time, placing it into an internal buffer, which your code can then retrieve one character at a time (if it wishes to). How does that work? Think about it.
To do this correctly, you need to implement the same algorithm yourself: recv() from the socket a buffer at a time, then provide a byte-oriented interface, to return the received contents one byte at a time.
Then, the main code becomes a simple loop, reading the streamed socket contents one byte at a time, at which point discarding everything up until the code sees \r\n\r\n becomes trivial (although there are still a few non-obvious gotchas in doing this right, but that can be a new question).
Of course, once the \r\n\r\n gets processed, it is certainly possible to optimize things going forward, by flushing out whatever's still buffered internally, to the output file, and then resume reading from the socket a whole buffer-at-a-time, and copying it to the output file without burning CPU cycles dealing with the byte-oriented interface.

In Oracle OCCI/OCI, should buffer for reading LOBs be larger than actual data? Getting ORA-32116

We are reading data from a CLOB into an std::vector via OCCI. The simplified code looks as follows:
oracle::occi::Clob clob = result.getClob( 3 );
unsigned len = clob.length();
std::vector< unsigned char > result( len );
unsigned have_read = clob.read( len , result.data() , len );
This yields the error ORA-32116, saying that the buffer size (3rd argument of read) should be equal or greater than the amount of data to be read (1st argument of read). This condition is apparently held.
After increasing the buffer size to 4*len:
unsigned have_read = clob.read(len , result.data() , 4 * len);
the operation is performed properly. So far, the values of have_read and len were always identical.
Is there an undocumented extra amount of space needed for the buffer? Or are complete pages needed?
We are using "Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit".
Any clarification on the topic is welcome.
I suspect that you've got multi-byte characters in your CLOB.
According to the documentation clob.length() "returns the number of characters in the CLOB" whereas the buffsize parameter of clob.read() states that "valid values are numbers greater or equal to amt", which in turn says that it's "the number of bytes to be read."
In other words (and according to the documentation) you're passing the number of characters to clob.read() when it's expecting the number of bytes. The fact that you're getting an error would suggest that the former is smaller than the latter.
The documentation suggests changing the buffer to be a utext, after setting the character set via setCharSetId() would fix it.
Alternatively, if you've got multi-byte characters and don't need to do any character representation (no idea), it might be worth working with BLOBs instead; blob.length() returns the number of bytes.

Curlpp, incomplete data from request

I am using Curlpp to send requests to various webservices to send and receive data.
So far this has worked fine since i have only used it for sending/receiving JSON data.
Now i have a situation where a webservice returns a zip file in binary form. This is where i encountered a problem where the data received is not complete.
I first had Curl set to write any data to a ostringstream by using the option WriteStream, but this proved not to be the correct approach since the data contained null characters, and thus the data stopped at the first null char.
After that, instead of using WriteStream i used WriteFunction with a callback function.
The problem in this case is that this function is always called 2 or 3 times, regardless of the amount of data.
This results in always having a few chunks of data that don't seem to be the first part of the file, although the data always contains PK as the first 2 characters, indicating a zip file.
I used several tools to verify that the data is entirely being sent to my application so this is not a problem of the webservice.
Here the code. Do note that the options like hostname, port, headers and postfields are set elsewhere.
string requestData;
size_t WriteStringCallback(char* ptr, size_t size, size_t nmemb)
{
requestData += ptr;
int totalSize= size*nmemb;
return totalSize;
}
const string CurlRequest::Perform()
{
curlpp::options::WriteFunction wf(WriteStringCallback);
this->request.setOpt( wf );
this->request.perform();
return requestData;
}
I hope anyone can help me out with this issue because i've run dry of any leads on how to fix this, also because curlpp is poorly documented(and even worse since the curlpp website disappeared).
The problem with the code is that the data is put into a std::string, despite having the data in binary (ZIP) format. I'd recommend to put the data into a stream (or a binary array).
You can also register a callback to retrieve the response headers and act in the WriteCallback according to the "Content-type".
curlpp::options::HeaderFunction to register a callback to retrieve response-headers.
std::string is not a problem, but the concatenation is:
requestData += ptr;
C string (ptr) is terminated with zero, if the input contains any zero bytes, the input will be truncated. You should wrap it into a string which knows the length of its data:
requestData += std::string(ptr, size*nmemb);

Can I use JsonCpp to partially-validate JSON input?

I'm using JsonCpp to parse JSON in C++.
e.g.
Json::Reader r;
std::stringstream ss;
ss << "{\"name\": \"sample\"}";
Json::Value v;
assert(r.parse(ss, v)); // OK
assert(v["name"] == "sample"); // OK
But my actual input is a whole stream of JSON messages, that may arrive in chunks of any size; all I can do is to get JsonCpp to try to parse my input, character by character, eating up full JSON messages as we discover them:
Json::Reader r;
std::string input = "{\"name\": \"sample\"}{\"name\": \"aardvark\"}";
for (size_t cursor = 0; cursor < input.size(); cursor++) {
std::stringstream ss;
ss << input.substr(0, cursor);
Json::Value v;
if (r.parse(ss, v)) {
std::cout << v["name"] << " ";
input.erase(0, cursor);
}
} // Output: sample aardvark
This is already a bit nasty, but it does get worse. I also need to be able to resync when part of an input is missing (for any reason).
Now it doesn't have to be lossless, but I want to prevent an input such as the following from potentially breaking the parser forever:
{"name": "samp{"name": "aardvark"}
Passing this input to JsonCpp will fail, but that problem won't go away as we receive more characters into the buffer; that second name is simply invalid directly after the " that precedes it; the buffer can never be completed to present valid JSON.
However, if I could be told that the fragment certainly becomes invalid as of the second n character, I could drop everything in the buffer up to that point, and then simply wait for the next { to consider the start of a new object, as a best-effort resync.
So, is there a way that I can ask JsonCpp to tell me whether an incomplete fragment of JSON has already guaranteed that the complete "object" will be syntactically invalid?
That is:
{"name": "sample"} Valid (Json::Reader::parse == true)
{"name": "sam Incomplete (Json::Reader::parse == false)
{"name": "sam"LOL Invalid (Json::Reader::parse == false)
I'd like to distinguish between the two fail states.
Can I use JsonCpp to achieve this, or am I going to have to write my own JSON "partial validator" by constructing a state machine that considers which characters are "valid" at each step through the input string? I'd rather not re-invent the wheel...
It certainly depends if you actually control the packets (and thus the producer), or not. If you do, the most simple way is to indicate the boundaries in a header:
+---+---+---+---+-----------------------
| 3 | 16|132|243|endofprevious"}{"name":...
+---+---+---+---+-----------------------
The header is simple:
3 indicates the number of boundaries
16, 132 and 243 indicate the position of each boundary, which correspond to the opening bracket of a new object (or list)
and then comes the buffer itself.
Upon receiving such a packet, the following entries can be parsed:
previous + current[0:16]
current[16:132]
current[132:243]
And current[243:] is saved for the next packet (though you can always attempt to parse it in case it's complete).
This way, the packets are auto-synchronizing, and there is no fuzzy detection, with all the failure cases it entails.
Note that there could be 0 boundaries in the packet. It simply implies that one object is big enough to span several packets, and you just need to accumulate for the moment.
I would recommend making the numbers representation "fixed" (for example, 4 bytes each) and settling on a byte order (that of your machine) to convert them into/from binary easily. I believe the overhead to be fairly minimal (4 bytes + 4 bytes per entry given that {"name":""} is already 11 bytes).
Iterating through the buffer character-by-character and manually checking for:
the presence of alphabetic characters
outside of a string (being careful that " can be escaped with \, though)
not part of null, true or false
not a e or E inside what looks like a numeric literal with exponent
the presence of a digit outside of a string but immediately after a "
...is not all-encompassing, but I think it covers enough cases to fairly reliably break parsing at the point of or reasonably close to the point of a message truncation.
It correctly accepts:
{"name": "samL
{"name": "sam0
{"name": "sam", 0
{"name": true
as valid JSON fragments, but catches:
{"name": "sam"L
{"name": "sam"0
{"name": "sam"true
as being unacceptable.
Consequently, the following inputs will all result in the complete trailing object being parsed successfully:
1. {"name": "samp{"name": "aardvark"}
// ^ ^
// A B - B is point of failure.
// Stripping leading `{` and scanning for the first
// free `{` gets us to A. (*)
{"name": "aardvark"}
2. {"name": "samp{"0": "abc"}
// ^ ^
// A B - B is point of failure.
// Stripping and scanning gets us to A.
{"0": "abc"}
3. {"name":{ "samp{"0": "abc"}
// ^ ^ ^
// A B C - C is point of failure.
// Stripping and scanning gets us to A.
{ "samp{"0": "abc"}
// ^ ^
// B C - C is still point of failure.
// Stripping and scanning gets us to B.
{"0": "abc"}
My implementation passes some quite thorough unit tests. Still, I wonder whether the approach itself can be improved without exploding in complexity.
* Instead of looking for a leading "{", I actually have a sentinel string prepended to every message which makes the "stripping and scanning" part even more reliable.
Just look at expat or other streamed xml parsers. The logic of jsoncpp should be similar if its not. (Ask developers of this library to improve stream reading if needed.)
In other words, and from my point of view:
If some of your network (not JSON) packets are lost its not problem of JSON parser, just use more reliable protocol or invent your own. And only then transfer JSON over it.
If JSON parser reports errors and this error happened on the last parsed token (no more data in stream but expected) - accumulate data and try again (this task should be done by the library itself).
Sometimes it may not report errors though. For example when you transfer 123456 and only 123 is received. But this does not match your case since you don't transfer primitive data in a single JSON packet.
If the stream contains valid packets followed by semi-received packets, some callback should be called for each valid packet.
If the JSON parser reports errors and it's really invalid JSON, the stream should be closed and opened again if necessary.