Can I safely use std::string to assemble binary data into messages? - c++

I am using a std::string to hold binary data read from a socket.
The data consists of messages beginning with a '$' and ending with a '#'. Each message may contain '\0' characters.
I use std::string::find() to find the location of the first message and extract it from the string using std::string::substr():
class MessageSplitter {
public:
    MessageSplitter() { m_data.reserve(1'000'000); }
    void appendBinaryData(const std::string& binaryData) {
        m_data.append(binaryData);
    }
    bool popMessage(std::string& msg) {
        const std::string end = "#";  // end-of-message delimiter
        size_t beg_index = m_data.find("$");
        if (beg_index == std::string::npos) {
            return false;
        }
        size_t end_index = m_data.find(end, beg_index);
        if (end_index == std::string::npos) {
            return false;
        }
        size_t count = end_index - beg_index + end.size();
        msg = m_data.substr(beg_index, count);
        m_data = m_data.substr(end_index + end.size());
        return true;
    }
private:
    std::string m_data;
};
I read from socket this way (error checking on recv omitted):
char buffer[4096];
int ret = ::recv(m_socket, buffer, 4096, 0);
std::string binaryData = std::string(buffer, ret);
This approach seems to work fine on Windows.
However, is it guaranteed to work on other platforms according to the C++ standard?

This is perfectly safe from a language level. std::string is guaranteed to be able to handle non-printable characters including embedded nul characters just fine.
From a programmer's perspective, though, it's somewhat unsafe because it's surprising. When I see std::string I generally expect it to be printable text. It has an operator<<, for example, to make it easy to print to output streams, and I have to remember never to use that.
For that second reason, I would tend to prefer something more explicit, such as std::vector<std::byte> or std::vector<unsigned char>. Something that doesn't act like text is much more difficult to accidentally treat as text.
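As a rough sketch of that suggestion, the splitter from the question could be written against std::vector<unsigned char>. The class name ByteMessageSplitter, the Bytes alias, and the append signature are illustrative assumptions, not part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical byte-oriented variant of the question's MessageSplitter.
using Bytes = std::vector<unsigned char>;

class ByteMessageSplitter {
public:
    // Append raw bytes exactly as received; zero bytes are ordinary data.
    void append(const unsigned char* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        m_data.insert(m_data.end(), data, data + size);
    }

    // Extract the first complete "$...#" message, delimiters included.
    // Returns false if no complete message has been buffered yet.
    bool popMessage(Bytes& msg) {
        auto beg = std::find(m_data.begin(), m_data.end(), '$');
        if (beg == m_data.end()) return false;
        auto end = std::find(beg, m_data.end(), '#');
        if (end == m_data.end()) return false;
        msg.assign(beg, end + 1);
        // Drop everything up to and including the closing '#'.
        m_data.erase(m_data.begin(), end + 1);
        return true;
    }

private:
    Bytes m_data;
};
```

With this type there is no operator<< to reach for by accident, and the data received from recv() can be appended directly from the char buffer after a cast to const unsigned char*.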

Related

C++ Send a file containing \0 via sockets without accidental closure

I serialize the file with the code below and send it over Winsock. This works fine with text files, but when I tried to send a JPG, the string contained \0 as some of its characters, so the socket only sent part of the string, treating \0 as the end. I considered replacing \0 with something else, but if I replace it with 'xx' and then replace it back on the other end, any natural occurrences of 'xx' in the file would be corrupted. I could use a long, unlikely sequence, but that bloats the file.
Any help appreciated.
char* read_file(string path, int& len)
{
    std::ifstream infile(path, std::ios::binary); // binary mode: no newline translation
    infile.seekg(0, infile.end);
    size_t length = infile.tellg();
    infile.seekg(0, infile.beg);
    len = length;
    char* buffer = new char[len]();
    infile.read(buffer, length);
    return buffer;
}
string load_to_buffer(string file)
{
    char* img;
    int ln;
    img = read_file(file, ln);
    string s = "";
    for (int i = 0; i < ln; i++){ // valid indices are 0..ln-1
        char c = *(img + i);
        s += c;
    }
    delete[] img; // release the buffer allocated in read_file
    return s;
}
Probably somewhere in your code (not shown in what you posted) you use strlen() to compute the length to send, or you pass the result of std::string::c_str() to something that treats it as a C string. That truncates the data, because C-string functions stop at the first \0. std::string::length() itself counts embedded \0 characters, so that is the length to pass when sending.
std::string can hold binary data, but it invites exactly this kind of mistake. Use std::vector<char> instead, and remove the new[] stuff.
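One way to follow the std::vector<char> advice is sketched below. The function name readFileBytes and the choice to return an empty vector on failure are assumptions made for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file into a byte vector. Returns an empty vector if the file
// cannot be opened (a real program would likely report the error instead).
std::vector<char> readFileBytes(const std::string& path) {
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    if (!in) return {};
    // istreambuf_iterator copies raw characters, including '\0' bytes.
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

When sending, pass data.data() together with data.size() so the length never depends on where a \0 happens to fall.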

How to send image data over linux socket

I have a relatively simple web server I have written in C++. It works fine for serving text/html pages, but the way it is written it seems unable to send binary data and I really need to be able to send images.
I have searched extensively but can't find an answer specific to this question that is written in real C++ (fstream as opposed to file pointers, etc.). This kind of thing is necessarily low level and may well require handling bytes in a C-style array, but I would like the code to be as C++ as possible.
I have tried a few methods, this is what I currently have:
int sendFile(const Server* serv, const ssocks::Response& response, int fd)
{
// some other stuff to do with headers etc. ........ then:
// open file
std::ifstream fileHandle;
fileHandle.open(serv->mBase + WWW_D + resource.c_str(), std::ios::binary);
if(!fileHandle.is_open())
{
// error handling code
return -1;
}
// send file
ssize_t buffer_size = 2048;
char buffer[buffer_size];
while(!fileHandle.eof())
{
fileHandle.read(buffer, buffer_size);
status = serv->mSock.doSend(buffer, fd);
if (status == -1)
{
std::cerr << "Error: socket error, sending file\n";
return -1;
}
}
return 0;
}
And then elsewhere:
int TcpSocket::doSend(const char* message, int fd) const
{
if (fd == 0)
{
fd = mFiledes;
}
ssize_t bytesSent = send(fd, message, strlen(message), 0);
if (bytesSent < 1)
{
return -1;
}
return 0;
}
As I say, the problem is that when the client requests an image it doesn't work. std::cerr shows "Error: socket error, sending file".
EDIT : I got it working using the advice in the answer I accepted. For completeness and to help those finding this post I am also posting the final working code.
For sending, I decided to use a std::vector rather than a char array, primarily because I feel it is a more C++ approach and it makes it clear that the data is not a string. This is probably not necessary but a matter of taste. I then took the number of bytes read from the stream and passed it to the send function like this:
// send file
std::vector<char> buffer(SEND_BUFFER);
while(!fileHandle.eof())
{
fileHandle.read(&buffer[0], SEND_BUFFER);
status = serv->mSock.doSend(&buffer[0], fd, fileHandle.gcount());
if (status == -1)
{
std::cerr << "Error: socket error, sending file\n";
return -1;
}
}
Then the actual send function was adapted like this:
int TcpSocket::doSend(const char* message, int fd, size_t size) const
{
if (fd == 0)
{
fd = mFiledes;
}
ssize_t bytesSent = send(fd, message, size, 0);
if (bytesSent < 1)
{
return -1;
}
return 0;
}
The first thing you should change is the while (!fileHandle.eof()) loop, because it will not work as you expect: it iterates once too often, since the eof flag isn't set until after you try to read beyond the end of the file. Instead, do e.g. while (fileHandle.read(...)).
The second thing you should do is check how many bytes were actually read from the file (fileHandle.gcount()) and send only that many. This also matters for the final read, which is usually shorter than the buffer and puts the stream into a failed state even though it delivered data.
Lastly, you read binary data, not text, so you can't use strlen on the data you read from the file.
A little explanation of the binary file problem: as you should hopefully know, C-style strings (the ones you use strlen to get the length of) are terminated by a zero character '\0' (in short, a zero byte). Random binary data can contain zero bytes anywhere inside it, and each one is a valid byte with no special meaning.
When you use strlen to get the length of binary data there are two possible problems:
1. There's a zero byte in the middle of the data. This will cause strlen to terminate early and return the wrong length.
2. There's no zero byte in the data. That will cause strlen to go beyond the end of the buffer to look for the zero byte, leading to undefined behavior.

c++ Remove everything in string prior to sequence including sequence

I've been writing a program that receives data from other sources across a network, and I need to sanitize the data before I send it to be processed. Previously, I had been doing it based on size, as below:
char data[max_length];
boost::system::error_code error;
size_t length = sock->read_some( boost::asio::buffer( data ), error );
std::stringstream ss;
for( int i = 0; i < max_length; i++ ) {
ss << data[i];
}
std::vector<int> idata;
std::string s2 = ss.str();
s2.erase( 0, 255 );
But the headers I need to remove are of variable length. So after doing some digging, I found I could remove them by finding the sequence of characters I know they'll end in - in this case \r\n\r\n - and removing everything up until then using size_t like so:
size_t p = s2.find( "\r\n\r\n" );
s2.erase( 0, p );
But that still leaves the \r\n\r\n at the beginning of my string which, at best, throws off my data handling later, and at worst, might cause issues down the line, as there are segments of my program that don't respond well to whitespace.
So my question is this: Is there a better way to do this that removes everything up to and including the specified sequence of characters? Can I just do p = p + 4;? Is that even possible with the size_t type?
Yes, you can write p + 4, since size_t is an (unsigned) integer type.
By the way, you might also want to pass data directly into a std::string constructor, rather than use std::stringstream ss.
Edit: To explain in more detail, it would look something like this:
char data[max_length];
// Read data and ensure that it is null-terminated ...
std::string s2(data); // Call the std::string constructor that inputs a null-terminated C string.
size_t p = s2.find("\r\n\r\n");
s2.erase(0, p + 4);
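A slightly more defensive variant is sketched below. It assumes the number of bytes actually received is available (the question's length from read_some), and the function name stripHeaders is made up for illustration. It also checks for std::string::npos, because npos + 4 wraps around to a small number and would erase the wrong characters when the terminator is missing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the body that follows the first "\r\n\r\n" within the first `length`
// bytes of `data`, or an empty string if the terminator has not arrived yet.
std::string stripHeaders(const char* data, std::size_t length) {
    assert(data != nullptr);
    static const std::string terminator = "\r\n\r\n";
    // The (pointer, length) constructor keeps embedded '\0' bytes and ignores
    // whatever lies beyond the bytes actually received.
    std::string s(data, length);
    const std::size_t p = s.find(terminator);
    if (p == std::string::npos) return std::string();
    s.erase(0, p + terminator.size());
    return s;
}
```

Using terminator.size() instead of a literal 4 keeps the offset correct if the delimiter ever changes.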

strcmp error comparing converted wide string

I am trying to handle WStrings in the Android NDK, which does not support wide characters, and I could use advice on how to do this. I think the asciiConvert method does not work anymore.
typedef std::basic_string<wchar_t> WString;
WString val;
val=L"";
set_val(L"");
char* value=asciiConvert(get_val()); // value is 0x00000000
std::string token; // value is ""
if (strcmp(token.c_str(),value)==0) //ERROR HERE: INFINITE LOOP HERE I THINK since it will never be true.
HERE IS THE CONVERSION FUNCTION:
char* asciiConvert(const wchar_t* wideStr, char replSpace) // replSpace == -1
{
if (wideStr == NULL)
return NULL;
char* asciiStr = new char[wcslen(wideStr) + 10];
sprintf(asciiStr, "%S", wideStr);
if (replSpace >= 0)
{
int len = strlen(asciiStr);
while (len)
{
if (asciiStr[len] == ' ')
asciiStr[len] = replSpace;
len--;
}
}
return asciiStr;
}
UPDATE: The typedef is advised for some implementations that do not support wstring, so I think I need it, but now something does not work, as shown above. I have not used C++ in a while, so I could use very specific instructions on this.
Basically, I have dozens of functions like const wchar_t* foo(const wchar_t* a, const wchar_t& b),
quite a few wchar* [] arrays, and members like const wchar_t* memVariable; there are even virtual functions with these.
How about CrystalX for this? Is that the way to go?

Possible reasons for tellg() failing?

ifstream::tellg() is returning -13 for a certain file.
Basically, I wrote a utility that analyzes some source code. I open all files alphabetically, starting with "Apple.cpp", and it works perfectly. But when it gets to "Conversion.cpp", always on the same file, tellg() returns -13 after one line has been read successfully.
The code in question is:
for (int i = 0; i < files.size(); ++i) { /* For each .cpp and .h file */
TextIFile f(files[i]);
while (!f.AtEof()) // When it gets to conversion.cpp (not on the others)
// first is always successful, second always fails
lines.push_back(f.ReadLine());
The code for AtEof is:
bool AtEof() {
if (mFile.tellg() < 0)
FATAL(format("DEBUG - tellg(): %d") % mFile.tellg());
if (mFile.tellg() >= GetSize())
return true;
return false;
}
After it successfully reads the first line of Conversion.cpp, it always crashes with DEBUG - tellg(): -13.
This is the whole TextIFile class (written by me, so the error may be there):
class TextIFile
{
public:
TextIFile(const string& path) : mPath(path), mSize(0) {
mFile.open(path.c_str(), std::ios::in);
if (!mFile.is_open())
FATAL(format("Cannot open %s: %s") % path.c_str() % strerror(errno));
}
string GetPath() const { return mPath; }
size_t GetSize() {
if (mSize) return mSize;
const size_t current_position = mFile.tellg();
mFile.seekg(0, std::ios::end);
mSize = mFile.tellg();
mFile.seekg(current_position);
return mSize;
}
bool AtEof() {
if (mFile.tellg() < 0)
FATAL(format("DEBUG - tellg(): %d") % mFile.tellg());
if (mFile.tellg() >= GetSize())
return true;
return false;
}
string ReadLine() {
string ret;
getline(mFile, ret);
CheckErrors();
return ret;
}
string ReadWhole() {
string ret((std::istreambuf_iterator<char>(mFile)), std::istreambuf_iterator<char>());
CheckErrors();
return ret;
}
private:
void CheckErrors() {
if (!mFile.good())
FATAL(format("An error has occurred while performing an I/O operation on %s") % mPath);
}
const string mPath;
ifstream mFile;
size_t mSize;
};
Platform is Visual Studio, 32 bit, Windows.
Edit: Works on Linux.
Edit: I found the cause: line endings. Conversion, Guid, and some others had \n instead of \r\n. I saved them with \r\n instead and it worked. Still, this is not supposed to happen, is it?
It's difficult to guess without knowing exactly what's in Conversion.cpp. However, using < with stream positions is not defined by the standard. You might want to consider an explicit cast to the correct integer type before formatting it; I don't know what formatting FATAL and format() expect to perform or how the % operator is overloaded. Stream positions don't have to map in a predictable way to integers, certainly not if the file isn't opened in binary mode.
You might want to consider an alternative implementation for AtEof(). Say something like:
bool AtEof()
{
return mFile.peek() == ifstream::traits_type::eof();
}