Is it possible to use an std::string for read()? - c++

Is it possible to use an std::string for read() ?
Example :
std::string data;
read(fd, data, 42);
Normaly, we have to use char* but is it possible to directly use a std::string ? (I prefer don't create a char* for store the result)
Thank's

Well, you'll need to create a char* somehow, since that's what the
function requires. (BTW: you are talking about the Posix function
read, aren't you, and not std::istream::read?) The problem isn't
the char*, it's what the char* points to (which I suspect is what
you actually meant).
The simplest and usual solution here would be to use a local array:
char buffer[43];
int len = read(fd, buffer, 42);
if ( len < 0 ) {
// read error...
} else if ( len == 0 ) {
// eof...
} else {
std::string data(buffer, len);
}
If you want to capture directly into an std::string, however, this is
possible (although not necessarily a good idea):
std::string data;
data.resize( 42 );
int len = read( fd, &data[0], data.size() );
// error handling as above...
data.resize( len ); // If no error...
This avoids the copy, but quite frankly... The copy is insignificant
compared to the time necessary for the actual read and for the
allocation of the memory in the string. This also has the (probably
negligible) disadvantage of the resulting string having an actual buffer
of 42 bytes (rounded up to whatever), rather than just the minimum
necessary for the characters actually read.
(And since people sometimes raise the issue, with regards to the
contiguity of the memory in std:;string: this was an issue ten or more
years ago. The original specifications for std::string were designed
expressedly to allow non-contiguous implementations, along the lines of
the then popular rope class. In practice, no implementor found this
to be useful, and people did start assuming contiguity. At which point,
the standards committee decided to align the standard with existing
practice, and require contiguity. So... no implementation has ever not
been contiguous, and no future implementation will forego contiguity,
given the requirements in C++11.)

No, you cannot and you should not. Usually, std::string implementations internally store other information such as the size of the allocated memory and the length of the actual string. C++ documentation explicitly states that modifying values returned by c_str() or data() results in undefined behaviour.

If the read function requires a char *, then no. You could use the address of the first element of a std::vector of char as long as it's been resized first. I don't think old (pre C++11) strings are guarenteed to have contiguous memory otherwise you could do something similar with the string.

No, but
std::string data;
cin >> data;
works just fine. If you really want the behaviour of read(2), then you need to allocate and manage your own buffer of chars.

Because read() is intended for raw data input, std::string is actually a bad choice, because std::string handles text. std::vector seems like the right choice to handle raw data.

Using std::getline from the strings library - see cplusplus.com - can read from an stream and write directly into a string object. Example (again ripped from cplusplus.com - 1st hit on google for getline):
int main () {
string str;
cout << "Please enter full name: ";
getline (cin,str);
cout << "Thank you, " << str << ".\n";
}
So will work when reading from stdin (cin) and from a file (ifstream).

Related

Reading contents of file into dynamically allocated char* array- can I read into std::string instead?

I have found myself writing code which looks like this
// Treat the following as pseudocode - just an example
iofile.seekg(0, std::ios::end); // iofile is a file opened for read/write
uint64_t f_len = iofile.tellg();
if(f_len >= some_min_length)
{
// Focus on the following code here
char *buf = new char[7];
char buf2[]{"MYFILET"}; // just some random string
// if we see this it's a good indication
// the rest of the file will be in the
// expected format (unlikely to see this
// sequence in a "random file", but don't
// worry too much about this)
iofile.read(buf, 7);
if(memcmp(buf, buf2, 7) == 0) // I am confident this works
{
// carry on processing file ...
// ...
// ...
}
}
else
cout << "invalid file format" << endl;
This code is probably an okay sketch of what we might want to do when opening a file, which has some specified format (which I've dictated). We do some initial check to make sure the string "MYFILET" is at the start of the file - because I've decided all my files for the job I'm doing are going to start with this sequence of characters.
I think this code would be better if we didn't have to play around with "c-style" character arrays, but used strings everywhere instead. This would be advantageous because we could do things like if(buf == buf2) if buf and buf2 where std::strings.
A possible alternative could be,
// Focus on the following code here
std::string buf;
std::string buf2("MYFILET"); // very nice
buf.resize(7); // okay, but not great
iofile.read(buf.data(), 7); // pretty awful - error prone if wrong length argument given
// also we have to resize buf to 7 in the previous step
// lots of potential for mistakes here,
// and the length was used twice which is never good
if(buf == buf2) then do something
What are the problems with this?
We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".
We had to access the contents of buf using .data() which I shall assume here is implemented to return a raw pointer of some sort. I don't personally mind this too much, but others may prefer a more memory-safe solution, perhaps hinting we should use an iterator of some sort? I think in Visual Studio (for Windows users which I am not) then this may return an iterator anyway, which will give [?] warnings/errors [?] - not sure on this.
We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.
It is undefined behavior to write into the const char* returned by std::string::data(). However, you are free to use std::vector::data() in this way.
If you want to use std::string, and dislike setting the size yourself, you may consider whether you can use std::getline(). This is the free function, not std::istream::getline(). The std::string version will read up to a specified delimiter, so if you have a text format you can tell it to read until '\0' or some other character which will never occur, and it will automatically resize the given string to hold the contents.
If your file is binary in nature, rather than text, I think most people would find std::vector<char> to be a more natural fit than std::string anyway.
We had to use the length variable 7 (or constant in this case) twice.
Which is somewhere between "not ideal" and "potentially error prone".
The second time you can use buf.size()
iofile.read(buf.data(), buf.size());
We had to access the contents of buf using .data() which I shall
assume here is implemented to return a raw pointer of some sort.
And pointed by John Zwinck, .data() return a pointer to const.
I suppose you could define buf as std::vector<char>; for vector (if I'm not wrong) .data() return a pointer to char (in this case), not to const char.
size() and resize() are working in the same way.
We had to have an additional resize statement for buf. It would be
better if the size of buf could be automatically set somehow.
I don't think read() permit this.
p.s.: sorry for my bad English.
We can validate a signature without double buffering (rdbuf and a string) and allocating from the heap...
// terminating null not included
constexpr char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };
auto ok = all_of(begin(sig), end(sig), [&fs](char c) { return fs.get() == (int)c; });
if (ok) {}
template<class Src>
std::string read_string( Src& src, std::size_t count){
std::string buf;
buf.resize(count);
src.read(&buf.front(), 7); // in C++17 make it buf.data()
return buf;
}
Now auto read = read_string( iofile, 7 ); is clean at point of use.
buf2 is a bad plan. I'd do:
if(read=="MYFILET")
directly, or use a const char myfile_magic[] = "MYFILET";.
I liked many of the ideas from the examples above, however I wasn't completely satisfied that there was an answer which would produce undefined-behaviour-free code for C++11 and C++17. I currently write most of my code in C++11 - because I don't anticipate using it on a machine in the future which doesn't have a C++11 compiler.
If one doesn't, then I add a new compiler or change machines.
However it does seem to me to be a bad idea to write code which I know may not work under C++17... That's just my personal opinion. I don't anticipate using this code again, but I don't want to create a potential problem for myself in the future.
Therefore I have come up with the following code. I hope other users will give feedback to help improve this. (For example there is no error checking yet.)
std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
char *const buffer = new char[n + 1];
src.read(buffer, n);
buffer[n] = '\0';
std::string ret(buffer);
delete [] buffer;
return ret;
}
This seems like a basic, probably fool-proof method... It's a shame there seems to be no way to get std::string to use the same memory as allocated by the call to new.
Note we had to add an extra trailing null character in the C-style string, which is sliced off in the C++-style std::string.

Using sprintf with std::string in C++

I am using sprintf function in C++ 11, in the following way:
std::string toString()
{
std::string output;
uint32_t strSize=512;
do
{
output.reserve(strSize);
int ret = sprintf(output.c_str(), "Type=%u Version=%u ContentType=%u contentFormatVersion=%u magic=%04x Seg=%u",
INDEX_RECORD_TYPE_SERIALIZATION_HEADER,
FORAMT_VERSION,
contentType,
contentFormatVersion,
magic,
segmentId);
strSize *= 2;
} while (ret < 0);
return output;
}
Is there a better way to do this, than to check every time if the reserved space was enough? For future possibility of adding more things.
Your construct -- writing into the buffer received from c_str() -- is undefined behaviour, even if you checked the string's capacity beforehand. (The return value is a pointer to const char, and the function itself marked const, for a reason.)
Don't mix C and C++, especially not for writing into internal object representation. (That is breaking very basic OOP.) Use C++, for type safety and not running into conversion specifier / parameter mismatches, if for nothing else.
std::ostringstream s;
s << "Type=" << INDEX_RECORD_TYPE_SERIALIZATION_HEADER
<< " Version=" << FORMAT_VERSION
// ...and so on...
;
std::string output = s.str();
Alternative:
std::string output = "Type=" + std::to_string( INDEX_RECORD_TYPE_SERIALIZATION_HEADER )
+ " Version=" + std::to_string( FORMAT_VERSION )
// ...and so on...
;
The C++ patterns shown in other answers are nicer, but for completeness, here is a correct way with sprintf:
auto format = "your %x format %d string %s";
auto size = std::snprintf(nullptr, 0, format /* Arguments go here*/);
std::string output(size + 1, '\0');
std::sprintf(&output[0], format, /* Arguments go here*/);
Pay attention to
You must resize your string. reserve does not change the size of the buffer. In my example, I construct correctly sized string directly.
c_str() returns a const char*. You may not pass it to sprintf.
std::string buffer was not guaranteed to be contiguous prior to C++11 and this relies on that guarantee. If you need to support exotic pre-C++11 conforming platforms that use rope implementation for std::string, then you're probably better off sprinting into std::vector<char> first and then copying the vector to the string.
This only works if the arguments are not modified between the size calculation and formatting; use either local copies of variables or thread synchronisation primitives for multi-threaded code.
We can mix code from here https://stackoverflow.com/a/36909699/2667451 and here https://stackoverflow.com/a/7257307 and result will be like that:
template <typename ...Args>
std::string stringWithFormat(const std::string& format, Args && ...args)
{
auto size = std::snprintf(nullptr, 0, format.c_str(), std::forward<Args>(args)...);
std::string output(size + 1, '\0');
std::sprintf(&output[0], format.c_str(), std::forward<Args>(args)...);
return output;
}
A better way is to use the {fmt} library. Ex:
std::string message = fmt::sprintf("The answer is %d", 42);
It exposes also a nicer interface than iostreams and printf. Ex:
std::string message = fmt::format("The answer is {}", 42);`
See:
https://github.com/fmtlib/fmt
http://fmtlib.net/latest/api.html#printf-formatting-functions
Your code is wrong. reserve allocates memory for the string, but does not change its size. Writing into the buffer returned by c_str does not change its size either. So the string still believes its size is 0, and you've just written something into the unused space in the string's buffer. (Probably. Technically, the code has Undefined Behaviour, because writing into c_str is undefined, so anything could happen).
What you really want to do is forget sprintf and similar C-style functions, and use the C++ way of string formatting—string streams:
std::ostringstream ss;
ss << "Type=" << INDEX_RECORD_TYPE_SERIALIZATION_HEADER
<< " Version=" << FORAMT_VERSION
<< /* ... the rest ... */;
return ss.str();
Yes, there is!
In C, the better way is to associate a file with the null device and make a dummy printf of the desired output to it, to learn how much space would it take if actually printed. Then allocate appropriate buffer and sprintf the same data to it.
In C++ you could associate the output stream with a null device, too, and test the number of charactes printed with std::ostream::tellp. However, using ostringstream is a way better solution – see the answers by DevSolar or Angew.
You can use an implementation of sprintf() into a std::string I wrote that uses vsnprintf() under the hood.
It splits the format string into sections of plain text which are just copied to the destination std::string and sections of format fields (such as %5.2lf) which are first vsnprintf()ed into a buffer and then appended to the destination.
https://gitlab.com/eltomito/bodacious-sprintf

Read in data of a dynamic size into char*?

I was wondering how the following code works.
#include <iostream>
using namespace std;
int main()
{
char* buffer = new char(NULL);
while(true)
{
cin >> buffer;
cout << buffer;
cout << endl;
}
return 0;
}
I can input any amount of text of any size and it will print it back out to me. How does this work? Is it dynamically allocating space for me?
Also, if I enter in a space, it will print the next section of text on a new line.
This however, is fixed by using gets(buffer); (unsafe).
Also, is this code 'legal'?
It's not safe at all. It's rewriting whatever memory happens to lie after the buffer, and then reading it. The fact that this is working is coincidental. This is because your cin/cout operations don't say "oh, a pointer to one char, I should just write one char" but "oh, you have enough space allocated for me."
Improvement #1:
char* buffer = new char(10000) or simply char buffer[10000];
Now you can safely write long-ish paragraphs with no issue.
Improvement #2:
std::string buffer;
To answer your question in the comment, C++ is all for letting you make big memory mistakes. As noted in comment this is because it's a "don't pay for what you don't need" language. There are some people who really need this level of optimization in their code although you are probably not one of them.
However, it also gives you plenty of ways to do it where you don't have to think about memory at all. I will say firmly: if you are using new and delete or char[] and not because you are using a design pattern with which you've familiarized that require them, or because you are using 3rd-party or C libraries that require them, there is a safer way to do it.
Some guidelines that will save you 80% of the time:
-Don't use char[]. Use string.
-Don't use pointers to pass or return argument. Pass by reference, return by value.
-Don't use arrays (e.g. int[]). Use vectors. You still have to check your own bounds.
With just those three you'll be writing "pretty safe", non-C-like code.
This is what std::string is for:
std::string s;
while (true)
{
std::cin >> s;
std::cout << s << std::endl;
}
std::string WILL dynamically allocate space for you, so you don't have to worry about overwriting memory elsewhere.

std::string or std::vector<char> to hold raw data

I hope this question is appropriate for stackoverflow... What is the difference between storing raw data bytes (8 bits) in a std::string rather than storing them in std::vector<char>. I'm reading binary data from a file and storing those raw bytes in a std::string. This works well, there are no problems or issues with doing this. My program works as expected. However, other programmers prefer the std::vector<char> approach and suggest I stop using std::string as it's unsafe for raw bytes. So I'm wondering why might it be unsafe to use std::string to hold raw data bytes? I know std::string is most often used to store ASCII text, but a byte is a byte, so I don't understand the preference of the std::vector<char>.
Thanks for any advice!
The problem is not really whether it works or it doesn't. The problem is that it is utterly confusing for the next guy reading your code. std::string is meant for displaying text. Anybody reading your code will expect that. You'll declare your intent much better with a std::vector<char>.
It increases your WTF/min in code reviews.
In C++03, using std::string to store an array of byte data was not a good idea. By the standard, std::string did not have to store data contiguously. C++11 fixed that so that it's data does have to be contiguous.
So it would not be functional to do this in C++03. Not unless you have personally vetted your C++ standard library implementation of std::string to ensure that it is contiguous.
Either way, I would suggest vector<char>. Generally, when you see string, you expect it to be a... string. You know, a sequence of characters in some form of encoding. A vector<char> makes it obvious that it isn't a string, but an array of bytes.
Besides contiguous storage and code-clarity issues, I ran into some fairly insidious errors trying to use std::string to hold raw bytes.
Most of them centered around trying to convert a char array of bytes to std::string when interfacing with C libraries. For example:
std::string password = "pass\0word";
std::cout << password.length() << std::endl; // prints 4, not 9
Maybe you can fix that by specifying the length:
std::string password("pass\0word", 0, 9);
std::cout << password.length() << std::endl; // nope! still 4!
This is probably because the constructor expects to receive a C-string, not a byte array. There might be a better way, but I ended up with this:
std::string password("pass0word", 0, 9);
password[4] = '\0';
std::cout << password.length() << std::endl; // hurray! 9!
A little clunky. Thankfully I found this in unit testing, but I would have missed it if my test vectors didn't have null bytes. What makes this insidious is that the second approach above will work fine until the array contains a null byte.
So far std::vector<uint8_t> looks like a good option (thanks J.N. and Hurkyl):
char p[] = "pass\0word";
std::vector<uint8_t> password(p, p, p+9); // :)
Note: I haven't tried the iterator constructor with std::string, but this error is easy enough to make that it might be worth avoiding even the possibility.
Lessons learned:
Test byte-handling methods witih null byte-containing test vectors.
Be careful when (and I would say avoid) using std::string to hold raw bytes.

understanding the dangers of sprintf(...)

OWASP says:
"C library functions such as strcpy
(), strcat (), sprintf () and vsprintf
() operate on null terminated strings
and perform no bounds checking."
sprintf writes formatted data to string
int sprintf ( char * str, const char * format, ... );
Example:
sprintf(str, "%s", message); // assume declaration and
// initialization of variables
If I understand OWASP's comment, then the dangers of using sprintf are that
1) if message's length > str's length, there's a buffer overflow
and
2) if message does not null-terminate with \0, then message could get copied into str beyond the memory address of message, causing a buffer overflow
Please confirm/deny. Thanks
You're correct on both problems, though they're really both the same problem (which is accessing data beyond the boundaries of an array).
A solution to your first problem is to instead use std::snprintf, which accepts a buffer size as an argument.
A solution to your second problem is to give a maximum length argument to snprintf. For example:
char buffer[128];
std::snprintf(buffer, sizeof(buffer), "This is a %.4s\n", "testGARBAGE DATA");
// std::strcmp(buffer, "This is a test\n") == 0
If you want to store the entire string (e.g. in the case sizeof(buffer) is too small), run snprintf twice:
int length = std::snprintf(nullptr, 0, "This is a %.4s\n", "testGARBAGE DATA");
++length; // +1 for null terminator
char *buffer = new char[length];
std::snprintf(buffer, length, "This is a %.4s\n", "testGARBAGE DATA");
(You can probably fit this into a function using va or variadic templates.)
Both of your assertions are correct.
There's an additional problem not mentioned. There is no type checking on the parameters. If you mismatch the format string and the parameters, undefined and undesirable behavior could result. For example:
char buf[1024] = {0};
float f = 42.0f;
sprintf(buf, "%s", f); // `f` isn't a string. the sun may explode here
This can be particularly nasty to debug.
All of the above lead many C++ developers to the conclusion that you should never use sprintf and its brethren. Indeed, there are facilities you can use to avoid all of the above problems. One, streams, is built right in to the language:
#include <sstream>
#include <string>
// ...
float f = 42.0f;
stringstream ss;
ss << f;
string s = ss.str();
...and another popular choice for those who, like me, still prefer to use sprintf comes from the boost Format libraries:
#include <string>
#include <boost\format.hpp>
// ...
float f = 42.0f;
string s = (boost::format("%1%") %f).str();
Should you adopt the "never use sprintf" mantra? Decide for yourself. There's usually a best tool for the job and depending on what you're doing, sprintf just might be it.
Yes, it is mostly a matter of buffer overflows. However, those are quite serious business nowdays, since buffer overflows are the prime attack vector used by system crackers to circumvent software or system security. If you expose something like this to user input, there's a very good chance you are handing the keys to your program (or even your computer itself) to the crackers.
From OWASP's perspective, let's pretend we are writing a web server, and we use sprintf to parse the input that a browser passes us.
Now let's suppose someone malicious out there passes our web browser a string far larger than will fit in the buffer we chose. His extra data will instead overwrite nearby data. If he makes it large enough, some of his data will get copied over the webserver's instructions rather than its data. Now he can get our webserver to execute his code.
Your 2 numbered conclusions are correct, but incomplete.
There is an additional risk:
char* format = 0;
char buf[128];
sprintf(buf, format, "hello");
Here, format is not NULL-terminated. sprintf() doesn't check that either.
Your interpretation seems to be correct. However, your case #2 isn't really a buffer overflow. It's more of a memory access violation. That's just terminology though, it's still a major problem.
The sprintf function, when used with certain format specifiers, poses two types of security risk: (1) writing memory it shouldn't; (2) reading memory it shouldn't. If snprintf is used with a size parameter that matches the buffer, it won't write anything it shouldn't. Depending upon the parameters, it may still read stuff it shouldn't. Depending upon the operating environment and what else a program is doing, the danger from improper reads may or may not be less severe than that from improper writes.
It is very important to remember that sprintf() adds the ASCII 0 character as string terminator at the end of each string. Therefore, the destination buffer must have at least n+1 bytes (To print the word "HELLO", a 6-byte buffer is required, NOT 5)
In the example below, it may not be obvious, but in the 2-byte destination buffer, the second byte will be overwritten by ASCII 0 character. If only 1 byte was allocated for the buffer, this would cause buffer overrun.
char buf[3] = {'1', '2'};
int n = sprintf(buf, "A");
Also note that the return value of sprintf() does NOT include the null-terminating character. In the example above, 2 bytes were written, but the function returns '1'.
In the example below, the first byte of class member variable 'i' would be partially overwritten by sprintf() (on a 32-bit system).
struct S
{
char buf[4];
int i;
};
int main()
{
struct S s = { };
s.i = 12345;
int num = sprintf(s.buf, "ABCD");
// The value of s.i is NOT 12345 anymore !
return 0;
}
I pretty much have stated a small example how you could get rid of the buffer size declaration for the sprintf (if you intended to, of course!) and no snprintf envolved ....
Note: This is an APPEND/CONCATENATION example, take a look at here