Implementation of getline ( istream& is, string& str ) - c++

My Question is very simple, how is getline(istream, string) implemented?
How can you solve the problem of having fixed size char arrays like with getline (char* s, streamsize n ) ?
Are they using temporary buffers and many calls to new char[length] or another neat structure?

getline(istream&, string&) is implemented in a way that it reads a line. There is no definitive implementation for it; each library probably differs from one another.
Possible implementation:
istream& getline(istream& stream, string& str)
{
char ch;
str.clear();
while (stream.get(ch) && ch != '\n')
str.push_back(ch);
return stream;
}

#SethCarnegie is right: more than one implementation is possible. The C++ standard does not say which should be used.
However, the question is still interesting. It's a classic computer-science problem. Where, and how, does one allocate memory when one does not know in advance how much memory to allocate?
One solution is to record the string's characters as a linked list of individual characters. This is neither memory-efficient nor fast, but it works, is robust, and is relatively simple to program. However, a standard library is unlikely to be implemented this way.
A second solution is to allocate a buffer of some fixed length, such as 128 characters. When the buffer overflows, you allocate a new buffer of double length, 256 characters, then copy the old characters over to the new storage, then release the old. When the new buffer overflows, you allocate an even newer buffer of double length again, 512 characters, then repeat the process; and so on.
A third solution combines the first two. A linked list of character arrays is maintained. The first two members of the list store (say) 128 characters each. The third stores 256. The fourth stores 512, and so on. This requires more programming than the others, but may be preferable to either, depending on the application.
And the list of possible implementations goes on.
Regarding standard-library implementations, #SteveJessop adds that "[a] standard library's string isn't permitted to be implemented as (1), because of the complexity requirement of operator[] for strings. In C++11 it's not permitted to be implemented as (3) either, because of the contiguity requirement for strings. The C++ committee expressed the belief that no active C++ implementation did (3) at the time they added the contiguity requirement. Of course, getline can do what it likes temporarily with the characters before adding them all to the string, but the standard does say a lot about what string can do."
The addition is relevant because, although getline could temporarily store its data in any of several ways, if the data's ultimate target is a string, this may be relevant to getline's implementation. #SteveJessop further adds, "For string itself, implementations are pretty much required to be (2) except that they can choose their own rate of expansion; they don't have to double each time as long as they multiply by some constant."

As #3bdalla said, implementation of thb doesn't work as gnu implementation. So, I wrote my own implementation, which works like gnu's one. I don't know what will be with errors in this variant, so it needs to be tested.
My implementation of getline:
std::istream& getline(std::istream& is, std::string& s, char delim = '\n'){
s.clear();
char c;
std::string temp;
if(is.get(c)){
temp.push_back(c);
while((is.get(c)) && (c != delim))
temp.push_back(c);
if(!is.bad())
s = temp;
if(!is.bad() && is.eof())
is.clear(std::ios_base::eofbit);
}
return is;
}

Related

Why is the last character of character array is getting excluded?

#include<iostream>
using namespace std;
int main()
{
int n;
cin>>n;
cin.ignore();
char arr[n+1];
cin.getline(arr,n);
cin.ignore();
cout<<arr;
return 0;
}
Input:
11
of the year
Output:
of the yea
I'm already providing n+1 for the null character. Then why is the last character getting excluded?
You allocated n+1 characters for your array, but then you told getline that there were only n characters available. It should be like this:
int n;
cin>>n;
cin.ignore();
char arr[n+1];
cin.getline(arr,n+1); // change here
cin.ignore();
cout<<arr;
Per cppreference.com:
https://en.cppreference.com/w/cpp/io/basic_istream/getline
Behaves as UnformattedInputFunction. After constructing and checking the sentry object, extracts characters from *this and stores them in successive locations of the array whose first element is pointed to by s, until any of the following occurs (tested in the order shown):
end of file condition occurs in the input sequence (in which case setstate(eofbit) is executed)
the next available character c is the delimiter, as determined by Traits::eq(c, delim). The delimiter is extracted (unlike basic_istream::get()) and counted towards gcount(), but is not stored.
count-1 characters have been extracted (in which case setstate(failbit) is executed).
If the function extracts no characters (e.g. if count < 1), setstate(failbit)is executed.
In any case, if count > 0, it then stores a null character CharT() into the next successive location of the array and updates gcount().
In your case, n=11. You are allocating n+1 (12) chars, but telling getline() that only n (11) chars are available, so it reads only n-1 (10) chars into the array and then terminates the array with '\0' in the 11th char. That is why you are missing the last character.
of the year
^
10th char, stops here
You need to +1 when calling getline(), to match your actual array size:
cin.getline(arr,n+1);
john's answer should fix your issue. Variable-length arrays (your char arr[n+1]) are not part of the C++ standard, for justified reasons. Yet I've taken a few hours of my time to go way out of question's scope and create the...
Student's guide to C++ I/O
...and I/O in general, with an emphasis on the I part. Fear not, do it the C++ way! The following snippets should be compiled with a standard-conforming C++ compiler.
C++ I/O & standard library
Textual input
This is the recommended way of reading UTF-8 encoded strings in C++, the most widespread text encoding. We will use std::string for storage, which is the de-facto way for holding UTF-8 encoded strings, and std::getline for the reading itself.
#include <iostream> // std::cin, std::cout, std::ws
#include <string> // std::string, std::getline
int main() {
int size;
// std::ws ignores all whitespace in the stream,
// until the first non-whitespace character.
// it's prettier and handles cases a simple .ignore() does not.
std::cin >> size >> std::ws;
std::string input;
std::getline(std::cin, input);
// This condition will most certainly be true (output will be 1).
std::cout << (size == input.size()) << '\n';
}
std::string is dynamically allocated, or, as you may hear, on the heap. This is a broad subject, so feel free to venture on your own, from this given starting point! How does this help us? We can store strings of sizes unknown ahead of time on the heap, because we can always reallocate a bigger buffer! std::getline allocates and reallocates as it reads the input until newline is reached, so you can read without knowing a size beforehand. Your size variable will most probably be equal with the size of the string under the assumption that this is a school exercise where the input length is provided as you're probably not taught of dynamic memory. For good reason, though - it's complex and would needlessly distract from the actual subject (algorithms, data structures etc.). Good to keep in mind: std::strings, unlike C-style strings, are not null terminated, but you can get a null-terminated C-style string from an std::string by calling the .c_str() method.
Binary data
What's binary data? Everything that's not text: images, videos, music, 2003 MS Word documents (the .doc ones, wait 'til you see what .docx is) and many others. It's customary to store binary as raw bytes, which is a fancy way to say numbers. unsigned char is the C/C++ type used to represent these raw bytes (C++17 introduces std::byte for this purpose. To work with data from binary input we need to store it somewhere in memory - either on the stack, or on the heap. We could store the whole input at once, but binary files are considered too large for this (and, really, are - think about the size of a movie!), so we usually read it in chunks - that means, we read only a finite part of it at a time (say 256 characters, that's our buffer), and we keep reading until we reach the end of the input (usually called end-of-file or, short, EOF). As a rule of thumb, when a buffer is small and static (doesn't need to be resized, as our string above), we can store it on the stack. If any of those conditions is not met, it goes on the heap. We should note that the notions of small and large are quite context dependent - compiler, OS, hardware, runtime environment (see this thread on stack size limits and embedded systems). The buffer size you'll choose is also task-specific, so there's no rule here, too. Let's see some code now!
#include <array> // std::array
#include <fstream> // std::ifstream, std::ofstream
int main() {
// We open this file in binary mode.
// The default mode may modify the input.
std::ifstream input{"some_image.jpg", std::ios::binary};
// 256 is our buffer size, unsigned char is the array type.
// This is the C++ way of `unsigned char buffer[256]`.
std::array<unsigned char, 256> buffer;
while (input.read(buffer.data(), buffer.size())) {
// Buffer is filled, do something with it
}
// At this point, either EOF is reached or an error occurred.
if (input.eof()) {
// Less characters than the buffer's size have been read.
// .gcount() returns the number of characters read by
// the last operation.
const std::streamsize chunk_size = input.gcount();
// Do something with these characters, as in the loop.
// Valid range to access in the buffer is [0, chunk_size).
// chunk_size can be 0, too. In that case, there is no more data
// to handle.
} else {
// Some other failure, handle error.
}
}
This snippet is reading through a file using a small, stack-allocated buffer of 256 bytes. std::array makes usage convenient and safe with its methods - read the linked docs! If we want to use a large buffer (say, 16MB), we replace the std::array with an std::vector:
std::vector buffer(1 << 24); // 1 << 24 gives 16MB in bytes
Rest is the same. You could also use std::string here, too, as std::string does not imply/force UTF-8 encoding of input. It's useful to have a convention that easily differentiates between binary and text data, in code.
Something to note is that reading in smaller chunks uses less space, but takes more time - taking bytes from a file involves making OS system calls and moving disks or electrons, when reading from a hard drive or an SSD, respectively. C++'s fstream objects already do buffering for you to speed up reads, which is usually a much-needed optimization. You'll know if this affects you.
Another thing to note is the EOF and error handling, using the .eof() method. We have omitted error handling in the textual input retrieval, but here we are forced into doing it, if we don't want to lose data. When EOF is reached, usually less bytes than the buffer size have been read, so we need a way to know how much of the buffer was filled with data. This is what .gcount() tells us. Depending on the program you're making, you may deem the EOF error as "unexpected" if the buffer is partially filled (.gcount() returns a non-0 value) - for example, the data read is incomplete, according to the rules it was assumed to be created after, or in other words, the end of file was reached before the data was supposed to end. Other than that, EOF is a condition that all files are in after being fully read.
C-style I/O
This may look closer to what's taught in school. As we've explained the general concepts above, this section will be richer in coding and explanations of code. We still use C++ as a language, so the C++ version of the C headers and the std namespace will be used - to have the code that follows work in a C compiler, replace the <csomething> headers with <something.h> and remove the std:: namespace prefix from types and functions. Let's dive into it!
Textual input
The equivalent of a C++ stream (std::cin, std::fstream etc.) in C is the std::FILE. FILEs are buffered by default, as are C++ streams. We'll use std::fscanf for reading the size of the input, which is just scanf but it takes as parameter the stream you read from, and std::fgets for reading the text line.
#include <cstdio> // std::FILE, std::fscanf, std::fgets, stdin
#include <cstring> // std::strcspn
// discard_whitespace does what std::ws did above.
// It consumes all whitespace before a non-whitespace
// character from stream f.
void discard_whitespace(std::FILE* f) {
char discard;
// The leading space in the format string
// tells fscanf to consume all whitespace.
std::fscanf(f, " %c", &discard);
}
int main() {
int size;
// stdin is a macro, doesn't have a namespace,
// hence no std:: prefix.
std::fscanf(stdin, "%d", &size);
// fscanf, like std::cin, doesn't consume whitespace
discard_whitespace(stdin);
// Your school exercise will probably have a size limit for the input.
// We consider it to be 256.
const int SIZE_UPPER_BOUND = 256;
// We add some extra bytes so the maximum length input can be accomodated.
// 1 is added for the null terminator of C-style strings.
// The other 2 is because `fgets` will also read the newline,
// which can be \n or \r\n, depending on OS. See explanation after code.
char input[SIZE_UPPER_BOUND + 3];
// The actual read - sizeof gets the size of our input buffer,
// we don't have to write it twice.
std::fgets(input, sizeof input, stdin);
// fgets also reads the newline, unlike `std::getline` or
// `std::cin.getline` - we have to remove it ourselves.
input[std::strcspn(input, "\r\n")] = '\0';
// This condition will be true, as in the C++ example.
std::fprintf(stdin, "%d\n", std::strlen(input) == size);
}
Let's unpack that newline removal. std::strcspn finds the first position of any of the given characters in the input. We provide both \r and \n, to support UNIX (\n) and Windows (\r\n) newline terminators - yeah, they're different, see Wikipedia, on "Newline". By adding the null terminator, '\0', we move the ending of the string where the newline was, basically "removing" the newline. If this is a school assignment, we can assume input is correct, so we could have used size + 1 instead of std::strcspn to remove the newline:
input[size + 1] = '\0';
This doesn't work when we don't know the input size or the input may be invalid.
As an optimization trick, observe that std::strcspn returns the line length, in this case. When you don't know the size, but you need it for later, you can save the result of std::strcspn in a variable before, and then use it instead of std::strlen:
// std::size_t is an unsigned integral type, used to represent
// array sizes and indexes in C/C++
const std::size_t input_size = std::strcspn(input, "\r\n");
input[input_size] = '\0';
You'll see some people use 0 or NULL for the terminator. I recommend against it - unlike the \0 literal, that is of char type, the other two variants are implicitly casted to char. If you read the linked documentation, you'll realize NULL is even incorrect, according to the spec, as it's meant to be used in contexts that require pointers only.
An alternative method to fgets is fscanf, again. Thread carefully, though - while a simple %s may do it, it makes your code vulnerable to buffer overflow exploits. See this StackOverflow thread on disadvantages of scanf, too. Let's see the (safe) code:
std::fscanf(stdin, "%256[^\r\n]s", input);
That number limits the input size to our SIZE_UPPER_BOUND, and the [^\r\n] tells fscanf to read all characters up to \r or \n. With this method you can remove the discard_whitespace call, as fscanf with the %s verb consumes leading whitespace. A downside to fscanf is that you have to keep the size limit in the input string and the buffer size in sync - you have no way to specify the input size dynamically other than building the format string dynamically, which is overkill for a school assignment). This is a problem in more sizable codebases, but for a one-file, one-time school assignment it's not a big deal, so you may prefer fscanf over fgets, as it's less work. fscanf doesn't read the newline in the buffer, too.
Binary data
The equivalent of C++'s std::cin.read in C world is std::fread. Code will resemble its C++ counterpart:
#include <cstdio>
int main() {
// The second parameter is the file access mode.
// In this case, it is read (r) binary (b).
std::FILE* f = std::fopen("some_image.jpg", "rb");
unsigned char buffer[256];
std::size_t chunk_size;
while (chunk_size = std::fread(buffer, sizeof buffer[0], sizeof buffer, f)) {
// chunk_size == sizeof buffer, do something with the buffer
}
if (std::feof(f)) {
// chunk_size != sizeof buffer, do something with buffer
// or handle as error
} else {
// an error occurred, handle it
}
// We need to close the file, unlike in C++, where it is closed automatically.
std::fclose(f);
}
The arguments to std::fread are hairy: read the documentation. Everything else looks very similar to the C++ way, from the loop to the error handling. Why? Because it's literally the same thing - we're just using different (standard) libraries. Another similarity is that C I/O is also buffered by default, just like C++'s. What's different is the line at the end - the call to std::fclose. We're not doing anything similar in the C++ code, right? No. Remember that C++ classes have constructors and destructors, functions that are automatically called at the beginning, respectively at the end of a variable's lifetime. These two allow us to implement the RAII technique, which will do the resource management automatically (opening the file in the constructor, closing it in the destructor). RAII is used inside std::string and std::vector (and other containers, smart pointers & others). In other words, the destructor of std::ifstream closes the file at the end of main(), just as we are doing here, manually.
Hybrid approach (??)
Would you ever want to combine the two? So it seems. Let's talk drawbacks:
The C++ I/O library, due to the way it's built, takes more care to use in a performant manner compared to C's (virtual function calls and extra function calls in general, especially when using << and >> operators & stream manipulators, as each of these is a function call, compared to a single plain function call/operation with the C library). See this StackOverflow thread on i/ostream speed, too. The C++ library is also more verbose, especially in the case of outputting (ever heard of the "chevrone hell"?)
The C I/O library is easy to use improperly/unsafely, the terse, shorthand namings make code difficult to follow, and output cannot be extended to support custom types (this is a problem when using C-style I/O in C++). It also takes great care to handle dynamic buffers correctly, given that the only way of managing heap memory in C is malloc and free.
Some schools may crucify you if any trace of std::string is left in sight (or so I've heard)
Using C-style types (char[N] instead of std::array<char, N>, for example), is easier - no headers to include, as the types are builtin primitives and less to type. May be preferred in short, throwaway programs like algorithmic exercises at school.
With these in mind, we can take a look at how to conveniently combine the two when reading text and binary!
Textual input
We will take advantage of the terseness of C-style types and the ease of use of C++'s I/O library:
#include <iostream>
int main() {
int size;
std::cin >> size >> std::ws;
const int SIZE_UPPER_BOUND = 256;
char input[SIZE_UPPER_BOUND + 1];
std::cin.getline(input, sizeof input);
// Input done, solve the problem.
}
Teachers don't have to scratch their head at the presence of std::string and std::getline and all the standard library shenanigans you start using after diving in this rabbit hole. You, the programmer, don't have to clean up newline endings and memorize arcane format specifiers just to read a string and an int. Focus on code and solve problems without having to debug the input reading logic, ever - it just works!
Binary data
The convoluted hierarchical tree of C++'s I/O library types scares you, the clean assembly output enjoyer, just like Linus Torvalds. You're still somehow afraid to manually manage memory, so you choose this solution:
#include <cstdio>
#include <vector>
int main() {
// The second parameter is the file access mode.
// In this case, it is read (r) binary (b).
std::FILE* f = std::fopen("some_image.jpg", "rb");
std::vector<unsigned char> buffer(1 << 24);
std::size_t chunk_size;
while (chunk_size = std::fread(buffer.data(), sizeof buffer[0], buffer.size(), f)) {
// use the buffer
}
if (std::feof(f)) {
// handle EOF
} else {
// handle error
}
std::fclose(f);
}
Weird choice, given that you still manage the file's lifetime manually. While this may not be the best example, using C++ RAII containers together with C libraries is not uncommon - memory safety is crucial.
Trivia
as usual, weigh your decision of using namespace std;
Cool things you won't need:
speed up C++ I/O using a single line at the beginning of the program (but be careful)
disable C I/O buffering
disable C++ I/O buffering
Conclusion
I/O is the crowded junction of fundamental CS concepts, hardware & software inner workings and C++'s features and quirks. Take in what you can at a time & focus on what matters, and make sure you're building on sturdy fundamentals.

How to implement istream& getline (istream& in, MyString& s, char delim = '\n')? [duplicate]

My Question is very simple, how is getline(istream, string) implemented?
How can you solve the problem of having fixed size char arrays like with getline (char* s, streamsize n ) ?
Are they using temporary buffers and many calls to new char[length] or another neat structure?
getline(istream&, string&) is implemented in a way that it reads a line. There is no definitive implementation for it; each library probably differs from one another.
Possible implementation:
istream& getline(istream& stream, string& str)
{
char ch;
str.clear();
while (stream.get(ch) && ch != '\n')
str.push_back(ch);
return stream;
}
#SethCarnegie is right: more than one implementation is possible. The C++ standard does not say which should be used.
However, the question is still interesting. It's a classic computer-science problem. Where, and how, does one allocate memory when one does not know in advance how much memory to allocate?
One solution is to record the string's characters as a linked list of individual characters. This is neither memory-efficient nor fast, but it works, is robust, and is relatively simple to program. However, a standard library is unlikely to be implemented this way.
A second solution is to allocate a buffer of some fixed length, such as 128 characters. When the buffer overflows, you allocate a new buffer of double length, 256 characters, then copy the old characters over to the new storage, then release the old. When the new buffer overflows, you allocate an even newer buffer of double length again, 512 characters, then repeat the process; and so on.
A third solution combines the first two. A linked list of character arrays is maintained. The first two members of the list store (say) 128 characters each. The third stores 256. The fourth stores 512, and so on. This requires more programming than the others, but may be preferable to either, depending on the application.
And the list of possible implementations goes on.
Regarding standard-library implementations, #SteveJessop adds that "[a] standard library's string isn't permitted to be implemented as (1), because of the complexity requirement of operator[] for strings. In C++11 it's not permitted to be implemented as (3) either, because of the contiguity requirement for strings. The C++ committee expressed the belief that no active C++ implementation did (3) at the time they added the contiguity requirement. Of course, getline can do what it likes temporarily with the characters before adding them all to the string, but the standard does say a lot about what string can do."
The addition is relevant because, although getline could temporarily store its data in any of several ways, if the data's ultimate target is a string, this may be relevant to getline's implementation. #SteveJessop further adds, "For string itself, implementations are pretty much required to be (2) except that they can choose their own rate of expansion; they don't have to double each time as long as they multiply by some constant."
As #3bdalla said, implementation of thb doesn't work as gnu implementation. So, I wrote my own implementation, which works like gnu's one. I don't know what will be with errors in this variant, so it needs to be tested.
My implementation of getline:
std::istream& getline(std::istream& is, std::string& s, char delim = '\n'){
s.clear();
char c;
std::string temp;
if(is.get(c)){
temp.push_back(c);
while((is.get(c)) && (c != delim))
temp.push_back(c);
if(!is.bad())
s = temp;
if(!is.bad() && is.eof())
is.clear(std::ios_base::eofbit);
}
return is;
}

C++ std::string append vs push_back()

This really is a question just for my own interest I haven't been able to determine through the documentation.
I see on http://www.cplusplus.com/reference/string/string/ that append has complexity:
"Unspecified, but generally up to linear in the new string length."
while push_back() has complexity:
"Unspecified; Generally amortized constant, but up to linear in the new string length."
As a toy example, suppose I wanted to append the characters "foo" to a string. Would
myString.push_back('f');
myString.push_back('o');
myString.push_back('o');
and
myString.append("foo");
amount to exactly the same thing? Or is there any difference? You might figure that append would be more efficient because the compiler would know how much memory is required to extend the string the specified number of characters, while push_back may need to secure memory each call?
In C++03 (for which most of "cplusplus.com"'s documentation is written), the complexities were unspecified because library implementers were allowed to do Copy-On-Write or "rope-style" internal representations for strings. For instance, a COW implementation might require copying the entire string if a character is modified and there is sharing going on.
In C++11, COW and rope implementations are banned. You should expect constant amortized time per character added or linear amortized time in the number of characters added for appending to a string at the end. Implementers may still do relatively crazy things with strings (in comparison to, say std::vector), but most implementations are going to be limited to things like the "small string optimization".
In comparing push_back and append, push_back deprives the underlying implementation of potentially useful length information which it might use to preallocate space. On the other hand, append requires that an implementation walk over the input twice in order to find that length, so the performance gain or loss is going to depend on a number of unknowable factors such as the length of the string before you attempt the append. That said, the difference is probably extremely Extremely EXTREMELY small. Go with append for this -- it is far more readable.
I had the same doubt, so I made a small test to check this (g++ 4.8.5 with C++11 profile on Linux, Intel, 64 bit under VmWare Fusion).
And the result is interesting:
push :19
append :21
++++ :34
Could be possible this is because of the string length (big), but the operator + is very expensive compared with the push_back and the append.
Also it is interesting that when the operator only receives a character (not a string), it behaves very similar to the push_back.
For not to depend on pre-allocated variables, each cycle is defined in a different scope.
Note : the vCounter simply uses gettimeofday to compare the differences.
TimeCounter vCounter;
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest.push_back('a');
vTest.push_back('b');
vTest.push_back('c');
}
vCounter.stop();
cout << "push :" << vCounter.elapsed() << endl;
}
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest.append("abc");
}
vCounter.stop();
cout << "append :" << vCounter.elapsed() << endl;
}
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest += 'a';
vTest += 'b';
vTest += 'c';
}
vCounter.stop();
cout << "++++ :" << vCounter.elapsed() << endl;
}
Add one more opinion here.
I personally consider it better to use push_back() when adding characters one by one from another string. For instance:
string FilterAlpha(const string& s) {
string new_s;
for (auto& it: s) {
if (isalpha(it)) new_s.push_back(it);
}
return new_s;
}
If using append()here, I would replace push_back(it) with append(1,it), which is not that readable to me.
Yes, I would also expect append() to perform better for the reasons you gave, and in a situation where you need to append a string, using append() (or operator+=) is certainly preferable (not least also because the code is much more readable).
But what the Standard specifies is the complexity of the operation. And that is generally linear even for append(), because ultimately each character of the string being appended (and possible all characters, if reallocation occurs) needs to be copied (this is true even if memcpy or similar are used).

Is it possible to use an std::string for read()?

Is it possible to use an std::string for read() ?
Example :
std::string data;
read(fd, data, 42);
Normaly, we have to use char* but is it possible to directly use a std::string ? (I prefer don't create a char* for store the result)
Thank's
Well, you'll need to create a char* somehow, since that's what the
function requires. (BTW: you are talking about the Posix function
read, aren't you, and not std::istream::read?) The problem isn't
the char*, it's what the char* points to (which I suspect is what
you actually meant).
The simplest and usual solution here would be to use a local array:
char buffer[43];
int len = read(fd, buffer, 42);
if ( len < 0 ) {
// read error...
} else if ( len == 0 ) {
// eof...
} else {
std::string data(buffer, len);
}
If you want to capture directly into an std::string, however, this is
possible (although not necessarily a good idea):
std::string data;
data.resize( 42 );
int len = read( fd, &data[0], data.size() );
// error handling as above...
data.resize( len ); // If no error...
This avoids the copy, but quite frankly... The copy is insignificant
compared to the time necessary for the actual read and for the
allocation of the memory in the string. This also has the (probably
negligible) disadvantage of the resulting string having an actual buffer
of 42 bytes (rounded up to whatever), rather than just the minimum
necessary for the characters actually read.
(And since people sometimes raise the issue, with regards to the
contiguity of the memory in std:;string: this was an issue ten or more
years ago. The original specifications for std::string were designed
expressedly to allow non-contiguous implementations, along the lines of
the then popular rope class. In practice, no implementor found this
to be useful, and people did start assuming contiguity. At which point,
the standards committee decided to align the standard with existing
practice, and require contiguity. So... no implementation has ever not
been contiguous, and no future implementation will forego contiguity,
given the requirements in C++11.)
No, you cannot and you should not. Usually, std::string implementations internally store other information such as the size of the allocated memory and the length of the actual string. C++ documentation explicitly states that modifying values returned by c_str() or data() results in undefined behaviour.
If the read function requires a char *, then no. You could use the address of the first element of a std::vector of char as long as it's been resized first. I don't think old (pre C++11) strings are guarenteed to have contiguous memory otherwise you could do something similar with the string.
No, but
std::string data;
cin >> data;
works just fine. If you really want the behaviour of read(2), then you need to allocate and manage your own buffer of chars.
Because read() is intended for raw data input, std::string is actually a bad choice, because std::string handles text. std::vector seems like the right choice to handle raw data.
Using std::getline from the strings library - see cplusplus.com - can read from an stream and write directly into a string object. Example (again ripped from cplusplus.com - 1st hit on google for getline):
int main () {
string str;
cout << "Please enter full name: ";
getline (cin,str);
cout << "Thank you, " << str << ".\n";
}
So will work when reading from stdin (cin) and from a file (ifstream).

(How) can I use the Boost String Algorithms Library with c strings (char pointers)?

Is it possible to somehow adapt a c-style string/buffer (char* or wchar_t*) to work with the Boost String Algorithms Library?
That is, for example, it's trimalgorithm has the following declaration:
template<typename SequenceT>
void trim(SequenceT &, const std::locale & = std::locale());
and the implementation (look for trim_left_if) requires that the sequence type has a member function erase.
How could I use that with a raw character pointer / c string buffer?
char* pStr = getSomeCString(); // example, could also be something like wchar_t buf[256];
...
boost::trim(pStr); // HOW?
Ideally, the algorithms would work directly on the supplied buffer. (As far as possible. it obviously can't work if an algorithm needs to allocate additional space in the "string".)
#Vitaly asks: why can't you create a std::string from char buffer and then use it in algorithms?
The reason I have char* at all is that I'd like to use a few algorthims on our existing codebase. Refactoring all the char buffers to string would be more work than it's worth, and when changing or adapting something it would be nice to just be able to apply a given algorithm to any c-style string that happens to live in the current code.
Using a string would mean to (a) copy char* to string, (b) apply algorithm to string and (c) copy string back into char buffer.
For the SequenceT-type operations, you probably have to use std::string. If you wanted to implement that by yourself, you'd have to fulfill many more requirements for creation, destruction, value semantics etc. You'd basically end up with your implementation of std::string.
The RangeT-type operations might be, however, usable on char*s using the iterator_range from Boost.Range library. I didn't try it, though.
There exist some code which implements a std::string like string with a fixed buffer. With some tinkering you can modify this code to create a string type which uses an external buffer:
char buffer[100];
strcpy(buffer, " HELLO ");
xstr::xstring<xstr::fixed_char_buf<char> >
str(buffer, strlen(buffer), sizeof(buffer));
boost::algorithm::trim(str);
buffer[str.size()] = 0;
std::cout << buffer << std::endl; // prints "HELLO"
For this I added an constructor to xstr::xstring and xstr::fixed_char_buf to take the buffer, the size of the buffer which is in use and the maximum size of the buffer. Further I replaced the SIZE template argument with a member variable and changed the internal char array into a char pointer.
The xstr code is a bit old and will not compile without trouble on newer compilers but it needs some minor changes. Further I only added the things needed in this case. If you want to use this for real, you need to make some more changes to make sure it can not use uninitialized memory.
Anyway, it might be a good start for writing you own string adapter.
I don't know what platform you're targeting, but on most modern computers (including mobile ones like ARM) memory copy is so fast you shouldn't even waste your time optimizing memory copies. I say - wrap char* in std::string and check whether the performance suits your needs. Don't waste time on premature optimization.