reading and writing a vector of structs to file - c++

I've read a few posts on Stack Overflow and a number of other sites about writing vectors to files. I've implemented what I feel is working, but I'm having some trouble. One of the data members in the struct is a std::string, and when reading the vector back in, that data is lost. Also, after writing the first iteration, additional iterations cause a malloc error. How can I modify the code below so that I can save the vector to a file, then read it back in when the program launches again? Currently, the read is done in the constructor and the write in the destructor of a class whose only data member is the vector, but which has methods to manipulate that vector.
Here is the gist of my read / write methods. Assuming vector<element> elements...
Read:
ifstream infile;
infile.open("data.dat", ios::in | ios::binary);
infile.seekg (0, ios::end);
elements.resize(infile.tellg()/sizeof(element));
infile.seekg (0, ios::beg);
infile.read( (char *) &elements[0], elements.capacity()*sizeof(element));
infile.close();
Write:
ofstream outfile;
outfile.open("data.dat", ios::out | ios::binary | ios_base::trunc);
elements.resize(elements.size());
outfile.write( (char *) &elements[0], elements.size() * sizeof(element));
outfile.close();
Struct element:
struct element {
int id;
string test;
int other;
};

In C++, objects generally cannot be read from and written to disk as raw memory like that. In particular, your struct element contains a string, which is a non-POD data type, so its bytes cannot simply be copied to and from a file.
A thought experiment might help clarify this. Your code assumes that all your element values are the same size. What would happen if one of the string test values was longer than what you've assumed? How would your code know what size to use when reading and writing to disk?
You will want to read about serialization for more information about how to handle this.
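A minimal sketch of why the raw copy loses the text, assuming the same element struct as in the question. The helper text_is_inside_object is hypothetical and only for illustration; it checks whether the characters of the string live within the bytes of the struct itself.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Same layout as the struct in the question.
struct element {
    int id;
    std::string test;
    int other;
};

// Hypothetical helper: true when the characters of e.test are stored
// inside the bytes of the element object itself.
bool text_is_inside_object(const element& e)
{
    const char* begin = reinterpret_cast<const char*>(&e);
    const char* end = begin + sizeof(element);
    const char* text = e.test.data();
    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(text, begin) && before(text, end);
}
```

For a long string the helper returns false: sizeof(element) stays the same however long the text is, so writing sizeof(element) bytes per element can only save the string's internal pointers. Very short strings may be stored inside the object by some implementations, which is why the raw copy can appear to work in small tests.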

Your code assumes all the relevant data exists directly inside the vector, whereas strings are fixed-size objects that hold pointers to their variable-sized content on the heap. You're basically saving the pointers and not the text. You should write some string serialisation code, for example:
bool write_string(std::ostream& os, const std::string& s)
{
    // write the length as ASCII followed by '|', then the raw characters
    size_t n = s.size();
    return (os << n << '|') && os.write(s.data(), n);
}
Then you can write serialisation routines for your struct. There are a few design options:
- many people like to declare Binary_IStream / Binary_OStream types that can house a std::ostream, but being a distinct type can be used to create a separate set of serialisation routines ala:
operator<<(Binary_OStream& os, const Some_Class&);
Or, you can just abandon the usual streaming notation when dealing with binary serialisation, and use function call notation instead. Obviously, it's nice to let the same code correctly output both binary serialisation and human-readable serialisation, so the operator-based approach is appealing.
If you serialise numbers, you need to decide whether to do so in a binary format or ASCII. With a pure binary format, where portability is required (even between 32-bit and 64-bit compiles on the same OS), you may need to make some effort to encode and use type size metadata (e.g. int32_t or int64_t?) as well as endianness (e.g. consider network byte order and ntohl()-family functions). With ASCII you can avoid some of those considerations, but it's variable length and can be slower to write/read. Below, I arbitrarily use ASCII with a '|' terminator for numbers.
bool write_element(std::ostream& os, const element& e)
{
    return (os << e.id << '|') && write_string(os, e.test) && (os << e.other << '|');
}
And then for your vector:
os << elements.size() << '|';
for (std::vector<element>::const_iterator i = elements.begin();
     i != elements.end(); ++i)
    write_element(os, *i);
To read this back:
std::vector<element> elements;
size_t n;
char c;
if ((is >> n >> c) && c == '|') // also consume the '|' written after the count
    for (size_t i = 0; i < n; ++i)
    {
        element e;
        if (!read_element(is, e))
            return false; // fail
        elements.push_back(e);
    }
...which needs...
bool read_element(std::istream& is, element& e)
{
    char c;
    return (is >> e.id >> c) && c == '|' &&
           read_string(is, e.test) &&
           (is >> e.other >> c) && c == '|';
}
...and...
bool read_string(std::istream& is, std::string& s)
{
    size_t n;
    char c;
    if ((is >> n >> c) && c == '|')
    {
        s.resize(n);
        // &s[0] is writable; s.data() is const before C++17
        return n == 0 || static_cast<bool>(is.read(&s[0], n));
    }
    return false;
}
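Assembled into one compilable unit, the approach above might look like the following sketch. The wrapper names write_elements, read_elements and round_trip are assumptions added for illustration, and the element struct is the one from the question. When used with files, the streams should be opened with ios::binary so the raw string bytes are not translated.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct element {
    int id;
    std::string test;
    int other;
};

// A string is stored as "<size>|<raw bytes>".
bool write_string(std::ostream& os, const std::string& s)
{
    return (os << s.size() << '|') &&
           os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

bool read_string(std::istream& is, std::string& s)
{
    std::size_t n;
    char c;
    if (!((is >> n >> c) && c == '|'))
        return false;
    s.resize(n);
    return n == 0 ||
           static_cast<bool>(is.read(&s[0], static_cast<std::streamsize>(n)));
}

// The vector is stored as "<count>|" followed by each element.
bool write_elements(std::ostream& os, const std::vector<element>& v)
{
    if (!(os << v.size() << '|'))
        return false;
    for (const element& e : v)
        if (!((os << e.id << '|') && write_string(os, e.test) &&
              (os << e.other << '|')))
            return false;
    return true;
}

bool read_elements(std::istream& is, std::vector<element>& v)
{
    std::size_t n;
    char c;
    if (!((is >> n >> c) && c == '|'))
        return false;
    v.clear();
    for (std::size_t i = 0; i < n; ++i) {
        element e;
        if (!((is >> e.id >> c) && c == '|' && read_string(is, e.test) &&
              (is >> e.other >> c) && c == '|'))
            return false;
        v.push_back(e);
    }
    return true;
}

// Usage sketch: serialise into memory and read the data back.
// Returns an empty vector if either step fails.
std::vector<element> round_trip(const std::vector<element>& v)
{
    std::stringstream buffer;
    std::vector<element> copy;
    if (!write_elements(buffer, v) || !read_elements(buffer, copy))
        copy.clear();
    return copy;
}
```

Because every string carries its length, the text may itself contain '|', spaces or digits without confusing the reader.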

Related

save struct into a binary file and read it

I have an array of structs in a class, and I want to save it to a file.
If I write ac.mem[i].username, only the username is stored in the file.
And if I write ac.mem[i], nothing is saved.
This is part of my code:
const int len=5;
class account {
public:
struct members {
string username;
string password;
int acsess;
}mem[len];
};
class account ac;
....
ac.mem[0] = { "admin","soran",5 };
ac.mem[1] = { "hamid","hamid",4 };
fstream acc1("account", ios::binary);
for (int i = 0; i <= 1; i++) {
acc1.write((char*)&ac.mem[i].username, sizeof(ac.mem[i].username));
}
acc1.close();
....
ifstream acc2("account", ios::binary);
for (int i = 0; i <= len; ++i) {
acc1.read((char*)&ac.mem[i].username, sizeof(ac.mem[i].username));
cout << i + 1 << "." << setw(10) << ac.mem[i].username << setw(20) << ac.mem[i].password << setw(20) << ac.mem[i].acsess << endl;
}
acc2.close();
std::string objects are pretty complex types – they internally maintain pointers to memory. When you just write the internal representation to a file (casting address of to char*) all you write out are these pointers plus possibly some additional management data.
The actual string contents, though, are stored at the locations these pointers point to. When reading back, you cannot assume that you will find the same data at the addresses you've just restored from the file. (Even if the original string object still exists, having two different std::string objects share the same internals will lead to undefined behaviour through double deletion, if reading and writing them from/to memory that way isn't undefined already.)
What you actually want to print to file are the contents of the string – which you get by either std::string::c_str or alternatively std::string::data. You might additionally want to include the terminating null character (hopefully there are no internal ones within the string...) to be able to read back multiple strings, stopping reading each one at exactly the null terminator. Writing to file might then look like:
std::string s; // assign some content...
std::ofstream f; // open some path
if(f) // stream opened successfully?
{
f.write(s.c_str(), s.length() + 1);
}
Note that std::string::length returns the length without the terminating null character, so if you want/need to include it, you need to add one, as done above.
Alternatively, you can write out the string's length first and then skip writing the null character – with the advantage that on reading back you already know in advance how many characters to read and thus how much to pre-allocate within your objects (std::string::reserve). For compatibility across different compilers and especially machines, make sure to write out fixed-size data types from the <cstdint> header, e.g.:
uint32_t length = s.length();
f.write(reinterpret_cast<char const*>(&length), sizeof(length));
f.write(s.c_str(), s.length());
This approach covers internally existing null characters as well (though if such data exists, std::vector<unsigned char> or preferably std::vector<uint8_t> might be better alternative, std::string is intended for texts).
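The length-prefixed variant, together with a matching read, could be sketched as follows. The function names write_string_bin, read_string_bin and echo_through_stream are made up for this example, and the length is written in the byte order of the host machine, which is an assumption that only holds when the file is read back on a compatible machine.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 32-bit length followed by the raw characters (no terminator).
bool write_string_bin(std::ostream& os, const std::string& s)
{
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&length), sizeof length);
    os.write(s.data(), length);
    return static_cast<bool>(os);
}

// Read the length first, then exactly that many characters.
bool read_string_bin(std::istream& is, std::string& s)
{
    std::uint32_t length = 0;
    if (!is.read(reinterpret_cast<char*>(&length), sizeof length))
        return false;
    std::string tmp(length, '\0');
    if (length > 0 && !is.read(&tmp[0], length))
        return false;
    s.swap(tmp);
    return true;
}

// Usage sketch: push a string through an in-memory binary stream.
// Returns an empty string if either step fails.
std::string echo_through_stream(const std::string& s)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    std::string result;
    if (!write_string_bin(buffer, s) || !read_string_bin(buffer, result))
        result.clear();
    return result;
}
```

With a file, the same two functions would be called on a std::ofstream and a std::ifstream opened with std::ios::binary.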
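If you want to use the C language, you could refer to the following code. Note that the name is stored in a fixed-size char array inside the struct; with a char* member, fwrite would store only the pointer, not the text.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#pragma warning(disable : 4996)
typedef struct {
    char name[32];
    int phone;
} address;
int main(void)
{
    int i;
    address a[3];
    FILE* fp;
    for (i = 0; i < 3; i++)
    {
        memset(&a[i], 0, sizeof a[i]); /* avoid writing uninitialized bytes */
        strcpy(a[i].name, "jojo");
        a[i].phone = 123456;
    }
    fp = fopen("list.txt", "ab");
    if (fp == NULL)
        return 1;
    for (i = 0; i < 3; i++)
    {
        printf("%s, %d\n", a[i].name, a[i].phone);
        fwrite(&a[i], sizeof(address), 1, fp);
    }
    fclose(fp);
    return 0;
}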

How to reserve memory space for std::vector in terms of bytes?

I am trying to write up a config parser class in c++.
I'll first give a snippet of my class:
class foo{
private:
struct st{
std::vector<std::pair<std::string,std::string>> dvec;
std::string dname;
};
std::vector<st> svec;
public:
//methods of the class
};
int main(){
foo f;
//populate foo
}
I will populate the vectors with data from a file. The file has some text with delimiters. I'll break up the text into strings using the delimiters. I know the exact size of the file and since I am keeping all data as character string, it's safe to assume the vector svec will take the same memory space as the file size. However, I don't know how many strings there will be. e.g., I know the file size is 100 bytes but I don't know if it's 1 string of 100 characters or 10 strings of 10 characters each or whatever.
I would like to avoid reallocation as much as possible. But std::vector.reserve() and std::vector.resize() both allocate memory in terms of number of elements. And this is my problem, I don't know how many elements there will be. I just know how many bytes it will need. I dug around a bit but couldn't find anything.
I am guessing I will be cursed if I try this- std::vector<st> *svec = (std::vector<st> *) malloc(filesize);
Is there any way to reserve memory for vector in terms of bytes instead of number of elements? Or some other workaround?
Thank you for your time.
Edit: I have already written the entire code and it's working. I am just looking for ways to optimize it. The entire code is too long so I will give the repository link for anyone interested- https://github.com/Rakib1503052/Config_parser
For the relevant part of the code:
class Config {
private:
struct section {
std::string sec_name;
std::vector<std::pair<std::string, std::string>> sec_data;
section(std::string name)
{
this->sec_name = name;
//this->sec_data = data;
}
};
//std::string path;
std::vector<section> m_config;
std::map<std::string, size_t> section_map;
public:
void parse_config(std::string);
//other methods
};
void Config::parse_config(string path)
{
struct stat buffer;
bool file_exists = (stat(path.c_str(), &buffer) == 0);
if (!file_exists) { throw exception("File does not exist in the given path."); }
else {
ifstream FILE;
FILE.open(path);
if (!FILE.is_open()) { throw exception("File could not be opened."); }
string line_buffer, key, value;;
char ignore_char;
size_t current_pos = 0;
//getline(FILE, line_buffer);
while (getline(FILE, line_buffer))
{
if (line_buffer == "") { continue; }
else{
if ((line_buffer.front() == '[') && (line_buffer.back() == ']'))
{
line_buffer.erase(0, 1);
line_buffer.erase(line_buffer.size() - 1);
this->m_config.push_back(section(line_buffer));
current_pos = m_config.size() - 1;
section_map[line_buffer] = current_pos;
}
else
{
stringstream buffer_stream(line_buffer);
buffer_stream >> key >> ignore_char >> value;
this->m_config[current_pos].sec_data.push_back(field(key, value));
}
}
}
FILE.close();
}
}
It reads an INI file of the format
[section1]
key1 = value1
key2 = value2
[section2]
key1 = value1
key2 = value2
.
.
.
However, after some more digging I found out that std::string works differently than I thought. Simply put, the character data is not stored in the vector. The vector only holds the std::string objects, which in turn point to character data allocated elsewhere. This makes my case moot.
I'll keep the question here for anyone interested. In particular, if the data is changed to fundamental types like int or double, the question stands and it has a great answer here- https://stackoverflow.com/a/18991810/11673912
Feel free to share other opinions.
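Since reserve() counts elements rather than bytes, one possible workaround for the section vector is a cheap first pass that counts the "[section]" lines before parsing. The sketch below is only an illustration: count_sections is a hypothetical helper that is not part of the linked repository, and whether reading the file twice beats letting the vector grow is something to measure.

```cpp
#include <cstddef>
#include <istream>
#include <sstream> // only needed for in-memory use, e.g. std::istringstream
#include <string>

// Hypothetical helper: count lines of the form "[name]", then rewind the
// stream so the real parsing pass can start from the beginning.
std::size_t count_sections(std::istream& in)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line))
        if (line.size() >= 2 && line.front() == '[' && line.back() == ']')
            ++count;
    in.clear();                 // reset eofbit/failbit set by the last getline
    in.seekg(0, std::ios::beg); // rewind for the second pass
    return count;
}
```

In parse_config this would be called once after opening the file, as in m_config.reserve(count_sections(FILE)), before the existing getline loop runs.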

Read selected columns using fstream

I am using the following code to read my data from a file, but my issue is that I only want to capture a few of the many columns in the file. Is there a better way of doing this than the approach I am using?
void Data::read_simulated (const string &filepath)
{
ifstream data_out (filepath.c_str());
if (!data_out)
cout<<"Failed to open"<<endl;
else
{
string id_p,age_p, dim_p, my_p, mcf_p, mcp_p, mcl_p, bw_p, bcs_p;
string dummy_line, g;
getline(data_out, dummy_line);
while(data_out>>age_p>>g>>g>>g>>g>>g>>g>>g>>bcs_p>>g>>g>>my_p>>g>>g>>bw_p>>g>>g>>dim_p>>g>>g>>g>>g>>g>>g>>g>>g>>g>>g>>g>>g>>g)
{
//s.cow_id.push_back(get_number(id_p));
if (get_number(age_p)>=1424.0 &&get_number(age_p)<=1733.0)
{
age_pre.push_back(age_p);
dim_pre.push_back(dim_p);
my_pre.push_back(my_p);
//mcf_obs.push_back(get_number(mcf_p));
// mcp_obs.push_back(get_number(mcp_p));
//mcl_obs.push_back(get_number(mcl_p));
bw_pre.push_back(bw_p);
bcs_pre.push_back(bcs_p);
}
}
data_out.close();
}
}
If the columns are fixed-width, so that each one starts at a known offset within the line, you could use the std::istream::ignore or std::istream::seekg functions to skip the columns you don't need.
If that's not the case, at least make your code prettier by using this function:
std::istream &skip_row(std::istream &is, unsigned int count)
{
std::string s;
while(count-- && is >> s) {}
return is;
}
You could make it a template to accept various types, or you could overload an operator>> for a class to get a different syntax than this:
data_out >> age_p && skip_row(data_out, 5) && data_out >> bcs_p >> ...
A naive approach is to read all fields of a row into a std::vector<std::string> and then index it, but this will have an impact on performance due to the extra memory allocations.
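A middle ground between the two ideas is to name the wanted column positions once and skip everything else while scanning the line. This is a minimal sketch, assuming whitespace-separated fields; pick_columns and its parameters are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the whitespace-separated fields of `line`
// whose zero-based positions are listed in `wanted` (ascending order).
std::vector<std::string> pick_columns(const std::string& line,
                                      const std::vector<std::size_t>& wanted)
{
    std::vector<std::string> picked;
    std::istringstream fields(line);
    std::string field;
    std::size_t next = 0; // index of the next wanted position
    for (std::size_t column = 0;
         next < wanted.size() && (fields >> field); ++column) {
        if (column == wanted[next]) {
            picked.push_back(field);
            ++next;
        }
    }
    return picked; // shorter than `wanted` if the line ran out of fields
}
```

Each line would be read with std::getline and passed to the helper, which stops scanning as soon as the last wanted column has been found.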

A local array repeats inside a loop! C++

The current_name is a local char array inside the following loop. I declared it inside the loop so it changes every time I read a new line from a file. But, for some reason the previous data is not removed from the current_name! It prints old data out if it wasn't overridden by new characters from the next line.
ANY IDEAS?
while (isOpen && !file.eof()) {
char current_line[LINE];
char current_name[NAME];
file.getline(current_line, LINE);
int i = 0;
while (current_line[i] != ';') {
current_name[i] = current_line[i];
i++;
}
cout << current_name << endl;
}
You're not terminating current_name after filling it. Add current_name[i] = 0; after the inner loop, just before your cout. You're probably seeing this when you read abcdef, then read jkl, and get jkldef as output.
UPDATE
You wanted to know if there is a better way. There is--and we'll get to it. But, coming from Java, your question and followup identified some larger issues that I believe you should be aware of. Be careful what you wish for--you may actually get it [and more] :-). All of the following is based on love ...
Attention All Java Programmers! Welcome to "A Brave New World"!
Basic Concepts
Before we even get to C the language, we need to talk about a few concepts first.
Computer Architecture:
https://en.wikipedia.org/wiki/Computer_architecture
https://en.wikipedia.org/wiki/Instruction_set
Memory Layout of Computer Programs:
http://www.geeksforgeeks.org/memory-layout-of-c-program/
Differences between Memory Addresses/Pointers and Java References:
Is Java "pass-by-reference" or "pass-by-value"?
https://softwareengineering.stackexchange.com/questions/141834/how-is-a-java-reference-different-from-a-c-pointer
Concepts Alien to Java Programmers
The C language gives you direct access to the underlying computer architecture. It will not do anything that you don't explicitly specify. Herein, I'm mentioning C [for brevity] but what I'm really talking about is a combination of the memory layout and the computer architecture.
If you read memory that you didn't initialize, you will see seemingly random data.
If you allocate something from the heap, you must explicitly free it. It doesn't magically get marked for deletion by a garbage collector when it "goes out of scope".
There is no garbage collector in C
C pointers are far more powerful than Java references. You can add and subtract values to pointers. You can subtract two pointers and use the difference as an index value. You can loop through an array without using index variables--you just dereference a pointer and increment the pointer.
In Java, objects are stored in the heap, and a local variable holds only a reference to the object (primitives aside). Each object requires a separate heap allocation. This is slow and time consuming.
In C, the data of automatic variables is stored in the stack frame. The stack frame is a contiguous area of bytes. To allocate space for the stack frame, C simply subtracts the desired size from the stack pointer [hardware register]. The size of the stack frame is the sum of all variables within a given function's scope, regardless of whether they're declared inside a loop inside the function.
Its initial value depends upon what previous function used that area for and what byte values it stored there. Thus, if main calls function fnca, it will fill the stack with whatever data. If then main calls fncb it will see fnca's values, which are semi-random as far as fncb is concerned. Both fnca and fncb must initialize stack variables before they are used.
Declaration of a C variable without an initializer clause does not initialize the variable. For the bss area, it will be zero. For a stack variable, you must do that explicitly.
There is no range checking of array indexes in C [or pointers to arrays or array elements for that matter]. If you write beyond the defined area, you will write into whatever has been mapped/linked into the memory region next. For example, if you have a memory area: int x[10]; int y; and you [inadvertently] write to x[10] [one beyond the end] you will typically corrupt y (formally, the behavior is undefined).
This is true regardless of which memory section (e.g. data, bss, heap, or stack) your array is in.
C has no concept of a string. When people talk about a "c string" what they're really talking about is a char array that has an "end of string" (aka EOS) sentinel character at the end of the useful data. The "standard" EOS char is almost universally defined as 0x00 [since ~1970]
The only intrinsic types supported by an architecture are: char, short, int, long/pointer, long long, and float/double. There may be some others on a given arch, but that's the usual list. Everything else (e.g. a class or struct) is "built up" by the compiler as a convenience to the programmer from the arch intrinsic types.
Here are some things that are specific to C [and C++]:
- C has preprocessor macros. Java has no concept of macros. Preprocessor macros can be thought of as a crude form of metaprogramming.
- C has inline functions. They look just like regular functions, but the compiler will attempt to insert their code directly into any function that calls one. This is handy if the function is cleanly defined but small (e.g. a few lines). It saves the overhead of actually calling the function.
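The pointer arithmetic described above (walking an array by incrementing a pointer, and using the difference of two pointers as an index) can be sketched as follows. The function names are made up for this illustration, and the code is written so that it compiles as C++.

```cpp
#include <cstddef>

// Sum an int array by walking a pointer from the first element
// up to (but not including) one past the last element.
int sum_with_pointers(const int* first, const int* last)
{
    int total = 0;
    for (const int* p = first; p != last; ++p)
        total += *p; // dereference, then the loop increments the pointer
    return total;
}

// Return the index of the first element equal to `value`,
// or `count` if no element matches.
std::size_t index_of(const int* data, std::size_t count, int value)
{
    const int* end = data + count; // pointer plus integer
    for (const int* p = data; p != end; ++p)
        if (*p == value)
            return static_cast<std::size_t>(p - data); // pointer difference = index
    return count;
}
```

Neither function checks that the pointers describe a valid range; as noted above, that is the caller's responsibility.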
Examples
Here are several versions of your original program as an example:
// myfnc1 -- original
void
myfnc1(void)
{
istream file;
while (isOpen && !file.eof()) {
char current_line[LINE];
char current_name[NAME];
file.getline(current_line, LINE);
int i = 0;
while (current_line[i] != ';') {
current_name[i] = current_line[i];
i++;
}
current_name[i] = 0;
cout << current_name << endl;
}
}
// myfnc2 -- moved definitions to function scope
void
myfnc2(void)
{
istream file;
int i;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
i = 0;
while (current_line[i] != ';') {
current_name[i] = current_line[i];
i++;
}
current_name[i] = 0;
cout << current_name << endl;
}
}
// myfnc3 -- converted to for loop
void
myfnc3(void)
{
istream file;
int i;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
for (i = 0; current_line[i] != ';'; ++i)
current_name[i] = current_line[i];
current_name[i] = 0;
cout << current_name << endl;
}
}
// myfnc4 -- converted to use pointers
void
myfnc4(void)
{
istream file;
const char *line;
char *name;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
name = current_name;
for (line = current_line; *line != ';'; ++line, ++name)
*name = *line;
*name = 0;
cout << current_name << endl;
}
}
// myfnc5 -- more efficient use of pointers
void
myfnc5(void)
{
istream file;
const char *line;
char *name;
int chr;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
name = current_name;
line = current_line;
for (chr = *line++; chr != ';'; chr = *line++, ++name)
*name = chr;
*name = 0;
cout << current_name << endl;
}
}
// myfnc6 -- fixes bug if line has no semicolon
void
myfnc6(void)
{
istream file;
const char *line;
char *name;
int chr;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
name = current_name;
line = current_line;
for (chr = *line++; chr != 0; chr = *line++, ++name) {
if (chr == ';')
break;
*name = chr;
}
*name = 0;
cout << current_name << endl;
}
}
// myfnc7 -- recoded to use "smart" string
void
myfnc7(void)
{
istream file;
const char *line;
int chr;
char current_line[LINE];
xstr_t current_name;
xstr_t *name;
name = &current_name;
xstrinit(name);
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
xstragain(name);
line = current_line;
for (chr = *line++; chr != 0; chr = *line++) {
if (chr == ';')
break;
xstraddchar(name,chr);
}
cout << xstrcstr(name) << endl;
}
xstrfree(name);
}
Here is a "smart" string [buffer] class similar to what you're used to:
// xstr -- "smart" string "class" for C
typedef struct {
size_t xstr_maxlen; // maximum space in string buffer
char *xstr_lhs; // pointer to start of string
char *xstr_rhs; // pointer to end of string (where the next char is appended)
} xstr_t;
// xstrinit -- initialize string buffer
void
xstrinit(xstr_t *xstr)
{
memset(xstr,0,sizeof(*xstr));
}
// xstragain -- reset string buffer for reuse
void
xstragain(xstr_t *xstr)
{
xstr->xstr_rhs = xstr->xstr_lhs;
// keep the buffer terminated so stale text is not seen as the value
if (xstr->xstr_lhs != NULL)
*xstr->xstr_lhs = 0;
}
// xstrgrow -- grow string buffer
void
xstrgrow(xstr_t *xstr,size_t needlen)
{
size_t curlen;
size_t newlen;
char *lhs;
lhs = xstr->xstr_lhs;
// get amount we're currently using
curlen = xstr->xstr_rhs - lhs;
// get amount we'll need after adding the whatever
newlen = curlen + needlen + 1;
// allocate more if we need it
if ((newlen + 1) >= xstr->xstr_maxlen) {
// allocate what we'll need plus a bit more so we're not called on
// each add operation
xstr->xstr_maxlen = newlen + 100;
// get more memory
lhs = realloc(lhs,xstr->xstr_maxlen);
xstr->xstr_lhs = lhs;
// adjust the append pointer
xstr->xstr_rhs = lhs + curlen;
}
}
// xstraddchar -- add character to string
void
xstraddchar(xstr_t *xstr,int chr)
{
// get more space in string buffer if we need it
xstrgrow(xstr,1);
// add the character
*xstr->xstr_rhs++ = chr;
// maintain the sentinel/EOS as we go along
*xstr->xstr_rhs = 0;
}
// xstraddstr -- add string to string
void
xstraddstr(xstr_t *xstr,const char *str)
{
size_t len;
len = strlen(str);
// get more space in string buffer if we need it
xstrgrow(xstr,len);
// add the string
memcpy(xstr->xstr_rhs,str,len);
xstr->xstr_rhs += len;
// maintain the sentinel/EOS as we go along
*xstr->xstr_rhs = 0;
}
// xstrcstr -- get the "c string" value
char *
xstrcstr(xstr_t *xstr)
{
return xstr->xstr_lhs;
}
// xstrfree -- release string buffer data
void
xstrfree(xstr_t *xstr)
{
char *lhs;
lhs = xstr->xstr_lhs;
if (lhs != NULL)
free(lhs);
xstrinit(xstr);
}
Recommendations
Before you try to "get around" a "c string", embrace it. You'll encounter it in many places. It's unavoidable.
Learn how to manipulate pointers as easily as index variables. They're more flexible and [once you get the hang of them] easier to use. I've seen code written by programmers who didn't learn this and their code is always more complex than it needs to be [and usually full of bugs that I've needed to fix].
Good commenting is important in any language but, perhaps, more so in C than Java for certain things.
Always compile with -Wall -Werror and fix any warnings. You have been warned :-)
I'd play around a bit with the myfnc examples I gave you. This can help.
Get a firm grasp of the basics before you ...
And now, a word about C++ ...
Most of the above was about architecture, memory layout, and C. All of that still applies to C++.
C++ does do a more limited reclamation of stack variables when the function returns and they go out of scope. This has its pluses and minuses.
C++ has many classes to alleviate the tedium of common functions/idioms/boilerplate. It has the std standard template library. It also has boost. For example, std::string will probably do what you want. But, compare it against my xstr first.
But, once again, I wish to caution you. At your present level, work from the fundamentals, not around them.
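For comparison with the xstr version, here is a minimal C++ sketch of the same loop using std::string. It assumes, as in the question, that the name is whatever precedes the first ';' on each line; read_names is a hypothetical name chosen for this example.

```cpp
#include <istream>
#include <sstream> // only needed for in-memory use, e.g. std::istringstream
#include <string>
#include <vector>

// Return the text before the first ';' on each line
// (the whole line if it contains no ';').
std::vector<std::string> read_names(std::istream& file)
{
    std::vector<std::string> names;
    std::string line;
    // Loop on the read itself rather than on eof(), so a failed read
    // never produces an extra, stale iteration.
    while (std::getline(file, line)) {
        // find() returns npos when there is no ';', and substr(0, npos)
        // then yields the entire line.
        names.push_back(line.substr(0, line.find(';')));
    }
    return names;
}
```

Each std::string manages its own length and storage, so there is no terminator to forget and no fixed NAME limit to overflow.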
Adding current_name[i] = 0; as described did not work for me.
Also, I got an error on isOpen as shown in the question.
Therefore, I freehanded a revised program beginning with the code presented in the question, and making adjustments until it worked properly given an input file having two rows of text in groups of three alpha characters that were delimited with " ; " without the quotes. That is, the delimiting code was space, semicolon, space. This code works.
Here is my code.
#define LINE 1000
int j = 0;
while (!file1.eof()) {
j++;
if( j > 20){break;} // back up escape for testing, in the event of an endless loop
char current_line[LINE];
//string current_name = ""; // see redefinition below
file1.getline(current_line, LINE, '\n');
stringstream ss(current_line); // stringstream works better in this case
while (!ss.eof()) {
string current_name;
ss >> current_name;
if (current_name != ";")
{
cout << current_name << endl;
} // End if(current_name....
} // End while (!ss.eof...
} // End while(!file1.eof() ...
file1.close();
cout << "Done \n";

Which is faster and more efficient, processing char by char as char or as stream?

If I want to process a text file char by char before using it, which method is most efficient?
I can do this:
ifstream ifs("the_file.txt", ios_base::in);
char c;
while (ifs >> noskipws >> c) {
// process c ...
}
ifs.close();
and this:
ifstream ifs("the_file.txt", ios_base::in);
stringstream sstr;
sstr << ifs.rdbuf();
string txt = sstr.str();
for (string::iterator iter = txt.begin(); iter != txt.end(); ++iter) {
// process *iter ...
}
The final output will be the string split based on the chars found while iterating.
Which is faster? Or maybe there's another more efficient way? Do I need to flush the stringstream for every character (I read somewhere that flush is affecting performance)?
a) Measure (I'd guess that first one should be faster as it avoids extra allocation, but it is just a guess)
b) While it can indeed be a Really Bad case of premature optimization, if you really need the very best performance, try something along the lines of:
int f = open(...);
//error handling here
char buf[256];
while(1) {
    ssize_t rd = read(f,buf,sizeof buf);
    if( rd <= 0 ) break; // 0 = end of file, -1 = error
    for(const char*p=buf;p<buf+rd;++p) {
        //process *p; note that this loop can be entered more than once
    }
}
close(f);
I'm pretty sure that it will be very difficult to beat this code performance-wise (unless going into very low-level non-standard IO); however, it might easily happen that ifstream will produce comparable results. Or it might not.
NB: for C++ the difference provided by this technique (read fixed-size buffer, then scan buffer) is small and usually negligible, but for other languages it might easily provide up to 2x difference (has been observed on Java).
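The same technique (read a fixed-size buffer, then scan the buffer) can also be written with standard C++ streams. The sketch below is an assumption-laden illustration rather than a benchmarked implementation: count_char is a made-up stand-in for "process each char", and the 4096-byte buffer size is arbitrary.

```cpp
#include <cstddef>
#include <istream>
#include <sstream> // only needed for in-memory use, e.g. std::istringstream

// Hypothetical stand-in for per-character processing:
// count how often `target` occurs in the stream.
std::size_t count_char(std::istream& in, char target)
{
    char buf[4096];
    std::size_t hits = 0;
    while (in) {
        in.read(buf, sizeof buf);
        // gcount() reports how many chars the last read delivered;
        // it is smaller than the buffer (possibly 0) on the final chunk.
        const std::streamsize got = in.gcount();
        for (std::streamsize i = 0; i < got; ++i)
            if (buf[i] == target)
                ++hits;
    }
    return hits;
}
```

For a file, the stream would be a std::ifstream opened with std::ios::binary so that no characters are translated on the way in.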
Based on a crude test with a 20-megabyte file, this method loads the file into one string in 0.1 seconds, versus 0.5 seconds for the rdbuf method you had earlier. So basically there is no difference unless you are accessing lots and lots of files.
ifstream ifs(filename, ios::binary);
string txt;
unsigned int cursor = 0;
const unsigned int readsize = 4096;
while (ifs.good())
{
txt.resize(cursor + readsize);
ifs.read(&txt[cursor], readsize);
cursor += (unsigned int)ifs.gcount();
}
txt.resize(cursor);