I've been coding a multi-threaded simulation that stores its outputs in files. So far I've assigned one file per core (with an ofstream myfiles[NUMBER_OF_CORES]) from the beginning, but it's getting messy as I'm working with several computers that have 20+ cores. I did that to avoid the overhead of contending on a single file, but could I instead use one stream per core and, at the end, do something like:
for (int i = 0; i < NUMBER_OF_CORES; i++) {
    myfile << CORE_STREAM[i];
}
starting from a CORE_STREAM[NUMBER_OF_CORES] array? I've never manipulated streams this way. Which class should I construct this from, if it exists?
You could use an ostringstream to store intermediate results in memory. Like ofstream, it implements the ostream interface, so your existing code will probably work as-is.
To dump one stream into another, you'd do myfile << core_stream[i].rdbuf() (rdbuf stands for "read buffer"; it returns the stream's underlying buffer). Note that the source buffer must be readable for this to insert anything, so prefer a read/write std::stringstream over a write-only ostringstream, or use core_stream[i].str() instead.
Have you considered using a ZMQ pipeline? Your simulation threads could write to a ZMQ_PUSH socket (see zmq_socket(3)) and whatever is writing to the file (another thread or process, ZMQ doesn't care) could read from a ZMQ_PULL socket. That way your simulation threads can potentially get out of doing any blocking IO without staging results in memory. I can't imagine working on a distributed computing project these days and not using ZMQ.
Related
I have read about buffers and streams and how they work with files in C++, but I don't understand why a buffer is needed if there is a stream: the stream is already there to transfer a file's data to the program. So why do we use buffers to store data (seemingly performing the same task the stream does), and what are buffered and unbuffered streams?
Consider a stream that writes to a file. If there were no buffer, then every time your program wrote a single byte to the stream, a single byte would have to be written to the file. That's very inefficient. So streams have buffers to decouple operations on one side of the stream from operations on the other side.
OK, let's start from scratch. Suppose you want to work with files. For this you would have to manage how the data gets into your file, whether sending data to the file succeeded or not, and all the other basic details. Now, either you can manage all of that on your own, which would take a lot of time and hard work, or you can use a stream.
Yes, you can allocate a stream for such purposes. Streams work through abstraction: we C++ programmers don't need to know how they work internally; we only know that we stand at one end of the stream (our program's side), we hand our data to the stream, and it has the responsibility of transferring the data from one end to the other (the file's side).
E.g.:
ofstream file("abc.txt"); // an output file stream object is created
file << "Hello";          // we hand our data to the stream and it transfers it
file.close();             // close the file
Now, if you work with files, you should know that file operations are really expensive: it takes more time to access a file than to access memory, and we don't need to touch the file on every operation. So a feature called a buffer is used: a part of the computer's memory that temporarily stores data on its way to or from a file.
Suppose that instead of reading the file every time you need data, you just read a memory location into which all of the file's data has been temporarily copied. That is a much cheaper operation, since you are reading memory, not the file.
Streams that use a buffer in their operation (i.e., they open the file and by default copy its data into the buffer) are called buffered streams, whereas streams that don't use any buffer are called unbuffered streams.
If you write data to a buffered stream, that data is queued up until the stream is flushed (flushing means synchronizing the buffer's contents with the file). Unbuffered streams deliver data to the file as soon as it arrives at the stream, with no temporary staging, so each individual write is visible immediately, but they are usually slower overall because every write touches the file.
A buffer and a stream are different concepts.
A buffer is a part of memory used to temporarily store data. It can be implemented and structured in various ways. For instance, if one wants to read a very large file, chunks of the file can be read and stored in the buffer. Once a certain chunk is processed, the data can be discarded and the next chunk read. A chunk in this case could be a line of the file.
Streams are the way C++ handles input and output. Their implementation uses buffers.
I do agree that streams are probably the most poorly written and most badly understood part of the standard library. People use them every day, and many of them don't have the slightest clue how the constructs they use actually work. For a little fun, try asking around what std::endl is; you might find that some of the answers are funny.
At any rate, streams and streambufs have different responsibilities. Streams are supposed to provide formatted input and output, that is, translate an integer to a sequence of bytes (or the other way around), while buffers are responsible for conveying the sequence of bytes to the media.
Unfortunately, this design is not clear from the implementation. For instance, we have all those numerous streams (file streams and string streams, for example) while the only difference between them is the buffer; the stream code remains exactly the same. I believe many people would redesign streams if they had their way, but I'm afraid that is not going to happen.
In the C++ Primer book, in chapter 1, the following is mentioned:
endl is a special value, called a manipulator, that when written to an output stream has the effect of writing a newline to the output and flushing the buffer associated with that device. By flushing the buffer, we ensure that the user will see the output written to the stream immediately.
What is meant by "flushing the buffer" here?
Output is generally buffered before it's written to the intended device. That way, when writing to slow-to-access devices (like files), the program doesn't have to access the device after every single character.
Flushing means emptying the buffer and actually writing it to the device.
C++'s iostreams are buffered, that means that when you output to an ostream, the content will not immediately go to what is behind the stream, e.g. stdout in the case of cout. The implementation of the stream determines when to actually send the buffered part of the stream out. This is done for reasons of efficiency, it would be very inefficient to write to a network or disk stream byte by byte, by buffering this problem is solved.
This does, however, mean that when you write, say, debug messages to a log file and your program crashes, you may lose part of the data you wrote to the log file through the stream, as part of the log may still be in the stream's buffer and not yet written to the actual file. To prevent this from happening, you need to make the stream flush its buffer, either by an explicit call to the flush method or by using the convenience of endl.
If, however, you're just writing to a file regularly, you should use \n instead of endl, so the stream doesn't unnecessarily flush on every line and reduce your performance.
Edited to include this note:
cin and cout have a special relationship, where reading from cin will automatically flush cout beforehand. This makes sure that, e.g., the prompt you wrote to cout will actually be seen by the user before the read from cin starts waiting for input. Hence, even with cout you don't normally need endl but can use \n instead. You can create such relationships between other streams as well by tying them together.
What is meant by "flushing the buffer" here?
std::endl causes the data in the stream's internal staging memory (its "buffer") to be "flushed" (transferred) to the operating system. The subsequent behavior depends on what type of device the stream is mapped to, but in general, flushing will give the appearance that the data has been physically transferred to the associated device. A sudden loss of power, however, might defeat the illusion.
This flushing involves some overhead (wasted time), and should therefore be minimized when execution speed is an important concern. Minimizing the overall impact of this overhead is the fundamental purpose of data buffering, but this goal can be defeated by excessive flushing.
Background information
The I/O of a computing system is typically very sophisticated and composed of multiple abstraction layers. Each such layer may introduce a certain amount of overhead. Data buffering is a way of reducing this overhead by minimizing the number of individual transactions performed between two layers of the system.
CPU/memory system-level buffering (caching): For very high activity, even the random-access-memory system of a computer can become a bottleneck. To address this, the CPU virtualizes memory accesses by providing multiple layers of hidden caches (the individual buffers of which are called cache lines). These processor caches buffer your algorithm's memory writes (pursuant to a write policy) in order to minimize redundant accesses on the memory bus.
Application-level buffering: Although it isn't always necessary, it is not uncommon for an application to allocate chunks of memory to accumulate output data before passing it to the I/O library. This provides the fundamental benefit of allowing for random accesses (if necessary), but a significant reason for doing this is that it minimizes the overhead associated with making library calls -- which may be substantially more time-consuming than simply writing to a memory array.
I/O library buffering: The C++ IO stream library optionally manages a buffer for every open stream. This buffer is used, in particular, to limit the number of system calls to the operating system kernel because such calls tend to have some non-trivial overhead. This is the buffer which is flushed when using std::endl.
operating system kernel and device drivers: The operating system routes the data to a specific device driver (or subsystem) based on what output device the stream is attached to. At this point, the actual behavior may vary widely depending on the nature and characteristics of that type of device. For example, when the device is a hard disk, the device driver might not initiate an immediate transfer to the device, but rather maintain its own buffer in order to further minimize redundant operations (since disks, too, are most efficiently written to in chunks). In order to explicitly flush kernel-level buffers, it may be necessary to call a system-level function such as fsync() on Linux; even closing the associated stream doesn't necessarily force such a flush.
Example output devices might include...
a terminal on the local machine
a terminal on a remote machine (via SSH or similar)
data being sent to another application via pipes or sockets
many variations of mass-storage devices and associated file-systems, which may be (again) locally attached or distributed via a network
hardware buffers: Specific hardware may contain its own memory buffers. Hard drives, for example, typically contain a disk buffer in order to (among other things) allow the physical writes to occur without requiring the system's CPU to be engaged in the entire process.
Under many circumstances, these various buffering layers tend to be (to a certain extent) redundant -- and therefore essentially overkill. However, the buffering at each layer can provide a tremendous gain in throughput if the other layers, for whatever reason, fail to deliver optimum buffering with respect to the overhead associated with each layer.
Long story short, std::endl only addresses the buffer managed by the C++ IO stream library for that particular stream. After calling std::endl, the data will have been handed over to kernel-level management, and what happens next with the data depends on a great many factors.
How to avoid the overhead of std::endl
Method 1: Don't use std::endl -- use '\n' instead.
Method 2: Don't use std::endl -- use something like the following version instead...
inline std::ostream & endl( std::ostream & os )
{
    os.put( os.widen('\n') );     // http://en.cppreference.com/w/cpp/io/manip/endl
    if ( debug_mode ) os.flush(); // supply 'debug_mode' however you want
    return os;
}
In this example, you provide a custom endl which can be called with-or-without invoking the internal call to flush() (which is what forces the transfer to the operating system). Enabling the flush (with the debug_mode variable) is useful for debugging scenarios where you want to be able to examine the output (for example a disk-file) when the program has terminated before cleanly closing the associated streams (which would have forced a final flush of the buffer).
When using std::cout, the operands that appear after the output operator (<<) are stored in a buffer and are not displayed on stdout (usually the terminal or command prompt) until the stream comes across std::endl or a read from std::cin, either of which causes the buffer to be flushed, in the sense that the contents of the buffer are written out to stdout.
Consider this program:
#include <iostream>
#include <unistd.h>
int main(void)
{
std::cout << "Hello, world";
sleep(2);
std::cout << std::endl;
return 0;
}
The output obtained will be (appearing only after 2 seconds):
Hello, world
One simple piece of code to show you the effects of buffered I/O in C++:
Whatever input you provide is buffered and then passed on to the program's variables.
Have a look at the code below:
// program to test how buffered I/O can have unintended effects on our program
#include <iostream>
using namespace std;

int main()
{
    int a;
    char c;
    cin >> a;
    cin >> c;
    cout << "the number is : " << a;
    cout << "\nthe character is : " << c;
}
Here we have declared two variables, one int and one char. If we input the number as "12d34", the int variable will accept only 12 as its value and discard the rest, which remains in the buffer. On the next input, the char variable will automatically accept the value 'd' without even asking you for any input.
I just spent quite some time trying to parallelize this loop with OpenMP, but with 2 threads it doubles the wall time! Am I missing something important?
The overall task is to read in a big file (~1 GB) in parallel. An ifstream's contents are divided into several string buffers, and these are used to insert the data into Symbol structs. Up to this point everything is fast. Giving the loop private variables str and locVec to operate on doesn't change anything either.
vector<string> strbuf;               // filled from the ifstream
vector< vector<Symbol> > symVec(2);  // to be filled (must be sized before the loop)

#pragma omp parallel for num_threads(2) default(none) shared(strbuf, symVec)
for (int i = 0; i < 2; i++)
{
    string str = strbuf[i];
    std::stringstream ss(str);
    // no problem until here
    // this is where it slows down:
    vector<Symbol> locVec;
    std::copy(std::istream_iterator<Symbol>(ss), std::istream_iterator<Symbol>(), std::back_inserter(locVec));
    symVec[i] = locVec;
}
EDIT: Sorry for being imprecise, but the file content has already been read sequentially and divided into the strbufs at this point. The file is closed. Within the loop there is no file access.
It's much better to do sequential I/O on a file than I/O at different sections of it. Reading different sections concurrently essentially causes a lot of seeks on the underlying device (I'm assuming a disk here), and it also increases the number of system calls required to read the file into said buffers. You're better off using one thread to read the file in its totality sequentially (maybe mmap() with MAP_POPULATE) and assigning the processing to different threads.
Another option is to use calls such as aio_read() to handle reading in different sections if for some reason you really do not want to read the file all at once.
Without all the code I cannot be completely sure, but remember that simply opening a file does not guarantee its content is in memory, and reading from the file will cause page faults that then cause the actual file contents to be read. So even if you're not explicitly reading from the file with read/write calls, the OS will take care of that for you.
I have one big file. It is a text file so I am reading one line at a time.
std::ifstream inFile( "big_file.txt" );
std::string line;
while( getline( inFile, line ) )
{
}
I want to distribute the lines that I read from 'big_file.txt' to several files. The file count depends on the number of cores available on the machine.
Edit: The target files might be on different physical devices, or content possibly sent to a different machine
My (unsuccessful) attempt so far is as follows:
// list of writer objects, each running in its own thread
std::vector<FileWriter> writers;

// create as many threads as there are cores
unsigned long const cores = boost::thread::hardware_concurrency();
for (unsigned long i = 0; i < cores; ++i)
{
    std::ostringstream ss;
    ss << i;
    FileWriter rt(ss.str());
    writers.push_back(rt);
}
Then, as I call getline(inFile, line), I want to be able to send the line to the threads in a round-robin fashion. It really does not have to be round-robin; whatever method best distributes the work among the threads is fine.
I have run out of ideas.
Please suggest solutions using Boost and the pre-C++11 STL, as I don't have a complete C++11 environment yet.
Unless each new file is on a separate physical device, it is unlikely that there would be a performance gain simply from using multiple threads to write the individual files. This type of process will typically be I/O bound rather than CPU bound.
One important thing to make sure of is to use buffered I/O (which appears to be the case, since you show an ifstream). Without buffered I/O, the latency of writing individual lines to different files would be a huge bottleneck.
Edit: Given that the individual lines may be written to separate devices, there might be a performance gain from using multiple threads. If there is long latency (e.g., on a network send call when sending to another machine), other threads could still be writing to other locations, so that would definitely help.
I might not completely understand the question, but it seems it would make sense to use a thread pool. One possibility would be to use the Boost-based threadpool library. I have not used it, but it seems to have a good reputation.
I would like to know whether there might be any possibility of a performance gain on file reads by using OpenMP.
Example code,
fstream file;
file.open("test.txt", ios::in);
file.seekg(0, ios::end);
int len = file.tellg();

char *arr = new char[len];
char *temp = new char[1];

int i;
#pragma omp parallel for shared(arr, len) private(temp, i)
for (i = 0; i < len; i++)
{
    file.seekg(i);
    file.read(temp, 1);
    arr[i] = temp[0];
}
I guess using multiple threads for an I/O operation is a bad option because the file reads will ultimately be serialized. But still, I would like to know whether one can expect a performance gain. Moreover, I would also like to know how OpenMP handles parallel file read operations.
As you mentioned, you're not likely to get any speedup parallelizing any sort of I/O bound task like this. However, there is a much bigger problem. The code isn't even correct.
The seekg() and read() methods modify the file variable, so your iterations aren't independent and you will have race conditions on the stream. In other words, the loop isn't parallelizable.
So don't expect that code to work at all - let alone with better performance.
Although there are lots of possible performance improvements for file streams, those you are proposing are not among them:
std::streambuf is stateful, and trying to access it simultaneously from multiple threads of execution will thoroughly mess it up.
Processing individual characters is essentially a worst-case scenario for a contemporary processor. If you really end up doing it in parallel, you'd have multiple processors messing with the same cache lines, which will actually dramatically degrade performance compared to a single thread of execution.
I don't know why people are so fond of using seeks: each seek essentially kills any current buffer and may cause a system call just to position the stream to a defined state. The key problem with seeking is that it sets the stream up for either reading or writing, depending on what the next operation is. Yes, the open mode may be taken into account, but it probably isn't.
If you want a fast approach to reading a file using std::ifstream, you should:
imbue() a std::locale which advertises not doing any conversion
open the file in std::ios_base::binary mode
skip trying to get what may be a wrong estimate of the file's size (seeking to the end and hoping that this somehow gives you the number of characters in the file is futile)
read into a suitable std::ostream, e.g., std::ostringstream (if you can provide the destination buffer, you can use a faster output stream), using the output operator for stream buffers: out << in.rdbuf()
I don't see how concurrency would help you with reading a stream.