I am trying to write a program using fsync() and write(), but fsync() needs time to sync the data and I don't have time to wait for it, so I made one more thread for fsync().
Here is my code:
#include <iostream>
#include <thread>
#include <fcntl.h>
#include <unistd.h>

void thread_func(int fd) {
    while (1) {
        if (fsync(fd) != 0)
            std::cout << "ERROR fsync()\n";
        usleep(100);
    }
}
int main() {
    int fd = open("device", O_RDWR | O_NONBLOCK);
    if (fd < 0) {
        std::cout << "ERROR: open()\n";
        return -1;
    }
    // note: new throws std::bad_alloc on failure rather than returning nullptr
    std::thread *thr = new std::thread(thread_func, fd);
    if (thr == nullptr) {
        std::cout << "Cannot create thread\n";
        close(fd);
        return -1;
    }
    while (1) {
        if (write(fd, "x", 1) < 1)   // write takes a buffer pointer, not a char
            std::cout << "ERROR write()\n";
    }
    close(fd);
}
My question is:
Do I need any locking when one thread calls fsync() on a file descriptor while the main thread writes to it?
When I test my program without a mutex it shows no problems, and the man page for fsync() says nothing about calling it from a different thread.
If the fact that fsync takes time and even sometimes blocks for a very short time is a problem, then you are most probably doing something wrong.
Normally, you do not want to call fsync at all, ever. It is a serious anti-optimization, and one only ever wants to do it when it must be assured that data has been written out[1]. In that case, however, you absolutely want fsync to block; this not only works as intended, it is necessary.
Only when fsync has returned do you know that it has done its task: the OS has done its best to assure that the data has been written, and only then is it safe to proceed. If you offload this to a background thread, you might as well not call fsync at all, because you don't know when it is safe to assume the data has been written.
If initiating writes is your primary goal, you can use sync_file_range under Linux (which runs asynchronously), followed by a call to fsync some time later. The reason for following up with fsync is twofold: to ensure that the writes are done, and the fact that sync_file_range does not update metadata, so unless you are strictly overwriting already-allocated data within the file, your writes may not be visible after a crash even though the data is on disk (I can't imagine how that might happen, since allocating more sectors to a file necessarily means the metadata must be modified, but the manpage explicitly warns that it can).
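For illustration, here is a minimal sketch of that pattern, assuming fd is an already-open file descriptor (sync_file_range is Linux-specific and not portable):

#define _GNU_SOURCE   // for sync_file_range
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Start writeback of the whole file (offset 0, nbytes 0 = to end of file)
// without waiting for it to finish.
if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
    perror("sync_file_range");

// ... continue doing useful work while the kernel writes in the background ...

// Later, when durability actually matters, pay the blocking cost once.
if (fsync(fd) != 0)
    perror("fsync");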
[1] The fsync function still does not (and cannot) guarantee that the data is on permanent storage; it might still be somewhere in the cache hierarchy, such as a controller's or disk's write cache.
Unless you require the thread for something else, I would suggest you use the POSIX asynchronous I/O (aio) library:
struct aiocb fsync_cb = {
    .aio_fildes = fd,
    .aio_sigevent = {
        .sigev_notify = SIGEV_NONE
    }
};
aio_fsync(O_SYNC, &fsync_cb);
There is also an equivalent variant for write.
struct aiocb write_cb = {
    .aio_fildes = fd,
    .aio_buf = buffer,
    .aio_nbytes = nbytes,
    .aio_offset = offset,
    .aio_sigevent = {
        .sigev_notify = SIGEV_NONE
    }
};
aio_write(&write_cb);
If you choose not to have any notification of success, then you will have to check/wait for completion at some point:
while (aio_error(&write_cb) == EINPROGRESS);
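Spinning like that wastes CPU; if blocking at that point is acceptable, aio_suspend() waits for completion without polling. A small sketch, reusing write_cb from above:

const struct aiocb *list[1] = { &write_cb };
aio_suspend(list, 1, NULL);   // block until the request completes (NULL = no timeout)
if (aio_error(&write_cb) == 0) {
    ssize_t written = aio_return(&write_cb);   // fetch the result exactly once
}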
Instead of having my threads wait, doing nothing, for other threads to finish using data, I'd like them to do something else in the meantime (like checking for input, or re-rendering the previous frame in the queue, and then coming back to check whether the other thread is done with its task).
I think the code I've written does that, and it "seems" to work in the tests I've performed, but I don't really understand how std::memory_order_acquire and std::memory_order_release work exactly, so I'd like some expert advice on whether I'm using them correctly to achieve the behaviour I want.
Also, I've never seen multithreading done this way before, which makes me a bit worried. Are there good reasons not to have a thread do other tasks instead of waiting?
/*test program
intended to test if atomic flags can be used to perform other tasks while shared
data is in use, instead of blocking
each thread enters the flag protected part of the loop 20 times before quitting
if the flag indicates that the if block is already in use, the thread is intended to
execute the code in the else block (only up to 5 times to avoid cluttering the output)
debug note: this doesn't work with std::cout because all the threads are using it at once
and it's not thread safe so it all gets garbled. at least it didn't crash
real world usage
one thread renders and draws to the screen, while the other checks for input and
provides frameData for the renderer to use. neither thread should ever block*/
#include <fstream>
#include <atomic>
#include <thread>
#include <string>
struct ThreadData {
    int numTimesToWriteToDebugIfBlockFile;
    int numTimesToWriteToDebugElseBlockFile;
};

class SharedData {
public:
    SharedData() {
        threadData = new ThreadData[10];
        for (int a = 0; a < 10; ++a) {
            threadData[a] = { 20, 5 };
        }
        flag.clear();
    }
    ~SharedData() {
        delete[] threadData;
    }
    void runThread(int threadID) {
        while (this->threadData[threadID].numTimesToWriteToDebugIfBlockFile > 0) {
            // test_and_set returns the PREVIOUS value: false means the flag was
            // clear and we have just acquired it, so enter the protected block
            if (!this->flag.test_and_set(std::memory_order_acquire)) {
                std::string fileName = "debugIfBlockOutputThread#";
                fileName += std::to_string(threadID);
                fileName += ".txt";
                std::ofstream writeFile(fileName.c_str(), std::ios::app);
                writeFile << threadID << ", running, output #" << this->threadData[threadID].numTimesToWriteToDebugIfBlockFile << std::endl;
                writeFile.close();
                this->threadData[threadID].numTimesToWriteToDebugIfBlockFile -= 1;
                this->flag.clear(std::memory_order_release);
            }
            else {
                if (this->threadData[threadID].numTimesToWriteToDebugElseBlockFile > 0) {
                    std::string fileName = "debugElseBlockOutputThread#";
                    fileName += std::to_string(threadID);
                    fileName += ".txt";
                    std::ofstream writeFile(fileName.c_str(), std::ios::app);
                    writeFile << threadID << ", standing by, output #" << this->threadData[threadID].numTimesToWriteToDebugElseBlockFile << std::endl;
                    writeFile.close();
                    this->threadData[threadID].numTimesToWriteToDebugElseBlockFile -= 1;
                }
            }
        }
    }
private:
    ThreadData* threadData;
    std::atomic_flag flag;
};

void runThread(int threadID, SharedData* sharedData) {
    sharedData->runThread(threadID);
}

int main() {
    SharedData sharedData;
    std::thread thread[10];
    for (int a = 0; a < 10; ++a) {
        thread[a] = std::thread(runThread, a, &sharedData);
    }
    for (int a = 0; a < 10; ++a) {
        thread[a].join();
    }
    return 0;
}
The memory ordering you're using here is correct.
The acquire memory order when you test and set your flag (to take your hand-written lock) has the effect, informally speaking, of preventing any memory accesses of the following code from becoming visible before the flag is tested. That's what you want, because you want to ensure that those accesses are effectively not done if the flag was already set. Likewise, the release order on the clear at the end prevents any of the preceding accesses from becoming visible after the clear, which is also what you need so that they only happen while the lock is held.
However, it's probably simpler to just use a std::mutex. If you don't want to wait to take the lock, but instead do something else if you can't, that's what try_lock is for.
#include <mutex>

class SharedData {
    // ...
private:
    std::mutex my_lock;
};

// ...

if (my_lock.try_lock()) {
    // lock was taken, proceed with critical section
    my_lock.unlock();
} else {
    // lock not taken, do non-critical work
}
This may have a bit more overhead, but avoids the need to think about atomicity and memory ordering. It also gives you the option to easily do a blocking wait if that later becomes useful. If you've designed your program around an atomic_flag and later find a situation where you must wait to take the lock, you may find yourself stuck with either spinning while continually retrying the lock (which is wasteful of CPU cycles), or something like std::this_thread::yield(), which may wait for longer than necessary after the lock is available.
It's true this pattern is somewhat unusual. If there is always non-critical work to be done that doesn't need the lock, commonly you'd design your program to have a separate thread that just does the non-critical work continuously, and then the "critical" thread can just block as it waits for the lock.
I have been through quite a few pages and have an OK idea of what's happening, I think, but I have a few questions just to be sure.
My program uses the -DTHREADSAFE=1 compile option and forks on receiving a database request (select, delete, insert, update) from a user or my network; the child process then handles the various database tasks, relays messages where required, and so on.
At the moment my database is not set up for concurrency, which, I won't lie, is a major design flaw, but that's beside the point right now. Let's say I have a function that prints all the entries in my table LEDGER, as follows...
void PersonalDataBase::printAllEntries()
{
    // get all entries
    const char query [] = "select * from LEDGER";
    sqlite3_stmt *stmt;
    int error;
    try
    {
        if ((error = sqlite3_prepare(publicDB, query, -1, &stmt, 0)) == SQLITE_OK)
        {
            int ctotal = sqlite3_column_count(stmt);
            int res = 0;
            while ( 1 )
            {
                res = sqlite3_step(stmt);
                if ( res == SQLITE_ROW )
                {
                    Entry *temp = loadBlockRow(stmt);
                    string from, to;
                    from = getNameForHash(temp -> from);
                    to = getNameForHash(temp -> to);
                    temp -> setFromOrTo(from, 0);
                    temp -> setFromOrTo(to, 1);
                    temp -> printEntry();
                    printlnEnd();
                    delete temp;
                }
                else if ( res == SQLITE_DONE || res == SQLITE_ERROR )
                {
                    if (res == SQLITE_ERROR) { throw res; }
                    sqlite3_finalize(stmt);
                    break;
                }
            }
        }
        // problems
        else
        {
            throw error;
        }
    }
    catch (int err)
    {
        sqlite3_finalize(stmt);
        setupOutput();
        cout << "Database Error: " << sqlite3_errmsg(publicDB) << ", Error Code: " << (int) error << endl;
        cout << "Did Not Find Values Try Again After Fixing Problems Above." << endl;
        printlnEnd();
    }
    println("Done!");
}
My setupOutput(), printlnEnd(), and println() all help with my use of 'non-blocking' keyboard I/O. They work as I want; let's not worry about them here, and think of them as just calls to cout.
OK, so at this point I figure there are 4 options:
1. Put a while loop around my try/catch; in the catch, check whether err == 5 (SQLITE_BUSY), and if so set up a sqlite3_busy_handler and have it wait for whatever is blocking the current operation (once it returns SQLITE_OK and I have cleaned up all my old variables, I reiterate through the while loop and try again). Only one of these can be set up at a time, so say Child1 is doing a large write while Child2 and Child3 try to read and update concurrently on top of it: if SQLITE_BUSY is returned by this function I print out an error, then restart my while loop (restarting the function), of course after I have finalized my old statement and cleaned up any local objects that may have been created. Is this a correct line of thinking?
2. Set up a recursive mutex, say screw it to SQLite's own locking mechanism, set the mutex up to be shared across processes, and allow only one operation on the database at a time. For using my app on a small scale this doesn't seem too bad an option, but I'm reading a lot of warnings about recursive mutexes and am wondering if this is the best option, as many posts say to handle mutual exclusion yourself. It also means I cannot have concurrent reads, which is a bit of a pain.
3. Use option 1, but instead of using the SQLite busy handler, just call usleep with a random number, clean up the data, and restart the while loop.
4. Before/after any function involving my database, call sqlite3_exec() with "BEGIN IMMEDIATE"/"COMMIT" respectively, locking the database for the duration of the code between those two statements, so that nothing enclosed within can (or at least should) return SQLITE_BUSY. Then if my "BEGIN IMMEDIATE" returns SQLITE_BUSY (it should be the only one that can, so long as everything is set up correctly), I use the sqlite3_busy_handler, which honestly seems annoying if only one process can use it at a time, or usleep with a random number (presumably, if that number is rather large, 1,000,000 = 1 second, the chance of overlap between 1-20 processes is pretty slim), so each process constantly tries to re-lock the database at random intervals for its own purposes.
Is there a better way, or which one of these is best?
SQLite's internal busy handler (installed with sqlite3_busy_timeout()) already sleeps a more-or-less random number of times; there is no need to write your own handler.
Using your own locking mechanism would be more efficient than random waiting, but only if you have reader/writer locks.
BEGIN or BEGIN IMMEDIATE ensure that no other statement in the same transaction can run into a lock, but only if IMMEDIATE is used for transactions that write.
To allow concurrent readers and writers, consider using WAL mode. (But this does not allow multiple writers either.)
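For illustration, here is a minimal sketch combining those suggestions, assuming an already-open sqlite3 *db handle (error handling trimmed for brevity):

#include <sqlite3.h>

// Let SQLite retry internally for up to 2 seconds before returning
// SQLITE_BUSY (this installs its built-in, randomized-sleep busy handler).
sqlite3_busy_timeout(db, 2000);

// WAL mode allows readers to run concurrently with a single writer.
sqlite3_exec(db, "PRAGMA journal_mode=WAL;", nullptr, nullptr, nullptr);

// Start writing transactions with BEGIN IMMEDIATE so the write lock is
// taken up front and later statements cannot run into SQLITE_BUSY.
sqlite3_exec(db, "BEGIN IMMEDIATE;", nullptr, nullptr, nullptr);
// ... inserts/updates ...
sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);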
I am writing a program which requires communicating with an external program two-way simultaneously, i.e., reading and writing to an external program at the same time.
I create two pipes, one for sending data to the external process, one for receiving data from the external process. After forking the child process, which becomes the external program, the parent forks again. The new child now writes data into the outgoing pipe to the external program, and the parent now reads data from the incoming pipe from the external program for further processing.
I've heard that using exit(3) may cause buffers to be flushed twice; however, I am also afraid that using _exit(2) may leave buffers unflushed. In my program, there is output both before and after forking. Which should I use in this case, exit(3) or _exit(2)?
The below is my main function. The #includes and the auxiliary function are left out for simplicity.
int main() {
    srand(time(NULL));
    ssize_t n;
    cin >> n;
    for (double p = 0.0; p <= 1.0; p += 0.1) {
        string s = generate(n, p);
        int out_fd[2];
        int in_fd[2];
        pipe(out_fd);
        pipe(in_fd);
        pid_t child = fork();
        if (child) {
            // parent
            close(out_fd[0]);
            close(in_fd[1]);
            if (fork()) {
                close(out_fd[1]);
                ssize_t size = 0;
                const ssize_t block_size = 1048576;
                char buf[block_size];
                ssize_t n_read;
                // > 0 so the loop stops on EOF (0) as well as on error (-1)
                while ((n_read = read(in_fd[0], buf, block_size)) > 0) {
                    size += n_read;
                }
                close(in_fd[0]);
                cout << "p = " << p << "; compress ratio = " << double(size) / double(n) << '\n'; // data written before forking (the loop continues to fork)
            } else {
                write(out_fd[1], s.data(), s.size()); // data written after forking
                exit(EXIT_SUCCESS); // exit(3) or _exit(2) ?
            }
        } else {
            // child
            close(in_fd[0]);
            close(out_fd[1]);
            dup2(out_fd[0], STDIN_FILENO);
            dup2(in_fd[1], STDOUT_FILENO);
            close(STDERR_FILENO);
            execlp("xz", "xz", "-9", "--format=raw", reinterpret_cast<char *>(NULL));
        }
    }
}
You need to be careful with these sorts of things. exit() does different things from _exit(), and different things again from _Exit(), and as the answer suggested as a duplicate explains, _Exit() (not the same as _exit(), note the upper-case E) will not call atexit() handlers, flush output buffers, delete temporary files, etc. [which may in fact be atexit() handling, but it could also be done as a direct call, depending on how the C library code has been written].
Most of your output is done via write, which should be unbuffered from the application's perspective. But you are calling cout << ... as well, so you will need to make sure that is flushed before exiting. Right now you are using '\n' as the end-of-line marker, which may or may not flush the output; if you change that to endl instead, it will flush the stream. Then you can safely use _Exit() from an output perspective - if your code were to set up its own atexit() handler, open temporary files, or do a bunch of other such things, this would be problematic. If you want to do more complex things in the forked process, it should be done by another exec.
In your program as it stands, there isn't any pending output to flush, so it "works" anyway, but if you add a cout << ... << '\n'; (or without the newline) type statement at the beginning of the code, you would see it go wrong. If you add a cout.flush();, it would "fix" the problem (based on your current code).
You should also check the return value from your execlp() call and call _Exit() in that case (and handle it in the main process so you don't continue the loop in case of failure?)
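For instance, in the exec branch (127 is just the conventional exec-failure status, nothing your code depends on):

execlp("xz", "xz", "-9", "--format=raw", reinterpret_cast<char *>(NULL));
// execlp only returns on failure; leave immediately without flushing
// the stdio buffers inherited from the parent.
_Exit(127);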
In the child branch of a fork(), it is normally incorrect to use exit(), because that can lead to stdio buffers being flushed twice, and temporary files being unexpectedly removed. In C++ code the situation is worse, because destructors for static objects may be run incorrectly. (There are some unusual cases, like daemons, where the parent should call _exit() rather than the child; the basic rule, applicable in the overwhelming majority of cases, is that exit() should be called only once for each entry into main.)
I have a socket program which acts like both client and server.
It initiates a connection on an input port and reads data from it. In a real-time scenario it reads data on the input port and sends the data (record by record) on to the output port.
The problem here is that while sending data to the output port, CPU usage increases to 50%, which is not permissible.
while(1)
{
    if(IsInputDataAvail==1)//check if data is available on input port
    {
        //condition to avoid duplications while sending
        if( LastRecordSent < LastRecordRecvd )
        {
            record_time temprt;
            list<record_time> BufferList;
            list<record_time>::iterator j;
            list<record_time>::iterator i;
            // Storing into a temp list
            for(i=L.begin(); i != L.end(); ++i)
            {
                if((i->recordId > LastRecordSent) && (i->recordId <= LastRecordRecvd))
                {
                    temprt.listrec = i->listrec;
                    temprt.recordId = i->recordId;
                    temprt.timestamp = i->timestamp;
                    BufferList.push_back(temprt);
                }
            }
            //Sending to output port
            for(j=BufferList.begin(); j != BufferList.end(); ++j)
            {
                LastRecordSent = j->recordId;
                std::string newlistrecord = j->listrec;
                newlistrecord.append("\n");
                char* newrecord= new char [newlistrecord.size()+1];
                strcpy (newrecord, newlistrecord.c_str());
                if ( s.OutputClientAvail() == 1) //check if output client is available
                {
                    int ret = s.SendBytes(newrecord,strlen(newrecord));
                    if ( ret < 0)
                    {
                        log1.AddLogFormatFatal("Nice Send Thread : Nice Client Disconnected");
                        --connected;
                        return;
                    }
                }
                else
                {
                    log1.AddLogFormatFatal("Nice Send Thread : Nice Client Timedout..connection closed");
                    --connected; //if output client not available disconnect after a timeout
                    return;
                }
            }
        }
    }
    // Sleep(100); if we include this sleep here, CPU usage is lower, but to send data in real time I need to remove it
} // End of while loop
If I remove the Sleep(), CPU usage goes very high while sending data to the output port.
Are there any possible ways to maintain real-time data transfer while reducing CPU usage? Please suggest.
There are two potential CPU sinks in the listed code. First, the outer loop:
while (1)
{
    if (IsInputDataAvail == 1)
    {
        // Not run most of the time
    }
    // Sleep(100);
}
Given that the Sleep call significantly reduces your CPU usage, this spin-loop is the most likely culprit. It looks like IsInputDataAvail is a variable set by another thread (though it could be a preprocessor macro), which would mean that almost all of that CPU is being used to run this one comparison instruction and a couple of jumps.
The way to reclaim that wasted power is to block until input is available. Your reading thread probably does so already, so you just need some sort of semaphore to communicate between the two, with a system call to block the output thread. Where available, the ideal option would be sem_wait() in the output thread, right at the top of your loop, and sem_post() in the input thread, where it currently sets IsInputDataAvail. If that's not possible, the self-pipe trick might work in its place.
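For illustration, a minimal sketch of that handshake with a POSIX unnamed semaphore shared between the two threads (the name data_avail is just for this example):

#include <semaphore.h>

sem_t data_avail;   // initialize once at startup: sem_init(&data_avail, 0, 0);

// Input thread: after queuing a new record, signal the sender.
sem_post(&data_avail);

// Output thread: at the top of its loop, block until there is work,
// instead of spinning on IsInputDataAvail.
sem_wait(&data_avail);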
The second potential CPU sink is in s.SendBytes(). If a positive result indicates that the record was fully sent, then that method must be using a loop. It probably uses a blocking call to write the record; if it doesn't, then it could be rewritten to do so.
Alternatively, you could rewrite half the application to use select(), poll(), or a similar method to merge reading and writing into the same thread, but that's far too much work if your program is already mostly complete.
if(IsInputDataAvail==1)//check if data is available on input port
Get rid of that. Just read from the input port; it will block until data is available. This is where most of your CPU time is going. However, there are other problems:
std::string newlistrecord = j->listrec;
Here you are copying data.
newlistrecord.append("\n");
char* newrecord= new char [newlistrecord.size()+1];
strcpy (newrecord, newlistrecord.c_str());
Here you are copying the same data again. You are also dynamically allocating memory, and you are also leaking it.
if ( s.OutputClientAvail() == 1) //check if output client is available
I don't know what this does but you should delete it. The following send is the time to check for errors. Don't try to guess the future.
int ret = s.SendBytes(newrecord,strlen(newrecord));
Here you are recomputing the length of the string, which you probably already knew back at the time you set j->listrec. It would be much more efficient to just call s.SendBytes() directly with j->listrec and then again with "\n" than to do all this. TCP will coalesce the data anyway.
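Put together, the sending loop could shrink to something like this sketch (assuming SendBytes accepts a const char pointer plus a length, as its use in the question suggests):

for (j = BufferList.begin(); j != BufferList.end(); ++j)
{
    LastRecordSent = j->recordId;
    // Send straight from the existing string: no copy, no heap
    // allocation, no strlen over data whose size is already known.
    if (s.SendBytes(j->listrec.c_str(), j->listrec.size()) < 0 ||
        s.SendBytes("\n", 1) < 0)
    {
        log1.AddLogFormatFatal("Nice Send Thread : Nice Client Disconnected");
        --connected;
        return;
    }
}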
I've implemented a simple socket wrapper class. It includes a non-blocking function:
void Socket::set_non_blocking(const bool b) {
    mNonBlocking = b; // class member for reference elsewhere
    int opts = fcntl(m_sock, F_GETFL);
    if (opts < 0) return;
    if (b)
        opts |= O_NONBLOCK;
    else
        opts &= ~O_NONBLOCK;
    fcntl(m_sock, F_SETFL, opts);
}
The class also contains a simple receive function:
int Socket::recv(std::string& s) const {
    char buffer[MAXRECV + 1];
    s = "";
    memset(buffer, 0, MAXRECV + 1);
    int status = ::recv(m_sock, buffer, MAXRECV, 0);
    if (status == -1) {
        if (!mNonBlocking)
            std::cout << "Socket, error receiving data\n";
        return 0;
    } else if (status == 0) {
        return 0;
    } else {
        s = buffer;
        return status;
    }
}
In practice, there seems to be a ~15ms delay when Socket::recv() is called. Is this delay avoidable? I've seen some non-blocking examples that use select(), but don't understand how that might help.
It depends on how you are using the sockets. If you have multiple sockets and you loop over all of them checking for data, that may account for the delay.
With non-blocking recv you are depending on data being there. If your application needs to use more than one socket, you will have to constantly poll each socket in turn to find out whether any of them has data available.
This is bad for system resources because it means your application is constantly running even when there is nothing to do.
You can avoid that with select. You basically set up your sockets, add them to a group, and select on the group. When anything happens on any of the selected sockets, select returns, specifying what happened and on which socket.
For example code showing how to use select, look at Beej's Guide to Network Programming.
select will let you specify a timeout, and can test whether the socket is ready to be read from, so you can use something smaller than 15 ms. Incidentally, you need to be careful with that code you have: if the data on the wire can contain embedded NULs, s won't contain all the read data. You should use something like s.assign(buffer, status);.
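For illustration, a timed readiness check before the recv (needs <sys/select.h>; the 1 ms timeout is just an example value):

fd_set readfds;
FD_ZERO(&readfds);
FD_SET(m_sock, &readfds);

struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = 1000; // wait at most 1 ms for the socket to become readable

int ready = select(m_sock + 1, &readfds, NULL, NULL, &tv);
if (ready > 0 && FD_ISSET(m_sock, &readfds)) {
    int status = ::recv(m_sock, buffer, MAXRECV, 0);
    if (status > 0)
        s.assign(buffer, status); // preserves embedded NULs
}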
In addition to stefanB, I see that you are zeroing out your buffer every time. Why bother? recv returns how many bytes were actually read. Just zero out the one byte after the data (buffer[status] = '\0').
How big is your MAXRECV? It might just be that you incur a page fault on the stack growth. Others have already mentioned that zeroing out the receive buffer is completely unnecessary. You also take a memory-allocation and copy hit when you create a std::string out of the received character data.