I have created a module that transfers data over multiple sockets using TCP client-server communication. It transfers a 20 MB file in 10 seconds.
Each socket sends/receives its data in a separate thread.
When I launch the module from another worker thread, the time taken to send the same file increases to 40 seconds.
Please let me know of any way to avoid this slowdown.
Are you synchronizing the threads that read the content from the file on the client side and write it back to a file on the server side? That adds time.
On top of that, you will by default be paying for context switches between the multiple threads at both the client and the server.
One problem may be disk caching and seeking. If you are not already doing this, try interleaving the blocks transferred by different threads more finely (say, at 4 KB granularity, so bytes 0...4095 are transferred by the 1st thread, 4096...8191 by the 2nd thread, etc.).
Also avoid mutexes, for example by having each thread know what it's supposed to read and send, or write and receive, when the thread starts, so that no inter-thread communication is needed. Aborting the whole transfer can be done with an atomic flag variable (checked by each thread after transferring a block) instead of mutexes.
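A minimal sketch of that scheme, assuming POSIX sockets and a per-thread file descriptor (the 4 KB block size, send_block(), and all names are illustrative, not from the original module):

```cpp
// Sketch of the interleaved, mutex-free transfer described above. The 4 KB
// block size, send_block(), and all names are illustrative assumptions.
#include <atomic>
#include <cstddef>
#include <sys/socket.h>
#include <unistd.h>   // pread
#include <vector>

constexpr std::size_t kBlock = 4096;   // interleaving granularity
std::atomic<bool> abort_flag{false};   // any thread can cancel the transfer

// Write len bytes to this thread's own socket (hypothetical helper).
bool send_block(int sock_fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t sent = send(sock_fd, data, len, 0);
        if (sent <= 0) return false;
        data += sent;
        len -= static_cast<std::size_t>(sent);
    }
    return true;
}

// Thread `id` of `nthreads` handles blocks id, id+nthreads, id+2*nthreads, ...
// Everything it needs is fixed at start-up, so the threads never talk.
void send_worker(int file_fd, int sock_fd, std::size_t file_size,
                 int id, int nthreads) {
    std::vector<char> buf(kBlock);
    for (std::size_t block = id; block * kBlock < file_size;
         block += static_cast<std::size_t>(nthreads)) {
        if (abort_flag.load(std::memory_order_relaxed)) return;  // cheap check
        ssize_t n = pread(file_fd, buf.data(), kBlock,
                          static_cast<off_t>(block * kBlock));   // no shared file position
        if (n <= 0 || !send_block(sock_fd, buf.data(), static_cast<std::size_t>(n))) {
            abort_flag.store(true, std::memory_order_relaxed);   // tell the others
            return;
        }
    }
}
```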
Also, on the receiving end, make sure to buffer in memory so that you write to the destination file sequentially. That is, if one thread transfers blocks faster than some other thread, those "early" blocks are simply kept in memory until all the preceding blocks have been received and written.
If buffer size becomes an issue here, you may need some inter-thread synchronization at one end (it doesn't matter much whether you slow down receiving or sending) to prevent the fastest thread from getting too far ahead of the slowest. But for file sizes on the order of tens of megabytes, on PCs, this should not become an issue.
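A sketch of that receive-side reorder buffer (the block-index framing and all names are assumptions); it does use one short lock around the shared map, which is the simplest way to express the idea:

```cpp
// Receive-side reorder buffer, as described above. Block headers carrying an
// index are an assumption about the wire format.
#include <cstdio>
#include <map>
#include <mutex>
#include <vector>

class ReorderWriter {
public:
    explicit ReorderWriter(FILE* out) : out_(out) {}

    // Called by any receiver thread when a complete block has arrived.
    void deliver(std::size_t index, std::vector<char> data) {
        std::lock_guard<std::mutex> lock(mu_);   // one short critical section
        pending_.emplace(index, std::move(data));
        // Flush every block that is now contiguous with what's on disk.
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            std::fwrite(it->second.data(), 1, it->second.size(), out_);
            pending_.erase(it);
            ++next_;
        }
    }

private:
    FILE* out_;
    std::mutex mu_;
    std::size_t next_ = 0;                              // next block to write
    std::map<std::size_t, std::vector<char>> pending_;  // early blocks wait here
};
```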
I need to make several GET requests to a server to download a bunch of JSON files, writing each download to disk, and I want to launch some threads to speed that up.
Each download-and-write of a file takes approximately 0.35 seconds.
I would like to know whether, under Linux at least (and under Windows, since we're here), it is safe to write to disk in parallel, and how many threads I can launch given the waiting time of each thread.
In case it changes anything (I actually think it does): the program doesn't write to disk directly. It just calls std::system to run wget, because that is currently easier than importing a library. So the waiting time is the time the system call takes to return.
So each write to disk is performed by a different process. I only wait for that program to finish, so I'm not actually bound by I/O but by the running time of an external process (each wget call creates and writes to a different file, so they are completely independent processes). Each thread just waits for one call to complete.
My machine has 4 CPUs.
Some kind of formula to get an ideal number of threads according to CPU concurrency and "waiting time" per thread would be welcome.
NOTE: The ideal solution would of course be to do some performance testing, but I could get banned from the server if I abuse it with too many requests.
It is safe to do concurrent file I/O from multiple threads, but if you are concurrently writing to the same file, some form of synchronization is necessary to ensure that the writes to the file don't become interleaved.
For what you describe as your problem, it is perfectly safe to fetch each JSON blob in a separate thread and write them to different, unique files (in fact, this is probably the sanest, simplest design). Given that you mention running on a 4-core machine, I would expect to see a speed-up well past the four concurrent thread mark; network and file I/O tends to do quite a bit of blocking, so you'll probably run into a bottleneck with network I/O (or on the server's ability to send) before you hit a processing bottleneck.
Write your code so that you can control the number of threads that are spawned, and benchmark different numbers of threads. I'll guess that your sweet spot will be somewhere between 8 and 16 threads.
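Since you already shell out to wget, a minimal harness for that benchmark might look like the sketch below (the URL pattern, file names, and job count are placeholders, not from your setup):

```cpp
// Minimal harness for benchmarking different thread counts. Each thread
// shells out to wget, as the question describes, pulling the next job index
// from a shared atomic counter. URL scheme and output names are hypothetical.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

int main(int argc, char** argv) {
    const int nthreads = argc > 1 ? std::atoi(argv[1]) : 8;  // tune and re-run
    const int njobs = 100;                                   // placeholder count
    std::atomic<int> next{0};

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back([&] {
            for (int i = next.fetch_add(1); i < njobs; i = next.fetch_add(1)) {
                // Each call writes its own file, so no synchronization needed.
                std::string cmd = "wget -q -O out" + std::to_string(i) +
                                  ".json https://example.com/data/" +
                                  std::to_string(i) + ".json";
                std::system(cmd.c_str());
            }
        });
    }
    for (auto& th : pool) th.join();

    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - start;
    std::printf("%d threads: %.2f s\n", nthreads, dt.count());
    return 0;
}
```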
I was in a discussion about Multiple Threads in a client application and was told that using a separate thread for receiving data and another thread for sending data is not the way to go.
Why?
From what I know, TCP is full-duplex, so wouldn't this be a performance improvement?
Having a dedicated send thread and a dedicated receive thread is bad for two reasons.
First, it means that a context switch is required every time you go from receiving to sending unless you are doing both at the same time.
Second, it means that in the typical path, where you receive a query, formulate a response, and then send that response, the data will need to be handed from one thread to another, blowing out caches.
That said, if performance isn't super-critical and it fits into your design well, it certainly works. It's just that there's usually no advantage.
I suppose it depends on the scale of your application. If you are doing a small app for a class project, it might be enough to have the send and receive on the same thread. Then you don't have to worry about threading issues.
However, I worked on an application that had to listen for several thousand incoming connections, and each connection might be sending a significant amount of data. We had a thread whose sole purpose was listening for socket connections and putting the new connections into a pool, and a variable number of threads (depending on how busy the app was) just for reading off of sockets, and a different pool of threads for writing.
The problem is that if your listening socket isn't reading the data off the wire fast enough and the buffer fills up, an error is returned. With thousands of clients, this caused a lot of reconnects and re-sends of data, which compounded the original problem of the data not being read fast enough in the first place.
So it comes back to what I said at the start: it depends on the scale of your application. But why not add the ability now? Just make sure that you are thread-safe, and you should be OK.
My Linux C++ application periodically reads sensor data. The readout is a simple file I/O operation (the OS writes to a file; the application reads from that file).
Some information about my platform:
I have a single-core processor with hyper-threading
sensor data update frequency is 1 second
application GUI runs in main thread and shouldn't be blocked
I considered two approaches for sensor data read out:
a timer running in the main application thread
a separate thread with an infinite loop that reads the sensor data and then sleeps
Which approach makes more sense, and are there any other alternatives? What are the costs of each solution (e.g. blocking the main thread in the first approach, or context switching in the second)?
I don't know anything about your application or the hardware, but here are a few things to consider:
If you use a thread, you will have to create a communication channel of some sort to tell the main thread that data has been updated. Usually this would be a pipe(), as signals are inherently unreliable and condition locks don't work with I/O multiplexing (i.e. select()/poll()).
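A sketch of that pipe() channel (all names are illustrative): the sensor thread writes one byte per update, and the main loop watches the read end alongside its other descriptors:

```cpp
// Sketch of the pipe()-based wake-up described above; names are illustrative.
// Call pipe(notify_pipe) once at start-up, before either thread runs.
#include <sys/select.h>
#include <unistd.h>

int notify_pipe[2];   // [0] = read end (main thread), [1] = write end (sensor thread)

// Sensor thread: after storing fresh data somewhere shared, poke the pipe.
void notify_main_thread() {
    char token = 1;
    (void)write(notify_pipe[1], &token, 1);
}

// Main thread: one iteration of the I/O-multiplexing loop.
void main_loop_iteration() {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(notify_pipe[0], &rfds);             // ...plus whatever else you watch
    if (select(notify_pipe[0] + 1, &rfds, nullptr, nullptr, nullptr) > 0 &&
        FD_ISSET(notify_pipe[0], &rfds)) {
        char token;
        (void)read(notify_pipe[0], &token, 1); // drain the notification
        // Fetch the new sensor data and update the GUI here.
    }
}
```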
Can you get the entire set of data without blocking? If so, then just reading it in the main thread is probably easier. However, if your read can block, you'll probably need extra state-tracking to fold the read into your central select(), whereas a thread can simply block until more data is available.
Thus, neither solution is automatically "easier" to do.
I wouldn't worry about "context switching" for a read that only occurs once per second; that's irrelevant.
What else does the main thread have to do? Is it OK if it blocks? If so, then you don't need to run the timer etc. in a separate thread.
If the main thread can't block waiting for the periodic timer, then a separate thread must be created. The data can be communicated between the threads via an object that is accessible to both and protected by a mutex (look up pthread_mutex_t), which is quite simple to do.
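A minimal sketch of such a shared object (the SensorData fields are illustrative):

```cpp
// Minimal sketch of the mutex-protected shared object described above;
// SensorData's fields are illustrative assumptions.
#include <pthread.h>

struct SensorData {
    double temperature = 0.0;   // whatever the sensor provides
    bool fresh = false;
};

SensorData shared_data;
pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;

// Called by the reader thread once per second after reading the file.
void publish(const SensorData& d) {
    pthread_mutex_lock(&data_lock);
    shared_data = d;
    shared_data.fresh = true;
    pthread_mutex_unlock(&data_lock);
}

// Called by the main/GUI thread whenever it wants the latest sample.
bool fetch(SensorData* out) {
    pthread_mutex_lock(&data_lock);
    bool had_new = shared_data.fresh;
    *out = shared_data;
    shared_data.fresh = false;
    pthread_mutex_unlock(&data_lock);
    return had_new;
}
```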
As for which solution is better and what the costs are, it depends on what else the main thread is doing. But for something this simple, either way should be about the same, and the context switching shouldn't affect anything. What will affect performance the most is how expensive the reads themselves are.
I believe the cost of one context switch per second is not an issue, even on a single-core CPU without hyper-threading, especially taking into account that the application runs in user space and is thus not really time-critical. Polling the sensor in the main thread complicates the application's logic, so I would recommend starting a thread for this purpose.
A sleep-loop will skew the timing, because each iteration takes slightly longer than 1 second. Timers don't have that problem, and they are made for exactly this scenario. So choose a timer.
Performance-wise there is no difference because you are only triggering once a second.
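On Linux, one drift-free option is timerfd (a sketch; error handling omitted), which also plugs into select()/poll() if you'd rather watch it from the main loop:

```cpp
// One Linux way to get a drift-free periodic timer: timerfd. The kernel
// keeps the 1 s cadence; read() reports how many ticks have elapsed.
#include <cstdint>
#include <sys/timerfd.h>
#include <unistd.h>

int make_one_second_timer() {
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    itimerspec spec{};
    spec.it_interval.tv_sec = 1;   // fire every second...
    spec.it_value.tv_sec = 1;      // ...starting one second from now
    timerfd_settime(fd, 0, &spec, nullptr);
    return fd;                     // also select()/poll()-friendly
}

void timer_loop(int timer_fd) {
    uint64_t expirations;          // >1 means ticks were missed, not skewed
    while (read(timer_fd, &expirations, sizeof expirations) == sizeof expirations) {
        // Read the sensor file here, once per tick.
    }
}
```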
If the Linux driver reads the sensor and writes it to a device file every second, you shouldn't duplicate the timer logic in your application: it may happen that after a 1-second sleep your application reads the same data it read 1 second ago. A better approach is a thread that calls a blocking read on the device file. When new sensor data is available, the blocking read returns, the thread processes the data, and it calls read again.
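A sketch of that reader thread (the device path and record size are assumptions about the driver's interface):

```cpp
// Sketch of the blocking-read thread described above; the device path and
// record size are assumptions about the driver's interface.
#include <fcntl.h>
#include <unistd.h>

void sensor_thread() {
    int fd = open("/dev/sensor0", O_RDONLY);          // hypothetical device file
    if (fd < 0) return;
    char record[64];                                  // assumed record size
    for (;;) {
        ssize_t n = read(fd, record, sizeof record);  // blocks until new data
        if (n <= 0) break;                            // device gone or error
        // Parse `record` and hand the result to the main thread here.
    }
    close(fd);
}
```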
I am designing a client and server socket program.
I have a file to be transferred from the client to the server using UDP. I repeat, I am using UDP.....
Because I am sending over UDP, the sending rate is faster than the receiver can keep up with, so I have created 3 threads listening on the same socket: while one thread is doing some work with the received data (writing it to a file using fwrite), another thread can recv from the client.
My 1st question: since the file pointer is shared between the threads, I have to use locks when calling fwrite from multiple threads. Am I right in thinking that?
My 2nd question: will there be any improvement in performance if I use multiple threads to fwrite with locks, over a single thread doing all the fwrite work with no locks? Please guide me.
I would use one thread. It saves the complications. You can buffer the data and use asynchronous writes:
http://www.gnu.org/s/hello/manual/libc/Asynchronous-Reads_002fWrites.html
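A sketch of such an asynchronous write using POSIX AIO, which the link above documents (error handling trimmed):

```cpp
// Sketch of a POSIX AIO write, per the link above; error handling is trimmed.
// Both the aiocb and the buffer must stay alive (and unmoved) until the
// operation completes.
#include <aio.h>
#include <cerrno>
#include <cstddef>

void start_async_write(aiocb* cb, int fd, char* data, std::size_t len, off_t offset) {
    *cb = aiocb{};
    cb->aio_fildes = fd;
    cb->aio_buf = data;
    cb->aio_nbytes = len;
    cb->aio_offset = offset;
    aio_write(cb);             // queues the write and returns immediately
}

// Poll without blocking the receive loop; once done, aio_return() gives the count.
bool write_done(aiocb* cb) {
    return aio_error(cb) != EINPROGRESS;
}
```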
Cache the data before writing it.
Let the writing happen in another thread.
Doing it the way you do will require locking the socket.
Q1: Yes, you do need to lock it (and that's very slow!). Why not use a separate file descriptor in each thread? The problem comes mostly from the current file position managed by that descriptor.
Q2: Neither. If the data needs ordering (yes, it's UDP!), you should still buffer it. RAM is much faster than disk I/O. Feed the data into a buffering stream and handle that stream in a separate thread.
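One way to realize the separate-descriptor idea from Q1 (a sketch, assuming each datagram carries its file offset): pwrite() writes at an explicit offset and never touches the shared file position, so the threads don't contend even on a single descriptor:

```cpp
// Sketch of a lock-free write path. pwrite() takes an explicit offset and
// ignores the shared file position; the offset-per-datagram framing is an
// assumption about the protocol.
#include <cstddef>
#include <unistd.h>

// Safe to call concurrently from every receiver thread.
void write_chunk(int fd, const char* data, std::size_t len, off_t file_offset) {
    std::size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, data + done, len - done,
                           file_offset + static_cast<off_t>(done));
        if (n <= 0) return;   // real code would report the error
        done += static_cast<std::size_t>(n);
    }
}
```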
Similar to Ed's answer, I'd suggest using asynchronous I/O and a single thread for your server. Though I find using Boost.Asio easier than posix AIO.
My 1st question: since the file pointer is shared between the threads, I have to use locks when calling fwrite from multiple threads. Am I right in thinking that?
Yes, you always have to use locks when multiple threads are writing to a single object (file, memory, etc).
My 2nd question: will there be any improvement in performance if I use multiple threads to fwrite with locks, over a single thread doing all the fwrite work with no locks?
I would use two threads. The first thread does nothing but read from the socket and store the data in memory. The second thread reads data from memory and writes it to the file. Treat the memory buffer as a FIFO queue and use a mutex to protect the queue pointers. You'll gain nothing from a third thread. In fact, it would probably harm performance and it definitely makes the problem far more complicated.
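A sketch of that two-thread pipeline (all names illustrative): the socket thread enqueues datagrams into a mutex-protected FIFO, and the writer thread drains it, holding the lock only around the queue operations:

```cpp
// Sketch of the two-thread design described above: one thread only recv()s
// and enqueues; the other only dequeues and fwrite()s. Names are illustrative.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <vector>

std::queue<std::vector<char>> fifo;
std::mutex fifo_mu;
std::condition_variable fifo_cv;
bool finished = false;

// Thread 1: called with each datagram read from the socket.
void enqueue(std::vector<char> datagram) {
    { std::lock_guard<std::mutex> lock(fifo_mu);
      fifo.push(std::move(datagram)); }
    fifo_cv.notify_one();
}

// Thread 1, once the transfer is over.
void finish() {
    { std::lock_guard<std::mutex> lock(fifo_mu); finished = true; }
    fifo_cv.notify_all();
}

// Thread 2: drains the queue; fwrite happens outside the lock.
void writer_thread(FILE* out) {
    std::unique_lock<std::mutex> lock(fifo_mu);
    for (;;) {
        fifo_cv.wait(lock, [] { return finished || !fifo.empty(); });
        while (!fifo.empty()) {
            std::vector<char> d = std::move(fifo.front());
            fifo.pop();
            lock.unlock();
            std::fwrite(d.data(), 1, d.size(), out);
            lock.lock();
        }
        if (finished) return;
    }
}
```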
First, try to avoid using UDP for bulk transfers. If you use UDP you have to reinvent your own flow control protocol, as well as logic for retransmission and reordering. From the sounds of it, your problems boil down to missing flow control - so why not just use TCP?
Anyway, don't put your file writing in another thread. Modern OSes will internally buffer disk writes in any case - you'll only start blocking if you're writing data much faster than the disk can keep up, in which case buffering inside your process will only buy you another few seconds at most. Switch to TCP, or implement a proper flow control mechanism.
I'm writing a TCP server on Windows Server 2k8. This server receives data, parses it, and then sinks it into a database. I'm now doing some tests, but the results surprise me.
The application is written in C++ and uses Winsock and the Windows API directly. It creates a thread for each client connection. For each client, it reads and parses the data, then inserts it into the database.
Clients always connect in simultaneous batches: every once in a while, about 5 clients (I control this parameter) will connect simultaneously and feed the server with data.
I've been timing both the reading-and-parsing and the database-related stages of each thread.
The first stage (reading-and-parsing) shows a curious behavior: the time each thread takes is roughly the same across threads, but it is also proportional to the number of threads connecting. The server is not CPU-starved: it has 8 cores, and fewer than 8 threads are ever connected to it.
For example, with 3 simultaneous threads, 100k rows (per thread) are read and parsed in about 4.5 s. But with 5 threads it takes 9.1 s on average!
A friend of mine suggested this scaling behavior might be related to the fact that I'm using blocking sockets. Is this right? If not, what might be the reason for this behavior?
If it is, I'd be glad if someone could point me to good resources for understanding non-blocking sockets on Windows.
Edit:
Each client thread reads a line (i.e., all characters until a '\n') from the socket, then parses it, then reads again, until the parse fails or a terminator character is found. My readline routine is based on this:
http://www.cis.temple.edu/~ingargio/cis307/readings/snaderlib/readline.c
with its static variables declared as __declspec(thread).
The parsing, judging from the non-networking version, is efficient (about 2 s for 100k rows). I therefore assume the problem is in the multithreaded/network version.
If your lines are ~120–150 characters long, you are actually saturating the network!
There's no issue with sockets. Simply transferring 3 × 100k lines of 150 bytes each over a 100 Mbps line will take... 4.5 s! (3 × 100,000 × 150 bytes = 45 MB; counting roughly 10 bits per byte on the wire to account for headers, that is 450 Mbits, which takes 4.5 s at 100 Mbps.) There is no problem with the sockets, blocking or otherwise. You've simply hit the limit of how much data you can feed the link.
Non-blocking sockets are only useful if you want one thread to service multiple connections. Using non-blocking sockets and a poll/select loop means that your thread does not sit idle while waiting for new connections.
In your case this is not a problem, since there is only one connection per thread, so it doesn't matter if your thread sits waiting for input.
Which leads to your original question of why things slow down when you have more connections. Without further information, the most likely culprit is that you are network-limited: i.e. your network cannot feed your server data fast enough.
If you are interested in non-blocking sockets on Windows, do a search on MSDN for the OVERLAPPED APIs.
You could be running into other threading related issues, like deadlocks/race conditions/false sharing, which could be destroying the performance.
One thing to keep in mind is that although you have one thread per client, Windows will not automatically ensure they all run on different cores. If some of them are running on the same core, it is possible (although unlikely) to have your server in a sort of CPU-starved state, with some cores at 100% load and others idle. There are simply no guarantees as to how the OS spreads the load (in the default case).
To explicitly assign threads to particular cores, you can use SetThreadAffinityMask. It may or may not be worth having a bit of a play around with this to see if it helps.
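For reference, pinning a thread is a one-liner per thread (a sketch; the 8-core round-robin is just an example):

```cpp
// Sketch: pin the calling thread to one core, round-robin over 8 cores
// (matching the machine described above). Whether this helps is
// workload-dependent, as the caveat below says.
#include <windows.h>

void pin_current_thread(unsigned core_index) {
    // Bit N set in the mask allows the thread to run on core N only.
    DWORD_PTR mask = static_cast<DWORD_PTR>(1) << (core_index % 8);
    SetThreadAffinityMask(GetCurrentThread(), mask);
}
```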
On the other hand, this may not have anything to do with it at all. YMMV.