I am looking for a known solution (such as the producer-consumer problem) for this situation.
In my case there are two kinds of links:
a link to an image,
a link to a text file containing links to images and links to other text files (with further links).
I'm trying to create a multi-threaded downloader in C++ (on Unix) using a POSIX mutex and a POSIX semaphore.
The application starts with a link to the first text file.
The worker threads sleep (semaphore = 0).
The main thread downloads the first text file.
It parses the file for other links and puts them in a queue (semaphore += links_count --> the other threads wake up).
The other threads then download and parse files themselves, producing further links.
What should the main thread do in the meantime?
How do I detect that all the other threads have finished?
With a bounded queue there can be a deadlock: a text file may contain many links, the queue fills up with other text files, and no text file can ever be finished parsing.
Thank you for your ideas.
Well, your problem is still kind of a producer/consumer problem but your consumers are also producers. Some ways to deal with the problem:
Do not limit your queue size. Simply fail when your process runs out of memory. Not very elegant, but it will probably work in 99.99% of all download scenarios (assuming 100 bytes per download link on average and about 2GB of available memory, you would have to store more than 20 million links in your queue before running out of memory). A sketch of this option follows the list.
Split your producer and consumer by using the hard drive as a buffer. Download files into a temporary folder. Have a thread watch that folder for new files. Once a new file appears, parse it and put the items in the consumer queue. Once the file is finished parsing, put it into the final download location. This way you are only limited by disk space, and your producer (parser) is a different thread from your consumers (downloaders).
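For option 1, here is a minimal sketch of the unbounded queue, assuming pthreads: the mutex protects the queue, and the semaphore counts links so that consumers sleep while it is empty. The names (LinkQueue, push_link, pop_link) are illustrative, not from your code:

#include <pthread.h>
#include <semaphore.h>
#include <queue>
#include <string>

struct LinkQueue {
    std::queue<std::string> items;  // unbounded: grows until memory runs out
    pthread_mutex_t lock;
    sem_t available;                // counts links waiting to be consumed

    LinkQueue() {
        pthread_mutex_init(&lock, NULL);
        sem_init(&available, 0, 0);
    }

    void push_link(const std::string& url) {
        pthread_mutex_lock(&lock);
        items.push(url);
        pthread_mutex_unlock(&lock);
        sem_post(&available);       // semaphore += 1: wake one sleeping thread
    }

    std::string pop_link() {
        sem_wait(&available);       // sleep while no links are available
        pthread_mutex_lock(&lock);
        std::string url = items.front();
        items.pop();
        pthread_mutex_unlock(&lock);
        return url;
    }
};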
Edit
You can wait on your worker threads with pthread_join in the main thread.
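A minimal sketch of that, assuming you keep the worker thread handles in an array (worker and kWorkerCount are made-up names):

#include <pthread.h>
#include <cstdio>

static void* worker(void*) {
    // ... consume links from the queue until done ...
    return NULL;
}

int main() {
    const int kWorkerCount = 4;              // assumed number of workers
    pthread_t workers[kWorkerCount];
    for (int i = 0; i < kWorkerCount; ++i)
        pthread_create(&workers[i], NULL, worker, NULL);
    for (int i = 0; i < kWorkerCount; ++i)
        pthread_join(workers[i], NULL);      // blocks until worker i exits
    std::printf("all workers finished\n");
    return 0;
}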
Related
My program downloads files from servers and parses them.
For downloading files I already have a progress bar, but I want to make a bar for parsing too.
Parsing takes a lot of time and power, so my solution should not use much additional power.
a few servers -> a few files -> lines in each file
I.e. at any one time I download files from the servers (about 4-5 files), and as soon as a file is downloaded I start parsing it.
But when there is more than one server, my program downloads files from two servers, so I have 2x more files. The file names on the servers are the same, but when I download a file I rename it to "world" + "originalfile.txt".
I thought about something like this:
std::map<int /* server */, std::map<int /* file (could be an enum) */, std::pair<int /* current line */, int /* max lines */>>> (a structure)
Because while reading a file I want to emit a signal to send data to the window.
When I start reading I want to send (file, lines_in_file, server),
and while reading send (file, current_line, world).
Then the window that receives this data pushes it into some variable (like the example above) and runs a second function to compute the progress bar.
I.e.:
servers[] -> files[] -> download thread -> reading thread (these threads start per file, so with 2 servers and 4 files, 8 threads start) -> emit an init signal (file, lines_in_file, server_number) + emit a (currentLineWhenReading, file, server) signal while reading line by line
So what is the best way to do this: receive a lot of data and hold it, while using little power to compute the progress?
Parse in a separate thread. Increment (with a mutex) some counter while parsing, or use some atomic increment, like e.g. in this answer.
The parsing thread is probably mostly IO-bound, so it will most often be waiting for disk IO. In that case, the small overhead of mutex locking is insignificant.
In the main GUI thread, set things up (e.g. with some Qt timer...) to read the parse counter (with the mutex) every second and update the progress bar.
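A minimal sketch of the counter idea, assuming C++11: the parser bumps an atomic counter per line, and a second loop (standing in for the Qt timer in the GUI thread) samples it once per second. The line count and all names here are made up:

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<long> lines_parsed(0);
std::atomic<bool> done(false);

void parse_file(long total_lines) {
    for (long i = 0; i < total_lines; ++i) {
        // ... parse one line (mostly IO-bound work) ...
        ++lines_parsed;                      // atomic increment, no mutex needed
    }
    done = true;
}

int main() {
    const long total = 100000;               // assumed number of lines in the file
    std::thread parser(parse_file, total);
    while (!done) {                           // in Qt, this loop would be a timer slot
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::printf("progress: %ld%%\n", lines_parsed * 100 / total);
    }
    parser.join();
    return 0;
}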
I have a file with several integers (each one in a new line). I want to read the file every 30ms and process each of these integers.
Software : C++
Present idea :
1) In main(), use file input/output to continuously read from the file, sleeping for 30 ms each time.
2) Every time I read an integer, I create a new thread which processes this integer.
Will main() be suspended until the new thread finishes its operation, or will they run in parallel?
Is there any better approach to doing the same process?
The idea works in theory. You do create a lot of threads, which is inefficient, and if you don't have enough CPU cores, at a certain moment the main thread won't get scheduled every 30 ms anymore, so you'll fall behind.
The better solution is to have a thread pool whose threads pick up numbers that the main thread has made available, as in the sketch below.
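A minimal sketch of that thread pool, assuming C++11 (the file name numbers.txt and all other names are made up):

#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::queue<int> work;
std::mutex m;
std::condition_variable cv;
bool stopping = false;

void worker() {
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, []{ return stopping || !work.empty(); });
        if (work.empty()) return;            // only true once stopping is set
        int n = work.front();
        work.pop();
        lk.unlock();
        // ... process the integer n ...
    }
}

int main() {
    unsigned count = std::max(2u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < count; ++i)
        pool.emplace_back(worker);

    std::ifstream in("numbers.txt");
    int n;
    while (in >> n) {                        // main only reads and enqueues
        { std::lock_guard<std::mutex> lk(m); work.push(n); }
        cv.notify_one();
        std::this_thread::sleep_for(std::chrono::milliseconds(30));
    }

    { std::lock_guard<std::mutex> lk(m); stopping = true; }
    cv.notify_all();
    for (std::thread& t : pool) t.join();
    return 0;
}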
I am currently trying to search all the files on the hard disk.
I'll search a lot of documents on Windows 7. That means a lot of file I/O...
I am thinking I should use multi-threading or asynchronous I/O.
What do you think?
If you think about it the right way, this can lend itself well to a worker pipeline: thread 1 consumes a list of directories to retrieve and fetches directory listings. Thread 2 consumes directory listings and dispatches additional directories back to thread 1 while forwarding files to thread 3.
Thread 3 meanwhile has a simple job: fetch N pages of data at a time from files and forward them to thread 4 which searches pages of memory for matches.
Because the application is largely going to be IO bound, you can comfortably afford to invest some CPU in thread 3 to optimize the concurrency and priority of requests to try and ensure that you maximize the speed with which new pages are delivered to thread 4 and thus how quickly the entire process completes.
OTOH, you may find that just switching to memory-mapped IO will produce a less complex solution with a good-enough speed.
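Here is a reduced sketch of the hand-off between stages, assuming C++11 and collapsing the four threads to two; the Channel type and everything else here is made up, and the termination bookkeeping for the feedback loop (thread 2 feeding directories back to thread 1) is deliberately left out:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

template <typename T>
struct Channel {                             // one queue per pipeline stage
    std::queue<T> q;
    std::mutex m;
    std::condition_variable cv;
    bool closed = false;

    void put(T v) {
        std::lock_guard<std::mutex> lk(m);
        q.push(std::move(v));
        cv.notify_one();
    }
    bool get(T& out) {                       // returns false once closed and drained
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return closed || !q.empty(); });
        if (q.empty()) return false;
        out = std::move(q.front());
        q.pop();
        return true;
    }
    void close() {
        std::lock_guard<std::mutex> lk(m);
        closed = true;
        cv.notify_all();
    }
};

Channel<std::string> dirs, files;

void lister() {                              // stages 1+2: expand directories
    std::string d;
    while (dirs.get(d)) {
        // ... list d and put file paths into `files`
        //     (a recursive version would also feed subdirectories back
        //      into `dirs` and track pending work to know when to close it) ...
    }
    files.close();
}

void searcher() {                            // stages 3+4: read pages and match
    std::string f;
    while (files.get(f)) {
        // ... fetch N pages of f at a time and search them ...
    }
}

int main() {
    std::thread t1(lister), t2(searcher);
    dirs.put("C:\\Users");                   // seed with a starting directory
    dirs.close();                            // sketch only: no recursion here
    t1.join();
    t2.join();
    return 0;
}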
So I was looking over my next school assignment, and I'm baffled. I figured I would come to the experts for some direction. My knowledge of synchronization is severely lacking, and I didn't do so hot on the "mcopyfile" assignment it refers to. Terrible would probably be a good word for it. If I could get some direction on how to approach this problem, it would be much appreciated. I'm not looking for someone to do my assignment, just for someone to point me in the right direction. Baby steps.
Based on the multi-threaded file copy tool
(mcopyfile) you have created in Lab 2, now please use a worker
pool (Producer-Consumer model) implementation that uses a fixed
number of threads to handle the load (regardless how many files in the
directory to copy). Your program should create 1 file copy producer
thread and multiple file copy consumer threads (this number is taken
from the command-line argument). The file copy producer thread will
generate a list of (source and destination) file descriptors in a buffer
structure with bounded size. Each time the producer accesses the buffer,
it will write one (source, destination) file entry. And all file copy
consumer threads will read from this buffer, execute the actual file
copy task, and remove the corresponding file entry (each consumer
will consume one entry each time). Both producer and consumer
threads will write a message to standard output giving the file name
and the completion status (e.g., for producer: “Completing putting
file1 in the buffer”, for consumer: “Completing copying file1 to …”).
Assuming you know how to spawn threads, let me break down the problem for you. There are the following components:
Producer. It generates Tasks for the Consumers based on the source-directory input parameter.
Task. A task is the information a Consumer needs to execute its copy operation, namely a tuple of source file descriptor and destination file descriptor.
Queue. It is the central piece of communication between Producer and Consumers. The Producer writes Tasks to the Queue and the Consumers consume them.
Consumer. You have a pool of actual workers that take a Task as input and execute the copy operation.
Now, as per the question, spawn one thread for the producer and n threads for the consumers. This is what the threads do:
Producer thread
For list of files in the source directory:
Task <- (Source file path, destination file path)
Acquire lock on Queue
Write Task to queue
Release lock on Queue
Acquire lock on stdout
Write to stdout
Release lock on stdout
Consumer thread
While True:
If size of queue == 0:
Sleep for some time
Else:
Acquire lock on Queue
Dequeue a Task
Release lock on Queue
Execute copy operation
Acquire lock on stdout
Write to stdout
Release lock on stdout
I hope this helps.
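If you want the consumers to block instead of sleeping and retrying, here is a sketch of the bounded buffer using pthread condition variables (assuming pthreads, since the question doesn't say which threading API you're using; BUF_SIZE, Task, and the function names are all made up):

#include <pthread.h>

#define BUF_SIZE 16                          // assumed bound on the buffer

struct Task { char src[256]; char dst[256]; };

static Task buffer[BUF_SIZE];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void buffer_put(const Task* t) {             // producer side
    pthread_mutex_lock(&lock);
    while (count == BUF_SIZE)                // block while the buffer is full
        pthread_cond_wait(&not_full, &lock);
    buffer[tail] = *t;
    tail = (tail + 1) % BUF_SIZE;
    ++count;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

void buffer_get(Task* out) {                 // consumer side
    pthread_mutex_lock(&lock);
    while (count == 0)                       // block while the buffer is empty
        pthread_cond_wait(&not_empty, &lock);
    *out = buffer[head];
    head = (head + 1) % BUF_SIZE;
    --count;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
}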
Your assignment looks pretty straightforward to me once you know what API/library you'll use for the threading functionality.
First, you'll parse the command-line argument and create the specified number of threads. Then, from the main thread, obtain the list of files in the folder and start putting them into an array (such as a std::vector) that is shared among the threads and synchronized with a mutex (or a critical section on Windows). Whenever one of the consumer threads acquires the mutex, it copies one file entry out of the array, removes that entry from the array, releases the mutex so that another thread can start doing the same, and starts copying the file represented by the entry it removed.
I would give you some code snippets, but you didn't say what API/library you're using for the threading functionality.
I have this tool in which a single log-like file is written to by several processes.
What I want to achieve is to have the file truncated when it is first opened, and then have all writes done at the end by the several processes that have it open.
All writes are systematically flushed and mutex-protected so that I don't get jumbled output.
First, a process creates the file, then starts a sequence of other processes, one at a time, that then open the file and write to it (the master sometimes chimes in with additional content; a slave process may or may not have it open and be writing something at the same time).
I'd like, as much as possible, not to use more IPC than what already exists (all I'm doing now is writing to a popen-created pipe). I have no access to external libraries other than the CRT and Win32 API, and I would like not to start writing serialization code.
Here is some code that shows where I've gone:
// open the file. Truncate it if we're the 'master', append to it if we're a 'slave'
std::ofstream blah(filename, ios::out | (isClient ? ios::app : 0));
// do stuff...
// write stuff
myMutex.acquire();
blah << "stuff to write" << std::flush;
myMutex.release();
Well, this does not work: although the output of the slave processes is ordered as expected, what the master writes is either bunched together or in the wrong place, when it appears at all.
I have two questions: is the flag combination given to the ofstream's constructor the right one? And am I going about this the right way at all?
If you'll be writing a lot of data to the log from multiple threads, you'll need to rethink the design, since all threads will block trying to acquire the mutex, and in general you don't want your threads blocked from doing work just so they can log. In that case, you'd want your worker threads to log entries to a queue (which just requires moving things around in memory) and have a dedicated thread pull entries off the queue and write them to the output. That way your worker threads are blocked for as short a time as possible.
You can do even better than this by using async I/O, but that gets a bit more tricky.
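A minimal sketch of that queue design, assuming C++11; the file name and all other names are made up:

#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

std::queue<std::string> log_q;
std::mutex log_m;
std::condition_variable log_cv;
bool log_done = false;

void log_line(std::string s) {               // called by worker threads; cheap
    std::lock_guard<std::mutex> lk(log_m);
    log_q.push(std::move(s));
    log_cv.notify_one();
}

void log_writer(const char* path) {          // the only thread touching the file
    std::ofstream out(path, std::ios::app);
    std::unique_lock<std::mutex> lk(log_m);
    while (!log_done || !log_q.empty()) {
        log_cv.wait(lk, []{ return log_done || !log_q.empty(); });
        while (!log_q.empty()) {
            std::string s = std::move(log_q.front());
            log_q.pop();
            lk.unlock();                     // do the slow I/O without the lock
            out << s << '\n';                // flushing policy elided for brevity
            lk.lock();
        }
    }
}

int main() {
    std::thread writer(log_writer, "tool.log");  // assumed log file name
    log_line("worker 1: starting");
    log_line("worker 2: starting");
    { std::lock_guard<std::mutex> lk(log_m); log_done = true; }
    log_cv.notify_one();
    writer.join();
    return 0;
}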
As suggested by reinier, the problem was not in the way I use the files but in the way the programs behave.
The fstreams do just fine.
What I missed was the synchronization between the master and the slave (the former assumed a particular operation was synchronous when it was not).
edit: Oh well, there still was a problem with the open flags. The process that opened the file with ios::out did not move the file pointer as needed (erasing text other processes were writing), and using seekp() completely messed up the output when writing to cout, as another part of the code uses cerr.
My final solution is to keep the mutex and the flush, and, for the master process, open the file in ios::out mode (to create or truncate the file), close it, and reopen it using ios::app.
I made a little log system that has its own process and handles the writing; the idea is quite simple. The processes that use the log just send entries to a pending queue, which the log process then writes to a file. It's like batch processing in any real-time rendering app. This way you get rid of too many open/close file operations. If I can, I'll add the sample code.
How do you create that mutex?
For this to work it needs to be a named mutex so that both processes actually lock on the same thing.
You can check that your mutex is actually working correctly with a small piece of code that locks it in one process while another process tries to acquire it.
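A minimal sketch of a named mutex, assuming the Win32 API mentioned in the question; the mutex name is made up:

#include <windows.h>

int main() {
    // With the same name, every process gets a handle to the same kernel
    // object; an unnamed mutex only synchronizes threads within one process.
    HANDLE hMutex = CreateMutexA(NULL, FALSE, "MyToolLogMutex");
    if (hMutex == NULL) return 1;

    WaitForSingleObject(hMutex, INFINITE);   // acquire across processes
    // ... write to the shared file, then flush ...
    ReleaseMutex(hMutex);

    CloseHandle(hMutex);
    return 0;
}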
I suggest blocking such that the text is completely written to the file before releasing the mutex. I've had instances where the text from one task was interrupted by text from a higher-priority thread; it doesn't look very pretty.
Also, write the output in comma-separated format, or some other format that can easily be loaded into a spreadsheet. Include the thread ID and a timestamp. The interlacing of the text lines shows how the threads are interacting; the ID column lets you sort by thread, and the timestamps can show sequential access as well as duration. Writing in a spreadsheet-friendly format will allow you to analyze the log file with an external tool without writing any conversion utilities. This has helped me greatly.
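For example, each line might look something like this (made-up values):

thread_id,timestamp,message
0x1a2c,2013-05-07 09:14:02.113,opened input file
0x1f40,2013-05-07 09:14:02.117,worker 2 started copying file1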
One option is to use ACE::logging. It has an efficient implementation of concurrent logging.