Matrix multiplication using threads - C++

I am trying to multiply 2 matrices together by using 1 thread for each cell of output.
I am using C++/g++ on Unix.
How would I go about doing this?
Can I do this in a loop?

Here's my suggestion:
1. Write a function that will compute one output cell. Give it parameters indicating which cell to compute.
2. Write a single-threaded program that uses a loop to compute every cell (calling the function from step 1). Store all the results and don't write them out until you have finished computing all cells.
3. Modify the program so that instead of calling the function directly, each loop iteration creates a thread to execute the function.
4. Figure out how to make the "main" program wait until all threads have finished before writing out all the results.
I think that will give you a strategy to work out a solution, without me doing your homework for you.
If you have a go and it doesn't work, post your code on here and people will help you debug it. The important part is not for you to get a good answer, it is for you to learn how to solve this type of problem -- so it won't really help you if somebody just gives you the answer.
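For illustration only, here is a minimal sketch of that strategy using C++11 std::thread; the function name multiply_cell and the fixed 2x2 matrices are placeholders, not from the question, and on g++/Unix it needs -std=c++11 -pthread:

#include <functional>
#include <thread>
#include <vector>

// Compute one output cell: C[row][col] = dot product of row 'row' of A and column 'col' of B.
void multiply_cell(const std::vector<std::vector<int>>& A,
                   const std::vector<std::vector<int>>& B,
                   std::vector<std::vector<int>>& C,
                   std::size_t row, std::size_t col)
{
    int sum = 0;
    for (std::size_t k = 0; k < B.size(); ++k)
        sum += A[row][k] * B[k][col];
    C[row][col] = sum;   // each thread writes a distinct cell, so no locking is needed
}

int main()
{
    std::vector<std::vector<int>> A = {{1, 2}, {3, 4}};   // placeholder inputs
    std::vector<std::vector<int>> B = {{5, 6}, {7, 8}};
    std::vector<std::vector<int>> C(2, std::vector<int>(2));

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            threads.emplace_back(multiply_cell, std::cref(A), std::cref(B),
                                 std::ref(C), i, j);      // one thread per output cell

    for (auto& t : threads)
        t.join();   // wait for every cell before using or printing the results
}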

Related

Is it a good idea to multi-thread a program whose main function is to read from the disk?

I am developing a program which reads multiple big pages from disk and performs several range searches. The program reads from several pages and then writes the query results to an output page. I have to maintain the order of the output so it corresponds to the order of the input.
However, the program becomes very slow after multiple reads, so I am thinking of using multiple threads to run several searches at a time. I have a 4-core Linux machine and I would like to have two threads in the program. Is this a good idea? And how can I maintain the order of the output file? Locks won't help, since I don't know which thread will finish first.
If it is a good idea, how can I do so?
Thanks
Update: This must be done without the use of any kind of SQL libraries.
Threading is clearly a good idea in this case.
To maintain the order in the results I would use a map-reduce approach; you can find some good help using the QtConcurrent module from Qt.
The idea is that you take one or more pages and you pass them to a thread which will return the search results in a list.
Every thread/task is going to have a unique index so in the end the list returned from thread with index 2 will be placed after the list returned by thread with index 1.
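As a rough sketch of that indexed-task idea, here is the same pattern using std::async and std::future instead of QtConcurrent; search_page and the page names are made-up placeholders. Because the futures are stored in input order and collected in input order, the output order is preserved no matter which task finishes first:

#include <future>
#include <iostream>
#include <string>
#include <vector>

// Placeholder for the real range search over one page.
std::vector<std::string> search_page(const std::string& page)
{
    return { "result from " + page };
}

int main()
{
    std::vector<std::string> pages = { "page0", "page1", "page2", "page3" };

    // Launch one task per page; the futures are stored in input order.
    std::vector<std::future<std::vector<std::string>>> tasks;
    for (const auto& page : pages)
        tasks.push_back(std::async(std::launch::async, search_page, page));

    // Collecting the futures in order preserves the order of the output,
    // regardless of which task finishes first.
    for (auto& task : tasks)
        for (const auto& line : task.get())
            std::cout << line << '\n';
}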

Stop program with timer

I'm creating a logistics project in C++ where I have to compare the execution time of a solver that I created with that of an open source solver.
So, what I need is to stop the solver that I created if it runs longer than the open source solver.
The problem is that I didn't find anything about a timer that stops the currently executing program.
Can someone help me?
You could just launch a future that sleeps for a given time and then calls std::exit.
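A minimal sketch of that idea, assuming a made-up 60-second limit. One caveat: a future returned by std::async blocks in its destructor, so if the solver finishes early you may prefer a detached thread or some way to signal the watchdog:

#include <chrono>
#include <cstdlib>
#include <future>
#include <thread>

int main()
{
    // Watchdog: sleeps for the time limit, then terminates the whole process.
    auto watchdog = std::async(std::launch::async, [] {
        std::this_thread::sleep_for(std::chrono::seconds(60));  // placeholder limit
        std::exit(1);   // exits even if the solver is still running
    });

    // run_my_solver();   // hypothetical: your solver would run here on the main thread

    return 0;            // note: the future's destructor waits out the sleep
}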
Without further information about what you are solving, I would suggest running both solvers in a series of benchmarks, using multiple objectives if possible, since each might perform differently in different situations. Running both in rigorous benchmarks will help make sure your results are valid. Also, even if your solver takes longer, knowing the time difference will help you optimize it.

Which technique can I use to execute the same function several times at once and get each output?

Following up on the previous question, How can I find ROI and detect markers inside, thank you to the professionals who helped me. I have already finished that task. :)
My next question is related to the previous one.
Now I would like to track each blob individually by calling the detection function (named track(param)) with different parameters (the blob numbers) at the same time; the functions will then return each blob's position.
Which techniques let me execute the same function several times at the same time?
I am confused about OpenMP, OpenCL and other parallel programming options; is it possible for them to return output at the same time or not?
Sorry for my poor English. Thank you to all helpers. :)
It sounds to me like you just need multi-threading, which can be done with a library such as Boost Thread. You can spawn multiple threads to track each blob. You can then use join at the end to ensure all threads have completed, and combine the results.
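For example, here is a minimal sketch with standard std::thread; the track function, the Position type and the blob count are placeholders standing in for the asker's own code:

#include <thread>
#include <vector>

struct Position { int x = 0; int y = 0; };

// Placeholder for the asker's track(param) function.
Position track(int blob_index)
{
    return Position{ blob_index, blob_index };
}

int main()
{
    const int num_blobs = 4;                     // placeholder blob count
    std::vector<Position> positions(num_blobs);

    // One thread per blob; each thread writes only its own slot in 'positions'.
    std::vector<std::thread> threads;
    for (int i = 0; i < num_blobs; ++i)
        threads.emplace_back([&positions, i] { positions[i] = track(i); });

    for (auto& t : threads)
        t.join();   // wait for all trackers to finish, then combine the results
}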

C++ Asymptotic Profiling

I have a performance issue where I suspect one standard C library function is taking too long and causing my entire system (suite of processes) to basically "hiccup". Sure enough, if I comment out the library function call, the hiccup goes away. This prompted me to investigate what standard methods there are to prove this type of thing. What would be the best practice for testing a function to see if it causes an entire system to hang for a sec (causing other processes to be momentarily starved)?
I would at least like to definitively correlate the function being called and the visible freeze.
Thanks
The best way to determine this stuff is to use a profiling tool to get the information on how long is spent in each function call.
Failing that, set up a function that reserves a block of memory. Then, at various points in your code, write a string to that memory including the current time. (This avoids the delays associated with writing to the display.)
After you have run your code, pull out the memory and parse it to determine how long parts of your code are taking.
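A minimal sketch of that in-memory logging idea; the marker names and the suspect call are placeholders I made up:

#include <chrono>
#include <cstdio>
#include <vector>

struct LogEntry { std::chrono::steady_clock::time_point when; const char* what; };

std::vector<LogEntry> g_log;

// Cheap in-memory marker; avoids the cost of console or disk I/O mid-run.
inline void mark(const char* what)
{
    g_log.push_back({ std::chrono::steady_clock::now(), what });
}

int main()
{
    g_log.reserve(1024);               // reserve up front so logging stays cheap

    mark("before suspect call");
    // suspect_library_call();         // hypothetical: the call being investigated
    mark("after suspect call");

    // After the run, dump the log and look for large gaps between markers.
    for (std::size_t i = 1; i < g_log.size(); ++i) {
        auto gap = std::chrono::duration_cast<std::chrono::microseconds>(
                       g_log[i].when - g_log[i - 1].when).count();
        std::printf("%s -> %s: %lld us\n", g_log[i - 1].what, g_log[i].what,
                    static_cast<long long>(gap));
    }
}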
I'm trying to figure out what you mean by "hiccup". I'm imagining your program does something like this:
while (...){
// 1. do some computing and/or file I/O
// 2. print something to the console or move something on the screen
}
and normally the printed or graphical output hums along in a subjectively continuous way, but sometimes it appears to freeze, while the computing part takes longer.
Is that what you meant?
If so, I suspect that in the running state it is almost always in step 2, but in the hiccup state it is spending time in step 1.
I would comment out step 2, so it would spend nearly all its time in the hiccup state, and then just pause it under the debugger to see what it's doing.
That technique tells you exactly what the problem is with very little effort.

Possible to distribute or parallel process a sequential program?

In C++, I've written a mathematical program (for diffusion limited aggregation) where each new point calculated is dependent on all of the preceding points.
Is it possible to have such a program work in a parallel or distributed manner to increase computing speed?
If so, what type of modifications to the code would I need to look into?
EDIT: My source code is available at...
http://www.bitbucket.org/damigu78/brownian-motion/downloads/
filename is DLA_full3D.cpp
I don't mind significant re-writes if that's what it would take. After all, I want to learn how to do it.
If your algorithm is fundamentally sequential, you can't make it fundamentally not that.
What is the algorithm you are using?
EDIT: Googling "diffusion limited aggregation algorithm parallel" led me here, with the following quote:
DLA, on the other hand, has been shown [9,10] to belong to the class of inherently sequential or, more formally, P-complete problems. Therefore, it is unlikely that DLA clusters can be sampled in parallel in polylog time when restricted to a number of processors polynomial in the system size.
So the answer to your question is "all signs point to no".
Probably. There are parallel versions of most sequential algorithms, and for those sequential algorithms which are not immediately parallelisable there are usually parallel substitutes. This looks like one of those cases where you need to consider parallelisation or parallelisability before you choose an algorithm. But unless you tell us a bit (a lot?) more about your algorithm, we can't provide much specific guidance. If it amuses you to watch SOers argue in the absence of hard data, sit back and watch; but if you want answers, edit your question.
The toxiclibs website gives some useful insight into how one DLA implementation is done
There is Cilk, which is an enhancement to the C language (unfortunately not C++, yet) that allows you to add some extra information to your code. With just a few minor hints, the compiler can automatically parallelize parts of your code, such as running multiple iterations of a for loop in parallel instead of in series.
Without knowing more about your problem, I'll just say that this looks like a good candidate to implement as a parallel prefix scan (http://en.wikipedia.org/wiki/Prefix_sum). The simplest example of this is an array that you want to make a running sum out of:
1 5 3 2 5 6
becomes
1 6 9 11 16 22
This looks inherently serial (as all the points depend on the ones previous), but it can be done in parallel.
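As a concrete illustration of that running-sum example, C++17's std::inclusive_scan can compute it, and with an execution policy the standard library is allowed to parallelise the scan. This only shows the scan itself, not a DLA-specific solution, and with g++ it may need -std=c++17 and linking against TBB (-ltbb):

#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> values = { 1, 5, 3, 2, 5, 6 };
    std::vector<int> sums(values.size());

    // Running sum: each output element depends on all previous inputs,
    // yet the scan can still be computed in parallel.
    std::inclusive_scan(std::execution::par,
                        values.begin(), values.end(), sums.begin());

    for (int s : sums) std::cout << s << ' ';   // prints: 1 6 9 11 16 22
    std::cout << '\n';
}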
You mention that each step depends on the results of all preceding steps, which makes it hard to parallelize such a program.
I don't know which algorithm you are using, but you could use multithreading for a speedup. Each thread would process one step, but would have to wait for results that haven't yet been calculated (though it can work with the already calculated results if they don't change over time). That essentially means you would need a locking/waiting mechanism to wait for results that haven't yet been calculated but are needed by a worker thread to proceed, as sketched below.
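A minimal sketch of such a locking/waiting mechanism using a mutex and condition variable; the step values and indices are made up, and it only shows the wait-until-published pattern, not the actual DLA computation:

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

std::mutex m;
std::condition_variable cv;
std::vector<double> results;   // results[i] is valid once ready > i
std::size_t ready = 0;         // number of steps whose results have been published

// Block until the result of step 'index' has been published.
double wait_for_result(std::size_t index)
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [index] { return ready > index; });
    return results[index];
}

// Publish the result of step 'index' and wake any waiting workers
// (assumes steps are published in order).
void publish_result(std::size_t index, double value)
{
    {
        std::lock_guard<std::mutex> lock(m);
        if (results.size() <= index) results.resize(index + 1);
        results[index] = value;
        ready = index + 1;
    }
    cv.notify_all();
}

int main()
{
    // Worker computing step 1 must wait for step 0's result first.
    std::thread worker([] {
        double prev = wait_for_result(0);
        publish_result(1, prev + 1.0);   // made-up "computation"
    });

    publish_result(0, 42.0);             // main thread produces step 0
    worker.join();
}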