Multithreading brute-force and recursive multidimensional loops. Another way? - c++

I like figuring things out myself in terms of programming... So I was thinking of a method to loop through multi-dimensional arrays with dynamic dimensions. (mainly for things like brute force)
The method I came up with to loop through arrays of unknown dimensions is as follows:
#include <stdio.h>
#include <stdlib.h>
/* a simple example of the method I'm using */
void func(int *v, const char *usable, int len, int D, int d)
{
    for (v[d] = 0; v[d] < len; v[d]++)
    {
        if (d + 1 < D)
            func(v, usable, len, D, d + 1);   /* recurse into the next position */
        else
        {
            for (int i = 0; i < D; i++)
                printf("%c", usable[v[i]]);
            printf("\r");
        }
    }
}
int main()
{
    int *v, z, min = 4, max = 6;
    for (z = min; z <= max; z++)
    {
        v = (int*)malloc(sizeof(int) * z);   /* one index per position */
        func(v, "0123456789", 10, z, 0);
        printf("\n");
        free(v);
    }
    return 0;
}
I see this as a nice, elegant solution, but I've run into more problems when thinking about multi-threading the process. I would like to know about alternative approaches to this kind of problem that I haven't discovered, as well as possible ways of multi-threading a process like this. One method I tried was splitting the work into pre-determined chunks, but the number of combinations grows so quickly as the number of digits or usable characters increases that the counts overflow any normal integer variable.
One might ask: "Why do you need to create a multithreaded brute force?" And I would reply that the ability to create a multithreaded brute force means the ability to multithread other processes, like maze-solving and best-route determination.
Thank you in advance.

I'm still confused about what you are trying to do with your code, but from your description you said you were trying to develop a "method to loop through multi-dimensional arrays with dynamic dimensions."
To me that statement says that you want to loop through an array of n dimensions, so it would loop through array[n], or array[n][m], and so on, for any number of dimensions and any length. Is this what you are trying to do? If so, then you can simply use a templated function to loop through the n-dimensional array.
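For example, a minimal sketch of that idea (my own example code, not the asker's): a pair of function template overloads that visit every element of a nested std::vector of any depth.
#include <iostream>
#include <vector>
// Base case: a single element, just visit it.
template <typename Func>
void for_each_element(int value, Func f)
{
    f(value);
}
// Recursive case: a vector of anything, visit each item in turn.
template <typename T, typename Func>
void for_each_element(const std::vector<T>& v, Func f)
{
    for (const auto& item : v)
        for_each_element(item, f);   // recurses until it reaches plain ints
}
int main()
{
    std::vector<std::vector<int>> grid = { {1, 2}, {3, 4, 5} };
    for_each_element(grid, [](int x) { std::cout << x << ' '; });
    std::cout << '\n';
}
As for the multithreading part of the question, one common pattern (not shown above) is to give each thread its own range of values for the outermost index - v[0] in the question's code - and let each thread run the remaining recursion independently, so no shared counters are needed.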

Related

modifying values in pointers is very slow?

I'm working with a huge amount of data stored in an array, and am trying to optimize the amount of time it takes to access and modify it. I'm using Windows, C++ and VS2015 (Release mode).
I ran some tests and don't really understand the results I'm getting, so I would love some help optimizing my code.
First, let's say I have the following class:
class foo
{
public:
int x;
foo()
{
x = 0;
}
void inc()
{
x++;
}
int X()
{
return x;
}
void addX(int &_x)
{
_x++;
}
};
I start by initializing 10 million pointers to instances of that class into a std::vector of the same size.
#include <vector>
int count = 10000000;
std::vector<foo*> fooArr;
fooArr.resize(count);
for (int i = 0; i < count; i++)
{
fooArr[i] = new foo();
}
When I run the following code, and profile the amount of time it takes to complete, it takes approximately 350ms (which, for my purposes, is far too slow):
for (int i = 0; i < count; i++)
{
fooArr[i]->inc(); //increment all elements
}
To test how long it takes to increment an integer that many times, I tried:
int x = 0;
for (int i = 0; i < count; i++)
{
x++;
}
Which returns in <1ms.
I thought maybe the number of integers being changed was the problem, but the following code still takes 250ms, so I don't think it's that:
for (int i = 0; i < count; i++)
{
fooArr[0]->inc(); //only increment first element
}
I thought maybe the array index access itself was the bottleneck, but the following code takes <1ms to complete:
int x;
for (int i = 0; i < count; i++)
{
x = fooArr[i]->X(); //set x
}
I thought maybe the compiler was doing some hidden optimizations on the loop itself for the last example (since the value of x will be the same during each iteration of the loop, so maybe the compiler skips unnecessary iterations?). So I tried the following, and it takes 350ms to complete:
int x;
for (int i = 0; i < count; i++)
{
fooArr[i]->addX(x); //increment x inside foo function
}
So that one was slow again, but maybe only because I'm incrementing an integer with a pointer again.
I tried the following too, and it returns in 350ms as well:
for (int i = 0; i < count; i++)
{
fooArr[i]->x++;
}
So am I stuck here? Is ~350ms the absolute fastest that I can increment an integer, inside of 10million pointers in a vector? Or am I missing some obvious thing? I experimented with multithreading (giving each thread a different chunk of the array to increment) and that actually took longer once I started using enough threads. Maybe that was due to some other obvious thing I'm missing, so for now I'd like to stay away from multithreading to keep things simple.
I'm open to trying containers other than a vector too, if it speeds things up, but whatever container I end up using, I need to be able to easily resize it, remove elements, etc.
I'm fairly new to c++ so any help would be appreciated!
Let's look from the CPU point of view.
Incrementing a plain integer means I have it in a CPU register and just increment it. This is the fastest option.
Incrementing through a pointer (vector element -> member) means I'm given an address, must copy the value at that address into a register, increment it, and copy the result back. Worse: my CPU cache is filled with the pointers stored in the vector, not with the objects they point to. Too few hits, too much cache "refueling".
If I could manage to have all those members just in a vector, CPU cache hits would be much more frequent.
Try the following:
int count = 10000000;
std::vector<foo> fooArr;
fooArr.resize(count, foo());
for (auto it= fooArr.begin(); it != fooArr.end(); ++it) {
it->inc();
}
The new is killing you, and you don't actually need it, because resize inserts default-constructed elements at the end when the new size is greater (check the docs: std::vector::resize).
The other thing is the use of pointers, which IMHO should be avoided until the last moment and is unnecessary in this case. The performance should be a little bit faster in this case since you get better locality of your references (see cache locality). If the objects were polymorphic or something more complicated it might be different.
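As a rough illustration of the difference being described, here is a small benchmark sketch (my own code, using a cut-down version of the class from the question); compile it with optimizations enabled, e.g. /O2 in VS2015, or the numbers are meaningless:
#include <chrono>
#include <iostream>
#include <vector>
// Cut-down version of the class from the question.
struct foo
{
    int x = 0;
    void inc() { ++x; }
};
// Time an arbitrary piece of work in milliseconds.
template <typename F>
long long time_ms(F work)
{
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
int main()
{
    const int count = 10000000;
    std::vector<foo*> byPointer(count);
    for (auto& p : byPointer) p = new foo();   // 10 million scattered allocations
    std::vector<foo> byValue(count);           // one contiguous block of foo objects
    std::cout << "pointer layout: "
              << time_ms([&] { for (foo* p : byPointer) p->inc(); }) << " ms\n";
    std::cout << "value layout:   "
              << time_ms([&] { for (foo& f : byValue) f.inc(); }) << " ms\n";
    // Use the results so the optimizer cannot discard the loops entirely.
    std::cout << byPointer[0]->x + byValue[0].x << '\n';
    for (foo* p : byPointer) delete p;
}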

Modularizing spaghetti code

I'm still a newbie to C++ and I've been trying to modularize some spaghetti code that was given to me. So far (apart from learning how to use git and installing the rarray library to replace the automatic arrays with them) I've been sort of stumped as to how to modularize things and then compile it via make.
I understand that I must create prototypes in a header, create my object files from my functions, and then compile it all with a 'driver' code. Running/writing a make file is not my concern, but it's how to begin modularizing code like this; I'm not sure how to make functions that modify arrays!
Any pointers in the right direction would be amazing. I can clarify more if necessary.
#include <cmath>
#include <iostream>
#include <rarray> // Including the rarray library.
#include <rarrayio> // rarray input/output, if necessary. Probably not.
int main()
{
// ants walk on a table
rarray<float,2> number_of_ants(356,356);
rarray<float,2> new_number_of_ants(356,356);
rarray<float,2> velocity_of_ants(356,356);
const int total_ants = 1010; // initial number of ants
// initialize
for (int i=0;i<356;i++) {
for (int j=0;j<356;j++) {
velocity_of_ants[i][j] = M_PI*(sin((2*M_PI*(i+j))/3560)+1);
}
}
int n = 0;
float z = 0;
for (int i=0;i<356;i++) {
for (int j=0;j<356;j++) {
number_of_ants[i][j] = 0.0;
}
}
while (n < total_ants) {
for (int i=0;i<356;i++) {
for (int j=0;j<356;j++) {
z += sin(0.3*(i+j));
if (z>1 and n!=total_ants) {
number_of_ants[i][j] += 1;
n += 1;
}
}
}
}
// run simulation
for (int t = 0; t < 40; t++) {
float totants = 0.0;
for (int i=0;i<356;i++) {
for (int j=0;j<356;j++) {
totants += number_of_ants[i][j];
}
}
std::cout << t<< " " << totants << std::endl;
for (int i=0;i<356;i++) {
for (int j=0;j<356;j++) {
new_number_of_ants[i][j] = 0.0;
}
}
for (int i=0;i<356;i++) {
for (int j=0;j<356;j++) {
int di = 1.9*sin(velocity_of_ants[i][j]);
int dj = 1.9*cos(velocity_of_ants[i][j]);
int i2 = i + di;
int j2 = j + dj;
// some ants do not walk
new_number_of_ants[i][j]+=0.8*number_of_ants[i][j];
// the rest of the ants walk, but some fall off the table
if (i2>=0 and i2<356 and j2>=0 and j2<356) {
new_number_of_ants[i2][j2]+=0.2*number_of_ants[i][j];
}
}
}
for (int i=0;i<356;i++) {
for (int j=0;j<356;j++) {
number_of_ants[i][j] = new_number_of_ants[i][j];
totants += number_of_ants[i][j];
}
}
}
return 0;
}
I've been sort of stumped as to how to modularize things and then compile it via make.
That might be in part due to the code you are trying to modularize. Modularization is an idiom that is often used to help separate problem domains so that if one area of code has an issue, it won't necessarily* affect another area, and is especially useful when building larger applications; modularization is also one of the key points to classes in object oriented design.
*necessarily with regards to "spaghettification", that is, if the code really is "spaghetti code", often modifying or fixing one area of code most certainly affects other areas of code with unintended or unforeseen consequences, in other words, not modular.
The code you've posted is 63 lines (the main function), and doesn't really require any modularization. Though if you wanted to, you'd want to look at what could be modularized and what should be, but again, there isn't really much in the way to separate out, aside from making separate functions (which would just add to the code bulk). And since you asked specifically
I'm not sure how to make functions that modify arrays!
That can be done with the following:
// to pass a variable by reference (so as to avoid making copies), just give the type with the & symbol
void run_simulation(rarray<float,2>& noa, rarray<float,2>& new_noa, rarray<float,2>& voa)
{
// do something with the arrays
}
int main()
{
// ants walk on a table
rarray<float,2> number_of_ants(356,356);
rarray<float,2> new_number_of_ants(356,356);
rarray<float,2> velocity_of_ants(356,356);
...
run_simulation(number_of_ants, new_number_of_ants, velocity_of_ants);
...
}
Also, it should be noted there's a potential bug in your code; under the run simulation loop, you declare float totants = 0.0; then act on that variable until the end of the loop, at which point you still modify it with totants += number_of_ants[i][j];. If this variable is to be used to keep a 'running' total without being reset, you'd need to move the totants declaration outside of the for loop, otherwise, strictly speaking, that last totants += statement is not necessary.
Hope that can help add some clarity.
Except for replacing the magic numbers with constants in the beginning, there is not much that can be done to improve scientific code as barely anything is reusable.
The only part that is repeated is:
for (int i=0;i<356;i++) {
for (int j=0;j<356;j++) {
new_number_of_ants[i][j] = 0.0;
}
}
Which you can extract as a function (I have not replaced the magic numbers, you should do that first and give them as parameters):
void zeroRarray(rarray<float, 2>& number_of_ants) {
for (int i = 0; i < 356; i++) {
for (int j = 0; j < 356; j++) {
number_of_ants[i][j] = 0.0;
}
}
}
And call like:
zeroRarray(number_of_ants); // Btw the name of this rarray is misleading!
Also, replace the mathematical expressions with function calls:
velocity_of_ants[i][j] = M_PI* (sin((2 * M_PI * (i + j)) / 3560) + 1);
with:
velocity_of_ants[i][j] = calculateSomethingHere(i, j);
where the function looks something like:
double calculateSomethingHere(int i, int j) {
return M_PI * (sin((2 * M_PI * (i + j)) / 3560) + 1);
}
so you can give these long and insightful names and focus on what each part of your code does and not what it looks like.
Most IDE's have a refactoring functionality built-in where you highlight part of code that you want to extract and right-click and select Extract function from Refactor (or something similar).
If your code is short (e.g. under 200 lines) there is not much you can do except extracting parts of your code that are very abstract. The next step would be to write a class for the ants and whatever these ants are doing, but there is little benefit to this unless you have more code.
This is not spaghetti code at all. The control structure is actually quite straight-forward (a series of loops, sometimes nested). From the manner in which some constructs are being used, it looks like it has been translated from some other programming language to C++ without much effort to turn it into "effective C++" (i.e. it is C++ written with techniques from another language). My guess is that the original language was somewhat different from C++ - or that the original code did not make much use of that language's features.
If you want to modularise it, consider breaking some things out into separate, appropriately named functions.
Get rid of the magic values (like 356, 3560, 0.3, 40, 1.9, etc). Turn them into named constants (if they are to be fixed at compile time) or named variables (if there is a reasonable chance you may wish them to be inputs to the code at some time in the future). Bear in mind that M_PI is not actually standard in C or C++ (it is common to a number of C and C++ implementations, but is not standard so is not guaranteed to work with all compilers).
Work out what rarray is, and work out how to replace it with a standard C++ container. My guess, from the usage, is that rarray<float, 2> number_of_ants(356,356) represents a two-dimensional array of floats, with both dimensions equal to 356. As such, it might be appropriate to use a std::vector<std::vector<float> > (any version of C++) or (in C++11) std::array<std::array<float, dimension>, dimension> (where dimension is my arbitrary name for your magic value of 356). That may look a bit more complicated, but can be made much simpler with the help of a couple of typedefs. In the long run, C++ developers will understand the code better than they will if you insist on using rarray.
Look carefully at operations that work on the C++ standard containers. For example, construction and resizing of a std::vector - by default - initialises elements to zero in many circumstances. You might be able to replace some of those sets of nested loops with a single statement.
Also, dig into the standard algorithms (in header algorithm). They can act on a range of elements in any std::vector - via iterators - and possibly do other things directly that this code needs nested loops for.
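To make the last two points concrete, here is an illustrative sketch (my own names, not the original code) of a 356x356 grid built from standard containers with a couple of typedefs, using <algorithm> and <numeric> in place of the hand-written nested loops for zeroing and totalling:
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>
const int dimension = 356;
typedef std::vector<float> Row;
typedef std::vector<Row>   Grid;
int main()
{
    Grid number_of_ants(dimension, Row(dimension));   // already zero-initialised
    // Re-zero the grid without writing the double loop by hand.
    for (Row& row : number_of_ants)
        std::fill(row.begin(), row.end(), 0.0f);
    // Total all the ants without the double loop.
    float totants = 0.0f;
    for (const Row& row : number_of_ants)
        totants = std::accumulate(row.begin(), row.end(), totants);
    std::cout << totants << std::endl;
    return 0;
}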

Writing two versions of a function, one for "clarity" and one for "speed"

My professor assigned homework to write a function that takes in an array of integers and sorts all zeros to the end of the array while maintaining the current order of non-zero ints. The constraints are:
Cannot use the STL or other templated containers.
Must have two solutions: one that emphasizes speed and another that emphasizes clarity.
I wrote up this function attempting for speed:
#include <iostream>
#include <cstdio>
#include <cstdlib>
using namespace std;
void sortArray(int array[], int size)
{
int i = 0;
int j = 1;
int n = 0;
for (i = j; i < size;)
{
if (array[i] == 0)
{
n++;
i++;
}
else if (array[i] != 0 && j != i)
{
array[j++] = array[i++];
}
else
{
i++;
n++;
}
}
while (j < size)
{
array[j++] = 0;
}
}
int main()
{
//Example 1
int array[]{20, 0, 0, 3, 14, 0, 5, 11, 0, 0};
int size = sizeof(array) / sizeof(array[0]);
sortArray(array, size);
cout << "Result :\n";
for (int i = 0; i < size; i++)
{
cout << array[i] << " ";
}
cout << endl << "Press any key to exit...";
cin.get();
return 0;
}
It outputs correctly, but:
I don't know what the speed of it actually is; can anyone help me figure out how to calculate that?
I have no idea how to go about writing a function for "clarity"; any ideas?
In my experience, unless you have a very complicated algorithm, speed and clarity come together:
void sortArray(int array[], int size)
{
int item;
int dst = 0;
int src = 0;
// collect all non-zero elements
while (src < size) {
if (item = array[src++]) {
array[dst++] = item;
}
}
// fill the rest with zeroes
while (dst < size) {
array[dst++] = 0;
}
}
Speed comes from a good algorithm. Clarity comes from formatting, naming variables and commenting.
Speed as in complexity?
Since you need to look at every element in the array, and as such have a single loop going through the indexes in the range [0, N), where N denotes the size of the input, your solution is O(N).
Further reading:
Plain English explanation of big O
Determining big O Notation
Regarding clarity
In my honest opinion there shouldn't need to be two alternatives when implementing such functionality as you are presenting. If you rename your variables to more suitable (descriptive) names your current solution should be clear enough to count as both performant and clear.
Your current approach can be written in plain english in a very clear fashion:
pseudo-explanation
set write_index to 0
set number_of_zeroes to 0
For each element in array
If element is 0
increase number_of_zeroes by one
otherwise
write element value to position denoted by write_index
increase write_index by one
write number_of_zeroes 0s at the end of array
Having stated the explanation above we can quickly see that sortArray is not a descriptive name for your function, a more suitable name would probably be partition_zeroes or similar.
Adding comments could improve readability, but your current focus should lie in renaming your variables to better express the intent of the code.
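For illustration, a compact version of that same idea with descriptive names (a sketch of my own, following the renaming suggestion above):
// Same algorithm as in the question, with names that express the intent.
void partition_zeroes(int array[], int size)
{
    int write_index = 0;
    // Move every non-zero value forward, keeping the relative order.
    for (int read_index = 0; read_index < size; ++read_index)
        if (array[read_index] != 0)
            array[write_index++] = array[read_index];
    // Fill the remaining positions with zeroes.
    while (write_index < size)
        array[write_index++] = 0;
}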
(I feel your question is almost off-topic; I am answering it from a Linux perspective; I recommend using Linux to learn C++ programming; you'll adapt my advice to your operating system if you are using something else...)
speed
Regarding speed, you should have two complementary approaches.
The first (somehow "theoretical") is to analyze (i.e. think on) your algorithm and give (with some proof) its asymptotic time complexity.
The second approach (only "practical", and often pragmatical) is to benchmark and profile your program. Don't forget to compile with optimizations enabled (e.g. using g++ -Wall -O2 with GCC). Have a benchmark which runs for more than half of a second (so processes a large amount of data, e.g. several million numbers) and repeat it several times (e.g. using time(1) command on Linux). You could also measure some time inside your program using e.g. <chrono> in C++11, or just clock(3) (if you read a large array from some file, or build a large array of pseudo-random numbers with <random> or with random(3) you certainly want to measure separately the time to read or fill the array with the time to move zeros out of it). See also time(7).
(You need to process a large amount of data - more than a million items, perhaps many millions of them - because computers are very fast; a typical "elementary" operation - a machine instruction - takes less than a nanosecond, and you have a lot of uncertainty on a single run, see this)
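As a rough sketch of that kind of benchmark (my own code, using a simplified zero-moving routine rather than the exact function from the question):
#include <chrono>
#include <iostream>
#include <random>
#include <vector>
// The zero-moving routine being timed; same idea as the code in this thread.
void moveZeros(int array[], int size)
{
    int dst = 0;
    for (int src = 0; src < size; ++src)
        if (array[src] != 0)
            array[dst++] = array[src];
    while (dst < size)
        array[dst++] = 0;
}
int main()
{
    const int n = 10000000;                        // several million items
    std::vector<int> data(n);
    std::mt19937 gen(42);                          // fixed seed for repeatability
    std::uniform_int_distribution<int> dist(0, 9); // roughly 10% zeros
    for (int& v : data)
        v = dist(gen);
    // Time only the zero-moving pass, not the array construction.
    auto start = std::chrono::steady_clock::now();
    moveZeros(data.data(), n);
    auto stop = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
    return 0;
}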
clarity
Regarding clarity, it is a bit subjective, but you might try to make your code readable and concise. Adding a few good comments could also help.
Be careful about naming: sorting is not exactly what your program is doing (it is more moving zeros than sorting the array)...
I think this is the best - of course you may wish to use doxygen or some other documentation style for the comments:
// Shift the non-zeros to the front and put zero in the rest of the array
void moveNonZerosTofront(int *list, unsigned int length)
{
unsigned int from = 0, to = 0;
// This will move the non-zeros
for (; from < length; ++from) {
if (list[from] != 0) {
list[to] = list[from];
to++;
}
}
// So the rest of the array needs to be assigned zero (as we found those on the way)
for (; to < length; ++to) {
list[to] = 0;
}
}

Faster way to sort an array of structs c++

I have a struct called count with two things in it, an int called frequency and a string called word. To simplify, my program takes in a book as a text file and counts how many times each word appears. I have an array of these structs, and my program counts each time a word appears; now I want a faster way to sort the array by top frequency than the way I have below. I used the bubble sort algorithm below, but it is taking my code way too long to run using this method. Any other suggestions or help would be welcome!! I have looked up sort from the algorithm library but don't understand how I would use it here. I am new to c++, so lots of explanation on how to use sort would help a lot.
void sortArray(struct count array[],int size)
{
int cur_pos = 0;
string the_word;
bool flag= true;
for(int i=0; i<(size); i++)
{
flag = false;
for(int j=0; j< (size-1); j++)
{
if((array[j+1].frequency)>(array[j].frequency))
{
cur_pos = array[j].frequency;
the_word = array[j].word;
array[j].frequency = array[j+1].frequency;
array[j].word = array[j+1].word;
array[j+1].frequency = cur_pos;
array[j+1].word = the_word;
flag = true;
}
}
}
};
You just need to define operator less for your structures,
and use std::sort, see example:
http://en.wikipedia.org/wiki/Sort_%28C%2B%2B%29
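For example, a short sketch of that suggestion (example data of my own, with the field names taken from the question):
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
struct count
{
    int frequency;
    std::string word;
};
// Comparator: "less" here means "comes first", i.e. higher frequency first.
bool byFrequencyDesc(const count& a, const count& b)
{
    return a.frequency > b.frequency;
}
int main()
{
    std::vector<count> words = { {3, "the"}, {7, "cat"}, {5, "sat"} };
    // O(n log n) instead of bubble sort's O(n^2).
    std::sort(words.begin(), words.end(), byFrequencyDesc);
    for (const count& c : words)
        std::cout << c.word << ": " << c.frequency << '\n';
}
std::sort sorts the range you give it using the comparison you supply (or operator< by default), so all you have to decide is what "comes before" means for your struct.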
After you create a (frequency, word) pair for each entry in the data set, you can use std::map as the container and insert the pairs into it. If you want to sort according to frequency, define the std::map as follows:
std::map<int, std::string> myMap;
myMap.insert(std::make_pair(frequency, word));
std::map internally uses a binary tree, so you will get the data back sorted when you retrieve it.
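A sketch of that approach (my own example data; note that std::multimap is used here instead of std::map, since several words can share the same frequency and std::map would silently drop the duplicates):
#include <iostream>
#include <map>
#include <string>
#include <utility>
int main()
{
    std::multimap<int, std::string> byFrequency;
    byFrequency.insert(std::make_pair(7, "cat"));
    byFrequency.insert(std::make_pair(3, "the"));
    byFrequency.insert(std::make_pair(7, "sat"));   // a duplicate frequency is kept
    // Iterate in reverse to get the highest frequencies first.
    for (auto it = byFrequency.rbegin(); it != byFrequency.rend(); ++it)
        std::cout << it->second << ": " << it->first << '\n';
}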

Stack versus Integer

I've created a program to solve Cryptarithmetics for a class on Data Structures. The professor recommended that we utilize a stack consisting of linked nodes to keep track of which letters we replaced with which numbers, but I realized an integer could do the same trick. Instead of a stack {A, 1, B, 2, C, 3, D, 4} I could hold the same info in 1234.
My program, though, seems to run much more slowly than the estimation he gave us. Could someone explain why a stack would behave much more efficiently? I had assumed that, since I wouldn't be calling methods over and over again (push, pop, top, etc) and instead just add one to the 'solution' that mine would be faster.
This is not an open ended question, so do not close it. Although you can implement things different ways, I want to know why, at the heart of C++, accessing data via a Stack has performance benefits over storing in ints and extracting by moding.
Although this is homework, I don't actually need help, just very intrigued and curious.
Thanks and can't wait to learn something new!
EDIT (Adding some code)
letterAssignments is an int array of size 26. for a problem like SEND + MORE = MONEY, A isn't used so letterAssignments[0] is set to 11. All chars that are used are initialized to 10.
answerNum is a number with as many digits as there are unique characters (in this case, 8 digits).
int Cryptarithmetic::solve(){
while(!solved()){
for(size_t z = 0; z < 26; z++){
if(letterAssignments[z] != 11) letterAssignments[z] = 10;
}
if(answerNum < 1) return NULL;
size_t curAns = answerNum;
for(int i = 0; i < numDigits; i++){
if(nextUnassigned() != '$') {
size_t nextAssign = curAns % 10;
if(isAssigned(nextAssign)){
answerNum--;
continue;
}
assign(nextUnassigned(), nextAssign);
curAns /= 10;
}
}
answerNum--;
}
return answerNum;
}
Two helper methods in case you'd like to see them:
char Cryptarithmetic::nextUnassigned(){
char nextUnassigned = '$';
for(int i = 0; i < 26; i++) {
if(letterAssignments[i] == 10) return ('A' + i);
}
return nextUnassigned; // '$' means every letter already has an assignment
}
void Cryptarithmetic::assign(char letter, size_t val){
assert('A' <= letter && letter <= 'Z'); // valid letter
assert(letterAssignments[letter-'A'] != 11); // has this letter
assert(!isAssigned(val)); // not already assigned.
letterAssignments[letter-'A'] = val;
}
From the looks of things, the way you are doing things here is quite inefficient.
As a general rule, try to have as few for loops as possible, since each one will slow down your implementation greatly.
For instance, if we strip all the other code away, your program looks like:
while(thing) {
for(z < 26) {
}
for(i < numDigits) {
for(i < 26) {
}
for(i < 26) {
}
}
}
This means that for each iteration of the while loop you are doing ((26+26)*numDigits)+26 loop iterations. That's assuming isAssigned() does not use a loop.
Ideally you want:
while(thing) {
for(i < numDigits) {
}
}
which I'm sure is possible with changes to your code.
This is why your implementation with the integer array is much slower than an implementation using the stack which does not use the for(i < 26) loops (I assume).
In answer to your original question, however, storing an array of integers will always be faster than any struct you can come up with, simply because the struct has more overheads involved in allocating the memory, calling functions, etc.
But as with everything, implementation is the key difference between a slow program and a fast program.
The problem is that by counting you are also considering repetitions, when the problem probably asks you to assign a different number to each different letter so that the numeric equation holds.
For example, for four letters you are testing 10*10*10*10 = 10000 letter->number mappings instead of 10*9*8*7 = 5040 of them (the larger the number of letters, the larger the ratio between the two numbers becomes...).
The div instruction used by the mod function is quite expensive. Using it for your purpose can easily be less efficient than a good stack implementation. Here is an instruction timings table: http://gmplib.org/~tege/x86-timing.pdf
You should also write unit tests for your int-based stack to make sure that it works as intended.
Programming is often about trading memory for time and vice versa.
Here you are packing data into an integer. You save memory but lose time.
Speed of course depends on the implementation of the stack. C++ is C with classes. If you are not using classes it's basically C (as fast as C).
#include <cassert>
const int stack_size = 26;
struct Stack
{
    int _data[stack_size];
    int _stack_p;
    Stack()
        : _stack_p(0)
    {}
    inline void push(int val)
    {
        assert(_stack_p < stack_size); // no overhead in release builds (-DNDEBUG),
                                       // where assert compiles away; very useful for tracing bugs
        _data[_stack_p++] = val;
    }
    inline int pop()
    {
        assert(_stack_p > 0);          // same thing
        return _data[--_stack_p];
    }
    inline int size() const
    {
        return _stack_p;
    }
    inline int val(int i) const
    {
        assert(i >= 0 && i < _stack_p);
        return _data[i];
    }
};
There is no overhead like a vtable pointer. Also, pop() and push() are very simple, so they will be inlined, meaning no function-call overhead. Using int as the stack element is also good for speed, because int is usually the most natural size for the processor (no need for alignment fix-ups, etc.).
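For completeness, a minimal usage sketch (my own example), assuming the corrected Stack definition above is in scope:
#include <cstdio>
int main()
{
    Stack s;
    s.push(1);
    s.push(2);
    s.push(3);
    while (s.size() > 0)
        std::printf("%d\n", s.pop());   // prints 3, 2, 1
}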