Efficiency of an algorithm for scrambled input - C++

I am currently writing a program in C++ (it's done for the most part) that takes in a file with numbered indices and then pushes out a scrambled quiz based on the initial input, so that, theoretically, no two quizzes are the same.
This is the code:
// There has to be a more efficient way of doing this...
for (int tempCounter(inputCounter);
     inputCounter != 0;
     /* Blank on Purpose */) {
    randInput = (rand() % tempCounter) + 1;
    inputIter = find(scrambledArray.begin(),
                     scrambledArray.end(),
                     randInput);
    // Checks if the value passed in is within the given vector, no duplicates.
    if (inputIter == scrambledArray.end()) {
        --inputCounter;
        scrambledArray.push_back(randInput);
    }
}
The first comment states my problem. It will not matter under normal circumstances, but what if this were applied in a larger application? The code works, but it is highly inefficient if the user wants to scramble 10,000 or so results: each candidate number is re-checked against everything drawn so far, and near the end almost every draw is a rejected duplicate.
I'm not talking about the efficiency of the code in the sense of shortening some sequences and compacting it to make it a bit prettier. I was teaching someone, and on reaching this point I concluded that this could be done in a much better way; I just don't know which way that is...

So you want just the numbers 1..N, shuffled? Yes, there is a more efficient way of doing that: build the vector once and shuffle it in place, which is O(N) overall instead of re-scanning for duplicates on every draw. You can use std::iota (from <numeric>) to construct your vector:
// first, construct your vector:
std::vector<int> scrambled(N);
std::iota(scrambled.begin(), scrambled.end(), 1);
And then std::shuffle it (std::shuffle lives in <algorithm>, the engine in <random>):
std::shuffle(scrambled.begin(), scrambled.end(),
std::mt19937{std::random_device{}()});
If you don't have C++11, the above would look like:
std::vector<int> scrambled;
scrambled.reserve(N);
for (int i = 1; i <= N; ++i) {
    scrambled.push_back(i);
}
std::random_shuffle(scrambled.begin(), scrambled.end());
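For reference, a minimal complete sketch putting the two pieces together (N is fixed to 10 here purely for illustration):

#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main()
{
    const int N = 10; // number of quiz entries, assumed for this example
    std::vector<int> scrambled(N);
    std::iota(scrambled.begin(), scrambled.end(), 1); // fill with 1..N
    std::shuffle(scrambled.begin(), scrambled.end(),
                 std::mt19937{std::random_device{}()});
    for (int n : scrambled)
        std::cout << n << ' ';
    std::cout << '\n';
}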

Related

most efficient way to have a if statement in C++

I am trying to do a Monte Carlo simulation, and as is typical for this kind of simulation, it requires a lot of iterations, even for the smallest system. Now I want to make some tweaks to my previous code, but they increase the wall time (running time) roughly tenfold, which turns a week of calculations into more than two months. I wonder whether I am doing the simulation in the most efficient way.
Before, I was using a set of fixed intervals to sample the properties of the simulation, but now I want to record at a set of random intervals, as that is the more logical thing to do. However, I don't know how to do it efficiently.
The code I was using was basically something like this:
for (long long int it = 0; it < numIterations; ++it)
{
    if ((numIterations >= 10) && (it % 1000 == 0))
    {
        exportedStates = system.GetStates();
        Export2D(exportedStates, outputStatesFile1000, it);
    }
}
As you can see, before the tweaks it was going through the simulation and recording the data only on every 1000th iteration.
Now I want to do something like this:
for (long long int it = 0; it < numIterations; ++it)
{
    for (int j = 1; j <= n_graph_points; ++j) {
        for (int i = 0; i < n_data_per_graph_points; ++i) {
            if (it == initial_position_array[j][i] ||
                it == (initial_position_array[j][i] + delta_time_arr[j])) {
                exportedStates = system.GetStates();
                Export2D(exportedStates, outputStatesFile, it);
            }
        }
    }
}
In this part, initial_position_array is just an array holding lots of random numbers. The two nested for loops check every iteration, and when the iteration count equals one of those random numbers, recording starts. I know this is not the best method, since it checks lots of iterations unnecessarily, but I don't know how to improve my code. I am a little helpless at this point, so any comment would be appreciated.
This does not answer the implied question,
"[What is the] most efficient way to have [an] if statement in C++" (all of them should be equivalent), but rather:
Supposing varying intervals between exports were logical, how do I code that adequately?
Keep a sane Monte Carlo control loop, initialise a nextExport variable to a random value of your liking, and whenever the iteration counter equals nextExport, export and then increase nextExport by the next random interval. That way each iteration costs a single comparison instead of a scan over all the precomputed positions.
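A minimal sketch of that idea, reusing the asker's numIterations, system.GetStates and Export2D; the interval distribution and its bounds are illustrative assumptions, not from the original code:

#include <random>

std::mt19937 rng{std::random_device{}()};
std::uniform_int_distribution<long long> interval(1, 1000); // assumed interval range

long long nextExport = interval(rng);
for (long long it = 0; it < numIterations; ++it) {
    // ... advance the simulation ...
    if (it == nextExport) {
        exportedStates = system.GetStates();
        Export2D(exportedStates, outputStatesFile, it);
        nextExport += interval(rng); // schedule the next export
    }
}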
if (it == initial_position_array[j][i] || it == (initial_position_array[j][i] + delta_time_arr[j]))
You can name both subexpressions (please pick meaningful names at your convenience). Note that the sum is a temporary, so it cannot be bound to a non-const reference; a const value works for it:
const int& i_p_a = initial_position_array[j][i];
const int  i_p_a_d = i_p_a + delta_time_arr[j];
Now your final if statement will be readable and maintainable:
if (it == i_p_a || it == i_p_a_d) {
    exportedStates = system.GetStates();
    Export2D(exportedStates, outputStatesFile, it);
}

Merge Sort With String Vectors C++

Hello all, I am a noob at recursion and I feel like banging my head against the wall. I have watched some videos, read the chapter, and have been trying to figure out the answer to this problem for over 6 hours now with no luck. My professor gave us the following code, and we have to modify it from there. Note: we are reading 52k words from a file and then sorting them using this algorithm. Not sure if that matters, but I thought I would add the info just in case.
#include <string>
#include <vector>
using namespace std;
vector<int> MergeUsingArrayIndices(const vector<int> & LHS,
                                   const vector<int> & RHS)
{
    vector<int> ToReturn;
    int i = 0; // LHS index
    int j = 0; // RHS index
    while ((i < LHS.size()) && (j < RHS.size()))
    {
        if (LHS[i] < RHS[j])
        {
            ToReturn.push_back(LHS[i]);
            ++i;
        }
        else
        {
            ToReturn.push_back(RHS[j]);
            ++j;
        }
    }
    while (i < LHS.size())
    {
        ToReturn.push_back(LHS[i]);
        ++i;
    }
    while (j < RHS.size())
    {
        ToReturn.push_back(RHS[j]);
        ++j;
    }
    return ToReturn;
}
Except now we have to make this work on just a single vector. This is what I have so far.
vector<string> MergeUsingArrayIndices(vector<string> & LHS,
                                      int START, int MID, int MIDPLUSONE, int END)
{
    vector<string> ToReturn;
    int i = 0;          // LHS index
    int j = MIDPLUSONE; // RHS index
    while ((i <= MID) && (j <= END))
    {
        if (LHS[i] < LHS[j])
        {
            ToReturn.push_back(LHS[i]);
            ++i;
        }
        else
        {
            ToReturn.push_back(LHS[j]);
            ++j;
        }
    }
    while (i <= MID)
    {
        ToReturn.push_back(LHS[i]);
        ++i;
    }
    while (j <= END)
    {
        ToReturn.push_back(LHS[j]);
        ++j;
    }
    for (int k = 0; k < ToReturn.size(); ++k)
    {
        LHS[k] = ToReturn[k];
    }
    return ToReturn;
}
Plus, here is the code that calls it:
void MergeSort(vector<string> & VECTOR, int START, int END)
{
    if (END > START)
    {
        int MID = (START + END) / 2;
        MergeSort(VECTOR, START, MID);
        MergeSort(VECTOR, MID + 1, END);
        MergeUsingArrayIndices(VECTOR, START, MID, (MID + 1), END);
    }
}
void Merge(std::vector<string> & VECTOR)
{
    MergeSort(VECTOR, 0, VECTOR.size() - 1);
}
[Console screenshot of the partially sorted output omitted]
Basically it is sorting, but not very well, since not everything ends up in alphabetical order. That was just a small sample of words from the list.
Thank you and best regards,
DON'T GET MARRIED.
UPDATE FOR PNKFELIX:
I tried the following:
vector<string> ToReturn;
int i = START;      // LHS index
int j = MIDPLUSONE; // RHS index
while (i <= MID && j <= END)
{
    if (LHS[i] <= LHS[j])
    {
        ToReturn[START] = LHS[i];
        //ToReturn.push_back(LHS[i]);
        ++START;
        ++i;
    }
and so on, but this made the code worse, so I am sure that is not what you were referring to. I have been up for days trying to figure this out and I cannot sleep...
The one thing you pointed to that is bothering me - because I see why it's not happening but cannot fix it - is the call.
I'm guessing that is why you used the apple, pear, orange, banana example (very clever, by the way). You can lead a horse to water, but you cannot make it drink. However, I still do not see how to fix this. I tried replacing my i = 0; with i = START, as I now see this is probably the culprit when comparing the right side, since it should start at that position, but it actually made my code worse. What else am I missing here?
I have so much going on, and I cannot stand it when professors do stuff like this (my community college isn't great for CIS, and my professor has never taught this class before). I cannot rest until I figure it out, but the textbook is far above my head (the professor even apologized for it at the beginning of the semester, saying it was too advanced for us, but it is what they gave him) and it uses a totally different approach (two separate arrays instead of one vector). What am I supposed to do with START? I have spent so much time on this and am dying to know the answer. Maybe that makes me lazy, but there is a point where you can only think about something so much. I love to learn, but this is not learning, as I've hit my limit. I am missing something and don't know how to begin desk-checking what it is. I am assuming the right-hand side of each vector comparison is not sorted, but how do I fix that? Is it because START is not always zero (for example, for the right-hand side)? I am not good at sorting algorithms as it is (because I am not very bright, although I study a lot), and this is a new twist. It's like handing someone a broken bubble sort they have never seen working and asking them to desk-check it, fix what's wrong with it, and make it more efficient.
The nice thing about a problem like this is that there's nothing specific to C++ here. One can take the proposed code and port it to pretty much any other reasonable language (e.g. JavaScript) and then debug it there to determine what is going wrong.
A good practice in any program is to document the assumptions and invariants of the code. If these invariants are simple enough, you can even check that they hold within the code itself, via assert statements.
So, let's see: from looking at how MergeUsingArrayIndices is invoked by MergeSort, it seems your approach is a recursive divide-and-conquer: you first divide the input in two at a midpoint element, sort each side of the divided input, and then merge the two parts.
From that high-level description, we can identify a couple of invariants that must hold on entry to MergeUsingArrayIndices: the left half of LHS must be sorted, and the right half of LHS must also be sorted. We can check that these two conditions hold as we merge the vector, which may help us identify the spot where things are going wrong.
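In C++ terms (a hedged sketch, since the actual port mentioned below is in Rust), those entry invariants could be checked with std::is_sorted; MID and END are inclusive indices in the asker's code, hence the +1:

#include <algorithm>
#include <cassert>

// On entry to MergeUsingArrayIndices: both halves must already be sorted.
assert(std::is_sorted(LHS.begin() + START, LHS.begin() + MID + 1));
assert(std::is_sorted(LHS.begin() + MIDPLUSONE, LHS.begin() + END + 1));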
I took the original code and ported it as faithfully as I could to Rust (my preferred programming language), then added some assertions, and some print statements so that we can see where the assertions fail.
(There was one other change I forgot to mention above: I also got rid of the unused return value of MergeUsingArrayIndices. The vector you're building up is solely used as temporary storage that is later copied into LHS; you never use the return value, and therefore we can remove it from the function's type entirely.)
Here is that code in a running playpen:
https://play.rust-lang.org/?gist=bd61b9572ea45b7139bf081cb51dc491&version=stable&backtrace=0
Some leading questions:
What indices is the assertion comparing when it reports that LHS[i] is in fact not less than LHS[i+1]?
The printouts report when the vector should be sorted at certain subranges: 0...0, 1...1, 0...1, et cetera. The indices you found above (assuming they are the same as the ones I found) are not within one of these subranges; so we in fact do not have a justification for trying to claim that LHS[i] is less than LHS[i+1]! So what happened, why does the code think that they should fall into a sorted subrange of the vector?
Strong hint number one: I left on a warning that the compiler issues about the code.
Strong hint number two: Try doing the exercise I left in the comment above the MergeUsingArrayIndices function.
If these were C strings, you would use strcmp(LHS[i], LHS[j]) < 0 in the if condition; for the vector<string> used here, operator< already compares lexicographically, so LHS[i] < LHS[j] is correct as written.

Most efficient way to search for a value and return its index in a vector?

I am trying to iterate through a vector (k) and check whether it contains a value (key). Wherever it does, I want to take the value found at the same index of a different vector (val) and add it to a third vector (temp).
for (int i = 0; i < k.size(); ++i)
{
    if (k.at(i) == key)
    {
        temp.push_back(val.at(i));
    }
}
I've learned a lot lately, but I'm still not super advanced in C++. This code does work for my purposes, but it is extremely slow: it can handle small vectors of sizes like 10 or 100, but takes much too long for sizes like 1,000, 10,000 or even 1,000,000.
My question is, is there a faster and more efficient way to do this?
I've tried this:
std::vector<int>::iterator it = k.begin();
while ((it = std::find(it, k.end(), key)) != k.end())
{
    int index = std::distance(k.begin(), it);
    temp.push_back(val.at(index));
    // note: 'it' is never advanced past the match, so once 'key' is found
    // this loop re-finds the same element forever and temp grows until
    // allocation fails - hence the bad_alloc mentioned below
}
I thought maybe using a vector iterator would speed things up, but I can't get the code to work due to bad_alloc errors that I'm not sure how to fix.
Does anyone know what I can do to make this little bit of code much faster?
Here are a few things you could do:
Pre-allocate the data for temp, so that push_back doesn't cause repeated allocations:
temp.reserve(k.size());
If k is sorted, you can use that fact to speed things up a bit:
auto lowerIt = std::lower_bound(k.begin(), k.end(), key);
auto upperIt = std::upper_bound(k.begin(), k.end(), key);
for (auto it = lowerIt; it != upperIt; ++it)
    temp.push_back(val[it - k.begin()]);
at does bounds checking, so it is a tad slower than []. You obviously have to guarantee that you never access an out-of-bounds index.
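As an aside, std::equal_range computes both bounds in one call; a sketch equivalent to the loop above, under the same assumption that k is sorted:

auto range = std::equal_range(k.begin(), k.end(), key);
for (auto it = range.first; it != range.second; ++it)
    temp.push_back(val[it - k.begin()]);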
Besides Rakete's suggestions:
If your keys vector is sorted - use std::binary_search instead of std::find and then just iterate until the next value/end of vector.
If you're free to change your data structures, keep your data in std::unordered_multimap and use equal_range to access elements with your desired keys.
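A minimal sketch of that unordered_multimap route, reusing the asker's k, val, temp and key; building the map is a one-time cost, after which each lookup is O(1) on average plus the number of matches:

#include <unordered_map>

std::unordered_multimap<int, int> lookup;
for (std::size_t i = 0; i < k.size(); ++i)
    lookup.emplace(k[i], val[i]); // key -> value at the same index

auto range = lookup.equal_range(key);
for (auto it = range.first; it != range.second; ++it)
    temp.push_back(it->second);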

no duplicate function for a lottery program

Right now I'm trying to make a function that checks whether the user's selection is already in the array, and if it is, tells them to choose a different number. How can I do this?
Do you mean something like this?
bool CheckNumberIsValid()
{
    for (int i = 0; i < array_length; ++i)
    {
        if (array[i] == user_selection)
            return false;
    }
    return true;
}
That should give you a clue, at least.
What's wrong with std::find? If you get the end iterator back, the value isn't in the array; otherwise, it is. Or if this is homework, and you're not allowed to use the standard library, a simple while loop should do the trick: this is a standard linear search, algorithms for which can be found anywhere. (On the other hand, some of the articles which pop up when searching with Google are pretty bad. You really should use the standard implementation:
Iterator
find(Iterator begin, Iterator end, ValueType target)
{
    while (begin != end && *begin != target)
        ++begin;
    return begin;
}
Simple, effective, and proven to work.)
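For instance, reusing the names from the earlier answer (array, array_length and user_selection are assumed to be in scope), the standard call might look like:

#include <algorithm>

bool CheckNumberIsValid()
{
    // end iterator back means the selection was not found, i.e. it is valid
    return std::find(array, array + array_length, user_selection)
           == array + array_length;
}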
[added post factum] Oh, homework tag. Ah well, it won't really benefit you that much then. Still, I'll leave my answer, since it can be of some use to others browsing through SO.
If you need to have lots of unique random numbers in a range - say, 45,000 random numbers from 0..45100 - then you should see how this gets problematic using the approach of:
while (size_of_range > v.size()) {
    int n = // get random
    if ( /* n is not already in v */ ) {
        v.push_back(n);
    }
}
If the size of the pool and the range you want to draw are close, and the pool size is not a very small integer, it gets harder and harder to generate a random number that isn't already in the vector/array.
In that case, you'll be much better off using std::vector (in <vector>) and std::random_shuffle (in <algorithm>):
unsigned short start = 10; // the minimum value of the pool
unsigned short step = 1;   // for 10,11,12,13,14,... values in the vector

// initialize the pool of 45100 numbers
std::vector<unsigned long> pool(45100);
for (unsigned long i = 0, j = start; i < pool.size(); ++i, j += step) {
    pool[i] = j;
}

// get 45000 numbers from the pool without repetitions
std::random_shuffle(pool.begin(), pool.end());
return std::vector<unsigned long>(pool.begin(), pool.begin() + 45000);
You can obviously use any type, but you'll need to initialize the vector accordingly, so it'd contain all possible values you want.
Note that the memory overhead probably won't really matter if you really need almost all of the numbers in the pool, and you'll get good performance. Using rand() and checking will take a lot of time, and if your RAND_MAX is equal to 32767 it'd be an infinite loop (rand() could never produce any of the values above 32767, so the set could never fill up).
The memory overhead is, however, noticeable if you only need a few of those values; the first approach would usually be faster then.
If it really needs to be an array, you need to iterate, or use the find function from the <algorithm> header. That said, I would suggest putting the numbers in a set, since lookup in sets is fast and handy via the set::find function.
ref: stl set
These are some of the steps (in pseudo-code, since this is a homework question) on how you might go about doing this; a small sketch of the binary-search check follows below:
1. Get the user to enter a new number.
2. If the number entered is the first, push it onto the vector anyway.
3. Sort the contents of the vector in case its size is > 1.
4. Ask the user to enter the next number.
5. Perform a binary search on the contents to see if the number was already entered.
6. If the number is unique, push it into the vector. If not unique, ask again.
7. Go to step 3.
HTH,
Sriram.
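A minimal sketch of the binary-search check from steps 5 and 6, assuming the picks live in a std::vector<int> that is kept sorted; the function name is illustrative:

#include <algorithm>
#include <vector>

// Inserts n only if it is not already present; keeps v sorted throughout,
// so no separate sorting step is needed.
bool insert_if_unique(std::vector<int>& v, int n)
{
    std::vector<int>::iterator pos = std::lower_bound(v.begin(), v.end(), n);
    if (pos != v.end() && *pos == n)
        return false; // duplicate: ask the user for a different number
    v.insert(pos, n);
    return true;
}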

How to keep only the last duplicate when iterating through rows

The following code iterates through many data rows, calculates a score per row, and then sorts the rows according to that score:
unsigned count = 0;
score_pair* scores = new score_pair[num_rows];
while ((row = data.next_row())) {
    float score = calc_score(data.next_feature());
    scores[count].score = score;
    scores[count].doc_id = row->docid;
    count++;
}
assert(count <= num_rows);
qsort(scores, count, sizeof(score_pair), score_cmp);
Unfortunately, there are many duplicate rows with the same docid but different scores. Now I'd like to keep only the last score for any given docid. The docids are unsigned ints, but usually big (so no lookup array), and using a HashMap to look up the last count for a docid would probably be too slow (many millions of rows; this should take seconds, not minutes...).
OK, I modified my code to use a std::map:
map<int, int> docid_lookup;
unsigned count = 0;
score_pair* scores = new score_pair[num_rows];
while ((row = data.next_row())) {
    float score = calc_score(data.next_feature());
    map<int, int>::iterator iter = docid_lookup.find(row->docid);
    if (iter != docid_lookup.end()) {
        scores[iter->second].score = score;
        scores[iter->second].doc_id = row->docid;
    } else {
        scores[count].score = score;
        scores[count].doc_id = row->docid;
        docid_lookup[row->docid] = count;
        count++;
    }
}
It works, and the performance hit is not as bad as I expected - it now runs in a minute instead of 16 seconds, so it's roughly a factor of four. Memory usage has also gone up from about 1 GB to 4 GB.
The first thing I'd try would be a map or unordered_map: I'd be surprised if performance is a factor of 60 slower than what you did without any unique-ification. If the performance there isn't acceptable, another option is something like this:
// get the computed data into a vector
std::vector<score_pair> scores;
scores.reserve(num_rows);
while ((row = data.next_row())) {
    float score = calc_score(data.next_feature());
    scores.push_back(score_pair(score, row->docid));
}
assert(scores.size() <= num_rows);

// remove duplicate doc_ids
std::reverse(scores.begin(), scores.end());
std::stable_sort(scores.begin(), scores.end(), docid_cmp);
scores.erase(
    std::unique(scores.begin(), scores.end(), docid_eq),
    scores.end()
);

// order by score
std::sort(scores.begin(), scores.end(), score_cmp);
Note that the use of reverse and stable_sort is because you want the last score for each doc_id, but std::unique keeps the first. If you wanted the first score you could just use stable_sort, and if you didn't care what score, you could just use sort.
The best way of handling this is probably to pass reverse iterators into std::unique, rather than a separate reverse operation. But I'm not confident I can write that correctly without testing, and errors might be really confusing, so you get the unoptimised code...
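For what it's worth, one plausible form of that reverse-iterator version (untested here as well; docid_cmp and docid_eq as above, and .base() converts the reverse iterator returned by std::unique back to the matching forward iterator, so everything before it is the duplicates to drop):

std::stable_sort(scores.begin(), scores.end(), docid_cmp);
scores.erase(
    scores.begin(),
    std::unique(scores.rbegin(), scores.rend(), docid_eq).base()
);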
Edit: just for comparison with your code, here's how I'd use the map:
std::map<int, float> scoremap;
while ((row = data.next_row())) {
    scoremap[row->docid] = calc_score(data.next_feature());
}
std::vector<score_pair> scores(scoremap.begin(), scoremap.end());
std::sort(scores.begin(), scores.end(), score_cmp);
Note that score_pair would need a constructor taking a std::pair<int,float>, which makes it non-POD. If that's not acceptable, use std::transform, with a function to do the conversion.
Finally, if there is much duplication (say, on average 2 or more entries per doc_id), and if calc_score is non-trivial, then I would be looking to see whether it's possible to iterate the rows of data in reverse order. If it is, then it will speed up the map/unordered_map approach, because when you get a hit for the doc_id you don't need to calculate the score for that row, just drop it and move on.
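A sketch of that reverse-traversal idea, assuming a hypothetical prev_row() accessor that walks the data last-to-first (the real API may differ):

std::map<unsigned, float> scores;
while ((row = data.prev_row())) { // hypothetical: iterate rows in reverse
    if (scores.find(row->docid) == scores.end())
        scores[row->docid] = calc_score(data.next_feature()); // only the first hit, i.e. the last-read row
}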
I'd go for a std::map of docids. If you could create an appropriate hashing function, a hash map would be preferable, but I guess that's too difficult. And no, the std::map is not too slow: access is O(log n), which is nearly as good as O(1) (O(1) is the access time of an array, and of a hash map, by the way).
By the way, if std::map is too slow, qsort at O(n log n) is too slow as well. And by using a std::map and iterating over its contents, you can perhaps save yourself the qsort.
Some additions for the comment (by onebyone):
I did not go into the implementation details, since there wasn't enough information on that.
qsort may behave badly with sorted data (depending on the implementation); std::map may not. This is a real advantage, especially if you read the values from a database that might output them ordered by key.
There was no word on the memory allocation strategy. Changing to a memory allocator with fast allocation of small objects may improve the performance.
Still, the fastest would be a hash map with an appropriate hash function. Since there's not enough information about the distribution of the keys, presenting one in this answer is not possible.
In short: if you ask general questions, you get general answers. That means - at least for me - looking at the time complexity in big-O notation. Still, you were right: depending on various factors, the std::map may be too slow while qsort is still fast enough - or it may be the other way round in qsort's worst case, where it has O(n^2) complexity.
Unless I've misunderstood the question, the solution can be simplified considerably. At least as I understand it, you have a few million docids (which are of type unsigned int), and for each unique docid you want to store one 'score' (which is a float). If the same docid occurs more than once in the input, you want to keep the score from the last one. If that's correct, the code can be reduced to this:
std::map<unsigned, float> scores;
while ((row = data.next_row()))
    scores[row->docid] = calc_score(data.next_feature());
This will probably be somewhat slower than your original version since it allocates a lot of individual blocks rather than one big block of memory. Given your statement that there are a lot of duplicates in the docid's, I'd expect this to save quite a bit of memory, since it only stores data for each unique docid rather than for every row in the original data.
If you wanted to optimize this, you could almost certainly do so -- since it uses a lot of small blocks, a custom allocator designed for that purpose would probably help quite a bit. One possibility would be to take a look at the small-block allocator in Andrei Alexandrescu's Loki library. He's done more work on the problem since, but the one in Loki is probably sufficient for the task at hand -- it'll almost certainly save a fair amount of memory and run faster as well.
If your C++ implementation has it, and most do, try hash_map instead of std::map (it's sometimes available under std::hash_map).
If the lookups themselves are your computational bottleneck, this could be a significant speedup over std::map's binary tree.
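In C++11 and later the standard spelling is std::unordered_map; a minimal sketch mirroring the map version above:

#include <unordered_map>

std::unordered_map<unsigned, float> scores; // hash table: O(1) average lookup
while ((row = data.next_row()))
    scores[row->docid] = calc_score(data.next_feature());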
Why not sort by doc id first, calculate scores, then for any subset of duplicates use the max score?
On re-reading the question, I'd suggest a modification to how scores are read in. Keep in mind C++ isn't my native language, so this won't quite be compilable.
unsigned count = 0;
pair<int, score_pair>* scores = new pair<int, score_pair>[num_rows];
while ((row = data.next_row())) {
    float score = calc_score(data.next_feature());
    scores[count].second.score = score;
    scores[count].second.doc_id = row->docid;
    scores[count].first = count; // remember the read order
    count++;
}
assert(count <= num_rows);
qsort(scores, count, sizeof(scores[0]), pair_docid_cmp);

// count the number of unique docids
int scoreCount = (count > 0) ? 1 : 0;
for (unsigned i = 1; i < count; i++)
    if (scores[i - 1].second.doc_id != scores[i].second.doc_id) scoreCount++;

score_pair* actualScores = new score_pair[scoreCount];
int at = -1;
int lastId = -1;
for (unsigned i = 0; i < count; i++)
{
    // the first entry for a new docid holds the last-read score, thanks to pair_docid_cmp
    if (lastId != scores[i].second.doc_id)
    {
        actualScores[++at] = scores[i].second;
        lastId = scores[i].second.doc_id;
    }
}
qsort(actualScores, scoreCount, sizeof(score_pair), score_cmp);
Here pair_docid_cmp would compare first on docid, grouping rows for the same doc together, and second by reverse read order, so that the last item read comes first within each group of equal docids. This should cost only about 2.5x the memory usage and roughly double the running time.