No duplicate function for a lottery program - C++

Right now I'm trying to make a function that checks whether the user's selection is already in the array, and if it is, tells the user to choose a different number. How can I do this?

Do you mean something like this?
bool CheckNumberIsValid()
{
    // Returns true only if user_selection is not already in the array.
    for (int i = 0; i < array_length; ++i)
    {
        if (array[i] == user_selection)
            return false;
    }
    return true;
}
That should give you a clue, at least.

What's wrong with std::find? If you get the end iterator back, the value isn't in the array; otherwise, it is. Or if this is homework, and you're not allowed to use the standard library, a simple while loop should do the trick: this is a standard linear search, algorithms for which can be found anywhere. (On the other hand, some of the articles which pop up when searching with Google are pretty bad. You really should use the standard implementation:
template <typename Iterator, typename ValueType>
Iterator
find( Iterator begin, Iterator end, ValueType target )
{
    while ( begin != end && *begin != target )
        ++begin;
    return begin;
}
Simple, effective, and proven to work.)
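For reference, a minimal sketch of the std::find approach on a plain array (the function and variable names here are illustrative, not taken from the question):
#include <algorithm> // std::find

bool is_free(const int* numbers, int count, int user_selection)
{
    // std::find returns the end iterator when the value is absent
    return std::find(numbers, numbers + count, user_selection) == numbers + count;
}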

[added post factum] Oh, homework tag. Ah well, it won't really benefit you that much then; still, I'll leave my answer since it can be of some use to others browsing through SO.
If you need lots of unique random numbers in a range - say, 45000 random numbers from 0..45100 - then you can see how the following approach becomes problematic:
while (size_of_range > v.size()) {
    int n = /* get random */;
    if ( /* n is not already in v */ ) {
        v.push_back(n);
    }
}
If the size of the pool and the size of the range you want are close, and the pool size is not a very small integer, it will get harder and harder to draw a random number that hasn't already been put into the vector/array.
In that case, you'll be much better off using std::vector (in <vector>) and std::random_shuffle (in <algorithm>):
unsigned short start = 10; // the minimum value of a pool
unsigned short step = 1;   // for 10,11,12,13,14... values in the vector

// initialize the pool of 45100 numbers
std::vector<unsigned long> pool(45100);
for (unsigned long i = 0, j = start; i < pool.size(); ++i, j += step) {
    pool[i] = j;
}

// get 45000 numbers from the pool without repetitions
std::random_shuffle(pool.begin(), pool.end());
return std::vector<unsigned long>(pool.begin(), pool.begin() + 45000);
You can obviously use any type, but you'll need to initialize the vector accordingly, so it'd contain all possible values you want.
Note that the memory overhead probably won't really matter if you need almost all of the numbers in the pool, and you'll get good performance. Using rand() and checking will take a lot of time, and if your RAND_MAX equals 32767 it would be an infinite loop, since rand() could never produce the 45000 distinct values you need.
The memory overhead is noticeable, however, if you only need a few of those values. The first approach would usually be faster in that case.

If it really needs to be an array, you have to iterate over it or use the find function from the <algorithm> header. That said, I would suggest putting the numbers in a set instead: lookup is fast in sets and convenient via the set::find function.
ref: stl set
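A rough sketch of that idea (the container and function names here are just illustrative):
#include <set>

std::set<int> chosen; // numbers picked so far

bool try_add(int user_selection)
{
    // insert() returns a pair whose .second is false when the value was already present
    return chosen.insert(user_selection).second;
}
Equivalently, you can call chosen.find(user_selection) != chosen.end() before inserting, which is the set::find lookup mentioned above.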

These are some of the steps (in pseudo-code since this is a homework question) on how you could go about doing this:
1. Get the user to enter a new number.
2. If the number entered is the first, push it into the vector anyway.
3. Sort the contents of the vector in case its size is > 1.
4. Ask the user to enter the next number.
5. Perform a binary search on the contents to see if the number was already entered (see the sketch after these steps).
6. If the number is unique, push it into the vector. If not unique, ask again.
7. Go to step 3.
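A rough C++ sketch of steps 5 and 6 (the names here are illustrative; 'numbers' is the vector kept sorted by step 3):
#include <algorithm>
#include <vector>

// Returns true and inserts 'entry' in sorted position when it is new;
// returns false when it is a duplicate, so the caller should ask again.
bool insert_if_unique(std::vector<int>& numbers, int entry)
{
    if (std::binary_search(numbers.begin(), numbers.end(), entry))
        return false;
    numbers.insert(std::upper_bound(numbers.begin(), numbers.end(), entry), entry);
    return true;
}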
HTH,
Sriram.

Related

Most efficient way to search for a value and return its index in a vector?

I am trying to iterate through a vector (k) and check whether it contains a value (key). Wherever it does, I want to take the value found at the same index of a different vector (val) and push it into a third vector (temp).
for (int i = 0; i < k.size(); ++i)
{
    if (k.at(i) == key)
    {
        temp.push_back(val.at(i));
    }
}
I've learned a lot lately but I'm still not very advanced in C++. This code does work for my purposes, but it is extremely slow: it can handle small vectors of sizes like 10 or 100, but takes far too long for larger sizes like 1000, 10000, or even 1000000.
My question is, is there a faster and more efficient way to do this?
I've tried this:
std::vector<int>::iterator it = k.begin();
while ((iter = std::find(it, k.end(), key)) != k.end())
{
    int index = std::distance(k.begin(), it);
    temp.push_back(val.at(index));
}
I thought maybe using a vector iterator would speed things up, but I can't get the code to work due to bad_alloc errors that I'm not sure how to fix.
Does anyone know what I can do to make this little bit of code much faster?
Here are a few things you could do:
Pre-allocate the data for temp, so that push_back doesn't cause repeated allocations:
temp.reserve(k.size());
If k is sorted, you can use that fact to speed things up a bit:
auto lowerIt = std::lower_bound(k.begin(), k.end(), key);
auto upperIt = std::upper_bound(k.begin(), k.end(), key);
for (auto it = lowerIt; it != upperIt; ++it)
    temp.push_back(val[it - k.begin()]);
at() does bounds checking, so it is a tad slower than operator[]. You obviously have to guarantee that you never access an out-of-bounds index.
Besides Rakete's suggestions:
If your keys vector is sorted, use std::binary_search (or std::lower_bound, as shown above, to get an iterator) instead of std::find and then just iterate until the next value or the end of the vector.
If you're free to change your data structures, keep your data in a std::unordered_multimap and use equal_range to access the elements with your desired key.
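A rough sketch of the unordered_multimap idea, reusing the question's k, val, temp and key (and assuming int elements):
#include <unordered_map>
#include <vector>

// Build the index once: each key maps to every val entry that shared its index.
std::unordered_multimap<int, int> index;
for (std::vector<int>::size_type i = 0; i < k.size(); ++i)
    index.emplace(k[i], val[i]);

// Each lookup is then average-case O(1 + number of matches):
auto range = index.equal_range(key);
for (auto it = range.first; it != range.second; ++it)
    temp.push_back(it->second);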

Efficiency of an algorithm for scrambled input

I am currently writing a program in C++ (it's done for the most part) that takes in a file with numbered indices and then pushes out a scrambled quiz based on the initial input, so that, theoretically, no two are the same.
This is the code:
// There has to be a more efficient way of doing this...
for (int tempCounter(inputCounter);
     inputCounter != 0;
     /* Blank on Purpose */) {
    randInput = (rand() % tempCounter) + 1;
    inputIter = find(scrambledArray.begin(),
                     scrambledArray.end(),
                     randInput);
    // Checks if the value passed in is within the given vector, no duplicates.
    if (inputIter == scrambledArray.end()) {
        --inputCounter;
        scrambledArray.push_back(randInput);
    }
}
The first comment states my problem. It will not come up under normal circumstances, but what if this were applied at a larger scale? The code works, but it becomes highly inefficient if the user wants to scramble, say, 10000 or so results.
I'm not talking about efficiency of the code in the sense of shortening some sequences and compacting it to make it a bit prettier. I was more or less teaching someone, and upon getting to this point I came to the conclusion that this could be done in a much better manner; I just don't know which way that could be...
So you want just the numbers 1..N shuffled? Yes, there is a more efficient way of doing that. You can use std::iota to construct your vector:
// first, construct your vector:
std::vector<int> scrambled(N);
std::iota(scrambled.begin(), scrambled.end(), 1);
And then std::shuffle it:
std::shuffle(scrambled.begin(), scrambled.end(),
             std::mt19937{std::random_device{}()});
If you don't have C++11, the above would look like:
std::vector<int> scrambled;
scrambled.reserve(N);
for (int i = 1; i <= N; ++i) {
    scrambled.push_back(i);
}
std::random_shuffle(scrambled.begin(), scrambled.end());

Is there any way of optimising this function?

This piece of code seems to be the worst offender in terms of time in my program. What my program is trying to do is find the minimum number of individual "nodes" required to satisfy a network with two constraints:
Each node must connect to x number of other nodes
Each node must have y degrees of separation between it and each of the nodes it's connected to.
However, for values of x greater than 600 this task takes a very long time. The task is on the order of exponential anyway, so I expect it to take forever at some point, but that also means that any small change made here would speed up the entire program by a lot.
uniint = unsigned long long int (64-bit)
network is a vector of the form vector<vector<uniint>>
The piece of code:
/* Checks if id2 is in id1's list of connections */
inline bool CheckIfInList (uniint id1, uniint id2)
{
    uniint id1size = network[id1].size();
    for (uniint itr = 0; itr < id1size; ++itr)
    {
        if (network[id1][itr] == id2)
        {
            return true;
        }
    }
    return false;
}
The only way is to sort the network[id1] array when you build it.
If you arrive here with a sorted array, you can easily find what you are looking for, if it exists, using a dichotomic (binary) search.
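As a sketch of that, assuming every network[id] vector is kept sorted while the graph is built (uniint and network are the question's own definitions):
#include <algorithm>
#include <vector>

/* Same check as above, but O(log n) on a sorted neighbour list */
inline bool CheckIfInList (uniint id1, uniint id2)
{
    const std::vector<uniint>& neighbours = network[id1];
    return std::binary_search(neighbours.begin(), neighbours.end(), id2);
}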
Use std::map or std::unordered_map for fast search. I guess it's impossible to micro-optimize this code; std::vector is fine, but not for searching through 600 elements.
I'm guessing CheckIfInList() is called in a loop? Perhaps a vector is not the best choice; you could try vector<set<uniint>>. This will give you O(log n) for a lookup in the inner collection instead of O(n).
For a quick micro-optimization, check whether your compiler optimizes the repeated calls to network[id1] away. If not, that is where you lose a lot of time, so cache the reference:
vector<uniint>& connectedNodes = network[id1];
uniint id1size = connectedNodes.size();
for (uniint itr = 0; itr < id1size; ++itr)
{
    if (connectedNodes[itr] == id2)
    {
        return true;
    }
}
return false;
If your compiler already took care of that, I'm afraid that there's not much you can micro optimize about this method. The only real optimization can be achieved on the algorithmic level, starting with sorting the neighbour lists, moving on to using unordered_map<> instead of vector<>, and ending with asking yourself whether you can't somehow reduce the number of calls to CheckIfInList().
This is not as effective as HAL9000's suggestion, and it is meant for the case where you have an unsorted list/array. What you can do is ask one fewer question per iteration by putting the value you are looking for at the end of the vector as a sentinel:
uniint id1size = network[id1].size();
network[id1].push_back(id2);        // append id2 as a sentinel
uniint itr = 0;
while (network[id1][itr] != id2)    // guaranteed to stop at the sentinel
    ++itr;
network[id1].pop_back();            // remove the sentinel again
return itr != id1size;              // true only if id2 was found before the sentinel
This way you don't need to test on every iteration whether you have reached the end of the list; the final comparison against id1size tells you whether the match was a real element or just the sentinel.

How to keep only the last duplicate when iterating through rows

The following code iterates through many data rows, calculates a score per row, and then sorts the rows according to that score:
unsigned count = 0;
score_pair* scores = new score_pair[num_rows];
while ((row = data.next_row())) {
    float score = calc_score(data.next_feature());
    scores[count].score = score;
    scores[count].doc_id = row->docid;
    count++;
}
assert(count <= num_rows);
qsort(scores, count, sizeof(score_pair), score_cmp);
Unfortunately, there are many duplicate rows with the same docid but a different score. Now I'd like to keep only the last score for each docid. The docids are unsigned ints, but usually big (so no lookup array), and using a HashMap to look up the last count for a docid would probably be too slow (many millions of rows; it should only take seconds, not minutes...).
OK, I modified my code to use a std::map:
map<int, int> docid_lookup;
unsigned count = 0;
score_pair* scores = new score_pair[num_rows];
while ((row = data.next_row())) {
    float score = calc_score(data.next_feature());
    map<int, int>::iterator iter = docid_lookup.find(row->docid);
    if (iter != docid_lookup.end()) {
        scores[iter->second].score = score;
        scores[iter->second].doc_id = row->docid;
    } else {
        scores[count].score = score;
        scores[count].doc_id = row->docid;
        docid_lookup[row->docid] = count;
        count++;
    }
}
It works and the performance hit is not as bad as I expected - now it runs in a minute instead of 16 seconds, so it's about a factor of 3. Memory usage has also gone up from about 1 GB to 4 GB.
The first thing I'd try would be a map or unordered_map: I'd be surprised if performance is a factor of 60 slower than what you did without any unique-ification. If the performance there isn't acceptable, another option is something like this:
// get the computed data into a vector
std::vector<score_pair> scores;
scores.reserve(num_rows);
while ((row = data.next_row())) {
    float score = calc_score(data.next_feature());
    scores.push_back(score_pair(score, row->docid));
}
assert(scores.size() <= num_rows);

// remove duplicate doc_ids
std::reverse(scores.begin(), scores.end());
std::stable_sort(scores.begin(), scores.end(), docid_cmp);
scores.erase(
    std::unique(scores.begin(), scores.end(), docid_eq),
    scores.end()
);

// order by score
std::sort(scores.begin(), scores.end(), score_cmp);
Note that the use of reverse and stable_sort is because you want the last score for each doc_id, but std::unique keeps the first. If you wanted the first score you could just use stable_sort, and if you didn't care what score, you could just use sort.
The best way of handling this is probably to pass reverse iterators into std::unique, rather than a separate reverse operation. But I'm not confident I can write that correctly without testing, and errors might be really confusing, so you get the unoptimised code...
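For what it's worth, here is a sketch of that reverse-iterator variant (untested, in the same spirit as the rest of this answer; docid_cmp, docid_eq and score_cmp are the comparators assumed above):
// stable_sort groups rows by doc_id while preserving row order; walking the
// range backwards with reverse iterators makes std::unique keep, for each
// doc_id, the element it meets first when going backwards -- i.e. the last row.
std::stable_sort(scores.begin(), scores.end(), docid_cmp);
auto kept = std::unique(scores.rbegin(), scores.rend(), docid_eq);
// unique packed the survivors at the back of the vector; drop everything before them
scores.erase(scores.begin(), kept.base());
std::sort(scores.begin(), scores.end(), score_cmp);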
Edit: just for comparison with your code, here's how I'd use the map:
std::map<int, float> scoremap;
while ((row = data.next_row())) {
    scoremap[row->docid] = calc_score(data.next_feature());
}
std::vector<score_pair> scores(scoremap.begin(), scoremap.end());
std::sort(scores.begin(), scores.end(), score_cmp);
Note that score_pair would need a constructor taking a std::pair<int,float>, which makes it non-POD. If that's not acceptable, use std::transform, with a function to do the conversion.
Finally, if there is much duplication (say, on average 2 or more entries per doc_id), and if calc_score is non-trivial, then I would be looking to see whether it's possible to iterate the rows of data in reverse order. If it is, then it will speed up the map/unordered_map approach, because when you get a hit for the doc_id you don't need to calculate the score for that row, just drop it and move on.
I'd go for a std::map of docids. If you could create an appropriate hashing function, a hash map would be preferable, but I guess that's too difficult. And no, the std::map is not too slow: access is O(log n), which is nearly as good as O(1) (O(1) is array access time, and hash map access, by the way).
By the way, if std::map is too slow, qsort at O(n log n) is too slow as well. And, by using a std::map and iterating over its contents, you can perhaps save your qsort.
Some additions in response to the comment (by onebyone):
I did not go into the implementation details, since there wasn't enough information on that.
qsort may behave badly with sorted data (depending on the implementation); std::map may not. This is a real advantage, especially if you read the values from a database that might output them ordered by key.
There was no word on the memory allocation strategy. Changing to a memory allocator with fast allocation of small objects may improve the performance.
Still, the fastest would be a hash map with an appropriate hash function. Since there's not enough information about the distribution of the keys, presenting one in this answer is not possible.
In short: if you ask general questions, you get general answers. This means, at least for me, looking at the time complexity in O-notation. Still, you were right that, depending on various factors, the std::map may be too slow while qsort is still fast enough; it may also be the other way round in the worst case of qsort, where it has O(n^2) complexity.
Unless I've misunderstood the question, the solution can be simplified considerably. At least as I understand it, you have a few million docids (which are of type unsigned int) and for each unique docid you want to store one 'score' (which is a float). If the same docid occurs more than once in the input, you want to keep the score from the last one. If that's correct, the code can be reduced to this:
std::map<unsigned, float> scores;
while ((row = data.next_row()))
    scores[row->docid] = calc_score(data.next_feature());
This will probably be somewhat slower than your original version, since it allocates a lot of individual blocks rather than one big block of memory. Given your statement that there are a lot of duplicates in the docids, I'd expect this to save quite a bit of memory, since it only stores data for each unique docid rather than for every row in the original data.
If you wanted to optimize this, you could almost certainly do so -- since it uses a lot of small blocks, a custom allocator designed for that purpose would probably help quite a bit. One possibility would be to take a look at the small-block allocator in Andrei Alexandrescu's Loki library. He's done more work on the problem since, but the one in Loki is probably sufficient for the task at hand -- it'll almost certainly save a fair amount of memory and run faster as well.
If your C++ implementation has it, and most do, try hash_map instead of std::map (it's sometimes available under std::hash_map).
If the lookups themselves are your computational bottleneck, this could be a significant speedup over std::map's binary tree.
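In post-C++11 code the standard equivalent is std::unordered_map; as a sketch under the same assumptions as the map snippet above (row, data and calc_score come from the question):
#include <unordered_map>

std::unordered_map<unsigned, float> scores;               // average O(1) per lookup/insert
while ((row = data.next_row()))
    scores[row->docid] = calc_score(data.next_feature()); // the last score per docid wins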
Why not sort by doc id first, calculate scores, then for any subset of duplicates use the max score?
On re-reading the question; I'd suggest a modification to how scores are read in. Keep in mind C++ isn't my native language, so this won't quite be compilable.
unsigned count = 0;
pair<int, score_pair>* scores = new pair<int, score_pair>[num_rows];
while ((row = data.next_row())) {
    float score = calc_score(data.next_feature());
    scores[count].second.score = score;
    scores[count].second.doc_id = row->docid;
    scores[count].first = count;
    count++;
}
assert(count <= num_rows);
qsort(scores, count, sizeof(pair<int, score_pair>), pair_docid_cmp);

// getting the number of unique docids
int scoreCount = (count > 0) ? 1 : 0;
for (unsigned i = 1; i < count; i++)
    if (scores[i - 1].second.doc_id != scores[i].second.doc_id) scoreCount++;

score_pair* actualScores = new score_pair[scoreCount];
int at = -1;
int lastId = -1;
for (unsigned i = 0; i < count; i++)
{
    // first entry for a new doc id; pair_docid_cmp put the last-read row first within each group
    if (lastId != scores[i].second.doc_id) {
        actualScores[++at] = scores[i].second;
        lastId = scores[i].second.doc_id;
    }
}
qsort(actualScores, scoreCount, sizeof(score_pair), score_cmp);
Here pair_docid_cmp would compare first on docid, grouping the same docs together, and then by reverse read order, so that the last item read comes first in the sublist of items with the same docid. This should only be about 5/2x the memory usage and roughly double the execution speed.

Find the lowest unused number

I've set up a std::map to map some numbers; at this point I know what numbers I'm mapping from and to, e.g.:
std::map<int, int> myMap;
myMap[1] = 2;
myMap[2] = 4;
myMap[3] = 6;
Later, however, I want to map some numbers to the lowest possible number that is not already used as a value in the map, e.g.:
myMap[4] = getLowestFreeNumberToMapTo(myMap); // I'd like this to return 1
myMap[5] = getLowestFreeNumberToMapTo(myMap); // I'd like this to return 3
Any easy way of doing this?
I considered building an ordered list of numbers as I added them to the map so I could just look for 1, not find it, use it, add it etc.
Something like
typedef std::set<int> SetType;

SetType used;        // The already used numbers
int freeCounter = 1; // The first available free number

void AddToMap(int i)
{
    used.insert(i);
    // Actually add the value to the map
}

int GetNewNumber()
{
    SetType::iterator iter = used.lower_bound(freeCounter);
    while (iter != used.end() && *iter == freeCounter)
    {
        ++iter;
        ++freeCounter;
    }
    return freeCounter++;
}
If your map is quite big but sparse, this works in roughly O(log N), where N is the number of items in the map; in most cases you won't have to iterate through the set at all, or you'll only make a few steps.
Otherwise, if there are few gaps in the map, you would be better off keeping a set of the free items in the range [1..maxValueInTheMap].
Finding the lowest unused number is a very common operation in UNIX kernels, as every open/socket/etc. syscall is supposed to bind to the lowest unused FD number.
On Linux, the algorithm in fs/file.c#alloc_fd is:
keep track of next_fd, a low water mark -- it is not necessarily 100% accurate
whenever a FD is freed, next_fd = min(fd, next_fd)
to allocate a FD, start searching the bitmap starting from next_fd -- lib/find_next_bit.c#find_next_zero_bit is linear but still very fast, because it takes BITS_PER_LONG strides at a time
after a FD is allocated, next_fd = fd + 1
FreeBSD's sys/kern/kern_descrip.c#fdalloc follows the same idea: start with int fd_freefile; /* approx. next free file */, and search the bitmap upwards.
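For illustration, a small user-space sketch of the same low-water-mark plus bitmap idea (all names here are hypothetical, not taken from either kernel):
#include <cstddef>
#include <vector>

struct IdAllocator {
    std::vector<bool> used;    // used[i] == true when number i is taken
    std::size_t next_free = 0; // low-water mark: never above the real first free slot

    std::size_t allocate() {
        if (next_free >= used.size())
            used.resize(next_free + 1, false);
        // linear scan starting from the hint; usually only a few steps
        while (next_free < used.size() && used[next_free])
            ++next_free;
        if (next_free == used.size())
            used.push_back(false);
        used[next_free] = true;
        return next_free++;
    }

    void release(std::size_t n) {
        used[n] = false;
        if (n < next_free)
            next_free = n;     // next_free = min(n, next_free), as in the kernel
    }
};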
However, these are all operating under the assumption that most processes have few FDs open, and very, very few have thousands. If the numbers will go much higher, with sparse holes, the common solution (as far as I've seen) is:
#include <algorithm>
#include <functional>
#include <vector>
using namespace std;

int high_water_mark = 0;
vector<int> unused_numbers;

int get_new_number() {
    if (unused_numbers.empty())
        return high_water_mark++;
    // greater<int> makes this a min-heap: pop_heap moves the smallest element to the back
    pop_heap(unused_numbers.begin(), unused_numbers.end(), greater<int>());
    int n = unused_numbers.back();
    unused_numbers.pop_back();
    return n;
}

void recycle_number(int number) {
    unused_numbers.push_back(number);
    push_heap(unused_numbers.begin(), unused_numbers.end(), greater<int>());
}
(untested code... idea is: keep a high water mark; try to steal from unused below the high water mark, or up the high water mark otherwise; return freed to unused)
and if your assumption is that the used numbers will be sparse, then Dmitry's solution makes more sense.
I'd use a bidirectional map class for this problem. That way you can simply check if value 1 exists etc.
Edit
The benefit of using a bimap is that robust implementations of it already exist, and even though searching for the next free number is O(n), that is only an issue if n is large (or possibly if n is moderate and this is called very frequently). Overall this makes for a simple implementation that is unlikely to be error-prone and is easily maintainable.
If n is large, or this operation is performed very frequently, then investing the effort of implementing a more advanced solution is merited. IMHO.