Searching using unordered map vs array - c++

I have a block allocation function that takes an array, searches it for 0 values (which indicate free space), and then allocates blocks to the available free space. I am trying to use an unordered map to speed up the search for 0s. In my function, all the elements in the array are inserted into the unordered map. Does implementing an unordered map as shown below actually improve the search speed compared to just using the array?
int arr[] = {15, 11, 0, 0, 0, 27, 0, 0}; // example array
int n = sizeof(arr)/sizeof(arr[0]);
unordered_map<int, int> hash;
for(i=n;i>=0;i--)
{
hash[i+1] = arr[i];
}
for(auto v : hash)
{
if(v.second==0)
{
return v.second;
}
}
int arr[] = {15, 11, 0, 0, 0, 27, 0, 0};
int n = sizeof(arr)/sizeof(arr[0]);
for(i=0;i<n;i++)
{
if(arr[i]==0)
{
return arr[i];
}
}

First, note that both functions as you've written them always return zero, which is not what you want.
Now, to answer the main question: No, this approach doesn't help. In both cases you're just iterating over the values in the array until you hit on one that's a zero. This is an O(n) operation in the worst case, and introducing an unordered_map here is just slowing things down.
The most similar thing you could do here that would actually help would be something like
std::unordered_map<int, std::vector<int>> lookup;
for(int i = 0; i < n; i++)
{
lookup[arr[i]].push_back(i);
}
Now if you want to find a block with a zero in it, you just take an element from lookup[0].
However, given that we only need to track the blocks with zeroes in them, and not immediately look up the blocks with, say, a 13 in them, we may as well just do:
std::vector<int> emptyBlocks;
for(int i = 0; i < n; i++)
{
if(arr[i] == 0) { emptyBlocks.push_back(i); }
}
and then we can just grab empty blocks as we need them.
Note that you should take blocks from the back of emptyBlocks so that deleting them from the list doesn't require us to shift everything over. If you need to take the smallest indices first for some reason, traverse arr backwards when building the list of empty blocks.
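As a rough sketch of that idea (the helper names collectFreeBlocks and takeFreeBlock are made up for illustration, and returning -1 for "no free block" is an assumption), the list can be built by walking the array backwards so that popping from the back hands out the smallest index first:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: collect the indices of all zero entries, walking
// the array backwards so the smallest index ends up at the back.
std::vector<int> collectFreeBlocks(const std::vector<int>& blocks) {
    std::vector<int> emptyBlocks;
    for (std::size_t i = blocks.size(); i-- > 0; ) {
        if (blocks[i] == 0) { emptyBlocks.push_back(static_cast<int>(i)); }
    }
    return emptyBlocks;
}

// Hypothetical helper: hand out one free block index, or -1 if none remain.
int takeFreeBlock(std::vector<int>& emptyBlocks) {
    if (emptyBlocks.empty()) { return -1; }
    const int index = emptyBlocks.back();
    emptyBlocks.pop_back();  // O(1): nothing has to be shifted
    return index;
}
```

The scan is paid once up front; each later allocation is a constant-time pop rather than another search through the array.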
That said, when you're allocating blocks typically you're trying to find a range of consecutive empty blocks. If that's the case, what you likely want is a way to look up the starting point of blocks of a given size. And you probably want it to be ordered, too, so that you can ask for "the smallest block at least this large."
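One possible shape for such an ordered lookup is a std::multimap keyed by run length. This is only a sketch: indexFreeRuns and findBestFit are hypothetical names, -1 as the "nothing fits" result is an assumption, and a real allocator would also have to update the map as blocks are taken and released.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: index every maximal run of zeros by its length,
// mapping run length -> starting index of the run.
std::multimap<std::size_t, std::size_t> indexFreeRuns(const std::vector<int>& blocks) {
    std::multimap<std::size_t, std::size_t> runs;
    std::size_t i = 0;
    while (i < blocks.size()) {
        if (blocks[i] != 0) { ++i; continue; }
        const std::size_t start = i;
        while (i < blocks.size() && blocks[i] == 0) { ++i; }
        runs.emplace(i - start, start);
    }
    return runs;
}

// Hypothetical helper: start index of the smallest free run holding at
// least `wanted` blocks, or -1 if no run is large enough.
long findBestFit(const std::multimap<std::size_t, std::size_t>& runs, std::size_t wanted) {
    const auto it = runs.lower_bound(wanted);  // first run with length >= wanted
    if (it == runs.end()) { return -1; }
    return static_cast<long>(it->second);
}
```

Because the map is ordered by length, lower_bound answers "the smallest block at least this large" in O(log n).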

Related

Fastest way to determine if a uint64 has been "seen" already

I've been interested in optimizing "renumbering" algorithms that can relabel an arbitrary array of integers with duplicates into labels starting from 1. Sets and maps are too slow for what I've been trying to do, as are sorts. Is there a data structure that only remembers if a number has been seen or not reliably? I was considering experimenting with a bloom filter, but I have >12M integers and the target performance is faster than a good hashmap. Is this possible?
Here's a simple example pseudo-c++ algorithm that would be slow:
// note: all elements guaranteed > 0
std::vector<uint64_t> arr = { 21942198, 91292, 21942198, ... millions more };
std::unordered_map<uint64_t, uint64_t> renumber;
renumber.reserve(arr.size());
uint64_t next_label = 1;
for (uint64_t i = 0; i < arr.size(); i++) {
uint64_t elem = arr[i];
if (renumber[elem]) {
arr[i] = renumber[elem];
}
else {
renumber[elem] = next_label;
arr[i] = next_label;
++next_label;
}
}
Example input/output:
{ 12, 38, 1201, 120, 12, 39, 320, 1, 1 }
->
{ 1, 2, 3, 4, 1, 5, 6, 7, 7 }
Your algorithm is not bad, but the appropriate data structure to use for the map is a hash table with open addressing.
As explained in this answer, std::unordered_map can't be implemented that way: https://stackoverflow.com/a/31113618/5483526
So if the STL container is really too slow for you, then you can do better by making your own.
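To give an idea of what "making your own" might look like, here is a minimal linear-probing sketch. It is not a tuned implementation: the function name renumberOpenAddressing, the hash constant, and the sizing policy are all assumptions, and it relies on the question's guarantee that every element is > 0 so that 0 can mark an empty slot.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: renumber in place using an open-addressing table.
// Assumes all elements are > 0, so key 0 means "empty slot".
void renumberOpenAddressing(std::vector<std::uint64_t>& arr) {
    // Power-of-two capacity of at least 2x the input keeps the load factor <= 0.5.
    std::size_t capacity = 16;
    while (capacity < arr.size() * 2) { capacity *= 2; }
    const std::size_t mask = capacity - 1;

    std::vector<std::uint64_t> keys(capacity, 0);
    std::vector<std::uint64_t> labels(capacity, 0);
    std::uint64_t next_label = 1;

    for (std::uint64_t& elem : arr) {
        // Multiplicative hash with an arbitrary odd constant, then linear probing.
        std::size_t slot = static_cast<std::size_t>((elem * 0x9E3779B97F4A7C15ull) >> 32) & mask;
        while (keys[slot] != 0 && keys[slot] != elem) {
            slot = (slot + 1) & mask;
        }
        if (keys[slot] == 0) {  // first time this value is seen
            keys[slot] = elem;
            labels[slot] = next_label++;
        }
        elem = labels[slot];
    }
}
```

Keys and labels live in flat arrays, so a lookup touches contiguous memory instead of chasing per-node pointers. Whether that beats std::unordered_map on your data is something to measure rather than assume.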
Note, however, that:
90% of the time, when someone complains about STL containers being too slow, they are running a debug build with optimizations turned off. Make sure you are running a release build compiled with optimizations on. Running your code on 12M integers should take a few milliseconds at most.
You are accessing the map multiple times when only once is required, like this:
uint64_t next_label = 1;
for (size_t i = 0; i < arr.size(); i++) {
uint64_t elem = arr[i];
uint64_t &label = renumber[elem];
if (!label) {
label = next_label++;
}
arr[i] = label;
}
Note that the unordered_map operator [] returns a reference to the associated value (creating it if it doesn't exist), so you can test and modify the value without having to search the map again.
Updated with bug fix
First, anytime you experience "slowness" with a std:: collection class like vector or map, just recompile with optimizations (release build). There is usually a 10x speedup.
Now to your problem. I'll show a two-pass solution that runs in O(N) time. I'll leave it as an exercise for you to convert to a one-pass solution. But I'll assert that this should be fast enough, even for vectors with millions of items.
First, declare not one, but two unordered maps:
std::unordered_map<uint64_t, uint64_t> element_to_label;
std::unordered_map<uint64_t, std::pair<uint64_t, std::vector<uint64_t>>> label_to_elements;
The first map, element_to_label, maps an integer value found in the original array to its unique label.
The second map, label_to_elements, maps each label to both the element value and the list of indices at which that element occurs in the original array.
Now to build these maps:
element_to_label.reserve(arr.size());
label_to_elements.reserve(arr.size());
uint64_t next_label = 1;
for (size_t index = 0; index < arr.size(); index++)
{
const uint64_t elem = arr[index];
auto itor = element_to_label.find(elem);
if (itor == element_to_label.end())
{
// new element
element_to_label[elem] = next_label;
auto &p = label_to_elements[next_label];
p.first = elem;
p.second.push_back(index);
next_label++;
}
else
{
// existing element
uint64_t label = itor->second;
label_to_elements[label].second.push_back(index);
}
}
When the above code has run, it has built up a database of all values in the array, their labels, and the indices where they occur.
So now to renumber the array such that all elements are replaced with their smaller label value:
for (auto itor = label_to_elements.begin(); itor != label_to_elements.end(); itor++)
{
uint64_t label = itor->first;
auto& p = itor->second;
uint64_t elem = p.first; // technically, this isn't needed. It's just useful to know which element value we are replacing from the original array
const auto& vec = p.second;
for (size_t j = 0; j < vec.size(); j++)
{
size_t index = vec[j];
arr[index] = label;
}
}
Notice where I assign variables by reference with the & operator to avoid making an expensive copy of any value in the maps.
So if your original vector or array was this:
{ 100001, 2000002, 300003, 400004, 400004, 300003, 2000002, 100001 };
Then the application of labels would render the array as this:
{1,2,3,4,4,3,2,1}
What's nice is that you still have a quick O(1) lookup that maps any label in that set back to its original element value, using label_to_elements.

How do you: fill & sort the newly-filled vector?

I want to fill a vector with random elements that appear 2 or more times besides one, then sort the said vector.
To try and explain what I meant by this question, I am going to leave you with an example of this type of vector:
vector<int> myVec = {1, 1, 4, 4, 8, 8, 11, 13, 13}
1. Fill it with random elements (1, 4, 8, 11, 13 for example, which seem pretty random)
2. Make every element besides one appear two times (note how there's only a single occurrence of 11)
3. Sort it from the smallest number to the biggest
I've already managed to do step 3 in this way:
sort(myVec.begin(), myVec.end());
for(int i = 0; i < 9; ++i) {
printf("%d", myVec[i]);
}
How would you do step 1 & 2? Some sort of myVec.insert or myVec.push_back trickery that I can't think of or is there a completely different way?
I was originally thinking about myVec.push_back & two for loops (int i = 0; i < nr of elements; ++i) and another loop inside of that (int k = 0; k <= i; ++k) but I must've messed something up (I think that way I would've been able to have the duplicate part done, not sure).
Take an empty vector.
Fill it (using push_back) with random numbers from a random number generator.
Then loop over the existing elements and push_back a copy of each one except the last.
Now you can sort it.
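Those steps could be sketched roughly as follows. The function name fillDuplicateSort and its parameters are invented for illustration. Note that nothing here checks the random values for uniqueness, so a value may occasionally show up more than twice.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper following the steps above: `count` random values in
// [lo, hi], every one except the last duplicated, then sorted.
std::vector<int> fillDuplicateSort(std::size_t count, int lo, int hi) {
    std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<int> dist{lo, hi};

    std::vector<int> v;
    for (std::size_t i = 0; i < count; ++i) { v.push_back(dist(engine)); }

    // Copy every element except the last. Index up to the original size,
    // because push_back may reallocate and invalidate iterators.
    const std::size_t original = v.size();
    for (std::size_t i = 0; i + 1 < original; ++i) {
        const int copy = v[i];
        v.push_back(copy);
    }

    std::sort(v.begin(), v.end());
    return v;
}
```

For example, fillDuplicateSort(5, 1, 50) produces a sorted vector of 9 elements.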
Since you want to generate the values first, we can be a bit more efficient and use insertion-sort instead of sorting at the end.
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>
int main() {
    // Constant to make the code flexible. Doesn't need to be constexpr.
    constexpr int num_values = 10;
    // First, create the source of randomness.
    std::random_device rand_device;
    // Then, build an engine for generating the random values.
    std::mt19937 mersenne_engine{rand_device()};
    // Finally, specify the distribution of values to generate.
    std::uniform_int_distribution<int> value_dist{1, 50};
    // Now we're finally ready to fill the vector!
    std::vector<int> myVec;
    // Reserve the space required for all of the values.
    const std::size_t capacity = (num_values * 2) - 1;
    // NOTE: Actual capacity not guaranteed to be equal, might be greater.
    myVec.reserve(capacity);
    // Pick the random unique value to place into the vector.
    myVec.push_back(value_dist(mersenne_engine));
    // Loop until enough values are generated.
    while (myVec.size() < capacity) {
        // Choose a random value.
        const int value = value_dist(mersenne_engine);
        // Find the insertion position of the new value.
        const auto it = std::lower_bound(myVec.begin(), myVec.end(), value);
        // Make sure the value doesn't exist yet.
        if (it == myVec.end() || *it != value) {
            // Then insert it twice in one call; a second insert(it, value)
            // would use an iterator invalidated by the first.
            myVec.insert(it, 2, value);
        }
    }
}
Note that this strategy will loop forever if the distribution's range contains fewer distinct values than num_values. Hopefully, the code is clear enough for you to make changes to handle that situation.
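One way to sidestep the endless-loop case is to shuffle the whole candidate range and take the first few entries. This is a different technique from the insertion approach above, not a fix to it. The name makePairedVector and the choice to throw std::invalid_argument are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical alternative: draw distinct values by shuffling the whole
// range [lo, hi], which always terminates.
std::vector<int> makePairedVector(int num_values, int lo, int hi) {
    if (num_values < 1 || hi - lo + 1 < num_values) {
        throw std::invalid_argument("range has too few distinct values");
    }
    std::vector<int> pool(hi - lo + 1);
    std::iota(pool.begin(), pool.end(), lo);  // lo, lo+1, ..., hi
    std::mt19937 engine{std::random_device{}()};
    std::shuffle(pool.begin(), pool.end(), engine);

    std::vector<int> result;
    result.reserve(num_values * 2 - 1);
    result.push_back(pool[0]);  // the single unpaired value
    for (int i = 1; i < num_values; ++i) {
        result.push_back(pool[i]);
        result.push_back(pool[i]);
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

The trade-off is memory proportional to the size of the range, so this suits small ranges like 1 to 50 better than, say, the full range of int.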

Manipulating array's values in a certain way

So I was asked to write a function that changes array's values in a way that:
All of the values that are the smallest aren't changed
if, let's assume, the smallest number is 2 and there is no 3's and 4's then all 5's are changed for 3's etc.
for example, for an array = [2, 5, 7, 5] we would get [2, 3, 4, 3], which generalizes to getting a minimal value of an array which remains unchanged, and every other minimum (not including the first one) is changed depending on which minimum it is. On our example - 5 is the first minimum (besides 2), so it is 2 (first minimum) + 1 = 3, 7 is 2nd smallest after 2, so it is 2+2(as it is 2nd smallest).
I've come up with something like this:
int fillGaps(int arr[], size_t sz){
int min = *min_element(arr, arr+sz);
int w = 1;
for (int i = 0; i<sz; i++){
if (arr[i] == min) {continue;}
else{
int mini = *min_element(arr+i, arr+sz);
for (int j = 0; j<sz; j++){
if (arr[j] == mini){arr[j] = min+w;}
}
w++;}
}
return arr[sz-1];
}
However, it works fine only for the 0th and 1st values; it doesn't affect any further items. Could anyone please help me with that?
I don't quite follow the logic of your function, so can't quite comment on that.
Here's how I interpret what needs to be done. Note that my example implementation is written to be as understandable as possible. There might be ways to make it faster.
Note that I'm also using an std::vector, to make things more readable and C++-like. You really shouldn't be passing raw pointers and sizes, that's super error prone. At the very least bundle them in a struct.
#include <algorithm>
#include <set>
#include <unordered_map>
#include <vector>
int fillGaps (std::vector<int> & data) {
    // Make sure we don't have to worry about edge cases in the code below.
    if (data.empty()) { return 0; }
    /* The minimum number of times we need to loop over the data is two.
     * First to check which values are in there, which lets us decide
     * what each original value should be replaced with. Second to do the
     * actual replacing.
     *
     * So let's trade some memory for speed and start by creating a lookup table.
     * Each entry will map an existing value to its new value. Let's use the
     * "define lambda and immediately invoke it" to make the scope of variables
     * used to calculate all this as small as possible.
     */
    auto const valueMapping = [&data] {
        // Use an std::set so we get all unique values in sorted order.
        std::set<int> values;
        for (int e : data) { values.insert(e); }
        std::unordered_map<int, int> result;
        result.reserve(values.size());
        // Map minimum value to itself, and increase replacement value by one for
        // each subsequent value present in the data vector.
        int replacement = *values.begin();
        for (auto e : values) { result.emplace(e, replacement++); }
        return result;
    }();
    // Now the actual algorithm is trivial: loop over the data and replace each
    // element with its replacement value.
    for (auto & e : data) { e = valueMapping.at(e); }
    return data.back();
}

Large vector "Segmentation fault" error

I have gathered a large amount of extremely useful information from other peoples' questions and answers on SO, and have searched duly for an answer to this one as well. Unfortunately I have not found a solution to this problem.
The following function to generate a list of primes:
void genPrimes (std::vector<int>* primesPtr, int upperBound = 10)
{
std::ofstream log;
log.open("log.txt");
std::vector<int>& primesRef = *primesPtr;
// Populate primes with non-neg reals
for (int i = 2; i <= upperBound; i++)
primesRef.push_back(i);
log << "Generated reals successfully." << std::endl;
log << primesRef.size() << std::endl;
// Eratosthenes sieve to remove non-primes
for (int i = 0; i < primesRef.size(); i++) {
if (primesRef[i] == 0) continue;
int jumpStart = primesRef[i];
for (int jump = jumpStart; jump < primesRef.size(); jump += jumpStart) {
if (primesRef[i+jump] == 0) continue;
primesRef[i+jump] = 0;
}
}
log << "Executed Eratosthenes Sieve successfully.\n";
for (int i = 0; i < primesRef.size(); i++) {
if (primesRef[i] == 0) {
primesRef.erase(primesRef.begin() + i);
i--;
}
}
log << "Cleaned list.\n";
log.close();
}
is called by:
const int SIZE = 500;
std::vector<int>* primes = new std::vector<int>[SIZE];
genPrimes(primes, SIZE);
This code works well. However, when I change the value of SIZE to a larger number (say, 500000), the compiler returns a "segmentation error." I'm not familiar enough with vectors to understand the problem. Any help is much appreciated.
You are accessing primesRef[i + jump] where i could be primesRef.size() - 1 and jump could be primesRef.size() - 1, leading to an out of bounds access.
It is happening with a 500 limit, it is just that you happen to not have any bad side effects from the out of bound access at the moment.
Also note that using a vector here is a bad choice as every erase will have to move all of the following entries in memory.
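A sketch of one way to avoid both problems: index a flag array by the number itself, so that no i + jump arithmetic is needed, and collect the primes at the end instead of erasing. This is an illustrative rewrite with a different signature from the question's function, not a drop-in patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rewrite: mark composites in a flag array indexed by the
// number itself, then collect the primes in one pass (no erase calls).
std::vector<int> genPrimes(int upperBound) {
    std::vector<int> primes;
    if (upperBound < 2) { return primes; }

    std::vector<bool> isComposite(static_cast<std::size_t>(upperBound) + 1, false);
    for (long long i = 2; i <= upperBound; ++i) {
        if (isComposite[i]) { continue; }
        primes.push_back(static_cast<int>(i));
        // Start at i*i: smaller multiples were already marked by smaller primes.
        for (long long j = i * i; j <= upperBound; j += i) {
            isComposite[j] = true;
        }
    }
    return primes;
}
```

Every index used is at most upperBound, which is inside the flag array by construction, and the result is returned by value so no new or raw pointer is needed.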
Are you sure you wanted to do
new std::vector<int> [500];
and not
new std::vector<int> (500);
In the latter case, you are specifying the size of a single vector, which you reach through the pointer named 'primes'.
In the former, you are requesting space for 500 vectors, each default-constructed (empty).
On my system that is 24*500 bytes for the vector objects alone. In the latter case, you are asking for just one vector of length 500.
EDIT: look at the usage - he needs just one vector.
std::vector<int>& primesRef = *primesPtr;
The problem lies here:
// Populate primes with non-neg reals
for (int i = 2; i <= upperBound; i++)
primesRef.push_back(i);
You only push back N-1 elements (the values 2 through N), so the valid indices run from 0 to N-2, but the sieve then accesses index i+jump, which can be N-1 or larger. The fact that it did not fail on 500 is just dumb luck that the memory being overwritten was not catastrophic.
This code works well. However, when I change the value of SIZE to a larger number (say, 500000), ...
That may blow your stack, as the allocation may be too big for it. You need dynamic memory allocation for all of the std::vector<int> instances you believe you need.
To achieve that, simply use a nested std::vector like this:
std::vector<std::vector<int>> primes(SIZE);
instead.
But to be straightforward, I seriously doubt you need SIZE vector instances to store all of the prime numbers found; you need just a single one, initialized like this:
std::vector<int> primes(SIZE);

How to delete element of an array of structs?

I have a program that has to choose between 10 bins of parts. Once the user has chosen a bin, the program asks whether to add or remove parts. My problem is that when I try to add to or remove from a single bin, it adds to or removes from all the bins.
struct Inventory
{
char description[35];
int num;
};
Inventory parts[Number_Bins] = {
{"Valve", 10},
{"Bearing", 5},
{"Bushing", 15},
{"Coupling", 21},
{"Flange", 7},
{"Gear", 5},
{"Gear Housing", 5},
{"Vacuum Gripper", 25},
{"Cable", 18},
{"Rod", 12}
};
This is my function to remove parts. I have another to add parts and it is similar. I could create like 10 of this for each element of the array but that is not the point.
void RemoveParts(Inventory bins[])
{
int e = 10;
int enter2;
cout << "Enter how many you want to remove\n";
cin >> enter2;
if (enter2 < 0)
{
cout << "Negative Values are not legal. Try again\n";
}
else
{
for (int index = 0; index < e; index++)
{
bins[index].num = bins[index].num - enter2;
}
}
}
I use a switch menu to pick up any bins. So there are 10 cases. Is there any way I can make it easier and write less code?
That's because you are looping through all the bins in here:
for (int index=0; index<e; index++){
bins[index].num = bins[index].num - enter2;
}
If you want to remove from a certain bin you have to tell your program which one.
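For instance, the function could take the chosen bin's index as a parameter, which also removes the need for one switch case per bin. The sketch below is illustrative only: the name removeParts, the extra parameters, and the decision to reject removing more parts than the bin holds are assumptions, and reading the amount from cin is left to the caller.

```cpp
#include <cstddef>

struct Inventory {
    char description[35];
    int num;
};

// Hypothetical variant of RemoveParts: the caller passes the chosen bin's
// index and the amount, so only that bin changes. Returns false (leaving
// the bins untouched) when the request is invalid.
bool removeParts(Inventory bins[], std::size_t binCount, std::size_t binIndex, int amount) {
    if (binIndex >= binCount) { return false; }  // no such bin
    if (amount < 0 || amount > bins[binIndex].num) { return false; }  // illegal amount
    bins[binIndex].num -= amount;
    return true;
}
```

A menu that reads a bin number from 1 to 10 can then call removeParts(parts, Number_Bins, choice - 1, amount) directly, and an add function can follow the same pattern.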
From what I understand, you want to remove an entire bin...
To do this you have to remove the element from the array.
I suggest using std::vector<Inventory> parts(Number_Bins) or std::list<Inventory> parts(Number_Bins) instead of a raw array (std::array<Inventory, 10> is also safer than a raw array, but its size is fixed, so it cannot drop elements).
Removing an element is then just a call to the container's erase member.
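With the std::vector option, removing a bin through the erase member could look like the sketch below. The Part struct is a hypothetical stand-in for Inventory that uses std::string, and removeBin is an invented helper name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Part {  // hypothetical stand-in for Inventory
    std::string description;
    int num;
};

// Remove the bin at `index`; returns false if the index is out of range.
bool removeBin(std::vector<Part>& parts, std::size_t index) {
    if (index >= parts.size()) { return false; }
    parts.erase(parts.begin() + static_cast<std::ptrdiff_t>(index));
    return true;
}
```

The vector shifts the remaining elements and keeps track of its own size, so none of the manual bookkeeping is needed.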
To remove from your array specifically you have to shift the entire array.
// where enter2 is the 1-based number of the element we want to erase
for (int index = enter2 - 1; index < total_size - 1; index++) {
    bins[index] = bins[index + 1];
}
// then reinitialize the last element (value-initialization zeroes both members)
bins[total_size - 1] = Inventory{};
I don't recommend this road at all, it makes everything harder, and that feels like an understatement. This is why:
Now you have to check that setting the Inventory item to 0 successfully sets both variables to 0
Now you have to keep track of the number of Initialized elements in the array
You also need to track the size of the array
This is not idiomatic C++: there is no encapsulation, no OOP, and it will likely force you to introduce global variables because you'll have to keep track of all this state.
Writing non-scalable and non-maintainable code is bad for your present but, most importantly, for your future.