Getting multiple instances of a key using bsearch() - C++

Is there a way to use bsearch() to find multiple instances of a key?
For example: (obj*)bsearch(r, arr, elements, sizeof(obj), (int(*)(const void*, const void*))bcompare); where r is the key.
The code I currently have only finds the first instance it hits, and it cannot proceed past that instance because of how bsearch() works.
cin.getline(target, 81);
if (strcmp(target, "exit") == 0 || strcmp(target, "") == 0) break;
p = (Info*)bsearch(target, list, num, sizeof(Info), (int(*)(const void*, const void*))bcompare);
if (!p) {
    err_dis1_win();
    clrscr();
}
else {
    int foundIndex = (int)(p - list); // compute only after checking p is not null
    display_record(p);
    cout << "\n\n found at index " << foundIndex << "\n";
    getch();
    clrscr();
}
Variables:
p - pointer to an object of class Info
target - char array
list - array of Info objects
foundIndex - index of the element found
Info - a class derived from a base class
Compare function:
int bcompare(char *a,Info *b){
return(strncmpi(a, b -> get_name(), strlen(a)));
}
I cannot use other methods such as std::find or write my own binary search function; I have to use bsearch().
I have tried loops inside the else block, a compare function that uses the variable foundIndex, and a while loop over the return value that steps through the list array. Is there a way to start at a specific index? I appreciate any help. I am not looking for code, just a general push in the right direction. Thank you.
Caveat: the current code compiles and runs as expected, but I cannot figure out how to get the functionality I want on my own. Searching Google and Stack Overflow has not turned up a related issue.

Since bsearch() returns only one item, I interpret "find multiple instances of key" as "find the first instance of a key". The caller can then step forward through the array from that item to process each item matching the key, until it reaches the end or reaches an item that does not match.
If you must use the standard library's bsearch() function and persuade it to find the first item matching a given key, then all you really have to work with is the comparison function you present. bsearch() will return an item that matches the key according to that function, but if more than one item matches then there is no guarantee which one will be returned. You must ensure, then, that only the item you want matches.
You can approach that with an appropriate implementation of the comparison function, but there is a significant problem. The function will in some cases need to evaluate the item preceding the one specified to it, but it must not attempt to examine an item preceding the array's first. bsearch() does not itself convey any information about the array bounds to the comparison function.
There are at least two possible solutions, neither of them stellar.
Store the array lower bound in some well-known location that the function can access. For example, if the comparison function is a static member function, then maybe you would use a static variable of its class. But that is not thread-safe. You could do something similar with thread-local variables, but even then it's ugly. Either way, you have to be sure to set that variable appropriately before you call bsearch(), and that's ugly, too.
OR
Ensure that you never bsearch() for the first item. One way to do that is to check beforehand whether the first item matches (without using the comparison function) and, if it does, use it directly instead of calling bsearch(). I would wrap that in a function myself. If you are not allowed to, then relying on every caller to follow that discipline manually is also ugly.
Having chosen one of the above, you can implement a comparison function that looks at the previous item's key in addition to the specified item's. Something along these lines (which assumes the second alternative):
struct my_item {
int key;
void *data;
};
// bsearch() passes the target item as the first argument, and the one to compare
// to it as the second
int compare_items(const void *to_find, const void *to_check) {
const struct my_item *to_find_item = (const struct my_item *) to_find;
const struct my_item *to_check_item = (const struct my_item *) to_check;
// Check first how the key members are ordered
if (to_find_item->key < to_check_item->key) {
return -1;
} else if (to_find_item->key > to_check_item->key) {
return 1;
} else {
// The key members match, so check whether we're looking at the first
// such item.
const struct my_item *previous_item = to_check_item - 1;
// If the previous item's key does match, then we know the item we're
// looking for is an earlier one than we are presently checking.
return (previous_item->key == to_check_item->key) ? -1 : 0;
}
}


Declaring a std::list with an array index C++

I was following a hash table implementation online (https://www.youtube.com/watch?v=2_3fR-k-LzI) when I observed the video author initialize a std::list with an array index. This was very confusing to me as I was always under the impression that std::list was always meant to operate like a linked list and was not capable of supporting random indexing. However, I thought it was maybe a weird way to declare the size of a list and ignored it and moved on. Specifically, he did the following:
static const int hashGroups = 10;
std::list<std::pair<int, std::string>> table[hashGroups];
Upon trying to implement a function to search to see if a key resided in the hash table, I realized that I could not access the std::list objects as I would expect to be able to. In HashTable.cpp (which includes the header file that defines the two variables above) I was only able to access the table member variable's elements as a pointer with -> instead of with . as I would expect to be able to. It looks like what is directly causing this is using the array index in the list definition. This seems to change the type of the table variable from a std::list to a pointer to a std::list. I do not understand why this is the case. This also appears to break my current implementation of attempting to iterate through the table variable because when I declare an iterator to iterate through table's elements, I am able to see that the table has the correct data in the VS debugger but the iterator seems to have completely invalid data and does not iterate through the loop even once despite seeing table correctly have 10 elements. My attempt at the search function is pasted below:
std::string HashTable::searchTable(int key) {
for (std::list<std::pair<int, std::string>>::const_iterator it = table->begin(); it != table->end(); it++)
{
if (key == it->first) {
return it->second;
}
std::cout << "One iteration performed." << std::endl;
}
return "No value found for that key.";
}
With all of this being said, I have several burning questions:
Why are we even able to declare a list with brackets when a std::list does not support random access?
Why does declaring a list like this change the type of the list from std::list to a pointer?
What would be the correct way to iterate through table in its current implementation with an iterator?
Thank you for any help or insight provided!
After reading the responses from @IgorTandetnik, I realized that I was thinking about the list incorrectly. What I didn't fully understand was that we were declaring an array of lists, not initializing a single list like an array. Once I realized this, I was able to access the elements correctly, because I was no longer trying to iterate through an array with a list iterator. My revised searchTable function, which to my knowledge now works correctly, looks like this:
std::string HashTable::searchTable(int key) {
int hashedKey = hashFunction(key);
if (table[hashedKey].size() > 0)
{
for (std::list<std::pair<int, std::string>>::const_iterator it = table[hashedKey].begin(); it != table[hashedKey].end(); it++)
{
if (key == it->first) {
return it->second;
}
}
}
return "No value found for that key.";
}
And to answer my three previous questions...
1. Why are we even able to declare a list with brackets when a std::list does not support random access?
Response: We are declaring an array of std::list that contains a std::pair of int and std::string, not a list with the array index operator.
2. Why does declaring a list like this change the type of the list from std::list to a pointer?
Response: Because table is declared as an array whose elements are std::list objects. An array is not a pointer, but in most expressions it decays to a pointer to its first element, which is why table->begin() compiles and means table[0].begin(). So we are never "changing" the type of the list variable.
3. What would be the correct way to iterate through table in its current implementation with an iterator?
Response: The original implementation only iterates over the first list in table, because table->begin() is table[0].begin(). Instead, use the hashed key as the index into table and iterate over the std::list of std::pair entries stored at that index.

Alternative to find() for determining whether an unordered_set contains a key

Suppose I have an unordered_set<int> S and I want to check whether it contains a certain int x.
Is there a way for me to write something like if(S.contains(x)){ /* code */ } that works like if(S.find(x) != S.end()){ /* code */ } ?
It can be a macro or anything else; I just find it ugly and unnecessarily long to write a simple lookup like that.
Instead of using std::unordered_set's find() member function for determining whether a given key x is present as in:
if (S.find(x) != S.end()) { /* code */ }
you can simply use the count() member function:
if (S.count(x)) { /* code */ }
An std::unordered_set does not allow duplicates, so count() will return either 0 or 1.
The unordered_set::count() member function shouldn't be less efficient than unordered_set::find() since the traversal of the elements for finding out the count of the requested key can be stopped as soon as one is found because there can't be duplicates.
I think you need if(S.count(x)){//do something}.
According to cplusplus.com, the count function searches the container for elements with a value of k and returns the number of elements found. Because unordered_set containers do not allow for duplicate values, this means that the function actually returns 1 if an element with that value exists in the container, and zero otherwise.

How to search by member accessor value with std::find_if()?

I am learning C++ at the moment and have an example program that uses an array of objects as its data store. To make some other operations easier, I have changed the store to a vector. With this change I am not sure of the best way to search the store for an object based on a member accessor value.
Initially I used a simple loop:
vector<Composer> composers; // where Composer has a member function get_last_name() that returns a string
Composer& Database::get_composer(string last_name)
{
for (Composer& c : composers)
if (c.get_last_name().compare(last_name))
return c;
throw std::out_of_range("Composer not found");
}
This works just fine, of course, but to experiment I wanted to see whether there were vector-specific functions that could also do the job. So far I have settled on trying find_if() (if there is a better function, please suggest one).
However, I am not sure exactly the correct way to use find_if(). Based on code seen in online research I have replaced the above with the following:
vector<Composer> composers; // where Composer has a member function get_last_name() that returns a string
Composer& Database::get_composer(string last_name)
{
auto found = find_if(composers.begin(), composers.end(),
[last_name](Composer& c) -> bool {c.get_last_name().compare(last_name);});
if (found == composers.end())
throw out_of_range("Composer not found");
else
return *found;
}
This does not work. It does find a result, but the wrong one. If I pass an argument that matches, say, the third composer's last name, the function always returns the first item in the vector (if I pass an argument that doesn't match any last name, the function correctly throws an exception). What am I doing wrong?
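You are on the right track; your lambda needs a return statement. In that case you also do not have to specify its return type explicitly, because it can be deduced:
auto found = find_if(composers.begin(), composers.end(),
    [last_name](const Composer& c) { return c.get_last_name() == last_name; });
(Taking the parameter as const Composer& requires get_last_name() to be a const member function.)
Your original code should at least emit a warning: the lambda is declared to return bool but has no return statement, so flowing off its end is undefined behavior. Pay attention to such warnings.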
Note: it is not clear how your original loop worked if you tested it. The condition should be:
if (c.get_last_name().compare(last_name) == 0 )
or simply:
if (c.get_last_name() == last_name )
because std::string::compare() returns an int that is negative, zero, or positive, and zero means the strings are equal. As written, your code searches for a string that does not match last_name.
With range-v3, you may use projection:
auto it = ranges::find(composers, last_name, &Composer::get_last_name);

A pushBack() function, as the opposite of popFront()

Can I use popFront() and then eventually push back what was popped? popFront() might be called more than once (but not many times, say fewer than 10, if that matters). The imaginary pushBack() function would be called the same number of times.
For example:
string s = "Hello, World!";
int n = 5;
foreach(i; 0 .. n) {
// do something with s.front
s.popFront();
}
if(some_condition) {
foreach(i; 0 .. n) {
s.pushBack();
}
}
writeln(s); // should output "Hello, World!" since the number popped equals the number pushed back.
I think popFront() uses .ptr, but I'm not sure whether that makes any difference in D or can help me reach my goal easily (i.e., the D way, without writing my own circular buffer or similar).
A completely different approach is very welcome too.
A range is either generative (e.g. if it's a list of random numbers), or it's a view into a container. In neither case does it make sense to push anything onto it. As you call popFront, you're iterating through the list and shrinking your view of the container. If you think of a range being like two C++ iterators for a moment, and you have something like
struct IterRange(T)
{
@property bool empty() { return iter == end; }
@property T front() { return *iter; }
void popFront() { ++iter; }
private Iterator iter;
private Iterator end;
}
then it will be easier to understand. If you called popFront, it would move the iterator forward by one, thereby changing which element you're looking at, but you can't add elements in front of it. That would require something like an insertion on the container itself. The iterator or range might be used to tell the container where you want an element inserted, but the iterator or range can't do that itself. The same goes if you have a generative range like
struct IncRange(T)
{
@property bool empty() { return value == T.max; }
@property T front() { return value; }
void popFront() { ++value; }
private T value;
}
It keeps incrementing the value, and there is no container backing it. So, it doesn't even have anywhere that you could push a value onto.
Arrays are a little bit funny because they're ranges but they're also containers (sort of). They have range semantics when popping elements off of them or slicing them, but they don't own their own memory, and once you append to them, you can get a completely different chunk of memory with the same values. So, it is sort of a range that you can add and remove elements from - but you can't do it using the range API. So, you could do something like
str = newChar ~ str;
but that's not terribly efficient. You could make it more efficient by creating a new array at the target size and filling in its elements rather than concatenating repeatedly, but regardless, pushing something onto the front of an array is not a particularly idiomatic or efficient thing to do.
Now, if what you're looking to do is just reset the range so that it once again refers to the elements that were popped off rather than really push elements onto it - that is, open up the window again so that it shows what it showed before - that's a bit different. It's still not supported by the range API at all (you can never unpop anything that was popped off). However, if the range that you're dealing with is a forward range (and arrays are), then you can save the range before you pop off the elements and then use that to restore the previous state. e.g.
string s = "Hello, World!";
int n = 5;
auto saved = s.save;
foreach(i; 0 .. n)
s.popFront();
if(some_condition)
s = saved;
So, you have to explicitly store the previous state yourself in order to restore it, instead of having something like unpopFront. Having the range store that state itself (as unpopFront would require) would be very inefficient in most cases (much as it might work in the iterator case if the range kept track of where the beginning of the container was).
No, there is no standard way to "unpop" a range or a string.
If you were to pass a slice of a string to a function:
fun(s[5..10]);
You'd expect that that function would only be able to see those 5 characters. If there was a way to "unpop" the slice, the function would be able to see the entire string.
Now, D is a system programming language, so expanding a slice is possible using pointer arithmetic and GC queries. But there is nothing in the standard library to do this for you.

How to achieve better efficiency re-inserting into sets in C++

I need to modify an object that has already been inserted into a set. This isn't trivial because the iterator in the pair returned from an insertion of a single object is a const iterator and does not allow modifications. So, my plan was that if an insert failed I could copy that object into a temporary variable, erase it from the set, modify it locally and then insert my modified version.
insertResult = mySet.insert(newPep);
if( insertResult.second == false )
modifySet(insertResult.first, newPep);
void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
Peptide tempPep = (*someIter);
someSet.erase(someIter);
// Modify tempPep - this does not modify the key
someSet.insert(tempPep);
}
This works, but I want to make my insert more efficient. I tried making another iterator and setting it equal to someIter in modifySet. Then after deleting someIter I would still have an iterator to that location in the set and I could use that as the insertion location.
void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
Peptide tempPep = (*someIter);
anotherIter = someIter;
someSet.erase(someIter);
// Modify tempPep - this does not modify the key
someSet.insert(anotherIter, tempPep);
}
However, this results in a seg fault. I am hoping that someone can tell me why this insertion fails or suggest another way to modify an object that has already been inserted into a set.
The full source code can be viewed on GitHub.
I agree with Peter that a map is probably a better model of what you are doing, specifically something like map<pep_key, Peptide::Peptide>, would let you do something like:
insertResult = myMap.insert(std::make_pair(newPep.keyField(), newPep));
if( insertResult.second == false )
insertResult.first->second = newPep;
To answer your question, the insert segfaults because erase invalidates iterators to the erased element, so inserting with that iterator (or a copy of it) is analogous to dereferencing an invalid pointer. The only way I see to do what you want is with a const_cast:
insertResult = mySet.insert(newPep);
if( insertResult.second == false )
const_cast<Peptide::Peptide&>(*(insertResult.first)) = newPep;
The const_cast approach looks like it will work for what you are doing, but it is generally a bad idea.
I hope it isn't bad form to answer my own question, but I would like the answer to be here in case someone else ever has this problem. The reason my attempt segfaulted was given by academicRobot, but here is the solution that makes this work with a set. While I do appreciate the other answers and plan to learn about maps, this question was about efficiently re-inserting into a set.
void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
if( someIter == someSet.begin() ) {
Peptide tempPep = (*someIter);
someSet.erase(someIter);
// Modify tempPep - this does not modify the key
someSet.insert(tempPep);
}
else {
Peptide tempPep = (*someIter);
anotherIter = someIter;
--anotherIter;
someSet.erase(someIter);
// Modify tempPep - this does not modify the key
someSet.insert(anotherIter, tempPep);
}
}
In my program this change dropped my run time by about 15%, from 32 seconds down to 27 seconds. My larger data set is currently running and I have my fingers crossed that the 15% improvement scales.
std::set::insert returns a pair<iterator, bool> as far as I know. In any case, directly modifying an element in any sort of set is risky. What if your modification causes the item to compare equal to another existing item? What if it changes the item's position in the total order of items in the set? Depending on the implementation, this will cause undefined behaviour.
If the item's key remains the same and only its properties change, then I think what you really want is a map or an unordered_map instead of a set.
As you realized, sets are a bit messy to deal with because you have no way to indicate which part of the object should be considered the key and which part you can safely modify.
The usual answer is to use a map or an unordered_map (if you have access to C++0x) and cut your object in two halves: the key and the satellite data.
Beware of the typical answer, std::map<key_type, Peptide>: while it seems easy, it means you need to guarantee that the key part of the Peptide object always matches the key it's associated with, and the compiler won't help you.
So you have 2 alternatives:
Cut Peptide in two: Peptide::Key and Peptide::Data, then you can use the map safely.
Don't provide any method to alter the part of Peptide which defines the key, then you can use the typical answer.
Finally, note that there are two ways to insert in a map-like object.
insert: inserts, but fails if the key already exists
operator[]: inserts or updates (which requires default-constructing an object first)
So, a solution would be:
class Peptide
{
public:
Peptide(int const id): mId(id) {}
int GetId() const;
void setWeight(float w);
void setLength(float l);
private:
int const mId;
float mWeight;
float mLength;
};
typedef std::unordered_map<int, Peptide> peptide_map;
Note that updating through operator[] means default-constructing a new object and then assigning to it. That is not possible here: Peptide has no default constructor, and its const member mId makes it non-assignable, since assignment could change the key part of the object.
std::map will make your life a lot easier and I wouldn't be surprised if it outperforms std::set for this particular case. The storage of the key might seem redundant but can be trivially cheap (ex: pointer to immutable data in Peptide with your own comparison predicate to compare the pointee correctly). With that you don't have to fuss about with the constness of the value associated with a key.
If you can change Peptide's implementation, you can avoid redundancy completely by making Peptide into two separate classes: one for the key part and one for the value associated with the key.