Is an unordered_set modified internally? - c++

I've been reading the cplusplus.com site and trying to make sure that my unordered_set of numbers won't be modified in any way. The site says that the elements of the container are not sorted, unlike those of a plain set.
The site also says this:
Internally, the elements in the unordered_set are not sorted in any
particular order, but organized into buckets depending on their hash
values to allow for fast access to individual elements directly by
their values.
I don't really understand what that means (can you explain, by the way?). Consider the following example:
typedef const std::unordered_set<short> set_t;
set_t some_set = {1,3,5,7,9,12,14,16,18,19,21,23,25,27,30,32,34,36};
Can I make sure that the above set "some_set" will never be changed and that the numbers will always stay in the same order (because this is the goal here)?
I'm also not planning to insert or remove numbers from the set.

typedef const std::unordered_set<short> set_t;
set_t some_set = {1,3,5,7,9,12,14,16,18,19,21,23,25,27,30,32,34,36};
Whether the order of the numbers in some_set changes depends on the operations you perform on some_set. The iteration order immediately after creation is unspecified, and it probably won't be {1,3,5,7,9,12,14,16,18,19,21,23,25,27,30,32,34,36}. You can see this with a simple demo:
#include <iostream>
#include <unordered_set>
int main() {
// Unlike in the question, the set is not const here, so that the insert below compiles
typedef std::unordered_set<short> set_t;
set_t some_set = {1,3,5,7,9,12,14,16,18,19,21,23,25,27,30,32,34,36};
for (short s : some_set)
std::cout << s << std::endl;
// The order won't change if we don't modify the contents
std::cout << "AGAIN!" << std::endl;
for (short s : some_set)
std::cout << s << std::endl;
// If we put a bunch of stuff in
for (short s = 31; s < 100; s += 4)
some_set.insert(s);
// The elements from *before* the modification are not necessarily in the
// same order as before.
std::cout << "MODIFIED" << std::endl;
for (short s : some_set)
std::cout << s << std::endl;
}
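The "buckets" wording quoted in the question can be made a little more concrete. The following is only a minimal sketch, assuming C++11; the helper names snapshot and in_reported_bucket are made up for illustration. It uses the standard bucket interface (bucket_count(), bucket(), and the per-bucket begin(n)/end(n)) to show that each element lives in the bucket chosen from its hash, and that two traversals of an unmodified set visit the elements in the same sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: copies the elements in the set's current iteration order.
std::vector<short> snapshot(const std::unordered_set<short>& s) {
    return std::vector<short>(s.begin(), s.end());
}

// Hypothetical helper: true if value is found in the bucket that
// s.bucket(value) reports for it. Intended for a non-empty set.
bool in_reported_bucket(const std::unordered_set<short>& s, short value) {
    const std::size_t b = s.bucket(value);   // bucket index derived from the hash
    assert(b < s.bucket_count());
    for (auto it = s.begin(b); it != s.end(b); ++it) {
        if (*it == value) return true;       // walk only that one bucket
    }
    return false;
}
```

If a specific order is the actual goal, a std::vector (which keeps insertion order) or a std::set (which keeps sorted order) states that intent directly, whereas the order an unordered_set happens to produce may differ between standard library implementations.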

Related

Array like data structure with negative index in C++

I need a structure to keep track of presence of some items. I just wanted to take an array a0....aN and mark the elements as a[0]=0,a[1]=0,a[2]=1........(a[i]=1 if the element is present,a[i]=0 if element is not present).
But the items range from -1000 to +1000. It can be done by putting the negative range from 1001 to 2000. I needed to know if there is any other data structure in c++ that can work like array and with negative indexes. Thank you for your time.
std::map is meant for exactly this: its key (index) can be of any built-in or user-defined type, given an ordering for that type. See - http://www.cplusplus.com/reference/map/map/
Example for your case:
#include <iostream>
#include <map>
#include <string>
int main ()
{
std::map<int, int> mymap;
mymap[-1]=1;
mymap[-2]=0;
mymap[-3]=1;
std::cout << mymap[-1] << '\n';
std::cout << mymap[-2] << '\n';
std::cout << mymap[-3] << '\n';
return 0;
}
Example for char:
#include <iostream>
#include <map>
#include <string>
int main ()
{
std::map<char,std::string> mymap;
mymap['a']="an element";
mymap['b']="another element";
mymap['c']=mymap['b'];
std::cout << "mymap['a'] is " << mymap['a'] << '\n';
std::cout << "mymap['b'] is " << mymap['b'] << '\n';
std::cout << "mymap['c'] is " << mymap['c'] << '\n';
std::cout << "mymap['d'] is " << mymap['d'] << '\n';
std::cout << "mymap now contains " << mymap.size() << " elements.\n";
return 0;
}
You can create your own data structure which supports negative indexes. Just add an offset to the indexes while storing them in an array.
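class MyArray {
    int *arr;
    int offset; // added to every index so that -offset maps to slot 0
public:
    MyArray(int offset) : offset(offset) {
        arr = new int[2*offset + 1](); // slots for -offset..+offset, zero-initialized
    }
    ~MyArray() {
        delete[] arr; // new[] must be paired with delete[]
    }
    void add(int index, int val) {
        arr[index + offset] = val;
    }
    int get(int index) {
        return arr[index + offset];
    }
};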
Then you can just use your class to add and get elements with any index.
MyArray arr(1000); // pass the largest negative index (as a positive number) as offset
arr.add(-150, 10); // store the value 10 at index -150
std::cout << arr.get(-150);
I need a structure to keep track of presence of some items.
If what you want is set semantics, use a set data structure.
No need to implement a custom array wrapper.
You can use a std::set (or std::unordered_set) for that. Remember that "premature optimization is the root of all evil".
Insert the values that are there, leave out the values that are missing. No need to worry about negative indices.
You can use the methods std::set::find() or std::set::count() to check the presence of an item. Have a look at the documentation to find some example code.
If you later find it's a performance critical optimization, you can replace a std::set<int> with a data structure that you wrote yourself on the basis of an array of bits anytime. If it's not, doing so prematurely might turn out to be an unnecessary source of unexpected errors and a waste of time.
For reference:
http://en.cppreference.com/w/cpp/container/set
http://en.cppreference.com/w/cpp/container/unordered_set
http://en.cppreference.com/w/cpp/container/set/find
http://en.cppreference.com/w/cpp/container/set/count
How to check that an element is in a std::set?
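As a minimal sketch of the set-based approach described above (the helper names mark, is_present and is_present_find are made up for illustration, and the range check simply mirrors the -1000 to +1000 range from the question), presence can be recorded with insert() and queried with count() or find():

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: records that value is present.
void mark(std::set<int>& items, int value) {
    assert(value >= -1000 && value <= 1000); // range taken from the question
    items.insert(value);                     // inserting twice is harmless
}

// count() returns 0 or 1 for a std::set.
bool is_present(const std::set<int>& items, int value) {
    return items.count(value) != 0;
}

// The same check via find(), which returns end() when the value is absent.
bool is_present_find(const std::set<int>& items, int value) {
    return items.find(value) != items.end();
}
```

Negative values need no special handling here, because the set stores the values themselves rather than using them as array positions.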
The most efficient approach is simply to shift your array indexes so that all of them are non-negative. In your case, using a[i+1000] is sufficient.
If you really need to use negative indexes, that is also possible.
C / C++ calculates the memory address of an array element by taking the address of the array and adding the index value to it. A negative index just points to the memory area placed before your array (which is not what you normally want).
int a[2001];
int *b = &a[1000];
int x = b[-1000]; // This points to 1000 places before b which translates to a[0] (valid place)
Another approach is to use containers: any number can be converted to a string and stored in a suitable container.
I think that the answer of @Rajev is almost fine. I have just replaced the plain array with a std::vector. That way the memory management is safe, and copying and moving are easy.
#include <vector>
template <typename T>
class MyArray {
private:
    std::vector<T> arr;
    int offset; // added to every index so that -offset maps to slot 0
public:
    MyArray(int offset) : offset(offset) {
        arr.resize(2*offset + 1); // slots for -offset..+offset
    }
    void set(int index, const T& val) {
        arr[index + offset] = val;
    }
    T get(int index) const {
        return arr[index + offset];
    }
};
You can expand this further by overloading the operator [] of MyArray.
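A minimal sketch of what such an operator[] overload might look like follows. OffsetArray is a hypothetical name chosen so that it does not clash with the class above, and the assert-based bounds check is an addition for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of an offset-indexed array; valid indexes are -offset..+offset.
// Not suitable for T = bool, because std::vector<bool> does not hand out T&.
template <typename T>
class OffsetArray {
    std::vector<T> data_;
    int offset_;
public:
    explicit OffsetArray(int offset)
        : data_(2 * static_cast<std::size_t>(offset) + 1), offset_(offset) {}

    // Writable access: a[-5] = 1;
    T& operator[](int index) {
        assert(index >= -offset_ && index <= offset_);
        return data_[static_cast<std::size_t>(index + offset_)];
    }

    // Read-only access for const objects.
    const T& operator[](int index) const {
        assert(index >= -offset_ && index <= offset_);
        return data_[static_cast<std::size_t>(index + offset_)];
    }
};
```

With this in place the wrapper reads like a built-in array, for example `OffsetArray<int> a(1000); a[-1000] = 1;`.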

Iterating through two maps in c++

I would like to loop through two maps at the same time, how could I achieve this?
I have two maps that I want to print together. Can I use (auto it : mymap) twice within one for? Something like:
for (auto it: mymap && auto on: secondMap)
is this even allowed?
I am trying to print values like (value1, value2) where each of the values is in a different map. The maps do not necessarily contain exactly the same items. The key is an Instruction and the value is an integer, so if there is an element in the map for value2, there is not necessarily a value1 for the same key; in that case value1 should be 0, which is the default integer value.
Any ideas?
Perhaps it is possible to combine two iterators, one for each map?
Kind regards,
Guus Leijsten
You can use the regular for-loop for this :
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
int main(int argc, char* argv[]) {
std::map<int, std::string> m1, m2;
m1.insert({15, "lala"});
m1.insert({10, "hey!"});
m1.insert({99, "this"});
m2.insert({50, "foo"});
m2.insert({51, "bar"});
for(auto it_m1 = m1.cbegin(), end_m1 = m1.cend(),
it_m2 = m2.cbegin(), end_m2 = m2.cend();
it_m1 != end_m1 || it_m2 != end_m2;)
{
if(it_m1 != end_m1) {
std::cout << "m1: " << it_m1->first << " " << it_m1->second << " | ";
++it_m1;
}
if(it_m2 != end_m2) {
std::cout << "m2: " << it_m2->first << " " << it_m2->second << std::endl;
++it_m2;
}
}
return EXIT_SUCCESS;
}
Note that because you want to iterate over maps of different sizes, you have to use the || operator in the loop condition. The direct consequence is that you cannot increment in the last part of the for-loop, as one of the iterators may already be at the end by then, and incrementing an end iterator is undefined behavior (it may lead to a segmentation fault).
You have to check iterator validity inside the loop and increment it when it's valid, as shown in the sample above.
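The loop above walks both maps side by side, but it does not pair entries by key. If the goal is the (value1, value2) output described in the question, with 0 standing in for a missing key, one possible sketch is a merge-style walk that relies on std::map iterating in key order. The question's Instruction key type is not shown, so std::string is used as a stand-in here, and Row and join_with_default are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One output row: the key, the value from the first map, the value from the second.
struct Row {
    std::string key;
    int first;
    int second;
};

// Walks both maps in key order; a key missing from one map contributes 0.
std::vector<Row> join_with_default(const std::map<std::string, int>& a,
                                   const std::map<std::string, int>& b) {
    std::vector<Row> rows;
    auto ia = a.begin();
    auto ib = b.begin();
    while (ia != a.end() || ib != b.end()) {
        if (ib == b.end() || (ia != a.end() && ia->first < ib->first)) {
            rows.push_back({ia->first, ia->second, 0});          // key only in a
            ++ia;
        } else if (ia == a.end() || ib->first < ia->first) {
            rows.push_back({ib->first, 0, ib->second});          // key only in b
            ++ib;
        } else {
            rows.push_back({ia->first, ia->second, ib->second}); // key in both
            ++ia;
            ++ib;
        }
    }
    assert(rows.size() >= a.size() && rows.size() >= b.size());
    return rows;
}
```

Each row can then be printed as (first, second). Each iterator is advanced only in a branch where it is known not to be at the end, which is the same precaution the answer describes.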

Combinations of N Boost interval_set

I have a service which has outages in 4 different locations. I am modeling each location outages into a Boost ICL interval_set. I want to know when at least N locations have an active outage.
Therefore, following this answer, I have implemented a combination algorithm, so I can create combinations between elements via interval_set intersections.
When this process is over, I should have a certain number of interval_set objects, each one defining the outages for N locations simultaneously, and the final step will be joining them to get the desired full picture.
The problem is that I'm currently debugging the code, and when it comes to printing each intersection, the output goes haywire (even when I step through with gdb): I can't read the intersections, and CPU usage shoots up.
I guess that somehow I'm sending to output a larger portion of memory than I should, but I can't see where the problem is.
This is a SSCCE:
#include <boost/icl/interval_set.hpp>
#include <algorithm>
#include <iostream>
#include <vector>
int main() {
// Initializing data for test
std::vector<boost::icl::interval_set<unsigned int> > outagesPerLocation;
for(unsigned int j=0; j<4; j++){
boost::icl::interval_set<unsigned int> outages;
for(unsigned int i=0; i<5; i++){
outages += boost::icl::discrete_interval<unsigned int>::closed(
(i*10), ((i*10) + 5 - j));
}
std::cout << "[Location " << (j+1) << "] " << outages << std::endl;
outagesPerLocation.push_back(outages);
}
// So now we have a vector of interval_sets, one per location. We will combine
// them so we get an interval_set defined for those periods where at least
// 2 locations have an outage (N)
unsigned int simultaneusOutagesRequired = 2; // (N)
// Create a bool vector in order to filter permutations, and only get
// the sorted permutations (which equals the combinations)
std::vector<bool> auxVector(outagesPerLocation.size());
std::fill(auxVector.begin() + simultaneusOutagesRequired, auxVector.end(), true);
// Create a vector where combinations will be stored
std::vector<boost::icl::interval_set<unsigned int> > combinations;
// Get all the combinations of N elements
unsigned int numCombinations = 0;
do{
bool firstElementSet = false;
for(unsigned int i=0; i<auxVector.size(); i++){
if(!auxVector[i]){
if(!firstElementSet){
// First location, insert to combinations vector
combinations.push_back(outagesPerLocation[i]);
firstElementSet = true;
}
else{
// Intersect with the other locations
combinations[numCombinations] -= outagesPerLocation[i];
}
}
}
numCombinations++;
std::cout << "[-INTERSEC-] " << combinations[numCombinations] << std::endl; // The problem appears here
}
while(std::next_permutation(auxVector.begin(), auxVector.end()));
// Get the union of the intersections and see the results
boost::icl::interval_set<unsigned int> finalOutages;
for(std::vector<boost::icl::interval_set<unsigned int> >::iterator
it = combinations.begin(); it != combinations.end(); it++){
finalOutages += *it;
}
std::cout << finalOutages << std::endl;
return 0;
}
Any help?
As I surmised, there's a "highlevel" approach here.
Boost ICL containers are more than just containers of "glorified pairs of interval starting/end points". They are designed to implement just that business of combining, searching, in a generically optimized fashion.
So you don't have to.
If you let the library do what it's supposed to do:
using TimePoint = unsigned;
using DownTimes = boost::icl::interval_set<TimePoint>;
using Interval = DownTimes::interval_type;
using Records = std::vector<DownTimes>;
Using functional domain typedefs invites a higher level approach. Now, let's ask the hypothetical "business question":
What do we actually want to do with our records of per-location downtimes?
Well, we essentially want to
1. tally them for all discernable time slots,
2. filter those where tallies are at least 2, and
3. finally, show the "merged" time slots that remain.
Ok, engineer: implement it!
Hmm. Tallying. How hard could it be?
❕ The key to elegant solutions is the choice of the right datastructure
using Tally = unsigned; // or: bit mask representing affected locations?
using DownMap = boost::icl::interval_map<TimePoint, Tally>;
Now it's just bulk insertion:
// We will do a tally of affected locations per time slot
DownMap tallied;
for (auto& location : records)
for (auto& incident : location)
tallied.add({incident, 1u});
Ok, let's filter. We just need the predicate that works on our DownMap, right?
// define threshold where at least 2 locations have an outage
auto exceeds_threshold = [](DownMap::value_type const& slot) {
return slot.second >= 2;
};
Merge the time slots!
Actually. We just create another DownTimes set, right. Just, not per location this time.
The choice of data structure wins the day again:
// just printing the union of any criticals:
DownTimes merged;
for (auto&& slot : tallied | filtered(exceeds_threshold) | map_keys)
merged.insert(slot);
Report!
std::cout << "Criticals: " << merged << "\n";
Note that nowhere did we come close to manipulating array indices, overlapping or non-overlapping intervals, closed or open boundaries. Or, [eeeeek!] brute force permutations of collection elements.
We just stated our goals, and let the library do the work.
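For readers who want to see the tally-and-filter idea without Boost, here is a rough standard-library sketch of the same computation. It is not Boost ICL and not how the library works internally; it swaps in a plain sweep over interval endpoints. The names Closed, Location and critical_slots are hypothetical, and the sketch assumes that the intervals within one location do not overlap (as in an interval_set) and that no interval ends at the largest unsigned value.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using TimePoint = unsigned;
using Closed = std::pair<TimePoint, TimePoint>;  // closed interval [first, second]
using Location = std::vector<Closed>;            // outages of one location

// Returns the merged closed intervals during which at least `threshold`
// locations have an active outage.
std::vector<Closed> critical_slots(const std::vector<Location>& records,
                                   unsigned threshold) {
    assert(threshold > 0);
    // delta[t] = change in the number of active outages at time t
    std::map<TimePoint, int> delta;
    for (const Location& location : records) {
        for (const Closed& incident : location) {
            ++delta[incident.first];
            --delta[incident.second + 1];  // a closed interval stops after `second`
        }
    }
    std::vector<Closed> result;
    int active = 0;       // running tally of locations with an outage
    bool open = false;    // currently inside a critical slot?
    TimePoint start = 0;
    for (const auto& change : delta) {
        active += change.second;
        const bool critical = active >= static_cast<int>(threshold);
        if (critical && !open) {
            start = change.first;
            open = true;
        } else if (!critical && open) {
            result.push_back({start, change.first - 1});
            open = false;
        }
    }
    return result;
}
```

The running count plays the role of the Tally in the interval_map, and adjacent critical slots are merged because a slot is only closed when the count drops below the threshold.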
Full Demo
#include <boost/icl/interval_set.hpp>
#include <boost/icl/interval_map.hpp>
#include <boost/range.hpp>
#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/numeric.hpp>
#include <boost/range/irange.hpp>
#include <algorithm>
#include <iostream>
#include <vector>
using TimePoint = unsigned;
using DownTimes = boost::icl::interval_set<TimePoint>;
using Interval = DownTimes::interval_type;
using Records = std::vector<DownTimes>;
using Tally = unsigned; // or: bit mask representing affected locations?
using DownMap = boost::icl::interval_map<TimePoint, Tally>;
// Just for fun, removed the explicit loops from the generation too. Obviously,
// this is bit gratuitous :)
static DownTimes generate_downtime(int j) {
return boost::accumulate(
boost::irange(0, 5),
DownTimes{},
[j](DownTimes accum, int i) { return accum + Interval::closed((i*10), ((i*10) + 5 - j)); }
);
}
int main() {
// Initializing data for test
using namespace boost::adaptors;
auto const records = boost::copy_range<Records>(boost::irange(0,4) | transformed(generate_downtime));
for (auto location : records | indexed()) {
std::cout << "Location " << (location.index()+1) << " " << location.value() << std::endl;
}
// We will do a tally of affected locations per time slot
DownMap tallied;
for (auto& location : records)
for (auto& incident : location)
tallied.add({incident, 1u});
// We will combine them so we get an interval_set defined for those periods
// where at least 2 locations have an outage
auto exceeds_threshold = [](DownMap::value_type const& slot) {
return slot.second >= 2;
};
// just printing the union of any criticals:
DownTimes merged;
for (auto&& slot : tallied | filtered(exceeds_threshold) | map_keys)
merged.insert(slot);
std::cout << "Criticals: " << merged << "\n";
}
Which prints
Location 1 {[0,5][10,15][20,25][30,35][40,45]}
Location 2 {[0,4][10,14][20,24][30,34][40,44]}
Location 3 {[0,3][10,13][20,23][30,33][40,43]}
Location 4 {[0,2][10,12][20,22][30,32][40,42]}
Criticals: {[0,4][10,14][20,24][30,34][40,44]}
At the end of the permutation loop, you write:
numCombinations++;
std::cout << "[-INTERSEC-] " << combinations[numCombinations] << std::endl; // The problem appears here
My debugger tells me that on the first iteration numCombinations was 0 before the increment. Incrementing it put it out of range for the combinations container (which holds only a single element at that point, at index 0).
Did you mean to increment it after the use? Was there any particular reason not to use
std::cout << "[-INTERSEC-] " << combinations.back() << "\n";
or, for c++03
std::cout << "[-INTERSEC-] " << combinations[combinations.size()-1] << "\n";
or even just:
std::cout << "[-INTERSEC-] " << combinations.at(numCombinations) << "\n";
which would have thrown std::out_of_range?
On a side note, I think Boost ICL has vastly more efficient ways to get the answer you're after. Let me think about this for a moment. Will post another answer if I see it.
UPDATE: Posted the other answer showcasing high-level coding with Boost ICL

iterating a vector how to check at which position I am?

Example:
for (vector<string>::reverse_iterator it = myV.rbegin(); it != myV.rend(); ++it)
{
cout << "current value is: " << *it << ", and current position is: " << /* */ << endl;
}
I know I could check how many items there are in the vector, make a counter, and so on. But I wonder if there is a more direct way of checking current index without asserting that I got the length of the vector right.
Vector iterators support subtraction, so you can subtract rbegin() from your current iterator it.
EDIT
As noted in a comment, not all iterators support operator-, so std::distance would have to be used for those. However, I would not recommend this: std::distance incurs a linear-time cost for iterators that are not random access, whereas if you write it - begin() the compiler will tell you when that won't work, and then you can use distance if you must.
Subtract std::vector<T>::begin() (or rbegin() in your case) from the current iterator. Here's a small example:
#include <vector>
#include <iostream>
int main()
{
std::vector<int> x;
x.push_back(1);
x.push_back(1);
x.push_back(3);
std::cout << "Elements: " << x.end() - x.begin() << "\n";
std::cout << "R-Elements: " << x.rend() - x.rbegin() << "\n";
return 0;
}
As pointed out in a really great comment above, std::distance may be an even better choice. std::distance supports random access iterators in constant time, but also supports other categories of iterators in linear time.
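For the reverse loop in the question, the sketch below shows one way to turn a reverse iterator into a position using std::distance. The helper names reverse_position and forward_index are made up for illustration. It relies on the fact that a reverse iterator's base() points one past the element the reverse iterator refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

typedef std::vector<std::string>::const_reverse_iterator RevIt;

// Number of steps from rbegin(): 0 for the last element of the vector.
std::size_t reverse_position(const std::vector<std::string>& v, RevIt it) {
    return static_cast<std::size_t>(std::distance(v.rbegin(), it));
}

// Ordinary (forward) index of the element `it` refers to.
// `it` must be dereferenceable, that is, not equal to rend().
std::size_t forward_index(const std::vector<std::string>& v, RevIt it) {
    assert(it != v.rend());
    // it.base() is one past the element, hence the - 1.
    return static_cast<std::size_t>(std::distance(v.begin(), it.base())) - 1;
}
```

Inside the question's loop, either helper could fill the empty spot in the output statement, depending on whether the position should count from the back or from the front.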
Iterators allow generic algorithms to be written that are independent of the choice of container. I've read in the STL Book that this is great, but may lead to a performance drop, because a container's member functions are sometimes optimized for that container and run faster than generic code that relies on iterators. In this case, if you are dealing with a large vector, you will be calling std::distance, which, although constant-time, is not necessary. If you know that you will be using only a vector for this algorithm, you can take advantage of its direct access operator "[]" and write something like this:
#include <vector>
#include <iostream>
using namespace std;
int main ()
{
vector<int> myV;
for (int I = 0; I < 100; ++I)
{
myV.push_back(I);
}
for (int I = 0; I < myV.size(); ++I)
{
cout << "current value is: " << myV[I]
<< ", and current position is: " << I << endl;
}
return 0;
}
In case you are interested in speed, you can always try the different answers proposed here and measure the execution time. It will depend on the vector size probably.
Keep a counter:
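// pos starts at the index of the last element and counts down with the iterator
int pos = static_cast<int>(myV.size()) - 1;
for (vector<string>::reverse_iterator it = myV.rbegin(); it != myV.rend(); ++it, --pos)
{
    cout << "current value is: " << *it << ", and current position is: " << pos << endl;
}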

changing value in a stl map in place

I understand that when we insert values into the STL map, a copy is made and stored.
I have code that essentially does a find on the map and obtains an iterator.
I then intend to use the iterator to change the value in the map.
The results are not what I would expect, i.e. the value is not changed when accessed from another part of the program. I suspect it's because the change I am applying is to a copy of the value.
The relevant code is as follows.
ObjectMappingType::iterator it = objectMapping_.find(symbol);
if (it == objectMapping_.end()) {
throw std::invalid_argument("Unknown symbol: " + symbol);
}
get<3>(it->second) = value;
NOTE: I am actually trying to change a value inside a boost::tuple that is stored as the 'value' part of the map.
Hmm... both methods seem to work fine for me. Here's the entire example that I used:
#include <iostream>
#include <map>
#include <string>
#include <boost/tuple/tuple.hpp>
typedef boost::tuple<int, std::string> value_type;
typedef std::map<int, value_type> map_type;
std::ostream&
operator<<(std::ostream& os, value_type const& v) {
os << " number " << boost::get<0>(v)
<< " string " << boost::get<1>(v);
return os;
}
int
main() {
map_type m;
m[0] = value_type(0, "zero");
m[1] = value_type(0, "one");
m[2] = value_type(0, "two");
std::cout
<< "m[0] " << m[0] << "\n"
<< "m[1] " << m[1] << "\n"
<< "m[2] " << m[2] << "\n"
<< std::endl;
boost::get<0>(m[1]) = 1;
map_type::iterator iter = m.find(2);
boost::get<0>(iter->second) = 2;
std::cout
<< "m[0] " << m[0] << "\n"
<< "m[1] " << m[1] << "\n"
<< "m[2] " << m[2] << "\n"
<< std::endl;
return 0;
}
The output is exactly what I would have expected.
lorien$ g++ -I/opt/include -gdwarf-2 foo.cpp
lorien$ ./a.out
m[0] number 0 string zero
m[1] number 0 string one
m[2] number 0 string two
m[0] number 0 string zero
m[1] number 1 string one
m[2] number 2 string two
lorien$
The operator[] on a map will give a reference to the actual contained element, but it has the nasty side-effect of creating a map entry if none existed before. Since you're already checking the result of find() to see if the key exists, you can use it safely.
get<3>(objectMapping_[symbol]) = value;
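To illustrate the difference between changing the stored element and changing a copy of it, which is what the question suspects, here is a small sketch. It uses std::tuple as a stand-in for the question's boost::tuple, and Entry, Mapping, set_number and set_number_on_copy are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

typedef std::tuple<int, std::string> Entry;   // stand-in for the boost::tuple
typedef std::map<std::string, Entry> Mapping;

// Modifies the stored tuple in place; returns false if the key is absent.
bool set_number(Mapping& m, const std::string& key, int value) {
    Mapping::iterator it = m.find(key);
    if (it == m.end()) return false;
    std::get<0>(it->second) = value;   // it->second refers to the element in the map
    assert(std::get<0>(m.find(key)->second) == value);
    return true;
}

// Pitfall for comparison: the tuple is copied first, so the map is unchanged.
bool set_number_on_copy(const Mapping& m, const std::string& key, int value) {
    Mapping::const_iterator it = m.find(key);
    if (it == m.end()) return false;
    Entry copy = it->second;           // a separate object
    std::get<0>(copy) = value;         // changes only the local copy
    return true;
}
```

The code shown in the question matches the first pattern, so if the change is not visible elsewhere, one possible explanation is that the map itself is being copied somewhere (for example, passed or returned by value) rather than the tuple.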
Without seeing more of your code I can't be sure of this, but it sounds like you could have a threading issue. Does your program use multiple threads by any chance? Maybe not even explicitly, but perhaps you call a library that does some work in a separate thread? Here is what I would do to start debugging.
Have a check that will re-find the value in the map after you set it, and check that it is the correct new value and throw an exception if it is not.
Reproduce the error by accessing the value from the "other part of the program" and see whether it throws the exception
Step through with a debugger to make sure that the modification is indeed happening BEFORE the access in the other part of the program instead of after.
If there are too many accesses to make it practical to do this by hand, dump a trace to a file. That is, add code to append to a log file every time the map is accessed. Each line should have the time of access to as fine a resolution as your system clock allows, the address of the map (so you know you are modifying the same map), the symbol key, the value, and the new value (if this was a modifying access). This way you can pinpoint exactly what times the map modifications are not showing up in the other part of the program, and whether they are before or after the access.