How to store a list of strings - c++

I want to store a list of strings that I will generate. I don't know how many strings there will be, and I don't want to store a string if an identical one is already in the list. I then want to be able to count the number of strings in the list.
Thanks!

Use std::set for a container that automatically keeps the elements sorted and only allows distinct elements (no duplicates), e.g.:
std::set<std::string> s;
s.emplace("Arthur");
s.emplace("Barry");
s.emplace("Barry");
s.emplace("Barry");
s.emplace("Charlie");
std::cout << std::size(s) << std::endl; // Outputs '3'.
If you need fast access and seldom update the container, you might as well use std::vector and simply remove any duplicates after every insertion.
std::vector<std::string> v;
v.emplace_back("Arthur");
v.emplace_back("Barry");
v.emplace_back("Barry");
v.emplace_back("Barry");
v.emplace_back("Charlie");
std::sort(std::begin(v), std::end(v)); // Sort needed for 'std::unique' to always work.
v.erase(std::unique(std::begin(v), std::end(v)), std::end(v)); // Remove duplicates.
std::cout << std::size(v) << std::endl; // Outputs '3'.
std::vector does not keep the elements sorted though.
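If you also want to know whether a freshly generated string was new or a duplicate, the return value of std::set::insert tells you. The following is only a minimal sketch; the helper name add_if_new is made up for illustration and is not part of the standard library.

```cpp
#include <set>
#include <string>

// Hypothetical helper: tries to store `value` in `seen`.
// Returns true if it was stored, false if an identical string was already there.
bool add_if_new(std::set<std::string>& seen, const std::string& value) {
    // insert() returns a pair<iterator, bool>; the bool is true only
    // when the element was actually inserted.
    return seen.insert(value).second;
}
```

Calling it with "Barry" twice stores the string once; the second call returns false and seen.size() stays unchanged.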

Related

Where will a new element be inserted in a std::set?

I have a loop like this (where mySet is a std::set):
for(auto iter=mySet.begin(); iter!=mySet.end(); ++iter){
if (someCondition){mySet.insert(newElement);}
if (someotherCondition){mySet.insert(anothernewElement);}
}
I am experiencing some strange behavior, and I am asking myself if this could be due to the inserted element being inserted "before" the current iterator position in the loop. Namely, I have an Iteration where both conditions are true, but still the distance
distance(iter, mySet.end())
is only 1, not 2 as I would expect. Is my guess about set behavior right? And more importantly, can I still do what I want to do?
What I'm trying to do is build "chains" on a hexagonal board between fields of the same color. I have a set containing all fields of my color, and the conditions check the color of neighboring fields; if they are of the same color, the field is copied to mySet, i.e. to the chain.
I am trying to use std::set for this because it does not allow a field to be in the chain more than once. Reading the comments so far, I fear I need to switch to std::vector, where push_back() will surely add the element at the end, but then I will run into new problems because I have to think of a way to prevent duplicate elements. I am therefore hoping for advice on how best to solve this.
Depending on the new element's value, it may be inserted before or after current iterator value. Below is an example of inserting before and after an iterator.
#include <iostream>
#include <iterator> // std::distance
#include <set>
int main()
{
std::set<int> s;
s.insert(3);
auto it = s.begin();
std::cout << std::distance(it, s.end()) << std::endl; // prints 1
s.insert(2); // 2 will be inserted before it
std::cout << std::distance(it, s.end()) << std::endl; // prints 1
s.insert(5); // 5 will be inserted after it
std::cout << std::distance(it, s.end()) << std::endl; // prints 2
}
Regarding your question in the comments ("In my particular case, modifying it while iterating is basically exactly what I want, but of course I need to add everything after the current position"): no, you cannot manually arrange the order of the elements. A new value's position is determined by comparing it with the existing elements. Below is the quote from cppreference.
std::set is an associative container that contains a sorted set of unique objects of type Key. Sorting is done using the key comparison function Compare. Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.
Thus, the implementation of the set will decide where exactly it will be placed.
If you really need to add values after current position, you need to use a different container. For example, simply a vector would be suitable:
it = myvector.insert ( it+1 , 200 ); // +1 to add after it
If you have a small number of items, doing a brute-force check to see if they're inside a vector can actually be faster than checking if they're in a set. This is because vectors store their elements contiguously and so tend to have better cache locality than node-based containers such as std::set.
We can write a function to do this pretty easily:
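// Requires <algorithm> (std::find) and <vector>.
template<class T>
void insert_unique(std::vector<T>& vect, T const& elem) {
    // Append only when elem is NOT already in the vector.
    if(std::find(vect.begin(), vect.end(), elem) == vect.end()) {
        vect.push_back(elem);
    }
}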
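For the chain-building case from the question, a vector also lets you append while you iterate, as long as you loop by index rather than by iterator (push_back may reallocate and invalidate iterators, but an index stays valid). Here is a rough sketch under some assumptions: fields are represented as plain int ids, and neighbours is a hypothetical callback returning the same-colored neighbors of a field. Neither comes from the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Appends elem only if it is not already present; returns true if appended.
template <class T>
bool append_unique(std::vector<T>& vect, const T& elem) {
    if (std::find(vect.begin(), vect.end(), elem) == vect.end()) {
        vect.push_back(elem);
        return true;
    }
    return false;
}

// Hypothetical chain builder. `neighbours(f)` is an assumed callback that
// returns the ids of the same-colored fields adjacent to field f.
std::vector<int> build_chain(
    int start,
    const std::function<std::vector<int>(int)>& neighbours) {
    std::vector<int> chain{start};
    // chain.size() is re-read on every pass, so fields appended inside
    // the loop are visited later in the same loop.
    for (std::size_t i = 0; i < chain.size(); ++i) {
        for (int n : neighbours(chain[i])) {
            append_unique(chain, n);
        }
    }
    return chain;
}
```

New fields always land at the end, so they are processed after the current position, which is the behavior std::set cannot give you.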

C++ Insert result of permutations into a vector

I am encountering the issue that the first result of the permutation is being entered into the vector, but on the next for_each loop iteration the size of the vector resets itself to {size = 0}, instead of increasing its size and inserting the second permutation, and so on. How do I get around this? I've tried using a while loop but I couldn't work out what the condition for it should be.
I also wanted to ask, as later on I will need to compare the values in this vector to a vector containing a dictionary, would the current code (when working correctly) allow me to do so.
This is my code so far:
for_each(permutations.begin(), permutations.end(), [](string stringPermutations)
{
vector<string> permutations;
permutations.push_back(stringPermutations);
cout << stringPermutations << endl;
});
So apparently it looks like the lambda always creates a new, local, vector each time it's called. If I place vector<string> permutations; outside of the lambda I get an error with permutations.push_back(stringPermutations);. So how do I go about retrieving the stringPermutations out of the lambda and into a public accessible vector?
Thanks for the help and feedback.
Declare the vector outside the lambda and use lambda capture to capture this vector:
vector<string> permutation_v;
for_each(permutations.begin(), permutations.end(), [&](string stringPermutations)
// ^
{
permutation_v.push_back(stringPermutations);
cout << stringPermutations << endl;
});
But if I were you, I would directly construct this vector as
vector<string> permutation_v{permutations.begin(), permutations.end()};
It is unclear what you want to achieve with your code, but it just seems you want to print the contents of permutations.
Then just look at the elements in the vector.
for (auto &permutation : permutations) std::cout << permutation << '\n';
The question is: why do you use a std::unordered_set<std::string> and not a std::vector<std::string> in the first place? Then you would not need to copy the elements into a new vector.
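On the follow-up about comparing the permutations against a dictionary later: once they are in an ordinary vector this is straightforward. The sketch below is only an illustration; the function names are invented, the permutations are generated with std::next_permutation (the question does not say how they were produced), and the dictionary is assumed to be a std::set for fast lookup rather than the vector mentioned in the question.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Generates all distinct permutations of `letters`, in sorted order.
std::vector<std::string> all_permutations(std::string letters) {
    // next_permutation enumerates everything only from a sorted start.
    std::sort(letters.begin(), letters.end());
    std::vector<std::string> result;
    do {
        result.push_back(letters);
    } while (std::next_permutation(letters.begin(), letters.end()));
    return result;
}

// Returns the permutations that also appear in `dictionary`.
std::vector<std::string> matches_in_dictionary(
    const std::vector<std::string>& perms,
    const std::set<std::string>& dictionary) {
    std::vector<std::string> found;
    std::copy_if(perms.begin(), perms.end(), std::back_inserter(found),
                 [&dictionary](const std::string& p) {
                     return dictionary.count(p) > 0;
                 });
    return found;
}
```

Note that the lambda here captures dictionary by reference, the same mechanism as the [&] capture above.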

Working with structure objects

I have a logic that looks like the below (Not the actual code):
StructureElement x;
For i in 1 to 1000
do
x.Elem1 = 20;
x.Elem2 = 30;
push(x into a std::vector)
end
My understanding is that x is allocated memory only once and that the existing values will be overwritten on every iteration.
Also, the 'x' pushed into the vector will not be affected by subsequent iterations of pushing a modified 'x'.
Am I right in my observations?
Is the above optimal? I would want to keep memory consumption minimal and would not prefer using new. Am I missing anything by not using new?
Also, I pass this vector to another method and receive it there by reference.
And, if I were to read the vector elements back, is this right?
Structure element xx = mYvector.begin()
print xx.Elem1
print xx.Elem2
Any optimizations or different ideas would be welcome.
Am I right in my observations?
Yes, if the vector is std::vector<StructureElement>, in which case it keeps its own copies of what is pushed in.
Is the above optimal?
It is sub-optimal because it results in many re-allocations of the vector's underlying data buffer, plus unnecessary assignments and copies. The compiler may optimize some of the assignments and copies away, but there is no reason, for example, to re-set the elements of x in the loop.
You can simplify it like this:
std::vector<StructureElement> v(1000, StructureElement{20, 30});
This creates a size-1000 vector containing copies of StructureElement with the desired values, which is what you seem to be trying in your pseudo-code.
To read the elements back, you have options. A range based for-loop if you want to iterate over all elements:
for (const auto& e : v)
    std::cout << e.Elem1 << " " << e.Elem2 << std::endl;
Using iterators,
for (auto it = begin(v); it != end(v); ++it)
std::cout << it->Elem1 << it->Elem2 << std::endl;
Or, pass ranges in to algorithms
std::transform(begin(v), end(v), ....);
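If the values differ from one iteration to the next, so that the fill constructor does not apply, the repeated re-allocations mentioned above can still be avoided by calling reserve before the loop. A minimal sketch, assuming StructureElement is a simple aggregate of two ints (the question does not show its definition) and using a made-up function name:

```cpp
#include <cstddef>
#include <vector>

// Assumed layout; the real StructureElement is not shown in the question.
struct StructureElement {
    int Elem1;
    int Elem2;
};

// Hypothetical helper that builds `count` elements with per-iteration values.
std::vector<StructureElement> make_elements(std::size_t count) {
    std::vector<StructureElement> v;
    v.reserve(count); // allocate the buffer once, up front
    for (std::size_t i = 0; i < count; ++i) {
        // The vector stores its own copy of each element.
        v.push_back(StructureElement{20, static_cast<int>(i)});
    }
    return v;
}
```

No new is needed anywhere; the vector owns the storage and releases it when it goes out of scope.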

std::multiset vs std::vector to read and write sorted strings to a file

I have a file, say somefile.txt, that contains names (single words) in sorted order.
I want to update this file after adding a new name, keeping it in sorted order.
Which of the following is the preferred way, and why?
Using a std::multiset
std::multiset<std::string> s;
std::copy(std::istream_iterator<std::string>(fin),//fin- object of std::fstream
std::istream_iterator<std::string>(),
std::inserter(s, s.begin()));
s.insert("new_name");
//Write s to the file
OR
Using a std::vector
std::vector<std::string> v;
std::copy(std::istream_iterator<std::string>(fin),
std::istream_iterator<std::string>(),
std::back_inserter(v));
v.push_back("new_name");
std::sort(v.begin(),v.end());
//Write v to the file.
The multiset is slower to insert objects than the vector, but they are held sorted.
The multiset is likely to take up more memory than the vector as it has to hold pointers to an internal tree structure. This may not always be the case as the vector may have some empty space.
I guess if you need the information to grow incrementally but always to be ready for immediate access in order then the multi set wins.
If you collect the data all at once without needing to access it in order, it is probably simpler to push it onto the vector and then sort. So how dynamic is the data to be stored is the real criterion.
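// Copy the sorted input to the output, emitting new_name at its sorted position.
std::string new_name = "new_name";
bool inserted = false;
std::string current;
while (std::cin >> current) {
    if (!inserted && new_name < current) {
        std::cout << new_name << '\n';
        inserted = true;
    }
    std::cout << current << '\n';
}
if (!inserted) { // new_name sorts after every existing name, or the input was empty
    std::cout << new_name << '\n';
}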
Both options are basically equivalent.
In a performance-critical scenario, the vector approach will be faster, but your perf is largely going to be constrained by the disk in this case; which container you choose won't matter much.
Vectors are faster from what I could see from this guy's testing (http://fallabs.com/blog/promenade.cgi?id=34). I would suggest that you test it out and see for yourself. Performance is often related to platform and especially, in this case, datasets.
From his testing, he concluded that simple element works best with vector. For complex element (more than 4 strings for instance), multiset is faster.
Also, since vectors are big arrays, if you're adding lots of data, it may be worth looking into using another type of container (linked list for instance or a specialized boost container see Is there a sorted_vector class, which supports insert() etc.?).
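If you go with the vector and the file is already sorted, you do not have to re-sort everything after adding one name; you can insert it at the right place directly. A small sketch, with insert_sorted being a name made up for this example:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Inserts `name` into an already sorted vector so that it stays sorted.
// Duplicates are kept, which matches std::multiset semantics.
void insert_sorted(std::vector<std::string>& names, const std::string& name) {
    // Binary search for the first element greater than `name`.
    auto pos = std::upper_bound(names.begin(), names.end(), name);
    // The insert itself shifts the tail, so it is linear in the worst case.
    names.insert(pos, name);
}
```

The search is logarithmic, but the insert still moves elements, so this pays off mainly when you add a few names to a large sorted list.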

C++ std::set index of insert

I've faced the following issue:
Suppose I've got a std::set named Numbers, containing n values. I want to insert (n+1)th value (equal to x), which I in advance know not to be in the set yet. What I need is some way to check, in which position will it be inserted, or, equivalently, how many of elements less than x are already contained in Numbers.
I definitely know some ways of doing it at O(n), but what I need is O(log(n)). Theoretically it might be possible as std::set is usually implemented as Binary Search Tree (presumably O(log(n)) is possible only if it stores information about sizes of each subtree in each vertex). The question is whether it's technically possible, and if it is, how to do it.
There's no "position" in a set, only iterators, and set gives you no promises regarding its implementation. You can probably use lower_bound/upper_bound and count elements, but I don't think the counting can take advantage of the internals.
All of the set functions are going to work with iterators; the iterator of a set is bidirectional, not random-access, so determining the position will be an O(n) operation.
You don't need to know the position to insert a new element in the set, and insertions are O(log n).
You can find the "position" where this new element would be inserted in O(log(n)) using set::lower_bound, but it's just an iterator. std::set::iterator is bidirectional, not random access, so you cannot count how many elements are smaller than the new one in O(log(n)).
Maybe you should use set::lower_bound(), whose running time, according to this document (http://lafstern.org/matt/col1.pdf), should be proportional to log N.
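To make the difference concrete, here is a sketch of counting the elements smaller than x in both a std::set and a sorted std::vector. The function names are invented for this example, and int is used as the element type only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Finding the iterator is O(log n), but std::distance has to walk the
// bidirectional iterators one by one, so the whole call is O(n).
std::size_t rank_in_set(const std::set<int>& s, int x) {
    return static_cast<std::size_t>(
        std::distance(s.begin(), s.lower_bound(x)));
}

// With a sorted vector the iterators are random access, so the
// subtraction is O(1) and the whole call is O(log n).
std::size_t rank_in_sorted_vector(const std::vector<int>& v, int x) {
    return static_cast<std::size_t>(
        std::lower_bound(v.begin(), v.end(), x) - v.begin());
}
```

The trade-off is that inserting into the sorted vector afterwards is O(n), so neither standard container gives you both operations in O(log n).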
Instead of:
std::set<MyT> mySet;
use:
std::set<std::pair<MyT,int>> mySet;
Then, for example:
//inserting a std::vector<MyT> myVec:
for (int i=0; i<static_cast<int>(myVec.size()); i++) // cast avoids a signed/unsigned comparison
    mySet.insert( std::pair<MyT,int>(myVec[i], i) );
The sorted result:
for (auto it=mySet.begin(); it!=mySet.end(); ++it)
cout << it->first << " index=" << it->second << "\n";