Selectively populated vectors with substrings extracted from a source string - c++

I have a char array, in which its contents look something like the following:
char buffer[] = "I1 I2 V1 V2 I3 V3 I4 DO V4";
As you may see, it's a typical blank separated character string. I want to put all sub-string(s) starting with a letter "I" into a vector (IVector), and sort its elements in ascending order. At the same time, I'd also want to put all sub-string(s) starting with a letter "V" into another vector (VVector), and sort its elements in ascending order. The other(s) (e.g. "DO" in this example) will be ignored.
I'm not familiar with the STL algorithm library. Are there any functions to help me achieve the above-mentioned job?
Thank you!

You can iterate over all the substrings using an std::istream_iterator<std::string>:
std::stringstream s(buffer);
std::istream_iterator<std::string> begin(s);
std::istream_iterator<std::string> end;
for( ; begin != end; ++begin) {
switch((*begin)[0]) { // switch on first character
// insert into appropriate vector here
}
}
Then you can use std::sort to sort the vectors, as @Billy has already pointed out. You could also consider using a std::set, as that always keeps its items sorted (note that it also discards duplicates).
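The loop above can be filled in roughly as follows. This is only a sketch: the function name splitByPrefix is made up for illustration, and it reads tokens with operator>> on a std::istringstream rather than through std::istream_iterator, which visits the same tokens.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits `text` on whitespace and returns
// {tokens starting with 'I', tokens starting with 'V'}, each sorted
// in ascending order. All other tokens are dropped.
std::pair<std::vector<std::string>, std::vector<std::string>>
splitByPrefix(const std::string& text) {
    std::vector<std::string> iVector, vVector;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {              // operator>> never yields an empty token
        switch (token[0]) {            // switch on first character
            case 'I': iVector.push_back(token); break;
            case 'V': vVector.push_back(token); break;
            default:  break;           // e.g. "DO" is ignored
        }
    }
    std::sort(iVector.begin(), iVector.end());
    std::sort(vVector.begin(), vVector.end());
    return {std::move(iVector), std::move(vVector)};
}
```

Calling splitByPrefix(buffer) returns the I-strings in .first and the V-strings in .second. The sort is lexicographic, so "I10" would come before "I2"; if the numeric suffix matters, a custom comparator is needed.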

Are there any functions to help me achieve the above-mentioned job?
Yes. Have a look at std::find and std::sort.

Related

How to remove duplicates from a vector whose numbers might be in different positions?

How do you remove elements from a vector of vectors that are identical to another vector but whose elements are not in the same indices?
For example:
std::vector<std::vector<int>> vectA = {{1,3,4}, {1,2,3}, {3,2,1}};
I want it so that {3,2,1} is removed from vectA and it becomes:
vectA = {{1,3,4}, {1,2,3}}
Any idea how to proceed efficiently?
Sort the elements of each vector
Drop duplicates (this is an easy look-up)
If you need to retain the original element order, then build any correspondence you wish: parallel arrays of vectors (original and sorted), pairs of (unsorted, sorted) vectors, etc. Drop duplicates based on the sorted ones.
I trust that you can take it from here.
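For what it's worth, here is one way those steps could look when the original element order has to be kept. It is a sketch, not the only way to do it: the name dropPermutedDuplicates is invented for this example, and it uses a std::set of sorted copies as the look-up.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical helper: keeps the first occurrence of each inner vector,
// treating two inner vectors as equal when they hold the same elements
// in any order.
std::vector<std::vector<int>> dropPermutedDuplicates(
        const std::vector<std::vector<int>>& input) {
    std::set<std::vector<int>> seen;          // sorted copies seen so far
    std::vector<std::vector<int>> result;
    for (const auto& v : input) {
        std::vector<int> key = v;
        std::sort(key.begin(), key.end());    // canonical form used for look-up
        if (seen.insert(key).second) {        // true only for a new key
            result.push_back(v);              // keep the original ordering
        }
    }
    return result;
}
```

With the vectA from the question this keeps {1,3,4} and {1,2,3} and drops {3,2,1}. Because the keys are sorted vectors rather than sets, {1,1,2} and {1,2} are still treated as different.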
What you are describing is the behavior of std::set, ie. this solves your problem:
set<set<int>> input = {{1,3,4}, {1,2,3}, {3,2,1}};
// input is now {{1,2,3},{1,3,4}}
This works because a set is basically equal to a sorted vector with no duplicates.
If you really want to, you can now convert to std::vector:
vector<vector<int>> nums;
for(auto & s : input) nums.emplace_back(s.begin(), s.end());

How do I use an iterator on an ifstream twice in C++?

I'm new to C++ and I'm confused about using iterators with ifstream. In this following code I have an ifstream variable called dataFile.
In the code I first iterate through the file once to count how many characters it has (is there a more efficient way to do this?). Then I create a matrix of that size, and iterate through again to fill the matrix.
The problem is that the iterator refuses to iterate the second time around, and will not do anything. I tried resetting the ifstream from the beginning by using dataFile.clear(), but this didn't work, probably because I have some deep misunderstanding about iterators. Could someone help me please?
typedef istreambuf_iterator<char> dataIterator;
for (dataIterator counter(dataFile), end; counter != end; ++counter, ++numCodons) {} // Finds file size in characters.
MatrixXd YMatrix = MatrixXd::Constant(3, numCodons, 0);
dataFile.clear(); // Resets the ifstream to be used again.
for (dataIterator counter(dataFile), end; counter != end; ++counter) {...}
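istreambuf_iterator is an input iterator: once it has been incremented, all copies of its previous value may be invalidated. It is not a forward iterator, which would guarantee validity when used in multipass algorithms. Constructing a second iterator on the same stream therefore does not start over; it continues from the stream's current read position, which after the first loop is the end of the file.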
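If the goal is simply to read the stream a second time, one common approach is to move the read position back with seekg after calling clear(). The following is a sketch under that assumption; readTwice is a made-up name, and it takes a std::istream& so the same code works for an ifstream or a string stream.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, used in the usage note below
#include <string>

// Hypothetical helper: reads the stream twice, once to count the
// characters and once to collect them.
std::string readTwice(std::istream& in) {
    using It = std::istreambuf_iterator<char>;

    std::size_t count = 0;
    for (It it(in), end; it != end; ++it) ++count;   // first pass

    in.clear();                   // reset any error flags set by earlier reads
    in.seekg(0, std::ios::beg);   // move the read position back to the start

    std::string contents;
    contents.reserve(count);
    contents.assign(It(in), It());                   // second pass
    return contents;
}
```

For example, given std::istringstream s("ACGTAC"), readTwice(s) returns "ACGTAC". In the question's code, the equivalent change would be to add dataFile.seekg(0, std::ios::beg) after dataFile.clear().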

Remove duplicate strings in string vector

I have the code, listed below, which I am trying to get to remove any duplicate football team names from a string vector. However, it is only working sometimes, it will remove duplicate names for some of the teams; but then for others there will be multiple occurrences of the same team name in the final array.
For example it would print:
aresnal
wigan
villa
liverpool
villa
Notice there are two 'villa' names, could anyone give me a suggestion?
The 'finalLeague' is the array which is storing all of the names, and is the array which needs the duplicates removing out of.
for (int i = 0;i < finalLeague.size();i++)
{
string temp = finalLeague[i];
int h = i + 1;
for (int j = i+1;j < finalLeague.size();j++)
{
if (finalLeague[j] == finalLeague[i])
{
finalLeague.erase(finalLeague.begin()+j);
}
}
}
Sure, you can use a combination of std::sort, std::unique and std::vector::erase:
std::sort(finalLeague.begin(), finalLeague.end());
auto it = std::unique(finalLeague.begin(), finalLeague.end());
finalLeague.erase(it, finalLeague.end());
Alternatively, use a container that does not accept duplicates in the first place:
std::set<std::string> finalLeague; // BST, C++03 and C++11
std::unordered_set<std::string> finalLeague; // hash table, C++11
This can also be done using a hash map. Using #include <unordered_map> will let you use one. Note that it requires C++11.
All you need to do is check whether the string has occurred before and keep pushing unique strings into a new vector.
The advantage of this method is that it needs very little code: a single loop does the trick.
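A minimal sketch of that single loop is below. Note that it swaps in std::unordered_set for the hash map, since only membership is needed here, and that the name uniqueInOrder is invented for the example.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the names with duplicates removed,
// keeping the first occurrence of each name in its original position.
std::vector<std::string> uniqueInOrder(const std::vector<std::string>& names) {
    std::unordered_set<std::string> seen;     // names encountered so far
    std::vector<std::string> result;
    for (const auto& name : names) {
        if (seen.insert(name).second) {       // true only the first time
            result.push_back(name);
        }
    }
    return result;
}
```

Unlike the sort-based approach, this keeps the original order of the names, so finalLeague = uniqueInOrder(finalLeague); would leave the remaining teams in place.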
You should use std::unique:
std::vector<std::string> vec;
// filling vector
// ....
std::vector<std::string>::iterator it;
it = std::unique (vec.begin(), vec.end());
vec.resize(std::distance(vec.begin(),it));
Edit: as @Gorpik said, the vector must be sorted before calling std::unique; otherwise only consecutive equal elements will be removed.

Word Frequency Statistics

In a pre-interview, I was faced with a question like this:
Given a string consisting of words separated by a single white space, print out the words in descending order sorted by the number of times they appear in the string.
For example an input string of “a b b” would generate the following output:
b : 2
a : 1
Firstly, I'd say it is not clear whether the input string is made up of single-letter words or multiple-letter words. If the former is the case, it could be simple.
Here is my thought:
int c[26] = {0};
char *pIn = strIn;
while (*pIn != 0 && *pIn != ' ')
{
++c[*pIn];
++pIn;
}
/* how to sort the array c[26] and remember the original index? */
I can get the statistics of the frequency of every single-letter word in the input string, and I can get it sorted (using QuickSort or whatever). But after the count array is sorted, how do I get the single-letter word associated with each count so that I can print them out in pairs later?
If the input string is made up of multiple-letter words, I plan to use a map<const char *, int> to track the frequency. But again, how do I sort the map's key-value pairs?
The question is in C or C++, and any suggestion is welcome.
Thanks!
I would use a std::map<std::string, int> to store the words and their counts. Then I would use something like this to get the words:
while(std::cin >> word) {
// increment map's count for that word
}
Finally, you just need to figure out how to print them in order of frequency. I'll leave that as an exercise for you.
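For readers who want to check their attempt at that exercise, here is one possible sketch. The name wordsByFrequency is made up, the lambda needs C++14, and copying the map into a vector and sorting it is just one of several ways to order by count.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: counts whitespace-separated words and returns
// (word, count) pairs ordered by descending count.
std::vector<std::pair<std::string, int>> wordsByFrequency(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word];                       // operator[] creates a 0 entry if needed
    }

    // A map cannot be reordered by value, so copy its entries into a vector.
    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());

    // stable_sort keeps the map's alphabetical order among equal counts.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    return ranked;
}
```

For the input "a b b" this yields ("b", 2) followed by ("a", 1); printing each pair as word : count gives the output from the question.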
You're definitely wrong in assuming that you need only 26 options, 'cause your employer will want to allow multiple-character words as well (and maybe even numbers?).
This means you're going to need an array with a variable length. I strongly recommend using a vector or, even better, a map.
To find the character sequences in the string, find your current position (start at 0) and the position of the next space. Then that's the word. Set the current position to the space and do it again. Keep repeating this until you're at the end.
By using the map you'll already have the word/count available.
If the job you're applying for requires university skills, I strongly recommend optimizing the map by adding some kind of hashing function. However, judging by the difficulty of the question I assume that that is not the case.
Taking the C-language case:
I like brute-force, straightforward algos so I would do it in this way:
1. Tokenize the input string to give an unsorted array of words. I'll have to actually, physically move each word (because each is of variable length); and I think I'll need an array of char*, which I'll use as the arg to qsort( ).
2. qsort( ) (descending) that array of words. (In the COMPAR function of qsort(), pretend that bigger words are smaller words so that the array acquires descending sort order.)
3.a. Go through the now-sorted array, looking for subarrays of identical words. The end of a subarray, and the beginning of the next, is signalled by the first non-identical word I see.
3.b. When I get to the end of a subarray (or to the end of the sorted array), I know (1) the word and (2) the number of identical words in the subarray.
4. (EDIT, new step) Save, in another array (call it array2), a char* to a word in the subarray and the count of identical words in the subarray.
5. When there are no more words in the sorted array, I'm done. It's time to print.
6. qsort( ) array2 by word frequency.
7. Go through array2, printing each word and its frequency.
I'M DONE! Let's go to lunch.
The answers prior to mine did not really give a complete answer.
Let us think about a potential solution.
There is a more or less standard approach for counting something in a container.
We can use an associative container like a std::map or a std::unordered_map. Here we associate a "key", in this case the word, with a value, in this case the count of that specific word.
And luckily the maps have a very nice index operator[]. This will look for the given key and, if found, return a reference to the value. If not found, then it will create a new entry with the key and return a reference to the new entry. So, in both cases, we will get a reference to the value used for counting. And then we can simply write:
std::unordered_map<std::string, int> counter{};
counter[word]++;
And that looks really intuitive.
After this operation, you already have the frequency table: either sorted by the key (the word) when using a std::map, or unsorted but faster to access with a std::unordered_map.
Now you want to sort according to the frequency/count. Unfortunately this is not possible with maps.
Therefore we need to use a second container, like a std::vector, which we can then sort using std::sort with any given predicate, or we can copy the values into a container like a std::multiset that implicitly orders its elements.
To get the words out of a std::string we simply use a std::istringstream and the standard extraction operator >>. No big deal at all.
And because writing out all these long names for the std containers is tedious, we create alias names with the using keyword.
After all this, we now write ultra compact code and fulfill the task with just a few lines of code:
#include <iostream>
#include <string>
#include <sstream>
#include <utility>
#include <set>
#include <unordered_map>
#include <type_traits>
#include <iomanip>
// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;
// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;
// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Rank = std::multiset<Pair, Comp>;
// ------------------------------------------------------------
std::istringstream text{ " 4444 55555 1 22 4444 333 55555 333 333 4444 4444 55555 55555 55555 22 "};
int main() {
Counter counter;
// Count
for (std::string word{}; text >> word; counter[word]++);
// Sort
Rank rank(counter.begin(), counter.end());
// Output
for (const auto& [word, count] : rank) std::cout << std::setw(15) << word << " : " << count << '\n';
}

string vector not getting properly assigned using set_union

I think I'm lacking some basic understanding of assignment in C/C++ here! I have a function that computes the set union between two string vectors. The reason I do this is because the algorithm library's function set_union requires that both vectors are sorted first and if I do it the following way then I can't forget to sort:
vector<string> SetOperations::my_set_union(vector<string> set1,
vector<string> set2) {
sort(set1.begin(), set1.end());
sort(set2.begin(), set2.end());
vector<string> v;
set_union(set1.begin(), set1.end(), set2.begin(), set2.end(), back_inserter(v));
return v;
}
I then do the following:
vector<string> vec = set_ops.my_set_union(vec1, vec2);
where vec1 and vec2 are string vectors containing a single "a" and "a" each and set_ops is an instantiation of a class that I have these set operations in (like the one above). They both definitely have these elements - I have printed the two vectors out.
For some (simple?) reason, vec ends up having a single element of "a" instead of two elements ("a" and "a").
Any ideas what I'm doing wrong? Am I meant to use a copy function or something?
Thank you :).
You have to use merge instead of set_union. set_union keeps only one copy of an element that appears in both inputs.
See the merge and set_union references.
I think you've misunderstood what set_union is supposed to do.
It sounds like you want std::merge instead.
If I remember my set theory well, that is what the union of two sets is. So it's expected behavior.
The reason is that a set cannot have duplicate elements. Since the union of two sets also produces a valid set, then it will only have a single "a" value.
This is desired behavior, as mathematical sets do not contain duplicates. When you call set_union, it should remove duplicate elements (in this case your two cases of "a"). Try it on two vectors containing (respectively) ("a", "b") and ("a", "c"). You should get ("a", "b", "c") back, with just one "a".
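To see the difference side by side, here is a small sketch that wraps both algorithms the same way as the question's my_set_union. The alias Strings and the names unionOf and mergeOf are made up for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using Strings = std::vector<std::string>;

// Both helpers take their arguments by value so they can sort local copies.

// Set union: an element present in both inputs appears once in the result.
Strings unionOf(Strings a, Strings b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    Strings out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Merge: every element of both inputs is kept, in sorted order.
Strings mergeOf(Strings a, Strings b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    Strings out;
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

With vec1 and vec2 each holding a single "a", unionOf returns one "a" while mergeOf returns two. For ("a", "b") and ("a", "c"), unionOf returns ("a", "b", "c") and mergeOf returns ("a", "a", "b", "c").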