How can I use this for an arbitrary number of substrings? My approach is hard-coded to 2 substrings. Also, is there a better way with lower time complexity? There may also be a loophole here that needs fixing regarding the num argument I am passing, as I am not using the num parameter at all.
Your current approach has a few problems, including a hard-coded maximum number of ngrams, and the fixed ngram size. In addition, your short variable names and lack of comments do not help explain the code to whoever is reading it.
A simpler solution is to use a std::map to count the number of times each ngram occurs, and then find the one with the highest count. That would give roughly O(N log N) time complexity. Alternatively, std::unordered_map would be closer to linear time complexity.
There will of course be an edge case where more than one ngram occurs with the same highest count. You would need to decide which of a variety of strategies should be used to resolve that. In my example, I take advantage of the intrinsic ordering of std::map to select the ngram with the lowest sort order. If using std::unordered_map, you'd need a different strategy for resolving ties in a deterministic way.
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
std::string ngram(const std::string &input, int num)
{
if (num <= 0 || num > input.size()) return "";
// Count ngrams of size 'num'
std::map<std::string, int> ngram_count;
for(size_t i = 0; i <= input.size() - num; i++)
{
++ngram_count[input.substr(i, num)];
}
// Select ngram with highest count
std::map<std::string, int>::iterator highest = std::max_element(
ngram_count.begin(), ngram_count.end(),
[](const std::pair<std::string, int>& a, const std::pair<std::string, int>& b)
{
return a.second < b.second;
});
// Return ngram with highest count, otherwise empty string
return highest != ngram_count.end() ? highest->first : "";
}
int main()
{
std::cout << ngram("engineering", 2) << std::endl;
std::cout << ngram("engineering", 3) << std::endl;
return 0;
}
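The unordered_map alternative mentioned above could look roughly like this. This is only a sketch: the function name most_frequent_ngram is made up for illustration, and breaking ties by taking the lexicographically smallest ngram is an assumption chosen to match what the std::map version returns.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: returns the most frequent ngram of length n,
// or an empty string if no ngram of that length exists.
std::string most_frequent_ngram(const std::string& input, std::size_t n)
{
    if (n == 0 || n > input.size()) return "";

    // Count every ngram of length n (average O(1) per lookup).
    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i + n <= input.size(); ++i)
        ++counts[input.substr(i, n)];

    // Pick the highest count. Iteration order of unordered_map is
    // unspecified, so ties are resolved explicitly (assumption:
    // the lexicographically smallest ngram wins).
    std::string best;
    int best_count = 0;
    for (const auto& entry : counts) {
        if (entry.second > best_count ||
            (entry.second == best_count && entry.first < best)) {
            best = entry.first;
            best_count = entry.second;
        }
    }
    return best;
}
```

With this tie-breaking rule, most_frequent_ngram("engineering", 2) yields "in", the same result as the std::map version.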
I did it a bit differently than paddy, so I thought I would post it. This version uses std::set. He explains the issue, so he should get the credit for the answer.
#include <iostream>
#include <set>
#include <string>
struct test {
test(const std::string& str) :val(str), cnt(0) {}
test(const test& thet) { *this = thet; }
std::string val;
int cnt;
friend bool operator < (const test& a, const test& b) { return a.val < b.val; }
};
using test_set_type = std::set<test>;
const test ngram(std::string A, int num) {
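// No ngram of that size exists; avoids iterating past the end of A
if (num <= 0 || A.size() < static_cast<std::size_t>(num)) return test("");
test_set_type set;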
for (auto it = A.begin(); it < A.end() - num + 1; ++it)
{
auto found = set.find(std::string(it, it + num));
if (found != set.end())
++const_cast<test&>(*found).cnt;
else
set.insert(std::string(it, it + num));
}
int find = -1;
test_set_type::iterator high = set.begin();
for (auto it = set.begin(); it != set.end(); ++it)
if(it->cnt > find)
find = it->cnt, high = it; // remember the highest count seen so far
return *high;
}
int main() {
int num = 2;
std::string word("engineering");
std::cout << ngram(word, num).val << std::endl;
return 0;
}
I have been given a vector:
vector<string> inputArray = { "aba","aa","ad","vcd","aba" };
and I want to return a vector that contains only the strings with the longest length; in this case I want to return only {"aba","vcd","aba"}. So for now I want to erase the elements whose length is not equal to the longest:
vector<string> allLongestStrings(vector<string> inputArray) {
int length = inputArray.size();
int longstring = inputArray[0].length();
int count = 0;
vector<string> result;
for (int i = 0; i < length; i++)
{
if (longstring < inputArray[i].length())
{
longstring = inputArray[i].length();
}
count++;
}
for (int = 0; i<count;i++)
{
if (inputArray[i].length() != longstring)
{
inputArray[i].erase(inputArray.begin() + i);
count--;
i--;
}
}
return inputArray;
}
But I get this error: no instance of overloaded function "std::basic_string<_Elem,_Traits,_Alloc>::erase [with _Elem=char, _Traits=std::char_traits<char>, _Alloc=std::allocator<char>]" matches the argument list, on this line: inputArray[i].erase(inputArray.begin()+i);
What's wrong?
There are other problems, but this specific compiler message is telling you that's not the right way to remove specific character(s) from a string.
However, reading the question in the OP, we see that you wanted to remove a string from a vector. To fix that one specific error, simply change
inputArray[i].erase( /*character position(s) in the string*/ )
to
inputArray.erase( /*some position in the array*/ )
Or you could fix it so it uses an iterator in the string denoted by inputArray[i] to actually delete characters from that string, which of course isn't what you said you wanted to do. The point is, the error message is because you're using the wrong iterator type because you think that you're working with a vector, but you actually told it to work with a string that you got out of the vector.
And then you will compile and have other issues which are well covered in comments already.
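Applied to the function from the question, that change might look like the sketch below. It keeps the asker's allLongestStrings signature; the iterator-based loop is an assumption about how to sidestep the index bookkeeping issues, not the only possible fix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> allLongestStrings(std::vector<std::string> inputArray)
{
    // First pass: find the longest length.
    std::size_t longest = 0;
    for (const std::string& s : inputArray)
        if (s.length() > longest)
            longest = s.length();

    // Second pass: erase from the vector (not from the string).
    for (auto it = inputArray.begin(); it != inputArray.end(); ) {
        if (it->length() != longest)
            it = inputArray.erase(it);  // erase returns the next valid iterator
        else
            ++it;
    }
    return inputArray;
}
```

Because vector::erase returns an iterator to the element after the erased one, there is no need to adjust a counter or step an index back.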
The issue with inputArray[i].erase(inputArray.begin() + i); can be fixed as shown in Kenny Ostrom's answer.
I'd like to point out that the OP could make use of the erase-remove idiom or even create a new vector with only the bigger strings instead (the posted code is already copying the source vector).
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
template <typename InputIt>
auto only_the_longest_of(InputIt first, InputIt last)
{
using value_type = typename std::iterator_traits<InputIt>::value_type;
std::vector<value_type> result;
// find the longest size
auto longest = std::max_element(first, last,
[](value_type const &a, value_type const &b) {
return a.size() < b.size();
});
if ( longest == last )
return result;
// extract only the longest ones, instead of erasing
std::copy_if( first, last, std::back_inserter(result)
, [max_size = longest->size()] (value_type const& v) {
return v.size() >= max_size;
});
return result;
}
template <typename T>
auto erase_the_shortest_from(std::vector<T> &input)
{
// find the longest size
auto longest = std::max_element(input.cbegin(), input.cend(),
[](T const &a, T const &b) {
return a.size() < b.size();
});
if ( longest == input.cend() || longest->size() == 0 )
return input.end();
// implement erase-remove idiom
return input.erase(std::remove_if(
input.begin(), input.end(), [max_size = longest->size()] (T const &v) {
return v.size() < max_size;
}), input.end()); // erase the whole tail left behind by remove_if
}
int main()
{
std::vector<std::string> test = {
"aba", "aa", "ad", "vcd", "aba"
};
// The original vector remains unchanged
auto result = only_the_longest_of(test.cbegin(), test.cend());
for (auto const& str : result)
std::cout << str << '\n';
std::cout << '\n';
// This will change the vector
erase_the_shortest_from(test);
for (auto const& str : test)
std::cout << str << '\n';
}
I have several std::vector, all of the same length. I want to sort one of these vectors, and apply the same transformation to all of the other vectors. Is there a neat way of doing this? (preferably using the STL or Boost)? Some of the vectors hold ints and some of them std::strings.
Pseudo code:
std::vector<int> Index = { 3, 1, 2 };
std::vector<std::string> Values = { "Third", "First", "Second" };
Transformation = sort(Index);
Index is now { 1, 2, 3};
... magic happens as Transformation is applied to Values ...
Values are now { "First", "Second", "Third" };
friol's approach is good when coupled with yours. First, build a vector consisting of the numbers 0…n-1, along with the elements from the vector dictating the sorting order:
typedef vector<int>::const_iterator myiter;
vector<pair<size_t, myiter> > order(Index.size());
size_t n = 0;
for (myiter it = Index.begin(); it != Index.end(); ++it, ++n)
order[n] = make_pair(n, it);
Now you can sort this array using a custom sorter:
struct ordering {
bool operator ()(pair<size_t, myiter> const& a, pair<size_t, myiter> const& b) {
return *(a.second) < *(b.second);
}
};
sort(order.begin(), order.end(), ordering());
Now you've captured the order of rearrangement inside order (more precisely, in the first component of the items). You can now use this ordering to sort your other vectors. There's probably a very clever in-place variant running in the same time, but until someone else comes up with it, here's one variant that isn't in-place. It uses order as a look-up table for the new index of each element.
template <typename T>
vector<T> sort_from_ref(
vector<T> const& in,
vector<pair<size_t, myiter> > const& reference
) {
vector<T> ret(in.size());
size_t const size = in.size();
for (size_t i = 0; i < size; ++i)
ret[i] = in[reference[i].first];
return ret;
}
typedef std::vector<int> int_vec_t;
typedef std::vector<std::string> str_vec_t;
typedef std::vector<size_t> index_vec_t;
class SequenceGen {
public:
SequenceGen (int start = 0) : current(start) { }
int operator() () { return current++; }
private:
int current;
};
class Comp{
int_vec_t& _v;
public:
Comp(int_vec_t& v) : _v(v) {}
bool operator()(size_t i, size_t j){
return _v[i] < _v[j];
}
};
index_vec_t indices(3);
std::generate(indices.begin(), indices.end(), SequenceGen(0));
//indices are {0, 1, 2}
int_vec_t Index = { 3, 1, 2 };
str_vec_t Values = { "Third", "First", "Second" };
std::sort(indices.begin(), indices.end(), Comp(Index));
//now indices are {1,2,0}
Now you can use the "indices" vector to index into "Values" vector.
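A minimal sketch of that last step, assuming C++11. The helper names sorted_indices and apply_permutation are hypothetical, and std::iota is used here in place of the SequenceGen functor.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: indices 0..n-1 ordered so that keys[idx[0]] <= keys[idx[1]] <= ...
template <typename Key>
std::vector<std::size_t> sorted_indices(const std::vector<Key>& keys)
{
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return idx;
}

// Hypothetical helper: returns a copy of 'in' rearranged so that out[i] == in[idx[i]].
// Assumes every entry of idx is a valid position in 'in'.
template <typename T>
std::vector<T> apply_permutation(const std::vector<T>& in,
                                 const std::vector<std::size_t>& idx)
{
    std::vector<T> out;
    out.reserve(idx.size());
    for (std::size_t i : idx)
        out.push_back(in[i]);
    return out;
}
```

The same index vector can be applied to any number of parallel vectors, whatever their element types.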
Put your values in a Boost Multi-Index container then iterate over to read the values in the order you want. You can even copy them to another vector if you want to.
Only one rough solution comes to my mind: create a vector that is the sum of all other vectors (a vector of structures, like {3,Third,...},{1,First,...}) then sort this vector by the first field, and then split the structures again.
Probably there is a better solution inside Boost or using the standard library.
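The "vector of structures" idea could be sketched as follows for the two vectors from the question. The names Row and sort_together are made up for illustration, and the sketch assumes both vectors have the same length (it does nothing otherwise).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record combining one element from each vector.
struct Row {
    int index;
    std::string value;
};

void sort_together(std::vector<int>& index, std::vector<std::string>& values)
{
    if (index.size() != values.size()) return;  // assumption: lengths must match

    // Merge the vectors into one vector of structures.
    std::vector<Row> rows;
    rows.reserve(index.size());
    for (std::size_t i = 0; i < index.size(); ++i)
        rows.push_back(Row{index[i], values[i]});

    // Sort by the first field only.
    std::sort(rows.begin(), rows.end(),
              [](const Row& a, const Row& b) { return a.index < b.index; });

    // Split the structures back into the original vectors.
    for (std::size_t i = 0; i < rows.size(); ++i) {
        index[i] = rows[i].index;
        values[i] = rows[i].value;
    }
}
```

This copies every element twice, so it trades memory and copying for simplicity.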
You can probably define a custom "facade" iterator that does what you need here. It would store iterators to all your vectors or alternatively derive the iterators for all but the first vector from the offset of the first. The tricky part is what that iterator dereferences to: think of something like boost::tuple and make clever use of boost::tie. (If you wanna extend on this idea, you can build these iterator types recursively using templates but you probably never want to write down the type of that - so you either need c++0x auto or a wrapper function for sort that takes ranges)
I think what you really need (but correct me if I'm wrong) is a way to access elements of a container in some order.
Rather than rearranging my original collection, I would borrow a concept from Database design: keep an index, ordered by a certain criterion. This index is an extra indirection that offers great flexibility.
This way it is possible to generate multiple indices according to different members of a class.
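#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <string>
#include <vector>
using namespace std;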
template< typename Iterator, typename Comparator >
struct Index {
vector<Iterator> v;
Index( Iterator from, Iterator end, const Comparator& c ){ // const&: callers pass temporaries
v.reserve( std::distance(from,end) );
for( ; from != end; ++from ){
v.push_back(from); // no deref!
}
sort( v.begin(), v.end(), c );
}
};
template< typename Iterator, typename Comparator >
Index<Iterator,Comparator> index ( Iterator from, Iterator end, const Comparator& c ){
return Index<Iterator,Comparator>(from,end,c);
}
struct mytype {
string name;
double number;
};
template< typename Iter >
struct NameLess : public binary_function<Iter, Iter, bool> {
bool operator()( const Iter& t1, const Iter& t2 ) const { return t1->name < t2->name; }
};
template< typename Iter >
struct NumLess : public binary_function<Iter, Iter, bool> {
bool operator()( const Iter& t1, const Iter& t2 ) const { return t1->number < t2->number; }
};
void indices() {
mytype v[] = { { "me" , 0.0 }
, { "you" , 1.0 }
, { "them" , -1.0 }
};
mytype* vend = v + sizeof(v) / sizeof(v[0]); // portable replacement for MSVC's _countof
Index<mytype*, NameLess<mytype*> > byname( v, vend, NameLess<mytype*>() );
Index<mytype*, NumLess <mytype*> > bynum ( v, vend, NumLess <mytype*>() );
assert( byname.v[0] == v+0 );
assert( byname.v[1] == v+2 );
assert( byname.v[2] == v+1 );
assert( bynum.v[0] == v+2 );
assert( bynum.v[1] == v+0 );
assert( bynum.v[2] == v+1 );
}
A slightly more compact variant of xtofl's answer, for when you are just looking to iterate through all your vectors based on the order of a single keys vector. Create a permutation vector and use this to index into your other vectors.
#include <boost/iterator/counting_iterator.hpp>
#include <vector>
#include <algorithm>
#include <iostream>
std::vector<double> keys = ...
std::vector<double> values = ...
std::vector<size_t> indices(boost::counting_iterator<size_t>(0u), boost::counting_iterator<size_t>(keys.size()));
std::sort(begin(indices), end(indices), [&](size_t lhs, size_t rhs) {
return keys[lhs] < keys[rhs];
});
// Now to iterate through the values array.
for (size_t i: indices)
{
std::cout << values[i] << std::endl;
}
ltjax's answer is a great approach - which is actually implemented in boost's zip_iterator http://www.boost.org/doc/libs/1_43_0/libs/iterator/doc/zip_iterator.html
It packages together into a tuple whatever iterators you provide it.
You can then create your own comparison function for a sort based on any combination of iterator values in your tuple. For this question, it would just be the first iterator in your tuple.
A nice feature of this approach is that it allows you to keep the memory of each individual vector contiguous (if you're using vectors and that's what you want). You also don't need to store a separate index vector of ints.
This would have been an addendum to Konrad's answer, as it is an approach for an in-place variant of applying the sort order to a vector. Since the edit won't go through, I will put it here.
Here is an in-place variant with a slightly higher time complexity, due to the primitive operation of checking a boolean. The additional space complexity is that of a vector<bool>, which can be a space-efficient, compiler-dependent implementation. The extra vector<bool> can be eliminated if the given order itself can be modified. This is an example of what the algorithm is doing.
If the order is 3 0 4 1 2, the movement of the elements as indicated by the position indices would be 3--->0; 0--->1; 1--->3; 2--->4; 4--->2.
template<typename T>
struct applyOrderinPlace
{
void operator()(const vector<size_t>& order, vector<T>& vectoOrder)
{
vector<bool> indicator(order.size(),0);
size_t start = 0, cur = 0, next = order[cur];
size_t indx = 0;
T tmp;
while(indx < order.size())
{
//find unprocessed index
if(indicator[indx])
{
++indx;
continue;
}
start = indx;
cur = start;
next = order[cur];
tmp = vectoOrder[start];
while(next != start)
{
vectoOrder[cur] = vectoOrder[next];
indicator[cur] = true;
cur = next;
next = order[next];
}
vectoOrder[cur] = tmp;
indicator[cur] = true;
}
}
};
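For comparison, the same cycle-following idea can be written with std::swap, which avoids the temporary and the requirement that T be default-constructible. This is a sketch, not the author's code: the name apply_order_in_place is made up, and it assumes order is a valid permutation of 0…n-1 with the same length as the data.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Rearranges 'data' so that afterwards data[i] holds the element that was
// originally at position order[i].
// Assumption: 'order' is a permutation of 0..n-1 and order.size() == data.size().
template <typename T>
void apply_order_in_place(const std::vector<std::size_t>& order, std::vector<T>& data)
{
    std::vector<bool> done(order.size(), false);
    for (std::size_t start = 0; start < order.size(); ++start) {
        if (done[start]) continue;  // already placed as part of an earlier cycle

        // Walk the cycle beginning at 'start', pulling each element into place.
        std::size_t cur = start;
        while (order[cur] != start) {
            std::swap(data[cur], data[order[cur]]);
            done[cur] = true;
            cur = order[cur];
        }
        done[cur] = true;
    }
}
```

With order 3 0 4 1 2 and data a b c d e, this leaves d a e b c, which matches the movements listed above.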
Here is a relatively simple implementation using index mapping between the ordered and unordered names that will be used to match the ages to the ordered names:
void ordered_pairs()
{
std::vector<std::string> names;
std::vector<int> ages;
// read input and populate the vectors
populate(names, ages);
// print input
print(names, ages);
// sort pairs
std::vector<std::string> sortedNames(names);
std::sort(sortedNames.begin(), sortedNames.end());
std::vector<int> indexMap;
for(unsigned int i = 0; i < sortedNames.size(); ++i)
{
for (unsigned int j = 0; j < names.size(); ++j)
{
if (sortedNames[i] == names[j])
{
indexMap.push_back(j);
break;
}
}
}
// use the index mapping to match the ages to the names
std::vector<int> sortedAges;
for(size_t i = 0; i < indexMap.size(); ++i)
{
sortedAges.push_back(ages[indexMap[i]]);
}
std::cout << "Ordered pairs:\n";
print(sortedNames, sortedAges);
}
For the sake of completeness, here are the functions populate() and print():
void populate(std::vector<std::string>& n, std::vector<int>& a)
{
std::string prompt("Type name and age, separated by white space; 'q' to exit.\n>>");
std::string sentinel = "q";
while (true)
{
// read input
std::cout << prompt;
std::string input;
getline(std::cin, input);
// exit input loop
if (input == sentinel)
{
break;
}
std::stringstream ss(input);
// extract input
std::string name;
int age;
if (ss >> name >> age)
{
n.push_back(name);
a.push_back(age);
}
else
{
std::cout <<"Wrong input format!\n";
}
}
}
and:
void print(const std::vector<std::string>& n, const std::vector<int>& a)
{
if (n.size() != a.size())
{
std::cerr <<"Different number of names and ages!\n";
return;
}
for (unsigned int i = 0; i < n.size(); ++i)
{
std::cout <<'(' << n[i] << ", " << a[i] << ')' << "\n";
}
}
And finally, main() becomes:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
void ordered_pairs();
void populate(std::vector<std::string>&, std::vector<int>&);
void print(const std::vector<std::string>&, const std::vector<int>&);
//=======================================================================
int main()
{
std::cout << "\t\tSimple name - age sorting.\n";
ordered_pairs();
}
//=======================================================================
// Function Definitions...
// C++ program to demonstrate sorting a vector
// of pairs according to the 2nd element of each pair
#include <algorithm>
#include <cstdio>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
// Comparison function to sort the vector elements
// by second element of pairs
bool sortbysec(const pair<char,int> &a,
               const pair<char,int> &b)
{
    return (a.second < b.second);
}
int main()
{
    // declaring vector of pairs
    vector< pair <char, int> > vect;
    // Initialising 1st and 2nd element of pairs
    // with array values
    //int arr[] = {10, 20, 5, 40 };
    //int arr1[] = {30, 60, 20, 50};
    char arr[] = { 'a', 'b', 'c' };
    int arr1[] = { 4, 7, 1 };
    int n = sizeof(arr)/sizeof(arr[0]);
    // Entering values in vector of pairs
    for (int i=0; i<n; i++)
        vect.push_back( make_pair(arr[i],arr1[i]) );
    // Printing the original vector(before sort())
    cout << "The vector before sort operation is:\n" ;
    for (int i=0; i<n; i++)
    {
        // "first" and "second" are used to access
        // 1st and 2nd element of pair respectively
        cout << vect[i].first << " "
             << vect[i].second << endl;
    }
    // Using sort() function to sort by 2nd element
    // of pair
    sort(vect.begin(), vect.end(), sortbysec);
    // Printing the sorted vector(after using sort())
    cout << "The vector after sort operation is:\n" ;
    for (int i=0; i<n; i++)
    {
        // "first" and "second" are used to access
        // 1st and 2nd element of pair respectively
        cout << vect[i].first << " "
             << vect[i].second << endl;
    }
    getchar();
    return 0;
}
With C++14 generic lambdas and the STL algorithms, based on answers from Konrad Rudolph and Gabriele D'Antona:
template< typename T, typename U >
std::vector<T> sortVecAByVecB( std::vector<T> & a, std::vector<U> & b ){
// zip the two vectors (A,B)
std::vector<std::pair<T,U>> zipped(a.size());
for( size_t i = 0; i < a.size(); i++ ) zipped[i] = std::make_pair( a[i], b[i] );
// sort according to B
std::sort(zipped.begin(), zipped.end(), []( auto & lop, auto & rop ) { return lop.second < rop.second; });
// extract sorted A
std::vector<T> sorted;
std::transform(zipped.begin(), zipped.end(), std::back_inserter(sorted), []( auto & pair ){ return pair.first; });
return sorted;
}
So many asked this question and nobody came up with a satisfactory answer. Here is a std::sort helper that makes it possible to sort two vectors simultaneously, taking into account the values of only one vector. This solution is based on a custom RandomIt (random-access iterator) and operates directly on the original vector data, without temporary copies, structure rearrangement or additional indices:
C++, Sort One Vector Based On Another One
I want to sort the suffixes of a string.
The simplest way to do that is to put all the suffixes into a map.
In order to use memory efficiently, I pass each suffix as (str+i), where str is a char* and i is the position the suffix starts at. However, I found out that the map is not going to sort these suffixes. Here is an example:
typedef std::map < char*, int,Comparator> MapType;
MapType data;
// let's declare some initial values to this map
char* bob=(char* )"Bobs score";
char* marty=(char* ) "Martys score";
data.insert(pair<char*,int>(marty+1,15));
data.insert(pair<char*,int>(bob+1,10));
MapType::iterator end = data.end();
for (MapType::iterator it = data.begin(); it != end; ++it) {
std::cout << "Who(key = first): " << it->first;
std::cout << " Score(value = second): " << it->second << '\n';
}
The output is
Who(key = first): obs score Score(value = second): 10
Who(key = first): artys score Score(value = second): 15
However, strcmp, standard function for comparing strings, works correctly for bob+1 and marty+1. It says marty+1 is less than bob+1.
The map will sort by the address of the char*, not lexicographically. Change the key to a std::string or define a comparator.
EDIT:
It looks as though you have attempted to define a Comparator but the definition of it is not posted. Here is an example:
#include <iostream>
#include <map>
#include <string.h>
struct cstring_compare
{
bool operator()(const char* a_1, const char* a_2) const
{
return strcmp(a_1, a_2) < 0;
}
};
typedef std::map<const char*, int, cstring_compare> cstring_map;
int main()
{
cstring_map m;
m["bcd"] = 1;
m["acd"] = 1;
m["abc"] = 1;
for (cstring_map::iterator i = m.begin(); i != m.end(); i++)
{
std::cout << i->first << "\n";
}
return 0;
}
Output:
abc
acd
bcd
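Applied to the suffix use case from the question, the same comparator idea might look like the sketch below. The names cstring_less, suffix_map_t and build_suffix_map are hypothetical. It assumes the text contains no embedded NUL characters and that the string outlives the map and is not modified, since the keys are pointers into its buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Orders C strings by content rather than by pointer value.
struct cstring_less {
    bool operator()(const char* a, const char* b) const
    {
        return std::strcmp(a, b) < 0;
    }
};

typedef std::map<const char*, std::size_t, cstring_less> suffix_map_t;

// Maps every suffix of 'text' (stored as a pointer, no copy) to its start position.
suffix_map_t build_suffix_map(const std::string& text)
{
    suffix_map_t suffixes;
    const char* str = text.c_str();
    for (std::size_t i = 0; i < text.size(); ++i)
        suffixes[str + i] = i;
    return suffixes;
}
```

Iterating over the returned map visits the suffixes in lexicographic order; for "banana" the first key is "a" (position 5) and the last is "nana" (position 2).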
Define a custom comparator, e.g.:
class compare_char {
public:
bool operator()(const char* lhs, const char* rhs) const { return strcmp(lhs, rhs) < 0; }
};
Define your map using this comparator instead of whatever you currently have. Alternatively, use a map whose key type has a comparison operator that works on values; std::string would serve you better. Currently you have a map using char* as the key, which compares the char* values themselves, i.e. the pointer addresses, not the contents.
You should add the comparer class or function you are using since that is where your error is probably coming from.
There is a slight difference between strcmp and a map comparison function.
strcmp returns 0 if a == b, a negative value if a < b, and a positive value if a > b.
comp returns true if a < b, false otherwise.
A correct way to implement the comparison function is the following:
bool operator() (char* lhs, char* rhs) const
{
return strcmp(lhs,rhs) < 0;
}
I'm having a very odd problem with some code using std::sort. If I replace std::sort by stable_sort the problem goes away.
class Entry
{
public:
Entry() : _date(0), _time(0), _size(0) {}
Entry(unsigned int d, unsigned int t, unsigned int s) : _date(d), _time(t), _size(s) {}
~Entry() {_size=0xfffffffe;}
unsigned int _date, _time, _size;
};
void initialise(std::vector<Entry> &vec)
{
vec.push_back(Entry(0x3f92, 0x9326, 0x1ae));
vec.push_back(Entry(0x3f92, 0x9326, 0x8a54));
//.... + a large number of other entries
}
static bool predicate(const Entry &e1, const Entry &e2)
{
// Sort by date and time, then size
if (e1._date < e2._date )
return true;
if (e1._time < e2._time )
return true;
return e1._size < e2._size;
}
int main (int argc, char * const argv[]) {
using namespace std;
vector<Entry> vec;
initialise(vec);
sort(vec.begin(), vec.end(), predicate);
vector<Entry>::iterator iter;
for (iter=vec.begin(); iter!=vec.end(); ++iter)
cout << iter->_date << ", " << iter->_time <<
", 0x" << hex << iter->_size << endl;
return 0;
}
The idea is that I sort the data first by date and time then by size. However, depending on the data in the vector, I will end up with 0xfffffffe in the size printed out at the end for the first object, indicating that a destroyed object has been accessed, or a seg fault during the sort.
(Xcode 3.2.4 - 64 bit intel target)
Any ideas anyone??
I suspect it has something to do with my predicate, but I can't see for the life of me what it is....!!
This page seems to refer to the same problem:
http://schneide.wordpress.com/2010/11/01/bug-hunting-fun-with-stdsort/
but the reason he gives (that the predicate needs to define a strict weak ordering) seems to be satisfied here...
Your predicate does not satisfy strict weak ordering criteria. Look at your function and ask yourself, what happens if e1's date comes after e2, but e1's time comes before e2?
I think what your predicate really should be is something like this:
static bool predicate(const Entry &e1, const Entry &e2)
{
// Sort by date and time, then size
return e1._date < e2._date ||
(e1._date == e2._date &&
(e1._time < e2._time ||
(e1._time == e2._time && e1._size < e2._size)));
}
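To make the failure concrete, here is a small sketch that reproduces the predicate from the question on a simplified Entry (just the three fields, an assumption for brevity) and checks whether it claims both a < b and b < a for the same pair. The name violates_asymmetry is made up for illustration.

```cpp
// Simplified stand-in for the Entry class from the question.
struct Entry {
    unsigned int _date, _time, _size;
};

// The predicate from the question, reproduced unchanged in logic.
bool broken_predicate(const Entry& e1, const Entry& e2)
{
    if (e1._date < e2._date)
        return true;
    if (e1._time < e2._time)
        return true;
    return e1._size < e2._size;
}

// A strict weak ordering may never say both a < b and b < a.
bool violates_asymmetry(const Entry& a, const Entry& b)
{
    return broken_predicate(a, b) && broken_predicate(b, a);
}
```

For a = {2, 1, 0} and b = {1, 5, 0}, the time check makes a "less than" b while the date check makes b "less than" a, so violates_asymmetry(a, b) is true.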
With what you wrote, if e1._date > e2._date, the first condition will be false, but the second may still be true, and the function will still claim that e1 < e2, which is probably not what you want.
Your code needs to be:
static bool predicate(const Entry &e1, const Entry &e2)
{
// Sort by date and time, then size
if (e1._date != e2._date )
return e1._date < e2._date;
if (e1._time != e2._time )
return e1._time < e2._time;
return e1._size < e2._size;
}
If e1's date is after e2's, then your version goes on to compare the time and size. This is not what you want. This eventually confuses std::sort, because if you swap e1 and e2 you will not get a consistent answer.
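The same date, then time, then size ordering can also be expressed with std::tie, assuming C++11 is available. This is a sketch rather than part of the answers above; Entry is reduced to its three fields for brevity.

```cpp
#include <tuple>

// Simplified stand-in for the Entry class from the question.
struct Entry {
    unsigned int _date, _time, _size;
};

// Lexicographic comparison: date first, then time, then size.
// std::tie builds tuples of references, and tuple's operator<
// compares them element by element, which is a strict weak ordering.
bool predicate(const Entry& e1, const Entry& e2)
{
    return std::tie(e1._date, e1._time, e1._size) <
           std::tie(e2._date, e2._time, e2._size);
}
```

Adding another sort key then only requires adding the field to both std::tie calls.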