Faster way to sort an array of structs in C++

I have a struct called count with two members: an int called frequency and a string called word. To simplify, my program takes in a book as a text file and counts how many times each word appears. I have an array of these structs, and my program already counts each occurrence of a word; now I want a faster way to sort the array by frequency, highest first, than the way I have below. I used the bubble sort algorithm below, but it makes my code take far too long to run. Any other suggestions or help would be welcome! I have looked up sort from the algorithm library but don't understand how I would use it here. I am new to C++, so lots of explanation on how to use sort would help a lot.
void sortArray(struct count array[], int size)
{
    int cur_pos = 0;
    std::string the_word;
    bool flag = true;
    // Stop early once a full pass makes no swaps (flag stays false).
    for (int i = 0; i < size - 1 && flag; i++)
    {
        flag = false;
        // j + 1 must stay below size; the last i elements are already in place.
        for (int j = 0; j < size - 1 - i; j++)
        {
            if ((array[j+1].frequency) > (array[j].frequency))
            {
                cur_pos = array[j].frequency;
                the_word = array[j].word;
                array[j].frequency = array[j+1].frequency;
                array[j].word = array[j+1].word;
                array[j+1].frequency = cur_pos;
                array[j+1].word = the_word;
                flag = true;
            }
        }
    }
}

You just need to define operator< for your struct (or pass a comparison function)
and use std::sort; see the example at:
http://en.wikipedia.org/wiki/Sort_%28C%2B%2B%29
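A minimal sketch of what that looks like for the word-count array. It passes a named comparison function to std::sort instead of defining operator<, because "less" here really means "more frequent". The struct is renamed WordCount (an assumption: with `using namespace std;` a type called `count` can be reported as ambiguous with std::count), and byFrequencyDesc and sortByFrequency are hypothetical names. The alphabetical tie-break is an addition the question did not ask for.

```cpp
#include <algorithm>
#include <string>

// Stand-in for the question's `count` struct (renamed, see above).
struct WordCount
{
    int frequency;
    std::string word;
};

// The comparison std::sort uses: return true when a must come before b.
// Higher frequency first; equal frequencies fall back to alphabetical order.
bool byFrequencyDesc(const WordCount& a, const WordCount& b)
{
    if (a.frequency != b.frequency)
        return a.frequency > b.frequency;
    return a.word < b.word;
}

// std::sort works on a half-open range [first, last); for a plain array,
// the pointers `array` and `array + size` serve as the iterators.
void sortByFrequency(WordCount array[], int size)
{
    std::sort(array, array + size, byFrequencyDesc);
}
```

std::sort performs O(n log n) comparisons, versus O(n^2) for bubble sort, which is where the speed-up on a whole book comes from. With C++11 or later the comparison can also be written inline as a lambda, e.g. `[](const WordCount& a, const WordCount& b) { return a.frequency > b.frequency; }`.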

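After you have created a (frequency, word) pair for each entry in the data set, you can use a std::multimap as the container and insert the pairs into it. If you want to sort by frequency, highest first, declare it as follows (a plain std::map would keep only one word per frequency, because its keys must be unique):
std::multimap<int, std::string, std::greater<int> > myMap;
myMap.insert(std::make_pair(frequency, word));
std::multimap is typically implemented as a balanced binary search tree, so you get the data sorted by key when you iterate over it.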
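A self-contained sketch of the container approach. It uses std::multimap (a plain std::map would drop words that share a frequency) with std::greater<int> so iteration starts at the highest frequency. The WordCount struct, the ByFrequency typedef and buildByFrequency are hypothetical names standing in for the question's data.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's struct.
struct WordCount
{
    int frequency;
    std::string word;
};

// Keyed on frequency; std::greater<int> orders keys from highest to lowest.
// A multimap keeps every word, even when several share a frequency.
typedef std::multimap<int, std::string, std::greater<int> > ByFrequency;

ByFrequency buildByFrequency(const std::vector<WordCount>& counts)
{
    ByFrequency m;
    for (std::size_t i = 0; i < counts.size(); ++i)
        m.insert(std::make_pair(counts[i].frequency, counts[i].word));
    return m;
}
```

Iterating from m.begin() to m.end() then visits the words from most to least frequent; since C++11, words with equal frequency come out in the order they were inserted.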

Related

Sorting a vector of structures based on one of the elements

I was writing a program to input the marks of n students in four subjects and then find the rank of one of them based on the total scores (from codeforces.com: https://codeforces.com/problemset/problem/1017/A). I thought storing the marks in a structure would help keep track of the various subjects.
Now, what I did is simply implement a bubble sort on the vector while checking the total value. I want to know, is there a way that I can sort the vector based on just one of the members of the struct using std::sort()? Also, how do we make it descending?
Here is what the code looks like right now:
//The Structure
struct scores
{
int eng, ger, mat, his, tot, rank;
bool tommyVal;
};
//The Sort (present inside the main function)
bool sorted = false;
while (!sorted)
{
sorted = true;
for (int i = 0; i < n-1; i++)
{
if (stud[i].tot < stud[i + 1].tot)
{
std::swap(stud[i], stud[i + 1]);
sorted = false;
}
}
}
Just in case you're interested, I need to find the rank of a student named Thomas. So, for that, I set the value of tommyVal true for his element, while I set it as false for the others. This way, I can easily locate Thomas' marks even though his location in the vector has changed after sorting it based on their total marks.
Also nice to know that std::swap() works for swapping entire structs as well. I wonder what other data structures it can swap.
std::sort() allows you to give it a predicate so you can perform comparisons however you want, e.g.:
std::sort(
stud.begin(),
stud.begin()+n, // <-- use stud.end() instead if n == stud.size() ...
[](const scores &a, const scores &b){ return a.tot < b.tot; }
);
Simply use return b.tot < a.tot to reverse the sorting order.
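A fuller sketch of the ranking part of the question, reusing the `scores` struct: it sorts by `tot` in descending order and then locates the element with `tommyVal` set. It swaps in std::stable_sort instead of std::sort, so students with equal totals keep their input order (std::sort makes no such promise), and `thomasRank` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

struct scores
{
    int eng, ger, mat, his, tot, rank;
    bool tommyVal;
};

// Highest total first; equal totals keep their original relative order.
void sortByTotalDesc(std::vector<scores>& stud)
{
    std::stable_sort(stud.begin(), stud.end(),
                     [](const scores& a, const scores& b) { return b.tot < a.tot; });
}

// Sorts, then returns Thomas's 1-based position, or 0 if no element has
// tommyVal set.
int thomasRank(std::vector<scores>& stud)
{
    sortByTotalDesc(stud);
    auto it = std::find_if(stud.begin(), stud.end(),
                           [](const scores& s) { return s.tommyVal; });
    if (it == stud.end())
        return 0;
    return static_cast<int>(it - stud.begin()) + 1;
}
```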

2D Vector - Remove Rows by search

I'm quite new to vector and need some additional help with regards to vector manipulation.
I've currently created a global StringArray Vector that is populated by string values from a text file.
typedef std::vector<std::string> StringArray;
std::vector<StringArray> array1;
I've created a function called "Remove" which takes the input from the user and will eventually compare the input against the first value in the array to see whether it's a match. If it is, the entire row will then be deleted and all rows beneath the deleted row will be "shuffled up" a position to fill the gap.
The populated array looks like this:
Test1 Test2 Test3
Cat1 Cat2 Cat3
Dog1 Dog2 Dog3
And the remove function looks like this:
void remove()
{
string input;
cout << "Enter the search criteria";
cin >> input;
I know that I will need a loop to iterate through the array and compare each element with the input value and check whether it's a match.
I think this will look like:
for (int i = 0; i < array1.size(); i++)
{
for (int j = 0; j < array1[i].size(); j++)
{
if (array1[i] = input)
//Remove row code goes here
}
}
But that's as far as I understand. I'm not really sure A) if that loop is correct and B) how I would go about deleting the entire row (not just the element found). Would I need to copy across the array1 to a temp vector, missing out the specified row, and then copying back across to the array1?
I ultimately want the user to input "Cat1" for example, and then my array1 to end up being:
Test1 Test2 Test3
Dog1 Dog2 Dog3
All help is appreciated. Thank you.
So your loop is almost there. You're correct in using one index i to loop through the outer vector and then using another index j to loop through the inner vectors. You need to use j in order to get a string to compare to the input. Also, you need to use == inside your if statement for comparison.
for (int i = 0; i < array1.size(); i++)
{
for (int j = 0; j < array1[i].size(); j++)
{
if (array1[i][j] == input)
{
    // Remove row code goes here
}
}
}
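Then, removing a row is the same as removing any vector element, i.e. calling array1.erase(array1.begin() + i); (see How do I erase an element from std::vector<> by index?). After erasing, break out of the inner loop and do not increment i, because the next row has moved into position i.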
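Putting those pieces together, a minimal sketch of the explicit-loop version. It takes the rows and the search string as parameters (an assumption, in place of the global array1 and reading from cin), and `removeRows` is a hypothetical name. Because erase() shifts the next row into position i, i is only advanced when a row is kept.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> StringArray;

// Erases every row that contains `input` anywhere in it.
void removeRows(std::vector<StringArray>& array1, const std::string& input)
{
    for (std::size_t i = 0; i < array1.size(); )
    {
        bool found = false;
        for (std::size_t j = 0; j < array1[i].size(); j++)
        {
            if (array1[i][j] == input)
            {
                found = true;
                break;          // no need to look at the rest of this row
            }
        }
        if (found)
            array1.erase(array1.begin() + i);   // later rows shift up by one
        else
            i++;
    }
}
```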
Use std::list<StringArray> array1;
Erasing an item from a std::vector is less efficient, as it has to move all the elements that follow it.
The list object will allow you to remove an item (a row) from the list without needing to move the remaining rows up. It is a linked list, so it won't allow random access using the [] operator.
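A sketch of the same removal on a std::list, assuming the StringArray typedef from the question. The member function std::list::remove_if unlinks the matching nodes in place, so the other rows are never moved; finding the row is still a linear scan. `removeRowsFromList` is a hypothetical name, and the lambda needs C++11.

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <vector>

typedef std::vector<std::string> StringArray;

// Removes every row that contains `target`; only the matching list nodes
// are unlinked, the remaining rows stay where they are.
void removeRowsFromList(std::list<StringArray>& rows, const std::string& target)
{
    rows.remove_if([&](const StringArray& row) {
        return std::find(row.begin(), row.end(), target) != row.end();
    });
}
```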
You can use explicit loops, but you can also use already implemented loops available in the standard library.
void removeTarget(std::vector<StringArray>& data,
const std::string& target) {
data.erase(
std::remove_if(data.begin(), data.end(),
[&](const StringArray& x) {
return std::find(x.begin(), x.end(), target) != x.end();
}),
data.end());
}
std::find implements a loop to search for an element in a sequence (what you need to see if there is a match) and std::remove_if implements a loop to "filter out" elements that match a specific rule.
Before C++11, standard algorithms were basically unusable because there was no easy way to specify custom code parameters (e.g. comparison functions), and you had to code them separately in the exact form needed by the algorithm.
With C++11 lambdas, however, algorithms are now more usable, and you're not forced to create (and give a reasonable name to) an extra global class just to implement a custom matching rule.

Which data structure and algorithm is appropriate for this?

I have thousands of strings. Given a pattern, I need to search all of the strings and return every string that contains that pattern.
Presently I am using a vector to store the original strings. I search each one for the pattern, add it to a new vector if it matches, and finally return that vector.
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main() {
vector <string> v;
v.push_back ("maggi");
v.push_back ("Active Baby Pants Large 9-14 Kg ");
v.push_back ("Premium Kachi Ghani Pure Mustard Oil ");
v.push_back ("maggi soup");
v.push_back ("maggi sauce");
v.push_back ("Superlite Advanced Jar");
v.push_back ("Superlite Advanced");
v.push_back ("Goldlite Advanced");
v.push_back ("Active Losorb Oil Jar");
vector <string> result;
string str = "Advanced";
for (unsigned i=0; i<v.size(); ++i)
{
size_t found = v[i].find(str);
if (found!=string::npos)
result.push_back(v[i]);
}
for (unsigned j=0; j<result.size(); ++j)
{
cout << result[j] << endl;
}
return 0;
}
Is there a way to achieve the same with lower complexity and better performance?
I think the containers are appropriate for your application.
However, instead of std::string::find, if you implement your own KMP algorithm, then you can guarantee that the time complexity is linear in the length of the string plus the length of the search string.
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
As it stands, the complexity of std::string::find is unspecified.
http://www.cplusplus.com/reference/string/string/find/
EDIT: As pointed out by the following link, if your strings are not long (not more than about 1000 characters), then std::string::find is probably good enough, since the table-building step of KMP is not needed there.
C++ string::find complexity
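A sketch of the KMP approach applied to the question's filtering loop. buildFailure, kmpContains and filterKmp are hypothetical names. The failure table is built once per pattern and reused for every string, so after that one-time cost each check is linear in the string's length.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// fail[i] = length of the longest proper prefix of pattern[0..i]
// that is also a suffix of pattern[0..i].
std::vector<std::size_t> buildFailure(const std::string& pattern)
{
    std::vector<std::size_t> fail(pattern.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// True if pattern occurs in text; never moves backwards in text.
bool kmpContains(const std::string& text, const std::string& pattern,
                 const std::vector<std::size_t>& fail)
{
    if (pattern.empty()) return true;
    std::size_t k = 0;                      // characters of pattern matched so far
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == pattern.size()) return true;
    }
    return false;
}

// Same filtering as the question, with the table built once per pattern.
std::vector<std::string> filterKmp(const std::vector<std::string>& v,
                                   const std::string& pattern)
{
    std::vector<std::string> result;
    const std::vector<std::size_t> fail = buildFailure(pattern);
    for (std::size_t i = 0; i < v.size(); ++i)
        if (kmpContains(v[i], pattern, fail)) result.push_back(v[i]);
    return result;
}
```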
If the result is used in the same block of code as the input string vector (as in your example), or if you can guarantee that everyone uses the result only while the input exists, you don't actually need to copy the strings. Copying can be expensive and can slow down the whole algorithm considerably.
Instead you could have a vector of pointers as the result:
vector <string*> result;
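A sketch of the pointer-returning variant (findMatching is a hypothetical name). It uses const std::string* rather than string* because the input is taken by const reference. The pointers stay valid only while the source vector is alive and is not reallocated, e.g. by a later push_back.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects pointers to the matching strings instead of copies.
std::vector<const std::string*> findMatching(const std::vector<std::string>& v,
                                             const std::string& pattern)
{
    std::vector<const std::string*> result;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i].find(pattern) != std::string::npos)
            result.push_back(&v[i]);
    return result;
}
```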
If the list of strings is "fixed" across many searches, then you can do some simple preprocessing to speed things up considerably by using an inverted index.
Build a map of all chars present in the strings, in other words for each possible char store a list of all strings containing that char:
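#include <map>
#include <string>
#include <vector>

std::map< char, std::vector<int> > index;
std::vector<std::string> strings;
void add_string(const std::string& s) {
    int new_pos = strings.size();
    strings.push_back(s);
    for (int i=0,n=s.size(); i<n; i++) {
        std::vector<int>& ix = index[s[i]];
        // Record each string at most once per char, so a string with a repeated
        // char is not returned twice and every list stays sorted.
        if (ix.empty() || ix.back() != new_pos) ix.push_back(new_pos);
    }
}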
Then when asked to search for a substring you first check for all chars in the inverted index and iterate only on the list in the index with the smallest number of entries:
std::vector<std::string *> matching(const std::string& text) {
std::vector<int> *best_ix = NULL;
for (int i=0,n=text.size(); i<n; i++) {
std::vector<int> *ix = &index[text[i]];
if (best_ix == NULL || best_ix->size() > ix->size()) {
best_ix = ix;
}
}
std::vector<std::string *> result;
if (best_ix) {
for (int i=0,n=best_ix->size(); i<n; i++) {
std::string& cand = strings[(*best_ix)[i]];
if (cand.find(text) != std::string::npos) {
result.push_back(&cand);
}
}
} else {
// Empty text as input, just return the whole list
for (int i=0,n=strings.size(); i<n; i++) {
result.push_back(&strings[i]);
}
}
return result;
}
Many improvements are possible:
use a bigger index (e.g. using pairs of consecutive chars)
avoid considering very common chars (stop lists)
use hashes computed from triplets or longer sequences
search the intersection instead of searching the shorter list. Given the elements are added in order, the vectors are already sorted, and the intersection can be computed efficiently even using vectors (see std::set_intersection).
All of them may make sense or not depending on the parameters of the problem (how many strings, how long, how long is the text being searched ...).
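A sketch of the intersection improvement from the list above, assuming posting lists that are sorted and free of duplicates (as when each string id is appended at most once per character, in increasing order). candidatesByIntersection is a hypothetical name. The survivors still need a find() check, because containing every character of the text does not mean containing the text.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Returns ids present in the posting list of every char of `text`.
// Empty text is not handled here (the original returns every string for it).
std::vector<int> candidatesByIntersection(
    const std::map<char, std::vector<int> >& postings, const std::string& text)
{
    std::vector<int> result;
    bool first = true;
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::map<char, std::vector<int> >::const_iterator it = postings.find(text[i]);
        if (it == postings.end())
            return std::vector<int>();      // char never seen: nothing can match
        if (first) {
            result = it->second;
            first = false;
            continue;
        }
        std::vector<int> next;
        std::set_intersection(result.begin(), result.end(),
                              it->second.begin(), it->second.end(),
                              std::back_inserter(next));
        result.swap(next);
        if (result.empty()) break;          // no candidate left
    }
    return result;
}
```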
If the source text is large and static (e.g. crawled webpages), then you can save search time by pre-building a suffix tree or a trie data structure. The search pattern can then traverse the tree to find matches.
If the source text is small and changes frequently, then your original approach is appropriate. The STL functions are generally very well optimized and have stood the test of time.

How to get random and unique values from a vector? [duplicate]

Possible Duplicate:
Unique random numbers in O(1)?
Unique random numbers in an integer array in the C programming language
I have a std::vector of unique elements of some undetermined size. I want to fetch 20 unique and random elements from this vector. By 'unique' I mean that I do not want to fetch the same index more than once. Currently the way I do this is to call std::random_shuffle. But this requires me to shuffle the entire vector (which may contain over 1000 elements). I don't mind mutating the vector (I prefer not to though, as I won't need to use thread locks), but most important is that I want this to be efficient. I shouldn't be shuffling more than I need to.
Note that I've looked into passing in a partial range to std::random_shuffle but it will only ever shuffle that subset of elements, which would mean that the elements outside of that range never get used!
Help is appreciated. Thank you!
Note: I'm using Visual Studio 2005, so I do not have access to C++11 features and libraries.
You can use Fisher Yates http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
The Fisher–Yates shuffle (named after Ronald Fisher and Frank Yates), also known as the Knuth shuffle (after Donald Knuth), is an algorithm for generating a random permutation of a finite set—in plain terms, for randomly shuffling the set. A variant of the Fisher–Yates shuffle, known as Sattolo's algorithm, may be used to generate random cycles of length n instead. Properly implemented, the Fisher–Yates shuffle is unbiased, so that every permutation is equally likely. The modern version of the algorithm is also rather efficient, requiring only time proportional to the number of items being shuffled and no additional storage space.
The basic process of Fisher–Yates shuffling is similar to randomly picking numbered tickets out of a hat, or cards from a deck, one after another until there are no more left. What the specific algorithm provides is a way of doing this numerically in an efficient and rigorous manner that, properly done, guarantees an unbiased result.
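I think this pseudocode should work (there is a chance of an off-by-one mistake or something, so double-check it!):
std::list<T> chosen; // T is the element type of vec; you don't have to use this since the chosen ones will be in the back of the vector
for (int i = 0; i < num; ++i) {
    int last = vec.size() - i - 1;       // last position that has not been picked yet
    int index = rand_between(0, last);   // rand_between: uniform random int in [0, last], inclusive
    chosen.push_back(vec[index]);
    std::swap(vec[index], vec[last]);    // move the pick to the back so it can't be drawn again
}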
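A runnable, C++03-style version of that loop (no C++11 features, to match the Visual Studio 2005 constraint). randBelow and sampleFromBack are hypothetical names, and std::rand() % n is a simple stand-in for rand_between that carries a small modulo bias; swap in a better generator if that matters.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Stand-in for rand_between: an int in [0, n) (slightly biased, see above).
int randBelow(int n)
{
    return std::rand() % n;
}

// Partial Fisher-Yates: performs only m swaps, moving m distinct random
// picks to the back of vec, and returns them in the order they were drawn.
template <typename T>
std::vector<T> sampleFromBack(std::vector<T>& vec, int m)
{
    std::vector<T> chosen;
    int n = static_cast<int>(vec.size());
    if (m > n) m = n;                       // can't pick more than there are
    for (int i = 0; i < m; ++i) {
        int last = n - i - 1;               // last position not yet picked
        int index = randBelow(last + 1);    // index in [0, last]
        std::swap(vec[index], vec[last]);
        chosen.push_back(vec[last]);
    }
    return chosen;
}
```

The vector is mutated (it ends up as a permutation of the original), but only m elements are touched by swaps rather than the whole vector being shuffled.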
You want a random sample of size m from an n-vector:
Let rand(a) return 0..a-1 uniform
for (int i = 0; i < m; i++)
swap(X[i],X[i+rand(n-i)]);
X[0..m-1] is now a random sample.
Use a loop to put random index numbers into a std::set and stop when the size() reaches 20.
std::set<int> indexes;
std::vector<T> choices;                      // T: the element type of my_vector
int max_index = my_vector.size();
int wanted = std::min(20, max_index);        // can't pick more than there are
while ((int)indexes.size() < wanted)
{
    int random_index = rand() % max_index;
    if (indexes.insert(random_index).second) // .second is true only for a new index
    {
        choices.push_back(my_vector[random_index]);
    }
}
The random number generation is the first thing that popped into my head, feel free to use something better.
#include <iostream>
#include <vector>
#include <algorithm>
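template<int N>
struct NIntegers {
    int values[N];
};
// Returns N distinct integers in [0, Max). func(k) must return a value in [0, k).
template<int N, int Max, typename RandomGenerator>
NIntegers<N> MakeNRandomIntegers( RandomGenerator func ) {
    NIntegers<N> result;
    // Draw every value from [0, Max - N]; after sorting and adding i below,
    // the largest possible value is (Max - N) + (N - 1) = Max - 1.
    for(int i = 0; i < N; ++i)
    {
        result.values[i] = func( Max - N + 1 );
    }
    std::sort(&result.values[0], &result.values[0]+N);
    // Sorted values never decrease, so adding i makes them strictly increasing (distinct).
    for(int i = 0; i < N; ++i)
    {
        result.values[i] += i;
    }
    return result;
}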
Use example:
// use a better one:
int BadRandomNumberGenerator(int Max) {
return Max>4?4:Max/2;
}
int main() {
NIntegers<100> result = MakeNRandomIntegers<100, 500>( BadRandomNumberGenerator );
for (int i = 0; i < 100; ++i) {
std::cout << i << ":" << result.values[i] << "\n";
}
}
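Draw each number from [0, Max - N], sort them, then bump up each value by the number of integers before it. The results are distinct and stay below Max, although the distribution over possible sets is not exactly uniform.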
template stuff is just trade dress.

Template Sort In C++

Hey all, I'm trying to write a sort function but am having trouble figuring out how to initialize a value and how to make this function work as a generic template. The sort works by:
Find a pair (ii, jj) with the minimum value of ii + jj such that A[ii] > A[jj].
If such a pair exists, then
swap A[ii] and A[jj]; else
break.
The function I have written is as follows:
template <typename T>
void sort(T *A, int size)
{
T min =453;
T temp=0;
bool swapper = false;
int index1 = 0, index2 = 0;
for (int ii = 0; ii < size-1; ii++){
for (int jj = ii + 1; jj < size; jj++){
if((min >= (A[ii]+A[jj])) && (A[ii] > A[jj])){
min = (A[ii]+A[jj]);
index1 = ii;
index2 = jj;
swapper = true;
}
}
}
if (!swapper)
return;
else
{
temp = A[index1];
A[index1] = A[index2];
A[index2] = temp;
sort(A,size);
}
}
This function will successfully sort an array of integers, but not an array of chars. I do not know how to properly initialize the min value for the start of the comparison. I tried initializing the value by simply adding the first two elements of the array together (min = A[0] + A[1]), but it looks to me like for this algorithm it will fail. I know this is sort of a strange type of sort, but it is practice for a test, so thanks for any input.
The most likely reason it fails is that assigning 453 to a char does not store 453 but a different number, depending on what char is (signed versus unsigned). Your immediate solution would be to use std::numeric_limits: http://www.cplusplus.com/reference/std/limits/numeric_limits/
You may also need to rethink the design: because char has a small range, the sum of two chars will often exceed what a char can hold.
The maximum value of any type is std::numeric_limits<T>::max(). It's defined in <limits>.
Also, consider a redesign. This is not a good algorithm. And I would make sure I knew what I was doing before calling my sort function recursively.
I haven't spent much time reading your algorithm, but as an alternative to std::numeric_limits, you can use the initial element in your array as the initial minimum value. Then you don't have to worry about what happens if you call the function with a class that doesn't specialize std::numeric_limits and thus can't report a maximum value.
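The steps listed in the question minimize ii + jj, the sum of the indices, whereas the posted code minimizes A[ii] + A[jj]. If the index reading is the intended one, the running minimum is an int and no type-specific starting value is needed. A sketch under that assumption (pairSwapSort is a hypothetical name; it loops instead of recursing). Swapping any inverted pair removes at least one inversion, so the loop ends with the array sorted.

```cpp
#include <algorithm>
#include <utility>

// Repeatedly swaps the inverted pair (ii < jj, A[ii] > A[jj]) with the
// smallest ii + jj until no inverted pair is left. Only operator> is
// required of T.
template <typename T>
void pairSwapSort(T* A, int size)
{
    for (;;) {
        int bestSum = -1;                   // -1 means "no inverted pair found yet"
        int index1 = 0, index2 = 0;
        for (int ii = 0; ii < size - 1; ++ii) {
            for (int jj = ii + 1; jj < size; ++jj) {
                if (A[ii] > A[jj] && (bestSum < 0 || ii + jj < bestSum)) {
                    bestSum = ii + jj;
                    index1 = ii;
                    index2 = jj;
                }
            }
        }
        if (bestSum < 0)
            return;                         // no inversions left: sorted
        std::swap(A[index1], A[index2]);
    }
}
```

This works the same way for int and char arrays, since no values of type T are ever added together.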