Tallying elements in an array - C++

So, I'm trying to tally the elements of an array. By this I mean, I have a large array, and each element will have multiples of itself throughout the array. I am trying to figure out how many times each element occurs, however I keep running into the issue of there being duplicate tallies. Since "x" could exist at 12 different places in the array, when I loop through it and keep a running sum, I get the tally for "x" 12 different times. Does anyone know of a simpler/better way to keep a tally of an array with no duplicates?
My code is:
where count is the number of elements
for(i=0;i<count;i++)
{
for(x=0; x<count;x++)
{
if(array[i]==array[x])
{
tallyz++;
}
}
tally[i]=tallyz-1;
tallyz=0;
}

std::map<X, unsigned> tally;
for(i = 0; i < count; ++i)
++tally[array[i]];
Note that this is best if the redundancy in the array is fairly high. If most items are unique you're probably better just sorting the array as others have mentioned.

If you can sort the array, simply sort it. Then all you have left is a linear scan of the elements, checking if the element behind this one is the same as the current element (don't forget bounds checking).
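A minimal sketch of the sort-then-scan idea, assuming the elements are `int`s held in a `std::vector` (the function name `tally_sorted` is made up for illustration). It copies the input so the caller's data stays unsorted, and the `i > 0` test is the bounds check mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (value, count) pairs, one per distinct value, in ascending order.
// The input is taken by value so the caller's data is left unsorted.
std::vector<std::pair<int, std::size_t>> tally_sorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    std::vector<std::pair<int, std::size_t>> result;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Bounds check: only look at the previous element when there is one.
        if (i > 0 && values[i] == values[i - 1]) {
            ++result.back().second;  // same run as before: bump its count
        } else {
            result.push_back(std::make_pair(values[i], std::size_t(1)));  // new run
        }
    }
    return result;
}
```

Each distinct value appears exactly once in the result, so there are no duplicate tallies. Subtract 1 from a count if you want the number of extra copies, as in the question's code.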

As an alternative to sorting, you could use a map:
template<class T, size_t N>
void printSums(T (&array)[N]) {
map<T, size_t> m;
for(T*p = array; p < array+N; ++p) {
++m[*p];
}
for(typename map<T,size_t>::iterator it = m.begin(); it != m.end(); ++it) {
cout << it->first << ": " << it->second << "\n";
}
}
Warning: this is untested code.

First use a map, just as John said, then fill in the tally array:
std::map<X, unsigned> data;
for(i = 0; i < count; i++)
data[array[i]]++;
for(i = 0; i < count; i++)
tally[i]=data[array[i]]-1;

Related

How can I find repeated words in a vector of strings in C++?

I have a std::vector<string> where each element is a word. I want to print the vector without repeated words!
I searched a lot on the web and I found lots of material, but I can't and I don't want to use hash maps, iterators and "advanced" (to me) stuff. I can only use plain string comparison == as I am still a beginner.
So, let my_vec be a std::vector<std::string> initialized from standard input. My idea was to go through the whole vector and erase each repeated word as soon as I found it:
for(int i=0;i<my_vec.size();++i){
for (int j=i+1;j<my_vec.size();++j){
if(my_vec[i]==my_vec[j]){
my_vec.erase(my_vec.begin()+j); //remove the component from the vector
}
}
}
I tried to test for std::vector<std::string> my_vec{"hey","how","are","you","fine","and","you","fine"}
and indeed I found
hey how are you fine and
so it seems to be right, but for instance if I write the simple vector std::vector<std::string> my_vec{"hello","hello","hello","hello","hello"}
I obtain
hello hello
The problem is that at every call to erase the dimension gets smaller and so I lose information. How can I do that?
Minimalist approach to your existing code. The auto-increment of j is what is ultimately breaking your algorithm. Don't do that. Instead, only increment it when you do NOT remove an element.
I.e.
for (int i = 0; i < my_vec.size(); ++i) {
for (int j = i + 1; j < my_vec.size(); ) { // NOTE: no ++j
if (my_vec[i] == my_vec[j]) {
my_vec.erase(my_vec.begin() + j);
}
else ++j; // NOTE: moved to else-clause
}
}
That is literally it.
You can store the indices of the elements to erase and then remove them at the end.
Or repeat the cycle until no more erasures are performed.
First code Example:
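std::vector<int> index_to_erase; // no parentheses: "index_to_erase()" would declare a function
for(int i=0;i<my_vec.size();++i){
    for (int j=i+1;j<my_vec.size();++j){
        if(my_vec[i]==my_vec[j]){
            index_to_erase.push_back(j);
        }
    }
}
//the same index can be recorded more than once (e.g. three equal words),
//so sort and deduplicate the indices first (needs <algorithm>)
std::sort(index_to_erase.begin(), index_to_erase.end());
index_to_erase.erase(std::unique(index_to_erase.begin(), index_to_erase.end()), index_to_erase.end());
//erase starting from the largest index, so that the positions of the
//elements still to be erased are not shifted
for (int i = index_to_erase.size()-1; i >= 0; i--){
    my_vec.erase(my_vec.begin()+index_to_erase[i]); //remove the component from the vector
}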
Second code Example:
bool Erase = true;
while(Erase){
Erase = false;
for(int i=0;i<my_vec.size();++i){
for (int j=i+1;j<my_vec.size();++j){
if(my_vec[i]==my_vec[j]){
my_vec.erase(my_vec.begin()+j); //remove the component from the vector
Erase = true;
}
}
}
}
Why don't you use std::unique?
You can use it as easily as:
std::vector<std::string> v{ "hello", "hello", "hello", "hello", "hello" };
std::sort(v.begin(), v.end());
v.erase(std::unique(v.begin(), v.end()), v.end());
N.B. Elements need to be sorted because std::unique works only for consecutive duplicates.
In case you don't want to change the content of the std::vector, but only have stable output, I recommend other answers.
Erasing elements from a container inside a loop is a little tricky, because after erasing element at index i the next element (in the next iteration) is not at index i+1 but at index i.
Read about the erase-remove idiom for the idiomatic way to erase elements. However, if you just want to print on the screen there is a much simpler way to fix your code:
for(int i=0; i<my_vec.size(); ++i){
    bool unique = true;
    for (int j=0; j<i; ++j){
        if(my_vec[i]==my_vec[j]) {
            unique = false;
            break;
        }
    }
    // print only after all earlier elements have been checked
    if (unique) std::cout << my_vec[i] << " ";
}
Instead of checking for elements after the current one you should compare to elements before. Otherwise "bar x bar y bar" will result in "x y bar" when I suppose it should be "bar x y".
Last but not least, consider that using the traditional loops with indices is the complicated way, while using iterators or a range-based loop is much simpler. Don't be afraid of new stuff; in the long run it will be easier to use.
You can simply use the combination of sort and unique as follows.
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
int main() {
std::vector<std::string> vec{"hey","how","are","you","fine","and","you","fine"};
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
for (int i = 0; i < vec.size(); i ++) {
std::cout << vec[i] << " ";
}
std::cout << "\n";
return 0;
}

Can you change a pointer in loop?

Let's say I have a vector of integers:
vector<int> v(n);
Which I fill up in a for loop with valid values. What I want to do is find the index of a given value in this vector. For example, if I have a vector of 1, 2, 3, 4 and a value of 2, I'd get index = 1. The algorithm would assume that the vector is sorted in ascending order; it would check the middle number and then, depending on its value (whether it's bigger or smaller than the one we're asking for), it would check one of the halves of the vector. I was asked to do this recursively and using pointers. So I wrote a void function like:
void findGiven(vector<int> &v){
int i = 0;
int *wsk = &v[i];
}
and I can easily access the 0th element of the vector. However, I seem to be missing some basic knowledge, because I can't really put this in a for loop to print all the values. I wanted to do something like this:
for (int j = 0; j<v.size(); j++){
cout << *wsk[j];
}
Is there a way of doing such a thing? Also, I know it should be recursive; I'm just trying to figure out how to use pointers properly and how to prepare the algorithm so that later I can build it recursively. Thanks in advance!
The correct way is:
for (int wsk : v) {
cout << wsk;
}
If you insist on pointers:
int* first = v.data();
for (size_t j = 0; j < v.size(); ++j) {
cout << first[j];
}
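For the recursive search the question describes, here is a rough sketch using raw pointers into the vector's storage. The names `find_given` and `index_of` are hypothetical, and it assumes the vector is sorted in ascending order, as the question states.

```cpp
#include <cstddef>
#include <vector>

// Searches the sorted half-open range [first, last) for `value`.
// Returns a pointer to a matching element, or nullptr if there is none.
const int* find_given(const int* first, const int* last, int value) {
    if (first == last) return nullptr;            // empty range: not found
    const int* mid = first + (last - first) / 2;  // middle element
    if (*mid == value) return mid;
    if (*mid < value) return find_given(mid + 1, last, value);  // search right half
    return find_given(first, mid, value);                       // search left half
}

// Convenience wrapper: returns the index of `value` in v, or -1 if absent.
std::ptrdiff_t index_of(const std::vector<int>& v, int value) {
    const int* first = v.data();
    const int* hit = find_given(first, first + v.size(), value);
    return hit ? hit - first : -1;
}
```

Subtracting two pointers into the same array gives the distance between them, which is how the index is recovered from the pointer that the search returns.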

Why can't I insert 6 million elements in STL set?

I am trying to insert a little over 6.5 million elements(ints) in an stl set. Here is the code:
set<int> s;
cout << s.max_size() << endl;
for(int i = 0; i < T.MULT * T.MAXP; i++) {
s.insert(a[i]);
}
T.MULT is 10; T.MAXP is 666013.
a is an array - statically allocated - (int a[T.MULT * T.MAXP];) that contains distinct elements.
After about 4.6 million elements s.insert() throws a bad_alloc exception. The resource monitor available on Windows 7 says I have 3 GB free memory left.
What am I doing wrong? Why can't STL set allocate the memory?
Edit: Here is the full code: http://ideone.com/rdrEnt
Edit2: apparently the inserted elements might not be distinct after all, but that should not be a problem.
Edit3: Here is a simplified version of the code: http://ideone.com/dTp0fZ
The problem actually lies in the fact that you statically allocated the array A with more than 6.5 million elements, which overflows your program's stack space. If you allocate the array on the heap instead, it works. I made some changes to the code based on your description, and it ran fine.
int *A = new int[T.MULT * T.MAXP];
for (int i= 0; i < T.MULT * T.MAXP; ++i)
{
A[i] = i; //for simplicity purpose, your array may have different elem. values
}
set<int> s;
for (int i = 0; i < T.MULT * T.MAXP; ++i )
{
s.insert(A[i]);
}
cout << s.size();
set<int>::iterator iter;
int count = 0;
for (iter = s.begin(); iter != s.end(); ++ iter)
{
cout << *iter << " ";
count ++;
if (count == 100)
{
cout <<endl;
count = 0;
}
}
delete [] A;
return 0;
It worked perfectly fine with both vector and set. It can print all those 6.6 million elements on the screen.
As other posts indicated, you may also want to try STXXL if you have interest.
You might want to take a look at STXXL.
While I can't answer your question directly, I think it is more efficient to store your data in a std::vector, sort it, and then use std::binary_search to test for the existence of the item. Storage in a std::set is relatively expensive compared to that of std::vector. That's because there is some overhead when storing each element.
As an example, here's how you could do it. This sorts the static array.
std::sort(a,a+(T.MULT*T.MAXP));
bool existence=std::binary_search(a,a+(T.MULT*T.MAXP),3);
Fast and easy.

Removing duplicate entries in an array (C++)

I'm having an issue in which a function that in theory should remove all duplicate values from an array doesn't work. Here's how it works:
I have two arrays, and then I populate them with random numbers
between 0 and 50 inclusive.
I sort the array values in order using a sort function
I then run my dedupe function
I sort the array values in order again
I then output the values in both arrays
The problem is, the loop in the dedupe function is run 19 times regardless of how many duplicate entries it finds, which is extremely strange. Also, it still gives duplicates.
Any ideas? Thanks!
int* dedupe(int array[ARRAY_SIZE]) //remove duplicate array values and replace with new values.
{ bool dupe = false;
while(dupe!=true)
{
for(int j=0; j<ARRAY_SIZE; j++)
{ if(array[j] == array[j+1])
{ array[j] = rand();
array[j] = array[j] % 51;
dupe = false;
}
else { dupe = true; // the cout part is for debugging
cout << dupe << endl; }
}
} return array;
}
int main()
{
int a[9], b[9];
srand(time(0));
populate(b);
populate(a);
sort(a,ARRAY_SIZE);
sort(b,ARRAY_SIZE);
dedupe(a);
dedupe(b);
sort(a,ARRAY_SIZE);
sort(b,ARRAY_SIZE);
for(int i=0; i<10; i++)
{ cout << "a[" << i << "] = " << a[i] << "\t\t" << "b[" << i << "] = " << b[i] << endl; }
return 0;
}
Nothing suggested so far has solved the problem. Does anyone know of a solution?
You're not returning from inside the for loop... so it should run exactly ARRAY_SIZE times each time.
The problem that you want to solve and the algorithm that you provided do not really match. You do not really want to remove the duplicates, but rather guarantee that all the elements in the array are different, the difference being that by removing duplicates the number of elements in the array would be less than the size of the array, but you want a full array.
I don't know what the perfect solution would be (algorithmically), but one simple answer would be creating an array of all the values in the valid range (since the range is small), shuffling it and then picking up the first N elements. Think of this as using cards to pick the values.
const int array_size = 9;
void create_array( int (&array)[array_size] ) {
const int max_value = 51;
int range[max_value];
for ( int i = 0; i < max_value; ++i ) {
range[i] = i;
}
std::random_shuffle( range, range+max_value );
std::copy_n( range, array_size, array );
}
This is not the most efficient approach, but it is simple, and with a small number of elements there should not be any performance issues. A more complex approach would be to initialize the array with the random elements in the range, sort and remove duplicates (actually remove, which means that the array will not be full at the end) and then continue generating numbers and checking whether they are new against the previously generated numbers.
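Note that `std::random_shuffle` was removed in C++17. A possible variant of the same card-shuffling idea using `std::shuffle` is sketched below. It assumes a C++11 compiler, and the choice of `std::array` and of passing in a `std::mt19937` generator is mine, not part of the answer above.

```cpp
#include <algorithm>
#include <array>
#include <numeric>
#include <random>

const int array_size = 9;  // same size as in the answer above
const int max_value = 51;  // values are drawn from 0..50

// Fills `out` with array_size distinct values from 0..max_value-1.
void create_array(std::array<int, array_size>& out, std::mt19937& rng) {
    std::array<int, max_value> range;
    std::iota(range.begin(), range.end(), 0);             // 0, 1, ..., 50
    std::shuffle(range.begin(), range.end(), rng);        // shuffle the "deck"
    std::copy_n(range.begin(), array_size, out.begin());  // take the top cards
}
```

Seeding the generator once, for example with `std::random_device`, and reusing it for both arrays avoids identical results from two calls made in quick succession.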
The simplest approach is just to compare each new value with every other value already in the array. That is linear time per value (quadratic overall), but on an array of 9 elements that is small enough not to matter.
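One way to read this suggestion is to redraw a value whenever it clashes with one already placed. The sketch below assumes that reading. `ARRAY_SIZE = 9` and the helper names `already_used` and `populate_unique` are illustrative and not taken from the question's code.

```cpp
#include <cstdlib>

const int ARRAY_SIZE = 9;  // assumed, matching the size of the question's arrays

// Returns true if `value` already occurs in the first `used` slots.
bool already_used(const int* array, int used, int value) {
    for (int k = 0; k < used; ++k) {
        if (array[k] == value) return true;
    }
    return false;
}

// Fills the array with distinct values from 0..50.
void populate_unique(int (&array)[ARRAY_SIZE]) {
    for (int i = 0; i < ARRAY_SIZE; ++i) {
        int candidate;
        do {
            candidate = std::rand() % 51;             // 0..50 inclusive
        } while (already_used(array, i, candidate));  // redraw on a clash
        array[i] = candidate;
    }
}
```

Because only slots before `i` are inspected, the loop never reads past the end of the array, unlike a comparison against `array[j+1]` at the last index.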
you are doing it wrong at
array[j] = rand();
array[j] = array[j] % 51
It will always have 1 to ARRAY SIZE!!

Find the biggest 3 numbers in a vector

I'm trying to make a function to get the 3 biggest numbers in a vector. For example:
Numbers: 1 6 2 5 3 7 4
Result: 5 6 7
I figured I could sort them DESC, get the 3 numbers at the beginning, and after that re-sort them ASC, but that would be a waste of memory allocation and execution time. I know there is a simpler solution, but I can't figure it out. And another problem is, what if I have only two numbers...
BTW: I use as compiler BorlandC++ 3.1 (I know, very old, but that's what I'll use at the exam..)
Thanks guys.
LE: If anyone wants to know more about what I'm trying to accomplish, you can check the code:
#include<fstream.h>
#include<conio.h>
int v[1000], n;
ifstream f("bac.in");
void citire();
void afisare_a();
int ultima_cifra(int nr);
void sortare(int asc);
void main() {
clrscr();
citire();
sortare(2);
afisare_a();
getch();
}
void citire() {
f>>n;
for(int i = 0; i < n; i++)
f>>v[i];
f.close();
}
void afisare_a() {
for(int i = 0;i < n; i++)
if(ultima_cifra(v[i]) == 5)
cout<<v[i]<<" ";
}
int ultima_cifra(int nr) {
return nr - 10 * ( nr / 10 );
}
void sortare(int asc) {
int aux, s;
if(asc == 1)
do {
s = 0;
for(int i = 0; i < n-1; i++)
if(v[i] > v[i+1]) {
aux = v[i];
v[i] = v[i+1];
v[i+1] = aux;
s = 1;
}
} while( s == 1);
else
do {
s = 0;
for(int i = 0; i < n-1; i++)
if(v[i] < v[i+1]) {
aux = v[i];
v[i] = v[i+1];
v[i+1] = aux;
s = 1;
}
} while(s == 1);
}
Citire = Read
Afisare = Display
Ultima Cifra = Last digit of number
Sortare = Bubble Sort
If you were using a modern compiler, you could use std::nth_element to find the top three. As is, you'll have to scan through the array keeping track of the three largest elements seen so far at any given time, and when you get to the end, those will be your answer.
For three elements that's a trivial thing to manage. If you had to do the N largest (or smallest) elements when N might be considerably larger, then you'd almost certainly want to use Hoare's select algorithm, just like std::nth_element does.
You could do this without needing to sort at all, it's doable in O(n) time with linear search and 3 variables keeping your 3 largest numbers (or indexes of your largest numbers if this vector won't change).
Why not just step through it once and keep track of the 3 highest digits encountered?
EDIT: The range for the input is important in how you want to keep track of the 3 highest digits.
Use std::partial_sort to sort the first c elements that you care about in descending order. It runs in O(n log c) time, which is linear in n for a fixed number of desired elements c.
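A small sketch of the `std::partial_sort` suggestion, assuming a compiler with a standard library (so not Borland C++ 3.1). The function name `top_c` is made up. Clamping `c` to the vector's size covers the "only two numbers" case from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the c largest values in descending order (fewer if v is shorter).
// The vector is taken by value, so the caller's data is not reordered.
std::vector<int> top_c(std::vector<int> v, std::size_t c) {
    c = std::min(c, v.size());  // handles inputs with fewer than c elements
    std::partial_sort(v.begin(), v.begin() + c, v.end(), std::greater<int>());
    v.resize(c);                // keep just the sorted prefix
    return v;
}
```

If the results are wanted in ascending order, reversing the returned vector with `std::reverse` is enough; no second sort is needed.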
If you can't use std::nth_element write your own selection function.
You can read about them here: http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements
Sort them normally and then iterate from the back using rbegin(), for as many as you wish to extract (no further than rend() of course).
sort will happen in place whether ASC or DESC by the way, so memory is not an issue since your container element is an int, thus has no encapsulated memory of its own to manage.
Yes, sorting is good, especially for long or variable-length lists.
Why are you sorting it twice, though? The second sort might actually be very inefficient (depends on the algorithm in use). A reverse would be quicker, but why even do that? If you want them in ascending order at the end, then sort them into ascending order first ( and fetch the numbers from the end)
I think you have the choice between scanning the vector for the three largest elements or sorting it (either using sort in a vector or by copying it into an implicitly sorted container like a set).
If you can control how the array is filled, maybe you could insert the numbers in sorted order and then just take the first 3; otherwise you can use a binary tree to perform the search, or just use a linear search as birryree says...
Thanks to #nevets1219 for pointing out that the code below only deals with positive numbers.
I haven't tested this code enough, but it's a start:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> nums;
nums.push_back(1);
nums.push_back(6);
nums.push_back(2);
nums.push_back(5);
nums.push_back(3);
nums.push_back(7);
nums.push_back(4);
int first = 0;
int second = 0;
int third = 0;
for (int i = 0; i < nums.size(); i++)
{
if (nums.at(i) > first)
{
third = second;
second = first;
first = nums.at(i);
}
else if (nums.at(i) > second)
{
third = second;
second = nums.at(i);
}
else if (nums.at(i) > third)
{
third = nums.at(i);
}
std::cout << "1st: " << first << " 2nd: " << second << " 3rd: " << third << std::endl;
}
return 0;
}
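To lift the positive-numbers limitation, one option is to start the three variables at the smallest representable `int` and to track how many values were actually seen. This is only a sketch: the `Top3` struct and `top_three` function are invented for illustration, and it assumes `<limits>` is available.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct Top3 {
    int first, second, third;  // largest, second largest, third largest
    std::size_t count;         // how many of the three are valid (0..3)
};

Top3 top_three(const std::vector<int>& nums) {
    const int lowest = std::numeric_limits<int>::min();
    Top3 t = {lowest, lowest, lowest, 0};
    for (std::size_t i = 0; i < nums.size(); ++i) {
        const int x = nums[i];
        if (x > t.first)       { t.third = t.second; t.second = t.first; t.first = x; }
        else if (x > t.second) { t.third = t.second; t.second = x; }
        else if (x > t.third)  { t.third = x; }
        if (t.count < 3) ++t.count;  // saturates at 3
    }
    return t;
}
```

With only two input numbers, `count` is 2 and only `first` and `second` are meaningful.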
The following solution finds the three largest numbers in O(n) and preserves their relative order:
std::vector<int>::iterator p = std::max_element(vec.begin(), vec.end());
int x = *p;
*p = std::numeric_limits<int>::min();
std::vector<int>::iterator q = std::max_element(vec.begin(), vec.end());
int y = *q;
*q = std::numeric_limits<int>::min();
int z = *std::max_element(vec.begin(), vec.end());
*q = y; // restore original value
*p = x; // restore original value
A general solution for the top N elements of a vector:
1. Create an array or vector topElements of length N for your top N elements.
2. Initialise each element of topElements to the value of your first element in your vector.
3. Select the next element in the vector, or finish if no elements are left.
4. If the selected element is greater than topElements[0], replace topElements[0] with the value of the element. Otherwise, go to 3.
5. Starting with i = 0, swap topElements[i] with topElements[i + 1] if topElements[i] is greater than topElements[i + 1].
6. While i is less than N, increment i and go to 5.
7. Go to 3.
This should result in topElements containing your top N elements in reverse order of value - that is, the largest value is in topElements[N - 1].
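The steps above could be sketched roughly as follows. This is a variation rather than a literal transcription: the buffer starts empty and grows to N instead of being pre-filled with the first element, and the bubbling loop stops at the last valid pair. The function name `top_n` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns up to n largest values of `input` in ascending order,
// so the largest ends up last, as in the description above.
std::vector<int> top_n(const std::vector<int>& input, std::size_t n) {
    std::vector<int> top;  // kept sorted ascending; top[0] is the smallest kept value
    if (n == 0) return top;
    for (std::size_t k = 0; k < input.size(); ++k) {
        const int x = input[k];
        if (top.size() < n) {
            top.insert(top.begin(), x);  // buffer not full yet: take the value
        } else if (x > top[0]) {
            top[0] = x;                  // replace the current smallest
        } else {
            continue;                    // not in the top n: skip it
        }
        // Bubble the new value up to its sorted position.
        for (std::size_t i = 0; i + 1 < top.size() && top[i] > top[i + 1]; ++i) {
            std::swap(top[i], top[i + 1]);
        }
    }
    return top;
}
```

For the question's input 1 6 2 5 3 7 4 and n = 3 this yields 5 6 7. If the input has fewer than n elements, all of them are returned in sorted order.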