Removing duplicate entries in an array (C++)


I'm having an issue where a function that should, in theory, remove all duplicate values from an array doesn't work. Here's how the program works:
1. I have two arrays, and I populate them with random numbers between 0 and 50 inclusive.
2. I sort the array values in order using a sort function.
3. I then run my dedupe function.
4. I sort the array values in order again.
5. I then output the values in both arrays.
The problem is that the loop in the dedupe function runs 19 times regardless of how many duplicate entries it finds, which is extremely strange. Also, it still gives duplicates.
Any ideas? Thanks!
int* dedupe(int array[ARRAY_SIZE]) //remove duplicate array values and replace with new values.
{ bool dupe = false;
while(dupe!=true)
{
for(int j=0; j<ARRAY_SIZE; j++)
{ if(array[j] == array[j+1])
{ array[j] = rand();
array[j] = array[j] % 51;
dupe = false;
}
else { dupe = true; // the cout part is for debugging
cout << dupe << endl; }
}
} return array;
}
int main()
{
int a[9], b[9];
srand(time(0));
populate(b);
populate(a);
sort(a,ARRAY_SIZE);
sort(b,ARRAY_SIZE);
dedupe(a);
dedupe(b);
sort(a,ARRAY_SIZE);
sort(b,ARRAY_SIZE);
for(int i=0; i<10; i++)
{ cout << "a[" << i << "] = " << a[i] << "\t\t" << "b[" << i << "] = " << b[i] << endl; }
return 0;
}
Nothing suggested so far has solved the problem. Does anyone know of a solution?

You're not returning from inside the for loop... so it should run exactly ARRAY_SIZE times each time.

The problem that you want to solve and the algorithm that you provided do not really match. You do not really want to remove the duplicates, but rather guarantee that all the elements in the array are different, the difference being that by removing duplicates the number of elements in the array would be less than the size of the array, but you want a full array.
I don't know what the perfect solution would be (algorithmically), but one simple answer would be creating an array of all the values in the valid range (since the range is small), shuffling it and then picking up the first N elements. Think of this as using cards to pick the values.
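#include <algorithm> // std::random_shuffle, std::copy_n
const int array_size = 9;
void create_array( int (&array)[array_size] ) {
    const int max_value = 51; // valid values are 0..50 inclusive
    int range[max_value];
    for ( int i = 0; i < max_value; ++i ) {
        range[i] = i;
    }
    // Note: std::random_shuffle was deprecated in C++14 and removed in C++17;
    // with a newer standard, use std::shuffle with a generator from <random>.
    std::random_shuffle( range, range+max_value );
    std::copy_n( range, array_size, array );
}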
This is not the most efficient approach, but it is simple, and with a small number of elements there should not be any performance issues. A more complex approach would be to initialize the array with the random elements in the range, sort and remove duplicates (actually remove, which means that the array will not be full at the end) and then continue generating numbers and checking whether they are new against the previously generated numbers.
The simplest check is to compare each new value with every value already accepted. That is linear time per check, but on an array of 9 elements the cost is small enough not to matter.
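The "keep generating and compare against what you already have" idea can be sketched as below. This is only an illustration: fill_distinct is a made-up name, and the choice of std::mt19937 with std::uniform_int_distribution (C++11) instead of rand() is an assumption, not something from the question.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>

// Hypothetical helper: fills `out` with distinct values in [0, max_value].
// Requires N <= max_value + 1, otherwise the loop could never finish.
template <std::size_t N>
void fill_distinct(std::array<int, N>& out, int max_value, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(0, max_value);
    std::size_t filled = 0;
    while (filled < N) {
        const int candidate = dist(rng);
        // Linear scan over the values accepted so far.
        const bool seen =
            std::find(out.begin(), out.begin() + filled, candidate) != out.begin() + filled;
        if (!seen) {
            out[filled++] = candidate;  // accept only values not seen before
        }
    }
}
```

Called as fill_distinct(a, 50, rng) on a std::array<int, 9>, every slot ends up holding a different value in 0..50, so no separate dedupe pass is needed afterwards.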

you are doing it wrong at
array[j] = rand();
array[j] = array[j] % 51
It will always have 1 to ARRAY SIZE!!

Related

Issues with checking an array moving both forwards and backwards simultaneously and issue printing values stored in a pointer array

Preface: Currently reteaching myself C++ so please excuse some of my ignorance.
The challenge I was given was to write a program to search through a static array with a function and return the indices of the number you were searching for. This only required 1 function and minimal effort so I decided to make it more "complicated" to practice more of the things I have learned thus far. I succeeded for the most part, but I'm having issues with my if statements within my for loop. I want them to check 2 separate spots within the array passed to it, but it is checking the same indices for both of them. I also cannot seem to get the indices as an output. I can get the correct number of memory locations, but not the correct values. My code is somewhat cluttered and I understand there are more efficient ways to do this. I would love to be shown these ways as well, but I would also like to understand where my error is and how to fix it. Also, I know 5 won't always be present within the array since I'm using a pseudo random number generator.
Thank you in advance.
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
// This is supposed to walk through the array both backwards and forwards checking for the value entered and
// incrementing the count so you know the size of the array you need to create in the next function.
int test(int A[], int size, int number) {
int count = 0;
for (int i = 0; i <= size; i++, size--)
{
if (A[i] == number)
count++;
// Does not walk backwards through the array. Why?
if (A[size] == number)
count++;
}
cout << "Count is: " << count << endl;
return (count);
}
// This is a linear search that creates a pointer array from the previous "count" variable in function test.
// It should store the indices of the value you are searching for in this newly created array.
int * search(int A[], int size, int number, int arr_size){
int *p = new int[arr_size];
int count =0;
for(int i = 0; i < size; i++){
if(A[i]==number) {
p[count] = i;
}
count++;
}
return p;
}
int main(){
// Initializing the array to zero just to be safe
int arr[99]={0},x;
srand(time(0));
// Populating the array with random numbers in between 1-100
for (int i = 0; i < 100; i++)
arr[i]= (rand()%100 + 1);
// Was using this to check if the variable was actually in the array.
// for(int x : arr)
// cout << x << " ";
// Selecting the number you wish to search for.
// cout << "Enter the number you wish to search for between 1 and 100: ";
// cin >> x;
// Just using 5 as a test case.
x = 5;
// This returns the number of instances it finds the number you're looking for
int count = test(arr, (sizeof(arr)/4), x);
// If your count returns 0 that means the number wasn't found so no need to continue.
if(count == 0){
cout << "Your number was not found " << endl;
return 0;
}
// This should return the address array created in the function "search"
int *index = search(arr, (sizeof(arr)/4), x, count);
// This should increment through the array which address you assigned to index.
for(int i=0; i < count; i++) {
// I can get the correct number of addresses based on count, just not the indices themselves.
cout << &index[i] << " " << endl;
}
return 0;
}
I deeply appreciate your help and patience as well as I want to thank you again for your help.
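For reference, a few things in the code above stand out: arr has 99 elements but the fill loop writes 100; test reads A[size] on its first pass, which is one past the end; search increments count on every iteration instead of only on a match; and the final loop prints &index[i] (an address) rather than index[i]. A minimal single-pass sketch follows, assuming the goal is simply the list of matching indices. The name find_indices and the use of std::vector are illustrative choices, not part of the original program.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical replacement for test() and search(): one pass that
// collects every index at which `number` occurs.
std::vector<std::size_t> find_indices(const int* data, std::size_t size, int number) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] == number) {
            indices.push_back(i);  // store the index itself, not an address
        }
    }
    return indices;
}
```

The size of the returned vector is the count, so no separate counting pass or manual new/delete is needed.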

Why is an exception being thrown in this dynamic array?

I am having trouble understanding why this exception is being thrown. I allocated an array to receive 100 int values and want to store all odd numbers under 200 into the array (which should be 100 integer values). I'm trying to understand why my code is not working.
I have called my function to allocate an array of 100 int values. After, I created a for-loop to iterate through and store integers into the array however I created an if statement to only store odd numbers. What I can't understand is if I put my counter to 200 and use the if statement an exception is thrown, but if I don't insert the if statement and only put my counter to 100 all numbers between 1-100 stored and an exception won't be thrown.
The only thing I can think of that's causing this is when my counter is at 200 and I have the if statement to catch all odd number, somehow all numbers under 200 are being stored in the array causing the exception to be thrown.
int *allocIntArray(int);
int main() {
int *a;
a = allocIntArray(100);
for (int count = 1; count < 200; count++) {
if (a[count] % 2 == 1) {
a[count] = count;
cout << a[count] << endl;
}
}
delete[] a;
return 0;
}
int *allocIntArray(int size) {
int *newarray = new int[size]();
return newarray;
}
When I look at the program output, it only displays the odd numbers yet the exception is being thrown. That tells me my if statement is working yet something is being muddied up.
What am I missing?
Thanks for your time and knowledge.

Cause of the error
If you have an array a that was created with n elements, accessing an element out of bounds is undefined behavior. So the index MUST always be between 0 and n-1.
So the behavior of your program is undefined as soon as count is 100, since evaluating the condition in the if-clause already tries to access out of bounds.
Adjustment that does what you want
Now in addition, there is a serious bug in your program logic: If you want to add numbers that satisfy some kind of condition, you need 2 counters: one for iterating on the numbers, and one for the last index used in the array:
for (int nextitem=0, count = 1; count < 200; count++) {
if (count % 2 == 1) { // not a[count], you need to test number itself
a[nextitem++] = count;
cout << count << endl;
if (nextitem == 100) { // attention: hard numbers should be avoided
cout << "Array full: " << nextitem << " items reached at " << count <<endl;
break; // exit the for loop
}
}
}
But, this solution requires you to keep track of the last item in the array, and the size of the array (it's hard-coded here).
Vectors
You are probably learning. But in C++ a better solution would be to use vector instead of an array, and use push_back(). Vectors manage the memory, so that you can focus on your algorithm. The full program would then look like:
vector<int> a;
for (int count = 1; count < 200; count++) {
if (count % 2 == 1) {
a.push_back(count);
cout << count << endl;
}
}
cout << "Added " << a.size() << " elements" <<endl;
cout << "10th element: "<< a[9] << endl;

The problem is not how many numbers you're storing but where you're storing them; you're storing 101 in a[101], which is obviously wrong.
If the i:th odd number is C, the correct index is i-1, not C.
The most readable change is probably to introduce a new counter variable.
int main() {
int a[100] = {0};
int count = 0;
for (int number = 1; number < 200; number++) {
if (number % 2 == 1) {
a[count] = number;
count += 1;
}
}
}
I think transforming this from a search problem to a generation problem makes it easier to get right.
If you happen to remember that every odd number C can be written in the form 2 * A + 1 for some A, you will see that the sequence you're looking for is
2*0+1, 2*1+1, 2*2+1, ..., 2*99+1
so
int main()
{
int numbers[100] = {0};
for (int i = 0; i < 100; i++)
{
numbers[i] = 2 * i + 1;
}
}
You can also go the other way around, looping over the odd numbers and storing them in the right place:
int main()
{
int numbers[100] = {0};
for (int i = 1; i < 200; i += 2) // This loops over the odd numbers.
{
numbers[i/2] = i; // Integer division makes this work.
}
}

C++ How do I print elements of an array but leave out repeats? [closed]

My assignment is to have the user type in how many elements are in an array and then enter the integers to be put in the array. I then have to sort through the array, find the largest number, and print out the elements of the array, but if there is a repeat then only print that number one time. I also have to print out the number of times each element in the array occurs. For example, if the user types in that there are 5 elements and then enters 2, 1, 2, -3, 2, it should print -3 with 1 count, 1 with 1 count, and 2 with 3 count. So far I have it so it will print out the elements and skip the repeats, but I can't get it to print out the correct number of occurrences for each element. This is my code so far.
void findRepeats(int numbers[], int num)
{
int instances = 0;
cout << "Number" << " " << "Occurrences" << endl;
for (int i = 0; i < num; i++)
{
bool matching = false;
instances = 1;
for (int j = 0; (j < i); j++)
{
if (numbers[i] == numbers[j])
{
instances++;
matching = true;
}
}
if (!matching)
cout << numbers[i] << " " << instances << endl;
}
}
Right now it's saying all numbers occur only 1 time.

One approach that you could take, is to sort the numbers first, before deciding how many duplicates there are. That way, it will be easier to avoid printing results for the same number more than once, and you also won't have to loop through the entire array for each number.
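#include <algorithm> // std::sort
#include <iostream>
using std::cout;
void findRepeats(int numbers[], int num);
int main(){
    int array[] = {2, 1, 2, -3, 2};
    findRepeats(array,5);
}
void findRepeats(int numbers[], int num) {
    if (num <= 0) return; // nothing to report; also avoids reading numbers[0]
    //sort the array first
    std::sort(numbers, numbers + num);
    int last = numbers[0];
    int count = 0;
    cout << "Number of Occurrences\n";
    for (int i = 0; i < num; i++) {
        if (last == numbers[i]) {
            ++count; // still inside the same run of equal values
        } else {
            // a new value starts: report the finished run
            cout << last << " " << count << '\n';
            count = 1;
        }
        last = numbers[i];
    }
    // report the final run
    if (count > 0) {
        cout << last << " " << count << '\n';
    }
}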
prints:
Number of Occurrences
-3 1
1 1
2 3

I would use map or unordered_map to, well..., map each integer to the number of its occurrences. It makes things quite simple, as it basically takes care of the duplicates for you.
#include <iostream>
#include <unordered_map>
using namespace std;
void reportCounts(const int numbers[], const size_t size){
unordered_map<int, unsigned int> counts;
//unfortunately a range-for would be a bit of a pain to apply here,
//or at least I don't know a convenient way
for(size_t i = 0; i < size; ++i) {
counts [ numbers[i] ]++; //increase `count` of i-th number
}
//print results
for(auto count : counts ){
cout << count.first << ' ' << count.second << endl;
}
}
int main(){
int array[] = {2, 1, 2, -3, 2};
reportCounts(array,5);
}
Since it's an assignment, I am leaving figuring out the C++ shenanigans to you and http://cppreference.com. Keywords are map, map::iterator and maybe associative container, which map is an example of.
I do understand that it might be harder to understand than a plain implementation of some algorithm, but this is probably close to the optimal solution in modern C++, and putting effort into understanding how and why it works should prove beneficial. Notice how much less code had to be written, and that no algorithm had to be invented. Less implementation time, fewer places to make mistakes, less testing.

Search your array. For every integer, either record it, or increment your count of it. Repeat process till done, then print it.
How, you say? One approach would be to use parallel arrays: one to store the unique integers found, and another to store the count of each. Then print the unique integers and their counts.
Code example of simple search algorithm:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void print(vector<int> valueArray,vector<int> countArray){
for(unsigned int i = 0; i < valueArray.size(); ++i){
cout<<"Found value "<<valueArray[i]<<": "<<countArray[i]<<" times."<<endl;
}
}
void findRepeats(vector<int> testArray,vector<int> &valueArray,vector<int> &countArray){
for(unsigned int i = 0; i < testArray.size(); ++i){
if(valueArray.size() == 0){
valueArray.push_back(testArray[i]);
countArray.push_back(1);
}else{
bool newEntry = true;
for(unsigned int j = 0; j < valueArray.size(); ++j){
if(testArray[i] == valueArray [j]){
countArray[j]++;
newEntry = false;
break;//After find, break out of j-for-loop to save time.
}
}
if(newEntry){
valueArray.push_back(testArray[i]);
countArray.push_back(1);
}
}
}
}
int main(){
vector<int> testArray; //To store all integers entered.
vector<int> valueArray; //To store non-copied integers, dynamically, else handle it yourself.
vector<int> countArray; //To count increments of numbers found, dynamically, else handle it yourself.
testArray = {0,2,5,4,1,3,6,2,5,9,8,7,4,1,2,6,5,4,8,3,2,1,5,8,6,9,8,7,4,4,5,6,8,2,1,3,0,0,1,2,0,2,5,8};//Dummy data.
findRepeats(testArray,valueArray,countArray);//Function to find statistics on testArray.
cout<<"\nPrinting found characters, and number of times found: "<<endl;
print(valueArray,countArray);
return 0;
}
Output would be something like:
Printing found characters, and number of times found:
Found value 0: 4 times.
Found value 2: 7 times.
Found value 5: 6 times.
Found value 4: 5 times.
Found value 1: 5 times.
Found value 3: 3 times.
Found value 6: 4 times.
Found value 9: 2 times.
Found value 8: 6 times.
Found value 7: 2 times.
In the above, I used vectors for simplicity, but if you must do so with C-style arrays, one approach would be to create all three arrays the same size, and keep one integer counter for the number of indices used in valueArray and countArray; they should share it, since they're related 1 to 1. You will need to pass it to the findRepeats function as well.
Having arrays of the same size ensures that your values and counts will always fit, even in the worst case where every number entered is unique.

tallying elements in an array

So, I'm trying to tally the elements of an array. By this I mean, I have a large array, and each element will have multiples of itself throughout the array. I am trying to figure out how many times each element occurs, however I keep running into the issue of there being duplicate tallies. Since "x" could exist at 12 different places in the array, when I loop through it and keep a running sum, I get the tally for "x" 12 different times. Does anyone know of a simpler/better way to keep a tally of an array with no duplicates?
My code is:
where count is the number of elements
for(i=0;i<count;i++)
{
for(x=0; x<count;x++)
{
if(array[i]==array[x])
{
tallyz++;
}
}
tally[i]=tallyz-1;
tallyz=0;
}
}

std::map<X, unsigned> tally;
for(i = 0; i < count; ++i)
++tally[array[i]];
Note that this is best if the redundancy in the array is fairly high. If most items are unique you're probably better just sorting the array as others have mentioned.

If you can sort the array, simply sort it. Then all you have left is a linear scan of the elements, checking if the element behind this one is the same as the current element (don't forget bounds checking).
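A minimal sketch of the sort-then-scan idea, assuming the elements are ints and that a sorted copy is acceptable. The name tally_sorted and the pair-based return type are illustrative choices, not something from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns (value, count) pairs in ascending value order.
// Takes the input by value so the caller's data is left unsorted.
std::vector<std::pair<int, std::size_t>> tally_sorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    std::vector<std::pair<int, std::size_t>> result;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // The i > 0 test is the bounds check: element 0 has nothing behind it.
        if (i > 0 && values[i] == values[i - 1]) {
            ++result.back().second;  // same value as the previous element
        } else {
            result.push_back(std::make_pair(values[i], std::size_t(1)));  // new value
        }
    }
    return result;
}
```

Each distinct value appears exactly once in the result, which avoids the repeated tallies described in the question.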

As an alternative to sorting, you could use a map:
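#include <cstddef>
#include <iostream>
#include <map>
using namespace std;
template<class T, size_t N>
void printSums(T (&array)[N]) { // reference to an array of N elements
    map<T, size_t> m;
    for(T* p = array; p < array+N; ++p) {
        ++m[*p];
    }
    // typename is required because the iterator type depends on T
    for(typename map<T,size_t>::iterator it = m.begin(); it != m.end(); ++it) {
        cout << it->first << ": " << it->second << "\n";
    }
}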
Warning: this is untested code.

First use a map just as John said, then traverse the tally array:
std::map<X, unsigned> data;
for(i = 0; i < count; i++)
    data[array[i]]++;
for(i = 0; i < count; i++)
    tally[i]=data[array[i]]-1; // look up by array[i], the element being tallied

Find the biggest 3 numbers in a vector

I'm trying to make a function to get the 3 biggest numbers in a vector. For example:
Numbers: 1 6 2 5 3 7 4
Result: 5 6 7
I figured I could sort them DESC, get the 3 numbers at the beginning, and after that re-sort them ASC, but that would be a waste of memory allocation and execution time. I know there is a simpler solution, but I can't figure it out. And another problem is, what if I have only two numbers...
BTW: I use as compiler BorlandC++ 3.1 (I know, very old, but that's what I'll use at the exam..)
Thanks guys.
LE: If anyone wants to know more about what I'm trying to accomplish, you can check the code:
#include<fstream.h>
#include<conio.h>
int v[1000], n;
ifstream f("bac.in");
void citire();
void afisare_a();
int ultima_cifra(int nr);
void sortare(int asc);
void main() {
clrscr();
citire();
sortare(2);
afisare_a();
getch();
}
void citire() {
f>>n;
for(int i = 0; i < n; i++)
f>>v[i];
f.close();
}
void afisare_a() {
for(int i = 0;i < n; i++)
if(ultima_cifra(v[i]) == 5)
cout<<v[i]<<" ";
}
int ultima_cifra(int nr) {
return nr - 10 * ( nr / 10 );
}
void sortare(int asc) {
int aux, s;
if(asc == 1)
do {
s = 0;
for(int i = 0; i < n-1; i++)
if(v[i] > v[i+1]) {
aux = v[i];
v[i] = v[i+1];
v[i+1] = aux;
s = 1;
}
} while( s == 1);
else
do {
s = 0;
for(int i = 0; i < n-1; i++)
if(v[i] < v[i+1]) {
aux = v[i];
v[i] = v[i+1];
v[i+1] = aux;
s = 1;
}
} while(s == 1);
}
Citire = Read
Afisare = Display
Ultima Cifra = Last digit of number
Sortare = Bubble Sort

If you were using a modern compiler, you could use std::nth_element to find the top three. As is, you'll have to scan through the array keeping track of the three largest elements seen so far at any given time, and when you get to the end, those will be your answer.
For three elements that's a trivial thing to manage. If you had to do the N largest (or smallest) elements when N might be considerably larger, then you'd almost certainly want to use Hoare's select algorithm, just like std::nth_element does.
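For anyone who does have a standard-conforming compiler, the std::nth_element route might look like the sketch below. The helper name largest_k_unordered is made up for illustration, and the function works on a copy so the caller's vector is left untouched.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the k largest values in unspecified order.
std::vector<int> largest_k_unordered(std::vector<int> values, std::size_t k) {
    if (k >= values.size()) {
        return values;  // fewer than k elements: everything qualifies
    }
    // With std::greater, the first k positions end up holding the k largest
    // values, but in no particular order among themselves.
    std::nth_element(values.begin(), values.begin() + k, values.end(),
                     std::greater<int>());
    values.resize(k);
    return values;
}
```

If the three results are needed in a particular order, sort the returned vector afterwards; for three elements that cost is negligible.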

You could do this without needing to sort at all, it's doable in O(n) time with linear search and 3 variables keeping your 3 largest numbers (or indexes of your largest numbers if this vector won't change).
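Here is one way the single-pass idea could be written. Instead of three named variables it keeps a small sorted buffer of at most three values, which is an assumption made here so that negative numbers and inputs with fewer than three elements need no special sentinel. The name top_three is illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns up to three largest values, in ascending order.
std::vector<int> top_three(const std::vector<int>& values) {
    std::vector<int> best;  // kept sorted ascending, never more than 3 entries
    for (std::size_t i = 0; i < values.size(); ++i) {
        const int v = values[i];
        if (best.size() == 3) {
            if (v <= best[0]) {
                continue;  // not larger than the smallest of the current top three
            }
            best.erase(best.begin());  // drop the smallest to make room
        }
        // Insert v at its sorted position.
        best.insert(std::upper_bound(best.begin(), best.end(), v), v);
    }
    return best;
}
```

For the example input 1 6 2 5 3 7 4 this yields 5 6 7, and for a vector with only two numbers it simply returns those two in ascending order.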

Why not just step through it once and keep track of the 3 highest numbers encountered?
EDIT: The range of the input is important in how you want to keep track of the 3 highest numbers.

Use std::partial_sort to sort, in descending order, just the first c elements that you care about. It runs in roughly O(n log c) time, which is linear in n for a fixed number of desired elements c.
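A short sketch of that call, assuming a standard library is available (so not Borland C++ 3.1). The helper name largest_c is made up, and it sorts a copy rather than the caller's vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the c largest values, largest first.
std::vector<int> largest_c(std::vector<int> values, std::size_t c) {
    c = std::min(c, values.size());  // handles inputs shorter than c
    // Only the first c positions are put in order (descending, via std::greater);
    // the order of the remaining elements is unspecified.
    std::partial_sort(values.begin(), values.begin() + c, values.end(),
                      std::greater<int>());
    values.resize(c);
    return values;
}
```

With the example input 1 6 2 5 3 7 4 and c = 3, the result is 7 6 5; reverse it if ascending order is wanted.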

If you can't use std::nth_element write your own selection function.
You can read about them here: http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements

Sort them normally and then iterate from the back using rbegin(), for as many as you wish to extract (no further than rend() of course).
sort will happen in place whether ASC or DESC by the way, so memory is not an issue since your container element is an int, thus has no encapsulated memory of its own to manage.

Yes, sorting is good, especially for long or variable-length lists.
Why are you sorting it twice, though? The second sort might actually be very inefficient (depends on the algorithm in use). A reverse would be quicker, but why even do that? If you want them in ascending order at the end, then sort them into ascending order first ( and fetch the numbers from the end)

I think you have the choice between scanning the vector for the three largest elements or sorting it (either using sort in a vector or by copying it into an implicitly sorted container like a set).

If you can control how the array is filled, maybe you could insert the numbers in order and then take the first 3; otherwise you can use a binary tree to perform the search or just use a linear search as birryree says...

Thanks to @nevets1219 for pointing out that the code below only deals with positive numbers.
I haven't tested this code enough, but it's a start:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> nums;
nums.push_back(1);
nums.push_back(6);
nums.push_back(2);
nums.push_back(5);
nums.push_back(3);
nums.push_back(7);
nums.push_back(4);
int first = 0;
int second = 0;
int third = 0;
for (int i = 0; i < nums.size(); i++)
{
if (nums.at(i) > first)
{
third = second;
second = first;
first = nums.at(i);
}
else if (nums.at(i) > second)
{
third = second;
second = nums.at(i);
}
else if (nums.at(i) > third)
{
third = nums.at(i);
}
std::cout << "1st: " << first << " 2nd: " << second << " 3rd: " << third << std::endl;
}
return 0;
}

The following solution finds the three largest numbers in O(n) and preserves their relative order:
std::vector<int>::iterator p = std::max_element(vec.begin(), vec.end());
int x = *p;
*p = std::numeric_limits<int>::min();
std::vector<int>::iterator q = std::max_element(vec.begin(), vec.end());
int y = *q;
*q = std::numeric_limits<int>::min();
int z = *std::max_element(vec.begin(), vec.end());
*q = y; // restore original value
*p = x; // restore original value

A general solution for the top N elements of a vector:
1. Create an array or vector topElements of length N for your top N elements.
2. Initialise each element of topElements to the value of your first element in your vector.
3. Select the next element in the vector, or finish if no elements are left.
4. If the selected element is greater than topElements[0], replace topElements[0] with the value of the element. Otherwise, go to 3.
5. Starting with i = 0, swap topElements[i] with topElements[i + 1] if topElements[i] is greater than topElements[i + 1].
6. While i is less than N, increment i and go to 5.
7. Go to 3.
This should result in topElements containing your top N elements in reverse order of value - that is, the largest value is in topElements[N - 1].
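The steps above can be sketched roughly as follows, with one deliberate deviation that is my assumption rather than part of the answer: the buffer is seeded with the first N elements (sorted) instead of N copies of the first element, because copies of a large first element would otherwise stay in the result. For example, with the input 7 1 2 and N = 2, seeding with copies gives 7 7. The name top_n is illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the n largest values in ascending order,
// so the largest ends up at index n - 1.
std::vector<int> top_n(const std::vector<int>& values, std::size_t n) {
    n = std::min(n, values.size());
    // Assumption: seed with the first n elements rather than n copies of the first.
    std::vector<int> top(values.begin(), values.begin() + n);
    std::sort(top.begin(), top.end());
    if (n == 0) {
        return top;
    }
    for (std::size_t k = n; k < values.size(); ++k) {
        if (values[k] <= top[0]) {
            continue;  // not larger than the smallest kept value
        }
        top[0] = values[k];  // replace the smallest kept value
        // Bubble the new value up until the buffer is sorted again.
        for (std::size_t i = 0; i + 1 < n && top[i] > top[i + 1]; ++i) {
            std::swap(top[i], top[i + 1]);
        }
    }
    return top;
}
```

Each element costs at most N comparisons, so the whole pass is O(n * N), which is fine for small N such as 3.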
