How to find first non-repeating element? - c++

How to find first non-repeating element in an array.
Provided that you can only use 1 bit for every element of the array and time complexity should be O(n) where n is length of array.
Please note that I have deliberately imposed a constraint on memory requirements. It is also possible that it cannot be done with just one extra bit per element of the string. Please let me know whether it is possible or not.

I would say there is no comparison-based algorithm that can do it in O(n), as you have to compare the first element of the array with all others, the 2nd with all except the first, the 3rd with all except the first two, and so on: Sum i = O(n^2).
(But that does not necessarily mean that there is no faster algorithm; see sorting: there is a proof that you can't sort faster than O(n log n) if you are comparison-based, and there is indeed a faster one: bucket sort, which can do it in O(n).)
EDIT: In one of the other comments I said something about hash functions. I checked some facts about it, and here are the hashmap approach thoughts:
Obvious approach is (in Pseudocode):
// maxsize is the size of the count table, n is the length of A
for (i = 0; i < maxsize; i++)
    count[i] = 0;
for (i = 0; i < n; i++) {
    h = hash(A[i]);
    count[h]++;
}
// scan A in order; the first element counted exactly once is the answer
first = -1;
for (i = 0; i < n; i++) {
    if (count[hash(A[i])] == 1) {
        first = i;
        break;
    }
}
if (first != -1)
    printf("first unique: %d\n", A[first]);
There are some caveats:
How to get the hash. I did some research on perfect hash functions, and you can indeed generate one in O(n) ("Optimal algorithms for minimal perfect hashing" by George Havas et al.). I am not sure how good this paper is, as it claims an O(n) time bound but speaks of a non-linear space bound, which is plainly an error (I hope I am not the only one seeing the flaw in this): according to all the theoretical computer science I know of, time is an upper bound for space, as you don't have time to write to more space. But I believe them when they say it is possible in O(n).
The additional space: here I don't see a solution. The paper above cites some research that says you need 2.7 bits for the perfect hash function. With the additional count array (which you can shorten to three states: empty, 1 element, more than 1 element) you need 2 additional bits per element (1.58 if you assume you can somehow combine it with the above 2.7), which sums up to about 5 additional bits.
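The counting idea above can be sketched in C++ with an ordinary hash map instead of a perfect hash function. This is only an illustration of the two-pass structure: it assumes int elements, the name first_non_repeating is made up for this sketch, and it uses far more than one bit per element, so it does not meet the memory constraint from the question.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Returns the index of the first element that occurs exactly once,
// or -1 if every element repeats. Expected O(n) time, O(n) extra space.
int first_non_repeating(const std::vector<int>& a)
{
    std::unordered_map<int, int> count;   // value -> number of occurrences
    for (int x : a)
        ++count[x];

    // Second pass in array order, so the earliest unique element wins.
    for (std::size_t i = 0; i < a.size(); ++i)
        if (count[a[i]] == 1)
            return static_cast<int>(i);
    return -1;
}
```

The second pass runs over the array rather than over the table, which is what makes the result the first unique element in input order.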

Here I am making one assumption: the string is a character string containing only lowercase letters, so that 26 letters fit into one 32-bit integer at one bit per letter. Earlier I thought of using an array of 256 elements, but that would take 256*32 bits in total, i.e. 32 bits per element. In the end I found that I could not do it without one more variable. So the solution uses one 32-bit integer per bitmap for the 26 letters (one bitmap for letters seen, one for letters seen again):
#include <cstring>
#include <iostream>

// Assumes str contains only lowercase letters 'a'..'z'.
int print_non_repeating(const char* str)
{
    int bitmap = 0, bitmap_check = 0;  // seen at least once / seen more than once
    int length = strlen(str);
    for (int i = 0; i < length; i++)
    {
        int bit = 1 << (str[i] - 'a');
        if (bitmap & bit)
            bitmap_check |= bit;
        else
            bitmap |= bit;
    }
    bitmap ^= bitmap_check;  // letters seen exactly once
    if (bitmap != 0)
    {
        int i = 0;
        while (!(bitmap & (1 << (str[i] - 'a'))))
            i++;
        std::cout << str[i];
        return 1;
    }
    return 0;
}

You can try doing a modified bucketsort as exemplified below. However, you need to know the max value in the array passed into the firstNonRepeat method. So this runs at O(n).
For comparison based methods, the theoretical fastest (at least in terms of sorting) is O(n log n). Alternatively, you can even use modified versions of radix sort to accomplish this.
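public class BucketSort {
    // maxVal is the max value in the array; all values must lie in [0, maxVal]
    public int firstNonRepeat(int[] a, int maxVal) {
        int[] bucket = new int[maxVal + 1]; // Java zero-initializes the array
        for (int i = 0; i < a.length; i++) {
            bucket[a[i]]++;
        }
        // second pass in array order: first value counted exactly once
        for (int i = 0; i < a.length; i++) {
            if (bucket[a[i]] == 1) {
                return a[i];
            }
        }
        return -1; // no non-repeating element
    }
}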

This code finds the first repeating element. I haven't figured out yet whether the non-repeating element can be found in the same for loop without introducing another one (to keep the code O(n)). Other answers suggest bubble sort, which is O(n^2).
#include <iostream>
using namespace std;
#define max_size 10
int main()
{
int numbers[max_size] = { 1, 2, 3, 4, 5, 1, 3, 4 ,2, 7};
int table[max_size] = {0,0,0,0,0,0,0,0,0,0};
int answer = 0, j=0;
for (int i = 0; i < max_size; i++)
{
j = numbers[i] %max_size;
table[j]++;
if(table[j] >1)
{
answer = 1;
break;
}
}
std::cout << "answer = " << answer ;
}

Related

How to choose a random number excluding those which were previously chosen? [duplicate]

I'd like to make a number generator that does not repeat the number it has given out
already (C++).
All I know is:
int randomgenerator(){
int random;
srand(time(0));
random = rand()%11;
return(random);
} // Added this on edition
That function gives me redundant numbers.
I'm trying to create a questionnaire program that gives out 10 questions in a random order and I don't want any of the questions to reappear.
Does anyone know the syntax?
What I would do:
Generate a vector of length N and fill it with values 1,2,...N.
Use std::random_shuffle.
If you have say 30 elements and only want 10, use the first 10 out the vector.
EDIT: I have no idea how the questions are being stored, so.. :)
I am assuming the questions are being stored in a vector or somesuch with random access. Now I have generated 10 random numbers which don't repeat: 7, 4, 12, 17, 1, 13, 9, 2, 3, 10.
I would use those as indices for the vector of questions:
std::vector<std::string> questions;
//fill with questions
// indices is assumed to hold the non-repeating random numbers generated above
for(int i = 0; i < number_of_questions; i++)
{
    send_question_and_get_answer(questions[indices[i]]);
}
You are trying to solve the problem "the wrong way".
Try this instead (supposing you have a vector<int> with question ids, but the same idea will work with whatever you have):
1. Get a random R from 0 to N-1, where N is the number of questions in the container.
2. Add question R to another collection of "selected" questions.
3. If the "selected questions" collection has enough items, you're done.
4. Remove question R from your original container (now N has decreased by 1).
5. Go to 1.
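The steps above can be sketched roughly as follows. The function name pick_questions, the use of int question ids, and the std::mt19937 generator are assumptions made for this sketch, not something the answer prescribes.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Picks `wanted` distinct entries from `pool` by repeatedly drawing a random
// position and removing the drawn entry so it cannot be drawn again.
// `pool` is taken by value, so the caller's container is left untouched.
std::vector<int> pick_questions(std::vector<int> pool, std::size_t wanted,
                                std::mt19937& rng)
{
    std::vector<int> selected;
    while (selected.size() < wanted && !pool.empty()) {
        std::uniform_int_distribution<std::size_t> dist(0, pool.size() - 1);
        std::size_t r = dist(rng);
        selected.push_back(pool[r]);
        // The order of the remaining pool does not matter, so swap-and-pop
        // removes entry r in O(1) instead of shifting everything after it.
        std::swap(pool[r], pool.back());
        pool.pop_back();
    }
    return selected;
}
```

If fewer than `wanted` questions exist, the loop simply stops when the pool is empty and returns what it has.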
Sounds like you essentially want to shuffle a deck of cards (in this case, the "cards" being the questions, or question numbers).
In C++, I would do:
#include <vector>
#include <algorithm>
std::vector<int> question_numbers;
for (unsigned int i = 0; i < 10; ++i)
question_numbers.push_back(i+1);
std::random_shuffle(question_numbers.begin(), question_numbers.end());
// now dole out the questions based on the shuffled numbers
You do not have to hand out all of the questions, any more than you have to deal out a whole deck of cards every time you play a game. You can, of course, but there's no such requirement.
Create a vector of 10 elements (numbers 1-10), then shuffle it, with std::random_shuffle. Then just iterate through it.
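One caveat for newer compilers: std::random_shuffle was deprecated in C++14 and removed in C++17, and std::shuffle with an explicit random engine is its replacement. A minimal sketch of the same idea, where the function name shuffled_question_numbers is made up for illustration:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns the numbers 1..n in a random order, each exactly once.
std::vector<int> shuffled_question_numbers(int n, std::mt19937& rng)
{
    std::vector<int> numbers(n > 0 ? n : 0);
    std::iota(numbers.begin(), numbers.end(), 1);   // 1, 2, ..., n
    std::shuffle(numbers.begin(), numbers.end(), rng);
    return numbers;
}
```

A caller would typically create the engine once, for example with std::mt19937 rng(std::random_device{}());, and reuse it for every shuffle.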
Should look more like this: (Note: does not solve your original problem).
int randomgenerator(){
int random;
// I know this looks redundant compared to %11,
// but the bottom bits of rand() are less random than the top
// bits, so you get a better distribution like this.
// The +1 keeps the result in 0..10 even when rand() == RAND_MAX.
random = rand() / (RAND_MAX / 11 + 1);
return random;
}
int main()
{
// srand() goes here.
srand(time(0));
while(true)
{
std::cout << randomgenerator() << "\n";
}
}
A better way to solve the original problem is to pre-generate the numbers so you know that each number will appear only once. Then shuffle the order randomly.
int main()
{
int data[] = { 0,1,2,3,4,5,6,7,8,9,10,11};
int size = sizeof(data)/sizeof(data[0]);
std::random_shuffle(data, data + size);
for(int loop = 0; loop < size; ++loop)
{
std::cout << data[loop] << "\n";
}
}
Why not use some STL to perform the checks for you? The idea:
Create an (initially empty) set of 10 integers that will be the indices of the random questions (they will be distinct as a set forbids duplicate items). Keep pushing random numbers in [0, num_of_questions-1] in there until it grows to a size of 10 (duplicates will get rejected automatically). When you have that set ready, iterate over it and output the questions of the corresponding indexes:
std::vector<std::string> questions; /* I am assuming questions are stored in here */
std::set<int> random_indexes;
/* loop here until you get 10 distinct integers (needs at least 10 questions) */
while (random_indexes.size() < 10) random_indexes.insert(rand() % questions.size());
/* note: a std::set is iterated in ascending order of index */
for (auto index: random_indexes){
    std::cout << questions[index] <<std::endl;
}
I may be missing something, but it seems to me the answers that use shuffling of either questions or indexes perform more computations or use an unnecessary memory overhead.
//non repeating random number generator
for (int derepeater = 0; derepeater < arraySize; derepeater++)
{
for (int j = 0; j < arraySize; j++)
{
for (int i = arraySize - 1; i >= 0; i--)
{
if (Compare[j] == Compare[i] && j != i)
{
Compare[j] = rand() % upperlimit + 1;
}
}
}
}

Writing two versions of a function, one for "clarity" and one for "speed"

My professor assigned homework to write a function that takes in an array of integers and sorts all zeros to the end of the array while maintaining the current order of non-zero ints. The constraints are:
Cannot use the STL or other templated containers.
Must have two solutions: one that emphasizes speed and another that emphasizes clarity.
I wrote up this function attempting for speed:
#include <iostream>
#include <cstdio>
#include <cstdlib>
using namespace std;
void sortArray(int array[], int size)
{
int i = 0;
int j = 1;
int n = 0;
for (i = j; i < size;)
{
if (array[i] == 0)
{
n++;
i++;
}
else if (array[i] != 0 && j != i)
{
array[j++] = array[i++];
}
else
{
i++;
n++;
}
}
while (j < size)
{
array[j++] = 0;
}
}
int main()
{
//Example 1
int array[]{20, 0, 0, 3, 14, 0, 5, 11, 0, 0};
int size = sizeof(array) / sizeof(array[0]);
sortArray(array, size);
cout << "Result :\n";
for (int i = 0; i < size; i++)
{
cout << array[i] << " ";
}
cout << endl << "Press any key to exit...";
cin.get();
return 0;
}
It outputs correctly, but;
I don't know what the speed of it actually is, can anyone help me figure out how to calculate that?
I have no idea how to go about writing a function for "clarity"; any ideas?
In my experience, unless you have a very complicated algorithm, speed and clarity come together:
void sortArray(int array[], int size)
{
int item;
int dst = 0;
int src = 0;
// collect all non-zero elements
while (src < size) {
if ((item = array[src++]) != 0) {
array[dst++] = item;
}
}
// fill the rest with zeroes
while (dst < size) {
array[dst++] = 0;
}
}
Speed comes from a good algorithm. Clarity comes from formatting, naming variables and commenting.
Speed as in complexity?
Since you look at every element of the array, as you must, with a single loop going through the indexes in the range [0, N), where N denotes the size of the input, your solution is O(N).
Further reading:
Plain English explanation of big O
Determining big O Notation
Regarding clarity
In my honest opinion there should not need to be two alternatives when implementing functionality like this. If you rename your variables to more suitable (descriptive) names, your current solution should be clear enough to count as both performant and clear.
Your current approach can be written in plain English in a very clear fashion:
pseudo-explanation
set write_index to 0
set number_of_zeroes to 0
For each element in array
    If element is 0
        increase number_of_zeroes by one
    otherwise
        write element value to position denoted by write_index
        increase write_index by one
write number_of_zeroes 0s at the end of array
Having stated the explanation above we can quickly see that sortArray is not a descriptive name for your function, a more suitable name would probably be partition_zeroes or similar.
Adding comments could improve readability, but your current focus should lie in renaming your variables to better express the intent of the code.
(I feel your question is almost off-topic; I am answering it from a Linux perspective; I recommend using Linux to learn C++ programming; you'll adapt my advice to your operating system if you are using something else....)
speed
Regarding speed, you should have two complementary approaches.
The first (somewhat "theoretical") is to analyze (i.e. think about) your algorithm and give (with some proof) its asymptotic time complexity.
The second approach (only "practical", and often pragmatic) is to benchmark and profile your program. Don't forget to compile with optimizations enabled (e.g. using g++ -Wall -O2 with GCC). Have a benchmark which runs for more than half a second (so it processes a large amount of data, e.g. several million numbers) and repeat it several times (e.g. using the time(1) command on Linux). You could also measure some time inside your program using e.g. <chrono> in C++11, or just clock(3). If you read a large array from some file, or build a large array of pseudo-random numbers with <random> or with random(3), you certainly want to measure the time to read or fill the array separately from the time to move zeros out of it. See also time(7).
(You need to process a large amount of data - more than a million items, perhaps many millions of them - because computers are very fast; a typical "elementary" operation, i.e. a machine instruction, takes less than a nanosecond, and you have a lot of uncertainty on a single run, see this)
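As a rough sketch of the in-program measurement with <chrono>, assuming a function like the ones shown in the other answers: the names move_zeros_to_end and time_move_zeros are made up here, and std::vector is used only to hold the benchmark data, not in the function under test.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// The function under test: same idea as the answers above.
void move_zeros_to_end(int* array, std::size_t size)
{
    std::size_t dst = 0;
    for (std::size_t src = 0; src < size; ++src)
        if (array[src] != 0)
            array[dst++] = array[src];
    while (dst < size)
        array[dst++] = 0;
}

// Fills `count` items (roughly one third zeros) and returns the seconds
// spent inside move_zeros_to_end only; the fill time is excluded.
double time_move_zeros(std::size_t count)
{
    std::mt19937 rng(12345);                       // fixed seed: repeatable runs
    std::uniform_int_distribution<int> dist(0, 2);
    std::vector<int> data(count);
    for (int& x : data)
        x = dist(rng);

    auto start = std::chrono::steady_clock::now();
    move_zeros_to_end(data.data(), data.size());
    auto stop = std::chrono::steady_clock::now();

    // Use the result so the optimizer cannot discard the timed call.
    long long checksum = 0;
    for (int x : data)
        checksum += x;
    volatile long long sink = checksum;
    (void)sink;

    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_move_zeros several times with a few million items and comparing the results gives a feel for the run-to-run noise mentioned above.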
clarity
Regarding clarity, it is a bit subjective, but you might try to make your code readable and concise. Adding a few good comments could also help.
Be careful about naming: sorting is not exactly what your program is doing (it is more moving zeros than sorting the array)...
I think this is the best - Of course you may wish to use doxygen or some other
// Shift the non-zeros to the front and put zero in the rest of the array
void moveNonZerosTofront(int *list, unsigned int length)
{
unsigned int from = 0, to = 0;
// This will move the non-zeros
for (; from < length; ++from) {
if (list[from] != 0) {
list[to] = list[from];
to++;
}
}
// So the rest of the array needs to be assigned zero (as we found those on the way)
for (; to < length; ++to) {
list[to] = 0;
}
}

Large vector "Segmentation fault" error

I have gathered a large amount of extremely useful information from other peoples' questions and answers on SO, and have searched duly for an answer to this one as well. Unfortunately I have not found a solution to this problem.
The following function to generate a list of primes:
void genPrimes (std::vector<int>* primesPtr, int upperBound = 10)
{
std::ofstream log;
log.open("log.txt");
std::vector<int>& primesRef = *primesPtr;
// Populate primes with non-neg reals
for (int i = 2; i <= upperBound; i++)
primesRef.push_back(i);
log << "Generated reals successfully." << std::endl;
log << primesRef.size() << std::endl;
// Eratosthenes sieve to remove non-primes
for (int i = 0; i < primesRef.size(); i++) {
if (primesRef[i] == 0) continue;
int jumpStart = primesRef[i];
for (int jump = jumpStart; jump < primesRef.size(); jump += jumpStart) {
if (primesRef[i+jump] == 0) continue;
primesRef[i+jump] = 0;
}
}
log << "Executed Eratosthenes Sieve successfully.\n";
for (int i = 0; i < primesRef.size(); i++) {
if (primesRef[i] == 0) {
primesRef.erase(primesRef.begin() + i);
i--;
}
}
log << "Cleaned list.\n";
log.close();
}
is called by:
const int SIZE = 500;
std::vector<int>* primes = new std::vector<int>[SIZE];
genPrimes(primes, SIZE);
This code works well. However, when I change the value of SIZE to a larger number (say, 500000), the compiler returns a "segmentation error." I'm not familiar enough with vectors to understand the problem. Any help is much appreciated.
You are accessing primesRef[i + jump] where i could be primesRef.size() - 1 and jump could be primesRef.size() - 1, leading to an out of bounds access.
It is happening with a 500 limit, it is just that you happen to not have any bad side effects from the out of bound access at the moment.
Also note that using a vector here is a bad choice as every erase will have to move all of the following entries in memory.
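A sketch of a sieve that avoids both problems, keeping one flag per number instead of erasing and never indexing past the bound. The name primesUpTo and the return-by-value interface are choices made for this illustration, not the asker's original signature.

```cpp
#include <cstddef>
#include <vector>

// Returns all primes <= upperBound using a sieve of Eratosthenes.
// A flag per number replaces the erase() calls, and every index stays
// inside [0, upperBound].
std::vector<int> primesUpTo(int upperBound)
{
    std::vector<int> primes;
    if (upperBound < 2)
        return primes;

    std::vector<bool> composite(static_cast<std::size_t>(upperBound) + 1, false);
    for (long long i = 2; i <= upperBound; ++i) {
        if (composite[i])
            continue;
        primes.push_back(static_cast<int>(i));
        // Start at i*i: smaller multiples were already marked by smaller primes.
        // long long keeps i*i from overflowing for large bounds.
        for (long long j = i * i; j <= upperBound; j += i)
            composite[j] = true;
    }
    return primes;
}
```

Because the index is the number itself, there is no i+jump arithmetic to get wrong.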
Are you sure you wanted to do
new std::vector<int> [500];
and not
new std::vector<int> (500);
In the latter case, you are specifying the size of a single vector, whose location is available to you via the variable named 'primes'.
In the former, you are requesting space for 500 vectors, each default-constructed (empty).
That would be something like 24*500 bytes on my system. In the latter case, a single vector of length 500 is what you are asking for.
EDIT: look at the usage - he needs just one vector.
std::vector<int>& primesRef = *primesPtr;
The problem lies here:
// Populate primes with non-neg reals
for (int i = 2; i <= upperBound; i++)
primesRef.push_back(i);
You only have N-1 elements pushed back into your vector (the values 2 to N), but then try to access elements at index N-1 and beyond (i+jump). The fact that it did not fail with 500 is just dumb luck: the memory being overwritten was not catastrophic.
This code works well. However, when I change the value of SIZE to a larger number (say, 500000), ...
That may blow your stack, as it may be too big to be allocated there. You need dynamic memory allocation for all of the std::vector<int> instances you believe you need.
To achieve that, simply use a nested std::vector like this:
std::vector<std::vector<int>> primes(SIZE);
instead.
But to get straight on, I seriously doubt you need number of SIZE vector instances to store all of the prime numbers found, but just a single one initialized like this:
std::vector<int> primes(SIZE);

Fast Popcount instruction or Hamming distance for binary array?

I'm implementing on Visual Studio 2010 C++
I have two binary arrays. For example,
array1[100] = {1,0,1,0,0,1,1, .... }
array2[100] = {0,0,1,1,1,0,1, .... }
To calculate the Hamming distance between array1 and array2,
array3[100] stores the xor result of array1 and array2.
Then I have to count the number of 1 bits in array3. To do this, I know I can use the __popcnt instruction.
For now, I'm doing something like below:
popcnt_result = 0;
for (i=0; i<100; i++) {
popcnt_result = popcnt_result + __popcnt(array3[i]);
}
It shows a good result but is slow. How can I make it faster?
array3 seems a bit wasteful, you're accessing a whole extra 400 bytes of memory that you don't need to. I would try comparing what you have with the following:
for (int i = 0; i < 100; ++i) {
result += (array1[i] ^ array2[i]); // could also try != in place of ^
}
If that helps at all, then I leave it as an exercise for the reader how to apply both this change and duskwuff's.
As implemented, the __popcnt call is not helping. It's actually slowing you down.
__popcnt counts the number of set bits in its argument. You're only passing in one element, which looks like it's guaranteed to be 0 or 1, so the result (also 0 or 1) is not useful. Doing this would be slightly faster:
popcnt_result += array3[i];
Depending on how your array is laid out, you may or may not be able to use __popcnt in a cleverer way. Specifically, if your array consists of one-byte elements (e.g, char, bool, int8_t, or similar), you could perform a population count on four elements at a time:
for(i = 0; i < 100; i += 4) {
uint32_t *p = (uint32_t *) &array3[i];
popcnt_result += __popcnt(*p);
}
(Note that this depends on the fact that 100 is divisible evenly by 4. You'd have to add some special-case handling for the last few elements otherwise.)
If the array consists of larger values, such as int, though, you're out of luck, and there's still no guarantee that this will be any faster than the naïve implementation above.
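If the data can be stored packed, one bit per element, a population count does become useful, since each call then covers 64 elements. The sketch below assumes you are free to change the representation, which the question does not state; pack_bits and hamming_distance are made-up names, and std::bitset<64>::count() stands in portably for an intrinsic such as __popcnt64, which could be swapped in on x64 MSVC builds.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs an array of 0/1 values into 64-bit words, one bit per element.
std::vector<std::uint64_t> pack_bits(const int* bits, std::size_t n)
{
    std::vector<std::uint64_t> words((n + 63) / 64, 0);
    for (std::size_t i = 0; i < n; ++i)
        if (bits[i] != 0)
            words[i / 64] |= std::uint64_t(1) << (i % 64);
    return words;
}

// Hamming distance of two packed arrays of equal length:
// XOR marks the differing positions, the bit count adds them up.
std::size_t hamming_distance(const std::vector<std::uint64_t>& a,
                             const std::vector<std::uint64_t>& b)
{
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        distance += std::bitset<64>(a[i] ^ b[i]).count();
    return distance;
}
```

For 100 elements this reduces the inner loop to two XOR-and-count steps, though packing itself costs a pass over the data, so it only pays off if the arrays are compared more than once or are stored packed from the start.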
If your arrays only contain two values (0 or 1) the Hamming distance is just the number of positions where corresponding values are different. This can be done in one pass using std::inner_product from the standard library.
#include <iostream>
#include <functional>
#include <numeric>
int main()
{
int array1[100] = { 1,0,1,0,0,1,1, ... };
int array2[100] = { 0,0,1,1,1,0,1, ... };
int distance = std::inner_product(array1, array1 + 100, array2, 0, std::plus<int>(), std::not_equal_to<int>());
std::cout << "distance=" << distance << '\n';
return 0;
}

Shifting elements in an array C++

I've developed a method called "rotate" for my stack object class. If the stack contains the elements {0,2,3,4,5,6,7}, I need to be able to rotate the elements forwards and backwards.
If I need to rotate forwards by 2 elements, then we would have {3,4,5,6,7,0,2} in the array. And if I need to rotate backwards, by -3 elements, then, starting from the original array, it would be {5,6,7,0,2,3,4}.
So the method that I have developed works fine. It's just terribly inefficient IMO. I was wondering if I could wrap the array around by using the mod operator? Or if there is useless code hanging around that I haven't realized yet, and so on.
I guess my question is: how can I simplify this method, e.g. by using less code? :-)
void stack::rotate(int r)
{
int i = 0;
while ( r > 0 ) // rotate postively.
{
front.n = items[top+1].n;
for ( int j = 0; j < bottom; j++ )
{
items[j] = items[j+1];
}
items[count-1].n = front.n;
r--;
}
while ( r < 0 ) // rotate negatively.
{
if ( i == top+1 )
{
front.n = items[top+1].n;
items[top+1].n = items[count-1].n; // switch last with first
}
back.n = items[++i].n; // second element is the new back
items[i].n = front.n;
if ( i == bottom )
{
items[count-1].n = front.n; // last is first
i = 0;
r++;
continue;
}
else
{
front.n = items[++i].n;
items[i].n = back.n;
if ( i == bottom )
{
i = 0;
r++;
continue;
}
}
}
}
Instead of moving all the items in your stack, you could change the definition of 'beginning'. Have an index that represents the first item in the stack, 0 at the start, which you add to and subtract from using modular arithmetic whenever you want to rotate your stack.
Note that if you take this approach you shouldn't give users of your class access to the underlying array (not that you really should anyway...).
Well, as this is an abstraction around an array, you can store the "zero" index as a member of the abstraction, and index into the array based on this abstract notion of the first element. Roughly...
template <typename T>
class WrappedArray
{
    int length;
    int first;   // index of the logical first element
    T *array;
public:
    T get(int index)
    {
        return array[(first + index) % length];
    }
    void rotateForwards()
    {
        first++;
        if (first == length)
            first = 0;
    }
};
You've gotten a couple of reasonable answers, already, but perhaps one more won't hurt. My first reaction would be to make your stack a wrapper around an std::deque, in which case moving an element from one end to the other is cheap (O(1)).
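A minimal sketch of that idea, assuming the stack holds plain int values; the free function rotate_deque is a made-up name rather than part of the asker's class.

```cpp
#include <deque>

// Rotates by |r| positions: positive r moves front elements to the back,
// negative r moves back elements to the front. Each single step is O(1).
void rotate_deque(std::deque<int>& items, int r)
{
    if (items.size() <= 1)
        return;
    int size = static_cast<int>(items.size());
    r %= size;                         // full turns change nothing
    for (; r > 0; --r) {
        int x = items.front();
        items.pop_front();
        items.push_back(x);
    }
    for (; r < 0; ++r) {
        int x = items.back();
        items.pop_back();
        items.push_front(x);
    }
}
```

With the example from the question, rotating {0,2,3,4,5,6,7} by 2 gives {3,4,5,6,7,0,2}, and rotating the original by -3 gives {5,6,7,0,2,3,4}.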
What you are after here is a circular list.
If you insist on storing items in an array just use top offset and size for access. This approach makes inserting elements after you reached allocated size expensive though (re-allocation, copying). This can be solved by using doubly-linked list (ala std::list) and an iterator, but arbitrary access into the stack will be O(n).
The function rotate below is based on remainders (is this what you mean by the 'mod' operation?).
It is also quite efficient.
// Helper function.
// Finds GCD.
// See http://en.wikipedia.org/wiki/Euclidean_algorithm#Implementations
int gcd(int a, int b) {return b == 0 ? a : gcd(b, a % b);}
// Number of assignments of elements in algo is
// equal to (items.size() + gcd(items.size(),r)).
void rotate(std::vector<int>& items, int r) {
int size = (int)items.size();
if (size <= 1) return; // nothing to do
r = (r % size + size) % size; // fits r into [0..size)
int num_cycles = gcd(size, r);
for (int first_index = 0; first_index < num_cycles; ++first_index) {
int mem = items[first_index]; // assignment of items elements
int index = (first_index + r) % size, index_prev = first_index;
while (index != first_index) {
items[index_prev] = items[index]; // assignment of items elements
index_prev = index;
index = (index + r) % size;
};
items[index_prev] = mem; // assignment of items elements
}
}
Of course, if it is appropriate for you to change the data structure as described in other answers, you can obtain a more efficient solution.
And now, the usual "it's already in Boost" answer: There is a Boost.CircularBuffer
If for some reason you'd prefer to perform actual physical rotation of array elements, you might find several alternative solutions in "Programming Pearls" by Jon Bentley (Column 2, 2.3 The Power of Primitives). Actually a Web search for Rotating Algorithms 'Programming Pearls' will tell you everything. The literal approach you are using now has very little practical value.
If you'd prefer to try to solve it yourself, it might help to try looking at the problem differently. You see, "rotating an array" is really the same thing as "swapping two unequal parts of an array". Thinking about this problem in the latter terms might lead you to new solutions :)
For example,
Reversal Approach. Reverse the order of the elements in the entire array. Then reverse the two parts independently. You are done.
For example, let's say we want to rotate abcdefg right by 2
abcdefg -> reverse the whole -> gfedcba -> reverse the two parts -> fgabcde
P.S. Slides for that chapter of "Programming Pearls". Note that in Bentley's experiments the above algorithm proves to be quite efficient (among the three tested).
I don't understand what the variables front and back mean, and why you need .n. Anyway, this is the shortest code I know to rotate the elements of an array, which can also be found in Bentley's book.
#include <algorithm>
std::reverse(array , array + r );
std::reverse(array + r, array + size);
std::reverse(array , array + size);
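Wrapped into a function, the three reversals might look like the sketch below. The name rotate_by_reversal and the normalization of r (so that negative values rotate the other way) are additions for illustration. The standard library also offers std::rotate in <algorithm>, which performs the same left rotation in one call.

```cpp
#include <algorithm>

// Rotates array[0..size) left by r positions using three reversals.
// A negative r rotates right.
void rotate_by_reversal(int* array, int size, int r)
{
    if (size <= 1)
        return;
    r = (r % size + size) % size;      // map r into [0, size)
    std::reverse(array,     array + r);
    std::reverse(array + r, array + size);
    std::reverse(array,     array + size);
}
```

For the array {0,2,3,4,5,6,7} from the question, r = 2 yields {3,4,5,6,7,0,2} and r = -3 yields {5,6,7,0,2,3,4}.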