Fast where() function in C++


I have an integer array filled with 0s and 1s. I am looking for the fastest way in C/C++ to get the positions (indices) of the 1s, something similar to the where() function in numpy.
Edit 1: since I only store bits, a char array would do the job just as well.
Edit 2: an example:
char a[5];
a[0]=0;
a[1]=1;
a[2]=1;
a[3]=0;
a[4]=1;
should return
1,2,4
The type of the array is not crucial, however I have to find the position of 1s as fast as possible.

If you store only bits, I suppose you could just use bool type, not char.
const unsigned int size = 5;
bool bits[size] = {0, 1, 0, 1, 0};
std::vector<unsigned int> indices;
const bool* ptr = &bits[0];
for (unsigned int i = 0; i < size; i++, ptr++)
{
    if (*ptr) indices.push_back(i);
}
If speed matters more to you than memory, you could use a regular (statically sized) array instead of std::vector, since growing a vector takes time, and cope with the over-allocation.
Otherwise, I recommend estimating the approximate number of set bits and calling reserve() on your indices vector. The vector then has to reallocate far less often, or not at all.
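The reserve() advice can be sketched as follows. This is a minimal illustration, not a tuned implementation: the helper name `where` is hypothetical (chosen to mirror numpy), and it assumes the flags live in a std::vector<char>. It reserves for the worst case, in which every entry is set.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of all non-zero entries of `flags`.
// `where` is a hypothetical helper name, not a standard function.
std::vector<std::size_t> where(const std::vector<char>& flags)
{
    std::vector<std::size_t> indices;
    // Worst case: every entry is set. Reserving up front trades memory
    // for the guarantee that push_back never reallocates.
    indices.reserve(flags.size());
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (flags[i]) indices.push_back(i);
    }
    return indices;
}
```

For the array from the question (0, 1, 1, 0, 1) this yields the indices 1, 2, 4. If the array is known to be sparse, reserving a smaller estimate would save memory at the cost of an occasional reallocation.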

Related

Initializing large 2-dimensional array to all one value in C++

I want to initialize a large 2-dimensional array (say 1000x1000, though I'd like to go even larger) to all -1 in C++.
If my array were 1-dimensional, I know I could do:
int my_array[1000];
memset(my_array, -1, sizeof(my_array));
However, memset does not allow for initializing all the elements of an array to be another array. I know I could just make a 1-dimensional array of length 1000000, but for readability's sake I would prefer a 2-dimensional array. I could also just loop through the 2-dimensional array to set the values after initializing it to all 0, but this bit of code will be run many times in my program and I'm not sure how fast that would be. What's the best way of achieving this?
Edited to add minimal reproducible example:
int my_array[1000][1000];
// I want my_array[i][j] to be -1 for each i, j

I am a little bit surprised.
And I know, it is C++. And, I would never use plain C-Style arrays.
And therefore the accepted answer is maybe the best.
But, if we come back to the question
int my_array[1000];
memset(my_array, -1, sizeof(my_array));
and
int my_array[1000][1000];
// I want my_array[i][j] to be -1 for each i, j
Then the easiest and fastest solution is the same as the original assumption:
int my_array[1000][1000];
memset(my_array, -1, sizeof(my_array));
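There is no difference. memset writes the byte 0xFF into every byte of the array, and an int made up entirely of 0xFF bytes is -1 on two's-complement machines, so this works for -1 (and for 0) but not for arbitrary values. The compiler will typically even optimize the call into fast inline fill instructions.
sizeof is smart enough: applied to the array itself, it yields the size of the whole 2-dimensional array. And the memory is contiguous, so it will work. Fast. Easy.
(A good compiler will do the same optimizations for the other solutions.)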
Please consider.
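A small sketch of the memset approach, next to a fallback for values it cannot handle. The names `grid`, `fill_minus_one`, and `fill_value` are made up for illustration, and the array is given static storage on the assumption that four megabytes would not fit on the stack.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iterator>

constexpr std::size_t kRows = 1000;   // illustrative dimensions
constexpr std::size_t kCols = 1000;

int grid[kRows][kCols];               // static storage, so no stack overflow

// memset works on bytes: every byte becomes 0xFF, and an int whose bytes
// are all 0xFF is -1 on two's-complement machines.
void fill_minus_one()
{
    std::memset(grid, -1, sizeof(grid));
}

// For values whose bytes are not all identical (e.g. 7), fill row by row.
void fill_value(int value)
{
    for (auto& row : grid)
        std::fill(std::begin(row), std::end(row), value);
}
```

Calling memset(grid, 1, sizeof(grid)) would not produce 1 in each element but 0x01010101, which is why the second function exists.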

With GNU GCC you can use range designators (a GNU C extension; note the three dots):
int my_array[1000][1000] = { [0 ... 999] = { [0 ... 999] = -1, }, };
With any other compiler you need to:
int my_array[1000][1000] = { { -1, -1, -1, .. repeat -1 1000 times }, ... repeat { } 1000 times ... };
Side note: The following is doing assignment, not initialization:
int my_array[1000][1000];
for (auto&& i : my_array)
for (auto&& j : i)
j = -1;
Is there any real difference between doing what you wrote and doing for(int i=0; i<1000; i++){ for(int j=0; j<1000; j++){ my_array[i][j]=-1; } }?
It depends. If you have a bad compiler, you compile without optimization, etc., then yes. Most probably, no. Anyway, don't use indexes. I believe the range based for loop in this case roughly translates to something like this:
for (int (*i)[1000] = my_array; i < my_array + 1000; ++i)
for (int *j = *i; j < *i + 1000; ++j)
*j = -1;
Side note: Ach! It hurts to calculate my_array + 1000 and *i + 1000 on each iteration. That's like 3 operations done each loop. CPU time wasted! It can be easily optimized to:
for (int (*i)[1000] = my_array, (*max_i)[1000] = my_array + 1000; i < max_i; ++i)
    for (int *j = *i, *max_j = *i + 1000; j < max_j; ++j)
        *j = -1;
The my_array[i][j] used in your loop translates into *(*(my_array + i) + j) (see array subscript operator). In terms of pointer arithmetic, that equals *(int*)((uintptr_t)my_array + i * sizeof(int[1000]) + j * sizeof(int)). Counting operations, my_array[i][j] does a multiplication, an addition, another multiplication, another addition, and a dereference behind the scenes, so about five operations. With a bad or non-optimizing compiler, your version could be way slower.
That said, a good compiler should optimize each version to the same code, as shown here.
And are either of these significantly slower than just initializing it explicitly by typing a million -1's?
I believe assigning each array element (in this particular case of elements having the easy-to-optimize type int) will be as fast as or slower than initialization. It really depends on your particular compiler and on your architecture. A bad compiler can generate a very slow loop over the array elements, so it would take forever. On the other hand, static initialization can embed the values in your program, so your program size will increase by sizeof(int) * 1000 * 1000, and during program startup it will do a plain memcpy when initializing the static regions of your program. So, compared to a properly optimized loop with assignment, you will gain nothing in terms of speed and lose tons of read-only memory.

If the array is statically sized, it's placed in sequential memory (check this question). So char [1000][1000] has the same layout as char [1000000] (if your stack can hold that much).
If the array has been created with multidimensional new (say char(*x)[5] = new char[5][5]) then it's also contiguous.
If it's not (if you create each row with a separate dynamic allocation), then you can use the solutions found in my question to map an n-dimensional array to a single one, which you can then memset.

How can I use a mixture of array and map in C++?

A short version of my problem: Is it possible to use a data structure that treats, for example, x[0] to x[10] as a normal array and a few other isolated indices, such as x[15] and x[20], as a map?
Reasons: I do not calculate or store any other value bigger than index 11,
and making the whole thing a map slows down the calculation significantly.
My initial problem: I am writing a fast program to calculate a series, which has x(0)=0, x(1)=1, x(2k)=(3x(k)+2x(Floor(k/2)))mod2^60, x(2k+1)=(2x(k)+3x(Floor(k/2)))mod2^60, and my target is to list numbers from x(10^12) to x(2*10^12)
I am listing and storing the first 10^8 value with normal array,
for (unsigned long long int i = 2; i<=100000000;i++){
if (i%2==0) {
x[i] =(3*x[i/2] + 2*x[(unsigned long long int)(i/4)])&1152921504606846975;
}
else{
x[i] =(2*x[(i-1)/2] + 3*x[(unsigned long long int)((i-1)/4)])&1152921504606846975;
}
}//these code for listing
unsigned long long int xtrans(unsigned long long int k){
if (k<=100000000)return x[k];
unsigned long long int result;
if (k%2==0) {
result =(3*xtrans(k/2) + 2*xtrans((unsigned long long int)(k/4)))&1152921504606846975;
}
else{
result =(2*xtrans((k-1)/2) + 3*xtrans((unsigned long long int)((k-1)/4)))&1152921504606846975;
}
return result;
}//These code for calculating x
Listing those numbers takes around 2s and 750MB of memory.
For further optimization, I am planning to store specific values, for example x[2*10^8] and x[4*10^8], without calculating and storing the values in between. For that I have to use a map. However, after I changed the declaration of x from an array to a map, the same listing took 90s and 4.5GB of memory.
So I am now wondering whether it is possible to treat indices up to 10^8 as an array and the remaining part as a map.

Simply write a wrapper class for your idea:
class MyMap {
...
    // Returns a reference so the result can be read and assigned.
    unsigned long long& operator[](size_t i) {
        return ( i <= barrier_ ) ? array_[i] : map_[i];
    }
};

TL;DR
Why not create a custom class with a std::array of size 11 (to cover x[0] to x[10]) and a std::map as members, and overload the [] operator to check the index and pick the value from either the array or the map as needed?
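A minimal sketch of such a wrapper, under some assumptions that go beyond the answers above: the class name `HybridTable` and its members are hypothetical, the dense part is a std::vector so its size can be chosen at run time, and std::unordered_map is swapped in for std::map because the sparse keys do not need to be ordered.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical wrapper: dense storage for indices below `dense_size`,
// a hash map for the sparse indices at or above it.
class HybridTable {
public:
    explicit HybridTable(std::size_t dense_size) : dense_(dense_size, 0) {}

    // Returns a reference so callers can both read and assign.
    // Note: reading a missing sparse index inserts a zero entry,
    // exactly as std::unordered_map::operator[] does.
    std::uint64_t& operator[](std::uint64_t i)
    {
        return i < dense_.size() ? dense_[i] : sparse_[i];
    }

    // Number of entries currently held in the map part.
    std::size_t sparse_count() const { return sparse_.size(); }

private:
    std::vector<std::uint64_t> dense_;
    std::unordered_map<std::uint64_t, std::uint64_t> sparse_;
};
```

With HybridTable x(11), the accesses x[0] to x[10] hit the vector, while x[15] and x[20] go to the map. The extra comparison on every access is usually cheap compared with a map lookup, but that is worth measuring for the actual workload.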

Theoretically, you can use the ArrayWithHash library to store your dictionary. It stores the dictionary as a hybrid of an array and a hash table, similar to the table implementation in the Lua interpreter.
Awh::ArrayWithHash<uint64_t, uint64_t> x;
x.Reserve(100000000, 0); //preallocate array part
for (uint64_t i = 2; i <= 100000000; i++) {
if (i % 2 == 0) {
x.Set(i, (3 * x.Get(i/2) + 2 * x.Get(i/4)) & 1152921504606846975ULL);
}
...
Unfortunately, memory consumption is one of the drawbacks of ArrayWithHash. It pads the array to a power-of-two size, so the array part would eat 1 GB. The hash table implementation is even less memory-efficient: it can take three times more memory than is required to store the key/value pairs.

Speed gains: Converting 2D array to 1D array

I initially had a 2D array, and lookups in it were slow. So I converted the 2D array into a 1D array, but there is still not much improvement in the speed of my program.
Here is my code:
for( counter1=0; counter1< size ; ++counter1)
{
y=buffer[counter1];
x=buffer[counter1+1];
IndexEntries index= OneDTable[x*256+y];
SingleTableEntries NewNextState=NewSingleTable[Next*blocksize+index];
Next=NewNextState.Next_State;
if(NewNextState.accept[index.serialnumber]=='1' )
{
return 1;
}
}
In my code above: OneDTable is a 1D array generated from a 2D array of 256 * 256 elements.
NewSingleTable is a 1D array generated from a 2D array of blocksize* (Total Next Elements).
I was expecting large speed gains after converting to 1D arrays. Is this the right way to extract a value from a 1D array, or can the above code be improved?
More Details:
Both 2D arrays are of structure type:
Structure type of IndexEntries consists of:
int
int
Structure type of NewSingleTable consists of:
int
vector<char>

You could gain something changing from a vector of vector to a plain vector. E.g. from:
std::vector<std::vector<my_struct>> table(total_rows,
std::vector<my_struct>(total_columns,
my_struct()));
// do something with table[row][column]...
to
std::vector<my_struct> table(total_rows * total_columns);
// do something with table[row * total_columns + column]...
This is because a vector of vectors is not really a matrix: each row is a separate allocation, so you lose data locality.
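One way to keep the table[row][column] readability while storing everything in a single vector is a thin wrapper. The following is only a sketch; the class name `FlatMatrix` and its interface are assumptions, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D view over one contiguous std::vector (row-major order).
template <typename T>
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Element access: the 2D lookup is implemented by hand.
    T& operator()(std::size_t row, std::size_t col)
    {
        return data_[row * cols_ + col];
    }
    const T& operator()(std::size_t row, std::size_t col) const
    {
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;   // one allocation, rows stored back to back
};
```

Usage would look like FlatMatrix<my_struct> table(total_rows, total_columns); followed by table(row, column). All elements sit in one allocation, so neighbouring rows are also neighbours in memory.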
Changing from:
my_struct table[total_rows][total_columns];
to
my_struct table[total_rows * total_columns];
is worthless since the memory layout between the two is (usually) precisely the same.
The only difference is the semantic type of the array and the fact that you now have to implement the 2D element lookup yourself (of course changing from table[row * 256 + column] to table[(row << 8) + column] is useless since any decent compiler will automatically perform this "optimization"; note the parentheses, because + binds tighter than <<).
The 1D array could be a bit faster when you have to perform an operation on every element. This is because of the simpler for loop:
for (unsigned row(0); row < total_rows; ++row)
for (unsigned column(0); column < total_columns; ++column)
// do something with table[row][column]
const unsigned stop(total_rows * total_columns);
for (unsigned i(0); i < stop; ++i)
// do something with table[i]
but this isn't your case.
As laune said in his comment, copying a whole SingleTableEntries element (including its vector<char>) out of NewSingleTable just to extract a couple of values is bad:
SingleTableEntries NewNextState=NewSingleTable[Next*blocksize+index];
From your example it seems that a const reference should be enough:
...
const SingleTableEntries &NewNextState(NewSingleTable[Next * blocksize + index]);
if (NewNextState.accept[index.serialnumber] == '1' )
return 1;
Next = NewNextState.Next_State;
...

How to get intersection of two Arrays

I have two integer arrays
int A[] = {2, 4, 3, 5, 6, 7};
int B[] = {9, 2, 7, 6};
And I have to get the intersection of these arrays,
i.e. the output will be 2, 6, 7.
I am thinking of solving it by saving array A in a data structure and then comparing all the elements till size A or B to get the intersection.
Now I have a problem: I first need to store the elements of array A in a container.
Shall I proceed like this:
int size = sizeof(A)/sizeof(int);
This gives me the size, but after that I also want to access all the elements and store them in a container.
Here is the code I am using to find the intersection:
#include <iostream>
using namespace std;
int A[] = {2, 4, 3, 5, 6, 7};
int B[] = {9, 2, 7, 6};
int main()
{
    int sizeA = sizeof(A)/sizeof(int);
    int sizeB = sizeof(B)/sizeof(int);
    // Index A only up to sizeA and B only up to sizeB,
    // so neither array is read out of bounds.
    for (int i = 0; i < sizeA; ++i)
    {
        for (int j = 0; j < sizeB; ++j)
        {
            if(A[i] == B[j])
            {
                cout<<"Element is -->"<<A[i]<<endl;
            }
        }
    }
    return 0;
}

Just use a hash table:
#include <unordered_set> // needs C++11 or TR1
// ...
unordered_set<int> setOfA(A, A + sizeA);
Then you can just check for every element in B, whether it's also in A:
for (int i = 0; i < sizeB; ++i) {
if (setOfA.find(B[i]) != setOfA.end()) {
cout << B[i] << endl;
}
}
Runtime is expected O(sizeA + sizeB).
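Put together as one function, the hash-table approach might look like the sketch below. The function name `intersect_hashed` is hypothetical, and it uses std::vector instead of raw arrays for convenience. As a small variation, it erases each match from the set so that a value repeated in the second input is reported only once.

```cpp
#include <unordered_set>
#include <vector>

// Returns the values of `b` that also occur in `a`, in the order they
// appear in `b`, each reported once. The name is a hypothetical choice.
std::vector<int> intersect_hashed(const std::vector<int>& a,
                                  const std::vector<int>& b)
{
    std::unordered_set<int> remaining(a.begin(), a.end());
    std::vector<int> result;
    for (int value : b) {
        // erase() returns the number of elements removed (0 or 1), so a
        // value repeated in `b` is reported only the first time.
        if (remaining.erase(value) == 1) result.push_back(value);
    }
    return result;
}
```

For A = {2, 4, 3, 5, 6, 7} and B = {9, 2, 7, 6} the result is 2, 7, 6, that is, the common elements in the order in which they occur in B.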

You can sort the two arrays
sort(A, A+sizeA);
sort(B, B+sizeB);
and use a merge-like algorithm to find their intersection:
#include <vector>
...
std::vector<int> intersection;
int idA=0, idB=0;
while(idA < sizeA && idB < sizeB) {
if (A[idA] < B[idB]) idA ++;
else if (B[idB] < A[idA]) idB ++;
else { // => A[idA] == B[idB], we have a common element
intersection.push_back(A[idA]);
idA ++;
idB ++;
}
}
The time complexity of this part of the code is linear. However, due to the sorting of the arrays, the overall complexity becomes O(n * log n), where n = max(sizeA, sizeB).
The additional memory required for this algorithm is optimal (equal to the size of the intersection).
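The standard library already ships the merge-like step as std::set_intersection, so the same idea can be sketched more compactly. The function name `intersect_sorted` is an assumption, and taking the inputs by value (to leave the callers' data unsorted) is a design choice that costs two copies.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Takes the inputs by value so the callers' containers keep their order.
std::vector<int> intersect_sorted(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> result;
    // Requires both ranges to be sorted; appends the common elements.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(result));
    return result;
}
```

For the arrays in the question this produces 2, 6, 7 in ascending order.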

saving array A in a data structure
Arrays are data structures; there's no need to save A into one.
i want to compare all the element till size A or B and then i will get intersection
This is extremely vague but isn't likely to yield the intersection; notice that you must examine every element in both A and B but "till size A or B" will ignore elements.
What approach i should follow to get size of an unkown size array and store it in a container??
It isn't possible to deal with arrays of unknown size in C unless they have some end-of-array sentinel that allows counting the number of elements (as is the case with NUL-terminated character arrays, commonly referred to in C as "strings"). However, the sizes of your arrays are known because their compile-time sizes are known. You can calculate the number of elements in such arrays with a macro:
#define ARRAY_ELEMENT_COUNT(a) (sizeof(a)/sizeof *(a))
...
int *ptr = new sizeof(A);
[Your question was originally tagged [C], and my comments below refer to that]
This isn't valid C -- new is a C++ keyword.
If you wanted to make copies of your arrays, you could simply do it with, e.g.,
int Acopy[ARRAY_ELEMENT_COUNT(A)];
memcpy(Acopy, A, sizeof A);
or, if for some reason you want to put the copy on the heap,
int* pa = malloc(sizeof A);
if (!pa) {
    /* handle out-of-memory and do not fall through to memcpy */
}
memcpy(pa, A, sizeof A);
/* After you're done using pa: */
free(pa);
[In C++ you would use new and delete]
However, there's no need to make copies of your arrays in order to find the intersection, unless you need to sort them (see below) but also need to preserve the original order.
There are a few ways to find the intersection of two arrays. If the values fall within the range of 0-63, you can use two 64-bit unsigned integers (e.g., uint64_t; a plain unsigned long may be only 32 bits wide) and set the bits corresponding to the values in each array, then use & (bitwise "and") to find the intersection. If the values aren't in that range but the difference between the largest and smallest is < 64, you can use the same method but subtract the smallest value from each value to get the bit number. If the range is not that small but the number of distinct values is <= 64, you can maintain a lookup table (array, binary tree, hash table, etc.) that maps the values to bit numbers and a 64-element array that maps bit numbers back to values.
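The simplest of these cases, values restricted to 0-63, can be sketched like this. The function names are hypothetical, and silently skipping out-of-range values is an assumption made to keep the example short.

```cpp
#include <cstdint>
#include <vector>

// Builds a 64-bit mask with bit v set for every v in `values`.
// Assumes every value lies in [0, 63]; anything else is skipped.
std::uint64_t to_mask(const std::vector<int>& values)
{
    std::uint64_t mask = 0;
    for (int v : values)
        if (v >= 0 && v < 64) mask |= std::uint64_t{1} << v;
    return mask;
}

// Intersection via bitwise AND; results come out in ascending order.
std::vector<int> intersect_small(const std::vector<int>& a,
                                 const std::vector<int>& b)
{
    const std::uint64_t common = to_mask(a) & to_mask(b);
    std::vector<int> result;
    for (int bit = 0; bit < 64; ++bit)
        if (common & (std::uint64_t{1} << bit)) result.push_back(bit);
    return result;
}
```

The intersection itself is a single AND instruction; the loops only convert between the array and bitmask representations.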
If your arrays may contain more than 64 distinct values, there are two effective approaches:
1) Sort each array and then compare them element by element to find the common values -- this algorithm resembles the merge step of a merge sort.
2) Insert the elements of one array into a fast lookup table (hash table, balanced binary tree, etc.), and then look up each element of the other array in the lookup table.

Sort both arrays (e.g., qsort()) and then walk through both arrays one element at a time.
Where there is a match, add it to a third array. The result can be no larger than the smaller of the two input arrays, so sizing the third array to the larger input plus one slot for the terminator is always safe. Use a negative or other "dummy" value that cannot occur in your data as the terminator.
When walking through input arrays, where one value in the first array is larger than the other, move the index of the second array, and vice versa.
When you're done walking through both arrays, your third array has your answer, up to the terminator value.

Finding repeating signed integers with O(n) in time and O(1) in space

(This is a generalization of: Finding duplicates in O(n) time and O(1) space)
Problem: Write a C++ or C function with time and space complexities of O(n) and O(1) respectively that finds the repeating integers in a given array without altering it.
Example: Given {1, 0, -2, 4, 4, 1, 3, 1, -2} function must print 1, -2, and 4 once (in any order).
EDIT: The following solution requires a duo-bit (to represent 0, 1, and 2) for each integer in the range of the minimum to the maximum of the array. The number of necessary bytes (regardless of array size) never exceeds (INT_MAX – INT_MIN)/4 + 1.
#include <stdio.h>
#include <stdlib.h> /* calloc, free */
/* Stores the smallest and largest element of a[] in *min_addr and *max_addr. */
void set_min_max(int a[], long long unsigned size,
                 int* min_addr, int* max_addr)
{
    long long unsigned i;
    if(!size) return;
    *min_addr = *max_addr = a[0];
    for(i = 1; i < size; ++i)
    {
        if(a[i] < *min_addr) *min_addr = a[i];
        if(a[i] > *max_addr) *max_addr = a[i];
    }
}
void print_repeats(int a[], long long unsigned size)
{
    long long unsigned i;
    int min = 0, max = 0; /* defined values even when size is 0 */
    long long diff, q, r;
    unsigned char* duos; /* unsigned, so adding to the top duo-bit cannot overflow */
    set_min_max(a, size, &min, &max);
    diff = (long long)max - (long long)min;
    duos = calloc(diff / 4 + 1, 1);
    if(!duos) return; /* allocation failed */
    for(i = 0; i < size; ++i)
    {
        diff = (long long)a[i] - (long long)min; /* index of duo-bit
                                                    corresponding to a[i]
                                                    in sequence of duo-bits */
        q = diff / 4; /* index of byte containing duo-bit in "duos" */
        r = diff % 4; /* offset of duo-bit */
        switch( (duos[q] >> (6 - 2*r )) & 3 )
        {
            case 0: duos[q] += (1 << (6 - 2*r)); /* first occurrence */
                    break;
            case 1: duos[q] += (1 << (6 - 2*r)); /* second occurrence: print once */
                    printf("%d ", a[i]);
                    break;
        }
    }
    putchar('\n');
    free(duos);
}
int main(void)
{
    int a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
    print_repeats(a, sizeof(a)/sizeof(int));
    return 0;
}

Big-O notation takes a function f(x) as its argument and states that there exists a constant K such that, as x tends to infinity, the actual cost function stays below K·f(x). Typically f is chosen to be the smallest simple function for which the condition holds. (It's pretty obvious how to lift this to multiple variables.)
This matters because that K, which you aren't required to specify, allows a whole multitude of complex behavior to be hidden out of sight. For example, if the core of the algorithm is O(n^2), it allows all sorts of other O(1), O(log n), O(n), O(n log n), O(n^(3/2)), etc. supporting bits to be hidden, even if for realistic input data those parts are what actually dominate. That's right, it can be completely misleading! (Some of the fancier bignum algorithms have this property for real. Lying with mathematics is a wonderful thing.)
So where is this going? Well, you can assume that int is a fixed size easily enough (e.g., 32-bit) and use that information to skip a lot of trouble and allocate fixed size arrays of flag bits to hold all the information that you really need. Indeed, by using two bits per potential value (one bit to say whether you've seen the value at all, another to say whether you've printed it) then you can handle the code with fixed chunk of memory of 1GB in size. That will then give you enough flag information to cope with as many 32-bit integers as you might ever wish to handle. (Heck that's even practical on 64-bit machines.) Yes, it's going to take some time to set that memory block up, but it's constant so it's formally O(1) and so drops out of the analysis. Given that, you then have constant (but whopping) memory consumption and linear time (you've got to look at each value to see whether it's new, seen once, etc.) which is exactly what was asked for.
It's a dirty trick though. You could also try scanning the input list to work out the range allowing less memory to be used in the normal case; again, that adds only linear time and you can strictly bound the memory required as above so that's constant. Yet more trickiness, but formally legal.
[EDIT] Sample C code (this is not C++, but I'm not good at C++; the main difference would be in how the flag arrays are allocated and managed):
#include <stdio.h>
#include <stdlib.h>
// Bit fiddling magic (assumes a 32-bit unsigned int)
int is(const unsigned *ary, unsigned int value) {
    return (ary[value>>5] >> (value&31)) & 1u;
}
void set(unsigned *ary, unsigned int value) {
    ary[value>>5] |= 1u<<(value&31); // 1u avoids shifting into the sign bit
}
// Main loop
void print_repeats(int a[], unsigned size) {
    unsigned *seen, *done;
    unsigned i;
    // 134217728 = 2^27 words of 32 bits: one flag per possible 32-bit value
    seen = calloc(134217728, sizeof(unsigned));
    done = calloc(134217728, sizeof(unsigned));
    if (!seen || !done) { // allocation failed
        free(seen);
        free(done);
        return;
    }
    for (i=0; i<size; i++) {
        if (is(done, (unsigned) a[i]))
            continue;
        if (is(seen, (unsigned) a[i])) {
            set(done, (unsigned) a[i]);
            printf("%d ", a[i]);
        } else
            set(seen, (unsigned) a[i]);
    }
    printf("\n");
    free(done);
    free(seen);
}
int main(void) {
    int a[] = {1,0,-2,4,4,1,3,1,-2};
    print_repeats(a,sizeof(a)/sizeof(int));
    return 0;
}
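The range-scanning variant mentioned above (work out the minimum and maximum first, then allocate flags only for that range) could look roughly like this in C++. It is a sketch under assumptions: the function name `find_repeats` is made up, it returns the repeats instead of printing them, and it uses std::vector<bool> for the two flag arrays.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns each value that occurs more than once, in order of its second
// occurrence. Memory is proportional to (max - min), not to the input size.
std::vector<int> find_repeats(const std::vector<int>& a)
{
    std::vector<int> repeats;
    if (a.empty()) return repeats;

    const auto mm = std::minmax_element(a.begin(), a.end());
    const long long lo = *mm.first;
    const long long span = static_cast<long long>(*mm.second) - lo + 1;

    // One "seen" flag and one "reported" flag per value in [min, max].
    std::vector<bool> seen(static_cast<std::size_t>(span), false);
    std::vector<bool> reported(static_cast<std::size_t>(span), false);

    for (int v : a) {
        const std::size_t idx = static_cast<std::size_t>(v - lo);
        if (reported[idx]) continue;
        if (seen[idx]) {
            reported[idx] = true;   // second occurrence: report it once
            repeats.push_back(v);
        } else {
            seen[idx] = true;       // first occurrence
        }
    }
    return repeats;
}
```

For the sample input {1, 0, -2, 4, 4, 1, 3, 1, -2} this returns 4, 1, -2. The worst case still needs the full 1 GB of flags (and a 64-bit size_t), so the formal argument about constant space is unchanged; only the typical case gets cheaper.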

Since you have an array of integers you can use the straightforward solution with sorting the array (you didn't say it can't be modified) and printing duplicates. Integer arrays can be sorted with O(n) and O(1) time and space complexities using Radix sort. Although, in general it might require O(n) space, the in-place binary MSD radix sort can be trivially implemented using O(1) space (look here for more details).

The O(1) space constraint is intractable.
The very fact of printing the array itself requires O(N) storage, by definition.
Now, feeling generous, I'll give you that you can have O(1) storage for a buffer within your program and consider that the space taken outside the program is of no concern to you, and thus that the output is not an issue...
Still, the O(1) space constraint feels intractable, because of the immutability constraint on the input array. It might not be, but it feels so.
And your solution overflows, because you try to memorize an O(N) information in a finite datatype.

There is a tricky problem with definitions here. What does O(n) mean?
Konstantin's answer claims that the radix sort time complexity is O(n). In fact it is O(n log M), where the base of the logarithm is the radix chosen, and M is the range of values that the array elements can have. So, for instance, a binary radix sort of 32-bit integers will have log M = 32.
So this is still, in a sense, O(n), because log M is a constant independent of n. But if we allow this, then there is a much simpler solution: for each integer in the range (all 4294967296 of them), go through the array to see if it occurs more than once. This is also, in a sense, O(n), because 4294967296 is also a constant independent of n.
I don't think my simple solution would count as an answer. But if not, then we shouldn't allow the radix sort, either.

I doubt this is possible. Assuming there is a solution, let's see how it works. I'll try to be as general as I can and show that it can't work... So, how does it work?
Without losing generality we could say we process the array k times, where k is fixed. The solution should also work when there are m duplicates, with m >> k. Thus, in at least one of the passes, we should be able to output x duplicates, where x grows when m grows. To do so, some useful information has been computed in a previous pass and stored in the O(1) storage. (The array itself can't be used, this would give O(n) storage.)
The problem: we have O(1) of information, and when we walk over the array we have to identify x numbers (to output them). We need O(1) storage that can tell us in O(1) time whether an element is in it. Or, said differently, we need a data structure that stores n booleans (of which x are true), uses O(1) space, and takes O(1) time to query.
Does this data structure exist? If not, then we can't find all duplicates in an array with O(n) time and O(1) space (or there is some fancy algorithm that works in a completely different manner???).

I really don't see how you can have only O(1) space and not modify the initial array. My guess is that you need an additional data structure. For example, what is the range of the integers? If it's 0..N like in the other question you linked, you can have an additional count array of size N. Then traverse the original array in O(N) and increment the counter at the position of the current element. Then traverse the count array and print the numbers with count >= 2. Something like:
int* counts = new int[N](); // the () zero-initializes the counters
for(int i = 0; i < N; i++) {
counts[input[i]]++;
}
for(int i = 0; i < N; i++) {
if(counts[i] >= 2) cout << i << " ";
}
delete [] counts;

Say you can use the fact that you are not using all the space you have. You only need one more bit per possible value, and you have lots of unused bits in your 32-bit int values.
This has serious limitations, but works in this case. Numbers have to be between -n/2 and n/2 and if they repeat m times, they will be printed m/2 times.
void print_repeats(long a[], unsigned size) {
long i, val, pos, topbit = 1 << 31, mask = ~topbit;
for (i = 0; i < size; i++)
a[i] &= mask;
for (i = 0; i < size; i++) {
val = a[i] & mask;
if (val <= mask/2) {
pos = val;
} else {
val += topbit;
pos = size + val;
}
if (a[pos] < 0) {
printf("%ld\n", val);
a[pos] &= mask;
} else {
a[pos] |= topbit;
}
}
}
int main() {
long a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
print_repeats(a, sizeof (a) / sizeof (long));
}
prints
4
1
-2
