In short, I am trying to compress (left-pack) 64-bit integers by index. Neither the scatter nor the compress intrinsics solve this problem directly.
Suppose you have eight 64-bit integers in a and want to left-pack the elements selected by mask k, storing them at the positions given by index, relative to the base address dst.
int64_t* dst; // memory to store the result
__m512i a = _mm512_loadu_si512 ( arr ); // load data from memory into a
__mmask8 k = _mm512_cmpgt_epi64_mask ( a, _mm512_set1_epi64(6) ); // compare for greater-than
__m512i index = _mm512_set_epi64 ( 14, 12, 10, 8, 6, 4, 2, 0 ); // index vector
_mm512_mask_compressstoreu_epi64_by_index ( dst, k, index, a ); // How can I implement this function efficiently?
So, the _mm512_mask_compressstoreu_epi64_by_index function should compress 64-bit integers from a into memory at dst using index. The writemask k selects which elements of a are active and get stored to memory.
The result of this function should look like:
dst = [10, 0, 7, 0, 9, 0, 0, 0 ...].
The elements 10, 7 and 9 are stored at indices 0, 2 and 4 respectively.
I've tried the _mm512_mask_compressstoreu_epi64 and _mm512_mask_i64scatter_epi64 intrinsics, but these instructions store the elements differently. They give the following results:
_mm512_mask_compressstoreu_epi64( dst, k, a ) produces: dst = [ 10, 7, 9 , ... ]
_mm512_mask_i64scatter_epi64 ( dst, k, index, a, 8 ) produces: dst = [ 0, 10, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 9, ...]
What I want is _mm512_mask_compressstoreu_epi64_by_index( dst, k, index, a ), which results in dst = [10, 0, 7, 0, 9, 0, 0, 0 ...]
How can I solve this problem?
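To pin down the semantics being asked for, here is a scalar model in Python (not intrinsics). The lane values in a are hypothetical, chosen so that the mask selects 10, 7 and 9, and the function name compress_store_by_index is made up for this sketch. In intrinsic terms the same effect might be obtained by compressing first (_mm512_maskz_compress_epi64) and then scattering with a mask whose low popcount(k) bits are set, but that composition is an untested assumption here.

```python
def compress_store_by_index(dst, mask, index, a):
    """Scalar model: the j-th active lane of `a` is stored at dst[index[j]]."""
    j = 0  # number of active lanes seen so far
    for lane, value in enumerate(a):
        if (mask >> lane) & 1:
            dst[index[j]] = value
            j += 1
    return dst


# Hypothetical lane values, lane 0 first (the question does not show arr).
a = [10, 1, 7, 2, 9, 3, 4, 5]
# Same idea as the greater-than-6 compare mask k in the question.
mask = sum(1 << lane for lane, value in enumerate(a) if value > 6)
# Index vector, lane 0 first.
index = [0, 2, 4, 6, 8, 10, 12, 14]

dst = compress_store_by_index([0] * 16, mask, index, a)
print(dst[:8])  # [10, 0, 7, 0, 9, 0, 0, 0]
```

The model makes the difference from a plain scatter explicit: the index is consumed in order of active lanes, not looked up by lane number.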
My goal is to create a general function that builds a two-dimensional vector filled with permutations (each one a vector), based on a given template and on the following parameters:
template: a vector in which some positions are fixed. For example, if the given template is {0, 1, 0, -1, 3, -1}, the permutations vary only in the places marked -1.
n: the permutation may contain only integers in the range 0 to n-1. E.g. if n = 4, only 0, 1, 2, 3 can appear in the vector.
length: the length of the vector.
Note that if a number already appears in the template, it will not be generated in the permutations.
So, to give an example:
n = 6, length = 5, template = {2, 1, 0, -1, 0, -1}
the permutations are:
{2, 1, 0, 3, 0, 3}
{2, 1, 0, 3, 0, 4}
{2, 1, 0, 3, 0, 5}
{2, 1, 0, 4, 0, 3}
{2, 1, 0, 4, 0, 4}
{2, 1, 0, 4, 0, 5}
{2, 1, 0, 5, 0, 3}
{2, 1, 0, 5, 0, 4}
{2, 1, 0, 5, 0, 5}
As you can see, numbers are generated only at indexes 3 and 5 (the places where the template has -1). Those places also never contain 0, 1 or 2, since these already appear in the template.
I need to generate these permutations without using the <algorithm> library.
I assume creating a recursive function is the best option, but I do not know how to move forward. Any suggestions would help.
Since you've offered no visible attempt, I assume it might be helpful for you to study some working code. This is in JavaScript (I hope it's producing the expected output). I hope it can help give you some ideas you could translate to C++.
function f(template) {
  // Values already fixed by the template
  var used = template.reduce((acc, x) => { if (x != -1) acc.add(x); return acc; }, new Set());
  console.log(`used: ${Array.from(used)}`);
  // Candidate values: 0 .. template.length - 1, minus the used ones
  var needed = new Set(template.reduce((acc, x, i) => { if (!used.has(i)) acc.push(i); return acc; }, []));
  console.log(`needed: ${Array.from(needed)}`);
  // Positions of the free slots (-1) in the template
  var indexes = template.reduce((acc, x, i) => { if (x == -1) return acc.concat(i); else return acc; }, []);
  console.log(`indexes: ${indexes}`);

  function g(needed, indexes, template, i = 0) {
    if (i == indexes.length)
      return [template];
    var result = [];
    // Each member of 'needed' must appear in
    // each position, indexes[i]
    for (let x of needed) {
      let _template = template.slice();
      _template[indexes[i]] = x;
      result = result.concat(
        g(needed, indexes, _template, i + 1));
    }
    return result;
  }

  return g(needed, indexes, template);
}

var template = [2, 1, 0, -1, 0, -1];
var result = f(template);
var str = '\n';
for (let r of result)
  str += JSON.stringify(r) + '\n';
console.log(str);
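The same recursive idea can be sketched more compactly in Python, as a further aid for a C++ translation. This is only a sketch: the function name fill_template is made up, and n is passed explicitly as in the question rather than derived from the template length. It uses no library helpers, only a loop and recursion over the free slots.

```python
def fill_template(template, n):
    """Return every completion of `template`; -1 marks a free slot.

    Free slots receive values from 0 .. n-1 that do not already
    appear in the template. Values may repeat across slots, as in
    the expected output of the question.
    """
    used = {x for x in template if x != -1}
    candidates = [v for v in range(n) if v not in used]
    slots = [i for i, x in enumerate(template) if x == -1]
    results = []
    current = list(template)  # working copy, modified in place

    def recurse(k):
        if k == len(slots):           # every free slot is filled
            results.append(list(current))
            return
        for v in candidates:          # try each candidate in slot k
            current[slots[k]] = v
            recurse(k + 1)

    recurse(0)
    return results


perms = fill_template([2, 1, 0, -1, 0, -1], n=6)
print(len(perms))           # 9
print(perms[0], perms[-1])  # [2, 1, 0, 3, 0, 3] [2, 1, 0, 5, 0, 5]
```

Writing into one working vector and copying it only at the leaves avoids copying the template at every level, which maps naturally onto a std::vector passed by reference in C++.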
I would like to set all the values in the first list of the list of lists named "child_Before" (shown below) to zero. The code I wrote to accomplish this is shown after the list:
child_Before = [[9, 12, 7, 3, 13, 14, 10, 5, 4, 11, 8, 6, 2],
[1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1],
[[1, 0], [1, 1]]]
for elem in range(len(child_Before[0])):
    child_Before[0][elem] = 0
Below is the expected result:
child_After = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1],
[[1, 0], [1, 1]]]
However, I think there should be a more nimble way to accomplish this. Hence, I welcome your help. Thank you in advance.
Just to add a creative answer:
import numpy as np
child_Before[0] = (np.array(child_Before[0])&0).tolist()
This is bad practice, though, since I'm using bitwise operations in a scenario where they are not intuitive, and it makes more than one pass over the data (building the array, applying &, and calling tolist()). On the bright side, the & that produces the zeros is a single vectorized operation, although it is still O(n) in the length of the list, not O(1).
Just create a list of [0] with the same length as the original list.
# Answer to this question - make first list in the list to be all 0
child_Before[0] = [0] * len(child_Before[0])
As for your code, it can be adapted to replace every top-level element of the list with a list of zeros of the same length:
# Make all elements 0
for child in range(len(child_Before)):
    child_Before[child] = [0] * len(child_Before[child])
Use list comprehension:
child_after = [[i if n != 0 else 0 for i in j] for n, j in enumerate(child_Before)]
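One detail the answers above do not spell out: assigning to child_Before[0] rebinds that slot to a new list, so any other variable that still refers to the old inner list keeps the old values. If the zeros should be visible through every reference, slice assignment overwrites the existing list in place. A minimal sketch, using a shortened sample list and a hypothetical second reference named alias:

```python
# Shortened sample data, same structure as in the question.
child_Before = [[9, 12, 7], [1, 0, 1], [[1, 0], [1, 1]]]
alias = child_Before[0]  # a second reference to the first inner list

# Slice assignment replaces the contents of the existing list object.
child_Before[0][:] = [0] * len(child_Before[0])

print(child_Before[0])           # [0, 0, 0]
print(alias)                     # [0, 0, 0]
print(alias is child_Before[0])  # True
```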
Let's say I have an array A of size n, where 0 <= A[i] <= n.
Let's say I have 2 arrays Forward and Backward, of size n, where:
Forward[i] = index j where
A[j] = min(A[i], A[i+1], ..., A[n-1])
Backward[i] = index j where
A[j] = min(A[i], A[i-1], ..., A[0])
My question is:
given A, Forward and Backward
given 2 indexes l and r
Can I discover the index k such that A[k] = min(A[l], A[l+1], ..., A[r]) in constant time?
No, at least not in O(1) time. A counterexample follows, using 0-based indexing. Let
index = {0, 1, 2, 3, 4, 5, 6, 7, 8}
A = {1, 3, 5, 7, 9, 6, 4, 2, 0}
Forward = {8, 8, 8, 8, 8, 8, 8, 8, 8}
Backward = {0, 0, 0, 0, 0, 0, 0, 0, 8}
Now, if I ask you to get the index of the minimum value in range [3, 7], how will you do it?
Basically, Forward and Backward are of no use for finding the minimum in a range [a, b]
if Forward[a] > b and Backward[b] < a.
No, you can't. A counterexample is:
A = {0, 4, 3, 2, 3, 4, 0}
Forward = {6, 6, 6, 6, 6, 6, 6}
Backward = {0, 0, 0, 0, 0, 0, 0}
l = 1, r = 5
i.e. Forward and Backward are of no use in that case, and you have to search the array, which is O(r - l).
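If constant-time queries are the real goal, a different precomputed structure is needed. One standard option, not mentioned in the answers above and named here plainly as a sparse table, stores the argmin of every block whose length is a power of two. It needs O(n log n) preprocessing time and space, after which each query combines two overlapping blocks. A minimal sketch (the function names are made up for illustration):

```python
def build_sparse_table(A):
    """table[j][i] = index of a minimum of A[i : i + 2**j]."""
    n = len(A)
    table = [list(range(n))]  # blocks of length 1
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if A[left] <= A[right] else right)
        table.append(row)
        j += 1
    return table


def range_argmin(A, table, l, r):
    """Index of a minimum of A[l..r] (inclusive), from two overlapping blocks."""
    j = (r - l + 1).bit_length() - 1  # largest power of two that fits
    left, right = table[j][l], table[j][r - (1 << j) + 1]
    return left if A[left] <= A[right] else right


A = [1, 3, 5, 7, 9, 6, 4, 2, 0]  # first counterexample above
table = build_sparse_table(A)
print(range_argmin(A, table, 3, 7))  # 7

B = [0, 4, 3, 2, 3, 4, 0]        # second counterexample above
print(range_argmin(B, build_sparse_table(B), 1, 5))  # 3
```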
I have a 2-dimensional array of ones and zeros called M where the g rows represent groups and the a columns represent articles. M maps groups and articles. If a given article "art" belongs to group "gr" then we have M[gr,art]=1; if not we have M[gr,art]=0.
Now, I would like to convert M into a square a x a matrix of ones and zeros (call it N) where if an article "art1" is in the same group as article "art2", we have N(art1,art2)=1 and N(art1,art2)=0 otherwise. N is clearly symmetric with 1's in the diagonal.
How do I construct N based on M?
Many thanks for your suggestions - and sorry if this is trivial (still new to python...)!
So you have a boolean matrix M like the following:
>>> M
array([[1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1],
[0, 0, 1, 0, 0, 0],
[1, 0, 1, 0, 0, 0]])
>>> ngroups, narticles = M.shape
and what you want is a matrix of shape (narticles, narticles) that represents co-occurrence. That's simply the product of the matrix with its transpose:
>>> np.dot(M, M.T)
array([[1, 0, 0, 1],
[0, 2, 0, 0],
[0, 0, 1, 1],
[1, 0, 1, 2]])
... except that you don't want counts, so set entries > 0 to 1.
>>> N = np.dot(M, M.T)
>>> N[N > 0] = 1
>>> N
array([[1, 0, 0, 1],
[0, 1, 0, 0],
[0, 0, 1, 1],
[1, 0, 1, 1]])
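One point about orientation is worth checking against your own data. With M laid out as in the question (rows are groups, columns are articles), the 4 x 4 result shown above is indexed by groups. The article-by-article matrix of shape (narticles, narticles) comes from multiplying in the other order. A minimal sketch using the same example matrix:

```python
import numpy as np

# Same example matrix as above: rows are groups, columns are articles.
M = np.array([[1, 0, 0, 0, 0, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0]])

# counts[i, j] = number of groups that contain both article i and article j
counts = M.T @ M
# Collapse the counts to a 0/1 indicator matrix.
N = (counts > 0).astype(int)

print(N.shape)  # (6, 6)
print(N[0, 2])  # 1, articles 0 and 2 share the last group
```

Note that an article belonging to no group (columns 1 and 4 here) gets a 0 on the diagonal. If the diagonal should always be 1, np.fill_diagonal(N, 1) can be applied afterwards.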