Not all invocations running - c++

I am attempting to implement a simple compute operation of multiplying all values in a buffer by a push constant.
This shader:
#version 450
layout(local_size_x = 1024, local_size_y = 1, local_size_z = 1) in;
layout(binding = 0) buffer Buffer { float x[]; };
layout(push_constant) uniform PushConsts { float a; };
void main() {
x[gl_GlobalInvocationID.x] *= a;
}
In my current implementation from my prints I get:
in:
1 2 3 4 5 5 4 3 2 1
[7]
workgroups: (1,1,1)
out:
7 14 3 4 5 5 4 3 2 1
1 2 3 4 5 5 4 3 2 1 being the initial values of the buffer.
7 being the value of the push constant.
(1,1,1) being the number of workgroups set.
7 14 3 4 5 5 4 3 2 1 being the resultant values of the buffer after execution.
Strangely only the 1st 2 values have been multiplied by the push constant (which leads me to think only the 1st 2 invocations ran).
I suspect this is likely an issue with memory allocation, although I don’t know where it may be coming from.
I would greatly appreciate any help, and if there is anything I can do to make this post better please let me know.
Project link, code file link and the .spv files (.comp files are in the repo, but I thought it worth including these in case you didn't want to compile the .comp files)

Problem originated with setting the size of the VkBuffer and VkDescriptorBufferInfo to the number of values rather than the number of bytes.
Making these changes fixes it:
bufferCreateInfo.size = size; -> bufferCreateInfo.size = sizeof(float)*size;
bindings[i].range = size; -> bindings[i].range = VK_WHOLE_SIZE;

Related

How to construct a tree given its depth and postorder traversal, then print its preorder traversal

I need to construct a tree given its depth and postorder traversal, and then I need to generate the corresponding preorder traversal. Example:
Depth: 2 1 3 3 3 2 2 1 1 0
Postorder: 5 2 8 9 10 6 7 3 4 1
Preorder(output): 1 2 5 3 6 8 9 10 7 4
I've defined two arrays that contain the postorder sequence and depth. After that, I couldn't come up with an algorithm to solve it.
Here's my code:
int postorder[1000];
int depth[1000];
string postorder_nums;
getline(cin, postorder_nums);
istringstream token1(postorder_nums);
string tokenString1;
int idx1 = 0;
while (token1 >> tokenString1) {
postorder[idx1] = stoi(tokenString1);
idx1++;
}
string depth_nums;
getline(cin, depth_nums);
istringstream token2(depth_nums);
string tokenString2;
int idx2 = 0;
while (token2 >> tokenString2) {
depth[idx2] = stoi(tokenString2);
idx2++;
}
Tree tree(1);
You can do this actually without constructing a tree.
First note that if you reverse the postorder sequence, you get a kind of preorder sequence, but with the children visited in opposite order. So we'll use this fact and iterate over the given arrays from back to front, and we will also store values in the output from back to front. This way at least the order of siblings will come out right.
The first value we get from the input will thus always be the root value. Obviously we cannot store this value at the end of the output array, as it really should come first. But we will put this value on a stack until all other values have been processed. The same will happen for any value that is followed by a "deeper" value (again: we are processing the input in reversed order). But as soon as we find a value that is not deeper, we flush a part of the stack into the output array (also filling it up from back to front).
When all values have been processed, we just need to flush the remaining values from the stack into the output array.
Now, we can optimise our space usage here: as we fill the output array from the back, we have free space at its front to use as the stack space for this algorithm. This has as nice consequence that when we arrive at the end we don't need to flush the stack anymore, because it is already there in the output, with every value where it should be.
Here is the code for this algorithm where I did not include the input collection, which apparently you already have working:
// Input example
int depth[] = {2, 1, 3, 3, 3, 2, 2, 1, 1, 0};
int postorder[] = {5, 2, 8, 9, 10, 6, 7, 3, 4, 1};
// Number of values in the input
int n = sizeof(depth)/sizeof(int);
int preorder[n]; // This will contain the ouput
int j = n; // index where last value was stored in preorder
int stackSize = 0; // how many entries are used as stack in preorder
for (int i = n - 1; i >= 0; i--) {
while (depth[i] < stackSize) {
preorder[--j] = preorder[--stackSize]; // flush it
}
preorder[stackSize++] = postorder[i]; // stack it
}
// Output the result:
for (int i = 0; i < n; i++) {
std::cout << preorder[i] << " ";
}
std::cout << "\n";
This algorithm has an auxiliary space complexity of O(1) -- so not counting the memory needed for the input and the output -- and has a time complexity of O(n).
I won't give you the code, but some hints how to solve the problem.
First, for postorder graph processing you first visit the children, then print (process) the value of the node. So, the tree or subtree parent is the last thing that is processed in its (sub)tree. I replace 10 with 0 for better indentation:
2 1 3 3 3 2 2 1 1 0
--------------------
5 2 8 9 0 6 7 3 4 1
As explained above, node of depth 0, or the root, is the last one. Let's lower all other nodes 1 level down:
2 1 3 3 3 2 2 1 1 0
-------------------
1
5 2 8 9 0 6 7 3 4
Now identify all nodes of depth 1, and lower all that is not of depth 0 or 1:
2 1 3 3 3 2 2 1 1 0
-------------------
1
2 3 4
5 8 9 0 6 7
As you can see, (5,2) is in a subtree, (8,9,10,6,7,3) in another subtree, (4) is a single-node subtree. In other words, all that is to the left of 2 is its subtree, all to the right of 2 and to the left of 3 is in the subtree of 3, all between 3 and 4 is in the subtree of 4 (here: empty).
Now lets deal with depth 3 in a similar way:
2 1 3 3 3 2 2 1 1 0
-------------------
1
2 3 4
5 6 7
8 9 0
2 is the parent for 2;
6 is the parent for 8, 8, 10;
3 is ahe parent for 6,7;
or very explicitly:
2 1 3 3 3 2 2 1 1 0
-------------------
1
/ / /
2 3 4
/ / /
5 6 7
/ / /
8 9 0
This is how you can construct a tree from the data you have.
EDIT
Clearly, this problem can be solved easily by recursion. In each step you find the lowest depth, print the node, and call the same function recursively for each of its subtrees as its argument, where the subtree is defined by looking for current_depth + 1. If the depth is passed as another argument, it can save the necessity of computing the lowest depth.

Optimal way to compress 60 bit string

Given 15 random hexadecimal numbers (60 bits) where there is always at least 1 duplicate in every 20 bit run (5 hexdecimals).
What is the optimal way to compress the bytes?
Here are some examples:
01230 45647 789AA
D8D9F 8AAAF 21052
20D22 8CC56 AA53A
AECAB 3BB95 E1E6D
9993F C9F29 B3130
Initially I've been trying to use Huffman encoding on just 20 bits because huffman coding can go from 20 bits down to ~10 bits but storing the table takes more than 9 bits.
Here is the breakdown showing 20 bits -> 10 bits for 01230
Character Frequency Assignment Space Savings
0 2 0 2×4 - 2×1 = 6 bits
2 1 10 1×4 - 1×2 = 2 bits
1 1 110 1×4 - 1×3 = 1 bits
3 1 111 1×4 - 1×3 = 1 bits
I then tried to do huffman encoding on all 300 bits (five 60bit runs) and here is the mapping given the above example:
Character Frequency Assignment Space Savings
---------------------------------------------------------
a 10 101 10×4 - 10×3 = 10 bits
9 8 000 8×4 - 8×3 = 8 bits
2 7 1111 7×4 - 7×4 = 0 bits
3 6 1101 6×4 - 6×4 = 0 bits
0 5 1100 5×4 - 5×4 = 0 bits
5 5 1001 5×4 - 5×4 = 0 bits
1 4 0010 4×4 - 4×4 = 0 bits
8 4 0111 4×4 - 4×4 = 0 bits
d 4 0101 4×4 - 4×4 = 0 bits
f 4 0110 4×4 - 4×4 = 0 bits
c 4 1000 4×4 - 4×4 = 0 bits
b 4 0011 4×4 - 4×4 = 0 bits
6 3 11100 3×4 - 3×5 = -3 bits
e 3 11101 3×4 - 3×5 = -3 bits
4 2 01000 2×4 - 2×5 = -2 bits
7 2 01001 2×4 - 2×5 = -2 bits
This yields a savings of 8 bits overall, but 8 bits isn't enough to store the huffman table. It seems because of the randomness of the data that the more bits you try to encode with huffman the less effective it works. Huffman encoding seemed to work best with 20 bits (50% reduction) but storing the table in 9 or less bits isnt possible AFAIK.
In the worst-case for a 60 bit string there are still at least 3 duplicates, the average case there are more than 3 duplicates (my assumption). As a result of at least 3 duplicates the most symbols you can have in a run of 60 bits is just 12.
Because of the duplicates plus the less than 16 symbols, I can't help but feel like there is some type of compression that can be used
If I simply count the number of 20-bit values with at least two hexadecimal digits equal, there are 524,416 of them. A smidge more than 219. So the most you could possibly save is a little less than one bit out of the 20.
Hardly seems worth it.
If I split your question in two parts:
How do I compress (perfect) random data: You can't. Every bit is some new entropy which can't be "guessed" by a compression algorithm.
How to compress "one duplicate in five characters": There are exactly 10 options where the duplicate can be (see table below). This is basically the entropy. Just store which option it is (maybe grouped for the whole line).
These are the options:
AAbcd = 1 AbAcd = 2 AbcAd = 3 AbcdA = 4 (<-- cases where first character is duplicated somewhere)
aBBcd = 5 aBcBd = 6 aBcdB = 7 (<-- cases where second character is duplicated somewhere)
abCCd = 8 abCdC = 9 (<-- cases where third character is duplicated somewhere)
abcDD = 0 (<-- cases where last characters are duplicated)
So for your first example:
01230 45647 789AA
The first one (01230) is option 4, the second 3 and the third option 0.
You can compress this by multiplying each consecutive by 10: (4*10 + 3)*10 + 0 = 430
And uncompress it by using divide and modulo: 430%10=0, (430/10)%10=3, (430/10/10)%10=4. So you could store your number like that:
1AE 0123 4567 789A
^^^ this is 430 in hex and requires only 10 bit
The maximum number for the three options combined is 1000, so 10 bit are enough.
Compared to storing these 3 characters normally you save 2 bit. As someone else already commented - this is probably not worth it. For the whole line it's even less: 2 bit / 60 bit = 3.3% saved.
If you want to get rid of the duplicates first, do this, then look at the links at the bottom of the page. If you don't want to get rid of the duplicates, then still look at the links at the bottom of the page:
Array.prototype.contains = function(v) {
for (var i = 0; i < this.length; i++) {
if (this[i] === v) return true;
}
return false;
};
Array.prototype.unique = function() {
var arr = [];
for (var i = 0; i < this.length; i++) {
if (!arr.contains(this[i])) {
arr.push(this[i]);
}
}
return arr;
}
var duplicates = [1, 3, 4, 2, 1, 2, 3, 8];
var uniques = duplicates.unique(); // result = [1,3,4,2,8]
console.log(uniques);
Then you would have shortened your code that you have to deal with. Then you might want to check out Smaz
Smaz is a simple compression library suitable for compressing strings.
If that doesn't work, then you could take a look at this:
http://ed-von-schleck.github.io/shoco/
Shoco is a C library to compress and decompress short strings. It is very fast and easy to use. The default compression model is optimized for english words, but you can generate your own compression model based on your specific input data.
Let me know if it works!

Downscale array for decimal factor

Is there efficient way to downscale number of elements in array by decimal factor?
I want to downsize elements from one array by certain factor.
Example:
If I have 10 elements and need to scale down by factor 2.
1 2 3 4 5 6 7 8 9 10
scaled to
1.5 3.5 5.5 7.5 9.5
Grouping 2 by 2 and use arithmetic mean.
My problem is what if I need to downsize array with 10 elements to 6 elements? In theory I should group 1.6 elements and find their arithmetic mean, but how to do that?
Before suggesting a solution, let's define "downsize" in a more formal way. I would suggest this definition:
Downsizing starts with an array a[N] and produces an array b[M] such that the following is true:
M <= N - otherwise it would be upsizing, not downsizing
SUM(b) = (M/N) * SUM(a) - The sum is reduced proportionally to the number of elements
Elements of a participate in computation of b in the order of their occurrence in a
Let's consider your example of downsizing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to six elements. The total for your array is 55, so the total for the new array would be (6/10)*55 = 33. We can achieve this total in two steps:
Walk the array a totaling its elements until we've reached the integer part of N/M fraction (it must be an improper fraction by rule 1 above)
Let's say that a[i] was the last element of a that we could take as a whole in the current iteration. Take the fraction of a[i+1] equal to the fractional part of N/M
Continue to the next number starting with the remaining fraction of a[i+1]
Once you are done, your array b would contain M numbers totaling to SUM(a). Walk the array once more, and scale the result by N/M.
Here is how it works with your example:
b[0] = a[0] + (2/3)*a[1] = 2.33333
b[1] = (1/3)*a[1] + a[2] + (1/3)*a[3] = 5
b[2] = (2/3)*a[3] + a[4] = 7.66666
b[3] = a[5] + (2/3)*a[6] = 10.6666
b[4] = (1/3)*a[6] + a[7] + (1/3)*a[8] = 13.3333
b[5] = (2/3)*a[8] + a[9] = 16
--------
Total = 55
Scaling down by 6/10 produces the final result:
1.4 3 4.6 6.4 8 9.6 (Total = 33)
Here is a simple implementation in C++:
double need = ((double)a.size()) / b.size();
double have = 0;
size_t pos = 0;
for (size_t i = 0 ; i != a.size() ; i++) {
if (need >= have+1) {
b[pos] += a[i];
have++;
} else {
double frac = (need-have); // frac is less than 1 because of the "if" condition
b[pos++] += frac * a[i]; // frac of a[i] goes to current element of b
have = 1 - frac;
b[pos] += have * a[i]; // (1-frac) of a[i] goes to the next position of b
}
}
for (size_t i = 0 ; i != b.size() ; i++) {
b[i] /= need;
}
Demo.
You will need to resort to some form of interpolation, as the number of elements to average isn't integer.
You can consider computing the prefix sum of the array, i.e.
0 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
yields by summation
0 1 2 3 4 5 6 7 8 9
1 3 6 10 15 21 28 36 45 55
Then perform linear interpolation to get the intermediate values that you are lacking, like at 0*, 10/6, 20/6, 30/5*, 40/6, 50/6, 60/6*. (Those with an asterisk are readily available).
0 1 10/6 2 3 20/6 4 5 6 40/6 7 8 50/6 9
1 3 15/3 6 10 35/3 15 21 28 100/3 36 45 145/3 55
Now you get fractional sums by subtracting values in pairs. The first average is
(15/3-1)/(10/6) = 12/5
I can't think of anything in the C++ library that will crank out something like this, all fully cooked and ready to go.
So you'll have to, pretty much, roll up your sleeves and go to work. At this point, the question of what's the "efficient" way of doing it boils down to its very basics. Which means:
1) Calculate how big the output array should be. Based on the description of the issue, you should be able to make that calculation even before looking at the values in the input array. You know the input array's size(), you can calculate the size() of the destination array.
2) So, you resize() the destination array up front. Now, you no longer need to worry about the time wasted in growing the size of the dynamic output array, incrementally, as you go through the input array, making your calculations.
3) So what's left is the actual work: iterating over the input array, and calculating the downsized values.
auto b=input_array.begin();
auto e=input_array.end();
auto p=output_array.begin();
Don't see many other options here, besides brute force iteration and calculations. Iterate from b to e, getting your samples, calculating each downsized value, and saving the resulting value into *p++.

Formula Sequence

I need help finding the formula of the sequence for the next problem.
What I think and have for now is Sn=n(10^n-1)/9 but it just works in some cases...
Here is the description of the problem:
Description
Sn is based upon the sequence positive integers numbers. The value n can be found n times, so the first 25 terms of this sequence are as follows:
1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7...
For this problem, you have to write a program that calculates the i-th term in the sequence. That is, determine Sn(i).
Input specification
Input may contain several test cases (but no more than 10^5). Each test case is given in a line of its own, and contains an integer i (1 <= i <= 2 * 10^9). Input ends with a test case in which i is 0, and this case must not be processed.
Output specification
For each test case in the input, you must print the value of Sn(i) in a single line.
Sample input
1
25
100
0
Sample output
1
7
14
Thanks solopilot! I made the code but the online judge show me Time Limit Exceeded, what could be my error?
#include <iostream> #include <math.h> using namespace std; int main() {int i;
int NTS;
cin>>i;
while (i>=1){
NTS=ceil((sqrt(8*i+1)-1)/2);
cout<<" "<<NTS<<endl;
cin>>i;
}
return 0;}
F(n) = ceiling((sqrt(8*n+1)-1)/2)
Say F(n) = a.
Then n ~= a * (a+1) / 2.
Rearranging: a^2 + a - 2n ~= 0.
Solving: a = F(n) = (-1 + sqrt(1+8n)) / 2.
Ignore the negative answer.
The pattern looks like a pyramid.
Level : 1 3 6 10 15 21 28...
No : 1 2 3 4 5 6 7...
Level = n(n+1)/2 => elements
3 = 3*4/2 => 6
6 = 6*7/2 => 21

Transposing a sparse matrix using linked lists (Traversal problems)

I'm trying to transpose a sparse matrix in c++. I'm struggling with the traversal of the new transposed matrix. I want to enter everything from the first row of the matrix to the first column of the new matrix.
Each row has the column index the number should be in and the number itself.
Input:
colInd num colInd num colInd num
Input:
1 1 2 2 3 3
1 4 2 5 3 6
1 7 2 8 3 9
Output:
1 1 2 4 3 7
1 2 2 5 3 8
1 3 2 6 3 9
How do I make the list traverse down the first column inserting the first element as it goes then go back to the top inserting down the second column. Apologies if this is two hard to follow. But all I want help with is traversing the Transposed matrix to be in the right place at the right time inserting a nz(non zero) object in the right place.
Here is my code
list<singleRow> tran;
//Finshed reading so transpose
for (int i = 0; i < rows.size(); i++){ // Initialize transposed matrix
singleRow trow;
tran.push_back(trow);
}
list<singleRow>::const_iterator rit;
list<singleRow>::const_iterator trowit;
int rowind;
for (rit = rows.begin(), rowind = 1; rit != rows.end(); rit++, rowind++){//rit = row iterator
singleRow row = *rit;
singleRow::const_iterator nzit;
trowit = tran.begin(); //Start at the beginning of the list of rows
trow = *trowit;
for (nzit = row.begin(); nzit != row.end(); nzit++){//nzit = non zero iterator
int col = nzit->getCol();
double val = nzit->getVal();
trow.push_back(nz(rowind,val)); //How do I attach this to tran so that it goes in the right place?
trowit++;
}
}
Your representation of the matrix is inefficient: it doesn't use the fact that the matrix is sparse. I say so because it includes all the rows of the matrix, even if most of them are zero (empty), like it usually happens with sparse matrices.
Your representation is also hard to work with. So i suggest converting the representation first (to a regular 2-D array), transposing the matrix, and convert back.
(Edited:)
Alternatively, you can change the representation, for example, like this:
Input: rowInd colInd num
1 1 1
1 2 2
1 2 3
2 1 4
2 2 5
2 3 6
3 1 7
3 2 8
3 3 9
Output:
1 1 1
2 1 2
3 1 3
1 2 4
2 2 5
3 2 6
1 3 7
2 3 8
3 3 9
The code would be something like this:
struct singleElement {int row, col; double val;};
list<singleElement> matrix_input, matrix_output;
...
// Read input matrix from file or some such
list<singleElement>::const_iterator i;
for (i = matrix_input.begin(); i != matrix_input.end(); ++i)
{
singleElement e = *i;
std::swap(e.row, e.col);
matrix_output.push_back(e);
}
Your choice of list-of-list representation for a sparse matrix is poor for transposition. Sometimes, when considering algorithms and data structures, the best thing to do is to take the hit for transforming your data structure into one better suited for your algorithm than to mangle your algorithm to work with the wrong data structure.
In this case you could, for example, read your matrix into a coordinate list representation which would be very easy to transpose, then write into whatever representation you like. If space is a challenge, then you might need to do this chunk by chunk, allocating new columns in your target representation 1 by 1 and deallocating columns in your old representation as you go.