I am trying to create a mask for torch in C++ of type BoolTensor. The first n elements in dimension one need to be False and the rest need to be True.
This is my attempt but I do not know if this is correct (size is the number of elements):
src_mask = torch::BoolTensor({6, 1});
src_mask[:size,:] = 0;
src_mask[size:,:] = 1;
I'm not sure I understand exactly what your goal is, so here is my best attempt at converting your pseudo-code into C++.
First, with libtorch you declare the type of your tensor through the torch::TensorOptions struct (type names are prefixed with a lowercase k).
Second, Python-like slicing is possible thanks to the torch::Tensor::slice function (see here and there).
Finally, that gives you something like :
// Creates a tensor of boolean, initially all ones
auto options = torch::TensorOptions().dtype(torch::kBool);
torch::Tensor bool_tensor = torch::ones({6,1}, options);
// Set the slice to 0
int size = 3;
bool_tensor.slice(/*dim=*/0, /*start=*/0, /*end=*/size) = 0;
std::cout << bool_tensor << std::endl;
Please note that this sets the first size rows to 0. I assumed that's what you meant by "first elements in dimension x".
Another way to do it:
using namespace torch::indexing; //for using Slice(...) function
at::Tensor src_mask = at::empty({ 6, 1 }, at::kBool); //empty bool tensor
src_mask.index_put_({ Slice(None, size), Slice() }, 0); //src_mask[:size,:] = 0
src_mask.index_put_({ Slice(size, None), Slice() }, 1); //src_mask[size:,:] = 1
So I was asked to write a function that changes array's values in a way that:
All of the values that are the smallest aren't changed
if, let's assume, the smallest number is 2 and there is no 3's and 4's then all 5's are changed for 3's etc.
For example, for an array [2, 5, 7, 5] we would get [2, 3, 4, 3]. In general, the minimal value of the array remains unchanged, and every other distinct value is replaced according to its rank among the distinct values. In our example, 5 is the first minimum besides 2, so it becomes 2 + 1 = 3; 7 is the second smallest after 2, so it becomes 2 + 2 = 4.
I've come up with something like this:
int fillGaps(int arr[], size_t sz){
int min = *min_element(arr, arr+sz);
int w = 1;
for (int i = 0; i<sz; i++){
if (arr[i] == min) {continue;}
else{
int mini = *min_element(arr+i, arr+sz);
for (int j = 0; j<sz; j++){
if (arr[j] == mini){arr[j] = min+w;}
}
w++;}
}
return arr[sz-1];
}
However, it works fine only for the 0th and 1st values; it doesn't affect any further items. Could anyone please help me with that?
I don't quite follow the logic of your function, so can't quite comment on that.
Here's how I interpret what needs to be done. Note that my example implementation is written to be as understandable as possible. There might be ways to make it faster.
Note that I'm also using an std::vector, to make things more readable and C++-like. You really shouldn't be passing raw pointers and sizes, that's super error prone. At the very least bundle them in a struct.
#include <algorithm>
#include <set>
#include <unordered_map>
#include <vector>
int fillGaps (std::vector<int> & data) {
// Make sure we don't have to worry about edge cases in the code below.
if (data.empty()) { return 0; }
/* The minimum number of times we need to loop over the data is two.
* First to check which values are in there, which lets us decide
* what each original value should be replaced with. Second to do the
* actual replacing.
*
* So let's trade some memory for speed and start by creating a lookup table.
* Each entry will map an existing value to its new value. Let's use the
* "define lambda and immediately invoke it" to make the scope of variables
* used to calculate all this as small as possible.
*/
auto const valueMapping = [&data] {
// Use an std::set so we get all unique values in sorted order.
std::set<int> values;
for (int e : data) { values.insert(e); }
std::unordered_map<int, int> result;
result.reserve(values.size());
// Map minimum value to itself, and increase replacement value by one for
// each subsequent value present in the data vector.
int replacement = *values.begin();
for (auto e : values) { result.emplace(e, replacement++); }
return result;
}();
// Now the actual algorithm is trivial: loop over the data and replace each
// element with its replacement value.
for (auto & e : data) { e = valueMapping.at(e); }
return data.back();
}
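As a variant of the lookup-table idea above, here is a minimal sketch that replaces the std::set and std::unordered_map with a sorted vector of distinct values and a binary search. The name fillGapsSorted is hypothetical and not part of the answer above; unlike fillGaps, it returns a modified copy rather than changing the input in place.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns a copy of `data` in which the k-th smallest
// distinct value (k = 0 for the minimum) is replaced by minimum + k.
std::vector<int> fillGapsSorted(std::vector<int> data) {
    if (data.empty()) { return data; }

    // Build the sorted list of distinct values present in the input.
    std::vector<int> distinct(data);
    std::sort(distinct.begin(), distinct.end());
    distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());

    const int minimum = distinct.front();
    for (int & e : data) {
        // The index of e among the distinct values is its rank.
        auto it = std::lower_bound(distinct.begin(), distinct.end(), e);
        assert(it != distinct.end() && *it == e);
        e = minimum + static_cast<int>(it - distinct.begin());
    }
    return data;
}
```

Calling fillGapsSorted({2, 5, 7, 5}) gives {2, 3, 4, 3}, matching the example in the question.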
How would one get a view of a PyArrayObject* similar to the following python code?
# n-column array x
# d is the length of each column
print(x.shape) # => (d, n)
by_column = [x[::,i] for i in range(x.shape[1])]
assert len(by_column) == n
print(by_column[n-1].shape) # => (d,)
So far my code is this:
// my_array is a PyArrayObject*
std::vector<PyArrayObject*> columns = {};
npy_intp* dims = my_array->dimensions;
npy_intp* strides = my_array->strides;
std::vector<int> shape = {};
for (int i = 0; &dims[i] != strides; i++){
shape.push_back(dims[i]);
}
switch (shape.size()) {
case 1: {
// handle 1D array by simply iterating
}
case 2: {
int columns = shape[1];
// What now?
}
}
I'm having trouble finding any reference to do this in C/C++ in both the documentation and the source code, could you give an example of how one would do this?
The C/C++ API for numpy seems really convoluted when compared to something like std::vector, and the documentation isn't very beginner-friendly either, so any references to easier guides would be appreciated too.
You should access the internal structure of PyArrayObject via the PyArray_XXX functions like PyArray_NDIM. To get the contents of a sequence, you use PyObject_GetItem with a tuple key, where in your use case the tuple will have a PySliceObject as the first element.
I am using Xilinx's triSYCL GitHub implementation, https://github.com/triSYCL/triSYCL.
I am trying to create a design with 100 producers/consumers that read from and write to 100 pipes.
What I am not sure of is how to create an array of cl::sycl::buffer and initialize it using std::iota.
Here is my code:
constexpr size_t T=6;
constexpr size_t n_threads=100;
cl::sycl::buffer<float, n_threads> a { T };
for (int i=0; i<n_threads; i++)
{
auto ba = a[i].get_access<cl::sycl::access::mode::write>();
// Initialize buffer a with increasing integer numbers starting at 0
std::iota(ba.begin(), ba.end(), i*T);
}
And I am getting the following error:
error: no matching function for call to ‘cl::sycl::buffer<float, 2>::buffer(<brace-enclosed initializer list>)’
cl::sycl::buffer<float, n_threads> a { T };
I am new to C++ programming. So I am not able to figure out the exact way to do this.
There are 2 points I think cause the issue you are currently having:
The 2nd template argument in the buffer object definition should be the dimensionality of the buffer (count of dimensions, should be 1, 2 or 3), not the dimensions themselves.
The constructor for the buffer should contain either the actual dimensions of the buffer, or the data that you want the buffer to have and the dimensions. To pass the dimensions, you need to pass a cl::sycl::range object to the constructor
As I understand you are trying to initialize a buffer of dimensionality 1 and with dimensions { 100, 1, 1 }. To do this, the definition of a should change to:
cl::sycl::buffer < float, 1 > a(cl::sycl::range< 1 >(n_threads));
Also, since the dimensionality template parameter of the buffer defaults to 1, you can achieve the same effect with:
cl::sycl::buffer< float > a (cl::sycl::range< 1 >(n_threads));
As for initializing the buffer with std::iota, you have 3 options:
Use an array to initialize the data with the iota usage and pass them to the sycl buffer (case A),
Use the accessor to write to the buffer directly for host - CPU only (case B), or
Use an accessor with a parallel_for for execution on either host or an OpenCL device (case C).
Accessors should not be used as iterators (with .begin(), .end())
Case A:
std::vector<float> data(n_threads); // or std::array<float, n_threads> data;
std::iota(data.begin(), data.end(), 0); // this will create the data { 0, 1, 2, 3, ... }
cl::sycl::buffer<float> a(data.data(), cl::sycl::range<1>(n_threads));
// The data in a are already initialized, you can create an accessor to use them directly
Case B:
cl::sycl::buffer<float> a(cl::sycl::range<1>(n_threads));
{
auto ba = a.get_access<cl::sycl::access::mode::write>();
for(size_t i=0; i< n_threads; i++) {
ba[i] = i;
}
}
Case C:
cl::sycl::buffer<float> a(cl::sycl::range<1>(n_threads));
cl::sycl::queue q{cl::sycl::default_selector()}; // create a command queue for host or device execution
q.submit([&](cl::sycl::handler& cgh) {
auto ba = a.get_access<cl::sycl::access::mode::write>(cgh); // device accessors are bound to the handler
cgh.parallel_for<class kernel_name>(cl::sycl::range<1>(n_threads), [=](cl::sycl::id<1> i){
ba[i] = i.get(0);
});
});
q.wait_and_throw(); // wait until kernel execution completes
Also check chapter 4.8 of the SYCL 1.2.1 spec https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf as it has an example for iota
Disclaimer: triSYCL is a research project for now. Please use ComputeCpp for anything serious. :-)
If you really need arrays of buffer, I guess you can use something similar to "Is there a way I can create an array of cl::sycl::pipe?"
As a variant, you can use a std::vector<cl::sycl::buffer<float>> or std::array<cl::sycl::buffer<float>, n_threads> and initialize with a loop from a cl::sycl::buffer<float> { T }.
I need the first value of a loop to be 0, and then have the loop run over a range.
In MatLab this is possible: x = [0 -range:range] (range is an integer)
This will give a value of [0, -range, -range+1, -range+2, .... , range-1, range]
The problem is that I need to do this in C++. I tried building an array and using its values in the loop, without success.
//After loading 2 images, put it into matrix values and then trying to compare each one.
for r=1:bRows
for c=1:bCols
rb=r*blockSize;
cb=c*blockSize;
%%for each block search in the near position(1.5 block size)
search=blockSize*1.5;
for dr= [0 -search:search] //Here's the problem.
for dc= [0 -search:search]
%%check if it is inside the image
if(rb+dr-blockSize+1>0 && rb+dr<=rows && cb+dc-blockSize+1>0 && cb+dc<=cols)
%compute the error and check if it is lower then the previous or not
block=I1(rb+dr-blockSize+1:rb+dr,cb+dc-blockSize+1:cb+dc,1);
TE=sum( sum( abs( block - cell2mat(B2(r,c)) ) ) );
if(TE<E)
M(r,c,:)=[dr dc]; %store the motion vector
Err(r,c,:)=TE; %store th error
E=TE;
end
end
end
end
%reset the error for the next search
E=255*blockSize^2;
end
end
C++ doesn't natively support ranges of the kind you know from MatLab, although external solutions are available, if somewhat of an overkill for your use case. However, C++ allows you to implement them easily (and efficiently) using the primitives provided by the language, such as for loops and resizable arrays. For example:
// Return a vector consisting of
// {0, -limit, -limit+1, ..., limit-1, limit}.
std::vector<int> build_range0(int limit)
{
std::vector<int> ret{0};
for (auto i = -limit; i <= limit; i++)
ret.push_back(i);
return ret;
}
The resulting vector can be easily used for iteration:
for (int dr: build_range0(search)) {
for (int dc: build_range0(search)) {
if (rb + dr - blockSize + 1 > 0 && ...)
...
}
}
The above of course wastes some space to create a temporary vector, only to throw it away (which I suspect happens in your MatLab example as well). If you want to just iterate over the values, you will need to incorporate the loop such as the one in build_range0 directly in your function. This has the potential to reduce readability and introduce repetition. To keep the code maintainable, you can abstract the loop into a generic function that accepts a callback with the loop body:
// Call fn(0), fn(-limit), fn(-limit+1), ..., fn(limit-1), and fn(limit)
template<typename F>
void for_range0(int limit, F fn) {
fn(0);
for (auto i = -limit; i <= limit; i++)
fn(i);
}
The above function can be used to implement iteration by providing the loop body as an anonymous function:
for_range0(search, [&](int dr) {
for_range0(search, [&](int dc) {
if (rb + dr - blockSize + 1 > 0 && ...)
...
});
});
(Note that both anonymous functions capture enclosing variables by reference in order to be able to mutate them.)
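To make the iteration order concrete, here is a small self-contained sketch that repeats for_range0 from above and records the values it visits. The helper visited_offsets is hypothetical and exists only for illustration. Note that 0 is visited twice (once up front and once inside the range), just as in the MatLab expression [0 -search:search].

```cpp
#include <cassert>
#include <vector>

// Call fn(0), fn(-limit), fn(-limit+1), ..., fn(limit-1), and fn(limit).
template <typename F>
void for_range0(int limit, F fn) {
    fn(0);
    for (int i = -limit; i <= limit; i++)
        fn(i);
}

// Hypothetical helper: returns the offsets in the order for_range0 visits them.
std::vector<int> visited_offsets(int limit) {
    assert(limit >= 0);  // a negative limit would visit only the leading 0
    std::vector<int> seen;
    // Capture by reference so the callback can append to `seen`.
    for_range0(limit, [&](int d) { seen.push_back(d); });
    return seen;
}
```

For a limit of 2, the visited offsets are 0, -2, -1, 0, 1, 2, so a search loop built this way evaluates the zero offset twice. That is harmless for a minimum search but worth knowing if the loop body is expensive.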
Reading your comment, you could do something like this
bool zero = true; // true until the first iteration (i == 0) has run
for (int i = 0; i < 5; i++)
{
cout << "hi" << endl;
if (zero)
{
i = 3;
zero = false;
}
}
This would start at 0, then after doing what I want it to do, assign i the value 3, and then continue adding to it each iteration.
I am almost done with my code except I need help with two things. Here is my code: Code. For the function below, I am trying to make it so that I can use the input "n" to size my array, myBits, instead of a constant, which is currently 5.
My other question is right below that. I am trying to switch all of the rightmost bits to "true". I wrote the for loop inside "/* ... */" but it doesn't seem to be working. Right above it, I do it the long way for C(5,4) (myBits[1] = myBits[2] = ... etc.; I am using this to find r-combinations of strings), and that seems to work. Any help would be appreciated!
void nCombination(const vector<string> &Vect, int n, int r){
bool myBits[5] = { false }; // everything is false now
myBits[1] = myBits[2] = myBits[3] = myBits[4] = true;
/* for(int b = n - r - 1; b = n - 1; b++){
myBits[b] = true; // I am trying to set the r rightmost bits to true
}
*/
do // start combination generator
{
printVector(Vect, myBits, n);
} while (next_permutation(myBits, myBits + n)); // change the bit pattern
}
These are called variable length arrays (or VLAs for short) and they are not a feature of standard C++. This is because we already have arrays that can change their length however they want: std::vector. Use that instead of an array and it will work.
Use std::vector<bool>:
std::vector<bool> myBits(n, false);
Then you have to change your while statement:
while (next_permutation(myBits.begin(), myBits.end()));
You will also have to change your printVector function to take a vector<bool>& as the second argument (you won't need the last argument, n, since a vector knows its own size by utilizing the vector::size() function).
As to your program: if you're attempting to get the combinations of n things taken r at a time, you will need to write a loop that sets the rightmost r bools to true instead of hard-coding the rightmost 4 entries. Use a signed index here; with size_t the test i >= 0 is always true and the index would wrap around if r were larger than n.
int count = 1;
for (int i = n - 1; i >= 0 && count <= r; --i, ++count)
myBits[i] = true;
Also, you should return immediately from the function if r is 0.
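Putting the suggestions above together, here is a minimal self-contained sketch of the combination generator using std::vector<bool>. The function name combinations and its return type are assumptions made for illustration; the original printVector is not shown in the question, so this version collects the selections instead of printing them.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns every r-element combination of `items`.
std::vector<std::vector<std::string>> combinations(
        const std::vector<std::string>& items, int r) {
    std::vector<std::vector<std::string>> result;
    const int n = static_cast<int>(items.size());
    if (r <= 0 || r > n) { return result; }  // nothing to choose

    // n flags, all false, then set the r rightmost flags to true.
    // This is the lexicographically smallest arrangement (false < true),
    // so std::next_permutation will walk through every arrangement once.
    std::vector<bool> bits(n, false);
    std::fill(bits.end() - r, bits.end(), true);

    do {
        // Pick the items whose flag is set.
        std::vector<std::string> combo;
        for (int i = 0; i < n; ++i)
            if (bits[i]) combo.push_back(items[i]);
        assert(static_cast<int>(combo.size()) == r);
        result.push_back(combo);
    } while (std::next_permutation(bits.begin(), bits.end()));
    return result;
}
```

Because the size of bits comes from items.size(), no compile-time constant is needed, and the std::fill call replaces the hand-written loop over the rightmost entries.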