I would like to instantiate a cv::Mat with a custom defined type, but the allocation seems to be failing. For example:
struct SType
{
int a;
char c[16];
};
cv::Mat m = cv::Mat_<SType>(1, 1);
printf("cols = %i rows %i step = %zu elemSize = %zu elemSize1 = %zu\n",
m.cols, m.rows, m.step[0], m.elemSize(), m.elemSize1() );
This produces the following output:
cols = 1 rows 1 step = 8 elemSize = 8 elemSize1 = 8
This is obviously wrong, since I'm expecting an elemSize of 20 (sizeof(SType)). Is this a bug, or is the cv::Mat_ wrapper not supposed to be used with custom element types?
Edit:
When assigning the instance to a cv::Mat_<SType> variable instead:
cv::Mat_<SType> m = cv::Mat_<SType>(1, 1);
printf("cols = %i rows %i step = %zu elemSize = %zu elemSize1 = %zu\n",
m.cols, m.rows, m.step[0], m.elemSize(), m.elemSize1() );
I get the following output:
cols = 1 rows 1 step = 8 elemSize = 20 elemSize1 = 20
Now elemSize is correct, but step is wrong. As I understand it, step is used to compute the address of a specific element accessed via operator()(row, col), and I'm observing problems when doing so. Does anybody have better insight into what's going on here?
Edit 2:
I submitted a bug report regarding this issue: http://code.opencv.org/issues/4415. In the meantime, if anyone has an idea how to deal with it, please let me know. Thanks.
I think I spotted the mistake. You declare your variable as cv::Mat. Try:
cv::Mat_<SType> m = cv::Mat_<SType>(1, 1);
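For reference, the cv::Mat documentation describes 2-D element addressing as addr(M(i, j)) = M.data + M.step[0]*i + M.step[1]*j. For a continuous matrix, step[0] should therefore equal cols * elemSize, which is 20 bytes for a 1x1 matrix of SType (assuming a 4-byte int).

The C sketch below applies that formula with the real element size. It is a hypothetical stand-alone buffer, not OpenCV's API, and Buffer2D, buffer_create, buffer_at and buffer_free are illustrative names. A row stride of 8 derived from some other element size would make rows overlap. If the allocation were sized from the same wrong value, writes could also run past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same element type as in the question; sizeof is 20 with a 4-byte int. */
typedef struct { int a; char c[16]; } SType;

/* Minimal row-major 2-D buffer (hypothetical, not OpenCV's API), addressed
   the way cv::Mat documents it: addr(i, j) = data + step[0]*i + step[1]*j. */
typedef struct {
    unsigned char *data;
    size_t rows, cols;
    size_t step[2];   /* step[0]: bytes per row, step[1]: bytes per element */
} Buffer2D;

int buffer_create(Buffer2D *b, size_t rows, size_t cols, size_t elem_size)
{
    b->rows = rows;
    b->cols = cols;
    b->step[1] = elem_size;          /* must be the real element size */
    b->step[0] = cols * elem_size;   /* continuous rows: no row padding */
    b->data = calloc(rows, b->step[0]);
    return b->data != NULL;
}

void *buffer_at(const Buffer2D *b, size_t i, size_t j)
{
    assert(i < b->rows && j < b->cols);
    return b->data + b->step[0] * i + b->step[1] * j;
}

void buffer_free(Buffer2D *b)
{
    free(b->data);
    b->data = NULL;
}
```

Element (2, 1) of a 3x2 buffer lands 2*step[0] + step[1] = 5 elements from the start. That offset is only correct when both strides come from sizeof(SType).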
I'm trying to convert some C++ code into C. I'm working with binary data and need a C equivalent of this:
enum GssipFlags : uint16_t
{
SPARE0 = 1,
SPARE1 = 2 * SPARE0,
SPARE2 = 2 * SPARE1,
SPARE3 = 2 * SPARE2,
REQ_MSG = 2 * SPARE3,
DISCONNECT = 2 * REQ_MSG,
CONNECT = 2 * DISCONNECT,
INVALID_DATA = 2 * CONNECT,
CMD_REJECT = 2 * INVALID_DATA,
HANDSHAKE = 2 * CMD_REJECT,
NAK_MSG = 2 * HANDSHAKE,
ACK_MSG = 2 * NAK_MSG,
ACK_REQ = 2 * ACK_MSG,
RESYNC = 2 * ACK_REQ,
MODE = 2 * RESYNC,
READY = 2 * MODE
};
enum GssipMessageIDs : uint16_t
{
CCCCCCCC = 1,
RECEIVER_ID_MSG = 2,
BUFFER_BOX_STATUS_REQUEST_MSG = 3,
SETUP_DATA_5031 = 4,
WARNING_MSG = 5,
TIME_TRANSFER = 6
};
enum GssipWarningMsgIDs : uint16_t
{
EXTERNAL_POWER_DISCONNECT = 17,
SELF_TEST_OK = 8,
AAAAA = 9,
BBBBB = 10
};
Everything I've tried hasn't worked. The main thing I need is for everything to be uint16_t.
You have one standard option and a few other potential options, depending on your compiler and on what "I'm working with binary data and need to use a C" means (memory usage? speed? etc.):
1. As has already been mentioned in the comments, use a struct with members of the type you are looking for:
typedef struct {
uint16_t SPARE0;
...
} GssipFlags_t;
GssipFlags_t a = {
.SPARE0=1
...
};
2. If you are trying to reduce the size of enums, take advantage of the compiler (if available) and use -fshort-enums. GCC documents it as:
"Allocate to an enum type only as many bytes as it needs for the declared range of possible values. Specifically, the enum type is equivalent to the smallest integer type that has enough room."
3. Use __attribute__((packed)) to remove the padding added between struct members (which may make things slower due to the cost of accessing unaligned data).
4. If you don't care about the size and your only concern is compiling the C++11 code with a C compiler (which might produce the same output as adding -fshort-enums), just do:
enum GssipFlags {
SPARE0 = 1
...
};
Items 2, 3 and 4 don't explicitly create members of type uint16_t, but if this is an XY problem, they provide different solutions depending on your real issue.
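C23 accepts the same enum GssipFlags : uint16_t syntax as C++11, so with a recent compiler in C23 mode the original declarations may compile unchanged. For older C standards, a common pattern is to keep the enum only for the names and to use uint16_t wherever a value is actually stored or sent.

The sketch below shows that pattern. The GssipFlags/GssipMessageID typedefs and the GssipHeader struct are hypothetical names for illustration. It assumes int is wider than 16 bits, because before C23 an enum constant must fit in an int and READY is 0x8000. GssipWarningMsgIDs would follow the same pattern as GssipMessageIDs.

```c
#include <assert.h>
#include <stdint.h>

/* The enums supply the names only; before C23 their underlying type is
   implementation-defined (usually int), so they are not used for storage. */
enum GssipFlagBits {
    SPARE0       = 1 << 0,
    SPARE1       = 1 << 1,
    SPARE2       = 1 << 2,
    SPARE3       = 1 << 3,
    REQ_MSG      = 1 << 4,
    DISCONNECT   = 1 << 5,
    CONNECT      = 1 << 6,
    INVALID_DATA = 1 << 7,
    CMD_REJECT   = 1 << 8,
    HANDSHAKE    = 1 << 9,
    NAK_MSG      = 1 << 10,
    ACK_MSG      = 1 << 11,
    ACK_REQ      = 1 << 12,
    RESYNC       = 1 << 13,
    MODE         = 1 << 14,
    READY        = 1 << 15   /* 0x8000: assumes int is wider than 16 bits */
};

enum GssipMessageIDs {
    CCCCCCCC = 1, RECEIVER_ID_MSG = 2, BUFFER_BOX_STATUS_REQUEST_MSG = 3,
    SETUP_DATA_5031 = 4, WARNING_MSG = 5, TIME_TRANSFER = 6
};

/* Fixed-width storage types: these are what go into the binary data. */
typedef uint16_t GssipFlags;
typedef uint16_t GssipMessageID;

/* Hypothetical message header, only to show the 16-bit fields in use. */
typedef struct {
    GssipMessageID id;
    GssipFlags     flags;
} GssipHeader;

/* Compile-time check (C11) that the storage type really is 16 bits. */
_Static_assert(sizeof(GssipFlags) == 2, "GssipFlags must be 16 bits");
```

Flags are combined with | and stored through the 16-bit typedef. For example, CONNECT | ACK_REQ stored in a GssipFlags field is 0x1040.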
I am trying to write a program for matrix calculations using C/CUDA.
I have the following program:
In main.cu
#include <cuda.h>
#include <iostream>
#include "teste.cuh"
using std::cout;
int main(void)
{
const int Ndofs = 2;
const int Nel = 4;
double *Gh = new double[Ndofs*Nel*Ndofs*Nel];
double *Gg;
cudaMalloc((void**)& Gg, sizeof(double)*Ndofs*Nel*Ndofs*Nel);
for (int ii = 0; ii < Ndofs*Nel*Ndofs*Nel; ii++)
Gh[ii] = 0.;
cudaMemcpy(Gh, Gg, sizeof(double)*Ndofs*Nel*Ndofs*Nel, cudaMemcpyHostToDevice);
integraG<<<256, 256>>>(Nel, Gg);
cudaMemcpy(Gg, Gh, sizeof(double)*Ndofs*Nel*Ndofs*Nel, cudaMemcpyDeviceToHost);
for (int ii = 0; ii < Ndofs*Nel*Ndofs*Nel; ii++)
cout << ii + 1 << " " << Gh[ii] << "\n";
return 0;
}
In teste.cuh
#ifndef TESTE_CUH_
#define TESTE_CUH_
__global__ void integraG(const int N, double* G)
{
const int szmodel = 2*N;
int idx = threadIdx.x + blockIdx.x*blockDim.x;
int idy = threadIdx.y + blockIdx.y*blockDim.y;
int offset = idx + idy*blockDim.x*gridDim.x;
int posInit = szmodel*offset;
G[posInit + 0] = 1;
G[posInit + 1] = 1;
G[posInit + 2] = 1;
G[posInit + 3] = 1;
}
#endif
The result (which is supposed to be a matrix filled with 1s) is copied back to the host array. The problem is that nothing happens! Apparently my program is not calling the GPU kernel, and I am still getting an array full of zeros.
I am very new to CUDA programming and I am using CUDA by Example (Jason Sanders) as a reference book.
My questions are:
What is wrong with my code?
Is storing matrices in vectorized (flattened) form the best way to deal with them on a GPU?
Is there another reference that provides more examples of matrix computations on GPUs?
Regarding your first question: first of all, your problem should be defined explicitly. What do you want to do with this code? What sort of calculations do you want to do on the matrix?
Try to check for errors properly; THIS is a very good way to do so. There are also some obvious bugs in your code. Some of them:
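You're passing the pointers to cudaMemcpy in the wrong order. Its signature is cudaMemcpy(dst, src, count, kind), so the source and destination arguments have to be swapped in both calls. Check here.
Change them to cudaMemcpy(Gg, Gh, sizeof(double)*Ndofs*Nel*Ndofs*Nel, cudaMemcpyHostToDevice) before the kernel launch and cudaMemcpy(Gh, Gg, sizeof(double)*Ndofs*Nel*Ndofs*Nel, cudaMemcpyDeviceToHost) after it.
"Ndofs*Nel*Ndofs*Nel" shows that you're interested in the first 64 values of the array, so why launch 256 blocks of 256 threads?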
This part of your code:
int idx = threadIdx.x + blockIdx.x*blockDim.x;
int idy = threadIdx.y + blockIdx.y*blockDim.y;
shows that you want to use 2-D threads and blocks. To do so, you need to launch the kernel with the dim3 type.
By making the following changes:
cudaMemcpy(Gg, Gh, sizeof(double)*Ndofs*Nel*Ndofs*Nel, cudaMemcpyHostToDevice); //HERE
dim3 block(2,2); //HERE
dim3 thread(4,4); //HERE
integraG<<<block, thread>>>(Nel, Gg); //HERE
cudaMemcpy(Gh, Gg, sizeof(double)*Ndofs*Nel*Ndofs*Nel, cudaMemcpyDeviceToHost); //HERE
You'll get a result like the following:
1 1
2 1
3 1
4 1
5 0
6 0
7 0
8 0
9 1
10 1
11 1
12 1
.
.
.
57 1
58 1
59 1
60 1
61 0
62 0
63 0
64 0
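The pattern above (four 1s followed by four 0s) follows directly from the indexing. Each thread writes G[posInit] through G[posInit + 3], with posInit = 2*N*offset = 8*offset. Only the threads with offset 0 to 7 land inside the 64-element array. The other 56 of the 64 launched threads write past its end, which is undefined behavior on the device and may go unnoticed or trigger an error.

The sketch below is a host-side C replay of the kernel's index arithmetic (a CPU simulation, not CUDA code; simulate_integraG is a hypothetical name). It counts those out-of-range writes instead of performing them.

```c
#include <assert.h>

/* Replays integraG's indexing on the CPU: every simulated thread computes
   idx, idy, offset and posInit exactly as the kernel does, then writes 1.0
   to G[posInit + 0..3]. Writes at or beyond len are counted, not performed. */
long simulate_integraG(int gridX, int gridY, int blockX, int blockY,
                       int N, double *G, int len)
{
    long out_of_bounds = 0;
    const int szmodel = 2 * N;
    for (int by = 0; by < gridY; by++)
        for (int bx = 0; bx < gridX; bx++)
            for (int ty = 0; ty < blockY; ty++)
                for (int tx = 0; tx < blockX; tx++) {
                    int idx = tx + bx * blockX;   /* threadIdx.x + blockIdx.x*blockDim.x */
                    int idy = ty + by * blockY;   /* threadIdx.y + blockIdx.y*blockDim.y */
                    int offset = idx + idy * blockX * gridX;
                    long posInit = (long)szmodel * offset;
                    for (int k = 0; k < 4; k++) {
                        if (posInit + k < len)
                            G[posInit + k] = 1.0;
                        else
                            out_of_bounds++;
                    }
                }
    return out_of_bounds;
}
```

With the 2x2 grid of 4x4 blocks, this reports 224 out-of-range writes. Replaying the original <<<256, 256>>> launch gives the same 1/0 pattern with 262112 out-of-range writes. Passing the array length to the real kernel and skipping writes at or beyond it is therefore worth doing as well.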
Anyway, if you state your problem and goal more clearly, better suggestions can be provided for you.
Regarding your last two questions:
In my opinion, the CUDA C Programming Guide and the CUDA C Best Practices Guide are the two must-read documents when starting with CUDA, and they include examples of matrix calculations as well.
I have a linear optimization goal to maximize EE+FF, where EE and FF each consist of some C and D.
With the code I've written, I can get the solver to find:
EE_quantity: 0, FF_quantity: 7
...but I know there to be another solution:
EE_quantity: 1, FF_quantity: 6
In order to validate user input for other valid solutions, I added a constraint for both EE and FF. So I added the constraints EE_quantity == 0 and FF_quantity == 7 in the code below, which is a runnable example:
SolverContext c2 = SolverContext.GetContext();
Model m2 = c2.CreateModel();
p.elements = elements_multilevel_productmix();
Decision C_quantity = new Decision(Domain.IntegerNonnegative, "C_quantity");
Decision D_quantity = new Decision(Domain.IntegerNonnegative, "D_quantity");
Decision EE_quantity = new Decision(Domain.IntegerNonnegative, "EE_quantity");
Decision FF_quantity = new Decision(Domain.IntegerNonnegative, "FF_quantity");
m2.AddDecisions(C_quantity, D_quantity, EE_quantity, FF_quantity);
m2.AddConstraints("production",
6 * C_quantity + 4 * D_quantity <= 100,
1 * C_quantity + 2 * D_quantity <= 200,
2 * EE_quantity + 1 * FF_quantity <= C_quantity,
1 * EE_quantity + 2 * FF_quantity <= D_quantity,
EE_quantity == 0,
FF_quantity == 7
);
m2.AddGoal("fixed_EE_FF", GoalKind.Maximize, "EE_quantity + FF_quantity");
Solution sol = c2.Solve(new SimplexDirective());
foreach (var item in sol.Decisions)
{
System.Diagnostics.Debug.WriteLine(
item.Name + ": " + item.GetDouble().ToString()
);
}
It seems that Solver Foundation really doesn't like this specific combination. Using EE_quantity == 1, FF_quantity == 6 is fine, as is using just EE_quantity == 0 or just FF_quantity == 7. But using both, AND having one of them be zero, throws an exception:
Index was outside the bounds of the array.
What is going on under the hood, here? And how do I specify that I want to find "all" solutions for a specific problem?
(Note: no new releases of Solver Foundation are forthcoming - it's essentially been dropped by Microsoft.)
The stack trace indicates that this is a bug in the simplex solver's presolve routine. Unfortunately the SimplexDirective does not have a way to disable presolve (unlike InteriorPointDirective). Therefore the way to get around this problem is to specify the fixed variables differently.
Remove the last two constraints that set EE_quantity and FF_quantity. Instead, fix each variable through its domain when you create the Decision objects, setting both the lower and upper bound to 0 for EE_quantity and to 7 for FF_quantity. This is equivalent to what you wanted to express, but appears to avoid the MSF bug:
Decision EE_quantity = new Decision(Domain.IntegerRange(0, 0), "EE_quantity");
Decision FF_quantity = new Decision(Domain.IntegerRange(7, 7), "FF_quantity");
The MSF simplex solver, like many mixed integer solvers, only returns the optimal solution. If you want MSF to return all solutions, switch to the constraint programming solver (ConstraintProgrammingDirective). If you review the documentation for Solution.GetNext(), you should be able to figure out how to do this.
Of course the CP solver is not guaranteed to produce the globally optimal solution immediately. But if you iterate through solutions long enough, you'll get there.
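For a model as small as the one in the question, an exhaustive search is a cheap way to see what "all solutions" means and to validate user input against it. The C sketch below uses a swapped-in technique: plain brute-force enumeration, not Solver Foundation or its CP solver. The names all_optimal, feasible and Point are hypothetical. It walks every nonnegative integer (C, D, EE, FF) within the bounds implied by the constraints and keeps the points that reach the best EE + FF.

```c
#include <assert.h>

typedef struct { int c, d, ee, ff; } Point;

/* The four "production" constraints from the question. */
static int feasible(int c, int d, int ee, int ff)
{
    return 6 * c + 4 * d <= 100
        && 1 * c + 2 * d <= 200
        && 2 * ee + 1 * ff <= c
        && 1 * ee + 2 * ff <= d;
}

/* Stores up to max_out optimal points in out[], writes the optimal value of
   EE + FF to *best, and returns how many optimal points exist. */
int all_optimal(Point *out, int max_out, int *best)
{
    int count = 0;
    *best = -1;
    /* 6C <= 100 bounds C by 16 and 4D <= 100 bounds D by 25;
       2EE <= C and 2FF <= D follow from the last two constraints. */
    for (int c = 0; c <= 16; c++)
        for (int d = 0; d <= 25; d++)
            for (int ee = 0; 2 * ee <= c; ee++)
                for (int ff = 0; 2 * ff <= d; ff++) {
                    if (!feasible(c, d, ee, ff))
                        continue;
                    int value = ee + ff;
                    if (value > *best) {   /* new optimum: forget older points */
                        *best = value;
                        count = 0;
                    }
                    if (value == *best) {
                        if (count < max_out)
                            out[count] = (Point){c, d, ee, ff};
                        count++;
                    }
                }
    return count;
}
```

On the question's constraints, it reports a best value of 7 reached by exactly two points, (C, D, EE, FF) = (7, 14, 0, 7) and (8, 13, 1, 6). These are the two solutions mentioned in the question. This approach only scales to tiny models; for anything larger, the CP solver route described above is the practical one.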
I would like to have a multidimensional array that allows for different sizes.
Example:
int x[][][] = {{{1,2},{2,3}},{{1,2}},{{4,5},{2,7},{1,1}}};
The values will be known at compile time and will not change.
I would like to be able to access the values like val = x[2][0][1];
What is the best way to go about this? I'm used to Java/PHP, where doing something like this is trivial.
Thanks
I suppose you could do this "the old fashioned (uphill both ways) way":
#include <stdio.h>
int main(void){
int *x[3][3];
int y[12] = {1,2,3,4,5,6,7,8,9,10,11,12};
x[0][0] = &y[0];
x[0][1] = &y[2];
x[1][0] = &y[4];
x[2][0] = &y[6];
x[2][1] = &y[8];
x[2][2] = &y[10];
// testing:
printf("x[0][0][0] = %d\n", x[0][0][0]);
printf("x[0][0][1] = %d\n", x[0][0][1]);
printf("x[0][1][0] = %d\n", x[0][1][0]);
printf("x[0][1][1] = %d\n", x[0][1][1]);
printf("x[1][0][0] = %d\n", x[1][0][0]);
printf("x[1][0][1] = %d\n", x[1][0][1]);
printf("x[2][0][0] = %d\n", x[2][0][0]);
printf("x[2][0][1] = %d\n", x[2][0][1]);
printf("x[2][1][0] = %d\n", x[2][1][0]);
printf("x[2][1][1] = %d\n", x[2][1][1]);
printf("x[2][2][0] = %d\n", x[2][2][0]);
printf("x[2][2][1] = %d\n", x[2][2][1]);
return 0;
}
Basically, the array x is a little bit too big (3x3), and its elements point to the "right place" in the array y that contains your data (I am using the numbers 1…12 because it's easier to see that it is doing the right thing). For a small example like this, you end up with an array of 9 pointers in x (72 bytes, assuming 8-byte pointers) plus the 12 integers in y (48 bytes, assuming 4-byte ints).
If you instead used a full 3x3x2 int array and filled it with zeros where you didn't need values (or -1 if you wanted to indicate "invalid"), you would end up with 18x4 = 72 bytes. So the above method is less efficient here, because this array is not "very sparse"; the more ragged the data, the better the pointer approach does. If you really wanted to be efficient, you would have an array of pointers-to-pointers, followed by n arrays of pointers, but this gets very messy very quickly.
Very often the right approach is a tradeoff between speed and memory size (which is always at a premium on the Arduino).
By the way, the above code does indeed produce the following output:
x[0][0][0] = 1
x[0][0][1] = 2
x[0][1][0] = 3
x[0][1][1] = 4
x[1][0][0] = 5
x[1][0][1] = 6
x[2][0][0] = 7
x[2][0][1] = 8
x[2][1][0] = 9
x[2][1][1] = 10
x[2][2][0] = 11
x[2][2][1] = 12
Of course, it doesn't stop you from accessing an invalid array element. Doing so is undefined behavior and will typically crash with a segfault on a desktop system, since the unused elements of x are uninitialized pointers.
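If the third dimension is always 2 (as the follow-up below says), C can express the ragged shape directly with one array of pairs per group and an array of pointers to those groups. This allocates no unused pointer slots, and the original syntax x[2][0][1] works as-is.

The sketch below uses the question's data. The names g0, g1, g2, x_len and x_get are hypothetical, and x_get is an optional bounds-checked accessor.

```c
#include <assert.h>
#include <stddef.h>

/* One array of pairs per top-level group (data from the question). */
static const int g0[][2] = {{1, 2}, {2, 3}};
static const int g1[][2] = {{1, 2}};
static const int g2[][2] = {{4, 5}, {2, 7}, {1, 1}};

/* x[i] points to the first pair of group i, so x[i][j][k] indexes like a
   3-D array even though the groups have different lengths. */
static const int (*const x[])[2] = {g0, g1, g2};

/* Number of pairs in each group; the pointers above do not carry it. */
static const size_t x_len[] = {
    sizeof g0 / sizeof g0[0],
    sizeof g1 / sizeof g1[0],
    sizeof g2 / sizeof g2[0],
};

#define X_GROUPS (sizeof x / sizeof x[0])

/* Bounds-checked access: assert fires on an invalid index in debug builds. */
int x_get(size_t i, size_t j, size_t k)
{
    assert(i < X_GROUPS);
    assert(j < x_len[i]);
    assert(k < 2);
    return x[i][j][k];
}
```

The cost is 3 pointers and 3 lengths on top of the 12 values. On AVR-based Arduino boards, const data like this is still copied into RAM at startup unless it is placed in flash with PROGMEM.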
Thanks Floris.
I've decided to just load all values into a single array, like
{1,2,2,3,1,2,4,5,2,7,1,1}
and have a second array which stores the length of each first dimension, like
{2,1,3}
The third dimension always has a length of 2, so I will just multiply the pair index by 2 to find its position in the flat array. I'm going to make a helper class so I can just do something like getX(2,0), which would return 4, and have another function like getLength(0), which would return 2.
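A C sketch of that flattened layout follows. The functions get_x and get_length are hypothetical stand-ins for the planned getX/getLength helpers. Here get_x takes all three indexes: get_x(2, 0, 0) is 4, matching the getX(2,0) example, and get_x(2, 0, 1) is 5.

```c
#include <assert.h>
#include <stddef.h>

/* All pairs back to back, in group order (data from the question). */
static const int values[] = {1, 2, 2, 3, 1, 2, 4, 5, 2, 7, 1, 1};
/* Number of pairs in each top-level group. */
static const int lengths[] = {2, 1, 3};

#define NUM_GROUPS (sizeof lengths / sizeof lengths[0])
#define PAIR_SIZE 2  /* the third dimension is always 2 */

/* Hypothetical stand-in for getLength(i): number of pairs in group i. */
int get_length(size_t i)
{
    assert(i < NUM_GROUPS);
    return lengths[i];
}

/* Hypothetical stand-in for getX: returns x[i][j][k] from the flat array. */
int get_x(size_t i, int j, int k)
{
    assert(i < NUM_GROUPS);
    assert(j >= 0 && j < lengths[i]);
    assert(k >= 0 && k < PAIR_SIZE);
    int pairs_before = 0;              /* pairs stored in groups 0 .. i-1 */
    for (size_t g = 0; g < i; g++)
        pairs_before += lengths[g];
    return values[(pairs_before + j) * PAIR_SIZE + k];
}
```

A possible refinement is to store running offsets ({0, 2, 3, 6}) instead of lengths. The loop then becomes a single table read, and a group's length is the difference of two neighboring offsets.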
I have to implement a small multi-image graphic control, which is essentially an array of 9 images shown one at a time. The final goal is for it to act as a mini slider.
Now, this graphic control is going to receive various integer ranges: from 5 to 25, from 0 to 7, or from -9 to 9.
If I use a proportion (the "rule of three"), I am afraid it is not technically sustainable because it can be a source of errors. My guess is to use some lookup tables, but does anyone have good advice on the approach?
Thanks
I'm not sure lookup tables are required. You can get from your input value to an image index between 0 and 8 (9 images) proportionally:
int ConvertToImageArrayIndex(int inputValue)
{
int maxInputFromOtherModule = 25;
int minInputFromOtherModule = 5;
// Divide by the width of the range (max - min) so that min maps to 0 and max maps to exactly 8.
// + 0.5 required so that we round to the nearest image instead of always rounding down.
// 8.0 required to get to an output range of 9 possible indexes [0..8].
int imageIndex = (int)((inputValue - minInputFromOtherModule) * 8.0 / (maxInputFromOtherModule - minInputFromOtherModule) + 0.5);
return imageIndex;
}
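Since the control receives several different ranges, it may help to pass the range in instead of hard-coding 5 and 25. The sketch below does the same proportional mapping in integer arithmetic, with no float, which can matter on small microcontrollers.

The names value_to_image_index and NUM_IMAGES are hypothetical. Clamping out-of-range inputs is an assumption about the desired behavior. Dividing by max - min makes both ends of every range reach images 0 and 8.

```c
#include <assert.h>

#define NUM_IMAGES 9  /* images in the control, indexes 0 .. 8 */

/* Map value from [min_value, max_value] proportionally onto an image index in
   [0, NUM_IMAGES - 1], rounding to the nearest image. Inputs outside the
   range are clamped (an assumption about the desired behavior). */
int value_to_image_index(int value, int min_value, int max_value)
{
    assert(max_value > min_value);
    if (value < min_value) value = min_value;
    if (value > max_value) value = max_value;

    int range = max_value - min_value;   /* > 0 */
    int offset = value - min_value;      /* 0 .. range */
    /* Integer round-to-nearest: add half the divisor before dividing
       (valid because both operands are non-negative). */
    return (offset * (NUM_IMAGES - 1) + range / 2) / range;
}
```

For the three ranges in the question, value_to_image_index(25, 5, 25), value_to_image_index(7, 0, 7) and value_to_image_index(9, -9, 9) all return 8, and each minimum returns 0.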
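Yes, a lookup table is a good solution:
int lookup[9] = {5, 25, ... the other values };
int id1 = (int)floor(slider);          // assumes slider is a float position in [0, 8]
int id2 = (id1 < 8) ? id1 + 1 : id1;   // don't read past the last entry at the top end
int texId1 = lookup[id1];
int texId2 = lookup[id2];
interpolate(texId1, texId2, slider - (float)id1);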
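The index arithmetic around the table can be sketched in C as below. It clamps the upper neighbor so that the last slider position does not read one entry past the end of a 9-entry table.

The names locate_blend and BlendPair are hypothetical, and the slider position is assumed to be a float in [0, 8]. The actual lookup values, elided in the answer, are left out.

```c
#include <assert.h>

#define NUM_IMAGES 9  /* entries in the lookup table */

/* Two neighboring table indexes plus the blend factor between them. */
typedef struct {
    int first;     /* lower index into the lookup table */
    int second;    /* upper index; equals first at the top end */
    float weight;  /* how far position is from first towards second, in [0, 1) */
} BlendPair;

/* Hypothetical helper: position is assumed to be a float slider value in
   [0, NUM_IMAGES - 1]; values outside that range are clamped. */
BlendPair locate_blend(float position)
{
    const float last = (float)(NUM_IMAGES - 1);
    if (position < 0.0f) position = 0.0f;
    if (position > last) position = last;

    BlendPair p;
    p.first = (int)position;   /* truncation equals floor for position >= 0 */
    p.second = (p.first < NUM_IMAGES - 1) ? p.first + 1 : p.first;
    p.weight = position - (float)p.first;
    return p;
}
```

The two indexes then select lookup[p.first] and lookup[p.second]. The value p.weight plays the role of the third argument passed to interpolate.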