Strange uint8_t conversion with OpenCV - c++

I have encountered strange behavior from the Matrix class in OpenCV regarding the conversion from float to uint8_t.
It seems that cv::Mat converts float to uint8_t by rounding up (ceil) instead of just truncating the fractional part.
#include <cstdint>
#include <cstdio>
#include <opencv2/core/core.hpp>
#include <opencv2/imgcodecs.hpp>
int main() {
cv::Mat m1(1, 1, CV_8UC1);
cv::Mat m2(1, 1, CV_8UC1);
cv::Mat m3(1, 1, CV_8UC1);
m1.at<uint8_t>(0, 0) = 121;
m2.at<uint8_t>(0, 0) = 105;
m3.at<uint8_t>(0, 0) = 82;
cv::Mat x = m1 * 0.5 + m2 * 0.25 + m3 * 0.25;
printf("%d \n", x.at<uint8_t>(0, 0));
uint8_t r = 121 * 0.5 + 105 * 0.25 + 82 * 0.25;
printf("%d \n\n", r);
return 0;
}
Output:
108
107
Do you know why this happens and how to correct this behavior?
Thank you,

The strange behavior is a result of cv::MatExpr and its use of lazy evaluation, as described here.
The actual result equals:
round(round(121*0.5 + 105*0.25) + 82*0.25) = 108
The rounding is used because the element type is UINT8 (integer type).
The computation order is a result of the lazy evaluation strategy.
Following the computation process using the debugger is challenging, because the OpenCV implementation includes operator overloading, templates, macros and function pointers.
The actual computation is performed in the static void scalar_loop function, in the line
dst[x] = op::r(src1[x], src2[x], scalar);
When for example: src1[x] = 121, src2[x] = 105 and scalar = 0.5.
It executes an inline function:
inline uchar c_add<uchar, float>(uchar a, uchar b, float alpha, float beta, float gamma)
{ return saturate_cast<uchar>(CV_8TO32F(a) * alpha + CV_8TO32F(b) * beta + gamma); }
The actual rounding is in saturate_cast:
template<> inline uchar saturate_cast<uchar>(float v) { int iv = cvRound(v); return saturate_cast<uchar>(iv); }
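cvRound uses a SIMD intrinsic: return _mm_cvtss_si32(t)
It is roughly equivalent to: return (int)(value + (value >= 0 ? 0.5f : -0.5f)); (the intrinsic rounds exact ties to the nearest even integer under the default rounding mode, which still gives 108 for 107.5)
The lazy evaluation stage builds a MatExpr with alpha and beta scalars.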
cv::Mat x = m1 * 0.5 + m2 * 0.25 + m3 * 0.25; //m1 = 121, m2 = 105, m3 = 82
The expression is built recursively (hard to follow).
Following the "operator +" function (using the debugger):
MatExpr operator + (const MatExpr& e1, const MatExpr& e2)
{
MatExpr en;
e1.op->add(e1, e2, en);
return en;
}
Stage 1:
e1.a data = 121 (UINT8)
e1.b (NULL)
e1.alpha = 0.5
e1.beta = 0
e2.a data = 105 (UINT8)
e2.b (NULL)
e2.alpha = 0.25
e2.beta = 0
Result:
en.a data = 121 (UINT8)
en.b data = 105 (UINT8)
en.alpha = 0.5
en.beta = 0.25
Stage 2:
e1.a data = 121 (UINT8)
e1.b data = 105 (UINT8)
e1.alpha = 0.5
e1.beta = 0.25
e2.a data = 82 (UINT8)
e2.b (NULL)
e2.alpha = 0.25
e2.beta = 0
Result:
en.a data = 87 (UINT8) <--- 121*0.5 + 105*0.25 = 86.7500 rounded to 87
en.b data = 82 (UINT8)
en.alpha = 1
en.beta = 0.25
Stage 3: (in MatExpr::operator Mat() const):
m data = 108 (UINT8) <--- 87*1 + 82*0.25 = 87 + 20.5 = 107.5 rounded to 108
You may try to follow the computation process using the debugger.
It requires building OpenCV from sources in the Debug configuration, and a lot of patience.


Checking if a float is an even number before printing it

I have a float with a value like 0.012447. I want to rescale the value to the range 0.0 - 1.0 and use it to display a percentage, but I would like to display only even values, like ... 12%, 14%, 16%, ...
Something like this:
float percent = ....
int value = percent * 100;
if(value == 12 || value == 14 || value == 16 .....){
printf("Result: = %f\n", percent);
}
If there is an easy way to do it without using the percent * 10 and multiple ifs, I would like examples.
As a complement to @salcc's answer, this is probably close to what you ask for: conditional printing of the value if it's even.
#include <cmath>
#include <iostream>
int main() {
for(float percent = 0.f; percent <= 1.f; percent += 1.f / 47.f) {
int value = static_cast<int>(std::round(percent * 100.f));
if(value % 2 == 0) { // Is it even? If so, print it.
std::cout << "Result: = " << value << "\n";
}
}
}
What's really bad with this approach is that it may create gaps in the output. The above creates this output:
Result: = 0
Result: = 2
Result: = 4
Result: = 6
Result: = 26
Result: = 28
Result: = 30
Result: = 32
Result: = 34
Result: = 36
Result: = 38
Result: = 40
Result: = 60
Result: = 62
Result: = 64
Result: = 66
Result: = 68
Result: = 70
Result: = 72
Result: = 74
Result: = 94
Result: = 96
Result: = 98
Result: = 100
To round to the nearest even number you can divide the number by two, round the result, and then multiply it by two. You can do that and convert it to a percentage by just doing:
printf("Result: %g%%\n", round(percent * 100 / 2.0) * 2);
To use the round() function, you have to put #include <cmath> at the top of the file.
Demo:
float percent = 0.129;
printf("Result: %g%%\n", round(percent * 100 / 2.0) * 2);
Output:
Result: 12%

Convert RGB surface to YUV in hardware using Direct3D 9

I have an ARGB Direct3D9 surface that I need to blit into UYVY surface of the same dimensions. Both surfaces are in system memory. How can I accomplish this?
UpdateSurface and StretchRect fail.
I'm open to using textures instead of surfaces if needed.
This must be done in GPU, i.e. with hardware acceleration.
In DirectXTex there is code for doing all these conversions you could look at for reference. The legacy Direct3D 9 D3DFMT_UYVY format is the same as DXGI_FORMAT_YUY2 with some channel swizzling.
These formats encode two visible pixels in each image pixel:
struct XMUBYTEN4 { // DirectXMath data type
uint8_t x;
uint8_t y;
uint8_t z;
uint8_t w;
};
XMUBYTEN4 rgb1, rgb2 = // Input pixel pair
int y0 = ((66 * rgb1.x + 129 * rgb1.y + 25 * rgb1.z + 128) >> 8) + 16;
int u0 = ((-38 * rgb1.x - 74 * rgb1.y + 112 * rgb1.z + 128) >> 8) + 128;
int v0 = ((112 * rgb1.x - 94 * rgb1.y - 18 * rgb1.z + 128) >> 8) + 128;
int y1 = ((66 * rgb2.x + 129 * rgb2.y + 25 * rgb2.z + 128) >> 8) + 16;
int u1 = ((-38 * rgb2.x - 74 * rgb2.y + 112 * rgb2.z + 128) >> 8) + 128;
int v1 = ((112 * rgb2.x - 94 * rgb2.y - 18 * rgb2.z + 128) >> 8) + 128;
For DXGI_FORMAT_YUY2 you would use:
XMUBYTEN4 *dPtr = // Output pixel pair
dPtr->x = static_cast<uint8_t>(std::min<int>(std::max<int>(y0, 0), 255));
dPtr->y = static_cast<uint8_t>(std::min<int>(std::max<int>((u0 + u1) >> 1, 0), 255));
dPtr->z = static_cast<uint8_t>(std::min<int>(std::max<int>(y1, 0), 255));
dPtr->w = static_cast<uint8_t>(std::min<int>(std::max<int>((v0 + v1) >> 1, 0), 255));
For D3DFMT_UYVY you would use:
dPtr->x = static_cast<uint8_t>(std::min<int>(std::max<int>((u0 + u1) >> 1, 0), 255));
dPtr->y = static_cast<uint8_t>(std::min<int>(std::max<int>(y0, 0), 255));
dPtr->z = static_cast<uint8_t>(std::min<int>(std::max<int>((v0 + v1) >> 1, 0), 255));
dPtr->w = static_cast<uint8_t>(std::min<int>(std::max<int>(y1, 0), 255));

Bilinear interpolation in C/C++ and CUDA

I want to emulate the behavior of CUDA bilinear interpolation on the CPU, but I found that the return value of tex2D does not seem to match the bilinear formula.
I guess that casting the interpolation coefficients from float to 9-bit fixed point format with 8 bits of fractional value [1] results in different values.
According to the conversion formula [2, line 106], the result of the conversion will be the same as the input float when the coefficient is 1/2^n, with n = 0, 1, ..., 8, but I still (not always) receive weird values.
Below I report an example of weird values. In this case, weird values always happen when id = 2*n+1. Could anyone tell me why?
Src Array:
Src[0][0] = 38;
Src[1][0] = 39;
Src[0][1] = 118;
Src[1][1] = 13;
Texture Definition:
static texture<float4, 2, cudaReadModeElementType> texElnt;
texElnt.addressMode[0] = cudaAddressModeClamp;
texElnt.addressMode[1] = cudaAddressModeClamp;
texElnt.filterMode = cudaFilterModeLinear;
texElnt.normalized = false;
Kernel Function:
static __global__ void kernel_texElnt(float* pdata, int w, int h, int c, float stride/*0.03125f*/) {
const int gx = blockIdx.x*blockDim.x + threadIdx.x;
const int gy = blockIdx.y*blockDim.y + threadIdx.y;
const int gw = gridDim.x * blockDim.x;
const int gid = gy*gw + gx;
if (gx >= w || gy >= h) {
return;
}
float2 pnt;
pnt.x = (gx)*(stride)/*1/32*/;
pnt.y = 0.0625f/*1/16*/;
float4 result = tex2D( texElnt, pnt.x + 0.5, pnt.y + 0.5f);
pdata[gid*3 + 0] = pnt.x;
pdata[gid*3 + 1] = pnt.y;
pdata[gid*3 + 2] = result.x;
}
Bilinear Result of CUDA
id pnt.x pnt.y tex2D
0 0.00000 0.0625 43.0000000
1 0.03125 0.0625 42.6171875
2 0.06250 0.0625 42.6484375
3 0.09375 0.0625 42.2656250
4 0.12500 0.0625 42.2968750
5 0.15625 0.0625 41.9140625
6 0.18750 0.0625 41.9453125
7 0.21875 0.0625 41.5625000
8 0.25000 0.0625 41.5937500
9 0.28125 0.0625 41.2109375
10 0.31250 0.0625 41.2421875
11 0.34375 0.0625 40.8593750
12 0.37500 0.0625 40.8906250
13 0.40625 0.0625 40.5078125
14 0.43750 0.0625 40.5390625
15 0.46875 0.0625 40.1562500
16 0.50000 0.0625 40.1875000
17 0.53125 0.0625 39.8046875
18 0.56250 0.0625 39.8359375
19 0.59375 0.0625 39.4531250
20 0.62500 0.0625 39.4843750
21 0.65625 0.0625 39.1015625
22 0.68750 0.0625 39.1328125
23 0.71875 0.0625 38.7500000
24 0.75000 0.0625 38.7812500
25 0.78125 0.0625 38.3984375
26 0.81250 0.0625 38.4296875
27 0.84375 0.0625 38.0468750
28 0.87500 0.0625 38.0781250
29 0.90625 0.0625 37.6953125
30 0.93750 0.0625 37.7265625
31 0.96875 0.0625 37.3437500
32 1.00000 0.0625 37.3750000
CPU Result:
// convert coefficient ((1-α)*(1-β)), (α*(1-β)), ((1-α)*β), (α*β) to fixed point format
id pnt.x pnt.y tex2D
0 0.00000 0.0625 43.00000000
1 0.03125 0.0625 43.23046875
2 0.06250 0.0625 42.64843750
3 0.09375 0.0625 42.87890625
4 0.12500 0.0625 42.29687500
5 0.15625 0.0625 42.52734375
6 0.18750 0.0625 41.94531250
7 0.21875 0.0625 42.17578125
8 0.25000 0.0625 41.59375000
9 0.28125 0.0625 41.82421875
10 0.31250 0.0625 41.24218750
11 0.34375 0.0625 41.47265625
12 0.37500 0.0625 40.89062500
13 0.40625 0.0625 41.12109375
14 0.43750 0.0625 40.53906250
15 0.46875 0.0625 40.76953125
16 0.50000 0.0625 40.18750000
17 0.53125 0.0625 40.41796875
18 0.56250 0.0625 39.83593750
19 0.59375 0.0625 40.06640625
20 0.62500 0.0625 39.48437500
21 0.65625 0.0625 39.71484375
22 0.68750 0.0625 39.13281250
23 0.71875 0.0625 39.36328125
24 0.75000 0.0625 38.78125000
25 0.78125 0.0625 39.01171875
26 0.81250 0.0625 38.42968750
27 0.84375 0.0625 38.66015625
28 0.87500 0.0625 38.07812500
29 0.90625 0.0625 38.30859375
30 0.93750 0.0625 37.72656250
31 0.96875 0.0625 37.95703125
32 1.00000 0.0625 37.37500000
I have left a simple program on my github [3]; after running it you will get two files in D:\.
Edit 2014/01/20
I ran the program with different increments and found the following about tex2D: "when alpha multiplied by beta is less than 0.00390625, the return value of tex2D does not match the bilinear interpolation formula"
Satisfactory answers have already been provided to this question, so now I just want to give a compendium of hopefully useful information on bilinear interpolation, how it can be implemented in C++ and the different ways it can be done in CUDA.
Maths behind bilinear interpolation
Assume that the original function T(x, y) is sampled at the Cartesian regular grid of points (i, j) with 0 <= i < M1, 0 <= j < M2 and i and j integers. For each row y = j, one can first use 0 <= a < 1 to represent an arbitrary point i + a lying between i and i + 1. Then, a linear interpolation along the line y = j (which is parallel to the x axis) at that point can be performed, obtaining
r(i+a, j) = (1 - a) T(i, j) + a T(i+1, j)
where r(x,y) is the function interpolating the samples of T(x,y). The same can be done for the line y = j + 1, obtaining
r(i+a, j+1) = (1 - a) T(i, j+1) + a T(i+1, j+1)
Now, for each i + a, an interpolation along the y axis can be performed on the samples r(i+a,j) and r(i+a,j+1). Accordingly, if one uses 0 <= b < 1 to represent an arbitrary point j + b located between j and j + 1, then a linear interpolation along the line x = i + a (which is parallel to the y axis) can be worked out, giving the final result
r(i+a, j+b) = (1 - b) r(i+a, j) + b r(i+a, j+1) = (1 - a)(1 - b) T(i, j) + a (1 - b) T(i+1, j) + (1 - a) b T(i, j+1) + a b T(i+1, j+1)
Note that the relations between i, j, a, b, x and y are the following
x = i + a, y = j + b
C/C++ implementation
Let me stress that this implementation, as well as the following CUDA ones, assumes, as done at the beginning, that the samples of T are located on the Cartesian regular grid of points (i, j) with 0 <= i < M1, 0 <= j < M2 and i and j integers (unit spacing). Also, the routine is provided in single precision, complex (float2) arithmetic, but it can easily be adapted to other arithmetic of interest.
void bilinear_interpolation_function_CPU(float2 * __restrict__ h_result, float2 * __restrict__ h_data,
float * __restrict__ h_xout, float * __restrict__ h_yout,
const int M1, const int M2, const int N1, const int N2){
float2 result_temp1, result_temp2;
for(int k=0; k<N2; k++){
for(int l=0; l<N1; l++){
const int ind_x = floor(h_xout[k*N1+l]);
const float a = h_xout[k*N1+l]-ind_x;
const int ind_y = floor(h_yout[k*N1+l]);
const float b = h_yout[k*N1+l]-ind_y;
float2 h00, h01, h10, h11;
if (((ind_x) < M1)&&((ind_y) < M2)) h00 = h_data[ind_y*M1+ind_x]; else h00 = make_float2(0.f, 0.f);
if (((ind_x+1) < M1)&&((ind_y) < M2)) h10 = h_data[ind_y*M1+ind_x+1]; else h10 = make_float2(0.f, 0.f);
if (((ind_x) < M1)&&((ind_y+1) < M2)) h01 = h_data[(ind_y+1)*M1+ind_x]; else h01 = make_float2(0.f, 0.f);
if (((ind_x+1) < M1)&&((ind_y+1) < M2)) h11 = h_data[(ind_y+1)*M1+ind_x+1]; else h11 = make_float2(0.f, 0.f);
result_temp1.x = a * h10.x + (-h00.x * a + h00.x);
result_temp1.y = a * h10.y + (-h00.y * a + h00.y);
result_temp2.x = a * h11.x + (-h01.x * a + h01.x);
result_temp2.y = a * h11.y + (-h01.y * a + h01.y);
h_result[k*N1+l].x = b * result_temp2.x + (-result_temp1.x * b + result_temp1.x);
h_result[k*N1+l].y = b * result_temp2.y + (-result_temp1.y * b + result_temp1.y);
}
}
}
The if/else statements within the above code are simply boundary checks. If a sample falls outside the [0, M1-1] x [0, M2-1] region, it is set to 0.
Standard CUDA implementation
This is a "standard" CUDA implementation tracing the above CPU one. No usage of texture memory.
__global__ void bilinear_interpolation_kernel_GPU(float2 * __restrict__ d_result, const float2 * __restrict__ d_data,
const float * __restrict__ d_xout, const float * __restrict__ d_yout,
const int M1, const int M2, const int N1, const int N2)
{
const int l = threadIdx.x + blockDim.x * blockIdx.x;
const int k = threadIdx.y + blockDim.y * blockIdx.y;
if ((l<N1)&&(k<N2)) {
float2 result_temp1, result_temp2;
const int ind_x = floor(d_xout[k*N1+l]);
const float a = d_xout[k*N1+l]-ind_x;
const int ind_y = floor(d_yout[k*N1+l]);
const float b = d_yout[k*N1+l]-ind_y;
float2 d00, d01, d10, d11;
if (((ind_x) < M1)&&((ind_y) < M2)) d00 = d_data[ind_y*M1+ind_x]; else d00 = make_float2(0.f, 0.f);
if (((ind_x+1) < M1)&&((ind_y) < M2)) d10 = d_data[ind_y*M1+ind_x+1]; else d10 = make_float2(0.f, 0.f);
if (((ind_x) < M1)&&((ind_y+1) < M2)) d01 = d_data[(ind_y+1)*M1+ind_x]; else d01 = make_float2(0.f, 0.f);
if (((ind_x+1) < M1)&&((ind_y+1) < M2)) d11 = d_data[(ind_y+1)*M1+ind_x+1]; else d11 = make_float2(0.f, 0.f);
result_temp1.x = a * d10.x + (-d00.x * a + d00.x);
result_temp1.y = a * d10.y + (-d00.y * a + d00.y);
result_temp2.x = a * d11.x + (-d01.x * a + d01.x);
result_temp2.y = a * d11.y + (-d01.y * a + d01.y);
d_result[k*N1+l].x = b * result_temp2.x + (-result_temp1.x * b + result_temp1.x);
d_result[k*N1+l].y = b * result_temp2.y + (-result_temp1.y * b + result_temp1.y);
}
}
CUDA implementation with texture fetch
This is the same implementation as above, but the global memory is now accessed by the texture cache. For example, T[i,j] is accessed as
tex2D(d_texture_fetch_float,ind_x,ind_y);
(where, of course ind_x = i and ind_y = j, and d_texture_fetch_float is assumed to be a global scope variable) instead of
d_data[ind_y*M1+ind_x];
Note that the hard-wired texture filtering capabilities are not exploited here. The routine below has the same precision as the one above and may be somewhat faster on old CUDA architectures.
__global__ void bilinear_interpolation_kernel_GPU_texture_fetch(float2 * __restrict__ d_result,
const float * __restrict__ d_xout, const float * __restrict__ d_yout,
const int M1, const int M2, const int N1, const int N2)
{
const int l = threadIdx.x + blockDim.x * blockIdx.x;
const int k = threadIdx.y + blockDim.y * blockIdx.y;
if ((l<N1)&&(k<N2)) {
float2 result_temp1, result_temp2;
const int ind_x = floor(d_xout[k*N1+l]);
const float a = d_xout[k*N1+l]-ind_x;
const int ind_y = floor(d_yout[k*N1+l]);
const float b = d_yout[k*N1+l]-ind_y;
const float2 d00 = tex2D(d_texture_fetch_float,ind_x,ind_y);
const float2 d10 = tex2D(d_texture_fetch_float,ind_x+1,ind_y);
const float2 d11 = tex2D(d_texture_fetch_float,ind_x+1,ind_y+1);
const float2 d01 = tex2D(d_texture_fetch_float,ind_x,ind_y+1);
result_temp1.x = a * d10.x + (-d00.x * a + d00.x);
result_temp1.y = a * d10.y + (-d00.y * a + d00.y);
result_temp2.x = a * d11.x + (-d01.x * a + d01.x);
result_temp2.y = a * d11.y + (-d01.y * a + d01.y);
d_result[k*N1+l].x = b * result_temp2.x + (-result_temp1.x * b + result_temp1.x);
d_result[k*N1+l].y = b * result_temp2.y + (-result_temp1.y * b + result_temp1.y);
}
}
Texture binding can be done according to
void TextureBindingBilinearFetch(const float2 * __restrict__ data, const int M1, const int M2)
{
size_t pitch;
float* data_d;
gpuErrchk(cudaMallocPitch((void**)&data_d,&pitch, M1 * sizeof(float2), M2));
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();
gpuErrchk(cudaBindTexture2D(0,&d_texture_fetch_float,data_d,&desc,M1,M2,pitch));
d_texture_fetch_float.addressMode[0] = cudaAddressModeClamp;
d_texture_fetch_float.addressMode[1] = cudaAddressModeClamp;
gpuErrchk(cudaMemcpy2D(data_d,pitch,data,sizeof(float2)*M1,sizeof(float2)*M1,M2,cudaMemcpyHostToDevice));
}
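Note that no if/else boundary checking is needed now, because the texture unit handles out-of-range coordinates itself. Be aware, however, that cudaAddressModeClamp clamps the coordinates to the valid range, so fetches outside the [0, M1-1] x [0, M2-1] sampling region return the nearest edge sample rather than zero; returning zero, as the versions above do, would require cudaAddressModeBorder instead. The addressing mode is set by the instructions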
d_texture_fetch_float.addressMode[0] = cudaAddressModeClamp;
d_texture_fetch_float.addressMode[1] = cudaAddressModeClamp;
CUDA implementation with texture interpolation
This is the last implementation and uses the hard-wired capabilities of texture filtering.
__global__ void bilinear_interpolation_kernel_GPU_texture_interp(float2 * __restrict__ d_result,
const float * __restrict__ d_xout, const float * __restrict__ d_yout,
const int M1, const int M2, const int N1, const int N2)
{
const int l = threadIdx.x + blockDim.x * blockIdx.x;
const int k = threadIdx.y + blockDim.y * blockIdx.y;
if ((l<N1)&&(k<N2)) { d_result[k*N1+l] = tex2D(d_texture_interp_float, d_xout[k*N1+l] + 0.5f, d_yout[k*N1+l] + 0.5f); }
}
Note that the interpolation formula implemented by this feature is the same as derived above, but now
i = floor(x_B), a = frac(x_B), j = floor(y_B), b = frac(y_B)
where x_B = x - 0.5 and y_B = y - 0.5. This explains the 0.5 offset in the instruction
tex2D(d_texture_interp_float, d_xout[k*N1+l] + 0.5f, d_yout[k*N1+l] + 0.5f)
In this case, texture binding should be done as follows
void TextureBindingBilinearInterp(const float2 * __restrict__ data, const int M1, const int M2)
{
size_t pitch;
float* data_d;
gpuErrchk(cudaMallocPitch((void**)&data_d,&pitch, M1 * sizeof(float2), M2));
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();
gpuErrchk(cudaBindTexture2D(0,&d_texture_interp_float,data_d,&desc,M1,M2,pitch));
d_texture_interp_float.addressMode[0] = cudaAddressModeClamp;
d_texture_interp_float.addressMode[1] = cudaAddressModeClamp;
d_texture_interp_float.filterMode = cudaFilterModeLinear; // --- Enable linear filtering
d_texture_interp_float.normalized = false; // --- Texture coordinates will NOT be normalized
gpuErrchk(cudaMemcpy2D(data_d,pitch,data,sizeof(float2)*M1,sizeof(float2)*M1,M2,cudaMemcpyHostToDevice));
}
Note that, as already mentioned in the other answers, a and b are stored in 9-bit fixed point format with 8 bits of fractional value, so this approach will be very fast, but less accurate than those above.
The UV interpolants are truncated to 9 bits, not the participating texel values. In Chapter 10 (Texturing) of The CUDA Handbook, this is described in detail (including CPU emulation code) for the 1D case. Code is open source and may be found at https://github.com/ArchaeaSoftware/cudahandbook/blob/master/texturing/tex1d_9bit.cu
Using the wrong form of the bilinear interpolation formula makes the emulated texture fetch results look weird.
Formula - 1: the one you can easily find in the CUDA programming guide appendix or on Wikipedia
tex(x,y)=(1−α)(1−β)T[i,j] + α(1−β)T[i+1,j] + (1−α)βT[i,j+1] + αβT[i+1,j+1]
Formula - 2: rearranged to reduce the number of multiplications
tex(x,y)=T[i,j] + α(T[i+1,j]-T[i,j]) + β(T[i,j+1]-T[i,j]) + αβ(T[i,j]+T[i+1,j+1] - T[i+1, j]-T[i,j+1])
If you apply the 9-bit fixed point format to formula 1, the result will not match the texture fetch, but formula 2 works fine.
Conclusion :
If you want to emulate the bilinear interpolation implemented by cuda texture, you should use formula 3. Try it!
Formula - 3:
tex(x,y)=T[i,j] + frac(α)(T[i+1,j]-T[i,j]) + frac(β)(T[i,j+1]-T[i,j]) + frac(αβ)(T[i,j]+T[i+1,j+1] - T[i+1, j]-T[i,j+1])
// frac(x) turns float to 9-bit fixed point format with 8 bits of fraction values.
float frac( float x ) {
float frac, tmp = x - (float)(int)(x);
float frac256 = (float)(int)( tmp*256.0f + 0.5f );
frac = frac256 / 256.0f;
return frac;
}

Why do these RNG's in C++ and R not produce similar results?

Please excuse the disgustingly noobish nature of this post, but I have a question for those who program in C++ and R on their personal computer.
Question: Why are these random numbers produced from the two programs below not equal, and how do I resolve this issue?
Firstly, I suspect that I have misused the local function and the <<- operator in the R program.
Secondly, I suspect it may be a floating-accuracy issue. It's not immediately obvious to me how the two programs are different, so I don't know how to go about this problem.
I have tried casting all my calculations in C++ to double/float (even long double), and using fmod instead of the modulus operator %: different outputs again, but still not similar to the output in R. I do not know if it is of any significant importance, but I want to add that I am compiling the C++ code using the G++ compiler.
Algorithm: The following algorithm can be used in any standard personal computer. It was proposed to use in parallel three word generators,
m_k = 171 m_{k-1} (mod 30269)
m'_k = 172 m'_{k-1} (mod 30307)
m''_k = 170 m''_{k-1} (mod 30323)
and to use as pseudorandom numbers the fractional parts
g_k = {m_k / 30269 + m'_k / 30307 + m''_k / 30323}
I have used the initial values m_0 = 5, m'_0 = 11, and m''_0 = 17.
Programs: I have the following program in C++:
//: MC:Uniform.cpp
// Generate pseudo random numbers uniformly between 0 and 1
#include <iostream>
#include <math.h> // For using "fmod()"
using namespace std;
float uniform(){
// A sequence of initial values
static int x = 5;
static int y = 11;
static int z = 17;
// Some integer arithmetic required
x = 171 * (x % 177) - 2 * (x / 177);
y = 172 * (x % 176) - 35 * (y / 176);
z = 170 * (x % 178) - 63 * (z / 178);
/* If both operands are nonnegative then the
remainder is nonnegative; if not, the sign of
the remainder is implementation-defined. */
if(x < 0)
x = x + 30269;
if(y < 0)
y = y + 30307;
if(z < 0)
z = z + 30323;
return fmod(x / 30269. + y / 30307. + z / 30323., 1.);
}
int main(){
// Print 5 random numbers
for(int i = 0; i < 5; i++){
cout << uniform() << ", ";
}
}///:~
The program exits and outputs the following:
0.686912, 0.329174, 0.689649, 0.753722, 0.209394,
I also have a program in R, that looks like the following:
## Generate pseudo random numbers uniformly between 0 and 1
uniform <- local({
# A sequence of initial values
x = 5
y = 11
z = 17
# Use the <<- operator to make x, y and z local static
# variables in R.
f <- function(){
x <<- 171 * (x %% 177) - 2 * (x / 177)
y <<- 172 * (y %% 176) - 35 * (y / 176)
z <<- 170 * (z %% 178) - 63 * (z / 178)
return((x / 30269. + y / 30307. + z / 30323.)%%1.)
}
})
# Print 5 random numbers
for(i in 1:5){
print(uniform())
}
This program exits as well and produces the output
[1] 0.1857093
[1] 0.7222047
[1] 0.05103441
[1] 0.7375034
[1] 0.2065817
Any suggestions are appreciated, thanks in advance.
You need a few more %/%'s (integer division) in your R code. Remember that numeric variables in R are floating-point, not integer, by default; so / will do ordinary division with a non-integral quotient. You've also left out the part where you deal with negative x/y/z.
f <- function(){
x <<- 171 * (x %% 177) - 2 * (x %/% 177)
y <<- 172 * (y %% 176) - 35 * (y %/% 176)
z <<- 170 * (z %% 178) - 63 * (z %/% 178)
if(x < 0)
x <<- x + 30269;
if(y < 0)
y <<- y + 30307;
if(z < 0)
z <<- z + 30323;
return((x / 30269. + y / 30307. + z / 30323.)%%1)
}
After making those changes, there doesn't seem to be anything seriously wrong with the result. A quick histogram of 100000 random draws looks very uniform, and there's no autocorrelation I can find. Still doesn't match your C++ result though....
There's a simple copy/paste error in your C++ code. This
x = 171 * (x % 177) - 2 * (x / 177);
y = 172 * (x % 176) - 35 * (y / 176);
z = 170 * (x % 178) - 63 * (z / 178);
should be this.
x = 171 * (x % 177) - 2 * (x / 177);
y = 172 * (y % 176) - 35 * (y / 176);
z = 170 * (z % 178) - 63 * (z / 178);

2d rotation opengl

Here is the code I am using.
#define ANGLETORADIANS 0.017453292519943295769236907684886f // PI / 180
#define RADIANSTOANGLE 57.295779513082320876798154814105f // 180 / PI
rotation = rotation *ANGLETORADIANS;
cosRotation = cos(rotation);
sinRotation = sin(rotation);
for(int i = 0; i < 3; i++)
{
px[i] = (vec[i].x + centerX) * (cosRotation - (vec[i].y + centerY)) * sinRotation;
py[i] = (vec[i].x + centerX) * (sinRotation + (vec[i].y + centerY)) * cosRotation;
printf("num: %i, px: %f, py: %f\n", i, px[i], py[i]);
}
So far it seems my Y value is being flipped. Say I enter X = 1 and Y = 1 with a 45 degree rotation: you should see about x = 0 and y = 1.25 ish, but I get x = 0 and y = -1.25.
Also, my 90 degree rotation always returns x = 0 and y = 0.
P.S. I know I'm only centering my values and not putting them back where they came from. I don't need to put them back, as all I need to know is the value I'm getting now.
Your bracket placement doesn't look right to me. I would expect:
px[i] = (vec[i].x + centerX) * cosRotation - (vec[i].y + centerY) * sinRotation;
py[i] = (vec[i].x + centerX) * sinRotation + (vec[i].y + centerY) * cosRotation;
Your brackets are wrong. It should be
px[i] = ((vec[i].x + centerX) * cosRotation) - ((vec[i].y + centerY) * sinRotation);
py[i] = ((vec[i].x + centerX) * sinRotation) + ((vec[i].y + centerY) * cosRotation);
instead