binDCT algorithm for 8x8 matrix - c++

I've googled for implementations of a fast DCT. I found the Loeffler algorithm and implemented it in C++ and in ARM assembly with NEON. Moving ahead, I found the binDCT, which avoids floating-point calculations. My reference paper/schema is this one:
With that in mind, I tried to implement it in C++ with the following code, just as a test:
void my_binDCT(int in[8][8], int data[8][8],const int xpos, const int ypos)
{
int i;
int row[8][8];
int x0, x1, x2, x3, x4, x5, x6, x7;
int tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16, tmp17;
// transform rows
for (i = 0; i < 8; i++) {
x0 = in[xpos + 0][ypos + i];
x1 = in[xpos + 1][ypos + i];
x2 = in[xpos + 2][ypos + i];
x3 = in[xpos + 3][ypos + i];
x4 = in[xpos + 4][ypos + i];
x5 = in[xpos + 5][ypos + i];
x6 = in[xpos + 6][ypos + i];
x7 = in[xpos + 7][ypos + i];
//stage 1
tmp0 = x0 + x7;
tmp7 = x0 - x7;
tmp1 = x1 + x6;
tmp6 = x1 - x6;
tmp2 = x2 + x5;
tmp5 = x2 - x5;
tmp3 = x3 + x4;
tmp4 = x3 - x4;
//stage 2
tmp16 = ((tmp5*3)>>3) + tmp6;
tmp15 = ((tmp16*5)>>3) - tmp5;
//stage 3
tmp10 = tmp0 + tmp3;
tmp13 = tmp0 - tmp3;
tmp11 = tmp1 + tmp2;
tmp12 = tmp1 - tmp2;
tmp14 = tmp4 + tmp15;
tmp15 = tmp4 - tmp15;
auto z = tmp16;
tmp16 = tmp7 - tmp16;
tmp17 = z + tmp7;
//stage 4
tmp14 = (tmp17 >> 3) - tmp14;
tmp10 = tmp10 + tmp11;
tmp11 = (tmp10 >> 1) - tmp11;
tmp12 = ((tmp13*3)>>3) - tmp12;
tmp13 = ((tmp12*3)>>3) + tmp13;
tmp15 = ((tmp16*7)>>3) + tmp15;
tmp16 = (tmp15>>1) - tmp16;
//stage 5
row[i][0] = tmp10;
row[i][4] = tmp11;
row[i][6] = tmp12;
row[i][2] = tmp13;
row[i][7] = tmp14;
row[i][5] = tmp15;
row[i][3] = tmp16;
row[i][1] = tmp17;
}
//rotate columns
/* transform columns */
for (i = 0; i < 8; i++) {
x0 = row[0][i];
x1 = row[1][i];
x2 = row[2][i];
x3 = row[3][i];
x4 = row[4][i];
x5 = row[5][i];
x6 = row[6][i];
x7 = row[7][i];
//stage 1
tmp0 = x0 + x7;
tmp7 = x0 - x7;
tmp1 = x1 + x6;
tmp6 = x1 - x6;
tmp2 = x2 + x5;
tmp5 = x2 - x5;
tmp3 = x3 + x4;
tmp4 = x3 - x4;
//stage 2
tmp16 = ((tmp5*3)>>3) + tmp6;
tmp15 = ((tmp16*5)>>3) - tmp5;
//stage 3
tmp10 = tmp0 + tmp3;
tmp13 = tmp0 - tmp3;
tmp11 = tmp1 + tmp2;
tmp12 = tmp1 - tmp2;
tmp14 = tmp4 + tmp15;
tmp15 = tmp4 - tmp15;
auto z = tmp16;
tmp16 = tmp7 - tmp16;
tmp17 = z + tmp7;
//stage 4
tmp14 = (tmp17 >> 3) - tmp14;
tmp10 = tmp10 + tmp11;
tmp11 = (tmp10 >> 1) - tmp11;
tmp12 = ((tmp13*3)>>3) - tmp12;
tmp13 = ((tmp12*3)>>3) + tmp13;
tmp15 = ((tmp16*7)>>3) + tmp15;
tmp16 = (tmp15>>1) - tmp16;
//stage 5
data[0][i] = tmp10 >> 3;
data[4][i] = tmp11 >> 3;
data[6][i] = tmp12 >> 3;
data[2][i] = tmp13 >> 3;
data[7][i] = tmp14 >> 3;
data[5][i] = tmp15 >> 3;
data[3][i] = tmp16 >> 3;
data[1][i] = tmp17 >> 3;
}
}
I applied the first 1-D DCT to the rows and the second one to the columns, and I assumed I should normalize the results by dividing by 8 (as per the DCT formula with N=8).
I tested it on an 8x8 matrix:
int matrix_a[8][8] = {
12, 16, 19, 12, 12, 27, 51, 47,
16, 24, 12, 19, 12, 20, 39, 51,
24, 27, 8, 39, 35, 34, 24, 44,
40, 17, 28, 32, 24, 27, 8, 32,
34, 20, 28, 20, 12, 8, 19, 34,
19, 39, 12, 27, 27, 12, 8, 34,
8, 28, -5, 39, 34, 16, 12, 19,
20, 27, 8, 27, 24, 19, 19, 8,
};
And I got this outcome:
MYBINDCT-2:
186 13 -3 4 -2 4 6 0
-13 -20 -10 1 2 -2 1 -4
1 19 -10 -3 7 -12 -2 -4
5 2 -4 -3 -1 -4 -2 -1
11 -5 -7 1 -3 4 -1 0
-13 8 -3 0 10 -4 -6 3
-11 6 -11 1 6 0 -1 -4
-13 4 -1 -3 5 -5 -1 0
which is quite far from the (rounded) exact DCT:
186 20 -11 -9 -4 3 8 -1
-18 -35 -24 -5 9 -3 0 -8
14 26 -2 14 7 -19 -3 -3
-9 -10 5 -15 1 8 3 1
23 -11 -19 -9 -11 8 -2 1
-10 10 3 -3 17 -4 -8 4
-14 13 -21 -4 18 0 -1 -7
-19 7 -1 8 15 -7 -3 0
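When checking a fast or integer variant, a slow but direct implementation of the definition is a useful baseline. The "exact DCT" values above are consistent with the orthonormal 2-D DCT-II, at least in the DC term: that term is the sum of all 64 samples divided by 8, and the samples of matrix_a sum to 1488, giving exactly 186. A minimal reference sketch follows; the function name and the row/column index convention are assumptions of this sketch.

```cpp
#include <cmath>

// Reference orthonormal 2-D DCT-II of an 8x8 block, computed straight from
// the definition in double precision (O(N^4), intended only for checking):
//   X[u][v] = 1/4 * C(u) * C(v) * sum_{i,j} x[i][j]
//             * cos((2i+1) * u * pi / 16) * cos((2j+1) * v * pi / 16)
// with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
// Here i/u index rows and j/v index columns (a convention of this sketch).
void reference_dct8x8(const int in[8][8], double out[8][8]) {
    const double pi = std::acos(-1.0);
    for (int u = 0; u < 8; ++u) {
        for (int v = 0; v < 8; ++v) {
            double sum = 0.0;
            for (int i = 0; i < 8; ++i)
                for (int j = 0; j < 8; ++j)
                    sum += in[i][j] * std::cos((2 * i + 1) * u * pi / 16)
                                    * std::cos((2 * j + 1) * v * pi / 16);
            const double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            const double cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            out[u][v] = 0.25 * cu * cv * sum;   // orthonormal scaling
        }
    }
}
```

Rounding the output gives integer coefficients to compare element by element with the binDCT results, after whatever per-coefficient scaling the integer transform requires.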
I've applied the algorithm and done a lot of tests, but I still don't understand where I made mistakes.
Can anyone with more experience than me explain the mistakes I've made?
The strange thing is that, as I wrote, I've implemented Loeffler and it works very well. And the procedure, apart from the coefficients and the floating-point numbers, is quite similar (butterfly scheme, floating-point scale factors, normalization).
I'm stuck with it.
Thanks to anyone who can suggest the answer.
EDIT:
A minimal call looks like this:
#include <iostream>

int main(int argc, char **argv)
{
    int MYBINDCT[8][8];
    my_binDCT(matrix_a, MYBINDCT, 0, 0);
    std::cout << "\nMYBINDCT: \n";
    for (int i = 0; i < 8; i++)
    {
        std::cout << '\n';   // start a new output row
        for (int j = 0; j < 8; j++)
        {
            std::cout << MYBINDCT[i][j] << " ";
        }
    }
    return 0;
}

A calculation scheme that doesn't have multipliers (or has such crude ones as 3 or 5) cannot be very precise; I think your result is actually OK.
If your paper is any good, it should specify the expected precision of the results. Otherwise, 42 is a pretty universal answer to the 8x8 DCT problem, with an unspecified precision.
When doing approximations to DCT, it's pretty common to replace the definition of the DCT by something that is easier to implement. If you use DCT for image compression, then changing the definition of DCT to any transform will work, as long as you also change the IDCT (inverse transform) accordingly. For example, H.264 (the video coding standard) does this.
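As one illustration of that point: the 4x4 integer core transform used in H.264 needs only additions and multiplications by 2, and the codec folds its unequal basis-vector norms into the quantization step. A sketch of the 1-D forward butterfly (the function name is hypothetical):

```cpp
#include <array>

// Forward 4-point integer core transform of H.264, equivalent to multiplying
// the input vector by
//   [ 1  1  1  1 ]
//   [ 2  1 -1 -2 ]
//   [ 1 -1 -1  1 ]
//   [ 1 -2  2 -1 ]
// Only adds and doublings (shifts) are used; normalisation is left to
// quantisation and is not shown here.
std::array<int, 4> h264_core4(const std::array<int, 4>& x) {
    const int s0 = x[0] + x[3], s1 = x[1] + x[2];   // even part
    const int d0 = x[0] - x[3], d1 = x[1] - x[2];   // odd part
    return {s0 + s1, 2 * d0 + d1, s0 - s1, d0 - 2 * d1};
}
```

The 2-D transform applies this to the rows and then the columns of a 4x4 block, exactly as with the 8-point DCT routines in this thread.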

I think you are misinterpreting the "-" in the diagram. Where there is a "-" sign, you need to negate that input and then add: -A+B or A+-B => B-A or A-B.
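To make that rule concrete: binDCT is built from lifting steps, and an adder whose incoming branch is marked "-" computes b' = b - f(a), not f(a) - b. Read this way, each step can be undone exactly in integer arithmetic by recomputing f(a) and adding it back. A minimal sketch; lift_minus and unlift_minus are hypothetical names, and k and s encode a dyadic multiplier k/2^s such as 3/8.

```cpp
// One lifting step as drawn in a binDCT-style flowgraph: the branch marked
// "-" is negated and then added, so the update is b' = b - ((a * k) >> s),
// where k / 2^s is the dyadic multiplier (e.g. 3/8 -> k = 3, s = 3).
// Assumes >> on a negative int is an arithmetic shift.
int lift_minus(int b, int a, int k, int s) {
    return b - ((a * k) >> s);
}

// Exact integer inverse of lift_minus: a is not modified by the step, so the
// same rounded product can be recomputed and added back.
int unlift_minus(int b2, int a, int k, int s) {
    return b2 + ((a * k) >> s);
}
```

For example, the first "//fix" line in the code below, tmp14 = tmp14 - (tmp17 >> 3), is lift_minus(tmp14, tmp17, 1, 3).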
/* Chris */
void my_binDCT(int x[8])
{
int tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16, tmp17;
//stage 1
tmp0 = x[0] + x[7];
tmp7 = x[0] - x[7];
tmp1 = x[1] + x[6];
tmp6 = x[1] - x[6];
tmp2 = x[2] + x[5];
tmp5 = x[2] - x[5];
tmp3 = x[3] + x[4];
tmp4 = x[3] - x[4];
//stage 2
tmp16 = ((tmp5*3)>>3) + tmp6;
tmp15 = ((tmp16*5)>>3) - tmp5;
//stage 3
tmp10 = tmp0 + tmp3;
tmp13 = tmp0 - tmp3;
tmp11 = tmp1 + tmp2;
tmp12 = tmp1 - tmp2;
tmp14 = tmp4 + tmp15;
tmp15 = tmp4 - tmp15;
int z = tmp16;
tmp16 = tmp7 - tmp16;
tmp17 = (z + tmp7);
//stage 4
tmp14 = tmp14 - (tmp17 >> 3); //fix A+-B (tmp17 >> 3) - tmp14
tmp10 = tmp10 + tmp11;
tmp11 = (tmp10 >> 1) - tmp11;
tmp12 = tmp12 - ((tmp13*3)>>3); //fix A+-B ((tmp13*3)>>3) - tmp12;
tmp13 = ((tmp12*3)>>3) + tmp13;
tmp15 = (((tmp16*7)>>3) + tmp15);
tmp16 = tmp16 - (tmp15>>1); //fix A+-B (tmp15>>1) - tmp16
//stage 5
x[0] = tmp10;
x[4] = tmp11;
x[6] = tmp12;
x[2] = tmp13;
x[7] = tmp14;
x[5] = tmp15;
x[3] = tmp16;
x[1] = tmp17;
}
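The routine above transforms a single 8-element vector in place, so producing an 8x8 result like the ones below requires running it over every row and then every column. A minimal driver sketch; transform2d and Transform8 are hypothetical names, and no normalization is applied, so any final scaling (such as the question's >> 3) is left to the caller.

```cpp
// Any in-place 1-D 8-point transform, e.g. my_binDCT(int x[8]) or
// row_fdct(int dataptr[]) from this thread.
using Transform8 = void (*)(int x[8]);

// Applies f to each row, then to each column, of an 8x8 block in place.
void transform2d(int block[8][8], Transform8 f) {
    for (int r = 0; r < 8; ++r)
        f(block[r]);                                       // rows are contiguous
    for (int c = 0; c < 8; ++c) {
        int col[8];
        for (int r = 0; r < 8; ++r) col[r] = block[r][c];  // gather column c
        f(col);
        for (int r = 0; r < 8; ++r) block[r][c] = col[r];  // scatter it back
    }
}
```

Factoring the driver out this way means the same 1-D routine is used for both passes, which removes one place where the row and column code can drift apart.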
186 28 -14 -10 -4 3 4 0
-27 -66 -43 -9 13 -3 0 -3
18 47 -4 22 10 -19 -2 -1
-9 -15 9 -20 1 7 2 0
23 -16 -24 -10 -11 6 -1 0
-8 11 3 -3 13 -3 -3 1
-8 10 -15 -2 9 0 -1 -1
-5 2 -1 3 4 -2 -1 0
-----------
186 13 -7 -5 -2 4 -7 0
-13 -20 -11 -2 2 -2 -2 3
9 14 -1 4 2 -12 1 0
-6 -4 2 -3 0 3 -2 -1
11 -5 -6 -2 -3 4 0 -1
-12 8 1 -2 10 -4 6 -3
11 -7 10 2 -7 -1 -1 -4
12 -5 0 -3 -5 5 -1 -1
row_fdct my_binDCT
---------- ----------
72796704 72545773 (rows per second)
Take a look at intDCT (row_fdct). On x86 there is no performance gain at all! Using binDCT only makes sense on hardware that cannot multiply, or that needs to save power.
#define FIX_0_382683433 98
#define FIX_0_541196100 139
#define FIX_0_707106781 181
#define FIX_1_306562965 334
void row_fdct(int dataptr[]){
int tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7;
int tmp10, tmp11, tmp12, tmp13;
int z1, z2, z3, z4, z5, z11, z13;
/* Pass 1: process rows. */
tmp0 = dataptr[0] + dataptr[7];
tmp7 = dataptr[0] - dataptr[7];
tmp1 = dataptr[1] + dataptr[6];
tmp6 = dataptr[1] - dataptr[6];
tmp2 = dataptr[2] + dataptr[5];
tmp5 = dataptr[2] - dataptr[5];
tmp3 = dataptr[3] + dataptr[4];
tmp4 = dataptr[3] - dataptr[4];
/* Even part */
tmp10 = tmp0 + tmp3; /* phase 2 */
tmp13 = tmp0 - tmp3;
tmp11 = tmp1 + tmp2;
tmp12 = tmp1 - tmp2;
dataptr[0] = tmp10 + tmp11; /* phase 3 */
dataptr[4] = tmp10 - tmp11;
z1 = (tmp12 + tmp13) * FIX_0_707106781 >> 8; /* c4 */
dataptr[2] = tmp13 + z1; /* phase 5 */
dataptr[6] = tmp13 - z1;
/* Odd part */
tmp10 = tmp4 + tmp5; /* phase 2 */
tmp11 = tmp5 + tmp6;
tmp12 = tmp6 + tmp7;
/* The rotator is modified from fig 4-8 to avoid extra negations. */
z5 = (tmp10 - tmp12) * FIX_0_382683433 >> 8; /* c6 */
z2 = (tmp10 * FIX_0_541196100 >> 8) + z5; /* c2-c6 */
z4 = (tmp12 * FIX_1_306562965 >> 8) + z5; /* c2+c6 */
z3 = tmp11 * FIX_0_707106781 >> 8; /* c4 */
z11 = tmp7 + z3; /* phase 5 */
z13 = tmp7 - z3;
dataptr[5] = z13 + z2; /* phase 6 */
dataptr[3] = z13 - z2;
dataptr[1] = z11 + z4;
dataptr[7] = z11 - z4;
}
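The FIX_ constants in row_fdct appear to be 8-bit fixed-point versions of the cosine factors named in its comments, with c_k = cos(k*pi/16): scaling c6, c2-c6, c4 and c2+c6 by 256 and rounding reproduces 98, 139, 181 and 334, which is why each product is followed by >> 8. A small sketch that rebuilds them (the helper names are hypothetical):

```cpp
#include <cmath>

// c_k = cos(k * pi / 16), the factors named in row_fdct's comments.
double cos_k(int k) { return std::cos(k * std::acos(-1.0) / 16); }

// Converts a real factor to 8-bit fixed point: scale by 2^8 and round.
int to_fix8(double c) { return static_cast<int>(std::lround(c * 256.0)); }

// Fixed-point product as written in row_fdct: (x * FIX) >> 8.
// Assumes an arithmetic right shift for negative products.
int fix8_mul(int x, int fix) { return (x * fix) >> 8; }
```

Using more fraction bits (say 13 instead of 8) makes the products more accurate at the cost of a wider intermediate range.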
I googled binDCT and found another document that contains the binDCT C7 scheme. I experimented with it and adjusted the output multiplications to bring the results closer to the canonical fast DCT (but I will still use intDCT instead of binDCT):
void row_bdct_c7_scale(int dataptr[8]){
int tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7,z1;
tmp0 = dataptr[0] + dataptr[7];
tmp7 = dataptr[0] - dataptr[7];
tmp1 = dataptr[1] + dataptr[6];
tmp6 = dataptr[1] - dataptr[6];
tmp2 = dataptr[2] + dataptr[5];
tmp5 = dataptr[2] - dataptr[5];
tmp3 = dataptr[3] + dataptr[4];
tmp4 = dataptr[3] - dataptr[4];
tmp5 = tmp5 - tmp6/2;
tmp6 = tmp5*3/4 + tmp6;
tmp5 = tmp6/2 - tmp5;
tmp0 = (z1=tmp0) + tmp3;
tmp3 = z1-tmp3;
tmp1 = (z1=tmp1) + tmp2;
tmp2 = z1-tmp2;
dataptr[0] = tmp0 = tmp0+tmp1;
dataptr[4] = (tmp0/2 - tmp1)*2;
dataptr[6] = tmp2 = tmp3/2-tmp2;
dataptr[2] = (tmp3 - tmp2/2)*2;
tmp4 = (z1=tmp4)+tmp5;
tmp5 = z1-tmp5;
tmp6 = tmp7 - (z1=tmp6);
tmp7 = tmp7 + z1;
dataptr[7] = tmp4 = (tmp7/4-tmp4)>>1;
dataptr[1] = (tmp7 - tmp4/4)*2; //scale x2
dataptr[5] = tmp5 = tmp6 + tmp5;
dataptr[3] = (tmp6 - tmp5/2)*2; //scale x2
}
186 28 -14 -10 -4 3 4 0
-27 -66 -43 -9 13 -3 0 -3
18 47 -4 22 10 -19 -2 -1
-9 -15 9 -20 1 7 2 0
23 -16 -24 -10 -11 6 -1 0
-8 11 3 -3 13 -3 -3 1
-8 10 -15 -2 9 0 -1 -1
-5 2 -1 3 4 -2 -1 0
-----------
186 28 -16 -8 -4 1 6 -1
-27 -63 -41 -7 16 -5 -2 -4
21 38 4 25 5 -19 -3 -1
-7 -18 4 -16 -3 6 4 0
22 -14 -23 -11 -11 6 -2 0
-11 13 8 -6 17 -3 -5 1
-11 15 -21 -1 15 -1 -2 -3
-8 4 -1 3 5 -2 -1 0
row_fdct row_bdct_c
---------- ----------
72404388 62906263 (rows per second)

Related

Fill backward a number of rows in SAS

I need to create a variable that fills one cell 10 observations backward and one forward in SAS. A condition must be met. It is hard to explain, so the data below might help.
So far I have tried using PROC EXPAND to generate 10 leads and combine them into one variable, but that looks ridiculous and does not really work.
proc expand data=optionsreturns out=optionsreturns2;by secid cp_flag;
convert best_bid = ann_best_bid1 /transformout = (lead 1);
convert best_bid = ann_best_bid2 /transformout = (lead 2);
convert best_bid = ann_best_bid3 /transformout = (lead 3);
convert best_bid = ann_best_bid4 /transformout = (lead 4);
convert best_bid = ann_best_bid5 /transformout = (lead 5);
convert best_bid = ann_best_bid6 /transformout = (lead 6);
convert best_bid = ann_best_bid7 /transformout = (lead 7);
convert best_bid = ann_best_bid8 /transformout = (lead 8);
convert best_bid = ann_best_bid9 /transformout = (lead 9);
convert best_bid = ann_best_bid10 /transformout = (lead 10);
convert best_offer = ann_best_offer1 /transformout = (lead 1);
convert best_offer = ann_best_offer2 /transformout = (lead 2);
convert best_offer = ann_best_offer3 /transformout = (lead 3);
convert best_offer = ann_best_offer4 /transformout = (lead 4);
convert best_offer = ann_best_offer5 /transformout = (lead 5);
convert best_offer = ann_best_offer6 /transformout = (lead 6);
convert best_offer = ann_best_offer7 /transformout = (lead 7);
convert best_offer = ann_best_offer8 /transformout = (lead 8);
convert best_offer = ann_best_offer9 /transformout = (lead 9);
convert best_offer = ann_best_offer10 /transformout = (lead 10);
run;
the Option Contract Price P=Put of the Option ID Bid Across All Ask Across All Vol Average if Negative) Price WANT COUNT
10006273 19980303 C 19980418 5168 9.875 10.125 0 34.6875 .
10006273 19980304 C 19980418 5168 9 9.25 0 33.8125 .
10006273 19980305 C 19980418 5168 9 9.25 0 33.75 . 0.25 -10
10006273 19980313 C 19980418 5168 12.375 12.75 0 37.3125 . 0.25 -9
10006273 19980331 C 19980418 5168 19.625 20.125 0 44.625 . 0.25 -8
10012764 19960105 C 19960120 5168 0.375 0.4375 71 30.5 . 0.25 -7
10012764 19960108 C 19960120 5168 0.1875 0.375 0 30.1875 . 0.25 -6
10012764 19960109 C 19960120 5168 0.0625 0.1875 0 29.375 . 0.25 -5
10012764 19960110 C 19960120 5168 0.125 0.25 0 28.75 . 0.25 -4
10012764 19960111 C 19960120 5168 0 0.125 15 28.875 . 0.25 -3
10012764 19960112 C 19960120 5168 0 0.125 0 28.375 . 0.25 -2
10012764 19960115 C 19960120 5168 0 0.125 0 28.5 . 0.25 -1
10012764 19980824 C 19960120 5168 0.25 0.4375 28 29.25 24/08/1998 0.25 0
10022220 19960205 C 19960420 5168 18.75 19.125 26 33.625 .

Idea behind this bit manipulation code to achieve 5/8 of a number?

I am working on a problem where I have to compute five eighths (5/8) of a given number using bit operations.
For positive numbers, I can do it pretty easily. Basically, it is ( (x << 2) + x ) >> 3.
However, for negative numbers it does not seem to work. I looked around the web, and apparently I have to add a factor of 7, but I can't quite see why that would be required.
Division using shifting rounds towards negative infinity, while normal C division rounds towards zero.
That is, -9 / 8 == -1 (i.e., -1.125 rounded towards zero is -1), but -9 >> 3 == -2 (i.e., -1.125 rounded towards negative infinity is -2).
To fix that, for the specific case of division by 8 you can add 7 in the case of negative numbers to "adjust" the dividend such that the rounding happens as you expect.
The entirety of this question assumes your C compiler implements "arithmetic right shifts" for signed right shifts. Pretty much all architecture/compiler combinations do, but it's not guaranteed by the standard.
For positive x, x >> 3 and x / 8 both round toward zero.
For negative x, x >> 3 rounds toward negative infinity, while x / 8 rounds toward zero. Examples:
-1 >> 3 = -1 -1 / 8 = 0 different
-2 >> 3 = -1 -2 / 8 = 0 different
-3 >> 3 = -1 -3 / 8 = 0 different
-4 >> 3 = -1 -4 / 8 = 0 different
-5 >> 3 = -1 -5 / 8 = 0 different
-6 >> 3 = -1 -6 / 8 = 0 different
-7 >> 3 = -1 -7 / 8 = 0 different
-8 >> 3 = -1 -8 / 8 = -1 same
-9 >> 3 = -2 -9 / 8 = -1 different
-10 >> 3 = -2 -10 / 8 = -1 different
-11 >> 3 = -2 -11 / 8 = -1 different
-12 >> 3 = -2 -12 / 8 = -1 different
-13 >> 3 = -2 -13 / 8 = -1 different
-14 >> 3 = -2 -14 / 8 = -1 different
-15 >> 3 = -2 -15 / 8 = -1 different
-16 >> 3 = -2 -16 / 8 = -2 same
-17 >> 3 = -3 -17 / 8 = -2 different
-18 >> 3 = -3 -18 / 8 = -2 different
-19 >> 3 = -3 -19 / 8 = -2 different
When the numerator (x) is a multiple of the denominator (8), the results are the same. For the other 7/8 of the results, the results are different by 1. This means if we want >> 3 to behave the same as / 8, we need to change the numerator.
Generally speaking, if you have an integer division operator that rounds down, you can make it round up by adding (denominator - 1) to the numerator. But let's get there in baby steps. Suppose we change the numerator by adding 1:
( -1 + 1) >> 3 = 0 -1 / 8 = 0 same
( -2 + 1) >> 3 = -1 -2 / 8 = 0 different
( -3 + 1) >> 3 = -1 -3 / 8 = 0 different
( -4 + 1) >> 3 = -1 -4 / 8 = 0 different
( -5 + 1) >> 3 = -1 -5 / 8 = 0 different
( -6 + 1) >> 3 = -1 -6 / 8 = 0 different
( -7 + 1) >> 3 = -1 -7 / 8 = 0 different
( -8 + 1) >> 3 = -1 -8 / 8 = -1 same
( -9 + 1) >> 3 = -1 -9 / 8 = -1 same
(-10 + 1) >> 3 = -2 -10 / 8 = -1 different
(-11 + 1) >> 3 = -2 -11 / 8 = -1 different
(-12 + 1) >> 3 = -2 -12 / 8 = -1 different
(-13 + 1) >> 3 = -2 -13 / 8 = -1 different
(-14 + 1) >> 3 = -2 -14 / 8 = -1 different
(-15 + 1) >> 3 = -2 -15 / 8 = -1 different
(-16 + 1) >> 3 = -2 -16 / 8 = -2 same
(-17 + 1) >> 3 = -2 -17 / 8 = -2 same
(-18 + 1) >> 3 = -3 -18 / 8 = -2 different
(-19 + 1) >> 3 = -3 -19 / 8 = -2 different
Now we have 2/8 of the results matching. Try adding 2:
( -1 + 2) >> 3 = 0 -1 / 8 = 0 same
( -2 + 2) >> 3 = 0 -2 / 8 = 0 same
( -3 + 2) >> 3 = -1 -3 / 8 = 0 different
( -4 + 2) >> 3 = -1 -4 / 8 = 0 different
( -5 + 2) >> 3 = -1 -5 / 8 = 0 different
( -6 + 2) >> 3 = -1 -6 / 8 = 0 different
( -7 + 2) >> 3 = -1 -7 / 8 = 0 different
( -8 + 2) >> 3 = -1 -8 / 8 = -1 same
( -9 + 2) >> 3 = -1 -9 / 8 = -1 same
(-10 + 2) >> 3 = -1 -10 / 8 = -1 same
(-11 + 2) >> 3 = -2 -11 / 8 = -1 different
(-12 + 2) >> 3 = -2 -12 / 8 = -1 different
(-13 + 2) >> 3 = -2 -13 / 8 = -1 different
(-14 + 2) >> 3 = -2 -14 / 8 = -1 different
(-15 + 2) >> 3 = -2 -15 / 8 = -1 different
(-16 + 2) >> 3 = -2 -16 / 8 = -2 same
(-17 + 2) >> 3 = -2 -17 / 8 = -2 same
(-18 + 2) >> 3 = -2 -18 / 8 = -2 same
(-19 + 2) >> 3 = -3 -19 / 8 = -2 different
Apparently, if we compute (x + i) >> 3, (i+1)/8 of the results match. So to make all of the results match, we solve (i+1)/8 = 1 for i, getting i = 7. And here's what we get if we add 7 to the numerator:
( -1 + 7) >> 3 = 0 -1 / 8 = 0 same
( -2 + 7) >> 3 = 0 -2 / 8 = 0 same
( -3 + 7) >> 3 = 0 -3 / 8 = 0 same
( -4 + 7) >> 3 = 0 -4 / 8 = 0 same
( -5 + 7) >> 3 = 0 -5 / 8 = 0 same
( -6 + 7) >> 3 = 0 -6 / 8 = 0 same
( -7 + 7) >> 3 = 0 -7 / 8 = 0 same
( -8 + 7) >> 3 = -1 -8 / 8 = -1 same
( -9 + 7) >> 3 = -1 -9 / 8 = -1 same
(-10 + 7) >> 3 = -1 -10 / 8 = -1 same
(-11 + 7) >> 3 = -1 -11 / 8 = -1 same
(-12 + 7) >> 3 = -1 -12 / 8 = -1 same
(-13 + 7) >> 3 = -1 -13 / 8 = -1 same
(-14 + 7) >> 3 = -1 -14 / 8 = -1 same
(-15 + 7) >> 3 = -1 -15 / 8 = -1 same
(-16 + 7) >> 3 = -2 -16 / 8 = -2 same
(-17 + 7) >> 3 = -2 -17 / 8 = -2 same
(-18 + 7) >> 3 = -2 -18 / 8 = -2 same
(-19 + 7) >> 3 = -2 -19 / 8 = -2 same
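Applied to the original 5/8 problem: compute 5x with a shift and an add, then bias negative values by 7 before the final shift so the result matches (5 * x) / 8. A sketch; five_eighths is a hypothetical name, and it assumes 5 * x fits in an int.

```cpp
// 5/8 of x with the same rounding as C/C++ integer division (toward zero).
// Assumes an arithmetic >> for negative ints (guaranteed since C++20,
// implementation-defined but near-universal before) and that 5*x does not
// overflow. Left-shifting a negative int is only well-defined since C++20;
// on older standards write x * 4 instead of x << 2.
int five_eighths(int x) {
    int t = (x << 2) + x;   // 5 * x
    if (t < 0) t += 7;      // bias by 8 - 1 so the shift rounds toward zero
    return t >> 3;          // divide by 8
}
```

For example, five_eighths(-9) gives -5, the same as (5 * -9) / 8, whereas the unbiased -45 >> 3 gives -6.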
To make it more visual,
Take the number, say, -42 (which is not a multiple of 5, but whatever)
11010110
Put a radix point 3 places from the right (this divides by 8)
11010.110 = -5.25
How to round up: add ones to all fraction bits (meaning that iff those are not all zero, the addition will carry into the integer part), so 0.111, then chop:
11010.110
0.111 = 7/8
--------- +
11011.101 = -4.375
chop:
11011.000 = -5
To convert to a normal integer, shift until the radix point is just before the least significant bit:
11011.000 >>s 3 =
11111011. = still -5, but in normal integer format
In the code you just put the radix point in virtually (so do nothing, but then proceed as if it is there) and fractions are implicit (so 7/8 is written as 7). And the chop is unnecessary since the right shift throws those bits out anyway. All that's left is add 7, then shift.
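The same add-then-chop idea works for any power of two: adding 2^k - 1 fills all k fraction bits with ones, so any non-zero fraction carries into the integer part before the shift discards it. A sketch of round-up division (ceil_div_pow2 is a hypothetical name):

```cpp
// Divides x by 2^k, rounding toward +infinity. Assumes an arithmetic right
// shift for negative values and that x + 2^k - 1 does not overflow.
int ceil_div_pow2(int x, int k) {
    return (x + ((1 << k) - 1)) >> k;
}
```

For negative x, rounding up is the same as rounding toward zero, which is why adding 7 only when x is negative makes x >> 3 agree with x / 8.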

Efficient algorithm for counting frequency of numbers in intervals

I need to build a bar graph that illustrates the distribution of pseudorandom numbers generated by the linear congruential method
X_{n+1} = (a * X_n + c) mod m
U = X / m
on the interval [0, 1].
For example:
Interval Frequency
[0;0,1] 0,05
[0,1;0,2] 0,15
[0,2;0,3] 0,1
[0,3;0,4] 0,12
[0,4;0,5] 0,1
[0,5;0,6] 0,15
[0,6;0,7] 0,05
[0,7;0,8] 0,08
[0,8;0,9] 0,16
[0,9;1,0] 0,4
I used this method:
float mas[10] = {0,0,0,0,0,0,0,0,0,0};
void metod1()
{
int x=-2, m=437, a=33, c=61;
float u;
for(int i=0;i<m;i++){
x=(a*x + c) % m;
u=(float)x/(float)m;
int r;
r = ceil(u*10);
mas[r] = mas[r] + 1;
}
for(int i=0;i<10;i++) cout<<"["<<(float)i/10<<";"<<(float)(i+1)/10<<"]"<<" | "<<mas[i]<<"\n-----------------"<<endl;
return;
}
If you know other efficient methods for this problem that are not straightforward, I would appreciate it.
Your code currently has a much larger problem than efficiency. Assuming you've defined mas as something like int mas[10];, it has undefined behavior.
To see the problem, let's modify your code to print out the values of r that it generates:
void metod1() {
int mas[11] = { };
int x = -2, m = 437, a = 33, c = 61;
float u;
for (int i = 0; i < m; i++) {
x = (a*x + c) % m;
u = (float)x / (float)m;
int r;
r = ceil(u * 10);
//mas[r] = mas[r] + 1;
std::cout << r << '\t';
}
// for (i = 0; i < 10; i++) cout << "[" << (float)i / 10 << ";" << (float)(i + 1) / 10 << "]" << " | " << mas[i] << "\n-----------------" << endl;
return;
}
Then let's look at the results:
0 -2 -7 -4 -7 -7 -6 0 -6 -1
-5 -6 -1 -9 -7 -2 -7 -3 0 -6
0 -8 -5 -6 -8 -6 -7 0 -2 -6
-7 -6 -2 -4 -9 0 -4 -5 -1 -2
-5 0 -2 -1 -4 -8 -5 -2 -8 -5
-9 -4 -5 -7 -9 -8 -3 -9 -9 -9
-3 -4 -5 -3 -9 -6 -5 -3 -1 0
-5 -5 -6 -7 -9 -5 -4 -1 -5 -1
-9 -2 0 -9 -6 -7 -5 -5 -3 -3
-9 -3 0 -4 -1 -1 0 -8 -4 -4
-2 -7 0 -6 -6 -8 -4 -8 -2 -8
-8 -2 -4 -7 -1 -6 -1 -3 -7 -3
-5 -9 -8 -5 -8 -7 -4 -1 -8 -7
-7 -2 -9 -5 -3 0 2 8 8 2
6 8 7 1 5 2 8 4 1 5
10 1 3 6 4 10 5 6 6 10
[more elided]
It doesn't look like you've planned for the fact that you'll be producing negative numbers, and if you fail to do so, the result is undefined behavior when you index outside the bounds of mas.
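One way to avoid both the negative indices and the r == 10 values (which also overflow a 10-element mas) is to keep X in [0, m) by taking the non-negative residue, then compute the bin as X*10/m in integer arithmetic, which is always 0..9. A minimal sketch; the function name, the use of long long, and the half-open intervals [i/10, (i+1)/10) are assumptions of this sketch.

```cpp
#include <array>

// Counts how many of the first m LCG outputs X_{n+1} = (a*X_n + c) mod m
// fall into each interval [i/10, (i+1)/10) of U = X/m.
// Assumes "mod" means the non-negative residue (so the seed is first mapped
// into [0, m)), a and c are non-negative, and a*X + c fits in long long.
std::array<int, 10> lcg_histogram(long long x, long long m, long long a, long long c) {
    std::array<int, 10> counts{};                      // all bins start at zero
    x = ((x % m) + m) % m;                             // e.g. -2 -> 435 for m = 437
    for (long long i = 0; i < m; ++i) {
        x = (a * x + c) % m;                           // stays in [0, m)
        const int bin = static_cast<int>(x * 10 / m);  // floor(10 * X / m), 0..9
        ++counts[bin];
    }
    return counts;
}
```

Dividing each count by m gives the relative frequency for the bar graph. Working in integers also avoids a float conversion and a ceil call per sample.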

3D Convolution function C++ (no libraries or FFT)

I've been following a paper which at some stage mentions 'calculating the gradient over a 5x5 neighbourhood'. However, the supplementary code simply uses gradient(I, 2, 2). As far as I'm aware that just makes the step size 2 but still computes the gradient using central finite difference between 2 adjacent points (for each direction) but divides by 2 instead (please tell me if I'm wrong here).
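For reference, MATLAB documents gradient with a uniform spacing h as central differences between the two immediate neighbours in the interior, (f(i+1) - f(i-1)) / (2h), and one-sided differences at the ends. With h = 2 the interior result is therefore the unit-spacing central difference divided by 2, i.e. (f(i+1) - f(i-1)) / 4. A 1-D sketch of that scheme (gradient1d is a hypothetical name):

```cpp
#include <cstddef>
#include <vector>

// Numerical gradient with uniform spacing h: central differences in the
// interior, one-sided differences at the two ends.
std::vector<double> gradient1d(const std::vector<double>& f, double h) {
    const std::size_t n = f.size();
    std::vector<double> g(n, 0.0);
    if (n < 2) return g;
    g[0] = (f[1] - f[0]) / h;                  // forward difference
    g[n - 1] = (f[n - 1] - f[n - 2]) / h;      // backward difference
    for (std::size_t i = 1; i + 1 < n; ++i)
        g[i] = (f[i + 1] - f[i - 1]) / (2 * h);
    return g;
}
```

The neighbourhood stays 3 samples wide regardless of h; only the scale changes.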
So instead, I assume that computing the gradient with a 5x5 Sobel operator makes more sense.
However, I'm doing this in 3D so I have been trying to write a 3D convolution function (note: I don't want to use libraries for this or the FFT method. Speed isn't a problem here).
typedef boost::multi_array<float, 3> array_type;
typedef array_type::index index;
float convolve3D(const array_type& input, int x, int y, int z, int iWidth, int iHeight, int iDepth, const array_type& kernel, int kernelSize)
{
auto isOutsidePixel = [&](int xPos, int yPos, int zPos)->bool
{
if(xPos >= 0 && xPos < iWidth && yPos >= 0 && yPos < iHeight && zPos >= 0 && zPos < iDepth)
{
return false;
}
else
{
return true;
}
};
float result = 0.0f;
int xPos, yPos, zPos;
int filterSize = 3;
for(int k = 0; k < kernelSize; k++)
{
for(int j = 0; j < kernelSize; j++)
{
for(int i = 0; i < kernelSize; i++)
{
xPos = x + i-(kernelSize/2);
yPos = y + j-(kernelSize/2);
zPos = z + k-(kernelSize/2);
if(isOutsidePixel(xPos, yPos, zPos) == false)
{
result += kernel[i][j][k] * input[xPos][yPos][zPos];
}
}
}
}
return result;
}
I then feed this function some test data to test on just the x kernel to get the x component:
x_kernel[0][0][0] = -1.0f; x_kernel[1][0][0] = 0.0f; x_kernel[2][0][0] = 1.0f;
x_kernel[0][1][0] = -2.0f; x_kernel[1][1][0] = 0.0f; x_kernel[2][1][0] = 2.0f;
x_kernel[0][2][0] = -1.0f; x_kernel[1][2][0] = 0.0f; x_kernel[2][2][0] = 1.0f;
x_kernel[0][0][1] = -2.0f; x_kernel[1][0][1] = 0.0f; x_kernel[2][0][1] = 2.0f;
x_kernel[0][1][1] = -4.0f; x_kernel[1][1][1] = 0.0f; x_kernel[2][1][1] = 4.0f;
x_kernel[0][2][1] = -2.0f; x_kernel[1][2][1] = 0.0f; x_kernel[2][2][1] = 2.0f;
x_kernel[0][0][2] = -1.0f; x_kernel[1][0][2] = 0.0f; x_kernel[2][0][2] = 1.0f;
x_kernel[0][1][2] = -2.0f; x_kernel[1][1][2] = 0.0f; x_kernel[2][1][2] = 2.0f;
x_kernel[0][2][2] = -1.0f; x_kernel[1][2][2] = 0.0f; x_kernel[2][2][2] = 1.0f;
with my image being
I[0][0][0] = 5; I[0][1][0] = 30; I[0][2][0] = 20;
I[1][0][0] = 10; I[1][1][0] = 10; I[1][2][0] = 5;
I[2][0][0] = 20; I[2][1][0] = 5; I[2][2][0] = 100;
I[0][0][1] = 500; I[0][1][1] = 50; I[0][2][1] = 70;
I[1][0][1] = 200; I[1][1][1] = 5; I[1][2][1] = 75;
I[2][0][1] = 100; I[2][1][1] = 10; I[2][2][1] = 45;
I[0][0][2] = 400; I[0][1][2] = 90; I[0][2][2] = 30;
I[1][0][2] = 50; I[1][1][2] = 100; I[1][2][2] = 45;
I[2][0][2] = 20; I[2][1][2] = 90; I[2][2][2] = 60;
I get the following result:
465 , -830 , -465 ,
355 , -415 , -355 ,
195 , 180 , -195 ,
1040 , -2435 , -1040 ,
900 , -1315 , -900 ,
520 , 15 , -520 ,
805 , -2360 , -805 ,
875 , -1205 , -875 ,
535 , 30 , -535 ,
In MATLAB I then replicate the same kernel and image.
k(:,:,1) = [-1 0 1; -2 0 2; -1 0 1];
k(:,:, 2) = [-2 0 2; -4 0 4 ; -2 0 2];
k(:,:,3) = [-1 0 1; -2 0 2; -1 0 1];
I(:,:,1) = [5 30 20; 10 10 5; 20 5 100];
I(:, :, 2) = [500 50 70; 200 5 75; 100 10 45];
I(:,:,3) = [400 90 30; 50 100 45; 20, 90, 60];
convn(I,k)
and get the following result.
ans(:,:,1) =
-5 -30 -15 30 20
-20 -70 -25 70 45
-45 -55 -85 55 130
-50 -20 -155 20 205
-20 -5 -80 5 100
ans(:,:,2) =
-510 -110 400 110 110
-1240 -245 935 245 305
-1090 -180 565 180 525
-500 -65 -75 65 575
-140 -20 -105 20 245
ans(:,:,3) =
-1405 -220 1215 220 190
-3270 -560 2690 560 580
-2565 -575 1725 575 840
-940 -350 240 350 700
-240 -115 -10 115 250
ans(:,:,4) =
-1300 -230 1170 230 130
-2900 -665 2475 665 425
-2040 -830 1415 830 625
-580 -585 85 585 495
-140 -190 -25 190 165
ans(:,:,5) =
-400 -90 370 90 30
-850 -280 745 280 105
-520 -380 340 380 180
-90 -280 -75 280 165
-20 -90 -40 90 60
The central pixels (I'm not concerned about the added extra image size bits) don't seem to be matching up. This method worked for 2D and I just expanded it to 3D. Is there anything immediately obvious that I've done wrong?
Thanks
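One thing worth checking: MATLAB's convn computes true convolution, which flips the kernel along every dimension, whereas convolve3D multiplies kernel[i][j][k] by the sample at the same offset, which is correlation. Separately, the x_kernel assignments put the -1/+1 planes along the first index, while MATLAB's k(:,:,1) = [-1 0 1; ...] puts them along the second index, so the two kernels are transposed relative to each other (the image arrays are not). Below is a sketch of a flipped-kernel version; the Volume struct is a hypothetical stand-in for boost::multi_array, and convolve3DFlipped is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense volume standing in for boost::multi_array<float, 3>;
// element (x, y, z) is stored at index (x * ny + y) * nz + z.
struct Volume {
    int nx, ny, nz;
    std::vector<float> data;
    Volume(int x, int y, int z)
        : nx(x), ny(y), nz(z), data(static_cast<std::size_t>(x) * y * z, 0.0f) {}
    float& at(int x, int y, int z) { return data[(static_cast<std::size_t>(x) * ny + y) * nz + z]; }
    float at(int x, int y, int z) const { return data[(static_cast<std::size_t>(x) * ny + y) * nz + z]; }
};

// True (kernel-flipped) convolution at voxel (x, y, z) with zero padding,
// i.e. the centre ("same") part of convn(I, K) for an odd-sized kernel.
// The only change from correlation is the sign of the offset: the input is
// sampled at x - (i - cx) rather than x + (i - cx).
float convolve3DFlipped(const Volume& in, int x, int y, int z, const Volume& k) {
    const int cx = k.nx / 2, cy = k.ny / 2, cz = k.nz / 2;
    float sum = 0.0f;
    for (int i = 0; i < k.nx; ++i)
        for (int j = 0; j < k.ny; ++j)
            for (int l = 0; l < k.nz; ++l) {
                const int xi = x - (i - cx), yj = y - (j - cy), zl = z - (l - cz);
                if (xi < 0 || xi >= in.nx || yj < 0 || yj >= in.ny || zl < 0 || zl >= in.nz)
                    continue;  // outside the volume: treated as zero
                sum += k.at(i, j, l) * in.at(xi, yj, zl);
            }
    return sum;
}
```

A quick sanity check is an impulse: convolving a single 1 at the centre with any kernel should reproduce the kernel itself, whereas correlation reproduces it mirrored.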

What would this look like as pseudocode?

I'm trying to implement the following, from https://docs.google.com/viewer?url=http://www.tinaja.com/glib/bezdist.pdf&pli=1:
The following BASIC program uses the method of finding distance. The
program also searches for the minimum squared distance between points and
a curve.
REM BEZIER.BAS JIM 20DEC92 12:37
DATA 2,3,5,8,8,14,11,17,14,17,16,15,18,11,-1
DATA 2,10,5,12,8,11,11,8,14,6,17,5,19,10,-1
DATA 2,5,5,7,8,8,12,12,13,14,12,17,10,18,8,17,7,14,8,12,12,8,15,7,18,5,-1
OPEN "BEZIER.OUT" FOR OUTPUT AS #1
OPEN "BEZ.ps" FOR OUTPUT AS #2
CLS
psscale = 20
FOR example% = 1 TO 3
REDIM rawdata(32)
FOR I% = 0 TO 32
READ rawdata(I%)
IF rawdata(I%) < 0! THEN EXIT FOR
NEXT I%
n% = I% - 1
PRINT "Example "; example%; (n% + 1) \ 2; " points"
PRINT #1, ""
PRINT #1, "Example "; example%; (n% + 1) \ 2; " points"
PRINT #1, "  #        x        y"
J% = 0
FOR I% = 0 TO n% STEP 2
J% = J% + 1
PRINT #1, USING "### ####.### ####.###"; J%; rawdata(I%); rawdata(I% + 1)
LPRINT USING "####.### ####.### 3 0 360 arc fill"; rawdata(I%) * psscale; rawdata(I% + 1) * psscale
PRINT #2, USING "####.### ####.### 3 0 360 arc fill"; rawdata(I%) * psscale; rawdata(I% + 1) * psscale
NEXT I%
x0 = rawdata(0)
y0 = rawdata(1)
x1 = rawdata(2)
y1 = rawdata(3)
x2 = rawdata(n% - 3)
y2 = rawdata(n% - 2)
x3 = rawdata(n% - 1)
y3 = rawdata(n%)
IF example% = 3 THEN
'special guess for loop
x1 = 8 * x1 - 7 * x0
y1 = 8 * y1 - 7 * y0
x2 = 8 * x2 - 7 * x3
y2 = 8 * y2 - 7 * y3
ELSE
x1 = 2 * x1 - x0
y1 = 2 * y1 - y0
x2 = 2 * x2 - x3
y2 = 2 * y2 - y3
END IF
GOSUB distance
LPRINT ".1 setlinewidth"
PRINT #2, ".1 setlinewidth"
GOSUB curveto
e1 = totalerror
FOR Retry% = 1 TO 6
PRINT
PRINT "Retry "; Retry%
PRINT #1, "Retry "; Retry%
PRINT #1, "      x1       y1       x2       y2      error"
e3 = .5
x1a = x1
DO
x1 = x1 + (x1 - x0) * e3
GOSUB distance
e2 = totalerror
IF e2 = e1 THEN
EXIT DO
ELSEIF e2 > e1 THEN
x1 = x1a
e3 = -e3 / 3
IF ABS(e3) < .001 THEN EXIT DO
ELSE
e1 = e2
x1a = x1
END IF
LOOP
e3 = .5
y1a = y1
DO
y1 = y1 + (y1 - y0) * e3
GOSUB distance
e2 = totalerror
IF e2 = e1 THEN
EXIT DO
ELSEIF e2 > e1 THEN
y1 = y1a
e3 = -e3 / 3
IF ABS(e3) < .01 THEN EXIT DO
ELSE
e1 = e2
y1a = y1
END IF
LOOP
e3 = .5
x2a = x2
DO
x2 = x2 + (x2 - x3) * e3
GOSUB distance
e2 = totalerror
IF e2 = e1 THEN
EXIT DO
ELSEIF e2 > e1 THEN
x2 = x2a
e3 = -e3 / 3
IF ABS(e3) < .01 THEN EXIT DO
ELSE
e1 = e2
x2a = x2
END IF
LOOP
e3 = .5
y2a = y2
DO
y2 = y2 + (y2 - y3) * e3
GOSUB distance
e2 = totalerror
IF e2 = e1 THEN
EXIT DO
ELSEIF e2 > e1 THEN
y2 = y2a
e3 = -e3 / 3
IF ABS(e3) < .01 THEN EXIT DO
ELSE
e1 = e2
y2a = y2
END IF
LOOP
IF Retry% = 6 THEN
LPRINT "1 setlinewidth"
PRINT #2, "1 setlinewidth"
END IF
GOSUB curveto
NEXT Retry%
LPRINT "100 200 translate"
PRINT #2, "100 200 translate"
NEXT example%
LPRINT "showpage"
PRINT #2, "showpage"
CLOSE #1
CLOSE #2
END
'
Bezier:
x = a0 + u * (a1 + u * (a2 + u * a3))
y = b0 + u * (b1 + u * (b2 + u * b3))
dx4 = x - x4: dy4 = y - y4
dx = a1 + u * (2 * a2 + u * 3 * a3)
dy = b1 + u * (2 * b2 + u * 3 * b3)
z = dx * dx4 + dy * dy4
s = dx4 * dx4 + dy4 * dy4
RETURN
'
distance:
totalerror = 0!
a3 = (x3 - x0 + 3 * (x1 - x2)) / 8
b3 = (y3 - y0 + 3 * (y1 - y2)) / 8
a2 = (x3 + x0 - x1 - x2) * 3 / 8
b2 = (y3 + y0 - y1 - y2) * 3 / 8
a1 = (x3 - x0) / 2 - a3
b1 = (y3 - y0) / 2 - b3
a0 = (x3 + x0) / 2 - a2
b0 = (y3 + y0) / 2 - b2
FOR I% = 2 TO n% - 2 STEP 2
x4 = rawdata(I%)
y4 = rawdata(I% + 1)
stepsize = 2 / (n% + 1)
FOR u = -1! TO 1.01 STEP stepsize
GOSUB Bezier
IF s = 0! THEN u1 = u: z1 = z: s1 = s: EXIT FOR
IF u = -1! THEN u1 = u: z1 = z: s1 = s
IF s < s1 THEN u1 = u: z1 = z: s1 = s
NEXT u
IF s1 <> 0! THEN
u = u1 + stepsize
IF u > 1! THEN u = 1! - stepsize
DO
GOSUB Bezier
IF s = 0! THEN EXIT DO
IF z = 0! THEN EXIT DO
u2 = u
z2 = z
temp = z2 - z1
IF temp <> 0! THEN
u = (z2 * u1 - z1 * u2) / temp
ELSE
u = (u1 + u2) / 2!
END IF
IF u > 1! THEN
u = 1!
ELSEIF u < -1! THEN
u = -1!
END IF
IF ABS(u - u2) < .0001 THEN EXIT DO
u1 = u2
z1 = z2
LOOP
END IF
totalerror = totalerror + s
NEXT I%
PRINT totalerror;
PRINT #1, USING "####.### ####.### ####.### ####.### ######.###"; x1; y1; x2; y2; totalerror
RETURN
'
curveto:
LPRINT USING "####.### ####.### moveto"; x0 * psscale; y0 * psscale
PRINT #2, USING "####.### ####.### moveto"; x0 * psscale; y0 * psscale
F$ = "####.### ####.### ####.### ####.### ####.### ####.### curveto stroke"
LPRINT USING F$; x1 * psscale; y1 * psscale; x2 * psscale; y2 * psscale; x3 * psscale; y3 * psscale
PRINT #2, USING F$; x1 * psscale; y1 * psscale; x2 * psscale; y2 * psscale; x3 * psscale; y3 * psscale
RETURN
I want to implement it in C++ because I'm trying to get my algorithm to best-fit Bézier curves to points.
What would the above look like in pseudocode or C/C++?
Thanks
The best approach here is to split the code bit by bit and do minor refactorings until it's in a usable state. Data can be changed into global variables at first.
Then start taking small chunks of the code and turning them into functions. At first they'll just use a bunch of global data. As you rewrite the pieces into C++ things will become more clear.
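Following that approach, the Bezier: and distance: subroutines map naturally onto a few C++ functions that take their data as parameters instead of globals. The sketch below keeps the BASIC's u in [-1, 1] parameterization, coefficient formulas, coarse scan and secant search. Pt, Cubic, minSquaredDistance and totalError are hypothetical names, and the points are assumed to arrive as a std::vector<Pt>.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Cubic Bezier in power-basis form over u in [-1, 1], with the coefficient
// formulas of the BASIC "distance:" subroutine (u = -1 -> p0, u = 1 -> p3).
struct Cubic {
    double a0, a1, a2, a3, b0, b1, b2, b3;
    Cubic(Pt p0, Pt p1, Pt p2, Pt p3) {
        a3 = (p3.x - p0.x + 3 * (p1.x - p2.x)) / 8;
        b3 = (p3.y - p0.y + 3 * (p1.y - p2.y)) / 8;
        a2 = (p3.x + p0.x - p1.x - p2.x) * 3 / 8;
        b2 = (p3.y + p0.y - p1.y - p2.y) * 3 / 8;
        a1 = (p3.x - p0.x) / 2 - a3;
        b1 = (p3.y - p0.y) / 2 - b3;
        a0 = (p3.x + p0.x) / 2 - a2;
        b0 = (p3.y + p0.y) / 2 - b2;
    }
    // The "Bezier:" subroutine: s = squared distance from curve(u) to q,
    // z = tangent . (curve(u) - q), which is zero at a local distance minimum.
    void eval(double u, Pt q, double& z, double& s) const {
        const double x = a0 + u * (a1 + u * (a2 + u * a3));
        const double y = b0 + u * (b1 + u * (b2 + u * b3));
        const double dx4 = x - q.x, dy4 = y - q.y;
        const double dx = a1 + u * (2 * a2 + u * 3 * a3);
        const double dy = b1 + u * (2 * b2 + u * 3 * b3);
        z = dx * dx4 + dy * dy4;
        s = dx4 * dx4 + dy4 * dy4;
    }
};

// Per-point search from "distance:": coarse scan over u, then a secant
// iteration on z. The iteration cap is an addition of this sketch.
double minSquaredDistance(const Cubic& c, Pt q, double stepsize) {
    double u1 = -1, z1 = 0, s1 = 0, z = 0, s = 0;
    bool first = true;
    for (double u = -1; u <= 1.01; u += stepsize) {
        c.eval(u, q, z, s);
        if (first || s < s1) { u1 = u; z1 = z; s1 = s; first = false; }
        if (s == 0) return 0;
    }
    double u = u1 + stepsize;
    if (u > 1) u = 1 - stepsize;
    for (int iter = 0; iter < 100; ++iter) {
        c.eval(u, q, z, s);
        if (s == 0 || z == 0) break;
        const double u2 = u, z2 = z, temp = z2 - z1;
        u = (temp != 0) ? (z2 * u1 - z1 * u2) / temp : (u1 + u2) / 2;
        u = std::fmax(-1.0, std::fmin(1.0, u));   // clamp to [-1, 1]
        if (std::fabs(u - u2) < 1e-4) break;
        u1 = u2; z1 = z2;
    }
    return s;  // as in the BASIC, s from the last evaluation is accumulated
}

// Sum of squared distances from the interior data points (the first and
// last are skipped, as in the BASIC loop) to the curve p0..p3.
double totalError(Pt p0, Pt p1, Pt p2, Pt p3, const std::vector<Pt>& pts) {
    if (pts.size() < 3) return 0;
    const Cubic c(p0, p1, p2, p3);
    const double stepsize = 1.0 / pts.size();  // BASIC: 2 / (n% + 1), n% + 1 = 2 * points
    double total = 0;
    for (std::size_t i = 1; i + 1 < pts.size(); ++i)
        total += minSquaredDistance(c, pts[i], stepsize);
    return total;
}
```

The PostScript and file output from the original is deliberately left out; it can be added back around totalError once the numeric part behaves like the BASIC version.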
Once you have most of the code built out functionally, then you can start refactoring the variables. The goal would be to remove all the global non-const data and have all the working data be locals. const values can remain namespace level initialized data.
Finally once you have it procedure-based, you can decide if it's worth the effort to encapsulate the work into objects and methods. Depending on how long the program needs to be maintained grouping the data and methods may be a good long-term step.
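In the same spirit, the four DO ... LOOP blocks inside the Retry loop perform the same one-dimensional search on x1, y1, x2 and y2, so they can become a single function. A sketch: coordinateSearch is a hypothetical name, err stands for any callable that recomputes the total error (e.g. a lambda that reruns the distance computation), and the BASIC uses tol = .001 for x1 and .01 for the other three.

```cpp
#include <cmath>

// Port of one "DO ... LOOP" block from the Retry loop. The coordinate v is
// pushed away from its anchor endpoint by e3 * (v - anchor); when the error
// grows, v is restored and the step is reversed and cut to a third, until
// |e3| < tol. e1 is the current error; the best error found is returned.
template <class ErrFn>
double coordinateSearch(double& v, double anchor, double e1, ErrFn err, double tol) {
    double e3 = 0.5;
    double best = v;                      // plays the role of x1a / y1a / ...
    for (;;) {
        v = v + (v - anchor) * e3;
        const double e2 = err();
        if (e2 == e1) break;              // no change: stop, as in the BASIC
        if (e2 > e1) {
            v = best;                     // undo the step
            e3 = -e3 / 3;                 // reverse direction, shrink the step
            if (std::fabs(e3) < tol) break;
        } else {
            e1 = e2;                      // accept the improvement
            best = v;
        }
    }
    return e1;
}
```

Usage would look like e1 = coordinateSearch(x1, x0, e1, fitError, 0.001), with fitError a hypothetical lambda capturing the control points by reference; passing the error function in keeps the search free of global state, which is the goal described above.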