I've been porting Sebastiano Vigna's xorshift1024* PRNG to be compatible with the standard C++11 uniform random number generator contract and noticed some strange behavior with the jump() function he provides.
According to Vigna, a call to jump() should be equivalent to 2^512 calls to next(). Therefore a series of calls to jump() and next() should be commutative. For example, assuming the generator starts in some known state,
jump();
next();
should leave the generator in the same state as
next();
jump();
since both should be equivalent to
for (bigint i = 0; i < (bigint(1) << 512) + 1; ++i)
next();
assuming bigint is some integer type with an extremely large maximum value (and assuming you are a very, very, very patient person).
Unfortunately, this doesn't work with the reference implementation Vigna provides (which I will include at the end for posterity; in case the implementation linked above changes or is taken down in the future). When testing the first two options using the following test code:
memset(s, 0xFF, sizeof(s));
p = 0;
// jump() and/or next() calls...
std::cout << p << ';';
for (int i = 0; i < 16; ++i)
std::cout << ' ' << s[i];
calling jump() before next() outputs:
1; 9726214034378009495 13187905351877324975 10033047168458208082 990371716258730972 965585206446988056 74622805968655940 11468976784638207029 3005795712504439672 6792676950637600526 9275830639065898170 6762742930827334073 16862800599087838815 13481924545051381634 16436948992084179560 6906520316916502096 12790717607058950780
while calling next() first results in:
1; 13187905351877324975 10033047168458208082 990371716258730972 965585206446988056 74622805968655940 11468976784638207029 3005795712504439672 6792676950637600526 9275830639065898170 6762742930827334073 16862800599087838815 13481924545051381634 16436948992084179560 6906520316916502096 12790717607058950780 9726214034378009495
Clearly either my understanding of what jump() is doing is wrong, or there's a bug in the jump() function, or the jump polynomial data is wrong. Vigna claims that such a jump function can be calculated for any stride of the period, but doesn't elaborate on how to calculate it (including in his paper on xorshift* generators). How can I calculate the correct jump data to verify that there's not a typo somewhere in it?
Xorshift1024* reference implementation; http://xorshift.di.unimi.it/xorshift1024star.c
/* Written in 2014-2015 by Sebastiano Vigna (vigna#acm.org)
To the extent possible under law, the author has dedicated all copyright
and related and neighboring rights to this software to the public domain
worldwide. This software is distributed without any warranty.
See <http://creativecommons.org/publicdomain/zero/1.0/>. */
#include <stdint.h>
#include <string.h>
/* This is a fast, top-quality generator. If 1024 bits of state are too
much, try a xorshift128+ generator.
The state must be seeded so that it is not everywhere zero. If you have
a 64-bit seed, we suggest to seed a splitmix64 generator and use its
output to fill s. */
uint64_t s[16];
int p;
uint64_t next(void) {
const uint64_t s0 = s[p];
uint64_t s1 = s[p = (p + 1) & 15];
s1 ^= s1 << 31; // a
s[p] = s1 ^ s0 ^ (s1 >> 11) ^ (s0 >> 30); // b,c
return s[p] * UINT64_C(1181783497276652981);
}
/* This is the jump function for the generator. It is equivalent
to 2^512 calls to next(); it can be used to generate 2^512
non-overlapping subsequences for parallel computations. */
void jump() {
static const uint64_t JUMP[] = { 0x84242f96eca9c41dULL,
0xa3c65b8776f96855ULL, 0x5b34a39f070b5837ULL, 0x4489affce4f31a1eULL,
0x2ffeeb0a48316f40ULL, 0xdc2d9891fe68c022ULL, 0x3659132bb12fea70ULL,
0xaac17d8efa43cab8ULL, 0xc4cb815590989b13ULL, 0x5ee975283d71c93bULL,
0x691548c86c1bd540ULL, 0x7910c41d10a1e6a5ULL, 0x0b5fc64563b3e2a8ULL,
0x047f7684e9fc949dULL, 0xb99181f2d8f685caULL, 0x284600e3f30e38c3ULL
};
uint64_t t[16] = { 0 };
for(int i = 0; i < sizeof JUMP / sizeof *JUMP; i++)
for(int b = 0; b < 64; b++) {
if (JUMP[i] & 1ULL << b)
for(int j = 0; j < 16; j++)
t[j] ^= s[(j + p) & 15];
next();
}
memcpy(s, t, sizeof t);
}
OK, I'm sorry but sometimes this happens (I'm the author).
Originally the function had two memcpy(). Then I realised then a circular copy was needed. But I replaced just the first memcpy(). Stupid, stupid, stupid. All files on the site have been fixed. The arXiv copy is undergoing update. See http://xorshift.di.unimi.it/xorshift1024star.c
Incidentally: I didn't "publish" anything wrong in the scientific sense, as the jump() function is not part of the ACM Trans. Math. Soft. paper—it just has been added few weeks ago on the site and on the arXiv/WWW version. The fast publication path of the web and arXiv means that, sometimes, one distributes unpolished papers. I can only thank the reporter for reporting this bug (OK, technically StackOverflow is not reporting bugs, but I got an email, too).
Unfortunately, the unit test I had did not consider the case p ≠ 0. My main concern was that the correctness of the computed polynomial. The function, as noted above, is correct when p = 0.
As for the computation: to each generator corresponds a characteristic polynomial P(x). The jump polynomial for k is just x^k mod P(x). I use fermat to compute such powers, and then I have some scripts generating the C code.
Of course I can't test 2^512, but since my generation code works perfectly from 2 to 2^30 (the range you can easily test), I'm confident it works at 2^512, too. It's just fermat computing x^{2^512} instead of x^{2^30}. But independent verifications are more than welcome.
I have code working only for powers of the form x^{2^t}. This is what I need to compute useful jump functions. Computing polynomials modulo P(x) is not difficult, so one could conceivably have a completely generic jump function for any value, but frankly I find this totally overkill.
If anybody is interested in getting other jump polynomials, I can provide the scripts. They will be part, as it happens for all other code, of the next xorshift distribution, but I need to complete the documentation before giving them out.
For the record, the characteristic polynomial of xorshift1024* is x^1024 + x^974 + x^973 + x^972 + x^971 + x^966 + x^965 + x^964 + x^963 + x^960 + x^958 + x^957 + x^956 + x^955 + x^950 + x^949 + x^948 + x^947 + x^942 + x^941 + x^940 + x^939 + x^934 + x^933 + x^932 + x^931 + x^926 + x^925 + x^923 + x^922 + x^920 + x^917 + x^916 + x^915 + x^908 + x^906 + x^904 + x^902 + x^890 + x^886 + x^873 + x^870 + x^857 + x^856 + x^846 + x^845 + x^844 + x^843 + x^841 + x^840 + x^837 + x^835 + x^830 + x^828 + x^825 + x^824 + x^820 + x^816 + x^814 + x^813 + x^811 + x^810 + x^803 + x^798 + x^797 + x^790 + x^788 + x^787 + x^786 + x^783 + x^774 + x^772 + x^771 + x^770 + x^769 + x^768 + x^767 + x^765 + x^760 + x^758 + x^753 + x^749 + x^747 + x^746 + x^743 + x^741 + x^740 + x^738 + x^737 + x^736 + x^735 + x^728 + x^726 + x^723 + x^722 + x^721 + x^720 + x^718 + x^716 + x^715 + x^714 + x^710 + x^709 + x^707 + x^694 + x^687 + x^686 + x^685 + x^684 + x^679 + x^678 + x^677 + x^674 + x^670 + x^669 + x^667 + x^666 + x^665 + x^663 + x^658 + x^655 + x^651 + x^639 + x^638 + x^635 + x^634 + x^632 + x^630 + x^623 + x^621 + x^618 + x^617 + x^616 + x^615 + x^614 + x^613 + x^609 + x^606 + x^604 + x^601 + x^600 + x^598 + x^597 + x^596 + x^594 + x^593 + x^592 + x^590 + x^589 + x^588 + x^584 + x^583 + x^582 + x^581 + x^579 + x^577 + x^575 + x^573 + x^572 + x^571 + x^569 + x^567 + x^565 + x^564 + x^563 + x^561 + x^559 + x^557 + x^556 + x^553 + x^552 + x^550 + x^544 + x^543 + x^542 + x^541 + x^537 + x^534 + x^532 + x^530 + x^528 + x^526 + x^523 + x^521 + x^520 + x^518 + x^516 + x^515 + x^512 + x^511 + x^510 + x^508 + x^507 + x^506 + x^505 + x^504 + x^502 + x^501 + x^499 + x^497 + x^494 + x^493 + x^492 + x^491 + x^490 + x^487 + x^485 + x^483 + x^482 + x^480 + x^479 + x^477 + x^476 + x^475 + x^473 + x^469 + x^468 + x^465 + x^463 + x^461 + x^460 + x^459 + x^458 + x^455 + x^453 + x^451 + x^448 + x^447 + x^446 + x^445 + x^443 + x^438 + x^437 + x^431 + x^430 + x^429 + x^428 + x^423 + x^417 + x^416 + x^415 + x^414 + x^412 + x^410 + x^409 + x^408 + x^400 + x^398 + x^396 + x^395 + x^391 + x^390 + x^386 + x^385 + x^381 + x^380 + x^378 + x^375 + x^373 + x^372 + x^369 + x^368 + x^365 + x^360 + x^358 + x^357 + x^354 + x^350 + x^348 + x^346 + x^345 + x^344 + x^343 + x^342 + x^340 + x^338 + x^337 + x^336 + x^335 + x^333 + x^332 + x^325 + x^323 + x^318 + x^315 + x^313 + x^309 + x^308 + x^305 + x^303 + x^302 + x^300 + x^294 + x^290 + x^281 + x^279 + x^276 + x^275 + x^273 + x^272 + x^267 + x^263 + x^262 + x^261 + x^260 + x^258 + x^257 + x^256 + x^249 + x^248 + x^243 + x^242 + x^240 + x^238 + x^236 + x^233 + x^232 + x^230 + x^228 + x^225 + x^216 + x^214 + x^212 + x^210 + x^208 + x^206 + x^205 + x^200 + x^197 + x^196 + x^184 + x^180 + x^176 + x^175 + x^174 + x^173 + x^168 + x^167 + x^166 + x^157 + x^155 + x^153 + x^152 + x^151 + x^150 + x^144 + x^143 + x^136 + x^135 + x^125 + x^121 + x^111 + x^109 + x^107 + x^105 + x^92 + x^90 + x^79 + x^78 + x^77 + x^76 + x^60 + 1
tldr: I'm pretty sure there's a bug in the original code:
The memcpy in jump() must consider the p rotation too.
The author didn't test nearly as much as appropriate before publishing a paper...
Long version:
One next() call changes only one of the 16 s array elements, the one with index p. p starts at 0, gets increased each next() call, and after 15 it becomes 0 again. Let's call s[p] the "current" array element. Another (slower) possibility for implementing next() would be that the current element is always the first one, there is no p, and instead of incrementing p the whole s array is rotated (ie. the first element moves to the last position and the previous second element becomes the first).
Independent of the current p value, 16 calls to next() should result in the same p value as before, ie. the whole cycle is done and the current element is the same position as before the 16 calls. jump() should do 2^512 next(), 2^512 is a multiple of 16, so with one jump, the p value before and after it should be the same.
You probably noticed already that your two different results are only rotated one time, ie. one solution is "9726214034378009495 somethingelse" and one is "somethingelse 9726214034378009495"
...because you did one next() before/after the jump() and jump() can't handle p other than 0.
If you'd test it with 16 next() (or 32 or 0 or ...) before/after jump() instead of one, the two results are equal. The reason is, within jump, while for the s array the current element / p is handled as it is in next(), the t array is semantically rotated so that the current element is always the first one (t[j] ^= s[(j + p) & 15];). Then, right before the function terminates, memcpy(s, t, sizeof t); copies the new values from t back to s without considering the rotation at all. Just replace the memcpy with a proper loop including the p offset, then it should be fine.
(Well, but that doesn't mean jump() is really the same as 2^512 next(). But at least it could be.)
As Vigna himself said, that was actually a bug.
While working on a Java implementation, I found, if not mistaken, a small improvement on the correct implementation:
If you update t array also circularly from p to p-1, then you can just memcpy it back to the state and it will work correctly.
Moreover, the loop updating t gets tighter, as you do not need to add p + j every time. For instance:
int j = p;
do {
t[j] ^= s[j];
++j;
j &= 15;
} while (j != p);
Ok, as bcrist correctly noted, the previous code is wrong, as p changes for each bit in JUMP array. The best alternative I come up with is the following:
void jump() {
static const uint64_t JUMP[] = { 0x84242f96eca9c41dULL,
0xa3c65b8776f96855ULL, 0x5b34a39f070b5837ULL, 0x4489affce4f31a1eULL,
0x2ffeeb0a48316f40ULL, 0xdc2d9891fe68c022ULL, 0x3659132bb12fea70ULL,
0xaac17d8efa43cab8ULL, 0xc4cb815590989b13ULL, 0x5ee975283d71c93bULL,
0x691548c86c1bd540ULL, 0x7910c41d10a1e6a5ULL, 0x0b5fc64563b3e2a8ULL,
0x047f7684e9fc949dULL, 0xb99181f2d8f685caULL, 0x284600e3f30e38c3ULL
};
uint64_t t[16] = { 0 };
const int base = p;
int j = base;
for(int i = 0; i < sizeof JUMP / sizeof *JUMP; i++)
for(int b = 0; b < 64; b++) {
if (JUMP[i] & 1ULL << b) {
int k = p;
do {
t[j++] ^= s[k++];
j &= 15;
k &= 15;
} while (j != base);
}
next();
}
memcpy(s, t, sizeof t);
}
As p will have its original value in the end, this should work.
Not very sure whether it is actually an improvement in performance, as I am trading one addition for an increment and a bitwise AND.
I think it will not be slower, even if increment is as expensive as addition, due to the lack of data dependency between j and k updates. Hopefully, it may be slightly faster.
Opinions / corrections are more than welcome.
Hi I'm making a 3D graphics engine for an assignment that is due later tonight, it's going smoothly at the moment except I'm loading a cube model from an .obj file, the positions start at 0.
My transformation matrix works for numbers that don't = 0. I mean if X = 0 and I try to translate it by 10 on the X Axis, it returns 0.
Matrix * Vector:
Vec4 Mat4::operator*(const Vec4& v) const
{
Vec4 tmp(0, 0, 0, 0, 255, 255, 255, 255);
tmp.x = (this->data[0] * v.x) + (this->data[4] * v.y) + (this->data[8] * v.z) + (this->data[12] * v.w);
tmp.y = (this->data[1] * v.x) + (this->data[5] * v.y) + (this->data[9] * v.z) + (this->data[13] * v.w);
tmp.z = (this->data[2] * v.x) + (this->data[6] * v.y) + (this->data[10] * v.z) + (this->data[14] * v.w);
tmp.w = (this->data[3] * v.x) + (this->data[7] * v.y) + (this->data[11] * v.z) + (this->data[15] * v.w);
return tmp;
}
Translate Matrix:
Mat4 Mat4::translate(float x, float y, float z)
{
Mat4 tmp;
tmp.data[12] = x;
tmp.data[13] = y;
tmp.data[14] = z;
return tmp;
}
A Mat4 class by default is an identity matrix.
It is too late now, but... it might be helpful to know the following:
A vector strictly equal to 0.0 (e.g. <0,0,0,0>) cannot be translated using matrix multiplication and technically should not be considered a position in this context. In fact, such a vector is not even representative of a direction because it has 0 length. It is simply zero; there are not a whole lot of uses for a vector that cannot be rotated or translated.
You can rotate vectors with 0.0 for the W coordinate, but the value 0.0 for W prevents translation.
Generally you want a W coordinate of 1.0 for spatial (e.g. position) vectors and 0.0 for directional (e.g. normal).
If you want to understand this better, you need to consider how your 4x4 matrix is setup. The first 3 rows or columns (depending on which convention you use) store rotation, and the 4th stores translation.
Consider how translation is applied when you multiply your matrix and vector:
x = ... + (this->data[12] * v.w);
y = ... + (this->data[13] * v.w);
z = ... + (this->data[14] * v.w);
w = ... + (this->data[15] * v.w);
If v.w is 0.0, then translation evaluates to 0.0 for all coordinates.
I have a project where I read in an array that has 1 or more dimensions, and for this project I need to be able to determine a given element's neighbours quickly. I do not know the dimensionality ahead of time, and I likewise do not know the size of the dimensions ahead of time. What would be the best C++ data structure to store this data in? A colleague recommended a vector of vectors of vectors of . . ., but that seems incredibly unwieldy.
If you know the address of which element you need the neighbors for, could you just do pointer arithmetic to find out the neighbors. For example, if p is the location of the element, then p-- is the left neighbor and p++ is the right neighbor.
Think your multidimensional array as a 1D array. Let the dimension of the array is d1 * d2 * ....* dn
Then allocate memory for a 1D array, say A of size d1 * d2 * ....* dn. For example,
int *A = new int[d1 * d2 * ....* dn];
If you need to store data in the [i1][i2]...[in] th index, then store in the following index:
A[i1 * (d2*d3*d4.. *dn) + i2 * (d3*d4*....dn) + ..... + in]
Neighboring elements will be:
A[(i1 + 1) * (d2*d3*d4.. *dn) + i2 * (d3*d4*....dn) + ..... + in]
A[(i1 - 1) * (d2*d3*d4.. *dn) + i2 * (d3*d4*....dn) + ..... + in]
A[i1 * (d2*d3*d4.. *dn) + (i2 + 1) * (d3*d4*....dn) + ..... + in]
A[i1 * (d2*d3*d4.. *dn) + (i2 - 1) * (d3*d4*....dn) + ..... + in]
.............................
A[i1 * (d2*d3*d4.. *dn) + i2 * (d3*d4*....dn) + ..... + (in + 1)]
A[i1 * (d2*d3*d4.. *dn) + i2 * (d3*d4*....dn) + ..... + (in - 1)]
- (void)draw {
lines[0] = ccp(self.l.x + (segmentIndex[0] * segmentSpacing), self.l.y);
lines[1] = ccp(self.l.x + (segmentIndex[1] * segmentSpacing), self.l.y + segmentHeight * 1);
lines[2] = ccp(self.l.x + (segmentIndex[2] * segmentSpacing), self.l.y + segmentHeight * 2);
lines[3] = ccp(self.l.x + (segmentIndex[3] * segmentSpacing), self.l.y + segmentHeight * 3);
lines[4] = ccp(lines[3].x + segmentWidth, lines[3].y);
lines[5] = ccp(lines[2].x + segmentWidth, lines[2].y);
lines[6] = ccp(lines[1].x + segmentWidth, lines[1].y);
lines[7] = ccp(lines[0].x + segmentWidth, lines[0].y);
ccDrawPoly(lines, 8, YES);
//ccDrawSolidPoly(lines, 8, [self color]);
}
The ccDrawPoly line draws the zig-zag style shape I intend to draw.
There are 4 Y segments and each one has a point determined by segmentIndex & segmentSpacing, then +segmentWidth on the x for a total of 8 points.
I go down the left-most points first, then walk my way back up.
When I call ccDrawSolidPoly, it fills in the shape in a different way. It fills in a polygon using all the outermost points instead of drawing lines in the order I set in points.
How do I do ccDrawPoly and fill?
Write my own function, apparently.
Forum results showed people editing the cocos2d ccDrawSolidPoly function.
I have a line in a vertex shader
gl_Position = gl_ModelViewProjectionMatrix * vertex;
I need to do the same computation without a shader, like:
float vertex[4];
float modelviewProjection[16];
glGetFloatv(GL_MODELVIEW_MATRIX, modelviewProjection);
glMatrixMode(GL_PROJECTION_MATRIX);
glMultMatrixf(modelviewProjection);
for ( counter = 0; counter < numPoints; counter++ )
{
vertex[0] = *vertexPointer + randomAdvance(timeAlive) + sin(ParticleTime);
vertex[1] = *( vertexPointer + 1 ) + randomAdvance(timeAlive) + timeAlive * 0.6f;
vertex[2] = *( vertexPointer + 2 );
glPushMatrix();
glMultMatrixf(vertex);
*vertexPointer = vertex[0];
*( vertexPointer + 1 ) = vertex[1];
*( vertexPointer + 2 ) = vertex[2];
vertexPointer += 3;
glPopMatrix();
}
If you have no suitable vector/matrix library, look into GLM (it can do that kind of thing without any fuss).
If you want to do it manually, the components of the transformed vector are the dot products of the respective rows in the matrix and the untransformed vector. That is because a vector can be seen as a matrix with one column (then just apply the rules of matrix multiplication).
Thus, assuming OpenGL memory layout, that would be:
x = x*m[0] + y*m[4] + z*m[8] + w*m[12], y = x*m[1] + y*m[5] + z*m[9] + w*m[13], etc.