I have the following code (the xorshift128+ code from Wikipedia modified to use vector types):
#include <immintrin.h>
#include <climits>
__v8si rand_si() {
static auto s0 = __v4du{4, 8, 15, 16},
s1 = __v4du{23, 34, 42, 69};
auto x = s0, y = s1;
s0 = y;
x ^= x << 23;
s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
return (__v8si)(s1 + y);
}
#include <iostream>
#include <iomanip>
void foo() {
//Shuffle a bit. The result is much worse without this.
rand_si(); rand_si(); rand_si(); rand_si();
auto val = rand_si();
for (auto it = reinterpret_cast<int*>(&val);
it != reinterpret_cast<int*>(&val + 1);
++it)
std::cout << std::hex << std::setfill('0') << std::setw(8) << *it << ' ';
std::cout << '\n';
}
which outputs
09e2a657 000b8020 1504cc3b 00110040 1360ff2b 00150078 2a9998b7 00228080
Every other number is very small, and none have the leading bit set. On the other hand, using xorshift* instead:
__v8si rand_si() {
static auto x = __v4du{4, 8, 15, 16};
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
return x * (__v4du)_mm256_set1_epi64x(0x2545F4914F6CDD1D);
}
I get the much better output
0889632e a938b990 1e8b2f79 832e26bd 11280868 2a22d676 275ca4b8 10954ef9
But according to Wikipedia, xorshift+ is a good PRNG, and produces better pseudo-randomness than xorshift*. So, do I have a bug in my RNG code, or am I using it wrong?
You should not judge a random generator by looking at 8 numbers it generated. Furthermore, generators usually need good seeding. Your seeding can be considered bad: almost all bits of your seeds are zero, and calling rand_si() just a few times is not enough for the bits to "spread".
So I recommend proper seeding (a simple solution is to call rand_si() many more times).
xorshift* looks better behaved because of the final multiplication, so inadequate seeding does not show up as easily spotted bad behavior.
A tip: compare the numbers your code generates with the original implementation. This way you can be sure that your implementation is correct.
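One way to follow that tip is to keep a scalar version of the same update next to the vector code and compare one lane against it. The sketch below is only a stand-in under stated assumptions, not the Wikipedia reference code: it copies the shifts (23, 17, 26) from the question's rand_si(), and the names Xorshift128Plus, next and count_top_bit_set are made up for illustration.

```cpp
#include <cstdint>

// Scalar counterpart of the question's vector code: one 64-bit lane,
// same shifts (23, 17, 26). The struct and member names are illustrative.
struct Xorshift128Plus {
    std::uint64_t s0;
    std::uint64_t s1;

    std::uint64_t next() {
        std::uint64_t x = s0;
        const std::uint64_t y = s1;
        s0 = y;
        x ^= x << 23;
        s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
        return s1 + y;
    }
};

// Counts how many of the first n outputs have the top bit set.
// With well-mixed state roughly half of them should.
inline int count_top_bit_set(Xorshift128Plus gen, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (gen.next() >> 63) ++count;
    }
    return count;
}
```

Seeding it with Xorshift128Plus{4, 23} reproduces lane 0 of the question's state, so its outputs can be compared with the first 64-bit lane of rand_si(). Calling count_top_bit_set with different seeds and warm-up lengths gives a rough picture of how quickly sparse seeds spread out.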
geza's answer was exactly right: the seeding was the culprit. Seeding it from a standard 64-bit PRNG worked much better:
void seed(uint64_t s) {
std::mt19937_64 e(s);
s0 = __v4du{e(), e(), e(), e()};
s1 = __v4du{e(), e(), e(), e()};
}
Both answers above are mistaken. The xorshift+ generator works fine even when the seed is 1, 2, 3, ... or other simple values. The generator fails only on an all-zero seed. Check how your 64-bit variables are represented and whether the binary operators behave correctly on them.
I am trying to write a simple gradient descent algorithm in C++ (for 10,000 iterations). Here is my program:
#include<iostream>
#include<cmath>
using namespace std;
int main(){
double learnrate=10;
double x=10.0; //initial start value
for(int h=1; h<=10000; h++){
x=x-learnrate*(2*x + 100*cos(100*x));
}
cout<<"The minimum is at y = "<<x*x + sin(100*x)<<" and at x = "<<x;
return 0;
}
The output ends up being: y=nan and x=nan. I tried looking at the values of x and y by putting them into a file, and after a certain amount of iterations, I am getting all nans (for x and y). edit: I picked the learning rate (or step size) to be 10 as an experiment, I will use much smaller values afterwards.
There must be something wrong with your formula. Already the first 10 values of x are increasing like hell:
-752.379
15290.7
-290852
5.52555e+06
-1.04984e+08
1.9947e+09
-3.78994e+10
7.20088e+11
-1.36817e+13
2.59952e+14
No matter what starting value you choose the absolute value of the next x will be bigger.
|next_x| = | x - 20 * x - 1000 * cos(100*x) |
For example consider what happens when you choose a very small starting value (|x|->0), then
|next_x| = | 0 - 20 * 0 - 1000 * cos ( 0 ) | = 1000
Because at h=240 the variable "x" exceeds the limits of the double type (1.79769e+308). The iteration diverges roughly geometrically: each step multiplies |x| by about 19. You need to reduce your learn rate.
A couple more things:
1- Do not use "using namespace std;"; it is bad practice.
2- You can use the std::isnan() function to identify this situation.
Here is an example:
#include <cmath>
#include <iostream>
#include <limits>
int main()
{
double learnrate = 10.0;
double x = 10.0; //initial start value
std::cout<<"double type maximum=" << std::numeric_limits<double>::max()<<std::endl;
bool failed = false;
for (int h = 1; h <= 10000; h++)
{
x = x - learnrate*(2.0*x + 100.0 * std::cos(100.0 * x));
if (std::isnan(x))
{
failed = true;
std::cout << " Nan detected at h=" << h << std::endl;
break;
}
}
if(!failed)
std::cout << "The minimum is at y = " << x*x + std::sin(100.0*x) << " and at x = " << x;
return 0;
}
Print x before the call to the cosine function and you will see that the last number printed before NaN (at h = 240) is:
-1.7761e+307
This means the magnitude is about to exceed the largest finite double. The next update overflows to infinity, and the cosine of an infinite argument is NaN (Not a Number).
It overflows the double type.
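The chain from overflow to NaN can be reproduced in isolation. This is a minimal sketch assuming IEEE 754 doubles and default floating-point compiler settings; overflowed and step are hypothetical helper names, and step simply mirrors one update of the question's loop.

```cpp
#include <cmath>
#include <limits>

// Pushes a finite double past its maximum.
inline double overflowed() {
    const double big = std::numeric_limits<double>::max();
    return big * 19.0;   // too large for double: the result is +infinity
}

// One update of the question's loop for a given x and learn rate.
inline double step(double x, double learnrate) {
    return x - learnrate * (2.0 * x + 100.0 * std::cos(100.0 * x));
}
```

Under those assumptions std::isinf(overflowed()) is true, and feeding that value into step gives NaN because std::cos of an infinite argument is NaN.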
If you use long double, you will succeed in 1000 iterations, but you will still overflow the type with 10000 iterations.
So the problem is that the parameter learnrate is just too big. You should take smaller steps, while using a data type with larger range, as I suggested above.
The "learn rate" is far too high. Change it to 1e-4, for example, and the program works, for an initial value of 10 at least. When the learnrate is 10, the iterations jump too far past the solution.
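To see the effect of the step size in isolation, here is a minimal sketch that wraps the question's update in a function. The name descend and its parameters are made up for illustration; the objective and its derivative are the ones from the question.

```cpp
#include <cmath>

// Runs `iterations` gradient-descent steps on f(x) = x*x + sin(100*x),
// whose derivative is 2*x + 100*cos(100*x), and returns the final x.
inline double descend(double x, double learnrate, int iterations) {
    for (int h = 0; h < iterations; ++h) {
        x -= learnrate * (2.0 * x + 100.0 * std::cos(100.0 * x));
    }
    return x;
}
```

With descend(10.0, 10.0, 10000) the result is not finite, while descend(10.0, 1e-4, 10000) stays finite. Because sin(100*x) has many local minima, the small-rate run may settle in one near the starting point rather than at the global minimum.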
At its best, gradient descent is not a good algorithm. For serious applications you want to use something better. Much better. Search for Brent optimizer and BFGS.
I calculated the value with C++ AMP. Environment: VS2015, Win8.
When I ran the parallel_for_each function, the value was NaN. The cause was the concurrency::fast_math::tanh function.
concurrency::fast_math::tanh returns NaN when the argument is greater than 1000 and it runs inside parallel_for_each:
float arr[2];
concurrency::array_view<float> arr_view(2, arr);
concurrency::extent<1> ex;
ex[0] = 1;
parallel_for_each(ex, [=](Concurrency::index<1> idx) restrict(amp){
float t = 10000000;
arr_view[0] = concurrency::fast_math::fabs(t);
arr_view[1] = concurrency::fast_math::tanh(t);
});
arr_view.synchronize();
std::cout << arr[0] << "," << arr[1] << std::endl;
output
1e+07,nan
Case 2, without parallel_for_each:
float arr[2];
concurrency::array_view<float> arr_view(2, arr);
concurrency::extent<1> ex;
ex[0] = 1;
float t = 10000000;
arr_view[0] = concurrency::fast_math::fabs(t);
arr_view[1] = concurrency::fast_math::tanh(t);
arr_view.synchronize();
std::cout << arr[0] << "," << arr[1] << std::endl;
output:
1e+07,1
This is the result I expected.
Changing tanh to tanhf gave the same result.
Why does the tanh function return NaN?
Why does it return NaN only when running inside parallel_for_each?
Please tell me the reason and how to solve the problem.
The functions defined in fast_math prioritize speed over precision. The implementation and precision are hardware dependent. When you don't use parallel_for_each syntax the code will be run on the CPU, which only implements one "precise" tanh function, and hence gives the correct answer.
To fix this you can call the function under precise_math,
concurrency::precise_math::tanh(t);
If this is too slow and the precision for fast_math::tanh is otherwise sufficient, you could try something like
double myTanh(double t){
return (concurrency::fast_math::fabs(t)>100) ? concurrency::precise_math::copysign(1,t) : concurrency::fast_math::tanh(t);
}
It might or might not run faster than the precise version, depending on the hardware. So you need to run some tests.
Most functions in concurrency::fast_math are not guaranteed to return a correct value. Some of them (like tanh) can even return NaN. On my HD 6870, fast tanh returns NaN for any argument greater than 90.
Here are some tricks to solve this problem.
You can "bound" the Tanh argument to 10:
float Tanh(float val) restrict(amp)
{
if (val > 10)
return 1;
else if (val < -10)
return -1;
return Concurrency::fast_math::tanh(val);
}
This will not cause any precision loss, because float has only about 7 digits of precision while the difference between tanh(10) and 1 is about 4*10^-9.
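The size of that gap can be checked on the CPU. The sketch below is a plain C++ stand-in, not C++ AMP code: std::tanh replaces concurrency::fast_math::tanh and restrict(amp) is dropped so it compiles anywhere; clamped_tanh and clamp_gap are hypothetical names.

```cpp
#include <cmath>

// CPU stand-in for the clamped wrapper: std::tanh replaces
// concurrency::fast_math::tanh so the logic can be checked anywhere.
inline float clamped_tanh(float val) {
    if (val > 10.0f) return 1.0f;
    if (val < -10.0f) return -1.0f;
    return std::tanh(val);
}

// Size of the jump introduced at the clamp boundary, computed in double.
inline double clamp_gap() {
    return 1.0 - std::tanh(10.0);
}
```

clamp_gap() comes out at a few times 10^-9, which is below half the spacing between adjacent floats just under 1 (about 3*10^-8), so the clamp should not be visible in single precision.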
Alternatively, you can implement your own Tanh function which won't have such limits:
float Tanh(float val) restrict(amp)
{
float ax = fabs(val);
float x2 = val * val;
float z = val * (1.0f + ax + (1.05622909486427f + 0.215166815390934f * x2 * ax) * x2);
return (z / (1.02718982441289f + fabs(z)));
}
I found this tanh approximation somewhere a long time ago. It is pretty fast and fairly precise.
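How precise the approximation is can be measured against std::tanh on the CPU. This is a rough sketch under stated assumptions: approx_tanh is a plain C++ copy of the formula without restrict(amp), and max_abs_error is a hypothetical helper that samples the interval every 0.01.

```cpp
#include <algorithm>
#include <cmath>

// Plain C++ copy of the rational approximation (no restrict(amp)).
inline float approx_tanh(float val) {
    const float ax = std::fabs(val);
    const float x2 = val * val;
    const float z = val * (1.0f + ax + (1.05622909486427f + 0.215166815390934f * x2 * ax) * x2);
    return z / (1.02718982441289f + std::fabs(z));
}

// Largest absolute difference from std::tanh on [-limit, limit],
// sampled every 0.01.
inline float max_abs_error(float limit) {
    float worst = 0.0f;
    for (float x = -limit; x <= limit; x += 0.01f) {
        worst = std::max(worst, std::fabs(approx_tanh(x) - std::tanh(x)));
    }
    return worst;
}
```

Calling max_abs_error(5.0f) gives a quick figure to weigh against the accuracy you need. Note that z grows like the sixth power of the input, so in single precision it may overflow for inputs in the millions; keeping a clamp in front of the approximation may still be worthwhile.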
However, if you need a very accurate tanh you can replace concurrency::fast_math with concurrency::precise_math. But this option has a major disadvantage: precise_math can't run on many GPUs (e.g. my 6870).
From here:
These functions, including the single-precision functions, require extended double-precision support on the accelerator. You can use the accelerator::supports_double_precision Data Member to determine if you can run these functions on a specific accelerator.
Also, precise_math can be more than 10 times slower than fast_math, especially on non-professional video cards.
If you run concurrency code outside a parallel_for_each block, it looks like you don't actually use the GPU. So the tanh you evaluate there is evaluated on the CPU, with no GPU-specific bugs. In fact, if you run this code
float t = 0.65;
arr_view[1] = concurrency::fast_math::tanh(t);
parallel_for_each(e, [=](index<1> idx) restrict(amp)
{
arr_view[0] = concurrency::fast_math::tanh(t);
});
std::cout << arr[0] << "," << arr[1] << std::endl;
arr_view.synchronize();
std::cout << arr[0] << "," << arr[1] << std::endl;
std::cout << arr[0] - arr[1] << std::endl;//may return non-zero value, depending on gpu
you can see the result of the first tanh before synchronization, while getting the result of the parallel_for_each block requires it. Also, for me it returns slightly different results, but this may be hardware dependent.
PROBLEM SOLVED: thanks everyone!
I am almost entirely new to C++ so I apologise in advance if the question seems trivial.
I am trying to convert a string of letters to a set of 2 digit numbers where a = 10, b = 11, ..., Y = 34, Z = 35 so that (for example) "abc def" goes to "101112131415". How would I go about doing this? Any help would really be appreciated. Also, I don't mind whether capitalization results in the same number or a different number. Thank you very much in advance. I probably won't need it for a few days but if anyone is feeling particularly nice how would I go about reversing this process? i.e. "101112131415" --> "abcdef" Thanks.
EDIT: This isn't homework, I'm entirely self taught. I have completed this project before in a different language and decided to try C++ to compare the differences and try to learn C++ in the process :)
EDIT: I have roughly what I want, I just need a little bit of help converting this so that it applies to strings, thanks guys.
#include <iostream>
#include <sstream>
#include <string>
int returnVal (char x)
{
return (int) x - 87;
}
int main()
{
char x = 'g';
std::cout << returnVal(x);
}
A portable method is to use a table lookup:
const unsigned int letter_to_value[] =
{10, 11, 12, /*...*/, 35};
// ...
letter = toupper(letter);
const unsigned int index = letter - 'A';
value = letter_to_value[index];
cout << value;
Each character has its ASCII value. Try converting your characters into ASCII and then manipulating the difference.
Example:
int x = 'a';
cout << x;
will print 97; and
int x = 'a';
cout << x - 87;
will print 10.
Hence, you could write a function like this:
int returnVal(char x)
{
return (int)x - 87;
}
to get the required output.
And your main program could look like:
int main()
{
string s = "abcdef";
for (unsigned int i = 0; i < s.length(); i++)
{
cout << returnVal(s[i]);
}
return 0;
}
This is a simple way to do it, if a bit messy.
map<char, int> vals; // maps a character to an integer
int g = 1; // if a needs to be 10 then set g = 10
string alphabet = "abcdefghijklmnopqrstuvwxyz";
for(char c : alphabet) { // kooky krazy for loop
vals[c] = g;
g++;
}
What Daniel said, try it out for yourself.
As a starting point though, casting:
int i = (int)string[0] + offset;
will get you the number for a character, and stringstream will be useful too.
How would I go about doing this?
By trying to do something first, and looking for help only if you feel you cannot advance.
That being said, the most obvious solution that comes to mind is based on the fact that characters (i.e. 'a', 'G') are really numbers. Suppose you have the following:
char c = 'a';
You can get the number associated with c by doing:
int n = static_cast<int>(c);
Then, add some offset to 'n':
n += 10;
...and cast it back to a char:
c = static_cast<char>(n);
Note: The above assumes that characters are consecutive, i.e. the number corresponding to 'a' is equal to the one corresponding to 'z' minus the amount of letters between the two. This usually holds, though.
This can work
int Number = 123; // number to be converted to a string
string Result; // string which will contain the result
ostringstream convert; // stream used for the conversion
convert << Number; // insert the textual representation of 'Number' in the characters in the stream
Result = convert.str(); // set 'Result' to the contents of the stream
You should add these headers:
#include <sstream>
#include <string>
Many answers will tell you that characters are encoded in ASCII and that you can convert a letter to an index by subtracting 'a'.
This is not proper C++. It is acceptable when your program requirements include a specification that ASCII is in use. However, the C++ standard alone does not require this. There are C++ implementations with other character sets.
In the absence of knowledge that ASCII is in use, you can use translation tables:
#include <limits.h>
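// Define a table to translate from characters to desired codes.
// Note: the ['a'] = ... designators are C99 syntax; C++ compilers accept them only as an extension, if at all.
static unsigned int Translate[UCHAR_MAX + 1] =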
{
['a'] = 10,
['b'] = 11,
…
};
Then you may translate characters to numbers by looking them up in the table:
unsigned char x = something;
int result = Translate[x];
Once you have the translation, you could print it as two digits using printf("%02d", result);.
Translating in the other direction requires reading two characters, converting them to a number (interpreting them as decimal), and performing a similar translation. You might have a different translation table set up for this reverse translation.
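Both directions can be sketched without relying on the character set by looking each letter up in an alphabet string. This is one possible approach under stated assumptions, not the method from any answer here: encode, decode and kAlphabet are illustrative names, non-letters are skipped, and upper and lower case map to the same number.

```cpp
#include <cctype>
#include <string>

const std::string kAlphabet = "abcdefghijklmnopqrstuvwxyz";

// "abc def" -> "101112131415": letters map to 10..35, everything else is skipped.
inline std::string encode(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        const std::size_t pos = kAlphabet.find(static_cast<char>(std::tolower(c)));
        if (pos == std::string::npos) continue;   // not a letter
        out += std::to_string(pos + 10);          // always two digits (10..35)
    }
    return out;
}

// "101112131415" -> "abcdef": reads two digits at a time.
inline std::string decode(const std::string& digits) {
    std::string out;
    for (std::size_t i = 0; i + 1 < digits.size(); i += 2) {
        const int value = (digits[i] - '0') * 10 + (digits[i + 1] - '0');
        if (value >= 10 && value <= 35) out += kAlphabet[value - 10];
    }
    return out;
}
```

The digit arithmetic is safe because the language guarantees that '0' through '9' are consecutive. The position of a letter in kAlphabet takes the place of the ASCII offset.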
Just do this!
(s[i] - 'A' + 1)
Basically, we convert a char to a number by subtracting 'A' and then adding 1, so 'A' maps to 1. For the numbering in the question (a = 10), add 10 instead.
To check my C++ code, I would like to be able to let Boost::Random and Matlab produce the same random numbers.
So for Boost I use the code:
boost::mt19937 var(static_cast<unsigned> (std::time(0)));
boost::uniform_int<> dist(1, 6);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(var, dist);
die.engine().seed(0);
for(int i = 0; i < 10; ++i) {
std::cout << die() << " ";
}
std::cout << std::endl;
Which produces (every run of the program):
4 4 5 6 4 6 4 6 3 4
And for matlab I use:
RandStream.setDefaultStream(RandStream('mt19937ar','seed',0));
randi(6,1,10)
Which produces (every run of the program):
5 6 1 6 4 1 2 4 6 6
Which is bizarre, since both use the same algorithm, and same seed.
What do I miss?
It seems that Python (using numpy) and Matlab are comparable in the uniform random numbers they produce:
Matlab
RandStream.setDefaultStream(RandStream('mt19937ar','seed',203));rand(1,10)
0.8479 0.1889 0.4506 0.6253 0.9697 0.2078 0.5944 0.9115 0.2457 0.7743
Python:
random.seed(203);random.random(10)
array([ 0.84790006, 0.18893843, 0.45060688, 0.62534723, 0.96974765,
0.20780668, 0.59444858, 0.91145688, 0.24568615, 0.77430378])
C++Boost
0.8479 0.667228 0.188938 0.715892 0.450607 0.0790326 0.625347 0.972369 0.969748 0.858771
Every other value in the Boost output is identical to the Python and Matlab values...
I have to agree with the other answers, stating that these generators are not "absolute". They may produce different results according to the implementation. I think the simplest solution would be to implement your own generator. It might look daunting (Mersenne twister sure is by the way) but take a look at Xorshift, an extremely simple though powerful one. I copy the C implementation given in the Wikipedia link :
uint32_t xor128(void) {
static uint32_t x = 123456789;
static uint32_t y = 362436069;
static uint32_t z = 521288629;
static uint32_t w = 88675123;
uint32_t t;
t = x ^ (x << 11);
x = y; y = z; z = w;
return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
}
To have the same seed, just put any values you want into x, y, z, w (except (0,0,0,0), I believe). You just need to be sure that Matlab and C++ both use 32 bits for these unsigned ints.
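For comparing against another language it can help to hold the four state words in a struct rather than in function-local statics, so the seed is explicit. A minimal sketch of the same recurrence follows; Xor128State and xor128_next are illustrative names, and the default values are the ones from the listing above.

```cpp
#include <cstdint>

// Same recurrence as xor128, with the four state words held in a
// struct instead of function-local statics.
struct Xor128State {
    std::uint32_t x = 123456789;
    std::uint32_t y = 362436069;
    std::uint32_t z = 521288629;
    std::uint32_t w = 88675123;
};

inline std::uint32_t xor128_next(Xor128State& s) {
    const std::uint32_t t = s.x ^ (s.x << 11);
    s.x = s.y;
    s.y = s.z;
    s.z = s.w;
    s.w = s.w ^ (s.w >> 19) ^ (t ^ (t >> 8));
    return s.w;
}
```

With the default state, the first call to xor128_next returns 3701687786, which gives a fixed value to check a port against.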
Using the interface like
randi(6,1,10)
will apply some kind of transformation on the raw result of the random generator. This transformation is not trivial in general and Matlab will almost certainly do a different selection step than Boost.
Try comparing raw data streams from the RNGs - chances are they are the same
In case this helps anyone interested in the question:
In order to get the same behavior for the Twister algorithm:
Download the file
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c
Try the following:
#include <stdint.h>
// mt19937ar.c content..
int main(void)
{
int i;
uint32_t seed = 100;
init_genrand(seed);
for (i = 0; i < 100; ++i)
printf("%.20f\n",genrand_res53());
return 0;
}
Make sure the same values are generated within matlab:
RandStream.setGlobalStream( RandStream.create('mt19937ar','seed',100) );
rand(100,1)
randi() seems to be simply ceil( rand()*maxval )
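The same 53-bit construction can be sketched with the standard library engine instead of the downloaded C file. This assumes that std::mt19937 seeded with an integer produces the same raw 32-bit stream as init_genrand with that seed; res53 is a hypothetical helper that follows the genrand_res53() recipe from mt19937ar.c, consuming two draws per double.

```cpp
#include <cstdint>
#include <random>

// Builds one double in [0, 1) from two 32-bit draws, following the
// genrand_res53() recipe in mt19937ar.c (27 high bits + 26 high bits).
inline double res53(std::mt19937& gen) {
    const std::uint32_t a = static_cast<std::uint32_t>(gen()) >> 5;
    const std::uint32_t b = static_cast<std::uint32_t>(gen()) >> 6;
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0);
}
```

If that assumption holds, seeding std::mt19937 with 203 and calling res53 repeatedly should reproduce the Matlab and Python values listed earlier in this thread. Using two draws per value would also explain why a generator that spends one 32-bit draw per double matches only every other number.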
Thanks to Fezvez's answer I've written xor128 in matlab:
function [ w, state ] = xor128( state )
%XOR128 implementation of Xorshift
% https://en.wikipedia.org/wiki/Xorshift
% A starting state might be [123456789, 362436069, 521288629, 88675123]
x = state(1);
y = state(2);
z = state(3);
w = state(4);
% t1 = (x << 11)
t1 = bitand(bitshift(x,11),hex2dec('ffffffff'));
% t = x ^ (x << 11)
t = bitxor(x,t1);
x = y;
y = z;
z = w;
% t2 = (t ^ (t >> 8))
t2 = bitxor(t, bitshift(t,-8));
% t3 = w ^ (w >> 19)
t3 = bitxor(w, bitshift(w,-19));
% w = w ^ (w >> 19) ^ (t ^ (t >> 8))
w = bitxor(t3, t2);
state = [x y z w];
end
You need to pass state in to xor128 every time you use it. I've written a "tester" function which simply returns a vector with random numbers. I tested 1000 numbers output by this function against values output by cpp with gcc and it is perfect.
function [ v ] = txor( iterations )
%TXOR test xor128, returns vector v of length iterations with random number
% output from xor128
% output
v = zeros(iterations,1);
state = [123456789, 362436069, 521288629, 88675123];
i = 1;
while i <= iterations
disp(i);
[t,state] = xor128(state);
v(i) = t;
i = i + 1;
end
I would be very careful about assuming that two different implementations of pseudo-random generators (even though based on the same algorithm) produce the same result. One of the implementations could use some sort of tweak, hence producing different results. If you need two equal "random" distributions I suggest you either precalculate a sequence, store it and access it from both C++ and Matlab, or create your own generator. It should be fairly easy to implement MT19937 if you use the pseudocode on Wikipedia.
Take care to ensure that both your Matlab and C++ code run on the same architecture (that is, both run on either 32 or 64-bit): using a 64-bit integer in one implementation and a 32-bit integer in the other will lead to different results.
Is it possible to print a random number in C++ from a set of numbers with ONE SINGLE statement?
Let's say the set is {2, 5, 22, 55, 332}
I looked up rand() but I doubt it's possible to do in a single statement.
int numbers[] = { 2, 5, 22, 55, 332 };
int length = sizeof(numbers) / sizeof(int);
int randomNumber = numbers[rand() % length];
Pointlessly turning things into a single expression is practically what the ternary operator was invented for (I'm having none of litb's compound-statement trickery):
std::cout << ((rand()%5==0) ? 2 :
(rand()%4==0) ? 5 :
(rand()%3==0) ? 22 :
(rand()%2==0) ? 55 :
332
) << std::endl;
Please don't rat on me to my code reviewer.
Ah, here we go, a proper uniform distribution (assuming rand() is uniform on its range) in what you could maybe call a "single statement", at a stretch.
It's an iteration-statement, but then so is a for loop with a great big block containing multiple statements. The syntax doesn't distinguish. This actually contains two statements: the whole thing is a statement, and the whole thing excluding the for(...) part is a statement. So probably "a single statement" means a single expression-statement, which this isn't. But anyway:
// weasel #1: #define for brevity. If that's against the rules,
// it can be copy and pasted 7 times below.
#define CHUNK ((((unsigned int)RAND_MAX) + 1) / 5)
// weasel #2: for loop lets me define and use a variable in C++ (not C89)
for (unsigned int n = 5*CHUNK; n >= 5*CHUNK;)
// weasel #3: sequence point in the ternary operator
((n = rand()) < CHUNK) ? std::cout << 2 << "\n" :
(n < 2*CHUNK) ? std::cout << 5 << "\n" :
(n < 3*CHUNK) ? std::cout << 22 << "\n" :
(n < 4*CHUNK) ? std::cout << 55 << "\n" :
(n < 5*CHUNK) ? std::cout << 332 << "\n" :
(void)0;
// weasel #4: retry if we get one of the few biggest values
// that stop us distributing values evenly between 5 options.
If this is going to be the only code in the entire program, and you don't want it to return the same value every time, then you need to call srand(). Fortunately this can be fitted in. Change the first line to:
for (unsigned int n = (srand((time(0) % UINT_MAX)), 5*CHUNK); n >= 5*CHUNK;)
Now, let us never speak of this day again.
Say these numbers are in a set of size 5. All you have to do is take a random value and multiply it by 5 (to make the choices equiprobable). Assume the rand() method returns a random value in the range 0 to 1. Multiply it by 5 and cast it to an integer, and you get equiprobable values between 0 and 4. Use that as the index to fetch from.
I don't know the syntax in C++, but this is how it should look:
my_rand_val = my_set[(int)(rand()*arr_size)]
Here I assume rand() is a method that returns a value from 0 up to, but not including, 1; a result of exactly 1 would index past the end.
Yes, it is possible. Not very intuitive but you asked for it:
#include <time.h>
#include <stdlib.h>
#include <iostream>
int main()
{
srand(time(0));
int randomNumber = ((int[]) {2, 5, 22, 55, 332})[rand() % 5];
std::cout << randomNumber << std::endl;
return 0;
}
Your "single statement" criterion is very vague. Do you mean one machine instruction, or one stdlib call?
If you mean one machine instruction, the answer is no, without special hardware.
If you mean one function call, then of course it is possible. You could write a simple function to do what you want:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main()
{
int setSize = 5;
int set[] = {2, 5, 22, 55, 332 };
srand( time(0) );
int number = rand() % setSize;
printf("%d %d", number, set[number]);
return 0;
}