I've successfully used std::async in the past, but lately in checking the fidelity of some new code, I've run into an oddity that has me stumped. I'm sure there should be a simple explanation and a proper solution, but I can't find a discussion of it anywhere.
The following bit of minimal code illustrates the matter:
#include <functional>
#include <thread>
#include <future>
#include <iostream>
#include <sstream>
#include <vector>
#include <algorithm>
int main(int argc, char **argv) {
for (size_t delay = 0; delay < 2; delay++) {
std::vector<std::future<std::string>> futures;
for (size_t i = 0; i < 10; i++) {
auto fut = std::async(std::launch::async,
[&i] () -> std::string
{
std::stringstream ss;
ss << "work on number " << i << " " << std::this_thread::get_id();
return ss.str();
}
);
if (delay == 1) {
std::this_thread::sleep_for (std::chrono::milliseconds(10));
}
futures.push_back(std::move(fut));
}
// do not proceed until all threads are done
std::for_each(futures.begin(), futures.end(), [](std::future<std::string>& fut)
{
auto codeconf = fut.get();
std::cout << codeconf << std::endl;
}
);
std::cout << std::endl;
}
}
Without the delay (i.e. first time through the outer loop), some loop-elements (integers) get missed and don't get assigned to a thread/task, while other loop elements get assigned to more than one thread. The loop also runs beyond its limits:
work on number 4 139770383861504
work on number 4 139770375468800
work on number 4 139770367076096
work on number 6 139770358683392
work on number 5 139770350290688
work on number 6 139770341897984
work on number 7 139770333505280
work on number 8 139770325112576
work on number 10 139770248296192
work on number 10 139770239903488
Including a minor delay (10 ms) allows loop increments and threads to correspond as expected and intended -- i.e. a one-to-one correspondence between loop increment and task/thread (the order of completion doesn't matter, of course, even though they are in order here):
work on number 0 139770239903488
work on number 1 139770248296192
work on number 2 139770325112576
work on number 3 139770333505280
work on number 4 139770383861504
work on number 5 139770375468800
work on number 6 139770367076096
work on number 7 139770358683392
work on number 8 139770350290688
work on number 9 139770341897984
My understanding is that the async launch policy should just pick up the integer that corresponds to the loop iteration, feed it into the lambda function, and execute it on an independent task/thread; when it starts (which is essentially immediate) and when it ends doesn't really matter to the functioning and logic of the loop. But here, without a delay, "async" seems to quite literally describe the relationship between loop iterations and tasks.
Is the tiny delay workaround legitimate? What am I failing to understand?
Without the delay (i.e. first time through the outer loop), some loop-elements (integers) get missed and don't get assigned to a thread/task, while other loop elements get assigned to more than one thread
This is an immediate red flag for trying to access a loop counter from another thread that was spawned in that loop.
In this case, your tasks use a reference to i, which is being incremented (and eventually destroyed) in the main thread.
You should pass a copy of i to each task, so that the task assuredly uses whatever the value of i was on that iteration.
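A minimal sketch of that fix, keeping the lambda from the question but capturing i by value. The wrapper function run_tasks is a name made up for this illustration, not something from the question:

```cpp
#include <cstddef>
#include <future>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: launches one task per index and collects the results.
std::vector<std::string> run_tasks(std::size_t count) {
    std::vector<std::future<std::string>> futures;
    for (std::size_t i = 0; i < count; i++) {
        futures.push_back(std::async(std::launch::async,
            [i]() -> std::string {  // [i], not [&i]: the copy is taken on this iteration
                std::stringstream ss;
                ss << "work on number " << i << " " << std::this_thread::get_id();
                return ss.str();
            }));
    }
    // Do not proceed until all tasks are done.
    std::vector<std::string> results;
    for (auto& fut : futures) {
        results.push_back(fut.get());
    }
    return results;
}
```

Because each task owns its copy, it no longer matters how quickly the main thread moves on to the next iteration, so the sleep_for workaround is not needed.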
As @RichardCritten said in the comments, having one thread (the main one) writing to i while other threads are reading it leads to undefined behaviour. I wouldn't try to figure out why the output is what it is: without synchronisation (mutexes etc.), the compiler and the CPU can reorder memory loads and stores at will.
A couple of helpful talks on the subject:
Herb Sutter's "atomic<> weapons": http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
Hans Boehm's "Threads and Shared Variables in C++11": http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Threads-and-Shared-Variables-in-C-11
Related
I want to parallelize the execution of a randomized algorithm in the following way: I have a number of threads which execute the same randomized operations in a loop and return in case of success. I want to start multiple threads and return once at least one of them stops (returns a value). As a minimum example, consider the following code snippet:
#include <iostream>
#include <stdlib.h> /* srand, rand */
#include <future>
#include <vector>
int random_algorithm(){
while(true) {
int random_number = rand() % 10 + 1;
if (random_number > 5){
return random_number;
}
}
}
int main(){
std::vector<std::future<int>> thread_vec;
for(int i=0;i<5;++i){
std::future<int> t = std::async(std::launch::async, random_algorithm);
thread_vec.push_back(std::move(t));
}
// Instead of the following loop, I want to
// continue execution as soon as one of the threads returned.
for(auto& th: thread_vec){
th.wait();
std::cout << "thread returned " << th.get() << std::endl;
}
return 0;
}
Basically, instead of calling th.wait() on every thread, I just want to wait here until one of the threads in thread_vec has finished its work and then get that thread's return value. How would I achieve this?
Note: I saw this question, but this does not seem to reveal which of the threads finished its work.
Ok, let's start discussing your code:
rand() is not re-entrant safe. You must never use it in multiple threads concurrently. Also, it's typically a really bad random number generator.
You're using C++11 or later, so use the facilities in the <random> header instead.
Solving your problem: instead of waiting on a future, you should simply share a condition variable with all threads. The first thread to succeed notifies the condition variable, which wakes the main thread and ends the computation.
Result returning can be implemented through atomic variables, for example (std::atomic).
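A rough sketch of how that could look. Assumptions: the names FirstResult and run_until_first are invented for this example, and the result is stored in a small struct guarded by the mutex (with an atomic flag to stop the other workers) rather than purely in atomics, so that the winning thread's index can be reported too:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical result type: which worker finished first and what it produced.
struct FirstResult {
    int worker;
    int value;
};

FirstResult run_until_first(int num_threads) {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<bool> done{false};
    FirstResult result{-1, 0};

    auto worker = [&](int id) {
        // Each thread owns its engine, so there is no shared generator state.
        std::mt19937 engine(std::random_device{}());
        std::uniform_int_distribution<int> dist(1, 10);
        while (!done.load()) {
            int n = dist(engine);
            if (n > 5) {
                std::lock_guard<std::mutex> lock(m);
                if (!done.load()) {  // only the first finisher records its result
                    result = {id, n};
                    done.store(true);
                }
                cv.notify_one();
                return;
            }
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(worker, i);
    }
    {
        // The main thread sleeps here until some worker has reported success.
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return done.load(); });
    }
    for (auto& t : threads) {
        t.join();  // the remaining workers see done == true and stop
    }
    return result;
}
```

The main thread wakes as soon as the first worker succeeds. The joins afterwards are short because every other worker checks the flag on each iteration.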
I am experimenting with std::async to populate a vector. The idea behind it is to use multi-threading to save time. However, running some benchmark tests I find that my non-async method is faster!
#include <algorithm>
#include <vector>
#include <future>
std::vector<int> Generate(int i)
{
std::vector<int> v;
for (int j = i; j < i + 10; ++j)
{
v.push_back(j);
}
return v;
}
Async:
std::vector<std::future<std::vector<int>>> futures;
for (int i = 0; i < 200; i+=10)
{
futures.push_back(std::async(
[](int i) { return Generate(i); }, i));
}
std::vector<int> res;
for (auto &&f : futures)
{
auto vec = f.get();
res.insert(std::end(res), std::begin(vec), std::end(vec));
}
Non-async:
std::vector<int> res;
for (int i = 0; i < 200; i+=10)
{
auto vec = Generate(i);
res.insert(std::end(res), std::begin(vec), std::end(vec));
}
My benchmark test shows that the async method is 71 times slower than non-async. What am I doing wrong?
std::async has two modes of operation:
std::launch::async
std::launch::deferred
In this case, you've called std::async without specifying either one, which means it's allowed to choose either one. std::launch::deferred basically means do the work on the calling thread. So std::async returns a future, and with std::launch::deferred, the action you've requested won't be carried out until you call .get on that future. It can be kind of handy under a few circumstances, but it's probably not what you want here.
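To see the difference between the two policies, here is a small sketch (the helper name runs_on_calling_thread is made up for this illustration) that compares the thread id observed inside the task with the caller's:

```cpp
#include <future>
#include <thread>

// Hypothetical helper: true if a task launched with `policy` ran on the calling thread.
bool runs_on_calling_thread(std::launch policy) {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> fut =
        std::async(policy, [] { return std::this_thread::get_id(); });
    // With std::launch::deferred the lambda only runs here, inside get().
    return fut.get() == caller;
}
```

Passing std::launch::deferred should give true and std::launch::async should give false. With no policy argument, the implementation is free to pick either.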
Even if you specify std::launch::async, you need to realize that this starts up a new thread of execution to carry out the action you've requested. It then has to create a future, and use some sort of signalling from the thread to the future to let you know when the computation you've requested is done.
All of that adds a fair amount of overhead--anywhere from microseconds to milliseconds or so, depending on the OS, CPU, etc.
So, for asynchronous execution to make sense, the "stuff" you do asynchronously typically needs to take tens of milliseconds at the very least (and hundreds of milliseconds might be a more sensible lower threshold). I wouldn't get too wrapped up in the exact cutoff, but it needs to be something that takes a while.
So, filling an array asynchronously probably only makes sense if the array is quite a lot larger than you're dealing with here.
For filling memory, you'll quickly run into another problem though: most CPUs are enough faster than main memory that if all you're doing is writing to memory, there's a pretty good chance that a single thread will already saturate the path to memory, so even at best doing the job asynchronously will only gain a little, and may still pretty easily cause a slow-down.
The ideal case for asynchronous operation would be something like one thread that's heavily memory bound, but another that (for example) reads a little bit of data, and does a lot of computation on that small amount of data. In this case, the computation thread will mostly operate on its data in the cache, so it won't get in the way of the memory thread doing its thing.
There are multiple factors that are causing the Multithreaded code to perform (much) slower than the Singlethreaded code.
Your array sizes are too small
Multithreading often has negligible-to-no effect on datasets that are particularly small. In both versions of your code, you're generating 200 integers in total, and each Logical Thread (which, because std::async is often implemented in terms of thread pools, might not be the same as a Software Thread) is only generating 10 integers. The cost of spinning up a thread for every 10 integers far outweighs the benefit of generating those integers in parallel.
You might see a performance gain if each thread were instead responsible for, say, 10,000 integers each, but you'll probably instead have a different issue:
All your code is bottlenecked by an inherently serial process
Both versions of the code copy the generated integers into a host vector. It would be one thing if the act of generating those integers was itself a time consuming process, but in your case, it's likely just a matter of a small, fast bit of assembly generating each integer.
So the act of copying each integer into the final vector is probably not inherently faster than generating each integer, meaning a sizable chunk of the "work" being done is completely serial, defeating the whole purpose of multithreading your code.
Fixing the code
Compilers are very good at their jobs, so in trying to revise your code, I was only barely able to get multithreaded code that was faster than the serial code. Multiple executions had varying results, so my general assessment is that this kind of code is bad at being multithreaded.
But here's what I came up with:
#include <algorithm>
#include <vector>
#include <future>
#include <thread> // for std::thread::hardware_concurrency
#include <chrono>
#include <iostream>
#include <iomanip>
//#1: Constants
constexpr int BLOCK_SIZE = 500000;
constexpr int NUM_OF_BLOCKS = 20;
std::vector<int> Generate(int i) {
std::vector<int> v;
for (int j = i; j < i + BLOCK_SIZE; ++j) {
v.push_back(j);
}
return v;
}
void asynchronous_attempt() {
std::vector<std::future<void>> futures;
//#2: Preallocated Vector
std::vector<int> res(NUM_OF_BLOCKS * BLOCK_SIZE);
auto it = res.begin();
for (int i = 0; i < NUM_OF_BLOCKS * BLOCK_SIZE; i+=BLOCK_SIZE)
{
futures.push_back(std::async(
[it](int i) {
auto vec = Generate(i);
//#3 Copying done multithreaded
std::copy(vec.begin(), vec.end(), it + i);
}, i));
}
for (auto &&f : futures) {
f.get();
}
}
void serial_attempt() {
//#4 Changes here to show fair comparison
std::vector<int> res(NUM_OF_BLOCKS * BLOCK_SIZE);
auto it = res.begin();
for (int i = 0; i < NUM_OF_BLOCKS * BLOCK_SIZE; i+=BLOCK_SIZE) {
auto vec = Generate(i);
it = std::copy(vec.begin(), vec.end(), it);
}
}
int main() {
using clock = std::chrono::steady_clock;
std::cout << "Theoretical # of Threads: " << std::thread::hardware_concurrency() << std::endl;
auto begin = clock::now();
asynchronous_attempt();
auto end = clock::now();
std::cout << "Duration of Multithreaded Attempt: " << std::setw(10) << (end - begin).count() << "ns" << std::endl;
begin = clock::now();
serial_attempt();
end = clock::now();
std::cout << "Duration of Serial Attempt: " << std::setw(10) << (end - begin).count() << "ns" << std::endl;
}
This resulted in the following output:
Theoretical # of Threads: 2
Duration of Multithreaded Attempt: 361149213ns
Duration of Serial Attempt: 364785676ns
Given that this was run on an online compiler, I'm willing to bet the multithreaded code might win out on a dedicated machine, but I think this at least demonstrates that the two methods are now on par in performance.
Below are the changes I made, that are ID'd in the code:
#1: We've dramatically increased the number of integers being generated, to force the threads to do actual meaningful work, instead of getting bogged down on OS-level housekeeping.
#2: The vector has its size pre-allocated. No more frequent resizing.
#3: Now that the space has been preallocated, we can multithread the copying instead of doing it in serial later.
#4: We have to change the serial code so it also preallocates + copies so that it's a fair comparison.
Now, we've ensured that all the code is indeed running in parallel, and while it's not amounting to a substantial improvement over the serial code, it's at least no longer exhibiting the degenerate performance losses we were seeing before.
First of all, you are not forcing std::async to work asynchronously (you would need to specify the std::launch::async policy to do so). Second of all, it'd be kind of overkill to asynchronously create a std::vector of 10 ints. It's just not worth it. Remember: using more threads does not mean that you will see a performance benefit! Creating a thread (or even using a thread pool) introduces some overhead, which, in this case, seems to dwarf the benefits of running tasks asynchronously.
Thanks @NathanOliver ;>
#include <math.h>
#include <sstream>
#include <iostream>
#include <mutex>
#include <stdlib.h>
#include <chrono>
#include <thread>
bool isPrime(int number) {
int i;
for (i = 2; i < number; i++) {
if (number % i == 0) {
return false;
}
}
return true;
}
std::mutex myMutex;
int pCnt = 0;
int icounter = 0;
int limit = 0;
int getNext() {
std::lock_guard<std::mutex> guard(myMutex);
icounter++;
return icounter;
}
void primeCnt() {
std::lock_guard<std::mutex> guard(myMutex);
pCnt++;
}
void primes() {
while (getNext() <= limit)
if (isPrime(icounter))
primeCnt();
}
int main(int argc, char *argv[]) {
std::stringstream ss(argv[2]);
int tCount;
ss >> tCount;
std::stringstream ss1(argv[4]);
int lim;
ss1 >> lim;
limit = lim;
auto t1 = std::chrono::high_resolution_clock::now();
std::thread *arr;
arr = new std::thread[tCount];
for (int i = 0; i < tCount; i++)
arr[i] = std::thread(primes);
for (int i = 0; i < tCount; i++)
arr[i].join();
auto t2 = std::chrono::high_resolution_clock::now();
std::cout << "Primes: " << pCnt << std::endl;
std::cout << "Program took: " << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() <<
" milliseconds" << std::endl;
return 0;
}
Hello, I'm trying to find the number of primes within a user-specified range, e.g. 1-1000000, using a user-specified number of threads to speed up the process. However, it seems to take the same amount of time for any number of threads as it does for one thread. I'm not sure if it's supposed to be that way or if there's a mistake in my code. Thank you in advance!
You don't see a performance gain because the time spent in isPrime() is much smaller than the time the threads spend contending for the mutex.
One possible solution is to use atomic operations, as @The Badger suggested. The other way is to partition your task into smaller ones and distribute them over your thread pool.
For example, if you have n threads, then each thread should test numbers from i*(limit/n) to (i+1)*(limit/n), where i is thread number. This way you wouldn't need to do any synchronization at all and your program would (theoretically) scale linearly.
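A minimal sketch of that partitioning, under a few assumptions of mine: the function names are invented, the last thread also takes whatever remainder is left when limit is not divisible by the thread count, and is_prime treats numbers below 2 as non-prime (unlike the isPrime in the question, which reports 1 as prime):

```cpp
#include <thread>
#include <vector>

// Trial division; numbers below 2 are not prime.
bool is_prime(int number) {
    if (number < 2) return false;
    for (int i = 2; i <= number / i; i++) {
        if (number % i == 0) return false;
    }
    return true;
}

// Counts primes in [1, limit]; each thread scans its own contiguous block.
int count_primes_partitioned(int limit, int num_threads) {
    std::vector<int> partial(num_threads, 0);  // one result slot per thread
    std::vector<std::thread> threads;
    const int chunk = limit / num_threads;
    for (int t = 0; t < num_threads; t++) {
        const int first = t * chunk + 1;
        // The last thread also covers the remainder of the range.
        const int last = (t == num_threads - 1) ? limit : (t + 1) * chunk;
        threads.emplace_back([&partial, t, first, last] {
            int count = 0;  // thread-local tally, no locking needed
            for (int n = first; n <= last; n++) {
                if (is_prime(n)) count++;
            }
            partial[t] = count;  // each thread writes only its own slot, once
        });
    }
    for (auto& th : threads) th.join();
    int total = 0;
    for (int c : partial) total += c;
    return total;
}
```

Note that equal-sized blocks are not equal amounts of work with trial division, since larger numbers take longer to test, so the last thread will tend to finish after the first.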
Multithreaded algorithms work best when threads can do a lot of work on their own.
Imagine doing this in real life: you have a group of 20 humans that will do work for you, and you want them to test whether each number up to 1000 is prime. How will you do this?
Would you hand each person a single number at a time, and ask them to come back to you to tell you if it's prime and to receive another number?
Surely not; you would give each person a bunch of numbers to work on at once, and have them come back and tell you how many were prime and to receive another bunch of numbers.
Maybe even you'd divide up the entire set of numbers into 20 groups and tell each person to work on a group. (but then you run the risk of one person being slow and having everyone else sitting idle while you wait for that one person to finish... although there are so-called "work stealing" algorithms, but that's complicated)
The same thing applies here; you want each thread to do a lot of work on its own and keep its own tally, and only have to check back with the centralized information once in a while.
A better solution would be to use the Sieve of Atkin to find the primes (even the Sieve of Eratosthenes, which is easier to understand, is better); your basic algorithm is very poor to start with. For every number n in your interval it does up to n checks to determine whether n is prime, and it does this limit times. This means that you're doing up to about limit*limit/2 checks, which is what we call O(n^2) complexity. The Sieve of Atkin, OTOH, only has to do O(n) operations to find all primes. If n is large, it is hard to beat the algorithm that has fewer steps by performing the steps faster. Trying to fix a poor algorithm by throwing more resources at it is a bad strategy.
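For reference, a short single-threaded sketch of the simpler of the two, the Sieve of Eratosthenes (not the Sieve of Atkin). The function name count_primes_sieve is my own:

```cpp
#include <cstddef>
#include <vector>

// Counts the primes in [2, limit] by crossing out multiples of each prime found.
std::size_t count_primes_sieve(std::size_t limit) {
    if (limit < 2) return 0;
    std::vector<bool> composite(limit + 1, false);
    std::size_t count = 0;
    for (std::size_t n = 2; n <= limit; n++) {
        if (composite[n]) continue;  // already crossed out by a smaller prime
        count++;
        if (n > limit / n) continue;  // n * n would exceed limit, nothing to cross out
        for (std::size_t m = n * n; m <= limit; m += n) {
            composite[m] = true;
        }
    }
    return count;
}
```

It trades memory (one flag per candidate) for not having to do any division at all.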
Another problem with your implementation is that it has race conditions and therefore is broken to start with. It's often little use in optimizing something unless you first make sure it's working correctly. The problem is in the primes function:
void primes() {
while (getNext() <= limit)
if( isPrime(icounter) )
primeCnt();
}
Between the calls to getNext() and isPrime(), another thread may have increased icounter, causing the program to skip candidates (or test the same one twice). This results in the program giving a different result each time. In addition, primes() reads icounter without holding the mutex while other threads modify it, which is a data race. Declaring icounter and pCnt volatile would not fix this; testing the value returned by getNext() instead of re-reading icounter would.
Since the problem is CPU intensive, that is, almost all of the time is spent executing CPU instructions, multithreading won't help unless you have multiple CPUs (or cores) on which the OS schedules threads of the same process. This means that there is a limit on the number of threads (it can be as low as 1; I, for example, only see an improvement for two threads, and beyond that there's none) for which you can expect improved performance. What happens if you have more threads than cores is that the OS will just let one thread run for a while on a core and then switch threads and let the next thread execute for a while.
An additional problem that may arise when scheduling threads on different cores is that each core may have its own cache (which is faster than the shared cache). In effect, if two threads are going to access the same memory, the separate caches have to be flushed as part of synchronizing the data involved, which may be time consuming.
That is, you have to strive to keep the data that the different threads are working on separate and minimize the frequent use of common variable data. In your example it would mean that you should avoid the global data as much as possible. The counter, for example, need only be accessed when the counting has finished (to add the thread's contribution to the count). Also, you could minimize the use of icounter by not reading it for each candidate, but getting a bunch of candidates in one go. Something like:
void primes() {
int next;
int count=0;
while( (next = getNext(1000)) <= limit ) {
for( int j = next; j < next+1000 && j <= limit ; j++ ) {
if( isPrime(j) )
count++;
}
}
primeCnt(count);
}
where getNext is the same, but it reserves a number of candidates (by increasing icounter by the supplied count) and primeCnt adds count to pCnt.
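One possible reading of that description, as a self-contained sketch. The bodies of getNext(int) and primeCnt(int) are my interpretation of the sentence above. The isPrime here treats numbers below 2 as non-prime (the one in the question does not), and countPrimes is a hypothetical driver added only so the pieces can be run together:

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex myMutex;
int pCnt = 0;
int icounter = 0;
int limit = 0;

bool isPrime(int number) {
    if (number < 2) return false;  // assumption: 0 and 1 are not counted
    for (int i = 2; i <= number / i; i++) {
        if (number % i == 0) return false;
    }
    return true;
}

// Reserves `count` consecutive candidates and returns the first of them.
int getNext(int count) {
    std::lock_guard<std::mutex> guard(myMutex);
    int first = icounter + 1;
    icounter += count;
    return first;
}

// Adds one thread's local tally to the shared total.
void primeCnt(int count) {
    std::lock_guard<std::mutex> guard(myMutex);
    pCnt += count;
}

void primes() {
    int next;
    int count = 0;
    while ((next = getNext(1000)) <= limit) {
        for (int j = next; j < next + 1000 && j <= limit; j++) {
            if (isPrime(j)) count++;
        }
    }
    primeCnt(count);  // the mutex is taken once per thread for the total
}

// Hypothetical driver: counts primes in [1, lim] using tCount threads.
int countPrimes(int lim, int tCount) {
    pCnt = 0;
    icounter = 0;
    limit = lim;
    std::vector<std::thread> threads;
    for (int i = 0; i < tCount; i++) threads.emplace_back(primes);
    for (auto& t : threads) t.join();
    return pCnt;
}
```

With a batch size of 1000 the mutex is taken roughly once per thousand candidates instead of once or twice per candidate, and every thread tests the value it reserved rather than re-reading the shared counter.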
Consequently, you may end up in a situation where the core runs one thread, then after a while switches to another thread, and so on. The result of this is that you will have to run all the code for your problem plus the code for switching between the threads. Add to that that you will probably have more cache misses, and this will probably be even slower.
Perhaps instead of a mutex try to use an atomic integer for the counter. It might speed it up a bit, not sure by how much.
#include <atomic>
#include <cstdint> // for uint64_t
std::atomic<uint64_t> pCnt; // Made uint64 for bigger range as @IgnisErus mentioned
std::atomic<uint64_t> icounter;
int getNext() {
return ++icounter; // Pre increment is faster
}
void primeCnt() {
++pCnt;
}
On benchmarking: most of the time the processor needs to warm up to reach its best performance, so timing the code once is not always a good representation of the actual performance. Try to run the code many times and take the average. You can also try to do some heavy work before you do the calculation (a long for-loop calculating the power of some counter?).
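Something along these lines, as a rough sketch rather than a proper benchmarking harness. The helper name average_ms and its parameters are made up for this illustration:

```cpp
#include <chrono>

// Runs `work` untimed `warmup_runs` times, then times `runs` executions
// and returns the mean duration of one execution in milliseconds.
template <typename F>
double average_ms(F&& work, int warmup_runs, int runs) {
    for (int i = 0; i < warmup_runs; i++) {
        work();  // warm-up: results are discarded
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; i++) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

It could be called as average_ms([] { /* code under test */ }, 3, 20). Keep in mind that an optimizing compiler may remove work whose result is never used, so the code under test should produce something observable.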
Getting accurate benchmark results is also a topic of interest for me since I do not yet know how to do it.
One of my threads writes data to a circular buffer and another thread needs to process this data ASAP. I was thinking of writing a simple spin loop like this. Pseudo-code!
while (true) {
while (!a[i]) {
/* do nothing - just keep checking over and over */
}
// process b[i]
i++;
if (i >= MAX_LENGTH) {
i = 0;
}
}
Above I'm using a to indicate that data stored in b is available for processing. Probably I should also set thread affinity for such a "hot" process. Of course such a spin is very expensive in terms of CPU, but that's OK for me as my primary requirement is latency.
The question is: should I really write something like that, or do boost or the STL offer something that:
Easier to use.
Has roughly the same (or even better?) latency at the same time occupying less CPU resources?
I think that my pattern is so general that there should be some good implementation somewhere.
upd It seems my question is still too complicated. Let's just consider the case when I need to write some items to an array in arbitrary order and another thread should read them in the right order as items become available. How to do that?
upd2
I'm adding a test program to demonstrate what I want to achieve and how. At least on my machine it happens to work. I'm using rand to show you that I cannot use a general queue and need an array-based structure:
#include "stdafx.h"
#include <string>
#include <boost/thread.hpp>
#include "windows.h" // for Sleep
const int BUFFER_LENGTH = 10;
int buffer[BUFFER_LENGTH];
short flags[BUFFER_LENGTH];
void ProcessorThread() {
for (int i = 0; i < BUFFER_LENGTH; i++) {
while (flags[i] == 0);
printf("item %i received, value = %i\n", i, buffer[i]);
}
}
int _tmain(int argc, _TCHAR* argv[])
{
memset(flags, 0, sizeof(flags));
boost::thread processor = boost::thread(&ProcessorThread);
for (int i = 0; i < BUFFER_LENGTH * 10; i++) {
int x = rand() % BUFFER_LENGTH;
buffer[x] = x;
flags[x] = 1;
Sleep(100);
}
processor.join();
return 0;
}
Output:
item 0 received, value = 0
item 1 received, value = 1
item 2 received, value = 2
item 3 received, value = 3
item 4 received, value = 4
item 5 received, value = 5
item 6 received, value = 6
item 7 received, value = 7
item 8 received, value = 8
item 9 received, value = 9
Is my program guaranteed to work? How would you redesign it, perhaps using some existing structures from boost/stl instead of an array? Is it possible to get rid of the "spin" without affecting latency?
If the consuming thread is put to sleep it takes a few microseconds for it to wake up. This is the process scheduler latency you cannot avoid unless the thread is busy-spinning as you do. The thread also needs to be real-time FIFO so that it is never put to sleep when it is ready to run but exhausted its time quantum.
So, there is no alternative that could match latency of busy spinning.
(Surprising you are using Windows, it is best avoided if you are serious about HFT).
This is what Condition Variables were designed for. std::condition_variable is defined in the C++11 standard library.
What exactly is fastest for your purposes depends on your problem; You can attack it from several angles, but CVs (or derivative implementations) are a good starting point for understanding the subject better and approaching an implementation.
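As a starting point, here is a sketch of the "write slots in arbitrary order, read them in index order" case from the question using a mutex and a std::condition_variable. The class SlotBuffer and the function consume_in_order are names invented for this example. Note that this gives up the busy spin, so the consumer's wake-up latency will be higher than in the spinning version:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fixed-size buffer whose slots can be filled in any order.
class SlotBuffer {
public:
    explicit SlotBuffer(std::size_t length)
        : values_(length, 0), ready_(length, 0) {}

    // Producer side: fill slot i and wake the consumer.
    void put(std::size_t i, int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            values_[i] = value;
            ready_[i] = 1;
        }
        cv_.notify_one();
    }

    // Consumer side: sleep until slot i has been filled, then return its value.
    int take(std::size_t i) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this, i] { return ready_[i] != 0; });
        return values_[i];
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<int> values_;
    std::vector<char> ready_;  // 1 once the matching slot holds data
};

// Fills the slots in reverse order while another thread reads them in index order.
std::vector<int> consume_in_order(std::size_t length) {
    SlotBuffer buffer(length);
    std::vector<int> received;
    std::thread consumer([&] {
        for (std::size_t i = 0; i < length; i++) {
            received.push_back(buffer.take(i));
        }
    });
    for (std::size_t i = length; i-- > 0;) {
        buffer.put(i, static_cast<int>(i) * 10);
    }
    consumer.join();
    return received;
}
```

Unlike the plain int and short arrays in the question, every access to the shared data here happens under the mutex, so the behaviour is well defined regardless of compiler or CPU reordering.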
Consider using C++11 library if your compiler supports it. Or boost analog if not. And in your case especially std::future with std::promise.
There is a good book about threading and C++11 threading library:
Anthony Williams. C++ Concurrency in Action (2012)
Example from cppreference.com:
#include <iostream>
#include <future>
#include <thread>
int main()
{
// future from a packaged_task
std::packaged_task<int()> task([](){ return 7; }); // wrap the function
std::future<int> f1 = task.get_future(); // get a future
std::thread(std::move(task)).detach(); // launch on a thread
// future from an async()
std::future<int> f2 = std::async(std::launch::async, [](){ return 8; });
// future from a promise
std::promise<int> p;
std::future<int> f3 = p.get_future();
std::thread( [](std::promise<int>& p){ p.set_value(9); },
std::ref(p) ).detach();
std::cout << "Waiting..." << std::flush;
f1.wait();
f2.wait();
f3.wait();
std::cout << "Done!\nResults are: "
<< f1.get() << ' ' << f2.get() << ' ' << f3.get() << '\n';
}
If you want a fast method then simply drop to making OS calls. Any C++ library wrapping them is going to be slower.
e.g. On Windows your consumer can call WaitForSingleObject(), and your data-producing thread can wake the consumer using SetEvent(). http://msdn.microsoft.com/en-us/library/windows/desktop/ms687032(v=vs.85).aspx
For Unix, here is a similar question with answers: Windows Event implementation in Linux using conditional variables?
Do you really need threading?
A single threaded app is trivially simple and eliminates all the issues with thread safety and the overhead of launching threads. I did a study of threaded vs non threaded code to append text to a log file. The non threaded code was better in every measure of performance.
Hello dear members of stackoverflow. I've recently started learning C++. Today I wrote a little game, but my random function doesn't work properly. When I call my random function more than once, it doesn't generate a new number; instead, it prints the same number over and over again. How can I solve this problem without using a for loop?
Thanks
#include "stdafx.h"
#include <iostream>
#include <time.h>
using namespace std;
int rolld6();
int main()
{
cout<<rolld6()<<endl;
cout<<rolld6()<<endl;
system("PAUSE");
return 0;
}
int rolld6()
{
srand(time(NULL));
return rand() % 6 + 1;;
}
srand(time(NULL)); should usually be done once at the start of main() and never again.
The way you have it will give you the same number every time you call rolld6 in the same second, which could be a lot of times and, in your sample, is near guaranteed since you call it twice in quick succession.
Try this:
#include "stdafx.h"
#include <iostream>
#include <time.h>
#include <stdlib.h>
int rolld6 (void) {
return rand() % 6 + 1;
}
int main (void) {
srand (time (NULL));
std::cout << rolld6() << std::endl;
std::cout << rolld6() << std::endl;
system ("PAUSE");
return 0;
}
One other thing to keep in mind is if you run this program itself twice in quick succession. If the time hasn't changed, you'll get the same two numbers in both runs. That's only usually a problem when you have a script running the program multiple times and the program itself is short lived.
For example, if you took out your system() call and had a cmd.exe script which called it thrice, you might see something like:
1
5
1
5
1
5
It's not something you usually do but it should be kept in mind on the off chance that the scenario pops up.
You are constantly reseeding the random number generator. Only call srand(time(NULL)); once at the beginning of your program.
Random functions (no matter the language) are only partially random.
In every technology you will have an equivalent to
srand(time(NULL));
This piece of code seeds the random function with a start value, and the numbers are then generated from there onwards.
This means that if you're always reseeding from the same value, you'll always get the same numbers.
In your case you want to do something like this (calling srand(time(NULL)); only once).
int rolld6 (void) {
return rand() % 6 + 1;
}
int main (void) {
srand (time (NULL));
...
//call your function here
}
One of the advantages of seeding with the same value is that it lets you regenerate the same sequence of random numbers.
In one of my games, I would randomly place objects on the screen, but I also wanted to implement a retry option. Reseeding from the same value allows me to redo the placement without storing all the random values ^^
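A tiny sketch of that replay idea (the function name roll_sequence is made up): seeding once with a fixed value and then drawing the rolls reproduces the same sequence on every call with that seed.

```cpp
#include <cstdlib>
#include <vector>

// Seeds the generator once, then draws `count` dice rolls from it.
std::vector<int> roll_sequence(unsigned int seed, int count) {
    std::srand(seed);
    std::vector<int> rolls;
    for (int i = 0; i < count; i++) {
        rolls.push_back(std::rand() % 6 + 1);
    }
    return rolls;
}
```

Calling roll_sequence(42, 20) twice returns two identical vectors within the same program. The actual numbers depend on the C library's rand implementation, so they can differ between platforms.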