I want to populate an array with three different random integers.
int itemA[3] = {rand() % 20 + 1, rand() % 20 + 1, rand() % 20 + 1};
Currently I can only seed the random number generator if I do it in main. Can someone tell me how to seed it in the header file where my array is?
From what I've found so far, I think I need srand(time(0)) in there, but it only does what I want if it is in main.
This is a surprisingly deep and important question—so much so that it’s mentioned in my profile. The answer is simple: you don’t do this in the header, even though inline variables make it possible to do so. The reason is important: as global state, the seed must be set once (consider that, if multiple headers each set the seed with time(0) before drawing their “random” numbers, they would typically all get the same results).
There are corollaries to this: since the main program is the only part that (by definition) knows the user’s intentions, it should perform such initialization; for example, the user might wish to reproduce previous results by specifying a seed via a command-line option. Even if your program doesn’t support such features (yet), you already have to have a source file to contain main, so you might as well seed the RNG there.
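For instance, a minimal sketch of that idea (the --seed option and its parsing are purely illustrative, not something from the question):

#include <cstdlib>
#include <cstring>
#include <ctime>

int main(int argc, char* argv[]) {
    // default: seed from the clock
    unsigned seed = static_cast<unsigned>(std::time(0));
    // optional: let the user reproduce a previous run by supplying a seed
    if (argc == 3 && std::strcmp(argv[1], "--seed") == 0)
        seed = static_cast<unsigned>(std::strtoul(argv[2], 0, 10));
    std::srand(seed); // done exactly once, and only here
    // ... code from the headers can now call std::rand() freely ...
    return 0;
}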
You might object that you're not writing main, and perhaps that you have no source files at all. However, that just means that you're writing a (perhaps header-only) library, which immediately implies that for composability you mustn't arrogate the responsibility for initialization (what if more than one library did?).
The same logic applies to any other process-global parameters like the current working directory or environment variables. It’s fine for libraries (and internal header files, treating them as miniature libraries) to provide functions to help main manipulate such things (e.g., to collect entropy for the seed or add elements to PATH-like environment variables), but they should never take such actions on their own.
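As an illustration of that last point, a hypothetical header-only helper (the name is made up) can offer entropy to the caller without ever touching the global seed itself:

// mylib.h (hypothetical): helps main choose a seed, never calls srand() itself
#pragma once
#include <chrono>

namespace mylib {
    inline unsigned suggest_seed() {
        return static_cast<unsigned>(
            std::chrono::steady_clock::now().time_since_epoch().count());
    }
}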
Whenever we want to pick a random number (say, a random element from a vector) we use a function called rand(). I want to know how it works behind the scenes.
rand has a seed value - e.g. setting it to the current time...
srand( time(NULL) ); // one-second resolution is good enough for a seed
Then there is some math such as this....
unsigned int seed; // set by srand

unsigned int rand() {
    // multiplier and offset are illustrative; real libraries pick their own
    const unsigned int number = 22695477u;
    const unsigned int offset = 1u;
    seed = seed * number + offset;
    return seed;
}
The multiplier and offset are chosen so that the whole range of unsigned int is covered. This generally means some form of prime number.
As mentioned in the comments, this is a very complex area.
If srand is not called, then the seed has a fixed initial value, which means that (ignoring thread timing issues) your program will get the same results each time it is run.
Getting the same results is handy for re-running tests, but problematic if it is, say, game logic.
There is no "back-end" involved for rand.
BTW, in C++ you're better off using the <random> standard header and the related utilities in the C++ standard library.
The rand function is part of the C standard library. It is unrelated to C++ vectors.
They (both the rand function and utilities from <random>) are based upon pseudo-random number generators, a quite complex field. You can still get a PhD by inventing better PRNGs.
If you want to understand how rand is (or can be) implemented, your best bet is to study the source code of some existing free-software C standard library (e.g. GNU glibc or musl-libc).
If you want to understand how <random> is implemented, study the source code of your C++ standard library. If you use the GCC compiler (e.g. compiling with the g++ program), it is provided by it.
Background: I use rand(), std::rand(), std::random_shuffle() and other functions in my code for scientific calculations. To be able to reproduce my results, I always explicitly specify the random seed, and set it via srand(). That worked fine until recently, when I figured out that libxml2 would also call srand() lazily on its first usage - which was after my early srand() call.
I filed a bug report with libxml2 about its srand() call, but I got the answer:
Initialize libxml2 first then.
That's a perfectly legal call to be made from a library. You should not expect that nobody else calls srand(), and the man page nowhere states that using srand() multiple time should be avoided
This is actually my question now. If the general policy is that every lib can/should/will call srand(), and I can/might also call it here and there, I don't really see how that can be useful at all. Or how is rand() useful then?
That is why I thought, the general (unwritten) policy is that no lib should ever call srand() and the application should call it only once in the beginning. (Not taking multi-threading into account. I guess in that case, you anyway should use something different.)
I also tried to research a bit which other libraries actually call srand(), but I didn't find any. Are there any?
My current workaround is this ugly code:
{
    // On the first call to xmlDictCreate,
    // libxml2 will initialize some internal randomize system,
    // which calls srand(time(NULL)).
    // So, do that first call here now, so that we can use our
    // own random seed.
    xmlDictPtr p = xmlDictCreate();
    xmlDictFree(p);
}
srand(my_own_seed);
Probably the only clean solution would be to not use that at all and only to use my own random generator (maybe via C++11 <random>). But that is not really the question. The question is, who should call srand(), and if everyone does it, how is rand() useful then?
Use the new <random> header instead. It allows for multiple engine instances, using different algorithms and more importantly for you, independent seeds.
[edit]
To answer the "useful" part, rand generates random numbers. That's what it's good for. If you need fine-grained control, including reproducibility, you should not only have a known seed but a known algorithm. srand at best gives you a fixed seed, so that's not a complete solution anyway.
Well, the obvious thing has been stated a few times by others, use the new C++11 generators. I'm restating it for a different reason, though.
You use the output for scientific calculations, and rand usually implements a rather poor generator (in the meantime, many mainstream implementations use MT19937, which apart from bad state recovery isn't so bad, but you have no guarantee of a particular algorithm, and at least one mainstream compiler still uses a really poor LCG).
Don't do scientific calculations with a poor generator. It doesn't really matter if you have things like hyperplanes in your random numbers if you do some silly game shooting little birds on your mobile phone, but it matters big time for scientific simulations. Don't ever use a bad generator. Don't.
Important note: std::random_shuffle (the version with two parameters) may actually call rand, which is a pitfall to be aware of if you're using that one, even if you otherwise use the new C++11 generators found in <random>.
About the actual issue, calling srand twice (or even more often) is no problem. You can in principle call it as often as you want, all it does is change the seed, and consequentially the pseudorandom sequence that follows. I'm wondering why an XML library would want to call it at all, but they're right in their response, it is not illegitimate for them to do it. But it also doesn't matter.
The only important thing to make sure is that either you don't care about getting any particular pseudorandom sequence (that is, any sequence will do, you're not interested in reproducing an exact sequence), or you are the last one to call srand, which will override any prior calls.
That said, implementing your own generator with good statistical properties and a sufficiently long period in 3-5 lines of code isn't all that hard either, with a little care. The main advantage (apart from speed) is that you control exactly where your state is and who modifies it.
It is unlikely that you will ever need periods much longer than 2^128 because of the sheer forbidding time to actually consume that many numbers. A 3GHz computer consuming one number every cycle will run for 10^21 years on a 2^128 period, so there's not much of an issue for humans with average lifespans. Even assuming that the supercomputer you run your simulation on is a trillion times faster, your grand-grand-grand children won't live to see the end of the period.
Insofar, periods like 2^19937 which current "state of the art" generators deliver are really ridiculous, that's trying to improve the generator at the wrong end if you ask me (it's better to make sure they're statistically firm and that they recover quickly from a worst-case state, etc.). But of course, opinions may differ here.
This site lists a couple of fast generators with implementations. They're xorshift generators combined with an addition or multiplication step and a small (from 2 to 64 machine words) lag, which results in both fast and high quality generators (there's a test suite as well, and the site's author wrote a couple of papers on the subject, too). I'm using a modification of one of these (the 2-word 128-bit version ported to 64-bits, with shift triples modified accordingly) myself.
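For illustration, here is a sketch along the lines of xorshift128+ (the shift triple 23/18/5 is one published choice; the two state words must be seeded to something nonzero):

#include <cstdint>

struct xorshift128plus {
    std::uint64_t s[2]; // state: must not be all zero
    std::uint64_t next() {
        std::uint64_t x = s[0];
        const std::uint64_t y = s[1];
        s[0] = y;
        x ^= x << 23;
        s[1] = x ^ y ^ (x >> 18) ^ (y >> 5);
        return s[1] + y;
    }
};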
This problem is being tackled in C++11's random number generation, i.e. you can create an instance of a class:
std::default_random_engine e1;
which allows you to fully control the random numbers generated from object e1 (and only those, as opposed to whatever would be used in libxml). The general rule of thumb would then be to use the new constructs, as you can generate your random numbers independently.
Very good documentation
To address your concerns: I also think it would be bad practice to call srand() in a library like libxml. However, it's more that srand() and rand() are not designed to be used in the context you are trying to use them in - they are enough when you just need some random numbers, as libxml does. When you need reproducibility and need to be sure you are independent of others, the new <random> header is the way to go. So, to sum up, I don't think it's good practice on the library's side, but it's hard to blame them for doing it. Also, I could not imagine them changing it, as a billion other pieces of software probably depend on it.
The real answer here is that if you want to be sure that YOUR random number sequence isn't being altered by someone else's code, you need a random number context that is private to YOUR work. Note that calling srand is only one small part of this. For example, if you call some function in some other library that calls rand, it will also disrupt the sequence of YOUR random numbers.
In other words, if you want predictable behaviour from your code, based on random number generation, it needs to be completely separate from any other code that uses random numbers.
Others have suggested using the C++ 11 random number generation, which is one solution.
On Linux and other compatible libraries, you could also use rand_r, which takes a pointer to an unsigned int seed that is used for that sequence. So if you initialize that seed variable once and then use it with all calls to rand_r, it will produce a unique sequence for YOUR code. This is of course still the same old rand generator, just with a separate seed. The main reason I mention this is that you could fairly easily do something like this:
int myrand()
{
    static unsigned int myseed = ... some initialization of your choice ...;
    return rand_r(&myseed);
}
and simply call myrand instead of std::rand (it should also be workable with the std::random_shuffle overload that takes a random-generator parameter, as sketched below).
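For example, a rough sketch of such an adapter (the three-argument std::random_shuffle expects a callable r(n) returning a value in [0, n); the plain modulo introduces a slight bias but keeps the sketch short):

#include <algorithm>
#include <cstddef>
#include <vector>

struct myrand_adapter {
    std::ptrdiff_t operator()(std::ptrdiff_t n) const { return myrand() % n; }
};

void shuffle_example(std::vector<int>& v) {
    myrand_adapter r;
    std::random_shuffle(v.begin(), v.end(), r);
}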
I am working on parallel algorithm optimization (sparse matrices), specifically register blocking. I want to find the number and type of registers (floating-point registers first, then others) available on a machine, in order to tune my code to the available registers and keep it platform-independent. Is there any way to do this in C++?
thank you.
mjr
In general, compilers do know this sort of stuff (and how best to use it), so I'm slightly surprised that you think you can outsmart the compiler - unless I have very high domain knowledge and start writing assembler code, I very rarely outsmart the compiler.
Since writing assembler code is highly unportable, I don't think that counts as a solution for optimising the code using knowledge of how many registers there are, etc. It is very difficult to know how the compiler uses registers. Take int x = y + z; as a simple example: how many registers does it take? It depends on the compiler - it could use none, one, two, three, four, five or six without being below optimal register usage; it all depends on how the compiler decides to deal with things, the machine architecture, where/how the variables are stored, and so on. The same principle applies to the number of floating-point registers if we change int to double. There is no obvious way to tell how many registers are being used in this statement (although I suspect no more than three - however, it could be zero or one, depending on what the compiler decides to do).
It's probably possible to do some clever tricks if you know the processor architecture and how the compiler deals with certain types of code - but that also assumes that the compiler doesn't change its behaviour in the next release. But if you know what processor architecture it is, then you also know the number of registers of various kinds...
I am afraid there is no easy portable solution.
There are many factors that could influence the optimal block size for a given computer. One way to discover a good configuration is by automatically running a series of benchmarks, and using the results to tune your code at runtime.
Another approach is to automatically tweak the source code based on the results of some benchmarks. This is what Automatically Tuned Linear Algebra Software (ATLAS) does.
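As a rough sketch of the benchmark-driven idea (the kernel parameter stands in for your blocked routine; nothing here is taken from ATLAS itself):

#include <chrono>
#include <vector>

// Time the blocked kernel for several candidate block sizes and keep the
// fastest, rather than trying to query the register file from C++.
template <typename Kernel>
int pick_block_size(Kernel kernel, const std::vector<int>& candidates) {
    int best = candidates.front(); // assumes a non-empty candidate list
    double best_seconds = 1e300;
    for (int b : candidates) {
        auto t0 = std::chrono::steady_clock::now();
        kernel(b); // run the workload with block size b
        double s = std::chrono::duration<double>(
                       std::chrono::steady_clock::now() - t0).count();
        if (s < best_seconds) { best_seconds = s; best = b; }
    }
    return best;
}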
A carelessly written template here, some excessive inlining there - it's all too easy to write bloated code in C++. In principle, refactoring to reduce that bloat isn't too hard. The problem is tracing the worst offending templates and inlines - tracing those items that are causing real bloat in real programs.
With that in mind, and because I'm certain that my libraries are a bit more bloat-prone than they should be, I was wondering if there are any tools that can track down those worst offenders automatically - i.e. identify those items that contribute most (including all their repeated instantiations and calls) to the size of a particular target.
I'm not much interested in performance at this point - it's all about the executable file size.
Are there any tools for this job, usable on Windows, and fitting with either MinGW GCC or Visual Studio?
EDIT - some context
I have a set of multiway-tree templates that act as replacements for the red-black tree standard containers. They are written as wrappers around non-typesafe non-template code, but they were also written a long time ago, as a "will better cache friendliness boost real performance?" experiment. The point being, they weren't really written for long-term use.
Because they support some handy tricks, though (search based on custom comparisons/partial keys, efficient subscripted access, search for smallest unused key) they ended up being in use just about everywhere in my code. These days, I hardly ever use std::map.
Layered on top of those, I have some more complex containers, such as two-way maps. On top of those, I have tree and digraph classes. On top of those...
Using map files, I could track down whether non-inline template methods are causing bloat. That's just a matter of finding all the instantiations of a particular method and adding the sizes. But what about unwisely inlined methods? The templates were, after all, meant to be thin wrappers around non-template code, but historically my ability to judge whether something should be inlined or not hasn't been very reliable. The bloat impact of those template inlines isn't so easy to measure.
I have some idea which methods are heavily used, but that's the well-known optimization-without-profiling mistake.
Check out Symbol Sort. I used it a while back to figure out why our installer had grown by a factor of 4 in six months (it turns out the answer was static linking of the C runtime and libxml2).
Map file analysis
I have seen a problem like this some time ago, and I ended up writing a custom tool which analysed the map file (the Visual Studio linker can be instructed to produce one). The tool output was:
a list of functions sorted descending by code size, listing only the first N
a list of source files sorted descending by code size, listing only the first N
Parsing the map file is relatively easy (function code size can be computed as the difference between the current and the following line); the hardest part is probably handling mangled names in a reasonable way. You might find ready-to-use libraries for both of these; I did it a few years ago and I do not know the current situation.
Here is a short excerpt from a map file, so that you know what to expect:
Address Publics by Value Rva+Base Lib:Object
0001:0023cbb4 ?ApplyScheme@Input@@QAEXPBVParamEntry@@@Z 0063dbb4 f mainInput.obj
0001:0023cea1 ?InitKeys@Input@@QAEXXZ 0063dea1 f mainInput.obj
0001:0023cf47 ?LoadKeys@Input@@QAEXABVParamEntry@@@Z 0063df47 f mainInput.obj
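Below is a rough sketch of the parsing idea described above (the field layout is assumed from the excerpt; real map files contain more sections and noise): read the Rva+Base column for each public symbol and estimate a symbol's size as the distance to the next symbol's address.

#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main(int argc, char* argv[]) {
    if (argc < 2) return 1;
    std::ifstream map(argv[1]);
    struct Sym { std::string name; unsigned long rva; };
    std::vector<Sym> syms;
    std::string line;
    while (std::getline(map, line)) {
        std::istringstream in(line);
        std::string addr, name;
        unsigned long rva;
        // keep only lines shaped like "0001:0023cbb4  <symbol>  <rva> ..."
        if (in >> addr >> name >> std::hex >> rva && addr.find(':') != std::string::npos)
            syms.push_back({name, rva});
    }
    std::sort(syms.begin(), syms.end(),
              [](const Sym& a, const Sym& b) { return a.rva < b.rva; });
    // print estimated size and name in address order; sort or filter as needed
    for (std::size_t i = 0; i + 1 < syms.size(); ++i)
        std::cout << (syms[i + 1].rva - syms[i].rva) << '\t' << syms[i].name << '\n';
    return 0;
}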
Symbol Sort
As posted in Ben Staub's answer, Symbol Sort is a ready to use command line utility (comes with a complete C# source) which does all of this, with the only difference of not analysing map files, but rather pdb/exe files.
So what I'm reading based on your question and your comments is that the library is not actually too large.
The only tool you need to determine this is a command shell, or Windows File explorer. Look at the file size. Is it so big that it causes real actual problems? (Unacceptable download times, won't fit in memory on the target platform, that kind of thing)?
If not, then you should worry about code readability and maintainability and nothing else. And the tool for that is your eyes. Read the code, and take the actions needed to make it more readable if necessary.
If you can point to an actual reason why the executable size is a problem, please edit that into your question, as it is important context.
However, assuming the file size is actually a problem:
Inlined functions are generally not a problem, because the compiler, and no one else, chooses which functions to inline. Simply marking something inline does not inline the actual generated code. The compiler inlines if it determines the trade-off between larger code and less indirection to be worth it. If a function is called often, it will not be inlined, because that would dramatically affect code size, which would hurt performance.
If you're worried that inlined functions cause code bloat, simply compile with the "optimize for size" flag. Then the compiler will restrict inlining to the cases where it doesn't affect executable size noticeably.
For finding out which symbols are biggest, parse the map file as @Suma suggested.
But really, you said it yourself when you mentioned "the well-known optimization-without-profiling mistake."
The very first act of profiling you need to do is to ask: is the executable size actually a problem? In the comments you said that you "have a feeling", which, in a profiling context, is useless, and can be translated into "no, the executable size is not a problem".
Profile. Gather data and identify trouble spots. Before worrying about how to bring down the executable size, find out what the executable size is, and identify whether or not it is actually a problem. You haven't done that yet. You read in a book that "code bloat is a problem in C++", and so you assume that code bloat is a problem in your program. But is it? Why? How do you determine that it is?
http://www.sikorskiy.net/prj/amap/index.html
This is a wonderful GUI tool for object file / library size analysis, driven by the map file generated by the Visual Studio compiler. The tool analyses the map file and generates a report; you can also filter the results, and it displays sizes dynamically. Just feed it the map file generated for your DLL/EXE and it will list which functions occupy how much space (see the screenshots on the page above). You can also sort by size.
Basically, you are looking for costly things that you don't need. Suppose there is some category of functions that you don't need taking some large percent of the space, like 20%. Then if you picked 20 random bytes out of the image size, on the average 4 of them (20 * 20%) will be in that category, and you will be able to see them. So basically, you take those samples, look at them, and if you see an obvious pattern of functions that you don't really need, then remove them. Then do it again because other categories of routines that used less space are now taking a higher percentage.
So I agree with Suma that parsing the map file is a good start. Then I would write a routine to walk through it, and every 5% of the way (space-wise) print the routine I am in. That way I get 20 samples. Often I find that a large chunk of object space results from a very small number (like 1) of lines of source code that I could easily have done another way.
You are also worried about too much inlining making functions larger than they need to be. To figure that out, I would take each of those samples, and since each represents a specific address in a specific function, I would trace it back to the line of code it is in. That way, I can tell if it is in an expanded function. This is a bit of work, but doable.
A similar problem is how to find tumors when disks get full. The idea there is the same: walk the directory tree, adding up the file sizes. Then you walk it again, and as you pass each 5% point, you print out the path of the file you are in. This tells you not only whether you have large files, it tells you whether you have large numbers of small files, and it doesn't matter how deeply they are buried or how widely they are scattered. When you clean out one category of files that you don't need, you can do it again to get the next category, and so on.
Good luck.
Your question seems to tend towards run-time rather than compile-time bloat.
However, if compile-time bloat (plus binary bloat resulting from inefficient compilation) is relevant, then I have to mention clang tool IWYU.
Since IWYU will likely manage to toss quite a number of #includes in your code, this should also manage to reduce binary bloat. At least for my own environment I can certainly confirm a useful reduction in build time.
I've used
#include <cstdlib>
#include <ctime>

using namespace std;

int main() {
    srand((unsigned)time(0));
    int n = (rand() >> 8) % 4;
}
but what other random functions are there, or what other function could be used as random number generators?
EDIT: I don't really have a particular reason for asking this question, I just wanted to know if C++ had any other random functions.
Boost Random Number Library offers a broad range of generators (quality vs performance) and some typical random distributions. Everything rather nice and straightforward to use.
If you want some other methods/libraries - then google for cryptographic random numbers, also you can use this document as a reference.
Don't invent your own solutions unless you are an expert/researcher in the field/etc, take advantage of already existing solutions which were usually written by Smart People, and thoroughly examined by other Smart People.
The rand() and srand() functions are all the C++ Standard specifies. And if it comes to writing your own, be aware of what John von Neumann said:
"Anyone who considers arithmetical
methods of producing random digits is
of course in a state of sin"
This code is pretty efficient. Although users may begin to notice a pattern after a few iterations.
int FastRandom()
{
    return 10;
}
Not strictly C++, but Windows specific:
CryptGenRandom
I'm sure all operating systems have their equivalent cryptographically secure random generator functions.
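For instance, a minimal sketch of using CryptGenRandom (error handling trimmed; link against advapi32):

#include <windows.h>
#include <wincrypt.h>

unsigned int win_random() {
    HCRYPTPROV prov = 0;
    unsigned int value = 0;
    if (CryptAcquireContext(&prov, NULL, NULL, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT)) {
        CryptGenRandom(prov, sizeof(value), reinterpret_cast<BYTE*>(&value));
        CryptReleaseContext(prov, 0);
    }
    return value;
}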
#include <fcntl.h>
#include <unistd.h>

int unixrand()
{
    int x;
    int f = open("/dev/random", O_RDONLY);
    if (f < 0) return -1; /* Error */
    if (sizeof(x) != read(f, &x, sizeof(x))) {
        close(f);
        return -1;
    }
    close(f);
    if (x < 0) x = ~x; /* force a non-negative result */
    return x;
}
(Cross-posting from an answer I just wrote to a similar question)
Have a look at ISAAC (Indirection, Shift, Accumulate, Add, and Count). It's uniformly distributed and has an average cycle length of 2^8295.
It's fast too, since it doesn't involve multiplication or modulus.
Bruce Schneier and John Kelsey wrote a random number generator you may be interested in. Rather, it's a seed generator. Even though Yarrow is no longer supported, you may be interested in how it gathers entropy.
OpenSSL has an API that is relatively easy to access and pretty portable. And Mozilla comes with a decent API that wraps whatever the OS offers.
Personally, though, I generally use Boost.Random, which was already suggested.
rand() gives you a random number from a uniform distribution, and it does a pretty good job at that.
Anything else would mean that you want to actually skew the distribution.
For example, using Microsoft's GUID generator would give you a random id that is less likely to be repeated, taking into account things like the time and the computer.
Time is usually the most random operation that is also cheap to perform, but it's still possible to predict.
If you want true randomness, using some kind of external input is your only solution.
Quantum Random Bit Generator is one service that provides such data.