Same C source, different output - c++

I've found old C source code that I implemented several years ago, together with its compiled binary executable.
In a comment I wrote that the compilation command was:
gcc -O3 source.c -o executable -lm
So I recompiled it but the new executable is different (in size) from the old one.
In fact if I run the new and the old executable they give me different results: the old executable returns the same result returned years ago, while the new one returns a different result.
My goal would be to recompile the source and obtain the same executable as the old one (or at least an executable that produces exactly the same result).
I'm sure that I run the two programs with the same parameters, and that the code does not use threads at all. The only particular thing is that I need random integers, but I use my own function to produce them, precisely to be sure that the sequence of random numbers is always the same (and of course I always use the same seed).
#include <stdlib.h> /* for RAND_MAX */

unsigned int seed = 0;

void set_srand(unsigned int aseed) {
    seed = aseed;
}

int get_rand() {
    seed = seed * 0x12345 + 0xABC123;
    int j = (seed >> 0x10) & RAND_MAX;
    return j;
}
(I thought this was copied from some library).
So what can it be? Maybe the OS where the compilation is done (the original one was under WinXP, now I'm trying under both Win7 and Ubuntu), but I've always only used MinGW. So maybe the version of MinGW? If this is the case, I'm in trouble because I don't remember which version I've used several years ago.
Libraries that I use:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
What I do are only string operations and computations such as pow(), sqrt(), multiplication and addition/subtraction, in order to apply a heuristic optimization algorithm to approximately solve an NP-hard problem: both results are valid solutions of the problem, but their fitness differs.

The first thing I would check is the size of int. Your code relies on unsigned wraparound, so the width of the integer types matters.
Your code seems to imply that unsigned int is exactly 32 bits (the update seed * 0x12345 + 0xABC123 is meant to wrap modulo 2^32), but maybe you are now compiling with a wider int.
I wouldn't worry about the executable file size: it's very unlikely you'd get the same size from two different compilers.

Related

Trouble with large numbers while using <iomanip> in cpp

While sorting and displaying big numbers you usually end up displaying the large numbers in e-notation. I was trying to display the whole number by using the <iomanip> library in C++, but it fails for very large numbers.
//big sorting
#include <bits/stdc++.h>
#include <iomanip>
using namespace std;

int main()
{
    int n;
    cin >> n;
    double arr[n];   // note: variable-length arrays are a non-standard extension in C++
    for (int i = 0; i < n; i++)
        cin >> arr[i];
    sort(arr, arr + n);
    cout << fixed << setprecision(0);
    for (int i = 0; i < n; i++)
        cout << arr[i] << endl;
}
Input:
31415926535897932384626433832795
1
3
10
3
5
Expected output:
1
3
3
5
10
31415926535897932384626433832795
Actual output:
1
3
3
5
10
31415926535897933290036940242944
The last digit is getting messed up.
The double type has only about 15–17 significant decimal digits of precision, so very large whole numbers can't be represented in a double without loss of precision.
Read more about C++, perhaps the C++11 standard n3337.
Read also the documentation of your C++ compiler, e.g. GCC (invoked as g++) or Clang (invoked as clang++). Read of course a good C++ programming book, since C++ is a very difficult programming language. Use C++ standard containers and smart pointers.
Large numbers do not fit natively in a computer's memory words (or in its registers). For example, with C++ code compiled by GCC on Linux/x86-64, an int has just 32 bits.
Consider using arbitrary precision arithmetic. You might be interested by GMPlib.
Floating point numbers are weird. Be sure to read the famous floating-point-gui.de website, and see also this answer.
#include<bits/stdc++.h>
is wrong since it is non-standard. Get into the habit of #include-ing only the headers needed by your translation unit, unless you use pre-compiled headers.
Take some time to read more about numbers and arithmetic. Some notion of modular arithmetic is incredibly useful when programming: a lot of computers compute modulo 2^32 or 2^64.
Study for inspiration the C++ source code of existing open source software (e.g. on github or gitlab, including FLTK). If you use Linux, its fish-shell has a nice C++ code. You could even glance inside the source code of GCC and of Clang, both being nice C++ open source compilers.
In practice, read also about build automation tools such as GNU make (free software coded in C) or ninja (open source tool coded in C++).
Don't forget to use a version control system (I recommend git).
Read How to debug small programs.
Enable all warnings and debug info when compiling your C++ code (with GCC, use g++ -Wall -Wextra -g).
Read of course the documentation of your favorite debugger.
I am a happy user of GDB.
Consider using static program analysis tools such as the Clang static analyzer or Frama-C.

C++ How To Read First Couple Bytes Of Function? (32 bit Machine)

Let's say I have a function like this (completely random, I just wrote it up in like 30 seconds for an example)
bool exampleAuthetnication(char *a, char *b)
{
    bool didAuthenticate = false;
    if (strcmp(a, b) == 0)
    {
        didAuthenticate = true;
    }
    if (didAuthenticate)
    {
        return true;
    }
    else
    {
        stopExecutable();
        return false;
    }
}
How would I go about reading the first few bytes of this function?
I've come up with this
int functionByteArray[10];
for (int i = 0; i < 10; i++)
{
    functionByteArray[i] = *(int*)(((int)&exampleAuthetnication) + (0x04 * i));
}
The logic behind it: we take the memory address of our function (in this case exampleAuthetnication()), cast it to an int pointer, then dereference it to read the bytes at the current offset and store them in functionByteArray. But it does not seem to work properly. What am I doing wrong? Is what I'm trying to accomplish possible?
In theory (according to the C++11 standard) you cannot even cast a function pointer into a data pointer (on Harvard architectures code and data sit in different memories and different address spaces). Some operating systems or processors might also forbid reading of executable code segments (read about NX bit).
In practice, on x86-64 (or 32-bit x86) running an operating system like Linux or Windows, a function's code is a sequence of bytes, can be unaligned, and sits in the (common) virtual address space of its process. So you should at least have char functionByteArray[40]; and you might use std::memcpy from <cstring> and do some
std::memcpy(functionByteArray, (char*)&exampleAuthetnication,
            sizeof(functionByteArray));
Finally, your code is wrong because, on x86-64 notably, int does not have the same size as pointers (so (int)&exampleAuthetnication loses the upper bytes of the address). You should at least use intptr_t. Also, int has stronger alignment constraints than code does.
BTW, you might also ask your compiler to show the generated assembler code. With GCC, compile your exampleAuthetnication C++ code with g++ -O -fverbose-asm -S and look into the generated .s file.
Notice that the C++ compiler might optimize to the point of "removing" some function from the code segment (e.g. because that function has been inlined everywhere), or split the function code into several pieces, or put that exampleAuthetnication code "inside" another function...
C++ source code is not a list of instructions for a computer to perform; it is a collection of statements that describe the meaning of a program.
Your compiler interprets these statements and produces an actual sequence of instructions (via the assembly stage) that can actually be executed in our physical reality.
The language used to do so does not provide any facilities for examining the bytes that make up the compiled program. All of your attempts to cast function pointers and the like may randomly give you some similar data, via the magic of undefined behaviour, but the results are just that: undefined.
If you wish to examine the compiled executable, do so from outside of the program. You could use a hex editor, for example.

C++ rand and srand gets different output on different machines

I wanted to generate a random integer, so I used the C++ rand() and srand() functions:
#include <cstdlib>
#include <iostream>
using namespace std;

int main() {
    srand(1);
    cout << rand() << endl;
    return 0;
}
OK, it suits my needs. Each time I execute it I get the same result, which I like!
But there is a problem. When I executed it on my computer I got 16807 as output. But when I executed it on another machine, I got 1804289383.
I know that rand() and srand(int) have a simple implementation similar to this:
static unsigned long int next = 1;

int rand(void) // RAND_MAX assumed to be 32767
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next/65536) % 32768;
}

void srand(unsigned int seed)
{
    next = seed;
}
So why? Is it possible that rand() has different implementations on multiple machines? What should I do?
I want to modify the other machine in such a way that I get 16807 from that machine too.
Please note that I love the rand implementation on my computer. Please show me a way to make the other machine produce the same result as mine.
Thanks in advance.
Yes, rand() has different implementations; there's no requirement for them to be identical.
If you want consistent sequences across implementations and platforms, you can copy the sample implementation from the C standard section 7.20.2. Be sure to rename both rand and srand so they don't collide with the standard library's versions. You might need to adjust the code so the types have the same size and range across the implementations (e.g., use uint32_t from <stdint.h> rather than unsigned int).
EDIT: Given the new information from the comments, it looks like the requirements are different from what we thought (and I'm still not 100% clear on what they are).
You want to generate random numbers on two systems consistent with a stored file that you generated on one system, but you're unable to transfer it to the other due to network issues (the file is about a gigabyte). (Burning it to a DVD, or splitting it and burning it to 2 CDs, isn't an option?)
Suggested solution:
Write a custom generator that generates consistent results on both systems (even if they're not the same results you got before). Once you've done that, use it to re-generate a new 1-gigabyte data file on both systems. The existing file becomes unnecessary, and you don't need to transfer huge amounts of data.
I think it's because int/unsigned int on your two platforms is a different size. Are ints/unsigned ints the same number of bytes on both machines/OSes you're compiling on? What platforms/compilers are you using?
Assuming the same rand/srand implementation, you need to use datatypes of the same precision (or appropriate casting) to get the same result. If you have stdint.h on your platform, try and use that (so you can define explicit sizes, e.g. uint32_t).
The C and C++ specifications do not define a particular implementation for rand or srand. They could be anything, as long as it is somewhat random. You cannot expect consistent output from different standard libraries.
The rand implementations can be different. If you need identical behavior on different machines, you need a random number generator that provides that. You can roll your own or use someone else's.
I am not sure whether the random generators in the C++0x library suffice. I think not. But reading the standardese there makes my head spin.
Similarly, I'm not sure whether the Boost Random library suffices. But I think it's worth checking out. And there you have the source code, so at worst it can serve as basis for rolling your own.
Cheers & hth.,
Also, there are different pseudo-RNG algorithms (e.g. LCG vs. Mersenne Twister):
http://en.wikipedia.org/wiki/Random_number_generation
The C compiler on your first machine may use one, and the second machine may use another.

How much is 32 kB of compiled code

I am planning to use an Arduino programmable board. Those have quite limited flash memories ranging between 16 and 128 kB to store compiled C or C++ code.
Are there ways to estimate how much (standard) code it will represent ?
I suppose this is very vague, but I'm only looking for an order of magnitude.
The output of the size command is a good starting place, but does not give you all of the information you need.
$ avr-size program.elf
text data bss dec hex filename
The size of your image is usually a little more than the sum of the text and data sections. The bss section takes no space in the image because it is all 0s. There may be other relevant sections which aren't listed by size.
If your build system is set up like ones that I've used before for AVR microcontrollers then you will end up with an *.elf file as well as a *.bin file, and possibly a *.hex file. The *.bin file is the actual image that would be stored in the program flash of the processor, so you can examine its size to determine how your program is growing as you make edits to it. The *.bin file is extracted from the *.elf file with the objcopy command (avr-objcopy -O binary, roughly).
If you want to estimate how much compiled code your C or C++ source will produce, that is a lot more difficult. I have observed a 10x blowup in a function when I used a uint64_t rather than a uint32_t when all I was doing was incrementing it (about 5 times more code than I expected). This was mostly due to gcc's AVR optimizations not being the best, but smaller code-size changes can creep in from seemingly innocent code.
This will likely be amplified with the use of C++, which tends to hide more things that turn into code than C does. Chief among the things C++ hides are destructor calls and lots of pointer dereferencing which has to do with the this pointer in objects as well as a secret pointer many objects have to their virtual function table and class static variables.
On AVR all of this pointer stuff is likely to really add up because pointers are twice as big as registers and take multiple instructions to load. Also AVR has only a few register pairs that can be used as pointers, which results in lots of moving things into and out of those registers.
Some tips for small programs on AVR:
Use uint8_t and int8_t instead of int whenever you can. You could also use uint_fast8_t and int_fast8_t if you want your code to be portable. This can lead to many operations taking only half as much code, because int is two bytes on AVR.
Be very aware of things like string and struct constants and literals and how/where they are stored.
If you're not scared of it, read the AVR assembly manual. You can get an idea of the types of instructions, and from that the type of C code that easily maps to those instructions. Use that kind of C code.
You can't really tell from the source alone. The length of the uncompiled code has little to do with the length of the compiled code. For example:
#include <iostream>
#include <iterator>   // for std::ostream_iterator
#include <vector>
#include <string>
#include <algorithm>

int main()
{
    std::vector<std::string> strings;
    strings.push_back("Hello");
    strings.push_back("World");
    std::sort(strings.begin(), strings.end());
    std::copy(strings.begin(), strings.end(),
              std::ostream_iterator<std::string>(std::cout, ""));
}
vs
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>

int main()
{
    std::vector<std::string> strings;
    strings.push_back("Hello");
    strings.push_back("World");
    for (std::size_t idx = 0; idx < strings.size(); idx++)
        std::cout << strings[idx];
}
Both have exactly the same number of lines and produce the same output, but the first example instantiates std::sort, which is probably an order of magnitude more code than everything else here.
If you absolutely need to count the number of bytes used by the program, use assembler.
Download the arduino IDE and 'verify' some of your existing code, or look at the sample sketches. It will tell you how many bytes that code is, which will give you an idea of how much more you can fit into a given device. Picking a couple of the examples at random, the web server example is 5816 bytes, and the LCD hello world is 2616. Both use external libraries.
Try creating a simplified version of your app, focusing on the most valuable feature first, then start adding up the 'nice (and cool) stuff to have'. Keep an eye on the byte usage shown in the Arduino IDE when you verify your code.
As a rough indication, my first app (an LED flasher controlled by a push button) requires 1092 bytes. That's roughly 1 KB out of 32 KB. A pretty small footprint for C++ code!
What worries me most is the limited amount of RAM (1 KB). If the CPU stack takes some of it, then there isn't much left for creating any data structures.
I've only had my Arduino for 48 hrs, so there is still a lot to learn to use it effectively ;-) But it's a lot of fun to use :).
It's quite a bit for a reasonably complex piece of software, but you will start bumping into the limit if you want it to have a lot of different functionality. Also, if you want to store quite a lot of static strings and data, it can eat into that quite quickly. But 32 KB is a decent amount for embedded applications. It tends to be RAM that you have problems with first!
Also, quite often the C++ compilers for embedded systems are a lot worse than the C compilers.
That is, they are nowhere as good as C++ compilers for the common desktop OS's (in terms of producing efficient machine code for the target platform).
On a Linux system you can do some experiments with statically compiled example programs. E.g.
$ size `which busybox `
text data bss dec hex filename
1830468 4448 25650 1860566 1c63d6 /bin/busybox
The sizes are given in bytes. This output is independent of the executable file format, since it reports the sizes of the different sections inside the file. The text section contains the machine code and constant data. The data section contains data for static initialization of variables. The bss size is the size of uninitialized data; of course, uninitialized data does not need to be stored in the executable file.
Well, busybox contains a lot of functionality (like all common shell commands, a shell etc.).
If you link your own examples with gcc -static, keep in mind that the libc you use may dramatically increase the program size, and that using an embedded libc may be much more space-efficient.
To test that, you can check out diet libc or uClibc and link against them. Actually, busybox is usually linked against uClibc.
Note that the sizes you get this way give you only an order of magnitude. For example, your workstation probably uses a different CPU architecture than the Arduino board, and the machine code of a different architecture may differ, more or less, in size (because of operand sizes, available instructions, opcode encoding and so on).
To go on with rough order of magnitude reasoning, busybox contains roughly 309 tools (including ftp daemon and such stuff), i.e. the average code size of a busybox tool is roughly 5k.

How to eliminate all sources of randomness so that program always gives identical answers?

I have C++ code that relies heavily on sampling (using rand()), but I want it to be reproducible. So in the beginning, I initialize srand() with a random seed and print that seed out. I want others to be able to run the same code again but initializing srand() with that same seed and get exactly the same answer as I did.
But under what circumstances is that guaranteed? I suppose that works only if the binaries are compiled with the same compiler on the same system? What are other factors that might make the answer differ from the one I got initially?
The solution is to use the same code in all cases - the Boost random number library is infinitely better than any C++ standard library implementation, and you can use the same code on all platforms. Take a look at this question for an example of its use and links to the library docs.
You're correct that the sequences might be different if compiled on different machines with different rand implementations. The best way to get around this is to write your own PRNG. The Linux man page for srand gives the following simple example (quoted from the POSIX standard):
POSIX.1-2001 gives the following example of an implementation of rand() and srand(), possibly useful when one needs the same sequence on two different machines.
static unsigned long next = 1;

/* RAND_MAX assumed to be 32767 */
int myrand(void) {
    next = next * 1103515245 + 12345;
    return((unsigned)(next/65536) % 32768);
}

void mysrand(unsigned seed) {
    next = seed;
}
To avoid this kind of problem, write your own implementation of rand()! I'm no expert on random-number generation algorithms, so I'll say no more than that...
Check out the implementation of rand(), and use one of the random number generators from there, which ensures repeatability no matter what platform you run on.