The file has not the right magic number - ocaml

I have a bytecode executable that previously worked well. Today, when I try to run it, I get the error: the file './analyze' has not the right magic number: expected Caml1999X029, got Caml1999X023.
I tried switching the OCaml version from 4.12.0 to, e.g., 4.07.0, which might be the version I built the bytecode with. I still get the same error when running the bytecode.
Does anyone know how I could run this bytecode correctly?

The error message looks extremely reasonable, which suggests that it is probably telling you what you need to know if you only knew what it meant :-) I don't know the structure of the bytecode magic numbers but it looks like a small difference.
You might try a few different versions of OCaml until you find one whose ocamlc generates the same magic number as your bytecode file. You can see the magic number of a bytecode file using dd (sorry, I can't think of an easier way):
$ echo $(dd bs=12 count=1 if=m.cmo 2>/dev/null)
Caml1999O027
As you can see my OCaml version (4.10.0) is generating a different magic number from either of the two you're seeing.
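As an alternative to dd, here is a minimal Python sketch that looks for the magic number at both ends of a file. It assumes, as far as I know, that compiled object files such as .cmo start with their 12-byte magic number, while bytecode executables (like ./analyze in the question) keep it in the last 12 bytes. The function names are made up for this illustration.

```python
import os

MAGIC_LEN = 12  # magic numbers such as "Caml1999X029" are 12 bytes long


def read_magic_candidates(path):
    """Return the first and last 12 bytes of a file, decoded as text.

    Assumption: .cmo-style files start with their magic number, while
    bytecode executables store it in the final 12 bytes of the file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(MAGIC_LEN)
        f.seek(max(size - MAGIC_LEN, 0))
        tail = f.read(MAGIC_LEN)
    return head.decode("ascii", "replace"), tail.decode("ascii", "replace")


def find_caml_magic(path):
    """Return whichever end of the file holds a 'Caml1999...' magic, or None."""
    for candidate in read_magic_candidates(path):
        if candidate.startswith("Caml1999"):
            return candidate
    return None
```

Calling find_caml_magic("./analyze") would then report the magic number the file was built with, which can be compared against the one the installed runtime expects.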

(Re-posting as an answer what I wrote in comments.)
According to the compiler docs, the part of the magic number after X is a number which grows monotonically with versions of the compiler. So, to find out which version of OCaml was in use when Caml1999X023 was current, you can try running your bytecode against various versions of OCaml, by bisection; or you can reverse-engineer the compiler by looking at the history of this source file or that one, which is where the magic number is set (I searched for the word "magic" with the GitHub search bar).
I did it for you: it looks like your magic number was introduced by these 2 commits in April 2018: 1, 2. According to the description of the first one, your version is 4.07.0, which is consistent with OCaml’s timeline.
I can’t tell why trying this version failed for you. I don’t know the answer to your question about js_of_ocaml-ppx (in comments).

Convenient way to find the declaration of a variable

Sometimes I am reading some code and would like to find the definition for a certain symbol, but it is sprinkled throughout the code to such an extent that grep is more or less insufficient for pointing me to its definition.
For example, I am working with Zlib and I want to figure out what FAR means.
Steven@Steven-PC /c/Users/Steven/Desktop/zlib-1.2.5
$ grep "FAR" * -R | wc -l
260
That's a lot to scan through. It turns out it is in fact #defined to nothing but it took me some time to figure it out.
If I was using Eclipse I would have it easy because I can just hover over the symbol and it will tell me what it is.
What kinds of tools out there can I use to analyze code in this way? Can GCC do this for me? clang maybe? I'm looking for something command-line preferably. Some kind of tool that isn't a full fledged IDE at any rate.
You may want to check out cscope; it's basically made for this, and it's a command-line tool (with an ncurses interface, if you like). Also, libclang (part of clang/LLVM) can do this. It's just a library, but it took me only ~100 lines of Python to use libclang to emulate basic cscope features.
cscope requires you to build a database first. libclang can parse code "live".
If the variable is not declared in your current file, it is declared in an included file, i.e. a .h file. So you can limit the amount of data by running grep only on those files.
Moreover, you can filter whole word matches with -w option of grep.
Try:
grep -w "FAR" *.h -R | wc -l
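To go one step further than counting matches, the following Python sketch searches header files only for lines that actually #define the symbol. It is a rough illustration, not a replacement for cscope; the function name and the "*.h" default are assumptions made for this example.

```python
import re
from pathlib import Path


def find_macro_definitions(root, name, patterns=("*.h",)):
    """Yield (path, line_number, line) for every '#define NAME' under root."""
    # \b after the name gives whole-word matching, similar in spirit to grep -w.
    define_re = re.compile(r"^\s*#\s*define\s+" + re.escape(name) + r"\b")
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if define_re.match(line):
                        yield str(path), lineno, line.rstrip("\n")
```

For example, list(find_macro_definitions("zlib-1.2.5", "FAR")) would list only the definitions, skipping the many lines that merely use FAR.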
Our Source Code Search Engine (SCSE) is a kind of graphical grep that indexes a large code base according to the tokens of its language(s) (e.g., C, Java, COBOL, ...). Queries are stated in terms of the tokens, not strings, so finding an identifier won't find it in the middle of a comment. This minimizes false positives, and in a big code base these can be a serious waste of time. Found hits are displayed one per line; a click takes you to the source text.
One can do queries from the command line and get grep-like responses, too.
A query of the form of
I=foo*
will find all uses of any identifier that starts with the letters "foo".
Queries can compose multiple tokens:
I=foo* '[' ... ']' '='
finds assignments to a subscripted foo ("..." means "near").
For C, Java and COBOL, the SCSE can find reads, writes, updates, and declarations of variables.
D=*baz
finds declarations of variables whose names end in "baz". I think this is what OP is looking for.
While SCSE works for C++, it presently can't find reads/writes/updates/declarations in C++. It does everything else.
The SCSE will handle mixed languages with aplomb. An "I" query will search across all languages that have identifiers, so you can see cross-language calls relatively easily, since the source and target identifiers tend to be the same for software engineering reasons.
gcc can output the preprocessing result, including all macro definitions, with gcc -E -dD. The output is usually rather large, often because of nested system headers, but the first appearance of a symbol is usually its declaration (definition). The output uses line markers (# followed by a line number and a file name, much like #line) to show which source or header file each part of the preprocessed result belongs to, so you can find where the symbol was originally declared.
To get the exact result produced when the file is compiled, you may need to add all the other parameters used to compile the file, like -I, -D, etc. In fact, I always copy the actual compilation command line, add -E -dD to the beginning, and add (or change) -o in case I accidentally overwrite anything.
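Scanning that output by hand gets tedious, so here is a hedged Python sketch that walks text captured from gcc -E -dD and reports where a macro is defined. It assumes line markers of the form # 12 "file.h" followed by optional flags; the function name is hypothetical, and the line numbers it reports should be treated as approximate.

```python
import re

LINEMARKER_RE = re.compile(r'^#\s+(\d+)\s+"([^"]*)"')
DEFINE_RE = re.compile(r"^#\s*define\s+(\w+)")


def locate_macro(preprocessed_text, name):
    """Return (file, line) pairs where `name` is #defined, in order of appearance."""
    hits = []
    current_file, current_line = None, 0
    for text in preprocessed_text.splitlines():
        marker = LINEMARKER_RE.match(text)
        if marker:
            # A line marker says which file and line the NEXT line came from.
            current_line = int(marker.group(1))
            current_file = marker.group(2)
            continue
        define = DEFINE_RE.match(text)
        if define and define.group(1) == name:
            hits.append((current_file, current_line))
        current_line += 1
    return hits
```

The text could come from something like gcc -E -dD file.c -o file.i, with the same -I and -D options as the real build.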
There is gccxml, but I am not aware of tools that build on top of it. clang and LLVM are suited for such stuff, too; equally, I am not aware of standalone tools that build on them.
Apart from that: QtCreator and Code::Blocks can find the declaration, too.
So what is it about a "full fledged IDE" you don't want? If it's a matter of speed, I found NetBeans somewhat useful when I was in school, but really, for power, speed and general utility I would recommend Emacs. It has keyboard shortcuts for things like this. Keep in mind, it has a learning curve to be sure, but once you are over the hump there is no going back.

Reverse engineering your own code c++

I have a compiled program, and I want to know whether a certain line exists in it. Is there a way, using my source code, that I could determine that?
Tony commented on my message so I'll add some info:
I'm using the g++ compiler.
I'm compiling the code on Linux(Scientific)/Unix machine
I only use standard library (nothing downloaded from the web)
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
objdump is a utility that can be used as a disassembler to view an executable in assembly form.
Use this command to disassemble a binary,
objdump -Dslx file
It is important to note, though, that disassemblers make use of the symbolic debugging information present in object files (ELF), so that information should be present in your object files. Also, constants and comments in the source code will not be part of the disassembled output.
Summary
Use source code control and keep track of which source code revision the executable's built from... it should write that into the output so you can always cross-reference the two, checkout the same sources and rebuild the executable that gave you those results etc..
Discussion
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
For the very simplest case where you want all the MD simulations to be running the latest source, you can compare timestamps on the source files with the executable to see if you forgot to recompile, and compare the process start time (e.g. as listed by ps) with the executable creation time.
Where you're deliberately deploying multiple versions of the program and only have the latest source, then it gets pretty tricky. A multiplication will typically only generate a single machine code instruction... unless you have some contextual insight you're unlikely to know which multiplication is significant (or if it's missing). The compiler may generate its own multiplications for e.g. array indexing, and may sometimes optimise multiplications into bit shifts (or nothing, as Ira comments), so it's not as simple as saying 'well, it's my only multiplication in function "X"'. If you're printing a specific line that may be easier to distinguish... if there's a unique string literal you can search for it in the executable (e.g. puts("Hello") -> strings program | grep Hello, though that may get other matches too, and the compiler's allowed to reuse string literal sequences so "Well Hello" might cater to your need via a pointer to 'H' too). If there's a new extern symbol involved you might see it in nm output etc..
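The string-literal check can also be scripted. Below is a small Python sketch, roughly equivalent in spirit to strings program | grep Hello, that reports the byte offsets at which a literal occurs in an executable. The function name is invented for this example, and the same caveats apply: a match only shows the bytes are present, not which statement produced them.

```python
def find_literal(path, literal, encoding="ascii"):
    """Return the byte offsets at which `literal` occurs in the file at `path`."""
    needle = literal.encode(encoding)
    with open(path, "rb") as f:
        data = f.read()  # fine for typical executables; huge files would want mmap
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets
```

An empty list from find_literal("./program", "Hello") suggests that build does not contain the print statement, while several offsets may mean the literal is shared or embedded in a longer string such as "Well Hello".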
All that said (woah)... you should do something altogether different really. Best is to use a source control system (e.g. svn, cvs...), and get it configured so you can do something to find out which revision of the codebase was used to create the executable - it should be a FAQ for any revision control system.
Failing that, you could, for example, do something to print out what multipliers or conditions the program was using when it starts running, capturing that in your logs. While hackish, macros allow you to "stringify" their parameters, so you can log and execute something without typing all the code twice. Lots of other options too.
Hope some of that helps....

Variable renaming for plagiarism detection for C/C++

I have a couple of simple C++ homeworks and I know the students shared code. These are smart students and they know how to cheat moss. I'm looking for a tool that can rename variables based on their types (first variable of type int will be int1, first int array will be intptr1...), or does something similar that I cannot think of now. Do you know a quick way to do this?
edit: I'm required to use moss and report 90% match
Thanks
Yep, the tool you're looking for is called a compiler. :)
Seriously, if the programs submitted are exactly the same except for the identifier names, compiling them (without debugging info) should result in exactly the same output.
If you do this with debugging turned on, the compiler may leave metadata in the executable that is different for each executable, hence the comment about ensuring it is off. This is also why this won't work for Java programs - that kind of info is present whether in debug mode or not (for the purposes of dynamic introspection).
EDIT: I see from the comments added to the question that you're observing some submissions that are different in more than just identifier names. If the programs are still structurally equivalent, this should still work.
EDIT: Given that the use of moss is a requirement, this probably isn't the way to go. It does seem, though, that moss has some support for comparing assembly - perhaps compiling to assembler and submitting that to moss is an option (depending on what compiler you're using).
You can download and try our C CloneDR duplicate code detector. It finds duplicated code even when the variable names have been changed. Multiple changes in the same chunk are treated as just one; if they rename the variables consistently everywhere, you'll get back a report of "one clone" with the precise variable substitution.
You can try Copy Paste Detector with ignoreIdentifiers turned on. You can at least use it for a first pass before going to the effort of normalizing names for moss. Or, since the source is available, maybe you can get it to spit out its internal normalization of the code.
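If you do end up normalizing names yourself before feeding files to moss, a very crude version can be sketched in a few lines of Python. This is not what moss, CloneDR or Copy Paste Detector do internally; it renames identifiers by order of first appearance rather than by type (as the question asked), does not skip comments or string literals, and uses a deliberately partial keyword list. All names here are invented for the example.

```python
import re

# Deliberately partial keyword list; extend it before using this on real code.
CPP_KEYWORDS = {
    "int", "char", "float", "double", "long", "short", "unsigned", "signed",
    "void", "bool", "const", "static", "if", "else", "for", "while", "do",
    "switch", "case", "default", "break", "continue", "return", "struct",
    "class", "public", "private", "protected", "new", "delete", "true",
    "false", "using", "namespace", "template", "typename", "sizeof",
}

# The leading \b keeps suffixes of numeric literals (0x1F, 1.0f) from matching.
IDENT_RE = re.compile(r"\b[A-Za-z_]\w*")


def normalize_identifiers(source):
    """Replace each non-keyword identifier with id1, id2, ... by first appearance."""
    mapping = {}

    def rename(match):
        word = match.group(0)
        if word in CPP_KEYWORDS:
            return word
        if word not in mapping:
            mapping[word] = f"id{len(mapping) + 1}"
        return mapping[word]

    return IDENT_RE.sub(rename, source)
```

Two submissions that differ only by consistent renaming then normalize to the same text, which can be diffed directly or submitted to moss in place of the originals.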
Another way of doing this would be to compile the applications and compare their binaries, so your examination is not limited to variable/function name changing.
A hex editor can help you with that. I just tried ExamDiff (not free) and I was happy with the result.

How to check code generated by C++ compiler?

Just like in the topic: is there any software to open (what?) - and here I don't even know what to open - a file with object code, or the exe?
My questions today (if only today's ;)) may seem a bit odd, but I'm going through the exercises in "The C++ Programming Language" by B.S. and sometimes I'm just stuck on a particular question. I'm sometimes a bit irritated by the style of this book (excellent in many aspects): he (B.S.) asks some questions for which you won't find in his book how to do it, or even where to start.
Like this one for example:
Run some tests to see if your compiler really generates equivalent code for iteration using pointers and iteration using indexing. If different degrees of optimization can be requested, see if and how that affects the quality of the generated code.
That's from chapter 5, question 8. Up to this point, nowhere does the book even mention testing and analyzing code generated by the compiler.
Anyway, if someone could help me with this I'll be grateful.
Thank you.
The debugger will help you. Most debuggers let you halt the program and look into disassembly. The nice thing is they point you right to disassembly of the line you set the breakpoint to, not to just all the compilation result.
Once in a while I do that in Visual Studio - compile the program, put a breakpoint onto the beginning of code of interest, start the program, then when it is halted I open the disassembly and immediately see the code corresponding to that C++ code.
If you're using g++, you can do g++ -S main.cpp. This will output the assembly in a file called main.s. However, if the functions you're interested in are spread across different .cpp files, it might be more convenient to do an objdump on the final executable.
There's also a nice tool called embroider that pretty-prints the objdump output for you as HTML, crosslinking the various function calls and jumps.
Many compilers can generate "listing" files of the assembly code they are generating during compilation, interspersed with the statements from the C source code. Also, there are tools which disassemble object and executable files.
How these tools are actually invoked depends on your toolchain, obviously.

Calculate SLOC GCC C/C++ Linux

We have a quite large (280 binaries) software project under Linux and currently it has a very dispersed code structure - that means one can't [work out] what code from the source tree is valid (builds to deployable binaries) and what is deprecated. But the Makefiles are good. We need to calculate C/C++ SLOC for entire project.
Here's a question - can I find out the SLOC GCC has compiled? Or maybe I can get this information from the binary (debug info, probably)? Or maybe I can find out which source files the binary was compiled from and use this info to calculate SLOC?
Thanks
Bogdan
It depends on what you mean by SLOC that GCC has compiled. If you mean tracking the source files from your project that GCC used, then you'd probably use the dependency-tracking options, which list source files and headers. That's -M and various related options. Beware of including system-provided headers. A technique I sometimes use is to replace the standard C compiler with an appropriate variation - for example, to ensure a 64-bit compilation, I use 'CC="gcc -m64"' to guarantee that when the C compiler is used, it will compile in 64-bit mode. Obviously, with a list of files, you can use wc to calculate the number of lines. You can use 'sort -u' to eliminate duplicated headers.
One obvious gotcha is if you find that everything is included with relative path names - then you have to work out more carefully where each file is.
If you have some other definition of SLOC, then you will need to specify what you have in mind. Sometimes, people are looking for non-blank, non-comment SLOC, for example - but you still need the list of source files, which I think the -M options will help you determine.
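As a rough illustration of the -M approach, the Python sketch below parses make-style dependency text (as printed by gcc -M), de-duplicates the files, skips anything under an assumed system prefix, and counts non-blank lines. The function names and the "/usr/" prefix are assumptions for this example; paths containing spaces or colons are not handled, and comment lines are still counted.

```python
def parse_make_deps(dep_text):
    """Return the set of prerequisite paths listed in `gcc -M`-style output."""
    # Join continuation lines (a backslash immediately followed by a newline).
    joined = dep_text.replace("\\\n", " ")
    files = set()
    for rule in joined.splitlines():
        if ":" not in rule:
            continue
        _target, prerequisites = rule.split(":", 1)
        files.update(prerequisites.split())
    return files


def count_nonblank_lines(paths, skip_prefixes=("/usr/",)):
    """Sum the non-blank lines of each file, skipping assumed system locations."""
    total = 0
    for path in sorted(paths):
        if path.startswith(skip_prefixes):
            continue
        with open(path, errors="replace") as f:
            total += sum(1 for line in f if line.strip())
    return total
```

Collecting the -M output of every compilation into one string and passing parse_make_deps(...) to count_nonblank_lines(...) gives a single de-duplicated count; using -MM instead of -M asks gcc to leave system headers out in the first place.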
The first thing you want is an accurate list of what you actually compiled. You can achieve this by using a wrapper script instead of gcc.
The second list you want is the list of files that were used for this. For this, consult the dependency list (as you said that was correct). (Seems you'd need make --print-data-base)
Then, sort and deduplicate the list of files, and throw out system headers. For each remaining file, determine the SLOC count using your preferred tool.
What you can do is do a pre-processor only compilation, using gcc's -E flag: this will result in output that is the actual code being compiled. Do a simple line count (wc -l) or something more advanced.
It might include extra code from macros, etc., but especially if you compare it with a previous instance of your code it is a good comparison.
Here you can find a free (GPL) tool called sloccount dedicated to estimate SLOC in projects of any size:
http://www.dwheeler.com/sloccount/
I used the following approach to get a quick-and-dirty metric value in 2 hours. Even though the precision was far from ideal, it was enough to make the decision.
We took around 40 kb of code and calculated SLOC for this code using gcov. Then we calculated "source lines per byte" metric and used it to get approximate SLOC number using C source code size for the whole project.
It worked out just fine for our needs.
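The extrapolation itself is simple arithmetic. Here is a sketch in Python, where the function name and any numbers you plug in are illustrative rather than taken from the project:

```python
def estimate_sloc(sample_sloc, sample_bytes, total_bytes):
    """Extrapolate SLOC from a measured sample using a lines-per-byte ratio."""
    if sample_bytes <= 0:
        raise ValueError("sample_bytes must be positive")
    lines_per_byte = sample_sloc / sample_bytes
    return round(lines_per_byte * total_bytes)
```

With a made-up sample of 1,000 SLOC in 40,000 bytes, a 4,000,000-byte source tree would be estimated at 100,000 SLOC. The estimate is only as good as the assumption that the sample's code density matches the rest of the project.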
Thanks
You may want to try Resource Standard Metrics as it calculates effective lines of code which exclude the standalone braces etc which are programmer style and artificially inflate SLOC counts by 10 to 33%. Ask them for a free timed license to give it a try.
Their web page is http://msquaredtechnologies.com