Following the flow of code - c++

I'm trying to learn the level format of one of my favourite games, which is almost totally undocumented. The only document that describes the format just says things like "First 12 bytes: header; 4 following bytes: number of materials; x next bytes: array of materials", and so on.
I'm very inexperienced with hex and don't completely understand what it's saying. However, there is a level editor, and its source is freely available on Google Code. I was thinking of adding it to Visual Studio and trying to learn the level format by reading how the level editor opens the files.
However, there's another problem: I don't know C++ (I know Python). This means I probably won't be able to locate which part of the code reads the bytes and whatnot.
What I'm looking for is something that will allow me to follow the flow of the code as it executes. Essentially, something that acts like setting a breakpoint on every line and shows me which specific portion of code is executing while the file contents are read.
However, setting breakpoints on every line is obviously very messy and slow. I'm looking for something that will simply show me what code is being run when I open the file in the editor.
Does anyone know what I could do? Thanks.

You're looking for a feature to step from one statement to the next; every debugger I know has such a feature. You start by setting a single breakpoint at the beginning of the interesting region, and starting from there you "step" through your code.
E.g. in Visual C++ 2010, the key F10 does one step; you can also "step into" the next statement (e.g. a method call) with F11.
In your case, set the breakpoint where the reading of the level file starts, and continue from there. Finding the place where the file is read can be a hard problem in itself, depending on how clear the code is; but if it's well-written code, there should be a method with "read" or "load" or something similar in the name. You'll figure it out!
You might have to know at least some basic C++ syntax to be able to follow what's going on, though.
I would also recommend reading up on debugging how-tos (e.g. this one).

The document which you find so obscure is just the level format specification; in most cases the specification is all you need. You also need a little experience with reading files.
When reading a file you have to worry about a few things:
1) When reading byte by byte (8 bits at a time), the order does not change.
2) When reading 32 bits at a time, the byte order can change according to the endianness of the machine.
(for example, 0x12345678 becomes 0x78563412 when the endianness changes)
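Since the asker knows Python, here is a minimal sketch of those two points using the standard struct module. The four sample bytes are made up for illustration; they are not taken from any real level file.

```python
import struct

# Four bytes exactly as they might sit in a file (made-up sample data).
raw = bytes([0x12, 0x34, 0x56, 0x78])

# Point 1: reading one byte at a time never reorders anything.
first_byte = raw[0]

# Point 2: reading the same four bytes as one 32-bit unsigned integer
# gives a different value depending on the byte order you ask for.
as_big = struct.unpack(">I", raw)[0]     # big-endian interpretation
as_little = struct.unpack("<I", raw)[0]  # little-endian interpretation


def swap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


print(hex(first_byte), hex(as_big), hex(as_little), hex(swap32(0x12345678)))
```

Which of the two interpretations is right for a given file is something the specification has to tell you, or that you find out by checking which one produces sensible numbers.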
There is a very old tutorial on loading 3D models that helped me get started working with files:
http://www.spacesimulator.net/wiki/index.php?title=Tutorials:3ds_Loader
This is useful because you get part of the specification (like in your original documentation) and it shows how you can create a loader starting from just the specification. That's all you need. It's in C, but there is no big difference from C++ in this case.
If you need some other simple file format specification with related file loader for making things clearer to you, you can also look at libktx and ktx specifications:
http://www.khronos.org/opengles/sdk/tools/KTX/file_format_spec/
If I remember correctly there's also an unofficial C++ KTX loader you can look at if you intend to write object-oriented C++ code rather than C.
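To show how a description like "first 12 bytes: header, 4 following bytes: number of materials, next bytes: array of materials" turns into a loader, here is a rough Python sketch. Several things in it are assumptions, not facts about the game's format: the count is taken to be a little-endian unsigned integer, each material is taken to be a fixed 64-byte record (MATERIAL_SIZE is hypothetical), and the sample data is synthetic.

```python
import io
import struct

MATERIAL_SIZE = 64  # hypothetical record size; the real spec must supply this


def read_exact(stream, size):
    """Read exactly `size` bytes or fail loudly on a truncated file."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return data


def read_level(stream):
    """Follow the spec in order: header, material count, material array."""
    header = read_exact(stream, 12)
    # Assumption: the count is a little-endian 32-bit unsigned integer.
    (count,) = struct.unpack("<I", read_exact(stream, 4))
    materials = [read_exact(stream, MATERIAL_SIZE) for _ in range(count)]
    return header, materials


def make_sample():
    """Build a synthetic 'level file' in memory so the sketch can run."""
    names = [b"stone", b"grass"]
    body = b"".join(name.ljust(MATERIAL_SIZE, b"\0") for name in names)
    return b"HEADER__0001" + struct.pack("<I", len(names)) + body


header, materials = read_level(io.BytesIO(make_sample()))
print(header, [m.rstrip(b"\0") for m in materials])
```

With a real file you would pass open(path, "rb") instead of the BytesIO object and adjust the sizes until the values you read back make sense.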

Related

How to extract program code from ISO file

In the Xbox 360 game Project Sylpheed: Arc of Deception, there are secret sub-objectives for each level. On stage 11, "Flaming Clouds", there are 4. I have found two sources claiming to know all of them, but that is actually untrue: one is still hidden. This is a very unpopular game, no one has investigated it, and I want to know what the missing objective is. I have a disk image of the game (an ISO file). How would I go about finding the level trigger for the sub-objective? I have already attempted to extract the 7 GB ISO using 7-Zip and WinRAR, but each yielded the same 12 MB of files that contain nothing relevant whatsoever. Obviously the core of the information is hidden and remains unextracted. Please advise.
https://wincdemu.sysprogs.org/ or linux/mac mount -o loop /path/to/my-iso-image.iso /mnt/iso
From there you're probably going to need a decompiler, but I don't know what your game is written in. You might luck out and find the levels scripted in Lua or something, though.
Something else that could be happening is that 7-Zip is actually opening the ISO correctly, and the 12 MB could be instructions to go download the actual game code from somewhere else. That sometimes happens with consoles.

Fortran source code file setup and column rules: SWAN

I'm trying to understand the Fortran source code of the SWAN wave model from SourceForge. One point of confusion is the different file types in the source folder (.edt, .ftn, .ftn90, .inc, .lst, .nml, .pl, .eps, .bat). I know what most of these files are, but the naming convention is a little boggling.
I look at swanmain.ftn and I can't understand it: does the compiler just read this file in order? There seems to be no declaration of SWMAIN (e.g. "program SWMAIN"); it just starts with IMPLICIT NONE. Then there is an END, and the next line is a bunch of USE statements. The only formal declaration of the start of SWMAIN is in a comment.
Another question: from column 75 to the end of the line there is a number, e.g. 40.30, which seems to be the version in which that line was added or edited. Yet there is no ! or c to start a comment, so is everything past column 75 just assumed to be unused? (I know Fortran has the line format of those old punch cards, but I thought it ended at column 80.)
Where should I start in understanding how this program is laid out, i.e. which file will give me the most insight into what all the other files do? Should I learn the makefile format?
Sorry for all the noob questions; I basically never studied programming and just learn while doing. I've written a few programs in Java and C++, but mostly MATLAB.
Thanks
Most of the sources should be obvious: ftn and ftn90 are Fortran sources, nml is a Namelist file, pl is a Perl script, eps is a figure, bat is a Windows batch file, inc is an "include" file (not language specific) and "lst" appears to be a list (not sure of its relevance).
The code appears to be written primarily to the FORTRAN 77 standard. As such, anything past column 72 (not 80) is ignored, just as if it were a comment (gfortran will even warn you that it's truncating the line if you have -Wall enabled).
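As a small illustration of the column rule, this Python sketch splits a fixed-form line into the part the compiler reads (columns 1 to 72) and the ignored tail. The sample line, including the FOO call, is invented for the example and is not taken from the SWAN sources; the sketch also assumes the line contains no tab characters, which would throw off the column counting.

```python
def split_fixed_form(line):
    """Split a fixed-form Fortran line into (statement, ignored_tail).

    Columns 1-72 hold the statement; anything from column 73 onward is
    ignored by a fixed-form compiler. Assumes no tabs in the line.
    """
    line = line.rstrip("\n")
    return line[:72], line[72:]


# Invented sample: a statement padded so that "40.30" starts in column 75.
sample = "      CALL FOO(X)".ljust(74) + "40.30"
statement, tail = split_fixed_form(sample)

print(repr(statement.rstrip()))  # what the compiler actually sees
print(repr(tail.strip()))        # the version tag it never looks at
```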
Fortran technically does not require a PROGRAM <name> declaration. Really the only thing required in a Fortran code is END to end it.
If you are trying to figure the program out, I would suggest:
Read the documentation at least twice.
Read the relevant publications about the code (specifically the initial announcement of the code)
Start by looking at the main program and seeing where the calls go (this is probably a very slow and bad way to do it, since it looks to be a fairly long code)

Loading a text file in to memory and analyze its contents

For educational purposes, I would like to build an IDE for PHP coding.
I made a form app and added an OpenFileDialog (my C# knowledge was useful, because it was easy, even without IntelliSense!).
Loading a file and reading lines from it is basically the same in every language (even Perl).
But my goal is to write homemade IntelliSense. I don't need info on the RichTextBox and the events it generates, end of line, EOF, etc.
The problem I have is: how do I handle the data? Line by line?
A struct for each line of the text file?
Looping over all the structs in a linked list?
While updating the RichTextBox?
Searching for opening and closing brackets, variables, etc.?
I think Microsoft stores a SQL type of database in the app project folders.
But how would you keep track of the variables and simulate them in some sort of form?
I would like to know how to handle this efficiently on dynamic text.
Having never thought this through before, it sounds like an interesting challenge.
Personally, I think you'll have to implement a lexical scanner, tokenizing the entire source file into a source tree, with each token also having information about it mapping the token to a line/character inside of the source file.
From there you can see how far you want to go with it - when someone hovers over a token, it can use the context of the code around it to be more intelligent about the "intellisense" you are providing.
Hovering over something would map back to your source tree, which (as you are building it) you would load up with any information that you want to display.
Maybe it's overkill, but it sounds like a fun project.
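The scanner idea above can be sketched in a few lines of Python. This is only a toy: the token categories are a made-up, heavily simplified subset of PHP, and it produces a flat token list rather than a real source tree. What it does show is each token carrying its line and column, and a lookup from a cursor position back to the token under it.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    line: int    # 1-based line number in the source
    column: int  # 1-based column of the token's first character


# Simplified, hypothetical token categories (not a full PHP grammar).
TOKEN_RE = re.compile(r"""
    (?P<variable>\$[A-Za-z_]\w*)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<bracket>[(){}\[\]])
  | (?P<newline>\n)
  | (?P<space>[ \t\r]+)
  | (?P<other>.)
""", re.VERBOSE)


def tokenize(source):
    """Return all non-whitespace tokens with their line/column positions."""
    tokens = []
    line, line_start = 1, 0
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "newline":
            line += 1
            line_start = match.end()
        elif kind != "space":
            column = match.start() - line_start + 1
            tokens.append(Token(kind, match.group(), line, column))
    return tokens


def token_at(tokens, line, column):
    """Find the token under a cursor position, or None if there is none."""
    for tok in tokens:
        if tok.line == line and tok.column <= column < tok.column + len(tok.text):
            return tok
    return None


source = "$total = 1;\nif ($total) {\n}\n"
tokens = tokenize(source)
hovered = token_at(tokens, line=2, column=7)
print(hovered)
```

A linear search in token_at is fine for a sketch; for large files an index keyed by line would be the obvious next step.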
This sounds related to this question:
https://softwareengineering.stackexchange.com/questions/189471/how-do-ide-s-provide-auto-completion-instant-error-checking-and-debugging
The accepted answer of that question recommends this link which I found very interesting:
http://msdn.microsoft.com/en-us/magazine/cc163781.aspx
In a nutshell, most IDEs generate the parse tree from the code and that is what they store and manage.

How does large text file viewer work? How to build a large text reader

How does a large text file viewer work?
I'm assuming that:
Threading is used to handle the file
The TextBox is updated line by line
Effective memory handling is used
Are these assumptions correct? If someone were to develop their own, what are the musts and don'ts?
I'm looking to implement one using a DataGrid instead of a TextBox.
I'm comfortable with C++ and Python. I'll probably use Qt/PyQt.
EDIT
The files I have are usually between 1.5 and 2 GB. I'm looking at editing and viewing these files.
I believe that the trick is not loading the entire file into memory, but using seek and such to just load the part which is viewed (possibly with a block before and after to handle a bit of scrolling). Perhaps even using memory-mapped buffers, though I have no experience with those.
Do realize that modifying a large file (fast) is different from just viewing it. You might need to copy the gigabytes of data surrounding the edit to a new file, which may be slow.
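For the viewing half, a rough Python sketch of the seek approach could look like this: scan the file once in chunks to record where each line starts, then seek straight to the lines that are on screen. The function names and the tiny demo file are made up for illustration. For a multi-gigabyte file the offset list itself can get large, so a compact container such as the array module may be worth considering.

```python
import os
import tempfile


def build_line_index(path, chunk_size=1 << 20):
    """Return the byte offset at which each line starts, reading in chunks."""
    offsets = [0]
    position = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                newline = chunk.find(b"\n", start)
                if newline == -1:
                    break
                offsets.append(position + newline + 1)
                start = newline + 1
            position += len(chunk)
    if offsets[-1] == position:
        offsets.pop()  # no line starts exactly at end of file
    return offsets


def read_lines(path, offsets, first, count):
    """Read `count` lines starting at 0-based line `first` by seeking."""
    window = offsets[first:first + count]
    if not window:
        return []
    with open(path, "rb") as f:
        f.seek(window[0])
        return [f.readline().rstrip(b"\r\n") for _ in window]


# Tiny demo file standing in for a multi-gigabyte one.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.txt")
    with open(path, "wb") as f:
        f.write(b"".join(b"line %d\n" % i for i in range(10)))
    index = build_line_index(path, chunk_size=16)
    visible = read_lines(path, index, first=3, count=2)

print(len(index), visible)
```

Only the index and the visible window are ever held in memory; scrolling just means calling read_lines with a different first line.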
In Kernighan and Plauger's classic (antique?) book "Software Tools in Pascal" they cover the development and design choices of a version of ed(1) and note:
"A warning: edit is a big program (excluding contributions from translit, find, and change); at 950 lines, it is fifty percent bigger than anything else in this book."
And they (literally) didn't even have string types to use. Since they note that the file to be edited may exist on tape which doesn't support arbitrary writes in the middle, they had to keep an index of line positions in memory and work with a scratch file to store changes, deletions and additions, merging the whole together upon a "save" command. They, like you, were concerned about memory constraining the size of their editable file.
The general structure of this approach is preserved in the GNU ed project, particularly in buffer.c

How to quickly debug when something wrong in code workflow?

I frequently encounter the following debugging scenario:
A tester provides some steps to reproduce a bug. To find out where the problem is, I play with these steps to get the minimum necessary to reproduce it. Sometimes, luckily, I find that with a minor change to the steps the problem goes away.
Then the job becomes finding the difference in code workflow between the two sets of steps. This job is tedious and painful, especially when you are working on a large code base and the flow goes through a lot of code and involves lots of state changes that you are not familiar with.
So I was wondering whether there are any tools available to compare "code workflow". Having learned the "wt" command in WinDbg, I thought it might be possible to do it that way. For example, I could run the "wt" command on some outermost functions with the 2 different sets of reproduction steps and then compare the outputs. Then it should be easy to find where the code flow starts to diverge.
But the problem with WinDbg is that "wt" is quite slow (maybe I should use a log file instead of output to the screen) and not very user-friendly (compared with the Visual Studio debugger). So I want to ask: are there any existing tools available? Or is it possible, and how difficult would it be, to develop a plug-in for the Visual Studio debugger to support this functionality?
Thanks
I'd run it under a profiler in "coverage" mode, then use diff on the results to see which parts of the code were executed in one run but not the other.
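To make the idea concrete, here is a toy version in Python. It uses sys.settrace as a stand-in for a real coverage tool (for a C++ code base you would use whatever coverage mode your profiler offers instead), and the handle function with its "rare" step is invented purely to have two runs that diverge.

```python
import sys


def executed_lines(func, *args):
    """Run func(*args) and return the set of (function, relative line) pairs hit."""
    seen = set()

    def tracer(frame, event, arg):
        if event == "line":
            code = frame.f_code
            # Store line numbers relative to the start of the function.
            seen.add((code.co_name, frame.f_lineno - code.co_firstlineno))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return seen


def handle(steps):
    """Invented example: one branch is only taken by the failing repro."""
    total = 0
    for step in steps:
        if step == "rare":
            total -= 1
        else:
            total += 1
    return total


good = executed_lines(handle, ["a", "b"])     # steps where the bug is gone
bad = executed_lines(handle, ["a", "rare"])   # steps that reproduce the bug
only_in_bad = sorted(bad - good)
print(only_in_bad)
```

The set difference points straight at the lines that only the failing run reached, which is where you would then set a breakpoint.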
Sorry, I don't know of a tool which can do what you want, but even if it existed it doesn't sound like the quickest approach to finding out where the lower layer code is failing.
I would recommend instrumenting your layer's code with high-level logs so you can tell which module fails, stalls, etc. In debug builds, your logger can write to a file, to the debug output window, etc.
In general, failing fast and using exceptions are good ways to find out easily where things go bad.
Doing something after the fact is not going to cut it, since your problem is reproducing it.
The issue with bugs is seldom some internal wackiness but usually what the user is actually doing. If you log all the commands that the user enters then they can simply send you the log. You can do the same for button clicks, mouse selections, etc. This will have some cost, but certainly much less than something that keeps track of every method visited.
I am assuming that if you have a large application that you have good logging or tracing.
I work on a large server product with over 40 processes and over one million lines of code. Most of the time the error in the trace file is enough to identify the location of the problem. However, sometimes the error I see in the trace file is caused by some earlier code, and the reason for this can be hard to spot. Then I use a comparative debugging technique:
Reproduce the first scenario, copy the trace to a new file (if the application is multi threaded ensure you only have the trace for the thread that does the work).
Reproduce the second scenario, copy the trace to a new file.
Remove the timestamps from the log files (I use awk or sed for this).
Compare the log files with WinMerge or similar, to see where and how they diverge.
This technique can be a little time consuming, but is much quicker than stepping through thousands of lines in the debugger.
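The strip-and-compare steps can also be scripted. The sketch below does both in Python with re and difflib; the timestamp layout (YYYY-MM-DD HH:MM:SS.mmm at the start of each line) and the sample trace lines are assumptions made for the example, so the pattern would need adapting to your own log format.

```python
import difflib
import re

# Assumed timestamp layout at the start of each trace line.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:[.,]\d+)?\s+")


def strip_timestamps(lines):
    """Remove the leading timestamp so identical events compare equal."""
    return [TIMESTAMP.sub("", line) for line in lines]


def first_divergence(trace_a, trace_b):
    """Return the 0-based index of the first differing line, or None."""
    a, b = strip_timestamps(trace_a), strip_timestamps(trace_b)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def trace_diff(trace_a, trace_b):
    """Unified diff of the two traces with timestamps removed."""
    a, b = strip_timestamps(trace_a), strip_timestamps(trace_b)
    return list(difflib.unified_diff(a, b, "good", "bad", lineterm=""))


# Invented sample traces from two runs.
good = [
    "2024-01-01 10:00:00.123 enter load",
    "2024-01-01 10:00:00.456 read header",
    "2024-01-01 10:00:01.000 exit load",
]
bad = [
    "2024-01-02 11:30:00.001 enter load",
    "2024-01-02 11:30:00.002 header missing",
    "2024-01-02 11:30:00.900 exit load",
]

print(first_divergence(good, bad))
print("\n".join(trace_diff(good, bad)))
```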
Another useful technique is producing UML sequence diagrams from trace files. For this you need the function entry and exit positions logged consistently. Then write a small script to parse your trace files and use sequence.jar to produce UML diagrams as PNG files. This is a great way to understand the logic of code you haven't touched in a while. I wrapped a small awk script in a batch file: I just provide the trace file and the line number to start at, then it untangles the threads, generates the input text for sequence.jar, and runs it to create the UML diagram.