huge C file debugging problem - c++

I have a source file in my project, which has more than 65,536 code lines (112,444 to be exact). I'm using an "sqlite amalgamation", which comes in a single huge source file.
I'm using MSVC 2005. The problem arises during debugging. Everything compiles and links OK. But when I try to step into a function with the debugger, it shows an incorrect code line.
What's interesting is that the difference between the correct line number and the one the debugger shows is exactly 65536. This makes me suspect (I'm almost sure of it) some unsigned short overflow.
I also suspect that it's not a bug in MSVC itself. Perhaps it's a limitation of the debug information format. That is, the debug information format used by MSVC stores the line numbers as 2-byte shorts.
Is there anything that can be done about this (apart from cutting the huge file into several smaller ones)?
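The suspected wraparound can be illustrated with a small sketch. The helper name lineAsSeenThrough16Bits is hypothetical, and the idea that the line number passes through a 16-bit field somewhere is only the suspicion stated above, not a documented fact about any MSVC debug format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models a line number being squeezed through a
// 16-bit unsigned field. This is an assumption for illustration only.
std::uint32_t lineAsSeenThrough16Bits(std::uint32_t actualLine) {
    // The cast keeps only the low 16 bits, i.e. actualLine modulo 65536.
    return static_cast<std::uint16_t>(actualLine);
}
```

Under that assumption, any line up to 65535 is unaffected, while the last line of the 112,444-line file would come out as 46908, which is off by exactly 65536.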

According to an MS moderator, this is a known issue with the debugger only (the compiler seems to handle it fine, as you pointed out). There is apparently no workaround other than using shorter source files. See the official response to a very similar question here.

Well, when I wanted to look at how sqlite works, I took the last 60000 or so lines, moved them to another file and then #include'd it. That was easy and did the trick for me. Also, if you do that, be careful not to split inside an #ifdef block.

If you look at the documentation for the symbolic debugging info, you will see the type used for line numbers. For example, both line and column parameters for IDiaSession::findLinesByLinenum are of type DWORD.
Edit: As @valdo points out, that still doesn't mean the debugger works properly with huge line numbers. So you have to use shorter files. It is unfortunate that such a limitation exists, but even if it didn't, I'd still recommend you split your source.

Have you looked into using WinDBG instead? It's pretty capable as the Windows team use it for debugging the O/S and there's some biiiig files in there, or at least there was when I last looked.

For anyone having issues with incorrect line numbers for files < 65536 lines: I found my issue was because of inconsistent line endings in the source file. There were 129 \r newlines where the rest of the file was \r\n style. The difference between the debugger line and the correct line was 129 as well.
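A quick way to check a file for this kind of mix is to count each line-ending style. The following is a minimal sketch; the struct and function names are hypothetical, and it assumes the file contents were read in binary mode so that no newline translation has happened yet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: how many of each line-ending style were seen.
struct LineEndingCounts {
    std::size_t crlf = 0;    // "\r\n" pairs
    std::size_t loneCr = 0;  // '\r' not followed by '\n'
    std::size_t loneLf = 0;  // '\n' not preceded by '\r'
};

// Scans raw file contents and classifies every line ending.
LineEndingCounts countLineEndings(const std::string& text) {
    LineEndingCounts counts;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            if (i + 1 < text.size() && text[i + 1] == '\n') {
                ++counts.crlf;
                ++i;  // skip the '\n' that belongs to this pair
            } else {
                ++counts.loneCr;
            }
        } else if (text[i] == '\n') {
            ++counts.loneLf;
        }
    }
    return counts;
}
```

If loneCr comes back non-zero for a file that is otherwise \r\n style, that count is a candidate for the line offset seen in the debugger.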

Unless you are modifying SQLite, you should just trust that it is doing its job. No need to step in at all. SQLite is run through a large battery of tests before release.

Related

Why does the g++ compiler add spaces between every character in my cpp file?

I'm trying to compile 3 cpp files. For only one of them, the g++ compiler on Linux is reading spaces between every character, making it impossible to compile. I get hundreds, if not thousands, of x.cpp:n:n: warning: null character(s) ignored (where x is a name and n is a number). I wrote the program in Visual Studio and copied the files to Linux. The other 2 files compile fine, and I've done this for dozens of projects. How does this happen?
I managed to fix this issue by creating a new file and copying the text from the original cpp instead of copying the file.
Now I get an error from the terminal saying Permission Denied when I try to launch the .o file.
Your compiler problem has nothing to do with line breaks.
You're trying to compile a file saved as UTF-16 (Unicode). Visual Studio will do this behind your back if the file contains any non-ASCII characters.
Solution 1 (recommended): stick to ASCII. Then the problem simply won't arise in the first place.
Solution 2: save the file in Visual Studio as UTF-8, as described here. You might need to save the file without a BOM (byte-order mark) as described here.
WRT your other problem, look for a file called a.out (yes, really) and try running that. And don't specify -c on the g++ command line.
There is no text but encoded text.
Dogmatic corollaries:
Authors choose a character encoding for a text file.
Readers must know what it is.
Any kind of shared understanding will do: specification, convention, internal tagging, metadata upon transfer, …. (Even last century's way of converting upon transfer would do in some cases.)
It seems you 1) didn't know what you chose, 2) didn't bring that knowledge with you when you copied the file between systems, and 3) didn't tell GCC.
Unfortunately, there has been a culture of hiding these basic communication needs instead of handling them mindfully, so your experience is all too common.
To tell GCC,
g++ -finput-charset=utf-16
Obviously, if you are using some sort of project system that supports keeping track of the required metadata of the relevant text files and passing it to tools, that would be preferred.
You could try adopting UTF-8 Everywhere. That won't eliminate the need for communication (until maybe the middle of this century) but it could make it more agreeable.

Program I just made is apparently a virus? C++

Okay so I just made a C++ program that is basically a notebook,
you write stuff in it and it saves it to a .dat file and then you can
read it later.
I compiled it with Microsoft Visual C++ and now I sent it to a friend and it's
saying that it is a virus? I scan it online and it also says that it's a virus.
I don't know why this is happening, as I literally just used some if/else statements, created some strings and used a couple getlines. (and fstream to create the .dat files).
This is the virus report: https://www.virustotal.com/en/file/a1b72280a32915429607fd5abeef1aad4f8310867df1feb7707ea0f7a404026e/analysis/1455735299/
Here is my code. (It's 400+ lines). And I'm almost certain there's nothing wrong
with it. http://pastebin.com/ZwJZrRSu
Any idea why this is happening?
Most probably your PC is already infected by a virus that adds itself to any executable it can find on your machine. That would easily explain this behavior. Try to compile the same program on a PC that is known to be clean, and scan your own PC with antivirus software.
I am not sure, but I think it is because you imported kernel32.dll.
Again, it is hard to tell without the source.
Also take a look at the file details in the report.

Breakpoints do not point to the actual code

Does anyone have a clue what could cause a breakpoint not to show the actual place of the code in a specific file?
This is the second time this has happened to me.. maybe someone could help, my parameters:
I am working in visual studio 2010.
This one specifically is a static lib but it also happened to me inside dll's.
The PDB's are generated in Z7, although this has also happened to me in the default pdb generation.
I am sure the code is compiled with the correct lib (this also happened in DLLs, so..)
Also, I have some Doxygen comments that I first suspected of causing this problem (could it be?)
Attached is an image that shows where the breakpoint arrow is compared to the call stack of where it ACTUALLY is..
Thanks!
If you debug code that has optimization enabled, the method might just be inlined. This is at least one proven source of breakpoints not pointing to the right position.
One scenario I have often noticed is the source file changing while debugging, for example when a newer version is fetched from source control. The breakpoint then uses the line number from the older code.
So, apparently the Visual Studio text editor doesn't adapt well to CR..
I found that the file had some CR (and not CRLF) line endings, and that confused the compiler altogether..
When I actually made a compile error on purpose, it didn't even point to the correct line...
So I added LineFeeds(LF) after every CR and it compiles fine...
(Used notepad++ to detect where it was missing but I'm sure VS has a way as well..)
Cheers.
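The same repair can be scripted instead of done by hand. This is a minimal sketch with a hypothetical function name; unlike the manual fix above, it also converts lone LF to CRLF so the result is uniform, which is a choice of this example and not something the fix requires. It assumes the contents are read and written in binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites every line ending (CR, LF, or CRLF) as CRLF.
std::string normalizeToCrlf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r') {
            out += "\r\n";
            // If this CR already starts a CRLF pair, consume its LF too.
            if (i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
        } else if (c == '\n') {
            out += "\r\n";  // lone LF
        } else {
            out += c;
        }
    }
    return out;
}
```

Existing CRLF pairs pass through unchanged, so running it twice gives the same result as running it once.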

Following the flow of code

I'm trying to learn the level format in one of my favourite games, which is almost totally undocumented. Basically, the only document that describes the level format does so by saying things like "First 12 bytes: header; 4 following bytes: number of materials; x next bytes: array of materials", and so on.
I'm very inexperienced in hex and don't completely understand what they're saying. However, there is a level editor, and the source is freely available on Google Code. I was thinking of adding this into my Visual Studio and trying to learn the level format by reading how the level editor opens the files.
However, there is another problem: I don't know C++ (I know Python). This means I probably won't be able to locate which part of the code reads the bytes and whatnot.
What I'm looking for, is something that will allow me to follow the flow of the code, in its execution. Essentially something that acts similar to setting a breakpoint on every line, and having it show me what specific portion of code is executing when reading the file contents.
However, obviously setting breakpoints on every line is very messy and slow. I'm looking for something that will simply show me what code is being run when I open the file in the editor.
Does anyone know what I could do? Thanks.
You're looking for a feature to step from one statement to the next; every debugger I know has such a feature. You start by setting a single breakpoint at the beginning of the interesting region, and starting from there you "step" through your code.
E.g. in Visual C++ 2010, the key F10 does one step; you can also "step into" the next statement (e.g. a method call) with F11.
In your case, set the breakpoint where the reading of the level file starts, and continue from there. Finding the place where the file is read can be a hard problem as well, depending on the clarity of the code; but if it's well-written code, there should be a method with "read" or "load" or something similar in the name. You'll figure it out!
You might have to know at least some basic C++ syntax to be able to follow what's going on, though.
I would also recommend reading up on debugging HowTos (e.g. this one).
The document which you find so obscure is just the level format specification; in most cases the specification is all you need. You also need a little extra experience with file reading.
When reading a file you have to worry about a few things.
1) When reading byte by byte (8 bits at a time), the order does not change.
2) When reading 32 bits at a time, the byte order can change according to the endianness of the machine.
(for example 0x12345678 becomes 0x78563412 when the endianness changes)
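The two points above can be sketched in a few lines. The function names are hypothetical, and which byte order the level format actually uses is not stated here, so that has to come from its specification or from the editor's source. Assembling the value from individual bytes gives the same result on any machine.

```cpp
#include <cassert>
#include <cstdint>

// Builds a 32-bit value from 4 bytes stored least-significant byte first.
std::uint32_t readU32LE(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Builds a 32-bit value from 4 bytes stored most-significant byte first.
std::uint32_t readU32BE(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Reverses the byte order of a 32-bit value.
std::uint32_t byteSwap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

For the bytes 78 56 34 12 in file order, readU32LE yields 0x12345678 and readU32BE yields 0x78563412, matching the example above.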
There is a very old tutorial on loading 3D models that helped me get started with working with files:
http://www.spacesimulator.net/wiki/index.php?title=Tutorials:3ds_Loader
This is useful because it gives you part of the specification (as in the original documentation) and shows how you can create a loader starting from just the specification. That's all you need. It's C, but there is no big difference from C++ in this case.
If you need some other simple file format specification with related file loader for making things clearer to you, you can also look at libktx and ktx specifications:
http://www.khronos.org/opengles/sdk/tools/KTX/file_format_spec/
If I remember correctly, there's also an unofficial C++ KTX loader you can look at if you intend to write object-oriented C++ code rather than C.

Creating a dll with very large files

This is the first time I am writing in this forum, I hope someone could help me. I have been searching on the Web but have not found any answer related to my question.
I have a very large file (about 25000 lines) with thousands of definitions that must be used by another file.
All these files (and about 600 more of them) are converted to .c files using a special tool. I am almost sure this conversion is done properly.
If I create an .exe with all these files, there is no problem and everything works all right. Unfortunately, I need a .dll, which crashes when I try to access the very large file.
I have checked that its .obj file is larger than 65MB, so I have added the /bigobj compiler option, as I have seen suggested on the Internet, but it didn't solve the problem.
I have also checked that the problem happens when accessing the large file, because everything works OK when I join both files (which is not possible in my development).
I am using Visual Studio 2008.
Could it be related to compiling as C (/TC) or C++ (/TP) code? What's the difference between an .exe and a .dll that may make my program crash?
Any ideas please?
Thanks in advance
Indeed, without the code not much can be said... (tho not sure if anyone would have the patience of reading 600 files each with 25k lines of code :) )
As advice, rebuild the exe and dll in debug mode, run the exe from MSVC, then put a breakpoint where you know it crashes. Next set a data breakpoint on the variable after you get its address from the watch window. ASSUMING the app does what it should correctly, then the pointer is set, but lost along the way; that means it should be triggered twice.
Alternatively, try an assertion check.
Another possibility is that the variable is volatile.
Another possibility is that the value is returned from a temporary and gets lost...
And last but not least, the value may never be set because of wrong/bad conditions...
If your problem is the crash and not the missing value, just do a null check and return from the call if you really want to avoid the complication. However, I would recommend you find out why the value isn't set. Your choice.