Assembler and hex dump change when formatting C++ code

A C++ project I work on does not have consistent indentation. The lead developer told me it might not be safe to reformat the code, but I thought formatting could not matter to the compiled code. As a test I reformatted one file using a formatter based on the Eclipse "GNU [built-in]" profile. When I recompiled the file, the md5sum of the object file changed. A hex dump showed that one byte had changed, so I disassembled the object file; compiling with debugging information gave me the source line involved, and diff gave me the assembly instruction that changed.
The source was this line
logErr
<< xmlutils.GetErrorMessage() << endl;
Below is the diff output showing the changed assembly
23be: 89 04 24 mov %eax,(%esp)
23c1: e8 fc ff ff ff call 23c2 <_ZN12RerouteAdapt11WriteToFileERKSs+0x64>
23c6: e8 fc ff ff ff call 23c7 <_ZN12RerouteAdapt11WriteToFileERKSs+0x69>
- 23cb: c7 44 24 04 79 01 00 movl $0x179,0x4(%esp)
+ 23cb: c7 44 24 04 84 01 00 movl $0x184,0x4(%esp)
23d2: 00
23d3: 89 04 24 mov %eax,(%esp)
23d6: e8 fc ff ff ff call 23d7 <_ZN12RerouteAdapt11WriteToFileERKSs+0x79>
The ordering of the headers was not changed by the reformat.
I know some C and C++, but very little about assembly. Is there a simple explanation for why the object file would change? I thought the C++ compiler (GCC 4.8.2 on RHEL 7) was indifferent to formatting and whitespace. There were no other differences in the assembly.

Thomas and Tim were correct. The value that changed corresponds to the line number before and after formatting: 0x179 is line 377 and 0x184 is line 388. I assumed "logErr" was just a stream; it turns out to be a macro that uses the __LINE__ macro, so the line number is hard-coded into the assembly.
#define logErr theTracer().SetFuncName(__func__); theTracer().SetFile(__FILE__); theTracer().SetLine(__LINE__); theTracer().SetError(); theTracer()
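As a minimal sketch of the effect (this is not the project's tracer, just an illustration), any macro that expands __LINE__ bakes the current source line number into the generated code, so a reformat that moves the statement to a different line changes the emitted constant:
#include <cstdio>

// Hypothetical logging macro, only to show the mechanism: __LINE__ expands to
// the line number of the statement that uses it, so that number ends up as an
// immediate operand in the object file (0x179 = 377 before the reformat,
// 0x184 = 388 after).
#define LOG_HERE() std::printf("%s:%d\n", __FILE__, __LINE__)

int main() {
    LOG_HERE();   // prints the line number of this line
    return 0;
}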
Thank you for your help.

Related

Memory Leak (false positive) when an MFC application uses a DLL

I develop a C++ library (DLL). When I create a minimal MFC project (VS2017) that links with this DLL, everything is fine. But as soon as any function of the library is used by the MFC application, the debugger of the MFC project complains about memory leaks (many lines like the ones below):
Detected memory leaks!
Dumping objects ->
{2255} normal block at 0x000002A4B1F8C360, 48 bytes long.
Data: < > 10 BB F8 B1 A4 02 00 00 B0 BD F8 B1 A4 02 00 00
{2242} normal block at 0x000002A4B1F8BDB0, 48 bytes long.
Data: < > C0 C0 F8 B1 A4 02 00 00 F0 C2 F8 B1 A4 02 00 00
{2220} normal block at 0x000002A4B1F8C2F0, 48 bytes long.
Data: < > 80 C2 F8 B1 A4 02 00 00 10 C9 F8 B1 A4 02 00 00
These are false positives, because the warnings appear even when only an empty test function of the library is called. Moreover, this does not happen when the library is linked with a non-MFC project.
What can cause these warnings? Related information:
A VS2013 user said that he could avoid the warnings by changing the character set of his MFC project. I have tested that in VS2017 but still get the warnings.
A VS2017 user said that the warnings are gone when he delay-loads the DLL.
In the course of debugging I compiled the DLL with CMake in order to use settings that are as standard as possible, but that made no difference.

How to use scan codes to shut down PC?

Someone asked me to shut down Windows with a Teensy 2.0. I have to use the following scan codes, found on win.tue.nl.
        Set-1 make/break    Set-2 make/break
Power   e0 5e / e0 de       e0 37 / e0 f0 37
Sleep   e0 5f / e0 df       e0 3f / e0 f0 3f
Wake    e0 63 / e0 e3       e0 5e / e0 f0 5e
Right now I am using the Arduino IDE with the Teensyduino add-on to program the Teensy. My question is: how can I use the scan codes above to simulate a "power" keystroke that shuts down a PC?
I hope someone can help me.
/Zwilk
You can find the definitions in the USB Keyboard Library (usb_keyboard.h, usb_keyboard.c) from PJRC. If you want to add custom keys you can adapt these files.
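A rough sketch of one way to do it: the PS/2 scan codes above do not exist on USB; the USB equivalent of the Power key is the "System Power Down" usage (0x81 on the Generic Desktop page). Assuming the Teensy is configured as a USB Keyboard and the installed Teensyduino version defines KEY_SYSTEM_POWER_DOWN (if it does not, that usage has to be added to usb_keyboard.h/.c as suggested above), something like this could send it:
// Rough sketch for a Teensy configured as "USB Keyboard" in the Arduino IDE.
// KEY_SYSTEM_POWER_DOWN is assumed to be provided by the installed Teensyduino;
// otherwise the System Power Down usage must be added to usb_keyboard.h/.c.
void setup() {
  delay(5000);                               // time to unplug before it fires
  Keyboard.press(KEY_SYSTEM_POWER_DOWN);     // "make" of the Power key
  delay(100);
  Keyboard.release(KEY_SYSTEM_POWER_DOWN);   // "break" of the Power key
}

void loop() {
}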

Setting endianness of VS debugger

I am using VS 2012 and programming in C++. I have a wide string:
wchar_t *str = L"Hello, world!";
Technically I read the string from a file but I don't know if that makes a difference. When I look at str in the memory window it looks like this:
00 48 00 65 00 6c 00 6c 00 6f 00 2c 00 20 00 77 00 6f 00 72 00 6c 00 64 00 21 00
As you can see the string is stored in memory as big-endian.
When I hover my mouse over the string I get:
L"䠀攀氀氀漀Ⰰ 眀漀爀氀搀℀"
And after I reverse the endianness of str the memory looks like:
48 00 65 00 6c 00 6c 00 6f 00 2c 00 20 00 77 00 6f 00 72 00 6c 00 64 00 21 00 00
And the hover over looks like:
L"Hello, world!"
It seems that the debugger displays UTF-16 in little-endian by default. My program reads big-endian files so it is very tedious to keep reversing the endianness of all strings to debug them. Is there any way to change the endianness of the debugger's display?
Apart from debugging, I can do all my processing in big-endian.
It's not only the debugger. The wchar_t functions of Visual Studio are little-endian, like the host. When you want to process the data you need to convert the strings to little-endian anyway.
It's worth making this change even if you later write the strings out to a file with a different endianness. Strings in a file are just a byte sequence; applying your own endianness to a string looks strange anyhow.
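A minimal sketch of that conversion (the helper is mine, not a library routine): swap each 16-bit code unit from the big-endian file order to the native order before handing the string to any wchar_t routine.
#include <string>

// Swap every UTF-16 code unit from big-endian (file order) to the native
// little-endian order used by wchar_t on Windows.
void swap_utf16_endianness(std::wstring& s)
{
    for (wchar_t& c : s)
        c = static_cast<wchar_t>(((c & 0x00FF) << 8) | ((c & 0xFF00) >> 8));
}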
Your best shot at getting this to work is to define your own type and create a debugger type visualizer for it (see Customizing the Visual Studio Debugger Display of Your Data).
Or maybe you can quick-hack it by shifting the address by one byte in the Watch window.
You're working with a non-native string format that just happens to "feel" similar to the native format, so you are tempted to think there should almost be a way to do it. But to the debugger it's just a foreign binary format. The debugger is not designed to handle foreign endianness, just as it is not designed to visualize an OGG stream packet.
If you want to use available tools for manipulating native-endian Unicode strings, you'll need to convert to native-endian Unicode format.
As has been pointed out, VS uses the native endianness, which is little-endian on Intel/AMD. The problem is that you're not reading the strings correctly; you should imbue the std::istream with a locale which reads UTF-16BE (since this is apparently the encoding form you're trying to read). std::istream (or rather the backing std::filebuf) will automatically do the code translation on the fly when reading and writing.
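A minimal sketch of that approach, assuming the C++11 <codecvt> header (available in VS2012) and a placeholder file name; std::codecvt_utf16 reads big-endian by default and only covers BMP characters when wchar_t is 16 bits:
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

int main()
{
    // Open the UTF-16BE file in binary mode and imbue a facet that converts
    // the big-endian code units on disk to native wchar_t in memory.
    std::wifstream in("data_utf16be.txt", std::ios::binary);
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf16<wchar_t>));

    std::wstring line;
    std::getline(in, line);   // 'line' now holds native-endian wide characters
}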
You can set the endianness of the Memory window using the context menu. Right-click in the Memory window and check "Big Endian".

Determine whether a lib file is 2010 Build

We are migrating our C++ source from VS2008 to VS2010 and are having issues due to incorrect lib files.
Is there any way to determine whether a lib file was built using VS2010 or VS2008?
Strictly speaking, you won't be able to get it from the lib file directly, since a lib is just a container for .obj files (or pseudo-object files in the case of import libraries). It's possible to have a library that contains object files created by different compilers, though I doubt you'll see that very often, if ever.
However, you may be able to coax the information out of the object files contained in the library.
I don't know how reliable this information is, but it appears that object files produced by MSVC contain version information about the compiler used to build them. The object file contains a section with the name ".debug$S", which will contain debugging information. However, even if you've built the object file without debugging information, there will still be a small ".debug$S" section, which might look like the following for a simple 'hello world' program compiled with VS 2008 SP1 (Compiler Version 15.00.30729.01):
RAW DATA #2
00000000: 04 00 00 00 F1 00 00 00 56 00 00 00 18 00 01 11 ....ñ...V.......
00000010: 00 00 00 00 63 3A 5C 74 65 6D 70 5C 68 65 6C 6C ....c:\temp\hell
00000020: 6F 2E 6F 62 6A 00 3A 00 3C 11 00 22 00 00 07 00 o.obj.:.<.."....
00000030: 0F 00 00 00 09 78 01 00 0F 00 00 00 09 78 01 00 .....x.......x..
00000040: 4D 69 63 72 6F 73 6F 66 74 20 28 52 29 20 4F 70 Microsoft (R) Op
00000050: 74 69 6D 69 7A 69 6E 67 20 43 6F 6D 70 69 6C 65 timizing Compile
00000060: 72 00 00 00 r...
Note that if you convert the components of the compiler version, 15.00.30729.01, to 16-bit hex numbers, you'll get (displayed in little endian):
0f 00 00 00 09 78 01 00
Which is a string you'll notice shows up twice in the ".debug$S" section at offsets 0x30 and 0x38.
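As a tiny sketch of that conversion (my own helper, not an existing tool), the following prints the little-endian byte signature for a given compiler version, so it can be searched for in a raw dump of the ".debug$S" section:
#include <cstdint>
#include <cstdio>

int main()
{
    // Compiler version 15.00.30729.01 as four 16-bit components.
    uint16_t version[4] = {15, 0, 30729, 1};
    for (uint16_t part : version)
        std::printf("%02x %02x ", static_cast<unsigned>(part & 0xff),
                                  static_cast<unsigned>(part >> 8));
    std::printf("\n");   // prints: 0f 00 00 00 09 78 01 00
    return 0;
}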
VS 2010 SP1 (compiler version 16.00.40219.01) produces the following ".debug$S":
RAW DATA #2
00000000: 04 00 00 00 F1 00 00 00 56 00 00 00 18 00 01 11 ....ñ...V.......
00000010: 00 00 00 00 43 3A 5C 74 65 6D 70 5C 68 65 6C 6C ....C:\temp\hell
00000020: 6F 2E 6F 62 6A 00 3A 00 3C 11 00 22 00 00 07 00 o.obj.:.<.."....
00000030: 10 00 00 00 1B 9D 01 00 10 00 00 00 1B 9D 01 00 ................
00000040: 4D 69 63 72 6F 73 6F 66 74 20 28 52 29 20 4F 70 Microsoft (R) Op
00000050: 74 69 6D 69 7A 69 6E 67 20 43 6F 6D 70 69 6C 65 timizing Compile
00000060: 72 00 00 00 r...
where you'll note the compiler version data 10 00 00 00 1B 9D 01 00.
Similar signatures are produced by VS 2003 through VS 2012 compilers (VC6 does not produce a ".debug$S" section, and I don't have the means to test VS 2002). However, the offsets of the information differ at times (and may differ even for the same compiler depending on the actual options used and file being compiled).
I'm not aware of a tool that will easily extract this information, but scripts that string together the lib tool and/or dumpbin could probably be cobbled together fairly easily. Microsoft's "PE and COFF Specification" document may be of some help if you want to pull apart libraries and object files yourself, though it has no real information about the .debug$S section other than that it exists and contains debugging information.
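For example (the file and member names are placeholders), lib can list and extract the members of a library, and dumpbin can dump the raw ".debug$S" bytes so you can search for the version signature:
rem List the object files inside the library, then extract one of them.
lib /list mylib.lib
lib /extract:hello.obj /out:hello.obj mylib.lib

rem Dump the raw bytes of the .debug$S section and look for the signature.
dumpbin /section:.debug$S /rawdata hello.obj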
Note that as far as I know this information is undocumented, and my reverse engineering of it is sketchy to say the least, and may change or not hold for all circumstances. I'm truly uncertain of how reliable this information is, but it's a start if no other better information shows up.

zlib decompression failing

I'm writing an application that needs to uncompress data compressed by another application (which is outside my control; I cannot make changes to its source code). The producer application uses zlib to compress data using the z_stream mechanism. It uses Z_FULL_FLUSH frequently (probably too frequently, in my opinion, but that's another matter). This third-party application is also able to uncompress its own data, so I'm pretty confident that the data itself is correct.
In my test, I'm using this third party app to compress the following simple text file (in hex):
48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0d 0a
The compressed bytes I receive from the app look like this (again, in hex):
78 9c f2 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 00 00 ff ff
If I try and compress the same data, I get very similar results:
78 9c f3 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 24 e9 04 55
There are two differences that I can see:
First, the third byte is F2 rather than F3, so the deflate "final block" bit has not been set. I assume this is because the stream interface never knows when the end of the incoming data will be, so it never sets that bit?
Second, the last four bytes of the external data are 00 00 FF FF, whereas in my test data they are 24 E9 04 55. Searching around I found on this page
http://www.bolet.org/~pornin/deflate-flush.html
...that this is a signature of a sync or full flush.
When I try to decompress my own data using the uncompress() function, everything works perfectly. However, when I try to decompress the external data, the uncompress() call fails with a return code of Z_DATA_ERROR, indicating corrupt data.
I have a few questions:
Should I be able to use the zlib "uncompress" function to uncompress data that has been compressed with the z_stream method?
In the example above, what is the significance of the last four bytes? Given that both the externally compressed data stream and my own test data stream are the same length, what do my last four bytes represent?
Cheers
Thanks to the zlib authors, I have found the answer. The third party app is generating zlib streams that are not finished correctly:
78 9c f2 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 00 00 ff ff
That is a partial zlib stream, consisting of a zlib header and a partial deflate stream. There are two blocks, neither of which is a last block. The second block is an empty stored block, used as a marker when flushing. A zlib decoder would correctly decode what's there, and then continue to look for data after those bytes.
78 9c f3 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 24 e9 04 55
That is a complete zlib stream, consisting of a zlib header, a single block marked as the last block, and a zlib trailer. The trailer is the Adler-32 checksum of the uncompressed data.
So my decompression is failing, probably because the Adler-32 trailer is missing, or because the decompression code keeps looking for more data that does not exist.
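A minimal sketch of a decoder that tolerates such an unfinished stream (the function is mine, not the third-party code): use the streaming inflate() API instead of the one-shot uncompress(), and stop when the input is exhausted instead of insisting on Z_STREAM_END and the Adler-32 trailer.
#include <cstddef>
#include <cstring>
#include <vector>
#include <zlib.h>

// Inflate a zlib stream that was produced with Z_FULL_FLUSH but never
// finished: decode what is there and accept that Z_STREAM_END (and the
// Adler-32 trailer) may never arrive.
std::vector<unsigned char> inflate_partial(const unsigned char* in, std::size_t in_len)
{
    z_stream zs;
    std::memset(&zs, 0, sizeof(zs));
    if (inflateInit(&zs) != Z_OK)
        return std::vector<unsigned char>();

    zs.next_in  = const_cast<unsigned char*>(in);
    zs.avail_in = static_cast<uInt>(in_len);

    std::vector<unsigned char> out;
    unsigned char buf[4096];
    int ret = Z_OK;
    do {
        zs.next_out  = buf;
        zs.avail_out = sizeof(buf);
        ret = inflate(&zs, Z_NO_FLUSH);
        out.insert(out.end(), buf, buf + (sizeof(buf) - zs.avail_out));
    } while (ret == Z_OK && (zs.avail_in > 0 || zs.avail_out == 0));

    inflateEnd(&zs);
    return out;   // decoded bytes; no trailer/checksum was verified
}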
A solution is here:
http://technology.amis.nl/2010/03/13/utl_compress-gzip-and-zlib/
It provides compression and decompression functions for compressed database blobs (or streams) that start with the 78 9C signature.