Memory Layout of application

Memory Layout of application - c++

The following question is a head-scratcher for me. Assuming that I have two platforms
with an identical hardware, the same OS and the same compiler on it. If I compile exactly the
same application, can I be sure that the memory layout on both machines will exactly be the same? In other words, both applications have exaclty the same virtual address space or is
there a high chance that this is not the case.
Thanks for your thoughts about this!

You can't count on it. As a security feature, some OS's (including Windows) randomize memory layout to some extent.
(Here's a supporting link: http://blogs.msdn.com/b/winsdk/archive/2009/11/30/how-to-disable-address-space-layout-randomization-aslr.aspx)

It is highly improbable that an application will be executed in the same address space on the same platform, nonetheless on another computer. Other applications may be running which will affect where the OS loads your application.
Another point to consider is that some applications load run-time libraries (a.k.a. DLLs & shared libraries) on demand. An application may have a few DLLs loaded or not when your application is running.
In non-embedded platforms, the majority of applications don't care about exact physical memory locations, nor is it a concern that they are loaded in the same location each time. Most embedded platforms load their applications in the same place each time, as they don't have enough memory to move it around.
Because of these cases and the situations other people have mentioned, DO NOT CODE CONSTANT MEMORY LOCATION principles into your program. Very bad things will happen, especially difficult to trace and debug.

Apart from dynamic question such as stack addresses as Steven points out, there is also the aspect of compile time and static layout.
Already I think that two machines are exact clones of each other is a very particular situation, since you may have tiny differences in CPU version, libraries etc. Then some compilers (perhaps depending on some options) also put the compile time and date in the executable. If e.g your two hostnames have different lengths or this uses a date format that varies in length, not only these strings will be different, but all other static variables might be slightly shifted in address space.
I remember that gcc had difficulties on some architectures with its automatic build since the compiler that was produced in stage 2 was differed of the one build in stage 3 for such stupid reasons.

The __TIME__ macro expands to (the start of) the compilation time. Furthermore, it's determined
independently for each and every .cpp file that you compile, and the linker can eliminate duplicate strings.
As a result, depending on the compile speed, your executables may end up not just with different __TIME__ strings, but even a different number of __TIME__ strings.
If you're working late, you could see the same with __DATE__ strings ;)

Is it possible for them to have the same memory layout? Yes, it is a possibility. Is it probable? Not really.
As others have pointed out, things like address space randomization and __TIME__ macros can cause the address space to differ (whether the changes were made at compile time or run time). In my experiences, many compilers don't produce the same identical output when run twice on the same machine using the exact same input (functions are laid out in memory in different orders, etc).
Is this a rhetorical/intellectual question, or is this causing you to run into some kind of problem with a program you are writing?

Related

Has image base address optimization sense? [duplicate]

Rebasing a DLL means to fix up the DLL such, that it's preferred load adress is the load address that the Loader is actually able to load the DLL at.
This can either be achieved by a tool such as Rebase.exe or by specifying default load addresses for all your (own) dlls so that they "fit" in your executable process.
The whole point of managing the DLL base addresses this way is to speed up application loads. (Or so I understand.)
The question is now: Is it worth the trouble?
I have the book Windows via C/C++ by Richter/Nazarre and they strongly recommend[a] making sure that the load addresses all match up so that the Loader doesn't have to rebase the loaded DLLs.
They fail to argue however, if this speeds up application load times to any significant amount.
Also, with ASLR it seems dubious that this has any value at all, since the load addresses will be randomized anyway.
Are there any hard facts on the pro/cons of this?
[a]: In my WvC++/5th ed it is in the sections titled Rebasing Modules and Binding Modules on pages 568ff. in Chapter 20, DLL Advanced Techniques.

Patching the relocatable addresses isn't the big deal, that runs at memory speeds, microseconds. The bigger issue is that the pages that contains this code now need to be backed up by the paging file instead of the DLL file. In other words, when pages containing code are unmapped, they need to be written to the paging file instead of just getting discarded.
The cost of this isn't that easy to measure, especially on modern machines with lots of RAM. It only counts when the machine starts to get under load with lots of processes competing for memory. And the fragmentation of the paging file.
But clearly, rebasing is a very cheap optimization. And it is very easy to see in the Debug + Windows + Modules window, there's a bright icon on the rebased DLLs. The Address column gives you a good hint what base address would be a good choice. Leave ample space between them so you don't constantly have to tweak this as your program grows.

I'd like to provide one answer myself, although the answers of Hans Passant and others are describing the tradeoffs already pretty well.
After recently fiddling with DLL base addresses in our application, I will here give my conclusion:
I think that, unless you can prove otherwise, providing DLLs with a non-default Base Address is an exercise in futility. This includes rebasing my DLLs.
For the DLLs I control, given the average application, each DLL will be loaded into memory only once anyway, so the load on the paging file should be minimal. (But see the comment of Michal Burr in another answer about Terminal Server environment.)
If DLLs are provided with a fixed base address (without rebasing) it will actually increase address space fragmentation, as sooner or later these addresses won't match anymore. In our app we had given all DLLs a fixed base address (for other legacy reasons, and not because of address space fragmentation) without using rebase.exe and this significantly increased address space fragmentation for us because you really can't get this right manually.
Rebasing (via rebase.exe) is not cheap. It is another step in the build process that has to be maintained and checked, so it has to have some benefit.
A large application will always have some DLLs loaded where the base address does not match, because of some hook DLLs (AV) and because you don't rebase 3rd party DLLs (or at least I wouldn't).
If you're using a RAM disk for the paging file, you might actually be better of if loaded DLLs get paged out :-)
So to sum up, I think that rebasing isn't worth the trouble except for special cases like the system DLLs.
I'd like to add a historical piece that I found on Old New Thing: How did Windows 95 rebase DLLs? --
When a DLL needed to be rebased, Windows 95 would merely make a note
of the DLL's new base address, but wouldn't do much else. The real
work happened when the pages of the DLL ultimately got swapped in. The
raw page was swapped off the disk, then the fix-ups were applied on
the fly to the raw page, thereby relocating it. The fixed-up page was
then mapped into the process's address space and the program was
allowed to continue.
Looking at how this process is done (read the whole thing), I personally suspect that part of the "rebasing is evil" stance dates back to the olden days of Win9x and low memory conditions.
Look, now there's a non-historical piece on Old New Thing:
How important is it nowadays to ensure that all my DLLs have non-conflicting base addresses?
Back in the day, one of the things you were exhorted to do was rebase
your DLLs so that they all had nonoverlapping address ranges, thereby
avoiding the cost of runtime relocation. Is this still important
nowadays?
...
In the presence of ASLR, rebasing your DLLs has no effect because ASLR is going to ignore your base address anyway and relocate the DLL into a location of its pseudo-random choosing.
...
Conclusion: It doesn't hurt to rebase, just in case, but understand
that the payoff will be extremely rare. Build your DLL with
/DYNAMICBASE enabled (and with /HIGHENTROPYVA for good measure)
and let ASLR do the work of ensuring that no base address collision
occurs. That will cover pretty much all of the real-world scenarios.
If you happen to fall into one of the very rare cases where ASLR is
not available, then your program will still work. It just may run a
little slower due to the relocation penalty.
... ASLR actually does a better job of avoiding collisions than manual
rebasing, since ASLR can view the system as a whole, whereas manual
rebasing requires you to know all the DLLs that are loaded into your
process, and coordinating base addresses across multiple vendors is
generally not possible.

They fail to argue however, if this speeds up application load times to any significant amount.
The load time change is minimal, because the v-table is what gets updated with the new addresses. However, if you have low memory - enough that stuff gets loaded in/out of the page file, then the system has to keep the dll in the page file (since the addresses are changed). If the dlls were rebased - and the rebased dlls don't collide with any other dlls - then instead of swapping them out to the page file (and back), the system just overwrites the memory and reloads the dll from the original on the hard drive.
The benefit is only relevant when systems are paging stuff in and out of main memory. The last time I made efforts to keep databases of applications and their base addresses was back in VB6 days, when the computers in our offices and data centers were lucky to have even 256MB of RAM.
Also, with ASLR it seems dubious that this has any value at all, since the load addresses will be randomized anyway.
At the moment ASLR only affects dlls and executables with the dynamic-relocation flag set. This includes Vista/Win7 system dlls and executables, and any developer made items where the developer intentionally set that flag during the build.
If you are going to set the dynamic-relocation flag, then don't bother rebasing the dlls. If all your clients have 4GB of RAM, then don't bother. If your boss is a cheapskate, then maybe.

You have to consider that user DLLs (that are not already loaded into another processes) has to be read from HDD. Usually the memory mapping is used for that (and it uses lazy loading), so if they have to be relocated, they'll have to be actually read from HDD before the process can start.
For those loaded by other processes the copy-on-write mechanism is used. So, again, relocating them will mean additional operations.
What's about ASLR, it's intended for security purposes, not for performance.

Yes, you should do it.
ASLR only impacts "system" DLLs and therefore the ones you are writing should not be impacted by ASLR. Additionally, ASLR doesn't completely "randomize" the location of these system binaries, it simply shuffles them around in the basic spot in the vm map.

Force executable into memory?

I have a cpp executable (it contains static libraries), about 1MB in size. When I run the exe, it consumes less than 200kb memory.
From what I understand this means the computer reads the exe little by little when it's needed from the HDD.
I want to improve the performance, even a bit, so, how can I say "load the exe into memory" and don't touch the HDD? Will this bring any performance improvement?

The OS will load parts of the executable into memory as it is needed. This is where knowing more about the instruction cache might be useful. The idea is that you structure your program so that common code is grouped together. For example, you might have some functions that are getting inlined - in this case the OS would have to load the same code in multiple places which might be slow. By removing the inline you'd have the code in one chunk in memory which would get cached and thus reduce loading time.
I would agree with the others though that this type of optimization should really be reserved until after you profile and know for sure that this is the bottleneck, which is very unlikely

If you really want to do this, you need to touch the memory pages by reading from them. But forcing pages into memory once does not guarantee that they will remain in memory. An apparent alternative solution would be to VirtualLock the region, but in practice this function doesn't work the way you'd think (at least on any system where I've used it), even if you have the appropriate privilegues.
Note that the default minimum working set is only 16MB, so for larger executables, forcing pages into RAM will necessarily push others (which you need!) out of the working set, so this is in fact an anti-optimization. Unless you have the necessary privilegues to increase the working set size.
It's a bit tedious to find out where the executable's mapping starts and ends. Not that it is impossible, but it's much more complicated than just mapping the file again. Then you simply run a loop which reads one byte every 4096 bytes, and you are done. This will consume twice as much address space, but will consume the same amount of RAM (thanks to how memory mapping works).
But, realistically, you will gain absolutely nothing from doing this.
The operating system does not need to load the entire executable and does not need to keep it resident at all times. Part of your executable will be debug info or import info, which the loader will maybe look at once (or won't look at) and never need afterwards. Forcing that stuff into memory only means you purge useful pages from the working set.
The OS likely has the parts (or most of it) that are not visible to you in the buffer cache anyway, but even if that isn't the case, you will hardly ever notice a difference.

Globally, forcing all of the program into RAM will slow it down.
There are usually large parts of the code which aren't executed
in any given run, and there's no need to ever read these from
disk.
Where forcing all or parts of the program into RAM can make a difference
is latency. If you're responding in real time to external
events, having to load the code in order to respond will reduce
latency. This can only be done by using a system specific
request (e.g. mlock under Posix systems supporting the read
time extension). You'll probably have to have special rights to
be able to do it, though. In practice, it should only be used
on machines dedicated to a specific application, since it can
have a very negative impact on the total system performance.
(There's a reason that it's in the real-time extensions, and not
in the basic Posix.) Locking the addresses used by the function in memory means that there can be no page faults when it is executed.

May a compiler ever generate code to unload parts of the code segment during execution?

Apart from Dll concept that provides ability of loading/unloading methods or functions at run-time, I'm wondering if a compiler may ever say something like, ok as this particular part of the code takes considerable amount of space in code segment and is never gonna be used again after this point during program execution, it'd be good to generate some code to unload that part of code segment after reaching that particular point during program execution so that overall space took by code segment gets smaller. Is it something just fictional or may that happen?

Sure. There's a technique called overlaying that loads different code into the same bit of address space at different times. Sometimes it was done manually, other times compilers helped. Sometimes the loading is done in software, sometimes in hardware (with address multiplexing, so that e.g. during boot time one bit of address space reads from a ROM chip, but after boot it switches to address RAM or a different ROM instead).
Overlaying was much more common when computers had less memory, e.g. in the early days of DOS where you had 640K at best and often not even that. These days it still has applications for embedded systems where memory and/or address space are at a premium.

A compiler can do anything it wants to, as long as that doesn't violate the standard. If it can figure out that the code is never called again, it can ditch it completely.
It could even replace it with a smaller stub function that would reload the code, were it required.
But you'll be very unlikely to ever see that in a modern OS since the OS itself provides that capability under the covers.
Operating systems (at least the common ones) will swap out your physical pages when memory runs low, and they won't be reloaded until they're needed (usually by a page fault when trying to access them).

Yes, Windows Device Drivers use this technique. The LE file format has certain code segments marked as discardable. The OS can also certain times take such a decision to swap out code segments that have not been used for a long time.
Stricly speaking however, this is not the area for compiler to play around with. It is mostly the linker/loader/OS that affect this.

I don't know of a compiler which does this, but there's no rule prohibiting it. If the compiler can prove that doing so won't change the semantics of the program, then it is allowed under the as-if rule.
It is usually not necessary however, because unused code can already get swapped out as part of the pagefile mechanism associated with virtual memory. (and because you'd probably only save a few KB of memory space)

Is rebasing DLLs (or providing an appropriate default load address) worth the trouble?

Rebasing a DLL means to fix up the DLL such, that it's preferred load adress is the load address that the Loader is actually able to load the DLL at.
This can either be achieved by a tool such as Rebase.exe or by specifying default load addresses for all your (own) dlls so that they "fit" in your executable process.
The whole point of managing the DLL base addresses this way is to speed up application loads. (Or so I understand.)
The question is now: Is it worth the trouble?
I have the book Windows via C/C++ by Richter/Nazarre and they strongly recommend[a] making sure that the load addresses all match up so that the Loader doesn't have to rebase the loaded DLLs.
They fail to argue however, if this speeds up application load times to any significant amount.
Also, with ASLR it seems dubious that this has any value at all, since the load addresses will be randomized anyway.
Are there any hard facts on the pro/cons of this?
[a]: In my WvC++/5th ed it is in the sections titled Rebasing Modules and Binding Modules on pages 568ff. in Chapter 20, DLL Advanced Techniques.

Patching the relocatable addresses isn't the big deal, that runs at memory speeds, microseconds. The bigger issue is that the pages that contains this code now need to be backed up by the paging file instead of the DLL file. In other words, when pages containing code are unmapped, they need to be written to the paging file instead of just getting discarded.
The cost of this isn't that easy to measure, especially on modern machines with lots of RAM. It only counts when the machine starts to get under load with lots of processes competing for memory. And the fragmentation of the paging file.
But clearly, rebasing is a very cheap optimization. And it is very easy to see in the Debug + Windows + Modules window, there's a bright icon on the rebased DLLs. The Address column gives you a good hint what base address would be a good choice. Leave ample space between them so you don't constantly have to tweak this as your program grows.

They fail to argue however, if this speeds up application load times to any significant amount.
The load time change is minimal, because the v-table is what gets updated with the new addresses. However, if you have low memory - enough that stuff gets loaded in/out of the page file, then the system has to keep the dll in the page file (since the addresses are changed). If the dlls were rebased - and the rebased dlls don't collide with any other dlls - then instead of swapping them out to the page file (and back), the system just overwrites the memory and reloads the dll from the original on the hard drive.
The benefit is only relevant when systems are paging stuff in and out of main memory. The last time I made efforts to keep databases of applications and their base addresses was back in VB6 days, when the computers in our offices and data centers were lucky to have even 256MB of RAM.
Also, with ASLR it seems dubious that this has any value at all, since the load addresses will be randomized anyway.
At the moment ASLR only affects dlls and executables with the dynamic-relocation flag set. This includes Vista/Win7 system dlls and executables, and any developer made items where the developer intentionally set that flag during the build.
If you are going to set the dynamic-relocation flag, then don't bother rebasing the dlls. If all your clients have 4GB of RAM, then don't bother. If your boss is a cheapskate, then maybe.

You have to consider that user DLLs (that are not already loaded into another processes) has to be read from HDD. Usually the memory mapping is used for that (and it uses lazy loading), so if they have to be relocated, they'll have to be actually read from HDD before the process can start.
For those loaded by other processes the copy-on-write mechanism is used. So, again, relocating them will mean additional operations.
What's about ASLR, it's intended for security purposes, not for performance.

Yes, you should do it.
ASLR only impacts "system" DLLs and therefore the ones you are writing should not be impacted by ASLR. Additionally, ASLR doesn't completely "randomize" the location of these system binaries, it simply shuffles them around in the basic spot in the vm map.

Are there any downsides to using UPX to compress a Windows executable?

I've used UPX before to reduce the size of my Windows executables, but I must admit that I am naive to any negative side effects this could have. What's the downside to all of this packing/unpacking?
Are there scenarios in which anyone would recommend NOT UPX-ing an executable (e.g. when writing a DLL, Windows Service, or when targeting Vista or Win7)? I write most of my code in Delphi, but I've used UPX to compress C/C++ executables as well.
On a side note, I'm not running UPX in some attempt to protect my exe from disassemblers, only to reduce the size of the executable and prevent cursory tampering.

... there are downsides to
using EXE compressors. Most notably:
Upon startup of a compressed EXE/DLL, all of the code is
decompressed from the disk image into
memory in one pass, which can cause
disk thrashing if the system is low on
memory and is forced to access the
swap file. In contrast, with
uncompressed EXE/DLLs, the OS
allocates memory for code pages on
demand (i.e. when they are executed).
Multiple instances of a compressed EXE/DLL create multiple
instances of the code in memory. If
you have a compressed EXE that
contains 1 MB of code (before
compression) and the user starts 5
instances of it, approximately 4 MB of
memory is wasted. Likewise, if you
have a DLL that is 1 MB and it is used
by 5 running applications,
approximately 4 MB of memory is
wasted. With uncompressed EXE/DLLs,
code is only stored in memory once and
is shared between instances.
http://www.jrsoftware.org/striprlc.php#execomp

I'm surprised this hasn't been mentioned yet but using UPX-packed executables also increases the risk of producing false-positives from heuristic anti-virus software because statistically a lot of malware also uses UPX.

There are three drawbacks:
The whole code will be fully uncompressed in virtual memory, while in a regular EXE or DLL, only the code actually used is loaded in memory. This is especially relevant if only a small portion of the code in your EXE/DLL is used at each run.
If there are multiple instances of your DLL and EXE running, their code can't be shared across the instances, so you'll be using more memory.
If your EXE/DLL is already in cache, or on a very fast storage medium, or if the CPU you're running on is slow, you will experience reduced startup speed as decompression will still have to take place, and you won't benefit from the reduced size. This is especially true for an EXE that will be invoked multiple times repeatedly.
Thus the above drawbacks are more of an issue if your EXE or DLLs contains lots of resources, but otherwise, they may not be much of a factor in practice, given the relative size of executables and available memory, unless you're talking of DLLs used by lots of executables (like system DLLs).
To dispell some incorrect information in other answers:
UPX will not affect your ability to run on DEP-protected machines.
UPX will not affect the ability of major anti-virus software, as they support UPX-compressed executables (as well as other executable compression formats).
UPX has been able to use LZMA compression for some time now (7zip's compression algorithm), use the --lzma switch.

The only time size matters is during download off the Internet. If you are using UPX then you actually get worse performance than if you use 7-zip (based on my testing 7-Zip is twice as good as UPX). Then when it is actually left compressed on the target computer your performance is decreased (see Lars' answer). So UPX is not a good solution for file size. Just 7zip the whole thing.
As far as to prevent tampering, it is a FAIL as well. UPX supports decompressing too. If someone wants to modify the EXE then they will see it is compress with UPX and then uncompress it. The percentage of possible crackers you might slow down does not justify the effort and performance loss.
A better solution would be to use binary signing or at least just a hash. A simple hash verification system is to take a hash of your binary and a secret value (usually a guid). Only your EXE knows the secret value, so when it recalculates the hash for verification it can use it again. This isn't perfect (the secret value can be retrieved). The ideal situation would be to use a certificate and a signature.

The final size of the executable on disk is largely irrelevant these days. Your program may load a few milliseconds faster, but once it starts running the difference is indistinguishable.
Some people may be more suspicious of your executable just because it is compressed with UPX. Depending on your end users, this may or may not be an important consideration.

The last time I tried to use it on a managed assembly, it munged it so bad that the runtime refused to load it. That's the only time I can think of that you wouldn't want to use it (and, really, it's been so long since I tried that that the situation may even be better now). I've used it extensively in the past on all types of unmanaged binaries, and never had an issue.

If your only interest is in decreasing the size of the executables, then have you tried comparing the size of the executable with and without runtime packages? Granted you will have to also include the sizes of the packages overall along with your executable, but if you have multiple executables which use the same base packages, then your savings would be rather high.
Another thing to look at would be the graphics/glyphs you use in your program. You can save quite a bit of space by consolidating them to a single Timagelist included in a global data module rather than have them repeated on each form. I believe each image is stored in the form resource as hex, so that would mean that each byte takes up two bytes...you can shrink this a bit by loading the image from a RCData resource using a TResourceStream.

There are no drawbacks.
But just FYI, there is a very common misconception regarding UPX as--
resources are NOT just being compressed
Essentially you are building a new executable that has a "loader" duty and the "real" executable, well, is being section-stripped and compressed, placed as a binary-data resource of the loader executable (regardless the types of resources were in the original executable).
Using reverse-engineering methods and tools either for education purposes or other will show you the information regarding the "loader executable", and not variable information regarding the original executable.

IMHO routinely UPXing is pointless, but the reasons are spelled above, mostly, memory is more expensive than disk.
Erik: the LZMA stub might be bigger. Even if the algorithm is better, it does not always be a net plus.

Virus scanners that look for 'unknown' viruses can flag UPX compressed executables as having a virus. I have been told this is because several viruses use UPX to hide themselves. I have used UPX on software and McAfee will flag the file as having a virus.

The reason UPX has so many false alarms is because its open licensing allows malware authors to use and modify it with impunity. Of course, this issue is inherent to the industry, but sadly the great UPX project is plagued by this problem.
UPDATE: Note that as the Taggant project is completed, the ability to use UPX (or anything else) without causing false positives will be enhanced, assuming UPX supports it.

I believe there is a possibility that it might not work on computers that have DEP (Data Execution Prevention) turned on.

When Windows load a binary, first thing it does is called Import/Export Table resolution. Ie, what ever API and DLL that is indicated in the Import Table, it will first load the DLL into a randomly generated base address. And using the base address plus offset into the DLL's function, this information will be updated to the Import Table.
EXE does not have Export Table.
All these happened even before jumping to the original entry point for execution.
Then after it start executing from the entry point, the EXE will run a small piece of code before starting the decompression algorithm. This small piece of code also means that the Windows API needed will be very small, resulting in a small Import Table.
But after the binary is decompressed, if it started to use any Windows API not resolved before, then likely it is going to crash. So it is essential that the decompression routine will resolve and update the Import Table for all the referenced Window API inside the decompressed codes, before executing the decompressed codes.
References:
https://malwaretips.com/threads/malware-analysis-2-pe-imports-static-analysis.62135/

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js