Does image base address optimization make sense? [duplicate] - c++

Rebasing a DLL means fixing up the DLL so that its preferred load address is the address at which the loader is actually able to load it.
This can be achieved either with a tool such as Rebase.exe or by specifying default load addresses for all your (own) DLLs so that they "fit" into your process's address space.
The whole point of managing the DLL base addresses this way is to speed up application loads. (Or so I understand.)
The question is now: Is it worth the trouble?
I have the book Windows via C/C++ by Richter/Nazarre and they strongly recommend[a] making sure that the load addresses all match up so that the Loader doesn't have to rebase the loaded DLLs.
However, they fail to argue whether this speeds up application load times by any significant amount.
Also, with ASLR it seems dubious that this has any value at all, since the load addresses will be randomized anyway.
Are there any hard facts on the pro/cons of this?
[a]: In my WvC++/5th ed it is in the sections titled Rebasing Modules and Binding Modules on pages 568ff. in Chapter 20, DLL Advanced Techniques.

Patching the relocatable addresses isn't the big deal; that runs at memory speed, microseconds. The bigger issue is that the pages containing this code now need to be backed by the paging file instead of the DLL file. In other words, when pages containing code are unmapped, they need to be written to the paging file instead of just being discarded.
The cost of this isn't easy to measure, especially on modern machines with lots of RAM. It only matters once the machine comes under load, with lots of processes competing for memory, and it also contributes to fragmentation of the paging file.
But clearly, rebasing is a very cheap optimization. And it is very easy to see in the Debug + Windows + Modules window: there's a bright icon on the rebased DLLs. The Address column gives you a good hint as to what base address would be a good choice. Leave ample space between them so you don't constantly have to tweak this as your program grows.
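If you want the same information outside the debugger, a minimal sketch (assuming the psapi functions and an arbitrary cap of 1024 modules) that lists each loaded module's actual base address and size looks like this; comparing the printed ranges is one way to pick non-overlapping preferred base addresses for your own DLLs:

```cpp
#include <windows.h>
#include <psapi.h>
#include <cstdio>

#pragma comment(lib, "psapi.lib")

// List every module loaded into the current process with its actual base
// address and size -- the same data the Modules window shows.
int main() {
    HMODULE mods[1024];                      // arbitrary cap for this sketch
    DWORD bytesNeeded = 0;
    if (!EnumProcessModules(GetCurrentProcess(), mods, sizeof(mods), &bytesNeeded))
        return 1;

    for (DWORD i = 0; i < bytesNeeded / sizeof(HMODULE); ++i) {
        MODULEINFO mi = {};
        wchar_t path[MAX_PATH];
        if (GetModuleInformation(GetCurrentProcess(), mods[i], &mi, sizeof(mi)) &&
            GetModuleFileNameW(mods[i], path, MAX_PATH)) {
            std::wprintf(L"base %p  size %8lu  %s\n",
                         mi.lpBaseOfDll,
                         static_cast<unsigned long>(mi.SizeOfImage), path);
        }
    }
    return 0;
}
```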

I'd like to provide one answer myself, although the answers of Hans Passant and others are describing the tradeoffs already pretty well.
After recently fiddling with DLL base addresses in our application, I will here give my conclusion:
I think that, unless you can prove otherwise, providing DLLs with a non-default Base Address is an exercise in futility. This includes rebasing my DLLs.
For the DLLs I control, given the average application, each DLL will be loaded into memory only once anyway, so the load on the paging file should be minimal. (But see the comment of Michal Burr in another answer about Terminal Server environment.)
If DLLs are given a fixed base address (without rebasing), it actually increases address-space fragmentation, because sooner or later those addresses won't match anymore. In our app we had given all DLLs a fixed base address (for other legacy reasons, not because of address-space fragmentation) without using rebase.exe, and this significantly increased address-space fragmentation for us, because you really can't get this right manually.
Rebasing (via rebase.exe) is not cheap. It is another step in the build process that has to be maintained and checked, so it has to have some benefit.
A large application will always have some DLLs loaded where the base address does not match, because of some hook DLLs (AV) and because you don't rebase 3rd party DLLs (or at least I wouldn't).
If you're using a RAM disk for the paging file, you might actually be better off if loaded DLLs get paged out :-)
So to sum up, I think that rebasing isn't worth the trouble except for special cases like the system DLLs.
I'd like to add a historical piece that I found on Old New Thing: How did Windows 95 rebase DLLs? --
When a DLL needed to be rebased, Windows 95 would merely make a note of the DLL's new base address, but wouldn't do much else. The real work happened when the pages of the DLL ultimately got swapped in. The raw page was swapped off the disk, then the fix-ups were applied on the fly to the raw page, thereby relocating it. The fixed-up page was then mapped into the process's address space and the program was allowed to continue.
Looking at how this process is done (read the whole thing), I personally suspect that part of the "rebasing is evil" stance dates back to the olden days of Win9x and low memory conditions.
Look, now there's a non-historical piece on Old New Thing:
How important is it nowadays to ensure that all my DLLs have non-conflicting base addresses?
Back in the day, one of the things you were exhorted to do was rebase your DLLs so that they all had nonoverlapping address ranges, thereby avoiding the cost of runtime relocation. Is this still important nowadays?
...
In the presence of ASLR, rebasing your DLLs has no effect because ASLR is going to ignore your base address anyway and relocate the DLL into a location of its pseudo-random choosing.
...
Conclusion: It doesn't hurt to rebase, just in case, but understand that the payoff will be extremely rare. Build your DLL with /DYNAMICBASE enabled (and with /HIGHENTROPYVA for good measure) and let ASLR do the work of ensuring that no base address collision occurs. That will cover pretty much all of the real-world scenarios. If you happen to fall into one of the very rare cases where ASLR is not available, then your program will still work. It just may run a little slower due to the relocation penalty.
... ASLR actually does a better job of avoiding collisions than manual rebasing, since ASLR can view the system as a whole, whereas manual rebasing requires you to know all the DLLs that are loaded into your process, and coordinating base addresses across multiple vendors is generally not possible.
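If you follow that advice, it is easy to double-check that a binary actually opted in. Here is a small sketch (using the winnt.h constants; IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA needs a reasonably recent SDK) that reads the DllCharacteristics field of a loaded module's PE optional header and reports whether /DYNAMICBASE and /HIGHENTROPYVA were set:

```cpp
#include <windows.h>
#include <cstdio>

// Report whether a module was linked with /DYNAMICBASE and /HIGHENTROPYVA by
// inspecting DllCharacteristics in its mapped PE optional header.
void PrintAslrFlags(HMODULE mod) {
    auto base = reinterpret_cast<const BYTE*>(mod);
    auto dos  = reinterpret_cast<const IMAGE_DOS_HEADER*>(base);
    auto nt   = reinterpret_cast<const IMAGE_NT_HEADERS*>(base + dos->e_lfanew);
    WORD dc = nt->OptionalHeader.DllCharacteristics;
    std::printf("dynamic base (ASLR): %s\nhigh-entropy 64-bit VA: %s\n",
        (dc & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)    ? "yes" : "no",
        (dc & IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA) ? "yes" : "no");
}

int main() {
    PrintAslrFlags(GetModuleHandleW(nullptr));  // inspect the running EXE itself
    return 0;
}
```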

However, they fail to argue whether this speeds up application load times by any significant amount.
The load-time change is minimal, because patching the relocated addresses happens at memory speed. However, if you have low memory - low enough that things get paged in and out of the page file - then the system has to keep the rebased DLL's code pages in the page file (since the addresses were changed). If the DLLs were rebased at build time - and the rebased DLLs don't collide with any other DLLs - then instead of swapping those pages out to the page file (and back), the system simply discards the memory and reloads the DLL from the original file on the hard drive.
The benefit is only relevant when systems are paging things in and out of main memory. The last time I made an effort to keep databases of applications and their base addresses was back in the VB6 days, when the computers in our offices and data centers were lucky to have even 256 MB of RAM.
Also, with ASLR it seems dubious that this has any value at all, since the load addresses will be randomized anyway.
At the moment ASLR only affects DLLs and executables that have the dynamic-relocation flag set. This includes Vista/Win7 system DLLs and executables, and any developer-built binaries where the developer intentionally set that flag during the build.
If you are going to set the dynamic-relocation flag, then don't bother rebasing the DLLs. If all your clients have 4 GB of RAM, then don't bother. If your boss is a cheapskate, then maybe.

You have to consider that user DLLs (those not already loaded into another process) have to be read from the HDD. Usually memory mapping is used for that (and it uses lazy loading), so if they have to be relocated, they will actually have to be read from the HDD before the process can start.
For DLLs already loaded by other processes, the copy-on-write mechanism is used, so, again, relocating them means additional operations.
As for ASLR, it is intended for security purposes, not for performance.

Yes, you should do it.
ASLR only impacts "system" DLLs, and therefore the ones you are writing should not be affected by ASLR. Additionally, ASLR doesn't completely "randomize" the location of these system binaries; it simply shuffles them around within the same basic area of the VM map.

Related

How do DLLs handle concurrency from multiple processes?

I understand from Eric Lippert's answer that "two processes can share non-private memory pages. If twenty processes all load the same DLL, the processes all share the memory pages for that code. They don't share virtual memory address space, they share memory."
Now, if the same DLL file on the hard disk, after being loaded into applications, shares the same physical memory (be it RAM or page file) but is mapped to different virtual memory address spaces, wouldn't that make it quite difficult to handle concurrency?
As I understand it, the concurrency concept in C++ is more about threading: a process can start multiple threads, each of which can run on an individual core, so when different threads call the DLL at the same time there might be data races, and we need mutexes, locks, signals, condition variables and so on.
But how does a DLL handle multiple processes? The same kind of data race can happen, can't it? What are the tools to handle that? Still the same toolset?
Now, if the same DLL file on the hard disk, after loaded into applications, would share the same physical memory (be it RAM or page files), but mapped to different virtual memory address spaces, wouldn't that make it quite difficult to handle concurrency?
As other answers have noted, the concurrency issues are of no concern if the shared memory is never written after it is initialized, which is typically the case for DLLs. If you are attempting to alter the code or resources in a DLL by writing into memory, odds are good you have a bad pointer somewhere and the best thing to do is to crash with an access violation.
However I wanted to also briefly follow up on your concern:
... mapped to different virtual memory address spaces ...
In practice we try very hard to avoid this happening because when it does, there can be a serious user-noticeable performance problem when loading code pages for the first time. (And of course a possible large increase in working set, which causes other performance problems.)
The code in a DLL often contains hard-coded virtual memory addresses, on the assumption that the code will be loaded into a known-at-compile-time virtual memory "base" address. If this assumption is violated at runtime -- because there's another DLL already there, for example -- then all those hard-coded addresses need to be patched at runtime, which is expensive.
If you want some historical details, see Raymond's article on the subject: https://blogs.msdn.microsoft.com/oldnewthing/20041217-00/?p=36953/
DLLs contain multiple "segments", and each segment has a descriptor telling Windows its characteristics; this is a 32-bit DWORD. Code segments obviously have the code bit set, and generally also the shareable bit. Read-only data can also be shareable, whereas writeable data generally does not have the shareable flag.
Now you can set an unusual combination of characteristics on an extra segment: writeable and shareable. That is not the default, and indeed it might cause race conditions. So the final answer to your question is: the problem is avoided chiefly by the default characteristics of segments, and secondly, any DLL that has a segment with non-standard characteristics must deal with the self-inflicted problems.
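To make that concrete, here is a minimal sketch (MSVC-specific; the section and variable names are just examples) of a DLL that deliberately creates such a writeable, shareable segment. Every process that loads the DLL sees the same variable, so access must be synchronized explicitly:

```cpp
#include <windows.h>

// Place one variable into a custom section and mark that section
// Read/Write/Shared, so all processes loading this DLL share a single copy.
#pragma data_seg(".shared")
volatile LONG g_counter = 0;   // must be explicitly initialized to land in .shared
#pragma data_seg()
#pragma comment(linker, "/SECTION:.shared,RWS")

// Because the page is shared across processes, a plain ++ would be a data
// race; an interlocked operation keeps the update atomic.
extern "C" __declspec(dllexport) LONG BumpSharedCounter() {
    return InterlockedIncrement(&g_counter);
}
```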

Force executable into memory?

I have a C++ executable (it contains static libraries), about 1 MB in size. When I run the exe, it consumes less than 200 KB of memory.
From what I understand, this means the computer reads the exe from the HDD little by little, as it is needed.
I want to improve the performance, even a bit, so, how can I say "load the exe into memory" and don't touch the HDD? Will this bring any performance improvement?
The OS will load parts of the executable into memory as it is needed. This is where knowing more about the instruction cache might be useful. The idea is that you structure your program so that common code is grouped together. For example, you might have some functions that are getting inlined - in this case the OS would have to load the same code in multiple places which might be slow. By removing the inline you'd have the code in one chunk in memory which would get cached and thus reduce loading time.
I would agree with the others, though, that this type of optimization should really be reserved until after you profile and know for sure that this is the bottleneck, which is very unlikely.
If you really want to do this, you need to touch the memory pages by reading from them. But forcing pages into memory once does not guarantee that they will remain in memory. An apparent alternative solution would be to VirtualLock the region, but in practice this function doesn't work the way you'd think (at least on any system where I've used it), even if you have the appropriate privileges.
Note that the default minimum working set is only 16 MB, so for larger executables, forcing pages into RAM will necessarily push other pages (which you need!) out of the working set, so this is in fact an anti-optimization - unless you have the necessary privileges to increase the working-set size.
It's a bit tedious to find out where the executable's mapping starts and ends. Not that it is impossible, but it's much more complicated than just mapping the file again. Then you simply run a loop which reads one byte every 4096 bytes, and you are done. This will consume twice as much address space, but will consume the same amount of RAM (thanks to how memory mapping works).
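As a rough illustration of that "one byte every 4096 bytes" loop, here is a sketch that locates the running EXE's mapping via GetModuleInformation instead of mapping the file a second time (a simplification of what is described above, and it warms the pages without pinning them):

```cpp
#include <windows.h>
#include <psapi.h>

#pragma comment(lib, "psapi.lib")

// Fault in every page of the running executable's image by reading one byte
// per 4 KiB page. The pages are not locked and can still be paged out later.
void PrefaultOwnImage() {
    MODULEINFO mi = {};
    HMODULE self = GetModuleHandleW(nullptr);          // the EXE's own mapping
    if (!GetModuleInformation(GetCurrentProcess(), self, &mi, sizeof(mi)))
        return;

    const BYTE* p = static_cast<const BYTE*>(mi.lpBaseOfDll);
    volatile BYTE sink = 0;
    for (DWORD off = 0; off < mi.SizeOfImage; off += 4096)
        sink ^= p[off];                                // the read forces the page in
    (void)sink;
}
```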
But, realistically, you will gain absolutely nothing from doing this.
The operating system does not need to load the entire executable and does not need to keep it resident at all times. Part of your executable will be debug info or import info, which the loader will maybe look at once (or won't look at) and never need afterwards. Forcing that stuff into memory only means you purge useful pages from the working set.
The OS likely has the parts (or most of it) that are not visible to you in the buffer cache anyway, but even if that isn't the case, you will hardly ever notice a difference.
Globally, forcing all of the program into RAM will slow it down.
There are usually large parts of the code which aren't executed
in any given run, and there's no need to ever read these from
disk.
Where forcing all or parts of the program into RAM can make a difference
is latency. If you're responding in real time to external
events, having to load the code in order to respond will reduce
latency. This can only be done by using a system specific
request (e.g. mlock under Posix systems supporting the read
time extension). You'll probably have to have special rights to
be able to do it, though. In practice, it should only be used
on machines dedicated to a specific application, since it can
have a very negative impact on the total system performance.
(There's a reason that it's in the real-time extensions, and not
in the basic Posix.) Locking the addresses used by the function in memory means that there can be no page faults when it is executed.
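For completeness, a minimal sketch of the mlock route mentioned above (POSIX-only, and it typically requires a raised RLIMIT_MEMLOCK or CAP_IPC_LOCK on Linux):

```cpp
#include <sys/mman.h>
#include <cstdio>

int main() {
    // Pin everything currently mapped, plus future mappings, into RAM so the
    // latency-critical path cannot take page faults.
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        std::perror("mlockall");
        return 1;
    }

    // ... run the latency-sensitive work here ...

    munlockall();   // release the locks once real-time behaviour is no longer needed
    return 0;
}
```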

Is rebasing DLLs (or providing an appropriate default load address) worth the trouble?


How can I measure runtime memory requirements of an application on windows platform?

How can I measure runtime memory requirements of an application on windows platform?
Perfmon.exe will monitor the usage of a process.
Run perfmon.exe, right-click the graph and choose Add Counters, pick Process as the performance object, then choose things like Virtual Bytes, Working Set, and Page File.
I'll assume you mean memory use at a particular point in time, not how much it could potentially ever need.
You can get the information about how much a process is consuming through the windows API, for example GetProcessMemoryInfo. Windows allocates memory in blocks so it may be more accurate than just checking how much memory or heap space is used.
See more details from MSDN
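A hedged sketch of that API, printing a few of the PROCESS_MEMORY_COUNTERS fields for the current process:

```cpp
#include <windows.h>
#include <psapi.h>
#include <cstdio>

#pragma comment(lib, "psapi.lib")

// Print a point-in-time snapshot of the current process's memory counters.
int main() {
    PROCESS_MEMORY_COUNTERS pmc = {};
    pmc.cb = sizeof(pmc);
    if (!GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        return 1;

    std::printf("working set:      %zu KB\n", pmc.WorkingSetSize / 1024);
    std::printf("peak working set: %zu KB\n", pmc.PeakWorkingSetSize / 1024);
    std::printf("pagefile usage:   %zu KB\n", pmc.PagefileUsage / 1024);
    std::printf("page faults:      %lu\n",
                static_cast<unsigned long>(pmc.PageFaultCount));
    return 0;
}
```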
"Memory requirements" are not very well defined, in the first place. WHen you start, your executabel will be linked to many DLLs. Together with the first stack, this forms your initial process. Then, your proces might start extra threads, allocate more memory, and/or memory map some files.
Now Wwindows won't give you real RAM for all these needs. Many DLLs are already loaded for other reasons, so you'll share that RAM. Extra RAM for stacks is allocated when you get soft stackoverflows. Memory mapped files get RAM allocated when those pages fault.
So, one of the important questions is what you really want. You have to answer that first.

Are there any downsides to using UPX to compress a Windows executable?

I've used UPX before to reduce the size of my Windows executables, but I must admit that I am naive to any negative side effects this could have. What's the downside to all of this packing/unpacking?
Are there scenarios in which anyone would recommend NOT UPX-ing an executable (e.g. when writing a DLL, Windows Service, or when targeting Vista or Win7)? I write most of my code in Delphi, but I've used UPX to compress C/C++ executables as well.
On a side note, I'm not running UPX in some attempt to protect my exe from disassemblers, only to reduce the size of the executable and prevent cursory tampering.
... there are downsides to using EXE compressors. Most notably:
Upon startup of a compressed EXE/DLL, all of the code is decompressed from the disk image into memory in one pass, which can cause disk thrashing if the system is low on memory and is forced to access the swap file. In contrast, with uncompressed EXE/DLLs, the OS allocates memory for code pages on demand (i.e. when they are executed).
Multiple instances of a compressed EXE/DLL create multiple instances of the code in memory. If you have a compressed EXE that contains 1 MB of code (before compression) and the user starts 5 instances of it, approximately 4 MB of memory is wasted. Likewise, if you have a DLL that is 1 MB and it is used by 5 running applications, approximately 4 MB of memory is wasted. With uncompressed EXE/DLLs, code is only stored in memory once and is shared between instances.
http://www.jrsoftware.org/striprlc.php#execomp
I'm surprised this hasn't been mentioned yet but using UPX-packed executables also increases the risk of producing false-positives from heuristic anti-virus software because statistically a lot of malware also uses UPX.
There are three drawbacks:
The whole code will be fully uncompressed in virtual memory, while in a regular EXE or DLL, only the code actually used is loaded in memory. This is especially relevant if only a small portion of the code in your EXE/DLL is used at each run.
If there are multiple instances of your DLL and EXE running, their code can't be shared across the instances, so you'll be using more memory.
If your EXE/DLL is already in cache, or on a very fast storage medium, or if the CPU you're running on is slow, you will experience reduced startup speed as decompression will still have to take place, and you won't benefit from the reduced size. This is especially true for an EXE that will be invoked multiple times repeatedly.
Thus the above drawbacks are more of an issue if your EXE or DLLs contain lots of resources, but otherwise they may not be much of a factor in practice, given the relative size of executables and available memory - unless you're talking about DLLs used by lots of executables (like system DLLs).
To dispel some incorrect information in other answers:
UPX will not affect your ability to run on DEP-protected machines.
UPX will not affect the ability of major anti-virus software, as they support UPX-compressed executables (as well as other executable compression formats).
UPX has been able to use LZMA compression for some time now (7zip's compression algorithm), use the --lzma switch.
The only time size matters is during download off the Internet. If you are using UPX then you actually get worse performance than if you use 7-zip (based on my testing 7-Zip is twice as good as UPX). Then when it is actually left compressed on the target computer your performance is decreased (see Lars' answer). So UPX is not a good solution for file size. Just 7zip the whole thing.
As far as preventing tampering goes, it is a FAIL as well. UPX supports decompression too. If someone wants to modify the EXE, they will see that it is compressed with UPX and then uncompress it. The percentage of possible crackers you might slow down does not justify the effort and performance loss.
A better solution would be to use binary signing or at least just a hash. A simple hash verification system is to take a hash of your binary and a secret value (usually a guid). Only your EXE knows the secret value, so when it recalculates the hash for verification it can use it again. This isn't perfect (the secret value can be retrieved). The ideal situation would be to use a certificate and a signature.
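A sketch of that kind of check, with a hypothetical secret and a deliberately simple non-cryptographic hash (FNV-1a) just to keep it short; a real implementation would use Authenticode signing or at least a cryptographic HMAC:

```cpp
#include <windows.h>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical self-check: hash the on-disk image combined with a secret and
// compare the digest against a known-good value stored elsewhere.
static uint64_t Fnv1a(const std::vector<char>& data, uint64_t seed) {
    uint64_t h = 14695981039346656037ull ^ seed;   // FNV offset basis mixed with seed
    for (char c : data) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;                     // FNV prime
    }
    return h;
}

int main() {
    char path[MAX_PATH];
    GetModuleFileNameA(nullptr, path, MAX_PATH);   // full path of this EXE
    std::ifstream f(path, std::ios::binary);
    std::vector<char> bytes((std::istreambuf_iterator<char>(f)),
                            std::istreambuf_iterator<char>());

    const uint64_t kSecret = 0xC0FFEE1234567890ull;  // hypothetical secret value
    std::printf("digest: %016llx\n",
                static_cast<unsigned long long>(Fnv1a(bytes, kSecret)));
    return 0;
}
```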
The final size of the executable on disk is largely irrelevant these days. Your program may load a few milliseconds faster, but once it starts running the difference is indistinguishable.
Some people may be more suspicious of your executable just because it is compressed with UPX. Depending on your end users, this may or may not be an important consideration.
The last time I tried to use it on a managed assembly, it munged it so bad that the runtime refused to load it. That's the only time I can think of that you wouldn't want to use it (and, really, it's been so long since I tried that that the situation may even be better now). I've used it extensively in the past on all types of unmanaged binaries, and never had an issue.
If your only interest is in decreasing the size of the executables, then have you tried comparing the size of the executable with and without runtime packages? Granted you will have to also include the sizes of the packages overall along with your executable, but if you have multiple executables which use the same base packages, then your savings would be rather high.
Another thing to look at would be the graphics/glyphs you use in your program. You can save quite a bit of space by consolidating them into a single TImageList included in a global data module rather than having them repeated on each form. I believe each image is stored in the form resource as hex, so each byte takes up two bytes... you can shrink this a bit by loading the image from an RCData resource using a TResourceStream.
There are no drawbacks.
But just FYI, there is a very common misconception regarding UPX:
resources are NOT just being compressed
Essentially you are building a new executable that has a "loader" duty, while the "real" executable is section-stripped, compressed, and placed as a binary-data resource of the loader executable (regardless of the types of resources that were in the original executable).
Using reverse-engineering methods and tools, whether for education purposes or otherwise, will show you information about the loader executable, not meaningful information about the original executable.
IMHO routinely UPXing is pointless, for the reasons spelled out above: mostly, memory is more expensive than disk.
Erik: the LZMA stub might be bigger. Even if the algorithm is better, it is not always a net plus.
Virus scanners that look for 'unknown' viruses can flag UPX compressed executables as having a virus. I have been told this is because several viruses use UPX to hide themselves. I have used UPX on software and McAfee will flag the file as having a virus.
The reason UPX has so many false alarms is because its open licensing allows malware authors to use and modify it with impunity. Of course, this issue is inherent to the industry, but sadly the great UPX project is plagued by this problem.
UPDATE: Note that as the Taggant project is completed, the ability to use UPX (or anything else) without causing false positives will be enhanced, assuming UPX supports it.
I believe there is a possibility that it might not work on computers that have DEP (Data Execution Prevention) turned on.
When Windows loads a binary, the first thing it does is Import/Export Table resolution. That is, for each DLL indicated in the Import Table, it first loads that DLL at a (randomly generated) base address, and then, using that base address plus the offset of each imported function within the DLL, it updates the Import Table.
An EXE typically does not have an Export Table.
All of this happens even before execution jumps to the original entry point.
Then, after execution starts at the entry point, the packed EXE runs a small piece of code before starting the decompression algorithm. This small stub needs very few Windows APIs, resulting in a small Import Table.
But after the binary is decompressed, if it starts to use any Windows API that was not resolved before, it is likely going to crash. So it is essential that the decompression routine resolves and updates the Import Table for all the Windows APIs referenced inside the decompressed code before executing that code.
References:
https://malwaretips.com/threads/malware-analysis-2-pe-imports-static-analysis.62135/