If I understand correctly, the .bss section in ELF files is used to allocate space for zero-initialized variables. Our toolchain produces ELF files, hence my question: does the .bss section actually have to contain all those zeroes? It seems such an awful waste of space that when, say, I allocate a global ten-megabyte array, it results in ten megabytes of zeroes in the ELF file. What am I missing here?
It has been some time since I worked with ELF, but I think I still remember this stuff. No, it does not physically contain those zeros. If you look at an ELF file's program headers, you will see that each header has two sizes: one is the size in the file, and the other is the size the segment occupies when it is mapped into virtual memory (readelf -l ./a.out):
Program Headers:
  Type       Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR       0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
  INTERP     0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD       0x000000 0x08048000 0x08048000 0x00454 0x00454 R E 0x1000
  LOAD       0x000454 0x08049454 0x08049454 0x00104 0x61bac RW  0x1000
  DYNAMIC    0x000468 0x08049468 0x08049468 0x000d0 0x000d0 RW  0x4
  NOTE       0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
  GNU_STACK  0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
Headers of type LOAD are the ones that are mapped into virtual memory when the file is loaded for execution. Other headers contain other information, such as the shared libraries that are needed. As you can see, FileSiz and MemSiz differ significantly for the header that contains the .bss section (the second LOAD header):
0x00104 (file-size) 0x61bac (mem-size)
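As a minimal sketch, assuming a Linux system whose <elf.h> (as shipped with glibc) defines Elf32_Phdr and PT_LOAD, this is how a tool could compute how many bytes of a LOAD segment are zero-filled rather than read from the file. The helper name bss_tail_bytes is hypothetical.

```c
#include <assert.h>
#include <elf.h>     /* Elf32_Phdr, PT_LOAD (glibc/Linux header) */
#include <stddef.h>

/* Hypothetical helper: bytes of a LOAD segment that are not backed by the
   file and must be zero-filled in memory (MemSiz - FileSiz). */
static size_t bss_tail_bytes(const Elf32_Phdr *ph)
{
    if (ph->p_type != PT_LOAD)
        return 0;                          /* only LOAD segments are mapped */
    assert(ph->p_memsz >= ph->p_filesz);   /* required by the ELF spec */
    return (size_t)(ph->p_memsz - ph->p_filesz);
}
```

For the second LOAD header above, 0x61bac - 0x00104 = 0x61aa8 bytes are zero-filled: the 0x61aa0-byte .bss output section (including a's 0x61a80 = 400000 bytes) plus 8 bytes of alignment padding. Also, 0x08049454 + 0x61bac = 0x080ab000, which matches the _end symbol in the linker map.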
For this example code:
int a[100000];
int main() { }
The ELF specification says that the part of a segment by which the mem-size exceeds the file-size is simply filled with zeros in virtual memory. The segment-to-section mapping of the second LOAD header looks like this:
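The zero-fill rule can be modeled in a few lines of C. This is a simplified sketch, not how a real loader works: real loaders typically map the file-backed pages and use anonymous zero-filled pages for the rest, whereas this hypothetical load_segment just copies into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of loading one LOAD segment: copy FileSiz bytes from
   the file image, then zero the remaining MemSiz - FileSiz bytes (this
   zero-filled tail is where .bss lives). */
static void load_segment(unsigned char *dst, const unsigned char *file_bytes,
                         size_t filesz, size_t memsz)
{
    assert(filesz <= memsz);
    memcpy(dst, file_bytes, filesz);           /* .data and friends */
    memset(dst + filesz, 0, memsz - filesz);   /* .bss: never stored in the file */
}
```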
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
So there are some other sections in there too: .ctors and .dtors for C++ constructors/destructors, and .jcr, which does the same kind of thing for Java. Then it contains the .dynamic section and other material used for dynamic linking (I believe this is where the needed shared libraries are listed, among other things). After that comes the .data section, which contains initialized globals and local static variables. At the end, the .bss section appears, which is filled with zeros at load time because the file-size does not cover it.
By the way, you can see which output section a particular symbol will be placed in by using the -M linker option. With gcc, use -Wl,-M to pass the option through to the linker. The map excerpt below shows that a is allocated within .bss. This can help you verify that your uninitialized objects really end up in .bss and not somewhere else:
.bss 0x08049560 0x61aa0
[many input .o files...]
*(COMMON)
*fill* 0x08049568 0x18 00
COMMON 0x08049580 0x61a80 /tmp/cc2GT6nS.o
0x08049580 a
0x080ab000 . = ALIGN ((. != 0x0)?0x4:0x1)
0x080ab000 . = ALIGN (0x4)
0x080ab000 . = ALIGN (0x4)
0x080ab000 _end = .
By default, GCC puts uninitialized globals in a COMMON section, for compatibility with old compilers that allow a global to be defined twice in a program without multiple-definition errors (GCC 10 and later default to -fno-common instead). Use -fno-common to make GCC use .bss sections in object files. This makes no difference for the final linked executable because, as you can see, the objects end up in a .bss output section anyway; that is controlled by the linker script, which you can display with ld --verbose. But that shouldn't scare you; it's just an internal detail. See the gcc man page.
The .bss section in an ELF file is used for static data which is not initialized programmatically but guaranteed to be set to zero at runtime. Here's a little example that will explain the difference.
int main() {
static int bss_test1[100];
static int bss_test2[100] = {0};
return 0;
}
In this case, bss_test1 is placed into .bss since it is uninitialized. bss_test2, however, was placed into the .data section together with its zeros by the compiler used here (current GCC places explicitly zero-initialized objects in .bss as well, unless -fno-zero-initialized-in-bss is given). The runtime loader basically allocates the amount of space reserved for .bss and zeroes it out before any userland code begins executing.
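A small runnable variant of the example, assuming a hosted C compiler: the arrays are moved to file scope, and check_statics_zeroed is a hypothetical helper. Both arrays must read as zero at run time whatever section they land in; compile it and run objdump -t on the object file to see which section your compiler chose for each.

```c
#include <assert.h>
#include <stddef.h>

/* No initializer: lands in .bss and is zeroed before main() runs. */
static int bss_test1[100];

/* Explicit zero initializer: depending on compiler and flags this may be
   emitted into .data with its zeros, or placed in .bss as well. */
static int bss_test2[100] = {0};

/* Hypothetical check: both arrays must read as all zeros at run time. */
static void check_statics_zeroed(void)
{
    for (size_t i = 0; i < sizeof bss_test1 / sizeof bss_test1[0]; i++) {
        assert(bss_test1[i] == 0);
        assert(bss_test2[i] == 0);
    }
}
```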
You can see the difference using objdump, nm, or similar utilities:
moozletoots$ objdump -t a.out | grep bss_test
08049780 l O .bss 00000190 bss_test1.3
080494c0 l O .data 00000190 bss_test2.4
This is usually one of the first surprises embedded developers run into: with toolchains that behave like this one, never initialize statics to zero explicitly, because the runtime loader (usually) takes care of that. As soon as you initialize anything explicitly, you are telling the compiler/linker to include the data in the executable image.
A .bss section is not stored in an executable file. Of the most common sections (.text, .data, .bss), only .text (actual code) and .data (initialized data) are present in an ELF file.
That is correct, .bss is not present physically in the file, rather just the information about its size is present for the dynamic loader to allocate the .bss section for the application program.
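In section-header terms, .bss is a section of type SHT_NOBITS: it has a size for the loader but occupies no bytes in the file. A minimal sketch, again assuming glibc's <elf.h> for Elf64_Shdr; section_file_bytes is a hypothetical helper.

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Shdr, SHT_NOBITS (glibc/Linux header) */
#include <stddef.h>
#include <stdint.h>

/* Bytes a section occupies in the file: SHT_NOBITS sections such as .bss
   carry a size (sh_size) for the loader but have no file contents. */
static uint64_t section_file_bytes(const Elf64_Shdr *sh)
{
    assert(sh != NULL);
    return sh->sh_type == SHT_NOBITS ? 0 : sh->sh_size;
}
```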
As a rule of thumb, only the LOAD and TLS segments get memory for the application program; the rest are used by the dynamic loader.
For static executables, the .bss section is also given space in the executable. This is common in embedded applications, where there is no loader.
Suman
I used objdump -t on the debug-info file of a program to find the address ranges of each function. For a few functions, the bounds cannot be determined this way, because objdump reports 0 for their sizes. These symbols are shown below:
deregister_tm_clones 0000000000197ce0
register_tm_clones 0000000000197d20
__do_global_dtors_aux 0000000000197d70
frame_dummy 0000000000197db0
_fini 00000000004e9474
_init 00000000001889e8
How can I determine their sizes? The only approach I can think of is to use GDB's disas command on the start address and find where the function's disassembly ends, but this may not work in all cases. What is the standard approach?
UPDATE:
I am implementing a Pintool to generate call stacks at runtime. I only need symbols in certain binaries; in other words, I need only a subset of functions (e.g., those in the GTK library) to be included in the call stack. Therefore, at runtime, I will need the ranges for these libraries.
On the other hand, I need the ranges for the symbols to find their outgoing jumps, which are a sign of tail-call elimination and require updating the stack.
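One common heuristic, sketched here under stated assumptions rather than as the standard answer: sort the symbols of a section by address and treat the distance to the next symbol as an upper bound on the size of a zero-sized symbol. The struct sym type, estimate_sizes, and the section_end parameter are hypothetical; the bound includes any alignment padding, and it is only meaningful if all symbols in the section are included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sym { const char *name; uint64_t addr; uint64_t size; };

/* qsort comparator: ascending by address. */
static int by_addr(const void *a, const void *b)
{
    uint64_t x = ((const struct sym *)a)->addr;
    uint64_t y = ((const struct sym *)b)->addr;
    return (x > y) - (x < y);
}

/* Heuristic upper bound: for symbols whose size is 0, use the distance to
   the next higher symbol; the last symbol uses section_end - addr. */
static void estimate_sizes(struct sym *syms, size_t n, uint64_t section_end)
{
    qsort(syms, n, sizeof *syms, by_addr);
    for (size_t i = 0; i < n; i++) {
        if (syms[i].size != 0)
            continue;                      /* keep sizes objdump did report */
        uint64_t next = (i + 1 < n) ? syms[i + 1].addr : section_end;
        assert(next >= syms[i].addr);
        syms[i].size = next - syms[i].addr;
    }
}
```

With the four adjacent symbols above and an assumed section end of 0x197e00, this yields 0x40 for deregister_tm_clones, 0x50 for register_tm_clones, 0x40 for __do_global_dtors_aux, and 0x50 for frame_dummy.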
I have a simple hello world program, and after running dumpbin on it with the /headers flag, I get this output:
FILE HEADER VALUES
8664 machine (x64)
D number of sections
5A3D287F time date stamp Fri Dec 22 18:45:03 2017
48F file pointer to symbol table
2D number of symbols
0 size of optional header
0 characteristics
Summary
F .data
A0 .debug$S
2F .drectve
24 .pdata
B9 .text$mn
18 .xdata
What exactly does the .xdata section do, and what does it contain? I found no information on MSDN.
For future reference:
.text: code section (think functions); there can be multiple of those when enabling function sections or when comdat is involved (for example templates)
.data: data section (think global vars); there can be multiple of those when enabling data sections or when comdat is involved (for example templates)
.bss: data section initialized to zeros (not present above); there can be multiple of those when enabling data sections or when comdat is involved (for example templates)
.debug: Debug info; like others, there can be multiple of these when function sections are involved.
.pdata: for x86_64, this is the "exception info" for a method, it defines the start/end of a function, and a pointer to the unwind info (see .xdata); inside object files this is duplicated per function
.drectve: not sure; but from the name I'd guess linker directives.
.xdata: for x86_64; this is the unwind info part that pdata points to. It contains where the exception handler of a function is, and what to do to unwind it when an exception occurs: https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2019
The "$" postfix is used for sorting. Given:
- .sec$z
- .sec$data
- .sec$a
The sections are sorted before they are merged into an executable (so .sec$a first, then .sec$data, then .sec$z). This can be used to create start/end symbols for a PE section.
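The ordering rule can be expressed as a qsort comparator. This is a sketch of the rule as described here, not the MSVC linker's actual implementation; grouped_section_cmp is hypothetical, and treating a name without '$' as having an empty suffix is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* qsort */
#include <string.h>

/* Orders "<base>$<suffix>" section names: first by base name (sections with
   the same base are merged), then alphabetically by the suffix after '$'. */
static int grouped_section_cmp(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    assert(a != NULL && b != NULL);

    const char *da = strchr(a, '$');
    const char *db = strchr(b, '$');
    size_t la = da ? (size_t)(da - a) : strlen(a);
    size_t lb = db ? (size_t)(db - b) : strlen(b);

    /* 1. Compare the base names (the part before '$'). */
    int c = strncmp(a, b, la < lb ? la : lb);
    if (c != 0)
        return c;
    if (la != lb)
        return la < lb ? -1 : 1;

    /* 2. Same base: alphabetical by suffix; no '$' counts as empty suffix. */
    return strcmp(da ? da + 1 : "", db ? db + 1 : "");
}
```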
The repeated sections are for things like C++ templates: the compiler will instantiate a template in any translation unit that needs it, and then the linker will pick one of those instantiations (usually the first encountered).
Less common are compiler-specific features like Microsoft's __declspec(selectany) that allow a variable to be defined more than once and again the linker will simply pick one of those definitions and discard the rest.
gcc's ld scripts will take all the .text* sections to create the final .text of the linked executable. You can examine those scripts to get an idea of how the linker creates an executable out of object files.
Suppose we have this simple code:
int* q = new int(13);
int main() {
return 0;
}
Clearly, the variable q is global and initialized. From this answer, we would expect q to be stored in the initialized data segment (.data) within the program file, but it is a pointer, so its value (an address in the heap) is determined at run time. So what value is stored in the data segment within the program file?
My try:
In my thinking, the compiler allocates some space for q (typically 8 bytes for a 64-bit address) in the data segment with no meaningful value, then puts some initialization code in the text segment, before main's code, to initialize q at run time. Something like this in assembly:
....
mov edi, 4
call operator new(unsigned long)
mov DWORD PTR [rax], 13 // rax: 64 bit address (pointer value)
// offset : q variable offset in data segment, calculated by compiler
mov QWORD PTR [ds+offset], rax // store address in data segment
....
main:
....
Any idea?
Yes, that is essentially how it works.
Note that in ELF .data, .bss, and .text are actually sections, not segments. You can look at the assembly yourself by running your compiler:
c++ -S -O2 test.cpp
You will typically see a main function, and some kind of initialization code outside that function. The program entry point (part of your C++ runtime) will call the initialization code and then call main. The initialization code is also responsible for running things like constructors.
int *q will go in the .bss, not the .data section, since it's only initialized at run-time by a non-constant initializer (so this is only legal in C++, not in C). There's no need to have 8 bytes in the executable's data segment for it.
The compiler arranges for the initializer function to be run by putting its address into an array of initializers that the CRT (C Run-Time) startup code calls before calling main.
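C has no dynamic initializers, but the constructor attribute of GCC and Clang produces the same arrangement: on typical modern ELF targets the compiler emits a pointer to the function into .init_array, and the startup code calls it before main(). A minimal sketch emulating int* q = new int(13); with malloc; init_q is a hypothetical name, and q itself ends up in .bss.

```c
#include <assert.h>
#include <stdlib.h>

/* Zero-initialized pointer: reserved in .bss, no bytes in the file. */
static int *q;

/* GCC/Clang extension: a pointer to this function is typically placed in
   .init_array, so the startup code runs it before main(). */
__attribute__((constructor))
static void init_q(void)
{
    q = malloc(sizeof *q);   /* stands in for operator new(4) */
    assert(q != NULL);
    *q = 13;
}
```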
On the Godbolt compiler explorer, you can see the init function's asm without all the noise of directives. Notice that the addressing mode is just a simple RIP-relative access to q. The linker fills in the right offset from RIP at this point, since that's a link-time constant even though the .text and .bss sections end up in separate segments.
Godbolt's compiler-noise filtering isn't ideal for us. Some of the directives are relevant, but many of them aren't. Below is a hand-chosen mix of gcc6.2 -O3 asm output with Godbolt's "filter directives" option unchecked, for just the int* q = new int(13); statement. (No need to compile a main at the same time, we're not linking an executable).
# gcc6.2 -O3 output
_GLOBAL__sub_I_q: # presumably stands for subroutine
sub rsp, 8 # align the stack for calling another function
mov edi, 4 # 4 bytes
call operator new(unsigned long) # this is the demangled name, like from objdump -dC
mov DWORD PTR [rax], 13
mov QWORD PTR q[rip], rax # clang uses the equivalent `[rip + q]`
add rsp, 8
ret
.globl q
.bss
q:
.zero 8 # reserve 8 bytes in the BSS
There's no reference to the base of the ELF data (or any other) segment.
Also definitely no segment-register overrides. ELF segments have nothing to do with x86 segments. (And the default segment register for this is DS anyway, so the compiler doesn't need to emit [ds:rip+q] or anything. Some disassemblers may be explicit and show DS even though there was no segment-override prefix on the instruction, though.)
This is how the compiler arranges for it to be called before main():
# the "aw" sets options / flags for this section to tell the linker about it.
.section .init_array,"aw"
.align 8
.quad _GLOBAL__sub_I_q # this assembles to the absolute address of the function.
The CRT start code has a loop that knows the size of the .init_array section and uses a memory-indirect call instruction on each function-pointer in turn.
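The CRT loop can be sketched as follows. This is not glibc's code; run_init_array and the example initializers are hypothetical, and the table bounds are passed in explicitly. In a real link with GNU ld's default linker script, symbols such as __init_array_start and __init_array_end mark the bounds of the table.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Call every function pointer in [begin, end) in order, the way startup
   code walks .init_array (one indirect call per entry). */
static void run_init_array(init_fn *begin, init_fn *end)
{
    assert(begin <= end);
    for (init_fn *p = begin; p != end; ++p)
        (*p)();
}

/* Hypothetical initializers that record the order in which they ran. */
static int init_order[2];
static int init_count;
static void init_a(void) { init_order[init_count++] = 1; }
static void init_b(void) { init_order[init_count++] = 2; }
```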
The .init_array section is marked writeable, so it goes into the data segment. I'm not sure what writes it. Maybe the CRT code marks it as already-done by zeroing the pointers after calling them?
There's a similar mechanism in Linux for running initializers in dynamic libraries, which is done by the ELF interpreter while doing dynamic linking. This is why you can call printf() or other glibc stdio functions from _start in a dynamically-linked binary created from hand-written asm, and why that fails in a statically linked binary if you don't call the right init functions. (See this Q&A for more about building static or dynamic binaries that define their own _start or just main(), with or without libc).
I just used DUMPBIN for the first time and I see the term HIGHLOW repeatedly in the output file:
BASE RELOCATIONS #7
11000 RVA, E0 SizeOfBlock
...
3B5 HIGHLOW 2001753D ___onexitbegin
3C1 HIGHLOW 2001753D ___onexitbegin
...
I'm curious what this term stands for. I didn't find anything on Google or Stackoverflow about it.
To apply a fixup, a delta is calculated as the difference between the preferred base address and the base where the image is actually loaded.
The basic idea is that when doing a fixup at some address, we must know
- what memory must be changed ("offset" field)
- what value is needed for its relocation ("delta" value)
- which parts of the relocated data and delta value to use ("type" field)
Here are some possible values of the "type" field:
- HIGH - add the high word (16 bits) of delta to the 16-bit value at "offset"
- LOW - add the low word of delta to the 16-bit value at "offset"
- HIGHLOW - add the full delta to the 32-bit value at "offset"
In other words, the HIGHLOW type tells the loader that it is doing a fixup at offset "offset" from the page of this relocation block*, and that there is a doubleword that needs to be modified for the executable to work properly.
* all of the relocation entries are grouped into blocks, and every block has a page on which its entries are applied
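Each block starts with the page RVA and SizeOfBlock, followed by 16-bit entries whose top 4 bits hold the type and whose low 12 bits hold the offset within the page. A sketch of decoding one entry; the enum names are shortened versions of the IMAGE_REL_BASED_* constants in winnt.h (HIGHLOW is 3), and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Type values from the PE/COFF spec; winnt.h calls them IMAGE_REL_BASED_*. */
enum { REL_ABSOLUTE = 0, REL_HIGH = 1, REL_LOW = 2, REL_HIGHLOW = 3 };

/* Top 4 bits of an entry: the relocation type. */
static unsigned reloc_type(uint16_t entry) { return entry >> 12; }

/* Low 12 bits: offset from the page RVA of the block. */
static unsigned reloc_offset(uint16_t entry) { return entry & 0x0FFFu; }

/* RVA patched by an entry. Assumption of this sketch: block RVAs are
   4 KB page-aligned, as typical linkers produce them. */
static uint32_t reloc_target_rva(uint32_t page_rva, uint16_t entry)
{
    assert((page_rva & 0x0FFFu) == 0);
    return page_rva + reloc_offset(entry);
}
```

For the dumpbin output above, the first entry would be stored as 0x33B5: type 3 (HIGHLOW), offset 0x3B5, patching RVA 0x11000 + 0x3B5 = 0x113B5.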
Let's say that you have this instruction in your code:
section .data
message: db "Hello World!", 0
section .code
...
mov eax, message
...
You run the assembler and then immediately run a disassembler on the result. Now your code looks like this:
mov eax, 0x702000
You're now curious why the address is 0x702000, and when you look at the file dump, you see that
ImageBase: 0x00700000
Now you understand where this number came from, and you're ready to run the executable.
The loader, which loads executable files into memory and creates address spaces for them, finds out that the memory at 0x700000 is unavailable and that it needs to place the file somewhere else. It decides that 0xf00000 will be OK and copies the file contents there.
But your program was linked to work only with data at 0x700000, and there was no way for the linker to know that its output would be relocated. Because of this, the loader must do its magic. It
1. calculates the delta value: the preferred address (image base) is 0x700000, but the image is actually at 0xf00000. It subtracts one from the other and gets 0x800000 as the result.
2. goes to the .reloc section of the file
3. checks whether there is another page (4 KB of data) to be relocated. If not, it continues toward calling the file's entry point.
4. for every relocation in the current page, it
   - gets the data at the relocation offset
   - adds the delta value (in the way the type field states)
   - places the new value at the relocation offset
   and then continues with step 3.
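Step 4 for a single block can be sketched in C against an in-memory copy of the image. Assumptions: image[rva] is the byte at that RVA, the host is little-endian like x86, and only ABSOLUTE (padding) and HIGHLOW entries occur, as on 32-bit x86. apply_reloc_block is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply one base-relocation block to an in-memory image copy.
   Assumes image[rva] is the byte at that RVA and a little-endian host. */
static void apply_reloc_block(uint8_t *image, size_t image_size,
                              uint32_t page_rva, const uint16_t *entries,
                              size_t count, uint32_t delta)
{
    for (size_t i = 0; i < count; i++) {
        unsigned type = entries[i] >> 12;               /* top 4 bits */
        uint32_t rva = page_rva + (entries[i] & 0x0FFFu);

        if (type == 0)                  /* ABSOLUTE: padding entry, skip */
            continue;
        assert(type == 3);              /* only HIGHLOW handled in this sketch */
        assert((size_t)rva + 4 <= image_size);

        uint32_t value;
        memcpy(&value, image + rva, sizeof value);   /* read the 32-bit address */
        value += delta;                              /* new = old + delta (mod 2^32) */
        memcpy(image + rva, &value, sizeof value);   /* write it back */
    }
}
```

In the example above, delta = 0xf00000 - 0x700000 = 0x800000, so a HIGHLOW entry pointing at the operand of mov eax, message changes 0x702000 into 0xf02000.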
There are also more types of relocation entries, and some of them are architecture-specific. For the full list, read the "Microsoft Portable Executable and Common Object File Format", section 6.6.2, "Fixup Types".
What you see here is the content of the "Base relocation table" in Microsoft Windows executable files.
Base relocation tables are necessary in Windows for DLL files and they are optional for executable files; they contain information about the location of address information in the EXE/DLL file that must be updated when the actual address of the DLL file in memory is known (when loading the DLL into memory). Windows uses the information stored in this table to update the address information.
The table supports different types of addresses while the naming is Microsoft-specific: ABSOLUTE (= dummy), HIGH, LOW, HIGHLOW, HIGHADJ and MIPS_JMPADDR.
The full name of the constant is "IMAGE_REL_BASED_HIGHLOW".
The "ABSOLUTE" type is typically a dummy entry inserted to ensure the parts of the table are a multiple of 4 (or 8) bytes long.
On x86 CPUs only the "HIGHLOW" type is used: It tells Windows about the location of an absolute (32-bit) address in the file.
Some background info:
In your example the "Image Base" could be 0x20000000 which means that the EXE/DLL file has been compiled to be loaded into address 0x20000000. At the addresses 0x200113B5 (0x20000000 + 0x11000 + 0x3B5) and 0x200113C1 there are absolute addresses.
Let's say the memory at location 0x200113B5 contains the value 0x20012345 which is the address of a function or variable in the program.
Maybe the memory at address 0x20000000 cannot be used and Windows decides to load the DLL into the memory at 0x50000000 instead. Then the 0x20012345 must be replaced by 0x50012345.
The information in the base relocation table is used by Windows to find all addresses that must be replaced.
I am building a DLL using a custom build system (outside Visual Studio), and I can't get uninitialized data to show up in the .bss section; the compiler lumps it into .data. This bloats the final binary size, since it's full of giant arrays of zeroes.
For example (small 1KB arrays in the example, but the actual buffers are much larger):
int uninitialized[1024];
int initialized[1024] = { 123 };
The compiler emits assembly like this:
PUBLIC _initialized
_DATA SEGMENT
COMM _uninitialized:DWORD:0400H
_initialized DD 07bH
ORG $+4092
_DATA ENDS
Which ends up in the object file like this:
SECTION HEADER #3
.data name
0 physical address
0 virtual address
1000 size of raw data
147 file pointer to raw data (00000147 to 00001146)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
C0400040 flags
Initialized Data
8 byte align
Read Write
(There is no .bss section.)
The current compilation flags:
cl -nologo -c -FAsc -Faobjs\ -W4 -WX -X -J -EHs-c- -GR- -Gy -GS- -O1 -Os -Foobjs\file.o file.cpp
I have looked through the list of options at http://msdn.microsoft.com/en-us/library/fwkeyyhe(v=vs.71).aspx but I haven't spotted anything obvious.
I'm using the compiler from Visual Studio 2008 SP1 (Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86).
You want to use __declspec(allocate()), which you can read up on here: http://msdn.microsoft.com/en-us/library/5bkb2w6t(v=vs.80).aspx
Notice that "size of raw data" is only 0x1000 or 4kB - exactly the size of your initialized array only. The VirtualSize of your .data section will be larger than the size of the actual data stored in the binary image and your uninitialized array will occupy the slack space. Using the bss_seg pragma will force the linker to place your uninitialized data into its own separate section.
You can try using the bss_seg pragma if you aren't concerned about portability.
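A minimal sketch of the bss_seg suggestion, guarded so that it only takes effect with MSVC. The section name .bigbss is hypothetical, check_buffers is a hypothetical helper, and whether this changes the object file produced by the VS2008 compiler above has not been verified here.

```c
#include <assert.h>

/* MSVC only: route the following uninitialized globals into their own
   section (the name ".bigbss" is hypothetical); other compilers skip it. */
#ifdef _MSC_VER
#pragma bss_seg(".bigbss")
#endif
int uninitialized[1024];
#ifdef _MSC_VER
#pragma bss_seg()   /* back to the default section */
#endif

int initialized[1024] = { 123 };

/* Hypothetical check: contents are the same wherever the arrays live. */
static void check_buffers(void)
{
    assert(uninitialized[1023] == 0);
    assert(initialized[0] == 123 && initialized[1] == 0);
}
```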