Object file is 2.5x larger on Linux than on macOS or Windows - c++

I have a file which, when compiled to object file, has the following size:
On Windows, using MSVC, it's 8MB.
On macOS, using clang, it's 8MB.
On Linux (Ubuntu 18.04 or Gentoo), using either gcc or clang, it's 20MB.
The file (detailed below) is a representation of (a part of) a Unicode table along with character properties. The encoding is UTF-8.
It occurred to me that the problem might be that libstdc++ can't handle the file well, so I tried libc++ with clang on Gentoo, but it didn't change anything (the object file size remained the same).
Then I thought that it might be some optimization doing something odd, but once again I had no size improvements when I went from -O3 to -O0.
The file, on line 50 includes UnicodeTable.inc. The UnicodeTable.inc contains a std::array of the unicode codepoints.
I tried changing std::array to C style array, but again, the object file size did not change.
I have the preprocessed version of the CodePoint.cpp which can be compiled with $CC -xc++ CodePoint.i -c -o CodePoint.o. CodePoint.i contains about 40k lines of STL code and about 130k lines of unicode table.
I tried uploading the preprocessed CodePoint.i to gists.github.com and to paste.pound-python.org, but both refused the 170k lines long file.
At this point I'm out of ideas and would greatly appreciate any help regarding finding out the source of the "bloated" object file size.

From the output of size you linked you can see that there are 12 MB of relocations in the ELF object (section .rela.dyn). If a 64-bit relocation takes 24 bytes and you have 132624 table entries with 4 pointers to strings each, this pretty much explains the 12 MB difference (132624 * 4 * 24 = 12731904 ≈ 12 MB).
Apparently the other formats either use a more efficient relocation type or link the references directly and just relocate the whole block together with the strings as one piece of memory.
Since you are linking this to a shared library the dynamic relocations will not go away.
I am not sure if it is possible to avoid this with the code you currently use.
However, I think a Unicode code point must have a maximum encoded size. Why don't you store the code points by value in char arrays in the RawCodePoint struct? The size of each code point string should be no larger than the pointer you currently store, and the locality of reference of the table lookup may actually improve.
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t MAX_CP_SIZE = 4;   // check that this really is the maximum

struct RawCodePointLocal {
    const std::array<char, MAX_CP_SIZE> original;
    const std::array<char, MAX_CP_SIZE> normal;
    const std::array<char, MAX_CP_SIZE> folded_case;
    const std::array<char, MAX_CP_SIZE> swapped_case;
    bool is_letter;
    bool is_punctuation;
    bool is_uppercase;
    std::uint8_t break_property;
    std::uint8_t combining_class;
};
This way you should not need relocations for the entries.
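For illustration, an entry in the generated UnicodeTable.inc could then look something like the sketch below. The entry name and field values are made up (the real table would be machine-generated); the point is that every field is stored by value, so each entry is plain constant data with no pointers and nothing needs relocating.

// Hypothetical single entry, to show the by-value layout (values are made up):
constexpr RawCodePointLocal kLatinCapitalA{
    {{'A'}},   // original      (unused bytes are value-initialized to 0)
    {{'A'}},   // normal
    {{'a'}},   // folded_case
    {{'a'}},   // swapped_case
    true,      // is_letter
    false,     // is_punctuation
    true,      // is_uppercase
    0,         // break_property   (placeholder)
    0          // combining_class  (placeholder)
};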

Related

C++ How to store large binary lookup table with application?

I suspect what I want to do is create something like a binary library or .obj that I can just feed to the linker, but I'm not even sure which TFM to R for this.
I have a large binary LUT (2 million 32-bit values). It takes a long time to calculate and I really want to avoid recomputing it each time the application runs. I'd also prefer not to store it as a separate file and read it in, which is what I am doing now.
For smaller LUTs I'd normally just do a header file with suitable entry declarations, but in this case that seems fairly untenable and will generate a lot of overhead just creating the header file.
The development target and environment is a CMake application built with Visual Studio 2019 Community Edition.
If you are using C++14 or later you can probably use constexpr to build your LUT at compile time.
This may be the kind of thing you are looking for:
https://stackoverflow.com/a/37413361/12743421
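For instance, if each entry can be computed by a constexpr-friendly function of its index, something along these lines builds the whole table at compile time and places it in the binary's read-only data. The entry formula below is only a placeholder, not your real computation, and evaluating two million entries at compile time may require raising compiler-specific constexpr limits:

#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLutSize = 2'000'000;   // 2 million 32-bit values, as in the question

// Placeholder for the real (expensive) computation of one entry.
constexpr std::uint32_t compute_entry(std::size_t i)
{
    return static_cast<std::uint32_t>(i) * 2654435761u;
}

constexpr auto make_lut()
{
    std::array<std::uint32_t, kLutSize> lut{};
    for (std::size_t i = 0; i < kLutSize; ++i)   // loops in constexpr need C++14 or later
        lut[i] = compute_entry(i);
    return lut;
}

// Evaluated entirely at compile time; the data ends up in the binary's read-only section.
constexpr auto kLut = make_lut();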
I did a similar task in the following way:
In the header file, say MySpecialArray.h, there is just an external array declaration:
enum {
    uMySpecialArraySize = 1000000,
};

extern const unsigned* g_pMySpecialArray;
Generate (with a special run of a definable code block, or a separate generator project) a file MySpecialArray.cpp with the actual LUT data as a static C array:
#include "stdafx.h"
#include "MySpecialArray.h"
const unsigned g_MySpecialArray[uMySpecialArraySize] = {
0x12345678, 0x12345678, 0x12345678, 0x12345678, 0x12345678, 0x12345678,
...
};
const unsigned* g_pMySpecialArray = g_MySpecialArray;
This has worked for me for many years. The array is easily accessible from the code and the data block ends up inside the compiled exe. Compile time of a mid-size project is normal, and there was no significant change from this addition as far as I remember.
My LUT is a few times smaller than the requested millions of entries, but I expect it to work on millions as well.
The question is a bit old, but maybe someone else is looking for a solution.
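A rough sketch of the "special run" that generates MySpecialArray.cpp might look like this. The compute_value function is a stand-in for the expensive calculation, and the precompiled-header include from the answer above is omitted:

#include <cstdio>

// Stand-in for the long-running computation of each LUT entry.
static unsigned compute_value(unsigned i) { return i * 2654435761u; }

int main()
{
    const unsigned uMySpecialArraySize = 1000000;   // must match MySpecialArray.h

    std::FILE* out = std::fopen("MySpecialArray.cpp", "w");
    if (!out) return 1;

    std::fprintf(out, "#include \"MySpecialArray.h\"\n\n");
    std::fprintf(out, "const unsigned g_MySpecialArray[uMySpecialArraySize] = {\n");
    for (unsigned i = 0; i < uMySpecialArraySize; ++i)
        std::fprintf(out, "0x%08x,%c", compute_value(i), (i % 8 == 7) ? '\n' : ' ');
    std::fprintf(out, "\n};\n\nconst unsigned* g_pMySpecialArray = g_MySpecialArray;\n");

    return std::fclose(out) == 0 ? 0 : 1;
}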

C++: How can I change the size of a void* according to a file I want to process

I am currently trying to make a program that can read a .blend file. Well, trying is the important part, since I am already stuck on reading the file block info.
I'm going to quickly explain my problem; please refer to this page for context.
So in the .blend header there is a char that determines whether the pointer size, later used in the file block info (or just fileBlock on the linked web page) among other things, is 4 or 8 bytes. From what I have read, in C++ a void pointer only changes size according to the target platform it was compiled for (8 bytes for 64-bit and 4 bytes for 32-bit). However, .blend files can have either one, regardless of the platform, I presume.
Now, since Blender itself also reads its own files using C, there must be a way to change the pointer to match the required pointer size according to the info in the header. However, my best guess would be to dynamically allocate a void pointer array of either one or two pointers, which then makes actually using the data even more complicated.
Please help me find the intended way of handling the different pointer sizes!
Go back to the top of the wiki page and you will find the File Header structure. The header of a blend file starts with "BLENDER", which is followed by the pointer size for the file:
Size of a pointer
All pointers in the file are stored in this format
'_' (underscore) means 4 bytes or 32 bit
'-' (minus) means 8 bytes or 64 bits.
So by reading the eighth byte of the file you know the size of the pointers in the file.
if (file_bytes[7] == "_")
ptr_size = 4;
else if (file_bytes[7] == "-")
ptr_size = 8;
The copy of Blender creating the file determines the sizes used in it, so a 32-bit build will save 32-bit pointers in the file while a 64-bit build will save 64-bit pointers.
You should also read the next byte; it tells you whether the file was saved as big- or little-endian, so you know whether you need to do any byte swapping. The use of Blender on big-endian machines might be getting smaller, but you may still come across big-endian files.
Another important thing that doesn't seem to be mentioned is that .blend files can be compressed, and often are. Reading a compressed .blend file means using gzread() to read it. A compressed file has its first two bytes set to 0x1f 0x8b (the gzip magic number).
You will find the code that blender uses to read blend files in source/blender/blenloader.
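As a hedged sketch (assuming zlib is available and linked with -lz): gzopen()/gzread() handle both gzip-compressed and plain input transparently, so the same code can read either kind of .blend and then inspect the "BLENDER" magic and the pointer-size byte in the decompressed bytes:

#include <zlib.h>

#include <vector>

// Reads a whole .blend file into memory; gzread() decompresses gzip input
// and copies plain input unchanged, so both cases end up as raw .blend bytes.
std::vector<unsigned char> read_blend_bytes(const char* path)
{
    std::vector<unsigned char> data;

    gzFile f = gzopen(path, "rb");
    if (!f) return data;

    unsigned char buf[64 * 1024];
    int n;
    while ((n = gzread(f, buf, sizeof buf)) > 0)
        data.insert(data.end(), buf, buf + n);

    gzclose(f);
    return data;   // data[0..6] == "BLENDER", data[7] == '_' or '-', data[8] == endianness flag
}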
Yup, that's painful. The solution is not to treat them as C++ pointers at all. Instead, create your own class BlendPointer to abstract this away. Those would be read from a BlendFile, and that BlendFile would store whether its BlendPointers are 4 or 8 bytes on disk.
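A minimal sketch of that idea (the names BlendFile and read_file_pointer are my own, not Blender's): always widen on-disk pointers to a fixed 64-bit integer in memory, so only the reading code cares whether the file stores 4 or 8 bytes per pointer:

#include <cstdint>
#include <cstring>

// In memory, a file pointer is just a 64-bit id/offset, never a real C++ pointer.
using BlendPointer = std::uint64_t;

struct BlendFile {
    int  ptr_size;        // 4 or 8, from byte 7 of the header ('_' or '-')
    bool little_endian;   // from byte 8 of the header; swap bytes if it doesn't match the host
};

// Reads one on-disk pointer from a raw buffer and widens it to 64 bits.
// (Assumes the file and host are both little-endian, for brevity.)
BlendPointer read_file_pointer(const BlendFile& bf, const unsigned char* src)
{
    if (bf.ptr_size == 4) {
        std::uint32_t v;
        std::memcpy(&v, src, sizeof v);
        return v;
    }
    std::uint64_t v;
    std::memcpy(&v, src, sizeof v);
    return v;
}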

int t[1000000]={1,2,3}; produces a large binary, why?

int t[1000000];
int main(){}
This code (as a .cpp file) produces a small binary when compiled with g++. If I later used the array t, it would have all its elements set to 0.
int t[1000000]={1,2,3};
int main(){}
This one, even if compiled with optimization for size (-Os), produces a binary file that is nearly 4 MB big. The array t is the same as in the first example, except that t[0], t[1] and t[2] are set to 1, 2 and 3 respectively. Why does storing those three numbers require so much extra file size?
Tested on Linux, gcc version 5.4.0.
Zero-initialised data with static storage duration is normally stored very efficiently in an executable file. It consumes almost no space on the disk. The executable contains a couple of words that say how many zero-initialised bytes it needs and at which address, and that's it.
Statically-initialised data, on the other hand, is stored as literal bytes that represent its value, and it doesn't matter if most of these bytes are zeroes. The executable format is defined such that it needs to physically store all of them. There's no provision to specify "1, 2, 3 and the rest are zeroes" in a space-efficient manner.
When the executable is loaded into RAM, the loader allocates the required amount of memory for zero-initialised data and fills it with zero-valued bytes, so there is no saving of RAM, only of disk space.
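One common way around this, if those few non-zero entries really matter (a sketch, not the only option): keep the array zero-initialised so it stays in .bss, and set the handful of non-zero values at start-up:

int t[1000000];   // zero-initialised: recorded in .bss, costs almost nothing on disk

// Patch the three non-zero entries at run time instead of storing ~4 MB in .data.
static struct TInit {
    TInit() { t[0] = 1; t[1] = 2; t[2] = 3; }
} t_init;

int main() {}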

How to use a macro for an unsigned long number?

Here's my code:
#define MSK 0x0F
#define UNT 1
#define N 3000000000

unsigned char aln[1+N];
unsigned char pileup[1+N];

void set(unsigned long i)
{
    if ((aln[i] & MSK) != MSK) {
        aln[i] += UNT;
    }
}

int main(void) {}
When I try to compile it, the compiler complains like this:
tmp/ccJ4IgSa.o: In function `set':
bitmacs.c:(.text+0xf): relocation truncated to fit: R_X86_64_32S against symbol `aln' defined in COMMON section in /tmp/ccJ4IgSa.o
bitmacs.c:(.text+0x29): relocation truncated to fit: R_X86_64_32S against symbol `aln' defined in COMMON section in /tmp/ccJ4IgSa.o
bitmacs.c:(.text+0x32): relocation truncated to fit: R_X86_64_32S against symbol `aln' defined in COMMON section in /tmp/ccJ4IgSa.o
I think the reason may be that N is too big, because it compiles successfully if I change N to 2000000000. But I need 3000000000 as the value of N.
Does anyone have an idea about that?
Per your original question: use the integer literal suffix UL (or similar) to force the storage type of N:
#define N 3000000000UL
However, (per your comment on HLundvall's answer) the relocation truncated to fit error obviously isn't due to this - it may (as Mystical and Matt Lacey say) simply be too big to fit in the segment.
As an aside, if you ask a separate question explaining what you're trying to accomplish with your huge arrays, someone may be able to suggest a better solution (one that is more likely to fit in memory).
For example:
your sample code only uses the low nibble of each byte: you could pack two entries per byte and halve the size (which is admittedly still much too large); see the sketch after this list
depending on your access patterns, you might be able to keep the array on disk and cache a working subset in memory
there may be better overall algorithms and data structures if we knew what you needed
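A hedged sketch of the nibble-packing idea from the first bullet (it halves the array to roughly 1.5 GB, which is still enormous, so treat it only as an illustration of the packing logic):

#define MSK 0x0F
#define UNT 1
#define N 3000000000UL

/* Two 4-bit counters per byte: entry i lives in byte i/2,
   in the low nibble for even i and in the high nibble for odd i. */
static unsigned char aln[N / 2 + 1];

void set(unsigned long i)
{
    unsigned long byte  = i >> 1;
    unsigned      shift = (i & 1UL) ? 4u : 0u;
    unsigned      cnt   = (aln[byte] >> shift) & MSK;

    if (cnt != MSK) {
        aln[byte] = (unsigned char)((aln[byte] & ~(MSK << shift)) | ((cnt + UNT) << shift));
    }
}

int main(void) { return 0; }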
Disregarding the "formal" problem that your numeric literal isn't of the correct type (see the other answers for the correct syntax), the key point here is that it's a very bad idea to allocate a 3 GB static/global array.
static and global[1] variables on most platforms are mapped directly from the executable image, which means that your executable would have to be as big as 3 GB, which is quite big even by current-day standards. Even if on some platforms this limitation may be lifted (see the comments), you don't have any control over how to handle the failure of allocation.
Most importantly, global variables are not intended for such big stuff, and you are likely to find problems with arbitrary limits imposed by the linker (such as the one you found) and the loader. Instead, you should allocate anything that's bigger than a few KBs on the heap, using malloc, new or some platform-specific function, handling gracefully the possible failure at runtime.
Still, keep in mind that for an application running under almost any 32 bit operating system it's not possible to get 3 GB of contiguous memory as you request, and it's impossible altogether to get more than one of these arrays (=more than 4 GB of contiguous memory) without resorting to platform-specific tricks (e.g. mapping only specific parts of the arrays in memory at a given moment).
Also, are you sure that you do need all that contiguous memory since your program starts to run? Isn't there some better data structure/algorithm that could avoid allocating all that memory?
[1] In general, what the standard calls variables with static storage duration.
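A minimal sketch of the heap-based alternative suggested above (realistic only on a 64-bit system with enough memory; N as in the question):

#include <cstdio>
#include <cstdlib>

#define MSK 0x0F
#define UNT 1
#define N 3000000000UL

static unsigned char* aln;   // allocated at run time instead of being a 3 GB global

void set(unsigned long i)
{
    if ((aln[i] & MSK) != MSK) {
        aln[i] += UNT;
    }
}

int main(void)
{
    aln = static_cast<unsigned char*>(std::calloc(N + 1, 1));   // zero-initialised
    if (!aln) {
        std::fprintf(stderr, "failed to allocate %lu bytes\n", (unsigned long)(N + 1));
        return 1;
    }

    /* ... use set(i) here ... */

    std::free(aln);
    return 0;
}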
To enter a numeric constant of type unsigned long use:
#define N 3000000000UL
The problem is that gcc (by default) uses pc-relative accesses to get the address of static data objects on x86_64 targets, and those accesses are limited to 2^31 bytes maximum. So if the symbol ends up getting placed more than 2GB away from the code that accesses it, you'll end up getting this link error when it tries to use an offset that is too big to fit in the 32 bits of space allowed in the instruction.
You can avoid this problem by using the -mcmodel=large option to gcc. This tells it not to assume that it can use 32-bit PC-relative offsets to access symbols (among other things).
Note that the type suffix of the constant literal is mostly irrelevant -- a constant literal that is too big for an int will automatically become a long (or even long long if needed) without any suffix. See 6.4.4.1.5 of the C99 spec.
Your executable is trying to put objects at addresses beyond what the 32-bit relocations used here can reach (with R_X86_64_32S, roughly the 2 GB mark), which is not allowed. See this link: http://www.technovelty.org/code/c/relocation-truncated.html.
From the article: "If you're seeing this and you're not hand-coding, you probably want to check out the -mmodel argument to gcc."

How much is 32 kB of compiled code

I am planning to use an Arduino programmable board. Those have quite limited flash memories ranging between 16 and 128 kB to store compiled C or C++ code.
Are there ways to estimate how much (standard) code it will represent?
I suppose this is very vague, but I'm only looking for an order of magnitude.
The output of the size command is a good starting place, but does not give you all of the information you need.
$ avr-size program.elf
text data bss dec hex filename
The size of your image is usually a little bit more than the sum of the text and the data sections. The bss section is essentially compressed because it is all 0s. There may be other sections which are relevant which aren't listed by size.
If your build system is set up like ones that I've used before for AVR microcontrollers, then you will end up with an *.elf file as well as a *.bin file, and possibly a *.hex file. The *.bin file is the actual image that would be stored in the program flash of the processor, so you can examine its size to determine how your program is growing as you make edits to it. The *.bin file is extracted from the *.elf file with the objcopy command and some flags which I can't remember right now.
If you want to guesstimate how much compiled code your C or C++ source will produce, that is a lot more difficult. I have observed a 10x blow-up in a function when I tried to use a uint64_t rather than a uint32_t when all I was doing was incrementing it (this was about 5 times more code than I thought it would be). This was mostly down to gcc's AVR optimizations not being the best, but smaller changes in code size can creep in from seemingly innocent code.
This will likely be amplified with the use of C++, which tends to hide more things that turn into code than C does. Chief among the things C++ hides are destructor calls and lots of pointer dereferencing which has to do with the this pointer in objects as well as a secret pointer many objects have to their virtual function table and class static variables.
On AVR all of this pointer stuff is likely to really add up because pointers are twice as big as registers and take multiple instructions to load. Also AVR has only a few register pairs that can be used as pointers, which results in lots of moving things into and out of those registers.
Some tips for small programs on AVR:
Use uint8_t and int8_t instead of int whenever you can. You could also use uint_fast8_t and int_fast8_t if you want your code to be portable. This can lead to many operations taking up only half as much code, because int is two bytes (see the short sketch after this list).
Be very aware of things like string and struct constants and literals and how/where they are stored.
If you're not scared of it, read the AVR assembly manual. You can get an idea of the types of instructions, and from that the type of C code that easily maps to those instructions. Use that kind of C code.
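To illustrate the first tip (a made-up example, not from the question): on AVR, int is 16 bits, so an 8-bit loop counter typically needs fewer instructions and registers:

#include <stdint.h>

uint8_t buf[16];

void clear_buf(void)
{
    // 8-bit counter: one register and an 8-bit compare; an int counter would need a 16-bit pair.
    for (uint8_t i = 0; i < sizeof buf; ++i)
        buf[i] = 0;
}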
You can't really say. The length of the uncompiled code has little to do with the length of the compiled code. For example:
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>   // needed for std::ostream_iterator

int main()
{
    std::vector<std::string> strings;
    strings.push_back("Hello");
    strings.push_back("World");
    std::sort(strings.begin(), strings.end());
    std::copy(strings.begin(), strings.end(),
              std::ostream_iterator<std::string>(std::cout, ""));
}
vs
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>

int main()
{
    std::vector<std::string> strings;
    strings.push_back("Hello");
    strings.push_back("World");
    for (int idx = 0; idx < strings.size(); idx++)
        std::cout << strings[idx];
}
Both are nearly the same length and produce the same output, but the first example involves an instantiation of std::sort, which is probably an order of magnitude more code than the rest of the code here.
If you absolutely need to count the number of bytes used in the program, use assembler.
Download the arduino IDE and 'verify' some of your existing code, or look at the sample sketches. It will tell you how many bytes that code is, which will give you an idea of how much more you can fit into a given device. Picking a couple of the examples at random, the web server example is 5816 bytes, and the LCD hello world is 2616. Both use external libraries.
Try creating a simplified version of your app, focusing on the most valuable feature first, then start adding up the 'nice (and cool) stuff to have'. Keep an eye on the byte usage shown in the Arduino IDE when you verify your code.
As a rough indication, my first app (an LED flasher controlled by a push button) requires 1092 bytes. That's roughly 1 K out of 32 K. Pretty small footprint for C++ code!
What worries me most is the limited amount of RAM (1 KB). If the CPU stack takes some of it, then there isn't much left for creating any data structures.
I've only had my Arduino for 48 hrs, so there is still a lot to learn to use it effectively ;-) But it's a lot of fun to use :).
It's quite a bit for a reasonably complex piece of software, but you will start bumping into the limit if you want it to have a lot of different functionality. Also, if you want to store quite a lot of static strings and data, it can eat into that quite quickly. But 32 KB is a decent amount for embedded applications. It tends to be RAM that you have problems with first!
Also, quite often the C++ compilers for embedded systems are a lot worse than the C compilers.
That is, they are nowhere near as good as C++ compilers for the common desktop OSes (in terms of producing efficient machine code for the target platform).
At a linux system you can do some experiments with static compiled example programs. E.g.
$ size `which busybox`
   text    data     bss     dec     hex filename
1830468    4448   25650 1860566  1c63d6 /bin/busybox
The sizes are given in bytes. This output is independent of the executable file format, since it reports the sizes of the different sections inside that format. The text section contains the machine code and const stuff. The data section contains data for static initialization of variables. The bss size is the size of uninitialized data; of course, uninitialized data does not need to be stored in the executable file.
Well, busybox contains a lot of functionality (like all common shell commands, a shell etc.).
If you link your own examples with gcc -static, keep in mind that the libc you use may dramatically increase the program size, and that using an embedded libc may be much more space-efficient.
To test that, you can check out dietlibc or uClibc and link against one of them. Actually, busybox is usually linked against uClibc.
Note that the sizes you get this way only give you an order of magnitude. For example, your workstation probably uses a different CPU architecture than the Arduino board, and the machine code of different architectures may differ, more or less, in size (because of operand sizes, available instructions, opcode encoding and so on).
To go on with rough order-of-magnitude reasoning: busybox contains roughly 309 tools (including an FTP daemon and similar), i.e. the average code size of a busybox tool is roughly 5k.