Produce Large Object file from Smaller source file [closed] - c++

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I need to stress test my program with large object files. I have looked into C++ templates and inline functions/templates but have not been able to reach the obj/source size ratio I want (about 50). I want a single source file that compiles to a single object file of up to 200 MB. Any high-level ideas would be greatly appreciated.
Additional edit: I have created large, complex, and diverse (random) template functions and have started calling them (creating instantiations) with unique types/parameters. This increased the obj/source ratio, as expected, up to a point (around 12). Then the ratio dropped significantly (to about 1), which I assume is gcc outsmarting me and optimizing my functions away. I have also looked into forcing gcc to generate all functions inline, but my tests haven't shown improvement there yet either.
Using the preprocessor to generate code bloat is not a valid technique for what I wish to accomplish.

You could use the preprocessor to generate lots and lots of code at preprocessing time, but that might count for you as the source file itself being large.
Generally speaking, the machine code that a C++ compiler produces is relatively small (in bytes) compared to the source code it was compiled from.
One thing that does take up space is string resources. If you use the same string over and over again, the compiler is smart enough to realise that it only needs to store it once, but if you change the string a little bit each time, then each variation will probably be stored separately. Those variations can be generated with the preprocessor.
Another idea is, as you said, using templates to generate lots of functions for lots of different types.
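As a rough sketch of that idea (assuming GCC, which the question mentions): a recursive helper template stamps out one non-inlined function per integer parameter, and a volatile local discourages the optimizer from collapsing the bodies. The names and the 0-512 range are arbitrary choices for illustration.

    // bloat.cpp -- each distinct N instantiates a separate, non-inlined function,
    // so the object file grows roughly linearly with the number of instantiations.
    #include <cstdint>

    template <int N>
    __attribute__((noinline))                // ask GCC not to fold the body into callers
    std::uint64_t work(std::uint64_t x) {
        volatile std::uint64_t v = x;        // volatile keeps the loop from being folded away
        for (int i = 0; i < N % 97 + 16; ++i)
            v = v * 6364136223846793005ULL + N;
        return v;
    }

    template <int Lo, int Hi>
    struct Expand {                          // instantiates work<Lo> .. work<Hi>
        static std::uint64_t run(std::uint64_t x) {
            return work<Lo>(x) + Expand<Lo + 1, Hi>::run(x);
        }
    };

    template <int Hi>
    struct Expand<Hi, Hi> {
        static std::uint64_t run(std::uint64_t x) { return work<Hi>(x); }
    };

    int main(int, char** argv) {
        // 513 distinct instantiations; widen the range (and raise -ftemplate-depth)
        // to push the object size further.
        return static_cast<int>(
            Expand<0, 512>::run(reinterpret_cast<std::uint64_t>(argv)));
    }

Compiling with a low optimization level (e.g. g++ -O0 -c bloat.cpp) shows how the obj/source ratio scales with the range; aggressive optimization or identical-code folding at link time can still shrink the result, which matches the drop the questioner observed.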
What is it that you want to accomplish? There might be better ways.

Related

Authentication via command line [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I want to provide a binary-only program written in C or C++ where the user has to pass a valid code via the command line to be able to use some extra features of the program itself. The idea is to implement some verification strategy in the program which compares the passed code against a run-time generated code that uniquely identifies the system or hardware on which the program is being run.
In other words, if and only if the run-time check:
f(<sysinfo>) == <given code>
is true, then the user is allowed to use the extra features of the program. f is the function generating the code at run-time and sysinfo is an appropriate piece of information identifying the current system/hardware (e.g. the MAC address of the first Ethernet card, the serial number of the processor, etc.).
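To illustrate the shape of the check (only a sketch, not a proposal for a strong f): a placeholder FNV-1a hash stands in for f, and the sysinfo string is hard-coded where the real program would read the MAC address, machine GUID, serial number, and so on; the code to verify is taken from the command line as hex.

    // check.cpp -- skeleton of: f(<sysinfo>) == <given code>
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <string>

    // Placeholder for f: plain FNV-1a (64-bit). A real product would want something
    // keyed or obfuscated rather than a public, easily recognised hash.
    static std::uint64_t f(const std::string& sysinfo) {
        std::uint64_t h = 0xcbf29ce484222325ULL;   // FNV offset basis
        for (unsigned char c : sysinfo) {
            h ^= c;
            h *= 0x100000001b3ULL;                 // FNV prime
        }
        return h;
    }

    int main(int argc, char** argv) {
        if (argc < 2) return 1;
        const std::string sysinfo = "00:11:22:33:44:55";         // stand-in for real hardware info
        const std::uint64_t given = std::strtoull(argv[1], nullptr, 16);
        std::puts(f(sysinfo) == given ? "extra features unlocked" : "invalid code");
        return 0;
    }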
The aim is to make it as difficult as possible for the user to guess a valid code, or to guess the way to calculate one, without knowing f and sysinfo a priori. More importantly, I want it to be difficult to re-implement f by analyzing the disassembled code of the program.
Assuming the above is a strong strategy, how could I implement f in C or C++ and what can I choose as its argument? Also what GCC compiler flags could I turn on to obfuscate f specifically? Note that, for example, things like MD5(MAC) or MD5(SHA(MAC)) would be too simple for evident reasons.
EDIT: Another interesting point is how to make it difficult for the user to attack the code directly by removing or bypassing the portion of the code doing the check.
If you are on Windows, a standard strategy is to hash the value of the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography\MachineGuid
If you're worried that a user might "guess" the hash function, take a standard SHA-256 implementation and do something sneaky, like changing the algorithm's initialization values (one of its two groups of constants is initialized from binary representations of the cube roots of the first primes; change it to 5th or 7th or whatever roots, starting at the nth place so that you chop off the "all-zero" parts, etc.).
But really, if someone is going to take the time to RE your code, it's much easier to attack the branch in the code that does the if (codeValid) { allowExtraFeatures(); } than to mess with the hashes... so don't worry too much about it.

Where to store code constants when writing a JIT compiler? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am writing a JIT compiler for x86-64 and I have a question regarding best practice for inclusion of constants into the machine code I am generating.
My approach thus far is straightforward (a minimal sketch follows the list):
Allocate a chunk of RW memory with VirtualAlloc or mmap.
Load the machine code into said memory region.
Mark the page executable with VirtualProtect or mprotect (and remove the write privilege for security).
Execute.
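A minimal sketch of those four steps on POSIX (mmap/mprotect; VirtualAlloc/VirtualProtect would take their place on Windows), with the constant baked in as an immediate:

    // jit_min.cpp -- emit a tiny function at run time, flip the page RW -> RX, call it.
    #include <cstddef>
    #include <cstdio>
    #include <cstring>
    #include <sys/mman.h>

    int main() {
        // mov eax, 42 ; ret   (the constant 42 is baked into the instruction)
        const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        const std::size_t size = 4096;                   // one page is plenty here
        void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);   // step 1: RW memory
        if (mem == MAP_FAILED) return 1;

        std::memcpy(mem, code, sizeof code);             // step 2: load the machine code

        if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0)    // step 3: make it RX (W^X)
            return 1;

        auto fn = reinterpret_cast<int (*)()>(mem);      // step 4: execute
        std::printf("%d\n", fn());                       // prints 42

        munmap(mem, size);
        return 0;
    }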
When I am generating the code, I have to include constants (numerical, string), and I am not sure of the best way to go about it. I have several approaches in mind:
Store all constants as immediate values into instructions' opcodes. This seems like a bad idea for everything except maybe small scalar values.
Allocate a separate memory region for constants. This seems to me like the best idea, but it slightly complicates memory management and the compilation workflow - I have to know the memory location before I can start writing the executable code. Also, I am not sure whether this hurts performance due to worse memory locality.
Store the constants in the same region as the code and access it with RIP-relative addressing. I like this approach since it keeps relevant parts of the program together but I feel slightly uneasy about mixing instructions and data.
Something completely different?
What is the preferable way to go about this?
A lot depends on how you are generating your binary code. If you use a JIT assembler that handles labels and figures out offsets, things are pretty easy. You can stick the constants in a block after the end of the code, using pc-relative references to those labels, and end up with a single block of bytes with both the code and the constants (easy management). If you're trying to generate binary code on the fly, you already have the problem of figuring out how to handle forward pc-relative references (e.g. for forward branches). If you use back-patching, you need to extend that to support references to your constants block.
You can avoid the pc-relative offset calculations by putting the constants in a separate block and passing the address of that block as a parameter to your code. This is pretty much the "Allocate a separate region for constants" you propose. You don't need to know the address of the block if you pass it in as an argument.
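To make that last point concrete, here is a minimal sketch (again POSIX and the x86-64 System V ABI, where the first pointer argument arrives in rdi): the generated code never needs to know where the constants live at emit time, it simply indexes off the pointer it is handed.

    // jit_consts.cpp -- the generated code reads its constants through a pointer argument.
    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <sys/mman.h>

    int main() {
        // mov eax, dword ptr [rdi] ; ret   -- returns consts[0]
        const unsigned char code[] = { 0x8B, 0x07, 0xC3 };

        void* mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) return 1;
        std::memcpy(mem, code, sizeof code);
        if (mprotect(mem, 4096, PROT_READ | PROT_EXEC) != 0) return 1;

        const std::int32_t constants[] = { 1234, 5678 };     // ordinary data, separate from the code
        auto fn = reinterpret_cast<int (*)(const std::int32_t*)>(mem);
        std::printf("%d\n", fn(constants));                  // prints 1234

        munmap(mem, 4096);
        return 0;
    }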

Why do developers split every single part of a software into so many modules? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm having a read of other people's source code from open source projects such as Pidgin, FileZilla and various others so that I may get a brief idea of how software really is written.
I noticed that when writing GUIs, they like to split the whole interface into classes.
More or less, in lots of projects I see every single bit broken down, into perhaps a total of 70 files (35 .cpp and 35 .h).
For example: one listview may be an entire class, a menubar may be a class, or a tabview a whole module.
And this isn't just the UI part - the network modules are also broken down to a huge degree - almost every function has its own .cpp file.
My question: Is this really just preference or does it have any particular benefit?
I, for example, would have written the whole UI as a single module.
What is the actual reason?
Some languages encourage one file per type, and people who know those languages also program in C++ and bring that habit here.
For reuse, you want the things you put into a header to be simple, orthogonal, and conceptually clean; this tends to mean avoiding files that have everything one particular project needs.
Before git/mercurial it could be a real hassle to have multiple people edit the same file. Thus, separating things into lots of files helps a lot when multiple people are editing, both for the edits themselves and for your version control software.
It also speeds up compilation. The smaller the file you are editing, the less recompilation is needed, so unless the linking stage is slow, small files are a very good thing.
Many people have been hurt by cramming things into a single file or a small number of files. Few people have been seriously hurt by chopping things up into 50+ files. People tend toward approaches that don't overtly teach them hard lessons.
You might want to split the project into separate files to improve readability and sometimes also to make debugging easier. The FileZilla project could have been written in just two files, something like main.cpp and main.h, but then you would have to write tens of thousands of lines of code in the same file, which is very bad programming practice even though it is legal.
One benefit comes from testing. Distributing system testing throughout the design hierarchy (rather than, e.g., testing only a single physical component) can be much more effective and cheaper than testing only at the highest-level interface.

Should I use a code converter (Python to C++)? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 months ago.
Let me just say right off the bat that I'm not a programmer. I'm just a guy with an idea taking his first steps to make it a reality. I'm no stranger to programming, mind you, but some of the concepts and terminology here are way over my head, so I apologize in advance if this question has been answered before (i.e. Convert Python program to C/C++ code?).
I have an idea to create a simple A.I. network to analyze music data sent from a phone via cloud computing (I've got a guy for the cloud stuff). It will require a lot of memory and needs to be fast for the hard number-crunching. I had planned on doing it in Python, but have since learned that might not be such a good idea (Is Python faster and lighter than C++?).
Since Python is really the only gun I have in my holster, I was thinking of using a Python-to-C++ converter. But nothing comes without a price:
Is this an advantageous way to keep my code fast?
What's the give-and-take for using a converter?
Am I missing anything? I'm still new to this, so I'm not even sure what questions to ask.
Thanks in advance.
Generally it's an awful way to write code, and does not guarantee that it will be any faster. Things which are simple and fast in one language can be complex and slow in another. You're better off either learning how to write fast Python code or learning C++ directly than fighting with a translator and figuring out how to make the generated code run acceptably.
If you want C++, use C++. Note, however, that PyPy has a bunch of benchmarks showing that it can be much faster than C; and with NumPy, which uses compiled extensions, numerical work becomes much faster and easier.
If you want to programme in something statically compiled, and a bit like Python, there's RPython.
Finally, you can do what NumPy does: use extensions written in C or C++ for most of your heavy computational lifting, where that appears to be appropriate, either because profiling shows a hotspot or because you need an extension to more easily do something involving Python's internals. Note that this will tie your code to a particular implementation.
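As one concrete example (pybind11 is just one binding option, and the module and function names below are made up), a hot loop can be moved into C++ like this:

    // fastmath.cpp -- build as a Python extension module, for example with:
    //   c++ -O3 -shared -fPIC $(python3 -m pybind11 --includes) fastmath.cpp \
    //       -o fastmath$(python3-config --extension-suffix)
    #include <pybind11/pybind11.h>
    #include <pybind11/stl.h>        // lets a Python list convert to std::vector
    #include <vector>

    // Hypothetical hot loop: sum of squares over a buffer of samples.
    static double sum_of_squares(const std::vector<double>& samples) {
        double acc = 0.0;
        for (double s : samples) acc += s * s;
        return acc;
    }

    PYBIND11_MODULE(fastmath, m) {
        m.def("sum_of_squares", &sum_of_squares,
              "Sum of squares of a sequence of floats, computed in C++");
    }

From Python it is then just import fastmath; fastmath.sum_of_squares([1.0, 2.0, 3.0]), which keeps the numeric kernel compiled while the surrounding logic stays in Python.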
Similar to what was already stated, C++ may be faster in some areas and slower in others, and Python is exactly the same. In the end, any language is converted into machine code, and it is up to the compiler to make it as efficient as it knows how. That said, it is better to pick one language and learn how to write fast and efficient code to do what you want.
No, because a significant part of good C++ performance comes from the ability to choose the better-performing architecture. It does not come magically just from the fact that "it is C".
A simple, line-by-line translation from Python into C++ is unlikely to improve performance more than just using something like Cython, so I think it is more reasonable to use Cython. It can still be much worse than what a good developer can do with C++ from scratch. C++ simply provides more control over everything: the possibility to define data types of the minimal needed length, fixed-size arrays on the stack, turning off array bounds checking in production, and the like.

Reverse engineering C++ - best tools and approach [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I am sorry - C++ source code can be seen as an implementation of a design, and by reverse-engineering I mean getting the design back. It seems most of you have read it as getting C++ source from binaries. I have posted a more precise question at Understanding a C++ codebase by generating UML - tools&methology.
I think there are many tools that can reverse-engineer C++ (source-code), but usually it is not so easy to make sense of what you get out.
Has somebody found a good methodology?
I think one of the things I might want to see, for example, is the GUI layer and how it is (or is not) separated from the rest. I think the tools should somehow detect packages, and then let me organize them manually.
To my knowledge, there are no reliable tools that can reverse-engineer compiled C++.
Moreover, I think it should be near impossible to construct such a tool. A compiled C++ program becomes nothing more than machine language instructions. In order to know how that is mapped to C++ constructs, you need to know the compiler, compiler settings, libraries included, etc., ad infinitum.
Why do you want such a thing? Depending on what you want it for, there may be other ways to accomplish what you're really after.
While it isn't a complete solution, you should look into IDA Pro and Hex-Rays.
It is more for "reverse engineering" in the traditional sense of the phrase. As in, it will give you a good enough idea of what the code would look like in a C-like language, but will not (cannot) provide fully functioning source code.
What it is good for is getting a good understanding of how a particular segment (usually a function) works. It is "user assisted", meaning that it will often produce a lot of dereferences of offsets when there is really a struct or class. At that point, you can supply the decompiler with a struct definition (classes are really just structs with extra things like v-tables and such) and it will reanalyze the code with the new type information.
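As a small, purely illustrative example of that workflow (the type and field names are hypothetical): raw decompiler output for a member access tends to look like untyped offset arithmetic, and supplying a struct definition lets the same access be shown as a named field.

    // Definition you might feed to the decompiler's type editor.
    struct Packet {
        int   type;      // offset 0
        int   length;    // offset 4
        char* payload;   // offset 8 on x86-64
    };

    // Before typing the argument, the output might read: return *(int *)(a1 + 4);
    // After applying the Packet type, the same code reanalyzes to roughly:
    int packet_length(const Packet* p) {
        return p->length;
    }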
Like I said, it isn't perfect, but if you want to do "reverse engineering" it is the best solution I am aware of. If you want full "decompilation" then you are pretty much out of luck.
You can recover control flow from disassembly, but you will never get the data types back...
There are only integers (and maybe some shorts) in assembly. Think about objects, arrays, structs, strings, and pointer arithmetic all being the same type!
The OovAide project at http://sourceforge.net/projects/oovaide/ or on GitHub has a few features that may help. It uses the Clang compiler to retrieve accurate information from the source code. It scans the directories looking for source code and collects it into a smaller dataset that contains the information needed for analysis.
One concept is called Zone Diagrams. It shows relationships between classes at a very high level: each class is shown as a dot on the diagram, and relationship lines are shown connecting them. This allows the diagrams to show hundreds or thousands of classes.
The OovAide zone diagram display has an option called "Show Child Zones", which groups classes that are within the same directories closer to each other. There are also directory filters, which allow reducing the number of classes shown on a diagram for very large projects.
An example of zone diagrams and how they work is shown here: http://oovaide.sourceforge.net/articles/ZoneDiagrams.html
If the directories are assigned component types in the build settings, then the component diagram will show the dependencies between components. This even shows which components depend on external components such as GTK or other external libraries.
The next level down shows something like UML class diagrams, but shows all relations instead of just aggregation and inheritance. It can show classes that are used within methods, or classes that are passed as parameters to methods. Any class can be chosen as a starting point; then, before a class is added to the diagram, a list is displayed that shows which classes would be displayed for each relationship type.
The lowest level shows sequence diagrams. These allow navigating up or down the call tree while showing the classes that contain the methods.