Tracking global definitions in LLVM - llvm

I am trying to manually build a list of instructions where a particular variable is getting assigned a value in the LLVM IR.
For local variables in a function, I can easily get the right set of instructions by using the instruction iterator and checking the operands of a particular instruction. This approach doesn't seem to work for global variables, since there's no store instruction associated with their initial value.
Is there some way to keep track of where the global variable is being defined without looking at the metadata field? If not, is there some way to create a dummy instruction which can be treated as a special marker for the initial definition of the global variables?

For local variables in a function, I can easily get the right set of instructions by using the instruction iterator and checking the operands of a particular instruction.
That's not entirely accurate. It's true as long as the variable is in memory (and assignment is done via store), but if it is promoted to registers you'll need to rely on llvm.dbg.value calls to track assignments into it.
This approach doesn't seem to work for the global variables since there's no store instruction associated with them.
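Assignments to globals also appear as stores, except for the initial assignment, which is not an instruction at all: it is the initializer attached to the global's definition in the module.
Is there some way to keep track of where the global variable is being defined without looking at the metadata field?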
If by "where" you mean in which source line, you'll have to rely on the debug-info metadata.
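To make the point about stores versus the initial assignment concrete, here is a minimal sketch using a toy model of a function body. None of these types are LLVM classes; ToyInstr, ToyGlobal, DefSite and collectDefs are made-up names for illustration. The idea is only that the definition sites of a global are its initializer (recorded here as a synthetic marker entry, in place of the "dummy instruction" the question asks about) plus every store whose pointer operand is the global.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for IR entities; these are NOT LLVM classes.
struct ToyInstr {
    std::string opcode;   // e.g. "store", "load"
    std::string value;    // for stores: the value being written
    std::string pointer;  // for stores/loads: the address operand
};

struct ToyGlobal {
    std::string name;
    bool hasInitializer;
    std::string initializer;
};

struct DefSite {
    bool isInitializer;  // true for the synthetic "initial definition" marker
    int instrIndex;      // index into the instruction list, -1 for the marker
    std::string value;   // the value assigned at this site
};

// Collect every definition of `g`: the initializer first (as a marker entry),
// then each store whose pointer operand is the global itself.
std::vector<DefSite> collectDefs(const ToyGlobal& g,
                                 const std::vector<ToyInstr>& body) {
    assert(!g.name.empty());
    std::vector<DefSite> defs;
    if (g.hasInitializer)
        defs.push_back({true, -1, g.initializer});
    for (int i = 0; i < static_cast<int>(body.size()); ++i) {
        const ToyInstr& in = body[i];
        if (in.opcode == "store" && in.pointer == g.name)
            defs.push_back({false, i, in.value});
    }
    return defs;
}

// Hypothetical sample: a global initialized to 5, then stored to twice.
std::vector<DefSite> sampleDefs() {
    ToyGlobal counter{"@counter", true, "5"};
    std::vector<ToyInstr> body = {
        {"load", "", "@counter"},
        {"store", "7", "@counter"},
        {"store", "1", "@other"},
        {"store", "9", "@counter"},
    };
    return collectDefs(counter, body);
}
```

In a real pass the same shape would presumably come from the global's initializer and from the store instructions among the global's users, but the code above does not attempt to reproduce that API.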

Related

C++ - Global variable performance when it is likely in the cache

I'm trying to understand if my global variable usage which is being done for convenience and ease of assembly generation has a positive side-effect or not (I guess I'm looking to rid myself of the guilt of having these globals).
Program Details:
Broken up into "operations". Each operation reads I/O then does heavy mathematical compute, lots of special casing of code paths via hand-written assembly.
Single-threaded, will never be multi-threaded
One global variable, a fixed-size pre-allocated array (128K)
One global variable, an integer that acts as a pointer
My justification for using global variables here is primarily that I can then just generate call instructions without having to pass parameters, setting up the stack, etc.
The calls will be to functions like this:
void DoSomething1()
{
    // access global1's memory ...
    // increment global2 ...
    // reset code;
}
I can of course generate code for parameters, but then I thought maybe the global variables will likely have a perf benefit as well, since the compiler is going to use a constant address for the access. Of course, my global is extremely likely to be in the cache as well.
Am I thinking about this right? Is it possible that using the global the way I describe will make the compiler do a load/store on every access as opposed to enregistering it? In fact, can the compiler enregister a global variable at all?
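On the enregistering question, a rough sketch may help. A compiler is allowed to keep a global in a register over a stretch of code when it can prove nothing else can observe or modify it in between, but around calls it cannot see into (such as hand-written assembly routines) the value has to live in memory. Copying the global into a local makes the "load once, store once" pattern explicit instead of relying on the optimizer. The names g_buffer, g_cursor, fillDirect and fillCached below are hypothetical stand-ins for the two globals described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the two globals described in the question.
constexpr std::size_t kBufferSize = 128 * 1024;
std::uint8_t g_buffer[kBufferSize];  // fixed-size pre-allocated array
std::size_t g_cursor = 0;            // integer that acts as a pointer

// Touches the global cursor on every iteration. Whether the cursor stays in a
// register here is up to the optimizer.
void fillDirect(std::uint8_t value, std::size_t count) {
    assert(g_cursor + count <= kBufferSize);
    for (std::size_t i = 0; i < count; ++i)
        g_buffer[g_cursor++] = value;
}

// Same work, but the cursor is copied into a local and written back once.
void fillCached(std::uint8_t value, std::size_t count) {
    assert(g_cursor + count <= kBufferSize);
    std::size_t cursor = g_cursor;  // one read of the global
    for (std::size_t i = 0; i < count; ++i)
        g_buffer[cursor++] = value;
    g_cursor = cursor;              // one write of the global
}
```

Both functions leave the globals in the same state; comparing the generated assembly for the two is one way to see what a particular compiler actually does.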

When are global variables actually considered good/recommended practice?

I've been reading a lot about why global variables are bad and why they should not be used. And yet most of the commonly used programming languages support globals in some way.
So my question is what is the reason global variables are still needed, do they offer some unique and irreplaceable advantage that cannot be implemented alternatively? Are there any benefits to global addressing compared to user specified custom indirection to retrieve an object out of its local scope?
As far as I understand, in modern programming languages, global addressing comes with the same performance penalty as calculating every offset from a memory address, whether it is an offset from the beginning of the "global" user memory or an offset from a this or any other pointer. So in terms of performance, the user can fake globals in the narrow cases they are needed using common pointer indirection without losing performance to real global variables. So what else? Are global variables really needed?
Global variables aren't generally bad because of their performance, they're bad because in significantly sized programs, they make it hard to encapsulate everything - there's information "leakage" which can often make it very difficult to figure out what's going on.
Basically the scope of your variables should be only what's required for your code to both work and be relatively easy to understand, and no more. Having global variables in a program which prints out the twelve-times tables is manageable, having them in a multi-million line accounting program is not so good.
I think this is another subject similar to goto - it's a "religious thing".
There are a lot of ways to "work around" globals, but if you are still accessing the same bit of memory in various places in the code you may have a problem.
Global variables are useful for some things, but should definitely be used "with care" (more so than goto, because the scope of misuse is greater).
There are two things that make global variables a problem:
1. It's hard to understand what is being done to the variable.
2. In a multithreaded environment, if a global is written from one thread and read by any other thread, you need synchronisation of some sort.
But there are times when globals are very useful. Having a config variable that holds all your configuration values that came from the config file of the application, for example. The alternative is to store it in some object that gets passed from one function to another, and it's just extra work that doesn't give any benefit. In particular if the config variables are read-only.
As a whole, however, I would suggest avoiding globals.
Global variables imply global state. This makes it impossible to store overlapping state that is local to a given part or function in your program.
For example, let's say we store the credentials of a given user in global variables which are used throughout our program. It will now be a lot more difficult to upgrade our program to allow multiple users at the same time. Had we just passed a user's state as a parameter to our functions, we would have had far fewer problems upgrading to multiple users.
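A small sketch of the difference, with made-up names (UserSession, recordFailureGlobal, recordFailure): the global version has room for exactly one user, while the parameter version works for any number of users without changes.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-user state; the names are illustrative only.
struct UserSession {
    std::string name;
    int failedLogins = 0;
};

// Global-state version: there is room for exactly one user.
UserSession g_currentUser;

void recordFailureGlobal() { ++g_currentUser.failedLogins; }

// Parameter version: the caller decides which user's state is affected.
void recordFailure(UserSession& session) {
    assert(!session.name.empty());  // a session should belong to a named user
    ++session.failedLogins;
}
```

With the second form, two sessions can be updated independently simply by passing different objects; with the first, supporting a second user means touching every function that reads g_currentUser.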
my question is what is the reason global variables are still needed,
Sometimes you need to access the same data from a lot of different functions. This is when you need globals.
For instance, I am working on a piece of code right now, that looks like this:
static runtime_thread *t0;

void
queue_thread (runtime_thread *newt)
{
  t0 = newt;
  do_something_else ();
}

void
kill_and_replace_thread (runtime_thread *newt)
{
  t0->status = dead;
  t0 = newt;
  t0->status = runnable;
  do_something_else ();
}
Note: Take the above as some sort of mixed C and pseudocode, to give you an idea of where a global is actually useful.
Static globals are almost mandatory when writing any cross-platform library. These global variables are static so that they stay within the translation unit. There are few if any cross-platform libraries that do not use static global variables, because they have to hide their platform-specific implementation from the user. These platform-specific implementations are held in static global variables. Of course, if they use an opaque pointer and require the platform-specific implementation to be held in such a structure, they could make a cross-platform library without any static globals. However, such an object needs to be passed to all functions within the library. Therefore, you either have to pass this opaque pointer everywhere, or use static global variables.
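The two options in the paragraph above can be sketched side by side. Everything here is hypothetical (the lib_ prefix, LibContext, and the backend names); in a real library the struct definition and the static variable would live in the .cpp file and only the declarations would be public.

```cpp
#include <cassert>
#include <string>

// Option 1: file-scope static state, invisible outside this translation unit.
static std::string g_backendName = "unset";  // hypothetical platform state

void lib_init_static() { g_backendName = "posix"; }
const std::string& lib_backend_static() { return g_backendName; }

// Option 2: an opaque handle that the caller passes to every function.
struct LibContext;  // users of the library would only see this declaration

LibContext* lib_create(const std::string& backend);
const std::string& lib_backend(const LibContext* ctx);
void lib_destroy(LibContext* ctx);

// Private definition, normally hidden inside the library's source file.
struct LibContext {
    std::string backendName;
};

LibContext* lib_create(const std::string& backend) {
    return new LibContext{backend};
}

const std::string& lib_backend(const LibContext* ctx) {
    assert(ctx != nullptr);
    return ctx->backendName;
}

void lib_destroy(LibContext* ctx) { delete ctx; }
```

The handle version costs an extra argument on every call, but it allows several independent contexts to exist at once, which the static version cannot do.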
There's also the identifier limit issue. Compilers (especially older ones) have a limit to the number of identifiers they could handle within a scope. Many operating systems still use tons of #define instead of enumerations because their old compilers cannot handle the enumeration constants that bloat their identifiers. A proper rewrite of the header files could solve some of these.
Global variables are worth considering when you want to use the same variable in every function, including main. Also remember that if you initialize a variable globally, its initial value will be the same in every function; however, you can reassign it inside a function when that function needs a different value. This way you don't have to declare the same variable again and again in each function. But yes, they can cause trouble at times.
Global names are available everywhere. You may unknowingly end up using a global when you think you are using a local.
And if you make a mistake while declaring a global variable, for example accidentally declaring it as int instead of float, you'll have to apply the fix across the whole program.

Debugging - detect a function writing a memory location

I need to know if there is a way, with the Linux debugger gdb, to detect whether a function (any function) of a specific C++ class (implemented in the file Chord.cc) accesses a specific memory location (let's say 0xffffbc).
That will help me a lot.
Thanks.
GDB watchpoints are what you're looking for:
Quote from that page:
You can use a watchpoint to stop execution whenever the value of an expression changes, without having to predict a particular place where this may happen. (This is sometimes called a data breakpoint.) The expression may be as simple as the value of a single variable, or as complex as many variables combined by operators. Examples include:
A reference to the value of a single variable.
An address cast to an appropriate data type. For example, `*(int *)0x12345678' will watch a 4-byte region at the specified address (assuming an int occupies 4 bytes).
You can then try to apply the techniques from this post to make it a conditional watchpoint, and see if you can find a way to restrict it to particular function calls from that class. You may also find this discussion relevant in that respect.
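As a rough illustration, here is a tiny program to practice on, with the relevant gdb commands listed in comments. The Chord class below is a made-up stand-in (the real Chord.cc is not shown in the question), and g_watched plays the role of the memory location of interest.

```cpp
#include <cassert>

// Hypothetical stand-in for the class in Chord.cc.
int g_watched = 0;  // plays the role of the memory location of interest

class Chord {
public:
    void join(int id) { g_watched = id; }     // writes the location
    int lookup() const { return g_watched; }  // only reads it
};

int exercise() {
    Chord c;
    c.join(42);
    int result = c.lookup();
    assert(result == 42);  // sanity check on the toy program itself
    return result;
}

// In a gdb session on a binary built with -g, commands along these lines apply:
//   watch g_watched                      stop when the value changes
//   rwatch g_watched                     stop when the location is read
//   awatch g_watched                     stop on either a read or a write
//   awatch *(int *)0xffffbc              same idea for a raw address
//   watch g_watched if g_watched == 42   conditional form
//   bt                                   after a stop, show who touched it
```

Since the question is about any access rather than only writes, awatch is probably the closest match; the backtrace printed at each stop then shows whether the access came from a Chord member function.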

How to control the location of a global variable in LLVM IR?

I'm trying to modify LLVM so that it keeps certain constants and functions contiguous in memory.
In other words, I need to ensure that the machine code for certain functions is always preceded by some ~4-byte constant in memory. The function body itself must not be modified.
Could I achieve this simply through modifying the LLVM IR somehow?
If yes:
How would I state in the LLVM IR to keep a variable and a function contiguous in memory?
If no:
What part of the code generation process (i.e. which pass(es)) should I modify in order to achieve this? Any links to the projects/files I should look at would be helpful, since I'm not sure where to begin yet.
As far as I know, you can't do that by just modifying the IR; you'd have to write something to handle it yourself. It shouldn't be a pass, either: it's too low-level, so it should run during the target-specific code generation. You can piggyback on an existing target and just modify this aspect, of course; you don't have to write a new target from scratch. I don't know exactly which location would be good for this, though.
I think a good way to pass this information from the IR level to the DAG during code generation would be to use metadata: attach metadata to either the function or the associated constant that links them with each other, then later on use that link to emit them together. See this thread on llvm-dev for information on how to transfer the metadata.

Should I set global vars value on startup or the first time I use them? C++

I have a few global vars that I need to set values for. Should I set them in the main/WinMain function, or should I set each one the first time I use it?
Instead, how about not using global variables at all?
Pass the variables as function parameters to the functions that need them, or store pointers or references to them as members of classes that use them.
Is there a chance that you won't be using the global var? Is calculating any of them expensive? If so, then you have an argument for lazy initialization. If they are quick to calculate or are always going to be used, then init them on startup. There is no reason not to, and you will save yourself the headache of having to check for initialization every time you use one.
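If lazy initialization does turn out to be worthwhile, one common way to avoid the manual "is it initialized yet?" check in C++ is a function-local static, which is initialized the first time control passes through its declaration. A minimal sketch, where computeTableSize and tableSize are made-up names and g_initCalls exists only to show how often the setup runs:

```cpp
#include <cassert>

int g_initCalls = 0;  // counts how often the (pretend) expensive setup runs

// Hypothetical expensive computation.
int computeTableSize() {
    ++g_initCalls;
    return 4096;
}

// Lazy: the static below is initialized on the first call only; later calls
// just return the already-initialized value.
int& tableSize() {
    static int value = computeTableSize();
    assert(value > 0);
    return value;
}
```

If tableSize() is never called, computeTableSize() never runs; if it is called many times, the computation still happens once.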
When the linker links your program together, global variables (also known as writable static data) are assigned to their own section of memory (the ELF .data section, or .bss for zero-initialized ones) and, when their initializer is a constant, have a value pre-assigned to them. This means that the compiler does not need to generate instructions to initialize them. If you assign them in the main function instead, the compiler will generate instructions for those assignments unless it is clever enough to optimize them out.
This is certainly true for ELF file formats, I am not sure about other executable formats.
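A short sketch of the distinction, under the assumption of a typical ELF toolchain; the variable and function names are made up. Globals whose initializer is a constant expression can be baked into the binary's data image, while a value assigned inside a function needs instructions that run at that point.

```cpp
#include <cassert>

// Constant initializer: the value can be stored directly in the data image,
// so no startup code is needed for it.
int g_limit = 100;

constexpr int square(int x) { return x * x; }
// Also a constant initializer, because square(12) is a constant expression.
int g_area = square(12);

// No initializer: zero-initialized (typically placed in .bss on ELF).
int g_runtime;

// Assigning at run time needs explicit instructions in this function.
void initAtStartup() {
    assert(g_runtime == 0 && "expected to run before any other write");
    g_runtime = g_limit * 2;
}
```

Note that in C++ a global whose initializer is not a constant expression (for example, one that calls an ordinary function) is still initialized by code that runs before main, so the "no instructions" benefit applies only to the constant case.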