D: are global variables bad?

Sometimes I need a variable that can be accessed from anywhere in the code. But I often hear that global variables are bad. What is the best practice in D for such a case?
Right now my code looks like this:
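import std.file : getcwd;
import std.path : buildPath;
string roothtml; // module-level variable (thread-local by default in D)
static this()
{
    roothtml = buildPath(getcwd, "html");
}
void main()
{
    //
}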
Is this good or bad practice?

Global variables are problematic for a few reasons.
It's harder to track where variables are coming from when you're reading. This makes it harder to understand a function using globals.
It's harder to track where a variable is used. This makes it harder to modify how your global variables should be used.
It's more awkward to inject test data and test stubs.
Test state will spill over to other tests.
__gshared globals require locking or immutability.
Thread-local globals (the default for module-level variables in D) are thread-local, so a write in one thread is not visible to other threads.
Any globals require you to think about whether you want it thread-local or __gshared.
If you need to convert your single-tenant application into a multi-tenant one, that will be painful if you're using global variables. And it's more common than you probably suspect.
You have to be careful not to build with -unittest when running your application so you don't trash global state initialized in a static constructor.
On the plus side, it's convenient that you don't have to pass global state everywhere. You don't have to apply the "replace method with method object" refactoring as often. You don't have to bring in a dependency injection system. It's very convenient when it's not biting you.
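To make the testing points above concrete, here is a minimal C++ sketch (the names g_root_dir, page_path_global, and page_path are hypothetical, loosely modelled on the roothtml example) contrasting a function that reads a global with one that receives the same value as a parameter:

```cpp
#include <cassert>
#include <string>

// Hypothetical global configuration, set once at startup.
std::string g_root_dir = "/srv/app";

// Hidden dependency: callers cannot see that this reads g_root_dir.
std::string page_path_global(const std::string& page) {
    return g_root_dir + "/html/" + page;
}

// Explicit dependency: the root directory is passed in.
std::string page_path(const std::string& root_dir, const std::string& page) {
    return root_dir + "/html/" + page;
}

// Testing the global version means mutating shared state and restoring it.
void test_page_path_global() {
    const std::string saved = g_root_dir;
    g_root_dir = "/tmp/test";
    assert(page_path_global("index.html") == "/tmp/test/html/index.html");
    g_root_dir = saved; // forget this and every later test sees "/tmp/test"
}

// Testing the injected version needs no setup or cleanup.
void test_page_path() {
    assert(page_path("/tmp/test", "index.html") == "/tmp/test/html/index.html");
}
```

The explicit version can be tested with any root directory in isolation; the global version forces each test to save and restore shared state, which is exactly how test state spills over into other tests.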

It depends on what you really mean by 'global'. In the example above, I'd say it's fine.
You appear to be showing a main module, which probably shouldn't be imported
by anything. In other words, it isn't really global, it is local to the main
module. It really isn't so different from
class Main {
    private static string _roothtml; // static: a static constructor has no `this`
    static this() { _roothtml = buildPath(getcwd, "html"); }
    void run() { }
}
Even if it isn't really your main module, D's module system offers protections of its own. Just put private on roothtml to encapsulate it within the module (it wouldn't hurt to do this in your main module anyway, just to be clear).
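For readers coming from C++, a rough analogue of a D module-private variable is a variable in an unnamed namespace, which gives it internal linkage (D's private works per module, the unnamed namespace per translation unit). A sketch under that assumption; C++ has no module constructor like static this(), so initialization goes through a hypothetical setter here:

```cpp
#include <cassert>
#include <string>

// Everything in an unnamed namespace has internal linkage: other
// translation units cannot name roothtml, much like a D `private`
// module-level variable is invisible to importing modules.
namespace {
std::string roothtml;
}

// Other code reaches the value only through this hypothetical accessor
// pair, so this file controls every read and write.
void set_root_html(const std::string& path) {
    assert(!path.empty()); // an example of a rule the module can enforce
    roothtml = path;
}

const std::string& root_html() { return roothtml; }
```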
A pattern like this is widely employed in the source code of git. Rather than
having a single main module that invokes a function for a given command, you
have many main functions -- one for each top-level command.
For example, take a look at
upload-pack.c.
See those variables declared at the top of the source file?
Would the code have been any clearer or safer if they were wrapped in a class in typical OOP style, or explicitly passed to each function in a more purely functional style?
Each source file acts as a unit of encapsulation for a given command. This style is not always appropriate, but in the case of a program that can be thought of as a set of distinct commands, it can be cleaner than the alternatives.
Ultimately, the answer will be specific to the context, your given project, and
your personal style. Generally speaking, cross-module globals are something to
be looked on with suspicion, but module-level variables can sometimes be cleaner
than the alternatives.

Related

How to make an old C codebase with many globals reentrant

I'm working with a large, old C codebase (an interpreter) that uses global variables a great deal, with the result that I cannot have two instances of it at once. Is there a straightforward (ideally automated) approach to converting this code into something reentrant, i.e. some refactoring tool that would move all globals into a struct and prefix every access with a pointer to that struct?
Could I convert to C++ and wrap the entire thing in a class definition?
I would recommend converting your project to C++11 and changing all your static variables to thread_local.
This can be up to several days of work depending on the size of your project. In certain cases this will work.
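A minimal C++11 sketch of that conversion, using a hypothetical counter in place of one of the interpreter's static variables. thread_local gives each thread its own instance, so an interpreter running on one thread no longer sees another thread's state; note that this only helps if each interpreter instance runs on its own thread:

```cpp
#include <cassert>
#include <thread>

// Was `static int call_count;`; now every thread gets its own copy.
thread_local int call_count = 0;

int bump() { return ++call_count; }

// Calls bump() n times on a fresh thread and returns that thread's count.
int count_on_new_thread(int n) {
    assert(n >= 0);
    int result = 0;
    std::thread t([&] {
        for (int i = 0; i < n; ++i) bump();
        result = call_count; // the new thread's instance, not the caller's
    });
    t.join();
    return result;
}
```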
I'm not aware of any "ready made" solution for this type of problem.
As a general rule, global variables are going to make it hard to make the code reentrant.
If you can remove all the global variables (simply delete them and see where you get compiler errors), replace them with a structure, and then use one structure per instance that is passed along, you'd be pretty much done, as long as the state of the interpreter instances is independent and the instances don't need to know about each other. (Of course, you may need more than a single structure to solve the problem, but it should be possible to "stick your global variables in a structure".)
Of course, making the structure and the code go together as a C++ class (which may have smaller classes as part of the solution) would be the "next step", but this is not entirely straightforward if you are not familiar with C++ and class design.
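A sketch of the struct-per-instance approach, with hypothetical interpreter variables (stack_depth, last_error) standing in for the real globals. It is written in C++ to match the other examples here, but the same shape (a struct plus a pointer parameter on every function) works in plain C:

```cpp
#include <cassert>
#include <string>

// Before: globals such as
//   static int stack_depth;
//   static std::string last_error;
// After: the same variables collected into one per-instance structure.
struct InterpState {
    int stack_depth = 0;
    std::string last_error;
};

// Every function that used a global now takes the state explicitly.
void push_frame(InterpState* st) {
    assert(st != nullptr);
    ++st->stack_depth;
}

void pop_frame(InterpState* st) {
    assert(st != nullptr);
    if (st->stack_depth == 0) {
        st->last_error = "stack underflow";
        return;
    }
    --st->stack_depth;
}
```

Two InterpState objects are now two fully independent interpreters: an error in one never touches the other.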
Are you trying to make it reentrant in order to be able to make it multi-thread, and divide the work between threads?
If so, I would consider making it multi-process instead of multi-threaded.
What I usually do with interpreters is go straight to a class with instance variables rather than globals. I'm not sure what you are interpreting, but it might be possible to pass in a file path or string container that the class interprets on an internal thread, thereby encapsulating the entire interpretation run.
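A hedged sketch of that idea: the interpreter's state lives in instance members, and each object runs its interpretation on its own std::thread. The Interpreter class and its digit-summing "interpretation" are placeholders, not a real interpreter:

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// Hypothetical interpreter: all state is in members, none in globals.
class Interpreter {
public:
    explicit Interpreter(std::string source) : source_(std::move(source)) {}
    ~Interpreter() {
        if (worker_.joinable()) worker_.join(); // never leak a running thread
    }

    // Starts interpretation in the background (call at most once).
    void start() { worker_ = std::thread([this] { run(); }); }

    // Waits for the run to finish and returns its result.
    long wait() {
        assert(worker_.joinable());
        worker_.join(); // join() makes result_ safely visible here
        return result_;
    }

private:
    // Placeholder "interpretation": sums the decimal digits in the source.
    void run() {
        for (char c : source_)
            if (c >= '0' && c <= '9') result_ += c - '0';
    }

    std::string source_;
    long result_ = 0;
    std::thread worker_;
};
```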
It is possible to wrap the whole thing in a class definition, but it will not work for code that takes addresses of functions and passes them to C code. Also, converting a large legacy code base to be compilable by a C++ compiler is tedious enough that it probably outweighs the effort of removing the global variables by hand.
Setting that option aside, you have two options:
Since you need reentrancy to implement threading, it might be easiest to declare all global variables thread-local. If you have a C compiler that supports thread-locals, this is typically as easy as slapping a __thread (or other compiler-specific keyword) before every declaration. Then you create a new interpreter simply by creating a new thread and initializing the interpreter in the normal way.
If you cannot use __thread or equivalent, then you have to do a bit more footwork. You need to place all global variables in a structure, and replace every access to global variable foo with get_ctx()->foo. This is tedious, but straightforward. Once you are done, get_ctx() can allocate and return a thread-local interpreter state, using an API of your choosing.
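A minimal C++ sketch of the get_ctx() approach, assuming the context is created lazily per thread (one possible "API of your choosing"); Ctx, foo, and line are hypothetical names:

```cpp
#include <cassert>
#include <thread>

// All former globals, gathered into one context structure.
struct Ctx {
    int foo = 0;   // was: int foo;
    int line = 1;  // was: int line = 1;
};

// Returns the calling thread's context; the thread_local object is
// initialized the first time each thread reaches this declaration.
Ctx* get_ctx() {
    thread_local Ctx ctx;
    return &ctx;
}

// The mechanical rewrite: `foo += n;` becomes `get_ctx()->foo += n;`.
void add_to_foo(int n) { get_ctx()->foo += n; }

// Usage sketch: the main thread and a worker thread see separate contexts.
void example() {
    add_to_foo(3);
    int worker_foo = -1;
    std::thread t([&] {
        add_to_foo(1);
        worker_foo = get_ctx()->foo;
    });
    t.join();
    assert(get_ctx()->foo == 3);
    assert(worker_foo == 1);
}
```

If several interpreters must share one thread, get_ctx() would instead return a pointer that the API sets explicitly before each call.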
A program transformation tool that can handle the C language would be able to automate such changes.
It needs to be able to resolve each symbol to its declaration, and preprocessor conditionals are likely to be trouble. I'd name one, but SO zealots object when I do that.
Your solution of a struct containing the globals is the right idea; you need a rewrite that replaces each global declaration with a struct member, and each access to a global with an access to the corresponding struct member.
The remaining question is: where does the struct pointer come from? One answer is a global variable that is swapped whenever threads are switched; a better answer, if your OS supports it, is to get the struct pointer from a thread-local variable.

global variables for "things" that are global and "single"?

When dealing with microcontrollers, there are things that are inherently global. I'm thinking of peripherals like serial ports or other interfaces. There are also peripherals that are not only global but of which there is only one (and there will never be more), like the peripheral controlling the core clocks or the interrupt controller. These peripherals do have some kind of global state (for example, the core clock is set to some frequency), and it's inefficient to reverse-calculate these values.
If I'd like my program to be nicely object-oriented, I'm having a hard time deciding how to deal with such objects... Global variables are not nice, that much is obvious, but I just don't know (not enough experience) whether I should try to "hide" the fact that these things ARE global... For example, "cin" or "stdout" are globals too (let's ignore the fact that in multithreaded apps these are usually thread-specific) and no one is hiding that... Let's stick with the clock generator peripheral. There is only one, so I could use the singleton anti-pattern (;, make the class static, or just have a single global object (that's what I have usually done). This object stores the current clock setting, and that value is needed for LOTS of other things: it's needed to set up the system timer used by the RTOS, to set clocks for other peripherals (UART baud rate, SPI bit rate, ...), and to set the correct clock for external memories or configure memory wait states. That's why I think that creating one object in main() and passing it around everywhere would be a bit cumbersome...
I could write the methods so that all "global" information would come from the peripheral registers (for example, the core frequency could be reverse-calculated from the current PLL settings), but this also seems like the wrong idea, not to mention that creating an object for the clock generator peripheral everywhere would look funny...
The current clock setting could be stored in a static member of the class, but from there it's only a small step to a fully static class (as the "this" pointer is useless for a class that has no per-instance state)...
The solution usually found in non-object-oriented programs is closest to a fully static class: there are only functions that operate on global variables.
Does anyone have a nice idea of how to deal with such a scenario, or know whether this problem is even worth the time? Maybe I should just use one global object and be done with it? (;
If I'd like my program to be nicely object-oriented, I'm having a hard time deciding how to deal with such objects... Global variables are not nice, that much is obvious, but I just don't know (not enough experience) whether I should try to "hide" the fact that these things ARE global...
When I read that, I wonder if you know why you are using OOP and why you don't use globals.
Firstly, OOP is a tool, not a goal. In your case, the interrupt controller doesn't need things like derivation and virtual functions. All you need is an interface to program it, wrapped in a single class. You could even use a set of plain functions that do that (C-style modular programming) without giving up maintainability. In your case, making the single instance global is even clearer. Imagine the alternative, where different parts of the program could instantiate a class that accesses the same UART underneath. If you're using globals, the code (or rather the author) is aware of this and will think about how to coordinate access.
Now, concerning globals, here's an example of why not to use them:
int a, b, c;
void f3(); // forward declaration: f1 and f2 call f3 before its definition
void f1()
{
    c = a;
    f3();
}
void f2()
{
    c = b;
    f3();
}
void f3()
{
    // use c
}
int main()
{
    a = 4;
    f1();
    b = 5;
    f2();
}
The point here is that parameters are stored in globals instead of being passed around as actual parameters, which makes it difficult to see where and when they are used. Also, the use above totally rules out any recursive calls. Concerning your environment, though, some things are inherently global because they are unique parts of the environment, like the interrupt controller (similar to cin/cout/cerr/clog). Don't worry about those. Only when there are very many of them, used all over the place, do you need to think about restricting access.
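For contrast, here is a sketch of the same example with the globals replaced by parameters. The body of f3 is a placeholder, since the original only says "use c":

```cpp
#include <cassert>

// f3 receives its input as a parameter instead of reading the global c.
// (Doubling is a placeholder for whatever "use c" means in the original.)
int f3(int c) { return c * 2; }

// f1 and f2 pass their values along explicitly; no globals remain, so the
// data flow is visible at every call site and recursive calls are safe.
int f1(int a) { return f3(a); }
int f2(int b) { return f3(b); }

// Mirrors the original main(): the values now travel through arguments.
void example() {
    assert(f1(4) == 8);
    assert(f2(5) == 10);
}
```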
There are two guidelines to make this easier:
The larger the scope of an object, the more it needs a descriptive name (compare to a, b, c above). In your case, I'd put all hardware-specific objects in a common namespace, so it is obvious when some code is accessing globals. This also allows separate testing and reuse of this namespace. Many hardware vendors even provide something like this in the form of a library, to help application development.
In the code wrapping/modelling the hardware, perform requests but don't make decisions that are specific to the application on top of it. Compare this to the standard streams, which are provided but never used by the standard library itself, except perhaps for dumping fatal error information before aborting after a failed assertion.
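A sketch of these guidelines, assuming a hypothetical hw namespace: the single clock generator is an ordinary global object, but where it lives makes every use obviously hardware-related. Register programming is omitted, and the UART divisor formula assumes 16x oversampling, which varies between chips:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hardware layer: inherently global peripherals live in one
// namespace, so every access reads as "this touches hardware".
namespace hw {

class ClockGenerator {
public:
    // A real driver would program the PLL registers here; this sketch
    // only records the resulting frequency so others can query it.
    void set_core_hz(std::uint32_t hz) { core_hz_ = hz; }
    std::uint32_t core_hz() const { return core_hz_; }

private:
    std::uint32_t core_hz_ = 16000000; // assumed reset value
};

// The single, deliberately global instance. In a multi-file project the
// definition goes in one .cpp file with an extern declaration in a header.
ClockGenerator core_clock;

// Performs a request (compute a divisor) but leaves the choice of baud
// rate to the application.
std::uint32_t uart_divisor(std::uint32_t baud) {
    assert(baud != 0);
    return core_clock.core_hz() / (16 * baud); // assumes 16x oversampling
}

} // namespace hw
```

The application decides the baud rate; the hardware layer only turns it into a divisor.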
You have pretty much outlined your options:
globals
static data members of classes / singletons
It's ultimately up to you to decide between the two, and to choose whichever you like more from an aesthetic or other perspective.
At the end of the day, just as you have said, you'll still have one clock generator, one UART, etc.
One thing to keep in mind is that too much abstraction solely for the sake of abstraction isn't going to buy you much, if anything. What it may do, however, is make it harder for someone unfamiliar with your code to figure out how things really work behind the layers of classes. So take your team, if any, into account.
The Singleton pattern is a source of debate, for sure. Some people simply say it's "bad", but I would disagree; it is just widely misused and misunderstood. It is definitely useful in certain situations. In fact, it applies to exactly the situation you've described.
If you have a class that needs to be available globally and by its very nature cannot have multiple instances, then a singleton is the best option.
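For comparison, a common C++ way to write such a singleton is a function-local static (often called a Meyers singleton), which C++11 guarantees is initialized exactly once even with concurrent callers. ClockGenerator and its members are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical singleton for the one-and-only clock generator.
class ClockGenerator {
public:
    // The instance is created on first use; since C++11 this
    // initialization is thread-safe.
    static ClockGenerator& instance() {
        static ClockGenerator inst;
        return inst;
    }

    void set_core_hz(std::uint32_t hz) {
        assert(hz != 0);
        core_hz_ = hz;
    }
    std::uint32_t core_hz() const { return core_hz_; }

    // Copying would create a second "only" instance, so forbid it.
    ClockGenerator(const ClockGenerator&) = delete;
    ClockGenerator& operator=(const ClockGenerator&) = delete;

private:
    ClockGenerator() = default;         // only instance() can construct it
    std::uint32_t core_hz_ = 16000000;  // assumed reset value
};
```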
Singletons, global objects, static classes, they are all the same. Dress the evil global state in whatever sauce you want, it's still global state. The problem is in the salad, not in the dressing, so to speak.
A little-explored path is monadic code, Haskell-style (yes, in C++). I have never tried it myself, but from the looks of it, this option should be fun. See e.g. here for an example implementation of a Monad interface in C++.

C++ global structure creates name conflict

I have written a quite extensive framework that drives characters in a physical simulation. Even though everybody warned me not to do it, I used a global public data structure for storage of information and called it State. It's not in a namespace either. I make it globally accessible by declaring extern State state;. The reason why I did this is because this structure is needed everywhere in the application and I find it extremely convenient to just include my State.h and then write to state.var anywhere and read state.var anywhere. The framework is changing rapidly, too, and I also find comfort in not having to care about passing data around, synchronizing etc. when new components are introduced.
Anyhow, now the s*** hit the fan. I want to use one of Qt's GUI classes, and it already has its own member called state of type State. Their State is at least in a namespace, but that doesn't seem to matter, since inside the class I'm already using that namespace.
What can I do now?
Your only choice is to rip out your global and replace it with something sane. This is extremely painful but you really don't have any other option. This is why people recommended against using one in the first place.
In short, congratulations on learning the lesson at hand: don't use global variables.
I probably do not understand the problem, but what's stopping you from writing ::state.var?
Plain :: means the global namespace. While using global symbols has well-known issues, and global variables have their own set of issues (generally in C++ code, singletons are used instead), there's nothing magically evil about using a global variable in the global namespace. errno is a classic example of such a variable, linked into practically every C and C++ application on Unix-like platforms (strictly speaking, C++ and modern C libraries define errno as a macro expanding to a thread-local lvalue, so ::errno itself may not compile).
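A self-contained sketch of that suggestion. qt_like::Gui is a hypothetical stand-in for the Qt class with a member named state, not Qt's actual API:

```cpp
#include <cassert>

// The application's global (normally `extern State state;` in State.h).
struct State { int var = 0; };
State state;

namespace qt_like {
// Stand-in for the library class that has its own member called `state`.
class Gui {
public:
    void foo() {
        ::state.var = 3; // `::` skips the member and names the global
        state = 1;       // unqualified `state` finds the member
    }
    int member_state() const { return state; }

private:
    int state = 0;
};
} // namespace qt_like

// Usage sketch: the two names no longer collide.
void example() {
    qt_like::Gui gui;
    gui.foo();
    assert(::state.var == 3 && gui.member_state() == 1);
}
```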
Well, there is a simple alternative:
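struct State { int var; };   // your global type (normally from State.h)
extern State state;          // the existing global, defined elsewhere
State& mystate = state;      // alias under a name that won't collide
namespace qt {
    class State;             // stands in for the library's own State
    class Gui {
    public:
        void foo() {
            mystate.var = 3; // reaches the global through the alias
        }
    private:
        State* state;        // qt::State*, the name that caused the clash
    };
}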
... but there is also something called Technical Debt, and you are borrowing deeply.

Use of global variables in simulation code [duplicate]

Closed 11 years ago.
Possible Duplicate:
Are global variables bad?
I am writing a simulation code making use of material and energy specific data. This data is stored in global arrays, because once uploaded, they are used during the simulation and should be accessible by most of the functions.
I read everywhere that it is not good practice to use global variables.
Could someone explain to me, or point me to material on the web explaining, how I could avoid using global arrays in a simulation application when massive data arrays need to be used? I try to code in C++ and make as much use as possible of object-oriented features.
Thanks in advance for your help.
You are right that using globals is not recommended. You can declare those unrelated globals inside a namespace,
//Globals.h
#pragma once
namespace Globals
{
    extern int a[100];
    extern double d;
}
and define them in a .cpp file.
//Globals.cpp
#include "Globals.h"
int Globals::a[100] = { ... };
double Globals::d = 3.14;
Now use them as Globals::a, Globals::d, etc. My answer is from a code-management perspective.
Yes you are right, global variables are not good.
This is a useful link which explains why global variables are bad and how to avoid them.
http://c2.com/cgi/wiki?GlobalVariablesAreBad
EDIT:
@sergio's post also points to the same link, so you can ignore this answer.
Could someone explain me or point me to material on the web explaining how I could avoid using global arrays in simulation application coding while massive data arrays need to be used.
The same way you avoid globals in general: by using locals instead. The natural way to get data into a function is to pass it as a parameter. This is not expensive, especially if you pass by reference where appropriate.
Have a look at this article about global variables. This is an excerpt:
Why Global Variables Should Be Avoided When Unnecessary
Non-locality -- Source code is easiest to understand when the scope of its individual elements are limited. Global variables can be read or modified by any part of the program, making it difficult to remember or reason about every possible use.
No Access Control or Constraint Checking -- A global variable can be get or set by any part of the program, and any rules regarding its use can be easily broken or forgotten. (In other words, get/set accessors are generally preferable over direct data access, and this is even more so for global data.) By extension, the lack of access control greatly hinders achieving security in situations where you may wish to run untrusted code (such as working with 3rd party plugins).
Implicit coupling -- A program with many global variables often has tight couplings between some of those variables, and couplings between variables and functions. Grouping coupled items into cohesive units usually leads to better programs.
Concurrency issues -- if globals can be accessed by multiple threads of execution, synchronization is necessary (and too-often neglected). When dynamically linking modules with globals, the composed system might not be thread-safe even if the two independent modules tested in dozens of different contexts were safe.
Namespace pollution -- Global names are available everywhere. You may unknowingly end up using a global when you think you are using a local (by misspelling or forgetting to declare the local) or vice versa. Also, if you ever have to link together modules that have the same global variable names, if you are lucky, you will get linking errors. If you are unlucky, the linker will simply treat all uses of the same name as the same object.
Memory allocation issues -- Some environments have memory allocation schemes that make allocation of globals tricky. This is especially true in languages where "constructors" have side-effects other than allocation (because, in that case, you can express unsafe situations where two globals mutually depend on one another). Also, when dynamically linking modules, it can be unclear whether different libraries have their own instances of globals or whether the globals are shared.
Testing and Confinement - source that utilizes globals is somewhat more difficult to test because one cannot readily set up a 'clean' environment between runs. More generally, source that utilizes global services of any sort (e.g. reading and writing files or databases) that aren't explicitly provided to that source is difficult to test for the same reason. For communicating systems, the ability to test system invariants may require running more than one 'copy' of a system simultaneously, which is greatly hindered by any use of shared services - including global memory - that are not provided for sharing as part of the test.
It also discusses several alternatives. Possibly in your case, you could consider:
hiding your globals (e.g., private static variables);
stateful procedures: setter and getter functions allowing access to the arrays while also "masking" them;
the singleton pattern.
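A sketch of the "stateful procedures" option for a simulation table, assuming a hypothetical per-energy-bin cross-section array. The data stays at file scope, but only these functions can touch it, so loading and bounds checking happen in one place:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical material table, hidden in an unnamed namespace so the
// rest of the program can only use the functions below.
namespace {
std::vector<double> g_cross_section; // one value per energy bin
}

// Load (or replace) the table, e.g. after reading the input file.
void set_cross_sections(std::vector<double> values) {
    g_cross_section = std::move(values);
}

// Read-only, bounds-checked access for the simulation routines.
double cross_section(std::size_t energy_bin) {
    assert(energy_bin < g_cross_section.size());
    return g_cross_section[energy_bin];
}

std::size_t energy_bins() { return g_cross_section.size(); }
```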
EDIT:
I understand that a part of the development community are against the use of the singleton pattern. I fully respect this opinion. Anyway, in the context of the present discussion, the singleton offers several advantages over the raw use of globals:
improved access control;
opportunity for synchronization;
ability to abstract away the implementation.
In this respect, it is no better than a set of getter/setter functions, but no worse either. I leave to the OP the hard task of choosing what to do with his own code. (BTW, the article discusses more approaches, like Context Objects, Dependency Injection, etc.)
Introducing global state into your code can make it difficult to do things in a multi-threaded way.
I would also argue it can make the intent of your code more difficult to follow. If you pass all of the arguments to a function as parameters at least it's clear what data the function has access to, and what has the potential of changing. The use of global variables doesn't give someone reading the code this chance...
It's also not generally true that using global variables is in any way faster. If you have large objects that you need to pass to functions, pass these arguments via references and there won't be any issues with copying.
Without knowing more about your setup, it's difficult to make any recommendations, but if you have a large amount of data that needs to be passed around to a series of routines I would be tempted to put it all in a struct, and to pass that struct by reference:
#include <string>
#include <vector>
struct my_data_type
{
    double some_data;
    double some_other_data;
    std::vector<double> some_coefficients;
    std::vector<double> some_other_coefficients;
    std::string some_name;
    std::string some_other_name;
    // add more members as necessary...
};
void foo(my_data_type& data)
{
    // there's no big overhead passing data by reference
}
If you only need to access the data in a read-only fashion, it's best to pass as a const reference:
void foo(my_data_type const& data)
{
    // again, efficient, but ensures that data can't be modified
}
EDIT: In answer to your comment, I'm not talking about a "global" structure. You would need to declare a local struct variable, read the data from your file into the struct and then pass it by reference to any functions that need to be called:
int main()
{
    // a local struct to encapsulate the file data
    my_data_type file_data;
    // read the data from your file into file_data
    // the struct containing the data is passed by reference
    foo(file_data);
    return 0;
}

what is the best way to use global variables in c++?

I usually make a global class with all members static. All other classes then inherit from this global class.
I wonder if this is a good practice?
Anybody got any suggestions?
Generally, try to avoid global variables, as they introduce global state. And with global state you do not have referential transparency. Referential transparency is a good thing and global state is a bad thing. Global state makes unit tests pretty pointless, for example.
When you do have to use them, though, I'd agree that most of the time the method you mentioned is fine. You can also define the global variable in one .cpp file and then put an extern declaration for it in your .h file.
Best way? Carefully... :-)
Your suggested practice has not solved a single problem associated with global variables.
Private member data with access functions allow single point control of read/write and validation, improving robustness, maintainability, and ease of debugging.
Making the data members static simply reduces flexibility; you might want multiple independent global objects containing the same data structure. If there should only ever be one, use the singleton pattern.
Collecting unrelated global data into a single class breaks best practice regarding coupling and cohesion and is hardly OO.
This article relates to C and embedded systems, but is no less relevant to your question.
First, global state is bad. It seriously complicates understanding the program, since the behavior of any part can depend on the global variables. It makes it harder to test. It provides a way by which two far-distant functions can create an inconsistent state that may mess up another function, and that will be very difficult indeed to debug.
The nature of the global state doesn't matter. This is what's usually maligned about the Singleton pattern, for example.
However, having every class inherit from one global variable class is a bad idea. In C++, inheritance should be used sparingly, as it ties two classes together in implementation. It's usually a bad thing to have all the classes inheriting from one base class in any form, and C++ doesn't handle multiple inheritance really well. It would be really easy to get the "deadly diamond" effect, since if A inherits from B and they both inherit from Global, Global will appear twice in A's inheritance hierarchy.
The use of global variables is bad, but sometimes it's difficult to do without them. What chagrins me here, however, are two things:
The clear abuse of inheritance
The wonderful dependency issue
The combination of the two has a staggering effect.
Let's say I create a class that needs to access a global variable. Following your anti-pattern, it looks like this:
#include "globals.h"
class MyClass: Globals // for my own sake I assume it's not public inheritance
{
};
Of course, the #include is mandatory in the header, since I inherit from it. Therefore, each time I add or change one of the globals, even one that is used by a single class... I recompile the whole application.
If we'd ever worked on the same team, that would earn you a very harsh, very stern comment... to say the least.
Ick!
A global variable is a global variable. Renaming it, even with a name that makes it look like a member variable, doesn't change that. Every problem that you have with an ordinary global variable you will still have with your global-variable-as-common-static-member scheme (and possibly a few new ones).
You are most likely looking for the Singleton pattern. That is not to say that all global variables need to use the pattern. But, when I have a global it is usually because I want only one instantiation for the entire program. In this case, singleton can work very well.
http://en.wikipedia.org/wiki/Singleton_pattern