Providing settable global vars - clojure

I need to provide settable vars for my users, like *warn-on-reflection* provided by Clojure. AFAIK such vars are not defined on the Clojure side, which is why we can set them.
The problem is that my vars (all configuration stuff) are used in a lot of tight loops, which is why I don't want to make them refs. They MAY get set when the application starts up and that's it: no changes during runtime. They will be read maybe millions of times, so making them refs seems like a waste of resources.
So the question is: can I define settable vars in my case?

If you want settable global state with low overhead that is visible to all threads and doesn't need any STM transactions to control mutation, I'd recommend just using atoms:
(def some-global-value (atom 1))
Reads and writes to atoms are extremely low overhead.

*warn-on-reflection* is just a var, albeit one defined in Clojure's Java code rather than in core.clj. However, aside from where it is defined, there is nothing special about *warn-on-reflection*; it behaves exactly like any other var.
If you do not want to use vars, you may need to approach your problem from a different perspective than "must use global variables". It is common in functional programming to pass all necessary values into a function rather than relying on global variables. Perhaps it is time for you to consider such an approach.

You can create a local binding using let; this way the parameters can be dynamic and you'll still have fast access. The only catch is that if someone changes a parameter while the loop is running, the loop won't pick that up (IMO this should be the desired behaviour anyway).

Related

When are global variables actually considered good/recommended practice?

I've been reading a lot about why global variables are bad and why they should not be used. And yet most of the commonly used programming languages support globals in some way.
So my question is what is the reason global variables are still needed, do they offer some unique and irreplaceable advantage that cannot be implemented alternatively? Are there any benefits to global addressing compared to user specified custom indirection to retrieve an object out of its local scope?
As far as I understand, in modern programming languages, global addressing comes with the same performance penalty as calculating every offset from a memory address, whether it is an offset from the beginning of the "global" user memory or an offset from a this or any other pointer. So in terms of performance, the user can fake globals in the narrow cases they are needed using common pointer indirection without losing performance to real global variables. So what else? Are global variables really needed?
Global variables aren't generally bad because of their performance, they're bad because in significantly sized programs, they make it hard to encapsulate everything - there's information "leakage" which can often make it very difficult to figure out what's going on.
Basically the scope of your variables should be only what's required for your code to both work and be relatively easy to understand, and no more. Having global variables in a program which prints out the twelve-times tables is manageable, having them in a multi-million line accounting program is not so good.
I think this is another subject similar to goto - it's a "religious thing".
There are a lot of ways to "work around" globals, but if you are still accessing the same bit of memory in various places in the code you may have a problem.
Global variables are useful for some things, but should definitely be used "with care" (more so than goto, because the scope of misuse is greater).
There are two things that make global variables a problem:
1. It's hard to understand what is being done to the variable.
2. In a multithreaded environment, if a global is written from one thread and read by any other thread, you need synchronisation of some sort.
But there are times when globals are very useful. Having a config variable that holds all your configuration values that came from the config file of the application, for example. The alternative is to store it in some object that gets passed from one function to another, and it's just extra work that doesn't give any benefit. In particular if the config variables are read-only.
As a whole, however, I would suggest avoiding globals.
Global variables imply global state. This makes it impossible to store overlapping state that is local to a given part or function in your program.
For example, let's say we store the credentials of a given user in global variables which are used throughout our program. It will now be a lot more difficult to upgrade our program to allow multiple users at the same time. Had we just passed a user's state as a parameter to our functions, we would have had far fewer problems upgrading to multiple users.
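A minimal C++ sketch of passing the user's state as a parameter. The names UserSession and greeting, and the fields, are hypothetical and chosen only for illustration:

```cpp
#include <string>

// Hypothetical per-user state; the names are illustrative only.
struct UserSession {
    std::string name;
    bool is_admin;
};

// The user's state arrives as a parameter instead of being read from a
// global, so the same function works for any number of simultaneous users.
std::string greeting(const UserSession& session) {
    return (session.is_admin ? "Welcome back, admin " : "Hello, ") + session.name;
}
```

Supporting a second user then only means creating a second UserSession; greeting itself does not change.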
my question is what is the reason global variables are still needed,
Sometimes you need to access the same data from a lot of different functions. This is when you need globals.
For instance, I am working on a piece of code right now, that looks like this:
static runtime_thread *t0;

void
queue_thread (runtime_thread *newt)
{
  t0 = newt;
  do_something_else ();
}

void
kill_and_replace_thread (runtime_thread *newt)
{
  t0->status = dead;
  t0 = newt;
  t0->status = runnable;
  do_something_else ();
}
Note: Take the above as some sort of mixed C and pseudocode, to give you an idea of where a global is actually useful.
Static globals are almost mandatory when writing any cross-platform library. These global variables are static so that they stay within the translation unit. There are few if any cross-platform libraries that do not use static global variables, because they have to hide their platform-specific implementation from the user. These platform-specific implementations are held in static global variables. Of course, if they use an opaque pointer and require the platform-specific implementation to be held in such a structure, they could make a cross-platform library without any static global. However, such an object needs to be passed to all functions within such a library. Therefore, you either have to pass this opaque pointer everywhere, or make static global variables.
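The trade-off can be sketched roughly as follows. This is a simplified illustration, not how any particular library does it: g_backend_name, set_backend, describe and Context are hypothetical names, and Context is a plain struct standing in for a truly opaque handle.

```cpp
#include <string>

// Option 1: file-local ("static global") state, invisible outside this
// translation unit. The variable stands in for a platform-specific detail.
namespace {
std::string g_backend_name = "default";
}

void set_backend(const std::string& name) { g_backend_name = name; }
std::string describe() { return "backend: " + g_backend_name; }

// Option 2: the same state lives in a context object that the caller has
// to pass to every library function.
struct Context {
    std::string backend_name = "default";
};

std::string describe(const Context& ctx) { return "backend: " + ctx.backend_name; }
```

The first form keeps call sites short; the second makes the dependency explicit and allows several independent contexts.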
There's also the identifier limit issue. Compilers (especially older ones) have a limit to the number of identifiers they could handle within a scope. Many operating systems still use tons of #define instead of enumerations because their old compilers cannot handle the enumeration constants that bloat their identifiers. A proper rewrite of the header files could solve some of these.
Global variables are considered when you want to use them in every function, including main. Also remember that if you initialize a variable globally, its initial value will be the same in every function; however, you can reassign it inside a function to use a different value there. In this way you don't have to declare the same variable again and again in each function. But yes, they can cause trouble at times.
Global names are available everywhere. You may unknowingly end up using a global when you think you are using a local.
And if you make a mistake while declaring a global variable, for example accidentally declaring it as int instead of float, you'll have to apply the change across the whole program.

how do atoms differ from refs?

How do atoms and refs actually differ?
I understand that atoms are declared differently and are updated via the swap! function, whereas refs use alter inside a dosync. The internal implementation, however, seems quite similar, which makes me wonder why I would use one and not the other.
For example, the doc page for atoms (http://clojure.org/atoms) states:
Internally, swap! reads the current value, applies the function to it, and attempts to compare-and-set it in. Since another thread may have changed the value in the intervening time, it may have to retry, and does so in a spin loop. The net effect is that the value will always be the result of the application of the supplied function to a current value, atomically. However, because the function might be called multiple times, it must be free of side effects.
The method described sounds quite similar to me to the STM used for refs.
The difference is that you can't coordinate changes between multiple atoms but you can coordinate changes between multiple refs.
Ref changes have to take place inside of a dosync block. All of the changes in the dosync take place or none of them do (atomic) but that extends to all changes to the refs within that dosync. This behaves a lot like a database transaction.
Let's say for example that you wanted to remove an item from one collection and add it to another but without anyone seeing a case where neither collection has the item. That's impossible to guarantee with atoms but you can guarantee it with refs.
Keep in mind that:
Use Refs for synchronous, coordinated and shared changes.
Use Atoms for synchronous, independent and shared changes.
Personally, I don't care about the implementation differences between atoms and refs. What I care about is the use cases of each one of them.
I use refs when I need to change the state of more than one reference type and I need the STM semantics. I use atoms when I change the state of one reference type (object, depending on how you see it).
For example, if I need to increase the number of page hits in a web analytics system; I use atoms. If I need to transfer money between two accounts, I use refs.

Is it bad practice to declare static variables in functions/member functions?

Recently a coworker showed me code like this:
void SomeClass::function()
{
    static bool init = false;
    if (!init)
    {
        // hundreds of lines of ugly code
    }
    init = true;
}
He wants to check whether SomeClass is initialized in order to execute some piece of code once per SomeClass instance, but the fact is that only one instance of SomeClass will exist during the whole lifetime of the program.
His question was about the init static variable and when it is initialized. I answered that the initialization occurs once, so the value will be false at the first call and true for the rest of its lifetime. After answering, I added that such use of static variables is bad practice, but I haven't been able to explain why.
The reasons that I've been thinking so far are the following:
The behaviour of static bool init in SomeClass::function could be achieved with a non-static member variable.
Other functions in SomeClass couldn't check the static bool init value because its visibility is limited to the void SomeClass::function() scope.
Static variables aren't OOPish because they define a global state instead of an object state.
These reasons look poor, unclever and not very concrete to me, so I'm asking for more reasons to explain why the use of static variables in function and member-function scope is bad practice.
Thanks!
This is certainly a rare occurrence, at least, in good quality code, because of the narrow case for which it's appropriate. What this basically does is a just-in-time initialization of a global state (to deliver some global functionality). A typical example of this is having a random number generator function that seeds the generator at the first call to it. Another typical use of this is a function that returns the instance of a singleton, initialized on the first call. But other use-case examples are few and far between.
In general terms, global state is not desirable, and having objects that contain self-sufficient states is preferred (for modularity, etc.). But if you need global state (and sometimes you do), you have to implement it somehow. If you need any kind of non-trivial global state, then you should probably go with a singleton class, and one of the preferred ways to deliver that application-wide single instance is through a function that delivers a reference to a local static instance initialized on the first call. If the global state needed is a bit more trivial, then doing the scheme with the local static bool flag is certainly an acceptable way to do it. In other words, I see no fundamental problem with employing that method, but I would naturally question its motivations (requiring a global state) if presented with such code.
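The "function that delivers a reference to a local static instance" can be sketched like this. Settings and its members are hypothetical names used only for illustration:

```cpp
#include <string>

// Hypothetical application-wide settings object.
class Settings {
public:
    // The local static is constructed the first time instance() is called.
    // Since C++11 that construction happens exactly once even if several
    // threads call instance() concurrently; later mutations through the
    // returned reference are NOT synchronized by this.
    static Settings& instance() {
        static Settings settings;
        return settings;
    }

    void set_title(const std::string& title) { title_ = title; }
    const std::string& title() const { return title_; }

    // Prevent accidental copies of the single instance.
    Settings(const Settings&) = delete;
    Settings& operator=(const Settings&) = delete;

private:
    Settings() : title_("untitled") {}
    std::string title_;
};
```

Every call to Settings::instance() yields the same object, so a change made through one call is visible through the next.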
As is always the case for global data, multi-threading will cause some problems with a simplistic implementation like this one. Naive introductions of global state are never going to be inherently thread-safe, and this case is no exception; you'd have to take measures to address that specific problem. That is part of the reason why global state is not desirable.
The behaviour of static bool init into SomeClass::function could be achieved with a non-static member variable.
If there is an alternative to achieve the same behavior, then the two alternatives have to be judged on the technical issues (like thread-safety). But in this case, the required behavior is the questionable thing, more so than the implementation details, and the existence of alternative implementations doesn't change that.
Second, I don't see how you can replace a just-in-time initialization of a global state by anything that is based on a non-static data member (a static data member, maybe). And even if you can, it would be wasteful (require per-object storage for a one-time-per-program-execution thing), and on that ground alone, wouldn't make it a better alternative.
Other functions in SomeClass couldn't check the static bool init value because its visibility is limited to the void SomeClass::function() scope.
I would generally put that in the "Pro" column (as in Pro/Con). This is a good thing. This is information hiding or encapsulation. If you can hide away things that shouldn't be a concern to others, then great! But if there are other functions that would need to know that the global state has already been initialized or not, then you probably need something more along the lines of a singleton class.
The static variables aren't OOPish because they define a global state instead of a object state.
OOPish or not, who cares? But yes, the global state is the concern here, not so much the use of a local static variable to implement its initialization. Global states, especially mutable global states, are bad in general and should never be abused. They hinder modularity (modules are less self-sufficient if they rely on global states), they introduce multi-threading concerns since they are inherently shared data, they make any function that uses them non-reentrant (non-pure), they make debugging difficult, etc. The list goes on. But most of these issues are not tied to how you implement it. On the other hand, using a local static variable is a good way to solve the static-initialization-order fiasco, so they are good for that reason: one less problem to worry about when introducing a (well-justified) global state into your code.
Think multi-threading. This type of code is problematic when function() can be called concurrently by multiple threads. Without locking, you're open to race conditions; with locking, concurrency can suffer for no real gain.
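One way to avoid the race without hand-written locking, assuming C++11 or later, is to let the language perform the one-time step by initializing a local static from a lambda. This is a sketch of that alternative, not the code from the question; g_setup_runs and function_with_one_time_setup are hypothetical names.

```cpp
// Counts how often the one-time setup ran; a stand-in for the real work.
int g_setup_runs = 0;

void function_with_one_time_setup() {
    // The initializer of a local static runs once. Since C++11, concurrent
    // callers block until it has finished instead of racing on a flag.
    static const bool setup_done = [] {
        ++g_setup_runs;  // the "hundreds of lines" would go here
        return true;
    }();
    (void)setup_done;  // silence unused-variable warnings

    // ... regular per-call work would go here ...
}
```

Callers that arrive during the setup still wait for it, so the cost of synchronization does not disappear; it is just handled by the compiler and runtime.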
Global state is probably the worst problem here. Other functions don't have to be concerned with it, so it's not an issue. The fact that it can be achieved without static variable essentially means you made some form of a singleton. Which of course introduces all problems that singleton has, like being totally unsuitable for multithreaded environment, for one.
Adding to what others said, you can't have multiple objects of this class at the same time, or at least they would not behave as expected. The first instance would set the static variable and do the initialization. The ones created later, though, would not have their own version of init but would share it with all other instances. Since the first instance set it to true, none of the following ones will do any initialization, which is most probably not what you want.
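A small sketch that makes the sharing visible. Widget is a hypothetical class that mirrors the pattern from the question and reports whether a given call did the initialization:

```cpp
// Hypothetical class mirroring the pattern from the question.
class Widget {
public:
    // Returns true only when this call performed the initialization.
    bool function() {
        static bool init = false;  // one flag for the whole program, not per object
        if (init) {
            return false;
        }
        init = true;
        return true;
    }
};
```

With two objects a and b, only the first call on either of them returns true; b never initializes if a was called first.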

Can I use a global variable in this scenario?

After reading some SO threads on this topic: I've come up with these reasons for why global variables/singletons are bad.
It becomes increasingly difficult to understand functions that use the global state; as the code grows, more and more functions will modify that global state.
It makes unit testing harder.
It hides dependencies.
You will have to rewrite code if someday it turns out that your global variable is not actually a singular object/variable.
I want to make a game in C++, and there will be a "heightmap object" that represents the landscape of the world in my game as heightmap. This heightmap can change. I want to use a global object for it. (I don't expect to run into static initialization order issues, as there won't be any other static variable that references this heightmap object).
Now, I know global state is bad, and global mutable state is even worse, for those above reasons. But it seems really, really cumbersome to do the alternative: Create a heightmap object at main() scope, and pass that heightmap object to every single function that wants to use it.
What if I'm 100% sure that there will only be one heightmap in my application? Also, since this is a small, solo project, I have faith that I will be able to understand what each function is doing to the global state. And I don't see how the use of a global variable in this case hurts unit testing. If I want to use a mock heightmap, couldn't I just do globalHeightmap = generateMockHeightmap(); before calling the function that I want to test?
As certain as you may be about the characteristics of your project right now, I can pretty much assure you that at some point in the future you'll need to modify the code that relies on this global variable. At that point the global variable will likely come back and make it harder for you to figure out the needed changes because state isn't hidden (for example what if you accidentally change the state instead of reading from it - then the entire program is affected at some random point in the future). I can't overstate how important it is for program maintenance and debugging to minimize state mutation points and global variables are pretty much the antithesis to that goal.
Just overriding the global state map for your unit test seems fragile. What if you need to restore the old state, or mutate between states in the test? Then you wind up with a bunch of save/set/restore code.
What if you ever want to add a threading model to your application? Using global state will make that transition that much more complicated.
What if someone else helps you out with the project a year from now? Will they be able to understand the code? Will you be able to understand it a year from now? (I always try to write obvious code, and add comments where it's not obvious, specifically because I might be the person who comes back a year later and no longer remembers a thing about the mechanism.)
Finally, if a non-global-variable approach seems like too much work or too complicated that probably means your alternate approach is too complicated or needs another idea/rework. There's no reason that you can't stash the heightmap into an object that's created at the appropriate high-level game object and passed/stored in lower level objects as needed.
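A rough sketch of that arrangement. Heightmap and Unit are hypothetical classes; a real game would have a richer interface:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical heightmap: a width x height grid of elevations, all zero at first.
class Heightmap {
public:
    Heightmap(std::size_t width, std::size_t height)
        : width_(width), cells_(width * height, 0.0f) {}

    float at(std::size_t x, std::size_t y) const { return cells_[y * width_ + x]; }
    void set(std::size_t x, std::size_t y, float h) { cells_[y * width_ + x] = h; }

private:
    std::size_t width_;
    std::vector<float> cells_;
};

// A lower-level object stores a reference once, so individual calls need no
// extra argument. The Heightmap must outlive every object that refers to it.
class Unit {
public:
    Unit(const Heightmap& terrain, std::size_t x, std::size_t y)
        : terrain_(terrain), x_(x), y_(y) {}

    float altitude() const { return terrain_.at(x_, y_); }

private:
    const Heightmap& terrain_;
    std::size_t x_, y_;
};
```

A test can construct a small mock Heightmap and hand it to the object under test, with no global to save and restore afterwards.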

RNGs and global variable avoidance

I'm wondering about this. I've heard that global variables are bad, that they hurt the maintainability, usability, reusability, etc. of the code. But in this case, what can I do otherwise? Namely, I have a "pseudo-random number generator" (PRNG) and as one may know, they involve an internal state that changes every time new random numbers are generated. But this seems like the kind of thing that needs to be a global! Or a "static" member of an RNG class, but that's essentially a global! And globals are bad!
So, what can I do? The obvious thing is to have something like this (really stripped down):
class RNG {
private:
    StateType state; // this is the thing one may be tempted
                     // to make "static", which ruins the
                     // whole idea
public:
    RNG(); // constructor seeds the RNG
    ~RNG();
    int generateRandomInt();
};
But we need to seed it well if we're going to create an instance of this every time we need a random number in some function or class. Using the clock may not work: what if two instances of type "RNG" are created too close together? Then they get the same seed and produce the same random sequence. Bad.
We could also create one master RNG object and pass it around with pointers (instead of making it global, which would put us back on square 1), so that classes that need random numbers get a pointer to the RNG object in them. But then I run into a problem involving save/load of these objects to/from disk -- we can't just "save the RNG" for each instance, since we have only one RNG object. We'd have to instead pass an RNG into the load routines, which might give those routines different argument lists than for other objects that don't use the RNG. This would be a problem if, e.g. we wanted to use a common "Saveable" base class for everything that we can load/save. So, what to do? Eliminate the common "Saveable" base and just adopt a convention for how the load/save routines are to be made (but isn't that bad in and of itself? Oy!)?
What is the best solution to this that avoids the hostile-to-maintainability problems of globals yet also does not run into these new problems?
Or is it in fact okay to use a global here, as after all, that's how the "rand()" builtin works anyway? But then I hear that little thing in the back of my mind saying "but... but but but, globals are bad! Bad!" And from what I've read, there seem to be fairly good reasons to think them bad. But it seems like avoiding them creates new kinds of difficulties, like this one. It certainly seems harder to avoid globals than avoid "goto"s, for example.
There's some merit to your reluctance. You may want to substitute another generator for testing for example, that spits out a deterministic sequence that contains some edge cases.
Dependency Injection is one popular method for avoiding globals.
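A minimal sketch of dependency injection for a random source, including the deterministic substitute mentioned above. All class and function names are hypothetical; only std::mt19937 and std::uniform_int_distribution come from the standard library.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Interface that consumers depend on.
class RandomSource {
public:
    virtual ~RandomSource() = default;
    virtual int next_int(int low, int high) = 0;  // inclusive range
};

// Production implementation backed by the standard Mersenne Twister.
class MersenneSource : public RandomSource {
public:
    explicit MersenneSource(unsigned seed) : engine_(seed) {}
    int next_int(int low, int high) override {
        return std::uniform_int_distribution<int>(low, high)(engine_);
    }

private:
    std::mt19937 engine_;
};

// Test double that replays a fixed script and ignores the requested range.
// Assumes the script is non-empty.
class ScriptedSource : public RandomSource {
public:
    explicit ScriptedSource(std::vector<int> values) : values_(std::move(values)) {}
    int next_int(int, int) override {
        int value = values_[index_ % values_.size()];
        ++index_;
        return value;
    }

private:
    std::vector<int> values_;
    std::size_t index_ = 0;
};

// The consumer receives its generator instead of reaching for a global.
int roll_two_dice(RandomSource& rng) {
    return rng.next_int(1, 6) + rng.next_int(1, 6);
}
```

Production code would pass a MersenneSource, while a test passes a ScriptedSource holding the edge cases it wants to exercise.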
A random number generator is one of those things that is more OK to be global. You can think of it as:
- A RandomNumberFactory, a design pattern that uses a global factory, and it builds random numbers for you. The factory is of course constant semantically (in C++ you might use the keyword "mutable" internally, if that means anything to you.)
- A global constant (global constants are ok, right?) that gives you read-only access to the global constant randomNumber. randomNumber just happens to be non-deterministic, but it's constant in that of course no application is going to write to it.
- And besides, what's the worst that could happen? In a multithreaded application your RNG will yield non-deterministic behavior? Gasp. As @Mark Ransom pointed out above, yes, that is actually a drawback, as it makes testing harder. If this is a concern to you, you might consider a design pattern that writes this out.
It sometimes makes sense to use globals. No one can categorically say your code should not have any global variables, period.
What you do need is to be aware of the issues that can arise from global variables. If your application's logic needs global variables and it is possible for it to be called from a multi-threaded environment, then there is nothing inherently wrong with that, but you must take care to use locking, mutexes, and/or atomic reads/writes to ensure correct behavior.
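As an illustration of the locking option, here is a sketch of a global generator guarded by a mutex. The names g_engine, g_engine_mutex and global_random_int are hypothetical, and the fixed seed is an assumption made to keep the example short.

```cpp
#include <mutex>
#include <random>

namespace {
std::mt19937 g_engine(12345u);  // shared global state; fixed seed is an assumption
std::mutex g_engine_mutex;      // guards every access to g_engine
}

// Safe to call from several threads: the lock serializes access to the engine.
int global_random_int(int low, int high) {
    std::lock_guard<std::mutex> lock(g_engine_mutex);
    return std::uniform_int_distribution<int>(low, high)(g_engine);
}
```

The lock keeps the engine's internal state consistent, but the order in which threads draw numbers is still unpredictable, so results are not reproducible across runs with several threads.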