I keep hearing that global variables should never be used, but I have a tendency to dismiss "never" rules as hot-headed. Are there really no exceptions?
For instance, I am currently writing a small game in c++ with SDL. It seems to me to make a lot of sense to have a global variable with a pointer to the screen buffer, because all the different class that represent the different type of things in the game will need to blit to it, and there is only one screen buffer.
Please tell me if I am right that there are exceptions, or if not then:
Why not, or what is so bad about them that they should be avoided at all costs (please explain a little bit)
How can this be achieved, preferably without having to pass it to every constructer to be stored internally until needed, or to every call to a paint() request.
(I would assume that this question had been asked on SO before, however couldn't find what I need (an explanation and workaround) when searching. If someone could just post a link to a previous question, that could be great)
We tell students never to use global variables because it encourages better programming methods. It's the same reason we tell them not to use a goto statement. Once you're an accomplished programmer then you can break the rules because you should know when it's appropriate.
Of course there are exceptions. I personally can't think of a single situation where a goto is the right solution (or where a singleton is the right solution), but global variables occasionally have their uses. But... you haven't found a valid excuse.
Most objects in your game do not, repeat, not need to access the screen buffer. That is the responsibility of the renderer and no one else. You don't want your logger, input manager, AI or anyone else putting random garbage on the screen.
And that is why people say "don't use globals". It's not because globals are some kind of ultimate evil, but because if we don't say this, people fall into the trap you're in, of "yeah but that rule doesn't apply to me, right? I need everything to have access to X". No, you need to learn to structure your program.
More common exceptions are for state-less or static objects, like a logger, or perhaps your app's configuration: things that are either read-only or write-only, and which truly needs to be accessible from everywhere. Every line of code may potentially need to write a log message. So a logger is a fair candidate for making global. But 99% of your code should not even need to know that a screen buffer exists.
The problem with globals is, in a nutshell, that they violate encapsulation:
Code that depends on a global is less reusable. I can take the exact same class you're using, put it in my app, and it'll break. Because I don't have the same network of global objects that it depends on.
It also makes the code harder to reason about. What value will a function f(x) return?
It obviously depends on what x is. But if I pass the same x twice, will I get the same result? If it uses a lot of globals, then probably not. Then it becomes really difficult to just figure out what it's going to return, and also what else it is going to do. Is it going to set some global variable that's going to affect other, seemingly unrelated, functions?
How can this be achieved, preferably without having to pass it to every constructor to be stored internally until needed
You make it sound like that's a bad thing. If an object needs to know about the screen buffer, then you should give it the screen buffer. Either in the constructor, or in a later call. (And it has a nice bonus: it alerts you if your design is sloppy. If you have 500 classes that need to use the screen buffer, then you have to pass it to 500 constructors. That's painful, and so it's a wake-up call: I am doing something wrong. That many object shouldn't need to know about the screen buffer. How can I fix this?`)
As a more obvious example, say I want to calculate the cosine of 1.42, so I pass 1.42 to the function: cos(1.42)
That's how we usually do it, with no globals. Of course, we could instead say "yeah but everyone needs to be able to set the argument to cos, I'd better make it global". Then it'd look like this:
gVal = 1.42;
cos();
I don't know about you, but I think the first version was more readable.
Like any other design decision, using global variables has a cost. It saves you having to pass variables unnecessarily, and allows you to share state among running functions. But it also has the potential to make your code hard to follow, and to reuse.
Some applications, like embedded systems, use global variables regularly. For them, the added speed of not having to pass the variable or even a pointer into the activation record, and the simplicity, makes it a good decision [arguably]. But their code suffers for it; it is often hard to follow execution and developing systems with increasing complexity becomes more and more difficult.
In a large system, consisting of heterogeneous components, using globals may become a nightmare to maintain. At some point you may need a different screen buffer with different properties, or the screen buffer may not be available until it's initialized meaning you'll have to wrap every call to it with a check if it's null, or you'll need to write multithreaded code, and the global will require a lock.
In short, you are free to use global vars while your application is small enough to manage. When it starts to grow, they will become a liability, and will either require refactoring to remove, or will cripple the programs growth (in terms of capability or stability). The admonition not to use them stems from years of hard-learned lessons, not programmer "hot-headedness".
What if you want to update your engine to support dual screen? Multiple displays are becoming more and more common all the time. Or what if you want to introduce threading? Bang. How about if you want to support more than one rendering subsystem? Whoopsie. I want to pack my code as a library for other people or myself to re-use? Crap.
Another problem is that the order of global init between source files is undefined, making it tricky to maintain more than a couple.
Ultimately, you should have one and only one object that can work with the screen buffer - the rendering object. Thus, the screen buffer pointer should be part of that object.
I agree with you from a fundamental point of view - "never" is inaccurate. Every function call you make is calling a global variable - the address of that function. This is especially true for imported functions like OS functions. There are other things that you simply cannot unglobal, even if you wanted to - like the heap. However, this is most assuredly not the right place to use a global.
The biggest problem with globals is that if you later decide that a global wasn't the right thing to do for any reason (and there are many reasons), then they're absolutely hell to factor out of an existing progam. The simple fact is that using a global is just not thinking. I can't be bothered to design an actual rendering subsystem and object, so I'm just gonna chuck this stuff in a global. It's easy, it's simple, and not doing this was the biggest revolution in software programming, ever, and for good reason.
Make a rendering class. Put the pointer in there. Use a member function. Problem solved.
Edit: I re-read your OP. The problem here is that you've split your responsibilities. Each class (bitmap, text, whatever) should NOT render itself. It should just hold the data that the master rendering object needs to render it. It's a Bitmap's job to represent a bitmap - not to render a bitmap.
Global variables can change in unexpected ways, which is usually not what you want. The state of the application will become complex and unmaintainable. Very easy to make something wrong. Especially if someone else is changing your code;
Singleton might be a better idea. That would at least give you some encapsulation in case you need to make extensions in the future.
One reason not to use global variables is a problem with namespaces (i.e. accidentally using the same name twice);
We do often use global (to namespace) constants at work which is considered normal, as they don't change (in unexpected ways) and it is very convenient to have them available in multiple files.
If the screen buffer is shared between lots of different pieces of code, then you have two options:
1) Pass it around all over the place. This is inconvenient, because every piece of code that uses the screen buffer, even indirectly, needs to be laboriously indicated as such by the fact that this object is passed through the call stack.
2) Use a global. If you do this, then for all you know any function at all in your entire program might use the screen buffer, just by grabbing it from the global[*]. So if you need to reason about the state of the screen buffer, then you need to include the entire program in your reasoning. If only there was some way to indicate which functions modify the screen buffer, and which cannot possibly ever do so. Oh, hang on a second...
This is even aside from the benefits of dependency injection - when testing, and in future iterations of your program, it might be useful for the caller of some function that blits, to be able to say where it should blit to, not necessarily the screen.
The same issues apply as much to singletons as they do to other modifiable globals.
You could perhaps even make a case that it should cost you something to add yet another piece of code that modifies the screen buffer, because you should try to write systems which are loosely coupled, and doing so will naturally result in fairly few pieces of code that need to know anything at all about the screen in order to do their job (even if they know that they're manipulating images, they needn't necessarily care whether those images are in the screen buffer, or some back buffer, or some completely unrelated buffer that's nothing to do with the screen). I'm not actually in favour of making extra work just to punish myself into writing better code, but it's certainly true that globals make it quite easy to add yet another inappropriate wad of coupling to my app.
[*] Well, you may be able to narrow it down on the basis that only TUs that include the relevant header file will have the declaration. There's nothing technically to stop them copy-and-pasting it, but in a code base that's at all well regulated, they won't.
The reason i never do it is because it creates a mess. Imagine setting ALL unique variables to globals, you would have an external list the size of a phonebook.
Another reason could be that you don't know where it is initialized or modified. What if you accidently modify it at place X in file Y? You will never know. What if it isn't initialized yet? You will have to check everytime.
if (global_var = 0) // uh oh :-(
if (object->Instance() = 0) // compile error :-)
This can both be fixed using singletons. You simply cant assign to a function returning you the object's adress.
Besides that: you don't need your screen buffer everywhere in your application, however if you want to: go ahead, it doesn't make the program run less good :-)
And then you still have the namespace problem but that at least gives you compile errors ;-)
"Why not": global variables give you spaghetti information flow.
That's the same as goto gives you spaghetti control flow.
You don't know where anything comes from, or what can be assumed at any point. The INTERCAL solution of introducing a come from statement, while offering some initial hope of finally being sure of where control comes from, turned out to not really solve that problem for goto. Similarly, more modern language features for tracking updates to global variables, like onchangeby, have not turned out to solve that problem for global variables.
Cheers & hth.,
Global variables (and singletons, which are just a wrapper around a global variable) can cause a number of problems, as I discussed in this answer.
For this specific issue -- blittable objects in a game kit -- I'd be apt to suggest a method signature like Sprite::drawOn(Canvas&, const Point&). It shouldn't be excessive overhead to pass a reference to the Canvas around, since it's not likely to be needed except in the paint pathway, and within that pathway you're probably iterating over a collection anyway, so passing it in that loop isn't that hard. By doing this, you're hiding that the main program has only one active screen buffer from the sprite classes, and therefore making it less likely to create a dependency on this fact.
Disclaimer: I haven't used SDL itself before, but I wrote a simple cross-platform C++ game kit back in the late '90s. At the time I was working on my game kit, it was fairly common practice for multi-player X11-based games to run as a single process on one machine that opened a connection to each player's display, which would quite efficiently make a mess of code that assumed the screen buffer was a singleton.
In this case a class that provides member functions for all methods that need access to the screen buffer would be a more OOP friendly approach. Why should everyone and any one have uncontrolled access to it!?
As to whether there are times when a global is better or even necessary, probably not. They are deceptively attractive when you are hacking out some code, because you need jump through no syntactic hoops to access it, but it is generally indicative of poor design, and one that will rapidly atrophy inter maintenance and extension.
Here's a good read on the subject (related to embedded programming, but the points apply to any code, it is just that some embedded programmers thing they have a valid excuse).
I'm also curious to hear the precise explanation, but I can tell you that the Singleton pattern usually works pretty well to fill the role of global variables.
Related
I hate to beat a dead horse, that said, I've gone over so many conflicting articles over the past few days in regards to the use of the singleton pattern.
This question isn't be about which is the better choice in general, rather what makes sense for my use case.
The pet project I'm working on is a game. Some of the code that I'm currently working on, I'm leaning towards using a singleton pattern.
The use cases are as follows:
a globally accessible logger.
an OpenGL rendering manager.
file system access.
network access.
etc.
Now for clarification, more than a couple of the above require shared state between accesses. For instance, the logger is wrapping a logging library and requires a pointer to the output log, the network requires an established open connection, etc.
Now from what I can tell it's more suggested that singletons be avoided, so lets look at how we may do that. A lot of the articles simply say to create the instance at the top and pass it down as a parameter to anywhere that is needed. While I agree that this is technically doable, my question then becomes, how does one manage the potentially massive number of parameters? Well what comes to mind is wrapping the different instances in a sort of "context" object and passing that, then doing something like context->log("Hello World"). Now sure that isn't to bad, but what if you have a sort of framework like so:
game_loop(ctx)
->update_entities(ctx)
->on_preupdate(ctx)
->run_something(ctx)
->only use ctx->log() in some freak edge case in this function.
->on_update(ctx)
->whatever(ctx)
->ctx->networksend(stuff)
->update_physics(ctx)
->ctx->networksend(stuff)
//maybe ctx never uses log here.
You get the point... in some areas, some aspects of the "ctx" aren't ever used but you're still stuck passing it literally everywhere in case you may want to debug something down the line using logger, or maybe later in development, you actually want networking or whatever in that section of code.
I feel like the above example would much rather be suited to a globally accessible singleton, but I must admit, I'm coming from a C#/Java/JS background which may color my view. I want to adopt the mindset/best practices of a C++ programmer, yet like I said, I can't seem to find a straight answer. I also noticed that the articles that suggest just passing the "singleton" as a parameter only give very simplistic use cases that anyone would agree a parameter would be the better way to go.
In this game example, you probably wan't to access logging everywhere even if you don't plan on using it immediately. File system stuff may be all over but until you build out the project, it's really hard to say when/where it will be most useful.
So do I:
Stick with using singletons for these use cases regardless of how "evil/bad" people say it is.
Wrap everything in a context object, and pass it literally everywhere. (seems kinda gross IMO, but if that's the "more accepted/better" way of doing it, so be it.)
Something completely else. (Really lost as to what that might be.)
If option 1, from a performance standpoint, should I switch to using namespace functions, and hiding the "private" variables / functions in anonymous namespaces like most people do in C? (I'm guessing there will be a small boost in performance, but then I'll be stuck having to call an "init" and "destroy" method on a few of these rather than being able to just allow the constructor/destructor to do that for me, still might be worth while?)
Now I realize this may be a bit opinion based, but I'm hoping I can still get a relatively good answer when a more complicated/nested code base is in question.
Edit:
After much more deliberation I've decided to use the "Service Locator" pattern instead. To prevent a global/singleton of the Service Locator I'm making anything that may use the services inherit from a abstract base class that requires the Service Locator be passed when constructed.
I haven't implemented everything yet so I'm still unsure if I'll run into any problems with this approach, and would still love feedback on if this is a reasonable alternative to the singleton / global scope dilemma.
I had read that Service Locator is also somewhat of an anti-pattern, that said, many of the example I found implemented it with statics and/or as a singleton, perhaps using it as I've described removes the aspects that cause it to be an anti-pattern?
Whenever you think you want to use a Singleton, ask yourself the following question: Why is it that it must be ensured at all cost that there never exists more than one instance of this class at any point in time? Because the whole point of the Singleton pattern is to make sure that there can never be more than one instance of the Singleton. That's what the term "singleton" is all about: there only being one. That's why it's called the Singleton pattern. That's why the pattern calls for the constructor to be private. The point of the Singleton pattern is not and never was to give you a globally-accessible instance of something. The fact that there is a global access point to the sole instance is just a consequence of the Singleton pattern. It is not the objective the Singleton pattern is meant to achieve. If all you want is a globally accessible instance of something, then use a global variable. That's exactly what global variables are for…
The Singleton pattern is probably the one design pattern that's singularly more often misunderstood than not. Is it an intrinsic aspect of the very concept of a network connection that there can only ever be one network connection at a time, and the world would come to an end if that constraint was ever to be violated? If the answer is no, then there is no justification for a network connection to ever be modeled as a Singleton. But don't take my word for it, convince yourself by checking out page 127 of Design Patterns: Elements of Reusable Object-Oriented Software where the Singleton pattern was originally described…😉
Concerning your example: If you're ending up having to pass a massive number of parameters into some place then that first and foremost tells you one thing: there are too many responsibilities in that place. This fact is not changed by the use of Singletons. The use of Singletons simply obfuscates this fact because you're not forced to pass all stuff in through one door in the form of parameters but rather just access whatever you want directly all over the place. But you're still accessing these things. So the dependencies of your piece of code are the same. These dependencies are just not expressed explicitly anymore at some interface level but creep around in the mists. And you never know upfront what stuff a certain piece of code depends on until the moment your build breaks after trying to take away one thing that something else happened to depend upon. Note that this issue is not specific to the Singleton pattern. This is a concern with any kind of global entity in general…
So rather than ask the question of how to best pass a massive number of parameters, you should ask the question of why the hell does this one piece of code need access to that many things? For example, do you really need to explicitly pass the network connection to the game loop? Should the game loop not maybe just know the physics world object and that physics world object is at the moment of creation given some object that handles the network communication. And that object in turn is upon initialization told the network connection it is supposed to use? The log could just be a global variable (or is there really anything about the very idea of a log itself that prohibits there ever being more than one log?). Or maybe it would actually make sense for each thread to have its own log (could be a thread-local variable) so that you get a log from each thread in the order of the control flow that thread happened to take rather than some (at best) interleaved mess that would be the output from multiple threads for which you'd probably want to write some tool so that you'd at least have some hope of making sense of it at all…
Concerning performance, consider that, in a game, you'll typically have some parent objects that each manage collections of small child objects. Performance-critical stuff would generally be happening in places where something has to be done to all child objects in such a collection. The relative overhead of first getting to the parent object itself should generally be negligible…
PS: You might wanna have a look at the Entity Component System pattern…
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
In C/C++, are global variables as bad as my professor thinks they are?
The problem with global variables is that since every function has access to these, it becomes increasingly hard to figure out which functions actually read and write these variables.
To understand how the application works, you pretty much have to take into account every function which modifies the global state. That can be done, but as the application grows it will get harder to the point of being virtually impossible (or at least a complete waste of time).
If you don't rely on global variables, you can pass state around between different functions as needed. That way you stand a much better chance of understanding what each function does, as you don't need to take the global state into account.
The important thing is to remember the overall goal: clarity
The "no global variables" rule is there because most of the time, global variables make the meaning of code less clear.
However, like many rules, people remember the rule, and not what the rule was intended to do.
I've seen programs that seem to double the size of the code by passing an enormous number of parameters around simply to avoid the evil of global variables. In the end, using globals would have made the program clearer to those reading it. By mindlessly adhering to the word of the rule, the original programmer had failed the intent of the rule.
So, yes, globals are often bad. But if you feel that in the end, the intent of the programmer is made clearer by the use of global variables, then go ahead. However, remember the drop in clarity that automatically ensues when you force someone to access a second piece of code (the globals) to understand how the first piece works.
My professor used to say something like: using global variables are okay if you use them correctly. I don't think I ever got good at using them correctly, so I rarely used them at all.
The problem that global variables create for the programmer is that it expands the inter-component coupling surface between the various components that are using the global variables. What this means is that as the number of components using a global variable increases, the complexity of the interactions can also increase. This increased coupling usually makes defects easier to inject into the system when making changes and also makes defects harder to diagnose and correct. This increase coupling can also reduce the number of available options when making changes and it can increase the effort required for changes as often one must trace through the various modules that are also using the global variable in order to determine the consequences of changes.
The purpose of encapsulation, which is basically the opposite of using global variables, is to decrease coupling in order to make understanding and changing the source easier and safer and more easily tested. It is much easier to use unit testing when global variables are not used.
For example if you have a simple global integer variable that is being used as an enumerated indicator that various components use as a state machine and you then make a change by adding a new state for a new component, you must then trace through all the other components to ensure that the change will not affect them. An example of a possible problem would be if a switch statement to test the value of the enumeration global variable with case statements for each of the current values is being used in various places and it so happens that some of the switch statements do not have a default case to handle an unexpected value for the global all of a sudden you have undefined behavior so far as the application is concerned.
On the other hand the use of a shared data area might be used to contain a set of global parameters which are referenced throughout the application. This approach is often used with embedded applications with small memory footprints.
When using global variables in these sort of applications typically the responsibility for writing to the data area is allocated to a single component and all other components see the area as const and read from it, never writing to it. Taking this approach limits the problems that can develop.
A few problems from global variables which need to be worked around
When the source for a global variable such as a struct is modified, everything using it must be recompiled so that everything using the variable knows its true size and memory template.
If more than one component can modify the global variable you can run into problems with inconsistent data being in the global variable. With a multi-threading application, you will probably need to add some kind of locking or critical region to provide a way so that only one thread at a time can modify the global variable and when a thread is modifying the variable, all changes are complete and committed before other threads can query the variable or modify it.
Debugging a multi-threaded application that uses a global variable can be more difficult. You can run into race conditions that can create defects that are difficult to replicate. With several components communicating through a global variable, especially in a multi-threaded application, being able to know what component is changing the variable when and how it is changing the variable can be very difficult to understand.
Name clash can be a problem with using of global variables. A local variable that has the same name as a global variable can hide the global variable. You also run into the naming convention issue when using the C programming language. A work around is to divide the system up into sub-systems with the global variables for a particular sub-system all beginning with the same first three letters (see this on resolving name space collisions in objective C). C++ provides namespaces and with C you can work around this by creating a globally visible struct whose members are various data items and pointers to data and functions which are provided in a file as static hence with file visibility only so that they can only be referenced through the globally visible struct.
In some cases the original application intent is changed so that global variables that provided the state for a single thread is modified to allow several duplicate threads to run. An example would be a simple application designed for a single user using global variables for state and then a request comes down from management to add a REST interface to allow remote applications to act as virtual users. So now you run into having to duplicate the global variables and their state information so that the single user as well as each of the virtual users from remote applications have their own, unique set of global variables.
Using C++ namespace and the struct Technique for C
For the C++ programming language the namespace directive is a huge help in reducing the chances of a name clash. namespace along with class and the various access keywords (private, protected, and public) provide most of the tools you need to encapsulate variables. However the C programming language doesn't provide this directive. This stackoverflow posting, Namespaces in C , provides some techniques for C.
A useful technique is to have a single memory resident data area that is defined as a struct which has global visibility and within this struct are pointers to the various global variables and functions that are being exposed. The actual definitions of the global variables are given file scope using the static keyword. If you then use the const keyword to indicate which are read only, the compiler can help you to enforce read only access.
Using the struct technique can also encapsulate the global so that it becomes a kind of package or component that happens to be a global. By having a component of this kind it becomes easier to manage changes that affect the global and the functionality using the global.
However while namespace or the struct technique can help manage name clashes, the underlying problems of inter-component coupling which the use of globals introduces especially in a modern multi-threaded application, still exist.
Global variables should only be used when you have no alternative. And yes, that includes Singletons. 90% of the time, global variables are introduced to save the cost of passing around a parameter. And then multithreading/unit testing/maintenance coding happens, and you have a problem.
So yes, in 90% of the situations global variables are bad. The exceptions are not likely to be seen by you in your college years. One exception I can think off the top of my head is dealing with inherently global objects such as interrupt tables. Things like DB connection seem to be global, but ain't.
Global variables are as bad as you make them, no less.
If you are creating a fully encapsulated program, you can use globals. It's a "sin" to use globals, but programming sins are laregly philosophical.
If you check out L.in.oleum, you will see a language whose variables are solely global. It's unscalable because libraries all have no choice but to use globals.
That said, if you have choices, and can ignore programmer philosophy, globals aren't all that bad.
Neither are Gotos, if you use them right.
The big "bad" problem is that, if you use them wrong, people scream, the mars lander crashes, and the world blows up....or something like that.
Yes, but you don't incur the cost of global variables until you stop working in the code that uses global variables and start writing something else that uses the code that uses global variables. But the cost is still there.
In other words, it's a long term indirect cost and as such most people think it's not bad.
If it's possible your code will end up under intensive review during a Supreme Court trial, then you want to make sure to avoid global variables.
See this article:
Buggy breathalyzer code reflects importance of source review
There were some problems with the
style of the code that were identified
by both studies. One of the stylistic
issues that concerned the reviewers
was the extensive use of unprotected
global variables. This is considered
poor form because it increases the
risk that the program state will
become inconsistent or that values
will be inadvertently modified or
overwritten. The researchers also
expressed some concern about the fact
that decimal precision is not
maintained consistently throughout the
code.
Man, I bet those developers are wishing they hadn't used global variables!
The issue is less that they're bad, and more that they're dangerous. They have their own set of pros and cons, and there are situations where they're either the most efficient or only way to achieve a particular task. However, they're very easy to misuse, even if you take steps to always use them properly.
A few pros:
Can be accessed from any function.
Can be accessed from multiple threads.
Will never go out of scope until the program ends.
A few cons:
Can be accessed from any function, without needing to be explicitly dragged in as a parameter and/or documented.
Not thread-safe.
Pollutes the global namespace and potentially causes name collisions, unless measures are taken to prevent this.
Note, if you will, that the first two pros and the first two cons I listed are the exact same thing, just with different wording. This is because the features of a global variable can indeed be useful, but the very features that make them useful are the source of all their problems.
A few potential solutions to some of the problems:
Consider whether they're actually the best or most efficient solution for the problem. If there are any better solutions, use that instead.
Put them in a namespace [C++] or singleton struct [C, C++] with a unique name (a good example would be Globals or GlobalVars), or use a standardised naming convention for global variables (such as global_[name] or g_module_varNameStyle (as mentioned by underscore_d in the comments)). This will both document their use (you can find code that uses global variables by searching for the namespace/struct name), and minimise the impact on the global namespace.
For any function that accesses global variables, explicitly document which variables it reads and which it writes. This will make troubleshooting easier.
Put them in their own source file and declare them extern in the associated header, so their use can be limited to compilation units that need to access them. If your code relies on a lot of global variables, but each compilation unit only needs access to a handful of them, you could consider sorting them into multiple source files, so it's easier to limit each file's access to global variables.
Set up a mechanism to lock and unlock them, and/or design your code so that as few functions as possible need to actually modify global variables. Reading them is a lot safer than writing them, although thread races may still cause problems in multithreaded programs.
Basically, minimise access to them, and maximise name uniqueness. You want to avoid name collisions and have as few functions as possible that can potentially modify any given variable.
Whether they're good or bad depends on how you use them. The majority tend to use them badly, hence the general wariness towards them. If used properly, they can be a major boon; if used poorly, however, they can and will come back to bite you when and how you least expect it.
A good way to look at it is that they themselves aren't bad, but they enable bad design, and can multiply the effects of bad design exponentially.
Even if you don't intend to use them, it is better to know how to use them safely and choose not to, than not to use them because you don't know how to use them safely. If you ever find yourself in a situation where you need to maintain pre-existing code that relies on global variables, you may be in for difficulty if you don't know how to use them properly.
I'd answer this question with another question: Do you use singeltons/ Are singeltons bad?
Because (almost all) singelton usage is a glorified global variable.
As someone said (I'm paraphrasing) in another thread "Rules like this should not be broken, until you fully understand the consequences of doing so."
There are times when global variables are necessary, or at least very helpful (Working with system defined call-backs for example). On the other hand, they're also very dangerous for all of the reasons you've been told.
There are many aspects of programming that should probably be left to the experts. Sometimes you NEED a very sharp knife. But you don't get to use one until you're ready...
Global variables are generally bad, especially if other people are working on the same code and don't want to spend 20mins searching for all the places the variable is referenced. And adding threads that modify the variables brings in a whole new level of headaches.
Global constants in an anonymous namespace used in a single translation unit are fine and ubiquitous in professional apps and libraries. But if the data is mutable, and/or it has to be shared between multiple TUs, you may want to encapsulate it--if not for design's sake, then for the sake of anybody debugging or working with your code.
Using global variables is kind of like sweeping dirt under a rug. It's a quick fix, and a lot easier in the short term than getting a dust-pan or vacuum to clean it up. However, if you ever end up moving the rug later, you're gonna have a big surprise mess underneath.
I think your professor is trying to stop a bad habit before it even starts.
Global variables have their place and like many people said knowing where and when to use them can be complicated. So I think rather than get into the nitty gritty of the why, how, when, and where of global variables your professor decided to just ban. Who knows, he might un-ban them in the future.
Global variables are bad, if they allow you to manipulate aspects of a program that should be only modified locally. In OOP globals often conflict with the encapsulation-idea.
Absolutely not. Misusing them though... that is bad.
Mindlessly removing them for the sake of is just that... mindless. Unless you know the advanatages and disadvantages, it is best to steer clear and do as you have been taught/learned, but there is nothing implicitly wrong with global variables. When you understand the pros and cons better make your own decision.
No they are not bad at all. You need to look at the (machine) code produced by the compiler to make this determination, sometimes it is far far worse to use a local than a global. Also note that putting "static" on a local variable is basically making it a global (and creates other ugly problems that a real global would solve). "local globals" are particularly bad.
Globals give you clean control over your memory usage as well, something far more difficult to do with locals. These days that only matters in embedded environments where memory is quite limited. Something to know before you assume that embedded is the same as other environments and assume the programming rules are the same across the board.
It is good that you question the rules being taught, most of them are not for the reasons you are being told. The most important lesson though is not that this is a rule to carry with you forever, but this is a rule required to honor in order to pass this class and move forward. In life you will find that for company XYZ you will have other programming rules that you in the end will have to honor in order to keep getting a paycheck. In both situations you can argue the rule, but I think you will have far better luck at a job than at school. You are just another of many students, your seat will be replaced soon, the professors wont, at a job you are one of a small team of players that have to see this product to the end and in that environment the rules developed are for the benefit of the team members as well as the product and the company, so if everyone is like minded or if for the particular product there is good engineering reason to violate something you learned in college or some book on generic programming, then sell your idea to the team and write it down as a valid if not the preferred method. Everything is fair game in the real world.
If you follow all of the programming rules taught to you in school or books your programming career will be extremely limited. You can likely survive and have a fruitful career, but the breadth and width of the environments available to you will be extremely limited. If you know how and why the rule is there and can defend it, thats good, if you only reason is "because my teacher said so", well thats not so good.
Note that topics like this are often argued in the workplace and will continue to be, as compilers and processors (and languages) evolve so do these kinds of rules and without defending your position and possibly being taught a lesson by someone with another opinion you wont move forward.
In the mean time, then just do whatever the one that speaks the loudest or carries the biggest stick says (until such a time as you are the one that yells the loudest and carries the biggest stick).
I would like to argue against the point being made throughout this thread that it makes multi-threading harder or impossible per se. Global variables are shared state, but the alternatives to globals (e. g. passing pointers around) might also share state. The problem with multi-threading is how to properly use shared state, not whether that state happens to be shared through a global variable or something else.
Most of the time when you do multi-threading you need to share something. In a producer-consumer pattern for example, you might share some thread-safe queue that contains the work units. And you are allowed to share it because that data structure is thread-safe. Whether that queue is global or not is completely irrelevant when it comes to thread-safety.
The implied hope expressed throughout this thread that transforming a program from single-threaded to multi-threaded will be easier when not using globals is naive. Yes, globals make it easier to shoot yourself in the foot, but there's a lot of ways to shoot yourself.
I'm not advocating globals, as the other points still stand, my point is merely that the number of threads in a program has nothing to do with variable scope.
Yes, because if you let incompetent programmers use them (read 90% especially scientists) you end up with 600+ global variable spread over 20+ files and a project of 12,000 lines where 80% of the functions take void, return void, and operate entirely on global state.
It quickly becomes impossible to understand what is going on at any one point unless you know the entire project.
Global variables are fine in small programs, but horrible if used the same way in large ones.
This means that you can easily get in the habit of using them while learning. This is what your professor is trying to protect you from.
When you are more experienced it will be easier to learn when they are okay.
Global are good when it comes to configuration . When we want our configuration/changes to have a global impact on entire project.
So we can change one configuration and the changes are directed to entire project . But I must warn you would have to be very smart to use globals .
Use of Global variables actually depends on the requirements. Its advantage is that,it reduces the overhead of passing the values repeatedly.
But your professor is right because it raises security issues so use of global variables should be avoided as much as possible. Global variables also create problems which are sometimes difficult to debug.
For example:-
Situations when the variables values is getting modified on runtime. At that moment its difficult to identify which part of code is modifying it and on what conditions.
In the end of the day, your program or app can still work but its a matter of being tidy and having a complete understanding of whats going on. If you share a variable value among all functions, it may become difficult to trace what function is changing the value(if the function does so) and makes debugging a million times harder
Sooner or later you will need to change how that variable is set or what happens when it is accessed, or you just need to hunt down where it is changed.
It is practically always better to not have global variables. Just write the dam get and set methods, and be gland you when you need them a day, week or month later.
I usually use globals for values that are rarely changed like singletons or function pointers to functions in dynamically loaded library. Using mutable globals in multithreaded applications tends to lead to hard to track bug so I try to avoid this as a general rule.
Using a global instead of passing an argument is often faster but if you're writing a multithreaded application, which you often do nowadays, it generally doesn't work very well (you can use thread-statics but then the performance gain is questionable).
In web applications within an enterprize can be used for holding session/window/thread/user specific data on the server for reasons of optimization and to preserve against loss of work where connection are unstable. As mentioned, race conditions need to be handled. We use a single instance of a class for this information and it is carefully managed.
security is less means any one can manipulate the variables if they are declared global , for this one to explain take this example if you have balance as a global variable in your bank program the user function can manipulate this as well as bank officer can also manipulate this so there is a problem .only user should be given the read only and withdraw function but the clerk of the bank can add the amount when the user personally gives the cash in the desk.this is the way it works
In a multi-threaded application, use local variables in place of global variables to avoid a race condition.
A race condition occurs when multiple thread access a shared resource, with at least one thread having a write access to the data. Then, the result of the program is not predictable, and depends on the order of accesses to the data by different threads.
More on this here, https://software.intel.com/en-us/articles/use-intel-parallel-inspector-to-find-race-conditions-in-openmp-based-multithreaded-code
I am working on a motion control system, and will have at least 5 motors, each with parameters such as "gearbox ratio", "ticks per rev" "Kp", "Ki", "Kd", etc. that will be referenced upon construction of instances of the motors.
My question to StackOverflow is how should I organize these numbers? I know this is likely a preferential thing, but being new to coding I figure I could get some good opinions from you.
The three approaches I immediately see are as follows:
Write in the call to the constructor, either via variables or numbers-- PROS: limited coding, could be implemented in a way that it's easy to change, but possibly harder than #define's
Use #define's to accomplish similar to above -- PROS: least coding, easy to change (assuming you want to look at the source code)
Load a file (possibly named "motorparameters.txt") and load the parameters into an array and populate from that array. If I really wanted to I could add a GUI approach to changing this file rather than manual. -- PROS:easiest to change without diving into source code.
These parameters could change over time, and while there are other coders at the company, I would like to leave it in a way that's easy to configure. Do any of you see a particular benefit of #define vs. variables? I have a "constants.h" file already that I could easily add the #defines to, or I could add variables near the call to the constructor.
There's a principle know as YAGNI (You Ain't Gonna Need It) which says do the simplest thing first, then change it when (if) your requirements expand.
Sounds to me like the thing to do is:
Write a flexible motor class, that can handle any values (within reason), even though there are only 5 different sets of values you currently care about.
Define a component that returns the "right" values for the 5 motors in your system (or that constructs the 5 motors for your system using the "right values")
Initially implement that component to use some hard-coded values out of a header file
Retain the option to replace that component in future with an implementation of the same API, but that reads values out of a resource file, text file, XML file, GUI interaction with the user, off the internet, by making queries to the hardware to find out what motors it thinks it has, whatever.
I say this on the basis that you minimize expected effort by putting in a point of customizability where you suspect you'll want one (to prevent a lot of work when you change it later), but implement using the simplest thing that satisfies your current certain requirements.
Some people might say that it's not actually worth doing the typing (a) to define the component, better just to construct 5 motors in main() (b) to use constants from a header file, better just to type numeric literals in main(). The (b) people are widely despised as peddlers of "magic constants" (which doesn't mean they're necessarily wrong about relative total programming time by implementer and future maintainers, they just probably are) and the (a) people divide opinion. I tend to figure that defining this kind of thing takes a few minutes, so I don't really care whether it's worth it or not. Loading the values out of a file involves specifying a file format that I might regret as soon as I encounter a real reason to customize, so personally I can't be bothered with that until the requirement arises.
The general idea is to separate the portions of your code that will change from those that won't. The more likely something is to change, the more you need to make it easy to change.
If you're building a commercial app where hundreds or thousands of users will use many different motors, it might make sense to code up a UI and store the data in a config file.
If this is development code and these parameters are unlikely to change, then stuffing them into #defines in your constants.h file is probably the way to go.
Number 3 is a great option if you don't have security or IP concerns. Anytime you or someone else touches your code, you introduce the possibility of regressions. By keeping your parameters in a text file, not only are you making life easier on yourself, you're also reducing the scope of possible errors down the road.
I know my gut reaction to global variables is "badd!" but in the two game development courses I've taken at my college globals were used extensively, and now in the DirectX 9 game programming tutorial I am using (www.directxtutorial.com) I'm being told globals are okay in game programming ...? The site also recommends using only structs if you can when doing game programming to help keep things simple.
I'm really confused on this issue, and all the research I've been trying to do is very confusing. I realize there are issues when using global variables (threading issues, they make code harder to maintain, the state of them is hard to track etc) but also there is a cost associated with not using globals, I'd have to pass a loooot of information around very often which can be confusing and I imagine time-costing, although I guess pointers would speed the process up (this is my first time writing a game in C++.) Anyway, I realize there is probably no "right" or "wrong" answer here since both ways work, but I want my code to be as proper as I can so any input would be good, thank you very much!
The trouble with games and globals is that games (nowadays) are threaded at engine level. Game developers using an engine use the engine's abstractions rather than directly programming concurrency (IIRC). In many of the highlevel languages such as C++, threads sharing state is complex. When many concurrent processes share a common resource they have to make sure they don't tread on eachother's toes.
To get around this, you use concurrency control such as mutex and various locks. This in effect makes asynchronous critical sections of code access shared state in a synchronous manner for writing. The topic of concurrency control is too much to explain fully here.
Suffice to say, if threads run with global variables, it makes debugging very hard, as concurrency bugs are a nightmare (think, "which thread wrote that? Who holds that lock?").
There are exceptions in games programming API such as OpenGL and DX. If your shared data/globals are pointers to DX or OpenGL graphics contexts then typically this maps down to GPU operations which don't suffer so much from the same trouble.
Just be careful. Keeping objects representing 'player' or 'zombie' or whatever, and sharing them between threads can be tricky. Spawn 'player' threads and 'zombie group' threads instead and have a robust concurrency abstraction between them based on message passing rather than accessing those object's state across the thread/critical section boundary.
Saying all that, I do agree with the "Say no to globals" point made below.
For more on the complexities of threads and shared state see:
1 POSIX Threads API - I know it is POSIX, but provides a good idea that translates to other API
2 Wikipedia's excellent range of articles on concurrency control mechanisms
3 The Dining Philosopher's problem (and many others)
4 ThreadMentor tutorials and articles on threading
5 Another Intel article, but more of a marketing thing.
6 An ACM article on building multi-threaded game engines
Have worked on AAA game titles, I can tell you that globals should be eradicated immediately before they spread like a cancer. I've seen them corrupt an I/O subsystem so completely that it had to be wholly thrown out to be rewritten.
Say no to globals. Always.
In this respect, there's no difference between games and other programs. While arguably OK in small examples given in elementary courses, global variables are strongly discouraged in real programs.
So if you want to write correct, readable and maintainable code, stay away from global variables as much as possible.
All answers until now deal with the globals/threads issue, but I will add my 2 cents to the struct/class (understood as all public attributes/private attributes + methods) discussion. Preferring structs over classes on the grounds of that being simpler is in the same line of thought of preferring assembler over C++ on the grounds of that being a simpler language.
The fact that you have to think on how your entities are going to be used and provide methods for it makes the concrete entity a little more complex, but greatly simplifies the rest of the code and maintainability. The whole point of encapsulation is that it simplifies the program by providing clear ways in that your data can be modified while maintaining your objects invariants. You control the entry points and what can happen there. Having all attributes public imply that any part of the code can have a small innocent error (forgot to check condition X) and break your invariants completely (health below 0, but no 'death' processing being triggered)
The other common discussion is performance: If I just need to update a datum, then having to call a method will impact my performance. Not really. If methods are simple and you provide them in the header as inlines (inside the class body or outside with the inline keyword), the compiler will be able to copy those instructions to each use place. You get the guarantee that the compiler will not leave out any check by mistake, and no impact in performance.
Having read a bit more what you posted though:
I'm being told globals are okay in
game programming ...? The site also
recommends using only structs if you
can when doing game programming to
help keep things simple.
Games code is no different from other code really. Gratuitous use of globals is bad regardless. And as for 'only use structs', that is just a load of crap. Approach game development on the same principles as any other software - you may find places where you need to bend this but they should be the exception, typically when dealing with low-level hardware issues.
I would say that the advice about globals and 'keeping things simple' is probably a mixture of being easier to learn and old fashioned thinking about performance. When I was being taught about game programming I remember being told that C++ wasn't advised for games as it would be too slow but I've worked on multiple games using every facet of C++ which proves that isn't true.
I would add to everyone's answers here that globals are to be avoided where possible, I wouldn't be afraid to use whatever you need from C++ to make your code understandable and easy to use. If you come up against a performance problem then profile that specific issue and I'll bet that most of the time you won't need to remove use of a C++ feature but just think about your problem better. There may still be some platforms around that require pure C but I don't really have experience of them, even the Gameboy Advance seemed to deal with C++ quite nicely.
The metaissue here is that of state. How much state does any given function depend on, and how much does it change? Then consider how much state is implicit versus explicit, and cross that with the inspect vs. change.
If you have a function/method/whatever that is called DoStuff(), you have no idea from the outside what it depends on, what it needs, and what's going to happen to the shared state. If this is a class member, you also have no idea how that object's state is going to mutate. This is bad.
Contrast to something like cosf(2), this function is understood not to change any global state, and any state that it requires (lookup tables for example) are hidden from view and have no effect on your program-at-large. This is a function that computes a value based on what you give it and it returns that value. It changes no state.
Class member functions then have the opportunity to step up some problems. There's a huge difference between
myObject.hitpoints -= 4;
myObject.UpdateHealth();
and
myObject.TakeDamage(4);
In the first example, an external operation is changing some state that one of its member functions implicitly depends upon. The fact is that these two lines can be separated by many other lines of code begins to make it non-obvious what's going to happen in the UpdateHealth call, even if outside of the subtraction it is the same as the TakeDamage call. Encapsulating the state changes (in the second example) implies that the specifics of the state changes aren't important to the outside world, and hopefully they're not. In the first example, the state changes are explicitly important to the outside world, and this is really no different than setting some globals and calling a function that uses those globals. E.g. hopefully you'd never see
extern float value_to_sqrt;
value_to_sqrt = 2.0f;
sqrt(); // reads the global value_to_sqrt
extern float sqrt_value; // has the results of the sqrt.
And yet, how many people do exactly this sort of thing in other contexts? (Considering especially that class instance state is "global" in regards to that particular instance.)
So- prefer giving explicit instruction to your function calls, and prefer that they return the results directly rather than having to explicitly set state before calling a function and then checking other state after it returns.
The more state dependencies a bit of code has, the harder it'll be to make it multithread safe, but that has already been covered above. The point I want to make is that the problem isn't so much globals but more the visibility of the collection of state that is required for a bit of code to operate (and subsequently how much other code also depends on that state).
Most games aren't multi-threaded, although newer titles are going down that route, and so they've managed to get away with it so far.
Saying that globals are okay in games is like not bothering to fix the brakes on your car because you only drive at 10mph!
It's bad practice which ever way you look at it.
You only have to look at the number of bugs in games to see examples of this.
If in class A you need to access data D, instead of setting D global, you'd better put into A a reference to D.
Globals are NOT intrinsically bad. In C for instance they are part of the language's normal use... and since C++ builds on C they still have a place.
On an aesthetic level, it's better to avoid them where you can sensibly make them part of a class, but if all you do is wrap a bunch of globals into a singleton, you made things worse because at least with globals it's obvious what the point is.
Be careful, but for some things it makes less sense to force OO concepts on what is actually a global value.
Two specific issues that I've encountered in the past:
First: If you're attempting to separate e.g. render phase (const access to most game state) from logic phase (non-const access to most game state) for whatever reason (decoupling render rate from logic rate, synchronizing game state across a network, recording and playback of gameplay at a fixed point in the frame, etc), globals make it very hard to enforce that.
Problems tend to creep in and become hard to debug and eradicate. This also has implications for threaded renderers separate from game logic, or the like (the other answers cover this topic thoroughly).
Second: The presence of many globals tends to bloat the literal pool, which the compiler typically places after each function.
If you get to your state through either a single "struct GlobalTable" which holds globals or a collection of methods on an object or the like, your literal pool tends to be a lot smaller, decreasing the size of the .text section in your executable.
This is mostly a concern for instruction set architectures that can't embed load targets directly into instructions (see e.g. fixed-width ARM or Thumb version 1 instruction encoding on ARM processors). Even on modern processors I'd wager you'll get slightly smaller codegen.
It also hurts doubly when your instruction and data caches are separate (again, as on some ARM processors); you'll tend to get two cachelines loaded where you might only need one with a unified cache. Since the literal pool may count as data, and won't always start on a cacheline boundary, the same cacheline might have to be loaded both into the i-cache and d-cache.
This may count as a "microoptimization", but it's a transform that's relatively easily applied to an entire codebase (e.g. extern struct GlobalTable { /* ... */ } the; and then replace all g_foo with the.foo)... And we've seen code size decrease between 5 and 10 percent in some cases, with a corresponding boost to performance due to keeping the caches cleaner.
I'm going to take a risk and say it depends to me on the scope/scale of your project. Because if you are trying to program an epic game with a boatload of code, then globals could, and easily in the most painful way in hindsight, cost you way more time than they save.
But if you are trying to code, say, something as simple as Super Mario or Metroid or Contra or even simpler, like Pac-Man, then these games were originally coded in 6502 assembly and used globals for almost everything. Just about all the game data and state was stored in data segments, and that didn't stop the devs from shipping a very competent and popular product in spite of working with absolutely inferior tools and engineering standards which would probably horrify people today.
So if you are just writing this kind of small and simple game which has a very limited scope and isn't designed to grow and expand far beyond its original design, isn't designed to be maintained for years and years, with a few thousand lines of simple C++ code, then I don't see the big deal of using a global here or a singleton there. Someone obsessed with trying to engineer Super Mario with the soundest engineering techniques with SOLID and a DI framework could end up taking far, far longer to ship than even the devs who wrote it in 6502 asm.
And I'm getting old and there's something to it there when I look at these old simple games and how they were coded, and it almost seems like the devs were doing something right in spite of the hard-coded magic numbers and globals all over the place while I spend my career fumbling around and trying to figure out the best way to engineer things. That said this is probably a very unpopular opinion, and not one I would have liked either a decade or two ago, but there's something to it. I don't look at the 6502 asm of Metroid and think, "these devs underengineered their product and their lives would have been so much easier if they did this or that." Seems like they did things just about right.
But again this is for small-scale stuff, maybe in the indie category of games by today's standards, and in the smaller of the indie games among them, and far from doing anything ground-breaking in terms of how much data it can process or using cutting-edge hardware techniques. If in doubt, I'd definitely suggest to err on the side of avoiding globals. It's also a little bit trickier in C++ as opposed to say, C, since you can have objects with constructor and destructors, and initialization and destruction order isn't well-defined and easily predictable for global objects. There I'd say to lean even more on the side of avoiding globals since they can trip you up in whole new ways when you aren't explicitly initializing and destroying them yourself in a predictable order. And naturally if you want to multithread a lot beyond a critical loop here and there, then your ability to reason about thread-safety of any particular code will be severely diminished if you cannot minimize the scope/visibility of your game state to the minimum of places.
When I start writing code from scratch, I have a bad habit of quickly writing everything in one function, the whole time thinking "I'll make it more modular later". Then when later comes along, I have a working product and any attempts to fix it would mean creating functions and having to figure out what I need to pass.
It gets worst because it becomes extremely difficult to redesign classes when your project is almost done. For example, I usually do some planning before I start writing code, then when my project is done, I realized I could have made the classes more modular and/or I could have used inheritance. Basically, I don't think I do enough planning and I don't get more than one-level of abstraction.
So in the end, I'm stuck with a program with a large main function, one class and a few helper functions. Needless to say, it is not very reusable.
Has anybody had the same problem and have any tips to overcome this? One thing I had in mind was to write the main function with pseduocode (without much detail but enough to see what objects and functions they need). Essentially a top-down approach.
Is this a good idea? Any other suggestions?
"First we make our habits, then they make us."
This seems to apply for both good and bad habits. Sounds like a bad one has taken hold of you.
Practice being more modular up front until it's "just the way I do things."
Yes, the solution is easy, although it takes time to get used to it.
Never claim there will be a "later", where you sit down and just do refactoring. Instead, continue adding functionality to your code (or tests) and during this phase perform small, incremental refactorings. The "later" will basically be "always", but hidden in the phase where you are actually doing something new every time.
I find the TDD Red-Green-Refactor discipline works wonders.
My rule of thumb is that anything longer than 20 LoC should be clean. IME every project stands on a few "just-a-proof-of-concept"s that were never intended to end up in production code. Since this seems inevitable though, even 20 lines of proof-of-concept code should be clear, because they might end up being one of the foundations of a big project.
My approach is top-down. I write
while( obj = get_next_obj(data) ) {
wibble(obj);
fumble(obj);
process( filter(obj) );
}
and only start to write all these functions later. (Usually they are inline and go into the unnamed namespace. Sometimes they turn out to be one-liners and then I might eliminate them later.)
This way I also avoid to have to comment the algorithms: The function names are explanation enough.
You pretty much identified the issue. Not having enough planning.
Spend some time analyzing the solution you're going to develop, break it down into pieces of functionality, identify how it would be best to implement them and try to separate the layers of the application (UI, business logic, data access layer, etc).
Think in terms of OOP and refactor as early as it makes sense. It's a lot cheaper than doing it after everything is built.
Write the main function minimally, with almost nothing in it. In most gui programs, sdl games programs, open gl, or anything with any kind of user interface at all, the main function should be nothing more than an event eating loop. It has to be, or there will always be long stretches of time where the computer seems unresponsive, and the operating system thinks considers maybe shutting it down because it's not responding to messages.
Once you get your main loop, quickly lock that down, only to be modified for bug fixes, not new functionality. This may just end up displacing the problem to another function, but having a monilithic function is rather difficult to do in an event based application anyway. You'll always need a million little event handlers.
Maybe you have a monolithic class. I've done that. Mainly the way to deal with it is to try and keep a mental or physical map of dependencies, and note where there's ... let's say, perforations, fissures where a group of functions doesn't explicitly depend on any shared state or variables with other functions in the class. There you can spin that cluster of functions off into a new class. If it's really a huge class, and really tangled up, I'd call that a code smell. Think about redesigning such a thing to be less huge and interdependant.
Another thing you can do is as you're coding, note that when a function grows to a size where it no longer fits on a single screen, it's probably too big, and at that point start thinking about how to break it down into multiple smaller functions.
Refactoring is a lot less scary if you have good tools to do it. I see you tagged your question as "C++" but the same goes for any language. Get an IDE where extracting and renaming methods, extracting variables, etc. is easy to do, and then learn how to use that IDE effectively. Then the "small, incremental refactorings" that Stefano Borini mentions will be less daunting.
Your approach isn't necessarily bad -- earlier more modular design might end up as over-engineering.
You do need to refactor -- this is a fact of life. The question is when? Too late, and the refactoring is too big a task and too risk-prone. Too early, and it might be over-engineering. And, as time goes on, you will need to refactor again .. and again. This is just part of the natural life-cycle of software.
The trick is to refactor soon, but not too soon. And frequently, but not too frequently. How soon and how frequently? That's why it's a art and not a science :)