Changing global variable names - c++

I'm working on a huge code base written many years ago. We're trying to implement multi-threading, and I'm in charge of cleaning up global variables (sigh!)
My strategy is to move all global variables into a class. Individual threads will then use their own instances of that class, and the globals will be accessed through a class instance and the -> operator.
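The sketch below shows that strategy on a small scale. It assumes two hypothetical former globals (request_count and log_prefix) and a hypothetical function handle_request(). In the real code base, each thread would be handed its own GlobalState instance.

```cpp
#include <string>

// Hypothetical former globals gathered into one class. Each thread would own
// (or be handed) its own instance instead of touching file-scope variables.
struct GlobalState {
    int request_count = 0;            // was: int request_count;
    std::string log_prefix = "main";  // was: std::string log_prefix;
};

// Before: void handle_request() { ++request_count; }
// After: the state is passed in and reached through the -> operator.
void handle_request(GlobalState* gs) {
    ++gs->request_count;
}

// Reads only the state of the instance it is given.
std::string describe(const GlobalState* gs) {
    return gs->log_prefix + ": " + std::to_string(gs->request_count);
}
```

Because no instance is shared, two threads working on separate GlobalState objects need no locking. Locking only becomes an issue for state that really has to be shared.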
As a first step, I've compiled a list of global variables using nm, by picking out the object names in the B and D groups. The list is not complete, and for static variables I don't get file and line number information.
The second stage is even messier: I have to replace every global in the code base with the classinstance->global_name pattern. I'm using cscope's "Change this text string" for this. The problem is that some globals have names that are also used for local variables inside functions, and cscope replaces those as well.
Is there another way to go about this? Any strategies or help would be appreciated!

Just some suggestions, from my experience:
Use Eclipse: the C++ indexer is very good, and when dealing with a large project I find it very useful for tracking variables. Shift+Ctrl+G (I have forgotten how to access it from the menus!) lets you search all the references, and Ctrl+Alt+H (Open Call Hierarchy) shows the caller/callee trees...
Use Eclipse: it has good refactoring tools that can rename a variable without touching same-name, different-scope variables. (They often fail when templates are involved, but I find them good, better than their Visual Studio 2008 counterpart.)
Use Eclipse: I know, it takes some time to get started with it, but once you get the hang of it, it's very powerful. It easily handles an existing makefile-based project (File -> New -> Project -> Makefile Project with Existing Code).
I would consider using accessors rather than direct access to class members: it's possible that some of the globals will be shared among threads and will need some locking in order to be used properly. So I would prefer classinstance->get_global_name() over classinstance->global_name.
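Here is one possible shape for such an accessor. It assumes a hypothetical shared setting called config_name, guarded by one std::mutex per instance. The class and member names are illustrative only.

```cpp
#include <mutex>
#include <string>

// Sketch of the accessor approach for a former global that may be shared
// between threads. SharedState and config_name are hypothetical names.
class SharedState {
public:
    std::string get_config_name() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return config_name_;  // a copy is made while the lock is held
    }
    void set_config_name(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        config_name_ = name;
    }

private:
    mutable std::mutex mutex_;  // mutable so the const getter can lock it
    std::string config_name_ = "default";
};
```

The getter returns a copy rather than a reference, so callers never read the string while another thread is writing it. Call sites look the same (classinstance->get_config_name()) whether or not a particular accessor ends up needing a lock.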
As a final note, I don't know whether running the Eclipse indexer from the command line would help with your task; you can find some examples by googling for it.
This question/answer can give you some more hints: any C/C++ refactoring tool based on libclang? (even simplest "toy example"). In particular, I quote: "...C++ is a bitch of a language to transform".

Halfway there: if a function uses a local name that hides the global name, the object file won't have an undefined symbol. nm can show you those undefined symbols, and then you know in which files you must replace at least some instances of that name.
However, you still have a problem in the rare case where a file uses the global name in one function and hides it in another. I'm not sure whether this can be resolved with -ffunction-sections, but I think so. With that flag, each function goes into its own section. nm itself only lists the undefined symbols, but the relocation records are grouped per section (objdump -r prints them that way), so the undefined symbols used in foo() will show up under section .text.foo.
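To make the rare case concrete, here is a hypothetical translation unit in which foo() uses the global verbosity while bar() hides it with a local of the same name. In the real code base, verbosity would be defined elsewhere and only declared extern in this file; the sketch defines it so that it builds on its own. When compiled with g++ -c -ffunction-sections, each function gets its own section. Note that C++ section names use the mangled function name, for example .text._Z3foov rather than .text.foo.

```cpp
// Hypothetical file that both uses and hides the global `verbosity`.
// In the real code base this would be `extern int verbosity;`.
int verbosity = 1;

int foo() {
    return verbosity * 10;  // the global: this use must become gs->verbosity
}

int bar() {
    int verbosity = 5;      // a local hiding the global: must NOT be renamed
    return verbosity * 10;
}
```

A blind text substitution would rewrite both functions. A symbol-aware approach only touches the reference in foo().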

Related

Why am I getting some of my python functions, when I import my module, but not others?

I'm writing function libraries in Python 2.7.8, to use in some UAT testing using froglogic Squish. It's for my employer, so I'm not sure how much I can share and still conform to company privacy regulations.
Early in the development, I put some functions in some very small files. There was one file that contained only a single function. I could import the file and use the function with no problem.
I am at a point where I want to consolidate some of those tiny files into a larger file. For some reason that completely eludes me, some of the functions that I copy/pasted into this larger file are not being found. For example, a "NameError: global name 'My_variableStringVerify' is not defined" error is displayed. (I just added the "My_" in case there was a name collision with some other function...)
This worked with the EXACT same simple function in a separate 'module'. Other functions in this python file -- appearing both before and after this function in the new, expanded module -- are being found and used without problems. The only module this function needs is re. I am importing that. I deleted all the pyc files in the directory, in case that was not getting updated (I'm pretty sure it was, from the datetime on the pyc file).
I have created and used dozens of functions in a dozen of my 'library modules', all with no issues. What's so special about this trivial piece of crap function as part of a different module? It worked before, and it STILL works, as long as I do not try to use it from the new library module.
I'm not a Python guru, but I have been doing this kind of thing for years...
Ugh. What a fool. The answer was in the error, after all: "global name xxx is not defined". I was trying to use the function directly inside a Squish API call, which is the global scope. When I moved the call to my function outside of the Squish API call (using it in the local scope), it worked fine.
The detail that surprised me: I was using "from foo import *", in both cases (before and after adding it to another 'library' module of mine).
When this one function was THE ONLY function in foo, I was able to use it in the global scope successfully.
When it was just one of many functions in foo-extended (names have been changed, to protect the innocent), I could NOT use it in the global scope. I had to reference it in the local scope.
After spending more time reading https://docs.python.org/2.0/ref/import.html (yes, it's old), I'm surprised it appeared in the global scope in either case. That page did state that "(The current implementation does not enforce the latter two restrictions, but programs should not abuse this freedom, as future implementations may enforce them or silently change the meaning of the program.)" about scope restrictions with the "from foo import *" statement.
I guess I found an edge case that somehow skirted the restriction in this implementation.
Still... what a maroon! That verifies my statement that I am no Python guru.

How to organize subroutines for use by multiple commands?

I am working on creating a package with two new commands, say foo and bar.
For example, if foo.ado contains:
program define foo
...
rex
end
program define rex
...
end
But my other command, bar.ado, also needs to call rex. Where should I put rex?
I see the following few options:
Create a rex.ado file as well.
Create a rex.do file and include it from within both foo.ado and bar.ado using include "`c(sysdir_plus)'r/rex.do" at the bottom of each file.
Copy the code into both foo.ado and bar.ado, which seems ugly because now the code must be maintained in two places.
What is best practice for organizing subroutines that are needed by both foo and bar?
Also, should the subroutine be called rex, _rex, or something else — maybe _foobar_rex — to indicate it is actually a sub-command that foo and bar depend on to work correctly rather than a separate command intended to stand on its own?
Create a rex.ado file as well
Your question is a bit too broad. Personally, I would go with the first option to be safe, although it really depends on the structure of your project. Sometimes including rex in a single ado file may be enough; this will be the case, for example, if foo is a wrapper command. However, for most other use cases, including two commands sharing a common program, I strongly believe that you will need a separate ado file.
The second option is unnecessary, since the first achieves the same thing without having to load the program every single time you call it. The third option is probably the worst in a programming context, as it may create conflicts and will be difficult to maintain down the road.
With regards to naming conventions, I would recommend using something like _rex only if you include the program as a subroutine in an ado file. Otherwise, rex will do just fine and will also indicate that the program has a wider scope within your project. It is also better, in my opinion, to provide a more elaborate explanation about the intended use of rex using a comment at the start of the ado file, rather than trying to incorporate this in the name.

C++: Can function pointers be traced back to the original function before compilation without looking at the function name?

I want to set up a server on which students can upload and run code for a course. However, I don't want them to access various functions, like system(), which could allow bad access to my server. I can search the pre-processor output for an explicit function call, but if the user makes a function pointer like this:
int (*syst)(const char*) = system;
syst("rm *");
I'm still open to the threat. However, I can't just search for the string "system", for example, since it's otherwise a valid name - if the student didn't include cstdlib, they could use that name as a variable name. Since this is a beginning programming course, having a blacklist of variable names ten miles long is a bad idea.
Is there a way to define the functions other than by name and allow me to search for that other designation before compiling their code?
By far the easiest solution is to compile the code - that's pretty harmless - and then look at the actual library imports. Users may have defined their own system, but that wouldn't cause system to be imported from glibc.
Showing imported symbols
The main reason you can't look at the raw source code is because #define allows malicious users to hide the blacklisted symbol names. But there are plenty of other possibilities to do that, including
auto hidden = &sys\
tem;
So you need some processing of the source, and it's probably easiest just to fully process the whole source.
I would also suggest running this inside a chroot as a non-privileged user. It's lighter weight than a VM.
Alas, it's not (easily) possible to get a function's name from a pointer.
How to get function's name from function's pointer in C? That question is from a C perspective, but it's the same problem, essentially.

Tools to refactor names of types, functions and variables?

struct Bar {}
struct Foo {
    Bar get() { return Bar(); }
}
void main() {
    auto f = Foo();
    f.get();
}
For example, you decide that get was a very poor choice of name, but you have already used it in many different files, and manually changing every occurrence is very annoying.
You also can't really make a global substitution because other types may also have a method called get.
Is there anything for D to help refactor names for types, functions, variables etc?
Here's how I do it:
1. Change the name in the definition.
2. Recompile.
3. Go to the first error line reported and replace old with new.
4. Go to step 2.
That's semi-manual, but I find it to be pretty easy and it goes quickly because the compiler error message will bring you right to where you need to be, and most editors can read those error messages well enough to dump you on the correct line, then it is a simple matter of telling it to repeat the last replacement again. (In my vim setup with my hotkeys, I hit F4 for next error message, then dot for repeat last change until it is done. Even a function with a hundred uses can be changed reliably* in a couple minutes.)
You could probably write a script that handles 90% of cases automatically too by just looking for ": Error: " in the compiler's output, extracting the file/line number, and running a plain text replace there. If the word shows up only once and outside a string literal, you can automatically replace it, and if not, ask the user to handle the remaining 10% of cases manually.
But I think it is easy enough to do with my editor hotkeys that I've never bothered trying to script it.
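For anyone who does want to script it, here is a rough C++ sketch of the two pieces described above. It assumes dmd-style diagnostics of the form file.d(line): Error: ... or file.d(line,col): Error: .... It only skips double-quoted string literals, not character literals or D's other string forms such as backquoted or q{} strings. The names parse_error and replace_if_unique are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

struct ErrorLocation {
    std::string file;
    int line;
};

// Extracts file and line from an assumed "file.d(12): Error: ..." or
// "file.d(12,5): Error: ..." message. Returns false for other lines.
bool parse_error(const std::string& msg, ErrorLocation& out) {
    static const std::regex re(R"(^(.+?)\((\d+)(?:,\d+)?\): Error: )");
    std::smatch m;
    if (!std::regex_search(msg, m, re)) return false;
    out.file = m[1].str();
    out.line = std::stoi(m[2].str());
    return true;
}

// Replaces `old_name` in `text` only if it occurs exactly once as a whole
// word outside a double-quoted string; otherwise leaves `text` unchanged and
// returns false so a human can handle that line.
bool replace_if_unique(std::string& text, const std::string& old_name,
                       const std::string& new_name) {
    auto is_ident = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    std::size_t found = std::string::npos;
    int hits = 0;
    bool in_string = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (in_string) {
            if (c == '\\') ++i;              // skip the escaped character
            else if (c == '"') in_string = false;
            continue;
        }
        if (c == '"') { in_string = true; continue; }
        std::size_t end = i + old_name.size();
        if (text.compare(i, old_name.size(), old_name) == 0 &&
            (i == 0 || !is_ident(text[i - 1])) &&
            (end >= text.size() || !is_ident(text[end]))) {
            found = i;
            ++hits;
            i = end - 1;                     // continue after this match
        }
    }
    if (hits != 1) return false;
    text.replace(found, old_name.size(), new_name);
    return true;
}
```

A driver would run the compiler, call parse_error on each output line, apply replace_if_unique to the reported line of the reported file, and collect every line where it returns false for manual editing.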
The one case this doesn't catch is if there's another function with the same name that might still compile. That should never happen if you do this change in isolation, because an ambiguous name wouldn't compile without it.
In that case, you could probably do a three-step compiler-assisted change:
1. Make sure your code compiles first. Then add @disable to the thing you want to rename.
2. Compile. Everywhere the compiler complains that the symbol cannot be used because it is disabled, do the find/replace.
3. Remove @disable and rename the definition. Recompile to make sure there's nothing you missed, like child classes (the compiler will then complain "method foo does not override any function", so they stand right out too).
So yeah, it isn't fully automated, but just changing it and having the compiler errors help find what's left is good enough for me.
Some limited refactoring support can be found in major IDE plugins like Mono-D or VisualD. I remember that Brian Schott had plans to add similar functionality to his dfix tool by adding a dependency on dsymbol, but it doesn't seem to be implemented yet.
Note, however, that all such options have very limited robustness right now. This is because figuring out the fully qualified name of any given symbol is a very complex task in D, one that requires full semantic analysis to be done 100% correctly. Think about local imports, templates, function overloading, mixins, and how they all affect identifying the symbol.
In the long run, we will most likely need to wait until the reference D compiler frontend becomes available as a library before such a refactoring tool can be implemented in a clean and truly reliable way.
A good "find all" feature can be better than a bad refactoring tool. As mentioned previously, reliable refactoring requires semantic analysis.
Personally I have a find all feature in Coedit which displays the context of a match and works on all the project sources.
It's fast to process the results.

Do you really need a main() in C++?

From what I can tell you can kick off all the action in a constructor when you create a global object. So do you really need a main() function in C++ or is it just legacy?
I can understand that it could be considered bad practice to do so. I'm just asking out of curiosity.
If you want to run your program on a hosted C++ implementation, you need a main function. That's just how things are defined. You can leave it empty if you want of course. On the technical side of things, the linker wants to resolve the main symbol that's used in the runtime library (which has no clue of your special intentions to omit it - it just still emits a call to it). If the Standard specified that main is optional, then of course implementations could come up with solutions, but that would need to happen in a parallel universe.
If you go with "execution starts in the constructor of my global object", beware that you set yourself up for many problems related to the order of construction of namespace-scope objects defined in different translation units. (So what is the entry point? The answer is: you will have multiple entry points, and which one is executed first is unspecified!) In C++03 you aren't even guaranteed that cout is properly constructed. In C++0x you do have a guarantee that it is constructed before any code tries to use it, as long as there is a preceding include of <iostream>.
You don't have those problems, and don't need to work around them (which can be very tricky), if you properly start executing things in ::main.
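When globals really do depend on each other, one common workaround is the construct-on-first-use idiom. You wrap the object in a function with a local static, so the object is built the first time anyone asks for it. Here is a minimal sketch; registry and AutoRegister are hypothetical names.

```cpp
#include <string>
#include <vector>

// Construct on first use: a function-local static is initialized the first
// time control passes through its declaration (thread-safe since C++11), so
// any global constructor that calls registry() gets a fully built object.
std::vector<std::string>& registry() {
    static std::vector<std::string> instance;
    return instance;
}

// A global whose constructor depends on another "global". This is safe
// because it goes through registry() instead of naming a namespace-scope
// vector directly, whose construction order relative to this object would
// be unspecified if it lived in another translation unit.
struct AutoRegister {
    explicit AutoRegister(const char* name) { registry().push_back(name); }
};

static AutoRegister register_alpha("alpha");
static AutoRegister register_beta("beta");
```

This only fixes construction order. Destruction order at program exit is a separate concern, and neither object is a substitute for starting the real work in main.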
As mentioned in the comments, however, there are several systems that hide main from the user by having the user specify the name of a class, which is then instantiated within main. This works similarly to the following example:
#include <string>
#include <vector>
class MyApp {
public:
    MyApp(std::vector<std::string> const& argv) : args(argv) {}
    int run() {
        /* code comes here */
        return 0;
    }
private:
    std::vector<std::string> args;
};
IMPLEMENT_APP(MyApp);
To the user of this system, it's completely hidden that there is a main function, but that macro would actually define such a main function as follows
#define IMPLEMENT_APP(AppClass) \
int main(int argc, char **argv) { \
AppClass m(std::vector<std::string>(argv, argv + argc)); \
return m.run(); \
}
This doesn't have the problem of unspecified construction order mentioned above. Another benefit of such systems is that they can work with different forms of higher-level entry points. For example, Windows GUI programs start up in a WinMain function, so IMPLEMENT_APP could define such a function instead on that platform.
Yes! You can do away with main.
Disclaimer: You asked if it were possible, not if it should be done. This is a totally unsupported, bad idea. I've done this myself, for reasons that I won't get into, but I am not recommending it. My purpose wasn't getting rid of main, but it can do that as well.
The basic steps are as follows:
Find crt0.c in your compiler's CRT source directory.
Add crt0.c to your project (a copy, not the original).
Find and remove the call to main from crt0.c.
Getting it to compile and link can be difficult; how difficult depends on which compiler and which compiler version you use.
Added
I just did it with Visual Studio 2008, so here are the exact steps you have to take to get it to work with that compiler.
Create a new C++ Win32 Console Application (click next and check Empty Project).
Add new item.. C++ File, but name it crt0.c (not .cpp).
Copy contents of C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\crt\src\crt0.c and paste into crt0.c.
Find mainret = _tmain(__argc, _targv, _tenviron); and comment it out.
Right-click on crt0.c and select Properties.
Set C/C++ -> General -> Additional Include Directories = "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\crt\src".
Set C/C++ -> Preprocessor -> Preprocessor Definitions = _CRTBLD.
Click OK.
Right-click on the project name and select Properties.
Set C/C++ -> Code Generation -> Runtime Library = Multi-threaded Debug (/MTd) (*).
Click OK.
Add new item.. C++ File, name it whatever (app.cpp for this example).
Paste the code below into app.cpp and run it.
(*) You can't use the runtime DLL, you have to statically link to the runtime library.
#include <iostream>
class App
{
public: App()
{
std::cout << "Hello, World! I have no main!" << std::endl;
}
};
static App theApp;
Added
I removed the superfluous exit call and the blurb about lifetime, as I think we're all capable of understanding the consequences of removing main.
Ultra Necro
I just came across this answer and read both it and John Dibling's objections below. It was apparent that I didn't explain what the above procedure does and why that does indeed remove main from the program entirely.
John asserts that "there is always a main" in the CRT. Those words are not strictly correct, but the spirit of the statement is. Main is not a function provided by the CRT; you must add it yourself. The call to that function is in the entry point function provided by the CRT.
The entry point of every C/C++ program is a function in a module named 'crt0'. I'm not sure if this is a convention or part of the language specification, but every C/C++ compiler I've come across (which is a lot) uses it. This function basically does three things:
Initialize the CRT
Call main
Tear down
In the example above, the call is _tmain but that is some macro magic to allow for the various forms that 'main' can have, some of which are VS specific in this case.
What the above procedure does is it removes the module 'crt0' from the CRT and replaces it with a new one. This is why you can't use the Runtime DLL, there is already a function in that DLL with the same entry point name as the one we are adding (2). When you statically link, the CRT is a collection of .lib files, and the linker allows you to override .lib modules entirely. In this case a module with only one function.
Our new program contains the stock CRT, minus its CRT0 module, but with a CRT0 module of our own creation. In there we remove the call to main. So there is no main anywhere!
(2) You might think you could use the runtime DLL by renaming the entry point function in your crt0.c file, and changing the entry point in the linker settings. However, the compiler is unaware of the entry point change and the DLL contains an external reference to a 'main' function which you're not providing, so it would not compile.
Generally speaking, an application needs an entry point, and main is that entry point. The fact that initialization of globals might happen before main is pretty much irrelevant. If you're writing a console or GUI app you have to have a main for it to link, and it's only good practice to have that routine be responsible for the main execution of the app rather than use other features for bizarre unintended purposes.
Well, from the perspective of the C++ standard, yes, it's still required. But I suspect your question is of a different nature than that.
I think doing it the way you're thinking about would cause too many problems though.
For example, in many environments the return value from main is given as the status result from running the program as a whole. And that would be really hard to replicate from a constructor. Some bit of code could still call exit of course, but that seems like using a goto and would skip destruction of anything on the stack. You could try to fix things up by having a special exception you threw instead in order to generate an exit code other than 0.
But then you still run into the problem of the order of execution of global constructors not being defined. That means that in any particular constructor for a global object you won't be able to make any assumptions about whether or not any other global object yet exists.
You could try to solve the constructor order problem by just saying each constructor gets its own thread, and if you want to access any other global objects you have to wait on a condition variable until they say they're constructed. That's just asking for deadlocks though, and those deadlocks would be really hard to debug. You'd also have the issue of which thread exiting with the special 'return value from the program' exception would constitute the real return value of the program as a whole.
I think those two issues are killers if you want to get rid of main.
And I can't think of a language that doesn't have some basic equivalent to main. In Java, for example, there is an externally supplied class name whose static main function is called. In Python, there's the __main__ module. In Perl, there's the script you specify on the command line.
If you have more than one global object being constructed, there is no guarantee as to which constructor will run first.
If you are building static or dynamic library code then you don't need to define main yourself, but you will still wind up running in some program that has it.
If you are coding for windows, do not do this.
Running your app entirely from within the constructor of a global object may work just fine for quite a while, but sooner or later you will call the wrong function and end up with a program that terminates without warning.
Global object constructors run during the startup of the C runtime.
The C runtime startup code runs during the DllMain of the C runtime DLL.
During DLLMain, you are holding the DLL loader lock.
Trying to load another DLL while already holding the DLL loader lock results in a swift death for your process.
Compiling your entire app into a single executable won't save you - many Win32 calls have the potential to quietly load system DLLs.
There are implementations where global objects are not possible, or where non-trivial constructors are not possible for such objects (especially in the mobile and embedded realms).