So basically I am trying to set an environment variable in my C++ code to force the number of threads to be 1. I am using multiple machine learning libraries which by default use OpenMP and can be forced to operate in single thread mode by setting the following environment variable: OMP_NUM_THREADS=1
Here's my issue. I want to set this environment variable from within a library I am building.
When I set the environment variable from my main function (executable linking against the library I am building) then it works as expected (only 1 thread is used during the execution of the program):
auto implPtr = FRVT_11::Interface::getImplementation();
implPtr->initialize(CONFIG_DIR);
char ompEnv[]="OMP_NUM_THREADS=1";
putenv( ompEnv );
// More code
However, if I try to set the environment variable from within the library I am building (for example from within the getImplementation function), then the number of threads used is 4 instead of 1:
// This does not work
std::shared_ptr<FRVT_11::Interface> FRVT_11::Interface::getImplementation() {
char ompEnv[]="OMP_NUM_THREADS=1";
putenv( ompEnv );
return std::make_shared<MyImplementation>();
}
Any ideas why this would be the case? I am building and shipping the library so I need the number of threads to be set from within the library.
Your "library function" version is undefined behavior.
Your "main function" version is also likely undefined behavior, as an extra bonus, but you aren't aware of it, yet.
From the Linux version of the putenv(3) manual page (other OS implementations are likely the same):
The string pointed to by [the parameter to putenv()] becomes part of
the environment, so altering the string changes the environment.
That's your big, honking, alarm bell: you better not even think of touching this string, ever again, even with a 10-foot pole, because it is now a part of the environment.
In your shared library version:
char ompEnv[]="OMP_NUM_THREADS=1";
This array is a local variable in the function. Therefore, this array gets destroyed when this function returns. But this array is also passed as a parameter to putenv(). Grand total: as soon as this function returns, one of your environment variables is now a dangling pointer.
There is no sufficient information to conclusively prove that your "main function" version is also undefined behavior, but it's highly likely, too.
Related
I broke my Visual Studio project (2015, C++) into three pieces:
The main application in a static library
The executable which just has a main function
Unit tests which use the static .lib file so that they can import the required classes and do the unit tests on them.
Before I split it into a lib/exe/tests, the main application was just a standalone executable, and it worked perfectly fine.
Now I can't run the executable with only the main function, nor the unit tests as a certain pointer is always null. The only main difference is that I'm using a raw pointer in this example, but I was using a unique_ptr in my code (currently however I switched to the raw pointer just to make sure this following example is as accurate as possible and it didn't magically compile/run properly with the raw pointer).
The code looks very similar to the following (the extra code has been removed):
// console.hpp
extern Console *console;
The implementation cpp file (only of what is needed):
// console.cpp
Console *console = new Console();
Now in some unrelated function, this code fails due to the console pointer being a nullptr:
// some_other_file.cpp
#include "console.hpp"
// Inside some function...
console->doSomething(); // console is NULL
Again, the code I have worked perfectly fine when it was in one project. Regardless, it all compiles fine with no linking errors even though it has been broken into 3 pieces, but now that pointer is always null.
As a last interesting note, the non-pointer variables that are global and extern do work. Is this something limited to pointers/unique_ptrs?
Is this something fixable by a singleton pattern?
The clue is in this comment: "it appears other code is getting called before the main function that don't let the global variable get initialized."
The code referencing console is probably running as part of the initialization of another global and, in this case, is happening before the initialization of console. You have to be very careful to be sure that you're not depending on the order of the global initializers. You were probably getting lucky before you split the program, and now your luck has run out.
The easiest way to fix it is to use a singleton pattern. Instead of having other parts of the program directly reference the pointer, you have them call a function that returns the pointer, and that function will initialize it the first time. This ensures that it's initialized before it's used.
Here's a common pattern for it:
Console *GetConsole() {
static Console *console = new Console();
return console;
}
Now console is no longer a global variable. Everything that wants to access the console calls GetConsole instead.
The function-local static variable will be initialized the first time the function is called, and after that it just returns the value. If you have multiple threads and an older compiler, you have to do more work to protect against a possible race condition. Modern compilers should do the initialization in a thread-safe way.
From what I can tell you can kick off all the action in a constructor when you create a global object. So do you really need a main() function in C++ or is it just legacy?
I can understand that it could be considered bad practice to do so. I'm just asking out of curiosity.
If you want to run your program on a hosted C++ implementation, you need a main function. That's just how things are defined. You can leave it empty if you want of course. On the technical side of things, the linker wants to resolve the main symbol that's used in the runtime library (which has no clue of your special intentions to omit it - it just still emits a call to it). If the Standard specified that main is optional, then of course implementations could come up with solutions, but that would need to happen in a parallel universe.
If you go with the "Execution starts in the constructor of my global object", beware that you set yourself up to many problems related to the order of constructions of namespace scope objects defined in different translation units (So what is the entry point? The answer is: You will have multiple entry points, and what entry point is executed first is unspecified!). In C++03 you aren't even guaranteed that cout is properly constructed (in C++0x you have a guarantee that it is, before any code tries to use it, as long as there is a preceeding include of <iostream>).
You don't have those problems and don't need to work around them (wich can be very tricky) if you properly start executing things in ::main.
As mentioned in the comments, there are however several systems that hide main from the user by having him tell the name of a class which is instantiated within main. This works similar to the following example
class MyApp {
public:
MyApp(std::vector<std::string> const& argv);
int run() {
/* code comes here */
return 0;
};
};
IMPLEMENT_APP(MyApp);
To the user of this system, it's completely hidden that there is a main function, but that macro would actually define such a main function as follows
#define IMPLEMENT_APP(AppClass) \
int main(int argc, char **argv) { \
AppClass m(std::vector<std::string>(argv, argv + argc)); \
return m.run(); \
}
This doesn't have the problem of unspecified order of construction mentioned above. The benefit of them is that they work with different forms of higher level entry points. For example, Windows GUI programs start up in a WinMain function - IMPLEMENT_APP could then define such a function instead on that platform.
Yes! You can do away with main.
Disclaimer: You asked if it were possible, not if it should be done. This is a totally un-supported, bad idea. I've done this myself, for reasons that I won't get into, but I am not recommending it. My purpose wasn't getting rid of main, but it can do that as well.
The basic steps are as follows:
Find crt0.c in your compiler's CRT source directory.
Add crt0.c to your project (a copy, not the original).
Find and remove the call to main from crt0.c.
Getting it to compile and link can be difficult; How difficult depends on which compiler and which compiler version.
Added
I just did it with Visual Studio 2008, so here are the exact steps you have to take to get it to work with that compiler.
Create a new C++ Win32 Console Application (click next and check Empty Project).
Add new item.. C++ File, but name it crt0.c (not .cpp).
Copy contents of C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\crt\src\crt0.c and paste into crt0.c.
Find mainret = _tmain(__argc, _targv, _tenviron); and comment it out.
Right-click on crt0.c and select Properties.
Set C/C++ -> General -> Additional Include Directories = "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\crt\src".
Set C/C++ -> Preprocessor -> Preprocessor Definitions = _CRTBLD.
Click OK.
Right-click on the project name and select Properties.
Set C/C++ -> Code Generation -> Runtime Library = Multi-threaded Debug (/MTd) (*).
Click OK.
Add new item.. C++ File, name it whatever (app.cpp for this example).
Paste the code below into app.cpp and run it.
(*) You can't use the runtime DLL, you have to statically link to the runtime library.
#include <iostream>
class App
{
public: App()
{
std::cout << "Hello, World! I have no main!" << std::endl;
}
};
static App theApp;
Added
I removed the superflous exit call and the blurb about lifetime as I think we're all capable of understanding the consequences of removing main.
Ultra Necro
I just came across this answer and read both it and John Dibling's objections below. It was apparent that I didn't explain what the above procedure does and why that does indeed remove main from the program entirely.
John asserts that "there is always a main" in the CRT. Those words are not strictly correct, but the spirit of the statement is. Main is not a function provided by the CRT, you must add it yourself. The call to that function is in the CRT provided entry point function.
The entry point of every C/C++ program is a function in a module named 'crt0'. I'm not sure if this is a convention or part of the language specification, but every C/C++ compiler I've come across (which is a lot) uses it. This function basically does three things:
Initialize the CRT
Call main
Tear down
In the example above, the call is _tmain but that is some macro magic to allow for the various forms that 'main' can have, some of which are VS specific in this case.
What the above procedure does is it removes the module 'crt0' from the CRT and replaces it with a new one. This is why you can't use the Runtime DLL, there is already a function in that DLL with the same entry point name as the one we are adding (2). When you statically link, the CRT is a collection of .lib files, and the linker allows you to override .lib modules entirely. In this case a module with only one function.
Our new program contains the stock CRT, minus its CRT0 module, but with a CRT0 module of our own creation. In there we remove the call to main. So there is no main anywhere!
(2) You might think you could use the runtime DLL by renaming the entry point function in your crt0.c file, and changing the entry point in the linker settings. However, the compiler is unaware of the entry point change and the DLL contains an external reference to a 'main' function which you're not providing, so it would not compile.
Generally speaking, an application needs an entry point, and main is that entry point. The fact that initialization of globals might happen before main is pretty much irrelevant. If you're writing a console or GUI app you have to have a main for it to link, and it's only good practice to have that routine be responsible for the main execution of the app rather than use other features for bizarre unintended purposes.
Well, from the perspective of the C++ standard, yes, it's still required. But I suspect your question is of a different nature than that.
I think doing it the way you're thinking about would cause too many problems though.
For example, in many environments the return value from main is given as the status result from running the program as a whole. And that would be really hard to replicate from a constructor. Some bit of code could still call exit of course, but that seems like using a goto and would skip destruction of anything on the stack. You could try to fix things up by having a special exception you threw instead in order to generate an exit code other than 0.
But then you still run into the problem of the order of execution of global constructors not being defined. That means that in any particular constructor for a global object you won't be able to make any assumptions about whether or not any other global object yet exists.
You could try to solve the constructor order problem by just saying each constructor gets its own thread, and if you want to access any other global objects you have to wait on a condition variable until they say they're constructed. That's just asking for deadlocks though, and those deadlocks would be really hard to debug. You'd also have the issue of which thread exiting with the special 'return value from the program' exception would constitute the real return value of the program as a whole.
I think those two issues are killers if you want to get rid of main.
And I can't think of a language that doesn't have some basic equivalent to main. In Java, for example, there is an externally supplied class name who's main static function is called. In Python, there's the __main__ module. In perl there's the script you specify on the command line.
If you have more than one global object being constructed, there is no guarantee as to which constructor will run first.
If you are building static or dynamic library code then you don't need to define main yourself, but you will still wind up running in some program that has it.
If you are coding for windows, do not do this.
Running your app entirely from within the constructor of a global object may work just fine for quite awhile, but sooner or later you will make a call to the wrong function and end up with a program that terminates without warning.
Global object constructors run during the startup of the C runtime.
The C runtime startup code runs during the DLLMain of the C runtime DLL
During DLLMain, you are holding the DLL loader lock.
Tring to load another DLL while already holding the DLL loader lock results in a swift death for your process.
Compiling your entire app into a single executable won't save you - many Win32 calls have the potential to quietly load system DLLs.
There are implementations where global objects are not possible, or where non-trivial constructors are not possible for such objects (especially in the mobile and embedded realms).
I was asked an interview question to change the entry point of a C or C++ program from main() to any other function. How is it possible?
In standard C (and, I believe, C++ as well), you can't, at least not for a hosted environment (but see below). The standard specifies that the starting point for the C code is main. The standard (c99) doesn't leave much scope for argument:
5.1.2.2.1 Program startup: (1) The function called at program startup is named main.
That's it. It then waffles on a bit about parameters and return values but there's really no leeway there for changing the name.
That's for a hosted environment. The standard also allows for a freestanding environment (i.e., no OS, for things like embedded systems). For a freestanding environment:
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
You can use "trickery" in C implementations so that you can make it look like main isn't the entry point. This is in fact what early Windows compliers did to mark WinMain as the start point.
First way: a linker may include some pre-main startup code in a file like start.o and it is this piece of code which runs to set up the C environment then call main. There's nothing to stop you replacing that with something that calls bob instead.
Second way: some linkers provide that very option with a command-line switch so that you can change it without recompiling the startup code.
Third way: you can link with this piece of code:
int main (int c, char *v[]) { return bob (c, v); }
and then your entry point for your code is seemingly bob rather than main.
However, all this, while of possibly academic interest, doesn't change the fact that I can't think of one single solitary situation in my many decades of cutting code, where this would be either necessary or desirable.
I would be asking the interviewer: why would you want to do this?
The entry point is actually the _start function (implemented in crt1.o) .
The _start function prepares the command line arguments and then calls main(int argc,char* argv[], char* env[]),
you can change the entry point from _start to mystart by setting a linker parameter:
g++ file.o -Wl,-emystart -o runme
Of course, this is a replacement for the entry point _start so you won't get the command line arguments:
void mystart(){
}
Note that global/static variables that have constructors or destructors must be initialized at the beginning of the application and destroyed at the end. Keep that in mind if you are planning on bypassing the default entry point which does it automatically.
From C++ standard docs 3.6.1 Main Function,
A program shall contain a global function called main, which is the designated start of the program. It is implementation-defined
whether a program in a freestanding environment is required to define a main function.
So, it does depend on your compiler/linker...
If you are on VS2010, this could give you some idea
As it is easy to understand, this is not mandated by the C++ standard and falls in the domain of 'implemenation specific behavior'.
This is highly speculative, but you might have a static initializer instead of main:
#include <iostream>
int mymain()
{
std::cout << "mymain";
exit(0);
}
static int sRetVal = mymain();
int main()
{
std::cout << "never get here";
}
You might even make it 'Java-like', by putting the stuff in a constructor:
#include <iostream>
class MyApplication
{
public:
MyApplication()
{
std::cout << "mymain";
exit(0);
}
};
static MyApplication sMyApplication;
int main()
{
std::cout << "never get here";
}
Now. The interviewer might have thought about these, but I'd personally never use them. The reasons are:
It's non-conventional. People won't understand it, it's nontrivial to find the entry point.
Static initialization order is nondeterministic. Put in another static variable and you'll never now if it gets initialized.
That said, I've seen it being used in production instead of init() for library initializers. The caveat is, on windows, (from experience) your statics in a DLL might or might not get initialized based on usage.
Modify the crt object that actually calls the main() function, or provide your own (don't forget to disable linking of the normal one).
With gcc, declare the function with attribute((constructor)) and gcc will execute this function before any other code including main.
For Solaris Based Systems I have found this. You can use the .init section for every platforms I guess:
pragma init (function [, function]...)
Source:
This pragma causes each listed function to be called during initialization (before main) or during shared module loading, by adding a call to the .init section.
It's very simple:
As you should know when you use constants in c, the compiler execute a kind of 'macro' changing the name of the constant for the respective value.
just include a #define argument in the beginning of your code with the name of start-up function followed by the name main:
Example:
#define my_start-up_function (main)
I think it is easy to remove the undesired main() symbol from the object before linking.
Unfortunately the entry point option for g++ is not working for me(the binary crashes before entering the entry point). So I strip undesired entry-point from object file.
Suppose we have two sources that contain entry point function.
target.c contains the main() we do not want.
our_code.c contains the testmain() we want to be the entry point.
After compiling(g++ -c option) we can get the following object files.
target.o, that contains the main() we do not want.
our_code.o that contains the testmain() we want to be the entry point.
So we can use the objcopy to strip undesired main() function.
objcopy --strip-symbol=main target.o
We can redefine testmain() to main() using objcopy too.
objcopy --redefine-sym testmain=main our_code.o
And then we can link both of them into binary.
g++ target.o our_code.o -o our_binary.bin
This works for me. Now when we run our_binary.bin the entry point is our_code.o:main() symbol which refers to our_code.c::testmain() function.
On windows there is another (rather unorthodox) way to change the entry point of a program: TLS. See this for more explanations: http://isc.sans.edu/diary.html?storyid=6655
Yes,
We can change the main function name to any other name for eg. Start, bob, rem etc.
How does the compiler knows that it has to search for the main() in the entire code ?
Nothing is automatic in programming.
somebody has done some work to make it looks automatic for us.
so it has been defined in the start up file that the compiler should search for main().
we can change the name main to anything else eg. Bob and then the compiler will be searching for Bob() only.
Changing a value in Linker Settings will override the entry point. i.e., MFC applications use a value of 'Windows (/SUBSYSTEM:WINDOWS)' to change entry point from main() to CWinApp::WinMain().
Right clicking on solution > Properties > Linker > System > Subsystem > Windows (/SUBSYSTEM:WINDOWS)
...
Very practical benefit to modifying entry point:
MFC is a framework we take advantage of to write Windows applications in C++. I know it's ancient, but my company maintains one for legacy reasons! You will not find a main() in MFC code. MSDN says the entry point is WinMain(), instead. Thus, you can override the WinMain() of your base CWinApp object. Or, most people override CWinApp::InitInstance() because the base WinMain() will call it.
Disclaimer: I use empty parentheses to denote a method, without caring how many arguments.
We have an application written in C/C++ which is broken into a single EXE and multiple DLLs. Each of these DLLs makes use of the same static library (utilities.lib).
Any global variable in the utility static library will actually have multiple instances at runtime within the application. There will be one copy of the global variable per module (ie DLL or EXE) that utilities.lib has been linked into.
(This is all known and good, but it's worth going over some background on how static libraries behave in the context of DLLs.)
Now my question.. We want to change utilities.lib so that it becomes a DLL. It is becoming very large and complex, and we wish to distribute it in DLL form instead of .lib form. The problem is that for this one application we wish to preserve the current behaviour that each application DLL has it's own copy of the global variables within the utilities library. How would you go about doing this? Actually we don't need this for all the global variables, only some; but it wouldn't matter if we got it for all.
Our thoughts:
There aren't many global variables within the library that we care about, we could wrap each of them with an accessor that does some funky trick of trying to figure out which DLL is calling it. Presumably we can walk up the call stack and fish out the HMODULE for each function until we find one that isn't utilities.dll. Then we could return a different version depending on the calling DLL.
We could mandate that callers set a particular global variable (maybe also thread local) prior to calling any function in utilities.dll. The utilities DLL could then use this global variable value to determine the calling context.
We could find some way of loading utilities.dll multiple times at runtime. Perhaps we'd need to make multiple renamed copies at build time, so that each application DLL can have it's own copy of the utilities DLL. This negates some of the advantages of using a DLL in the first place, but there are other applications for which this "static library" style behaviour isn't needed and which would still benefit from utilities.lib becoming utilities.dll.
You are probably best off simply having utilities.dll export additional functions to allocate and deallocate a structure that contains the variables, and then have each of your other worker DLLs call those functions at runtime when needed, such as in the DLL_ATTACH_PROCESS and DLL_DETACH_PROCESS stages of DllEntryPoint(). That way, each DLL gets its own local copy of the variables, and can pass the structure back to utilities.dll functions as an additional parameter.
The alternative is to simply declare the individual variables locally inside each worker DLL directly, and then pass them into utilities.dll as input/output parameters when needed.
Either way, do not have utilities.dll try to figure out context information on its own. It won't work very well.
If I were doing this, I'd factor out all stateful global variables - I would export a COM object or a simple C++ class that contains all the necessary state, and each DLL export would become a method on your class.
Answers to your specific questions:
You can't reliably do a stack trace like that - due to optimizations like tail call optimization or FPO you cannot determine who called you in all cases. You'll find that your program will work in debug, work mostly in release but crash occasionally.
I think you'll find this difficult to manage, and it also puts a demand that your library can't be reentrant with other modules in your process - for instance, if you support callbacks or events into other modules.
This is possible, but you've completely negated the point of using DLL's. Rather than renaming, you could copy into distinct directories and load via full path.
I looked in Stevens, and in the Posix Programmer's Guide, and the best I can find is
An array of strings called the enviroment is made available when the process begins.
This array is pointed to by the external variable environ, which is defined as:
extern char **environ;
It's that environ variable that has me hesitating. I want to say
-The calling process/shell has already allocated the block of null terminated strings
-the 'external' variable environ is used as the entry point by getenv().
-ipso facto feel free to call getenv() within a static initializer.
But I can't find any guarantee that the 'static initialization' of environ precedes all the other static initialization code. Am I overthinking this?
Update
On my platform (AMD Opteron, Redhat 4, GCC 3.2.3), setting LD_DEBUG shows that environ gets set before my static initializers are called. This is a nice thing to know; thanks, #codelogic. But it is not necessarily the result I'd get on all platforms.
Also, while I agree intuitively with #ChrisW on the behavior of the C/C++ runtime library, this is just my intuition based on experience. So anyone who can pipe up with a quote from someplace authoritative guaranteeing that environ is there before static initializers are called, bonus points!
I think you can run your program with LD_DEBUG set to see the exact order:
LD_DEBUG=all <myprogram>
EDIT:
If you look at the source code of the runtime linker (glibc 2.7), specifically in files:
sysdeps/unix/sysv/linux/init-first.c
sysdeps/i386/init-first.c
csu/libc-start.c
sysdeps/i386/elf/start.S
you will see that argc, argv and environ (alias for __environ) are set before any global constructors are called (the init functions). You can follow the execution starting right from _start, the actual entry point (start.S). As you've quoted Stevens "An array of strings called the enviroment is made available when the process begins", suggesting that environment assignment happens at the very beginning of the process initialization. This backed by the linker code, which does the same, should give you sufficient peace of mind :-)
EDIT 2: Also worth mentioning is that environ is set early enough that even the runtime linker can query it to determine whether or not to output verbosely (LD_DEBUG).
Given that both the environment setup and the invoking of the static initializers are functions that the language run-time has to perform before main() is invoked, I am not sure you will find a guarantee here. That is, I am not aware of a specific requirement here that this has to work and that the order is guaranteed prior to main() in, say, the ANSI language and library specifications or anything....but I didn't check to make sure either.
At the same time, I am not aware of a specific requirement that restricts which run-time library functions can be called from a static initializer. And, more to the point, it would feel like a run-time bug (to me) if you couldn't access the environment from one.
On this basis, I vote that I would expect this to work, is a safe assumption, and current data points seem to support this line of reasoning.
In my experience, the C run time library is initialized before the run-time invokes the initializers of your static variables (and so your initializers may call C run time library functions).