How can I initialize MPI in a function? - c++

I want to use multiple processes inside a function. How can I do that?
As you know, MPI_Init needs two parameters: "int argc, char **argv". Does that mean I must add these two parameters to the function definition?
My requirement is that I want to parallelize a step inside a function instead of a step in the main program.
For example,
void func(mat &A, vec &x) {
    some computation on A;
    auto B = sub_mat(A, 0, 10);
    B*x; // I want to parallelize this computation
}

int main() {
    mat A;
    vec x;
    func(A, x);
}
I just want to use MPI in B*x, but I don't know how to init MPI. By the way, if I can init MPI in func, does A exist in every process at that point?
Help me & thanks!

You do not need to pass argc and argv around, since MPI-2 lifted the restriction in MPI-1 that compliant implementations may require the arguments to MPI_Init to be the same as the arguments to main:
In MPI-2 implementations are not allowed to impose this requirement. Conforming implementations of MPI are required to allow applications to pass NULL for both the argc and argv arguments of main.
But you still have to test whether MPI is already initialised, since MPI_Init() (or MPI_Init_thread()) should be called no more than once. This is done using MPI_Initialized(), so your code should look like this:
int initialized, finalized;

MPI_Initialized(&initialized);
if (!initialized)
    MPI_Init(NULL, NULL);

// Perform work in parallel

// You also need this when your program is about to exit
MPI_Finalized(&finalized);
if (!finalized)
    MPI_Finalize();
Note that MPI can be initialised and then finalised only once for the entire lifetime of the application. That is, surrounding a block of function code with MPI_Init() ... MPI_Finalize() won't work if the function is to be called multiple times: MPI does not work the way OpenMP does with its parallel regions.
By the way, if I can init MPI in func, does A exist in every process at that point?
A running MPI program consists of multiple processes with their own private address spaces. Usually these are multiple copies of the same program code (the so-called Single Program Multiple Data or SPMD paradigm), but they could also be multiple copies of several programs, written to work together (also called Multiple Programs Multiple Data or MPMD). SPMD is the simpler and more common case, where all processes execute exactly the same code up to the point where their MPI rank is used to branch the execution in multiple directions. So yes, A exists in every process, and if no (pseudo-)random numbers/events are involved in the preceding computations, then A would have the same value in every MPI process prior to the initialisation of the MPI library. Note that MPI_Init() is just a regular library call like any other. It doesn't change the content of user memory. It only makes the running MPI processes aware of one another and enables them to communicate with each other, so that they can work collectively to solve the particular problem.

If you want to use MPI_Init in a subfunction, you have to pass int argc, char **argv to the function, in order to pass it on.
But even if you just want to parallelise a part of a subfunction, you can (and, for more transparent code, should) call MPI_Init early in the program, e.g. after other initialisation is finished or, if you want it close to your parallelised function, immediately before you call the function.
In principle the function does not have to know about argc and argv, does it?


How to use threads with a recursive template function

I've been trying to optimize a sorting algorithm (quicksort) with threads. I know it is already quite good in the std::sort() implementation, but I'm trying to beat it with optimizations on my computer, and learn about threads at the same time.
So, my question is, how do I use threads with my recursive quicksort function?
Here's the function (with the not-important-to-the-question stuff removed):
template <typename T>
void quicksort(T arr[], const int &size, const int &beginning, const int &end)
{
    // Algorithm here
    thread t1(quicksort, arr, size, beginning, slow - 1);
    thread t2(quicksort, arr, size, slow + 1, end);
}
If I was wrong and you do end up needing more of the code, let me know and I'll update it.
I'm using Visual Studio 2012, and as of right now, the error states:
error C2661: 'std::thread::thread' : no overloaded function takes 5 arguments
I've also tried calling ref(arr), etc. on each of the parameters, but I got the same error.
After trying the solution by @mfontanini I can compile with no errors, but on running, I get:
Debug Error!
Program: ...sktop\VisualStudio\Projects\SpeedTester\Debug\SpeedTester.exe
- abort() has been called
(Press Retry to debug the application)
This is repeated over and over again. Eventually, it exits with code 3.
You need to explicitly indicate which is the T template parameter:
thread t1(&quicksort<T>, arr, size, beginning, slow - 1);
Otherwise the compiler sees that you're referring to a function template, but not to which specific specialization; it can't deduce T out of nowhere.
Your main problem probably is that you need to join() the thread(s) you spawn. If the thread objects are destructed without a prior join() or detach() the implementation calls std::terminate().
You don't want detach(), as you need to know that all partial sorts are finished for the overall sort to be complete, so joining is the right thing to do.
Additionally there are a few more things you could improve:
You should not pass around ints by reference. Pass by value is more efficient for simple scalar types and referencing local variables from other threads is generally not a good idea (unless you have a good reason and protocol for it)
You start far too many threads. After partitioning you need two threads for the two sub-sorts, but you have three: the current thread also continues to run, so you should create just one new thread and do the other sub-sort in the current thread. (And join() the other part when done.)
You should not keep creating new threads when the partitions get small. It may generally be a good idea to have a cutoff size for your quicksort and use something non-recursive (like insertion sort) for smaller sizes, as the recursion overhead becomes higher than the algorithm complexity benefit. A similar cut-off is even more important for concurrent sorting: the overhead of a thread is much higher than a simple recursive call and with small (and nearby) partitions, the threads will start to hit the same cache lines frequently, slowing things down even more.
It is generally not a good idea to create threads without limit. That will eventually run into platform limits. You might want to restrict the count of threads to use (using an atomic counter) or use something like std::async with default launch policy to avoid launching more threads than the platform can handle.
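The points above can be put together in a minimal sketch. The names parallel_quicksort and kParallelCutoff, the depth parameter and the cutoff value are assumptions made for this illustration, not part of the question's code. Bounds are passed by value, only one new thread is spawned per partition step, the spawned thread is joined, and small or deep partitions fall back to std::sort instead of creating more threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>

// Hypothetical cutoff: ranges smaller than this are sorted sequentially.
const std::ptrdiff_t kParallelCutoff = 10000;

// Sorts arr[lo..hi] (inclusive bounds, passed by value).
// depth bounds how many recursion levels may spawn a thread,
// so at most 2^depth - 1 extra threads are ever created.
template <typename T>
void parallel_quicksort(T* arr, std::ptrdiff_t lo, std::ptrdiff_t hi, int depth)
{
    assert(arr != nullptr);
    if (lo >= hi) return;

    if (depth <= 0 || hi - lo < kParallelCutoff) {
        std::sort(arr + lo, arr + hi + 1);   // sequential fallback
        return;
    }

    // Lomuto partition around the last element.
    T pivot = arr[hi];
    std::ptrdiff_t slow = lo;
    for (std::ptrdiff_t fast = lo; fast < hi; ++fast) {
        if (arr[fast] < pivot) {
            std::swap(arr[slow], arr[fast]);
            ++slow;
        }
    }
    std::swap(arr[slow], arr[hi]);

    // One new thread for the left part; the current thread sorts the right part.
    std::thread left(&parallel_quicksort<T>, arr, lo, slow - 1, depth - 1);
    parallel_quicksort(arr, slow + 1, hi, depth - 1);
    left.join();   // join before the std::thread object is destroyed
}
```

A call such as parallel_quicksort(v.data(), 0, static_cast<std::ptrdiff_t>(v.size()) - 1, 3) would sort a std::vector v using at most seven extra threads. The sketch is not exception-safe: if the right-hand sort throws, left is destroyed unjoined and std::terminate() is called.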

Is c lib a static in c++ program?

I have a C lib, algo.lib, which I need to call from my C++ program. I realise that the variables in algo.lib are static, which creates a problem for my C++ program when I call algo.lib multiple times, or use threads to call algo.lib concurrently.
For example, in algo.lib there is an int a which is initially set to 0. When I call algo.lib the first time, a will be set to 1000. But when I call algo.lib another time, I want the variables in algo.lib to be in their initial state, that is, a = 0 and not a = 1000.
Is it possible to make algo.lib object-oriented, so that when I call its function, it is created as an object and set to its initial state? And after algo.lib finishes running, this object is destroyed?
Yes, it is possible. If you rewrite it. If you only have the binary - then you cannot change this behavior. You can solve it by creating a separate executable that will do what you want with it and then exit, and pass the results back to the main program through some IPC. Basically - wrap it with your own implementation that will effectively initialize the library for each separate call.
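If the source is available, one way to do the rewrite is to move the static variables into a structure that the caller creates and passes in. The sketch below is only an illustration under that assumption: AlgoState and algo_run are hypothetical names, and the body of algo_run merely stands in for whatever the library really computes.

```cpp
#include <cassert>

// Hypothetical stand-in for the state that algo.lib keeps in static variables.
struct AlgoState {
    int a = 0;   // every new AlgoState starts in the initial state
};

// Hypothetical rewritten entry point: the state is passed in explicitly
// instead of living in a static variable inside the library.
int algo_run(AlgoState* state, int input)
{
    assert(state != nullptr);
    state->a += input;   // placeholder for the real computation
    return state->a;
}
```

Each call (or each thread) can then create its own AlgoState, so one run never sees the values left behind by another, and the state is destroyed when the object goes out of scope.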

Why do thread creation methods take an argument?

All thread create methods like pthread_create() or CreateThread() in Windows expect the caller to provide a pointer to the arg for the thread. Isn't this inherently unsafe?
This can work 'safely' only if the arg is in the heap, and then again creating a heap variable adds to the overhead of cleaning the allocated memory up. If a stack variable is provided as the arg then the result is at best unpredictable.
This looks like a half-cooked solution to me, or am I missing some subtle aspect of the APIs?
Many C APIs provide an extra void * argument so that you can pass context through third party APIs. Typically you might pack some information into a struct and point this variable at the struct, so that when the thread initializes and begins executing it has more information than the particular function that it's started with. There's no necessity to keep this information at the location given. For instance you might have several fields that tell the newly created thread what it will be working on, and where it can find the data it will need. Furthermore there's no requirement that the void * actually be used as a pointer. It's a typeless argument with the most appropriate width on a given architecture (pointer width), through which anything can be made available to the new thread. For instance you might pass an int directly if sizeof(int) <= sizeof(void *): (void *)3.
As a related example of this style: A FUSE filesystem I'm currently working on starts by opening a filesystem instance, say struct MyFS. When running FUSE in multithreaded mode, threads arrive onto a series of FUSE-defined calls for handling open, read, stat, etc. Naturally these can have no advance knowledge of the actual specifics of my filesystem, so this is passed in the fuse_main function void * argument intended for this purpose. struct MyFS *blah = myfs_init(); fuse_main(..., blah);. Now when the threads arrive at the FUSE calls mentioned above, the void * received is converted back into struct MyFS * so that the call can be handled within the context of the intended MyFS instance.
Isn't this inherently unsafe?
No. It is a pointer. Since you (as the developer) have created both the function that will be executed by the thread and the argument that will be passed to the thread you are in full control. Remember this is a C API (not a C++ one) so it is as safe as you can get.
This can work 'safely' only if the arg is in the heap,
No. It is safe as long as its lifespan in the parent thread is as long as the lifetime that it can be used in the child thread. There are many ways to make sure that it lives long enough.
and then again creating a heap variable adds to the overhead of cleaning the allocated memory up.
Seriously, that's an argument? This is basically how it is done for all threads, unless you are passing something much simpler, like an integer (see below).
If a stack variable is provided as the arg then the result is at best unpredictable.
It's as predictable as you (the developer) make it. You created both the thread and the argument. It is your responsibility to make sure that the lifetime of the argument is appropriate. Nobody said it would be easy.
This looks like a half-cooked solution to me, or am i missing some subtle aspects of the APIs?
You are missing that this is the most basic of threading APIs. It is designed to be as flexible as possible, so that safer systems can be developed with as few strings attached as possible. So we now have boost::thread, which I guess is built on top of these basic threading facilities but provides a much safer and easier to use infrastructure (at some extra cost).
If you want RAW unfettered speed and flexibility, use the C API (with some danger).
If you want something slightly safer, use a higher level API like boost::thread (slightly more costly).
Thread specific storage with no dynamic allocation (Example)
#include <pthread.h>
#include <stdint.h>   // intptr_t
#include <iostream>

struct ThreadData
{
    // Stuff for my thread.
};

ThreadData threadData[5];

extern "C" void* threadStart(void* data);
void* threadStart(void* data)
{
    // The void* carries the thread's index, not an address.
    intptr_t id = reinterpret_cast<intptr_t>(data);
    ThreadData& tData = threadData[id];
    // Do Stuff
    return NULL;
}

int main()
{
    for(intptr_t loop = 0;loop < 5; ++loop)
    {
        pthread_t threadInfo; // Not good just makes the example quick to write.
        pthread_create(&threadInfo, NULL, threadStart, reinterpret_cast<void*>(loop));
    }
    // You should wait here for threads to finish before exiting.
}
Allocation on the heap does not add a lot of overhead.
Besides the heap and the stack, global variable space is another option. Also, it's possible to use a stack frame that will last as long as the child thread. Consider, for example, local variables of main.
I favor putting the arguments to the thread in the same structure as the pthread_t object itself. So wherever you put the pthread record, put its arguments as well. Problem solved :v) .
This is a common idiom in all C programs that use function pointers, not just for creating threads.
Think about it. Suppose your function void f(void (*fn)()) simply calls into another function. There's very little you can actually do with that. Typically a function pointer has to operate on some data. Passing in that data as a parameter is a clean way to accomplish this, without, say, the use of global variables. Since the function f() doesn't know what the purpose of that data might be, it uses the ever-generic void * parameter, and relies on you the programmer to make sense of it.
If you're more comfortable with thinking in terms of object-oriented programming, you can also think of it like calling a method on a class. In this analogy, the function pointer is the method and the extra void * parameter is the equivalent of what C++ would call the this pointer: it provides you some instance variables to operate on.
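The idiom described above can be shown without any threads at all. In this small sketch, for_each_value plays the role of the third-party routine that knows nothing about the caller's data, and the void * parameter carries an Accumulator that the callback casts back, much like a this pointer. All names here are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Generic C-style callback: receives a value and an opaque context pointer.
typedef void (*visit_fn)(int value, void* context);

// Hypothetical "third-party" routine: it forwards context without interpreting it.
void for_each_value(const int* values, std::size_t count, visit_fn fn, void* context)
{
    assert(fn != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        fn(values[i], context);
}

// Caller-side state that the callback operates on.
struct Accumulator {
    long sum;
    std::size_t calls;
};

// The callback converts the opaque pointer back to the type the caller passed in.
void add_to_accumulator(int value, void* context)
{
    Accumulator* acc = static_cast<Accumulator*>(context);
    acc->sum += value;
    ++acc->calls;
}
```

Calling for_each_value(data, n, add_to_accumulator, &acc) with a local Accumulator acc is safe here because acc outlives the call. With a thread, the same reasoning applies as long as the object outlives the thread's use of it.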
The pointer is a pointer to the data that you intend to use in the function. Windows style APIs require that you give them a static or global function.
Often this is a pointer to the object you intend to use (a pointer to this, or pThis if you will), and the intention is that you will delete the pThis after the thread ends.
It's a very procedural approach, but it has a big advantage which is often overlooked: the CreateThread C style API is binary compatible, so you can actually wrap it with a C++ class (or almost any other language). If the parameter were typed, you wouldn't be able to access it from another language as easily.
So yes, this is unsafe but there's a good reason for it.

What exactly is a reentrant function?

Most of the times, the definition of reentrance is quoted from Wikipedia:
A computer program or routine is described as reentrant if it can be safely called again before its previous invocation has been completed (i.e. it can be safely executed concurrently). To be reentrant, a computer program or routine:
1. Must hold no static (or global) non-constant data.
2. Must not return the address to static (or global) non-constant data.
3. Must work only on the data provided to it by the caller.
4. Must not rely on locks to singleton resources.
5. Must not modify its own code (unless executing in its own unique thread storage).
6. Must not call non-reentrant computer programs or routines.
How is safely defined?
If a program can be safely executed concurrently, does it always mean that it is reentrant?
What exactly is the common thread between the six points mentioned that I should keep in mind while checking my code for reentrant capabilities?
Are all recursive functions reentrant?
Are all thread-safe functions reentrant?
Are all recursive and thread-safe functions reentrant?
While writing this question, one thing comes to mind:
Are the terms like reentrance and thread safety absolute at all i.e. do they have fixed concrete definitions? For, if they are not, this question is not very meaningful.
1. How is safely defined?
Semantically. In this case, this is not a hard-defined term. It just means "You can do that, without risk".
2. If a program can be safely executed concurrently, does it always mean that it is reentrant?
No.
For example, let's have a C++ function that takes both a lock, and a callback as a parameter:
#include <mutex>
typedef void (*callback)();
std::mutex m;
void foo(callback f)
{
    m.lock();
    // use the resource protected by the mutex
    if (f) {
        f();
    }
    // use the resource protected by the mutex
    m.unlock();
}
Another function could well need to lock the same mutex:
void bar()
{
    foo(nullptr);
}
At first sight, everything seems ok… But wait:
int main()
{
    foo(bar);
    return 0;
}
If the lock on mutex is not recursive, then here's what will happen, in the main thread:
main will call foo.
foo will acquire the lock.
foo will call bar, which will call foo.
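the 2nd foo will try to acquire the lock, fail and wait for it to be released. It never is, so in practice this is a deadlock (formally, relocking a std::mutex that the thread already owns is undefined behavior).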
Ok, I cheated, using the callback thing. But it's easy to imagine more complex pieces of code having a similar effect.
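For comparison, here is a minimal sketch of the same call pattern with a recursive mutex, which the owning thread may lock more than once. The names rm, shared_counter, foo_recursive and bar_recursive are made up for this illustration, and the counter merely stands in for "the resource protected by the mutex".

```cpp
#include <cassert>
#include <mutex>

typedef void (*callback)();

std::recursive_mutex rm;   // may be locked repeatedly by the thread that owns it
int shared_counter = 0;    // hypothetical resource protected by rm

void foo_recursive(callback f)
{
    std::lock_guard<std::recursive_mutex> guard(rm);
    ++shared_counter;      // use the resource
    if (f) {
        f();               // f may call foo_recursive again on this thread
    }
    ++shared_counter;      // use the resource again
}

void bar_recursive()
{
    foo_recursive(nullptr);
}
```

Calling foo_recursive(bar_recursive) now returns instead of blocking. Note that this only removes the deadlock: the nested call still sees the resource in whatever state the outer call left it halfway through, so the function is not automatically reentrant.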
3. What exactly is the common thread between the six points mentioned that I should keep in mind while checking my code for reentrant capabilities?
You can smell a problem if your function has/gives access to a modifiable persistent resource, or has/gives access to a function that smells.
(Ok, 99% of our code should smell, then… See last section to handle that…)
So, studying your code, one of those points should alert you:
The function has a state (i.e. it accesses a global variable, or even a class member variable)
This function can be called by multiple threads, or could appear twice in the stack while the process is executing (i.e. the function could call itself, directly or indirectly). Functions taking callbacks as parameters smell a lot.
Note that non-reentrancy is viral : A function that could call a possible non-reentrant function cannot be considered reentrant.
Note, too, that C++ methods smell because they have access to this, so you should study the code to be sure they have no funny interaction.
4.1. Are all recursive functions reentrant?
No. In multithreaded cases, a recursive function accessing a shared resource could be called by multiple threads at the same moment, resulting in bad/corrupted data.
In singlethreaded cases, a recursive function could use a non-reentrant function (like the infamous strtok), or use global data without handling the fact the data is already in use. So your function is recursive because it calls itself directly or indirectly, but it can still be recursive-unsafe.
4.2. Are all thread-safe functions reentrant?
No. In the example above, I showed how an apparently thread-safe function was not reentrant. OK, I cheated because of the callback parameter. But then, there are multiple ways to deadlock a thread by having it acquire a non-recursive lock twice.
4.3. Are all recursive and thread-safe functions reentrant?
I would say "yes" if by "recursive" you mean "recursive-safe".
If you can guarantee that a function can be called simultaneously by multiple threads, and can call itself, directly or indirectly, without problems, then it is reentrant.
The problem is evaluating this guarantee… ^_^
5. Are the terms like reentrance and thread safety absolute at all, i.e. do they have fixed concrete definitions?
I believe they do, but then, evaluating whether a function is thread-safe or reentrant can be difficult. This is why I used the term smell above: you can find that a function is not reentrant, but it could be difficult to be sure that a complex piece of code is reentrant.
6. An example
Let's say you have an object, with one method that needs to use a resource:
struct MyStruct
{
    P * p;

    void foo()
    {
        if (this->p == nullptr)
            this->p = new P();

        // lots of code, some using this->p

        if (this->p != nullptr)
        {
            delete this->p;
            this->p = nullptr;
        }
    }
};
The first problem is that if somehow this function is called recursively (i.e. this function calls itself, directly or indirectly), the code will probably crash, because this->p will be deleted at the end of the last call, and still probably be used before the end of the first call.
Thus, this code is not recursive-safe.
We could use a reference counter to correct this:
struct MyStruct
{
    size_t c;
    P * p;

    void foo()
    {
        if (c == 0)
            this->p = new P();
        ++c;   // one more active call using p
        // lots of code, some using this->p
        --c;   // this call is done with p
        if (c == 0)
        {
            delete this->p;
            this->p = nullptr;
        }
    }
};
This way, the code becomes recursive-safe… But it is still not reentrant because of multithreading issues: We must be sure the modifications of c and of p will be done atomically, using a recursive mutex (not all mutexes are recursive):
#include <mutex>

struct MyStruct
{
    std::recursive_mutex m;
    size_t c;
    P * p;

    void foo()
    {
        m.lock();
        if (c == 0)
            this->p = new P();
        ++c;
        m.unlock();
        // lots of code, some using this->p
        m.lock();
        --c;
        if (c == 0)
        {
            delete this->p;
            this->p = nullptr;
        }
        m.unlock();
    }
};
And of course, this all assumes the lots of code is itself reentrant, including the use of p.
And the code above is not even remotely exception-safe, but this is another story… ^_^
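As a rough sketch of what an exception-safe variant could look like, the acquire/release pairs can be moved into a small RAII helper, so the count is decremented and p released even if the code in between throws. This is one possible arrangement, not the author's code: the helper Use, the placeholder type P and the depth parameter (which only exists to force a recursive call) are all assumptions of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

struct P { int value = 0; };   // hypothetical resource type

struct MyStruct
{
    std::recursive_mutex m;
    std::size_t c = 0;
    std::unique_ptr<P> p;

    // RAII helper: registers one use of p on construction and
    // releases it on destruction, including during stack unwinding.
    struct Use {
        MyStruct& s;
        explicit Use(MyStruct& owner) : s(owner) {
            std::lock_guard<std::recursive_mutex> g(s.m);
            if (s.c == 0) s.p.reset(new P());
            ++s.c;
        }
        ~Use() {
            std::lock_guard<std::recursive_mutex> g(s.m);
            assert(s.c > 0);
            if (--s.c == 0) s.p.reset();
        }
        Use(const Use&) = delete;
        Use& operator=(const Use&) = delete;
    };

    // depth > 0 makes foo call itself, to exercise the recursive case.
    // Returns true if p was alive before and after the nested call.
    bool foo(int depth)
    {
        Use use(*this);
        bool ok = (p != nullptr);
        if (depth > 0) ok = foo(depth - 1) && ok;
        return ok && (p != nullptr);
    }
};
```

As in the original, the lock only protects the counter and the creation and destruction of p. Whatever the body does with p must still be reentrant on its own.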
7. Hey 99% of our code is not reentrant!
It is quite true for spaghetti code. But if you partition your code correctly, you will avoid reentrancy problems.
7.1. Make sure all functions have NO state
They must only use the parameters, their own local variables, other functions without state, and return copies of the data if they return at all.
7.2. Make sure your object is "recursive-safe"
An object method has access to this, so it shares a state with all the methods of the same instance of the object.
So, make sure the object can be used at one point in the stack (i.e. calling method A), and then, at another point (i.e. calling method B), without corrupting the whole object. Design your object to make sure that upon exiting a method, the object is stable and correct (no dangling pointers, no contradicting member variables, etc.).
7.3. Make sure all your objects are correctly encapsulated
No one else should have access to their internal data:
// bad
int & MyObject::getCounter()
{
    return this->counter;
}

// good
int MyObject::getCounter()
{
    return this->counter;
}

// good, too
void MyObject::getCounter(int & p_counter)
{
    p_counter = this->counter;
}
Even returning a const reference could be dangerous if the user retrieves the address of the data, as some other portion of the code could modify it without the code holding the const reference being told.
7.4. Make sure the user knows your object is not thread-safe
Thus, the user is responsible to use mutexes to use an object shared between threads.
The objects from the STL are designed not to be thread-safe (because of performance issues), and thus, if a user wants to share a std::string between two threads, the user must protect its access with concurrency primitives.
7.5. Make sure your thread-safe code is recursive-safe
This means using recursive mutexes if you believe the same resource can be used twice by the same thread.
"Safely" is defined exactly as the common sense dictates - it means "doing its thing correctly without interfering with other things". The six points you cite quite clearly express the requirements to achieve that.
The answers to your 3 questions are 3× "no".
Are all recursive functions reentrant?
Two simultaneous invocations of a recursive function can easily screw each other up if they access the same global/static data, for example.
Are all thread-safe functions reentrant?
A function is thread-safe if it doesn't malfunction if called concurrently. But this can be achieved e.g. by using a mutex to block the execution of the second invocation until the first finishes, so only one invocation works at a time. Reentrancy means executing concurrently without interfering with other invocations.
Are all recursive and thread-safe functions reentrant?
See above.
The common thread:
Is the behavior well defined if the routine is called while it is interrupted?
If you have a function like this:
int add( int a , int b ) {
    return a + b;
}
Then it is not dependent upon any external state. The behavior is well defined.
If you have a function like this:
int gValue = 0;   // global state shared by all callers
int add_to_global( int a ) {
    return gValue += a;
}
The result is not well defined on multiple threads. Information could be lost if the timing was just wrong.
The simplest form of a reentrant function is something that operates exclusively on the arguments passed and constant values. Anything else takes special handling or, often, is not reentrant. And of course the arguments must not reference mutable globals.
Now I have to elaborate on my previous comment. @paercebal's answer is incorrect. In the example code, didn't anyone notice that the mutex which was supposed to be a parameter wasn't actually passed in?
I dispute the conclusion, I assert: for a function to be safe in the presence of concurrency it must be re-entrant. Therefore concurrent-safe (usually written thread-safe) implies re-entrant.
Neither thread safe nor re-entrant have anything to say about arguments: we're talking about concurrent execution of the function, which can still be unsafe if inappropriate parameters are used.
For example, memcpy() is thread-safe and re-entrant (usually). Obviously it will not work as expected if called with pointers to the same targets from two different threads. That's the point of the SGI definition, placing the onus on the client to ensure accesses to the same data structure are synchronised by the client.
It is important to understand that in general it is nonsense to have thread-safe operation include the parameters. If you've done any database programming you will understand. The concept of what is "atomic" and might be protected by a mutex or some other technique is necessarily a user concept: processing a transaction on a database can require multiple un-interrupted modifications. Who can say which ones need to be kept in sync but the client programmer?
The point is that "corruption" doesn't have to be messing up the memory on your computer with unserialised writes: corruption can still occur even if all individual operations are serialised. It follows that when you're asking if a function is thread-safe, or re-entrant, the question means for all appropriately separated arguments: using coupled arguments does not constitute a counter-example.
There are many programming systems out there (OCaml is one, and I think Python as well) which have lots of non-reentrant code in them, but which use a global lock to interleave thread accesses. These systems are not re-entrant and they're not thread-safe or concurrent-safe; they operate safely simply because they prevent concurrency globally.
A good example is malloc. It is not re-entrant and not thread-safe. This is because it has to access a global resource (the heap). Using locks doesn't make it safe: it's definitely not re-entrant. If the interface to malloc had been designed properly, it would be possible to make it re-entrant and thread-safe:
malloc(heap*, size_t);
Now it can be safe because it transfers the responsibility for serialising shared access to a single heap to the client. In particular no work is required if there are separate heap objects. If a common heap is used, the client has to serialise access. Using a lock inside the function is not enough: just consider a malloc locking a heap* and then a signal comes along and calls malloc on the same pointer: deadlock: the signal can't proceed, and the client can't either because it is interrupted.
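To make the malloc(heap*, size_t) idea concrete, here is a deliberately tiny sketch of an allocator whose only state lives in a caller-owned object. It is a bump allocator with no free operation, and Heap, heap_init and heap_alloc are names invented for this example. It is not how any real malloc is implemented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical caller-owned heap: a fixed buffer plus a bump offset.
struct Heap {
    unsigned char* base;
    std::size_t capacity;
    std::size_t used;
};

// The buffer is assumed to be aligned for std::max_align_t.
void heap_init(Heap* h, void* buffer, std::size_t capacity)
{
    assert(h != nullptr && buffer != nullptr);
    h->base = static_cast<unsigned char*>(buffer);
    h->capacity = capacity;
    h->used = 0;
}

// Allocates size bytes from *h, or returns nullptr if they do not fit.
// No global or static state is touched: everything lives in *h.
void* heap_alloc(Heap* h, std::size_t size)
{
    assert(h != nullptr);
    const std::size_t align = alignof(std::max_align_t);
    // Round the current offset up to the next aligned position.
    std::size_t start = (h->used + align - 1) / align * align;
    if (start > h->capacity || size > h->capacity - start)
        return nullptr;
    h->used = start + size;
    return h->base + start;
}
```

Two calls on two different Heap objects cannot interfere with each other, whichever threads they run on. Calls that share one Heap still have to be serialised by whoever owns it, which is exactly the responsibility the answer assigns to the client.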
Generally speaking, locks do not make things thread-safe. They actually destroy safety by inappropriately trying to manage a resource that is owned by the client. Locking has to be done by the object manufacturer: that's the only code that knows how many objects are created and how they will be used.
The "common thread" (pun intended!?) amongst the points listed is that the function must not do anything that would affect the behaviour of any recursive or concurrent calls to the same function.
So for example static data is an issue because it is owned by all threads; if one call modifies a static variable then all threads use the modified data, thus affecting their behaviour. Self-modifying code (although rarely encountered, and in some cases prevented) would be a problem, because although there are multiple threads, there is only one copy of the code; the code is essentially static data too.
Essentially to be re-entrant, each thread must be able to use the function as if it were the only user, and that is not the case if one thread can affect the behaviour of another in a non-deterministic manner. Primarily this involves each thread having either separate or constant data that the function works on.
All that said, point (1) is not necessarily true; for example, you might legitimately and by design use a static variable to retain a recursion count to guard against excessive recursion or to profile an algorithm.
A thread-safe function need not be reentrant; it may achieve thread safety by specifically preventing reentrancy with a lock, and point (6) says that such a function is not reentrant. Regarding point (6), a function that calls a thread-safe function that locks is not safe for use in recursion (it will deadlock), and is therefore not said to be reentrant, though it may nonetheless be safe for concurrency, and would still be re-entrant in the sense that multiple threads can have their program counters in such a function simultaneously (just not within the locked region). Maybe this helps to distinguish thread safety from reentrancy (or maybe it adds to your confusion!).
The answers to your "Also" questions are "No", "No" and "No". Just because a function is recursive and/or thread-safe, that doesn't make it re-entrant.
Each of these types of function can fail on all the points you quote. (Though I'm not 100% certain of point 5.)
A non-reentrant function is one that maintains a static context of its own. The first call creates that context, and on later calls you do not pass it in again; the function picks up where it left off, which is convenient for token analysis. strtok in C is an example. If the context has not been cleared before an unrelated use, errors can result.
/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] ="- This, a sample string.";
    char * pch;
    printf ("Splitting string \"%s\" into tokens:\n",str);
    pch = strtok (str," ,.-");
    while (pch != NULL)
    {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
    }
    return 0;
}
By contrast, a reentrant function is one that can be called at any time and gives the same result without side effects, because it keeps no such context.
Thread safety, on the other hand, just means that at any given moment only one modification of a shared variable takes place within the current process. So you should add a lock guard to ensure that a shared field is changed by only one thread at a time.
So thread safety and reentrancy are two different things, seen from different angles. Reentrancy says you should clear the context before the next round of analysis that relies on it. Thread safety says you should keep accesses to shared fields in order.
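The hidden context becomes a practical problem as soon as two tokenizing loops are nested, because the inner strtok call overwrites the position saved by the outer one. POSIX provides strtok_r, which takes the context as an explicit argument. The sketch below assumes a POSIX system; split_pairs is a made-up helper name.

```cpp
#include <cassert>
#include <string.h>   // strtok_r (POSIX)
#include <string>
#include <vector>

// Splits input such as "a=1;b=2" into {"a", "1", "b", "2"} using two
// nested tokenizing loops, each with its own explicit context pointer.
std::vector<std::string> split_pairs(std::string input)
{
    std::vector<std::string> out;
    if (input.empty()) return out;

    char* outer_state = nullptr;   // context of the ';' loop
    for (char* pair = strtok_r(&input[0], ";", &outer_state);
         pair != nullptr;
         pair = strtok_r(nullptr, ";", &outer_state))
    {
        char* inner_state = nullptr;   // separate context of the '=' loop
        for (char* field = strtok_r(pair, "=", &inner_state);
             field != nullptr;
             field = strtok_r(nullptr, "=", &inner_state))
        {
            out.push_back(field);
        }
    }
    return out;
}
```

Because each loop owns its context, the inner loop cannot disturb the outer one. The input is taken by value since strtok_r, like strtok, writes terminators into the buffer it scans.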
The terms "Thread-safe" and "re-entrant" mean only and exactly what their definitions say. "Safe" in this context means only what the definition you quote below it says.
"Safe" here certainly doesn't mean safe in the broader sense that calling a given function in a given context won't totally hose your application. Altogether, a function might reliably produce a desired effect in your multi-threaded application but not qualify as either re-entrant or thread-safe according to the definitions. Oppositely, you can call re-entrant functions in ways that will produce a variety of undesired, unexpected and/or unpredictable effects in your multi-threaded application.
A recursive function can be anything, and re-entrant has a stronger definition than thread-safe, so the answers to your numbered questions are all no.
Reading the definition of re-entrant, one might summarize it as meaning a function which will not modify anything beyond what you call it to modify. But you shouldn't rely on only the summary.
Multi-threaded programming is just extremely difficult in the general case. Knowing which parts of one's code are re-entrant is only a part of this challenge. Thread safety is not additive. Rather than trying to piece together re-entrant functions, it's better to use an overall thread-safe design pattern and use this pattern to guide your use of every thread and shared resource in your program.

Context of Main function in C or C++

Does the main function we define in C or C++ run in a process or in a thread?
If it runs in a thread, which process is responsible for spawning it?
main() is the entry point for your program. C++ (current C++ anyway) doesn't know what a process or thread is. The word 'process' is not even in the index of the standard. What happens before and after main() is mostly implementation defined. So, the answer to your question is also implementation defined.
In general though most operating systems have the concept of process and thread and they have similar meanings (though in Linux, for example, a thread is actually a "light weight process"). You can generally assume that your program will be started in a new process and that main() will then be called by the original thread after the implementation defined initialization.
Since there's plenty of room for the implementation and/or you to start up a whole bunch of threads before main is called, though, you will probably want to consider main() to have been called during the execution of a thread. The best way to think about it is probably in terms of the standard, unless you really have to think about the implementation. The standard doesn't currently know what a process or thread is. C++0x will change that in some way but I'm not sure at this point what the new concepts will be or how they will relate to OS specific constructs.
My answer is specifically addressed at the C++ language part of the question. C is a different language and I haven't used it in a good 10 years so I forget how the globals initialization is specified.
It's a process that you spawn when you execute your program. The main function is called at the beginning of the program. It is all a part of the same program (i.e. one process).
When you ask your OS to start a new process, it initializes data structures for a process and for a single thread inside that process. The initial instruction pointer in that thread context is the process entry point, which is a function provided by your C runtime library. That library-provided entry point converts the environment table and command-line arguments into the format demanded by the C standard, and then calls your main function.
Your whole program is a single process unless it starts fork()ing things, and by default the process has one thread that does everything; main() starts on that thread.