Can I use execvp() on a function defined inside my program?

I have a C++ function that I'd like to call using execvp(), due to the way my program is organized.
Is this possible?

All of the exec variants including execvp() can only call complete programs visible in the filesystem. The good news is that if you want to call a function in your already loaded program, all you need is fork(). It will look something like this pseudo-code:
pid_t pid = fork();
if (pid == 0) {
    // Call your function here. This is a new process and any
    // changes you make will not be reflected back into the parent
    // variables. Be careful with files and shared resources like
    // database connections.
    _exit(0);
}
else if (pid == -1) {
    // An error happened and the fork() failed. This is a very rare
    // error, but you must handle it.
}
else {
    // Wait for the child to finish. You can use a signal handler
    // to catch it later if the child will take a long time.
    waitpid(pid, ...);
}
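For a version that compiles as-is, here is a minimal sketch of the same pattern. The helper name run_in_child and the idea of passing a small result back through the child's exit status are assumptions made for illustration; only values 0..255 survive that trip, so anything larger needs a pipe or some other IPC.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs fn() in a forked child and returns the
// child's exit status (0..255), or -1 if fork/waitpid failed or the
// child did not exit normally.
int run_in_child(int (*fn)())
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;                  // fork() failed
    }
    if (pid == 0) {
        // Child: call the function, then leave via _exit() so the
        // parent's atexit handlers and stdio buffers are not run twice.
        _exit(fn() & 0xff);
    }
    // Parent: wait for this specific child, retrying if a signal
    // interrupts the wait.
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling run_in_child(myFunction) from the parent blocks until the child is done; anything myFunction changes in memory stays in the child.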

execvp() is meant to start a program, not a function. So you'll have to wrap that function in a compiled executable file and then have that file's main call your function.

Creating processes can be heavyweight. If you really only want to call your function in parallel, why not use threads? There are many platform-independent libraries with threading support for C++, like Boost, Qt or ACE.
If you really need your function to be executed in another process you can use fork or vfork. vfork may not be available on every platform and it has its drawbacks, so check whether you can use it. If not, just use fork.

Related

cancelling a search using threads

I am new to multi-threading. I am using c++ on unix.
In the code below, runSearch() takes a long time and I want to be able to kill the search as soon as "cancel == true". The function cancelSearch is called by another thread.
What is the best way to solve this problem?
Thank you.
------------------This is the existing code-------------------------
struct SearchTask : public Runnable
{
    bool cancel = false;
    void cancelSearch()
    {
        cancel = true;
    }
    void run()
    {
        cancel = false;
        runSearch();
        if (cancel == true)
        {
            return;
        }
        //...more steps.
    }
};
EDIT: To make it more clear, say runSearch() takes 10 mins to run. After 1 min, cancel==true, then I want to exit out of run() immediately rather than waiting another 9 more mins for runSearch() to complete.

You'll need to keep checking the flag throughout the search operation. Something like this:
void run()
{
cancel = false;
while (!cancel)
{
runSearch();
//do your thread stuff...
}
}

You have mentioned that you cannot modify runSearch(). With pthreads there's pthread_cancel() (with pthread_setcancelstate() controlling whether a thread may be cancelled); however, I don't believe this is safe, especially with C++ code that expects RAII semantics.
Safe thread cancellation must be cooperative. The code that gets canceled must be aware of the cancellation and be able to clean up after itself. If the code is not designed to do this and is simply terminated then your program will probably exhibit undefined behavior.
For this reason C++'s std::thread does not offer any method of thread cancellation and instead the code must be written with explicit cancellation checks as other answers have shown.

Create a generic method that accepts an action / delegate. Have each step be something REALLY small and specific. Send the generic method a delegate / action of what you consider a "step". In the generic method, check whether cancel is true and return if it is. Because the steps are small, it shouldn't take long for the thread to die once it is cancelled.
That is the best advice I can give without any code of what the steps do.
Also note :
void run()
{
cancel = false;
runSearch();
while (!cancel)
{
//do your thread stuff...
}
}
Won't work, because if what you are doing is not an iteration it will run the entire search before checking !cancel. Like I said, if you can add more details on what the steps do it would be easier to give you advice. When working with threads that you want to halt or kill, your best bet is to split your code into very small steps.

Basically you have to poll the cancel flag everywhere. There are other tricks you could use, but they are more platform-specific, like thread cancellation, or are not general enough like interrupts.
And cancel needs to be an atomic variable (like std::atomic<bool>, or just protect it with a mutex), otherwise the compiler might just cache the value in a register and not see the update coming from another thread.
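A minimal sketch of the polling approach with an atomic flag, assuming the search can be broken into small steps. The class name CancellableSearch and the idea of passing the steps as a vector of callables are illustrative choices, not taken from the question's code.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of a cancellable task: the long search is split into small
// steps, and the atomic flag is polled before each one.
class CancellableSearch
{
public:
    // Safe to call from any thread.
    void cancelSearch() { cancel_.store(true); }

    // Runs the steps in order and returns how many completed before
    // the cancel flag was seen.
    std::size_t run(const std::vector<std::function<void()>>& steps)
    {
        cancel_.store(false);
        std::size_t done = 0;
        for (const auto& step : steps) {
            if (cancel_.load()) {
                break;              // stop before starting the next step
            }
            step();
            ++done;
        }
        return done;
    }

private:
    std::atomic<bool> cancel_{false};
};
```

The worst-case delay between cancelSearch() and run() returning is the duration of one step, which is why the steps need to be small.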

The other responses are right: just because you've called a blocking function in a thread doesn't mean it magically turns into a non-blocking call. The thread may not interrupt the rest of the program, but it still has to wait for the runSearch call to complete.
OK, so there are ways round this, but they're not necessarily safe to use.
You can kill a thread explicitly. On Windows you can use TerminateThread(), which will kill the thread's execution. Sounds good, right? Well, except that it is very dangerous to use: unless you know exactly what resources are held and what calls are in progress in the killed thread, you may find yourself with an app that refuses to work correctly next time round. If runSearch opens a DB connection, for example, the TerminateThread call will not close it. The same applies to memory, loaded DLLs, and everything they use. It's designed for killing totally unresponsive threads so you can close a program and restart it.
Given the above, and the very strong recommendation that you not use it, the next step is to call runSearch in an external manner: if you run your blocking call in a separate process, then the process can be killed with a lot more certainty that you won't bugger everything else up. The process dies and clears up its memory, its heap, any loaded DLLs, everything. So inside your thread, call CreateProcess and wait on the handle. You'll need some form of IPC (probably best not to use shared memory, as it can be a nuisance to reset that when you kill the process) to transfer the results back to your main app. If you need to kill this process, call ExitProcess (or exit on Linux).
Note that these exit calls have to be made inside the process, so you'll need to run a thread inside the process for your blocking call. You can terminate a process externally, but again, it's dangerous: not nearly as dangerous as killing a thread, but you can still trip up occasionally. (Use TerminateProcess or kill for this.)

C++'s "system" without wait (Win32)

I have got a program which checks if there's a version update on the server. Now I have to do something like
if(update_avail) {
system("updater.exe");
exit(0);
}
but without waiting for "updater.exe" to complete. Otherwise I can't replace my main program because it is running. So how to execute "updater.exe" and immediately exit? I know the *nix way with fork and so on, how to do this in Windows?

Use CreateProcess(), it runs asynchronously. Then you would only have to ensure that updater.exe can write to the original EXE, which you can do by waiting or retrying until the original process has ended. (With a grace interval of course.)

There is no fork() in Win32. The API call you are looking for is called ::CreateProcess(). This is the underlying function that system() is using. ::CreateProcess() is inherently asynchronous: unless you are specifically waiting on the returned process handle, the call is non-blocking.
There is also a higher-level function ::ShellExecute(), that you could use if you are not redirecting process standard I/O or doing the waiting on the process. This has an advantage of searching the system PATH for the executable file, as well as the ability to launch batch files and even starting a program associated with a document file.

You need a thread for that
Look here: http://msdn.microsoft.com/en-us/library/y6h8hye8(v=vs.80).aspx
You are currently writing your code in the "main thread" (which usually is also your frame code).
So if you run something that takes time to complete it will halt the execution of your main thread, if you run it in a second thread your main thread will continue.
Update:
I've missed the part that you want to exit immediately.
execl() is likely what you want.
#include <unistd.h>
int main(){
execl("C:\\path\\to\\updater.exe", "updater.exe", (const char *) 0); /* argv[0] first, then the terminating null pointer */
return 0;
}
The suggested CreateProcess() can be used as well but execl is conforming to POSIX and would keep your code more portable (if you care at all).
#include <unistd.h>
extern char **environ;
int execl(const char *path, const char *arg, ...);
Update:
tested on Win-7 using gcc as compiler

Releasing C++ resources and fork-exec?

I'm trying to spawn a new process from my C++-project using fork-exec. I'm using fork-exec in order to create a bi-directional pipe to the child process. But I'm afraid my resources in the forked process won't get freed properly, since the exec-call will completely take over my process and is not going to call any destructors.
I tried circumventing this by throwing an exception and calling execl from a catch block at the end of main, but this solution doesn't destruct any singletons.
Is there any sensible way to achieve this safely? (hopefully avoiding any atExit hacks)
Ex: The following code outputs:
We are the child, gogo!
Parent proc, do nothing
Destroying object
Even though the forked process also has a copy of the singleton which needs to be destructed before I call execl.
#include <iostream>
#include <string>
#include <unistd.h>
using namespace std;
class Resources
{
public:
    ~Resources() { cout<<"Destroying object\n"; }
};
Resources& getRes()
{
    static Resources r1;
    return r1;
}
void makeChild(const string &command)
{
    pid_t pid = fork();
    switch(pid)
    {
    case -1:
        cout<<"Big error! Wtf!\n";
        return;
    case 0: // fork() returns 0 in the child
        cout<<"We are the child, gogo!\n";
        throw command;
    }
    cout<<"Parent proc, do nothing\n";
}
int main(int argc, char* argv[])
{
    try
    {
        Resources& ref = getRes();
        makeChild("child");
    }
    catch(const string &command)
    {
        // The argument list must end with a null pointer.
        execl(command.c_str(), command.c_str(), (char*)nullptr);
    }
    return 0;
}

There are excellent odds that you don't need to call any destructors in between fork and exec. Yeah, fork makes a copy of your entire process state, including objects that have destructors, and exec obliterates all that state. But does it actually matter? Can an observer from outside your program -- another, unrelated process running on the same computer -- tell that destructors weren't run in the child? If there's no way to tell, there's no need to run them.
Even if an external observer can tell, it may be actively wrong to run destructors in the child. The usual example for this is: imagine you wrote something to stdout before calling fork, but it got buffered in the library and so has not actually been delivered to the operating system yet. In that case, you must not call fclose or fflush on stdout in the child, or the output will happen twice! (This is also why you almost certainly should call _exit instead of exit if the exec fails.)
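To make the _exit point concrete, here is a small sketch of a fork/exec helper that handles a failed exec. The name spawn_and_wait is hypothetical, and 127 is only the conventional "command not found" status that shells use, chosen here by assumption.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork, exec `path` with no extra arguments, and
// return the child's exit status, or -1 if fork/waitpid failed or the
// child was killed by a signal.
int spawn_and_wait(const char* path)
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        execl(path, path, static_cast<char*>(nullptr));
        // Only reached if exec failed. _exit() skips atexit handlers,
        // static destructors and stdio flushing, so nothing the parent
        // had buffered before fork() gets written a second time.
        _exit(127);
    }
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling exit() instead of _exit() in the failure branch would run the copied process's cleanup, including a second flush of any output still sitting in its stdio buffers.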
Having said all that, there are two common cases where you might need to do some cleanup work in the child. One is file descriptors (do not confuse these with stdio FILEs or iostream objects) that should not be open after the exec. The correct way to deal with these is to set the FD_CLOEXEC flag on them as soon as possible after they are opened (some OSes allow you to do this in open itself, but that's not universal) and/or to loop from 3 to some large number calling close (not fclose) in the child. (FreeBSD has closefrom, but as far as I know, nobody else does, which is a shame because it's really quite handy.)
The other case is system-global thread locks, which (this is a thorny and poorly standardized area) may wind up held by both the parent and the child, and then inherited across exec into a process that has no idea it holds a lock. This is what pthread_atfork is supposed to be for, but I have read that in practice it doesn't work reliably. The only advice I can offer is "don't be holding any locks when you call fork", sorry.

Side effects of exit() without exiting?

If my application runs out of memory, I would like to re-run it with changed parameters. I have malloc / new in various parts of the application, the sizes of which are not known in advance. I see two options:
1. Track all memory allocations and write a restarting procedure which deallocates everything before re-running with changed parameters. (Of course, I free memory at the appropriate places if no errors occur.)
2. Restart the application (e.g., with WinExec() on Windows) and exit.
I am not thrilled by either solution. Did I miss an alternative, maybe?
Thanks

You could embed all the application functionality in a class. Then let it throw an exception when it runs out of memory. This exception would be caught by your application, and then you could simply destroy the object, construct a new one and try again. All in one application in one run, no need for restarts. Of course this might not be so easy, depending on what your application does...
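A rough sketch of that idea. The class Application, the function run_with_retry, the "halve the buffer" policy and the simulated memory ceiling are all assumptions made so the example behaves the same on any machine; real code would simply allocate and let new throw std::bad_alloc.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Arbitrary stand-in for "the machine ran out of memory" (1 MiB).
constexpr std::size_t kSimulatedLimit = 1u << 20;

// Stand-in for "all the application functionality in a class".
// Everything it allocates is owned by members, so destroying the
// object releases it.
class Application
{
public:
    explicit Application(std::size_t bufferBytes) : bufferBytes_(bufferBytes) {}

    void run()
    {
        // Simulated failure; a real application would just allocate.
        if (bufferBytes_ > kSimulatedLimit) {
            throw std::bad_alloc();
        }
        buffer_.resize(bufferBytes_);
    }

private:
    std::size_t bufferBytes_;
    std::vector<char> buffer_;
};

// Halves the requested size after each failure. Returns the size that
// worked, or 0 if no attempt succeeded.
std::size_t run_with_retry(std::size_t bufferBytes)
{
    while (bufferBytes > 0) {
        try {
            Application app(bufferBytes);
            app.run();
            return bufferBytes;
        } catch (const std::bad_alloc&) {
            // app has already been destroyed by stack unwinding, so
            // its memory is released before the next attempt.
            bufferBytes /= 2;
        }
    }
    return 0;
}
```

This only works if every allocation is owned by the object (or by something it owns); memory leaked through raw pointers or globals survives the retry.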

There is another option, one I have used in the past, however it requires having planned for it from the beginning, and it's not for the library-dependent programmer:
Create your own heap. It's a lot simpler to destroy a heap than to cleanup after yourself.
Doing so requires that your application is heap-aware. That means that all memory allocations have to go to that heap and not the default one. In C++ you can simply override the static new/delete operators which takes care of everything your code allocates, but you have to be VERY aware of how your libraries, even the standard library, use memory. It's not as simple as "never call a library method that allocates memory". You have to consider each library method on a case-by-case basis.
It sounds like you've already built your app and are looking for a shortcut to memory wiping. If that is the case, this will not help as you could never tack this kind of thing onto an already built application.

The wrapper program (as proposed before) does not need to be a separate executable. You could just fork, run your program in the child, and then test the return code of the child. This would have the additional benefit that the operating system automatically reclaims the child's memory when it dies (at least I think so).
Anyway, I imagined something like this (this is C, you might have to change the includes for C++):
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#define OUT_OF_MEMORY 99 /* or whatever, as long as it fits in 0..255 */
int main(void)
{
int pid, status;
fork_entry:
pid = fork();
if (pid == 0) {
/* child - call the main function of your program here */
} else if (pid > 0) {
/* parent (supervisor) */
wait(&status); /* waiting for the child to terminate */
/* see if child exited normally
(i.e. by calling exit(), _exit() or by returning from main()) */
if (WIFEXITED(status)) {
/* if so, we can get the status code */
if (WEXITSTATUS(status) == OUT_OF_MEMORY) {
/* change parameters */
goto fork_entry; /* forking again */
}
}
} else {
/* fork() error */
return 1;
}
return 0;
}
This might not be the most elegant solution/workaround/hack, but it's easy to do.

A way to accomplish this:
Define an exit status, perhaps like this:
static const int OUT_OF_MEMORY=99; // exit statuses are limited to 0..255
Set up a new handler (see std::set_new_handler) and have it do this:
exit(OUT_OF_MEMORY);
Then just wrap your program with another program that detects this
exit status. When it does, it can rerun the program.
Granted this is more of a workaround than a solution...
The wrapper program I mentioned above could be something like this:
#include <cstdlib>
#include <sys/wait.h> // WIFEXITED, WEXITSTATUS (POSIX)
static const int special_code = 99;
int main()
{
    const char* command = "whatever";
    int status = system(command);
    // On POSIX, system() returns a wait status, so decode it before comparing.
    while ( WIFEXITED(status) && WEXITSTATUS(status) == special_code )
    {
        command = ...;
        status = system(command);
    }
    return 0;
}
That's the basic idea. I would use std::string instead of char* in production. I'd probably also have another condition for breaking out of the while loop, some maximum number of tries perhaps.
Whatever the case, I think the fork/exec route mentioned below is pretty solid, and I'm pretty sure a solution like it could be created for Windows using spawn and its brethren.

simplicity rules: just restart your app with different parameters.
it is very hard to either track down all allocs/deallocs and clean up the memory (just forget some minor blocks inside bigger chunks [fragmentation] and you still have problems rerunning the class), or to introduce your own heap management (very clever people have invested years to bring nedmalloc etc. to life, do not fool yourself into the illusion this is an easy task).
so:
catch "out of memory" somehow (signals, or std::bad_alloc, or whatever)
create a new process of your app:
windows: CreateProcess() (you can just exit() your program after this, which cleans up all allocated resources for you)
unix: exec() (replaces the current process completely, so it "cleans up all the memory" for you)
done.

Be warned that on Linux, by default, your program can request more memory than the system has available. (This is done for a number of reasons, e.g. avoiding memory duplication when fork()ing a program into two with identical data, when most of the data will remain untouched.) Memory pages for this data won't be reserved by the system until you actually write to each page you've allocated.
Since there's no good way to report this (any memory write can cause your system to run out of memory), your process will be terminated by the out-of-memory killer, and you won't have the information or opportunity for your process to restart itself with different parameters.
You can change the default by using the setrlimit system call to set RLIMIT_AS, which caps the total address space your process can request. (RLIMIT_RSS sounds like the right limit, but current Linux kernels do not enforce it.) Only after you have done this will malloc return NULL or new throw a std::bad_alloc exception when you reach the limit that you have set.
Be aware that on a heavily loaded system, other processes can still contribute to a systemwide out of memory condition that could cause your program to be killed without malloc or new raising an error, but if you manage the system well, this can be avoided.

What is the closest thing Windows has to fork()?

I guess the question says it all.
I want to fork on Windows. What is the most similar operation and how do I use it.

Cygwin has a fully featured fork() on Windows. So if using Cygwin is acceptable for you and performance is not an issue, the problem is solved.
Otherwise you can take a look at how Cygwin implements fork(). From a rather old Cygwin architecture doc:
5.6. Process Creation
The fork call in Cygwin is particularly interesting because it does not map well on top of the Win32 API. This makes it very difficult to implement correctly. Currently, the Cygwin fork is a non-copy-on-write implementation similar to what was present in early flavors of UNIX.
The first thing that happens when a parent process forks a child process is that the parent initializes a space in the Cygwin process table for the child. It then creates a suspended child process using the Win32 CreateProcess call. Next, the parent process calls setjmp to save its own context and sets a pointer to this in a Cygwin shared memory area (shared among all Cygwin tasks). It then fills in the child's .data and .bss sections by copying from its own address space into the suspended child's address space. After the child's address space is initialized, the child is run while the parent waits on a mutex. The child discovers it has been forked and longjumps using the saved jump buffer. The child then sets the mutex the parent is waiting on and blocks on another mutex. This is the signal for the parent to copy its stack and heap into the child, after which it releases the mutex the child is waiting on and returns from the fork call. Finally, the child wakes from blocking on the last mutex, recreates any memory-mapped areas passed to it via the shared area, and returns from fork itself.
While we have some ideas as to how to speed up our fork implementation by reducing the number of context switches between the parent and child process, fork will almost certainly always be inefficient under Win32. Fortunately, in most circumstances the spawn family of calls provided by Cygwin can be substituted for a fork/exec pair with only a little effort. These calls map cleanly on top of the Win32 API. As a result, they are much more efficient. Changing the compiler's driver program to call spawn instead of fork was a trivial change and increased compilation speeds by twenty to thirty percent in our tests.
However, spawn and exec present their own set of difficulties. Because there is no way to do an actual exec under Win32, Cygwin has to invent its own Process IDs (PIDs). As a result, when a process performs multiple exec calls, there will be multiple Windows PIDs associated with a single Cygwin PID. In some cases, stubs of each of these Win32 processes may linger, waiting for their exec'd Cygwin process to exit.
Sounds like a lot of work, doesn't it? And yes, it is slooooow.
EDIT: the doc is outdated, please see this excellent answer for an update

I certainly don't know the details on this because I've never done it, but the native NT API has a capability to fork a process (the POSIX subsystem on Windows needs this capability; I'm not sure if the POSIX subsystem is even supported anymore).
A search for ZwCreateProcess() should get you some more details - for example this bit of information from Maxim Shatskih:
The most important parameter here is SectionHandle. If this parameter
is NULL, the kernel will fork the current process. Otherwise, this
parameter must be a handle of the SEC_IMAGE section object created on
the EXE file before calling ZwCreateProcess().
Though note that Corinna Vinschen indicates that Cygwin found using ZwCreateProcess() still unreliable:
Iker Arizmendi wrote:
> Because the Cygwin project relied solely on Win32 APIs its fork
> implementation is non-COW and inefficient in those cases where a fork
> is not followed by exec. It's also rather complex. See here (section
> 5.6) for details:
>
> http://www.redhat.com/support/wpapers/cygnus/cygnus_cygwin/architecture.html
This document is rather old, 10 years or so. While we're still using
Win32 calls to emulate fork, the method has changed noticably.
Especially, we don't create the child process in the suspended state
anymore, unless specific datastructes need a special handling in the
parent before they get copied to the child. In the current 1.5.25
release the only case for a suspended child are open sockets in the
parent. The upcoming 1.7.0 release will not suspend at all.
One reason not to use ZwCreateProcess was that up to the 1.5.25
release we're still supporting Windows 9x users. However, two
attempts to use ZwCreateProcess on NT-based systems failed for one
reason or another.
It would be really nice if this stuff would be better or at all
documented, especially a couple of datastructures and how to connect a
process to a subsystem. While fork is not a Win32 concept, I don't
see that it would be a bad thing to make fork easier to implement.

Well, Windows doesn't really have anything quite like it. Especially since fork can be used to conceptually create a thread or a process in *nix.
So, I'd have to say:
CreateProcess()/CreateProcessEx()
and
CreateThread() (I've heard that for C applications, _beginthreadex() is better).

People have tried to implement fork on Windows. This is the closest thing to it I can find:
Taken from: http://doxygen.scilab.org/5.3/d0/d8f/forkWindows_8c_source.html#l00216
static BOOL haveLoadedFunctionsForFork(void);
int fork(void)
{
HANDLE hProcess = 0, hThread = 0;
OBJECT_ATTRIBUTES oa = { sizeof(oa) };
MEMORY_BASIC_INFORMATION mbi;
CLIENT_ID cid;
USER_STACK stack;
PNT_TIB tib;
THREAD_BASIC_INFORMATION tbi;
CONTEXT context = {
CONTEXT_FULL |
CONTEXT_DEBUG_REGISTERS |
CONTEXT_FLOATING_POINT
};
if (setjmp(jenv) != 0) return 0; /* return as a child */
/* check whether the entry points are
initilized and get them if necessary */
if (!ZwCreateProcess && !haveLoadedFunctionsForFork()) return -1;
/* create forked process */
ZwCreateProcess(&hProcess, PROCESS_ALL_ACCESS, &oa,
NtCurrentProcess(), TRUE, 0, 0, 0);
/* set the Eip for the child process to our child function */
ZwGetContextThread(NtCurrentThread(), &context);
/* In x64 the Eip and Esp are not present,
their x64 counterparts are Rip and Rsp respectively. */
#if _WIN64
context.Rip = (ULONG)child_entry;
#else
context.Eip = (ULONG)child_entry;
#endif
#if _WIN64
ZwQueryVirtualMemory(NtCurrentProcess(), (PVOID)context.Rsp,
MemoryBasicInformation, &mbi, sizeof mbi, 0);
#else
ZwQueryVirtualMemory(NtCurrentProcess(), (PVOID)context.Esp,
MemoryBasicInformation, &mbi, sizeof mbi, 0);
#endif
stack.FixedStackBase = 0;
stack.FixedStackLimit = 0;
stack.ExpandableStackBase = (PCHAR)mbi.BaseAddress + mbi.RegionSize;
stack.ExpandableStackLimit = mbi.BaseAddress;
stack.ExpandableStackBottom = mbi.AllocationBase;
/* create thread using the modified context and stack */
ZwCreateThread(&hThread, THREAD_ALL_ACCESS, &oa, hProcess,
&cid, &context, &stack, TRUE);
/* copy exception table */
ZwQueryInformationThread(NtCurrentThread(), ThreadBasicInformation,
&tbi, sizeof tbi, 0);
tib = (PNT_TIB)tbi.TebBaseAddress;
ZwQueryInformationThread(hThread, ThreadBasicInformation,
&tbi, sizeof tbi, 0);
ZwWriteVirtualMemory(hProcess, tbi.TebBaseAddress,
&tib->ExceptionList, sizeof tib->ExceptionList, 0);
/* start (resume really) the child */
ZwResumeThread(hThread, 0);
/* clean up */
ZwClose(hThread);
ZwClose(hProcess);
/* exit with child's pid */
return (int)cid.UniqueProcess;
}
static BOOL haveLoadedFunctionsForFork(void)
{
HANDLE ntdll = GetModuleHandle("ntdll");
if (ntdll == NULL) return FALSE;
if (ZwCreateProcess && ZwQuerySystemInformation && ZwQueryVirtualMemory &&
ZwCreateThread && ZwGetContextThread && ZwResumeThread &&
ZwQueryInformationThread && ZwWriteVirtualMemory && ZwClose)
{
return TRUE;
}
ZwCreateProcess = (ZwCreateProcess_t) GetProcAddress(ntdll,
"ZwCreateProcess");
ZwQuerySystemInformation = (ZwQuerySystemInformation_t)
GetProcAddress(ntdll, "ZwQuerySystemInformation");
ZwQueryVirtualMemory = (ZwQueryVirtualMemory_t)
GetProcAddress(ntdll, "ZwQueryVirtualMemory");
ZwCreateThread = (ZwCreateThread_t)
GetProcAddress(ntdll, "ZwCreateThread");
ZwGetContextThread = (ZwGetContextThread_t)
GetProcAddress(ntdll, "ZwGetContextThread");
ZwResumeThread = (ZwResumeThread_t)
GetProcAddress(ntdll, "ZwResumeThread");
ZwQueryInformationThread = (ZwQueryInformationThread_t)
GetProcAddress(ntdll, "ZwQueryInformationThread");
ZwWriteVirtualMemory = (ZwWriteVirtualMemory_t)
GetProcAddress(ntdll, "ZwWriteVirtualMemory");
ZwClose = (ZwClose_t) GetProcAddress(ntdll, "ZwClose");
if (ZwCreateProcess && ZwQuerySystemInformation && ZwQueryVirtualMemory &&
ZwCreateThread && ZwGetContextThread && ZwResumeThread &&
ZwQueryInformationThread && ZwWriteVirtualMemory && ZwClose)
{
return TRUE;
}
else
{
ZwCreateProcess = NULL;
ZwQuerySystemInformation = NULL;
ZwQueryVirtualMemory = NULL;
ZwCreateThread = NULL;
ZwGetContextThread = NULL;
ZwResumeThread = NULL;
ZwQueryInformationThread = NULL;
ZwWriteVirtualMemory = NULL;
ZwClose = NULL;
}
return FALSE;
}

Prior to Microsoft introducing their new "Windows Subsystem for Linux" option, CreateProcess() was the closest thing Windows had to fork(), but Windows requires you to specify an executable to run in that process.
The UNIX process creation is quite different to Windows. Its fork() call basically duplicates the current process almost in total, each in their own address space, and continues running them separately. While the processes themselves are different, they are still running the same program. See here for a good overview of the fork/exec model.
Going back the other way, the equivalent of the Windows CreateProcess() is the fork()/exec() pair of functions in UNIX.
If you were porting software to Windows and you don't mind a translation layer, Cygwin provided the capability that you want but it was rather kludgey.
Of course, with the new Linux subsystem, the closest thing Windows has to fork() is actually fork() :-)

As other answers have mentioned, NT (the kernel underlying modern versions of Windows) has an equivalent of Unix fork(). That's not the problem.
The problem is that cloning a process's entire state is not generally a sane thing to do. This is as true in the Unix world as it is in Windows, but in the Unix world, fork() is used all the time, and libraries are designed to deal with it. Windows libraries aren't.
For example, the system DLLs kernel32.dll and user32.dll maintain a private connection to the Win32 server process csrss.exe. After a fork, there are two processes on the client end of that connection, which is going to cause problems. The child process should inform csrss.exe of its existence and make a new connection – but there's no interface to do that, because these libraries weren't designed with fork() in mind.
So you have two choices. One is to forbid the use of kernel32 and user32 and other libraries that aren't designed to be forked – including any libraries that link directly or indirectly to kernel32 or user32, which is virtually all of them. This means that you can't interact with the Windows desktop at all, and are stuck in your own separate Unixy world. This is the approach taken by the various Unix subsystems for NT.
The other option is to resort to some sort of horrible hack to try to get unaware libraries to work with fork(). That's what Cygwin does. It creates a new process, lets it initialize (including registering itself with csrss.exe), then copies most of the dynamic state over from the old process and hopes for the best. It amazes me that this ever works. It certainly doesn't work reliably – even if it doesn't randomly fail due to an address space conflict, any library you're using may be silently left in a broken state. The claim of the current accepted answer that Cygwin has a "fully-featured fork()" is... dubious.
Summary: In an Interix-like environment, you can fork by calling fork(). Otherwise, please try to wean yourself from the desire to do it. Even if you're targeting Cygwin, don't use fork() unless you absolutely have to.

The following document provides some information on porting code from UNIX to Win32:
https://msdn.microsoft.com/en-us/library/y23kc048.aspx
Among other things, it indicates that the process model is quite different between the two systems and recommends consideration of CreateProcess and CreateThread where fork()-like behavior is required.

"as soon as you want to do file access or printf then io are refused"
You cannot have your cake and eat it too... In msvcrt.dll, printf() is based on the Console API, which itself uses LPC to communicate with the console subsystem (csrss.exe). The connection with csrss is initiated at process start-up, which means that any process that begins its execution "in the middle" will have that step skipped. Unless you have access to the source code of the operating system, there is no point in trying to connect to csrss manually. Instead, you should create your own subsystem, and accordingly avoid the console functions in applications that use fork().
Once you have implemented your own subsystem, don't forget to also duplicate all of the parent's handles for the child process ;-)
"Also, you probably shouldn't use the Zw* functions unless you're in kernel mode, you should probably use the Nt* functions instead."
This is incorrect. When accessed in user mode, there is absolutely no difference between Zw* and Nt*; these are merely two different (ntdll.dll) exported names that refer to the same (relative) virtual address.
ZwGetContextThread(NtCurrentThread(), &context);
Obtaining the context of the current (running) thread by calling ZwGetContextThread is wrong, is likely to crash, and (due to the extra system call) is also not the fastest way to accomplish the task.

Your best options are CreateProcess() or CreateThread(). There is more information on porting here.

There is no easy way to emulate fork() on Windows.
I suggest you use threads instead.

fork() semantics are necessary where the child needs access to the actual memory state of the parent as of the instant fork() is called. I have a piece of software which relies on the implicit mutex of memory copying as of the instant fork() is called, which makes threads impossible to use. (This is emulated on modern *nix platforms via copy-on-write/update-memory-table semantics.)
The closest that exists on Windows as a syscall is CreateProcess. The best that can be done is for the parent to freeze all other threads during the time that it is copying memory over to the new process's memory space, then thaw them. Neither the Cygwin frok [sic] class nor the Scilab code that Eric des Courtis posted does the thread-freezing, that I can see.
Also, you probably shouldn't use the Zw* functions unless you're in kernel mode, you should probably use the Nt* functions instead. There's an extra branch that checks whether you're in kernel mode and, if not, performs all of the bounds checking and parameter verification that Nt* always do. Thus, it's very slightly less efficient to call them from user mode.

The closest you say... Let me think... This must be fork() I guess :)
For details see Does Interix implement fork()?

Most of the hacky solutions are outdated. The Winnie fuzzer has a version of fork that works on current versions of Windows 10 (though this requires system-specific offsets and can break easily too).
https://github.com/sslab-gatech/winnie/tree/master/forklib

If you only care about creating a subprocess and waiting for it, perhaps _spawn* API's in process.h are sufficient. Here's more information about that:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/process-and-environment-control
https://en.wikipedia.org/wiki/Process.h
