Where is the pid stored?

I have the following question in an assignment:
Every second, a process calls the following function:
#include <string>
#include <ctime>       // time_t
#include <sys/types.h> // pid_t
#include <unistd.h>    // getpid()
using namespace std;
string create_file_name(time_t timestamp) {
    pid_t pid = getpid();
    string s = "results-" + to_string(pid) + to_string(timestamp);
    return s;
}
The question is: where does the kernel store the process's PID?
There are five possible answers:
user's stack / kernel's stack / heap / PCB / runqueue
In general, I know that the PID is stored in the PCB, but in this case, shouldn't it also be stored on the user's stack (since pid is a local variable)?
The question seems to have only one correct answer, so I am quite confused.

As the getpid(2) man page says:
From glibc version 2.3.4 up to and including version 2.24, the glibc
wrapper function for getpid() cached PIDs, with the goal of avoiding
additional system calls when a process calls getpid() repeatedly.
Normally this caching was invisible, but its correct operation relied
on support in the wrapper functions for fork(2), vfork(2), and
clone(2): if an application bypassed the glibc wrappers for these
system calls by using syscall(2), then a call to getpid() in the
child would return the wrong value (to be precise: it would return
the PID of the parent process). In addition, there were cases where
getpid() could return the wrong value even when invoking clone(2) via
the glibc wrapper function. (For a discussion of one such case, see
BUGS in clone(2).) Furthermore, the complexity of the caching code
had been the source of a few bugs within glibc over the years.
Because of the aforementioned problems, since glibc version 2.25, the
PID cache is removed: calls to getpid() always invoke the actual
system call, rather than returning a cached value.
On Alpha, instead of a pair of getpid() and getppid() system calls, a
single getxpid() system call is provided, which returns a pair of PID
and parent PID. The glibc getpid() and getppid() wrapper functions
transparently deal with this. See syscall(2) for details regarding
register mapping.
It depends on the glibc version you use. Some versions maintain a cache of the PID, while others invoke the system call every time to get the PID of the process. If you want to know how the system call works, I suggest looking at the kernel code.
You can find the getpid() function at this link (you can change the kernel version and navigate the source code to reconstruct how the getpid() syscall works).
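To see the two paths side by side, here is a minimal Linux sketch, assuming glibc or musl (SYS_getpid comes from <sys/syscall.h>). It compares the getpid() wrapper with a raw syscall(SYS_getpid), which always enters the kernel and reads the PID from the task's kernel-side structure (the PCB; task_struct on Linux). The helper name getpid_paths_agree is hypothetical.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if the libc wrapper and the raw system call agree.
   In glibc 2.3.4 to 2.24 the wrapper could answer from a user-space cache;
   syscall(SYS_getpid) always asks the kernel. */
bool getpid_paths_agree(void)
{
    pid_t from_wrapper = getpid();
    pid_t from_kernel = (pid_t)syscall(SYS_getpid);
    return from_wrapper == from_kernel;
}
```

In the assignment's function, pid only holds a local copy of the value the kernel returned; the authoritative copy stays in the kernel's per-process structure.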

Related

Program segfaults on alpine linux. How do I resolve it?

I've been working on a webrtc datachannel library in C/C++ and wrote a program in C to:
Create two peers from the same process.
Establish a connection between them.
Close the connection if it's successful.
Everything runs fine in a Debian Docker container and on my openSUSE Tumbleweed host (all x86_64, 64-bit), but in an Alpine Linux container (64-bit x86_64), I'm getting a SEGFAULT inside the child processes:
The function above is from the program's dependency "libnice". It seems that *agent == NULL, yet there is no way it could have been made NULL in the caller's scope. I even inserted printf("Argument is %p", agent); right before the function call; it prints the address, and I can verify it's not NULL. From the disassembly, it looks like the instruction that copies the agent's contents (0x557a1d20) into a local variable on the callee's stack is what segfaults. The segfault always occurs at this point, even after a make clean and recompilation. A failure while setting up the activation record? Stack corruption?
UPDATE: I made a more lightweight container and ran it, and now it segfaults at a different place in that same priv_conn_keepalive_tick_unlocked. The argument does seem to be set, though (notice the 0x7ffff7f9ad08):
Since I thought I might be hitting musl libc's default stack limit of 80k, I used getrlimit(RLIMIT_STACK, &rl) to obtain the stack size, and it is already 8 MB, not 80k. Increasing this limit further makes no difference, except that if I set it above 8 MB, my program crashes early inside GDB, which reports an unknown signal "? ?". Outside GDB, it crashes at the usual point, as it does without the altered stack size.
I'm not sure what exactly the problem is (stack corruption?) and what to do next to resolve this.
Here's my program's flow:
For every peer that is created, a child process is created with fork(). Parent <--> child communication is done with ZeroMQ, and I use protocol buffers to forward any callbacks (and their arguments) that are triggered inside the child to an event loop running in the parent process.
So for the above program, there are 2 child processes and 1 parent process.
Steps to reproduce:
Source file: https://github.com/hamon-in/librtcdcpp/blob/alpine-test/examples/websocket_client/2in1.c
Alpine docker container: https://github.com/hamon-in/librtcdcpp/blob/alpine-test/Dockerfile.amd64
Run the container; the binary is located at /psl-librtcdcpp/examples/websocket_client/2in1
2in1 will spawn two child processes both of which will crash.

On further investigation, the crash is in an instruction writing at a mildly large negative offset from the stack base pointer, so it's probably just a simple stack overflow.
The right way to fix this is reducing the excess stack usage or explicitly requesting a large stack at pthread_create time, but I don't see where pthread_create is being called from. A quick check to verify that this is the problem would be to override the default stack size for new threads by performing the following somewhere early in the program:
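#define _GNU_SOURCE   /* before any #include: pthread_setattr_default_np() is a GNU extension */
#include <pthread.h>

/* early in main(), before any threads are created: */
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 1 << 20); /* 1 MB */
pthread_setattr_default_np(&attr);
pthread_attr_destroy(&attr);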
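If you can find (or control) the pthread_create() call, the same check can be done per thread instead of changing process-wide defaults. A minimal sketch, assuming POSIX threads; start_thread_with_stack and stack_hungry_worker are hypothetical names, and the 256 KB local buffer merely stands in for whatever deep call chain overflows musl's small default thread stack.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: like pthread_create(), but with an explicit stack size. */
int start_thread_with_stack(pthread_t *tid, void *(*fn)(void *), void *arg,
                            size_t stack_bytes)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes); /* EINVAL if too small */
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);                        /* the new thread keeps its own copy */
    return rc;
}

/* Stand-in for code with a large stack footprint (~256 KB of locals). */
void *stack_hungry_worker(void *arg)
{
    (void)arg;
    volatile unsigned char buf[256 * 1024];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = 1;
    return (void *)(uintptr_t)buf[sizeof buf - 1];
}
```

Started with a 1 MB stack the worker finishes normally; with a default stack smaller than its buffer it would crash the way the question describes.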

Add -Werror=implicit-function-declaration to your CFLAGS and you'll immediately have the cause. The key clue is the pointer value 0x557a1d20, which is almost surely the result of truncating a pointer to 32 bits. This happens when you fail to declare a function that returns a pointer and the compiler (by an awful default kept for backwards compatibility) assumes it returns int rather than producing an error, then subsequently allows the implicit conversion from int to pointer despite the C language disallowing it.
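To make the truncation concrete, here is an illustrative sketch (not libnice code) of what an x86-64 caller ends up with when it believes a pointer-returning function returns int: only the low 32 bits survive and are sign-extended back to pointer width. The full address 0x5555557a1d20 below is a made-up example of a typical PIE heap address; simulate_implicit_int_return is a hypothetical name.

```c
#include <stdint.h>

/* Illustration only: the caller assumes an int return, keeps the low
   32 bits (eax on x86-64), and sign-extends them back to 64 bits. */
void *simulate_implicit_int_return(const void *real)
{
    uint32_t low = (uint32_t)(uintptr_t)real;   /* the part that fits in an int */
    int32_t as_int = (int32_t)low;              /* implementation-defined above INT32_MAX;
                                                   GCC and Clang wrap around */
    return (void *)(intptr_t)as_int;            /* widened back to pointer size */
}
```

The real fix is simply to #include the header that declares the function, which -Werror=implicit-function-declaration forces you to do.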

How to map pthread_t to pid (on Linux)

Is there a sane way to map a pthread_t value (as returned from pthread_create() or std::thread::native_handle()) to the thread's pid (tid) in Linux? Before someone gets duplicate-happy, this is not about finding a thread's own pid (which can be done with gettid()).
The insane way would be to somehow compel a thread to call gettid() and pass along the result, but that's way too much trouble.
One of the possible applications I have in mind is to reconcile threads created within program (where pthread_t is available) with output provided by ps -T.

One (convoluted, non-portable, Linux-specific, lightly destructive) method of mapping pthread_t to tid without looking into struct pthread is as follows:
Use pthread_setname_np to set a thread name to something unique.
Iterate over subdirectories of /proc/self/task and read a line from a file named comm in each of those.
If the line equals the unique string just used, extract the tid from the last component of the subdirectory name. This is your answer.
The thread name is not used by the OS for anything, so it should be safe to change it. Nevertheless you probably want to set it back to the value it had originally (use pthread_getname_np to obtain it).
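The steps above could look roughly like this; a minimal sketch assuming Linux and glibc (pthread_getname_np/pthread_setname_np are GNU extensions, and thread names are limited to 15 characters plus NUL). tid_of_pthread and the "tidq" prefix are hypothetical.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux/glibc-only sketch of the rename-and-scan trick. Returns the kernel TID
   of `thread`, or -1 if it cannot be found. Not safe if two callers probe the
   same target thread at the same time. */
pid_t tid_of_pthread(pthread_t thread)
{
    static atomic_uint counter;
    char saved[16], probe[16], path[300], line[32];
    pid_t tid = -1;

    if (pthread_getname_np(thread, saved, sizeof saved) != 0)
        return -1;

    /* A name that is unique within the process: at most 14 characters. */
    snprintf(probe, sizeof probe, "tidq%u", atomic_fetch_add(&counter, 1));
    if (pthread_setname_np(thread, probe) != 0)
        return -1;

    DIR *dir = opendir("/proc/self/task");
    if (dir != NULL) {
        struct dirent *ent;
        while (tid == -1 && (ent = readdir(dir)) != NULL) {
            if (ent->d_name[0] == '.')
                continue;                       /* skip "." and ".." */
            snprintf(path, sizeof path, "/proc/self/task/%s/comm", ent->d_name);
            FILE *f = fopen(path, "r");
            if (f == NULL)
                continue;                       /* the thread may have exited */
            if (fgets(line, sizeof line, f) != NULL) {
                line[strcspn(line, "\n")] = '\0';
                if (strcmp(line, probe) == 0)
                    tid = (pid_t)atoi(ent->d_name);
            }
            fclose(f);
        }
        closedir(dir);
    }

    pthread_setname_np(thread, saved);          /* restore the original name */
    return tid;
}
```

For the main thread the result should equal getpid(); for other threads it should match the TIDs that ps -T prints.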

A somewhat hacky way of doing this on Linux would be with the process_vm_readv system call. If you know the pthread_t, use it to copy the memory over from the remote process to a local buffer, then cast the pointer to a struct pthread and retrieve the value of the tid field.

Pthreads are POSIX Threads.
pthread_t is a typedef to some kind of long, depending on your architecture (glibc uses unsigned long).
It is actually a pointer cast to an internal struct pthread, as mentioned above.
It is deliberate that the struct pthread itself is not exposed.
Threading is highly dependent on the underlying OS. Threads are not implemented the same way on all Unix-flavored OSes.
I don't believe gettid is even a POSIX function; I believe it is Linux-specific.
You can have a look at the glibc/nptl source code for the Linux-specific implementation of struct pthread.
See https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/descr.h;h=fdeb397eab94730a5dab3181abcdae815ed6914e;hb=48a8f8328122ab8d06b7333cb87be46feeaf7cca
But I believe what you are looking for is getpid. It is the process ID, not the thread ID.

Does the process automatically clean up the resources taken by pthreads upon exit

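Assume that I have code something like this:
#include <pthread.h>

struct my_resource {      /* allocated by code outside my control; other members omitted */
    pthread_t tid;
};

void *my_thread(void *data)
{
    (void)data;
    while (1) { }         /* never returns */
}
void foo_init(struct my_resource *res)
{
    pthread_create(&res->tid, NULL, my_thread, res);
    /* Some init code */
}
void foo_exit(void)
{
    /* Some exit code */
}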
The scenario is something like this. When the process is initialized, the function foo_init() is called with a pointer to my allocated resources (the allocation is done automatically by some other function, which isn't under my control). Within that function I create a pthread, which runs an infinite loop.
After a while, when the process is about to terminate, the function foo_exit() is called, but this time without the pointer to my resources, so I am unable to call pthread_join(), as my tid is contained within the my_resource structure.
My question is whether the resources pertaining to the pthreads are released by the OS when the process terminates. If so, how can I make sure of that?
Also, is it safe to terminate the process without calling pthread_join()?
Thanks in advance.

If you're talking about allocated memory, yes. When a process exits all virtual memory pages allocated to that process are returned to the system, which will clean up all memory allocated within your process.
Generally the OS is supposed to clean up all resources associated with a process on exit. It will handle closing file handles (which can include sockets and RPC mechanisms), wiping away the stack, and cleaning up kernel resources for the task.
Short answer, if the OS doesn't clean up after a process it is a bug in the OS. But none of us write buggy software right?

All "regular" resources needed by a process are released automatically by the OS when the process terminates (e.g. memory, sockets, file handles). The most important exception is shared memory but also other resources can be problematic if they're managed not by OS but by other processes.
For example if your process talks to a daemon or to another process like a window manager and allocates resources, whether or not those are released in case the process terminates without releasing them depends on the implementation.

I think the question can be answered another way: pthreads do not own any resources, resources are owned by the process. (A pthread may be the "custodian" of resources, such as memory it has malloc'ed, but it is not the owner.) When the process terminates, any still running pthreads suddenly stop and then the usual process clean-up happens.
POSIX says (for _Exit()):
• Threads terminated by a call to _Exit() or _exit() shall not invoke their cancellation cleanup handlers or per-thread data destructors.
For exit(), POSIX specifies a little more clean-up (in particular running all atexit() handlers and flushing streams) before proceeding as if by _Exit(). Note that this does not invoke any pthread cancellation cleanup for any pthread: the system cannot tell what state any pthread is in, and cannot be sure of being able to pthread_cancel() all pthreads, so it does the only thing it can do, which is to stop them all dead.
I can recommend the Single UNIX® Specification (POSIX) -- like any standard, it's not an easy read, but worth getting to know.
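This is easy to observe with a throwaway child process. A minimal POSIX sketch (Linux-tested assumptions; all names are hypothetical): the child starts a thread shaped like my_thread, which installs a cleanup handler that would write a byte to a pipe, then the child calls exit() without joining or cancelling it. The parent checks whether the byte ever arrived.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;              /* write end of a pipe back to the parent */
static atomic_int handler_pushed;

static void report_cleanup(void *arg)
{
    (void)arg;
    char c = 'C';
    ssize_t ignored = write(report_fd, &c, 1);  /* would tell the parent "I ran" */
    (void)ignored;
}

static void *spinner(void *arg)
{
    (void)arg;
    pthread_cleanup_push(report_cleanup, NULL);
    atomic_store(&handler_pushed, 1);
    for (;;)
        pause();                        /* stands in for the question's while (1) { } */
    pthread_cleanup_pop(0);             /* never reached; the macros must be paired */
    return NULL;
}

/* Returns 1 if the cleanup handler ran, 0 if not, -1 on setup errors;
   stores the child's exit status in *child_status. */
int cleanup_ran_on_exit(int status, int *child_status)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child */
        pthread_t t;
        close(fds[0]);
        report_fd = fds[1];
        if (pthread_create(&t, NULL, spinner, NULL) != 0)
            _exit(99);
        while (!atomic_load(&handler_pushed))
            sched_yield();              /* wait until the handler is installed */
        exit(status);                   /* no pthread_join(), no pthread_cancel() */
    }
    close(fds[1]);                      /* parent */
    char c;
    ssize_t n = read(fds[0], &c, 1);    /* 0 (EOF) once the child has exited */
    close(fds[0]);
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    *child_status = WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
    return n == 1 ? 1 : 0;
}
```

On a conforming system cleanup_ran_on_exit(7, &st) returns 0 with st == 7: the spinning thread is simply stopped, its cleanup handler never runs, and the kernel still reclaims the pipe and all memory.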

how can I tell if pthread_self is the main (first) thread in the process?

background: I'm working on a logging library that is used by many programs.
I'm assigning a human-readable name for each thread, the main thread should get "main", but I'd like to be able to detect that state from within the library without requiring code at the beginning of each main() function.
Also note: The library code will not always be entered first from the main thread.

This is kinda doable, depending on the platform you're on, but absolutely not in any portable and generic way...
Mac OS X seems to be the only one with a direct and documented approach, according to their pthread.h file:
/* returns non-zero if the current thread is the main thread */
int pthread_main_np(void);
I also found that FreeBSD has a pthread_np.h header that defines pthread_main_np(), so this should work on FreeBSD too (8.1 at least), and OpenBSD (4.8 at least) has pthread_main_np() defined in pthread.h too. Note that _np explicitly stands for non-portable!
Otherwise, the only more "general" approach that comes to mind is comparing the PID of the process to the TID of the current thread; if they match, that thread is the main thread.
This does not necessarily work on all platforms, it depends on if you can actually get a TID at all (you can't in OpenBSD for example), and if you do, if it has any relation to the PID at all or if the threading subsystem has its own accounting that doesn't necessarily relate.
I also found that some platforms give back constant values as TID for the main thread, so you can just check for those.
A brief summary of platforms I've checked:
Linux: possible here, syscall(SYS_gettid) == getpid() is what you want
FreeBSD: not possible here, thr_self() seems random and without relation to getpid()
OpenBSD: not possible here, there is no way to get a TID
NetBSD: possible here, _lwp_self() always returns 1 for the main thread
Solaris: possible here, pthread_self() always returns 1 for the main thread
So basically you should be able to do it directly on Mac OS X, FreeBSD and OpenBSD.
You can use the TID == PID approach on Linux.
You can use the TID == 1 approach on NetBSD and Solaris.
I hope this helps, have a good day!
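For the Linux case, the TID == PID check is a one-liner. A minimal sketch, assuming Linux; it uses syscall(SYS_gettid) because a gettid() wrapper only exists in newer glibc (2.30+). is_main_thread and report_is_main are hypothetical names.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-only: the main thread's kernel TID is equal to the process ID. */
bool is_main_thread(void)
{
    return (pid_t)syscall(SYS_gettid) == getpid();
}

/* Thread body used to check the answer from a freshly created thread. */
void *report_is_main(void *out)
{
    *(bool *)out = is_main_thread();
    return NULL;
}
```

Because it needs no setup in main(), this fits the logging-library use case in the question, at the cost of portability.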

Call pthread_self() from main() and record the result. Compare later pthread_self() values against the stored one with pthread_equal() to know whether you're on the main thread.
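A minimal sketch of that approach, assuming the recording call happens in main() before other threads start (which the question would prefer to avoid). The names are hypothetical; pthread_t is opaque, hence pthread_equal() instead of ==.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical logging-library state: the pthread_t recorded from main(). */
static pthread_t main_thread_id;
static bool main_thread_recorded;

/* Call once from main(), before any other thread is started. */
void log_record_main_thread(void)
{
    main_thread_id = pthread_self();
    main_thread_recorded = true;
}

/* pthread_t may be a struct on some systems, so compare with pthread_equal(). */
bool log_on_main_thread(void)
{
    return main_thread_recorded && pthread_equal(pthread_self(), main_thread_id);
}

/* Thread body used to check the answer from another thread. */
void *report_on_main(void *out)
{
    *(bool *)out = log_on_main_thread();
    return NULL;
}
```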

You can use some kind of shared name resource that lets threads register a name when they are spawned (perhaps a map of thread ID to name). Your logging system can then call a function that looks up the name via the thread ID in a thread-safe manner.
When a thread dies, have it remove its name from the mapping to avoid leaking memory.
This method should allow all threads to be named, not just main.
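A minimal sketch of such a registry, assuming a small fixed-size table guarded by a mutex is acceptable; the table sizes and all function names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMED_THREADS 64
#define MAX_THREAD_NAME 32

struct thread_name_entry {
    bool used;
    pthread_t thread;
    char name[MAX_THREAD_NAME];
};

static struct thread_name_entry registry[MAX_NAMED_THREADS];
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold registry_lock. Returns the slot for `t`, or -1. */
static int find_slot_locked(pthread_t t)
{
    for (int i = 0; i < MAX_NAMED_THREADS; i++)
        if (registry[i].used && pthread_equal(registry[i].thread, t))
            return i;
    return -1;
}

/* Names (or renames) the calling thread; returns false if the table is full. */
bool thread_name_register(const char *name)
{
    pthread_t self = pthread_self();
    pthread_mutex_lock(&registry_lock);
    int slot = find_slot_locked(self);
    for (int i = 0; slot < 0 && i < MAX_NAMED_THREADS; i++)
        if (!registry[i].used)
            slot = i;
    if (slot >= 0) {
        registry[slot].used = true;
        registry[slot].thread = self;
        snprintf(registry[slot].name, sizeof registry[slot].name, "%s", name);
    }
    pthread_mutex_unlock(&registry_lock);
    return slot >= 0;
}

/* Copies the calling thread's name into out; returns false if it has none. */
bool thread_name_lookup(char *out, size_t out_len)
{
    pthread_mutex_lock(&registry_lock);
    int slot = find_slot_locked(pthread_self());
    if (slot >= 0 && out_len > 0)
        snprintf(out, out_len, "%s", registry[slot].name);
    pthread_mutex_unlock(&registry_lock);
    return slot >= 0;
}

/* Call before the thread exits so its slot can be reused. */
void thread_name_unregister(void)
{
    pthread_mutex_lock(&registry_lock);
    int slot = find_slot_locked(pthread_self());
    if (slot >= 0)
        registry[slot].used = false;
    pthread_mutex_unlock(&registry_lock);
}
```

The logger would call thread_name_lookup() and fall back to a numeric ID when it returns false.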

What is the closest thing Windows has to fork()?

I guess the question says it all.
I want to fork on Windows. What is the most similar operation and how do I use it.

Cygwin has a fully featured fork() on Windows. So if using Cygwin is acceptable to you and performance is not an issue, the problem is solved.
Otherwise, you can take a look at how Cygwin implements fork(). From a rather old Cygwin architecture document:
5.6. Process Creation
The fork call in Cygwin is particularly interesting because it does not map well on top of the Win32 API. This makes it very difficult to implement correctly. Currently, the Cygwin fork is a non-copy-on-write implementation similar to what was present in early flavors of UNIX.
The first thing that happens when a parent process forks a child process is that the parent initializes a space in the Cygwin process table for the child. It then creates a suspended child process using the Win32 CreateProcess call. Next, the parent process calls setjmp to save its own context and sets a pointer to this in a Cygwin shared memory area (shared among all Cygwin tasks). It then fills in the child's .data and .bss sections by copying from its own address space into the suspended child's address space. After the child's address space is initialized, the child is run while the parent waits on a mutex. The child discovers it has been forked and longjumps using the saved jump buffer. The child then sets the mutex the parent is waiting on and blocks on another mutex. This is the signal for the parent to copy its stack and heap into the child, after which it releases the mutex the child is waiting on and returns from the fork call. Finally, the child wakes from blocking on the last mutex, recreates any memory-mapped areas passed to it via the shared area, and returns from fork itself.
While we have some ideas as to how to speed up our fork implementation by reducing the number of context switches between the parent and child process, fork will almost certainly always be inefficient under Win32. Fortunately, in most circumstances the spawn family of calls provided by Cygwin can be substituted for a fork/exec pair with only a little effort. These calls map cleanly on top of the Win32 API. As a result, they are much more efficient. Changing the compiler's driver program to call spawn instead of fork was a trivial change and increased compilation speeds by twenty to thirty percent in our tests.
However, spawn and exec present their own set of difficulties. Because there is no way to do an actual exec under Win32, Cygwin has to invent its own Process IDs (PIDs). As a result, when a process performs multiple exec calls, there will be multiple Windows PIDs associated with a single Cygwin PID. In some cases, stubs of each of these Win32 processes may linger, waiting for their exec'd Cygwin process to exit.
Sounds like a lot of work, doesn't it? And yes, it is slooooow.
EDIT: the document is outdated; please see this excellent answer for an update.

I certainly don't know the details on this because I've never done it, but the native NT API has a capability to fork a process (the POSIX subsystem on Windows needs this capability; I'm not sure if the POSIX subsystem is even supported anymore).
A search for ZwCreateProcess() should get you some more details - for example this bit of information from Maxim Shatskih:
The most important parameter here is SectionHandle. If this parameter
is NULL, the kernel will fork the current process. Otherwise, this
parameter must be a handle of the SEC_IMAGE section object created on
the EXE file before calling ZwCreateProcess().
Though note that Corinna Vinschen indicates that Cygwin found using ZwCreateProcess() still unreliable:
Iker Arizmendi wrote:
> Because the Cygwin project relied solely on Win32 APIs its fork
> implementation is non-COW and inefficient in those cases where a fork
> is not followed by exec. It's also rather complex. See here (section
> 5.6) for details:
>
> http://www.redhat.com/support/wpapers/cygnus/cygnus_cygwin/architecture.html
This document is rather old, 10 years or so. While we're still using
Win32 calls to emulate fork, the method has changed noticeably.
Especially, we don't create the child process in the suspended state
anymore, unless specific data structures need a special handling in the
parent before they get copied to the child. In the current 1.5.25
release the only case for a suspended child are open sockets in the
parent. The upcoming 1.7.0 release will not suspend at all.
One reason not to use ZwCreateProcess was that up to the 1.5.25
release we're still supporting Windows 9x users. However, two
attempts to use ZwCreateProcess on NT-based systems failed for one
reason or another.
It would be really nice if this stuff would be better or at all
documented, especially a couple of datastructures and how to connect a
process to a subsystem. While fork is not a Win32 concept, I don't
see that it would be a bad thing to make fork easier to implement.

Well, Windows doesn't really have anything quite like it, especially since fork can be used to conceptually create either a thread or a process in *nix.
So, I'd have to say:
CreateProcess()/CreateProcessEx()
and
CreateThread() (I've heard that for C applications, _beginthreadex() is better).

People have tried to implement fork on Windows. This is the closest thing to it I can find:
Taken from: http://doxygen.scilab.org/5.3/d0/d8f/forkWindows_8c_source.html#l00216
/* jenv (a jmp_buf), child_entry, the Zw* function pointers and the NT type
definitions used below are declared elsewhere in the original Scilab file. */
static BOOL haveLoadedFunctionsForFork(void);
int fork(void)
{
HANDLE hProcess = 0, hThread = 0;
OBJECT_ATTRIBUTES oa = { sizeof(oa) };
MEMORY_BASIC_INFORMATION mbi;
CLIENT_ID cid;
USER_STACK stack;
PNT_TIB tib;
THREAD_BASIC_INFORMATION tbi;
/* Zero-initialize, then set ContextFlags by name: on x64, ContextFlags is
not the first member of CONTEXT, so a positional initializer would set
the wrong field. */
CONTEXT context = { 0 };
context.ContextFlags = CONTEXT_FULL |
CONTEXT_DEBUG_REGISTERS |
CONTEXT_FLOATING_POINT;
if (setjmp(jenv) != 0) return 0; /* return as a child */
/* check whether the entry points are
initialized and get them if necessary */
if (!ZwCreateProcess && !haveLoadedFunctionsForFork()) return -1;
/* create forked process */
ZwCreateProcess(&hProcess, PROCESS_ALL_ACCESS, &oa,
NtCurrentProcess(), TRUE, 0, 0, 0);
/* set the Eip for the child process to our child function */
ZwGetContextThread(NtCurrentThread(), &context);
/* In x64 the Eip and Esp are not present,
their x64 counterparts are Rip and Rsp respectively. */
#if _WIN64
context.Rip = (DWORD64)child_entry; /* a ULONG cast would truncate the 64-bit address */
#else
context.Eip = (ULONG)child_entry;
#endif
#if _WIN64
ZwQueryVirtualMemory(NtCurrentProcess(), (PVOID)context.Rsp,
MemoryBasicInformation, &mbi, sizeof mbi, 0);
#else
ZwQueryVirtualMemory(NtCurrentProcess(), (PVOID)context.Esp,
MemoryBasicInformation, &mbi, sizeof mbi, 0);
#endif
stack.FixedStackBase = 0;
stack.FixedStackLimit = 0;
stack.ExpandableStackBase = (PCHAR)mbi.BaseAddress + mbi.RegionSize;
stack.ExpandableStackLimit = mbi.BaseAddress;
stack.ExpandableStackBottom = mbi.AllocationBase;
/* create thread using the modified context and stack */
ZwCreateThread(&hThread, THREAD_ALL_ACCESS, &oa, hProcess,
&cid, &context, &stack, TRUE);
/* copy exception table */
ZwQueryInformationThread(NtCurrentThread(), ThreadBasicInformation,
&tbi, sizeof tbi, 0);
tib = (PNT_TIB)tbi.TebBaseAddress;
ZwQueryInformationThread(hThread, ThreadBasicInformation,
&tbi, sizeof tbi, 0);
ZwWriteVirtualMemory(hProcess, tbi.TebBaseAddress,
&tib->ExceptionList, sizeof tib->ExceptionList, 0);
/* start (resume really) the child */
ZwResumeThread(hThread, 0);
/* clean up */
ZwClose(hThread);
ZwClose(hProcess);
/* exit with child's pid */
return (int)(ULONG_PTR)cid.UniqueProcess; /* HANDLE -> integer without a pointer-truncation warning on x64 */
}
static BOOL haveLoadedFunctionsForFork(void)
{
HMODULE ntdll = GetModuleHandleA("ntdll"); /* ANSI variant: the literal is a narrow string */
if (ntdll == NULL) return FALSE;
if (ZwCreateProcess && ZwQuerySystemInformation && ZwQueryVirtualMemory &&
ZwCreateThread && ZwGetContextThread && ZwResumeThread &&
ZwQueryInformationThread && ZwWriteVirtualMemory && ZwClose)
{
return TRUE;
}
ZwCreateProcess = (ZwCreateProcess_t) GetProcAddress(ntdll,
"ZwCreateProcess");
ZwQuerySystemInformation = (ZwQuerySystemInformation_t)
GetProcAddress(ntdll, "ZwQuerySystemInformation");
ZwQueryVirtualMemory = (ZwQueryVirtualMemory_t)
GetProcAddress(ntdll, "ZwQueryVirtualMemory");
ZwCreateThread = (ZwCreateThread_t)
GetProcAddress(ntdll, "ZwCreateThread");
ZwGetContextThread = (ZwGetContextThread_t)
GetProcAddress(ntdll, "ZwGetContextThread");
ZwResumeThread = (ZwResumeThread_t)
GetProcAddress(ntdll, "ZwResumeThread");
ZwQueryInformationThread = (ZwQueryInformationThread_t)
GetProcAddress(ntdll, "ZwQueryInformationThread");
ZwWriteVirtualMemory = (ZwWriteVirtualMemory_t)
GetProcAddress(ntdll, "ZwWriteVirtualMemory");
ZwClose = (ZwClose_t) GetProcAddress(ntdll, "ZwClose");
if (ZwCreateProcess && ZwQuerySystemInformation && ZwQueryVirtualMemory &&
ZwCreateThread && ZwGetContextThread && ZwResumeThread &&
ZwQueryInformationThread && ZwWriteVirtualMemory && ZwClose)
{
return TRUE;
}
else
{
ZwCreateProcess = NULL;
ZwQuerySystemInformation = NULL;
ZwQueryVirtualMemory = NULL;
ZwCreateThread = NULL;
ZwGetContextThread = NULL;
ZwResumeThread = NULL;
ZwQueryInformationThread = NULL;
ZwWriteVirtualMemory = NULL;
ZwClose = NULL;
}
return FALSE;
}

Prior to Microsoft introducing the Windows Subsystem for Linux (WSL), CreateProcess() was the closest thing Windows had to fork(), but Windows requires you to specify an executable to run in that process.
UNIX process creation is quite different from Windows. Its fork() call basically duplicates the current process almost entirely, each copy in its own address space, and continues running them separately. While the processes themselves are different, they are still running the same program. See here for a good overview of the fork/exec model.
Going back the other way, the equivalent of the Windows CreateProcess() is the fork()/exec() pair of functions in UNIX.
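For comparison, here is the UNIX side of that equivalence as a minimal POSIX sketch: fork() a copy, exec() the new program inside the copy, then wait for it, which is roughly what CreateProcess() followed by waiting on the returned process handle does on Windows. run_and_wait is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at `path` with `argv` and waits for it.
   Returns the child's exit status, or -1 on error. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();                 /* step 1: duplicate this process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);              /* step 2: replace the copy's program */
        _exit(127);                     /* only reached if execv() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_and_wait("/bin/sh", args) with args = {"sh", "-c", "exit 3", NULL} returns 3.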
If you were porting software to Windows and you don't mind a translation layer, Cygwin provided the capability that you want but it was rather kludgey.
Of course, with the new Linux subsystem, the closest thing Windows has to fork() is actually fork() :-)

As other answers have mentioned, NT (the kernel underlying modern versions of Windows) has an equivalent of Unix fork(). That's not the problem.
The problem is that cloning a process's entire state is not generally a sane thing to do. This is as true in the Unix world as it is in Windows, but in the Unix world, fork() is used all the time, and libraries are designed to deal with it. Windows libraries aren't.
For example, the system DLLs kernel32.dll and user32.dll maintain a private connection to the Win32 server process csrss.exe. After a fork, there are two processes on the client end of that connection, which is going to cause problems. The child process should inform csrss.exe of its existence and make a new connection – but there's no interface to do that, because these libraries weren't designed with fork() in mind.
So you have two choices. One is to forbid the use of kernel32 and user32 and other libraries that aren't designed to be forked – including any libraries that link directly or indirectly to kernel32 or user32, which is virtually all of them. This means that you can't interact with the Windows desktop at all, and are stuck in your own separate Unixy world. This is the approach taken by the various Unix subsystems for NT.
The other option is to resort to some sort of horrible hack to try to get unaware libraries to work with fork(). That's what Cygwin does. It creates a new process, lets it initialize (including registering itself with csrss.exe), then copies most of the dynamic state over from the old process and hopes for the best. It amazes me that this ever works. It certainly doesn't work reliably – even if it doesn't randomly fail due to an address space conflict, any library you're using may be silently left in a broken state. The claim of the current accepted answer that Cygwin has a "fully-featured fork()" is... dubious.
Summary: In an Interix-like environment, you can fork by calling fork(). Otherwise, please try to wean yourself from the desire to do it. Even if you're targeting Cygwin, don't use fork() unless you absolutely have to.

The following document provides some information on porting code from UNIX to Win32:
https://msdn.microsoft.com/en-us/library/y23kc048.aspx
Among other things, it indicates that the process model is quite different between the two systems and recommends consideration of CreateProcess and CreateThread where fork()-like behavior is required.

"as soon as you want to do file access or printf then io are refused"
You cannot have your cake and eat it too... In msvcrt.dll, printf() is based on the Console API, which in turn uses LPC to communicate with the console subsystem (csrss.exe). The connection with csrss is initiated at process start-up, which means that any process that begins its execution "in the middle" will have that step skipped. Unless you have access to the source code of the operating system, there is no point in trying to connect to csrss manually. Instead, you should create your own subsystem, and accordingly avoid the console functions in applications that use fork().
Once you have implemented your own subsystem, don't forget to also duplicate all of the parent's handles for the child process ;-)
"Also, you probably shouldn't use the Zw* functions unless you're in kernel mode, you should probably use the Nt* functions instead."
This is incorrect. When accessed in user mode, there is absolutely no difference between Zw* and Nt*; these are merely two different exported names (in ntdll.dll) that refer to the same (relative) virtual address.
ZwGetContextThread(NtCurrentThread(), &context);
Obtaining the context of the current (running) thread by calling ZwGetContextThread is wrong, is likely to crash, and (due to the extra system call) is also not the fastest way to accomplish the task.

Your best options are CreateProcess() or CreateThread(). There is more information on porting here.

There is no easy way to emulate fork() on Windows.
I suggest using threads instead.

fork() semantics are necessary where the child needs access to the actual memory state of the parent as of the instant fork() is called. I have a piece of software which relies on the implicit mutex of memory copying as of the instant fork() is called, which makes threads impossible to use. (This is emulated on modern *nix platforms via copy-on-write/update-memory-table semantics.)
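The "memory state as of the instant fork() is called" property can be shown in a few lines. A minimal POSIX sketch with hypothetical names: the child works on a snapshot taken at fork() time, so a later write by the parent is invisible to it, and the child's own write is invisible to the parent. The pipe only orders the two writes so the outcome is deterministic.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical global standing in for "the memory state of the parent". */
static int shared_value;

/* Returns the value the child computed from its snapshot (via its exit
   status) and stores the parent's final value in *parent_after; -1 on error. */
int fork_snapshot_demo(int *parent_after)
{
    int go[2];
    shared_value = 41;                    /* state at the instant of fork() */
    if (pipe(go) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: works on its own copy */
        char c;
        close(go[1]);
        ssize_t r = read(go[0], &c, 1);   /* wait until the parent has written */
        (void)r;
        shared_value += 1;                /* 41 + 1; the parent's 100 is not visible */
        _exit(shared_value);
    }
    close(go[0]);
    shared_value = 100;                   /* changes only the parent's copy */
    ssize_t w = write(go[1], "g", 1);     /* let the child continue */
    (void)w;
    close(go[1]);
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    *parent_after = shared_value;         /* still 100: the child's write stayed in the child */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With threads, both writes would land in the same memory, which is exactly why the answer says threads cannot replace fork() here.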
The closest that exists on Windows as a syscall is CreateProcess. The best that can be done is for the parent to freeze all other threads during the time that it is copying memory over to the new process's memory space, then thaw them. Neither the Cygwin frok [sic] class nor the Scilab code that Eric des Courtis posted does the thread-freezing, that I can see.
Also, you probably shouldn't use the Zw* functions unless you're in kernel mode, you should probably use the Nt* functions instead. There's an extra branch that checks whether you're in kernel mode and, if not, performs all of the bounds checking and parameter verification that Nt* always do. Thus, it's very slightly less efficient to call them from user mode.

The closest you say... Let me think... This must be fork() I guess :)
For details see Does Interix implement fork()?

Most of the hacky solutions are outdated. Winnie the fuzzer has a version of fork that works on current versions of Windows 10 (though this requires system-specific offsets and can break easily too).
https://github.com/sslab-gatech/winnie/tree/master/forklib

If you only care about creating a subprocess and waiting for it, perhaps the _spawn* APIs in process.h are sufficient. Here's more information about that:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/process-and-environment-control
https://en.wikipedia.org/wiki/Process.h
