Check pointer handle is valid - c++

I want to implement a Microsoft Cryptographic Service Provider library, and I am currently thinking about the best way to deal with the context handles I create.
My question is specific to this case, but the design approach can be used in other situations.
I come from a managed code background and I am not 100% sure about multithreaded pointer handling in C/C++.
In general there are two functions which are responsible for handle creation and destruction (CryptAcquireContext, CryptReleaseContext), and all subsequent CSP functions use the handle returned by the creator function.
I didn't find any concrete information or specification from Microsoft that gives a design approach or rules for how to do it. But I examined other CSP providers created by Microsoft to work out the design rules, which are:
The functions must be thread safe
The context handle will not be shared between threads
If a context handle is not valid, return an error
Other MS CSP providers return a valid pointer as the handle, or NULL if not.
I don't think the calling application will pass complete garbage, but it could pass a handle that has already been released, and in that case my library should return an error.
This led me to three ideas for implementing it:
1. Just allocate memory for my context struct with malloc or new and return the raw pointer as the handle.
I can expect that the applications which call my library will pass a valid handle. But if not, my library will run into undefined behavior. So I need a better solution.
2. Add each pointer I create to a container (std::list, std::map), so I can search it to check whether the pointer exists. Access to the container is guarded with a mutex.
This should be safe, and regular API usage shouldn't be a performance issue. But in a Terminal Server scenario it could be. In that case the Windows process lsass.exe creates a CSP context in a separate thread for every user who wants to log in, and makes around 10 API calls per context.
The design goal is that my library should be able to handle 300 clients in parallel. I don't know how many threads are created by Windows in this case.
So if possible I would prefer a lockless implementation.
3. Allocate a basic struct which holds a check value and the pointer to the actual data, and use the pointer to this struct as the context handle.
#include <stdint.h>
typedef struct CSPHandle
{
    uint32_t Type;             // magic number, e.g. CSPContext = 0xA1B2C3D4 (does not fit in a signed int)
    CSPContextPtr pCSPContext; // pointer to the actual context data, set to NULL on release
} CSPHandle;
So I could read the first field (the first four bytes) of the passed pointer and check whether it equals my defined type. And I have full control over the actual data pointer, which is set to NULL when the context is released. Is this a good or bad idea?
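A minimal C++ sketch of option 3, assuming (as the rules above state) that a handle is never shared between threads. CSPContext, AcquireContext, IsValidHandle and ReleaseContext are hypothetical names, not the real CSP entry points. It also exposes the catch: the check only detects handles released through this API, and only because the small CSPHandle shell is never freed; dereferencing a pointer to freed or random memory is still undefined behavior.

```cpp
#include <cstdint>
#include <new>

// Hypothetical per-context state; stands in for whatever CSPContextPtr points at.
struct CSPContext {
    int dummy = 0;
};

constexpr std::uint32_t kCSPContextMagic = 0xA1B2C3D4u;  // value from the question
constexpr std::uint32_t kReleasedMagic = 0;

struct CSPHandle {
    std::uint32_t Type;        // magic number identifying a live context
    CSPContext* pCSPContext;   // actual data, nulled on release
};

// Creates a context and returns the shell pointer used as the handle.
CSPHandle* AcquireContext() {
    CSPContext* ctx = new (std::nothrow) CSPContext();
    if (ctx == nullptr) return nullptr;
    CSPHandle* h = new (std::nothrow) CSPHandle{kCSPContextMagic, ctx};
    if (h == nullptr) {
        delete ctx;
        return nullptr;
    }
    return h;
}

// Only meaningful for pointers that came from AcquireContext().
bool IsValidHandle(const CSPHandle* h) {
    return h != nullptr && h->Type == kCSPContextMagic && h->pCSPContext != nullptr;
}

bool ReleaseContext(CSPHandle* h) {
    if (!IsValidHandle(h)) return false;  // released twice or never acquired
    delete h->pCSPContext;
    h->pCSPContext = nullptr;
    h->Type = kReleasedMagic;
    // The CSPHandle shell is intentionally NOT freed: if it were, a later
    // IsValidHandle() on the stale pointer would read freed memory.
    return true;
}
```

Leaking one small shell per context is the price of detecting use after release; whether that is acceptable in a long-running process such as lsass.exe is a design decision.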
What are your thoughts about this case? Should I go with one of these approaches, or is there another solution?
Thanks

I found a solution and will answer my question.
I overlooked a small but important detail:
With a CSP there are no direct API calls to the DLL (LoadLibrary, GetProcAddress, call the function), because the function calls are forwarded by the Microsoft CSP layer, which loads the CSP library by name.
So the Microsoft CSP layer needs to know and check the passed context in order to map it to the correct library.
Example:
1. client->CryptAcquireContext(in cspName, out ctx)
2. MS CSP->loads the library for cspName
3. MS CSP->calls the function pointer of the loaded library
4. CSP lib->CPAcquireContext (the provider's entry point) creates a new context
5. MS CSP->receives the returned CSP handle and saves it in the DLL mapping
6. MS CSP->returns the result to the calling application
7. client->CryptSetProvParam(ctx) // the context created before
8. MS CSP->checks whether the context exists and which library is responsible for it
9. MS CSP->if the given context cannot be mapped to a CSP DLL, an error is returned, because the MS CSP doesn’t know which function pointer to call.
So in this case it should be sufficient just to allocate memory. If the client application passes an invalid context handle, the call never reaches the CSP library.
I assume the MS CSP uses a mutex-guarded list to store the context mappings, because the context can be anything from a random number to a valid pointer.
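How the Microsoft layer really stores its mappings is not documented in this thread, so the following is only a hypothetical model of the mutex-guarded table described above (the same structure would also implement option 2 from the question). The handle is kept as an opaque integer, since HCRYPTPROV is itself a ULONG_PTR, and std::unordered_map gives average constant-time lookup instead of iterating a std::list. ContextRegistry and its methods are illustrative names.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical dispatcher table: maps each handle returned by a provider DLL
// to the name of that DLL. All access is serialized by one mutex.
class ContextRegistry {
public:
    void Add(std::uintptr_t handle, const std::string& provider) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[handle] = provider;
    }

    // Unknown handles are rejected by lookup alone; nothing is dereferenced.
    bool Lookup(std::uintptr_t handle, std::string* provider) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(handle);
        if (it == map_.end()) return false;
        if (provider != nullptr) *provider = it->second;
        return true;
    }

    // Called on release; returns false if the handle was not registered.
    bool Remove(std::uintptr_t handle) {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.erase(handle) > 0;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::uintptr_t, std::string> map_;
};
```

Because validation happens before any call is forwarded, the provider DLL can stay with plain allocation, which is the conclusion reached above.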

Related

Passing 'this' pointer to MouseProc of SetWindowsHookEx

Generally, whenever we want to wrap a Window/Thread in a C++ object, we do so by passing the this pointer via SetWindowLong/GetWindowLong or SetProp/GetProp for a Window, and as lpParameter for CreateThread/etc.
My question is specific to hooks. What is the elegant approach to passing the 'this' pointer to SetWindowsHookEx's callback procedures, or in other words, how do I wrap a hook's callback procedure?
Since SetWindowsHookEx does not accept any user-data argument, I don't see many options apart from using unencapsulated (i.e. global/static/TLS) data.
You are expected to have just one instance of a given hook, so global data is not an issue.
If you are developing a library allowing multiple hook instances that can be dynamically added or removed, do not add multiple hooks at the OS level. Instead, add a library-level hook procedure that walks the list of hook instances. Since you maintain this list, you can track whatever "user data" alongside each entry you want.
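A portable sketch of that idea: one dispatcher procedure walks a list of C++ objects, so each object keeps its own state without a separate OS-level hook per instance. MouseHookInstance and MouseHookDispatcher are hypothetical names, and the real hook signature is replaced by a simplified one so the code builds without <windows.h>. It assumes registration and dispatch all happen on the thread that installed the hook, so no locking is shown.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical interface for one logical hook "instance" with its own state
// (this is where the 'this' pointer the question asks about lives).
class MouseHookInstance {
public:
    virtual ~MouseHookInstance() = default;
    // Return true to mark the event as handled and stop dispatching.
    virtual bool OnMouse(int code, long long wParam) = 0;
};

// The single library-level hook procedure plus the list it walks.
class MouseHookDispatcher {
public:
    static void Add(MouseHookInstance* instance) { Instances().push_back(instance); }

    static void Remove(MouseHookInstance* instance) {
        auto& v = Instances();
        v.erase(std::remove(v.begin(), v.end(), instance), v.end());
    }

    // In real code this would be the one
    //   LRESULT CALLBACK Proc(int nCode, WPARAM wParam, LPARAM lParam)
    // registered once with SetWindowsHookEx, passing unhandled events on
    // with CallNextHookEx. Simplified here to stay portable.
    static bool Dispatch(int code, long long wParam) {
        // Iterate over a copy so an instance may remove itself during its call.
        const std::vector<MouseHookInstance*> snapshot = Instances();
        for (MouseHookInstance* instance : snapshot) {
            if (instance->OnMouse(code, wParam)) return true;
        }
        return false;
    }

private:
    static std::vector<MouseHookInstance*>& Instances() {
        static std::vector<MouseHookInstance*> instances;
        return instances;
    }
};
```

Since the library owns the list, any per-entry "user data" can simply be members of the derived class.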
The 'most elegant approach' is to use a thunk. It's a small piece of code generated at runtime that holds your this pointer. This is the approach that ATL uses even for regular windows.
See
What is a thunk?
How to generate the code for thunks
C++ WinAPI Wrapper Object using thunks (x32 and x64)

Returning thread-local data from a shared library C-api

Question 1: Is it safe and portable to return a pointer to a thread_local data from a shared library providing a traditional C-API?
The lib itself is naturally implemented in C++11. Safety here means freedom from memory leaks and race conditions; portability covers the main desktop OSs: Windows, Linux and OS X. The calling application might be, for example, native, Java, C#, etc.
The use case is to implement a caller-friendly and thread-safe routine that returns data from the shared library. In this case, subsequent calls to the same function will overwrite the thread-local buffer, and this drawback is preferred over requiring the caller to explicitly free the returned data using a library-provided "free_data()" function.
// For example as a return value:
const char* text = MYLIB_get_foo_info();
The shared lib guarantees that the returned data is valid until the next call of the specific API function by the same thread that originally received the data, and termination of the thread will invalidate (deallocate) it. Use of the data is therefore in practice limited to a single thread, and should the API user want to use the data in other threads or store it for later use, they must take a value copy of it in the calling thread.
By definition, in this particular case one can safely assume that nothing will invalidate the data while the caller reads it. This is an exceptionally strong assumption indeed, based on the particular nature of the API and its very limited intended use. It is an option to add another version of the API later that does not require this assumption, if the need arises.
Question 2: Is it guaranteed that the TLS memory is deallocated at the moment the library user (application) thread terminates?
For example if the returned string is static thread_local std::string in the library.
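A minimal sketch of the contract described above, using a function-local static thread_local std::string. The function body and the "foo info" text are made up for illustration. It shows per-thread buffers and the overwrite-on-next-call behavior; it does not settle Question 2 by itself, since whether the destructor runs exactly when a thread created by a foreign runtime (JVM, CLR) exits depends on the platform's TLS implementation.

```cpp
#include <string>

// Sketch of the C-API function from the question. Each thread gets its own
// buffer; each call overwrites the previous result for that thread only.
extern "C" const char* MYLIB_get_foo_info() {
    static thread_local std::string buffer;  // destroyed at thread exit
    static thread_local unsigned calls = 0;  // per-thread call counter (demo only)
    buffer = "foo info, call " + std::to_string(++calls);
    return buffer.c_str();                   // valid until the next call on this thread
}
```

Callers have to copy the string (e.g. into a std::string) before calling the function again on the same thread, exactly as the contract requires.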
I have failed to find a clear and direct answer for this specific case (using TLS from a shared library).
I found two good articles about library API design, but they do not give any hints regarding TLS:
http://lucumr.pocoo.org/2013/8/18/beautiful-native-libraries/
https://anteru.net/blog/2016/05/01/3249/index.html
Before C++11, on Windows for example, one could use TlsAlloc(), but one had to specifically handle library-caller threads that were created before the library was loaded.
Question 3: Am I correct that with C++11 thread_local one does not have this kind of issues anymore?
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686997(v=vs.85).aspx

Converting a string into a function in c++

I have been looking for a way to dynamically load functions into C++ for some time now, and I think I have finally figured it out. Here is the plan:
1. Pass the function as a string into C++ (via a socket connection, a file, or something).
2. Write the string into a file.
3. Have the C++ program compile the file and execute it. If there are any errors, catch them and return them.
4. Have the newly executed program with the new function pass the memory location of the function to the currently running program.
5. Save the location of the function to a function pointer variable (the function will always have the same return type and arguments, so this simplifies the declaration of the pointer).
6. Run the new function with the function pointer.
The issue is that after step 4, I do not want to keep the new program running since if I do this very often, many running programs will suck up threads. Is there some way to close the new program, but preserve the memory location where the new function is stored? I do not want it being overwritten or made available to other programs while it is still in use.
If you guys have any suggestions for the other steps as well, that would be appreciated as well. There might be other libraries that do things similar to this, and it is fine to recommend them, but this is the approach I want to look into — if not for the accomplishment of it, then for the knowledge of knowing how to do so.
Edit: I am aware of dynamically linked libraries. This is something I am largely looking into to gain a better understanding of how things work in C++.
I can't see how this can work. When you run the new program it'll be a separate process and so any addresses in its process space have no meaning in the original process.
And not just that, but the code you want to call doesn't even exist in the original process, so there's no way to call it in the original process.
As Nick says in his answer, you need either a DLL/shared library or you have to set up some form of interprocess communication so the original process can send data to the new process to be operated on by the function in question and then sent back to the original process.
How about a Dynamic Link Library?
These can be linked/unlinked/replaced at runtime.
Or, if you really want to communicate between processes, you could use a named pipe.
Edit: you can also create named shared memory.
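The answers here recommend a shared library over passing addresses between processes. A minimal POSIX sketch of the plan with steps 4 and 5 replaced by dlopen/dlsym, so the new code is mapped into the running process and its address is valid there. The c++ compiler on PATH, the GeneratedFn signature and the helper names are assumptions; on Windows the counterparts are LoadLibrary/GetProcAddress, and older glibc versions need -ldl at link time.

```cpp
#include <dlfcn.h>
#include <cstdlib>
#include <fstream>
#include <string>

// Assumed (hypothetical) signature shared by every generated function.
using GeneratedFn = int (*)(int);

// Steps 2-3: write the received source to a file and build a shared library.
// Paths are not shell-escaped; this is a sketch, not hardened code.
bool compile_to_shared_library(const std::string& source,
                               const std::string& src_path,
                               const std::string& lib_path) {
    std::ofstream out(src_path);
    if (!out) return false;
    out << source;
    out.close();
    const std::string cmd = "c++ -shared -fPIC -o " + lib_path + " " + src_path;
    return std::system(cmd.c_str()) == 0;
}

// Replaces steps 4-5: map the library into *this* process and look up the
// symbol. Keep the handle open (no dlclose) while the pointer is in use.
GeneratedFn load_generated_function(const std::string& lib_path,
                                    const std::string& symbol,
                                    void** handle_out) {
    void* handle = dlopen(lib_path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) return nullptr;
    void* sym = dlsym(handle, symbol.c_str());
    if (sym == nullptr) {
        dlclose(handle);
        return nullptr;
    }
    *handle_out = handle;
    return reinterpret_cast<GeneratedFn>(sym);
}
```

The source string must give the function C linkage, for example extern "C" int generated(int x) { return x * 2; }, so dlsym finds it under the unmangled name generated. Pass lib_path with a slash (such as ./gen.so), because dlopen treats a bare file name as a name to search for in the library search path.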
For step 4: we can't directly pass the memory location (address) from one process to another, because the two processes use different virtual address spaces. One process can't use memory in another process.
So you need to create shared memory between the two processes and copy your function into it; then you can close the new process.
For shared memory on Windows, see "Creating Named Shared Memory":
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx
After that, you still need to allocate another memory region and copy the function into it again.
The reason is that normally allocated memory only has read/write permissions; if you try to execute code in it, the CPU raises an exception (when DEP/NX is enabled).
So on Windows you need to use VirtualAlloc to allocate the memory with the PAGE_EXECUTE_READWRITE flag (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx):
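#include <windows.h>
// emitcode: the buffer that holds the copied machine code of the function
void* address = VirtualAlloc(NULL,
                             sizeof(emitcode),
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
if (address == NULL) { /* allocation failed: check GetLastError() */ }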
After copying the function to address, you can call it through a function pointer, but you need to be very careful to keep the stack balanced.
Dynamic libraries are best suited to your problem. Also, forget about launching a different process; that is another problem in itself. In addition to the post above: provided you did the VirtualAlloc correctly, just call your function within the same "loader" process, and you shouldn't have to worry, since you will be running on the same stack.
The real problems are:
1 - Compiling the function you want to load, offline from the main program.
2 - Extracting the relevant code from the binary produced by the compiler.
3 - Loading the string.
Steps 1 and 2 require a deep understanding of the entire compiler toolchain, including compiler flags, the linker, etc., not just the IDE's push buttons.
If you are OK with 1 and 2, you should know why using a std::string, or anything but a plain char *, is harmful.
I could continue the entire story, but it definitely deserves its own book. Since this is the hacker/cracker way of doing things, I strongly recommend that normal users use dynamic libraries; this is why they exist.
This is usually called code injection.
Basically, for security reasons, modern operating systems by default do not let you execute memory that was written after the initial loading, so we must fall back to OS-validated dynamic libraries.
That said, once you have valid compiled code, if you really want to achieve that effect you must load your function into memory and then mark it as executable (clear the NX bit) in a system-specific way.
But let's be clear: your function must be position-independent code, and you get no help from the dynamic linker to resolve symbols. That's the hard part of the job.

Why does prevInstance exist in WinMain and wWinMain if it is always NULL

Since I am a beginner, this may be a very basic question. I am starting with DirectX 11; my first application used wWinMain, and while searching for the difference between WinMain and wWinMain, I came across the prevInstance parameter.
According to MSDN, prevInstance is always NULL. Since it is always NULL, why does it exist? (It seems logical that the designers would not have added a useless parameter.) And, quoting from the book:
if you need a way to determine whether a previous instance of the
application is already running, the documentation recommends creating
a uniquely named mutex using CreateMutex. Although the mutex will be
created, the CreateMutex function will return ERROR_ALREADY_EXISTS.
What is a mutex, and how do I use it? (A good link will be sufficient.) Also, if a method is needed to find out whether another instance of an application exists, one would expect prevInstance to hold a pointer or reference to it, which is apparently not the case, since it is NULL. Why is that, and what is the role of prevInstance?
Raymond Chen's blog is almost entirely dedicated to discussing aspects of the Windows API that are "oddities" to us today. And fortunately, he has a blog post that answers this exact question:
In 16-bit Windows there was a function called GetInstanceData. This
function took an HINSTANCE, a pointer, and a length, and copied memory
from that instance into your current instance. (It's sort of the
16-bit equivalent to ReadProcessMemory, with the restriction that the
second and third parameters had to be the same.)
...
This was the reason for the hPrevInstance parameter to WinMain. If
hPrevInstance was non-NULL, then it was the instance handle of a copy
of the program that is already running. You can use GetInstanceData to
copy data from it, get yourself up off the ground faster. For example,
you might want to copy the main window handle out of the previous
instance so you could communicate with it.
Whether hPrevInstance was NULL or not told you whether you were the
first copy of the program. Under 16-bit Windows, only the first
instance of a program registered its classes; second and subsequent
instances continued to use the classes that were registered by the
first instance. (Indeed, if they tried, the registration would fail
since the class already existed.) Therefore, all 16-bit Windows
programs skipped over class registration if hPrevInstance was
non-NULL.
The people who designed Win32 found themselves in a bit of a fix when
it came time to port WinMain: What to pass for hPrevInstance? The
whole module/instance thing didn't exist in Win32, after all, and
separate address spaces meant that programs that skipped over
reinitialization in the second instance would no longer work. So Win32
always passes NULL, making all programs believe that they are the
first one.
Now that hPrevInstance is irrelevant to the Windows API except for compatibility reasons, MSDN recommends that you use a mutex to detect previous instances of an application.
Mutex stands for "mutual exclusion". You can refer to the MSDN documentation for CreateMutex(). There are lots of examples of using mutexes to detect previous instances of applications, such as this one. The basic idea is to call CreateMutex() with a unique name that you come up with. If the call succeeds but GetLastError() then returns ERROR_ALREADY_EXISTS, you know that an instance of your application was already launched.
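A minimal sketch of that check. The Windows branch follows the answer: call CreateMutexA with a unique name and test GetLastError() for ERROR_ALREADY_EXISTS. The POSIX branch is a swapped-in technique (an exclusive, non-blocking flock() on a lock file in /tmp), added only so the sketch also builds outside Windows; another_instance_running and the lock-file location are assumptions.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <cerrno>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#endif

// Returns true if another holder of `name` already exists. When this call
// becomes the holder, the mutex handle / lock descriptor is deliberately left
// open for the life of the process; closing it would drop the marker.
bool another_instance_running(const std::string& name) {
#ifdef _WIN32
    // Windows: the named-mutex technique from the answer.
    HANDLE mutex = CreateMutexA(nullptr, FALSE, name.c_str());
    if (mutex == nullptr) return false;  // creation failed outright
    if (GetLastError() == ERROR_ALREADY_EXISTS) {
        CloseHandle(mutex);  // this extra handle to the existing mutex is not needed
        return true;
    }
    return false;
#else
    // POSIX stand-in (not from the answer): an exclusive, non-blocking
    // flock() on a lock file plays the role of the named mutex.
    const std::string path = "/tmp/" + name + ".lock";
    int fd = open(path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd < 0) return false;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        const bool held_elsewhere = (errno == EWOULDBLOCK);
        close(fd);
        return held_elsewhere;
    }
    return false;
#endif
}
```

Typical use is a single call at startup, e.g. if (another_instance_running("MyCompany.MyApp.SingleInstance")) return 0; with a name of your own choosing.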
The prev instance parameter is there for 16-bit Windows compatibility. I think that was stated in the MSDN reference for WinMain, at least it used to be.

Accidental Complexity in OpenSSL HMAC functions

SSL Documentation Analysis
This question pertains to the usage of the HMAC routines in OpenSSL.
Since the OpenSSL documentation is a tad weak in certain areas, I profiled my code, and profiling revealed that with the one-shot function from the man page:
unsigned char *HMAC(const EVP_MD *evp_md, const void *key,
int key_len, const unsigned char *d, int n,
unsigned char *md, unsigned int *md_len);
40% of my library's runtime is devoted to creating and tearing down HMAC_CTXs behind the scenes.
There are also two additional functions to create and destroy an HMAC_CTX explicitly:
HMAC_CTX_init() initialises a HMAC_CTX
before first use. It must be called.
HMAC_CTX_cleanup() erases the key and
other data from the HMAC_CTX and
releases any associated resources. It
must be called when an HMAC_CTX is no
longer required.
These two functions are introduced in the documentation with:
The following functions may be used if
the message is not completely stored
in memory
My data fits entirely in memory, so I choose the HMAC function -- the one whose signature is shown above.
According to the man page, the context is then used through the following two functions:
HMAC_Update() can be called repeatedly
with chunks of the message to be
authenticated (len bytes at data).
HMAC_Final() places the message
authentication code in md, which must
have space for the hash function
output.
The Scope of the Application
My application generates an authenticated (HMAC, which is also used as a nonce), CBC-BF-encrypted protocol buffer string. The code will be interfaced with various web servers and frameworks: Windows/Linux as OS, nginx, Apache and IIS as web servers, and Python/.NET and C++ web-server filters.
The description above should make clear that the library needs to be thread-safe, and potentially have resumable processing state, i.e. lightweight threads sharing an OS thread (which might leave thread-local memory out of the picture).
The Question
How do I get rid of the 40% overhead on each invocation in a (1) thread-safe and (2) resumable way? (2) is optional, since I have all of the source data present in one go and can make sure a digest is created in place without relinquishing control of the thread mid-digest. So:
(1) can probably be done using thread-local memory, but how do I reuse the CTXs? Does the HMAC_Final() call make the CTX reusable?
(2) Optional: in this case I would have to create a pool of CTXs.
(3) How does the HMAC function do this? Does it create a CTX in the scope of the function call and destroy it?
Pseudocode and commentary will be useful.
The documentation for the HMAC_Init_ex() function in OpenSSL 0.9.8g says:
HMAC_Init_ex() initializes or reuses a
HMAC_CTX structure to use the function
evp_md and key key. Either can be
NULL, in which case the existing one
will be reused.
(Emphasis mine.) So this means that you can initialise an HMAC_CTX with HMAC_CTX_init() once, then keep it around to create multiple HMACs, as long as you don't call HMAC_CTX_cleanup() on it and you start off each HMAC with HMAC_Init_ex().
So yes, you should be able to do what you want with a HMAC_CTX in thread-local memory.
If you aren't trying to restrict your dependencies, you could choose an HMAC implementation that is self-contained and requires the user to explicitly control all the aspects that OpenSSL is vague about in its documentation. Many such simple C/C++ alternatives exist, but it is up to you to choose and evaluate one.