Background
I am currently working on a project for which I have written a DLL as an interface between a Windows driver and MATLAB. All of this is working very well, but one thing up until recently it has been lacking is documentation for certain functionality - essentially it allows command strings to be sent to an FPGA, and all of these commands need documenting.
This could be done using a PDF etc. but I wanted also a way to integrate it the documentation into the DLL itself - so functions like 'lookup command' etc. can be used. At any rate I went ahead and implemented this in a way which I am mostly satisfied.
Essentially I have a structure (see below) to which a pointer can be returned from functions so that the documentation can be accessed. The caller provides the address of a pointer to one of these which is then updated with the address of an entry in a constant global array.
typedef struct {
CONST CHAR * commandString;
ULONG commandStringLen;
CONST CHAR * documentationString;
ULONG documentationStringLen;
CONST CHAR * commandParameters;
ULONG commandParametersLen;
} COMMAND_DOCS;
#define STRING_LEN(a) a,(sizeof(a)-1)
#define NEWGROUP "\n "
#define NEWENTRY "\n "
#define NEWLINE "\n"
#define ENDTITLE "\n----------------------------------------\n"
CONST COMMAND_DOCS CommandDocs[] = {
//-----
#define COMMAND_xyz_GROUP_INDEX_START (0)
{ STRING_LEN("ABCD"),
STRING_LEN("Something Command"
ENDTITLE"Low Queue"
NEWLINE "Description:"
NEWGROUP"The .........."
NEWLINE),
STRING_LEN(NEWGROUP"Type x:"
NEWENTRY"No Payload"
NEWLINE)
},
#define COMMAND_xyz_GROUP_LENGTH (1)
//-----
... And so on
};
This results in a load of constant strings being stored in memory and an array of documentation structures which contain pointers to these constants and their lengths as well for good measure. A pointer to the required element in the array is returned as I say. The caller of the library API is then free to make copies if needed or display the strings, whatever.
This is all working nicely at the moment, except for a minor annoyance. Whenever I need to update the documentation, it required me to recompile the DLL - as clearly all of the strings are compiled into it. For me that is not an issue as I can easily compile it, but as I am working at a university developing a research platform for them to use, I want it to be as simple for people to update in the future as I move on to other work. Ideally if documentation needs updating - say new commands get added to the system, I would like the additions to be possible without having to do a recompile.
Question
So my question really is about what the best way to go about doing this is.
At the moment I am thinking along the lines of loading the documentation from a file, either at the time of loading the DLL, or when the search function is called. Currently there are #defines in the array to separate indexes (identify groups of commands), but these could be replaced by variables that are initialised by data from the file.
I could go for something like XML and parse that to fill up the structures, but part of me thinks it would be easier to understand in the outside world if it was something simpler, but then I suppose I would still need some way of identifying the boundaries between entries, etc.
Thoughts?
Note that the DLL is mostly C - all of the APIs are C interfaces, but internally it is C++ as I've been using classes for other parts. I don't mind which is used as long as it is compatible with C interfaces.
I went away from the problem for a while as it wasn't urgent, but have come back to it in the last few weeks. I've found that I need to add even more settings and configurable stuff to the DLL which makes the approach I was using before definitely not useable.
Basically as #PhilWilliams suggested in the comments, I've gone for using XML files for configuring everything. There are now some APIs that have to be called just after loading the library with which the location of the XML file to load is specified. The DLL will then parse the XML files and populate a load of internal structures. Rather than using #defines and constant strings, I now have an array of structs in which pointers to the parsed strings are located along with the indices which were previously #defines.
After looking on StackOverflow and finding various suggestions on simple XML parsers, I've gone for TinyXML2 as it means that I don't have the headache of allocating and freeing memory for the many documentation strings as it handles that internally.
Related
Is there a way to rename a win32 function like GetVolumeInformationW() using #define ?
For example:
#define abc(LPCWSTR a, LPWSTR b, ...) GetVolumeInformationW(Some argumments..)
Why do that? I want to hide the function name on debbuger programms like IDA, is there some way to did that?
Language: C++
There is no point in using #define for this, as this will have no effect on the contents of the binary executable. Using preprocessor macros will only affect what you as a programmer will see, but it won't affect what the compiler or linker will see. See this link for information on how the C++ preprocessor works and its relationship with the compiler/linker.
If you do not want the function to appear in the Import Table of your executable, then you can instead load the function dynamically using GetProcAddress. That way, a disassembler will probably be unable to determine which function the address is pointing to, when the function is being called. However, the disassembler will be able to see that you are using GetProcAddress for something, it just won't know what. Using the function GetProcAddress may make someone trying to crack your software suspicious, because that is a common thing to do if you are trying to hide something.
If you do not want the string GetVolumeInformationW to appear in cleartext in your executable file, then you can somehow encrypt or obfuscate it, for example store it in reverse and then reverse it back before passing it to GetProcAddress. This was just a very simple example of how it could be done. Using XOR (which is the ^ operator in C++) on every character with a certain key to encrypt, and then do the same thing again to decrypt, would probably be a better solution, as this would make the the encrypted text not be easily identifiable as text.
Is there a way to rename a win32 function like GetVolumeInformationW() using #define ?
No, macros do not serve that purpose. You could define a macro such that Win32 function names do not appear literally in your source code, other than in the macro definitions, but that does not rename the functions, nor even prevent the function names from appearing in your compiled object files, libraries, or executables.
It can't, because the Win32 API's function names are established by the platform headers and (especially) libraries. You're not rebuilding the platform libraries, only linking the existing ones to your own code, so your code has no alternative but to use the API's function names to call API functions.
Why do that? I want to hide the function name on debbuger programms like IDA, is there some way to did that?
Obfuscation is not a very effective defense technique. It is far more likely to make trouble for you, in the ordinary development of your software, than to present a major hurdle to a skilled adversary. You can obfuscate the names of your own functions if you nevertheless wish to do so, but no, you cannot change the names of platform API functions.
You'll be calling a function out of a shared DLL. Defines are strictly preprocessor.
What you want to do is create a hash function to hash the string "GetVolumeInformationW". As well as the name of the module thats in. For example "Kernel32.dll"
Get the PEB using the FS or GS register. Then go to the PEB_LDR_DATA list. Run each list entry and hash the DLL name against your Kernel32 hashed string. If the hashes match, you grab the base of the library in that same structure.
After this you will then trace the export table. And do the same thing you did above, where you compare each export name to the hashed "GetVolumeInformationW" string. When it's found, you will then call the address it's at using a function pointer.
This is the sole way to do it. Bonus points if the encrypted strings are stored on the stack. So when coding it do
char[] szKernel32 = 'K', 'e', 'r', 'n'.........;
Also, do not use GetProcAddress. It defeats the point of hiding, since anyone experienced with IDA will instantly search for GetProcAddress.
I've been researching how to extend .net controls to have more freedom to do the same things you can do with the regular windows API in C++ in a VB program. For example, if you want to add week numbers to a calendar control, you'll have to manually import the DLL and extend the control's class, calling internal windows functions.
I've found various topics on how people handle this, and I'm not quite happy with the 'canonical method'. To be honest, I think it's a pretty bad paradigm to use.
These internal windows functions use pointers to set magic properties.
First, I find it rather strange that a pointer, which its system-dependent value size, is being abused to hold something that isn't a memory location but a value, but that aside: these pointers are also used to set which attribute is being set.
For example, (leaving out all the boilerplate necessary to link up the code), changing the first day of the week to Tuesday would use this code:
Private Const MCM_FIRST As Int32 = &H1000
Private Const DTM_FIRST As Int32 = &H1000
Private Const DTM_GETMONTHCAL As Int32 = (DTM_FIRST + 8)
Private Const MCM_SETFIRSTDAYOFWEEK As Int32 = (MCM_FIRST + 15)
Dim hMonthView As IntPtr =
SendMessage(Me.Handle, DTM_GETMONTHCAL, IntPtr.Zero, IntPtr.Zero)
Call SendMessage(hMonthView, MCM_SETFIRSTDAYOFWEEK, 0&, 1&)
So the magic values of 0x1008 and 0x1015 is what my question is about in this code.
First off, this is a rather strange way of working: these values aren't documented anywhere as far as I know other than the examples. What if I need a property where there happens to not be an internet tutorial on so far? Where/how do I find the value of MCM_<ARBITRARY_VALUE_HERE> in general?
Note: I mean the latter question in the broad, general sense: not applying to just the specific calendar control the example is about, but really any windows control. I can already google up the specific C++ header file by name (e.g. for the example it's defined in Commctrl.h: it's just that that piece of information is rather useless if I don't know the idiomatic way of how to pull something like that out of the C++ header into the VB code.
Secondly... these values are defined in headers somewhere. Is it not possible to import the values from the proper header? This way the program will stay working in the (admittedly unlikely) scenario where the DLL is changed by re-compiling it.
One approach for this back for VB6 was to prepare a TLB file with constants, function declarations, etc. of the Win32 API, and then reference that in the VB6 program. The TLB didn't provide COM objects, it was just a convenient way of packaging up all the declarations as though they were in (what we now think of as) an assembly.
As far as I can think, that approach should still work perfectly well today in .NET through "COM" interop. You can just as easily reference the TLB in a C# or VB project and thereby access its contents.
The book Hardcore Visual Basic by Bruce McKinney included a disk with a prepared TLB for this purpose, and this seems to still be available today:
http://vb.mvps.org/hardweb/mckinney2a.htm
I don't know how comprehensive this was at the time, nor if it is really still up to date. At the very least it seems instructive in how to prepare a TLB for this type of approach.
The following page also provides a description of this approach with some additional explanation an examples (too long to copy in here).
http://www.brainbell.com/tutors/Visual_Basic/newfile156.html
I'm fairly new to c++ and am really interested in learning more. Have been reading quite a bit. Recently discovered the init/fini elf sections.
I started to wonder if & how one would use the init section to prepopulate objects that would be used at runtime. Say for example you wanted
to add performance measurements to your code, recording the time, filename, linenumber, and maybe some ID (monotonic increasing int for ex) or name.
You would place for example:
PROBE(0,"EventProcessing",__FILE__,__LINE__)
...... //process event
PROBE(1,"EventProcessing",__FILE__,__LINE__)
......//different processing on same event
PROBE(2,"EventProcessing",__FILE__,__LINE__)
The PROBE could be some macro that populates a struct containing this data (maybe on an array/list, etc using the id as an indexer).
Would it be possible to have code in the init section that could prepopulate all of this data for each PROBE (except for the time of course), so only the time would need to be retrieved/copied at runtime?
As far as I know the __attribute__((constructor)) can not be applied to member functions?
My initial idea was to create some kind of
linked list with each node pointing to each probe and code in the init secction could iterate it populating the id, file, line, etc, but
that idea assumed I could use a member function that could run in the "init" section, but that does not seem possible. Any tips appreciated!
As far as I understand it, you do not actually need an ELF constructor here. Instead, you could emit descriptors for your probes using extended asm statements (using data, instead of code). This also involves switching to a dedicated ELF section for the probe descriptors, say __probes.
The linker will concatenate all the probes and in an array, and generate special symbols __start___probes and __stop___probes, which you can use from your program to access thes probes. See the last paragraph in Input Section Example.
Systemtap implements something quite similar for its userspace probes:
User Space Probe Implementation
Adding User Space Probing to an Application (heapsort example)
Similar constructs are also used within the Linux kernel for its self-patching mechanism.
There's a pretty simple way to have code run on module load time: Use the constructor of a global variable:
struct RunMeSomeCode
{
RunMeSomeCode()
{
// your code goes here
}
} do_it;
The .init/.fini sections basically exist to implement global constructors/destructors as part of the ABI on some platforms. Other platforms may use different mechanisms such as _start and _init functions or .init_array/.deinit_array and .preinit_array. There are lots of subtle differences between all these methods and which one to use for what is a question that can really only be answered by the documentation of your target platform. Not all platforms use ELF to begin with…
The main point to understand is that things like the .init/.fini sections in an ELF binary happen way below the level of C++ as a language. A C++ compiler may use these things to implement certain behavior on a certain target platform. On a different platform, a C++ compiler will probably have to use different mechanisms to implement that same behavior. Many compilers will give you tools in the form of language extensions like __attributes__ or #pragmas to control such platform-specific details. But those generally only make sense and will only work with that particular compiler on that particular platform.
You don't need a member function (which gets a this pointer passed as an arg); instead you can simply create constructor-like functions that reference a global array, like
#define PROBE(id, stuff, more_stuff) \
__attribute__((constructor)) void \
probeinit##id(){ probes[id] = {id, stuff, 0/*to be written later*/, more_stuff}; }
The trick is having this macro work in the middle of another function. GNU C / C++ allows nested functions, but IDK if you can make them constructors.
You don't want to declare a static int dummy#id = something because then you're adding overhead to the function you profile. (gcc has to emit a thread-safe run-once locking mechanism.)
Really what you'd like is some kind of separate pass over the source that identifies all the PROBE macros and collects up their args to declare
struct probe global_probes[] = {
{0, "EventName", 0 /*placeholder*/, filename, linenum},
{1, "EventName", 0 /*placeholder*/, filename, linenum},
...
};
I'm not confident you can make that happen with CPP macros; I don't think it's possible to #define PROBE such that every time it expands, it redefines another macro to tack on more stuff.
But you could easily do that with an awk/perl/python / your fave scripting language program that scans your program and constructs a .c that declares an array with static storage.
Or better (for a single-threaded program): keep the runtime timestamps in one array, and the names and stuff in a separate array. So the cache footprint of the probes is smaller. For a multi-threaded program, stores to the same cache line from different threads is called false sharing, and creates cache-line ping-pong.
So you'd have #define PROBE(id, evname, blah blah) do { probe_times[id] = now(); }while(0)
and leave the handling of the later args to your separate preprocessing.
I have the following structure in an MFC DLL:
struct retornoSAP
{
enum tipoRetorno
{
Ok,
Falha,
AbrirArquivo, // mStrData válido
InserirArquivo, // mStrData válido
retornoMap, // mLstData válido
retornoTable, // mLstTable válido
retornoString, // mStrData válido
};
tipoRetorno mTipo;
CString mStrData;
CMapStringToString mLstData;
retTable* mLstTable;
CMapStringToString mLstVars;
retornoSAP( tipoRetorno pTipo )
{
mTipo = pTipo;
mLstTable = NULL;
}
~retornoSAP()
{
if ( mLstTable ) CISap::release( mLstTable );
}
};
retTable definition is:
typedef vector<CMapStringToString*> retTable;
This structure is used to store data read from an SAP API, and I have a lot of functions that return a "retornoSAP" value.
It happens that I have to call those functions from VB.NET, using P/Invoke (DllImport). I have read some material about marshaling unmanaged types to .NET (like this http://blogs.msdn.com/b/dsvc/archive/2009/02/18/marshalling-complicated-structures-using-pinvoke.aspx), and it probably would be easy to marshal a structure with some basic types in it, but I wonder if it is even possible to marshal a CMapStringToString or, even worse, a vector of CMapStringToString.
My question is if it's worth spending some time trying to translate this structure to a .NET type (in this case, where I could find some good documentation)?
If not, I wonder if it sounds like a good idea to use a XML parser in C++, write all my data to a XML structure, and then return that XML structure as a BSTR string, so I could read the BSTR return value easily it in my .NET application and parse it back to a XML structure. In that case, I would pass some big strings between the MFC DLL and the .NET application...
You can't really deal with MFC classes with P/INVOKE. I think you can choose one of two option:
Not touching the MFC dll at all: create a dll with CLI C++. Some effort to learn the language, but is a thunk dll, you don't need to study it all. This way you can expose some ref classes to .NET world and call from its methods the MFC dll. From inside the ref class you can read your original MFC structures and populate some other structures more .NET friendly.
Modify the MFC dll, expose some entry point with extern "C" decoration, in order to avoid name mangling, and convert internal structure in something easyer to manage, string as you guess, would be a not so elegant, but cutting edge solution;)
Both solution have some performance overhead, but I guess the first one would pay better, and it is sometimes the only one preventing modification to the original MFC dll, and sometimes this is desiderable. Second one is probably simpler, but passing magic string would involve some parsing leading to performance leak, possible errors so you need more test, more costs and so on.
Another drawback of solution 1 is you need an extra deploy for the C++/CLI redistributables.
I did not mention since you probably already know, but doing this kind of interop needs the .NET code compiled in x86 mode if your target C++ dll are 32 bit compiled.
I'm writing a C shared library for internal use (I'll be dlopen()'ing it to a c++ application, if that matters). The shared library loads (amongst other things) some java code through a JNI module, which means all manners of nightmare error modes can come out of the JVM that I need to handle intelligently in the application. Additionally, this library needs to be re-entrant. Is there in idiom for passing error strings back in this case, or am I stuck mapping errors to integers and using printfs to debug things?
Thanks!
My approach to the problem would be a little different from everyone else's. They're not wrong, it's just that I've had to wrestle with a different aspect of this problem.
A C API needs to provide numeric error codes, so that the code using the API can take sensible measures to recover from errors when appropriate, and pass them along when not. The errno.h codes demonstrate a good categorization of errors; in fact, if you can reuse those codes (or just pass them along, e.g. if all your errors come ultimately from system calls), do so.
Do not copy errno itself. If possible, return error codes directly from functions that can fail. If that is not possible, have a GetLastError() method on your state object. You have a state object, yes?
If you have to invent your own codes (the errno.h codes don't cut it), provide a function analogous to strerror, that converts these codes to human-readable strings.
It may or may not be appropriate to translate these strings. If they're meant to be read only by developers, don't bother. But if you need to show them to the end user, then yeah, you need to translate them.
The untranslated version of these strings should indeed be just string constants, so you have no allocation headaches. However, do not waste time and effort coding your own translation infrastructure. Use GNU gettext.
If your code is layered on top of another piece of code, it is vital that you provide direct access to all the error information and relevant context information that that code produces, and you make it easy for developers against your code to wrap up all that information in an error message for the end user.
For instance, if your library produces error codes of its own devising as a direct consequence of failing system calls, your state object needs methods that return the errno value observed immediately after the system call that failed, the name of the file involved (if any), and ideally also the name of the system call itself. People get this wrong waaay too often -- for instance, SQLite, otherwise a well designed API, does not expose the errno value or the name of the file, which makes it infuriatingly hard to distinguish "the file permissions on the database are wrong" from "you have a bug in your code".
EDIT: Addendum: common mistakes in this area include:
Contorting your API (e.g. with use of out-parameters) so that functions that would naturally return some other value can return an error code.
Not exposing enough detail for callers to be able to produce an error message that allows a knowledgeable human to fix the problem. (This knowledgeable human may not be the end user. It may be that your error messages wind up in server log files or crash reports for developers' eyes only.)
Exposing too many different fine distinctions among errors. If your callers will never plausibly do different things in response to two different error codes, they should be the same code.
Providing more than one success code. This is asking for subtle bugs.
Also, think very carefully about which APIs ought to be allowed to fail. Here are some things that should never fail:
Read-only data accessors, especially those that return scalar quantities, most especially those that return Booleans.
Destructors, in the most general sense. (This is a classic mistake in the UNIX kernel API: close and munmap should not be able to fail. Thankfully, at least _exit can't.)
There is a strong case that you should immediately call abort if malloc fails rather than trying to propagate it to your caller. (This is not true in C++ thanks to exceptions and RAII -- if you are so lucky as to be working on a C++ project that uses both of those properly.)
In closing: for an example of how to do just about everything wrong, look no further than XPCOM.
You return pointers to static const char [] objects. This is always the correct way to handle error strings. If you need them localized, you return pointers to read-only memory-mapped localization strings.
In C, if you don't have internationalization (I18N) or localization (L10N) to worry about, then pointers to constant data is a good way to supply error message strings. However, you often find that the error messages need some supporting information (such as the name of the file that could not be opened), which cannot really be handled by constant data.
With I18N/L10N to worry about, I'd recommend storing the fixed message strings for each language in an appropriately formatted file, and then using mmap() to 'read' the file into memory before you fork any threads. The area so mapped should then be treated as read-only (use PROT_READ in the call to mmap()).
This avoids complicated issues of memory management and avoids memory leaks.
Consider whether to provide a function that can be called to get the latest error. It can have a prototype such as:
int get_error(int errnum, char *buffer, size_t buflen);
I'm assuming that the error number is returned by some other function call; the library function then consults any threadsafe memory it has about the current thread and the last error condition returned to that thread, and formats an appropriate error message (possibly truncated) into the given buffer.
With C++, you can return (a reference to) a standard String from the error reporting mechanism; this means you can format the string to include the file name or other dynamic attributes. The code that collects the information will be responsible for releasing the string, which isn't (shouldn't be) a problem because of the destructors that C++ has. You might still want to use mmap() to load the format strings for the messags.
You do need to be careful about the files you load and, in particular, any strings used as format strings. (Also, if you are dealing with I18N/L10N, you need to worry about whether to use the 'n$ notation to allow for argument reordering; and you have to worry about different rules for different cultures/languages about the order in which the words of a sentence are presented.)
I guess you could use PWideChars, as Windows does. Its thread safe. What you need is that the calling app creates a PwideChar that the Dll will use to set an error. Then, the callling app needs to read that PWideChar and free its memory.
R. has a good answer (use static const char []), but if you are going to have various spoken languages, I like to use an Enum to define the error codes. That is better than some #define of a bunch of names to an int value.
return integers, don't set some global variable (like errno— even if it is potentially TLSed by an implementation); aking to Linux kernel's style of return -ENOENT;.
have a function similar to strerror that takes such an integer and returns a pointer to a const string. This function can transparently do I18N if needed, too, as gettext-returnable strings also remain constant over the lifetime of the translation database.
If you need to provide non-static error messages, then I recommend returning strings like this: error_code_t function(, char** err_msg). Then provide a function to free the error message: void free_error_message(char* err_msg). This way you hide how the error strings are allocated and freed. This is of course only worth implementing of your error strings are dynamic in nature, meaning that they convey more than just a translation of error codes.
Please havy oversight with mu formatting. I'm writing this on a cell phone...