Localization of string literals - c++

I need to localize error messages from a compiler. As it stands, all error messages are spread throughout the source code as string literals in English. We want to translate these error messages into German. What would be the best way to approach this? Leave the string literals as-is and map the char* to another language inside the error reporting routine, or replace the English literals with a descriptive macro, e.g. ERR_UNDEFINED_LABEL_NAME, and have this map to the correct string at compile time?
How is such a thing approached in similar projects?

On Windows, typically this is done by replacing the string with integer constants, and then using LoadString or similar to get them from a resource in a DLL or EXE. Then you can have multiple language DLLs and a single EXE.
On Unixy systems I believe the most typical approach is gettext. The end result is similar, but instead of defining integer constants, you wrap your English string literals in a macro, and it will apply some magic to turn that into a localized string.

The most flexible way would be for the compiler to read the translated messages from message catalogs, with the choice of language being made according to the locale. This would require changing the compiler to use some tool like gettext.
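A minimal sketch of the gettext approach described above, assuming a glibc-style libintl. The domain name "mycompiler", the catalog directory, and the helper function names are placeholders, not part of any real project. When no catalog is found, gettext() simply returns the English text it was given.

```cpp
#include <clocale>
#include <libintl.h>

// Conventional shorthand so the English literal stays readable at the call site.
#define _(msgid) gettext(msgid)

// Hypothetical start-up hook; "mycompiler" and the directory are placeholders.
void init_localization() {
    std::setlocale(LC_ALL, "");                         // use the user's locale
    bindtextdomain("mycompiler", "/usr/share/locale");  // where the .mo catalogs live
    textdomain("mycompiler");                           // default domain for gettext()
}

// Hypothetical error-message accessor: the literal stays in English in the source.
const char* undefined_label_message() {
    return _("undefined label name");
}
```

The translators then work on the extracted message catalog for German, and the source code keeps its English literals.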

Just a quick thought...
Could you overload your error reporting routine? Say you are using
printf("MESSAGE")
You could overload it in a way that "MESSAGE" is the input, and you hash it to the corresponding message in German.
Could this work?
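A rough sketch of that lookup idea, assuming a hypothetical translate() helper and made-up German strings: the English literal is the key into a hash table, and the English text is returned unchanged when no translation is known.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical table; a real compiler would more likely load it from a
// translation file than hard-code it.
static const std::unordered_map<std::string, std::string>& german_table() {
    static const std::unordered_map<std::string, std::string> table = {
        {"undefined label name", "undefinierter Labelname"},
        {"unexpected end of file", "unerwartetes Dateiende"},
    };
    return table;
}

// Returns the German text for an English message, or the English text
// itself when no translation exists.
std::string translate(const std::string& english) {
    const auto& table = german_table();
    auto it = table.find(english);
    return it != table.end() ? it->second : english;
}
```

The error reporting routine would call translate() on the message just before printing it. One drawback of this design is that a typo fix in an English literal silently breaks the lookup for that message.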

You could use my CMsg() and CFMsg() wrappers around the LoadString() API. They make your life easier to load and format the strings pulled out of the resources.
And of course, appTranslator is your best friend to translate your resources ;-)
Disclaimer: I'm the author of appTranslator.

On Windows you can use the resource compiler and the WinAPI load functions to have localized strings and other resources. FindResource() and its specialized derivatives like LoadString() will automatically load language specific resources according to the user's current locale. FindResourceEx() even allows you to manually specify the language version of the resource you wish to retrieve.
In order to enable this in your program you must first change your program to compile the strings in a resource file (.rc) and use LoadString() to fetch the strings at runtime instead of using a literal string. Within the resource file you then set up multiple language versions of the STRINGTABLEs you use, with the LANGUAGE modifier. The multi-lingual resources are then loaded based on the search order described here on MSDN: Multiple-Language Resources
Note: If you have no reason to need a single executable, or are doing something like using a user-selected language from within your app, it gives you more control and less confusion to compile each language in a separate DLL and load them dynamically rather than have a large single resource file and trying to dynamically switch locales.
Here is an example of a multiple-language string table resource file (e.g. strings.rc):
#include <windows.h>  // provides the LANG_* and SUBLANG_* constants
#define IDS_HELLO 361
STRINGTABLE
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_CAN
BEGIN
IDS_HELLO, "Hello!"
END
STRINGTABLE
LANGUAGE LANG_FRENCH, SUBLANG_NEUTRAL
BEGIN
IDS_HELLO, "Bonjour!"
END

Related

Rename Win32 Functions for Security on C++

Is there a way to rename a win32 function like GetVolumeInformationW() using #define ?
For example:
#define abc(LPCWSTR a, LPWSTR b, ...) GetVolumeInformationW(Some arguments..)
Why do that? I want to hide the function name from debugger and disassembler programs like IDA. Is there some way to do that?
Language: C++
There is no point in using #define for this, as this will have no effect on the contents of the binary executable. Using preprocessor macros will only affect what you as a programmer will see, but it won't affect what the compiler or linker will see. See this link for information on how the C++ preprocessor works and its relationship with the compiler/linker.
If you do not want the function to appear in the Import Table of your executable, then you can instead load the function dynamically using GetProcAddress. That way, a disassembler will probably be unable to determine which function the address is pointing to, when the function is being called. However, the disassembler will be able to see that you are using GetProcAddress for something, it just won't know what. Using the function GetProcAddress may make someone trying to crack your software suspicious, because that is a common thing to do if you are trying to hide something.
If you do not want the string GetVolumeInformationW to appear in cleartext in your executable file, then you can encrypt or obfuscate it, for example by storing it in reverse and then reversing it back before passing it to GetProcAddress. That is only a very simple example of how it could be done. XORing every character with a certain key (XOR is the ^ operator in C++) to encrypt, and then doing the same thing again to decrypt, would probably be a better solution, as the encrypted text would not be easily identifiable as text.
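The XOR idea can be sketched as follows. The function name xor_transform and the key value used in the usage note are arbitrary choices for illustration. Note that a character equal to the key becomes a NUL byte, so the sketch keeps the data in a std::string with an explicit length instead of relying on a C string terminator.

```cpp
#include <cstddef>
#include <string>

// XORs every byte with a single-byte key. Applying the function twice with
// the same key restores the original input.
std::string xor_transform(const std::string& input, unsigned char key) {
    std::string output = input;
    for (std::size_t i = 0; i < output.size(); ++i) {
        output[i] = static_cast<char>(static_cast<unsigned char>(output[i]) ^ key);
    }
    return output;
}
```

In practice the encrypted bytes would be produced ahead of time and stored in the program, and xor_transform(stored, 0x5A) would be called just before the name is handed to GetProcAddress. This only defeats a casual search for strings; the key and the decoding loop are still in the binary.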
Is there a way to rename a win32 function like GetVolumeInformationW() using #define ?
No, macros do not serve that purpose. You could define a macro such that Win32 function names do not appear literally in your source code, other than in the macro definitions, but that does not rename the functions, nor even prevent the function names from appearing in your compiled object files, libraries, or executables.
It can't, because the Win32 API's function names are established by the platform headers and (especially) libraries. You're not rebuilding the platform libraries, only linking the existing ones to your own code, so your code has no alternative but to use the API's function names to call API functions.
Why do that? I want to hide the function name from debugger and disassembler programs like IDA. Is there some way to do that?
Obfuscation is not a very effective defense technique. It is far more likely to make trouble for you, in the ordinary development of your software, than to present a major hurdle to a skilled adversary. You can obfuscate the names of your own functions if you nevertheless wish to do so, but no, you cannot change the names of platform API functions.
You'll be calling a function out of a shared DLL. Defines are strictly preprocessor.
What you want to do is create a hash function to hash the string "GetVolumeInformationW", as well as the name of the module that it is in, for example "Kernel32.dll".
Get the PEB using the FS or GS register. Then go to the PEB_LDR_DATA list. Run each list entry and hash the DLL name against your Kernel32 hashed string. If the hashes match, you grab the base of the library in that same structure.
After this you will then trace the export table. And do the same thing you did above, where you compare each export name to the hashed "GetVolumeInformationW" string. When it's found, you will then call the address it's at using a function pointer.
This is the sole way to do it. Bonus points if the encrypted strings are stored on the stack. So when coding it do
char szKernel32[] = { 'K', 'e', 'r', 'n'......... };
Also, do not use GetProcAddress. It defeats the point of hiding, since anyone experienced with IDA will instantly search for GetProcAddress.

Application with a patchable embedded configuration

I work on a cross-platform C++ application (Visual C++, GCC, or clang++, depending on the target platform). I want to embed a configuration string into my application and be able to patch the binary after compilation to change the configuration, so that the application ships preconfigured.
Now I only consider declaring a configuration variable:
const char* embeddedConfig = "*magic*random characters filling the maximum configuration size";
Patcher is going to search for the magic in the binary and replace it with the actual configuration.
I am not sure of the stability of the hacky approach. Is there any more reliable way (perhaps compiler-specific)?
That will work if you keep the size of the text unchanged and declare the constant outside of any function. Such constants are simply put into the data section of the binary by the compiler.
However you will need to re-sign the binary if you are using code signing.
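A minimal sketch of the patcher side, under the assumption that the binary has already been read into memory. The function name patch_config and the fixed slot size are hypothetical. The slot is overwritten in place, so the size of the image never changes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Overwrites the fixed-size slot that starts with `magic` inside `image`.
// Returns false if the magic is not found, or if the new configuration plus
// its terminating NUL does not fit into `slot_size` bytes.
bool patch_config(std::vector<char>& image, const std::string& magic,
                  const std::string& config, std::size_t slot_size) {
    auto pos = std::search(image.begin(), image.end(), magic.begin(), magic.end());
    if (pos == image.end()) return false;
    if (config.size() + 1 > slot_size) return false;
    if (static_cast<std::size_t>(image.end() - pos) < slot_size) return false;

    auto slot_end = pos + static_cast<std::ptrdiff_t>(slot_size);
    std::fill(pos, slot_end, '\0');                // clear the old contents
    std::copy(config.begin(), config.end(), pos);  // write the new configuration
    return true;
}
```

On the application side it may be safer to declare the slot as a char array at namespace scope and read it through a volatile pointer, so that an optimizing compiler cannot fold the placeholder text into the code at compile time. That is a precaution rather than something the answer above requires.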
Embed the string as a resource and use the UpdateResourceA function. See here.

How to find a pointer to a function by string

I have a list of functions in a text file that I'd like to expose to LLVM for its execution engine at run time, I'm wondering if its possible to find pointers to the functions at runtime rather than hard code in all the GlobalMappings by hand as I'd probably like to add in more later. For example:
// File: InternalFunctions.txt
PushScreen
PopScreen
TopScreen
// File: ExposeEngine.cpp
// Somehow figure out the address of the function specified in a string
void* addy = magicAddress("PushScreen");
jit->addGlobalMapping(llvmfunction, addy);
If this is possible I'd love to know how to do it, as I am trying to write my game engine by JIT-ing C++. I was able to create some results earlier, but I had to hard-code the mappings. I noticed that Gtk uses something along the lines of what I'm asking. When you use glade and provide a signal handler, the program you build in C will automatically find the function in your executable referenced by the string provided in the glade file. If getting results requires me to look into this Gtk thing I'd be more than happy to, but I don't know what feature or part of the API deals with that - I've already tried to look it up. I'd love to hear suggestions or advice.
Yes, you can do this. Look at the man pages for dlopen() and dlsym(): these functions are standard on *nix systems and let you look up symbols (functions or variables) by name. There is one significant issue, which is that C++ function names are usually "mangled" to encode type information. A typical way around this is to define a set of wrapper functions in an extern "C" {} block: these will be non-member, C-style functions which can then call into your C++ code. Their names will not be mangled, making them easy to look up using dlsym().
This is a pretty standard way that some plugin architectures work. Or at least used to work, before everyone started using interpreted languages!
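A small sketch of the dlopen()/dlsym() lookup on a POSIX system. The helper name find_symbol is made up, and PushScreen stands in for one of the engine functions from the question. Two assumptions worth stating: functions defined in the executable itself are usually only visible to dlsym() when the program is linked with an option such as -rdynamic, and older glibc versions need -ldl at link time.

```cpp
#include <dlfcn.h>

// Looks up `name` among the symbols visible to the running process.
// Returns nullptr when the symbol cannot be found.
void* find_symbol(const char* name) {
    // A null filename yields a handle for the main program and the
    // libraries loaded along with it.
    void* self = dlopen(nullptr, RTLD_NOW);
    if (self == nullptr) return nullptr;
    void* address = dlsym(self, name);
    dlclose(self);  // only drops the reference taken above
    return address;
}

// Hypothetical engine function, exported without C++ name mangling so that
// its symbol name is exactly "PushScreen".
extern "C" void PushScreen() {}
```

With that in place, each line of InternalFunctions.txt could be passed to find_symbol() and the result handed to addGlobalMapping(), skipping names that come back as nullptr.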

Is there a tool that enables me to insert one line of code into all functions and methods in a C++-source file?

It should turn this
int Yada (int yada)
{
return yada;
}
into this
int Yada (int yada)
{
SOME_HEIDEGGER_QUOTE;
return yada;
}
but for all (or at least a big bunch of) syntactically legal C/C++ - function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a few lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
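For illustration, here is one way the inserted line could be defined so that a single statement traces both entry and exit. ScopeTracer and g_traced_calls are hypothetical names, and the macro body is only an assumption about what such a tracer might do.

```cpp
#include <cstdio>

static int g_traced_calls = 0;  // number of traced function entries so far

// Prints on construction and destruction, so one inserted line covers both
// function entry and every way of leaving the function.
class ScopeTracer {
public:
    explicit ScopeTracer(const char* name) : name_(name) {
        ++g_traced_calls;
        std::fprintf(stderr, "enter %s\n", name_);
    }
    ~ScopeTracer() { std::fprintf(stderr, "leave %s\n", name_); }
    ScopeTracer(const ScopeTracer&) = delete;
    ScopeTracer& operator=(const ScopeTracer&) = delete;

private:
    const char* name_;
};

// Hypothetical definition of the line to be inserted into every function.
#define SOME_HEIDEGGER_QUOTE ScopeTracer scope_tracer_instance(__func__)

int Yada(int yada) {
    SOME_HEIDEGGER_QUOTE;
    return yada;
}
```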
Try Aspect C++ (www.aspectc.org). You can define an Aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf
If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the Linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that this should be passed only in the build phase, not link, or GCC will add in the profiling libraries that normally implement mcount.
I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
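A minimal sketch of the two hooks for -finstrument-functions, assuming GCC or Clang. The hook names and signatures are the ones the compiler expects; the depth counter and the output format are just one possible choice.

```cpp
#include <cstdio>

static int g_depth = 0;  // current call depth as seen by the hooks

extern "C" {

// Both hooks must be excluded from instrumentation themselves, otherwise
// they would be instrumented too and recurse without end.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    std::fprintf(stderr, "%*senter %p (called from %p)\n",
                 g_depth * 2, "", this_fn, call_site);
    ++g_depth;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    --g_depth;
    std::fprintf(stderr, "%*sleave %p (called from %p)\n",
                 g_depth * 2, "", this_fn, call_site);
}

}  // extern "C"
```

The file containing the hooks is compiled normally and linked into the program, while the files to be traced are compiled with -finstrument-functions. The hooks only see addresses, so a separate step (for example a symbol lookup tool) is needed to turn them into function names.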
There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?
I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
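With that regex I also get the function name (group 1). Note that, as written, group 2 captures the optional const qualifier rather than the arguments; wrap the argument-list part of the pattern in its own parentheses if you need the arguments as well.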
Note: you must filter out names like, "while", "do", "for", "switch".
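A sketch of that approach with std::regex. The pattern is adapted, not identical: std::regex has no lookbehind, so the leading (?<=...) is dropped, and the argument list gets its own capture group. The function name find_function_names is made up, and "if" and "catch" are additions to the keyword list. Like any regex-based approach it will miss or misread some legal C++ constructs.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Returns the names of everything in `source` that looks like a function
// definition: a name, a parenthesized argument list, an optional const,
// and an opening brace.
std::vector<std::string> find_function_names(const std::string& source) {
    // Group 1: name, group 2: argument list, group 3: optional const.
    static const std::regex pattern(
        R"((\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*\{)");
    // Control-flow keywords that match the same shape and must be skipped.
    static const std::set<std::string> keywords = {
        "while", "do", "for", "switch", "if", "catch"};

    std::vector<std::string> names;
    for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
         it != end; ++it) {
        const std::string name = (*it)[1].str();
        if (keywords.count(name) == 0) names.push_back(name);
    }
    return names;
}
```

To insert a line, the position just after each match (the end of the match is the opening brace) would be used as the insertion point.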
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using a full language front end, in this case for C++), builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C++ program into another with whatever properties you wish. The transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
"\visibility \returntype \fnname(int \parametername)
{ \body } "
->
"\visibility \returntype \fnname(int \parametername)
{ Heidigger(\CppString\(\fnname\),
\CppString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.

Convert three letter language code to language identifier (LANGID)

Is there some way in the Win32 API to convert a three letter language code, as returned by GetLocaleInfo() with LOCALE_SABBREVLANGNAME specified, to a corresponding LANGID or LCID? That is, going in "reverse" to what GetLocaleInfo() normally does?
What I'm trying to do is to parse what kind of language a resource DLL is using, and so far, without touching anything about the DLL, going by the dll name with a format nameLNG.dll, where LNG is a three letter language code, seems to be the easiest method, assuming such a function exists.
If this isn't easy to do, I guess Plan B is to give our language DLL's a version info resource, specify their respective cultures there, and later on in the application, read which cultures they use.
Unfortunately, there's no direct Win32 API that gives you a LANGID given a three-letter abbreviation.
It looks like CLanguageSupport is your friend today :-) It already implements your plan B to lookup the LANGID based on the contents of the version info resource.
The piece of code you're looking for is in the function
LANGID CLanguageSupport::GetLangIdFromFile(LPCTSTR pszFilename)
Of course, the drawback is that you may have a mismatch between the version info and the DLL name. But you'd very quickly catch it during tests. And if you let a tool such as appTranslator create the DLLs for you, you're sure to be on the safe side.
You can enumerate the installed locales using EnumSystemLocales() and build the map yourself. I do this during application initialization in a service that I wrote a while ago and it has worked out well thus far.
I would recommend using Plan B in this case. I usually steer clear of encoding stuff into the file name. If for no other reason, using the 3-character ISO-639 variant isn't perfect unless you strictly specify which variant you are using - ISO-639-2/B, ISO-639-2/T, or ISO-639-3.
If you need to provide locale-specific variants, then you should take a close look at RFC3066. Basically, you need to specify the language and country code and, in some cases, the region code as well. In any case, the LCID encapsulates all of this goodness.
The one thing that I am not completely certain of is whether the langID in the resource information is a full LCID or not. The codes listed in the VERSIONINFO reference are LCIDs so I would try using an LCID in the VERSIONINFO header. If not, you can always include the information as a string in the string block.
You can obtain the LangID by calling GetLocaleInfo() with LOCALE_RETURN_NUMBER|LOCALE_ILANGUAGE for the LCType parameter, just as you passed LOCALE_SABBREVLANGNAME to obtain the three-letter ISO code. By passing the same lcid to this function, you can store the corresponding LangID with the ISO code.
Note that MSDN says LOCALE_ILANGUAGE should be avoided in favor of LOCALE_SLANG on Vista and above, but I believe the comment does not apply to the use of LOCALE_ILANGUAGE with LOCALE_RETURN_NUMBER.
As your project evolves, you might generate several localized files for each language. For this reason, I would suggest you store your localized files in subdirectories named after the language. Microsoft has used directories named after the LangID's, for example "1033" as the English resources directory. I think it would be more friendly to use the three-letter code, such as "ENU\name.dll". In any case, subdirectories are a simple solution and hopefully won't complicate your build process as much as changing the target file name.