Convert three letter language code to language identifier (LANGID) - c++

Is there some way in the Win32 API to convert a three letter language code, as returned by GetLocaleInfo() with LOCALE_SABBREVLANGNAME specified, to a corresponding LANGID or LCID? That is, going in "reverse" to what GetLocaleInfo() normally does?
What I'm trying to do is to parse what kind of language a resource DLL is using, and so far, without touching anything about the DLL, going by the dll name with a format nameLNG.dll, where LNG is a three letter language code, seems to be the easiest method, assuming such a function exists.
If this isn't easy to do, I guess Plan B is to give our language DLL's a version info resource, specify their respective cultures there, and later on in the application, read which cultures they use.

Unfortunately, there's no direct Win32 API that gives you a LANGID for a three-letter abbreviation.
It looks like CLanguageSupport is your friend today :-) It already implements your plan B to lookup the LANGID based on the contents of the version info resource.
The piece of code you're looking for is in the function
LANGID CLanguageSupport::GetLangIdFromFile(LPCTSTR pszFilename)
Of course, the drawback is that you may have a mismatch between the version info and the DLL name. But you'd very quickly catch it during tests. And if you let a tool such as appTranslator create the DLLs for you, you're sure to be on the safe side.

You can enumerate the installed locales using EnumSystemLocales() and build the map yourself. I do this during application initialization in a service that I wrote a while ago and it has worked out well thus far.
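A minimal sketch of such a map, assuming the ANSI variants EnumSystemLocalesA() and GetLocaleInfoA(). The names LangMap, BuildLangMap and LangIdFromDllName are hypothetical, and the non-Windows branch only seeds three well-known pairs so that the lookup can be tried anywhere:

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Three-letter abbreviation (e.g. "ENU") -> LANGID (e.g. 0x0409).
using LangMap = std::map<std::string, unsigned short>;

#ifdef _WIN32
namespace {
LangMap* g_fill = nullptr;  // map being filled by the enumeration callback

// Called once per installed locale; the argument is the LCID as hex text.
BOOL CALLBACK OnLocale(LPSTR hexLcid) {
    const LCID lcid = static_cast<LCID>(std::strtoul(hexLcid, nullptr, 16));
    char abbrev[16] = {};
    if (GetLocaleInfoA(lcid, LOCALE_SABBREVLANGNAME, abbrev,
                       static_cast<int>(sizeof abbrev)) > 0) {
        g_fill->emplace(std::string(abbrev), LANGIDFROMLCID(lcid));
    }
    return TRUE;  // keep enumerating
}
}  // namespace
#endif

// Hypothetical helper: builds the reverse map once, e.g. at startup.
LangMap BuildLangMap() {
    LangMap map;
#ifdef _WIN32
    g_fill = &map;
    EnumSystemLocalesA(OnLocale, LCID_INSTALLED);
    g_fill = nullptr;
#else
    // Assumption: off Windows, seed a few well-known pairs for illustration.
    map = {{"ENU", 0x0409}, {"DEU", 0x0407}, {"FRA", 0x040C}};
#endif
    return map;
}

// Hypothetical helper: takes "nameLNG.dll" and returns the LANGID for LNG,
// or 0 when the name is too short or the code is not in the map.
unsigned short LangIdFromDllName(const LangMap& map, const std::string& file) {
    const std::string::size_type extLen = 4;  // length of ".dll"
    if (file.size() < extLen + 3) return 0;
    std::string code = file.substr(file.size() - extLen - 3, 3);
    for (char& c : code)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    const auto it = map.find(code);
    return it == map.end() ? 0 : it->second;
}
```

With a map built this way, LangIdFromDllName(map, "appENU.dll") would yield 0x0409 on a system where the en-US locale is installed. The file-name helper does not verify the extension; it simply assumes the nameLNG.dll format from the question.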
I would recommend using Plan B in this case. I usually steer clear of encoding stuff into the file name. If for no other reason, using the 3-character ISO-639 variant isn't perfect unless you strictly specify which variant you are using - ISO-639-2/B, ISO-639-2/T, or ISO-639-3.
If you need to provide locale-specific variants, then you should take a close look at RFC3066. Basically, you need to specify the language and country code and, in some cases, the region code as well. In any case, the LCID encapsulates all of this goodness.
The one thing that I am not completely certain of is whether the langID in the resource information is a full LCID or not. The codes listed in the VERSIONINFO reference are LCIDs so I would try using an LCID in the VERSIONINFO header. If not, you can always include the information as a string in the string block.

You can obtain the LangID by calling GetLocaleInfo() with LOCALE_RETURN_NUMBER|LOCALE_ILANGUAGE as the LCType parameter, just as you passed LOCALE_SABBREVLANGNAME to obtain the three-letter abbreviation. By passing the same LCID to both calls, you can store the corresponding LangID alongside the three-letter code.
Note that MSDN says LOCALE_ILANGUAGE should be avoided in favor of LOCALE_SLANG on Vista and above, but I believe that comment does not apply when LOCALE_ILANGUAGE is combined with LOCALE_RETURN_NUMBER.
As your project evolves, you might generate several localized files for each language. For this reason, I would suggest you store your localized files in subdirectories named after the language. Microsoft has used directories named after the LangID's, for example "1033" as the English resources directory. I think it would be more friendly to use the three-letter code, such as "ENU\name.dll". In any case, subdirectories are a simple solution and hopefully won't complicate your build process as much as changing the target file name.

Rename Win32 Functions for Security on C++

Is there a way to rename a win32 function like GetVolumeInformationW() using #define ?
For example:
#define abc(LPCWSTR a, LPWSTR b, ...) GetVolumeInformationW(Some arguments..)
Why do that? I want to hide the function name from debugger programs like IDA. Is there some way to do that?
Language: C++
There is no point in using #define for this, as it will have no effect on the contents of the binary executable. Preprocessor macros only affect what you as a programmer see in the source; they are expanded before compilation, so the compiler and linker still see the real function name.
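A small sketch can make the text substitution visible. Here real_api is a hypothetical stand-in for a platform function, and hidden_name, EXPAND_AND_QUOTE and QUOTE are illustrative names only:

```cpp
#include <string>

// Hypothetical stand-in for a platform function such as GetVolumeInformationW.
int real_api(int x) { return x + 1; }

// The attempted "rename": after preprocessing, hidden_name is just real_api.
#define hidden_name real_api

// Two-step stringification: the argument is macro-expanded first, then quoted,
// which shows the identifier the compiler actually receives.
#define QUOTE(x) #x
#define EXPAND_AND_QUOTE(x) QUOTE(x)

std::string NameSeenByCompiler() { return EXPAND_AND_QUOTE(hidden_name); }

// Written with the macro, compiled as an ordinary call to real_api.
int CallThroughMacro(int x) { return hidden_name(x); }
```

NameSeenByCompiler() returns "real_api", and &hidden_name is the same address as &real_api, so nothing about the emitted symbol changes.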
If you do not want the function to appear in the Import Table of your executable, then you can instead load the function dynamically using GetProcAddress. That way, a disassembler will probably be unable to determine which function the address is pointing to, when the function is being called. However, the disassembler will be able to see that you are using GetProcAddress for something, it just won't know what. Using the function GetProcAddress may make someone trying to crack your software suspicious, because that is a common thing to do if you are trying to hide something.
If you do not want the string GetVolumeInformationW to appear in cleartext in your executable file, then you can encrypt or obfuscate it, for example by storing it reversed and reversing it back before passing it to GetProcAddress. That is only a very simple example of how it could be done. XORing every character with a key (XOR is the ^ operator in C++) to encrypt, and doing the same again to decrypt, would probably be a better solution, as the encrypted bytes are then not easily identifiable as text.
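A minimal sketch of the XOR variant, assuming an arbitrary one-byte key. The names kKey, kEncodedName, XorWithKey and DecodedApiName are hypothetical, and the encoded bytes are assumed to have been produced offline so that the cleartext name is not in the source:

```cpp
#include <cstddef>
#include <string>

constexpr unsigned char kKey = 0x5A;  // arbitrary illustrative key

// "GetVolumeInformationW" XORed byte by byte with kKey, prepared offline.
constexpr unsigned char kEncodedName[] = {
    0x1D, 0x3F, 0x2E, 0x0C, 0x35, 0x36, 0x2F, 0x37, 0x3F, 0x13, 0x34,
    0x3C, 0x35, 0x28, 0x37, 0x3B, 0x2E, 0x33, 0x35, 0x34, 0x0D};

// XORs every byte with the key. Applying it twice restores the input,
// so the same function both encodes and decodes.
std::string XorWithKey(const unsigned char* data, std::size_t size,
                       unsigned char key) {
    std::string out(size, '\0');
    for (std::size_t i = 0; i < size; ++i)
        out[i] = static_cast<char>(data[i] ^ key);
    return out;
}

// Recovers the function name at run time, just before it is needed.
std::string DecodedApiName() {
    return XorWithKey(kEncodedName, sizeof kEncodedName, kKey);
}
```

The result of DecodedApiName().c_str() is what would be passed to GetProcAddress. A single-byte XOR only hides the name from a plain string search; it is not meant as real encryption.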
Is there a way to rename a win32 function like GetVolumeInformationW() using #define ?
No, macros do not serve that purpose. You could define a macro such that Win32 function names do not appear literally in your source code, other than in the macro definitions, but that does not rename the functions, nor even prevent the function names from appearing in your compiled object files, libraries, or executables.
It can't, because the Win32 API's function names are established by the platform headers and (especially) libraries. You're not rebuilding the platform libraries, only linking the existing ones to your own code, so your code has no alternative but to use the API's function names to call API functions.
Why do that? I want to hide the function name from debugger programs like IDA. Is there some way to do that?
Obfuscation is not a very effective defense technique. It is far more likely to make trouble for you, in the ordinary development of your software, than to present a major hurdle to a skilled adversary. You can obfuscate the names of your own functions if you nevertheless wish to do so, but no, you cannot change the names of platform API functions.
You'll be calling a function out of a shared DLL. Defines are strictly preprocessor.
What you want to do is create a hash function and hash the string "GetVolumeInformationW", as well as the name of the module it lives in, for example "Kernel32.dll".
Get the PEB using the FS register (x86) or the GS register (x64), then go to the PEB_LDR_DATA list. Walk each list entry and compare the hash of its DLL name against your hashed Kernel32 string. If the hashes match, grab the base address of the library from that same structure.
After this you will then trace the export table. And do the same thing you did above, where you compare each export name to the hashed "GetVolumeInformationW" string. When it's found, you will then call the address it's at using a function pointer.
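The comparison step can be sketched in portable C++. This sketch assumes 32-bit FNV-1a as the hash (the answer does not name one), and Fnv1a and FindByHash are hypothetical names. The vector of names stands in for the export name table; the PEB and export-table walking itself is not shown:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 32-bit FNV-1a. Being constexpr, it can hash a literal at compile time,
// so typically only the resulting number needs to be stored.
constexpr std::uint32_t Fnv1a(const char* s) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    while (*s != '\0') {
        h ^= static_cast<unsigned char>(*s++);
        h *= 16777619u;             // FNV prime
    }
    return h;
}

// Returns the index of the first name whose hash equals `wanted`, or -1.
// In the scheme described above, `names` would come from the export table.
int FindByHash(const std::vector<std::string>& names, std::uint32_t wanted) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (Fnv1a(names[i].c_str()) == wanted) return static_cast<int>(i);
    }
    return -1;
}
```

A caller would declare something like constexpr std::uint32_t kWanted = Fnv1a("GetVolumeInformationW"); and pass kWanted to FindByHash. Whether the literal is fully dropped from the binary depends on the compiler, so that is worth checking in the actual build.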
This is the sole way to do it. Bonus points if the encrypted strings are built on the stack, so when coding it, write:
char szKernel32[] = { 'K', 'e', 'r', 'n', /* ......... */ };
Also, do not use GetProcAddress. It defeats the point of hiding, since anyone experienced with IDA will instantly search for GetProcAddress.

Importing constants out of C++ headers instead of hardcoding them: extending .net controls?

I've been researching how to extend .net controls to have more freedom to do the same things you can do with the regular windows API in C++ in a VB program. For example, if you want to add week numbers to a calendar control, you'll have to manually import the DLL and extend the control's class, calling internal windows functions.
I've found various topics on how people handle this, and I'm not quite happy with the 'canonical method'. To be honest, I think it's a pretty bad paradigm to use.
These internal windows functions use pointers to set magic properties.
First, I find it rather strange that a pointer, with its system-dependent size, is being abused to hold something that isn't a memory location but a value. That aside, these pointers are also used to select which attribute is being set.
For example, (leaving out all the boilerplate necessary to link up the code), changing the first day of the week to Tuesday would use this code:
Private Const MCM_FIRST As Int32 = &H1000
Private Const DTM_FIRST As Int32 = &H1000
Private Const DTM_GETMONTHCAL As Int32 = (DTM_FIRST + 8)
Private Const MCM_SETFIRSTDAYOFWEEK As Int32 = (MCM_FIRST + 15)
Dim hMonthView As IntPtr =
SendMessage(Me.Handle, DTM_GETMONTHCAL, IntPtr.Zero, IntPtr.Zero)
Call SendMessage(hMonthView, MCM_SETFIRSTDAYOFWEEK, 0&, 1&)
So the magic values 0x1008 and 0x100F are what my question is about in this code.
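For reference, the arithmetic behind those two message numbers can be checked with a short C++ sketch. The four constants mirror the declarations in the VB snippet above, and ToVbConst is a hypothetical helper that prints a value back in VB syntax. Note that 0x1000 + 15 is 0x100F, since 15 is F in hexadecimal:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Same definitions as in the VB snippet, written as C++ constants.
constexpr std::int32_t MCM_FIRST = 0x1000;
constexpr std::int32_t DTM_FIRST = 0x1000;
constexpr std::int32_t DTM_GETMONTHCAL = DTM_FIRST + 8;
constexpr std::int32_t MCM_SETFIRSTDAYOFWEEK = MCM_FIRST + 15;

// Checked at compile time.
static_assert(DTM_GETMONTHCAL == 0x1008, "0x1000 + 8");
static_assert(MCM_SETFIRSTDAYOFWEEK == 0x100F, "0x1000 + 15");

// Hypothetical helper: formats one constant as a VB declaration.
std::string ToVbConst(const char* name, std::int32_t value) {
    char line[128];
    std::snprintf(line, sizeof line, "Private Const %s As Int32 = &H%X",
                  name, static_cast<unsigned>(value));
    return line;
}
```

A small generator along these lines, compiled against the real header instead of local copies, would be one possible way to produce VB constants without typing the numbers by hand.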
First off, this is a rather strange way of working: these values aren't documented anywhere as far as I know other than the examples. What if I need a property where there happens to not be an internet tutorial on so far? Where/how do I find the value of MCM_<ARBITRARY_VALUE_HERE> in general?
Note: I mean the latter question in the broad, general sense: not applying to just the specific calendar control the example is about, but really any Windows control. I can already google up the specific C++ header file by name (e.g. for the example it's defined in Commctrl.h); it's just that that piece of information is rather useless if I don't know the idiomatic way to pull something like that out of the C++ header into the VB code.
Secondly... these values are defined in headers somewhere. Is it not possible to import the values from the proper header? This way the program will stay working in the (admittedly unlikely) scenario where the DLL is changed by re-compiling it.
One approach for this back for VB6 was to prepare a TLB file with constants, function declarations, etc. of the Win32 API, and then reference that in the VB6 program. The TLB didn't provide COM objects, it was just a convenient way of packaging up all the declarations as though they were in (what we now think of as) an assembly.
As far as I can think, that approach should still work perfectly well today in .NET through "COM" interop. You can just as easily reference the TLB in a C# or VB project and thereby access its contents.
The book Hardcore Visual Basic by Bruce McKinney included a disk with a prepared TLB for this purpose, and this seems to still be available today:
http://vb.mvps.org/hardweb/mckinney2a.htm
I don't know how comprehensive this was at the time, nor if it is really still up to date. At the very least it seems instructive in how to prepare a TLB for this type of approach.
The following page also provides a description of this approach with some additional explanation and examples (too long to copy in here).
http://www.brainbell.com/tutors/Visual_Basic/newfile156.html

How to find a pointer to a function by string

I have a list of functions in a text file that I'd like to expose to LLVM for its execution engine at run time. I'm wondering if it's possible to find pointers to the functions at runtime rather than hard-coding all the GlobalMappings by hand, as I'd probably like to add more later. For example:
// File: InternalFunctions.txt
PushScreen
PopScreen
TopScreen
// File: ExposeEngine.cpp
// Somehow figure out the address of the function specified in a string
void* addy = magicAddress("PushScreen");
jit->addGlobalMapping(llvmfunction, addy);
If this is possible I'd love to know how to do it, as I am trying to write my game engine by JIT-ing C++. I was able to get some results earlier, but I had to hard-code the mappings. I noticed that Gtk uses something along the lines of what I'm asking. When you use Glade and provide a signal handler, the program you build in C will automatically find the function in your executable referenced by the string provided in the Glade file. If getting results requires me to look into this Gtk thing I'd be more than happy to, but I don't know what feature or part of the API deals with that; I've already tried to look it up. I'd love to hear suggestions or advice.
Yes, you can do this. Look at the man pages for dlopen() and dlsym(): these functions are standard on *nix systems and let you look up symbols (functions or variables) by name. There is one significant issue, which is that C++ function names are usually "mangled" to encode type information. A typical way around this is to define a set of wrapper functions in an extern "C" {} block: these will be non-member, C-style functions which can then call into your C++ code. Their names will not be mangled, making them easy to look up using dlsym().
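A minimal sketch of that lookup on a POSIX system. The name magicAddress is the placeholder from the question, and the body of PushScreen is invented for illustration:

```cpp
#include <dlfcn.h>

// Unmangled wrapper, as suggested above. A real version would call into
// the engine's C++ code; this body is only a placeholder.
extern "C" void PushScreen() {}

// Looks `name` up among the symbols visible to the running process.
// Returns nullptr when the symbol cannot be found.
void* magicAddress(const char* name) {
    // A null filename yields a handle for the main program together with
    // the shared libraries it has already loaded.
    static void* self = dlopen(nullptr, RTLD_LAZY);
    return self != nullptr ? dlsym(self, name) : nullptr;
}
```

Functions that live in shared libraries (for example atoi from the C library) are found this way without further setup. For functions defined in the executable itself, such as PushScreen above, the GNU toolchain generally needs the -rdynamic link option so that they are placed in the dynamic symbol table, and older glibc versions also need -ldl.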
This is a pretty standard way that some plugin architectures work. Or at least used to work, before everyone started using interpreted languages!

C++ Exposed property for COM in idl is showing as lower case "all of a sudden"

I have this old C++ COM component. I took the latest code base, built it and found that one of the properties has become lower case. For example, in the pre-compiled DLL I have a property "Type", but when building from source it's called "type". The IDL shows that the property is called "Type". So what could possibly be happening here?
COM is case-insensitive, so there is only one entry in the library's symbol table for the symbol "type". The version which is put into the symbol table is the first one that the compiler encounters.
Microsoft's advice on the matter is simply:
Make sure that the same name is not already present in the IDL file when introducing a new identifier.
You should stick to either Type or type in the IDL, for consistent results.
You discovered a quirk in the OS stock implementation of ICreateTypeLib, used by practically all tool chains on Windows that can create a type library. It uses a rather crude way to deal with possible problems caused by languages that are not case-sensitive, VB/A being a prominent example.
At issue is the definition of an identifier with one casing, being referenced elsewhere in the type library with another casing. Not a problem at all in, say, VB, big problem when the client programmer uses a case-sensitive language like C# or C++.
The "fix" it uses is to force the casing to be consistent everywhere in the library. Unfortunately it is not very sophisticated about it. Best example is a method declaration earlier in the type library that takes an argument named type. Any identifier named Type in the rest of the type library will now get case-converted to type.
Repairing this problem is easy enough, just change the name of the identifier so it no longer matches. You'll have to find it, not so easy, best to use Oleview.exe, File > View Typelib command. Copy/paste the decompiled IDL into a text editor and use its Search command.
I had the same problem almost 10 years after this question was asked and I would like to share my solution (thanks for the help in understanding the problem).
First I would like to say that I had several names whose casing was changed by tlbimp and changing all the instances of these names to my expected casing in the IDL fixed all but one. I'm assuming that that name (Text) came from a different IDL I imported. I was also not happy with the solution of changing the names of parameters and the like since in the future someone else may change them.
The solution I found was to introduce a dummy interface with the casing I wanted. I did this before all other imports and then referenced it in the library section of the IDL. Note that both these details are required. If you don't put it in the library section it's ignored and if I defined it at the beginning of the library section after the imports it's too late.
import "oaidl.idl";
import "ocidl.idl";

[
    uuid(4EA92D5A-BF84-46C4-AA38-0F7DEADC69B),
    helpstring("Ensure that names used in interop have correct casing")
]
interface IAmHack : IUnknown
{
    HRESULT Space();
    HRESULT The();
    HRESULT Final();
    HRESULT Frontier();
};

// ...

library MyLib
{
    interface IAmHack;
    importlib("stdole2.tlb");
    // ...
};

Localization of string literals

I need to localize error messages from a compiler. As it stands, all error messages are spread throughout the source code as string literals in English. We want to translate these error messages into German. What would be the best way to approach this? Leave the string literals as-is, and map the char* to another language inside the error reporting routine, or replace the english literals with a descriptive macro, eg. ERR_UNDEFINED_LABEL_NAME and have this map to the correct string at compile time?
How is such a thing approached in similar projects?
On Windows, typically this is done by replacing the string with integer constants, and then using LoadString or similar to get them from a resource in a DLL or EXE. Then you can have multiple language DLLs and a single EXE.
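The integer-constant idea can be sketched in portable C++. ERR_UNDEFINED_LABEL_NAME is the identifier from the question; the other names and both message texts are made up for illustration, and the in-memory tables stand in for what LoadString() would read from per-language resources:

```cpp
#include <string>

// Message identifiers replace the English literals at each call site.
enum MsgId { ERR_UNDEFINED_LABEL_NAME, ERR_FILE_NOT_FOUND, MSG_COUNT };

enum class Lang { English, German };

// One table per language, indexed by MsgId. On Windows these would live in
// STRINGTABLE resources, possibly in separate language DLLs.
const char* const kEnglish[MSG_COUNT] = {"undefined label name",
                                         "file not found"};
const char* const kGerman[MSG_COUNT] = {"undefinierter Labelname",
                                        "Datei nicht gefunden"};

// Returns the message text for `id` in the requested language,
// or an empty string for an out-of-range id.
std::string Message(Lang lang, MsgId id) {
    if (id < 0 || id >= MSG_COUNT) return std::string();
    const char* const* table = (lang == Lang::German) ? kGerman : kEnglish;
    return table[id];
}
```

The error reporting routine would then call Message(currentLanguage, ERR_UNDEFINED_LABEL_NAME) instead of using the literal, where currentLanguage is whatever the application derives from the user's locale.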
On Unixy systems I believe the most typical approach is gettext. The end result is similar, but instead of defining integer constants, you wrap your English string literals in a macro, and it will apply some magic to turn that into a localized string.
The most flexible way would be for the compiler to read the translated messages from message catalogs, with the choice of language being made according to the locale. This would require changing the compiler to use some tool like gettext.
Just a quick thought...
Could you overload your error reporting routine? Say you are using
printf("MESSAGE")
You could overload it in a way that "MESSAGE" is the input, and you hash it to the corresponding message in German.
Could this work?
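As a rough sketch of that idea, the English literal can serve as the key of a hash table. GermanCatalog and Translate are hypothetical names and the two entries are illustrative only:

```cpp
#include <string>
#include <unordered_map>

// English literal -> German text. std::unordered_map is a hash table, so a
// lookup hashes the English message, as proposed above.
const std::unordered_map<std::string, std::string>& GermanCatalog() {
    static const std::unordered_map<std::string, std::string> catalog = {
        {"undefined label name", "undefinierter Labelname"},
        {"file not found", "Datei nicht gefunden"},
    };
    return catalog;
}

// Returns the German text, or the English original when no translation exists.
std::string Translate(const std::string& english) {
    const auto& catalog = GermanCatalog();
    const auto it = catalog.find(english);
    return it == catalog.end() ? english : it->second;
}
```

The reporting routine would pass each literal through Translate() before printing. Falling back to the English text keeps untranslated messages usable, which is broadly how gettext-style catalogs behave as well.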
You could use my CMsg() and CFMsg() wrappers around the LoadString() API. They make your life easier to load and format the strings pulled out of the resources.
And of course, appTranslator is your best friend to translate your resources ;-)
Disclaimer: I'm the author of appTranslator.
On Windows you can use the resource compiler and the WinAPI load functions to have localized strings and other resources. FindResource() and its specialized derivatives like LoadString() will automatically load language specific resources according to the user's current locale. FindResourceEx() even allows you to manually specify the language version of the resource you wish to retrieve.
In order to enable this in your program you must first change it to compile the strings into a resource file (.rc) and use LoadString() to fetch the strings at runtime instead of using literal strings. Within the resource file you then set up multiple language versions of the STRINGTABLEs you use, with the LANGUAGE modifier. The multilingual resources are then loaded based on the search order described on MSDN under "Multiple-Language Resources".
Note: If you have no reason to need a single executable, or are doing something like using a user-selected language from within your app, it gives you more control and less confusion to compile each language into a separate DLL and load them dynamically, rather than having one large resource file and trying to switch locales dynamically.
Here is an example of a multiple language StringTable resource file (ie:strings.rc):
#include <windows.h>

// String ID shared by every language version of the table
#define IDS_HELLO 361

STRINGTABLE
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_CAN
BEGIN
    IDS_HELLO, "Hello!"
END

STRINGTABLE
LANGUAGE LANG_FRENCH, SUBLANG_NEUTRAL
BEGIN
    IDS_HELLO, "Bonjour!"
END