I am very new to localization. I am trying to localize a small piece of software which has 19 folders with names like 'en', 'jp', 'tw'. Inside each one is a text file saved as UTF-8 with the language data.
The problem is that when I try to copy and paste from a Chinese site I get strange glyphs like this: [][][][]. I presume it's because my system font is not Chinese and does not support those characters.
As a developer, should I somehow change my entire system font to have all of these languages supported? Is there such a font? I am unsure how software companies handle these things.
As a developer, should I somehow change my entire system font to have all of these languages supported?
No, you should not. Consider localization strings as data.
The problem is that when I try to copy and paste from a Chinese site I get strange glyphs like this: [][][][]. I presume it's because my system font is not Chinese and does not support those characters.
But you should be provided with such data, and you should know its encoding.
Also, I'd suggest you check out internationalization libraries (like gettext) to avoid reinventing the wheel.
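For example, here is a minimal gettext sketch; the domain name "myapp" and the ./locale directory are illustrative assumptions, not something your project must use:

    #include <libintl.h>
    #include <clocale>
    #include <cstdio>

    #define _(s) gettext(s)  // the usual shorthand for marking translatable strings

    int main()
    {
        setlocale(LC_ALL, "");                // honor the user's locale (LANG etc.)
        bindtextdomain("myapp", "./locale");  // catalogs: ./locale/<lang>/LC_MESSAGES/myapp.mo
        textdomain("myapp");
        printf("%s\n", _("Hello, world"));    // looked up in the catalog at run time
        return 0;
    }

The .mo catalogs are compiled from plain-text .po files with msgfmt, which is exactly the "strings as data" separation described above: translators edit data files, not code.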
I have software originally developed 20 years ago in Visual C++ using MFC without UNICODE. Currently strings are held either in char[] or CString, and it works on English and Japanese Windows PCs until Japanese characters are used, as these tend to get converted to strange characters or empty boxes.
Setting UNICODE is presumably the way forward but will require a massive code change, whereas quite a lot seems to work simply by setting the System Locale to Japan (in Windows' “Language for non-Unicode programs” setting). I have no idea how Windows does this, but some Japanese character handling now works on my English Windows PC, e.g. I can open and save Japanese filenames with no code changes. And in Japan they set the System Locale to English and again much works, but not everything.
I get the impression the problems are due to using a font that doesn't include Japanese characters. Currently I am using Arial / MS Sans Serif with the charset set to ANSI_CHARSET or DEFAULT_CHARSET. Is there a different font I should be using, or can I extend these fonts to include Japanese characters? Or am I barking up the wrong tree, in which case what do I do next? I'm very new to all this, unfortunately…
That's a common question (OK, I guess not so common any more in 2015, as MBCS programs are luckily a dying breed - I still maintain several though...).
Either way, I'm afraid that, depending on your definition of 'working', you'll have to bite the bullet and convert to a Unicode build. If you can't make a business case for that, then you'll have to set the right locale (well, worse, have the user set the 'right' one), test what works and what doesn't, and ask more specific questions about what doesn't.
If your goal is to make one application that correctly displays strings in various encodings in the 'right' way regardless of the locale settings on the computer, and compatible with every input data set / database content without the user having to be aware of encoding issues, then you're out of luck with an MBCS build.
The font missing characters is most likely not the problem. Before you go any further and/or ask more questions, you should read http://www.joelonsoftware.com/articles/Unicode.html, read it again, sleep on it, read it again, and explain to somebody else what the relationship is between 'encoding', 'locale', 'character set', 'font', and 'Unicode code point', because only after you can do that can you decide how to progress with your application. Sorry, it's not what you want to hear, but it's the reality if you've been tasked with handling internationalization.
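To make the encoding-vs-font distinction concrete, here is a minimal illustration (a sketch, not a fix) of why the "Language for non-Unicode programs" setting matters to an MBCS build; the bytes below are the Shift-JIS encoding of テスト ("test"):

    #include <windows.h>

    int main()
    {
        // These bytes are "テスト" in Shift-JIS. MessageBoxA interprets them
        // using the current ANSI code page, so they display correctly only
        // when the system locale is Japanese (code page 932).
        MessageBoxA(NULL, "\x83\x65\x83\x58\x83\x67", "MBCS build", MB_OK);

        // The wide-character API is unambiguous: it displays correctly
        // regardless of the system locale, as long as the font has the glyphs.
        MessageBoxW(NULL, L"\u30C6\u30B9\u30C8", L"Unicode build", MB_OK);
        return 0;
    }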
I am aware of utilities like GNU gettext for making the software multilingual.
You give it a string id and it will return the translated string.
But I also need this for images in Qt.
For example, if I am displaying an image in en_US locale, I want to display a different version of the image if say ja_JP or fr_FR locale is set.
Qt doesn't recommend this. But I still need to do this.
I am working in C/C++ on Linux.
Is there any standard way (like gettext) of achieving this for images?
Any suggestions on this will be appreciated.
Yes, the Qt resource system allows you to specify the language each resource is associated with via the lang attribute in the .qrc file.
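A hedged sketch of what that looks like (file names invented for illustration):

    <RCC>
        <qresource>
            <file>images/banner.png</file>                                <!-- default -->
        </qresource>
        <qresource lang="ja">
            <file alias="images/banner.png">images/banner_ja.png</file>   <!-- Japanese variant -->
        </qresource>
    </RCC>

Code that loads :/images/banner.png, e.g. QPixmap(":/images/banner.png"), then gets the Japanese variant automatically when the application's locale matches, with no lookup logic of your own.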
The title says pretty much everything. Once upon a time, when I was under 13, my older brother did something in Borland Pascal that amazed me. He defined a kind of [8][8] table with values of 1 and 0, meaning foreground and background respectively. With several such tables, he could somehow redefine the default ASCII characters to look like the patterns in the tables. I have no idea how it was done, but it worked.
My question is: can I do a similar thing in ncurses, and if I can, then how?
The short answer is no. What ncurses does is generate ANSI escape codes which are interpreted by the terminal. There are no codes for altering the font. (Although extensions have been proposed, no commonly used terminal supports them, and neither does ncurses.) And there is no generic way of communicating with the terminal through some kind of side channel to change the font. But there might be ways in some specific situations.
If you have direct access to a Linux console, for example, you could do all sorts of things, much like in Borland Pascal. But it will likely be messier and less impressive.
As the selected answer explains, it is not possible for ncurses to render custom glyphs. ncurses only manipulates the terminal screen state via escape codes (clearing and rewriting lines to achieve interactivity).
However, it should be noted that it is very possible to use custom glyphs in the terminal via custom fonts.
This is what Powerline does (a popular terminal UI status line for vim, tmux and friends): https://github.com/powerline/fonts
By patching the fonts, you can inject your glyphs into the existing font being used by the terminal, which you can then access and render via ncurses like any other character.
Of course this is not an ideal solution, but with some automatic patching of the fonts and careful testing, it makes it possible to build an app that uses custom glyphs when you're really in a pinch for more expressive UI tools than ncurses can offer.
Further reading: https://apw-bash-settings.readthedocs.io/en/latest/fontpatching.html
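Once the terminal font is patched, the glyph is an ordinary character as far as ncurses is concerned. A minimal sketch, assuming the wide-character ncursesw library, a UTF-8 locale, and a Powerline-patched font (U+E0A0 is Powerline's branch symbol):

    #include <clocale>
    #include <ncurses.h>

    int main()
    {
        setlocale(LC_ALL, "");        // enable UTF-8 output; link with -lncursesw
        initscr();
        // U+E0A0 encoded as UTF-8; it renders as the branch glyph only if
        // the terminal's font has actually been patched to contain it.
        printw("branch: \xee\x82\xa0\n");
        refresh();
        getch();
        endwin();
        return 0;
    }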
I am finishing an application in Visual C++/the Windows API, and I am using the MySQL C Connector.
The whole application code uses ANSI, and the MySQL C Connector is used in ANSI mode too.
This program will be used on Polish and German computers with Windows XP/Vista/7 or 8.
I want to correctly display German umlauts and Polish accented characters in:
DialogBox controls (strings are loaded from language files)
Generated XHTML documents
Strings retrieved from MySql database displayed on controls and in XHTML documents
I have heard about MultiByteToWideChar and the Unicode functions (MessageBoxW etc.), but the application code is nearly finished, and converting would be a lot of work...
How can I handle character encoding correctly with the least work and time?
Maybe by changing the system code page for non-Unicode programs?
First, of course: what code set is MySQL returning? Or perhaps: what code set was used when writing the data into the database? Other than that, I don't think you'll be able to avoid using either wide characters or multibyte characters: for single-byte characters, German would use ISO 8859-1 (code page 1252) or ISO 8859-15, and Polish ISO 8859-2 (code page 1250). But what are you doing with the characters in your own code? You may be able to get away with UTF-8 (code page 65001) without many changes. The real question is where the characters originally come from (although it might not be too difficult to translate them into UTF-8 immediately at the source); I don't think that Windows respects the code page for input.
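If you do go the UTF-8 route, a sketch of converting at the boundary to the wide-character API might look like the following (table and column names invented for illustration; MessageBoxW is used because the wide API works even in an ANSI build):

    #include <windows.h>
    #include <mysql.h>
    #include <string>

    // UTF-8 (from MySQL) to UTF-16 (for the W functions).
    std::wstring Utf8ToWide(const char* utf8)
    {
        int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
        if (len <= 0)
            return std::wstring();
        std::wstring wide(len, L'\0');   // room for the terminator
        MultiByteToWideChar(CP_UTF8, 0, utf8, -1, &wide[0], len);
        wide.resize(len - 1);            // drop the terminator written by the API
        return wide;
    }

    void ShowFirstName(MYSQL* conn)
    {
        mysql_set_character_set(conn, "utf8");  // ask the connector for UTF-8
        if (mysql_query(conn, "SELECT name FROM kunden LIMIT 1") != 0)
            return;
        MYSQL_RES* res = mysql_store_result(conn);
        MYSQL_ROW row = mysql_fetch_row(res);
        if (row && row[0])
            MessageBoxW(NULL, Utf8ToWide(row[0]).c_str(), L"Name", MB_OK);
        mysql_free_result(res);
    }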
Although it doesn't help you much to know it, you're dealing with an almost impossible problem, since so much depends on things outside your program: things like the encoding of the display font, or the keyboard driver, for example. In fact, it's not rare for programs to display one thing on the screen and something different when outputting to the printer, or to display one thing on the screen but something different when the data is written to a file and read with another program. The situation is improving: modern Unix and the Internet are gradually (very gradually) standardizing on UTF-8, everywhere and for everything, and Windows normally uses UTF-16 for everything that is pure Windows (but needs to support UTF-8 for the Internet). But even using the platform standard won't help if the human client has installed (and is using) fonts which don't have the characters you need.
I'd like to print a document. The content of the document is tables and text in different colors. Does a lightweight printer file format exist which can be used like a template?
PS, PDF, and DOC files are in my opinion too heavy to parse. Maybe there exists some XML or YAML file format which supports:
Easy creation (maybe with a WYSIWYG editor)
Parsing and manipulation with library support
Easy sending to the printer (maybe with library support)
Or do I have to do it the usual way and paint within a CDC?
I noticed you’re using MFC (so, Windows). In that case the answer is a qualified yes. In recent versions of Windows, Microsoft offers the XPS Document API which lets you create and manipulate a PDF-like document using XML, which can then be printed using the XPS Print API.
(For earlier versions of Windows that don’t support this API, you could try to deal with the XPS file format directly, but that is probably a lot harder than using CDC. Even with the API you will be working at a fairly low level.)
End users can generate XPS documents using the XPS print driver that is available for free from Microsoft (and bundled with certain MS products—they probably already have it on their system).
There is no universal language that is supported across all (or even many) printers. While PCL and PS are the most used, there are also printers which only work with specific printer drivers because they only support a proprietary data format (often pre-rendered on the client).
However, you could use XSL-FO to create documents which can then be rendered to a printer driver using library support.
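As a hedged sketch, a minimal XSL-FO document with the colored text and table the question asks for might look like this (rendered to PDF or to a printer with a library or tool such as Apache FOP, e.g. fop doc.fo doc.pdf):

    <?xml version="1.0" encoding="UTF-8"?>
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page" page-height="29.7cm" page-width="21cm">
          <fo:region-body margin="2cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <!-- colored text, as asked for in the question -->
          <fo:block color="red" font-weight="bold">Overdue invoices</fo:block>
          <!-- a simple two-column table -->
          <fo:table>
            <fo:table-column column-width="6cm"/>
            <fo:table-column column-width="3cm"/>
            <fo:table-body>
              <fo:table-row>
                <fo:table-cell><fo:block>Widget</fo:block></fo:table-cell>
                <fo:table-cell><fo:block>9.99</fo:block></fo:table-cell>
              </fo:table-row>
            </fo:table-body>
          </fo:table>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>

Being plain XML, it is straightforward to generate, parse, and manipulate with ordinary XML libraries.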
I think something like TeX or LaTeX (or even troff or groff) may meet your needs. Google them and see.
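As a hedged sketch, a LaTeX "template" covering the question's colored text and tables could be as small as this (run through pdflatex to get a print-ready PDF; content invented for illustration):

    \documentclass{article}
    \usepackage{xcolor}   % colored text
    \begin{document}

    \textcolor{red}{\textbf{Overdue invoices}}

    \begin{tabular}{|l|r|}
      \hline
      Item   & Price \\
      \hline
      Widget & 9.99  \\
      \hline
    \end{tabular}

    \end{document}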
There are also libraries for producing printer-ready documents from code. Look at http://libharu.sourceforge.net/ for example, which outputs a printer-ready .PDF.
I think that PostScript is a really good choice for that.
It is actually a very simple language, and it should be easy to parse because it is stack-oriented. Also, most printers support it, and even if yours doesn't, you can use GhostScript to convert PS to many different formats (consider GS a "virtual PostScript-supporting printer").
Finally, there are a lot of books and tutorials for the language.
As for the parsing: you can actually define new variables and functions in PS, so your problem can perhaps be solved (almost) entirely in PS.
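To make the stack-oriented point concrete, here is a hedged sketch of a tiny PS "template": a procedure is defined once, then called with whatever data you substitute (values invented for illustration):

    %!PS
    /Helvetica findfont 14 scalefont setfont

    % usage: (text) y row  -- draw the string at x = 72 and the given y
    /row { 72 exch moveto show } def

    1 0 0 setrgbcolor            % red text
    (Overdue invoices) 700 row
    0 0 0 setrgbcolor            % back to black
    (Widget    9.99)   680 row

    showpage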
HTML + CSS can be printed, properly. CSS was designed to support this via the media attribute, which specifies whether a stylesheet targets printer layout or screen layout. Tools like PRINCE (free and commercial versions) exist to render this for printing.
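For instance (selectors and file names invented for illustration):

    <!-- the media attribute selects a stylesheet per output device -->
    <link rel="stylesheet" media="screen" href="screen.css">
    <link rel="stylesheet" media="print"  href="print.css">

    /* print.css: printer-oriented rules */
    h1    { color: black; }             /* printers prefer black */
    nav   { display: none; }            /* hide screen-only chrome */
    table { page-break-inside: avoid; } /* keep tables on one page */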
I think PostScript is the page description language used by printers. I read this somewhere, so correct me if PostScript is now outdated.
http://en.wikipedia.org/wiki/PostScript
For a more powerful suite you can use LaTeX. It gives you the option of creating templates into which you just copy the text.
On a more GUI-friendly note, MS Word and other word processors have templates. The issue is that they do not share a common standard or markup.
You can also use HTML to render things in a common markup, but it will not be very printer-friendly.