I have a binary file read/write module in C++. It works fine for English text but fails to read and write the French character set. What changes do I need to make? Does a special encoding type need to be specified? (I have access to the C++ standard library and Qt 4.7 library functions.)
You can try QString::fromUtf8(yourString)
For starters, make sure that your data files are UTF-8 and that you open them as UTF-8. Make sure that your source code files are UTF-8 too, especially if you use any string literals in them, though it's better to avoid non-ASCII string literals in source code.
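As a rough sketch of the "keep everything UTF-8" advice for a binary file module: the code below stores a string as a length prefix followed by its raw UTF-8 bytes, so accented characters survive the round trip unchanged. The helper names and the 4-byte little-endian record layout are assumptions made for illustration, not something from the question. With Qt, the bytes would typically come from QString::toUtf8() and be turned back into a QString with QString::fromUtf8().

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: writes a 4-byte little-endian length, then the raw UTF-8 bytes.
void writeUtf8Record(std::ostream& out, const std::string& utf8) {
    assert(utf8.size() <= 0xFFFFFFFFu);  // the length must fit the 4-byte prefix
    const std::uint32_t n = static_cast<std::uint32_t>(utf8.size());
    const char prefix[4] = {
        static_cast<char>(n & 0xFF),
        static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF),
        static_cast<char>((n >> 24) & 0xFF)
    };
    out.write(prefix, 4);
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
}

// Hypothetical helper: reads one record back; returns false on a short read.
bool readUtf8Record(std::istream& in, std::string& utf8) {
    unsigned char prefix[4];
    if (!in.read(reinterpret_cast<char*>(prefix), 4)) return false;
    const std::uint32_t n = static_cast<std::uint32_t>(prefix[0])
                          | (static_cast<std::uint32_t>(prefix[1]) << 8)
                          | (static_cast<std::uint32_t>(prefix[2]) << 16)
                          | (static_cast<std::uint32_t>(prefix[3]) << 24);
    utf8.assign(n, '\0');
    if (n == 0) return true;
    return static_cast<bool>(in.read(&utf8[0], static_cast<std::streamsize>(n)));
}
```

The same two functions work on a std::fstream opened with std::ios::binary. The length counts bytes, not characters, so "été" is stored as five bytes.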
I have the following problem:
When I build my application on Windows, QML texts wrap correctly with respect to the nbsp character (U+00A0, I think). On my Raspberry Pi with Raspbian, however, the nbsp seems to be ignored and the text is wrapped as if it were a normal space.
There are several things that may have some importance here:
On Windows I have Qt 5.4, whereas on the Raspberry Pi there is 5.2.
I think it may have something to do with encoding. I remember it worked before I forced the g++ compiler on the Pi to treat the input files as CP1250 (I added QMAKE_CXXFLAGS += -finput-charset=CP1250 to the project file). I had to make this tweak because of the diacritics in some of the string literals (otherwise the texts are completely broken on the Raspberry Pi). So, as I said, I think the word wrap worked before I changed this compiler switch.
Still, nothing is displayed incorrectly except that the texts are broken where they shouldn't be. Note that there is no "random" character or anything like that, just a regular space. That's strange, because it looks as if the problem is not with the encoding but with the word-wrapping algorithm itself. But as I said, it used to work when the compiler assumed the string literals were in whatever the default on Linux is (UTF-8, I guess...).
As for the QML Text assignment these strings are taken from C array and assigned to the QML text using QObject::setProperty if that is of any importance...
Also note that I probably cannot change the encoding of my sources to UTF-8, because the file with the strings is also shared with an embedded project on the other side of the communication, and that one has to be CP1250 because of its IDE.
Thanks in advance
EDIT:
I have some additional information: If I go through one of the affected string literals on Windows, it is in fact shorter than the same literal compiled on Raspberry, even when the source encoding is set to CP1250. For example the nbsp is encoded in only one byte on Windows (160d), but it is two bytes on Raspberry (194d,160d). That's strange, isn't it? I'd expect that after explaining g++ that the source code is encoded in CP1250, it should encode the literals in the same way? Or maybe not because this is then encoding of the string in the memory which is different by default on both Windows and Linux. But still I don't see where's the problem.
As suggested by Kevin Krammer,
QString::fromLocal8Bit()
was the solution.
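The one-byte versus two-byte difference from the EDIT can be reproduced without Qt. As far as I know, g++ converts literals from the input charset to its execution charset, which defaults to UTF-8, while the Windows compiler keeps the local code page, and QString::fromLocal8Bit() decodes with whichever encoding the platform uses. The sketch below only covers the Latin-1 range (where the nbsp has the same value, 0xA0, as in CP1250); the function name is made up for illustration, and this is not a full CP1250 converter.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: re-encodes Latin-1 bytes (code points U+0000..U+00FF) as UTF-8.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    for (std::size_t i = 0; i < latin1.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(latin1[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII stays one byte
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    assert(out.size() <= 2 * latin1.size());  // each input byte yields one or two bytes
    return out;
}
```

For the single byte 160 (0xA0) this produces the two bytes 194 and 160 (0xC2 0xA0), which matches what was observed on the Raspberry Pi.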
I am trying to display a unicode character (Euro sign) on a button using Qt and C++ in Visual Studio 2013. I tried the following code:
_rotateLeftButton->setText("\u20AC");
and
_rotateLeftButton->setText("€");
and
_rotateLeftButton->setText(QString::fromUtf8("\u20AC"));
and
_rotateLeftButton->setText(QString::fromUtf8("€"));
However, none of those lines displays the Euro sign correctly.
All my code files are UTF-8 encoded, except for the moc files (.cxx). For whatever reason, the moc executable does not generate them as Unicode. Still, I was not able to get this Unicode symbol displayed correctly. I also tried setting a font other than the default one, without success. Does anyone know what the problem could be?
Thank you for your help.
QString::fromUtf8("€")
Will work if the file really is handled as UTF-8. As @n.m. commented, VS requires some help from a faux-BOM to ensure this.
QString::fromUtf8("\u20AC")
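\u doesn't do what you want in a byte string literal: the compiler converts the character to its execution character set, which in Visual Studio is the local code page rather than UTF-8. You could spell it using \x byte escapes for the UTF-8 encoded version: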
QString::fromUtf8("\xE2\x82\xAC")
Or use a wide string literal:
QString::fromWCharArray(L"\u20AC")
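To see where the \xE2\x82\xAC bytes come from, here is a minimal UTF-8 encoder for a single code point, using only the standard library. The function name is an assumption for illustration, and the sketch does not reject surrogate code points.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
// Surrogates (U+D800..U+DFFF) are not rejected in this sketch.
std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF);  // largest valid Unicode code point
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encodeUtf8(0x20AC) gives the three bytes E2 82 AC, the same sequence spelled out by hand in the \x version above.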
I have such files. I just want to open files with non-Latin names correctly.
I have no problems with files that have Latin names, only with non-Latin names.
I use QDir for scanning directory and I hold names in QString, so it's held fine inside.
But the problem is with opening the file.
I don't want to use QFile; I can use only C++ streams (preferred) or C file functions.
To open a file, I do this:
fstream stream(source.toStdString().c_str(),ios_base::in | ios_base::binary);
After that I check whether attempt was successful:
if(!stream.is_open())
{
    // just a notice: cout was redirected to a file
    cout << "file wasn't opened " << source.toStdString().c_str() << "\n";
    return false;
}
I get in my log file:
file wasn't opened /home/sh/.mozilla/firefox/004_??????? - ????? - ?????.mp3
It doesn't work for any file with non-Latin name and it does work fine for every file with Latin names.
I understand that this problem can be jumped over using QFile.
But I wonder, is it possible to get it done without third-party libraries, or are there other ways of solving it?
Thanks in advance for any tips.
Things are going wrong when you call toStdString() on your QString. It will convert the contents based on QTextCodec::codecForCStrings() if it has been set, and Latin-1 will be used otherwise. Latin-1 will collapse your non-Latin characters to '?'s.
Using source.toLocal8Bit().data() or source.toUtf8().data() instead will likely do what you want, but failing that you'll need to deal with QTextCodecs to get the right 8-bit encoding.
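The '?' characters in the log are what a lossy conversion to Latin-1 looks like. The sketch below imitates that behaviour with the standard library only; it illustrates the effect, it is not Qt's implementation, and the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts code points to Latin-1 bytes, replacing anything
// above U+00FF with '?', similar in effect to a lossy Latin-1 conversion.
std::string toLatin1Lossy(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    }
    assert(out.size() == text.size());  // exactly one byte per code point
    return out;
}
```

A name such as U"004_\u043F\u0440.mp3" (two Cyrillic letters) comes out as "004_??.mp3", which no longer matches any file on disk, so the open fails. Encoding the name as UTF-8 or as the local 8-bit encoding keeps the information.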
I need to read files with different encodings. Unicode files are correctly read using
wxFileInputStream fileInputStream(dialog->GetPath());
wxTextInputStream textInputStream(fileInputStream);
If I need to read, say, Cyrillic (cp1251) files, I use:
wxFileInputStream fileInputStream(dialog->GetPath());
wxTextInputStream textInputStream(fileInputStream, " \n", wxCSConv(wxFONTENCODING_CP1251));
But neither of these ways works with both kinds of files. In .NET we can just use:
new StreamReader(file, Encoding.Default)
So what's the equivalent of Encoding.Default in wxWidgets or in C++ in general?
Thank you
I believe wxFONTENCODING_SYSTEM would be analogous to Encoding.Default.
The problem was solved by using wxConvAuto(wxFONTENCODING_SYSTEM) instead of wxCSConv(wxFONTENCODING_SYSTEM). wxConvAuto first tries to read the file as a Unicode document and, if that fails, falls back to the system's encoding to read the ANSI file. It works great!
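The "try Unicode first, then fall back" idea can be sketched with the standard library alone. This is not wxConvAuto's actual implementation, only an illustration of the approach under my own assumptions: check for a byte order mark, then check whether the bytes are structurally valid UTF-8, and otherwise assume the system encoding. All names are made up, and the validity check is simplified (it does not reject overlong forms or surrogates).

```cpp
#include <cassert>
#include <string>

enum class Guess { Utf8, Utf16LE, Utf16BE, SystemEncoding };

// Simplified structural check: lead bytes must be followed by the right
// number of continuation bytes (10xxxxxx).
bool looksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                           // stray continuation or invalid lead byte
        assert(extra <= 3);
        if (extra > s.size() - i - 1) return false;  // sequence truncated at end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: BOM first, then a UTF-8 plausibility check, then fallback.
Guess guessEncoding(const std::string& bytes) {
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return Guess::Utf8;
    if (bytes.compare(0, 2, "\xFF\xFE") == 0) return Guess::Utf16LE;
    if (bytes.compare(0, 2, "\xFE\xFF") == 0) return Guess::Utf16BE;
    return looksLikeUtf8(bytes) ? Guess::Utf8 : Guess::SystemEncoding;
}
```

A cp1251 file containing Cyrillic text will normally fail the UTF-8 check and end up in the SystemEncoding branch. Pure ASCII input is reported as UTF-8, which is harmless because ASCII is a subset of both.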
If I'm given a .doc file with special tags in it such as [first_name], how do I go about replacing all occurrences of it with something like "Clark"? A simple binary replacement only works if the replacement string is the exact same length.
Haskell, C, and C++ answers would be best, but any compiled language would do. I'd also prefer to do this without an external library since it has to be deployed on Windows and Linux and cross-platform dependency handling is a bitch.
To summarize...
.doc -> magic program -> .doc with strings replaced
You could use the Word COM component ("Word.Application") on Windows to open the file, do the replacements, save the file, and close it. However, this is Windows-only and can be buggy.
Another thing you could do is use the OpenOffice.org command line interface to convert the file to the ODF format, unzip the file (ODF is mostly zipped XML), do the replacements with the files inside, re-zip the file, and re-convert it to .doc format. However, OpenOffice.org doesn't always read Word files correctly (especially if there is a lot of complex formatting) and it can make it harder to distribute (users must either have OpenOffice.org or you must distribute it with your program).
Also, if you have a file in the .docx format, you can unzip it, do the replacements, and re-zip it.
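Once the document is available as text (for example the XML extracted from an unzipped .docx or ODF file), the replacement itself no longer needs to preserve length. A minimal sketch with the standard library follows; the function name is made up, and unzipping and re-zipping are left out. One caveat to be aware of: a word processor may split a tag like [first_name] across several XML elements, in which case a plain text search will not find it, and the replacement text must be XML-escaped if it can contain characters such as < or &.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every occurrence of `from` with `to`;
// the two strings may have different lengths.
std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
    assert(!from.empty());  // an empty pattern would never advance
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();   // continue after the inserted text
    }
    return text;
}
```

For example, replaceAll("Dear [first_name],", "[first_name]", "Clark") yields "Dear Clark,".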
First read the Word Document Specification.
If that hasn't terrified you, then you should find it fairly straightforward to figure out how to read and write it. It must be possible; Word manages to do it most of the time.
You probably have to use .NET programming (VB or C#) to create a Word.Application object and then use the MS Word object model to manipulate your document.
Why do you want to be using C/C++/Haskell or another compiled language? I'm not too familiar with Haskell, but in general I would say that C is not a great language for performing text processing. A lot of interpreted languages (Perl, Python, etc.) also have powerful regular expression libraries that are suited for finding and replacing phrases.
With that said, as the other posters have noted, you will still have to deal with the eccentricities of the .doc format.