I am trying to build a project written in VS 2008 using QtCreator under Linux and I get loads of errors:
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘\377’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘\376’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘#’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: warning: null character(s) ignored
etc.
Does it mean that the compiler can't handle unicode correctly? How can I fix it?
Those stray bytes (\377 \376, i.e. 0xFF 0xFE) are the byte-order mark of a little-endian UTF-16 file. You need to make sure the file is saved as UTF-8, or convert it manually with iconv -f UTF-16LE -t UTF-8 myfile > myfile.utf8.
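To convert a whole tree of files, something along these lines could work (a sketch: the /tmp demo path and the assumption that every file is UTF-16LE are mine; spot-check one of your files with the "file" command first):

```shell
# Sketch: convert UTF-16LE sources to UTF-8 in place.
# Demo data in /tmp -- point the glob at your own source tree instead.
mkdir -p /tmp/u16demo
# create a sample UTF-16LE file to stand in for a VS 2008 source file
printf '#include <iostream>\n' | iconv -f UTF-8 -t UTF-16LE > /tmp/u16demo/main.cpp
for f in /tmp/u16demo/*.cpp; do
  iconv -f UTF-16LE -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```

After the loop, gcc will accept the files, since they no longer start with the 0xFF 0xFE byte pair.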
Ensure the file is encoded in UTF-8. Open it with a text editor that lets you choose the file encoding (e.g. gedit or Notepad++) and convert it. I've had similar issues before: UTF-8 files work fine, while other encodings such as UTF-16 do not.
Edit: Don't convert your resource script (if there is one) to UTF-8. The resource compiler won't be able to read it (at least with MSVC 2008).
It may be that your files use Windows conventions, such as CRLF (\r\n, shown as ^M in some editors) line endings.
Have you tried running dos2unix on your source files before compiling?
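If dos2unix is not installed, sed can do the same job; a minimal sketch (the demo file name is hypothetical):

```shell
# Strip the CR of DOS CRLF line endings (a dos2unix stand-in using GNU sed).
printf 'int main() {}\r\n' > /tmp/crlf_demo.cpp
sed -i 's/\r$//' /tmp/crlf_demo.cpp
```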
I think I've seen 'stray ...' errors with Unicode-encoded files before.
Configuring your editor's or console's (or both) encoding settings may fix it.
I am using Eclipse CDT (Eclipse C/C++ Development Tools) 10.2.0.202103011047, CppStyle 1.5.0.0, and clang-format from clang 12.0.0.
My C++ source file contains a line
QString fileExportDirectory = "./😁";
because I am writing a unit test for handling Unicode paths. As soon as I format any part of the file, the line gets changed into
QString fileExportDirectory = "./?";
Why does that happen? Both the encoding of the respective file and the default text file encoding are set to "UTF-8". I have not read anywhere that clang-format or CppStyle has difficulties with Unicode. How can I prevent my clang-format code formatter from destroying Unicode content?
The described misbehaviour is a reported CppStyle bug (https://github.com/wangzw/CppStyle/issues/39). As a workaround
-Dfile.encoding=UTF-8
can be added to "eclipse.ini", then formatting works as expected.
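For reference, JVM flags in eclipse.ini have to come after the -vmargs line (everything after -vmargs is handed to the JVM), so the relevant part of the file should end up looking like this:

```
-vmargs
-Dfile.encoding=UTF-8
```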
I got a Visual Studio Qt/C++ project (from China; not sure if this matters, but I mention it because Chinese character encodings can be tricky sometimes). When I open it in QtCreator (my preferred IDE) on macOS (my preferred OS), I get
Could not decode main.cpp with UTF-8 encoding. Editing not possible.
If I click Select Encoding and choose System, then I can edit normally. I can even save and close the file, but when I open it again the same thing happens.
I noticed there are some comments appearing as )//?????????????????? and //???????�??????????UI, which looks like an encoding problem.
How should I deal with this issue?
What does the System encoding mean?
Opening the file in Sublime Text and using Save with Encoding > UTF-8 seems to solve the problem. But I have a lot of files; any suggestion on how to do this from the command line for all of them?
And the file does not seem to be UTF-8:
$ file main.cpp
main.cpp: c program text, ISO-8859 text
Finally I went to QtCreator, Tools, Options, Text Editor, Behavior, File Encodings and set Default Encoding to ISO-8859-1. Now there are no more complaints on QtCreator's side. Are there any downsides to doing this?
I suspect it contains byte sequences that are not valid UTF-8. Here is a question with the same problem on the Qt forum. One of the comments says
I just discovered this because a header file from FTDI contained a copyright symbol. I just converted that to read (C) instead of the copyright symbol and it was fine.
You can try that. If it's not that, I advise you to check whether the file is valid UTF-8. You can do so with a command like iconv -f UTF-8 your_file > /dev/null; echo $? (it prints 0 if the file is valid UTF-8 and 1 if it is not).
I am starting to use wxWidgets (Version 3.1.4) for a Windows GUI application. Compiling and linking is done with gcc coming with MINGW64 (gcc (x86_64-posix-sjlj-rev0, Built by MinGW-W64 project) 8.1.0).
In the source, the characters are entered correctly:
m_menuItem31 = new wxMenuItem(m_name6, wxID_ANY, _("Stückliste drucken"), _("Stückliste drucken"), wxITEM_NORMAL);
m_name6->Append(m_menuItem31);
In the application it looks like this:
After some research I tried using the linker option -municode, which ended in an error "undefined reference to wWinMain". #define wxUNICODE has no effect.
Edit:
After preprocessing, the characters are still as desired. In the .o file they are already garbled, so the solution should be a compiler switch I am not yet aware of...
Any good ideas are welcome.
This might be down to the encoding of your source file (as VZ hinted). I am developing a wxWidgets C++ app with some non-English characters (e.g. ć), similar to German umlauts in terms of occasional display problems.
I opened my source file in Notepad++ to check its encoding, and it showed the encoding was ANSI.
When this file was compiled (in MSVC) it produced the correct display in the application.
But when I converted the encoding to UTF-8 in Notepad++, the source file still appeared correct in the editor.
However, after saving the file with the encoding converted from ANSI to UTF-8 and compiling, the application showed a wrong character.
I advise you to do a similar experiment in Notepad++ and find out what encoding you are using and what encoding you should be using in your source files (most likely the encoding the compiler expects, as VZ hinted).
In my case it didn't seem to matter much whether the string was wrapped in _() or wxT(); I got the correct display after compiling as long as the encoding of the source file was correct.
As explained in the wx docs, you can't simply expect wxMenuItem(..., ..., _("Stückliste drucken"), ...) to work, because ü is not a valid 7-bit ASCII character.
You may use
_(wxString::FromUTF8("St\xC3\xBC" "ckliste drucken"))
or
_(L"St\u00FCckliste drucken")
because ü has Unicode code point U+00FC, encoded as the bytes C3 BC in UTF-8. (In the first form the literal is split after \xBC, because a hex escape consumes as many hex digits as follow it, and the letter c would otherwise be swallowed into the escape.)
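You can confirm those bytes from any shell (od just dumps raw bytes, so nothing here is wx-specific):

```shell
# Dump the UTF-8 encoding of U+00FC; the octal escapes \303\274 are 0xC3 0xBC.
printf '\303\274' | od -An -tx1
```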
Finally I got it sorted out...
I use CodeLite, and after the answer from Ivan (thanks a lot for the hints!) I ran some more tests. After adding some of the special characters to the wxWidgets minimal sample and compiling it from the command line, the characters were displayed wrong at first. In Notepad++ I found the encoding to be UTF-8; after changing it to ANSI, the characters were displayed correctly (consistent with Ivan's answer).
I tried the same with CodeLite, and it failed. The reason was that the encoding of the source file was always being reset to UTF-8. It took two iterations to get the settings in the CodeLite preferences right (ISO-8859-1 with localization enabled), after which it worked.
The only strange effect I observed is that linking now takes about 14 minutes... need to dig into that further.
Is there a way, through an InstallScript/Windows batch/PowerShell script, to check whether a file is UTF-8 before passing it on for conversion?
As a background, I am currently working on a legacy (Japanese) Windows software developed with Visual Studio 2005 (Upgraded to Visual Studio 2017) in C++.
I am dealing with a requirement to make GUI be able display and input Chinese characters. Thus the decision to use UNICODE for the project/solution encoding.
Since the project originally used the multibyte character set, to keep the data compatible after the UNICODE switch I decided to encode the configuration files (ini, dat, save files) in UTF-8, as these files are also read by a web application.
The main bits of the software are now done and working, and I am left with one last problem - rolling out a version up installer.
In this installer (using Install script), I am required to update save files (previously encoded in SHIFT-JIS as these save files contains Japanese text) to UTF-8.
I have already created a batch file along the following lines, which converts SHIFT-JIS to UTF-8; it is called in the last part of the installer and deleted after the conversion.
@echo off
:: Shift_JIS -> UTF-8
setlocal enabledelayedexpansion
for %%f in ("%~dp0savedfiles\*.sav") do (
    echo %%~ff| findstr /l /e /i ".sav"
    if !ERRORLEVEL! equ 0 (
        powershell -nop -c "&{[IO.File]::WriteAllText($args[1], [IO.File]::ReadAllText($args[0], [Text.Encoding]::GetEncoding(932)))}" "%%~ff" "%%~ff"
    )
)
However, the problem with this is that when the user (1) upgrades, (2) uninstalls (.sav files are left behind on purpose) and (3) re-installs the software the save files are doubly re-encoded and results in the software crashing. (UTF-8 Japanese characters updated during (1) upgrade, become garbage characters after (3) re-installation.)
If you're upgrading, then all the current files should be in Shift-JIS. Even if some situations leave behind both Shift-JIS and UTF-8 files at the same time, there are still only two encodings you need to handle. You can therefore work around the problem by treating any file that is not valid UTF-8 as Shift-JIS. This is still subject to incorrect detection in some rare cases, but it should be good enough for your use case.
By default, a best-fit or replacement fallback handler is used when reading text files. We can switch to an exception fallback so that an exception is thrown if a Shift-JIS file is opened as UTF-8:
# Loop over the .sav files next to the script (path assumed from the batch file above)
foreach ($f in (Get-ChildItem "$PSScriptRoot\savedfiles\*.sav").FullName) {
    try {
        $t = [IO.File]::ReadAllText($f, [Text.Encoding]::GetEncoding(65001, `
            (New-Object Text.EncoderExceptionFallback), `
            (New-Object Text.DecoderExceptionFallback)))
    } catch {
        # File is not valid UTF-8, reopen as Shift-JIS
        $t = [IO.File]::ReadAllText($f, [Text.Encoding]::GetEncoding(932))
    }
    # Write the file back as UTF-8 (WriteAllText without an encoding writes UTF-8 without a BOM)
    [IO.File]::WriteAllText($f, $t)
}
It's better to loop through the files and do the conversion entirely in PowerShell. If you really need to use a batch file, wrap everything in a *.ps1 script and call it from the batch.
I'm working on a project and I need to get a list of all the strings contained in an executable PE file, like some programs do. Here is a screenshot of the kind of output I need:
https://i.imgur.com/Uw1yXIR.png
I have the hex dump of the file, and the strings are in there, but I don't know how to extract them. Maybe with a regex or something.
I don't want the code, just the logic of the code. Thanks!
Any tips?
On Linux computers there is a command "strings" (part of the binutils package):
strings - print the strings of printable characters in files.
If you have Cygwin installed on a Windows computer, you can use that command from the Cygwin command line:
strings /cygdrive/h/.../executablefile
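The logic those tools use is straightforward: scan the raw bytes and print every maximal run of printable characters that is at least some minimum length (strings defaults to 4). A rough one-line equivalent with GNU grep, demonstrated on a small hypothetical demo file:

```shell
# Emulate `strings`: print every run of 4+ printable ASCII characters.
# -a treats the binary as text, -o prints each match on its own line.
printf 'MZ\000\001Hello, world\000\002ab' > /tmp/demo.bin
grep -aoE '[[:print:]]{4,}' /tmp/demo.bin
```

Runs shorter than four characters ("MZ", "ab") are skipped, so only "Hello, world" is printed.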