I got a Visual Studio Qt/C++ project (from China; not sure if that matters, but I mention it because Chinese characters can be a little tricky with encodings). When I open it in Qt Creator (my preferred IDE) on macOS (my preferred OS) I get
Could not decode main.cpp with UTF-8 encoding. Editing not possible.
If I click Select Encoding and choose System, I can edit normally. I can even save and close the file, but when I open it again the same thing happens.
I noticed there are some comments appearing as )//?????????????????? and //???????�??????????UI, which looks like an encoding problem.
How do I deal with this issue?
What does the System encoding mean?
Opening the file in Sublime Text and using Save with Encoding > UTF-8 seems to solve the problem, but I have a lot of files. Any suggestion on how to do this from the command line for all of them?
And the file does not seem to be UTF-8:
$ file main.cpp
main.cpp: c program text, ISO-8859 text
Finally, I went to Qt Creator > Tools > Options > Text Editor > Behavior > File Encodings and set Default Encoding to ISO-8859-1. Now there are no more complaints on Qt Creator's side. Are there any downsides to doing this?
I suspect it contains invalid UTF-8 characters. Here is a question with the same problem on the Qt forum. One of the comments says:
I just discovered this because a header file from FTDI contained a copyright symbol. I just converted that to read (C) instead of the copyright symbol and it was fine.
You can try that. If it's not that, I advise you to check whether the text is valid UTF-8. You can do so with a command like iconv -f UTF-8 your_file > /dev/null; echo $?, which prints 0 if the file is valid UTF-8 and 1 if it is not.
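To convert all files from the command line (macOS/Linux) instead of saving them one by one in Sublime Text, a rough sketch like the following should do. It assumes the files really are ISO-8859-1 as file reports, and the *.cpp/*.h patterns are just placeholders for whatever your project contains; for a project from China the true encoding may well be GBK/GB18030, in which case substitute that for the -f argument.

find . \( -name '*.cpp' -o -name '*.h' \) -print0 | while IFS= read -r -d '' f; do
    # Leave files alone that already decode cleanly as UTF-8.
    if ! iconv -f UTF-8 "$f" > /dev/null 2>&1; then
        # Convert in place via a temporary copy.
        iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    fi
done

Note that converting from the wrong source encoding will not fail (ISO-8859-1 accepts every byte sequence), it will just produce wrong characters, so check one converted file before running this over the whole tree.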
Related
I am starting to use wxWidgets (Version 3.1.4) for a Windows GUI application. Compiling and linking is done with gcc coming with MINGW64 (gcc (x86_64-posix-sjlj-rev0, Built by MinGW-W64 project) 8.1.0).
In the source, the characters are entered correctly:
m_menuItem31 = new wxMenuItem(m_name6, wxID_ANY, _("Stückliste drucken"), _("Stückliste drucken"), wxITEM_NORMAL);
m_name6->Append(m_menuItem31);
In the application, the ü is displayed as a wrong character:
After some research I tried the linker option -municode, which ended in the error "no reference to wWinMain". #define wxUNICODE has no effect.
Edit:
After preprocessing, the characters are still as desired. In the .o file, they are already spoiled, so the solution should be a compiler switch I am not yet aware of...
Any good ideas are welcome.
This might be down to the encoding of your source file (as VZ hinted). I am developing a wxWidgets C++ app with some non-English characters (e.g. ć) that behave like German umlauts in terms of occasional display problems.
I opened my source file in Notepad++ to check its encoding, and it showed the encoding was ANSI:
When this file was compiled (in MSVC), it produced the correct display in the application:
But when I converted the encoding to UTF-8 in Notepad++, the text still appeared correct in the source file:
After saving the file with the encoding converted from ANSI to UTF-8 and compiling, the application showed the wrong character:
I advise you to open your files in Notepad++ and do a similar experiment to find out what encoding you are using and what encoding you should be using (most likely the encoding the compiler is expecting, as VZ hinted).
In my case it didn't really seem to matter whether the string was wrapped in _() or wxT(); I was able to get the correct display when compiled, as long as the encoding of the source file was correct.
As explained in the wx docs, you can't simply expect wxMenuItem(..., ..., _("Stückliste drucken"), ...) to work, because ü is not a valid 7-bit character.
You may use
_(wxString::FromUTF8("St\xC3\xBC" "ckliste drucken"))  // adjacent literals keep the "c" from being parsed as part of the hex escape
or
_(L"St \u00FC ckliste drucken")
because ü is Unicode code point U+00FC, which is C3 BC in the UTF-8 encoding.
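If you want to double-check those byte values yourself, a quick sanity check from a UTF-8 shell (e.g. MSYS2 or Git Bash; assumes xxd is installed and the terminal locale is UTF-8):

printf 'ü' | xxd -p     # prints c3bc, the UTF-8 bytes of U+00FC
printf '\xc3\xbc\n'     # prints ü back again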
Finally I got it sorted out...
I use CodeLite, and after the answer from Ivan (thanks a lot for the hints!) I did some more tests. After adding some of the special characters to the wxWidgets minimal sample and compiling it from the command line, the characters were displayed wrong at first. In Notepad++ I found the encoding to be UTF-8. After changing it to ANSI, the characters were displayed correctly (consistent with Ivan's answer).
I tried the same with CodeLite, and it failed. The reason was that the encoding of the source file was always being reset to UTF-8. It took two iterations to get the settings in the CodeLite preferences right (ISO-8859-1 with localization enabled), after which it worked.
The only strange effect I observed was that linking now took about 14 minutes... I need to dig into that further.
Is there a way, using InstallScript, a Windows batch file, or PowerShell, to check whether a file is UTF-8 before passing it on for conversion?
As background, I am currently working on legacy (Japanese) Windows software developed in C++ with Visual Studio 2005 (upgraded to Visual Studio 2017).
I am dealing with a requirement to make the GUI able to display and input Chinese characters, hence the decision to use UNICODE for the project/solution character set.
Since the project originally used Multi-Byte, to stay backwards compatible after the switch to UNICODE I decided to encode the configuration files (ini, dat, save files) in UTF-8, as these files are also referenced by a web application.
The main parts of the software are now done and working, and I am left with one last problem: rolling out a version-upgrade installer.
In this installer (using InstallScript), I need to update the save files (previously encoded in Shift-JIS, as they contain Japanese text) to UTF-8.
I have already created a batch file along the following lines, which converts Shift-JIS to UTF-8; it is called in the last part of the installer and deleted after the conversion.
@echo off
:: Shift_JIS -> UTF-8
setlocal enabledelayedexpansion
for %%f in ("%~dp0\savedfiles\*.sav") do (
    rem Only touch files whose name really ends in .sav
    echo %%~ff| findstr /l /e /i ".sav"
    if !ERRORLEVEL! equ 0 (
        rem Read the file as code page 932 (Shift-JIS) and write it back as UTF-8
        powershell -nop -c "&{[IO.File]::WriteAllText($args[1], [IO.File]::ReadAllText($args[0], [Text.Encoding]::GetEncoding(932)))}" \"%%~ff\" \"%%~ff\"
    )
)
However, the problem with this is that when the user (1) upgrades, (2) uninstalls (.sav files are left behind on purpose), and (3) re-installs the software, the save files get re-encoded twice and the software crashes. (UTF-8 Japanese characters written during the (1) upgrade become garbage characters after the (3) re-installation.)
If you're upgrading, all the current files should be in Shift-JIS. Even if you have situations that leave Shift-JIS and UTF-8 files side by side, there are still only two encodings you need to handle. Therefore you can work around this by assuming that if a file is not valid UTF-8, it is Shift-JIS. This is still subject to incorrect detection in rare cases, but otherwise it should be good enough for your use case.
By default, a best-fit or replacement fallback handler is used when reading text files. We can switch to an exception fallback so that an exception is thrown if a Shift-JIS file is opened as UTF-8:
try {
    # Try to read the file as strict UTF-8 (code page 65001); the exception
    # fallbacks make decoding errors throw instead of silently substituting.
    $t = [IO.File]::ReadAllText($f, [Text.Encoding]::GetEncoding(65001, `
        (New-Object Text.EncoderExceptionFallback), `
        (New-Object Text.DecoderExceptionFallback)))
} catch {
    # File is not UTF-8, reopen as Shift-JIS
    $t = [IO.File]::ReadAllText($f, [Text.Encoding]::GetEncoding(932))
}
# Write the file back as UTF-8
[IO.File]::WriteAllText($f, $t)
It's better to loop through the files and do the conversion in PowerShell. If you really need to use a batch file, wrap everything in a *.ps1 file and call that from the batch file.
I'm trying to fix an encoding issue I'm confronted with when using Sublime Text.
If I try to open a C++ file in Sublime, it displays the file's content as a series of hexadecimal numbers. I've tried to fix this by reopening it with each of the available encoding options (File --> Reopen with Encoding). I have also tried setting "enable_hexadecimal_encoding": false in the settings.
Is there something else I can do to fix this problem?
Screenshot when initially opened
Screenshot after reopening with UTF-8 encoding
I think you have opened an object file (.o) that has the wrong extension; I checked the file on my machine and it really is an object file that was presumably misnamed.
Try to find the original source file; a C++ source file will most likely have a .cpp extension.
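If you have a shell available (Git Bash, WSL, macOS or Linux), the file utility (used earlier on this page for a similar purpose) will usually identify an object file immediately; the filename below is just a placeholder for your misnamed file:

file mystery.cpp
# a genuine source prints something like "C source, ASCII text", while a
# renamed object file shows up as e.g. "ELF 64-bit LSB relocatable ..." or
# a "... COFF object file" on Windows toolchains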
I am setting up my system for CodeCommit but I am getting the following error.
I followed this link:
https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-ssh-windows.html
/c/Users/Prasanna/.ssh/config: line 1: Bad configuration option: \377\376h
/c/Users/Prasanna/.ssh/config: terminating, 1 bad configuration options
Here is the config file:
Host git-codecommit.*.amazonaws.com
User ********
IdentityFile ~/.ssh/codecommit_rsa
Am I missing anything in the configuration?
You probably have some illegal characters in the config file. I had this problem while creating a config file on Windows. Unfortunately, simply opening the file in a Windows text editor may not show the illegal characters.
I was able to find this problem by running cat filename from a Bash prompt on Windows (Git Bash) and was able to fix it by running dos2unix filename in Git Bash. The same may work for you as well.
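In this particular error the \377\376 at the start of line 1 is the FF FE byte order mark, i.e. the file was saved as UTF-16 by a Windows editor. A rough sketch of how to confirm and fix that from Git Bash (assuming iconv is available there, which it normally is):

# Show the first bytes; a UTF-16 file starts with \377 \376 (FF FE)
head -c 16 ~/.ssh/config | od -c

# Rewrite as UTF-8 without a BOM; reading as UTF-16 (rather than UTF-16LE)
# lets iconv use the BOM to pick the byte order and drop it from the output.
iconv -f UTF-16 -t UTF-8 ~/.ssh/config > ~/.ssh/config.new && \
    mv ~/.ssh/config.new ~/.ssh/config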
Just had the same issue. Open the file with Notepad++. At the bottom right it tells you the encoding the file is in. It has to be UTF-8 without BOM. You can fix that by selecting a new encoding at the top and saving the file.
This happened to me today; I just recreated the config file, put my settings back in, and it worked.
I am trying to build a project written in VS 2008 using QtCreator under Linux and I get loads of errors:
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘\377’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘\376’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘#’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: warning: null character(s) ignored
etc.
Does it mean that the compiler can't handle Unicode correctly? How can I fix it?
That looks like the byte order mark (BOM) of a little-endian UTF-16 file. You need to make sure the file is saved as UTF-8, or convert it manually, e.g. iconv -f UTF-16LE -t UTF-8 myfile > myfile.utf8.
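If several files are affected, a sketch for converting them in bulk (the *.cpp/*.h patterns are placeholders; it assumes xxd is installed, and uses UTF-16 rather than UTF-16LE so iconv consumes the BOM instead of passing it through):

for f in *.cpp *.h; do
    [ -f "$f" ] || continue
    # FF FE at the start of the file marks little-endian UTF-16
    if [ "$(head -c 2 "$f" | xxd -p)" = "fffe" ]; then
        iconv -f UTF-16 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    fi
done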
Ensure the file is encoded in UTF-8. Open it with a text editor that lets you choose the file encoding (e.g. gedit or Notepad++) and convert it. I've had similar issues before, but UTF-8 files work fine (other encodings like UTF-16 won't work).
Edit: Don't convert your resource script (if there's any) to UTF-8. The resource compiler won't be able to read it (at least when using MSVC 2008).
It may be that your files use Windows encoding, with characters like ^M (\r\n line endings)...
Have you tried running dos2unix on your source files before compiling?
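If you want to try that across the whole tree, a one-line sketch (note that dos2unix only fixes line endings, so it won't help if the real problem is a UTF-16 file, as the other answer suggests):

find . \( -name '*.cpp' -o -name '*.h' \) -exec dos2unix {} +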
I think I've seen 'stray ...' errors in files with Unicode.
You may need to configure your editor's or console's (or both) encoding settings to fix it.