Why does clang-format destroy Unicode characters within eclipse? - c++

I am using Eclipse CDT (C/C++ Development Tools) 10.2.0.202103011047, CppStyle 1.5.0.0, and clang-format from clang 12.0.0.
My C++ source file contains a line
QString fileExportDirectory = "./😁";
because I am writing a unit test for handling Unicode paths. As soon as I format any part of the file, the line gets changed into
QString fileExportDirectory = "./?";
Why does that happen? Both the encoding of the file itself and the default text file encoding are set to "UTF-8". I have not read anything suggesting that clang-format or CppStyle has difficulties with Unicode. How can I prevent the clang-format code formatter from destroying Unicode content?

The described misbehaviour is a reported CppStyle bug (https://github.com/wangzw/CppStyle/issues/39). As a workaround,
-Dfile.encoding=UTF-8
can be added to "eclipse.ini" (on its own line, after the -vmargs line); formatting then works as expected.

Related

Could not decode main.cpp with UTF-8 encoding

I got a Visual Studio Qt/C++ project (from China; I am not sure whether this matters, but I mention it because Chinese characters can be tricky). When I open it in Qt Creator (my preferred IDE) on macOS (my preferred OS), I get
Could not decode main.cpp with UTF-8 encoding. Editing not possible.
If I click Select Encoding and choose System, I can edit normally. I can even save and close the file, but when I open it again the same thing happens.
I noticed that some comments appear as )//?????????????????? and //???????�??????????UI, which seems to be an encoding problem.
How do I deal with this issue?
What does the System encoding mean?
Opening the file in Sublime Text and using Save with Encoding UTF-8 seems to solve the problem. But I have a lot of files; any suggestion on how to do this from the command line for all files?
Also, the file does not seem to be UTF-8:
$ file main.cpp
main.cpp: c program text, ISO-8859 text
Finally I went to Qt Creator, Tools, Options, Text Editor, Behavior, File Encodings and set Default Encoding to ISO-8859-1. Now there are no more complaints from Qt Creator. Are there any downsides to doing this?
I suspect the file contains byte sequences that are not valid UTF-8. Here is a question with the same problem on the Qt forum. One of the comments says
I just discovered this because a header file from FTDI contained a copyright symbol. I just converted that to read (C) instead of the copyright symbol and it was fine.
You can try that. If that is not the cause, I advise you to check whether the file is valid UTF-8 text. You can do so with a command like: iconv -f UTF-8 your_file > /dev/null; echo $?, which prints 0 if the file is valid and 1 if it is not.
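The same validity check can also be done in code, which may help when many files have to be scanned. The following is only a minimal sketch, assuming the file contents have already been read into a std::string; the function name isValidUtf8 is made up for this illustration and is not part of Qt or iconv.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {  // plain 7-bit ASCII
            ++i;
            continue;
        }

        std::size_t extra = 0;        // continuation bytes expected
        unsigned long codePoint = 0;  // decoded value so far
        unsigned long minimum = 0;    // smallest value allowed for this length
        if ((lead & 0xE0) == 0xC0) {
            extra = 1; codePoint = lead & 0x1F; minimum = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; codePoint = lead & 0x0F; minimum = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; codePoint = lead & 0x07; minimum = 0x10000;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }

        if (i + extra >= n) return false;  // sequence is cut off at end of input

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            codePoint = (codePoint << 6) | (c & 0x3F);
        }

        if (codePoint < minimum) return false;                         // overlong form
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return false;  // surrogate
        if (codePoint > 0x10FFFF) return false;                        // out of range

        i += extra + 1;
    }
    return true;
}
```

With this sketch, a lone Latin-1 copyright byte (0xA9), as in the FTDI header mentioned in the comment, is rejected, while its UTF-8 form (the bytes C2 A9) is accepted.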

wxMenuItem in C++: Characters like "äöü" not displayed properly in the item text

I am starting to use wxWidgets (version 3.1.4) for a Windows GUI application. Compiling and linking are done with the gcc that comes with MinGW-w64 (gcc (x86_64-posix-sjlj-rev0, Built by MinGW-W64 project) 8.1.0).
In the source, the characters are entered correctly:
m_menuItem31 = new wxMenuItem(m_name6, wxID_ANY, _("Stückliste drucken"), _("Stückliste drucken"), wxITEM_NORMAL);
m_name6->Append(m_menuItem31);
In the application it looks like this:
After some research I tried the linker option -municode, which ended in the error "no reference to wWinMain". #define wxUNICODE has no effect.
Edit:
After preprocessing, the characters are still as desired. In the .o file, they are already spoiled, so the solution should be a compiler switch I am not yet aware of...
Any good ideas are welcome.
This might be down to the encoding of your source file (as VZ hinted). I am developing a wxWidgets C++ app with some non-English characters (e.g. ć), which resemble German umlauts in that they occasionally cause display problems.
I opened my source file in Notepad++ to check its encoding, and it showed ANSI:
and when this file was compiled (with MSVC), the application displayed the text correctly:
But when I converted the encoding to UTF-8 in Notepad++, the text still appeared correct in the source file:
I saved the file with the encoding converted from ANSI to UTF-8 and compiled it, but after that the application showed the wrong character:
I advise you to take Notepad++, run a similar experiment, and find out which encoding you are using and which encoding you should be using in your source files (possibly the one the compiler expects, as VZ hinted).
In my case it didn't seem to matter much whether the string was wrapped in _() or wxT(); I got the correct display as long as the encoding of the source file was correct.
As explained in the wx docs, you can't simply expect wxMenuItem(...,..., _("Stückliste drucken"),...) to work, because ü is not a 7-bit ASCII character.
You may use
_(wxString::FromUTF8("St\xC3\xBC" "ckliste drucken"))
or
_(L"St\u00FCckliste drucken")
because ü is the Unicode code point U+00FC, which UTF-8 encodes as the bytes C3 BC. (The first literal is split after \xBC because a hex escape would otherwise also consume the following c as a hex digit.)
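To make the relation between the code point 00FC and the UTF-8 bytes C3 BC concrete, here is a small sketch in plain C++ without any wxWidgets dependency. The function names encodeUtf8 and menuLabelUtf8 are invented for this illustration. It also shows one way to keep a hex escape from swallowing the next letter: a \x escape consumes every hex digit that follows, so the literal is split and the compiler concatenates the adjacent pieces.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
// Sketch only: it does not reject surrogates or values above U+10FFFF.
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {            // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The menu label as UTF-8 bytes, independent of the source file encoding.
// "\xC3\xBC" is ü; the literal is split so that the 'c' of "ckliste"
// is not parsed as part of the hex escape.
std::string menuLabelUtf8() {
    return "St\xC3\xBC" "ckliste drucken";
}
```

Here encodeUtf8(0x00FC) yields the two bytes C3 BC, and menuLabelUtf8() is 19 bytes long because ü takes two bytes. A string built this way could then be handed to wxString::FromUTF8 as in the answer above.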
Finally I got it sorted out...
I use CodeLite, and after the answer from Ivan (thanks a lot for the hints!) I ran some more tests. After adding some of the special characters to the wxWidgets minimal sample and compiling it from the command line, the characters were displayed wrongly at first. In Notepad++ I found the encoding to be UTF-8. After changing it to ANSI, the characters were displayed correctly (consistent with Ivan's answer).
I tried the same with CodeLite, and it failed. The reason was that the encoding of the source file was always reset to UTF-8. It took two iterations to get the settings in the CodeLite preferences right (ISO-8859-1 with localization enabled), after which it worked.
The only strange effect I observed was that linking now took about 14 minutes; I need to dig into that further.

KDevelop automatically inserts a space after # in #include <file.h>

I simply want to include a C++ header file in KDevelop by writing
#include <file.h>
However, KDevelop automatically corrects the above statement to
# include <file.h>
I have not been able to figure out where I can change this. I want the first version, such that my files are similar to the other files in the project I am working on.
The file type is set correctly to C++. I think the reason for the extra space is that KDevelop wants to indent the line because it is inside a conditional, i.e. an include guard as given below:
#ifndef THIS_FILE_H
#define THIS_FILE_H
#endif // THIS_FILE_H
KDevelop also wants to indent the #define THIS_FILE_H line.
I have tried creating my own indentation style by going to the Settings -> Customize KDevelop menu item and then clicking Source Formatter on the left. When defining the formatting style, I disabled the indentation of preprocessor directives; nevertheless, this has no effect on the indentation of the #include and #define inside the include guard.
I know this is quite old but here is my advice, for future reference.
Go to Settings -> Configure KDevelop -> Code Formatter.
There you'll see a dropdown for the language you want KDevelop to format (C, C++, C#, Java, and so on), the formatter ("Artistic Style" or "Custom Script Formatter"), and a list of predefined styles plus buttons to customise your own.
Check whether you have selected a suitable predefined style, and try a few of them to see what happens.
KDevelop should stop adding the extra space after the # in #include after this.
Question: does your KDevelop also add a space after the # in #define?
Same problem. I uninstalled Kate, then removed all config files in my home directory whose names contain kate. Then I opened KDevelop and found that nothing had changed. At last, under KDevelop -> Settings -> Open/Save -> Modes & Filetypes, I selected Sources/C++ and Sources/C, changed the indentation mode to None, and restarted KDevelop. OMG, finally, the world is at peace.
To conclude, the Modes & Filetypes settings for Sources cause the problem.
The automatic spaces appear to be caused by the indentation mode, which (in KDevelop 4.7.1) you can switch for the current file via Editor -> Tools -> Indentation. If indentation is set to C++/boost Style, you'll get those weird automatic spaces while typing; while in mode Normal you only get the usual indentation at beginning of line.
The default indentation mode can be set in Settings -> Configure Editor -> Open/Save -> Modes & Filetypes. For each Filetype (eg. Sources/C, Sources/C++, Sources/C++11, Sources/C++11/Qt4) the Indentation Mode can be set independently.
KDevelop appears to remember the setting for files you have opened once; so for these files the new configuration settings have no effect. I don't know how to make KDevelop forget these per-file settings.

How to feed Visual Studio Clang-Format plugin with clang-format file?

So I downloaded and installed the clang-format plugin and added it to the PATH. I tested it, and it works with the Google (Mozilla, etc.) formatting options out of the box, yet I cannot get it working with my .clang-format file. (I've put my file into the same folder as my source file, changed its encoding to UTF-8, and also tried putting it into the clang install folder, adding the file to the project, and writing its contents inside '{key: value}', yet no formatting happens.) So how do you feed a formatting file to the clang-format extension?
My file contents:
{ BasedOnStyle: "LLVM", IndentWidth: 4 }
My file name: nm.clang-format
Go to Tools->Options->LLVM/Clang->ClangFormat and enter the literal word file in the Style option field.
Then place your style file named .clang-format (this is the full filename, not an extension) either in the source file's directory or one of its parent directories. Windows Explorer won't let you create filenames with leading . so you need to go to the console for this.
If like me you got confused later on where the .clang-format was living, use procmon to track the file reads of clang-format.exe
For the record, it seems that if both "Fallback Style" and "Style" are set to "file", no formatting will happen even if the style file is at its correct location. Setting "Fallback Style" to something different than "file" (e.g. "none") helps.
In VS2019 it works if the clang-format file is named .clang-format.
It must be .clang-format, not .clang-format.txt or clang-format.txt.

C++ compatibility between Visual Studio and gcc under Linux

I am trying to build a project written in VS 2008 using QtCreator under Linux and I get loads of errors:
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘\377’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘\376’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘#’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: warning: null character(s) ignored
etc.
Does it mean that the compiler can't handle Unicode correctly? How can I fix it?
That looks like the byte order mark (BOM) of little-endian UTF-16. You need to make sure the file is saved as UTF-8, or convert it manually via iconv -f UTF-16LE -t UTF-8 myfile, redirecting the output to a new file, since iconv writes the result to standard output.
Ensure the file is encoded in UTF-8. Open it with a text editor that lets you choose the file encoding (e.g. gedit or Notepad++) and convert it. I've had similar issues before, but UTF-8 files work fine (other encodings like UTF-16 won't work).
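The octal escapes in the error messages, \377 and \376, are the bytes 0xFF and 0xFE, which is exactly the little-endian UTF-16 byte order mark. As a rough sketch of how such a file can be recognised before compiling, assuming the first few bytes of the file are available in a std::string (the names Bom and detectBom are invented for this example):

```cpp
#include <cstddef>
#include <string>

// Byte order marks this sketch knows about.
enum class Bom { None, Utf8, Utf16LittleEndian, Utf16BigEndian };

// Hypothetical helper: inspects the first bytes of a file.
// Note: FF FE 00 00 (UTF-32 little-endian) is not distinguished here and
// would be reported as UTF-16 little-endian.
Bom detectBom(const std::string& head) {
    auto byteAt = [&head](std::size_t i) {
        return static_cast<unsigned char>(head[i]);
    };
    if (head.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB &&
        byteAt(2) == 0xBF) {
        return Bom::Utf8;               // EF BB BF
    }
    if (head.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE) {
        return Bom::Utf16LittleEndian;  // FF FE, i.e. \377 \376 in octal
    }
    if (head.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF) {
        return Bom::Utf16BigEndian;     // FE FF
    }
    return Bom::None;
}
```

A file starting with the two bytes from the compiler output above would be classified as Bom::Utf16LittleEndian, which also explains the "null character(s) ignored" warning: in UTF-16, every ASCII character is followed by a zero byte.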
Edit: Don't convert your resource script (if there's any) to UTF-8. The resource compiler won't be able to read it (at least when using MSVC 2008).
It may be that your files use Windows conventions, with characters like ^M (\r\n line endings).
Have you tried running dos2unix on your source files before compiling?
I think I've seen 'stray ...' errors in files containing Unicode.
You may configure your editor's or console's (or both) encoding setting to fix it.