By default WebStorm creates files in UTF-8 encoding, but without a BOM. How can I add a BOM to my files?
According to the Unicode specification:
Use of a BOM is neither required nor recommended for UTF-8 (Section 2.6)
There is a related request and forum discussion, feel free to comment there.
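If some consumer of the files still needs a BOM, it can be added outside the IDE. The following is a minimal Python sketch, assuming the file is already UTF-8; the function name add_utf8_bom is made up for illustration and is not part of WebStorm.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path):
    """Prepend the UTF-8 BOM (bytes EF BB BF) unless the file already has one.

    Returns True if the file was modified, False if a BOM was already present.
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    file_path.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Calling the function twice on the same file is safe: the second call finds the BOM and leaves the file untouched.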
Related
I have a project where most files are UTF-8 encoded, but one specific file type, identified by its extension, must be in ISO-8859-1 encoding.
Is there a way to configure CLion so that these files are opened with the correct encoding?
See https://www.jetbrains.com/help/idea/encoding.html#file-encoding-settings
The encoding can be set per path, but not per file extension.
A workaround is to use a BOM (for XML, HTML and others), or to set the encoding per file.
I am using Eclipse CDT (Eclipse C/C++ Development Tools 10.2.0.202103011047), CppStyle 1.5.0.0 and clang-format from clang 12.0.0.
My C++ source file contains a line
QString fileExportDirectory = "./😁";
because I am writing a unit test for handling Unicode paths. As soon as I format any part of the file, the line gets changed into
QString fileExportDirectory = "./?";
Why does that happen? Both the encoding of the file itself and the default text file encoding are set to "UTF-8". I have not read anywhere that clang-format or CppStyle has difficulties with Unicode. How can I prevent my clang-format code formatter from destroying Unicode content?
The described misbehaviour is a reported CppStyle bug (https://github.com/wangzw/CppStyle/issues/39). As a workaround
-Dfile.encoding=UTF-8
can be added to "eclipse.ini", then formatting works as expected.
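A "?" in place of a character is the usual trace of a lossy charset conversion: a character the target charset cannot represent gets substituted. The Python sketch below reproduces the effect. It assumes that the text passes through a non-UTF-8 default charset (windows-1252 is used here as an example) on its way through the formatter; the workaround above suggests this, but it is not confirmed here.

```python
def round_trip(text, charset):
    """Encode text in charset, replacing unmappable characters, then decode it back."""
    return text.encode(charset, errors="replace").decode(charset)


line = 'QString fileExportDirectory = "./😁";'

# The emoji has no representation in windows-1252, so it is replaced.
print(round_trip(line, "cp1252"))

# UTF-8 can represent every Unicode character, so the line survives unchanged.
print(round_trip(line, "utf-8"))
```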
I received a Visual Studio Qt/C++ project (from China; I mention it because Chinese characters can be tricky). When I open it in Qt Creator (my preferred IDE) on macOS (my preferred OS), I get
Could not decode main.cpp with UTF-8 encoding. Editing not possible.
If I click Select Encoding and choose System, I can edit normally. I can even save and close the file, but when I open it again the same thing happens.
I noticed that some comments appear as )//?????????????????? and //???????�??????????UI, which seems to be an encoding problem.
How do I deal with this issue?
What does the System encoding mean?
Opening the file in Sublime Text and saving it with Save with Encoding > UTF-8 seems to solve the problem. But I have a lot of files; any suggestions on how to do this from the command line for all files?
And the file seems not to be UTF8:
$ file main.cpp
main.cpp: c program text, ISO-8859 text
Finally, I went to Qt Creator > Tools > Options > Text Editor > Behavior > File Encodings and set Default Encoding to ISO-8859-1. Now Qt Creator no longer complains. Are there any downsides to doing this?
I suspect it contains byte sequences that are not valid UTF-8. Here is a question about the same problem on the Qt forum. One of the comments says
I just discovered this because a header file from FTDI contained a copyright symbol. I just converted that to read (C) instead of the copyright symbol and it was fine.
You can try that. If that's not it, I advise you to check whether the file is valid UTF-8 with a command like: iconv -f UTF-8 your_file > /dev/null; echo $?, which prints 0 if the file is valid and 1 if it is not.
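For the "all files from the command line" part of the question, the same validity check can drive a batch conversion. This is a rough Python sketch, not a tested tool: the function names are invented, the suffix list is an example, and source_encoding is a guess the caller has to supply. The output of file is only a heuristic, so the real encoding may differ (for a project with Chinese comments something like GB18030 is plausible, but that is an assumption).

```python
from pathlib import Path


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_tree_to_utf8(root, source_encoding, suffixes=(".cpp", ".h")):
    """Re-encode every matching file under root that is not already valid UTF-8.

    source_encoding is the caller's assumption about the current encoding.
    Returns the list of files that were rewritten.
    """
    converted = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        data = path.read_bytes()
        if is_valid_utf8(data):
            continue  # pure ASCII and existing UTF-8 files are left alone
        # May raise UnicodeDecodeError if the guessed encoding does not fit.
        text = data.decode(source_encoding)
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted
```

Running it on a copy of the project first (or under version control) is advisable, because a wrong source_encoding such as ISO-8859-1 never fails to decode and would silently produce wrong characters.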
I am starting to use wxWidgets (Version 3.1.4) for a Windows GUI application. Compiling and linking is done with gcc coming with MINGW64 (gcc (x86_64-posix-sjlj-rev0, Built by MinGW-W64 project) 8.1.0).
In the source, the characters are entered correctly:
m_menuItem31 = new wxMenuItem(m_name6, wxID_ANY, _("Stückliste drucken"), _("Stückliste drucken"), wxITEM_NORMAL);
m_name6->Append(m_menuItem31);
In the application it looks like this:
After some research I tried the linker option -municode, which ended in the error "no reference to wWinMain". Adding #define wxUNICODE had no effect.
Edit:
After preprocessing, the characters are still as desired. In the .o file, they are already spoiled, so the solution should be a compiler switch I am not yet aware of...
Any good ideas are welcome.
This might come down to the encoding of your source file (as VZ hinted). I am developing a wxWidgets C++ app with some non-English characters (e.g. ć), which are similar to German umlauts in terms of occasional display problems.
I opened my source file in Notepad++ to see my source file's encoding and it showed encoding was ANSI:
and when this file was compiled (in MSVC) it produced correct display in application:
But when in Notepad++ I converted encoding to UTF-8, in source file it still appeared correct:
I saved file with encoding converted from ANSI to UTF-8 and compiled but after that application showed wrong character:
I advise you to take Notepad++, do a similar experiment, and find out which encoding you are using and which encoding you should be using in your source files (possibly the encoding that the compiler expects, as VZ hinted).
In my case it didn't seem to matter much whether the string was wrapped in _() or wxT(): I got the correct display as long as the encoding of the source file was correct.
As explained in the wx docs, you can't simply expect wxMenuItem(...,..., _("Stückliste drucken"),...) to work, because ü is not a 7-bit ASCII character.
You may use
_(wxString::FromUTF8("St\xC3\xBC" "ckliste drucken"))
or
_(L"St\u00FCckliste drucken")
because ü is code point U+00FC, which UTF-8 encodes as the two bytes C3 BC. The first literal is split in two because a \x escape consumes every hex digit that follows it, so "\xBCc" would be read as a single escape.
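The relationship between the code point and its UTF-8 bytes can be checked in a few lines of Python. The sketch also shows a common symptom of an encoding mismatch: UTF-8 bytes read as windows-1252 turn ü into the two characters Ã and ¼. Whether this is exactly what the application above displayed is an assumption, and misread_utf8_as is a hypothetical helper name.

```python
def misread_utf8_as(text, charset):
    """Encode text as UTF-8, then decode the bytes with a different charset."""
    return text.encode("utf-8").decode(charset)


text = "St\u00fcckliste drucken"  # \u00fc is ü

print(hex(ord("\u00fc")))            # code point of ü
print("\u00fc".encode("utf-8"))      # its two-byte UTF-8 encoding
print(misread_utf8_as(text, "cp1252"))  # what a windows-1252 reader sees
```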
Finally I got it sorted out...
I use CodeLite, and after the answer from Ivan (thanks a lot for the hints!) I ran some more tests. After adding some of the special characters to the wxWidgets minimal sample and compiling it from the command line, the characters were initially displayed wrong. In Notepad++ I found the encoding to be UTF-8. After changing it to ANSI, the characters were displayed correctly (consistent with Ivan's answer).
I tried the same with CodeLite, and it failed. The reason was that the encoding of the source file was always reset to UTF-8. It took two iterations to get the settings in the CodeLite preferences right (ISO-8859-1 with localization enabled), after which it worked.
The only strange effect I observed was that linking now took about 14 min; I need to dig into that further.
I am trying to build a project written in VS 2008 using QtCreator under Linux and I get loads of errors:
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘\377’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘\376’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: error: stray ‘#’ in program
/home/ga/dev/CppGroup/MonteCarlo/main.cpp:1: warning: null character(s) ignored
etc.
Does it mean that the compiler can't handle Unicode correctly? How can I fix it?
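That looks like the byte order mark (BOM) of little-endian UTF-16: \377 and \376 are the bytes FF FE written in octal. Make sure the file is saved as UTF-8, or convert it manually via iconv -f UTF-16LE -t UTF-8 myfile (iconv writes the result to standard output, so redirect it to a new file).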
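The BOM check can also be done programmatically before converting. The following Python sketch picks the decoder from a leading BOM and re-encodes the text as UTF-8 without a BOM; decode_with_bom and to_utf8 are illustrative names, and falling back to UTF-8 when no BOM is present is an assumption.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_with_bom(data, default="utf-8"):
    """Decode bytes using the encoding implied by a leading BOM, if any."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data.decode(encoding)  # these codecs consume the BOM
    return data.decode(default)


def to_utf8(data):
    """Return the same text re-encoded as UTF-8 without a BOM."""
    return decode_with_bom(data).encode("utf-8")
```

For a file like the one in the question, the first two bytes FF FE select the UTF-16 decoder, and the NUL bytes the compiler warned about disappear in the UTF-8 output.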
Ensure the file is encoded in UTF-8. Open it with a text editor that lets you choose the file encoding (e.g. gedit or Notepad++) and convert it. I've had similar issues before, but UTF-8 files work fine (other encodings like UTF-16 won't work).
Edit: Don't convert your resource script (if there's any) to UTF-8. The resource compiler won't be able to read it (at least when using MSVC 2008).
It may be that your files use Windows line endings (\r\n, with the carriage return shown as ^M).
Have you tried running dos2unix on your source files before compiling?
I think I've seen 'stray ...' errors in files containing Unicode.
You may configure your editor's or console's (or both) encoding setting to fix it.