Why does the g++ compiler add spaces between every character in my cpp file? - c++

I'm trying to compile 3 .cpp files. For just one of them, the g++ compiler on Linux reads spaces between every character, making it impossible to compile. I get hundreds, if not thousands, of x.cpp:n:n: warning: null character(s) ignored (where x is a name and n is a number). I wrote the program in Visual Studio and copied the files to Linux. The other 2 files compile fine, and I've done this for dozens of projects. How does this happen?
I managed to fix this issue by creating a new file and copying the text from the original .cpp instead of copying the file itself.
Now I get an error from the terminal saying Permission Denied when I try to launch the .o file.

Your compiler problem has nothing to do with line breaks.
You're trying to compile a file saved as UTF-16 (Unicode). Visual Studio will do this behind your back if the file contains any non-ASCII characters.
Solution 1 (recommended): stick to ASCII. Then the problem simply won't arise in the first place.
Solution 2: save the file in Visual Studio as UTF-8, as described here. You might need to save the file without a BOM (byte-order mark) as described here.
As for your other problem: look for a file called a.out (yes, really) and try running that. And don't specify -c on the g++ command line.
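For example, something like this (the file names are just illustrative; use your own):
g++ x.cpp y.cpp z.cpp
./a.out
or, to choose the output name yourself:
g++ x.cpp y.cpp z.cpp -o myprogram
./myprogram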

There is no text but encoded text.
Dogmatic corollaries:
Authors choose a character encoding for a text file.
Readers must know what it is.
Any kind of shared understanding will do: specification, convention, internal tagging, metadata upon transfer, …. (Even last century's way of converting upon transfer would do in some cases.)
It seems you 1) didn't know what you chose, 2) didn't bring that knowledge with you when you copied the file between systems, and 3) didn't tell GCC.
Unfortunately, there has been a culture of hiding these basic communication needs instead of handling them mindfully, so your experience is all too common.
To tell GCC,
g++ -finput-charset=utf-16
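For example, using the x.cpp naming from the question, a complete invocation would look like:
g++ -finput-charset=utf-16 x.cpp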
Obviously, if you are using some sort of project system that keeps track of the required metadata of the relevant text files and passes it to the tools, that would be preferable.
You could try adopting UTF-8 Everywhere. That won't eliminate the need for communication (until maybe the middle of this century) but it could make it more agreeable.

Related

Special characters from STRINGTABLE not displayed correctly if built on different machine?

I have to maintain a small C++ / VS2015 project in my department which only checks the installed .NET Framework of a machine and prompts the user if the current version is not installed. This small application is localized by a file called Language.rc which contains some STRINGTABLES with the corresponding texts.
All this works fine if the program is compiled on my machine, but if the same code is compiled on our build machines then the special characters like for example the German ÄÖÜ are missing.
Unfortunately I'm not a C++ person and I have no clue what is wrong. I already searched the web but cannot find a hint on what the problem might be.
Does anybody have an idea what could be different on the build machines compared to my machine that causes the different characters?
UPDATE:
So after my TFS expert analysed the problem on the build machines, we were able to identify the culprit:
As I said before, the application that was causing the problem is only a small tool. Our automatic build contains a lot more solutions and projects. One part of the automatic build is a script that sets the version numbers of all kinds of files to the same value. This is apparently also done for so-called RC files. As far as I understand, there are different kinds of RC files in C++ (and also in Delphi) which actually hold version numbers. The RC file in my case only has texts and translations, but it is opened and also saved even though it does not have a version number.
Unfortunately this operation also explicitly sets the encoding of the file to some old IBMxyz encoding (maybe for the Delphi RC files?). This is the actual operation where the special characters get lost... So the solution to my problem is not within the original encoding of the file but somewhere in the build process.
As a temporary fix we changed the .rc file to an .rc2 file - this way the project still compiles but the build does no longer modify it.
I've had enough fun for today...
Windows has two ways of handling text. These are known as "Unicode" (really UTF-16) and "ANSI" (which isn't related to the ANSI standards organization, and describes any 8-bit superset of ASCII).
Your problem is clearly a case of "ANSI" disease. ASCII does not contain "Ä"; some supersets of ASCII do, but not all of them. Different machines using different supersets will produce different results.
The "simple" fix is to prefix an L to the string in the .rc file: L"zum Beispiel", and then save this .rc file as Unicode (UTF-16). While newer versions of Windows contain more UTF-16 characters, this never affects existing characters, and Ä has been part of every Unicode version. (Even € works everywhere - I think that was added in Windows 2000)

Modifying executable upon download (Like Ninite)

I'm currently developing a Windows application that needs internal modifications at download time.
Also, I'm delivering it from a Linux host, so I can't compile on demand as proposed.
How does Ninite deal with it?
On Ninite.com, each time you select different options you get the same .exe, but with minor modifications inside.
Option 1
Compile the program with predefined data (in Windows).
Use PHP to fseek the file and replace my custom strings.
Option 2
Append a different resource file to the original .EXE
Other?
Has someone developed something like this? What would be the best approach?
Thank you.
You can just append data to the back of your original executable. The Windows PE file format is robust enough that this does not invalidate the executable itself. (It will however invalidate any existing digital signatures.)
Finding the start of this data can be a challenge if its size isn't known up front. In that case, it may be necessary to append the variable-length data, and then append the data length (itself a fixed length field - 4 bytes should do). To read the extra data, read the last 4 bytes to get the data length. Get the file length, subtract 4 for the length field, then subtract the variable length to get the start of the data.
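A minimal sketch of the read side in C++, assuming the layout described above and a length field written in the machine's native byte order (the function name is made up for illustration; error handling is omitted for brevity):
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads data appended to an executable laid out as:
// [original EXE bytes][variable-length data][4-byte length of that data]
std::vector<char> readAppendedData(const std::string& exePath)
{
    std::ifstream in(exePath, std::ios::binary);
    in.seekg(0, std::ios::end);
    const std::streamoff fileSize = in.tellg();

    // The last 4 bytes hold the data length (same byte order as the writer).
    std::uint32_t dataLen = 0;
    in.seekg(fileSize - 4);
    in.read(reinterpret_cast<char*>(&dataLen), sizeof dataLen);

    // The data itself sits immediately before the length field.
    std::vector<char> data(dataLen);
    in.seekg(fileSize - 4 - static_cast<std::streamoff>(dataLen));
    in.read(data.data(), static_cast<std::streamsize>(dataLen));
    return data;
}
The writer side just opens the .exe for appending, writes the variable-length data, then the 4-byte length.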
The most portable way could be to have a plugin (whose path is wired inside your main program) inside your application. That plugin would be modified (e.g. on Linux by generating C++ code gencod.cc, forking a g++ -Wall -shared -fPIC -O gencod.cc -o gencod.so compilation, then dlopen-ing the ./gencod.so), and your application could have something to generate the C++ source code of that plugin and to compile it.
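A rough sketch of the loading side on Linux (link with -ldl; the entry-point name plugin_entry is invented for illustration):
#include <cstdio>
#include <dlfcn.h>

int main()
{
    // Load the freshly (re)compiled plugin from the wired-in path.
    void* handle = dlopen("./gencod.so", RTLD_NOW);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Resolve an agreed-upon entry point and call it.
    using entry_fn = void (*)();
    auto entry = reinterpret_cast<entry_fn>(dlsym(handle, "plugin_entry"));
    if (entry)
        entry();

    dlclose(handle);
    return 0;
}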
I guess that the same might be doable on Windows (which I don't know). Probably the issue is to compile it (the compilation command would be different on Windows and on Linux). Beware that AFAIK on Windows a process cannot modify its own executable (but you should check).
Qt has a portable layer for plugins. See QPluginLoader & Qt Plugins HowTo
Alternatively, don't modify the application, but keep the changing information in some persistent file or data store (at a well-defined place whose location or file path is wired into the executable), preferably in a textual format like JSON, or maybe using sqlite, or a real database. Read also about application checkpointing.
If you need to implement your own application-specific checkpointing, you had better design your application around this concern very early. Study garbage collection algorithms (a checkpointing procedure is similar to a precise copying GC) and read more about continuations. See also this answer to a very similar question.

VS2010 doesn't understand the string encoding of its own files

I am working with a VS2010 Unicode project, which all works fine. When I remove my local files and download a fresh copy from source control (Perforce), the resource.h file reads wrong (in Chinese).
//{{NO_DEPENDENCIES}}
਍⼀⼀ 䴀椀挀爀漀猀漀昀琀 嘀椀猀甀愀氀 䌀⬀⬀ 最攀渀攀爀愀琀攀搀 椀渀挀氀甀搀攀 昀椀氀攀⸀ഀഀ
// Used by MyDemo.rc
਍⼀⼀ഀഀ
#define IDM_ABOUTBOX 0x0010
਍⌀搀攀昀椀渀攀 䤀䐀䐀开䄀䈀伀唀吀䈀伀堀                    ㄀  ഀഀ
Why does VS2010 do that? And how can I fix it? It is essentially an identical file, but in one instance it opens fine and in another instance VS2010 is not able to figure out the file encoding.
Although this is an MFC project, that doesn't look like it has anything to do with this issue.
This appears to be a problem with Perforce's line-ending conversion and a failure to correctly deduce the UTF-16 format of resource.h.
Following the steps here may fix the problem if you encounter it in future:
Problem
On Windows, after syncing "text" (Perforce file type) files containing
utf16 encoding, the file in my workspace seems corrupted.
Solution
As the utf16 character encoding is a double byte character encoding
and Perforce treats "text" files as single byte, you may encounter
rendering or corruption issues in a Windows environment. Windows line
endings are not correctly converted within the UTF16 character set for
"text" files. This corrupts the utf16 file content.
File revisions with utf16 content should always be submitted using the
"utf16" file type (on add, Perforce will automatically detect utf16
files unless the user or a typemap rule overrides this behavior).
In order to fix your issue follow these steps:
Edit your workspace specification and change the value of the LineEnding field to "unix"
Force sync the file (no line ending conversion will be done)
Check that the workspace file is now rendered properly
Checkout the file(s), changing the file type to utf16 (change from "text" to "utf16")
Edit your workspace specification and change the value of the LineEnding field back to "local"
Submit a new revision of the file
Example:
p4 client bruno_ws
LineEnd: unix
p4 sync -f myfile.txt
p4 edit -t utf16 myfile.txt
p4 client bruno_ws
LineEnd: local
p4 submit -d "Fixing unicode file"
I think I have seen this issue before, so I am going to answer this now that I figured it out. Somehow the resource.h encoding or format was messed up, and I don't know why; I haven't made any manual changes to it. Perforce was not able to detect changes and display both files correctly side by side in a comparison. However, it didn't show the "The files are identical" message that it normally does for identical files. And if I do "Revert unchanged files", it rolls the file back, not detecting the changes.
I used a hex comparison tool and the internals of the two files were different. I simply picked the one which was working. The file sizes were also different for some reason.
The correct file shows as follows:
//{{NO_DEPENDENCIES}}
// Microsoft Visual C++ generated include file.
// Used by MyDemo.rc
//
#define IDD_ABOUTBOX 100
....
Resource.h needs to be in ANSI format.
Sometimes Visual Studio converts it to Unicode and puts a 2-byte BOM at the beginning. However, when the file is loaded in the IDE editor, the editor cannot recognize it and displays it as Chinese.
If you take a look with a hex editor, you will be able to read the file contents.
The solution is to use an independent text editor (I use Notepad++ or Notepad2) and make sure the file is encoded in ANSI format without BOM.
Then check in the file and don't open it with Visual Studio anymore.
If you need to do changes, always go through the external editor and make sure that after saving the encoding is still ANSI.
I don't know why this happens. My assumption is that the OS default locale is different from the VS project resource locale. The IDE then gets confused and probably tries to convert the resource file to Unicode in order to avoid conversion problems, but resource.h is not an ordinary text file. The compilers seem not to understand Unicode sources with a BOM.

Is it a good idea to include a large text variable in compiled code?

I am writing a program that produces a formatted file for the user, but it doesn't only produce the formatted file; it does more.
I want to distribute a single binary to the end user and when the user runs the program, it will generate the xml file for the user with appropriate data.
In order to achieve this, I want to store the file contents in a char array variable that is compiled into the code. When the user runs the program, I will write out the char array to generate an xml file for the user.
char* buffers = "a xml format file contents, \
this represent many block text \
from a file,...";
I have two questions.
Q1. Do you have any other ideas for how to compile my file contents into the binary, i.e., distribute everything as one binary file?
Q2. Is this even a good idea, as described above?
What you describe is by far the norm for C/C++. For large amounts of text data, or for arbitrary binary data (or indeed any data you can store in a file, e.g. a zip file), you can keep the data in a file and link it into your program directly.
An example may be found on sites like this one
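One common concrete form of this, as a sketch (the file and symbol names are illustrative; xxd -i ships with vim): first convert the file to a C array,
xxd -i template.xml > template_xml.h
which yields roughly
unsigned char template_xml[] = { 0x3c, 0x3f, 0x78, /* ... */ };
unsigned int template_xml_len = 1234;
and then write it back out at run time:
#include <fstream>
#include "template_xml.h"

int main()
{
    // Dump the embedded template to a file for the user.
    std::ofstream out("generated.xml", std::ios::binary);
    out.write(reinterpret_cast<const char*>(template_xml), template_xml_len);
}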
I'd recommend using a separate file to contain the data rather than putting the data into the binary, unless you have your own reasons. I don't know of other portable ways to put strings into a binary file, but your solution seems OK.
However, note that when using \ at the end of a line to form strings spanning multiple lines, the indentation should be taken care of, because the pieces are concatenated from the beginning of the next line:
char* buffers = "a xml format file contents, \
this represent many block text \
from a file,...";
Or you can use another form:
char *buffers =
    "a xml format file contents, "
    "this represent many block text "
    "from a file,...";
My answer probably provides much redundant information for the topic starter, but here is what I'm aware of:
Embedding in source code: the plain C/C++ solution. It is a bad idea, because each time you want to change your content you will need to:
recompile
relink
It is acceptable only if your content changes very rarely or never, or if build time is not an issue (if your app is small).
Embedding in binary: a few slightly more flexible solutions for embedding content in executables exist, but none of them are cross-platform (you've not stated your target platform):
Windows: resource files. With most IDEs it is very simple
Linux: objcopy (a short sketch follows below this list).
MacOS: Application Bundles. Even simpler than on Windows.
You will not need to recompile the C++ file(s), only re-link.
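For the objcopy route mentioned in the list above, the idea is roughly this (the file name, the symbol names and the exact target flags are illustrative and platform-dependent):
objcopy -I binary -O elf64-x86-64 -B i386:x86-64 data.xml data.o
objcopy derives the symbol names from the input file name; after linking data.o in, the content is reachable from C++:
#include <cstddef>

// Symbols synthesized by objcopy from data.xml:
extern "C" const char _binary_data_xml_start[];
extern "C" const char _binary_data_xml_end[];

const std::size_t dataSize = _binary_data_xml_end - _binary_data_xml_start;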
Application virtualization: there are special utilities that wrap all your application resources into a single executable and run it much as on a virtual machine.
I'm only aware of such utilities for Windows (ThinApp, BoxedApp), but there are probably such things for other OSes too, or even cross-platform ones.
Consider distributing your application in some form of installer: when the installer starts, it creates all resources and unpacks the executable. This is similar to generating the whole thing from the main executable. It can be a large and complex package or even a simple self-extracting archive.
Of course, the choice depends on what kind of application you are creating, who your target audience is, how you will ship the package to end users, etc. If it is a game targeting children, it's not the same as a Unix console utility for C++ coders =)
It depends. If you are writing some small Unix-style utility with no prospect of internationalization, then it's probably fine. You don't want to bloat a distribution with a file no one would ever touch anyway.
But in general it is bad practice, because eventually someone might want to modify this data, and he or she would have to rebuild the whole thing just to fix a typo.
The decision is really up to you.
If you just want to keep your distribution in one piece, you might also find this thread interesting: Store data in executable
Why not distribute your application with an additional configuration file, e.g. package your application executable and config file together?
If you do want to make it a single file, try embedding your config file into the executable as a resource.
I see it as more of an OS issue than a C/C++ issue. You can add the text to the resource part of your binary/program. In Windows programs, HTML, graphics and even movie files are often compiled into resources that become part of the final binary.
That is handy for possible future translation into other languages, plus you can modify the resource part of the binary without recompiling the code.

huge C file debugging problem

I have a source file in my project which has more than 65,536 code lines (112,444 to be exact). I'm using an "SQLite amalgamation", which comes as a single huge source file.
I'm using MSVC 2005. The problem arises during debugging. Everything compiles and links OK. But when I try to step into a function with the debugger, it shows an incorrect code line.
What's interesting is that the difference between the correct line number and the one the debugger shows is exactly 65536. This makes me suspect (in fact, be almost certain of) an unsigned short overflow.
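For instance, a function that really starts at line 70,000 is shown at line 70,000 - 65,536 = 4,464, exactly the kind of 16-bit wraparound I'm seeing.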
I also suspect that it's not a bug in MSVC itself. Perhaps it's a limitation of the debug information format; that is, the debug information format used by MSVC stores line numbers as 2-byte shorts.
Is there anything can be done about this (apart from cutting the huge file into several smaller ones) ?
According to a MS moderator, this is a known issue with the debugger only (the compiler seems to handle it fine, as you pointed out). There is apparently no workaround other than using shorter source files. See the official response to a very similar question here.
Well, when I wanted to look at how sqlite works, I took the last 60,000 or so lines, moved them to another file and then #include'd it. That was easy and did the trick for me. If you do that, be careful not to split inside an #ifdef block.
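For example, the split can look like this (the cut point and the second file's name are of course arbitrary):
/* at the new end of sqlite3.c */
#include "sqlite3_part2.c"   /* the ~60,000 moved lines live here */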
If you look at the documentation for the symbolic debugging info, you will see the type used for line numbers. For example, both line and column parameters for IDiaSession::findLinesByLinenum are of type DWORD.
Edit: As #valdo points out, that still doesn't mean the debugger works properly with huge line numbers, so you have to use shorter files. It is unfortunate that such a limitation exists, but even if it didn't, I'd still recommend you split your source.
Have you looked into using WinDBG instead? It's pretty capable, as the Windows team uses it for debugging the OS, and there are some biiiig files in there, or at least there were when I last looked.
For anyone having issues with incorrect line numbers for files with fewer than 65,536 lines: I found my issue was caused by inconsistent line endings in the source file. There were 129 \r newlines where the rest of the file was \r\n style. The difference between the debugger line and the correct line was 129 as well.
Unless you are modifying SQLite, you should just trust that it is doing its job. No need to step in at all. SQLite is run through a large battery of tests before release.