Why is COW std::string optimization still enabled in GCC 5.1? - c++

According to GCC 5 release changes page (https://gcc.gnu.org/gcc-5/changes.html):
A new implementation of std::string is enabled by default, using the small string optimization instead of copy-on-write reference counting
I decided to check it and wrote a simple program:
#include <cstdio>
#include <string>

int main()
{
    std::string x{"blah"};
    std::string y = x;
    std::printf("%p\n", (const void*)x.c_str());
    std::printf("%p\n", (const void*)y.c_str());
    x[0] = 'c';
    std::printf("%p\n", (const void*)x.c_str());
    std::printf("%p\n", (const void*)y.c_str());
}
And the result is:
0x162FC38
0x162FC38
0x162FC68
0x162FC38
Notice that the x.c_str() pointer changes after x[0] = 'c'. This means the internal buffer is copied upon write, so it seems that COW is still in effect. Why?
I use g++ 5.1.0 on Ubuntu.

Some distributions intentionally deviate from the FSF GCC choice to default to the new ABI. Here's an explanation of why Fedora 22 deviates from upstream GCC like that. In short:
In a program, it's best not to mix the old and the new ABIs, but to pick one and stick with it. Things break if one part of the program assumes a different internal representation for a type than another part of the program.
Therefore, if any C++ library is used that uses the old C++ ABI, then the programs using that library should also use the old C++ ABI.
Therefore, if any C++ library is used that was built with GCC 4.9 or earlier, then the programs using that library should also use the old C++ ABI.
Fedora 22 still provides (or provided?) a lot of libraries built with GCC 4.9, because there wasn't enough time to rebuild them all with GCC 5.1 before the Fedora 22 release. To allow programs to use those libraries, the GCC default was switched to the old ABI.
As far as I can tell, GCC 5 isn't the default compiler in Ubuntu yet (but will soon be), so if it's provided as an extra install, those same arguments from Fedora also apply to Ubuntu.

Related

_GLIBCXX_USE_CXX11_ABI disabled on RHEL6 and RHEL7?

I have gcc 5.2.1 on RHEL6 and RHEL7, and it looks like _GLIBCXX_USE_CXX11_ABI gets disabled. It has no effect even if I compile with -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14. This means I won't get the small string optimization. For example, the output of the following code is always 8 and 'macro not set'. With SSO, sizeof(std::string) should be at least 16, judging from bits/basic_string.h. Any workaround?
#include <string>
#include <iostream>

int main()
{
    std::cout << sizeof(std::string) << std::endl;
#if _GLIBCXX_USE_CXX11_ABI
    std::cout << "macro set" << std::endl;
#else
    std::cout << "macro not set" << std::endl;
#endif
}
Red Hat's Bugzilla has this reply:
Jakub Jelinek 2018-02-19 06:08:00 EST
We've tried hard, but it is not possible to support this, neither on
RHEL6 nor on RHEL7, which is why it is forcefully disabled. It will
work in RHEL8 (and be the default there as well).
It depends on your libstdc++ version: make sure your include/link/runtime paths are correct. Search your system for that macro, use the libstdc++ that defines it, and make sure you link against the matching stdlib/ABI libraries.
If you do not have that, you can always build it yourself, however beware that if the rest of the programs you have use the old ABI, they will not work with your new libstdc++.
Edit: Thinking about this, did you specify the correct -std= flag to g++? Have you tried -std=gnu++11? It could be as trivial as that. If not, then read on. Do not manually specify that define: you will break ABI compatibility with your libstdc++, leading to cascades of wonderful crashes. The only time you can specify things like that is when you are building the standard library yourself.
Rest of this is a bit overkill, but it explains how to build and/or pick which stdlib you want to use.
I have a similar problem when using version 2 ABI of libc++, where everything that links against it has to be rebuilt with the right headers and thus the right ABI (things like small string optimization being one of them).
For example, when building C++ objects I use the following flags to specify a location to a custom stdlib header path instead of using the OS provided one (I use Clang but the principle is similar):
-nostdinc++ -I/usr/local/sdk/llvm.6.0.1/include/c++/v1/
And then during the linking phase I use an $ORIGIN-relative runtime search path, since on production machines the standard library is installed in a saner location, but you can specify a fixed path to whichever stdlib you want. You also want to make sure the linker can find the appropriate stdlib during static linking with -L.
-Wl,-rpath,'$ORIGIN/../lib' -L/usr/local/sdk/llvm.6.0.1/lib
You will need to link against -lstdc++ and -lsupc++ (order is important if static linking); as long as you provide the correct library search path, the static linker should find them. These are the GCC/GNU C++ standard library and the ABI support library.
Beware: if you replace your system libstdc++ with this, any dynamically linked programs built against the old ABI layout will break, so be careful.

GCC 4.2.2 unsigned short error in casting

This line of code doesn't compile for me on GCC 4.2.2:
m_Pout->m_R[i][j] = MIN(MAX(unsigned short(m_Pin->m_R[i][j]), 0), ((1 << 15) - 1));
error: expected primary-expression before ‘unsigned’
However, if I add parentheses to get (unsigned short), it works fine.
Can you please explain what kind of cast (conversion) is being done here?
Why isn't the parser/compiler able to understand this C++ code in GCC?
Can you suggest a "better" way to write this code, supporting GCC 4.2.2 (no C++11, and cross-platform)?
unsigned short(m_Pin->m_R[i][j]) is a declaration with initialisation of an anonymous temporary, and that cannot be part of an expression.
(unsigned short)(m_Pin->m_R[i][j]) is a cast, and is an expression.
So (1) cannot be used as an argument for MAX, but (2) can be.
I think Bathsheba's answer is at least misleading. short(m_Pin->m_R[i][j]) is a cast, so why does the extra unsigned mess things up? It's because unsigned short is not a simple-type-specifier: the functional cast syntax T(E) works only if T is a single token, and unsigned short is two tokens.
Other types spelled with more than one token are char* and int const, so these are also not valid casts: char*(0) and int const(0).
With static_cast<>, the < > are balanced, so the type can be named with a sequence of tokens, even static_cast<int const* const>(0).
You could use form (2) from Bathsheba's answer, but it is more idiomatic to use static_cast in C++:
static_cast<unsigned short>(m_Pin->m_R[i][j])
BTW, your error is not related to GCC. You'll get the same from Clang/LLVM or any standard-conforming C++ compiler (C++98 or C++11).
But independently of that, you should use a much newer version of GCC. As of July 2015 the current version is GCC 5.1, and your GCC 4.2.2 is from 2007, which is ancient.
Using a more recent version of GCC is worthwhile because:
it enables you to use a more recent version of C++, e.g. C++11 (compile with -std=c++11 or -std=gnu++11)
recent GCC have improved their diagnostics. Compiling with -Wall -Wextra will help a lot.
recent GCC are optimizing better, and you'll get more performance from your code
recent GCC have a better and more standard conforming standard C++ library
recent GCC are better for debugging (with a recent GDB), and have sanitizer options (-fsanitize=address, -fsanitize=undefined, other -fsanitize=.... options) which help finding bugs
recent GCC are more standard conforming
recent GCC are customizable thru plugins, including MELT
the old GCC 4.2 is no longer supported by the FSF, and you would need to pay big bucks to the few companies still supporting it.
You don't need root access to compile a GCC 5 compiler (or cross-compiler) from its source code. Read the installation procedures. You'll build a GCC tailored to your particular libc (you could even use musl-libc if you wanted), perhaps by compiling outside of the source tree after having configured with a command like
...your-path-to/gcc-5/configure --prefix=$HOME/soft/ --program-suffix=-mine
then make, then make install, then add $HOME/soft/bin/ to your PATH and use gcc-mine and g++-mine.

how to satisfy both gcc4.1.2 and gcc 4.7.3

A project needs to compile with both GCC 4.1.2 (the company's server) and GCC 4.7.3+ (my desktop Linux system), and I have some problems:
1. GCC 4.1.2 does not have -Wno-unused-result or -Wno-unused-but-set-variable. I tried to substitute -Wno-unused for the latter two, but I still get an "ignoring return value" error for a built-in function.
2. There's also no -Wno-narrowing in GCC 4.1.2; is there anything else I can use?
What should I do to make both of them happy?
I'd suggest you deal with the differences between the two versions in the makefile. You can detect the GCC version and programmatically include the extra warning options only when the GCC version supports them. This will also help when the company finally moves forward.
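One way to sketch that version check, as a hypothetical shell helper (a real makefile would feed `$(shell gcc -dumpversion)` into something like this):

```shell
#!/bin/sh
# pick_flags VERSION - echo warning flags appropriate for that GCC version.
# -Wno-unused-but-set-variable only exists from GCC 4.6 onwards, so it is
# added conditionally; older compilers get the lowest-common-denominator set.
pick_flags() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    flags="-Wall -Wextra"
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 6 ]; }; then
        flags="$flags -Wno-unused-but-set-variable"
    fi
    echo "$flags"
}

pick_flags 4.1.2   # the company server's compiler: base flags only
pick_flags 4.7.3   # the desktop compiler: base flags plus the 4.6+ option
```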
Fixing the code is worth doing, but don't stop using the warnings afterwards; they're the thing telling you there's a problem in the first place (otherwise you wouldn't have enabled them, right?).
Anyway, you can often get around the unused-result warnings for system functions by casting the result to void, signalling to the compiler that you intend to ignore it (though some GCC versions still warn for functions marked warn_unused_result):
(void)builtin( ... );

Create automatic C wrapper for C++ library?

Let's say I have a C++ DLL. AFAIK, there is no widely-adopted ABI standard for C++, so to make sure it works and does not depend on the compiler of the target application, I would need to wrap my library in a C interface.
Are there any tools that can automatically generate such interface? Would also be nice if they could generate wrappers around C interface to look as if they are original C++ objects, e.g.
Foo* f = new Foo(); // FooWrapper* fw = Foo_create();
f->bar("test"); // Foo_bar(fw, "test")
translates into C functions that are invoked in my library using generated C ABI. I understand that C++ is fairly complicated language and not everything can be easily wrapped in a C interface, but I was wondering if there are any such solutions that even support a subset of the C++ language (maybe with the help of some manually written IDL/XML files)?
there is no widely-adopted ABI standard for C++
I'm pretty sure that is a bit exaggerated - there aren't THAT many different compilers available for any given platform, so it would probably be easier to just produce a DLL for each vendor (e.g. Microsoft, GCC on Windows, GCC on Linux, Sun and GCC for Solaris, GCC for MacOS - CLANG is compatible with GCC as far as I know).
To add a C interface layer basically means that the interface layer must not:
1. Use any objects that require special copy/assignment/construction behaviour.
2. Throw exceptions across the interface.
3. Use virtual functions across the interface.
It is my opinion that it's easier to "fix" the problems caused by "lack of ABI" than it is to make a good interface suitable for C++ use with a C interface in the middle of it.
If you want a way to make C++ code callable from other compilers/standard libraries, you can use cppcomponents from https://github.com/jbandela/cppcomponents. Full disclosure - I am the author of the library.
Here is a simple hello world example
First make a file called library.h
In this file you will define the Component
#include <cppcomponents/cppcomponents.hpp>

struct IPerson
    : public cppcomponents::define_interface<cppcomponents::uuid<0xc618fd04, 0xaa62, 0x46e0, 0xaeb8, 0x6605eb4a1e64>>
{
    std::string SayHello();
    CPPCOMPONENTS_CONSTRUCT(IPerson, SayHello);
};

inline std::string PersonId() { return "library!Person"; }
typedef cppcomponents::runtime_class<PersonId, cppcomponents::object_interfaces<IPerson>> Person_t;
typedef cppcomponents::use_runtime_class<Person_t> Person;
Next create library.cpp
In this file you will implement the interface and component
#include "library.h"

struct PersonImplementation
    : cppcomponents::implement_runtime_class<PersonImplementation, Person_t>
{
    std::string SayHello() { return "Hello World\n"; }
};
CPPCOMPONENTS_DEFINE_FACTORY(PersonImplementation);
Finally, here is your main program (call it example1.cpp) that uses your implementation:
#include "library.h"
#include <iostream>
int main() {
    Person p;
    std::cout << p.SayHello();
}
To build the program you will need to download cppcomponents (just clone from the git link above). It is a header-only library and needs only a C++11 compiler.
Here is how you would build it on Windows, using cl for the executable and g++ (MinGW) for the DLL:
cl /EHsc example1.cpp /I pathtocppcomponents
g++ -std=c++11 library.cpp -o library.dll -shared -I pathtocppcomponents
where pathtocppcomponents is the directory of cppcomponents.
I am assuming you have cl and g++ in your path.
To run the program, make sure library.dll is in the same directory as example1.exe, and run example1.exe.
This library requires fairly compliant C++11 support, so it needs MSVC 2013 Preview and at least g++ 4.7. It works on both Windows and Linux.
As far as I know the answer is no, and you are supposed to handle this yourself with a bit of hacking and modification. For example, a std::string variable can be exposed through a C interface via c_str(), because c_str() returns a const char*, a type that C understands without any problem.
I personally don't find C++ complicated, and I can't see that "ABI issue" either. Nothing is perfect, but are you really externalizing your entire code base to C to "solve" this issue? Just use C in the first place. C is no easy language to deal with either: for example, C doesn't even have a notion of "string", and problems that are trivial to solve in C++ while keeping everything type-safe are really challenging in C if you want to meet the same goal.
I think that you are going a little too far with this and complicating things. As it is now, you have 3 + 1 main options on the most popular platforms:
libsupc++
libcxxrt
libc++abi
plus whatever the ABI is for the MSVC of your choice (aka "only god knows").
For me, on Linux, libsupc++ works very well. I'm following the libc++abi project and I don't see any big problem there either; the only real issue is that LLVM is basically an Apple-oriented project for now, so support for other platforms isn't as good, but libc++abi compiles and works quite well on Linux too (although it's basically redundant there, since Linux already has libsupc++).
I also would never use MSVC under Windows; in my opinion it's better to stick with a GCC-like compiler such as MinGW. You get bleeding-edge features, and you can simplify your codebase and your build process a lot.

vector<bool>::push_back bug in GCC 3.4.3?

The following code crashes for me using GCC to build for ARM:
#include <vector>
using namespace std;

void foo(vector<bool>& bools) {
    bools.push_back(true);
}

int main(int argc, char** argv) {
    vector<bool> bools;
    bool b = false;
    bools.push_back(b);
}
My compiler is: arm_v5t_le-gcc (GCC) 3.4.3 (MontaVista 3.4.3-25.0.30.0501131 2005-07-23). The crash doesn't occur when building for debug, but occurs with optimizations set to -O2.
Yes, the foo function is necessary to reproduce the issue. This was very confusing at first, but I've discovered that the crash only happens when the push_back call isn't inlined. If GCC notices that the push_back method is called more than once, it won't inline it in each location. For example, I can also reproduce the crash by calling push_back twice inside main. If you make foo static, then GCC can tell it is never called and will optimize it out, so push_back gets inlined into main and the crash doesn't occur.
I've tried this on x86 with gcc 4.3.3, and it appears the issue is fixed for that version.
So, my questions are:
Has anyone else run into this? Perhaps there are some compiler flags I can pass in to prevent it.
Is this a bug with gcc's code generation, or is it a bug in the stl implementation (bits/stl_bvector.h)? (I plan on testing this out myself when I get the time)
If it is a problem with the compiler, is upgrading to 4.3.3 what fixes it, or is it switching to x86 from arm?
Incidentally, most other vector<bool> methods seem to work. And yes, I know that using vector<bool> isn't the best option in the world.
Can you build your own toolchain with gcc 3.4.6 and Montavista's patches? 3.4.6 is the last release of the 3.x line.
I can append some instructions for how to build an ARM cross-compiler from GCC sources if you want. I have to do it all the time, since nobody does prebuilt toolchains for Mac OS X.
I'd be really surprised if this is broken for ARM in gcc 4.x. But the only way to test is if you or someone else can try this out on an ARM-targeting gcc 4.x.
Upgrading to GCC 4 is a safe bet. GCC 4 introduced the Tree SSA (Static Single Assignment) framework, moving high-level optimizations off the old RTL (Register Transfer Language) representation, which allowed a significant rewrite of the optimizer.