Does std::string move constructor actually move? - c++

Here is a small test program:
#include <string>
#include <iostream>
#include <memory>
#include <vector>
class Test
{
public:
Test(const std::vector<int>& a_, const std::string& b_)
: a(std::move(a_)),
b(std::move(b_)),
vBufAddr(reinterpret_cast<long long>(a.data())),
sBufAddr(reinterpret_cast<long long>(b.data()))
{}
Test(Test&& mv)
: a(std::move(mv.a)),
b(std::move(mv.b)),
vBufAddr(reinterpret_cast<long long>(a.data())),
sBufAddr(reinterpret_cast<long long>(b.data()))
{}
bool operator==(const Test& cmp)
{
if (vBufAddr != cmp.vBufAddr) {
std::cout << "Vector buffers differ: " << std::endl
<< "Ours: " << std::hex << vBufAddr << std::endl
<< "Theirs: " << cmp.vBufAddr << std::endl;
return false;
}
if (sBufAddr != cmp.sBufAddr) {
std::cout << "String buffers differ: " << std::endl
<< "Ours: " << std::hex << sBufAddr << std::endl
<< "Theirs: " << cmp.sBufAddr << std::endl;
return false;
}
return true;
}
private:
std::vector<int> a;
std::string b;
long long vBufAddr;
long long sBufAddr;
};
int main()
{
Test obj1 { {0x01, 0x02, 0x03, 0x04}, {0x01, 0x02, 0x03, 0x04}};
Test obj2(std::move(obj1));
obj1 == obj2;
return 0;
}
Software I used for the test:
Compiler: gcc 7.3.0
Compiler flags: -std=c++11
OS: Linux Mint 19 (tara) with upstream release Ubuntu 18.04 LTS (bionic)
The result I see is that after the move the vector buffer still has the same address, but the string buffer does not. It looks to me as if the string allocated a fresh buffer instead of just swapping buffer pointers. What causes this behavior?

You're likely seeing the effects of the small/short string optimization (SSO). To avoid an allocation for every tiny string, many implementations of std::string include a small fixed-size array that holds short strings without requiring new. This array usually repurposes members that aren't needed when no dynamic allocation has been made, so it costs little or no additional memory for either small or large strings. Strings stored this way don't benefit from std::move (but they're small, so it's fine). Larger strings require dynamic allocation, and will transfer the pointer as you expect.
Just for demonstration, this code on g++:
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
void move_test(std::string&& s) {
std::string s2 = std::move(s);
std::cout << "; After move: " << std::hex << reinterpret_cast<std::uintptr_t>(s2.data()) << std::endl;
}
int main()
{
std::string sbase;
for (std::size_t len=0; len < 32; ++len) {
std::string s1 = sbase;
// std::hex is sticky, so switch back to std::dec before printing the length
std::cout << "Length " << std::dec << len << " - Before move: " << std::hex << reinterpret_cast<std::uintptr_t>(s1.data());
move_test(std::move(s1));
sbase += 'a';
}
}
produces high (stack) addresses that change on move construction for lengths of 15 or less (presumably varies with architecture pointer size), but switches to low (heap) addresses that remain unchanged after move construction once you hit length 16 or higher (the switch is at 16, not 17, because it is NUL-terminating the strings, since C++11 and higher require it).
To be 100% clear: This is an implementation detail. No part of the C++ spec requires this behavior, so you should not rely on it occurring at all, and when it occurs, you should not rely on it occurring for specific string lengths.
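As a minimal sketch of how one might check this without printing addresses, the hypothetical helper below (buffer_survives_move is not from the original post) compares the buffer address before and after move construction. For a long string the major implementations are expected to report true, because the heap pointer is transferred; for a short string the result depends on the implementation's SSO threshold, so it should not be relied on.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: reports whether the character buffer keeps its
// address when the string is move-constructed into a new object.
bool buffer_survives_move(std::string source) {
    const char* before = source.data();      // address of the buffer before the move
    std::string target(std::move(source));   // move construction
    return target.data() == before;          // true if the pointer was handed over
}
```

Taking the parameter by value and then moving from it also avoids the pitfall in the question's first constructor, where std::move is applied to a const reference and therefore selects the copy constructor.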

Related

How to store Processor Status flags' values with corresponding enum for 6502

I'm working on a 6502 emulator in C++ as part of my thesis. It has 6 registers; most of them just hold values, but there's one special one: Processor Status. It's 8 bits wide and each bit is a different flag. The best choice seemed to be to make it a std::bitset<8> and create a corresponding enum class to map its values to the real bits, as follows:
enum class PSFlags : uint8_t
{
Carry = 0,
Zero = 1,
InterruptDisable = 2,
Decimal = 3,
Break = 4,
Unknown = 5,
Overflow = 6,
Negative = 7
};
struct Registers
{
int8_t A;
int8_t X;
int8_t Y;
uint8_t SP;
uint16_t PC;
static constexpr uint8_t PSSize = 8;
std::bitset<PSSize> PS;
constexpr Registers() noexcept :
A(0),
X(0),
Y(0),
SP(0xFF),
PS(0b00100100),
PC(0)
{
}
};
And now, if I want to refer to one of three: size of PS, the flag number or the bitset itself I have:
Registers::PSSize; // size
PSFlags::Carry; // flag number
Registers r; r.PS; // bitset itself
Where every call accesses the value in a very different way. I'd like to have it more consistent, e.g.
Registers::PS::value; // for the bitset itself
Registers::PS::size; // for the size
Registers::PS::flags::Carry; // for the name of flag
Do you have any good ideas on how to achieve such (or similar) consistency without creating some crazy or ugly constructs in the code?
What OP wants (or something acceptably similar) can be achieved using nested structs.
Just for fun, I tried to model what OP intended:
#include <bitset>
#include <cstdint>
struct Registers
{
int8_t A;
int8_t X;
int8_t Y;
uint8_t SP;
static constexpr uint8_t PSSize = 8;
struct PS: std::bitset<PSSize> {
enum Flags {
Carry = 0,
Zero = 1,
InterruptDisable = 2,
Decimal = 3,
Break = 4,
Unknown = 5,
Overflow = 6,
Negative = 7
};
static constexpr unsigned Size = PSSize;
constexpr PS(std::uint8_t value):
std::bitset<PSSize>((unsigned long long)value)
{ }
std::uint8_t value() const { return (std::uint8_t)to_ulong(); }
} PS;
uint16_t PC;
constexpr Registers() noexcept :
A(0),
X(0),
Y(0),
SP(0xFF),
PS(0x24),//PS(0b00100100),
PC(0)
{
}
} r;
A small test to show this in action:
#include <iomanip>
#include <iostream>
#define DEBUG(...) std::cout << #__VA_ARGS__ << ";\n"; __VA_ARGS__
int main()
{
std::cout << std::hex << std::setfill('0');
DEBUG(std::cout << Registers::PS::Flags::Carry << '\n');
DEBUG(std::cout << r.PS[Registers::PS::Flags::Carry] << '\n');
DEBUG(std::cout << Registers::PS::Flags::InterruptDisable << '\n');
DEBUG(std::cout << r.PS[Registers::PS::Flags::InterruptDisable] << '\n');
DEBUG(std::cout << Registers::PS::Flags::Break << '\n');
DEBUG(std::cout << r.PS[Registers::PS::Flags::Break] << '\n');
DEBUG(std::cout << Registers::PS::Size << '\n');
DEBUG(std::cout << "0x" << std::setw(2) << (unsigned)r.PS.value() << '\n');
// done
return 0;
}
Output:
std::cout << Registers::PS::Flags::Carry << '\n';
0
std::cout << r.PS[Registers::PS::Flags::Carry] << '\n';
0
std::cout << Registers::PS::Flags::InterruptDisable << '\n';
2
std::cout << r.PS[Registers::PS::Flags::InterruptDisable] << '\n';
1
std::cout << Registers::PS::Flags::Break << '\n';
4
std::cout << r.PS[Registers::PS::Flags::Break] << '\n';
0
std::cout << Registers::PS::Size << '\n';
8
std::cout << "0x" << std::setw(2) << (unsigned)r.PS.value() << '\n';
0x24
Note:
I expected that giving the nested struct Registers::PS and the member Registers::PS the same name would work. Usually, though, I start type identifiers with an uppercase character and variables with a lowercase one, so I don't normally run into this issue.
Being in doubt about this, I tested the struct Registers against various compilers (though I wouldn't count this as proof against the standard): Compiler Explorer
Deriving from standard library class templates such as the containers or std::bitset should be done with care (i.e. better not). Probably for performance reasons, none of them provides a virtual destructor, with the respective consequences. In the above code, this shouldn't be a problem.
The 6502 reminded me of the Commodore 64, on which I made my first attempts (although the C64 had the even more modern 6510 CPU). However, that's looong ago... ;-)
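If deriving from std::bitset is a concern, a similar interface can be sketched with composition instead. The class below is only an illustration under that assumption; the name StatusRegister and its member functions are hypothetical and not part of the answer above.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical composition-based variant: the bitset is a private member
// rather than a base class, so no std:: type is derived from.
class StatusRegister {
public:
    enum Flag : std::size_t {
        Carry = 0, Zero = 1, InterruptDisable = 2, Decimal = 3,
        Break = 4, Unknown = 5, Overflow = 6, Negative = 7
    };
    static constexpr std::size_t Size = 8;

    // 0x24 sets InterruptDisable and Unknown, as in the original constructor.
    explicit StatusRegister(std::uint8_t initial = 0x24) : bits_(initial) {}

    bool test(Flag f) const { return bits_.test(f); }
    void set(Flag f, bool on = true) { bits_.set(f, on); }
    std::uint8_t value() const { return static_cast<std::uint8_t>(bits_.to_ulong()); }

private:
    std::bitset<Size> bits_;
};
```

With this, StatusRegister::Size, StatusRegister::Carry and ps.value() are all reached through the same type name, at the cost of forwarding the bitset operations that are actually needed.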

Passing string 'by value' change in local value reflect in original value

Why is the change of my local variable's value getting reflected into original variable? I am passing it by value in C++.
#include <string>
#include <iostream>
void test(std::string a)
{
char *buff = (char *)a.c_str();
buff[2] = 'x';
std::cout << "In function: " << a;
}
int main()
{
std::string s = "Hello World";
std::cout << "Before : "<< s << "\n" ;
test(s);
std::cout << "\n" << "After : " << s << std::endl;
return 0;
}
Output:
Before : Hello World
In function: Hexlo World
After : Hexlo World
As soon as you wrote
buff[2] = 'x';
and compiled your code all bets were off. Per [string.accessors]
const charT* c_str() const noexcept;
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
Complexity: constant time.
Requires: The program shall not alter any of the values stored in the character array.
emphasis mine
Since you are not allowed to modify the characters that the pointer points to but you do, you have undefined behavior. The compiler at this point is allowed to do pretty much whatever it wants. Trying to figure out why it did what it did is meaningless as any other compiler might not do this.
The moral of the story: do not cast const away unless you are really sure that you know what you are doing, and if you do need to, document the code to show that you know what you are doing.
Your std::string implementation uses reference counting and makes a deep copy only if you modify the string via its operator[] (or some other method). Casting the const char* return value of c_str() to char* will lead to undefined behavior.
I believe since C++11 std::string must not do reference counting anymore, so switching to C++11 might be enough to make your code work (Edit: I did not actually check that before, and it seems my assumption was wrong).
To be on the safe side, consider looking for a string implementation that guarantees deep copying (or implement one yourself).
#include <cstring>
#include <string>
#include <iostream>
void test(std::string a)
{
// modification through the valid std::string API
a[2] = 'x';
const char *buff = a.c_str(); // only const char* is available from the API
std::cout << "In function: " << a << " | Through pointer: " << buff;
// extraction to a writeable char[] buffer
char writeableBuff[100];
// unsafe, possible attack through buffer overflow, don't use in real code
strcpy(writeableBuff, a.c_str());
writeableBuff[3] = 'y';
std::cout << "\n" << "In writeable buffer: " << writeableBuff;
}
int main()
{
std::string s = "Hello World";
std::cout << "Before : "<< s << "\n" ;
test(s);
std::cout << "\n" << "After : " << s << std::endl;
return 0;
}
Output:
Before : Hello World
In function: Hexlo World | Through pointer: Hexlo World
In writeable buffer: Hexyo World
After : Hello World
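The fixed-size char[100] buffer above is flagged as unsafe. One way to get a writable copy without a size limit, sketched here with a hypothetical helper named writable_copy, is to copy the characters (including the terminating NUL) into a std::vector<char>:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns an independent, writable, NUL-terminated
// copy of the string's characters. The original string is never touched.
std::vector<char> writable_copy(const std::string& s) {
    // c_str() + size() points at the terminating NUL, so + 1 includes it.
    return std::vector<char>(s.c_str(), s.c_str() + s.size() + 1);
}
```

The vector owns its storage, so writing through buf.data() is well defined and cannot affect the std::string it was copied from.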

string move assignment exchange of values

I was programming some test cases and noticed an odd behaviour.
A move assignment to a string did not clear the source string; instead, the source ended up holding the target string's old value.
sample code:
#include <utility>
#include <string>
#include <iostream>
int main(void) {
std::string a = "foo";
std::string b = "bar";
std::cout << a << std::endl;
b = std::move(a);
std::cout << a << std::endl;
return 0;
}
result:
$ ./string.exe
foo
bar
expected result:
$ ./string.exe
foo
So to my questions:
Is that intentional?
Does this happen only with strings and/or STL objects?
Does this happen with custom objects (as in user defined)?
Environment:
Win10 64bit
msys2
g++ 5.2
EDIT
After reading the possible duplicate answer and the answer by @OMGtechy,
I extended the test to check for small string optimization.
#include <utility>
#include <string>
#include <iostream>
#include <cinttypes>
#include <sstream>
int main(void) {
std::ostringstream oss1;
oss1 << "foo ";
std::ostringstream oss2;
oss2 << "bar ";
for (std::uint64_t i(0);;++i) {
oss1 << i % 10;
oss2 << i % 10;
std::string a = oss1.str();
std::string b = oss2.str();
b = std::move(a);
if (a.size() < i) {
std::cout << "move operation origin was cleared at: " << i << std::endl;
break;
}
if (0 == i % 1000)
std::cout << i << std::endl;
}
return 0;
}
This ran on my machine up to 1 MB, which is not a small string anymore.
It only stopped because I interrupted it so that I could paste the source here.
This is likely due to short string optimization; i.e. there's no internal pointer to "move" over, so it ends up acting just like a copy.
I suggest you try this with a string containing a large number of characters; this should be enough to get around short string optimization and exhibit the behaviour you expected.
This is perfectly valid, because the C++ standard states that moved-from objects (with some exceptions; strings are not one of them as of C++11) shall be in a valid but unspecified state.
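Since the moved-from state is unspecified, code that needs the source to be empty afterwards has to say so explicitly. A minimal sketch, using a hypothetical helper named take:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: moves the contents out of `source` and guarantees
// that `source` is empty afterwards, whatever the move left behind in it.
std::string take(std::string& source) {
    std::string result = std::move(source);
    source.clear();   // clear() has no preconditions, so it is safe on a moved-from string
    return result;
}
```

Used as b = take(a); this gives the result the question expected regardless of how the implementation performs the move.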

How to print a bunch of integers with the same formatting?

I would like to print a bunch of integers on 2 fields with '0' as fill character. I can do it but it leads to code duplication. How should I change the code so that the code duplication can be factored out?
#include <ctime>
#include <sstream>
#include <iomanip>
#include <iostream>
using namespace std;
string timestamp() {
time_t now = time(0);
tm t = *localtime(&now);
ostringstream ss;
t.tm_mday = 9; // cheat a little to test it
t.tm_hour = 8;
ss << (t.tm_year+1900)
<< setw(2) << setfill('0') << (t.tm_mon+1) // Code duplication
<< setw(2) << setfill('0') << t.tm_mday
<< setw(2) << setfill('0') << t.tm_hour
<< setw(2) << setfill('0') << t.tm_min
<< setw(2) << setfill('0') << t.tm_sec;
return ss.str();
}
int main() {
cout << timestamp() << endl;
return 0;
}
I have tried
std::ostream& operator<<(std::ostream& s, int i) {
return s << std::setw(2) << std::setfill('0') << i;
}
but it did not work; the operator<< calls are ambiguous.
EDIT I got 4 awesome answers and I picked the one that is perhaps the simplest and the most generic one (that is, doesn't assume that we are dealing with timestamps). For the actual problem, I will probably use std::put_time or strftime though.
In C++20 you'll be able to do this with std::format in a less verbose way:
ss << std::format("{}{:02}{:02}{:02}{:02}{:02}",
t.tm_year + 1900, t.tm_mon + 1, t.tm_mday,
t.tm_hour, t.tm_min, t.tm_sec);
and it's even easier with the {fmt} library that supports tm formatting directly:
auto s = fmt::format("{:%Y%m%d%H%M%S}", t);
You need a proxy for your string stream like this:
struct stream{
std::ostringstream ss;
stream& operator<<(int i){
ss << std::setw(2) << std::setfill('0') << i;
return *this; // See Note below
}
};
Then your formatting code will just be this:
stream ss;
ss << (t.tm_year+1900)
<< (t.tm_mon+1)
<< t.tm_mday
<< t.tm_hour
<< t.tm_min
<< t.tm_sec;
return ss.ss.str();
ps. Note the general format of my stream::operator<<() which does its work first, then returns something.
The "obvious" solution is to use a manipulator to install a custom std::num_put<char> facet which just formats ints as desired.
The above statement may be a bit cryptic although it entirely describes the solution. Below is the code to actually implement the logic. The first ingredient is a special std::num_put<char> facet which is just a class derived from std::num_put<char> and overriding one of its virtual functions. The used facet is a filtering facet which looks at a flag stored with the stream (using iword()) to determine whether it should change the behavior or not. Here is the code:
class num_put
: public std::num_put<char>
{
std::locale loc_;
static int index() {
static int rc(std::ios_base::xalloc());
return rc;
}
friend std::ostream& twodigits(std::ostream&);
friend std::ostream& notwodigits(std::ostream&);
public:
num_put(std::locale loc): loc_(loc) {}
iter_type do_put(iter_type to, std::ios_base& fmt,
char fill, long value) const {
if (fmt.iword(index())) {
fmt.width(2);
return std::use_facet<std::num_put<char> >(this->loc_)
.put(to, fmt, '0', value);
}
else {
return std::use_facet<std::num_put<char> >(this->loc_)
.put(to, fmt, fill, value);
}
}
};
The main part is the do_put() member function which decides how the value needs to be formatted: If the flag in fmt.iword(index()) is non-zero, it sets the width to 2 and calls the formatting function with a fill character of 0. The width is going to be reset anyway and the fill character doesn't get stored with the stream, i.e., there is no need for any clean-up.
Normally, the code would probably live in a separate translation unit and it wouldn't be declared in a header. The only functions really declared in a header would be twodigits() and notwodigits() which are made friends in this case to provide access to the index() member function. The index() member function just allocates an index usable with std::ios_base::iword() when called the first time and it then just returns this index. The manipulators twodigits() and notwodigits() primarily set this index. If the num_put facet isn't installed for the stream twodigits() also installs the facet:
std::ostream& twodigits(std::ostream& out)
{
if (!dynamic_cast<num_put const*>(
&std::use_facet<std::num_put<char> >(out.getloc()))) {
out.imbue(std::locale(out.getloc(), new num_put(out.getloc())));
}
out.iword(num_put::index()) = true;
return out;
}
std::ostream& notwodigits(std::ostream& out)
{
out.iword(num_put::index()) = false;
return out;
}
The twodigits() manipulator allocates the num_put facet using new num_put(out.getloc()). It doesn't require any clean-up because installing a facet in a std::locale object does the necessary clean-up. The original std::locale of the stream is accessed using out.getloc(). It is changed by the facet. In theory the notwodigits could restore the original std::locale instead of using a flag. However, imbue() can be a relatively expensive operation and using a flag should be a lot cheaper. Of course, if there are lots of similar formatting flags, things may become different...
To demonstrate the use of the manipulators there is a simple test program below. It sets up the formatting flag twodigits twice to verify that the facet is only created once (it would be a bit silly to create a chain of std::locales to pass through the formatting):
int main()
{
std::cout << "some-int='" << 1 << "' "
<< twodigits << '\n'
<< "two-digits1='" << 1 << "' "
<< "two-digits2='" << 2 << "' "
<< "two-digits3='" << 3 << "' "
<< notwodigits << '\n'
<< "some-int='" << 1 << "' "
<< twodigits << '\n'
<< "two-digits4='" << 4 << "' "
<< '\n';
}
Besides formatting integers with std::setw / std::setfill or ios_base::width / basic_ios::fill, if you want to format a date/time object you may want to consider using std::put_time / std::get_time.
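For the timestamp in the question, a std::put_time version might look like the sketch below. The helper name compact_timestamp is hypothetical; the format string is the usual strftime-style one.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a broken-down time as YYYYMMDDhhmmss.
std::string compact_timestamp(const std::tm& t) {
    std::ostringstream ss;
    // put_time takes strftime-style conversion specifiers and zero-pads
    // the two-digit fields itself, so no setw/setfill is needed.
    ss << std::put_time(&t, "%Y%m%d%H%M%S");
    return ss.str();
}
```

It can be called with the result of localtime, for example compact_timestamp(*std::localtime(&now)).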
For convenient output formatting you may use boost::format() with sprintf-like formatting options:
#include <boost/format.hpp>
#include <iostream>
int main() {
int i1 = 1, i2 = 10, i3 = 100;
std::cout << boost::format("%03i %03i %03i\n") % i1 % i2 % i3;
// output is: 001 010 100
}
Little code duplication, additional implementation effort is marginal.
If all you want to do is output formatting of your timestamp, you should obviously use strftime(). That's what it's made for:
#include <ctime>
#include <iostream>
#include <string>
std::string timestamp() {
char buf[20];
const char fmt[] = "%Y%m%d%H%M%S";
time_t now = time(0);
strftime(buf, sizeof(buf), fmt, localtime(&now));
return buf;
}
int main() {
std::cout << timestamp() << std::endl;
}
operator<<(std::ostream& s, int i) is "ambiguous" because such a function already exists.
All you need to do is give that function a signature that doesn't conflict.
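One way to get a non-conflicting signature, sketched here under the assumption that a small wrapper type is acceptable, is to overload operator<< for that wrapper instead of for int. The names two_digits and two_digits_demo are hypothetical.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper type: marks an int that should be printed in a
// zero-filled field of width 2.
struct two_digits {
    int value;
};

// This overload cannot clash with the built-in operator<<(int), because
// its second parameter is a distinct type.
std::ostream& operator<<(std::ostream& os, two_digits d) {
    const char old_fill = os.fill('0');   // the fill character is sticky, so remember it
    os << std::setw(2) << d.value;        // the width resets itself after one insertion
    os.fill(old_fill);
    return os;
}

// Small demonstration with fixed values.
std::string two_digits_demo() {
    std::ostringstream ss;
    ss << 2024 << two_digits{3} << two_digits{9};
    return ss.str();
}
```

Plain ints keep their normal formatting, and only values explicitly wrapped in two_digits are padded.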

Restore the state of std::cout after manipulating it

Suppose I have a code like this:
void printHex(std::ostream& x){
x<<std::hex<<123;
}
..
int main(){
std::cout<<100; // prints 100 base 10
printHex(std::cout); //prints 123 in hex
std::cout<<73; //problem! prints 73 in hex..
}
My question is if there is any way to 'restore' the state of cout to its original one after returning from the function? (Somewhat like std::boolalpha and std::noboolalpha..) ?
Thanks.
You need to #include <iostream> or #include <ios>, then, where required:
std::ios_base::fmtflags f( cout.flags() );
//Your code here...
cout.flags( f );
You can put these at the beginning and end of your function, or check out this answer on how to use this with RAII.
Note that the answers presented here won't restore the full state of std::cout. For example, std::setfill will "stick" even after calling .flags(). A better solution is to use .copyfmt:
std::ios oldState(nullptr);
oldState.copyfmt(std::cout);
std::cout
<< std::hex
<< std::setw(8)
<< std::setfill('0')
<< 0xDECEA5ED
<< std::endl;
std::cout.copyfmt(oldState);
std::cout
<< std::setw(15)
<< std::left
<< "case closed"
<< std::endl;
Will print:
case closed
rather than:
case closed0000
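The copyfmt approach can also be wrapped in a scope guard so the restore happens on every return path. This is a minimal sketch; the class name StreamStateGuard and the demo function are hypothetical, and it assumes the guarded stream has no exception mask enabled (copyfmt copies the mask as well).

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical guard: saves the full formatting state with copyfmt on
// construction and restores it on destruction.
class StreamStateGuard {
public:
    explicit StreamStateGuard(std::ios& stream)
        : stream_(stream), saved_(nullptr) {
        saved_.copyfmt(stream_);    // flags, fill, precision, width, locale, ...
    }
    ~StreamStateGuard() { stream_.copyfmt(saved_); }

    StreamStateGuard(const StreamStateGuard&) = delete;
    StreamStateGuard& operator=(const StreamStateGuard&) = delete;

private:
    std::ios& stream_;
    std::ios saved_;   // holds only formatting state, no buffer
};

// Small demonstration on a string stream.
std::string stream_state_guard_demo() {
    std::ostringstream os;
    {
        StreamStateGuard guard(os);
        os << std::hex << std::setfill('0') << std::setw(8) << 255;
    }   // fill and base are restored here
    os << std::setw(5) << 7;
    return os.str();
}
```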
The Boost IO Stream State Saver seems exactly what you need. :-)
Example based on your code snippet:
void printHex(std::ostream& x) {
boost::io::ios_flags_saver ifs(x);
x << std::hex << 123;
}
I've created an RAII class using the example code from this answer. The big advantage to this technique comes if you have multiple return paths from a function that sets flags on an iostream. Whichever return path is used, the destructor will always be called and the flags will always get reset. There is no chance of forgetting to restore the flags when the function returns.
class IosFlagSaver {
public:
explicit IosFlagSaver(std::ostream& _ios):
ios(_ios),
f(_ios.flags()) {
}
~IosFlagSaver() {
ios.flags(f);
}
IosFlagSaver(const IosFlagSaver &rhs) = delete;
IosFlagSaver& operator= (const IosFlagSaver& rhs) = delete;
private:
std::ostream& ios;
std::ios::fmtflags f;
};
You would then use it by creating a local instance of IosFlagSaver whenever you wanted to save the current flag state. When this instance goes out of scope, the flag state will be restored.
void f(int i) {
IosFlagSaver iosfs(std::cout);
std::cout << i << " " << std::hex << i << " ";
if (i < 100) {
std::cout << std::endl;
return;
}
std::cout << std::oct << i << std::endl;
}
You can create another wrapper around the stdout buffer:
#include <iostream>
#include <iomanip>
int main() {
int x = 76;
std::ostream hexcout (std::cout.rdbuf());
hexcout << std::hex;
std::cout << x << "\n"; // still "76"
hexcout << x << "\n"; // "4c"
}
In a function:
void print(std::ostream& os) {
std::ostream copy (os.rdbuf());
copy << std::hex;
copy << 123;
}
Of course if performance is an issue this is a bit more expensive because it's copying the entire ios object (but not the buffer) including some stuff that you're paying for but unlikely to use such as the locale.
Otherwise I feel like if you're going to use .flags() it's better to be consistent and use .setf() as well rather than the << syntax (pure question of style).
void print(std::ostream& os) {
std::ios::fmtflags os_flags (os.flags());
os.setf(std::ios::hex, std::ios::basefield); // the mask clears dec first; setf(hex) alone would leave both bits set
os << 123;
os.flags(os_flags);
}
As others have said you can put the above (and .precision() and .fill(), but typically not the locale and words-related stuff that is usually not going to be modified and is heavier) in a class for convenience and to make it exception-safe; the constructor should accept std::ios&.
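A class along those lines could be sketched as follows. The name FormatGuard and the demo function are hypothetical; the guard saves only flags, precision and fill, and deliberately leaves the locale and iword/pword storage alone.

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical guard: restores flags, precision and fill character when it
// goes out of scope, including when the scope is left by an exception.
class FormatGuard {
public:
    explicit FormatGuard(std::ios& stream)
        : stream_(stream),
          flags_(stream.flags()),
          precision_(stream.precision()),
          fill_(stream.fill()) {}

    ~FormatGuard() {
        stream_.flags(flags_);
        stream_.precision(precision_);
        stream_.fill(fill_);
    }

    FormatGuard(const FormatGuard&) = delete;
    FormatGuard& operator=(const FormatGuard&) = delete;

private:
    std::ios& stream_;
    std::ios::fmtflags flags_;
    std::streamsize precision_;
    char fill_;
};

// Small demonstration on a string stream.
std::string format_guard_demo() {
    std::ostringstream os;
    {
        FormatGuard guard(os);
        os << std::hex << std::setfill('0') << std::setw(4) << 255;
    }   // hex and the '0' fill are undone here
    os << ' ' << 255;
    return os.str();
}
```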
C++20 std::format will be a superior alternative to saving and restoring stream state in most cases.
Once you can use it, you will be able to write hexadecimals simply as:
#include <format>
#include <iostream>
#include <string>
int main() {
std::cout << std::format("{:x} {:#x} {}\n", 16, 17, 18);
}
Expected output:
10 0x11 18
This will therefore completely overcome the madness of modifying std::cout state.
The existing fmt library provides the same functionality for compilers that do not yet support std::format: https://github.com/fmtlib/fmt Install on Ubuntu 22.04:
sudo apt install libfmt-dev
Modify source to replace:
<format> with <fmt/core.h>
std::format to fmt::format
main.cpp
#include <iostream>
#include <fmt/core.h>
int main() {
std::cout << fmt::format("{:x} {:#x} {}\n", 16, 17, 18);
}
and compile and run with:
g++ -std=c++11 -o main.out main.cpp -lfmt
./main.out
Output:
10 0x11 18
Related: std::string formatting like sprintf
With a little bit of modification to make the output more readable :
void printHex(std::ostream& x) {
std::ios::fmtflags f(x.flags());
x << std::hex << 123 << "\n";
x.flags(f);
}
int main() {
std::cout << 100 << "\n"; // prints 100 base 10
printHex(std::cout); // prints 123 in hex
std::cout << 73 << "\n"; // now prints 73 in decimal again
}
Instead of injecting format into cout, the << way, adopting setf and unsetf could be a cleaner solution.
void printHex(std::ostream& x){
x.setf(std::ios::hex, std::ios::basefield);
x << 123;
x.unsetf(std::ios::basefield);
}
Qualifying the flags with the std::ios_base class works fine too:
void printHex(std::ostream& x){
x.setf(std::ios_base::hex, std::ios_base::basefield);
x << 123;
x.unsetf(std::ios_base::basefield);
}
Reference: http://www.cplusplus.com/reference/ios/ios_base/setf/
I would like to generalize the answer from qbert220 somewhat:
#include <ios>
class IoStreamFlagsRestorer
{
public:
IoStreamFlagsRestorer(std::ios_base & ioStream)
: ioStream_(ioStream)
, flags_(ioStream_.flags())
{
}
~IoStreamFlagsRestorer()
{
ioStream_.flags(flags_);
}
private:
std::ios_base & ioStream_;
std::ios_base::fmtflags const flags_;
};
This should work for input streams and others as well.