I want to do something like this:
vector<string> road_map;
// do some stuff
for_each(road_map.begin(), road_map.end(), bind(cout, &ostream::operator<<));
Note: I don't want to use a lambda for this purpose, such as:
[](const string& str){ cout << str << endl; }
The C++ compiler will create auxiliary code for the lambda, which is why I don't want to use one here. I suppose there is a more lightweight solution to this problem. Of course, it is not critical: if there is no simple solution, I will just use a lambda.
This answer mainly investigates the claim that the C++ compiler will create auxiliary code for a lambda.
You should note that there is no member function ostream::operator<< taking a std::string. There's only a free-standing function operator<<(std::ostream&, const std::string&) declared in the <string> header.
I used GCC on godbolt; you can see the example live here.
So I made a version using a lambda and a version using std::bind:
With bind you get
void funcBind(std::vector<std::string>& a)
{
using namespace std;
using func_t = std::ostream&(*)(std::ostream&, const std::string&);
func_t fptr = &operator<<; //select the right overload
std::for_each(
a.begin(),
a.end(),
std::bind(
fptr,
std::ref(std::cout),
std::placeholders::_1)
);
}
and this assembly on x86_64
push rbp
push rbx
sub rsp, 8
mov rbp, QWORD PTR [rdi+8]
mov rbx, QWORD PTR [rdi]
cmp rbx, rbp
je .L22
.L24:
mov rdx, QWORD PTR [rbx+8]
mov rsi, QWORD PTR [rbx]
mov edi, OFFSET FLAT:std::cout
add rbx, 32
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
cmp rbp, rbx
jne .L24
.L22:
add rsp, 8
pop rbx
pop rbp
ret
and with a lambda you get:
void funcLambda(std::vector<std::string>& a)
{
std::for_each(
a.begin(),
a.end(),
[](const std::string& b){std::cout << b;});
}
and this assembly
push rbp
push rbx
sub rsp, 8
mov rbp, QWORD PTR [rdi+8]
mov rbx, QWORD PTR [rdi]
cmp rbx, rbp
je .L27
.L29:
mov rdx, QWORD PTR [rbx+8]
mov rsi, QWORD PTR [rbx]
mov edi, OFFSET FLAT:std::cout
add rbx, 32
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
cmp rbp, rbx
jne .L29
.L27:
add rsp, 8
pop rbx
pop rbp
ret
So you don't actually see any difference once any appreciable level of optimization is enabled.
You can use an ostream_iterator to generate the output.
#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
#include <iterator>
int main()
{
std::vector<std::string> road_map{"ab", "cde"};
// do some stuff
std::copy(road_map.begin(), road_map.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Related
When I compile this code using the {fmt} lib, the executable size becomes 255 KiB whereas by using only iostream header it becomes 65 KiB (using GCC v11.2).
time_measure.cpp
#include <iostream>
#include "core.h"
#include <string_view>
int main( )
{
// std::cout << std::string_view( "Oh hi!" );
fmt::print( "{}", std::string_view( "Oh hi!" ) );
return 0;
}
Here is my build command:
g++ -std=c++20 -Wall -O3 -DNDEBUG time_measure.cpp -I include format.cc -o runtime_measure.exe
Isn't the {fmt} library supposed to be lightweight compared to iostream? Or maybe I'm doing something wrong?
Edit: By adding -s to the command in order to remove all symbol table and relocation information from the executable, it becomes 156 KiB. But still ~2.5X more than the iostream version.
As with any other library, there is a fixed cost and a per-call cost. The fixed cost of the {fmt} library is indeed around 100-150k without debug info (it depends on the compiler flags). In your example you are comparing this fixed cost of linking with the library. iostreams appears smaller because it is part of the standard library itself, which is linked dynamically and therefore not counted toward the binary size of the executable.
Note that a large part of this size comes from floating-point formatting functionality which doesn't even exist in iostreams (shortest round-trip representation).
If you want to compare the per-call binary size, which is more important for real-world code with a large number of formatting function calls, you can look at object files or the generated assembly. For example:
#include <fmt/core.h>
int main() {
fmt::print("Oh hi!");
}
generates (https://godbolt.org/z/qWTKEMqoG)
.LC0:
.string "Oh hi!"
main:
sub rsp, 24
pxor xmm0, xmm0
xor edx, edx
mov edi, OFFSET FLAT:.LC0
mov rcx, rsp
mov esi, 6
movaps XMMWORD PTR [rsp], xmm0
call fmt::v8::vprint(fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)
xor eax, eax
add rsp, 24
ret
while
#include <iostream>
int main() {
std::cout << "Oh hi!";
}
generates (https://godbolt.org/z/frarWvzhP)
.LC0:
.string "Oh hi!"
main:
sub rsp, 8
mov edx, 6
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
xor eax, eax
add rsp, 8
ret
_GLOBAL__sub_I_main:
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
Other than static initialization for cout there is not much difference because there is virtually no formatting here, so it's just one function call in both cases. Once you add formatting you'll quickly see the benefits of {fmt}, see e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0645r10.html#BinaryCode.
You forget that iostreams is already included in libstdc++.so, which is not counted towards the binary size since it is (usually) a shared library. I believe fmt is built as a static library by default, so it increases the binary size. You need to compile fmt as a shared library with -DBUILD_SHARED_LIBS=TRUE, as explained in the building instructions.
Why don't you use the -Os option (optimize for size) in your build/link command?
I'm dealing with a class that defines a friend function inside the class, without a declaration outside it:
namespace our_namespace {
template <typename T>
struct our_container {
friend our_container set_union(our_container const &, our_container const &) {
// meaningless for the example here, just a valid definition
// no valid semantics
return our_container{};
}
};
} // namespace our_namespace
As discussed (e.g. here or here) the function set_union is not in the our_namespace namespace but will be found by argument dependent lookup:
auto foo(std::vector<our_namespace::our_container<float>> in) {
// works:
return set_union(in[0], in[1]);
}
I noticed, however, that in the debug output set_union appears to be in the our_namespace namespace:
mov rdi, qword ptr [rbp - 40] # 8-byte Reload
mov rsi, rax
call our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&)
add rsp, 48
pop rbp
ret
our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&): # #our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&)
push rbp
mov rbp, rsp
mov qword ptr [rbp - 16], rdi
mov qword ptr [rbp - 24], rsi
pop rbp
ret
although I can't call it as our_namespace::set_union:
auto foo(std::vector<our_namespace::our_container<float>> in) {
// fails:
return our_namespace::set_union(in[0], in[1]);
}
Any hints about how the debug information is to be understood?
EDIT: The set_union function body is only a placeholder here, to have a valid definition.
The C++ standard only defines compiler behavior with regard to compiling the code and the behavior of the resulting program. It doesn't define every aspect of code generation, and in particular it doesn't define debug symbols.
So your compiler correctly (per the Standard) rejects the qualified call: a friend function defined inside a class becomes a member of the enclosing namespace, but its name is not visible to ordinary or qualified lookup until a matching declaration appears at namespace scope, so only argument-dependent lookup finds it. Since the function does exist and you should be able to debug it, its symbol has to be placed somewhere, and the enclosing namespace is the reasonable choice.
I have C library with such API:
#ifdef __cplusplus
extern "C" {
#endif
struct Foo {
void *p;
int len;
};
struct Foo f(void *opaque, int param);
void foo_free(struct Foo *);
#ifdef __cplusplus
}
#endif
To simplify my C++ life, I decided to do a simple thing:
struct Foo {
void *p;
int len;
#ifdef __cplusplus
~Foo() { foo_free(this); }
#endif
};
After that, things go crazy. For example, if I call f(0xfffeeea0, 40) in C++, the C side receives 0x7fff905d2050 and -69984.
Assembly without the destructor:
0x000055555555467a <+0>: push %rbp
0x000055555555467b <+1>: mov %rsp,%rbp
0x000055555555467e <+4>: sub $0x10,%rsp
0x0000555555554682 <+8>: mov $0x28,%esi
0x0000555555554687 <+13>: mov $0xfffeeea0,%edi
0x000055555555468c <+18>: callq 0x5555555546a0 <f>
0x0000555555554691 <+23>: mov %rax,-0x10(%rbp)
0x0000555555554695 <+27>: mov %rdx,-0x8(%rbp)
0x0000555555554699 <+31>: mov $0x0,%eax
0x000055555555469e <+36>: leaveq
0x000055555555469f <+37>: retq
Assembly with the destructor:
0x00000000000006da <+0>: push %rbp
0x00000000000006db <+1>: mov %rsp,%rbp
0x00000000000006de <+4>: sub $0x20,%rsp
0x00000000000006e2 <+8>: mov %fs:0x28,%rax
0x00000000000006eb <+17>: mov %rax,-0x8(%rbp)
0x00000000000006ef <+21>: xor %eax,%eax
0x00000000000006f1 <+23>: lea -0x20(%rbp),%rax
0x00000000000006f5 <+27>: mov $0x28,%edx
0x00000000000006fa <+32>: mov $0xfffeeea0,%esi
0x00000000000006ff <+37>: mov %rax,%rdi
0x0000000000000702 <+40>: callq 0x739 <f>
0x0000000000000707 <+45>: lea -0x20(%rbp),%rax
0x000000000000070b <+49>: mov %rax,%rdi
0x000000000000070e <+52>: callq 0x72e <Foo::~Foo()>
0x0000000000000713 <+57>: mov $0x0,%eax
0x0000000000000718 <+62>: mov -0x8(%rbp),%rcx
0x000000000000071c <+66>: xor %fs:0x28,%rcx
0x0000000000000725 <+75>: je 0x72c <main()+82>
0x0000000000000727 <+77>: callq 0x5c0 <__stack_chk_fail@plt>
0x000000000000072c <+82>: leaveq
0x000000000000072d <+83>: retq
I wonder what is going on.
I can understand why the compiler has to handle the return value differently, but why does it pass the arguments in different registers (%esi vs. %edi)?
To be clear, I understand that I'm doing the wrong thing, and I'll rewrite the code with some kind of smart pointer instead, without touching the real Foo.
But I wonder how the C++ and C ABIs interact in this particular case.
Full example:
//test.cpp
extern "C" {
struct Foo {
void *p;
int len;
~Foo() {/*call free*/}
};
struct Foo f(void *opaque, int param);
}
int main()
{
auto foo = f(reinterpret_cast<void *>(0xfffeeea0), 40);
}
//test.c
#include <stdio.h>
struct Foo {
void *p;
int len;
};
struct Foo f(void *opaque, int param)
{
printf("!!! %p %d\n", opaque, param);
struct Foo ret = {0, 0};
return ret;
}
#makefile:
prog: test.cpp test.c
gcc -Wall -ggdb -std=c11 -c -o test.c.o test.c
g++ -Wall -ggdb -std=c++11 -o $@ test.cpp test.c.o
./prog
In the first version of your code (no destructor), we have:
// allocate 16 bytes on the stack (for a Foo instance)
sub $0x10,%rsp
// load two (constant) arguments into %edi and %esi
mov $0x28,%esi
mov $0xfffeeea0,%edi
// call f
callq 0x5555555546a0 <f>
// a 2-word struct was returned by value (in %rax/%rdx).
// move the values to the corresponding slots on the stack
mov %rax,-0x10(%rbp)
mov %rdx,-0x8(%rbp)
In the second version (with a destructor):
// load address of Foo instance into %rax
lea -0x20(%rbp),%rax
// load three arguments:
// - 40 in %edx
// - 0xfffeeea0 in %esi
// - &foo in %rdi
mov $0x28,%edx
mov $0xfffeeea0,%esi
mov %rax,%rdi
// ... and call f
callq 0x739 <f>
// ignore f's return value; load &foo into %rax again
lea -0x20(%rbp),%rax
// call ~Foo on &foo
mov %rax,%rdi
callq 0x72e <Foo::~Foo()>
My guess is that without a destructor the struct is treated like a plain 2-word tuple and returned by value.
But with a destructor the compiler assumes it can't just copy the member values around, so it transforms the struct return value into a hidden pointer argument:
struct Foo f(void *opaque, int param);
// actually implemented as:
void f(struct Foo *_hidden, void *opaque, int param);
Normally f would then take care of writing the return value into *_hidden.
Because the caller and the implementer of the function see a different return type, they disagree about the number of parameters the function actually has. The C++ code passes 3 arguments, but the C code only looks at two of them. It misinterprets the address of the Foo instance as the opaque pointer, and what was supposed to be the opaque pointer ends up in param.
In other words, the user-provided destructor means Foo is no longer a POD (trivially copyable) type, which inhibits simple return-by-value through registers.
I just have a quick question about whether this will work or not.
void __declspec(naked) HookProcessEventProxy() {
__asm {
mov CallObjectPointer, ecx
push edx
mov edx, dword ptr[esp + 0x8]
mov UFunctionPointer, edx
mov edx, dword ptr[esp + 0xC]
mov ParamsPointer, edx
pop edx
pushfd
pushad
}
ProcessEventProxy();
__asm {
popad
popfd
jmp[Pointers::OldProcessEvent] // This is the line in question.
}
}
Does the Pointers namespace qualifier make the jump go to Pointers::OldProcessEvent, or will it go to the ProcessEvent I have inside my DLLMain?
HookProcessEventProxy is inside my DLLMain.
From the vendor-specific extensions in the code, it seems that you are compiling this on MSVC. If so, then this is not a problem. The inline assembler understands C++ scoping rules and identifiers.
You can easily verify this for yourself by analyzing the object code produced by the compiler. Either disassemble the binary using dumpbin /disasm, or throw the /FA switch when running the compiler to get a separate listing. What you'll see is that the compiler emits your inline assembly in a very literal fashion:
?HookProcessEventProxy@@YAXXZ PROC ; HookProcessEventProxy, COMDAT
mov DWORD PTR ?CallObjectPointer@@3HA, ecx ; CallObjectPointer
push edx
mov edx, DWORD PTR [esp+8]
mov DWORD PTR ?UFunctionPointer@@3HA, edx ; UFunctionPointer
mov edx, DWORD PTR [esp+12]
mov DWORD PTR ?ParamsPointer@@3HA, edx ; ParamsPointer
pop edx
pushfd
pushad
call ?ProcessEventProxy@@YAXXZ ; ProcessEventProxy
popad
popfd
jmp ?OldProcessEvent@Pointers@@YAXXZ ; Pointers::OldProcessEvent
?HookProcessEventProxy@@YAXXZ ENDP ; HookProcessEventProxy
The above listing is from the file generated by the compiler when the /FA switch is used. The comments out to the right indicate the corresponding C++ object.
Note that you do not need the brackets around the branch target. Although the inline assembler ignores them, it is confusing to include them. Just write:
jmp Pointers::OldProcessEvent
I've been writing a simple C++ program that uses assembly to compute the GCD of two numbers and output it, based on an example from a tutorial I watched. I understand what it's doing, but I don't understand why it won't work.
EDIT: I should add that when it runs, it doesn't output anything at all.
#include <iostream>
using namespace std;
int gcd(int a, int b)
{
int result;
_asm
{
push ebp
mov ebp, esp
mov eax, a
mov ebx, b
looptop:
cmp eax, 0
je goback
cmp eax, ebx
jge modulo
xchg eax, ebx
modulo:
idiv ebx
mov eax, edx
jmp looptop
goback:
mov eax, ebx
mov esp, ebp
pop ebp
mov result, edx
}
return result;
}
int main()
{
cout << gcd(46,90) << endl;
return 0;
}
I'm running it on a 32-bit Windows system; any help would be appreciated. When compiling, I get 4 warnings:
warning C4731: 'gcd' : frame pointer register 'ebp' modified by inline assembly code
warning C4731: 'gcd' : frame pointer register 'ebp' modified by inline assembly code
warning C4731: 'main' : frame pointer register 'ebp' modified by inline assembly code
warning C4731: 'main' : frame pointer register 'ebp' modified by inline assembly code
The compiler will insert these or equivalent instructions for you at the beginning and end of the function:
push ebp
mov ebp, esp
...
mov esp, ebp
pop ebp
If you add them manually, ebp no longer holds the value the compiler expects, so the function's parameters and locals can't be accessed through it, which is why the compiler is issuing warnings.
Remove these 4 instructions.
Also, start using the debugger. Today.