How to compile and execute from memory directly? - c++

Is it possible to compile a C++ (or the like) program without generating the executable file but writing it and executing it directly from memory?
For example with GCC and clang, something that has a similar effect to:
c++ hello.cpp -o hello.x && ./hello.x $# && rm -f hello.x
on the command line, but without the burden of writing an executable to disk just to immediately load and rerun it.
(If possible, the procedure should not use disk space, or at least no space in the current directory, which might be read-only.)

Possible? Not the way you seem to wish. The task has two parts:
1) How to get the binary into memory
When we specify /dev/stdout as the output file on Linux, we can pipe the compiler's output into a program x0 that reads an executable from stdin and executes it:
gcc -pipe YourFiles1.cpp YourFile2.cpp -o/dev/stdout -Wall | ./x0
In x0 we can just read from stdin until reaching the end of the file:
#include <stdlib.h> /* realloc */
#include <unistd.h> /* read */

int memexec(void *exe, size_t exe_size, char *const argv[]); /* defined in part 2 below */

int main(int argc, char *const argv[])
{
    const int fd_stdin = 0;
    size_t ntotal = 0;
    char *buf = NULL;
    while (true)
    {
        /* grow the buffer dynamically since we do not know how many bytes to read */
        buf = (char*)realloc(buf, ntotal + 4096);
        ssize_t nread = read(fd_stdin, buf + ntotal, 4096);
        if (nread <= 0) break; /* 0 means end of file, negative means error */
        ntotal += nread;
    }
    return memexec(buf, ntotal, argv);
}
It would also be possible for x0 to execute the compiler directly and read its output; that part has been answered here: Redirecting exec output to a buffer or file. A sketch of that variant follows.
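A hedged sketch of that variant (the compiler command mirrors the one above; memexec() is the function defined in the next section of this answer):

#include <stdio.h>  /* popen, fread, pclose */
#include <stdlib.h> /* realloc */

int memexec(void *exe, size_t exe_size, char *const argv[]); /* defined below */

int main(int argc, char *const argv[])
{
    /* run the compiler ourselves and read the ELF image from a pipe */
    FILE *p = popen("gcc -pipe YourFiles1.cpp YourFile2.cpp -o/dev/stdout -Wall", "r");
    size_t ntotal = 0, nread;
    char *buf = NULL;
    do {
        buf = (char*)realloc(buf, ntotal + 4096);
        nread = fread(buf + ntotal, 1, 4096, p);
        ntotal += nread;
    } while (nread > 0);
    pclose(p);
    return memexec(buf, ntotal, argv);
}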
Caveat: I just figured out that for some strange reason this does not work when I use a pipe |, but does work when I use redirection: ./x0 < foo.
Note: If you are willing to modify your compiler, or if you do JIT like LLVM, clang and other frameworks do, you could directly generate executable code. However, for the rest of this discussion I assume you want to use an existing compiler.
Note: Execution via temporary file
Other programs such as UPX achieve a similar behavior by executing a temporary file. This is easier and more portable than the approach outlined below. On systems where /tmp is mapped to a RAM disk, for example on typical servers, the temporary file will be memory-based anyway.
#define _GNU_SOURCE   /* mkostemp, fexecve */
#include <stddef.h>   /* size_t */
#include <fcntl.h>    /* open, O_RDONLY */
#include <stdio.h>    /* perror */
#include <stdlib.h>   /* mkostemp */
#include <sys/stat.h> /* chmod, S_IRUSR, S_IXUSR */
#include <unistd.h>   /* write, close, unlink, fexecve */

int memexec(void *exe, size_t exe_size, char *const argv[])
{
    /* random temporary file name in /tmp */
    char name[15] = "/tmp/fooXXXXXX";
    /* creates the temporary file and returns a writable file descriptor;
       the flags argument only accepts O_APPEND, O_CLOEXEC and O_SYNC */
    int fd_wr = mkostemp(name, 0);
    /* makes the file executable and read-only */
    chmod(name, S_IRUSR | S_IXUSR);
    /* creates a read-only file descriptor before deleting the file */
    int fd_ro = open(name, O_RDONLY);
    /* removes the file from the file system; the kernel keeps the content
       in memory until all file descriptors are closed */
    unlink(name);
    /* writes the executable into the file */
    write(fd_wr, exe, exe_size);
    /* fexecve will not work as long as there is an open writable file descriptor */
    close(fd_wr);
    char *const newenviron[] = { NULL };
    /* replaces the current process image; returns only on error */
    fexecve(fd_ro, argv, newenviron);
    perror("memexec failed");
    return -1;
}
Caveat: Error handling is left out for clarity's sake, and the includes are kept to the minimum needed.
Note: By combining main() and memexec() into a single function and using splice(2) to copy directly between stdin and fd_wr, the program could be significantly optimized.
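A rough sketch of that combined variant (not from the original answer: it additionally assumes Linux 3.17+ / glibc 2.27+ so that memfd_create(2) can stand in for the unlinked /tmp file, and it assumes stdin is a pipe, which splice(2) requires on one side):

#define _GNU_SOURCE
#include <fcntl.h>    /* splice, SPLICE_F_MORE */
#include <stdio.h>    /* perror */
#include <sys/mman.h> /* memfd_create, MFD_CLOEXEC */
#include <unistd.h>   /* fexecve */

int main(int argc, char *const argv[])
{
    /* anonymous in-memory file: no file system is involved at all */
    int fd = memfd_create("exe", MFD_CLOEXEC);
    /* move the data from the stdin pipe straight into fd, no userspace buffer */
    while (splice(0, NULL, fd, NULL, 65536, SPLICE_F_MORE) > 0)
        ;
    char *const newenviron[] = { NULL };
    fexecve(fd, argv, newenviron);
    perror("fexecve failed");
    return 1;
}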
2) Execution directly from memory
One does not simply load and execute an ELF binary from memory. Some preparation, mostly related to dynamic linking, has to happen. There is a lot of material explaining the various steps of the ELF linking process, and studying it makes me believe that it is theoretically possible. See for example this closely related question on SO; however, there seems to be no working solution.
Update UserModeExec seems to come very close.
Writing a working implementation would be very time consuming, and would surely raise some interesting questions in its own right. I like to believe this is by design: for most applications it is strongly undesirable to (accidentally) execute their input data, because that allows code injection.
What happens exactly when an ELF is executed? Normally the kernel receives a file name and then creates a process, loads and maps the different sections of the executable into memory, performs a lot of sanity checks and marks it as executable before passing control and a file name back to the run-time linker ld-linux.so (part of libc). That takes care of relocating functions, handling additional libraries, setting up global objects and jumping to the executable's entry point. As I understand it, this heavy lifting is done by dl_main() (implemented in libc/elf/rtld.c).
Even fexecve is implemented using a file in /proc, and it is this need for a file name that would lead us to reimplement parts of this linking process.
Libraries
UserModeExec
libelf -- read, modify, create ELF files
eresi -- play with ELFs
OSKit (seems like a dead project though)
Reading
http://www.linuxjournal.com/article/1060?page=0,0 -- introduction
http://wiki.osdev.org/ELF -- good overview
http://s.eresi-project.org/inc/articles/elf-rtld.txt -- more detailed Linux-specific explanation
http://www.codeproject.com/Articles/33340/Code-Injection-into-Running-Linux-Application -- how to get to hello world
http://www.acsu.buffalo.edu/~charngda/elf.html -- nice reference of ELF structure
Loaders and Linkers by John Levine -- deeper explanation of linking
Related Questions at SO
Linux user-space ELF loader
ELF Dynamic loader symbol lookup ordering
load-time ELF relocation
How do global variables get initialized by the elf loader
So it seems possible; you decide whether it is also practical.

Yes, though doing it properly requires designing significant parts of the compiler with this in mind. The LLVM guys have done this, first with a kinda-separate JIT, and later with the MC subproject. I don't think there's a ready-made tool doing it. But in principle, it's just a matter of linking to clang and llvm, passing the source to clang, and passing the IR it creates to MCJIT. Maybe a demo does this (I vaguely recall a basic C interpreter that worked like this, though I think it was based on the legacy JIT).
Edit: Found the demo I recalled. Also, there's cling, which seems to do basically what I described, but better.

Linux can create virtual file systems in RAM using tmpfs. For example, I have my tmp directory set up in my file system table like so:
tmpfs /tmp tmpfs nodev,nosuid 0 0
Using this, any files I put in /tmp are stored in my RAM.
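If you don't want to edit /etc/fstab, a tmpfs can also be mounted ad hoc (the mount point and size here are arbitrary examples):
sudo mount -t tmpfs -o size=64M tmpfs /mnt/ramdisk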
Windows doesn't seem to have any "official" way of doing this, but has many third-party options.
Without this "RAM disk" concept, you would likely have to heavily modify a compiler and linker to operate completely in memory.

If you are not specifically tied to C++, you may also consider other JIT based solutions:
in Common Lisp SBCL is able to generate machine code on the fly
you could use TinyCC and its libtcc.a, which quickly emits poor (i.e. unoptimized) machine code from C code in memory (see the libtcc sketch after this list).
consider also any JITing library, e.g. libjit, GNU Lightning, LLVM, GCCJIT, asmjit
of course emitting C++ code on some tmpfs and compiling it...
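As a sketch of the TinyCC option (the API is taken from tcc's libtcc.h; link the host program with -ltcc):

#include <stdlib.h>
#include <libtcc.h>

const char *src =
    "#include <tcclib.h>\n"
    "int main(void) { printf(\"hello from memory\\n\"); return 0; }\n";

int main(void)
{
    TCCState *s = tcc_new();
    tcc_set_output_type(s, TCC_OUTPUT_MEMORY); /* compile to RAM, no output file */
    if (tcc_compile_string(s, src) == -1)
        return 1;
    int rc = tcc_run(s, 0, NULL);              /* relocates the code and calls its main() */
    tcc_delete(s);
    return rc;
}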
But if you want good machine code, you'll need it to be optimized, and that is not fast (so the time to write to a filesystem is negligible).
If you are tied to C++ generated code, you need a good optimizing C++ compiler (e.g. g++ or clang++); they take significant time to compile C++ code to an optimized binary. So you should generate the code to some file foo.cc (perhaps in a RAM file system like tmpfs, but that gives only a minor gain, since most of the time is spent inside g++ or clang++ optimization passes, not reading from disk), then compile that foo.cc to foo.so (using perhaps make, or at least forking g++ -Wall -shared -O2 foo.cc -o foo.so, perhaps with additional libraries). At last, have your main program dlopen that generated foo.so. FWIW, MELT was doing exactly that, and on a Linux workstation the manydl.c program shows that a process can generate then dlopen(3) many hundreds of thousands of temporary plugins, each one obtained by generating a temporary C file and compiling it. For C++ read the C++ dlopen mini HOWTO.
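A minimal sketch of that generate/compile/dlopen cycle (the file names and the exported function answer() are made up for illustration; link the host program with -ldl):

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void)
{
    /* 1. emit the source; a tmpfs-backed /tmp gives the minor extra gain mentioned above */
    FILE *src = fopen("/tmp/foo.cc", "w");
    fprintf(src, "extern \"C\" int answer(void) { return 42; }\n");
    fclose(src);
    /* 2. fork the compiler to build a shared object */
    if (system("g++ -Wall -shared -fPIC -O2 /tmp/foo.cc -o /tmp/foo.so") != 0)
        return 1;
    /* 3. dlopen the plugin and call the generated function */
    void *h = dlopen("/tmp/foo.so", RTLD_NOW);
    if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    int (*answer)(void) = (int (*)(void))dlsym(h, "answer");
    printf("%d\n", answer()); /* prints 42 */
    dlclose(h);
    return 0;
}

The extern "C" in the generated code avoids C++ name mangling, so dlsym can find the symbol by its plain name.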
Alternatively, generate a self-contained source program foobar.cc, compile it to an executable foobarbin, e.g. with g++ -O2 foobar.cc -o foobarbin, and execute that foobarbin binary with execve.
When generating C++ code, you may want to avoid generating tiny C++ source files (e.g. a dozen lines only); if possible, generate C++ files of a few hundred lines at least, unless lots of template expansion happens through extensive use of existing C++ containers, in which case generating a small C++ function combining them makes sense. For instance, try if possible to put several generated C++ functions in the same generated C++ file (but avoid very big generated C++ functions, e.g. 10 KLOC in a single function; they take a lot of time to be compiled by GCC). You could consider, if relevant, having only one single #include in that generated C++ file, and pre-compiling that commonly included header.
Jacques Pitrat's book Artificial Beings: The Conscience of a Conscious Machine (ISBN 9781848211018) explains in detail why generating code at runtime is useful (in symbolic artificial intelligence systems like his CAIA system). The RefPerSys project is trying to follow that idea and generate some C++ code (and hopefully, more and more of it) at runtime. Partial evaluation is a relevant concept here.
Your software is likely to spend more CPU time in generating C++ code than GCC in compiling it.

The tcc compiler's "-run" option allows for exactly this: compile into memory, run there, and finally discard the compiled stuff. No filesystem space is needed. "tcc -run" can be used in a shebang to allow for a C script; from the tcc man page:
#!/usr/local/bin/tcc -run
#include <stdio.h>
int main()
{
    printf("Hello World\n");
    return 0;
}
C scripts allow for mixed bash/C scripts, with "tcc -run" not needing any temporary space:
#!/bin/bash
echo "foo"
sed -n "/^\/\*\*$/,\$p" $0 | tcc -run -
exit
/**
*/
#include <stdio.h>
int main()
{
    printf("bar\n");
    return 0;
}
Execution output:
$ ./shtcc2
foo
bar
$
C scripts with gcc are possible as well, but they need temporary space, as others mentioned, to store the executable. This script produces the same output as the previous one:
#!/bin/bash
exc=/tmp/`basename $0`
if [ $0 -nt $exc ]; then sed -n "/^\/\*\*$/,\$p" $0 | gcc -x c - -o $exc; fi
echo "foo"
$exc
exit
/**
*/
#include <stdio.h>
int main()
{
    printf("bar\n");
    return 0;
}
C scripts with suffix ".c" are nice, headtail.c was my first ".c" file that needed to be executable:
$ echo -e "1\n2\n3\n4\n5\n6\n7" | ./headtail.c
1
2
3
6
7
$
I like C scripts because you have just one file that you can easily move around, and changes in the bash or C part require no further action; they just work on the next execution.
P.S:
The above "tcc -run" C script has a problem: the C script's stdin is not available to the executed C code, because I passed the extracted C code via pipe to "tcc -run". The new gist run_from_memory_stdin.c does it correctly:
...
echo "foo"
tcc -run <(sed -n "/^\/\*\*$/,\$p" $0) 42
...
"foo" is printed by the bash part, "bar 42" by the C part (42 is passed as argv[1]), and then the piped script input gets printed from the C code:
$ route -n | ./run_from_memory_stdin.c
foo
bar 42
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.29.58.98 0.0.0.0 UG 306 0 0 wlan1
10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 wlan0
169.254.0.0 0.0.0.0 255.255.0.0 U 303 0 0 wlan0
172.29.58.96 0.0.0.0 255.255.255.252 U 306 0 0 wlan1
$

One can easily modify the compiler itself. It sounds hard at first but, thinking about it, it seems obvious. Modifying the compiler sources to directly expose a library and make it a shared library should not take that much effort (depending on the actual implementation).
Just replace every file access with a memory-mapped file solution.
It is something I am about to do myself: compiling something transparently in the background to opcodes and executing those from within Java.
-
But thinking about your original question, it seems you want to speed up compilation and your edit-and-run cycle. First of all, get an SSD; you get almost memory speed (use a PCIe version). And let's say it's C we are talking about: C's linking step involves very complex operations that are likely to take more time than reading from and writing to disk. So just put everything on the SSD and live with the lag.

Finally, the answer to the OP's question is yes!
I found the memrun repo from guitmz, which demoed running an (x86_64) ELF from memory, with Golang and assembler. I forked that and provided a C version of memrun, which runs ELF binaries (verified on x86_64 and armv7l), either from standard input or via first-argument process substitution. The repo contains demos and documentation (memrun.c is only 47 lines of code):
https://github.com/Hermann-SW/memrun/tree/master/C#memrun
Here is the simplest example: with "-o /dev/fd/1" the gcc-compiled ELF gets sent to stdout and piped to memrun, which executes it:
pi@raspberrypi400:~/memrun/C $ gcc info.c -o /dev/fd/1 | ./memrun
My process ID : 20043
argv[0] : ./memrun
no argv[1]
evecve --> /usr/bin/ls -l /proc/20043/fd
total 0
lr-x------ 1 pi pi 64 Sep 18 22:27 0 -> 'pipe:[1601148]'
lrwx------ 1 pi pi 64 Sep 18 22:27 1 -> /dev/pts/4
lrwx------ 1 pi pi 64 Sep 18 22:27 2 -> /dev/pts/4
lr-x------ 1 pi pi 64 Sep 18 22:27 3 -> /proc/20043/fd
pi@raspberrypi400:~/memrun/C $
The reason I was interested in this topic was its use in "C scripts". run_from_memory_stdin.c demonstrates it all together:
pi@raspberrypi400:~/memrun/C $ wc memrun.c | ./run_from_memory_stdin.c
foo
bar 42
47 141 1005 memrun.c
pi@raspberrypi400:~/memrun/C $
The C script producing the shown output is small:
#!/bin/bash
echo "foo"
./memrun <(gcc -o /dev/fd/1 -x c <(sed -n "/^\/\*\*$/,\$p" $0)) 42
exit
/**
*/
#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("bar %s\n", argc>1 ? argv[1] : "(undef)");
    for(int c=getchar(); EOF!=c; c=getchar()) { putchar(c); }
    return 0;
}
P.S:
I added tcc's "-run" option to gcc and g++, for details see:
https://github.com/Hermann-SW/memrun/tree/master/C#adding-tcc--run-option-to-gcc-and-g
Just nice, and nothing gets stored in filesystem:
pi@raspberrypi400:~/memrun/C $ uname -a | g++ -O3 -Wall -run demo.cpp 42
bar 42
Linux raspberrypi400 5.10.60-v7l+ #1449 SMP Wed Aug 25 15:00:44 BST 2021 armv7l GNU/Linux
pi@raspberrypi400:~/memrun/C $

Related

Is there any way to use data.table's fread C implementation in C++?

My ultimate goal is to use a fast csv parser in C++. I have looked at the following libraries:
https://github.com/ben-strasser/fast-cpp-csv-parser
https://github.com/vincentlaucsb/csv-parser#integration
https://github.com/p-ranav/csv2
I have also come across numerous Stack Overflow questions regarding CSV parsing, such as:
Fastest way to get data from a CSV in C++
Parse very large CSV files with C++
Reason behind speed of fread in data.table package in R
My understanding is that the fastest way to parse a CSV is to use C (obviously), memory mapping, and multi-threading.
I've tried many of the solutions above, with csv2 coming out the fastest (https://github.com/p-ranav/csv2)
But none of these are even close to data.table's fread. I have tried looking through its source code (https://github.com/Rdatatable/data.table) to try to extract the fread implementation in C, but I am struggling to incorporate it into my C++ code.
I believe the relevant files are:
dt_stdio.h, fread.c, fread.h, and myomp.h
I was wondering if there was an easy way to compile the existing data.table solution into my C++ codebase.
I think my best solution so far is using csv2 (https://github.com/p-ranav/csv2). This gives very fast memory-mapping time, but I am struggling to parse it quickly enough. Even if I just loop through the rows as in their documentation, my time goes to 2 seconds:
csv2::Reader<csv2::delimiter<','>,
             csv2::quote_character<'"'>,
             csv2::first_row_is_header<true>,
             csv2::trim_policy::trim_whitespace> csv;

if (csv.mmap(file_name)) {
    const auto header = csv.header();
    for (const auto &row : csv) {
        // if I only loop through rows --> 2 seconds
        for (const auto &cell : row) {
            // if I run both loops, which is probably necessary for parsing --> 17 seconds
            // Do something with cell value
            // std::string value;
            // cell.read_value(value);
        }
    }
}
EDIT:
I am using G++ 11.2.0 on Windows. My g++ -O option flag was previously set to 0; changing it to -O3 improved performance (@Alan Birtles). Even after changing compiler optimization settings, I get the following results pre-parsing:
Method                            Time to Read w/o Parsing    Time to Read + Parse
data.table                        Not Applicable              2 seconds
csv2_reader                       .003 seconds                17 seconds
csv2_reader with += 1 in loops    6 seconds                   17 seconds
fastcppcsvparser                  2.5 seconds                 14 seconds
csv_parser                        17.5 seconds                not worth running
Is there a way to get data.table's implementation into C++ without using Rcpp along with RInside?
Latest Question:
I just downloaded one of the benchmark data-sets and get the same timing. Maybe I'm misunderstanding something, but adding += 1 to count the rows and columns in the loop slows it down from .001 seconds to 6 seconds, which seems weird; and using cell.read_raw_value slows it down even further.
So how am I supposed to access this data in C++ once it's in a memory map, without the huge performance loss? Similar to whatever R's data.table does.
Chat: https://chat.stackoverflow.com/rooms/242552/c-csv-parsing
The question "can I call this C code from C++?" is answered "yes you can" (unless there is something truly weird going on). You have to avoid name mangling, though.
The trick is this:
extern "C" {
#include "somecode.h"
}
see Call a C function from C++ code
But really, C++ should be able to produce a CSV parser that is the same speed as a C one; there is nothing that C can do that C++ cannot.
I have tried the following in C++
// main.cpp
extern "C" {
#include <fread.h>
}
#include <iostream>
int main(int argc, char* argv[]) {
    return 0;
}
I then use the following to compile and link:
gcc -Iinclude -c -o fread.o fread.c
g++ -Iinclude -c -o main.o main.cpp
g++ -Iinclude -o main.exe main.o fread.o
But I then get the following errors when linking:
x86_64-w64-mingw32/bin/ld.exe: fread.o:fread.c:(.text+0x1d0): undefined reference to `libintl_dgettext'
x86_64-w64-mingw32/bin/ld.exe: fread.o:fread.c:(.text+0x1da): undefined reference to `Rprintf'
x86_64-w64-mingw32/bin/ld.exe: fread.o:fread.c:(.text+0x42e): undefined reference to `libintl_dgettext'
x86_64-w64-mingw32/bin/ld.exe: fread.o:fread.c:(.text+0x447): undefined reference to `__halt'
... a lot more things
I have included the relevant files to compile in the include folder of my directory, and I started this issue about a potential C++ implementation:
https://github.com/Rdatatable/data.table/issues/5343

Why is a C++ Hello World binary larger than the equivalent C binary?

In his FAQ, Bjarne Stroustrup says that when compiled with gcc -O2, the file sizes of a hello world program written in C and in C++ are identical.
Reference: http://www.stroustrup.com/bs_faq.html#Hello-world
I decided to try this, here is the C version:
#include <stdio.h>
int main(int argc, char* argv[])
{
    printf("Hello world!\n");
    return 0;
}
And here is the C++ version
#include <iostream>
int main(int argc, char* argv[])
{
    std::cout << "Hello world!\n";
    return 0;
}
Here I compile, and the sizes are different:
r00t@wutdo:~/hello$ ls
hello.c hello.cpp
r00t@wutdo:~/hello$ gcc -O2 hello.c -o c.out
r00t@wutdo:~/hello$ g++ -O2 hello.cpp -o cpp.out
r00t@wutdo:~/hello$ ls -l
total 32
-rwxr-xr-x 1 r00t r00t 8559 Sep 1 18:00 c.out
-rwxr-xr-x 1 r00t r00t 8938 Sep 1 18:01 cpp.out
-rw-r--r-- 1 r00t r00t 95 Sep 1 17:59 hello.c
-rw-r--r-- 1 r00t r00t 117 Sep 1 17:59 hello.cpp
r00t@wutdo:~/hello$ size c.out cpp.out
text data bss dec hex filename
1191 560 8 1759 6df c.out
1865 608 280 2753 ac1 cpp.out
I replaced std::endl with \n and it made the binary smaller. I figured something this simple would be inlined, and I am disappointed it's not.
Also, wow, the optimized assemblies have hundreds of lines of assembly output? I can write hello world with like 5 assembly instructions using sys_write; what's up with all the extra stuff? Why does C put so much extra on the stack to set up? I mean, like 50 bytes of assembly vs 8 kB from C, why?
You're looking at a mix of information that's easily misinterpreted. The 8559 and 8938 byte file sizes are largely meaningless since they're mostly headers with symbol names and other misc information for at least minimal debugging purposes. The somewhat meaningful numbers are the size(1) output you added later:
r00t@wutdo:~/hello$ size c.out cpp.out
text data bss dec hex filename
1191 560 8 1759 6df c.out
1865 608 280 2753 ac1 cpp.out
You could get a more detailed breakdown by using the -A option to size, but in short, the differences here are fairly trivial.
What's more interesting is that Bjarne Stroustrup never mentioned whether he was talking about static or dynamic linking. In your case, both programs are dynamic-linked, so the size differences have nothing to do with the actual size cost of stdio or iostream; you're just measuring the cost of the calling code, or (more likely, based on the other comments/answer) the base overhead of exception-handling support for C++. Now, there is a common claim that a static-linked C++ iostream-based hello world can be even smaller than a printf-based one, since the compiler can see exactly which overloaded versions of operator<< are used and optimize out unneeded code (such as expensive floating point printing), whereas printf's use of format strings makes this difficult in the common case and impossible in general. However, I've never seen a C++ implementation where a static-linked iostream-based hello program could come anywhere close to being as small as, much less smaller than, a printf-based one in C.
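If you want to check the static-linking case yourself, the experiment is straightforward (a sketch; the resulting sizes vary a lot with the toolchain and libc):
r00t@wutdo:~/hello$ gcc -O2 -static hello.c -o c_static
r00t@wutdo:~/hello$ g++ -O2 -static hello.cpp -o cpp_static
r00t@wutdo:~/hello$ size c_static cpp_static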
I think he's treating the half kilobyte as a rounding error. Both are "9 kilobytes" and that's what you'll see in a typical file browser. They aren't exactly the same because, under the hood, the C and C++ libraries are quite different. If you're already familiar with your disassembler, you can see the details of the difference for yourself.
The "extra stuff" is for the sake of importing symbols from the standard library shlib, and handling C++ exceptions. Strangely enough, much of the GCC-compiled C executable is taken up by C++ exception handling tables. I've not figured out how to strip them using GCC.
endl is inlined, but it contains calls to print the \n character and flush the stream, which are not inlined. The difference in size is due to importing those from the standard library.
In truth, individual kilobytes seldom matter on any system with dynamically-loaded libraries. Self-contained code such as on an embedded system would need to include the standard library functionality it uses, and the C++ standard library tends to be heavier than its C counterpart — <iostream> vs. <stdio.h> in particular.

How to make g++ to show the execution time in terminal?

I want to make g++ show me the execution time, and maybe the return value too.
g++ file.cpp -o file
./file
When I make the executable file and then call it, it shows only the output, without the return value and execution time.
I want to make it show something like this:
Process returned 0 (0x0) execution time : 0.002 s
Thank you for the attention!
You can measure how long your process takes with the "time" command.
To determine the return value, you can print the value of the $? shell variable after running your program:
time ./file ; echo Process returned $?
You can also specify how exactly time should format its results with the -f (or --format) option.
However, some Linux distributions might use a bash-builtin time implementation by default which lacks that option, so you might have to give the full path to use the real time program:
/usr/bin/time -f "Execution time: %E" ./file
You can use time command as follows:
time ./file
Look here first: calculating execution time in c++
You can also use a clock (from time.h) in your code (although for multi-threaded code it behaves kind of funny):
#include <ctime>     // clock, CLOCKS_PER_SEC
#include <iostream>  // std::cin

int a;
unsigned t0 = clock(), t1;
std::cin >> a;
t1 = clock() - t0;   // elapsed ticks; divide by CLOCKS_PER_SEC for seconds
Read carefully time(7). You may use time(1) externally (from your shell, as answered here). You could also use, from inside your program, syscalls like clock_gettime(2), etc.
Don't expect timing to be really meaningful or accurate for small delays (e.g. less than half a second).
BTW, your question has not much to do with the GCC compiler (i.e. g++). If using a recent GCC 4.8 with -std=c++11 you could use std::chrono, etc. And even then it is not the compiler which is timing your program, but some library functions using syscalls.
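A minimal std::chrono sketch (C++11; it times a section of code with library calls, independently of the compiler):

#include <chrono>
#include <iostream>

int main()
{
    auto t0 = std::chrono::steady_clock::now();
    int a;
    std::cin >> a; // the work being timed
    auto t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double> dt = t1 - t0;
    std::cout << "execution time: " << dt.count() << " s\n";
    return 0;
}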

How can I monitor what's being put into the standard out buffer and break when a specific string is deposited in the pipe?

In Linux, with C/C++ code, using gdb, how can you add a gdb breakpoint to scan the incoming strings in order to break on a particular string?
I don't have access to a specific library's code, but I want to break as soon as that library sends a specific string to standard out so I can go back up the stack and investigate the part of my code that is calling the library. Of course I don't want to wait until a buffer flush occurs. Can this be done? Perhaps a routine in libstdc++ ?
This question might be a good starting point: how can I put a breakpoint on "something is printed to the terminal" in gdb?
So you could at least break whenever something is written to stdout. The method basically involves setting a breakpoint on the write syscall with a condition that the first argument is 1 (i.e. STDOUT). In the comments, there is also a hint as to how you could inspect the string parameter of the write call as well.
x86 32-bit mode
I came up with the following and tested it with gdb 7.0.1-debian. It seems to work quite well. $esp + 8 contains a pointer to the memory location of the string passed to write, so first you cast it to an integer, then to a pointer to char. $esp + 4 contains the file descriptor to write to (1 for STDOUT).
(gdb) break write if 1 == *(int*)($esp + 4) && strcmp((char*)*(int*)($esp + 8), "your string") == 0
x86 64-bit mode
If your process is running in x86-64 mode, then the parameters are passed through the scratch registers %rdi and %rsi:
(gdb) break write if 1 == $rdi && strcmp((char*)($rsi), "your string") == 0
Note that one level of indirection is removed since we're using scratch registers rather than variables on the stack.
Variants
Functions other than strcmp can be used in the above snippets:
strncmp is useful if you want to match the first n characters of the string being written
strstr can be used to find matches within a string, since you can't always be certain that the string you're looking for is at the beginning of the string being written through the write function.
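For example, variants of the 64-bit breakpoint above (sketches; the search strings are placeholders):
(gdb) break write if 1 == $rdi && strncmp((char*)$rsi, "your str", 8) == 0
(gdb) break write if 1 == $rdi && strstr((char*)$rsi, "your string") != 0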
Edit: I enjoyed this question and finding its subsequent answer. I decided to do a blog post about it.
catch + strstr condition
The cool thing about this method is that it does not depend on glibc write being used: it traces the actual system call.
Furthermore, it is more resilient to printf() buffering, as it might even catch strings that are printed across multiple printf() calls.
x86_64 version:
define stdout
    catch syscall write
    commands
        printf "rsi = %s\n", $rsi
        bt
    end
    condition $bpnum $rdi == 1 && strstr((char *)$rsi, "$arg0") != NULL
end
stdout qwer
Test program:
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    write(STDOUT_FILENO, "asdf1", 5);
    write(STDOUT_FILENO, "qwer1", 5);
    write(STDOUT_FILENO, "zxcv1", 5);
    write(STDOUT_FILENO, "qwer2", 5);
    printf("as");
    printf("df");
    printf("qw");
    printf("er");
    printf("zx");
    printf("cv");
    fflush(stdout);
    return EXIT_SUCCESS;
}
Outcome: breaks at:
qwer1
qwer2
fflush. The previous printf calls didn't actually print anything; they were buffered! The write syscall only happened on the fflush.
Notes:
$bpnum thanks to Tromey at: https://sourceware.org/bugzilla/show_bug.cgi?id=18727
rdi: first argument of the write syscall, i.e. the file descriptor; 1 is stdout (the syscall number itself goes in rax)
rsi: second argument of the syscall; for write it points to the buffer
strstr: standard C function, searches for substring matches, returns NULL if none is found
Tested in Ubuntu 17.10, gdb 8.0.1.
strace
Another option if you are feeling interactive:
setarch "$(uname -m)" -R strace -i ./stdout.out |& grep '\] write'
Sample output:
[00007ffff7b00870] write(1, "a\nb\n", 4a
Now copy that address and paste it into:
setarch "$(uname -m)" -R strace -i ./stdout.out |& grep -E '\] write\(1, "a'
The advantage of this method is that you can use the usual UNIX tools to manipulate strace output, and it does not require deep GDB-fu.
Explanation:
-i makes strace output RIP
setarch -R disables ASLR for a process with a personality system call (see: How to debug with strace -i when everytime address is different). GDB already does that by default, so no need to do it again.
Anthony's answer is awesome. Following his answer, I tried out another solution on Windows (x86-64 Windows). I know this question is about GDB on Linux; however, I think this solution is a supplement for this kind of question. It might be helpful for others.
Solution on Windows
In Linux a call to printf results in a call to the API write, and because Linux is an open-source OS, we can debug within the API. However, the API is different on Windows: it provides its own API, WriteFile. Because Windows is a commercial, closed-source OS, breakpoints cannot be added inside the APIs.
But some of the source code of the VC runtime is published together with Visual Studio, so we can find out in the source code where the WriteFile API is finally called and set a breakpoint there. After debugging the sample code, I found that the printf method results in a call to _write_nolock, in which WriteFile is called. The function is located in:
your_VS_folder\VC\crt\src\write.c
The prototype is:
/* now define version that doesn't lock/unlock, validate fh */
int __cdecl _write_nolock (
int fh,
const void *buf,
unsigned cnt
)
Compare this with the write API on Linux:
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
They have exactly the same parameters, so we can just set a conditional breakpoint in _write_nolock, following the solutions above, with only some differences in detail.
Portable Solution for Both Win32 and x64
It is very lucky that we can use the parameter names directly in Visual Studio when setting a breakpoint condition, on both Win32 and x64. So it becomes very easy to write the condition:
Add a breakpoint in _write_nolock
NOTICE: There is a little difference between Win32 and x64. On Win32 we can just use the function name to set the location of the breakpoint. However, this won't work on x64, because at the entrance of the function the parameters are not yet initialized, so we cannot use the parameter names in the breakpoint condition.
But fortunately there is a workaround: use a location inside the function rather than the function name to set the breakpoint, e.g. the first line of the function; the parameters are already initialized there. (I mean use the file name plus line number to set the breakpoint, or open the file directly and set a breakpoint in the function, not at the entrance but on the first line.)
Restrict the condition:
fh == 1 && strstr((char *)buf, "Hello World") != 0
NOTICE: there is still a problem here. I tested two different ways to write something to stdout: printf and std::cout. printf writes the whole string to _write_nolock at once. std::cout, however, passes it character by character to _write_nolock, which means the API gets called strlen("your string") times; in that case, the condition will never be triggered.
Win32 Solution
Of course we can use the same method as Anthony provided: set the breakpoint condition via registers.
For a Win32 program, the solution is almost the same as with GDB on Linux. You might notice the __cdecl decoration in the prototype of _write_nolock. This calling convention means:
Argument-passing order is Right to left.
Calling function pops the arguments from the stack.
Name-decoration convention: Underscore character (_) is prefixed to names.
No case translation performed.
There is a description here. And there is an example which is used to show the registers and stacks on Microsoft's website. The result could be found here.
Then it is very easy to set the condition of breakpoints:
Set a breakpoint in _write_nolock.
Restrict the condition:
*(int *)($esp + 4) == 1 && strstr(*(char **)($esp + 8), "Hello") != 0
It is the same method as on the Linux. The first condition is to make sure the string is written to stdout. The second one is to match the specified string.
x64 Solution
Two important changes from x86 to x64 are the 64-bit addressing capability and a flat set of 16 64-bit general-purpose registers. With the increase in registers, x64 uses only __fastcall as the calling convention. The first four integer arguments are passed in registers; arguments five and higher are passed on the stack.
You could refer to the Parameter Passing page on Microsoft's website. The four registers (in order left to right) are RCX, RDX, R8 and R9. So it is very easy to restrict the condition:
Set a breakpoint in _write_nolock.
NOTICE: unlike the portable solution above, here we can set the breakpoint location on the function itself rather than on its first line, because all the registers are already initialized at the entrance.
Restrict condition:
$rcx == 1 && strstr((char *)$rdx, "Hello") != 0
The reason why we need a cast and a dereference on esp is that $esp accesses the ESP register, which for all intents and purposes is a void*, while the registers here store the parameter values directly, so another level of indirection is not needed anymore.
Post
I also enjoyed this question very much, so I translated Anthony's post into Chinese and put my answer in it as a supplement. The post can be found here. Thanks to @anthony-arnold for the permission.
Anthony's answer is very interesting and it definitely gives some results.
Yet, I think it might miss the buffering of printf.
Indeed on Difference between write() and printf(), you can read that: "printf doesn't necessarily call write every time. Rather, printf buffers its output."
STDIO WRAPPER SOLUTION
Hence I came up with another solution: create a helper library that you can preload to wrap the printf-like functions. You can then set breakpoints on this library's source and backtrace to get info about the program you are debugging.
It works on Linux and targets the libc; I do not know about C++ IOSTREAM. Also, if the program uses write directly, the wrapper will miss it.
Here is the wrapper that hijacks printf (io_helper.c):
#include <string.h>
#include <stdio.h>
#include <stdarg.h>

#define MAX_SIZE 0xFFFF

int printf(const char *format, ...){
    char target_str[MAX_SIZE];
    int i = 0;
    va_list args1, args2;

    /* RESOLVE THE STRING FORMATTING */
    va_start(args1, format);
    vsprintf(target_str, format, args1);
    va_end(args1);

    if (strstr(target_str, "Hello World")){ /* SEARCH FOR YOUR STRING */
        i++;                                /* BREAK HERE */
    }

    /* OUTPUT THE STRING AS THE PROGRAM INTENDED TO */
    va_start(args2, format);
    vprintf(format, args2);
    va_end(args2);
    return 0;
}

int puts(const char *s)
{
    return printf("%s\n", s);
}
I added puts because gcc tends to replace printf by puts when it can. So I force it back to printf.
Next you just compile it to a shared library.
gcc -shared -fPIC io_helper.c -o libio_helper.so -g
And you preload it when running gdb (so that the debugged process inherits it):
LD_PRELOAD=$PWD/libio_helper.so gdb test
where test is the program you are debugging.
Then you can break on the line marked /* BREAK HERE */ (e.g. break io_helper.c:19, adjusting the line number to your copy), because you compiled the library with -g.
EXPLANATIONS
Our luck here is that printf and the other functions like fprintf, sprintf, ... are just there to resolve the variadic arguments and then call their 'v' equivalents (vprintf in our case). Doing this job is easy, so we can do it ourselves and leave the real work to libc via the 'v' function. To get the variadic args of printf, we just have to use va_start and va_end.
The main advantage of this method is that you are sure that when you break, you are in the portion of the program that outputs your target string, and not looking at a leftover in a buffer. Also, you make no assumptions about the hardware. The drawback is that you assume the program uses the libc stdio functions for output.

How much is 32 kB of compiled code

I am planning to use an Arduino programmable board. Those have quite limited flash memories ranging between 16 and 128 kB to store compiled C or C++ code.
Are there ways to estimate how much (standard) code it will represent?
I suppose this is very vague, but I'm only looking for an order of magnitude.
The output of the size command is a good starting place, but does not give you all of the information you need.
$ avr-size program.elf
text data bss dec hex filename
The size of your image is usually a little bit more than the sum of the text and the data sections. The bss section costs essentially nothing in the image because it is all zeros. There may be other relevant sections which aren't listed by size.
If your build system is set up like ones that I've used before for AVR microcontrollers, then you will end up with an *.elf file as well as a *.bin file, and possibly a *.hex file. The *.bin file is the actual image that would be stored in the program flash of the processor, so you can examine its size to determine how your program is growing as you make edits to it. The *.bin file is extracted from the *.elf file with the objcopy command and some flags which I can't remember right now.
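For reference, the extraction step is usually something along these lines (a sketch; the exact flags differ between build systems):
avr-objcopy -O binary program.elf program.bin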
If you want to know how to estimate how much compiled code your C or C++ source will produce, this is a lot more difficult. I have observed a 10x blowup in a function when I tried to use a uint64_t rather than a uint32_t when all I was doing was incrementing it (this was about 5 times more code than I thought it would be). This was mostly due to gcc's AVR optimizations not being the best, but smaller changes in code size can creep in from seemingly innocent code.
This will likely be amplified with the use of C++, which tends to hide more things that turn into code than C does. Chief among the things C++ hides are destructor calls and lots of pointer dereferencing, which has to do with the this pointer in objects, as well as a secret pointer many objects have to their virtual function table, and class static variables.
On AVR all of this pointer stuff is likely to really add up because pointers are twice as big as registers and take multiple instructions to load. Also AVR has only a few register pairs that can be used as pointers, which results in lots of moving things into and out of those registers.
Some tips for small programs on AVR:
Use uint8_t and int8_t instead of int whenever you can. You could also use uint_fast8_t and int_fast8_t if you want your code to be portable. This can lead to many operations taking up only half as much code, because int is two bytes (see the sketch after this list).
Be very aware of things like string and struct constants and literals and how/where they are stored.
If you're not scared of it, read the AVR assembly manual. You can get an idea of the types of instructions, and from that the type of C code that easily maps to those instructions. Use that kind of C code.
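As a tiny illustration of the first tip (a sketch; exact code sizes depend on the toolchain and flags):

#include <stdint.h>

/* int is 16 bits on AVR: every operation here needs register pairs */
int sum_int(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

/* the 8-bit version lets the compiler use single-register arithmetic */
uint8_t sum_u8(const uint8_t *a, uint8_t n)
{
    uint8_t s = 0;
    for (uint8_t i = 0; i < n; i++) s += a[i];
    return s;
}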
You can't really say. The length of the source code has little to do with the size of the compiled code. For example:
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
int main()
{
    std::vector<std::string> strings;
    strings.push_back("Hello");
    strings.push_back("World");
    std::sort(strings.begin(), strings.end());
    std::copy(strings.begin(), strings.end(), std::ostream_iterator<std::string>(std::cout, ""));
}
vs
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
int main()
{
    std::vector<std::string> strings;
    strings.push_back("Hello");
    strings.push_back("World");
    for ( int idx = 0; idx < strings.size(); idx++ )
        std::cout << strings[idx];
}
Both are the exact same number of lines, and produce the same output, but the first example involves an instantiation of std::sort, which is probably an order of magnitude more code than the rest of the code here.
If you absolutely need to count the number of bytes used in the program, use assembler.
Download the Arduino IDE and 'verify' some of your existing code, or look at the sample sketches. It will tell you how many bytes that code is, which will give you an idea of how much more you can fit into a given device. Picking a couple of the examples at random, the web server example is 5816 bytes and the LCD hello world is 2616. Both use external libraries.
Try creating a simplified version of your app, focusing on the most valuable feature first, then start adding up the 'nice (and cool) stuff to have'. Keep an eye on the byte usage shown in the Arduino IDE when you verify your code.
As a rough indication, my first app (an LED flasher controlled by a push button) requires 1092 bytes. That's roughly 1 kB out of 32 kB. Pretty small footprint for C++ code!
What worries me most is the limited amount of RAM (1 kB). If the CPU stack takes some of it, then there isn't much left for creating any data structures.
I have only had my Arduino for 48 hrs, so there is still a lot to learn before I use it effectively ;-) But it's a lot of fun to use :).
It's quite a bit for a reasonably complex piece of software, but you will start bumping into the limit if you want it to have a lot of different functionality. Also, if you want to store quite a lot of static strings and data, it can eat into that quite quickly. But 32 KB is a decent amount for embedded applications. It tends to be RAM that you have problems with first!
Also, quite often the C++ compilers for embedded systems are a lot worse than the C compilers.
That is, they are nowhere as good as C++ compilers for the common desktop OS's (in terms of producing efficient machine code for the target platform).
On a Linux system you can do some experiments with statically compiled example programs, e.g.:
$ size `which busybox `
text data bss dec hex filename
1830468 4448 25650 1860566 1c63d6 /bin/busybox
The sizes are given in bytes. This output is independent of the executable file format, since it reports the sizes of the different sections inside the file. The text section contains the machine code and const stuff. The data section contains data for the static initialization of variables. The bss size is the size of uninitialized data; of course, uninitialized data does not need to be stored in the executable file.
Well, busybox contains a lot of functionality (like all common shell commands, a shell etc.).
If you link your own examples with gcc -static, keep in mind that the libc you use may dramatically increase the program size and that using an embedded libc may be much more space efficient.
To test that, you can check out diet libc or uClibc and link against one of them. Actually, busybox is usually linked against uClibc.
Note that the sizes you get this way give you only an order of magnitude. For example, your workstation probably uses a different CPU architecture than the Arduino board, and the machine code of different architectures may differ, more or less, in size (because of operand sizes, available instructions, opcode encoding and so on).
To continue the rough order-of-magnitude reasoning: busybox contains roughly 309 tools (including an FTP daemon and such), i.e. the average code size of a busybox tool is roughly 6 kB.