How to make g++ show the execution time in the terminal? - c++

I want g++ to show me the execution time and maybe the return value too.
g++ file.cpp -o file
./file
When I build the executable and then run it, it shows only the output, without the return value or the execution time.
I want it to show something like this:
Process returned 0 (0x0) execution time : 0.002 s
Thank you for your attention!

You can measure how long your process takes with the "time" command.
To determine the return value, you can print the shell's $? variable (the exit status of the last command) after running your program:
time ./file ; echo Process returned $?
You can also specify how exactly time should format its results with the -f (or --format) option.
However, some Linux distributions might use a bash-builtin time implementation by default which lacks that option, so you might have to give the full path to use the real time program:
/usr/bin/time -f "Execution time: %E" ./file

You can use time command as follows:
time ./file

Look here first: calculating execution time in c++
You can also use clock() (from <ctime>) in the code itself (although for multi-threaded code it behaves somewhat unintuitively, since it measures CPU time summed over all threads):
#include <ctime>     // std::clock, CLOCKS_PER_SEC
#include <iostream>
int a;
std::clock_t t0 = std::clock(), t1;
std::cin >> a;
t1 = std::clock() - t0;  // elapsed CPU time in ticks; divide by CLOCKS_PER_SEC for seconds

Read time(7) carefully. You may use time(1) externally (from your shell, as answered here). You could also use syscalls like clock_gettime(2) from inside your program.
Don't expect timing to be really meaningful or accurate for small delays (e.g. less than half a second).
BTW, your question does not have much to do with the GCC compiler (i.e. g++). If using a recent GCC 4.8 with -std=c++11 you could use std::chrono. And even then it is not the compiler that times your program, but some library functions using syscalls.
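If you would rather have the program itself print such a line instead of relying on an external tool, here is a minimal C++11 sketch with std::chrono (my own addition, not from the answers above); the printed text merely imitates the format shown in the question:
#include <chrono>
#include <cstdio>
int main()
{
    auto start = std::chrono::steady_clock::now();
    // ... the program's actual work goes here ...
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    std::printf("Process returned 0 (0x0)   execution time : %.3f s\n", seconds);
    return 0;
}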

Related

Running a built-in Octave function in C++ code (using the API) takes longer than running that function in Octave... Why?

This is my C++ code (using the C++ Octave API), which uses the built-in Octave function filter(), which takes at least 3 arguments as input.
#include <iostream>
#include <complex>
#include "/usr/include/octave-4.2.2/octave/builtin-defun-decls.h"
int main (void) {
  int len = 10000000; // 10 millions
  Array<float> filter_taps(dim_vector(1,102), 5.55); // # taps = 102
  Array<std::complex<float>> A(dim_vector(1,len), std::complex<float>(0x00,0x00));
  for (octave_idx_type i = 0; i < len; i++)
    A(0,i) = std::complex<float>(1,5);
  octave_value_list in;
  in.append(octave_value(filter_taps));
  in.append(octave_value(1));
  in.append(octave_value(A));
  octave_value_list out = Ffilter(in, 1);
  A = out(0).complex_array_value();
  return 0;
}
I am using Ubuntu 18.04 and Octave is installed in my system, and I am using Octave C++ API.
I compile this code by running g++ Ffilter.cpp -o Ffilter -loctave -loctinterp, and execute it with ./Ffilter. I get no errors, and the results are exactly the same as the Octave output. But the execution time is approximately 9 seconds.
But when I run the equivalent code in Octave, then the execution time is 2.85 seconds.
The snippet of Octave code is given below.
h = 5.55*ones(1,102);
A = ones(1,10000000)+i*5*ones(1,10000000);
A = filter(h,1,A);
Since Octave's functions are themselves written in C/C++, and C++ is faster than Octave, the execution time of the C++ code should be much shorter than that of the Octave code, but the opposite is happening...
Why is this happening? What am I doing wrong?
Is there any way to reduce the execution time when using the built-in filter function from C++ (Octave API)? If so, please provide demo code.
You seem to be missing the big -O:
g++ ofilter.cpp -O3 -o Ffilter -loctave -loctinterp
Most optimizations are completely disabled at -O0 or if an -O level is not set on the command line, even if individual optimization flags are specified.
For me, however, the unoptimized version ran about as fast as Octave, and Octave wasn't that much slower than pure optimized C++. Maybe you should test it on a different machine.
Octave is only slow when it is interpreting its own code. In the script you've posted there isn't much work for the interpreter/virtual machine; it basically only moves some data around in RAM and calls the same long-running C++ functions as the compiled version.
Have a look at the profile function; if you use it you'll see that the time Octave spends running filter is about the same as the compiled version's runtime. C++ is faster in this case only because the data-initialization code is cheaper in the compiled version.
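To confirm this on the C++ side, here is a small sketch (my own addition, not from the answer) that times just the Ffilter call with std::chrono, separating the initialization cost from the filter itself:
#include <chrono>
#include <iostream>
// ... inside main(), around the existing Ffilter call ...
auto t0 = std::chrono::steady_clock::now();
octave_value_list out = Ffilter(in, 1);
auto t1 = std::chrono::steady_clock::now();
std::cout << "Ffilter took "
          << std::chrono::duration<double>(t1 - t0).count() << " s\n";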

Fortran execution time

I am new to Fortran and I would like to ask for help. My code is very simple. It just enters a loop and then, using the system intrinsic procedure, changes into the directory named code and runs the evalcode.x program.
program subr1
  implicit none
  integer :: i
  real :: T1,T2
  call cpu_time(T1)
  do i=1,6320
    call system ("cd ~/code; ../evalcede/source/evalcode.x test ")
  enddo
  call cpu_time(T2)
  print *, T1,T2
end program subr1
The time reported for the program is 0.5 s, but the time this code actually needs to execute is 1.5 hours! The program seems to be suspended or waiting, and I do not know why.
Note: this is more of an elaborated comment on Janneb's post, to provide a bit more information.
As indicated by Janneb, the function CPU_TIME does not necessarily return the wall-clock time, which is what you are after. This is especially true when timing system calls.
Furthermore, the output of CPU_TIME is really a processor and compiler dependent value. To demonstrate this, the following code is compiled with gfortran, ifort and solaris-studio f90:
program test_cpu_time
  real :: T1,T2
  call cpu_time(T1)
  call execute_command_line("sleep 5")
  call cpu_time(T2)
  print *, T1,T2, T2-T1
end program test_cpu_time
#gfortran>] 1.68200000E-03 1.79799995E-03 1.15999952E-04
#ifort >] 1.1980000E-03 1.3410000E-03 1.4299992E-04
#f90 >] 0.0E+0 5.00534 5.00534
Here, you see that both gfortran and ifort exclude the time of the system-command while solaris-studio includes the time.
In general, one should see the difference between the output of two consecutive calls to CPU_TIME as the time spent by the CPU performing the actions. Due to the system call, the process is actually in a sleep state during the execution of the external command, and thus no CPU time is spent. This can be seen with a simple ps:
$ ps -O ppid,nlwp,psr,stat $(pgrep sleep) $(pgrep a.out)
PID PPID NLWP PSR STAT S TTY TIME COMMAND
27677 17146 1 2 SN+ S pts/40 00:00:00 ./a.out
27678 27677 1 1 SN+ S pts/40 00:00:00 sleep 5
NLWP indicates how many threads are in use
PPID indicates parent PID
STAT indicates 'S' for interruptible sleep (waiting for an event to complete)
PSR is the cpu/thread it is running on.
You notice that the main program a.out is in a sleep state and both the system call and the main program are running on separate cores. Since the main program is in a sleep state, the CPU_TIME will not clock this time.
note: solaris-studio is the odd duck, but then again, it's solaris studio!
General comment: CPU_TIME is still useful for determining the execution time of segments of code. It is not useful for timing external programs. Other, more dedicated tools exist for this, such as time; the OP's program could be reduced to the bash command:
$ time ( for i in $(seq 1 6320); do blabla; done )
This is what the standard has to say on CPU_TIME(TIME)
CPU_TIME(TIME)
Description: Return the processor time.
Note:13.9: A processor for which a single result is inadequate (for example, a parallel processor) might choose to
provide an additional version for which time is an array.
The exact definition of time is left imprecise because of the variability in what different processors are able
to provide. The primary purpose is to compare different algorithms on the same processor or discover which
parts of a calculation are the most expensive.
The start time is left imprecise because the purpose is to time sections of code, as in the example.
Most computer systems have multiple concepts of time. One common concept is that of time expended by
the processor for a given program. This might or might not include system overhead, and has no obvious
connection to elapsed “wall clock” time.
source: Fortran 2008 Standard, Section 13.7.42
On top of that:
It is processor dependent whether the results returned from CPU_TIME, DATE_AND_TIME and SYSTEM_CLOCK are dependent on which image calls them.
Note 13.8: For example, it is unspecified whether CPU_TIME returns a per-image or per-program value, whether all
images run in the same time zone, and whether the initial count, count rate, and maximum in SYSTEM_CLOCK are the same for all images.
source: Fortran 2008 Standard, Section 13.5
The CPU_TIME intrinsic measures the CPU time consumed by the program itself, not including that of its subprocesses (1).
Apparently most of the time is spent in evalcode.x which explains why the reported wallclock time is much higher.
If you want to measure wallclock time intervals in Fortran, you can use the SYSTEM_CLOCK intrinsic.
(1) Well, that's what GFortran does, at least. The standard doesn't specify exactly what it means.

Is there a controllable way to slow down my C++ program on online judges?

I am looking for a controllable way (easy to set the time for delay) to slow down my C++ solution on online judges. (Mainly for UVa, g++ 4.8.2 -lm -lcrypt -O2 -std=c++11 -pipe)
I've tried the following code:
{
    auto start = std::chrono::high_resolution_clock::now();
    while (std::chrono::duration<double, std::milli>(
               std::chrono::high_resolution_clock::now() - start).count() < 2000)
        ;   // busy-wait until 2000 ms of wall-clock time have passed
}
But the solution was slowed down by only about 1.6 seconds, not the expected 2 seconds, and I don't know why.
I also tried std::this_thread::sleep_for and usleep() from <unistd.h>, but these barely influenced the reported runtime on the online judges.
For std::this_thread::sleep_for, I tried:
std::this_thread::sleep_for(std::chrono::milliseconds(2600));
The reason I want to do this is that my teacher often assigns problems on these online judges, and our homework grader will submit our solutions to those judges to check whether they get AC (Accepted). As a result, my solution will be counted twice in the ranking system, which I think is unfair to later users, especially when my solution is ranked at the top of the ranklist. So I prefer to slow down my solution to reduce its influence on other users before submitting it to the homework grading system.
If you want to suspend execution of your program for a given amount of time, then std::this_thread::sleep_for is the way to go. Note however, that it really makes your thread sleep. That is, it relinquishes the CPU while it is sleeping. If the benchmark environment is measuring CPU time as opposed to wall time, then sleeping will not “help”. Instead, what you have to do is give the CPU some useless work to do. (Un)fortunately, compilers have become very good at eliminating useless work so you have to be careful.
You can use the time(1) utility to measure the CPU and wall time consumed by your program.
This program sleeps for two wall time seconds.
#include <chrono>
#include <thread>
int
main()
{
std::this_thread::sleep_for(std::chrono::seconds {2});
}
$ g++ -o wall -std=c++14 -Wall -Wextra -Werror -pedantic wall.cxx -pthread
$ time ./wall
real 0m2.003s
user 0m0.000s
sys 0m0.000s
As you can see, the elapsed “real” time is almost exactly two seconds but the CPU time (detailed into CPU time used in user mode and by the kernel) is negligible.
This program wastes two seconds worth of CPU time.
#include <ctime>
int
main()
{
const auto t0 = std::clock();
while ((std::clock() - t0) / CLOCKS_PER_SEC < 2)
continue;
}
$ g++ -o cpu1 -std=c++14 -Wall -Wextra -Werror -pedantic cpu1.cxx
$ time ./cpu1
real 0m2.003s
user 0m0.530s
sys 0m1.470s
Again, the total (“real”) execution time is two seconds, but this time we've also spent about half a second in user mode and one and a half seconds in kernel mode (due to the many calls to clock).
You can shift this by doing more work in user mode. For example, instead of immediately calling std::clock again, we can do some silly loop.
#include <ctime>
int
main()
{
const auto t0 = std::clock();
while ((std::clock() - t0) / CLOCKS_PER_SEC < 2)
{
int dummy;
volatile int * pdummy = &dummy;
for (int i = 0; i < 1'000'000; ++i)
*pdummy = i;
}
}
$ g++ -o cpu2 -std=c++14 -Wall -Wextra -Werror -pedantic cpu2.cxx
$ time ./cpu2
real 0m2.005s
user 0m2.003s
sys 0m0.000s
This time, nearly all CPU cycles were wasted in user mode. You may have to tinker with the magic number if your computer needs too long for one million iterations.
In modern online judge systems there is usually a difference between the time limit (which counts the process's CPU time) and the real-time limit (which counts elapsed wall-clock time). So your code got 1.6 s of process time during 2 s of real time because your process got about 80% of a CPU on the server.
I have written a small test case for what you want and I cannot reproduce the claim that std::this_thread::sleep_for() does not give the expected result.
#include <thread>
#include <chrono>
int main() {
std::this_thread::sleep_for(std::chrono::seconds(2));
return 0;
}
This will give me (compiled using g++ -o slow slow.cpp -std=c++11) when run with time ./slow:
real 0m2.001s
user 0m0.000s
sys 0m0.000s
Thus giving the expected time of about 2 seconds.
Please notice that std::this_thread::sleep_for will block the current thread for at least the given duration. So it might block a few milliseconds longer than expected. See here:
Blocks the execution of the current thread for at least the specified
sleep_duration.
A steady clock is used to measure the duration. This function may
block for longer than sleep_duration due to scheduling or resource
contention delays.
Online judge systems usually do not rely on real execution time. Instead, they measure user CPU time, or user + system time (depending on the implementation).
This means that you should consume 2 seconds of CPU time rather than execute for 2 seconds of real time. There is no system-independent way of doing this. If the server is running Linux, you can try this solution: How do I get the total CPU usage of an application from /proc/pid/stat?. But if the online judge system is smart enough, it may block these actions.
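As a rough sketch of that idea on Linux (my own illustration; getrusage(2) is one way to read your own CPU time without parsing /proc): burn CPU until roughly two seconds of user CPU time have been consumed, regardless of how much wall-clock time passes:
#include <sys/resource.h>   // getrusage, RUSAGE_SELF

static double user_cpu_seconds()
{
    rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
}

int main()
{
    volatile unsigned long sink = 0;        // volatile so the loop is not optimized away
    while (user_cpu_seconds() < 2.0)
        for (int i = 0; i < 1000000; ++i)   // chunk of useless work between checks
            sink += i;
}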

C++ Time measurement of functions

I need to measure the time of a C++ program, especially the overall running time of some recursive functions. There are a lot of function calls inside other functions. My first thought was to implement some time-measurement functions in the actual code.
The problem with gprof is that it prints out the time of the class operators of a datatype, but I only need the information about the functions, and "-f func_name prog_name" won't work.
So, what is the most common way in science to measure time of a numerical program?
It's something like this:
void function2()
{
}
void function1()
{
    function2();
    function1();   // recursive call
}
int main(){
    function1();
}
If you're using the GNU toolchain, i.e. gcc, you can try gprof. Compile your program with the -g and -pg flags, run it once (an instrumented run writes a gmon.out file to the current directory), and then run
gprof <your_program_name>
gprof: http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html
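A typical session might look like this (file and program names are placeholders of mine, not from the answer):
g++ -g -pg myprog.cpp -o myprog
./myprog                          # writes gmon.out into the current directory
gprof ./myprog gmon.out > analysis.txt
gprof reads gmon.out by default, so naming it explicitly is optional; the executable name, however, is needed.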
EDIT:
In order to increase the level of detail you can run gprof with other flags:
-l (--line) enables line-by-line profiling, giving you a histogram of hits charged to individual lines of code, instead of functions.
-a Don’t include private functions in the output.
-e <function> Exclude output for a function <function>. Use this when there are functions that won’t be changed. For example, some sites have source code that’s been approved by a regulatory agency, and no matter how inefficient, the code will remain unchanged.
-E <function> Also exclude the time spent in the function from the percentage tables.
-f <function> The opposite of -e: only track time in <function>.
-F <function> Only use the time in <function> when calculating percentages.
-b Don’t print the explanatory text. If you’re more experienced, you can appreciate this option.
-s Accumulate samples. By running the program several times, it’s possible to get a
better picture of where time is spent. For example, a slow routine may not be called
for all input values, and therefore you may be misled about where to find performance problems.
If you need higher precision (for functions which take no more than a few milliseconds), you can use std::chrono::high_resolution_clock:
auto beginT = std::chrono::high_resolution_clock::now();
//Your computation here
auto endT = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endT - beginT).count() << " ms" << std::endl;
std::chrono::high_resolution_clock can be found in the <chrono> header and is part of the C++11 standard.
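Putting it together, here is a small self-contained sketch; the recursive function is just a stand-in of mine for the OP's real functions:
#include <chrono>
#include <cstdint>
#include <iostream>

// Stand-in recursive workload; replace with the real functions to be measured.
std::uint64_t fib(int n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

int main()
{
    auto beginT = std::chrono::high_resolution_clock::now();
    std::uint64_t result = fib(35);          // the call tree being timed
    auto endT = std::chrono::high_resolution_clock::now();

    std::cout << "result = " << result << ", took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(endT - beginT).count()
              << " ms\n";
}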

How to compile and execute from memory directly?

Is it possible to compile a C++ (or similar) program without generating an executable file, instead writing it to memory and executing it directly from there?
For example with GCC and clang, something that has a similar effect to:
c++ hello.cpp -o hello.x && ./hello.x $# && rm -f hello.x
In the command line.
But without the burden of writing an executable to disk only to immediately load and run it.
(If possible, the procedure may not use disk space or at least not space in the current directory which might be read-only).
Possible? Not the way you seem to wish. The task has two parts:
1) How to get the binary into memory
When we specify /dev/stdout as the output file on Linux, we can pipe the compiler output into our program x0, which reads an executable from stdin and executes it:
gcc -pipe YourFiles1.cpp YourFile2.cpp -o/dev/stdout -Wall | ./x0
In x0 we can just read from stdin until reaching the end of the file:
#include <cstdlib>   // realloc
#include <unistd.h>  // read, STDIN_FILENO

int memexec(void * exe, size_t exe_size, char *const argv[]);  /* defined further below */

int main(int argc, char ** argv)
{
    size_t ntotal = 0;
    char * buf = 0;
    while (true)
    {
        /* grow the buffer dynamically since we do not know how many bytes to read */
        buf = (char*)realloc(buf, ntotal + 4096);
        int nread = read(STDIN_FILENO, buf + ntotal, 4096);
        if (nread <= 0) break;   /* 0 means end of file, negative means error */
        ntotal += nread;
    }
    memexec(buf, ntotal, argv);
}
It would also be possible for x0 to directly execute the compiler and read its output. This question has been answered here: Redirecting exec output to a buffer or file
Caveat: I just figured out that for some strange reason this does not work when I use a pipe | but works when I use x0 < foo.
Note: If you are willing to modify your compiler or you do JIT like LLVM, clang and other frameworks you could directly generate executable code. However for the rest of this discussion I assume you want to use an existing compiler.
Note: Execution via temporary file
Other programs such as UPX achieve similar behavior by executing a temporary file; this is easier and more portable than the approach outlined below. On systems where /tmp is mapped to a RAM disk (for example, typical servers), the temporary file will be memory-based anyway.
#include <cstring>    // size_t
#include <fcntl.h>    // open, O_RDONLY, O_WRONLY
#include <stdio.h>    // perror
#include <stdlib.h>   // mkostemp
#include <sys/stat.h> // chmod, S_IRUSR, S_IXUSR
#include <unistd.h>   // write, close, unlink, fexecve
int memexec(void * exe, size_t exe_size, char *const argv[])
{
    /* random temporary file name in /tmp */
    char name[15] = "/tmp/fooXXXXXX";
    /* creates temporary file, returns writeable file descriptor */
    int fd_wr = mkostemp(name, O_WRONLY);
    /* makes file executable and readonly */
    chmod(name, S_IRUSR | S_IXUSR);
    /* creates read-only file descriptor before deleting the file */
    int fd_ro = open(name, O_RDONLY);
    /* removes file from file system, kernel buffers content in memory until all fd closed */
    unlink(name);
    /* writes executable to file */
    write(fd_wr, exe, exe_size);
    /* fexecve will not work as long as there is an open writeable file descriptor */
    close(fd_wr);
    char *const newenviron[] = { NULL };
    fexecve(fd_ro, argv, newenviron);
    /* fexecve only returns on failure */
    perror("failed");
    return -1;
}
Caveat: Error handling is left out for clarity's sake; the includes are kept brief.
Note: By combining main() and memexec() into a single function and using splice(2) to copy directly between stdin and fd_wr, the program could be significantly optimized.
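A rough sketch of that splice(2) idea (my own illustration, not code from the answer): when stdin is a pipe, as in the gcc ... | ./x0 invocation, the kernel can move the data straight into the temporary file without a userspace buffer:
#include <fcntl.h>    // splice (needs _GNU_SOURCE; g++ defines it by default)
#include <unistd.h>   // STDIN_FILENO

/* hypothetical helper replacing the read()/realloc()/write() sequence above;
   fd_wr is the writeable descriptor returned by mkostemp() */
void copy_stdin_to(int fd_wr)
{
    ssize_t n;
    while ((n = splice(STDIN_FILENO, NULL, fd_wr, NULL, 1 << 16, 0)) > 0)
        ;   /* the kernel moves the data directly; nothing to do in userspace */
}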
2) Execution directly from memory
One does not simply load and execute an ELF binary from memory. Some preparation, mostly related to dynamic linking, has to happen. There is a lot of material explaining the various steps of the ELF linking process, and studying it makes me believe that it is theoretically possible. See for example this closely related question on SO; however, there seems to be no working solution.
Update UserModeExec seems to come very close.
Writing a working implementation would be very time consuming, and would surely raise some interesting questions in its own right. I like to believe this is by design: for most applications it is strongly undesirable to (accidentally) execute their input data, because it allows code injection.
What happens exactly when an ELF is executed? Normally the kernel receives a file name and then creates a process, loads and maps the different sections of the executable into memory, performs a lot of sanity checks and marks it as executable before passing control and a file name back to the run-time linker ld-linux.so (part of libc). It takes care of relocating functions, handling additional libraries, setting up global objects and jumping to the executable's entry point. As I understand it, this heavy lifting is done by dl_main() (implemented in libc/elf/rtld.c).
Even fexecve is implemented using a file in /proc, and it is this need for a file name that leads us to reimplement parts of this linking process.
Libraries
UserModeExec
libelf -- read, modify, create ELF files
eresi -- play with ELFs
OSKit (seems like a dead project though)
Reading
http://www.linuxjournal.com/article/1060?page=0,0 -- introduction
http://wiki.osdev.org/ELF -- good overview
http://s.eresi-project.org/inc/articles/elf-rtld.txt -- more detailed Linux-specific explanation
http://www.codeproject.com/Articles/33340/Code-Injection-into-Running-Linux-Application -- how to get to hello world
http://www.acsu.buffalo.edu/~charngda/elf.html -- nice reference of ELF structure
Linkers and Loaders by John Levine -- deeper explanation of linking
Related Questions at SO
Linux user-space ELF loader
ELF Dynamic loader symbol lookup ordering
load-time ELF relocation
How do global variables get initialized by the elf loader
So it seems possible; you decide whether it is also practical.
Yes, though doing it properly requires designing significant parts of the compiler with this in mind. The LLVM guys have done this, first with a kinda-separate JIT, and later with the MC subproject. I don't think there's a ready-made tool doing it. But in principle, it's just a matter of linking to clang and llvm, passing the source to clang, and passing the IR it creates to MCJIT. Maybe a demo does this (I vaguely recall a basic C interpreter that worked like this, though I think it was based on the legacy JIT).
Edit: Found the demo I recalled. Also, there's cling, which seems to do basically what I described, but better.
Linux can create virtual file systems in RAM using tmpfs. For example, I have my /tmp directory set up in my file system table like so:
tmpfs /tmp tmpfs nodev,nosuid 0 0
Using this, any files I put in /tmp are stored in my RAM.
Windows doesn't seem to have any "official" way of doing this, but has many third-party options.
Without this "RAM disk" concept, you would likely have to heavily modify a compiler and linker to operate completely in memory.
If you are not specifically tied to C++, you may also consider other JIT based solutions:
in Common Lisp SBCL is able to generate machine code on the fly
you could use TinyCC and its libtcc.a which emits quickly poor (i.e. unoptimized) machine code from C code in memory.
consider also any JITing library, e.g. libjit, GNU Lightning, LLVM, GCCJIT, asmjit
of course emitting C++ code on some tmpfs and compiling it...
But if you want good machine code, you'll need it to be optimized, and that is not fast (so the time to write to a filesystem is negligible).
If you are tied to generated C++ code, you need a good optimizing C++ compiler (e.g. g++ or clang++); they take significant time to compile C++ code to an optimized binary, so you should generate some file foo.cc (perhaps in a RAM file system like tmpfs, but that would give only a minor gain, since most of the time is spent inside the g++ or clang++ optimization passes, not reading from disk), then compile that foo.cc to foo.so (using perhaps make, or at least forking g++ -Wall -shared -O2 foo.cc -o foo.so, perhaps with additional libraries). Finally, have your main program dlopen that generated foo.so. FWIW, MELT was doing exactly that, and on a Linux workstation the manydl.c program shows that a process can generate and then dlopen(3) many hundreds of thousands of temporary plugins, each one obtained by generating a temporary C file and compiling it. For C++, read the C++ dlopen mini HOWTO.
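As an illustrative sketch of that generate, compile, dlopen cycle (the file names, the emitted function and the exact g++ invocation are placeholders of mine, not taken from MELT or RefPerSys):
#include <cstdlib>   // std::system
#include <dlfcn.h>   // dlopen, dlsym, dlclose, dlerror
#include <fstream>
#include <iostream>

int main()
{
    // 1) generate a tiny C++ source file (in a real setup this could live on a tmpfs path)
    std::ofstream("foo.cc") <<
        "extern \"C\" int generated_add(int a, int b) { return a + b; }\n";

    // 2) compile it into a shared object; extern "C" avoids name mangling for dlsym
    if (std::system("g++ -Wall -shared -fPIC -O2 foo.cc -o foo.so") != 0)
        return 1;

    // 3) load the plugin and look up the generated function
    void * handle = dlopen("./foo.so", RTLD_NOW);
    if (!handle) { std::cerr << dlerror() << '\n'; return 1; }

    using add_fn = int (*)(int, int);
    auto add = reinterpret_cast<add_fn>(dlsym(handle, "generated_add"));
    if (!add) { std::cerr << dlerror() << '\n'; return 1; }

    std::cout << "generated_add(2, 3) = " << add(2, 3) << '\n';
    dlclose(handle);
}
The driver itself may need to be linked with -ldl, depending on the glibc version.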
Alternatively, generate a self-contained source program foobar.cc, compile it to an executable foobarbin, e.g. with g++ -O2 foobar.cc -o foobarbin, and execute that foobarbin binary with execve.
When generating C++ code, you may want to avoid generating tiny C++ source files (e.g. only a dozen lines; if possible, generate C++ files of at least a few hundred lines), unless a lot of template expansion happens through extensive use of existing C++ containers, in which case generating a small C++ function combining them makes sense. For instance, try if possible to put several generated C++ functions in the same generated C++ file (but avoid having very big generated C++ functions, e.g. 10 KLOC in a single function; they take a lot of time to be compiled by GCC). You could consider, if relevant, having only a single #include in that generated C++ file, and pre-compiling that commonly included header.
Jacques Pitrat's book Artificial Beings, the conscience of a conscious machine (ISBN 9781848211018) explains in detail why generating code at runtime is useful (in symbolic artificial intelligence systems like his CAIA system). The RefPerSys project is trying to follow that idea and generate some C++ code (and hopefully, more and more of it) at runtime. Partial evaluation is a relevant concept.
Your software is likely to spend more CPU time in generating C++ code than GCC in compiling it.
The tcc compiler's "-run" option allows for exactly this: compile into memory, run there, and finally discard the compiled code. No filesystem space is needed. "tcc -run" can be used in a shebang to allow for C scripts; from the tcc man page:
#!/usr/local/bin/tcc -run
#include <stdio.h>
int main()
{
printf("Hello World\n");
return 0;
}
C scripts allow for mixed bash/C scripts, with "tcc -run" not needing any temporary space:
#!/bin/bash
echo "foo"
sed -n "/^\/\*\*$/,\$p" $0 | tcc -run -
exit
/**
*/
#include <stdio.h>
int main()
{
printf("bar\n");
return 0;
}
Execution output:
$ ./shtcc2
foo
bar
$
C scripts with gcc are possible as well, but, as others mentioned, they need temporary space to store the executable. This script produces the same output as the previous one:
#!/bin/bash
exc=/tmp/`basename $0`
if [ $0 -nt $exc ]; then sed -n "/^\/\*\*$/,\$p" $0 | gcc -x c - -o $exc; fi
echo "foo"
$exc
exit
/**
*/
#include <stdio.h>
int main()
{
printf("bar\n");
return 0;
}
C scripts with suffix ".c" are nice, headtail.c was my first ".c" file that needed to be executable:
$ echo -e "1\n2\n3\n4\n5\n6\n7" | ./headtail.c
1
2
3
6
7
$
I like C scripts because you have just one file that you can easily move around, and changes in the bash or C part require no further action; they just work on the next execution.
P.S:
The "tcc -run" C script shown above has a problem: the script's stdin is not available to the executed C code. The reason was that I passed the extracted C code via a pipe to "tcc -run". The new gist run_from_memory_stdin.c does it correctly:
...
echo "foo"
tcc -run <(sed -n "/^\/\*\*$/,\$p" $0) 42
...
"foo" is printed by bash part, "bar 42" from C part (42 is passed argv[⁠1]), and piped script input gets printed from C code then:
$ route -n | ./run_from_memory_stdin.c
foo
bar 42
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.29.58.98 0.0.0.0 UG 306 0 0 wlan1
10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 wlan0
169.254.0.0 0.0.0.0 255.255.0.0 U 303 0 0 wlan0
172.29.58.96 0.0.0.0 255.255.255.252 U 306 0 0 wlan1
$
One can easily modify the compiler itself. It sounds hard at first but, thinking about it, it seems obvious. Modifying the compiler sources to directly expose a library and build it as a shared library should not take that much effort (depending on the actual implementation).
Just replace every file access with a memory-mapped-file solution.
It is something I am about to do myself: compiling something transparently in the background to opcodes and executing those from within Java.
But thinking about your original question, it seems you want to speed up compilation and your edit-and-run cycle. First of all, get an SSD; you get almost memory speed (use a PCIe version), and let's say it's C we are talking about. C has a linking step that results in very complex operations which are likely to take more time than reading and writing from/to disk. So just put everything on the SSD and live with the lag.
Finally, the answer to the OP's question is yes!
I found the memrun repo from guitmz, which demoed running an (x86_64) ELF from memory, with golang and assembler. I forked that and provided a C version of memrun, which runs ELF binaries (verified on x86_64 and armv7l), either from standard input or via process substitution as the first argument. The repo contains demos and documentation (memrun.c is only 47 lines of code):
https://github.com/Hermann-SW/memrun/tree/master/C#memrun
Here is the simplest example: with "-o /dev/fd/1" the gcc-compiled ELF gets sent to stdout and piped to memrun, which executes it:
pi#raspberrypi400:~/memrun/C $ gcc info.c -o /dev/fd/1 | ./memrun
My process ID : 20043
argv[0] : ./memrun
no argv[1]
evecve --> /usr/bin/ls -l /proc/20043/fd
total 0
lr-x------ 1 pi pi 64 Sep 18 22:27 0 -> 'pipe:[1601148]'
lrwx------ 1 pi pi 64 Sep 18 22:27 1 -> /dev/pts/4
lrwx------ 1 pi pi 64 Sep 18 22:27 2 -> /dev/pts/4
lr-x------ 1 pi pi 64 Sep 18 22:27 3 -> /proc/20043/fd
pi#raspberrypi400:~/memrun/C $
The reason I was interested in this topic was its use in "C scripts". run_from_memory_stdin.c demonstrates it all together:
pi#raspberrypi400:~/memrun/C $ wc memrun.c | ./run_from_memory_stdin.c
foo
bar 42
47 141 1005 memrun.c
pi#raspberrypi400:~/memrun/C $
The C script producing the shown output is this small:
#!/bin/bash
echo "foo"
./memrun <(gcc -o /dev/fd/1 -x c <(sed -n "/^\/\*\*$/,\$p" $0)) 42
exit
/**
*/
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("bar %s\n", argc>1 ? argv[1] : "(undef)");
for(int c=getchar(); EOF!=c; c=getchar()) { putchar(c); }
return 0;
}
P.S:
I added tcc's "-run" option to gcc and g++, for details see:
https://github.com/Hermann-SW/memrun/tree/master/C#adding-tcc--run-option-to-gcc-and-g
Just nice, and nothing gets stored in the filesystem:
pi#raspberrypi400:~/memrun/C $ uname -a | g++ -O3 -Wall -run demo.cpp 42
bar 42
Linux raspberrypi400 5.10.60-v7l+ #1449 SMP Wed Aug 25 15:00:44 BST 2021 armv7l GNU/Linux
pi#raspberrypi400:~/memrun/C $