Is there any way to transform strings into compilable/runnable code? - c++

For example, suppose we have a string like:
string x = "for(int i = 0; i < 10; i++){cout << \"Hello World!\\n\";}";
What is the simplest way to complete the following function definition:
void do_code(string x); /* given x that's valid c++ code, executes that code as if it were written inside of the function body */

The standard C++ library does not contain a C++ parser or compiler. Your only options are therefore to find and link a C++ compiler library, or to write your string to a file and launch a C++ compiler with a system call.
The first option, linking against a C++ compiler, is quite doable with something like Visual Studio, which does have DLLs for compiling C++ and producing a new DLL that you can load at runtime.
The second option is pretty much what any IDE does: it saves the contents of the text editor to a C++ file, compiles it by invoking the compiler, and runs the output.
That said, there are many languages with a built-in interpreter that would be more suitable for runtime code interpretation.

Not directly, as you're asking for C++ to be simultaneously compiled and interpreted.
But there is LLVM, which is a compiler framework and API. It would allow you to take, in this case, a string containing valid C++, invoke the LLVM infrastructure on it, and then run the result with an LLVM-based just-in-time compiler, as described at length here. Keep in mind that you must also support the C++ library. You should also have some mechanism to map variables into your interpreted C++ and take data back out.
A big but worthy undertaking. It seems likely that someone has done something like this already, and maybe Cling is just that.

Use the Dynamic Linking Loader (POSIX only)
This has been tested in Linux and OSX.
#include<fstream>
#include<string>
#include<cstdlib>
#include<dlfcn.h>
void do_code( std::string x ) {
{ std::ofstream s("temp.cc");
s << "#include<iostream>\nextern \"C\" void f(){" << x << '}'; }
std::system( "g++ temp.cc -shared -fPIC -otemp.o" );
auto h = dlopen( "./temp.o", RTLD_LAZY );
reinterpret_cast< void(*)() >( dlsym( h, "f" ) )();
dlclose( h );
}
int main() {
std::string x = "for(int i = 0; i < 10; i++){std::cout << \"Hello World!\\n\";}";
do_code( x );
}
Try it online! You'll need to compile with the -ldl option to link against libdl. Don't copy-paste this into production code, as it has no error checking.

Works for me:
system("echo \"#include <iostream> \nint main() { for(int i = 0; i < 10; i++){std::cout << i << std::endl;} }\" >temp.cc; g++ -o temp temp.cc && ./temp");

Related

AmigaShell C++ (m68k-amigaos-g++) and command line arguments

I have tried working with the command line arguments in a small C++ program on Amiga 1200 (Workbench 3.1.4).
I used bebbo’s cross-compiler g++ (m68k-amigaos-g++) (see https://github.com/bebbo/amiga-gcc) to compile a simple CLI app that just outputs its arguments. While it works fine when compiled with 'normal' g++ on Windows, it failed in AmigaShell, both in the Amiga Forever emulator and on a real Amiga 1200.
I have found on some forums that the preprocessor symbol __stdargs should be used, which as I understand it instructs the compiler to generate code as if the function were called with its parameters passed on the stack rather than in registers. Is that understanding correct?
Is it normal that the Amiga (and g++) use registers by default, and that this needs to be overridden for AmigaShell? I added __stdargs to the main() function. Anyway, that did not help.
Next, I read, again on some forum, that the -mcrt parameter has to be used when the compiler output is linked. I have struggled to find the purpose of the parameter. It seems to specify which standard C library (similar to glibc) is linked? According to Google, the possible variants of the parameter are -mcrt=nix13, -mcrt=nix20, and -mcrt=clib2 (see e.g. https://github.com/adtools/libnix).
The only one that worked was nix20 (nix13 did not link, and clib2 linked but the program did not work on the Amiga). Why do we need the standard C library in the first place?
I used it like this: m68k-amigaos-g++ args.o -mcrt=nix20 -o args and it finally worked.
Can anybody describe to me as a newbie a bit more background details of all this?
Here is my test program:
#include <iostream>
using std::cout;
#if defined (__AMIGA__)
#define MAIN_FNC __stdargs
#else
#define MAIN_FNC
#endif
MAIN_FNC int main( int argc, char *argv[] )
{
cout << "Arguments count:" << argc << " \n";
for ( int i = 0; i < argc; i ++ )
cout << i << ". [" << argv[i] << "]\n";
return 0;
}
You don't need any MAIN_FNC; remove it. You also don't need to play with -mcrt=xxx. Just link with the -noixemul option.
m68k-amigaos-g++ args.o -noixemul -o args
By default ixemul.library is used/linked (in short and very simply, ixemul.library emulates some Unix behavior, see here). That causes your problem.
More info about -noixemul related to gcc & AmigaOS here:
GCC and ixemul.library

How can I undo compilation of a file overwriting another file?

I was trying to compile my C++ program with the command g++ program.cpp -o program in order to create an executable with the same name as the program, but I used g++ program.cpp -o program.cpp instead, and now my code looks like gibberish. Is there any way to reverse it?
Reversing this process is semi-possible, but I do not recommend it. It is easier to just rewrite the program, as compilation loses a ton of information. Compiling with -g would not have saved you either: the debug information records file names, line numbers and symbol names, not the source text itself.
If you have to reverse it you need to use a decompiler such as Cutter/Ghidra, but the result will need to be cleaned up. Example:
#include <iostream>
#include <string>
int main(){
std::string a = "Hey";
a += '\n';
std::cout << a;
return 0;
}
Compiling with g++ test.cc -o test, and then decompiling with Ghidra, gets us:
undefined8 main(void){
long in_FS_OFFSET;
allocator<char> local_49;
basic_string<char,std::char_traits<char>,std::allocator<char>> local_48 [40];
long local_20;
local_20 = *(long *)(in_FS_OFFSET + 0x28);
std::allocator<char>::allocator();
/* try { // try from 001012a3 to 001012a7 has its CatchHandler # 001012fc */
std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>::basic_string
((char *)local_48,(allocator *)&DAT_00102005);
std::allocator<char>::~allocator(&local_49);
/* try { // try from 001012c2 to 001012d9 has its CatchHandler # 0010131a */
std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>::operator+=
(local_48,"\n");
std::operator<<((basic_ostream *)std::cout,(basic_string *)local_48);
std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>::~basic_string
(local_48);
if (local_20 != *(long *)(in_FS_OFFSET + 0x28)) {
/* WARNING: Subroutine does not return */
__stack_chk_fail();
}
return 0;
}
Pretty far gone, huh. The original program was only a few lines, but this looks like a monstrosity. By knowing what to look for you can ignore things the compiler placed there to be safe, like __stack_chk_fail() and anything that involves in_FS_OFFSET or local_20. But it is extremely laborious in general. With some code it might be easier, but parts of it might be optimised away etc., so it is generally not recommended.
Note: I did not paste it here but at DAT_00102005, the string "Hey" is contained.

Where std::generator is in MSVC 2019? [duplicate]

This MCVE works fine in Visual Studio.
#include <experimental/generator>
#include <iostream>
std::experimental::generator<int> f() { for (int i = 0; i < 10; ++i) co_yield i; }
int main ()
{
for (int i : f())
std::cout << i << ' ';
return 0;
}
but in g++10, which is listed as having full support of C++20's coroutines, it does not.
(Taking out experimental doesn't help.)
I am compiling thus: g++ -g -std=c++2a -fcoroutines -c main.cpp.
It complains that there is no include file generator, and if I take out the #include, that generator is not a part of std:: or is not defined. I suppose there's another name for it in the new standard? Or if not, what do I do instead to get a coroutine that uses co_yield?
Nothing in GCC's status list alongside its coroutine support says it supports anything other than p0912r5, which does not provide std::generator, experimentally or otherwise.
I recall that VS added <experimental/generator> a few years ago; I guess GCC never did.
If it's currently proposed for inclusion in C++, and you can find the relevant proposal, perhaps you can track its support status. But honestly, for now, you'd be better off writing your own that works until it becomes part of some actual standard.
tl;dr: Though it is a coroutine, this feature is not part of the Coroutines TS.
If you need a generator for g++11 and above, copy paste the one here:
https://en.cppreference.com/w/cpp/coroutine/coroutine_handle

Make clang's Memory Sanitizer report uninitialised variable use without it deciding branching

I was experimenting with Clang 6.0's Memory Sanitizer(MSan).
Code is compiled with
clang++ memsans.cpp -std=c++14 -o memsans -g -fsanitize=memory -fno-omit-frame-pointer -Weverything
on Ubuntu 18.04. As per the MSan documentation
It will tolerate copying of uninitialized memory, and also simple
logic and arithmetic operations with it. In general, MemorySanitizer
silently tracks the spread of uninitialized data in memory, and
reports a warning when a code branch is taken (or not taken) depending
on an uninitialized value.
So the following code does not generate any error
#include <iostream>
class Test {
public:
int x;
};
int main() {
Test t;
std::cout << t.x;
std::cout << std::endl;
return 0;
}
But this will
#include <iostream>
class Test {
public:
int x;
};
int main() {
Test t;
if(t.x) {
std::cout << t.x;
}
std::cout << std::endl;
return 0;
}
Ideally one would like both of these code samples to generate some sort of error since both are "using" an uninitialised variable in the sense that the first one is printing it. This code is a small test code and hence the error in the first code is obvious, however if it were a large codebase with a similar error, MSan would totally miss this. Is there any hack to force MSan to report this type of error as well ?
It sounds like your C++ library wasn't built with MSan. Unlike ASan and UBSan, MSan requires that the whole program was built with msan enabled. Think of it like having a different ABI, you shouldn't link two programs built with different msan settings. The one exception is libc for which msan adds "interceptors" to make it work.
If you write your own code which you want to integrate with msan by reporting an error where msan normally wouldn't (say, in a function which makes a copy but you know the data needs to be initialized) then you can use __msan_check_mem_is_initialized from the msan_interface.h file: https://github.com/llvm-mirror/compiler-rt/blob/master/include/sanitizer/msan_interface.h
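A minimal sketch of such an explicit check follows. Only __msan_check_mem_is_initialized and the header <sanitizer/msan_interface.h> come from the sanitizer interface; the macro CHECK_INITIALIZED and the function checked_read are hypothetical names invented for this illustration. The check is written so that it compiles to nothing when the program is not built with -fsanitize=memory.

```cpp
#include <cstddef>

// Detect MemorySanitizer; __has_feature is a Clang extension.
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define CHECK_INITIALIZED(ptr, size) __msan_check_mem_is_initialized((ptr), (size))
#  endif
#endif
#ifndef CHECK_INITIALIZED
#  define CHECK_INITIALIZED(ptr, size) ((void)0)  // no-op without MSan
#endif

struct Test {
    int x;
};

// Hypothetical helper: asks MSan to verify t.x before handing it out,
// instead of waiting until some later branch depends on the value.
int checked_read(const Test& t) {
    CHECK_INITIALIZED(&t.x, sizeof t.x);
    return t.x;
}
```

In an MSan build, printing checked_read(t) for a default-initialised Test should then be reported at the check itself, whereas a plain std::cout << t.x may pass silently.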

Is it possible to create a function dynamically, during runtime in C++?

C++ is a static, compiled language, templates are resolved during compile time and so on...
But is it possible to create a function during runtime, that is not described in the source code and has not been converted to machine language during compilation, so that a user can throw at it data that has not been anticipated in the source?
I am aware this cannot happen in a straightforward way, but surely it must be possible, there are plenty of programming languages that are not compiled and create that sort of stuff dynamically that are implemented in either C or C++.
Maybe if factories for all primitive types are created, along with suitable data structures to organize them into more complex objects such as user types and functions, this is achievable?
Any info on the subject as well as pointers to online materials are welcome. Thanks!
EDIT: I am aware it is possible, it is more like I am interested in implementation details :)
Yes, of course, and without any of the tools mentioned in the other answers: simply use the C++ compiler.
Just follow these steps from within your C++ program (on Linux, but it must be similar on other OSes):
write a C++ program into a file (e.g. in /tmp/prog.cc), using an ofstream
compile the program via system("c++ /tmp/prog.cc -o /tmp/prog.so -shared -fPIC");
load the program dynamically, e.g. using dlopen()
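You can also hand raw machine code directly to the CPU by casting a pointer to the bytes to a function pointer type and calling it, as sketched below. The bytes must form a complete, valid function for the target CPU and must live in executable memory, which an ordinary array usually does not.
e.g.
unsigned char func[] = { 0x90, 0x0f, 0x1 }; // illustrative bytes only, not a complete function
reinterpret_cast<void(*)()>(func)(); // cast to a function pointer and call it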
Yes, JIT compilers do it all the time. They allocate a piece of memory that has been given special execution rights by the OS, then fill it with code and cast the pointer to a function pointer and execute it. Pretty simple.
EDIT: Here's an example on how to do it in Linux: http://burnttoys.blogspot.de/2011/04/how-to-allocate-executable-memory-on.html
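The steps described above (get memory from the OS, fill it with code, make it executable, cast and call) can be sketched as follows for POSIX systems on x86-64. This is a sketch under assumptions, not a portable implementation: the function name call_generated_square is hypothetical, and the five bytes encode mulsd xmm0,xmm0 followed by ret, i.e. a function that squares its double argument. The memory is written first and only then switched to read+execute with mprotect(), so it is never writable and executable at the same time.

```cpp
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Copies a tiny machine-code function into fresh memory, makes that memory
// executable and calls it. Returns false if the platform is not x86-64 or
// if the OS refuses the mapping or the protection change.
bool call_generated_square(double x, double& result) {
#if defined(__x86_64__)
    static const unsigned char code[] = {
        0xf2, 0x0f, 0x59, 0xc0,  // mulsd xmm0,xmm0
        0xc3                     // ret
    };
    void* buf = mmap(nullptr, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (buf == MAP_FAILED) return false;

    std::memcpy(buf, code, sizeof code);

    // Drop write permission and add execute permission.
    if (mprotect(buf, sizeof code, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, sizeof code);
        return false;
    }

    auto fn = reinterpret_cast<double (*)(double)>(buf);
    result = fn(x);

    munmap(buf, sizeof code);
    return true;
#else
    (void)x;
    (void)result;
    return false;  // the bytes above are only meaningful on x86-64
#endif
}
```

A real JIT compiler does the same thing at a larger scale, generating the bytes from some intermediate representation instead of a hard-coded array.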
Below is an example of C++ runtime compilation based on the method mentioned before (write code to an output file, compile via system(), load via dlopen() and dlsym()). See also the example in a related question. The difference here is that it dynamically compiles a class rather than a function. This is achieved by adding a C-style maker() function to the code to be compiled dynamically. References:
https://www.linuxjournal.com/article/3687
http://www.tldp.org/HOWTO/C++-dlopen/thesolution.html
The example only works under Linux (Windows has LoadLibrary and GetProcAddress functions instead), and requires the identical compiler to be available on the target machine.
baseclass.h
#ifndef BASECLASS_H
#define BASECLASS_H
class A
{
protected:
double m_input; // or use a pointer to a larger input object
public:
virtual double f(double x) const = 0;
void init(double input) { m_input=input; }
virtual ~A() {};
};
#endif /* BASECLASS_H */
main.cpp
#include "baseclass.h"
#include <cstdlib> // EXIT_FAILURE, etc
#include <string>
#include <iostream>
#include <fstream>
#include <dlfcn.h> // dynamic library loading, dlopen() etc
#include <sys/wait.h> // WEXITSTATUS
#include <memory> // std::shared_ptr
// compile code, instantiate class and return pointer to base class
// https://www.linuxjournal.com/article/3687
// http://www.tldp.org/HOWTO/C++-dlopen/thesolution.html
// https://stackoverflow.com/questions/11016078/
// https://stackoverflow.com/questions/10564670/
std::shared_ptr<A> compile(const std::string& code)
{
// temporary cpp/library output files
std::string outpath="/tmp";
std::string headerfile="baseclass.h";
std::string cppfile=outpath+"/runtimecode.cpp";
std::string libfile=outpath+"/runtimecode.so";
std::string logfile=outpath+"/runtimecode.log";
std::ofstream out(cppfile.c_str(), std::ofstream::out);
// copy required header file to outpath
std::string cp_cmd="cp " + headerfile + " " + outpath;
system(cp_cmd.c_str());
// add necessary header to the code
std::string newcode = "#include \"" + headerfile + "\"\n\n"
+ code + "\n\n"
"extern \"C\" {\n"
"A* maker()\n"
"{\n"
" return (A*) new B(); \n"
"}\n"
"} // extern C\n";
// output code to file
if(!out) { // also catches a failed open, which bad() does not report
std::cout << "cannot open " << cppfile << std::endl;
exit(EXIT_FAILURE);
}
out << newcode;
out.flush();
out.close();
// compile the code
std::string cmd = "g++ -Wall -Wextra " + cppfile + " -o " + libfile
+ " -O2 -shared -fPIC > " + logfile + " 2>&1"; // portable redirect; "&>" is bash-only
int ret = system(cmd.c_str());
if(WEXITSTATUS(ret) != EXIT_SUCCESS) {
std::cout << "compilation failed, see " << logfile << std::endl;
exit(EXIT_FAILURE);
}
// load dynamic library
void* dynlib = dlopen (libfile.c_str(), RTLD_LAZY);
if(!dynlib) {
std::cerr << "error loading library:\n" << dlerror() << std::endl;
exit(EXIT_FAILURE);
}
// loading symbol from library and assign to pointer
// (to be cast to function pointer later)
void* create = dlsym(dynlib, "maker");
const char* dlsym_error=dlerror();
if(dlsym_error != NULL) {
std::cerr << "error loading symbol:\n" << dlsym_error << std::endl;
exit(EXIT_FAILURE);
}
// execute "create" function
// (casting to function pointer first)
// https://stackoverflow.com/questions/8245880/
A* a = reinterpret_cast<A*(*)()> (create)();
// cannot close dynamic lib here, because all functions of the class
// object will still refer to the library code
// dlclose(dynlib);
return std::shared_ptr<A>(a);
}
int main(int argc, char** argv)
{
double input=2.0;
double x=5.1;
// code to be compiled at run-time
// class needs to be called B and derived from A
std::string code = "class B : public A {\n"
" double f(double x) const \n"
" {\n"
" return m_input*x;\n"
" }\n"
"};";
std::cout << "compiling.." << std::endl;
std::shared_ptr<A> a = compile(code);
a->init(input);
std::cout << "f(" << x << ") = " << a->f(x) << std::endl;
return EXIT_SUCCESS;
}
output
$ g++ -Wall -std=c++11 -O2 -c main.cpp -o main.o # c++11 required for std::shared_ptr
$ g++ main.o -ldl -o main
$ ./main
compiling..
f(5.1) = 10.2
Have a look at libtcc; it is simple, fast, reliable and suits your need. I use it whenever I need to compile C functions "on the fly".
In the archive, you will find the file examples/libtcc_test.c, which can give you a good head start.
This little tutorial might also help you: http://blog.mister-muffin.de/2011/10/22/discovering-tcc/
#include <stdlib.h>
#include <stdio.h>
#include "libtcc.h"
int add(int a, int b) { return a + b; }
char my_program[] =
"int fib(int n) {\n"
" if (n <= 2) return 1;\n"
" else return fib(n-1) + fib(n-2);\n"
"}\n"
"int foobar(int n) {\n"
" printf(\"fib(%d) = %d\\n\", n, fib(n));\n"
" printf(\"add(%d, %d) = %d\\n\", n, 2 * n, add(n, 2 * n));\n"
" return 1337;\n"
"}\n";
int main(int argc, char **argv)
{
TCCState *s;
int (*foobar_func)(int);
void *mem;
s = tcc_new();
tcc_set_output_type(s, TCC_OUTPUT_MEMORY);
tcc_compile_string(s, my_program);
tcc_add_symbol(s, "add", add);
mem = malloc(tcc_relocate(s, NULL));
tcc_relocate(s, mem);
foobar_func = tcc_get_symbol(s, "foobar");
tcc_delete(s);
printf("foobar returned: %d\n", foobar_func(32));
free(mem);
return 0;
}
Ask questions in the comments if you meet any problems using the library!
In addition to simply using an embedded scripting language (Lua is great for embedding) or writing your own compiler for C++ to use at runtime, if you really want to use C++ you can just use an existing compiler.
For example, Clang is a C++ compiler built as libraries that can be easily embedded in another program. It was designed to be used from programs like IDEs that need to analyze and manipulate C++ source in various ways, but because it uses the LLVM compiler infrastructure as a backend, it can also generate code at runtime and hand you a function pointer that you can call to run the generated code.
Clang
LLVM
Essentially you will need to write a C++ compiler within your program (not a trivial task), and do the same thing JIT compilers do to run the code. You were actually 90% of the way there with this paragraph:
I am aware this cannot happen in a straightforward way, but surely it
must be possible, there are plenty of programming languages that are
not compiled and create that sort of stuff dynamically that are
implemented in either C or C++.
Exactly--those programs carry the interpreter with them. You run a python program by saying python MyProgram.py--python is the compiled C code that has the ability to interpret and run your program on the fly. You would need to do something along those lines, but using a C++ compiler.
If you need dynamic functions that badly, use a different language :)
A typical approach for this is to combine a C++ project (or whatever language it's written in) with a scripting language.
Lua is one of the top favorites, since it's well documented, small, and has bindings for a lot of languages.
But if you are not looking in that direction, perhaps you could think of making use of dynamic libraries?
Yes - you can write a compiler for C++, in C++, with some extra features - write your own functions, compile and run automatically (or not)...
Have a look at ExpressionTrees in .NET - I think this is basically what you want to achieve. Create a tree of subexpressions and then evaluate them. In an object-oriented fashion, each node in the tree knows how to evaluate itself by recursing into its subnodes. Your visual language would then create this tree, and you can write a simple interpreter to execute it.
Also, check out Ptolemy II, as an example in Java on how such a visual programming language can be written.
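The same idea can be sketched in C++ rather than .NET. This is an illustrative sketch only: all type and function names (Node, Constant, Variable, Binary, num, var, bin, Env) are made up for this example. The tree is assembled at runtime, and each node evaluates itself by recursing into its children, so a "function" that was never in the source code can be built from user input and then called.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Variable bindings supplied at evaluation time.
using Env = std::map<std::string, double>;

struct Node {
    virtual ~Node() = default;
    virtual double eval(const Env& env) const = 0;
};
using NodePtr = std::unique_ptr<Node>;

struct Constant : Node {
    double value;
    explicit Constant(double v) : value(v) {}
    double eval(const Env&) const override { return value; }
};

struct Variable : Node {
    std::string name;
    explicit Variable(std::string n) : name(std::move(n)) {}
    // at() throws std::out_of_range if the variable is not bound
    double eval(const Env& env) const override { return env.at(name); }
};

struct Binary : Node {
    char op;
    NodePtr lhs, rhs;
    Binary(char o, NodePtr l, NodePtr r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    double eval(const Env& env) const override {
        const double a = lhs->eval(env);
        const double b = rhs->eval(env);
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
        }
        throw std::invalid_argument("unknown operator");
    }
};

// Small builder helpers so trees can be assembled at runtime.
NodePtr num(double v) { return NodePtr(new Constant(v)); }
NodePtr var(std::string n) { return NodePtr(new Variable(std::move(n))); }
NodePtr bin(char op, NodePtr l, NodePtr r) {
    return NodePtr(new Binary(op, std::move(l), std::move(r)));
}
```

For instance, bin('*', var("m_input"), var("x")) builds the function m_input*x, and calling eval() on it with an Env that binds both names computes the product. A parser or a visual editor would produce such trees from user input; this is interpretation rather than compilation, which trades speed for simplicity.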
You could take a look at Runtime Compiled C++ (or see RCC++ blog and videos), or perhaps try one of its alternatives.
Expanding on Jay's answer using opcodes, the below works on Linux.
Learn opcodes from your compiler:
write own myfunc.cpp, e.g.
double f(double x) { return x*x; }
compile with
$ g++ -O2 -c myfunc.cpp
disassemble function f
$ gdb -batch -ex "file ./myfunc.o" -ex "set disassembly-flavor intel" -ex "disassemble/rs f"
Dump of assembler code for function _Z1fd:
0x0000000000000000 <+0>: f2 0f 59 c0 mulsd xmm0,xmm0
0x0000000000000004 <+4>: c3 ret
End of assembler dump.
This means the function x*x in assembly is mulsd xmm0,xmm0, ret and in machine code f2 0f 59 c0 c3.
Write your own function in machine code:
opcode.cpp
#include <cstdlib> // EXIT_FAILURE etc
#include <cstdio> // printf(), fopen() etc
#include <cstring> // memcpy()
#include <sys/mman.h> // mmap()
// allocate memory and fill it with machine code instructions
// returns pointer to memory location and length in bytes
void* gencode(size_t& length)
{
// machine code
unsigned char opcode[] = {
0xf2, 0x0f, 0x59, 0xc0, // mulsd xmm0,xmm0
0xc3 // ret
};
// allocate memory which allows code execution
// https://en.wikipedia.org/wiki/NX_bit
void* buf = mmap(NULL,sizeof(opcode),PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON,-1,0);
if(buf==MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); } // mapping refused
// copy machine code to executable memory location
memcpy(buf, opcode, sizeof(opcode));
// return: pointer to memory location with executable code
length = sizeof(opcode);
return buf;
}
// print the disassembly of buf
void print_asm(const void* buf, size_t length)
{
FILE* fp = fopen("/tmp/opcode.bin", "w");
if(fp!=NULL) {
fwrite(buf, length, 1, fp);
fclose(fp);
}
system("objdump -D -M intel -b binary -mi386 /tmp/opcode.bin");
}
int main(int, char**)
{
// generate machine code and point myfunc() to it
size_t length;
void* code=gencode(length);
double (*myfunc)(double); // function pointer
myfunc = reinterpret_cast<double(*)(double)>(code);
double x=1.5;
printf("f(%f)=%f\n", x,myfunc(x));
print_asm(code,length); // for debugging
return EXIT_SUCCESS;
}
compile and run
$ g++ -O2 opcode.cpp -o opcode
$ ./opcode
f(1.500000)=2.250000
/tmp/opcode.bin: file format binary
Disassembly of section .data:
00000000 <.data>:
0: f2 0f 59 c0 mulsd xmm0,xmm0
4: c3 ret
The simplest solution available, if you're not looking for performance, is to embed a scripting language interpreter, e.g. for Lua or Python.
It worked for me like this. You have to use the -fpermissive flag.
I am using CodeBlocks 17.12.
#include <cstddef>
using namespace std;
int main()
{
char func[] = {'\x90', '\x0f', '\x1'};
void (*func2)() = reinterpret_cast<void*>(&func);
func2();
return 0;
}