I have an address with some deployed modules. Is there a way to translate the bytecode/ABI info into a .move file so that it is human-readable?
By default, the source for a Move module is included when it is published on chain. If that is the case, you can download it like this:
aptos move download --account 6286dfd5e2778ec069d5906cd774efdba93ab2bec71550fa69363482fbd814e7 --package other
See this answer for much more information on the topic: How do I call a function in a different Move module / smart contract?
As for converting from bytecode to source, this is technically possible, but we have no tooling to support it today. Stay posted!
Is there a way to securely ship code written in Clojure without (or while minimizing) the risk of it being decompiled and accessed?
Are jar files generated with uberjar, safe enough to pass around?
Thanks heaps!
If you can run code on a web server that is only accessed over a network by those who use your code, then as long as you keep that server secure, it does not matter whether the server has the source code or not.
It is possible to deploy JAR files that contain Clojure source code: the Clojure compiler on the computer where the JAR is deployed then compiles that source to JVM bytecode soon after the JVM process starts. You can run 'unzip -v foo.jar' to see a list of the file names within a JAR file; any with a suffix such as '.clj', '.cljs', or '.cljc' are likely Clojure source code.
If any files in the JAR have names ending in '.class', those are Java class files containing JVM bytecode. You can run a decompiler on most such files and often get back syntactically legal Java source code that behaves the same as the original Clojure code. See, e.g., https://github.com/clojure-goes-fast/clj-java-decompiler, or search for 'java decompiler' to find many other such tools.
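To make the file-name rule concrete, here is a small C++ sketch that classifies JAR entry names (for example, as printed by `unzip -l foo.jar`) by suffix. `EntryKind`, `endsWith` and `classifyJarEntry` are hypothetical helpers. A suffix is only a hint: it does not tell you whether a `.class` file was compiled from Clojure or from Java.

```cpp
#include <cassert>
#include <string>

// Hypothetical categories for entries listed inside a JAR.
enum class EntryKind { ClojureSource, JvmClassFile, Other };

// True if `name` ends with `suffix`.
bool endsWith(const std::string& name, const std::string& suffix) {
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Classify a JAR entry by its file-name suffix only; contents are not inspected.
EntryKind classifyJarEntry(const std::string& name) {
  static const char* const kClojureSuffixes[] = {".clj", ".cljs", ".cljc"};
  for (const char* ext : kClojureSuffixes)
    if (endsWith(name, ext)) return EntryKind::ClojureSource;
  if (endsWith(name, ".class")) return EntryKind::JvmClassFile;
  return EntryKind::Other;
}
```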
If you search for terms like 'java byte code obfuscation' you can probably find tools that claim to provide some level of scrambling of the names and/or functionality of JVM bytecode. I do not know how effective they are.
In general, a contract with a party that has something to lose under it, or that has more important things to do than reverse engineer your code, is surer protection against reverse engineering than technical methods.
I'm developing a new language in LLVM using the C++ API which compiles down to target the C ABI.
I would like to support modular compilation by allowing end users to build what are effectively static libraries. I noticed the LLVM C++ API has an llvm::Linker class that I can use during compilation to combine source files (llvm::Module). However, I want to guarantee library compatibility across separate compilation runs, via metadata version numbers or at least via the publicly exposed interface.
Much of the information available on metadata in LLVM (the LLVM documentation, the LLVM blog, and the IntrinsicsMetadataAttributes PDF) suggests that it should only be used for extended information that would not break correctness if silently removed.
I don't think this is a deal breaker, since it could be global metadata, but it would be good to get a second opinion on that point.
I also know there is a parseIRFile function in IRReader, so I can load previously built .bc files. I'm curious whether it would be reasonable practice to store size and CRC information and compare it when loading these files.
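For the size-and-CRC idea, a stamp could be as simple as this C++ sketch: a plain bitwise CRC-32 (the common zlib/IEEE polynomial) plus the byte count, recorded when a .bc file is written and compared when it is read back. `LibraryStamp`, `stampFor` and `matchesStamp` are hypothetical names. Keep in mind that a checksum only tells you the bytes changed; it says nothing about whether a rebuilt library is still interface-compatible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// CRC-32 with the reflected IEEE/zlib polynomial 0xEDB88320,
// computed bit by bit for clarity rather than speed.
std::uint32_t crc32(const unsigned char* data, std::size_t len) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Hypothetical stamp recorded alongside a library's .bc file.
struct LibraryStamp {
  std::uint64_t size;
  std::uint32_t crc;
};

LibraryStamp stampFor(const std::string& bytes) {
  return {bytes.size(),
          crc32(reinterpret_cast<const unsigned char*>(bytes.data()), bytes.size())};
}

// On load: do the cheap size check first, then the checksum.
bool matchesStamp(const std::string& bytes, const LibraryStamp& expected) {
  if (bytes.size() != expected.size) return false;
  return stampFor(bytes).crc == expected.crc;
}
```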
My language has concepts similar to C#, including interfaces. I figure I could allow modular compilation by importing/exporting an interface type along with external functions (much like C++, I don't restrict the language to methods of classes).
This approach allows me to include language specific information in the interface without needing to encode it in the IR as both the library and the calling code would be required to build with the interface. This again requires the interfaces to be compatible.
One language feature that would require extended information would be named parameters in functions.
My language is very type-safe and also mandates named parameters so there is no predetermined function parameter order. This allows call sites to be more explicit, the compiler to catch erroneous parameter usage, and authors have more liberty in determining default parameters as they are not restricted to the last parameters to the function.
The compiler will need to know names, modifiers, defaults, etc. of these parameters to correctly map calls at compile time, so I figure the interface approach would work well here.
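A rough C++17 sketch of that call-site mapping: named arguments are matched against the declared parameter list from the interface, defaults fill the gaps, and unknown or missing names are rejected. `ParamInfo` and `bindNamedArgs` are hypothetical, and argument expressions are plain strings to keep it short. Because the call-site arguments sit in a `std::map`, duplicate names would have to be caught earlier, by the parser.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one declared parameter, as a compiler might
// read it from an exported interface.
struct ParamInfo {
  std::string name;
  std::optional<std::string> defaultExpr;  // empty if the argument is required
};

// Map named call-site arguments onto the declared parameter order.
// Returns the argument expressions in declaration order, or std::nullopt if
// a required parameter is missing or a name matches no parameter.
std::optional<std::vector<std::string>> bindNamedArgs(
    const std::vector<ParamInfo>& params,
    const std::map<std::string, std::string>& callArgs) {
  std::vector<std::string> ordered;
  std::size_t used = 0;
  for (const ParamInfo& p : params) {
    auto it = callArgs.find(p.name);
    if (it != callArgs.end()) {
      ordered.push_back(it->second);
      ++used;
    } else if (p.defaultExpr) {
      ordered.push_back(*p.defaultExpr);
    } else {
      return std::nullopt;  // required parameter not supplied
    }
  }
  // Any call-site name left unmatched does not name a parameter.
  if (used != callArgs.size()) return std::nullopt;
  return ordered;
}
```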
TL;DR
Does LLVM have any predefined facilities for building static libraries?
Are version numbers, sizes, and CRCs reasonable use cases for LLVM's metadata?
This is probably not QUITE an answer... Or at least not a complete answer.
I like this question, as I'm going to need a solution too at some point in the next few months or years, for my Pascal compiler. It supports "units", which are meant to be separately compiled objects, but currently I simply drag in the source file and compile it into the main llvm::Module. That's neither efficient nor flexible (I can't use the linker to choose between the "Linux" and "Windows" versions of some code, for example; not that I think there is even a 5% chance that my compiler will work on Windows without modification anyway...).
However, I'm not sure storing the "object" file as LLVM IR would be the right thing to do. I was thinking that a better way would be to store your AST in some serialized form. Then:
- you don't depend on LLVM versions changing the IR format;
- you can add whatever metadata you like;
- there won't be much difference between generating LLVM IR from this during your link phase and building the IR at compile time and then reading it back to check that the metadata is correct. [The slow part, as you may have already found out, is the optimisation and MC generation, and you'd still have to do that either way.]
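As a rough illustration of the serialized-interface idea, here is a C++17 sketch of a tiny versioned, line-oriented format for exported functions. The `MYLANG-IFACE` magic string, `ExportedFunction`, and both functions are made up for this example, and names are assumed to contain no whitespace; a real compiler would serialize its own AST or interface nodes. The point is that the version check lives in a format you control, independent of the LLVM IR version.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical exported-function record.
struct ExportedFunction {
  std::string name;
  std::vector<std::string> paramNames;
};

// Bump this whenever the on-disk layout changes.
constexpr int kInterfaceFormatVersion = 1;

// Header line "MYLANG-IFACE <version>", then one line per function:
// "<name> <paramCount> <param>...".
std::string serializeInterface(const std::vector<ExportedFunction>& fns) {
  std::ostringstream out;
  out << "MYLANG-IFACE " << kInterfaceFormatVersion << '\n';
  for (const auto& f : fns) {
    out << f.name << ' ' << f.paramNames.size();
    for (const auto& p : f.paramNames) out << ' ' << p;
    out << '\n';
  }
  return out.str();
}

// Parse the format back, rejecting unknown magic, unknown versions,
// and malformed records.
std::optional<std::vector<ExportedFunction>> parseInterface(const std::string& text) {
  std::istringstream in(text);
  std::string magic;
  int version = 0;
  if (!(in >> magic >> version) || magic != "MYLANG-IFACE" ||
      version != kInterfaceFormatVersion)
    return std::nullopt;
  std::vector<ExportedFunction> fns;
  ExportedFunction f;
  std::size_t count = 0;
  while (in >> f.name >> count) {
    f.paramNames.assign(count, "");
    for (auto& p : f.paramNames)
      if (!(in >> p)) return std::nullopt;
    fns.push_back(f);
  }
  if (!in.eof()) return std::nullopt;  // stopped on bad data, not end of input
  return fns;
}
```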
As I said at the start, I'm not sure this is an answer, but it's my thoughts so far on the subject. Now I'll go back to adding debug symbol support to my Pascal compiler... Before Christmas, I couldn't see the source in GDB. Now I can step through it, but I can't view variables yet...
Aren't shaders cool? You can toss in a plain string and, as long as it is valid source, it will compile, link and execute. I was wondering if there is a way to embed GCC inside a user application so that it is "self-sufficient", i.e. has the internal capability to compile native binaries compatible with itself.
So far I've been invoking a standalone GCC in a separate process started from inside the application, but I was wondering if there is some API that would let me use it "directly" rather than as a standalone compiler. Also, if it is possible, is it permitted?
EDIT: Although the original question was about GCC, I'd settle for information on how to embed LLVM/Clang too.
And now a special edit for people who cannot put 2 + 2 together: The question asks about how to embed GCC or Clang inside of an executable in a way that allows an internal API to be used from code rather than invoking compilation from a command prompt.
I'd add +1 to the suggestion to use Clang/LLVM instead of GCC. A few good reasons why:
it is more modular and flexible
compilation time can be substantially lower than with GCC
it supports the platforms you listed in the comments
it has an API that can be used internally
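```cpp
// Sketch for the LLVM/Clang 17 C++ API; Driver/diagnostics signatures vary by release.
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include "clang/Basic/Diagnostic.h"
#include "clang/Basic/DiagnosticOptions.h"
#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"
#include "clang/Frontend/TextDiagnosticPrinter.h"
#include "llvm/Support/Program.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/TargetParser/Host.h"
int main() {
  std::string source = "app.c", target = "app";
  // The driver needs the path of a clang binary as argv[0].
  llvm::ErrorOr<std::string> clangPath = llvm::sys::findProgramByName("clang");
  if (!clangPath) return 1;
  std::vector<const char *> args = {clangPath->c_str(), source.c_str(), "-o",
                                    target.c_str(), "-lcurl"};
  // Print diagnostics to stderr; the engine takes ownership of the printer.
  llvm::IntrusiveRefCntPtr<clang::DiagnosticOptions> diagOpts(new clang::DiagnosticOptions());
  auto *diagClient = new clang::TextDiagnosticPrinter(llvm::errs(), diagOpts.get());
  llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> diagID(new clang::DiagnosticIDs());
  clang::DiagnosticsEngine diags(diagID, diagOpts, diagClient);
  clang::driver::Driver theDriver(args[0], llvm::sys::getDefaultTargetTriple(), diags);
  std::unique_ptr<clang::driver::Compilation> c(theDriver.BuildCompilation(args));
  if (!c) return 1;
  llvm::SmallVector<std::pair<int, const clang::driver::Command *>, 4> failing;
  int res = theDriver.ExecuteCompilation(*c, failing);
  // A negative result (e.g. a crashed job) warrants a crash-diagnostics report.
  for (const auto &f : failing)
    if (f.first < 0) theDriver.generateCompilationDiagnostics(*c, *f.second);
  return res;
}
```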
Yes, it is possible; QEMU, for example, does it.
I don't have any personal experience in this field, but from what I've read, it seems that LLVM might be better suited for embedding and extending than GCC.
An older list of C++ compilers and interpreters is available at http://www.thefreecountry.com/compilers/cpp.shtml.
The usual answer for a "self-sufficient" application is a good language interpreter. There are many of them out there, and many compile code into bytecode files. The Lua interpreter is very popular and easy to embed; even some big players use it.
Years ago there was also an open-source C++ interpreter with great language compatibility whose name started with F..; I don't remember the rest of the name. There are also many other tools able to produce native binaries (e.g. Free Pascal).
The choice of language and target platform depends on your intentions. What would the "self-sufficiency" be good for? Who will write those libraries? Once that is clear, use Google; there is a whole wilderness of tools out there. One of the latest beasts is the open-sourced C# compiler "Roslyn".
EDIT
If you need a C compiler (since you generate a subset of C) that can be "embedded", you are probably looking for a "portable C compiler", in the sense that you can put it on a USB stick and carry it with you. Portable applications can easily be "embedded" into other applications and included in the installer.
The ability to "embed" the compiler as statically linked code in the main application binary is probably not required.
Portable MinGW is discussed in this SO question: https://stackoverflow.com/questions/7617410/portable-c-compiler-ide
An open source C++ editor with integrated MinGW is here https://code.google.com/p/pocketcpp/.
I don't have anything more to say as I'd have to go and browse Google - so I will not win the bounty :)
Why not just call the compiler and linker from your application using fork()/exec() (for UNIX-like platforms)? Create a shared library that you can then load with dlopen().
This avoids possible licensing issues and gives you less of a maintenance burden.
This is, for example, what Varnish does with its configuration files:
The VCL language is a small domain-specific language designed to be used to define request handling and document caching policies for Varnish Cache.
When a new configuration is loaded, the varnishd management process translates the VCL code to C and compiles it to a shared object which is then dynamically linked into the server process.
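The fork()/exec() plus dlopen() approach can be sketched in C++ roughly like this, assuming a POSIX system with a C compiler reachable on PATH as `cc` (older glibc versions also need `-ldl` when linking the host program). `compileSharedObject`, `loadFunction`, the file paths, and the `int(int)` signature of the loaded symbol are illustrative choices, not an existing API, and error handling is kept minimal.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write `source` (C code) to `cPath` and compile it into the shared object
// `soPath` by running the system C compiler in a child process.
bool compileSharedObject(const std::string& source, const std::string& cPath,
                         const std::string& soPath) {
  {
    std::ofstream out(cPath);
    out << source;
    if (!out) return false;
  }  // the stream is closed (and flushed) before the compiler reads the file
  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    // Child: replace this process image with the compiler.
    execlp("cc", "cc", "-shared", "-fPIC", "-o", soPath.c_str(), cPath.c_str(),
           static_cast<char*>(nullptr));
    _exit(127);  // reached only if exec itself failed
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Load the shared object and look up a symbol with signature int(int).
using IntFn = int (*)(int);

IntFn loadFunction(const std::string& soPath, const char* name) {
  void* handle = dlopen(soPath.c_str(), RTLD_NOW);
  if (!handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return nullptr;
  }
  // The handle is deliberately never dlclose()d, so the function stays valid.
  return reinterpret_cast<IntFn>(dlsym(handle, name));
}
```

For example, compiling "int twice(int x) { return 2 * x; }" this way and then calling loadFunction(soPath, "twice") gives a function pointer you can call like any other. Keeping the compiler out of process is what sidesteps the licensing and maintenance concerns mentioned above.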
Is there some convenient feature of the OCaml bytecode format or library for going from part of the bytecode back to the line of code it originally came from? This would obviously be useful for debugging and error messages.
In particular, I'm looking at how hard it would be to add support for source maps to js_of_ocaml.
When compiled with debug information enabled (option -g), the bytecode carries so-called "event" structures marking for example the function entry and return point, which provide source location and typing information.
As a proof of concept of how to inspect this information, I have created a small branch of the ocamlpp tool (a small utility by Benoît Vaugon to inspect bytecode files) that prints this debug information alongside the bytecode instructions.
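To illustrate the lookup side, here is a rough C++ sketch that assumes the debug events have already been extracted (for instance with a tool like the ocamlpp branch mentioned above) into simple (bytecode offset, file, line) records. `DebugEvent` and `PcToSource` are made-up names; real OCaml debug events carry more information and are not stored in this form. Since events exist only at particular points such as function entry and return, mapping an arbitrary offset to the nearest preceding event gives an approximate location, not an exact one.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified record of one debug event.
struct DebugEvent {
  int pc;  // bytecode offset the event is attached to
  std::string file;
  int line;
};

// Table sorted by pc; a pc maps to the nearest event at or before it.
class PcToSource {
 public:
  explicit PcToSource(std::vector<DebugEvent> events) : events_(std::move(events)) {
    std::sort(events_.begin(), events_.end(),
              [](const DebugEvent& a, const DebugEvent& b) { return a.pc < b.pc; });
  }

  std::optional<DebugEvent> lookup(int pc) const {
    // First event whose pc is greater than the query; the one before it matches.
    auto it = std::upper_bound(events_.begin(), events_.end(), pc,
                               [](int value, const DebugEvent& e) { return value < e.pc; });
    if (it == events_.begin()) return std::nullopt;  // pc precedes every event
    return *std::prev(it);
  }

 private:
  std::vector<DebugEvent> events_;
};
```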
I have no idea whether js_of_ocaml takes the necessary steps to preserve this location information throughout the compilation process. You should probably contact the maintainer, Jérôme Vouillon, to ask for more information.
js_of_ocaml -debuginfo uses the debug events (debug_event) in the bytecode to write the corresponding source line as a comment in its output.
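Going from comments to actual source maps means emitting the Source Map revision 3 format, whose `mappings` field stores numbers as Base64 VLQ: the sign goes in the lowest bit, then the value is written in 5-bit groups, each with a continuation bit. A minimal C++ encoder for one value might look like this; the function name is our own, and a full source-map writer would also have to produce the segment and line structure around these numbers.

```cpp
#include <cassert>
#include <string>

// Encode one signed integer as Base64 VLQ (Source Map revision 3 "mappings").
std::string encodeBase64Vlq(long value) {
  static const char kBase64[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  // Move the sign into the least significant bit; unsigned math avoids overflow.
  unsigned long vlq = value < 0
                          ? ((0UL - static_cast<unsigned long>(value)) << 1) | 1UL
                          : static_cast<unsigned long>(value) << 1;
  std::string out;
  do {
    unsigned long digit = vlq & 31UL;  // low 5 bits
    vlq >>= 5;
    if (vlq > 0) digit |= 32UL;  // continuation bit: more digits follow
    out += kBase64[digit];
  } while (vlq > 0);
  return out;
}
```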
Is there any tool or method that can speed up this process?
For instance I just split neatTrick.cpp source file into two separate files neatTrickImplementation.cpp and neatTrickTests.cpp.
What I have to do now is to go through the list of #includes at the top of neatTrick.cpp and determine which of them need to go into the implementation file, and which need to go into the tests file. Some of the headers are required for both of them, some are not. Some may even be completely unnecessary.
I feel like my process (start with nothing, compile, see what's broken, add proper include, compile again, repeat) will produce the most unbloated code but it is so frustratingly slow. I think it'd be great if my IDE could analyze the rest of the headers in my project, see which ones could eliminate the current set of errors, and automate this task for me.
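The manual "compile, see what's broken, fix, repeat" loop can at least be automated. Here is a rough C++ sketch of the reverse direction, greedy include pruning: it tries deleting each `#include` line and keeps the deletion when a caller-supplied `stillBuilds` check passes. `pruneIncludes` and `stillBuilds` are hypothetical; a real `stillBuilds` might write the lines to disk and run the compiler with `-fsyntax-only`. This only proves the file still compiles, not that it means the same thing (an overload or macro could silently change), and the result can depend on include order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Try removing each #include line in turn; keep a removal if the file
// still builds according to `stillBuilds`.
std::vector<std::string> pruneIncludes(
    std::vector<std::string> lines,
    const std::function<bool(const std::vector<std::string>&)>& stillBuilds) {
  for (std::size_t i = 0; i < lines.size();) {
    if (lines[i].rfind("#include", 0) == 0) {  // line starts with #include
      std::vector<std::string> candidate = lines;
      candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(i));
      if (stillBuilds(candidate)) {
        lines = std::move(candidate);  // include was unnecessary; drop it
        continue;                      // index i now holds the next line
      }
    }
    ++i;
  }
  return lines;
}
```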
There was a talk by Chandler Carruth on Microsoft's "Going Native" (a C++ conference) where he said that the Clang tooling project had something in the pipeline to solve exactly this problem.
As I understand it, it was presented as something no publicly available tool is able to do at the moment, and most people were pretty impressed by it.
So, at the moment there is no such tool. In the near future you will probably get something like this as a Clang-based tool that you compile yourself. Long-term, expect this to be a standard feature built on a Clang toolchain.
(Slightly off-topic: there is currently a discussion on the Clang/LLVM developers list about a tooling/service infrastructure. The tools are not there yet but are under active development, currently by Google engineers, and later probably by people across the industry and the Clang open-source community.)
During the ACCU conference at Oxford last April, one of the speakers, Peter Sommerlad, demoed exactly this functionality with a plugin for Eclipse CDT, written by one of his students. I don't know if this plugin is already publicly available, but maybe you could drop him an e-mail to ask...