How to securely compile and hide Clojure code?

Is there a way to securely ship code written in Clojure without (or while minimizing) the risk of it being decompiled and accessed?
Are JAR files generated with uberjar safe enough to pass around?
Thanks heaps!

If you can run code on a web server that is only accessed over a network by those who use your code, then as long as you keep that server secure, it does not matter whether the server has the source code or not.
It is possible to create deployable JAR files that contain Clojure source code. In that case the Clojure compiler, running on the computer where the JAR is deployed, compiles the Clojure source to JVM byte code soon after the JVM process starts. You can run 'unzip -v foo.jar' to see a list of the file names in a JAR file, and any with a file name suffix like '.clj', '.cljs', or '.cljc' are likely Clojure source code.
If any files in the JAR have file names ending in '.class', those are Java class files containing JVM byte code. You can run a decompiler on most such files and often get back syntactically legal Java source code that behaves the same as the original Clojure source code. For example, see https://github.com/clojure-goes-fast/clj-java-decompiler, or search for 'java decompiler' to find many other such tools.
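A JAR file is an ordinary zip archive, so the inspection that 'unzip -v' gives you can also be scripted. The sketch below uses only Python's standard zipfile module; the function name inspect_jar is made up for illustration. It sorts entries into likely Clojure source files and compiled class files, confirming the latter by the four magic bytes 0xCAFEBABE that every JVM class file begins with.

```python
import zipfile

CLOJURE_SOURCE_SUFFIXES = (".clj", ".cljs", ".cljc")
CLASS_MAGIC = b"\xca\xfe\xba\xbe"  # every JVM class file starts with these four bytes


def inspect_jar(path):
    """Return (clojure_sources, class_files) found in the JAR at `path`."""
    sources, classes = [], []
    with zipfile.ZipFile(path) as jar:
        for name in jar.namelist():
            if name.endswith(CLOJURE_SOURCE_SUFFIXES):
                # Plain-text source: readable by anyone who unzips the JAR.
                sources.append(name)
            elif name.endswith(".class"):
                # Confirm the entry really is JVM byte code by checking its magic number.
                with jar.open(name) as entry:
                    if entry.read(4) == CLASS_MAGIC:
                        classes.append(name)
    return sources, classes
```

Entries in the first list are readable as-is, whether they come from your own code or from a dependency; entries in the second list are the ones a decompiler would work on.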
If you search for terms like 'java byte code obfuscation', you can probably find tools that claim to provide some level of scrambling of the names and/or functionality of JVM byte code. I do not know how effective they are.
In general, a contract with a party that has something to lose by breaking it, or that has more important things to do than reverse engineer your code, is more reliable protection against reverse engineering than any technical method.

Related

Why are there clj and cljs folders in my lein re-frame template?

Why are there clj and cljs folders in my lein re-frame template, as below? And why do both include files named core that appear to use the same namespaces? I've been told this is the place to start when learning re-frame, but I cannot find any explanation of why the templates are set up the way they are, or why they include the content they do.
None of the boilerplate or code that comes with any lein template is explained, which makes the templates very hard for beginners to use.
Thanks in advance.

This setup is used to separate Clojure backend code from the ClojureScript frontend. It isn't actually necessary, and I don't particularly recommend it, but I can explain its history and why you'd want to do it.
For the ClojureScript side it really doesn't matter at all.
When building a Clojure backend you will often deploy it as an "uberjar" or "uberwar". This means that all source files and dependencies are packed into one single .jar file (basically just a zip file). This is typically done by including all files from a specified set of directories, so it would include src/clj but not src/cljs. If everything is in one directory, the .cljs files would be added as well, although they are never used by the Clojure backend. So in essence it just makes your "uberjar" bigger. It is not an important optimization, but some people prefer to keep things lean and clean.
In addition, some developers just prefer to separate the code this way, as the template authors did here.
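To make the size argument concrete, here is a toy Python model of that packaging step. Everything in it (build_archive, the directory names, the layout) is hypothetical, and it is not how Leiningen's uberjar task is implemented; it only shows why walking a configured list of directories leaves src/cljs out of the archive.

```python
import os
import zipfile


def build_archive(project_root, include_dirs, out_path):
    """Pack every file under the listed directories into one zip archive.

    A toy model of an "uberjar" step: only the listed directories are walked,
    so anything outside them (for example src/cljs) never reaches the archive.
    Returns the archive entry names in the order they were added.
    """
    added = []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for rel_dir in include_dirs:
            base = os.path.join(project_root, rel_dir)
            for dirpath, dirnames, filenames in os.walk(base):
                dirnames.sort()  # walk subdirectories in a stable order
                for filename in sorted(filenames):
                    full_path = os.path.join(dirpath, filename)
                    # Store paths relative to the source root, the way a classpath sees them.
                    arcname = os.path.relpath(full_path, base).replace(os.sep, "/")
                    archive.write(full_path, arcname)
                    added.append(arcname)
    return added
```

Calling build_archive(root, ["src/clj"], "app.jar") packs only the backend files; adding "src/cljs" to the list would pull the frontend sources in too.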

One answer:
As the comments point out, some projects develop the backend code (Clojure) and the frontend code (ClojureScript) in the same project repo. I think this is a mistake as it can easily lead to confusion and entanglement (esp. if using lein to start both projects simultaneously). IMHO it is better to keep both front- and back-end parts in separate repositories. I would also strongly recommend using figwheel-main and the Clojure deps build tool for the CLJS code.
Another answer:
For CLJS code, any macros have to be defined in an "earlier" compilation stage. Thus, for namespaces that define macros, you often see either a util.clj or a util.cljc file that defines the macro, and then a util.cljs file where the macro is used.
You can find more information below, but it is subtle & confusing:
https://clojurescript.org/guides/ns-forms
https://clojurescript.org/about/differences
https://blog.fikesfarm.com/posts/2018-08-12-two-file-clojurescript-namespace-pattern.html

What does embedding a language into another do?

This may be kind of basic but... here goes.
If I decide to embed some kind of scripting language like Lua or Ruby into a C++ program by linking its interpreter, what does that allow me to do in C++?
Would I be able to write Ruby or Lua code right into the cpp file or simply call scripts from the program?
If the latter is true, how would I do that?

Because they're scripting languages, the code is always going to be "interpreted." You aren't really "calling" the script code inside your program; rather, when you reach that point, the thread that reaches the scripting portion runs the interpreter, which reads the script and carries out the corresponding operations. (The standard Lua and Ruby implementations do compile scripts to an internal bytecode first, but that is not ahead-of-time compilation to machine code the way a C++ compiler works.)
Because of this, it's basically the same as forking the interpreter and running the script, unless you want to access variables in your compiled program from the script, or script variables from the compiled program. To pass values back and forth, because the interpreter runs on the thread that has your compiled program's context, you should be able to store script variables on the stack and access them when your thread stops running the interpreter. (Lua's C API, for example, exchanges values between C and Lua through a virtual stack, using functions such as lua_pushnumber and lua_tonumber.)
Edit: response:
You would have to write it yourself. Think about it this way: if you want to use assembly in C++, you use the asm keyword. The C++ compiler then needs to parse the source file, reach the asm keyword, and switch to the assembler, which must process everything up to the closing brace of the asm block and assemble that code.
If you want to do this, it will be a bit different, since assembly gets compiled, not interpreted (and interpretation is what you want). What you'll need to do is change the compiler you're using (let's say C++) so that it recognizes your own user-defined keyword. Let's say this keyword is scriptX{}. You need to change the C++ parser so that when it sees scriptX{}, it stores everything between the braces in the read-only data section of your compiled program. You then need to add a hook in the compiled assembly to switch the thread's context to your script interpreter, and point the interpreter at the beginning of your script section (which you put in the read-only data section of the object file).
Good luck with that...

A common reason to embed a scripting language into a program is to provide for the ability to control the program with scripts provided by the end user.
Probably the simplest example of such a script is a configuration file. Assume that your program has options, and needs to remember the options from run to run. You could write them out to a file as a binary image of your options structure, but that would be fragile, not easy to inspect or edit, and likely not portable across systems. Writing the options out in plain text with some sort of labels for which is which addresses most of those complaints, but now you need to parse that text and recover the options. Then some users want different options on Tuesdays, want to do simple arithmetic to compute one option from another, or to write one configuration file that they can use on both Windows and Linux, and pretty soon you find yourself inventing a little language to express all of those ideas and mechanisms with. At this point, there's a better way.
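A minimal sketch of that "better way", with Python's built-in exec standing in for an embedded Lua interpreter (the option names, defaults, and load_options are illustrative assumptions). The configuration file is a small program, so users get arithmetic and conditionals without you inventing a language, and the host reads the resulting values back out of the script's namespace.

```python
import platform

# A hypothetical configuration "script". Because it is written in a real
# language, users get arithmetic and conditionals for free.
CONFIG_SOURCE = """
cache_mb = 256
worker_threads = 4
queue_length = worker_threads * 8
log_dir = "C:/logs" if system == "Windows" else "/var/log/myapp"
"""

DEFAULTS = {"cache_mb": 64, "worker_threads": 1, "queue_length": 8, "log_dir": "."}


def load_options(source, system=None):
    """Run the config script in a fresh namespace and read back known options."""
    # Values the host exposes to the script.
    namespace = {"system": system or platform.system()}
    # Emptying __builtins__ is NOT a sandbox; only load config files you trust.
    exec(source, {"__builtins__": {}}, namespace)
    options = dict(DEFAULTS)
    for key in DEFAULTS:
        if key in namespace:
            options[key] = namespace[key]  # script-provided value overrides the default
    return options
```

Options the script leaves out keep their defaults, so a one-line file such as "cache_mb = 10" is still a valid configuration.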
The languages Lua and TCL both grew out of essentially that scenario. Larger systems needed to be configured and controlled by end users. End users wanted to edit a simple text file and get immediate satisfaction, even (especially) when working with large systems that might have required hours to compile successfully.
One advantage here is that rather than inventing a programming language one feature at a time as users' needs change, you start with a complete language along with its documentation. The language designer has already made a number of tough decisions for you (how do I represent strings and numbers, what about lists, what about named values, what does if look like, etc.) and has generally also brought a carefully designed and debugged implementation to the table.
Lua is particularly easy to integrate. Reading a simple configuration file and extracting the settings from the Lua state can be done using a small subset of its C API. Once you have Lua available, it is attractive to use it for other purposes. In many cases, you will find that it is more productive to write only the innermost loops in C, and use Lua to glue those functions together and provide all the "business logic" of the application. This is how Adobe Lightroom is implemented, as well as many games on platforms ranging from simple set-top boxes to iOS devices and even PCs.

Adding source instrumentation code - Is source-to-source compiler right approach? How to build one?

I am working on a project where I need to track changes to particular set of variables in any given application code to model memory access patterns.
I can think of two approaches mainly, please give your thoughts on them.
My initial thought is to do it the way many profilers such as gprof do: add instrumentation code to the target application code before compilation, then analyze the log generated by this instrumentation code to get the required information.
To accomplish this, the only approach I can think of is some sort of source-to-source compiler that parses the given code and injects instrumentation code into the application (a same-language source-to-source compiler), which I can later compile and run to get the required logs.
Does this seem right, or am I over-engineering? If it is right, are there tools that let me build a source-to-source compiler (relatively) easily?
I read about GDB's support for Python, so I am wondering whether I can write a Python script that reads the set of variables from a config file, sets watchpoints on them, and logs every time one of the watched variables is written. I tried to use this GDB feature, but on my Ubuntu machine it doesn't seem to be working for now.
http://sourceware.org/gdb/onlinedocs/gdb/Python.html#Python
The applications will be written in nesC (I believe nesC is translated to C during compilation), and they will run on TOSSIM like native apps on my computer.

See my paper on instrumenting code using a program transformation system (PTS); a PTS is a very general kind of "source-to-source compiler".
It shows how to install probes in code in a pretty straightforward way, once you have a grammar for the language of interest. The underlying tool, DMS, makes it fairly easy to define the grammar too.
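The probe-installing idea can be sketched on a smaller scale with Python's standard ast module: parse the program, insert a logging call after every write to a watched variable, and print the transformed source back out. This only handles Python source, simple names, and = / += assignments; for nesC or C you would need a C-capable parser front end (the answer above uses DMS for that), and WriteLogger and the _log_write hook are hypothetical names.

```python
import ast


class WriteLogger(ast.NodeTransformer):
    """Insert a probe after each assignment to a watched variable.

    Handles plain `x = ...` and augmented `x += ...` on simple names only.
    """

    def __init__(self, watched):
        self.watched = set(watched)

    def _probe(self, name, lineno):
        # Builds the statement: _log_write("name", name, lineno)
        call = ast.Call(
            func=ast.Name("_log_write", ast.Load()),
            args=[ast.Constant(name), ast.Name(name, ast.Load()), ast.Constant(lineno)],
            keywords=[],
        )
        return ast.Expr(call)

    def visit_Assign(self, node):
        hits = [t.id for t in node.targets
                if isinstance(t, ast.Name) and t.id in self.watched]
        if not hits:
            return node
        # Returning a list replaces one statement with several.
        return [node, *(self._probe(n, node.lineno) for n in hits)]

    def visit_AugAssign(self, node):
        if isinstance(node.target, ast.Name) and node.target.id in self.watched:
            return [node, self._probe(node.target.id, node.lineno)]
        return node


def instrument(source, watched):
    """Source-to-source: parse, inject probes, and print the program back out."""
    tree = WriteLogger(watched).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Running the transformed code with a _log_write function that appends (name, value, line) tuples to a list yields a write trace for the watched variables while ignoring all others, which is the kind of log the question describes.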

Show Delphi And C++ Source Code

How can I see the source code of an executable compiled by Delphi or C++?
Please help me.
After Edit:
I have a program. When I start it, it shows a dialog and asks for a password. This password is stored in the source code. I want to get this password quickly and easily.

You can't.
An enormous amount of information is thrown away when the compiler reduces human-readable source text to machine-executable code. Local variables, for example, don't need names in machine code; they're just register bits in the instruction opcode or offsets into the stack frame.
This is why stepping through the original source files line by line while debugging a compiled executable is only possible if you have the compiler's debug symbols to go with the executable.
There are utilities that attempt to reverse engineer machine code into source code, but the result is less readable to humans than the original machine code, in my opinion. Machine generated function names, machine generated local variables and arguments, and many times the utility has to guess as to the exact data types of arguments and local vars. (is this arg a signed int or an unsigned int? Hard to tell when it's just a stack slot or machine register)
Compiling to an intermediate representation, as is done in Java and .NET, provides for much more reversibility because the types and symbol names of much of the original code are retained. Reflector, for example, can emit C# source code that is very close to the original human written source code.
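Python's bytecode is another intermediate representation of this kind, and it makes the point easy to check without a Java or .NET toolchain: the compiled code object keeps the function's name and its parameter and local variable names, which is exactly the kind of information native machine code discards. The function area is just an illustrative example.

```python
import dis


def area(width, height):
    scale = 2
    return width * height * scale


code = area.__code__
# The code object still records the function name plus argument and local names.
print(code.co_name)       # area
print(code.co_varnames)   # ('width', 'height', 'scale')
# The disassembly refers to those same names (exact opcodes vary by Python version).
dis.dis(area)
```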

You can't. The compiler takes the source code and turns it into machine instructions leaving 'no trace' of the original source code behind.
There are programs called decompilers, but they basically just automate reverse engineering; they can't access the original source code, because that's long gone.

By using a disassembler or decompiler. You can't ever get the original source code back from a binary, though. That information is lost.

How Can I See a Source Code of Executive File Compiled By Delphi or C++?
You can't, because source code does not exist in compiled Delphi/C++ program.
I Have a Program.When I Start This Program,Show a Dialog And Ask a Password.This Password Saved in Source Code.I Want take This Password Quickly And Easily.
Trying to crack something, huh?
It is quite possible that the password is not stored in the source code at all. A hash function can be applied to the entered password to check whether it is valid, without storing the password itself in the program. Even if you find the hash, it won't be easy to get the password from it.
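A minimal sketch of that idea in Python, using only the standard library (the function names and the iteration count are illustrative; a Delphi or C++ program would use an equivalent crypto library). The program keeps a random salt and a PBKDF2 digest, never the password itself, so finding those bytes in the executable or a data file does not directly reveal the password.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # work factor: an illustrative value, tune for real use


def make_record(password):
    """What gets stored instead of the password: a random salt and a digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def check_password(attempt, salt, digest):
    """Re-derive the digest from the attempt and compare it to the stored one."""
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

Recovering the password from such a record means guessing candidates and re-running the slow derivation for each one, which is why the hash alone doesn't get you the password "quickly and easily".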
You can get an assembly listing of the program using a disassembler (IDA Pro, OllyDbg, or a similar tool). You could also debug the program without source code, although you'll see pure assembly. AFAIK, "decompilers" exist, but I have never used one, and I doubt they will be useful for C++/Delphi code (which compiles into a native application).
There are a few simple techniques that would allow one to hack the program and bypass the password check (if some conditions are met: the program's author wasn't into security, the protection is easy, etc.), but I'm not sure this is an allowed topic of discussion on Stack Overflow.
Anyway, if you're interested in reverse engineering for legal purposes, you could try a book called "Reversing: Secrets of Reverse Engineering".

When you say "executive" do you mean "executable"? If so, decompiling will only get you assembly. Some decompilers will try to turn the assembly into a more readable form, but there's no general way to get the source code from an exe unless the source code was actually embedded in the file.

First off, the password is not saved in the source code. The compilation process is one-way only; the finished product isn't going to go altering its source. (Or its binary, for that matter, in most cases at least.) The password is most likely saved in a data file someplace. And if the program's author is at all competent, the password is hashed or encrypted in some way. Decompiling the program won't help you much.
Also, as InsertNickHere mentioned, we're not a hacking site here. We're honorable coders helping each other out with the complexities involved in building legitimate software. Please take your shady questions elsewhere.

Do you recommend Enabling Code Analysis for C/C++ on Build?

I'm using Visual Studio 2010, and in my C++/CLI project there are two Code Analysis settings:
Enable Code Analysis on Build
Enable Code Analysis for C/C++ on Build
My question is about the second setting.
I've enabled it; it takes a long time to run and doesn't find much.
Do you recommend enabling this feature? Why?

The two options you specify control the automatic execution of Code Analysis on managed and native C++ respectively.
Code Analysis of managed code is performed by the FxCop engine analyzing the generated IL.
Code Analysis of native code is performed during compilation by the PREfast engine analyzing the C++ source code.
I strongly encourage you to require your developers to run CA on their code before checking it in. If you don't, you're:
Delaying the process of ensuring that your code has no known vulnerabilities and issues that could otherwise have been systematically removed from your product's source.
Denying your developers their right to improve their skills by learning incrementally what code they should not be writing and why.
Selling your customers short because they're the ones who will suffer from crashes and security issues when they're using your product.
Further, if you're writing native C++ and have not already planned to start adorning your code with SAL Annotations, then, frankly, someone at your place of work deserves to be dragged out into the street and humiliated! There's some great stuff coming down the pipe shortly in the next version of the SAL annotations - get on it now and be way ahead of the curve compared to your competitors! :)

Never did anything for me. In theory, it's supposed to help catch logical errors, but I've never found it to report anything.

We are using LINT to do a static code analysis for plain C++ applications (no .Net, no C++/CLI).
This is different from what you are using but probably the same principles can be applied.
We execute LINT like this:
During a build, only the changed sources (CPP files) are run through LINT. Possibly many more files are being recompiled (if a header file is changed), but only the changed .CPP files are run through LINT.
Run the static code analysis on all files on a Continuous Integration server. If it finds something, let it mail the error to the developers who most recently committed changes to the version control system, or to the main developer.
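The "only lint the changed .cpp files" step above could be approximated like this, assuming you record the time of the last lint run. The function name and the modification-time heuristic are assumptions (a real build system already knows which translation units it rebuilt); the resulting paths would then be handed to whatever lint command you use.

```python
import os


def changed_sources(root, last_run, suffix=".cpp"):
    """Return source files under `root` modified after the `last_run` timestamp.

    Headers are deliberately skipped: a changed header may trigger
    recompilation, but only edited .cpp files are selected for linting.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if name.endswith(suffix) and os.path.getmtime(path) > last_run:
                changed.append(path)
    return changed
```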
What you could also do is run static code analysis on all files that are committed to your version control system, e.g. in a Subversion pre-commit hook.
