Share Fortran library without revealing source code

I have software developed in-house. It is written in Fortran and consists of three kinds of files: 1) the solver files, 2) the model files and 3) a file where the models to be used are defined. The solver also uses some libraries, namely LAPACK and HSL MA41. Usually, I select the needed models for the user, compile everything together and provide an executable.
I want to allow users to add their own models or modify the existing ones without being able to change/modify/see the solver source code.
One thought was to compile the solver into an object file. Then the user would compile the definition file and his models and link them together with the libraries. Is that possible? I guess the user must then have the same platform as the one the solver was compiled on (i.e. Intel compiler on Windows 64-bit)? So I would need to build a library for every possible OS/hardware/compiler combination?
Another idea is to send the solver source as well but use obfuscation. I can't find any tested/reliable solutions for that online. Is it a good option?
Thanks in advance.

You can distribute the object code in a library, as you propose. If the entry points for your code are in Fortran modules, then you also need to distribute the mod files (or the equivalent for your compiler) that result from compilation of those modules.
(If any of the entry points for your library code are external procedures then it is a convenience for your users if you provide interface blocks for those external procedures. These interface blocks can be in source form (the interface block contains no information beyond what your library's documentation would have to provide), or again could be pre-compiled into a module.)
Object code may be platform (architecture) specific, compiler specific, compiler version specific and in some cases compile-options specific. Careful design and specification of the interface between your solver and the clients' models can mitigate some of the potential variation. For example, many platforms have a well defined C application binary interface (perhaps through explicit specification or near-ubiquitous convention), so interfaces described using the C equivalent are typically robust, at the cost of a significant loss of capability compared with a common-processor Fortran-to-Fortran interface.
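For example (a minimal sketch; the entry-point name solver_run and its arguments are made up, and this assumes the solver exposes an entry point declared with bind(C) on the Fortran side), a client calling through the C-level interface only needs a plain declaration, independent of which Fortran compiler built the library:

// Hypothetical C-compatible entry point exported by the solver library,
// e.g. a Fortran procedure declared bind(C, name="solver_run").
extern "C" int solver_run(const double* state, int n, double tolerance);

int main()
{
    double state[3] = {1.0, 2.0, 3.0};
    // The call goes through the platform's C ABI, so mixing compilers
    // on the client side is much less of a problem.
    return solver_run(state, 3, 1.0e-8);
}

The price, as noted above, is that only C-interoperable types can cross the boundary: no assumed-shape arrays, optional arguments or other Fortran-only features.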

Related

Will C++20 modules change the anatomy of static and shared libraries?

Traditionally, C++ libraries are made of a header file + the implementation compiled in a binary file (.a, .so, .dylib, .dll, ...). The header file is #included in the source code and the binary part is linked to the final executable.
Will modules from C++20 change such layout? If so, will operating systems have to upgrade the way they distribute their core libraries, e.g. the Standard Library in Linux or other core dlls in Windows?
Absolutely. Modules are a very different way of presenting library code to users than conventional headers/libraries. The main advantage of a module is that it is parsed down to the abstract syntax tree (AST) level by the compiler. This happens only once per module, in contrast to every time you include a particular header file. Thus, one possibility for speedup is to convert very frequently included header files into modules and save a lot of compile time by building the AST once instead of re-building it many times. The AST also works perfectly fine for templates: it is a generic and complete description of the C++ code.
But this is currently also the main "drawback": ASTs are absolutely compiler dependent. There is no stability between vendors, systems, or even compiler versions. So distributing ASTs makes no sense, at least in the current toolchain environment.
Thus, in my understanding, modules are not an easy drop-in replacement for conventional headers/libraries. They are an ideal replacement for lightweight (ideally header-only) and potentially heavily templated code that gets included many times in typical programs. Many libraries, foremost the standard library, are like this. But there are also many libraries of a different design (small API plus heavy binary backend). For those, I guess, we will have to continue to use includes/libraries as before. But C++ is backward compatible, so mixing modules with includes is no problem at all.
Also, modules can just wrap include files (e.g. with an API). This is probably not a very powerful usage of modules, but may lead to a more uniform programming pattern. In this case, you would still need to link your "so" or "dylib" or whatever library to the resulting code.
Thus, I believe that at some point the future will bring both: some C++ modules, but also conventional headers plus libraries. Modules themselves can be split into many files (only the "exported" symbols are visible to users). Modules will likely follow very different design guidelines than headers. There will very likely be no advantage in having "one class per module"; rather, expect "a whole bunch of logically connected code per module".
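As a small illustration (the file extension and build commands are compiler-specific; this sketch is roughly the Clang/MSVC flavour), the module interface unit takes the place of the header, and consumers import the compiled module instead of including text:

// math_utils.cppm -- module interface unit (.ixx with MSVC)
export module math_utils;

// Only exported names are visible to importers.
export int add(int a, int b)
{
    return a + b;
}

// consumer.cpp
import math_utils;   // consumes the compiled module artifact, not a textual header

int main()
{
    return add(2, 3);
}

The compiled artifact that import consumes is exactly the compiler-dependent representation discussed above, which is why a distributed library still has to ship the module source (or stick to headers) rather than only pre-built module files.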

COM exe, C++, and MinGW

Have a bit of an odd question; I'm using a tool supplied by a large company that, for reasons I find somewhat baffling, uses a COM interface defined inside the exe itself. In the example code they provide, it looks a little like this.
#import "C:\\Path_To_Exe\\the.exe" rename_namespace ("exe_namespace");
From what I understand, this is the way the Microsoft Visual C++ compiler understands and works with COM, and I have had the example code working before (currently it doesn't compile, due to fiddling with my build environment).
My question is, is there a way to do the same with MinGW? The project I'm working on is mainly using that; we can use MSVC if required, but I'd ideally like to avoid using multiple compilers if possible. I'm currently using cmake to build with, but I'm willing to use a script to build the items that need the COM interface if needed.
Thanks for your time.
The answer to "is there a way to do the same with MinGW" is no. #import is an optional tool that reads a COM type library (embedded in a binary or not, the TLB corresponds in general to an .idl file, but that also is optional), and generates C/C++ code that's heavily dependent on .c and .h files that only Visual Studio provides.
The answer to "can I do COM with MinGW" is of course yes. I don't know much about MinGW and tools, but you can do COM with any compiler since COM is (just) a binary standard.
If you get rid of #import, you'll have to change the code that uses what was generated (in the .TLH file resulting from the #import directive): COM helpers, wrappers, etc. It can be a lot of work, but it's technically possible.
Now, in your context, I suppose it really depends how big the .exe's type library (the description of your COM classes, interfaces, etc.) is. Visual Studio's #import adds value, so you'll have to assess how much value it added for you.
If it's just one class, one interface for example, then it can be interesting to get rid of the #import. If the .exe already has .h files that correspond to the tlb, then you can use them, otherwise you'll have to redeclare some by yourself (and again, change the code that was using generated wrappers).
The sole fact that you ask the question makes me wonder if you have enough knowledge of COM (no offense :-) to get rid of Visual Studio.
The COM subsystem is part of the Windows API, and you can access it using C calls to that API.
However there is a huge amount of boilerplate involved in this. The compilers which support COM "out of the box" have written all this boilerplate, and packaged it up in some combination of compiled libraries, template headers, and so on.
Another part of the usual suite of tools offered by these compilers is one that can read COM interface definitions out of an existing compiled object. COM objects usually contain a binary representation of their interface, for this reason.
There are a few ways you could proceed here in order to use g++; one option follows this broad outline (a rough sketch comes after the steps):
Use your MSVC installation to read the COM object and produce a C header file describing the interface.
Pick out the enumerations and GUIDs from that header file.
In g++, use the Windows API to invoke the object, using those enumerations and GUIDs.
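A rough sketch of that last step (the interface, its method, and both GUIDs here are placeholders; in practice you copy them from the MSVC-generated header or the .idl, and you link with ole32.lib):

#include <windows.h>

// Redeclared by hand from the generated header; COM interfaces are just
// vtables, so any compiler that follows the platform ABI can call them.
struct ITheInterface : public IUnknown
{
    virtual HRESULT STDMETHODCALLTYPE DoSomething(long value) = 0;
};

// GUID values elided here; take them from the generated header.
static const CLSID CLSID_TheServer   = { /* ... */ };
static const IID   IID_ITheInterface = { /* ... */ };

int main()
{
    if (FAILED(CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED)))
        return 1;

    ITheInterface* itf = nullptr;
    // CLSCTX_LOCAL_SERVER because the object lives in the out-of-process exe.
    HRESULT hr = CoCreateInstance(CLSID_TheServer, nullptr, CLSCTX_LOCAL_SERVER,
                                  IID_ITheInterface, reinterpret_cast<void**>(&itf));
    if (SUCCEEDED(hr))
    {
        itf->DoSomething(42);
        itf->Release();
    }
    CoUninitialize();
    return 0;
}

(For an out-of-process server the interface also has to be marshalable, typically via a registered type library, but if #import worked before then the exe already provides that.)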
If you want to author objects in g++ then there is a lot more work to do as you need to implement a bunch of things, but it is possible.
I have done this successfully in the past with g++ (as part of testing COM objects I'd developed). Probably somebody could develop a nice open-source suite for using COM objects, or even for authoring, that does not depend on MSVC but I'm not aware of such a thing.
I would recommend reading the books by Don Box, they fill in a lot of gaps in understanding that you will have if you've only learned about COM by working with it and reading the internet.

Creating a modular language in LLVM?

I'm developing a new language in LLVM using the C++ API which compiles down to target the C ABI.
I would like to support modular compilation by allowing end users to build what are effectively static libraries. I noticed the LLVM C++ API has an llvm::Linker class that I can use during compilation to combine source files (llvm::Module); however, I want to guarantee library compatibility between separate compilation runs, via metadata version numbers or at least via the publicly exposed interface.
Much of the information available on metadata in LLVM suggests that it should only be used for extended information that would not break correctness when silently removed.
(See the LLVM blog and the "IntrinsicsMetadataAttributes" PDF.)
I wouldn't think this would be a deal breaker as it could be global metadata, but it would be good to get a second opinion on that point.
I also know there is a parseIRFile function in IRReader, so I can load previously built .bc files. I would be curious whether it would be reasonable practice to include size and CRC information for comparison when loading these files.
My language has concepts similar to C# including interfaces. I figure I could allow modular compilation by importing/exporting an interface type along with external functions (Much like C++, I don't restrict the language to only methods of classes).
This approach allows me to include language specific information in the interface without needing to encode it in the IR as both the library and the calling code would be required to build with the interface. This again requires the interfaces to be compatible.
One language feature that would require extended information would be named parameters in functions.
My language is very type-safe and also mandates named parameters, so there is no predetermined function parameter order. This allows call sites to be more explicit, lets the compiler catch erroneous parameter usage, and gives authors more liberty in choosing default parameters, since defaults are not restricted to the last parameters of the function.
The compiler will need to know names, modifiers, defaults, etc. of these parameters to correctly map calls at compile time, so I figure the interface approach would work well here.
TL;DR
Does LLVM have any predefined facilities for building static libraries?
Are version number, size, and CRC information reasonable use cases for LLVM's metadata?
This is probably not QUITE an answer... Or at least not a complete answer.
I like this question, as I'm going to need a solution in the future too (some time in the next few months or years) for my Pascal compiler. It supports "units", which are meant to be separately compiled objects, but currently what I do is simply drag in the source file and compile it into the main llvm::Module. That's neither efficient nor flexible (I can't use the linker to choose between the "Linux" and "Windows" version of some code, for example; not that I think there is a 5% chance that my compiler will work on Windows without modification anyway...).
However, I'm not sure storing the "object" file as LLVM IR would be the right thing to do. I was thinking that a better way would be to store your AST in some serialized form. Then you don't depend on LLVM versions changing the IR format, and you can add whatever metadata you like. There won't be much difference between generating LLVM IR from this during your link phase and building the IR at compile time and then reading the IR back to figure out whether the metadata is correct. [The slow part, as you may have already found out, is the optimisation and MC generation, and you'd still have to do that either way.]
Like I started out, I'm not sure this is an answer, but it's my thoughts so far on the subject. Now I'll go back to adding debug symbol stuff to my Pascal compiler... Before Christmas, I couldn't see the source in GDB. Now I can step, but no viewing of variables yet...
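That said, if you do keep LLVM IR (or bitcode) as the distribution format, stamping and checking a version number is straightforward with module flags; a minimal sketch, with the flag name and version scheme invented for illustration:

#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Constants.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"

// When emitting a library module, stamp it with an interface version.
void stampInterfaceVersion(llvm::Module &M, uint32_t version)
{
    // Module flags are preserved in the bitcode and survive linking.
    M.addModuleFlag(llvm::Module::Error, "mylang.iface.version", version);
}

// When importing a previously built .bc file, verify the stamp.
bool checkInterfaceVersion(llvm::LLVMContext &Ctx, const char *path, uint32_t expected)
{
    llvm::SMDiagnostic Err;
    std::unique_ptr<llvm::Module> M = llvm::parseIRFile(path, Err, Ctx);
    if (!M)
        return false;
    auto *Ver = llvm::mdconst::extract_or_null<llvm::ConstantInt>(
        M->getModuleFlag("mylang.iface.version"));
    return Ver && Ver->getZExtValue() == expected;
}

Module flags (unlike ordinary attached metadata) participate in link-time verification, so with the Error behaviour a version mismatch becomes a hard error rather than something that is silently dropped.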

Any tutorial for embedding Clang as script interpreter into C++ Code?

I have no experience with LLVM or Clang yet. From what I read, Clang is said to be easily embeddable (Wikipedia: Clang); however, I did not find any tutorials about how to achieve this. So is it possible to provide the user of a C++ application with scripting powers by JIT-compiling and executing user-defined code at runtime? Would it be possible to call the application's own classes and methods and share objects?
edit: I'd prefer a C-like syntax for the script language (or even C++ itself)
I don't know of any tutorial, but there is an example C interpreter in the Clang source that might be helpful. You can find it here: http://llvm.org/viewvc/llvm-project/cfe/trunk/examples/clang-interpreter/
You probably won't have much of a choice of syntax for your scripting language if you go this route. Clang only parses C, C++, and Objective C. If you want any variations, you may have your work cut out for you.
I think here's exactly what you described.
http://root.cern.ch/drupal/content/cling
You can use clang as a library to implement JIT compilation as stated by other answers.
Then, you have to load up the compiled module (say, an .so library).
In order to accomplish this, you can use standard dlopen (unix) or LoadLibrary (windows) to load it, then use dlsym (unix) to dynamically reference compiled functions, say a "script" main()-like function whose name is known. Note that for C++ you would have to use mangled symbols.
A portable alternative is e.g. GNU's libltdl.
As an alternative, the "script" may run automatically at load time, by implementing module init functions or by putting in some static code (the constructor of a globally defined C++ object would be called immediately).
The loaded module can directly call anything in the main application. Of course symbols are known at compilation time by using the proper main app's header files.
If you want to easily add C++ "plugins" to your program, and know the component interface a priori (say your main application knows the name and interface of a loaded class from its .h before the module is loaded in memory), after you dynamically load the library the class is available to be used as if it was statically linked. Just be sure you do not try to instantiate a class' object before you dlopen() its module.
Using static code also allows you to implement nice automatic plugin-registration mechanisms.
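A minimal sketch of that load-and-call step on the POSIX side (the module path and the entry-point name script_main are made up; the entry point is assumed to be declared extern "C" so its symbol name is predictable):

#include <dlfcn.h>
#include <cstdio>

int main()
{
    // Load the compiled "script" module; static initializers run here.
    void* handle = dlopen("./user_script.so", RTLD_NOW);
    if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    // Resolve the well-known entry point by name.
    using entry_fn = int (*)();
    auto entry = reinterpret_cast<entry_fn>(dlsym(handle, "script_main"));
    if (!entry) { std::fprintf(stderr, "%s\n", dlerror()); dlclose(handle); return 1; }

    int rc = entry();
    dlclose(handle);
    return rc;
}

On Windows the same shape works with LoadLibrary/GetProcAddress/FreeLibrary.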
I don't know about Clang but you might want to look at Ch:
http://www.softintegration.com/
This is described as an embeddable or stand-alone c/c++ interpreter. There is a Dr. Dobbs article with examples of embedding it here:
http://www.drdobbs.com/architecture-and-design/212201774
I haven't done more than play with it but it seems to be a stable and mature product. It's commercial, closed-source, but the "standard" version is described as free for both personal and commercial use. However, looking at the license it seems that "commercial" may only include internal company use, not embedding in a product that is then sold or distributed. (I'm not a lawyer, so clearly one should check with SoftIntegration to be certain of the license terms.)
I am not sure that embedding a C or C++ compiler like Clang is a good idea in your case, because the "script", that is, the (C or C++) code fed in at runtime, can be arbitrary and is therefore able to crash the entire application. You usually don't want faulty user input to be able to crash your application.
Be sure to read What every C programmer should know about undefined behavior, because it is relevant and applies to C++ also (including any "C++ script" used by your application). Notice that, unfortunately, a lot of undefined behavior doesn't crash the process (for example, a buffer overflow could corrupt some completely unrelated data).
If you want to embed an interpreter, choose something designed for that purpose, like Guile or Lua, and be careful that errors in the script don't crash the entire application. See this answer for a more detailed discussion of interpreter embedding.

C++ internal code reuse: compile everything or share the library / dynamic library?

General question:
For unmanaged C++, what's better for internal code sharing?
Reuse code by sharing the actual source code? OR
Reuse code by sharing the library / dynamic library (+ all the header files)
Whichever it is: what's your strategy for reducing duplicate code (copy-paste syndrome), code bloat?
Specific example:
Here's how we share the code in my organization:
We reuse code by sharing the actual source code.
We develop on Windows using VS2008, though our project actually needs to be cross-platform. We have many projects (.vcproj) committed to the repository; some have their own repositories, some are part of a shared repository. For each deliverable solution (.sln) (e.g. something that we deliver to the customer), it will svn:externals all the necessary projects (.vcproj) from the repository to assemble the "final" product.
This works fine, but I'm quite worried that eventually the code size for each solution could get quite huge (right now our total code size is about 75K SLOC).
Also, one thing to note is that we forbid all transitive dependencies. That is, each project (.vcproj) that is not an actual solution (.sln) is not allowed to svn:externals any other project even if it depends on it. This is because you could have two projects (.vcproj) that depend on the same library (e.g. Boost) or project (.vcproj), so when you svn:externals both projects into a single solution, svn:externals would pull it in twice. So we carefully document all dependencies for each project, and it's up to the person who creates the solution (.sln) to ensure all dependencies (including transitive ones) are svn:externals'd as part of the solution.
If we reuse code via .lib or .dll files instead, this would obviously reduce the code size of each solution, as well as eliminate the transitive-dependency issue mentioned above where applicable (exceptions are, for example, third-party libraries/frameworks that use DLLs, like Intel TBB and the default Qt build).
Addendum: (read if you wish)
Another motivation to share source code might be summed up best by Dr. GUI:
On top of that, what C++ makes easy is not creation of reusable binary components; rather, C++ makes it relatively easy to reuse source code. Note that most major C++ libraries are shipped in source form, not compiled form. It's all too often necessary to look at that source in order to inherit correctly from an object—and it's all too easy (and often necessary) to rely on implementation details of the original library when you reuse it. As if that isn't bad enough, it's often tempting (or necessary) to modify the original source and do a private build of the library. (How many private builds of MFC are there? The world will never know . . .)
Maybe this is why when you look at libraries like Intel Math Kernel library, in their "lib" folder, they have "vc7", "vc8", "vc9" for each of the Visual Studio version. Scary stuff.
Or how about this assertion:
C++ is notoriously non-accommodating when it comes to plugins. C++ is extremely platform-specific and compiler-specific. The C++ standard doesn't specify an Application Binary Interface (ABI), which means that C++ libraries from different compilers or even different versions of the same compiler are incompatible. Add to that the fact that C++ has no concept of dynamic loading and each platform provides its own solution (incompatible with others) and you get the picture.
What are your thoughts on the above assertion? Do languages like Java or .NET face these kinds of problems? E.g. if I produce a JAR file from NetBeans, will it work if I import it into IntelliJ, as long as I ensure that both have compatible JRE/JDK versions?
People seem to think that C specifies an ABI. It doesn't, and I'm not aware of any standardised compiled language that does. To answer your main question, use of libraries is of course the way to go - I can't imagine doing anything else.
One good reason to share the source code: Templates are one of C++'s best features because they are an elegant way around the rigidity of static typing, but by their nature are a source-level construct. If you focus on binary-level interfaces instead of source-level interfaces, your use of templates will be limited.
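To make that concrete (a made-up example): with source sharing, a function template in a header works for any client type, whereas hiding it behind a binary boundary forces you to pick the instantiations up front with explicit instantiation definitions, and clients are limited to that fixed set:

// clamp.h -- the template definition; with source sharing, clients can use any comparable T
template <typename T>
T clamp_to(T value, T lo, T hi)
{
    return value < lo ? lo : (hi < value ? hi : value);
}

// clamp.cpp -- if only a binary is shipped, only these instantiations
// end up in the object code, so binary-only clients are limited to them
#include "clamp.h"
template int    clamp_to<int>(int, int, int);
template double clamp_to<double>(double, double, double);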
We do the same. Trying to use binaries can be a real problem if you need to use shared code on different platforms or build environments, or even if you need different build options such as static vs. dynamic linking to the C runtime, different structure packing settings, etc.
I typically set projects up to build as much from source on-demand as possible, even with third-party code such as zlib and libpng. For those things that must be built separately, e.g. Boost, I typically have to build 4 or 8 different sets of binaries for the various combinations of settings needed (debug/release, VS7.1/VS9, static/dynamic), and manage the binaries along with the debugging information files in source control.
Of course, if everyone sharing your code is using the same tools on the same platform with the same options, then it's a different story.
I never saw shared libraries as a way to reuse code from an old project into a new one. I always thought it was more about sharing a library between different applications that you're developing at about the same time, to minimize bloat.
As far as copy-paste syndrome goes, if I copy and paste it in more than a couple places, it needs to be its own function. That's independent of whether the library is shared or not.
When we reuse code from an old project, we always bring it in as source. There's always something that needs tweaking, and it's usually safer to tweak a project-specific version than to tweak a shared version that can wind up breaking the previous project. Going back and fixing the previous project is out of the question because 1) it worked (and shipped) already, 2) it's no longer funded, and 3) the test hardware needed may no longer be available.
For example, we had a communication library that had an API for sending a "message", a block of data with a message ID, over a socket, pipe, whatever:
void Foo::Send(unsigned messageID, const void* buffer, size_t bufSize);
But in a later project, we needed an optimization: the message needed to consist of several blocks of data in different parts of memory concatenated together, and we couldn't (and didn't want to, anyway) do the pointer math to create the data in its "assembled" form in the first place, and the process of copying the parts together into a unified buffer was taking too long. So we added a new API:
void Foo::SendMultiple(unsigned messageID, const void** buffer, size_t* bufSize);
Which would assemble the buffers into a message and send it. (The base class's method allocated a temporary buffer, copied the parts together, and called Foo::Send(); subclasses could use this as a default or override it with their own, e.g. the class that sent the message on a socket would just call send() for each buffer, eliminating a lot of copies.)
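For illustration, the default path might have looked roughly like this (the signature is simplified and the explicit count parameter is invented here; the real API presumably conveyed the number of blocks some other way):

#include <cstddef>
#include <cstring>
#include <vector>

struct Foo
{
    virtual ~Foo() = default;
    // Defined elsewhere in the library; sends a single contiguous buffer.
    virtual void Send(unsigned messageID, const void* buffer, size_t bufSize);
    virtual void SendMultiple(unsigned messageID, const void** buffers,
                              size_t* bufSizes, size_t count);
};

void Foo::SendMultiple(unsigned messageID, const void** buffers,
                       size_t* bufSizes, size_t count)
{
    // Default behaviour: concatenate the pieces into one temporary buffer
    // and fall back to the single-buffer Send().
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += bufSizes[i];

    std::vector<unsigned char> assembled(total);
    size_t offset = 0;
    for (size_t i = 0; i < count; ++i)
    {
        std::memcpy(assembled.data() + offset, buffers[i], bufSizes[i]);
        offset += bufSizes[i];
    }
    Send(messageID, assembled.data(), assembled.size());
}

// A socket-based subclass would instead override SendMultiple and call
// send() once per buffer, avoiding the extra copy.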
Now, by doing this, we have the option of backporting (copying, really) the changes to the older version, but we're not required to backport. This gives the managers flexibility, based on the time and funding constraints they have.
EDIT: After reading Neil's comment, I thought of something that we do that I need to clarify.
In our code, we do lots of "libraries". LOTS of them. One big program I wrote had something like 50 of them. Because, for us and with our build setup, they're easy.
We use a tool that auto-generates makefiles on the fly, taking care of dependencies and almost everything. If there's anything strange that needs to be done, we write a file with the exceptions, usually just a few lines.
It works like this: the tool finds everything in the directory that looks like a source file, generates dependencies if the file changed, and spits out the needed rules. Then it makes a rule to take everything and ar/ranlib it into a libxxx.a file, named after the directory. All the objects and the library are put in a subdirectory that is named after the target platform (this makes cross-compilation easy to support). This process is then repeated for every subdirectory (except the object-file subdirs). Then the top-level directory gets linked with all the subdirectories' libraries into the executable, and a symlink is created, again named after the top-level directory.
So directories are libraries. To use a library in a program, make a symbolic link to it. Painless. Ergo, everything's partitioned into libraries from the outset. If you want a shared lib, you put a ".so" suffix on the directory name.
To pull in a library from another project, I just use a Subversion external to fetch the needed directories. The symlinks are relative, so as long as I don't leave something behind it still works. When we ship, we lock the external reference to a specific revision of the parent.
If we need to add functionality to a library, we can do one of several things. We can revise the parent (if it's still an active project and thus testable), tell Subversion to use the newer revision and fix any bugs that pop up. Or we can just clone the code, replacing the external link, if messing with the parent is too risky. Either way, it still looks like a "library" to us, but I'm not sure that it matches the spirit of a library.
We're in the process of moving to Mercurial, which has no "externals" mechanism so we have to either clone the libraries in the first place, use rsync to keep the code synced between the different repositories, or force a common directory structure so you can have hg pull from multiple parents. The last option seems to be working pretty well.