Compiling libmagic statically (c/c++ file type detection)

Compiling libmagic statically (c/c++ file type detection) - c++

Thanks to the guys that helped me with my previous question (linked just for reference).
I can place the files fileTypeTest.cpp, libmagic.a, and magic in a directory, and I can compile with g++ -lmagic fileTypeTest.cpp fileTypeTest. Later, I'll be testing to see if it runs in Windows compiled with MinGW.
I'm planning on using libmagic in a small GUI application, and I'd like to compile it statically for distribution. My problem is that libmagic seems to require the external file, magic. (I'm actually using my own shortened and compiled version, magic_short.mgc, but I digress.)
A hacky solution would be to code the file into the application, creating (and deleting) the external file as needed. How can I avoid this?
added for clarity:
magic is a text file that describes properties of different filetypes. When asked to identify a file, libmagic searches through magic. There is a compiled version, magic.mgc that works faster. My application only needs to identify a handful of filetypes before deciding what to do with them, so I'll be using my own magic_short file to create magic_short.mgc.

This is tricky, I suppose you could do it this way... by the way, I have downloaded the libmagic source and looking at it...
There's a function in there called magic_read_entries within the minifile.c (this is the pure vanilla source that I downloaded from sourceforge where it is reading from the external file.
You could append the magic file (which is found in the /etc directory) to the end of the library code, like this cat magic >> libmagic.a. In my system, magic is 474443 bytes, libmagic.a is 38588 bytes.
In the magic.c file, you would need to change the magichandle_t* magic_init(unsigned flags) function, at the end of the function, add the line magic_read_entries and modify the function itself to read at the offset of the library itself to pull in the data, treat it as a pointer to pointer to char's (char **) and use that instead of reading from the file. Since you know where the offset is to the library data for reading, that should not be difficult.
Now the function magic_read_entries will no longer be used, as it is not going to be read from a file anymore. The function `magichandle_t* magic_init(unsigned flags)' will take care of loading the entries and you should be ok there.
If you need further help, let me know,
Edit:
I have used the old 'libmagic' from sourceforge.net and here is what I did:
Extracted the downloaded archive into my home directory, ungzipping/untarring the archive will create a folder called libmagic.
Create a folder within libmagic and call it Test
Copy the original magic.c and minifile.c into Test
Using the enclosed diff output highlighting the difference, apply it onto the magic.c source.
48a49,51
> #define MAGIC_DATA_OFFSET 0x971C
> #define MAGIC_STAT_LIB_NAME "libmagic.a"
>
125a129,130
> /* magic_read_entries is obsolete... */
> magic_read_entries(mh, MAGIC_STAT_LIB_NAME);
251c256,262
<
---
>
> if (!fseek(fp, MAGIC_DATA_OFFSET, SEEK_SET)){
> if (ftell(fp) != MAGIC_DATA_OFFSET) return 0;
> }else{
> return 0;
> }
>
Then issue make
The magic file (which I copied from /etc, under Slackware Linux 12.2) is concatenated to the libmagic.a file, i.e. cat magic >> libmagic.a. The SHA checksum for magic is (4abf536f2ada050ce945fbba796564342d6c9a61 magic),
here's the exact data for magic
(-rw-r--r-- 1 root root 474443 2007-06-03 00:52 /etc/file/magic) as found on my system.
Here's the diff for the minifile.c source, apply it and rebuild minifile executable by running make again.
40c40
< magic_read_entries(mh,"magic");
---
> /*magic_read_entries(mh,"magic");*/
It should work then. If not, you will need to adjust the offset into the library for reading by modifying the MAGIC_DATA_OFFSET. If you wish, I can stick up the magic data file into pastebin. Let me know.
Hope this helps,
Best regards,
Tom.

I can tell you how to compile a library in statically - you simply pass the path to the .a file on the end of your g++ command - .a files are just archives of compiled objects (.o). Using "ldd fileTypeTest" will show you the dynamically linked libraries - ${libdir}/libmagic.so shouldn't be in it.
As for linking in an external data file... I don't know - Can you not package the application (.deb|.rpm|.tar.bz2)? On windows, I'd write an installer using NSIS.

In the past I've built self extracting archives. Basically it is a .exe file consisting of a .zip archive and code to unzip it. download the .exe, run it, and poof! you can have as many files as you want.
http://en.wikipedia.org/wiki/Self-extracting_archive

Related

How can I compile c++ to multiple files?

I have a program (cpp) with many classes. Every class is in separate source file (.h + .cpp).
How can I split the compiled program into multiple files (instead of one big executable file)?
Let's say, one file for every class (same as the code structure).
So that every time there is change in a specific class, I compile only that class, and replace the specific compiled file related to that class.
(Something similar to .DLL files in Windows.)
Example from real life:
I am making TUI interface for managing mysql.
I would like to create mysql text editor (TUI) with ncurses.
the code (class) for creating and managing single window object is in
'textWin.cpp' + 'textWin.h'
the code (class) for managing multiple windows, by creating windows objects from previous class is in winMan.cpp winMan.h
the code (class) for managing mysql database is in :
mysql.cpp mysql.h
and so on...
so, I have the following files:
MyProgram.cpp
- winMan.cpp + winMan.h
- textWin.cpp + textWin.h
- mysql.cpp + mysql.h
- ..
- ..
After g++ compilation, I get one executable file, './MyProgram' (size about 15Mb.) which I deliver to all my customers (1000's of them).
I Just found a typo in textWin.cpp, I fixed it, and I told to all customers that there is an update... all of them need to download one big 15Mb file, this consumes allot of bandwidth and server resources, for just a small update.
Is there a way to send to all my customers smaller file, that contains only the compiled code for textWin class ?
I use g++ on Centos7

The gcc compiler will happily take a list of cpp files to compile together to make one executable. You don't need to write a "containing" cpp file. However, you still have the issue that each time it rebuilds them all.
The alternative is to build each sourcefile separately to an object file, then link those all together. Hopefully each of those invocations of the compiler will add up to less time than the single command-line. But how to keep track of which cpp files actually need to be rebuilt?
The usual approach is to use a makefile and a make utility which will check the dates of all the mentioned files. There are a variety of flavours of makefile, and helper makefile engines. Download a simple package like gzip and you can quickly get an idea of how the Makefile is structured. Then there is lots of help online, or you may decide that this is just too much trouble for a project with 5 files in it.

As suggested in the comments by #RSahu
Shared Libraries (.so files) is the way to split your compiled code.
here is a small example:
https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html

Of course, you could put your texts into separate text-files and only deploy those in the an error is there. For your special use case, where binary differences must be deployed, this question might be helpful: How do I create binary patches?
Another option, do proper versioning. That way, your customers might be able to decide for themselves. That is, if they need this update.

Building a C++ project that links to a Haskell library, using Shake and Stack

I'm trying to build a simple C++ project (an executable) that calls a Haskell function,
using Shake for the build script and calling Stack from within the script to build the Haskell library.
Let's say the Haskell library is called haskell-simple-lib.
The shake script calls stack install haskell-simple-lib which outputs an .so file: libHShaskell-simple-lib-*version*-*unique identifier*.so
My Shake rules depends on filenames, and so I can't use the aforementioned name as I don't know in advance what the unique identifier will be. And so, the Shake script runs a cp on the file to _build/libHShaskell-simple-lib.so
The link options for the C++ executable has -L_build and -lhaskell-simple-lib.
When I try to run the executable I get an error saying:
error while loading shared libraries: libHShaskell-simple-lib-0.1.0.0-8DkaSm3F3d44RUd03fOuDx-ghc7.10.2.so: cannot open shared object file: No such file or directory
But, if I rename the file I copied to _build, to the original name that stack install outputted (the one with the unique identifier), the executable runs correctly.
One would think that all I need to do is to simply cp the file to _build without erasing the unique identifier from the name, however I need to know the name of the .so file in advance for the shake script.
I don't understand why when the executable is run the original .so filename is searched for. The link flag doesn't mention the fullname of the .so that stack install outputted, only libHShaskell-simple-lib.
Could it be that the original name is embedded in the .so file? If so, how does one go about solving this issue?
EDIT:
I'm aware this could be solved using a dummy file, but I'd like to know if there's a better way to do this.

The original identifier is embedded in the .so. I don't remember all the details, but I do know that I've solved such problems using rpath twiddling in the past.

Change Linux shared library (.so file) version after it was compiled

I'm compiling Linux libraries (for Android, using NDK's g++, but I bet my question makes sense for any Linux system). When delivering those libraries to partners, I need to mark them with a version number. I must also be able to access the version number programatically (to show it in an "About" dialog or a GetVersion function for instance).
I first compile the libraries with an unversioned flag (version 0.0) and need to change this version to a real one when I'm done testing just before sending it to the partner. I know it would be easier to modify the source and recompile, but we don't want to do that (because we should then test everything again if we recompile the code, we feel like it would be less error prone, see comments to this post and finally because our development environment works this way: we do this process for Windows binaries: we set a 0.0 resources version string (.rc) and we later change it by using verpatch...we'd like to work with the same kind of process when shipping Linux binaries).
What would be the best strategy here?
To summarize, requirements are:
Compile binaries with "unset" version (0.0 or anything else)
Be able to modify this "unset" version to a specific one without having to recompile the binary (ideally, run a 3rd party tool command, as we do with verpatch under Windows)
Be able to have the library code retrieve it's version information at runtime
If your answer is "rename the .so", then please provide a solution for 3.: how to retrieve version name (i.e.: file name) at runtime.
I was thinking of some solutions but have no idea if they could work and how to achieve them.
Have a version variable (one string or 3 int) in the code and have a way to change it in the binary file later? Using a binary sed...?
Have a version variable within a resource and have a way to change it in the binary file later? (as we do for win32/win64)
Use a field of the .so (like SONAME) dedicated to this and have a tool allowing to change it...and make it accessible from C++ code.
Rename the lib + change SONAME (did not find how this can be achieved)...and find a way to retrieve it from C++ code.
...
Note that we use QtCreator to compile the Android .so files, but they may not rely on Qt. So using Qt resources is not an ideal solution.

I am afraid you started to solve your problem from the end. First of all SONAME is provided at link time as a parameter of linker, so in the beginning you need to find a way to get version from source and pass to the linker. One of the possible solutions - use ident utility and supply a version string in your binary, for example:
const char version[] = "$Revision:1.2$"
this string should appear in binary and ident utility will detect it. Or you can parse source file directly with grep or something alike instead. If there is possibility of conflicts put additional marker, that you can use later to detect this string, for example:
const char version[] = "VERSION_1.2_VERSION"
So you detect version number either from source file or from .o file and just pass it to linker. This should work.
As for debug version to have version 0.0 it is easy - just avoid detection when you build debug and just use 0.0 as version unconditionally.
For 3rd party build system I would recommend to use cmake, but this is just my personal preference. Solution can be easily implemented in standard Makefile as well. I am not sure about qmake though.

Discussion with Slava made me realize that any const char* was actually visible in the binary file and could then be easily patched to anything else.
So here is a nice way to fix my own problem:
Create a library with:
a definition of const char version[] = "VERSIONSTRING:00000.00000.00000.00000"; (we need it long enough as we can later safely modify the binary file content but not extend it...)
a GetVersion function that would clean the version variable above (remove VERSIONSTRING: and useless 0). It would return:
0.0 if version is VERSIONSTRING:00000.00000.00000.00000
2.3 if version is VERSIONSTRING:00002.00003.00000.00000
2.3.40 if version is VERSIONSTRING:00002.00003.00040.00000
...
Compile the library, let's name it mylib.so
Load it from a program, ask its version (call GetVersion), it returns 0.0, no surprise
Create a little program (did it in C++, but could be done in Python or any other languauge) that will:
load a whole binary file content in memory (using std::fstream with std::ios_base::binary)
find VERSIONSTRING:00000.00000.00000.00000 in it
confirms it appears once only (to be sure we don't modify something we did not mean to, that's why I prefix the string with VERSIONSTRING, to make it more unic...)
patch it to VERSIONSTRING:00002.00003.00040.00000 if expected binary number is 2.3.40
save the binary file back from patched content
Patch mylib.so using the above tool (requesting version 2.3 for instance)
Run the same program as step 3., it now reports 2.3!
No recompilation nor linking, you patched the binary version!

How to add and use .zip (or .pak) files to c++ project?

I'm compiling CEF (Chromium Embedded Framework) for our local html5 presentation.
I should say I'm very new for all this (CEF and C++).
I've already optimized cefclient project for the presentation, but I need to embed all html/js/css/etc files into project (reading from local storage is not an option).
As I understood, I should use .zip or .pak (renamed zip) files to embed. But how can I use them inside the project?
Should I use some lib for unzipping (zlib?) or there is another popular way? And how can I be sure that files will be compiled into project?
Sorry for such basic questions but there are very few information about this (or google hates me today).
Thank you for any help!
UPD: found great tool - WBEA (http://asterclick.drclue.net/WBEA.html), it looks like exactly what I want to, but works pretty slow (with JS).
UPD 2: It turns out that there are many ways to make HTML5 desktop application, for example Node-Webkit.
Here is an article that compares some of them http://clintberry.com/2013/html5-apps-desktop-2013/

You need:
Create zip file whitin your resources.
Embed it as win32 resource (after this step you will get correct executable with .zip file inside).
Create custom scheme handler to access this zip file.
CefZipReader class will be handly to implement handler from step 3.
Look around, may be something like what you want already exist somewhere.

This sounds very similar to self extracting installers.
No need to compile anything, just concatenate the zip to the end of the executable. All you need to do is find the offset at runtime from the start of the executable. This can be done easily by writing a large magic number and looking for it later.
Example Linux:
cat app magic_number data > new_app
Example Windows:
copy app.exe /B + magic.dat /B + data.dat /B new_app.exe

c++ boost library cannot open file

I tried to work with the boost library to read/write configuration files but I just don't get it.
I even can't run the example code from boost.org (5 Minute Tutorial)
http://www.boost.org/doc/libs/1_49_0/libs/property_tree/examples/debug_settings.cpp
I've downloaded the boost_1_49_0.zip package and unzipped it to my c++ program folder. The code compiles (TheIDE - U++) but it always says "Error: debug_settings.xml: cannot open file" which basically means that the program works, but runs into the exception.
I didn't change the code, I just copy and pasted it to get a working example which I could try to understand then. But I don't even get this one to work. (Since it's exactly the same as in the link, I don't paste the code here... unless you think it's better.)
Please help me... or point to a different way to store variables in a file with some kind of structure (I wan't to learn a way that works for windows and linux, because some of my apps are cross-platform.)
Thanks.
EDIT: debug_settings.xml is in the same folder as the .cpp file
EDIT2: Working now, the debug_settings.xml is now in the folder where the executable is stored. (in my case, U++/TheIDE it's C:\upp\out\MyApps\MINGW.Debug.Debug_Full.Sse2 for debugging)

The configuration file would need to be in the working directory of the executable when it's running.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js