How to recover cpp data from linux library files?

I accidentally deleted a .cpp file with some valuable code of mine.
It was part of my own library: libandrissh.so
How can I recover it? I tried scalpel, but it did not find it.
I was wondering if I could somehow extract the code from the .so or .o files that make up my library. I think this could be possible, because my programs using the library still work.
Any suggestions?
Thanks, guys.

If it's deleted and not in a recycle or trash bin, you can't easily recover it from the compiled binaries. Disassemblers will get you as far as assembly, but I have not yet seen a production-ready decompiler that can get you back to the original sources. Even if one could, it likely wouldn't be able to recover the original local symbol and variable names anyway.
Your best bet would be to look at something like PhotoRec to search the free sectors on your hard disk. Despite its name, it actually finds many different file formats including video, music, documents, text and even C source files. As long as your files haven't been overwritten, you will likely be able to find them. I used it to recover a lot of data from my wife's hard drive when her filesystem became corrupt. Also, it's free under the GPL.

If you have the library binary, you can of course disassemble it (e.g. with objdump --disassemble libandrissh.so), but going from the "bare" machine code back to a higher-level language like C++ is not easy. I'm not aware of any standard tools to do that.
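To add to the point about disassembly: the source text is gone, but a shared library's dynamic symbol table still holds the mangled names of its exported functions, and those names encode the class, the function name and the parameter types. Here is a minimal sketch, assuming GCC or Clang on Linux, of turning the names listed by nm -D libandrissh.so back into readable signatures. The demangle helper is a name made up for this example; abi::__cxa_demangle comes from <cxxabi.h>.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Turn a mangled symbol name (as printed by `nm -D`) into a readable
// C++ signature. Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* readable =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);  // the buffer is malloc'ed by __cxa_demangle
    return result;
}
```

For example, demangle("_ZN3Foo3barEv") gives Foo::bar(). This only recovers the interface of the lost file, not the function bodies, but it is a useful checklist if you end up rewriting the code.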

You can try a disassembler like IDA Pro.
Depending on the compiler used, the flags and everything else, you might get decent results.

Related

Access violation in module ilink32.dll

I have a gigantic C++ Builder 6 solution. When I try to compile it, I get the following error as soon as the linker starts its work (translated from German):
---------------------------
Error
---------------------------
---------------------------
Access violation at address 0660EE22 in module 'ilink32.dll'. Reading from address 00000000.
---------------------------
OK
---------------------------
Does anyone have an idea how this comes and how I can fix it?
EDIT 1
Important note: the code sometimes compiles, mostly when I reset the working copy, modify the files only in Sublime Text, and use C++ Builder purely for compiling, without opening a single file in it.
EDIT 2
Some more details: the project has about 80.000.000 lines of code (according to C++ Builder). The largest file is about 70.000 lines, but it is hard to say exactly, because there are a lot of
#ifdef XY
#endif
blocks.
The code itself is copy-pasted from an existing part and was reviewed by some coworkers. So I think it is a bug in C++ Builder, because it actually works (at least sometimes) if I just use Sublime Text or Notepad++ to edit the files and then use C++ Builder to build them.
To be honest, I myself don't think there is a real solution. But I hope someone knows this bug. According to Google, the ilink32.dll is a C++ Builder library that is linked automatically.
Maybe someone has a solution.

ilink32 has always had a lot of bugs. There's no chance of getting anything fixed in non-current versions, so your options are:
Look for workarounds on QC
Find your own workaround
Here are some QC searches that may or may not be useful to you.
AFAIK it is not possible to use a different linker. However, you can turn Incremental Linking on or off via the project options and see if that makes a difference. Incremental linking is a speed optimization; it makes no difference to the semantics of linking.
the project has about 80.000.000 lines of code (according to C++ Builder).
Well, that number counts all lines in precompiled headers for each source file so maybe it doesn't mean much.
70K LOC is large for one source file; perhaps you could try refactoring code to have smaller object files, especially if it does seem that adding to a big file does trigger the problem.
It might be possible to identify which change you are making that triggers the bug. For example, it might be pushing a particular quantity past some limit (e.g. size of one object file, number of object files, size of static data, etc.).
You could delete the precompiled header files (that is, vclNN.csm, vclNN.#00, vclNN.#01, etc.) that are built and saved by default in the BCB6 lib directory. Perhaps they got corrupted or could be rebuilt better. PCH management is difficult in BCB6 anyway. (I ended up defining my own "all.h" and having every source file start with #include "all.h" followed by #pragma hdrstop.) Later versions of C++Builder (the XE line) allow PCH injection, making this process a lot tidier.
Have a look at the actual link command being passed to ilink32 and see if there are any unnecessary object files or libraries in it. You could delete and re-create the project files as they can build up crud over time as a project is developed. Actually that is probably a good idea anyway.
Another possibility might be to group some of the code into static libraries.
In all cases, make sure you are using good source control so you can back out any failed attempts that make things worse.

Override c library file functions?

I am working on a game, and one of the requirements of the licence agreement for the sound assets I am using is that they be distributed in a way that makes them inaccessible to the end user. So I am thinking about aggregating them into a flat file, encrypting them, or some such. The problem is that the sound library I am using (Hekkus Sound System) only accepts a 'char*' file path and handles file reading internally. So, if I am to continue to use it, I will have to override the C stdio file functions to handle encryption or whatever I decide to do. This seems doable, but it worries me. Looking on the web, I see people running into strange, frustrating problems doing this on the platforms I am concerned with (Win32, Android and iOS).
Does there happen to be a cross-platform library out there that takes care of this? Is there a better approach entirely you would recommend?

Do you have the option of using a named pipe instead of an ordinary file? If so, you can present the pipe to the sound library as the file to read from, and you can decrypt your data and write it to the pipe, no problem. (See Beej's Guide for an explanation of named pipes.)
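A rough sketch of the named-pipe idea, assuming a POSIX system (mkfifo does not exist on Win32, where named pipes work differently). The names toy_decrypt and serve_through_fifo are made up for this example, and the XOR "cipher" is only a placeholder for whatever real decryption you choose. The path of the FIFO is what you would hand to the sound library.

```cpp
#include <sys/stat.h>   // mkfifo
#include <sys/types.h>  // pid_t, ssize_t
#include <sys/wait.h>   // waitpid (for the caller)
#include <fcntl.h>      // open
#include <unistd.h>     // fork, write, close, _exit
#include <cstddef>
#include <string>
#include <vector>

// Placeholder "decryption": XOR with a fixed key. Stands in for a real cipher.
std::vector<unsigned char> toy_decrypt(const std::vector<unsigned char>& in,
                                       unsigned char key) {
    std::vector<unsigned char> out(in);
    for (unsigned char& b : out) b ^= key;
    return out;
}

// Create a FIFO at `path` and fork a child that writes `plain` into it.
// Returns the child's pid (or -1 on error); the caller should waitpid() on it
// and unlink() the FIFO afterwards.
pid_t serve_through_fifo(const std::string& path,
                         const std::vector<unsigned char>& plain) {
    if (mkfifo(path.c_str(), 0600) != 0) return -1;
    pid_t pid = fork();
    if (pid != 0) return pid;  // parent, or fork failure (pid == -1)

    // Child: open() blocks until a reader opens the other end of the FIFO.
    int fd = open(path.c_str(), O_WRONLY);
    if (fd < 0) _exit(1);
    std::size_t done = 0;
    while (done < plain.size()) {
        ssize_t n = write(fd, plain.data() + done, plain.size() - done);
        if (n <= 0) _exit(1);
        done += static_cast<std::size_t>(n);
    }
    close(fd);
    _exit(0);
}
```

One caveat: a pipe cannot be seeked, so this only works if the library reads the file strictly front to back. If it calls fseek (for example to skip around in a WAV header), the read will fail.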

Overriding stdio so that a library whose internals you don't know works in a way its developer never had in mind does not look like the right approach to me, and it isn't easy either. Implementing a RAM drive takes so much effort that I recommend searching for another audio library.
The Hekkus Sound System I found was built by a single person and last updated in 2012. I wouldn't rely on a library that has only one person working on it and doesn't share its sources.
My advice: invest your time in searching for a proper sound library instead of a fishy workaround for this one.

One possibility is to use an encrypted loopback filesystem (google for additional resources).
The way this works is that you put your assets on an encrypted filesystem, which actually lives in a simple file. This filesystem gets mounted someplace as a loopback device. The password needs to be supplied at attach/mount time. Once mounted, all files are available as regular files to your software. Otherwise, the files are encrypted and inaccessible.

It's compiler-dependent and not a guaranteed feature, but many allow you to embed files/resources directly into the exe and read them in your code as if from disk. You could embed your sound files that way. It will significantly increase the size of your exe however.

Another UNIX-based approach:
The environment variable LD_PRELOAD can be used to override any shared library an executable has been linked against. All symbols exported by a library mentioned in LD_PRELOAD are resolved to that library, including calls to libc functions like open, read, and close. Using libdl, it is also possible for the wrapping library to call through to the original implementation.
So, all you need to do is to start the process which uses the Hekkus Sound System in an environment that has LD_PRELOAD set appropriately, and you can do anything you like to the file that it reads.
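A minimal sketch of such a wrapper, assuming Linux with glibc and that the sound library opens its files through a dynamically linked fopen (if it uses open or something else, you would wrap that instead). The ".snd" suffix and the intercepted_fopen_calls helper are invented for this example; dlsym and RTLD_NEXT are the real libdl interface. The hook below only counts matching calls and passes them on; the comment marks where decryption would go.

```cpp
#include <dlfcn.h>   // dlsym, RTLD_NEXT (GNU extension; g++ defines _GNU_SOURCE)
#include <cstddef>
#include <cstdio>
#include <cstring>

static int g_intercepted = 0;  // number of matching fopen calls seen so far

static bool has_suffix(const char* text, const char* suffix) {
    std::size_t n = std::strlen(text), m = std::strlen(suffix);
    return n >= m && std::strcmp(text + (n - m), suffix) == 0;
}

// Hypothetical helper so the host program can inspect the hook.
extern "C" int intercepted_fopen_calls() { return g_intercepted; }

// Same name and signature as libc's fopen, so the dynamic linker resolves
// calls to this definition first when the library is preloaded.
extern "C" FILE* fopen(const char* path, const char* mode) {
    using fopen_fn = FILE* (*)(const char*, const char*);
    // Look up the next definition of fopen in load order, i.e. libc's.
    static fopen_fn real_fopen =
        reinterpret_cast<fopen_fn>(dlsym(RTLD_NEXT, "fopen"));
    if (real_fopen == nullptr) return nullptr;

    if (path != nullptr && has_suffix(path, ".snd")) {
        ++g_intercepted;
        // This is where you would decrypt the asset (e.g. into a temporary
        // or in-memory file) and open that instead of `path`.
    }
    return real_fopen(path, mode);
}
```

Built as a shared object (something like g++ -shared -fPIC hook.cpp -o libhook.so -ldl) and started with LD_PRELOAD=./libhook.so ./game, the hook sits in front of libc. Note that this mechanism is specific to ELF systems; it does not carry over to Win32 or iOS.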
Note, however, that there is absolutely no way to keep the data inaccessible to the user: the very fact that he has to be able to hear it means he has to have access. Even if all software in the chain used encryption, and your user is not willing to hack hardware, it would not be exactly difficult to connect the audio output jack to an audio input jack, would it? And you can't forbid your user to use earphones, can you? And, of course, the kernel can see all audio output unencrypted and can send a copy somewhere else...

The solution to your problem would be a ramdisk.
http://en.wikipedia.org/wiki/RAM_drive
It uses a piece of RAM as if it were a disk.
There is software available for this too. Caching databases in RAM is becoming popular.
It also keeps the file off the disk, where it would be easily accessible to the user.

Virtual Files for dynamic linking

My problem is pretty complicated and potentially impossible, but here we go:
Using C++,
I'm currently working on a universal server engine for a game project of mine. Universal, because every part of the engine will be loaded dynamically after startup. Game objects will also inherit from a base object and override its "Simulate" function. That way, every object has its own specific behavior, and I can do something I call "C++ scripting", which is a lot faster than interpreted Lua script files. It's also more dynamic.
(Please, no solutions that would kill the C++ "scripting" part, like "forget the dynamic linking, that's insane". This performance boost is totally necessary, since I'm working with large voxel maps.)
My problem:
That makes for a lot of .dll/.so files, and I want to pack them into a simple archive so I can use zlib on said code and maybe pack everything together with textures and sounds into little "object packages".
Now, the Windows DLL API and the Linux SO API won't allow me to load a dll/so file from a memory address, which is a shame. (Am I right there, or can I bypass that? :) ) I don't want to unzip those files and save them temporarily on the filesystem, because there are hundreds to thousands of them and that would increase the loading time a lot.
Also, I'm not interested in more external dependencies like Boost.
So here are my questions:
Is there a cross-platform method to create virtual files in memory with a real path?
That way I could bypass the slow IO speeds of HDDs.
Or is it really not such a big deal to use temp files, because the file buffers of modern operating systems are fast/intelligent enough to not write all those files to disk?
(Actually, Linux supports virtual file systems, but Windows does not...)
I hope you guys can help me there :)

Not with WinAPI, that's for sure, but you can do it manually. You can load the DLL into memory, fill in its import table, and call the exported functions (after you have called DllMain). I saw a program where someone actually created a new process with that method... See the PE documentation for details, but it works.
It is also relatively easy to do: you need to find the PE import tables and do what the dynamic linker does, filling them with the addresses of the imported functions. One caveat: DLLs are usually not position-independent, so if the image does not end up at its preferred base address, you also have to apply the base relocations from its relocation table.
It should be the same on Linux (only using the ELF structure), but if you have a better solution with virtual file systems, you should use that.
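On the Linux side, one way to get an in-memory file with a real path is memfd_create, sketched below. This assumes Linux 3.17 or later with glibc 2.27 or later and a mounted /proc; it is not cross-platform. The function names and the "packed_module" label are made up for this example.

```cpp
#include <sys/mman.h>   // memfd_create (Linux-specific)
#include <sys/types.h>  // ssize_t
#include <unistd.h>     // write, close
#include <cstddef>
#include <string>
#include <vector>

// Copy `bytes` (e.g. an unzipped .so image) into an anonymous in-memory file
// and return its descriptor, or -1 on failure. Nothing is written to disk.
int make_memory_file(const std::vector<unsigned char>& bytes) {
    int fd = memfd_create("packed_module", 0);  // name is only a debug label
    if (fd < 0) return -1;
    std::size_t done = 0;
    while (done < bytes.size()) {
        ssize_t n = write(fd, bytes.data() + done, bytes.size() - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += static_cast<std::size_t>(n);
    }
    return fd;
}

// Path under which the in-memory file can be opened while `fd` stays open.
std::string memory_file_path(int fd) {
    return "/proc/self/fd/" + std::to_string(fd);
}
```

The returned path can then be passed to dlopen like any other file name, for example dlopen(memory_file_path(fd).c_str(), RTLD_NOW). On Windows there is no direct equivalent, so the manual PE loading described above would still be needed there.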

Profiling DLL/LIB Bloat

I've inherited a fairly large C++ project in VS2005 which compiles to a DLL of about 5MB. I'd like to cut down the size of the library so it loads faster over the network for clients who use it from a slow network share.
I know how to do this by analyzing the code, includes, and project settings, but I'm wondering if there are any tools available which could make it easier to pinpoint what parts of the code are consuming the most space. Is there any way to generate a "profile" of the DLL layout? A report of what is consuming space in the library image and how much?

When you build your DLL, you can pass /MAP to the linker to have it generate a map file containing the addresses of all symbols in the resulting image. You will probably have to do some scripting to calculate the size of each symbol.
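The scripting step could look roughly like this. The sketch assumes you have already parsed the map file into (address, name) pairs; the parsing itself is left out because the exact column layout depends on the linker version. Symbol and estimate_sizes are names invented for this example. Each symbol's size is estimated as the gap to the next-higher address, which is only an approximation because padding is attributed to the preceding symbol.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Symbol {
    std::uint64_t address;  // address taken from the map file
    std::string name;
    std::uint64_t size;     // filled in by estimate_sizes()
};

// Estimate each symbol's size as the distance to the next symbol and return
// the symbols sorted largest first. The last symbol has no successor, so
// `image_end` supplies its upper bound.
std::vector<Symbol> estimate_sizes(std::vector<Symbol> symbols,
                                   std::uint64_t image_end) {
    std::sort(symbols.begin(), symbols.end(),
              [](const Symbol& a, const Symbol& b) {
                  return a.address < b.address;
              });
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        std::uint64_t next =
            (i + 1 < symbols.size()) ? symbols[i + 1].address : image_end;
        symbols[i].size = next - symbols[i].address;
    }
    std::stable_sort(symbols.begin(), symbols.end(),
                     [](const Symbol& a, const Symbol& b) {
                         return a.size > b.size;
                     });
    return symbols;
}
```

Printing the first few dozen entries of the result is usually enough to see which functions or data tables dominate the image.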
Using a "strings" utility to scan your DLL might reveal unexpected or unused printable strings (e.g. resources, RCS IDs, __FILE__ macros, debugging messages, assertions, etc.).
Also, if you're not already compiling with /Os enabled, it's worth a try.
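If you don't have a strings utility at hand on Windows, the scan is easy to approximate yourself. The sketch below collects runs of printable ASCII characters of at least a minimum length from a byte buffer (for instance the DLL read into memory). printable_runs is a made-up name, and unlike the real tool this version ignores UTF-16 strings, which Windows resources often use.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect every run of printable ASCII bytes (0x20..0x7E) that is at least
// `min_len` characters long, in the order the runs appear in `data`.
std::vector<std::string> printable_runs(const std::vector<unsigned char>& data,
                                        std::size_t min_len = 4) {
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char byte : data) {
        if (byte >= 0x20 && byte <= 0x7E) {
            current.push_back(static_cast<char>(byte));
        } else {
            if (current.size() >= min_len) runs.push_back(current);
            current.clear();
        }
    }
    // Flush a run that extends to the end of the buffer.
    if (current.size() >= min_len) runs.push_back(current);
    return runs;
}
```

Sorting the result by length, or counting how often the same source path shows up, tends to point at debug messages and __FILE__ expansions quickly.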

If your end goal is only to trim the size of the DLL, then after tweaking compiler settings, you'll probably get the quickest results by running your DLL through UPX. UPX is an excellent compression utility for DLLs and EXEs; it's also open-source with a non-viral license, so it's okay to use in commercial/closed-source products.
I've only had it turn up a virus warning on the highest compression setting (the brute-force option), so you'll probably be fine if you use a lower setting than that.

While I don't know of any binary size profilers, you could alternatively look at which object files (.obj) are the biggest. That gives you at least an idea of where your problematic spots are.
Of course this requires a sufficiently modularized project.

You can also try to link statically instead of using a DLL. When the library is linked statically, the linker can drop the functions the final exe never uses. Sometimes the final exe is only slightly bigger, and you no longer have a DLL to ship.

If your DLL is this big because it's exporting C++ functions with exceptionally long mangled names, an alternative is to use a .DEF file to export the functions by ordinal, without a name (using NONAME in the .DEF file). It is somewhat brittle, but it reduces the DLL size, EXE size and load times.
See e.g. http://home.hiwaay.net/~georgech/WhitePapers/Exporting/Exp.htm

Given that all your .obj files are about the same size, assuming that you're using precompiled headers, try creating an empty obj file and see how large it is. That will give you an idea of the proportion of each .obj that's due to the PCH compilation. The linker will be able to remove all the duplicates there, incidentally. Alternatively you could try disabling PCH so that the obj files will give you a better indication of where the main culprits are.

All good suggestions. What I do is get the map file and then just eyeball it. The kind of thing I've found in the past is that a large part of the space is taken by one or more class libraries brought in by the fact that some variable somewhere was declared as having a type that sounded like it would save some coding effort but wasn't really necessary.
For example, MFC (remember that?) has a wrapper class around everything Win32 provides, such as controls, fonts, etc. Those take a ton of space and you don't always need them.
Another thing that can take a ton of space is collection classes you could manage without. Another is cout I/O routines you don't use.

I would recommend one of the following:
coverage - run a coverage tool in the hope of detecting some dead code
caching - cache the DLL on the client side on the initial activation
splitting - split the DLL into several smaller DLLs, start the application with a bootstrap DLL and download the other DLLs after the application starts
compilation and linking - use a smaller runtime library, compile with size optimization, etc. See this link for more suggestions.
compression - if you have data or large resources within the DLL, you can compress them and decompress them only after the download or at runtime
How to decompress a file in fortran77?

I have a compressed file.
Let's ignore the tar command because I'm not sure it is compressed with that.
All I know is that it is compressed in fortran77 and that is what I should use to decompress it.
How can I do it?
Is decompression a one way road or do I need a certain header file that will lead (direct) the decompression?
It's not a .Z file. It ends in something else.
What do I need to decompress it? I know the format of the final decompressed archive.
Is it possible that the file is compressed in some simple way and just has a different extension?

First, let's get the "fortran" part out of the equation. There is no standard way (and by that, I mean in the Fortran standard) to either compress or decompress files, since Fortran doesn't have a compression utility as part of the language. Maybe someone has written one of their own, but that's entirely up to them.
So, you're stuck with publicly available compression utilities and such. On systems which have those available, and with compilers which support it (it varies), you can use the SYSTEM function, which executes a system command by passing a command string to the operating system's command interpreter (I know it exists in CVF, probably IVF too; you should look it up in your compiler's help).
Since you asked a similar question already, I assume you're still having a problem with this. You mentioned that "it was compressed with fortran77". What do you mean by that? That someone built a compression utility in F77 and used it? So that would make it a custom solution?
If it's some kind of custom solution, then it can practically be anything, since a lot of algorithms can serve as "compression algorithms" (writing a file as binary instead of plain text will save a few bytes; voila, "compression").
Or have I misunderstood something? Please elaborate on this a little.

My guess is that you have a binary file, which is output by a Fortran program. These can look like compressed files because they are not readable in a text editor.
Fortran allows you to write the in-memory data out to a file without formatting it, so that you can reload it later without having to parse it. The problem, however, is that you need that original source code in order to see what types of variables are written in the file.
If you have no access to the fortran source code, but a lot of time to spare, you could write some simple fortran program and guess what types of variables are being used. I wouldn't advise it, though, as Fortran is not very forgiving.
If you want some simple source code to try, look at this page which details binary read and write in Fortran, and includes a code sample. Just start by replacing reclength=reclength*4 with reclength=reclength*2 for a double precision real.
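If the file was written with sequential (not direct-access) unformatted WRITE statements, many compilers frame each record with a length marker before and after the payload. The C++ sketch below splits a buffer into such records, which can help with guessing the variable types from the record sizes. It assumes 4-byte markers in the machine's native byte order, which is a common default (for example with gfortran) but not guaranteed by the language, and it does not handle the split-record scheme some compilers use for very large records. split_records is a name made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Split a buffer into records framed as [int32 length][payload][int32 length].
// Throws std::runtime_error if the framing does not match that assumption.
std::vector<std::vector<unsigned char>> split_records(
    const std::vector<unsigned char>& file) {
    std::vector<std::vector<unsigned char>> records;
    std::size_t pos = 0;
    while (pos < file.size()) {
        std::int32_t head = 0, tail = 0;
        if (file.size() - pos < sizeof head)
            throw std::runtime_error("truncated record header");
        std::memcpy(&head, file.data() + pos, sizeof head);
        pos += sizeof head;
        if (head < 0 ||
            file.size() - pos < static_cast<std::size_t>(head) + sizeof tail)
            throw std::runtime_error("truncated or unsupported record");
        const std::size_t len = static_cast<std::size_t>(head);
        records.emplace_back(file.data() + pos, file.data() + pos + len);
        pos += len;
        std::memcpy(&tail, file.data() + pos, sizeof tail);
        pos += sizeof tail;
        if (tail != head)
            throw std::runtime_error("length markers disagree");
    }
    return records;
}
```

A record of 8 bytes might be one double precision value or two 4-byte reals or integers, so the sizes narrow the possibilities down but do not settle them. If the very first marker already looks implausible, the file is probably direct-access or not Fortran output at all.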

There is no standard decompression method, there are tons. You will need to know the method used to compress it in order to decompress it.

You said that the file extension was not .Z, but something else. What was that something else?
If it's .gz (which is very common on Unix systems), "gunzip" is the proper command. If it's .tgz, you can gunzip and untar it. (Or you can read the man page for tar(1), since it probably has the ability to gunzip and extract together.)
If it's on Windows, see if Windows can read it directly, as the file system itself appears to support the ZIP format.
If something else, please just list the file name (or, if there are security implications, the file name beginning with the first period), and we might be able to figure it out.

You can check to see if it's a known compressed file type with the file command. Assuming file returns something like "binary file" then you're almost certainly looking at plain binary data.
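The file command works largely by looking at "magic" bytes at the start of the file, and for the common compression formats you can do the same check by hand. A small C++ sketch follows; sniff_format is a made-up name, and the table only covers a handful of well-known signatures, so "unknown" does not prove the data is uncompressed.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Guess a compression format from the first few bytes of a file.
std::string sniff_format(const std::vector<unsigned char>& head) {
    auto starts_with = [&head](std::initializer_list<unsigned char> magic) {
        if (head.size() < magic.size()) return false;
        std::size_t i = 0;
        for (unsigned char m : magic) {
            if (head[i++] != m) return false;
        }
        return true;
    };
    if (starts_with({0x1F, 0x8B})) return "gzip";
    if (starts_with({0x1F, 0x9D})) return "compress (.Z)";
    if (starts_with({'B', 'Z', 'h'})) return "bzip2";
    if (starts_with({'P', 'K', 0x03, 0x04})) return "zip";
    return "unknown";
}
```

Reading the first four bytes of the mystery file and passing them in is enough. A match tells you which standard tool to try regardless of the file's extension.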
