How can I implement a custom inode on Linux? - C++

So every directory, file, queue or whatever in Linux creates its own inodes that can be accessed in one way or another. How would I go about implementing my own inode type that doesn't quite fit any of the existing descriptions? A custom something that is visible in the file system but isn't a file? Do I have to extend the kernel, or is there some simpler approach?

So every directory, file, queue or whatever in Linux creates its own inodes that can be accessed in one way or another.
False. Directories, files etc. do not create their own inodes. They are stored using inodes belonging to the filesystem on which they are stored. The inodes are not even created specifically for particular files: on classic filesystems such as ext2/ext3/ext4, all inodes are created as part of filesystem creation, before there are any files stored on it.*
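To see this on a running system, the minimal sketch below (Linux/POSIX assumed; `print_inode_info` is a hypothetical name) prints the inode number that `stat()` reports for a path, plus the total and free inode counts that `statvfs()` reports for the filesystem holding it. On ext4 the total is fixed when the filesystem is created; filesystems that allocate inodes dynamically may report 0 or an estimate here.

```cpp
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <cstdio>

// Prints the inode number of `path` and the inode counts of the filesystem
// that contains it. Returns false if either system call fails.
bool print_inode_info(const char* path) {
    struct stat st;
    if (stat(path, &st) != 0) return false;

    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0) return false;

    std::printf("%s: inode %llu\n", path,
                static_cast<unsigned long long>(st.st_ino));
    // f_files = total inodes, f_ffree = free inodes on this filesystem.
    std::printf("filesystem inodes: %llu total, %llu free\n",
                static_cast<unsigned long long>(vfs.f_files),
                static_cast<unsigned long long>(vfs.f_ffree));
    return true;
}
```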
How would I go about implementing my own inode type that doesn't quite fit any of the existing descriptions?
It's unclear why you think you need a custom inode type, but if you do, then you need a whole custom filesystem. You will need to write either kernel drivers or FUSE drivers implementing it, plus all the needed utilities for formatting a device with that FS, mounting and unmounting it, checking it for errors, etc.
A custom something that is visible in the file system but isn't a file? Do I have to extend the kernel or is there some simpler approach?
Everything is a file. This is one of the principles of UNIX. But perhaps you mean something that isn't a regular file. Unfortunately for you, even a custom filesystem and inode wouldn't be enough to give you a custom file type. The partition of filesystem entries into regular files, directories, character and block special files, etc. is deeply ingrained in the kernel and in the standard file management APIs and utilities. You would not only have to extend the kernel (beyond writing filesystem drivers), but also modify the C standard library, several standard utilities, and probably a bunch of other libraries and utilities affected by those changes. In the end, you would basically have your own whole operating system.
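How fixed that set of file types is shows up directly in the `stat()` API: the type bits of `st_mode` can only take one of a handful of values, tested with the `S_ISxxx` macros. A minimal sketch, assuming a POSIX system (`file_type` is a hypothetical helper name):

```cpp
#include <sys/stat.h>
#include <string>

// Maps an lstat() result onto the standard POSIX file types.
std::string file_type(const char* path) {
    struct stat st;
    if (lstat(path, &st) != 0) return "error";
    if (S_ISREG(st.st_mode))  return "regular file";
    if (S_ISDIR(st.st_mode))  return "directory";
    if (S_ISCHR(st.st_mode))  return "character device";
    if (S_ISBLK(st.st_mode))  return "block device";
    if (S_ISFIFO(st.st_mode)) return "FIFO";
    if (S_ISLNK(st.st_mode))  return "symbolic link";
    if (S_ISSOCK(st.st_mode)) return "socket";
    return "unknown";  // anything else would be implementation-specific
}
```

Every tool that lists or copies files has to handle these cases; a new type would need the same treatment everywhere.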
But maybe your premise is wrong. UNIX has been going along just fine with pretty much its current file model for a very long time. It's unclear why you want what you say you want, but there are at least two simpler options that might suit you:
Write a kernel driver for a character or block device with a filesystem interface, and use the system's existing facilities to link one or more device instances to the filesystem as a character or block special file.
Embed what you want to do in regular files / directories / etc.
*More or less. I ignore special administrative actions that may in some cases be able to expand a filesystem and add inodes to it in the process.

Related

Override c library file functions?

I am working on a game, and one of the requirements per the licence agreement of the sound assets I am using is that they be distributed in a way that makes them inaccessible to the end user. So, I am thinking about aggregating them into a flat file, encrypting them, or some such. The problem is that the sound library I am using (Hekkus Sound System) only accepts a 'char*' file path and handles file reading internally. So, if I am to continue to use it, I will have to override the C stdio file functions to handle encryption or whatever I decide to do. This seems doable, but it worries me. Looking on the web I am seeing people running into strange, frustrating problems doing this on the platforms I am concerned with (Win32, Android and iOS).
Does there happen to be a cross-platform library out there that takes care of this? Is there a better approach entirely you would recommend?
Do you have the option of using a named pipe instead of an ordinary file? If so, you can present the pipe to the sound library as the file to read from, and you can decrypt your data and write it to the pipe, no problem. (See Beej's Guide for an explanation of named pipes.)
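A minimal POSIX sketch of the named-pipe idea (Win32 named pipes use a different API, so this does not carry over as-is): a writer thread pushes the bytes into a FIFO created with `mkfifo()`, and the consumer opens the FIFO path as if it were an ordinary file. The function name and the `/tmp` path used in practice are hypothetical, and this only works if the library reads the file sequentially, since a pipe cannot be seeked.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <cerrno>
#include <cstdio>
#include <string>
#include <thread>
#include <utility>

// Creates a FIFO at `fifo_path` (if needed) and starts a thread that writes
// `data` into it. The thread blocks until a reader opens the path.
// Returns a joinable thread on success, a non-joinable one on failure.
std::thread serve_through_fifo(const std::string& fifo_path, std::string data) {
    if (mkfifo(fifo_path.c_str(), 0600) != 0 && errno != EEXIST)
        return std::thread();
    return std::thread([fifo_path, data = std::move(data)] {
        // In the real case `data` would be the freshly decrypted asset.
        if (std::FILE* out = std::fopen(fifo_path.c_str(), "w")) {
            std::fwrite(data.data(), 1, data.size(), out);
            std::fclose(out);
        }
    });
}
```

The path of the FIFO is what you would hand to the sound library; join the thread once the library has finished reading.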
Overriding stdio so that a library whose internals you don't know behaves in a way its developer never intended doesn't look like the right approach to me, as it isn't easy. Implementing a RAM drive takes so much effort that I recommend searching for another audio library.
The Hekkus Sound System I found was built by a single person and last updated in 2012. I wouldn't rely on a library with only one person working on it and no shared source code.
My advice: invest your time in searching for a proper sound library instead of searching for a fishy workaround for this one.
One possibility is to use an encrypted loopback filesystem (google for additional resources).
The way this works is that you put your assets on an encrypted filesystem, which actually lives in a simple file. This filesystem gets mounted someplace as a loopback device. A password needs to be supplied at attach/mount time. Once mounted, all files are available as regular files to your software. Otherwise, the files are encrypted and inaccessible.
It's compiler-dependent and not a guaranteed feature, but many allow you to embed files/resources directly into the exe and read them in your code as if from disk. You could embed your sound files that way. It will significantly increase the size of your exe however.
Another UNIX-based approach:
The environment variable LD_PRELOAD names shared libraries that the dynamic linker loads before all others, so the symbols they export take precedence over those of the libraries the executable was linked against, including libc functions like open, read, and close. Using libdl (dlsym() with the RTLD_NEXT handle), the wrapping library can still call through to the original implementation.
So, all you need to do is to start the process which uses the Hekkus Sound System in an environment that has LD_PRELOAD set appropriately, and you can do anything you like to the file that it reads.
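A sketch of such a wrapping library, assuming Linux with glibc and assuming the sound library opens its files through `fopen()` (if it uses `open()` or `fopen64()`, those would need wrapping instead). The `/virtual/` and `/tmp/` prefixes and `redirect_path` are hypothetical; a real shim would decrypt at this point rather than just rewrite the path. It would typically be built with something like `g++ -shared -fPIC shim.cpp -o libshim.so -ldl` and activated with `LD_PRELOAD=./libshim.so ./game`.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical mapping: "/virtual/<name>" is served from "/tmp/<name>".
static const char kVirtualPrefix[] = "/virtual/";
static const char kRealPrefix[] = "/tmp/";

// Returns the path the real fopen() should be given.
std::string redirect_path(const char* path) {
    if (path == nullptr) return std::string();
    const std::size_t n = std::strlen(kVirtualPrefix);
    if (std::strncmp(path, kVirtualPrefix, n) == 0)
        return std::string(kRealPrefix) + (path + n);
    return std::string(path);
}

// Interposed fopen(): every caller whose fopen resolves to this library ends up here.
extern "C" std::FILE* fopen(const char* path, const char* mode) {
    using fopen_fn = std::FILE* (*)(const char*, const char*);
    // RTLD_NEXT finds the next fopen in search order, normally libc's.
    static const fopen_fn real_fopen =
        reinterpret_cast<fopen_fn>(dlsym(RTLD_NEXT, "fopen"));
    if (real_fopen == nullptr) return nullptr;
    // A real shim could decrypt here, e.g. into a buffer opened with fmemopen().
    const std::string target = redirect_path(path);
    return real_fopen(target.c_str(), mode);
}
```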
Note, however, that there is absolutely no way that you can keep the data inaccessible from the user: the very fact that he has to be able to hear it means he has to have access. Even if all software in the chain used encryption and your user were not willing to hack hardware, it would not exactly be difficult to connect the audio output jack to an audio input jack, would it? And you can't forbid your user to use earphones, can you? And, of course, the kernel can see all audio output unencrypted and can send a copy somewhere else...
The solution to your problem would be a ramdisk.
http://en.wikipedia.org/wiki/RAM_drive
Using a piece of memory in RAM as if it were a disk.
There is software available for this too. Caching databases in RAM is becoming popular.
And it keeps the file off the disk, where it would be easily accessible to the user.

File table in Ubuntu OS

Does Linux/Ubuntu create a table that keeps an entry for every file, with its absolute address on the hard drive?
Just curious to know, because I am planning to make a file searcher program.
I know there are terminal commands like find, but since I will program in C, I was wondering whether Ubuntu keeps any such table, and if so, how can I access it?
Update:
As some people mentioned, there is no such thing. So if I want to make a file searcher program, I would have to search each and every folder of every directory, starting from the root directory. The resulting program will be very sluggish and will perform poorly! Is there a better way, or is my way fine?
The "thing" you describe is commonly called a file system and as you may know there's a choice of file systems available for Linux: ext3, ext4, btrfs, Reiser, xfs, jffs, and others.
The table you describe would probably map quite well onto the inode-directory combo.
From my point of view, the entire management of where files are physically located on the hard disk is none of the user's business; it's strictly the operating system's domain and not something to mess with unless you have an excellent excuse (like you're writing a data recovery program) and very deep knowledge of the file system(s) involved. Moreover, in most cases a file's storage will not be contiguous, but spread over several locations on the disk (fragments).
But the more important question here is probably: what exactly do you hope to achieve by finding files this way?
EDIT: based on OP's comment I think there may be a serious misunderstanding here - I can't see the link between absolute file addresses and a file searcher, but that may be due to a fundamental difference between our respective understanding of "absolute address" in the context of a file system.
If you just want to look at all files in the file system you can either
1) perform a recursive directory read or
2) use the database prepared by updatedb as suggested by SmartGuyz
As you want to look into the files anyway (and that's where almost all runtime will be spent), I can't think of any advantage 2) would have over 1), and 2) has the disadvantage of an external dependency: the database prepared by updatedb must exist and be very fresh.
An SO question about more advanced ways of traversing directories than good old opendir/readdir/closedir: Efficiently Traverse Directory Tree with opendir(), readdir() and closedir()
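The recursive directory read does not have to be written by hand against opendir/readdir: in C++17, `std::filesystem::recursive_directory_iterator` does the recursion. A minimal sketch (`find_by_name` is a hypothetical name) that, like `find -name`, matches files and directories alike:

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Recursively collects every entry under `root` whose file name equals `name`.
// Directory symlinks are not followed and unreadable directories are skipped;
// iteration stops at the first other error.
std::vector<fs::path> find_by_name(const fs::path& root, const std::string& name) {
    std::vector<fs::path> hits;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;
    while (!ec && it != end) {
        if (it->path().filename().string() == name)
            hits.push_back(it->path());
        it.increment(ec);  // error_code overload: reports errors instead of throwing
    }
    return hits;
}
```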
EDIT2 based on OP's question addendum: yes, traversing directories takes time, but that's life. Consider the next best thing, i.e. locate and friends. It depends on a "database" that is updated regularly (typically once daily), so all files that were added or renamed after the last scheduled update will not be found, and files that were removed after the last scheduled update will still be listed in the database although they don't exist anymore. And that assumes locate is even installed on the target machine, something you can't be sure of.
As with most things in programming, it never hurts to look at previous solutions to the same problem, so may I suggest you read the documentation of GNU findutils?
No, there is no single table of block addresses of files; you need to go deeper.
First of all, the file layout depends on the filesystem type (e.g. ext2, ext3, btrfs, reiserfs, jfs, xfs, etc). The Linux kernel abstracts this: it provides drivers for accessing files on a lot of filesystems, and each partition with its filesystem is presented within the single Virtual File System (the single file-directory tree, which contains other devices as its subtrees).
So, basically no, you need to use the kernel abstract interfaces (readdir(), /proc/mounts and so on) in order to search for files or roll your own userspace drivers (e.g. through FUSE) to examine raw block devices (/dev/sda1 etc) if you really need to examine low-level details (this requires a lot of understanding of the kernel/filesystems internals and is highly error-prone).
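As an example of those kernel interfaces, the list of mounted filesystems is exposed in /proc/mounts and can be read with the glibc `getmntent()` family instead of examining raw block devices. A minimal sketch, assuming Linux with glibc (`Mount` and `list_mounts` are hypothetical names):

```cpp
#include <mntent.h>
#include <cstdio>
#include <string>
#include <vector>

struct Mount {
    std::string device;       // e.g. /dev/sda1
    std::string mount_point;  // where it appears in the single directory tree
    std::string fs_type;      // e.g. ext4, btrfs, xfs
};

// Reads the kernel's table of currently mounted filesystems.
std::vector<Mount> list_mounts() {
    std::vector<Mount> mounts;
    std::FILE* f = setmntent("/proc/mounts", "r");
    if (f == nullptr) return mounts;
    while (mntent* m = getmntent(f))
        mounts.push_back({m->mnt_fsname, m->mnt_dir, m->mnt_type});
    endmntent(f);
    return mounts;
}
```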
updatedb -l 0 -o db_file -U source_directory
This will create a database with files, I hope this will help you.
No. The file system is actually structured with directories, each directory containing files and directories.
Within Linux, all of this is managed in the kernel with inodes.
YES.
Conceptually, it does create a table of every file's location on the disc**. There are a lot of details which muddy this picture slightly.
However, you should usually not care. You don't want to work at that level, nor should you. There are many filesystems in Linux which all do it in a slightly (or even significantly) different way.
** Not actually the physical location. A hard drive may map logical blocks to physical blocks in some way determined by its firmware.

Distributing DLLs Inside an EXE (C++)

How can I include my program's dependency DLLs inside the EXE file (so I only have to distribute that one file)? I am using C++ so I can't use ILMerge like I usually do for C#, but is there an easier way to automatically do this in Visual Studio?
I know this is possible (that's why installers work), I just need some help being pointed to the best way to do this.
Thank you for your time.
There are many problems with this approach. For one example, see this post from REAL Software. Their “REALbasic” product used to do this and had problems including:
When writing the DLLs out at run-time, it would trigger anti-virus warnings.
Problems with machines where the user doesn’t have write permissions or is low on disk space.
Their attempt to fix the problem caused more problems, including crashes. Eventually they relented and now distribute DLLs side-by-side with apps.
If you really need a single-EXE deployment, and can’t use an installer for some reason, the reliable way is to static-link all dependencies. This assumes that you have the correct .libs (and not just .libs that link in the DLL).
There exist two options, both of which are far from ideal:
write a temporary file somewhere
load the DLL to memory "by hand", i.e. create a memory block, put DLL image to memory, then process relocations and external references.
The downside of the first approach is described above by Nate. Second approach is possible, but is complicated (requires deep knowledge of certain low-level things) and doesn't allow the DLL code to access DLL resources (this is obvious - there's no image of the DLL so the OS doesn't know where to take resources).
One more option usable in some scenarios: create a virtual disk whose contents are stored in your EXE file resources, and load the DLL from there. This is possible using our SolFS product (OS edition), but creation of the virtual disk itself requires use of kernel-mode drivers which must be written to disk before use.
Most installers use a zip file (or something similar) to hold whatever files are needed. When you run the installer, it decompresses the data and puts the individual files where needed (and typically adds registry entries, registers any COM controls it installed, etc.)

How to create a virtual file?

I'd like to simulate a file without writing it on disk. I have a file at the end of my executable and I would like to give its path to a dll. Of course since it doesn't have a real path, I have to fake it.
I first tried using named pipes under Windows to do it. That would allow for a path like \\.\pipe\mymemoryfile, but I can't make it work, and I'm not sure the dll would support a path like this.
Second, I found CreateFileMapping and GetMappedFileName. Can they be used to simulate a file in a fragment of another? I'm not sure this is what this API does.
What I'm trying to do seems similar to boxedapp. Any ideas about how they do it? I suppose it's something like API interception (like Detours), but that would be a lot of work. Is there another way to do it?
Why? I'm interested in this specific solution because I'd like to hide the data and to benefit from distributing only one file, but also for the geeky reason of making it work that way ;)
I agree that copying data to a temporary file would work and be a much easier solution.
Use BoxedApp and do not worry.
You can store the data in an NTFS stream. That way you can get a real path pointing to your data that you can give to your dll in the form of
x:\myfile.exe:mystreamname
This works precisely like a normal file; however, it only works if the file system used is NTFS. This is standard under Windows nowadays, but is of course not an option if you want to support older systems or would like to be able to run this from a USB stick or similar. Note that any streams present in a file will be lost if the file is sent as an attachment in mail or simply copied from an NTFS partition to a FAT32 partition.
I'd say that the most compatible way would be to write your data to an actual file, but you can of course do it one way on NTFS systems and another on FAT systems. I do recommend against it because of the added complexity. The appropriate way would be to distribute your files separately of course, but since you've indicated that you don't want this, you should in that case write it to a temporary file and give the dll the path to that file. Make sure you write the temporary file to the users' temp directory (you can find the path using GetTempPath in C/C++).
Your other option would be to write a filesystem filter driver, but that is a road that I strongly advise against. That sort of defeats the purpose of using a single file as well...
Also, in case you want only a single file for distribution, how about using a zip file or an installer?
Pipes are for communication between processes running concurrently. They don't store data for later access, and they don't have the same semantics as files (you can't seek or rewind a pipe, for instance).
If you're after file-like behaviour, your best bet will always be to use a file. Under Windows, you can pass FILE_ATTRIBUTE_TEMPORARY to CreateFile as a hint to the system to avoid flushing data to disk if there's sufficient memory.
If you're worried about the performance hit of writing to disk, the above should be sufficient to avoid the performance impact in most cases. (If the system is low enough on memory to force the file data out to disk, it's probably also swapping heavily anyway -- you've already got a performance problem.)
If you're trying to avoid writing to disk for some other reason, can you explain why? In general, it's quite hard to stop data from ever hitting the disk -- the user can always hibernate the machine, for instance.
Since you don't have control over the DLL you have to assume that the DLL expects an actual file. It probably at some point makes that assumption which is why named pipes are failing on you.
The simplest solution is to create a temporary file in the temp directory, write the data from your EXE to the temp file and then delete the temporary file.
Is there a reason you are embedding this "pseudo-file" at the end of your EXE instead of just distributing it with your application? You are obviously already distributing this third-party DLL with your application, so one more file doesn't seem like it is going to hurt you?
Another question: will this data be changing? That is, are you expecting to write data back to this "pseudo-file" in your EXE? I don't think that will work well. Standard users may not have write access to the EXE, and that would probably drive anti-virus software nuts.
And no, CreateFileMapping and GetMappedFileName definitely won't work, since they don't give you a file name that can be passed to CreateFile. If you could somehow get this DLL to accept a HANDLE, then that would work.
And I wouldn't even bother with API interception. Just hand the DLL a path to an actual file.
Reading your question made me think: if you can pretend an area of memory is a file and have kind of "virtual path" to it, then this would allow loading a DLL directly from memory which is what LoadLibrary forbids by design by asking for a path name. And this is why people write their own PE loader when they want to achieve that.
I would say you can't achieve what you want with file mapping: the purpose of file mapping is to treat a portion of a file as if it was physical memory, and you're wanting the reciprocal.
Using Detours implies that you would have to replicate everything the intercepted DLL function does except obtaining data from a real file; hence it's not generic. Or, even more intricate, let's pretend the DLL uses fopen; then you provide your own fopen that detects a special pattern in the path and you mimic the C runtime internals... Hmm, is it really worth all the pain? :D
Please explain why you can't extract the data from your EXE and write it to a temporary file. Many applications do this -- it's the classic solution to this problem.
If you really must provide a "virtual file", the cleanest solution is probably a filesystem filter driver. "clean" doesn't mean "good" -- a filter is a fully documented and supported solution, so it's cleaner than API hooking, injection, etc. However, filesystem filters are not easy.
OSR Online is the best place to find Windows filesystem information. The NTFSD mailing list is where filesystem developers hang out.
How about using some sort of RAM disk and writing the file to it? I have tried some RAM disks myself, though I never found a good one; tell me if you are successful.
Well, if you need to have the virtual file allocated in your exe, you will need to create a vector, stream or char array big enough to hold all of the virtual data you want to write.
That is the only solution I can think of without doing any I/O to disk (even if you don't write to a file).
If you need to keep a file like path syntax, just write a class that mimics that behaviour and instead of writing to a file write to your memory buffer. It's as simple as it gets. Remember KISS.
Cheers
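In C++, the "class that mimics a file" can be a `std::streambuf` that reads straight from the memory block, so any code written against `std::istream` works unchanged, including seeking. A minimal read-only sketch (`MemoryBuf` is a hypothetical name); note that it only helps code you control, since it cannot give a third-party DLL a file path:

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Read-only, seekable stream buffer over an existing block of memory.
// No copy is made and nothing touches the disk; the memory must outlive it.
class MemoryBuf : public std::streambuf {
public:
    MemoryBuf(const char* data, std::size_t size) {
        // The get area is only ever read, so dropping const here is safe.
        char* p = const_cast<char*>(data);
        setg(p, p, p + size);
    }

protected:
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        if (!(which & std::ios_base::in)) return pos_type(off_type(-1));
        const off_type size = egptr() - eback();
        off_type base = 0;  // std::ios_base::beg
        if (dir == std::ios_base::cur) base = gptr() - eback();
        else if (dir == std::ios_base::end) base = size;
        const off_type target = base + off;
        if (target < 0 || target > size) return pos_type(off_type(-1));
        setg(eback(), eback() + target, egptr());
        return pos_type(target);
    }

    pos_type seekpos(pos_type pos, std::ios_base::openmode which) override {
        return seekoff(off_type(pos), std::ios_base::beg, which);
    }
};
```

Usage: `MemoryBuf buf(data, size); std::istream in(&buf);` and then read from `in` as from any stream.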
Open the file called "NUL:" for writing. It's writable, but the data are silently discarded. Kinda like /dev/null of *nix fame.
You cannot memory-map it though. Memory-mapping implies read/write access, and NUL is write-only.
I'm guessing that this dll can't take a stream? It's almost too simple to ask, but if it can, you could just use that.
Have you tried using the \\?\ prefix when using named pipes? Many APIs support using \\?\ to pass the remainder of the path directly through without any parsing/modification.
http://msdn.microsoft.com/en-us/library/aa365247(VS.85,lightweight).aspx
Why not just add it as a resource - http://msdn.microsoft.com/en-us/library/7k989cfy(VS.80).aspx - the same way you would add an icon.

Unpacking an executable from within a library in C/C++

I am developing a library that uses one or more helper executable in the course of doing business. My current implementation requires that the user have the helper executable installed on the system in a known location. For the library to function properly the helper app must be in the correct location and be the correct version.
I would like to remove the requirement that the system be configured in the above manner.
Is there a way to bundle the helper executable in the library such that it could be unpacked at runtime, installed in a temporary directory, and used for the duration of one run? At the end of the run the temporary executable could be removed.
I have considered automatically generating a file containing an unsigned char array that holds the bytes of the executable. This would be done at compile time as part of the build process. At runtime this array would be written to a file, thus creating the executable.
Would it be possible to do such a task without writing the executable to a disk (perhaps some sort of RAM disk)? I could envision certain virus scanners and other security software objecting to such an operation. Are there other concerns I should be worried about?
The library is being developed in C/C++ for cross platform use on Windows and Linux.
"A clever person solves a problem. A wise person avoids it." (Albert Einstein)
In the spirit of this quote I recommend that you simply bundle this executable along with the end-application.
Just my 2 cents.
You can use xxd to convert a binary file to a C header file.
$ echo -en "\001\002\005" > x.binary
$ xxd -i x.binary
unsigned char x_binary[] = {
0x01, 0x02, 0x05
};
unsigned int x_binary_len = 3;
xxd is pretty standard on *nix systems, and it's available on Windows with Cygwin or MinGW, or Vim includes it in the standard installer as well. This is an extremely cross-platform way to include binary data into compiled code.
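Writing such an array back out as a runnable helper could look like the sketch below, assuming Linux/POSIX and the `name[]`/`name_len` layout that `xxd -i` produces. The array contents are a hypothetical stand-in (a 10-byte `#!/bin/sh` script), and `extract_executable` is a hypothetical name. `mkstemp()` picks a unique file name and `fchmod()` sets the execute bit before the file is used; real code would also honor TMPDIR and delete the file once the helper exits.

```cpp
#include <sys/stat.h>
#include <unistd.h>
#include <cstdlib>
#include <string>

// Stand-in for what `xxd -i helper` would generate (hypothetical contents:
// the 10-byte script "#!/bin/sh\n").
unsigned char helper[] = {0x23, 0x21, 0x2f, 0x62, 0x69,
                          0x6e, 0x2f, 0x73, 0x68, 0x0a};
unsigned int helper_len = 10;

// Writes `len` bytes to a new, uniquely named file in /tmp and marks it
// executable for the owner. Returns the path, or "" on failure.
// The caller is responsible for deleting the file after use.
std::string extract_executable(const unsigned char* data, unsigned int len) {
    char path[] = "/tmp/helperXXXXXX";
    const int fd = mkstemp(path);  // creates the file with mode 0600
    if (fd < 0) return "";
    bool ok = true;
    unsigned int done = 0;
    while (ok && done < len) {
        const ssize_t n = write(fd, data + done, len - done);
        if (n <= 0) ok = false;
        else done += static_cast<unsigned int>(n);
    }
    if (ok && fchmod(fd, S_IRWXU) != 0) ok = false;  // mode 0700
    if (close(fd) != 0) ok = false;
    if (!ok) {
        unlink(path);
        return "";
    }
    return std::string(path);
}
```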
Another approach is to use objcopy to append data on to the end of an executable -- IIRC you can obtain objcopy and use it for PEs on Windows.
One approach I like a little better than that is to just append raw data straight onto the end of your executable file. In the executable, you seek to the end of the file, and read in a number, indicating the size of the attached binary data. Then you seek backwards that many bytes, and fread that data and copy it out to the filesystem, where you could treat it as an executable file. This is incidentally the way that many, if not all, self-extracting executables are created.
If you append the binary data, it works with both Windows PE files and *nix ELF files -- neither of them read past the "limit" of the executable.
Of course, if you need to append multiple files, you can either append a tar/zip file to your exe, or you'll need a slightly more advanced data structure to read what's been appended.
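The seek-to-the-end scheme can be sketched as follows. The trailer layout (payload followed by its size as an 8-byte little-endian integer) is an assumption made for this example, as are the function names. On Linux a program can open its own image through /proc/self/exe; on Windows, GetModuleFileName gives the path.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical layout: [original file][payload][payload size, 8 bytes, little-endian]

// Build step: appends `payload` and its size trailer to `file`.
bool append_payload(const std::string& file, const std::vector<char>& payload) {
    std::ofstream out(file, std::ios::binary | std::ios::app);
    if (!out) return false;
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    const std::uint64_t size = payload.size();
    for (int i = 0; i < 8; ++i)
        out.put(static_cast<char>((size >> (8 * i)) & 0xff));
    return static_cast<bool>(out);
}

// Run time: seek to the end, read the size, seek back that many bytes, read.
bool read_payload(const std::string& file, std::vector<char>& payload) {
    std::ifstream in(file, std::ios::binary | std::ios::ate);
    if (!in) return false;
    const std::streamoff total = in.tellg();
    if (total < 8) return false;
    in.seekg(total - 8);
    unsigned char trailer[8];
    if (!in.read(reinterpret_cast<char*>(trailer), 8)) return false;
    std::uint64_t size = 0;
    for (int i = 7; i >= 0; --i) size = (size << 8) | trailer[i];
    if (size > static_cast<std::uint64_t>(total - 8)) return false;  // corrupt trailer
    payload.resize(static_cast<std::size_t>(size));
    in.seekg(total - 8 - static_cast<std::streamoff>(size));
    return static_cast<bool>(in.read(payload.data(), static_cast<std::streamsize>(size)));
}
```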
You'll also probably want to UPX your executables before you append them.
You might also be interested in the LZO library, which is reportedly one of the fastest-decompressing compression libraries. They have a MiniLZO library that you can use for a very lightweight decompressor. However, the LZO libraries are GPL licensed, so that might mean you can't include it in your source code unless your code is GPLed as well. On the other hand, there are commercial licenses available.
Slightly different approach than using an unsigned char* array is to put the entire executable binary as resource of the dll. At runtime, you can save the binary data as a local temp file and execute the app. I'm not sure if there is a way to execute an executable in memory, though.
For the library to function properly the helper app must be in the correct location
On Windows, would that be the Program Files directory or System32 directory?
This might be a problem. When an application is installed, particularly in a corporate environment, it usually happens in a context with administrative rights. On Vista and later with UAC enabled (the default), this is necessary to write to certain directories. And most Unix flavours have had sensible restrictions like that for as long as anyone can remember.
So if you try to do it at the time the host application calls into your library, that might not be in a context with sufficient rights to install the files, and so your library would put constraints on the host application.
(Another thing that will be ruled out is Registry changes, or config file updates on the various Unices, if the host application doesn't have the ability to elevate the process to an administrative level.)
Having said all that, you say you're considering unpacking the helpers into a temporary directory, so maybe this is all moot.
Qt has an excellent method of achieving this: QResource
"The Qt resource system is a platform-independent mechanism for storing binary files in the application's executable."
You don't say if you are currently using Qt, but you do say "C++ for cross platform use on Windows and Linux", so even if you aren't using it, you may want to consider starting.
There is a way in Windows to run an executable from within memory without writing it to disk. The problem is that due to modern security systems (DEP) this probably won't work on all systems and almost any anti-malware scanner will detect it and warn the user.
My advice is to simply package the executable into your distribution, it's certainly the most reliable way to achieve this.
Well, my first thought would be: what does this helper executable do that couldn't be done within your library's code itself, perhaps using a secondary thread if necessary. This might be something to consider.
But as for the actual question... If your "library" is actually bundled up as a dll (or even an exe), then at least Windows has relatively simple support for embedding files within your library.
The resource mechanism that allows things like version information and icons to be embedded within executables can also allow arbitrary chunks of data. Since I don't know what development environment you're using, I can't say exactly how to do this. But roughly speaking, you'd need to create a custom resource with a type of "FILE" or something sensible like that and point it at the exe you want to embed.
Then, when you want to extract it, you would write something like
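HMODULE hModule = NULL; // NULL means the .exe; code in a DLL must pass the DLL's own module handle (e.g. the hinstDLL received in DllMain)
HRSRC hResource = FindResource(hModule, MAKEINTRESOURCE(IDR_MY_EMBEDDED_FILE), TEXT("FILE"));
HGLOBAL hResourceData = LoadResource(hModule, hResource);
LPVOID pData = LockResource(hResourceData);
DWORD dwSize = SizeofResource(hModule, hResource);
HANDLE hFile = CreateFile(TEXT("DestinationPath\\Helper.exe"), GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile != INVALID_HANDLE_VALUE) {
    DWORD dwBytesWritten = 0;
    WriteFile(hFile, pData, dwSize, &dwBytesWritten, NULL);
    CloseHandle(hFile);
}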
(filling in your own desired path, filename, and any appropriate error checking of course)
After that, the helper exe exists as a normal exe file and so you can execute it however you normally would.
For removing the file after use, you should investigate the flags for CreateFile, particularly FILE_FLAG_DELETE_ON_CLOSE. You might also look at using MoveFileEx when combining the MOVEFILE_DELAY_UNTIL_REBOOT flag with NULL passed for the new file name. And of course, you could always delete it in your own code if you can tell when the executable has ended.
I don't know enough about Linux executables, so I don't know if a similar feature is available there.
If Linux doesn't provide any convenient mechanism and/or if this idea doesn't suit your needs in Windows, then I suppose your idea of generating an unsigned char array from the contents of the helper exe would be the next best way to embed the exe in your library.