I have a compressed file.
Let's ignore the tar command because I'm not sure it is compressed with that.
All I know is that it was compressed with Fortran 77, and that is what I should use to decompress it.
How can I do it?
Is decompression a one-way road, or do I need a certain header file that will direct the decompression?
It's not a .Z file; it has a different extension.
What do I need to decompress it? I know the format of the final decompressed archive.
Is it possible that the file is compressed in a simple way but has a different extension?
First, let's get the "fortran" part out of the equation. There is no standard way (and by that I mean defined by the Fortran standard) to compress or decompress files, since Fortran doesn't have a compression utility as part of the language. Maybe someone has written their own, but that's entirely up to them.
So you're stuck with publicly available compression utilities and the like. On systems that have them available, and with compilers that support it (it varies), you can use the SYSTEM function, which executes a system command by passing a command string to the operating system's command interpreter. I know it exists in CVF, and probably in IVF; look it up in your compiler's help.
Since you already asked a similar question, I assume you're still having problems with this. You mentioned that "it was compressed with fortran77". What do you mean by that? That someone built a compression utility in F77 and used it? That would make it a custom solution?
If it's some kind of custom solution, then it can be practically anything, since many algorithms can serve as "compression algorithms" (writing a file as binary instead of plain text saves a few bytes; voilà, "compression").
Or have I misunderstood something? Please elaborate a little.
My guess is that you have a binary file, which is output by a Fortran program. These can look like compressed files because they are not readable in a text editor.
Fortran allows you to write in-memory data to a file without formatting it, so that you can reload it later without having to parse it. The problem, however, is that you need the original source code to see what types of variables are written to the file.
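To make the "binary file" guess concrete: a Fortran sequential unformatted file written by gfortran (and several other compilers) usually stores each record as a 4-byte byte count, the raw data, and the same count again. The C++ sketch below assumes exactly that layout in the host's byte order; other compilers or options (such as 8-byte markers) differ. Treating the bytes as doubles is only a guess you would have to confirm against the original program, and both function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>   // lets the reader be tried on an in-memory std::istringstream
#include <stdexcept>
#include <string>
#include <vector>

// Reads one record from a Fortran sequential unformatted file.
// Assumed layout (gfortran's default; check your compiler): a 4-byte record length,
// the record's bytes, then the same 4-byte length again, all in host byte order.
// Very large records that compilers split into "subrecords" are not handled.
std::vector<char> read_fortran_record(std::istream& in) {
    std::int32_t head = 0, tail = 0;
    if (!in.read(reinterpret_cast<char*>(&head), sizeof head))
        throw std::runtime_error("no more records");
    if (head < 0)
        throw std::runtime_error("subrecord marker: not supported by this sketch");
    std::vector<char> data(static_cast<std::size_t>(head));
    if (!in.read(data.data(), head) ||
        !in.read(reinterpret_cast<char*>(&tail), sizeof tail) || tail != head)
        throw std::runtime_error("corrupt record, or markers are not 4 bytes");
    return data;
}

// Reinterprets a record's bytes as doubles. Whether the record really holds
// doubles is a guess that only the original Fortran source can confirm.
std::vector<double> as_doubles(const std::vector<char>& rec) {
    std::vector<double> out(rec.size() / sizeof(double));
    if (!out.empty()) std::memcpy(out.data(), rec.data(), out.size() * sizeof(double));
    return out;
}
```

Calling read_fortran_record in a loop walks the file record by record; if the trailing marker never matches the leading one, the file probably uses a different marker size or is not a sequential unformatted file at all.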
If you have no access to the fortran source code, but a lot of time to spare, you could write some simple fortran program and guess what types of variables are being used. I wouldn't advise it, though, as Fortran is not very forgiving.
If you want some simple source code to try, look at this page which details binary read and write in Fortran, and includes a code sample. Just start by replacing reclength=reclength*4 with reclength=reclength*2 for a double precision real.
There is no standard decompression method, there are tons. You will need to know the method used to compress it in order to decompress it.
You said that the file extension was not .Z, but something else. What was that something else?
If it's .gz (which is very common on Unix systems), "gunzip" is the proper command. If it's .tgz, you can gunzip and untar it. (Or you can read the man page for tar(1), since it probably has the ability to gunzip and extract together.)
If it's on Windows, see whether Windows can open it directly, since Windows Explorer has built-in support for the ZIP format.
If something else, please just list the file name (or, if there are security implications, the file name beginning with the first period), and we might be able to figure it out.
You can check whether it's a known compressed file type with the file command. If file returns something like "binary file" (or just "data"), you're almost certainly looking at plain binary data.
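The same idea the file command uses can be sketched in a few lines of C++: most compressed formats begin with a fixed "magic" byte sequence. The signatures listed (gzip 1F 8B, compress 1F 9D, bzip2 "BZh", ZIP "PK\x03\x04", xz FD 37 7A 58 5A 00) are well known, but the list is deliberately short and the function names are illustrative.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Guesses a compression format from the leading "magic" bytes.
// Only a handful of well-known signatures are listed; anything else is reported as unknown.
std::string sniff_compression(const unsigned char* p, std::size_t n) {
    struct Sig { const char* name; unsigned char bytes[6]; std::size_t len; };
    static const Sig sigs[] = {
        {"gzip (.gz)",         {0x1F, 0x8B}, 2},
        {"Unix compress (.Z)", {0x1F, 0x9D}, 2},
        {"bzip2 (.bz2)",       {'B', 'Z', 'h'}, 3},
        {"zip (.zip)",         {'P', 'K', 0x03, 0x04}, 4},
        {"xz (.xz)",           {0xFD, '7', 'z', 'X', 'Z', 0x00}, 6},
    };
    for (const Sig& s : sigs)
        if (n >= s.len && std::memcmp(p, s.bytes, s.len) == 0) return s.name;
    return "unknown (possibly raw binary data)";
}

// Reads the first few bytes of a file and classifies them.
std::string sniff_compression_file(const std::string& path) {
    unsigned char buf[8] = {};
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(buf), sizeof buf);
    return sniff_compression(buf, static_cast<std::size_t>(in.gcount()));
}
```

If this reports "unknown", that matches the advice above: the file is most likely plain binary output rather than a compressed archive with an unusual extension.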
I need to write simulation data computed on GPU into an output .csv file. Normally I would just use the fstream library but that's not possible on GPU.
Are there any built-in functions or other libraries that I could use to write data to .csv or .txt files directly from device code? Right now performance isn't that important; I'm mainly looking for an easy interim solution.
No, it's not possible to do direct file I/O in CUDA from device code, unless you are using something like GPU Direct Storage (GDS) (which most likely you are not, at the current time, and based on your question). If you don't already have it set up, GDS might not be an "easy interim solution".
Copy the data to the host, then use whatever file I/O routines you are comfortable with.
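For the copy-to-host route, here is a minimal C++ sketch of the host half. It assumes the results were already brought back with cudaMemcpy(..., cudaMemcpyDeviceToHost) into a std::vector<float> holding a row-major rows-by-cols matrix; write_csv is a hypothetical helper name.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Host-side half of "copy to host, then write". Assumes the device data was already copied, e.g.
//   cudaMemcpy(host.data(), d_values, n * sizeof(float), cudaMemcpyDeviceToHost);
// and is laid out row-major. Returns false if the sizes don't match or the file can't be written.
bool write_csv(const std::string& path, const std::vector<float>& host,
               std::size_t rows, std::size_t cols) {
    if (host.size() < rows * cols) return false;
    std::ofstream out(path);
    if (!out) return false;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (c) out << ',';             // comma between fields, none at line end
            out << host[r * cols + c];
        }
        out << '\n';
    }
    return static_cast<bool>(out);
}
```

Since performance is not the concern here, plain std::ofstream formatting is enough; the only CUDA-specific step is the single cudaMemcpy before the call.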
Note that requests for library recommendations are specifically off-topic for SO.
Use printf in the CUDA kernel, redirect the program's console output to a text file, and then parse that text file to convert it to CSV.
I have an enormous input file, terabytes in size (it is gzipped (.gz)).
I need to read each line individually, and decide whether to add it to a new file.
The output file is also expected to be terabytes in size, but smaller, since I won't copy every line.
Is there a way to do this in C++ using only the standard libraries? I don't want to use Boost. Is that possible?
The standard C++ libraries do not deal with gzip format. Neither do the standard C libraries. I don't know about boost.
But you can certainly use zlib, which I believe comes with a C++ wrapper if the use of C is too daunting.
It's not generally a good idea to append to a gzipped file, by the way, although it is theoretically possible. But you lose a lot of compression because the algorithm needs to be reset and thereby loses context. However, you can open a compressed stream and write to it, so you don't need to write the uncompressed file to disk. I think that's all you need for this query.
In *nix systems there is a command called 'file' that can tell you the underlying type of a file. For example, if you rename a binary executable to foo.txt, or rename an mp3 file to .txt, the system will still tell you the real type of the file. But Windows seems to have no such functionality: if you rename an executable to .txt, you cannot execute it. Can anyone explain how this is done in *nix systems, and how I can find the real type of a file using C++, especially on Windows, where I cannot use std::system("file blah")?
The file utility uses the libmagic library. It recognizes the file type by parsing "special" fields (signatures) in the file.
Of course, you can implement recognition of some formats yourself, but sometimes this requires a lot of work, e.g. when you try to distinguish between the different variants of MP4.
The developers of that library have done a huge amount of work, so it's advisable to use their results if you want good answers about which format you are dealing with. (This is a big area, really; if you need to know which format you are working with, rely on them rather than on your own code.)
File utility - http://www.darwinsys.com/file/
You can download the source code and see just how many different recognition rules they use.
In the file-4.26 archive, look under magic -> Magdir.
Personally, I had success compiling file 4.26 on Windows: ftp://ftp.astron.com/pub/file/
Caution: it's merely a convention that files of certain formats carry predefined signatures. It holds almost always and helps identify file formats properly.
If that's not a concern for you, you can trust the signature. But keep in mind that anyone with enough knowledge and motivation can open a file in a hex editor and, by changing a few bytes, make it look like a different format.
Even in Unix/Linux, the system doesn't actually definitively know a file's type. The "file" program makes an educated guess by comparing the file's contents against a database of patterns that characterize a variety of common file types, but it's no more than a guess — it doesn't know about all possible file formats, and it can be wrong about the ones that it does know.
It's entirely possible to write a program like "file" for Windows; it doesn't depend on any special capabilities in the OS. Cygwin provides a Windows port of the "file" program, for example.
The issue of renaming a program to have a .txt extension is unrelated to the "file" program. That comes from the fact that Windows decides whether a file is executable based on its name (specifically, its extension), whereas Unix/Linux decides whether a file is executable based on its permissions — not its contents. If you chmod a-x a program on a Linux system, the system will consider it non-executable, just like if you remove the .exe extension from a program on Windows.
The command reference suggests that the type information is stored in an external database for later use. It also mentions magic numbers, which refers to file signatures.
Being 100% sure of a file type is theoretically impossible, since there are no precise rules about what a certain type must contain. Even if there were such rules, it would be possible to alter a file to make it look like another type. While both signatures and extensions give you a good idea of what the type actually is, you still need to handle the possibility of dealing with the wrong type.
The UNIX file command uses heuristics. There is a database of magic numbers, usually in /usr/share/file/magic and /etc/magic, that also lets you add new file "types" to be recognized by the file command. It simply probes the file's contents for magic numbers (signatures).
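To illustrate how such a database works, and that nothing in it is Unix-specific (so the same approach runs on Windows without std::system("file ...")), here is a toy, table-driven C++ sketch. The rules are a handful of real, well-known signatures, including one at a non-zero offset (the POSIX tar "ustar" marker at byte 257). The real magic database is far larger and its rule language far richer, so treat this as an illustration, not a replacement for libmagic; the names are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// A toy version of the magic-number lookup that file/libmagic performs:
// each rule says "these bytes at this offset mean this type".
struct MagicRule { std::size_t offset; std::string bytes; const char* type; };

const std::vector<MagicRule>& magic_rules() {
    static const std::vector<MagicRule> rules = {
        {0,   std::string("\x7f" "ELF", 4),       "ELF executable/object"},
        {0,   "MZ",                               "DOS/Windows executable (MZ)"},
        {0,   std::string("\x89PNG\r\n\x1a\n", 8), "PNG image"},
        {0,   "%PDF-",                            "PDF document"},
        {0,   "ID3",                              "MP3 audio with ID3 tag"},
        {257, "ustar",                            "POSIX tar archive"},
    };
    return rules;
}

// Checks the header bytes against every rule; first match wins.
std::string guess_type(const std::string& header) {
    for (const MagicRule& r : magic_rules())
        if (header.size() >= r.offset + r.bytes.size() &&
            header.compare(r.offset, r.bytes.size(), r.bytes) == 0)
            return r.type;
    return "data";  // roughly what file prints for unrecognized binary content
}

// The extension is never consulted, only the first 512 bytes of content.
std::string guess_file_type(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::string header(512, '\0');
    in.read(&header[0], static_cast<std::streamsize>(header.size()));
    header.resize(static_cast<std::size_t>(in.gcount()));
    return guess_type(header);
}
```

A renamed executable is still reported by its content, which is the behaviour the question asks about. The "MZ" rule also shows the weakness discussed above: any text file that happens to start with "MZ" would be misclassified.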
UNIX traditionally doesn't have the same type of file extension and type associations that Windows does, although Linux is accumulating that in recent times.
I would think that on Windows you'd want to at least check the file extension association to be correct. But even within a given extension (such as .txt), the individual program may apply its own heuristics. For example, Notepad has to make an educated guess at the character encoding when it opens a file. Raymond Chen wrote a good post about it on his blog, The Old New Thing: "The Notepad file encoding problem, redux".
I am running DCT code in MATLAB, and I would like to read the compressed file (.mat) in C code. However, I'm not sure this is the right approach. I haven't finished my code yet, but I would like an explanation of how to create a C++-readable file from my .mat file.
I'm kind of confused when it comes to .mat, .txt, binary, and float details of files. Can someone please explain this to me?
It seems that you have a lot of options here, depending on your exact needs, time, and skill level (in both Matlab and C++). The obvious ones are:
ASCII files
You can generate ASCII files in Matlab either with the save(filename, variablename, '-ascii') syntax, or with a more custom format using C-style fprintf commands. Then, within a C or C++ program, the files are read using fscanf.
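Reading such a file from C++ can be done with fscanf as described, or with C++ streams. A minimal sketch using streams, assuming one matrix row per line with whitespace-separated numbers (which is how save -ascii lays out a 2-D numeric array); read_ascii_matrix is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads a whitespace-separated numeric text file into rows of doubles,
// one row per line. Blank lines are skipped.
std::vector<std::vector<double>> read_ascii_matrix(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<double> row;
        double v;
        while (fields >> v) row.push_back(v);  // handles forms like 1.0000000e+00
        if (!row.empty()) rows.push_back(row);
    }
    return rows;
}
```

Pass a std::ifstream opened on the file written by MATLAB; keeping one vector per line preserves the row structure without having to know the dimensions in advance.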
This is often easiest, and good enough in many cases. The fact that a human can read the files using notepad++, emacs, etc. is a nice sanity check, (although this is often overrated).
There are two big downsides. First, the files are very large (an 8 byte double number requires about 19 bytes to store in ASCII). Second, you have to be very careful to minimize the inevitable loss of precision.
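How much precision the text route costs depends on how many digits are written. The C++ check below (a sketch; text_round_trip is a hypothetical name) shows that printing a double with 17 significant digits lets it be parsed back exactly, while printf's default %g precision of 6 digits does not. On the MATLAB side, writing with fprintf and a '%.17g' format gives the same effect, at the cost of even larger files.

```cpp
#include <cstdio>
#include <cstdlib>

// Formats a double as text with the given number of significant digits and parses it back.
// 17 significant digits are enough for any IEEE 754 double to survive the round trip;
// fewer digits (such as %g's default of 6) generally lose information.
double text_round_trip(double x, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, x);
    return std::strtod(buf, nullptr);
}
```

For example, 0.1 + 0.2 comes back unchanged with 17 digits but as 0.3 with 6.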
Bytes-on-a-disk
For a simple array of numbers (for example, a 32-by-32 array of doubles) you can simply use Matlab's fwrite function to write the array to disk. Then, within C/C++, use the corresponding fread function.
This has no loss of precision, is pretty fast, and is relatively small on disk.
The downside with this approach is that complex Matlab structures cannot necessarily be saved.
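On the C++ side, here is a sketch of reading what MATLAB's fwrite(fid, A, 'double') produced. Two assumptions are baked in: the writer and reader use the same byte order (MATLAB's fopen uses the native machine format unless told otherwise), and the reader already knows the dimensions, because a raw file stores none. MATLAB writes arrays in column-major order, which the indexing helper accounts for; both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads rows*cols doubles written by MATLAB's fwrite(fid, A, 'double').
// Returns an empty vector if the file can't be opened or is too short.
std::vector<double> read_matlab_fwrite(const char* path, std::size_t rows, std::size_t cols) {
    std::vector<double> a(rows * cols);
    if (a.empty()) return a;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return {};
    std::size_t got = std::fread(a.data(), sizeof(double), a.size(), f);
    std::fclose(f);
    if (got != a.size()) return {};
    return a;
}

// MATLAB data is column-major: 0-based element (r, c) sits at index c * rows + r.
inline double matlab_at(const std::vector<double>& a, std::size_t rows,
                        std::size_t r, std::size_t c) {
    return a[c * rows + r];
}
```

If the C++ code wants row-major data, either transpose the array in MATLAB before calling fwrite or keep using an accessor like matlab_at.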
Mathworks provided C library
Since this is a pretty common problem, the Mathworks has actually solved this by a direct C implementation of the functions needed to read/write to *.mat files. I have not used this particular library, but generally the libraries they provide are pretty easy to integrate. Some documentation can be found starting here: http://www.mathworks.com/help/matlab/read-and-write-matlab-mat-files-in-c-c-and-fortran.html
This should be a pretty robust solution, and relatively insensitive to changes, since it is part of the mainstream, supported Matlab toolset.
HDF5 based *.mat file
With recent versions of Matlab, you can use the notation save(filename, variablename, '-v7.3'); to force Matlab to save the file in an HDF5-based format. Then you can use tools from the HDF Group to handle the file. There is a decent Java-based GUI viewer (http://www.hdfgroup.org/hdf-java-html/hdfview/index.html#download_hdfview), as well as libraries for C, C++ and Fortran.
This is a non-fragile method to store binary data. It is also a bit of work to get the libraries working in your code.
One downside is that the Mathworks may change the details of how they map Matlab data types into the HDF5 file. If you really want to be robust, you may want to try ...
Custom HDF5 file
Instead of just taking whatever format the Mathworks decides to use, it's not that hard to create an HDF5 file directly and push data into it from Matlab. This lets you control things like compression, chunk sizing, dataset hierarchy and names. It also insulates you from any future changes in the default *.mat file format. See the h5write command in Matlab.
It is still a bit of effort to get running from the C/C++ end, so I would only go down this path if your project warranted it.
.mat is a special format for MATLAB itself.
What you can do is to load your .mat file in the MATLAB workspace:
load file.mat
Then use fopen and fprintf to write the data to file.txt and then you can read the content of that file in C.
You can also use MATLAB's dlmwrite to write a delimited ASCII file, which will be easy to read in C (and human-readable too), although it won't be as compact, if that is core to the issue.
Adding to what has already been mentioned you can save your data from MATLAB using -ascii.
save x.mat x
Becomes:
save x.txt x -ascii
I'd like to simulate a file without writing it to disk. I have a file appended to the end of my executable, and I would like to give its path to a DLL. Of course, since it doesn't have a real path, I have to fake it.
I first tried using named pipes under Windows to do it. That would allow a path like \\.\pipe\mymemoryfile, but I can't make it work, and I'm not sure the DLL would support a path like this.
Second, I found CreateFileMapping and GetMappedFileName. Can they be used to simulate a file within a fragment of another file? I'm not sure that's what this API does.
What I'm trying to do seems similar to BoxedApp. Any ideas about how they do it? I suppose it's something like API interception (like Detours), but that would be a lot of work. Is there another way to do it?
Why? I'm interested in this specific solution because I'd like to hide the data and distribute only one file, but also for the geeky satisfaction of making it work that way ;)
I agree that copying data to a temporary file would work and be a much easier solution.
Use BoxedApp and do not worry.
You can store the data in an NTFS stream. That way you can get a real path pointing to your data that you can give to your dll in the form of
x:\myfile.exe:mystreamname
This works exactly like a normal file, but only if the file system is NTFS. This is standard on Windows nowadays, but it is of course not an option if you want to support older systems or want to be able to run this from a USB stick or similar. Note that any streams present in a file will be lost if the file is sent as an e-mail attachment or simply copied from an NTFS partition to a FAT32 partition.
I'd say that the most compatible way would be to write your data to an actual file, but you can of course do it one way on NTFS systems and another on FAT systems. I recommend against that because of the added complexity. The appropriate way would be to distribute your files separately, of course, but since you've indicated that you don't want this, you should write the data to a temporary file and give the DLL the path to that file. Make sure you write the temporary file to the user's temp directory (you can find the path using GetTempPath in C/C++).
Your other option would be to write a filesystem filter driver, but that is a road that I strongly advise against. That sort of defeats the purpose of using a single file as well...
Also, in case you want only a single file for distribution, how about using a zip file or an installer?
Pipes are for communication between processes running concurrently. They don't store data for later access, and they don't have the same semantics as files (you can't seek or rewind a pipe, for instance).
If you're after file-like behaviour, your best bet will always be to use a file. Under Windows, you can pass FILE_ATTRIBUTE_TEMPORARY to CreateFile as a hint to the system to avoid flushing data to disk if there's sufficient memory.
If you're worried about the performance hit of writing to disk, the above should be sufficient to avoid the performance impact in most cases. (If the system is low enough on memory to force the file data out to disk, it's probably also swapping heavily anyway -- you've already got a performance problem.)
If you're trying to avoid writing to disk for some other reason, can you explain why? In general, it's quite hard to stop data from ever hitting the disk -- the user can always hibernate the machine, for instance.
Since you don't have control over the DLL you have to assume that the DLL expects an actual file. It probably at some point makes that assumption which is why named pipes are failing on you.
The simplest solution is to create a temporary file in the temp directory, write the data from your EXE to it, pass its path to the DLL, and delete the temporary file once the DLL is done with it.
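A portable C++17 sketch of that approach. It assumes a hypothetical layout in which the data was appended to the EXE followed by an 8-byte little-endian length; if you embed it differently (for example as a resource), only read_trailing_payload changes. std::filesystem::temp_directory_path plays the role of GetTempPath, and the temporary file is removed when the TempFile object goes out of scope. Both names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical layout: [exe bytes][payload][8-byte little-endian payload size].
// Returns an empty vector if the file is missing or the trailer doesn't make sense.
std::vector<char> read_trailing_payload(const fs::path& exe) {
    std::ifstream in(exe, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff size = in.tellg();
    if (size < 8) return {};
    unsigned char tail[8];
    in.seekg(size - 8);
    in.read(reinterpret_cast<char*>(tail), 8);
    std::uint64_t n = 0;
    for (int i = 7; i >= 0; --i) n = (n << 8) | tail[i];  // decode little-endian
    if (!in || n > static_cast<std::uint64_t>(size - 8)) return {};
    std::vector<char> payload(static_cast<std::size_t>(n));
    in.seekg(size - 8 - static_cast<std::streamoff>(n));
    in.read(payload.data(), static_cast<std::streamsize>(n));
    if (!in) return {};
    return payload;
}

// Writes bytes to a file in the system temp directory and deletes it on destruction,
// so the DLL can be handed an ordinary path.
class TempFile {
public:
    TempFile(const std::string& name, const std::vector<char>& bytes)
        : path_(fs::temp_directory_path() / name) {
        std::ofstream out(path_, std::ios::binary | std::ios::trunc);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        out.close();
        if (!out) throw std::runtime_error("cannot write " + path_.string());
    }
    ~TempFile() { std::error_code ec; fs::remove(path_, ec); }  // ignore failures
    TempFile(const TempFile&) = delete;
    TempFile& operator=(const TempFile&) = delete;
    const fs::path& path() const { return path_; }
private:
    fs::path path_;
};
```

Typical use would be to build a TempFile from the payload, pass path().string() to the DLL, and let the destructor delete the file afterwards. On Windows the delete quietly fails if the DLL still holds the file open, so release it first.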
Is there a reason you are embedding this "pseudo-file" at the end of your EXE instead of just distributing it with your application? You are obviously already distributing this third-party DLL with your application, so one more file doesn't seem like it's going to hurt.
Another question: will this data be changing? That is, do you expect to write data back to this "pseudo-file" in your EXE? I don't think that will work well. Standard users may not have write access to the EXE, and it would probably drive antivirus software nuts.
And no, CreateFileMapping and GetMappedFileName definitely won't work, since they don't give you a file name that can be passed to CreateFile. If you could somehow get this DLL to accept a HANDLE, then that would work.
And I wouldn't even bother with API interception. Just hand the DLL a path to an actual file.
Reading your question made me think: if you can pretend an area of memory is a file and have kind of "virtual path" to it, then this would allow loading a DLL directly from memory which is what LoadLibrary forbids by design by asking for a path name. And this is why people write their own PE loader when they want to achieve that.
I would say you can't achieve what you want with file mapping: the purpose of file mapping is to treat a portion of a file as if it was physical memory, and you're wanting the reciprocal.
Using Detours implies that you would have to replicate everything the intercepted DLL function does except obtaining data from a real file; hence it's not generic. Or, even more intricate, suppose the DLL uses fopen; then you provide your own fopen that detects a special pattern in the path, and you mimic the C runtime internals... Hmm, is it really worth all the pain? :D
Please explain why you can't extract the data from your EXE and write it to a temporary file. Many applications do this -- it's the classic solution to this problem.
If you really must provide a "virtual file", the cleanest solution is probably a filesystem filter driver. "clean" doesn't mean "good" -- a filter is a fully documented and supported solution, so it's cleaner than API hooking, injection, etc. However, filesystem filters are not easy.
OSR Online is the best place to find Windows filesystem information. The NTFSD mailing list is where filesystem developers hang out.
How about using some sort of RAM disk and writing the file to it? I have tried a few RAM disks myself but never found a good one; tell me if you are successful.
Well, if you need to have the virtual file allocated in your exe, you will need to create a vector, stream or char array big enough to hold all of the virtual data you want to write.
That is the only solution I can think of that avoids any disk I/O (even if you don't write to a file).
If you need to keep a file-like path syntax, just write a class that mimics that behaviour and writes to your memory buffer instead of a file. It's as simple as it gets. Remember KISS.
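In C++ the standard streams already provide this: code written against std::istream/std::ostream behaves the same whether the stream is a std::fstream on disk or a std::stringstream in memory. A minimal sketch with hypothetical save_settings/load_settings functions; it only helps for code you control, since a third-party DLL that insists on a path still needs a real file.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes "key=value" lines to any output stream (file or memory).
void save_settings(std::ostream& out, int width, int height) {
    out << "width=" << width << '\n' << "height=" << height << '\n';
}

// Reads the same format back from any input stream; returns false if a key is missing.
bool load_settings(std::istream& in, int& width, int& height) {
    std::string key;
    bool got_w = false, got_h = false;
    while (std::getline(in, key, '=')) {
        int value;
        if (!(in >> value)) return false;
        in.ignore(1, '\n');  // skip the newline after the value
        if (key == "width") { width = value; got_w = true; }
        else if (key == "height") { height = value; got_h = true; }
    }
    return got_w && got_h;
}
```

Swapping the std::stringstream for a std::ofstream/std::ifstream pair is the only change needed to move the same data to disk.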
Cheers
Open the file called "NUL:" for writing. It's writable, but the data are silently discarded. Kinda like /dev/null of *nix fame.
You cannot memory-map it though. Memory-mapping implies read/write access, and NUL is write-only.
I'm guessing this DLL can't take a stream? It's almost too simple to ask, BUT if it can, you could just use that.
Have you tried using the \\?\ prefix with named pipes? Many APIs accept the \\?\ prefix, which passes the remainder of the path through directly, without any parsing or modification.
http://msdn.microsoft.com/en-us/library/aa365247(VS.85,lightweight).aspx
Why not just add it as a resource - http://msdn.microsoft.com/en-us/library/7k989cfy(VS.80).aspx - the same way you would add an icon.