pyfits: read compressed fits file - compression

How does one open a compressed fits file with pyfits?
The code below reads in the primary hdu, which is an image. The result is a NoneType object.
# read in file
file_input_fit = "myfile.fits.fz"
hdulist = pyfits.open(file_input_fit)
img = hdulist[0].data
Usage of keyword in pyfits.open() "disable_image_compression=True" appears ineffective.

If the .data attribute on the primary HDU is None that means the primary HDU contains no data. You can confirm this by checking the file info:
hdulist.info()
Chances are you're trying to read a multi-extension FITS file, and the data you're looking for is in another castle, I mean, HDU. disable_image_compression=True wouldn't help since that disables support for compressed images :)
ETA: In fact, a tile-compressed FITS image can never be in the primary HDU, since it's stored internally as a binary table, which can only be an extension HDU.

This would be better as a comment but I don't have the reputation to make a comment so I'm forced to write an answer. However, the answer is the same -- namely that the compressed data is stored in the second HDU. The comment was just to show what this looks like on a compressed image I have here (after using the exact lines of the OP to open the file):
>>> hdulist.info()
Filename: /tmp/test.fits.fz
No. Name Type Cards Dimensions Format
0 PRIMARY PrimaryHDU 6 ()
1 CompImageHDU 9 (24576, 6160) float32

Related

save Large RasterBrick to file for later use

I have a Large RasterBrick, created through compiling a large number of .nc files and then manipulating in a few ways (cropping, collapsing, naming layers). I want to save this brick to a file on my laptop, so that I can access it without having to import all data and manipulate anew.
How do I do this? I think it should involve writeRaster, but I'm not sure how to specify the options.
My RasterBrick is 18 by 25, with 14975 layers, each named with the relevant date.
I tried this code from Save multi layer RasterBrick to harddisk:
outfile <- writeRaster(windstack_mn, filename='dailywindgrid.tif', format="GTiff", overwrite=TRUE,options=c("INTERLEAVE=BAND","COMPRESS=LZW"))
However, this code produce a tif file that holds a single 18 by 25 layer. I think it saved only the 1st layer of my RasterBrick, because if I bring in the saved .tif file and plot it, it looks identical to plotting the 1st layer of the original RasterBrick.
Did you look at outfile? Can you show it to us?
You should show what you do to "bring in the saved .tif". I am guessing that you do
raster('dailywindgrid.tif')
whereas you should be doing
brick('dailywindgrid.tif')
The comment/answer fr/ Robert solves my issue, with the one addition that one needs to specify the raster format. So I am now saving the file with this code:
writeRaster(StackName, filename='FileNAme.grd', format="raster", overwrite=TRUE,options=c("INTERLEAVE=BAND","COMPRESS=LZW"))
And that .grd file can later be opened using this code:
ImportName <- brick("FileNAme.grd")

Extracting MDCT coefficients for steganalysis?

MP3Stego (http://www.petitcolas.net/steganography/mp3stego/) hides data within MP3 files during the inner_loop and uses part2_3_length to modify bits.
I'm wondering whether it would be worth extracting MDCT coefficients and examining them as a histogram to compare it with an MP3 file with no hidden data in it. However, from the encoding process of MP3s MDCT happens before the inner_loop.
Would there be any use of extracting the coefficients if this is the case? If so, would the best way of doing this just to print out data to file during the encoding process?

VAX Fortran Keyed indexed file - sequential access

Okay, I know I'm going back a few years, but maybe I'll run across some graybeards (like mine) :).
I have an indexed data file with a key field. It's opened like so in the application:
OPEN (FILE='DATA.MAS',STATUS='OLD',
1 ORGANIZATION='INDEXED',ACCESS='KEYED',
1 RECL=28,UNIT=LUNTM,SHARED,
1 KEY=(1:49:CHARACTER),
1 IOSTAT=IOS,ERR=9999)
I need to be able to scan the content of this file sequentially. However, every combination of organization and access options in the open, followed by reads always results in an error, either on the open or the read. Is it even possible to get the nth record of a keyed file?
Okay, found the solution after reading the doc for the umpteenth time. I changed the OPEN statement for SEQUENTIAL access and INDEXED organization. What I missed was that when you do this, FORTRAN interprets the file as FORMATTED. Adding FORM='UNFOFRMATTED' and adjusting the record size yields happiness and yuletide greetings

File Binary vs Text

Are there some situation where I have to prefer binary file to text file? I'm using C++ as programming language?
For example if I have to store some large text file is it better use text file or binary file?
Edit
The file for the moment has no requirment to be readable from human. Are some performance difference, security difference and so on?
Edit
Sorry for the omit other the requirment (thanks to Carey Gregory)
The record to save are in ascii encoding
The file must be crypted ( AES )
The machine can power off any time. So I've to try to prevents errors.
I've to know if the file change outside the program, I think I'll use a sha1 digest of the file.
As a general rule, define a text format, and use it. It's much
easier to develop and debug, and it's much easier to see what is
going wrong if it doesn't work.
If you find that the files are becoming too big, or taking to
much time to transfer over the wire, consider compressing them.
A compressed text file is often smaller than you can do with
binary. Or consider a less verbose text format; it's possible
to reliably transmit a text representation of your data with
a lot less characters than XML uses.
And finally, if you do end up having to use binary, try to chose
an existing format (e.g. Google's protocol blocks), or base your
format on an existing format. Just remember that:
Binary is a lot more work than text, since you practically
have to write all of the << operators again, including those
in the standard library.
Binary is a lot more difficult to debug, because you can't
easily see what you've actually done.
Concerning your last edit:
Once you've encrypted, the results will be binary. You can
use a text representation of the binary (base64 or some such),
but the results won't be any more readable than the binary, so
it's not worth the bother. If you're encrypting in process,
before writing to disk, you automatically lose all of the
advantages of text.
The issues concerning powering off mean that you cannot use
ofstream directly. You must open or create the file with the
necessary options for full transactional integrity (O_SYNC as
a flag to open under Unix). You must write each record as
a single write request to the system.
It's always a good idea to have a checksum, just in case. If
you're worried about security, SHA1 is a good choice. But keep
in mind that if someone has access to the file, and wants to
intentionally change it, they can recalculate the SHA1 and
insert the new value as well.
All files are binary; the data within them is a binary representation of some information. If you have to store a large amount of text then the file will contain the binary representation of that text. The difference between a "binary file" and a "text file" is that creating the latter involves converting data to a text form before saving it. This is typically done so humans can read it.
The distinction between binary and text is usually made when storing data that is for computer consumption. Typically this data would not be text - it might be a list of numerical configuration values, for example: 1, 2, 3.
If you stored this in text format, your file could contain a list of human-readable numbers, and if you opened the file in Notepad you might see one number per line. But what you're actually saving here is not the binary values 1, 2, 3 - you're saving a string "1\n2\n3\n". Note that this string is 6 characters long, and the binary values (assuming ASCI) would actually be 49, 10, 50, 10, 51, 10!
If the same data were stored in binary format, you would store the numbers in the smallest useful space, and write the file as individual bytes that can often only be read by the code that created them. Opening this file in Notepad would likely display junk characters, because the data makes no sense as text. In this case you would be saving a byte array with actual values { 1, 2, 3 } - or even a single byte with the three values embedded. This could be much smaller than the human-readable equivalent.
Binary files store a sequence of bytes like all other files. You can store numeric values like integers per 4 bytes, characters per single byte, or even serialized class objects and anything you want.
When you know how to read a binary file (ie. you know what is stored in it) you can extract all the information from it. However, text files use text encodings like UTF8, ANSI etc. and they are intended to encode text characters to be processed by text editors.
Binary files are for machines only to interpret, whereas a text file, a human can also open and interpret its content.
So it depends whether you want your file to be readable by a human or not.
It depends on a lot of factors. I can think of two right now:
Do you require the file to be readable by humans?
Is compression a factor? A 10-digits number will take at least 10 bytes as text, but might take as little as four or two as binary.
All data is binary. You always need a machine to interpret it for you. Even if the data is compressed like protocol buffers, Avro, Thrift etc, it is binary, and if it is uncompressed, it is still binary. If you want to read protocol buffers by notepad, there is a two step process. Uncompress, and read. In case of text, this step of uncompressing is not needed. Same is case with encrypted. First unencrypted, and then read. Humans cannot read binary (as some commenters are mentioning). We still need notepad to interpret and display binary (so called text).
All data stored in a text file are human-readable graphic characters. Each line of data ends with a new line character.
In case of a binary file - data is stored in the same format as they are stored in the memory. There are no lines or new line characters. There is an end of file marker.
Moreover binary files show more efficiency for memory as they are stored in zeros and one's.

tar.Z file format, structure, header

I am trying to figure out the file layout of
tar.Z file. (so called .taz file. compressed tar file).
this file can be produced with tar -Z option or
using unix compress utility(result are same)
I tried to google some document about this file structure
but there is no documentation about this file structure.
I know that this is LZW compressed file and starts with
its magic number "1F 9D" but thats all I can figure out.
someone please tell me more details about the file header or
anything.
I am not interested about how to uncompress this file, or
what linux command can process this file.
I want to know is internal file structure/header/format/layout.
thank you in advance
A .Z file is compressed using compress and can be uncompressed with uncompress (or on some machines this is called uncompress.real). This .Z file can hold any data. .tar.Z or .taz is just a .tar file that is compressed with compress.
The first 2 bytes (MAGIC_1 and MAGIC_2) are used to check if the .Z file really is a .Z file, and not something else with accidentally the same extension. These bytes are hardcoded in the sources.
The third byte is a settings byte and holds 2 values:
The most significant bit is the block mode.
The last 5 bits indicate the maximum size of the code table (the code table is used for lzw compression).
From the original code: BLOCK_MODE=0x80; byte3=(BIT|BLOCK_MODE); and BIT is in an if/else block where it is 12..16.
If block mode is turned on, in the code table a entity will be added at place 256 (remember 0..255 are filled with the values 0..255) and this will contain the CLEAR sign. So whenever the CLEAR sign is gotten from the data stream from the file, the code table has to be reverted to it's initial state (so it has only 0..256 in it).
The maximum code size indicates the amount of bits the code table can be. When the maximum is hit, there are no entities added to the code table anymore. So if the maximum code size is 0b00001100, it means that the code table can only hold 12 bits, so a maximum of 2^12=4096 entities.
The highest amount possible that is used by compress is 16 bit. That means that there are 2 bits in this settings field that are unused.
After these 3 bytes the raw LZW data starts. Because the LZW table starts at 9 bits, the 4th byte will be the same as the first byte of the input (in case of a .tar.Z file, or taz file, this byte will be the first byte of the uncompressed .tar file).
A tar.Z file is just a compressed tar file, so you will only find the 1F 9D magic number telling you to uncompress it.
When uncompressed you can read the tar file header:
http://www.fileformat.info/format/tar/corion.htm
Q: this file can be produced with tar -Z option or using unix compress utility(result are same)
A: Yes. "tar -cvf myfile.tar myfiles; compress myfile.tar" is equivalent to using "-Z". An even better choice is often "j" (using BZip, instead of Zip)
Q: What is the layout of a tar file?
A: There are many references, and much freely available source. For example:
http://en.wikipedia.org/wiki/Tar_%28file_format%29
Q: What is the format of a Unix compressed file?
A: Again: many references; easy to find sample source code:
http://en.wikipedia.org/wiki/Compress
Fot a .tgz (compressed tar file) you'll need both formats: you must first uncompress it, then untar it. The "tar" utility will do both for you, automagically :)