I have a large dataset which makes my LMDB huge. For 16,000 samples my database is already 20 GB, but in total I have 800,000 images, which would add up to an enormous amount of data. Is there any way to compress an LMDB? Or is it better to use HDF5 files? I would appreciate any suggestions on the best solution to this problem.
If you look at the ReadImageToDatum function in io.cpp, it can keep the image in either compressed (JPEG/PNG) or raw format. To use the compressed format, compress the loaded image using cv::imencode, set the datum's data to the compressed bytes, and set the encoded flag. Then you can store the datum in LMDB.
There are various techniques to reduce input size, but much of that depends on your application. For instance, the ILSVRC-2012 dataset images can be resized to about 256x256 pixels without nasty effects on training time or model accuracy. This reduces the dataset from 240 GB to 40 GB. Can your dataset tolerate the loss of fidelity from simple "physical" compression? How small does the dataset have to be?
I'm afraid that I haven't worked with HDF5 files enough to have an informed opinion.
I am trying to implement a steganographic algorithm where hidden message could survive jpeg compression.
The typical scenario is the following:
Hide data in image
Compress image using jpeg
The hidden data is not destroyed by JPEG compression and can be restored
I tried several of the algorithms described in the literature, but with no success.
For example, I tried a simple repetition code, but JPEG compression destroyed the hidden data. I also tried to implement the algorithms described in the following articles:
http://nas.takming.edu.tw/chkao/lncs2001.pdf
http://www.securiteinfo.com/ebooks/palm/irvine-stega-jpg.pdf
Do you know about any algorithm that actually can survive jpeg compression?
You can hide the data in the frequency domain. JPEG stores information using the DCT (Discrete Cosine Transform) of every 8x8 pixel block. The information that survives compression lies in the low-frequency coefficients, arranged in the top-left part of the coefficient matrix; the lossy step happens when the small high-frequency coefficients are rounded to 0 after quantization of the block. Those zeroes are arranged in the bottom-right part of the matrix, which is why the compression works and why that information is lost. Embedding your data in the coefficients that survive quantization gives it a chance of surviving recompression.
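The quantization step is easy to see with NumPy: high-frequency coefficients get large quantization steps and round to zero, while the low-frequency (top-left) coefficients survive. A sketch (the quantization table here is a toy, not the standard JPEG one):

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: C[k, i] = a(k) * cos(pi * (2i + 1) * k / (2n))
    c = np.array([[np.cos(np.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
                  for k in range(n)])
    c[0] /= np.sqrt(2)
    return c * np.sqrt(2.0 / n)

C = dct_matrix()
block = np.outer(np.arange(8), np.ones(8)) * 16.0  # smooth 8x8 block (vertical gradient)
coeffs = C @ block @ C.T                            # 2-D DCT of the block

# JPEG-style quantization: divide by a step size that grows with frequency,
# then round. High-frequency coefficients (towards the bottom-right) are
# typically small and round to zero; low-frequency ones (top-left) survive.
q = 1 + (np.arange(8)[:, None] + np.arange(8)[None, :]) * 10.0  # toy quant table
quantized = np.round(coeffs / q)
print(quantized)
```

For this smooth block, the DC coefficient at (0, 0) survives quantization while everything outside the first column is already zero, which is exactly the asymmetry that frequency-domain steganography exploits.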
Quite a few applications seem to implement Steganography on JPEG, so it's feasible:
http://www.jjtc.com/Steganography/toolmatrix.htm
Here's an article regarding a relevant algorithm (PM1) to get you started:
http://link.springer.com/article/10.1007%2Fs00500-008-0327-7#page-1
Perhaps this answer is late, but...
You can do it with compressed-domain steganography. Read the image as a binary file and analyze it with a library such as a JPEG parser. Based on your chosen algorithm, find the locations to modify, compute their new values, and replace the corresponding bits in the file data. Finally, write the file back with the same extension as the input.
I hope this helps.
What you're looking for is called watermarking.
A little warning: Watermarking algorithms use insane amounts of redundancy to ensure high robustness of the information being embedded. That means the amount of data you'll be able to hide in an image will be orders of magnitude lower compared to standard steganographic algorithms.
I have a large dataset which I want to render using CUDA. The dataset is about 5 GB, in 8-bit raw format. Is there a way to compress it to less than 3 GB?
Loss of quality/detail is fine with me. The reduced dataset should also be in 8-bit raw format.
There are many answers which could fit your need, although it really depends on what you want to do with your data set.
1) Is your dataset an 8-bit grayscale image?
If not: how do you define 'a loss of quality and details'?
2) Do you need to access any point anywhere in the image? Will the dataset be accessed in "batch processing" or "random access" mode?
3) Have you considered basic texture compression algorithms, such as DXTC (or any other compression algorithm supported by your hardware, presumably NVIDIA)?
There are two ways to implement volume rendering of large volume data sets:
(1) use a compressed format to load volume data into GPU memory
(2) use out-of-core GPU volume rendering
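Since the question allows loss of detail and requires the output to stay 8-bit raw, plain spatial downsampling is also worth considering. A NumPy sketch with toy sizes (the real 5 GB file would be read with np.fromfile, ideally in chunks):

```python
import numpy as np

# Toy stand-in for an 8-bit raw volume.
vol = np.random.default_rng(0).integers(0, 256, size=(64, 64, 64), dtype=np.uint8)

# 2x downsampling along one axis by averaging pairs of slices: this keeps
# the 8-bit raw format and halves the size (5 GB -> 2.5 GB in the real case).
# The cast to uint16 avoids overflow when summing two uint8 slices.
ds = ((vol[0::2].astype(np.uint16) + vol[1::2]) // 2).astype(np.uint8)
print(vol.nbytes, ds.nbytes)
```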
I am working with 1 GB TIFF images of around 20,000 x 20,000 pixels. I need to extract several tiles (of about 300x300 pixels) out of the images, at random positions.
I tried the following solutions:
Libtiff (the only low-level library I could find) offers TIFFReadScanline(), but that means reading in around 19,700 unnecessary pixels per line.
I implemented my own TIFF reader which extracts a tile from the image without reading in unnecessary pixels. I expected it to be faster, but doing a seekg for every line of the tile makes it very slow. I also tried reading all the lines of the file that include my tile into a buffer and then extracting the tile from the buffer, but the results were more or less the same.
I'd like to receive suggestions that would improve my tile extraction tool!
Everything is welcome, maybe you can propose a more efficient library I could use, some tips about C/C++ I/O, some higher level strategy for my needs, etc.
Regards,
Juan
[Major edit 14 Jan 10]
I was a bit confused by your mention of tiles, when the tiff is not tiled.
I do use tiled/pyramidical TIFF images. I've created those with VIPS
vips im_vips2tiff source_image output_image.tif:none,tile:256x256,pyramid
I think you can do this with:
vips im_vips2tiff source_image output_image.tif:none,tile:256x256,flat
You may want to experiment with tile size. Then you can read using TIFFReadEncodedTile.
Multi-resolution storage using pyramidal TIFFs is much faster if you need to zoom in/out. You may also want to use it to show a coarse image almost immediately, followed by the detailed picture.
After switching to (appropriately sized) tiled storage (which will bring you massive performance improvements for random access!), your bottleneck will be disk I/O. Files read much faster sequentially; here mmapping may be the solution.
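To see why tiled access is cheap, here is the tile arithmetic as a pure-Python sketch (tile size 256 and the 300x300 request from this thread; the function name is made up):

```python
def tiles_for_region(x, y, w, h, tile=256):
    """Return the (col, row) indices of all tiles overlapping a w x h region at (x, y)."""
    first_col, last_col = x // tile, (x + w - 1) // tile
    first_row, last_row = y // tile, (y + h - 1) // tile
    return [(c, r) for r in range(first_row, last_row + 1)
                   for c in range(first_col, last_col + 1)]

# A 300x300 request touches at most a 3x3 block of 256x256 tiles,
# instead of 300 scanlines of 20,000 pixels each.
print(len(tiles_for_region(1000, 1000, 300, 300)))  # 9 tiles
```

Each of those tiles can then be fetched with a single TIFFReadEncodedTile call.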
Some useful links:
VIPS
IIPImage
LibTiff.NET stackoverflow
VIPS is an image-handling library which can do much more than just read/write. It has its own very efficient internal format, and good documentation on the algorithms. For one thing, it decouples processing from the filesystem, thereby allowing tiles to be cached.
IIPImage is a multi-zoom webserver/browser library. I found the documentation a very good source of information on multi-resolution imaging (like google maps)
The other solution on this page, using mmap, is efficient only for 'small' files. I've hit the 32-bit boundaries often. Generally, allocating a 1 GB chunk of memory will fail on a 32-bit OS (with 4 GB RAM installed) because even virtual memory gets fragmented after one or two application runs. Still, there is sufficient memory to cache parts or all of the image. More memory = more performance.
Just mmap your file.
http://www.kernel.org/doc/man-pages/online/pages/man2/mmap.2.html
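For a raw (uncompressed) 8-bit image, a tile can be sliced straight out of the mapping. A small self-contained Python sketch with toy dimensions standing in for the 20,000 x 20,000 case:

```python
import mmap
import os
import tempfile

width, height, tile = 64, 64, 8   # toy stand-ins for 20000x20000 and 300x300

# Write a synthetic raw 8-bit grayscale image where pixel (x, y) = (x + y) % 256.
path = os.path.join(tempfile.mkdtemp(), "image.raw")
with open(path, "wb") as f:
    f.write(bytes((x + y) % 256 for y in range(height) for x in range(width)))

# Map the file and slice the tile out, with no read()/seek() call per line.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    tx, ty = 16, 24                                    # top-left corner of the tile
    rows = [m[(ty + r) * width + tx : (ty + r) * width + tx + tile]
            for r in range(tile)]

print(rows[0][0])  # pixel (16, 24) -> (16 + 24) % 256 = 40
```

The OS page cache then decides what actually hits the disk, which is exactly the caching behaviour you otherwise have to hand-roll.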
Thanks everyone for the replies.
Actually, a change in the way tiles were requested allowed me to extract the tiles from the files on disk sequentially instead of randomly. This let me load part of the file into RAM and extract the tiles from there.
The efficiency gain was huge. Otherwise, if you need random access to a file, mmap is a good choice.
Regards,
Juan
I did something similar to handle arbitrarily large TARGA (TGA) format files.
The thing that made it simple for that kind of file is that the image is not compressed. You can calculate the position of any arbitrary pixel within the image and find it with a simple seek. You might consider targa format if you have the option to specify the image encoding.
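A minimal sketch of that seek arithmetic for an uncompressed 8-bit grayscale TGA (the 18-byte header layout is standard; the helper name and test image are made up):

```python
import io
import struct

width, height = 16, 8

# Build a minimal uncompressed 8-bit grayscale TGA in memory:
# 18-byte header (image type 3 = uncompressed black-and-white), then raw pixels.
header = struct.pack("<BBBHHBHHHHBB", 0, 0, 3, 0, 0, 0, 0, 0, width, height, 8, 0)
pixels = bytes((x * y) % 256 for y in range(height) for x in range(width))
f = io.BytesIO(header + pixels)

def read_pixel(f, x, y, width, bpp=1, header_size=18):
    # Uncompressed image: the pixel's offset is a simple linear function.
    f.seek(header_size + (y * width + x) * bpp)
    return f.read(bpp)[0]

print(read_pixel(f, 5, 3, width))  # (5 * 3) % 256 = 15
```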
If not, there are many varieties of TIFF format. You probably want to use a library, since others have already gone through the pain of supporting all the different variants.
Did you get a specific error message? Depending on how you used that command line, you could have been stepping on your own file.
If that wasn't the issue, try using imagemagick instead of vips if it's an option.
I've often wondered: if I load a compressed image file, edit it and then save it again, will it lose some quality? What if I use the same quality grade when saving? Will the algorithm somehow detect that the file has already been compressed as a JPEG, and therefore see no point in compressing the displayed representation again?
Would it be a better idea to always keep the original (say, a PSD) and always make changes to it and then save it as a JPEG or whatever I need?
Yes, you will lose further image information. If making multiple changes, work from the original uncompressed file.
When it comes to lossy compression formats such as JPEG, successive compression leads to perceptible quality loss, in forms such as compression artifacts and blurriness of the image.
Even if one uses the same quality settings to save an image, there will still be quality loss. The only way to "preserve quality", or rather to lose as little quality as possible, is to use the highest quality settings available. Even then, there is no guarantee that there won't be quality loss.
Yes, it would be a good idea to keep a copy of the original if one is going to make an image using a lossy compression scheme such as JPEG. The original could be saved with a compression scheme which is lossless such as PNG, which will preserve the quality of the file at the cost of (generally) larger file size.
(Note: There is a lossless version of JPEG, however, the most common one uses techniques such as DCT to process the image and is lossy.)
In general, yes. However, depending on the compression format there are usually certain operations (mainly rotation and mirroring) that can be performed without any loss of quality by software designed to work with the properties of the file format.
Theoretically, since JPEG compresses each 8x8 block of pixels independently, it should be possible to keep all unchanged blocks of an image if it is saved with the same compression settings, but I'm not aware of any software that implements this.
Of course. The compression level used initially will probably differ from that of your subsequent saves. You can easily check this with image-manipulation software (e.g. Photoshop): save your file several times, changing the compression level slightly each time, and you'll see the image degrade.
If the changes are local (fixing a few pixels, rather than reshading a region) and you use the original editing tool with the same settings, you may avoid degradation in the areas that you do not affect. Still, expect some additional quality loss around the area of change as the compressed blocks are affected, and cannot be recovered.
The real answer remains to carry out editing on the source image, captured without compression where possible, and applying the desired degree of compression before targeting the image for use.
Yes, you will always lose a bit of information when you re-save an image as JPEG. How much you lose depends on what you have done to the image after loading it.
If you keep the image the same size and only make minor changes, you will not lose that much data. When the image is loaded, an approximation of the original image is recreated from the compressed data. If you resave the image using the same compression, most of the data that you lose will be data that was recreated when loading.
If you resize the image, or edit large areas of it, you will lose more data when resaving it. Any edited part of the image will lose about the same amount of information as when you first compressed it.
If you want to get the best possible quality, you should always keep the original.
Please help! Thanks in advance.
Update: Sorry for the delayed response. Perhaps it is helpful to provide more context here, since I'm not sure what alternative question I should be asking.
I have an image for a website home page that is 300px x 300px. That image has several distinct regions, including two that have graphical copy on top of the regions.
I have compressed the image down as much as I can without compromising the appearance of that text, and those critical regions of the image.
I tried slicing the less critical regions of the image and saving those at lower compressions in order to get the total kbs down, but as gregmac posted, the sections don't look right when rejoined.
I was wondering if there was a piece of software out there, or a manual solution, for identifying critical regions of an image to "compress less", while compressing other parts of the image more to get the file size down, keeping the elements of the graphic that need to stay sharp at high quality.
You cannot - you can only compress an entire PNG file.
You don't need to (I cannot think of a single case where compressing a specific portion of a PNG file would be useful)
Dividing the image into multiple parts ("slicing") is the only way to compress different portions of an image file, although I'd recommend against using different compression levels in one "sliced image", as the differing compression artefacts joining up will probably look odd.
Regarding your update,
identifying critical regions of an image to "compress less" and could compress other parts of the image more in order to get the file size down
This is inherently what image compression does: if there's a big empty area it will be compressed to a few bytes (using RLE, for example), but a very detailed region will have more bytes "spent" on it.
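You can see this with deflate, the compressor PNG uses (zlib here as a stand-in for the PNG pipeline):

```python
import random
import zlib

flat = bytes(10_000)                                         # a big empty area
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(10_000))  # a detailed region

# Same input length, wildly different compressed sizes.
print(len(zlib.compress(flat)), len(zlib.compress(noisy)))
```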
The problem sounds like the image is too big (in terms of file-size), have you tried other image formats, mainly GIF or JPEG (or the other PNG format, PNG-8 or PNG-24)?
I have compressed the image down as much as I can without compromising the appearance of that text
Perhaps the text could be overlaid using CSS, rather than embedded in the image? Might not be practical, but it would allow you to compress the background more (if the background image is a photo, JPEG might work best, since you no longer have to worry about the text)
Other than that, I'm out of ideas. Is the 300*300px PNG really too big?
It sounds like you are compressing parts of your image using something like JPEG and then pasting those compressed images onto a PNG combined with other images, and the entire PNG is sent to the browser where you split them up.
The problem with this is that the more you compress your JPEG parts the more decompression artifacts you will get. Then when you put these low quality images onto the PNG, which uses deflate compression, you will actually end up increasing the file size because it won't be able to compress well.
So if you are keen on keeping PNG as your file format the best solution would be to not compress the parts using JPEG which you paste onto your PNG - keep everything as sharp as possible.
PNG applies a filter to each row (optionally a predictor that references neighbouring pixels) and then deflate-compresses the whole filtered stream. So rather than relying on layout tricks, it helps to keep similar content close together and to experiment with the filter/predictor settings so the compressor can exploit the redundancy.
Perhaps upload an example of the images you're working with?