Does Compressed Sensing bring anything new to data Compression? [closed] - compression

Closed 10 years ago.
Compressed sensing is great for situations where capturing data is expensive (either in energy or time). It works by taking a smaller number of samples and using linear or convex programming to reconstruct the original reference signal away from the sensor.
However, in situations like image compression, given that the data is already on the computer -- does compressed sensing offer anything? For example, would it offer better data compression? Would it result in better image search?...

With regards to your question
"...given that the data is already on the computer -- does compressed sensing offer anything? For example, would it offer better data compression? Would it result in better image search?..."
In general, the answer to your question is no: it would not offer better data compression, at least initially. This is the case for images, where nonlinear schemes like JPEG do better than compressed sensing by a factor of about 4 to 5. That factor comes from the K log(N/K) term found in theoretical results across different papers.
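To make the K log(N/K) remark concrete, here is a rough back-of-the-envelope sketch. It assumes a measurement count of about c * K * log(N/K) with c = 1 and a natural logarithm; the constant and the log base differ between papers, and the image size and sparsity level below are made-up illustration values.

```python
import math

def cs_measurements(n, k, c=1.0):
    """Rough measurement count m ~ c * k * log(n / k).

    c is an assumed constant (papers differ); log is the natural log here.
    """
    return c * k * math.log(n / k)

n = 512 * 512   # hypothetical image size in pixels
k = n // 100    # hypothetical sparsity: about 1% of coefficients matter

# An adaptive (nonlinear) scheme keeps on the order of k coefficients,
# so the overhead of compressed sensing is roughly log(n / k).
overhead = cs_measurements(n, k) / k
print(f"overhead factor: {overhead:.2f}")
```

With these assumed numbers the overhead lands between 4 and 5, which is the order of magnitude quoted above.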
I said "initially" because compressed sensing is currently focused mostly on the concept of sparsity, but new work is emerging that tries to use additional information, such as the fact that the coefficients of a wavelet decomposition come in clumps, which could improve the compression. This work and others are likely to provide additional improvement, with maybe the possibility of getting close to nonlinear transform schemes such as JPEG.
The other thing to keep in mind is that JPEG is the result of a focused effort by the whole industry and many years of research, so it really is difficult to do better than that. Compressive sensing, however, provides a means of compressing other datasets without the need for those years of experience and manpower.
Finally, there is something immensely awe-inspiring in the compression found in compressive sensing: it is universal. This means that right now you may "decode" an image to a certain level of detail, and then in ten years, using the same data, you might actually "decode" a better image/dataset (with the caveat that the information was there in the first place) because your solvers will be better. You cannot do that with JPEG or JPEG 2000, because the compressed data is intrinsically connected to the decoding scheme.
(disclosure: I write a small blog on compressed sensing)

Since the whole point of compressed sensing is to avoid taking measurements, which, as you say, can be expensive, it should come as no surprise that the compression ratio will be worse than if the compression implementation is allowed to take all the measurements it wants and cherry-pick the ones that generate the best outcome.
As such, I very much doubt that an implementation utilizing compressed sensing for data already present (in effect, already having all the measurements), is going to produce better compression ratios than the optimal result.
Now, having said that, compressed sensing is also about picking a subset of the measurements that will reproduce a result that is similar to the original when decompressed, but might lack some of the detail, simply because you're picking that subset. As such, it might also be that you can indeed produce better compression ratios than the optimal result, at the expense of a bigger loss of detail. Whether this is better than, say, a jpeg compression algorithm where you simply throw out more of the coefficients, I don't know.
Also, if an image compression implementation that uses compressed sensing can reduce the time it takes to compress the image from the raw bitmap data, that might give it some traction in scenarios where time is an expensive factor but the level of detail is not.
In essence, if you are willing to trade quality of results for speed, a compressed sensing implementation might be worth looking into. I have yet to see widespread usage of this, though, so something tells me it isn't going to be worth it, but I could be wrong.
I don't know why you bring up image search, though. I don't see how the compression algorithm can help with image search, unless you somehow use the compressed data to search for images. That will probably not do what you want, because very often you search for images that contain certain visual patterns but aren't 100% identical.

This may not be the exact answer to your question, but I just want to emphasise other important application domains of CS. Compressive sensing can be a great advantage in wireless multimedia networks, where there is great emphasis on the power consumption of the sensor node. Here the sensor node has to transmit the information (say, an image taken by a surveillance camera). If it has to transmit all the samples, we cannot hope to improve the network lifetime, whereas if we use JPEG compression, it brings high complexity to the encoder (sensor node) side, which is again undesirable. So compressive sensing helps by moving the complexity from the encoder side to the decoder side.
As researchers in this area, we have succeeded in transmitting an image and a video over a lossy channel with considerable quality by sending only 52% of the total samples.

One of the benefits of compressed sensing is that the sensed signal is not only compressed but it's encrypted as well. The only way a reference signal can be reconstructed from its sensed signal is to perform optimization (linear or convex programming) on a reference signal estimate when applied to the basis.
Does it offer better data compression? That's going to be application dependent. First, it will only work on sparse reference signals, meaning it's probably only applicable to image, audio, and RF signal compression, and not to general data compression. In some cases it may be possible to get a better compression ratio using compressed sensing than with other approaches, and in other instances that won't be the case. It depends on the nature of the signal being sensed.
Would it result in better image search? I have little hesitation answering this "no". Since the sensed signal is both compressed and encrypted, there is virtually no way to reconstruct the reference signal from the sensed signal without the "key" (basis function). In those instances where the basis function is available, the reference signal still would need to be reconstructed to perform any sort of image processing / object identification / characterization or the like.

Compressed sensing means that some data can be reconstructed from a small number of measurements. Most data can be linearly transformed into another linear space in which most of the dimensions can be ignored.
So we can reconstruct most data from only some of the dimensions, and that "some" can be a small fraction of the number of original dimensions.

Related

Compression ratio of LZW, LZ77 and other easy-to-implement algorithms

I want to compress .txt files that contain dates in yyyy-mm-dd hh:mm:ss format and English words that sometimes tend to be repeated in different lines.
I read some articles about compression algorithms and found out that in my case dictionary-based encoding is better than entropy-based encoding. Since I want to implement the algorithm myself, I need something that isn't very complicated. So I looked at LZW and LZ77, but I can't choose between them, because the conclusions of the articles I found are contradictory: according to some, LZW has the better compression ratio, and according to others, LZ77 is the leader. So the question is: which one is most likely to be better in my case? Are there other easy-to-implement algorithms that would be good for my purpose?
LZW is obsolete. Modern, and even pretty old, LZ77 compressors outperform LZW.
In any case, you are the only one who can answer your question, since only you have examples of the data you want to compress. Simply experiment with various compression methods (zstd, xz, lz4, etc.) on your data and see what combination of compression ratio and speed meets your needs.
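As a rough illustration of that kind of experiment, the sketch below compares the compressors that ship with Python's standard library on a synthetic log. The generated data (timestamps plus a handful of repeated English words) is only a stand-in for the real files, and zstd and lz4 are left out because they need third-party packages.

```python
import bz2
import lzma
import random
import zlib
from datetime import datetime, timedelta

def make_sample_log(lines=2000, seed=0):
    """Build a synthetic log: 'yyyy-mm-dd hh:mm:ss word word' per line."""
    rng = random.Random(seed)
    words = ["start", "stop", "error", "warning", "login", "logout", "retry"]
    t = datetime(2020, 1, 1)
    out = []
    for _ in range(lines):
        t += timedelta(seconds=rng.randint(1, 90))
        out.append(f"{t:%Y-%m-%d %H:%M:%S} {rng.choice(words)} {rng.choice(words)}")
    return "\n".join(out).encode("ascii")

def compression_ratios(data):
    """Return original_size / compressed_size for each stdlib compressor."""
    candidates = {
        "zlib (LZ77 + Huffman)": zlib.compress(data, 9),
        "bz2 (BWT-based)": bz2.compress(data, 9),
        "lzma (LZ77 family)": lzma.compress(data),
    }
    return {name: len(data) / len(blob) for name, blob in candidates.items()}

sample = make_sample_log()
ratios = compression_ratios(sample)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Swap `make_sample_log()` for the bytes of a real file to get numbers that actually mean something for your data.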

SLAM system that uses deep learned features?

Has anybody tried developing a SLAM system that uses deep learned features instead of the classical AKAZE/ORB/SURF features?
Scanning recent Computer Vision conferences, there seem to be quite a few reports of successful usage of neural nets to extract features and descriptors, and benchmarks indicate that they may be more robust than their classical computer vision equivalent. I suspect that extraction speed is an issue, but assuming one has a decent GPU (e.g. NVidia 1050), is it even feasible to build a real-time SLAM system running say at 30FPS on 640x480 grayscale images with deep-learned features?
This was a bit too long for a comment, so that's why I'm posting it as an answer.
I think it is feasible, but I don't see how this would be useful. Here is why (please correct me if I'm wrong):
In most SLAM pipelines, precision is more important than long-term robustness. You obviously need your feature detections/matchings to be precise to get reliable triangulation/bundle (or whatever equivalent scheme you might use). However, the high level of robustness that neural networks provide is only required with systems that do relocalization/loop closure on long time intervals (e.g. need to do relocalization in different seasons etc). Even in such scenarios, since you already have a GPU, I think it would be better to use a photometric (or even just geometric) model of the scene for localization.
We don't have any reliable noise models for the features that are detected by the neural networks. I know there have been a few interesting works (Gal, Kendall, etc...) on propagating uncertainties in deep networks, but these methods seem a bit immature for deployment in SLAM systems.
Deep learning methods are usually good for initializing a system, and the solution they provide needs to be refined. Their results depend too much on the training dataset, and tend to be "hit and miss" in practice. So I think that you could trust them to get an initial guess, or some constraints (e.g. like in the case of pose estimation: if you have a geometric algorithm that drifts in time, then you can use the results of a neural network to constrain them. But I think that the absence of a noise model as mentioned previously will make the fusion a bit difficult here...).
So yes, I think that it is feasible and that you can probably, with careful engineering and tuning, produce a few interesting demos, but I wouldn't trust it in real life.

Have the monetdb's developers tested any other compression algorithm on it?

Have MonetDB's developers tested any other compression algorithms on it before?
Perhaps they have tested other compression algorithms, but found that they had a negative performance impact.
So why haven't they improved the database's compression performance?
I am a student from China. MonetDB really interests me, and I want to try to improve its compression performance.
So I should first make sure whether anybody has done this before.
I would be grateful if you could answer my question.
I really need this.
Thank you so much.
MonetDB only compresses String (Varchar and char) types using dictionary compression and only if the number of unique strings in a column is small.
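For readers unfamiliar with the term, dictionary compression of a string column can be sketched as follows. This only illustrates the general idea and is not MonetDB's actual implementation; the function names and sample column are made up.

```python
def dictionary_encode(column):
    """Replace each string with a small integer code plus a lookup table."""
    dictionary = []        # code -> string
    codes_by_value = {}    # string -> code
    codes = []
    for value in column:
        if value not in codes_by_value:
            codes_by_value[value] = len(dictionary)
            dictionary.append(value)
        codes.append(codes_by_value[value])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the lookup table and the codes."""
    return [dictionary[c] for c in codes]

# Hypothetical column with few unique strings, where the scheme pays off.
column = ["Beijing", "Amsterdam", "Beijing", "Beijing", "Amsterdam"]
dictionary, codes = dictionary_encode(column)
print(dictionary, codes)
```

The scheme only helps when the number of unique strings is small, since otherwise the dictionary is as large as the column itself.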
Integrating any other kind of compression (even simple ones like prefix coding, run-length encoding, delta compression, ...) needs a complete rewrite of the system, because the operators have to be made compression-aware (which pretty much means creating a new operator).
The only thing that may be feasible without a complete rewrite is having dedicated compression operators that compress/decompress data instead of spilling to disk. However, this would be very close to the memory compression Apple implemented in Mavericks.
MonetDB compresses columns using PFOR compression. See http://paperhub.s3.amazonaws.com/7558905a56f370848a04fa349dd8bb9d.pdf for details. This also answers your question about checking other compression methods.
PFOR was chosen because of the way modern CPUs work, but really any algorithm that works with (only) arithmetic instead of branches will do just fine. I've hit similar speeds with arithmetic coding in the past.

Why don't we use word ranks for string compression?

I have 3 main questions:
Let's say I have a large text file. (1) Is replacing the words with their rank an effective way to compress the file? (Got an answer to this question. This is a bad idea.)
Also, I have come up with a new compression algorithm. I read some existing compression models that are used widely and I found out they use some pretty advanced concepts like statistical redundancy and probabilistic prediction. My algorithm does not use all these concepts and is a rather simple set of rules that need to be followed while compressing and decompressing. (2)My question is am I wasting my time trying to come up with a new compression algorithm without having enough knowledge about existing compression schemes?
(3)Furthermore, if I manage to successfully compress a string can I extend my algorithm to other content like videos, images etc.?
(I understand that the third question is difficult to answer without knowledge about the compression algorithm. But I am afraid the algorithm is so rudimentary and nascent I feel ashamed about sharing it. Please feel free to ignore the third question if you have to)
1. Your question doesn't make sense as it stands (see answer #2), but I'll try to rephrase and you can let me know if I capture your question. Would modeling text using the probability of individual words make for a good text compression algorithm? Answer: No. That would be a zeroth order model, and would not be able to take advantage of higher order correlations, such as the conditional probability of a given word following the previous word. Simple existing text compressors that look for matching strings and varied character probabilities would perform better.
2. Yes, you are wasting your time trying to come up with a new compression algorithm without having enough knowledge about existing compression schemes. You should first learn about the techniques that have been applied over time to model data, textual and others, and the approaches to use the modeled information to compress the data. You need to study what has already been researched for decades before developing a new approach.
3. The compression part may extend, but the modeling part won't.
Do you mean like having a ranking table of words sorted by frequency and assign smaller "symbols" to those words that are repeated the most, therefore reducing the amount of information that needs to be transmitted?
That's basically how Huffman coding works. The problem with compression is that you always hit a limit somewhere along the road. Of course, if the set of things you try to compress follows a particular pattern/distribution, then it's possible to be really efficient about it, but for general purposes (audio/video/text/encrypted data that appears to be random) there is no (and I believe there can't be a) "best" compression technique.
Huffman coding uses the frequency of letters. You can do the same with words, or with letter frequency in more dimensions, i.e. combinations of letters and their frequencies.
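Here is a minimal sketch of Huffman coding applied to whole words instead of letters, so that frequent words get shorter bit strings. The sample sentence and function names are made up, and a real compressor would also have to store the code table alongside the encoded bits.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 / 1.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def huffman_decode(bits, code):
    """Decode a bit string by matching prefixes against the code table."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

words = "the cat and the dog and the bird".split()
code = huffman_code(words)
encoded = "".join(code[w] for w in words)
print(code, len(encoded), "bits")
```

Note that this is still a zeroth-order model of the words: it uses each word's frequency but ignores which word tends to follow which.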

best compression algorithm with the following features

What is the best compression algorithm with the following features:
should take less time to decompress (can take reasonably more time to compress)
should be able to compress sorted data (approx list of 3,000,000 strings/integers ...)
Please suggest along with metrics: compression ratio, algorithmic complexity for compression and decompression (if possible)?
Entire site devoted to compression benchmarking here
Well if you just want speed, then standard ZIP compression is just fine and it's most likely integrated into your language/framework already (ex: .NET has it, Java has it). Sometimes the most universal solution is the best, ZIP is a very mature format, any ZIP library and application will work with any other.
But if you want better compression, I'd suggest 7-Zip as the author is very smart, easy to get ahold of and encourages people to use the format.
Providing you with compression times is impossible, as it's directly related to your hardware. If you want a benchmark, you have to do it yourself.
You don't have to worry about decompression time. The extra time spent at a higher compression level goes mostly into finding the longest matching pattern.
For each token, decompression either
1) writes the literal, or
2) for a (backward position, length) = (m, n) pair,
goes back m bytes in the output buffer,
reads n bytes and
writes those n bytes at the end of the buffer.
So the decompression time is independent of the compression level. And, from my experience with the Universal Decompression Virtual Machine (RFC3320), I guess the same is true for any decompression algorithm.
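The two decompression steps described above can be sketched like this. The token format (a plain int for a literal byte, an `(m, n)` tuple for a back-reference) is made up for the illustration and is not the bit-level format of any real codec.

```python
def lz77_decode(tokens):
    """Decode a list of tokens: int = literal byte, (m, n) = back-reference."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)              # 1) write the literal
        else:
            distance, length = token       # 2) (m, n) pair
            start = len(out) - distance    # go back m bytes in the output
            for i in range(length):
                # Copy byte by byte so that overlapping copies (n > m) work.
                out.append(out[start + i])
    return bytes(out)

# Hypothetical token stream: "abc", then copy 6 bytes from 3 back, then "d".
tokens = [ord("a"), ord("b"), ord("c"), (3, 6), ord("d")]
decoded = lz77_decode(tokens)
print(decoded)
```

The decoder does a fixed amount of work per output byte no matter how hard the compressor searched for matches, which is the point made above.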
This is an interesting question.
On such sorted data of strings and integers, I would expect difference coding approaches to outperform any out-of-the-box text compression approach such as LZ77 or LZ78 in terms of compression ratio. General-purpose encoders do not use the special properties of the data.
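A small sketch of that idea for sorted integers: store the differences between neighbours instead of the values themselves, then hand the result to a general-purpose compressor. The random test data, the 32-bit packing, and the choice of zlib are assumptions made for the illustration; real data will behave differently.

```python
import random
import struct
import zlib

def delta_encode(sorted_values):
    """Store each value as the difference from its predecessor."""
    previous = 0
    deltas = []
    for v in sorted_values:
        deltas.append(v - previous)
        previous = v
    return deltas

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values, total = [], 0
    for d in deltas:
        total += d
        values.append(total)
    return values

def pack_u32(values):
    """Pack non-negative integers as little-endian unsigned 32-bit words."""
    return struct.pack(f"<{len(values)}I", *values)

rng = random.Random(0)
values = sorted(rng.sample(range(100_000_000), 50_000))  # hypothetical data
deltas = delta_encode(values)

raw_size = len(zlib.compress(pack_u32(values), 9))
delta_size = len(zlib.compress(pack_u32(deltas), 9))
print(f"zlib on raw values: {raw_size} bytes, on deltas: {delta_size} bytes")
```

Decoding is a single running sum, so it also fits the requirement that decompression be fast.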