I need to understand JPG decompression so that I don't need other libraries that just do it.
After being able to identify the different parts of a JPG file in terms of file format, what do I need to do, understand or learn first in mathematical or algorithmic terms so I can start implementing decoding primitives?
Look at this answer to find all the specifications you need to read, and then read them. Several times. Front to back. Then start implementing, testing often along the way with many example JPEG files.
It wouldn't hurt to know a little about Fourier transforms and then the discrete cosine transform, and also how Huffman codes work. Though you could pick up much of what you need from the specifications.
I have 3 main questions:
Let's say I have a large text file. (1) Is replacing the words with their rank an effective way to compress the file? (I got an answer to this one: it's a bad idea.)
Also, I have come up with a new compression algorithm. I read about some widely used compression models and found they rely on some pretty advanced concepts like statistical redundancy and probabilistic prediction. My algorithm doesn't use any of these concepts; it is a rather simple set of rules to follow while compressing and decompressing. (2) My question is: am I wasting my time trying to come up with a new compression algorithm without enough knowledge of existing compression schemes?
(3) Furthermore, if I manage to successfully compress a string, can I extend my algorithm to other content like videos, images, etc.?
(I understand the third question is difficult to answer without knowing the compression algorithm, but I'm afraid it is so rudimentary and nascent that I feel ashamed to share it. Please feel free to ignore the third question if you have to.)
Your question doesn't make sense as it stands (see answer #2), but I'll try to rephrase and you can let me know if I capture your question. Would modeling text using the probability of individual words make for a good text compression algorithm? Answer: No. That would be a zeroth order model, and would not be able to take advantage of higher order correlations, such as the conditional probability of a given word following the previous word. Simple existing text compressors that look for matching strings and varied character probabilities would perform better.
Yes, you are wasting your time trying to come up with a new compression algorithm without having enough knowledge about existing compression schemes. You should first learn about the techniques that have been applied over time to model data, textual and others, and the approaches to use the modeled information to compress the data. You need to study what has already been researched for decades before developing a new approach.
The compression part may extend, but the modeling part won't.
Do you mean like having a ranking table of words sorted by frequency and assign smaller "symbols" to those words that are repeated the most, therefore reducing the amount of information that needs to be transmitted?
That's basically how Huffman coding works. The problem with compression is that you always hit a limit somewhere along the road. Of course, if the set of things you try to compress follows a particular pattern or distribution, it's possible to be really efficient about it, but for general purposes (audio/video/text/encrypted data that appears random) there is no "best" compression technique, and I believe there can't be one.
Huffman Coding uses frequency on letters. You can do the same with words or with letter frequency in more dimensions, i.e. combinations of letters and their frequency.
I want to make my own text file compression program. I don't know much about C++ programming, but I have learned all the basics and writing/reading a file.
I have searched on google a lot about compression, and saw many different kind of methods to compress a file like LZW and Huffman. The problem is that most of them don't have a source code, or they have a very complicated one.
I want to ask if you know any good webpages where I can learn and make a compression program myself?
EDIT:
I will let this topic be open for a little longer, since I plan to study this the next few days, and if I have any questions, I'll ask them here.
Most of the algorithms are pretty complex, but they all have one thing in common: they find data that repeats, store it only once, and keep enough bookkeeping to put the repeated segments back in place when decompressing.
Here is a simple example you can try to implement.
We have this data file
XXXXFGGGJJ
DDDDDDDDAA
XXXXFGGGJJ
Here we have chars that repeat and two lines that repeat. So you could start with finding a way to reduce the filesize.
Here's a simple compression algorithm.
4XF3G2J
8D2A
4XF3G2J
So we have 4 of X, one of F, 3 of G etc.
You can try this page; it contains a clear walkthrough of the basics of compression and first principles.
Compression is not the easiest task. I took a college class on compression algorithms like LZW and Huffman, and I can tell you that they're not that easy. If C++ is your first language and you're just getting started, I wouldn't recommend trying to write your own compression algorithm just yet. If you are more experienced, then I would try writing the source without any code provided to you; that shows you truly understand the compression algorithm.
This is how I was taught: the professor explained the algorithm in very broad terms, and then we would either implement it (in Java, mind you) or answer questions about how the algorithm would behave under certain circumstances. If we could do either of those, then we really knew the algorithm, without him showing us any source at all. It's a good skill to develop ;)
Huffman encoding trees are not too complicated; I'd start with them. Here's a link: Example: Huffman Encoding Trees
I'm wondering if there is a way to extract the necessary data out of an autocad .dxf file, so I can visualize the structure in opengl?
I've found some old code snippets for Windows written in C++, but since the standard has changed I assume 15-year-old code is a little outdated.
Also, there is a book about the .dxf file standard, but it's from the 90's and, aside from that, rarely available.
Another way might be to convert it to some other file format and then extract the data I need.
Trying to look into the .dxf files didn't give too much insight either since a simple cuboid contains a lot of data already!
Can anyone give me hint on how to approach this?
The references are a good place to start, but if you are doing heavy 3D work it may not be possible to accomplish what you are attempting.
We recently wrote a DXF converter in Java based entirely on the references. Although many of the entities are relatively straightforward, many others (3DSOLID, BODY, REGION, SURFACE, Swept Surface) are not really possible to translate, since the reference states that their groups are primarily proprietary data. Other objects (Extruded Surface, Revolved Surface, Swept Surface again) have significant chunks of binary data which may hold important information you need.
These entities were not vital for our efforts, but if you are looking to convert to OpenGL, these may be the entities you were particularly concerned with.
Autodesk has references for the DXF formats used by recent revisions of AutoCAD. I'd probably take a second look at that 15 year-old code though. Even if you can't/don't use it as-is, it may provide a decent starting point. The DXF specification is sufficiently large and complex that having something to start from, and just add new bits and pieces where needed can be a big help. As an interchange format, DXF has to be pretty conservative anyway, only including elements that essentially all programs can interpret reasonably directly.
I'd probably be more concerned about the code itself than changes in the DXF format. A lot of code that old uses deep, monolithic class hierarchies that are quite a bit different from what you'd expect in modern C++.
In an effort to get better at programming assembly, and as an academic exercise, I would like to write a non-trivial program in x86 assembly. Since file compression has always been kind of an interest to me, I would like to write something like the zip utility in assembly.
I'm not exactly out of my element here, having written a simple web server using assembly and coded for embedded devices, and I've read some of the material for zlib (and others) and played with its C implementation.
My problem is finding a routine that is simple enough to port to assembly. Many of the utilities I've inspected thus far are full of #define's and other included code. Since this is really just for me to play with, I'm not really interested in super-awesome compression ratios or anything like that. I'm basically just looking for the RC4 of compression algorithms.
Is a Huffman Coding the path I should be looking down or does anyone have another suggestion?
Here is a more sophisticated algorithm that should not be too hard to implement: LZ77 (containing assembly examples) or LZ77 (this site contains many different compression algorithms).
One option would be to write a decompressor for DEFLATE (the algorithm behind zip and gzip). zlib's implementation is going to be heavily optimized, but the RFC gives pseudocode for a decoder. After you have learned the compressed format, you can move on to writing a compressor based on it.
I remember a project from second year computing science that was something similar to this (in C).
Basically, compressing involves replacing a string of xxxxx (5 x's) with #\005x (a marker character, a byte with the value 5, followed by the repeated byte). This algorithm is very simple. It doesn't work that well for English text, but works surprisingly well for bitmap images.
Edit: what I am describing is run length encoding.
Take a look at UPX executable packer. It contains some low-level decompressing code as part of unpacking procedures...
I am looking for a compression algorithm (for a programming competition) and I need a full description of how to implement it (all technical details), any loseless and patent-free algorithm will do, but the ease of implementation is a bonus :)
(Although possibly irrelevant) I plan to implement the algorithm in C++...
Thanks in advance.
EDIT:
I will be compressing text files only, no other file types...
Well, I can't go so far as to complete the competition for you, but please check out this article on the wiki: Run Length Encoding. It is one of the simplest ways to compress data, albeit not always an efficient one. Compression is also domain specific: even among lossless algorithms, what you are compressing determines how best to encode it.
RFC 1951 describes inflate/deflate, including a brief description of the compressor's algorithm. Antaeus Feldspar's An Explanation of the Deflate Algorithm provides a bit more background.
Also, the zlib source distribution contains a simplified reference inflater in contrib/puff/puff.c that can be helpful reading to understand exactly how the bits are arranged (but it doesn't contain a deflate, only inflate).
I'd start here on Wikipedia.
There's a whole lot to choose from, but without knowing more about what you want it's difficult to help further. Are you compressing text, images, video or just random files? Each one has its own set of techniques and challenges for optimal results.
If ease of implementation is the sole criterion I'd use "filecopy" compression. Guaranteed compression ratio of exactly 1:1, and trivial implementation...
Huffman is good if you're compressing plain text. And all the commenters below assure me it's a joy to implement ;D
Ease of implementation: Huffman, as stated before. I believe LZW is no longer under patent, but I don't know for sure. It's a relatively simple algorithm. LZ77 should be available, though. Lastly, the Burrows-Wheeler transform allows for compression, but it's significantly more difficult to implement.
I like this introduction to the Burrows-Wheeler Transform.
If you go under "View" in your internet browser, there should be an option to either "Zoom Out" or make the text smaller.
Select one of those and...
BAM!
You just got more text on the same screen! Yay compression!
The Security Now! podcast recently put out an episode highlighting data compression algorithms. Steve Gibson gives a pretty good explanation of the basics of Huffman and Lempel-Ziv compression techniques. You can listen to the audio podcast or read the transcript for Episode 205.