The Brotli compression format is excellently documented in RFC 7932. You can just read this RFC top to bottom, and it tells you how the format works.
However, while you could probably implement a decoder (decompressor) based on the RFC alone, the RFC doesn't describe the encoder algorithm that is part of Google's reference C implementation (the brotli command line tool). In other words, it doesn't tell us what strategies the encoder uses at different quality levels to find an efficient compressed representation for a given input stream.
Of course I can always read the encoder source, but I was wondering if there was an accessible high-level description of how the encoder works?
All I am aware of is a very brief description in this article:
The higher data density is achieved by a 2nd order context modeling,
re-use of entropy codes, larger memory window of past data and joint
distribution codes.
More importantly, from the same article:
the new algorithm is named after Swiss bakery products. Brötli means
‘small bread’ in Swiss German.
Update:
AardvarkSoup added a much better answer with this link to a comprehensive paper on how Brotli works, by its authors. Some moderator inexplicably deleted that answer, so I have copied the link here.
I'm writing a TIFF decoder and I can't find any technical resources for decoding CCITT Fax Group 3 or 4.
Does anybody have any resources which explain these? The regular TIFF 6.0 document doesn't say much about decoding. I guess each segment (tile or strip) is encoded independently, but that's pretty much the only information I have. I have implemented compression type 2; is its Huffman code tree the same one that is used for compression type 3 or 4?
If nobody can find any resources, please post any hints or code, or maybe point to an open source library which contains an implementation (preferably in Java, but any language works). I have looked at the GDAL source code, but that file was huge and I would like to keep it as a last resort.
Thanks!
I can only point you to the official documents (specifications).
These documents aren't available online because the organization that produces them does not allow them to be posted on the internet. You will have to buy them from the standards organization.
As for finding relevant code samples, your best bets are:
Libtiff (C library)
Libtiff.NET (.NET library)
Java Imaging
A high-level overview, without going into the details, can be found in the Wikipedia article:
http://en.wikipedia.org/wiki/Fax#Compression
Terms such as "Modified Read (MR)" and "Modified Modified Read (MMR)" are the names of the fax compression schemes whose decoders are implemented in a TIFF encoding/decoding library.
List of official documents
"T.4 group 3 Fax"
"T-REC-T.4-200307-I!!PDF-E.pdf"
"T-REC-T.6-198811-I!!PDF-E.pdf"
"T-REC-T.563-199610-I!!PDF-E.pdf"
199707, 199710, 199806, 199904
I have an int16_t[] buffer with raw PCM audio data and I want to apply some effects (like echo, reverb, gain...) to it.
I thought that SoX or something similar could do the trick for me, but SoX only works with files, and other similar libraries that support adding sound effects seem to add the effects only when the sound is played. So my problem is that I want to apply the effect to the samples in my buffer without playing them.
I have never worked with audio, but reading about PCM data I have learned that I can apply gain by multiplying each sample value, for example. But I'm looking for any library or relatively easy algorithms that I can use directly on my buffer to get the sound effects applied.
I'm sure there are a lot of solutions to my problem out there if you know what to look for, but it's my first time with audio "processing" and I'm lost, as you can see.
For everyone who, like me, is interested in learning DSP related to audio processing with C++, I want to share the results of my little research and my opinion, and perhaps save you some time :)
After trying several DSP libraries, I finally found The Synthesis ToolKit in C++ (STK), an open-source library that offers easy and clear interfaces and easy-to-understand code that you can dive into to learn about various basic DSP algorithms.
So, I recommend that anyone who is starting out and has no previous experience take a look at this library.
Your int16_t[] buffer contains a sequence of samples. They represent instantaneous amplitude levels. Think of them as the voltage to apply to the speaker at the corresponding instant in time. They are signed numbers with values in the range [-32768, 32767]. A stream of constant zeros means silence. A stream of constant -32000 (for example) also means silence, but it will eventually burn your speaker coil. The position in the array represents time, and the value of each sample represents voltage.
If you want to mix two sample streams together, for example to apply a chirp, you get yourself a sample stream with the chirp in it (record a bird or something). You then add the two sounds sample by sample.
You can do a super-cheesy reverb effect by taking your original sound buffer, lowering its volume (perhaps by dividing all the samples by a constant), and adding it back to your original stream, but shifting the samples by a tenth of a second's worth of array position.
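The gain, mixing, and delayed-copy ideas above can be sketched in a few lines. This is only a minimal illustration in Python on a list of 16-bit sample values, not a proper DSP implementation; the function names, the 0.1 second default delay, and the 0.5 attenuation are assumptions chosen for the example. The one detail worth noticing is the clamping: adding or scaling samples can exceed the 16-bit range, and saturating sounds far less bad than letting the value wrap around.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clamp16(value):
    """Saturate a value to the signed 16-bit range instead of letting it wrap."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


def apply_gain(samples, gain):
    """Scale every sample by `gain` (1.0 leaves the signal unchanged)."""
    return [clamp16(s * gain) for s in samples]


def mix(a, b):
    """Add two streams sample by sample; the shorter one is padded with silence."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [clamp16(x + y) for x, y in zip(a, b)]


def echo(samples, sample_rate, delay_seconds=0.1, attenuation=0.5):
    """Add one attenuated, delayed copy of the signal to itself."""
    delay = round(sample_rate * delay_seconds)  # delay expressed in array positions
    delayed = [0] * delay + apply_gain(samples, attenuation)
    return mix(samples, delayed)
```

The same loops translate directly to an int16_t[] buffer in C or C++, as long as the intermediate sum is held in a wider type (such as int32_t) before clamping.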
Those are the basics of audio processing. Things get very sophisticated indeed. This field is known as "digital signal processing" and there are plenty of books on the subject.
You can either hack the audio buffer and try to do some effects like gain and threshold with simple math operations, or do it correctly using proper DSP algorithms. If you wish to do it correctly, I would recommend using the Speex library. It's open source and well tested: www.speex.org. The code should compile on MSVC or Linux with minimal effort. This is the fastest way to get good audio code working with proper DSP techniques. Your code would look something like the following; please read the AEC example.
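/* NN: frame size in samples; TAIL: echo tail length in samples (as in testecho) */
SpeexEchoState *st = speex_echo_state_init(NN, TAIL);
SpeexPreprocessState *den = speex_preprocess_state_init(NN, sampleRate);
/* Tell the echo canceller the sampling rate, then link it to the preprocessor. */
speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
You need to set up the states; the testecho example code includes this setup.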
I am trying to take a video frame that I have and packetize it into several RTP packets. I am using jrtp and working in C++. Can this be done with this library? If so, how do I go about it?
Thank you,
First, know what codec you have (H.263, H.264, MPEG-2, etc.). Then find the IETF AVT RFC for packetizing that codec (for example, RFC 3984 for H.264, since obsoleted by RFC 6184). Then look for libraries or implementations of that RFC (and look in jrtp), or code it yourself.
jrtplib provides only basic RTP/RTCP functionality. You have to do any media-type specific packetization yourself. If you look at the RTPPacket constructor, it takes payload data and payload length parameters (amongst others). The RTPPacketBuilder could also be of interest to you.
If you decide to do this yourself, you need to read the corresponding RFCs and implement according to them as jesup stated.
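To make the division of labour concrete, here is a minimal sketch in Python of the generic part: the 12-byte fixed RTP header from RFC 3550 and a loop that spreads one frame over several packets sharing a timestamp, with the marker bit set on the last one. It does not use jrtplib, and the function names, the 1400-byte payload limit, and the naive byte split are assumptions for illustration only. A real payload format (for example the H.264 one) defines its own fragmentation rules and extra payload headers, which is exactly the codec-specific work described above.

```python
import struct

RTP_VERSION = 2


def build_rtp_header(payload_type, sequence_number, timestamp, ssrc, marker=False):
    """Pack the 12-byte fixed RTP header (no padding, extension, or CSRC list)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",
        byte0,
        byte1,
        sequence_number & 0xFFFF,   # 16-bit sequence number wraps around
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def packetize_frame(frame, payload_type, first_seq, timestamp, ssrc, max_payload=1400):
    """Split one frame into RTP packets that all carry the same timestamp.

    Only the last packet has the marker bit set. The plain byte split is a
    placeholder for the fragmentation scheme of the actual payload format.
    """
    chunks = [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]
    packets = []
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        header = build_rtp_header(payload_type, first_seq + index, timestamp, ssrc, is_last)
        packets.append(header + chunk)
    return packets
```

With jrtplib, the library builds the header for you; what remains on your side is producing the per-packet payloads and choosing the marker and timestamp values.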
FYI, the C++ live555 Streaming Media library handles packetization of many video formats for you, but it is also a lot more complex.
The input data is a byte array which represents an H.264 frame. The frame consists of a single slice (it is not a multi-slice frame).
So, as I understand it, I can treat this frame as a slice. The slice has a header and slice data: macroblocks, each with its own header.
So I have to parse that byte array to extract the frame number, the frame type, and the quantisation coefficient (as I understand it, each macroblock has its own coefficient, or am I wrong?).
Could you advise me where I can get more detailed information about parsing H.264 frame bytes?
(In fact I've read the standard, but it wasn't very specific, and I'm lost.)
Thanks
The H.264 Standard is a bit hard to read, so here are some tips.
Read Annex B; make sure your input starts with a start code
Read section 9.1: you will need it for all of the following
Slice header is described in section 7.3.3
"Frame number" is not encoded explicitly in the slice header; frame_num is close to what you probably want.
"Frame type" probably corresponds to slice_type (the second value in the slice header, so the easiest to parse; you should definitely start with this one)
"Quantization coefficient" - do you mean "quantization parameter"? If yes, be prepared to write a full H.264 parser (or reuse an existing one). Look in section 9.3 to get an idea of the complexity of an H.264 parser.
The standard is very hard to read. You can try to analyze the source code of existing H.264 video stream decoding software such as ffmpeg with its C (C99) libraries. For example, there is the avcodec_decode_video2 function documented here. You can get a full working C example (open file, get H.264 stream, iterate through frames, dump information, get colorspace, save frames as raw PPM images, etc.) here. Alternatively, there is the great book "The H.264 Advanced Video Compression Standard", which explains the standard in "human language". Another option is to try the Elecard StreamEye Pro software (there is a trial version), which could give you some additional (visual) perspective.
Actually, it is much better and easier (this is only my opinion) to read the H.264 video coding documentation.
ffmpeg is a very good library, but it contains a lot of optimized code. It is better to look at the reference implementation of the H.264 codec and the official documentation.
http://iphome.hhi.de/suehring/tml/download/ - this is link to the JM codec implementation.
Try to separate the levels of the decoding process, starting with the transport layer that contains the NAL units (SPS, PPS, SEI, IDR, SLICE, etc.). Then you need to implement the VLC engine (mostly order-0 Exp-Golomb codes). Then comes a very difficult and powerful entropy coder called CABAC (Context-Adaptive Binary Arithmetic Coding), which is quite a tricky task. The demuxing process (which comes after unpacking the video data) is also complicated. You need to understand each of these modules completely.
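The first two layers are small enough to sketch. The following Python is a minimal illustration, not taken from the JM reference code: it splits an Annex B byte stream into NAL units, removes emulation prevention bytes, and decodes order-0 Exp-Golomb codes to read the first three slice header fields (first_mb_in_slice, slice_type, pic_parameter_set_id). All function and class names are made up for the example. Fields after these three, including frame_num, have lengths or presence that depend on the active SPS and PPS, so they are deliberately left out.

```python
def split_annex_b(stream):
    """Return the NAL units found after each Annex B start code (0x000001)."""
    starts = []
    i = 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            starts.append(i + 3)
            i += 3
        else:
            i += 1
    units = []
    for index, begin in enumerate(starts):
        end = starts[index + 1] - 3 if index + 1 < len(starts) else len(stream)
        # Trailing zero bytes belong to the next (4-byte) start code or to padding.
        units.append(stream[begin:end].rstrip(b"\x00"))
    return units


def nal_unit_type(nal_unit):
    """The low 5 bits of the NAL header byte (5 = IDR slice, 7 = SPS, 8 = PPS)."""
    return nal_unit[0] & 0x1F


def strip_emulation_prevention(payload):
    """Drop each 0x03 byte that follows two zero bytes inside a NAL unit."""
    out = bytearray()
    zeros = 0
    for byte in payload:
        if zeros >= 2 and byte == 3:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


class BitReader:
    """Reads bits most-significant-bit first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read_bit(self):
        bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, count):
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self):
        """Decode one unsigned order-0 Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def parse_slice_header_start(nal_unit):
    """Read the first three ue(v) fields of a slice NAL unit."""
    reader = BitReader(strip_emulation_prevention(nal_unit[1:]))  # skip header byte
    first_mb_in_slice = reader.read_ue()
    slice_type = reader.read_ue()
    pic_parameter_set_id = reader.read_ue()
    return {
        "first_mb_in_slice": first_mb_in_slice,
        "slice_type": slice_type,
        "pic_parameter_set_id": pic_parameter_set_id,
    }
```

A typical use is to call split_annex_b on the input buffer, pick the units whose nal_unit_type is 1 or 5, and pass each one to parse_slice_header_start.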
Good luck.
How do you programmatically compress a WAV file to another format (PCM, 11.025 kHz sampling rate, etc.)?
I'd look into audacity... I'm pretty sure they don't have a command line utility that can do it, but they may have a library...
Update:
It looks like they use libsndfile, which is released under the LGPL. I for one, would probably just try using that.
Use sox (Sound eXchange : universal sound sample translator) in Linux:
SoX is a command line program that can convert most popular audio files to most other popular audio file formats. It can optionally
change the audio sample data type and apply one or more sound effects to the file during this translation.
If you mean how do you compress the PCM data to a different audio format then there are a variety of libraries you can use to do this, depending on the platform(s) that you want to support. If you just want to change the sample rate of the PCM data then you need a sample rate conversion algorithm instead, which is a completely different problem. Can you be more specific in your requirements?
You're asking about resampling, and more specifically downsampling, not compression. While both processes are lossy (meaning that you will suffer loss of information), downsampling works on raw samples instead of in the frequency domain.
If you are interested in doing compression, then you should look into the LAME or Ogg Vorbis libraries; you are no doubt familiar with MP3 and Ogg technology, though I have a feeling from your question that you are interested in getting back a PCM file with a lower sampling rate.
In that case, you need a resampling library, of which there are a few possibilities. The most widely known is libsamplerate, which I honestly would not recommend due to quality issues, not only in the generated audio files but also in the stability of the code used in the library itself. The other non-commercial possibility is sox, as a few others have mentioned. Depending on the nature of your program, you can either exec sox as a separate process, or you can call it from your own code by using it as a library. I personally have not tried this approach, but I'm working on a product now where we use sox (for upsampling, actually), and we're quite happy with the results.
The other option is to write your own sample rate conversion code, which can be a significant undertaking. However, if you are only interested in converting by an integer factor (i.e., from 44.1 kHz to 22.05 kHz, or from 44.1 kHz to 11.025 kHz), then it is much easier: you keep only every Nth sample, ideally after low-pass filtering the signal so that frequencies above the new Nyquist limit do not alias.
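As a rough illustration of the integer-factor case, here is a minimal Python sketch that reduces the sample rate by a whole number. Instead of simply dropping samples, it averages each group of `factor` samples, which acts as a very crude low-pass filter; this averaging step and the function name are assumptions made for the example, and a serious resampler would use a properly designed filter.

```python
def decimate(samples, factor):
    """Reduce the sample rate by an integer factor.

    Each output sample is the integer average of `factor` consecutive input
    samples. Any leftover samples at the end that do not fill a whole group
    are discarded.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) // factor for i in range(0, usable, factor)]
```

For example, calling decimate with a factor of 4 on 44.1 kHz mono data yields 11.025 kHz data with a quarter of the samples.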
In Windows, you can make use of the Audio Compression Manager to convert between files (the acm... functions). You will also need a working knowledge of the WAVEFORMAT structure, and WAV file formats. Unfortunately, to write all this yourself will take some time, which is why it may be a good idea to investigate some of the open source options suggested by others.
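For a feel of what the WAV header carries (channel count, bytes per sample, sample rate), here is a small portable sketch using Python's standard wave module rather than the ACM functions. It only handles uncompressed 16-bit PCM, and the two helper names are invented for the example.

```python
import struct
import wave


def write_pcm16_wav(target, samples, sample_rate, channels=1):
    """Write signed 16-bit samples to `target` (a path or a seekable file object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_pcm16_wav(source):
    """Return (samples, sample_rate, channels) from a 16-bit PCM WAV file."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        raw = wav.readframes(wav.getnframes())
        samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
        return samples, wav.getframerate(), wav.getnchannels()
```

Reading a file, reducing its sample rate, and writing it back with the new rate in the header is then a matter of chaining these two helpers around a resampling step.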
I have written my own open source .NET audio library called NAudio that can convert WAV files from one format to another, making use of the ACM codecs that are installed on your machine. I know you have tagged this question with C++, but if .NET is acceptable then this may save you some time. Have a look at the NAudioDemo project for an example of converting files.