How to decode huffman code quickly? - c++
I have implemented a simple compressor using pure Huffman coding under Windows, but I do not know much about how to decode the compressed file quickly. My bad algorithm is:
Enumerate all the Huffman codes in the code table, then compare each one with the bits in the compressed file. The result is horrible: decompressing a 3MB file takes 6 hours.
Could you provide a much more efficient algorithm? Should I use a hash or something?
Update:
I have implemented the decoder with a state table, based on my friend Lin's advice. I think this method should be better than traversing the Huffman tree: 3MB within 6s.
Thanks.
One way to optimise the binary-tree approach is to use a lookup table. You arrange the table so that you can look up a particular encoded bit-pattern directly, allowing for the maximum possible bit-width of any code.
Since most codes don't use the full maximum width, they are included at multiple locations in the table - one location for each combination of the unused bits. The table indicates how many bits to discard from the input as well as the decoded output.
If the longest code is too long, so the table is impractical, a compromise is to use a tree of smaller fixed-width-subscript lookups. For example, you can use a 256-item table to handle a byte. If the input code is more than 8 bits, the table entry indicates that decoding is incomplete and directs you to a table that handles the next up-to 8 bits. Larger tables trade memory for speed - 256 items is probably too small.
I believe this general approach is called "prefix tables", and is what BobMcGee's quoted code is doing. A likely difference is that some compression algorithms require the prefix table to be updated during decompression; this is not needed for simple Huffman. IIRC, I first saw it in a book about bitmapped graphics file formats which included GIF, some time before the patent panic.
It should be easy to precalculate either a full lookup table, a hashtable equivalent, or a tree-of-small-tables from a binary tree model. The binary tree is still the key representation (mental model) of how the code works - this lookup table is just an optimised way to implement it.
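To make the single-table idea concrete, here is a minimal C++ sketch. It is not taken from any of the code quoted in this thread; the names `Code`, `Entry`, `build_table` and `decode` are made up for illustration, bits are assumed to be packed MSB first, and the longest code is assumed to be short enough (at most 24 bits or so) for one table to be practical. The decoder also assumes the number of symbols to decode is known, for example from a stored original length.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical code-table entry: symbol, code bits (right-aligned), code length.
struct Code { char symbol; uint32_t bits; int length; };

// One slot of the lookup table: decoded symbol and number of bits to discard.
struct Entry { char symbol; uint8_t length; };  // length 0 marks an unused slot

// Build a table indexed by the next max_len input bits (MSB first).
// A code shorter than max_len fills one slot per combination of its unused bits.
std::vector<Entry> build_table(const std::vector<Code>& codes, int max_len) {
    std::vector<Entry> table(std::size_t{1} << max_len, Entry{0, 0});
    for (const Code& c : codes) {
        int unused = max_len - c.length;
        uint32_t first = c.bits << unused;
        for (uint32_t i = 0; i < (1u << unused); ++i)
            table[first + i] = Entry{c.symbol, static_cast<uint8_t>(c.length)};
    }
    return table;
}

// Decode `count` symbols with one table lookup per symbol.
std::string decode(const std::vector<uint8_t>& data, std::size_t count,
                   const std::vector<Entry>& table, int max_len) {
    std::string out;
    uint64_t bitbuf = 0;  // bits enter at the low end and are read from the high end
    int valid = 0;        // number of unread bits in bitbuf
    std::size_t pos = 0;
    while (out.size() < count) {
        while (valid < max_len) {  // refill; pad with zero bits past the end of input
            uint8_t next = pos < data.size() ? data[pos] : 0;
            ++pos;
            bitbuf = (bitbuf << 8) | next;
            valid += 8;
        }
        uint32_t peek = static_cast<uint32_t>(bitbuf >> (valid - max_len)) & ((1u << max_len) - 1);
        const Entry& e = table[peek];
        if (e.length == 0) break;  // bit pattern matches no code: invalid input
        out.push_back(e.symbol);
        valid -= e.length;         // discard only the bits this code actually used
    }
    return out;
}
```

With the codes a=0, b=10, c=110, d=111 and `max_len` 3, the table has 8 slots: four for `a`, two for `b`, and one each for `c` and `d`.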
Why not take a look at how the GZIP source does it, specifically the Huffman decompression code in unpack.c? It's doing exactly what you are, except it's doing it much, much faster.
From what I can tell, it's using a lookup array and shift/mask operations operating on whole words to run faster. Pretty dense code though.
EDIT: here is the complete source
/* unpack.c -- decompress files in pack format.
* Copyright (C) 1992-1993 Jean-loup Gailly
* This is free software; you can redistribute it and/or modify it under the
* terms of the GNU General Public License, see the file COPYING.
*/
#ifdef RCSID
static char rcsid[] = "$Id: unpack.c,v 1.4 1993/06/11 19:25:36 jloup Exp $";
#endif
#include "tailor.h"
#include "gzip.h"
#include "crypt.h"
#define MIN(a,b) ((a) <= (b) ? (a) : (b))
/* The arguments must not have side effects. */
#define MAX_BITLEN 25
/* Maximum length of Huffman codes. (Minor modifications to the code
* would be needed to support 32 bits codes, but pack never generates
* more than 24 bits anyway.)
*/
#define LITERALS 256
/* Number of literals, excluding the End of Block (EOB) code */
#define MAX_PEEK 12
/* Maximum number of 'peek' bits used to optimize traversal of the
* Huffman tree.
*/
local ulg orig_len; /* original uncompressed length */
local int max_len; /* maximum bit length of Huffman codes */
local uch literal[LITERALS];
/* The literal bytes present in the Huffman tree. The EOB code is not
* represented.
*/
local int lit_base[MAX_BITLEN+1];
/* All literals of a given bit length are contiguous in literal[] and
* have contiguous codes. literal[code+lit_base[len]] is the literal
* for a code of len bits.
*/
local int leaves [MAX_BITLEN+1]; /* Number of leaves for each bit length */
local int parents[MAX_BITLEN+1]; /* Number of parents for each bit length */
local int peek_bits; /* Number of peek bits currently used */
/* local uch prefix_len[1 << MAX_PEEK]; */
#define prefix_len outbuf
/* For each bit pattern b of peek_bits bits, prefix_len[b] is the length
* of the Huffman code starting with a prefix of b (upper bits), or 0
* if all codes of prefix b have more than peek_bits bits. It is not
* necessary to have a huge table (large MAX_PEEK) because most of the
* codes encountered in the input stream are short codes (by construction).
* So for most codes a single lookup will be necessary.
*/
#if (1<<MAX_PEEK) > OUTBUFSIZ
error cannot overlay prefix_len and outbuf
#endif
local ulg bitbuf;
/* Bits are added on the low part of bitbuf and read from the high part. */
local int valid; /* number of valid bits in bitbuf */
/* all bits above the last valid bit are always zero */
/* Set code to the next 'bits' input bits without skipping them. code
* must be the name of a simple variable and bits must not have side effects.
* IN assertions: bits <= 25 (so that we still have room for an extra byte
* when valid is only 24), and mask = (1<<bits)-1.
*/
#define look_bits(code,bits,mask) \
{ \
while (valid < (bits)) bitbuf = (bitbuf<<8) | (ulg)get_byte(), valid += 8; \
code = (bitbuf >> (valid-(bits))) & (mask); \
}
/* Skip the given number of bits (after having peeked at them): */
#define skip_bits(bits) (valid -= (bits))
#define clear_bitbuf() (valid = 0, bitbuf = 0)
/* Local functions */
local void read_tree OF((void));
local void build_tree OF((void));
/* ===========================================================================
* Read the Huffman tree.
*/
local void read_tree()
{
int len; /* bit length */
int base; /* base offset for a sequence of leaves */
int n;
/* Read the original input size, MSB first */
orig_len = 0;
for (n = 1; n <= 4; n++) orig_len = (orig_len << 8) | (ulg)get_byte();
max_len = (int)get_byte(); /* maximum bit length of Huffman codes */
if (max_len > MAX_BITLEN) {
error("invalid compressed data -- Huffman code > 32 bits");
}
/* Get the number of leaves at each bit length */
n = 0;
for (len = 1; len <= max_len; len++) {
leaves[len] = (int)get_byte();
n += leaves[len];
}
if (n > LITERALS) {
error("too many leaves in Huffman tree");
}
Trace((stderr, "orig_len %ld, max_len %d, leaves %d\n",
orig_len, max_len, n));
/* There are at least 2 and at most 256 leaves of length max_len.
* (Pack arbitrarily rejects empty files and files consisting of
* a single byte even repeated.) To fit the last leaf count in a
* byte, it is offset by 2. However, the last literal is the EOB
* code, and is not transmitted explicitly in the tree, so we must
* adjust here by one only.
*/
leaves[max_len]++;
/* Now read the leaves themselves */
base = 0;
for (len = 1; len <= max_len; len++) {
/* Remember where the literals of this length start in literal[] : */
lit_base[len] = base;
/* And read the literals: */
for (n = leaves[len]; n > 0; n--) {
literal[base++] = (uch)get_byte();
}
}
leaves[max_len]++; /* Now include the EOB code in the Huffman tree */
}
/* ===========================================================================
* Build the Huffman tree and the prefix table.
*/
local void build_tree()
{
int nodes = 0; /* number of nodes (parents+leaves) at current bit length */
int len; /* current bit length */
uch *prefixp; /* pointer in prefix_len */
for (len = max_len; len >= 1; len--) {
/* The number of parent nodes at this level is half the total
* number of nodes at parent level:
*/
nodes >>= 1;
parents[len] = nodes;
/* Update lit_base by the appropriate bias to skip the parent nodes
* (which are not represented in the literal array):
*/
lit_base[len] -= nodes;
/* Restore nodes to be parents+leaves: */
nodes += leaves[len];
}
/* Construct the prefix table, from shortest leaves to longest ones.
* The shortest code is all ones, so we start at the end of the table.
*/
peek_bits = MIN(max_len, MAX_PEEK);
prefixp = &prefix_len[1<<peek_bits];
for (len = 1; len <= peek_bits; len++) {
int prefixes = leaves[len] << (peek_bits-len); /* may be 0 */
while (prefixes--) *--prefixp = (uch)len;
}
/* The length of all other codes is unknown: */
while (prefixp > prefix_len) *--prefixp = 0;
}
/* ===========================================================================
* Unpack in to out. This routine does not support the old pack format
* with magic header \037\037.
*
* IN assertions: the buffer inbuf contains already the beginning of
* the compressed data, from offsets inptr to insize-1 included.
* The magic header has already been checked. The output buffer is cleared.
*/
int unpack(in, out)
int in, out; /* input and output file descriptors */
{
int len; /* Bit length of current code */
unsigned eob; /* End Of Block code */
register unsigned peek; /* lookahead bits */
unsigned peek_mask; /* Mask for peek_bits bits */
ifd = in;
ofd = out;
read_tree(); /* Read the Huffman tree */
build_tree(); /* Build the prefix table */
clear_bitbuf(); /* Initialize bit input */
peek_mask = (1<<peek_bits)-1;
/* The eob code is the largest code among all leaves of maximal length: */
eob = leaves[max_len]-1;
Trace((stderr, "eob %d %x\n", max_len, eob));
/* Decode the input data: */
for (;;) {
/* Since eob is the longest code and not shorter than max_len,
* we can peek at max_len bits without having the risk of reading
* beyond the end of file.
*/
look_bits(peek, peek_bits, peek_mask);
len = prefix_len[peek];
if (len > 0) {
peek >>= peek_bits - len; /* discard the extra bits */
} else {
/* Code of more than peek_bits bits, we must traverse the tree */
ulg mask = peek_mask;
len = peek_bits;
do {
len++, mask = (mask<<1)+1;
look_bits(peek, len, mask);
} while (peek < (unsigned)parents[len]);
/* loop as long as peek is a parent node */
}
/* At this point, peek is the next complete code, of len bits */
if (peek == eob && len == max_len) break; /* end of file? */
put_ubyte(literal[peek+lit_base[len]]);
Tracev((stderr,"%02d %04x %c\n", len, peek,
literal[peek+lit_base[len]]));
skip_bits(len);
} /* for (;;) */
flush_window();
Trace((stderr, "bytes_out %ld\n", bytes_out));
if (orig_len != (ulg)bytes_out) {
error("invalid compressed data--length error");
}
return OK;
}
The typical way to decompress a Huffman code is using a binary tree. You insert your codes in the tree, so that each bit in a code represents a branch either to the left (0) or right (1), with decoded bytes (or whatever values you have) in the leaves.
Decoding is then just a case of reading bits from the coded content, walking the tree for each bit. When you reach a leaf, emit that decoded value, and keep reading until the input is exhausted.
Update: this page describes the technique, and has fancy graphics.
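A minimal C++ sketch of the tree walk described above. The names `Node`, `insert` and `decode` are illustrative only; codes are given as strings of '0' and '1', bits are assumed to be packed MSB first, and the decoder is told how many symbols to expect so that padding bits in the last byte are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

struct Node {
    std::unique_ptr<Node> child[2];  // child[0] = left (bit 0), child[1] = right (bit 1)
    char symbol = 0;                 // meaningful only in leaves
    bool is_leaf() const { return !child[0] && !child[1]; }
};

// Insert one code into the tree, creating branch nodes as needed.
void insert(Node& root, const std::string& code, char symbol) {
    Node* node = &root;
    for (char bit : code) {
        std::unique_ptr<Node>& next = node->child[bit == '1' ? 1 : 0];
        if (!next) next.reset(new Node());
        node = next.get();
    }
    node->symbol = symbol;
}

// Walk the tree one bit at a time; emit a symbol at each leaf and restart at the root.
std::string decode(const Node& root, const std::vector<uint8_t>& data, std::size_t count) {
    std::string out;
    const Node* node = &root;
    for (uint8_t byte : data) {
        for (int shift = 7; shift >= 0; --shift) {
            if (out.size() == count) return out;  // remaining bits are padding
            node = node->child[(byte >> shift) & 1].get();
            if (!node) return out;                // no such branch: invalid input
            if (node->is_leaf()) {
                out.push_back(node->symbol);
                node = &root;
            }
        }
    }
    return out;
}
```

This does one branch per input bit, which is simple and correct but is the slow baseline that the table-based answers improve on.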
You can perform a kind of batch lookup on the usual Huffman tree lookup:
1. Choose a bit depth (call it depth n); this is a trade-off between speed, memory, and the time invested in constructing the tables;
2. Build a lookup table for all 2^n bit strings of length n. Each entry may encode several complete tokens; there will commonly also be some bits left over that are only a prefix of Huffman codes: for each of these, make a link to a further lookup table for that code;
3. Build the further lookup tables. The total number of tables is at most one less than the number of entries coded in the Huffman tree.
Choosing a depth that is a multiple of four, e.g., depth 8, is a good fit for bit shifting operations.
Postscript: This differs from the idea in potatoswatter's comment on unwind's answer and from Steve314's answer in using multiple tables. This means that all of the n-bit lookup is put to use, so it should be faster, but it makes table construction and lookup significantly trickier, and will consume much more space for a given depth.
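One way to read this scheme is as a state machine: one table per internal tree node, indexed by the next n input bits, where each entry lists the symbols completed within those bits and the node the leftover bits lead to. The C++ sketch below follows that reading with n = 4. All names are hypothetical, the tree is assumed to be a full binary tree (every internal node has two children, as in a Huffman tree), bits are assumed MSB first, and the symbol count is assumed to be known.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr int kChunkBits = 4;  // depth n; must divide 8 in this sketch

// Flat binary tree: node 0 is the root, child index -1 means "no child".
struct Node { int child[2] = {-1, -1}; char symbol = 0; };

struct Entry {
    std::string emitted;  // symbols completed while consuming one chunk
    int next_state = 0;   // internal node reached by the leftover bits
};

std::vector<Node> build_tree(const std::vector<std::pair<char, std::string>>& codes) {
    std::vector<Node> tree(1);
    for (const auto& item : codes) {
        int node = 0;
        for (char bit : item.second) {
            int b = bit == '1' ? 1 : 0;
            if (tree[node].child[b] < 0) {
                tree[node].child[b] = static_cast<int>(tree.size());
                tree.emplace_back();
            }
            node = tree[node].child[b];
        }
        tree[node].symbol = item.first;
    }
    return tree;
}

// One table per internal node; leaves get an empty table because decoding never rests there.
std::vector<std::vector<Entry>> build_tables(const std::vector<Node>& tree) {
    std::vector<std::vector<Entry>> tables(tree.size());
    for (std::size_t state = 0; state < tree.size(); ++state) {
        if (tree[state].child[0] < 0) continue;  // leaf
        tables[state].resize(1u << kChunkBits);
        for (unsigned chunk = 0; chunk < (1u << kChunkBits); ++chunk) {
            Entry& e = tables[state][chunk];
            int node = static_cast<int>(state);
            for (int shift = kChunkBits - 1; shift >= 0; --shift) {
                node = tree[node].child[(chunk >> shift) & 1];
                if (tree[node].child[0] < 0) {   // reached a leaf: emit and restart at root
                    e.emitted.push_back(tree[node].symbol);
                    node = 0;
                }
            }
            e.next_state = node;
        }
    }
    return tables;
}

std::string decode(const std::vector<std::vector<Entry>>& tables,
                   const std::vector<uint8_t>& data, std::size_t count) {
    std::string out;
    int state = 0;
    for (uint8_t byte : data) {
        for (int shift = 8 - kChunkBits; shift >= 0; shift -= kChunkBits) {
            const Entry& e = tables[state][(byte >> shift) & ((1u << kChunkBits) - 1)];
            out += e.emitted;
            state = e.next_state;
        }
    }
    if (out.size() > count) out.resize(count);  // drop symbols decoded from padding bits
    return out;
}
```

Every chunk costs exactly one lookup regardless of how many codes it contains, at the price of (number of internal nodes) x 2^n entries.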
Why not use the decompress algorithm in the same source module? It appears to be a decent algorithm.
The other answers are right, but here is some code in Rust I wrote recently to make the ideas concrete. This is the key routine:
fn decode( &self, input: &mut InpBitStream ) -> usize
{
let mut sym = self.lookup[ input.peek( self.peekbits ) ];
if sym >= self.ncode
{
sym = self.lookup[ sym - self.ncode + ( input.peek( self.maxbits ) >> self.peekbits ) ];
}
input.advance( self.nbits[ sym ] as usize );
sym
}
The tricky bit is setting up the lookup table, see BitDecoder::setup_code in this complete RFC 1951 decoder in Rust:
// RFC 1951 inflate ( de-compress ).
pub fn inflate( data: &[u8] ) -> Vec<u8>
{
let mut inp = InpBitStream::new( &data );
let mut out = Vec::new();
let _chk = inp.get_bits( 16 ); // Skip the 2-byte zlib header ( CMF and FLG, RFC 1950 )
loop
{
let last = inp.get_bit();
let btype = inp.get_bits( 2 );
match btype
{
2 => { do_dyn( &mut inp, &mut out ); }
1 => { do_fixed( &mut inp, &mut out ); }
0 => { do_copy( &mut inp, &mut out ); }
_ => { }
}
if last != 0 { break; }
}
out
}
fn do_dyn( inp: &mut InpBitStream, out: &mut Vec<u8> )
{
let n_lit_code = 257 + inp.get_bits( 5 );
let n_dist_code = 1 + inp.get_bits( 5 );
let n_len_code = 4 + inp.get_bits( 4 );
let mut len = LenDecoder::new( inp, n_len_code );
let mut lit = BitDecoder::new( n_lit_code );
len.get_lengths( inp, &mut lit.nbits );
lit.init();
let mut dist = BitDecoder::new( n_dist_code );
len.get_lengths( inp, &mut dist.nbits );
dist.init();
loop
{
let x = lit.decode( inp );
match x
{
0..=255 => { out.push( x as u8 ); }
256 => { break; }
_ =>
{
let mc = x - 257;
let length = MATCH_OFF[ mc ] + inp.get_bits( MATCH_EXTRA[ mc ] as usize );
let dc = dist.decode( inp );
let distance = DIST_OFF[ dc ] + inp.get_bits( DIST_EXTRA[ dc ] as usize );
copy( out, distance, length );
}
}
}
} // end do_dyn
fn copy( out: &mut Vec<u8>, distance: usize, mut length: usize )
{
let mut i = out.len() - distance;
while length > 0
{
out.push( out[ i ] );
i += 1;
length -= 1;
}
}
/// Decode length-limited Huffman codes.
struct BitDecoder
{
ncode: usize,
nbits: Vec<u8>,
maxbits: usize,
peekbits: usize,
lookup: Vec<usize>
}
impl BitDecoder
{
fn new( ncode: usize ) -> BitDecoder
{
BitDecoder
{
ncode,
nbits: vec![0; ncode],
maxbits: 0,
peekbits: 0,
lookup: Vec::new()
}
}
/// The key routine, will be called many times.
fn decode( &self, input: &mut InpBitStream ) -> usize
{
let mut sym = self.lookup[ input.peek( self.peekbits ) ];
if sym >= self.ncode
{
sym = self.lookup[ sym - self.ncode + ( input.peek( self.maxbits ) >> self.peekbits ) ];
}
input.advance( self.nbits[ sym ] as usize );
sym
}
fn init( &mut self )
{
let ncode = self.ncode;
let mut max_bits : usize = 0;
for bp in &self.nbits
{
let bits = *bp as usize;
if bits > max_bits { max_bits = bits; }
}
self.maxbits = max_bits;
self.peekbits = if max_bits > 8 { 8 } else { max_bits };
self.lookup.resize( 1 << self.peekbits, 0 );
// Code below is from rfc1951 page 7
let mut bl_count : Vec<usize> = vec![ 0; max_bits + 1 ]; // the number of codes of length N, N >= 1.
for i in 0..ncode { bl_count[ self.nbits[i] as usize ] += 1; }
let mut next_code : Vec<usize> = vec![ 0; max_bits + 1 ];
let mut code = 0;
bl_count[0] = 0;
for i in 0..max_bits
{
code = ( code + bl_count[i] ) << 1;
next_code[ i + 1 ] = code;
}
for i in 0..ncode
{
let len = self.nbits[ i ] as usize;
if len != 0
{
self.setup_code( i, len, next_code[ len ] );
next_code[ len ] += 1;
}
}
}
// Decoding is done using self.lookup ( see decode ). To keep the lookup table small,
// codes longer than 8 bits are looked up in two peeks.
fn setup_code( &mut self, sym: usize, len: usize, mut code: usize )
{
if len <= self.peekbits
{
let diff = self.peekbits - len;
for i in code << diff .. (code << diff) + (1 << diff)
{
// bits are reversed to match InpBitStream::peek
let r = reverse( i, self.peekbits );
self.lookup[ r ] = sym;
}
} else {
// Secondary lookup required.
let peekbits2 = self.maxbits - self.peekbits;
// Split code into peekbits portion ( key ) and remainder ( code).
let diff1 = len - self.peekbits;
let key = code >> diff1;
code &= ( 1 << diff1 ) - 1;
// Get the secondary lookup.
let kr = reverse( key, self.peekbits );
let mut base = self.lookup[ kr ];
if base == 0 // Secondary lookup not yet allocated for this key.
{
base = self.lookup.len();
self.lookup.resize( base + ( 1 << peekbits2 ), 0 );
self.lookup[ kr ] = self.ncode + base;
} else {
base -= self.ncode;
}
// Set the secondary lookup values.
let diff = self.maxbits - len;
for i in code << diff .. (code << diff) + (1<<diff)
{
let r = reverse( i, peekbits2 );
self.lookup[ base + r ] = sym;
}
}
}
} // end impl BitDecoder
struct InpBitStream<'a>
{
data: &'a [u8],
pos: usize,
buf: usize,
got: usize, // Number of bits in buffer.
}
impl <'a> InpBitStream<'a>
{
fn new( data: &'a [u8] ) -> InpBitStream
{
InpBitStream { data, pos: 0, buf: 1, got: 0 }
}
fn peek( &mut self, n: usize ) -> usize
{
while self.got < n
{
if self.pos < self.data.len()
{
self.buf |= ( self.data[ self.pos ] as usize ) << self.got;
}
self.pos += 1;
self.got += 8;
}
self.buf & ( ( 1 << n ) - 1 )
}
fn advance( &mut self, n:usize )
{
self.buf >>= n;
self.got -= n;
}
fn get_bit( &mut self ) -> usize
{
if self.got == 0 { self.peek( 1 ); }
let result = self.buf & 1;
self.advance( 1 );
result
}
fn get_bits( &mut self, n: usize ) -> usize
{
let result = self.peek( n );
self.advance( n );
result
}
fn get_huff( &mut self, mut n: usize ) -> usize
{
let mut result = 0;
while n > 0
{
result = ( result << 1 ) + self.get_bit();
n -= 1;
}
result
}
fn clear_bits( &mut self )
{
self.got = 0;
}
} // end impl InpBitStream
/// Decode code lengths.
struct LenDecoder
{
plenc: u8, // previous length code ( which can be repeated )
rep: usize, // repeat
bd: BitDecoder,
}
/// Decodes an array of lengths. There are special codes for repeats, and repeats of zeros.
impl LenDecoder
{
fn new( inp: &mut InpBitStream, n_len_code: usize ) -> LenDecoder
{
let mut result = LenDecoder { plenc: 0, rep:0, bd: BitDecoder::new( 19 ) };
// Read the array of 3-bit code lengths from input.
for i in 0..n_len_code
{
result.bd.nbits[ CLEN_ALPHABET[i] as usize ] = inp.get_bits(3) as u8;
}
result.bd.init();
result
}
// Per RFC1951 page 13, get array of code lengths.
fn get_lengths( &mut self, inp: &mut InpBitStream, result: &mut Vec<u8> )
{
let n = result.len();
let mut i = 0;
while self.rep > 0 { result[i] = self.plenc; i += 1; self.rep -= 1; }
while i < n
{
let lenc = self.bd.decode( inp ) as u8;
if lenc < 16
{
result[i] = lenc;
i += 1;
self.plenc = lenc;
} else {
if lenc == 16 { self.rep = 3 + inp.get_bits(2); }
else if lenc == 17 { self.rep = 3 + inp.get_bits(3); self.plenc=0; }
else if lenc == 18 { self.rep = 11 + inp.get_bits(7); self.plenc=0; }
while i < n && self.rep > 0 { result[i] = self.plenc; i += 1; self.rep -= 1; }
}
}
} // end get_lengths
} // end impl LenDecoder
/// Reverse a string of bits.
pub fn reverse( mut x:usize, mut bits: usize ) -> usize
{
let mut result: usize = 0;
while bits > 0
{
result = ( result << 1 ) | ( x & 1 );
x >>= 1;
bits -= 1;
}
result
}
fn do_copy( inp: &mut InpBitStream, out: &mut Vec<u8> )
{
inp.clear_bits(); // Discard any bits in the input buffer
let mut n = inp.get_bits( 16 );
let _n1 = inp.get_bits( 16 );
while n > 0 { out.push( inp.data[ inp.pos ] ); n -= 1; inp.pos += 1; }
}
fn do_fixed( inp: &mut InpBitStream, out: &mut Vec<u8> ) // RFC1951 page 12.
{
loop
{
// 0 to 23 ( 7 bits ) => 256 - 279; 48 - 191 ( 8 bits ) => 0 - 143;
// 192 - 199 ( 8 bits ) => 280 - 287; 400..511 ( 9 bits ) => 144 - 255
let mut x = inp.get_huff( 7 );
if x <= 23
{
x += 256;
} else {
x = ( x << 1 ) + inp.get_bit();
if x <= 191 { x -= 48; }
else if x <= 199 { x += 88; }
else { x = ( x << 1 ) + inp.get_bit() - 256; }
}
match x
{
0..=255 => { out.push( x as u8 ); }
256 => { break; }
_ => // 257 <= x && x <= 285
{
x -= 257;
let length = MATCH_OFF[x] + inp.get_bits( MATCH_EXTRA[ x ] as usize );
let dcode = inp.get_huff( 5 );
let distance = DIST_OFF[dcode] + inp.get_bits( DIST_EXTRA[dcode] as usize );
copy( out, distance, length );
}
}
}
} // end do_fixed
// RFC 1951 constants.
pub static CLEN_ALPHABET : [u8; 19] = [ 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15 ];
pub static MATCH_OFF : [usize; 30] = [ 3,4,5,6, 7,8,9,10, 11,13,15,17, 19,23,27,31, 35,43,51,59,
67,83,99,115, 131,163,195,227, 258, 0xffff ];
pub static MATCH_EXTRA : [u8; 29] = [ 0,0,0,0, 0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4,4, 5,5,5,5, 0 ];
pub static DIST_OFF : [usize; 30] = [ 1,2,3,4, 5,7,9,13, 17,25,33,49, 65,97,129,193, 257,385,513,769,
1025,1537,2049,3073, 4097,6145,8193,12289, 16385,24577 ];
pub static DIST_EXTRA : [u8; 30] = [ 0,0,0,0, 1,1,2,2, 3,3,4,4, 5,5,6,6, 7,7,8,8, 9,9,10,10, 11,11,12,12, 13,13 ];
Github repository here
Related
How to insert array of bytes in PostgreSQL table via libpq C++ API
I am trying to update a table
CREATE TABLE some_table (
    id integer NOT NULL,
    client_fid bigint NOT NULL,
    index bytea[],
    update_time timestamp without time zone
) WITH ( OIDS = FALSE
using a modified code snippet from here: How to insert text array in PostgreSQL table in binary format using libpq?
#define BYTEAARRAYOID 1001
#define BYTEAOID 17
Here is the pgvals_t structure definition:
struct pgvals_t {
    /* number of array dimensions */
    int32_t ndims;
    /* flag describing if array has NULL values */
    int32_t hasNull;
    /* Oid of data stored in array. In our case is 25 for TEXT */
    Oid oidType;
    /* Number of elements in array */
    int32_t totalLen;
    /* Not sure for this one. I think it describes dimensions of elements in case of arrays storing arrays */
    int32_t subDims;
    /* Here our data begins */
} __attribute__ ((__packed__));
I've removed the dataBegins pointer from the struct as it affects data layout in memory.
std::size_t nElems = _data.size();
uint32_t valsDataSize = sizeof(prx::pgvals_t) + sizeof(int32_t) * nElems + sizeof(uint8_t)*nElems;
void *pData = malloc(valsDataSize);
prx::pgvals_t* pvals = (prx::pgvals_t*)pData;
/* our array has one dimension */
pvals->ndims = ntohl(1);
/* our array has no NULL elements */
pvals->hasNull = ntohl(0);
/* type of our elements is bytea */
pvals->oidType = ntohl(BYTEAOID);
/* our array has nElems elements */
pvals->totalLen = ntohl(nElems);
pvals->subDims = ntohl(1);
int32_t elemLen = ntohl(sizeof(uint8_t));
std::size_t offset = sizeof(elemLen) + sizeof(_data[0]);
char * ptr = (char*)(pvals + sizeof(prx::pgvals_t));
for(auto byte : _data){
    memcpy(ptr, &elemLen, sizeof(elemLen));
    memcpy(ptr + sizeof(elemLen), &byte, sizeof(byte));
    ptr += offset;
}
Oid paramTypes[] = { BYTEAARRAYOID };
char * paramValues[] = {(char* )pData};
int paramLengths[] = { (int)valsDataSize };
int paramFormats[] = {1};
PGresult *res = PQexecParams(m_conn, _statement.c_str(),
    1,
    paramTypes,
    paramValues,
    paramLengths,
    paramFormats,
    1
);
if (PQresultStatus(res) != PGRES_COMMAND_OK) {
    std::string errMsg = PQresultErrorMessage(res);
    PQclear(res);
    throw std::runtime_error(errMsg);
}
free(pData);
The binary data is contained in a std::vector variable, and I am using the following query in a _statement variable of type std::string:
INSERT INTO some_table \
(id, client_id, \"index\", update_time) \
VALUES \
(1, 2, $1, NOW())
Now, after the call to PQexecParams, I get an exception with the message "incorrect binary data format in bind parameter 1". What can be the problem here?
If you want to pass a bytea[] in binary format, you have to use the binary array format as read by array_recv and written by array_send. You cannot just pass a C array.
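As a rough illustration of what that binary array format looks like, the C++ sketch below serializes a one-dimensional bytea[] into a byte buffer. The layout (dimension count, NULL flag, element OID, then a size and lower bound per dimension, then a length-prefixed value per element, all 32-bit big-endian) is my understanding of what array_send writes and should be checked against the PostgreSQL source for your server version. The helper names `put_int32` and `make_bytea_array` are made up, and the sketch assumes at least one element and no NULLs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in network (big-endian) byte order.
static void put_int32(std::vector<uint8_t>& buf, uint32_t v) {
    buf.push_back(static_cast<uint8_t>(v >> 24));
    buf.push_back(static_cast<uint8_t>(v >> 16));
    buf.push_back(static_cast<uint8_t>(v >> 8));
    buf.push_back(static_cast<uint8_t>(v));
}

// Serialize a non-empty, one-dimensional bytea[] with no NULL elements.
std::vector<uint8_t> make_bytea_array(const std::vector<std::vector<uint8_t>>& elems) {
    const uint32_t kByteaOid = 17;  // BYTEAOID, as defined in the question
    std::vector<uint8_t> buf;
    put_int32(buf, 1);                                   // number of dimensions
    put_int32(buf, 0);                                   // flag: 0 = no NULL elements
    put_int32(buf, kByteaOid);                           // element type OID
    put_int32(buf, static_cast<uint32_t>(elems.size())); // size of dimension 1
    put_int32(buf, 1);                                   // lower bound of dimension 1
    for (const std::vector<uint8_t>& e : elems) {
        put_int32(buf, static_cast<uint32_t>(e.size())); // element length in bytes
        buf.insert(buf.end(), e.begin(), e.end());       // raw element bytes
    }
    return buf;
}
```

The resulting buffer would be passed as the parameter value, with its size as the parameter length and 1 as the parameter format. Building the buffer byte by byte avoids depending on struct packing and pointer arithmetic.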
How to programatically decrypt aes-256-cbc file which was encrypted using password? [duplicate]
For example, the command:
openssl enc -aes-256-cbc -a -in test.txt -k pinkrhino -nosalt -p -out openssl_output.txt
outputs something like:
key = 33D890D33F91D52FC9B405A0DDA65336C3C4B557A3D79FE69AB674BE82C5C3D2
iv = 677C95C475C0E057B739750748608A49
How is that key generated? (C code as an answer would be too awesome to ask for :) ) Also, how is the iv generated? Looks like some kind of hex to me.
OpenSSL uses the function EVP_BytesToKey. You can find the call to it in apps/enc.c. The enc utility used to use the MD5 digest by default in the key derivation function (KDF) if you didn't specify a different digest with the -md argument. Now it uses SHA-256 by default. Here's a working example using MD5:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/evp.h>
int main(int argc, char *argv[])
{
    const EVP_CIPHER *cipher;
    const EVP_MD *dgst = NULL;
    unsigned char key[EVP_MAX_KEY_LENGTH], iv[EVP_MAX_IV_LENGTH];
    const char *password = "password";
    const unsigned char *salt = NULL;
    int i;
    OpenSSL_add_all_algorithms();
    cipher = EVP_get_cipherbyname("aes-256-cbc");
    if(!cipher) { fprintf(stderr, "no such cipher\n"); return 1; }
    dgst=EVP_get_digestbyname("md5");
    if(!dgst) { fprintf(stderr, "no such digest\n"); return 1; }
    if(!EVP_BytesToKey(cipher, dgst, salt,
        (unsigned char *) password,
        strlen(password), 1, key, iv))
    {
        fprintf(stderr, "EVP_BytesToKey failed\n");
        return 1;
    }
    printf("Key: "); for(i=0; i<cipher->key_len; ++i) { printf("%02x", key[i]); } printf("\n");
    printf("IV: "); for(i=0; i<cipher->iv_len; ++i) { printf("%02x", iv[i]); } printf("\n");
    return 0;
}
Example usage:
gcc b2k.c -o b2k -lcrypto -g
./b2k
Key: 5f4dcc3b5aa765d61d8327deb882cf992b95990a9151374abd8ff8c5a7a0fe08
IV: b7b4372cdfbcb3d16a2631b59b509e94
Which generates the same key as this OpenSSL command line:
openssl enc -aes-256-cbc -k password -nosalt -p < /dev/null
key=5F4DCC3B5AA765D61D8327DEB882CF992B95990A9151374ABD8FF8C5A7A0FE08
iv =B7B4372CDFBCB3D16A2631B59B509E94
OpenSSL 1.1.0c changed the digest algorithm used in some internal components. Formerly, MD5 was used, and 1.1.0 switched to SHA256. Be careful that the change is not affecting you in both EVP_BytesToKey and commands like openssl enc.
If anyone is looking to implement the same in Swift, I converted EVP_BytesToKey to Swift:
/*
 - parameter keyLen: keyLen
 - parameter ivLen: ivLen
 - parameter digest: digest e.g "md5" or "sha1"
 - parameter salt: salt
 - parameter data: data
 - parameter count: count
 - returns: key and IV respectively
 */
open static func evpBytesToKey(_ keyLen:Int, ivLen:Int, digest:String, salt:[UInt8], data:Data, count:Int)-> [[UInt8]] {
    let saltData = Data(bytes: UnsafePointer<UInt8>(salt), count: Int(salt.count))
    var both = [[UInt8]](repeating: [UInt8](), count: 2)
    var key = [UInt8](repeating: 0,count: keyLen)
    var key_ix = 0
    var iv = [UInt8](repeating: 0,count: ivLen)
    var iv_ix = 0
    var nkey = keyLen;
    var niv = ivLen;
    var i = 0
    var addmd = 0
    var md:Data = Data()
    var md_buf:[UInt8]
    while true {
        addmd = addmd + 1
        // Hash (previous digest || data || salt)
        md.append(data)
        md.append(saltData)
        if(digest=="md5"){
            md = NSData(data:md.md5()) as Data
        }else if (digest == "sha1"){
            md = NSData(data:md.sha1()) as Data
        }
        // Extra hash rounds; a half-open range so that count == 1 runs zero rounds
        // (the closed range 1...(count-1) traps at runtime when count is 1)
        for _ in 1..<count {
            if(digest=="md5"){
                md = NSData(data:md.md5()) as Data
            }else if (digest == "sha1"){
                md = NSData(data:md.sha1()) as Data
            }
        }
        md_buf = Array (UnsafeBufferPointer(start: md.bytes, count: md.count))
        // md_buf = Array(UnsafeBufferPointer(start: md.bytes.bindMemory(to: UInt8.self, capacity: md.count), count: md.length))
        i = 0
        // Fill the key first
        if (nkey > 0) {
            while(true) {
                if (nkey == 0){ break }
                if (i == md.count){ break }
                key[key_ix] = md_buf[i];
                key_ix = key_ix + 1
                nkey = nkey - 1
                i = i + 1
            }
        }
        // Then fill the IV with whatever digest bytes are left
        if (niv > 0 && i != md_buf.count) {
            while(true) {
                if (niv == 0){ break }
                if (i == md_buf.count){ break }
                iv[iv_ix] = md_buf[i]
                iv_ix = iv_ix + 1
                niv = niv - 1
                i = i + 1
            }
        }
        if (nkey == 0 && niv == 0) { break }
    }
    both[0] = key
    both[1] = iv
    return both
}
I use CryptoSwift for the hash. This is a much cleaner way, as Apple does not recommend OpenSSL on iOS.
UPDATE: Swift 3
Here is a version for mbedTLS / Polar SSL, tested and working.
#include <stdio.h>
#include "mbedtls/md.h"
typedef int bool;
#define false 0
#define true (!false)
//------------------------------------------------------------------------------
static bool EVP_BytesToKey( const unsigned int nDesiredKeyLen, const unsigned char* salt,
                            const unsigned char* password, const unsigned int nPwdLen,
                            unsigned char* pOutKey, unsigned char* pOutIV )
{
    // This is a re-implementation of openssl's password to key & IV routine for mbedtls.
    // (See openssl apps/enc.c and /crypto/evp/evp_key.c) It is not any kind of
    // standard (e.g. PBKDF2), and it only uses an iteration count of 1, so it's
    // pretty crappy. MD5 is used as the digest in Openssl 1.0.2, 1.1 and later
    // use SHA256. Since this is for embedded system, I figure you know what you've
    // got, so I made it compile-time configurable.
    //
    // The signature has been re-jiggered to make it less general.
    //
    // See: https://wiki.openssl.org/index.php/Manual:EVP_BytesToKey(3)
    // And: https://www.cryptopp.com/wiki/OPENSSL_EVP_BytesToKey
#define IV_BYTE_COUNT 16
#if BTK_USE_MD5
# define DIGEST_BYTE_COUNT 16 // MD5
#else
# define DIGEST_BYTE_COUNT 32 // SHA
#endif
    bool bRet;
    unsigned char md_buf[ DIGEST_BYTE_COUNT ];
    mbedtls_md_context_t md_ctx;
    bool bAddLastMD = false;
    unsigned int nKeyToGo = nDesiredKeyLen; // 32, typical
    unsigned int nIVToGo = IV_BYTE_COUNT;
    mbedtls_md_init( &md_ctx );
#if BTK_USE_MD5
    int rc = mbedtls_md_setup( &md_ctx, mbedtls_md_info_from_type( MBEDTLS_MD_MD5 ), 0 );
#else
    int rc = mbedtls_md_setup( &md_ctx, mbedtls_md_info_from_type( MBEDTLS_MD_SHA256 ), 0 );
#endif
    if (rc != 0 )
    {
        fprintf( stderr, "mbedtls_md_setup() failed -0x%04x\n", -rc );
        bRet = false;
        goto exit;
    }
    while( 1 )
    {
        mbedtls_md_starts( &md_ctx ); // start digest
        if ( bAddLastMD == false ) // first time
        {
            bAddLastMD = true; // do it next time
        }
        else
        {
            mbedtls_md_update( &md_ctx, &md_buf[0], DIGEST_BYTE_COUNT );
        }
        mbedtls_md_update( &md_ctx, &password[0], nPwdLen );
        mbedtls_md_update( &md_ctx, &salt[0], 8 );
        mbedtls_md_finish( &md_ctx, &md_buf[0] );
        //
        // Iteration loop here in original removed as unused by "openssl enc"
        //
        // Following code treats the output key and iv as one long, concatenated buffer
        // and smears as much digest across it as is available. If not enough, it takes the
        // big, enclosing loop, makes more digest, and continues where it left off on
        // the last iteration.
        unsigned int ii = 0; // index into md_buf
        if ( nKeyToGo != 0 ) // still have key to fill in?
        {
            while( 1 )
            {
                if ( nKeyToGo == 0 ) // key part is full/done
                    break;
                if ( ii == DIGEST_BYTE_COUNT ) // ran out of digest, so loop
                    break;
                *pOutKey++ = md_buf[ ii ]; // stick byte in output key
                nKeyToGo--;
                ii++;
            }
        }
        if ( nIVToGo != 0 // still have fill up IV
             && // and
             ii != DIGEST_BYTE_COUNT // have some digest available
           )
        {
            while( 1 )
            {
                if ( nIVToGo == 0 ) // iv is full/done
                    break;
                if ( ii == DIGEST_BYTE_COUNT ) // ran out of digest, so loop
                    break;
                *pOutIV++ = md_buf[ ii ]; // stick byte in output IV
                nIVToGo--;
                ii++;
            }
        }
        if ( nKeyToGo == 0 && nIVToGo == 0 ) // output full, break main loop and exit
            break;
    } // outermost while loop
    bRet = true;
exit:
    mbedtls_md_free( &md_ctx );
    return bRet;
}
If anyone passing through here is looking for a working, performant reference implementation in Haskell, here it is:
import Crypto.Hash
import qualified Data.ByteString as B
import Data.ByteArray (convert)
import Data.Monoid ((<>))

evpBytesToKey :: HashAlgorithm alg => Int -> Int -> alg -> Maybe B.ByteString -> B.ByteString -> (B.ByteString, B.ByteString)
evpBytesToKey keyLen ivLen alg mSalt password =
    let bytes = B.concat . take required . iterate go $ hash' passAndSalt
        (key, rest) = B.splitAt keyLen bytes
    in (key, B.take ivLen rest)
  where
    hash' = convert . hashWith alg
    required = 1 + ((keyLen + ivLen - 1) `div` hashDigestSize alg)
    passAndSalt = maybe password (password <>) mSalt
    go = hash' . (<> passAndSalt)
It uses hash algorithms provided by the cryptonite package. The arguments are the desired key and IV size in bytes, the hash algorithm to use (e.g. (undefined :: MD5)), an optional salt, and the password. The result is a tuple of key and IV.
c++ copy array to array
I have taken code from here Webduino Network Setp I added one more field. struct config_t { .... ... ..... byte subnet[4]; byte dns_server[4]; unsigned int webserverPort; char HostName[10]; // Added code Here.. } eeprom_config; Snippet.. #define NAMELEN 5 #define VALUELEN 10 void setupNetHTML(WebServer &server, WebServer::ConnectionType type, char *url_tail, bool tail_complete) { URLPARAM_RESULT rc; char name[NAMELEN]; char value[VALUELEN]; boolean params_present = false; byte param_number = 0; char buffer [13]; ..... ..... } Added Lines to read date from web page and Wire to eeprom Write to eeprom: ( Facing issue here, I need to copy value to eeprom_config.HostName[0] ... ) // read Host Name if (param_number >= 25 && param_number <= 35) { // eeprom_config.HostName[param_number - 25] = strtol(value, NULL, 10); eeprom_config.HostName[param_number - 25] = value ; // Facing Issue here.. } and... for (int a = 0; a < 10; a++) { server.printP(Form_input_text_start); server.print(a + 25); server.printP(Form_input_value); server.print(eeprom_config.HostName[a]); server.printP(Form_input_size1); server.printP(Form_input_end); }
The issue was resolved. Thanks, I got the idea from this post: "invalid conversion from 'char' to 'char*'". The problem was that value is a char array, while each element of HostName is a single char, so the array cannot be assigned to one element. I changed

    // read Host Name
    if (param_number >= 25 && param_number <= 35) {
        // eeprom_config.HostName[param_number - 25] = strtol(value, NULL, 10);
        eeprom_config.HostName[param_number - 25] = value ; // Facing issue here
    }

to

    // read Host Name
    if (param_number >= 25 && param_number <= 35) {
        eeprom_config.HostName[param_number - 25] = value[0];
    }
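The fix above stores one character per form field. If the whole host name instead arrived as a single string, a bounded copy into the fixed-size field would be needed. The following is only a sketch of that idea: copyBounded and kHostNameLen are hypothetical names, not part of Webduino or the original code, and kHostNameLen simply mirrors the 10-byte HostName field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the size of config_t::HostName in the question.
constexpr std::size_t kHostNameLen = 10;

// Copies at most dstLen - 1 characters from src into dst and always
// null-terminates dst. Returns the number of characters copied.
std::size_t copyBounded(char *dst, std::size_t dstLen, const char *src) {
    assert(dst != nullptr && src != nullptr && dstLen > 0);
    std::size_t n = std::strlen(src);
    if (n > dstLen - 1) {
        n = dstLen - 1;  // truncate rather than overflow the destination
    }
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 10-byte destination, a call such as copyBounded(host, sizeof host, value) keeps at most 9 characters plus the terminator, so an over-long input is truncated instead of overrunning the buffer.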
GIF LZW decompression
I am trying to implement a simple GIF reader in C++. I am currently stuck on decompressing the image data. If an image includes a Clear Code, my decompression algorithm fails. After the Clear Code I rebuild the code table and reset the code size to MinimumLzwCodeSize + 1. Then I read the next code and add it to the index stream. The problem is that after clearing, the next codes include values greater than the size of the current code table. For example, the sample file from Wikipedia, rotating-earth.gif, has a code value of 262, but the GlobalColorTable has only 256 entries. How do I handle this? I implemented the LZW decompression according to the GIF spec. Here is the main part of the decompression code:

    int prevCode = GetCode(ptr, offset, codeSize);
    codeStream.push_back(prevCode);
    while (true)
    {
        auto code = GetCode(ptr, offset, codeSize);

        // Clear code
        if (code == IndexClearCode)
        {
            // reset codesize
            codeSize = blockA.LZWMinimumCodeSize + 1;
            currentNodeValue = pow(2, codeSize) - 1;
            // reset codeTable
            codeTable.resize(colorTable.size() + 2);
            // read next code
            prevCode = GetCode(ptr, offset, codeSize);
            codeStream.push_back(prevCode);
            continue;
        }
        else if (code == IndexEndOfInformationCode)
            break;

        // exists in dictionary
        if (codeTable.size() > code)
        {
            if (prevCode >= codeTable.size())
            {
                prevCode = code;
                continue;
            }
            for (auto c : codeTable[code])
                codeStream.push_back(c);
            newEntry = codeTable[prevCode];
            newEntry.push_back(codeTable[code][0]);
            codeTable.push_back(newEntry);
            prevCode = code;
            if (codeTable.size() - 1 == currentNodeValue)
            {
                codeSize++;
                currentNodeValue = pow(2, codeSize) - 1;
            }
        }
        else
        {
            if (prevCode >= codeTable.size())
            {
                prevCode = code;
                continue;
            }
            newEntry = codeTable[prevCode];
            newEntry.push_back(codeTable[prevCode][0]);
            for (auto c : newEntry)
                codeStream.push_back(c);
            codeTable.push_back(newEntry);
            prevCode = codeTable.size() - 1;
            if (codeTable.size() - 1 == currentNodeValue)
            {
                codeSize++;
                currentNodeValue = pow(2, codeSize) - 1;
            }
        }
    }
Found the solution. It is called a deferred clear code. When I check whether the codeSize needs to be incremented, I also need to check whether the codeSize is already at the maximum (12), because it is possible to get codes that are of the maximum code size. See spec-gif89a.txt.

    if (codeTable.size() - 1 == currentNodeValue && codeSize < 12) {
        codeSize++;
        currentNodeValue = (1 << codeSize) - 1;
    }
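To make the capped growth rule easier to test in isolation, it can be pulled out into a small helper. This is a sketch only: nextCodeSize and kMaxGifCodeSize are hypothetical names, and the function takes the table size as a plain number rather than using the codeTable and currentNodeValue variables from the question.

```cpp
#include <cassert>
#include <cstddef>

constexpr int kMaxGifCodeSize = 12;  // GIF limits LZW codes to 12 bits

// Returns the code size to use once the table holds tableSize entries.
// The width grows by one bit when the table reaches 2^codeSize entries,
// but it never grows past 12 bits (the deferred clear code case).
int nextCodeSize(std::size_t tableSize, int codeSize) {
    assert(codeSize >= 3 && codeSize <= kMaxGifCodeSize);
    const std::size_t full = std::size_t{1} << codeSize;
    if (tableSize == full && codeSize < kMaxGifCodeSize) {
        return codeSize + 1;
    }
    return codeSize;
}
```

For example, nextCodeSize(512, 9) gives 10, while nextCodeSize(4096, 12) stays at 12, so the decoder keeps reading 12-bit codes until the encoder finally sends a Clear Code.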
AudioConverterNew returned -50
I have a little issue regarding the use of the AudioQueue services. I have followed the guide that is available on Apple's website, but when I start and run the Audio Queue, I get a message telling me that "AudioConverterNew returned -50". Now, I know that the -50 error code means that there is a bad parameter. However, what I don't know is which parameter is the bad one (thank you so much Apple...)! So, here's my code. Here are the members of my class, named cPlayerCocoa:

    AudioQueueRef mQueue;
    AudioQueueBufferRef mBuffers[NUMBER_BUFFERS]; // NUMBER_BUFFERS = 3
    uint32 mBufferByteSize;
    AudioStreamBasicDescription mDataFormat;

Here's the first function:

    static void BuildBuffer( void* iAQData, AudioQueueRef iAQ, AudioQueueBufferRef iBuffer )
    {
        cPlayerCocoa* player = (cPlayerCocoa*) iAQData;
        player->HandleOutputBuffer( iAQ, iBuffer );
    }

It creates a cPlayerCocoa from the structure containing the AudioQueue and calls the HandleOutputBuffer function, which allocates the audio buffers:

    void cPlayerCocoa::HandleOutputBuffer( AudioQueueRef iAQ, AudioQueueBufferRef iBuffer )
    {
        if( mContinue )
        {
            xassert( iBuffer->mAudioDataByteSize == 32768 );
            int startSample = mPlaySampleCurrent;
            int result = 0;
            int samplecount = 32768 / ( mSoundData->BytesPerSample() ); // BytesPerSample, in my case, returns 4
            tErrorCode error = mSoundData->ReadData( (int16*)(iBuffer->mAudioData), samplecount, &result, startSample );
            AudioQueueEnqueueBuffer( mQueue, iBuffer, 0, 0 ); // I'm using CBR data (PCM), hence the 0 passed into AudioQueueEnqueueBuffer.
            if( result != samplecount )
                mContinue = false;
            startSample += result;
        }
        else
        {
            AudioQueueStop( mQueue, false );
        }
    }

In this next function, the AudioQueue is created and then started. I begin by initialising the parameters of the data format. Then I create the AudioQueue and allocate the 3 buffers. When the buffers are allocated, I start the AudioQueue and then I run the loop.

    void cPlayerCocoa::ThreadEntry()
    {
        int samplecount = 32768 / ( mSoundData->BytesPerSample() );
        mDataFormat.mSampleRate = mSoundData->SamplingRate(); // Returns 44100
        mDataFormat.mFormatID = kAudioFormatLinearPCM;
        mDataFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
        mDataFormat.mBytesPerPacket = 32768;
        mDataFormat.mFramesPerPacket = samplecount;
        mDataFormat.mBytesPerFrame = mSoundData->BytesPerSample(); // BytesPerSample returns 4.
        mDataFormat.mChannelsPerFrame = 2;
        mDataFormat.mBitsPerChannel = uint32(mSoundData->BitsPerChannel());
        mDataFormat.mReserved = 0;

        AudioQueueNewOutput( &mDataFormat, BuildBuffer, this, CFRunLoopGetCurrent(), kCFRunLoopCommonModes, 0, &mQueue );

        for( int i = 0; i < NUMBER_BUFFERS; ++i )
        {
            AudioQueueAllocateBuffer( mQueue, mBufferByteSize, &mBuffers[i] );
            HandleOutputBuffer( mQueue, mBuffers[i] );
        }

        AudioQueueStart( mQueue, NULL ); // I want the queue to start playing immediately, so I pass NULL

        do
        {
            CFRunLoopRunInMode( kCFRunLoopDefaultMode, 0.25, false );
        } while ( !NeedStopASAP() );

        AudioQueueDispose( mQueue, true );
    }

The call to AudioQueueStart returns -50 (bad parameter) and I can't figure out what's wrong... I would really appreciate some help, thanks in advance :-)
I think your ASBD is suspect. PCM formats have predictable values for mBytesPerPacket, mBytesPerFrame, and mFramesPerPacket. For normal 16-bit interleaved signed 44.1 kHz stereo audio the ASBD would look like this:

    AudioStreamBasicDescription asbd = {
        .mFormatID = kAudioFormatLinearPCM,
        .mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
        .mSampleRate = 44100,
        .mChannelsPerFrame = 2,
        .mBitsPerChannel = 16,
        .mBytesPerPacket = 4,
        .mFramesPerPacket = 1,
        .mBytesPerFrame = 4,
        .mReserved = 0
    };

AudioConverterNew returns -50 when one of the ASBDs is unsupported. There is no PCM format where mBytesPerPacket should be 32768, which is why you're getting the error. The 32768 in your code is the size of a queue buffer, not of a packet.
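The "predictable values" for packed, interleaved linear PCM follow from the channel count and sample width, which can be sketched as below. PcmLayout and packedInterleavedPcm are hypothetical names used for illustration; the struct only mirrors the three size fields and is not the Core Audio AudioStreamBasicDescription type, so the snippet compiles without Apple headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain struct mirroring mBytesPerFrame, mFramesPerPacket and
// mBytesPerPacket; it is not the Core Audio type itself.
struct PcmLayout {
    std::uint32_t bytesPerFrame;
    std::uint32_t framesPerPacket;
    std::uint32_t bytesPerPacket;
};

// Derives the size fields for packed, interleaved linear PCM.
PcmLayout packedInterleavedPcm(std::uint32_t channels, std::uint32_t bitsPerChannel) {
    assert(channels > 0 && bitsPerChannel > 0 && bitsPerChannel % 8 == 0);
    PcmLayout layout{};
    layout.bytesPerFrame = channels * (bitsPerChannel / 8);  // one sample per channel
    layout.framesPerPacket = 1;                              // uncompressed audio: 1 frame per packet
    layout.bytesPerPacket = layout.bytesPerFrame * layout.framesPerPacket;
    return layout;
}
```

For 16-bit stereo this yields 4 bytes per frame, 1 frame per packet and 4 bytes per packet, matching the ASBD above; the buffer size passed to AudioQueueAllocateBuffer is chosen separately.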