8b10b encoder with byte stream output (bits carry): faster bitwise algorithm? - c++

I have written a 8b10b encoder that generates a stream of bytes intended to be sent to a serial transmitter which sends the bytes as-is LSb first.
What I'm doing here is basically lay down groups of 10 bits (encoded from the input stream of bytes) on groups of 8, so a varying number of bits get carried over from one output byte to the next - kind of like in music/rhythm.
The program has been successfully tested, but it is about 4-5x too slow for my application. I think it comes from the fact that every bit has to be looked up in an array. My guts tell me we could make that faster by having some sort of rolling mask but I can't yet see how to do that even by swapping out the 3d array of booleans to a 2D array of integers.
Any pointer or other idea?
Here is the code. Please ignore most of the macros and some of the code related to deciding which byte is to be written as this is application-specific.
Header:
#ifndef TX_BYTESTREAM_GEN_H_INCLUDED
#define TX_BYTESTREAM_GEN_H_INCLUDED
#include <stdint.h> //for standard portable types such as uint16_t
#define MAX_USB_TRANSFER_SIZE 1016 //Bytes, size of the max payload in a USB transaction. Determined using FT4222_GetMaxTRansferSize()
#define MAX_USB_PACKET_SIZE 62 //Bytes, max size of the payload of a single USB packet
#define MANDATORY_TX_PACKET_BLOCK 5 //Bytes, constant - equal to the minimum number of bytes of TX packet necessary to exactly transfer blocks of 10 bits of encoded data (LCF of 8 and 10)
#define SYNC_CHARS_MAX_INTERVAL 172 //Target number of payload bytes between sync chars. Max is 188 before desynchronisation
#define ROUND_UP(N, S) ((((N) + (S) - 1) / (S)) * (S)) //Macro to round up the integer N to the largest multiple of the integer S
#define ROUND_DOWN(N,S) ((N / S) * S) //Same rounding down
#define N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz) (ROUND_UP((pcktSz*1000/(SYNC_CHARS_MAX_INTERVAL+2)),1000)/1000) //Number of sync (K28.5) character/byte pairs in a given packet
#define TX_PAYLOAD_SIZE(pcktSz) ((pcktSz*4/5)-2*N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz)) //Size in bytes of the payload data before encoding in a single TX packet
#define MAX_TX_PACKET_SIZE (ROUND_DOWN((MAX_USB_TRANSFER_SIZE-MAX_USB_PACKET_SIZE),(MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK))) //Maximum size in bytes of a TX packet
#define DEFAULT_TX_PACKET_SIZE (MAX_TX_PACKET_SIZE-MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK) //Default size in bytes of a TX packet with some margin
#define MAX_TX_PAYLOAD_SIZE (TX_PAYLOAD_SIZE(MAX_TX_PACKET_SIZE)) //Maximum size in bytes of the payload in a TX packet
#define DEFAULT_TX_PAYLOAD_SIZE (TX_PAYLOAD_SIZE(DEFAULT_TX_PACKET_SIZE))//Default size in bytes of the payload in a TX packet with some margin
//See string descriptors below for definitions. Error codes are individual bits so can be combined.
enum ErrCode
{
NO_ERR = 0,
INVALID_DIN_SIZE = 1,
INVALID_DOUT_SIZE = 2,
NULL_DIN_PTR = 4,
NULL_DOUT_PTR = 8
};
char const * const ERR_CODE_DESC[] = {
"No error",
"Invalid size of input data",
"Invalid size of output buffer",
"Input data pointer is NULL",
"Output buffer pointer is NULL"
};
/** #brief Generates the bytestream to the transmitter by encoding the incoming data using 8b10b encoding
and inserting K28.5 synchronisation characters to maintain the synchronisation with the demodulator (LVDS passthrough mode)
#arg din is a pointer to an allocated array of bytes which contains the data to encode
#arg dinSize is the size of din in bytes. This size must be equal to TX_PAYLOAD_SIZE(doutSize)
#arg dout is a pointer to an allocated array of bytes which is intended to contain the output bytestream to the transmitter
#arg doutSize is the size of dout in bytes. This size must meet the conditions at the top of this function's implementation. Use DEFAULT_TX_PACKET_SIZE if in doubt.
#return error code (c.f. ErrCode) **/
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize);
#endif // TX_BYTESTREAM_GEN_H_INCLUDED
Source file:
#include "TX_bytestream_gen.h"
#include <cstddef> //NULL
#define N_BYTE_VALUES (256+1) //256 possible data values + 1 special character (only accessible to this module)
#define N_ENCODED_BITS 10 //Number of bits corresponding to the 8b10b encoding of a byte
//Map the current running disparity, the desired value to encode to the array of encoded bits for 8b10b encoding.
//The Last value is the K28.5 sync character, only accessible to this module
//Notation = MSb to LSb
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
//Long table (see appendix)
};
//New value of the running disparity after encoding with the specified previous running disparity and requested byte value (c.f. above)
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
//Long table (see appendix)
};
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize)
{
static bool RDp = false; //Running disparity is initially negative
int ret = 0;
//If the output buffer size is not a multiple of the mandatory payload block or of the USB packet size, or if it cannot be held in a single USB transaction
//return an invalid output buffer size error
if(doutSize == 0 || (doutSize % MANDATORY_TX_PACKET_BLOCK) || (doutSize % MAX_USB_PACKET_SIZE) || (doutSize > MAX_TX_PACKET_SIZE)) //Temp
ret |= INVALID_DOUT_SIZE;
//If the input data size is not consistent with the output buffer size, return the appropriate error code
if(dinSize == 0 || dinSize != TX_PAYLOAD_SIZE(doutSize))
ret |= INVALID_DIN_SIZE;
if(din == NULL)
ret |= NULL_DIN_PTR;
if(dout == NULL)
ret |= NULL_DOUT_PTR;
//If everything checks out, carry on
if(ret == NO_ERR)
{
uint16_t iByteIn = 0; //Index of the byte of input data currently being processed
uint16_t iByteOut = 0; //Index of the output byte currently being written to
uint8_t iBitOut = 0; //Starts with LSb
int16_t nBytesUntilSync = 0; //Countdown of bytes until a sync marker needs to be sent. Cyclic.
//For all output bytes to generate
while(iByteOut < doutSize)
{
bool sync = false; //Initially this byte is not considered a sync byte (in which case the next byte of data will be processed)
//If the maximum interval between sync characters has been reached, mark the two next bytes as sync bytes and reset the counter
if(nBytesUntilSync <= 0)
{
sync = true;
if(nBytesUntilSync == -1) //After the second SYNC is written, the counter is reset
{
nBytesUntilSync = SYNC_CHARS_MAX_INTERVAL;
}
}
//Append bit by bit the encoded data of the byte to write to the output bitstream (carried over from byte to byte) - LSb first
//The byte to write is either the last byte of the encodedBits map (the sync character K28.5) if sync is set, or the next byte of
//input data if it isn't
uint16_t const byteToWrite = (sync?(N_BYTE_VALUES-1):din[iByteIn]);
for(int8_t iEncodedBit = N_ENCODED_BITS-1 ; iEncodedBit >= 0 ; --iEncodedBit, iBitOut++)
{
//If the current output byte is complete, reset the bit index and select the next one
if(iBitOut >= 8)
{
iByteOut++;
iBitOut = 0;
}
//Effectively sets the iBitOut'th bit of the iByteOut'th byte out to the encoded value of the byte to write
bool bitToWrite = encodedBits[RDp][byteToWrite][iEncodedBit]; //Temp
dout[iByteOut] ^= (-bitToWrite ^ dout[iByteOut]) & (1 << iBitOut);
}
//The running disparity is also updated as per the standard (to achieve DC balance)
RDp = encodingDisparity[RDp][byteToWrite]; //Update the running disparity
//If sync was not set, this means a byte of the input data has been processed, in which case take the next one in
//Also decrement the synchronisation counter
if(!sync) {
iByteIn++;
}
//In any case, decrease the synchronisation counter. Even sync characters decrease it (c.f. top of while loop)
nBytesUntilSync--;
}
}
return ret;
}
Testbench:
#include <iostream>
#include "TX_bytestream_gen.h"
#define PACKET_DURATION 0.000992 //In seconds, time of continuous data stream corresponding to one packet (5MHz output, default packet size)
#define TIME_TO_SIMULATE 10 //In seconds
#define PACKET_SIZE DEFAULT_TX_PACKET_SIZE
#define PAYLOAD_SIZE DEFAULT_TX_PAYLOAD_SIZE
#define N_ITERATIONS (TIME_TO_SIMULATE/PACKET_DURATION)
#include <chrono>
using namespace std;
//Testbench: measure the time taken to simulate TIME_TO_SIMULATE seconds of continuous encoding
int main()
{
uint8_t toEncode[PAYLOAD_SIZE] = {100}; //Dummy data, doesn't matter
uint8_t out[PACKET_SIZE] = {0};
std::chrono::time_point<std::chrono::system_clock> start, end;
start = std::chrono::system_clock::now();
for(unsigned int i = 0 ; i < N_ITERATIONS ; i++)
{
TX_gen_bytestream(toEncode, PAYLOAD_SIZE, out, PACKET_SIZE);
}
end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end - start;
std::cout << "Task execution time: " << elapsed_seconds.count()/TIME_TO_SIMULATE*100 << "% (for " << TIME_TO_SIMULATE << "s simulated)\n";
return 0;
}
Appendix: lookup tables. I don't have enough characters to paste it here, but it looks like so:
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
//Running disparity = RD-
{
{1,0,0,1,1,1,0,1,0,0},
//...
},
//Running disparity = RD+
{
{0,1,1,0,0,0,1,0,1,1},
//...
}
};
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
//Previous running disparity was RD-
{
0,
//...
},
//Previous running disparity was RD+
{
1,
//...
}
};

This will be a lot faster if you do everything a byte at time instead of a bit at a time.
First change the way you store your lookup tables. You should have something like:
// conversion from (RD, byte) to (RD, 10-bit code)
// in each word, the lower 10 bits are the code,
// and bit 10 (the 11th bit) is the new RD
// The first 256 values are for RD -1, the next
// for RD 1
static const uint16_t BYTE_TO_CODE[512] = {
...
}
Then you need to change our encoding loop to write a byte at a time. You can use a uint16_t to store the leftover bits from each byte you output.
Something like this (I didn't figure out your sync byte logic, but presumably you can put that in the input or output byte loop):
// returns next isRD1
bool TX_gen_bytestream(uint8_t *dest, const uint8_t *src, size_t src_len, bool isRD1)
{
// bits generated, but not yet written, LSB first
uint16_t bits = 0;
// number of bits in bits
unsigned numbits = 0;
// current RD, either 0 or 256
uint16_t rd = isRD1 ? 256 : 0;
for (const uint8_t *end = src + src_len; src < end; ++src) {
// lookup code and next rd
uint16_t code = BYTE_TO_CODE[rd + *src];
// new rd from code bit 10
rd = (code>>2) & 256;
// store bits
bits |= (code & (uint16_t)0x03FF) << numbits;
numbits+=10;
// write out any complete bytes
while(numbits >= 8) {
*dest++ = (uint8_t)bits;
bits >>=8;
numbits-=8;
}
}
// If src_len isn't divisible by 4, then we have some extra bits
if (numbits) {
*dest = (uint8_t)bits;
}
return !!rd;
}

Related

VirtualBox crashing when assembly "sti" is executed when testing Operating System

I'm currently attempting to build my own Operating System and have run into an issue when trying to test out my kernel code using VirtualBox.
The real issue arises when I call the assembly instruction sti as I'm currently attempting to implement an interrupt descriptor table and communicate with the PICs.
Here is the code that calls it. It's a function called kernel_main that is called from another assembly file. That file simply sets up the stack before executing any code from the OS, but there hasn't been any issues there, and everything works fine until I add the instruction asm("sti"); to the following code:
/* main function of our kernal
* accepts the pointer to multiboot and the magic code (no particular reason to take the magic number)
*
* use extern C to prevent gcc from changing the name
*/
extern "C" void kernel_main(void *multiboot_structure, uint32_t magic_number)
{
// can't use the standard printf as we're outside an OS currently
// we don't have access to glibc so we write our own printf
printf_boot_message("kernel.....\n");
// create the global descriptor table
GlobalDescriptorTable gdt;
// create the interrupt descriptor table
InterruptHandler interrupt_handler(&gdt);
// enable interrupts (test)
asm("sti"); // <- causes crash
// random debug printf
printf_boot_message("sti called\n");
// kernal never really stops, inf loop
while (1)
;
}
Below is the virtual box debug output, I've googled around for VINF_EM_TRIPLE_FAULT but mostly found RAM related issues that I don't think apply to me. The printf calls in the above code execute as expected followed by the VM immediately crashing stating the following:
Link to output as it's too large to post here: https://pastebin.com/jfPfhJUQ
Here is my interrupt handling code:
* Implementations of the interrupt handling routines in sys_interrupts.h
*/
#include "sys_interrupts.h"
#include "misc.h"
//handle() is used to take the interrupt number,
//i_number, and the address to the current CPU stack frame.
uint32_t InterruptHandler::handle(uint8_t i_number, uint32_t crnt_stkptr)
{
// debug
printf(" INTERRUPT");
// after the interrupt code has been executed,
// return the stack pointer so that the CPU can resume
// where it left off.
// this works for now as we do not have multiple
// concurrent processes running, so there is no issue
// of handling the threat number.
return crnt_stkptr;
}
// define the global descriptor table
InterruptHandler::_gate_descriptor InterruptHandler::interrupt_desc_table[N_ENTRIES];
// define the constructor. Takes a pointer to the global
// descriptor table
InterruptHandler::InterruptHandler(GlobalDescriptorTable* global_desc_table)
{
// grab the offset of the usable memory within our global segment
uint16_t seg = global_desc_table->CodeSegmentSelector();
// set all the entries in the IDT to block request initially
for (uint16_t i = 0; i < N_ENTRIES; i++)
{
// create an a gate for a system level interrupt, calling the block function (does nothing) using seg as its memory.
create_entry(i, seg, &block_request, PRIV_LVL_KERNEL, GATE_INTERRUPT);
}
// create a couple interrupts for 0x00 and 0x01, really 0x20 and 0x21 in memory
//create_entry(BASE_I_NUM + 0x00, seg, &isr0x00, PRIV_LVL_KERNEL, GATE_INTERRUPT);
//create_entry(BASE_I_NUM + 0x01, seg, &isr0x01, PRIV_LVL_KERNEL, GATE_INTERRUPT);
// init the PICs
pic_controller.send_master_cmd(PIC_INIT);
pic_controller.send_slave_cmd(PIC_INIT);
// tell master pic to add 0x20 to any interrupt number it sends to CPU, while slave pic sends 0x28 + i_number
pic_controller.send_master_data(PIC_OFFSET_MASTER);
pic_controller.send_slave_data(PIC_OFFSET_SLAVE);
// set the interrupt vectoring to cascade and tell master that there is a slave PIC at IRQ2
pic_controller.send_master_data(ICW1_INTERVAL4);
pic_controller.send_slave_data(ICW1_SINGLE);
// set the PICs to work in 8086 mode
pic_controller.send_master_data(ICW1_8086);
pic_controller.send_slave_data(ICW1_8086);
// send 0s
pic_controller.send_master_data(DEFAULT_MASK);
pic_controller.send_slave_data(DEFAULT_MASK);
// tell the cpu to use the table
interrupt_desc_table_pointerdata idt_ptr;
//set the size
idt_ptr.table_size = N_ENTRIES * sizeof(_gate_descriptor) - 1;
// set the base address
idt_ptr.base_addr = (uint32_t)interrupt_desc_table;
// use lidt instruction to load the table
// the cpu will map interrupts to the table
asm volatile("lidt %0" : : "m" (idt_ptr));
// issue debug print
printf_boot_message(" 2: Created Interrupt Desc Table...\n");
}
// define the destructor of the class
InterruptHandler::~InterruptHandler()
{
}
// function to make entries in the IDT
// takes the interrupt number as an index, the segment offset it used to specify which memory segment to use
// a pointer to the function to call, the flags and access level.
void InterruptHandler::create_entry(uint8_t i_number, uint16_t segment_desc_offset, void (*isr)(), uint8_t priv_lvl, uint8_t desc_type)
{
// set the i_number'th entry to the given params
// take the lower bits of the pointer
interrupt_desc_table[i_number].handler_lower_bits = ((uint32_t)isr) & 0xFFFF;
// take the upper bits
interrupt_desc_table[i_number].handler_upper_bits = (((uint32_t)isr) >> 16) & 0xFFFF;
// calculate the privilage byte, setting the correct bits
interrupt_desc_table[i_number].priv_lvl = 0x80 | ((priv_lvl & 3) << 5) | desc_type;
interrupt_desc_table[i_number].segment_desc_offset = segment_desc_offset;
// reserved byte is always 0
interrupt_desc_table[i_number].reserved_byte = 0;
}
// need a function to block or ignore any requests
// that we dont want to service. Requests could be caused
// by devices we haven't yet configured when testing the os.
void InterruptHandler::block_request()
{
// do nothing
}
// function to tell the CPU to send interrupts
// to this table
void InterruptHandler::set_active()
{
// call sti assembly to start interrup poling at the CPU level
asm volatile("sti"); // <- calling this crashes the kernel
// issue debug print
printf_boot_message(" 4: Activated sys interrupts...\n");
}
And here is the code for my GDT, I followed the os dev wiki guide for this:
#include "global_desc_table.h"
/**
* A code segment is identified by flag 0x9A, cannot write to a code segment
* while a data segment is identified by flag 0x92
*
* Based on the C code present on OSDEV Wiki
*/
GlobalDescriptorTable::GlobalDescriptorTable() : nullSegmentSelector(0, 0, 0),
unusedSegmentSelector(0, 0, 0),
codeSegmentSelector(0, 64*1024*1024, 0x9A),
dataSegmentSelector(0, 64*1024*1024, 0x92)
{
//8 bytes defined, but processor expects 6 bytes only
uint32_t i[2];
//first 4 bytes is address of table
i[0] = (uint32_t)this;
//second 4 bytes, the high bytes, are size of global desc table
i[1] = sizeof(GlobalDescriptorTable) << 16;
// tell processor to use this table using its ldgt function
asm volatile("lgdt (%0)" : : "p" (((uint8_t *) i) + 2));
// issue debug print
printf_boot_message(" 1: Created Global Desc Table...\n");
}
// function to get the offset of the datasegment selector
uint16_t GlobalDescriptorTable::DataSegmentSelector()
{
// calculate the offset by subtracting the table's address from the datasegment's address
return (uint8_t *) &dataSegmentSelector - (uint8_t*)this;
}
// function to get the offset of the code segment
uint16_t GlobalDescriptorTable::CodeSegmentSelector()
{
// calculate the offset by subtracting the table's address from the code segment's address
return (uint8_t *) &codeSegmentSelector - (uint8_t*)this;
}
// default destructor
GlobalDescriptorTable::~GlobalDescriptorTable()
{
}
/**
* The constructor to create a new entry segment, set the flags, determine the formatting for the limit, and set the base
*/
GlobalDescriptorTable::SegmentDescriptor::SegmentDescriptor(uint32_t base, uint32_t limit, uint8_t flags)
{
uint8_t* target = (uint8_t*)this;
//if 16 bit limit
if (limit <= 65536)
{
// tell processor that this is a 16bit entry
target[6] = 0x40;
} else {
// if the last 12 bits of limit are not 1s
if ((limit & 0xFFF) != 0xFFF)
{
limit = (limit >> 12) - 1;
} else {
limit >>= 12;
}
// indicate that there was a shift of 12 done
target[6] = 0xC0;
}
// set the lower and upper 2 lowest bytes of limit
target[0] = limit & 0xFF;
target[1] = (limit >> 8) & 0xFF;
//the rest of limit must go in lower 4 bit of byte 6, and byte 5
target[6] |= (limit >> 16) & 0xF;
//encode the pointer
target[2] = base & 0xFF;
target[3] = (base >> 8) & 0xFF;
target[4] = (base >> 16) & 0xFF;
target[7] = (base >> 24) & 0xFF;
// set the flags
target[5] = flags;
}
/**
* Define the methods to get the base pointer from an segment and
* the limit for a segment, taken from os wiki
*/
uint32_t GlobalDescriptorTable::SegmentDescriptor::Base()
{
// simply do the reverse of wht was done to place the pointer in
uint8_t* target = (uint8_t*) this;
uint32_t result = target[7];
result = (result << 8) + target[4];
result = (result << 8) + target[3];
result = (result << 8) + target[2];
return result;
}
uint32_t GlobalDescriptorTable::SegmentDescriptor::Limit()
{
uint8_t* target = (uint8_t *)this;
uint32_t result = target[6] & 0xF;
result = (result << 8) + target[1];
result = (result << 8) + target[0];
//check if there was a shift of 12
if (target[6] & 0xC0 == 0xC0)
{
result = (result << 12) & 0xFFF;
}
return result;
}
i[0] = (uint32_t)this;
//second 4 bytes, the high bytes, are size of global desc table
i[1] = sizeof(GlobalDescriptorTable) << 16;
I've had the same problem, just swap the 0 and 1 in between:
i[1] = (uint32_t)this;
//second 4 bytes, the high bytes, are size of global desc table
i[0] = sizeof(GlobalDescriptorTable) << 16;
That's the problem if you are following the same tutorial and I think you do if you came here.
Sometimes due to wrong idtr value also(invalid pointer causing crash)
check the idtr reg value in vbox log
if u load idt in protected mode address of idt shows some wierd changes(shifted left 16bits or some value in lower 16 bit)
try changing pointer according to that(thats how i did) or use lidt in before entering protected mode(this is also tested)
There was a bug in my GDT that forced the kernel to read an invalid pointer from the segment. This caused a seg fault.

How to do memory implementation of Huffman Agorithm?

I have a Huffman code algorithm that compresses characters into sequences of bits of arbitrary length, smaller than the default size of a char (8 bits on most modern platforms)
If the Huffman Code compresses an 8-bit character into 3 bits, how do I represent that 3-bit value in memory? To take this further, how do I combine multiple compressed characters into a compressed representation?
For example consider l which is "00000" (5x8 bits since 0 is also character). How do I represent l with 00000 (5 bits) instead of a character sequence?
A C or C++ implementation is preferred.
Now that this question is re-opened...
To make a variable that holds a variable number of bits, we just use use the lower bits of one unsigned int to store the bits, and use another unsigned int to remember how many bits we have stored.
When writing out a Huffman-compressed file, we wait until we have at least 8 bits stored. Then we write out a char using the top 8 bits and subtract 8 from the stored bit count.
Finally, at the end if you have any bits left to write out, you round up to an even multiple of 8 and write chars.
In C++, it's useful to encapsulate the output in some kind of BitOutputStream class, like:
class BitOutputStream
{
std::ostream m_out;
unsigned m_bitsPending;
unsigned m_numPending;
public:
BitOutputStream(const char *fileName)
:m_out(... /* you can do this part */)
{
m_bitsPending = 0;
m_numPending = 0;
}
// write out the lower <count> bits of <bits>
void write(unsigned bits, unsigned count)
{
if (count > 16)
{
//do it in two steps to prevent overflow
write(bits>>16, count-16);
count=16;
}
//make space for new bits
m_numPending += count;
m_bitsPending <<= count;
//store new bits
m_bitsPending |= (bits & ((1<<count)-1));
//write out any complete bytes
while(m_numPending >= 8)
{
m_numPending-=8;
m_out.put((char)(m_bitsPending >> m_numPending));
}
}
//write out any remaining bits
void flush()
{
if (m_numPending > 0)
{
m_out.put((char)(m_bitsPending << (8-m_numPending)));
}
m_bitsPending = m_numPending = 0;
m_out.flush();
}
}
If your Huffman coder returns an array of 1s and 0s representing the bits that should and should not be set in the output, you can shift these bits onto an unsigned char. Every eight shifts, you start writing to the next character, ultimately outputting an array of unsigned char. The number of these compressed characters that you will output is equal to the number of bits divided by eight, rounded up to the nearest natural number.
In C, this is a relatively simple function, consisting of a left shift (<<) and a bitwise OR (|). Here is the function, with an example to make it runnable. To see it with more extensive comments, please refer to this GitHub gist.
#include <stdlib.h>
#include <stdio.h>
#define BYTE_SIZE 8
size_t compress_code(const int *code, const size_t code_length, unsigned char **compressed)
{
if (code == NULL || code_length == 0 || compressed == NULL) {
return 0;
}
size_t compressed_length = (code_length + BYTE_SIZE - 1) / BYTE_SIZE;
*compressed = calloc(compressed_length, sizeof(char));
for (size_t char_counter = 0, i = 0; char_counter < compressed_length && i < code_length; ++i) {
if (i > 0 && (i % BYTE_SIZE) == 0) {
++char_counter;
}
// Shift the last bit to be set left by one
(*compressed)[char_counter] <<= 1;
// Put the next bit onto the end of the unsigned char
(*compressed)[char_counter] |= (code[i] & 1);
}
// Pad the remaining space with 0s on the right-hand-side
(*compressed)[compressed_length - 1] <<= compressed_length * BYTE_SIZE - code_length;
return compressed_length;
}
int main(void)
{
const int code[] = { 0, 1, 0, 0, 0, 0, 0, 1, // 65: A
0, 1, 0, 0, 0, 0, 1, 0 }; // 66: B
const size_t code_length = 16;
unsigned char *compressed = NULL;
size_t compressed_length = compress_code(code, code_length, &compressed);
for (size_t i = 0; i < compressed_length; ++i) {
printf("%c\n", compressed[i]);
}
return 0;
}
You can then just write the characters in the array to a file, or even copy the array's memory directly to a file, to write the compressed output.
Reading the compressed characters into bits, which will allow you to traverse your Huffman tree for decoding, is done with right shifts (>>) and checking the rightmost bit with bitwise AND (&).

Best way to pass variables between Arduino and PC over serial?

I've written an Arduino sketch which reads data from a remote control receiver and returns a value between 0 and 1023 for that channel. I basically want to send this data (something in the format of channel:value, eg, Channel 1 : 1023, Channel 2 : 511) to a PC program (which I plan to write myself).
The most efficient way I can think to do this is to use two bytes of data, with the first 6 bits representing the channel (2^6 = 64 possible channels, way more than I need), and the last ten representing the value (2^10 = 1024, perfect). But I'm not sure on the best way to implement this in C++, or if this is even the most ideal way to do this. So:
What is the best way to craft individual bytes and work with binary numbers in C++? Preferably storing the values in memory as such (ie, no bool arrays, where each index takes up it's own byte). Two bytes of data is more than enough for what I need.
Is this the easiest/simplest/most efficient/recommended way to implement what I am trying to achieve? I basically want to pass variables as is between programs, are there any other ways to do this?
You can declare a packed struct to hold these two values:
struct chan_value_t
{
uint8_t channel : 6;
uint16_t value : 10;
};
But to send it as two bytes, you'll need to either (1) "union" it with a two-byte array:
union chan_value_t
{
struct {
uint8_t channel : 6;
uint16_t value : 10;
};
uint8_t bytes[2];
};
chan_value_t cv;
void setup()
{
Serial.begin( 9600 );
cv.channel = 2;
cv.value = 800;
for (int i=0; i<sizeof(cv.bytes); i++) {
Serial.print( cv.bytes[i], HEX );
Serial.print( ' ' );
}
Serial.println();
}
void loop() {}
(The struct is anonymous when nested in this union; the union has the name.)
Or (2) cast a pointer to the struct to a pointer to bytes:
struct chan_value_t {
uint8_t channel : 6;
uint16_t value : 10;
};
chan_value_t cv;
void setup()
{
Serial.begin( 9600 );
cv.channel = 2;
cv.value = 800;
uint8_t *bytes = (uint8_t *) &cv; // cast &cv to a pointer to bytes
for (int i=0; i<sizeof(cv); i++) {
Serial.print( bytes[i], HEX );
Serial.print( ' ' );
}
Serial.println();
}
void loop() {}
They both print the hexadecimal value of the bytes: 0x02 and 0xC8. 800 is 0x320, shifted left by 6 bits is 0xC800.
To send this to the PC, you may want to start with a special character sequence and finish with a checksum of some sort (Fletcher checksum is easy). Then it's easy to throw away garbage characters and know when there are transmission errors.
This is aimed at your no. 2 question.
OSC (OpenSoundControl) is a convenient way to send messages across different platforms and devices. Libraries exist for most platforms.
You could use the library OSC for Arduino and implement your own solution to the specification or using a library that fits your context.
The message you mention could be sent as /channel/1 /value/1023

bit pattern matching and replacing

I come across a very tricky problem with bit manipulation.
As far as I know, the smallest variable size to hold a value is one byte of 8 bits. The bit operations available in C/C++ apply to an entire unit of bytes.
Imagine that I have a map to replace a binary pattern 100100 (6 bits) with a signal 10000 (5 bits). If the 1st byte of input data from a file is 10010001 (8 bits) being stored in a char variable, part of it matches the 6 bit pattern and therefore be replaced by the 5 bit signal to give a result of 1000001 (7 bits).
I can use a mask to manipulate the bits within a byte to get a result of the left most bits to 10000 (5 bit) but the right most 3 bits become very tricky to manipulate. I cannot shift the right most 3 bits of the original data to get the correct result 1000001 (7 bit) followed by 1 padding bit in that char variable that should be filled by the 1st bit of next followed byte of input.
I wonder if C/C++ can actually do this sort of replacement of bit patterns of length that do not fit into a Char (1 byte) variable or even Int (4 bytes). Can C/C++ do the trick or we have to go for other assembly languages that deal with single bits manipulations?
I heard that Power Basic may be able to do the bit-by-bit manipulation better than C/C++.
If time and space are not important then you can convert the bits to a string representation and perform replaces on the string, then convert back when needed. Not an elegant solution but one that works.
<< shiftleft
^ XOR
>> shift right
~ one's complement
Using these operations, you could easily isolate the pieces that you are interested in and compare them as integers.
say the byte 001000100 and you want to check if it contains 1000:
char k = (char)68;
char c = (char)8;
int i = 0;
while(i<5){
if((k<<i)>>(8-3-i) == c){
//do stuff
break;
}
}
This is very sketchy code, just meant to be a demonstration.
I wonder if C/C++ can actually do this
sort of replacement of bit patterns of
length that do not fit into a Char (1
byte) variable or even Int (4 bytes).
What about std::bitset?
Here's a small bit reader class which may suit your needs. Of course, you may want to create a bit writer for your use case.
#include <iostream>
#include <sstream>
#include <cassert>
class BitReader {
public:
typedef unsigned char BitBuffer;
BitReader(std::istream &input) :
input(input), bufferedBits(8) {
}
BitBuffer peekBits(int numBits) {
assert(numBits <= 8);
assert(numBits > 0);
skipBits(0); // Make sure we have a non-empty buffer
return (((input.peek() << 8) | buffer) >> bufferedBits) & ((1 << numBits) - 1);
}
void skipBits(int numBits) {
assert(numBits >= 0);
numBits += bufferedBits;
while (numBits > 8) {
buffer = input.get();
numBits -= 8;
}
bufferedBits = numBits;
}
BitBuffer readBits(int numBits) {
assert(numBits <= 8);
assert(numBits > 0);
BitBuffer ret = peekBits(numBits);
skipBits(numBits);
return ret;
}
bool eof() const {
return input.eof();
}
private:
std::istream &input;
BitBuffer buffer;
int bufferedBits; // How many bits are buffered into 'buffer' (0 = empty)
};
Use a vector<bool> if you can read your data into the vector mostly at once. It may be more difficult to find-and-replace sequences of bits, though.
If I understood your questions correctly, you have an input stream and and output stream and you want to replace the 6bits of the input with 5 in the output - and your output still should be a bit stream?
So, the most important programmer's rule can be applied: Divide et impera!
You should split your component in three parts:
Input Stream converter: Convert every pattern in the input stream to a char array (ring) buffer. If I understood you correctly your input "commands" are 8bit long, so there is nothing special about this.
Do the replacement on the ring buffer in a way that you replace every matching 6-bit pattern with the 5bit one, but "pad" the 5 bit with a leading zero, so the total length is still 8bit.
Write an output handler that reads from the ring buffer and let this output handler write only the 7 LSB to the output stream from each input byte. Of course some bit manipulation is necessary again for this.
If your ring buffer size can be divided by 8 and 7 (= is a multiple of 56) you will have a clean buffer at the end and can start again with 1.
The most simplest way to implement this is to iterate over this 3 steps as long as input data is available.
If a performance really matters and you are running on a multi-core CPU you even could split the steps and 3 threads, but then you must carefully synchronize the access to the ring buffer.
I think the following does what you want.
PATTERN_LEN = 6
PATTERNMASK = 0x3F //6 bits
PATTERN = 0x24 //b100100
REPLACE_LEN = 5
REPLACEMENT = 0x10 //b10000
void compress(uint8* inbits, uint8* outbits, int len)
{
uint16 accumulator=0;
int nbits=0;
uint8 candidate;
while (len--) //for all input bytes
{
//for each bit (msb first)
for (i=7;i<=0;i--)
{
//add 1 bit to accumulator
accumulator<<=1;
accumulator|=(*inbits&(1<<i));
nbits++;
//check for pattern
candidate = accumulator&PATTERNMASK;
if (candidate==PATTERN)
{
//remove pattern
accumulator>>=PATTERN_LEN;
//add replacement
accumulator<<=REPLACE_LEN;
accumulator|=REPLACMENT;
nbits+= (REPLACE_LEN - PATTERN_LEN);
}
}
inbits++;
//move accumulator to output to prevent overflow
while (nbits>8)
{
//copy the highest 8 bits
nbits-=8;
*outbits++ = (accumulator>>nbits)&0xFF;
//clear them from accumulator
accumulator&= ~(0xFF<<nbits);
}
}
//copy remainder of accumulator to output
while (nbits>0)
{
nbits-=8;
*outbits++ = (accumulator>>nbits)&0xFF;
accumulator&= ~(0xFF<<nbits);
}
}
You could use a switch or a loop in the middle to check the candidate against multiple patterns. There might have to be some special handling after doing a replacment to ensure the replacement pattern is not re-checked for matches.
#include <iostream>
#include <cstring>
size_t matchCount(const char* str, size_t size, char pat, size_t bsize) noexcept
{
if (bsize > 8) {
return 0;
}
size_t bcount = 0; // curr bit number
size_t pcount = 0; // curr bit in pattern char
size_t totalm = 0; // total number of patterns matched
const size_t limit = size*8;
while (bcount < limit)
{
auto offset = bcount%8;
char c = str[bcount/8];
c >>= offset;
char tpat = pat >> pcount;
if ((c & 1) == (tpat & 1))
{
++pcount;
if (pcount == bsize)
{
++totalm;
pcount = 0;
}
}
else // mismatch
{
bcount -= pcount; // backtrack
//reset
pcount = 0;
}
++bcount;
}
return totalm;
}
int main(int argc, char** argv)
{
const char* str = "abcdefghiibcdiixyz";
char pat = 'i';
std::cout << "Num matches = " << matchCount(str, 18, pat, 7) << std::endl;
return 0;
}

nesC (C-like) question

This is the code from TestAVBoardM.nc file in nesC language:
#define BUFFERLEN 32768
uint32_t gBuffer[BUFFERLEN] __attribute__((aligned(32)));
uint32_t gNumSamples = BUFFERLEN/4;
event void Audio.ready(result_t success)
{
call Audio.audioRecord(gBuffer,gNumSamples));
return;
}
The buffer gBuffer is used to store sound recording samples. Samples are 16-bit stereo samples packed into a 32-bit word. Left samples are in the low 16 bits. Right samples are in the high 16 bits.
What makes me confused is the number of samples gNumSamples. As I understand, gNumSamples should be BUFFERLEN since gBuffer[i] is 32-bit word (16 bits for left channel + 16 for right channel). Am I right? (I changed gNumSamples = BUFFERLEN and it didn't work).
Thanks for your help.
This is how gBuffer is used:
command result_t Audio.audioRecord(uint32_t *buffer, uint32_t numSamples){
uint32_t *pBuf;
uint32_t bufpos;
bool initPlay;
atomic{
initPlay = gInitPlay;
}
if(initPlay == TRUE){
//gate the acceptance of a record command until we signal audio.ready();
return FAIL;
}
atomic{
pBuf = gRxBuffer;
bufpos = gRxBufferPos;
}
if( (bufpos != 0) || (pBuf != NULL)){
//gate acceptance due to ongoing record command
return FAIL;
}
atomic{
gRxBuffer = buffer;
gRxBufferPos = 0;
gRxNumBytes = numSamples * 4;
}
call BulkTxRx.BulkReceive((uint8_t *)buffer, ((numSamples*4) > 8188)? 8188: (numSamples*4));
return SUCCESS;
}
I just came across this question when looking for nesC. Just answering it for whatever it's worth.
If you look at the audioRecord function, they are multiplying numSamples by 4 to compensate for the division by 4 (BUFFERLEN/4) earlier. Without the full context, I cannot tell why they have to divide it in the first place. My guess would be gBuffer is divided into 4 parts, each part storing numSamples, so when the producer is writing to one part, the consumer can read from another part.