I am adapting the example for the Arduino AutoAnalogAudio library entitled SDAudioWavPlayer, which can be found in Examples->AutoAnalogAudio->SDAudio->SDAudioWavPlayer. This example uses interrupts to repeatedly call the function void loadBuffer(), whose code is below:
/* Function called from DAC interrupt after dacHandler(). Loads data into the dacBuffer */
void loadBuffer() {
  if (myFile) {
    if (myFile.available()) {
      if (aaAudio.dacBitsPerSample == 8) {
        //Load 32 samples into the 8-bit dacBuffer
        myFile.read((byte*)aaAudio.dacBuffer, MAX_BUFFER_SIZE);
      } else {
        //Load 32 samples (64 bytes) into the 16-bit dacBuffer
        myFile.read((byte*)aaAudio.dacBuffer16, MAX_BUFFER_SIZE * 2);
        //Convert the 16-bit samples to 12-bit
        for (int i = 0; i < MAX_BUFFER_SIZE; i++) {
          aaAudio.dacBuffer16[i] = (aaAudio.dacBuffer16[i] + 0x8000) >> 4;
        }
      }
    } else {
      #if defined (AUDIO_DEBUG)
      Serial.println("File close");
      #endif
      myFile.close();
      aaAudio.disableDAC();
    }
  }
}
The specific part I am concerned with is the second branch of the if statement:
{
  //Load 32 samples (64 bytes) into the 16-bit dacBuffer
  myFile.read((byte*)aaAudio.dacBuffer16, MAX_BUFFER_SIZE * 2);
  //Convert the 16-bit samples to 12-bit
  for (int i = 0; i < MAX_BUFFER_SIZE; i++) {
    aaAudio.dacBuffer16[i] = (aaAudio.dacBuffer16[i] + 0x8000) >> 4;
  }
}
Despite the comment, MAX_BUFFER_SIZE is 256, so 512 bytes are read into aaAudio.dacBuffer16. That data was originally 16-bit signed integers (+/- 32K), while dacBuffer16 is an array of 16-bit unsigned integers (0-64K). The sign is removed by going through the array and adding 2^15 (0x8000) to each element: the negative numbers overflow, leaving just their positive part, while positive numbers are simply increased by 2^15. Thus the values are rescaled to lie in 0-64K. The result is then shifted 4 places right so that only the highest 12 bits remain, which is what the Arduino DAC can handle. This all happens in the line

aaAudio.dacBuffer16[i] = (aaAudio.dacBuffer16[i] + 0x8000) >> 4;
So far so good.
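Worked through for a few sample values (assuming the file really does contain plain signed 16-bit PCM), the conversion behaves like this:

-32768 + 0x8000 = 0x0000, >> 4 gives 0     (DAC minimum)
     0 + 0x8000 = 0x8000, >> 4 gives 2048  (DAC mid-scale)
 32767 + 0x8000 = 0xFFFF, >> 4 gives 4095  (DAC maximum)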
Now I want to be able to programmatically reduce the volume. As far as I can tell, the library does not provide a function to do that, so I thought the simplest thing to do was to change the '4' to 'N' and increase the amount of shifting to 5, 6, 7, etc., e.g.

aaAudio.dacBuffer16[i] = (aaAudio.dacBuffer16[i] + 0x8000) >> N;

where N is an integer. I tried this, but I got a terribly distorted result which I did not understand.
While fiddling around trying different things, I tried the following, which works:
uint16_t sample;
int N = 5;
for (int i = 0; i < MAX_BUFFER_SIZE; i++)
{
  sample = (aaAudio.dacBuffer16[i] + 0x8000);
  sample = sample >> N;
  // sample = sample / 40;
  aaAudio.dacBuffer16[i] = sample;
}
You can also see that I have commented out simply dividing by a number, which works if I want finer control.
My problem is that I do not see what the difference is between the two bits of code.
Can anybody enlighten me?
I'm trying to figure out how to use masked loads and stores for the last few elements to be processed. My use case involves converting a packed 10-bit data stream to 16-bit, which means loading 5 bytes for every 4 shorts stored. This results in different masks of different types.
The main loop itself is not a problem. But at the end I'm left with up to 19 bytes of input / 15 shorts of output, which I thought I could process in up to two loop iterations using 128-bit vectors. Here is the outline of the code:
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

void convert(uint16_t* out, ptrdiff_t n, const uint8_t* in)
{
    uint16_t* const out_end = out + n;
    for(uint16_t* out32_end = out + (n & -32); out < out32_end; in += 40, out += 32) {
        /*
         * insert main loop here using ZMM vectors
         */
    }
    if(out_end - out >= 16) {
        /*
         * insert half-sized iteration here using YMM vectors
         */
        in += 20;
        out += 16;
    }
    // up to 19 byte input remaining, up to 15 shorts output
    const unsigned out_remain = out_end - out;
    const unsigned in_remain = (out_remain * 10 + 7) / 8;
    unsigned in_mask = (1 << in_remain) - 1;
    unsigned out_mask = (1 << out_remain) - 1;
    while(out_mask) {
        __mmask16 load_mask = _cvtu32_mask16(in_mask);
        __m128i packed = _mm_maskz_loadu_epi8(load_mask, in);
        /* insert computation here. No masks required */
        __mmask8 store_mask = _cvtu32_mask8(out_mask);
        _mm_mask_storeu_epi16(out, store_mask, packed);
        in += 10;
        out += 8;
        in_mask >>= 10;
        out_mask >>= 8;
    }
}
(Compile with -O3 -mavx2 -mavx512f -mavx512bw -mavx512vl -mavx512dq)
My idea was to create a bit mask from the number of remaining elements (since I know it fits comfortably in an integer / mask register), then shift values out of the mask as they are processed.
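For example, with the maximum leftover of out_remain = 15, in_remain = (15 * 10 + 7) / 8 = 19, so the masks start out as out_mask = 0x7FFF and in_mask = 0x7FFFF. After the first iteration consumes 8 shorts / 10 bytes, the shifts leave out_mask = 0x7F and in_mask = 0x1FF for the final partial iteration.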
I have two issues with this approach:
I'm re-setting the masks from GP registers each iteration instead of using the kshift family of instructions
_cvtu32_mask8 (kmovb) is the only instruction in this code that requires AVX512DQ. Limiting the number of suitable hardware platforms just for that seems weird
What I'm wondering about:
Can I cast a __mmask32 to __mmask16 and __mmask8?
If I can, I could set it once from the GP register, then shift it in its own register. Like this:
__mmask32 load_mask = _cvtu32_mask32(in_mask);
__mmask32 store_mask = _cvtu32_mask32(out_mask);
while(out < out_end) {
    __m128i packed = _mm_maskz_loadu_epi8((__mmask16) load_mask, in);
    /* insert computation here. No masks required */
    _mm_mask_storeu_epi16(out, (__mmask8) store_mask, packed);
    load_mask = _kshiftri_mask32(load_mask, 10);
    store_mask = _kshiftri_mask32(store_mask, 8);
    in += 10;
    out += 8;
}
GCC seems to be fine with this pattern. But Clang and MSVC create worse code, moving the mask in and out of GP registers without any apparent reason.
I am trying to understand the code of the fpaq0 arithmetic compressor, but I am not able to fully understand it. Here is the link to the code: fpaq0.cpp
I am not able to understand exactly how ct[512][2] and cxt are working. I am also not clear on how the decoder works, or why e.encode(0) is called before encoding every character.
NOTE: I have understood the arithmetic coder presented in the link Data Compression with Arithmetic Encoding
void update(int y) {
  if (++ct[cxt][y] > 65534) {
    ct[cxt][0] >>= 1;
    ct[cxt][1] >>= 1;
  }
  if ((cxt+=cxt+y) >= 512)
    cxt=1;
}

// Assume a stationary order 0 stream of 9-bit symbols
int p() const {
  return 4096*(ct[cxt][1]+1)/(ct[cxt][0]+ct[cxt][1]+2);
}
inline void Encoder::encode(int y) {
  // Update the range
  const U32 xmid = x1 + ((x2-x1) >> 12) * predictor.p();
  assert(xmid >= x1 && xmid < x2);
  if (y)
    x2=xmid;
  else
    x1=xmid+1;
  predictor.update(y);

  // Shift equal MSB's out
  while (((x1^x2)&0xff000000)==0) {
    putc(x2>>24, archive);
    x1<<=8;
    x2=(x2<<8)+255;
  }
}
inline int Encoder::decode() {
  // Update the range
  const U32 xmid = x1 + ((x2-x1) >> 12) * predictor.p();
  assert(xmid >= x1 && xmid < x2);
  int y=0;
  if (x<=xmid) {
    y=1;
    x2=xmid;
  }
  else
    x1=xmid+1;
  predictor.update(y);

  // Shift equal MSB's out
  while (((x1^x2)&0xff000000)==0) {
    x1<<=8;
    x2=(x2<<8)+255;
    int c=getc(archive);
    if (c==EOF) c=0;
    x=(x<<8)+c;
  }
  return y;
}
fpaq0 is a file compressor which uses an order-0 bitwise model for modeling and a 12-bit carry-less arithmetic coder for the entropy coding stage. ct[512][2] stores the counters each context uses to compute symbol probabilities. The context (order-0 in fpaq0) is built from the partial bits of the current byte, with a leading one bit (to simplify the calculation).
For an easier explanation, let's skip the EOF symbol for now. Without it, the order-0 context is used like this during encoding (simplified):
// Full byte encoding
int cxt = 1; // context starts with a leading one
for (int i = 0; i < 8; ++i) {
  // Encoding part
  int y = ReadNextBit();
  int p = GetProbability(cxt);
  EncodeBit(y, p);

  // Model updating
  UpdateCounter(cxt, y); // update the related counter
  cxt = (cxt << 1) | y;  // shift left and insert the new bit
}
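For example, after coding the first three bits 1, 0, 1 of a byte, cxt = 0b1101 = 13, so ct[13] holds the counts used to predict the fourth bit; the leading one keeps contexts of different lengths from colliding. In update() above this is the (cxt += cxt + y) step, and the >= 512 check resets the context to 1 once a full 9-bit symbol has been coded.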
Decoding uses the context the same way (again with the EOF symbol omitted, simplified):
// Full byte decoding
int cxt = 1; // context starts with a leading one
for (int i = 0; i < 8; ++i) {
  // Decoding part
  int p = GetProbability(cxt);
  int y = DecodeBit(p);
  WriteBit(y);

  // Model updating
  UpdateCounter(cxt, y); // update the related counter
  cxt = (cxt << 1) | y;  // shift left and insert the new bit
}
fpaq0 is designed as a streaming compressor, meaning it doesn't need to know the exact length of the input stream. So, how does the decoder know when to stop? The EOF symbol is used exactly for that. Before encoding each byte, a zero bit is encoded as a flag to indicate that more data follows; a one bit indicates that the end of the stream has been reached. That is how the decoder knows when to stop, and it is why the context model is 9 bits (EOF flag + 8 data bits).
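That is also why e.encode(0) is called before every character. Roughly, the encoder's main loop looks like this (a sketch from memory, not verbatim fpaq0 code):

int c;
while ((c = getc(in)) != EOF) {
  e.encode(0);                  // "more data follows" flag
  for (int i = 7; i >= 0; --i)
    e.encode((c >> i) & 1);     // 8 data bits, MSB first
}
e.encode(1);                    // EOF flag: tells the decoder to stop
e.flush();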
Now, the last part: probability calculation. fpaq0 uses just the counts of past symbols under the order-0 context to calculate the final probability:

n0 = count of 0
n1 = count of 1
p = n1 / (n0 + n1)

There are two implementation details that should be addressed: counter overflow and division by zero.
Counter overflow is addressed by halving both counts when either reaches a threshold (65534 in update() above). Since only the ratio of the counts matters for p, halving roughly preserves the estimate.
Division by zero is addressed by adding one to each count in the formula. So,

p = (n1 + 1) / ((n0 + 1) + (n1 + 1))
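For example, if ct[cxt][0] == 3 and ct[cxt][1] == 1, then p() returns 4096 * (1 + 1) / (3 + 1 + 2) = 1365, i.e. P(1) ≈ 1/3 expressed in 12-bit fixed point.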
I am implementing an audio channel mixer using Viktor T. Toth's algorithm, trying to mix two audio channel streams.
In the code, quantization_ is the bit depth of a channel expressed in bytes (1, 2 or 3 for 8-, 16- or 24-bit samples). My mix function takes pointers to destination and source uint8_t buffers, mixes the two channels, and writes the result into the destination buffer. Because the data arrives in uint8_t buffers, I use the addition, division, and multiplication operations below to assemble the actual 8-, 16- or 24-bit samples and then split them back into bytes.
Generally, it gives the expected output sample values. However, some samples come out near zero when they shouldn't, as I can see in Audacity. In the screenshot, the bottom 2 signals are the two mono channels and the top one is the mixed channel. It can be seen that there are some very low values, especially in the middle.
Below is my mix function:
void audio_mixer::mix(uint8_t* dest, const uint8_t* source)
{
    uint64_t mixed_sample = 0;
    uint64_t dest_sample = 0;
    uint64_t source_sample = 0;
    uint64_t factor = 0;

    for (int i = 0; i < channel_size_; ++i)
    {
        dest_sample = 0;
        source_sample = 0;
        factor = 1;

        // Assemble one little-endian sample from each buffer
        for (int j = 0; j < quantization_; ++j)
        {
            dest_sample += factor * static_cast<uint64_t>(*dest++);
            source_sample += factor * static_cast<uint64_t>(*source++);
            factor = factor * 256;
        }

        mixed_sample = (dest_sample + source_sample) - (dest_sample * source_sample / factor);

        // Write the mixed sample back, byte by byte
        dest -= quantization_;
        for (int k = 0; k < quantization_; ++k)
        {
            *dest++ = static_cast<uint8_t>(mixed_sample % 256);
            mixed_sample = mixed_sample / 256;
        }
    }
}
It seems like you aren't treating the signed audio samples correctly. The horizontal line should be zero voltage from your audio signal.
If you look at the positive voltage audio samples they obey your equation correctly (except for the peak values in the center). The negative values are being compressed which makes me feel like they are being treated as small positive voltages instead of negative voltages.
In other words, maybe those unsigned ints should be signed ints so the top bit indicates the voltage polarity and you can have audio samples in the range +127 to -128.
Those peak values in the center seem like they are wrapping around at 255, which would be the peak value for an unsigned byte representation of your audio. I'm not sure how this would happen, but it seems related to the unsigned vs. signed signals.
Maybe you should try the other formula Viktor provided in his document:
Z = 2(A+B) - (AB/128) - 256
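In code, that formula could look something like this for 8-bit samples (a sketch; mix8 is my name, and as far as I remember Viktor's page pairs the equation above with a second branch, A*B/128, for when both samples are below mid-scale, so I've included both):

#include <cstdint>

// Mix two unsigned 8-bit samples (0..255, silence at 128),
// following my reading of Viktor T. Toth's piecewise formula.
uint8_t mix8(uint8_t a, uint8_t b)
{
    int z;
    if (a < 128 && b < 128)
        z = (int(a) * int(b)) / 128;                              // both below mid-scale
    else
        z = 2 * (int(a) + int(b)) - (int(a) * int(b)) / 128 - 256;
    if (z < 0)   z = 0;                                           // clamp, just in case
    if (z > 255) z = 255;
    return uint8_t(z);
}

The same idea should extend to your 16- and 24-bit cases by replacing 128 with half the sample range and 256 with the full range.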
I'm writing a small program for the Arduino that is able to read RGB values from a char array of HEX color codes. Let me just give you an example, because it is hard to explain otherwise.
From the Arduino serial monitor I, for example, send this:
/1ffffff000000
The first character tells the Arduino that this will be a sequence of hex color codes. The second character tells it how many color codes there will be (counting starts at 0, so 1 means two colors). It then loops through the six characters of each HEX code and adds the result to the respective place in the hex[] char array. The hex[] array is two dimensional: the first dimension holds the sequence number of a color and the second stores the RGB values of that color.
The output of this is the following:
255
255
255
0
0
0
//the first part is okay, but the second gets all messed up.
255 255 0 0 0 0 0
//the RED value of the next color gets set as the BLUE value of the previous color
And here is the code. I couldn't find any easier method for this idea to work. If you have suggestions on how to make this better or more efficient, please let me know.
Thanks in advance!
char barva[10];
char hex[10][2];
long bluetoothVal;

bluetoothVal = Serial.read();
if (bluetoothVal == '/')
{
  delay(2);
  Serial.flush();
  input = Serial.read();
  char load = input;
  int steps = load - '0';
  for (int counter = 0; counter <= steps; counter++)
  {
    for (int i = 0; i <= 5; i++)
    {
      delay(2);
      Serial.flush();
      delay(2);
      Serial.flush();
      bluetoothVal = Serial.read();
      char load = bluetoothVal;
      barva[i] = load;
    }
    long int rgb = strtol(barva, 0, 16); //=> rgb = 0x001234FE;
    hex[counter][0] = (byte)(rgb >> 16);
    hex[counter][1] = (byte)(rgb >> 8);
    hex[counter][2] = (byte)(rgb);
    Serial.println(hex[counter][0]);
    Serial.println(hex[counter][1]);
    Serial.println(hex[counter][2]);
  }
  for (int i = 0; i <= 1; i++)
  {
    Serial.println("");
    Serial.println(hex[i][0]);
    Serial.println(hex[i][1]);
    Serial.println(hex[i-1][2]);
  }
}
hex should be declared as

char hex[10][3];

You are accessing hex as hex[counter][2] = (byte)(rgb); at one place, which needs a third element in the second dimension, so you require a 10 * 3 array. With char hex[10][2] that write is out of bounds and, because the rows are laid out contiguously in memory, it lands on hex[counter+1][0]. That is exactly why the RED value of the next color gets overwritten by the BLUE value of the previous one.
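With the corrected declaration the three stores line up with the array (showing just the affected lines):

char hex[10][3];                      // one row per color: R, G, B

hex[counter][0] = (byte)(rgb >> 16);  // red
hex[counter][1] = (byte)(rgb >> 8);   // green
hex[counter][2] = (byte)(rgb);        // blue, now in bounds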
Hey folks!
I got this image.bmp. When I read it with all padding included and such, I get this result.
What am I doing wrong here, besides reading the image upside down? I can't find anything relevant on Wikipedia or by googling. It seems that after 24 pixels of width the image is mirrored in blocks of 8 pixels. Why!? I don't get it!? How can I fix this!?
I'm reading the file with some C++ code on Windows, reading the BMP file raw.
The image file is monochrome, 1 bit per pixel.
Code for showing bitmap data:
unsigned int count = 0;                                // bit counting variable
unsigned char *bitmap_data = new unsigned char[size];  // array containing the raw data of the image

for(unsigned int i = 0; i < size; i++){    // go through every byte of bitmap_data
    for(int j = 1; j < 256; j *= 2){       // j = 1, 2, 4, 8, 16, 32, 64, 128: walks every bit of the byte
        if(count >= width){                // check if the row has ended
            cout << "\n";                  // line feed
            while(count > 32) count -= 32; // for padding
            if(count < 24) i++;
            if(count < 16) i++;
            if(count < 8) i++;
            count = 0;                     // reset bit count and break out to the next row
            break;
        }
        if(i >= size) break;               // just in case
        count++;                           // increment the bit counter; must come after the end-of-row check
        if(bitmap_data[i] & j){            // compare bits
            cout << (char)0xDB;            // block
        }else{
            cout << (char)' ';             // space
        }
    }
}
Thanks in advance!
You are almost certainly interpreting/outputting the bits in the wrong order within each byte. This results in each group of 8 pixels being flipped left to right.
The BMP format states that the left-most pixel is stored in the most significant bit of the byte, and the right-most pixel in the least significant. In your code, you are iterating through the bits the wrong way.
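Iterating from the most significant bit down should fix it. A sketch of just the inner loop header (the body, including your row-end check, stays the same):

for(int j = 0x80; j > 0; j >>= 1){  // j = 128, 64, 32, ..., 1: MSB (left-most pixel) first
    // ... same body as before: row-end check, then
    // if(bitmap_data[i] & j) cout << (char)0xDB; else cout << (char)' ';
}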