I am working on a project, it reads data(24bits per sample) from binary files, decompress it to 32bits and convert it signed int.
I have this function for the decompress and convert tasks, and it causes wasting a lot of time in the project for huge data.
signed int convertedSample(QByteArray samp){
QString sample = QString(samp.toHex() + "00");
bool ok;
signed int decimalSample = sample.toInt(&ok, 16);
return decimalSample;
}
To read data from files and store it, I do:
while(!file.atEnd()){
QByteArray chunkSamples = file.read(chunkSize);// chunkSize here is the size of 1 million of samples
for (int i = 0; i < chunkSamples.size() ; i++){
QByteArray samp = chunkSamples.mid(i * 3, 3); // 24bits
buffer.append(convertedSample(samp)); // buffer is a QVector of int
}
Any help to make it faster ?
Thanks to the guys in the comments, I've made some improvements over time. I avoided using strings for the conversion because it seems that using strings is always slow and replaced it with using bitwise shift and OR to combine the individual bytes into int.
I replaced this code:
QString sample = QString(samp.toHex() + "00");
bool ok;
int decimalSample = sample.toInt(&ok, 16);
with :
int decimalSample = ((samp[0] << 24) | (samp[1] << 16) | (samp[2] << 8));
I have written a 8b10b encoder that generates a stream of bytes intended to be sent to a serial transmitter which sends the bytes as-is LSb first.
What I'm doing here is basically lay down groups of 10 bits (encoded from the input stream of bytes) on groups of 8, so a varying number of bits get carried over from one output byte to the next - kind of like in music/rhythm.
The program has been successfully tested, but it is about 4-5x too slow for my application. I think it comes from the fact that every bit has to be looked up in an array. My guts tell me we could make that faster by having some sort of rolling mask but I can't yet see how to do that even by swapping out the 3d array of booleans to a 2D array of integers.
Any pointer or other idea?
Here is the code. Please ignore most of the macros and some of the code related to deciding which byte is to be written as this is application-specific.
Header:
#ifndef TX_BYTESTREAM_GEN_H_INCLUDED
#define TX_BYTESTREAM_GEN_H_INCLUDED
#include <stdint.h> //for standard portable types such as uint16_t
#define MAX_USB_TRANSFER_SIZE 1016 //Bytes, size of the max payload in a USB transaction. Determined using FT4222_GetMaxTRansferSize()
#define MAX_USB_PACKET_SIZE 62 //Bytes, max size of the payload of a single USB packet
#define MANDATORY_TX_PACKET_BLOCK 5 //Bytes, constant - equal to the minimum number of bytes of TX packet necessary to exactly transfer blocks of 10 bits of encoded data (LCF of 8 and 10)
#define SYNC_CHARS_MAX_INTERVAL 172 //Target number of payload bytes between sync chars. Max is 188 before desynchronisation
#define ROUND_UP(N, S) ((((N) + (S) - 1) / (S)) * (S)) //Macro to round up the integer N to the largest multiple of the integer S
#define ROUND_DOWN(N,S) ((N / S) * S) //Same rounding down
#define N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz) (ROUND_UP((pcktSz*1000/(SYNC_CHARS_MAX_INTERVAL+2)),1000)/1000) //Number of sync (K28.5) character/byte pairs in a given packet
#define TX_PAYLOAD_SIZE(pcktSz) ((pcktSz*4/5)-2*N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz)) //Size in bytes of the payload data before encoding in a single TX packet
#define MAX_TX_PACKET_SIZE (ROUND_DOWN((MAX_USB_TRANSFER_SIZE-MAX_USB_PACKET_SIZE),(MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK))) //Maximum size in bytes of a TX packet
#define DEFAULT_TX_PACKET_SIZE (MAX_TX_PACKET_SIZE-MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK) //Default size in bytes of a TX packet with some margin
#define MAX_TX_PAYLOAD_SIZE (TX_PAYLOAD_SIZE(MAX_TX_PACKET_SIZE)) //Maximum size in bytes of the payload in a TX packet
#define DEFAULT_TX_PAYLOAD_SIZE (TX_PAYLOAD_SIZE(DEFAULT_TX_PACKET_SIZE))//Default size in bytes of the payload in a TX packet with some margin
//See string descriptors below for definitions. Error codes are individual bits so can be combined.
enum ErrCode
{
NO_ERR = 0,
INVALID_DIN_SIZE = 1,
INVALID_DOUT_SIZE = 2,
NULL_DIN_PTR = 4,
NULL_DOUT_PTR = 8
};
char const * const ERR_CODE_DESC[] = {
"No error",
"Invalid size of input data",
"Invalid size of output buffer",
"Input data pointer is NULL",
"Output buffer pointer is NULL"
};
/** #brief Generates the bytestream to the transmitter by encoding the incoming data using 8b10b encoding
and inserting K28.5 synchronisation characters to maintain the synchronisation with the demodulator (LVDS passthrough mode)
#arg din is a pointer to an allocated array of bytes which contains the data to encode
#arg dinSize is the size of din in bytes. This size must be equal to TX_PAYLOAD_SIZE(doutSize)
#arg dout is a pointer to an allocated array of bytes which is intended to contain the output bytestream to the transmitter
#arg doutSize is the size of dout in bytes. This size must meet the conditions at the top of this function's implementation. Use DEFAULT_TX_PACKET_SIZE if in doubt.
#return error code (c.f. ErrCode) **/
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize);
#endif // TX_BYTESTREAM_GEN_H_INCLUDED
Source file:
#include "TX_bytestream_gen.h"
#include <cstddef> //NULL
#define N_BYTE_VALUES (256+1) //256 possible data values + 1 special character (only accessible to this module)
#define N_ENCODED_BITS 10 //Number of bits corresponding to the 8b10b encoding of a byte
//Map the current running disparity, the desired value to encode to the array of encoded bits for 8b10b encoding.
//The Last value is the K28.5 sync character, only accessible to this module
//Notation = MSb to LSb
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
//Long table (see appendix)
};
//New value of the running disparity after encoding with the specified previous running disparity and requested byte value (c.f. above)
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
//Long table (see appendix)
};
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize)
{
static bool RDp = false; //Running disparity is initially negative
int ret = 0;
//If the output buffer size is not a multiple of the mandatory payload block or of the USB packet size, or if it cannot be held in a single USB transaction
//return an invalid output buffer size error
if(doutSize == 0 || (doutSize % MANDATORY_TX_PACKET_BLOCK) || (doutSize % MAX_USB_PACKET_SIZE) || (doutSize > MAX_TX_PACKET_SIZE)) //Temp
ret |= INVALID_DOUT_SIZE;
//If the input data size is not consistent with the output buffer size, return the appropriate error code
if(dinSize == 0 || dinSize != TX_PAYLOAD_SIZE(doutSize))
ret |= INVALID_DIN_SIZE;
if(din == NULL)
ret |= NULL_DIN_PTR;
if(dout == NULL)
ret |= NULL_DOUT_PTR;
//If everything checks out, carry on
if(ret == NO_ERR)
{
uint16_t iByteIn = 0; //Index of the byte of input data currently being processed
uint16_t iByteOut = 0; //Index of the output byte currently being written to
uint8_t iBitOut = 0; //Starts with LSb
int16_t nBytesUntilSync = 0; //Countdown of bytes until a sync marker needs to be sent. Cyclic.
//For all output bytes to generate
while(iByteOut < doutSize)
{
bool sync = false; //Initially this byte is not considered a sync byte (in which case the next byte of data will be processed)
//If the maximum interval between sync characters has been reached, mark the two next bytes as sync bytes and reset the counter
if(nBytesUntilSync <= 0)
{
sync = true;
if(nBytesUntilSync == -1) //After the second SYNC is written, the counter is reset
{
nBytesUntilSync = SYNC_CHARS_MAX_INTERVAL;
}
}
//Append bit by bit the encoded data of the byte to write to the output bitstream (carried over from byte to byte) - LSb first
//The byte to write is either the last byte of the encodedBits map (the sync character K28.5) if sync is set, or the next byte of
//input data if it isn't
uint16_t const byteToWrite = (sync?(N_BYTE_VALUES-1):din[iByteIn]);
for(int8_t iEncodedBit = N_ENCODED_BITS-1 ; iEncodedBit >= 0 ; --iEncodedBit, iBitOut++)
{
//If the current output byte is complete, reset the bit index and select the next one
if(iBitOut >= 8)
{
iByteOut++;
iBitOut = 0;
}
//Effectively sets the iBitOut'th bit of the iByteOut'th byte out to the encoded value of the byte to write
bool bitToWrite = encodedBits[RDp][byteToWrite][iEncodedBit]; //Temp
dout[iByteOut] ^= (-bitToWrite ^ dout[iByteOut]) & (1 << iBitOut);
}
//The running disparity is also updated as per the standard (to achieve DC balance)
RDp = encodingDisparity[RDp][byteToWrite]; //Update the running disparity
//If sync was not set, this means a byte of the input data has been processed, in which case take the next one in
//Also decrement the synchronisation counter
if(!sync) {
iByteIn++;
}
//In any case, decrease the synchronisation counter. Even sync characters decrease it (c.f. top of while loop)
nBytesUntilSync--;
}
}
return ret;
}
Testbench:
#include <iostream>
#include "TX_bytestream_gen.h"
#define PACKET_DURATION 0.000992 //In seconds, time of continuous data stream corresponding to one packet (5MHz output, default packet size)
#define TIME_TO_SIMULATE 10 //In seconds
#define PACKET_SIZE DEFAULT_TX_PACKET_SIZE
#define PAYLOAD_SIZE DEFAULT_TX_PAYLOAD_SIZE
#define N_ITERATIONS (TIME_TO_SIMULATE/PACKET_DURATION)
#include <chrono>
using namespace std;
//Testbench: measure the time taken to simulate TIME_TO_SIMULATE seconds of continuous encoding
int main()
{
uint8_t toEncode[PAYLOAD_SIZE] = {100}; //Dummy data, doesn't matter
uint8_t out[PACKET_SIZE] = {0};
std::chrono::time_point<std::chrono::system_clock> start, end;
start = std::chrono::system_clock::now();
for(unsigned int i = 0 ; i < N_ITERATIONS ; i++)
{
TX_gen_bytestream(toEncode, PAYLOAD_SIZE, out, PACKET_SIZE);
}
end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end - start;
std::cout << "Task execution time: " << elapsed_seconds.count()/TIME_TO_SIMULATE*100 << "% (for " << TIME_TO_SIMULATE << "s simulated)\n";
return 0;
}
Appendix: lookup tables. I don't have enough characters to paste it here, but it looks like so:
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
//Running disparity = RD-
{
{1,0,0,1,1,1,0,1,0,0},
//...
},
//Running disparity = RD+
{
{0,1,1,0,0,0,1,0,1,1},
//...
}
};
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
//Previous running disparity was RD-
{
0,
//...
},
//Previous running disparity was RD+
{
1,
//...
}
};
This will be a lot faster if you do everything a byte at time instead of a bit at a time.
First change the way you store your lookup tables. You should have something like:
// conversion from (RD, byte) to (RD, 10-bit code)
// in each word, the lower 10 bits are the code,
// and bit 10 (the 11th bit) is the new RD
// The first 256 values are for RD -1, the next
// for RD 1
static const uint16_t BYTE_TO_CODE[512] = {
...
}
Then you need to change our encoding loop to write a byte at a time. You can use a uint16_t to store the leftover bits from each byte you output.
Something like this (I didn't figure out your sync byte logic, but presumably you can put that in the input or output byte loop):
// returns next isRD1
bool TX_gen_bytestream(uint8_t *dest, const uint8_t *src, size_t src_len, bool isRD1)
{
// bits generated, but not yet written, LSB first
uint16_t bits = 0;
// number of bits in bits
unsigned numbits = 0;
// current RD, either 0 or 256
uint16_t rd = isRD1 ? 256 : 0;
for (const uint8_t *end = src + src_len; src < end; ++src) {
// lookup code and next rd
uint16_t code = BYTE_TO_CODE[rd + *src];
// new rd from code bit 10
rd = (code>>2) & 256;
// store bits
bits |= (code & (uint16_t)0x03FF) << numbits;
numbits+=10;
// write out any complete bytes
while(numbits >= 8) {
*dest++ = (uint8_t)bits;
bits >>=8;
numbits-=8;
}
}
// If src_len isn't divisible by 4, then we have some extra bits
if (numbits) {
*dest = (uint8_t)bits;
}
return !!rd;
}
I have just started to study the CRC and how to implement it in software. My information source is mainly following document. Here is mentioned some simple algorithm for calculating CRC for any generator polynomial. I have attempted to write this algorithm in C++ language. I have tested it for generator polynomial x^5 + x^4 + x^2 + 1 (CRC-5) (generator polynomial used by chip) with usage of this online calculator and it works.
#include <iostream>
using namespace std;
int main() {
uint8_t data_byte = 0x31;
// polynom x^5 + x^4 + x^2 + 1
uint16_t polynom = 0x35;
// register contains 0 at the beginning
uint32_t crc = 0;
uint32_t message = 0;
// shift the message byte to left by so many bits which is needed for generator polynomial
message = (data_byte << 5);
// now the message byte is 13 bits long
uint8_t processed_bit = 13;
while(processed_bit > 0) {
// prepare free space for new bit from the message byte
crc = crc << 1;
// find out value of current msb in the message byte
message = message << 1;
if(message & 0x2000) {
// msb in message byte is "1"
// lsb in register is set to "1"
crc |= 1U;
} else {
// msb in message byte is "0"
// lsb in register is set to "0"
crc &= ~1U;
}
// remove msb from message byte
message = message & ~0x2000;
if(crc & 0x20) {
// subtract current multiple of the generator polynomial
crc = crc ^ polynom;
}
// remove msb from the register
crc = crc & ~0x20;
processed_bit--;
}
cout << "CRC: " << (int)crc << endl;
return 0;
}
It is obvious that this program is uneffective as far as execution time. So I have been thinking about a possibility how to improve it in this perspective. I know that there is a variant to use the look-up table containing the precalculated reminders but I would like to avoid this method. Does anybody know how to improve the above mentioned algorithm from the execution time perspective? Thanks in advance for any suggestions.
Just a quick glance shows several unnecessary statements. You don't need crc &= ~1U;, since the crc = crc << 1; already put a zero there. You don't need message = message & ~0x2000;, since you are only ever looking at one bit in there. Just let the other bits shift up and away. You don't need the crc = crc & ~0x20;, since the exclusive-or with the polynomial already did that.
If you read the document you linked, you will find that you do not need to process five more bits (13 total). You only need to process the eight message bits. Also reading that document, you do not need to feed in the message bits one at a time. You can exclusive-or the message byte directly onto the CRC register, and then process the eight bits all in the register.
Finally, you can speed up the calculation significantly with a table look up, processing eight bits at a time instead of one bit at a time. This is also described beautifully in the document you linked. You can find code here to automatically generate the table and C code for the calculation.
In the end though, none of this matters if you're not calculating the right thing to begin with. You need to verify the calculation with data from the chip first. I found this document with details on the CRC calculation for that chip. You need to spend some time with it and understand it in detail.
To answer your question directly, here is some code that does what your code does, but is much simpler. Also it is extended to work on n bits, not just eight. It does n loops instead of n+5 loops:
// Return a CRC-5 of the low n bits of data. The remaining bits of data must be
// zero. n must be in [5..32].
uint8_t crc5(uint32_t data, int n) {
int shift = n - 5;
uint32_t poly = (uint32_t)0x15 << shift;
uint32_t top = (uint32_t)1 << (n - 1);
do {
data = data & top ? (data << 1) ^ poly : data << 1;
} while (--n);
return (data >> shift) & 0x1f;
}
Simpler and faster still is the equivalent of yours restricted to eight bits, unrolled:
uint8_t crc5_8(uint8_t data) {
data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
return data >> 3;
}
However neither can calculate what you need for your chip.
I'm french student and I have been playing with SimpleModbus for 1 week, and it worked fine... Until I tried to implement a slave with a large amount of registers (1000 needed). Registers beyond 255 using slave 1 cannot be accessed. Whatever I do I get a timeout error as soon as the registers reading go beyond 255 :
control.modbus.CommunicationException: While reading holding registers 255 on slave 1
The documentation says nothing about such a limitation. Reading
SimpleModbusSlave.cpp didn't help ... maybe I need to change the starting address in the "modbus update" function but I am too beginner to understand...
unsigned int modbus_update() {
if (*(ModbusPort).available())
{
unsigned char buffer = 0;
unsigned char overflow = 0;
while ((*ModbusPort).available())
{
// The maximum number of bytes is limited to the serial buffer size of 128 bytes
// If more bytes is received than the BUFFER_SIZE the overflow flag will be set and the
// serial buffer will be red untill all the data is cleared from the receive buffer.
if (overflow)
(*ModbusPort).read();
else
{
if (buffer == BUFFER_SIZE)
overflow = 1;
frame[buffer] = (*ModbusPort).read();
buffer++;
}
delayMicroseconds(T1_5); // inter character time out
}
// If an overflow occurred increment the errorCount
// variable and return to the main sketch without
// responding to the request i.e. force a timeout
if (overflow)
return errorCount++;
// The minimum request packet is 8 bytes for function 3 & 16
if (buffer > 7)
{
unsigned char id = frame[0];
broadcastFlag = 0;
if (id == 0)
broadcastFlag = 1;
if (id == slaveID || broadcastFlag) // if the recieved ID matches the slaveID or broadcasting id (0), continue
{
unsigned int crc = ((frame[buffer - 2] << 8) | frame[buffer - 1]); // combine the crc Low & High bytes
if (calculateCRC(buffer - 2) == crc) // if the calculated crc matches the recieved crc continue
{
function = frame[1];
unsigned int startingAddress = ((frame[2] << 8) | frame[3]); // combine the starting address bytes
unsigned int no_of_registers = ((frame[4] << 8) | frame[5]); // combine the number of register bytes
unsigned int maxData = startingAddress + no_of_registers;
unsigned char index;
unsigned char address;
unsigned int crc16;
// broadcasting is not supported for function 3
if (!broadcastFlag && (function == 3))
{
if (startingAddress < holdingRegsSize) // check exception 2 ILLEGAL DATA ADDRESS
{
if (maxData <= holdingRegsSize) // check exception 3 ILLEGAL DATA VALUE
{
unsigned char noOfBytes = no_of_registers * 2;
// ID, function, noOfBytes, (dataLo + dataHi)*number of registers,
// crcLo, crcHi
unsigned char responseFrameSize = 5 + noOfBytes;
frame[0] = slaveID;
frame[1] = function;
frame[2] = noOfBytes;
address = 3; // PDU starts at the 4th byte
unsigned int temp;
for (index = startingAddress; index < maxData; index++)
{
temp = regs[index];
frame[address] = temp >> 8; // split the register into 2 bytes
address++;
frame[address] = temp & 0xFF;
address++;
}
crc16 = calculateCRC(responseFrameSize - 2);
frame[responseFrameSize - 2] = crc16 >> 8; // split crc into 2 bytes
frame[responseFrameSize - 1] = crc16 & 0xFF;
sendPacket(responseFrameSize);
}
else
exceptionResponse(3); // exception 3 ILLEGAL DATA VALUE
}
else
exceptionResponse(2); // exception 2 ILLEGAL DATA ADDRESS
}
else if (function == 16)
{
// Check if the recieved number of bytes matches the calculated bytes
// minus the request bytes.
// id + function + (2 * address bytes) + (2 * no of register bytes) +
// byte count + (2 * CRC bytes) = 9 bytes
if (frame[6] == (buffer - 9))
{
if (startingAddress < holdingRegsSize) // check exception 2 ILLEGAL DATA ADDRESS
{
if (maxData <= holdingRegsSize) // check exception 3 ILLEGAL DATA VALUE
{
address = 7; // start at the 8th byte in the frame
for (index = startingAddress; index < maxData; index++)
{
regs[index] = ((frame[address] << 8) | frame[address + 1]);
address += 2;
}
// only the first 6 bytes are used for CRC calculation
crc16 = calculateCRC(6);
frame[6] = crc16 >> 8; // split crc into 2 bytes
frame[7] = crc16 & 0xFF;
// a function 16 response is an echo of the first 6 bytes from
// the request + 2 crc bytes
if (!broadcastFlag) // don't respond if it's a broadcast message
sendPacket(8);
}
else
exceptionResponse(3); // exception 3 ILLEGAL DATA VALUE
}
else
exceptionResponse(2); // exception 2 ILLEGAL DATA ADDRESS
}
else
errorCount++; // corrupted packet
}
else
exceptionResponse(1); // exception 1 ILLEGAL FUNCTION
}
else // checksum failed
errorCount++;
} // incorrect id
}
else if (buffer > 0 && buffer < 8)
errorCount++; // corrupted packet
}
return errorCount;
}
Slave: SimpleModbusSlave
Any thoughts ?
Edit : I think someone had the same problem but I really did not understand what he changed :( here
Thanks in advance to those who can enlighten me!
Thank you very much #Marker it works now! I changed my librarie for the slave: github.com/smarmengol/Modbus-Master-Slave-for-Arduino –
char bytes[2];
bytes[0] = 0; //0x00
bytes[1] = 24; //0x18
uint16_t* ptrU16 = (uint16_t*)bytes; // I expect it points to the memory block: 0x18
cout << *ptrU16 << endl; // I expect 24, but it is 6144
What's wrong in my code?
You have a little endian machine. 6144 is 0x1800. When your machine represents the 16 bit value 0x0018 in memory, it puts the 0x18 byte first, and the 0x00 byte second, so when you interpret the two byte sequence 0x0018 as a uint16_t, it gives you 6144 (i.e. 0x1800), and not 24 (i.e. 0x0018).
If you change to:
bytes[0] = 24;
bytes[1] = 0;
you'll likely see the result you expect.
If you really want to get the result you expect, then you'll either have to compute it manually, such as:
uint16_t n = (bytes[1] << 8) + bytes[0];
or, more generally:
char bytes[] = {0x18, 0x00};
uint16_t n = 0;
for ( size_t i = 0; i < 2; ++i ) {
n += bytes[i] << 8 * i;
}
std::cout << n << std::endl;
or you can use a function like ntohs(), since network byte-order is big endian.
You may want to look into the ntohs() function. ("Network to Host byte order conversion"). You've plugged in your data in big endian mode, which is traditionally also network byte order. No matter what host you're on, the ntohs() function should return the value you're expecting. There's a mirror function for going from host to network order.
#include <arpa/inet.h>
...
cout << htons(*ptrU16) << endl;
should work and be portable across systems. (i.e. should work on Power, ARM, X86, Alpha, etc).