AES-NI 256-Bit block encryption - c++

I am attempting to use the code shown below, which is taken from the Intel whitepaper.
My aim is to perform 256-bit block encryption using AES-NI.
I have successfully derived the key schedule using the method provided in the Intel AES-NI library for expanding the keys: iEncExpandKey256(key,expandedKey);
and the expandedKey works fine in my non-AES-NI implementation of AES.
However, when I pass the values into Rijndael256_encrypt(testVector,testResult,expandedKey,32,1);
I get the error "Attempting to access protected memory; this usually indicates that the memory is corrupt", and the line of code causing it is data1 = _mm_xor_si128(data1, KS[0]); /* round 0 (initial xor) */, as shown below.
So my question is: what could be the possible causes of such an error? My current hypothesis is that data1 and KS[0] could be of different sizes, and I am still verifying that. Other than that, I'm not really sure where else to look. I would greatly appreciate it if someone could point me in the right direction for troubleshooting this error.
#include <wmmintrin.h>
#include <emmintrin.h>
#include <smmintrin.h>
void Rijndael256_encrypt(unsigned char *in,
                         unsigned char *out,
                         unsigned char *Key_Schedule,
                         unsigned long long length,
                         int number_of_rounds)
{
    __m128i tmp1, tmp2, data1, data2;
    __m128i RIJNDAEL256_MASK =
        _mm_set_epi32(0x03020d0c, 0x0f0e0908, 0x0b0a0504, 0x07060100);
    __m128i BLEND_MASK =
        _mm_set_epi32(0x80000000, 0x80800000, 0x80800000, 0x80808000);
    __m128i *KS = (__m128i*)Key_Schedule;
    int i, j;
    for(i=0; i < length/32; i++) { /* loop over the data blocks */
        data1 = _mm_loadu_si128(&((__m128i*)in)[i*2+0]); /* load data block */
        data2 = _mm_loadu_si128(&((__m128i*)in)[i*2+1]);
        data1 = _mm_xor_si128(data1, KS[0]); /* round 0 (initial xor) */
        data2 = _mm_xor_si128(data2, KS[1]);
        /* Do number_of_rounds-1 AES rounds */
        for(j=1; j < number_of_rounds; j++) {
            /* Blend to compensate for the shift rows shifting bytes between two
               128-bit blocks */
            tmp1 = _mm_blendv_epi8(data1, data2, BLEND_MASK);
            tmp2 = _mm_blendv_epi8(data2, data1, BLEND_MASK);
            /* Shuffle that compensates for the additional shift in rows 3 and 4
               as opposed to rijndael128 (AES) */
            tmp1 = _mm_shuffle_epi8(tmp1, RIJNDAEL256_MASK);
            tmp2 = _mm_shuffle_epi8(tmp2, RIJNDAEL256_MASK);
            /* This is the encryption step that includes sub bytes, shift rows,
               mix columns, xor with round key */
            data1 = _mm_aesenc_si128(tmp1, KS[j*2]);
            data2 = _mm_aesenc_si128(tmp2, KS[j*2+1]);
        }
        tmp1 = _mm_blendv_epi8(data1, data2, BLEND_MASK);
        tmp2 = _mm_blendv_epi8(data2, data1, BLEND_MASK);
        tmp1 = _mm_shuffle_epi8(tmp1, RIJNDAEL256_MASK);
        tmp2 = _mm_shuffle_epi8(tmp2, RIJNDAEL256_MASK);
        tmp1 = _mm_aesenclast_si128(tmp1, KS[j*2+0]); /* last AES round */
        tmp2 = _mm_aesenclast_si128(tmp2, KS[j*2+1]);
        _mm_storeu_si128(&((__m128i*)out)[i*2+0], tmp1);
        _mm_storeu_si128(&((__m128i*)out)[i*2+1], tmp2);
    }
}

You have:
UCHAR* Key_Schedule=Key_schedule+4;
This unaligns Key_Schedule, since Key_schedule is (I hope!) 16-byte aligned and you've added 32 bits (4 bytes) to it.
You're asking the CPU to do something that the hardware is not capable of doing because of the way the data lines are wired. This is a gross oversimplification, but: you can think of the CPU as having sixteen 8-bit slots that it has to read from. To read data, it sends out an address, which is the byte address divided by 16, and then decides which slots to read from. If the byte addresses of all 16 bytes that compose the 128-bit value aren't the same when divided by 16, then it's not possible to read the 16 bytes into the 16 slots.
If you don't want to impose alignment requirements on all the parameters to the function, then you'll need to have the function itself copy them into aligned buffers.
SSE operations need to be aligned to 16 for loading and storing[.] -- AES Intrinsics
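If you would rather not change the whitepaper routine itself, two options are sketched below. This is only an illustrative sketch, not code from the whitepaper: the buffer size of 2 × 15 round keys is an assumption for 256-bit Rijndael, and xor_round0 is a hypothetical helper showing the unaligned-load form of the faulting line.
#include <wmmintrin.h>   // AES-NI intrinsics
#include <emmintrin.h>   // SSE2 intrinsics

// Option 1: guarantee 16-byte alignment of the expanded key buffer so that
// KS[0], KS[1], ... are legal aligned accesses (the size here is an
// assumption; match it to what iEncExpandKey256 actually writes).
alignas(16) unsigned char expandedKey[2 * 15 * 16];

// Option 2: if the pointer may be misaligned anyway, replace the direct
// KS[0] dereference with an explicit unaligned load:
static inline __m128i xor_round0(__m128i data1, const unsigned char *key_schedule)
{
    __m128i rk0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(key_schedule));
    return _mm_xor_si128(data1, rk0);   // round 0 (initial xor), no alignment fault
}
Unaligned loads are slower on older CPUs, so keeping the key schedule 16-byte aligned (option 1) is usually the better fix.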

Related

KheperaIV Test File is more complicated than I expected

I am working on an undergrad project involving the Khepera IV mobile robot, and as I'm reading the files that came with it, I came across this line that confuses me:
for (i=0;i<5;i++) {
usvalues[i] = (short)(Buffer[i*2] | Buffer[i*2+1]<<8);
...
From the same file, usvalues is declared as usvalues[5], one entry for each of the ultrasonic sensors on the robot, and Buffer[] is declared as Buffer[100], I assume because of the sample rate of the ultrasonic sensors. But I've never seen a variable set like this. Can someone help me understand it?
The code reads the Buffer[] array (it certainly has 8-bit elements) two successive bytes per iteration, in little-endian order (the lower-addressed byte is the least significant byte). It then forms a 16-bit value to save in usvalues[].
for (i=0;i<5;i++) {
usvalues[i] = (short)(Buffer[i*2] | Buffer[i*2+1]<<8);
The code should use uint8_t Buffer[100]; to prevent doing a signed left shift.
usvalues[] is better as some unsigned type, like uint16_t or unsigned, using unsigned operations.
uint8_t Buffer[100];
uint16_t /* or unsigned */ usvalues[5 /* or more */];
for (i = 0; i < 5; i++) {
usvalues[i] = Buffer[i*2] | (unsigned)Buffer[i*2+1] << 8;
}
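For illustration, here is a tiny self-contained sketch of what one iteration does, with a made-up sensor reading of 0x1234:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Suppose the robot sent the 16-bit reading 0x1234 for one sensor,
    // transmitted little-endian: low byte first, high byte second.
    uint8_t Buffer[2] = { 0x34, 0x12 };
    uint16_t usvalue = Buffer[0] | (unsigned)Buffer[1] << 8;
    printf("0x%04X\n", (unsigned)usvalue);   // prints 0x1234
    return 0;
}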

8b10b encoder with byte stream output (bits carry): faster bitwise algorithm?

I have written an 8b10b encoder that generates a stream of bytes intended to be sent to a serial transmitter, which sends the bytes as-is, LSb first.
What I'm doing here is basically laying down groups of 10 bits (encoded from the input stream of bytes) onto groups of 8, so a varying number of bits gets carried over from one output byte to the next - kind of like in music/rhythm.
The program has been successfully tested, but it is about 4-5x too slow for my application. I think this comes from the fact that every bit has to be looked up in an array. My gut tells me it could be made faster with some sort of rolling mask, but I can't yet see how to do that, even by swapping the 3D array of booleans for a 2D array of integers.
Any pointers or other ideas?
Here is the code. Please ignore most of the macros and some of the code related to deciding which byte is to be written as this is application-specific.
Header:
#ifndef TX_BYTESTREAM_GEN_H_INCLUDED
#define TX_BYTESTREAM_GEN_H_INCLUDED
#include <stdint.h> //for standard portable types such as uint16_t
#define MAX_USB_TRANSFER_SIZE 1016 //Bytes, size of the max payload in a USB transaction. Determined using FT4222_GetMaxTransferSize()
#define MAX_USB_PACKET_SIZE 62 //Bytes, max size of the payload of a single USB packet
#define MANDATORY_TX_PACKET_BLOCK 5 //Bytes, constant - equal to the minimum number of bytes of TX packet necessary to exactly transfer blocks of 10 bits of encoded data (LCM of 8 and 10)
#define SYNC_CHARS_MAX_INTERVAL 172 //Target number of payload bytes between sync chars. Max is 188 before desynchronisation
#define ROUND_UP(N, S) ((((N) + (S) - 1) / (S)) * (S)) //Macro to round up the integer N to the largest multiple of the integer S
#define ROUND_DOWN(N,S) ((N / S) * S) //Same rounding down
#define N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz) (ROUND_UP((pcktSz*1000/(SYNC_CHARS_MAX_INTERVAL+2)),1000)/1000) //Number of sync (K28.5) character/byte pairs in a given packet
#define TX_PAYLOAD_SIZE(pcktSz) ((pcktSz*4/5)-2*N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz)) //Size in bytes of the payload data before encoding in a single TX packet
#define MAX_TX_PACKET_SIZE (ROUND_DOWN((MAX_USB_TRANSFER_SIZE-MAX_USB_PACKET_SIZE),(MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK))) //Maximum size in bytes of a TX packet
#define DEFAULT_TX_PACKET_SIZE (MAX_TX_PACKET_SIZE-MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK) //Default size in bytes of a TX packet with some margin
#define MAX_TX_PAYLOAD_SIZE (TX_PAYLOAD_SIZE(MAX_TX_PACKET_SIZE)) //Maximum size in bytes of the payload in a TX packet
#define DEFAULT_TX_PAYLOAD_SIZE (TX_PAYLOAD_SIZE(DEFAULT_TX_PACKET_SIZE))//Default size in bytes of the payload in a TX packet with some margin
//See string descriptors below for definitions. Error codes are individual bits so can be combined.
enum ErrCode
{
NO_ERR = 0,
INVALID_DIN_SIZE = 1,
INVALID_DOUT_SIZE = 2,
NULL_DIN_PTR = 4,
NULL_DOUT_PTR = 8
};
char const * const ERR_CODE_DESC[] = {
"No error",
"Invalid size of input data",
"Invalid size of output buffer",
"Input data pointer is NULL",
"Output buffer pointer is NULL"
};
/** @brief Generates the bytestream to the transmitter by encoding the incoming data using 8b10b encoding
and inserting K28.5 synchronisation characters to maintain the synchronisation with the demodulator (LVDS passthrough mode)
@arg din is a pointer to an allocated array of bytes which contains the data to encode
@arg dinSize is the size of din in bytes. This size must be equal to TX_PAYLOAD_SIZE(doutSize)
@arg dout is a pointer to an allocated array of bytes which is intended to contain the output bytestream to the transmitter
@arg doutSize is the size of dout in bytes. This size must meet the conditions at the top of this function's implementation. Use DEFAULT_TX_PACKET_SIZE if in doubt.
@return error code (c.f. ErrCode) **/
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize);
#endif // TX_BYTESTREAM_GEN_H_INCLUDED
Source file:
#include "TX_bytestream_gen.h"
#include <cstddef> //NULL
#define N_BYTE_VALUES (256+1) //256 possible data values + 1 special character (only accessible to this module)
#define N_ENCODED_BITS 10 //Number of bits corresponding to the 8b10b encoding of a byte
//Map the current running disparity, the desired value to encode to the array of encoded bits for 8b10b encoding.
//The Last value is the K28.5 sync character, only accessible to this module
//Notation = MSb to LSb
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
//Long table (see appendix)
};
//New value of the running disparity after encoding with the specified previous running disparity and requested byte value (c.f. above)
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
//Long table (see appendix)
};
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize)
{
    static bool RDp = false; //Running disparity is initially negative
    int ret = 0;
    //If the output buffer size is not a multiple of the mandatory payload block or of the USB packet size, or if it cannot be held in a single USB transaction
    //return an invalid output buffer size error
    if(doutSize == 0 || (doutSize % MANDATORY_TX_PACKET_BLOCK) || (doutSize % MAX_USB_PACKET_SIZE) || (doutSize > MAX_TX_PACKET_SIZE)) //Temp
        ret |= INVALID_DOUT_SIZE;
    //If the input data size is not consistent with the output buffer size, return the appropriate error code
    if(dinSize == 0 || dinSize != TX_PAYLOAD_SIZE(doutSize))
        ret |= INVALID_DIN_SIZE;
    if(din == NULL)
        ret |= NULL_DIN_PTR;
    if(dout == NULL)
        ret |= NULL_DOUT_PTR;
    //If everything checks out, carry on
    if(ret == NO_ERR)
    {
        uint16_t iByteIn = 0; //Index of the byte of input data currently being processed
        uint16_t iByteOut = 0; //Index of the output byte currently being written to
        uint8_t iBitOut = 0; //Starts with LSb
        int16_t nBytesUntilSync = 0; //Countdown of bytes until a sync marker needs to be sent. Cyclic.
        //For all output bytes to generate
        while(iByteOut < doutSize)
        {
            bool sync = false; //Initially this byte is not considered a sync byte (in which case the next byte of data will be processed)
            //If the maximum interval between sync characters has been reached, mark the two next bytes as sync bytes and reset the counter
            if(nBytesUntilSync <= 0)
            {
                sync = true;
                if(nBytesUntilSync == -1) //After the second SYNC is written, the counter is reset
                {
                    nBytesUntilSync = SYNC_CHARS_MAX_INTERVAL;
                }
            }
            //Append bit by bit the encoded data of the byte to write to the output bitstream (carried over from byte to byte) - LSb first
            //The byte to write is either the last byte of the encodedBits map (the sync character K28.5) if sync is set, or the next byte of
            //input data if it isn't
            uint16_t const byteToWrite = (sync?(N_BYTE_VALUES-1):din[iByteIn]);
            for(int8_t iEncodedBit = N_ENCODED_BITS-1 ; iEncodedBit >= 0 ; --iEncodedBit, iBitOut++)
            {
                //If the current output byte is complete, reset the bit index and select the next one
                if(iBitOut >= 8)
                {
                    iByteOut++;
                    iBitOut = 0;
                }
                //Effectively sets the iBitOut'th bit of the iByteOut'th byte out to the encoded value of the byte to write
                bool bitToWrite = encodedBits[RDp][byteToWrite][iEncodedBit]; //Temp
                dout[iByteOut] ^= (-bitToWrite ^ dout[iByteOut]) & (1 << iBitOut);
            }
            //The running disparity is also updated as per the standard (to achieve DC balance)
            RDp = encodingDisparity[RDp][byteToWrite]; //Update the running disparity
            //If sync was not set, this means a byte of the input data has been processed, in which case take the next one in
            //Also decrement the synchronisation counter
            if(!sync) {
                iByteIn++;
            }
            //In any case, decrease the synchronisation counter. Even sync characters decrease it (c.f. top of while loop)
            nBytesUntilSync--;
        }
    }
    return ret;
}
Testbench:
#include <iostream>
#include "TX_bytestream_gen.h"
#define PACKET_DURATION 0.000992 //In seconds, time of continuous data stream corresponding to one packet (5MHz output, default packet size)
#define TIME_TO_SIMULATE 10 //In seconds
#define PACKET_SIZE DEFAULT_TX_PACKET_SIZE
#define PAYLOAD_SIZE DEFAULT_TX_PAYLOAD_SIZE
#define N_ITERATIONS (TIME_TO_SIMULATE/PACKET_DURATION)
#include <chrono>
using namespace std;
//Testbench: measure the time taken to simulate TIME_TO_SIMULATE seconds of continuous encoding
int main()
{
    uint8_t toEncode[PAYLOAD_SIZE] = {100}; //Dummy data, doesn't matter
    uint8_t out[PACKET_SIZE] = {0};
    std::chrono::time_point<std::chrono::system_clock> start, end;
    start = std::chrono::system_clock::now();
    for(unsigned int i = 0 ; i < N_ITERATIONS ; i++)
    {
        TX_gen_bytestream(toEncode, PAYLOAD_SIZE, out, PACKET_SIZE);
    }
    end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << "Task execution time: " << elapsed_seconds.count()/TIME_TO_SIMULATE*100 << "% (for " << TIME_TO_SIMULATE << "s simulated)\n";
    return 0;
}
Appendix: lookup tables. I don't have enough characters to paste them here in full, but they look like this:
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
//Running disparity = RD-
{
{1,0,0,1,1,1,0,1,0,0},
//...
},
//Running disparity = RD+
{
{0,1,1,0,0,0,1,0,1,1},
//...
}
};
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
//Previous running disparity was RD-
{
0,
//...
},
//Previous running disparity was RD+
{
1,
//...
}
};
This will be a lot faster if you do everything a byte at a time instead of a bit at a time.
First change the way you store your lookup tables. You should have something like:
// conversion from (RD, byte) to (RD, 10-bit code)
// in each word, the lower 10 bits are the code,
// and bit 10 (the 11th bit) is the new RD
// The first 256 values are for RD -1, the next
// for RD 1
static const uint16_t BYTE_TO_CODE[512] = {
...
}
Then you need to change your encoding loop to write a byte at a time. You can use a uint16_t to store the leftover bits from each byte you output.
Something like this (I didn't figure out your sync byte logic, but presumably you can put that in the input or output byte loop):
// returns next isRD1
bool TX_gen_bytestream(uint8_t *dest, const uint8_t *src, size_t src_len, bool isRD1)
{
// bits generated, but not yet written, LSB first
uint16_t bits = 0;
// number of bits in bits
unsigned numbits = 0;
// current RD, either 0 or 256
uint16_t rd = isRD1 ? 256 : 0;
for (const uint8_t *end = src + src_len; src < end; ++src) {
// lookup code and next rd
uint16_t code = BYTE_TO_CODE[rd + *src];
// new rd from code bit 10
rd = (code>>2) & 256;
// store bits
bits |= (code & (uint16_t)0x03FF) << numbits;
numbits+=10;
// write out any complete bytes
while(numbits >= 8) {
*dest++ = (uint8_t)bits;
bits >>=8;
numbits-=8;
}
}
// If src_len isn't divisible by 4, then we have some extra bits
if (numbits) {
*dest = (uint8_t)bits;
}
return !!rd;
}
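If it helps, here is one hypothetical way to fill BYTE_TO_CODE once at startup from the question's encodedBits / encodingDisparity tables instead of writing the 512 entries out by hand. It assumes those tables and the N_ENCODED_BITS macro exist exactly as declared in the question, and it leaves the K28.5 entry for separate handling, as noted above.
// Hypothetical one-time conversion of the bool tables into the packed table.
// encodedBits is stored MSb-to-LSb, so notation bit (9 - i) becomes bit i of
// the code, which is transmitted LSb first.
static uint16_t BYTE_TO_CODE[512];

static void build_byte_to_code(void)
{
    for (int rd = 0; rd < 2; ++rd) {            // 0 = RD-, 1 = RD+
        for (int val = 0; val < 256; ++val) {
            uint16_t code = 0;
            for (int i = 0; i < N_ENCODED_BITS; ++i) {
                if (encodedBits[rd][val][N_ENCODED_BITS - 1 - i])
                    code |= (uint16_t)(1u << i);
            }
            if (encodingDisparity[rd][val])
                code |= 1u << 10;               // bit 10 = new running disparity
            BYTE_TO_CODE[rd * 256 + val] = code;
        }
    }
}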

How to convert boost multiprecision integers into big endian from little endian?

I am trying to fix this part of an abandonware program because I failed to find an alternative program.
As you can see, the data of PUSH instructions is in the wrong order, whereas Ethereum is a big-endian machine (addresses are correctly represented because they use a smaller type).
An alternative is to run porosity.exe --code '0x61004b60026319e44e32' --disassm
The u256 type is defined as
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
Here’s a minimal example to reproduce the bug:
#include <sstream>
#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_int.hpp>
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
int main() {
    std::stringstream stream;
    u256 data = 0xFEDEFA;
    for (int i = 0; i < 5; ++i) { // print only the first 5 digits
        uint8_t dataByte = int(data & 0xFF);
        data >>= 8;
        stream << std::setfill('0') << std::setw(sizeof(char) * 2) << std::hex << int(dataByte) << " ";
    }
    std::cout << stream.str();
}
So numbers are converted to string with a space between each byte (and only the first bytes).
But then I ran into an endianness problem: bytes were printed in the reverse order. I mean for example, 31722 is written 8a 02 02 on my machine and 02 02 8a when compiled for a big endian target.
So, as I don't know which Boost function to call, I modified the code:
#include <sstream>
#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_int.hpp>
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
int main() {
    std::stringstream stream;
    u256 data = 0xFEDEFA;
    for (int i = 0; i < 5; ++i) {
        uint8_t dataByte = int(data >> ((32 - i - 1) * 8));
        stream << std::setfill('0') << std::setw(sizeof(char) * 2) << std::hex << int(dataByte) << " ";
    }
    std::cout << stream.str();
}
Now, why are my 256-bit integers printed mostly as a series of 00 00 00 00 00?
BTW, this is not an endianness issue; you aren't doing byte accesses to the object-representation. You're operating on it as a 256-bit integer and simply asking for the low 8 bits at a time with data & 0xFF.
If you did know the endianness of the target C implementation, and the data layout of the boost object, you could efficiently loop over it in descending address order with unsigned char*.
You're introducing the idea of endianness only because it's associated with byte-reversal, which is what you're trying to do. But that's really inefficient, just loop over the bytes of your bigint the other way.
I'm hesitant to recommend a specific solution because I don't know what will compile efficiently. But you might want something like this instead of byte-reversing ahead of time:
for (outer loop) {
    uint64_t chunk = data >> (64*3); // grab the highest 64-bit chunk
    data <<= 64; // and shift everything up
    // alternative: maybe keep a shift-count in a variable instead of modifying `data`

    // Then pick apart the chunk into its component bytes, in MSB first order
    for (int i = 0; i < 8; i++) {
        unsigned tmp = (chunk >> 56) & 0xFF;
        // do something with it
        chunk <<= 8; // bring the next byte to the top
    }
}
In the inner loop, it can be more efficient to use a rotate instead of two shifts, bringing the high byte to the bottom (for the & 0xFF) at the same time as shifting the lower bytes upward.
Best practices for circular shift (rotate) operations in C++
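For example, a portable rotate helper (the UB-free idiom from that question) that could replace the two shifts in the inner loop might look like this - only a sketch:
#include <cstdint>

static inline uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << (n & 63)) | (x >> (-n & 63));  // compilers recognize this as a single rotate
}

// Inner loop, one rotate per byte:
// chunk = rotl64(chunk, 8);        // old high byte becomes the low byte
// unsigned tmp = chunk & 0xFF;     // same MSB-first order as before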
In the outer loop, IDK if boost::multiprecision::number has any APIs for efficient indexing of chunks built in; if so using that is probably more efficient.
I used nested loops because I assume data <<= 8 doesn't compile particularly efficiently, and neither would (data >> (256-8)) & 0xFF. But that's how you'd grab bytes from the top instead of the bottom.
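Putting the chunked loop together, a minimal self-contained sketch (using the u256 alias from the question and an arbitrary test value) could look like the following; whether it compiles to something efficient is exactly the open question above:
#include <cstdint>
#include <iostream>
#include <iomanip>
#include <boost/multiprecision/cpp_int.hpp>

using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;

int main() {
    u256 data = 0xFEDEFA;
    for (int c = 0; c < 4; ++c) {                                  // four 64-bit chunks, MSB chunk first
        uint64_t chunk = static_cast<uint64_t>(data >> (64 * 3));  // grab the highest 64-bit chunk
        data <<= 64;                                               // and shift everything up
        for (int i = 0; i < 8; ++i) {                              // bytes of the chunk, MSB first
            unsigned byte = (chunk >> 56) & 0xFF;
            std::cout << std::hex << std::setw(2) << std::setfill('0') << byte << ' ';
            chunk <<= 8;
        }
    }
    std::cout << '\n';                                             // prints 29 "00" bytes, then "fe de fa"
}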
Another option is the standard trick for converting numbers to strings: store characters into a buffer in descending order. A 256-bit (32-byte) number will take 64 hex digits, and you want another 32 bytes of spaces between them.
For example:
// 97 = 32 * 2 + 32, plus 1 byte for an implicit-length C string terminator
// plus another 1 for an extra space
char buf[98]; // small enough to use automatic storage
char *outp = buf+96; // pointer to the end
*outp = 0; // terminator
const char *hex_lut = "0123456789abcdef";
for (int i = 0; i < 32; i++) {
    uint8_t byte = data & 0xFF;
    *--outp = hex_lut[byte >> 4];
    *--outp = hex_lut[byte & 0xF];
    *--outp = ' ';
    data >>= 8;
}
// outp points at an extra ' '
outp++;
// outp points at the first byte of a string like "12 ab cd"
stream << outp;
If you want to break that up into chunks to put a line break in there, you can do that too.
If you're interested in efficient conversion to hex for 8, 16 or 32 bytes of data at once, see How to convert a number to hex? for some x86 SIMD ways. The asm should port easily to C++ intrinsics. (You can use SIMD shuffles to handle putting bytes into MSB-first printing order after loading from little-endian integers.)
You could also use a SIMD shuffle to space-separate your pairs of hex digits before storing to memory like you apparently want here.
Bug in the code you added:
So I added this code before the loop above:
for(unsigned int i=0,data,_data;i<33;++i)
unsigned i, data, _data declares new variables of type unsigned int that shadow the previous declarations of data and _data. That loop has zero effect on data or _data outside the scope of the loop. (And contains UB because you read _data and data without initializing them.)
If those vars are actually both still the u256 vars of the outer scope, I don't see an obvious problem other than efficiency, but maybe I'm missing the obvious too. I didn't look very hard because using 64x 256-bit shifts and 32x ORs seems like a horrible idea. It's possible it could optimize away completely, or into bswap byte-reverse instructions on ISAs that have them, but I doubt it. Especially not through the extra complication of the boost::multiprecision::number wrapper functions.

c++ write on specific bits in matrix

I have a quite simple problem today. I have a matrix float gradient[COLS][ROWS]. As you probably know, the float type is 32 bits wide.
In my code I do 4 different checks on another table. For each of them I want to write the results into gradient[][].
What I would like to do is write each of these results into 8 bits of gradient[][].
So the least significant 8 bits would contain the result of the first check, the following 8 bits the result of the second check, and so on.
As for the reason I want to do this: I'm trying to synthesize this code using HLS and make it run on a Xilinx ZedBoard. There is, however, not much memory available on the FPGA, so instead of storing the results of my 4 functions in 4 different matrices I would like to store them in the same matrix using bit operations.
I know I can use masks with an AND operator, like gradient[][] & 0xFF. What I'm not sure about, however, is when and how to apply this mask.
As an example, here is the code for one of the checks (sorry for the Spanish names, I didn't write this):
void FullCheck(float brightness_tab[COLS][ROWS]){
    for(int i=0;i<ROWS;i++){
        int previous_point = (int)(brightness_tab[0][i]);
        for(int j=1;j<COLS-1;j++){
            float brightness=brightness_tab[i][j];
            int brightnessi=(int)(brightness);
            gradient[i][j]=brightnessi- previous_point;
            if(!(gradient[i][j]>VALOR_PENDIENTE || gradient[i][j]<-VALOR_PENDIENTE)){
                if(!(gradient[i][j] + gradient[i][j-1] >VALOR_PENDIENTE_TRUNCAR || gradient[i][j] + gradient[i][j+1]<-VALOR_PENDIENTE_TRUNCAR)){
                    gradient[i][j]=0;
                }
            }
            if(j<2 || i<2 || COLS-1 ==i){gradient[i][j]=0;}
            previous_point=brightnessi;
        }
    }
}
Thank you in advance for your answers !
Deducing from your comments, I'll assume that gradient will be declared as an int array.
In your sample code, there are 2 cases for writing something to the matrix. In the first case, you want to write some value, such as this line:
gradient[i][j] = brightnessi - previous_point;
If you want to write some data to a specific byte, the data you want to write should be a 1-byte data itself.
gradient[i][j] = 0; // initialize to all zero bits
int data1 = 0x12; // 1-byte value
gradient[i][j] |= data1; // writing to the 1st byte (LSB)
int data2 = 0x34;
gradient[i][j] |= data2 << 8; // writing to the 2nd byte
int data3 = 0x56;
gradient[i][j] |= data3 << 16; // writing to the 3rd byte
int data4 = 0x78;
gradient[i][j] |= data4 << 24; // writing to the 4th byte
After executing above code, the value of gradient[i][j] will be 0x78563412.
The second case is clearing what you have written before by writing 0, such as this line:
gradient[i][j] = 0;
In this case you can do
gradient[i][j] &= 0xffffff00; // clearing the 1st byte (LSB)
gradient[i][j] &= 0xffff00ff; // clearing the 2nd byte
gradient[i][j] &= 0xff00ffff; // clearing the 3rd byte
gradient[i][j] &= 0x00ffffff; // clearing the 4th byte
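Reading a packed result back out is the same operation in reverse: shift the byte you want down to the bottom and mask it. A minimal sketch with a hypothetical helper:
// Hypothetical helper: extract the k-th check result (k = 0..3) from one cell.
inline int get_check(int cell, int k)
{
    return (cell >> (k * 8)) & 0xFF;
}

// Usage, assuming gradient holds the packed int values from above:
// int firstCheck  = get_check(gradient[i][j], 0);  // 1st byte (LSB)
// int secondCheck = get_check(gradient[i][j], 1);  // 2nd byte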
You could also do a struct that has the same memory layout
struct Bytes
{
uint8_t a;
uint8_t b;
uint8_t c;
uint8_t d;
} ;
Bytes* g = reinterpret_cast<Bytes*>(&gradient[i][j]);
That way you can access the individual bytes easily, like g->a.

Split up 32 bit value in C++ and concatenate the chunks in MATLAB

I'm working on a project where I have to send 32-bit values over UART to MATLAB, where I need to print them in the MATLAB terminal. I do this by splitting up the 32-bit value into 8-bit values like so:
void Configurator::send(void) {
    /**
     * Split the 32 bits in chunks of 4 bytes of 8 bits
     */
    union {
        uint32_t data;
        uint8_t bytes[4];
    } splitData;
    splitData.data = 1234587;
    for (int n : splitData.bytes) {
        XUartPs_SendByte(STDOUT_BASEADDRESS, splitData.bytes[n]);
    }
}
In MATLAB I receive the following 4 bytes:
252
230
25
155
Now the question is, how do I restore the 1234587?
Am I correct in creating an array of size 4 as uint8_t? I would also like to note that I'm using union for readability. If I'm doing it wrong, I'd be happy to hear why!
You could use left shift to restore the value
uint32_t value = (byte[3]<<24) + (byte[2]<<16) + (byte[1]<<8) + (byte[0]<<0);
Try to avoid using unions for this sort of thing. It is not (in principle) portable, and can cause undefined behaviour. Instead write it like this:
void Configurator::send(void) {
    /**
     * Split the 32 bits in chunks of 4 bytes of 8 bits
     */
    uint32_t data = 1234587;
    for (int n = 0; n < 4; n++) {
        unsigned char octet = (data >> (n*8)) & 0xFF;
        XUartPs_SendByte(STDOUT_BASEADDRESS, octet);
    }
}
uint32_t receiveBytes()
{
    uint32_t result = 0;
    for (int n = 0; n < 4; n++)
    {
        unsigned char octet = getOctet();
        uint32_t octet32 = octet;
        result |= octet32 << (n*8);
    }
    return result;
}
The point is that by shifting out the bytes like this, you avoid any problems with endianness. The masking also means that if either end has 32-bit chars (such platforms exist), it all works anyway.
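For what it's worth, here is a small self-contained round trip of the shift-based approach, with a plain array standing in for the UART so it can be run anywhere:
#include <cstdint>
#include <iostream>

int main() {
    uint32_t data = 1234587;                              // 0x0012D69B
    uint8_t octets[4];
    for (int n = 0; n < 4; n++)
        octets[n] = (data >> (n * 8)) & 0xFF;             // 0x9B, 0xD6, 0x12, 0x00 - LSB first

    uint32_t restored = 0;
    for (int n = 0; n < 4; n++)
        restored |= static_cast<uint32_t>(octets[n]) << (n * 8);

    std::cout << restored << '\n';                        // prints 1234587 again
    return 0;
}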