Problem with Writing to the Flash Memory of the STM32L4R5 microcontroller - c++

I'm trying to write to the flash memory of an STM32L4R5 using the 'FLASH_TYPEPROGRAM_FAST' mode of HAL_FLASH_Program().
The flash of the MCU is configured as Single Bank.
Writing to the flash only works when using 'FLASH_TYPEPROGRAM_DOUBLEWORD'. The flash reads back as 0xFFFFFFFF when written in 'FLASH_TYPEPROGRAM_FAST' mode.
This is my test project:
// Page Erase Structure
static FLASH_EraseInitTypeDef EraseInitStruct;
// Page Erase Status
uint32_t eraseStatus;
// Data Buffer
uint64_t pDataBuf[32] =
{
0x1111111122222222, 0x3333333344444444,
0x5555555566666666, 0x7777777788888888,
0x12345678ABC12345, 0x23456789DEF01234,
0x34567890AAABBB12, 0x4567890FABCDDD34,
0x1111111122222222, 0x3333333344444444,
0x5555555566666666, 0x7777777788888888,
0x12345678ABC12345, 0x23456789DEF01234,
0x34567890AAABBB12, 0x4567890FABCDDD34,
0x1111111122222222, 0x3333333344444444,
0x5555555566666666, 0x7777777788888888,
0x12345678ABC12345, 0x23456789DEF01234,
0x34567890AAABBB12, 0x4567890FABCDDD34,
0x1111111122222222, 0x3333333344444444,
0x5555555566666666, 0x7777777788888888,
0x12345678ABC12345, 0x23456789DEF01234,
0x34567890AAABBB12, 0x4567890FABCDDD34
};
// Flash Page Start Address
uint32_t pageAddr = 0x081FE000;
// Fill Erase Init Structure
EraseInitStruct.TypeErase = FLASH_TYPEERASE_PAGES;
EraseInitStruct.Banks = FLASH_BANK_1;
EraseInitStruct.Page = 255;
EraseInitStruct.NbPages = 1;
// Unlocking the FLASH Control Register
HAL_FLASH_Unlock();
// Clear OPTVERR Bit Set on Virgin Samples
__HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_OPTVERR);
// Erasing the Flash Page
HAL_FLASHEx_Erase(&EraseInitStruct, &eraseStatus);
#if 0
// Writing a Double Word to Flash. pDataBuf[0] is the 64-bit Word
HAL_FLASH_Program(FLASH_TYPEPROGRAM_DOUBLEWORD, pageAddr, pDataBuf[0]);
#else
// Writing 32 Double Words. pDataBuf is the Starting Address of the 64-bit Array,
// passed to HAL_FLASH_Program() as an integer
HAL_FLASH_Program(FLASH_TYPEPROGRAM_FAST_AND_LAST, pageAddr, (uint64_t)(uint32_t)pDataBuf);
#endif
// Locking the FLASH Control Register
HAL_FLASH_Lock();
Am I doing anything wrong?
Thank you,
Ivan

Document RM0932, the reference manual for the STM32L4+ series, section FLASH, covers reading and writing from/to flash, for both single-bank and dual-bank configurations and the different MCU models of this line. It seems most differences concern reading from flash (64-bit for dual bank, 128-bit for single bank). As for writing, page 128:
Flash is very picky about data width, and it seems every STM32 has a different data width for its flash. Very recently I stumbled upon one that accepted only 16-bit reads and writes. This one likes double words. There is no universal function to read and write the flash of any STM32, so it seems one of your calls doesn't respect this MCU's flash data-width rules. You can check whether any error flags appear, as per the reference manual, although, as you can see, it says nothing about attempting to write a 32-bit piece of data. I would expect such a write to fail, but we can't draw any conclusions about error flags from the screenshot provided. If you're curious enough, you can look at what data width each of your modes/functions uses and see what happens. 64-bit writes have to work.
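If FAST mode keeps failing, a fallback that stays within the documented 64-bit programming width is to program the row one double word at a time and check the HAL status after each write. A minimal sketch, reusing pageAddr and pDataBuf from the question (untested on this exact part):
// Program 32 double words (one 256-byte row) individually and stop on the first error.
HAL_FLASH_Unlock();
for (uint32_t i = 0; i < 32; i++)
{
    if (HAL_FLASH_Program(FLASH_TYPEPROGRAM_DOUBLEWORD,
                          pageAddr + 8u * i, pDataBuf[i]) != HAL_OK)
    {
        // HAL_FLASH_GetError() returns the error flags latched in FLASH->SR,
        // which tells you which data-width or alignment rule was violated.
        uint32_t flashError = HAL_FLASH_GetError();
        (void)flashError; // inspect in the debugger
        break;
    }
}
HAL_FLASH_Lock();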

Related

Is HAL_UARTEx_RxEventCallback Size parameter calculated programmatically or by hardware

I'm implementing UART DMA with the STM HAL library and I want to know whether the message size is counted by hardware (counting clock ticks until the line is idle, for example) or by some software method (something like strlen). So if Size in
HAL_UARTEx_RxEventCallback(UART_HandleTypeDef *huart, uint16_t Size)
is counted by hardware, I can send data in pure hex format, but if it is calculated by something like strlen, I may run into problems if the data contains 0x00 and will have to send data in ASCII.
I've tried to do some research in the generated code in Keil but failed (maybe I didn't try hard enough), so maybe somebody can help me.
If you are using UART DMA, it is calculated by hardware.
If you check the call hierarchy of HAL_UARTEx_RxEventCallback using your IDE, you can see how the Size variable is calculated.
The function is executed in the following flow (depending on the version of the HAL driver, it may be slightly different):
UART idle interrupt occurs
HAL_UART_IRQHandler() is called
If DMA mode is enabled, HAL_UARTEx_RxEventCallback(huart, (huart->RxXferSize - huart->RxXferCount)) is called
Therefore, the Size parameter is calculated as (huart->RxXferSize - huart->RxXferCount):
huart->RxXferSize is the value set when initializing RX DMA.
huart->RxXferCount is (huart->hdmarx)->Instance->NDTR.
NDTR is a value calculated by hardware: the size of the buffer remaining after the DMA transfers data to memory. So Size reflects the true byte count and does not depend on scanning the data for a terminator!
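To illustrate, a minimal sketch of the idle-line DMA reception flow (HAL_UARTEx_ReceiveToIdle_DMA is available in recent HAL versions; the UART handle and RX DMA channel are assumed to be configured by CubeMX, and processMessage is a hypothetical handler for your binary payload):
#define RX_BUF_LEN 128
static uint8_t rxBuf[RX_BUF_LEN];

void startReception(UART_HandleTypeDef *huart)
{
    // Size in the callback will be derived from NDTR by hardware,
    // so payload bytes of 0x00 are handled correctly.
    HAL_UARTEx_ReceiveToIdle_DMA(huart, rxBuf, RX_BUF_LEN);
}

void HAL_UARTEx_RxEventCallback(UART_HandleTypeDef *huart, uint16_t Size)
{
    processMessage(rxBuf, Size);                            // Size = RxXferSize - RxXferCount
    HAL_UARTEx_ReceiveToIdle_DMA(huart, rxBuf, RX_BUF_LEN); // re-arm reception
}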

How to diagnose a visual studio project slowing down as time goes on?

Computer:
Processor: Intel Xeon Silver 4114 CPU @ 2.19GHz (2 processors)
RAM: 96 GB 2666 MHz: 12 x 8 GB sticks
OS: Windows 10
GPU: None
Hard drive: Samsung MZVLB512HAJQ-000H2 - 512GB M.2 PCIe NVMe
IDE:
Visual Studio 2019
I am including what I am doing in case it is relevant. I am running a Visual Studio project where I read data off a GSC PCI SIO4B Sync Card 256K. Using the API for this card (documentation: http://www.generalstandards.com/downloads/GscApi.1.6.10.1.pdf) I read 150 bytes of data at a rate of 100Hz using the code below. That data is then split into the message structure of my device. I can't give info on the message structure, but the data is combined into the various words using a union and added to an integer array int Data[100];
Union Example:
union data_set{
unsigned int integer;
unsigned char input[2];
} word;
Example of how the data is read:
PLX_PHYSICAL_MEM cpRxBuffer;
#define TEST_BUFFER_SIZE 0x400
//allocates memory for the buffer
cpRxBuffer.Size = TEST_BUFFER_SIZE;
status = GscAllocPhysicalMemory(BoardNum, &cpRxBuffer);
status = GscMapPhysicalMemory(BoardNum, &cpRxBuffer);
memset((unsigned char*)cpRxBuffer.UserAddr, 0xa5, sizeof(cpRxBuffer));
// start data reception:
status = GscSio4ChannelReceivePlxPhysData(BoardNum, iRxChannel, &cpRxBuffer, SetMaxBytes, &messageID);
// wait for Rx operation to complete
status = GscSio4ChannelWaitForTransfer(BoardNum, iRxChannel, 7000, messageID, &amount);
if (status)
{
// If we have an error, "amount" will contain the number of bytes that were
// actually transferred.
DisplayErrorMessage(status);
printf("\n\t%04X bytes out of %04X transferred", amount, SetMaxBytes);
}
My issue is that this code works fine and keeps up for around 5 minutes; then, randomly, it stops being able to keep up and the FIFO (first in, first out) register on the PCI card begins to fill up faster than the code can process the data. To me this seems like a memory-leak issue, since the code works fine for a long time and then starts to slow down when nothing has changed: all the code is doing is reading the data off the card. We used to save the data in a really large array, but even after removing that we had the same issue.
I am unsure how to figure out exactly what is happening, and I'm hoping for a way to determine whether there is a memory leak and how to fix it if there is.
The memory leak is only a guess, though, and it could very well be something else causing the problem, so any out-of-the-box suggestions for diagnosing it are also appreciated.
Similar to Paul's answer, but I like to strategically place two (or more) _CrtMemCheckpoint calls followed by a _CrtMemDifference, to cut down the noise.
Memory leaks can be detected and reported on (in Debug builds) by calling the _CrtDumpMemoryLeaks function. When running under the debugger, this will tell you (in the output tab) how many allocations you have at the time that it is called and the file and line number that each was allocated from.
Call this right at the end of your program, after you (think you) have freed all the resources you use. Anything left over is a candidate for being a leak.
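A minimal sketch of the checkpoint variant (MSVC debug builds only; the _Crt* functions are declared in <crtdbg.h>, and doOneAcquisitionCycle is a hypothetical stand-in for the code you suspect of leaking):
#define _CRTDBG_MAP_ALLOC
#include <crtdbg.h>

void checkForLeaks()
{
    _CrtMemState before, after, diff;
    _CrtMemCheckpoint(&before);

    doOneAcquisitionCycle();   // hypothetical: one pass of the card-reading loop

    _CrtMemCheckpoint(&after);
    if (_CrtMemDifference(&diff, &before, &after))
    {
        // Prints counts and bytes of the allocations made between the two
        // checkpoints to the debugger's output window.
        _CrtMemDumpStatistics(&diff);
    }
}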

Using memcpy on mmap'ed region crashes, a for loop does not

I have an NVIDIA Tegra TK1 processor module on a carrier board with a PCI-e slot connecting to it. In that PCIe slot is an FPGA board which exposes some registers and a 64K memory area via PCIe.
On the ARM CPU of the Tegra board, a minimal Linux installation is running.
I am using /dev/mem and the mmap function to obtain user-space pointers to the register structs and the 64K memory area.
The distinct register files and the memory block are all assigned addresses which are aligned and do not overlap with regards to 4KB memory pages.
I explicitly map whole pages with mmap, using the result of getpagesize(), which also is 4096.
I can read/write from/to those exposed registers just fine.
I can read from the memory area (64KB), doing uint32 word-by-word reads in a for loop, just fine. I.e. read contents are correct.
But if I use std::memcpy on the same address range, the Tegra CPU always freezes. I don't see any error message; with GDB attached, I also don't see a thing in Eclipse when trying to step over the memcpy line: it just stops hard. And I have to reset the CPU using the hardware reset button, as the remote console is frozen.
This is a debug build with no optimization (-O0), using gcc-linaro-6.3.1-2017.05-i686-mingw32_arm-linux-gnueabihf. I was told the 64K region is accessible byte-wise; I did not try that explicitly.
Is there an actual (potential) problem that I need to worry about, or is there a specific reason why memcpy does not work and maybe should not be used in the first place in this scenario - and I can just carry on using my for loops and think nothing of it?
EDIT: Another effect has been observed: the original code snippet was missing a "vital" printf in the copying for loop, which came before the memory read. With that removed, I don't get back valid data. I have now updated the code snippet to do an extra read from the same address instead of the printf, which also yields correct data. The confusion intensifies.
Here are the (I think) important excerpts of what's going on, with minor modifications so they make sense in this "de-fluffed" form.
// void* physicalAddr: PCIe "BAR0" address as reported by dmesg, added to the physical address offset of FPGA memory region
// long size: size of the physical region to be mapped
//--------------------------------
// doing the memory mapping
//
const uint32_t pageSize = getpagesize();
assert( IsPowerOfTwo( pageSize ) );
const uint32_t physAddrNum = (uint32_t) physicalAddr;
const uint32_t offsetInPage = physAddrNum & (pageSize - 1);
const uint32_t firstMappedPageIdx = physAddrNum / pageSize;
const uint32_t lastMappedPageIdx = (physAddrNum + size - 1) / pageSize;
const uint32_t mappedPagesCount = 1 + lastMappedPageIdx - firstMappedPageIdx;
const uint32_t mappedSize = mappedPagesCount * pageSize;
const off_t targetOffset = physAddrNum & ~(off_t)(pageSize - 1);
m_fileID = open( "/dev/mem", O_RDWR | O_SYNC );
// addr passed as null means the kernel chooses where to place the mapping. Supplying a non-null addr would mean Linux takes it as a "hint" where to place it.
void* mapAtPageStart = mmap( 0, mappedSize, PROT_READ | PROT_WRITE, MAP_SHARED, m_fileID, targetOffset );
if (MAP_FAILED != mapAtPageStart)
{
m_userSpaceMappedAddr = (volatile void*) ( uint32_t(mapAtPageStart) + offsetInPage );
}
//--------------------------------
// Accessing the mapped memory
//
//void* m_rawData: <== m_userSpaceMappedAddr
//uint32_t* destination: points to a stack object
//int length: size in 32bit words of the stack object (a struct with only U32's in it)
// this crashes:
std::memcpy( destination, m_rawData, length * sizeof(uint32_t) );
// this does not, AND does yield correct memory contents - but only with a preceding extra read
for (int i=0; i<length; ++i)
{
// This extra read makes the data gotten in the 2nd read below valid.
// Commented out, the data read into destination will not be valid.
uint32_t tmp = ((const volatile uint32_t*)m_rawData)[i];
(void)tmp; //pacify compiler
destination[i] = ((const volatile uint32_t*)m_rawData)[i];
}
Based on the description, it looks like your FPGA logic is not responding correctly to load instructions that read from locations on your FPGA, and that is causing the CPU to lock up. It's not crashing; it is permanently stalled, hence the need for the hard reset. I had this problem too when debugging my PCIe logic on an FPGA.
Another indication that your logic is not responding correctly is that you need an extra read in order to get the right responses.
Your loop is doing 32-bit loads, but memcpy is doing at least 64-bit loads, which changes how your logic has to respond. For example, a 64-bit load may have to be answered with two completion TLPs: 32 bits of the response in the first 128-bit TLP of the completion and the next 32 bits in the second 128-bit TLP.
What I found super-useful was to add logic that logs all the PCIe transactions into an SRAM, and to be able to dump the SRAM out to see how the logic was behaving or misbehaving. We have a nifty utility, pcieflat, that prints one PCIe TLP per line. It even has documentation.
When the PCIe interface is not working well enough, I stream the log to a UART in hex, which can be decoded by pcieflat.
This tool is also useful for debugging performance problems: you can look at how well your DMA reads and writes are pipelined.
Alternatively, if you have an integrated logic analyzer or similar on the FPGA, you can trace the activity that way. But it's nicer to have the TLPs parsed according to the PCIe protocol.
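If fixing the FPGA logic isn't an option right away, one defensive workaround on the software side is to never let the compiler choose the access width for that region. A minimal sketch of a 32-bit-only copy helper (names invented for illustration; this just formalizes the for loop that already works):
#include <cstdint>
#include <cstddef>

// Copies 'words' 32-bit values out of a memory-mapped region using only
// 32-bit volatile loads, so the compiler can neither widen them into
// 64-bit accesses (as memcpy may) nor reorder or elide them.
static void copyFromMmio32(uint32_t* dst, const volatile uint32_t* src, std::size_t words)
{
    for (std::size_t i = 0; i < words; ++i)
        dst[i] = src[i];
}

// usage: copyFromMmio32(destination, static_cast<const volatile uint32_t*>(m_rawData), length);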

Correct use of memcpy

I have some problems with a project I'm doing. Basically I'm just using memcpy the wrong way. I know the theory of pointers/arrays/references and should know how to do this; nevertheless I've spent two days now without any progress. I'll try to give a short code overview and maybe someone sees a fault! I would be very thankful.
The setup: I'm using an ATSAM3x microcontroller together with a uC for signal acquisition. I receive the data over SPI.
I have an interrupt receiving the data whenever the uC has data available. The data is then stored in a buffer (int32_t buffer[1024 or 2048]). There is a counter that counts from 0 to the buffer size - 1 and determines the place where the data point is stored. Currently I receive a test signal that is internally generated by the uC.
//ch1: receive 24 bit data in 8 bit chunks -> store in an int32_t
ch1=ch1|(SPI.transfer(PIN_CS, 0x00, SPI_CONTINUE)<<24)>>8;
ch1=ch1|(SPI.transfer(PIN_CS, 0x00, SPI_CONTINUE)<<16)>>8;
ch1=ch1|(SPI.transfer(PIN_CS, 0x00, SPI_CONTINUE)<<8)>>8;
if(Not Important){
_ch1Buffer[_ch1SampleCount] = ch1;
_ch1SampleCount++;
if(_ch1SampleCount>SAMPLE_BUFFER_SIZE-1) _ch1SampleCount=0;
}
This ISR is active all the time. Since I need raw data for signal processing and the buffer is changed by the ISR whenever a new data point is available, I want to copy parts of the buffer into a temporary "storage".
To do so, I have another, global counter which is incremented within the ISR. In the main loop, whenever the counter reaches a certain size, I call a method to get some of the buffer data (about 30 samples).
The method acquires the current position in the buffer:
int ch1Pos = _ch1SampleCount;
and then, depending on that position, I try to use memcpy to get my samples. Depending on the position in the buffer, there has to be a "wrap-around" to get the full set of samples:
if(ch1Pos>=(RAW_BLOCK_SIZE-1)){
memcpy(&ch1[0],&_ch1Buffer[ch1Pos-(RAW_BLOCK_SIZE-1)] , RAW_BLOCK_SIZE*sizeof(int32_t));
}else{
memcpy(&ch1[RAW_BLOCK_SIZE-1 - ch1Pos],&_ch1Buffer[0],(ch1Pos)*sizeof(int32_t));
memcpy(&ch1[0],&_ch1Buffer[SAMPLE_BUFFER_SIZE-1-(RAW_BLOCK_SIZE- ch1Pos)],(RAW_BLOCK_SIZE-ch1Pos)*sizeof(int32_t));
}
_ch1Buffer is the buffer containing the raw data
SAMPLE_BUFFER_SIZE is the size of that buffer
ch1 is the array which is supposed to hold the set of samples
RAW_BLOCK_SIZE is the size of that array
ch1Pos is the position of the last data point written to the buffer by the ISR at the time this method is called
Technically I'm aware of the requirements, but apparently that's not enough ;-).
I know that the data received over the SPI interface is correct. The problem is that this is not the case for the extracted samples. There are a lot of spikes in the data that indicate I've been reading something I wasn't supposed to read. I've changed the memcpy commands so often that I've completely lost the overview. The code sample above is one of many versions, and while you're reading this I'm sure I've changed everything again.
I would appreciate every hint!
Thanks & Greetings!
EDIT
I've written down everything (again) on a sheet of paper and tested some constellations. This is the updated Code for the memcpy part:
if(ch1Pos>=(RAW_BLOCK_SIZE-1)){
memcpy(&ch1[0],&_ch1Buffer[ch1Pos-(RAW_BLOCK_SIZE-1)] , RAW_BLOCK_SIZE*sizeof(int32_t));
}else{
memcpy(&ch1[RAW_BLOCK_SIZE-1-ch1Pos],&_ch1Buffer[0],(ch1Pos+1)*sizeof(int32_t));
memcpy(&ch1[0],&_ch1Buffer[SAMPLE_BUFFER_SIZE-(RAW_BLOCK_SIZE-1-ch1Pos)],(RAW_BLOCK_SIZE-1-ch1Pos)*sizeof(int32_t));
}
This already made it a lot better; with all the changes, everything had kind of gotten messed up. Now there is just one error left: a periodic spike. I'll try to get more information, but I think it is a wrong access while wrapping around.
I've also changed if(_ch1SampleCount>SAMPLE_BUFFER_SIZE-1) _ch1SampleCount=0; to if(_ch1SampleCount>=SAMPLE_BUFFER_SIZE) _ch1SampleCount=0;.
EDIT II
To answer the questions of @David Schwartz:
SPI.transfer returns a single byte
The buffer is initialised once at startup: memset(_ch1Buffer,0,sizeof(int32_t)*SAMPLE_BUFFER_SIZE);
EDIT III
Sorry for the frequent updates; the comment section is getting too big.
I managed to get rid of a bunch of zero values at the beginning of the stream by decreasing ch1Pos: 'int ch1Pos = _ch1SampleCount;'. Now there is just one periodic "spike" (wrong value). It must be something with the split memcpy command. I'll continue looking. If anyone has an idea ... :-)

IRQ 8 isn't working... HW or SW?

First, I program for vintage-computer groups. What I write is specifically for MS-DOS and not Windows, because that's what people are running. My current program is for later systems, not the 8086 line, so the plan was to use IRQ 8. This allows me to set the interrupt rate in binary (power-of-two) values from 2/second to 8192/second (2, 4, 8, 16, etc.).
Only, for some reason, on the newer old systems (OK, that sounds weird), it doesn't seem to be working. In emulation, and on the 386 system I have access to, it works just fine, but on the P3 system I have (GA-6BXC MB w/P3 800 CPU), it just doesn't work.
The code
setting up the interrupt
disable();
oldrtc = getvect(0x70);     //Reads the vector for IRQ 8
setvect(0x70,countdown);    //Sets the vector for IRQ 8 to the handler
outportb(0x70,0x8a);        //Select RTC register A (with NMI disabled)
y = inportb(0x71) & 0xf0;   //Keep the upper control bits, clear the rate bits
outportb(0x70,0x8a);
outportb(0x71,y | _MRATE_); //Adjustable value, set for 64 interrupts per second
outportb(0x70,0x8b);        //Select RTC register B
y = inportb(0x71);
outportb(0x70,0x8b);
outportb(0x71,y | 0x40);    //Set bit 6: enable the periodic interrupt
enable();
at the end of the interrupt
outportb(0x70,0x0c);
inportb(0x71); //Reading the C register resets the interrupt
outportb(0xa0,0x20); //Resets the PIC (turns interrupts back on)
outportb(0x20,0x20); //There are 2 PICs on AT machines and later
When closing program down
disable();
outportb(0x70,0x8b);
y = inportb(0x71);
outportb(0x70,0x8b);
outportb(0x71,y & 0xbf);
setvect(0x70,oldrtc);
enable();
I don't see anything in the code that could be causing the problem, but it just doesn't seem to make sense. While I don't completely trust the information, MSD "does" report IRQ 8 as the RTC counter and says it is present and working just fine. Is it possible that later systems have moved the vector? Everything I find says that IRQ 8 is vector 0x70, but the interrupt never triggers on my Pentium III system. Is there some way to find out whether the vectors have been changed?
It's been a LONG time since I've done any MS-DOS code, and I don't think I ever worked with this particular interrupt (I'm pretty sure you can just read a memory location to fetch the time, and IRQ 0 can be used to trigger you at an interval too, so maybe that's better). Anyway, given my rustiness, forgive me for kinda link-dumping.
http://wiki.osdev.org/Real_Time_Clock (the bottom of that page has someone saying they've had problems on some machines too). RBIL suggests it might be a BIOS thing: http://www.ctyme.com/intr/rb-7797.htm
Without DOS, I'd just capture IRQ 0 itself, remap all of the IRQs to my own interrupt numbers, and change the timing as needed. I've done that somewhat recently! I think that's a bad idea on DOS, though; this looks more recommended for that: http://www.ctyme.com/intr/rb-2443.htm
Anyway though, I betcha it has to do with the BIOS thing:
"Notes: Many BIOSes turn off the periodic interrupt in the INT 70h handler unless in an event wait (see INT 15/AH=83h,INT 15/AH=86h).. May be masked by setting bit 0 on I/O port A1h "