Dynamically creating the volume based on the size of ubifs image size - ubifs

I have a requirement to create a new volume (it can be static) based on the size of the ubifs image (say rootfs.ubifs) which I am going to write into that volume. The aim is to create the volume with the minimum possible size required to write 'rootfs.ubifs' to that volume and boot the device from it.
Can somebody please help me in this regard?

The difference is the overhead of the UBI layer. This is documented as O in the web page or,
O - the overhead related to storing EC and VID headers in bytes, i.e. O = SP - SL.
SP is a physical erase block size and SL is what UbiFs will get. Usually, it is the minimum page size times two. One for an EC and another for a VID; these are the two structures that UBI uses to manage the flash. Both are defined in ubi-media.h. EC is the ubi_ec_hdr structure and VID is the ubi_vid_hdr structure. The EC or erase count is written every time an erase block is erased and this is responsible for wear leveling.note The VID or volume id header allows UBI to support multiple volumes and provide the PEB to LEB (physical to logical erase block) management.
So for a 2k page NAND flash without sub-pages, it is 4k; if sub-pages are supported then it is possible to put both headers in the same page and only 2k is needed. If your flash page size differs, you just need to multiply by two without sub-pages and only add the page overhead if you have sub-pages. The overhead for NOR flash is 256 bytes as it doesn't have the idea of pages.
In order to create your rootfs.ubifs, you must have specified a logic erase block size (to mkfs.ubifs). The difference between logical erase block (LEB) and physical erase block (PEB) is just the overhead documented above. Multiply your rootfs.ubifs by PEB/LEB to get the minimum possible size for the UBI volume.
note: If an erase is interrupted (reset/power cycle) between the actual erase and the EC write, an average of all other erase blocks is used to set the erase count when UBI re-reads the ubi device.

Related

Question about the Cortex-M3 vector table placement

I am trying to understand the placement of the vector table for Cortex-M3 processor.
According to the Cortex-M3 arch ref manual, the reset behavior is like this (some parts are omitted):
So, we can see that the vectortable comes from the VTOR (Vector Table Offset Register).
According to the Cortex-M3 tech ref manual, the VTOR is defined as:
So we can see, it has a reset value of 0x0. So based on the above 2 criteria, the Cortex-M3 processor expects a vector table at the absolute address 0x0 in the Code area after reset.
But in my MDK uVision IDE, I see my application is placed in the IROM1 area, which starts at 0x8000000, which is within the 0.5G Code memory area according to the Cortex-M3 memory map.
And since it has the Starup button checked, I guess that means the IROM1 area should contain the vector table (please correct me if I am wrong about this).
So I think the vector table should lie at the beginning of IROM1 area, i.e. 0x8000000. And it is indeed so. Below pic shows that at the beginning of IROM1, it is the vector table's 1st entry, the SP value.
And what's more strange, the VTOR register (at 0xE000ED08) still holds a 0x0 value:
So, how could my vector table be found with a 0x0 VTOR reset value?
And just out of curiosity, I checked the memory content at 0x0, there contains exactly the same vector table content as IROM1. So who did this magic copy??
ADD 1 - 4:39 PM 10/9/2020
I guess there must be something I don't know about the startup check box in below pic.
ADD 2 - 5:09 PM 10/9/2020
Thanks to #RealtimeRik and #domen. I downloaded the datasheet for STM32F103x8_xB(https://www.st.com/resource/en/datasheet/stm32f103c8.pdf). In section 4 Memory mapping, I saw below diagram:
So it seems the [0x0, 0x8000000) range does get aliased to somewhere else. But I haven't found how to determine where it is aliased to...
ADD 3 - 5:39 PM 10/9/2020
Now I found it!
I downloaded the STM32Fxxx fef manual (btw it's really huge).
In section 3.4 Boot configuration, it specifies the boot mode configured through the BOOT[1:0] pins.
And with different boot mode, different address aliasing is used:
Depending on the selected boot mode, main Flash memory, system memory
or SRAM is accessible as follows:
Boot from main Flash memory: the main Flash memory is aliased in the boot memory space (0x0000 0000), but still accessible from its
original memory space (0x800 0000). In other words, the Flash memory
contents can be accessed starting from address 0x0000 0000 or 0x800 0000.
Boot from system memory: the system memory is aliased in the boot memory space (0x0000 0000), but still accessible from its original
memory space (0x1FFF B000 in connectivity line devices, 0x1FFF F000 in
other devices).
Boot from the embedded SRAM: SRAM is accessible only at address 0x2000 0000.
What I saw is Boot from main Flash memory.
Well finally I can explain why 0x800 0000 is chosen...
ADD 4 - 3:19 PM 10/15/2020
The placement/expectation of the interrupt vector table at the address 0 is similar to the IA32 processor in real mode...
There is no "Magic Copy". 0x00000000 is aliased to 0x08000000.
The actual memory is physically located at 0x08000000 but can also be access at 0x00000000.
If you look in the processor specific reference manual you should find this in the the memory map section.

Reading directory contents in FAT32

I am trying to write bare metal FAT32 file system driver on RPI 3B.
I am able to read FAT sectors and root directory sectors using emmc driver.
I have doubt on how to follow FAT entries linked list when next entry pointer (next cluster number) doesn't fit in the current FAT sector?
Should I read the FAT sector each time I get to the new cluster number?
My current understanding is as follows:
Get first cluster number (cluster_number) of directory/file
Read the FAT sector which contains first_cluster_number entry.
Say I read FAT sector as
uint8_t fat_sector[512] = { 0 };
uint32_t this_fat_sector_num, this_fat_entry_offset;
this_fat_sector_num = unusedSectors + reservedSectorCount + ((cluster_number * 4)/ bytesPerSector);
this_fat_entry_offset = (cluster_number * 4)/ bytesPerSector;
read_fat_sector(this_fat_sector_num, & fat_sector[0]);
// Calculate next cluster in chain
uint32_t next_cluster_number = ((uint32_t * fat_sector[this_fat_entry_offset])) & 0x0fffffff;
// Calculate next cluster in chain 1 more time, is below code correct ?
uint32_t next_next_cluster_number = ((uint32_t * fat_sector[next_cluster_number])) & 0x0fffffff;
What happens when the next cluster number is not present in already read fat_sector buffer (512 bytes)?
If cluster number = index of next entry in fat_sector, Do I need to multiply it by 4 given that fat 32 entries span 4 bytes.
If anyone could give some clarity, that will be helpful. Thanks in advance.
Implement a cache (in RAM) of the FAT. Let's say that the cache has enough RAM for 20 sectors and starts out empty.
Next write a "getFATentry" function that checks if the sector is in the cache and finds the right entry in the cache if it is; or (if necessary) evicts something from the cache to make room, fetches the right sector from disk into the cache, then finds the right entry in the cache.
Once that's done you can next_cluster = getFATentry(previous_cluster); without worrying about the cache or any disk IO (but will want to do something when the FAT is modified - e.g. adopt a "write-through" or "write-back" policy).
Note: By adjusting the size of the "FAT cache" you can improve performance or reduce RAM consumption. It'd be nice to allow the cache to grow/shrink dynamically (e.g. grow to be as large as the whole FAT if nothing else needs RAM, but shrink to bare minimum if all the RAM is needed for something else).
I have found solution.
First read the initial fat sector for given cluster number.
Find out thisFatEntryOffset and read the next fat entry.
New fat entry will be new cluster number. Find out the thisFatNumber and thisFatEntryOffset for new cluster number.
If new fat sector != old fat sector then read new fat sector and read entry using thisFatEntryOffset.

Monitor buffers in GNU Radio

I have a question regarding buffering in between blocks in GNU Radio. I know that each block in GNU (including custom blocks) have buffers to store items that are going to be sent or received items. In my project, there is a certain sequence I have to maintain to synchronize events between blocks. I am using GNU radio on the Xilinx ZC706 FPGA platform with the FMCOMMS5.
In the GNU radio companion I created a custom block that controls a GPIO Output port on the board. In addition, I have an independent source block that is feeding information into the FMCOMMS GNU block. The sequence I am trying to maintain is that, in GNU radio, I first send data to the FMCOMMS block, second I want to make sure that the data got consumed by the FMCOMMS block (essentially by checking buffer), then finally I want to control the GPIO output.
From my observations, the source block buffer doesn’t seem to send the items until it’s full. This will cause a major issue in my project because this means that the GPIO data will be sent before or in parallel with sending the items to the other GNU blocks. That’s because I’m setting the GPIO value through direct access to its address in the ‘work’ function of my custom block.
I tried to use pc_output_buffers_full() in the ‘work’ function of my custom source in order to monitor the buffer, but I’m always getting 0.00. I’m not sure if it’s supposed to be used in custom blocks or if the ‘buffer’ in this case is something different from where the output items are stored. Here's a small code snippet which shows the problem:
char level_count = 0, level_val = 1;
vector<float> buff (1, 0.0000);
for(int i=0; i< noutput_items; i++)
{
if(level_count < 20 && i< noutput_items)
{
out[i] = gr_complex((float)level_val,0);
level_count++;
}
else if(i<noutput_items)
{
level_count = 0;
level_val ^=1;
out[i] = gr_complex((float)level_val,0);
}
buff = pc_output_buffers_full();
for (int n = 0; n < buff.size(); n++)
cout << fixed << setw(5) << setprecision(2) << setfill('0') << buff[n] << " ";
cout << "\n";
}
Is there a way to monitor the buffer so that I can determine when my first part of data bits have been sent? Or is there a way to make sure that the each single output item is being sent like a continuous stream to the next block(s)?
GNU Radio Companion version: 3.7.8
OS: Linaro 14.04 image running on the FPGA
Or is there a way to make sure that the each single output item is being sent like a continuous stream to the next block(s)?
Nope, that's not how GNU Radio works (at all!):
A while back I wrote an article that explains how GNU Radio deals with buffers, and what these actually are. While the in-memory architecture of GNU Radio buffers might be of lesser interest to you, let me quickly summarize the dynamics of it:
The buffers that (general_)work functions are called with behave for all that's practical like linearly addressable ring buffers. You get a random number of samples at once (restrictable to minimum numbers, multiples of numbers), and all that you not consume will be handed to you the next time work is called.
These buffers hence keep track of how much you've consumed, and thus, how much free space is in a buffer.
The input buffer a block sees is actually the output buffer of the "upstream" block in the flow graph.
GNU Radio's computation is backpressure-controlled: Any block's work method will immediately be called in an endless loop given that:
There's enough input for the block to do work,
There's enough output buffer space to write to.
Therefore, as soon as one block finishes its work call, the upstream block is informed that there's new free output space, thus typically leading to it running
That leads to high parallelity, since even adjacent blocks can run simultaneously without conflicting
This architecture favors large chunks of input items, especially for blocks that take a relative long time to computer: while the block is still working, its input buffer is already being filled with chunks of samples; when it's finished, chances are it's immediately called again with all the available input buffer being already filled with new samples.
This architecture is asynchronous: even if two blocks are "parallel" in your flow graph, there's no defined temporal relation between the numbers of items they produce.
I'm not even convinced switching GPIOs at times based on the speed computation in this completely non-deterministic timing data flow graph model is a good idea to start with. Maybe you'd rather want to calculate "timestamps" at which GPIOs should be switched, and send (timestamp, gpio state) command tuples to some entity in your FPGA that keeps absolute time? On the scale of radio propagation and high-rate signal processing, CPU timing is really inaccurate, and you should use the fact that you have an FPGA to actually implement deterministic timing, and use the software running on the CPU (i.e. GNU Radio) to determine when that should happen.
Is there a way to monitor the buffer so that I can determine when my first part of data bits have been sent?
Other than that, a method to asynchronously tell another another block that, yes, you've processed N samples, would be either to have a single block that just observes the outputs of both blocks that you want to synchronize and consumes an identical number of samples from both inputs, or to implement something using message passing. Again, my suspicion is that this is not a solution to your actual problem.

Aerospike error: All batch queues are full

I am running an Aerospike cluster in Google Cloud. Following the recommendation on this post, I updated to the last version (3.11.1.1) and re-created all servers. In fact, this change cause my 5 servers to operate in a much lower CPU load (it was around 75% load before, now it is on 20%, as show in the graph bellow:
Because of this low load, I decided to reduce the cluster size to 4 servers. When I did this, my application started to receive the following error:
All batch queues are full
I found this discussion about the topic, recommending to change the parameters batch-index-threads and batch-max-unused-buffers with the command
asadm -e "asinfo -v 'set-config:context=service;batch-index-threads=NEW_VALUE'"
I tried many combinations of values (batch-index-threads with 2,4,8,16) and none of them solved the problem, and also changing the batch-index-threads param. Nothing solves my problem. I keep receiving the All batch queues are full error.
Here is my aerospace.conf relevant information:
service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
paxos-recovery-policy auto-reset-master
pidfile /var/run/aerospike/asd.pid
service-threads 32
transaction-queues 32
transaction-threads-per-queue 4
batch-index-threads 40
proto-fd-max 15000
batch-max-requests 30000
replication-fire-and-forget true
}
I use 300GB SSD disks on these servers.
A quick note which may or may not pertain to you:
A common mistake we have seen in the past is that developers decide to use 'batch get' as a general purpose 'get' for single and multiple record requests. The single record get will perform better for single record requests.
It's possible that you are being constrained by the network between the clients and servers. Reducing from 5 to 4 nodes reduced the aggregate pipe. In addition, removing a node will start cluster migrations which adds additional network load.
I would look at the batch-max-buffer-per-queue config parameter.
Maximum number of 128KB response buffers allowed in each batch index
queue. If all batch index queues are full, new batch requests are
rejected.
In conjunction with raising this value from the default of 255 you will want to also raise the batch-max-unused-buffers to batch-index-threads x batch-max-buffer-per-queue + 1 (at least). If you do not do that new buffers will be created and destroyed constantly, as the amount of free (unused) buffers is smaller than the ones you're using. The moment the batch response is served the system will strive to trim the buffers down to the max unused number. You will see this reflected in the batch_index_created_buffers metric constantly rising.
Be aware that you need to have enough DRAM for this. For example if you raise the batch-max-buffer-per-queue to 320 you will consume
40 (`batch-index-threads`) x 320 (`batch-max-buffer-per-queue`) x 128K = 1600MB
For the sake of performance the batch-max-unused-buffers should be set to 13000 which will have a max memory consumption of 1625MB (1.59GB) per-node.

Concatenate data in an array in C ++

I'm working on software for processing audio in real time in C++ with Qt. I need that requirements are minimized.
Defining a temporary buffer 40ms, launching our device with a sampling frequency Fs = 8000Hz, every 320 samples entered a feature called Data Processing ().
The idea is to have a global buffer that stores the 10s last recorded, 80000 samples.
This Buffer in each iteration eliminates the initial 320 samples and looped at the end, 320 new samples. Thus the buffer is updated and the user can observe the real-time graphical representation of the recorded signal.
At first I thought of using QVector (equivalent to std::vector but for Qt) for this deployment, thus we reduce the process a few lines of code
int NUM_POINTS=320;
DatosTemporales.erase(DatosTemporales.begin(),DatosTemporales.begin()+NUM_POINTS);
DatosTemporales+= (DatosNuevos); // Datos Nuevos con un tamaño de NUM_POINTS
In each iteration we create a vector of 80000 samples in addition to free some positions so requires some processing time. An alternative for opting was the use of * double, and iterations a loop:
for(int i=0;i<80000;i++){
if(i<80000-NUM_POINTS){
aux=DatosTemporales[i];
DatosTemporales[i+NUM_POINTS]=aux;
}else{
DatosTemporales[i]=DatosNuevos[i-NUN_POINTS];
}
}
Does fails. I think the best way is to use dynamic memory. Implementing this process by pointers. Could anyone give me some idea how to implement it?
It sounds like what you are looking for is a circular buffer.
https://www.google.com/search?q=qcircularbuffer
https://qt.gitorious.org/qt/qtbase/merge_requests/60
And it looks like you only need the header file and you should be good to go.
A similar tool that is already in the Qt data set is found here:
http://doc.qt.io/qt-5/qcontiguouscache.html#details
The advantage of using a system like these presented, is that they don't need to have dynamic memory, it just needs to move the head and the tail pointers.
Hope that helps.