Reading directory contents in FAT32 - osdev

I am trying to write a bare-metal FAT32 file system driver on a Raspberry Pi 3B.
I am able to read FAT sectors and root directory sectors using the eMMC driver.
My doubt is how to follow the FAT cluster chain when the next entry (the next cluster number) does not fall within the FAT sector currently in memory.
Should I read a new FAT sector each time I reach a new cluster number?
My current understanding is as follows:
Get the first cluster number (cluster_number) of the directory/file.
Read the FAT sector which contains the entry for cluster_number.
Say I read FAT sector as
uint8_t fat_sector[512] = { 0 };
uint32_t this_fat_sector_num, this_fat_entry_offset;
this_fat_sector_num = unusedSectors + reservedSectorCount + ((cluster_number * 4) / bytesPerSector);
this_fat_entry_offset = (cluster_number * 4) % bytesPerSector; // byte offset within the sector
read_fat_sector(this_fat_sector_num, &fat_sector[0]);
// Calculate next cluster in chain
uint32_t next_cluster_number = (*(uint32_t *)&fat_sector[this_fat_entry_offset]) & 0x0FFFFFFF;
// Calculate next cluster in chain one more time - is the code below correct?
uint32_t next_next_cluster_number = (*(uint32_t *)&fat_sector[next_cluster_number]) & 0x0FFFFFFF;
What happens when the next cluster number is not present in the already-read fat_sector buffer (512 bytes)?
If the cluster number is the index of the next entry in fat_sector, do I need to multiply it by 4, given that FAT32 entries span 4 bytes?
If anyone could give some clarity, that would be helpful. Thanks in advance.

Implement a cache (in RAM) of the FAT. Let's say that the cache has enough RAM for 20 sectors and starts out empty.
Next write a "getFATentry" function that checks if the sector is in the cache and finds the right entry in the cache if it is; or (if necessary) evicts something from the cache to make room, fetches the right sector from disk into the cache, then finds the right entry in the cache.
Once that's done you can do next_cluster = getFATentry(previous_cluster); without worrying about the cache or any disk IO (but you will want to do something when the FAT is modified - e.g. adopt a "write-through" or "write-back" policy).
Note: By adjusting the size of the "FAT cache" you can improve performance or reduce RAM consumption. It'd be nice to allow the cache to grow/shrink dynamically (e.g. grow to be as large as the whole FAT if nothing else needs RAM, but shrink to bare minimum if all the RAM is needed for something else).
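A minimal sketch of such a getFATentry, untested, reusing the question's read_fat_sector() helper; fat_start_sector is a hypothetical variable standing in for unusedSectors + reservedSectorCount:

#include <stdint.h>

#define FAT_CACHE_SECTORS 20
#define SECTOR_SIZE 512

extern uint32_t fat_start_sector;                         // assumed: first FAT sector on disk
extern void read_fat_sector(uint32_t num, uint8_t *buf);  // the question's helper

typedef struct {
    uint32_t sector_num;           // absolute disk sector held in this slot
    uint8_t  data[SECTOR_SIZE];
    int      valid;
} fat_cache_slot;

static fat_cache_slot fat_cache[FAT_CACHE_SECTORS]; // starts out empty (zeroed)
static int next_evict = 0;                          // trivial round-robin eviction

// Returns the FAT entry (i.e. the next cluster number) for 'cluster'.
uint32_t getFATentry(uint32_t cluster)
{
    uint32_t sector = fat_start_sector + (cluster * 4) / SECTOR_SIZE;
    uint32_t offset = (cluster * 4) % SECTOR_SIZE;
    // Look for the sector in the cache first.
    for (int i = 0; i < FAT_CACHE_SECTORS; i++) {
        if (fat_cache[i].valid && fat_cache[i].sector_num == sector)
            return (*(uint32_t *)&fat_cache[i].data[offset]) & 0x0FFFFFFF;
    }
    // Miss: evict a slot and fetch the right sector from disk into the cache.
    fat_cache_slot *slot = &fat_cache[next_evict];
    next_evict = (next_evict + 1) % FAT_CACHE_SECTORS;
    read_fat_sector(sector, slot->data);
    slot->sector_num = sector;
    slot->valid = 1;
    return (*(uint32_t *)&slot->data[offset]) & 0x0FFFFFFF;
}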

I have found the solution.
First read the initial FAT sector for the given cluster number.
Find thisFatEntryOffset and read the next FAT entry; that entry is the new cluster number.
Find thisFatSectorNum and thisFatEntryOffset for the new cluster number.
If the new FAT sector != the old FAT sector, read the new FAT sector, then read the entry at thisFatEntryOffset.
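In code, a minimal sketch of that walk (untested), reusing the question's read_fat_sector() and variables; values >= 0x0FFFFFF8 mark end of chain per the FAT32 spec:

#include <stdint.h>

extern uint32_t unusedSectors, reservedSectorCount;       // from the question's code
extern void read_fat_sector(uint32_t num, uint8_t *buf);  // the question's helper

void follow_chain(uint32_t cluster)
{
    uint8_t fat_sector[512];
    uint32_t cached_sector = 0xFFFFFFFF;   // sentinel: no sector loaded yet

    while (cluster < 0x0FFFFFF8) {         // stop at the end-of-chain marker
        uint32_t sector = unusedSectors + reservedSectorCount + (cluster * 4) / 512;
        uint32_t offset = (cluster * 4) % 512;
        if (sector != cached_sector) {     // hit the disk only when crossing a sector
            read_fat_sector(sector, &fat_sector[0]);
            cached_sector = sector;
        }
        // ... process this cluster's data here ...
        cluster = (*(uint32_t *)&fat_sector[offset]) & 0x0FFFFFFF;
    }
}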

Related

Leveldb limit testing - limit Memory used by a program

I'm currently benchmarking an application built on Leveldb. I want to configure it in such a way that the key-values are always read from disk and not from memory.
For that, I need to limit the memory consumed by the program.
I'm using key-value pairs of 100 bytes each and 100,000 of them, which makes their total size 10 MB. If I set the virtual memory limit to less than 10 MB using ulimit, I can't even run the benchmark.
1) How can I configure the application so that the key value pairs are always fetched from the disk?
2) What does ulimit -v mean? Does limiting the virtual memory translate to limiting the memory used by the program on RAM?
Perhaps there is no need to reduce the available memory; you can simply disable the read cache, as described here:
leveldb::ReadOptions options;
options.fill_cache = false;
leveldb::Iterator* it = db->NewIterator(options);
for (it->SeekToFirst(); it->Valid(); it->Next()) {
  ...
}
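If you also want to bound how much RAM the block cache itself can use, LevelDB lets you supply your own cache when opening the database. A sketch (the 1 MB figure and the database path are arbitrary examples):

#include "leveldb/db.h"
#include "leveldb/cache.h"

leveldb::Options opts;
opts.block_cache = leveldb::NewLRUCache(1 * 1048576);  // cap the block cache at 1 MB
leveldb::DB* db = nullptr;
leveldb::Status s = leveldb::DB::Open(opts, "/tmp/testdb", &db);  // hypothetical path
// ... use db; delete db before deleting opts.block_cache, which you own.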

Slowdown when reading big file randomly with C++

I've run into some trouble when reading chunks of data at random locations all over a big file (>4 GB).
The task is to save a 3D data cube to a file and transpose the axes without loading the whole dataset into RAM.
The storage format is as follows:
There are 3 integers at the beginning of the file, storing the dimensions (nX, nY, nZ).
After that the data follows as lines of length nX.
These lines are repeated nY times, which makes a page, and the pages are repeated nZ times.
Meaning:
A line has nX bytes
A page has nX * nY bytes
The file has nX * nY * nZ + 12 bytes
To transpose the dataset I execute the following loop:
for (int i = 0; i < nY; i++)
{
    for (int j = 0; j < nZ; j++)
    {
        read(pBuf, i*nX + j*nY*nX); // read nX bytes from offset i*nX + j*nX*nY
        writeNext(pBuf);
    }
}
When using fopen, _fseeki64 and fread, after approx. 30% of the overall reads every 6th read or so takes up to 7 s; since there are multiple millions of those reads I can't accept these delays.
So I implemented the same algorithm with memory-mapped files (CreateFile, CreateFileMapping and MapViewOfFile), but now every 6th read takes about 2 s.
Is there a method/chance of increasing the readout speed?
EDIT1:
I've added some code at http://pastebin.com/MejiTKj0
EDIT2:
Some may notice an inconsistency regarding the offset in the read function. To simplify matters I didn't describe all the variables saved in the file header, so the offset of 15 bytes is fine.
If the files are stored on an HDD, you should know that seek times dominate heavily when performing random access. You may find you're better off reading the entire file sequentially into memory (a relatively quick operation compared to seeking) and then doing your processing on the in-memory data. You may find this is quicker even if you need only a relatively small percentage of the overall file data.
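A minimal sketch of that approach, assuming the file layout described in the question (hypothetical file names; error handling omitted):

#include <cstdio>
#include <cstdint>
#include <vector>

int main()
{
    FILE* in = std::fopen("datacube.bin", "rb");  // hypothetical input name
    int32_t nX, nY, nZ;
    std::fread(&nX, sizeof nX, 1, in);
    std::fread(&nY, sizeof nY, 1, in);
    std::fread(&nZ, sizeof nZ, 1, in);

    std::vector<uint8_t> cube((size_t)nX * nY * nZ);
    std::fread(cube.data(), 1, cube.size(), in);  // one big sequential read
    std::fclose(in);

    // Emit lines in transposed order straight from RAM - no more disk seeks.
    FILE* out = std::fopen("transposed.bin", "wb");
    for (int i = 0; i < nY; i++)
        for (int j = 0; j < nZ; j++)
            std::fwrite(&cube[(size_t)i*nX + (size_t)j*nX*nY], 1, nX, out);
    std::fclose(out);
}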
In your loops, Z (nZ) should be the outermost loop and Y the inner loop. That would save seek time, given that the storage layout stores the nZ pages one after another.
The code displayed has nZ in the inner loop, which is no good. The current arrangement of loops is analogous to reading a book by reading the first line of every page, then the second line of every page, and so on; a sketch of the swapped loops follows.
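Something like this, reusing the question's read/writeNext helpers; note the read offsets now advance monotonically through the file (this also changes the order in which lines reach writeNext, so the output layout changes accordingly):

for (int j = 0; j < nZ; j++)          // pages are contiguous on disk
{
    for (int i = 0; i < nY; i++)      // lines within the current page
    {
        read(pBuf, i*nX + j*nX*nY);   // offsets increase monotonically per page
        writeNext(pBuf);
    }
}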
Thank you all very much for your input.
Actually, the first thing I should have checked was at fault: the HDD, which wasn't able to provide the needed data rate.
I'm now thinking about switching to an SSD device.

Irregular file writing performance in c++

I am writing an app which receives a binary data stream with a simple function call like put(DataBLock, dateTime); where each data package is 4 MB.
I have to write these data blocks to separate files for future use with some additional data like id, insertion time, tag, etc.
So I both tried these two methods:
first with FILE:
data.id = seedFileId;
seedFileId++;
std::string fileName = getFileName(data.id);
const char* fNameArray = fileName.c_str(); // fopen takes const char*, no cast needed
FILE* pFile = fopen(fNameArray, "wb");
fwrite(reinterpret_cast<const char *>(&data.dataTime), 1, sizeof(data.dataTime), pFile);
data.dataInsertionTime = time(0);
fwrite(reinterpret_cast<const char *>(&data.dataInsertionTime), 1, sizeof(data.dataInsertionTime), pFile);
fwrite(reinterpret_cast<const char *>(&data.id), 1, sizeof(long), pFile);
fwrite(reinterpret_cast<const char *>(&data.tag), 1, sizeof(data.tag), pFile);
fwrite(reinterpret_cast<const char *>(&data.data_block[0]), 1, data.data_block.size() * sizeof(int), pFile);
fclose(pFile);
second with ostream:
ofstream fout;
data.id = seedFileId;
seedFileId++;
std::string fileName = getFileName(data.id);
fout.open(fileName.c_str(), ios::out | ios::binary | ios::app);
fout.write(reinterpret_cast<const char *>(&data.dataTime), sizeof(data.dataTime));
data.dataInsertionTime = time(0);
fout.write(reinterpret_cast<const char *>(&data.dataInsertionTime), sizeof(data.dataInsertionTime));
fout.write(reinterpret_cast<const char *>(&data.id), sizeof(long));
fout.write(reinterpret_cast<const char *>(&data.tag), sizeof(data.tag));
fout.write(reinterpret_cast<const char *>(&data.data_block[0]), data.data_block.size() * sizeof(int));
fout.close();
In my tests the first method looks faster, but my main problem is that, in both approaches, everything goes fine at first: every file write takes almost the same time (about 20 milliseconds). But after the 250th-300th package it starts to show peaks of 150 to 300 milliseconds, then drops back to 20 milliseconds, then spikes to 150 ms again, and so on. It becomes very unpredictable.
When I put some timers in the code I figured out that the main reason for these peaks is the fout.open(...) and pFile = fopen(...) lines. I have no idea if this is because of the operating system, the hard drive, or some kind of cache or buffer mechanism.
So the question is: why do these file-opening lines become problematic after some time, and is there a way to make the file write operation stable, i.e. fixed-time?
Thanks.
NOTE: I'm using Visual Studio 2008 VC++ on Windows 7 x64. (I also tried a 32-bit configuration, but the result is the same.)
EDIT: After some point the writing speed slows down as well, even when the file-opening time is at its minimum. I tried with different package sizes, and here are the results:
For 2 MB packages it takes twice as long before slowing down; the slowdown begins after roughly the 600th item.
For 4 MB packages, around the 300th item.
For 8 MB packages, around the 150th item.
So it seems to me it is some sort of caching problem (in the hard drive or the OS)? But I also tried disabling the hard drive cache and nothing changed.
Any idea?
Any idea?
This is all perfectly normal; you are observing the behavior of the file system cache, which is a chunk of RAM set aside by the operating system to buffer disk data. It is normally a fat gigabyte, and can be much more if your machine has lots of RAM. It sounds like you've got 4 GB installed, not that much for a 64-bit operating system. It depends, however, on the RAM needs of the other processes running on the machine.
Your calls to fwrite() or ofstream::write() write to a small buffer created by the CRT, which in turn makes operating system calls to flush full buffers. The OS write normally completes very quickly: it is a simple memory-to-memory copy from the CRT buffer to the file system cache. Effective write speed is in excess of a gigabyte per second.
The file system driver lazily writes the file system cache data to the disk, optimized to minimize the seek time of the write head, by far the most expensive operation on a disk drive. Effective write speed is determined by the rotational speed of the disk platter and the time needed to position the write head. Typical is around 30 megabytes per second for consumer-level drives, give or take a factor of 2.
Perhaps you see the fire-hose problem here: you are writing to the file cache a lot faster than it can be emptied. This hits the wall eventually; you'll manage to fill the cache to capacity and suddenly see the performance of your program fall off a cliff. Your program must then wait until space opens up in the cache so the write can complete; effective write speed is now throttled by disk write speed.
The 20 ms delays you observe are normal as well. That's typically how long it takes to open a file, a time completely dominated by disk head seek time, since the head needs to travel to the file system index to write the directory entry. Nominal times are between 20 and 50 ms; you are on the low end of that already.
Clearly there is very little you can do in your code to improve this. Which CRT functions you use certainly makes no difference, as you found out. At best you could increase the size of the files you write, which reduces the overhead spent on creating them.
Buying more RAM is always a good idea, but of course it merely delays the moment the fire hose overflows the bucket. You need better drive hardware to get ahead: an SSD is pretty nice, and so is a striped RAID array. The best thing to do is simply not to wait for your program to complete :)
So the question is: why do these file-opening lines become problematic after some time, and is there a way to make the file write operation stable, i.e. fixed-time?
This observation (i.e. the varying time taken by write operations) does not mean there is a problem in the OS or the file system; there could be various reasons behind it. One possible reason is that the kernel may use delayed writes to get the data to disk: sometimes the kernel caches (buffers) it in case another process should read or write it soon, so that an extra disk operation can be avoided.
This situation can lead to inconsistency in the time taken by different write calls for the same size of data/buffer.
File I/O is a complex and complicated topic and depends on various other factors. For complete information on the internal algorithms of file systems, you may want to refer to the great classic book "The Design of the UNIX Operating System" by Maurice J. Bach, which describes these concepts and their implementation in detail.
Having said that, you may want to use a flush call immediately after each write call in both versions of your program (i.e. C and C++). This way you may get consistent file I/O write times. Otherwise your programs' behaviour looks correct to me.
// C program: flush the CRT buffer right after the write
fwrite(data, 1, size, fp); // size = number of bytes in data
fflush(fp);
// C++ program
fout.write(data, size);
fout.flush();
It's possible that the spikes are not related to the I/O itself but to NTFS metadata: when your file count reaches some limit, some NTFS AVL-like data structure needs refactoring and... bump!
To check this you should preallocate the file entries, for example by creating all the files with zero size and then opening them for writing, just as a test: if my theory is correct you shouldn't see your spikes anymore.
Oh, and you must disable file indexing (the Windows Search service) there! Just remembered it... see here.
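A minimal sketch of that preallocation test, reusing the question's getFileName() helper (the file count is a hypothetical example):

#include <cstdio>
#include <string>

std::string getFileName(int id); // the question's helper (assumed signature)

void preallocateFileEntries(int count)
{
    for (int id = 0; id < count; id++) {
        std::string name = getFileName(id);
        FILE* f = std::fopen(name.c_str(), "wb"); // creates a zero-size file entry
        if (f) std::fclose(f);
    }
    // The timed writer then opens these existing files instead of creating new ones.
}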

Dynamically creating the volume based on the size of ubifs image size

I have a requirement to create a new volume (it can be static) based on the size of the UBIFS image (say rootfs.ubifs) which I am going to write into that volume. The aim is to create the volume with the minimum possible size required to write rootfs.ubifs to it and boot the device from it.
Can somebody please help me in this regard?
The difference is the overhead of the UBI layer. This is documented as O on the UBI web page:
O - the overhead related to storing EC and VID headers in bytes, i.e. O = SP - SL.
SP is the physical erase block size and SL is what UBIFS gets. Usually it is the minimum page size times two: one page for an EC header and another for a VID header; these are the two structures UBI uses to manage the flash. Both are defined in ubi-media.h. EC is the ubi_ec_hdr structure and VID is the ubi_vid_hdr structure. The EC (erase count) is written every time an erase block is erased, and is responsible for wear leveling (see the note below). The VID (volume id) header allows UBI to support multiple volumes and provides the PEB-to-LEB (physical to logical erase block) mapping.
So for a 2k page NAND flash without sub-pages, the overhead is 4k; if sub-pages are supported, both headers can be put in the same page and only 2k is needed. If your flash page size differs, multiply the page size by two without sub-pages, or count only a single page of overhead with sub-pages. The overhead for NOR flash is 256 bytes, as NOR has no concept of pages.
In order to create your rootfs.ubifs you must have specified a logical erase block size (to mkfs.ubifs). The difference between a logical erase block (LEB) and a physical erase block (PEB) is just the overhead documented above. Multiply your rootfs.ubifs size by PEB/LEB to get the minimum possible size for the UBI volume.
Note: if an erase is interrupted (reset/power cycle) between the actual erase and the EC write, an average of all other erase blocks is used to set the erase count when UBI re-reads the UBI device.
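As a worked example, a minimal sketch of the arithmetic, assuming a 2 KiB page NAND with 128 KiB PEBs, no sub-pages, and a hypothetical 40 MB rootfs.ubifs:

#include <cstdint>
#include <cstdio>

int main()
{
    const uint64_t peb      = 128 * 1024;       // SP: physical erase block
    const uint64_t overhead = 2 * 2048;         // O: EC + VID headers, one page each
    const uint64_t leb      = peb - overhead;   // SL: 124 KiB seen by UBIFS

    const uint64_t image = 40ull * 1024 * 1024;      // hypothetical rootfs.ubifs size
    const uint64_t lebs  = (image + leb - 1) / leb;  // round up to whole LEBs
    std::printf("minimum volume: %llu LEBs = %llu bytes of flash\n",
                (unsigned long long)lebs,
                (unsigned long long)(lebs * peb));
}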

Reading binary files, Linux Buffer Cache

I am busy writing something to test the read speeds for disk IO on Linux.
At the moment I have something like this to read the files:
Edited to change code to this:
const int segsize = 1048576;
char buffer[segsize];
ifstream file;
file.open(sFile.c_str());
while(file.readsome(buffer,segsize)) {}
For foo.dat, which is 150GB, the first time I read it in, it takes around 2 minutes.
However if I run it within 60 seconds of the first run, it will then take around 3 seconds to run. How is that possible? Surely the only place that could be read from that fast is the buffer cache in RAM, and the file is too big to fit in RAM.
The machine has 50 GB of RAM, and the drive is an NFS mount with all the default settings. Please let me know where I could look to confirm that this file is actually being read at this speed. Is my code wrong? It appears to take a correct amount of time the first time the file is read.
Edited to Add:
Found out that my files were only being read up to a random point. I've managed to fix this by changing segsize down to 1024 from 1048576. I have no idea why changing this allows the ifstream to read the whole file instead of stopping at a random point.
Thanks for the answers.
On Linux, you can do this for a quick throughput test:
$ dd if=/dev/md0 of=/dev/null bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.863904 s, 243 MB/s
$ dd if=/dev/md0 of=/dev/null bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.0748273 s, 2.8 GB/s
$ sync && echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/md0 of=/dev/null bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.919688 s, 228 MB/s
echo 3 > /proc/sys/vm/drop_caches will flush the cache properly
in_avail doesn't give the length of the file, but a lower bound on what is available (especially if the buffer has already been used, it returns the size available in the buffer). Its goal is to tell you what can be read without blocking.
An unsigned int is most probably unable to hold a length of more than 4 GB, so what is read can very well be coming from the cache.
C++0x stream positioning may be interesting to you if you are using large files.
in_avail returns a lower bound on how much is available to read in the stream's read buffer, not the size of the file. To read the whole file via the stream, just keep calling the stream's readsome() method and checking how much was read with the gcount() method; when that returns zero, you have read everything.
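A sketch of such a loop; note it uses read() rather than readsome(), since readsome() only drains the stream's internal buffer and can return 0 long before end-of-file, and it counts bytes in a 64-bit variable to avoid the 4 GB wrap mentioned above:

#include <fstream>
#include <vector>

int main()
{
    const std::streamsize segsize = 1048576;
    std::vector<char> buffer(segsize);
    std::ifstream file("foo.dat", std::ios::binary); // file name from the question

    unsigned long long total = 0;
    while (file.read(buffer.data(), segsize) || file.gcount() > 0) {
        total += file.gcount();  // bytes actually transferred in this pass
    }
    // 'total' is now the full file length, counted in 64 bits (no 4 GB wrap).
}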
It appears to take a correct amount of time the first time the file is read.
On that first read, you're reading 150GB in about 2 minutes. That works out to about 10 gigabits per second. Is that what you're expecting (based on the network to your NFS mount)?
One possibility is that the file may be at least partly sparse. A sparse file has regions that are truly empty: they don't even have disk space allocated to them. These sparse regions also don't consume much cache space, so reading them essentially only requires the time to zero out the userspace pages they're being read into.
You can check with ls -lsh. The first column is the on-disk size; if it's less than the file size, the file is indeed sparse. To de-sparse the file, just write to every page of it.
If you would like to test true disk speeds, one option is to use the O_DIRECT flag to open(2) to bypass the cache. Note that all I/O using O_DIRECT must be page-aligned, and some filesystems do not support it (in particular, it won't work over NFS). Also, it's a bad idea for anything other than benchmarking. See some of Linus's rants in this thread.
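A minimal sketch of an O_DIRECT read (Linux-specific; the 4 KiB alignment, 1 MiB chunk size and file name are example assumptions):

#include <fcntl.h>
#include <unistd.h>
#include <cstdlib>

int main()
{
    // O_DIRECT bypasses the page cache; buffer, offset and length must be aligned.
    int fd = open("foo.dat", O_RDONLY | O_DIRECT);
    void* buf = nullptr;
    if (fd < 0 || posix_memalign(&buf, 4096, 1048576) != 0)
        return 1;
    ssize_t n = read(fd, buf, 1048576); // comes straight from the disk, uncached
    (void)n;
    free(buf);
    close(fd);
}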
Finally, to drop all caches on a Linux system for testing, you can do:
echo 3 > /proc/sys/vm/drop_caches
If you do this on both the client and the server, you will force the file out of memory. Of course, this will have a negative performance impact on anything else running at the time.