I need to read an array from a file. The array is not stored contiguously in the file; I have to jump "offset" bytes to get to the next element.
Which is more efficient, assuming that I'm reading a very large file?
1) Use an incremental relative position.
2) Use an absolute position.
option 1:
int var[N];
file.seekg(0);
for (int i = 0; i < N; i++) {
    file.read((char*)(var + i), sizeof(int));
    file.seekg(offset, ios_base::cur);
}
option 2:
int var[N];
for (int i = 0; i < N; i++) {
    file.seekg(offset * i);
    file.read((char*)(var + i), sizeof(int));
}
read will already advance the position, so you don't need to seek inside the loop. Moreover, arrays are laid out contiguously in memory, so you can just say:
std::vector<int> var(N);
file.read(reinterpret_cast<char*>(var.data()), sizeof(int) * var.size());
Just make sure to check the stream state and the byte count afterwards (read() returns the stream itself, so use gcount() for the number of bytes actually extracted):
if (!file || file.gcount() != static_cast<std::streamsize>(sizeof(int) * var.size()))
{
    // an error occurred
}
If you're reading from random parts of the file, it makes no difference how you seek (files are essentially "random access"). But be sure to run the above test after every single read to catch errors.
I'm 99.9% sure that it will make no difference at all, aside from correctness: offset needs to be adjusted for the fact that you've already moved sizeof(int) bytes forward in the relative case, but not in the absolute case. In both cases you do a seek, which moves the current position in the file. The filesystem code that handles it ultimately moves to an absolute position anyway, calculating it from the current one in the ios_base::cur case.
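For what it's worth, here is a sketch of option 2 with that adjustment applied - assuming each int in the file is followed by offset padding bytes, so the stride per element is sizeof(int) + offset:
const std::streamoff stride = sizeof(int) + offset; // element plus padding
for (int i = 0; i < N; i++) {
    file.seekg(i * stride);                         // absolute position of element i
    file.read(reinterpret_cast<char*>(&var[i]), sizeof(int));
    if (!file) { /* seek or read failed */ break; }
}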
If it's REALLY important for you to know which is better, then benchmark the two options. But I'm pretty certain that it makes absolutely no difference at all inside the actual seek function in the filesystem. It's just a large integer (probably 64 bits) keeping track of where in the file you are reading (or writing) next.
I am trying to copy from an array of arrays to another one, while leaving a space between arrays in the target.
Both are contiguous; each vector's size is between 5000 and 52000 floats.
output_jump is the vector size times eight, and vector_count varies in my tests.
I did my best with what I learned here https://stackoverflow.com/a/34450588/1238848 and here https://stackoverflow.com/a/16658555/1238848, but it still seems slow.
void copyToTarget(const float *input, float *output, int vector_count, int vector_size, int output_jump)
{
    int left_to_do;
    constexpr int block = 2048;
    constexpr int blockInBytes = block * sizeof(float);
    float temp[block];
    for (int i = 0; i < vector_count; ++i)
    {
        left_to_do = vector_size;
        while (left_to_do > block)
        {
            memcpy(temp, input, blockInBytes);
            memcpy(output, temp, blockInBytes);
            left_to_do -= block;
            input += block;
            output += block;
        }
        if (left_to_do)
        {
            memcpy(temp, input, left_to_do * sizeof(float));
            memcpy(output, temp, left_to_do * sizeof(float));
            input += left_to_do;
            output += left_to_do;
        }
        output += output_jump;
    }
}
I'm skeptical of the answer you linked, which encourages avoiding a function call to memcpy. Surely the implementation of memcpy is very well optimized, probably hand written in assembly, and therefore hard to beat! Moreover for large-sized copies, the function call overhead is negligible compared to memory access latency. So simply calling memcpy is likely the fastest way to copy contiguous bytes around in memory.
If output_jump were zero, a single call to memcpy could copy input directly to output (and that would be hard to beat). For nonzero output_jump, the copy needs to be divided up over the contiguous vectors: use one memcpy per vector, without the temp buffer, copying directly from input + i * vector_size to output + i * (vector_size + output_jump), as sketched below.
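A minimal sketch of that, keeping the original signature (untested; it assumes, as in the original code, that output_jump is measured in floats):
#include <cstddef>
#include <cstring>

void copyToTarget(const float *input, float *output,
                  int vector_count, int vector_size, int output_jump)
{
    for (int i = 0; i < vector_count; ++i)
    {
        // one memcpy per contiguous vector, no intermediate buffer
        std::memcpy(output + static_cast<std::size_t>(i) * (vector_size + output_jump),
                    input + static_cast<std::size_t>(i) * vector_size,
                    static_cast<std::size_t>(vector_size) * sizeof(float));
    }
}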
But better yet, like the top answer on that thread suggests, try if possible to find a way to avoid copying data in the first place.
I have a function which opens a file from an SD card, uses the file size to set the size of a buffer, writes a block of information to that buffer, then does something with that information, as shown in this code:
const char *filename = "filename.txt";
uint16_t duration;
uint16_t pixel;
int q = 0;
int w = 0;
bool largefile;
File f;
int readuntil;
long large_buffer;
f = SD.open(filename);
if (f.size() > 3072) {
    w = 3072;
} else {
    w = f.size();
}
uint8_t buffer[w];
while (f.available()) {
    f.read(buffer, sizeof(buffer));
    while (q < sizeof(buffer)) {
        doStuffWithInformation(buffer[q++]);
    }
    q = 0;
}
f.close();
This works great with smaller files, but anything over the hard buffer-size limit of 3072 (which I arrived at empirically; it's just the amount of memory that can safely be committed to this function) runs into a problem. Larger files read fine until they hit the last pass of the while(f.available()) loop, where they read the end of the file but then keep processing the buffer, the tail end of which still holds data from the previous pass that wasn't overwritten by the latest f.read(). How can I make sure that the last pass of the while(f.available()) loop only works with the information that was written to the buffer during the current pass? My only idea right now is to solve for factors of the file size and set the buffer size to the largest factor less than 3072, but that seems expensive to run every time this function is called. Is there an elegant solution staring me in the face?
Your program is not behaving correctly because f.read() is not guaranteed to fill the whole buffer. In fact, a short read is bound to happen on the last chunk of the file, unless the file size is a multiple of the buffer size (3072 in your case).
While the Arduino documentation (https://www.arduino.cc/en/Reference/FileRead) doesn't say so, the SD read function returns the number of bytes read. See the code of the library here: https://github.com/arduino-libraries/SD/blob/master/src/utility/SdFile.cpp, int16_t SdFile::read(void* buf, uint16_t nbyte).
Knowing that, you should change your loop as follows (while also rewriting it as a for loop for better readability and removing the q definition above):
while (f.available()) {
    int16_t sz = f.read(buffer, sizeof(buffer));
    if (sz <= 0) break; // read() signals an error with a negative value
    for (int16_t q = 0; q < sz; ++q) {
        doStuffWithInformation(buffer[q]);
    }
}
On a side note, now that you have this logic in place, it would make sense to do away with the variable-length array and use a fixed buffer of 512 bytes - the standard sector size on an SD card. Most likely it will yield the same read performance, and slightly better performance for sizeof, which becomes a compile-time constant rather than a run-time calculation. It also makes your program simpler. That gives the following code:
f = SD.open(filename);
...
uint8_t buffer[512];
I have this Matlab code:
[arr1, arr2, arr3] = fReadFileBin(filename);
where the body of the function is:
function [Result1 , Result2 , Result3 ] = fReadFileBin(filename)
fid = fopen(filename, 'r');
fseek(fid, 180, 0);
PV = fread(fid, [A*3 B+2], 'float32');
fclose(fid);
Result1 = PV(1:3:3*A, 2:B+1);
Result1 = Result1';
Result2 = PV(2:3:3*A, 2:B+1);
Result2 = Result2';
Result3 = PV(3:3:3*A, 2:B+1);
Result3 = Result3';
As a result I have three filled arrays of size BxA and type double.
When I tried to rewrite it in C++:
std::vector<std::vector<double>> result;
result.resize(B, std::vector<double>(A));
std::ifstream is(filename, std::ios::binary);
is.seekg(0, std::ios_base::end);
std::size_t size = is.tellg();
is.seekg(0, std::ios_base::beg);
is.seekg( 180, 0);
std::vector<double> PV (size / sizeof(double));
if (!is.read((char*)&PV[0], size))
{
throw std::runtime_error("error reading file");
}
// Load the data
is.read((char*)&PV[0], size);
is.close();
// std::vector<double> Result1 =
// std::vector<double> Result2 =
// std::vector<double> Result3 =
//R=R'
//R[j][i] = R[i][j];
This question makes sense to me, but I still don't get how to rewrite this part in C++: (1:3:3*A, 2:B+1)?
Notes:
-I'm limited to using only standard libraries (no boost, mmap, etc.).
-I checked the Mathworks documentation about the colon operator (and still cannot understand how to implement it).
As the result size of the vectors is fixed, I'd rather use std::array:
std::array<std::vector<double>, 3> result;
Then no resize is needed any more either - which, by the way, would have looked much simpler anyway:
//result.resize(B, std::vector<double>(A));
result.resize(3);
With this line, your outer vector now contains exactly three vectors - each of them still empty - just as with the array approach. Whichever you finally select, you need to resize the inner vectors explicitly afterwards. We'll come back to this later, though.
is.seekg(0, std::ios_base::end);
std::size_t size = is.tellg(); // OK, you fetched file size
//is.seekg(0, std::ios_base::beg); // using beg, you can give the desired offset directly
//is.seekg( 180, 0); // but use the seekdir enum! So:
is.seekg(180, std::ios_base::beg);
However, you should first check that the file has at least 180 bytes. Be aware that any of these operations might fail, so check the stream's state, either after each single operation or at least after a group of them (at the latest before resizing your vector PV). Side note: if the stream is already in the fail state, every subsequent operation will fail too, unless you clear() the error state first.
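A minimal sketch of such a grouped check (the throw is just placeholder error handling):
is.seekg(0, std::ios_base::end);
const std::size_t size = static_cast<std::size_t>(is.tellg());
is.seekg(180, std::ios_base::beg);
if (!is) // one check covering all three operations above
{
    throw std::runtime_error("seek/tell failed");
}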
std::vector<double> PV (size / sizeof(double));
Uh, that looks strange to me... You start at offset 180, so I assume you should subtract that before dividing, i.e.:
size_t size = ...;
if (size < 180) // bad file!
{
    // throw or whatever appropriate
}
size -= 180;
// go on...
Without this fix, the next line would always have resulted in the exception below being thrown, because you would have tried to read beyond the end of the file (remember, you started reading at file offset 180!):
if (!is.read((char*)&PV[0], size))
Prefer C++ style casts, though:
if (!is.read(reinterpret_cast<char*>(PV.data()), size))
You'll quickly discover that you need the reinterpret_cast. It is sometimes appropriate, but it should at least ring alarm bells whenever you consider using it; in most cases it just hides some deeper problem such as undefined behaviour. PV.data() exists since C++11 and reads a little easier than &PV[0], but the two are equivalent.
However, we now have yet a different issue:
Although the standard does not state anything about precision or even format ("The value representation of floating-point types is implementation-defined."), it is most likely that on your system double is a 64-bit IEEE754 type. Are you really sure the data is stored in exactly that format? Only then can this work at all, and even so it is risky: file producer and consumer could speak different languages, and chances are you get bad input...
Now admittedly, I am no matlab expert at all, but the following line of yours makes me strongly doubt that the above input format applies:
PV = fread(fid, [A*3 B+2], 'float32');
^^
Finally, you have already read your data within the if clause, so drop this second read line; it does nothing but produce another failure...
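If the data really is raw 32-bit floats, as that 'float32' suggests, one way (a sketch, assuming IEEE754 float on both producer and consumer, and size already reduced by the 180-byte header as shown above) is to read into a float vector and widen afterwards:
std::vector<float> raw(size / sizeof(float));
if (!is.read(reinterpret_cast<char*>(raw.data()), raw.size() * sizeof(float)))
{
    throw std::runtime_error("error reading file");
}
std::vector<double> pv(raw.begin(), raw.end()); // widen float -> double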
If the data is not stored in binary but in a human-readable format, you could read the values in as follows:
std::vector<double> pv; // prefer lower camel case variable names
pv.reserve(size/sizeof(double)); // just using as a size hint;
// we can't deduce number of entries
// from file length exactly any more
double v;
while (is >> v)
{
    pv.push_back(v);
}
if (!is.eof())
{
    // we did not consume the whole file, so we must
    // assume that some input error occurred!
    // -> appropriate error handling (throw?)
}
Getting to the end slowly:
// std::vector<double> Result1 =
// std::vector<double> Result2 =
// std::vector<double> Result3 =
Commented out; right, you don't need them, you have them already in the result vector/array, i. e. result[0], result[1] and result[2]...
Resize them (or reserve) as needed to place your result data into and go on.
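Since the colon expression was the actual question: PV(1:3:3*A, 2:B+1) takes every third row starting at row 1 and keeps columns 2 through B+1; the trailing ' transposes the result. Matlab's fread fills PV column by column, so after a successful read your pv vector holds the (3*A) x (B+2) matrix in column-major order. A minimal sketch of the equivalent loops, assuming each result[k] is stored row-major as a flat B x A matrix:
for (int k = 0; k < 3; ++k) // Result1, Result2, Result3
{
    result[k].resize(static_cast<std::size_t>(B) * A);
    for (int b = 0; b < B; ++b)     // Matlab columns 2 .. B+1
    {
        for (int a = 0; a < A; ++a) // Matlab rows k+1, k+1+3, k+1+6, ...
        {
            // column-major source index: column (b+1), row (3*a + k), both 0-based
            std::size_t src = static_cast<std::size_t>(b + 1) * (3 * A) + (3 * a + k);
            result[k][static_cast<std::size_t>(b) * A + a] = pv[src];
        }
    }
}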
I'm sorry, I am not really aware of what your matlab calculations do, and I'm not going to learn matlab for this answer - still, with the hints above you might get along yourself. Just a further hint: you cannot multiply vectors/arrays as a whole with each other or with a scalar directly; you have to do that for each element separately, within loops. You might consider std::valarray an interesting alternative, though. Additionally, you might find some interesting stuff in the algorithm library, especially under the section "numeric operations". Feel free to ask another question if you do not get along with these...
I have a program that generates files containing random distributions of the characters A - Z. I have written a method that reads these files (and counts each character) using fread with different buffer sizes, in an attempt to determine the optimal block size for reads. Here is the method:
int get_histogram(FILE *fp, long *hist, int block_size, long *milliseconds, long *filelen)
{
    char *buffer = new char[block_size];
    bzero(buffer, block_size);

    struct timeb t;
    ftime(&t);
    long start_in_ms = t.time * 1000 + t.millitm;

    size_t bytes_read = 0;
    while (!feof(fp))
    {
        bytes_read += fread(buffer, 1, block_size, fp);
        if (ferror(fp))
        {
            return -1;
        }
        int i;
        for (i = 0; i < block_size; i++)
        {
            int j;
            for (j = 0; j < 26; j++)
            {
                if (buffer[i] == 'A' + j)
                {
                    hist[j]++;
                }
            }
        }
    }
    ftime(&t);
    long end_in_ms = t.time * 1000 + t.millitm;

    *milliseconds = end_in_ms - start_in_ms;
    *filelen = bytes_read;
    return 0;
}
However, when I plot bytes/second vs. block size (buffer size) using block sizes of 2 - 2^20, I get an optimal block size of 4 bytes -- which just can't be correct. Something must be wrong with my code but I can't find it.
Any advice is appreciated.
EDIT:
The point of this exercise is to demonstrate the optimal buffer size by recording the read times (plus computation time) for different buffer sizes. The file pointer is opened and closed by the calling code.
There are many bugs in this code:
It uses new[], which is C++.
It doesn't free the allocated memory.
It always loops over block_size bytes of input, not bytes_read as returned by fread().
Also, the actual histogram code is rather inefficient, since it seems to loop over each character to determine which character it is.
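A sketch of the loop with those points addressed - process only the bytes fread() actually returned, index the histogram directly, and release the buffer:
size_t bytes_read = 0;
size_t n;
while ((n = fread(buffer, 1, block_size, fp)) > 0)
{
    bytes_read += n;
    for (size_t i = 0; i < n; i++)   // only the bytes actually read
    {
        if (buffer[i] >= 'A' && buffer[i] <= 'Z')
            hist[buffer[i] - 'A']++; // direct index, no inner loop over the alphabet
    }
}
delete[] buffer;                     // new[] needs a matching delete[]
if (ferror(fp))
    return -1;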
UPDATE: Removed claim that using feof() before I/O is wrong, since that wasn't true. Thanks to Eric for pointing this out in a comment.
You're not stating what platform you're running this on, and what compile time parameters you use.
Of course, the fread() involves some overhead, leaving user mode and returning. On the other hand, instead of setting the hist[] information directly, you're looping through the alphabet. This is unnecessary and, without optimization, causes some overhead per byte.
I'd re-test this with a direct update like hist[buffer[i] - 'A']++ or something similar.
Typically, the best timing would be achieved if your buffer size equals the system's buffer size for the given media.
So in my code I have a series of chars which I want to replace with random data. Since rand() produces an int's worth of random bits, I figured I could save some time by replacing four chars at once instead of one at a time. So basically instead of this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i++) // generating the data to send.
TXT[i] = rand() % 255;
I'd like to do something like:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4) // generating the data to send.
TXT[i] = rand() % 4294967295;
Something to that effect, but I'm not sure how to do the latter part. Any help you can give me is greatly appreciated, thanks!
That won't work. The compiler will take the result from rand() % big_number and chop off the extra data to fit it in an unsigned char.
Speed-wise, your initial approach was fine. The optimization you contemplated is valid, but most likely unneeded. It probably wouldn't make a noticeable difference.
What you wanted to do is possible, of course, but given your mistake, I'd say the effort to understand how right now far outweighs the benefits. Keep learning, and the next time you run across code like this, you'll know what to do (and can judge whether it's necessary), look back on this moment and smile :).
You'll have to access memory directly, and do some transformations on your data. You probably want something like this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i + sizeof(int) <= flenght; i += sizeof(int)) // generating the data to send.
{
    int *temp = (int*)&TXT[i]; // very ugly
    *temp = rand() % 4294967295;
}
It can be problematic though because of alignment issues, so be careful. Alignment issues can cause your program to crash unexpectedly, and are hard to debug. I wouldn't do this if I were you, your initial code is just fine.
TXT[i] = rand() % 4294967295;
Will not work the way you expect it to. Perhaps you are expecting that rand() % 4294967295 will generate a 4-byte integer (which you may be interpreting as 4 different characters). The value that rand() % 4294967295 produces will be cast into a single char and assigned to only one index of TXT.
Though it's not quite clear why you need to make 4 assignments at a time, one approach would be to use bit operators to obtain the 4 significant bytes of the generated number and assign those to four different indices.
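For illustration, such a byte extraction might look like this (a sketch; since RAND_MAX is often only 15 or 31 bits, two calls are combined here, and the topmost bits are still weaker):
unsigned int r = ((unsigned int)rand() << 16) ^ (unsigned int)rand();
TXT[i]     = (unsigned char)(r & 0xFF);          // byte 0
TXT[i + 1] = (unsigned char)((r >> 8) & 0xFF);   // byte 1
TXT[i + 2] = (unsigned char)((r >> 16) & 0xFF);  // byte 2
TXT[i + 3] = (unsigned char)((r >> 24) & 0xFF);  // byte 3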
There are already valid answers here, but just so much: C does not care very much about what type it stores at which address. So you can get away with something like:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
char *arr;
int *iArr;
int main(void) {
    int i;
    arr = malloc(100);
    /* Error handling omitted, yes that's evil */
    iArr = (int*) arr;
    for (i = 0; i < 25; i++) {
        iArr[i] = rand() % INT_MAX;
    }
    for (i = 0; i < 25; i++) {
        printf("iArr[%d] = %d\n", i, iArr[i]);
    }
    for (i = 0; i < 100; i++) {
        printf("arr[%d] = %c\n", i, arr[i]);
    }
    free(arr);
    return 0;
}
In the end an array is just some contiguous block in memory. And you can interpret it as you like (if you want). If you know that sizeof(int) = 4 * sizeof(char) then the above code will work.
I do not say I recommend it. And, as others have pointed out, whatever you do, the plain loop through all the chars in TXT will yield the same result. One could think of unrolling the loop, for example, but really I wouldn't care about that.
The (int*) alone is warning enough. It tells the compiler: do not think about what you believe the type is, just "believe" the programmer that he knows better.
Well, this "knowing better" is probably the root of all evil in C programming....
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4)
// generating the data to send.
TXT[i] = rand() % 4294967295;
This has a few issues:
TXT is not guaranteed to be memory-aligned as needed for the CPU to write int data (whether it works - perhaps relatively slowly - or not - e.g. SIGBUS on Solaris - is hardware specific)
the last 1-3 characters may be missed (even if you change i + 4 to i += 4 ;-P)
rand() returns an int anyway - you don't need to mod it with anything
you need to write your random data via an int* so you're accessing 4 bytes at a time and not simply slicing a byte off the end of the random data and overwriting every fourth single character
for stuff like this, where you depend on the size of int, you should really write it in terms of sizeof(int) so it'll work even if int isn't 32 bits, or use a fixed-width typedef such as int32_t (standard since C99/C++11 via <stdint.h>/<cstdint>; older Windows compilers spell it __int32, and a boost or other library header can also provide int32_t, or you can write your own typedef).
It's actually pretty tricky to align your text data: your code suggests you want int-sized slices from the 35th character... even if the overall character array is aligned properly for ints, the 35th character will not be.
If it really is always the 35th, then you can pad the data with a leading character so you're accessing the 36th (a multiple of the presumably 32-bit int size), then align the text to a 32-bit address (with a compiler-specific #pragma or using a union with int32_t). If the real code varies the character you start overwriting from, such that you can't simply align the data once, then you're stuck with:
your original character-at-a-time overwrites
non-portable unaligned overwrites (if that's possible and better on your system), OR
implementing code that overwrites up to three leading unaligned characters, then switches to 32-bit integer writes for aligned addresses, then back to character-by-character writes for up to three trailing characters (sketched below).
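A rough sketch of that last approach, written as a hypothetical helper (the names are made up for illustration; it assumes <stdint.h> for uintptr_t, and note that rand() may not supply a full 32 random bits - check RAND_MAX):
#include <stdint.h>
#include <stdlib.h>

void fill_random(unsigned char *txt, size_t start, size_t length)
{
    size_t i = start;
    /* unaligned head: byte by byte until the address is int-aligned */
    while (i < length && ((uintptr_t)&txt[i] % sizeof(int)) != 0)
        txt[i++] = rand() % 256;
    /* aligned body: one int-sized write per sizeof(int) bytes */
    while (i + sizeof(int) <= length) {
        *(int *)&txt[i] = rand();
        i += sizeof(int);
    }
    /* tail: up to three remaining bytes */
    while (i < length)
        txt[i++] = rand() % 256;
}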
That does not work because the generated value is converted to the type of the array element - char in this particular case. But you are free to interpret the allocated memory in any manner you like. For example, you could treat it as an array of int:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i + sizeof(int) <= flenght; i += sizeof(int)) // generating the data to send
    *(int*)(TXT + i) = rand(); // there is no need for the modulo operator
for (; i < flenght; ++i) // handle the 1..3 trailing bytes
    TXT[i] = rand(); // there is no need for the modulo operator here either
I just want to complete the solution with remarks about the modulo operator and about handling arrays whose size is not a multiple of sizeof(int).
1) % means "the remainder when divided by", so you want rand() % 256 for a char, or else you will never get chars with a value of 255. Similarly for the int case, although here there is no point in doing a modulus operation anyway, since you want the entire range of output values.
2) rand usually only generates two bytes at a time; check the value of RAND_MAX.
3) 34 isn't divisible by 4 anyway, so you will have to handle the end case specially.
4) You will want to cast the pointer, and it won't work if the data isn't already suitably aligned. Once you have the cast, though, there is no need to account for sizeof(int) in your iteration: pointer arithmetic automatically takes care of the element size (see the sketch below).
5) Chances are very good that it won't make a noticeable difference. If scribbling random data into an array is really the bottleneck in your program, then the program isn't really doing anything significant anyway.
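To illustrate point 4: once the pointer is cast, incrementing it walks in int-sized steps. A minimal sketch, assuming the data has been padded so that &TXT[36] is suitably aligned (as discussed above):
int *p = (int *)&TXT[36];                    /* assumes this address is int-aligned */
int *end = p + (flenght - 36) / sizeof(int); /* whole ints that fit in the remainder */
while (p < end)
    *p++ = rand();                           /* ++ advances by one int, not one byte */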