Translating Matlab code of vector creation to C++ - c++

I have this Matlab code:
[arr1 arr2 arr3] = fReadFileBin(filename);
Where the body of the functions is :
function [Result1 , Result2 , Result3 ] = fReadFileBin(filename)
fid = fopen(filename, 'r');
fseek(fid, 180, 0);
PV = fread(fid, [A*3 B+2], 'float32');
fclose(fid);
Result1 = PV(1:3:3*A, 2:B+1);
Result1 = Result1';
Result2 = PV(2:3:3*A, 2:B+1);
Result2 = Result2';
Result3 = PV(3:3:3*A, 2:B+1);
Result3 = Result3';
As a result I have 3 filled matrices of size BxA and type double.
When I tried to rewrite it in C++:
std::vector<std::vector<double>> result;
result.resize(B, std::vector<double>(A));
std::ifstream is(filename, std::ios::binary);
is.seekg(0, std::ios_base::end);
std::size_t size = is.tellg();
is.seekg(0, std::ios_base::beg);
is.seekg( 180, 0);
std::vector<double> PV (size / sizeof(double));
if (!is.read((char*)&PV[0], size))
{
throw std::runtime_error("error reading file");
}
// Load the data
is.read((char*)&PV[0], size);
is.close();
// std::vector<double> Result1 =
// std::vector<double> Result2 =
// std::vector<double> Result3 =
//R=R'
//R[j][i] = R[i][j];
This question does make sense to me, but I still don't get how I can rewrite this part: (1:3:3*A, 2:B+1) in C++?
Notes:
-I'm limited to use only standard libraries (no boost, mmap, etc.).
-I checked Mathwork documentation about colon (and still cannot understand how to implement it).

As the number of result vectors is fixed, I'd rather use std::array:
std::array<std::vector<double>, 3> result;
No resize is needed then. If you stay with the outer vector, the resize becomes much simpler as well:
//result.resize(B, std::vector<double>(A));
result.resize(3);
With this line, your outer vector contains exactly three vectors, each of them still empty, just as with the array approach. Whichever you finally select, you then need to resize the inner vectors explicitly. We'll come back to this later, though.
is.seekg(0, std::ios_base::end);
std::size_t size = is.tellg(); // OK, you fetched file size
//is.seekg(0, std::ios_base::beg); // using beg, you can give the desired offset directly
//is.seekg( 180, 0); // but use the seekdir enum! So:
is.seekg(180, std::ios_base::beg);
However, you should check for the file having at least 180 bytes before. You should be aware that any of these operations might fail, so you should check the stream's state, either after each single operation or at least after several of them in group (so at least before resizing your vector PV). Side note: If the stream is already in fail state, every subsequent operation will fail, too, unless you clear() the error state before.
std::vector<double> PV (size / sizeof(double));
Uh, looks strange to me... You start at offset 180, so I assume you should subtract before division; i. e.:
size_t size = ...;
if(size < 180) // bad file!
{
// throw or whatever appropriate
}
size -= 180;
// go on...
Without this fix, next line would have always resulted in the following exception being thrown because you would have read beyond the end of the file (remember, you started reading from file offset 180!):
if (!is.read((char*)&PV[0], size))
Prefer C++ style casts, though:
if (!is.read(reinterpret_cast<char*>(PV.data()), size))
You'll quickly discover that you need reinterpret_cast here. It is sometimes appropriate, but it should at least ring alarm bells whenever you consider using it; in most cases, it is just hiding some deeper problem such as undefined behaviour. PV.data() exists since C++11 and reads a little easier than &PV[0], but both are equivalent.
However, we now have yet a different issue:
Although the standard does not state anything about precision or even format ("The value representation of floating-point types is implementation-defined."), it is most likely that on your system double is a 64-bit IEEE754 data type. Are you really sure that the data is stored exactly in this format? Only then, this can work at all, still, it is very risky, file producer and consumer could speak different languages and chances are that you get bad input...
Now admittedly, I am no matlab expert at all; still, the following line of yours makes me strongly doubt that the above input format applies, as it reads the values as 'float32', i. e. 32-bit floats:
PV = fread(fid, [A*3 B+2], 'float32');
Finally, you have read your data already within the if clause, so drop this second reading line, it is for nothing but producing another failure...
If now data is not stored in binary, but human readable format, you could read the values in as follows:
std::vector<double> pv; // prefer lower camel case variable names
pv.reserve(size/sizeof(double)); // just using as a size hint;
// we can't deduce number of entries
// from file length exactly any more
double v;
while(is >> v)
{
pv.push_back(v);
}
if(!is.eof())
{
// we did not consume the whole file, so we must
// assume that some input error occurred!
// -> appropriate error handling (throw?)
}
Getting to the end slowly:
// std::vector<double> Result1 =
// std::vector<double> Result2 =
// std::vector<double> Result3 =
Commented out; right, you don't need them, you have them already in the result vector/array, i. e. result[0], result[1] and result[2]...
Resize them (or reserve) as needed to place your result data into and go on.
I am sorry, but I am not really aware of what your matlab calculations do, and I'm not going to learn matlab for this answer; still, with the hints above you might get along yourself. Just a further hint: you cannot multiply vectors/arrays as a whole with each other or with a scalar directly; you have to do this for each element separately within loops. You might consider std::valarray an interesting alternative, though. Additionally, you might find some interesting stuff in the algorithm library, especially under the section "numeric operations". Feel free to ask another question if you do not get along with these...
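For the slicing the question asks about, plain index arithmetic may be enough. The following is only a sketch under stated assumptions: that MATLAB's fread fills the (3*A) x (B+2) matrix column by column, that the file holds 32-bit IEEE 754 floats in the byte order of the reading machine, and that A and B are known in advance. The helper names readFloatsAfterHeader and sliceTransposed are hypothetical, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read 'count' float32 values that follow a header of
// headerBytes bytes (180 in the question). Assumes IEEE 754 single-precision
// values stored in the byte order of the machine running this code.
std::vector<float> readFloatsAfterHeader(const std::string& filename,
                                         std::size_t headerBytes,
                                         std::size_t count)
{
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    std::ifstream is(filename, std::ios::binary);
    if (!is) throw std::runtime_error("cannot open file");
    is.seekg(static_cast<std::streamoff>(headerBytes), std::ios_base::beg);
    std::vector<float> pv(count);
    if (!is.read(reinterpret_cast<char*>(pv.data()),
                 static_cast<std::streamsize>(count * sizeof(float))))
        throw std::runtime_error("error reading file");
    return pv;
}

// Sketch of  R = PV(k:3:3*A, 2:B+1)';  for k = 1, 2, 3 (pass first = k - 1).
// pv is the flat column-major buffer of a (3*A) x (B+2) matrix, so the
// 0-based element (row, col) sits at pv[col * 3*A + row].
std::vector<std::vector<double>> sliceTransposed(const std::vector<float>& pv,
                                                 std::size_t first,
                                                 std::size_t A, std::size_t B)
{
    const std::size_t rows = 3 * A;
    std::vector<std::vector<double>> r(B, std::vector<double>(A));
    for (std::size_t b = 0; b < B; ++b)        // MATLAB columns 2..B+1
        for (std::size_t a = 0; a < A; ++a)    // MATLAB rows k, k+3, k+6, ...
            r[b][a] = pv[(b + 1) * rows + 3 * a + first];
    return r;
}
```

With pv = readFloatsAfterHeader(filename, 180, 3 * A * (B + 2)), the three results would then be sliceTransposed(pv, 0, A, B), sliceTransposed(pv, 1, A, B) and sliceTransposed(pv, 2, A, B). The transpose is folded into the loop order, so no separate step is needed.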

Related

Element-wise shifting from smaller array to a larger array

I am programming an ESP32 in the Arduino framework. For my application, I need to create a buffer which will store information from both the present and the last time it was accessed. Here is what I am attempting to do.
//first buffer
char buffer1[4];
//second buffer
char buffer2[8];
void setup() {
//setup
}
//buffer1 values will change with each iteration of loop from external inputs
//buffer2 must store most recent values of buffer1 plus values of buffer1 from when loop last ran
for example:
**loop first iteration**
void loop() {
buffer1[0] = {1};
buffer1[1] = {2};
buffer1[2] = {3};
buffer1[3] = {1};
saveold(); //this is the function I'm trying to implement to save values to buffer2 in an element-wise way
}
//value of buffer2 should now be: buffer2 = {1,2,3,1,0,0,0,0}
**loop second iteration**
void loop() {
buffer1[0] = {2};
buffer1[1] = {3};
buffer1[2] = {4};
buffer1[3] = {2};
saveold();
}
//value of buffer2 should now be: buffer2 = {2,3,4,2,1,2,3,1}
From what I've been able to understand through searching online, the "saveold" function I'm trying to make
should implement some form of memmove for these array operations
I've tried to piece it together, but I always overwrite the value of buffer2 instead of somehow shifting new values in, while retaining the old ones
This is all I've got:
void saveold() {
memmove(&buffer2[0], &buffer1[0], (sizeof(buffer1[0]) * 4));
}
From my understanding, this copies buffer1 starting from index position 0 to buffer2, starting at index position 0, for 4 bytes (where 1 char = 1 byte).
Computer science is not my background, so perhaps there is some fundamental solution or strategy that I am missing. Any pointers would be appreciated.
You have multiple options to implement saveold():
Solution 1
void saveold() {
// "shift" lower half into upper half, saving recent values (actually it's a copy)
buffer2[4] = buffer2[0];
buffer2[5] = buffer2[1];
buffer2[6] = buffer2[2];
buffer2[7] = buffer2[3];
// copy current values
buffer2[0] = buffer1[0];
buffer2[1] = buffer1[1];
buffer2[2] = buffer1[2];
buffer2[3] = buffer1[3];
}
Solution 2
void saveold() {
// "shift" lower half into upper half, saving recent values (actually it's a copy)
memcpy(buffer2 + 4, buffer2 + 0, 4 * sizeof buffer2[0]);
// copy current values
memcpy(buffer2 + 0, buffer1, 4 * sizeof buffer1[0]);
}
Some notes
There are even more ways to do it. Anyway, choose the one you understand best.
Be sure that buffer2 is exactly double size of buffer1.
memcpy() can be used safely if source and destination don't overlap. memmove() checks for overlaps and reacts accordingly.
&buffer1[0] is the same as buffer1 + 0. Feel free to use the expression you better understand.
sizeof is an operator, not a function. So sizeof buffer1[0] evaluates to the size of buffer1[0]. A common and widely accepted expression to calculate the number of elements of an array is sizeof buffer1 / sizeof buffer1[0]. You only need parentheses if you evaluate the size of a data type, like sizeof (int).
Solution 3
The last note leads directly to this improvement of solution 1:
void saveold() {
// "shift" lower half into upper half, saving recent values
size_t size = sizeof buffer2 / sizeof buffer2[0];
for (size_t i = 0; i < size / 2; ++i) {
buffer2[size / 2 + i] = buffer2[i];
}
// copy current values
for (size_t i = 0; i < size / 2; ++i) {
buffer2[i] = buffer1[i];
}
}
To apply this knowledge to solution 2 is left as an exercise for you. ;-)
The correct way to do this is to use buffer pointers, not hard-copy backups. Doing hard copies with memcpy is particularly bad on slow legacy microcontrollers such as AVR. I'm not quite sure what MCU core this ESP32 has; it seems to be some oddball one from Tensilica. Anyway, this answer applies universally to any processor where you have more data than the CPU data word length.
perhaps there is some fundamental solution or strategy that I am missing.
Indeed, it really sounds as if what you are looking for is a ring buffer. That is, an array of fixed size which has a pointer to the beginning of the valid data, and another pointer at the end of the data. You move the pointers, not the data. This is much more efficient both in terms of execution speed and RAM usage, compared to making naive hard copies with memcpy.
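For the two-generation case in the question, the "move the pointer, not the data" idea might look like the sketch below. The History type and its members are hypothetical names chosen for illustration; the sketch assumes exactly two samples of four bytes each need to be kept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical two-slot history: slot 'head' holds the newest sample, the
// other slot holds the one before it. Only the index moves between pushes.
struct History {
    static constexpr std::size_t N = 4;   // bytes per sample, as in buffer1
    char slots[2][N];
    std::size_t head;

    History() : head(0) { std::memset(slots, 0, sizeof slots); }

    // Store a new sample by overwriting the older slot, then treat that
    // slot as the newest one. The previous sample is never copied.
    void push(const char* sample) {
        head ^= 1u;
        std::memcpy(slots[head], sample, N);
    }
    const char* current() const  { return slots[head]; }
    const char* previous() const { return slots[head ^ 1u]; }
};
```

The new sample is still copied in once per push, but the older sample stays where it is; reading current() followed by previous() gives the same eight values the question expects in buffer2.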

file size and buffer overshoot

I have a function which opens a file from an SD card, uses the file size to set the size of a buffer, writes a block of information to that buffer, then does something with that information, as shown in this code:
const char *filename = "filename.txt";
uint16_t duration;
uint16_t pixel;
int q = 0;
int w = 0;
bool largefile;
File f;
int readuntil;
long large_buffer;
f = SD.open(filename);
if(f.size() > 3072) {
w = 3072;
} else {
w = f.size();
}
uint8_t buffer[w];
while(f.available()) {
f.read(buffer, sizeof(buffer));
while(q < sizeof(buffer)) {
doStuffWithInformation(buffer[q++]);
}
q=0;
}
f.close();
This works great with smaller file sizes, but anything over the hard limit buffer size of 3072 (which I arrived at empirically, it's just the amount of memory that can be safely committed to this function) runs into a problem. Larger files read fine until they hit the last loop of while(f.available()), where they read the end of the file, but then continue reading the buffer, the tail end of which is filled with data from the previous loop that wasn't overwritten by the latest f.read(). How can I make sure that the last pass of the while(f.available()) loop only works with the information that was written to the buffer during the current pass? My only idea right now is to solve for factors of the file size, and set the buffer size as the largest factor less than 3072, but this seems intensive to run every time this function is called. Is there an elegant solution staring me in the face?
Your program is not behaving correctly because f.read() is not guaranteed to fill the whole buffer. Moreover, a short read is bound to happen when you read the last chunk of the file, unless the file size is an exact multiple of the buffer size (3072 in your case).
While the Arduino reference (https://www.arduino.cc/en/Reference/FileRead) doesn't say so, the read function returns the number of bytes read. See the code of the library here: https://github.com/arduino-libraries/SD/blob/master/src/utility/SdFile.cpp, int16_t SdFile::read(void* buf, uint16_t nbyte)
Knowing that, you should change your loop as follows (while also rewriting the inner loop as a for loop for better readability and removing the definition of q above):
while(f.available()) {
int sz = f.read(buffer, sizeof(buffer)); // number of bytes actually read
if (sz <= 0) break; // read error or nothing left
for (int q = 0; q < sz; ++q) {
doStuffWithInformation(buffer[q]);
}
}
On a side note, now that you have this logic in place, it would make sense to do away with the variable-length array and use a fixed buffer of size 512, the standard sector size on the SD card. Most likely, it will yield the same read performance, and slightly better performance with regard to sizeof, which becomes a compile-time constant rather than a run-time calculation. This also makes your program simpler. It leads to the following code:
f = SD.open(filename);
...
uint8_t buffer[512];
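The same principle can be tried out on a desktop with standard C++ streams, where gcount() plays the role of the return value of f.read(). This is only an illustrative sketch, not Arduino code; processInChunks is a hypothetical name, and appending to a string stands in for doStuffWithInformation.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads 'in' through a fixed 512-byte buffer and appends every byte that was
// actually delivered to 'out'. Only the first 'got' bytes of the buffer are
// touched in each round, so stale data from an earlier round is never used.
std::size_t processInChunks(std::istream& in, std::string& out)
{
    char buffer[512];
    std::size_t total = 0;
    while (in) {
        in.read(buffer, sizeof buffer);
        const std::streamsize got = in.gcount();   // bytes read this round
        if (got <= 0) break;
        out.append(buffer, static_cast<std::size_t>(got));
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

The last chunk is usually shorter than the buffer, which is exactly the case that went wrong in the original loop.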

C++ converting string containing non human readable data to 200 double

I have a string whose length is 1600 and I know that it contains 200 double. When I print out the string I get the following :Y���Vz'#��y'#��!U�}'#�-...
I would like to convert this string to a vector containing the 200 doubles.
Here is the code I tried (blobString is a string 1600 characters long):
string first_eight = blobString.substr(0, sizeof(double)); // I get the first 8 values of the string which should represent the first double
double double_value1;
memcpy(&double_value1, &first_eight, sizeof(double)); // First thing I tried
double* double_value2 = (double*)first_eight.c_str(); // Second thing I tried
cout << double_value1 << endl;
cout << double_value2 << endl;
This outputs the following:
6.95285e-310
0x7ffd9b93e320
--- Edit solution---
The second method works; all I had to do was look at where double_value2 was pointing.
cout << *double_value2 << endl;
Here's an example that might get you closer to what you need. Bear in mind that unless the numbers in your blob are in the exact format that your particular C++ compiler expects, this isn't going to work like you expect. In my example I'm building up the buffer of doubles myself.
Let's start with our array of doubles.
double doubles[] = { 0.1, 5.0, 0.7, 8.6 };
Now I'll build an std::string that should look like your blob. Notice that I can't simply initialize a string with a (char *) that points to the base of my list of doubles, as it will stop when it hits the first zero byte!
std::string double_buf_str;
double_buf_str.append((char *)doubles, 4 * sizeof(double));
// A quick sanity check, should be 32
std::cout << "Length of double_buf_str "
<< double_buf_str.length()
<< std::endl;
Now I'll reinterpret the c_str() pointer as a (double *) and iterate through the four doubles.
for (auto i = 0; i < 4; i++) {
std::cout << ((double*)double_buf_str.c_str())[i] << std::endl;
}
Depending on your circumstances you might consider using a std::vector<uint8_t> for your blob, instead of an std::string. C++11 gives you a data() function that would be the equivalent of c_str() here. Turning your blob directly into a vector of doubles would give you something even easier to work with--but to get there you'd potentially have to get dirty with a resize followed by a memcpy directly into the internal array.
I'll give you an example for completeness. Note that this is of course not how you would normally initialize a vector of doubles...I'm imagining that my double_blob is just a pointer to a blob containing a known number of doubles in the correct format.
const int count = 200; // 200 doubles incoming
std::vector<double> double_vec;
double_vec.resize(count);
memcpy(double_vec.data(), double_blob, sizeof(double) * count);
for (double& d : double_vec) {
std::cout << d << std::endl;
}
@Mooning Duck brought up the great point that the result of c_str() is not necessarily aligned to an appropriate boundary, which is another good reason not to use std::string as a general purpose blob (or at least don't interpret the internals until they are copied somewhere that guarantees a valid alignment for the type you are interested in). The impact of trying to read a double from a non-aligned location in memory will vary depending on architecture, giving you a portability concern. On x86-based machines there will only be a performance impact AFAIK, as the CPU will read across alignment boundaries and assemble the double correctly (you can test this on an x86 machine by writing then reading back a double from successive locations in a buffer with an increasing 1-byte offset; it'll just work). On some other architectures you'll get a fault.
The std::vector<double> solution will not suffer from this issue due to guarantees about the alignment of newed memory built into the standard.
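Putting the pieces together, a copy-based conversion from a std::string blob could look like the sketch below. blobToDoubles is a hypothetical helper name, and the sketch assumes the blob holds doubles in the representation and byte order of the machine running the code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: copy the raw bytes of 'blob' into a vector<double>.
// memcpy into the vector's own storage sidesteps the alignment question,
// because no double is ever read through a pointer into the string.
std::vector<double> blobToDoubles(const std::string& blob)
{
    if (blob.size() % sizeof(double) != 0)
        throw std::invalid_argument("blob length is not a multiple of sizeof(double)");
    std::vector<double> values(blob.size() / sizeof(double));
    if (!values.empty())
        std::memcpy(values.data(), blob.data(), blob.size());
    return values;
}
```

For the 1600-character string in the question this would yield 200 values, provided sizeof(double) is 8 on the target.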

seekg, relative or absolute position?

I need to read an array from a file. The array is not stored contiguously in the file; I have to jump "offset" bytes to get to the next element.
What is more efficient, assuming that I read a very large file?
1) Use an incremental relative position.
2) Use an absolute position.
option 1:
int var[N];
file.seekg(0);
for (int i=0; i<N; i++) {
file.read( (char*) (var+i), sizeof(int)); // cast after the pointer arithmetic
file.seekg(offset,ios_base::cur);
}
option 2:
int var[N];
for (int i=0; i<N; i++) {
file.seekg(offset*i);
file.read( (char*) (var+i), sizeof(int)); // cast after the pointer arithmetic
}
read will already advance the position, so you don't need to seek inside the loop. Moreover, arrays are laid out contiguously in memory, so you can just say:
std::vector<int> var(N);
file.read(reinterpret_cast<char*>(var.data()), sizeof(int) * var.size());
Just make sure to check the state of file and the number of bytes actually read (file.gcount()) afterwards:
if (!file || file.gcount() != static_cast<std::streamsize>(sizeof(int) * var.size()))
{
// an error occurred
}
If you're reading from random parts of the file, it makes no difference how you seek (files are essentially "random access"). But be sure to run the above test after every single read to catch errors.
I'm 99.9% sure that it will make no difference at all, aside from correctness: the offset needs to be adjusted for the fact that you've moved sizeof(int) bytes forward in the relative case, and not in the absolute case. In both cases, you do a seek, which will move the current position in the file. The actual code in the filesystem that deals with that will ultimately move to an absolute position, calculating it from the current one in the case of ios_base::cur.
If it's REALLY important for you to know which is better, then benchmark the two options. But I'm pretty certain that it makes absolutely no difference at all inside the actual seek function in the filesystem. It's just a large integer (probably 64 bits) keeping track of where in the file you are reading (or writing) next.
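To make the offset adjustment concrete, here is a sketch of both variants for elements whose starting positions are a fixed number of bytes apart. The function names and the stride parameter (measured from the start of one element to the start of the next) are assumptions for illustration, not taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Option 2: absolute seeks. Element i starts at byte stride * i.
std::vector<int> readStridedAbsolute(std::istream& file, std::size_t n,
                                     std::streamoff stride)
{
    std::vector<int> var(n);
    for (std::size_t i = 0; i < n; ++i) {
        file.seekg(stride * static_cast<std::streamoff>(i), std::ios_base::beg);
        file.read(reinterpret_cast<char*>(&var[i]), sizeof(int));
        if (!file || file.gcount() != static_cast<std::streamsize>(sizeof(int)))
            throw std::runtime_error("short read");
    }
    return var;
}

// Option 1: relative seeks. read() has already advanced the position by
// sizeof(int), so only the remaining gap up to the next element is skipped.
std::vector<int> readStridedRelative(std::istream& file, std::size_t n,
                                     std::streamoff stride)
{
    std::vector<int> var(n);
    file.seekg(0, std::ios_base::beg);
    for (std::size_t i = 0; i < n; ++i) {
        file.read(reinterpret_cast<char*>(&var[i]), sizeof(int));
        if (!file || file.gcount() != static_cast<std::streamsize>(sizeof(int)))
            throw std::runtime_error("short read");
        if (i + 1 < n)
            file.seekg(stride - static_cast<std::streamoff>(sizeof(int)),
                       std::ios_base::cur);
    }
    return var;
}
```

Both functions return the same values for the same input; which one is faster, if either, would have to be measured on the actual file and system.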

Replacing multiple chars at the same time

So in my code I have a series of chars which I want to replace with random data. Since rand can replace ints, I figured I could save some time by replacing four chars at once instead of one at a time. So basically instead of this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i++) // generating the data to send.
TXT[i] = rand() % 255;
I'd like to do something like:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4) // generating the data to send.
TXT[i] = rand() % 4294967295;
Something to that effect, but I'm not sure how to do the latter part. Any help you can give me is greatly appreciated, thanks!
That won't work. The compiler will take the result from rand() % big_number and chop off the extra data to fit it in an unsigned char.
Speed-wise, your initial approach was fine. The optimization you contemplated is valid, but most likely unneeded. It probably wouldn't make a noticeable difference.
What you wanted to do is possible, of course, but given your mistake, I'd say the effort to understand how right now far outweighs the benefits. Keep learning, and the next time you run across code like this, you'll know what to do (and judge if it's necessary), look back on this moment and smile :).
You'll have to access memory directly, and do some transformations on your data. You probably want something like this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i + sizeof(int) <= flenght; i += sizeof(int)) // generating the data to send.
{
int *temp = (int*)&TXT[i]; // very ugly, and possibly misaligned
*temp = rand(); // rand() already returns an int; no modulus needed
}
It can be problematic though because of alignment issues, so be careful. Alignment issues can cause your program to crash unexpectedly, and are hard to debug. I wouldn't do this if I were you, your initial code is just fine.
TXT[i] = rand() % 4294967295;
Will not work the way you expect it to. Perhaps you are expecting that rand()%4294967295 will generate a 4-byte integer (which you may be interpreting as 4 different characters). The value that rand()%4294967295 produces will be converted to a single char and assigned to only one element, TXT[i].
Though it's not quite clear why you need to make 4 assignments at the same time, one approach would be to use bit operators to obtain the 4 individual bytes of the generated number, which can then be assigned to four different indices.
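The bit-operator approach could be sketched as follows. storeBytes is a hypothetical helper name; where the 32-bit value comes from is left open, since rand() may deliver fewer than 32 random bits depending on RAND_MAX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: spread the four bytes of 'value' over dst[0..3],
// least significant byte first. There are no pointer casts involved, so
// alignment of dst does not matter.
void storeBytes(std::uint32_t value, unsigned char* dst)
{
    for (std::size_t k = 0; k < 4; ++k)
        dst[k] = static_cast<unsigned char>((value >> (8 * k)) & 0xFFu);
}
```

Inside the loop from the question this would be called as storeBytes(someRandomValue, &TXT[i]) with i advancing by 4, plus separate handling for any bytes left over at the end.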
There are valid answers already; I'd just add that C does not care very much about what type it stores at which address. So you can get away with something like:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
char *arr;
int *iArr;
int main (void){
int i;
arr = malloc(100);
/* Error handling omitted, yes that's evil */
iArr = (int*) arr;
for (i = 0; i < 25; i++) {
iArr[i] = rand() % INT_MAX;
}
for (i = 0; i < 25; i++) {
printf("iArr[%d] = %d\n", i, iArr[i]);
}
for (i = 0; i < 100; i++) {
printf("arr[%d] = %c\n", i, arr[i]);
}
free(arr);
return 0;
}
In the end an array is just some contiguous block in memory. And you can interpret it as you like (if you want). If you know that sizeof(int) = 4 * sizeof(char) then the above code will work.
I do not say I recommend it. And as the others have pointed out, a plain loop over all the chars in TXT will yield the same result. One could think, for example, of unrolling a loop, but really I'd not care about that.
The (int*) cast alone is warning enough. It tells the compiler: do not think about what you think the type is, just "believe" the programmer that he knows better.
Well this "know better" is probably the root of all evil in C programming....
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4)
// generating the data to send.
TXT[i] = rand() % 4294967295;
This has a few issues:
TXT is not guaranteed to be memory-aligned as needed for the CPU to write int data (whether it works - perhaps relatively slowly - or not - e.g. SIGBUS on Solaris - is hardware specific)
the last 1-3 characters may be missed (even if you change i + 4 to i += 4 ;-P)
rand() returns an int anyway - you don't need to mod it with anything
you need to write your random data via an int* so you're accessing 4 bytes at a time and not simply slicing a byte off the end of the random data and overwriting every fourth single character
for stuff like this where you're dependent on the size of int, you should really write it in terms of sizeof(int) so it'll work even if int isn't 32 bits, or use a fixed-width typedef such as int32_t (standard in <stdint.h> since C99 and in <cstdint> since C++11; on older Windows compilers I think it's __int32, or you can use a boost or other library header to get int32_t, or write your own typedef).
It's actually pretty tricky to align your text data: your code suggests you want int-sized slices from the 35th character... even if the overall character array is aligned properly for ints, the 35th character will not be.
If it really is always the 35th, then you can pad the data with two leading characters so the overwrite starts at index 36 (a multiple of the presumably 4-byte int size), then align the text to a 32-bit address (with a compiler-specific #pragma or using a union with int32_t). If the real code varies the character you start overwriting from, such that you can't simply align the data once, then you're stuck with:
your original character-at-a-time overwrites
non-portable unaligned overwrites (if that's possible and better on your system), OR
implementing code that overwrites up to three leading unaligned characters, then switches to 32-bit integer overwrite mode for aligned addresses, then back to character-by-character overwrites for up to three trailing characters.
That does not work because the generated value is converted to the type of the array element, char in this particular case. But you are free to interpret allocated memory in the manner you like. For example, you could treat it as an array of int:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i + sizeof(int) <= flenght; i += sizeof(int)) // generating the data to send.
*(int*)(TXT+i) = rand(); // the modulo operator is not needed
for (; i < flenght; ++i) // remaining bytes
TXT[i] = rand(); // the modulo operator is not needed here either
I just want to complete the solution with some remarks about the modulo operator and the handling of arrays whose size is not a multiple of sizeof(int).
1) % means "the remainder when divided by", so you want rand() % 256 for a char, or else you will never get chars with a value of 255. Similarly for the int case, although here there is no point in doing a modulus operation anyway, since you want the entire range of output values.
2) rand may only generate 15 random bits at a time (RAND_MAX is only guaranteed to be at least 32767); check the value of RAND_MAX.
3) 34 isn't divisible by 4 anyway, so you will have to handle the end case specially.
4) You will want to cast the pointer, and it won't work if it isn't already aligned. Once you have the cast, though, there is no need to account for the sizeof(int) in your iteration: pointer arithmetic automatically takes care of the element size.
5) Chances are very good that it won't make a noticeable difference. If scribbling random data into an array is really the bottleneck in your program, then it isn't really doing anything significant anyway.
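If the four-at-a-time variant is wanted anyway, a version that avoids the alignment problem could look like the sketch below. It is written in C++ rather than C and swaps in std::mt19937 for rand(), because that generator delivers a full 32 bits per call; fillRandom is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>

// Fills txt[start .. length) with random bytes, four at a time where possible.
// memcpy copies the bytes without writing through an int*, so the start
// index does not have to be aligned. std::mt19937 is a swapped-in generator,
// not the rand() from the question.
void fillRandom(unsigned char* txt, std::size_t start, std::size_t length,
                std::mt19937& gen)
{
    std::size_t i = start;
    for (; i + sizeof(std::uint32_t) <= length; i += sizeof(std::uint32_t)) {
        const std::uint32_t r = static_cast<std::uint32_t>(gen());
        std::memcpy(txt + i, &r, sizeof r);
    }
    for (; i < length; ++i)   // up to three trailing bytes
        txt[i] = static_cast<unsigned char>(gen() & 0xFFu);
}
```

Whether this is measurably faster than the byte-by-byte loop would need to be benchmarked; as point 5 says, it most likely does not matter.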