I have a problem with casting from float to double when fread;
fread(doublePointer,sizeofFloat,500,f);
if i change double pointer to float pointer, it works just fine.
However,i need it to be double pointer for laster on, and i thought when i write from small data type (float)to bigger data type(double)'s memory, it should be fine. but it turns out it doesnt work as i expected.
what is wrong with it, and how do i solve this problem.
i know i can solve it by converting it one by one. but i have a huge amount of data. and i dont wanna extra 9000000+ round of converting.. that would be very expensive. and is there any trick i can solve it?
is there any c++/c tricks
thanks
If you write float-formatted data into a double, you're only going to get garbage as a result. Sure, you won't overflow your buffer, but that's not the only problem - it's still going to be finding two floats where it expects a double. You need to read it as a float, then convert - casting (even implicitly) in this manner lets the compiler know that the data was originally a float and needs to be converted:
float temp[500];
int i;
fread(temp, sizeof(temp[0]), 500, f);
for (i = 0; i < 500; i++)
doublePointer[i] = temp[i];
Suppose for example a float is 4 bytes on your computer. If you read 500 floats then you read 2000 bytes, one float per float and the result is correct.
Suppose for example a double is 8 bytes on your computer. If you read 500 floats then you read 2000 bytes, but you're reading them into 250 doubles, 2 floats per double, and the result is nonsense.
If your file has 500 floats you have to read 500 floats. Cast each float value to a double value. You can convert each numeric value that way.
When you abuse a pointer, pretending that the pointer points to a type of data that it doesn't really point to, then you're not converting each numeric value, you're preserving nonsense as nonsense.
Edit: You added to your question "and i dont wanna extra 9000000+ round of converting.. that would be very expensive. and is there any trick i can solve it?" The answer is yes, you can use a trick of keeping your floats as floats. If you don't want to convert to doubles then don't convert to doubles, just keep your floats as floats.
9000000 conversions from float to double is nothing. fread into a float array, then convert that into a double array.
Benchmark this code scientifically, don't guess about where the slowdowns might be.
If you're bottlenecked on the conversion, write a unrolled, vectorized conversion loop, or use one from a commercial vector library.
If it's still too slow, tile your reads so you read in your float data in batches of a few pages that fit in L1 cache, then convert those to double, then read the next few pages and convert those to double, etc.
If it's still too slow, investigate loading your data lazily so only the parts that are needed get loaded, and only when they are used.
A modern x86 core is capable of doing two float->double conversions per cycle in a hand-tuned vectorized loop; at 2GHz, that's 4 billion conversions per second per core. 9 million conversions is small change -- my laptop does it in less than 1 millisecond.
Alternatively, just convert the whole dataset to double once, and read it in that way from now on. Problem solved.
I would look at this from a different perspective. If the data is stored as float, then that is all the precision it will ever have. There is no point in converting to double until the rules of floating point arithmetic require it.
So I would allocate a buffer for the 500 (or whatever) floats, and read them from the data file with one suitable call to fread():
float *databuffer;
//...
databuffer = malloc(500 * sizeof(float));
fread(databuffer, sizeof(Float), 500, f);
Later, use the data in whatever math it needs to participate in. It will be promoted to double if required. Don't forget to eventually free the buffer after it is no longer needed.
If your results really do have all the precision of a double, then use a fresh buffer of doubles to hold them. However, if they are to be written back to file as float, then you will eventually need to put them into a buffer of floats.
Note that reading and writing files for interchange often needs to be considered a separate problem from efficient storage and usage of data in memory. It is often necessary to read a file and process each individual value in some way. For example, a portable program might be required to handle data written by a system using a different byte order. Less frequently today, you might find that even the layout of the bits in a float differs between systems. In general, this problem is often best solved by deferring to a library implementing a standard such as XDR (defined by RFC 4506) that was designed to deal with binary portability.
Related
Recently I was (again) reading about 'endian'ness. I know how to identify the endianness of host, as there are lots of post on SO, and also I have seen this, which I think is pretty good resource.
However, one thing I like to know is to how to detect the endianness of input binary file. For example, I am reading a binary file (using C++) like following:
ifstream mydata("mydata.raw", ios::binary);
short value;
char buf[sizeof(short)];
int dataCount = 0;
short myDataMat[DATA_DIMENSION][DATA_DIMENSION];
while (mydata.read(reinterpret_cast<char*>(&buf), sizeof(buf)))
{
memcpy(&value, buf, sizeof(value));
myDataMat[dataCount / DATA_DIMENSION][dataCount%DATA_DIMENSION] = value;
dataCount++;
}
I like to know how I can detect the endianness in the mydata.raw, and whether endianness affects this program anyway.
Additional Information:
I am only manipulating the data in myDataMat using mathematical operations, and no pointer operation or bitwise operation is done on the data).
My machine (host) is little endian.
It is impossible to "detect" the endianity of data in general. Just like it is impossible to detect whether the data is an array of 4 byte integers, or twice that many 2 byte integers. Without any knowledge about the representation, raw data is just a mass of meaningless bits.
However, with some extra knowledge about the data representation, it become possible. Some examples:
Most file formats mandate particular endianity, in which case this is never a problem.
Unicode text files may optionally start with a byte order mark. Same idea can be implemented by other data representations.
Some file formats contain a checksum. You can guess one endianity, and if the checksum does not match, try again with another endianity. It will be unlikely that the checksum matches with wrong interpretation of the data.
Sometimes you can make guesses based on the data. Is the temperature outside 33'554'432 degrees, or maybe 2? You can pick the endianity that represents sane data. Of course, this type of guesswork fails miserably, when the aliens invade and start melting our planet.
You can't tell.
The endianness transformation is essentially an operator E(x) on a number x such that x = E(E(x)). So you don't know "which way round" the x elements are in your file.
I am writing some numeric code in C++ and I want to be able to swap between using double and float. I have therefore added a #define MYFLT which I can make either a float or a double as needed. However, how do I deal with the various numeric literals.
For example
MYFLT someNumber = 1.2;
MYFLT someOtherNumber = 1.5f;
gives compiler warnings for the first line when MYFLT is a float and for the second line when MYFLT is a double. I know this is a trivial example, but there are other cases where I have longer expresions with literals in and floats can end up being converted to doubles then the result back to floats which I think is costing me significant performance. How should I deal with this?
I could do things like
MYFLT someNumber = MYFLT(1.2);
MYFLT someOtherNumber = MYFLT(1.5);
but this is quite tedious. I'm assuming that in that if I do this the compiler is clever enough to just use a float when needed (can anyone confirm that?). What would be better would be if there was a MSVC++ compiler switch or #define that will tell the compiler to treat all floating point literals as floats instead of doubles. Does such a switch exist?
Even when I wrap all my literals as above my code runs 50% slower when I use float rather than double. I was expecting a performance boost through simd type operations, not a penalty!
Phil
What you'd want is #define MYFLTCONST(x) x##f or #define MYFLTCONST(x) x depending on whether you want a f suffix for float appended.
This is a (not quite complete) answer to my own question.
I found that a small function that was called many times (a fast approximation to sin) didn't have its literals cast as MYFLT. The extra computational hit of this also meant that the compiler wasn't inlining it. This function accounted for most of the difference. Some further profiling seemed to indicate that accessing std::vector<float> was slower than std::vector<double> ( I am using [] to do the access if it matters ). Replacing std::vectors with raw fixed sized arrays sped up the double implementation a little and closed the gap significantly for the float implementation. The float version is now only about 10% slower than the double version. But definitely no speed increase due to either RAM access nor vectorization. I guess I need to think more carefully about my loops to get any benefit there.
I guess the conclusion here (yet again) is that the compiler is pretty good at optimising code - it's much better to work with it and do careful profiling than it is to try and do your own blind "optimisations" which might actually have negative effects, like stopping the compiler performing good inlining.
I wrote some parameters (all of type double) to a file for use in performing some complex computations. I write the parameters to the files like so:
refStatsOut << "SomeParam:" << value_of_type_double << endl;
where refStatsOut is an ofstreamparameter. There are four such parameters, each of type double. What I see as written to the file is different from what its actual value is (in terms of loss of precision). As an example, if value_of_type_double had a value -28.07270379934792, then what I see as written in the file is -28.0727.
Also, once these stats have been computed and written I run different programs that use these statistics. The files are read and the values are initially stored as std::strings and then converted to double via atof functions. This results in the values that I have shown above and ruins the computations further down.
My question is this:
1. Is there a way to increase the resolution with which one can write values (of type double and the like) to a file so as to NOT lose any precision?
2. Could this also be a problem of std::string to double conversion with atof? If so, what other function could I use to solve this?
P.S: Please let me know in case some of the details in this question are not clear. I will try to update them and provide more details.
You can use the setprecision function.
ofstream your_file;
you can use your_file.precision(X);
The main difference between precision() and setPrecision() is that precision returns the current precision and setPrecision doesn't. Therefore, you can use precision like this.
streamsize old_precision = your_file.precision(X);
// do what ever you want
//restore precision
your_file.precision(old_precision);
a double is basically a 64-bit integer, if you want a cheap way of writing it out, you can do something like this (note I'm assuming that your compiler uses long for 64-bit ints)
double value = 32985.932235;
long *saveme = (long*)&value;
Just beware of the caveat that the saved value may not remain the same if loaded back on a different architecture.
I know that you can get the digits of a number using modulus and division. The following is how I've done it in the past: (Psuedocode so as to make students reading this do some work for their homework assignment):
int pointer getDigits(int number)
initialize int pointer to array of some size
initialize int i to zero
while number is greater than zero
store result of number mod 10 in array at index i
divide number by 10 and store result in number
increment i
return int pointer
Anyway, I was wondering if there is a better, more efficient way to accomplish this task? If not, is there any alternative methods for this task, avoiding the use of strings? C-style or otherwise?
Thanks. I ask because I'm going to be wanting to do this in a personal project of mine, and I would like to do it as efficiently as possible.
Any help and/or insight is greatly appreciated.
The time it takes to extract the digits will be dwarfed by the time required to dynamically allocate the array. Consider returning the result in a struct:
struct extracted_digits
{
int number_of_digits;
char digits[12];
};
You'll want to pick a suitable value for the maximum number of digits (12 here, which is enough for a 32-bit integer). Alternatively, you could return a std::array<char, 12> and encode the terminal by using an invalid value (so, after the last value, store a 10 or something else that isn't a digit).
Depending on whether you want to handle negative values, you'll also have to decide how to report the unary minus (-).
Unless you want the representation of the number in a base that's a power of 2, that's about the only way to do it.
Smacks of premature optimisation. If profiling proves it matters, then be sure to compare your algo to itoa - internally it may use some CPU instructions that you don't have explicit access to from C++, and which your compiler's optimiser may not be clever enough to employ (e.g. AAM, which divs while saving the mod result). Experiment (and benchmark) coding the assembler yourself. You might dig around for assembly implementations of ITOA (which isn't identical to what you're asking for, but might suggest the optimal CPU instructions).
By "avoiding the use of strings", I'm going to assume you're doing this because a string-only representation is pretty inefficient if you want an integer value.
To that end, I'm going to suggest a slightly unorthodox approach which may be suitable. Don't store them in one form, store them in both. The code below is in C - it will work in C++ but you may want to consider using c++ equivalents - the idea behind it doesn't change however.
By "storing both forms", I mean you can have a structure like:
typedef struct {
int ival;
char sval[sizeof("-2147483648")]; // enough for 32-bits
int dirtyS;
} tIntStr;
and pass around this structure (or its address) rather than the integer itself.
By having macros or inline functions like:
inline void intstrSetI (tIntStr *is, int ival) {
is->ival = i;
is->dirtyS = 1;
}
inline char *intstrGetS (tIntStr *is) {
if (is->dirtyS) {
sprintf (is->sval, "%d", is->ival);
is->dirtyS = 0;
}
return is->sval;
}
Then, to set the value, you would use:
tIntStr is;
intstrSetI (&is, 42);
And whenever you wanted the string representation:
printf ("%s\n" intstrGetS(&is));
fprintf (logFile, "%s\n" intstrGetS(&is));
This has the advantage of calculating the string representation only when needed (the fprintf above would not have to recalculate the string representation and the printf only if it was dirty).
This is a similar trick I use in SQL with using precomputed columns and triggers. The idea there is that you only perform calculations when needed. So an extra column to hold the indexed lowercased last name along with an insert/update trigger to calculate it, is usually a lot more efficient than select lower(non_lowercased_last_name). That's because it amortises the cost of the calculation (done at write time) across all reads.
In that sense, there's little advantage if your code profile is set-int/use-string/set-int/use-string.... But, if it's set-int/use-string/use-string/use-string/use-string..., you'll get a performance boost.
Granted this has a cost, at the bare minimum extra storage required, but most performance issues boil down to a space/time trade-off.
And, if you really want to avoid strings, you can still use the same method (calculate only when needed), it's just that the calculation (and structure) will be different.
As an aside: you may well want to use the library functions to do this rather than handcrafting your own code. Library functions will normally be heavily optimised, possibly more so than your compiler can make from your code (although that's not guaranteed of course).
It's also likely that an itoa, if you have one, will probably outperform sprintf("%d") as well, given its limited use case. You should, however, measure, not guess! Not just in terms of the library functions, but also this entire solution (and the others).
It's fairly trivial to see that a base-100 solution could work as well, using the "digits" 00-99. In each iteration, you'd do a %100 to produce such a digit pair, thus halving the number of steps. The tradeoff is that your digit table is now 200 bytes instead of 10. Still, it easily fits in L1 cache (obviously, this only applies if you're converting a lot of numbers, but otherwise efficientcy is moot anyway). Also, you might end up with a leading zero, as in "0128".
Yes, there is a more efficient way, but not portable, though. Intel's FPU has a special BCD format numbers. So, all you have to do is just to call the correspondent assembler instruction that converts ST(0) to BCD format and stores the result in memory. The instruction name is FBSTP.
Mathematically speaking, the number of decimal digits of an integer is 1+int(log10(abs(a)+1))+(a<0);.
You will not use strings but go through floating points and the log functions. If your platform has whatever type of FP accelerator (every PC or similar has) that will not be a big deal ,and will beat whatever "sting based" algorithm (that is noting more than an iterative divide by ten and count)
I'm dealing with a problem which needs to work with a lot of data. Currently its values are represented as an unsigned int. I know that real values do not exceed a limit of 1000.
Questions
I can use unsigned short to store it. An upside to this is that it'll use less storage space to store the value. Will performance suffer?
If I decided to store data as short but all the calling functions use int, it's recognized that I need to convert between these datatypes when storing or extracting values. Will performance suffer? Will the loss in performance be dramatic?
If I decided to not use short but just 10 bits packed into an array of unsigned int. What will happen in this case comparing with previous ones?
This all depends on architecture. Bit-fields are generally slower, but if you are able to to significantly cut down memory usage with them, you can even gain in performance due to better CPU caching and similar things. Likewise with short (though it is not dramatic in any case).
The best way is to make your source code able to switch representation easily (at compilation time, of course). Then you will be able to test and profile different implementations in your specific circumstances just by, say, changing one #define.
Also, don't forget about premature optimization rule. Make it work first. If it turns out to be slow/not fast enough, only then try to speed up.
I can use unsigned short to store it.
Yes you can use unsigned short (assuming (sizeof(unsigned short) * CHAR_BITS) >= 10)
An upside to this is that it'll use less storage space to store the value.
Less than what? Less than int? Depends what is the sizeof(int) on your system?
Will performance suffer?
Depends. The type int is supposed to be the most efficient integer type for your system so potentially using short may affect your performance. Whether it does will depend on the system. Time it and find out.
If I decided to store data as short but all the calling functions use int, it's recognized that I need to convert between these datatypes when storing or extracting values.
Yes. But the compiler will do the conversion automatically. One thing you need to watch though is conversion between signed and unsigned types. If the value does not fit the exact result may be implementation defined.
Will performance suffer?
Maybe. if sizeof(unsigned int) == sizeof(unsigned short) then probably not. Time it and see.
Will the loss in performance be dramatic?
Time it and see.
If I decided to not use short but just 10 bits packed into an array of unsigned int. What will happen in this case comparing with previous ones?
Time it and see.
A good compromise for you is probably packing three values into a 32 bit int (with two bits unused). Untangling 10 bits from a bit array is a lot more expensive, and doesn't save much space. You can either use bit fields, or do it by hand yourself:
(i&0x3FF) // Get i[0]
(i>>10)&0x3FF // Get i[1]
(i>>20)&0x3FF // Get i[2]
i = (i&0x3FFFFC00) | (j&0x3FF) // Set i[0] to j
i = (i&0x3FF003FF) | ((j&0x3FF)<<10) // Set i[1] to j
i = (i&0xFFFFF) | ((j&0x3FF)<<20) // Set i[2] to j
You can see here how much extra expense it is: a bit operation and 2/3 of a shift (on average) for get, and three bit operations and 2/3 of a shift (on average) to set. Probably not too bad, especially if you're mostly getting the values not setting them.