Float or integer for storing coordinates - C++

I am working on an application that is basically about drawing annotations on an image using MFC's API.
The coordinates required for drawing these annotations are persisted in an XML file.
It also handles scaling of the annotations when the zoom level of the image changes.
The problem is that when scaling the coordinates, the intermediate result is a double or float, but we save the result as an integer, which leads to lots of errors/deviations.
Would it be better to save the coordinates as floats in the XML, and also perform the intermediate operations on floats?
And finally convert to integer only for use in APIs like LineTo() and MoveTo(), which need long.
Any suggestion or advice on this would be very helpful.
Thanks

I've worked with graphics pipelines for quite some time.
For anything that involves scaling, I insist that you store all your data as doubles, especially when you plan to go from integer to floating point and floating point to integer. You get far less error when scaling as well.
There is no harm in storing these values in XML; it's no different from storing integers.
Also, CPUs these days are well optimized for floating-point operations.

When serializing the coordinates into the xml, you can save each float as an integer with the same binary representation. Note that reinterpret_cast<int>(f) on a float value won't compile, though; copy the bytes instead, e.g. with std::memcpy into an int (or C++20's std::bit_cast). Conversely, during deserialization, copy the integer's bytes back into a float to recover the original number. You shouldn't lose precision on saving/loading this way.
As far as errors go, the solution is trivial: don't save as integer. Keep the floats (I'd even get behind PhoenixX_2's suggestion to upgrade to doubles), then, while drawing, cast them to a temporary int variable.
Edit: Note that if you do decide to use double instead of float, you'll need to account for that during serialization, as doubles are 64-bit, not 32. You could also just save the number as a human-readable decimal, which is probably the most obvious way to do it.
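A minimal sketch of that approach, assuming a simple PointD type (the names here are illustrative, not part of MFC): keep doubles end to end, write them to the XML as decimal text with enough digits to round-trip exactly, and round to int only at the draw call.

```cpp
#include <windows.h>
#include <cmath>
#include <cstdio>
#include <string>

struct PointD { double x, y; };

// Serialize as decimal text; %.17g guarantees a double round-trips exactly.
std::string ToXmlAttr(const PointD& p) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "x=\"%.17g\" y=\"%.17g\"", p.x, p.y);
    return buf;
}

// Scale in floating point; round once, only at draw time.
POINT ToPixel(const PointD& p, double zoom) {
    return { static_cast<LONG>(std::lround(p.x * zoom)),
             static_cast<LONG>(std::lround(p.y * zoom)) };
}
```

MoveTo()/LineTo() can then take the rounded POINT members directly, and no rounding error ever feeds back into the stored coordinates.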

Related

Printing bits as IEEE-754 float

Is there some clever and reliable way to print a series of bits as an IEEE-754 float without actually using a float type?
I have found a way to print fractions, which allows me to represent the float as a fraction. However, I then came to realize that the exponent may range from -127 to 128 (after adjusting with the bias), which may result in the multiplication mantissa * 2^128. The fraction method relies on representing the numerator as an integer, and I would require a really large integer to do this multiplication. I mean, I could use a "custom" type to represent this large value (i.e. https://gmplib.org/), but I would prefer to avoid this. If we multiplied by 10^x, I could simply adjust the decimal point and add some zeros, but sadly that's not the case either.
I have not been able to find anything that mentions a solution for this, probably because searches along those lines mostly turn up results about ordinary float printing.
Why am I actually trying to do this?
I'm only doing this to get a better understanding of how floats (IEEE-754 in particular) work, and I find that it always help to do some practical example. So I thought "Hey, why not try to code it?". This has no practical application (that I know of)!
So, almost immediately after posting this, I finally succeeded in finding the resources I'd been looking for.
https://www.ryanjuckett.com/printing-floating-point-numbers/ talks about it, and references other relevant sources.
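For reference, here is a sketch of the decomposition step (not the full decimal-digit algorithm the article above describes): pull the sign, exponent, and significand out of the 32 bits and print the exact value as significand * 2^exponent, without ever touching a float.

```cpp
#include <cstdint>
#include <cstdio>

// Interpret a 32-bit pattern as IEEE-754 binary32 and print its exact value
// as "significand * 2^exponent". Handles normals, subnormals, zeros,
// infinities, and NaNs; no float type is used anywhere.
void PrintBitsAsFloat(uint32_t bits) {
    uint32_t sign     = bits >> 31;
    uint32_t expField = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x007FFFFFu;

    if (expField == 0xFFu) {
        std::printf("%s%s\n", sign ? "-" : "", fraction ? "NaN" : "inf");
        return;
    }

    uint32_t significand;
    int exponent;
    if (expField == 0) {                  // zero or subnormal: no implicit 1
        significand = fraction;
        exponent = 1 - 127 - 23;          // == -149
    } else {                              // normal: implicit leading 1
        significand = fraction | 0x00800000u;
        exponent = static_cast<int>(expField) - 127 - 23;
    }
    std::printf("%s%u * 2^%d\n", sign ? "-" : "", significand, exponent);
}

int main() {
    PrintBitsAsFloat(0x3F800000u);  // 1.0f -> 8388608 * 2^-23
    PrintBitsAsFloat(0x40490FDBu);  // pi   -> 13176795 * 2^-22
}
```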

Setting float precision in hdf5 dataset

I'm surprised I wasn't able to find an answer to this question. I'm writing float values to an hdf5 dataset, and I want to set the precision at 10 decimals. From the documentation on hdf5 datasets, there doesn't seem to be any way to set precision. The closest I get is doing either 'float32' or 'float64', but 'float32' cuts off my numbers. File size is a big concern for me, and the unnecessary digits for 'float64' make the file significantly larger. Is it possible to choose precision with hdf5?
An example of my issue:
With the true value of data[0] being 0.0066896507
group.create_dataset(name, data=data, dtype='float64')
data[0] yields 0.0066896506999999999, but
group.create_dataset(name, data=data, dtype='float32')
gives me 0.0066896505, which is incorrect. Other numbers in the dataset are even more incorrect.
It's also odd, because when I do
x = h5py.File(my_file,'r')
print(x['dataset'][0])
it gives me the correct number. But when I just type x['dataset'][0] into the console, it gives what I wrote above. How is the data actually being stored? Is it really giving those extra digits? As you can see I'm a little new to hdf5 (and python in general). Thanks for the help.
To create custom precision types, you'll need to drop to the low-level bindings of h5py, specifically the functions/types outlined at http://api.h5py.org/h5t.html#atomic-classes. See https://github.com/h5py/h5py/blob/master/h5py/h5t.pyx#L202 for an example of how this is done (for half/16-bit floats).
However, this probably isn't what you want, given the reference to decimal digits. While base-10 floating-point formats exist (see e.g. https://en.wikipedia.org/wiki/Decimal64_floating-point_format), in practice if you're using Python all floating-point numbers are base-2. This means what you control is the number of bits a value is stored in (and in what format; see https://en.wikipedia.org/wiki/IEEE_754#Basic_and_interchange_formats). Also worth noting is that it's entirely possible to print more digits than you have precision for (e.g. I can print a float32, which stores ~7 significant figures, with 30 significant figures, but that doesn't mean I have 30 significant figures worth of precision). So, given that you care about at least 10 significant figures of precision, you should use float64 (also known as double, or binary64).
If you are concerned about file size, it's worth looking at h5py's compression support, see http://docs.h5py.org/en/latest/high/dataset.html#filter-pipeline.
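That point about printing more digits than you actually have can be demonstrated in any language; for instance, in C++, using the value from the question:

```cpp
#include <cstdio>

int main() {
    // The nearest binary32 to 0.0066896507 differs from it around the 10th
    // significant figure (the question reports it printing as 0.0066896505).
    float f = 0.0066896507f;
    std::printf("%.10f\n", f);   // ~10 digits: the float32 rounding error is visible
    std::printf("%.30f\n", f);   // 30 digits printed, but only ~7 are meaningful
}
```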

An Alternative to Floating-Point for Storing Simple Fractional Values

Firstly, the problem I'm trying to solve is coming up with a better representation for values that will always remain uniformly distributed in the range:
0.0 <= x < 1.0
The motivation for this is to attempt to reduce the number of bytes used to store this data (the application is heavily memory- and I/O-bandwidth-bound). Currently a 32-bit floating-point representation is used; 16-bit floating point is proving insufficiently accurate.
My initial thoughts are to try and store the data in a 16-bit integer and to simply use the scheme:
x/(2^16 - 1) [x is an unsigned short]
To keep the algorithms largely the same and to retain use of the same floating-point hardware operations (at least at first), I would ideally like to keep converting this fractional representation into floating-point representation, performing the operation(s), then converting back into fractional representation for storage.
Clearly, there will be a loss of precision going back and forth between these two quite different, imprecise representations, but for our application, I suspect this might be an acceptable tradeoff.
I've done some research looking at what is currently out there that might give us a good starting point. The seminal "What Every Computer Scientist Should Know About Floating-Point Arithmetic" article (http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) led me to look at a few others, "Beyond Floating Point" (home.ccil.org/~cowan/temp/p319-clenshaw.pdf) being one such example.
Can anyone point me to other examples of representations that people have used elsewhere that might satisfy these requirements?
I'm concerned that any potential gain in exactness of representation (we're currently wasting much of the floating-point format by using this specific range) will be completely outweighed by the requirement to round twice going from fractional representation to floating point and back again. In that case, it may be necessary to do arithmetic directly in this fractional representation to get any benefit out of the approach. Any advice on this point would be helpful.
Don't use 2^16-1. Use 2^16. Yes, you will have very slightly less precision and waste your 0xFFFF, but you will guarantee that there is no loss of precision when converting to floating point. (In contrast, when converting away from floating point, you will lose 8 bits of mantissa precision.)
Round-trip conversions between precisions can cause problems with certain operations, in particular progressively summing numbers. If at all possible, treat your fixed-point values as "dirty", and don't use them for further floating-point computations; prefer recalculating from inputs to using intermediate results which are in fixed-point form.
Alternatively, use 24 bits. With this representation, you will lose no precision in either direction as long as your values don't underflow (that is, as long as they're above 2^-24).
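A minimal sketch of the power-of-two scheme suggested above (Pack/Unpack are illustrative names). Unpack is exact: a 16-bit integer fits in a float's 24-bit significand, and multiplying by 2^-16 only changes the exponent.

```cpp
#include <cstdint>

uint16_t Pack(float x) {
    return static_cast<uint16_t>(x * 65536.0f);  // truncates; assumes 0.0 <= x < 1.0
}

float Unpack(uint16_t u) {
    return u * (1.0f / 65536.0f);                // exact reconstruction
}
```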
Wouldn't 1/x be badly distributed in your range? 1/2, 1/3, 1/4... Do you not want to represent numbers above 1/2?
This kind of thing is done in NetCDF quite a lot to encode data for saving space.
const double scale = 1.0/65536;
unsigned short x;
Any number stored in x really represents x*scale.
See example in NetCDF for a more general approach using scale and offset: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/tutorial/NetcdfDataset.html
Have a look at "Packed Data Values" section of this page:
https://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html#Packed%20Data%20Values
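A sketch of that packing in the more general scale-and-offset form NetCDF uses (the names and the particular scale/offset values are illustrative, not NetCDF API):

```cpp
#include <cstdint>
#include <cmath>

const double offset = 0.0;
const double scale  = 1.0 / 65536.0;

// v is stored as the short s that makes s * scale + offset closest to v.
uint16_t PackScaled(double v) {
    return static_cast<uint16_t>(std::lround((v - offset) / scale));
}

double UnpackScaled(uint16_t s) {
    return s * scale + offset;
}
```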

Changing float type to short but with same behaviour as float type variable

Is it possible to change the
float *pointer
type that is used in the VS C++ project
to some other type, so that it will still behave as a floating-point type but with a smaller range?
I know that the floating-point values never exceed some fixed value in that project, so I want to optimize the program's memory usage. It doesn't need 4 bytes for each element of the 'float *pointer'; 2 bytes will be enough, I think. If I change float to short and imitate the floating-point behaviour, it will use half the memory. How do I do that?
EDIT:
It calculates the probabilities. So there are divisions like
A / B
Where A < B,
And also B (and A) can be from 1 to 10 000.
There is a standard 16-bit floating-point format described in IEEE 754-2008 called "binary16". It is specified as a storage format for floating-point values with reduced precision. There is almost no compiler support for it yet (I think GCC supports it for certain ARM platforms), but it is quite easy to roll your own routines. This fellow:
http://blog.fpmurphy.com/2008/12/half-precision-floating-point-format_14.html
wrote a bit about it and also presents a routine to convert half-float <-> float.
Also, here seems to be a half-float C++ wrapper class:
half.h:
http://www.koders.com/cpp/fidABD00D95DE84C73BF0218AC621E400E07AA77B53.aspx
half.cpp
http://www.koders.com/cpp/fidF0DD0510FAAED03817A956D251787609BEB5989E.aspx
which supplies "HalfFloat" as a possible drop-in replacement type.
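If you do roll your own, here is a hedged sketch of the binary16 conversion (it truncates instead of rounding and flushes subnormals to zero, so it is simpler than a fully conforming implementation):

```cpp
#include <cstdint>
#include <cstring>

// float -> binary16: truncates the mantissa, flushes would-be subnormals to
// zero, saturates overflow to infinity, and preserves NaN.
uint16_t FloatToHalf(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);

    uint16_t sign     = static_cast<uint16_t>((bits >> 16) & 0x8000u);
    uint32_t expField = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (expField == 0xFFu)                        // Inf or NaN
        return sign | 0x7C00u | (mantissa ? 0x0200u : 0u);

    int exponent = static_cast<int>(expField) - 127 + 15;  // rebias
    if (exponent >= 31) return sign | 0x7C00u;    // overflow -> infinity
    if (exponent <= 0)  return sign;              // underflow -> signed zero
    return sign | static_cast<uint16_t>(exponent << 10)
                | static_cast<uint16_t>(mantissa >> 13);
}

// binary16 -> float: exact for everything the FloatToHalf above produces.
float HalfToFloat(uint16_t h) {
    uint32_t sign     = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t expField = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x03FFu;

    uint32_t bits;
    if (expField == 0x1Fu)  bits = sign | 0x7F800000u | (mantissa << 13);
    else if (expField == 0) bits = sign;          // zeros (subnormals flushed)
    else                    bits = sign | ((expField + 127 - 15) << 23) | (mantissa << 13);

    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```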
Maybe use fixed-point math? It all depends on the value range and precision you want to achieve.
http://www.eetimes.com/discussion/other/4024639/Fixed-point-math-in-C
For C there is a lot of code that makes fixed-point easy, and I'm pretty sure there are also many C++ classes that make it even easier, but I don't know of any; I'm more into C. A sketch for this particular case is below.
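For the probability case in the question's EDIT (A/B with A <= B, so results lie in [0, 1]), a fixed-point version could look like this (ProbToFixed/FixedToProb are hypothetical names):

```cpp
#include <cstdint>

// Store a probability A/B (A <= B, B up to ~10000) in 16 bits by mapping
// [0, 1] onto [0, 65535], rounding to nearest. 10000 * 65535 fits in uint32_t.
uint16_t ProbToFixed(uint32_t a, uint32_t b) {
    return static_cast<uint16_t>((a * 65535u + b / 2) / b);
}

double FixedToProb(uint16_t p) {
    return p / 65535.0;
}
```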
The first, obvious, memory optimization would be to try and get rid of the pointer. If you can store just the float, that may, depending on the larger context, reduce your memory consumption from eight to four bytes already. (On a 64-bit system, from twelve to four.)
Whether you can get by with a short depends on what your program does with the values. You may be able to use fixed-point arithmetic with an integral type such as a short, yes, but your question shows way too little context to judge that.
The code you posted and the text in the question do not deal with actual float, but with pointers to float. In all architectures I know of, the size of a pointer is the same regardless of the pointed type, so there would be no improvement in changing that to a short or char pointer.
Now, about the actual pointed elements, what is the range that you expect in your application? What is the precision you need? How many of those elements do you have? What are the memory constraints of your target platform? Unless the range and precision are small and the number of elements huge, just use floats. Also note that if you need floating point operations, storing any other type will require conversions before and after each operation, and you might be impacting performance.
Without greater knowledge of what you are doing: the range for short on many architectures is [-32k, 32k), where k stands for 1024. If your data range is [-32, 32) and you can do with roughly 3 decimal digits, you could use fixed-point arithmetic with shorts, but there are few such situations.

Drawing real coordinates

I've implemented a plotting class that is currently capable of handling integer values only. I would like to get advice about techniques/mechanisms for handling floating-point numbers. The library used is GDI.
Thanks,
Adi
At some point, they need to be converted to integers to draw actual pixels.
Generally speaking, however, you do not want to just cast each float to int, and draw -- you'll almost certainly get a mess. Instead, you need/want to scale the floats, then round the scaled value to an integer. In most cases, you'll want to make the scaling factor variable so the user can zoom in and out as needed.
Another possibility is to let the hardware handle most of the work -- you could use OpenGL (for one example) to render your points, leaving them as floating point internally, and letting the driver/hardware handle issues like scaling and conversion to integers. This has a rather steep cost up-front (learning enough OpenGL to get it to do anything useful), but can have a fairly substantial payoff as well, such as fast, hardware-based rendering, and making it relatively easy to handle some things like scaling and (if you ever need it) being able to display 3D points as easily as 2D.
Edit: (mostly a response to the comment): Ultimately it comes down to this: the resolution of a screen is lower than the resolution of a floating-point number. For example, a really high-resolution screen might display 2048 pixels horizontally -- that's 11 bits of resolution. Even a single-precision floating-point number has around 24 bits of precision. No matter how you do it, reducing 24-bit resolution to 11-bit resolution is going to lose something -- usually a lot.
That's why you pretty nearly have to make your scaling factor variable -- so the user can choose whether to zoom out and see the whole picture with reduced resolution, or zoom in to see a small part at high resolution.
Since sub-pixel resolution was mentioned: it does help, but only a little. It's not going to resolve a thousand different items that map to a single pixel.
What do these float values represent? I will assume they are some co-ordinates. You will need to know two things:
The source resolution (i.e. the dpi at which these co-ordinates are drawn)
The range that you need to address
After that, this becomes a problem of scaling the points to suitable integer co-ordinates (based on your screen-resolution).
Edit: A simple formula will be:
X(dst) = X(src) * DPI(dst) / DPI(src)
You'll have to convert them to integers and then pass them to functions like MoveTo() and LineTo().
Scale. For example, multiply all the integral values by 10. Multiply the floating point values by 10.0 and then truncate or round (your choice). Now plot as normal.
This will give you extra precision in your graphing. Just remember the scale factor when you look at the picture.
Otherwise convert the floats to int before plotting.
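A minimal sketch of that scale-then-round approach at the GDI call site (DrawSegment and the zoom parameter are illustrative names; MoveToEx/LineTo are the actual GDI calls):

```cpp
#include <windows.h>
#include <cmath>

// Keep coordinates as doubles, apply the current scale factor, and round
// to pixels only when handing them to GDI.
void DrawSegment(HDC dc, double x0, double y0, double x1, double y1, double zoom) {
    MoveToEx(dc, static_cast<int>(std::lround(x0 * zoom)),
                 static_cast<int>(std::lround(y0 * zoom)), nullptr);
    LineTo(dc,   static_cast<int>(std::lround(x1 * zoom)),
                 static_cast<int>(std::lround(y1 * zoom)));
}
```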
You can try using GDI+ instead of GDI; it has functions that accept float coordinates.