Issues saving double as binary in C++

In the simulation code for my particle system, I have a class for particles, and each particle has a pos member holding its position, declared as double pos[3]; because there are 3 coordinate components per particle. The particle objects are created with particles = new Particle[npart]; (since there are npart particles), so, for example, the y-component of the 2nd particle is accessed with double dummycomp = particles[1].pos[1];
Before switching to binary, I saved the particles with the following (as text, with a precision of 10 significant digits and one particle per line):
#include <iostream>
#include <fstream>
std::ofstream outfile("testConfig.txt", std::ios::out);
outfile.precision(10); // 10 significant digits
for (int i=0; i<npart; i++){
outfile << particles[i].pos[0] << " " << particles[i].pos[1] << " " << particles[i].pos[2] << std::endl;
}
outfile.close();
But now, to save space, I am trying to save the configuration as a binary file, and my attempt, inspired from here, has been as follows:
std::ofstream outfile("test.bin", std::ios::binary | std::ios::out);
for (int i=0; i<npart; i++){
// write the 3 doubles of pos as raw bytes
outfile.write(reinterpret_cast<const char*>(particles[i].pos), std::streamsize(3*sizeof(double)));
}
outfile.close();
but I am facing a segmentation fault when trying to run it. My questions are:
Am I doing something wrong with reinterpret_cast or rather in the argument of streamsize()?
Ideally, the saved binary file could also be read from Python. Would my approach (once fixed) allow that?
Working example of the old (non-binary) saving approach:
#include <iostream>
#include <fstream>
using namespace std;
class Particle {
public:
double pos[3];
};
int main() {
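const int npart = 2; // must be a constant: with a plain int, particles would be a non-standard variable-length array
Particle particles[npart];
//initializing the positions: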
particles[0].pos[0] = -74.04119568;
particles[0].pos[1] = -44.33692582;
particles[0].pos[2] = 17.36278231;
particles[1].pos[0] = 48.16310086;
particles[1].pos[1] = -65.02325252;
particles[1].pos[2] = -37.2053818;
ofstream outfile("testConfig.txt", ios::out);
outfile.precision(10);
for (int i=0; i<npart; i++){
outfile << particles[i].pos[0] << " " << particles[i].pos[1] << " " << particles[i].pos[2] << endl;
}
outfile.close();
return 0;
}
And in order to save the particle positions as binary, substitute the saving portion of the above sample with
ofstream outfile("test.bin", ios::binary | ios::out);
for (int i=0; i<npart; i++){
outfile.write(reinterpret_cast<const char*>(particles[i].pos),streamsize(3*sizeof(double)));
}
outfile.close();
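To check what the binary file actually contains, a sketch like the following writes positions with the same layout as the loop above (three raw doubles per particle) and reads them back. write_positions and read_positions are illustrative names, and the file is only readable on a machine with the same double format and byte order as the writer.

```cpp
#include <array>
#include <fstream>
#include <string>
#include <vector>

// Writes positions in the same layout as the question's loop:
// three raw doubles (x, y, z) per particle, in host byte order.
bool write_positions(const std::string& path,
                     const std::vector<std::array<double, 3>>& positions)
{
    std::ofstream outfile(path, std::ios::binary);
    for (const std::array<double, 3>& pos : positions) {
        outfile.write(reinterpret_cast<const char*>(pos.data()),
                      static_cast<std::streamsize>(pos.size() * sizeof(double)));
    }
    outfile.close();              // flush before reporting success
    return !outfile.fail();
}

// Reads the file back; a truncated final record makes read() fail and is dropped.
std::vector<std::array<double, 3>> read_positions(const std::string& path)
{
    std::vector<std::array<double, 3>> positions;
    std::ifstream infile(path, std::ios::binary);
    std::array<double, 3> pos;
    while (infile.read(reinterpret_cast<char*>(pos.data()),
                       static_cast<std::streamsize>(pos.size() * sizeof(double)))) {
        positions.push_back(pos);
    }
    return positions;
}
```

Because the round trip compares doubles bit for bit, it also shows that the raw binary format loses nothing, unlike the 10-digit text output.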
2nd addendum: reading the binary in Python
I managed to read the saved binary in Python using NumPy as follows:
import numpy as np
data = np.fromfile('test.bin', dtype=np.float64)
data
array([-74.04119568, -44.33692582, 17.36278231, 48.16310086,
-65.02325252, -37.2053818 ])
But given the doubts raised in the comments about the non-portability of binary formats, I am not confident this kind of reading in Python will always work. It would be really neat if someone could explain how reliable this approach is.

The trouble is that converting a double to a base-10 ASCII string can lose information, especially if you only keep 10 significant digits. Using std::numeric_limits<double>::max_digits10 significant digits (17 for an IEEE 754 double) is enough for the text to be read back as exactly the same double, but that relies on correctly rounded conversions on both the writing and the reading side, and the printed string is usually still not the exact binary value.
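To make the 10-digit versus max_digits10 point concrete, here is a small sketch (function names are illustrative, not from any library) that formats a double with a chosen number of significant digits, parses it back, and checks whether exactly the same double comes out. It assumes the standard library's decimal conversions are correctly rounded, as they are in mainstream implementations.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Digits needed so any double survives a text round trip (17 for IEEE 754 binary64).
const int kRoundTripDigits = std::numeric_limits<double>::max_digits10;

// Formats value with `digits` significant digits (default floatfield, like printf's %g).
std::string to_text(double value, int digits)
{
    std::ostringstream out;
    out.precision(digits);
    out << value;
    return out.str();
}

// Parses decimal text back into a double.
double from_text(const std::string& text)
{
    std::istringstream in(text);
    double value = 0.0;
    in >> value;
    return value;
}

// True if printing with `digits` significant digits and parsing the result
// gives back exactly the same double.
bool round_trips(double value, int digits)
{
    return from_text(to_text(value, digits)) == value;
}
```

For example, round_trips(1.0 / 3.0, 10) is false, while round_trips(1.0 / 3.0, kRoundTripDigits) is true.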
The other issue is that the C++ standard does not mandate a binary representation for double, so using the raw bytes directly is fragile. In practice almost all current platforms use IEEE 754 binary64 (std::numeric_limits<double>::is_iec559 tells you), but byte order (endianness) differs between architectures, and when you change compilers or architectures you have no formal guarantee at all.
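If you do keep raw binary doubles, a cheap safeguard is to check those assumptions at build time and at run time. The sketch below (host_is_little_endian is a hypothetical helper name) refuses to compile unless double is an 8-byte IEEE 754 type, and detects the host byte order; C++20 code could use std::endian from <bit> instead.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Refuse to compile if raw double bytes would not be IEEE 754 binary64.
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE 754");
static_assert(sizeof(double) == 8, "double is not 8 bytes");

// Detects host byte order by inspecting the first byte of the 16-bit integer 1.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

A file could, for instance, store a one-byte flag with the writer's byte order, so a reader on another machine knows whether it has to swap bytes.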
You can serialize doubles to text without loss by using the hexadecimal floating-point format.
// C++11: setting both fixed and scientific in floatfield selects hexadecimal output
stream.setf(std::ios_base::fixed | std::ios_base::scientific, std::ios_base::floatfield);
stream << particles[i].pos[0];
// C++11 also added a manipulator that does the same thing:
stream << std::hexfloat << particles[i].pos[0];
This has the same effect as the "%a" conversion of printf() in C ("hexadecimal floating point, lowercase"): the significand is printed in hexadecimal and the exponent as a decimal power of two, e.g. 0x1.8p+1 for 3.0. Since the underlying representation is binary, every value can be written exactly in this form, which gives a lossless way of transferring data between systems. It also drops trailing zeros from the significand, so for many numbers it is relatively compact.
On the Python side, this format is also supported: read the value as a string, then convert it to a float with float.fromhex().
see: https://docs.python.org/3/library/stdtypes.html#float.fromhex
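A minimal round-trip sketch of the hex format, assuming C++11 or later. Reading is done with std::strtod, which accepts hexadecimal floating-point input since C++11; parsing hexfloat text back through operator>> is not supported by every standard library, so it is avoided here. The helper names are illustrative.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Writes a double in hexadecimal floating-point notation (same as printf's %a).
std::string to_hex_text(double value)
{
    std::ostringstream out;
    out << std::hexfloat << value;
    return out.str();
}

// Parses hex-float text such as "0x1.8p+1" back into a double.
double from_hex_text(const std::string& text)
{
    return std::strtod(text.c_str(), nullptr);
}
```

Each particle line could then hold three such strings separated by spaces, and Python's float.fromhex() parses the same text.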
But your goal is to save space:
But now, to save space, I am trying to save the configuration as a binary file.
I would ask: do you really need to save space? Are you running in a low-powered, low-resource environment? Then saving space can definitely matter (such environments are rare nowadays, but they do exist).
But it seems you are running some kind of particle simulation, which does not scream low-resource use case. Even with terabytes of data I would still choose a portable, easy-to-read format over binary, preferably one that is not lossy. Storage space is cheap.

I suggest using a library instead of writing a serialization/deserialization routine from scratch. I find cereal really easy to use, maybe even easier than boost::serialization. It reduces the opportunity for bugs in your own code.
In your case I'd go about serializing doubles like this using cereal:
#include <cereal/archives/binary.hpp>
#include <fstream>
int main() {
std::ofstream outfile("test.bin", std::ios::binary);
cereal::BinaryOutputArchive out(outfile);
double x, y, z;
x = y = z = 42.0;
out(x, y, z);
}
To deserialize them you'd use:
#include <cereal/archives/binary.hpp>
#include <fstream>
int main() {
std::ifstream infile("test.bin", std::ios::binary);
cereal::BinaryInputArchive in(infile);
double x,y,z;
in(x, y, z);
}
You can also serialize/deserialize whole std::vector<double>s in the same fashion. Just add #include <cereal/types/vector.hpp> and use in / out like in the given example on a single std::vector<double> instead of multiple doubles.
Ain't that swell.
Edit
In a comment you asked whether it would be possible to read a binary file created like that with Python.
Answer:
Serialized binary files aren't really meant to be very portable (things like endianness could play a role here). You could easily adapt the example code I gave you to write a JSON file (another advantage of using a library) and read that format in Python.
Oh and cereal::JSONOutputArchive has an option for setting precision.
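If you do want a binary file that NumPy can read on any machine, one option (a sketch of my own, not a cereal feature) is to fix the byte order yourself: store every double as its 8 IEEE 754 bytes in little-endian order. It assumes doubles are IEEE 754 binary64 and use the same byte order as 64-bit integers, which holds on mainstream platforms. On the Python side, np.fromfile('test.bin', dtype='<f8') then reads the data regardless of the host's byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Encodes doubles as little-endian IEEE 754 bytes, whatever the host byte order.
std::string to_le_bytes(const std::vector<double>& values)
{
    std::string bytes;
    bytes.reserve(values.size() * 8);
    for (double v : values) {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);   // copy the 8 bytes of v safely
        for (int i = 0; i < 8; ++i) {          // least significant byte first
            bytes.push_back(static_cast<char>((bits >> (8 * i)) & 0xFF));
        }
    }
    return bytes;
}

// Decodes bytes produced by to_le_bytes; a trailing partial record is ignored.
std::vector<double> from_le_bytes(const std::string& bytes)
{
    std::vector<double> values;
    for (std::size_t off = 0; off + 8 <= bytes.size(); off += 8) {
        std::uint64_t bits = 0;
        for (int i = 0; i < 8; ++i) {
            const unsigned char byte = static_cast<unsigned char>(bytes[off + i]);
            bits |= static_cast<std::uint64_t>(byte) << (8 * i);
        }
        double v;
        std::memcpy(&v, &bits, sizeof v);
        values.push_back(v);
    }
    return values;
}
```

To save, write the returned string with std::ofstream(path, std::ios::binary).write(bytes.data(), bytes.size()).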

Just curious if you ever investigated the idea of converting your data to vectored coordinates instead of Cartesian X,Y,Z? It would seem that this would potentially reduce the size of your data by about 30%: Two coordinates instead of three, but perhaps needing slightly higher precision in order to convert back to your X,Y,Z.
The vectored coordinates could still be further optimized by using the various compression techniques above (text compression or binary conversion).

Related

removing trailing zeroes for a float value c++

I am trying to set up a NodeMCU module to collect data from a temperature sensor and send it to my MQTT broker using the PubSubClient MQTT library, but that is not the problem.
I am trying to send the temperature with only one decimal. So far I've successfully made it round up or down, but the format is not right: it currently produces 24.50, 27.80, 23.10, etc. I want to remove the trailing zeros so it becomes 24.5, 27.8, 23.1, etc.
I have this code set up so far:
#include <math.h>
#include <PubSubClient.h>
#include <ESP8266WiFi.h>
float temp = 0;
void loop() {
float newTemp = sensors.getTempCByIndex(0);
temp = roundf((newTemp * 10)) / 10;
Serial.println(String(temp).c_str());
client.publish("/test/temperature", String(temp).c_str(), true);
}
I'm fairly new to c++, so any help would be appreciated.
It's unclear what your API is. It seems like you want to pass in a C string. In that case just use snprintf:
#include <stdio.h>
float temp = sensors.getTempCByIndex(0);
char s[30];
snprintf(s, sizeof s, "%.1f", temp); // one decimal; never writes past the end of s
client.publish("/test/temperature", s, true);
Regardless of what you do to them, floating-point values always have the same precision. To control the number of digits in a text string, change the way you convert the value to text. In normal C++ (i.e., where there is no String type <g>), you do that with a stream:
std::ostringstream out; // from <sstream>; std::setprecision is in <iomanip>
out << std::fixed << std::setprecision(1) << value;
std::string text = out.str();
In the environment you're using, you'll have to either use standard streams or figure out what that environment provides for controlling floating-point to text conversions.
The library you are using is not part of standard C++. The String you are using is non-standard.
As Pete Becker noted in his answer, you won't be able to control the trailing zeros by changing the value of temp. You need to either control the precision when converting it to String, or do the conversion and then tweak the resultant string.
If you read the documentation for the String type you are using, there may be options to do one or both of:
control the precision when writing a float to a string; or
examine characters in a String and manually remove trailing zeros.
Or you could use a std::ostringstream to produce the value as a std::string, and work with that instead.
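For the second option (removing trailing zeros after conversion), here is a sketch for std::string in standard C++; the Arduino String class has its own methods, so treat this only as an illustration of the logic. It assumes the text is in fixed notation (no exponent), and the function name is illustrative.

```cpp
#include <string>

// Removes trailing zeros after the decimal point, and the point itself if
// nothing is left after it: "24.50" -> "24.5", "27.00" -> "27".
// Text without a '.' (e.g. "100") is returned unchanged.
std::string strip_trailing_zeros(std::string text)
{
    const std::string::size_type dot = text.find('.');
    if (dot == std::string::npos) {
        return text;
    }
    // Never goes left of the '.', because '.' is not '0'.
    const std::string::size_type last = text.find_last_not_of('0');
    text.erase(last + 1);
    if (text.back() == '.') {
        text.pop_back();
    }
    return text;
}
```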

Reading key-value pairs as fast as possible in C++ from file

I have a file with roughly 2 million lines like this:
2s,3s,4s,5s,6s 100000
2s,3s,4s,5s,8s 101
2s,3s,4s,5s,9s 102
The first comma separated part indicates a poker result in Omaha, while the latter score is an example "value" of the cards. It is very important for me to read this file as fast as possible in C++, but I cannot seem to get it to be faster than a simple approach in Python (4.5 seconds) using the base library.
Using the Qt framework (QHash and QString), I was able to read the file in 2.5 seconds in release mode. However, I do not want to have the Qt dependency. The goal is to allow quick simulations using those 2 million lines, i.e. some_container["2s,3s,4s,5s,6s"] to yield 100 (though if applying a translation function or any non-readable format will allow for faster reading that's okay as well).
My current implementation is extremely slow (8 seconds!):
#include <fstream>
#include <map>
#include <string>
std::map<std::string, int> get_file_contents(const char *filename)
{
std::map<std::string, int> outcomes;
std::ifstream infile(filename);
std::string c;
int d;
while (infile >> c >> d) // test the reads themselves so a failed final read is not stored
{
//std::cout << c << d << std::endl;
outcomes[c] = d;
}
return outcomes;
}
What can I do to read this data into some kind of a key/value hash as fast as possible?
Note: The first 16 characters are always going to be there (the cards), while the score can go up to around 1 million.
Some further informations gathered from various comments:
sample file: http://pastebin.com/rB1hFViM
ram restrictions: 750MB
initialization time restriction: 5s
computation time per hand restriction: 0.5s
As I see it, there are two bottlenecks in your code.
Bottleneck 1
I believe that file reading is the biggest problem. A binary file is the fastest option: not only can you read it directly into an array with a single raw istream::read call (which is very fast), but you can even map the file into memory if your OS supports it. Here is a link that's very informative on how to use memory mapped files.
Bottleneck 2
std::map is usually implemented as a self-balancing BST that stores all the data in order, which makes insertion an O(log n) operation. You can switch to std::unordered_map, which uses a hash table instead. A hash table has constant-time insertion on average if the number of collisions is low. Since the number of elements you need to read is known, you can reserve a suitable number of buckets before inserting the elements. Keep in mind that you need more buckets than the number of elements inserted into the hash to keep collisions to a minimum.
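A sketch of the std::unordered_map suggestion, reusing the question's file format. The function name and the expected_count parameter are my own; reserve(n) allocates enough buckets for n elements at the current max_load_factor, so the table does not rehash repeatedly while the roughly 2 million entries are inserted.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <unordered_map>

// Variant of the question's get_file_contents() backed by a hash table.
std::unordered_map<std::string, int> read_outcomes(const char* filename,
                                                   std::size_t expected_count)
{
    std::unordered_map<std::string, int> outcomes;
    outcomes.reserve(expected_count);   // allocate buckets once, up front
    std::ifstream infile(filename);
    std::string cards;
    int score;
    // Testing the extraction itself stops cleanly at end of file.
    while (infile >> cards >> score) {
        outcomes[cards] = score;
    }
    return outcomes;
}
```

For the full file this would be called as read_outcomes(filename, 2000000).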
Ian Medeiros already mentioned the two major bottlenecks.
A few thoughts about data structures:
The number of different cards is known: 4 suits of 13 cards each, so 52 cards.
So a card fits in 6 bits (2^6 = 64 >= 52). Your current file format uses 24 bits per card (3 characters, including the comma).
So by simply enumerating the cards and omitting the comma, you can save about 2/3 of the space taken by the cards, and you can identify a card by reading only one character.
If you want to keep the file text based, you could use a-m, n-z, A-M and N-Z for the four suits.
Another thing that bugs me is the string-based map. String operations are inefficient.
One hand contains 5 cards.
That means 52^5 possibilities if we keep it simple and do not consider the already drawn cards:
52^5 = 380,204,032 < 2^32
That means we can enumerate every possible hand with a uint32 number. By defining a special sorting scheme for the cards (since order is irrelevant), we can assign a number to each hand and use this number as the key in our map, which is a lot faster than using strings.
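One way to realize the hand-to-number idea, as a sketch: the card notation (ranks 2-9, T, J, Q, K, A and suits s, h, d, c) is my own assumption, since the sample file only shows low cards. Each card gets an index 0..51, the five indices are sorted so the key does not depend on card order, and the hand is then read as a 5-digit base-52 number.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical card numbering: rank index (2..9, T, J, Q, K, A -> 0..12)
// times 4 plus suit index (s, h, d, c -> 0..3), giving 0..51; -1 if invalid.
int card_index(char rank, char suit)
{
    static const char ranks[] = "23456789TJQKA";
    static const char suits[] = "shdc";
    if (rank == '\0' || suit == '\0') return -1;  // strchr would match the terminator
    const char* r = std::strchr(ranks, rank);
    const char* s = std::strchr(suits, suit);
    if (r == nullptr || s == nullptr) return -1;
    return static_cast<int>(r - ranks) * 4 + static_cast<int>(s - suits);
}

// Encodes a hand such as "2s,3s,4s,5s,6s" as a base-52 number in `key`.
// Returns false unless the text is five comma-separated two-character cards.
bool encode_hand(const std::string& text, std::uint32_t& key)
{
    if (text.size() != 14) return false;   // 5 cards * 2 chars + 4 commas
    std::array<int, 5> cards;
    for (int i = 0; i < 5; ++i) {
        const std::size_t pos = static_cast<std::size_t>(i) * 3;
        if (i < 4 && text[pos + 2] != ',') return false;
        cards[i] = card_index(text[pos], text[pos + 1]);
        if (cards[i] < 0) return false;
    }
    std::sort(cards.begin(), cards.end());  // order-independent key
    key = 0;
    for (int c : cards) {
        key = key * 52 + static_cast<std::uint32_t>(c);  // at most 52^5 - 1 < 2^32
    }
    return true;
}
```

With this scheme "2s,3s,4s,5s,6s" and "6s,5s,4s,3s,2s" both map to 584704, and the largest possible key is 52^5 - 1, which fits in a uint32_t.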
If we have enough memory (about 1.5 GB for 380,204,032 four-byte entries), we do not even need a map; we can simply use an array.
Of course, most cells are unused, but access may be very fast. We can even skip sorting the cards, since every cell exists whether we fill it or not; in that case, though, do not forget to fill in all possible permutations of each hand read from the file.
With this scheme we may also be able to further optimize file reading, if we only store the hand number and the rating so that only 2 values need to be parsed.
In fact, we could reduce the required storage further with a more complex addressing scheme for the hands, since in reality there are only 52*51*50*49*48 = 311,875,200 possible hands (no repeated cards). In addition, as mentioned, the ordering is irrelevant, but I think that saving is not worth the increased complexity of encoding the hands.
A simple idea might be to use the C API, which is considerably simpler:
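#include <cstdio>
#include <map>
#include <string>
std::map<std::string, int> outcomes;
int n;
char s[128];
// %127s reads at most 127 non-space characters, leaving room for the terminating '\0'
while (std::fscanf(stdin, "%127s %d", s, &n) == 2)
{
outcomes[s] = n;
}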
A rough test showed a considerable speedup for me compared to the iostreams library.
Further speedups may be achieved by storing the data in a contiguous array, e.g. a vector of std::pair<std::string, int>; it depends on whether your data is already sorted and how you need to access it later.
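A sketch of the contiguous-array idea: load all pairs into a std::vector, sort once by key, then look keys up with std::lower_bound (O(log n) per lookup, but cache-friendly and without per-node allocations). Entry, sort_entries and lookup are illustrative names.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;

// Sorts the entries by key once, after all of them have been loaded.
void sort_entries(std::vector<Entry>& entries)
{
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
}

// Binary-searches the sorted vector; returns true and sets `value` if found.
bool lookup(const std::vector<Entry>& entries, const std::string& key, int& value)
{
    auto it = std::lower_bound(entries.begin(), entries.end(), key,
                               [](const Entry& e, const std::string& k) { return e.first < k; });
    if (it == entries.end() || it->first != key) return false;
    value = it->second;
    return true;
}
```

If the file is already sorted by key, the sort_entries step can be skipped entirely.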
For a serious solution, though, you should probably step back further and think of a better way to represent your data. For example, a fixed-width, binary encoding would be much more space-efficient and faster to parse, since you won't need to look ahead for line endings or parse strings.
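As an illustration of a fixed-width binary encoding (the Record layout is hypothetical, e.g. a numeric hand key plus the score), the whole file can be written and read back with a single write()/read() call and no parsing. The file is tied to the host's byte order and struct layout, so it is best treated as a cache generated from the text file, not as an exchange format.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// One fixed-width record: an encoded hand key and its score (8 bytes).
struct Record {
    std::uint32_t key;
    std::uint32_t score;
};

bool save_records(const std::string& path, const std::vector<Record>& records)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(records.data()),
              static_cast<std::streamsize>(records.size() * sizeof(Record)));
    out.close();
    return !out.fail();
}

// Loads the whole file with one read() call: no tokenizing or number parsing.
std::vector<Record> load_records(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at the end to get the size
    if (!in) return {};
    const std::streamsize bytes = static_cast<std::streamsize>(in.tellg());
    if (bytes < 0) return {};
    in.seekg(0);
    std::vector<Record> records(static_cast<std::size_t>(bytes) / sizeof(Record));
    in.read(reinterpret_cast<char*>(records.data()),
            static_cast<std::streamsize>(records.size() * sizeof(Record)));
    if (!in) return {};
    return records;
}
```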
Update: From some quick experimentation I've found it fairly fast to first read the entire file into memory and then perform alternating strtok calls with either " " or "\n" as the delimiter; whenever a pair of calls succeed, apply strtol on the second pointer to parse the integer. Here's a skeleton:
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
int main()
{
std::vector<char> data;
// Read entire file to memory
{
data.reserve(100000000);
char buf[4096];
for (std::size_t n; (n = std::fread(buf, 1, sizeof buf, stdin)) > 0; )
{
data.insert(data.end(), buf, buf + n);
}
data.push_back('\0');
}
// Tokenize the in-memory data
char * p = &data.front();
for (char * q = std::strtok(p, " "); q; q = std::strtok(nullptr, " "))
{
if (char * r = std::strtok(nullptr, "\n"))
{
char * e;
errno = 0;
int const n = std::strtol(r, &e, 10);
if (*e != '\0' || errno != 0) { continue; }
// At this point we have data:
// * the string is "q"
// * the integer is "n"
}
}
}

How to work with large numbers when writing and reading a file?

I have written code to copy data from an input file to an output file. To read all lines of my input file I used
while (!inputfile.eof())
but the last line is missing from my output file. How can I prevent this error?
My second question is: for writing data into file, I used
Outputfile.write((char*)&a,sizeof(double));
Outputfile.write((char*)&b,sizeof(double));
here a = 289814.150 and b = 4320978.613 but in the output file, it shows like
289814 4.32098e+006
(the value of a is rounded, and b is shown in exponent notation). What is the reason for this, and how do I fix it?
I tried using cout.setf(ios::fixed);, but while that works for data written to the screen, I don't know how to apply it to double data written to my file.
I want to write real values with only 3 decimals in my output file. Any help is appreciated, thanks.
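For the two problems in the question, a sketch of the usual fixes, assuming the input is whitespace-separated pairs of numbers and the output should be text: test the result of each read instead of eof(), and set std::fixed with std::setprecision(3) on the file stream (the manipulators that work on cout work on any std::ostream). Note that ostream::write() stores raw bytes, not text, so it cannot produce "289814.150" in the file. The function name is illustrative.

```cpp
#include <fstream>
#include <iomanip>
#include <string>

// Reads pairs of numbers from a text file and writes them to another text
// file in fixed notation with exactly 3 decimals.
bool copy_pairs_fixed3(const std::string& in_path, const std::string& out_path)
{
    std::ifstream in(in_path);
    std::ofstream out(out_path);
    if (!in || !out) return false;
    out << std::fixed << std::setprecision(3);   // sticky: applies to every later double
    double a, b;
    // The loop tests the reads themselves, not eof(), so it ends exactly
    // after the last complete pair.
    while (in >> a >> b) {
        out << a << ' ' << b << '\n';
    }
    out.close();
    return !out.fail();
}
```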
Okay, based on comments, the intent here has (at least I hope) become reasonably clear: to convert pairs of numbers in text format to binary format, and be able to verify that the converted numbers accurately represent the originals.
There are a number of ways to do that, but the first thing to keep in mind is that no matter what else you do, converting floating point numbers to/from text (decimal) format can and normally will lead to some degree of inaccuracy. The problem is fairly simple: floating point is (normally) done in binary, so it can only represent fractions whose denominator is a power of 2. Decimal fractions have denominators that are powers of 10, i.e. products of powers of 2 and powers of 5. Any fraction whose reduced denominator contains a factor of 5 (e.g., 0.2 = 1/5) can only be approximated in binary, pretty much like trying to represent 1/3 in decimal.
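The representability rule can be checked mechanically: a fraction num/den has a finite binary expansion exactly when its reduced denominator is a power of two. A small sketch with illustrative helper names; it ignores the 53-bit significand limit of double, which imposes further limits on top of this.

```cpp
#include <cstdint>

// Euclid's algorithm for the greatest common divisor.
std::uint64_t gcd_u64(std::uint64_t a, std::uint64_t b)
{
    while (b != 0) {
        const std::uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// True if num/den (den > 0) has a finite binary expansion, i.e. the reduced
// denominator is a power of two.
bool exactly_representable_in_binary(std::uint64_t num, std::uint64_t den)
{
    den /= gcd_u64(num, den);
    return (den & (den - 1)) == 0;   // power-of-two test (den >= 1 here)
}
```

So 0.375 = 3/8 is exact in binary, while 0.2 = 1/5 and 0.1 = 1/10 are not.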
This means your only reasonable choice is to allow some discrepancy between the decimal and binary versions. The best you can hope for is to keep the errors to a minimum. To test for that, what you probably need/want to do is convert the binary floating point back to decimal in the original format, and check whether it's close to the original (e.g., ignore errors in the final digit, at least errors of +/- 1).
The conversion itself should be pretty trivial:
#include <fstream>
int main(int argc, char **argv) {
// checking argc omitted for clarity.
std::ifstream infile(argv[1]);
std::ofstream outfile(argv[2], std::ios::binary);
double a, b;
while (infile >> a && infile >> b) {
outfile.write((char const *)&a, sizeof(a));
outfile.write((char const *)&b, sizeof(b));
}
return 0;
}
Verifying the data isn't nearly so easy. One possibility would be something like this (starting from the two files, one binary and one text):
#include <iostream>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <string>
int main(int argc, char **argv) {
// checking argc omitted for clarity.
std::string text;
std::ifstream text_file(argv[1]);
std::ifstream bin_file(argv[2], std::ios::binary);
double bin_value;
while (text_file >> text && bin_file.read((char *)&bin_value, sizeof(bin_value))) {
// use a fresh stream for each value so earlier output doesn't accumulate
std::ostringstream converter;
// the manipulators will probably need tweaking to match original format.
converter << std::fixed << std::setw(3) << std::setprecision(3) << bin_value;
if (converter.str() == text)
;// they're identical
else if (converter.str().substr(0,3) == text.substr(0,3))
;// the first three digits are equal
else
;// bigger error
}
return 0;
}
That's much more likely to need some tweaking to work the way you want, but the general idea should be in the ballpark as long as you're sure the original numbers are all formatted consistently.

How do you output variables declared as double to a text file in C++

I am very new to C++ and I am wondering how to output/write variables declared as double to a .txt file. I know how to output strings using fstream, but I can't figure out how to send anything else. I am starting to think that you can't send anything but strings to a text file; is that correct? If so, how would you convert the value stored in the variable to a string?
Here is the code I'm trying to apply this to; it's fairly simple:
int main()
{
double invoiceAmt = 3800.00;
double apr = 18.5; //percentage
//compute cash discount
double discountRate = 3.0; //percentage
double discountAmt;
discountAmt = invoiceAmt * discountRate/100;
//compute amount due in 10 days
double amtDueIn10;
amtDueIn10 = invoiceAmt - discountAmt;
//Compute Interest on the loan of amount (with discount)for 20 days
double LoanInt;
LoanInt = amtDueIn10 * (apr /360/100) * 20;
//Compute amount due in 20 days at 18.5%.
double amtDueIn20;
amtDueIn20 = invoiceAmt * (1 + (apr /360/100) * 20);
return 0;
}
So what I'm trying to do is take those variables and output them to a text file. Please also tell me which includes I need for this source code. Feel free to suggest other ways to improve my code as well.
Thanks in advance.
As your tagging suggests, you use file streams:
std::ofstream ofs("/path/to/file.txt");
ofs << amtDueIn20;
Depending on what you need the file for, you'll probably have to write more stuff (like whitespaces etc.) in order to get decent formatting.
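Building on that, a sketch that writes several of the question's results with labels and two decimals; std::fixed and std::setprecision(2) come from <iomanip>, and the function name, labels and file path are illustrative.

```cpp
#include <fstream>
#include <iomanip>
#include <string>

// Writes labelled currency values, one per line, with two decimals.
bool write_invoice_report(const std::string& path, double invoiceAmt,
                          double discountAmt, double amtDueIn10, double amtDueIn20)
{
    std::ofstream ofs(path);
    if (!ofs) return false;
    ofs << std::fixed << std::setprecision(2);   // e.g. 3800 -> "3800.00"
    ofs << "Invoice amount: " << invoiceAmt << '\n'
        << "Cash discount: " << discountAmt << '\n'
        << "Amount due in 10 days: " << amtDueIn10 << '\n'
        << "Amount due in 20 days: " << amtDueIn20 << '\n';
    ofs.close();
    return !ofs.fail();
}
```

Called as write_invoice_report("invoice.txt", invoiceAmt, discountAmt, amtDueIn10, amtDueIn20) at the end of the question's main(), it produces one labelled line per value.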
Edit due to rmagoteaux22's ongoing problems:
This code
#include <iostream>
#include <fstream>
const double d = 3.1415926;
int main(){
std::ofstream ofs("test.txt");
if( !ofs.good() ) {
std::cerr << "Couldn't open text file!\n";
return 1;
}
ofs << d << '\n';
return 0;
}
compiles for me (VC9) and writes this to test.txt:
3.14159
Can you try this?
Simply use the stream insertion operator, operator<<, which has an overload for double (a member of std::basic_ostream):
#include <fstream>
...
std::fstream stmMyStream( "c:\\tmp\\teststm.txt", std::ios::in | std::ios::out | std::ios::trunc );
double dbMyDouble = 23.456;
stmMyStream << "The value is: " << dbMyDouble;
To answer your first question, in C you use printf (and for file output fprintf). IIRC, cout has a large number of modifiers also, but I won't mention them as you originally mentioned fstream (more 'C' centric than C++) --
oops, missed the ofstream indicator, ignore my 'C' comments and use C++
To improve your program, use parentheses generously in computations like the ones above, so you are 100% sure things are evaluated the way you want (rather than relying on operator precedence).
Generally speaking, the C functions for formatted output are printf, wprintf, etc.
For files, the corresponding functions are fprintf, fwprintf, etc.; Microsoft's C runtime also provides variants such as fprintf_s and sprintf_s.
Note that the '_s' functions are bounds-checked ("secure") variations of the older formatting functions. With Microsoft's compiler you may prefer them, but they are not available everywhere (in standard C they are only an optional Annex K feature).
For examples refer to:
http://msdn.microsoft.com/en-us/library/ksf1fzyy%28VS.80%29.aspx
Note that these functions use a format specifier to convert a given type to text. For example, %d acts as a placeholder for an int, and %f for a double.
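A minimal C-stdio sketch of the same idea using the portable fprintf (available everywhere, unlike the Microsoft-specific _s functions); the function name is illustrative and the file is opened in text mode.

```cpp
#include <cstdio>

// Writes a labelled double to a text file; %.2f prints it in fixed
// notation with 2 decimals.
bool write_value_c(const char* path, double value)
{
    std::FILE* f = std::fopen(path, "w");
    if (f == nullptr) return false;
    const int written = std::fprintf(f, "The value is: %.2f\n", value);
    const bool closed_ok = std::fclose(f) == 0;   // fclose flushes the buffer
    return written > 0 && closed_ok;
}
```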
Just use the << operator on an output stream:
#include <fstream>
int main() {
double myNumber = 42.5;
std::fstream outfile("test.txt", std::fstream::out);
outfile << "The answer is almost " << myNumber << std::endl;
outfile.close();
}
I was having the exact same problem: ofstream was outputting strings but stopped as soon as it reached a variable. With a bit more Googling I found this solution in a forum post:
Under Xcode 3.2 when creating a new project based on stdc++ project template the target build settings for Debug configuration adds preprocessor macros which are incompatible with gcc-4.2:
_GLIBCXX_DEBUG=1
_GLIBCXX_DEBUG_PEDANTIC=1
Destroy them if you want Debug/gcc-4.2 to execute correctly.
http://forums.macrumors.com/showpost.php?p=8590820&postcount=8

String manipulation using Arduino and C++

I am trying to manipulate a string in C++. I am working with an Arduino board so I am limited on what I can use. I am also still learning C++ (Sorry for any stupid questions)
Here is what I need to do:
I need to send miles per hour to a 7 segment display. So if I have a number such as 17.812345, I need to display 17.8 on the 7 segment display. The most efficient way seems to be to first multiply by 10 (to shift the decimal point one place to the right), then cast 178.12345 to an int (to drop the fractional part). The part I am stuck on is how to break 178 apart. In Python I could slice a string, but I can't find anything on how to do this in C++ (or at least, I can't find the right terms to search for).
There are four 7 segment displays and a 7 segment display controller. It will measure up to tenths of a mile per hour. Thank you very much for any assistance and information you can provide.
It would probably be easiest to not convert it to a string, but just use arithmetic to separate the digits, i.e.
float speed = 17.812345;
int display_speed = speed * 10 + 0.5; // round to nearest 0.1 == 178
int digits[4];
digits[3] = display_speed % 10; // == 8
digits[2] = (display_speed / 10) % 10; // == 7
digits[1] = (display_speed / 100) % 10; // == 1
digits[0] = (display_speed / 1000) % 10; // == 0
and, as pointed out in the comments, if you need the ASCII value for each digit:
char ascii_digits[4];
ascii_digits[0] = digits[0] + '0';
ascii_digits[1] = digits[1] + '0';
ascii_digits[2] = digits[2] + '0';
ascii_digits[3] = digits[3] + '0';
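The same arithmetic can be written as a loop that fills the digit array from the right. This sketch also clamps the value to what four digits can show (0 to 999.9), which is my own assumption about what the display should do on overflow; speed_digits is an illustrative name.

```cpp
#include <array>

// Splits a speed into four display digits with one implied decimal place,
// e.g. 17.812345 -> {0, 1, 7, 8}, shown as "017.8".
std::array<int, 4> speed_digits(float speed)
{
    int tenths = static_cast<int>(speed * 10.0f + 0.5f);  // round to nearest 0.1
    if (tenths < 0) tenths = 0;
    if (tenths > 9999) tenths = 9999;
    std::array<int, 4> digits{};
    for (int i = 3; i >= 0; --i) {   // least significant digit first
        digits[i] = tenths % 10;
        tenths /= 10;
    }
    return digits;
}
```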
This is a way you can do it in C++ without modulus math (either way seems fine to me):
#include <cmath>
#include <cstdio>
#include <iostream>
int main( ) {
float value = 3.1415f;
char buf[16];
value = std::floor( value * 10.0f ) / 10.0f; // truncate to one decimal
std::snprintf( buf, sizeof buf, "%0.1f", value ); // format into buf, never overflowing it
std::cout << "Value: " << buf << std::endl; // print the formatted string
return 0;
}
If you actually want to be processing this stuff as strings, I would recommend looking into stringstream. It can be used much the same as any other stream, such as cin and cout, except instead of sending all output to the console you get an actual string out of the deal.
This will work with standard C++. Don't know much about Arduino, but some quick googling suggests it won't support stringstreams.
A quick example:
#include <sstream> // include this for stringstreams
#include <iostream>
#include <string>
using namespace std; // stringstream, like almost everything, is in std
string stringifyFloat(float f) {
stringstream ss;
ss.precision(1); // set decimal precision to one digit.
ss << fixed; // use fixed rather than scientific notation.
ss << f; // read in the value of f
return ss.str(); // return the string associated with the stream.
}
int main() {
cout << stringifyFloat(17.812345) << endl; // 17.8
return 0;
}
You can use a function such as this stringifyFloat and work your way up from there, as you would in Python, or just use modulo 10, 100, 1000, etc. to get the digits as numbers. I think manipulating it as a string might be easier for you, but it's up to you.
You could also use boost::lexical_cast, but it will probably be hard to get boost working in an embedded system like yours.
A good idea would be to implement a stream for the display. That way the C++ stream syntax could be used and the rest of the application would remain generic. Although this may be overkill for an embedded system.
If you still want to use std::string, you may want to use a reverse iterator. This way you can start at the rightmost digit in the string and work towards the left, one character at a time.
If you have access to the run-time library code, you could set up C-style I/O for the display. This is easier to implement than a C++ stream. You could then use fprintf or fputs to write to the display. I implemented a debug port this way, and it was easier for the rest of the developers to use.
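A small sketch of the reverse-iterator approach: walk the string from rbegin() to rend() and collect each digit, skipping the decimal point. The function name is illustrative.

```cpp
#include <string>
#include <vector>

// Collects the digits of a numeric string such as "17.8" from right to left,
// skipping any non-digit characters such as the decimal point.
std::vector<int> digits_right_to_left(const std::string& text)
{
    std::vector<int> digits;
    for (auto it = text.rbegin(); it != text.rend(); ++it) {
        if (*it >= '0' && *it <= '9') {
            digits.push_back(*it - '0');
        }
    }
    return digits;
}
```

The first element then goes to the rightmost display (tenths), the next to the ones digit, and so on.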