Okay, so I have a fairly annoying problem: one of the applications we use, hdp, dumps HDF values to a text file.
So basically we have a text file consisting of this:
-8684 -8683 -8681 -8680 -8678 -8676 -8674 -8672 -8670 -8668 -8666
-8664 -8662 -8660 -8657 -8655 -8653 -8650 <trim... 62,000 more rows>
Each of these represents a double:
E.g.:
-8684 = -86.84
We know the values will be between -180 and 180. But we also have to process around 65,000 rows of this, so time is kind of important.
What's the best way to deal with this? (I can't use Boost or any of the other libraries, due to internal standards.)
As you wish, as an answer instead... :)
Can't you just use standard iostream?
double val; cin >> val; val /= 100;
rinse, repeat 62000*11 times
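A minimal sketch of that loop, for what it's worth. The function name read_scaled is made up for illustration, and it reads each token as a long before scaling rather than as a double as above; that is a substitution on my part, not something the question requires.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Reads whitespace-separated integers (hundredths of a unit) from `in`
// and returns them scaled to doubles, e.g. -8684 becomes -86.84.
// read_scaled is a hypothetical name, not part of any library.
std::vector<double> read_scaled(std::istream& in) {
    std::vector<double> values;
    long raw;
    // operator>> skips spaces and newlines, so row boundaries do not matter.
    while (in >> raw) {
        values.push_back(raw / 100.0);
    }
    return values;
}
```

It takes any std::istream, so the same function works on std::cin, a std::ifstream opened on the dump file, or a std::istringstream when testing.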
I think I'd do the job a bit differently. I'd create a small sorta-proxy class to handle reading a value and converting it to a double:
class fixed_point {
double val;
public:
std::istream &read(std::istream &is) {
is >> val; val /= 100.0; return is;
}
operator double() { return val; }
friend std::istream &operator>>(std::istream &is, fixed_point &f) {
return f.read(is);
}
};
Using that, I could read my data a bit more cleanly. For example:
std::vector<double> my_data;
std::copy(std::istream_iterator<fixed_point>(infile),
std::istream_iterator<fixed_point>(),
std::back_inserter(my_data));
The important point here is that operator>> doesn't just read raw data, but extracts real information (in a format you can use) from that raw data.
There are other ways to do the job as well. For example, you could also create a derivative of std::num_get that parses doubles from that file's format. This is probably the way that's theoretically correct, but the documentation for this part of the library is mostly pretty poor, so I have a hard time advising it.
I am using ifstream and ofstream to serialize my data, but I am surprised to discover that the `<<' operator doesn't separate two adjacent strings, and separating them would be quite complicated.
class Name
{
string first_name;
string last_name;
friend std::ostream& operator<< (std::ostream& os, const Name& _name)
{
os << _name.first_name << _name.last_name;
return os;
}
friend std::istream& operator>> (std::istream& is, Name& _name)
{
is >> _name.first_name >> _name.last_name;
return is;
}
};
This doesn't work because << and >> don't write null terminator characters, and ifstream reads the whole string into one variable (first_name), which is kind of disappointing. How can I store the two strings separately so I can read them separately as well? I don't understand what is the motivation of << concatenating all the strings in ostream so we can't read them back separately!?
I don't understand what is the motivation of << concatenating all the strings in ostream so we can't read them back separately!?
This assumes that the only reason to write them separately is to read them as individual strings. Consider the case where someone has a pair of strings that they want to write to a stream without separators. Or a string followed by a float that they don't want separators for.
If ostreams automatically inserted separators for every << output, then it would be much harder for someone to write text without separators. They'd have to manually concatenate these strings and/or values into a single string, then output that.
And what would they use for this concatenation? They can't use ostringstream like you normally might, because it uses the same facilities as ofstream. So every << would put a separator character in the stream.
In short, the IO streams API writes what you told it to write, not what you may or may not "want" to write. It's not a serialization API; C++ isn't C# or Java. If you want serious serialization features, use Boost.Serialization.
Often times you want to concatenate strings with ostream (commonly stringstream). If you specifically don't want them concatenated it's easy enough to do:
os << _name.first_name << '\n' << _name.last_name;
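Here is a rough sketch of the full round trip with a newline as the separator. It assumes neither name contains a newline, and it uses a plain struct with public members instead of the class with friend operators from the question, just to keep it short.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Simplified stand-in for the Name class in the question (public members).
struct Name {
    std::string first_name;
    std::string last_name;
};

// Writes each string on its own line so the reader can tell them apart.
std::ostream& operator<<(std::ostream& os, const Name& name) {
    return os << name.first_name << '\n' << name.last_name << '\n';
}

// Reads one line per string; unlike operator>>, getline keeps embedded spaces.
std::istream& operator>>(std::istream& is, Name& name) {
    std::getline(is, name.first_name);
    std::getline(is, name.last_name);
    return is;
}
```

Using getline on the reading side rather than >> means a first name like "Mary Ann" survives, because only the newline ends a field.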
ifstream and ofstream are basically streams, so they have nothing to mark where one piece of data ends and the next begins. Think of them as a river: any data can be read from or written to them. This is the true nature of files, so if you need them for serialization you must implement your own serialization mechanism or use a library designed for that purpose, like boost::serialization. In C++ everything is implemented as-is, and because of this you can get maximum performance!! :)
I've been toying around with C++ more and more these days and have only had a couple experiences with ofstream at this point. Most of said experiences have been doing simple file output of variables and reading them back in with ifstream. What I haven't done is anything with objects.
Let's assume that I have an object that is being written to frequently (say a game, and the object is the character). Every time the character is hit, the hp is re-written; every time they defeat an enemy they gain experience.... My basic idea is to write a simple text-based dungeon crawling game. But how am I going to make some kind of an autosave file?? Do I just write out every attribute of my object to a file individually and then move on to bigger and better things from there? If I had to do it right now that's how I'd go about doing it, but I can't help feeling that there's an easier way than that....
Anyone care to help me output the contents of an entire object (and its respective attributes) to a file?
You could just write the object to a file by copying its contents in memory.
But then you hit the tricky bits! You can't copy any items that are pointers to memory, because when you load them back in they don't own that memory. That means copying things like std::string which may internally contain their own memory allocation is also tricky.
Then even with standard types there are issues if you plan to read them on a different machine with a different number of bits, or a different byte order.
The process is called serialisation - there are a few standard techniques to make it easier.
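One of the simplest techniques is to write each field as text, one per line. Below is a rough sketch for the game character from the question; the Character struct and the save/load names are assumptions made for illustration, since the question doesn't show the real class.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical character type; the real game object will have more fields.
struct Character {
    std::string name;  // may contain spaces, so it gets a line of its own
    int hp;
    int experience;
};

// Writes one field per line. Text output sidesteps the pointer,
// word-size and byte-order problems of copying raw memory.
void save(std::ostream& os, const Character& c) {
    os << c.name << '\n' << c.hp << '\n' << c.experience << '\n';
}

// Reads the fields back in the same order. Returns false, and leaves `c`
// untouched, if the stream does not hold a complete record.
bool load(std::istream& is, Character& c) {
    Character tmp;
    if (!std::getline(is, tmp.name)) return false;
    if (!(is >> tmp.hp >> tmp.experience)) return false;
    c = tmp;
    return true;
}
```

For an autosave you would call save on a std::ofstream whenever the character changes, and load on a std::ifstream at startup.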
Take a look at this code :
//! Struct for a 2d marker
struct Marker2d{
double x; //!< x coordinate of marker in calibration phantom
double y; //!< y coordinate of marker in calibration phantom
int id; //!< some unique id (used for sequence id as well as for assigned 3d id)
int code; //!< type of marker (small = 0, large = 1)
float size; //!< real size of marker in 2D image (in pixel)
double distanceToNearest; //!< distance to nearest other marker
/**
* Overloaded stream insertion operator. Abbreviation for the output of 2d marker.
\param output_out A reference to an std::ostream instance indicating the output stream
\param marker_in A constant Marker2d reference indicating the 2d marker that we want to output
\return a std::ostream reference containing the new output data
*/
friend std::ostream & operator<<(std::ostream & output_out, const Marker2d & marker_in)
{
return output_out<< std::fixed << std::setprecision(15) <<marker_in.x<<"\t"<<marker_in.y<<"\t"<<marker_in.id<<"\t"
<<marker_in.code<<"\t"<<marker_in.size<<"\t"<<marker_in.distanceToNearest;
}
/**
* Overloaded stream extraction operator.
\param s_in A reference to an std::istream instance indicating the input stream
\param marker_out A Marker2d reference indicating the 2d marker that will have its data members populated
\return a std::istream reference indicating the input stream
*/
friend std::istream& operator>>(std::istream& s_in, Marker2d & marker_out)
{
s_in >> marker_out.x >> marker_out.y >> marker_out.id >> marker_out.code >> marker_out.size >> marker_out.distanceToNearest;
return s_in;
}
};
This is a simple struct with overloaded >> and << operators. This allows you to output to a file like myOfstreamFile << obj; And read the other way around.
If you have, say, a thousand objects stored in a file, you can simply put them in a container like this:
std::vector<Marker2d> myMarkers;
std::ifstream f( fileName_in.c_str() );
if(!f)
throw std::runtime_error("AX.Algorithms.ComputeAssignmentsTest::getMarkersFromFile - Could not open file : " + fileName_in + " for reading!"); // needs <stdexcept>; std::exception has no standard constructor taking a message
//cool one liner to read objects from file...
std::copy(std::istream_iterator<AX::Calibration::Marker2d>(f), std::istream_iterator<AX::Calibration::Marker2d>(), std::back_inserter(myMarkers));
Of course you could provide other forms of input and output e.g. save to .xml format and parse it as a dom tree etc. This is just a sample.
EDIT : This will work for relatively simple objects. Look at serialization if you need something more complex
Search the web and SO for "serialization". There are some bumps to watch out for: floating point, endianness and variable-length fields (strings).
Good luck!
Behind your simple question hides a complex theme. Please take a look at boost::serialization. Any time spent learning Boost is very rewarding.
There's a nice library in Boost called Boost.Serialization. If you're looking for performance, it is probably not the right choice, but looking at the source code and usage may give you some ideas. Serialization/deserialization can be tricky, especially when you have lots of nested components. JSON is a very nice format for serializing general objects.
Here's a hacky trick that will likely get coders here to rip out their hair and scream at me (this only works for static objects and not ones that use dynamic memory allocation):
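#include <cstdio>   // printf, fopen, fwrite, fread, fclose
#include <cstring>  // strcpy
class TestA
{
private:
long Numerics;
char StaticArray[11]; // 10 characters plus the terminating '\0'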
int Data[3];
public:
TestA(){Numerics = 10; strcpy(StaticArray,"Input data"); Data[0] = 100; Data[1] = 200; Data[2] = 300;}
void Test(){Numerics = 1000; strcpy(StaticArray,"Data input"); Data[0] = 300; Data[1] = 200; Data[2] = 100;}
void Print()
{
printf("Numerics is: %ld\nStaticArray is: %s\n%d %d %d\n",Numerics,StaticArray,Data[0],Data[1],Data[2]);
}
};
int main()
{
TestA Test;
FILE *File = fopen("File.txt","wb");
Test.Test();
Test.Print();
fwrite((char *)&Test,sizeof(Test),1,File); //Treats the object as if it is a char array
fclose(File);
TestA Test2;
File = fopen("File.txt","rb");
fread((char *)&Test2,sizeof(Test2),1,File); //Treats the object as if it is a char array
Test2.Print();
fclose(File);
return 0;
}
Which results in:
Numerics is: 1000
StaticArray is: Data input
300 200 100
Numerics is: 1000
StaticArray is: Data input
300 200 100
Opening the file reveals the written data:
è Data input w, È d
The above trick allows for easy conversion into a byte-based format. Naturally this is hacky, but classes (or objects) should be expected to supply their own object-to-char array conversion process.
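If you do go this way, a compile-time guard helps catch some misuse. The sketch below assumes C++11 and uses made-up names (Record, write_raw, read_raw). std::is_trivially_copyable rejects types with members such as std::string or std::vector, but note that it does not reject raw pointer members, so those still have to be avoided by hand.

```cpp
#include <cstdio>
#include <type_traits>

// Hypothetical plain-data record, similar in shape to TestA above.
struct Record {
    long numerics;
    char label[11];
    int data[3];
};

// Dumps the object's bytes to the file. Refuses to compile for types
// that are not trivially copyable (for example ones holding std::string).
template <typename T>
bool write_raw(std::FILE* file, const T& object) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte dumps only make sense for trivially copyable types");
    return std::fwrite(&object, sizeof(T), 1, file) == 1;
}

// Reads the bytes back. The file must have been written by the same
// program on the same platform, since padding and byte order are not recorded.
template <typename T>
bool read_raw(std::FILE* file, T& object) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte dumps only make sense for trivially copyable types");
    return std::fread(&object, sizeof(T), 1, file) == 1;
}
```

Both functions return false on a short read or write, which the original example does not check.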
I'm not good with the iostream library since I'm accustomed to stdio and the like, but I have a problem that I hoped iostream would solve, and now I find that it probably doesn't. I'm quite new to the standard C++ libraries, but quite comfortable with C++ OOP, classes and so on.
So I can't use code like
fprintf (stream, "...", C);
if C is of an aggregate type, because I can't create new format string options like %mytype. Also, I can't expect proper behavior from
fwrite/fread (&C, sizeof(C), 1, stream)
if C contains fields that are pointers, because fwrite/fread will save/load the value of the pointer itself, not the value stored in the memory it refers to:
class MyClass
{...
private:
{typename} Tp* Data;
} C;
I don't care much about the first limitation, because I can write a function that converts an object of each of my classes to a text string, but the second can't be solved easily. For example, I tried to create a function that saves each class to a binary file, but I ran into a lot of problems with stuff like the lack of partial specialization of a template and so on (no matter).
Being tired of introducing bugs and mistakes while rewriting standard code (like my own string and file holder classes), I hoped that learning the standard library (at last!), which is written by clever people and well tested :), would help me, since I have read a lot that the standard C++ library solves the first issue by using streams. I can overload operator << and operator >> and so on to be sure that my class will be saved to or read from a text file properly. But what about binary files, which are much, much more important for me?
What should I do if I want to save an object of a class like vector, for example, to a binary file? Using << and >> fails entirely, since the compiler says that vector has no overloaded operators << and >>, and even if it had them they would produce text data.
Stuff like
vector <MyClass> V;
...
ofstream file ("file.bin", ios::binary);
size_t size1 = V.size();
file.write((const char*)&size1, sizeof(size1));
file.write((const char*)&V[0], V.size() * sizeof(MyClass));
is not suitable (and doesn't differ much from using fwrite), since it saves the value (address) of a pointer field but not the data stored there (also, what if I declare a "two-dimensional" vector as vector<vector<MyClass> >??). So, if there was an overload of vector's operator << like
template <class T> vector
{public:
...
ostream operator << () const
{ostream s;
for (uint32_t k = 0; k < size(); k++)
s << this->operator[] (k);
return s;
}
private:
T* Data;
};
and if each T::operator << was overloaded in the same way (for MyClass, to provide a stream of the data stored in MyClass::Tp), it would be saved.
(I know, I know, there should be an iterator, but maybe I made a more serious mistake because of a total misunderstanding of streams? Anyway, I'm just talking about the idea.)
Well, that is a way to convert data to text, not to get binary data as it is stored in memory, but I know an interface could be written to work with binary data in the same way (maybe using function names rather than << and >>, but it can be done for sure)! The question is: has this been done in the standard C++ library or somewhere else (another open source library for C++)? Yes, yes, to properly write a vector to a file in one line. (I'll be very surprised if it is not included in standard C++, because how do people save the data they work with to files if they want to use multidimensional dynamic arrays?)
You're looking for the term "serialization", and you might want to use the Boost::Serialization library for that purpose.
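If a library is not an option, the usual hand-rolled approach is to write a length before each variable-sized part. Here is a sketch for a "two-dimensional" vector of doubles; write_rows, read_rows and the helper names are hypothetical, and the format uses the host's byte order and double representation, so it is only meant to be read back on the same kind of machine.

```cpp
#include <cstdint>
#include <iostream>
#include <sstream>
#include <vector>

// Writes a 64-bit count in host byte order.
void write_u64(std::ostream& os, std::uint64_t n) {
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
}

bool read_u64(std::istream& is, std::uint64_t& n) {
    return static_cast<bool>(is.read(reinterpret_cast<char*>(&n), sizeof n));
}

// Layout: row count, then for each row its length followed by its elements.
void write_rows(std::ostream& os, const std::vector<std::vector<double> >& rows) {
    write_u64(os, rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        write_u64(os, rows[i].size());
        if (!rows[i].empty()) {
            os.write(reinterpret_cast<const char*>(&rows[i][0]),
                     static_cast<std::streamsize>(rows[i].size() * sizeof(double)));
        }
    }
}

// Returns false on a truncated stream and leaves `rows` unchanged.
// Note: lengths are trusted as-is; real code should sanity-check them.
bool read_rows(std::istream& is, std::vector<std::vector<double> >& rows) {
    std::uint64_t count;
    if (!read_u64(is, count)) return false;
    std::vector<std::vector<double> > result;
    for (std::uint64_t i = 0; i < count; ++i) {
        std::uint64_t len;
        if (!read_u64(is, len)) return false;
        std::vector<double> row(static_cast<std::size_t>(len));
        if (len != 0 &&
            !is.read(reinterpret_cast<char*>(&row[0]),
                     static_cast<std::streamsize>(len * sizeof(double)))) {
            return false;
        }
        result.push_back(row);
    }
    rows.swap(result);
    return true;
}
```

With a std::ofstream opened with ios::binary, saving the whole structure is then the one-liner write_rows(file, V). The element type here is double; a class with pointer fields would need its own write function called per element instead of the bulk os.write.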
struct Vector
{
float x, y, z;
};
func(Vector *vectors) {...}
usage:
float *coords = load(file);
func(coords);
I have a question about the alignment of structures in C++. I will pass a set of points to the function func(). Is it OK to do it in the way shown above, or is this relying on platform-dependent behavior? (It works at least with my current compiler.) Can somebody recommend a good article on the topic?
Or, is it better to directly create a set of points while loading the data from the file?
Thanks
Structure alignment is implementation-dependent. However, most compilers give you a way of specifying that a structure should be "packed" (that is, arranged in memory with no padding bytes between fields). For example:
struct Vector {
float x;
float y;
float z;
} __attribute__((__packed__));
The above code will cause the gcc compiler to pack the structure in memory, making it easier to dump to a file and read back in later. The exact way to do this may be different for your compiler (details should be in your compiler's manual).
I always list members of packed structures on separate lines in order to be clear about the order in which they should appear. For most compilers this should be equivalent to float x, y, z; but I'm not certain if that is implementation-dependent behavior or not. To be safe, I would use one declaration per line.
If you are reading the data from a file, you need to validate the data before passing it to func. No amount of data alignment enforcement will make up for a lack of input validation.
Edit:
After further reading your code, I understand more what you are trying to do. You have a structure that contains three float values, and you are accessing it with a float* as if it were an array of floats. This is very bad practice. You don't know what kind of padding that your compiler might be using at the beginning or end of your structure. Even with a packed structure, it's not safe to treat the structure like an array. If an array is what you want, then use an array. The safest way is to read the data out of the file, store it into a new object of type struct Vector, and pass that to func. If func is defined to take a struct Vector* as an argument and your compiler is allowing you to pass a float* without griping, then this is indeed implementation-dependent behavior that you should not rely on.
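To make the "store it into a new object of type struct Vector" suggestion concrete, here is a small sketch. The helper name to_vectors is made up, and it assumes the loaded data is a flat x, y, z, x, y, z, ... sequence whose length you know.

```cpp
#include <cstddef>
#include <vector>

struct Vector {
    float x, y, z;
};

// Builds Vector objects from a flat array of coordinates instead of
// reinterpreting the float* as a Vector*. Leftover values that do not
// form a complete triple are ignored.
std::vector<Vector> to_vectors(const float* coords, std::size_t count) {
    std::vector<Vector> result;
    result.reserve(count / 3);
    for (std::size_t i = 0; i + 2 < count; i += 3) {
        Vector v = { coords[i], coords[i + 1], coords[i + 2] };
        result.push_back(v);
    }
    return result;
}
```

The copy costs a little, but the result no longer depends on how the compiler lays out the struct, and func can be handed &result[0] when the vector is not empty.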
Use an operator>> extraction overload.
std::istream& operator>>(std::istream& stream, Vector& vec) {
stream >> vec.x;
stream >> vec.y;
stream >> vec.z;
return stream;
}
Now you can do:
std::ifstream MyFile("My Filepath", std::ios::in); // pick whatever open mode you need
Vector vec;
MyFile >> vec;
func(&vec);
Prefer passing by reference to passing by pointer:
void func(Vector& vectors)
{ /*...*/ }
The difference here between a pointer and a reference is that a pointer can be NULL or point to some strange place in memory. A reference refers to an existing object.
As far as alignment goes, don't concern yourself. Compilers handle this automagically (at least alignment in memory).
If you are talking about alignment of binary data in a file, search for the term "serialization".
First of all, your example code is bad:
float *coords = load(file);
func(coords);
You're passing func() a pointer to a float variable instead of a pointer to a Vector object.
Secondly, Vector's total size is equal to (sizeof(float) * 3), or in other words 12 bytes.
I'd consult my compiler's manual to see how to control the struct's alignment, and just for peace of mind I'd set it to, say, 16 bytes.
That way I'll know that the file, if it contains one vector, is always exactly 16 bytes in size and I need to read only 16 bytes.
Edit:
Check MSVC9's align capabilities.
Binary data is not portable between machines.
About the only portable thing is text, and even that cannot be fully relied upon, as not all systems use the same text format (luckily most accept the 127 ASCII characters, and hopefully we will soon standardize on something like Unicode, he says with a smile).
If you want to write data to a file you must decide the exact format of the file. Then write code that will read the data from that format and convert it into your specific hardware's representation for that type. Now this format could be binary or it could be a serialized text format; it does not matter much for performance (as the disk IO speed will probably be your limiting factor). In terms of compactness the binary format will probably be more efficient. In terms of ease of writing decoding functions on each platform the text format is definitely easier, as a lot of it is already built into the streams.
So simple solution:
Read/Write to a serialized text format.
Also no alignment issues.
#include <algorithm>
#include <fstream>
#include <vector>
#include <iterator>
struct Vector
{
float x, y, z;
};
std::ostream& operator<<(std::ostream& stream, Vector const& data)
{
return stream << data.x << " " << data.y << " " << data.z << " ";
}
std::istream& operator>>(std::istream& stream, Vector& data)
{
return stream >> data.x >> data.y >> data.z;
}
int main()
{
// Copy an array to a file
Vector data[] = {{1.0,2.0,3.0}, {2.0,3.0,4.0}, { 3.0,4.0,5.0}};
std::ofstream file("plop");
std::copy(data, data+3, std::ostream_iterator<Vector>(file));
// Read data from a file.
std::vector<Vector> newData; // use a vector as we don't know how big the file is.
std::ifstream input("inputFile");
std::copy(std::istream_iterator<Vector>(input),
std::istream_iterator<Vector>(),
std::back_inserter(newData)
);
}
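If you do pick a binary format instead, "decide the exact format" mostly means fixing sizes and byte order yourself. As a sketch (the function names are made up, and little-endian is just one possible choice), a 32-bit unsigned value can be written byte by byte so the file looks the same whatever the host's byte order is:

```cpp
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Writes `value` as exactly four bytes, least significant byte first,
// regardless of how the host stores integers.
void write_u32_le(std::ostream& os, std::uint32_t value) {
    char bytes[4];
    for (int i = 0; i < 4; ++i) {
        bytes[i] = static_cast<char>((value >> (8 * i)) & 0xFF);
    }
    os.write(bytes, 4);
}

// Rebuilds the value from four little-endian bytes.
// Returns false if fewer than four bytes are available.
bool read_u32_le(std::istream& is, std::uint32_t& value) {
    char bytes[4];
    if (!is.read(bytes, 4)) return false;
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        result |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    }
    value = result;
    return true;
}
```

Floats need more care than this, since their bit layout is also platform-defined, which is one more reason the text format above is the easy route.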
I have an object with several text strings as members. I want to write this object to the file all at once, instead of writing each string to file. How can I do that?
You can overload operator>> and operator<< to read from and write to a stream.
Example Entry struct with some values:
struct Entry2
{
string original;
string currency;
Entry2() {}
Entry2(string& in);
Entry2(string& original, string& currency)
: original(original), currency(currency)
{}
};
istream& operator>>(istream& is, Entry2& en);
ostream& operator<<(ostream& os, const Entry2& en);
Implementation:
using namespace std;
istream& operator>>(istream& is, Entry2& en)
{
is >> en.original;
is >> en.currency;
return is;
}
ostream& operator<<(ostream& os, const Entry2& en)
{
os << en.original << " " << en.currency;
return os;
}
Then you open a file stream, and for each object you call:
ifstream in(filename.c_str());
Entry2 e;
in >> e;
//if you want to use read() instead (only valid for structs of plain data,
//not for std::string members like the ones in Entry2):
//in.read(reinterpret_cast<char*>(&e),sizeof(e));
in.close();
Or output:
Entry2 e;
// set values in e
ofstream out(filename.c_str());
out << e;
out.close();
Or, if you want to use the stream's read and write, just replace the relevant code in the operators' implementation.
When the variables are private inside your struct/class, you need to declare the operators as friend functions.
You can implement any format/separators that you like. When your strings include spaces, use getline() (the overload that takes a stream and a string) instead of >>, because operator>> uses whitespace as the delimiter by default. It depends on your separators.
It's called serialization. There are many serialization threads on SO.
There is also a nice serialization library included in Boost.
http://www.boost.org/doc/libs/1_42_0/libs/serialization/doc/index.html
basically you can do
myFile<<myObject
and
myFile>>myObject
with boost serialization.
If you have:
struct A {
char a[30], b[25], c[15];
int x;
};
then you can write it all just with write(fh, ptr, sizeof(struct A)).
Of course, this isn't portable (because we're not saving the endianness or the size of "int"), but that may not be an issue for you.
If you have:
struct A {
char *a, *b, *c;
int d;
};
then you're not looking to write the object; you're looking to serialize it. Your best bet is to look in the Boost libraries and use their serialization routines, because it's not an easy problem in languages without reflection.
There's not really a simple way, it's C++ after all, not PHP, or JavaScript.
http://www.parashift.com/c++-faq-lite/serialization.html
Boost also has some library for it: http://www.boost.org/doc/libs/release/libs/serialization ... like Tronic already mentioned :)
The better method is to write each field individually along with the string length.
As an alternative, you can create a char array (or std::vector<char>) and write all the members into the buffer, then write the buffer to the output.
The underlying thorn is that a compiler is allowed to insert padding between members in a class or structure. Using memcpy or std::copy on the whole object will result in padding bytes being written to the output.
Just remember that you need to either write the string lengths and the content or the content followed by some terminating character.
Other people will suggest checking out the Boost Serialization library.
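A sketch of the buffer approach with length-prefixed strings, assuming an Entry type with two string members like the one in the other answer. All the function names are hypothetical, and the 4-byte lengths are stored in host byte order, so this is not a portable file format.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Assumed shape of the object: just two text members.
struct Entry {
    std::string original;
    std::string currency;
};

// Appends a 4-byte length followed by the characters themselves.
void append_string(std::string& buffer, const std::string& s) {
    std::uint32_t len = static_cast<std::uint32_t>(s.size());
    buffer.append(reinterpret_cast<const char*>(&len), sizeof len);
    buffer.append(s);
}

// Extracts one length-prefixed string starting at `pos` and advances `pos`.
// Returns false if the buffer is too short.
bool extract_string(const std::string& buffer, std::size_t& pos, std::string& s) {
    std::uint32_t len;
    if (pos > buffer.size() || buffer.size() - pos < sizeof len) return false;
    std::memcpy(&len, buffer.data() + pos, sizeof len);
    if (buffer.size() - pos - sizeof len < len) return false;
    s.assign(buffer, pos + sizeof len, len);
    pos += sizeof len + len;
    return true;
}

// Packs every member into one contiguous block.
std::string pack(const Entry& e) {
    std::string buffer;
    append_string(buffer, e.original);
    append_string(buffer, e.currency);
    return buffer;
}

// Reverses pack(); returns false on a truncated buffer.
bool unpack(const std::string& buffer, Entry& e) {
    std::size_t pos = 0;
    Entry tmp;
    if (!extract_string(buffer, pos, tmp.original)) return false;
    if (!extract_string(buffer, pos, tmp.currency)) return false;
    e = tmp;
    return true;
}
```

The packed block can then go out in a single call, for example out.write(buffer.data(), buffer.size()) on a stream opened in binary mode. Because only the members are copied, no padding bytes end up in the output.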
Unfortunately that is generally not quite possible. If your struct only contains plain data (no pointers or complex objects), you can store it as one chunk, but care must be taken if portability is an issue. Padding, data type size and endianness issues make this problematic.
You can use Boost.Serialization to minimize the amount of code required for proper portable and versionable serialization.
Assuming your goal is as stated, to write out the object with a single call to write() or fwrite() or whatever, you'd first need to copy the string and other object data into a single contiguous block of memory. Then you could write() that block of memory out with a single call. Or you might be able to do a vector-write by calling writev(), if that call is available on your platform.
That said, you probably won't gain much by reducing the number of write calls. Especially if you are using fwrite() or similar already, then the C library is already doing buffering for you, so the cost of multiple small calls is minimal anyway. Don't put yourself through a lot of extra pain and code complexity unless it will actually do some good...