Overloading `istream operator>>` vs using `sscanf` - c++

Say I wanted to initialise a std::vector of objects e.g.
class Person { int ID; string name; ...}
from a file that contains a line for each object. One route is to overload operator>> and simply do std::cin >> temp_person. Another, which I used to favour, is to sscanf("%...", &...) into a bunch of temporary primitive types and simply .emplace_back(Person(temp_primitives...)).
Which way achieves the quickest runtime ignoring memory footprint? Is there any point in mmap()ing the entire file?

Since you are reading from a file, the performance is going to be I/O-bound. Almost no matter what you do in memory, the effect on the overall performance is not going to be detectable.
I would prefer the operator>> route, because this would let me use the input iterator idiom of C++:
std::istream_iterator<Person> eos;
std::istream_iterator<Person> iit(inputFile);
std::copy(iit, eos, std::back_inserter(person_vector));
or even
std::vector<Person> person_vector(
(std::istream_iterator<Person>(inputFile)) // extra parentheses avoid the "most vexing parse"
, std::istream_iterator<Person>()
);
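For reference, here is a minimal sketch of what the operator>> route could look like end to end. The Person fields, the whitespace-separated "ID name" line format, and the load_people helper are assumptions made for illustration; the question does not show the real layout.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the field names and the "ID name" line format
// are assumptions, since the question elides the real layout.
struct Person {
    int id = 0;
    std::string name;
};

// Reads one whitespace-separated "ID name" record. On failure the stream's
// failbit is set by the underlying extractions, which ends the iteration.
std::istream& operator>>(std::istream& in, Person& p) {
    Person tmp;
    if (in >> tmp.id >> tmp.name) {
        p = tmp;  // only overwrite the target after a complete record was read
    }
    return in;
}

// Hypothetical helper: builds the vector with the iterator-pair constructor.
std::vector<Person> load_people(std::istream& in) {
    return std::vector<Person>(std::istream_iterator<Person>(in),
                               std::istream_iterator<Person>());
}
```

Because load_people takes a std::istream&, the same code works with a std::ifstream, std::cin, or a std::istringstream in a test.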


How to share global constants with minimum overhead at runtime?

I am using C++11. I am not allowed to use external libraries like boost etc. I must use STL only.
I have a number of events, which must be identified as string constants. I am not allowed to use enums or ints or any other data type. For example:
"event_name1"
"event_name2"
"some_other_event_name3"
"a_different_event_name12"
Then I have some classes which need to use these strings, but don't know the other classes exist (they don't have anything to do with each other).
class Panel{
void postEvent(){
SomeSingleton::postEvent("event_name");
}
};
Another class:
class SomeClass{
SomeClass(){
SomeSingleton::listenForEvent("event_name");
}
void receiveEvent(){
//This function is triggered when "event_name" occurs.
//Do stuff
}
};
All these events are constants, and are used to identify things that are happening.
Here is what I have tried:
How to store string constants that will be accessed by a number of different classes?
Some of the persons there suggested I provide specific details of how to solve a concrete problem, so I have created this new question.
How can I store the strings in a common file, so that all the other classes that use these strings can refer to the same file?
I do not want to waste memory or leak memory during my app's lifetime (it is a mobile app)
compilation times are not a big deal to me, since the project isn't so big
there are expected to be maybe 50 different events.
It seems it would be more maintainable to keep all the strings in one file, and edit only this file as and when things change.
Any class can listen for any event, at any time, and I won't know prior to compilation
The easiest way would be to use a char const* constant, as it's far easier for the compiler to optimize and doesn't use dynamic allocation.
Also you can use std::string_view in the postEvent function, avoiding dynamic allocations. This step is optional. If you cannot have string views and still want to avoid dynamic allocations, then refer to your implementation's SSO max capacity and keep event names below that size.
Also consider that nonstd::string_view can be shipped as a C++11 library and is most likely the abstraction you need. Libraries such as cpp17_headers and string-view-lite exist solely for that purpose.
It looks like this:
constexpr auto event_name1 = "event_name1";
In a class as a static member it works the same way:
struct Type {
static constexpr auto event_name1 = "event_name1";
};
This will at most take space in the read-only static data of your executable.
In light of the fact that you're stuck with C++11, I think my suggestion from here still stands:
#ifndef INCLUDED_EVENT_NAMES
#define INCLUDED_EVENT_NAMES
#pragma once
namespace event_names
{
constexpr auto& event_1 = "event_1";
constexpr auto& event_2 = "event_2";
}
#endif
Defining named references to string literal objects is very simple, does not require any additional libraries, is guaranteed to not introduce any unnecessary objects, won't require any additional memory over the storage for the statically-allocated string literal objects that you'd need anyways, and will not have any runtime overhead.
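As a rough sketch of how such named references behave in use: the event names, literal_length, and is_panel_opened below are hypothetical, chosen only to show that the length is part of the type and that a std::string compares against the name directly.

```cpp
#include <cstddef>
#include <string>

namespace event_names {
// Each name is a reference to the string literal's array, so the length
// is part of the type and no separate object is created.
constexpr auto& panel_opened = "panel_opened";  // hypothetical event name
constexpr auto& panel_closed = "panel_closed";  // hypothetical event name
}

// Length known at compile time: array size minus the terminating '\0'.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
    return N - 1;
}

static_assert(literal_length(event_names::panel_opened) == 12,
              "length is computed from the array type");

// Hypothetical listener check: std::string compares against a C string
// directly, without constructing a temporary std::string.
bool is_panel_opened(const std::string& received) {
    return received == event_names::panel_opened;
}
```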
If you could use C++17, I'd suggest to go with the std::string_view approach, but in C++11, I think the above is most-likely a good compromise for your application.
A global const std::string has one drawback: it needs processing during startup and creates a copy of the string literal.
The linked SO answer uses constexpr std::string_view, which is a cool solution: the constructor is constexpr, so nothing has to be done on startup, and it doesn't create any copy. The problem is that this is C++17.
Using const char[] (or auto, or constexpr) is the old, proven solution. You can compare a std::string with it without any extra overhead.
You can create a header file for all those strings and let the linker remove the duplicates. It worked like that in old C++.
You can have a struct of static strings:
struct MyNames
{
static const std::string name1;
};
And in a cpp:
const std::string MyNames::name1 = "foo";
You can then access the names from all your required locations. In C++17, you would have used string_view instead to avoid object construction. But this seems to be a duplicate of this answer, basically: https://stackoverflow.com/a/55493109/2266772
For the sake of proper abstraction and good design, you should define an event class. This event class will have either:
A method which provide a string (e.g. name() or system_name())
A conversion operator to a string (not recommended)
A to_string() freestanding function which takes such an event (not recommended)
But beyond that - all of your classes can now use an enum, or an index, or whatever they like - they'll just need to use the conversion method whenever they interact with whatever it is that requires strings. Thus none of your classes has to actually know about those strings itself.
The strings themselves can stay within the .cpp implementation file of the class, and nobody else has to know about them. (Unless they are actually defined in code that's not yours, but that's not how you described the problem.)
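A minimal sketch of such an event class, assuming a name() method is the chosen conversion. The Event type, its Kind values, and the returned names are all hypothetical; the point is only that callers work with typed values and the strings appear in one place.

```cpp
#include <string>

// Hypothetical event type: callers use typed values, and only the code that
// talks to the string-based API calls name().
class Event {
public:
    enum Kind { PanelOpened, PanelClosed };  // illustrative kinds only

    explicit Event(Kind kind) : kind_(kind) {}

    // In a real project this definition would live in the .cpp file, keeping
    // the literals out of every header.
    const char* name() const {
        switch (kind_) {
            case PanelOpened: return "panel_opened";
            case PanelClosed: return "panel_closed";
        }
        return "unknown";
    }

    bool operator==(const Event& other) const { return kind_ == other.kind_; }

private:
    Kind kind_;
};
```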

is it a good thing to use iterators to read on a formatted stream?

I have written a class that acts like an iterator to parse CSV formatted files.
I have also written other classes to read specific csv files to fill directly a MyObject structure. Thus the class can be used like that (I removed the error handling part of the code):
std::ifstream in(filename);
MyObjectParser parser(in);
MyObjectParser::Iterator it;
for (it = parser.begin(); it != parser.end(); it++)
{
MyObject b = *it;
// do some stuff here ...
}
The program works well and I'm happy with it but I realized that the implicit meaning (only for myself?) of an iterator is that it will iterate over a collection. In this case there is no collection but a stream.
Should I prefer a form that explicitly suggests I'm using a stream, by overloading the >> operator
and thus having something like this:
std::ifstream in(filename);
MyObjectReader reader(in);
MyObject obj;
while(reader >> obj)
{
// do the same "some stuff" here...
}
Is it only a matter of taste?
I don't clearly see what the differences are (except that in the second form the object is just filled and not copied), or what the consequences are of choosing the first or the second form.
I would be happy to get some other opinions in order to know exactly why I'm using one solution rather than another.
You can treat a stream as a collection if you want.
I'd note, however, that by overloading operator>>, you can have both -- you can explicitly read data from the stream using operator>> directly, or you can treat the stream as a collection by using std::istream_iterator<whatever> to treat it as a collection.
That being the case, it seems to me that overloading operator>> is the obvious choice, since then you can treat things either way with essentially no extra work. In addition, using std::istream_iterator<x> is a fairly recognizable idiom, since it's included in the standard library.
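To make the "both ways" point concrete, here is a small sketch in which a single operator>> overload serves an explicit loop and an iterator-based algorithm. MyObject's fields and the two sum functions are assumptions for illustration; real CSV parsing would also need separator and quoting logic.

```cpp
#include <istream>
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>

// Hypothetical record; a real CSV reader would also handle quoting and separators.
struct MyObject {
    std::string key;
    int value = 0;
};

std::istream& operator>>(std::istream& in, MyObject& obj) {
    return in >> obj.key >> obj.value;
}

// Form 1: explicit extraction loop.
int sum_with_loop(std::istream& in) {
    int total = 0;
    MyObject obj;
    while (in >> obj) {
        total += obj.value;
    }
    return total;
}

// Form 2: the same overload, viewed as a sequence through istream_iterator.
int sum_with_iterators(std::istream& in) {
    return std::accumulate(
        std::istream_iterator<MyObject>(in), std::istream_iterator<MyObject>(), 0,
        [](int total, const MyObject& obj) { return total + obj.value; });
}
```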
The concept of iteration is not dependent on that of containers.
Iterators iterate over a sequence of values. Different iterator
designs define the sequence in different ways, but there is
always the ideas of current value, advance and reaching the end.
About the only problem with input iterators is that they only
terminate at the end of file; you cannot say, for example, that
the next 10 lines contain doubles, and then we go on to
something else. (But of course, you can insert a filtering
streambuf in the stream to detect the end.)

How Best to Keep Function From Closing File?

So, I've been trying to be more rigorous with making any passed parameters that shouldn't be touched by a function const.
One situation I've encountered in some of my C++ code is the case where the object may change, but where I want to "lock out" functions from accessing certain key functionality of the object. For example, for an std::ifstream file handle, I may wish to prevent the function from closing the file.
If I pass it as a const &, the const part keeps me from performing standard file i/o, it seems.
e.g. I want something along the lines of
void GetTags(Arr<std::string> & tags, std::ifstream const& fileHandle)
...but written in such a way to allow file i/o but not open/close operations.
Is there any good/reliable way to do this in C++? What would be considered best practice?
This has already been done for you by the standard library design: Pass a reference to the base class std::istream instead, which does not have a notion of opening or closing - it exposes only the stream interface.
void stream_me(std::istream & is);
std::ifstream is("myfile.txt");
stream_me(is);
In your place I'd just pass a std::istream instead.
You could wrap the ifstream in an object that only exposed the functionality that you wished the caller to be able to use.
However, if you have a bunch of different functions, each with a different subset of ifstream's functionality, you'll end up with lots of different wrapper classes; so I don't see this as a general solution.
I think the best way would be to wrap the ifstream in a new class which only has member functions corresponding to the functionality you want GetTags to have access to. Then pass that, not the ifstream, as the second argument to GetTags.

Why 'ifstream' and 'ofstream' are added to "std", while 'fstream' can serve both the purposes?

Using std::fstream, one can do the work of both ifstream and ofstream. The only difference is that with fstream we need to pass in, out or app as a mode parameter, which the other two do not always require.
Is there anything special about ifstream and ofstream which cannot be accomplished with fstream, or are they just a coding convenience?
It's a bit like asking why we'd want const when you can read and write from variables anyway. It allows compile-time checking, an invaluable feature for reducing bugs. It's also more self-documenting, as when looking at a declaration without the constructor call you can see whether it's an input, output or both: the parameters you mention can often only be seen in the implementation file which may not be to hand. Also, each type of stream may have a few differences in the data members they need - potentially using the minimally-functional class matching your actual needs could save memory, time initialising or checking those other variables etc..
If anything, fstream is the one that's just a convenience. In particular, what you have is basically:
namespace std {
class istream { /* ... */ };
class ostream { /* ... */ };
class iostream : public istream, public ostream { /* ... */ };
class ifstream : public istream { /* ... */ };
class ofstream : public ostream { /* ... */ };
class fstream : public iostream { /* ... */ };
}
[obviously skipping over a lot of irrelevant details, including that these names are really typedefs for class templates such as basic_fstream<char>].
In short, fstream provides all of the input capabilities of an istream and all of the output capabilities of an ostream by deriving from iostream, which derives from both. ifstream and ofstream are the single-direction siblings built on the same bases, so an fstream is neither an ifstream nor an ofstream.
The whole point is to be generic. If you only need to read, you can take a std::istream& as the parameter, and then anything which supports reading can be passed in, even if it isn't writeable. And vice versa with std::ostream&.
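The relationships between the standard stream classes can be checked at compile time. The sketch below uses only standard type traits; first_word is a hypothetical reader added to show that a function taking std::istream& accepts any input-capable stream.

```cpp
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// std::fstream is a std::iostream, which is both a std::istream and a std::ostream.
static_assert(std::is_base_of<std::istream, std::fstream>::value, "fstream is an istream");
static_assert(std::is_base_of<std::ostream, std::fstream>::value, "fstream is an ostream");
static_assert(std::is_base_of<std::istream, std::ifstream>::value, "ifstream is an istream");
// It is a sibling of std::ifstream, not a class derived from it.
static_assert(!std::is_base_of<std::ifstream, std::fstream>::value, "fstream is not an ifstream");

// Hypothetical reader: accepts any input-capable stream.
std::string first_word(std::istream& in) {
    std::string word;
    in >> word;
    return word;
}
```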

I need some C++ guru's opinions on extending std::string

I've always wanted a bit more functionality in STL's string. Since subclassing STL types is a no no, mostly I've seen the recommended method of extension of these classes is just to write functions (not member functions) that take the type as the first argument.
I've never been thrilled with this solution. For one, it's not necessarily obvious where all such methods are in the code, for another, I just don't like the syntax. I want to use . when I call methods!
A while ago I came up with the following:
class StringBox
{
public:
StringBox( std::string& storage ) :
_storage( storage )
{
}
// Methods I wish std::string had...
void Format();
void Split();
double ToDouble();
void Join(); // etc...
private:
StringBox();
std::string& _storage;
};
Note that StringBox requires a reference to a std::string for construction... This puts some interesting limits on its use (and I hope, means it doesn't contribute to the string class proliferation problem)... In my own code, I'm almost always just declaring it on the stack in a method, just to modify a std::string.
A use example might look like this:
string OperateOnString( float num, string a, string b )
{
string nameS;
StringBox name( nameS );
name.Format( "%f-%s-%s", num, a.c_str(), b.c_str() );
return nameS;
}
My question is: What do the C++ gurus of the StackOverflow community think of this method of STL extension?
I've never been thrilled with this solution. For one, it's not necessarily obvious where all such methods are in the code, for another, I just don't like the syntax. I want to use . when I call methods!
And I want to use $!---& when I call methods! Deal with it. If you're going to write C++ code, stick to C++ conventions. And a very important C++ convention is to prefer non-member functions when possible.
There is a reason C++ gurus recommend this:
It improves encapsulation, extensibility and reuse. (std::sort can work with all iterator pairs because it isn't a member of any single iterator or container class. And no matter how you extend std::string, you can not break it, as long as you stick to non-member functions. And even if you don't have access to, or aren't allowed to modify, the source code for a class, you can still extend it by defining nonmember functions)
Personally, I can't see the point in your code. Isn't this a lot simpler, more readable and shorter?
string OperateOnString( float num, string a, string b )
{
string nameS;
Format(nameS, "%f-%s-%s", num, a.c_str(), b.c_str() );
return nameS;
}
// or even better, if `Format` is made to return the string it creates, instead of taking it as a parameter
string OperateOnString( float num, string a, string b )
{
return Format("%f-%s-%s", num, a.c_str(), b.c_str() );
}
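For completeness, one way such a returning Format could be written in C++11 is sketched below, using two calls to std::snprintf. This is an assumption about what Format might look like, not the asker's actual function, and it is only as type-safe as printf itself.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical Format helper: returns the formatted text instead of writing
// into a caller-supplied string. The arguments must match the format
// specifiers, exactly as with printf.
template <typename... Args>
std::string Format(const char* fmt, Args... args) {
    // First pass: ask snprintf how many characters are needed.
    const int needed = std::snprintf(nullptr, 0, fmt, args...);
    if (needed <= 0) {
        return std::string();
    }
    // Second pass: format into a buffer with room for the terminating '\0'.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::snprintf(buffer.data(), buffer.size(), fmt, args...);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```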
When in Rome, do as the Romans, as the saying goes. Especially when the Romans have good reasons to do as they do. And especially when your own way of doing it doesn't actually have a single advantage. It is more error-prone, confusing to people reading your code, non-idiomatic and it is just more lines of code to do the same thing.
As for your problem that it's hard to find the non-member functions that extend string, place them in a namespace if that's a concern. That's what they're for. Create a namespace StringUtil or something, and put them there.
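A minimal sketch of what that namespace might contain. The Split and Join signatures here are assumptions, since the question only lists the method names it wishes std::string had.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace StringUtil {

// Splits `text` on every occurrence of `sep`; empty fields are kept.
std::vector<std::string> Split(const std::string& text, char sep) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = text.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}

// Joins `parts` with `sep` between consecutive elements.
std::string Join(const std::vector<std::string>& parts, const std::string& sep) {
    std::string result;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) {
            result += sep;
        }
        result += parts[i];
    }
    return result;
}

}  // namespace StringUtil
```

Callers then write StringUtil::Split(line, ','), and anyone looking for the string helpers has one namespace to search.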
As most of us "gurus" seem to favour the use of free functions, probably contained in a namespace, I think it safe to say that your solution will not be popular. I'm afraid I can't see one single advantage it has, and the fact that the class contains a reference is an invitation to that becoming a dangling reference.
I'll add a little something that hasn't already been posted. The Boost String Algorithms library has taken the free template function approach, and the string algorithms they provide are spectacularly re-usable for anything that looks like a string: std::string, char*, std::vector, iterator pairs... you name it! And they put them all neatly in the boost::algorithm namespace (I often use namespace algo = boost::algorithm to make string manipulation code more terse).
So consider using free template functions for your string extensions, and look at Boost String Algorithms on how to make them "universal".
For safe printf-style formatting, check out Boost.Format. It can output to strings and streams.
I too wanted everything to be a member function, but I'm now starting to see the light. UML and doxygen are always pressuring me to put functions inside of classes, because I was brainwashed by the idea that C++ API == class hierarchy.
If the scope of the string isn't the same as the StringBox you can get segfaults:
StringBox foo() {
string s("abc");
return StringBox(s);
}
At least prevent object copying by declaring the assignment operator and copy ctor private:
class StringBox {
//...
private:
void operator=(const StringBox&);
StringBox(const StringBox&);
};
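With a C++11 compiler the same restriction can be spelled with deleted functions. The sketch below is an assumption about how that might look; Append is a hypothetical member standing in for the methods from the question.

```cpp
#include <string>
#include <type_traits>

// Sketch of the question's StringBox with copying disabled via C++11 `= delete`.
class StringBox {
public:
    explicit StringBox(std::string& storage) : storage_(storage) {}

    StringBox(const StringBox&) = delete;
    StringBox& operator=(const StringBox&) = delete;

    // Hypothetical member, standing in for the methods the question wants.
    void Append(const std::string& text) { storage_ += text; }

private:
    std::string& storage_;  // not owned; the string must outlive the box
};

static_assert(!std::is_copy_constructible<StringBox>::value, "cannot be copied");
static_assert(!std::is_copy_assignable<StringBox>::value, "cannot be assigned");
```

In C++11 and C++14 this also rejects the return StringBox(s); in foo above, because returning by value needs an accessible copy or move constructor. C++17's guaranteed copy elision lifts that requirement, so the dangling reference would compile again there.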
EDIT: regarding API, in order to prevent surprises I would make the StringBox own its copy of the string. I can think of 2 ways to do this:
Copy the string to a member (not a reference), get the result later - also as a copy
Access your string through a reference-counting smart pointer like std::tr1::shared_ptr or boost::shared_ptr, to prevent extra copying
The problem with loose functions is that they're loose functions.
I would bet money that most of you have created a function that was already provided by the STL because you simply didn't know the STL function existed, or that it could do what you were trying to accomplish.
It's a fairly punishing design, especially for new users. (The STL gets new additions too, further adding to the problem.)
Google: C++ to string
How many results mention: std::to_string
I'm just as likely to find some ancient C method, or some homemade version, as I am to find the STL version of any given function.
I much prefer member methods because you don't have to struggle to find them, and you don't need to worry about finding old deprecated versions, etc,. (ie, string.SomeMethod, is pretty much guaranteed to be the method you should be using, and it gives you something concrete to Google for.)
C# style extension methods would be a good solution.
They're loose functions.
They show up as member functions via intellisense.
This should allow everyone to do exactly what they want.
It seems like it could be accomplished in the IDE itself, rather than requiring any language changes.
Basically, if the interpreter hits some call to a member that doesn't exist, it can check headers for matching loose functions, and dynamically fix it up before passing it on to the compiler.
Something similar could be done when it's loading up the intellisense data.
I have no idea how this could be worked for existing functions, no massive change like this should be taken lightly, but, for new functions using a new syntax, it shouldn't be a problem.
namespace StringExt
{
std::string MyFunc(this std::string source);
}
That can be used by itself, or as a member of std::string, and the IDE can handle all the grunt work.
Of course, this still leaves the problem of methods being spread out over various headers, which could be solved in various ways.
Some sort of extension header: string_ext which could include common methods.
Hmm....
That's a tougher issue to solve without causing issues...
If you want to extend the methods available to act on string, I would extend it by creating a class that has static methods that take the standard string as a parameter.
That way, people are free to use your utilities, but don't need to change the signatures of their functions to take a new class.
This breaks the object-oriented model a little, but makes the code much more robust - i.e. if you change your string class, then it doesn't have as much impact on other code.
Follow the recommended guidelines, they are there for a reason :)
The best way is to use templated free functions. The next best is private inheritance (struct extended_str : private string), which happens to get easier in C++0x, by the way, as you can inherit constructors with a using-declaration. But private inheritance is too much trouble and too risky just to add some algorithms, and what you are doing is too risky for anything.
You've just introduced a nontrivial data structure to accomplish a change in code punctuation. You have to manually create and destroy a Box for each string, and you still need to distinguish your methods from the native ones. You will quickly get tired of this convention.