How do professional C++ programmers implement common abstractions?

I've never programmed in C++ professionally; I work with (Visual) C++ as a student. I'm having difficulty dealing with the lack of abstractions, especially in the STL container classes. For example, the vector class doesn't have a simple remove method, which is common in many libraries, e.g. the .NET Framework. I know there's an erase method, but it isn't abstract enough to reduce the operation to a one-line method call. For example, if I have a
std::vector<std::string>
I don't know how else to remove a string element from the vector without iterating through it and searching for a matching string element.
bool remove(vector<string> & msgs, string toRemove) {
if (msgs.size() > 0) {
vector<string>::iterator it = msgs.end() - 1;
while (it >= msgs.begin()) {
string remove = it->data();
if (remove == toRemove) {
//std::cout << "removing '" << it->data() << "'\n";
msgs.erase(it);
return true;
}
it--;
}
}
return false;
}
What do professional C++ programmers do in this situation? Do you write out the implementation every time? Do you create your own container class or your own library of helper functions, or do you suggest using another library, e.g. Boost (even if you program for Windows in Visual Studio)? Or something else?
(if the above remove operation needs work, please leave an alternative method of doing this, thanks.)

You would use the "remove and erase idiom":
v.erase(std::remove(v.begin(), v.end(), mystring), v.end());
The point is that vector is a sequence container and not geared towards manipulation by value. Depending on your design needs, a different standard library container may be more appropriate.
Note that the remove algorithm merely reorders elements of the range, it does not erase anything from the container. That's because iterators don't carry information about their container with them, and this is fully intentional: By separating iterators from their containers, one can write generic algorithms that work on any sensible container.
Idiomatic modern C++ would try to follow that pattern whenever applicable: Expose your data through iterators and use generic algorithms to manipulate it.
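To get the one-line call the question asks for, the idiom can be wrapped once in a small helper. The following is a minimal sketch, assuming a C++11 compiler; remove_all and remove_all_example are hypothetical names, not part of the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: erases every element equal to `value`
// and reports how many elements were removed.
template <typename T>
std::size_t remove_all(std::vector<T>& v, const T& value)
{
    const std::size_t before = v.size();
    // std::remove shifts the kept elements to the front and returns the
    // new logical end; erase then drops the leftover tail.
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
    return before - v.size();
}

// Small usage check.
inline void remove_all_example()
{
    std::vector<std::string> msgs = {"a", "b", "a"};
    assert(remove_all(msgs, std::string("a")) == 2);
    assert(msgs.size() == 1 && msgs[0] == "b");
}
```

Unlike the function in the question, this removes all matches rather than only the last one.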

Have you considered std::remove_if?
http://www.cplusplus.com/reference/algorithm/remove_if/
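As a rough illustration of how std::remove_if fits in, here is a sketch that drops every element matching a predicate. The predicate is_empty and the function drop_empty are made-up names for this example, and a C++11 compiler is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical predicate: true for strings that should be dropped.
inline bool is_empty(const std::string& s) { return s.empty(); }

// Erases every element for which the predicate holds.
inline void drop_empty(std::vector<std::string>& msgs)
{
    // remove_if moves the kept elements forward; erase trims the tail.
    msgs.erase(std::remove_if(msgs.begin(), msgs.end(), is_empty), msgs.end());
}

// Small usage check.
inline void drop_empty_example()
{
    std::vector<std::string> msgs = {"hello", "", "world", ""};
    drop_empty(msgs);
    assert(msgs.size() == 2);
    assert(msgs[0] == "hello" && msgs[1] == "world");
}
```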

IMO, professionally, it is perfectly logical to write a custom implementation for custom tasks, especially if the standard library doesn't provide one. This is much better than writing (read: copy-pasting) the same code again and again. You can also take advantage of inline functions, template functions, and macros to keep the same code in one place. This reduces the bugs that creep in when the same code is reused (pasting can easily go wrong), and it makes it possible to fix a bug in one place.
Templates and macros, if properly designed, are very useful - they aren't code bloat.
Edit: Your code needs improvement:
bool remove(vector<string>& msgs, const string& toRemove);
To iterate over a collection, a for loop is sufficient. There is no need to check the size, take the last iterator, compare with begin, get the data, and so on.
There is no need to waste a string - just compare it, and remove.
For your problem, I believe a map or set would fit much better.
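If the messages are unique and their order does not matter (both are assumptions about the asker's use case), the set suggestion might look like this minimal sketch. remove_msg is a hypothetical name, and a C++11 compiler is assumed.

```cpp
#include <cassert>
#include <set>
#include <string>

// Removes `toRemove` if present; returns true when something was erased.
inline bool remove_msg(std::set<std::string>& msgs, const std::string& toRemove)
{
    // set::erase(key) returns the number of elements removed (0 or 1 here).
    return msgs.erase(toRemove) > 0;
}

// Small usage check.
inline void remove_msg_example()
{
    std::set<std::string> msgs = {"alpha", "beta"};
    assert(remove_msg(msgs, "alpha"));
    assert(!remove_msg(msgs, "alpha"));  // already gone
    assert(msgs.size() == 1);
}
```

Note that a std::set keeps its elements sorted and discards duplicates, so it is not a drop-in replacement for a vector.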


In C++, should iterable types be non-polymorphic?

A bit of background:
I am currently working on an assignment from my OOP course which consists in designing and implementing a phone book manager around various design patterns.
In my project there are 3 classes around which all the action happens:
PhoneBook;
Contact (the types stored in the phone book);
ContactField (fields stored in the Contact).
ContactManager must provide a way to iterate over its contacts in 2 modes: unfiltered and filtered based on a predicate; Contact must provide a way to iterate over its fields.
How I initially decided to implement:
All design patterns books I came across recommend coding to an interface so my first thought was to extract an interface from each of the above classes and then make them implement it.
Now I also have to create some kind of polymorphic iterator for things to be smooth so I adapted the Java iterator interface to write forward iterators.
The problems:
The major setback with this design is that I lose interoperability
with stl <algorithm> and the syntactic sugar offered by range
based for loops.
Another issue I came across is the Iterator<T>::remove() function. If
I want an iterator that can alter the sequence it iterates over
(remove elements) then all is fine however if I don't want
that behavior I'm not exactly sure what to do.
I see that in Java one can throw UnsupportedOperationException
which isn't that bad since (correct me if I'm wrong) if an
exception isn't handled then the application is terminated and a
stack trace is shown. In C++ you don't really have that luxury
(unless you run with a debugger attached I think) and to be honest
I'd rather prefer to catch such errors at compile time.
The easiest way out of this mess (that I see) is to avoid using interfaces on the iterable types in order to accommodate my own STL-compatible iterators. This will increase coupling; however, I'm not sure it will actually have any impact in the long run (not in the sense that this project will soon become throwaway code, of course). My guess is that it won't, but I'd like to hear the elders' opinion as well before I proceed with my design.
I would probably take a slightly different approach.
Firstly, iteration over a contact is pretty simple since it's a single type of iteration and you can just provide begin and end methods to allow iteration over the underlying fields.
For the iteration over a PhoneBook I would still just provide a normal begin and end, and then provide a for_each_if function that you use to iterate over only the contacts that are interesting, instead of trying to provide a super-custom iterator that skips over un-interesting elements.
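The standard library has no for_each_if, so it would have to be written by hand. A minimal sketch, assuming a C++11 compiler, could look like the following; the Contact struct and favourite_names function are hypothetical stand-ins for the asker's classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Applies `action` to every element in [first, last) that satisfies `pred`.
template <typename InputIt, typename Pred, typename Action>
Action for_each_if(InputIt first, InputIt last, Pred pred, Action action)
{
    for (; first != last; ++first)
        if (pred(*first))
            action(*first);
    return action;
}

// Hypothetical contact type, for illustration only.
struct Contact { std::string name; bool favourite; };

// Collects the names of all contacts flagged as favourites.
inline std::vector<std::string> favourite_names(const std::vector<Contact>& book)
{
    std::vector<std::string> names;
    for_each_if(book.begin(), book.end(),
                [](const Contact& c) { return c.favourite; },
                [&names](const Contact& c) { names.push_back(c.name); });
    assert(names.size() <= book.size());
    return names;
}
```

Because the filter is an ordinary predicate argument, the container keeps plain begin and end and stays usable with the standard algorithms.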

C++ Returning Multiple Items

I am designing a class in C++ that extracts URLs from an HTML page. I am using Boost's Regex library to do the heavy lifting for me. I started designing a class and realized that I didn't want to tie down how the URLs are stored. One option would be to accept a std::vector<Url> by reference and just call push_back on it. I'd like to avoid forcing consumers of my class to use std::vector. So, I created a member template that took a destination iterator. It looks like this:
template <typename TForwardIterator, typename TOutputIterator>
TOutputIterator UrlExtractor::get_urls(
TForwardIterator begin,
TForwardIterator end,
TOutputIterator dest);
I feel like I am overcomplicating things. I like to write fairly generic code in C++, and I struggle to lock down my interfaces. But then I get into these predicaments where I am trying to templatize everything. At this point, someone reading the code doesn't realize that TForwardIterator is iterating over a std::string.
In my particular situation, I am wondering if being this generic is a good thing. At what point do you start making code more explicit? Is there a standard approach to getting values out of a function generically?
Yes, it's not only fine but a very nice design. Templating that way is how most of the standard library algorithms work, like std::fill or std::copy; they are made to work with iterators so that you can fill a container that already has elements in it, or you can take an empty container and fill it up with data by using std::back_inserter.
This is a very good design IMO, and takes advantage of the power of templates and the iterator concept.
You can use it like this (but you already know this):
std::list<Url> l;
std::vector<Url> v;
x.get_urls(begin(dat1), end(dat1), std::back_inserter(l));
y.get_urls(begin(dat2), end(dat2), std::back_inserter(v));
I get the feeling that you are afraid of using templates, that they are not "normal" C++, or that they should be avoided and are bloated or something. I assure you, they are very normal and a powerful language feature that no other language (that I know of) has, so whenever it is appropriate to use them, USE THEM. And here, it is very appropriate.
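For illustration, a function with this shape might be implemented roughly as follows. This is only a sketch under several assumptions: it uses std::regex (C++11) in place of Boost.Regex, it is a free function rather than a member of UrlExtractor, a URL is represented as a std::string, and the pattern is deliberately naive.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <regex>
#include <string>

// Writes every href="..." value found in [first, last) to `dest`.
// std::regex_iterator needs bidirectional iterators over characters.
template <typename BidirIt, typename OutputIt>
OutputIt get_urls(BidirIt first, BidirIt last, OutputIt dest)
{
    // Naive pattern (an assumption): captures whatever sits inside href="...".
    static const std::regex href("href=\"([^\"]*)\"");
    for (std::regex_iterator<BidirIt> it(first, last, href), end; it != end; ++it)
        *dest++ = (*it)[1].str();
    return dest;
}

// Small usage check: the caller picks the destination container.
inline void get_urls_example()
{
    const std::string html = "<a href=\"http://example.com/\">home</a>";
    std::list<std::string> urls;
    get_urls(html.begin(), html.end(), std::back_inserter(urls));
    assert(urls.size() == 1 && urls.front() == "http://example.com/");
}
```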
Looks to me that you have the wrong interface.
There are already algorithms for copying from iterators into containers. It seems to me that your class provides a stream of URLs (without modifying its source). So all you really need is a way to expose your internal data via iterators (forward iterators), and thus all you need to provide is begin() and end().
UrlExtractor page(/* Some way of constructing page */);
std::vector<std::string> data;
std::copy(page.begin(), page.end(), std::back_inserter(data));
I would just provide the following interface:
class UrlExtractor
{
...... STUFF
iterator begin();
iterator end();
};
Yes, you are being too general. The point of a template is that you can generate multiple copies of the function that behave differently. You probably don't want that because you should pick one way of representing a URL and use that in your entire program.
How about you just do:
typedef std::string url;
That allows you to change the class you use for urls in the future.
Maybe std::vector implements some interface with push_back() in it and your method can take a reference to that interface (back_inserter?).
It's hard to say without knowing the actual use case scenarios, but in
general, it's better to avoid templates (or any other unnecessary
complexity) unless it actually buys you something. The most obvious
signature here would be:
std::vector<Url> UrlExtractor::get_urls( std::string const& source );
Is there really any likely scenario where you'll have to parse anything
but an std::string? (There might be if you also supported input
iterators. But in practice, if you're parsing, the sources will be
either a std::string or an std::istream&. Unless you really want to
support the latter, just use std::string.) And of course, client code
can do whatever it wants with the returned vector, including appending
it to another type of collection.
If the cost of returning a std::vector does become an issue, then you
could take an std::vector<Url>& as an argument. I can't see any
reasonable scenario where any additional flexibility would buy you much,
and a function like get_urls is likely to be fairly complicated, and
not the sort of thing you'd want to put in a header.

Simplifying find operation on vector

I'm using a vector for several arrays in my code due to the requirement of random access to the individual elements of the array. Some user GUI operations require searching through the array (but not enough to warrant the use of std::map), so littered through the code is this:
if (std::find(array.begin(), array.end(), searchfor) != array.end()) { ... }
I'm thinking of a better and more easily readable way of doing this, perhaps creating a method so I can do something like if (array_find(searchfor) != array.end()) or maybe even extending vector so I can do if (array.find(searchfor) != array.end()).
I'm not sure of the best way to do it. Any ideas?
Use whatever you prefer. I believe doing a function is better though, it avoids the hassle of creating a new class and using it everywhere. Something like :
template <typename Container, typename T>
bool array_contains(const Container& array, const T& searchfor)
{
    return std::find(array.begin(), array.end(), searchfor) != array.end();
}
You might find Boost.Range worth a look. Basically you can get rid of begin/end calls as arguments, using collection reference instead.
#include <boost/range/algorithm/find.hpp>
...
if (boost::find(array, searchfor) != array.end()) { ... }
The advantage of this solution is that you still get iterator as a result, which often proves to be useful.
Unless you want to re-write the entire Standard library to use ranges in cases where it's appropriate (like this one) and ensure that everyone that you work with also does this, then the first posted code is the best one.

Interface-based programming in C++ in combination with iterators. How to keep this simple?

In my developments I am slowly moving from an object-oriented approach to interface-based-programming approach. More precisely:
in the past I was already satisfied if I could group logic in a class
now I tend to put more logic behind an interface and let a factory create the implementation
A simple example clarifies this.
In the past I wrote these classes:
Library
Book
Now I write these classes:
ILibrary
Library
LibraryFactory
IBook
Book
BookFactory
This approach allows me to easily implement mocking classes for each of my interfaces, and to switch between old, slower implementations and new, faster implementations, and compare them both within the same application.
For most cases this works very well, but it becomes a problem if I want to use iterators to loop over collections.
Suppose my Library has a collection of books and I want to iterate over them. In the past this wasn't a problem: Library::begin() and Library::end() returned an iterator (Library::iterator) on which I could easily write a loop, like this:
for (Library::iterator it=myLibrary.begin();it!=myLibrary.end();++it) ...
Problem is that in the interface-based approach, there is no guarantee that different implementations of ILibrary use the same kind of iterator. If e.g. OldLibrary and NewLibrary both inherit from ILibrary, then:
OldLibrary could use an std::vector to store its books, and return std::vector::const_iterator in its begin and end methods
NewLibrary could use an std::list to store its books, and return std::list::const_iterator in its begin and end methods
Requiring both ILibrary implementations to return the same kind of iterator isn't a solution either, since in practice the increment operation (++it) needs to be implemented differently in both implementations.
This means that in practice I have to make the iterator an interface as well, meaning that application can't put the iterator on the stack (typical C++ slicing problem).
I could solve this problem by wrapping the iterator interface within a non-interface class, but this seems quite a complex solution for what I am trying to achieve.
Are there better ways to handle this problem?
EDIT:
Some clarifications after remarks made by Martin.
Suppose I have a class that returns all books sorted on popularity: LibraryBookFinder.
It has begin() and end() methods that return a LibraryBookFinder::const_iterator which refers to a book.
To replace my old implementation with a brand new one, I want to put the old LibraryBookFinder behind an interface ILibraryBookFinder, and rename the old implementation to OldSlowLibraryBookFinder.
Then my new (blistering fast) implementation called VeryFastCachingLibraryBookFinder can inherit from ILibraryBookFinder. This is where the iterator problem comes from.
The next step could be to hide the interface behind a factory, where I can ask the factory "give me a 'finder' that is very good at returning books according to popularity, or according to title, or author, ...". You end up with code like this:
ILibraryBookFinder *myFinder = LibraryBookFinderFactory (FINDER_POPULARITY);
for (ILibraryBookFinder::const_iterator it=myFinder->begin();it!=myFinder->end();++it) ...
or if I want to use another criterion:
ILibraryBookFinder *myFinder = LibraryBookFinderFactory (FINDER_AUTHOR);
for (ILibraryBookFinder::const_iterator it=myFinder->begin();it!=myFinder->end();++it) ...
The argument of LibraryBookFinderFactory may be determined by an external factor: a configuration setting, a command line option, a selection in a dialog, ... And every implementation has its own kind of optimizations (e.g. the author of a book doesn't change so this can be a quite static cache; the popularity can change daily which may imply a totally different data structure).
You are mixing metaphors here.
If a library is a container, then it needs its own iterator; it can't re-use an iterator of a member. Thus you would wrap the member iterator in an implementation of ILibraryIterator.
But strictly speaking a Library is not a container; it is a library.
Thus the methods on a library are actions (think verbs here) that you can perform on a library. A library may contain a container but strictly speaking it is not a container and thus should not be exposing begin() and end().
So if you want to perform an action on the books, you should ask the library to perform the action (by providing the functor). The concept of a class is that it is self-contained. Users should not use getters to pull data out of the object and then put it back; the object should know how to perform the action on itself (this is why I hate getters/setters: they break encapsulation).
class ILibrary
{
public:
IBook const& getBook(Index i) const;
template<typename R, typename A>
R checkBooks(A const& librarianAction);
};
If your libraries hold a lot of books, you should consider putting your "aggregate" functions into your collections and passing in the action you want performed.
Something in the nature of:
class ILibrary
{
public:
virtual ~ILibrary();
virtual void for_each( boost::function1<void, IBook&> func ) = 0;
};
void LibraryImpl::for_each( boost::function1<void, IBook&> func )
{
std::for_each( myImplCollection.begin(), myImplCollection.end(), func );
}
Although probably not exactly like that because you may need to deal with using shared_ptr, constness etc.
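With C++11, the same idea can be sketched using std::function and a lambda in place of boost::function1. Everything below is illustrative: Book is a simple stand-in for IBook, and VectorLibrary, ListLibrary, BookAction and titles_of are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

struct Book { std::string title; };  // hypothetical stand-in for IBook

typedef std::function<void(const Book&)> BookAction;

class ILibrary {
public:
    virtual ~ILibrary() {}
    // Callers never see which iterator type the implementation uses.
    virtual void for_each(const BookAction& action) const = 0;
};

class VectorLibrary : public ILibrary {
public:
    explicit VectorLibrary(std::vector<Book> books) : books_(std::move(books)) {}
    void for_each(const BookAction& action) const override {
        std::for_each(books_.begin(), books_.end(), action);
    }
private:
    std::vector<Book> books_;
};

class ListLibrary : public ILibrary {
public:
    explicit ListLibrary(std::list<Book> books) : books_(std::move(books)) {}
    void for_each(const BookAction& action) const override {
        std::for_each(books_.begin(), books_.end(), action);
    }
private:
    std::list<Book> books_;
};

// Works with any ILibrary, whatever container sits behind it.
inline std::vector<std::string> titles_of(const ILibrary& library)
{
    std::vector<std::string> titles;
    library.for_each([&titles](const Book& b) { titles.push_back(b.title); });
    return titles;
}

// Small usage check.
inline void titles_example()
{
    std::list<Book> books;
    books.push_back(Book{"Dune"});
    ListLibrary lib(books);
    assert(titles_of(lib).size() == 1);
}
```

The cost is one indirect call per element through std::function, in exchange for hiding the iterator type behind the interface.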
For this purpose (or in general in implementations where I make heavy use of interfaces), I have also created an interface for an iterator, and other objects return this. It becomes pretty Java-like.
If you care about having the iterator on the stack in most cases: your problem is of course that you don't know the size of the iterator at compile time, so you cannot allocate a stack variable of the correct size. But if you really care a lot about this, maybe you could write a wrapper that reserves a fixed size on the stack (e.g. 128 bytes) and, if the new iterator fits, moves it there (be sure that your iterator has a proper interface to allow this in a clean way). Or you could use alloca(). E.g. your iterator interface could be like:
struct IIterator {
// iterator stuff here
// ---
// now for the handling on the stack
virtual size_t size() = 0; // must return own size
virtual void copyTo(IIterator* pt) = 0;
};
and your wrapper:
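struct IteratorWrapper {
    IIterator* pt;
    IteratorWrapper(IIterator* i) {
        // Caution: memory obtained from alloca() is released when the
        // function that called it returns, so a real version must allocate
        // in the caller's frame rather than inside this constructor.
        pt = static_cast<IIterator*>(alloca(i->size()));
        i->copyTo(pt);
    }
    // ...
};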
Or so.
Another way, if in theory it would be always clear at compile time (not sure if that holds true for you; it is a clear restriction): Use functors everywhere. This has many other disadvantages (mainly having all real code in header files) but you will have really fast code. Example:
template<typename T>
void do_sth_with_library(T& library) {
for(typename T::iterator i = library.begin(); i != library.end(); ++i)
// ...
}
But the code can become pretty ugly if you rely too heavily on this.
Another nice solution (making the code more functional -- implementing a for_each interface) was provided by CashCow.
With current C++, this could make the code also a bit complicated/ugly to use though. With the upcoming C++0x and lambda functions, this solution can become much more clean.

How do you use stl's functions like for_each?

I started using stl containers because they came in very handy when I needed the functionality of a list, set and map and had nothing else available in my programming environment. I did not care much about the ideas behind it. STL documentation was interesting up to the point where it came to functions, etc. Then I skipped reading and just used the containers.
But yesterday, still being relaxed from my holidays, I just gave it a try and wanted to go a bit more the stl way. So I used the transform function (can I have a little bit of applause for me, thank you).
From an academic point of view it really looked interesting and it worked. But the thing that bothers me is that if you intensify the use of those functions, you need thousands of helper classes for mostly everything you want to do in your code. The whole logic of the program is sliced into tiny pieces. This slicing is not the result of good coding habits; it's just a technical need. Something, that makes my life probably harder not easier.
I learned the hard way, that you should always choose the simplest approach that solves the problem at hand. I can't see what, for example, the for_each function is doing for me that justifies the use of a helper class over several simple lines of code that sit inside a normal loop so that everybody can see what is going on.
I would like to know, what you are thinking about my concerns? Did you see it like I do when you started working this way and have changed your mind when you got used to it? Are there benefits that I overlooked? Or do you just ignore this stuff as I did (and will go on doing it, probably).
Thanks.
PS: I know that there is a real for_each loop in boost. But I ignore it here since it is just a convenient way for my usual loops with iterators I guess.
The whole logic of the program is sliced in tiny pieces. This slicing is not the result of good coding habits. It's just a technical need. Something, that makes my life probably harder not easier.
You're right, to a certain extent. That's why the upcoming revision to the C++ standard will add lambda expressions, allowing you to do something like this:
std::for_each(vec.begin(), vec.end(), [&](int& val){val++;});
but I also think it is often a good coding habit to split up your code as currently required. You're effectively separating the code describing the operation you want to do, from the act of applying it to a sequence of values. It is some extra boilerplate code, and sometimes it's just annoying, but I think it also often leads to good, clean, code.
Doing the above today would look like this:
void incr(int& val) { ++val; }
// and at the call-site
std::for_each(vec.begin(), vec.end(), incr);
Instead of bloating up the call site with a complete loop, we have a single line describing:
which operation is performed (if it is named appropriately)
which elements are affected
so it's shorter, and conveys the same information as the loop, but more concisely.
I think those are good things. The drawback is that we have to define the incr function elsewhere. And sometimes that's just not worth the effort, which is why lambdas are being added to the language.
I find it most useful when used along with boost::bind and boost::lambda so that I don't have to write my own functor. This is just a tiny example:
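#include <algorithm>
#include <vector>
#include <boost/lambda/lambda.hpp>
#include <boost/lambda/bind.hpp>
class A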
{
public:
A() : m_n(0)
{
}
void set(int n)
{
m_n = n;
}
private:
int m_n;
};
int main(){
using namespace boost::lambda;
std::vector<A> a;
a.push_back(A());
a.push_back(A());
std::for_each(a.begin(), a.end(), bind(&A::set, _1, 5));
return 0;
}
You'll find disagreement among experts, but I'd say that for_each and transform are a bit of a distraction. The power of STL is in separating non-trivial algorithms from the data being operated on.
Boost's lambda library is definitely worth experimenting with to see how you get on with it. However, even if you find the syntax satisfactory, the awesome amount of machinery involved has disadvantages in terms of compile time and debug-ability.
My advice is use:
for (Range::const_iterator i = r.begin(), end = r.end(); i != end; ++i)
{
*out++ = .. // for transform
}
instead of for_each and transform, but more importantly get familiar with the algorithms that are very useful: sort, unique, rotate to pick three at random.
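As one example of those algorithms working together, sort followed by unique removes duplicates from a sequence. A minimal sketch, assuming a C++11 compiler; sort_and_dedupe is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorts the vector and removes duplicate values.
inline void sort_and_dedupe(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());
    // std::unique only collapses adjacent duplicates, hence the sort first;
    // it returns the new logical end, and erase trims what is left over.
    v.erase(std::unique(v.begin(), v.end()), v.end());
}

// Small usage check.
inline void sort_and_dedupe_example()
{
    std::vector<int> v = {3, 1, 3, 2, 1};
    sort_and_dedupe(v);
    assert(v.size() == 3);
    assert(v[0] == 1 && v[1] == 2 && v[2] == 3);
}
```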
Incrementing a counter for each element of a sequence is not a good example for for_each.
If you look at better examples, you may find it makes the code much clearer to understand and use.
This is some code I wrote today:
// assume some SinkFactory class is defined
// and mapItr is an iterator of a std::map<int,std::vector<SinkFactory*> >
std::for_each(mapItr->second.begin(), mapItr->second.end(),
checked_delete<SinkFactory>);
checked_delete is part of boost, but the implementation is trivial and looks like this:
template<typename T>
void checked_delete(T* pointer)
{
delete pointer;
}
The alternative would have been to write this:
for(vector<SinkFactory*>::iterator pSinkFactory = mapItr->second.begin();
pSinkFactory != mapItr->second.end(); ++pSinkFactory)
delete (*pSinkFactory);
More than that, once you have checked_delete written (or if you already use boost), you can delete pointers in any sequence anywhere, with the same code, without caring what types you're iterating over (that is, you don't have to declare vector<SinkFactory*>::iterator pSinkFactory).
There is also a small performance improvement from the fact that with for_each the container.end() will be only called once, and potentially great performance improvements depending on the for_each implementation (it could be implemented differently depending on the iterator tag received).
Also, if you combine boost::bind with stl sequence algorithms you can make all kinds of fun stuff (see here: http://www.boost.org/doc/libs/1_43_0/libs/bind/bind.html#with_algorithms).
I guess the C++ committee has the same concerns. The upcoming C++0x standard, still to be ratified, introduces lambdas. This new feature will let you use the algorithms while writing simple helper functions directly in the algorithm's parameter list.
std::transform(in.begin(), in.end(), out.begin(), [](int a) { return ++a; });
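Local classes are a great feature to solve this (note that passing a local class as a template argument, as std::for_each requires here, is only standard from C++11 onward; a strict C++03 compiler may reject it). For example: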
void IncreaseVector(std::vector<int>& v)
{
class Increment
{
public:
int operator()(int& i)
{
return ++i;
}
};
std::for_each(v.begin(), v.end(), Increment());
}
IMO, this is way too much complexity for just an increment, and it would be clearer written as a plain for loop. But when the operation you want to perform over a sequence becomes more complex, I find it useful to clearly separate the operation to be performed on each element from the actual loop statement. If your functor name is well chosen, the code becomes more descriptive.
These are indeed real concerns, and these are being addressed in the next version of the C++ standard ("C++0x") which should be published either at the end of this year or in 2011. That version of C++ introduces a notion called C++ lambdas which allow for one to construct simple anonymous functions within another function, which makes it very easy to accomplish what you want without breaking your code into tiny little pieces. Lambdas are (experimentally?) supported in GCC as of GCC 4.5.
Libraries like the STL and Boost are complex partly because they need to cover every need and work on any platform.
As a user of these libraries -- you're not planning on remaking .NET are you? -- you can use their simplified goodies.
Here is possibly a simpler foreach from Boost I like to use:
BOOST_FOREACH(string& item, my_list)
{
...
}
Looks much neater and simpler than using .begin(), .end(), etc., and yet it works for pretty much any iterable collection (not just arrays/vectors).
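For readers on C++11 or later, the range-based for loop covers the same ground as the macro at the language level. This is a different mechanism from BOOST_FOREACH, shown here only for comparison; shout is a hypothetical name.

```cpp
#include <cassert>
#include <list>
#include <string>

// Appends "!" to every string in the list.
inline void shout(std::list<std::string>& my_list)
{
    // Works for any type that provides begin() and end().
    for (std::string& item : my_list)
        item += "!";
}

// Small usage check.
inline void shout_example()
{
    std::list<std::string> words = {"hi", "there"};
    shout(words);
    assert(words.front() == "hi!");
    assert(words.back() == "there!");
}
```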