I am designing a class in C++ that extracts URLs from an HTML page. I am using Boost's Regex library to do the heavy lifting for me. I started designing a class and realized that I didn't want to tie down how the URLs are stored. One option would be to accept a std::vector<Url> by reference and just call push_back on it. I'd like to avoid forcing consumers of my class to use std::vector. So, I created a member template that took a destination iterator. It looks like this:
template <typename TForwardIterator, typename TOutputIterator>
TOutputIterator UrlExtractor::get_urls(
TForwardIterator begin,
TForwardIterator end,
TOutputIterator dest);
I feel like I am overcomplicating things. I like to write fairly generic code in C++, and I struggle to lock down my interfaces. But then I get into these predicaments where I am trying to templatize everything. At this point, someone reading the code doesn't realize that TForwardIterator is iterating over a std::string.
In my particular situation, I am wondering if being this generic is a good thing. At what point do you start making code more explicit? Is there a standard approach to getting values out of a function generically?
Yes, it's not only fine but a very nice design. Templating that way is how most of the standard library algorithms work, like std::fill or std::copy; they are made to work with iterators so that you can fill a container that already has elements in it, or you can take an empty container and fill it up with data by using std::back_inserter.
This is a very good design IMO, and takes advantage of the power of templates and the iterator concept.
You can use it like this (but you already know this):
std::list<Url> l;
std::vector<Url> v;
x.get_urls(begin(dat1), end(dat1), std::back_inserter(l));
y.get_urls(begin(dat2), end(dat2), std::back_inserter(v));
I get the feeling that you are afraid of using templates, that they are not "normal" C++, or that they should be avoided and are bloated or something. I assure you, they are very normal and a powerful language feature that no other language (that I know of) has, so whenever it is appropriate to use them, USE THEM. And here, it is very appropriate.
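To make the output-iterator design concrete, here is a minimal sketch. It assumes C++11's std::regex in place of Boost.Regex, represents a URL as a plain std::string, and uses a deliberately naive pattern; extract_urls and urls_as_list are hypothetical names, not part of the asker's class.

```cpp
#include <iterator>
#include <list>
#include <regex>
#include <string>
#include <vector>

// Hypothetical free function standing in for UrlExtractor::get_urls.
// The pattern is deliberately naive and only meant for illustration.
template <typename BidirIt, typename OutputIt>
OutputIt extract_urls(BidirIt first, BidirIt last, OutputIt dest)
{
    static const std::regex url_pattern(R"(https?://[^\s"'<>]+)");
    std::regex_iterator<BidirIt> it(first, last, url_pattern), end;
    for (; it != end; ++it)
        *dest++ = it->str();   // the destination decides where each URL goes
    return dest;
}

// The caller picks the container; extract_urls itself never names one.
inline std::list<std::string> urls_as_list(const std::string& html)
{
    std::list<std::string> out;
    extract_urls(html.begin(), html.end(), std::back_inserter(out));
    return out;
}
```

The same call works with std::back_inserter on a std::vector, or with any other output iterator, which is the flexibility the question was after.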
Looks to me that you have the wrong interface.
There are already algorithms for copying from iterators into containers. It seems to me that your class provides a stream of URLs (without modifying its source). So all you really need is a way to expose your internal data via (forward) iterators, and thus all you need to provide is begin() and end().
UrlExtractor page(/* Some way of constructing page */);
std::vector<std::string> data;
std::copy(page.begin(), page.end(), std::back_inserter(data));
I would just provide the following interface:
class UrlExtractor
{
...... STUFF
iterator begin();
iterator end();
};
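A rough sketch of what such a class might look like follows. It assumes the URLs are extracted eagerly in the constructor with C++11's std::regex and stored as std::string; the constructor argument and the naive pattern are illustrative assumptions, since the original leaves construction unspecified.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

class UrlExtractor
{
public:
    typedef std::vector<std::string>::const_iterator iterator;

    // Assumption: the page text is passed in directly and scanned once.
    explicit UrlExtractor(const std::string& page)
    {
        static const std::regex url_pattern(R"(https?://[^\s"'<>]+)");
        std::sregex_iterator it(page.begin(), page.end(), url_pattern), end;
        for (; it != end; ++it)
            urls_.push_back(it->str());
    }

    // Read-only access to the extracted URLs.
    iterator begin() const { return urls_.begin(); }
    iterator end() const { return urls_.end(); }

private:
    std::vector<std::string> urls_;
};
```

Callers then use std::copy(page.begin(), page.end(), std::back_inserter(data)) with whatever container they like.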
Yes, you are being too general. The point of a template is that you can generate multiple copies of the function that behave differently. You probably don't want that because you should pick one way of representing a URL and use that in your entire program.
How about you just do:
typedef std::string url;
That allows you to change the class you use for urls in the future.
Maybe std::vector implements some interface with push_back() in it and your method can take a reference to that interface (back_inserter?).
It's hard to say without knowing the actual use case scenarios, but in
general, it's better to avoid templates (or any other unnecessary
complexity) unless it actually buys you something. The most obvious
signature here would be:
std::vector<Url> UrlExtractor::get_urls( std::string const& source );
Is there really any likely scenario where you'll have to parse anything
but an std::string? (There might be if you also supported input
iterators. But in practice, if you're parsing, the sources will be
either a std::string or an std::istream&. Unless you really want to
support the latter, just use std::string.) And of course, client code
can do whatever it wants with the returned vector, including appending
it to another type of collection.
If the cost of returning a std::vector does become an issue, then you
could take an std::vector<Url>& as an argument. I can't see any
reasonable scenario where any additional flexibility would buy you much,
and a function like get_urls is likely to be fairly complicated, and
not the sort of thing you'd want to put in a header.
I've never programmed in C++ professionally and am working with (Visual) C++ as a student. I'm having difficulty dealing with the lack of abstractions, especially with the STL container classes. For example, the vector class doesn't contain a simple remove method, common in many libraries, e.g. the .NET Framework. I know there's an erase method, but it isn't abstract enough to reduce the operation to a one-line method call. For example, if I have a
std::vector<std::string>
I don't know how else to remove a string element from the vector without iterating thru it and searching for a matching string element.
bool remove(vector<string> & msgs, string toRemove) {
if (msgs.size() > 0) {
vector<string>::iterator it = msgs.end() - 1;
while (it >= msgs.begin()) {
string remove = it->data();
if (remove == toRemove) {
//std::cout << "removing '" << it->data() << "'\n";
msgs.erase(it);
return true;
}
it--;
}
}
return false;
}
What do professional C++ programmers do in this situation? Do you write out the implementation every time? Do you create your own container class, your own library of helper functions, or do you suggest using another library i.e. Boost (even if you program Windows in Visual Studio)? or something else?
(if the above remove operation needs work, please leave an alternative method of doing this, thanks.)
You would use the "remove and erase idiom":
v.erase(std::remove(v.begin(), v.end(), mystring), v.end());
The point is that vector is a sequence container and not geared towards manipulation by value. Depending on your design needs, a different standard library container may be more appropriate.
Note that the remove algorithm merely reorders elements of the range, it does not erase anything from the container. That's because iterators don't carry information about their container with them, and this is fully intentional: By separating iterators from their containers, one can write generic algorithms that work on any sensible container.
Idiomatic modern C++ would try to follow that pattern whenever applicable: Expose your data through iterators and use generic algorithms to manipulate it.
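If the one-liner feels too low-level, it can be wrapped once and reused. The following is a small sketch; erase_all is a hypothetical helper name, not something the standard library provides.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: removes every element equal to value
// and returns how many elements were removed.
template <typename T>
std::size_t erase_all(std::vector<T>& v,
                      const typename std::vector<T>::value_type& value)
{
    // std::remove shifts the kept elements forward and returns the new logical end.
    typename std::vector<T>::iterator new_end =
        std::remove(v.begin(), v.end(), value);
    const std::size_t removed = static_cast<std::size_t>(v.end() - new_end);
    // erase actually shrinks the container.
    v.erase(new_end, v.end());
    return removed;
}
```

With this in place, erase_all(msgs, "some text") is the one-line call the question asks for.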
Have you considered std::remove_if?
http://www.cplusplus.com/reference/algorithm/remove_if/
IMO, professionally, it is perfectly logical to write custom implementations to do custom tasks, especially if the standard doesn't provide them. This is much better than writing (read: copy-pasting) the same stuff again and again. One may also take advantage of inline functions, template functions, or macros to keep the same stuff in one place. This reduces the bugs you may encounter while re-using the same stuff (which may go somewhat wrong while pasting). It also makes it possible to correct a bug in one place.
Templates and macros, if properly designed, are very useful - they aren't code bloat.
Edit: Your code needs improvement:
bool remove(vector<string>& msgs, const string& toRemove);
To iterate over a collection, a for loop is sufficient. There is no need to check for size, take last iterator, check with begin, get data and all.
There is no need to waste a string - just compare it, and remove.
For your problem, I believe a map or set would fit much better.
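One way to act on these points is sketched below. It swaps the hand-written loop for std::find, and, unlike the code in the question (which searches from the back), it removes the first match; remove_first is a hypothetical name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Removes the first element equal to toRemove.
// Returns true if something was removed, false if no match was found.
bool remove_first(std::vector<std::string>& msgs, const std::string& toRemove)
{
    std::vector<std::string>::iterator it =
        std::find(msgs.begin(), msgs.end(), toRemove);
    if (it == msgs.end())
        return false;   // also covers the empty-vector case
    msgs.erase(it);
    return true;
}
```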
I started using stl containers because they came in very handy when I needed the functionality of a list, set and map and had nothing else available in my programming environment. I did not care much about the ideas behind it. STL documentation was interesting up to the point where it came to functions, etc. Then I skipped reading and just used the containers.
But yesterday, still being relaxed from my holidays, I just gave it a try and wanted to go a bit more the stl way. So I used the transform function (can I have a little bit of applause for me, thank you).
From an academic point of view it really looked interesting and it worked. But the thing that bothers me is that if you intensify the use of those functions, you need thousands of helper classes for mostly everything you want to do in your code. The whole logic of the program is sliced into tiny pieces. This slicing is not the result of good coding habits; it's just a technical need. Something, that makes my life probably harder not easier.
I learned the hard way, that you should always choose the simplest approach that solves the problem at hand. I can't see what, for example, the for_each function is doing for me that justifies the use of a helper class over several simple lines of code that sit inside a normal loop so that everybody can see what is going on.
I would like to know, what you are thinking about my concerns? Did you see it like I do when you started working this way and have changed your mind when you got used to it? Are there benefits that I overlooked? Or do you just ignore this stuff as I did (and will go on doing it, probably).
Thanks.
PS: I know that there is a real for_each loop in boost. But I ignore it here since it is just a convenient way for my usual loops with iterators I guess.
The whole logic of the program is sliced in tiny pieces. This slicing is not the result of good coding habits. It's just a technical need. Something, that makes my life probably harder not easier.
You're right, to a certain extent. That's why the upcoming revision to the C++ standard will add lambda expressions, allowing you to do something like this:
std::for_each(vec.begin(), vec.end(), [&](int& val){val++;});
but I also think it is often a good coding habit to split up your code as currently required. You're effectively separating the code describing the operation you want to do, from the act of applying it to a sequence of values. It is some extra boilerplate code, and sometimes it's just annoying, but I think it also often leads to good, clean, code.
Doing the above today would look like this:
void incr(int& val) { ++val; }
// and at the call-site
std::for_each(vec.begin(), vec.end(), incr);
Instead of bloating up the call site with a complete loop, we have a single line describing:
which operation is performed (if it is named appropriately)
which elements are affected
so it's shorter, and conveys the same information as the loop, but more concisely.
I think those are good things. The drawback is that we have to define the incr function elsewhere. And sometimes that's just not worth the effort, which is why lambdas are being added to the language.
I find it most useful when used along with boost::bind and boost::lambda so that I don't have to write my own functor. This is just a tiny example:
class A
{
public:
A() : m_n(0)
{
}
void set(int n)
{
m_n = n;
}
private:
int m_n;
};
int main(){
using namespace boost::lambda;
std::vector<A> a;
a.push_back(A());
a.push_back(A());
std::for_each(a.begin(), a.end(), bind(&A::set, _1, 5));
return 0;
}
You'll find disagreement among experts, but I'd say that for_each and transform are a bit of a distraction. The power of STL is in separating non-trivial algorithms from the data being operated on.
Boost's lambda library is definitely worth experimenting with to see how you get on with it. However, even if you find the syntax satisfactory, the awesome amount of machinery involved has disadvantages in terms of compile time and debug-ability.
My advice is use:
for (Range::const_iterator i = r.begin(), end = r.end(); i != end; ++i)
{
*out++ = .. // for transform
}
instead of for_each and transform, but more importantly get familiar with the algorithms that are very useful: sort, unique, rotate to pick three at random.
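As an illustration of the kind of algorithm meant here, sort and unique combine to remove duplicates from a sequence. This is a minimal sketch; sort_and_dedupe is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sorts the vector and drops duplicate values.
template <typename T>
void sort_and_dedupe(std::vector<T>& v)
{
    std::sort(v.begin(), v.end());
    // std::unique only compacts adjacent duplicates (hence the sort first)
    // and returns the new logical end; erase trims the leftover tail.
    v.erase(std::unique(v.begin(), v.end()), v.end());
}
```

Writing this by hand with plain loops is considerably longer and easier to get wrong, which is where the algorithms earn their keep.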
Incrementing a counter for each element of a sequence is not a good example for for_each.
If you look at better examples, you may find it makes the code much clearer to understand and use.
This is some code I wrote today:
// assume some SinkFactory class is defined
// and mapItr is an iterator of a std::map<int,std::vector<SinkFactory*> >
std::for_each(mapItr->second.begin(), mapItr->second.end(),
checked_delete<SinkFactory>);
checked_delete is part of boost, but the implementation is trivial and looks like this:
template<typename T>
void checked_delete(T* pointer)
{
delete pointer;
}
The alternative would have been to write this:
for(std::vector<SinkFactory*>::iterator pSinkFactory = mapItr->second.begin();
pSinkFactory != mapItr->second.end(); ++pSinkFactory)
delete (*pSinkFactory);
More than that, once you have that checked_delete written once (or if you already use boost), you can delete pointers in any sequence anywhere, with the same code, without caring what types you're iterating over (that is, you don't have to declare std::vector<SinkFactory*>::iterator pSinkFactory).
There is also a small performance improvement from the fact that with for_each the container.end() will be only called once, and potentially great performance improvements depending on the for_each implementation (it could be implemented differently depending on the iterator tag received).
Also, if you combine boost::bind with stl sequence algorithms you can make all kinds of fun stuff (see here: http://www.boost.org/doc/libs/1_43_0/libs/bind/bind.html#with_algorithms).
I guess the C++ committee has the same concerns. The upcoming C++0x standard, which is still to be ratified, introduces lambdas. This new feature will enable you to use the algorithms while writing simple helper functions directly in the algorithm's parameter list.
std::transform(in.begin(), in.end(), out.begin(), [](int a) { return ++a; });
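Local classes are a great feature to solve this (though note that C++03 does not allow local types as template arguments, so this needs C++0x or a compiler extension). For example: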
void IncreaseVector(std::vector<int>& v)
{
class Increment
{
public:
int operator()(int& i)
{
return ++i;
}
};
std::for_each(v.begin(), v.end(), Increment());
}
IMO, this is way too much complexity for just an increment, and it'll be clearer to write it as a regular plain for loop. But when the operation you want to perform over a sequence becomes more complex, I find it useful to clearly separate the operation to be performed on each element from the actual loop statement. If your functor name is properly chosen, the code gets a descriptive plus.
These are indeed real concerns, and these are being addressed in the next version of the C++ standard ("C++0x") which should be published either at the end of this year or in 2011. That version of C++ introduces a notion called C++ lambdas which allow for one to construct simple anonymous functions within another function, which makes it very easy to accomplish what you want without breaking your code into tiny little pieces. Lambdas are (experimentally?) supported in GCC as of GCC 4.5.
Libraries like STL and Boost are also complex because they need to cover every need and work on any platform.
As a user of these libraries -- you're not planning on remaking .NET are you? -- you can use their simplified goodies.
Here is possibly a simpler foreach from Boost I like to use:
BOOST_FOREACH(string& item, my_list)
{
...
}
Looks much neater and simpler than using .begin(), .end(), etc. and yet it works for pretty much any iterable collection (not just arrays/vectors).
I'm making a simple crime sim game.
Throughout it I keep doing the same thing over and over:
// vector<Drug*> drugSack;
for (unsigned int i = 0; i < this->drugSack.size(); i++)
this->sell(drugSack[i]);
Just one example. I hate having all these for loops all over the place omg QQ. Is there any way to do something like:
drugSack->DoForAll((void*)myCallBack);
I'm not well versed in the STL.
Time to start knowing the stl algorithms:
#include <algorithm>
...
std::for_each( drugSack.begin(), drugSack.end(),
std::bind1st( std::mem_fun( &ThisClass::sell ), this ) );
The idea is to create an object, called a "functor", that can do a certain action for each of the elements in the range drugSack.begin(), drugSack.end().
This functor can be created using stl constructs like mem_fun, resulting in a functor taking a ThisClass* and a Drug* argument, and a wrapper around it (bind1st) that binds the ThisClass* argument to this.
Honestly, C++ is currently pretty bad at this kind of stuff. It can definitely do it, as outlined in xtofl's answer, but it's often very clumsy.
Boost has a for-each macro that is quite convenient:
#include <boost/foreach.hpp>
#define foreach BOOST_FOREACH
// ...
foreach(Drug* d, drugSack)
{
sell(d);
}
Or perhaps Boost.Bind, though this is slightly more complex, it reads very nice for your case:
#include <boost/bind.hpp>
// ...
// ThisClass refers to whatever class this method is in
std::for_each(drugSack.begin(), drugSack.end(),
boost::bind(&ThisClass::sell, this, _1));
Bind will make a functor that calls the member function of ThisClass, sell, on the instance of the class pointed to by this, and will replace _1 with the argument it gets from for_each.
The most general method is with lambdas. Boost has a lambda library. I won't include samples here because for your specific case boost bind works, and the lambdas would be the same code. That said, lambdas can do much more! They basically create in-place functions (implemented as functors), but are much more complex to learn.
Both for-each and bind are far cleaner than the "standard" C++ methods, in my opinion. For now, I'd recommend, in order: for-each, bind, standard C++, lambdas.
In C++0x, the next C++ standard, all this will be nice again with built-in lambda support:
std::for_each(drugSack.begin(), drugSack.end(),
[this](Drug* d){ sell(d); });
Or the new range-based for loops:
for(Drug* d : drugSack)
{
sell(d);
}
But we must wait a couple years before this is an option. :( Also, I think the range-based for-loop is the easiest thing to read. This is why I recommend boost for-each, because it mimics this behavior and syntax (mostly).
Also, totally unrelated: the style where you include this-> before everything is, in my experience, generally considered bad practice. The compiler will do it for you, all you're doing is cluttering up your code and introducing the chance of mistakes. Things read much better without it.
You could use std::for_each from the STL which applies a function to a range. See the following description: http://www.cplusplus.com/reference/algorithm/for_each/.
You will also need to use std::mem_fun or std::mem_fun_ref to wrap the member function of your class.
For more advanced cases, have a look at Boost Bind which provides an advanced binder for creating function objects.
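For a self-contained picture of how the pieces fit together, here is a sketch using std::bind from C++11's functional header, which works the same way as Boost.Bind for this purpose. The Drug and Dealer types are invented stand-ins for the asker's classes.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the types in the question.
struct Drug { int price; };

class Dealer
{
public:
    Dealer() : cash_(0) {}

    void sell(Drug* d) { cash_ += d->price; }

    // Calls this->sell(d) for every Drug* in the sack.
    void sellAll(std::vector<Drug*>& drugSack)
    {
        using namespace std::placeholders;
        std::for_each(drugSack.begin(), drugSack.end(),
                      std::bind(&Dealer::sell, this, _1));
    }

    int cash() const { return cash_; }

private:
    int cash_;
};
```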
For the simple case of looping through an entire container, I'd just write the loop. It's unfortunately long-winded, but not prone to mistakes if you always write it the same way. I always write such loops as follows:
Container c;
for (Container::iterator i = c.begin(), end = c.end(); i != end; ++i)
...
(or const_iterator where appropriate).
You could try BOOST_FOREACH as an alternative.
Well, first I'm confused: what's sell? If it's meant to be a member function of some class, drugSack needs to hold objects of that class, in which case you can do something like the following --
Something like for_each to iterate over drugSack, combined with mem_fun to get sell:
for_each(drugSack.begin(), drugSack.end(), mem_fun(&Drug::sell));
If sell is just an ordinary function, you can just put it in the third argument of for_each.
I've not used C++ very much in the past, and have recently been doing a lot of C#, and I'm really struggling to get back into the basics of C++ again. This is particularly tricky as work mandates that none of the most handy C++ constructs can be used, so all strings must be char *'s, and there is no provision for STL lists.
What I'm currently trying to do is to create a list of strings, something which would take me no time at all using STL or in C#. Basically I want to have a function such as:
char **registeredNames = new char*[numberOfNames];
Then,
void RegisterName(const char * const name, const int length)
{
//loop to see if name already registered snipped
if(notFound)
{
registeredNames[lastIndex++] = name;
}
}
or, if it was C#...
if(!registeredNames.Contains(name))
{
registeredNames.Add(name);
}
and I realize that it doesn't work. I know the const nature of the passed variables (a const pointer and a const string) makes it rather difficult, but my basic problem is that I've always avoided this situation in the past by using STL lists etc. so I've never had to work around it!
There are legitimate reasons that STL might be avoided. When working in fixed environments where memory or speed is at a premium, it's sometimes difficult to tell what is going on under the hood with STL. Yes, you can write your own memory allocators, and yes, speed generally isn't a problem, but there are differences between STL implementations across platforms, and those differences might be subtle and potentially buggy. Memory is perhaps my biggest concern when thinking about using it.
Memory is precious, and how we use it needs to be tightly controlled. Unless you've been down this road, this concept might not make sense, but it's true. We do allow for STL usage in tools (outside of game code), but it's prohibited inside of the actual game. One other related problem is code size. I am slightly unsure of how much STL can contribute to executable size, but we've seen marked increases in code size when using STL. Even if your executable is "only" 2M bigger, that's 2M less RAM for something else for your game.
STL is nice for sure. But it can be abused by programmers who don't know what they are doing. It's not intentional, but it can provide nasty surprises when you don't want to see them (again, memory bloat and performance issues)
I'm sure that you are close with your solution.
for ( i = 0; i < lastIndex; i++ ) {
if ( !strcmp(registeredNames[i], name) ) {
break; // name was found
}
}
if ( i == lastIndex ) {
// name was not found in the registeredNames list
registeredNames[lastIndex++] = strdup(name);
}
You might not want to use strdup. That's simply an example of how to store the name given your example. You might want to either allocate space for the new name yourself, or use some other memory construct that might already be available in your app.
And please, don't write a string class. I have held up string classes as perhaps the worst example of how not to re-engineer a basic C construct in C++. Yes, the string class can hide lots of nifty details from you, but its memory usage patterns are terrible, and those don't fit well into a console (i.e. ps3 or 360, etc) environment. About 8 years ago we did the same thing. 200000+ memory allocations before we hit the main menu. Memory was terribly fragmented and we couldn't get the rest of the game to fit in the fixed environment. We wound up ripping it out.
Class design is great for some things, but this isn't one of them. This is an opinion, but it's based on real world experience.
You'll probably need to use strcmp to see if the string is already stored:
for (int index=0; index<lastIndex; index++)
{
if (strcmp(registeredNames[index], name) == 0)
{
return; // Already registered
}
}
Then if you really need to store a copy of the string, then you'll need to allocate a buffer and copy the characters over.
char* nameCopy = static_cast<char*>(malloc(length+1));
strcpy(nameCopy, name);
registeredNames[lastIndex++] = nameCopy;
You didn't mention whether your input is NULL terminated - if not, then extra care is needed, and strcmp/strcpy won't be suitable.
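For the case where the input is not NUL terminated, a length-aware variant could look roughly like the sketch below. The fixed capacity kMaxNames, the globals g_names and g_count, and the bool return value are all assumptions made for illustration; the sketch also assumes names contain no embedded NUL characters.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t kMaxNames = 64;   // hypothetical fixed capacity
static char* g_names[kMaxNames];    // each entry is an owned, NUL-terminated copy
static std::size_t g_count = 0;

// Registers name[0..length) if it is not already stored.
// Returns true if added, false if already present or the table is full.
// Note: the copies are never freed in this sketch.
bool RegisterName(const char* name, std::size_t length)
{
    for (std::size_t i = 0; i < g_count; ++i)
    {
        // Compare lengths first, then raw bytes; never reads past name + length.
        if (std::strlen(g_names[i]) == length &&
            std::memcmp(g_names[i], name, length) == 0)
            return false;
    }
    if (g_count == kMaxNames)
        return false;

    char* copy = new char[length + 1];
    std::memcpy(copy, name, length);
    copy[length] = '\0';            // stored copies are always terminated
    g_names[g_count++] = copy;
    return true;
}
```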
If portability is an issue, you may want to check out STLport.
Why can't you use the STL?
Anyway, I would suggest that you implement a simple string class and list templates of your own. That way you can use the same techniques as you normally would and keep the pointer and memory management confined to those classes. If you mimic the STL, it would be even better.
If you really can't use stl (and I regret believing that was true when I was in the games industry) then can you not create your own string class? The most basic string class would allocate memory on construction and assignment, and handle the delete in the destructor. Later you could add further functionality as you need it. Totally portable, and very easy to write and unit test.
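A minimal sketch of such a class is shown below. SimpleString is a hypothetical name, and the sketch assumes it is never constructed from a null pointer.

```cpp
#include <cstring>

class SimpleString
{
public:
    // Allocates on construction...
    SimpleString(const char* text = "") : data_(duplicate(text)) {}
    SimpleString(const SimpleString& other) : data_(duplicate(other.data_)) {}

    // ...and on assignment.
    SimpleString& operator=(const SimpleString& other)
    {
        if (this != &other)
        {
            char* copy = duplicate(other.data_); // allocate before releasing the old buffer
            delete[] data_;
            data_ = copy;
        }
        return *this;
    }

    // The destructor handles the delete.
    ~SimpleString() { delete[] data_; }

    const char* c_str() const { return data_; }

    bool operator==(const SimpleString& rhs) const
    {
        return std::strcmp(data_, rhs.data_) == 0;
    }

private:
    static char* duplicate(const char* text)
    {
        char* copy = new char[std::strlen(text) + 1];
        std::strcpy(copy, text);
        return copy;
    }

    char* data_;
};
```

Every copy allocates, so this does nothing for the fragmentation concerns raised in other answers; it only confines the pointer handling to one place.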
Working with char* requires you to work with C functions. In your case, what you really need is to copy the strings around. To help you, you have the strndup function. Then you'll have to write something like:
void RegisterName(const char* name)
{
// loop to see if name already registered snipped
if(notFound)
{
registeredNames[lastIndex++] = strndup(name, MAX_STRING_LENGTH);
}
}
This code assumes your array is big enough.
Of course, the very best would be to properly implement your own string and array and list, ... or to convince your boss the STL is not evil anymore !
Edit: I guess I misunderstood your question. There is no constness problem in this code I'm aware of.
I'm doing this from my head but it should be about right:
static int lastIndex = 0;
static const char **registeredNames = new const char*[numberOfNames];
void RegisterName(const char * const name)
{
bool found = false;
//loop to see if name already registered snipped
for (int i = 0; i < lastIndex; i++)
{
if (strcmp(name, registeredNames[i]) == 0)
{
found = true;
break;
}
}
if (!found)
{
registeredNames[lastIndex++] = name;
}
}
I can understand why you can't use STL - most implementations do bloat your code terribly. However there are implementations for games programmers by games programmers - RDESTL is one such library.
Using:
const char **registeredNames = new const char * [numberOfNames];
will allow you to assign a const char * const to an element of the array.
Just out of curiosity, why does "work mandates that none of the most handy C++ constructs can be used"?
All the approaches suggested are valid. My point is that if the way C# does it is appealing, replicate it: create your own classes/interfaces to present the same abstraction, e.g. a simple linked list class with methods Contains and Add. Using the sample code provided by other answers, this should be relatively simple.
One of the great things about C++ is generally you can make it look and act the way you want, if another language has a great implementation of something you can usually reproduce it.
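As a rough sketch of that idea, a singly linked list with Contains and Add might look like this. NameList is a hypothetical name; the list stores its own copies of the names and assumes NUL-terminated input.

```cpp
#include <cstring>

class NameList
{
public:
    NameList() : head_(0) {}

    ~NameList()
    {
        while (head_)
        {
            Node* next = head_->next;
            delete[] head_->name;
            delete head_;
            head_ = next;
        }
    }

    // Linear search, comparing by content rather than by pointer.
    bool Contains(const char* name) const
    {
        for (const Node* n = head_; n; n = n->next)
            if (std::strcmp(n->name, name) == 0)
                return true;
        return false;
    }

    // Stores a private copy of name at the front of the list.
    void Add(const char* name)
    {
        Node* node = new Node;
        node->name = new char[std::strlen(name) + 1];
        std::strcpy(node->name, name);
        node->next = head_;
        head_ = node;
    }

private:
    struct Node { char* name; Node* next; };

    NameList(const NameList&);             // copying disabled to keep the sketch short
    NameList& operator=(const NameList&);

    Node* head_;
};
```

The call site then reads much like the C# version: if (!registeredNames.Contains(name)) registeredNames.Add(name);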
I have used this String class for years.
http://www.robertnz.net/string.htm
It provides practically all the features of the
STL string but is implemented as a true class not a template
and does not use STL.
This is a clear case of you get to roll your own. And do the same for a vector class.
Do it with test-first programming.
Keep it simple.
Avoid reference counting the string buffer if you are in MT environment.
If you are not worried about conventions and just want to get the job done use realloc. I do this sort of thing for lists all of the time, it goes something like this:
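#include <cstdlib> // realloc, free

// T stands for whatever element type you are storing.
T** list = 0;
unsigned int length = 0;
T* AddItem(T Item)
{
    // Grow the array of pointers by one slot; keep the old block if realloc fails.
    T** grown = static_cast<T**>(realloc(list, sizeof(T*) * (length + 1)));
    if (!grown) return 0;
    list = grown;
    list[length] = new T(Item);
    return list[length++]; // return the new item, then bump the count
}
void CleanupList()
{
    for (unsigned int i = 0; i < length; ++i)
    {
        delete list[i];
    }
    free(list);
    list = 0;
    length = 0;
}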
There is more you can do, e.g. only realloc each time the list size doubles, functions for removing items from list by index or by checking equality, make a template class for handling lists etc... (I have one I wrote ages ago and always use myself... but sadly I am at work and can't just copy-paste it here). To be perfectly honest though, this will probably not outperform the STL equivalent, although it may equal its performance if you do a ton of work or have an especially poor implementation of STL.
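The doubling strategy mentioned above can be sketched as follows. This version stores plain ints rather than pointers to keep it short; IntList, Append and Cleanup are hypothetical names, and the approach is only safe for types that can be moved with a raw byte copy, since realloc knows nothing about constructors.

```cpp
#include <cstddef>
#include <cstdlib>

struct IntList
{
    int* items;
    std::size_t length;    // slots in use
    std::size_t capacity;  // slots allocated
};

// Appends value, doubling the capacity only when the list is full.
// Returns false if allocation fails; the list is left unchanged in that case.
bool Append(IntList& list, int value)
{
    if (list.length == list.capacity)
    {
        const std::size_t grown = list.capacity ? list.capacity * 2 : 4;
        void* block = std::realloc(list.items, grown * sizeof(int));
        if (!block)
            return false;
        list.items = static_cast<int*>(block);
        list.capacity = grown;
    }
    list.items[list.length++] = value;
    return true;
}

void Cleanup(IntList& list)
{
    std::free(list.items);
    list.items = 0;
    list.length = 0;
    list.capacity = 0;
}
```

Start from IntList list = {0, 0, 0}; because the capacity doubles, the number of realloc calls grows only logarithmically with the number of appended items.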
Annoyingly C++ is without an operator renew/resize to replace realloc, which would be very useful.
Oh, and apologies if my code is error ridden, I just pulled it out from memory.
const correctness is still const correctness regardless of whether you use the STL or not. I believe what you are looking for is to make registeredNames a const char ** so that the assignment to registeredNames[i] (which is a const char *) works.
Moreover, is this really what you want to be doing? It seems like making a copy of the string is probably more appropriate.
Moreover still, you shouldn't be thinking about storing this in a list given the operation you are doing on it, a set would be better.