Python's itertools module provides a chain iterator, which essentially concatenates a number of different iterators so that everything can be consumed from a single iterator.
Is there something similar in C++? A quick look at the Boost libraries didn't reveal anything similar, which is quite surprising to me. Is this functionality difficult to implement?
I came across this question while investigating a similar problem.
Even though the question is old, now in the time of C++11 and Boost 1.54 it is pretty easy to do using the Boost.Range library. It features a join function, which can join two ranges into a single one. You might incur performance penalties here, since the lowest common range concept (i.e. Single Pass Range, Forward Range, etc.) is used as the new range's category, and during iteration the iterator may be checked to see whether it needs to jump over to the second range. But your code can be written as easily as:
#include <boost/range/join.hpp>
#include <iostream>
#include <vector>
#include <deque>
int main()
{
    std::deque<int> deq = {0,1,2,3,4};
    std::vector<int> vec = {5,6,7,8,9};
    for (auto i : boost::join(deq, vec))
        std::cout << "i is: " << i << std::endl;
    return 0;
}
In C++, an iterator usually doesn't make sense outside the context of the begin and end of a range. The iterator itself doesn't know where the start and the end are. So in order to do something like this, you instead need to chain together ranges of iterators - a range is a (start, end) pair of iterators.
Take a look at the boost::range documentation. It may provide tools for constructing a chain of ranges. The one difference is that they will have to be the same type and return the same type of iterator. It may be possible to make this more generic, chaining together different types of ranges with something like any_iterator, but maybe not.
I've written one before (actually, just to chain two pairs of iterators together). It's not that hard, especially if you use boost's iterator_facade.
Making an input iterator (which is effectively what Python's chain does) is an easy first step. Finding the correct category for an iterator chaining a combination of different iterator categories is left as an exercise for the reader ;-).
Check out the Views Template Library (VTL). It may not provide a 'chained iterator' directly, but I think it has all the necessary tools/templates available for implementing your own.
From the VTL Page:
A view is a container adaptor, that provides a container interface to
parts of the data or
a rearrangement of the data or
transformed data or
a suitable combination of the data sets
of the underlying container(s). Since views themselves provide the container interface, they can be easily combined and stacked. Because of template trickery, views can adapt their interface to the underlying container(s). More sophisticated template trickery makes this powerful feature easy to use.
Compared with smart iterators, views are just smart iterator factories.
What you are essentially looking for is a facade iterator that abstracts away the traversing through several sequences.
Since you are coming from a python background I'll assume that you care more about flexibility rather than speed. By flexibility I mean the ability to chain-iterate through different sequence types together (vector, array, linked list, set etc....) and by speed I mean only allocating memory from the stack.
If this is the case then you may want to look at the any_iterator from adobe labs:
http://stlab.adobe.com/classadobe_1_1any__iterator.html
This iterator will give you the ability to iterate through any sequence type at runtime. To chain you would have a vector (or array) of 3-tuple any_iterators, that is, three any_iterators for each range you chain together (you need three to iterate forward or backward, if you just want to iterate forward two will suffice).
Let's say that you wanted to chain-iterate through a sequence of integers:
(Untested pseudo-C++ code)
typedef adobe::any_iterator AnyIntIter;
struct AnyRange {
AnyIntIter begin;
AnyIntIter curr;
AnyIntIter end;
};
You could define a range such as:
int int_array[] = {1, 2, 3, 4};
AnyRange sequence_0 = {int_array, int_array, int_array + ARRAYSIZE(int_array)};
Your RangeIterator class would then hold a std::vector of these ranges.
class RangeIterator {
public:
    RangeIterator() : curr_range_index(0) {}

    template <typename Container>
    void AddAnyRange(Container& c) {
        AnyRange any_range = { c.begin(), c.begin(), c.end() };
        ranges.push_back(any_range);
    }

    // Here's what operator++() looks like; everything else is omitted.
    int operator++() {
        while (true) {
            if (curr_range_index >= ranges.size()) {
                assert(false && "iterated too far");
                return 0;
            }
            AnyRange* curr_range = &ranges[curr_range_index];
            if (curr_range->curr != curr_range->end) {
                ++(curr_range->curr);
                return *(curr_range->curr);
            }
            ++curr_range_index;
        }
    }

private:
    std::vector<AnyRange> ranges;
    size_t curr_range_index;
};
I do want to note, however, that this solution is very slow. The better, more C++-like approach is to just store all the pointers to the objects you want to operate on and iterate through those. Alternatively, you can apply a functor or a visitor to your ranges.
Not in the standard library. Boost might have something.
But really, such a thing should be trivial to implement. Just make yourself an iterator with a vector of iterators as a member. Some very simple code for operator++, and you're there.
No functionality exists in boost that implements this, to the best of my knowledge - I did a pretty extensive search.
I thought I'd implement this easily last week, but I ran into a snag: the STL that comes with Visual Studio 2008, when range checking is on, doesn't allow comparing iterators from different containers (i.e., you can't compare somevec1.end() with somevec2.end() ). All of a sudden it became much harder to implement this and I haven't quite decided yet on how to do it.
I wrote other iterators in the past using iterator_facade and iterator_adapter from boost, which are better than writing 'raw' iterators but I still find writing custom iterators in C++ rather messy.
If someone can post some pseudocode on how this could be done /without/ comparing iterators from different containers, I'd be much obliged.
I have a container which (among other things) exposes a string buffer, and the upper case version of that string buffer. (Well, it isn't just upper case, but it is similar in concept) I want to allow a caller to do something similar to:
container c("Example");
auto const iter = c.begin() + 2;
std::printf("%c\n", iter->get_source()); // Prints a
std::printf("%c\n", iter->get_upper()); // Prints A
iter->set('x');
std::puts(c.get()); // Prints Exxmple
std::puts(c.get_upper()); // Prints EXXMPLE
The problem is, the "proxy" type with the member functions get_source, get_upper, etc. has no obvious place it can be stored, and an iterator is required to return a reference to something, not a value. (vector<bool> has a similar problem)
Alternately I could expose some kind of shell container or range, or expose completely separate iterator begin/end functions. Does anyone have experience doing something like this and know what works well?
My personal approach to this sort of thing is to use property maps: I envision a system of algorithms which can [optionally] take a property map (or actually sometimes multiple property maps) for each range. The idea is that *it yields a key (e.g., the T& it currently does) which is then passed to a property map that transforms the key into the actually accessed value. The transformation can, e.g., be the identity, yielding the current behavior of the algorithms and a good default when there is no property map. The example above would look something like this:
auto const cursor = c.begin();
std::printf("%c\n", c.map_source()(*cursor));
std::printf("%c\n", c.map_upper()(*cursor));
c.map_source()(*cursor, 'x');
std::copy(c.map_source(), c, std::ostreambuf_iterator<char>(std::cout));
std::copy(c.map_upper(), c, std::ostreambuf_iterator<char>(std::cout));
std::copy([](unsigned char c)->char{ return std::toupper(c); }, c,
std::ostreambuf_iterator<char>(std::cout));
The code assumes that the property maps yielding the source and the capitalized characters are obtained using c.map_source() and c.map_upper(), respectively. The last variant using std::copy() uses a lambda function as a property map.
Sadly, I still haven't found the time to write up a coherent proposal to apply various improvements to the STL algorithms. ... nor do I have an implementation putting it all together (I have a somewhat clunky implementation which is about 10 years old and doesn't benefit from various C++11 features which make it a lot easier; also, this implementation only concentrates on property maps and doesn't use the interface I currently envision).
Is there anything built into the C++ Standard Library that allows me to work with a priority queue/heap-like data structure (i.e., one where I can always pop the highest value, can define how the highest value is determined for custom classes, etc.) but also allows me to update the keys in the heap? I'm dealing with fairly simple data, pairs to be exact, but I need to be able to update the value of a given key within the heap easily for my algorithm to function. What is the best way to achieve this in C++?
Binary heaps (which are how priority queues are implemented in the C++ standard library) do not support arbitrary update-key operations. A common method if updates are infrequent is to extrinsically flag the original item as invalid, and reinsert the value with the new key; when an invalid value is popped, it is ignored.
The alternative is using a different PQ implementation which does support update-key, such as a binomial heap. Binomial heaps have the particular advantage of being manipulated by swinging pointers, instead of moving values. This streamlines the task of implementing operations like update-key and delete.
I'm not sure what your take on Boost is, but I always consider it a kind of almost-standard library (some Boost functionality has even ended up in the standard library). In any case, if you're OK with using Boost, then Boost.Heap provides a priority queue with updatable priority.
Like most Boost libraries, it's header-only, so there are no linker hassles to go through and it won't make your build system any more complex. You can just #include it and use it.
I don't have the ability to comment on your question, but here is my understanding.
You mentioned pairs, and it sounds like you need the ability to change the priority from some function of the first element of the pair to the second element.
That is, you want to initially use FirstComparator below, but then switch to SecondComparator.
typedef std::pair<X, Y> MyPair;
struct FirstComparator
{
    bool operator() (const MyPair& left, const MyPair& right) const
    {
        return left.first < right.first;
    }
};

struct SecondComparator
{
    bool operator() (const MyPair& left, const MyPair& right) const
    {
        return left.second < right.second;
    }
};
Because std::priority_queue is a template that includes a sorting criterion (as your question mentioned), you can create a second container of a different type. (I wrote less-than comparisons, which for std::priority_queue gives the default max-heap behavior.)
You will need to transfer the members into it.
std::priority_queue<MyPair, std::vector<MyPair>, FirstComparator> firstQueue;
// Populate and use firstQueue
// Now create an initially-empty queue sorting according to other criterion,
// and copy elements in.
std::priority_queue<MyPair, std::vector<MyPair>, SecondComparator> secondQueue;
// Use 'push' instead of 'emplace' for pre-C++11 code
while (! firstQueue.empty()) {
    secondQueue.emplace(firstQueue.top());
    firstQueue.pop();  // pop() returns void, so fetch with top() first
}
// Use secondQueue
An alternative approach is to use a single std::vector, and resort it with std::sort using different sorting criteria. C++11 allows you to create named or anonymous lambda functions for establishing such sorting criteria on-the-fly.
Since your question specifically involved priority queues, I won't get into that unless you're specifically interested.
I am no expert in C++ and STL.
I store a structure as the data in a map, keyed by some class C1.
I would like to access the same data using a different key of class C2 as well (where C1 and C2 are two unrelated classes).
Is this possible without duplicating the data?
I tried searching in google, but had a tough time finding an answer that I could understand.
This is for an embedded target where boost libraries are not supported.
Can somebody offer help?
You may store pointers to Data as std::map values, and you can have two maps with different keys pointing to the same data.
I think a smart pointer like std::shared_ptr is a good option in this case of shared ownership of data:
#include <map> // for std::map
#include <memory> // for std::shared_ptr
....
std::map<C1, std::shared_ptr<Data>> map1;
std::map<C2, std::shared_ptr<Data>> map2;
Instances of Data can be allocated using std::make_shared().
Not in the Standard Library, but Boost offers boost::multi_index
Two keys of different types
I must admit I misread a bit and didn't really notice you want two keys of different types, not values. The solution for that will be based on what's below, though. Other answers have pretty much what's needed for that; I'd just add that you could make a universal lookup function: (C++14-ish pseudocode)
template<class Key>
auto lookup (Key const& key) { }
And specialize it for your keys (arguably easier than SFINAE)
template<>
auto lookup<KeyA> (KeyA const& key) { return map_of_keys_a[key]; }
And the same for KeyB.
If you wanted to encapsulate it in a class, an obvious choice would be to change lookup to operator[].
Key of the same type, but different value
Idea 1
The simplest solution I can think of in 60 seconds ("simplest" meaning exactly that - it should really be thought through). I'd also switch to unordered_map as the default.
map<Key, Data> data;
map<Key2, Key> keys;
Access via data[keys["multikey"]].
This will obviously waste some space (duplicating objects of Key type), but I am assuming they are much smaller than the Data type.
Idea 2
Another solution would be to use pointers; then the only cost of duplicate is a (smart) pointer:
map<Key, shared_ptr<Data>> data;
Object of Data will be alive as long as there is at least one key pointing to it.
What I usually do in these cases is use non-owned pointers. I store my data in a vector:
std::vector<Data> myData;
And then I map pointers to each element. Since it is possible that pointers are invalidated because of the future growth of the vector, though, I will choose to use the vector indexes in this case.
std::map<Key1, int> myMap1;
std::map<Key2, int> myMap2;
Don't expose the data containers to your clients. Encapsulate element insertion and removal in specific functions, which insert everywhere and remove everywhere.
Bartek's "Idea 1" is good (though there's no compelling reason to prefer unordered_map to map).
Alternatively, you could have a std::map<C2, Data*>, or std::map<C2, std::map<C1, Data>::iterator> to allow direct access to Data objects after one C2-keyed search, but then you'd need to be more careful not to access invalid (erased) Data (or more precisely, to erase from both containers atomically from the perspective of any other users).
It's also possible for one or both maps to move to shared_ptr<Data> - the other could use weak_ptr<> if that's helpful ownership-wise. (These are in the C++11 Standard, otherwise the obvious source - boost - is apparently out for you, but maybe you've implemented your own or selected another library? Pretty fundamental classes for modern C++).
EDIT - hash tables versus balanced binary trees
This isn't particularly relevant to the question, but has received comments/interest below and I need more space to address it properly. Some points:
1) Bartek's casually advising to change from map to unordered_map without recommending an impact study re iterator/pointer invalidation is dangerous, and unwarranted given there's no reason to think it's needed (the question doesn't mention performance) and no recommendation to profile.
3) Relatively few data structures in a program are important to performance-critical behaviours, and there are plenty of times when the relative performance of one versus another is of insignificant interest. Supporting this claim - masses of code were written with std::map to ensure portability before C++11, and perform just fine.
4) When performance is a serious concern, the advice should be "Care => profile", but saying that a rule of thumb is ok - in line with "Don't pessimise prematurely" (see e.g. Sutter and Alexandrescu's C++ Coding Standards) - and if asked for one here I'd happily recommend unordered_map by default - but that's not particularly reliable. That's a world away from recommending every std::map usage I see be changed.
5) This container performance side-track has started to pull in ad-hoc snippets of useful insight, but is far from being comprehensive or balanced. This question is not a sane venue for such a discussion. If there's another question addressing this where it makes sense to continue this discussion and someone asks me to chip in, I'll do it sometime over the next month or two.
You could consider having a plain std::list holding all your data, and then various std::map objects mapping arbitrary key values to iterators pointing into the list:
std::list<Data> values;
std::map<C1, std::list<Data>::iterator> byC1;
std::map<C2, std::list<Data>::iterator> byC2;
I.e. instead of fiddling with more-or-less-raw pointers, you use plain iterators. And iterators into a std::list have very good invalidation guarantees.
I had the same problem. At first, holding two maps of shared pointers sounded very cool, but you will still need to manage the two maps together (inserting, removing, etc...).
Then I came up with another way of doing this.
My use case was accessing data by either x-y or radius-angle: think of each point as holding data, where the point can be described either in cartesian x,y form or in radius-angle form.
So I wrote a struct like
struct MyPoint
{
    std::pair<int, int> cartesianPoint;
    std::pair<int, int> radianPoint;

    bool operator== (const MyPoint& rhs) const
    {
        return cartesianPoint == rhs.cartesianPoint || radianPoint == rhs.radianPoint;
    }
};
After that I could use it as a key:
std::unordered_map<MyPoint, DataType> myMultIndexMap;
I am not sure if your case is the same as or adjustable to this scenario, but it can be an option.
Recently (from one SO comment) I learned that std::remove and std::remove_if are stable. Am I wrong to think this is a terrible design choice, since it prevents certain optimizations?
Imagine removing the first and fifth elements of a 1M-element std::vector. Because of stability, we can't implement remove with swaps; instead we must shift every remaining element. :(
If we weren't limited by stability we could (for random-access and bidirectional iterators) practically have two iterators, one moving from the front and a second from the back, and then use swap to bring the to-be-removed items to the end. I'm sure smart people could maybe do even better. My question is about the general case, not about the specific optimization I'm describing.
EDIT: please note that C++ advertises the zero-overhead principle, and also that there are both std::sort and std::stable_sort algorithms.
EDIT2:
optimization would be something like the following:
For remove_if:
bad_iter looks from the beginning for those elements for which the predicate returns true.
good_iter looks from the end for those elements for which the predicate returns false.
when both have found what is expected they swap their elements. Termination is at good_iter <= bad_iter.
If it helps, think of it like one iter in quick sort algorithm, but we don't compare them to a special element, but instead we use the above predicate.
EDIT3: I played around and tried to find worst case (worst case for remove_if - notice how rarely the predicate would be true) and I got this:
#include <vector>
#include <string>
#include <iostream>
#include <map>
#include <algorithm>
#include <cassert>
#include <chrono>
#include <memory>
using namespace std;
int main()
{
    vector<string> vsp;
    int n;
    cin >> n;
    for (int i = 0; i < n; ++i)
    {
        string s = "123456";
        s.push_back('a' + (rand() % 26));
        vsp.push_back(s);
    }
    auto vsp2 = vsp;

    auto remove_start = std::chrono::high_resolution_clock::now();
    auto it = remove_if(begin(vsp), end(vsp), [](const string& s){ return s < "123456b"; });
    vsp.erase(it, vsp.end());
    cout << vsp.size() << endl;
    auto remove_end = std::chrono::high_resolution_clock::now();
    cout << "erase-remove: " << chrono::duration_cast<std::chrono::milliseconds>(remove_end - remove_start).count() << " milliseconds\n";

    auto partition_start = std::chrono::high_resolution_clock::now();
    auto it2 = partition(begin(vsp2), end(vsp2), [](const string& s){ return s >= "123456b"; });
    vsp2.erase(it2, vsp2.end());
    cout << vsp2.size() << endl;
    auto partition_end = std::chrono::high_resolution_clock::now();
    cout << "partition-remove: " << chrono::duration_cast<std::chrono::milliseconds>(partition_end - partition_start).count() << " milliseconds\n";
}
C:\STL\MinGW>g++ test_int.cpp -O2 && a.exe
12345678
11870995
erase-remove: 1426 milliseconds
11870995
partition-remove: 658 milliseconds
For other usages, partition is a bit faster, the same, or slower. Color me puzzled. :D
I assume you're asking about a hypothetical definition of stable_remove to be what remove currently is, and remove to be implemented however the implementer thinks is best to give the correct values in any order. With an expectation that implementers will be able to improve on just doing exactly the same as stable_remove.
In practice, the library can't easily do this optimization. It depends on the data, but you don't want to spend too long to work out how many elements will be removed before deciding on how to remove each one. For example you could do an extra pass to count them, but there are plenty of cases where that extra pass is inefficient. Just because an unstable remove is faster than stable for certain cases doesn't necessarily mean that an adaptive algorithm to choose between the two is a good bet.
I think the difference between remove and sort is that sorting is known to be a complicated problem with a lot of different solutions and trade-offs and tweaks. All "simple" sort algorithms are slow on average. Most standard algorithms are pretty simple, and remove is one of them but sort is not. I don't think it makes a lot of sense therefore to define stable_remove and remove as separate standard functions.
Edit: your edit with my tweak (similar to std::partition but no need to keep the values on the right) seems pretty reasonable to me. It requires a bidirectional iterator, but there is precedent in the standard for algorithms that behave differently on different iterator categories, such as std::distance. So it would be possible for the standard to define unstable_remove that only requires a forward iterator, but does your thing if it gets a bidi iterator. The standard probably wouldn't lay out the algorithm, but it could have a phrase like "if the iterator is bidirectional, does at most min(k, n-k) moves where k is the number of elements removed", which would in effect force it. But note that the standard doesn't currently say how many moves remove_if does, so I reckon that pinning this down simply wasn't a priority.
There is of course nothing stopping you from implementing your own unstable_remove.
If we accept that the standard didn't need to specify an unstable remove, the question then comes down to whether the function it does define should have been called stable_remove, anticipating a future remove that behaves differently for bidi iterators, and might behave differently for forward iterators if some clever heuristic for doing an unstable remove ever becomes well enough known to be worth a standard function. I'd say not: it is not a disaster if the names of standard functions aren't completely regular. It could have been pretty disruptive to remove the guarantee of stability from the STL's remove_if. Then the question becomes, "why didn't the STL call it stable_remove_if", to which I can only answer that in addition to all the points made in all the answers, the STL design process was a sight quicker than the standardization process.
stable_remove would also open a can of worms regarding other standard functions that could in theory have unstable versions. For a particularly silly example, should copy be called stable_copy, just in case some implementation exists on which it's demonstrably faster to reverse the order of elements while copying? Should copy be called copy_forward, so that the implementation can choose which of copy_backward and copy_forward is called by copy according to which is faster? Part of the committee's job is to draw a line somewhere.
I think realistically the current standard is sensible, and it would be sensible to separately define a stable_remove and a remove_with_some_other_constraints, but remove_in_some_unspecified_way just doesn't give the same opportunity for optimization that sort_in_some_unspecified_way does. Introsort was invented in 1997, just as C++ was being standardized, but I don't imagine the research effort around remove is quite what it was and is around sort. I may be wrong, optimizing remove might be the next big thing, and if so then the committee has missed a trick.
std::remove is specified to work with forward iterators.
The approach with working with a pair of iterators, from beginning and from the end, would either increase the requirements for the iterators and thus decrease the utility of the function or violate/worsen asymptotic complexity guarantees.
To answer my own question >3 years later :)
Yes it was a "fail".
There is a proposal D0041R0 that would add unstable_remove.
One could argue that just because there is a proposal to add std::unstable_remove that it does not mean that std::remove was a mistake, but I disagree. :)
I'm doing some coding at work in C++, and a lot of the things that I work on involve analyzing sets of data. Very often I need to select some elements from a STL container, and very frequently I wrote code like this:
using std::vector;
vector< int > numbers;
for ( int i = -10; i <= 10; ++i ) {
numbers.push_back( i );
}
vector< int > positive_numbers;
for ( vector< int >::const_iterator it = numbers.begin(), end = numbers.end();
it != end; ++it
) {
if ( *it > 0 ) {
positive_numbers.push_back( *it );
}
}
Over time this for loop and the logic contained within it gets a lot more complicated and unreadable. Code like this is less satisfying than the analogous SELECT statement in SQL, assuming that I have a table called numbers with a column named "num" rather than a std::vector< int > :
SELECT * INTO positive_numbers FROM numbers WHERE num > 0
That's a lot more readable to me, and also scales better, over time a lot of the if-statement logic that's in our codebase has become complicated, order-dependent and unmaintainable. If we could do SQL-like statements in C++ without having to go to a database I think that the state of the code might be better.
Is there a simpler way that I can implement something like a SELECT statement in C++ where I can create a new container of objects by only describing the characteristics of the objects that I want? I'm still relatively new to C++, so I'm hoping that there's something magic with either template metaprogramming or clever iterators that would solve this. Thanks!
Edit based on first two answers. Thanks, I had no idea that's what LINQ actually was. I program on Linux and OSX systems primarily, and am interested in something cross-platform across OSX, Linux and Windows. So a more educated version of this question would be - is there a cross-platform implementation of something like LINQ for C++?
You've almost exactly described LINQ. It's a .NET 3.5 feature so you should be able to use it from C++.
The functionality you're describing is commonly found in functional languages that support concepts such as closures, predicates, functors, etc.
The problem with the code above is that it combines:
Logic for iterating over collection (the for loop)
Condition that must be satisfied for an element to be copied to another collection
Logic for copying an element from one collection to another
In reality (1) and (3) are boilerplate, insofar as every time you need to iterate over a collection copying some elements to another collection, it's probably only the conditional code that will change each time. Languages with support for functional programming eliminate this boilerplate. For example, in Groovy you can replace your for loop above with just
def positive_numbers = numbers.findAll{it > 0}
Even though C++ is not a functional language there may be libraries which provide support for functional-style programming with STL collections. For example, the Apache commons collection (and also possibly Google's collection library) provides support for functional style programming with Java collections, even though Java itself is not a functional language.
I think you have described LINQ (a C# and .NET 3.5 feature). Have you looked into that?
LINQ is the obvious answer for .NET (or Mono on non-Windows platforms), but in C++ it shouldn't be that difficult to write something like it yourself on top of the STL.
Use the Boost.Iterator library to write a "select" iterator, for example, one which skips all elements that do not satisfy a given predicate.
Boost already has a few relevant examples in their documentation I believe.
Or http://www.boost.org/doc/libs/1_39_0/libs/iterator/doc/filter_iterator.html might actually do what you need out of the box.
In any case, in C++, you could achieve the same effect basically by layering iterators.
If you have a regular iterator, which visits every element in the sequence, you can wrap that in a filter iterator, which increments the underlying iterator until it finds a value satisfying the condition. Then you could even wrap that in a "select" iterator transforming the value to the desired format.
It seems like a fairly obvious idea, but I'm not aware of any complete implementations of it.
You're using STL containers. I would recommend using STL algorithms, which are largely straight out of set theory. A SQL select is translated to repeated applications of std::find_if, or a combination of std::lower_bound and std::upper_bound (on sorted containers). The performance will be about the same as looping, but the syntax is a little more declarative.
LINQ will give you similar syntax and operations, but unless used over IQueryables (i.e., data in a database) you're not going to get any performance gains either.
Your best bet after that is putting things into files for this sort of thing. Whether that's BerkelyDB, NetCDF, HDF5, STXXL, etc. File access is slow, but doing this allows you to work on more data than fits in memory.
For what you're describing, std::vector isn't a terribly good choice. It is the SQL equivalent of a table with no indexes. On top of that, filling one container with the contents of another container is possibly a reasonable performance optimization, but not very readable, and not quite idiomatic, either. There are a number of ways of solving this portably (i.e., without relying on managed .NET code).
The first choice is to pick a better container. If you don't need stable iteration, then you should use std::set or std::multiset. These containers use a balanced search tree to store the values in order. This is equivalent to a simple SQL index over all columns.
std::set< int > numbers;
for ( int i = -10; i <= 10; ++i ) {
numbers.insert( i );
}
std::set<int>::iterator first = numbers.lower_bound(1);
std::set<int>::iterator end = numbers.end();
Now you can iterate from first to end without wasting any extra effort, beyond the O(n log(n)) fill and the O(log(n)) seek. Incrementing a std::set iterator is amortized O(1).
If, for some reason you must use a vector, you can get more idiomatic C++ using std::find_if (see Max Lybbert's answer)
bool isPositive(int n) { return n > 0; }
std::vector< int > numbers;
for ( int i = -10; i <= 10; ++i ) {
numbers.push_back( i );
}
for ( std::vector< int >::const_iterator end = numbers.end(),
          iter = std::find_if(numbers.begin(), end, isPositive); // <- first positive value
      iter != end;
      iter = std::find_if(iter + 1, end, isPositive) // <- advance past iter to the next positive
    ) {
    // iter is guaranteed to point to a positive value here, do something with it!
}
If you want something even more evocative of SQL without actually connecting to a database, you should look at Boost, particularly the boost::multi_index container and boost iterators.
Check out Mono if you want to try out LINQ on Linux / OS X. It's a port of the .NET Framework, and LINQ is included now, I believe.