I've got a question about the architecture of a data structure I'm writing. I'm writing an image class, and I'm going to use it in a specific algorithm. In this algorithm, I need to touch every pixel in the image that's within a certain border. The classic way I know to do this is with two nested for loops:
for(int i = ROW_BORDER; i < img->height - ROW_BORDER; i++)
    for(int j = COL_BORDER; j < img->width - COL_BORDER; j++)
        WHATEVER
However, I've been told that, in the style of the STL, it is generally better to return an iterator than to use loops as above. It would be very easy to get an iterator that looks at every pixel in the image, and it would even be easy to incorporate the border constraints, but I feel like including the border blows loose coupling out of the water.
So, the question is, should I return a special "border-excluding iterator", use the for loops, or is there a better way I haven't thought of?
Just to avoid things like "well, just use OpenCV, or VXL!" , I'm not actually writing an image class, I'm writing a difference-of-gaussian pyramid for use in a feature detector. That said, the same issues apply, and it was simpler to write two for loops than three or four.
To have something reusable, I'd go with a map function.
namespace your_imaging_lib {
    template <typename Fun>
    void transform (Image &img, Fun fun) {
        const size_t width = img.width(),
                     size = img.height() * img.width();
        Pixel *p = img.data();
        for (size_t s=0; s!=size; s+=width)
            for (size_t x=0; x!=width; ++x)
                p[x + s] = fun (p[x + s]);
    }
    template <typename Fun>
    void generate (Image &img, Fun fun) {
        // size is the total pixel count, because s advances by one row (width) per step
        const size_t width = img.width(),
                     size = img.height() * img.width();
        Pixel *p = img.data();
        for (size_t s=0, y=0; s!=size; s+=width, ++y)
            for (size_t x=0; x!=width; ++x)
                p[x + s] = fun (x, y);
    }
}
Some refinement needed. E.g., some systems like x, y to be in [0..1).
You can then use this like:
using namespace your_imaging_lib;
Image i = Image::FromFile ("foobar.png");
transform (i, [](Pixel const &p) { return Pixel::Monochrome(p.r()); });
or
generate (i, [](int x, int y) { return (x^y) & 0xFF; });
Iff you need knowledge of both coordinates (x and y), I guarantee this will give better performance compared to iterators which need an additional check for each iteration.
Iterators, on the other hand, will make your stuff usable with standard algorithms, like std::transform, and you could make them almost as fast if pixel positions are not needed and you do not have a big pitch in your data (pitch is for alignment, usually found on graphics hardware surfaces).
I suspect you should be using the visitor pattern instead-- instead of returning an iterator or some sort of collection of your items, you should pass in the operation to be done on each pixel/item to your data structure that holds the items, and the data structure should be able to apply that operation to each item. Whether your data structure uses for loops or iterators to traverse the pixel/whatever collection is hidden, and the operation is decoupled from the data structure.
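A minimal sketch of that idea, applied to the border case from the question. The Image struct and the forEachInside name are hypothetical stand-ins, not part of any existing library; the point is only that the loops and the border handling live inside the data structure while the per-pixel operation is passed in.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical image type; a real class would wrap one pyramid level.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> pixels; // row-major, width * height entries

    // Applies op(pixel, x, y) to every pixel that is at least rowBorder rows
    // and colBorder columns away from the edge. How the traversal is done
    // stays hidden from the caller.
    template <typename Op>
    void forEachInside(std::size_t rowBorder, std::size_t colBorder, Op op) {
        if (height <= 2 * rowBorder || width <= 2 * colBorder)
            return; // the border swallows the whole image
        for (std::size_t y = rowBorder; y < height - rowBorder; ++y)
            for (std::size_t x = colBorder; x < width - colBorder; ++x)
                op(pixels[y * width + x], x, y);
    }
};
```

A caller would then write something like img.forEachInside(ROW_BORDER, COL_BORDER, [](float &p, std::size_t x, std::size_t y) { /* WHATEVER */ }); and never see the loops.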
IMHO it sounds like a good idea to have an iterator that touches every pixel. However, it doesn't sound as appealing to me to include the border constraints inside of it. Maybe try to achieve something like:
IConstraint *bc=new BorderConstraint("blue-border");
for(pixel_iterator itr=img.begin(); itr!=img.end(); itr++) {
if(!bc->check(itr))
continue;
// do whatever
}
Where IConstraint is a base class that can be derived to make many different BorderConstraints.
My rationale is that iterators iterate in different ways, but I don't think they need to know about your business logic. That could be abstracted away into another design construct, as depicted via Constraints above.
In the case of bitmap data, it is noteworthy that the popular image manipulation APIs do not commonly use iterator-based algorithms or datasets. This should be a clue that iterators are hard to implement as efficiently as a regular 2D array. (thanks phresnel)
If you really require/prefer an iterator for your image sans border, you should invent a new concept to iterate. My suggestion would be something like an ImageArea.
class ImageArea: Image
{ int clipXLeft, clipXRight;
int clipYTop, clipYBottom;
public:
ImageArea(Image i, clipXTop ... )
And construct your iterator from there. The iterators can then be transparent to work with images or regions within an image.
On the other hand, a regular x/y index based approach is not a bad idea. Iterators are very useful for abstracting data sets, but they come with a cost when you implement them on your own.
So what I've got is a Grid class and a Tile class. At the moment, Grid contains a two-dimensional vector of Tiles (vector<vector<Tile>>). These Tiles hold info about their x, y and z (it's a top-down map) and, e.g., erosion rate etc.
My problem is that I need to efficiently access these tiles by their x/y coordinates, find the tile with the median z value (or any other quantile from 0 to 1, the median being 0.5) among all z coordinates (to set the sea level), and also loop through all of them from the highest z to the lowest (for creating an erosion map).
What would you suggest as the best data structure to hold these in, so that I can efficiently do everything I listed above, and maybe something else as well if I find out later that I need it? Right now I just create a temporary sorted structure or map to do the thing, copying all the tiles into it and working with it, which is really slow.
The options I've considered are map which doesn't have a direct access and is also always sorted which would make picking tiles by their x/y hard.
Then a single vector, which would allow direct access, but if I were to sort the tiles, the direct access would be pointless, because the position of a Tile in the vector would no longer be the same as its x + y * width.
Here is a small sample code:
class Grid {
public:
    class Tile {
    public:
        unsigned x;
        unsigned y;
        float z; // used for drawing height map
        static float seaLevel; // static value for all the tiles
        unsigned erosionLevel; // used for drawing erosion map
    };
    void setSeaLevel(float pos) {
        // set seaLevel to z of tile on pos from 0 to 1 in tile grid
    }
    void generateErosionMap() {
        // loop through all tiles from highest z to lowest z and set their erosion
    }
    void draw() {
        // loop through all tiles by their x/y and draw them
    }
    vector<vector<Tile>> tileGrid;
};
The C++ library provides a basic set of containers. Each container is optimized for access in a specific way.
When you have a requirement to be able to optimally access the same set of data in different ways, the way to do this is to combine several containers together, all referencing the same underlying data, with each container being used to locate a single chunk of data in one particular way.
Let's take two of your requirements, as an example:
Locate a Grid object based on its X and Y coordinates, and
Iterate over all Grids in monotonically increasing or decreasing order, by their z coordinates.
We can implement the first requirement by using a simple two-dimensional vector:
typedef std::vector<std::vector<std::shared_ptr<Grid>>> lookup_by_xy_t;
lookup_by_xy_t lookup_by_xy;
This is rather obvious, on its face value. But note that the vector does not store the actual Grids, but a std::shared_ptr to these objects. If you are not familiar with std::shared_ptrs, read up on them, and understand what they are.
This is fairly basic: you construct a new Grid:
auto g = std::make_shared<Grid>( /* arguments to Grid's constructor */);
// Any additional initialization...
//
// g->foo(); g->bar=4;
//
// etc...
and simply insert it into the lookup vector:
lookup_by_xy[g->x][g->y]=g;
Now, we handle your second requirement: being able to iterate over all these objects by their z coordinates:
typedef std::multimap<double, std::shared_ptr<Grid>> lookup_by_z_t;
lookup_by_z_t lookup_by_z;
This is assuming that your z coordinate is a double. The multimap will, by default, iterate over its contents in strict weak ordering according to the key, from lowest to the highest key. You can either iterate over the map backwards, or use the appropriate comparison class with the multimap, to order its keys from highest to lowest values.
Now, simply insert the same std::shared_ptr into this lookup container:
lookup_by_z.insert(std::make_pair(g->z, g));
Now, you can find each Grid object by either its x/y coordinate, or iterate over all objects by their z coordinates. Both of the two-dimensional vector, and the multimap, contain shared_ptrs to the same Grid objects. Either one can be used to access them.
Simply create other containers, as needed, to access the same underlying objects, in different ways.
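Put together, the two lookups might be wrapped up roughly as follows. This is only a sketch under assumptions: Tile here is a stripped-down stand-in for the stored object, and TileIndex, byXY, byZ and add are names made up for the illustration.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the object being indexed (assumed fields only).
struct Tile {
    unsigned x, y;
    double z;
};

// Two containers referencing the same underlying objects.
struct TileIndex {
    std::vector<std::vector<std::shared_ptr<Tile>>> byXY; // byXY[x][y]
    std::multimap<double, std::shared_ptr<Tile>> byZ;     // ascending z

    TileIndex(std::size_t w, std::size_t h)
        : byXY(w, std::vector<std::shared_ptr<Tile>>(h)) {}

    // Creates one object and registers the same pointer in both containers.
    void add(unsigned x, unsigned y, double z) {
        auto t = std::make_shared<Tile>(Tile{x, y, z});
        byXY[x][y] = t;
        byZ.insert(std::make_pair(z, t));
    }
};
```

One caveat with this layout: the multimap key is a copy of z taken at insertion time, so if a tile's z changes later, its entry in byZ has to be erased and re-inserted.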
Now, of course, all of this additional framework does impose some additional overhead, in terms of dynamic memory allocations, and the overhead for each container itself. There is no free lunch. A custom allocator might become necessary if the amount of raw data becomes an issue.
So after asking this question on my university and getting bit deeper explanation, I've come to this solution.
If you need a data structure that supports various access methods (in my case, direct access by x/y, linear access through sorted z, etc.), the best solution is to make your own class for handling it. Also, shared_ptr is slower than unique_ptr because of its reference counting, and shouldn't be used unless shared ownership is actually necessary. So in my case the implementation would look something like this:
#ifndef TILE_GRID_H
#define TILE_GRID_H
#include "Tile.h"
#include <algorithm> // std::sort
#include <cmath>     // std::fmax, std::fmin
#include <cstddef>
#include <memory>
#include <vector>
using Matrix = std::vector<std::vector<std::unique_ptr<Tile>>>;
using Sorted = std::vector<Tile*>;
class TileGrid {
public:
    TileGrid(unsigned w, unsigned h) : width(w), height(h) {
        // Resize _directAccess to the desired size
        _directAccess.resize(height);
        for (unsigned j = 0; j < height; ++j)
            for (unsigned i = 0; i < width; ++i)
                _directAccess[j].push_back(std::make_unique<Tile>(i, j));
        // Link _sortedZ to _directAccess
        for (auto& i : _directAccess)
            for (auto& j : i)
                _sortedZ.push_back(j.get());
    }
    // Sorts the data by its z value, highest first
    void sortZ() {
        std::sort(_sortedZ.begin(), _sortedZ.end(), [](Tile* a, Tile* b) { return b->z < a->z; });
    }
    // Operator to read directly from this container
    Tile& operator()(unsigned x, unsigned y) {
        return *_directAccess[y][x];
    }
    // Operator returning the tile at relative position level (0 to 1) in the sorted tiles
    // (in my case used for setting sea level)
    Tile& operator()(float level) {
        level = std::fmax(std::fmin(level, 1.0f), 0.0f);
        // Clamp the index so that level == 1 does not read one past the end
        const std::size_t last = _sortedZ.size() - 1;
        const std::size_t i = static_cast<std::size_t>(_sortedZ.size() * level);
        return *_sortedZ[i < last ? i : last];
    }
    // Iterators
    auto begin() { return _sortedZ.begin(); }
    auto end() { return _sortedZ.end(); }
    auto rbegin() { return _sortedZ.rbegin(); }
    auto rend() { return _sortedZ.rend(); }
    const unsigned width; // x dimension
    const unsigned height; // y dimension
private:
    Matrix _directAccess;
    Sorted _sortedZ;
};
#endif // TILE_GRID_H
You could also use a template, but in my case I only needed this for the Tile class. So as you can see, the main _directAccess matrix holds all the unique_ptrs, while _sortedZ has only raw pointers to the data stored in _directAccess. This is much faster, and it is also safe because these pointers are tied to one class and all of them are deleted at the same time. I've also added overloaded () operators for accessing the data and reused the iterators from the _sortedZ vector. And again, the width and height are const only because of the intended usage of this data structure (not resizable, immovable tiles etc.).
If you have any questions or suggestions on what to improve, feel free to comment.
First, some background:
I'm working on a project which requires me to simulate interactions between objects that can be thought of as polygons (usually triangles or quadrilaterals, almost certainly fewer than seven sides), each side of which is composed of the radii of two circles with a variable (and possibly zero) number of 'rivers' of various constant widths passing between them, and out of the polygon through some other side. As these rivers and circles and their widths (and the positions of the circles) are specified at runtime, one of these polygons with N sides and M rivers running through it can be completely described by an array of N+2M pointers, each referring to the relevant rivers/circles, starting from an arbitrary corner of the polygon and passing around (in principle, since rivers can't overlap, they should be specifiable with less data, but in practice I'm not sure how to implement that).
I was originally programming this in Python, but quickly found that for more complex arrangements performance was unacceptably slow. In porting this over to C++ (chosen because of its portability and compatibility with SDL, which I'm using to render the result once optimization is complete) I am at somewhat of a loss as to how to deal with the polygon structure.
The obvious thing to do is to make a class for them, but as C++ lacks even runtime-sized arrays or multi-type arrays, the only way to do this would be with a ludicrously cumbersome set of vectors describing the list of circles, rivers, and their relative placement, or else an even more cumbersome 'edge' class of some kind. Rather than this, it seems like the better option is to use a much simpler, though still annoying, vector of void pointers, each pointing to the rivers/circles as described above.
Now, the question:
If I am correct, the proper way to handle the relevant memory allocations here with the minimum amount of confusion (not saying much...) is something like this:
int doStuffWithPolygons(){
std::vector<std::vector<void *>> polygons;
while(/*some circles aren't assigned a polygon*/){
std::vector<void *> polygon;
void *start = &/*next circle that has not yet been assigned a polygon*/;
void *lastcircle = start;
void *nextcircle;
nextcircle = &/*next circle to put into the polygon*/;
while(nextcircle != start){
polygon.push_back(lastcircle);
std::vector<River *> rivers = /*list of rivers between last circle and next circle*/;
for(unsigned i = 0; i < rivers.size(); i++){
polygon.push_back(rivers[i]);
}
lastcircle = nextcircle;
nextcircle = &/*next circle to put into the polygon*/;
}
polygons.push_back(polygon);
}
int score = 0;
//do whatever you're going to do to evaluate the polygons here
return score;
}
int main(){
int bestscore = 0;
std::vector<int> bestarrangement; //contains position of each circle
std::vector<int> currentarrangement = /*whatever arbitrary starting arrangement is appropriate*/;
while(/*not done evaluating polygon configurations*/){
//fiddle with current arrangement a bit
int currentscore = doStuffWithPolygons();
if(currentscore > bestscore){
bestscore = currentscore;
bestarrangement = currentarrangement;
}
}
//somehow report what the best arrangement is
return 0;
}
If I properly understand how this stuff is handled, I shouldn't need any delete or .clear() calls because everything goes out of scope after the function call. Am I correct about this? Also, is there any part of the above that is needlessly complex, or else is insufficiently complex? Am I right in thinking that this is as simple as C++ will let me make it, or is there some way to avoid some of the roundabout construction?
And if your response is going to be something like 'don't use void pointers' or 'just make a polygon class', unless you can explain how it will make the problem simpler, save yourself the trouble. I am the only one who will ever see this code, so I don't care about adhering to best practices. If I forget how/why I did something and it causes me problems later, that's my own fault for insufficiently documenting it, not a reason to have written it differently.
edit
Since at least one person asked, here's my original python, handling the polygon creation/evaluation part of the process:
#lots of setup stuff, such as the Circle and River classes
def evaluateArrangement(circles, rivers, tree, arrangement): #circles, rivers contain all the circles, rivers to be placed. tree is a class describing which rivers go between which circles, unrelated to the problem at hand. arrangement contains (x,y) position of the circles in the current arrangement.
    polygons = []
    unassignedCircles = range(len(circles))
    while unassignedCircles:
        polygon = []
        start = unassignedCircles[0]
        lastcircle = start
        lastlastcircle = start
        nextcircle = getNearest(start,arrangement)
        unassignedCircles.pop(start)
        unassignedCircles.pop(nextcircle)
        while nextcircle != start:
            polygon += [lastcircle]
            polygon += getRiversBetween(tree, lastcircle,nextcircle)
            lastlastcircle = lastcircle
            lastcircle = nextcircle
            nextcircle = getNearest(lastcircle,arrangement,lastlastcircle) #the last argument here guarantees that the new nextcircle is not the same as the last lastcircle, which it otherwise would have been guaranteed to be.
            unassignedCircles.pop(nextcircle)
        polygons += [polygon]
    return EvaluatePolygons(polygons,circles,rivers) #defined outside.
Void as template argument must be lower case. Other than that it should work, but I also recommend using a base class for that. With a smart pointer you can let the system handle all the memory management.
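To make the base-class suggestion concrete, here is a rough sketch. Element, Circle, River, Polygon, the width() member and totalWidth are all assumptions made up for the example, not something taken from the question's code; the real classes would carry whatever the evaluation needs.

```cpp
#include <memory>
#include <vector>

// Hypothetical common base for everything a polygon can refer to.
struct Element {
    virtual ~Element() = default;
    virtual double width() const = 0; // assumed: what each piece contributes to a side
};

struct Circle : Element {
    double radius;
    explicit Circle(double r) : radius(r) {}
    double width() const override { return radius; }
};

struct River : Element {
    double w;
    explicit River(double w_) : w(w_) {}
    double width() const override { return w; }
};

// A polygon is a typed list of shared elements instead of void pointers.
using Polygon = std::vector<std::shared_ptr<Element>>;

// Example of work that needs no casts because the element type is known.
double totalWidth(const Polygon& p) {
    double sum = 0.0;
    for (const auto& e : p)
        sum += e->width();
    return sum;
}
```

Compared with std::vector<void *>, nothing has to be cast back to its real type, and the shared_ptr copies release the circles and rivers once the last polygon holding them goes out of scope.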
I am wondering about the best way of accessing data in a Mat in OpenCV. As you know, we can access the data in many ways. I want to store an image (Width x Height x 1-depth) in a Mat and loop over each pixel in the image. Is it best to use ptr<>(irow) to get a row pointer and then access each column in the row? Or to use at<>(irow,jcol)? Or to calculate the index directly with index = irow*Width + jcol? Does anyone know the reason?
Thanks in advance
You can find information here in the documentation: the basic image container and how to scan images.
I advise you to practice with at (here) if you are not experienced with OpenCV or with the hell of C language types. But the fastest way is ptr, as Nolwenn's answer says, because you avoid the type checking.
at<T> does a range check at every call, thus making it slower than ptr<T>, but safer.
So, if you're confident that your range calculations are correct and you want the best possible speed, use ptr<T>.
I realize this is an old question, but I think the current answers are somewhat misleading.
Calling both at<T>(...) and ptr<T>(...) will check the boundaries in the debug mode. If the _DEBUG macro is not defined, they will basically calculate y * width + x and give you either the pointer to the data or the data itself. So using at<T>(...) in release mode is equivalent to calculating the pointer yourself, but safer because calculating the pointer is not just y * width + x if the matrix is just a sub-view of another matrix. In debug mode, you get the safety checks.
I think the best way is to process the image row by row, getting the row pointer using ptr<T>(y) and then using p[x]. This has the benefit that you don't have to account for the various data layouts and still get a plain pointer for the inner loop.
You can use plain pointers all the way, which would be most efficient because you avoid one multiplication per row, but then you need to use step1(i) to advance the pointer. I think that using ptr<T>(y) is a nice trade-off.
The official documentation suggests that the most efficient way is to get the pointer to the row first, and then just use the plain C operator []. It also saves a multiplication for each iteration.
// compute sum of positive matrix elements
// (assuming that M is a double-precision matrix)
double sum=0;
for(int i = 0; i < M.rows; i++)
{
const double* Mi = M.ptr<double>(i);
for(int j = 0; j < M.cols; j++)
sum += std::max(Mi[j], 0.);
}
I was wondering if there's a neater (or better yet, more efficient) method of summing the values of a vector or (asymmetric) matrix that are pointed to by a collection of indices. (If the matrix had structure such as symmetry, that could of course be exploited in the loop, but it is not that pertinent to my question.) Basically, this code could be used to calculate, say, the cost of a route through a 2D matrix. I'm looking for a way to utilize the CPU, not the GPU.
Here's some relevant code; the case I'm more interested in is the first one. I was thinking it's possible to use std::accumulate with a lambda that captures the indices vector, but then I got to wondering whether there's already a neater way, perhaps with some other operator. It's not a "real problem", as the loop is quite clear for my tastes too, but I'm on the hunt for the super-neat or more efficient one-liner...
template<typename out_type>
out_type sum(std::vector<float> const& matrix, std::vector<int> const& indices)
{
out_type cost = 0;
for(decltype(indices.size()) i = 0; i < indices.size() - 1; ++i)
{
const int index = indices.size() * indices[i] + indices[i + 1];
cost += matrix[index];
}
const int index = indices.size() * indices[indices.size() - 1] + indices[0];
cost += matrix[index];
return cost;
}
template<typename out_type>
out_type sum(std::vector<std::vector<float>> const& matrix, std::vector<int> const& indices)
{
out_type cost = 0;
for(decltype(indices.size()) i = 0; i < indices.size() - 1; i++)
{
cost += matrix[indices[i]][indices[i + 1]];
}
cost += matrix[indices[indices.size() - 1]][indices[0]];
return cost;
}
Oh, and PPL/TBB are fair game too.
Edit
As an afterthought and as commented to John, would there be a place to employ std::common_type in the calculation as the input and output types may differ? This is a bit of hand-waving and more like learning techniques and libraries. A form of code kata, if you will.
Edit 2
Now, there's one option to make the loops faster, explained in blog writing How to process a STL vector using SSE code by a blogger theowl84. The code uses __m128 directly, but I wonder if there's something in DirectXMath library too.
Edit 3
Now, after writing some concrete code, I found std::accumulate wouldn't get me far. Or at least I couldn't find a way to do the [indices[i + 1] part in matrix[indices[i]][indices[i + 1]]; in a neat way, as std::accumulate itself gives access to only the current value and the sum. In that light, it looks like novelocrat's approach would be the most fruitful one.
DeadMG proposed using parallel_reduce with associativity caveats, further commented on by novelocrat. I didn't check whether I could use parallel_reduce, as the interface looked somewhat cumbersome for a quick try. Other than that, even though my code executes serially, it would suffer from the same floating-point summation issues as the parallel reduction version, though I think the parallel version could be (much) more unpredictable than the serial version.
This goes on a tangent, but it may be of interest to some stumbling here: those who have read this far may be (very) interested in the article Wandering Precision in The NAG blog, which details some intricacies even introduced by hardware instruction re-ordering! Then there are some ruminations about this very issue in a distributed setting in #AltDevBlogADay Synchronous RTS Engines and a Tale of Desyncs. Also, ACCU (the general mailing list is excellent, by the way, and it's free to join) features several articles (e.g. this) on floating point accuracy. As a tangent to the tangent, I found Fernando Cacciola's Robustness issues in geometric computing to be a good article to read, originally from the ACCU mailing list.
And then there is std::common_type. I couldn't find a use for it. If I had two different types as parameters, then the return type could/should be decided by std::common_type. Perhaps more pertinent is std::is_convertible with static_assert, to make sure the desired result type is convertible from the argument types (with a clean error message). Other than that, I can only think of a check that the accuracy of the return value/intermediate calculation is sufficient to represent the result of the summation without overflows and things like that, but I haven't come across a standard facility for that.
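For what it's worth, the std::is_convertible check could look roughly like this. It is only a sketch: routeCost and the in_type parameter are names invented here, and the loop folds the wrap-around step into the modulo instead of handling it after the loop.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Sums matrix[indices[i]][indices[i + 1]] around the closed route.
// in_type is deduced from the matrix; out_type is chosen by the caller.
template <typename out_type, typename in_type>
out_type routeCost(std::vector<std::vector<in_type>> const& matrix,
                   std::vector<int> const& indices)
{
    static_assert(std::is_convertible<in_type, out_type>::value,
                  "matrix element type must be convertible to the result type");
    out_type cost = 0;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        // the modulo connects the last index back to the first
        const int from = indices[i];
        const int to = indices[(i + 1) % indices.size()];
        cost += matrix[from][to];
    }
    return cost;
}
```

Calling routeCost<double>(matrix, indices) on a float matrix compiles, while an unrelated result type would stop at the static_assert with the message above. Note that this only checks convertibility, not whether the result type has enough range or precision.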
That's about it, I think, ladies and gentlemen. I enjoyed myself, and I hope those reading this got something out of it too.
You could produce an iterator that takes matrix and indices and yields the appropriate values.
class route_iterator
{
vector<vector<float>> const& matrix;
vector<int> const& indices;
int i;
public:
route_iterator(vector<vector<float>> const& matrix_, vector<int> const& indices_,
int begin = 0)
: matrix(matrix_), indices(indices_), i(begin)
{ }
float operator*() {
return matrix[indices[i]][indices[(i + 1) % indices.size()]];
}
route_iterator& operator++() {
++i;
return *this;
}
};
Then your accumulate runs from route_iterator(matrix, indices) to route_iterator(matrix, indices, indices.size()).
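As written, the class above still lacks the comparison operators and the nested typedefs that std::accumulate and std::iterator_traits expect. A fuller sketch might look like the following; the added members and the accumulateRoute helper are my own assumptions about how to complete it, and it stores pointers instead of references so that the iterator stays copy-assignable.

```cpp
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

class route_iterator {
    std::vector<std::vector<float>> const* matrix;
    std::vector<int> const* indices;
    std::size_t i;
public:
    // Typedefs read by std::iterator_traits
    using iterator_category = std::input_iterator_tag;
    using value_type = float;
    using difference_type = std::ptrdiff_t;
    using pointer = float const*;
    using reference = float;

    route_iterator(std::vector<std::vector<float>> const& matrix_,
                   std::vector<int> const& indices_, std::size_t begin = 0)
        : matrix(&matrix_), indices(&indices_), i(begin) {}

    // Cost of the step from indices[i] to its successor, wrapping at the end
    float operator*() const {
        return (*matrix)[(*indices)[i]][(*indices)[(i + 1) % indices->size()]];
    }
    route_iterator& operator++() { ++i; return *this; }
    route_iterator operator++(int) { route_iterator old = *this; ++i; return old; }
    bool operator==(route_iterator const& o) const { return i == o.i; }
    bool operator!=(route_iterator const& o) const { return i != o.i; }
};

// Hypothetical helper showing the accumulate call described above.
inline float accumulateRoute(std::vector<std::vector<float>> const& matrix,
                             std::vector<int> const& indices) {
    return std::accumulate(route_iterator(matrix, indices),
                           route_iterator(matrix, indices, indices.size()),
                           0.0f);
}
```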
Admittedly, though, this sequentializes without a smart compiler turning it into something parallel. What you really want are parallel map and fold (accumulate) operations.
out_type cost = 0;
for(decltype(indices.size()) i = 0; i < indices.size() - 1; i++)
{
cost += matrix[indices[i]][indices[i + 1]];
}
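This is basically std::accumulate. PPL provides (and so does TBB, if I recall) parallel_reduce. This requires associativity but not commutativity. + over the reals and the integers is associative; over floating-point values it is only approximately so, so a parallel reduction may give a slightly different result than the serial loop.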
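Staying serial for a moment, one way to express the same loop with a standard algorithm is std::inner_product over the index list and the same list shifted by one, with the usual multiplication replaced by a matrix lookup. This is a swapped-in alternative to parallel_reduce, not an example of it, and routeCostInnerProduct is a name made up for the sketch.

```cpp
#include <functional>
#include <numeric>
#include <vector>

inline float routeCostInnerProduct(std::vector<std::vector<float>> const& matrix,
                                   std::vector<int> const& indices) {
    if (indices.empty())
        return 0.0f;
    // Pairs indices[i] with indices[i + 1]; the "product" is a lookup,
    // and the "sum" is ordinary addition.
    const float cost = std::inner_product(
        indices.begin(), indices.end() - 1, indices.begin() + 1, 0.0f,
        std::plus<float>(),
        [&matrix](int from, int to) { return matrix[from][to]; });
    // Close the route from the last index back to the first.
    return cost + matrix[indices.back()][indices.front()];
}
```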
The problem is pretty basic.
(I am puzzled why the search didn't find anything)
I have a rectangular "picture" that stores its pixel colors line after line in a std::vector.
I want to copy a rectangular region out of that picture.
How would I elegantly code this in c++?
My first try:
template <class T> std::vector<T> copyRectFromVector(const std::vector<T>& vec, std::size_t startx, std::size_t starty, std::size_t endx, std::size_t endy, std::size_t fieldWidth, std::size_t fieldHeight)
{
using namespace std;
vector<T> ret((endx-startx)*(endy-starty)+10); // 10: chickenfactor
// checks if the given parameters make sense:
if (vec.size() < fieldWidth*endy)
{
cerr << "Error: CopyRectFromVector: vector to small to contain rectangular region!" << std::endl;
return ret;
}
// do the copying line by line:
vector<T>::const_iterator vecIt = vec.begin();
vector<T>::forward_iterator retIt = ret.end();
vecIt += startx + (starty*fieldWidth);
for(int i=starty; i < endy; ++i)
{
std::copy(vecIt, vecIt + endx - startx, retIt);
}
return ret;
}
does not even compile.....
Addit: Clarification:
I know how to do this "by hand". It is not a problem as such. But I would love some c++ stl iterator magic that does the same, but faster and... more c++ stylish.
Addition: I give the algorithm the pictureDataVector, the width and height of the picture and a rectangle denoting the region that I want to copy out of the picture.
The return value should be a new vector with the contents of the rectangle.
Think of it as opening your favorite image editor, and copy a rectangular region out of that.
The Picture is stored as a long 1D array(vector) of pixelcolors.
for (int r = startRow; r < endRow; r++)
for (int c = startCol; c < endCol; c++)
rect[r-startRow][c-startCol] = source[r*rowWidth+c];
Basically the same idea, except that it compiles and is a bit more iteratory:
#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>
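template <typename I, typename O>
void copyRectFromBiggerRect(
    I input,
    O output,
    std::size_t startx,
    std::size_t cols,
    std::size_t starty,
    std::size_t rows,
    std::size_t stride
) {
    std::advance(input, starty*stride + startx);
    while(rows--) {
        // std::advance instead of input+cols, so non-random-access iterators work too
        I rowEnd = input;
        std::advance(rowEnd, cols);
        // keep the returned iterator so plain output iterators continue after this row
        output = std::copy(input, rowEnd, output);
        // don't step past the end of the data after the final row
        if (rows) std::advance(input, stride);
    }
}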
template<typename T>
std::vector<T> copyRectFromVector (
const std::vector<T> &vec,
std::size_t startx,
std::size_t starty,
std::size_t endx,
std::size_t endy,
std::size_t stride
) {
// parameter-checking omitted: you could also check endx > startx etc.
const std::size_t cols = endx - startx;
const std::size_t rows = endy - starty;
std::vector<T> ret;
ret.reserve(rows*cols);
std::back_insert_iterator<std::vector<T> > output(ret);
typename std::vector<T>::const_iterator input = vec.begin();
copyRectFromBiggerRect(input,output,startx,cols,starty,rows,stride);
return ret;
}
int main() {
std::vector<int> v(20);
for (int i = 0; i < 20; ++i) v[i] = i;
std::vector<int> v2 = copyRectFromVector(v, 0, 0, 1, 2, 4);
std::copy(v2.begin(), v2.end(), std::ostream_iterator<int>(std::cout, "\n"));
}
I wouldn't expect this to be any faster than two loops copying by index. Probably slower, even, although it's basically a race between the overhead of vector::push_back and the gain from std::copy over a loop.
However, it might be more flexible if your other template code is designed to work with iterators in general, rather than vector as a specific container. copyRectFromBiggerRect can use an array, a deque, or even a list as input just as easily as a vector, although it's currently not optimal for iterators which aren't random-access, since it currently advances through each copied row twice.
For other ways of making this more like other C++ code, consider boost::multi_array for multi-dimensional arrays (in which case the implementation would be completely different from this anyway), and avoid returning collections such as vector by value (firstly it can be inefficient if you don't get the return-value optimisation, and second so that control over what resources are allocated is left at the highest level possible).
Your question asks for a C++ way of copying a rectangular field of elements in some container. You have a fairly close example of doing so and will get more in the answers. Let's generalize, though:
You want an iterator that travels a rectangular range of elements over some range of elements. So, how about writing a sort of adapter that sits on any container and provides this special iterator?
Gonna go broad strokes with the code here:
vector<pixels> my_picture;
point selTopLeft(10,10), selBotRight(40, 50);
int picWidth(640), picHeight(480);
rectangular_selection<vector<pixels> > selection1(my_picture.begin(),
my_picture.end(), picWidth, picHeight, selTopLeft, selBotRight);
// Now you can use stl algorithms on your rectangular range
vector<pixels> rect_copy(selection1.begin(), selection1.end());
// or maybe you don't want to copy, you want
// to modify the selection in place
std::for_each (selection1.begin(), selection1.end(), invert_color);
I'm sure this is totally do-able, but I'm not comfortable coding stl-style template stuff off-the-cuff. If I have some time and you're interested, I may re-edit a rough-draft later, since this is an interesting concept.
See this SO question's answer for inspiration.
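A rough draft of such an adapter, under simplifying assumptions: it only works on a std::vector stored row by row, takes plain coordinates instead of point objects, and the RectSelection name and constructor signature are invented here and differ from the broad-strokes usage above.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical adapter: walks row by row over the rectangle
// [x0, x1) x [y0, y1) of a row-major buffer with `stride` elements per row.
template <typename T>
class RectSelection {
public:
    class iterator {
        T* base_;
        std::size_t stride_, cols_, n_;
    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        using pointer = T*;
        using reference = T&;

        iterator() : base_(nullptr), stride_(0), cols_(0), n_(0) {}
        iterator(T* base, std::size_t stride, std::size_t cols, std::size_t n)
            : base_(base), stride_(stride), cols_(cols), n_(n) {}

        // n_ counts elements inside the rectangle; map it back to the buffer
        reference operator*() const {
            return base_[(n_ / cols_) * stride_ + n_ % cols_];
        }
        iterator& operator++() { ++n_; return *this; }
        iterator operator++(int) { iterator old = *this; ++n_; return old; }
        bool operator==(iterator const& o) const { return n_ == o.n_; }
        bool operator!=(iterator const& o) const { return n_ != o.n_; }
    };

    RectSelection(std::vector<T>& data, std::size_t stride,
                  std::size_t x0, std::size_t y0, std::size_t x1, std::size_t y1)
        : base_(data.data() + y0 * stride + x0), stride_(stride),
          cols_(x1 - x0), rows_(y1 - y0) {}

    iterator begin() const { return iterator(base_, stride_, cols_, 0); }
    iterator end() const { return iterator(base_, stride_, cols_, rows_ * cols_); }

private:
    T* base_;
    std::size_t stride_, cols_, rows_;
};
```

With this, std::vector<int> copy(sel.begin(), sel.end()); copies the region, and std::for_each(sel.begin(), sel.end(), f) modifies it in place. The division and modulo per dereference are the kind of per-element cost that makes such iterators slower than two plain loops.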
Good C++ code must first be easy to read and understand (just like any code), object oriented (like any code in an object oriented language) and then should use the language facilities to simplify the implementation.
I would not worry about using STL algorithms to make it look more C++-ish; it would be much better to start by simplifying the usability (interface) in an object oriented way. Do not use plain vectors externally to represent your images. Provide a level of abstraction: make a class that represents the image and provide the functionality you need in there. That will improve usability by encapsulating details from the regular use (the 2D area object can know its dimensions, so the user does not need to pass them as arguments). And this will make the code more robust, as the user can make fewer mistakes.
Even if you use STL containers, always consider readability first. If it is simpler to implement in terms of a regular for loop and it will be harder to read with STL algorithms, forget them: make your code simple and maintainable.
That should be your focus: making better, simpler, more readable code. Use language features to improve your code, not your code to exercise or show off the features in the language. It will pay off if you need to maintain that code two months from now.
Note: Using more STL will not make your code more idiomatic in C++, and I believe this is one of those cases. Abusing STL can make the code actually worse.