Am I checking for a white pixel correctly? - c++

Cross posting as this may be more of a C++ question than a robotics one.
I am currently going through all the pixels in an image to determine which ones are white. I then have to decide where to drive the bot. I am using the sensor_msgs/Image.msg that I get from the /camera/rgb/image_raw topic.
However, I can't seem to locate any white pixel with this code, even though the RGBA values I set in my model in Gazebo are all 1, as shown in the image below the code.
I logged all my values (more than once) with ROS_INFO_STREAM, but no values are 255, let alone 3 consecutive ones.
void process_image_callback(const sensor_msgs::Image img)
{
    const int white_pixel = 255;
    const int image_slice_width = img.step / 3;
    int j = 0;
    bool found = false;

    for (int i = 0; not found and i < img.height; i++)
    {
        for (j; j < img.step-3; j += 3)
        {
            if (img.data[i*img.step + j] == white_pixel)
            {
                ROS_INFO_STREAM("img.data[i*img.step + (j + 0)]" + std::to_string(img.data[i*img.step + (j + 0)]));
                ROS_INFO_STREAM("img.data[i*img.step + (j + 1)]" + std::to_string(img.data[i*img.step + (j + 1)]));
                ROS_INFO_STREAM("img.data[i*img.step + (j + 2)]" + std::to_string(img.data[i*img.step + (j + 2)]));
            }
            // img.data only has one index
            if (img.data[i*img.step + j] == white_pixel and
                img.data[i*img.step + (j + 1)] == white_pixel and
                img.data[i*img.step + (j + 2)] == white_pixel)
            {
                found = true;
                break;
            }
        }
        ROS_INFO_STREAM("End of j loop");
    }

    if (found)
    {
        // go left, forward or right
    }
    else
    {
        // no white pixel seen so stop the bot
    }
}

I'd suggest making your own custom structures such as these. The code below is only pseudo-code meant to illustrate the overall concept, not perfect syntax.
Having your own custom classes and structures allows you to parse various file types and image formats into a data structure that is designed to work with your application.
Here, you would have a custom color structure that can be templated so that the color components can be either integral or floating-point types. An external function that takes a Color object then checks whether it meets the criteria for being white: if the r, g, b channels are all 255 (or 1.0) and the alpha channel is not 0, the pixel is white!
I also provided specializations of the function template to work with both integral and floating-point Color objects. Again, the syntax isn't perfect since this is only pseudo-code, but implementing these classes and structures should be trivial.
If the color encoding is not "RGBA", you will have to figure out the actual encoding of the color channels and convert it to an "RGBA" format! Once this is done, we can apply this to any pixel data.
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

template<typename T>
struct Color {
    T r;
    T g;
    T b;
    T a;

    Color() : r{0}, g{0}, b{0}, a{1}
    {}

    Color(T red, T green, T blue, T alpha = T{1}) :
        r{red}, g{green}, b{blue}, a{alpha}
    {}
};

template<typename T, typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
bool isWhite(const Color<T>& color) {
    return ( (color.r == 255) && (color.g == 255) &&
             (color.b == 255) && (color.a != 0) );
}

template<typename T, typename std::enable_if<std::is_floating_point<T>::value, int>::type = 0>
bool isWhite(const Color<T>& color) {
    return ( (color.r == 1.0) && (color.g == 1.0) &&
             (color.b == 1.0) && (color.a != 0.0) );
}

class Image {
private:
    std::string filename_;
    std::string encoding_;
    uint32_t width_{};
    uint32_t height_{};
    uint32_t step_{};
    uint8_t bigEndian_{};
    std::vector<Color<uint8_t>> pixelData_; // Where Color structures are populated
                                            // from the file's `uint8[] data` matrix.
public:
    explicit Image(const std::string& filename) {
        // Open the file, parse its contents from the header information,
        // and populate your internal data structure with the needed
        // information from the file.
    }

    uint32_t width() const { return width_; }
    uint32_t height() const { return height_; }
    uint32_t stride() const { return step_; }
    std::string getEncoding() const { return encoding_; }
    std::string getFilename() const { return filename_; }
    const std::vector<Color<uint8_t>>& getPixelData() const { return pixelData_; }
};
Then, somewhere else in your code, you would process the pixel data from the image:
void processImage(const Image& image) {
    for (const auto& pixel : image.getPixelData()) {
        if (isWhite(pixel)) {
            // Do something
        } else {
            // Do something different
        }
    }
}
This should make things easier to work with since the Image object is of your own design. The hardest part would be writing the file loader/parser to obtain all of the information from the source file format and convert it to your own.
I've done this quite a bit since I work with 3D graphics using DirectX, OpenGL, and now Vulkan. In the beginning, I never relied on 3rd-party libraries to load image or texture files; I originally wrote my own loaders and parsers to accept TGAs, BMPs, PNGs, etc., and I had a single Texture or Image class that could be created from any of those file formats.
This might help you out in the long run. What if you want to extend your application to use different cameras? All you would have to do is write a different file loader for the next camera type: parse its data structures and convert them into your own custom data structure and format. You end up with a plug-and-play system, so to speak, and can easily extend the types your application supports.

Related

Constructing a Grid map based on tile indices for filtering movement in SDL2

A friend and I are building a basic game engine as part of a course, and we are now in the process of implementing a grid map. The grid map will act as a dev tool for checking the movement-filtering logic that is based on tile type, i.e., SOLID, EMPTY, etc.
We are using the Tiled editor to extract our tile sets and our map info in the form of .csv and .json files. We have managed to load, construct and render the tile map successfully, while also using memoization for the tile indices. Next up, we must load, construct and render a grid map (4×4 per tile) based on the tile indices that we got from Tiled.
I'm trying to figure out the logic for this process, and here is what I have so far. Please correct me if I'm wrong:
1. Somehow, determine the various tile types dynamically for the various tile sets that the user/dev may want to use with the engine.
2. While parsing the tile indices and constructing the tile map, generate the 4×4 grid elements for each tile. While doing this, keep track of each grid element's associated tile index, its position within the tile, and its type.
3. Generate a generic texture for the "solid" tiles that will be rendered on top of the actual tile of the tile map (i.e., red outline and transparent center).
4. Render the tiles from (3) on top of the current tile map.
Here come the questions:
— Is there a setting in Tiled that I'm not aware of that can generate the tile types based on their index so that we can achieve (1) dynamically and allow the user/dev to provide any tile set? If not, is the only other way to manually review the tile set and determine the mappings? For example, if we have 30 unique tiles, would the only option be to manually write a map in order to store the tile's Index and its respective type?
— Regarding (2), in our lectures we saw a naive example of how we can use the grid elements (16 per tile) to detect collisions and filter movement. That example included parsing each pixel of each tile, extracting the color percentages and determining if that segment of the tile is a solid or not (i.e., mostly white means empty). Just want to confirm my assumption that this seems like an awful solution for determining which parts of a tile should be collide-able, right?
— For the rendering part, assuming that we have already rendered the tile map on the window's surface, should we use SDL_RenderCopy to draw the grid element outlines over the solid tiles?
Here's our tile map parsing (based on the .csv file) & memoization logic:
TileMap class:
#define MAX_WIDTH 192
#define MAX_HEIGHT 336

typedef unsigned short Dim;
typedef short Index;
typedef unsigned char byte;

typedef struct {
    Dim x, y;
} Tile;

class TileMap {
public:
    explicit TileMap(const std::string& mapPath);

    std::vector<std::string> tilesetPaths;
    std::vector<std::string> csvLayers;
    std::map<Index, std::pair<unsigned short, unsigned short>> preCachedIndices;
    std::vector<Index> indices;
    unsigned short tilesetHeight{}, tilesetWidth{}; // number of tile rows/columns
    unsigned short tileHeight{}, tileWidth{};
    unsigned short mapWidth{}, mapHeight{};

    Tile GetTile(Dim row, Dim col);
    void WriteTextMap(const Json::Value& data, const std::string& fp);
    bool preCacheIndices(const std::string& inputMapPath);
    void renderMap(SDL_Window* window, SDL_Rect& cameraOffset);
    void printMapInfo();
    void initMapInfo(const std::string& path);
};
parsing & memoization logic:
bool TileMap::preCacheIndices(const std::string& inputMapPath) {
    std::ifstream file;
    std::string line;

    file.open(inputMapPath.c_str(), std::ios::in);
    if (!file) {
        std::cerr << "Error opening file: " << std::strerror(errno) << "\n";
        return false;
    }

    while (std::getline(file, line)) {
        std::stringstream ss(line);
        std::string item;
        while (std::getline(ss, item, ',')) {
            Index index = std::stoi(item);
            indices.push_back(index);
            // apply memoization to speed the process up (O(1) access time for already existing keys)
            if (preCachedIndices.find(index) == preCachedIndices.end()) { // index has not been cached yet
                unsigned short row = index / tilesetWidth;
                unsigned short col = index % tilesetWidth;
                std::pair<unsigned short, unsigned short> p = std::pair<unsigned short, unsigned short>(row, col);
                preCachedIndices.insert(std::pair<Index, std::pair<unsigned short, unsigned short>>{index, p});
            }
        }
    }
    file.close();
    return true;
}
rendering logic for the tile map:
void TileMap::renderMap(SDL_Window* window, SDL_Rect& cameraOffset) {
    SDL_Surface *screenSurface, *tileset;

    screenSurface = SDL_GetWindowSurface(window);
    if (!screenSurface) {
        printf("Could not initialize screen surface. Error message: %s", SDL_GetError());
    }

    tileset = IMG_Load(tilesetPaths.at(0).c_str()); // load tileset bitmap
    SDL_FillRect(screenSurface, nullptr, SDL_MapRGB(screenSurface->format, 0xFF, 0xFF, 0xFF));

    int row = 0;
    int col = 0;
    for (const auto& index : indices) {
        SDL_Rect src, dest;
        src.x = preCachedIndices.at(index).second * tileWidth;
        src.y = preCachedIndices.at(index).first * tileHeight;
        src.w = tileWidth;  // tileManager.tilemapWidth;
        src.h = tileHeight; // tileManager.tilemapHeight;

        dest.x = col * tileWidth + cameraOffset.x;
        dest.y = row * tileHeight - cameraOffset.y;
        dest.w = tileWidth;
        dest.h = tileHeight;

        SDL_BlitSurface(tileset, &src, screenSurface, &dest);

        if (col == mapWidth - 1) {
            col = 0;
            row++;
        } else {
            col++;
        }
    }

    SDL_FreeSurface(tileset);
    SDL_UpdateWindowSurface(window);
}
Grid map class:
typedef enum {
    ThinAir,
    LeftSolid,
    RightSolid,
    TopSolid,
    BottomSolid,
    Ground,
    Floating,
} Masks;

typedef struct {
    int x, y;
} Position;

// Operator overloading so std::map can understand how to handle Position structs
inline bool operator<(Position const& a, Position const& b) {
    return std::tie(a.x, a.y) < std::tie(b.x, b.y);
}

typedef struct {
    Index index;
    Masks mask;
} GridTile;

using GridIndex = byte;

class GridMap {
private:
    TileMap* mapManager;
    std::map<Position, GridTile> gridTiles;
    unsigned short gridTileWidth, gridTileHeight;

public:
    explicit GridMap(TileMap* mapManager);
    void printGridTiles();
    void precacheGridTiles();
};
The calculation for the grid tile width and height is done as follows:
gridTileHeight = mapManager->tileHeight / 4;
gridTileWidth = mapManager->tileWidth / 4;
My friend's approach to generating the grid map tiles so far:
void GridMap::precacheGridTiles() {
    SDL_Surface* tileset;
    // todo: This should change for layering (the at(0) part)
    tileset = IMG_Load(mapManager->tilesetPaths.at(0).c_str());

    Index i = 0;
    int row = 0, col = 0;
    for (const auto& index : mapManager->indices) {
        for (int j = 0; j < gridTileWidth * gridTileHeight; j++) {
            GridTile gridTile;
            gridTile.index = i;
            // todo: Find way to determine mask
            if (index == 61) {
                // todo: check based on current position if the tile is top, left, right or bottom side
            } else {
                gridTile.mask = ThinAir;
            }
            Position position;
            position.x = col + (j * gridTileWidth);
            position.y = row + (j * gridTileHeight);
            gridTiles[position] = gridTile;
        }
        if (col < mapManager->mapWidth) {
            col += gridTileWidth;
        } else {
            col = 0;
            row += gridTileHeight;
        }
    }
}
Any pointers for direction would be greatly appreciated. We're looking to confirm/adjust our logic and hopefully find an answer to our questions, but don't want an answer that solves the problem for us please.

Rendering the mandelbrot set in color (and making it look good)

I'm trying to render the Mandelbrot set in color and make it look good. I want to replicate the image from the Wikipedia page. The Wikipedia image also includes an Ultra Fractal 3 parameter file.
mandelZoom00MandelbrotSet {
fractal:
title="mandel zoom 00 mandelbrot set" width=2560 height=1920 layers=1
credits="WolfgangBeyer;8/21/2005"
layer:
method=multipass caption="Background" opacity=100
mapping:
center=-0.7/0 magn=1.3
formula:
maxiter=50000 filename="Standard.ufm" entry="Mandelbrot" p_start=0/0
p_power=2/0 p_bailout=10000
inside:
transfer=none
outside:
density=0.42 transfer=log filename="Standard.ucl" entry="Smooth"
p_power=2/0 p_bailout=128.0
gradient:
smooth=yes rotation=29 index=28 color=6555392 index=92 color=13331232
index=196 color=16777197 index=285 color=43775 index=371 color=3146289
opacity:
smooth=no index=0 opacity=255
}
I've been trying to decipher this file and write a program to reproduce the Wikipedia image. This is my best attempt:
This is part of the renderer. It's kind of messy because I was fiddling around and rewriting large chunks trying to figure this thing out.
using real = double;
using integer = long long;

struct complex {
    real r, i;
};

using grey = unsigned char;

struct color {
    grey r, g, b;
};

struct real_color {
    real r, g, b;
};

grey real_to_grey(real r) {
    // converting to srgb didn't help much
    return std::min(std::max(std::round(r), real(0.0)), real(255.0));
}

color real_to_grey(real_color c) {
    return {real_to_grey(c.r), real_to_grey(c.g), real_to_grey(c.b)};
}

real lerp(real a, real b, real t) {
    return std::min(std::max(t * (b - a) + a, real(0.0)), real(255.0));
}

real_color lerp(real_color a, real_color b, real t) {
    return {lerp(a.r, b.r, t), lerp(a.g, b.g, t), lerp(a.b, b.b, t)};
}

complex plus(complex a, complex b) {
    return {a.r + b.r, a.i + b.i};
}

complex square(complex n) {
    return {n.r*n.r - n.i*n.i, real{2.0} * n.r * n.i};
}

complex next(complex z, complex c) {
    return plus(square(z), c);
}

real magnitude2(complex n) {
    return n.r*n.r + n.i*n.i;
}

real magnitude(complex n) {
    return std::sqrt(magnitude2(n));
}

struct result {
    complex zn;
    integer n;
};

result mandelbrot(complex c, integer iterations, real bailout) {
    complex z = {real{0.0}, real{0.0}};
    integer n = 0;
    real bailout2 = bailout * bailout;
    for (; n < iterations && magnitude2(z) <= bailout2; ++n) {
        z = next(z, c);
    }
    return {z, n};
}

struct table_row {
    real index;
    real_color color;
};

real invlerp(real value, real min, real max) {
    return (value - min) / (max - min);
}

color lerp(table_row a, table_row b, real index) {
    return real_to_grey(lerp(a.color, b.color, invlerp(index, a.index, b.index)));
}

color mandelbrot_color(complex c, integer iterations, real bailout) {
    const result res = mandelbrot(c, iterations, bailout);
    if (res.n == iterations) {
        // in the set
        return {0, 0, 0};
    } else {
        table_row table[] = {
            // colors and indices from gradient section
            {28.0*0.1, {0x00, 0x07, 0x64}},
            {92.0*0.1, {0x20, 0x6B, 0xCB}},
            {196.0*0.1, {0xED, 0xFF, 0xFF}},
            {285.0*0.1, {0xFF, 0xAA, 0x00}},
            {371.0*0.1, {0x31, 0x02, 0x30}},
            // interpolate towards black as we approach points that are in the set
            {real(iterations), {0, 0, 0}}
        };
        // it should be smooth, but it's not
        const real smooth = res.n + real{1.0} - std::log(std::log2(magnitude(res.zn)));
        // I know what a for-loop is, I promise
        if (smooth < table[1].index) {
            return lerp(table[0], table[1], smooth);
        } else if (table[1].index <= smooth && smooth < table[2].index) {
            return lerp(table[1], table[2], smooth);
        } else if (table[2].index <= smooth && smooth < table[3].index) {
            return lerp(table[2], table[3], smooth);
        } else if (table[3].index <= smooth && smooth < table[4].index) {
            return lerp(table[3], table[4], smooth);
        } else {
            return lerp(table[4], table[5], smooth);
        }
    }
}
The colors from the gradient section are in a table in mandelbrot_color. The indices from the gradient section are also in the table but I multiplied them by 0.1. The colors look completely off if I don't multiply by 0.1.
The formula section has maxiter=50000 and p_bailout=10000. These are iterations and bailout in the code. I don't know what p_start=0/0 p_power=2/0 means. I don't know why a different bailout is mentioned in the outside section and I don't know what density=0.42, transfer=none, transfer=log means. The gradient section also mentions rotation=29 but I don't understand how a gradient could be rotated.
The reason I am asking this question is that I don't like the white bands around my image (I'd prefer a smooth white glow like in the Wikipedia image). I also don't like the dark purple skin caused by interpolating towards black (the last row in the table in mandelbrot_color). If we remove that row, we end up with a deep blue skin.
I suspect that there is some kind of mapping from the indices in the gradient section to iteration counts. Maybe * 0.1 is an approximation of that mapping that works some of the time. Might have something to do with transfer, density or rotation. Leave a comment if you would like me to post the whole program. It depends on stb_image_write (a single header image writing library).
As a side note, if I clean up this code and chuck it into a fragment shader, will it likely be faster (generally speaking) than running multithreaded on the CPU?

How to write the expression "img[markers == -1] = [255,0,0]" in C++ OpenCV?

I'm trying to convert the OpenCV Python example here to C++.
I'm stuck in this line:
img[markers == -1] = [255,0,0]
where both img and markers are matrices.
What is the efficient way to write this in C++ OpenCV?
Since I've already written some code to back my comments up, it would be a waste not to write it up.
NB: Testing it on an i7-4930k, with MSVC 2013, OpenCV 3.1, 64bit. Using a randomly generated input image and mask (~9% is set to -1).
As Miki stated, the simplest way to do this in C++ is to use:
cv::MatExpr operator==(const cv::Mat& a, double s) to create a mask, which you then use in cv::Mat::setTo(...).
For example:
void set_where_markers_match(cv::Mat3b img
    , cv::Vec3b value
    , cv::Mat1i markers
    , int32_t target)
{
    img.setTo(value, markers == target);
}
Even though this creates an intermediate mask Mat, it's still efficient enough for the vast majority of cases (roughly 2.9 ms per 2^20 pixels).
So what if you feel this is really not good enough and you want to have a shot at writing something faster?
Let's begin with something simple -- iterate rows and columns and use cv::Mat::at.
void set_where_markers_match(cv::Mat3b img
    , cv::Vec3b value
    , cv::Mat1i markers
    , int32_t target)
{
    CV_Assert(img.size == markers.size);
    for (int32_t r(0); r < img.rows; ++r) {
        for (int32_t c(0); c < img.cols; ++c) {
            if (markers.at<int32_t>(r, c) == target) {
                img.at<cv::Vec3b>(r, c) = value;
            }
        }
    }
}
A little better, ~2.4 ms per iteration.
Let's try using Mat iterators instead.
void set_where_markers_match(cv::Mat3b img
    , cv::Vec3b value
    , cv::Mat1i markers
    , int32_t target)
{
    CV_Assert(img.size == markers.size);
    cv::Mat3b::iterator it_img(img.begin());
    cv::Mat1i::const_iterator it_mark(markers.begin());
    cv::Mat1i::const_iterator it_mark_end(markers.end());
    for (; it_mark != it_mark_end; ++it_mark, ++it_img) {
        if (*it_mark == target) {
            *it_img = value;
        }
    }
}
This doesn't seem to help in my case, ~3.1 ms per iteration.
Time to drop the gloves -- let's use pointers to the pixel data. We've got to be careful and account for discontinuous Mats (e.g. when you have a ROI of a larger Mat), so let's process a row at a time.
void set_where_markers_match(cv::Mat3b img
    , cv::Vec3b value
    , cv::Mat1i markers
    , int32_t target)
{
    CV_Assert(img.size == markers.size);
    for (int32_t r(0); r < img.rows; ++r) {
        uint8_t* it_img(img.ptr<uint8_t>(r));
        int32_t const* it_mark(markers.ptr<int32_t>(r));
        int32_t const* it_mark_end(it_mark + markers.cols);
        for (; it_mark != it_mark_end; ++it_mark, it_img += 3) {
            if (*it_mark == target) {
                it_img[0] = value[0];
                it_img[1] = value[1];
                it_img[2] = value[2];
            }
        }
    }
}
This is a step forward, ~1.9 ms per iteration.
The next easiest step with OpenCV could be parallelizing this -- we can take advantage of cv::parallel_for_. Let's split the work by rows, so we can reuse the previous algorithm.
class ParallelSWMM : public cv::ParallelLoopBody
{
public:
    ParallelSWMM(cv::Mat3b& img
        , cv::Vec3b value
        , cv::Mat1i const& markers
        , int32_t target)
        : img_(img)
        , value_(value)
        , markers_(markers)
        , target_(target)
    {
        CV_Assert(img.size == markers.size);
    }

    virtual void operator()(cv::Range const& range) const
    {
        for (int32_t r(range.start); r < range.end; ++r) {
            uint8_t* it_img(img_.ptr<uint8_t>(r));
            int32_t const* it_mark(markers_.ptr<int32_t>(r));
            int32_t const* it_mark_end(it_mark + markers_.cols);
            for (; it_mark != it_mark_end; ++it_mark, it_img += 3) {
                if (*it_mark == target_) {
                    it_img[0] = value_[0];
                    it_img[1] = value_[1];
                    it_img[2] = value_[2];
                }
            }
        }
    }

    ParallelSWMM& operator=(ParallelSWMM const&)
    {
        return *this;
    }

private:
    cv::Mat3b& img_;
    cv::Vec3b value_;
    cv::Mat1i const& markers_;
    int32_t target_;
};

void set_where_markers_match(cv::Mat3b img
    , cv::Vec3b value
    , cv::Mat1i markers
    , int32_t target)
{
    ParallelSWMM impl(img, value, markers, target);
    cv::parallel_for_(cv::Range(0, img.rows), impl);
}
This one runs at 0.5 ms.
Let's take a step back -- in my case, the original approach runs single-threaded. What if we parallelized that? We can just replace the operator() in the above code with the following:
virtual void operator()(cv::Range const& range) const
{
    img_.rowRange(range).setTo(value_, markers_.rowRange(range) == target_);
}
That runs at around 0.9 ms.
That seems about it for the reasonable implementations. We could have a shot at vectorizing this, but it is far from trivial (pixels are 3 bytes, we have to deal with alignment, etc.) -- let's not go into that, although it could be a nice exercise for the curious reader. However, since we're around 10 clock cycles per pixel even for the worst approach, there's not much potential for improvement.
Make your pick. In general I'd go with the first approach, and worry about it only once measurements identify this particular operation as a bottleneck.

DirectShow ISampleGrabber: samples are upside-down and color channels reverse

I have to use MS DirectShow to capture video frames from a camera (I just want the raw pixel data).
I was able to build the Graph/Filter network (capture device filter and ISampleGrabber) and implement the callback (ISampleGrabberCB). I receive samples of appropriate size.
However, they are always upside down (flipped vertically that is, not rotated) and the color channels are BGR order (not RGB).
I tried setting the biHeight field in the BITMAPINFOHEADER to both positive and negative values, but it doesn't have any effect. According to the MSDN documentation, ISampleGrabber::SetMediaType() ignores the format block for video data anyway.
Here is what I see (recorded with a different camera, not DS), and what DirectShow ISampleGrabber gives me: The "RGB" is actually in red, green and blue respectively:
Sample of the code I'm using, slightly simplified:
// Setting the media type...
AM_MEDIA_TYPE* media_type = 0 ;
this->ds.device_streamconfig->GetFormat(&media_type); // The IAMStreamConfig of the capture device
// Find the BMI header in the media type struct
BITMAPINFOHEADER* bmi_header;
if (media_type->formattype != FORMAT_VideoInfo) {
bmi_header = &((VIDEOINFOHEADER*)media_type->pbFormat)->bmiHeader;
} else if (media_type->formattype != FORMAT_VideoInfo2) {
bmi_header = &((VIDEOINFOHEADER2*)media_type->pbFormat)->bmiHeader;
} else {
return false;
}
// Apply changes
media_type->subtype = MEDIASUBTYPE_RGB24;
bmi_header->biWidth = width;
bmi_header->biHeight = height;
// Set format to video device
this->ds.device_streamconfig->SetFormat(media_type);
// Set format for sample grabber
// bmi_header->biHeight = -(height); // tried this for either and both interfaces, no effect
this->ds.sample_grabber->SetMediaType(media_type);
// Connect filter pins
IPin* out_pin= getFilterPin(this->ds.device_filter, OUT, 0); // IBaseFilter interface for the capture device
IPin* in_pin = getFilterPin(this->ds.sample_grabber_filter, IN, 0); // IBaseFilter interface for the sample grabber filter
out_pin->Connect(in_pin, media_type);
// Start capturing by callback
this->ds.sample_grabber->SetBufferSamples(false);
this->ds.sample_grabber->SetOneShot(false);
this->ds.sample_grabber->SetCallback(this, 1);
// start recording
this->ds.media_control->Run(); // IMediaControl interface
I'm checking return types for every function and don't get any errors.
I'm thankful for any hint or idea.
Things I already tried:
Setting the biHeight field to a negative value for either the capture device filter or the sample grabber or for both or for neither - doesn't have any effect.
Using IGraphBuilder to connect the pins - same problem.
Connecting the pins before changing the media type - same problem.
Checking if the media type was actually applied by the filter by querying it again - but it apparently is applied or at least stored.
Interpreting the image as total byte reversed (last byte first, first byte last) - then it would be flipped horizontally.
Checking if it's a problem with the video camera - when I test it with VLC (DirectShow capture) it looks normal.
My quick hack for this:
void Camera::OutputCallback(unsigned char* data, int len, void* instance_)
{
    Camera* instance = reinterpret_cast<Camera*>(instance_);
    int j = 0;
    for (int i = len - 4; i > 0; i -= 4)
    {
        instance->buffer[j]     = data[i];
        instance->buffer[j + 1] = data[i + 1];
        instance->buffer[j + 2] = data[i + 2];
        instance->buffer[j + 3] = data[i + 3];
        j += 4;
    }
    Transport::RTPPacket packet;
    packet.payload = instance->buffer;
    packet.payloadSize = len;
    instance->receiver->Send(packet);
}
This is correct for the RGB32 color space; for other color spaces the code needs to be adjusted.
I noticed that when using the I420 color space, the flipping disappears.
In addition, most current codecs (e.g. VP8) use the I420 color space for raw input/output.
I wrote a simple frame-mirroring function for the I420 color space.
void Camera::OutputCallback(unsigned char* data, int len, uint32_t timestamp, void* instance_)
{
    Camera* instance = reinterpret_cast<Camera*>(instance_);
    Transport::RTPPacket packet;
    packet.rtpHeader.ts = timestamp;
    packet.payload = data;
    packet.payloadSize = len;
    if (instance->mirror)
    {
        Video::ResolutionValues rv = Video::GetValues(instance->resolution);
        int k = 0;
        // Y (luma) values
        for (int i = 0; i != rv.height; ++i)
        {
            for (int j = rv.width; j != 0; --j)
            {
                int l = ((rv.width * i) + j);
                instance->buffer[k++] = data[l];
            }
        }
        // U values
        for (int i = 0; i != rv.height / 2; ++i)
        {
            for (int j = (rv.width / 2); j != 0; --j)
            {
                int l = (((rv.width / 2) * i) + j) + rv.height * rv.width;
                instance->buffer[k++] = data[l];
            }
        }
        // V values
        for (int i = 0; i != rv.height / 2; ++i)
        {
            for (int j = (rv.width / 2); j != 0; --j)
            {
                int l = (((rv.width / 2) * i) + j) + rv.height * rv.width + (rv.width / 2) * (rv.height / 2);
                if (l == len)
                {
                    instance->buffer[k++] = 0;
                }
                else
                {
                    instance->buffer[k++] = data[l];
                }
            }
        }
        packet.payload = instance->buffer;
    }
    instance->receiver->Send(packet);
}

How to overlay one image on top of another? C++

I have two RGB images (ppm format), and I want to be able to overlay any pixel that's not purely black of the top image onto the bottom image.
I can successfully load images, save images, copy images... but I'm not able to create an image out of the two images in the manner I've described above.
I'm not going to include all the code I have, but the important parts to achieve this are:
struct Pixel
{
    unsigned int r;
    unsigned int g;
    unsigned int b;
};
I overloaded its == operator for easier comparison:
bool Pixel::operator==(const Pixel& other)
{
    if (r != other.r)
    {
        return true;
    }
    else if (g != other.g)
    {
        return true;
    }
    else if (b != other.b)
    {
        return true;
    }
    else
    {
        return false;
    }
}
In my Pic class I have this method:
Pic Pic::overlay(const Pic& top, Pixel mask)
{
    for (int h = 0; h < height; h++)
    {
        for (int w = 0; w < width; w++)
        {
            if (!(top.pixels[h][w] == mask))
            {
                pixels[h][w] = top.pixels[h][w]; // pixels[][] is a Pixel array
            }
        }
    }
    return *this;
}
My main file has this:
Pic top;
Pic bot;
Pic overlay;
Pixel mask;

mask.r = 0;
mask.g = 0;
mask.b = 0;

top.loadimage("top.ppm"); // loadimage() loads the image in and all the data
bot.loadimage("bot.ppm"); // same thing

overlay = bot.overlay(bot, mask);
overlay.saveimage("overlay.ppm");
The = operator is overloaded for the Pic class, obviously.
The kind of problems I have are these:
In the overlay method, if I leave the if statement as described above, the top image will be displayed in the saved file. If I remove the !() part, it'll display the bottom image.
If I get rid of that if() statement completely, and simply try to alter the individual pixels, ex:
pixels[h][w].r = pixels[h][w].r - 50;
The saved image will be altered, all wacky looking, for obvious reasons.
However... .b and .g have no effect on the image.
I'm out of ideas... I've been playing with this for 2 days and I can't figure out what's wrong. Everything works as needed in my program, except this overlay method.
EDIT: So, I found one of the problems in my code, and it went back to how I loaded the images in PPM P6 format. Instead of individually loading each pixel as bytes, I tried to load them all together, which ran into the padding that happens with structures and binary reads (struct packing)... Now I'm able to put the overlay of the top image onto the bottom image, but not all colors are showing. Still, better than before.
Here's what I modified my overlay's nested for() loop to look like:
for (int h = 0; h < height; h++)
{
    for (int w = 0; w < width; w++)
    {
        if (top.pixels[h][w].r != mask.r &&
            top.pixels[h][w].g != mask.g &&
            top.pixels[h][w].b != mask.b)
        {
            pixels[h][w].r = top.pixels[h][w].r;
            pixels[h][w].g = top.pixels[h][w].g;
            pixels[h][w].b = top.pixels[h][w].b;
        }
    }
}
Obviously it still requires work.
This line looks wrong:
overlay = bot.overlay(bot, mask);
Shouldn't it be:
overlay = bot.overlay(top, mask);
And if you want a shorter way to write your equality test then you might like this:
bool Pixel::operator==(const Pixel& other)
{
    return (r == other.r && g == other.g && b == other.b);
}
Finally, since you've got an equality operator, why not also add an inequality operator ('!=') to keep your code as neat as possible?