I have an image, as a cv::Mat. I am getting the raw data from this, with:
uchar* data = (uchar *)pImg.data;
I need to pass this data to a function, then cycle through each pixel of the image. I would have done:
for (int i = 0; i < image.rows; ++i)
{
for (int j = 0; j < image.cols; ++j)
{
//pixel = cv::Point(i,j);
}
}
What is the equivalent of this, using the uchar* data?
It is fairly easy, but you need to remember one thing: image.elemSize() returns the number of bytes per pixel (it is a member of cv::Mat), so the loop looks slightly different for different image formats. The loop below also assumes the data is continuous (image.isContinuous() returns true), i.e. there is no padding at the end of each row. There is an example inside the loop:
for (size_t i = 0; i < image.total() * image.elemSize(); i += image.elemSize())
{
    // for CV_8UC1
    // auto pixel = *(image.data + i);
    // for RGB stored as CV_8UC3 (note that cv::imread returns BGR order instead)
    auto r = *(image.data + i);
    auto g = *(image.data + i + 1);
    auto b = *(image.data + i + 2);
}
The correct pixel value can be accessed from the raw data provided the following parameters are known:
X coordinate of pixel ( column number )
Y coordinate of pixel ( row number )
Image depth (actual data type of a single pixel i.e. uchar, ushort, float etc)
Number of channels of the image
Image step in bytes
Given the above information, the pixel can be accessed as follows (for CV_8UC3 type):
uchar* data = (uchar *)pImg.data;
for (int i = 0; i < pImg.rows; ++i)
{
    for (int j = 0; j < pImg.cols; ++j)
    {
        // pImg.step is the row length in bytes, including any padding
        uchar b = data[i * pImg.step + pImg.channels() * j + 0];
        uchar g = data[i * pImg.step + pImg.channels() * j + 1];
        uchar r = data[i * pImg.step + pImg.channels() * j + 2];
    }
}
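The indexing rule above does not depend on OpenCV itself. As a minimal sketch, assuming an interleaved 8-bit buffer, the helpers pixelPtr and sumChannels below (hypothetical names, not part of OpenCV) address pixels through the row step in bytes, so row padding is skipped correctly:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pointer to the first channel of pixel (row, col).
// step is the number of bytes per row and may include padding.
const std::uint8_t* pixelPtr(const std::uint8_t* data, std::size_t step,
                             int channels, int row, int col)
{
    return data + static_cast<std::size_t>(row) * step
                + static_cast<std::size_t>(col) * channels;
}

// Hypothetical helper: sums every channel of every pixel, ignoring padding.
long sumChannels(const std::uint8_t* data, int rows, int cols,
                 int channels, std::size_t step)
{
    long total = 0;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const std::uint8_t* px = pixelPtr(data, step, channels, r, c);
            for (int ch = 0; ch < channels; ++ch)
                total += px[ch];
        }
    }
    return total;
}
```

With a cv::Mat the call would presumably look like sumChannels(pImg.data, pImg.rows, pImg.cols, pImg.channels(), pImg.step), since the step member converts to a byte count.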
I already know how to flip an image vertically or horizontally. I have the following code that does so horizontally. The image data here is stored in a QImage as I was working with Qt here.
QImage image(imageFileName);
QImage newImage(image);
if(image.depth() > 8)
{
for (int idx_Y = 0; idx_Y < image.height(); idx_Y++)
{
for (int idx_X = 0; idx_X < image.width(); idx_X++)
{
QRgb rgb = image.pixel(image.width() - 1 - idx_X, idx_Y);
newImage.setPixel(idx_X, idx_Y, rgb);
}
}
}
I'm sure there are faster methods to get it done. However, I don't want any memory allocation on the heap. Could you please tell me what other much faster algorithms there could be?
Thank you.
Elaborating on @Spektre's hint:
2 nested for loops are not the problem... the setPixel and pixel functions are usually crawlingly slow on most gfx APIs. Using direct pixel access instead usually boosts speed ~1000 times or more...
This could look like:
QImage image(imageFileName);
QImage newImage(image);
if (image.depth() >= 8) {
    const int bytesPerPixel = image.depth() / 8;
    for (int y = 0; y < image.height(); ++y) {
        // bits()/constBits() return uchar pointers, so char* would not compile
        const uchar *dataSrc = image.constBits() + y * image.bytesPerLine();
        uchar *dataDst = newImage.bits() + y * newImage.bytesPerLine()
            + (newImage.width() - 1) * bytesPerPixel;
        // walk the source row left to right and the destination row right to left
        for (int x = image.width(); x--;
             dataSrc += bytesPerPixel, dataDst -= bytesPerPixel) {
            for (int i = 0; i < bytesPerPixel; ++i) dataDst[i] = dataSrc[i];
        }
    }
}
Please note that I changed image.depth() > 8 into image.depth() >= 8. (I saw no reason to exclude e.g. QImage::Format_Grayscale8.)
A slightly modified version for mirroring the QImage newImage in-place (considering that it is already copied):
QImage image(imageFileName);
QImage newImage(image);
if (newImage.depth() >= 8) {
    const int bytesPerPixel = newImage.depth() / 8;
    for (int y = 0; y < image.height(); ++y) {
        // bits() returns uchar*, so the pointers must be uchar*, not char*
        uchar *dataL = newImage.bits() + y * newImage.bytesPerLine();
        uchar *dataR = dataL + (newImage.width() - 1) * bytesPerPixel;
        // swap pixels from both ends of the row until the pointers meet
        for (; dataL < dataR; dataL += bytesPerPixel, dataR -= bytesPerPixel) {
            for (int i = 0; i < bytesPerPixel; ++i) std::swap(dataL[i], dataR[i]);
        }
    }
}
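The in-place variant can also be written against a plain byte buffer, which makes it easy to try out without Qt. This is only a sketch: mirrorRowsInPlace is a hypothetical helper, and its parameters stand in for the values that width(), height(), depth() / 8 and bytesPerLine() would provide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mirrors every row of an interleaved image in place.
// bytesPerLine may be larger than width * bytesPerPixel (row padding).
void mirrorRowsInPlace(std::uint8_t* data, int width, int height,
                       int bytesPerPixel, std::size_t bytesPerLine)
{
    if (width < 2 || bytesPerPixel < 1) return;  // nothing to swap
    for (int y = 0; y < height; ++y) {
        std::uint8_t* left = data + static_cast<std::size_t>(y) * bytesPerLine;
        std::uint8_t* right =
            left + static_cast<std::size_t>(width - 1) * bytesPerPixel;
        // Swap whole pixels from both ends until the pointers meet.
        for (; left < right; left += bytesPerPixel, right -= bytesPerPixel)
            std::swap_ranges(left, left + bytesPerPixel, right);
    }
}
```

No additional buffer is allocated; the padding bytes at the end of each row are left untouched.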
Concerning QImage and qRgb(), you may also notice that Qt supports QImages with 16 bits per component (since Qt 5.12).
I fiddled a bit with this in
SO: Set pixel value of 16 bit grayscale QImage
which might be interesting as well.
I am trying to implement Laplace sharpening using C++ , here's my code so far:
img = imread("cow.png", 0);
Mat convoSharp() {
//creating new image
Mat res = img.clone();
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = 0.0;
}
}
//variable declaration
int filter[3][3] = { {0,1,0},{1,-4,1},{0,1,0} };
//int filter[3][3] = { {-1,-2,-1},{0,0,0},{1,2,1} };
int height = img.rows;
int width = img.cols;
int filterHeight = 3;
int filterWidth = 3;
int newImageHeight = height - filterHeight + 1;
int newImageWidth = width - filterWidth + 1;
int i, j, h, w;
//convolution
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
res.at<uchar>(i,j) += filter[h - i][w - j] * img.at<uchar>(h,w);
}
}
}
}
//img - laplace
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = img.at<uchar>(y, x) - res.at<uchar>(y, x);
}
}
return res;
}
I don't really know what went wrong, I also tried different filter (1,1,1),(1,-8,1),(1,1,1) and the result is also same (more or less). I don't think that I need to normalize the result because the result is in range of 0 - 255. Can anyone explain what really went wrong in my code?
Problem: uchar is too small to hold the partial results of the filtering operation.
You should create a temporary variable, accumulate all the filtered positions in it, and then check whether its value is in the range <0,255>. If it is not, clamp the end result to fit <0,255>.
By executing the line below
res.at<uchar>(i,j) += filter[h - i][w - j] * img.at<uchar>(h,w);
the partial result may be greater than 255 (the maximum value of uchar) or negative (your filter contains -4 or -8). temp has to be a signed integer type to handle the case where the partial result is negative.
Fix:
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
int temp = res.at<uchar>(i,j); // added
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
temp += filter[h - i][w - j] * img.at<uchar>(h,w); // add to temp
}
}
// clamp temp to <0,255>
res.at<uchar>(i,j) = cv::saturate_cast<uchar>(temp);
}
}
You should also clamp values to <0,255> range when you do the subtraction of images.
The problem is partially that you’re overflowing your uchar, as rafix07 suggested, but that is not the full problem.
The Laplace of an image contains negative values. It has to. And you can’t clamp those to 0; you need to preserve the negative values. Also, it can take values up to 4*255 in magnitude given your version of the filter. This means that you need to use a signed 16-bit type to store this output.
But there is a simpler and more efficient approach!
You are computing img - laplace(img). In terms of convolutions (*), this is 1 * img - laplace_kernel * img = (1 - laplace_kernel) * img. That is to say, you can combine both operations into a single convolution. The 1 kernel that doesn’t change the image is [(0,0,0),(0,1,0),(0,0,0)]. Subtract your Laplace kernel from that and you obtain [(0,-1,0),(-1,5,-1),(0,-1,0)].
So, simply compute the convolution with that kernel, and do it using int as intermediate type, which you then clamp to the uchar output range as shown by rafix07.
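A minimal sketch of that single-pass approach, written against a plain 8-bit grayscale buffer rather than cv::Mat so that it is self-contained. The function name sharpen and the choice to copy border pixels unchanged are assumptions made for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: applies the combined kernel (identity minus Laplace)
// to a row-major 8-bit grayscale image. Border pixels are copied unchanged.
std::vector<std::uint8_t> sharpen(const std::vector<std::uint8_t>& src,
                                  int width, int height)
{
    static const int kernel[3][3] = {{0, -1, 0}, {-1, 5, -1}, {0, -1, 0}};
    std::vector<std::uint8_t> dst(src);
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            int acc = 0;  // signed intermediate, wide enough for 5 * 255
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx)
                    acc += kernel[ky + 1][kx + 1] *
                           src[static_cast<std::size_t>((y + ky) * width + (x + kx))];
            // Clamp only the final result to the uchar range.
            dst[static_cast<std::size_t>(y * width + x)] =
                static_cast<std::uint8_t>(std::min(255, std::max(0, acc)));
        }
    }
    return dst;
}
```

Because the kernel is symmetric, it makes no difference here whether it is applied as a convolution or as a correlation.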
I have an image. I want to modify the image in such a way that the value of each pixel is increased by a particular margin. Next I want to save the newly created image and display it.
I tried changing each pixel value but was only able to set it to a constant value. I don't want all the pixels to be constant; their values should increase by (let's say) 50.
#include <iostream>
#include "opencv2/highgui/highgui.hpp"
using namespace std;
using namespace cv;
int main()
{
Mat image;
image = imread("/home/rohit_136/Desktop/image.jpg",CV_LOAD_IMAGE_UNCHANGED);
for (int i = 0; i < image.cols; i++) {
for (int j = 0; j < image.rows; j++) {
Vec3b intensity = image.at<Vec3b>(j, i)=50
}
}
return 0;
}
Vec3b is a vector which contains 3 bytes (unsigned chars). Each byte denotes the value of an individual BGR (blue, green, red) or RGB channel. You should traverse this vector and modify each channel independently. Be careful: because we are talking about bytes, each value must stay between 0 and 255. I suggest clamping the result to avoid overflow.
Note: if you don't care about alpha channel, I suggest loading your image with CV_LOAD_IMAGE_COLOR. This will ensure that your image is loaded in the BGR format.
cv::Mat Image =cv::imread("image.jpg");
uint8_t * orig_ptr = (uint8_t*)Image.data;
for (int y = 0; y < Image.rows; y++)
{
for (int x = 0; x < Image.cols; x++)
{
/*Reading Pixel Values*/
int R = orig_ptr[x * 3 + y*Image.step + 2];/*R -Pixel*/
int G = orig_ptr[x * 3 + y*Image.step + 1];/*G-Pixel*/
int B = orig_ptr[x * 3 + y*Image.step]; /*B-Pixel*/
/*Updating Values*/
orig_ptr[x * 3 + y*Image.step + 2] = cv::saturate_cast<uint8_t>(R + 50);
orig_ptr[x * 3 + y*Image.step + 1] =cv::saturate_cast<uint8_t>(G + 50);
orig_ptr[x * 3 + y*Image.step]=cv::saturate_cast<uint8_t>(B + 50);
}
}
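For 8-bit data, cv::saturate_cast<uint8_t> clamps the result to the range 0 to 255 instead of letting it wrap around. The sketch below shows the same idea without OpenCV; saturateAdd and brighten are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: adds delta to an 8-bit value and clamps to [0, 255].
std::uint8_t saturateAdd(std::uint8_t value, int delta)
{
    const int sum = static_cast<int>(value) + delta;
    return static_cast<std::uint8_t>(std::min(255, std::max(0, sum)));
}

// Hypothetical helper: brightens every byte of an interleaved pixel buffer.
void brighten(std::vector<std::uint8_t>& pixels, int delta)
{
    for (std::uint8_t& p : pixels)
        p = saturateAdd(p, delta);
}
```

A plain uint8_t addition of 230 + 50 would wrap around to 24, which shows up as dark speckles in bright regions; the clamped version yields 255.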
I have a problem accessing data through Mat.data. I perform operations on a picture and I need access to every pixel separately.
I necessarily have to operate on simple types (float, int etc).
The way I am accessing the data is shown below:
for (int idx = 0; idx < image.rows; idx++) {
for (int idy = 0; idy < image.cols; idy++) {
int color_tid = idx * image.cols * image.channels() + idy * image.channels();
uint8_t blue = image.data[color_tid];
uint8_t green = image.data[color_tid + 1];
uint8_t red = image.data[color_tid + 2];
float pixelVal = (int) blue + (int) green + (int) red;
(...)
}
}
This approach works correctly only for square images (NxN pixels); for NxM images there are anomalies outside the square area (smaller edge).
Does anyone know another way to access the data of the picture Mat?
Example image (correct result):
anomalies (my problem)
I recommend following the data layout of a Mat,
so your loop becomes:
for (int r = 0; r < img.rows; ++r)
{
for (int c = 0; c < img.cols; ++c)
{
uchar* ptr = img.data + img.step[0] * r + img.step[1] * c;
uchar blue = ptr[0];
uchar green = ptr[1];
uchar red = ptr[2];
float pixelVal = blue + green + red;
}
}
You can also perform slightly fewer operations by computing the row pointer once per row:
for (int r = 0; r < img.rows; ++r)
{
uchar* pt = img.data + img.step[0] * r;
for (int c = 0; c < img.cols; ++c)
{
uchar* ptr = pt + img.step[1] * c;
uchar blue = ptr[0];
uchar green = ptr[1];
uchar red = ptr[2];
float pixelVal = blue + green + red;
}
}
The code in your question contains a few flaws:
rows and columns are swapped (row is Y, column is X)
the step size between rows (aka "stride") does not always equal the number of columns times the number of channels
Using Mat::at<> makes the code much simpler:
for(int row = 0; row < image.rows; ++row)
{
for(int col = 0; col < image.cols; ++col)
{
const Vec3b& pt = image.at<Vec3b>(row, col);
float pixelVal = pt[0] + pt[1] + pt[2];
...
}
}
Hi everyone, I am trying to implement pattern matching with FFT, but I am not sure what the result should be. (I think I am missing something, even though I have read a lot about the problem and tried many different implementations; this one is the best so far.) Here is my FFT correlation function.
void fft2d(fftw_complex**& a, int rows, int cols, bool forward = true)
{
fftw_plan p;
for (int i = 0; i < rows; ++i)
{
p = fftw_plan_dft_1d(cols, a[i], a[i], forward ? FFTW_FORWARD : FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(p);
}
fftw_complex* t = (fftw_complex*)fftw_malloc(rows * sizeof(fftw_complex));
for (int j = 0; j < cols; ++j)
{
for (int i = 0; i < rows; ++i)
{
t[i][0] = a[i][j][0];
t[i][1] = a[i][j][1];
}
p = fftw_plan_dft_1d(rows, t, t, forward ? FFTW_FORWARD : FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(p);
for (int i = 0; i < rows; ++i)
{
a[i][j][0] = t[i][0];
a[i][j][1] = t[i][1];
}
}
fftw_free(t);
}
int findCorrelation(int argc, char* argv[])
{
BMP bigImage;
BMP keyImage;
BMP result;
RGBApixel blackPixel = { 0, 0, 0, 1 };
const bool swapQuadrants = (argc == 4);
if (argc < 3 || argc > 4) {
cout << "correlation img1.bmp img2.bmp" << endl;
return 1;
}
if (!keyImage.ReadFromFile(argv[1])) {
return 1;
}
if (!bigImage.ReadFromFile(argv[2])) {
return 1;
}
//Preparations
const int maxWidth = std::max(bigImage.TellWidth(), keyImage.TellWidth());
const int maxHeight = std::max(bigImage.TellHeight(), keyImage.TellHeight());
const int rowsCount = maxHeight;
const int colsCount = maxWidth;
BMP bigTemp = bigImage;
BMP keyTemp = keyImage;
keyImage.SetSize(maxWidth, maxHeight);
bigImage.SetSize(maxWidth, maxHeight);
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
RGBApixel p1;
if (i < bigTemp.TellHeight() && j < bigTemp.TellWidth()) {
p1 = bigTemp.GetPixel(j, i);
} else {
p1 = blackPixel;
}
bigImage.SetPixel(j, i, p1);
RGBApixel p2;
if (i < keyTemp.TellHeight() && j < keyTemp.TellWidth()) {
p2 = keyTemp.GetPixel(j, i);
} else {
p2 = blackPixel;
}
keyImage.SetPixel(j, i, p2);
}
//Here is where the transforms begin
fftw_complex **a = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
fftw_complex **b = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
fftw_complex **c = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
for (int i = 0; i < rowsCount; ++i) {
a[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
b[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
c[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
for (int j = 0; j < colsCount; ++j) {
RGBApixel p1;
p1 = bigImage.GetPixel(j, i);
a[i][j][0] = (0.299*p1.Red + 0.587*p1.Green + 0.114*p1.Blue);
a[i][j][1] = 0.0;
RGBApixel p2;
p2 = keyImage.GetPixel(j, i);
b[i][j][0] = (0.299*p2.Red + 0.587*p2.Green + 0.114*p2.Blue);
b[i][j][1] = 0.0;
}
}
fft2d(a, rowsCount, colsCount);
fft2d(b, rowsCount, colsCount);
result.SetSize(maxWidth, maxHeight);
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
fftw_complex& y = a[i][j];
fftw_complex& x = b[i][j];
double u = x[0], v = x[1];
double m = y[0], n = y[1];
c[i][j][0] = u*m + n*v;
c[i][j][1] = v*m - u*n;
int fx = j;
if (fx>(colsCount / 2)) fx -= colsCount;
int fy = i;
if (fy>(rowsCount / 2)) fy -= rowsCount;
float r2 = (fx*fx + fy*fy);
const double cuttoffCoef = (maxWidth * maxHeight) / 37992.;
if (r2<128 * 128 * cuttoffCoef)
c[i][j][0] = c[i][j][1] = 0;
}
fft2d(c, rowsCount, colsCount, false);
const int halfCols = colsCount / 2;
const int halfRows = rowsCount / 2;
if (swapQuadrants) {
for (int i = 0; i < halfRows; ++i)
for (int j = 0; j < halfCols; ++j) {
std::swap(c[i][j][0], c[i + halfRows][j + halfCols][0]);
std::swap(c[i][j][1], c[i + halfRows][j + halfCols][1]);
}
for (int i = halfRows; i < rowsCount; ++i)
for (int j = 0; j < halfCols; ++j) {
std::swap(c[i][j][0], c[i - halfRows][j + halfCols][0]);
std::swap(c[i][j][1], c[i - halfRows][j + halfCols][1]);
}
}
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
const double& g = c[i][j][0];
RGBApixel pixel;
pixel.Alpha = 0;
int gInt = 255 - static_cast<int>(std::floor(g + 0.5));
pixel.Red = gInt;
pixel.Green = gInt;
pixel.Blue = gInt;
result.SetPixel(j, i, pixel);
}
BMP res;
res.SetSize(maxWidth, maxHeight);
result.WriteToFile("result.bmp");
return 0;
}
Sample output
This question would probably be more appropriately posted on another site such as Cross Validated (metaoptimize.com used to be a good one too, but it appears to be gone).
That said:
There are two similar operations you can perform with an FFT: convolution and correlation. Convolution is used to determine how two signals interact with each other, whereas correlation expresses how similar two signals are to each other. Make sure you're doing the right operation, as both are commonly implemented through a DFT.
For this type of application of DFTs you usually wouldn't extract any useful information from the Fourier spectrum unless you were looking for frequencies common to both data sources (e.g., if you were comparing two bridges to see if their supports are spaced similarly).
Your 3rd image looks a lot like the power domain; normally I see the correlation output entirely grey except where overlap occurred. Your code definitely appears to be computing the inverse DFT, so unless I'm missing something the only other explanation I've come up with for the fuzzy look could be some of the "fudge factor" code in there like:
if (r2<128 * 128 * cuttoffCoef)
c[i][j][0] = c[i][j][1] = 0;
As for what you should expect: wherever there are common elements between the two images you'll see a peak. The larger the peak, the more similar the two images are near that region.
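To make the expected peak concrete, here is a small 1D sketch that uses a naive O(n^2) DFT from the standard library instead of FFTW. The function names are hypothetical, and the product A[k] * conj(B[k]) is one of the two possible conventions; swapping the operands mirrors the sign of the reported shift.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <iterator>
#include <vector>

using cd = std::complex<double>;

// Naive DFT: sign = -1 is the forward transform, sign = +1 the inverse
// (the inverse also divides by n).
std::vector<cd> dft(const std::vector<cd>& in, int sign)
{
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
            sum += in[t] * std::polar(1.0, sign * 2.0 * pi * double(k * t) / double(n));
        out[k] = (sign > 0) ? sum / double(n) : sum;
    }
    return out;
}

// Circular cross-correlation r[s] = sum_t signal[(t + s) % n] * pattern[t],
// computed in the frequency domain. Assumes pattern.size() <= signal.size().
std::vector<double> correlate(const std::vector<double>& signal,
                              const std::vector<double>& pattern)
{
    const std::size_t n = signal.size();
    std::vector<cd> a(n), b(n);  // b is the pattern zero-padded to length n
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = cd(signal[i], 0.0);
        b[i] = cd(i < pattern.size() ? pattern[i] : 0.0, 0.0);
    }
    const std::vector<cd> A = dft(a, -1);
    const std::vector<cd> B = dft(b, -1);
    std::vector<cd> C(n);
    for (std::size_t k = 0; k < n; ++k)
        C[k] = A[k] * std::conj(B[k]);  // correlation, not convolution
    const std::vector<cd> c = dft(C, +1);
    std::vector<double> r(n);
    for (std::size_t s = 0; s < n; ++s)
        r[s] = c[s].real();
    return r;
}

// Index of the largest correlation value, i.e. the best-matching shift.
std::size_t peakIndex(const std::vector<double>& r)
{
    return static_cast<std::size_t>(
        std::distance(r.begin(), std::max_element(r.begin(), r.end())));
}
```

For a signal that contains the pattern starting at index 2, the largest value of the result sits at index 2. Dropping the std::conj call turns the same code into a circular convolution.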
Some comments and/or recommended changes:
1) Convolution & correlation are not scale invariant operations. In other words, the size of your pattern image can make a significant difference in your output.
2) Normalize your images before correlation.
When you get the image data ready for the forward DFT pass:
a[i][j][0] = (0.299*p1.Red + 0.587*p1.Green + 0.114*p1.Blue);
a[i][j][1] = 0.0;
/* ... */
How you grayscale the image is your business (though I would've picked something like sqrt( r*r + b*b + g*g )). However, I don't see you doing anything to normalize the image.
The word "normalize" can take on a few different meanings in this context. Two common types:
normalize the range of values between 0.0 and 1.0
normalize the "whiteness" of the images
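Both kinds of normalization are short to write down. The sketch below works on a flat vector of grayscale values; the function names are hypothetical, and reading "whiteness" as removing the mean brightness is an assumption about what was meant rather than something stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: linearly maps the values onto [0.0, 1.0].
std::vector<double> normalizeRange(const std::vector<double>& v)
{
    std::vector<double> out(v.size(), 0.0);
    if (v.empty()) return out;
    const auto mm = std::minmax_element(v.begin(), v.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    if (hi == lo) return out;  // flat image: nothing to stretch
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = (v[i] - lo) / (hi - lo);
    return out;
}

// Hypothetical helper: subtracts the mean so that the overall brightness
// of an image no longer dominates the correlation (assumed interpretation).
std::vector<double> subtractMean(const std::vector<double>& v)
{
    std::vector<double> out(v);
    if (v.empty()) return out;
    const double mean =
        std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
    for (double& x : out)
        x -= mean;
    return out;
}
```

Either function would be applied to both images before the values are copied into the real parts of the FFT input arrays.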
3) Run your pattern image through an edge enhancement filter. I've personally made use of Canny, Sobel, and I think I messed with a few others. As I recall, Canny was "quick'n dirty" and Sobel was more expensive, but I got comparable results when it came time to do correlation. See chapter 24 of the "dsp guide" book that's freely available online. The whole book is worth your time, but if you're short on time then at a minimum chapter 24 will help a lot.
4) Re-scale the output image between [0, 255]; if you want to implement thresholds, do it after this step because the thresholding step is lossy.
My memory on this one is hazy, but as I recall (edited for clarity):
You can scale the final image pixels (before rescaling) between [-1.0, 1.0] by dividing off the largest power spectrum value from the entire power spectrum
The largest power spectrum value is, conveniently enough, the center-most value in the power spectrum (corresponding to the lowest frequency)
If you divide it off the power spectrum, you'll end up doing twice the work; since FFTs are linear, you can delay the division until after the inverse DFT pass to when you're re-scaling the pixels between [0..255].
If after rescaling most of your values end up so black you can't see them, you can use a solution to the ODE y' = y(1 - y) (one example is the sigmoid f(x) = 1 / (1 + exp(-c*x) ), for some scaling factor c that gives better gradations). This has more to do with improving your ability to interpret the results visually than anything you might use to programmatically find peaks.
Edit: I said [0, 255] above. I suggest you rescale to [128, 255] or some other range whose lower bound is gray rather than black.
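A minimal sketch of the rescaling step, assuming the correlation output is available as a flat vector of doubles. rescale and sigmoid are hypothetical helper names, and the bounds are assumed to satisfy 0 <= lo <= hi <= 255.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: linearly maps the values onto [lo, hi].
// Assumes 0 <= lo <= hi <= 255.
std::vector<std::uint8_t> rescale(const std::vector<double>& v, int lo, int hi)
{
    std::vector<std::uint8_t> out(v.size(), static_cast<std::uint8_t>(lo));
    if (v.empty()) return out;
    const auto mm = std::minmax_element(v.begin(), v.end());
    const double vmin = *mm.first;
    const double vmax = *mm.second;
    if (vmax == vmin) return out;  // flat input maps to the lower bound
    for (std::size_t i = 0; i < v.size(); ++i) {
        const double t = (v[i] - vmin) / (vmax - vmin);  // 0.0 .. 1.0
        out[i] = static_cast<std::uint8_t>(std::lround(lo + t * (hi - lo)));
    }
    return out;
}

// Logistic curve; for c = 1 it solves y' = y(1 - y). A larger c gives a
// steeper transition around x = 0.
double sigmoid(double x, double c)
{
    return 1.0 / (1.0 + std::exp(-c * x));
}
```

To improve the visual gradation, each value scaled to [-1.0, 1.0] could be passed through sigmoid before calling rescale(values, 128, 255). Peak detection is better done on the unmodified values.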