Goal
Get the same quality of result from Tesseract OCR when using an OpenCV Mat as when using a Leptonica Pix.
Environment
C++17, OpenCV 3.4.1, Tesseract 3.05.01, Leptonica 1.74.4, Visual Studio Community 2017, Windows 10 Pro 64-bit
Description
I'm working with Tesseract and OCR, and have found what I think is a peculiar behaviour.
This is my input image:
And this is my code:
#include "stdafx.h"
#include <iostream>
#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#pragma comment(lib, "ws2_32.lib")
using namespace std;
using namespace cv;
using namespace tesseract;
void opencvVariant(string titleFile);
void leptonicaVariant(const char* titleFile);
int main()
{
cout << "Tesseract with OpenCV and Leptonica" << endl;
const char* titleFile = "raptor-companion-2.jpg";
opencvVariant(titleFile);
leptonicaVariant(titleFile);
cout << endl;
system("pause");
return 0;
}
void opencvVariant(string titleFile) {
cout << endl << "OpenCV variant..." << endl;
TessBaseAPI ocr;
ocr.Init(NULL, "eng");
Mat image = imread(titleFile);
ocr.SetImage(image.data, image.cols, image.rows, 1, image.step);
char* outText = ocr.GetUTF8Text();
int confidence = ocr.MeanTextConf();
cout << "Text: " << outText << endl;
cout << "Confidence: " << confidence << endl;
}
void leptonicaVariant(const char* titleFile) {
cout << endl << "Leptonica variant..." << endl;
TessBaseAPI ocr;
ocr.Init(NULL, "eng");
Pix *image = pixRead(titleFile);
ocr.SetImage(image);
char* outText = ocr.GetUTF8Text();
int confidence = ocr.MeanTextConf();
cout << "Text: " << outText << endl;
cout << "Confidence: " << confidence << endl;
}
The methods opencvVariant and leptonicaVariant are basically the same, except that one uses the class Mat from OpenCV and the other Pix from Leptonica. Yet the results are quite different.
OpenCV variant...
Text: Rapton
Confidence: 68
Leptonica variant...
Text: Raptor Companion
Confidence: 83
As one can see in the output above, the Pix variant gives a much better result than the Mat variant. Since my code relies heavily on OpenCV for the computer vision before the OCR, it's essential for me that the OCR works well with OpenCV and its classes.
Questions
Why does Pix give a better result than Mat in this case?
How could the code be changed to make the Mat variant perform as well as the Pix variant?
OpenCV's imread function reads an image as color by default, which means you get the pixels interleaved as BGRBGRBGR....
In your example you are assuming the OpenCV image is grayscale, so there are two ways of fixing that:
Change your SetImage line according to the number of channels in the OpenCV image:
ocr.SetImage((uchar*)image.data, image.size().width, image.size().height, image.channels(), image.step1());
Convert your OpenCV image to grayscale so it has 1 channel (see the sketch after the two options):
cv::cvtColor(image, image, CV_BGR2GRAY);
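For completeness, here is a minimal sketch of how opencvVariant from the question could look with the second fix applied; the image then really has one channel, matching the channel and step arguments passed to SetImage (this mirrors the question's own code rather than an official Tesseract example):

void opencvVariant(string titleFile) {
    cout << endl << "OpenCV variant..." << endl;
    TessBaseAPI ocr;
    ocr.Init(NULL, "eng");
    Mat image = imread(titleFile);        // loaded as 3-channel BGR by default
    cvtColor(image, image, CV_BGR2GRAY);  // now a single 8-bit channel
    ocr.SetImage(image.data, image.cols, image.rows, 1, image.step);
    char* outText = ocr.GetUTF8Text();
    cout << "Text: " << outText << endl;
    cout << "Confidence: " << ocr.MeanTextConf() << endl;
}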
Related
I am working on an OpenCV project in C++. I am trying to read an image and then resize it, but on resizing the image I get a segmentation fault.
I am using Ubuntu 20.04 and installed OpenCV 4.5.4 following this tutorial: https://linuxize.com/post/how-to-install-opencv-on-ubuntu-20-04/
Here is the code I am using:
#include <opencv2/opencv.hpp>
#include <iostream>

using namespace std;
using namespace cv;

int main()
{
    // This works: Printing out the OpenCV version
    cout << "OpenCV version : " << CV_VERSION << endl;
    cout << "Major version : " << CV_MAJOR_VERSION << endl;
    cout << "Minor version : " << CV_MINOR_VERSION << endl;
    cout << "Subminor version : " << CV_SUBMINOR_VERSION << endl;

    // This works: Read the image using imread function
    Mat image = imread("./test_image.jpg");
    cv::Mat dst;

    // This is where it fails.
    cv::resize(image, dst, cv::Size(150, 150));

    cv::namedWindow("Source", cv::WINDOW_AUTOSIZE);
    cv::imshow("Source", image);
    cv::namedWindow("resize", cv::WINDOW_AUTOSIZE);
    cv::imshow("resize", dst);
    waitKey(0);
    return 0;
}
I am able to show the loaded image/video frame before resizing.
Can someone please help me figure out where I am going wrong? I have been stuck on this for the past 2 days and have tried almost all tutorials and solutions available online, but nothing worked. Thanks.
First of all, check whether the image is empty using image.empty(). imread returns an empty Mat when it cannot load the file (for example, a wrong path), and resizing an empty image will crash.
If that's not the case, then it's probably an issue with OpenCV itself. I faced the same problem with OpenCV 4.2; updating the OpenCV version might solve it. OpenCV 4.6 worked perfectly for me.
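For the first check, a minimal sketch (assuming the same ./test_image.jpg path as in the question) would place the guard right after imread:

Mat image = imread("./test_image.jpg");
if (image.empty()) {
    // imread returns an empty Mat when the file cannot be loaded
    cerr << "Could not load ./test_image.jpg, check the path" << endl;
    return -1;
}
cv::Mat dst;
cv::resize(image, dst, cv::Size(150, 150)); // safe now: image is non-empty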
I downloaded a project which allows grabbing frames from the Pi camera module with OpenCV. When I run the downloaded code, it works without a problem. But when I try to apply a simple threshold operation on the frames, I get the error shown below.
I checked the frames' channels and type: image.channels() returns 1 and image.type() returns 0 (i.e. CV_8UC1). I can't see any reason for the threshold operation to fail.
What is the problem here?
The error:
The code:
#include "cap.h"
#include <opencv2/opencv.hpp>
#include <opencv2/highgui/highgui.hpp>
using namespace cv;
using namespace std;
int main() {
namedWindow("Video");
// Create capture object, similar to VideoCapture
// PiCapture(width, height, color_flag);
// color_flag = true => color images are captured,
// color_flag = false => greyscale images are captured
PiCapture cap(320, 240, false);
Mat image,binary;
double time = 0;
unsigned int frames = 0;
cout << "Press 'q' to quit" << endl;
while(char(waitKey(1)) != 'q') {
double t0 = getTickCount();
image = cap.grab();
std::cout<<image.channels()<< endl;//check for channel
cout<<image.type()<< endl;//check for type
threshold(image,binary,150,255,THRESH_BINARY);//threshold operation
frames++;
if(!image.empty()) imshow("Hello", image);
else cout << "Frame dropped" << endl;
time += (getTickCount() - t0) / getTickFrequency();
cout << frames / time << " fps" << endl;
}
return 0;
}
The assertion m.ndims >= 2 checks that the matrix in question is a valid two-dimensional image. You do have a conditional that shows the image only if it's not empty, but the assertion fails inside threshold before the program ever reaches that conditional, which is why you don't see any image window pop up. In other words, cap.grab() is occasionally returning an empty Mat, and calling threshold on it triggers the assertion.
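A minimal sketch of the loop body with the check moved before threshold (reusing the names from the question's code):

image = cap.grab();
if (image.empty()) {
    // grab() returned no data; skip this iteration instead of calling threshold
    cout << "Frame dropped" << endl;
    continue;
}
threshold(image, binary, 150, 255, THRESH_BINARY);
imshow("Video", binary); // reuse the window created with namedWindow("Video")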
I am trying to read data from an industrial camera using the V4L Linux driver and C++. I would like to display the result using OpenCV. I read the buffer and create a Mat object, which actually contains values in the range 0...255.
The problem seems to be the imshow() call. When I comment this line out, a window without an image is displayed. Once it is uncommented, no window is displayed and no output appears in the terminal after this line. I am not able to find a solution on my own; all the examples I found look the same as my code to me.
Here is the code:
#include <fcntl.h>
#include "opencv/cv.h"
#include "opencv/highgui.h"
#include <libv4l2.h>
#include <libv4l1.h>
#include <linux/videodev2.h>
#include <sys/ioctl.h>

#define BUFFERSIZE 357120 // 744 * 480

using namespace cv;
using namespace std;

int main(int argc, char **argv) {
    int cameraHandle, i;
    unsigned char pictureBuffer[BUFFERSIZE];
    char cameraDevice[] = "/dev/video0";
    struct v4l2_control V4L2_control;

    /* open camera device */
    if ((cameraHandle = v4l1_open(cameraDevice, O_RDONLY)) == -1) {
        printf("Unable to open the camera");
        return -1;
    }

    // disable auto exposure
    V4L2_control.id = V4L2_CID_EXPOSURE_AUTO;
    V4L2_control.value = V4L2_EXPOSURE_SHUTTER_PRIORITY;
    ioctl(cameraHandle, VIDIOC_S_CTRL, &V4L2_control);

    // set exposure time
    V4L2_control.id = V4L2_CID_EXPOSURE_ABSOLUTE;
    V4L2_control.value = 2;
    ioctl(cameraHandle, VIDIOC_S_CTRL, &V4L2_control);

    // get 5 pictures to warm up the camera
    for (i = 0; i <= 5; i++) {
        v4l1_read(cameraHandle, pictureBuffer, BUFFERSIZE);
    }

    // show pictures
    Mat mat = Mat(744, 480, CV_8UC3, (void*)pictureBuffer);
    cout << "M = " << endl << " " << mat << endl << endl; // display the image data
    namedWindow("imagetest", CV_WINDOW_AUTOSIZE);
    imshow("imagetest", mat);
    waitKey(30);
    cout << "test output" << endl;

    // cleanup
    v4l1_close(cameraHandle);
    destroyWindow("imagetest");
    return 0;
}
EDIT:
Well, after running the code in a terminal instead of Eclipse I saw a segmentation fault. Even commenting out everything after the
cout << "M = " << endl << " " << mat << endl << endl;
line gives me this error.
Solved. The problem lay in the wrong pixel format: CV_8UC1 or CV_8U instead of CV_8UC3 produced an output. The buffer holds 744 * 480 = 357120 bytes, i.e. one byte per pixel, so interpreting it as a three-channel CV_8UC3 image reads far past the end of the buffer. The difference between those formats is described here: In OpenCV, what's the difference between CV_8U and CV_8UC1?
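For reference, a sketch of the corrected line, assuming 744 is the frame width and 480 the height; note that the Mat constructor takes rows (height) first, then columns (width):

// 480 rows x 744 columns, one 8-bit channel per pixel:
// 480 * 744 = 357120 bytes, exactly the size of pictureBuffer
Mat mat = Mat(480, 744, CV_8UC1, (void*)pictureBuffer);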
I am developing a simple camera viewer to test the Basler camera acA1300-30gc. I am working in Ubuntu 14.04 with Basler Pylon 4 and OpenCV version 2.4.8, because I am going to develop a machine vision application and I need to analyze frames on the fly.
Based on the OpenCV Display Image Tutorial, the sample code in the Pylon documentation and this similar question, I wrote the following code.
Code:
int main(int argc, char* argv[]) {
    Pylon::PylonAutoInitTerm autoInitTerm;
    Mat image(IM_HEIGHT, IM_WIDTH, CV_8UC3);
    CGrabResultPtr ptrGrabResult;
    //namedWindow(WIN_NAME, CV_WINDOW_AUTOSIZE);
    try {
        CInstantCamera camera(CTlFactory::GetInstance().CreateFirstDevice());
        cout << "Using device " << camera.GetDeviceInfo().GetModelName() << endl;
        camera.StartGrabbing();
        while (camera.IsGrabbing()) {
            camera.RetrieveResult(5000, ptrGrabResult, TimeoutHandling_ThrowException);
            if (ptrGrabResult->GrabSucceeded()) {
                memcpy(image.ptr(), ptrGrabResult->GetBuffer(),
                       ptrGrabResult->GetWidth() * ptrGrabResult->GetHeight());
                //if(!image.empty())
                //    imshow(WIN_NAME, image);
                //if(waitKey(30) == 27) {
                //    camera.StopGrabbing();
                //}
            }
        }
    } catch (GenICam::GenericException &e) {
        cerr << "An exception occurred." << endl << e.GetDescription() << endl;
    }
    //destroyWindow(WIN_NAME);
    return 0;
}
I don't know why, when I uncomment namedWindow(WIN_NAME, CV_WINDOW_AUTOSIZE);, the camera doesn't grab anymore.
I would be very grateful if someone could help me, please.
// Grab.cpp
/*
    Note: Before getting started, Basler recommends reading the Programmer's Guide topic
    in the pylon C++ API documentation that gets installed with pylon.
    If you are upgrading to a higher major version of pylon, Basler also
    strongly recommends reading the Migration topic in the pylon C++ API documentation.

    This sample illustrates how to grab and process images using the CInstantCamera class.
    The images are grabbed and processed asynchronously, i.e.,
    while the application is processing a buffer, the acquisition of the next buffer is done
    in parallel.

    The CInstantCamera class uses a pool of buffers to retrieve image data
    from the camera device. Once a buffer is filled and ready,
    the buffer can be retrieved from the camera object for processing. The buffer
    and additional image data are collected in a grab result. The grab result is
    held by a smart pointer after retrieval. The buffer is automatically reused
    when explicitly released or when the smart pointer object is destroyed.
*/

#include <pylon/PylonIncludes.h>
#ifdef PYLON_WIN_BUILD
#include <pylon/PylonGUI.h>
#endif

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/core/core.hpp"

using namespace cv;
// Namespace for using pylon objects.
using namespace Pylon;
// Namespace for using cout.
using namespace std;

// Number of images to be grabbed.
static const uint32_t c_countOfImagesToGrab = 100;

int main(int argc, char* argv[])
{
    // The exit code of the sample application.
    int exitCode = 0;

    // Automagically call PylonInitialize and PylonTerminate to ensure
    // the pylon runtime system is initialized during the lifetime of this object.
    Pylon::PylonAutoInitTerm autoInitTerm;

    CGrabResultPtr ptrGrabResult;
    namedWindow("CV_Image", WINDOW_AUTOSIZE);

    try
    {
        CInstantCamera camera(CTlFactory::GetInstance().CreateFirstDevice());
        cout << "Using device " << camera.GetDeviceInfo().GetModelName() << endl;

        camera.Open();
        GenApi::CIntegerPtr width(camera.GetNodeMap().GetNode("Width"));
        GenApi::CIntegerPtr height(camera.GetNodeMap().GetNode("Height"));
        // Mat takes rows (height) first, then columns (width).
        Mat cv_img(height->GetValue(), width->GetValue(), CV_8UC3);

        camera.StartGrabbing();
        CPylonImage image;
        CImageFormatConverter fc;
        fc.OutputPixelFormat = PixelType_RGB8packed;

        while (camera.IsGrabbing()) {
            camera.RetrieveResult(5000, ptrGrabResult, TimeoutHandling_ThrowException);
            if (ptrGrabResult->GrabSucceeded()) {
                fc.Convert(image, ptrGrabResult);
                cv_img = cv::Mat(ptrGrabResult->GetHeight(), ptrGrabResult->GetWidth(),
                                 CV_8UC3, (uint8_t*)image.GetBuffer());
                imshow("CV_Image", cv_img);
                if (waitKey(30) == 27) { // ESC quits; waitKey also services the HighGUI event loop
                    camera.StopGrabbing();
                }
            }
        }
    }
    catch (GenICam::GenericException &e)
    {
        // Error handling.
        cerr << "An exception occurred." << endl
             << e.GetDescription() << endl;
        exitCode = 1;
    }

    // Comment the following two lines to disable waiting on exit.
    cerr << endl << "Press Enter to exit." << endl;
    while (cin.get() != '\n');

    return exitCode;
}
Take the Grab.cpp sample code above and add the following into it, and it will work:

CImageFormatConverter fc;
fc.OutputPixelFormat = PixelType_BGR8packed;
CPylonImage image;

if (ptrGrabResult->GrabSucceeded())
{
    fc.Convert(image, ptrGrabResult);
    Mat cv_img = cv::Mat(ptrGrabResult->GetHeight(), ptrGrabResult->GetWidth(),
                         CV_8UC3, (uint8_t*)image.GetBuffer());
    imshow(src_window, cv_img); // src_window: name of a window created earlier with namedWindow
    waitKey(1);
}

Note that PixelType_BGR8packed matches the BGR channel order OpenCV expects, so imshow displays the colors correctly; with PixelType_RGB8packed, as in the listing above, the red and blue channels would appear swapped.
I am trying to understand how to get the descriptor for a given KeyPoint in OpenCV. So far my code looks as follows:
#include <iostream>
#include "opencv2/opencv.hpp"

typedef cv::Mat Image;

int main(int argc, const char * argv[])
{
    Image imgA = cv::imread("images/buddhamulticam_total100.png",
                            CV_LOAD_IMAGE_GRAYSCALE);
    Image imgB = cv::imread("images/buddhamulticam_total101.png",
                            CV_LOAD_IMAGE_GRAYSCALE);

    cv::Ptr<cv::FeatureDetector> detector =
        cv::FeatureDetector::create("ORB");
    cv::Ptr<cv::DescriptorExtractor> descriptor =
        cv::DescriptorExtractor::create("ORB");

    std::vector<cv::KeyPoint> keyPointsA, keyPointsB;
    keyPointsA.push_back(cv::KeyPoint(0, 0, 5));
    keyPointsB.push_back(cv::KeyPoint(10, 10, 5));

    cv::Mat descriptorA, descriptorB;
    descriptor->compute(imgA, keyPointsA, descriptorA);
    descriptor->compute(imgB, keyPointsB, descriptorB);

    std::cout << "DescriptorA (" << descriptorA.rows << ","
              << descriptorA.cols << ")" << std::endl;
    std::cout << "DescriptorB (" << descriptorB.rows << ","
              << descriptorB.cols << ")" << std::endl;

    return 0;
}
The problem is that I am getting no data in the descriptors. What am I missing?
Could you also explain in more detail what the parameters passed to the KeyPoint object are? I am new to computer vision and OpenCV, so a better explanation than OpenCV's documentation would probably help.
You're trying to compute ORB on the points (0,0) and (10,10), but they are too close to the image border, so ORB can't compute descriptors at those locations. ORB (like the other binary descriptors) filters them out. (As for the parameters: KeyPoint(x, y, size) takes the x and y coordinates of the point and the diameter of the meaningful neighbourhood around it.)
EDIT: since you asked about usage, I'm editing the answer. You should pass the whole image. I use it like this:

Ptr<FeatureDetector> detector = FeatureDetector::create(detector_name);
Ptr<DescriptorExtractor> descriptor = DescriptorExtractor::create(descriptor_name);

detector->detect(imgK, kp);
descriptor->compute(imgK, kp, desc);
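Putting the pieces together, here is a self-contained sketch in the same OpenCV 2.x-style API as the question (the image path is the one from the question; any image with enough texture away from the border will do):

#include <iostream>
#include <vector>
#include "opencv2/opencv.hpp"

int main()
{
    cv::Mat img = cv::imread("images/buddhamulticam_total100.png",
                             CV_LOAD_IMAGE_GRAYSCALE);
    if (img.empty()) return -1;

    cv::Ptr<cv::FeatureDetector> detector = cv::FeatureDetector::create("ORB");
    cv::Ptr<cv::DescriptorExtractor> descriptor = cv::DescriptorExtractor::create("ORB");

    std::vector<cv::KeyPoint> kp;
    cv::Mat desc;
    detector->detect(img, kp);          // keypoints chosen by ORB lie away from the border
    descriptor->compute(img, kp, desc); // one 32-byte row per keypoint that survives

    std::cout << "keypoints: " << kp.size()
              << ", descriptors: " << desc.rows << "x" << desc.cols << std::endl;
    return 0;
}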