tesseract APIs with OpenMP: sometimes segmentation faults, sometimes not (C++)

I'm writing a small demo with the tesseract APIs to run in parallel via OpenMP; it is basically an example taken from the tesseract API usage page, with some OpenMP flavour added to it.
The executable takes two arguments: a TIFF file and an integer for the page to be OCRized.
I'm compiling like this:
clang-omp++ -o tessapi-quality tessapi-quality.cpp -<TESSERACT_FLAGS> -O3 -fopenmp -g.
The problem I'm facing is that it works most of the time, but roughly one run in five it throws a segfault.
I tried debugging with gdb, but couldn't pin it down, because it dies sometimes in one tesseract baseapi function and sometimes in another.
Unfortunately I cannot install valgrind on the machine I'm working on right now.
I'm aware that it's not easy to test for anyone who doesn't have tesseract installed, but maybe I'm missing something big in the code that you can spot just by taking a look.
This is the code:
#include <stdlib.h>
#include <iostream>
#include <leptonica/allheaders.h>
#include <omp.h>
#include <sys/time.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <tesseract/api/baseapi.h>
void getCompImage(const char* filename, int page){
    Pix* image;
    Pixa** pixa;
    int** blockids;
    image = pixReadTiff(filename, page);
    int num_threads = 4;
    omp_set_num_threads(num_threads);
    #pragma omp parallel
    {
        tesseract::TessBaseAPI *papi = new tesseract::TessBaseAPI();
        if (papi->Init(NULL, "eng")) {
            #pragma omp critical
            {
                std::cout << "Could not initialize tesseract " << '\n';
            }
            exit(1);
        }
        papi->SetImage(image);
        Boxa* boxes = papi->GetComponentImages(tesseract::RIL_TEXTLINE, true, pixa, blockids);
        #pragma omp barrier
        #pragma omp for schedule(static)
        for (int i = 0; i < boxes->n; i++) {
            BOX* box = boxaGetBox(boxes, i, L_CLONE);
            papi->SetRectangle(box->x, box->y, box->w, box->h);
            char* ocrResult = papi->GetUTF8Text();
            int my_thread = omp_get_thread_num();
            #pragma omp critical
            {
                std::cout << "Thread: " << my_thread << " Page: " << page << " Box[" << i << "] text: " << ocrResult << "\n";
            }
        }
        // Destroy used object and release memory
        papi->End();
    }
    pixDestroy(&image);
}
int main(int argc, char *argv[])
{
    if (argc != 3){
        printf("Type the (tiff) file name to OCRize\n"
               "Then the page in the file to OCRize (first page=0, second=1 etc..)\n\n");
        exit(-1);
    }
    int page = atoi(argv[2]);
    getCompImage(argv[1], page);
    return 0;
}
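One thing that stands out in the code above, offered as a guess rather than a confirmed diagnosis: pixa and blockids are uninitialized Pixa**/int** pointers, and GetComponentImages writes its optional outputs through any non-NULL pointer it is given, which is undefined behavior and would fit an intermittent crash. Below is a minimal sketch of getCompImage that passes NULL for both optional outputs (the tesseract headers accept NULL there) and releases the per-iteration objects; the headers are the same as above.
void getCompImage(const char* filename, int page) {
    Pix* image = pixReadTiff(filename, page);
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        tesseract::TessBaseAPI* papi = new tesseract::TessBaseAPI();
        if (papi->Init(NULL, "eng")) {
            #pragma omp critical
            std::cout << "Could not initialize tesseract\n";
            exit(1);
        }
        papi->SetImage(image);
        // NULL for the optional Pixa*/block-id outputs instead of uninitialized double pointers
        Boxa* boxes = papi->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
        #pragma omp for schedule(static)
        for (int i = 0; i < boxes->n; i++) {
            BOX* box = boxaGetBox(boxes, i, L_CLONE);
            papi->SetRectangle(box->x, box->y, box->w, box->h);
            char* ocrResult = papi->GetUTF8Text();
            #pragma omp critical
            std::cout << "Thread: " << omp_get_thread_num() << " Page: " << page
                      << " Box[" << i << "] text: " << ocrResult << "\n";
            delete[] ocrResult;   // GetUTF8Text() allocates with new[]
            boxDestroy(&box);     // release the cloned box
        }
        boxaDestroy(&boxes);      // each thread frees its own Boxa
        papi->End();
        delete papi;              // End() does not free the API object itself
    }
    pixDestroy(&image);
}
If the crash persists after that, the next thing to look at would be the single Pix shared by every thread through SetImage, since leptonica's reference counting is not guaranteed to be thread-safe; giving each thread its own pixReadTiff call would rule that out.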

Related

How to properly read a Point Cloud File in C++ and ROS

I just started using the Point Cloud Library and, as a start, I would like to read a point cloud from a file. I followed the related tutorial. This is just a small example of a larger CMake project I am building. Slightly differently from the tutorial, I split the project up to make it more CMake-friendly. CMake runs well and the project seems to be organized. However, when I try to build and run the project I get the following:
/home/emanuele/catkin_ws/src/map_ros/src/pointcloud_reader_node.cpp:6:10: fatal error: ../map_ros/include/cloud.h: No such file or directory
 #include "../map_ros/include/cloud.h"
as well as an error referring to Cloud::readPCloud(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), and I don't know how to explain it.
Below is the code I am using:
cloud.h
#ifndef CLOUD_H
#define CLOUD_H
#include <iostream>
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <sensor_msgs/PointCloud2.h>
#include <string>
class Cloud
{
public:
    void readPCloud(std::string filename);
private:
    std::string path;
};
#endif // CLOUD_H
cloud.cpp
#include "cloud.h"
void Cloud::readPCloud(std::string filename)
{
pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
if(pcl::io::loadPCDFile<pcl::PointXYZ> (filename, *cloud) == -1) // load point cloud file
{
PCL_ERROR("Could not read the file");
return;
}
std::cout<<"Loaded"<<cloud->width * cloud->height
<<"data points from filename with the following fields: "
<<std::endl;
for(size_t i = 0; i < cloud->points.size(); ++i)
std::cout << " " << cloud->points[i].x
<< " " << cloud->points[i].y
<< " " << cloud->points[i].z << std::endl;
}
pointcloud_reader_node.cpp
#include <ros/ros.h>
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include "../map_ros/include/cloud.h"
using namespace std;
int main()
{
    std::string fstring = "/home/to/Desktop/file.pcd";
    Cloud p;
    p.readPCloud(fstring); // <-- Error Here
    return 0;
}
Also for completeness I am adding the CMake file below:
cmake_minimum_required(VERSION 2.8.3)
project(map_ros)
add_compile_options(-std=c++11)
set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR})
find_package(catkin REQUIRED COMPONENTS
  # ....
)
catkin_package(
  INCLUDE_DIRS include
  LIBRARIES ${PROJECT_NAME}
  CATKIN_DEPENDS
  # ......
)
###########
## Build ##
###########
include_directories(${catkin_INCLUDE_DIRS})
add_executable(pointcloud_reader_node src/pointcloud_reader_node.cpp ${SRCS})
target_link_libraries(pointcloud_reader_node ${catkin_LIBRARIES})
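A side note on the original include error (an assumption based on the files shown, not something confirmed in this thread): add_executable only lists src/pointcloud_reader_node.cpp plus whatever ${SRCS} expands to, and the project's own include/ directory is never added to the compiler's search path, which is why the relative ../map_ros/include path was needed at all. A sketch of the build section I would expect for this layout, assuming cloud.cpp lives in src/:
include_directories(include ${catkin_INCLUDE_DIRS})
add_executable(pointcloud_reader_node
  src/pointcloud_reader_node.cpp
  src/cloud.cpp)   # assumed location of cloud.cpp
target_link_libraries(pointcloud_reader_node ${catkin_LIBRARIES})
With include/ on the include path, pointcloud_reader_node.cpp can use #include "cloud.h" directly.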
I figured out the problem with my question some time ago, but I wanted to share it in case someone has the same problem.
There were two issues happening at the same time that made me think it was only a CMake problem:
1) catkin_make was not compiling properly, not because of CMake as I thought for a long time, but because the cache file catkin_ws.workspace was causing problems for CMake itself. So the first solution was to erase the cache file catkin_ws.workspace and do a fresh compile. All the CMake issues disappeared.
2) Second problem: the correct pseudo-code for reading and publishing the point cloud is:
main()
{
    init node
    create pointcloud publisher
    create rate object with 1 second duration
    load point cloud from file
    while(ros::ok())
    {
        rate.sleep
        publish point cloud message
    }
}
And I realized that nothing was being published on the input, so the callback was never executed.
Below is the complete code that reads a point cloud from a file and writes all of its points to a .txt file. I hope this can be helpful to anyone who encounters this problem:
test.cpp
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>
#include <pcl/io/pcd_io.h>
#include <pcl_conversions/pcl_conversions.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>
#include <fstream>   // for std::ofstream
void loadFromFile(std::string filename)
{
    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
    if(pcl::io::loadPCDFile<pcl::PointXYZ>(filename, *cloud) == -1) // load point cloud file
    {
        PCL_ERROR("Could not read the file");
        return;
    }
    std::cout << "Loaded " << cloud->width * cloud->height
              << " data points from /home/to/Desktop/point_cloud/yourFile.pcd with the following fields: "
              << std::endl;
    // Write the entire point cloud to a .txt file
    std::ofstream myfile;
    myfile.open("/home/to/Desktop/exampleCloud.txt");
    if (myfile.is_open()) {
        for(size_t i = 0; i < cloud->points.size(); ++i)
            myfile << " " << cloud->points[i].x
                   << " " << cloud->points[i].y
                   << " " << cloud->points[i].z << std::endl;
        myfile.close();
    }
}
int main (int argc, char** argv)
{
    // Initialize ROS
    ros::init (argc, argv, "pcl_tutorial_cloud");
    ros::NodeHandle nh;
    ros::Publisher pub = nh.advertise<sensor_msgs::PointCloud2>("output", 1000);
    ros::Rate loop_rate(1);
    loadFromFile("/home/to/Desktop/yourFile.pcd");
    int count = 0;
    while(ros::ok())
    {
        sensor_msgs::PointCloud2 pcloud2;
        pub.publish(pcloud2);
        ros::spinOnce();
        loop_rate.sleep();
        count++;
    }
    return 0;
}
Here is the result after running:
1) catkin_make
2) rosrun yourProject test
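One detail worth noting about the publishing loop above: pcloud2 is default-constructed on every iteration, so the node publishes empty messages. If the goal is to publish the cloud that was loaded from the file, a minimal sketch using pcl::toROSMsg from the pcl_conversions header already included above (the frame id is an assumption, adjust it to your setup) could replace the while loop:
// Assumes `pub` and `loop_rate` from main() above: load once, convert once, publish repeatedly.
pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
if (pcl::io::loadPCDFile<pcl::PointXYZ>("/home/to/Desktop/yourFile.pcd", *cloud) != -1)
{
    sensor_msgs::PointCloud2 pcloud2;
    pcl::toROSMsg(*cloud, pcloud2);       // fill the ROS message from the PCL cloud
    pcloud2.header.frame_id = "map";      // assumed frame id
    while (ros::ok())
    {
        pcloud2.header.stamp = ros::Time::now();
        pub.publish(pcloud2);
        ros::spinOnce();
        loop_rate.sleep();
    }
}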

Thread programming in VMware, 'process scheduling' didn't happen

I'm learning thread programming on my virtual machine. The code that does not perform as expected is the following:
#include <iostream>
#include <thread>
using namespace std;
void function01() {
    for (int i=0; i<100; i++) {
        std::cout << "from t1:" << i << std::endl;
    }
}
int main() {
    // data race and mutex
    std::thread t1( function01 );
    for (int i=0; i<100; i++) {
        std::cout << "from main:" << i << std::endl;
    }
    t1.join();
    return 0;
}
This code should produce a data race on the standard output. But when I compile it with
:!g++ -std=c++11 -pthread ./foo.cpp
and run it, every time I get a result in which the 100 "t1" lines and the 100 "main" lines come in two separate blocks instead of being interleaved. What confuses me is that when I do the same thing on another Ubuntu 14.04 installed on my old laptop, the code performs as I expected, i.e. it shows the data race.
I don't know much about VMware. Are threads running inside VMware managed in some way so that they won't encounter a data race?
------------- second edit -----------------------
Thanks, everybody.
The number of cores was probably the main reason: I got the expected result after setting the number of VM cores to more than one.
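As a quick sanity check (not from the original thread), the standard library can report how many hardware threads the VM exposes; on a single-core VM this typically prints 1:
#include <iostream>
#include <thread>
int main() {
    // Number of concurrent threads the hardware (here, the VM) supports.
    // The standard allows this to return 0 if the value cannot be computed.
    std::cout << "hardware_concurrency: "
              << std::thread::hardware_concurrency() << std::endl;
    return 0;
}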
Your new machine is probably much faster than your old one, so it is able to complete the execution of function01 before main gets to its own loop.
Or it has only one CPU, so it can execute only one routine at a time; and because your loop requires a really small amount of computation, the CPU can be done with it within the single time slice given to it by the OS.
Make sure that your VM has more than one CPU allocated to it, and try to make each step in your loops 'heavier':
double accumulator = 0;
for (int i=0; i<100; i++) {
    for (int j=1; j<1000*1000; j++)
        accumulator += std::rand();
    std::cout << "from t1:" << i << std::endl;
}
I think the problem is with the time slice. You can verify this yourself by introducing some delay in your code. For example:
#include <iostream>
#include <chrono>
#include <thread>
void function01() {
    for (int i=0; i<100; i++) {
        std::cout << "from t1:" << i << std::endl;
        std::this_thread::sleep_for(std::chrono::duration<double, std::milli>{10});
    }
}
int main() {
    // data race and mutex
    std::thread t1( function01 );
    for (int i=0; i<100; i++) {
        std::cout << "from main:" << i << std::endl;
        std::this_thread::sleep_for(std::chrono::duration<double, std::milli>{10});
    }
    t1.join();
    return 0;
}

Run-Time Check Failure #2 disappears when I define a completely new variable int i = 0 without using it

When I run this code I get the error:
Run-Time Check Failure #2 - Stack around the variable 'line' was corrupted.
#define WIN32_LEAN_AND_MEAN
#pragma comment(lib, "ws2_32.lib")
#include <winsock2.h>
#include <ws2tcpip.h>
#include "VisionUtils.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main(void)
{
    static std::string dm_ext = ".xml";
    std::ifstream testFileNames("training_data\\testFiles.txt");
    std::string line;
    //int i = 0;
    std::cout << "Main loop start" << std::endl;
    if (testFileNames.is_open()){
        while (std::getline(testFileNames, line)) //main loop
        {
            cout << line << endl;
            cout << "training_data\\depth\\" + line + dm_ext << endl;
            cv::FileStorage depthData("training_data\\depth\\" + line + dm_ext, cv::FileStorage::READ);
        }
    }
    return 0;
}
Now if I simply define int i = 0 before the loop, the error disappears
int main(void)
{
    static std::string dm_ext = ".xml";
    std::ifstream testFileNames("training_data\\testFiles.txt");
    std::string line;
    int i = 0;
    std::cout << "Main loop start" << std::endl;
    if (testFileNames.is_open()){
        while (std::getline(testFileNames, line)) //main loop
        {
            cout << line << endl;
            cout << "training_data\\depth\\" + line + dm_ext << endl;
            cv::FileStorage depthData("training_data\\depth\\" + line + dm_ext, cv::FileStorage::READ);
        }
    }
    return 0;
}
I am trying to adapt and improve code that a previous engineer created, and I am not quite familiar with it yet. The code seems to use an older version of OpenCV 2, though I haven't been able to determine which one. I plan to upgrade to the current version of OpenCV in the future, but I need to do some performance evaluation using the old code first and am therefore bound to it for now.
I would be glad to hear from anyone who has an idea of what is causing this weird behavior and how to fix the error I am having.
This is my first post on Stack Overflow, so please let me know if you see anything I can improve in my post.
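A hedged observation, since this question has no answer in the thread: Run-Time Check Failure #2 around a std::string that is only ever handed to cv::FileStorage is a common symptom of a binary mismatch between the program and the OpenCV build (different Visual Studio version, Debug vs Release runtime, or different iterator-debugging settings), because the two sides then disagree about the size or layout of std::string; the extra int i = 0 merely shifts the stack layout so the check no longer fires. A minimal repro sketch to isolate it (the file name is hypothetical):
#include <opencv2/core/core.hpp>
#include <iostream>
#include <string>
int main()
{
    std::string line = "someTestFile";                        // hypothetical sample name
    std::string path = "training_data\\depth\\" + line + ".xml";
    cv::FileStorage depthData(path, cv::FileStorage::READ);   // the only OpenCV call left
    std::cout << "isOpened: " << depthData.isOpened() << std::endl;
    return 0;
}
If this alone still corrupts the stack, the loop and the ifstream are innocent and the build settings of the OpenCV binaries are the place to look.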

Resource leak or not (Mac OS X)?

I'm trying to enumerate local users on Mac OS. It works correctly, but I think there is some resource leak, and I can't understand it.
Profiling says that there are no memory leaks, but memory usage grows constantly (Memory Report chart in Xcode), in my case from 2.7 MB to 4.9 MB (5 * 1000 iterations).
Can anybody say what is wrong with my code? Are there any leaks, or is this behaviour normal?
This is a simple C++ command line tool project with Objective-C code and default build settings (Xcode 5):
/////////////////////////////////////////////
// main.cpp
#include "test.h"
#include <iostream>
#include <thread>
int main(int argc, const char * argv[])
{
//for (int i = 0; i < 1000; ++i)
for (int i = 0; i < 5; ++i)
{
std::cout << "Iteration # " << i << std::endl;
for (int j = 0; j < 1000; ++j)
{
Execute();
}
std::this_thread::sleep_for(std::chrono::seconds(1));
}
return 0;
}
/////////////////////////////////////////////
// test.mm
#import <Collaboration/Collaboration.h>
#import <CoreServices/CoreServices.h>
#import <Foundation/Foundation.h>
#import <SystemConfiguration/SCDynamicStore.h>
#import <SystemConfiguration/SCDynamicStoreCopySpecific.h>
#include <iostream>
void Execute()
{
    CSIdentityAuthorityRef identityAuthority = CSGetLocalIdentityAuthority();
    if (!identityAuthority)
    {
        std::cout << "Failed to get identity authority." << std::endl;
        return;
    }
    CSIdentityQueryRef usersQuery(CSIdentityQueryCreate(nil, kCSIdentityClassUser, identityAuthority));
    if (!usersQuery)
    {
        std::cout << "Failed to create query." << std::endl;
        return;
    }
    /////////////////////////////////////////////////
    // Without CSIdentityQueryExecute(usersQuery, 0, nil) - everything is ok.
    /////////////////////////////////////////////////
    if (!CSIdentityQueryExecute(usersQuery, 0, nil))
    {
        std::cout << "Failed to execute query." << std::endl;
        return;
    }
    CFRelease(usersQuery);
}
/////////////////////////////////////////////
// test.h
#ifndef __MY_TEST_H__
#define __MY_TEST_H__
void Execute();
#endif
Try to call CFRelease before every return, since some iterations are not releasing the data.
I just ran this program, and I'm not seeing any memory growth. I slightly simplified it to be a single-file C++ program (currently it's a mix of C++ and ObjC++).
You do have a memory mistake, but I would only expect it to cause a leak if you were getting errors. This block leaks the query:
if (!CSIdentityQueryExecute(usersQuery, 0, nil))
{
    std::cout << "Failed to execute query." << std::endl;
    return;
}
You should either not return here (you don't technically need to), or you should include a CFRelease(usersQuery) before the return. But again, if this were the problem, you'd see lots of "Failed to execute query" log messages.
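For completeness, here is a minimal sketch of Execute() that releases the query on every exit path, along the lines both answers suggest (same frameworks and calls as in the question, only the cleanup is reorganized):
void Execute()
{
    // CSGetLocalIdentityAuthority() follows the Get rule: we do not own the
    // returned reference and must not release it.
    CSIdentityAuthorityRef identityAuthority = CSGetLocalIdentityAuthority();
    if (!identityAuthority)
    {
        std::cout << "Failed to get identity authority." << std::endl;
        return;
    }
    // CSIdentityQueryCreate() follows the Create rule: we own the query and
    // must CFRelease it on every path out of this function.
    CSIdentityQueryRef usersQuery = CSIdentityQueryCreate(nil, kCSIdentityClassUser, identityAuthority);
    if (!usersQuery)
    {
        std::cout << "Failed to create query." << std::endl;
        return;
    }
    if (!CSIdentityQueryExecute(usersQuery, 0, nil))
    {
        std::cout << "Failed to execute query." << std::endl;
    }
    CFRelease(usersQuery);   // released on both the success and the failure path
}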

When boost library "interprocess" defines a named_mutex do those named_mutexes work properly between different processes, or only with threads?

I think I must be assuming something from the name boost::interprocess that is not true.
The documentation repeats that named_mutex is global here.
I am unable to make it work, though. Two copies of the same executable are run at the same time, and I expect that a named mutex from a library named boost::interprocess would actually BLOCK sometimes. It doesn't. It also doesn't prevent corruption of the data file in the code below.
Here's some code from the boost docs:
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <fstream>
#include <iostream>
#include <cstdio>
int main ()
{
    using namespace boost::interprocess;
    try{
        struct file_remove
        {
            file_remove() { std::remove("file_name"); }
            ~file_remove(){ std::remove("file_name"); }
        } file_remover;
        struct mutex_remove
        {
            mutex_remove() { named_mutex::remove("fstream_named_mutex"); }
            ~mutex_remove(){ named_mutex::remove("fstream_named_mutex"); }
        } remover;
        //Open or create the named mutex
        named_mutex mutex(open_or_create, "fstream_named_mutex");
        std::ofstream file("file_name");
        for(int i = 0; i < 10; ++i){
            //Do some operations...
            //Write to file atomically
            scoped_lock<named_mutex> lock(mutex);
            file << "Process name, ";
            file << "This is iteration #" << i;
            file << std::endl;
        }
    }
    catch(interprocess_exception &ex){
        std::cout << ex.what() << std::endl;
        return 1;
    }
    return 0;
}
Here's what I did to it so I could prove to myself the mutex was doing something:
#include <windows.h>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/lambda/lambda.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <cstdio>
int main (int argc, char *argv[])
{
    srand((unsigned) time(NULL));
    using namespace boost::interprocess;
    try{
        /*
        struct file_remove
        {
            file_remove() { std::remove("file_name"); }
            ~file_remove(){ std::remove("file_name"); }
        } file_remover;
        */
        struct mutex_remove
        {
            mutex_remove() { named_mutex::remove("fstream_named_mutex"); }
            ~mutex_remove(){ named_mutex::remove("fstream_named_mutex"); }
        } remover;
        //Open or create the named mutex
        named_mutex mutex(open_or_create, "fstream_named_mutex");
        std::ofstream file("file_name");
        for(int i = 0; i < 100; ++i){
            //Do some operations...
            //Write to file atomically
            DWORD n1,n2;
            n1 = GetTickCount();
            scoped_lock<named_mutex> lock(mutex);
            n2 = GetTickCount();
            std::cout << "took " << (n2-n1) << " msec to acquire mutex";
            int randomtime = rand()%10;
            if (randomtime<1)
                randomtime = 1;
            Sleep(randomtime*100);
            std::cout << " ... writing...\n";
            if (argc>1)
                file << argv[1];
            else
                file << "SOMETHING";
            file << " This is iteration #" << i;
            file << std::endl;
            file.flush(); // added in case this explains the corruption, it does not.
        }
    }
    catch(interprocess_exception &ex){
        std::cout << "ERROR " << ex.what() << std::endl;
        return 1;
    }
    return 0;
}
Console Output:
took 0 msec to acquire mutex ... writing...
took 0 msec to acquire mutex ... writing...
took 0 msec to acquire mutex ... writing...
took 0 msec to acquire mutex ... writing...
Also, the demo writes to a file which, if you run two copies of the program, will be missing some data.
I expect that if I delete file_name and run two copies of the program, I should get interleaved writes to file_name containing 100 rows from each instance.
(Note that the demo code is clearly not using an ofstream in append mode; instead it simply rewrites the file each time the program runs. So if we wanted a demo showing two processes writing to one file, I'm aware of that reason why it wouldn't work. What I did expect is for the above code to be a feasible demonstration of mutual exclusion, which it is not. Also, calls to the very handy and aptly named ofstream::flush() method could have been included, and weren't.)
Using Boost 1.53 on Visual C++ 2008
It turns out that Boost is a wonderful library, but the code examples interspersed in its documentation can sometimes be broken. At least the one for boost::interprocess::named_mutex in the docs is not functional on Windows systems.
*Always deleting the mutex as part of the demo code causes the mutex to not function.*
That should be commented in the demo code at the very least; it fails the "principle of least amazement". I wondered why the removal was there and assumed it must be idiomatic and necessary; in actual fact it is unnecessary and breaks the demo. Or, if it really is necessary, it's an example of what Joel Spolsky would call a leaky abstraction. If mutexes are really filesystem objects under C:\ProgramData on Windows, I sure don't want to know about it, or to know that leftover files will break the abstraction unless I detect that case and clean them up. (It sure smells like POSIX-friendly semantics for mutexes in Boost led to a POSIX-style implementation instead of going to the Win32 API directly and implementing a simple mutex that leaves nothing behind on the filesystem.)
Here's a working demo:
#include <windows.h>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/lambda/lambda.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <fstream>
#include <iostream>
#include <cstdio>
#include <windows.h>
int main (int argc, char *argv[])
{
    srand((unsigned) time(NULL));
    using namespace boost::interprocess;
    try{
        /*
        // UNCOMMENT THIS IF YOU WANT TO MAKE THIS DEMO IMPOSSIBLE TO USE TO DEMO ANYTHING
        struct file_remove
        {
            file_remove() { std::remove("file_name"); }
            ~file_remove(){ std::remove("file_name"); }
        } file_remover;
        // UNCOMMENT THIS IF YOU WANT TO BREAK THIS DEMO HORRIBLY:
        struct mutex_remove
        {
            mutex_remove() { named_mutex::remove("fstream_named_mutex"); }
            ~mutex_remove(){ named_mutex::remove("fstream_named_mutex"); }
        } remover;
        */
        //Open or create the named mutex
        named_mutex mutex(open_or_create, "fstream_named_mutex");
        std::ofstream file("file_name", std::ios_base::app );
        int randomtime = 0;
        for(int i = 0; i < 100; ++i){
            //Do some operations...
            //Write to file atomically
            DWORD n1,n2;
            n1 = GetTickCount();
            {
                scoped_lock<named_mutex> lock(mutex);
                n2 = GetTickCount();
                std::cout << "took " << (n2-n1) << " msec to acquire mutex";
                randomtime = rand()%10;
                if (randomtime<1)
                    randomtime = 1;
                std::cout << " ... writing...\n";
                if (argc>1)
                    file << argv[1];
                else
                    file << "SOMETHING";
                file << "...";
                Sleep(randomtime*100);
                file << " This is iteration #" << i;
                file << std::endl;
                file.flush();
            }
            Sleep(randomtime*100); // let the other guy in.
        }
    }
    catch(interprocess_exception &ex){
        std::cout << "ERROR " << ex.what() << std::endl;
        return 1;
    }
    return 0;
}
I would love critiques and edits on this answer, so that people will have a working demo of using this named mutex.
To use the demo:
- Build it and run two copies of it. Pass a parameter in so you can see which instance wrote which lines (start myexename ABC and start myexename DEF from a command prompt in Windows).
- If it's your second run, delete any stray output file named "file_name" if you don't want the second run appended to the first.