I'm currently writing a web-crawler/spider in C++ on Linux and I'm having some problems with updating a database. I'm fairly new to C/C++, just FYI.
The database updates are executed by a separate thread (using pthreads), but the same problem exists if they are executed in main(), so I, perhaps naively, discarded the threading stuff as the cause of anything.
I'm using libmysqlcppconn for the database API.
I am compiling with gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1) with -O2 -Wall -pedantic and it compiles cleanly.
Nevertheless, when the function commitChangesToDatabase() below is called, it basically picks out items from a std::map (url_queue), throws them in a std::vector (updates) and erases said item from the original std::map, then proceeds to iterate over the std::vector, executing a MySQL prepared statement for each item in the vector. Here is where it fails hard.
It randomly either:
Crashes without any error output (no segfault, no stacktrace, no nothing)
Crashes with a glibc memory corruption detected (see output here: http://pastie.org/private/wlkuorivq5tptlcr7ojg)
Reports that the MySQL server has gone away (caught exception), but keeps trying (doesn't crash)
I have tried switching the prepared statement to a simple executeUpdate(), but to no avail.
I have tried eliminating the step of picking out items and instead executing the updates whenever I find an item to update, in the first loop over url_queue.
Other functions in this application use prepared statements as well (another UPDATE) and those work fine. Those functions are also run by separate threads.
I would run the application through valgrind, but quite frankly, I don't understand most of the output so it wouldn't help me much - but if anyone wants the output from it, let me know which options to run it with and I'll provide it.
I have no clue how to proceed from here. Anyone have a clue what's wrong?
struct queue_item_t {
int id;
int sites_id;
int priority;
int depth;
int handler;
int state; // 0 = Pending, 1 = Working, 2 = Completed, 3 = Checked
double time_allowed_crawl;
bool status;
bool was_redirected;
double time;
double time_end;
double time_curl;
double size;
std::string hash;
std::string url;
std::string file;
std::string host;
};
void commitChangesToDatabase()
{
map< string, queue_item_t >::iterator it, end;
sql::PreparedStatement *pstmt;
int i = 0;
if (!url_queue.size()) {
return;
}
pthread_mutex_lock(&dbCommitMutex);
pthread_mutex_lock(&itemMutex);
cout << "commitChangesToDatabase()" << endl;
pstmt = dbPrepareStatement("UPDATE crawler_queue SET process_hash = NULL, date_crawled = NOW(), url = ?, hash = ? WHERE id = ?");
for (it = url_queue.begin(); it != url_queue.end();)
{
if (it->second.state == 2)
{
pstmt->setString(1, it->second.url);
pstmt->setString(2, it->second.hash);
pstmt->setInt(3, it->second.id);
try {
pstmt->executeUpdate();
++i;
} catch (sql::SQLException &e) {
cerr << "# ERR: SQLException in " << __FILE__;
cerr << "(" << __FUNCTION__ << ") on line " << __LINE__ << endl;
cerr << "# ERR: " << e.what();
cerr << " (MySQL error code: " << e.getErrorCode();
cerr << ", SQLState: " << e.getSQLState() << " )" << endl;
}
url_queue.erase(it++);
}
else {
++it;
}
}
delete pstmt;
cout << "~commitChangesToDatabase()" << endl;
pthread_mutex_unlock(&itemMutex);
pthread_mutex_unlock(&dbCommitMutex);
}
// this function is defined in another file but is written here just to show the contents of it
sql::PreparedStatement *dbPrepareStatement(const std::string &query)
{
return con->prepareStatement(query);
}
Edit:
Some seem to believe the problem is with the iteration over the url_queue collection; however, I have ruled that out by commenting out everything that operates on the database, but not the iteration. Furthermore, the iteration here is a simplified (but working) version of the original, which picks out items from the map, puts them in a vector and erases them from the map, as demonstrated below. That part of the program works fine; it only crashes whenever the database is used.
for (it = url_queue.begin(); it != url_queue.end();)
{
if (it->second.state == 2)
{
update_item.type = (!it->second.was_redirected ? 1 : 2);
update_item.item = it->second;
updates.push_back(update_item);
url_queue.erase(it++);
}
else {
++it;
}
}
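For reference, the erase-while-iterating pattern above can be exercised in isolation. The sketch below is a minimal stand-in, assuming a reduced item_t with only the state and url fields (a hypothetical simplification of queue_item_t) and a hypothetical helper name takeCompleted. It only illustrates that erase(it++) keeps the iterator usable for std::map; it says nothing about the database code.

```cpp
#include <map>
#include <string>
#include <vector>

// Reduced stand-in for queue_item_t (assumption: only these fields matter here).
struct item_t {
    int state;        // 2 = Completed, as in the original struct
    std::string url;
};

// Moves every completed item out of the queue and returns the removed items.
std::vector<item_t> takeCompleted(std::map<std::string, item_t>& queue)
{
    std::vector<item_t> updates;
    for (std::map<std::string, item_t>::iterator it = queue.begin(); it != queue.end();) {
        if (it->second.state == 2) {
            updates.push_back(it->second);
            // it++ passes the old position to erase() and advances first,
            // so the loop never touches the erased node again.
            queue.erase(it++);
        } else {
            ++it;
        }
    }
    return updates;
}
```

Erasing from a std::map invalidates only the iterator to the erased element, which is why advancing before the erase takes effect is enough here.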
Edit 2:
Output from valgrind --leak-check=yes: http://pastie.org/private/2ypk0bmawwsqva3ikfazw
It seems, the iterator is incremented unnecessarily; first in loop body, and also in for statement. In this code, it is possible to increment the end iterator, this is a problematic operation and might be source of the problem.
The following loop structure is more suitable for this case:
it = url_queue.begin();
while( it != url_queue.end() ){
//loop body
}
I don't think it is a good idea to mess with iterators. replace:
else {
++it;
}
by:
else continue;
or just remove it.
I am using a boost::shared_ptr to point to a plugin class. Plugin is a map <string, shared_ptr>. The first time I find a certain plugin in the map, it works fine. However, any subsequent time I try to find a particular plugin, I get a SIGSEGV error. When stepping through my code, I get to foundPlugin = a->second->onCommand(command); and find that a->second is not accessible anymore. This error only happens when I am running on Linux, however. I have no issues when running on Windows. Is there some sort of issue with boost::shared_ptr and Linux? I have tried using std::shared_ptr, but I have to use a boost::dll::import function that returns a boost::shared_ptr, and I haven't found an alternative for that yet. Any insight is greatly appreciated!
I load plugins like this:
bool PluginManager::loadPlugin(std::string pluginPath, std::string
pluginName, std::string pluginType)
{
bool couldLoad = false;
boost::filesystem::path libPath = boost::filesystem::current_path();
boost::shared_ptr<my_plugin_api> plugin;
std::cout << "Loading the plugin " << pluginName << std::endl;
if (pluginName == "")
{
pluginName = "plugName";
}
try
{
plugin = boost::dll::import<my_plugin_api>(
libPath / pluginName,
pluginType,
dll::load_mode::append_decorations
);
Plugin.insert(std::pair<std::string,boost::shared_ptr<my_plugin_api>>
(pluginName, plugin));
std::cout << "Loading the plugin " << pluginName << " (SUCCESS)" <<
std::endl;
couldLoad = true;
}
catch (const std::exception& e)
{
std::cerr << e.what() << std::endl;
}
return couldLoad;
}
After much more testing, I feel like my problems are in the above section of code. The boost::dll::import function acts as if it finds a .so, but does not return anything in the boost::shared_ptr, which in turn causes the second snippet of code to fail. Any ideas why this boost::dll::import function might be acting weirdly on Linux?
bool PluginManager::onCommand(const char* command, const char* pluginName)
{
bool foundPlugin = false;
auto a = Plugin.find(pluginName);
if (a == Plugin.end())
{
std::cerr << "plugin " << pluginName << " not found" << std::endl;
}
else
{
foundPlugin = a->second->onCommand(command);
}
return foundPlugin;
}
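Given the suspicion above that the import may leave the pointer empty, one way to make the lookup fail loudly instead of crashing is to test the pointer before dereferencing it. The sketch below is only a stand-in under assumptions: it uses std::shared_ptr and a hypothetical plugin_api interface in place of boost::shared_ptr<my_plugin_api>, the names dispatchCommand and always_ok_plugin are invented for illustration, and it does not touch Boost.DLL at all.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for my_plugin_api.
struct plugin_api {
    virtual ~plugin_api() {}
    virtual bool onCommand(const char* command) = 0;
};

// Trivial plugin used only to exercise the lookup.
struct always_ok_plugin : plugin_api {
    bool onCommand(const char* command) override { return command != nullptr; }
};

typedef std::map<std::string, std::shared_ptr<plugin_api> > PluginMap;

// Returns false if the plugin is missing or was stored as an empty pointer.
bool dispatchCommand(const PluginMap& plugins, const std::string& name, const char* command)
{
    PluginMap::const_iterator a = plugins.find(name);
    if (a == plugins.end()) {
        std::cerr << "plugin " << name << " not found" << std::endl;
        return false;
    }
    if (!a->second) {
        // The key exists, but nothing was loaded behind it.
        std::cerr << "plugin " << name << " is registered but empty" << std::endl;
        return false;
    }
    return a->second->onCommand(command);
}
```

A check like this would not explain why the pointer is empty, but it would separate "plugin was never loaded" from "plugin was loaded and later became invalid".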
I created a C++ wrapper to access my Python modules. Everything works until I try to use threads in my application.
In my Python module there is a method which reads from a webcam (so it uses an infinite loop), and I send callbacks from C++ to get the image and other needed information from it.
Since we have a blocking method here, I decided to use threads.
The threading on the Python side does not seem to work from the C++ side: if I call the async counterpart of the webcam_feed loop, none of my callbacks are actually executed. On the Python side the routines are all executed, but they never seem to reach the C++ side. I get no feedback in C++, yet the Python routines responsible for executing the callbacks save their info to disk, so I know for sure they run.
I asked a separate question for it here.
Therefore I decided to use threading inside the C++ client. However, whenever I execute the code (given below), I get an access violation as soon as I use any method after the thread is started.
Here are the sample callbacks I have for now:
void default_callback(bool status, std::string id, py::array_t<uint8_t>& img)
{
auto rows = img.shape(0);
auto cols = img.shape(1);
auto type = CV_8UC3;
cv::Mat img1(rows, cols, type, img.mutable_data());
cv::imshow("from callback", img1);
cv::waitKey(1);
auto timenow = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
std::cout << "\narg1: " << status << " arg2: " << id << " arg3: " << typeid(img).name() << " " << ctime(&timenow) << std::endl;
}
void default_c_callback_temporary(bool status, char* message)
{
std::cout << "status is: " << status << " id/name: " << message << " ptr:" << "" << std::endl;
std::ofstream myfile;
myfile.open("example.txt");
myfile << "Writing this to a file: " << status << message << std::endl;
myfile.close();
}
And this is the actual test
void thread_test_start(Core* core)
{
try
{
core->SetCpuAffinity(2);
core->AddCallback(default_callback);
core->AddCallback_C_tmp(default_c_callback_temporary);
//set true to run the async version (implemented in python)
core->Start(false);
}
catch (const std::exception& ex)
{
std::cout << ex.what() << std::endl;
}
}
int main()
{
Core* core = new Core(false);
std::thread t(thread_test_start, core);
py::print(core->GetCallbacks());
std::cout << "\nGet C Callbacks:\n";
py::print(core->GetCallbacks_C_tmp());
std::cout << "\nEverything done. press Enter to Exit";
t.join();
std::getchar();
return 0;
}
The call to core->GetCallbacks() causes the memory access violation:
Exception thrown at 0x000000006FCC6D80 (python36.dll) in TestDLL.exe: 0xC0000005: Access violation reading location 0x0000000000000010.
And here is a snapshot showing the access violation error inside VS2019:
Doing something like this gives the same result:
void thread_test_start2()
{
try
{
Core* core = new Core(false);
core->SetCpuAffinity(2);
core->AddCallback(default_callback);
core->AddCallback_C_tmp(default_c_callback_temporary);
std::thread t(&Core::Start, core, false);
py::print(core->GetCallbacks());
std::cout << "\nGet C Callbacks:\n";
py::print(core->GetCallbacks_C_tmp());
t.join();
}
catch (const std::exception& ex)
{
std::cout << ex.what() << std::endl;
}
}
results in :
Exception thrown at 0x000000006FCC0CDF (python36.dll) in TestDLL.exe: 0xC0000005: Access violation writing location 0x0000000000000020.
like the former one.
Why am I getting this error? Can we not use threading with Pybind11? What am I missing here?
Here is a sample project to re-create this issue : https://workupload.com/file/6LmfRtbztHK
The memory access violations were caused by running methods from different threads. That is, it seems that all Pybind11-related methods (methods that use Pybind11) need to be executed on the very same thread.
Therefore, executing some portion of the code on one thread and trying to execute other methods on the main thread results in a memory access violation.
To get around this, I ended up implementing a simple dispatcher in one callback: any method that needs to be run first sets a flag, and each time the callback runs, the flag is checked and the corresponding method is executed.
int flag=0;
void callback(...)
{
switch(flag)
{
case 1: //e.g. stop
core->stop();
break;
case 2: // e.g. get_callbacks()
core->get_callbacks();
break;
case 3:
//some other op
break;
....
}
//reset flag
flag = 0;
}
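The flag-based dispatcher above can be sketched without pybind11. The version below is only an illustration under assumptions: std::atomic<int> replaces the plain int flag so that setting it from another thread is well defined, and the names post_command, run_demo, executed and keep_running are hypothetical. Recording the command number stands in for the real core->stop() and core->get_callbacks() calls.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> flag(0);      // 0 = nothing pending
std::vector<int> executed;     // touched only by the owner thread while it runs
bool keep_running = true;      // touched only by the owner thread while it runs

// Called from any thread: waits until the previous request has been consumed,
// then publishes the new one.
void post_command(int command)
{
    int expected = 0;
    while (!flag.compare_exchange_weak(expected, command)) {
        expected = 0;
        std::this_thread::yield();
    }
}

// Runs on the owner thread only, e.g. once per frame.
void callback()
{
    const int command = flag.exchange(0);  // read and reset in one step
    switch (command) {
    case 1:  // e.g. stop
        executed.push_back(1);
        keep_running = false;
        break;
    case 2:  // e.g. get_callbacks()
        executed.push_back(2);
        break;
    default:
        break;
    }
}

// Starts an owner thread, posts two commands from the caller, returns what ran.
std::vector<int> run_demo()
{
    flag = 0;
    executed.clear();
    keep_running = true;
    std::thread owner([] {
        while (keep_running) {
            callback();
            std::this_thread::yield();
        }
    });
    post_command(2);
    post_command(1);
    owner.join();
    return executed;
}
```

The point of the design is that the requesting thread never calls into the wrapped object itself; it only hands over a small integer, and the thread that owns the object does the actual work.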
I'm using C++ (written for Windows and Linux) compiled with OpenMP. I'm getting a strange series of allocation errors when adding a class object to a vector inside the OpenMP for loop. Debugging shows a shifting pattern of alloc errors, all centered around my "AddEntry()" method, but the errors never fall out in a consistent place in the loop or on a consistent member within the object I'm adding (therefore, I believe the object is not the problem, and those details are not included in the question code). I tried reserving space for the vector and I tried solutions using both deque and list. I tried passing the object to the add member as an object, reference and pointer (instantiated with 'new ()') and none of these solutions resolved the issue. This is my code:
#include "MyEntryClass.h"
#include "MyVectorClass.h"
#include <omp.h>
CMyVectorClass::CMyVectorClass()
{
try
{
m_vEntries.clear();
m_vEntries.reserve(750000);
}
catch (exception ex)
{
cout << "ERROR [CMyVectorClass] Exception Code: " << ex.what() << "\n";
}
}
// Interface (public)
bool CMyVectorClass::AddEntry(CMyVectorClass_Entry& eAdd)
{
try
{
m_vEntries.push_back(eAdd);
return true;
}
catch (exception ex)
{
cout << "ERROR [AddEntry] Exception Code: " << ex.what() << "\n";
}
return false;
}
bool CMyVectorClass::DoOMPLoop()
{
// Max processors for omp
int nMaxProcs = omp_get_max_threads(); // assumption: the original post omitted the value and the semicolon
// Continue, if true
volatile bool vbContinue = true;
// Loop counter
long lMaxCount = 100000;
try
{
// Iterate through files
// Declare team size
#pragma omp parallel for shared(vbContinue) num_threads(nMaxProcs)
for (long lCount = 0; lCount < lMaxCount; lCount++)
{
// The entry object to add
CMyEntryClass cAdd;
// Do some stuff to the entry
cAdd.SetStuff();
// Catalog the data
vbContinue = AddEntry(cAdd);
}
}
catch (exception ex)
{
cout << "ERROR [DoOMPLoop] Exception Code: " << ex.what() << "\n";
}
return false;
}
// Implementation (private)
This problem has cost me many long hours of frustration attempting to resolve and none of the help I can find on StackOverflow (or the 'net at large) has enabled me to resolve the issue (though it has helped me optimize other code). Please assist.
After much trial and tribulation, I realized the alloc errors were not the source of the problem (since my machine has beaucoup memory and the reserve limit was never even close to being exceeded). I began to suspect the AddEntry() method inside the OpenMP for loop was inducing collisions. So I used "resize" instead of "reserve" and I used an indexed "SetEntryAtIndex()" function to simply reset an object at a given place in the vector (note, this sort of random access is not allowed with all similar containers). This is my code now:
#include "MyEntryClass.h"
#include "MyVectorClass.h"
#include <omp.h>
CMyVectorClass::CMyVectorClass()
{
try
{
m_vEntries.clear();
m_vEntries.resize(750000);
}
catch (exception ex)
{
cout << "ERROR [CMyVectorClass] Exception Code: " << ex.what() << "\n";
}
}
// Interface (public)
bool CMyVectorClass::SetEntryAtIndex(CMyVectorClass_Entry& eSet, long lIndex)
{
try
{
if ((lIndex >= 0) && (lIndex < m_vEntries.size()))
{
m_vEntries[lIndex] = eSet;
return true;
}
else
{
ReportTimeStamp("[SetEntryAtIndex]", "ERROR: Index [" + ConvertLongToString(lIndex) + "] is Out of Range [0:" + ConvertLongToString(m_vEntries.size()) + "]");
}
}
catch (exception ex)
{
cout << "ERROR [SetEntryAtIndex] Exception Code: " << ex.what() << "\n";
}
return false;
}
bool CMyVectorClass::DoOMPLoop()
{
// Max processors for omp
int nMaxProcs = omp_get_max_threads(); // assumption: the original post omitted the value and the semicolon
// Continue, if true
volatile bool vbContinue = true;
// Loop counter
long lMaxCount = 100000;
try
{
// Iterate through files
// Declare team size
#pragma omp parallel for shared(vbContinue) num_threads(nMaxProcs)
for (long lCount = 0; lCount < lMaxCount; lCount++)
{
// The entry object to add
CMyEntryClass cAdd;
// Do some stuff to the entry
cAdd.SetStuff();
// Catalog the data
vbContinue = SetEntryAtIndex(cAdd, lCount);
}
}
catch (exception ex)
{
cout << "ERROR [DoOMPLoop] Exception Code: " << ex.what() << "\n";
}
return false;
}
// Implementation (private)
The trick is resizing to create a fully (and even OVER) populated vector and then using an index to ensure there are no collisions inside the OpenMP loop.
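The indexed approach can be shown in a reduced form. This is a sketch under assumptions: a plain std::vector<long> stands in for the entry container, fillByIndex is a hypothetical name, and the value 2 * i stands in for SetStuff(). Each iteration writes only its own slot, so no two threads touch the same element and the vector never reallocates inside the loop. If the compiler is run without OpenMP support, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cstddef>
#include <vector>

std::vector<long> fillByIndex(long count)
{
    if (count < 0) {
        count = 0;
    }
    // Create the elements up front; reserve() would only allocate capacity
    // and leave the vector empty.
    std::vector<long> entries(static_cast<std::size_t>(count));

    #pragma omp parallel for
    for (long i = 0; i < count; ++i) {
        // One slot per iteration: no shared size counter, no reallocation.
        entries[static_cast<std::size_t>(i)] = 2 * i;
    }
    return entries;
}
```

By contrast, push_back() updates the vector's size (and possibly its storage) on every call, which is shared state that concurrent callers would have to protect with some form of locking.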
I think there must be something else wrong with the first snippet. Perhaps you are corrupting the vector in code that is not shown? The little snippet below does more or less the same thing and works fine. Note that the loop slops past the end and the vector resizes itself.
#include <vector>
int main(int arg,char*arv[])
{
std::vector <int> my_vec;
my_vec.clear();
my_vec.resize(750000);
for(int i=0;i<760000;i++)
my_vec.push_back(i);
return 0;
}
--Matt
It shows the error code: Can't create socket(24). After some investigation I found that this means the open_files_limit has been reached. I checked show global variables like 'open%'; in MySQL and the value is 5000000, so there must be a problem in my code.
Here's my simple code:
class DB {
public:
double query1();
double query2();
double query3();
};
int main() {
DB handler;
for(int i=0;i<100000;i++) {
handler.query1();
handler.query2();
handler.query3();
}
}
I wrote a class to handle the 3 queries and run them in a loop. How can I prevent the open-file limit problem in this class?
Here's the query code:
double query1(string pair) {
double get_prob;
try {
/* Create a connection */
driver = get_driver_instance();
con = driver->connect("localhost", "root", "nlpgroup");
/* Connect to the MySQL test database */
con->setSchema("em_im");
stmt = con->createStatement();
stringstream stmvar;
stmvar << "select prob from em where pair='" << pair << "'";
string stmvarstr = stmvar.str();
cout << stmvarstr << endl;
res = stmt->executeQuery(stmvarstr); // replace with your statement
while (res->next()) {
get_prob = atof(res->getString(1).c_str());
}
res->close();
stmt->close();
con->close();
delete res;
delete stmt;
delete con;
} catch (sql::SQLException &e) {
cout << "# ERR: SQLException in " << __FILE__;
cout << "(" << __FUNCTION__ << ") on line " << __LINE__ << endl;
cout << "# ERR: " << e.what();
cout << " (MySQL error code: " << e.getErrorCode();
cout << ", SQLState: " << e.getSQLState() << " )" << endl;
}
return get_prob;
}
show global variables like 'open%'; in MySQL
Apart from MySQL, your OS might impose limits, too. For Linux, have a look at /etc/security/limits.conf; on Windows, this answer might help you out.
However, if you need the same connection that often, again and again, it might be better to open it once and keep it open until your program terminates. This will also give you better performance, and you can improve performance even more by using a prepared statement. I have already added this to the example below...
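On Linux and other POSIX systems, the per-process limit can also be read from inside the program with getrlimit. A minimal sketch, assuming a POSIX platform (it will not compile on Windows); the wrapper name queryOpenFileLimit is hypothetical.

```cpp
#include <sys/resource.h>

// Reads the soft and hard limits on open file descriptors for this process.
// Returns false if the call fails.
bool queryOpenFileLimit(rlim_t& soft, rlim_t& hard)
{
    struct rlimit limit;
    if (getrlimit(RLIMIT_NOFILE, &limit) != 0) {
        return false;
    }
    soft = limit.rlim_cur;  // the limit that is actually enforced
    hard = limit.rlim_max;  // ceiling up to which the soft limit may be raised
    return true;
}
```

Sockets count against this limit as well, so a client that keeps opening connections can run into it regardless of how the MySQL server itself is configured.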
class DB
{
std::unique_ptr <sql::Connection> con;
std::unique_ptr <sql::PreparedStatement> stmt;
public:
DB();
double query1(std::string const& pair);
};
DB::DB()
: con(get_driver_instance()->connect("localhost", "root", "nlpgroup"))
{
con->setSchema("em_im");
// you might prefer a prepared statement
stmt.reset(con->prepareStatement("SELECT prob FROM em WHERE pair = ?"));
}
double DB::query1(std::string const& pair)
{
double get_prob = 0.0;
try
{
stmt->setString(1, pair);
// executeQuery() returns the ResultSet; execute() only returns a bool
std::unique_ptr < sql::ResultSet > res(stmt->executeQuery());
while (res->next())
{
get_prob = atof(res->getString(1).c_str());
}
}
catch(sql::SQLException& e)
{
/* ... */
}
return get_prob;
}
Using std::unique_ptr ensures that all objects are deleted correctly even if an exception is thrown, which, by the way, your code did not do. I did not call close explicitly; it will be called in the objects' destructors anyway, so this is fine.
Be aware that now the constructor can throw an exception, too, so you need a try - catch in the main function, too. Depending on your needs, you then could leave out the try - catch in the query functions. This changes behaviour, however: Leaving as is results in all the queries being executed, even if one fails in between, whereas dropping it results in aborting the loop.
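The cleanup guarantee can be checked without a database. The sketch below is purely illustrative: FakeConnection, its live counter, queryThatThrows and cleanedUpAfterException are hypothetical stand-ins for sql::Connection and a failing query.

```cpp
#include <memory>
#include <stdexcept>

// Counts how many instances currently exist.
struct FakeConnection {
    static int live;
    FakeConnection()  { ++live; }
    ~FakeConnection() { --live; }
};
int FakeConnection::live = 0;

// Acquires a "connection" and then fails, as a query might.
void queryThatThrows()
{
    std::unique_ptr<FakeConnection> con(new FakeConnection());
    throw std::runtime_error("simulated query failure");
    // No delete needed: con's destructor runs while the exception propagates.
}

// Returns true if no FakeConnection survived the exception.
bool cleanedUpAfterException()
{
    try {
        queryThatThrows();
    } catch (const std::runtime_error&) {
        return FakeConnection::live == 0;
    }
    return false;
}
```

With a raw pointer and a delete placed after the failing call, the same function would leave live at 1, which is the kind of leak that accumulates when it happens inside a loop.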
I am trying to write my first program in C++, and I need to use the Boost library. I am trying to write a program which recursively goes through a directory tree and returns the date of the newest and the oldest file.
This is where I'm now:
#define BOOST_FILESYSTEM_VERSION 3
#include "boost/filesystem.hpp"
#include <iostream>
#include <ctime>
using namespace std;
namespace fs = boost::filesystem;
int main() {
fs::recursive_directory_iterator it_end;
fs::recursive_directory_iterator it_dir("e:\\");
fs::path p;
time_t oldest( time(NULL) );
time_t newest(0);
try {
for ( ; it_dir != it_end; ++it_dir ) {
p = *it_dir;
try {
time_t t( last_write_time(p) );
if (t<oldest) oldest=t;
if (t>newest) newest=t;
if (fs::is_directory(p)) cout << (p) << " " << t << endl;
}
catch (const fs::filesystem_error& ex) {
cout << "\n" << ex.what() << "\n";
}
}
}
catch (const fs::filesystem_error& ex) {
cout << "\n" << ex.what() << "\n";
}
cout << "\nOldest: " << ctime(&oldest);
cout << "Newest: " << ctime(&newest) << endl;
return 0;
}
The problems I've met are that:
1. When I encounter a path that is too long (more than 256 or 260 characters, I think), there is an error:
boost::filesystem::last_write_time: The system cannot find the path specified:
2. When I encounter an inaccessible directory, like "System Volume Information", I get two more:
boost::filesystem::last_write_time: Access is denied: "e:\System Volume Information"
boost::filesystem::directory_iterator::construct: Access is denied: "e:\System Volume Information"
How can I modify the code above to handle long path names under Windows? Is it really hard to do? Some programs, Total Commander for example, have no problems with long paths, but many programs still do.
The more important question is how I can actually make the above code work (not caring about long paths). The problem is that when for ( ; it_dir != it_end; ++it_dir ) meets an inaccessible directory, it throws an exception, and to catch this exception I need to define the outer catch. But once I'm outside, the for cycle does not continue. So the above code works only up to the first inaccessible folder. There it throws an exception and ends.
Is there any way to go back into the for cycle after an exception has been thrown?
My idea is to do a ++it_dir inside the catch and start the for cycle again. But how can I start it again? Shall I move it out to a separate function?
Sorry if my understanding is not clear, it's my first project. I never used C++ before but I'm trying my best!
EDIT:
Any other answer? The problem is that the catch is not working inside the cycle for "not accessible" kind of errors. How can I make it work inside? Here is the smallest code producing the error. Is there any way to catch this error inside the for cycle? Or catch it in a way that it could continue after skipping the non-accessible element with a it_dir++?
int main() {
fs::recursive_directory_iterator it_end;
fs::recursive_directory_iterator it_dir("e:\\");
for ( ; it_dir != it_end; ++it_dir ) {
//something here
}
}
It turned out it is a bug in boost. I've found a bug support ticket for it and contributed to it.
Just put the try/catch inside the for loop...