I'm having an issue with some C++ code that I'm running. Basically, it works fine with most inputs, but with certain inputs it segfaults after my main function returns. This has been... puzzling. I stopped the run at the segfault to get the stack trace, and it returned this:
#0 malloc_consolidate() at /build/eglibc-oGUzwX/eglibc-2.19/malloc/malloc.c:4151
#1 _int_free() at /build/eglibc-oGUzwX/eglibc-2.19/malloc/malloc.c:4057
#2 boost::re_detail::mem_block_cache::~mem_block_cache()() at /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.54.0
#3 __cxa_finalize() at /build/eglibc-oGUzwX/eglibc-2.19/stdlib/cxa_finalize.c:56
#4 ??() at /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.54.0
#5 ??() at
#6 _dl_fini() at /build/eglibc-oGUzwX/eglibc-2.19/elf/dl-fini.c:252
This made me think I must be doing something wrong with Boost.Regex, but I can't for the life of me figure out what. Here is how I use regex: users can input a set of strings, each of which may be either plain text or a regular expression, so I treat every input as a regular expression. A plain-text string, however, may contain characters that have a special meaning in a regular expression, so I go through all plain-text inputs and escape those characters.
Here's the code that I'm working with. This is my main:
int
main(int argc, char * argv[])
{
// Process input arguments
// The desired input is numVertices (int), graph density (double between 0 and 1), halfLoss (double), total loss (double),
// position expanse (double, m), velocity expanse (double, m/s)
int num_vertices;
double graph_density ;
double half_loss;
double total_loss;
double position_expanse;
double velocity_expanse;
if (argc == 1)
{
num_vertices = 48;
graph_density = 1;
half_loss = 200000;
total_loss = 400000;
position_expanse = 400000;
velocity_expanse = 10000;
}
else
{
if (argc != 7)
{
std::cerr << "Need 6 input arguments" << std::endl;
return 1;
}
std::istringstream ss(argv[1]);
if (!(ss >> num_vertices))
{
std::cerr << "First input must be an integer" << std::endl;
return 1;
}
graph_density = read_double_input(argv[2]);
half_loss = read_double_input(argv[3]);
total_loss = read_double_input(argv[4]);
position_expanse = read_double_input(argv[5]);
velocity_expanse = read_double_input(argv[6]);
}
// Determine how many edges to create
int num_edges = (int) ( (graph_density * num_vertices * (num_vertices - 1)) + 0.5 );
// Create the edges
int edges_created = 0;
std::set<std::pair<int, int> > edge_set;
while (edge_set.size() < static_cast<std::size_t>(num_edges)) // avoid a signed/unsigned comparison
{
// Pick a random start vertex and end vertex
int start_vertex = rand() % num_vertices;
int end_vertex = rand() % num_vertices;
// Make sure the start and end vertices are not equal
while (start_vertex == end_vertex)
{
end_vertex = rand() % num_vertices;
}
// Insert the new edge into our set of edges
edge_set.insert(std::pair<int, int>(start_vertex, end_vertex));
}
// Create connection handler
ConnectionHandler conn_handler;
// Create lists for from and to vertices
std::vector<std::string> from_list;
std::vector<std::string> to_list;
// Add connections to from and to lists
for (std::set<std::pair<int, int> >::const_iterator edge_it = edge_set.begin(), end_it = edge_set.end(); edge_it != end_it; ++edge_it)
{
int start_vertex = edge_it->first;
int end_vertex = edge_it->second;
from_list.push_back("Radio" + int_to_string(start_vertex));
to_list.push_back("Radio" + int_to_string(end_vertex));
}
// Read the list into the connection handler
conn_handler.read_connection_list(true, from_list, to_list);
return 0;
}
This code has this ConnectionHandler object that I created. Here's the header for that:
#ifndef CLCSIM_CONNECTIONHANDLER_HPP_
#define CLCSIM_CONNECTIONHANDLER_HPP_
#include <models/network/NetworkTypes.hpp>
#include <generated/xsd/NetworkModelInterfaceConfig.hpp>
namespace clcsim
{
typedef std::map<std::string, std::set<std::string> > ConnectionFilter;
class ConnectionHandler
{
public:
ConnectionHandler();
~ConnectionHandler();
void read_connection_list(const bool is_white_list, const std::vector<std::string> &from_radios, const std::vector<std::string> &to_radios);
private:
ConnectionFilter filter_;
std::set<std::string> from_list_;
std::set<std::string> to_list_;
bool is_white_list_;
};
} // namespace clcsim
#endif // CLCSIM_CONNECTIONHANDLER_HPP_
And here's the source:
#include <models/network/ConnectionHandler.hpp>
#include <oasis/framework/exceptions/ConfigurationException.h>
#include <boost/regex.hpp>
namespace clcsim
{
ConnectionHandler::
ConnectionHandler()
{
}
ConnectionHandler::
~ConnectionHandler()
{
std::cout << "Destructing conn handler" << std::endl;
}
void
ConnectionHandler::
read_connection_list(
const bool is_white_list,
const std::vector<std::string> &from_radios,
const std::vector<std::string> &to_radios)
{
std::cout << "Reading the connection list" << std::endl;
// Make sure the size of both the input vectors are the same
std::size_t from_radio_size = from_radios.size();
std::size_t to_radio_size = to_radios.size();
if (from_radio_size != to_radio_size)
{
throw ofs::ConfigurationException("Error while initializing the "
"Network model: "
"Connections in from/to lists don't align"
);
}
// Create a regular expression/replacement to find all characters in a non-regular expression
// that would be interpreted as special characters in a regular expression. Replace them with
// escape characters
const boost::regex esc("[.$|()\\[\\]{}*+?\\\\]");
const std::string rep("\\\\&");
// Iterate through the specified connections
for (int i = 0; i < from_radio_size; ++i)
{
std::string from_string = boost::regex_replace(from_radios[i], esc, rep, boost::match_default | boost::format_sed);
std::string to_string = boost::regex_replace(to_radios[i], esc, rep, boost::match_default | boost::format_sed);
//std::cout << "From " << from_string << " to " << to_string << std::endl;
filter_[from_string].insert(to_string);
//filter_[from_radios.at(i)].insert(to_radios.at(i));
}
std::cout << "Got here" << std::endl;
}
} // namespace clcsim
Sorry for so much code.
I saw some similar threads related to segfaults with boost::regex. In those examples, the users had really simple code that just created a regex and matched it and ran into an error. It turned out the issue was related to the Boost versioning. I tried to see if I could replicate those sorts of errors, but those simple examples worked just fine for me. So... I'm pretty stumped. I'd really appreciate any help!
To remove this from the "Unanswered" list, I'm posting here the answer that was given in the comments. The OP confirmed the suggestion that Boost, linked against eglibc, was conflicting with the rest of the code, which was linked against glibc. Upgrading the OS so that the eglibc-linked libraries were no longer in use fixed the problem.
The genetic code is the set of rules used by living cells to translate information encoded within genetic material (DNA or mRNA sequences of nucleotide triplets, or codons) into proteins. The genetic code is highly similar among all organisms and can be expressed in a simple table with 64 entries.
A three-nucleotide codon in a nucleic acid sequence specifies a single amino acid. The vast majority of genes are encoded with a single scheme often referred to as the genetic code (refer to the codon table).
Attached to this assignment, you will find a text file named “mouse.dat” that contains the complete genome of a mouse. Write a program to read in the DNA sequence from the file, calculate the frequency of each codon in the codon table, and print out the result as a number and a percentage.
(a) Write a serial code for the solution "Normal Code by using C++ Language".
When I ran the code below, I got the error message "Unable to open file mouse.dat: No such file or directory".
#include <cstdio>   // perror
#include <cstdlib>  // exit
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;
int main()
{
std::vector<string> codons = { "ttt" }; // Better always initialize any variable or array or objects to zero or NULL or empty string.
codons.push_back("ttc"); // { "ttt", "ttc"
codons.push_back("tta"); // { "ttt", "ttc", "tta"
codons.push_back("ttg"); // { "ttt", "ttc", "tta", ...
codons.push_back("tct");
codons.push_back("tcc");
codons.push_back("tca");
codons.push_back("tcg");
codons.push_back("tat");
codons.push_back("tac");
codons.push_back("taa");
codons.push_back("tag");
codons.push_back("tgt");
codons.push_back("tgc");
codons.push_back("tga");
codons.push_back("tgg");
codons.push_back("ctt");
codons.push_back("ctc");
codons.push_back("cta");
codons.push_back("ctg");
codons.push_back("cct");
codons.push_back("ccc");
codons.push_back("cca");
codons.push_back("ccg");
codons.push_back("cat");
codons.push_back("cac");
codons.push_back("caa");
codons.push_back("cag");
codons.push_back("cgt");
codons.push_back("cgc");
codons.push_back("cga");
codons.push_back("cgg");
codons.push_back("att");
codons.push_back("atc");
codons.push_back("ata");
codons.push_back("atg");
codons.push_back("act");
codons.push_back("acc");
codons.push_back("aca");
codons.push_back("acg");
codons.push_back("aat");
codons.push_back("aac");
codons.push_back("aaa");
codons.push_back("aag");
codons.push_back("agt");
codons.push_back("agc");
codons.push_back("aga");
codons.push_back("agg");
codons.push_back("gtt");
codons.push_back("gtc");
codons.push_back("gta");
codons.push_back("gtg");
codons.push_back("gct");
codons.push_back("gcc");
codons.push_back("gca");
codons.push_back("gcg");
codons.push_back("gat");
codons.push_back("gac");
codons.push_back("gaa");
codons.push_back("gag");
codons.push_back("ggt");
codons.push_back("ggc");
codons.push_back("gga");
codons.push_back("ggg"); // // { "ttt", "ttc", "tta", ..., "ggg"}
// codons.size() is 64
vector<int> counts(64, 0);
string line = ""; // Always initialize.
// int numberOfLines=0; // warning: unused variable numberOfLines
char my_character = '\0'; // Always initialize.
for (int indx = 0; 64 > indx; indx++) // Better compare using "number comparison variable" way
{
string codon_req = codons[indx];
ifstream myfile("mouse.dat");
if (myfile.is_open())
{
int cnt = 0, ans = 0;
// Test the read itself; checking eof() before reading processes the last character twice
while (myfile.get(my_character))
{
// If number of count "cnt" becomes 3 reinitialize that to zero.
// and increase "ans" count
if (3 == cnt)
{
ans++;
cnt = 0;
}
if ('\n' == my_character)
{
continue;
}
// Here comparison is not done sequential
// Search if first charater (example t at ttt) is present
// increase cnt
// Next time if it is not present
// compare until we find next letter t
// if found increase cnt at that time.
// Hence ans count is more greater than expected count on word ttt
// NOT SURE ON YOUR PROJECT REQUIREMENT.
if (my_character == (char)codon_req[cnt])
{
cnt++;
}
}
myfile.close();
counts[indx] = ans;
}
else
{
perror("Unable to open file mouse.dat");
exit(1);
}
}
for (int indx = 0; 64 > indx; indx++) //// Better compare using "number comparison variable" way
{
cout << "Before counts[indx] " << counts[indx] << "\n";
codons[indx] = codons[indx] + " " + to_string(counts[indx]);
cout << "After counts[indx] " << counts[indx] << "\n";
}
ofstream newFile("results.txt");
if (newFile.fail())
{
perror("Opening results.txt file failed");
exit(2);
}
else
{
for (int indx = 0; 64 > indx; indx++) /// Better compare using "number comparison variable" way
{
newFile << codons[indx] << '\n';
}
newFile.close();
}
return 0;
}
Introduction
I have a vector, entities, containing 44 million names. I want to split it into 4 parts and process each part in parallel. The class Freebase contains the function loadData(), which splits the vector and calls the function multiThread to do the processing.
loadEntities() reads a text file containing the names. I didn't include its implementation because it isn't relevant.
loadData() splits the vector entities (initialized in the constructor) into 4 parts and starts one thread per part, adding each thread to the vector<thread> threads as follows:
threads.push_back(thread(&Freebase::multiThread, this, i, i + right, ref(data)));
multiThread is the function where I process the files
i and i + right are the start and end indices of the for loop in multiThread that iterates over entities.
returnValues is a helper called from multiThread; it is used to call an external function.
Problem
cout <<"Entity " << entities[i] << endl; is showing the following results:
Entity m.0rzf6wv (ok)
Entity m.0rzf70 (ok)
Entity m.068s4h9 m.0n_k8bz (WRONG)
Entity Entity m.068s5_1 (WRONG)
The last 2 outputs are wrong. Each line should be "Entity" followed by exactly one name, not "Entity Entity name" or "Entity name name".
This is causing a segmentation fault when the input is sent to the function returnValues. How can I solve it?
Source Code
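#ifndef FREEBASE_H
#define FREEBASE_H
#include <string>
#include <utility>
#include <vector>
class Freebase
{
public:
Freebase(const std::string &, const std::string &, const std::string &, const std::string &);
void loadData();
private:
std::string _serverURL;
std::string _entities;
std::string _xmlFile;
std::string _tfidfDatabase;
std::vector<std::string> loadEntities(); // implementation omitted
void multiThread(int,int, std::vector<std::pair<std::string, std::string>> &);
std::vector<std::pair<std::string, std::string>> returnValues(const std::string &);
//private data members
std::vector<std::string> entities;
};
#endif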
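#include "Freebase.h"
#include "queries/SparqlQuery.h"
#include <algorithm>   // remove
#include <cmath>       // round
#include <functional>  // ref
#include <iostream>
#include <sstream>
#include <thread>
#include <boost/algorithm/string.hpp>  // boost::split, boost::is_any_of
using namespace std;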
Freebase::Freebase(const string & url, const string & e, const string & xmlFile, const string & tfidfDatabase):_serverURL(url), _entities(e), _xmlFile(xmlFile), _tfidfDatabase(tfidfDatabase)
{
entities = loadEntities();
}
void Freebase::multiThread(int start, int end, vector<pair<string,string>> & data)
{
string basekb = "PREFIX basekb:<http://rdf.basekb.com/ns/> ";
for(int i = start; i < end; i++)
{
cout <<"Entity " << entities[i] << endl;
vector<pair<string, string>> description = returnValues(basekb + "select ?description where {"+ entities[i] +" basekb:common.topic.description ?description. FILTER (lang(?description) = 'en') }");
string desc = "";
for(auto &d: description)
{
desc += d.first + " ";
}
data.push_back(make_pair(entities[i], desc));
}
}
void Freebase::loadData()
{
vector<pair<string, string>> data;
vector<thread> threads;
int Size = entities.size();
//split database into 4 parts
int p = 4;
int right = round((double)Size / (double)p);
int left = Size % p;
float totalduration = 0;
vector<pair<int, int>> coordinates;
int counter = 0;
for(int i = 0; i < Size; i += right)
{
if(i < Size - right)
{
threads.push_back(thread(&Freebase::multiThread, this, i, i + right, ref(data)));
}
else
{
threads.push_back(thread(&Freebase::multiThread, this, i, Size, ref(data)));
}
}//end outer for
for(auto &t : threads)
{
t.join();
}
}
vector<pair<string, string>> Freebase::returnValues(const string & query)
{
vector<pair<string, string>> data;
SparqlQuery sparql(query, _serverURL);
string result = sparql.retrieveInformations();
istringstream str(result);
string line;
//skip first line
getline(str,line);
while(getline(str, line))
{
vector<string> values;
line.erase(remove( line.begin(), line.end(), '\"' ), line.end());
boost::split(values, line, boost::is_any_of("\t"));
if(values.size() == 2)
{
pair<string,string> fact = make_pair(values[0], values[1]);
data.push_back(fact);
}
else
{
data.push_back(make_pair(line, ""));
}
}
return data;
}//end function
EDIT:
Arnon Zilca is correct in his comments. You are writing to a single vector from multiple threads (in Freebase::multiThread()): unsynchronized push_back calls on the same std::vector are a data race, which is undefined behavior and a recipe for disaster. You can use a mutex, as described below, to protect the push_back operation.
For more information on the thread safety of containers, see the question "Is std::vector or boost::vector thread safe?"
So:
mtx.lock();
data.push_back(make_pair(entities[i], desc));
mtx.unlock();
Another option is to use the same strategy as in returnValues: create a local vector in multiThread and push its contents to the data vector only when the thread has finished processing.
So:
void Freebase::multiThread(int start, int end, vector<pair<string,string>> & data)
{
vector<pair<string,string>> threadResults;
string basekb = "PREFIX basekb:<http://rdf.basekb.com/ns/> ";
for(int i = start; i < end; i++)
{
cout <<"Entity " << entities[i] << endl;
vector<pair<string, string>> description = returnValues(basekb + "select ?description where {"+ entities[i] +" basekb:common.topic.description ?description. FILTER (lang(?description) = 'en') }");
string desc = "";
for(auto &d: description)
{
desc += d.first + " ";
}
threadResults.push_back(make_pair(entities[i], desc));
}
mtx.lock();
data.insert(data.end(), threadResults.begin(), threadResults.end());
mtx.unlock();
}
Note: I would suggest using a different mutex from the one you use for cout. The result vector data is a different resource than cout, so threads that want to use cout should not have to wait for another thread to finish with data.
/EDIT
You could use a mutex around
cout <<"Entity " << entities[i] << endl;
That would prevent multiple threads using cout at "the same time". That way you can be sure that an entire message is printed by a thread before another thread gets to print a message. Note that this will impact your performance since threads will have to wait for the mutex to become available before they are allowed to print.
Note: Protecting the cout will only cleanup your output on the stream, it will not influence the behavior of the rest of the code, see above for that.
See http://www.cplusplus.com/reference/mutex/mutex/lock/ for an example.
// mutex::lock/unlock
#include <iostream> // std::cout
#include <thread> // std::thread
#include <mutex> // std::mutex
std::mutex mtx; // mutex for critical section
void print_thread_id (int id) {
// critical section (exclusive access to std::cout signaled by locking mtx):
mtx.lock();
std::cout << "thread #" << id << '\n';
mtx.unlock();
}
int main ()
{
std::thread threads[10];
// spawn 10 threads:
for (int i=0; i<10; ++i)
threads[i] = std::thread(print_thread_id,i+1);
for (auto& th : threads) th.join();
return 0;
}
I am currently learning to use Open MPI. My aim is to parallelize a simple program whose code I will post below.
The program is for testing my concept of parallelizing a much bigger program; I hope to learn all I need to know for my actual problem if I succeed with this.
Basically, it is a definition of a simple C++ class for lists. A list consists of two arrays, one of integers and one of doubles. Entries with the same index belong together: the integer entry is some kind of list-entry identifier (maybe an object ID), and the double entry is some kind of quantity (maybe the weight of an object).
The basic purpose of the program is to add lists together (this is the task I want to parallelize). Adding works as follows: for each entry in one list, check whether the other list has the same integer entry. If so, the double entry is added to the double entry in the other list; if there is no such entry, both the integer and the double entries are appended to the end of the other list.
Basically, each summand in this list addition represents a storage, and each entry is a type of object with a given amount (the int is the type and the double is the amount), so adding two lists means putting the contents of the second storage into the first.
The order of the list entries is irrelevant, which means that list addition is not only associative but also commutative!
My plan is to add a very large number of such lists (a few billion), so the parallelization could be to let each thread first add a subset of the lists and, when this is finished, distribute all resulting sublists (one per thread) to all of the threads.
My current understanding of Open MPI is that only the last step (distributing the finished sublists) needs any special, non-standard code. Basically, I need an Allreduce, but with a custom data type and a custom operation.
The first problem I have is understanding how to create a suitable MPI data type. I came to the conclusion that I probably need MPI_Type_create_struct to create a struct type.
I found this site with a nice example: http://mpi.deino.net/mpi_functions/MPI_Type_create_struct.html
From it I learned a lot, but the problem is that this example uses fixed-size member arrays. In my case, the lists have member variables of arbitrary size, or rather pointers pointing to memory blocks of arbitrary size. Doing it as in the example would therefore mean creating a new MPI data type for each list size. (Using fixed-size lists could help, but only in this minimalistic case; I want to learn how to handle arbitrarily sized lists as preparation for my actual problem.)
So my question is: how do I create a data type for this special case? What is the best way?
I even thought of writing some non-MPI code to serialize my class/object (which would be a lot of work for my real problem, but should be easy in this example) into a single contiguous block of bytes. Then I could simply use an MPI function to distribute those blocks to all threads, translate them back into actual objects, and let each thread add the "number-of-threads" lists together, so that every thread ends up with the same fully reduced list (because the operation is commutative, it does not matter whether the order is the same on each thread).
The problem is that I do not know which MPI function to use to distribute such memory blocks so that, in the end, each thread has an array of "number-of-threads" such blocks (similar to Allreduce, but with blocks).
But that's just another idea; I would like to hear from you what the best way is.
Thank you! Here is my fully working example program (ignore the MPI parts, they're just preparation; you can simply compile it with g++):
As you can see, I needed to write a custom copy constructor and copy assignment operator, because the default ones would only copy the pointer members, not the arrays they point to. I hope that's not a problem for MPI?
#include <iostream>
#include <cstdlib>
#if (CFG_MPI > 0)
#include <mpi.h>
#else
#define MPI_Barrier(xxx) // dummy code if not parallel
#endif
class list {
private:
int *ilist;
double *dlist;
int n;
public:
list(int n, int *il, double *dl) {
int i;
if (n>0) {
this->ilist = (int*)malloc(n*sizeof(int));
this->dlist = (double*)malloc(n*sizeof(double));
if (!ilist || !dlist) std::cout << "ERROR: malloc in constructor failed!" << std::endl;
} else {
this->ilist = NULL;
this->dlist = NULL;
}
for (i=0; i<n; i++) {
this->ilist[i] = il[i];
this->dlist[i] = dl[i];
}
this->n = n;
}
~list() {
free(ilist);
free(dlist);
ilist = NULL;
dlist = NULL;
this->n=0;
}
list(const list& cp) {
int i;
this->n = cp.n;
this->ilist = NULL;
this->dlist = NULL;
if (this->n > 0) {
this->ilist = (int*)malloc(this->n*sizeof(int));
this->dlist = (double*)malloc(this->n*sizeof(double));
if (!ilist || !dlist) std::cout << "ERROR: malloc in copy constructor failed!" << std::endl;
}
for (i=0; i<this->n; i++) {
this->ilist[i] = cp.ilist[i];
this->dlist[i] = cp.dlist[i];
}
}
list& operator=(const list& cp) {
if(this == &cp) return *this;
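free(this->ilist); // release the old arrays directly; calling the destructor explicitly
free(this->dlist); // ends the object's lifetime, so using it afterwards is undefined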
int i;
this->n = cp.n;
if (this->n > 0) {
this->ilist = (int*)malloc(this->n*sizeof(int));
this->dlist = (double*)malloc(this->n*sizeof(double));
if (!ilist || !dlist) std::cout << "ERROR: malloc in copy constructor failed!" << std::endl;
} else {
this->ilist = NULL;
this->dlist = NULL;
}
for (i=0; i<this->n; i++) {
this->ilist[i] = cp.ilist[i];
this->dlist[i] = cp.dlist[i];
}
return *this;
}
void print() {
int i;
for (i=0; i<this->n; i++)
std::cout << i << " : " << "[" << this->ilist[i] << " - " << (double)dlist[i] << "]" << std::endl;
}
list& operator+=(const list& cp) {
int i,j;
if(this == &cp) {
for (i=0; i<this->n; i++)
this->dlist[i] *= 2;
return *this;
}
double *dl;
int *il;
il = (int *) realloc(this->ilist, (this->n+cp.n)*sizeof(int));
dl = (double *) realloc(this->dlist, (this->n+cp.n)*sizeof(double));
if (!il || !dl)
std::cout << "ERROR: 1st realloc in operator += failed!" << std::endl;
else {
this->ilist = il;
this->dlist = dl;
il = NULL;
dl = NULL;
}
for (i=0; i<cp.n; i++) {
for (j=0; j<this->n; j++) {
if (this->ilist[j] == cp.ilist[i]) {
this->dlist[j] += cp.dlist[i];
break;
}
} if (j == this->n) {// no matching entry found in this
this->ilist[this->n] = cp.ilist[i];
this->dlist[this->n] = cp.dlist[i];
this->n++;
}
}
il = (int *) realloc(this->ilist, (this->n)*sizeof(int));
dl = (double *) realloc(this->dlist, (this->n)*sizeof(double));
if (!il || !dl)
std::cout << "ERROR: 2nd realloc in operator += failed!" << std::endl;
else {
this->ilist = il;
this->dlist = dl;
}
return *this;
}
};
int main(int argc, char **argv) {
int npe, myid;
#if (CFG_MPI > 0)
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD,&npe);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
#else
npe=1;
myid=0;
#endif
if (!myid) // reduce output
std::cout << "NPE = " << npe << " MYID = " << myid << std::endl;
int ilist[5] = {14,17,4,29,0};
double dlist[5] = {0.0, 170.0, 0.0, 0.0, 24.523};
int ilist2[6] = {14,117,14,129,0, 34};
double dlist2[6] = {0.5, 170.5, 0.5, 0.5, 24.0, 1.2};
list tlist(5, ilist, dlist);
list tlist2(6, ilist2, dlist2);
if (!myid) {
tlist.print();
tlist2.print();
}
tlist +=tlist2;
if (myid) tlist.print();
#if (CFG_MPI > 0)
MPI_Finalize();
#endif
return 0;
}
I have a pattern-matching program that takes a string as input and returns the closest match from a dictionary. Since the algorithm takes several seconds to run one match query, I am trying to use multithreading to run batch queries.
I first read in a file containing a list of queries, and for each query I dispatch a new thread to perform the matching algorithm, collecting the results into an array with pthread_join.
However, I'm getting some inconsistent results. For example, if my query file contains the terms "red, green, blue", I may receive "red, green, green" as the result. Another run may generate the correct "red, green, blue" result. It appears to sometimes be writing over the result in the array, but why would this happen since the array value is set according to the thread id?
Dictionary dict; // global, which performs the matching algorithm
void *match_worker(void *arg) {
char* temp = (char *)arg;
string strTemp(temp);
string result = dict.match(strTemp);
return (void *)(result.c_str());
}
void run(const string& queryFilename) {
// read in query file
vector<string> queries;
ifstream inquery(queryFilename.c_str());
string line;
while (getline(inquery, line)) {
queries.push_back(line);
}
inquery.close();
pthread_t threads[queries.size()];
void *results[queries.size()];
int rc;
size_t i;
for (i = 0; i < queries.size(); i++) {
rc = pthread_create(&threads[i], NULL, match_worker, (void *)(queries[i].c_str()));
if (rc) {
cout << "Failed pthread_create" << endl;
exit(1);
}
}
for (i = 0; i < queries.size(); i++) {
rc = pthread_join(threads[i], &results[i]);
if (rc) {
cout << "Failed pthread_join" << endl;
exit(1);
}
}
for (i = 0; i < queries.size(); i++) {
cout << (char *)results[i] << endl;
}
}
int main(int argc, char* argv[]) {
string queryFilename = argv[1];
dict.init();
run(queryFilename);
return 0;
}
Edit: As suggested by Zac, I modified the thread to explicitly put the result on the heap:
void *match_worker(void *arg) {
char* temp = (char *)arg;
string strTemp(temp);
int numResults = 1;
cout << "perform match for " << strTemp << endl;
string result = dict.match(strTemp, numResults);
string* tmpResult = new string(result);
return (void *)((*tmpResult).c_str());
}
In this case, though, where should I put the delete calls? If I put the following at the end of the run() function, I get an invalid pointer error.
for (i = 0; i < queries.size(); i++) {
delete (char*)results[i];
}
Without debugging it, my guess is that it has something to do with the following:
void *match_worker(void *arg)
{
char* temp = (char *)arg;
string strTemp(temp);
string result = dict.match(strTemp); // create an automatic
return (void *)(result.c_str()); // return the automatic ... but it gets destructed right after this!
}
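So the pointer you return is left dangling: the string's buffer is freed as soon as result goes out of scope. When the next thread runs, it may (by chance) reuse the same memory, so two entries of results end up pointing at the same bytes; you are printing the same value twice rather than one result being overwritten.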
You should put the result on the heap to ensure it does not get destroyed between the time your thread exits and you store it in your main thread.
With your edit, you are trying to mix things up a bit too much. I've fixed it below:
void *match_worker(void *arg)
{
char* temp = (char *)arg;
string strTemp(temp);
int numResults = 1;
cout << "perform match for " << strTemp << endl;
string result = dict.match(strTemp, numResults);
string* tmpResult = new string(result);
return (void *)(tmpResult); // just return the pointer to the std::string object
}
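Declare results as:
// variable-length arrays are not standard C++ (some compilers accept them as an extension):
//void* results[queries.size()];
std::string** results = new std::string*[queries.size()];
for (size_t i = 0; i < queries.size(); ++i)
{
results[i] = NULL; // initialize pointers in the array
}
// when joining, receive the value into a void* and cast it back:
// void* ret; pthread_join(threads[i], &ret); results[i] = static_cast<std::string*>(ret);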
When you clean up the memory:
for (i = 0; i < queries.size(); i++)
{
delete results[i];
}
delete [] results; // delete the results array
That said, you would have a much easier time if you used the C++11 threading templates instead of mixing the C pthread library and C++.
The problem is caused by the lifetime of the local variable result and of the data returned by its member function result.c_str(). You make this task unnecessarily difficult by mixing C with C++. Consider using C++11 and its threading library, which makes the task much easier:
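#include <future>
#include <iostream>
#include <string>
#include <vector>
std::string match_worker(const std::string& query); // runs one match and returns the result by value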
void run(const std::vector<std::string>& queries)
{
std::vector<std::future<std::string>> results;
results.reserve(queries.size());
for (auto& query : queries)
results.emplace_back(
std::async(std::launch::async, match_worker, query));
for (auto& result : results)
std::cout << result.get() << '\n';
}
How can I check if my array has an element I'm looking for?
In Java, I would do something like this:
Foo someObject = new Foo(someParameter);
Foo foo = null;
//search through Foo[] arr
for(int i = 0; i < arr.length; i++){
if (arr[i].equals(someObject))
foo = arr[i];
}
if (foo == null)
System.out.println("Not found!");
else
System.out.println("Found!");
But in C++ I don't think I can check whether an object is null, so what would be the C++ solution?
In C++ you would use std::find, and check if the resultant pointer points to the end of the range, like this:
#include <algorithm> // std::find
#include <iostream>
#include <iterator>  // std::begin, std::end, std::distance
Foo array[10];
... // Init the array here
Foo *foo = std::find(std::begin(array), std::end(array), someObject);
// When the element is not found, std::find returns the end of the range
if (foo != std::end(array)) {
std::cerr << "Found at position " << std::distance(array, foo) << std::endl;
} else {
std::cerr << "Not found" << std::endl;
}
You would just do the same thing as in Java, looping through the array to search for the item you want. If the array is sorted, a binary search is much faster than this linear scan; otherwise, something like this works:
int i = 0;
for (; i < arraySize; i++){
if(array[i] == itemToFind){
break;
}
}
bool found = (i < arraySize); // if found, i is the index of the match
There are many ways...one is to use the std::find() algorithm, e.g.
#include <algorithm>
int myArray[] = { 3, 2, 1, 0, 1, 2, 3 };
size_t myArraySize = sizeof(myArray) / sizeof(int);
int *end = myArray + myArraySize;
// find the value 0:
int *result = std::find(myArray, end, 0);
if (result != end) {
// found value at "result" pointer location...
}
Here is a simple generic C++11 function contains which works for both arrays and containers:
using namespace std;
template<class C, typename T>
bool contains(C&& c, T e) { return find(begin(c), end(c), e) != end(c); };
Usage such as contains(arr, el) is somewhat similar to the semantics of Python's in keyword.
Here is a complete demo:
#include <algorithm>
#include <array>
#include <string>
#include <vector>
#include <iostream>
template<typename C, typename T>
bool contains(C&& c, T e) {
return std::find(std::begin(c), std::end(c), e) != std::end(c);
};
template<typename C, typename T>
void check(C&& c, T e) {
std::cout << e << (contains(c,e) ? "" : " not") << " found\n";
}
int main() {
int a[] = { 10, 15, 20 };
std::array<int, 3> b { 10, 10, 10 };
std::vector<int> v { 10, 20, 30 };
std::string s { "Hello, Stack Overflow" };
check(a, 10);
check(b, 15);
check(v, 20);
check(s, 'Z');
return 0;
}
Output:
10 found
15 not found
20 found
Z not found
One wants this to be done tersely.
Nothing makes code more unreadable than spending 10 lines to achieve something elementary.
In C++ (as in other languages) we have all_of and any_of, which help achieve terseness in this case. I want to check whether a function parameter is valid, meaning equal to one of a number of values.
Naively and wrongly, I would first write
if (!any_of({ DNS_TYPE_A, DNS_TYPE_MX }, wtype)) return false;
A second attempt could be
if (!any_of({ DNS_TYPE_A, DNS_TYPE_MX }, [&wtype](const int elem) { return elem == wtype; })) return false;
Less incorrect, but it loses some terseness.
However, this is still not correct, because C++ insists in this case (and many others) that I specify both the start and end iterators; I cannot simply pass the whole container. So, in the end:
const vector validvalues{ DNS_TYPE_A, DNS_TYPE_MX };
if (!any_of(validvalues.cbegin(), validvalues.cend(), [&wtype](const int elem) { return elem == wtype; })) return false;
which sort of defeats the terseness, but I don't know a better alternative...
Thank you for not pointing out that with only 2 values I could just write if ( || ).
The best approach here (if possible) is to use a switch statement with a default case, where the values are not only checked but the appropriate actions are also taken.
The default case can be used for signalling an invalid value.
You can use old C-style programming to do the job. This requires little knowledge of C++, which makes it good for beginners.
In modern C++ you usually accomplish this with lambdas, function objects, ... or algorithms: find, find_if, any_of, for_each, or the range-based for (auto& v : container) { } syntax. The find-style algorithms take a few more lines of code. You may also write your own template find function for your particular needs.
Here is my sample code
#include <iostream>
#include <functional>
#include <algorithm>
#include <vector>
using namespace std;
/**
* This is the old C-like style. Modern C++ is mostly
* moving away from it, but you can still use it since
* you need to know very little about C++.
* @param store the array to search
* @param storeSize how many elements are in the array
* (you have to know the size of store)
* @param query the value to look for
* @return the index of the element in the array,
* or -1 if not found
*/
int in_array(const int store[], const int storeSize, const int query) {
for (int i = 0; i < storeSize; ++i) {  // int matches storeSize and the return type
if (store[i] == query) {
return i;
}
}
return -1;
}
void testfind() {
int iarr[] = { 3, 6, 8, 33, 77, 63, 7, 11 };
// for beginners, it is good to practice a looping method
int query = 7;
if (in_array(iarr, 8, query) != -1) {
cout << query << " is in the array\n";
}
// using vector or list, ... any container in C++
vector<int> vecint{ 3, 6, 8, 33, 77, 63, 7, 11 };
auto it=find(vecint.begin(), vecint.end(), query);
cout << "using find()\n";
if (it != vecint.end()) {
cout << "found " << query << " in the container\n";
}
else {
cout << "your query: " << query << " is not inside the container\n";
}
using namespace std::placeholders;
// here the query variable is bound to the `equal_to` function
// object (defined in std)
cout << "using any_of\n";
if (any_of(vecint.begin(), vecint.end(), bind(equal_to<int>(), _1, query))) {
cout << "found " << query << " in the container\n";
}
else {
cout << "your query: " << query << " is not inside the container\n";
}
// using lambda, here I am capturing the query variable
// into the lambda function
cout << "using any_of with lambda:\n";
if (any_of(vecint.begin(), vecint.end(),
[query](int val)->bool{ return val==query; })) {
cout << "found " << query << " in the container\n";
}
else {
cout << "your query: " << query << " is not inside the container\n";
}
}
int main(int argc, char* argv[]) {
testfind();
return 0;
}
Say this file is named 'testalgorithm.cpp'. You can compile it with
g++ -std=c++11 -o testalgorithm testalgorithm.cpp
Hope this helps. Please update or add to it if I have made any mistakes.
If you were originally looking for an answer for an int value in a sorted (ascending) int array, you can use the following code, which performs a binary search (O(log n) comparisons instead of the O(n) of a linear scan):
#include <algorithm>  // std::binary_search, std::sort
static inline bool exists(int ints[], int size, int k) // array, array's size, searched value
{
if (size <= 0) // check that the array size is not zero or negative
return false;
// std::sort(ints, ints + size); // uncomment this line if the array wasn't previously sorted
return (std::binary_search(ints, ints + size, k));
}
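Edit: this also works for an unsorted int array if you uncomment the sort line. Note that it then sorts the caller's array in place, and sorting costs O(n log n), which is more than a single linear search.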
You can do it in a beginner's style by using control statements and loops:
#include <iostream>
using namespace std;
int main(){
    int arr[] = {10,20,30,40,50}, toFind = 10, notFound = -1;
    // sizeof(arr) is the size in bytes, so divide by the size of one element
    int n = sizeof(arr) / sizeof(arr[0]);
    for(int i = 0; i < n; i++){
        if(arr[i] == toFind){
            cout << "Element is found at " << i << " index" << endl;
            return 0;
        }
    }
    cout << notFound << endl;
}
C++ has NULL as well, usually defined as 0 (the null pointer constant); since C++11 there is also the dedicated nullptr keyword.
See also: Do you use NULL or 0 (zero) for pointers in C++?
So in C++, when foo is a pointer, that null check would be:
if (!foo)
cout << "not found";