In a program I need to apply a function in parallel to each unique permutation of a vector. The size of the vector is around N=15.
I already have a function void parallel_for_each_permutation which I can use in combination with a std::set to only process each unique permutation exactly once.
This all works well for the general case. However, in my use case the number of unique elements k per vector is very limited, usually around k=4. This means that I'm currently wasting time constructing the same unique permutation over and over again, just to throw it away because it has already been processed.
Is it possible to process all unique permutations in this special case, without constructing all N! permutations?
Example use-case:
#include <algorithm>
#include <thread>
#include <vector>
#include <mutex>
#include <numeric>
#include <set>
#include <iostream>
template<class Container1, class Container2>
struct Comp{
//compare element-wise less than
bool operator()(const Container1& l, const Container2& r) const{
auto pair = std::mismatch(l.begin(), l.end(), r.begin());
if(pair.first == l.end() && pair.second == r.end())
return false;
return *(pair.first) < *(pair.second);
}
};
template<class Container, class Func>
void parallel_for_each_permutation(const Container& container, int num_threads, Func func){
auto ithPermutation = [](int n, size_t i) -> std::vector<size_t>{
// https://stackoverflow.com/questions/7918806/finding-n-th-permutation-without-computing-others
std::vector<size_t> fact(n);
std::vector<size_t> perm(n);
fact[0] = 1;
for(int k = 1; k < n; k++)
fact[k] = fact[k-1] * k;
for(int k = 0; k < n; k++){
perm[k] = i / fact[n-1-k];
i = i % fact[n-1-k];
}
for(int k = n-1; k > 0; k--){
for(int j = k-1; j >= 0; j--){
if(perm[j] <= perm[k])
perm[k]++;
}
}
return perm;
};
size_t totalNumPermutations = 1;
for(size_t i = 1; i <= container.size(); i++)
totalNumPermutations *= i;
std::vector<std::thread> threads;
for(int threadId = 0; threadId < num_threads; threadId++){
threads.emplace_back([&, threadId](){
const size_t firstPerm = size_t(float(threadId) * totalNumPermutations / num_threads);
const size_t last_excl = std::min(totalNumPermutations, size_t(float(threadId+1) * totalNumPermutations / num_threads));
Container permutation(container);
auto permIndices = ithPermutation(container.size(), firstPerm);
size_t count = firstPerm;
do{
for(int i = 0; i < int(permIndices.size()); i++){
permutation[i] = container[permIndices[i]];
}
func(threadId, permutation);
std::next_permutation(permIndices.begin(), permIndices.end());
++count;
}while(count < last_excl);
});
}
for(auto& thread : threads)
thread.join();
}
template<class Container, class Func>
void parallel_for_each_unique_permutation(const Container& container, Func func){
using Comparator = Comp<Container, Container>;
constexpr int numThreads = 4;
std::set<Container, Comparator> uniqueProcessedPermutations(Comparator{});
std::mutex m;
parallel_for_each_permutation(
container,
numThreads,
[&](int threadId, const auto& permutation){
{
std::lock_guard<std::mutex> lg(m);
if(uniqueProcessedPermutations.count(permutation) > 0){
return;
}else{
uniqueProcessedPermutations.insert(permutation);
}
}
func(permutation);
}
);
}
int main(){
std::vector<int> vector1{1,1,1,1,2,3,2,2,3,3,1};
auto func = [](const auto& vec){return;};
parallel_for_each_unique_permutation(vector1, func);
}
The permutations you have to work with are known in the field of combinatorics as multiset permutations.
They are described, for example, on The Combinatorial Object Server, with more detailed explanations in this paper by Professor Tadao Takaoka.
There is some related Python code, and some C++ code in the FXT open source library.
You might consider adding the "multiset" and "combinatorics" tags to your question.
One possibility is to borrow the (header-only) algorithmic code from the FXT library, which provides a simple generator class for those multiset permutations.
Performance level:
Using the FXT algorithm on a test vector of 15 objects, {1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4}, one can generate all associated 12,612,600 "permutations" in less than 2 seconds on a plain vanilla Intel x86-64 machine; this is without diagnostics text I/O and without any attempt at optimization.
The algorithm generates exactly those "permutations" that are required, nothing more. So there is no longer a need to generate all 15! "raw" permutations nor to use mutual exclusion to update a shared data structure for filtering purposes.
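As a quick sanity check on the figure of 12,612,600, the number of multiset permutations is N! / (c1! * c2! * ... * ck!), where the c's are the repetition counts. The sketch below is only an illustration and is not part of FXT; the function name countMultisetPermutations is made up for this example. It builds the result as a product of binomial coefficients so that intermediate values stay small.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Number of distinct permutations of a multiset: N! / (c1! * c2! * ... * ck!).
// Hypothetical helper, written for illustration only.
std::uint64_t countMultisetPermutations(const std::vector<int>& items)
{
    std::map<int, std::uint64_t> counts;   // value -> repetition count
    for (int x : items)
        ++counts[x];

    std::uint64_t result = 1;
    std::uint64_t placed = 0;              // elements accounted for so far
    for (const auto& kv : counts) {
        // Multiply by C(placed + c, c) one factor at a time.
        // Each division is exact because the running value is a
        // multiple of a binomial coefficient times j.
        for (std::uint64_t j = 1; j <= kv.second; ++j) {
            ++placed;
            result = result * placed / j;
        }
    }
    return result;
}
```

For the test vector {1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4} this returns 12612600, and for {1,1, 2,2,2} it returns 10.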
An adaptor class for generating the permutations:
Below I try to provide code for an adaptor class, which lets your application use the FXT algorithm while confining the dependency to a single implementation file. That way, the code will hopefully fit better into your application: think of FXT's ulong type and use of raw pointers, versus std::vector<std::size_t> in your code. Besides, FXT is a very extensive library.
Header file for the "adaptor" class:
// File: MSetPermGen.h
#ifndef MSET_PERM_GEN_H
#define MSET_PERM_GEN_H
#include <iostream>
#include <vector>
class MSetPermGenImpl; // from algorithmic backend
using IntVec = std::vector<int>;
using SizeVec = std::vector<std::size_t>;
// Generator class for multiset permutations:
class MSetPermGen {
public:
MSetPermGen(const IntVec& vec);
std::size_t getCycleLength() const;
bool forward(size_t incr);
bool next();
const SizeVec& getPermIndices() const;
const IntVec& getItems() const;
const IntVec& getItemValues() const;
private:
std::size_t cycleLength_;
MSetPermGenImpl* genImpl_; // implementation generator
IntVec itemValues_; // only once each
IntVec items_; // copy of ctor argument
SizeVec freqs_; // repetition counts
SizeVec state_; // current permutation as indices into itemValues_, each in 0..k-1
};
#endif
The class constructor takes exactly the argument type provided in your main program. Of course, the key method is next(). You can also move the automaton by several steps at once using the forward(incr) method.
Example client program:
// File: test_main.cpp
#include <cassert>
#include <cstdlib> // EXIT_SUCCESS
#include "MSetPermGen.h"
using std::cout;
using std::cerr;
using std::endl;
// utility functions:
std::vector<int> getMSPermutation(const MSetPermGen& mspg)
{
std::vector<int> res;
auto indices = mspg.getPermIndices(); // indices into values, each in 0..k-1
auto values = mspg.getItemValues(); // distinct values, whatever the user put in
std::size_t n = indices.size();
assert( n == mspg.getItems().size() );
res.reserve(n);
for (std::size_t i=0; i < n; i++) {
auto xi = indices[i];
res.push_back(values[xi]);
}
return res;
}
void printPermutation(const std::vector<int>& p, std::ostream& fh)
{
std::size_t n = p.size();
for (size_t i=0; i < n; i++)
fh << p[i] << " ";
fh << '\n';
}
int main(int argc, const char* argv[])
{
std::vector<int> vec0{1,1, 2,2,2}; // N=5
std::vector<int> vec1{1,1, 1,1, 2, 3, 2,2, 3,3, 1}; // N=11
std::vector<int> vec2{1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4}; // N=15
MSetPermGen pg0{vec0};
MSetPermGen pg1{vec1};
MSetPermGen pg2{vec2};
auto pg = &pg0; // choice of 0, 1, 2 for sizing
auto cl = pg->getCycleLength();
auto permA = getMSPermutation(*pg);
printPermutation(permA, cout);
for (std::size_t pi=0; pi < (cl-1); pi++) {
pg->next();
auto permB = getMSPermutation(*pg);
printPermutation(permB, cout);
}
return EXIT_SUCCESS;
}
Text output from the above small program:
1 1 2 2 2
1 2 1 2 2
1 2 2 1 2
1 2 2 2 1
2 1 1 2 2
2 1 2 1 2
2 1 2 2 1
2 2 1 1 2
2 2 1 2 1
2 2 2 1 1
You get only 10 items from vector {1,1, 2,2,2}, because 5! / (2! * 3!) = 120/(2*6) = 10.
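For comparison, the standard library can enumerate the same sequence sequentially: std::next_permutation steps through arrangements in strictly increasing lexicographic order, so starting from a sorted range it visits each distinct arrangement exactly once, even when elements repeat. The sketch below is a standard-library alternative and not the FXT code; the name forEachUniquePermutation is made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Calls func(perm) once per distinct permutation of items and returns
// how many were visited. Hypothetical helper, for illustration only.
template <class Func>
std::size_t forEachUniquePermutation(std::vector<int> items, Func func)
{
    std::sort(items.begin(), items.end());   // smallest arrangement first
    std::size_t count = 0;
    do {
        func(items);
        ++count;
        // next_permutation returns false once it wraps around to sorted order
    } while (std::next_permutation(items.begin(), items.end()));
    return count;
}
```

Called on {1,1, 2,2,2}, it visits 10 arrangements, starting with 1 1 2 2 2 and ending with 2 2 2 1 1. Unlike the generator class, this has no direct way to jump to the i-th arrangement, which is what the parallel version needs.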
The implementation file for the adaptor class, MSetPermGen.cpp, consists of two parts. The first part is FXT code with minimal adaptations. The second part is the MSetPermGen class proper.
First part of implementation file:
// File: MSetPermGen.cpp - part 1 of 2 - FXT code
// -------------- Beginning of header-only FXT combinatorics code -----------
// This file is part of the FXT library.
// Copyright (C) 2010, 2012, 2014 Joerg Arndt
// License: GNU General Public License version 3 or later,
// see the file COPYING.txt in the main directory.
//-- https://www.jjj.de/fxt/
//-- https://fossies.org/dox/fxt-2018.07.03/mset-perm-lex_8h_source.html
#include <cstddef>
using ulong = std::size_t;
inline void swap2(ulong& xa, ulong& xb)
{
ulong save_xb = xb;
xb = xa;
xa = save_xb;
}
class mset_perm_lex
// Multiset permutations in lexicographic order, iterative algorithm.
{
public:
ulong k_; // number of different sorts of objects
ulong *r_; // number of elements '0' in r[0], '1' in r[1], ..., 'k-1' in r[k-1]
ulong n_; // number of objects
ulong *ms_; // multiset data in ms[0], ..., ms[n-1], sentinels at [-1] and [-2]
private: // have pointer data
mset_perm_lex(const mset_perm_lex&); // forbidden
mset_perm_lex & operator = (const mset_perm_lex&); // forbidden
public:
explicit mset_perm_lex(const ulong *r, ulong k)
{
k_ = k;
r_ = new ulong[k];
for (ulong j=0; j<k_; ++j) r_[j] = r[j]; // get buckets
n_ = 0;
for (ulong j=0; j<k_; ++j) n_ += r_[j];
ms_ = new ulong[n_+2];
ms_[0] = 0; ms_[1] = 1; // sentinels: ms[0] < ms[1]
ms_ += 2; // nota bene
first();
}
void first()
{
for (ulong j=0, i=0; j<k_; ++j)
for (ulong h=r_[j]; h!=0; --h, ++i)
ms_[i] = j;
}
~mset_perm_lex()
{
ms_ -= 2;
delete [] ms_;
delete [] r_;
}
const ulong * data() const { return ms_; }
ulong next()
// Return position of leftmost change,
// return n with last permutation.
{
// find rightmost pair with ms[i] < ms[i+1]:
const ulong n1 = n_ - 1;
ulong i = n1;
do { --i; } while ( ms_[i] >= ms_[i+1] ); // can read sentinel
if ( (long)i < 0 ) return n_; // last sequence is falling seq.
// find rightmost element ms[j] greater than ms[i]:
ulong j = n1;
while ( ms_[i] >= ms_[j] ) { --j; }
swap2(ms_[i], ms_[j]);
// Here the elements ms[i+1], ..., ms[n-1] are a falling sequence.
// Reverse order to the right:
ulong r = n1;
ulong s = i + 1;
while ( r > s ) { swap2(ms_[r], ms_[s]); --r; ++s; }
return i;
}
};
// -------------- End of header-only FXT combinatorics code -----------
Second part of the class implementation file:
// Second part of file MSetPermGen.cpp: non-FXT code
#include <cassert>
#include <tuple>
#include <map>
#include <iostream>
#include <cstdio>
#include "MSetPermGen.h"
using std::cout;
using std::cerr;
using std::endl;
class MSetPermGenImpl { // wrapper class
public:
MSetPermGenImpl(const SizeVec& freqs) : fg(freqs.data(), freqs.size())
{}
private:
mset_perm_lex fg;
friend class MSetPermGen;
};
static std::size_t fact(size_t n)
{
std::size_t f = 1;
for (std::size_t i = 1; i <= n; i++)
f = f*i;
return f;
}
MSetPermGen::MSetPermGen(const IntVec& vec) : items_(vec)
{
std::map<int,int> ma;
for (int i: vec) {
ma[i]++;
}
int item, freq;
for (const auto& p : ma) {
std::tie(item, freq) = p;
itemValues_.push_back(item);
freqs_.push_back(freq);
}
cycleLength_ = fact(items_.size());
for (auto i: freqs_)
cycleLength_ /= fact(i);
// create FXT-level generator:
genImpl_ = new MSetPermGenImpl(freqs_);
for (std::size_t i=0; i < items_.size(); i++)
state_.push_back(genImpl_->fg.ms_[i]);
}
std::size_t MSetPermGen::getCycleLength() const
{
return cycleLength_;
}
bool MSetPermGen::forward(size_t incr)
{
std::size_t n = items_.size();
std::size_t rc = 0;
// move forward state by brute force, could be improved:
for (std::size_t i=0; i < incr; i++)
rc = genImpl_->fg.next();
for (std::size_t j=0; j < n; j++)
state_[j] = genImpl_->fg.ms_[j];
return (rc != n);
}
bool MSetPermGen::next()
{
return forward(1);
}
const SizeVec& MSetPermGen::getPermIndices() const
{
return (this->state_);
}
const IntVec& MSetPermGen::getItems() const
{
return (this->items_);
}
const IntVec& MSetPermGen::getItemValues() const
{
return (this->itemValues_);
}
Adapting the parallel application:
Regarding your multithreaded application, given that generating the "permutations" is cheap, you can afford to create one generator object per thread.
Before launching the actual computation, you forward each generator to its appropriate initial position, that is at step thread_id * (cycleLength / num_threads).
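The work split can be sketched on its own, independently of the generator. The helper below is hypothetical (splitWork is not part of the code in this answer); it rounds the per-thread share up, as the modified program further down does, and clamps both ends so the last ranges may be shorter or empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Splits [0, total) into numThreads contiguous half-open ranges
// [first, lastExcl). Hypothetical helper, for illustration only.
std::vector<std::pair<std::size_t, std::size_t>>
splitWork(std::size_t total, std::size_t numThreads)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (numThreads == 0)
        return ranges;
    // Round the share up so that numThreads * share >= total.
    const std::size_t share = (total + numThreads - 1) / numThreads;
    for (std::size_t t = 0; t < numThreads; ++t) {
        const std::size_t first = std::min(total, t * share);
        const std::size_t lastExcl = std::min(total, (t + 1) * share);
        ranges.emplace_back(first, lastExcl);   // empty when first == lastExcl
    }
    return ranges;
}
```

With total = 10 and 3 threads, the ranges are [0,4), [4,8) and [8,10). A thread whose range is empty should skip its loop entirely.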
I have tried to adapt your code to this MSetPermGen class along these lines. See code below.
With 3 threads, an input vector {1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4} of size 15 (giving 12,612,600 permutations) and all diagnostics enabled, your modified parallel program runs in less than 10 seconds; less than 2 seconds with all diagnostics switched off.
Modified parallel program:
#include <algorithm>
#include <thread>
#include <vector>
#include <atomic>
#include <mutex>
#include <numeric>
#include <set>
#include <iostream>
#include <fstream>
#include <sstream>
#include <cstdlib>
#include "MSetPermGen.h"
using std::cout;
using std::endl;
// debug and instrumentation:
static std::atomic<size_t> permCounter;
static bool doManagePermCounter = true;
static bool doThreadLogfiles = true;
static bool doLogfileHeaders = true;
template<class Container, class Func>
void parallel_for_each_permutation(const Container& container, int numThreads, Func mfunc) {
MSetPermGen gen0(container);
std::size_t totalNumPermutations = gen0.getCycleLength();
std::size_t permShare = totalNumPermutations / numThreads;
if ((totalNumPermutations % numThreads) != 0)
permShare++;
std::cout << "totalNumPermutations: " << totalNumPermutations << std::endl;
std::vector<std::thread> threads;
for (int threadId = 0; threadId < numThreads; threadId++) {
threads.emplace_back([&, threadId]() {
// generate some per-thread logfile name
std::ostringstream fnss;
fnss << "thrlog_" << threadId << ".txt";
std::string fileName = fnss.str();
std::ofstream fh(fileName);
MSetPermGen thrGen(container);
const std::size_t firstPerm = permShare * threadId;
thrGen.forward(firstPerm);
const std::size_t last_excl = std::min(totalNumPermutations,
(threadId+1) * permShare);
if (doLogfileHeaders) {
fh << "MSG threadId: " << threadId << '\n';
fh << "MSG firstPerm: " << firstPerm << '\n';
fh << "MSG lastExcl : " << last_excl << '\n';
}
Container permutation(container);
auto values = thrGen.getItemValues();
auto permIndices = thrGen.getPermIndices();
auto nsz = permIndices.size();
std::size_t count = firstPerm;
do {
for (std::size_t i = 0; i < nsz; i++) {
permutation[i] = values[permIndices[i]];
}
mfunc(threadId, permutation);
if (doThreadLogfiles) {
for (std::size_t i = 0; i < nsz; i++)
fh << permutation[i] << ' ';
fh << '\n';
}
thrGen.next();
permIndices = thrGen.getPermIndices();
++count;
if (doManagePermCounter) {
permCounter++;
}
} while (count < last_excl);
fh.close();
});
}
for(auto& thread : threads)
thread.join();
}
template<class Container, class Func>
void parallel_for_each_unique_permutation(const Container& container, Func func) {
constexpr int numThreads = 3;
parallel_for_each_permutation(
container,
numThreads,
[&](int threadId, const auto& permutation){
// no longer need any mutual exclusion
func(permutation);
}
);
}
int main()
{
std::vector<int> vector1{1,1,1,1,2,3,2,2,3,3,1}; // N=11
std::vector<int> vector0{1,1, 2,2,2}; // N=5
std::vector<int> vector2{1,1,1, 2,2,2, 3,3,3,3, 4,4,4,4,4}; // N=15
auto func = [](const auto& vec) { return; };
permCounter.store(0);
parallel_for_each_unique_permutation(vector2, func);
auto finalPermCounter = permCounter.load();
cout << "FinalPermCounter = " << finalPermCounter << endl;
}
Problem: I do not always know the exact size of the Jacobian or function vector that I am going to use Levenberg-Marquardt on. Therefore, I need to set their dimensions at compile time.
Expected: After I declare an instance of MyFunctorDense, I could set "InputsAtCompileTime" to my input size and "ValuesAtCompileTime" to my values size. Then my Jacobian, aFjac, should have the dimensions tValues x tInputs, and my function vector, aH, should have the dimensions tValues x 1.
Observed:
.h file
#pragma once
#include "stdafx.h"
#include <iostream>
#include <unsupported/Eigen/LevenbergMarquardt>
#include <unsupported/Eigen/NumericalDiff>
//Generic functor
template <typename _Scalar, typename _Index>
struct MySparseFunctor
{
typedef _Scalar Scalar;
typedef _Index Index;
typedef Eigen::Matrix<Scalar,Eigen::Dynamic,1> InputType;
typedef Eigen::Matrix<Scalar,Eigen::Dynamic,1> ValueType;
typedef Eigen::SparseMatrix<Scalar, Eigen::ColMajor, Index>
JacobianType;
typedef Eigen::SparseQR<JacobianType, Eigen::COLAMDOrdering<int> >
QRSolver;
enum {
InputsAtCompileTime = Eigen::Dynamic,
ValuesAtCompileTime = Eigen::Dynamic
};
MySparseFunctor(int inputs, int values) : m_inputs(inputs),
m_values(values) {}
int inputs() const { return m_inputs; }
int values() const { return m_values; }
const int m_inputs, m_values;
};
template <typename _Scalar, int NX=Eigen::Dynamic, int NY=Eigen::Dynamic>
struct MyDenseFunctor
{
typedef _Scalar Scalar;
enum {
InputsAtCompileTime = NX,
ValuesAtCompileTime = NY
};
typedef Eigen::Matrix<Scalar,InputsAtCompileTime,1> InputType;
typedef Eigen::Matrix<Scalar,ValuesAtCompileTime,1> ValueType;
typedef Eigen::Matrix<Scalar,ValuesAtCompileTime,InputsAtCompileTime>
JacobianType;
typedef Eigen::ColPivHouseholderQR<JacobianType> QRSolver;
const int m_inputs, m_values;
MyDenseFunctor() : m_inputs(InputsAtCompileTime),
m_values(ValuesAtCompileTime) {}
MyDenseFunctor(int inputs, int values) : m_inputs(inputs),
m_values(values) {}
int inputs() const { return m_inputs; }
int values() const { return m_values; }
};
struct MyFunctorSparse : MySparseFunctor<double, int>
{
MyFunctorSparse(void) : MySparseFunctor<double, int>(2 , 2) {}
int operator()(const Eigen::VectorXd &aX, //Input
Eigen::VectorXd &aF) const; //Output
int df(const InputType &aF, JacobianType& aFjac);
};
struct MyFunctorDense : MyDenseFunctor<double>
{
MyFunctorDense(void) : MyDenseFunctor<double>( Eigen::Dynamic ,
Eigen::Dynamic) {}
int operator()(const InputType &aX, //Input
ValueType &aF) const; //Output
int df(const InputType &aX, JacobianType& aFjac);
};
.cpp file
#pragma once
#include "stdafx.h"
#include "Main.h"
int MyFunctorSparse::operator()(const Eigen::VectorXd &aX, //Input
Eigen::VectorXd &aF) const //Output
{
//F = aX0^2 + aX1^2
aF(0) = aX(0)*aX(0) + aX(1)*aX(1);
aF(1) = 0;
return 0;
}
int MyFunctorDense::operator()(const InputType &aX, //Input
ValueType &aF) const //Output
{
//F = aX0^2 + aX1^2
for (int i = 0; i < aF.size(); i++)
{
aF(i) = i*aX(0)*aX(0) + i*(aX(1)-1)*(aX(1)-1);
}
return 0;
}
int MyFunctorSparse::df(const InputType &aX, JacobianType& aFjac)
{
aFjac.coeffRef(0, 0) = 2*aX(0);
aFjac.coeffRef(0, 1) = 2*aX(1);
aFjac.coeffRef(1, 0) = 0.0;
aFjac.coeffRef(1, 1) = 0.0;
return 0;
}
int MyFunctorDense::df(const InputType &aX, JacobianType& aFjac)
{
for(int i = 0; i < aFjac.rows(); i++) // size() is rows*cols and would run past the last row
{
aFjac(i, 0) = 2*i*aX(0);
aFjac(i, 1) = 2*i*(aX(1)-1);
}
return 0;
}
int main(int argc, char *argv[])
{
int input;
std::cout << "Enter 1 to run LM with DenseFunctor, Enter 2 to run LM with SparseFunctor: " << std::endl;
std::cin >> input;
Eigen::VectorXd tX(2);
tX(0) = 10;
tX(1) = 0.5;
int tInputs = tX.rows();
int tValues = 60928;
std::cout << "tX: " << tX << std::endl;
if (input == 1)
{
MyFunctorDense myDenseFunctor;
tInputs = myDenseFunctor.inputs();
tValues = myDenseFunctor.values();
std::cout << "tInputs : " << tInputs << std::endl;
std::cout << "tValues : " << tValues << std::endl;
Eigen::LevenbergMarquardt<MyFunctorDense> lm(myDenseFunctor);
lm.setMaxfev(30);
lm.setXtol(1e-5);
lm.minimize(tX);
}
if (input == 2)
{
MyFunctorSparse myFunctorSparse;
//Eigen::NumericalDiff<MyFunctor> numDiff(myFunctor);
//Eigen::LevenbergMarquardt<Eigen::NumericalDiff<MyFunctor>,double> lm(numDiff);
Eigen::LevenbergMarquardt<MyFunctorSparse> lm(myFunctorSparse);
lm.setMaxfev(2000);
lm.setXtol(1e-10);
lm.minimize(tX);
}
std::cout << "tX minimized: " << tX << std::endl;
return 0;
}
Solution: I figured out my problem. I replaced:
const int m_inputs, m_values;
with
int m_inputs, m_values;
in the ".h" file. This makes the member variables of the struct MyFunctorDense modifiable. Then, in the ".cpp" file, below the line
std::cout << "tX: " << tX << std::endl;
I added:
Eigen::VectorXd tF(60928);
This is a test function vector of dimension 60928x1; any other n x 1 dimension could be used in its place.
Then below the line:
MyFunctorDense myDenseFunctor;
I added:
myDenseFunctor.m_inputs = tX.rows();
myDenseFunctor.m_values = tF.rows();
Now I get the result:
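As a side note on the fix above: an alternative that keeps the members const is to pass the run-time sizes through the constructor, since the base functor in the question already has an (inputs, values) constructor. The sketch below is an assumption-laden illustration, not the code from the question: the Eigen typedefs are left out, DenseFunctorBase stands in for MyDenseFunctor, and SizedFunctorDense is a hypothetical name.

```cpp
// Stand-in for the MyDenseFunctor base from the question, Eigen types omitted.
struct DenseFunctorBase {
    const int m_inputs, m_values;   // can stay const with this approach
    DenseFunctorBase(int inputs, int values)
        : m_inputs(inputs), m_values(values) {}
    int inputs() const { return m_inputs; }
    int values() const { return m_values; }
};

// Hypothetical variant of MyFunctorDense that receives its sizes when it is
// constructed instead of having them assigned afterwards.
struct SizedFunctorDense : DenseFunctorBase {
    SizedFunctorDense(int inputs, int values)
        : DenseFunctorBase(inputs, values) {}
};
```

With this shape, the caller would write something like SizedFunctorDense functor(tX.rows(), tF.rows()); and no member needs to be modified afterwards.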
I've been trying to compile my program, which should push a (string, float) pair onto a vector:
typedef std::pair<string, float> Prediction;
std::vector<Prediction> predictions;
for ( int i = 0 ; i < output.size(); i++ ) {
std::vector<int> maxN = Argmax(output[i], 1);
int idx = maxN[0];
predictions.push_back(std::make_pair(labels_[idx], output[idx]));
}
return predictions;
However, every time I try to compile this, I get this error:
error: no matching member function for call to 'push_back'
predictions.push_back(std::make_pair(labels_[idx], output[idx]));
I also get a few other warnings saying things like
candidate function not viable: no known conversion from 'pair<[...],
typename __make_pair_return >
&>::type>' to 'const pair<[...], float>' for 1st argument
_LIBCPP_INLINE_VISIBILITY void push_back(const_reference __x);
and
candidate function not viable: no known conversion from 'pair<[...],
typename __make_pair_return >
&>::type>' to 'pair<[...], float>' for 1st argument
_LIBCPP_INLINE_VISIBILITY void push_back(value_type&& __x);
I've been trying to rewrite things and modify my functions, but I can't work out why this error remains. Does anyone know what I can do to fix this?
Here is the code in context if that helps, the header file:
/**
* Classification System
*/
#ifndef __CLASSIFIER_H__
#define __CLASSIFIER_H__
#include <caffe/caffe.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>
using namespace caffe; // NOLINT(build/namespaces)
using std::string;
/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;
class Classifier {
public:
Classifier(const string& model_file,
const string& trained_file,
const string& label_file);
std::vector< Prediction > Classify(const std::vector<cv::Mat>& img);
private:
std::vector< std::vector<float> > Predict(const std::vector<cv::Mat>& img, int nImages);
void WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages);
void Preprocess(const std::vector<cv::Mat>& img,
std::vector<cv::Mat>* input_channels, int nImages);
private:
shared_ptr<Net<float> > net_;
cv::Size input_geometry_;
int num_channels_;
std::vector<string> labels_;
};
#endif /* __CLASSIFIER_H__ */
Class File:
#define CPU_ONLY
#include "Classifier.h"
using namespace caffe; // NOLINT(build/namespaces)
using std::string;
Classifier::Classifier(const string& model_file,
const string& trained_file,
const string& label_file) {
#ifdef CPU_ONLY
Caffe::set_mode(Caffe::CPU);
#else
Caffe::set_mode(Caffe::GPU);
#endif
/* Load the network. */
net_.reset(new Net<float>(model_file, TEST));
net_->CopyTrainedLayersFrom(trained_file);
CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";
Blob<float>* input_layer = net_->input_blobs()[0];
num_channels_ = input_layer->channels();
CHECK(num_channels_ == 3 || num_channels_ == 1)
<< "Input layer should have 1 or 3 channels.";
input_geometry_ = cv::Size(input_layer->width(), input_layer->height());
/* Load labels. */
std::ifstream labels(label_file.c_str());
CHECK(labels) << "Unable to open labels file " << label_file;
string line;
while (std::getline(labels, line))
labels_.push_back(string(line));
Blob<float>* output_layer = net_->output_blobs()[0];
CHECK_EQ(labels_.size(), output_layer->channels())
<< "Number of labels is different from the output layer dimension.";
}
static bool PairCompare(const std::pair<float, int>& lhs,
const std::pair<float, int>& rhs) {
return lhs.first > rhs.first;
}
/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
std::vector<std::pair<float, int> > pairs;
for (size_t i = 0; i < v.size(); ++i)
pairs.push_back(std::make_pair(v[i], i));
std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);
std::vector<int> result;
for (int i = 0; i < N; ++i)
result.push_back(pairs[i].second);
return result;
}
std::vector<Prediction> Classifier::Classify(const std::vector<cv::Mat>& img) {
std::vector< std::vector<float> > output = Predict(img, img.size());
std::vector<Prediction> predictions;
for ( int i = 0 ; i < output.size(); i++ ) {
std::vector<int> maxN = Argmax(output[i], 1);
int idx = maxN[0];
predictions.push_back(std::make_pair(labels_[idx], output[idx]));
}
return predictions;
}
std::vector< std::vector<float> > Classifier::Predict(const std::vector<cv::Mat>& img, int nImages) {
Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(nImages, num_channels_,
input_geometry_.height, input_geometry_.width);
/* Forward dimension change to all layers. */
net_->Reshape();
std::vector<cv::Mat> input_channels;
WrapInputLayer(&input_channels, nImages);
Preprocess(img, &input_channels, nImages);
net_->ForwardPrefilled();
/* Copy the output layer to a std::vector */
Blob<float>* output_layer = net_->output_blobs()[0];
std::vector <std::vector<float> > ret;
for (int i = 0; i < nImages; i++) {
const float* begin = output_layer->cpu_data() + i*output_layer->channels();
const float* end = begin + output_layer->channels();
ret.push_back( std::vector<float>(begin, end) );
}
return ret;
}
/* Wrap the input layer of the network in separate cv::Mat objects
* (one per channel). This way we save one memcpy operation and we
* don't need to rely on cudaMemcpy2D. The last preprocessing
* operation will write the separate channels directly to the input
* layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages) {
Blob<float>* input_layer = net_->input_blobs()[0];
int width = input_layer->width();
int height = input_layer->height();
float* input_data = input_layer->mutable_cpu_data();
for (int i = 0; i < input_layer->channels()* nImages; ++i) {
cv::Mat channel(height, width, CV_32FC1, input_data);
input_channels->push_back(channel);
input_data += width * height;
}
}
void Classifier::Preprocess(const std::vector<cv::Mat>& img,
std::vector<cv::Mat>* input_channels, int nImages) {
for (int i = 0; i < nImages; i++) {
vector<cv::Mat> channels;
cv::split(img[i], channels);
for (int j = 0; j < channels.size(); j++){
channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);
}
}
}
Thanks so much!
typedef std::pair<string, float> Prediction;
std::vector<Prediction> predictions;
std::vector< std::vector<float> > output = Predict(img, img.size());
For a Prediction, make_pair needs a string and a float. output[idx] is a whole vector of floats (one row of scores), so the resulting pair cannot be converted. You need output[i][idx] to get a single float.
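A minimal, Caffe-free sketch of the corrected loop is shown below. It assumes one score vector per image and at least as many labels as scores per row; argmaxIndex and topPredictions are hypothetical names standing in for Argmax(output[i], 1)[0] and the body of Classify.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, float> Prediction;

// Index of the largest score in one row (stand-in for Argmax(row, 1)[0]).
static std::size_t argmaxIndex(const std::vector<float>& row)
{
    std::size_t best = 0;
    for (std::size_t j = 1; j < row.size(); ++j)
        if (row[j] > row[best])
            best = j;
    return best;
}

// Builds one (label, confidence) pair per image.
std::vector<Prediction> topPredictions(
    const std::vector<std::vector<float> >& output,
    const std::vector<std::string>& labels)
{
    std::vector<Prediction> predictions;
    for (std::size_t i = 0; i < output.size(); ++i) {
        if (output[i].empty())
            continue;                       // nothing to rank for this image
        const std::size_t idx = argmaxIndex(output[i]);
        // output[i] is the score vector of image i;
        // output[i][idx] is the single float that make_pair needs.
        predictions.push_back(std::make_pair(labels[idx], output[i][idx]));
    }
    return predictions;
}
```

The only change that matters for the compile error is the second argument of make_pair: output[i][idx] instead of output[idx].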
I've written an indirect radix sort algorithm in C++ (by indirect, I mean it returns the indices of the items):
#include <algorithm>
#include <iterator>
#include <vector>
template<class It1, class It2>
void radix_ipass(
It1 begin, It1 const end,
It2 const a, size_t const i,
std::vector<std::vector<size_t> > &buckets)
{
size_t ncleared = 0;
for (It1 j = begin; j != end; ++j)
{
size_t const k = a[*j][i];
while (k >= ncleared && ncleared < buckets.size())
{ buckets[ncleared++].clear(); }
if (k >= buckets.size())
{
buckets.resize(k + 1);
ncleared = buckets.size();
}
buckets[k].push_back(size_t());
using std::swap; swap(buckets[k].back(), *j);
}
for (std::vector<std::vector<size_t> >::iterator
j = buckets.begin(); j != buckets.begin() + ncleared; j->clear(), ++j)
{
begin = std::swap_ranges(j->begin(), j->end(), begin);
}
}
template<class It, class It2>
void radix_isort(It const begin, It const end, It2 const items)
{
for (ptrdiff_t i = 0; i != end - begin; ++i) { items[i] = i; }
size_t smax = 0;
for (It i = begin; i != end; ++i)
{
size_t const n = i->size();
smax = n > smax ? n : smax;
}
std::vector<std::vector<size_t> > buckets;
for (size_t i = 0; i != smax; ++i)
{
radix_ipass(
items, items + (end - begin),
begin, smax - i - 1, buckets);
}
}
It seems to take around 40% less time than std::sort when I test it with the following code (3920 ms compared to 6530 ms):
#include <functional>
template<class Key>
struct key_comp : public Key
{
explicit key_comp(Key const &key = Key()) : Key(key) { }
template<class T>
bool operator()(T const &a, T const &b) const
{ return this->Key::operator()(a) < this->Key::operator()(b); }
};
template<class Key>
key_comp<Key> make_key_comp(Key const &key) { return key_comp<Key>(key); }
template<class T1, class T2>
struct add : public std::binary_function<T1, T2, T1>
{ T1 operator()(T1 a, T2 const &b) const { return a += b; } };
template<class F>
struct deref : public F
{
deref(F const &f) : F(f) { }
typename std::iterator_traits<
typename F::result_type
>::value_type const
&operator()(typename F::argument_type const &a) const
{ return *this->F::operator()(a); }
};
template<class T> deref<T> make_deref(T const &t) { return deref<T>(t); }
size_t xorshf96(void) // random number generator
{
static size_t x = 123456789, y = 362436069, z = 521288629;
x ^= x << 16;
x ^= x >> 5;
x ^= x << 1;
size_t t = x;
x = y;
y = z;
z = t ^ x ^ y;
return z;
}
#include <stdio.h>
#include <time.h>
#include <array>
int main(void)
{
typedef std::vector<std::array<size_t, 3> > Items;
Items items(1 << 24);
std::vector<size_t> ranks(items.size() * 2);
for (size_t i = 0; i != items.size(); i++)
{
ranks[i] = i;
for (size_t j = 0; j != items[i].size(); j++)
{ items[i][j] = xorshf96() & 0xFFF; }
}
clock_t const start = clock();
if (1) { radix_isort(items.begin(), items.end(), ranks.begin()); }
else // STL sorting
{
std::sort(
ranks.begin(),
ranks.begin() + items.size(),
make_key_comp(make_deref(std::bind1st(
add<Items::const_iterator, ptrdiff_t>(),
items.begin()))));
}
printf("%u ms\n",
(unsigned)((clock() - start) * 1000 / CLOCKS_PER_SEC),
std::min(ranks.begin(), ranks.end()));
return 0;
}
Hmm, I guess that's the best I can do, I thought.
But after lots of banging my head against the wall, I realized that prefetching in the beginning of radix_ipass can help cut down the result to 1440 ms (!):
#include <xmmintrin.h>
...
for (It1 j = begin; j != end; ++j)
{
#if defined(_MM_TRANSPOSE4_PS) // should be defined if xmmintrin.h is included
enum { N = 8 };
if (end - j > N)
{ _mm_prefetch((char const *)(&a[j[N]][i]), _MM_HINT_T0); }
#endif
...
}
Clearly, the bottleneck is the memory bandwidth: the access pattern is unpredictable.
So now my question is: what else can I do to make it even faster on similar amounts of data?
Or is there not much room left for improvement?
(I'm hoping to avoid compromising the readability of the code if possible, so if the readability is harmed, the improvement should be significant.)
Using a more compact data structure that combines ranks and values can boost the performance of std::sort by a factor of 2-3. Essentially, the sort now runs on a vector<pair<Value,Rank>>. For this, the Value data type, std::array<integer_type, 3>, has been replaced by a more compact pair<uint32_t, uint8_t> data structure. Only half a byte of it is unused, and the < comparison can be done in two steps, the first being a presumably efficient comparison of two uint32_t values. (It's not clear whether the loop used by std::array<..>::operator< can be optimized into similarly fast code, but replacing std::array<integer_type,3> with this data structure yielded another performance boost.)
Still, it doesn't get as efficient as the radix sort. (Maybe you could tweak a custom QuickSort with prefetches?)
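To illustrate the idea of sorting (packed value, rank) pairs, here is a simplified sketch. It is not the code used for the timings below: it assumes each of the three digits fits in 12 bits (as in the benchmark, which masks with 0xFFF) and packs them into a single uint64_t instead of the uint32_t plus uint8_t layout, trading some compactness for simplicity. The names packKey and packedIndirectSort are made up for this example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Packs three 12-bit digits into one integer, most significant digit first,
// so comparing keys gives the same order as comparing the arrays.
inline std::uint64_t packKey(const std::array<std::size_t, 3>& a)
{
    return (std::uint64_t(a[0] & 0xFFF) << 24)
         | (std::uint64_t(a[1] & 0xFFF) << 12)
         |  std::uint64_t(a[2] & 0xFFF);
}

// Returns the indices of items in ascending order of their values.
std::vector<std::uint32_t> packedIndirectSort(
    const std::vector<std::array<std::size_t, 3> >& items)
{
    std::vector<std::pair<std::uint64_t, std::uint32_t> > keyed(items.size());
    for (std::size_t i = 0; i < items.size(); ++i)
        keyed[i] = std::make_pair(packKey(items[i]), std::uint32_t(i));

    // pair's operator< compares the key first, then the rank,
    // so the sort touches only this one contiguous array.
    std::sort(keyed.begin(), keyed.end());

    std::vector<std::uint32_t> ranks(items.size());
    for (std::size_t i = 0; i < keyed.size(); ++i)
        ranks[i] = keyed[i].second;
    return ranks;
}
```

The point of the layout is that each comparison reads two adjacent pairs instead of following an index into a separate array, which is where the unpredictable memory accesses come from.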
Besides that additional sorting method, I've replaced the xorshf96 by a mt19937, because I know how to provide a seed for the latter ;)
The seed and the number of values can be changed via two command-line arguments: first the seed, then the count.
Compiled with g++ 4.9.0 20131022, using -std=c++11 -march=native -O3, for 64-bit Linux.
Sample runs (important note: these ran on a Core2Duo U9400 processor, which is old and slow!):
item count: 16000000
using std::sort
duration: 12260 ms
result sorted: true
seed: 5648
item count: 16000000
using std::sort
duration: 12230 ms
result sorted: true
seed: 5648
item count: 16000000
using std::sort
duration: 12230 ms
result sorted: true
seed: 5648
item count: 16000000
using std::sort with a packed data structure
duration: 4290 ms
result sorted: true
seed: 5648
item count: 16000000
using std::sort with a packed data structure
duration: 4270 ms
result sorted: true
seed: 5648
item count: 16000000
using std::sort with a packed data structure
duration: 4280 ms
result sorted: true
item count: 16000000
using radix sort
duration: 3790 ms
result sorted: true
seed: 5648
item count: 16000000
using radix sort
duration: 3820 ms
result sorted: true
seed: 5648
item count: 16000000
using radix sort
duration: 3780 ms
result sorted: true
New or changed code:
template<class It>
struct fun_obj
{
It beg;
bool operator()(ptrdiff_t lhs, ptrdiff_t rhs)
{
return beg[lhs] < beg[rhs];
}
};
template<class It>
fun_obj<It> make_fun_obj(It beg)
{
return fun_obj<It>{beg};
}
struct uint32p8_t
{
uint32_t m32;
uint8_t m8;
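uint32p8_t(std::array<uint16_t, 3> const& a)
// cast before shifting: a[0] promotes to int, and a 12-bit value << 20 can overflow it
: m32( uint32_t(a[0])<<(32-3*4) | uint32_t(a[1])<<(32-2*3*4) | (a[2]&0xF00)>>8)
, m8( a[2]&0xFF )
{
}
operator std::array<size_t, 3>() const
{
// mask first, then shift: >> binds tighter than &, so the parentheses are required
return {{(m32&0xFFF00000) >> (32-3*4), (m32&0x000FFF00) >> (32-2*3*4),
(m32&0xF)<<8 | m8}};
}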
friend bool operator<(uint32p8_t const& lhs, uint32p8_t const& rhs)
{
if(lhs.m32 < rhs.m32) return true;
if(lhs.m32 > rhs.m32) return false;
return lhs.m8 < rhs.m8;
}
};
#include <stdio.h>
#include <time.h>
#include <array>
#include <iostream>
#include <iomanip>
#include <utility>
#include <algorithm>
#include <cstdlib>
#include <iomanip>
#include <random>
int main(int argc, char* argv[])
{
std::cout.sync_with_stdio(false);
constexpr auto items_count_default = 2<<22;
constexpr auto seed_default = 42;
uint32_t const seed = argc > 1 ? std::atoll(argv[1]) : seed_default;
std::cout << "seed: " << seed << "\n";
size_t const items_count = argc > 2 ? std::atoll(argv[2])
: items_count_default;
std::cout << "item count: " << items_count << "\n";
using Items_array_value_t =
#ifdef RADIX_SORT
size_t
#elif defined(STDSORT)
uint16_t
#elif defined(STDSORT_PACKED)
uint16_t
#endif
;
typedef std::vector<std::array<Items_array_value_t, 3> > Items;
Items items(items_count);
auto const ranks_count =
#ifdef RADIX_SORT
items.size() * 2
#elif defined(STDSORT)
items.size()
#elif defined(STDSORT_PACKED)
items.size()
#endif
;
//auto prng = xorshf96;
std::mt19937 gen(seed);
std::uniform_int_distribution<> dist;
auto prng = [&dist, &gen]{return dist(gen);};
std::vector<size_t> ranks(ranks_count);
for (size_t i = 0; i != items.size(); i++)
{
ranks[i] = i;
for (size_t j = 0; j != items[i].size(); j++)
{ items[i][j] = prng() & 0xFFF; }
}
std::cout << "using ";
clock_t const start = clock();
#ifdef RADIX_SORT
std::cout << "radix sort\n";
radix_isort(items.begin(), items.end(), ranks.begin());
#elif defined(STDSORT)
std::cout << "std::sort\n";
std::sort(ranks.begin(), ranks.begin() + items.size(),
make_fun_obj(items.cbegin())
//make_key_comp(make_deref(std::bind1st(
// add<Items::const_iterator, ptrdiff_t>(),
// items.begin())))
);
#elif defined(STDSORT_PACKED)
std::cout << "std::sort with a packed data structure\n";
using Items_ranks = std::vector< std::pair<uint32p8_t,
decltype(ranks)::value_type> >;
Items_ranks items_ranks;
size_t i = 0;
for(auto iI = items.cbegin(); iI != items.cend(); ++iI, ++i)
{
items_ranks.emplace_back(*iI, i);
}
std::sort(begin(items_ranks), end(items_ranks),
[](Items_ranks::value_type const& lhs,
Items_ranks::value_type const& rhs)
{ return lhs.first < rhs.first; }
);
std::transform(items_ranks.cbegin(), items_ranks.cend(), begin(ranks),
[](Items_ranks::value_type const& e) { return e.second; }
);
#endif
auto const duration = (clock() - start) / (CLOCKS_PER_SEC / 1000);
bool const sorted = std::is_sorted(ranks.begin(), ranks.begin() + items.size(),
make_fun_obj(items.cbegin()));
std::cout << "duration: " << duration << " ms\n"
<< "result sorted: " << std::boolalpha << sorted << "\n";
return 0;
}
Full code:
#include <algorithm>
#include <iterator>
#include <vector>
#include <cstddef>
using std::size_t;
using std::ptrdiff_t;
#include <xmmintrin.h>
template<class It1, class It2>
void radix_ipass(
It1 begin, It1 const end,
It2 const a, size_t const i,
std::vector<std::vector<size_t> > &buckets)
{
size_t ncleared = 0;
for (It1 j = begin; j != end; ++j)
{
#if defined(_MM_TRANSPOSE4_PS)
constexpr auto N = 8;
if(end - j > N)
{ _mm_prefetch((char const *)(&a[j[N]][i]), _MM_HINT_T0); }
#else
#error SSE intrinsics not found
#endif
size_t const k = a[*j][i];
while (k >= ncleared && ncleared < buckets.size())
{ buckets[ncleared++].clear(); }
if (k >= buckets.size())
{
buckets.resize(k + 1);
ncleared = buckets.size();
}
buckets[k].push_back(size_t());
using std::swap; swap(buckets[k].back(), *j);
}
for (std::vector<std::vector<size_t> >::iterator
j = buckets.begin(); j != buckets.begin() + ncleared; j->clear(), ++j)
{
begin = std::swap_ranges(j->begin(), j->end(), begin);
}
}
template<class It, class It2>
void radix_isort(It const begin, It const end, It2 const items)
{
for (ptrdiff_t i = 0; i != end - begin; ++i) { items[i] = i; }
size_t smax = 0;
for (It i = begin; i != end; ++i)
{
size_t const n = i->size();
smax = n > smax ? n : smax;
}
std::vector<std::vector<size_t> > buckets;
for (size_t i = 0; i != smax; ++i)
{
radix_ipass(
items, items + (end - begin),
begin, smax - i - 1, buckets);
}
}
#include <functional>
template<class Key>
struct key_comp : public Key
{
explicit key_comp(Key const &key = Key()) : Key(key) { }
template<class T>
bool operator()(T const &a, T const &b) const
{ return this->Key::operator()(a) < this->Key::operator()(b); }
};
template<class Key>
key_comp<Key> make_key_comp(Key const &key) { return key_comp<Key>(key); }
template<class T1, class T2>
struct add : public std::binary_function<T1, T2, T1>
{ T1 operator()(T1 a, T2 const &b) const { return a += b; } };
template<class F>
struct deref : public F
{
deref(F const &f) : F(f) { }
typename std::iterator_traits<
typename F::result_type
>::value_type const
&operator()(typename F::argument_type const &a) const
{ return *this->F::operator()(a); }
};
template<class T> deref<T> make_deref(T const &t) { return deref<T>(t); }
size_t xorshf96(void) // random number generator
{
static size_t x = 123456789, y = 362436069, z = 521288629;
x ^= x << 16;
x ^= x >> 5;
x ^= x << 1;
size_t t = x;
x = y;
y = z;
z = t ^ x ^ y;
return z;
}
template<class It>
struct fun_obj
{
It beg;
bool operator()(ptrdiff_t lhs, ptrdiff_t rhs)
{
return beg[lhs] < beg[rhs];
}
};
template<class It>
fun_obj<It> make_fun_obj(It beg)
{
return fun_obj<It>{beg};
}
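#include <array>   // std::array is used by uint32p8_t below
#include <cstdint> // uint32_t, uint16_t, uint8_t
struct uint32p8_t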
{
uint32_t m32;
uint8_t m8;
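uint32p8_t(std::array<uint16_t, 3> const& a)
// cast before shifting: a[0] promotes to int, and a 12-bit value << 20 can overflow it
: m32( uint32_t(a[0])<<(32-3*4) | uint32_t(a[1])<<(32-2*3*4) | (a[2]&0xF00)>>8)
, m8( a[2]&0xFF )
{
}
operator std::array<size_t, 3>() const
{
// mask first, then shift: >> binds tighter than &, so the parentheses are required
return {{(m32&0xFFF00000) >> (32-3*4), (m32&0x000FFF00) >> (32-2*3*4),
(m32&0xF)<<8 | m8}};
}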
friend bool operator<(uint32p8_t const& lhs, uint32p8_t const& rhs)
{
if(lhs.m32 < rhs.m32) return true;
if(lhs.m32 > rhs.m32) return false;
return lhs.m8 < rhs.m8;
}
};
#include <stdio.h>
#include <time.h>
#include <array>
#include <iostream>
#include <iomanip>
#include <utility>
#include <algorithm>
#include <cstdlib>
#include <iomanip>
#include <random>
int main(int argc, char* argv[])
{
std::cout.sync_with_stdio(false);
constexpr auto items_count_default = 2<<22;
constexpr auto seed_default = 42;
uint32_t const seed = argc > 1 ? std::atoll(argv[1]) : seed_default;
std::cout << "seed: " << seed << "\n";
size_t const items_count = argc > 2 ? std::atoll(argv[2]) : items_count_default;
std::cout << "item count: " << items_count << "\n";
using Items_array_value_t =
#ifdef RADIX_SORT
size_t
#elif defined(STDSORT)
uint16_t
#elif defined(STDSORT_PACKED)
uint16_t
#endif
;
typedef std::vector<std::array<Items_array_value_t, 3> > Items;
Items items(items_count);
auto const ranks_count =
#ifdef RADIX_SORT
items.size() * 2
#elif defined(STDSORT)
items.size()
#elif defined(STDSORT_PACKED)
items.size()
#endif
;
//auto prng = xorshf96;
std::mt19937 gen(seed);
std::uniform_int_distribution<> dist;
auto prng = [&dist, &gen]{return dist(gen);};
std::vector<size_t> ranks(ranks_count);
for (size_t i = 0; i != items.size(); i++)
{
ranks[i] = i;
for (size_t j = 0; j != items[i].size(); j++)
{ items[i][j] = prng() & 0xFFF; }
}
std::cout << "using ";
clock_t const start = clock();
#ifdef RADIX_SORT
std::cout << "radix sort\n";
radix_isort(items.begin(), items.end(), ranks.begin());
#elif defined(STDSORT)
std::cout << "std::sort\n";
std::sort(ranks.begin(), ranks.begin() + items.size(),
make_fun_obj(items.cbegin())
//make_key_comp(make_deref(std::bind1st(
// add<Items::const_iterator, ptrdiff_t>(),
// items.begin())))
);
#elif defined(STDSORT_PACKED)
std::cout << "std::sort with a packed data structure\n";
using Items_ranks = std::vector< std::pair<uint32p8_t,
decltype(ranks)::value_type> >;
Items_ranks items_ranks;
size_t i = 0;
for(auto iI = items.cbegin(); iI != items.cend(); ++iI, ++i)
{
items_ranks.emplace_back(*iI, i);
}
std::sort(begin(items_ranks), end(items_ranks),
[](Items_ranks::value_type const& lhs,
Items_ranks::value_type const& rhs)
{ return lhs.first < rhs.first; }
);
std::transform(items_ranks.cbegin(), items_ranks.cend(), begin(ranks),
[](Items_ranks::value_type const& e) { return e.second; }
);
#endif
auto const duration = (clock() - start) / (CLOCKS_PER_SEC / 1000);
bool const sorted = std::is_sorted(ranks.begin(), ranks.begin() + items.size(),
make_fun_obj(items.cbegin()));
std::cout << "duration: " << duration << " ms\n"
<< "result sorted: " << std::boolalpha << sorted << "\n";
return 0;
}
I'm having a problem getting the syntax right, so could someone please help?
I have a timing function which takes a function and its arguments as parameters, but I'm not sure what the call should look like.
#include <iostream>
#include <iterator>
#include <random>
#include <vector>
#include<list>
#include<deque>
#include <algorithm>
#include <chrono>
#include <functional>
#include <sstream>
using namespace std;
using namespace std::chrono;
int global_SortType = 1;
template<class F, class A, typename T>
void times(F func, A arg, int n, T typeval) // call func(arg, n, typeval)
{
auto t1 = system_clock::now();
func(arg, n, typeval);
auto t2 = system_clock::now();
auto dms = duration_cast<milliseconds>(t2-t1);
cout << "f(x) took " << dms.count() << " milliseconds\n";
}
template<class T>
bool Greater(const T& v1, const T& v2)
{
return false;
}
bool Greater(const int& v1, const int& v2)
{
return v1 > v2;
}
bool Greater(const string& v1, const string& v2)
{
return strcmp(v1.c_str(), v2.c_str()) > 0;
}
template <class T>
struct GreaterThan: public std::binary_function<T, T, bool > {
bool operator () ( const T &ival, const T &newval ) const {
return Greater(ival, newval);
}
};
string random_gen(string& s)
{
string Result; // string which will contain the result
ostringstream convert; // stream used for the conversion
convert << rand();
return convert.str();
}
int random_gen(int& i){
default_random_engine re { std::random_device()() };
uniform_int_distribution<int> dist;
auto r= bind(dist,re);
int x =r();
return x;
}
template<class T>
void print(T& val)
{
}
void print(int& val)
{
cout << val << " ";
}
void print(string& val)
{
cout << val.c_str() << " ";
}
struct Record
{
int v;
string s;
Record(){}
Record(int iv, string ss): v(iv), s(ss)
{
}
};
Record random_gen(Record& r)
{
string stemp;
int i = 0;
return Record(random_gen(i), random_gen(stemp));
}
void print(Record& r)
{
cout<<"int="<<r.v<<" string=";
print(r.s);
}
bool Greater(const Record& r1, const Record& r2)
{
return global_SortType == 1 ? Greater(r1.v, r2.v) : Greater(r1.s, r2.s);
}
template<typename SequenceContainer, class T>
void build_cont(SequenceContainer& seq, int n, T valtype)
{
for(int i=0; i!=n; ++i) {
T gen = random_gen(valtype);
typename SequenceContainer::const_iterator it;
it=find_if(seq.begin(), seq.end(), std::bind2nd(GreaterThan<T>(), gen));
seq.insert(it, gen);
}
for(int i=n-1; i >=0; i--)
{
int gen = i;
if(i > 0)
gen = random_gen(i)%i;
typename SequenceContainer::const_iterator it=seq.begin();
for(int j = 0; j < gen; j++)
it++;
seq.erase(it);
}
}
int main()
{
int n=1000;
vector<int> v;
times(build_cont<std::vector<int>, int>, v, n, 0); // works
vector<string> sv;
string stemp = "";
times(build_cont<std::vector<string>, string>, sv, n, stemp); // works
global_SortType = 1;
vector<Record> rv;
Record rtemp(0, "sfds");
global_SortType = 2;
vector<Record> rsv;
Record rstemp(0, "sfds");
//This one doesn't work and I'm not sure of the right syntax
times(build_cont<std::vector<Record>,Record>, sv, n, stemp);
return 0;
}
I'm getting this error:
Non-const lvalue reference to type 'vector<Record>' cannot bind to a value of unrelated type 'vector<basic_string<char>, allocator<basic_string<char>>>'
and it points to line
func(arg, n, typeval);
Inside this function:
template<typename SequenceContainer, class T>
void build_cont(SequenceContainer& seq, int n, T valtype)
You are using const_iterators rather than iterators to perform insertions and removals. You should change the definition of that function as follows:
template<typename SequenceContainer, class T>
void build_cont(SequenceContainer& seq, int n, T valtype)
{
for(int i=0; i!=n; ++i) {
T gen = random_gen(valtype);
typename SequenceContainer::iterator it;
// ^^^^^^^^
it=find_if(seq.begin(), seq.end(), std::bind2nd(GreaterThan<T>(), gen));
seq.insert(it, gen);
}
for(int i=n-1; i >=0; i--)
{
int gen = i;
if(i > 0)
gen = random_gen(i)%i;
typename SequenceContainer::iterator it=seq.begin();
// ^^^^^^^^
for(int j = 0; j < gen; j++)
it++;
seq.erase(it);
}
}
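As an aside, std::bind2nd was deprecated in C++11 and removed in C++17, so the same two steps can also be written with a lambda. The following is only a sketch: insert_sorted and erase_at are hypothetical helper names, and it assumes the element type provides operator< instead of going through the Greater() overloads from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Inserts `value` in front of the first element that is greater than it,
// which keeps an already sorted sequence sorted.
template<class SequenceContainer, class T>
typename SequenceContainer::iterator
insert_sorted(SequenceContainer& seq, T const& value)
{
    auto it = std::find_if(seq.begin(), seq.end(),
                           [&value](T const& elem) { return value < elem; });
    return seq.insert(it, value);
}

// Erases the element at position `pos`. std::advance also works for
// containers without random access, such as std::list.
template<class SequenceContainer>
void erase_at(SequenceContainer& seq, std::size_t pos)
{
    assert(pos < seq.size());
    auto it = seq.begin();
    std::advance(it, static_cast<typename SequenceContainer::difference_type>(pos));
    seq.erase(it);
}
```

Both helpers deduce the iterator type with auto, so there is no need to spell out iterator or const_iterator at the call site.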
Also, you forgot to #include the <cstring> standard header, which declares the strcmp() function. You are using that function inside your Greater() overload for strings:
bool Greater(const string& v1, const string& v2)
{
return strcmp(v1.c_str(), v2.c_str()) > 0;
// ^^^^^^
// You need to #include <cstring> before calling this function
}
Moreover, you're invoking function times() with the wrong arguments (sv and stemp); pass rsv and rstemp instead:
//This one doesn't work and I'm not sure of the right syntax
times(build_cont<std::vector<Record>,Record>, rsv, n, rstemp);
// ^^^ ^^^^^^
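If C++11 is available, the timing wrapper can also be written so that it accepts any number of arguments. This is a minimal sketch, not a drop-in replacement: time_call is a hypothetical name, it returns the duration instead of printing it, and it uses std::chrono::steady_clock rather than system_clock because a steady clock is not affected by adjustments to the system time.

```cpp
#include <chrono>
#include <utility>

// Calls func(args...) once and returns the elapsed wall-clock time.
// Arguments are perfectly forwarded, so an lvalue container is passed
// on by reference rather than copied.
template<class F, class... Args>
std::chrono::milliseconds time_call(F&& func, Args&&... args)
{
    auto const t1 = std::chrono::steady_clock::now();
    std::forward<F>(func)(std::forward<Args>(args)...);
    auto const t2 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
}
```

With this sketch the call from the question would read time_call(build_cont<std::vector<Record>, Record>, rsv, n, rstemp). Note that the original times() takes its argument as A arg, i.e. by value, so build_cont fills a copy of the container there; with forwarding it operates on rsv itself.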