Finding subsequences of vector from a vector of structs - c++

I'm trying to find subsequences of vector from a larger vector.
Here's my full code.
#include <iostream>
#include <vector>
using namespace std;
struct Elem {
bool isString;
float f;
string s;
};
void getFounds(vector<Elem> &src, vector<Elem> &dst, vector<size_t> &founds)
{
//what should be in here?
}
int main(int argc, const char * argv[]) {
vector<Elem> elems1 = {{false, 1.f, ""}, {false, 2.f, ""}, {true, 0.f, "foo"},
{false, 1.f, ""}, {false, 2.f, ""}, {true, 0.f, "foo"}}; //the source vector
vector<Elem> elems2 = {{false, 2.f, ""}, {true, 0.f, "foo"}}; //the subsequence to find
vector<size_t> founds; //positions found
getFounds(elems1, elems2, founds);
for (size_t i=0; i<founds.size(); ++i)
cout << founds[i] << endl; // should print 1, 4
return 0;
}
I could do this with std::search if I use it for vector of single types but if I use it for vector of structs, it shows an error saying
"invalid operands to binary expression ('const Elem' and 'const
Elem')"
Is it really impossible to use std::search in this case?
What would be the good way to implement getFounds() in the code?
EDIT : I could make it work by creating an operator function and using std::search
bool operator==(Elem const& a, Elem const& b)
{
return a.isString == b.isString && a.f == b.f && a.s == b.s;
}
void getFounds(vector<Elem> &src, vector<Elem> &dst, vector<size_t> &founds)
{
for (size_t i=0; i<src.size(); ++i) {
auto it = search(src.begin()+i, src.end(), dst.begin(), dst.end());
if (it == src.end()) break; // no further matches
size_t pos = distance(src.begin(), it);
founds.push_back(pos);
i = pos; // the loop's ++i resumes just after this match start
}
}
However, I would appreciate if anyone can give me an advice to make the code simpler.

Is it really impossible to use std::search in this case?
No, it's not impossible: you just need to implement operator== for your struct, as you've done. You could also implement operator!=, for example:
struct Elem
{
bool isString;
float f;
std::string s;
bool operator==(const Elem& other) const {
return (this->isString == other.isString &&
this->f == other.f &&
this->s == other.s);
}
bool operator!=(const Elem& other) const {
return !(*this == other);
}
};
What would be the good way to implement getFounds() in the code? ... advice to make it simpler.
Simpler is relative, especially since you're already using the standard library to achieve what you want; however, you could also implement the getFounds function like so:
void getFounds(const std::vector<Elem>& src, const std::vector<Elem>& sub, std::vector<size_t>& founds)
{
size_t count = 0, tot = 0;
auto beg = sub.begin();
for (auto look = src.begin(); look != src.end();) {
if (*look != *beg) { ++look; ++count; continue; }
for (tot = 0; beg != sub.end(); ++beg, ++look, ++tot) {
if (look == src.end()) { break; }
if (*look != *beg) { break; }
}
if (tot == sub.size()) { founds.push_back(count); }
count += tot;
beg = sub.begin();
}
}
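I don't know if that's "simpler" for your needs, as it does essentially what the std::search algorithm would do (loop, check, and break if elements don't match); it's just another way to do it. Note that after a failed partial match it resumes past the elements already compared, so it can miss a match that begins inside that partial match (for example, looking for {a, a, b} in {a, a, a, b}), and it assumes sub is not empty.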
Hope that helps.
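For comparison, here is a minimal sketch of the std::search approach written as a loop that resumes one element past each match, so overlapping matches are reported too. The name findAll and the choice to return the positions (rather than fill an output parameter) are assumptions made for illustration, as is treating an empty pattern as matching nowhere.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Elem {
    bool isString;
    float f;
    std::string s;
};

bool operator==(const Elem& a, const Elem& b) {
    return a.isString == b.isString && a.f == b.f && a.s == b.s;
}

// Hypothetical helper: start index of every occurrence of sub in src.
std::vector<std::size_t> findAll(const std::vector<Elem>& src,
                                 const std::vector<Elem>& sub) {
    std::vector<std::size_t> founds;
    if (sub.empty()) return founds; // assumption: an empty pattern matches nowhere
    auto it = src.begin();
    while (true) {
        it = std::search(it, src.end(), sub.begin(), sub.end());
        if (it == src.end()) break; // no further occurrence
        founds.push_back(static_cast<std::size_t>(it - src.begin()));
        ++it; // resume one past the match start so overlaps are seen
    }
    return founds;
}
```

With the vectors from the question, findAll(elems1, elems2) should yield the positions 1 and 4.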

Overload == and iterate through both arrays looking for a match:
bool operator==(Elem const &el1, Elem const &el2)
{
return
el1.isString == el2.isString
&&
el1.f == el2.f
&&
el1.s == el2.s;
}
void getFounds(std::vector<Elem> const &src, std::vector<Elem> const &dst, std::vector<size_t> &founds)
{
for (size_t i = 0; i < src.size(); ++i)
for (size_t j = 0; j < dst.size(); ++j)
if (src[i] == dst[j])
founds.push_back(i);
}
This, however, records the index of every element of src that equals any element of dst. For example, your example will print 1 2 4 5. If you want to stop after the first find, or to match dst as a whole, you need to add some additional logic to it.

Related

std::map<Custom, int>::find() does not find anything

For some reason std::map does not find my objects.
Here is my simplified object :
class LangueISO3 {
public:
enum {
SIZE_ISO3 = 3
};
static constexpr char DEF_LANG[SIZE_ISO3] = {'-','-','-'};
constexpr LangueISO3():code() {
for(size_t i(0); i < SIZE_ISO3; i++){
code[i] = DEF_LANG[i];
}
};
LangueISO3(const std::string& s) {strncpy(code, s.c_str(), 3);};
bool operator==(const LangueISO3& lg) const { return strncmp(code, lg.code, 3) == 0;};
bool operator<(const LangueISO3& lg)const { return code < lg.code;};
private:
char code[SIZE_ISO3];
};
My test is :
{
CPPUNIT_ASSERT_EQUAL(LangueISO3("eng"), LangueISO3("eng"));
std::map<LangueISO3, int> lmap;
lmap.emplace(LangueISO3("fra"), 0);
lmap.emplace(LangueISO3("deu"), 1);
lmap.emplace(LangueISO3("eng"), 2);
auto it = lmap.find(LangueISO3("deu"));
CPPUNIT_ASSERT_EQUAL(1, it->second);
}
The first assertion passes, but the second fails: lmap.find() always returns lmap.end().
What did I do wrong ?
You can't compare character arrays with the < operator. When you write code < lg.code, both arrays decay to pointers, so the compiler compares the addresses of the arrays, not their contents.
Change the definition for operator< to use strncmp:
bool operator<(const LangueISO3& lg)const {
return strncmp(code, lg.code, SIZE_ISO3) < 0;
}
Also, the comparison for operator== should use the SIZE_ISO3 constant instead of hardcoding the size at 3.
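Putting both suggestions together, a trimmed-down version of the class might look like the sketch below. It drops the default constructor and DEF_LANG from the question for brevity, and lookupDeu is a hypothetical helper that only mirrors the failing test; treat it as an illustration rather than a drop-in replacement.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

class LangueISO3 {
public:
    static constexpr std::size_t SIZE_ISO3 = 3;

    explicit LangueISO3(const std::string& s) {
        std::strncpy(code, s.c_str(), SIZE_ISO3); // pads with '\0' if s is shorter
    }
    bool operator==(const LangueISO3& lg) const {
        return std::strncmp(code, lg.code, SIZE_ISO3) == 0;
    }
    bool operator<(const LangueISO3& lg) const {
        // Compares the characters, not the addresses of the arrays.
        return std::strncmp(code, lg.code, SIZE_ISO3) < 0;
    }

private:
    char code[SIZE_ISO3];
};

// Hypothetical helper mirroring the test in the question:
// returns the value stored under "deu", or -1 if the key is not found.
int lookupDeu() {
    std::map<LangueISO3, int> lmap;
    lmap.emplace(LangueISO3("fra"), 0);
    lmap.emplace(LangueISO3("deu"), 1);
    lmap.emplace(LangueISO3("eng"), 2);
    auto it = lmap.find(LangueISO3("deu"));
    return it == lmap.end() ? -1 : it->second;
}
```

std::map only uses operator< (through std::less), which is why a correct operator== alone was not enough for find() to work.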

vector iterators incompatible with const vector&

I'm writing a program for graphs. In this program I have a method which has to return the vertices inside the weak component originating at a given vertex. I am getting the error "vector iterators incompatible":
struct graph {
std::vector <std::vector<int>> gr;
};
std::vector<int> weak_component(const graph& g, int vertex) {
std::vector<int> ret;
stack<int> s;
s.push(vertex);
vector<int>::iterator j;
bool* used = new bool[g.gr.size()];
while (!s.empty()) {
int hodn=s.top();
s.pop();
used[hodn] = true;
for (j == g.gr[hodn].begin(); j != g.gr[hodn].end(); j++) {
if (!used[*j]) {
s.push(*j);
ret.push_back(*j);
}
}
}
return ret;
}
What's wrong with it?
Since you are taking g as a const graph&, this means g.gr is treated as const inside your function. begin on a const vector<T> returns a const_iterator. (also you used == instead of = for assignment)
for (std::vector<int>::const_iterator j = g.gr[hodn].begin(); ...)
But with C++11 or newer you may as well use auto to avoid this
for (auto j = g.gr[hodn].begin(); ...)
or a range-based for:
for (auto&& e : g.gr[hodn]) {
if (!used[e]) {
s.push(e);
ret.push_back(e);
}
}
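For reference, here is one way the whole function could look with these fixes applied. This is a sketch under a few assumptions that go beyond the question: gr[v] is taken to be the adjacency list of v, the used flags live in a std::vector<bool> (so they start out false and are freed automatically), vertices are marked when pushed so none is visited twice, and the start vertex itself is included in the result.

```cpp
#include <stack>
#include <vector>

struct graph {
    std::vector<std::vector<int>> gr; // assumed: gr[v] lists the neighbours of v
};

std::vector<int> weak_component(const graph& g, int vertex) {
    std::vector<int> ret;
    std::vector<bool> used(g.gr.size(), false); // all flags start as false
    std::stack<int> s;
    s.push(vertex);
    used[vertex] = true;
    while (!s.empty()) {
        int current = s.top();
        s.pop();
        ret.push_back(current);
        // Range-based for over a const vector: no iterator type to get wrong.
        for (int next : g.gr[current]) {
            if (!used[next]) {
                used[next] = true; // mark on push so no vertex is stacked twice
                s.push(next);
            }
        }
    }
    return ret;
}
```

The order of the returned vertices depends on the stack order, so callers that need a canonical result would have to sort it.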

Find max/min of vector of vectors

What is the most efficient and standard (C++11/14) way to find the max/min item of vector of vectors?
std::vector<std::vector<double>> some_values{{5,0,8},{3,1,9}};
the wanted max element is 9
the wanted min element is 0
Here's a multi-threaded solution that returns an iterator to the maximum (or throws if there is none) for a general type T (assuming operator< is defined for T). Note the most important optimisation is to perform the inner max operations on the rows (the inner vectors), since each inner vector is stored contiguously and C++ arrays are row-major.
#include <vector>
#include <algorithm>
#include <stdexcept> // std::runtime_error
#include <utility> // std::pair, std::make_pair
template <typename T>
typename std::vector<T>::const_iterator max_element(const std::vector<std::vector<T>>& values)
{
if (values.empty()) throw std::runtime_error {"values cannot be empty"};
std::vector<std::pair<typename std::vector<T>::const_iterator, bool>> maxes(values.size());
threaded_transform(values.cbegin(), values.cend(), maxes.begin(),
[] (const auto& v) {
return std::make_pair(std::max_element(v.cbegin(), v.cend()), v.empty());
});
auto it = std::remove_if(maxes.begin(), maxes.end(), [] (auto p) { return p.second; });
if (it == maxes.begin()) throw std::runtime_error {"values cannot be empty"};
return std::max_element(maxes.begin(), it,
[] (auto lhs, auto rhs) {
return *lhs.first < *rhs.first;
})->first;
}
threaded_transform is not part of the standard library (yet), but here's an implementation you could use.
#include <vector>
#include <thread>
#include <algorithm>
#include <cstddef>
template <typename InputIterator, typename OutputIterator, typename UnaryOperation>
OutputIterator threaded_transform(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op, unsigned num_threads)
{
std::size_t num_values_per_threads = std::distance(first, last) / num_threads;
std::vector<std::thread> threads;
threads.reserve(num_threads);
for (int i = 1; i <= num_threads; ++i) {
if (i == num_threads) {
threads.push_back(std::thread(std::transform<InputIterator,
OutputIterator, UnaryOperation>,
first, last, result, op));
} else {
threads.push_back(std::thread(std::transform<InputIterator,
OutputIterator, UnaryOperation>,
first, first + num_values_per_threads,
result, op));
}
first += num_values_per_threads;
result += num_values_per_threads;
}
for (auto& thread : threads) thread.join();
return result;
}
template <typename InputIterator, typename OutputIterator, typename UnaryOperation>
OutputIterator threaded_transform(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op)
{
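// hardware_concurrency() may return 0 when the value is unknown, so use at least one thread
return threaded_transform<InputIterator, OutputIterator, UnaryOperation>(first, last, result, op, std::max(1u, std::thread::hardware_concurrency()));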
}
If you used a boost::multi_array<double, 2> instead of a std::vector<std::vector<double>> it would be as simple as:
auto minmax = std::minmax_element(values.data(), values.data() + values.num_elements());
The plain for loop way:
T max_e = std::numeric_limits<T>::lowest(); // min() is the smallest positive value for floating-point T
for(const auto& v: vv) {
for(const auto& e: v) {
max_e = std::max(max_e, e);
}
}
You must at least look at every element, so, as Anony-mouse mentioned, complexity will be at least O(n^2) for an n-by-n input, that is, linear in the total number of elements.
#include <vector>
#include <limits>
#include <algorithm>
int main() {
std::vector<std::vector<double>> some_values;
double max = std::numeric_limits<double>::lowest();
for (const auto& v : some_values)
{
if (v.empty()) continue; // max_element returns end() for an empty range; don't dereference it
double current_max = *std::max_element(v.cbegin(), v.cend());
max = max < current_max ? current_max : max; // max = std::max(current_max, max);
}
}
You can do it pretty easily with Eric Niebler's range-v3 library (which obviously isn't standard yet, but hopefully will be in the not-too-distant future):
vector<vector<double>> some_values{{5,0,8},{3,1,9}};
auto joined = some_values | ranges::view::join;
auto p = std::minmax_element(joined.begin(), joined.end());
p.first is an iterator to the min element; p.second to the max.
(range-v3 does have an implementation of minmax_element, but unfortunately, it requires a ForwardRange and view::join only gives me an InputRange, so I can't use it.)
Any way of finding the maximum element of a 2-D array (or vector of vectors, in your case) has complexity O(n^2) for an n-by-n input, because all n*n elements must be compared. The easiest approach is to apply std::max_element to each inner vector and keep the largest result. I will not delve into the details; see the reference for std::max_element.
If you create a custom iterator to iterate over all the doubles of your vector of vectors, a simple std::minmax_element does the job.
iterator is something like:
class MyIterator : public std::iterator<std::bidirectional_iterator_tag, double>
{
public:
MyIterator() : container(nullptr), i(0), j(0) {}
MyIterator(const std::vector<std::vector<double>>& container,
std::size_t i,
std::size_t j) : container(&container), i(i), j(j)
{
// Skip empty container
if (i < container.size() && container[i].empty())
{
j = 0;
++(*this);
}
}
MyIterator(const MyIterator& rhs) = default;
MyIterator& operator = (const MyIterator& rhs) = default;
MyIterator& operator ++() {
if (++j >= (*container)[i].size()) {
do {++i;} while (i < (*container).size() && (*container)[i].empty());
j = 0;
}
return *this;
}
MyIterator operator ++(int) { auto it = *this; ++(*this); return it; }
MyIterator& operator --() {
if (j-- == 0) {
do { --i; } while (i != 0 && (*container)[i].empty());
j = (*container)[i].size() - 1; // last valid index of the previous non-empty row
}
return *this;
}
MyIterator operator --(int) { auto it = *this; --(*this); return it; }
double operator *() const { return (*container)[i][j]; }
bool operator == (const MyIterator& rhs) const {
return container == rhs.container && i == rhs.i && j == rhs.j;
}
bool operator != (const MyIterator& rhs) const { return !(*this == rhs); }
private:
const std::vector<std::vector<double>>* container;
std::size_t i;
std::size_t j;
};
And usage may be
// Helper functions for begin/end
MyIterator MyIteratorBegin(const std::vector<std::vector<double>>& container)
{
return MyIterator(container, 0, 0);
}
MyIterator MyIteratorEnd(const std::vector<std::vector<double>>& container)
{
return MyIterator(container, container.size(), 0);
}
int main() {
std::vector<std::vector<double>> values = {{5,0,8}, {}, {3,1,9}};
auto b = MyIteratorBegin(values);
auto e = MyIteratorEnd(values);
auto p = std::minmax_element(b, e);
if (p.first != e) {
std::cout << "min is " << *p.first << " and max is " << *p.second << std::endl;
}
}
Using the accumulate function you could write:
#include <algorithm> // std::max, std::max_element
#include <iostream>
#include <numeric>
#include <vector>
int main()
{
std::vector<std::vector<double>> m{ {5, 0, 8}, {3, 1, 9} };
double x = std::accumulate(m.begin(), m.end(), m[0][0],
[](double max, const std::vector<double> &v)
{
return std::max(max,
*std::max_element(v.begin(),
v.end()));
});
std::cout << x << '\n';
return 0;
}
but I'd prefer the good, old for-loop.
The example can be extended to find both the min and max values:
std::accumulate(m.begin(), m.end(),
std::make_pair(m[0][0], m[0][0]),
[](std::pair<double, double> minmax, const std::vector<double> &v)
{
auto tmp(std::minmax_element(v.begin(), v.end()));
return std::make_pair(
std::min(minmax.first, *tmp.first),
std::max(minmax.second, *tmp.second));
});
(in real code you have to handle the empty-vector case)
Unfortunately a vector of vector isn't stored contiguously in memory, so you haven't a single block containing all the values (this is one of the reasons why a vector of vector isn't a good model for a matrix).
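To illustrate the contiguous alternative, here is a minimal sketch of a matrix stored in one flat vector, where a single std::minmax_element call covers every value. FlatMatrix and min_max are hypothetical names introduced only for this example, and the sketch assumes the matrix holds at least one element.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical flat matrix: rows * cols doubles in one contiguous block.
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data; // row-major: element (r, c) is data[r * cols + c]

    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Returns {min, max}; assumes m.data is not empty.
std::pair<double, double> min_max(const FlatMatrix& m) {
    auto p = std::minmax_element(m.data.begin(), m.data.end());
    return {*p.first, *p.second};
}
```

For the values from the question, stored as FlatMatrix{2, 3, {5, 0, 8, 3, 1, 9}}, min_max should return 0 and 9.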
You can take advantage of a vector of vector if it contains a lot of elements.
Since each sub-vector is autonomous, you could use std::async to fill asynchronously a vector of futures containing the max value of each sub-vector.
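A rough sketch of that idea follows. The function name parallel_max is made up for this example, one task per sub-vector is only sensible when the sub-vectors are large, and empty sub-vectors are assumed to contribute the lowest representable double.

```cpp
#include <algorithm>
#include <future>
#include <limits>
#include <vector>

// Hypothetical helper: largest value over all rows, one async task per row.
double parallel_max(const std::vector<std::vector<double>>& rows) {
    const double lowest = std::numeric_limits<double>::lowest();
    std::vector<std::future<double>> futures;
    futures.reserve(rows.size());
    for (const auto& row : rows) {
        // Each task only reads its own row, so no synchronisation is needed.
        futures.push_back(std::async(std::launch::async, [&row, lowest] {
            return row.empty() ? lowest
                               : *std::max_element(row.begin(), row.end());
        }));
    }
    double best = lowest;
    for (auto& f : futures) {
        best = std::max(best, f.get()); // get() waits for the task to finish
    }
    return best;
}
```

Whether this beats a plain loop depends on the data size and the cost of starting threads, so it is worth measuring before adopting it.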
The simplest method would be to first have a function to determine the max/min elements of one vector, say a function called:
double getMaxInVector(const vector<double>& someVec){}
Passing by reference (for reading purposes only) in this case will be a lot more time and space efficient (you don't want your function copying an entire vector). Thus in your function to determine max/min element of a vector of vectors, you would have a nested loop, such as:
double best = std::numeric_limits<double>::lowest();
for (size_t x = 0; x < some_values.size(); x++) {
// some_values[x] is one of the vectors inside the vector of vectors
if (some_values[x].empty()) continue;
// update the running max by comparing it with the max of this inner vector
best = std::max(best, getMaxInVector(some_values[x]));
}
The solution above visits each element once, so its running time is linear in the total number of elements (O(n^2) for an n-by-n input), which is the best possible because every element has to be inspected. There are standard algorithms, such as std::max_element, that can find the max/min of a vector for you; writing your own can be more satisfying, but it will generally not be faster, because the algorithm is the same (for small functions that determine max/min). Note that templates are instantiated at compile time, so the standard functions carry no run-time cost for determining the type they are dealing with.
Let's say we have a vector named some_values, as shown below
7 4 2 0
4 8 10 8
3 6 7 6
3 9 19* 14
define a one-dimensional vector as shown below
vector<int> oneDimVector;
for(int i = 0; i < 4; i++){
for(int j = 0; j < 4; j++){
oneDimVector.push_back(some_values[i][j]);
}
}
Then find out a maximum/minimum element in that one-dimensional vector as shown below
vector<int>::iterator maxElement = max_element(oneDimVector.begin(),oneDimVector.end());
vector<int>::iterator minElement = min_element(oneDimVector.begin(),oneDimVector.end());
Now you get the max/min elements as below
cout << "Max element is " << *maxElement << endl;
cout << "Min element is " << *minElement << endl;
vector<vector<int>> vv = { vector<int>{10,12,43,58}, vector<int>{10,14,23,18}, vector<int>{28,47,12,90} };
vector<vector<int>> vv1 = { vector<int>{22,24,43,58}, vector<int>{56,17,23,18}, vector<int>{11,12,12,90} };
int matrix1_elem_sum=0;
int matrix2_elem_sum = 0;
for (size_t i = 0; i < vv.size(); i++)
{
matrix1_elem_sum += std::accumulate(vv[i].begin(), vv[i].end(), 0);
matrix2_elem_sum += std::accumulate(vv1[i].begin(), vv1[i].end(), 0);
}
cout << matrix1_elem_sum <<endl;
cout << matrix2_elem_sum << endl;
int summ = matrix1_elem_sum + matrix2_elem_sum;
cout << summ << endl;
or an optimized variant:
vector<vector<int>> vv = { vector<int>{10,12,43,58}, vector<int>{10,14,23,18}, vector<int>{28,47,12,90} };
vector<vector<int>> vv1 = { vector<int>{22,24,43,58}, vector<int>{56,17,23,18}, vector<int>{11,12,12,90} };
int summ=0;
int matrix2_elem_sum = 0;
for (size_t i = 0; i < vv.size(); i++)
{
summ += std::accumulate(vv[i].begin(), vv[i].end(), 0)+ std::accumulate(vv1[i].begin(), vv1[i].end(), 0);
}
cout << summ << endl;
}

Compare versions as strings

Comparing version numbers as strings is not so easy...
"1.0.0.9" > "1.0.0.10", but it's not correct.
The obvious way to do it properly is to parse these strings, convert to numbers and compare as numbers.
Is there another way to do it more "elegantly"? For example, boost::string_algo...
I don't see what could be more elegant than just parsing -- but please make use of standard library facilities already in place. Assuming you don't need error checking:
void Parse(int result[4], const std::string& input)
{
std::istringstream parser(input);
parser >> result[0];
for(int idx = 1; idx < 4; idx++)
{
parser.get(); //Skip period
parser >> result[idx];
}
}
bool LessThanVersion(const std::string& a,const std::string& b)
{
int parsedA[4], parsedB[4];
Parse(parsedA, a);
Parse(parsedB, b);
return std::lexicographical_compare(parsedA, parsedA + 4, parsedB, parsedB + 4);
}
Anything more complicated is going to be harder to maintain and isn't worth your time.
I would create a version class.
Then it is simple to define the comparison operator for the version class.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
class Version
{
// An internal utility structure just used to make the std::copy in the constructor easy to write.
struct VersionDigit
{
int value;
operator int() const {return value;}
};
friend std::istream& operator>>(std::istream& str, Version::VersionDigit& digit);
public:
Version(std::string const& versionStr)
{
// To Make processing easier in VersionDigit prepend a '.'
std::stringstream versionStream(std::string(".") + versionStr);
// Copy all parts of the version number into the version Info vector.
std::copy( std::istream_iterator<VersionDigit>(versionStream),
std::istream_iterator<VersionDigit>(),
std::back_inserter(versionInfo)
);
}
// Test if this version number is lower than rhs.
bool operator<(Version const& rhs) const
{
return std::lexicographical_compare(versionInfo.begin(), versionInfo.end(), rhs.versionInfo.begin(), rhs.versionInfo.end());
}
private:
std::vector<int> versionInfo;
};
// Read a single digit from the version.
std::istream& operator>>(std::istream& str, Version::VersionDigit& digit)
{
str.get();
str >> digit.value;
return str;
}
int main()
{
Version v1("10.0.0.9");
Version v2("10.0.0.10");
if (v1 < v2)
{
std::cout << "Version 1 Smaller\n";
}
else
{
std::cout << "Fail\n";
}
}
First the test code:
int main()
{
std::cout << ! ( Version("1.2") > Version("1.3") );
std::cout << ( Version("1.2") < Version("1.2.3") );
std::cout << ( Version("1.2") >= Version("1") );
std::cout << ! ( Version("1") <= Version("0.9") );
std::cout << ! ( Version("1.2.3") == Version("1.2.4") );
std::cout << ( Version("1.2.3") == Version("1.2.3") );
}
// output is 111111
Implementation:
#include <string>
#include <iostream>
// Method to compare two version strings
// v1 < v2 -> -1
// v1 == v2 -> 0
// v1 > v2 -> +1
int version_compare(std::string v1, std::string v2)
{
size_t i=0, j=0;
while( i < v1.length() || j < v2.length() )
{
int acc1=0, acc2=0;
while (i < v1.length() && v1[i] != '.') { acc1 = acc1 * 10 + (v1[i] - '0'); i++; }
while (j < v2.length() && v2[j] != '.') { acc2 = acc2 * 10 + (v2[j] - '0'); j++; }
if (acc1 < acc2) return -1;
if (acc1 > acc2) return +1;
++i;
++j;
}
return 0;
}
struct Version
{
std::string version_string;
Version( std::string v ) : version_string(v)
{ }
};
bool operator < (Version u, Version v) { return version_compare(u.version_string, v.version_string) == -1; }
bool operator > (Version u, Version v) { return version_compare(u.version_string, v.version_string) == +1; }
bool operator <= (Version u, Version v) { return version_compare(u.version_string, v.version_string) != +1; }
bool operator >= (Version u, Version v) { return version_compare(u.version_string, v.version_string) != -1; }
bool operator == (Version u, Version v) { return version_compare(u.version_string, v.version_string) == 0; }
https://coliru.stacked-crooked.com/a/7c74ad2cc4dca888
Here's a clean, compact C++20 solution, using the new spaceship operator <=>, and Boost's string split algorithm.
This constructs and holds a version string as a vector of numbers - useful for further processing, or can be disposed of as a temporary. This also handles version strings of different lengths, and accepts multiple separators.
The spaceship operator lets us provide results for <, > and == operators in a single function definition (although the equality has to be separately defined).
#include <algorithm>
#include <compare>
#include <string>
#include <string_view>
#include <vector>
#include <boost/algorithm/string.hpp>
struct version {
std::vector<size_t> data;
version() {};
version(std::string_view from_string) {
/// Construct from a string
std::vector<std::string> data_str;
boost::split(data_str, from_string, boost::is_any_of("._-"), boost::token_compress_on);
for(auto const &it : data_str) {
data.emplace_back(std::stol(it));
}
};
std::strong_ordering operator<=>(version const& rhs) const noexcept {
/// Three-way comparison operator
size_t const fields = std::min(data.size(), rhs.data.size());
// first compare all common fields
for(size_t i = 0; i != fields; ++i) {
if(data[i] == rhs.data[i]) continue;
else if(data[i] < rhs.data[i]) return std::strong_ordering::less;
else return std::strong_ordering::greater;
}
// if we're here, all common fields are equal - check for extra fields
if(data.size() == rhs.data.size()) return std::strong_ordering::equal; // no extra fields, so both versions equal
else if(data.size() > rhs.data.size()) return std::strong_ordering::greater; // lhs has more fields - we assume it to be greater
else return std::strong_ordering::less; // rhs has more fields - we assume it to be greater
}
bool operator==(version const& rhs) const noexcept {
return std::is_eq(*this <=> rhs);
}
};
Example usage:
std::cout << (version{"1.2.3.4"} < version{"1.2.3.5"}) << std::endl; // true
std::cout << (version{"1.2.3.4"} > version{"1.2.3.5"}) << std::endl; // false
std::cout << (version{"1.2.3.4"} == version{"1.2.3.5"}) << std::endl; // false
std::cout << (version{"1.2.3.4"} > version{"1.2.3"}) << std::endl; // true
std::cout << (version{"1.2.3.4"} < version{"1.2.3.4.5"}) << std::endl; // true
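int VersionParser(const char* version1, const char* version2) {
// Compares the dot-separated numeric fields one by one (strtol needs <cstdlib>).
// Returns 1 if the second version is fresher, -1 if the versions are equal, 0 if the first version is fresher.
while (*version1 || *version2) {
char* end1;
char* end2;
long a1 = strtol(version1, &end1, 10); // a missing field counts as 0
long b1 = strtol(version2, &end2, 10);
if (b1 > a1) return 1;
if (b1 < a1) return 0;
if (end1 == version1 && end2 == version2) break; // no digits left to parse
version1 = (*end1 == '.') ? end1 + 1 : end1;
version2 = (*end2 == '.') ? end2 + 1 : end2;
}
return -1;
}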

Bit Operation For Finding String Difference

The following code of mine tries to find the difference between two strings.
But it's horribly slow as it iterates over the length of the string:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
int hd(string s1, string s2) {
// hd stands for "Hamming Distance"
int dif = 0;
for (unsigned i = 0; i < s1.size(); i++ ) {
string b1 = s1.substr(i,1);
string b2 = s2.substr(i,1);
if (b1 != b2) {
dif++;
}
}
return dif;
}
int main() {
string string1 = "AAAAA";
string string2 = "ATATT";
string string3 = "AAAAA";
int theHD12 = hd(string1,string2);
cout << theHD12 << endl;
int theHD13 = hd(string1,string3);
cout << theHD13 << endl;
}
Is there a fast alternative to do that?
In Perl we can have the following approach:
sub hd {
return ($_[0] ^ $_[1]) =~ tr/\001-\255//;
}
which is much, much faster than iterating over the positions.
I wonder what's the equivalent of it in C++?
Try to replace the for loop by:
for (unsigned i = 0; i < s1.size(); i++ ) {
if (s1[i] != s2[i]) {
dif++;
}
}
This should be a lot faster because no new strings are created.
Fun with the STL:
#include <numeric> //inner_product
#include <functional> //plus, not_equal_to
#include <string>
#include <stdexcept>
unsigned int
hd(const std::string& s1, const std::string& s2)
{
// Hamming distance is only defined for strings of equal length.
if (s1.size() != s2.size()){
throw std::invalid_argument(
"Strings passed to hd() must have the same length"
);
}
// std::not2 was removed in C++20; std::not_equal_to expresses the same predicate.
return std::inner_product(
s1.begin(), s1.end(), s2.begin(),
0u, std::plus<unsigned int>(),
std::not_equal_to<std::string::value_type>()
);
}
Use iterators:
int GetHammingDistance(const std::string &a, const std::string &b)
{
// Hamming distance is not defined for strings of different lengths.
assert(a.length() == b.length()); // standard assert from <cassert>
std::string::const_iterator a_it = a.begin();
std::string::const_iterator b_it = b.begin();
std::string::const_iterator a_end = a.end();
std::string::const_iterator b_end = b.end();
int distance = 0;
while (a_it != a_end && b_it != b_end)
{
if (*a_it != *b_it) ++distance;
++a_it; ++b_it;
}
return distance;
}
Choice 1: Modify your original code to be as efficient as possible.
int hd(string const& s1, string const& s2)
{
// hd stands for "Hamming Distance"
int dif = 0;
for (std::string::size_type i = 0; i < s1.size(); i++ )
{
char b1 = s1[i];
char b2 = s2[i];
dif += (b1 != b2)?1:0;
}
return dif;
}
Choice 2: Use some of the STL algorithms to do the heavy lifting.
struct HammingFunc
{
inline int operator()(char s1,char s2)
{
return s1 == s2?0:1;
}
};
int hd(string const& s1, string const& s2)
{
int diff = std::inner_product(s1.begin(),s1.end(),
s2.begin(),
0,
std::plus<int>(),HammingFunc()
);
return diff;
}
Some obvious points that might make it faster:
Pass the strings as const references, not by value
Use the indexing operator [] to get characters, not a method call
Compile with optimization on
You use strings. As explained in "The hunt for the fastest Hamming Distance C implementation", if you can use char*, my experiments conclude that for GCC 4.7.2 on an Intel Xeon X5650 the fastest general-purpose Hamming distance function for small strings (char arrays) is:
// na = length of both strings
unsigned int HammingDistance(const char* a, unsigned int na, const char* b) {
unsigned int num_mismatches = 0;
while (na) {
if (*a != *b)
++num_mismatches;
--na;
++a;
++b;
}
return num_mismatches;
}
If your problem allows you to set an upper distance limit, so that you don't care about greater distances, and this limit is always less than the strings' length, the above example can be further optimized to:
// na = length of both strings, dist must always be < na
unsigned int HammingDistance(const char* const a, const unsigned int na, const char* const b, const unsigned int dist) {
unsigned int i = 0, num_mismatches = 0;
while(i <= dist)
{
if (a[i] != b[i])
++num_mismatches;
++i;
}
while(num_mismatches <= dist && i < na)
{
if (a[i] != b[i])
++num_mismatches;
++i;
}
return num_mismatches;
}
I am not sure if const does anything regarding speed, but i use it anyways...