Given a string containing a number of characters interspersed with dashes, for example string s = "A--TG-DF----GR--";, I wish to randomly select a block of dashes (could be of size 1, 2, …, max number of consecutive dashes in string), and copy them over to another part of the string chosen at random.
For example, A--TG-DF(---)-GR-- could become A--T(---)G-DF-GR--, while another iteration might turn A--TG-DF----GR(--) into A--TG-(--)DF----GR.
I'm generating random indices of the string through int i = rand() % (int) s.length();. Insertion happens through s.insert(rand() % (int) s.length(), substr);, where substr is the block of dashes.
My main problem lies in randomly finding a contiguous group of dashes. I thought of using s.find("-"), but that would only return the first instance of a dash, not a random position of a group of dashes.
I know this problem is likely steeped in XY problems, but I found it a nice challenge nonetheless, so I thought about implementing it with the Boost Interval Container Library.
The beauty of the library is that you can forget about a lot of details without sacrificing much performance.
I've taken the liberty of generalizing it, so that it is capable of moving multiple blocks of dashes (selected uniformly at random) simultaneously.
The solution runs Live On Coliru and generates 1,000,000 random transpositions of the given sample with randomly varied numbers of moved blocks (1..3) in about 2673 ms (1156 ms on my machine):
Generator gen(test_case);
std::string result;
std::map<std::string, size_t> histo;
for(int i = 0; i < 1000000; ++i) {
auto const mobility = gen.gen_relocations(1 + gen.random(3)); // move n blocks of dashes
result.clear();
gen.apply_relocations(mobility, std::back_inserter(result));
histo[result]++;
}
Note: the benchmark times include the time taken to build the histogram of unique results generated
Let's do a code walk-through here to explain things:
I tried to use "readable" types:
namespace icl = boost::icl;
using Position = size_t;
using Map = icl::interval_set<Position>;
using Region = Map::value_type;
E.g. the function that builds the Map of dashes is simply:
template <typename It> Map region_map(It f, It l) {
Map dashes;
for (Position pos = 0; f!=l; ++f, ++pos)
if ('-' == *f)
dashes += pos;
return dashes;
}
Note how I didn't particularly optimize this. I let the interval_set combine adjacent dashes. We might use hinted insertion, or a parser that adds consecutive dashes as a block. I opted for KISS here.
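For illustration, a block-at-a-time variant of region_map could look roughly like this (a sketch only, not the code that was benchmarked; region_map_blocks is a hypothetical name):
template <typename It> Map region_map_blocks(It f, It l) {
    Map dashes;
    Position pos = 0;
    while (f != l) {
        if ('-' == *f) {
            Position start = pos;
            while (f != l && '-' == *f) { ++f; ++pos; }
            // add the whole run as a single interval instead of element by element
            dashes += icl::interval<Position>::right_open(start, pos);
        } else {
            ++f;
            ++pos;
        }
    }
    return dashes;
}
Whether this beats the per-character version depends on the input; the resulting interval_set holds the same contents either way.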
Later on, we generate relocations, which map a Region onto a non-moving Position in the rest of the text.
using Relocs = boost::container::flat_multimap<Position, Region>;
By using the flat multimap, the caller gets the entries already sorted by ascending insertion point. Because we reserve() the flat multimap up front, we avoid the overhead of a node-based map implementation here.
We start by picking the dash-blocks to be moved:
Map pick_dashes(int n) {
Map map;
if (!_dashes.empty())
for (int i = 0; i < n; ++i)
map += *std::next(_dashes.begin(), _select(_rng));
return map;
}
The random distributions have been dimensioned at construction, e.g.:
_dashes(region_map(_input.begin(), _input.end())),
_rng(std::random_device {}()),
_select (0, _dashes.iterative_size() - 1),
_randpos(0, _input.size() - 1),
Next, we assign insertion-positions to each. The trick is to assign positions inside non-mobile (inert) regions of the source.
This includes other blocks of dashes that stay in their place.
There is also the degenerate case where everything is a block of dashes; we detect this in the constructor:
_is_degenerate(cardinality(_dashes) == _input.size())
So the code reads as follows:
Relocs gen_relocations(int n=1) {
Map const moving = pick_dashes(n);
Relocs relocs;
relocs.reserve(moving.iterative_size());
if (_is_degenerate)
{
// degenerate case (everything is a dash); no non-moving positions
// exist, just pick 0
for(auto& region : moving)
relocs.emplace(0, region);
} else {
auto inertia = [&] {
Position inert_point;
while (contains(moving, inert_point = _randpos(_rng)))
; // discard insertion points that are moving
return inert_point;
};
for(auto& region : moving)
relocs.emplace(inertia(), region);
}
return relocs;
}
Now all we need to do is apply the relocations.
The generic implementation of this is pretty straightforward. Again, it's not particularly optimized in order to keep it simple (KISS):
template <typename F>
void do_apply_relocations(Relocs const& mobility, F const& apply) const {
icl::split_interval_set<Position> steps {{0, _input.size()}};
for (auto& m : mobility) {
steps += m.first; // force a split on insertion point
steps -= m.second; // remove the source of the move
//std::cout << m.second << " moving to #" << m.first << ": " << steps << "\n";
}
auto next_mover = mobility.begin();
for(auto& step : steps) {
while (next_mover!=mobility.end() && contains(step, next_mover->first))
apply((next_mover++)->second, true);
apply(step, false);
}
}
Note The trick here is that we "abuse" the split_interval_set combining strategy to break the processing into sub-ranges that "stop" at the randomly generated insertion points: these artificial regions are the "steps" in our generation loop.
The apply function there is what we implement to get what we want. In our case we want a string like A--TG-DFGR(----)-- so we write an overload that appends to a container (e.g. std::string) using any output iterator:
template <typename Out>
Out apply_relocations(Relocs const& mobility, Out out) const {
if (_is_degenerate)
return std::copy(_input.begin(), _input.end(), out);
auto to_iter = [this](Position pos) { return _input.begin() + pos; };
do_apply_relocations(mobility, [&](Region const& r, bool relocated) {
if (relocated) *out++ = '(';
out = std::copy(
to_iter(first(r)),
to_iter(last(r)+1),
out
);
if (relocated) *out++ = ')';
});
return out;
}
Note The "complicated" part here are mapping the Position to input iterators (to_iter) and the code to optionally add () around moved blocks.
And with that, we have seen all the code.
Full Listing
#include <boost/container/flat_map.hpp>
#include <boost/icl/interval_set.hpp>
#include <boost/icl/split_interval_set.hpp>
#include <boost/icl/separate_interval_set.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/range/algorithm.hpp>
#include <iomanip>
#include <iostream>
#include <random>
#include <map>
#include <chrono>
namespace icl = boost::icl;
using Position = size_t;
using Map = icl::interval_set<Position>;
using Region = Map::value_type;
using Relocs = boost::container::flat_multimap<Position, Region>;
struct Generator {
Generator(std::string const& input)
: _input(input),
_dashes(region_map(_input.begin(), _input.end())),
_rng(std::random_device {}()),
_select (0, _dashes.iterative_size() - 1),
_randpos(0, _input.size() - 1),
_is_degenerate(cardinality(_dashes) == _input.size())
{
}
unsigned random(unsigned below) {
return _rng() % below; // q&d, only here to make the tests deterministic for a fixed seed
}
Map full() const {
return Map { { 0, _input.size() } };
}
Relocs gen_relocations(int n=1) {
Map const moving = pick_dashes(n);
Relocs relocs;
relocs.reserve(moving.iterative_size());
if (_is_degenerate)
{
// degenerate case (everything is a dash); no non-moving positions
// exist, just pick 0
for(auto& region : moving)
relocs.emplace(0, region);
} else {
auto inertia = [&] {
Position inert_point;
while (contains(moving, inert_point = _randpos(_rng)))
; // discard insertion points that are moving
return inert_point;
};
for(auto& region : moving)
relocs.emplace(inertia(), region);
}
return relocs;
}
template <typename Out>
Out apply_relocations(Relocs const& mobility, Out out) const {
if (_is_degenerate)
return std::copy(_input.begin(), _input.end(), out);
auto to_iter = [this](Position pos) { return _input.begin() + pos; };
do_apply_relocations(mobility, [&](Region const& r, bool relocated) {
if (relocated) *out++ = '(';
out = std::copy(
to_iter(first(r)),
to_iter(last(r)+1),
out
);
if (relocated) *out++ = ')';
});
return out;
}
template <typename F>
void do_apply_relocations(Relocs const& mobility, F const& apply) const {
icl::split_interval_set<Position> steps {{0, _input.size()}};
for (auto& m : mobility) {
steps += m.first; // force a split on insertion point
steps -= m.second; // remove the source of the move
//std::cout << m.second << " moving to #" << m.first << ": " << steps << "\n";
}
auto next_mover = mobility.begin();
for(auto& step : steps) {
while (next_mover!=mobility.end() && contains(step, next_mover->first))
apply((next_mover++)->second, true);
apply(step, false);
}
}
private:
std::string _input;
Map _dashes;
std::mt19937 _rng;
std::uniform_int_distribution<Position> _select;
std::uniform_int_distribution<Position> _randpos;
bool _is_degenerate;
Map pick_dashes(int n) {
Map map;
if (!_dashes.empty())
for (int i = 0; i < n; ++i)
map += *std::next(_dashes.begin(), _select(_rng));
return map;
}
template <typename It> Map region_map(It f, It l) {
Map dashes;
for (Position pos = 0; f!=l; ++f, ++pos)
if ('-' == *f)
dashes += pos;
return dashes;
}
};
int main() {
for (std::string test_case : {
"----",
"A--TG-DF----GR--",
"",
"ATGDFGR",
})
{
auto start = std::chrono::high_resolution_clock::now();
Generator gen(test_case);
std::string result;
std::map<std::string, size_t> histo;
for(int i = 0; i < 1000000; ++i) {
auto const mobility = gen.gen_relocations(1 + gen.random(3)); // move n blocks of dashes
result.clear();
gen.apply_relocations(mobility, std::back_inserter(result));
histo[result]++;
}
std::cout << histo.size() << " unique results for '" << test_case << "'"
<< " in " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now()-start).count() << "ms\n";
std::multimap<size_t, std::string, std::greater<size_t> > ranked;
for (auto& entry : histo)
ranked.emplace(entry.second, entry.first);
int topN = 10;
for (auto& rank : ranked)
{
std::cout << std::setw(8) << std::right << rank.first << ": " << rank.second << "\n";
if (0 == --topN)
break;
}
}
}
Prints e.g.
1 unique results for '----' in 186ms
1000000: ----
3430 unique results for 'A--TG-DF----GR--' in 1156ms
9251: A(----)--TG-DFGR--
9226: (----)A--TG-DFGR--
9191: A--T(----)G-DFGR--
9150: A--TG-DFGR-(----)-
9132: A--(----)TG-DFGR--
9128: A--TG(----)-DFGR--
9109: A--TG-D(----)FGR--
9098: A--TG-DFG(----)R--
9079: A--TG-DFGR(----)--
9047: A-(----)-TG-DFGR--
1 unique results for '' in 25ms
1000000:
1 unique results for 'ATGDFGR' in 77ms
1000000: ATGDFGR
You can pre-process the string to get a list of iterators that point to the beginnings of consecutive dashes in the string and then uniformly pick a random element from that list.
I will use the following standard library headers in this example (which is complete and working if you concatenate the following code blocks):
#include <cstddef>
#include <iostream>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>
First, we define a function that finds us said list of iterators. To do so, we make use of std::string::find_first_of and std::string::find_first_not_of to find the index of the first character of the next sequence and of the first character after it. Both functions work with indices rather than iterators, so we have to add the indices to cbegin(). The function works with any character, not just dashes.
std::vector<std::string::const_iterator>
find_consecutive_sequences(const std::string& text, const char c)
{
std::vector<std::string::const_iterator> positions {};
std::size_t idx = 0UL;
while (idx != std::string::npos && idx < text.length())
{
const auto first = text.find_first_of(c, idx);
if (first == std::string::npos)
break;
positions.push_back(text.cbegin() + first);
idx = text.find_first_not_of(c, first);
}
return positions;
}
Next, we define a function that uses the result of the above function to return an iterator to the beginning of a randomly selected sequence of dashes.
We pass in the random engine as a parameter so it can be seeded once and used over and over again. The <random> standard library introduced in C++11 is so good that it should be preferred over the legacy rand function whenever possible.
If given an empty vector of positions, we have to fail because there is no sequence we could possibly select.
std::string::const_iterator
get_random_consecutive_sequence(const std::vector<std::string::const_iterator>& positions,
std::default_random_engine& prng)
{
if (positions.empty())
throw std::invalid_argument {"string does not contain any sequence"};
std::uniform_int_distribution<std::size_t> rnddist {0UL, positions.size() - 1UL};
const auto idx = rnddist(prng);
return positions.at(idx);
}
Finally, I define this little helper function to mark the selected sequence. Your code would do the copy / move / shift here.
std::string
mark_sequence(const std::string& text,
const std::string::const_iterator start)
{
const auto c = *start;
const std::size_t first = start - text.cbegin();
std::size_t last = text.find_first_not_of(c, first);
if (last == std::string::npos)
last = text.length();
std::string marked {};
marked.reserve(text.length() + 2UL);
marked += text.substr(0UL, first);
marked += '(';
marked += text.substr(first, last - first);
marked += ')';
marked += text.substr(last, text.length() - last);
return marked;
}
It can be used like this.
int
main()
{
const std::string example {"--A--B-CD----E-F---"};
std::random_device rnddev {};
std::default_random_engine rndengine {rnddev()};
const auto positions = find_consecutive_sequences(example, '-');
for (int i = 0; i < 10; ++i)
{
const auto pos = get_random_consecutive_sequence(positions, rndengine);
std::cout << mark_sequence(example, pos) << std::endl;
}
}
Possible output:
--A--B-CD(----)E-F---
--A--B(-)CD----E-F---
--A(--)B-CD----E-F---
--A(--)B-CD----E-F---
--A--B-CD(----)E-F---
--A--B-CD----E-F(---)
--A--B-CD----E-F(---)
(--)A--B-CD----E-F---
--A--B(-)CD----E-F---
(--)A--B-CD----E-F---
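For completeness, here is a sketch of the move step the question actually asks for (my addition, not part of the answer above): it cuts the selected run out of the string and re-inserts it at a uniformly random position. It assumes the headers and helper functions defined earlier in this answer.
std::string
move_sequence(const std::string& text,
              const std::string::const_iterator start,
              std::default_random_engine& prng)
{
    const auto c = *start;
    const std::size_t first = start - text.cbegin();
    std::size_t last = text.find_first_not_of(c, first);
    if (last == std::string::npos)
        last = text.length();
    // cut the block out of the string
    const std::string block = text.substr(first, last - first);
    std::string rest = text.substr(0UL, first) + text.substr(last);
    // any position from 0 to rest.length() (inclusive) is a valid insertion point
    std::uniform_int_distribution<std::size_t> rnddist {0UL, rest.length()};
    rest.insert(rnddist(prng), block);
    return rest;
}
In the loop of main above, std::cout << move_sequence(example, pos, rndengine) << std::endl; would then print the string after an actual move rather than just marking the selected block.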
string::find() has an optional second parameter: the position to start the search from. So something like s.find("-", rand() % L) may do the trick for you, where L is (position of the last dash + 1).
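A rough sketch of that suggestion (my illustration, not the answerer's code; it assumes the string contains at least one dash, and note that blocks are not picked with equal probability this way):
#include <cstdlib>
#include <iostream>
#include <string>

int main() {
    const std::string s = "A--TG-DF----GR--";
    const std::size_t L = s.find_last_of('-') + 1;        // position of the last dash + 1
    const std::size_t hit = s.find('-', std::rand() % L); // first dash at or after a random offset
    const std::size_t begin = s.find_last_not_of('-', hit) + 1; // back up to the start of the run
    std::size_t end = s.find_first_not_of('-', hit);            // one past the end of the run
    if (end == std::string::npos)
        end = s.length();
    std::cout << "picked block [" << begin << ", " << end << "): "
              << s.substr(begin, end - begin) << "\n";
}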
As I understand the problem, all dash blocks should have the same probability of being selected. Therefore we must first find the positions where these blocks start and then pick one of those positions at random.
If I'm allowed to use Smalltalk for pseudo code, then I would first find the indexes where every dash block starts:
dashPositionsOn: aString
| indexes i n |
indexes := OrderedCollection new.
i := 1.
n := aString size.
[i <= n] whileTrue: [| char |
char := aString at: i.
char = $-
ifTrue: [
indexes add: i.
[
i := i + 1.
i <= n and: [
char := aString at: i.
char = $-]] whileTrue]
ifFalse: [i := i + 1]].
^indexes
Now we can pick one of these indexes at random: indexes atRandom.
Please note that there are (much) better ways to implement this algorithm in Smalltalk (as well as in other languages).
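For reference, the same index-collection idea in C++ might look like this (a sketch of mine, not part of the Smalltalk answer; it assumes the string contains at least one dash):
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main() {
    const std::string s = "A--TG-DF----GR--";
    // collect the index of the first dash of every dash block
    std::vector<std::size_t> indexes;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == '-' && (i == 0 || s[i - 1] != '-'))
            indexes.push_back(i);
    // pick one of these indexes at random
    std::mt19937 rng {std::random_device{}()};
    std::uniform_int_distribution<std::size_t> pick {0, indexes.size() - 1};
    std::cout << "selected block starts at index " << indexes[pick(rng)] << "\n";
}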
Related
There is a collection (vector, list, etc) of directories:
example 1:
/a/ab/bc/de
/a/ab/cc/fw
/a/ab/dd
/a/ab/ee/fg
Find /a/ab
example 2:
/a/ab/bc/de
/a/b/cc/fw
/a/ab/dd
/a/ab/ee/fg
Find /a
What is the best way to find the common path to all the directories?
P.S. The end goal is to copy only the relative paths; for example 1, the /a/ab needs to be removed so that all that is left is:
bc/de
cc/fw
dd
ee/fg
Sort the vector of paths first.
std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg"};
std::sort(paths.begin(), paths.end());
Compare the first and last paths of the sorted vector (called shortest and longest below) to find the position where they first mismatch.
const auto& shortest = paths.front();
const auto& longest = paths.back();
auto mis = std::mismatch(shortest.cbegin(), shortest.cend(), longest.cbegin(), longest.cend());
Now make a copy of the common prefix.
auto common = std::string(shortest.cbegin(), mis.first);
Here's the full source code tested in vs2022.
It printed "/a/ab/" and "/a/" for your example.
I beleive you can find how to remove the trailing '/'.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
int main() {
try {
std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd",
"/a/ab/ee/fg"};
std::sort(paths.begin(), paths.end());
const auto& shortest = paths.front();
const auto& longest = paths.back();
auto mis = std::mismatch(shortest.cbegin(), shortest.cend(),
longest.cbegin(), longest.cend());
auto common = std::string(shortest.cbegin(), mis.first);
std::cout << common << std::endl;
} catch (const std::exception& e) {
std::cerr << e.what() << std::endl;
return -1;
}
return 0;
}
This is a first-order approach (too bad I couldn't find any useful functions in <filesystem>):
#include <string>
#include <vector>
#include <iostream>
std::string get_common_path(const std::string& lhs, const std::string& rhs)
{
auto lhs_it = lhs.begin();
auto rhs_it = rhs.begin();
// as long as characters match move to right (but not past end of either string)
while ((lhs_it != lhs.end()) && (rhs_it != rhs.end()) && (*lhs_it == *rhs_it))
{
++lhs_it;
++rhs_it;
}
return std::string{ lhs.begin(),lhs_it };
}
std::string common_path(const std::vector<std::string>& values)
{
if (values.empty()) return std::string{};
if (values.size() == 1) return values.front();
// get first string, that is now most common path
auto it = values.begin();
std::string retval = *it;
++it;
// loop over all values
while ((it != values.end()) && (!retval.empty()))
{
// the overlap is the existing overlap combined with the next string
// in the vector.
retval = get_common_path(retval, *it);
++it;
}
return retval;
}
int main()
{
std::vector<std::string> paths
{
"/a/ab/bc/de",
"/a/ab/cc/fw",
"/a/ab/dd",
"/a/ab/ee/fg"
};
auto result = common_path(paths);
std::cout << result;
return 0;
}
Define "best" and the size of the data set.
It is a tree, so you could insert the paths into a tree and then traverse until you find a node with more than one child; that node marks the common path for all nodes.
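A rough sketch of that idea (names and structure are my own, and the simple walk assumes no path is a strict prefix of another): insert the path components into a tree of nested maps and descend while each node has exactly one child.
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Node { std::map<std::string, Node> children; };

int main() {
    const std::vector<std::string> paths = {
        "/a/ab/bc/de", "/a/ab/cc/fw", "/a/ab/dd", "/a/ab/ee/fg"};
    Node root;
    for (const auto& path : paths) {          // build the tree of path components
        Node* node = &root;
        std::istringstream parts(path);
        std::string part;
        while (std::getline(parts, part, '/'))
            if (!part.empty())
                node = &node->children[part];
    }
    std::string common;
    const Node* node = &root;                 // descend while there is exactly one child
    while (node->children.size() == 1) {
        common += "/" + node->children.begin()->first;
        node = &node->children.begin()->second;
    }
    std::cout << common << "\n";              // prints /a/ab
}
The relative paths asked for in the P.S. would then be path.substr(common.size() + 1) for each path.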
There is a very easy solution.
You can analyze the data and make the following observation.
If you see the std::vector<std::string> as a 2-dimensional array of characters, you can compare the characters column-wise.
/a/ab/bc/de
/a/b/cc/fw
/a/ab/dd
/a/ab/ee/fg
||||
||||
|||+--- Not all characters are the same
||+---- All characters in this column are the same
|+----- All characters in this column are the same
+------ All characters in this column are the same
Starting with column 0, you can check if all characters in this column are the same, then move on to the next column, and so on.
As soon as we find a difference in a column, we know that we have found the end of the common prefix.
And then we can output the result of the common prefix and also the remaining suffixes.
All this with only a few lines of conventional code.
Example for one potential solution:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
std::vector<std::string> paths = { "/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg" };
int main() {
// Sanity check
if (not paths.empty()) {
// Of course we will only compare to the smallest string size
size_t minSize = std::min_element(paths.begin(), paths.end(), [](const std::string& s1, const std::string& s2) {return s1.size() < s2.size(); })->size();
size_t cont{ 1 }, col{ 0 };
// Double nested loop to find resulting column
for (size_t row{ 1 }; cont and col < minSize; col += cont, row = 1)
for (auto c{ paths.front()[col] }; cont and row < paths.size(); row += cont)
cont = ((c == paths[row][col]) * 1);
// Show result as debug output
std::cout << "Common prefix: " << paths.front().substr(0, col) << "\n\n";
for (std::string& s : paths) std::cout << "Resulting path: " << s.substr(col) << '\n';
}
}
See EDIT below.
I am coding a word ladder algorithm. The user enters a start word, an end word and a hash of all the words. The algorithm returns all the shortest paths (multiple if they exist) from the start word to the end word. E.g. start_word = 'cold', end_word = 'warm',
output = [[cold -> cord -> card -> ward -> warm], [/if another path exists/]].
Each consecutive word differs from the previous one by one character. I am using BFS to solve this problem. My strategy was to return all the paths, and then select the shortest ones from the returned list. This is my code to return all the paths:
auto word_ladder::generate(std::string const& from, std::string const& to, absl::flat_hash_set<std::string> const& lexicon) -> std::vector<std::vector<std::string>> {
absl::flat_hash_set<std::string> visited = {};
std::queue<std::vector<std::string>> search_queue = {};
std::vector<std::vector<std::string>> paths = {};
search_queue.push(std::vector<std::string>{from});
while (!search_queue.empty()) {
auto word = search_queue.front();
search_queue.pop();
auto neighbours = generate_neighbours(word.back(), lexicon);
for (auto &i: neighbours) {
auto new_word = word;
new_word.push_back(i);
if (i == to) {
paths.push_back(new_word);
continue;
}
if (visited.find(i) != visited.end()) {
continue;
}
search_queue.push(new_word);
visited.insert(i);
}
}
return paths;
}
It does return multiple paths; however, the problem is that it doesn't return all the paths. One of the paths it returns is:
1) awake, aware, sware, share, shire, shirr, shier, sheer, sheep, sleep
However, it doesn't return the path 2) "awake","aware","sware","share","sharn","shawn","shewn","sheen","sheep","sleep".
I am pretty sure the reason is that, the way I have coded it, it marks the word "share" as visited the first time it encounters it (in path 1). Hence it doesn't go through the second path (path 2).
To solve this, I changed my for loop a bit:
for (auto &i: neighbours) {
auto new_word = word;
new_word.push_back(i);
if (i == to) {
paths.push_back(new_word);
continue;
}
for (auto &j: word) {
if (j == i) {
continue;
}
}
search_queue.push(new_word);
}
The idea was to check if the word has been visited in the path that you are keeping track of in the queue, and not globally. However, this solution for some reason gets stuck in a loop somewhere and doesn't terminate (I am assuming due to the large dataset?).
Is there something wrong with my code in the second version, or does it just take too long because of the large dataset? How can I better achieve the solution?
EDIT!!!
Instead of finding all the paths, I am now finding the length of the shortest path and then performing BFS to that depth to get all the paths at that depth.
auto word_ladder::generate(std::string const& from, std::string const& to, absl::flat_hash_set<std::string> const& lexicon) -> std::vector<std::vector<std::string>> {
absl::flat_hash_set<std::string> visited = {};
visited.insert(from);
std::queue<std::vector<std::string>> search_queue = {};
std::vector<std::vector<std::string>> paths = {};
search_queue.push(std::vector<std::string>{from});
auto length = find_shortest_path_length(from, to, lexicon);
std::cout << "length is: " << length << "\n";
// auto level = 0;
std::unordered_map<std::string, int> level_track = {};
level_track[from] = 0;
while (!search_queue.empty() ) {
auto word = search_queue.front();
search_queue.pop();
// **
if (level_track[word.back()] <= length) {
auto neighbours = generate_neighbours(word.back(), lexicon);
const auto &parent = word.back();
for (auto &i: neighbours) {
auto new_word = word;
new_word.push_back(i);
if (i == to) {
paths.push_back(new_word);
std::cout << "The level at the path was " << level_track[parent] << "\n";
continue;
}
if (path_crossed(word, i)) {
continue;
}
search_queue.push(new_word);
level_track[i] = level_track[parent] + 1;
}
}
}
return paths;
}
The solution now terminates, so the problem earlier was definitely the large number of searches. However, my algorithm is still not giving me the correct answer, as the way I keep track of the depth of my nodes (words) is somehow not correct.
You're trying to find an efficient solution but most probably it doesn't exist. See this answer. Enumerating all shortest paths can be very costly.
I have been given a vector:
vector<string> inputArray = { "aba","aa","ad","vcd","aba" };
and I want to return this vector containing only the strings with the longest length; in this case I want to return only {"aba","vcd","aba"}. So for now I want to erase the elements whose length is not equal to the highest:
vector<string> allLongestStrings(vector<string> inputArray) {
int length = inputArray.size();
int longstring = inputArray[0].length();
int count = 0;
vector<string> result;
for (int i = 0; i < length; i++)
{
if (longstring < inputArray[i].length())
{
longstring = inputArray[i].length();
}
count++;
}
for (int i = 0; i<count;i++)
{
if (inputArray[i].length() != longstring)
{
inputArray[i].erase(inputArray.begin() + i);
count--;
i--;
}
}
return inputArray;
}
but I get this error: no instance of overloaded function "std::basic_string<_Elem,_Traits,_Alloc>::erase[with_Elem=char,_Traits=std::char_traits<char>,_Alloc=std::allocator<char>]" matches the argument list on the line inputArray[i].erase(inputArray.begin()+i);
What's wrong?
There are other problems, but this specific compiler message is telling you that's not the right way to remove specific character(s) from a string.
However, reading the question in the OP, we see that you wanted to remove a string from a vector. To fix that one specific error, simply change
inputArray[i].erase( /*character position(s) in the string*/ )
to
inputArray.erase( /*some position in the array*/ )
Or you could fix it so it uses an iterator into the string denoted by inputArray[i] to actually delete characters from that string, which of course isn't what you said you wanted to do. The point is, the error message appears because you're using the wrong iterator type: you think you're working with the vector, but you actually told it to work with a string that you got out of the vector.
And then you will compile and have other issues which are well covered in comments already.
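To make the specific fix concrete, here is a minimal sketch of the corrected second loop (my illustration; it keeps the OP's index bookkeeping, and as noted the rest of the original code still has other issues):
for (int i = 0; i < count; i++)
{
    if (inputArray[i].length() != longstring)
    {
        // erase the string from the vector, not characters from the string
        inputArray.erase(inputArray.begin() + i);
        count--;
        i--;
    }
}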
The issue with inputArray[i].erase(inputArray.begin() + i); can be fixed as shown in Kenny Ostrom's answer.
I'd like to point out that the OP could make use of the erase-remove idiom or even create a new vector with only the longest strings instead (the posted code is already copying the source vector).
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
template <typename InputIt>
auto only_the_longest_of(InputIt first, InputIt last)
{
using value_type = typename std::iterator_traits<InputIt>::value_type;
std::vector<value_type> result;
// find the longest size
auto longest = std::max_element(first, last,
[](value_type const &a, value_type const &b) {
return a.size() < b.size();
});
if ( longest == last )
return result;
// extract only the longest ones, instead of erasing
std::copy_if( first, last, std::back_inserter(result)
, [max_size = longest->size()] (value_type const& v) {
return v.size() >= max_size;
});
return result;
}
template <typename T>
auto erase_the_shortest_from(std::vector<T> &input)
{
// find the longest size
auto longest = std::max_element(input.cbegin(), input.cend(),
[](T const &a, T const &b) {
return a.size() < b.size();
});
if ( longest == input.cend() || longest->size() == 0 )
return input.end();
// implement erase-remove idiom
return input.erase(std::remove_if(
input.begin(), input.end(), [max_size = longest->size()] (T const &v) {
return v.size() < max_size;
}), input.end());
}
int main()
{
std::vector<std::string> test = {
"aba", "aa", "ad", "vcd", "aba"
};
// The original vector remain unchanged
auto result = only_the_longest_of(test.cbegin(), test.cend());
for (auto const& str : result)
std::cout << str << '\n';
std::cout << '\n';
// This will change the vector
erase_the_shortest_from(test);
for (auto const& str : test)
std::cout << str << '\n';
}
I have a set of integers {1,2}. I want to produce "Transform#1, Transform#2", where each element is transformed and then the result is accumulated with a delimiter.
What would be the easiest way to accomplish this? Do we have "folds" and "maps" in C++?
We don't use Boost.
You can use std::transform and std::accumulate
int main()
{
std::vector<int> v1 {1,2,3};
std::vector<std::string> v2;
std::transform(begin(v1), end(v1), std::back_inserter(v2), [](auto const& i) {
return std::string("Transform#") + std::to_string(i);
});
std::string s = std::accumulate(std::next(begin(v2)), end(v2), v2.at(0), [](auto const& a, auto const& b) {
return a + ", " + b;
});
std::cout << s;
}
prints Transform#1, Transform#2, Transform#3
You may want to use Range Adaptors. Boost already has them and they are coming to the standard with C++20.
Take a look at the boost::adaptors::transformed example here.
Also, check out the reference to get a better picture of what operations are supported by adaptors.
In the end, you can achieve much cleaner code and the performance difference is negligible (unlike in some other languages, where using this style of programming incurs heavy performance costs).
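For instance, combining boost::adaptors::transformed with boost::algorithm::join gives a one-liner along these lines (a sketch from memory; double-check the headers and lambda support against your Boost version):
#include <boost/algorithm/string/join.hpp>
#include <boost/range/adaptor/transformed.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::vector<int> v {1, 2};
    using boost::adaptors::transformed;
    // lazily transform each element, then join the resulting range of strings
    std::cout << boost::algorithm::join(
                     v | transformed([](int i) { return "Transform#" + std::to_string(i); }),
                     ", ")
              << "\n"; // prints Transform#1, Transform#2
}
The adaptor evaluates lazily, so no intermediate vector of strings is materialized.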
If you can stand a trailing separator, the following function can transform any iterable range of data { X, ..., Z } to the string "<tag>X<sep>...<sep><tag>Z<sep>".
Code
template <class InputIt>
std::string f(InputIt begin, InputIt end, std::string_view separator = ", ", std::string_view tag = "Transform#")
{
std::stringstream output;
std::transform(begin, end,
std::ostream_iterator<std::string>(output, separator.data()),
[tag](auto const& element){ return std::string{tag} + std::to_string(element); }
);
return output.str();
}
It works by transforming each element from the range into a stream iterator.
Usage
int main()
{
std::set<int> const data{1, 2, 3}; // works with vector, string, list, C-arrays, etc.
std::cout << f(begin(data), end(data)) << '\n';
// prints Transform#1, Transform#2, Transform#3,
}
Live demo
You can perform a fold using simply std::accumulate
#include <set>
#include <string>
#include <iostream>
#include <numeric>
int main()
{
auto transformation = [](int number) { return "Transform#" + std::to_string(number); };
auto transform_and_fold = [&transformation](std::string init, int number) { return std::move(init) + ", " + transformation(number); };
std::set<int> numbers{1, 2};
std::cout << std::accumulate(std::next(numbers.begin()), numbers.end(), transformation(*numbers.begin()), transform_and_fold);
}
Outputs
Transform#1, Transform#2
Assuming that I understand the problem correctly, the following straightforward implementation also looks very simple and easy to follow.
This function works in C++11 and over:
DEMO with 5 test cases
std::string concatenate(
const std::vector<int>& indecies,
const std::string& delimiter = ", ",
const std::string& tag = "Transform#")
{
if(indecies.empty()){
return "";
}
std::string s(tag + std::to_string(indecies[0]));
for(auto it = indecies.begin()+1; it != indecies.cend(); ++it){
s += (delimiter + tag + std::to_string(*it));
}
return s;
}
(BTW, as for this function concatenate: if indecies is empty, the return value is also an empty string, rather than an exception (AndreasDM's answer) or UB (Everlight's answer).
And if indecies has only a single element, for instance indecies={1}, then the result is "Transform#1", not "Transform#1, " (YSC's answer) or ", Transform#1" (sakra's answer).
These behaviors differ from the other answers, and this function would be simpler if this handling were removed.)
Although the performance may not be a focal point, the above function can be slightly optimized by pre-reserving, with std::basic_string::reserve, the minimum capacity needed to hold the resulting string, as follows.
Here the +1 in *.size()+1 accounts for the minimum length of a number (one character).
I also moved the delimiter+tag concatenation out of the for-loop.
This still looks simple:
DEMO with 5 test cases
std::string concatenate_fast(
const std::vector<int>& indecies,
std::string delimiter = ", ",
const std::string& tag = "Transform#")
{
if(indecies.empty()){
return "";
}
std::string s(tag + std::to_string(indecies[0]));
delimiter += tag;
s.reserve((tag.size()+1) + (indecies.size()-1)*(delimiter.size()+1));
for(auto it = indecies.begin()+1; it != indecies.cend(); ++it){
s += (delimiter + std::to_string(*it));
}
return s;
}
I have also tested the performance of these functions and some proposed answers as follows.
These tests are done with Quick C++ Benchmark using gcc 8.2, C++17 and -O3 optimization.
Since std::transform_reduce is still not available in Quick C++ Benchmark, I haven’t tested it.
The above concatenate_fast shows the best performance, at least in these cases, and concatenate is second best.
Finally, just personally, taking the balance of readability and performance into account, I would like to propose the above concatenate as a solution:
- Performance test with size 2 and 8. (DEMO)
- Performance test with size 16 and 32. (DEMO)
Unless you have some other requirement to preserve the intermediate transformed list, storing it is suboptimal. You can just call std::accumulate and do both operations on the fly:
#include <cstdio>
#include <iterator>
#include <numeric>
int main ( )
{
int const input [] = { 1, 2, 3, 4, 5, 6 };
// computes sum of squares
auto const add_square = [] ( int x, int y ) { return x + y * y; };
int result = std::accumulate
( std::cbegin (input)
, std::cend (input)
, 0
, add_square
);
std::printf ( "\n%i\n", result );
return 0;
}
If you have the luxury of using C++17, there is a standard library algorithm which does exactly what you need. Here is an example:
#include <iterator>
#include <iostream>
#include <numeric>
#include <string>
int main()
{
auto input = {1, 2, 3};
std::cout << std::transform_reduce(
std::cbegin(input), std::cend(input),
std::string("Result:"),
[](const std::string & left, const std::string & right) { return left + " " + right; },
[](int value) { return "Transform#" + std::to_string(value); }
) << "\n";
}
I have the following multiset in C++:
template<class T>
class CompareWords {
public:
bool operator()(T s1, T s2)
{
if (s1.length() == s2.length())
{
return ( s1 < s2 );
}
else return ( s1.length() < s2.length() );
}
};
typedef multiset<string, CompareWords<string>> mySet;
typedef std::multiset<string,CompareWords<string>>::iterator mySetItr;
mySet mWords;
I want to print each unique element of type std::string in the set once and, next to the element, how many times it appears in the set (its frequency). As you can see, the functor CompareWords keeps the set sorted.
A solution is proposed here, but it's not what I need, because I am looking for a solution without using loops (while, for, do-while).
I know that I can use this:
//gives a pair of iterators bounding the range of the repeated element "word"
auto p = mWords.equal_range(word);
// compute the distance between the iterators that bound the range AKA frequency
int count = static_cast<int>(std::distance(p.first, p.second));
but I can't quite come up with a complete solution without loops.
Unlike the other solutions, this iterates over the set exactly once. This is important, as iterating over a node-based structure like std::multiset has reasonably high overhead (the nodes are distinct allocations).
There are no explicit loops, but the tail-end recursion will be optimized down to a loop, and I call an algorithm that will run a loop.
template<class Iterator, class Clumps, class Compare>
void produce_clumps( Iterator begin, Iterator end, Clumps&& clumps, Compare&& compare) {
if (begin==end) return; // do nothing for nothing
typedef decltype(*begin) value_type_ref;
// We know runs are at least 1 long, so don't bother comparing the first time.
// Generally, advancing will have a cost similar to comparing. If comparing is much
// more expensive than advancing, then this is sub optimal:
std::size_t count = 1;
Iterator run_end = std::find_if(
std::next(begin), end,
[&]( value_type_ref v ){
if (!compare(*begin, v)) {
++count;
return false;
}
return true;
}
);
// call our clumps callback:
clumps( begin, run_end, count );
// tail end recurse:
return produce_clumps( std::move(run_end), std::move(end), std::forward<Clumps>(clumps), std::forward<Compare>(compare) );
}
The above is a relatively generic algorithm. Here is its use:
int main() {
typedef std::multiset<std::string> mySet;
typedef std::multiset<std::string>::iterator mySetItr;
mySet mWords { "A", "A", "B" };
produce_clumps( mWords.begin(), mWords.end(),
[]( mySetItr run_start, mySetItr /* run_end -- unused */, std::size_t count )
{
std::cout << "Word [" << *run_start << "] occurs " << count << " times\n";
},
CompareWords<std::string>{}
);
}
live example
The iterators must refer to a sorted sequence (with regard to the comparator); the clumps will then be passed to the 3rd argument together with their length.
Every element in the multiset will be visited exactly once with the above algorithm (as a right-hand side argument to your comparison function). Every start of a clump will be visited (length of clump) additional times as a left-hand side argument (including clumps of length 1). There will be exactly N iterator increments performed, and no more than N+C+1 iterator comparisons (N=number of elements, C=number of clumps).
#include <iostream>
#include <algorithm>
#include <set>
#include <iterator>
#include <string>
int main()
{
typedef std::multiset<std::string> mySet;
typedef std::multiset<std::string>::iterator mySetItr;
mySet mWords;
mWords.insert("A");
mWords.insert("A");
mWords.insert("B");
mySetItr it = std::begin(mWords), itend = std::end(mWords);
std::for_each<mySetItr&>(it, itend, [&mWords, &it] (const std::string& word)
{
auto p = mWords.equal_range(word);
int count = static_cast<int>(std::distance(p.first, p.second));
std::cout << word << " " << count << std::endl;
std::advance(it, count - 1);
});
}
Outputs:
A 2
B 1
Live demo link.
The following does the job without an explicit loop, using recursion:
void print_rec(const mySet& set, mySetItr it)
{
if (it == set.end()) {
return;
}
const auto& word = *it;
auto next = std::find_if(it, set.end(),
[&word](const std::string& s) {
return s != word;
});
std::cout << word << " appears " << std::distance(it, next) << std::endl;
print_rec(set, next);
}
void print(const mySet& set)
{
print_rec(set, set.begin());
}
Demo