Word ladder 2 using BFS - c++

See EDIT!! below
I am coding a word ladder algorithm. The user enters a start word, an end word and a hash set of all the words (the lexicon). The algorithm returns all the shortest paths (several, if they exist) from the start word to the end word. E.g. start_word = 'cold', end_word = 'warm'
output = [[ cold -> cord-> card-> ward-> warm], [/If another path exists/]].
Each word differs from the previous one by exactly one character. I am using BFS to solve this problem. My strategy was to return all the paths, and then select the shortest ones from the returned list. This is my code to return all the paths:
auto word_ladder::generate(std::string const& from, std::string const& to, absl::flat_hash_set<std::string> const& lexicon) -> std::vector<std::vector<std::string>> {
absl::flat_hash_set<std::string> visited = {};
std::queue<std::vector<std::string>> search_queue = {};
std::vector<std::vector<std::string>> paths = {};
search_queue.push(std::vector<std::string>{from});
while (!search_queue.empty()) {
auto word = search_queue.front();
search_queue.pop();
auto neighbours = generate_neighbours(word.back(), lexicon);
for (auto &i: neighbours) {
auto new_word = word;
new_word.push_back(i);
if (i == to) {
paths.push_back(new_word);
continue;
}
if (visited.find(i) != visited.end()) {
continue;
}
search_queue.push(new_word);
visited.insert(i);
}
}
return paths;
}
It does return multiple paths; however, the problem is that it doesn't return all the paths. One of the paths it returns is ->
1) awake, aware, sware, share, shire, shirr, shier, sheer, sheep, sleep
however it doesn't return the path -> 2) "awake","aware","sware","share","sharn","shawn","shewn","sheen","sheep","sleep"
I am pretty sure the reason is that, the way I have coded it, it marks the word "share" as visited the first time it encounters it (in 1)). Hence it never goes through the second path (in 2)).
To solve this, I changed my for loop a bit:
for (auto &i: neighbours) {
auto new_word = word;
new_word.push_back(i);
if (i == to) {
paths.push_back(new_word);
continue;
}
for (auto &j: word) {
if (j == i) {
continue;
}
}
search_queue.push(new_word);
}
The idea was to check whether the word has already been visited in the path that is being tracked in the queue, rather than globally. However, for some reason this solution gets stuck in a loop somewhere and doesn't terminate (I am assuming due to the large dataset?).
Is there something wrong with my code in the second version, or does it just take too long because of the large dataset? How can I better achieve the solution?
EDIT!!!
Instead of finding all the paths, I am now finding the length of the shortest path and then performing BFS up to that depth to get all the paths at that depth.
auto word_ladder::generate(std::string const& from, std::string const& to, absl::flat_hash_set<std::string> const& lexicon) -> std::vector<std::vector<std::string>> {
absl::flat_hash_set<std::string> visited = {};
visited.insert(from);
std::queue<std::vector<std::string>> search_queue = {};
std::vector<std::vector<std::string>> paths = {};
search_queue.push(std::vector<std::string>{from});
auto length = find_shortest_path_length(from, to, lexicon);
std::cout << "length is: " << length << "\n";
// auto level = 0;
std::unordered_map<std::string, int> level_track = {};
level_track[from] = 0;
while (!search_queue.empty() ) {
auto word = search_queue.front();
search_queue.pop();
// **
if (level_track[word.back()] <= length) {
auto neighbours = generate_neighbours(word.back(), lexicon);
const auto &parent = word.back();
for (auto &i: neighbours) {
auto new_word = word;
new_word.push_back(i);
if (i == to) {
paths.push_back(new_word);
std::cout << "The level at the path was " << level_track[parent] << "\n";
continue;
}
if (path_crossed(word, i)) {
continue;
}
search_queue.push(new_word);
level_track[i] = level_track[parent] + 1;
}
}
}
return paths;
}
The solution now terminates, so the earlier problem was definitely the large number of searches. However, my algorithm still does not give me the correct answer, because the way I keep track of the depth of my nodes (words) is somehow not correct.

You're trying to find an efficient solution, but most probably none exists. See this answer. Enumerating all shortest paths can be very costly.
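A commonly used way to keep the depth bookkeeping consistent is to store one depth per word and every parent that reaches the word at that depth, and to rebuild the paths backwards afterwards. The following is only a sketch under assumptions, not a fix of the code above: neighbours_of, backtrack and all_shortest_ladders are hypothetical names, std::unordered_set stands in for absl::flat_hash_set, and words are assumed to be lowercase a-z. The number of shortest paths can still be exponential, so this only avoids queueing whole paths; it does not make the enumeration cheap.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Lexicon = std::unordered_set<std::string>;
using Paths = std::vector<std::vector<std::string>>;
using Parents = std::unordered_map<std::string, std::vector<std::string>>;

// Hypothetical helper: all lexicon words that differ from `word` in exactly one letter.
std::vector<std::string> neighbours_of(const std::string& word, const Lexicon& lexicon) {
    std::vector<std::string> result;
    std::string candidate = word;
    for (std::size_t i = 0; i < candidate.size(); ++i) {
        const char original = candidate[i];
        for (char c = 'a'; c <= 'z'; ++c) {
            if (c == original) continue;
            candidate[i] = c;
            if (lexicon.count(candidate)) result.push_back(candidate);
        }
        candidate[i] = original;  // restore before changing the next position
    }
    return result;
}

// Follow the parent links backwards from `word` to `from`, emitting each complete path.
void backtrack(const std::string& word, const std::string& from, const Parents& parents,
               std::vector<std::string>& reversed, Paths& out) {
    reversed.push_back(word);
    if (word == from) {
        out.emplace_back(reversed.rbegin(), reversed.rend());
    } else {
        for (const auto& parent : parents.at(word)) backtrack(parent, from, parents, reversed, out);
    }
    reversed.pop_back();
}

Paths all_shortest_ladders(const std::string& from, const std::string& to, const Lexicon& lexicon) {
    std::unordered_map<std::string, int> depth{{from, 0}};
    Parents parents;
    std::queue<std::string> pending;
    pending.push(from);
    while (!pending.empty()) {
        const std::string word = pending.front();
        pending.pop();
        const int d = depth.at(word);
        // Once the end word has a depth, words at that depth or deeper need no expansion.
        const auto target = depth.find(to);
        if (target != depth.end() && d >= target->second) break;
        for (const auto& next : neighbours_of(word, lexicon)) {
            const auto known = depth.find(next);
            if (known == depth.end()) {
                depth[next] = d + 1;             // first time seen: fix its depth
                parents[next].push_back(word);
                pending.push(next);
            } else if (known->second == d + 1) {
                parents[next].push_back(word);   // another parent on an equally short route
            }
        }
    }
    Paths out;
    if (depth.count(to)) {
        std::vector<std::string> reversed;
        backtrack(to, from, parents, reversed, out);
    }
    return out;
}
```

Calling all_shortest_ladders("cold", "warm", lexicon) returns every ladder of minimal length; a word is marked by depth rather than by a global visited flag, so a second parent at the same depth is still recorded.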

Related

How to find the first unique char in a list<string> in C++?

There is a collection (vector, list, etc) of directories:
example 1:
/a/ab/bc/de
/a/ab/cc/fw
/a/ab/dd
/a/ab/ee/fg
Find /a/ab
example 2:
/a/ab/bc/de
/a/b/cc/fw
/a/ab/dd
/a/ab/ee/fg
Find /a
What is the best way to find the common path to all the directories?
P.S. The end goal is to copy only the relative paths, for example 1 the /a/ab needs to be removed so that all is left is:
bc/de
cc/fw
dd
ee/fg
Sort the vector of paths first.
std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg"};
std::sort(paths.begin(), paths.end());
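Compare the first and last paths of the sorted vector to find the position of the first mismatch. (They are named shortest and longest in the code, although after sorting they are the lexicographically smallest and largest paths.)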
const auto& shortest = paths.front();
const auto& longest = paths.back();
auto mis = std::mismatch(shortest.cbegin(), shortest.cend(), longest.cbegin(), longest.cend());
Now copy the common prefix into a new string.
auto common = std::string(shortest.cbegin(), mis.first);
Here's the full source code, tested in VS2022.
It printed "/a/ab/" and "/a/" for your examples.
I believe you can work out how to remove the trailing '/'.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
int main() {
try {
std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd",
"/a/ab/ee/fg"};
std::sort(paths.begin(), paths.end());
const auto& shortest = paths.front();
const auto& longest = paths.back();
auto mis = std::mismatch(shortest.cbegin(), shortest.cend(),
longest.cbegin(), longest.cend());
auto common = std::string(shortest.cbegin(), mis.first);
std::cout << common << std::endl;
} catch (const std::exception& e) {
std::cerr << e.what() << std::endl;
return -1;
}
return 0;
}
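A character-wise prefix can end in the middle of a directory name (for "/a/ab" and "/a/abc" it is "/a/ab"), so one possible follow-up is to cut the prefix back to the last '/'. The sketch below is an assumption-laden variant of the code above, not part of the original answer: common_directory is a hypothetical name, std::minmax_element replaces the sort, and every input is treated as a path whose final component stays in the relative part.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: deepest directory shared by all paths, without a trailing '/'.
// Returns an empty string if the paths share nothing but the root (or nothing at all).
std::string common_directory(const std::vector<std::string>& paths) {
    if (paths.empty()) return {};
    // The lexicographically smallest and largest paths bound the common prefix of all paths.
    const auto extremes = std::minmax_element(paths.begin(), paths.end());
    const std::string& smallest = *extremes.first;
    const std::string& largest = *extremes.second;
    const auto mis = std::mismatch(smallest.cbegin(), smallest.cend(),
                                   largest.cbegin(), largest.cend());
    const std::string prefix(smallest.cbegin(), mis.first);
    // Cut back to the last separator so the result never ends inside a name.
    const auto slash = prefix.rfind('/');
    return slash == std::string::npos ? std::string{} : prefix.substr(0, slash);
}
```

The relative paths from the question can then be obtained with path.substr(common.size() + 1) for each path.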
This is a first-order approach (too bad I couldn't find any useful functions in <filesystem>):
#include <string>
#include <vector>
#include <iostream>
std::string get_common_path(const std::string& lhs, const std::string& rhs)
{
auto lhs_it = lhs.begin();
auto rhs_it = rhs.begin();
// as long as characters match move to right (but not past end of either string)
while ((lhs_it != lhs.end()) && (rhs_it != rhs.end()) && (*lhs_it == *rhs_it))
{
++lhs_it;
++rhs_it;
}
return std::string{ lhs.begin(),lhs_it };
}
std::string common_path(const std::vector<std::string>& values)
{
if (values.empty()) return std::string{};
if (values.size() == 1) return values.front();
// start with the first string; so far it is the longest common path
auto it = values.begin();
std::string retval = *it;
++it;
// loop over all values
while ((it != values.end()) && (!retval.empty()))
{
// the overlap is the existing overlap combined with the next string
// in the vector.
retval = get_common_path(retval, *it);
++it;
}
return retval;
}
int main()
{
std::vector<std::string> paths
{
"/a/ab/bc/de",
"/a/ab/cc/fw",
"/a/ab/dd",
"/a/ab/ee/fg"
};
auto result = common_path(paths);
std::cout << result;
return 0;
}
Define "best", and tell us the size of the data set.
The paths form a tree, so you could insert them into a tree and then traverse from the root until you find a node with more than one child; the path to this node is the common path for all directories.
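The tree idea can be sketched roughly as follows. This is only an illustration under assumptions: DirNode, insert_path and common_path_via_tree are hypothetical names, paths are assumed to be absolute and '/'-separated, and the walk also stops at a node where one of the input paths ends.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// One directory level; children are keyed by component name.
struct DirNode {
    std::map<std::string, std::unique_ptr<DirNode>> children;
    bool terminal = false;  // true if some input path ends at this node
};

void insert_path(DirNode& root, const std::string& path) {
    DirNode* node = &root;
    std::istringstream parts(path);
    std::string part;
    while (std::getline(parts, part, '/')) {
        if (part.empty()) continue;  // skip the leading (or a doubled) '/'
        auto& child = node->children[part];
        if (!child) child = std::make_unique<DirNode>();
        node = child.get();
    }
    node->terminal = true;
}

std::string common_path_via_tree(const std::vector<std::string>& paths) {
    DirNode root;
    for (const auto& path : paths) insert_path(root, path);
    std::string common;
    const DirNode* node = &root;
    // Descend while there is exactly one way to continue and no path has ended yet.
    while (node->children.size() == 1 && !node->terminal) {
        const auto& only = *node->children.begin();
        common += '/' + only.first;
        node = only.second.get();
    }
    return common;
}
```

Because the tree compares whole components, the result never ends in the middle of a directory name.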
There is a very easy solution.
You can analyze the data and make the following observation.
If you see the std::vector<std::string> as a 2-dimensional array of characters, you can compare the characters column-wise.
/a/ab/bc/de
/a/b/cc/fw
/a/ab/dd
/a/ab/ee/fg
||||
||||
|||+--- Not all characters are the same
||+---- All characters in this column are the same
|+----- All characters in this column are the same
+------ All characters in this column are the same
Starting with column 0, you can check whether all characters in this column are the same, then move to the next column, and so on.
As soon as we find a difference in a column, we know that we have found the end of the common prefix.
And then we can output the result of the common prefix and also the remaining suffixes.
All this with only a few lines of conventional code.
Example for one potential solution:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
std::vector<std::string> paths = { "/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg" };
int main() {
// Sanity check
if (not paths.empty()) {
// Of course we will only compare to the smallest string size
size_t minSize = std::min_element(paths.begin(), paths.end(), [](const std::string& s1, const std::string& s2) {return s1.size() < s2.size(); })->size();
size_t cont{ 1 }, col{ 0 };
// Double nested loop to find resulting column
for (size_t row{ 1 }; cont and col < minSize; col += cont, row = 1)
for (auto c{ paths.front()[col] }; cont and row < paths.size(); row += cont)
cont = ((c == paths[row][col]) * 1);
// Show result as debug output
std::cout << "Common prefix: " << paths.front().substr(0, col) << "\n\n";
for (std::string& s : paths) std::cout << "Resulting path: " << s.substr(col) << '\n';
}
}

String Searching Algorithm that uses a graph ? C++

Code Instructions
Hey guys. Above is a coding project I have been assigned. Im reading the instructions and am completely lost because I've never learned how to code an undirected graph? Not sure how my professor expects us to do this but I was hoping I could get some help from experts. Any readings (or tips) you guys suggest I can look at to familiarize myself with how to get started on the program? Appreciate it, thanks!
The problem to solve is called "Word Morph". Your instructor gave some restrictions, such as using an undirected graph where a neighbour node differs by only one character from the origin. Unfortunately, the requirements are not clear enough: "differ by one character" is ambiguous. If we allow replace, insert and delete operations, we need different functions than for simply comparing 2 equal-size strings. I assume the full approach.
And, in the end, you need to find a shortest path through a graph.
I can present one possible solution: a complete working code example.
By the way, the graph is non-weighted, because the cost of travelling from one node to the next is always 1. So we are actually talking about an undirected non-weighted graph.
The main algorithms we need here are:
Levenshtein, to calculate the distance between 2 strings
and Breadth First Search, to find the shortest path through a graph
Please note: if the words must have the same length, then no Levenshtein distance is needed. Just compare character by character and count the differences. That's rather simple. (But as said: the instructions are a little bit unclear.)
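For the equal-length interpretation, the character-by-character comparison could look like the sketch below. The function name differs_by_one is hypothetical, and the function assumes that "differs by one character" means exactly one substituted position.

```cpp
#include <cstddef>
#include <string>

// True if both words have the same length and differ in exactly one position.
bool differs_by_one(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    std::size_t differences = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Stop early as soon as a second difference shows up.
        if (a[i] != b[i] && ++differences > 1) return false;
    }
    return differences == 1;
}
```

For example, differs_by_one("peach", "peace") is true, while differs_by_one("peach", "place") is false.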
Both algorithms can be modified. For example, you do not need a Levenshtein distance greater than 1, so you can terminate the distance calculation as soon as it is clear that the distance exceeds one. And, in the breadth first search, you could record the path that you take.
OK, now, how to implement an undirected graph. There are 2 possibilities:
A Matrix (I will not explain)
A list or a vector
I would recommend the vector approach for this case. A matrix would be rather sparse, so, better a vector.
The basic data structure that you need is a node containing a vertex and its neighbours. So you have the word (a std::string) as the vertex, and the "neighbours", a std::vector of index positions of other nodes in the graph.
The graph is a vector of nodes, and a node's neighbours point to other nodes in this vector. We use the index into the vector to denote neighbours. All this we pack into a structure and call it "UndirectedGraph". We add a "build" function that checks for adjacencies. In this function we compare each string with every other and check whether the distance is < 2, so 0 or 1. 0 means equal, and 1 is the given constraint. If we find such a distance, we add the other word as a neighbour in the corresponding node.
Additionally, we add a breadth first search algorithm. It is described in Wikipedia.
To ease the implementation of that algorithm, we add a "visited" flag to the node.
Please see the code below:
#include <sstream>
#include <iostream>
#include <vector>
#include <string>
#include <iterator>
#include <iomanip>
#include <numeric>
#include <algorithm>
#include <queue>
std::istringstream textFileStream(R"#(peach
peace
place
plane
plans
plays
slays
stays
stars
sears
years
yearn
)#");
using Vertex = std::string;
using Edge = std::vector<size_t>;
// One node in a graph
struct Node
{
// The Vertex is a std::string
Vertex word{};
// The edges are the index of the neighbour nodes
Edge neighbour{};
// For Breadth First Search
bool visited{ false };
// Easy input and output
friend std::istream& operator >> (std::istream& is, Node& n) {
n.neighbour.clear();
std::getline(is, n.word);
return is;
}
friend std::ostream& operator << (std::ostream& os, const Node& n) {
os << std::left << std::setw(25) << n.word << " --> ";
std::copy(n.neighbour.begin(), n.neighbour.end(), std::ostream_iterator<size_t>(os, " "));
return os;
}
};
// The graph
struct UndirectedGraph
{
// Contains a vector of nodes
std::vector<Node> graph;
// Build adjacencies
void build();
// Find Path
bool checkForPathFromStartToEnd(size_t start, size_t end);
// Check for a path between the last two nodes (the start and target word appended in main)
bool checkForPath() {
bool result = false;
if (graph.size() > 1) {
size_t s = graph.size() - 2;
size_t e = s + 1;
result = checkForPathFromStartToEnd(s, e);
}
return result;
}
// Easy input and output
friend std::istream& operator >> (std::istream& is, UndirectedGraph& ug) {
ug.graph.clear();
std::copy(std::istream_iterator<Node>(is), std::istream_iterator<Node>(), std::back_inserter(ug.graph));
return is;
}
friend std::ostream& operator << (std::ostream& os, const UndirectedGraph& ug) {
size_t i{ 0 };
for (const Node& n : ug.graph)
os << std::right << std::setw(4) << i++ << ' ' << n << '\n';
return os;
}
};
// Distance between 2 strings
size_t levensthein(const std::string& string1, const std::string& string2)
{
const size_t lengthString1(string1.size());
const size_t lengthString2(string2.size());
if (lengthString1 == 0) return lengthString2;
if (lengthString2 == 0) return lengthString1;
std::vector<size_t> costs(lengthString2 + 1);
std::iota(costs.begin(), costs.end(), 0);
for (size_t indexString1 = 0; indexString1 < lengthString1; ++indexString1) {
costs[0] = indexString1 + 1;
size_t corner = indexString1;
for (size_t indexString2 = 0; indexString2 < lengthString2; ++indexString2) {
size_t upper = costs[indexString2 + 1];
if (string1[indexString1] == string2[indexString2]) {
costs[indexString2 + 1] = corner;
}
else {
const size_t temp = std::min(upper, corner);
costs[indexString2 + 1] = std::min(costs[indexString2], temp) + 1;
}
corner = upper;
}
}
size_t result = costs[lengthString2];
return result;
}
// Build the adjacencies
void UndirectedGraph::build()
{
// Iterate over all words in the graph
for (size_t i = 0; i < graph.size(); ++i)
// Compare everything with everything (because of symmetries, omit half of the comparisons)
for (size_t j = i + 1; j < graph.size(); ++j)
// Check the distance of the 2 words to compare
if (levensthein(graph[i].word, graph[j].word) < 2U) {
// And store the adjacencies
graph[i].neighbour.push_back(j);
graph[j].neighbour.push_back(i);
}
}
bool UndirectedGraph::checkForPathFromStartToEnd(size_t start, size_t end)
{
// Assume that it will not work
bool result = false;
// Store intermediate tries in queue
std::queue<size_t> check{};
// Set initial values
graph[start].visited = true;
check.push(start);
// As long as we have not visited all possible nodes
while (!check.empty()) {
// Get the next node to check
size_t currentNode = check.front(); check.pop();
// If we found the solution . . .
if (currentNode == end) {
// Then set the resulting value and stop searching
result = true;
break;
}
// Go through all neighbours of the current node
for (const size_t next : graph[currentNode].neighbour) {
// If the neighbour node has not yet been visited
if (!graph[next].visited) {
// Then visit it
graph[next].visited = true;
// And check following elements next time
check.push(next);
}
}
}
return result;
}
int main()
{
// Get the filename from the user
std::cout << "Enter Filename for file with words:\n";
std::string filename{};
//std::cin >> filename;
// Open the file
//std::ifstream textFileStream(filename);
// If the file could be opened . . .
if (textFileStream) {
// Create an empty graph
UndirectedGraph undirectedGraph{};
// Read the complete file into the graph
textFileStream >> undirectedGraph;
Node startWord{}, targetWord{};
std::cout << "Enter start word and enter target word\n"; // teach --> learn
std::cin >> startWord >> targetWord;
// Add the 2 words at the end of our graph
undirectedGraph.graph.push_back(startWord);
undirectedGraph.graph.push_back(targetWord);
// Build adjacency graph, including the just added words
undirectedGraph.build();
// For debug purposes: Show the graph
std::cout << undirectedGraph;
std::cout << "\n\nMorph possible? --> " << std::boolalpha << undirectedGraph.checkForPath() << '\n';
}
else {
// File could not be found or opened
std::cerr << "Error: Could not open file : " << filename;
}
return 0;
}
Please note: Although I have implemented asking for a file name, I do not use it in this example. I read from an istringstream. You need to delete the istringstream later and uncomment the existing statements (and add #include <fstream>).
Regarding the requirements from the instructor: I did not use any STL/Library/Boost searching algorithm. (What for, in this example?) But of course I use other C++ STL containers. I will not reinvent the wheel and come up with a new "vector" or queue. And I will definitely not use "new" or C-style arrays or pointer arithmetic.
Have fun!
And to all others: Sorry, I could not resist writing the code . . .
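The answer mentions that the breadth first search could also show the path it takes. One way to do that, sketched here on a plain index-based adjacency list rather than on the UndirectedGraph struct above, is to remember for each node the node it was reached from. The names Adjacency and shortest_path are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// graph[i] holds the indices of the neighbours of node i.
using Adjacency = std::vector<std::vector<std::size_t>>;

// Returns the node indices of one shortest path from start to end, or an empty vector.
std::vector<std::size_t> shortest_path(const Adjacency& graph, std::size_t start, std::size_t end) {
    if (start >= graph.size() || end >= graph.size()) return {};
    const std::size_t none = graph.size();  // sentinel: "no predecessor"
    std::vector<std::size_t> previous(graph.size(), none);
    std::vector<bool> visited(graph.size(), false);
    std::queue<std::size_t> pending;
    visited[start] = true;
    pending.push(start);
    while (!pending.empty()) {
        const std::size_t current = pending.front();
        pending.pop();
        if (current == end) break;
        for (const std::size_t next : graph[current]) {
            if (!visited[next]) {
                visited[next] = true;
                previous[next] = current;  // remember how we got here
                pending.push(next);
            }
        }
    }
    if (!visited[end]) return {};
    // Walk the predecessor chain backwards, then reverse it.
    std::vector<std::size_t> path;
    for (std::size_t node = end; node != none; node = previous[node]) path.push_back(node);
    std::reverse(path.begin(), path.end());
    return path;
}
```

The returned indices can be mapped back to words through the node vector of the graph.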

Comparing 2 std::lists of std::pairs

I am hoping you can help me out here. I have searched for other answers, but I haven't found something that matches my specific situation (but if you do find one, please let me know the URL!). I have seen a lot of suggestions about using std::map instead of list, and I don't mind switching the container if need be.
Currently, I have two Lists of pairs i.e.
std::list <std::pair<string,string>> outputList1;
std::list <std::pair<string,string>> outputList2;
I have populated each list with User Settings that I have retrieved from an SQL database (I omit the SQL retrieval code here).
Example list:
outputList1 (first, second)
CanSeeAll, True
CanSubmit, False
CanControl, False
OutputList2:
CanSeeAll, False
CanSubmit, True
CanControl, False
I want to iterate through both lists and find the mismatches. For example, find the first string of the first pair of the first list to find the matching first string in the second list, then compare the second string to determine whether they match, then print out the non matching pairs to a new string (eventually to file), and so on.
In this example, the final string would have CanSeeAll and CanSubmit as the final output since those are the two that mismatch.
Here is what I've tried so far, but I get a blank string:
std::list <std::pair<std::string,std::string>>::iterator it1 = outputList1.begin();
std::list <std::pair<std::string,std::string>>::iterator it2 = outputList2.begin();
string token;
while (it1 != outputList1.end()){
if((*it1).first == ((*it2).first))
{
if((*it1).second != ((*it2).second))
{
token.append((*it1).first);
token.append(",");
token.append((*it1).second);
token.append("\r\n");
}
it1++;
it2 = outputList2.begin();
}
it2++;
if (it2 == outputList2.end())
it1++;
}
I know this logic is flawed, as it will skip the first pair on the second list after the first iteration, but this is the best I can come up with at the moment, and I am banging my head on the keyboard.
Thanks everyone!
As I understand the problem,
you want to compare every element of one list, to every other element of another list.
You could use a pair of nested range based for loops.
#include <list>
#include <string>
int main(){
std::list<std::pair<std::string,std::string>> l1;
std::list<std::pair<std::string,std::string>> l2;
for (auto x: l1){
for (auto y: l2){
//compare x to y
}
}
}
This answer uses an auxiliary map, but keep in mind that you will get better results if you use two maps (or hash tables) instead of two lists.
// create a map for elements in l2
std::map<std::string, std::string> l2map;
// copy elements from l2 into the map so we get O(N*log(N)) instead of O(N²)
for (std::list<std::pair<std::string,std::string> >::iterator it = l2.begin();
it != l2.end();
++it)
{
l2map.insert(*it);
}
// walk l1 and look in l2map
for (std::list<std::pair<std::string,std::string> >::iterator l1it = l1.begin();
l1it != l1.end();
++l1it)
{
// look for the element with the same key in l2
// l1it->first is the key from l1
std::map<std::string, std::string>::iterator l2it = l2map.find(l1it->first);
if (l2it != l2map.end()) {
// found, then compare
if (l1it->second != l2it->second) { // l1it->second is the value from l1
// mismatch
}
} else {
// not in l2
}
}
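Put together as one function, the map lookup above might look like the following sketch. The names Settings and mismatched_settings are hypothetical, the output format (key, comma, value from the first list, "\r\n") is taken from the question, and keys that are missing from the second list are simply skipped here.

```cpp
#include <list>
#include <map>
#include <string>
#include <utility>

using Settings = std::list<std::pair<std::string, std::string>>;

// Returns one "key,value\r\n" line for every key whose value differs between the lists.
std::string mismatched_settings(const Settings& first, const Settings& second) {
    // Index the second list by key; if a key occurs twice, the first occurrence wins.
    const std::map<std::string, std::string> lookup(second.begin(), second.end());
    std::string report;
    for (const auto& setting : first) {
        const auto found = lookup.find(setting.first);
        if (found != lookup.end() && found->second != setting.second) {
            report += setting.first + "," + setting.second + "\r\n";
        }
    }
    return report;
}
```

With the example lists from the question, the report contains CanSeeAll and CanSubmit, regardless of the order of the entries in either list.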
You could use std::mismatch, with the precondition that all settings occur in the same order in both lists (you could sort them first if this is not the case) and that l2 is at least as long as l1:
auto iterPair = std::mismatch(l1.begin(), l1.end(), l2.begin());
while (iterPair.first != l1.end()) {
// TODO: Handle the mismatching iterators
// std::list iterators are bidirectional, so use std::next (from <iterator>) instead of + 1
iterPair = std::mismatch(std::next(iterPair.first), l1.end(), std::next(iterPair.second));
}
If the keys in your lists come in the same order, as in your example, you can traverse the lists linearly:
std::ostringstream s;
std::list<std::pair<std::string, std::string>>::const_iterator i2(outputList2.cbegin());
for(auto const &pair: outputList1) {
if(pair.second != i2->second) {
s << pair.first << ": " << pair.second << " != " << i2->second << std::endl;
}
++i2;
}
Alternatively, use STL algorithms:
#include <algorithm>
#include <iterator>
typedef std::list<std::pair<std::string, std::string>> List;
std::ostringstream s;
for(
auto itrs(
std::mismatch(
outputList1.cbegin(), outputList1.cend(), outputList2.cbegin()
, [](auto const &l, auto const &r){ return l.second == r.second; }))
; itrs.first != outputList1.cend()
// continue one past the mismatch; restarting at the mismatch itself would loop forever
; itrs = std::mismatch(std::next(itrs.first), outputList1.cend(), std::next(itrs.second)
, [](auto const &l, auto const &r){ return l.second == r.second; }))
{
s << itrs.first->first << ": "
<< itrs.first->second << " != " << itrs.second->second
<< std::endl;
}

Randomly selecting specific subsequence from string

Given a string containing a number of characters interspersed with dashes, for example string s = "A--TG-DF----GR--";, I wish to randomly select a block of dashes (could be of size 1, 2, …, max number of consecutive dashes in string), and copy them over to another part of the string chosen at random.
For example, A--TG-DF(---)-GR-- gets moved to A--T(---)G-DF-GR-- while
another iteration may give A--TG-DF----GR(--) gets moved to A--TG-(--)DF----GR.
I'm generating random indices of the string through int i = rand() % (int) s.length();. Insertion happens through s.insert(rand() % (int) s.length(), substr);, where substr is the block of dashes.
My main problem lies in finding randomly a continuous group of dashes. I thought of using s.find("-"), but that'd only return the first instance of a single dash, and not a random position of a collection of dashes.
I know this problem is likely steeped in XY problems, but I found it a nice challenge nonetheless, so I thought about implementing it with the Boost Interval Container library.
The beauty of the library is that you can forget about a lot of details, while you don't sacrifice a lot of performance.
I've taken the liberty to generalize it, so that it is capable of moving multiple blocks of dashes (uniform randomly selected) simultaneously.
The solution runs Live On Coliru and generates 1,000,000 random transpositions of the given sample with randomly varied numbers of moved blocks (1..3) in about 2673 ms (1156 ms on my machine):
Generator gen(test_case);
std::string result;
std::map<std::string, size_t> histo;
for(int i = 0; i < 1000000; ++i) {
auto const mobility = gen.gen_relocations(1 + gen.random(3)); // move n blocks of dashes
result.clear();
gen.apply_relocations(mobility, std::back_inserter(result));
histo[result]++;
}
Note: the benchmark times include the time taken to build the histogram of unique results generated
Let's do a code walk-through here to explain things:
I tried to use "readable" types:
namespace icl = boost::icl;
using Position = size_t;
using Map = icl::interval_set<Position>;
using Region = Map::value_type;
E.g. the function that builds the Map of dashes is simply:
template <typename It> Map region_map(It f, It l) {
Map dashes;
for (Position pos = 0; f!=l; ++f, ++pos)
if ('-' == *f)
dashes += pos;
return dashes;
}
Note how I didn't particularly optimize this. I let the interval_set combine adjacent dashes. We might use hinted insertion, or a parser that adds consecutive dashes as a block. I opted for KISS here.
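For readers without Boost, the grouping that interval_set performs here (merging adjacent dash positions into blocks) can be approximated with the standard library alone. This is a rough sketch, not a replacement for the ICL code in this answer; the name dash_runs and the representation as inclusive [first, last] index pairs are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Collects every maximal run of '-' as an inclusive [first, last] index pair.
std::vector<std::pair<std::size_t, std::size_t>> dash_runs(const std::string& text) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    for (std::size_t pos = 0; pos < text.size(); ++pos) {
        if (text[pos] != '-') continue;
        if (!runs.empty() && runs.back().second + 1 == pos) {
            runs.back().second = pos;    // extend the run that ends just before pos
        } else {
            runs.emplace_back(pos, pos); // start a new run
        }
    }
    return runs;
}
```

For the sample "A--TG-DF----GR--" this yields four runs, from which one can then be picked with a std::uniform_int_distribution.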
Later on, we generate relocations, which map a Region onto a non-moving Position in the rest of the text.
using Relocs = boost::container::flat_multimap<Position, Region>;
By using the flat multimap, the caller has the entries already sorted by ascending insertion point. Because we use a reserve()-ed up-front flat multimap, we avoid the overhead of a node based implementation of map here.
We start by picking the dash-blocks to be moved:
Map pick_dashes(int n) {
Map map;
if (!_dashes.empty())
for (int i = 0; i < n; ++i)
map += *std::next(_dashes.begin(), _select(_rng));
return map;
}
The random distributions are dimensioned at construction, e.g.:
_dashes(region_map(_input.begin(), _input.end())),
_rng(std::random_device {}()),
_select (0, _dashes.iterative_size() - 1),
_randpos(0, _input.size() - 1),
Next, we assign an insertion position to each block. The trick is to assign positions inside non-mobile (inert) regions of the source.
This includes other blocks of dashes that stay in their place.
There is the degenerate case where everything is a block of dashes; we detect this in the constructor:
_is_degenerate(cardinality(_dashes) == _input.size())
So the code reads as follows:
Relocs gen_relocations(int n=1) {
Map const moving = pick_dashes(n);
Relocs relocs;
relocs.reserve(moving.iterative_size());
if (_is_degenerate)
{
// degenerate case (everything is a dash); no non-moving positions
// exist, just pick 0
for(auto& region : moving)
relocs.emplace(0, region);
} else {
auto inertia = [&] {
Position inert_point;
while (contains(moving, inert_point = _randpos(_rng)))
; // discard insertion points that are moving
return inert_point;
};
for(auto& region : moving)
relocs.emplace(inertia(), region);
}
return relocs;
}
Now all we need to do is apply the relocations.
The generic implementation of this is pretty straightforward. Again, it's not particularly optimized in order to keep it simple (KISS):
template <typename F>
void do_apply_relocations(Relocs const& mobility, F const& apply) const {
icl::split_interval_set<Position> steps {{0, _input.size()}};
for (auto& m : mobility) {
steps += m.first; // force a split on insertion point
steps -= m.second; // remove the source of the move
//std::cout << m.second << " moving to #" << m.first << ": " << steps << "\n";
}
auto next_mover = mobility.begin();
for(auto& step : steps) {
while (next_mover!=mobility.end() && contains(step, next_mover->first))
apply((next_mover++)->second, true);
apply(step, false);
}
}
Note: The trick here is that we "abuse" the split_interval_set combining strategy to break the processing into sub-ranges that "stop" at the randomly generated insertion points: these artificial regions are the "steps" in our generation loop.
The apply function there is what we implement to get what we want. In our case we want a string like A--TG-DFGR(----)-- so we write an overload that appends to a container (e.g. std::string) using any output iterator:
template <typename Out>
Out apply_relocations(Relocs const& mobility, Out out) const {
if (_is_degenerate)
return std::copy(_input.begin(), _input.end(), out);
auto to_iter = [this](Position pos) { return _input.begin() + pos; };
do_apply_relocations(mobility, [&](Region const& r, bool relocated) {
if (relocated) *out++ = '(';
out = std::copy(
to_iter(first(r)),
to_iter(last(r)+1),
out
);
if (relocated) *out++ = ')';
});
return out;
}
Note: The "complicated" parts here are mapping the Position to input iterators (to_iter) and the code to optionally add () around moved blocks.
And with that, we have seen all the code.
Full Listing
#include <boost/container/flat_map.hpp>
#include <boost/icl/interval_set.hpp>
#include <boost/icl/split_interval_set.hpp>
#include <boost/icl/separate_interval_set.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/range/algorithm.hpp>
#include <iomanip>
#include <iostream>
#include <random>
#include <map>
#include <chrono>
namespace icl = boost::icl;
using Position = size_t;
using Map = icl::interval_set<Position>;
using Region = Map::value_type;
using Relocs = boost::container::flat_multimap<Position, Region>;
struct Generator {
Generator(std::string const& input)
: _input(input),
_dashes(region_map(_input.begin(), _input.end())),
_rng(std::random_device {}()),
_select (0, _dashes.iterative_size() - 1),
_randpos(0, _input.size() - 1),
_is_degenerate(cardinality(_dashes) == _input.size())
{
}
unsigned random(unsigned below) {
return _rng() % below; // q&d, only here to make the tests deterministic for a fixed seed
}
Map full() const {
return Map { { 0, _input.size() } };
}
Relocs gen_relocations(int n=1) {
Map const moving = pick_dashes(n);
Relocs relocs;
relocs.reserve(moving.iterative_size());
if (_is_degenerate)
{
// degenerate case (everything is a dash); no non-moving positions
// exist, just pick 0
for(auto& region : moving)
relocs.emplace(0, region);
} else {
auto inertia = [&] {
Position inert_point;
while (contains(moving, inert_point = _randpos(_rng)))
; // discard insertion points that are moving
return inert_point;
};
for(auto& region : moving)
relocs.emplace(inertia(), region);
}
return relocs;
}
template <typename Out>
Out apply_relocations(Relocs const& mobility, Out out) const {
if (_is_degenerate)
return std::copy(_input.begin(), _input.end(), out);
auto to_iter = [this](Position pos) { return _input.begin() + pos; };
do_apply_relocations(mobility, [&](Region const& r, bool relocated) {
if (relocated) *out++ = '(';
out = std::copy(
to_iter(first(r)),
to_iter(last(r)+1),
out
);
if (relocated) *out++ = ')';
});
return out;
}
template <typename F>
void do_apply_relocations(Relocs const& mobility, F const& apply) const {
icl::split_interval_set<Position> steps {{0, _input.size()}};
for (auto& m : mobility) {
steps += m.first; // force a split on insertion point
steps -= m.second; // remove the source of the move
//std::cout << m.second << " moving to #" << m.first << ": " << steps << "\n";
}
auto next_mover = mobility.begin();
for(auto& step : steps) {
while (next_mover!=mobility.end() && contains(step, next_mover->first))
apply((next_mover++)->second, true);
apply(step, false);
}
}
private:
std::string _input;
Map _dashes;
std::mt19937 _rng;
std::uniform_int_distribution<Position> _select;
std::uniform_int_distribution<Position> _randpos;
bool _is_degenerate;
Map pick_dashes(int n) {
Map map;
if (!_dashes.empty())
for (int i = 0; i < n; ++i)
map += *std::next(_dashes.begin(), _select(_rng));
return map;
}
template <typename It> Map region_map(It f, It l) {
Map dashes;
for (Position pos = 0; f!=l; ++f, ++pos)
if ('-' == *f)
dashes += pos;
return dashes;
}
};
int main() {
for (std::string test_case : {
"----",
"A--TG-DF----GR--",
"",
"ATGDFGR",
})
{
auto start = std::chrono::high_resolution_clock::now();
Generator gen(test_case);
std::string result;
std::map<std::string, size_t> histo;
for(int i = 0; i < 1000000; ++i) {
auto const mobility = gen.gen_relocations(1 + gen.random(3)); // move n blocks of dashes
result.clear();
gen.apply_relocations(mobility, std::back_inserter(result));
histo[result]++;
}
std::cout << histo.size() << " unique results for '" << test_case << "'"
<< " in " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now()-start).count() << "ms\n";
std::multimap<size_t, std::string, std::greater<size_t> > ranked;
for (auto& entry : histo)
ranked.emplace(entry.second, entry.first);
int topN = 10;
for (auto& rank : ranked)
{
std::cout << std::setw(8) << std::right << rank.first << ": " << rank.second << "\n";
if (0 == --topN)
break;
}
}
}
Prints e.g.
1 unique results for '----' in 186ms
1000000: ----
3430 unique results for 'A--TG-DF----GR--' in 1156ms
9251: A(----)--TG-DFGR--
9226: (----)A--TG-DFGR--
9191: A--T(----)G-DFGR--
9150: A--TG-DFGR-(----)-
9132: A--(----)TG-DFGR--
9128: A--TG(----)-DFGR--
9109: A--TG-D(----)FGR--
9098: A--TG-DFG(----)R--
9079: A--TG-DFGR(----)--
9047: A-(----)-TG-DFGR--
1 unique results for '' in 25ms
1000000:
1 unique results for 'ATGDFGR' in 77ms
1000000: ATGDFGR
You can pre-process the string to get a list of iterators that point to the beginnings of consecutive dashes in the string and then uniformly pick a random element from that list.
I will use the following standard library headers in this example (which is complete and working if you concatenate the following code blocks):
#include <cstddef>
#include <iostream>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>
First, we define a function that finds us said list of iterators. To do so, we make use of std::string::find_first_of and std::string::find_first_not_of to find the index of the first character in the next sequence and of the first character after it. Both functions work with indices rather than with iterators, so we have to add them to cbegin(). The function works with any character, not just dashes.
std::vector<std::string::const_iterator>
find_conscutive_sequences(const std::string& text, const char c)
{
std::vector<std::string::const_iterator> positions {};
std::size_t idx = 0UL;
while (idx != std::string::npos && idx < text.length())
{
const auto first = text.find_first_of(c, idx);
if (first == std::string::npos)
break;
positions.push_back(text.cbegin() + first);
idx = text.find_first_not_of(c, first);
}
return positions;
}
Next, we define a function that uses the result of the above function to return an iterator to the beginning of a randomly selected sequence of dashes.
We pass in the random engine as a parameter so it can be seeded once and used over and over again. The random standard library introduced in C++11 is so great that it should be preferred whenever possible over the legacy rand function.
If given an empty vector of positions, we have to fail because there is no sequence we could possibly select.
std::string::const_iterator
get_random_consecutive_sequence(const std::vector<std::string::const_iterator>& positions,
std::default_random_engine& prng)
{
if (positions.empty())
throw std::invalid_argument {"string does not contain any sequence"};
std::uniform_int_distribution<std::size_t> rnddist {0UL, positions.size() - 1UL};
const auto idx = rnddist(prng);
return positions.at(idx);
}
Finally, I define this little helper function to mark the selected sequence. Your code would do the copy / move / shift here.
std::string
mark_sequence(const std::string& text,
const std::string::const_iterator start)
{
const auto c = *start;
const std::size_t first = start - text.cbegin();
std::size_t last = text.find_first_not_of(c, first);
if (last == std::string::npos)
last = text.length();
std::string marked {};
marked.reserve(text.length() + 2UL);
marked += text.substr(0UL, first);
marked += '(';
marked += text.substr(first, last - first);
marked += ')';
marked += text.substr(last, text.length() - last);
return marked;
}
It can be used like this.
int
main()
{
const std::string example {"--A--B-CD----E-F---"};
std::random_device rnddev {};
std::default_random_engine rndengine {rnddev()};
const auto positions = find_conscutive_sequences(example, '-');
for (int i = 0; i < 10; ++i)
{
const auto pos = get_random_consecutive_sequence(positions, rndengine);
std::cout << mark_sequence(example, pos) << std::endl;
}
}
Possible output:
--A--B-CD(----)E-F---
--A--B(-)CD----E-F---
--A(--)B-CD----E-F---
--A(--)B-CD----E-F---
--A--B-CD(----)E-F---
--A--B-CD----E-F(---)
--A--B-CD----E-F(---)
(--)A--B-CD----E-F---
--A--B(-)CD----E-F---
(--)A--B-CD----E-F---
string::find() has an optional second parameter: a position to start the search from. So something like s.find("-", rand() % L) may do the trick for you, where L is (position of the last dash + 1).
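A minimal sketch of that idea, assuming std::mt19937 in place of rand() and a hypothetical helper name find_dash_from_random_start (neither is from the answer above). Note that the hit may land in the middle of a dash block, and blocks that follow a long run of letters are picked more often, so this is not a uniform choice among blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: returns the index of the first dash found at or after
// a randomly chosen start position, or std::string::npos if s has no dash.
std::size_t find_dash_from_random_start(const std::string& s, std::mt19937& rng)
{
    const std::size_t last_dash = s.rfind('-');
    if (last_dash == std::string::npos)
        return std::string::npos;
    // Start positions 0 .. last_dash, i.e. "rand() % L" with L = last_dash + 1,
    // so the forward search below always finds a dash.
    std::uniform_int_distribution<std::size_t> start(0, last_dash);
    const std::size_t pos = s.find('-', start(rng));
    assert(pos != std::string::npos);
    return pos;
}
```

Seed the engine once (for example from std::random_device) and reuse it for every call.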
As I understand the problem, all dash blocks should have the same probability of being selected. Therefore we must first find the positions where all these blocks start and then pick one of those positions at random.
If I'm allowed to use Smalltalk for pseudo code, then I would first find the indexes where every dash block starts:
dashPositionsOn: aString
| indexes i n |
indexes := OrderedCollection new.
i := 1.
n := aString size.
[i <= n] whileTrue: [| char |
char := aString at: i.
char = $-
ifTrue: [
indexes add: i.
[
i := i + 1.
i <= n and: [
char := aString at: i.
char = $-]] whileTrue]
ifFalse: [i := i + 1]].
^indexes
Now we can pick one of these indexes at random: indexes atRandom.
Please note that there are (much) better ways to implement this algorithm in Smalltalk (as well as in other languages).
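For readers who prefer C++, the same two steps could be sketched as follows. The function names dash_block_starts and pick_block_start are made up for this illustration, and the indexes are 0-based rather than 1-based as in Smalltalk.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Collects the 0-based index at which every block of consecutive dashes starts.
std::vector<std::size_t> dash_block_starts(const std::string& text)
{
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == '-' && (i == 0 || text[i - 1] != '-'))
            starts.push_back(i);  // a dash that is not preceded by a dash
    return starts;
}

// Picks one block start uniformly at random (the "indexes atRandom" step).
// Precondition: starts is not empty.
std::size_t pick_block_start(const std::vector<std::size_t>& starts, std::mt19937& rng)
{
    assert(!starts.empty());
    std::uniform_int_distribution<std::size_t> pick(0, starts.size() - 1);
    return starts[pick(rng)];
}
```

Because every block contributes exactly one index, each block is selected with the same probability regardless of its length.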

Find nearest points in a vector

Given a sorted vector with a number of values, as in the following example:
std::vector<double> f;
f.push_back(10);
f.push_back(100);
f.push_back(1000);
f.push_back(10000);
I'm looking for the most elegant way to retrieve for any double d the two values that are immediately adjacent to it. For example, given the value "45", I'd like this to return "10" and "100".
I was looking at lower_bound and upper_bound, but they don't do what I want. Can you help?
EDIT: I've decided to post my own answer, as it is something of a composite of all the helpful answers that I got in this thread. I've voted up those answers which I thought were most helpful.
Thanks everyone,
Dave
You can grab both values (if they exist) in one call with equal_range(). It returns a std::pair of iterators, with first being the first location and second being the last location in which you could insert the value passed without violating ordering. To strictly meet your criteria, you'd have to decrement the iterator in first, after verifying that it wasn't equal to the vector's begin().
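A rough sketch of how that might look. The helper name neighbours and the convention of returning end() for a missing neighbour are assumptions made for this example, not part of the question.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using ConstIt = std::vector<double>::const_iterator;

// Hypothetical helper: returns iterators to the nearest element below d and
// the nearest element above d. An iterator equal to v.end() means "no such
// neighbour". Precondition: v is sorted in ascending order.
std::pair<ConstIt, ConstIt> neighbours(const std::vector<double>& v, double d)
{
    assert(std::is_sorted(v.begin(), v.end()));
    const auto range = std::equal_range(v.begin(), v.end(), d);
    // range.first is the first element not less than d, so the element
    // before it (if there is one) is the lower neighbour.
    const ConstIt lower = (range.first == v.begin()) ? v.end() : range.first - 1;
    // range.second is the first element greater than d (or end()).
    return {lower, range.second};
}
```

With the vector from the question, neighbours(f, 45) yields iterators to 10 and 100. If d itself is stored in the vector, it is skipped and the values on either side of it are returned.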
You can use STL's lower_bound to get what you want in a few lines of code. lower_bound uses binary search under the hood, so your runtime is O(log n).
double val = 45;
double lower, upper;
std::vector<double>::iterator it;
it = lower_bound(f.begin(), f.end(), val);
if (it == f.begin()) upper = *it; // no smaller value than val in vector
else if (it == f.end()) lower = *(it-1); // no bigger value than val in vector
else {
lower = *(it-1);
upper = *it;
}
You could simply use a binary search, which will run in O(log(n)).
Here is a Lua snippet (I don't have time to do it in C++, sorry) which does what you want, except for boundary conditions (which you did not define anyway):
function search(value, list, first, last)
if not first then first = 1; last = #list end
if last - first < 2 then
return list[first], list[last]
end
local median = math.ceil(first + (last - first)/2)
if list[median] > value then
return search(value, list, first, median)
else
return search(value, list, median, last)
end
end
local list = {1,10,100,1000}
print(search(arg[1] + 0, list))
It takes the value to search from the command line :
$ lua search.lua 10 # didn't know what to do in this case
10 100
$ lua search.lua 101
100 1000
$ lua search.lua 99
10 100
I'm going to post my own answer, and vote up anyone who helped me reach it, since this is what I'll use in the end. Comments are welcome.
std::pair<value_type, value_type> GetDivisions(const value_type& from) const
{
if (m_divisions.empty())
throw 0; // Can't help you if we're empty.
std::vector<value_type>::const_iterator it =
std::lower_bound(m_divisions.begin(), m_divisions.end(), from);
if (it == m_divisions.end())
return std::make_pair(m_divisions.back(), m_divisions.back());
else if (it == m_divisions.begin())
return std::make_pair(m_divisions.front(), m_divisions.front());
else
return std::make_pair(*(it - 1), *(it));
}
What if (in your case) d is less than the first element or more than the last? And how do you deal with negative values? By the way, if you can guarantee that your "d" lies between the first and the last value of your vector, you can do it like this:
// Your initializations
std::vector<double>::const_iterator sit = f.begin();
double upper, lower;
Here is the rest:
while ( *sit < d ) // if the element is still less than your d
++sit; // increase your iterator
upper = *sit; // here you get the upper value
lower = *(--sit); // and here your lower
Elegant enough? :/
You could do a search in your vector for your value (which would tell you where your value would be if it were in the vector) and then return the values before and after that location. So searching for 45 would tell you it should be at index=1, and then you would return the elements at indices 0 and 1 (depending on your implementation of the search, you'll either get the index of the smaller value or the index of the larger value, but this is easy to check with a couple of boundary conditions). This should be able to run in O(log n), where n is the number of elements in your vector.
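An index-based sketch of this, assuming std::lower_bound as the search and a made-up helper name bracket_indices. Clamping out-of-range values to the first or last pair is just one possible choice for the boundary conditions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the indices (lo, hi) of the two elements that
// bracket d. Precondition: v is sorted and holds at least two elements.
// Values outside the stored range are clamped to the first or last pair.
std::pair<std::size_t, std::size_t> bracket_indices(const std::vector<double>& v, double d)
{
    assert(v.size() >= 2);
    // Index at which d would have to be inserted to keep v sorted.
    const std::size_t pos =
        static_cast<std::size_t>(std::lower_bound(v.begin(), v.end(), d) - v.begin());
    // Keep hi within [1, size - 1] so that hi - 1 is always a valid index.
    const std::size_t hi = std::min(std::max<std::size_t>(pos, 1), v.size() - 1);
    return {hi - 1, hi};
}
```

For the vector in the question, searching for 45 gives the insertion index 1, and the function returns the indices 0 and 1.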
I would write something like this. I didn't test whether it compiles, but you get the idea:
template <typename Iterator>
std::pair<Iterator, Iterator> find_best_pair(Iterator first, Iterator last, const typename Iterator::value_type & val)
{
std::pair<Iterator, Iterator> result(last, last);
typename Iterator::difference_type size = std::distance(first, last);
if (size == 2)
{
// if the container is of size 2, the answer is the two elements
result.first = first;
result.second = first;
++result.second;
}
else
{
// must be of at least size 3
if (size > 2)
{
Iterator second = first;
++second;
Iterator prev_last = last;
--prev_last;
Iterator it(std::lower_bound(second, last, val));
if (it != last)
{
result.first = it;
result.second = it;
if (it != prev_last)
{
// if this is not the previous last
// then the answer is (it, it + 1)
++result.second;
}
else
{
// if this is the previous last
// then the answer is (it - 1, it)
--result.first;
}
}
}
}
return result;
}
I wrote up this little function, which seems to fit the more general case you wanted. I haven't tested it totally, but I did write a little test code (included).
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>
template <class RandomAccessIt, class Container, class T>
std::pair<RandomAccessIt, RandomAccessIt> bracket_range(RandomAccessIt begin, RandomAccessIt end, Container& c, T val)
{
typename Container::iterator first;
typename Container::iterator second;
first = std::find(begin, end, val);
//Find the first value after this by iteration
second = first;
if (first == begin){ // Found the first element, so set this to end to indicate no lower values
first = end;
}
else if (first != end && first != begin) --first; //Set this to the first value before the found one, if the value was found
while (second != end && *second == val) ++second;
return std::make_pair(first,second);
}
int main()
{
std::vector<int> values;
std::pair<std::vector<int>::iterator, std::vector<int>::iterator> vals;
for (int i = 1; i < 9; ++i) values.push_back(i);
for (int i = 0; i < 10; ++i){
vals = bracket_range(values.begin(), values.end(),values, i);
if (vals.first == values.end() && vals.second == values.end()){ // Not found at all
std::cout << i << " is not in the container." << std::endl;
}
else if (vals.first == values.end()){ // No value lower
std::cout << i << ": " << "None Lower," << *(vals.second) << std::endl;
}
else if (vals.second == values.end()) { // No value higher
std::cout << i << ": " << *(vals.first) << ", None Higher" << std::endl;
}
else{
std::cout << i << ": " << *(vals.first) << "," << *(vals.second) << std::endl;
}
}
return 0;
}
Based on the code that tunnuz posted, here you have some improvements regarding bound checking:
template<typename T>
void find_enclosing_values(const std::vector<T> &vec, const T &value, T &lower, T &upper, const T &invalid_value)
{
typename std::vector<T>::const_iterator it = vec.begin();
while (it != vec.end() && *it < value)
++it;
if(it != vec.end())
upper = *it;
else
upper = invalid_value;
if(it == vec.begin())
lower = invalid_value;
else
lower = *(--it);
}
Example of usage:
std::vector<int> v;
v.push_back(3);
v.push_back(7);
v.push_back(10);
int lower, upper;
find_enclosing_values(v, 4, lower, upper, -1);
std::cout<<"lower "<<lower<<" upper "<<upper<<std::endl;
If you have the ability to use some other data structure (not a vector), I'd suggest a B-tree. Whether or not your data is unchanging, a lookup in such a tree takes logarithmic time in the worst case.
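The standard library has no B-tree, so the sketch below swaps in std::set, which common implementations build as a balanced binary search tree and which also gives logarithmic lookups. The helper name enclosing_values and its precondition are assumptions for this example.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <utility>

// Hypothetical helper: nearest stored values around d in an ordered set.
// Precondition: d lies strictly between the smallest and the largest element.
std::pair<double, double> enclosing_values(const std::set<double>& s, double d)
{
    assert(!s.empty() && *s.begin() < d && d < *s.rbegin());
    // Member lower_bound walks the tree: first element not less than d, O(log n).
    const auto upper = s.lower_bound(d);
    // The precondition guarantees upper is neither begin() nor end().
    return {*std::prev(upper), *upper};
}
```

Use the member function s.lower_bound(d) rather than std::lower_bound(s.begin(), s.end(), d): set iterators are not random access, so the free function needs a linear number of iterator steps.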