Removing all empty elements in a vector from end - c++

Given a std::vector of strings, what is the best way of removing all elements starting from the end that are empty (equal to the empty string or whitespace)? The removal of elements should stop when a non-empty element is found.
My current method (a work in progress) is something like:
while (Vec.size() > 0 && (Vec.back().size() == 0 || is_whitespace(Vec.back())))
{
Vec.pop_back();
}
where is_whitespace returns a bool stating whether a string consists only of whitespace.
I suspect that my method will resize the vector at each iteration and that is suboptimal. Maybe with some algorithm it is possible to do in one step.
Input: { "A", "B", " ", "D", "E", " ", "", " " }
Desired Output: { "A", "B", " ", "D", "E" }

As I did not find a good dupe on first glance, here is a simple solution:
// Helper function to see if string is all whitespace
// Can also be implemented as free-function for readability and
// reusability of course
auto stringIsWhitespace = [](const auto &str)
{
return std::all_of(
begin(str), end(str), [](unsigned char c) { return std::isspace(c); });
};
// Find first non-whitespace string from the back
auto it = std::find_if_not(rbegin(Vec), rend(Vec), stringIsWhitespace);
// Erase from there to the end
Vec.erase(it.base(), end(Vec));
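Note the unsigned char in the lambda: passing a negative char value to std::isspace is undefined behavior, so the character has to be converted to unsigned char first.
Live example thanks to @Killzone Kid.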
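For reference, the same idea can be packaged as a pair of free functions. This is only a sketch: the names is_blank and trim_trailing_blank are made up for illustration and do not appear in the question. It relies on std::all_of returning true for an empty range, so empty strings count as blank without a separate check.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// True if the string is empty or consists only of whitespace.
// (std::all_of returns true for an empty range.)
bool is_blank(const std::string& s)
{
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}

// Erase every trailing blank element with a single erase call.
// Hypothetical helper name, chosen for this sketch.
void trim_trailing_blank(std::vector<std::string>& v)
{
    // Walk backwards to the first element that is NOT blank.
    auto last_kept = std::find_if_not(v.rbegin(), v.rend(), is_blank);
    // base() converts the reverse iterator to the forward iterator
    // just past that element, which is where the erase should start.
    v.erase(last_kept.base(), v.end());
}
```

Calling trim_trailing_blank on the input from the question should leave the five elements "A", "B", " ", "D", "E". Erasing from the end of a vector does not reallocate its storage, so the pop_back loop in the question is not as costly as feared; the single erase is mostly a matter of clarity.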

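Here's another way, using a loop over reverse iterators:
for (auto it = Vec.rbegin(); it != Vec.rend() && is_whitespace(*it); )
{
// erase() takes a forward iterator; std::next(it).base() refers to the same element as *it
it = decltype(it)(Vec.erase(std::next(it).base()));
}
It will start from the end and stop once non-whitespace has been encountered or the beginning of the vector is reached, whichever comes first. Note that I don't increment the iterator in the for loop, because the iterator returned by erase is the one to continue from.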

Related

Split list when predicate is true

Does Kotlin provide a mutation function to split a list when a specific predicate is true?
In the following example the list should be split when the element is a ".".
The result should be of the type List<List<String>>.
// input list
val list = listOf(
"This is", "the", "first sentence", ".",
"And", "now there is", "a second", "one", ".",
"Nice", "."
)
// the following should be the result of the transformation
listOf(
listOf("This is", "the", "first sentence"),
listOf("And", "now there is", "a second", "one"),
listOf("Nice")
)
I need something like list.splitWhen { it == "." }
Does Kotlin provide a mutation function to split a list when a
specific predicate is true?
The closest one I have heard of is partition(), however I don't think it will work in your case.
I have written and briefly tested 3 higher-order extension functions, which all give the expected output.
Solution 1: Straightforward approach
inline fun List<String>.splitWhen(predicate: (String)->Boolean):List<List<String>> {
val list = mutableListOf<MutableList<String>>()
var needNewList = false
forEach {
string->
if(!predicate(string)){
if(needNewList||list.isEmpty()){
list.add(mutableListOf(string))
needNewList= false
}
else {
list.last().add(string)
}
}
else {
/* When a delimiter is found */
needNewList = true
}
}
return list
}
Solution 2: Pair based approach
inline fun List<String>.splitWhen(predicate: (String)->Boolean):List<List<String>> {
val list = mutableListOf<List<String>>()
withIndex()
.filter { indexedValue -> predicate(indexedValue.value) || indexedValue.index==0 || indexedValue.index==size-1} // Just getting the delimiters with their index; Include 0 and last -- so to not ignore it while pairing later on
.zipWithNext() // zip the IndexValue with the adjacent one so to later remove continuous delimiters; Example: Indices : 0,1,2,5,7 -> (0,1),(1,2),(2,5),(5,7)
.filter { pair-> pair.first.index + 1 != pair.second.index } // Getting rid of continuous delimiters; Example: (".",".") will be removed, where "." is the delimiter
.forEach{pair->
val startIndex = if(predicate(pair.first.value)) pair.first.index+1 else pair.first.index // Trying to not consider delimiters
val endIndex = if(!predicate(pair.second.value) && pair.second.index==size-1) pair.second.index+1 else pair.second.index // subList() endIndex is exclusive
list.add(subList(startIndex,endIndex)) // Adding the relevant sub-list
}
return list
}
Solution 3: Check next value if delimiter found approach
inline fun List<String>.splitWhen(predicate: (String)-> Boolean):List<List<String>> =
foldIndexed(mutableListOf<MutableList<String>>(),{index, list, string->
when {
predicate(string) -> if(index<size-1 && !predicate(get(index+1))) list.add(mutableListOf()) // Adds a new List within the output List; To prevent continuous delimiters -- !predicate(get(index+1))
list.isNotEmpty() -> list.last().add(string) // Just adding it to lastly added sub-list, as the string is not a delimiter
else -> list.add(mutableListOf(string)) // Happens for the first String
}
list})
Simply call list.splitWhen{it=="delimiter"}. Solution 3 is the most concise of the three. Beyond that, you can run a performance test to check which one performs best.
Note: I have done some brief tests which you can have a look via Kotlin Playground or via Github gist.

How can I find the first word in a vector of strings that matches a user given prefix?

Let's say I have a sorted vector of strings:
std::vector<std::string> Dictionary;
Dictionary.push_back("ant");
Dictionary.push_back("anti-matter");
Dictionary.push_back("matter");
Dictionary.push_back("mate");
Dictionary.push_back("animate");
Dictionary.push_back("animal");
std::sort(Dictionary.begin(), Dictionary.end());
I want to find the first word in the vector that matches a prefix, but every example I found uses a hard-coded string as the prefix. For example, I can define a boolean unary function for finding the "an" prefix:
bool find_prefix(const std::string &S) {
// compare() returns 0 when the compared ranges are equal
return S.compare(0, 2, "an") == 0;
}
and use it as the predicate of the std::find_if() function to find an iterator to the first match. But how can I search for user given string as a prefix? Is it possible to use binary predicates in some way? Or build a "pseudo-unary" predicate that depends on a variable and a parameter?
Or, is there any other container and methods that I should use in this problem?
I know that there are much more efficient and elegant structures to store a dictionary for prefix search, but I'm a beginner self-learning programming, so first I'd like to learn how to use the standard containers before adventuring in more complex structures.
You can write find_prefix as a lambda. That lets you capture the string you want to search for, and use that for the comparison:
string word = ... // the prefix you're looking for
auto result = std::find_if(Dictionary.begin(), Dictionary.end(),
[&word](string const &S) {
return ! S.compare(0, word.length(), word);
});
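As a self-contained sketch of the capturing-lambda approach, the search can be wrapped in a function that takes the prefix as a parameter. The name find_first_with_prefix is hypothetical and only used here for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns an iterator to the first word that starts with `prefix`,
// or words.end() if there is none. Linear scan; no sorting required.
std::vector<std::string>::const_iterator
find_first_with_prefix(const std::vector<std::string>& words,
                       const std::string& prefix)
{
    return std::find_if(words.begin(), words.end(),
        [&prefix](const std::string& s) {
            // compare() returns 0 when the first prefix.size()
            // characters of s are equal to prefix.
            return s.compare(0, prefix.size(), prefix) == 0;
        });
}
```

The caller should compare the result against words.end() before dereferencing it, since that is what a failed search returns.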
Since you are sorting the vector, you should take advantage of the fact that it is sorted.
Rather than doing a linear search for a match, you can use std::lower_bound to put you close to, if not right on the entry that matches the prefix:
#include <vector>
#include <string>
#include <iostream>
#include <algorithm>
int main()
{
std::vector<std::string> Dictionary;
Dictionary.push_back("ant");
Dictionary.push_back("anti-matter");
Dictionary.push_back("matter");
Dictionary.push_back("mate");
Dictionary.push_back("animate");
Dictionary.push_back("animal");
std::sort(Dictionary.begin(), Dictionary.end());
std::vector<std::string> search_test = {"an", "b", "ma", "m", "x", "anti"};
for (auto& s : search_test)
{
auto iter = std::lower_bound(Dictionary.begin(), Dictionary.end(), s);
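// lower_bound returns end() when every entry sorts before s, so check that
// before dereferencing, then see if the item returned actually is a match
if ( iter != Dictionary.end() && iter->size() >= s.size() && iter->substr(0, s.size()) == s )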
std::cout << "The string \"" << s << "\" has a match on \"" << *iter << "\"\n";
else
std::cout << "no match for \"" << s << "\"\n";
}
}
Output:
The string "an" has a match on "animal"
no match for "b"
The string "ma" has a match on "mate"
The string "m" has a match on "mate"
no match for "x"
The string "anti" has a match on "anti-matter"
The test after the lower_bound is done to see if the string actually matches the one found by lower_bound.
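The binary-search variant can be sketched as a reusable function too. The name find_prefix_sorted is an assumption made for this example, and the function is only correct if the vector is sorted with the default operator<, as in the question. Note the check against end(): lower_bound returns end() when every entry sorts before the prefix, and that iterator must not be dereferenced.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Requires `sorted` to be sorted with the default operator<.
// Returns an iterator to the first word starting with `prefix`,
// or sorted.end() if no word has that prefix.
std::vector<std::string>::const_iterator
find_prefix_sorted(const std::vector<std::string>& sorted,
                   const std::string& prefix)
{
    // First element that does not sort before `prefix`.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), prefix);
    // Guard against end() before looking at the element itself.
    if (it != sorted.end() && it->compare(0, prefix.size(), prefix) == 0)
        return it;
    return sorted.end();
}
```

This works because any word that starts with the prefix sorts at or after the prefix itself, so if a match exists, the first one is exactly where lower_bound lands.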

Using regex to parse out numbers

My problem is more or less self-explanatory, I want to write a regex to parse out numbers out of a string that user enters via console. I take the user input using:
getline(std::cin,stringName); //1 2 3 4 5
I assume that the user enters N numbers, each followed by a space except the last one.
I have solved this problem by analyzing string char by char like this:
std::string helper = "";
std::for_each(stringName.cbegin(), stringName.cend(), [&](char c)
{
if (c == ' ')
{
intVector.push_back(std::stoi(helper.c_str()));
helper = "";
}
else
helper += c;
});
intVector.push_back(std::stoi(helper.c_str()));
I want to achieve the same behavior by using regex. I've written the following code:
std::regex rx1("([0-9]+ )");
std::sregex_iterator begin(stringName.begin(), stringName.end(), rx1);
std::sregex_iterator end;
while (begin != end)
{
std::smatch sm = *begin;
int number = std::stoi(sm.str(1));
std::cout << number << " ";
}
Problem with this regex occurs when it gets to the last number since it doesn't have space behind it, therefore it enters an infinite loop. Can someone give me an idea on how to fix this?
You're going to get an endless loop there because you never increment begin. If you do that, you'll get all the numbers except the last one (which, as you say, is not followed by a space).
But I don't understand why you feel it necessary to include the whitespace in the regular expression. If you just match a string of digits, the regex will automatically select the longest possible match, so the following character (if any) cannot be a digit.
I also see no value in the capture in the regex. If you wanted to restrict the capture to the number itself, you would have used ([0-9]+). (But since stoi only converts until it finds a non-digit, it doesn't matter.)
So you just use this:
std::regex rx1("[0-9]+");
for (auto it = std::sregex_iterator{str.begin(), str.end(), rx1},
end = std::sregex_iterator{};
it != end;
++it) {
std::cout << std::stoi(it->str(0)) << '\n';
}
(Live on coliru)
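To collect the numbers into a vector instead of printing them, the same loop can be wrapped in a function. This is a sketch only: parse_numbers is a hypothetical name, and it assumes non-negative integers that fit in an int (std::stoi throws std::out_of_range otherwise).

```cpp
#include <regex>
#include <string>
#include <vector>

// Extracts every run of decimal digits from `line` as an int.
// Assumes each run fits in an int; std::stoi throws otherwise.
std::vector<int> parse_numbers(const std::string& line)
{
    static const std::regex digits("[0-9]+");
    std::vector<int> numbers;
    // A default-constructed sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(line.begin(), line.end(), digits), end;
         it != end; ++it)
    {
        numbers.push_back(std::stoi(it->str()));
    }
    return numbers;
}
```

Because the pattern matches only digits, it does not matter whether the separators are single spaces, several spaces, or other non-digit characters.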

Recursive solutions for glob pattern matching

I'm currently studying implementations of UNIX-style glob pattern matching. Generally, the fnmatch library is a good reference implementation for this functionality.
Looking at some of the implementations, as well as reading various blogs/tutorials about this, it seems that this algorithm is usually implemented recursively.
Generally, any sort of algorithm that requires "back tracking", or requires returning to an earlier state, nicely lends itself to a recursive solution. This includes things like tree traversal, or parsing nested structures.
But I'm having trouble understanding why glob pattern matching in particular is so often implemented recursively. I get the idea that sometimes back tracking will be necessary, for example if we have a string aabaabxbaab, and a pattern a*baab, the characters after the * will match the first "baab" substring, like aa(baab)xbaab, and then fail to match the rest of the string. So the algorithm will need to backtrack so that the character match counter starts over, and we can match the second occurrence of baab, like: aabaabx(baab).
Okay, but generally recursion is used when we might require multiple nested levels of backtracking, and I don't see how that would be necessary in this case. It seems we'd only ever have to backtrack one level at a time, when the iterator over the pattern and the iterator over the input string fail to match. At this point, the iterator over the pattern would need to move back to the last "save point", which would either be the beginning of the string, or the last * that successfully matched something. This doesn't require a stack - just a single save point.
The only complication I can think of is in the event of an "overlapping" match. For example if we have the input string aabaabaab, and the pattern a*baab, we would fail to match because the "b" in the last baab could be part of either the first match or the second match. But it seems this could be solved by simply backtracking the input iterator if the distance between the last pattern iterator save point and the end of the pattern is greater than the distance between the input iterator position and the end of the input string.
So, as far as I'm seeing, it shouldn't be too much of an issue to implement this glob matching algorithm iteratively (assuming a very simple glob matcher, which only uses the * character to mean "match zero or more characters". Also, the matching strategy would be greedy.)
So, I assume I'm definitely wrong about this, because everyone else does this recursively - so I must be missing something. It's just that I can't see what I'm missing here. So I implemented a simple glob matcher in C++ (that only supports the * operator), to see if I could figure out what I'm missing. This is an extremely straightforward, simple iterative solution which just uses an inner loop to do the wildcard matching. It also records the indices which the * character matches in a vector of pairs:
bool match_pattern(const std::string& pattern, const std::string& input,
std::vector<std::pair<std::size_t, std::size_t>>& matches)
{
const char wildcard = '*';
auto pat = std::begin(pattern);
auto pat_end = std::end(pattern);
auto it = std::begin(input);
auto end = std::end(input);
while (it != end && pat != pat_end)
{
const char c = *pat;
if (*it == c)
{
++it;
++pat;
}
else if (c == wildcard)
{
matches.push_back(std::make_pair(std::distance(std::begin(input), it), 0));
++pat;
if (pat == pat_end)
{
matches.back().second = input.size();
return true;
}
auto save = pat;
std::size_t matched_chars = 0;
while (it != end && pat != pat_end)
{
if (*it == *pat)
{
++it;
++pat;
++matched_chars;
if (pat == pat_end && it != end)
{
pat = save;
matched_chars = 0;
// Check for an overlap and back up the input iterator if necessary
//
std::size_t d1 = std::distance(it, end);
std::size_t d2 = std::distance(pat, pat_end);
if (d2 > d1) it -= (d2 - d1);
}
}
else if (*pat == wildcard)
{
break;
}
else
{
if (pat == save) ++it;
pat = save;
matched_chars = 0;
}
}
matches.back().second = std::distance(std::begin(input), it) - matched_chars;
}
else break;
}
return it == end && pat == pat_end;
}
Then I wrote a series of tests to see if I could find some pattern or input string that would require multiple levels of backtracking (and therefore recursion or a stack), but I can't seem to come up with anything.
Here is my test function:
void test(const std::string& input, const std::string& pattern)
{
std::vector<std::pair<std::size_t, std::size_t>> matches;
bool result = match_pattern(pattern, input, matches);
auto match_iter = matches.begin();
std::cout << "INPUT: " << input << std::endl;
std::cout << "PATTERN: " << pattern << std::endl;
std::cout << "INDICES: ";
for (auto& p : matches)
{
std::cout << "(" << p.first << "," << p.second << ") ";
}
std::cout << std::endl;
if (result)
{
std::cout << "MATCH: ";
for (std::size_t idx = 0; idx < input.size(); ++idx)
{
if (match_iter != matches.end())
{
if (idx == match_iter->first) std::cout << '(';
else if (idx == match_iter->second)
{
std::cout << ')';
++match_iter;
}
}
std::cout << input[idx];
}
if (!matches.empty() && matches.back().second == input.size()) std::cout << ")";
std::cout << std::endl;
}
else
{
std::cout << "NO MATCH!" << std::endl;
}
std::cout << std::endl;
}
And my actual tests:
test("aabaabaab", "a*b*ab");
test("aabaabaab", "a*");
test("aabaabaab", "aa*");
test("aabaabaab", "aaba*");
test("/foo/bar/baz/xlig/fig/blig", "/foo/*/blig");
test("/foo/bar/baz/blig/fig/blig", "/foo/*/blig");
test("abcdd", "*d");
test("abcdd", "*d*");
test("aabaabqqbaab", "a*baab");
test("aabaabaab", "a*baab");
So this outputs:
INPUT: aabaabaab
PATTERN: a*b*ab
INDICES: (1,2) (3,7)
MATCH: a(a)b(aaba)ab
INPUT: aabaabaab
PATTERN: a*
INDICES: (1,9)
MATCH: a(abaabaab)
INPUT: aabaabaab
PATTERN: aa*
INDICES: (2,9)
MATCH: aa(baabaab)
INPUT: aabaabaab
PATTERN: aaba*
INDICES: (4,9)
MATCH: aaba(abaab)
INPUT: /foo/bar/baz/xlig/fig/blig
PATTERN: /foo/*/blig
INDICES: (5,21)
MATCH: /foo/(bar/baz/xlig/fig)/blig
INPUT: /foo/bar/baz/blig/fig/blig
PATTERN: /foo/*/blig
INDICES: (5,21)
MATCH: /foo/(bar/baz/blig/fig)/blig
INPUT: abcdd
PATTERN: *d
INDICES: (0,4)
MATCH: (abcd)d
INPUT: abcdd
PATTERN: *d*
INDICES: (0,3) (4,5)
MATCH: (abc)d(d)
INPUT: aabaabqqbaab
PATTERN: a*baab
INDICES: (1,8)
MATCH: a(abaabqq)baab
INPUT: aabaabaab
PATTERN: a*baab
INDICES: (1,5)
MATCH: a(abaa)baab
The parentheses that appear in the output after "MATCH: " show the substrings that were matched/consumed by each * character. So, this seems to work fine, and I can't seem to see why it would be necessary to backtrack multiple levels here - at least if we limit our pattern to only allow * characters, and we assume greedy matching.
So I assume I'm definitely wrong about this, and probably overlooking something simple - can someone help me to see why this algorithm might require multiple levels of backtracking (and therefore recursion or a stack)?
I didn't check all the details of your implementation, but it is certainly true that you can do the match without recursive backtracking.
You can actually do glob matching without backtracking at all by building a simple finite-state machine. You could translate the glob into a regular expression and build a DFA in the normal way, or you could use something very similar to the Aho-Corasick machine; if you tweaked your algorithm a little bit, you'd achieve the same result. (The key is that you don't actually need to back up the input scan; you just need to figure out the correct scan state, which can be precomputed.)
The classic fnmatch implementations are not optimized for speed; they're based on the assumption that patterns and target strings are short. That assumption is usually reasonable, but if you allow untrusted patterns, you're opening yourself up to a DoS attack. And based on that assumption, an algorithm similar to the one you present, which does not require precomputation, is probably faster in the vast majority of use cases than any algorithm which requires precomputing state transition tables while avoiding the exponential blowup with pathological patterns.
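To illustrate that a single save point is enough when * is the only metacharacter, here is a minimal iterative sketch. It is not the code from the question or from any fnmatch implementation; the name glob_match is made up, and unlike the question's matcher it lets each * consume as little as possible and does not record the matched ranges.

```cpp
#include <cstddef>
#include <string>

// Iterative matcher supporting only '*' (zero or more characters).
// Keeps one save point: the most recent '*' and the input position
// where that '*' is currently assumed to stop.
bool glob_match(const std::string& pattern, const std::string& input)
{
    std::size_t p = 0, i = 0;
    std::size_t star = std::string::npos; // index of the most recent '*'
    std::size_t mark = 0;                 // input index tied to that '*'

    while (i < input.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;   // remember the '*', first try matching nothing
            mark = i;
        } else if (p < pattern.size() && pattern[p] == input[i]) {
            ++p;          // literal match, advance both
            ++i;
        } else if (star != std::string::npos) {
            p = star + 1; // mismatch: let the last '*' absorb one more
            i = ++mark;   // character and retry from there
        } else {
            return false; // mismatch with no '*' to fall back on
        }
    }
    // Input exhausted: only trailing '*' may remain in the pattern.
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}
```

Earlier save points never need to be revisited: once a later * has been reached, everything before it has already matched at the earliest possible position, which leaves the most input for the rest of the pattern. The retry loop can still rescan input, so the worst case is roughly pattern length times input length, not exponential.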

How do I detect "_" in a C++ string?

I want to know the positions of the "_" in a string:
string str("BLA_BLABLA_BLA.txt");
Something like:
string::iterator it;
for ( it=str.begin() ; it < str.end(); it++ ){
if (*it == "_") //this goes wrong: pointer and integer comparison
{
pos(1) = it;
}
cout << *it << endl;
}
Thanks,
André
Note that "_" is a string literal, while '_' is a character literal.
If you dereference an iterator into a string, what you get is a character. Of course, characters can only be compared to character literals, not to string literals.
However, as others have already noticed, you shouldn't implement such an algorithm yourself. It's been done a million times, two of which (std::string::find() and std::find()) ended up in the C++ standard library. Use one of those.
std::find(str.begin(), str.end(), '_');
// ^Single quote!
string::find is your friend.
http://www.cplusplus.com/reference/string/string/find/
someString.find('_');
Why don't you use the find method: http://www.cplusplus.com/reference/string/string/find/
You can make use of the find function as:
string str = "BLA_BLABLA_BLA.txt";
size_t pos = -1;
while( (pos=str.find("_",pos+1)) != string::npos) {
cout<<"Found at position "<<pos<<endl;
}
Output:
Found at position 3
Found at position 10
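If the positions are needed later instead of just printed, the same find loop can collect them into a vector. A small sketch, with find_all as a hypothetical helper name; it starts from the first match directly, so it does not rely on the wrap-around of pos = -1 used above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of every occurrence of `needle` in `text`.
std::vector<std::size_t> find_all(const std::string& text, char needle)
{
    std::vector<std::size_t> positions;
    // find() returns std::string::npos when there is no further match.
    for (std::size_t pos = text.find(needle);
         pos != std::string::npos;
         pos = text.find(needle, pos + 1))
    {
        positions.push_back(pos);
    }
    return positions;
}
```

For the string from the question, find_all(str, '_') should return the positions 3 and 10.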