Starting loop at specific index of a std::string? - c++

I wrote the following function:
std::regex r("");
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {}
It starts looking for regex matches from the beginning of the given string (str). But how can I tell it to skip everything before a specific index?
For example, I want it to deal only with what comes after index 4 (not including index 4 itself).
Note: I am calling this code from another function, so I tried passing something like str + 4 as the string argument, but I got an error saying it is not an l-value.

If I understand your question correctly, you can pass the function a parameter holding the position where the search should start, and use it to set the iterator:
void func_str(const std::string& str, int pos)
{
std::regex r("\\{[^}]*\\}");
auto words_begin =
std::sregex_iterator(str.begin() + pos, str.end(), r);
//...
}
int main()
{
std::string str = "somestring";
func_str(str, 4);
}
Or pass the iterators themselves, one to the position you'd like to start the search and one to the end of the string:
void func_str(std::string::iterator it_begin, std::string::iterator it_end)
{
std::regex r("\\{[^}]*\\}");
auto words_begin =
std::sregex_iterator(it_begin, it_end, r);
//...
}
int main()
{
std::string str = "somestring";
func_str(str.begin() + 4, str.end());
}
As @bruno correctly stated, you may pass str.substr(4), not str + 4, as the argument instead of the original string. The downside of this method is that it creates an unnecessary copy of the string to be searched, as @Marek also correctly pointed out, so passing a position or a pair of begin and end iterators is less expensive. The upside is that you would not have to change anything in the function.
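A minimal sketch of the position-based option, written as a complete function. The name matches_from and the choice to return (position, text) pairs are assumptions made for illustration, not part of the original answer. It shows two details the fragments above leave out: checking that the offset does not run past the end of the string, and adding the offset back, because position() is reported relative to the iterator where the search started.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects every "{...}" match that starts at or after
// index `pos`, with positions expressed relative to the whole string.
std::vector<std::pair<std::size_t, std::string>>
matches_from(const std::string& str, std::size_t pos)
{
    std::vector<std::pair<std::size_t, std::string>> out;
    if (pos > str.size())   // begin() + pos would point past the end
        return out;

    static const std::regex r("\\{[^}]*\\}");
    const auto first = str.begin() + static_cast<std::string::difference_type>(pos);
    for (std::sregex_iterator it(first, str.end(), r), end; it != end; ++it) {
        // position() is relative to `first`, so add `pos` back.
        out.emplace_back(pos + static_cast<std::size_t>(it->position()), it->str());
    }
    return out;
}
```

Calling matches_from("{1}, {2} and {3}", 4) skips the first match and returns the other two with their indices in the full string. The matched text is copied into the result, so the returned vector stays valid after the searched string goes away.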

I suggest checking std::smatch::position() to decide whether a match should be kept or discarded:
#include <iostream>
#include<regex>
int main() {
std::regex r("\\{[^}]*\\}");
std::string str("{1}, {2} and {3}");
auto words_begin =
std::sregex_iterator(str.begin(), str.end(), r);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
std::smatch m = *i;
if (m.position() > 4) {
std::cout << m.str() << std::endl;
}
}
return 0;
}
Adjust the if condition as needed.
Here, the first match, {1}, is discarded because its position (0) is not greater than 4.

Related

remove element by position in a vector<string> in c++

I have been trying to remove the value False and 0;0 from a vector<string> plan; containing the following
1003;2021-03-09;False;0;0;1678721F
1005;2021-03-05;False;0;0;1592221D
1005;2021-03-06;False;0;0;1592221D
1003;2021-03-07;False;0;0;1592221D
1003;2021-03-08;False;0;0;1592221D
1004;2021-03-09;False;0;0;1592221D
1004;2021-03-10;False;0;0;1592221D
1001;2021-03-11;False;0;0;1592221D
but the solutions I have found only work with int, and I tried the following
remove(plan.begin(), plan.end(), "False");
I also tried erase, but it didn't work.
What mistake am I making, and how can I remove the values I want, which are at positions [2], [3] and [4]? Thanks for any help.
[Note: With the assumption 1003;2021-03-09;False;0;0;1678721F corresponding to a row inside std::vector<string>]
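std::remove: moves the elements that do not compare equal to the given value to the front of the range and returns an iterator to the new logical end; it does not shrink the container. To actually delete the elements, pass that iterator to std::vector::erase, which removes either a single element (position) or a range of elements ([first, last)).
If std::vector<std::string> plan contains an element equal to "False", that element is removed:
std::vector < std::string > plan =
{
"1003","2021-03-09","False","0;0","1678721F"
};
plan.erase(std::remove(plan.begin(), plan.end(), "False"), plan.end());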
In your case you need to remove a given sub-string from each row of plan. Iterate over all the rows and remove the value with std::string::erase.
std::vector < std::string > plan =
{
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"};
for (auto & e:plan)
{
//As position of False;0;0; is at a fixed index, i.e: from index:16, 10 characters are removed
e.erase (16, 10);
}
To generalize, you can use std::string::find to locate a sub-string and then erase it.
void removeSubstrs(std::string& s, const std::string& p) {
if (p.empty()) return; // an empty pattern would otherwise match forever
std::string::size_type n = p.length();
for (std::string::size_type i = s.find(p);
i != std::string::npos;
i = s.find(p))
s.erase(i, n);
}
int
main ()
{
std::vector < std::string > plan =
{
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"};
for (auto & e:plan)
{
removeSubstrs (e, ";False;0;0");
}
for (auto e:plan)
std::cout << e << std::endl;
return 0;
}
[Note: This answer assumes that each line corresponds to an element in the vector]
With the statement
remove(plan.begin(), plan.end(), "False");
you try to remove all elements from the vector that are equal to "False". No element is equal to "False" (each element is a whole line), so nothing matches; and even when something does match, std::remove only reorders the elements and must be followed by erase.
You need to iterate over the vector and erase the sub-string from each string in it.
For example, you can use a range-based for loop to iterate over all the strings (or rather references to them), then use the std::string function find to locate the sub-strings you want to remove, and replace to replace them with empty strings (i.e. nothing).
If you are sure that there is only one occurrence of "False" and "0;0" in each string, you can use something like this:
std::string EraseFirstSubString(
const std::string & main_str,
const std::string & sub_str)
{
std::string new_main_str = main_str;
size_t pos = new_main_str.find(sub_str);
if (pos != std::string::npos)
{
new_main_str.erase(pos, sub_str.length());
}
return new_main_str;
}
int main()
{
std::vector<std::string> plan = {
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"
};
for (std::string & str : plan)
{
str = EraseFirstSubString(str, "False");
str = EraseFirstSubString(str, "0;0");
}
}
But if you think there may be several occurrences of those sub-strings, you should extend the removal mechanism a little, like this:
std::string EraseSubStrings(
const std::string & main_str,
const std::string & sub_str)
{
std::string new_main_str = main_str;
size_t pos = new_main_str.find(sub_str);
while (pos != std::string::npos)
{
new_main_str.erase(pos, sub_str.length());
pos = new_main_str.find(sub_str);
}
return new_main_str;
}
If you already have a vector of individual std::string objects, you can easily use the operations that the strings library offers.
#include <algorithm>
#include <vector>
#include <string>
// before C++20 change constexpr to inline
constexpr void change(std::vector<std::string>& sv, std::string const& rem) {
std::for_each(begin(sv), end(sv), [&rem](std::string& s) {
s.erase(std::min(s.size(),s.find(rem)), rem.size());
});
}
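The question asks about the values at positions [2], [3] and [4], whereas the answers above search for the literal text. A hedged sketch of a position-based alternative follows: split each row on ';', skip the unwanted field indices, and join the rest again. The helper name drop_fields and its parameters are assumptions introduced here, not something from the answers.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: rebuilds `row` without the fields whose zero-based
// index lies in [first, last]; fields are separated by `sep`.
// Note: std::getline does not report a trailing empty field, so a row that
// ends with the separator loses that final separator.
std::string drop_fields(const std::string& row, std::size_t first,
                        std::size_t last, char sep = ';')
{
    std::istringstream in(row);
    std::string field, out;
    bool wrote_any = false;
    for (std::size_t i = 0; std::getline(in, field, sep); ++i) {
        if (i >= first && i <= last)
            continue;               // skip the unwanted columns
        if (wrote_any)
            out += sep;             // separator between kept fields only
        out += field;
        wrote_any = true;
    }
    return out;
}
```

Applied to each element of plan, for example with for (auto& e : plan) e = drop_fields(e, 2, 4);, this removes the three fields regardless of what they contain, which may matter if a row holds True or non-zero values.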

std::smatch str() not returning correct string

I decided to make my own regex.h containing a class with some methods for an easier way to check, and parse strings using regexes.
The first version of my .h included just some methods, which worked just fine. Later, I decided to organize all those methods in one class, everything worked fine, but, at some point, the "match_str" method started returning strings that were of the correct length, but only consisting of "|" characters, for some reason.
This is the whole regex.h file:
#include <string>
#include <regex>
#include <vector>
class regex {
std::vector<std::smatch> match;
public:
regex(std::string);
std::regex r;
int generate_matches(std::string s) {
auto matches_begin = std::sregex_iterator(s.begin(), s.end(), r);
auto matches_end = std::sregex_iterator();
for (std::sregex_iterator i = matches_begin; i != matches_end; ++i) { match.push_back(*i); }
return match.size();
}
bool matches(std::string s) {
return std::regex_search(s, r);
}
int match_count() {
return match.size();
}
std::string match_str(int index = 0, int group = 0) {
return match.size() ? match.at(index)[group].str() : "";
}
int match_pos(int index = 0) {
return match.at(index).position() + 1;
}
}; regex::regex(std::string regex) : r(regex) {}
Everything but the "match_str" method seems to work fine
This code:
int main() {
regex rx("(int|long)( +)([a-z]);");
if (rx.generate_matches("int a; int b; int c;")) {
std::cout << rx.match_str() + "\n";
}
system("pause");
}
Outputs:
¦¦¦¦¦¦
Press any key to continue . . .
std::match_results objects store const iterators (or const char* pointers) into the string that was searched. In generate_matches, s is a local variable, so it is destroyed when the function returns. The std::smatch objects stored in the vector then hold dangling iterators, and reading the matched text through them is undefined behaviour.
You can add a member variable to your regex class that keeps the searched string alive, and change generate_matches as follows:
class regex {
std::vector<std::smatch> match;
std::string str; // <--- owns the searched text for as long as the matches are used
public:
// ... constructor and the other members are unchanged
int generate_matches(std::string s) {
str = s; // <---
auto matches_begin = std::sregex_iterator(str.begin(), str.end(), r); // <---
auto matches_end = std::sregex_iterator();
for (std::sregex_iterator i = matches_begin; i != matches_end; ++i) { match.push_back(*i); }
return match.size();
}
};
Now you can call match_str and read the match vector, because the smatch objects refer to the member str, which still exists, rather than to a destroyed local.
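Another way to avoid the dangling iterators, sketched here under the assumption that only the matched text is needed later, is to copy each sub-match into a std::string while the searched string is still alive. The helper name match_strings is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of capture group `group` for every
// match of `r` in `s`. The results are independent copies, so they remain
// valid after `s` is destroyed.
std::vector<std::string> match_strings(const std::string& s, const std::regex& r,
                                       std::size_t group = 0)
{
    std::vector<std::string> out;
    for (std::sregex_iterator it(s.begin(), s.end(), r), end; it != end; ++it) {
        if (group < it->size())             // ignore a group index the regex lacks
            out.push_back((*it)[group].str());
    }
    return out;
}
```

The trade-off is that positions and the other std::smatch queries are no longer available afterwards; if those are needed, keeping the searched string alive as described above is the simpler route.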

C++ STL splitting string at comma

I am aware of several related questions, such as "Parsing a comma-delimited std::string". However, I have written code that fits my specific need: to split a string (read from a file) at commas, stripping any whitespace. Later I want to convert these substrings to double and store them in a std::vector. Not all operations are shown. Here is the code.
#include "stdafx.h"
#include<iostream>
#include<string>
#include<vector>
#include<algorithm>
int main()
{
std::string str1 = " 0.2345, 7.9 \n", str2;
str1.erase(remove_if(str1.begin(), str1.end(), isspace), str1.end()); //remove whitespaces
std::string::size_type pos_begin = { 0 }, pos_end = { 0 };
while (str1.find_first_of(",", pos_end) != std::string::npos)
{
pos_end = str1.find_first_of(",", pos_begin);
str2 = str1.substr(pos_begin, pos_end- pos_begin);
std::cout << str2 << std::endl;
pos_begin = pos_end+1;
}
}
Output:
0.2345
7.9
So the program goes like this: the while loop searches for an occurrence of ,; pos_end stores the first occurrence of ,; str2 is the substring; and pos_begin moves to one past pos_end. The first iteration runs fine.
In the next iteration, pos_end will be a very large value (std::string::npos), and I am not sure what pos_end - pos_begin will be. The same goes for pos_begin (though it will be unused). Is adding a check such as
if (pos_end == std::string::npos)
pos_end = str1.length();
a way to go?
The program works, though (g++ -Wall -Wextra prog.cpp -o prog -std=c++11). Is this approach correct?
Your erase idiom may fail to compile on more modern compilers because std::isspace is overloaded (in <cctype> and in <locale>), so the bare name is ambiguous as a predicate. At some point, removing whitespace with a range-for loop might be more efficient.
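A small sketch of one common way around the overload problem: wrap the call in a lambda so there is nothing left to deduce. The function name strip_whitespace is an assumption for illustration. The lambda takes unsigned char because passing a negative char value to std::isspace is undefined behaviour.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `s` with every whitespace
// character removed (erase-remove idiom with an unambiguous predicate).
std::string strip_whitespace(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}
```

For the string in the question, strip_whitespace(" 0.2345, 7.9 \n") yields "0.2345,7.9".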
The right algorithm depends on whether you need to store the tokens, whether you need to correct "syntax" errors in the line, and whether empty tokens should be kept.
#include<iostream>
#include<string>
#include<list>
#include<algorithm>
#include<cctype>
typedef std::list<std::string> StrList;
void tokenize(const std::string& in, const std::string& delims, StrList& tokens)
{
tokens.clear();
std::string::size_type pos_begin , pos_end = 0;
std::string input = in;
input.erase(std::remove_if(input.begin(),
input.end(),
[](unsigned char x){return std::isspace(x) != 0;}),input.end());
while ((pos_begin = input.find_first_not_of(delims,pos_end)) != std::string::npos)
{
pos_end = input.find_first_of(delims,pos_begin);
if (pos_end == std::string::npos) pos_end = input.length();
tokens.push_back( input.substr(pos_begin,pos_end-pos_begin) );
}
}
int main()
{
std::string str = ",\t, 0.2345,, , , 7.9 \n";
StrList vtrToken;
tokenize( str, "," , vtrToken);
int i = 1;
for (auto &s : vtrToken)
std::cout << i++ << ".) " << s << std::endl;
return 0;
}
Output:
1.) 0.2345
2.) 7.9
This variant strips all empty tokens. Whether that is right depends on your context, so there is no single correct answer. If you have to check that the string was well formed, or replace empty tokens with default values, you have to add further checks.
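Since the stated goal is a std::vector of doubles, here is a hedged sketch that goes straight from the line to the numbers using std::getline with a delimiter and std::stod. The name parse_doubles is hypothetical, and skipping blank fields is one possible policy among several, in the same spirit as the tokenizer above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parses a delimited line such as " 0.2345, 7.9 \n"
// into doubles. Fields that are empty or whitespace-only are skipped.
std::vector<double> parse_doubles(const std::string& line, char delim = ',')
{
    std::vector<double> values;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, delim)) {
        if (field.find_first_not_of(" \t\r\n") == std::string::npos)
            continue;                        // nothing but whitespace
        // std::stod skips leading whitespace and stops at the first character
        // that cannot be part of the number; it throws std::invalid_argument
        // if no number can be read at all.
        values.push_back(std::stod(field));
    }
    return values;
}
```

Because std::stod tolerates surrounding whitespace, no separate whitespace-removal pass is needed. A stricter version could pass std::stod's second argument to verify that the whole field was consumed.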
I use the ranges library in C++20 and implement it like below:
#include <iostream>
#include <ranges>
#include <algorithm>
#include <vector>
#include <string>
#include <cctype>
#include <cstdlib>
auto join_character_in_each_subranges = [](auto &&rng) {
return std::string(&*rng.begin(), std::ranges::distance(rng)); };
auto trimming = std::ranges::views::filter([](auto character){
return !std::isspace(character);});
int main()
{
std::string myline = " 0.2345, 7.9 ";
std::vector<double> line_list;
for (std::string&& words : myline
| std::ranges::views::split(',')
| std::ranges::views::transform(join_character_in_each_subranges))
{
auto words_trimming = words | trimming;
std::string clean_number;
std::ranges::for_each(words_trimming,
[&](auto character){ clean_number += character;});
line_list.push_back(atof(clean_number.c_str()));
}
}
First, split the myline string into subranges at the delimiter:
myline | std::ranges::views::split(',')
Then take each subrange and build a std::string from its characters with the transform view. Unlike the std::transform algorithm, which writes its results into another range, std::ranges::views::transform applies the function lazily as the view is iterated:
std::ranges::views::transform(join_character_in_each_subranges)
Next, filter the whitespace characters out of each piece (this removes every whitespace character, not only leading and trailing ones):
auto words_trimming = words | trimming;
and copy the filtered view into a std::string with
std::ranges::for_each(words_trimming, [&](auto character){ clean_number += character;});
Finally, convert each clean_number to double and push_back it into the list:
line_list.push_back(atof(clean_number.c_str()));

Faster way to find a sub-string given beginning and end positions in another string using C++?

The task is to find a substring (needle) in another string (haystack), given a beginning position and an end position within the haystack. The beginning and end positions follow the STL convention, i.e. the end position is the position of the character just past the range of interest.
For example: find "567" with beg_pos=0 and end_pos=8 in "0123456789" should return 5, while find "567" with beg_pos=0 and end_pos=4 in "0123456789" should return -1.
I could imagine two simple implementations:
Method 1: Use size_t pos = haystack.find(needle, beg_pos); to get the substring position, then compare the return value pos with end_pos if found. In the worst case, the find function will go until the end of the string haystack, but the search after end_pos is unnecessary. The performance might be bad if haystack is long.
Method 2: Use size_t pos = haystack.substr(beg_pos, end_pos-beg_pos).find(needle); to find the position, then return pos+beg_pos if found. This method avoids the unnecessary search after end_pos, but it requires allocating a new temporary string, which might also be a performance issue.
I am wondering if there is a faster way to accomplish the task.
In C++17 we have std::string_view, which can be constructed from a pointer and a size. This gives you a read-only slice of the string without copying anything. You can then use std::string_view::find to check whether the sub-string exists in that slice. With start and end standing for the beginning and end positions, that would look like
std::string haystack = "lots of stuff";
std::string needle = "something";
std::string_view slice(haystack.c_str() + start, end - start); // use end - start to get size of the slice
auto pos = slice.find(needle);
if (pos == std::string::npos)
return -1;
else
return pos; // or pos + start if you need the index from the start and not just in the slice.
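The fragment above can be wrapped into a complete function; the following is a sketch under the assumptions that the result should be an index into the original haystack and that std::string::npos (rather than -1) signals "not found". The name find_in_range and the range check are additions for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (requires C++17): searches for `needle` inside
// haystack[beg_pos, end_pos) without copying, and returns an index into
// `haystack`, or std::string::npos if there is no match in that range.
std::size_t find_in_range(const std::string& haystack, std::string_view needle,
                          std::size_t beg_pos, std::size_t end_pos)
{
    if (beg_pos > end_pos || end_pos > haystack.size())
        return std::string::npos;           // reject an invalid range
    const std::string_view slice(haystack.data() + beg_pos, end_pos - beg_pos);
    const std::size_t pos = slice.find(needle);
    // Translate the slice-relative position back to the haystack.
    return pos == std::string_view::npos ? std::string::npos : pos + beg_pos;
}
```

With the example from the question, find_in_range("0123456789", "567", 0, 8) gives 5, while the same call with end position 4 gives std::string::npos.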
pre-c++17
Here is a method which I think is optimally quick. It uses std::search, which seems to me to be an iterator-based counterpart of find.
In this example the position of the needle is returned relative to the start of the haystack, not the substring being searched:
#include <string>
#include <iostream>
#include <algorithm>
int main()
{
using namespace std::literals;
auto my_haystack = "0123456789"s;
auto needle = "567"s;
auto find_needle = [&needle](auto first, auto last)
{
auto i = std::search(first, last, begin(needle), end(needle));
if (i == last)
return std::string::npos;
else
return std::string::size_type(std::distance(first, i));
};
auto in_substring = [](auto&& str, auto b, auto e, auto&& f) -> std::string::size_type
{
using std::begin;
auto brange = begin(str) + b;
auto erange = begin(str) + e;
auto p = f(brange, erange);
if (p != std::string::npos)
p += b;
return p;
};
auto pos = in_substring(my_haystack, 0, 4, find_needle);
std::cout << pos << std::endl;
pos = in_substring(my_haystack, 0, my_haystack.size(), find_needle);
std::cout << pos << std::endl;
pos = in_substring(my_haystack, 1, my_haystack.size(), find_needle);
std::cout << pos << std::endl;
pos = in_substring(my_haystack, 1, 4, find_needle);
std::cout << pos << std::endl;
}
example output (64-bit size_type):
18446744073709551615
5
5
18446744073709551615

C++ Find last occurrence of a string inside a substring

I need a method that helps me find a string inside a substring, or in other words, find a string inside a subrange of another string. Besides, I need to search in reverse order, because I know that the string I'm looking for is close to the end of the substring used as the "haystack".
Let's suppose the following piece of code, where rfind_in_substr is the method I'm asking for:
std::string sample("An example with the example word example trice");
// substring "ample with the example w"
std::size_t substr_beg = 5;
std::size_t substr_size = 24;
// (1)
std::size_t pos = rfind_in_substr(sample, substr_beg,
substr_size, "example");
// pos == 20, because it's the index of the start of the second
// "example" word inside the main string.
Of course, the line (1) could be replaced by:
std::size_t pos = substr_beg + sample.substr
(substr_beg, substr_size).rfind("example");
But that implies an unnecessary copy of the substring. Is there any standard C++ or Boost facility that could help me do that?
I was looking at boost::algorithm::string library but I've found nothing (that I had understood). I know that C++17 has the std::string_view class, that would be perfect, but I'm using C++14.
From Boost.StringAlgo:
#include <boost/algorithm/string/find.hpp>
auto haystack = boost::make_iterator_range(str.begin() + from, str.begin() + from + len);
auto found = boost::algorithm::find_last(haystack, needle);
Now, if you need to use this with other member functions from std::string, you need to do extra steps in converting a resulting range into an index like this answer does, but if you aren't, then simply use the range interface and avoid the std::string's "helpful" methods.
Another option is to use boost::string_ref which is what std::string_view is basically based on:
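#include <iostream>
#include <string>
#include <boost/utility/string_ref.hpp>
std::size_t rfind_in_substr(std::string const& str, std::size_t from,
std::size_t len, std::string const& s)
{
std::size_t pos = boost::string_ref(str).substr(from, len).rfind(s);
// Propagate "not found" instead of adding from to npos, which would wrap around.
return pos == boost::string_ref::npos ? std::string::npos : from + pos;
}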
int main()
{
std::string sample("An example with the example word example trice");
// substring "ample with the example w"
std::size_t substr_beg = 5;
std::size_t substr_size = 24;
// (1)
std::size_t pos = rfind_in_substr(sample, substr_beg,
substr_size, "example");
// pos == 20, because it's the index of the start of the second
// "example" word inside the main string.
std::cout << pos << "\n";
}
You can find the answer by combining an API that limits the search within the original string by length and an additional check to see if the end result comes prior to substr_beg:
std::size_t rfind_in_substr(
const std::string& str
, const std::size_t from
, const std::size_t len
, const std::string& sub
) {
std::size_t res = str.rfind(sub, from+len-sub.size());
return res != std::string::npos && res >= from ? res : std::string::npos;
}
from+len-sub.size() computes the last position at which the substring could start.
res >= from rejects an answer if it comes before the initial character of substring.
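The index arithmetic above assumes that the needle is no longer than the window; otherwise from+len-sub.size() wraps around, because std::size_t is unsigned. A hedged variant with explicit guards might look like the sketch below. The name rfind_bounded is hypothetical, and clamping len to the end of the string is an assumption that mirrors what substr does.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: finds the last occurrence of `sub` that lies entirely
// inside str[from, from + len), returning its index in `str` or npos.
std::size_t rfind_bounded(const std::string& str, std::size_t from,
                          std::size_t len, const std::string& sub)
{
    if (from > str.size())
        return std::string::npos;
    len = std::min(len, str.size() - from);   // clamp the window, as substr would
    if (sub.size() > len)
        return std::string::npos;             // needle cannot fit: avoid wrap-around
    const std::size_t last_start = from + len - sub.size();
    const std::size_t res = str.rfind(sub, last_start);
    return (res != std::string::npos && res >= from) ? res : std::string::npos;
}
```

For the sample string in the question, the window (5, 24) gives last_start = 22, so rfind lands on the occurrence at index 20. Shrinking the window to (5, 21) leaves only the occurrence at index 3 reachable, which is rejected because it starts before from.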
With std::find_end the problem can be solved efficiently without searching more than necessary, though I had hoped there was an existing method that already did this:
#include <iostream>
#include <string>
#include <algorithm>
std::size_t rfind_in_substr(std::string const& str, std::size_t from,
std::size_t len, std::string const& s)
{
auto sub_beg = str.begin() + from;
auto sub_end = sub_beg + len;
auto found_it = std::find_end(sub_beg, sub_end, s.begin(), s.end());
if (found_it == sub_end)
return str.npos;
else
return found_it - str.begin();
}
int main()
{
std::string sample("An example with the example word example trice");
// substring "ample with the example w"
std::size_t substr_beg = 5;
std::size_t substr_size = 24;
std::size_t pos = rfind_in_substr(sample, substr_beg,
substr_size, "example");
std::cout << pos << std::endl; // Prints 20
}