Conditionally replace regex matches in string - c++

I am trying to replace certain patterns in a string with different replacement patterns.
Example:
string test = "test replacing \"these characters\"";
What I want to do is replace all ' ' with '_' and all other non letter or number characters with an empty string. I have the following regex created and it seems to tokenize correctly, but I am not sure how to (if possible) perform a conditional replace using regex_replace.
string test = "test replacing \"these characters\"";
regex reg("(\\s+)|(\\W+)");
expected result after replace would be:
string result = "test_replacing_these_characters";
EDIT:
I cannot use boost, which is why I left it out of the tags. So please no answer that includes boost. I have to do this with the standard library. It may be that a different regex would accomplish the goal or that I am just stuck doing two passes.
EDIT2:
I did not remember what characters were included in \w at the time of my original regex, after looking it up I have further simplified the expression. Again the goal is anything matching \s+ should be replaced with '_' and anything matching \W+ should be replaced with empty string.

The C++ (0x, 11, TR1) regular expressions do not work in every case (for GCC, look up the phrase "regex" on this page), so it is better to use Boost for a while.
You may try if your compiler supports the regular expressions needed:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char * argv[]) {
string test = "test replacing \"these characters\"";
regex reg("[^\\w]+");
test = regex_replace(test, reg, "_");
cout << test << endl;
}
The above works in Visual Studio 2012 RC.
Edit 1: To replace by two different strings in one pass (depending on the match), I'd think this won't work here. In Perl, this could easily be done within evaluated replacement expressions (/e switch).
Therefore, you'll need two passes, as you already suspected:
...
string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+"), "_");
test = regex_replace(test, regex("\\W+"), "");
...
Edit 2:
If it would be possible to use a callback function tr() in regex_replace, then you could modify the substitution there, like:
string output = regex_replace(test, regex("\\s+|\\W+"), tr);
with tr() doing the replacement work:
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
That would solve the problem. Unfortunately, std::regex_replace in C++11 has no such overload, but Boost has one. The following works with Boost in a single pass:
...
#include <boost/regex.hpp>
using namespace boost;
...
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
...
string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+|\\W+"), tr); // <= works in Boost
...
Maybe some day this will work with C++11 or whatever number comes next.
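In the meantime, a single pass is still possible with only the standard library if you walk the matches yourself with std::sregex_iterator and build the output by hand. The following is a minimal sketch, not a drop-in replacement for a callback overload: replace_conditionally is a hypothetical name, and the second alternative is narrowed to [^\w\s]+ (an assumption) so that a whitespace run following punctuation is not swallowed by the "delete" branch.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: one pass over the matches, choosing the replacement per match.
std::string replace_conditionally(const std::string& input) {
    // Group 1: runs of whitespace. Group 2: runs of other non-word characters.
    static const std::regex re("(\\s+)|([^\\w\\s]+)");
    std::string out;
    auto last = input.cbegin();  // end of the previous match
    for (std::sregex_iterator it(input.cbegin(), input.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy the unmatched text before this match
        if (m[1].matched) {
            out += '_';                // a whitespace run becomes one underscore
        }                              // a group 2 match is dropped entirely
        last = m[0].second;
    }
    out.append(last, input.cend());    // copy the tail after the final match
    return out;
}
```

Calling replace_conditionally(test) on the string from the question should yield test_replacing_these_characters. The check on m[1].matched plays the role that tr() plays in the Boost version.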
Regards
rbo

This is commonly done by using four backslashes, so that the backslash survives escaping in the C++ source. You then need a second pass for the quotation marks, escaping them in your regex then and only then.
string tet = "test replacing \"these characters\"";
//regex reg("[^\\w]+");
regex reg("\\\\"); //--AS COMMONLY TAUGHT AND EXPLAINED
tet = regex_replace(tet, reg, " ");
cout << tet << endl;
regex reg2("\""); //--AS SHOWN
tet = regex_replace(tet, reg2, " ");
cout << tet << endl;
And in a single pass, use:
string tet = "test replacing \"these characters\"";
//regex reg("[^\\w]+");
regex reg3("\\\""); //--AS EXPLAINED
tet = regex_replace(tet, reg3, "");
cout << tet << endl;

Related

c++ regex get folder from a file path

I have a file name like this
/mnt/opt/storage/ssd/subtitles/8/vtt/2011022669-5126858992107.vtt
how to replace the file name with * using regex so I get
/mnt/opt/storage/ssd/subtitles/8/vtt/*?
I know the simple for loop split or boost::filesystem approach, I'm looking for a regex_replace approach.
You don't need regexp for this:
string str = "/mnt/opt/storage/ssd/subtitles/8/vtt/2011022669-5126858992107.vtt";
auto lastSlash = str.find_last_of('/');
str.replace(str.begin() + lastSlash + 1, str.end(), "*");
Try this pattern
(([\w+\-])+)(?=(\.\w{3}))
tested in notepad++.
(?=(...)) is a lookahead, so the pattern matches ([\w+\-])+ only if an extension (\.\w{3}) in the format .xxx follows this group; use \w{2,3} if .xx should be accepted as well.
In C++ you just have to replace the group with *, something like
replace (string, $1 , '*') -- I don't know the C++ replace function, I'm just assuming.
$1, $2, $3... are the group numbers; in this case $1 is (([\w+\-])+).
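For reference, the lookahead pattern above can be tried with std::regex_replace, since the default ECMAScript grammar of std::regex supports (?=...). This is only a sketch: mask_stem is a hypothetical name, and the capture groups are dropped because the whole match is replaced. Note that a lookahead does not consume the extension, so the extension stays in the result.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces the file name stem (the part in front of a
// three-character extension) with '*'. The extension itself is kept.
std::string mask_stem(const std::string& path) {
    static const std::regex re(R"([\w+-]+(?=\.\w{3}))");
    return std::regex_replace(path, re, "*");
}
```

For the path in the question this gives /mnt/opt/storage/ssd/subtitles/8/vtt/*.vtt rather than .../vtt/*, so the extension would have to be part of the match (not a lookahead) to get the requested output.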
Below is a solution with regex_replace:
std::string path = "/mnt/opt/storage/ssd/subtitles/8/vtt/2011022669-5126858992107.vtt";
std::regex re(R"(\/[^\/]*?\..+$)");
std::cout << path << '\n';
std::cout << std::regex_replace(path, re, "/*") << '\n';
outputs:
/mnt/opt/storage/ssd/subtitles/8/vtt/2011022669-5126858992107.vtt
/mnt/opt/storage/ssd/subtitles/8/vtt/*
but a regex seems a bit too heavyweight for such a simple replacement.

C++ RegExp and placeholders

I'm on C++11 MSVC2013, I need to extract a number from a file name, for example:
string filename = "s 027.wav";
If I were writing code in Perl, Java or Basic, I would use a regular expression and something like this would do the trick in Perl5:
$filename =~ /(\d+)/;
and I would have the number "027" in placeholder variable $1.
Can I do this in C++ as well? Or can you suggest a different method to extract the number 027 from that string? Also, I should convert the resulting numerical string into an integral scalar, I think atoi() is what I need, right?
You can do this in C++, as of C++11 with the collection of classes found in regex. It's pretty similar to other regular expressions you've used in other languages. Here's a no-frills example of how you might search for the number in the filename you posted:
const std::string filename = "s 027.wav";
std::regex re = std::regex("[0-9]+");
std::smatch matches;
if (std::regex_search(filename, matches, re)) {
std::cout << matches.size() << " matches." << std::endl;
for (auto &match : matches) {
std::cout << match << std::endl;
}
}
As far as converting 027 into a number, you could use atoi (from cstdlib) like you mentioned, but this will store the value 27, not 027. If you want to keep the 0 prefix, I believe you will need to keep this as a string. match above is a sub_match so, extract a string and convert to a const char* for atoi:
int value = atoi(match.str().c_str());
Ok, I solved using std::regex which for some reason I couldn't get to work properly when trying to modify the examples I found around the web. It was simpler than I thought. This is the code I wrote:
#include <cstdlib>
#include <regex>
#include <string>
using namespace std;
string FileName = "s 027.wav";
// The search object
smatch m;
// The regexp /\d+/ works in Perl and Java but for some reason didn't work here.
// With this other variation I look for exactly a string of 1 to 3 characters
// containing only numbers from 0 to 9
regex re("[0-9]{1,3}");
// Do the search
regex_search (FileName, m, re);
// 'm' behaves like an array: m[0] is the whole match and m[1], m[2], ...
// are the capture groups (like $1, $2, $3, etc. in Perl)
string sMidiNoteNum = m[0];
// This casts the string to an integer number
int MidiNote = atoi(sMidiNoteNum.c_str());
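A likely reason \d+ "didn't work" is that the backslash has to be escaped inside an ordinary C++ string literal ("\\d+"), or written in a raw string literal. The sketch below assumes that was the cause; first_number is a hypothetical helper, and std::stoi is used in place of atoi so that out-of-range input raises an exception instead of being undefined.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: stores the first run of digits in `value` and returns true,
// or returns false when the text contains no digits.
bool first_number(const std::string& text, int& value) {
    static const std::regex re(R"(\d+)");  // same pattern as "\\d+" in an ordinary literal
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return false;
    }
    value = std::stoi(m[0].str());  // "027" converts to 27; throws std::out_of_range if too large
    return true;
}
```

Checking the return value also avoids reading m[0] when nothing matched, which the code above does not guard against.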
Here is an example using Boost, substitute the proper namespace and it should work.
typedef std::string::const_iterator SITR;
SITR start = str.begin();
SITR end = str.end();
boost::regex NumRx("\\d+");
boost::smatch m;
while ( boost::regex_search ( start, end, m, NumRx ) )
{
int val = atoi( m[0].str().c_str() );
start = m[0].second;
}
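With the standard library, the same loop is usually written with std::sregex_iterator instead of advancing the start iterator by hand. A minimal sketch, assuming you want every number in the string; all_numbers is a hypothetical name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits in `text` as an int.
std::vector<int> all_numbers(const std::string& text) {
    static const std::regex num_rx("\\d+");
    std::vector<int> values;
    // A default-constructed sregex_iterator marks the end of the match sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), num_rx), end; it != end; ++it) {
        values.push_back(std::stoi(it->str()));
    }
    return values;
}
```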

Regular expression validation fails while egrep validates just fine

I'm trying to use regular expressions in order to validate strings so before I go any further let me explain first how the strings looks like: optional number of digits followed by an 'X' and an optional ('^' followed by one or more digits).
Here are some examples: "2X", "X", "23X^6" fit the pattern while strings like "X^", "4", "foobar", "4X^", "4X44" don't.
Now where was I: using 'egrep' and the "^[0-9]{0,}\X(\^[0-9]{1,})$" regex I can validate just fine those strings however when trying this in C++ using the C++11 regex library it fails.
Here's the code I'm using to validate those strings:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
int main()
{
std::regex r("^[0-9]{0,}\\X(\\^[0-9]{1,})$",
std::regex_constants::egrep);
std::vector<std::string> challanges_ok {"2X", "X", "23X^66", "23X^6",
"3123X", "2313131X^213213123"};
std::vector<std::string> challanges_bad {"X^", "4", "asdsad", " X",
"4X44", "4X^"};
std::cout << "challanges_ok: ";
for (auto &str : challanges_ok) {
std::cout << std::regex_match(str, r) << " ";
}
std::cout << "\nchallanges_bad: ";
for (auto &str : challanges_bad) {
std::cout << std::regex_match(str, r) << " ";
}
std::cout << "\n";
return 0;
}
Am I doing something wrong or am I missing something? I'm compiling under GCC 4.7.
Your regex fails to make the '^' followed by one or more digits optional; change it to:
"^[0-9]*X(\\^[0-9]+)?$".
Also note that this page says that GCC's support of <regex> is only partial, so std::regex may not work at all for you ('partial' in this context apparently means 'broken'); have you tried Boost.Xpressive or Boost.Regex as a sanity check?
optional number of digits followed by an 'X' and an optional ('^' followed by one or more digits).
OK, the regular expression in your code doesn't match that description, for two reasons: you have an extra backslash on the X, and the '^digits' part is not optional. The regex you want is this:
^[0-9]{0,}X(\^[0-9]{1,}){0,1}$
which means your grep command should look like this (note single quotes):
egrep '^[0-9]{0,}X(\^[0-9]{1,}){0,1}$' filename
And the string you have to pass in your C++ code is this:
"^[0-9]{0,}X(\\^[0-9]{1,}){0,1}$"
If you then replace all the explicit quantifiers with their more traditional abbreviations, you get @ildjarn's answer: {0,} is *, {1,} is +, and {0,1} is ?.
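The corrected pattern can be checked against the examples from the question with a small helper. This is a sketch under two assumptions: it uses the default ECMAScript grammar rather than the egrep flag, and it drops the ^ and $ anchors because std::regex_match already requires the whole string to match. is_valid_term is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `s` is optional digits, an 'X',
// and an optional '^' followed by one or more digits.
bool is_valid_term(const std::string& s) {
    static const std::regex r("[0-9]*X(\\^[0-9]+)?");
    return std::regex_match(s, r);
}
```

Whether this runs correctly still depends on the standard library in use; the GCC 4.7 caveat mentioned above applies.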

Using Boost::Xpressive to match a single character

I have a string that can be "/" "+" "." or a descriptive name
I'm trying to figure out how to use regex to check if the string matches any of the 3 special characters above (/ + or .)
After doing a bit of reading I decided Boost.Xpressive was the way to go, but I still cannot figure it out.
Is Boost.Xpressive suitable for this task, and what would my regex string need to be?
thanks
Why not just use std::string::find_first_of() to do your own solution? Sounds like a lot of machinery for a fairly simple task.
Edit
Try this out if you're still stuck.
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace std;
using namespace boost::xpressive;
int main()
{
// A POSIX class must sit inside a bracket expression; [[:alpha:]] mirrors +alpha below.
sregex re = sregex::compile("[+./]|[[:alpha:]]+");
sregex op = as_xpr('+') | '.' | '/';
sregex rex = op | (+alpha);
if (regex_match(string("word"), re))
cout << "word" << endl;
if (regex_match(string("word2"), re))
cout << "word2" << endl;
if (regex_match(string("+"), re))
cout << "+" << endl;
return 0;
}
Two ways of doing the same thing are shown. The variable named re is initialized with a Perl-like regular expression string; rex uses native Xpressive elements.
I would say that Boost.Xpressive may be overkill for the task, but it's your call.
Regular expressions are lifesavers when you want to validate a particularly formatted string. Here, there is no format involved, only a set of possible values. My advice: if your problem can be solved by simple, successive string equality comparisons, then you probably don't need anything like regular expressions.
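As an illustration of the plain-string route, the check can be a single expression. A minimal sketch; is_special is a hypothetical name, and anything that is not exactly one of the three characters is treated as a descriptive name:

```cpp
#include <string>

// Hypothetical helper: true when `s` is exactly one of "/", "+" or ".".
bool is_special(const std::string& s) {
    return s.size() == 1 && std::string("/+.").find(s[0]) != std::string::npos;
}
```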

Get String Between 2 Strings

How can I get a string that is between two other declared strings, for example:
String 1 = "[STRING1]"
String 2 = "[STRING2]"
Source:
"832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
How can I get the "I need this text here"?
Since this is homework, only clues:
Find index1 of occurrence of String1
Find index2 of occurrence of String2
Substring from index1+lengthOf(String1) (inclusive) to index2 (exclusive) is what you need
Copy this to a result buffer if necessary (don't forget to null-terminate)
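With std::string the clues above map onto find and substr, and no manual null termination is needed. A minimal sketch under those assumptions; between is a hypothetical name, and returning an empty string when a marker is missing is just one possible convention:

```cpp
#include <string>

// Hypothetical helper: returns the text between the first `open` and the next `close`,
// or an empty string when either marker is missing.
std::string between(const std::string& source, const std::string& open, const std::string& close) {
    const std::string::size_type start = source.find(open);
    if (start == std::string::npos) return "";
    const std::string::size_type from = start + open.size();  // first character after `open`
    const std::string::size_type stop = source.find(close, from);
    if (stop == std::string::npos) return "";
    return source.substr(from, stop - from);                   // [from, stop)
}
```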
Might be a good case for std::regex, which is part of C++11.
#include <iostream>
#include <string>
#include <regex>
int main()
{
using namespace std::string_literals;
auto start = "\\[STRING1\\]"s;
auto end = "\\[STRING2\\]"s;
std::regex base_regex(start + "(.*)" + end);
auto example = "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"s;
std::smatch base_match;
std::string matched;
if (std::regex_search(example, base_match, base_regex)) {
// The first sub_match is the whole string; the next
// sub_match is the first parenthesized expression.
if (base_match.size() == 2) {
matched = base_match[1].str();
}
}
std::cout << "example: \""<<example << "\"\n";
std::cout << "matched: \""<<matched << "\"\n";
}
Prints:
example: "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
matched: "I need this text here"
The program creates two strings, start and end, that serve as the start and end markers. It then builds a regular expression that looks for those markers and captures anything in between (including nothing). Finally, regex_search finds the matching part of the input, and matched is set to the captured text.
For more info, see http://en.cppreference.com/w/cpp/regex and http://en.cppreference.com/w/cpp/regex/regex_search
Use strstr ( http://www.cplusplus.com/reference/clibrary/cstring/strstr/ ). Calling it once per marker gives you two pointers; compare them, and if pointer1 < pointer2, read all the characters between them.