C++ regex: get the folder from a file path

I have a file path like this:
/mnt/opt/storage/ssd/subtitles/8/vtt/2011022669-5126858992107.vtt
How can I replace the file name with * using a regex, so that I get
/mnt/opt/storage/ssd/subtitles/8/vtt/*
I know the simple for-loop split and the boost::filesystem approaches; I'm looking for a regex_replace approach.

You don't need regexp for this:
string str = "/mnt/opt/storage/ssd/subtitles/8/vtt/2011022669-5126858992107.vtt";
auto lastSlash = str.find_last_of('/');
str.replace(str.begin() + lastSlash + 1, str.end(), "*");

Try this pattern:
(([\w+\-])+)(?=(\.\w{3}))
Tested in Notepad++.
(?=...) is a lookahead, so ([\w+\-])+ matches only if an extension of the form .xxx (\.\w{3}) follows the group. To also accept two-character extensions such as .xx, use \.\w{2,3} in the lookahead.
In C++ you then just replace that group with *, something like
replace(string, $1, '*') -- I don't know the C++ replace function, so this is only pseudocode.
$1, $2, $3, ... are group numbers; in this case $1 is (([\w+\-])+).
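For reference, the replacement step described in this answer can be sketched in C++ with std::regex_replace. This is only an illustration under assumptions: the helper name starFileStem is made up here, the character class is written as [\w+-] (the same set of characters as in the pattern above), and an end anchor $ is added so that only the last path component can match. Note that a lookahead does not consume the extension, so the result keeps it: the sample path becomes .../vtt/*.vtt rather than the .../vtt/* asked for in the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces the file stem (the part before a
// three-character extension) with "*", leaving the extension in place.
std::string starFileStem(const std::string& path) {
    // [\w+-]+       one or more word characters, '+' or '-'
    // (?=\.\w{3}$)  lookahead: must be followed by ".xxx" at the end of the string
    static const std::regex stem(R"([\w+-]+(?=\.\w{3}$))");
    return std::regex_replace(path, stem, "*");
}
```

A path without a three-character extension at the end, such as /a/b/, has no match and is returned unchanged.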

Below is a solution with std::regex_replace:
std::string path = "/mnt/opt/storage/ssd/subtitles/8/vtt/2011022669-5126858992107.vtt";
std::regex re(R"(\/[^\/]*?\..+$)");
std::cout << path << '\n';
std::cout << std::regex_replace(path, re, "/*") << '\n';
outputs:
/mnt/opt/storage/ssd/subtitles/8/vtt/2011022669-5126858992107.vtt
/mnt/opt/storage/ssd/subtitles/8/vtt/*
That said, a regex seems rather heavyweight for such a simple replacement.
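One caveat with the pattern in this answer: if a directory name itself contains a dot (for example /data/v1.2/file.txt), the match starts at that directory and everything after it is replaced. A pattern that only looks at the last path component avoids this. The following is a minimal sketch, with starFileName as a hypothetical name; it uses + rather than * in the pattern so that no extra empty match is found at the end of the string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces the last path component with "*".
std::string starFileName(const std::string& path) {
    // [^/]+$ matches the run of non-slash characters at the end of the string.
    static const std::regex lastComponent("[^/]+$");
    return std::regex_replace(path, lastComponent, "*");
}
```

A path that ends in a slash has no match and is returned as-is.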

Related

Split the string at the particular occurrence of special character (+) using regex in Java

I want to split the following string around +, but I couldn't succeed in getting the correct regex for this.
String input = "SOP3a'+bEOP3'+SOP3b'+aEOP3'";
I want to have a result like this
[SOP3a'+bEOP3', SOP3b'+aEOP3']
In some cases I may have the following string
c+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2
which should be split as
[c, SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2]
I have tried the following regex but it doesn't work.
input.split("(SOP[0-9](.*)EOP[0-9])*\\+((SOP)[0-9](.*)(EOP)[0-9])*");
Any help or suggestions are appreciated.
Thanks
You can use the following regex to match the string; the captured groups then give you the expected result:
(?m)(.*?)\+(SOP.*?$)
The following Java code should work for you:
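import java.util.regex.Matcher;
import java.util.regex.Pattern;
// The class name is arbitrary; it only makes the snippet compile on its own.
public class SplitAtPlus {
    public static void main(String[] args) {
        String input = "SOP3a'+bEOP3'+SOP3b'+aEOP3'";
        // Lazily match up to the first '+' that is directly followed by "SOP"
        String pattern = "(?m)(.*?)\\+(SOP.*?$)";
        Pattern regex = Pattern.compile(pattern);
        Matcher m = regex.matcher(input);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0)); // whole match
            System.out.println("Found value: " + m.group(1)); // part before the '+'
            System.out.println("Found value: " + m.group(2)); // part after the '+'
        } else {
            System.out.println("NO MATCH");
        }
    }
}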
The m.group(1) and m.group(2) are the values that you are looking for.
Do you really need to use split method?
And what are the rules? They are unclear to me.
Anyway, starting from the regex you provided, I only removed some unnecessary groups and got what you are looking for. Instead of splitting, I collected the matched groups, since splitting would generate some empty elements.
const str = "SOP1a+bEOP1+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2";
const regex = RegExp(/(SOP[0-9].*EOP[0-9])*\+(SOP[0-9].*EOP[0-9])*/)
const matches = str.match(regex);
console.log('Matches ', matches);
console.log([matches[1],matches[2]]);

C++ RegExp and placeholders

I'm on C++11 MSVC2013, I need to extract a number from a file name, for example:
string filename = "s 027.wav";
If I were writing code in Perl, Java or Basic, I would use a regular expression and something like this would do the trick in Perl5:
$filename =~ /(\d+)/g;
and I would have the number "027" in placeholder variable $1.
Can I do this in C++ as well? Or can you suggest a different method to extract the number 027 from that string? Also, I should convert the resulting numerical string into an integral scalar, I think atoi() is what I need, right?
You can do this in C++, as of C++11 with the collection of classes found in regex. It's pretty similar to other regular expressions you've used in other languages. Here's a no-frills example of how you might search for the number in the filename you posted:
const std::string filename = "s 027.wav";
std::regex re = std::regex("[0-9]+");
std::smatch matches;
if (std::regex_search(filename, matches, re)) {
std::cout << matches.size() << " matches." << std::endl;
for (auto &match : matches) {
std::cout << match << std::endl;
}
}
To convert 027 into a number you could use atoi (from <cstdlib>) as you mentioned, but this stores the value 27, not 027. If you want to keep the leading 0, I believe you will need to keep it as a string. Each match in the loop above is a std::sub_match, so extract a std::string from it and pass its c_str() to atoi (inside the loop; outside it, use matches[0] instead):
int value = atoi(match.str().c_str());
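Putting the search and the conversion together, a small helper might look like the following. It is a sketch rather than this answer's exact code: the name firstNumber and the fallback parameter are assumptions, and it uses std::stoi (C++11) instead of atoi so that the std::string can be passed directly.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of digits in `text` as an int,
// or `fallback` if the text contains no digits.
int firstNumber(const std::string& text, int fallback = -1) {
    static const std::regex digits("[0-9]+");
    std::smatch m;
    if (!std::regex_search(text, m, digits)) {
        return fallback;
    }
    // m[0] is the whole match ("027" for "s 027.wav"); the conversion
    // drops the leading zero, so the result is 27.
    return std::stoi(m[0].str());
}
```

Unlike atoi, std::stoi throws std::out_of_range if the digit run does not fit in an int, which may or may not be what a caller wants.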
OK, I solved it using std::regex, which for some reason I couldn't get to work properly when modifying the examples I found around the web. It was simpler than I thought. This is the code I wrote:
#include <cstdlib>
#include <regex>
#include <string>
using namespace std;
string FileName = "s 027.wav";
// The match results object
smatch m;
// The regexp /\d+/ works in Perl and Java but for some reason didn't work here
// (in a C++ string literal it likely needs to be written "\\d+").
// With this other variation I look for a run of 1 to 3 characters
// containing only digits from 0 to 9
regex re("[0-9]{1,3}");
// Do the search
regex_search(FileName, m, re);
// 'm' behaves like an array: m[0] is the whole match and m[1], m[2], ...
// are the capture groups (like $1, $2, $3, etc. in Perl)
string sMidiNoteNum = m[0];
// This converts the string to an integer
int MidiNote = atoi(sMidiNoteNum.c_str());
Here is an example using Boost, substitute the proper namespace and it should work.
typedef std::string::const_iterator SITR;
SITR start = str.begin();
SITR end = str.end();
boost::regex NumRx("\\d+");
boost::smatch m;
while ( boost::regex_search ( start, end, m, NumRx ) )
{
int val = atoi( m[0].str().c_str() );
start = m[0].second;
}
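The same loop can be written with the standard library only. Below is a sketch using std::sregex_iterator, which visits every match in turn; the function name allNumbers is made up for illustration, and std::stoi stands in for atoi.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits in `text` as an int.
std::vector<int> allNumbers(const std::string& text) {
    static const std::regex numRx(R"(\d+)");
    std::vector<int> values;
    // A default-constructed sregex_iterator serves as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), numRx), end; it != end; ++it) {
        values.push_back(std::stoi(it->str()));
    }
    return values;
}
```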

C++ regex to search file paths in a string

I'm trying to parse strings which can contain file paths.
I'm using C++ with the <regex> library and the ECMAScript grammar; I'm not that good with regex.
I don't know why the string
"C:\Windows\explorer.exe C:\titi\toto.exe"
doesn't fully match the pattern (actually, only the first path is found):
(?:[a-zA-Z]\:|\\)(?:\\[a-z_\-\s0-9]+)+
Do you have a better idea for finding every match?
Thanks!
Here's my code:
wsmatch matches;
regex_constants::match_flag_type fl = regex_constants::match_default ;
regex_constants::syntax_option_type st = regex_constants::icase //Case insensitive
| regex_constants::ECMAScript
| regex_constants::optimize;
wregex pattern(L"(?:[a-zA-Z]\\:|\\\\)(?:\\\\[a-z_\\-\\s0-9]+)+", st);
// Look if matches pattern
printf("--> %ws\n", path.c_str());
if (regex_search(path, matches, pattern, fl)
&& matches.size() > 0)
{
for (u_int i = 0 ; i < matches.size() ; i++)
{
wssub_match sub_match = matches[i];
wstring sub_match_str = sub_match.str();
printf("%ws\n", sub_match_str.c_str());
}
}
You could use something like this:
.?:(\\[a-zA-Z 0-9]*)*.[a-zA-Z]*
I tested it with http://regexpal.com/ and it extracts all file paths.
Although the regex provided by @mspoerr works for the example in the question, it wasn't great for me in more complex scenarios, so I wrote my own.
Regex:
(\w:)?([\\\w\s0-9_]*)\.\w+
Advanced test string:
C:\Wi ndows\explorer.exe asdasds
: ad C:\titi\toto.Heexe
HELLOO : qwefqwfqwf c:\aa.
(it matches only two valid file paths)
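On the original question of why only the first path is reported: std::regex_search stops at the first match, and the match_results it fills hold that match plus its capture groups, not any later matches. To visit every match, an iterator such as std::sregex_iterator can be used. The sketch below uses narrow strings; the function name and the simplified pattern (a drive letter followed by backslash-separated components made of word characters, dots and hyphens, with no spaces) are assumptions for illustration, not a general path grammar.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every substring that looks like a simple
// Windows path such as C:\dir\file.ext (components must not contain spaces).
std::vector<std::string> findWindowsPaths(const std::string& text) {
    static const std::regex pathRx(R"([a-zA-Z]:(?:\\[\w.-]+)+)");
    std::vector<std::string> paths;
    for (std::sregex_iterator it(text.begin(), text.end(), pathRx), end; it != end; ++it) {
        paths.push_back(it->str());  // str() with no argument is the whole match
    }
    return paths;
}
```

For the string from the question this yields the two paths C:\Windows\explorer.exe and C:\titi\toto.exe.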

Conditionally replace regex matches in string

I am trying to replace certain patterns in a string with different replacement patterns.
Example:
string test = "test replacing \"these characters\"";
What I want to do is replace all ' ' with '_' and all other non letter or number characters with an empty string. I have the following regex created and it seems to tokenize correctly, but I am not sure how to (if possible) perform a conditional replace using regex_replace.
string test = "test replacing \"these characters\"";
regex reg("(\\s+)|(\\W+)");
expected result after replace would be:
string result = "test_replacing_these_characters";
EDIT:
I cannot use boost, which is why I left it out of the tags. So please no answer that includes boost. I have to do this with the standard library. It may be that a different regex would accomplish the goal or that I am just stuck doing two passes.
EDIT2:
I did not remember what characters were included in \w at the time of my original regex, after looking it up I have further simplified the expression. Again the goal is anything matching \s+ should be replaced with '_' and anything matching \W+ should be replaced with empty string.
At the time of writing, the C++ (C++0x, C++11, TR1) regular expressions do not work in every case (GCC's implementation in particular is incomplete), so it may be better to use Boost for now.
You can check whether your compiler supports the regular expressions needed:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char * argv[]) {
string test = "test replacing \"these characters\"";
regex reg("[^\\w]+");
test = regex_replace(test, reg, "_");
cout << test << endl;
}
The above works in Visual Studio 2012 RC.
Edit 1: To replace by two different strings in one pass (depending on the match), I'd think this won't work here. In Perl, this could easily be done within evaluated replacement expressions (/e switch).
Therefore, you'll need two passes, as you already suspected:
...
string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+"), "_");
test = regex_replace(test, regex("\\W+"), "");
...
Edit 2:
If regex_replace accepted a callback function tr(), you could choose the substitution there, like:
string output = regex_replace(test, regex("\\s+|\\W+"), tr);
with tr() doing the replacement work:
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
and the problem would be solved. Unfortunately, there's no such overload in the C++11 standard library, but Boost has one. The following works with Boost and uses one pass:
...
#include <boost/regex.hpp>
using namespace boost;
...
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
...
string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+|\\W+"), tr); // <= works in Boost
...
Maybe some day this will work with C++11 or whatever number comes next.
Regards
rbo
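With the standard library alone, a callback-style replacement can still be emulated in a single pass by walking the matches with std::sregex_iterator and building the output by hand. The following is a sketch of that idea, not part of the answer above. The name sanitize is hypothetical, and the second alternative is written as [^\w\s]+ instead of the question's \W+ (an assumption made here so that whitespace following a punctuation character is still turned into an underscore rather than swallowed).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whitespace runs become "_", other non-word
// characters are removed, all in one pass over the input.
std::string sanitize(const std::string& input) {
    static const std::regex rx(R"((\s+)|([^\w\s]+))");
    std::string out;
    auto copiedUpTo = input.begin();
    for (std::sregex_iterator it(input.begin(), input.end(), rx), end; it != end; ++it) {
        const std::smatch& m = *it;
        // Copy the unmatched text between the previous match and this one.
        out.append(copiedUpTo, m[0].first);
        // Group 1 (whitespace) becomes "_"; group 2 (punctuation) is dropped.
        if (m[1].matched) {
            out += '_';
        }
        copiedUpTo = m[0].second;
    }
    // Copy whatever follows the last match.
    out.append(copiedUpTo, input.end());
    return out;
}
```

For the string from the question this produces test_replacing_these_characters, the same result as the two-pass version.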
This is commonly done by using four backslashes, so that the backslash is not consumed by the C++ string literal itself. You then need a second pass for the quotation marks, escaping them in your regex then and only then.
string tet = "test replacing \"these characters\"";
//regex reg("[^\\w]+");
regex reg("\\\\"); //--AS COMMONLY TAUGHT AND EXPLAINED
tet = regex_replace(tet, reg, " ");
cout << tet << endl;
regex reg2("\""); //--AS SHOWN
tet = regex_replace(tet, reg2, " ");
cout << tet << endl;
And in a single pass, use:
string tet = "test replacing \"these characters\"";
//regex reg("[^\\w]+");
regex reg3("\\\""); //--AS EXPLAINED
tet = regex_replace(tet, reg3, "");
cout << tet << endl;

regex how can I split this word?

I have a list of several phrases in the following format
thisIsAnExampleSentance
hereIsAnotherExampleWithMoreWordsInIt
and I'm trying to end up with
This Is An Example Sentance
Here Is Another Example With More Words In It
Each phrase has the white space condensed and the first letter is forced to lowercase.
Can I use regex to add a space before each A-Z and have the first letter of the phrase be capitalized?
I thought of doing something like
([a-z]+)([A-Z])([a-z]+)([A-Z])([a-z]+) // etc
$1 $2$3 $4$5 // etc
but on 50 records of varying length, my idea is a poor solution. Is there a more dynamic way to do this with a regex? Thanks
A Java fragment I use looks like this (now revised):
result = source.replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
result = result.substring(0, 1).toUpperCase() + result.substring(1);
This, by the way, converts the string givenProductUPCSymbol into Given Product UPC Symbol - make sure this is fine with the way you use this type of thing
Finally, a single line version could be:
result = source.substring(0, 1).toUpperCase() + source.substring(1).replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
Also, in an Example similar to one given in the question comments, the string hiMyNameIsBobAndIWantAPuppy will be changed to Hi My Name Is Bob And I Want A Puppy
For the space problem it's easy if your language supports zero-width look-behind:
var result = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "(?<=[a-z])([A-Z])", " $1");
or even if it doesn't support it:
var result2 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "([a-z])([A-Z])", "$1 $2");
I'm using C#, but the regexes should be usable in any language that supports replacement using $1...$n.
But the lower-to-upper case conversion can't be done directly in a regex. You can find the first character with a regex like ^[a-z], but you can't convert it.
For example in C# you could do
var result3 = Regex.Replace(result, "^([a-z])", m =>
{
return m.ToString().ToUpperInvariant();
});
using a match evaluator to change the input string.
You could then even fuse the two together
var result4 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "^([a-z])|([a-z])([A-Z])", m =>
{
if (m.Groups[1].Success)
{
return m.ToString().ToUpperInvariant();
}
else
{
return m.Groups[2].ToString() + " " + m.Groups[3].ToString();
}
});
A Perl example with unicode character support:
s/\p{Lu}/ $&/g;
s/^./\U$&/;
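For completeness, here is how the same idea might look in C++ with std::regex. The default ECMAScript grammar there supports lookahead but not look-behind, so this sketch inserts a space after any letter that is directly followed by an uppercase letter. The name splitCamelCase is hypothetical, and only ASCII letters are handled. As with the Perl version, runs of capitals are split letter by letter, so givenProductUPCSymbol becomes Given Product U P C Symbol.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper: "thisIsAnExample" -> "This Is An Example".
std::string splitCamelCase(std::string phrase) {
    // Match a letter that is directly followed by an uppercase letter; the
    // lookahead leaves that uppercase letter unconsumed for the next match.
    static const std::regex boundary("([A-Za-z])(?=[A-Z])");
    phrase = std::regex_replace(phrase, boundary, "$1 ");
    if (!phrase.empty()) {
        // Capitalize the first character (cast avoids UB for negative chars).
        phrase[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(phrase[0])));
    }
    return phrase;
}
```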