Find specific char inside delimiter - regex

I have this string:
(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)
I need to grab the commas inside the parentheses for further processing, and I want the commas splitting the groups to remain.
Let's say I want to replace the target commas by FOO, the result should be:
(40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)
I want a Regular Expression that is not language specific.

You can just use a negative lookbehind to find every , that is not preceded by a ), like this:
(?<!\)),
I don't want some language specific functions for this
The format of the above regex is not language specific as can be seen in the following Code Snippet or this regex101 snippet:
const x = '(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)';
const rgx = /(?<!\)),/g;
console.log(x.replace(rgx, ' XXX'));
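As a rough cross-check that the pattern carries over to another engine, here is a minimal sketch in Python. The shortened two-pair sample string and the variable names are illustrative assumptions, not part of the snippet above.

```python
import re

# Shortened sample with two coordinate pairs (illustrative, not the full input).
sample = "(40.95, -74.18),(40.91, -74.09)"

# Negative lookbehind: match a comma only when the previous character is not ")".
inner_commas = re.compile(r"(?<!\)),")

result = inner_commas.sub(" FOO", sample)
print(result)
```

Lookbehind support does vary between engines, so it is worth testing the pattern in whichever one you end up using.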

For example:
import re
s = "(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)"
s = re.sub(r",(?=[^()]+\))", " FOO", s)
print(s)
# (40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)
We use a positive lookahead to replace only those commas that are followed by a ) with no ( or ) in between, that is, commas that sit inside a pair of parentheses.

Use re.sub with a callback function:
import re
inp = "(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)"
output = re.sub(r'\((-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\)', lambda m: r'(' + m.group(1) + r' FOO ' + m.group(2) + r')', inp)
print(output)
This prints:
(40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)
The strategy here is to capture the two numbers in each tuple in separate groups. Then, we replace by connecting the two numbers with FOO instead of the original comma.

Related

Javascript regex to match type annotations

I'm trying to match type annotations from a string of parameters:
foo: string, bar:number, baz: Array<string>
my initial pattern was working fine for primitives:
:\s*\w+
but it's not capturing arrays, so I tried an alternation, but it's not working:
:\s*\w+|:\s*\w+<\w+>
end result should be:
foo, bar, baz
You can make the part with the brackets optional and replace the matches with an empty string leaving the desired result:
:\s*\w+(?:<\w+>)?
let s = "foo: string, bar:number, baz: Array<string>";
console.log(s.replace(/:\s*\w+(?:<\w+>)?/g, ''));
Or match the parts using a capturing group
(\w+):\s*\w
let s = "foo: string, bar:number, baz: Array<string>";
let matches = Array.from(s.matchAll(/(\w+):\s*\w/g), m => m[1]);
console.log(matches.join(", "));
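To see why the alternation from the question left the array type behind, here is a small sketch in Python (chosen only for illustration; the variable names are made up). Alternatives are tried left to right, so the shorter first branch wins at `: Array` and the `<string>` part is never consumed. Putting the longer branch first, or making the bracket part optional as above, avoids that.

```python
import re

s = "foo: string, bar:number, baz: Array<string>"

# Pattern from the question: the first alternative already matches ": Array",
# so the second alternative is never tried at that position.
short_first = re.sub(r":\s*\w+|:\s*\w+<\w+>", "", s)

# Same alternatives with the longer one first.
long_first = re.sub(r":\s*\w+<\w+>|:\s*\w+", "", s)

# Optional non-capturing group, as in the answer above.
optional_group = re.sub(r":\s*\w+(?:<\w+>)?", "", s)

print(short_first)
print(long_first)
print(optional_group)
```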

Python regex lookbehind and lookahead

I need to match the string "foo" from a string with this format:
string = "/foo/boo/poo"
I tried this code:
poo = "poo"
foo = re.match('.*(?=/' + re.escape(poo) + ')', string).group(0)
and it gives me /foo/boo as the content of the variable foo (instead of just foo/boo).
I tried this code:
poo = "poo"
foo = re.match('(?=/).*(?=/' + re.escape(poo) + ')', string).group(0)
and I'm getting the same output (/foo/boo instead of foo/boo).
How can I match only the foo/boo part?
Hey try the following regex:
(?<=/).*(?=/poo)
^^^^^^
It will not take into account your first slash in the result.
Tested regex101: https://regex101.com/r/yzMkTg/1
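Transform your code in the following way and it should work. Note re.search instead of re.match: re.match only tries to match at position 0, where there is no preceding / for the lookbehind to find, so it would return None.
poo = "poo"
foo = re.search('(?<=/).*(?=/' + re.escape(poo) + ')', string).group(0)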
Have a quick look at this link for more information about the behavior of Positive lookahead and Positive lookbehind
http://www.rexegg.com/regex-quickstart.html
You are missing a < in your lookbehind!
Lookbehinds look like this:
(?<=...)
not like this:
(?=...)
That would be a lookahead!
So,
(?<=/).*(?=/poo)
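One detail worth checking when trying this pattern in Python is which function applies it. The sketch below (variable names are illustrative) compares `re.match`, which only attempts a match at the start of the string, with `re.search`, which scans forward. At position 0 there is nothing before the cursor, so the lookbehind cannot succeed there.

```python
import re

string = "/foo/boo/poo"
pattern = r"(?<=/).*(?=/poo)"

# re.match is anchored at position 0, where no "/" precedes the cursor.
anchored = re.match(pattern, string)

# re.search tries later positions too; position 1 has "/" right behind it.
scanned = re.search(pattern, string)

print(anchored)
print(scanned.group(0))
```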

Regex to replace all occurrences between two matches

I am using std::regex and need to do a search and replace.
The string I have is:
begin foo even spaces and maybe new line(
some text only replace foo foo bar foo, keep the rest
)
some more text not replace foo here
Only the stuff between begin .... ( and ) should be touched.
I manage to replace the first foo by using this search and replace:
(begin[\s\S]*?\([\s\S]*?)foo([\s\S]*?\)[\s\S]*)
$1abc$2
Online regex demo
Online C++ demo
However, how do I replace all three foo in one pass? I tried lookarounds, but failed because of the quantifiers.
The end result should look like this:
begin foo even spaces and maybe new line(
some text only replace abc abc bar abc, keep the rest
)
some more text not replace foo here
Question update:
I am looking for a pure regex solution. That is, the question should be solved by only changing the search and replace strings in the online C++ demo.
I have come up with this code (based on Benjamin Lindley's answer):
#include <algorithm>
#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string input_text = "my text\nbegin foo even 14 spaces and maybe \nnew line(\nsome text only replace foo foo bar foo, keep the rest\n)\nsome more text not replace foo here";
    // Group 1: "begin" up to the opening parenthesis; Group 2: the parenthesized block
    std::regex re(R"((begin[^(]*)(\([^)]*\)))");
    std::regex rxReplace(R"(\bfoo\b)");
    std::string output_text;
    // Called for every token: non-matching text (-1) and whole matches (0) alternate
    auto callback = [&](std::string const& m){
        std::smatch smtch;
        if (std::regex_search(m, smtch, re)) {
            output_text += smtch[1].str();
            // Only Group 2 gets the foo -> abc replacement
            output_text += std::regex_replace(smtch[2].str(), rxReplace, "abc");
        } else {
            output_text += m;
        }
    };
    std::sregex_token_iterator
        begin(input_text.begin(), input_text.end(), re, {-1,0}),
        end;
    std::for_each(begin, end, callback);
    std::cout << output_text;
    return 0;
}
See IDEONE demo
I am using one regex to find all matches of begin...(....) and pass them into the callback function where only Group 2 is processed further (a \bfoo\b regex is used to replace foos with abcs).
I suggest using (begin[^(]*)(\([^)]*\)) regex:
(begin[^(]*) - Group 1 matching a character sequence begin followed with zero or more characters other than (
(\([^)]*\)) - Group 2 matching a literal ( followed with zero or more characters other than ) (with [^)]*) and a literal ).
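The same two-step idea (find each begin...(...) block, then replace only inside Group 2) can be sketched compactly in Python. This is an illustrative translation rather than the `std::regex` solution itself; the names `block`, `replace_in_parens`, and `output_text` are assumptions made for the example.

```python
import re

input_text = (
    "my text\nbegin foo even 14 spaces and maybe \nnew line(\n"
    "some text only replace foo foo bar foo, keep the rest\n)\n"
    "some more text not replace foo here"
)

# Group 1: "begin" up to the opening parenthesis; Group 2: the parenthesized block.
block = re.compile(r"(begin[^(]*)(\([^)]*\))")


def replace_in_parens(m):
    # Leave Group 1 untouched and replace whole-word "foo" only inside Group 2.
    return m.group(1) + re.sub(r"\bfoo\b", "abc", m.group(2))


output_text = block.sub(replace_in_parens, input_text)
print(output_text)
```

Text outside a matched block is copied through unchanged, which is what the `-1` submatch does in the token-iterator version.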

Python Replacement of Shortcodes using Regular Expressions

I have a string that looks like this:
my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
And I need it to be converted into this:
new_str = "This sentence has a <b>bolded</b> word, and <b>another</b> one too!"
Is it possible to use Python's string.replace or re.sub method to do this intelligently?
Just capture all the characters before | inside [] into one group, and the part after | into another group. Then refer to the captured groups through backreferences in the replacement part to get the desired output.
Regex:
\[([^\[\]|]*)\|([^\[\]]*)\]
Replacement string:
<\1>\2</\1>
>>> import re
>>> s = "This sentence has a [b|bolded] word, and [b|another] one too!"
>>> m = re.sub(r'\[([^\[\]|]*)\|([^\[\]]*)\]', r'<\1>\2</\1>', s)
>>> m
'This sentence has a <b>bolded</b> word, and <b>another</b> one too!'
Explanation...
Try this expression: [[]b[|](\w+)[]]. A shorter version is \[b\|(\w+)\].
The expression searches for anything that starts with [b| and captures what is between it and the closing ] using \w+, which means [a-zA-Z0-9_]. To include a wider range of characters you can use .*? instead of \w+, which gives \[b\|(.*?)\].
Sample Demo:
import re
# Escaped short form of [[]b[|](\w+)[]]
p = re.compile(r'\[b\|(\w+)\]')
test_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
# Python backreferences are written \1, not $1
subst = r"<bold>\1</bold>"
result = re.sub(p, subst, test_str)
print(result)
Output:
This sentence has a <bold>bolded</bold> word, and <bold>another</bold> one too!
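The difference between `\w+` and `.*?` mentioned above shows up as soon as the bolded text contains a space. The following is a small sketch with a made-up sample string; it uses `<b>` tags as in the question.

```python
import re

# Hypothetical sample: one shortcode with two words, one with a single word.
sample = "A [b|very bold] claim and a [b|short] one."

# \w+ stops at the space, so "[b|very bold]" is left untouched.
word_only = re.sub(r"\[b\|(\w+)\]", r"<b>\1</b>", sample)

# .*? lazily takes everything up to the first closing bracket.
lazy_any = re.sub(r"\[b\|(.*?)\]", r"<b>\1</b>", sample)

print(word_only)
print(lazy_any)
```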
Just for reference, in case you don't want two problems:
Quick answer to your particular problem:
my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
print(my_str.replace("[b|", "<b>").replace("]", "</b>"))
# output:
# This sentence has a <b>bolded</b> word, and <b>another</b> one too!
This has the flaw that it will replace every ] with </b>, regardless of whether that is appropriate. So you might want to consider the following:
Generalize and wrap it in a function
def replace_stuff(s, char):
    opening = "[{}|".format(char)
    begin = s.find(opening)
    while begin != -1:
        end = s.find("]", begin)
        if end == -1:
            # Unclosed shortcode: stop instead of looping forever
            break
        s = (s[:begin]
             + s[begin:end+1].replace(opening, "<{}>".format(char)).replace("]", "</{}>".format(char))
             + s[end+1:])
        begin = s.find(opening)
    return s
For example
s = "Don't forget to [b|initialize] [code|void toUpper(char const *s)]."
print(replace_stuff(s, "code"))
# output:
# Don't forget to [b|initialize] <code>void toUpper(char const *s)</code>.

Regex: optional group

I want to split a string like this:
abc//def//ghi
into a part before and after the first occurrence of //:
a: abc
b: //def//ghi
I'm currently using this regex:
(?<a>.*?)(?<b>//.*)
Which works fine so far.
However, sometimes the // is missing in the source string and obviously the regex fails to match. How is it possible to make the second group optional?
An input like abc should be matched to:
a: abc
b: (empty)
I tried (?<a>.*?)(?<b>//.*)? but that left me with lots of NULL results in Expresso so I guess it's the wrong idea.
Try a ^ at the beginning of your expression to match the beginning of the string and a $ at the end to match the end of the string (this will make the ungreedy match work).
^(?<a>.*?)(?<b>//.*)?$
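To illustrate why the unanchored attempt produced empty results, here is a hedged sketch in Python. Python spells named groups `(?P<a>...)` rather than `(?<a>...)`, and the helper name `parts` is made up for the example. Without `$`, the lazy `.*?` is satisfied with zero characters and the optional group is simply skipped.

```python
import re

# Unanchored: the lazy group can match nothing and the optional group is skipped.
loose = re.compile(r"(?P<a>.*?)(?P<b>//.*)?")

# Anchored: "$" forces the lazy group to grow until the rest of the string fits.
anchored = re.compile(r"^(?P<a>.*?)(?P<b>//.*)?$")


def parts(regex, text):
    # Return the two named groups for a match attempted at the start of text.
    m = regex.match(text)
    return m.group("a"), m.group("b")


print(parts(loose, "abc//def//ghi"))
print(parts(anchored, "abc//def//ghi"))
print(parts(anchored, "abc"))
```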
A proof of Stevo3000's answer (Python):
import re
test_strings = ['abc//def//ghi', 'abc//def', 'abc']
regex = re.compile("(?P<a>.*?)(?P<b>//.*)?$")
for ts in test_strings:
    match = regex.match(ts)
    print('a:', match.group('a'), 'b:', match.group('b'))
a: abc b: //def//ghi
a: abc b: //def
a: abc b: None
Why use group matching at all? Why not just split by "//", either as a regex or a plain string?
use strict;
my $str = 'abc//def//ghi';
my $short = 'abc';
print "The first:\n";
# Limit of 2: split only at the first occurrence of //
my @groups = split(/\/\//, $str, 2);
foreach my $val (@groups) {
    print "$val\n";
}
print "The second:\n";
@groups = split(/\/\//, $short, 2);
foreach my $val (@groups) {
    print "$val\n";
}
gives
The first:
abc
def//ghi
The second:
abc
[EDIT: Fixed to return max 2 groups]
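For comparison, the plain-string route can also be sketched in Python with `str.partition`, which splits at the first occurrence of the separator only. The helper name `split_first` is hypothetical. Unlike the split above, this version keeps the leading // on the second part, matching the output asked for in the question.

```python
def split_first(s, sep="//"):
    # partition returns (head, sep, tail); sep and tail are "" when sep is absent.
    head, found, tail = s.partition(sep)
    return head, found + tail


print(split_first("abc//def//ghi"))
print(split_first("abc"))
```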