How to replace different matching groups with different text in Regex - regex

I have the following text:
dangernounC2
cautionnounC2
alertverbC1
dangerousadjectiveB1
What I need as an output is:
danger (n)
caution (n)
alert (v)
dangerous (adj)
I would know how to do this if the list was, for example, all nouns or all verbs etc., but is there a way to replace each matching group with different corresponding text?

Here is a regular expression that would work for you. It is a bit of a trick, though: it only works because the replacement text (n, v, adj) is already part of the matched text and can therefore be captured.
Regular expression
(n)ounC2|(v)erbC1|(adj)ectiveB1
Substitution (note the leading space)
 ($1$2$3)
Use (\1\2\3) instead, with the same leading space, if you're using Python
Explanation
(n)ounC2|(v)erbC1|(adj)ectiveB1 will match either nounC2, verbC1 or adjectiveB1
When it matches nounC2, Group 1 will contain n, Group 2 and 3 contain nothing
When it matches verbC1, Group 2 will contain v, Group 1 and 3 contain nothing
When it matches adjectiveB1, Group 3 will contain adj, Group 1 and 2 contain nothing
Every match is replaced with a space followed by the values of the 3 groups in parentheses. Groups that did not take part in the match contribute an empty string.
Demos
Demo on RegEx101
Code snippet (JavaScript)
const regex = /(n)ounC2|(v)erbC1|(adj)ectiveB1/gm;
const str = `
dangernounC2
cautionnounC2
alertverbC1
dangerousadjectiveB1
eatverbC1
prettyadjectiveB1`;
const subst = ` ($1$2$3)`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
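When the replacement text is not a prefix of the matched text, the capture-group trick above no longer applies. A more general option is to pass a replacer function to replace and look the replacement up in a table. The following is only a sketch: the names abbreviations, labelPattern and abbreviate are made up for illustration, and the pattern assumes every label is followed by a level such as C2 or B1.

```javascript
// Hypothetical lookup table: word-class label -> abbreviation.
const abbreviations = { noun: "n", verb: "v", adjective: "adj" };

// Assumption: each label is followed by a level made of A-C and 1 or 2 (e.g. C2, B1).
const labelPattern = /(noun|verb|adjective)[A-C][12]/g;

// The replacer receives the whole match and then each captured group.
const abbreviate = (text) =>
  text.replace(labelPattern, (_whole, label) => ` (${abbreviations[label]})`);

console.log(abbreviate("dangernounC2\nalertverbC1\ndangerousadjectiveB1"));
// danger (n)
// alert (v)
// dangerous (adj)
```

With this shape, adding a new word class means adding one entry to the table and one alternative to the pattern.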

Related

Capturing all occurrences of a repeated group in a string and reference them for substitution

I have the following regex, which matches any number in the string and returns it in a group that I then replace with other text.
For the sample string:
/text_1/123456/text_2
With /^(.*[^0-9])+([0-9]{3,}+)+(.*)$ and a substitution like $1captured_group$3, I get my desired result, i.e. /text_1/captured_group/text_2
However, for scenarios where the capturing group appears more than once in the given string, such as:
/text_1/123456/text_2/789011
/text_1/123456/text_2/789011/abc/12345
The given regex captures only the last group, i.e. 789011 and 12345 respectively. However, what I want is to capture all of the groups and be able to reference them later to replace them.
An explanation given on regex101.com, I believe, addresses my scenario:
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data.
However, I am not sure how to put a capturing group around the repeated group to capture all iterations and reference all the matched values later.
As Hao Wu commented:
"If you want to match multiple occurrences you need to get rid of the anchors (^, $) and add a global (g) modifier, such as /\b[0-9]{3,}\b/g"
As for storing matches and referencing them for later use, you could have an array of objects wherein each object has the match and an array of two indices -- the first index being the index of the start of the match and the second index being the index of the end of the match:
// string = `123`
{match: 123, indices: [0, 2]}
In the example below, the function tagMatches(str, rgx) uses the .matchAll() method.
const tagMatches = (str, rgx) => {
  const matches = str.matchAll(rgx);
  const result = [];
  for (const match of matches) {
    // match[0] is the matched text; the end index is inclusive, as in the example above
    const end = match.index + match[0].length - 1;
    result.push({ match: +match[0], indices: [match.index, end] });
  }
  return result;
};
const string = `utfuduyiutcv fvtycy 1sdtyveaf 678900 amsiofjsogifn979/125487/`;
const regexp = /\b(\d){3,}\b/g;
const tagged = tagMatches(string, regexp)
console.log(tagged);
console.log("first match: "+tagged[0].match);
console.log("second match start: "+tagged[1].indices[0]);
console.log("first match end: "+tagged[0].indices[1]);
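For the substitution the question asks about, the same unanchored, global pattern can be handed straight to replace. This is a minimal sketch under that assumption. The helper names replaceAllRuns and replaceAndCollect, and the numbered placeholder format, are invented for illustration.

```javascript
// Unanchored, global pattern: every run of three or more digits is a match.
const numberRun = /\b\d{3,}\b/g;

// Replace every run with the same placeholder text.
const replaceAllRuns = (path) => path.replace(numberRun, "captured_group");

// Replace each run with a numbered placeholder and keep the original values
// so they can be referenced later.
const replaceAndCollect = (path) => {
  const captured = [];
  const replaced = path.replace(numberRun, (match) => {
    captured.push(match);
    return `captured_group_${captured.length}`;
  });
  return { replaced, captured };
};

console.log(replaceAllRuns("/text_1/123456/text_2/789011"));
// /text_1/captured_group/text_2/captured_group
console.log(replaceAndCollect("/text_1/123456/text_2/789011/abc/12345"));
```

The single digits in text_1 and text_2 are left alone because they are shorter than three digits.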

regular expression include existing matches in search

I am trying to capture parts of an equation using a regular expression.
Equation: 1×2÷3×4
Regular expression: \d+(×|÷)\d+
I expect this to result in:
1×2
2÷3
3×4
But it only returns:
1×2
3×4
I assume this has something to do with the structure, but I'm not even sure where to start or what to google to find the answer.
Once your regex matches something, the search resumes after the end of that match, which is why you get only two matches. You can use a positive lookahead, (?=...), instead: it checks that an operator ([×÷]) and a digit (\d) follow the current digit and captures both, without consuming them.
You can use
/\d(?=([×÷])(\d))/g
The code below is specific to JavaScript:
const regex = /\d(?=([×÷])(\d))/g;
const str = "1×2÷3×4";
const results = [...str.matchAll(regex)].map((arr) => {
  return `${arr[0]}${arr[1]}${arr[2]}`;
});
console.log(results);
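The pattern above consumes a single digit per match. If the operands can have several digits, as the \d+ in the question suggests, one possible variant uses \d+ both outside and inside the lookahead. This is a sketch rather than a tested general solution, and the function name overlappingPairs is made up for illustration.

```javascript
// Consume only the left operand; the operator and right operand are
// captured inside the lookahead, so the right operand stays available
// as the left operand of the next match.
const overlappingPairs = (equation) =>
  [...equation.matchAll(/\d+(?=([×÷])(\d+))/g)].map(
    ([left, operator, right]) => `${left}${operator}${right}`
  );

console.log(overlappingPairs("1×2÷3×4"));   // [ '1×2', '2÷3', '3×4' ]
console.log(overlappingPairs("12×34÷5×6")); // [ '12×34', '34÷5', '5×6' ]
```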
Each part of the string can be matched only once: once the substring "1×2" has matched the regular expression, the '2' won't be reused in subsequent matches. Consider the string "×2÷3×4" (i.e. drop the first '1'); in this case the first (and only) match is "2÷3".

Compound Words - Regex [duplicate]

I would expect this line of JavaScript:
"foo bar baz".match(/^(\s*\w+)+$/)
to return something like:
["foo bar baz", "foo", " bar", " baz"]
but instead it returns only the last captured match:
["foo bar baz", " baz"]
Is there a way to get all the captured matches?
When you repeat a capturing group, in most flavors, only the last capture is kept; any previous capture is overwritten. In some flavors, e.g. .NET, you can get all intermediate captures, but this is not the case with JavaScript.
That is, in JavaScript, if you have a pattern with N capturing groups, you can capture at most N strings per match, even if some of those groups were repeated.
So generally speaking, depending on what you need to do:
If it's an option, split on delimiters instead
Instead of matching /(pattern)+/, maybe match /pattern/g, perhaps in an exec loop
Do note that these two aren't exactly equivalent, but it may be an option
Do multilevel matching:
Capture the repeated group in one match
Then run another regex to break that match apart
References
regular-expressions.info/Repeating a Capturing Group vs Capturing a Repeating Group
Javascript flavor notes
Example
Here's an example of matching <some;words;here> in a text, using an exec loop, and then splitting on ; to get individual words (see also on ideone.com):
var text = "a;b;<c;d;e;f>;g;h;i;<no no no>;j;k;<xx;yy;zz>";
var r = /<(\w+(;\w+)*)>/g;
var match;
while ((match = r.exec(text)) !== null) {
  // group 1 holds the whole semicolon-separated list
  console.log(match[1].split(";").join(","));
}
// c,d,e,f
// xx,yy,zz
The pattern used is:
      _2__
     /    \
<(\w+(;\w+)*)>
 \__________/
      1
This matches <word>, <word;another>, <word;another;please>, etc. Group 2 is repeated to capture any number of words, but it can only keep the last capture. The entire list of words is captured by group 1; this string is then split on the semicolon delimiter.
Related questions
How do you access the matched groups in a javascript regex?
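Applied to the "foo bar baz" example from the question, the multilevel approach could look like the sketch below: validate the whole string first, then collect the pieces with a global match. The function name captureAll is hypothetical, and the validation pattern is a rewritten form of the question's regex that is assumed to accept the same strings while avoiding the nested quantifier.

```javascript
// Step 1: check that the whole string is a sequence of words separated by whitespace.
// Step 2: break it apart with a global match, one "\s*\w+" piece per word.
const captureAll = (input) => {
  if (!/^\s*\w+(?:\s+\w+)*$/.test(input)) return null;
  return [input, ...input.match(/\s*\w+/g)];
};

console.log(captureAll("foo bar baz"));
// [ 'foo bar baz', 'foo', ' bar', ' baz' ]
console.log(captureAll("foo, bar"));
// null
```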
How about this? "foo bar baz".match(/(\w+)+/g)
Unless you have a more complicated requirement for how you're splitting your strings, you can split them, and then return the initial string with them:
var data = "foo bar baz";
var pieces = data.split(' ');
pieces.unshift(data);
Try using the g flag:
"foo bar baz".match(/\w+/g)
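You can also drop the repetition and let a global search return each word as its own match.
Note that a ? after a group makes the group optional rather than lazy (the lazy quantifiers are *?, +? and ??), so this pattern can also yield empty matches: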
REGEX: (\s*\w+)?
RESULT:
Match 1: foo
Match 2: bar
Match 3: baz

scala regex to match tab separated words from a string

I'm trying to match the following string
"name type this is a comment"
Name and type are definitely there.
Comment may or may not exist.
I'm trying to store this into variables n,t and c.
val nameTypeComment = """^(\w+\s+){2}(?:[\w+\s*)*\(\,\,]+)"""
str match { case nameType(n, t, c) => print(n,t,c) }
This is what I have, but it doesn't seem to work. Any help is appreciated.
val nameType = """^(\w+)\s+([\w\)\(\,]+)""".r
However, this works when the strings contain only a name and a type, without the comment (a group of words that may or may not be there).
Note that ^(\w+\s+){2}(?:[\w+\s*)*\(\,\,]+) regex only contains 1 capturing group ((\w+\s+)) while you define 3 in the match block.
The ^(\w+)\s+([\w\)\(\,]+) only contains 2 capturing groups: (\w+) and ([\w\)\(\,]+).
To make your code work, you need to define 3 capturing groups. Also, it is not clear what the separators are, so let me assume the first two fields are each one or more alphanumeric/underscore characters, separated by one or more whitespace characters. The comment is anything after the first two fields.
Then, use
val s = "name type this comment a comment"
val nameType = """(\w+)\s+(\w+)\s+(.*)""".r
val res = s match {
  case nameType(n, t, c) => print(n,t,c)
  case _ => print("NONE")
}
See the online demo
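Note that we need to compile a regex object: pay attention to the .r after the regex pattern assigned to nameType.
Note that a pattern inside match is anchored by default, so the start-of-string anchor ^ can be omitted.
Also, it is a good idea to add case _ to define the behavior when no match is found. With the pattern above, that includes a string with no comment, because the \s+ after the second field must match at least one whitespace character.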

1 to 5 of the same groups in REGEX

For a string such as:
abzyxcabkmqfcmkcde
Notice that there are string patterns between ab and c: zyx and kmqf. To capture the first string pattern:
ab([a-z]{3,5})c
Is it possible to match both of the groups from the sample string? Actually, there should be 1 to 5 groups.
Note: python style regex.
You can verify that a given string conforms to the 1-5 repetitions of ab([a-z]{3,5})c using this regex
(?:ab([a-z]{3,5})c){1,5}
or this one if there are characters expected between the groups
(?:ab([a-z]{3,5})c.*?){1,5}
However, you will only be able to extract the last matching group from that string, not any of the previous ones. To get the earlier ones, you need to use hsz's approach of matching all results.
Just match all results, i.e. with the g flag:
/ab([a-z]{3,5})c/g
or some method like in Python:
re.findall(pattern, string, flags=0)
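Combining the two answers, a global match can collect every group and the 1 to 5 limit can then be checked on the count. The sketch below uses JavaScript's matchAll as the counterpart of the g flag form shown above. The function name extractGroups is hypothetical, and returning null when the count is out of range is an assumption about the desired behavior.

```javascript
// Collect group 1 of every "ab...c" occurrence, then enforce the 1-5 limit.
const extractGroups = (input) => {
  const groups = [...input.matchAll(/ab([a-z]{3,5})c/g)].map((m) => m[1]);
  return groups.length >= 1 && groups.length <= 5 ? groups : null;
};

console.log(extractGroups("abzyxcabkmqfcmkcde")); // [ 'zyx', 'kmqf' ]
console.log(extractGroups("xyz"));                // null
```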