How to capture repeating groups in Regex (for C#) [duplicate] - regex

I have to detect repeating groups of characters in a string and extract one part of each captured group.
Here is an example of a string to parse: "za e eStartGood1Endds qStartGood2Endsds df"
My regex is: ".*?(?::Start(.+)End.*?)+"
Expected captures: Good1, Good2, etc.
My regex captures the wrong thing: it seems that (?::Start(.+) is treated as the group to capture...
Am I missing something?
Thanks!

This regex does the job:
/(?<=Start)(.+?)(?=End)/g
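As a rough illustration of how this lookaround pattern behaves, here is a minimal sketch in Python rather than C# (the language swap and the variable names are assumptions made for the example). It runs the pattern against the sample string from the question; `findall` plays the role of the `/g` flag.

```python
import re

text = "za e eStartGood1Endds qStartGood2Endsds df"

# The lookbehind and lookahead keep "Start" and "End" out of the match itself;
# the lazy .+? stops at the first "End" instead of the last one.
pattern = re.compile(r"(?<=Start)(.+?)(?=End)")

# findall returns every non-overlapping match in the string.
captured = pattern.findall(text)
print(captured)  # ['Good1', 'Good2']
```

The lazy quantifier matters here: with a greedy `.+` the single match would run from the first Start to the last End.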

Why not use this pattern: \*{2}Start\*{2}(.*?)\*{2}End\*{2}
For this input string: za e e**Start**Good1**End**ds dq**Start**Good2**End**sds df, it captures Good1 and Good2.
You can play with it here: https://regex101.com/r/dG0dX6/2
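A small sketch of the same idea in Python (an assumption for illustration; the question itself targets C#) shows the difference between the whole match and the capture group: each match spans the delimiters, while group 1 holds only the text between them.

```python
import re

text = "za e e**Start**Good1**End**ds dq**Start**Good2**End**sds df"
pattern = re.compile(r"\*{2}Start\*{2}(.*?)\*{2}End\*{2}")

# group(0) is the full match including delimiters; group(1) is the payload.
for match in pattern.finditer(text):
    print(match.group(0), "->", match.group(1))

payloads = [m.group(1) for m in pattern.finditer(text)]
print(payloads)  # ['Good1', 'Good2']
```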

How to match optional group, if it is already matched by main group? [duplicate]

I have input strings like:
any[sym)bol_text
any[sym)bol_text (any[sym)bol_text) any[sym)bol_text
any[sym)bol_text (this_text)
any[sym)bol_text2 (this_text)Fzcj
And I have regexp:
(?<text>[^\r\n]+)(?:\(this_text\))?
But I can't handle strings with the optional (this_text) group. The first group matches everything, including (this_text), and I don't want that exact text in the output.
^(?<text>.+?)(?:\(this_text\).*)?$
So yes, the last group should also consume any trailing text, and the pattern should end with $.
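To check the anchored pattern against the sample inputs, here is a minimal sketch in Python. The question does not name a regex flavor, so Python is an assumption; it spells named groups `(?P<text>...)`. The helper name `main_text` and the `rstrip()` call are also hypothetical additions for the example.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
pattern = re.compile(r"^(?P<text>.+?)(?:\(this_text\).*)?$")

def main_text(line):
    """Return the part of `line` before an optional "(this_text)" suffix."""
    match = pattern.match(line)
    # rstrip() drops the space that precedes "(this_text)" in the samples.
    return match.group("text").rstrip() if match else None

samples = [
    "any[sym)bol_text",
    "any[sym)bol_text (any[sym)bol_text) any[sym)bol_text",
    "any[sym)bol_text (this_text)",
    "any[sym)bol_text2 (this_text)Fzcj",
]
for line in samples:
    print(repr(main_text(line)))
```

The lazy `.+?` gives the optional group a chance to match first; because of the `$` anchor, the main group still grows to the end of the line when `(this_text)` is absent.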

How can I remove a certain pattern from a string? [duplicate]

I have a string like "682_2, 682_3, 682_4" (682 is a random number).
How can I get the string "2, 3, 4" using regex and Ruby?
You can do this in Ruby:
input="682_2, 682_3, 682_4"
output = input.gsub(/\d+_/,"")
puts output
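For comparison, a rough Python equivalent of the gsub call (Python is an assumption here, since the question asks about Ruby) uses `re.sub` with the same pattern:

```python
import re

source = "682_2, 682_3, 682_4"

# Remove every run of digits that is immediately followed by an underscore.
result = re.sub(r"\d+_", "", source)
print(result)  # 2, 3, 4
```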
A simple regex could be
/_([0-9]+)$/; the first capture group of the result will hold 2 for 682_2 and 3 for 682_3.
A Ruby snippet would be "64532_2".match(/_([0-9]+)/).captures[0]
You can use scan, which returns an array containing the matches:
string_code.scan(/(?<=_)\d/)
(?<=_) is a lookbehind: it requires the given pattern (_ in this case) to appear immediately before the match without including it in the result, so only \d is captured. If there can be more than one digit, as in 682_13,682_33, then \d+ is necessary.
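The difference between \d and \d+ after the lookbehind can be seen in a short sketch. It is written in Python with `re.findall` standing in for Ruby's scan, which is an assumption made for illustration.

```python
import re

# One digit after each underscore: enough for single-digit suffixes.
single = re.findall(r"(?<=_)\d", "682_2, 682_3, 682_4")
print(single)     # ['2', '3', '4']

# With multi-digit suffixes, \d keeps only the first digit of each suffix...
truncated = re.findall(r"(?<=_)\d", "682_13,682_33")
print(truncated)  # ['1', '3']

# ...while \d+ takes the whole run of digits.
full = re.findall(r"(?<=_)\d+", "682_13,682_33")
print(full)       # ['13', '33']
```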

Python regex to parse '#####' text in description field [duplicate]

Here's the line I'm trying to parse:
#abc def#gmail.com #ghi j#klm #nop.qrs #tuv
And here's the regex I've gotten so far:
#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
My goal is to get ['#abc', '#ghi', '#tuv'], but no matter what I do, I can't get 'j#klm' to not match. Any help is much appreciated.
Try using re.findall with the following regex pattern:
(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)
import re
inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)', inp)
print(matches)
This prints:
['#abc', '#ghi', '#tuv']
The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that what precedes the # symbol is either a space or the start of the string. We can't use a word boundary here because # is not a word character. We use a similar lookahead (?=\s|$) at the end of the pattern to rule out matching things like #nop.qrs. Again, a word boundary alone would not be sufficient.
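To make the word-boundary point concrete, the sketch below compares a \b-based pattern with whitespace lookarounds on the same input. The negative-lookaround form `(?<!\S)...(?!\S)` is not from the answer above; it is an alternative spelling of the same "whitespace or string edge" check, shown here as an assumption.

```python
import re

inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"

# \b needs a word character on exactly one side. Since # is not a word
# character, \b before # only holds when a letter or digit precedes it,
# which selects precisely the unwanted tokens.
with_boundary = re.findall(r"\b#[A-Za-z]+\b", inp)
print(with_boundary)    # ['#gmail', '#klm']

# "Not preceded by a non-space" and "not followed by a non-space" cover
# both whitespace and the start or end of the string.
with_lookarounds = re.findall(r"(?<!\S)#[A-Za-z]+(?!\S)", inp)
print(with_lookarounds)  # ['#abc', '#ghi', '#tuv']
```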
Just add a start-of-line anchor at the beginning:
^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
It should work!

Regexp for string starting with a + and having numbers only [duplicate]

I have the following regex for a string that starts with a + and otherwise contains only digits:
PatternArticleNumber = $"^(\\+)[0-9]*";
However, this allows strings like:
+454545454+4545454
This should not be allowed. Only the 1st character should be a +, others numbers only.
Any idea what may be wrong with my regex?
You can probably work around this problem by adding an ending anchor to your regex, i.e. use this:
PatternArticleNumber = $"^(\\+)[0-9]*$";
The problem with your current pattern is that its end is open. So the string +454545454+4545454 can appear to be a match. In fact, the entire string does not match, but the engine can match the first portion, before the second +, and report success.
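The effect of the missing end anchor can be reproduced in a few lines. This sketch uses Python's `re` module instead of C# (an assumption for illustration) and drops the capture group around the +, which does not change what matches.

```python
import re

candidate = "+454545454+4545454"

# Open-ended pattern: the search succeeds on the prefix before the second +.
open_ended = re.search(r"^\+[0-9]*", candidate)
print(open_ended.group(0))  # +454545454

# With the ending anchor the whole string must fit, so there is no match.
anchored = re.search(r"^\+[0-9]*$", candidate)
print(anchored)             # None

# A valid article number still passes the anchored pattern.
valid = re.search(r"^\+[0-9]*$", "+454545454")
print(valid is not None)    # True
```

Note that `[0-9]*` also accepts a lone +; `[0-9]+` would require at least one digit, if that is the intent.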

Regex to match all character groups in a string [duplicate]

I need a regex to match the groups of characters in a string.
For example, this is-a#beautiful^day should result in the following list: this, is, a, beautiful, day.
Note that I don't know how long the string is or which characters separate the words.
Any ideas? I have no clue how to build a regex for this.
If you want to find all groups of letters:
import re
text = "this is-a#beautiful^day"
words = re.findall(r'[A-Za-z]+', text)
print(words)
['this', 'is', 'a', 'beautiful', 'day']
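Since the separators are unknown, another way to look at the problem is to split on anything that is not a letter. The sketch below is an alternative to the findall approach, not part of the answer above, and assumes that "words" means runs of ASCII letters.

```python
import re

sentence = "this is-a#beautiful^day"

# Split on runs of non-letters; the filter drops empty strings that appear
# when the sentence starts or ends with a separator.
parts = [piece for piece in re.split(r"[^A-Za-z]+", sentence) if piece]
print(parts)  # ['this', 'is', 'a', 'beautiful', 'day']
```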