I have this text:
text1 without brackets
text2 (with brackets)
and I need two groups in every line:
group#1: text1 without brackets
group#2:
group#1: text2
group#2: with brackets
Here is a link for this example: regexr.com
Thanks for help!
You may use
^(.*?)(?:\s*\(([^()]*)\))?$
See the regex demo.
Details
^ - start of string
(.*?) - Group 1: any 0+ chars, as few as possible
(?:\s*\(([^()]*)\))? - an optional sequence of patterns that is tried at least once:
\s* - 0+ whitespaces
\( - a ( char
([^()]*) - Group 2: 0+ chars other than ( and )
\) - a ) char
$ - end of the string.
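As a quick check, here is a minimal Python sketch of the pattern above applied to the two sample lines. The `split_line` helper name is made up for illustration, and Python's `re` is just one engine the pattern should run in unchanged.

```python
import re

# Pattern from the answer: lazy group 1, then an optional "(...)" tail.
PATTERN = re.compile(r"^(.*?)(?:\s*\(([^()]*)\))?$")

def split_line(line):
    """Return (group 1, group 2); group 2 is None when there are no brackets."""
    match = PATTERN.match(line)
    return match.groups() if match else None

for line in ["text1 without brackets", "text2 (with brackets)"]:
    print(split_line(line))
```

Because group 1 is lazy and the optional tail is tried at each step, the whitespace before the bracket ends up outside both groups.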
Try pattern: ([^(\n]+)(?:\n|\(([^)]+))
Explanation:
([^(\n]+) - first capturing group: match one or more characters other than ( or \n so it will match everything until opening bracket or newline character
(?:...) - used in order to make use of alternation and not create second capturing group
\n|\(([^)]+) - match a newline, or an opening bracket ( followed by one or more characters other than the closing bracket ), which are stored in the second capturing group.
Demo
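A small Python sketch of this second pattern, run over both sample lines at once. One thing it seems worth showing: group 1 keeps the space before the bracket, so the sketch strips it (the stripping is my addition, not part of the answer).

```python
import re

# Pattern from the answer; it scans the whole text rather than one line.
PATTERN = re.compile(r"([^(\n]+)(?:\n|\(([^)]+))")

text = "text1 without brackets\ntext2 (with brackets)"

# Group 2 is None when the line ended with a newline instead of a bracket.
pairs = [(m.group(1).strip(), m.group(2)) for m in PATTERN.finditer(text)]
print(pairs)
```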
I have the following input: Mobileapp/1.19.2 (SM-S908B; Android 12; da-DK)
I either need to match (SM-S908B; and da-DK) or just (SM-S908B;
So match anything between ( and first ; and last ; and )
I tried and was able to use this expression ([^(;]+);([^;]+)
But it matches SM-S908B; Android 12
Would really appreciate if anyone could help since I am still learning Regex.
Assuming at least one occurrence of the semicolon is present, maybe chuck both options in their own group:
(?:\(([^;]+)|;\s*([^;)]+)\))
See an online demo
(?: - Open non-capture group;
\(([^;]+) - Match a literal open parenthesis followed by a 1st capture group to match 1+ non-semicolon characters;
| - Or;
;\s*([^;)]+)\) - Match a semicolon and 0+ whitespace characters before a 2nd capture group to match 1+ characters other than semicolon or closing parenthesis, followed by a literal closing parenthesis.
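To see which group each match fills, here is a minimal Python sketch using the input from the question. The variable names are mine.

```python
import re

# Pattern from the answer: group 1 after "(", group 2 before ")".
PATTERN = re.compile(r"(?:\(([^;]+)|;\s*([^;)]+)\))")

ua = "Mobileapp/1.19.2 (SM-S908B; Android 12; da-DK)"

# Each match fills exactly one of the two groups; keep whichever is set.
parts = [m.group(1) or m.group(2) for m in PATTERN.finditer(ua)]
print(parts)
```

The middle field is skipped because the second alternative requires a closing parenthesis right after the captured text.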
Another option is to match just these parts:
(?:\(|;.*;\s*|\G(?!^))\K[^;)]+
See an online demo
(?: - Open non-capture group;
\( - Match an open parenthesis;
| - Or;
;.*;\s* - Match from 1st semicolon to last semicolon with possible 0+ whitespace chars;
| - Or;
\G(?!^) - Assert position at end of previous match but exclude start-line with negative lookahead;
\K - Reset starting point of reported match;
[^;)]+ - Match 1+ characters other than semicolon or closing parenthesis.
I am trying to do regex parsing and matching and optionally discard the rest of the string.
My strings are of type:
[GROUP 1][delimiter 1][GROUP 2][delimiter 2][GROUP 3][delimiter 3 - optional][REST OF THE STRING - optional]
For example:
07. Neospace - Into The Night (Chris Van Buren)
13. Atomic Space Orchestra - Starfleet
I am trying to capture GROUP 1, GROUP 2 and GROUP 3 while ignoring REST OF THE STRING
The following regex works well if [delimiter 3] is present:
(\d+)\. (.*) - (.*)(?: \()
I am getting "07", "Neospace" and "Into The Night".
But for the second string, there is no match, because my last non-capturing group is mandatory.
When I try to make the last group optional like this:
(\d+)\. (.*) - (.*)(?: \()?
the non-capturing group stops working and I get "Into The Night (Chris Van Buren)" for GROUP 3, which is NOT what I want.
If the 3rd group has ( as a delimiter, you can use a negated character class to exclude matching a ( char.
Note that using * as a quantifier can also match an empty string between the delimiters.
If the match should be at the start of the string, you can prepend the pattern with ^
(\d+)\. (.*?) - ([^(\n]*)
Explanation
(\d+) Capture group 1, match 1+ digits
\. Match .
(.*?) Capture group 2, match 0+ times any character, as few as possible
- Match literally
([^(\n]*) Capture group 3, match 0+ times any character except ( or a newline
See a regex demo.
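A minimal Python sketch of this pattern on both example strings, with the ^ anchor prepended as suggested. The `parse_track` name is hypothetical, and the `rstrip()` is my addition: group 3 stops right before the ( and therefore keeps the space in front of it.

```python
import re

# Pattern from the answer, anchored at the start of the string.
PATTERN = re.compile(r"^(\d+)\. (.*?) - ([^(\n]*)")

def parse_track(line):
    """Return (number, artist, title), or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    number, artist, title = match.groups()
    # Drop the space that precedes an optional " (" delimiter.
    return number, artist, title.rstrip()

print(parse_track("07. Neospace - Into The Night (Chris Van Buren)"))
print(parse_track("13. Atomic Space Orchestra - Starfleet"))
```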
I have fields which contain data in the following possible formats (each line is a different possibility):
AAA - Something Here
AAA - Something Here - D
Something Here
Note that the first group of letters (AAA) can be of varying lengths.
What I am trying to capture is the "Something Here" or "Something Here - D" (if it exists) using PCRE, but I can't get the Regex to work properly for all three cases. I have tried:
- (.*) which works fine for cases 1 and 2 but obviously not 3;
(?<= - )(.*) which also works fine for cases 1 and 2;
(?! - )(.+)| - (.+) works for cases 2 and 3 but not 1.
I feel like I'm on the verge of it but I can't seem to crack it.
Thanks in advance for your help.
Edit: I realized that I was unclear in my requirements. If there is a trailing " - D" (the letter in the data is arbitrary but should only be a single character), that needs to be captured as well.
About the patterns that you tried:
- (.*) This pattern matches the first occurrence of - followed by the rest of the line. It matches too much for the second example, as the .* also matches the second occurrence of -
(?<= - )(.*) This pattern matches the same as the first one but without the leading - , as the lookbehind only asserts that it should occur directly to the left
(?! - )(.+)| - (.+) This pattern uses a negative lookahead (?! - ), which asserts that what is directly to the right is not - . As none of the examples start with - , the whole line is matched right after the lookahead by .+ and the second part after the alternation | is never tried
If the first group of letters can be of varying length, you could make the match either specific matching 1 or more uppercase characters [A-Z]+ or 1+ word characters \w+.
To get a more broad match, you could match 1 or more non whitespace characters using \S+
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*
Explanation
^ Start of string
(?:\S+\h-\h)? Optionally match the first group of non whitespace chars followed by - between horizontal whitespace chars
\K Clear the match buffer (Forget what is currently matched)
\S+ Match 1+ non whitespace characters
(?: Non capture group
\h(?!-\h) Match a horizontal whitespace char and assert what is directly to the right is not - followed by another horizontal whitespace char
\S+ Match 1+ non whitespace chars
)* Close non capture group and repeat 0+ times to match more "words" separated by spaces
Regex demo
Edit
To match an optional hyphen and trailing single character, you could add an optional non capturing group (?:-\h\S\h*)?$ and assert the end of the string if the pattern should match the whole string:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*\h*(?:-\h\S\h*)?$
Regex demo
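Python's built-in `re` module has neither \K nor \h, so the sketch below is an approximate translation rather than the pattern above: a capture group stands in for \K, and [ \t] stands in for \h (which in PCRE covers more horizontal whitespace than space and tab). The `extract` name is hypothetical.

```python
import re

# Translation of the final pattern: group 1 replaces \K, [ \t] replaces \h.
PATTERN = re.compile(
    r"^(?:\S+[ \t]-[ \t])?"              # optional "AAA - " prefix
    r"(\S+(?:[ \t](?!-[ \t])\S+)*"       # words not followed by "- "
    r"[ \t]*(?:-[ \t]\S[ \t]*)?)$"       # optional trailing "- D"
)

def extract(field):
    """Return the wanted part of the field, or None if nothing matches."""
    match = PATTERN.match(field)
    return match.group(1) if match else None

for field in ["AAA - Something Here", "AAA - Something Here - D", "Something Here"]:
    print(extract(field))
```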
You may use
^(?:.*? - )?\K.*?(?= - | *$)
^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$)
See the regex demo
Details
^ - start of string
(?:.*? - )? - an optional non-capturing group matching any 0+ chars other than line break chars, as few as possible, up to and including the first space-hyphen-space
\K - match reset operator
.*? - any 0+ chars other than line break chars as few as possible
(?= - | *$) - a space-hyphen-space, or 0+ spaces up to the end of the string, should follow immediately on the right.
Note that \h matches any horizontal whitespace chars.
^(?:[A-Z]+ - \K)?.*\S
demo
Since "Something Here" can be anything, there's no reason to describe the possible trailing letter separately in the pattern. You don't need anything more complicated.
With this pattern I assume that you are not interested in the trailing spaces, which is why I ended it with \S. If you want to keep them, remove the \S and change the previous quantifier to +.
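For engines without \K, such as Python's `re`, the same idea can be sketched with a capture group in place of \K. This is a translation under that assumption, not the pattern exactly as written above, and `wanted_part` is a made-up helper name.

```python
import re

# Optional uppercase prefix, then capture everything up to the last non-space.
PATTERN = re.compile(r"^(?:[A-Z]+ - )?(.*\S)")

def wanted_part(field):
    """Return the captured text, or None for a blank field."""
    match = PATTERN.match(field)
    return match.group(1) if match else None

for field in ["AAA - Something Here", "AAA - Something Here - D  ", "Something Here"]:
    print(repr(wanted_part(field)))
```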
I have a lot of calls in lots of different files to os.getenv('some_var'). I would like to replace all of these with os.environ['some_var'].
I know how to replace all instances of os.getenv with os.environ but not how to replace the (.*) with [.*] without losing the text inside.
Try this regex:
(os\.)[^()]*\(([^()]*)\)
Replace each match with \1environ[\2]
Click for Demo
Explanation:
(os\.) - matches os. and capture in group 1
[^()]*\( - matches 0+ occurrences of any character that is neither a ( nor ) followed by (
([^()]*) - matches 0+ occurrences of any character that is neither a ( nor ). This substring is captured in Group 2
\) - matches )
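Here is a minimal Python sketch of that replacement using `re.sub`. Note that the pattern never mentions getenv, so it also rewrites other calls that start with os., as the second line shows; the `rewrite` name is mine.

```python
import re

# Pattern and replacement from the answer.
PATTERN = re.compile(r"(os\.)[^()]*\(([^()]*)\)")

def rewrite(source):
    """Replace every os.<anything>(...) call with os.environ[...]."""
    return PATTERN.sub(r"\1environ[\2]", source)

print(rewrite("token = os.getenv('some_var')"))
# Any other os.* call is rewritten too, which is probably not wanted:
print(rewrite("p = os.path.join('a', 'b')"))
```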
You can match the text and capture the text inside parenthesis using this regex,
os\.getenv\('([^']+)'\)
And replace it with os.environ['\1']
This regex basically has three parts,
os\.getenv\(' - This literally matches os.getenv('
([^']+) - This matches whatever text is inside the quotes and captures it in group 1
'\) - This literally matches ')
Demo
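A short Python sketch of this narrower replacement. The dot is escaped here so that it only matches a literal . character; the function name is hypothetical.

```python
import re

# Only single-quoted, single-argument os.getenv('...') calls are matched.
PATTERN = re.compile(r"os\.getenv\('([^']+)'\)")

def to_environ(source):
    """Rewrite os.getenv('name') as os.environ['name']."""
    return PATTERN.sub(r"os.environ['\1']", source)

print(to_environ("token = os.getenv('some_var')"))
# Calls that do not fit the shape are left alone:
print(to_environ("p = os.path.join('a', 'b')"))
print(to_environ("x = os.getenv('some_var', 'fallback')"))
```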
I am converting a PDF to text with xpdf and then finding some words with the help of regex and preg_match_all.
I am separating my words with a colon in pdftotext.
Below is my pdftotext output:
In respect of Shareholders
Name: xyx
Residential address: dublin
No of Shares: 2
Name: abc
Residential address: canada
No of Shares: 2
So I wrote a regex that returns the words after the colon in text().
$regex = '/(?<=: ).+/';
preg_match_all($regex, $string, $matches);
But now I want a regex that returns all the data after In respect of Shareholders.
So I wrote $regex = '/(?<=In respect of Shareholders).*?(?=\s)/';
But it shows me only:
Name: xyx
First I want to find all the data after In respect of Shareholders, and then use another regex to find the words after the colon.
You may use
if (preg_match_all('~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~', $string, $matches)) {
print_r($matches[0]);
}
See the regex demo
Details
(?:\G(?!\A)|In respect of Shareholders) - either the end of the previous successful match or In respect of Shareholders text
\s* - 0+ whitespaces
[^:\n\r]+ - 1 or more chars other than :, CR and LF
: - a colon
\h* - 0+ horizontal whitespaces
\K - match reset operator that discards all text matched so far
.* - the rest of the line (0 or more chars other than line break chars).
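The \G and \K constructs are PCRE features that Python's `re` does not offer, so the sketch below swaps in a plain two-step approach instead, closer to what the question describes: cut the text at the header, then pull the value from each "label: value" line. The function name and the extra "Company" line are made up for illustration.

```python
import re

# Sample modelled on the question; the "Company" line is a made-up extra
# to show that fields before the header are skipped.
text = """Company: example
In respect of Shareholders
Name: xyx
Residential address: dublin
No of Shares: 2
Name: abc
Residential address: canada
No of Shares: 2"""

def values_after_header(source, header="In respect of Shareholders"):
    # Step 1: keep only what follows the header (empty if it is absent).
    _, _, tail = source.partition(header)
    # Step 2: per line, skip "label:" plus spaces/tabs and capture the rest.
    return re.findall(r"^[^:\r\n]+:[ \t]*(.*)$", tail, flags=re.MULTILINE)

print(values_after_header(text))
```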
In your regex (?<=: ).+ you match any character 1+ times after a colon and a space. To capture everything that follows the spaces or tabs in a group, you could use (?<=:)[\t ]+(.+)
Another way to match the texts using a capturing group could be:
^.*?:[ \t]+(\w+)
Explanation
^ Assert start of the string
.*?: Match any character non greedy followed by a :
[ \t]+ Match 1+ times a space or a tab
(\w+) Capture in a group 1+ word characters
Regex demo | Php demo
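A minimal Python sketch of the capturing-group version, assuming the multiline flag so that ^ matches at the start of every line. The sample text is the pdftotext output from the question.

```python
import re

text = """In respect of Shareholders
Name: xyx
Residential address: dublin
No of Shares: 2
Name: abc
Residential address: canada
No of Shares: 2"""

# Group 1 holds the first run of word characters after "label:" and spaces.
PATTERN = re.compile(r"^.*?:[ \t]+(\w+)", re.MULTILINE)

words = PATTERN.findall(text)
print(words)
```

Since \w+ stops at the first non-word character, a value made of several words would only yield its first word.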
Or use \K to forget what was matched if that is supported:
^.*?:\h*\K\w+
Regex demo