Regex combination of two lookarounds - regexstorm.net - regex

I have to collect two pieces of information from a text using regex: the name and the database, and relate them in one table. But I can only collect them individually.
This is an example. I have many of these blocks, and two of them don't have a database value; those I need to ignore.
[SCD] {I need the name between []}
Driver=/opt/pcenter/pc961/ODBC7.1/lib/DWmsss27.so
Description=
Database=scd {I need the value after Defaut|Database}
Address=#######
LogonID=######
Password=######
QuoteId=No
AnsiNPW=No
ApplicationsUsingThreads=1
The regex to find the name is:
(?<=\[)(.*)(?=\])
The regex to find the value after database is:
(?<=Defaut|Database=)(.*)
How can I combine both of them into one regex?

To match both values, you could use 2 capturing groups instead. Use a repeating pattern with a negative lookahead to skip every line that does not start with Default or Database, until a line that does.
\[([^]]+)\](?:\r?\n(?!Default|Database).*)*\r?\n(?:Default|Database)=(\S+)
About the pattern
\[ Match [
( Capture group 1
[^]]+ match 1+ times not ]
) Close group 1
\] Match ]
(?: Non capturing group
\r?\n Match a newline
(?! Negative lookahead, assert what is directly on the right is not
Default|Database Match one of the options
).* Close negative lookahead and match any char except a newline 0+ times
)* Close non capturing group and repeat 0+ times
\r?\n(?:Default|Database)= Match newline, any of the options and =
(\S+) Capturing group 2, match 1+ times a non whitespace char (or use (.+) to match any char 1+ times)
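As a rough sketch of how the two groups could be collected into a table, here is the same idea in JavaScript. The [NODB] and [HR] blocks are made-up sample data. Two details differ from the pattern above and are assumptions on my part: JavaScript reads [^] as "any character", so the class is written [^\]], and \[ is added to the lookahead so that a block without a Database line cannot run on into the next block and pick up its value.

```javascript
// Sample input: the second block has no Database line (made-up data).
const text = [
  "[SCD]",
  "Driver=/opt/pcenter/pc961/ODBC7.1/lib/DWmsss27.so",
  "Description=",
  "Database=scd",
  "Address=#######",
  "[NODB]",
  "Driver=/opt/example/driver.so",
  "Description=",
  "[HR]",
  "Description=",
  "Default=hrdb",
].join("\n");

// Group 1 = block name, group 2 = value after Default= or Database=.
// The lookahead also stops at "[" so a match stays inside one block.
const regex = /\[([^\]]+)\](?:\r?\n(?!Default|Database|\[).*)*\r?\n(?:Default|Database)=(\S+)/g;

// One row per block that has a database value; blocks without one are skipped.
const pairs = Array.from(text.matchAll(regex), m => ({ name: m[1], database: m[2] }));
console.log(pairs);
```

Without the extra \[ in the lookahead, the [NODB] block would be paired with the value that belongs to [HR].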

Related

How to make optional capturing groups be matched first

For example, I want to match three values: required text, optional times, and an optional id, where the id has the format [id=100000]. How can I match the data correctly when the text contains spaces?
My regex: (?<text>[\s\S]+) (?<times>\d+)? (\[id=(?<id>\d+)])?
Example source text: hello world 1 [id=10000]
In this example, the entire source text is captured by the text group.
The problem with your pattern is that [\s\S]+ matches any character, whitespace or not, one or more times. Being greedy, it can swallow the parts meant for the other capture groups. With a positive lookahead and alternation (|), we can make the last 2 capture groups optional.
The final pattern: (?<text>[a-zA-Z ]+)(?=$|(?<times>\d+)? \[id=(?<id>\d+)])
Group text matches only letters and spaces.
The lookahead does not consume characters. It asserts that either the string ends here, or an optional number followed by a space and [id=number] comes next.
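To see how the lookahead version behaves, here is a small JavaScript sketch. The helper name parse and the second and third inputs are my own additions for illustration; only the first input comes from the question. Note that the text group can keep a trailing space, so the sketch trims it.

```javascript
// Pattern from the answer; named groups inside the lookahead are still captured.
const regex = /(?<text>[a-zA-Z ]+)(?=$|(?<times>\d+)? \[id=(?<id>\d+)])/;

// Hypothetical helper: returns the three fields, or null when nothing matches.
function parse(line) {
  const m = regex.exec(line);
  if (!m) return null;
  const { text, times, id } = m.groups;
  // Groups that did not take part in the match are undefined; map them to null.
  return { text: text.trim(), times: times ?? null, id: id ?? null };
}

console.log(parse("hello world 1 [id=10000]")); // text, times and id all present
console.log(parse("hello world [id=10000]"));   // no times
console.log(parse("hello world"));              // text only
```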
You could use:
:\s*(?<text>[^][:]+?)\s*(?<times>\d+)? \[id=(?<id>\d+)]
Explanation
: Match literally
\s* Match optional whitespace chars
(?<text> Group text
[^][:]+? match 1+ occurrences of any char except [ ] :
) Close group text
\s* Match optional whitespace chars
(?<times>\d+)? Optional group times, match 1+ digits
\[id= Match [id=
(?<id>\d+) Group id, match 1+ digits
] Match literally
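A quick way to check this pattern is the JavaScript sketch below. It assumes the label "example source text:" is part of the input, since the pattern starts at a colon. The class is written [^\][:] because JavaScript would read [^] as "any character". The input without times is my own test case.

```javascript
// [^\][:] is the JavaScript spelling of [^][:] (any char except [ ] :).
const regex = /:\s*(?<text>[^\][:]+?)\s*(?<times>\d+)? \[id=(?<id>\d+)]/;

const withTimes = "example source text: hello world 1 [id=10000]".match(regex).groups;
const withoutTimes = "example source text: hello world [id=10000]".match(regex).groups;

// The lazy +? lets text stop before the optional number.
console.log(withTimes.text, withTimes.times, withTimes.id);
// times is undefined when the number is absent.
console.log(withoutTimes.text, withoutTimes.times, withoutTimes.id);
```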

Regex match specific strings

I want to capture all of the strings below from multi-line data. Here is the expected result, and here is my pattern, which does not work.
Pattern: ^XYZ/[0-9|ALL|P] I'm lost with this part. Can anyone help?
Result
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/ALL
XYZ/P1
XYZ/P2,3
XYZ/P4,5-7
XYZ/P1-4,5-7,8-9
Changed to
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/A12345 after the slash limited to 6 alphanumeric chars
XYZ/LH-1234567890 after the /LH- limited to 10 numeric chars
The pattern could be:
^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$
The pattern in parts matches:
^ Start of string
XYZ\/ Match XYZ/ (You don't have to escape the / depending on the pattern delimiters)
(?: Outer non capture group for the alternatives
ALL Match literally
| Or
P? Match an optional P
[0-9]+(?:-[0-9]+)? Match 1+ digits with an optional - and 1+ digits
(?: Non capture group to match as a whole
,[0-9]+(?:-[0-9]+)? Match a comma, 1+ digits, and an optional - and 1+ digits
)* Close the non capture group and optionally repeat it
) Close the outer non capture group
$ End of string
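The pattern can be checked against the first list from the question with a short JavaScript sketch. The rejected inputs are made up to show what the pattern refuses; they are not from the question.

```javascript
const regex = /^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$/;

// Lines taken from the question's first result list.
const expected = [
  "XYZ/1", "XYZ/1,2-5", "XYZ/5,7,8-9", "XYZ/2-4,6-8,9", "XYZ/ALL",
  "XYZ/P1", "XYZ/P2,3", "XYZ/P4,5-7", "XYZ/P1-4,5-7,8-9",
];
// Made-up negative cases: empty value, bare P, dangling comma, double dash, ALL plus digit.
const rejected = ["XYZ/", "XYZ/P", "XYZ/1,", "XYZ/1--2", "XYZ/ALL1"];

const allAccepted = expected.every(s => regex.test(s));
const noneAccepted = rejected.every(s => !regex.test(s));
console.log(allAccepted, noneAccepted);
```

To scan a multi-line string instead of single lines, the same pattern would need the g and m flags.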
You can use this regex pattern to match those lines
^XYZ\/(?:P|ALL|[0-9])[0-9,-]*$
Use the global g and multiline m flags.
Btw, a character class such as [P|ALL] doesn't match the word "ALL".
It only matches a single character that is P, A, L, or |.
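The character class point is easy to verify. The JavaScript sketch below also compares the shorter pattern with the longer one from the other answer on a made-up input, to show that the shorter one is more permissive about the order of digits, commas, and dashes.

```javascript
// The class [0-9|ALL|P] stands for exactly one character: a digit, |, A, L or P.
const cls = /^[0-9|ALL|P]$/;
const classResults = ["A", "L", "|", "7", "ALL"].map(s => cls.test(s));
console.log(classResults);

// So the original attempt stops after the first character that follows the slash.
const attempt = "XYZ/ALL".match(/^XYZ\/[0-9|ALL|P]/)[0];
console.log(attempt);

// The shorter pattern accepts any mix of digits, commas and dashes (made-up input).
const loose = /^XYZ\/(?:P|ALL|[0-9])[0-9,-]*$/;
const strict = /^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$/;
const looseOk = loose.test("XYZ/1,,-");
const strictOk = strict.test("XYZ/1,,-");
console.log(looseOk, strictOk);
```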

Regex - add a zero after second period

I have the following example of numbers, and I need to add a zero after the second period (.).
1.01.1
1.01.2
1.01.3
1.02.1
I would like them to be:
1.01.01
1.01.02
1.01.03
1.02.01
I have the following so far:
Search:
^([^.])(?:[^.]*\.){2}([^.].*)
Substitution:
0\1
but this returns:
01 only.
I need the 1.01. to be captured in a group as well, but now I'm getting confuddled.
Does anyone know what I am missing?
Thanks!!
You may try this regex replacement with 2 capture groups:
Search:
^(\d+\.\d+)\.([1-9])
Replacement:
\1.0\2
RegEx Details:
^: Start
(\d+\.\d+): Match 1+ digits + dot followed by 1+ digits in capture group #1
\.: Match a dot
([1-9]): Match digits 1-9 in capture group #2 (this is to avoid putting 0 before already existing 0)
Replacement: \1.0\2 inserts 0 just before capture group #2
You could try:
^([^.]*\.){2}\K
Replace with 0.
^ - Start line anchor.
([^.]*\.){2} - Negated character class 0+ times (greedy) followed by a literal dot, matched twice.
\K - Reset starting point of reported match.
EDIT:
If the \K meta escape isn't supported, see if the following works:
^((?:[^.]*\.){2})
Replace with ${1}0.
^ - Start line anchor.
( - Open 1st capture group;
(?: - Open non-capture group;
[^.]*\. - Negated character class 0+ times (greedy) followed by a literal dot.
){2} - Close non-capture group and match twice.
) - Close capture group.
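JavaScript's built-in RegExp is one of the engines without \K, so it makes a convenient test bed for the fallback. The sketch below shows the capture group variant with a replacer function, plus a lookbehind variant that I am adding as the closest analogue to \K; the lookbehind is not part of the answer above and needs an engine that allows variable-length lookbehind.

```javascript
const inputs = ["1.01.1", "1.01.2", "1.01.3", "1.02.1"];

// Variant 1: capture everything up to and including the second dot, then
// rebuild it with a replacer function (sidesteps how "$10" would be parsed).
const viaGroup = inputs.map(s =>
  s.replace(/^((?:[^.]*\.){2})/, (_, head) => head + "0")
);

// Variant 2: the lookbehind matches the empty position right after the
// second dot, so only a 0 has to be inserted there.
const viaLookbehind = inputs.map(s => s.replace(/(?<=^(?:[^.]*\.){2})/, "0"));

console.log(viaGroup);
console.log(viaLookbehind);
```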
Using your pattern, you can use 2 capture groups and put a zero before the second group in the replacement, for example \g<1>0\g<2> or ${1}0${2} or $10$2 depending on the language.
^((?:[^.]*\.){2})([^.])
^ Start of string
((?:[^.]*\.){2}) Capture group 1, repeat 2 times: match 0+ chars other than a dot, then match the dot
([^.]) Capture group 2, match any char except a dot
A more specific pattern could be matching the digits
^(\d+\.\d+\.)(\d)
^ Start of string
(\d+\.\d+\.) Capture group 1, match 2 times 1+ digits and a dot
(\d) Capture group 2, match a digit
For example in JavaScript:
// Only two groups exist, so "$10$2" is read as $1, a literal 0, then $2.
const regex = /^(\d+\.\d+\.)(\d)/;
[
  "1.01.1",
  "1.01.2",
  "1.01.3",
  "1.02.1",
].forEach(s => console.log(s.replace(regex, "$10$2")));
Obviously, there will be tons of solutions for this, but if this pattern holds (i.e. always the trailing group that is a single digit)... \.(\d)$ => \.0\1 would suffice - to merely insert a 0, you don't need to match the whole thing, only just enough context to uniquely identify the places targeted. In this case, finding all lines ending in a . followed by a single digit is enough.
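A minimal JavaScript sketch of that "just enough context" idea follows. The helper name pad and the last two inputs are my own additions; they show that values ending in two digits are left alone because of the $ anchor. The replacement string uses a plain dot, since a backslash has no special meaning in a JavaScript replacement string.

```javascript
// Insert a 0 only when the string ends in a dot followed by a single digit.
const pad = s => s.replace(/\.(\d)$/, ".0$1");

const padded = ["1.01.1", "1.02.1", "1.01.10", "1.01.01"].map(pad);
console.log(padded);
```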

RegEx optional group with optional sub-group

I have a set of strings with fairly inconsistent naming, that should be structured enough to be divided into groups though.
Here's an excerpt:
test test 1970-2020 w15.txt
test 1970-2020 w15.csv
test 1990-99 q1 .txt
test 1981 w15 .csv
test test w15.csv
I am trying to extract information by groups (test-name, (year)?, suffix, type) using the following RegEx:
(.*)\s+([0-9]+(\-[0-9]+)?\s+)?((w|q)[0-9]+(\s+)?)(\..*)$
It works except for the optional group matching the years (an interval of years, a single year, or no year at all).
What am I missing to make the pattern work?
Here's also a link to RegEx101 for testing:
https://regex101.com/r/wG3aM3/817
You could make the pattern a bit more specific and make the content of the year optional
^(.*?)\s+((?:\d{4}(?:-(?:\d{4}|\d{2}))?)?)\s+([wq][0-9]+)\s*(\.\w+)$
Explanation
^ Start of string
(.*?) Capture group 1 Match 0+ times any char except a newline non greedy
\s+ Match 1+ whitespace chars
( Capture group 2
(?: Non capture group
\d{4}(?:-(?:\d{4}|\d{2}))? Match 4 digits and optionally - and 2 or 4 digits
)? Close non capture group and make the year optional
) Close group 2
\s+ Match 1+ whitespace chars
([wq][0-9]+) Capture group 3 Match either w or q and 1+ digits 0-9
\s* Match 0+ whitespace chars
(\.\w+) Capture group 4, match a dot and 1+ word characters
$ End of string
Note that \s could also match a newline.
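One thing to watch: as written, the pattern needs two separate whitespace runs even when the year is empty, so with the file names exactly as shown here (single spaces) the last one, test test w15.csv, does not match. The JavaScript sketch below is a variation under that assumption, not the answer's own pattern: it makes the year and the whitespace after it optional together. In this variant a missing year is an unset group rather than an empty string.

```javascript
// Pattern from the answer, kept for comparison.
const original = /^(.*?)\s+((?:\d{4}(?:-(?:\d{4}|\d{2}))?)?)\s+([wq][0-9]+)\s*(\.\w+)$/;
// Variation: "(year + whitespace)?" so one space before w15/q1 is enough.
const regex = /^(.*?)\s+(?:(\d{4}(?:-(?:\d{4}|\d{2}))?)\s+)?([wq][0-9]+)\s*(\.\w+)$/;

const names = [
  "test test 1970-2020 w15.txt",
  "test 1970-2020 w15.csv",
  "test 1990-99 q1 .txt",
  "test 1981 w15 .csv",
  "test test w15.csv",
];

const rows = names.map(n => {
  const m = regex.exec(n);
  // m is null when a name does not fit the scheme at all.
  return m && { name: m[1], year: m[2] ?? null, suffix: m[3], type: m[4] };
});
console.log(rows);

// The original pattern rejects the single-space, no-year name.
const originalMatchesLast = original.test("test test w15.csv");
console.log(originalMatchesLast);
```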

Is it possible to compare two values in a row and fetch the desired one when both values match the regex?

text = "Happy 4/20 from the team! 13/10 congrats..after so many contents"
I want to fetch only 13/10 which is the rating. I have written regex
(\d+\.\d+|\d+)/(((?=10)10)|([1-9]\d+))
but it fetches the first one(4/20).
Is this possible to achieve using regex?
In this part of your pattern, (?=10)10, you can omit the positive lookahead: it only asserts that what is on the right is 10 before matching 10. The part [1-9]\d+ matches 10 and above, so 10 is already in the range.
Your pattern can also be written as \d+(?:\.\d+)?/[1-9]\d+
To get the second match, you might repeat a non capturing group that contains the capturing group, using the quantifier {2}:
^(?:.*?(\d+(?:\.\d+)?/[1-9]\d+)){2}
^ Start of the string
(?: Non capturing group
.*? Match any char non greedy
( Capturing group
\d+(?:\.\d+)? Match 1+ digits, optionally match a dot and 1+ digits
/ Match /
[1-9]\d+ Match 10 and above
) Close capturing group
){2} Close non capturing group and repeat 2 times
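The repeated group trick can be tried in JavaScript with the text from the question. The second half of the sketch is an alternative I am adding for comparison: collect all candidates with the g flag and pick one by position.

```javascript
const text = "Happy 4/20 from the team! 13/10 congrats..after so many contents";

// A capturing group inside a repeated group keeps the value from the last
// repetition, so group 1 ends up holding the second match.
const second = text.match(/^(?:.*?(\d+(?:\.\d+)?\/[1-9]\d+)){2}/);
const rating = second ? second[1] : null;
console.log(rating);

// Alternative sketch: collect every candidate, then choose by index.
const all = text.match(/\d+(?:\.\d+)?\/[1-9]\d+/g) ?? [];
console.log(all);
```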