Regex table of contents

I have table of contents items that I need to parse with a regex. The data is not fully uniform, and I can't get it to work in all cases.
The data is as follows:
1. Header 1
1.2. SubHeader2
1.2.1 Subheader
1.2.2. Another header
1.2.2.1 Test
1.2.2.2. Test2
So I need to capture the number and the header in separate groups. The number should not include the trailing dot, if there is one. What I'm struggling with is that not all of the numbers have a trailing dot.
I have tried:
^([0-9\.]+)[\.]\s+(.+)$ -- doesn't work when there is no trailing dot
^([0-9\.]+)[\.]?\s+(.+)$ -- includes the trailing dot when it is there

You can use
^(\d+(?:\.\d+)*)\.?\s+(.+)
See the regex demo. Details:
^ - start of string
(\d+(?:\.\d+)*) - Group 1: one or more digits, followed by zero or more repetitions of a . followed by one or more digits
\.? - an optional .
\s+ - one or more whitespaces
(.+) - Group 2: any one or more chars other than line break chars, as many as possible.
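A quick way to check the pattern against the sample data is a short script. This is a minimal sketch in Python (chosen here only for illustration; the function name parse_toc_line is hypothetical). It also runs the question's second attempt to show why it keeps the dot: the greedy [0-9\.]+ consumes the dot before the optional [\.]? is ever tried.

```python
import re

# Pattern from the answer: Group 1 = dotted number, Group 2 = header text
TOC = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)")

def parse_toc_line(line: str):
    """Return (number, header), or None if the line is not a TOC entry."""
    m = TOC.match(line)
    return (m.group(1), m.group(2)) if m else None

lines = [
    "1. Header 1",
    "1.2. SubHeader2",
    "1.2.1 Subheader",
    "1.2.2. Another header",
    "1.2.2.1 Test",
    "1.2.2.2. Test2",
]
for line in lines:
    print(parse_toc_line(line))

# The question's second attempt: greedy [0-9\.]+ swallows the trailing dot,
# so the optional [\.]? matches nothing and the dot ends up in Group 1.
attempt = re.compile(r"^([0-9\.]+)[\.]?\s+(.+)$")
print(attempt.match("1.2.2. Another header").group(1))  # 1.2.2.
```

Because (?:\.\d+)* only repeats when a digit follows the dot, a trailing dot is always left over for the separate \.? to consume.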

Related

capturing values after an optional slash

I am trying to write a regex for a string that has
an alphanumeric part no longer than 5 characters (as an example) [a-z0-9]{3,5}
followed by an optional forward slash /?
where the alphanumeric part cannot end in a 3.
I want to capture any group of at least 3 characters, with or without a slash, and then anything after it.
I am having a very hard time accomplishing this. If I make the slash / required, it is much easier.
When I try
(?=.+\/?.+)[a-z0-9]{2,5}\/?(?<!3\/|3)
I can capture what I want up to the slash, but I can't work out how to also get everything after it when the part before it is valid.
(?=.+\/?.+)[a-z0-9]{2,62}\/?.?
My minimum length goes up by 1 (to 4 instead of 3) because of the additional . I put after the \/?. I could change my match to account for it, but it becomes really difficult.
(?=.+\/?.+)[a-z0-9]{2,5}\/?(?<!3\/|3)$
This only gives me the last slash or non-slash followed by 2 to 5 characters.
(?=.+\/?.+)[a-z0-9]{2,62}\/?.*
or
(?=.+\/?.+)[a-z0-9]{2,62}\/?.?+
These simply ignore my ending rule that the part before the slash must not end with 3/ or 3. They also allow more than 5 characters before the slash, which is definitely not what I want :)
Is there a way to make an optional field still maintain length and ending rules?
I am testing this on regexr.com, https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_regexp and Git Bash, and I am not getting the results I want.
Try:
^[a-z0-9]{3,5}(?<!3)(?:$|\/.*)
Regex demo.
^ - beginning of the string
[a-z0-9]{3,5} - match a character from a-z or 0-9 between 3 and 5 times
(?<!3) - the last character should not be 3
(?:$|\/.*) - match either the end of the string $, or a / followed by any number of characters.
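To see how the pattern treats typical inputs, here is a small sketch in Python, used only as a convenient test bed (its re module supports the fixed-width lookbehind (?<!3)). The helper is_valid and the sample strings are made up for illustration.

```python
import re

# Answer's pattern: 3-5 chars from [a-z0-9], last one not "3",
# then either end of string or "/" followed by anything.
PATTERN = re.compile(r"^[a-z0-9]{3,5}(?<!3)(?:$|\/.*)")

def is_valid(s: str) -> bool:
    """True if the string starts with a valid 3-5 char part."""
    return PATTERN.match(s) is not None

for s in ["abc", "abc3", "abc/def", "abc3/def", "abcdef", "ab", "a3b/x"]:
    print(s, is_valid(s))
```

Backtracking does the work here: for "abc3" the engine retries with "abc", which passes the lookbehind but then finds "3" where it needs the end of the string or a /, so the whole match fails.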
If only the last character in the range [a-z0-9] must not be a 3, you can exclude 3 from its class with [a-z0-24-9]:
^[a-z0-9]{2,4}[a-z0-24-9](?:\/.*)?$
Explanation
^ Start of string
[a-z0-9]{2,4} Match 2-4 chars in the ranges a-z and 0-9
[a-z0-24-9] Match a single char in a-z, 0-2 or 4-9 (anything except 3)
(?:\/.*)? Optionally match / and the rest of the line
$ End of string
See a regex101 demo.
If a 3 is not allowed anywhere before the slash:
^[a-z0-24-9]{3,5}(?:\/.*)?$
See another regex101 demo
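The two lookbehind-free variants can be checked the same way. This Python sketch uses [a-z0-24-9] for the excluded-3 class so that only 3 is ruled out; it also includes the class [a-z124-9] for comparison, which leaves out 0 as well. The names LAST_NOT_3, NO_3, ORIGINAL and valid are illustrative only.

```python
import re

# Last char of the first part: anything in [a-z0-9] except "3".
LAST_NOT_3 = re.compile(r"^[a-z0-9]{2,4}[a-z0-24-9](?:\/.*)?$")
# No "3" anywhere in the first part.
NO_3 = re.compile(r"^[a-z0-24-9]{3,5}(?:\/.*)?$")
# For comparison: [a-z124-9] excludes 0 as well as 3.
ORIGINAL = re.compile(r"^[a-z0-9]{2,4}[a-z124-9](?:\/.*)?$")

def valid(pattern: re.Pattern, s: str) -> bool:
    return pattern.match(s) is not None

for s in ["abc", "ab0", "ab3", "abc3/x", "abcde/x", "abcdef"]:
    print(s, valid(LAST_NOT_3, s), valid(ORIGINAL, s))
```

Excluding 3 directly from the last character's class avoids a lookbehind entirely, which helps if the target engine does not support lookbehind.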

RegEx: allow 1-25 letters or spaces but exclude three-letter-value-list

I'm new to this forum and hope someone can help me.
I need to create a RegEx pattern which allows 1 to 25 letters or spaces but does not allow one of the values EMP, NDB, POI or CWR.
I tried the following using negative lookahead:
((?!EMP|NDB|POI|CWR)[A-Za-z\s]{1,25})$
However, this does not work properly: a value like EMP is still accepted (see https://regex101.com/r/YfflBi/1).
It only works if I allow letters only (no spaces) and limit the length to exactly 3:
((?!EMP|NDB|POI|CWR)[A-Za-z]{3})$
(see https://regex101.com/r/SzmuwP/1)
However, the challenge is that I need 1 to 25 letters or spaces to be accepted, but not any of the three-letter values I mentioned.
Many thanks in advance to everyone thinking about a solution!
You can use
^(?:(?!EMP|NDB|POI|CWR)[A-Za-z\s]){1,25}$
See the regex demo. Details:
^ - start of string
(?: - start of a non-capturing group:
(?!EMP|NDB|POI|CWR)[A-Za-z\s] - a letter or whitespace that is not the first char of any of the sequences listed in the negative lookahead (so a string that contains EMP, NDB, POI or CWR anywhere is rejected)
){1,25} - repeat the pattern sequence inside the non-capturing group one to 25 times
$ - end of string.
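The sketch below, in Python for illustration, compares the answer's pattern with the question's. The question's pattern has no ^ anchor, so the engine can skip the E and match MP instead, which is why EMP looked accepted. Because the answer checks the lookahead at every position, it also rejects longer strings that merely contain a code (e.g. TEMPLE). EXACT_ONLY is a hypothetical variant, not part of the answer, for the case where only the exact values should be refused.

```python
import re

# Answer's pattern: the lookahead runs before every character, so the
# string is rejected if EMP, NDB, POI or CWR appears anywhere in it.
ANSWER = re.compile(r"^(?:(?!EMP|NDB|POI|CWR)[A-Za-z\s]){1,25}$")

# Hypothetical variant (not from the answer): reject only when the whole
# value is exactly one of the codes.
EXACT_ONLY = re.compile(r"^(?!(?:EMP|NDB|POI|CWR)$)[A-Za-z\s]{1,25}$")

# The question's pattern: without ^, the match can start at "MP".
QUESTION = re.compile(r"((?!EMP|NDB|POI|CWR)[A-Za-z\s]{1,25})$")

def accepted(pattern: re.Pattern, value: str) -> bool:
    return pattern.match(value) is not None

print(QUESTION.search("EMP").group(1))  # MP
print(accepted(ANSWER, "EMP"), accepted(ANSWER, "Hello World"))  # False True
```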

Regex to match n times for helm

To match these examples:
1-10-1
1-7-3
10-8-5
1-7-14
11-10-12
This regex works:
^[\\d]{1,2}-[\\d]{1,2}-[\\d]{1,2}$
How could this be written in a way that just matches something like "[\d]{1,2}-?" three (n) times?
You may use:
^\d\d?(?:-\d\d?){2}$
See an online demo.
^ - Start line anchor.
\d\d? - A single digit followed by an optional one (the same as \d{1,2}).
(?:-\d\d?){2} - A non-capture group matching a hyphen followed by the same construct as above, one or two digits. The group is repeated exactly two times.
$ - End string anchor.
The idea is to avoid making the hyphen optional, as in your attempts, since that would allow entirely different strings like "123" and "123456". It's more appropriate to match the first element of a delimited string and then use the non-capture group to match a delimiter plus each remaining element exactly n-1 times.
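The "first element, then n-1 delimiter-plus-element repetitions" idea generalizes directly to any n. This Python sketch builds the pattern for a given n; the function name delimited is hypothetical, and the backslash escaping needed in a Helm template may differ from what a Python raw string needs.

```python
import re

def delimited(n: int) -> re.Pattern:
    """Match n hyphen-separated groups of 1-2 digits (n >= 1)."""
    # First element, then exactly n-1 repetitions of "-" plus an element.
    return re.compile(r"^\d\d?(?:-\d\d?){%d}$" % (n - 1))

three = delimited(3)
for s in ["1-10-1", "1-7-3", "10-8-5", "1-7-14", "11-10-12",
          "123", "123456", "1-7", "1-2-3-4"]:
    print(s, bool(three.match(s)))
```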

Regex to remove all zeroes except the last one

I'm building an expression to process fields in my fixed-width files. I need to get rid of all the leading zeros in the amount field, but sometimes the field contains only zeros.
This field is always 11 characters long. This is the expression I have so far:
^0+(?=.$)
It works fine with 00000000000, i.e. as long as the field contains only zeros. However, this is a payment app and the field stores amounts, so if we get, for example, 00000000099, it doesn't work as expected and returns the whole string. What would be the best way to approach this? I'm still quite new to this, so I must be missing something trivial. Thanks in advance.
You haven't mentioned which app you are using. Maybe it has a function to remove padding? If you want a regex, you could try:
^0+(?=\d+$)
And replace with nothing. See the online demo.
^ - Start line anchor.
0+ - Match 1+ zeros, up to
(?=\d+$) - A positive lookahead for 1+ digits followed by the end of the line.
Or use:
^0+(\d+)$
And replace by the 1st capture group. See the demo
^ - Start line anchor.
0+ - Match 1+ zeros, followed by
(\d+) - 1st Capture group holding 1+ digits.
$ - End line anchor.
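Both replacements can be tried on 11-character fields with a few lines of Python (used here only for illustration; the function names are hypothetical). The question's ^0+(?=.$) is included to show why it leaves 00000000099 unchanged: it only matches when exactly one character follows the zeros.

```python
import re

def strip_zeros_lookahead(field: str) -> str:
    # Remove leading zeros that are followed by at least one more digit.
    return re.sub(r"^0+(?=\d+$)", "", field)

def strip_zeros_group(field: str) -> str:
    # Replace the whole field with what the capture group holds.
    return re.sub(r"^0+(\d+)$", r"\1", field)

def question_pattern(field: str) -> str:
    # The question's attempt: needs exactly one char after the zeros.
    return re.sub(r"^0+(?=.$)", "", field)

for f in ["00000000000", "00000000099", "00000012345", "12345678901"]:
    print(f, strip_zeros_lookahead(f), strip_zeros_group(f), question_pattern(f))
```

In both answers the engine backtracks so that 0+ leaves at least one digit behind, which is why an all-zero field keeps its final 0.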

How do you specify multiples in negative character classes in regular expressions?

I am trying to write a regular expression that finds anything except digits and the * or - characters, with one caveat. Where I'm hitting a wall is that runs of three or fewer digits should be found, but not runs of four or more, while even a single * or - should not be found.
This is what I have so far (for three matches):
.*?([^0-9\*-]+).*?([^0-9\*-]+).*?([^0-9\*-]+).*?
I have no idea where to insert {4,} for the digits (I've tried, and it doesn't seem to work anywhere) or how else to change it to do what I want.
For instance, from "Jack has* 777 1883874 -sheep-" I'd like it to return "Jack has 777 sheep". Or from "2343klj-3***.net" I'd like it to return "klj 3 .net".
You may use the following regex (replacing with a literal space, " "):
(?:[-*\s]|\d{4,})+
See the regex demo. Each match is collapsed into that single space (the pattern has no capturing group).
Details
(?:[-*\s]|\d{4,})+ - a non-capturing group matching one or more consecutive repetitions of
[-*\s] - a single whitespace, - or * character
| - or
\d{4,} - 4+ digits.
Next, to remove all leading and trailing whitespace, you may use
^\s+|\s+$
and replace with an empty string. ^\s+ matches 1+ whitespaces at the start of the string and \s+$ matches 1+ whitespaces at the end of the string.
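The two-step approach can be sketched in Python (chosen for illustration; the function name clean is hypothetical). Step 1 uses the answer's pattern with a literal space as the replacement, and step 2 uses the trimming pattern with an empty replacement.

```python
import re

def clean(text: str) -> str:
    # Step 1: collapse each run of whitespace, "-", "*" and 4+ digit
    # numbers into a single space.
    collapsed = re.sub(r"(?:[-*\s]|\d{4,})+", " ", text)
    # Step 2: trim leading and trailing whitespace.
    return re.sub(r"^\s+|\s+$", "", collapsed)

print(clean("Jack has* 777 1883874 -sheep-"))  # Jack has 777 sheep
print(clean("2343klj-3***.net"))               # klj 3 .net
```

Splitting the work this way avoids the conflict mentioned in the follow-up: step 1 cannot tell whether a run sits at the edge of the string, so step 2 removes the edge spaces afterwards.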
With the help here, this is what works. It may be impossible to do it all in one regex, because of the conflict between needing no spaces at the beginning and end but a space between each remaining group.
First, find ([-*\h]|\d{4,})+ and replace it with a space.
Second, find ^\s*(.*?)\s*$ and replace it with $1. The lazy .*? keeps trailing whitespace out of the group; with a greedy .* the trailing space would stay in $1.