Regex to capture words and numbers in separate groups - regex

I need two groups - one to extract words, second - numbers. Example:
['| Sofia | 300']
need to extract:
Group 1 - Sofia; Group 2 - 300
My regex attempt:
([a-zA-Z]+[ ]*[a-zA-Z]+)([0-9]+)
I don't understand why this doesn't match. I've been reading for 30 minutes now and maybe I can't phrase my issue correctly, but I can't find a solution. My thinking here is that each set of parentheses holds a group. The regex inside each one seems to work fine on its own, but when I try to capture 2 groups, it fails. Obviously I am missing something important about capturing multiple groups.

It doesn't match because you're not matching the characters between "Sofia" and "300". This would match "Sofia300", but not "Sofia 300" or "Sofia | 300". Try this:
(\w+ *\w+).*?(\d+)
(I'm using \w instead of [a-zA-Z] and \d instead of [0-9] for brevity.)
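A minimal Python sketch of the difference, assuming the sample line is the plain string "| Sofia | 300" (the variable names here are illustrative only):

```python
import re

line = "| Sofia | 300"

# Original attempt: nothing is allowed between the word group and the digit group.
original = re.compile(r"([a-zA-Z]+[ ]*[a-zA-Z]+)([0-9]+)")
# Suggested fix: a lazy .*? consumes whatever sits between the two groups.
fixed = re.compile(r"(\w+ *\w+).*?(\d+)")

print(original.search(line))                 # no match: " | " separates the groups
print(original.search("Sofia300").groups())  # matches only when the groups are adjacent

match = fixed.search(line)
print(match.group(1), match.group(2))
```

Note that \w also matches digits and underscores, so the shorthand is slightly more permissive than [a-zA-Z].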

The following will give you your groups:
/([a-z]+).*\|\s([0-9]+)/i

Related

How to select with regex this character?

For the example i have these four ip address:
10.100.0.11; wrong
10.100.1.12; good
10.100.11.4; good
10.100.44.1; wrong
The task has simple rules: the 3rd position can't be 0, and the 4th position can't be a lone 1.
I need to select them from an IP table on different routers, and these rules are all I know.
My solution:
^(10.100.[1-9]{1,3}.[023456789]{1,3})$
but this rejects every number that contains a 1, such as 10 or 100, so this solution is wrong.
^(10.100.[1-9]{1,3}.[1-9]{2,3})$
This solves the problem of the single 1, but creates another one.
From the rules you have given, this regex should work:
10\.100\.([123456789]\d*|\d{2,})\.([^1]$|\d{2,})
It also matches a 3rd-position number that contains a 0, as long as the number is not a lone 0,
so 10.100.10.4 will match, as will 10.100.02.4.
I don't know if that's the intended behavior, since I'm not familiar with IP addresses.
The last part \.([^1]$|\d{2,}) reads like this:
"after the 3rd dot is either
a character which is not 1 followed by the end of the line
or two or more digits"
If you want to avoid matching malformed strings that contain non-digit characters, like 10.100.12.a, you should replace [^1] with [023456789], or the lazier (and therefore better ;) [02-9].
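To illustrate the point about [^1], here is a small Python sketch comparing the two variants. The names `loose` and `strict` are made up for this example, and `re.search` is used because the pattern has no leading ^ anchor:

```python
import re

# Variant from the answer: [^1] accepts any character except "1", including letters.
loose = re.compile(r"10\.100\.([123456789]\d*|\d{2,})\.([^1]$|\d{2,})")
# Variant with the suggested replacement: only the digits 0 and 2-9 are accepted.
strict = re.compile(r"10\.100\.([123456789]\d*|\d{2,})\.([02-9]$|\d{2,})")

for ip in ["10.100.1.12", "10.100.44.1", "10.100.12.a"]:
    print(ip, bool(loose.search(ip)), bool(strict.search(ip)))
```

Only the last input behaves differently: the loose variant accepts 10.100.12.a, the strict one rejects it.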
I use https://regex101.com to debug regex. It's just awesome.
You might use
^10\.100\.[1-9]{1,3}\.(?:[02-9]|\d{2,3})$
The pattern matches
^ Start of string
10\.100\. Match 10.100. (note to escape the dot to match it literally)
[1-9]{1,3} Match 1 to 3 digits in the range 1-9
\. Match a dot
(?: Non capture group
[02-9] Match a digit 0 or 2-9
| Or
\d{2,3} Match 2 or 3 digits 0-9
) Close the group
$ End of string
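As a quick check, the pattern above can be run against the four addresses from the question. This is a sketch in Python; the `samples` dictionary simply records the good/wrong labels given in the question:

```python
import re

PATTERN = re.compile(r"^10\.100\.[1-9]{1,3}\.(?:[02-9]|\d{2,3})$")

# Expected outcome per address, taken from the question's good/wrong labels.
samples = {
    "10.100.0.11": False,
    "10.100.1.12": True,
    "10.100.11.4": True,
    "10.100.44.1": False,
}

results = {ip: bool(PATTERN.search(ip)) for ip in samples}
for ip, expected in samples.items():
    print(ip, "matched:", results[ip], "expected:", expected)

# Because [1-9] excludes 0 entirely, a third field such as 10 is rejected too.
print(bool(PATTERN.search("10.100.10.4")))
```

Whether rejecting 10.100.10.4 is desired depends on how the rule "the 3rd position can't be 0" is read.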

Regex Giftcard number pattern

I am trying to come up with a regex for a giftcard number pattern in an application. I have this so far and it works fine:
(?:5049\d{12}|6219\d{12}) = 5049123456789012
What I need to account for, though, is numbers that are separated by dashes or spaces, like so:
5049-1234-5678-9012
5049 1234 5678 9012
Can I chain these patterns together or do I need to make separate for each type?
The easiest and most simple regex could be:
(?:(5049|6219)([ -]?\d{4}){3})
Explanation:
(5049|6219) - Will check for the '5049' or '6219' start
(x){3} - Will repeat the (x) 3 times
[ -]? - Will look for " " or "-", ? accepts it once or 0 times
\d{4} - Will look for a digit 4 times
A more detailed explanation and example can be found here: https://regex101.com/r/A46GJp/1/
Use (?:5049|6219)(?:[ -]?\d{4}){3}
First, match one of the two leads. Then match 3 groups of 4 digits each, each group optionally preceded by space or dash.
See regex101 for a demo; it also explains the pattern in more detail.
The above regex will also match if separators are mixed, e.g. 5049 1234-5678 9012. If you don't want that, use
(?:5049|6219)([ -]?)\d{4}(?:\1\d{4}){2}
This captures the first separator, if any, and specifies that the following 2 groups must use that same separator.
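The difference between the two patterns can be seen in a short Python sketch. The names `any_separator` and `same_separator` are illustrative, and `fullmatch` is used here on the assumption that the whole input should be a card number:

```python
import re

LEAD = r"(?:5049|6219)"
# Each group of four digits may be preceded by a space, a dash, or nothing.
any_separator = re.compile(LEAD + r"(?:[ -]?\d{4}){3}")
# The first separator is captured and reused via the backreference \1.
same_separator = re.compile(LEAD + r"([ -]?)\d{4}(?:\1\d{4}){2}")

numbers = [
    "5049123456789012",
    "5049-1234-5678-9012",
    "5049 1234 5678 9012",
    "5049 1234-5678 9012",  # mixed separators
]

for number in numbers:
    print(
        number,
        bool(any_separator.fullmatch(number)),
        bool(same_separator.fullmatch(number)),
    )
```

Only the mixed-separator input is treated differently: the first pattern accepts it and the backreference version rejects it.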
Try this :
(?:(504|621)9(\d{12}|(\-\d{4}){3}|(\s\d{4}){3}))
https://regex101.com/r/SyjaT5/6

Regular Expression to Extract Text Bounded by '/'

I need a regular expression to extract names from a GEDCOM file. The format is:
Fred Joseph /Smith/
Where the text bounded by the / is the surname and the Fred Joseph are the forenames. The complication is that the surname could be at any place in the text or may not be there at all. I need something that will extract the surname and capture everything else as the forenames.
This is as far as I have got and I have tried making groups optional with the ? qualifier but to no avail:
As you can see it has several problems: If the surname is missing nothing gets captured, the forename(s) sometimes have leading and trailing spaces, and I have 3 capture groups when I'd really like 2. Even better would be if the capture group for the surname didn't include the '/' characters.
Any help would be much appreciated.
For your last line, I'm not sure there is a way to join the group 1 with group 3 into a single group.
Here is my proposed solution. It doesn't capture spaces around forenames.
^(?:\h*([a-z\h]+\b)\h*)?(?:\/([a-z\h]+)\/)?(?:\h*([a-z\h]+\b)\h*)?$
To match the names correctly, be sure to use the case-insensitive flag, and if you test all lines at once, use the multiline flag.
Explanation
^ start of the line
(?:\h*([a-z\h]+\b)\h*)? first non-capturing group that matches 0 or 1 time:
\h* 0 or more horizontal spaces
([a-z\h]+\b) captures in a group letters and spaces, but stops at the end of the last word
\h* matches the possible remaining spaces without capturing
(?:\/([a-z\h]+)\/)? second non-capturing group that matches 0 or 1 time a name in a capturing group surrounded by slashes
(?:\h*([a-z\h]+\b)\h*)? third non-capturing group doing the same as first one, capturing the names in a third group.
$ end of the line
For your requirements
([A-z a-z /])+\w*
Hope this helps
(.*?)\/(.*?)\/(.*)
Try this: ^([^/\n]*)(/[^/\n]+/)?([^/\n]*)$
This matches the following:
^ start of string (or with multiline modifier start of line)
([^/\n]*) anything other than / or new line zero or more times - this is captured as group 1
(/[^/\n]+/)? a single / followed by one or more non / or new line characters, then a single '/' character - this is captured as group 2, and is optional
([^/\n]*) anything other than / or new line zero or more times - this is captured as group 3
$ end of string (or with multiline modifier end of line)
You can see in action with your example text here: https://regex101.com/r/9kmKpy/1
To not capture the slashes you can add a non capturing group by adding ?: to the second set of brackets, and then adding another pair between the slashes:
^([^\/\n]*)(?:\/([^\/\n]+)\/)?([^\/\n]*)$
https://regex101.com/r/9kmKpy/2
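Since the question asks for two results (surname and everything else), one possible approach is to keep the three groups and join groups 1 and 3 after matching. A Python sketch follows; `split_name` is a hypothetical helper, and joining and trimming the forenames is an assumption about the desired output:

```python
import re

# Same pattern as above; Python does not need the slashes escaped.
NAME = re.compile(r"^([^/\n]*)(?:/([^/\n]+)/)?([^/\n]*)$")

def split_name(text):
    """Return (forenames, surname); surname is None when no /.../ part exists."""
    m = NAME.match(text)
    if m is None:
        return None
    before, surname, after = m.groups()
    # Merge the text on both sides of the surname and normalise the spacing.
    forenames = " ".join((before + " " + after).split())
    return forenames, surname

for sample in ["Fred Joseph /Smith/", "/Smith/ Fred Joseph",
               "Fred /Smith/ Joseph", "Fred Joseph"]:
    print(repr(sample), "->", split_name(sample))
```

This sidesteps the problem of combining two capture groups inside the regex itself, at the cost of a little post-processing.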
I am not sure I follow what language is being used to extract the data, but based on what you have so far, you simply need to add '?':
(.*)(\/?.*\/?)(.*)
Note that this does not give you a group for EACH name, since some groups will contain multiple names.
Edit:
Extending Niitaku's solution to put each individual name in its own group, you could use:
^\s*(?:\/?([a-z]+)\/?)\s*(?:\/?([a-z]+)\/?)\s*(?:\/?([a-z]+)\/?)\s*$
As explained though, if using a language like ruby it would simply be:
ruby -pe '$_ = $_.scan(/\w+/)' file

Regex for 5 digit number with optional characters

I am trying to create a regex to validate a field where the user can enter a 5 digit number with the option of adding a / followed by 3 letters. I have tried quite a few variations of the following code:
^(\d{5})+?([/]+[A-Z]{1,3})?
But I just can't seem to get what I want.
For instance, I would like the user to enter either a 5 digit number such as 12345, or the same number followed by a forward slash and any 3 letters, such as 12345/WFE.
You probably want:
^\d{5}(?:/[A-Z]{3})?$
You might have to escape that forward slash depending on your regex flavor.
Explanation:
^ - start of string anchor
\d{5} - 5 digits
(?:/[A-Z]{3}) - non-capturing group consisting of a literal / followed by 3 uppercase letters (depending on your needs you could consider making this a capturing group by removing the ?:).
? - 0 or 1 of what precedes (in this case that's the non-capturing group directly above).
$ - end of string anchor
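A small Python sketch of this validation. `is_valid` is a hypothetical helper, and `fullmatch` is used in place of the ^ and $ anchors, which is an assumption about how the field is checked:

```python
import re

# Five digits, optionally followed by "/" and exactly three uppercase letters.
CODE = re.compile(r"\d{5}(?:/[A-Z]{3})?")

def is_valid(value):
    # fullmatch requires the entire string to match, like ^...$ would.
    return CODE.fullmatch(value) is not None

for value in ["12345", "12345/WFE", "12345/WF", "1234", "12345/wfe"]:
    print(value, is_valid(value))
```

If lowercase letters should be accepted as well, [A-Z] can be widened to [a-zA-Z] or the re.IGNORECASE flag can be passed to re.compile.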
All in all, the regex looks like this:
You can use this regex
/^\d{5}(?:\/[a-zA-Z]{3})?$/
^\d{5}(?:/[A-Z]{3})?$
Here it is in practice (this is a great site to test your regexes):
http://regexr.com?36h9m
^(\d{5})(\/[A-Z]{3})?
Tested in rubular

Regular expression to match string only if trailed by a character

I need help creating a regular expression.
Here are two sample strings:
/path/to/file.jpg
/path/to/file.type.jpg
Respectively, I'm trying to capture:
file.jpg
file.type.jpg
But I want to capture the three as separate strings.
file,jpg
file,type,jpg
Note that I'm not capturing the periods.
I thought something like this could work (excluding the new lines):
([a-z]+)\.
[([a-z]+)[\.]{1}]?
([a-z]{3})
Guidance would be appreciated.
I'm wondering if there is another modifier I would need to use to have it capture properly.
The above expression errors out, by the way :(
I suggest using the pattern
\/([^.]+)\.?([^.]+|)\.([^.]+)$
and you will have 3 groups: file, type (which will be empty, if not present) and extension
You'd have to use:
/(\w+)(\.(\w+))?\.(\w{3,4})\b
Then capturing groups 1, 3 and 4 would be your: file(1) type(3) and jpg/png whatever(4)
Groups taken apart:
(\w+) - matches 1 or more word characters (equivalent to {1,})
(\.(\w+))? - matches a dot followed by the 3rd group, and makes the whole group optional ( ? )
(\w+) - same as group 1
(\w{3,4})\b - matches 3 or 4 word characters ( {3,4} ) and ensures that no other word characters follow them (word boundary - \b - if supported!)
You can use: "\/(?:\w+\/)+(\w+)\.?(\w+)?\.(\w+)" as regex.
Edit: I didn't read the part about not matching dots.
This regex should work:
/(\w+)\.(\w+)(?:\.(\w+))?$/
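A Python sketch of how this last pattern yields the separate parts. `split_filename` is a hypothetical helper, and dropping the unmatched optional group is an assumption about the desired result:

```python
import re

# Slash, name, dot, second part, then an optional dot plus third part at the end.
PARTS = re.compile(r"/(\w+)\.(\w+)(?:\.(\w+))?$")

def split_filename(path):
    m = PARTS.search(path)
    if m is None:
        return []
    # The optional third group is None when the file has a single dot.
    return [part for part in m.groups() if part is not None]

print(split_filename("/path/to/file.jpg"))
print(split_filename("/path/to/file.type.jpg"))
```

The dots are matched outside the capture groups, so they never appear in the returned parts.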