Is it possible to match only the letters from the following string?
RO41 RNCB 0089 0957 6044 0001 FPS21098343
What I want: FPS
What I'm trying: [0-9]{4}\s*\S+\s+(\S+)
What I get: FPS21098343
Any help is much appreciated! Thanks.
You can try with this:
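// Avoid naming a variable String: that shadows the built-in String constructor.
var input = "0258 6044 0001 FPS21098343";
// One or more "dddd " groups, then letters (Group 1), then digits up to the end.
var reg = /^(?:\d{4} )+ *([a-zA-Z]+)(?:\d+)$/;
var match = reg.exec(input);
console.log(match);    // the full match array
console.log(match[1]); // "FPS"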
You can match up to the first one or more letters in the following way:
^[^a-zA-Z]*([A-Za-z]+)
^.*?([A-Za-z]+)
^[\w\W]*?([A-Za-z]+)
(?s)^.*?([A-Za-z]+)
If the tool treats ^ as the start of a line, replace it with \A, which always matches the start of the string.
The point is to match
^ / \A - start of string
[^a-zA-Z]* - zero or more chars other than letters
([A-Za-z]+) - capture one or more letters into Group 1.
The .*? part matches any text (as short as possible) before the subsequent pattern(s). (?s) makes . match line break chars.
Replace A-Za-z in all the patterns with \p{L} to match any Unicode letters. Also, note that [^\p{L}] = \P{L}.
To get every run of consecutive letters anywhere in the string, you can simply use:
([a-zA-Z]+)
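As a rough check of the patterns above, here is a minimal Python sketch (Python's re module is an assumption; the question does not name a tool). It uses the shortened sample string from the first answer, and it also shows that on the full string from the question the first run of letters is RO rather than FPS.

```python
import re

# The shortened string from the first answer, and the full string from the question.
short_text = "0258 6044 0001 FPS21098343"
full_text = "RO41 RNCB 0089 0957 6044 0001 FPS21098343"

# Skip everything that is not an ASCII letter, then capture the first run of letters.
first_letters = re.compile(r"\A[^a-zA-Z]*([A-Za-z]+)")

print(first_letters.match(short_text).group(1))  # FPS
print(first_letters.match(full_text).group(1))   # RO

# Without the anchor, findall returns every run of letters in the string.
all_runs = re.findall(r"[a-zA-Z]+", full_text)
print(all_runs)  # ['RO', 'RNCB', 'FPS']
```

If only the last run of letters is wanted from the full string, taking all_runs[-1] is one option.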
You could use a capture group to get FPS:
\b[0-9]{4}\s+\S+\s+([A-Z]+)
The pattern matches:
\b[0-9]{4} A word boundary to prevent a partial match, then match 4 digits
\s+\S+\s+ Match 1+ non whitespace chars between whitespace chars
([A-Z]+) Capture group 1, match 1+ chars A-Z
If the chars have to be followed by digits till the end of the string, you can add \d+$ to the pattern:
\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$
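A minimal Python sketch of both variants, run against the full string from the question (the choice of Python and the variable names are assumptions for illustration):

```python
import re

text = "RO41 RNCB 0089 0957 6044 0001 FPS21098343"

# Four digits, one more whitespace-delimited token, then capture the uppercase letters.
letters_only = re.search(r"\b[0-9]{4}\s+\S+\s+([A-Z]+)", text)

# Same pattern, but the letters must be followed by digits up to the end of the string.
letters_then_digits = re.search(r"\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$", text)

print(letters_only.group(1))         # FPS
print(letters_then_digits.group(1))  # FPS
```

The earlier four-digit groups are tried first, but they fail because the token two positions later starts with a digit, so the match only succeeds starting at 6044.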
I want to capture all the strings from multi-line data. Below is the expected result, along with my pattern, which does not work.
Pattern: ^XYZ/[0-9|ALL|P] I'm lost with this part. Can anyone help?
Result
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/ALL
XYZ/P1
XYZ/P2,3
XYZ/P4,5-7
XYZ/P1-4,5-7,8-9
Changed to
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/A12345 after the slash limited to 6 alphanumeric chars
XYZ/LH-1234567890 after the /LH- limited to 10 numeric chars
The pattern could be:
^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$
The pattern in parts matches:
^ Start of string
XYZ\/ Match XYZ/ (You don't have to escape the / depending on the pattern delimiters)
(?: Outer non capture group for the alternatives
ALL Match literally
| Or
P? Match an optional P
[0-9]+(?:-[0-9]+)? Match 1+ digits with an optional - and 1+ digits
(?: Non capture group to match as a whole
,[0-9]+(?:-[0-9]+)? Match , and 1+ digits and an optional - and 1+ digits
)* Close the non capture group and optionally repeat it
) Close the outer non capture group
$ End of string
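The pattern can be checked against the lines from the question with a short Python sketch (Python is an assumption here; the / needs no escaping in a Python string, so it is written unescaped):

```python
import re

pattern = re.compile(
    r"^XYZ/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$"
)

# Lines taken from the "Result" list in the question.
wanted = [
    "XYZ/1", "XYZ/1,2-5", "XYZ/5,7,8-9", "XYZ/2-4,6-8,9", "XYZ/ALL",
    "XYZ/P1", "XYZ/P2,3", "XYZ/P4,5-7", "XYZ/P1-4,5-7,8-9",
]
# The two new formats from the "Changed to" list, which this pattern does not cover.
other = ["XYZ/A12345", "XYZ/LH-1234567890"]

matched = [line for line in wanted + other if pattern.match(line)]
print(matched == wanted)  # True
```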
You can use this regex pattern to match those lines
^XYZ\/(?:P|ALL|[0-9])[0-9,-]*$
Use the global g and multiline m flags.
Btw, [P|ALL] doesn't match the word "ALL".
It only matches a single character that's a P or A or L or |.
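The difference between the character class and a group with alternation can be seen in a small Python sketch (Python and the sample line ABC/1 are assumptions added for illustration):

```python
import re

# [P|ALL] is a set of single characters: P, |, A and L (the repeated L adds nothing).
char_class = re.compile(r"[P|ALL]")
print(bool(char_class.fullmatch("A")))    # True
print(bool(char_class.fullmatch("|")))    # True
print(bool(char_class.fullmatch("ALL")))  # False

# A group with alternation is what matches the whole word.
print(bool(re.fullmatch(r"(?:P|ALL)", "ALL")))  # True

# The suggested pattern with the multiline flag; findall plays the role of the global flag.
data = "XYZ/1\nXYZ/ALL\nXYZ/P2,3\nABC/1"
found = re.findall(r"^XYZ/(?:P|ALL|[0-9])[0-9,-]*$", data, flags=re.MULTILINE)
print(found)  # ['XYZ/1', 'XYZ/ALL', 'XYZ/P2,3']
```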
I need one regex to capture a string up to a :, but the problem is that the : is not always there.
At the moment I am able to capture the groups when the : is there, but not when it isn't.
Not sure what I am doing wrong.
strings to capture
XXX 1 A:B (working)
XXX 1 A: (working)
XXX A (not working)
My regex:
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*)(?=\:)(?:.)*$
You can use
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*?)(?::.*)?$
Details:
^ - start of string
(?P<grp1>[A-Z]{3,10}) - Group "grp1": three to ten uppercase letters
\s - a whitespace
(?P<grp2>.*?) - Group "grp2": any zero or more chars other than line break chars, as few as possible
(?::.*)? - an optional non-capturing group matching a : and then any zero or more chars other than line break chars, as many as possible
$ - end of string.
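Since the (?P<name>...) syntax is the one Python's re module uses, the pattern can be tried there directly. A minimal sketch on the three strings from the question:

```python
import re

pattern = re.compile(r"^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*?)(?::.*)?$")

results = {}
for text in ["XXX 1 A:B", "XXX 1 A:", "XXX A"]:
    m = pattern.match(text)
    # The lazy .*? grows only until the optional ":..." tail and $ can match.
    results[text] = (m.group("grp1"), m.group("grp2"))

print(results["XXX 1 A:B"])  # ('XXX', '1 A')
print(results["XXX 1 A:"])   # ('XXX', '1 A')
print(results["XXX A"])      # ('XXX', 'A')
```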
Alternatively, use negated character classes and optionally match a single : after the second group:
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>[^:\r\n]*)(?::[^:\r\n]*)?$
^ Start of string
(?P<grp1>[A-Z]{3,10}) Group grp1
\s Match a whitespace char
(?P<grp2>[^:\r\n]*) Group grp2, match 0+ chars other than : or a newline
(?::[^:\r\n]*)? Optionally match a single : followed by 0+ chars other than : or a newline
$ End of string
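A short Python sketch of how this variant behaves. The input with two colons is a hypothetical example, not one from the question; it shows that the negated character classes allow at most one colon:

```python
import re

single_colon = re.compile(
    r"^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>[^:\r\n]*)(?::[^:\r\n]*)?$"
)

m = single_colon.match("XXX 1 A:B")
print(m.group("grp1"), "|", m.group("grp2"))  # XXX | 1 A

no_colon = single_colon.match("XXX A")
print(no_colon.group("grp2"))  # A

# Hypothetical input with two colons: [^:\r\n]* cannot cross the second colon.
two_colons = single_colon.match("XXX 1 A:B:C")
print(two_colons)  # None
```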
I have a string that has the following structure:
digit-word(s)-digit.
For example:
2029 AG.IZTAPALAPA 2
I want to extract the word(s) in the middle, and the digit at the end of the string.
I want to extract AG.IZTAPALAPA and 2 in the same capture group to extract like:
AG.IZTAPALAPA 2
I managed to capture them as individual capture groups but not as a single:
town_state['municipality'] = town_state['Town'].str.extract(r'(\D+)', expand=False)
town_state['number'] = town_state['Town'].str.extract(r'(\d+)$', expand=False)
Thank you for your help!
You can use a single capturing group for the example string. It matches a single "word" made of uppercase chars A-Z, optionally with dots in the middle (a dot cannot be at the start or end), followed by a space and 1 or more digits.
\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b
Explanation
\b A word boundary
\d+ Match 1+ digits and a space
( Capture group 1
[A-Z]+ Match 1+ occurrences of an uppercase char A-Z
(?:\.[A-Z]+)* \d+ Repeat 0+ times matching a dot and 1+ chars A-Z, then match a space and 1+ digits
) Close group 1
\b A word boundary
Or you can make the pattern a bit broader, matching space-separated words made of dots or word characters:
\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b
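To compare the two patterns, here is a minimal sketch using plain Python re rather than pandas; the same pattern strings could presumably be passed to str.extract as in the question. The value "2029 SAN JUAN 3" is a hypothetical multi-word example, not taken from the question.

```python
import re

strict = re.compile(r"\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b")
broad = re.compile(r"\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b")

sample = "2029 AG.IZTAPALAPA 2"
multi_word = "2029 SAN JUAN 3"  # hypothetical value, not from the question

print(strict.search(sample).group(1))     # AG.IZTAPALAPA 2
print(broad.search(sample).group(1))      # AG.IZTAPALAPA 2
print(strict.search(multi_word))          # None
print(broad.search(multi_word).group(1))  # SAN JUAN 3
```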
You can use the following simple regex:
[0-9]+\s([A-Z]+.[A-Z]+(?: [0-9]+)*)
Note:
(?: [0-9]+)* makes the trailing digits optional.
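One thing to keep in mind with this pattern is that the . is unescaped, so it matches any character, not only a literal dot. A small Python sketch (the dot-free input is a hypothetical example) shows the effect:

```python
import re

simple = re.compile(r"[0-9]+\s([A-Z]+.[A-Z]+(?: [0-9]+)*)")

with_dot = simple.search("2029 AG.IZTAPALAPA 2").group(1)
print(with_dot)  # AG.IZTAPALAPA 2

# Hypothetical input without any dot: the unescaped . simply consumes a letter.
without_dot = simple.search("2029 AGXIZTAPALAPA 2").group(1)
print(without_dot)  # AGXIZTAPALAPA 2

# The trailing digits are optional, so a string without them still matches.
no_digits = simple.search("2029 AG.IZTAPALAPA").group(1)
print(no_digits)  # AG.IZTAPALAPA
```

Writing \. instead of . would require a literal dot, if that is the intent.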
Each word's length must be 2 or 6-10, and words can be separated by a space or a comma. Words contain only letters and are not case sensitive.
Here is a group of words that should be matched:
RE,re,rereRE
Not matching groups:
RE,rere,rel
RE,RERE
Here is the pattern that I have tried
((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|\s+)?)
But unfortunately this pattern can match a string like this: RE,RERE
It looks like the word boundary has not been set.
You could match chars a-z either 2 or 6-10 times using an alternation.
Then repeat that pattern 0+ times, each time preceded by a comma or a space [ ,].
^(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*$
Explanation
^ Start of string
(?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match chars a-z 6 -10 or 2 times
(?: Non capturing group
[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match comma or space and repeat previous pattern
)* Close non capturing group and repeat 0+ times
$ End of string
If lookarounds are supported, you might also assert that what is directly on the left and on the right is not a non-whitespace character \S.
(?<!\S)(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[ ,](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*(?!\S)
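Both variants can be checked against the three strings from the question with a minimal Python sketch (Python is an assumption; its re module supports the lookarounds used here):

```python
import re

word = r"(?:[A-Za-z]{6,10}|[A-Za-z]{2})"

# Anchored version: the whole string must be a list of valid words.
anchored = re.compile(rf"^{word}(?:[, ]{word})*$")
# Lookaround version: the list must be delimited by whitespace or the string ends.
delimited = re.compile(rf"(?<!\S){word}(?:[ ,]{word})*(?!\S)")

samples = ["RE,re,rereRE", "RE,rere,rel", "RE,RERE"]

anchored_hits = [s for s in samples if anchored.match(s)]
delimited_hits = [s for s in samples if delimited.search(s)]

print(anchored_hits)   # ['RE,re,rereRE']
print(delimited_hits)  # ['RE,re,rereRE']
```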
([a-zA-Z]{2}(,|\s)|[a-zA-Z]{6,10}|(,|\s))
This one matches only the words that have 2 letters, or between 6 and 10:
\b,?([a-zA-Z]{6,10}|[a-zA-Z]{2}),?\b
You can use this
^(?!.*\b[a-z]{4}\b)(?:(?:[a-z]{2}|[a-z]{6,10})(?:,|[ ]+)?)+$
This regex will match your first case, but neither of your two other cases:
^((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|[ ]+|$))+$
I'm making the assumption here that each line should be a single match.
I know that ^. is first character and (\d+)(?!.*\d) is last number. I've tried using | between these and have been trying to find code for the second character, but with no success.
This is in R.
Take for example:
'ABCD some random words and spaces 1234' should output 'A4' when I do
sub([regex here], "", 'ABCD some random words and spaces 1234')
If you used ^.|(\d+)(?!.*\d) with sub, only the first match, the first char, would be removed. With gsub, both the first char and the last run of 1+ digits would be removed, since the replacement pattern contains no backreferences.
You can use
sub("^(.).*(\\d).*$", "\\1\\2", "ABCD some random words and spaces 1234")
This TRE regex pattern matches:
^ - start of string
(.) - Group 1 capturing any char
.* - 0+ any chars as many as possible up to the last...
(\\d) - Group 2 capturing a digit
.* - the rest of the string
$ - end of string.
The \\1\\2 replacement pattern re-inserts the values captured with Group 1 and Group 2 back to the result.
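For comparison, the same capture-and-reinsert idea can be sketched in Python with re.sub. This is an analogue of the R call above, not part of the original answer:

```python
import re

text = "ABCD some random words and spaces 1234"

# Group 1: the first char. The greedy .* runs to the end and backtracks to the
# last digit (Group 2). The final .* consumes whatever follows that digit.
result = re.sub(r"^(.).*(\d).*$", r"\1\2", text)
print(result)  # A4
```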