I've got a string
198.21 543 G110P0GHTT SAW GHA + DBA 11998
And I'd like to match all groups of the string between spaces. So far I've come up with (?<=\s)(.*?)(?=\s), which matches all but the first group. Additionally, it does not count GHA + DBA as a single group. What can I add to ensure it includes the first group and only treats MORE than one space as a separator?
You don't need to use look arounds here. Just use this regex to match a non-whitespace string or substring separated by a single space:
\S+(?:\s\S+)*
RegEx Details:
\S+: Match 1+ non-whitespace characters
(?:\s\S+)*: Match 0 or more further runs of non-whitespace characters, each preceded by exactly one whitespace character.
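A minimal Python sketch of this pattern in action. The sample line below is hypothetical: it assumes the groups are separated by runs of three spaces, since the exact spacing of the original string is not shown.

```python
import re

# Hypothetical input: groups separated by runs of 3 spaces (assumed spacing).
line = "198.21   543   G110P0GHTT   SAW   GHA + DBA   11998"

# A run of non-whitespace, optionally extended by further runs that are
# each preceded by exactly one whitespace character.
GROUP = re.compile(r"\S+(?:\s\S+)*")

groups = GROUP.findall(line)
print(groups)
```

Because a single space is allowed inside a group, GHA + DBA stays together, while any run of two or more spaces ends the current match.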
I am close but not quite there. I am trying to match the last word to pull out the last name.
My Regex:
Insured Name:\W*(?<insured_last_name>.*)
Text that I am searching:
Insured Name:
FRED & ETHYL MERTZ
Sample here...
https://regex101.com/r/McdMcq/3
You can match Insured Name: until the end of the line. Then match a newline and optional following whitespace chars.
Then at the line where you want to get the last word, first match until the end of the line, then backtrack until the last space, and capture 1+ non whitespace chars in group insured_last_name
\bInsured Name:.*\r?\n\s*.* (?<insured_last_name>\S+)
In parts
\bInsured Name: Match literally
.*\r?\n\s* Match the rest of the line, a newline and 0+ whitespace chars
.* followed by a space: Match up to and including the last space on the line
(?<insured_last_name>\S+) Match 1+ non whitespace chars in group insured_last_name
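A rough Python sketch of the same pattern. Two assumptions: Python's re module spells named groups as (?P<name>...), so the group syntax is adapted, and the sample text assumes a blank line between the label and the name.

```python
import re

# Python spells named groups (?P<name>...) rather than (?<name>...).
PATTERN = re.compile(
    r"\bInsured Name:.*\r?\n\s*.* (?P<insured_last_name>\S+)"
)

# Assumed layout: a blank line between the label and the name.
text = "Insured Name:\n\nFRED & ETHYL MERTZ\n"

match = PATTERN.search(text)
last_name = match.group("insured_last_name") if match else None
print(last_name)
```

The greedy .* runs to the end of the name line and then backtracks to the last space, so only the final word lands in the group.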
You can simply use /\w+$/gm
Demo: https://regex101.com/r/McdMcq/4
Explanation:
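\w: Match a word character (letter, digit, or underscore)
+: At least one
$: Followed by the end of the line (the m flag makes $ match at every line end, not just at the end of the string)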
If there are multiple rows and potentially garbage data in between, I would recommend removing the 2 newlines (\n\n) and then using a positive lookbehind that looks for "Name". Demo: https://regex101.com/r/McdMcq/5
If you need to store the result in a capture group, simply wrap \w+$ in parentheses with a group name, i.e. (?<insured_last_name>\w+$), in either of the two regexes.
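As a small illustration in Python (an assumption about the target language; the sample text also assumes a blank line after the label), re.MULTILINE plays the role of the m flag and re.findall the role of g:

```python
import re

# Assumed sample text, with a blank line between the label and the name.
text = "Insured Name:\n\nFRED & ETHYL MERTZ"

# re.MULTILINE makes $ match at the end of every line, like the /m flag.
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)

# Named-group variant; Python spells the group (?P<name>...).
named = re.search(r"(?P<insured_last_name>\w+$)", text, flags=re.MULTILINE)
last_name = named.group("insured_last_name") if named else None

print(last_words, last_name)
```

The label line is skipped only because it ends in a colon, which \w does not match. Note also that \w stops at hyphens and apostrophes, so a hyphenated surname would be cut at the hyphen.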
You may need to define your data set a little more, but you can try
Insured Name:\n+.*(?<insured_last_name>\b.+)
It starts at "Insured Name:", skips any empty lines, then reads the following line up to the final word boundary (excluding the EOL); anything after that boundary is captured in your named group.
I have a list that could look sort of like
("!Goal 27' Edward Nketiah"),
("!Goal 33' 46' Pierre Emerick-Aubameyang"),
("!Sub Nicolas Pepe"),
("Jordan Pickford"),
and I'm looking to match either !Sub or !Goal 33' 46' or !Goal 27'
Right now I'm using the regex (!\w+\s) which will match !Goal and !Sub, but I want to be able to get the timestamps too. Is there an easy way to do that? There is no limit on the number of timestamps there could be.
As I mentioned in my comment, you can use the following regex to accomplish this:
(!\w+(?:\s\d+')*)
Explanation:
(!\w+(?:\s\d+')*) capture the following
! matches this character literally
\w+ matches one or more word characters
(?:\s\d+')* match the following non-capture group zero or more times
\s match a whitespace character
\d+ matches one or more digits
' match this character literally
Additionally, the first capture group isn't necessary - you can remove it to simply match:
!\w+(?:\s\d+')*
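If you need each timestamp, you can use !\w+((?:\s\d+')*) and split capture group 1 on the space character. Wrapping the whole repetition in the group matters: in most regex engines a repeated capture group such as (\s\d+')* keeps only its last iteration.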
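A short Python sketch of pulling out both the tag and the individual timestamps. The helper name parse_event is made up for illustration, and the capture group here wraps the whole repeated run so that no timestamp is lost.

```python
import re

rows = [
    "!Goal 27' Edward Nketiah",
    "!Goal 33' 46' Pierre Emerick-Aubameyang",
    "!Sub Nicolas Pepe",
    "Jordan Pickford",
]

# Group 1 wraps the whole repeated run, so every timestamp is kept.
EVENT = re.compile(r"!\w+((?:\s\d+')*)")

def parse_event(row):
    """Return (matched_prefix, [timestamps]), or None if the row has no tag."""
    m = EVENT.match(row)
    if m is None:
        return None
    # str.split() with no argument drops the leading whitespace in group 1.
    return m.group(0), m.group(1).split()

parsed = [parse_event(row) for row in rows]
for item in parsed:
    print(item)
```

Rows without a leading ! tag simply yield None, and a tag without timestamps yields an empty list.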
If your input always follows the format "bang text blank digits apostrophe blank digits apostrophe etc", then it should be as simple as:
!\w+(?:\s\d+')*
Explanation:
! matches an exclamation mark
\w+ matches 1 or more word characters (letters, digits, underscores)
(?:…) is a non-capturing group
\s matches a single whitespace character
\d+ matches one or more digits
' matches the apostrophe character
* repeatedly matches the group 0 or more times
This:
(!\w+(?:\s\d+')*)
will capture:
"!Goal 27'"
"!Goal 33' 46'"
"!Sub"
I want to pull the second string in a comma-delimited list where the first value is numeric and the second is alpha.
I'm using \d[^,]+(?=,) to pull the numeric value in the first field and just need help with pulling the second value from the "Name" column.
Here's part of a sample file that I'm trying to extract data from:
Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N),Supplier Master Exists(Y/N),Supplier Master Created,ACH Account Exists(Y/N),ACH Account Created,ACH Same as Auto-deposit(Y/N)
//line break here is for clarity and does not exist in file//
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
You could make use of a capturing group if you want to match the second string by first matching 1+ digits and a comma.
Then capture in a group matching 1+ chars a-zA-Z and match the trailing comma.
^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),
^ Start of string
\d+, Match 1+ digits and a comma (Or use (\d+), if the digits should also be a group)
( Capture group 1
[a-zA-Z]+ Match 1+ chars a-zA-Z
(?: [a-zA-Z]+)* Repeat matching the same as previous preceded by a space
), Close capturing group and match trailing comma
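A possible Python sketch of this pattern applied to the sample rows. The header line is shortened here for readability, and the (\d+) variant is used so the number is captured as well; re.MULTILINE is added so that ^ anchors at the start of each line.

```python
import re

# Sample rows from the question; the header line is shortened here.
sample = (
    "Address Number,Name,Employee Master Exist(Y/N)\n"
    "4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N\n"
    "10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N\n"
)

# Group 1: the leading digits. Group 2: letters-only words joined by single spaces.
ROW = re.compile(r"^(\d+),([a-zA-Z]+(?: [a-zA-Z]+)*),", flags=re.MULTILINE)

pairs = ROW.findall(sample)
print(pairs)
```

The header row is skipped because it does not start with a digit.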
For a slightly broader match, you could use this pattern, which requires at least a single char a-zA-Z and also allows spaces anywhere in the name:
\d+,([a-zA-Z ]*[a-zA-Z][a-zA-Z ]*),
Note that the \d[^,]+ part of your pattern does not match only digits: it matches 1 digit followed by 1+ times any char except a comma, so it would also match, for example, 4a$.
You could try this regex:
^\d+,([^,]+),
This will look for lines:
starting with one or more digits
followed by a comma
capture anything that is not a comma
followed by a comma
If not all lines contain a name, then change the + to a *:
^\d+,([^,]*),
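To make the difference between + and * concrete, here is a small Python sketch. The second row is hypothetical (an empty Name field) and the helper name second_field is made up for illustration.

```python
import re

rows = [
    "4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N",
    "20001,,Y,N,Y,N,Y,N,N",  # hypothetical row with an empty Name field
]

REQUIRED = re.compile(r"^\d+,([^,]+),")  # name must have at least one char
OPTIONAL = re.compile(r"^\d+,([^,]*),")  # name may be empty

def second_field(pattern, row):
    """Return capture group 1, or None when the row does not match."""
    m = pattern.match(row)
    return m.group(1) if m else None

required = [second_field(REQUIRED, row) for row in rows]
optional = [second_field(OPTIONAL, row) for row in rows]
print(required, optional)
```

With +, the row with the empty name is rejected outright; with *, it matches and yields an empty string.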
I am looking to create groups, that are separated by 4 spaces
The problem is that if the group contains any space, other than the 4 space separator, there is no match with the regex I have tried so far
This is what I have tried.
Let's say I have these 2 lines, with 4 spaces between the words
word 1    word 2
word1    word2
and the regex is
^([^ {4}]*) {4}([^ {4}]*)$
This matches only the 2nd line. The presence of any space anywhere other than the 4 space separator, will not match the line.
My expectation is to match and have the correct groups identified, in both these lines.
This RegEx might help you divide your input strings into five groups, where the second and fourth groups are the four-space separators:
([a-zA-Z0-9_ ]*)(\s{4})([a-zA-Z0-9_ ]*)(\s{4})([a-zA-Z0-9_ ]*)
If your columns never contain spaces, you could simplify it to this RegEx:
(\w+)(\s{4})(\w+)(\s{4})(\w+)
After some experimentation and based on the good suggestions here, I came up with this RegEx:
^(.*?) {4}(.*?) {4}(.*?)$
On the surface it does what I need. The last line has more 4 space blocks at the end, but that should not happen. Any pitfall that I am not seeing?
Instead of using a non greedy dot star .*? approach, you could specify the characters that you want to match.
If your data contains, for example, only words, you could capture each column with (\w+(?: \w+)*): 1+ word chars \w+, followed by 0+ repetitions of a single space and 1+ word chars. Put exactly 4 spaces ( {4}) between the columns.
Note that if you want to match more than word characters, you could use a character class instead and add the characters that you want to allow.
^(\w+(?: \w+)*) {4}(\w+(?: \w+)*) {4}(\w+(?: \w+)*)$
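A brief Python sketch of how this pattern behaves. The three sample lines are hypothetical three-column rows, the last one with five spaces in its first separator.

```python
import re

COLUMNS = re.compile(r"^(\w+(?: \w+)*) {4}(\w+(?: \w+)*) {4}(\w+(?: \w+)*)$")

# Hypothetical three-column rows; columns are separated by exactly 4 spaces.
lines = [
    "word 1    word 2    word 3",
    "word1    word2    word3",
    "word1     word2    word3",  # 5 spaces in the first separator
]

results = []
for line in lines:
    m = COLUMNS.match(line)
    results.append(m.groups() if m else None)

for line, groups in zip(lines, results):
    print(repr(line), "->", groups)
```

Since each column is spelled out explicitly, a single space inside a column is accepted, but a separator that is not exactly four spaces makes the whole line fail rather than silently shifting the groups.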
I have the below string
abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am
It is a multi-character delimited string, delimited by '--**--'. I am trying to extract the first and second words that have the -oy- tag in them. This is a column in a table. I am using the regex_extract method, but I am not able to extract a string that contains a given string and ends with a given string.
Here is one pattern that I tried: .*(.*oy.*)--
If the -oy- can not be at the start or at the end, you could use this pattern to match the 2 hyphen delimited strings with -oy-:
[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+
Regex details
[a-z0-9]+ Match 1+ times a-z0-9
(?: Non capturing group
-[a-z0-9]+ Match - and 1+ times a-z0-9
)* Close group and repeat 0+ times
-oy Match literally
(?:-[a-z0-9]+)+ Repeat 1+ times a group which will match - and 1+ times a-z0-9
You can extend the character class, for example to [A-Za-z0-9], to allow whatever else you want to match, such as uppercase chars.
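For illustration, a Python sketch that runs the pattern over the string from the question (Python is an assumption here; the question itself uses a regex_extract function):

```python
import re

value = (
    "abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--"
    "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am"
)

# Hyphen-separated lowercase/digit chunks with an "oy" chunk that is
# neither the first nor the last one.
OY_FIELD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+")

fields = OY_FIELD.findall(value)
print(fields)
```

The '--**--' delimiter never ends up inside a match, because neither - followed by - nor * fits the -[a-z0-9]+ chunk shape.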
If the matches should be between delimiters, you could use a positive lookbehind and positive lookahead and an alternation:
(?<=^|--\\*\\*--)[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+(?=--\\*\\*--|$)
You can use this regex which will match string containing -oy- and capture them in group1 and group2.
^.*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*).*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*)
This regex matches two delimiter-separated strings that contain -oy-, using (\w+(?:-\w+)*-oy-\w+(?:-\w+)*) twice to capture each of them.
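A quick Python sketch of reading the two groups (assuming Python's re module; other engines expose the groups in a similar way):

```python
import re

value = (
    "abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--"
    "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am"
)

# The same sub-pattern is used for both capture groups.
FIELD = r"\w+(?:-\w+)*-oy-\w+(?:-\w+)*"
TWO_FIELDS = re.compile(rf"^.*?({FIELD}).*?({FIELD})")

m = TWO_FIELDS.search(value)
first, second = m.groups() if m else (None, None)
print(first, second)
```

Note that this form only matches when the input holds at least two -oy- fields; with a single one, search returns None.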
Are you able to select values from capture groups?
(?:--\*\*--|^)(.*?-oy-.*?)(?:--\*\*--|$)
(?:...) - Non-capturing group; matches the delimiter, the beginning of the line, or the end of the line, but does not create a capture group
.*? - Lazy match, so you only grab the contents of the field
https://regex101.com/r/aUAvcx/1
--- Second stab at this follows ---
This is convoluted. Hopefully you can use Lookahead and Lookbehind. The last problem I had was the final record was being "Greedy" and sucking up the field before it too. So I had to add an exclusion in the capture group for your delimiter.
See if this works for you.
(?<=--\*\*--|^)((?:(?:(?!--\*\*--).)*)-oy-(?:(?:(?!--\*\*--).)*))(?=--\*\*--|$)
https://regex101.com/r/aUAvcx/3
Basically, the (?: groups are non-capturing, so that we do not get too many capture groups to work with.
There are three parts to this:
The lookbehind - Make sure the field is framed by the delimiter (or start of line)
The capture group - Grab the contents of the field, making sure a delimiter isn't sucked up into it
The lookahead - Make sure the field is framed by the delimiter (or end of line)
As far as the capture group goes, I check the left and right side of the -oy- to make sure the delimiter isn't there.
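If you happen to be in Python, note that its re module only accepts fixed-width lookbehinds, so (?<=--\*\*--|^) is rejected there. The sketch below is an adaptation, not the original pattern: it moves the alternation outside the lookbehind and keeps the rest unchanged. The second test string is made up to show that a neighbouring field is not swallowed.

```python
import re

DELIM = r"--\*\*--"
# Any single character that does not start a delimiter.
NOT_DELIM = rf"(?:(?!{DELIM}).)*"

# Python's re needs fixed-width lookbehinds, so "delimiter or start of
# string" is written as an alternation around the lookbehind.
OY_FIELD = re.compile(
    rf"(?:(?<={DELIM})|^)({NOT_DELIM}-oy-{NOT_DELIM})(?={DELIM}|$)"
)

value = (
    "abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--"
    "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am"
)

fields = OY_FIELD.findall(value)
# Made-up input: the field before the delimiter must not be pulled in.
short = OY_FIELD.findall("x--**--a-oy-b")
print(fields, short)
```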