Regex to pull first two fields from a comma separated file

I want to pull the second string in a comma-delimited list where the first value is numeric and the second is alphabetic.
I'm using \d[^,]+(?=,) to pull the numeric value in the first field and just need help with pulling the second value from the "Name" column.
Here's part of a sample file that I'm trying to extract data from:
Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N),Supplier Master Exists(Y/N),Supplier Master Created,ACH Account Exists(Y/N),ACH Account Created,ACH Same as Auto-deposit(Y/N)
//line break here is for clarity and does not exist in file//
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N

You could make use of a capturing group if you want to match the second string by first matching 1+ digits and a comma.
Then capture in a group matching 1+ chars a-zA-Z and match the trailing comma.
^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),
^ Start of string
\d+, Match 1+ digits and a comma (Or use (\d+), if the digits should also be a group)
( Capture group 1
[a-zA-Z]+ Match 1+ chars a-zA-Z
(?: [a-zA-Z]+)* Repeat matching the same as previous preceded by a space
), Close capturing group and match trailing comma
Regex demo
To get a bit broader match you could use this pattern to match at least a single char a-zA-Z
\d+,([a-zA-Z ]*[a-zA-Z][a-zA-Z ]*),
Regex demo
Note that \d[^,]+ in your pattern does not match digits only: it matches one digit followed by one or more characters of any kind except a comma, so it would also match, for example, 4a$ .
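As a rough illustration of the capture-group approach, here is a minimal Python sketch run against the sample rows from the question. The names SAMPLE, ROW and first_two_fields are made up for this example, the header row is shortened, and the (\d+) variant is used so the numeric field is captured as well.

```python
import re

# Sample rows copied from the question; the header row is shortened here.
SAMPLE = """Address Number,Name,Employee Master Exist(Y/N)
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
"""

# re.MULTILINE lets ^ match at the start of every line, not only the string.
ROW = re.compile(r"^(\d+),([a-zA-Z]+(?: [a-zA-Z]+)*),", re.MULTILINE)


def first_two_fields(text):
    """Return a list of (number, name) tuples, one per matching line."""
    return ROW.findall(text)


print(first_two_fields(SAMPLE))
```

The header line is skipped because it does not start with a digit.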

You could try this regex:
^\d+,([^,]+),
This will look for lines:
starting with one or more digits
followed by a comma
capture anything that is not a comma
followed by a comma
See it at Regex 101
If not all lines contain a name, then change the + to a *:
^\d+,([^,]*),
See alternative regex
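To see the difference between the + and * variants, here is a small Python sketch. The row with an empty name (777,,Y) is invented for the illustration and is not part of the original file.

```python
import re

# The second row has an empty Name field; it is a made-up example.
DATA = "4398,Presley Elvis Aaron,Y\n777,,Y\n"

NAME_REQUIRED = re.compile(r"^\d+,([^,]+),", re.MULTILINE)
NAME_OPTIONAL = re.compile(r"^\d+,([^,]*),", re.MULTILINE)

# With + the row without a name is skipped entirely.
required = NAME_REQUIRED.findall(DATA)
# With * the row still matches and yields an empty string.
optional = NAME_OPTIONAL.findall(DATA)

print(required)
print(optional)
```

Keep in mind that [^,] can also match a newline, so on input where a line has fewer commas than expected the capture could run on into the next line.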

Related

How to make optional capturing groups be matched first

For example I want to match three values, required text, optional times and id, and the format of id is [id=100000], how can I match data correctly when text contains spaces.
my reg: (?<text>[\s\S]+) (?<times>\d+)? (\[id=(?<id>\d+)])?
example source text: hello world 1 [id=10000]
In this example, the entire source text is matched by the text group.

The problem with your pattern is that [\s\S]+ matches any whitespace or non-whitespace character one or more times, so it captures everything and leaves nothing for the other capture groups. With a little help from a positive lookahead and alternation (|), we can make the last two capture groups optional.
The final pattern (?<text>[a-zA-Z ]+)(?=$|(?<times>\d+)? \[id=(?<id>\d+)])
Group text will match any letter and spaces.
The lookahead avoids consuming characters; it requires that either the string ends there, or that an optional number followed by [id=number] comes next.
See the regex101 demo for further explanation and some examples.
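A minimal Python sketch of the lookahead pattern, assuming Python's re module, where named groups are spelled (?P<name>...). The helper name parse is hypothetical. Because the text group may keep a trailing space when times is present, the sketch strips it.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(
    r"(?P<text>[a-zA-Z ]+)(?=$|(?P<times>\d+)? \[id=(?P<id>\d+)])"
)


def parse(line):
    """Return (text, times, id); times and id are None when absent."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Groups captured inside the lookahead remain available on the match.
    return m.group("text").strip(), m.group("times"), m.group("id")


print(parse("hello world 1 [id=10000]"))
print(parse("hello world [id=5]"))
print(parse("hello world"))
```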

You could use:
:\s*(?<text>[^][:]+?)\s*(?<times>\d+)? \[id=(?<id>\d+)]
Explanation
: Match literally
\s* Match optional whitespace chars
(?<text> Group text
[^][:]+? match 1+ occurrences of any char except [ ] :
) Close group text
\s* Match optional whitespace chars
(?<times>\d+)? Optional group times, match 1+ digits
\[id= Match [id=
(?<id>\d+) Group id, match 1+ digits
] Match literally
Regex demo
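For reference, a Python sketch of this pattern. It assumes the input line includes the "example source text:" prefix, as the leading colon in the pattern implies. Named groups use Python's (?P<name>...) spelling, and the brackets inside the character class are escaped for readability; both are adaptations for Python's re module.

```python
import re

# [^\]\[:] is the same class as [^][:] with the brackets escaped.
PATTERN = re.compile(
    r":\s*(?P<text>[^\]\[:]+?)\s*(?P<times>\d+)? \[id=(?P<id>\d+)]"
)

m = PATTERN.search("example source text: hello world 1 [id=10000]")
groups = m.groupdict() if m else None
print(groups)
```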

Regular expression matching and remove spaces

Please how can I get the address using regex:
Address 123 Mayor Street, LAG Branch ABC
I used (?<=Address(\s))(.*(?=\s)) but it includes the spaces after "Address". I'm trying to get an expression that extracts the address without those spaces. (There are a couple of spaces after "Address" before "123")
Thanks!

The pattern (?<=Address(\s))(.*(?=\s)) that you tried asserts Address followed by a single whitespace char to the left, and then matches the rest of the line asserting a whitespace char to the right.
For the example data, that will match right before the last whitespace char in the string, and the match will also contain all the whitespace chars that are present right after Address
One option to match the bold parts in the question is to use a capture group.
\bAddress\s+([^,]+,\s*\S+)
The pattern matches:
\bAddress\s+ Match Address followed by 1+ whitespace chars
( Capture group 1
[^,]+, Match 1+ occurrences of any char except , and then match ,
\s*\S+ Match optional whitespace chars followed by 1+ non whitespace chars
) Close group 1
.NET regex demo (Click on the Table tab to see the value for group 1)
Note that \s and [^,] can also match a newline
A variant with a positive lookbehind to get a match only:
(?<=\bAddress\s+)[^,\s][^,]+,\s*\S+
.NET Regex demo
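A short Python sketch of the capture-group version; the function name extract_address is made up for this example. The lookbehind variant is not shown because Python's re module only accepts fixed-width lookbehind, so (?<=\bAddress\s+) would need an engine such as .NET.

```python
import re

ADDRESS = re.compile(r"\bAddress\s+([^,]+,\s*\S+)")


def extract_address(line):
    """Return capture group 1, or None when the line does not match."""
    m = ADDRESS.search(line)
    return m.group(1) if m else None


# Several spaces after "Address", as described in the question.
print(extract_address("Address   123 Mayor Street, LAG Branch ABC"))
```

Since \s+ consumes all whitespace after Address, group 1 starts directly at the house number.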

Regex for checking address house number

I'm using the following expression to validate a house number:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
Now the requirement has changed to the following constraints:
one number (25)
one number w/ one letter (25A)
one number w/ a second one divided by a hyphen (25-32)
one number w/ a second one divided by a hyphen and one letter w/ blank (25-32 A)
How do I validate these w/ changes to the regex above?

If you only want to match those values, you might use a pattern to match 1 or more digits followed by an optional part that matches either A-Z OR a hyphen and 1+ digits optionally followed by a space and a char A-Z
^\d+(?:[A-Z]|-\d+(?: [A-Z])?)?$
^ Start of string
\d+ Match 1+ digits
(?: Non capture group
[A-Z] Match a char A-Z
| Or
-\d+ Match - and 1+ digits
(?: [A-Z])? Optionally match a space and a char A-Z
)? Close group and make it optional
$ End of string
Regex demo
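A quick way to check the pattern against the four required formats is a small Python sketch like the following. The function name is_valid_house_number and the rejected samples are illustrative assumptions.

```python
import re

HOUSE_NUMBER = re.compile(r"^\d+(?:[A-Z]|-\d+(?: [A-Z])?)?$")


def is_valid_house_number(value):
    # fullmatch also rejects a trailing newline, which $ alone would tolerate.
    return HOUSE_NUMBER.fullmatch(value) is not None


for candidate in ["25", "25A", "25-32", "25-32 A", "25 A", "25-32A"]:
    print(candidate, is_valid_house_number(candidate))
```

The first four candidates are the formats listed in the question; the last two are rejected because the letter after a single number must follow directly, and the letter after a range must be preceded by a blank.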

Regex groups for dash delimited filename in URL

I have a URL that is structured like so: <domain>/<subdirectory>/<filename>-<semantic_version>-<hash>.<filetype>
For example, it could look like: https://cdn.example.com/sample_files/some_file-1.2.3-56857cfc709d3996f057252c16ec4656f5292802.css
So far I have the following regex which gives me the entire filename. However, I'd like to individually get the filename, semantic_version, and hash as defined above. You can assume that the filename will not have dashes in the name.
([^/\\&\?]+)$(?<=(?:.js))

You could match the protocol and then until the last forward slash.
After that, capture 1+ word chars in group 1 for the file name, a repeating part in group 2 to capture digits divided by dots and in the third group a character class which would match all the characters in the hash.
^http\S+\/(\w+)-(\d+(?:\.\d+)+)-([0-9a-f]+)\.\w+$
Explanation
^ Start of string
http\S+\/ Match the protocol followed by 1+ non whitespace chars, then backtrack till the last /
(\w+)- Capture group 1, match 1+ word chars followed by -
(\d+(?:\.\d+)+)- Capture group 2, match digits divided by dots followed by -
([0-9a-f]+)\.\w+ Capture group 3, match 1+ times the chars from the hash followed by . and 1+ word chars
$ End of string
Regex demo
If the hash always has 40 characters, you could match [0-9a-f]{40} instead of [0-9a-f]+ to be a bit more precise.
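Applied to the example URL from the question, the three groups could be read out in Python roughly like this; split_asset_url is a hypothetical helper name.

```python
import re

URL = re.compile(r"^http\S+\/(\w+)-(\d+(?:\.\d+)+)-([0-9a-f]+)\.\w+$")


def split_asset_url(url):
    """Return (filename, semantic_version, hash), or None if no match."""
    m = URL.match(url)
    return m.groups() if m else None


example = (
    "https://cdn.example.com/sample_files/"
    "some_file-1.2.3-56857cfc709d3996f057252c16ec4656f5292802.css"
)
print(split_asset_url(example))
```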

Use multiple capture groups that don't match - characters.
([^-/\\&\?]+)-([^-/\\&\?]+)-([^-/\\&\?]+)\.[a-z]+$(?<=(?:.js))

Regex match an optional number of digits

I have a list that could look sort of like
("!Goal 27' Edward Nketiah"),
("!Goal 33' 46' Pierre Emerick-Aubameyang"),
("!Sub Nicolas Pepe"),
("Jordan Pickford"),
and I'm looking to match either !Sub or !Goal 33' 46' or !Goal 27'
Right now I'm using the regex (!\w+\s) which will match !Goal and !Sub, but I want to be able to get the timestamps too. Is there an easy way to do that? There is no limit on the number of timestamps there could be.

As I mentioned in my comment, you can use the following regex to accomplish this:
(!\w+(?:\s\d+')*)
Explanation:
(!\w+(?:\s\d+')*) capture the following
! matches this character literally
\w+ matches one or more word characters
(?:\s\d+')* match the following non-capture group zero or more times
\s match a whitespace character
\d+ matches one or more digits
' match this character literally
Additionally, the first capture group isn't necessary - you can remove it to simply match:
!\w+(?:\s\d+')*
If you need each timestamp, you can use !\w+((?:\s\d+')*) and split capture group 1 on the space character. (With !\w+(\s\d+')* most engines keep only the last repetition in group 1.)
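A small Python sketch using the rows from the question. In Python's re module a repeated capture group keeps only its last iteration, so this sketch wraps the whole repetition in one group and splits it afterwards. The names ROWS, EVENT and parse_event are made up for the example.

```python
import re

ROWS = [
    "!Goal 27' Edward Nketiah",
    "!Goal 33' 46' Pierre Emerick-Aubameyang",
    "!Sub Nicolas Pepe",
    "Jordan Pickford",
]

# Group 1 is the event tag, group 2 holds all timestamps as one string.
EVENT = re.compile(r"(!\w+)((?:\s\d+')*)")


def parse_event(row):
    """Return (full match, list of timestamps), or None for plain names."""
    m = EVENT.match(row)
    if m is None:
        return None
    return m.group(0), m.group(2).split()


for row in ROWS:
    print(parse_event(row))
```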

If your input always follows the format "bang text blank digits apostrophe blank digits apostrophe etc", then it should be as simple as:
!\w+(?:\s\d+')*
Explanation:
! matches an exclamation mark
\w+ matches 1 or more word-characters (letters, digits, underscores)
(?:…) is a non-capturing group
\s matches a single whitespace character
\d+ matches one or more digits
' matches the apostrophe character
* repeatedly matches the group 0 or more times

This pattern:
(!\w+(?:\s\d+')*)
will capture:
"!Goal 27'"
"!Goal 33' 46'"
"!Sub"
