Need regex help for matching names - regex

Let's say I have these three names
John Doe (p45643)
Le'anne Frank
Molly-Mae Edwards
I want to match
1) John Doe
2) Le'anne Frank
3) Molly-Mae Edwards
The regex I have tried is
(^[a-zA-Z-'^\d]$)+
but it isn't working as I am expecting.
I would like help creating a regex pattern that:
Matches a name from start to finish and cannot contain a number. The only permitted characters in each "name" are [a-zA-Z'-], so a name such as J0hn should not match.

If I understood your question correctly, then you have a minor error in your regex:
(^[a-zA-Z-'^\d]$)+
^-------^------Here
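The - marked above should be escaped or moved to the end of the character class, since otherwise it can act as a range operator. The + outside the parentheses repeats the whole anchored group rather than the character class, so the pattern can only ever match a single character.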
You can use this regex instead (following your previous pattern):
(^[a-zA-Z'^\d -]+$)
Regex demo
Update: for your comment. If you want to match separately, then you can use:
(\b[a-zA-Z'^\d-]+\b)
Regex demo
And if you only want to match string (not numbers), then you can use:
(\b[a-zA-Z'-]+\b)
Regex demo
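As a quick check of that last pattern, here is a minimal sketch in Python. The use of Python's re module is an assumption (the question does not name a regex flavor), and the helper name find_name_parts is made up for illustration.

```python
import re

# Word-boundary pattern from the answer above: letters, apostrophes and hyphens only.
NAME_PART = re.compile(r"\b[a-zA-Z'-]+\b")

def find_name_parts(text):
    """Return every run of permitted characters that sits between word boundaries."""
    return NAME_PART.findall(text)

for sample in ["John Doe (p45643)", "Le'anne Frank", "Molly-Mae Edwards", "J0hn"]:
    print(sample, "->", find_name_parts(sample))
```

Tokens that mix letters and digits, such as p45643 or J0hn, produce no match because \b cannot occur between a letter and a digit.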

You are using the anchors incorrectly. Based on the modifier it can match the whole string or a single line.
Try
/^[a-zA-Z'-]+$/
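To illustrate how the modifier changes what the anchors mean, here is a small sketch assuming Python's re module, where the multiline modifier is spelled re.MULTILINE. The sample text is invented for the example, and the surrounding slashes of the pattern above are delimiters that Python does not use.

```python
import re

LINE_NAME = r"^[a-zA-Z'-]+$"
text = "Le'anne\nJ0hn\nMolly-Mae"

# Without a flag, ^ matches only at the very start and $ only at the very end
# (or just before a final newline), so this three-line string does not match.
whole_string = re.search(LINE_NAME, text)

# With re.MULTILINE, ^ and $ also match at the start and end of every line.
per_line = re.findall(LINE_NAME, text, flags=re.MULTILINE)

print(whole_string)
print(per_line)
```

Note that this pattern has no space in its character class, so it validates one name part per line rather than a full "first last" name.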

Thanks to @Djory Krache.
The pattern I was looking for was
(\b[a-zA-Z'-]+\b)

Related

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches contain what I can describe as positioners. These are shown as a question mark followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which flavour of regex it is. They believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the patterns below have worked so far. I tested each one in the live system (rather than only in a regex tester, in case it behaved differently), but they still didn't work.
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
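The two operations can be sketched as follows, assuming the tool really does behave like Python's re module and that a replace step is available (the question suggests it may not be). The function name find_codes is hypothetical.

```python
import re

POSITIONER = re.compile(r"\?[0-9]{2}")   # a question mark followed by two digits
TARGET = re.compile(r"T[0-9]{7}")        # T followed by seven digits

def find_codes(text):
    """Strip the positioners first, then collect every T-code from what remains."""
    cleaned = POSITIONER.sub("", text)
    return TARGET.findall(cleaned)

print(find_codes("T123?214567\nT?211234567"))
```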
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
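A minimal sketch of the concatenation step, assuming Python and at most one positioner per line. The helper join_around_positioner is a made-up name, not part of any library.

```python
import re

# Group 1 is everything from T up to the positioner, group 2 is everything after it.
SPLIT_CODE = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_around_positioner(line):
    """Concatenate the text before and after a single ?NN positioner."""
    m = SPLIT_CODE.search(line)
    if m is None:
        return None  # no positioner found on this line
    return m.group(1) + m.group(2)

for line in ["T123?214567", "T?211234567", "T1234567"]:
    print(line, "->", join_around_positioner(line))
```

Lines without a positioner return None here, so they would still need the plain T followed by seven digits match.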
First, match the ?21 and replace it with a distinctive character such as #:
\?21
Demo
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
Demo, in which the target string is captured in group 1 (or \1).
Finally, replace # with ?21 or something you like.
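A Python script might look like this:
import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
# Step 1: pattern for the positioner that gets swapped for '#'
rexpre = re.compile(r'\?21')
# Step 2: T plus 7 digits, or T plus 8 characters drawn from digits and '#'
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Step 3: restore the positioner in each captured string
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))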
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group 1 followed by group 2: \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
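Here is a rough sketch of that replacement, assuming Python's re.sub is available. It is not the original linked example, and the name drop_positioner is made up for illustration.

```python
import re

# Group 1: T plus leading digits. Optional ?NN positioner. Group 2: remaining digits.
CODE = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

def drop_positioner(text):
    """Rewrite each match as group 1 followed by group 2, which omits the positioner."""
    return CODE.sub(r"\1\2", text)

for line in ["T0000001", "T123?214567", "T?211234567"]:
    print(line, "->", drop_positioner(line))
```

When no positioner is present, the pattern backtracks so that group 2 holds the last digit, and the replacement reproduces the input unchanged.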
Example Python
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2
Usage Example:

Non-greedy matching

In Geany, I want to match the titles of books. One example:
Michael Lewis, Liar's Poker, Hodder & Stoughton Ltd, London, 1989
I try to do so with this regex code:
,\s.*?,
This regex matches too much. It matches [, Liar's Poker,] and [, London,].
I want to have a regex that only matches the title.
I think you need this regex without the global modifier. If you set the global modifier (g), it will return further matches, as you have experienced.
,\s*([^,]+)
Demo
Since you want to ignore further matches, you may try this too:
^.*?,\s*([^,]+).*$
You will get Liar's Poker in group 1
Demo 2
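Both patterns can be tried outside Geany with a short Python sketch. Python is an assumption here: re.search and re.match return only the first match, which plays the role of "no global modifier".

```python
import re

line = "Michael Lewis, Liar's Poker, Hodder & Stoughton Ltd, London, 1989"

# First pattern: stop at the first comma-delimited field after the author.
first_field = re.search(r",\s*([^,]+)", line)

# Second pattern: consume the whole line so that no further matches are possible.
whole_line = re.match(r"^.*?,\s*([^,]+).*$", line)

title_a = first_field.group(1)
title_b = whole_line.group(1)
print(title_a)
print(title_b)
```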
/(, \w+[']?\w? \w+,)/g
This regex will give you:
[", Liar's Poker,"]
You will have to do additional processing to remove the leading and trailing commas. Try it out and see if it works for you.

regex to select only the zipcode

,Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,
I want to select only 13732 from this line. I came up with this regex:
(\d)(\s*\d+)*(\,y,,)
But it also selects the ,y,, part. If I remove that part from the regex, it also matches the date. Please help me with this.
Generally, if you want to require some text without including it in the match, use a zero-length lookaround (lookahead or lookbehind). In your case, you can use a lookahead:
(\d)(\s*\d+)*(?=\,y,,)
The syntax (?=<stuff>) means "followed by <stuff>, without matching it".
More information on lookarounds can be found in this tutorial.
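The difference between the two patterns can be seen in a short Python sketch (Python is an assumption, since the question does not say which engine is used). The variable names are illustrative only.

```python
import re

line = ",Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,"

# Original pattern: the trailing ,y,, is consumed and becomes part of the match.
with_suffix = re.search(r"(\d)(\s*\d+)*(\,y,,)", line).group(0)

# Lookahead version: ,y,, must follow the digits but is not part of the match.
lookahead = re.search(r"(\d)(\s*\d+)*(?=\,y,,)", line).group(0)

print(with_suffix)
print(lookahead)
```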
Regex: \D*(\d{5})\D*
Explanation: match 5 digits surrounded by zero or more non-digits on both sides. Then you can extract the group containing the match.
Here's the code in Python:
import re
string = ",Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,"
# Raw string so the backslashes reach the regex engine unchanged
search = re.search(r"\D*(\d{5})\D*", string)
print(search.group(1))
Output:
13732

How can I match the last two words in a sentence in PostgreSQL?

I have been trying for a while to match the last word of a sentence:
select regexp_matches('My name is Harry Potter', '[^ ]+$');
returned {Potter}
to try to match the last two words:
select regexp_matches('My name is Harry Potter', '[^ ]\s+[^ ]+$');
failed.
select regexp_matches('My name is Harry Potter', '(.*?)\s+(.*?)$');
Did not work as intended either.
Any insights?
Instead of using REGEXP_MATCHES which returns an array of matches, you may be better off using SUBSTRING which will give you the match as TEXT directly.
Using the correct pattern, as @Abelisto shared, you can do this:
SELECT SUBSTRING('My name is Harry Potter' FROM '\w+\W+\w+$')
This returns Harry Potter as opposed to {"Harry Potter"}
Per @Hambone's comment, if either of the words at the end contains punctuation, like an apostrophe, you would want to consider using the following pattern:
SELECT SUBSTRING('My name is Danny O''neal' FROM '\S+\s+\S+$')
The above would correctly return Danny O'neal as opposed to just O'neal
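The difference between the two patterns can be reproduced outside the database. This is a sketch in Python, which is an assumption: Python's re module is not PostgreSQL's regex engine, but \w, \W, \S and \s behave the same way on this ASCII input.

```python
import re

sentence = "My name is Danny O'neal"

# \w excludes the apostrophe, so the match can only start at the O.
word_based = re.search(r"\w+\W+\w+$", sentence).group(0)

# \S accepts any non-whitespace character, so both full words are kept.
space_based = re.search(r"\S+\s+\S+$", sentence).group(0)

print(word_based)
print(space_based)
```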
You should use double escaping in the pattern since it seems the standard_conforming_strings parameter of your PostgreSQL instance is turned off. See PostgreSQL 9.5.3 Documentation:
standard_conforming_strings (boolean)
This controls whether ordinary string literals ('...') treat backslashes literally, as specified in the SQL standard. Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off).
Thus, you need to use
'[^ ]+\\s+[^ ]+$'
^^
or
'\\S+\\s+\\S+$'
Here,
[^ ]+ - 1 or more characters other than a space (any non-whitespace if \\S is used)
\\s+ - 1 or more whitespaces
[^ ]+ - 1 or more characters other than a space (any non-whitespace if \\S is used)
$ - end of string anchor.
Don't know how the regex works for postgres, but
online regex testers tell me that .*\s(.+)\s+(.*?)$ might do the trick.
I'm not 100% clear on what you're trying to do, but this regex matches the last two words of a sentence, and it's similar to your initial regex: "[^ ]+\s+[^ ]+$" (I just added a '+'.)
For further testing, I suggest going to https://regex101.com/ It's one of the best online regex helpers I've found, and it even breaks down the regex for you. (I'm not involved with the site in any way - it's a recommendation, not a plug)

Find first point with regex

I want a regex that returns only the characters before the first dot.
Example:
T420_02.DOMAIN.LOCAL
I want only T420_02
Please help me.
You can use the following regex: ^(.*?)(?=\.)
The captured group contains what you need (T420_02 in your example).
This simple expression should do what you need, assuming you want to match it at the beginning of the string:
^(.+?)\.
The capture group contains the string before (but not including) the first dot.
Here's a fiddle: http://www.rexfiddle.net/s8l0bn3
Use regex pattern ^[^.]+(?=[.])
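For comparison, here is a small Python sketch running the three suggested patterns against the sample value. Python is an assumption, as the question does not name a language, and the variable names are made up.

```python
import re

host = "T420_02.DOMAIN.LOCAL"

# Lazy capture followed by a lookahead for the dot: result is in group 1.
via_lazy_lookahead = re.match(r"^(.*?)(?=\.)", host).group(1)

# Lazy capture followed by a consumed dot: result is in group 1.
via_lazy_capture = re.match(r"^(.+?)\.", host).group(1)

# Negated character class plus lookahead: the whole match is the result.
via_negated_class = re.match(r"^[^.]+(?=[.])", host).group(0)

print(via_lazy_lookahead, via_lazy_capture, via_negated_class)
```

All three return the same prefix for this input. The first two need the capture group, while the last one yields the prefix as the full match.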