Retrieve the words after the last numeric occurrence using regex

I receive the street name and door number (streetname + doorno) in a single string variable, and I have to split them. My current regex is /[0-9].*$/. This works fine for normal addresses, but some addresses have a street name that also contains a number. In that case, the numeric part of the street name is treated as part of the door number too.
For example:
[Correct] Street = Example Street 15B returns doorno = 15B
[Correct] Street = Example Street 15 B returns doorno = 15 B
[Correct] Street = Example Street returns doorno = null
[Correct] Street = Example Street15 returns doorno = 15
[Incorrect] Street = Example Street 158 7 returns doorno = 158 7, but I expect streetname = Example Street 158 and doorno = 7
[Incorrect] Street = Example Street 158 7 B returns doorno = 158 7 B, but I expect streetname = Example Street 158 and doorno = 7 B
[Incorrect] Street = Example Street 158 7B returns doorno = 158 7B, but I expect streetname = Example Street 158 and doorno = 7B
[Incorrect] Street = Example Street158 7 B returns doorno = 158 7 B, but I expect streetname = Example Street158 and doorno = 7 B
Can someone help me fix the regex for the incorrect cases above?

You may use
/^(.*\D)(\d.*)$/
It matches:
^ - start of a string
(.*\D) - Group 1: as many characters as possible (other than line break characters), ending with the last non-digit that is followed by the rest of the pattern (i.e. \d.*$)
(\d.*) - Group 2: a digit and then any 0+ chars (other than line break chars)
$ - end of string.
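A quick way to check the pattern against every case from the question is a short Python sketch (Python's re module is assumed here; the pattern is unchanged, and split_address is a hypothetical helper name). Group 1 keeps the space before the door number, so the sketch strips it from the street name:

```python
import re

# The answer's pattern: Group 1 runs up to (and includes) the last non-digit
# that is immediately followed by a digit; Group 2 is that digit plus the rest.
ADDRESS_RE = re.compile(r"^(.*\D)(\d.*)$")


def split_address(address):
    """Return (street, door_no); door_no is None when no digit follows a non-digit."""
    m = ADDRESS_RE.match(address)
    if m is None:
        return address, None
    # Group 1 still ends with the separating space (if any), so strip it.
    return m.group(1).rstrip(), m.group(2)


for s in [
    "Example Street 15B",
    "Example Street 15 B",
    "Example Street",
    "Example Street15",
    "Example Street 158 7",
    "Example Street 158 7 B",
    "Example Street 158 7B",
    "Example Street158 7 B",
]:
    print(f"{s!r} -> {split_address(s)}")
```

Because .* is greedy, the engine backtracks from the end of the string, so the split lands on the last non-digit/digit boundary rather than the first one.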

Related

Regex extract string based on String match

I have data with some messy addresses that sometimes contain a province, district, and ward, not always in the same order:
Name ADDRESS
Store1 453, Duy Tan, Phuong Nguyen Nghiem, Thanh pho Quang Ngai
Store2 13 DUNG SY THANH KHE, P. THANH KHE TAY
Store3 98 Phan Xich Long- P. 2
Store4 306 B4, NGUYENVAN LINH, Ward - 5
Store5 22, Ngo 421/16, Tran Duy Hung, To 42, Phuong Trung Hoa, Quan Cau Giay
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Replace each \ with \\ so that C# doesn't treat \ as an escape character
    // Pattern: start of string, one or more digits, then optionally one letter at the end of a word
    string sPattern = "^[0-9]+([A-Za-z]\\b)?";
    string sString = Row.ADDRESS ?? ""; // Coalesce to empty string if NULL
    // Look for the first match of the pattern in the string
    Match match = Regex.Match(sString, sPattern, RegexOptions.IgnoreCase);
    if (match.Success)
    {
        // Copy the whole match (the civic number) into the ward column
        Row.ward = match.Groups[0].Value;
    }
    else
    {
        // No match found: leave the ward column blank
        Row.ward = "";
    }
}
}
I would like to modify my regex to return data like this in the ward column (note that my addresses use several synonyms for ward: Phuong, P., Ward, etc.):
Name ADDRESS ward
Store1 453, Duy Tan, Phuong Nguyen Nghiem, Quang Ngai Phuong Nguyen Nghiem
Store2 13 DUNG SY THANH KHE, P. THANH KHE TAY Phuong THANH KHE TAY
Store3 98 Phan Xich Long- P. 2 Phuong 2
Store4 306 B4, NGUYENVAN LINH, Ward - 5 Phuong 5
Store5 22, Ngo 421/16, Tran Duy Hung, To 42, Phuong Trung Hoa, Quan Cau Giay Phuong Trung Hoa
I currently use that regex to extract the civic number. Is there a way to modify it so that it returns the data in my ward column as in the example above?
The groups in the regex below, as tested on https://regex101.com/, match the data in your ward column from your example. You may need to define more precisely where each pattern can appear, since this regex only matches them as they appear in your sample data, but it may be enough for you to extrapolate the regex you really need.
(Phuong.*),|P\.(.*$)|Ward - (.*$)
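The group in option 1 matches from Phuong (inclusive) up to the last comma that follows it, because .* is greedy. Your sample addresses have only one such comma; Phuong[^,]* would stop at the first comma instead.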
The group in option 2 matches anything that comes after P. until the end of the string.
The group in option 3 matches anything that comes after Ward - until the end of the string.
This one is a bit more advanced, but it matches only the values from your examples, without using groups:
Phuong.*(?=,)|(?<=P\.).*$|(?<=Ward - ).*$
Test it in https://regex101.com to see how it works and to see what each part means.
Finally, you may want to exclude Phuong from the match in option 1 so that your program can always print Phuong followed by the match.
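A rough Python sketch of that last suggestion, assuming the five sample addresses from the question (extract_ward is a hypothetical helper; the constructs used here are also supported by .NET's Regex class). Phuong is left outside the group, and [^,]* replaces the answer's .* so that option 1 stops at the first comma:

```python
import re

# Variation of the answer's first regex: "Phuong" stays outside the group
# and [^,]* stops option 1 at the first comma after it.
WARD_RE = re.compile(r"Phuong([^,]*),|P\.(.*$)|Ward - (.*$)")


def extract_ward(address):
    """Return 'Phuong <ward>' for the leftmost option that matches, else ''."""
    m = WARD_RE.search(address)
    if m is None:
        return ""
    # Exactly one of the three groups takes part in a successful match.
    ward = next(g for g in m.groups() if g is not None)
    return "Phuong " + ward.strip()


addresses = [
    "453, Duy Tan, Phuong Nguyen Nghiem, Thanh pho Quang Ngai",
    "13 DUNG SY THANH KHE, P. THANH KHE TAY",
    "98 Phan Xich Long- P. 2",
    "306 B4, NGUYENVAN LINH, Ward - 5",
    "22, Ngo 421/16, Tran Duy Hung, To 42, Phuong Trung Hoa, Quan Cau Giay",
]
for a in addresses:
    print(extract_ward(a))
```

The strip() call removes the space that follows "P." in options 2 and 3, so every result has the same "Phuong X" shape as the expected ward column.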

dart regex remove space phone

I tried all the regex solutions from REGEX Remove Space, but none of them matched.
I work with Dart and Flutter, and I am trying to capture only the digits from strings like these:
case 1
aaaaaaaaa 06 12 34 56 78 aaaaaa
case 2
aaaaaaaa 0612345678 aaaaaa
case 3
aaaaaa +336 12 34 56 78 aaaaa
I want to get only 0612345678, with no spaces and no +33: just the 10 digits. In the +33 case, I need to replace +33 with 0.
Currently I have the pattern \D*(\d+)\D*?, which works only for case 2.
You may match and capture an optional +33, then a digit followed by spaces or digits, and then check whether Group 1 matched and build the result accordingly.
Here is an example solution (tested):
void main() {
  var strs = ['aaaaaaaaa 06 12 34 56 78 aaaaaa', 'aaaaaaaa 0612345678 aaaaaa', 'aaaaaa +336 12 34 56 78 aaaaa', 'more +33 6 12 34 56 78'];
  // Compile the pattern once, outside the loop
  var rx = RegExp(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)");
  for (var s in strs) {
    var match = rx.firstMatch(s);
    if (match != null) {
      // Group 2 always participates in a successful match, hence the `!`;
      // remove the spaces it captured
      var digits = match.group(2)!.replaceAll(" ", "");
      // If +33 was captured in Group 1, replace it with a leading 0
      var result = match.group(1) != null ? "0" + digits : digits;
      print(result);
    }
  }
}
Prints 0612345678 four times, once for each input string.
The pattern is
(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)
(?:^|\D) - start of string or any char other than a digit
(\+33)? - Group 1: an optional +33 (captured once or not at all)
\s* - zero or more whitespace characters
(\d[\d ]*) - Group 2: a digit followed by any number of spaces and/or digits
(?!\d) - no digit immediately to the right is allowed.
Spaces are removed from Group 2 afterwards with replaceAll(" ", ""), since a single match operation cannot return a discontinuous string.
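To check the pattern outside Dart, here is a rough Python port of the same logic (Python's re module accepts this pattern unchanged; normalize_phone is a hypothetical name). Unlike the Dart loop, it returns None when nothing matches instead of skipping the print:

```python
import re

# Same pattern as the Dart answer.
PHONE_RE = re.compile(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)")


def normalize_phone(text):
    """Return the first phone number as plain digits, or None if nothing matches."""
    m = PHONE_RE.search(text)
    if m is None:
        return None
    # A single match cannot skip the inner spaces, so remove them afterwards.
    digits = m.group(2).replace(" ", "")
    # If +33 was captured, replace it with a leading 0.
    return "0" + digits if m.group(1) is not None else digits


for s in [
    "aaaaaaaaa 06 12 34 56 78 aaaaaa",
    "aaaaaaaa 0612345678 aaaaaa",
    "aaaaaa +336 12 34 56 78 aaaaa",
    "more +33 6 12 34 56 78",
]:
    print(normalize_phone(s))
```

Note that [\d ]* also swallows the trailing space before the next word; the replace call removes it together with the inner spaces.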

Excel Macro Unable to Separate String Address

Software: MS Excel 2016
Update 1
Please note that there can be any number of digits before West, e.g.:
123124234234West18th Street
2West 14th Avenue
12324West
Please assist with a general solution.
Original Question
There is an address, 31West 52nd Street, and I am trying to split the 31 from West so that the output will be:
31 West 52nd Street
I tried this macro statement, but it doesn't work. Please advise:
Selection.Replace What:="?#West ", Replacement:=" West " _
, LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False, SearchFormat _
:=False, ReplaceFormat:=False
Here is sample code that checks the leading characters of the string; if they are digits, it separates them from the rest with a space:
Option Explicit
Public Sub TestMe()
    Debug.Print fnStrStripMyNumber("31West 52nd Street")
    Debug.Print fnStrStripMyNumber("123Vityata Shampion")
End Sub
Public Function fnStrStripMyNumber(strStr As String) As String
    Dim lngCountDigits As Long
    Dim lngCounter As Long
    strStr = Trim(strStr)
    For lngCounter = 1 To Len(strStr)
        If IsNumeric(Mid(strStr, lngCounter, 1)) Then
            lngCountDigits = lngCountDigits + 1
        Else
            Exit For
        End If
    Next lngCounter
    strStr = Left(strStr, lngCountDigits) & " " & Right(strStr, Len(strStr) - lngCountDigits)
    fnStrStripMyNumber = Trim(strStr)
End Function
Thus, from input:
"31West 52nd Street"
"123Vityata Shampion"
We get output:
31 West 52nd Street
123 Vityata Shampion
You can also try this Excel formula:
=LEFT(A1,FIND("West",A1)-1)&" "&RIGHT(A1,LEN(A1)-FIND("West",A1)+1)
Or, if you want a macro only:
Sub rep()
Range("B1") = Replace(Range("A1"), "West", " West")
End Sub
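The loop in fnStrStripMyNumber (count the leading digits, then insert a space) can also be expressed as a single regex substitution, which covers the update's inputs with any number of digits and does not depend on the word West. A sketch in Python, assuming the re module; split_leading_number is a hypothetical name:

```python
import re

# Insert one space between a run of leading digits and the character after it,
# but only when that character is neither a digit nor whitespace.
LEADING_NUMBER_RE = re.compile(r"^(\d+)(?=[^\d\s])")


def split_leading_number(address):
    return LEADING_NUMBER_RE.sub(r"\1 ", address.strip())


for s in [
    "31West 52nd Street",
    "123124234234West18th Street",
    "2West 14th Avenue",
    "12324West",
    "123Vityata Shampion",
]:
    print(split_leading_number(s))
```

The lookahead excludes whitespace, so an address that is already split, such as "31 West 52nd Street", is left unchanged rather than getting a second space.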

Groovy null regex

I would like to do the same task as in this question, but with Groovy:
REGEX: How to split string with space and double quote
def sourceString = "18 17 16 \"Arc 10 12 11 13\" \"Segment 10 23 33 32 12\" 23 76 21"
def myMatches = sourceString.findAll(/("[^"]+")|\S+/) { match, item -> item }
println myMatches
This is the result:
[null, null, null, "Arc 10 12 11 13", "Segment 10 23 33 32 12", null, null, null]
Consider the following, which uses the Elvis operator:
def sourceString = '18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21'
def regex = /"([^"]+)"|\S+/
def myMatches = sourceString.findAll(regex) { match, item ->
item ?: match
}
assert 8 == myMatches.size()
assert 18 == myMatches[0] as int
assert 17 == myMatches[1] as int
assert 16 == myMatches[2] as int
assert "Arc 10 12 11 13" == myMatches[3]
assert "Segment 10 23 33 32 12" == myMatches[4]
assert 23 == myMatches[5] as int
assert 76 == myMatches[6] as int
assert 21 == myMatches[7] as int
Returning match instead of item gives nearly the expected result, but the quotes remain. I don't know how to exclude them with the regex itself, but removing the quotes from the result works:
def myMatches = sourceString.findAll(/"([^"]+)"|\S+/) { match, item -> match.replace('"', '') }
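For comparison, the "captured group, else whole match" idea behind the Elvis-operator answer looks like this as a Python sketch (tokenize is a hypothetical helper). Python's re.findall has an analogous pitfall: when the pattern has one group, it returns only the group's text, so the \S+ tokens would come back as empty strings. The sketch therefore iterates over match objects with re.finditer:

```python
import re

# Same pattern as the Groovy answer: a quoted phrase (captured without the
# quotes) or any run of non-whitespace characters.
TOKEN_RE = re.compile(r'"([^"]+)"|\S+')


def tokenize(text):
    # group(1) is None for the \S+ alternative, so fall back to the whole match.
    return [m.group(1) or m.group(0) for m in TOKEN_RE.finditer(text)]


source = '18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21'
print(tokenize(source))
```

Since [^"]+ needs at least one character, group 1 is never an empty string, so the `or` fallback only triggers for the unquoted tokens.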

How can I extract the first digits from a string

I am trying to use PowerShell to extract the leading digits from a long string. How can I use a regex to get only the first run of digits from a string?
String 1:
000660007501S W RUSSELL DLC NO 41 SLY 2.5 FT OF ELY 313.82 FT OF FOLLOWING DESCRIBED PORTION OF SAMUEL W RUSSELL DONATION CLAIM` 1000
String 2:
010454040006ALDERBROOK DIV NO 05 62000 14000040
String 3:
012000012000ALEXANDER ACRE TRS S 1/2 OF LOT 38 TGW LOT 39 TGW N 45.96 FT OF E 109.23 FT LOT 40 LESS ANY POR PLTD DEVON LANE 13000 38-39-40
I was able to do it like this:
$accountnumber = $p.Substring(0,16) -replace '\D+',''
$Parcelnumber = $accountnumber.Substring(0,10)
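If only the run of digits at the very start of the string is needed, anchoring \d+ at the start avoids the fixed Substring(0,16) cut. A sketch in Python (leading_digits is a hypothetical helper); in PowerShell the same idea would be $p -match '^\d+' followed by reading $Matches[0]. Slicing the result with [:10] then gives the same 10-digit prefix as the Substring(0,10) step:

```python
import re

# \d+ with re.match() is anchored at the start of the string,
# so it returns only the first run of digits.
LEADING_DIGITS_RE = re.compile(r"\d+")


def leading_digits(text):
    m = LEADING_DIGITS_RE.match(text)
    return m.group() if m else ""


samples = [
    "000660007501S W RUSSELL DLC NO 41 SLY 2.5 FT OF ELY 313.82 FT",
    "010454040006ALDERBROOK DIV NO 05 62000 14000040",
    "012000012000ALEXANDER ACRE TRS S 1/2 OF LOT 38 TGW LOT 39",
]
for s in samples:
    account = leading_digits(s)
    print(account, account[:10])
```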