Suite 400 - 100 ABCDEF (Capture values from 100) - regex

I need a regular expression that extracts 100 ABCDEF from the input string Suite 400 - 100 ABCDEF. I wrote the regex below, but it starts matching at the number that follows Suite:
[^-\s]\d.+

Just put $ at the end of your regex. $ means "end of line".
Also, replace the dot with [^-], so it will match only non-hyphens:
[^-\s]?\d[^-]+$
Fiddle: http://refiddle.com/refiddles/5b9a88ef75622d4ca9590000
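For anyone who wants to compare the two patterns outside the fiddle, here is a small Python sketch (Python is my choice here, the question does not name a language). It runs the original attempt and the anchored pattern against the sample string.

```python
import re

text = "Suite 400 - 100 ABCDEF"

# Original attempt: the first place it can match is "40" in "400",
# and .+ then runs to the end of the line.
original = re.search(r"[^-\s]\d.+", text)

# Anchored version: hyphens are excluded and the match must reach the end.
anchored = re.search(r"[^-\s]?\d[^-]+$", text)

print(original.group())  # 400 - 100 ABCDEF
print(anchored.group())  # 100 ABCDEF
```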

Since you're trying to match a US street address, you should try matching a number followed by one or more words instead:
\d+(?:\s+[A-Za-z.]+)+
Demo: https://regex101.com/r/y6n5jD/1
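A minimal Python sketch of the number-followed-by-words idea, assuming Python's re module; the helper name street_part and the second sample address are made up for illustration.

```python
import re

# One or more digits, then one or more whitespace-separated words
# made of letters or dots.
STREET = re.compile(r"\d+(?:\s+[A-Za-z.]+)+")

def street_part(address):
    """Return the first 'number + words' run, or None if there is none."""
    m = STREET.search(address)
    return m.group() if m else None

print(street_part("Suite 400 - 100 ABCDEF"))  # 100 ABCDEF
print(street_part("Suite 12 - 7 Main St."))   # 7 Main St.
```

The suite number is skipped because it is followed by a hyphen rather than a word.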

Related

Regex to get any numbers after the occurrence of a string in a line

Hi guys, I'm trying to get a substring as well as the corresponding number from this string:
text = "Milk for human consumption may be taken only from cattle from 80 hours after the last treatment."
I want to select the word milk and the corresponding number 80 from this sentence. This is part of a larger file, and I want a generic solution that finds the word milk in a line and then the first number that occurs after it anywhere in that line.
(Milk+)\d
This is what I came up with, thinking that I could make a group for milk and then check for digits, but I'm stumped on how to search for numbers anywhere on the line and not just immediately after the word milk. Also, is there any way to make the search case insensitive?
Edit: I'm looking to get both the word and the number if possible, e.g. "milk" "80", and I'm using Python.
/(?<!\p{L})([Mm]ilk)(?!\p{L})\D*(\d+)/
This matches the following strings, with the match and the contents of the two capture groups noted.
"The Milk99" # "Milk99" 1:"Milk" 2:"99"
"The milk99 is white" # "milk99" 1:"milk" 2:"99"
"The 8 milk is 99" # "milk is 99" 1:"milk" 2:"99"
"The 8milk is 45 or 73" # "milk is 45" 1:"milk" 2:"45"
The following strings are not matched.
"The Milk is white"
"The OJ is 99"
"The milkman is 37"
"Buttermilk is 99"
"MILK is 99"
This regular expression could be made self-documenting by writing it in free-spacing mode:
/
(?<!\p{L}) # the following match is not preceded by a Unicode letter
([Mm]ilk) # match 'M' or 'm' followed by 'ilk' in capture group 1
(?!\p{L}) # the preceding match is not followed by a Unicode letter
\D* # match zero or more characters other than digits
(\d+) # match one or more digits in capture group 2
/x # free-spacing regex definition mode
\D* could be replaced with .*?, where the trailing ? makes the match non-greedy. If the greedy variant (.*) were used, the second capture group for "The 8milk is 45 or 73" would contain "3".
To match "MILK is 99", change ([Mm]ilk) to (?i)(milk).
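Since the question asks for Python, here is a rough port. One caveat: the standard re module does not support \p{L}, so this sketch substitutes the class [^\W\d_] (a word character that is neither a digit nor an underscore) as an approximation of "letter". That substitution is my assumption, not part of the regex above. The helper name milk_and_number is also made up. Case-insensitivity comes from re.IGNORECASE, so "MILK is 99" matches here.

```python
import re

# Assumption: [^\W\d_] stands in for \p{L}, which the stdlib re module lacks.
LETTER = r"[^\W\d_]"
MILK = re.compile(rf"(?<!{LETTER})(milk)(?!{LETTER})\D*(\d+)", re.IGNORECASE)

def milk_and_number(line):
    """Return (word, number) for the first 'milk ... digits' pair, else None."""
    m = MILK.search(line)
    return m.groups() if m else None

text = ("Milk for human consumption may be taken only from cattle "
        "from 80 hours after the last treatment.")
print(milk_and_number(text))                     # ('Milk', '80')
print(milk_and_number("The 8milk is 45 or 73"))  # ('milk', '45')
print(milk_and_number("The milkman is 37"))      # None
```

The third-party regex package accepts \p{L} directly if an exact port is preferred.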
This seems to work in Java the way you want (I overlooked that the questioner wanted Python, or the question was edited later):
String example =
"Test 40\n" +
"Test Test milk for human consumption may be taken only from cattle from hours after the last treatment." +
"\nTest Milk for human consumption may be taken only from cattle from 80 hours after the last treatment." +
"\nTest miLk for human consumption may be taken only from cattle from 80 hours after the last treatment.";
Matcher m = Pattern.compile("((?i)(milk).*?(\\d+).*\n?)+").matcher(example);
m.find();
System.out.print(m.group(2) + m.group(3));
Note how it checks, case-insensitively, that the word "milk" appears somewhere before a number on the same line, and prints only those two values. It also prints only the first occurrence found (finding all occurrences takes only small modifications to the given code).
I hope this way of extracting both values from the matching pattern fits your task.
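A rough Python counterpart of the same idea, collecting every occurrence instead of only the first. The sample lines are abbreviated from the Java example, and I changed the last number to 72 so the two hits can be told apart, so treat the data as illustrative.

```python
import re

text = (
    "Test 40\n"
    "Test milk may be taken from cattle after the last treatment.\n"
    "Test Milk may be taken from cattle from 80 hours after the last treatment.\n"
    "Test miLk may be taken from cattle from 72 hours after the last treatment."
)

# "." does not match a newline by default, so the word and the number
# have to sit on the same line for a match to succeed.
pairs = [m.groups()
         for m in re.finditer(r"(milk).*?(\d+)", text, flags=re.IGNORECASE)]
print(pairs)  # [('Milk', '80'), ('miLk', '72')]
```

The second line mentions milk but has no number, so it produces no pair.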
You should try this one
(Milk).*?(\d+)
Based on your language, you can also specify a case-insensitive search. Example in JS: /(Milk).*?(\d+)/i, the final i makes the search case insensitive.
Note the *?, the most important part! This is a lazy quantifier: it consumes as few characters as possible and stops as soon as the rest of the pattern can match. Here, as soon as a digit can be read, it is read. A plain * would instead have run on to the last number on the line after Milk and captured only its final digit.
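To see the lazy/greedy difference concretely, here is a short Python sketch (the sample line is made up):

```python
import re

line = "Milk is 45 or 73"

lazy = re.search(r"(milk).*?(\d+)", line, re.IGNORECASE)
greedy = re.search(r"(milk).*(\d+)", line, re.IGNORECASE)

print(lazy.groups())    # ('Milk', '45')
print(greedy.groups())  # ('Milk', '3')
```

The greedy .* first swallows the rest of the line and then gives back only as much as \d+ needs, which is a single digit.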

Regex for valid SSN or other ID

I'm a regex newbie and I've got a valid regex for SSNs:
/^(\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}]*$/
But I now need to expand it to accept either an SSN or another alphanumeric ID of 7 characters, like this:
/^[a-zA-Z0-9]{7}$/
I thought it'd be as simple as grouping the SSN and adding an OR | but my tests are still failing. This is what I've got now:
/^((\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}])|[a-zA-Z0-9]{7}$/
What am I doing wrong? And is there a more elegant way to say either SSN or my other ID?
Thanks for any helpful tips.
Valid SSNs:
123-45-6789
123456789
123 45 6789
Valid ID: aCe8999
I have also modified your first regex a bit; below is a demo program. This is based on my understanding of the problem. Let me know if any modification is needed.
use strict;
use warnings;
use feature 'say';

my @ids = (
'123-45-6789',
'123456789',
'123 45 6789',
'1234567893434', # invalid
'123456789wwsd', # invalid
'aCe8999',
'aCe8999asa' # invalid
);
for (@ids) {
say "match = $&" if $_ =~ /^ (?:\d{3} ([ \-])? \d{2} \1? \d{4})$ | ^[a-zA-Z0-9]{7}$/x ;
}
Output:
match = 123-45-6789
match = 123456789
match = 123 45 6789
match = aCe8999
Your first regex has some problems. Most importantly, it accepts {{{{}}}}}, which means you have built a wrong character class. It also matches 123-45 6789 (notice the mixture of space and dash).
To express OR in regular expressions you use the pipe |, but remember that each token belongs to the side of the pipe it sits on. For example, ^1|2$ matches strings beginning with 1 or ending with 2, not just the two individual input strings 1 and 2.
To apply the exact match you need to do ^1$|^2$ or ^(1|2)$.
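A quick way to convince yourself of the precedence rule, sketched in Python (the test strings are arbitrary):

```python
import re

loose = re.compile(r"^1|2$")     # "starts with 1" OR "ends with 2"
strict = re.compile(r"^(1|2)$")  # exactly "1" or exactly "2"

for s in ["1", "2", "1abc", "abc2"]:
    print(s, bool(loose.search(s)), bool(strict.search(s)))
# 1 True True
# 2 True True
# 1abc True False
# abc2 True False
```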
With the second regex, ^[a-zA-Z0-9]{7}$, you are not requiring a mixed alphanumeric ID of 7 characters; you are allowing all-numeric, all-alphabetic, or mixed input. So it matches 1234567 too. If this is not a problem, the following regex solves the issues described above:
^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$
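Here is a small Python check of that final pattern against the values from the question, plus the mixed-separator case. The helper name is_valid_id is hypothetical.

```python
import re

# \1 repeats whatever separator (dash, space, or nothing) group 1 matched,
# so mixed separators are rejected.
ID_RE = re.compile(r"^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$")

def is_valid_id(value):
    return ID_RE.search(value) is not None

for value in ["123-45-6789", "123456789", "123 45 6789", "aCe8999",
              "123-45 6789", "1234567893434", "aCe8999asa"]:
    print(value, is_valid_id(value))
```

The first four values are accepted and the last three are rejected.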

regex for excluding text at end of string

I have a regular expression (built in Adobe JavaScript) which finds a string that can be of varying length.
The part I need help with: when the string is found, I need to exclude the extra characters at the end, which always start with 1 1.
This is the expression:
var re = new RegExp(/WASH\sHANDLING\sPLANT\s[-A-z0-9 ]{2,90}/);
This is the result:
WASH HANDLING PLANT SIZING STATION SERVICES SHEET 1 1 75 MOR03 MUP POS SU W ST1205 DWG 0001
I need to modify the regex to exclude the part of the string beginning with the 1 1.
Keep in mind the string searched for can be of varying length, hence the {2,90}.
Can anyone please advise how to modify the regex to exclude everything from the 1 1 onward?
Thank you
You may use a positive lookahead and keep the same functionality:
/WASH\sHANDLING\sPLANT\s[-A-Za-z0-9 ]{2,90}(?=\b1 1\b)/
                                           ^^^^^^^^^^^
The (?=\b1 1\b) lookahead requires 1 1 as whole "word" after your match.
See the regex demo
Also, note that [A-z] matches more than just letters: it also covers the ASCII characters that sit between Z and a, namely [, \, ], ^, _ and `.
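The same lookahead behaves identically in Python's re module, which makes it easy to try out away from Acrobat. This is only a sketch: the trailing space before 1 1 is part of the match, so I strip it here.

```python
import re

text = ("WASH HANDLING PLANT SIZING STATION SERVICES SHEET 1 1 75 MOR03 "
        "MUP POS SU W ST1205 DWG 0001")

pattern = re.compile(r"WASH\sHANDLING\sPLANT\s[-A-Za-z0-9 ]{2,90}(?=\b1 1\b)")
m = pattern.search(text)

# The lookahead checks for "1 1" without consuming it.
title = m.group().rstrip()
print(title)  # WASH HANDLING PLANT SIZING STATION SERVICES SHEET

# [A-z] spans ASCII 65-122, which also contains [ \ ] ^ _ and `
print(bool(re.fullmatch(r"[A-z]", "_")))     # True
print(bool(re.fullmatch(r"[A-Za-z]", "_")))  # False
```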

I want to replace the second occurrence of the number in the string

I have a string, say a URL like the one below:
"www.regexperl.com/1234/34/firstpage/home.php"
Now I need to replace the number 34, which is the second occurrence of a number in the string, with 2.
The resultant string should be like
"www.regexperl.com/1234/2/firstpage/home.php"
The challenge I'm facing is that when I try to store the value 34 and replace it, it replaces the 34 inside the number 1234 and gives the result below:
"www.regexperl.com/122/34/firstpage/home.php"
Kindly let me know a proper regex to solve the problem.
Use \K.
^.*?\d+\b.*?\K\d+
Replace with your string. See the demo:
https://regex101.com/r/lW2kK1/1
Well if the positions are constant then you can find and replace as follows.
Regex: (\.com\/\d+)(\/\d+)
Input string: www.regexperl.com/1234/34/firstpage/home.php
Replacement to do: Replace with \1/ followed by number of your choice. For example \1/2.
Output string: www.regexperl.com/1234/2/firstpage/home.php
Regex101 Demo
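Both answers can be tried in Python with re.sub. Note that \K is not supported by Python's standard re module, so the sketch below captures the prefix in a group and puts it back, which is my substitution for \K rather than what the first answer wrote. \g<1> is used instead of \1 so that the following digit is not read as part of the group number.

```python
import re

url = "www.regexperl.com/1234/34/firstpage/home.php"

# Emulating \K: capture everything up to the second number, then re-emit it.
second_number = re.sub(r"^(.*?\d+\b.*?)\d+", r"\g<1>2", url, count=1)

# Fixed-position variant: the number right after ".com/<digits>/".
fixed_position = re.sub(r"(\.com/\d+)/\d+", r"\g<1>/2", url, count=1)

print(second_number)   # www.regexperl.com/1234/2/firstpage/home.php
print(fixed_position)  # www.regexperl.com/1234/2/firstpage/home.php
```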

regular expression for bulgarian mobile phone numbers

Hello, I need to come up with this regular expression:
The telephone number should begin with 087 OR 088 OR 089 and then it should be followed by 7 digits.
This is what I made, but it doesn't work correctly: it accepts only numbers which begin with 089.
(087)|(088)|(089)[0-9]{7}";
/08[789]\d{7}/
That will match 087xxxxxxx, 088xxxxxxx, and 089xxxxxxx numbers.
See it in action
Maybe /08[7-9][0-9]{7}/ is what you're searching for?
Autopsy:
08 - a literal 08
[7-9] - matches the numbers from 7-9 once
[0-9]{7} - matches the numbers from 0-9 repeated exactly 7 times
That said, you might prefer /^08[7-9][0-9]{7}$/ if your string is only the phone number. (^ means "the string MUST start here" and $ means "the string MUST end here").
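A short Python sketch showing why the original attempt only worked for 089: the [0-9]{7} part belongs to the last alternative only. fullmatch is used here in place of writing ^ and $.

```python
import re

original = re.compile(r"(087)|(088)|(089)[0-9]{7}")
fixed = re.compile(r"08[7-9][0-9]{7}")

for number in ["0871234567", "0891234567", "0861234567", "087"]:
    print(number,
          bool(original.fullmatch(number)),
          bool(fixed.fullmatch(number)))
# 0871234567 False True
# 0891234567 True True
# 0861234567 False False
# 087 True False
```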
Actually, this would be a far better regex for Bulgarian phone numbers:
/(\+)?(359|0)8[789]\d{1}(|-| )\d{3}(|-| )\d{3}/
It checks:
whether the phone starts with the country code (+359) or with 0 instead;
whether the phone number uses delimiters like - or space.
I tried it on https://regex101.com and it did not work against my test set, so I tweaked it a little, giving the regex pattern below:
^([+]?359)|0?(|-| )8[789]\d{1}(|-| )\d{3}(|-| )\d{3}$
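One thing to watch with that last pattern: the top-level | splits it into "^ plus optional + plus 359" and "everything else up to $", so the first branch is not anchored at the end. The Python sketch below compares it with a grouped variant. BG_MOBILE is my own hypothetical rewrite, not a pattern from the answers, and it assumes the format is a prefix (+359, 359 or 0), then 8, one of 7/8/9, and seven more digits with optional - or space separators.

```python
import re

posted = re.compile(r"^([+]?359)|0?(|-| )8[789]\d{1}(|-| )\d{3}(|-| )\d{3}$")

# Hypothetical grouped variant: the prefix alternation is wrapped in (?:...)
# so that ^ and $ apply to the whole number.
BG_MOBILE = re.compile(r"^(?:\+?359|0)[- ]?8[789]\d[- ]?\d{3}[- ]?\d{3}$")

for s in ["0888123456", "+359888123456", "0888-123-456",
          "+359 888 123 456", "+359 not a phone", "0868123456"]:
    print(repr(s), bool(posted.search(s)), bool(BG_MOBILE.search(s)))
```

With these inputs the posted pattern accepts "+359 not a phone", while the grouped variant rejects it.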