This question already has an answer here:
Parse IP addresses from txt
(1 answer)
Closed 7 years ago.
>>> import re
>>> pat = re.compile(r'^\d{3}-\d{2}-\d{4}$')
>>> pat.findall('my sssn is 111-22-3333')
I am trying to catch an SSN in the text. I tried the expression in pythex and it worked there, but it's not working in Python. I am new to this.
Remove the ^ and the $ anchors:
Your regex should be:
\d{3}-\d{2}-\d{4}
The caret ^ matches the position before the first character in the string, and since your input starts with m, \d{3} can't match there.
$ matches after the last character in the string; you don't need it here unless you want nothing to appear after the last four digits.
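A quick way to see the effect of the anchors is to run both patterns on the sample sentence. This is a minimal sketch; the variable names are only illustrative.

```python
import re

text = 'my sssn is 111-22-3333'

# Anchored pattern from the question: the whole string must be an SSN.
anchored = re.compile(r'^\d{3}-\d{2}-\d{4}$')
# Unanchored pattern: the SSN may appear anywhere in the string.
unanchored = re.compile(r'\d{3}-\d{2}-\d{4}')

anchored_result = anchored.findall(text)
unanchored_result = unanchored.findall(text)

print(anchored_result)    # []
print(unanchored_result)  # ['111-22-3333']
```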
pat = re.compile(r'^.*?(\d{3}-\d{2}-\d{4}).*$')
Group the part you want and use .* to absorb the surrounding text. This lets ^ and $ match against the whole string; your original pattern failed because there were characters before and after the part you wanted.
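As a rough sketch of how the grouped pattern could be used, the SSN can be read from capture group 1 after matching the whole line. The `ssn` variable name is an assumption for illustration.

```python
import re

pat = re.compile(r'^.*?(\d{3}-\d{2}-\d{4}).*$')

# The pattern consumes the whole string; group 1 holds only the SSN.
m = pat.match('my sssn is 111-22-3333')
ssn = m.group(1) if m else None
print(ssn)  # 111-22-3333

# With a single capture group, findall returns the group contents.
found = pat.findall('my sssn is 111-22-3333')
```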
This question already has answers here:
In regex, match either the end of the string or a specific character
(2 answers)
Closed 7 months ago.
I have two regular expressions that work fine to extract text between characters:
(?<=\$)(.*)(?=\*)
(?<=\$)(.*)(?=)
For my example text $66* the first expression extracts 66. When the asterisk is not present in the text (i.e. $66), the second expression extracts 66.
How can I combine the two to use the first one if an asterisk is present and the second one if no asterisk is present?
I tried with what I thought would be an if|then|else like below but am doing something wrong: (?(?=\*)(?<=\$)(.*)(?=\*)|(?<=\$)(.*)(?=))
You can use a negated character set to exclude asterisks in your match instead:
(?<=\$)[^*]+
Demo: https://regex101.com/r/vuGBiJ/2
As you are already using a capture group, you could also match the $ and capture one or more characters other than an asterisk.
\$([^*]+)
Regex demo
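Both suggestions can be tried on the two inputs from the question. The sketch below uses Python's re module only as a convenient test bed; the question does not say which engine is in use.

```python
import re

samples = ['$66*', '$66']

# Lookbehind version: the match itself is the text after the $.
lookbehind = re.compile(r'(?<=\$)[^*]+')
# Capture-group version: the $ is matched, the value is in group 1.
capture = re.compile(r'\$([^*]+)')

lookbehind_results = [lookbehind.search(s).group(0) for s in samples]
capture_results = [capture.search(s).group(1) for s in samples]

print(lookbehind_results)  # ['66', '66']
print(capture_results)     # ['66', '66']
```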
This question already has answers here:
Splitting a String by number of delimiters
(2 answers)
Closed 2 years ago.
I have a file containing information in the following format:
Fred,Frank , Marcel Godwin , Marion,Ryan
I need to match commas and any whitespace around them, but not any comma inside brackets.
My problem is that my current regex [\s,]+ also matches the whitespace between words, for example the whitespace between Marcel and Godwin.
I thought about using something like \s,\s*, but it wouldn't match when there is no whitespace around the comma, like between Fred and Frank.
Surely it's a simple fix, but I can't figure it out.
I think this will match the commas including the whitespace before and afterwards like you explained in your question.
\s*(?=\,)\,(?<=\,)\s*
This is a positive lookahead: (?=\,). It means the whitespace is matched only if a comma comes right after it.
This is a positive lookbehind: (?<=\,). It means the whitespace is matched only if a comma comes right before it.
Try it out yourself. You can use this page to check the output in your browser.
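For a quick local check, the pattern can be used as a delimiter on the sample line. This sketch assumes Python's re.split. Because the comma itself is consumed between the two assertions, both lookarounds are always satisfied, so the shorter \s*,\s* should give the same result here.

```python
import re

line = 'Fred,Frank , Marcel Godwin , Marion,Ryan'

# Pattern from the answer: optional whitespace, a comma, optional whitespace.
with_lookarounds = re.compile(r'\s*(?=\,)\,(?<=\,)\s*')
# Shorter form without the lookarounds, for comparison.
plain = re.compile(r'\s*,\s*')

names = with_lookarounds.split(line)
names_plain = plain.split(line)

print(names)  # ['Fred', 'Frank', 'Marcel Godwin', 'Marion', 'Ryan']
```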
This question already has answers here:
Regex Last occurrence?
(7 answers)
Closed 3 years ago.
I have the following RegEx syntax that will match the first date found.
([0-9]+)/([0-9]+)/([0-9]+)
However, I would like to start from the end of the content and search backwards. In other words, in the below example, my syntax will always match the first date, but I want it to match the last instead.
Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here
I believe this is possible using a negative lookahead, but all my attempts have failed because I don't understand regex well enough. Help would be appreciated.
Note: my application uses PCRE regex.
You could make the dot match newlines as well, for example with the inline modifier (?s), and let .* match until the end of the string.
The engine then backtracks until it reaches the last occurrence of the date-like pattern; precede the first digit with a word boundary so the match starts at the beginning of the number.
Use \K to forget what was matched so far, then match the date-like pattern.
^(?s).*\b\K[0-9]+/[0-9]+/[0-9]+
Regex demo
Note that the pattern is a very broad match and does not validate a date itself.
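Python's built-in re module does not support \K, so the sketch below is an adaptation rather than the pattern above: it puts the date in a capture group and relies on the same greedy backtracking. The re.DOTALL flag stands in for (?s).

```python
import re

text = """Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here"""

# re.DOTALL lets . cross newlines; the greedy .* backtracks from the end,
# so the capture group holds the last date-like substring.
last_date = re.compile(r'.*\b([0-9]+/[0-9]+/[0-9]+)', re.DOTALL)

match = last_date.search(text)
result = match.group(1) if match else None
print(result)  # 10/04/14
```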
This question already has answers here:
Regex.Match whole words
(4 answers)
Closed 3 years ago.
I have the following regex:
^USD|AUD|BRL|GBP|CAD|CNY|DKK|AED|EUR|HKD|INR|MYR|MXN|NZD|PHP|SGD|THB|ARS|COP|CLP|PEN|VEF$
When using this example string: 16ccf52b144~~refCode-3-d5779a89-d437-448a-bf53-efad2cdd66f6~20191020T16:00~20191026T16:00~USD~305.81~~~~**8294A2B49CD60ABE4FC7081F05CD06AA17E837CCADEB0ABC57B6AC94B09882FB
I am expecting the regex to return USD, but instead it returns CAD. How can I edit the regex so that it returns USD? Ideally the regex should look for ~currencyCode~; right now it is looking for currencyCode without the tildes.
The ^ and $ assertions are unnecessary in your regex, since the substring you are trying to match is at neither the beginning nor the end of the string. Because ^ applies only to the first alternative, the pattern can match USD only at the beginning of the string, while the other codes can match anywhere.
Instead, group the alternations and surround them with word boundary assertions:
\b(?:USD|AUD|BRL|GBP|CAD|CNY|DKK|AED|EUR|HKD|INR|MYR|MXN|NZD|PHP|SGD|THB|ARS|COP|CLP|PEN|VEF)\b
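The difference can be checked against the example string from the question. This is a sketch in Python, chosen only for illustration since the question does not name a language; `codes` and the other variable names are assumptions.

```python
import re

sample = ('16ccf52b144~~refCode-3-d5779a89-d437-448a-bf53-efad2cdd66f6'
          '~20191020T16:00~20191026T16:00~USD~305.81~~~~**'
          '8294A2B49CD60ABE4FC7081F05CD06AA17E837CCADEB0ABC57B6AC94B09882FB')

codes = ('USD|AUD|BRL|GBP|CAD|CNY|DKK|AED|EUR|HKD|INR|MYR|MXN|NZD|PHP|SGD|'
         'THB|ARS|COP|CLP|PEN|VEF')

# Original pattern: ^ binds only to USD and $ only to VEF.
original = re.compile('^' + codes + '$')
# Grouped alternation between word boundaries.
bounded = re.compile(r'\b(?:' + codes + r')\b')

original_match = original.search(sample).group(0)
bounded_matches = bounded.findall(sample)

print(original_match)   # CAD
print(bounded_matches)  # ['USD']
```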
You haven't said which language or framework you're using, so I'll assume you want a generally applicable regex.
If you know that ~ will precede and follow your currency, then you can use a zero-width assertion to find text between ~ characters like so:
(?<=~)(USD|AUD|BRL|GBP|CAD|CNY|DKK|AED|EUR|HKD|INR|MYR|MXN|NZD|PHP|SGD|THB|ARS|COP|CLP|PEN|VEF)(?=~)
This will match the USD in 6:00~USD~305 because it's surrounded by ~, but not the CAD in 7CCADEB0 because it's not surrounded by them.
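The two fragments mentioned above make a small test case. The sketch uses Python's re module as one engine that supports these lookarounds; the variable names are illustrative.

```python
import re

codes = ('USD|AUD|BRL|GBP|CAD|CNY|DKK|AED|EUR|HKD|INR|MYR|MXN|NZD|PHP|SGD|'
         'THB|ARS|COP|CLP|PEN|VEF')

# A currency code counts only when a ~ sits directly on both sides of it.
tilde_delimited = re.compile(r'(?<=~)(' + codes + r')(?=~)')

hit = tilde_delimited.search('6:00~USD~305')
miss = tilde_delimited.search('7CCADEB0')

print(hit.group(1))  # USD
print(miss)          # None
```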
This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
We have a regex in our project that matches any URL that contains the string "/pdf/":
(.+)/pdf/.+
We need to modify it so that it won't match URLs that also contain "help".
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
If lookarounds are supported, this is very easy to achieve:
(?=.*/pdf/)(?!.*help)(.+)
See a demo on regex101.com.
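The two example URLs can be checked with a short sketch. It assumes the pattern is applied from the start of the string (re.match in Python); an unanchored search could still succeed from a position after "help". The helper name `is_pdf_url` is hypothetical.

```python
import re

pattern = re.compile(r'(?=.*/pdf/)(?!.*help)(.+)')

def is_pdf_url(url):
    """Return True if url contains /pdf/ and does not contain help."""
    # re.match anchors at the start, so both lookaheads scan the full URL.
    return pattern.match(url) is not None

rejected = is_pdf_url('/dealer/help/us/en/pdf/simple.pdf')
accepted = is_pdf_url('/dealer/us/en/pdf/simple.pdf')

print(rejected)  # False
print(accepted)  # True
```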
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First, we match either whitespace or the start of a line:
(?:^|\s)
Then we match anything that is not a space or an h, or any h that is not followed by elp, one or more times (+), until we find /pdf/; then we match non-space characters (\S) any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to reject help after the /pdf/, we can reuse the same sub-pattern from the start in place of \S*:
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match whitespace or the end of the line/string ($):
(?:$|\s)
The full match will include any leading/trailing whitespace, which should be stripped. If you use capture group 1, you don't need to strip the ends.
Example on regex101
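The full pattern can also be tried with capture group 1, both on the bare URLs and on a URL embedded in a sentence. This is a sketch in Python; the helper `find_pdf_url` and the sample sentence are assumptions for illustration.

```python
import re

pattern = re.compile(r'(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)')

def find_pdf_url(text):
    """Return the first /pdf/ URL with no help before /pdf/, or None."""
    m = pattern.search(text)
    # Group 1 excludes the surrounding whitespace, so no stripping is needed.
    return m.group(1) if m else None

with_help = find_pdf_url('/dealer/help/us/en/pdf/simple.pdf')
without_help = find_pdf_url('/dealer/us/en/pdf/simple.pdf')
in_sentence = find_pdf_url('see /dealer/us/en/pdf/simple.pdf now')

print(with_help)     # None
print(without_help)  # /dealer/us/en/pdf/simple.pdf
print(in_sentence)   # /dealer/us/en/pdf/simple.pdf
```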