Ms Exchange Online | Regex in Transport Rule - regex

I want to create a Regex Transport rule for the email with Subject 1365 1049126 9003175245 19382_ST03
I want to check whether the third group of characters in the subject starts with 900. The digits after 900 are dynamic, and the group is a single word.
For example:
Subject 1 - 1365 1049126 9003175245 19382_ST03
Subject 2 - 2455 3245626 9003175000 19382_ST03
Subject 3 - 4567 4449126 9003005030 19382_ST03
How do I set this in regular expression? Any help will be appreciated.

You could use something like this: ^[0-9]{4}\s[0-9]{7}\s900[0-9]{7}\s\w{10}$
^ Matches start of string
[0-9]{4}\s Matches 4 digits and the whitespace after them
[0-9]{7}\s Does the same but for 7 digits
900[0-9]{7}\s Matches 900, then 7 digits and the whitespace after them
\w{10} Matches 10 word characters
$ Matches end of string
If these groups are not ONLY digits, you can replace [0-9] with \w
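As a quick sanity check, the pattern can be tried against the three sample subjects in Python. This is only a sketch of the pattern logic using Python's re module; whether the Exchange Online transport rule engine accepts exactly the same syntax is an assumption worth verifying there. The function name subject_matches and the 800 counter-example are made up for illustration.

```python
import re

# Pattern from the answer: 4 digits, 7 digits, "900" + 7 digits, 10 word characters.
SUBJECT_PATTERN = re.compile(r"^[0-9]{4}\s[0-9]{7}\s900[0-9]{7}\s\w{10}$")

# Sample subjects taken from the question.
subjects = [
    "1365 1049126 9003175245 19382_ST03",
    "2455 3245626 9003175000 19382_ST03",
    "4567 4449126 9003005030 19382_ST03",
]


def subject_matches(subject: str) -> bool:
    """Return True when the whole subject fits the four-group layout."""
    return SUBJECT_PATTERN.search(subject) is not None


for subject in subjects:
    print(subject, subject_matches(subject))

# Hypothetical counter-example: the third group starts with 800, not 900.
print(subject_matches("1365 1049126 8003175245 19382_ST03"))
```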

Related

Regex for phone number allowing 9 or 10 digits

I am trying to validate a regex which
allows 10 digits
if digit starts with 672 then it should only allow total 9 digits
I have tried below regex
/^\d{10}$|^(672)\d{6}$/
https://regex101.com/r/0ahnKx/1
It works for 10 digits but if number starts with 672 then also it allows 10 digits.
Could anyone help how can I fix this?
Thanks!
First of all, the capturing group in your regex is redundant. It makes more sense to wrap the alternatives between ^ and $ in a single group so that each anchor appears only once.
To fix the issue, you need to make sure the first three digits matched by \d{10} are not 672, and you can achieve that with a negative lookahead:
/^((?!672)\d{10}|672\d{6})$/
or, with a non-capturing group:
/^(?:(?!672)\d{10}|672\d{6})$/
See the regex demo. Details:
^ - start of string
(?: - start of a group:
(?!672)\d{10} - a negative lookahead first checks that the next three characters are not 672, and then ten digits are matched
| - or
672\d{6} - 672 and six digits
) - end of the group
$ - end of string.
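A minimal Python sketch of the same check, assuming Python's re module handles this lookahead the same way as the JavaScript-style regex above. The helper name is_valid_phone and the sample numbers are hypothetical.

```python
import re

# Ten digits that do not start with 672, or 672 followed by exactly six digits.
PHONE_PATTERN = re.compile(r"^(?:(?!672)\d{10}|672\d{6})$")


def is_valid_phone(number: str) -> bool:
    """Return True when the number satisfies the 9-or-10 digit rule."""
    return PHONE_PATTERN.search(number) is not None


for candidate in ["1234567890", "672123456", "6721234567", "123456789"]:
    print(candidate, is_valid_phone(candidate))
```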

How to clean data in python for Special Characters through Regex?

Name Id Salary Desgn
0 Mike B1230 3000 Engg
1 John !##2 3000 !##&
2 Lucy B1230 3000 %##B
3 ###& ###& ###& ###&
4 snow B1230 3000 Engg
5 Lily #&-# 3000 Engg
Output:
Name Id Salary Desgn
0 Mike B1230 3000 Engg
1 John !##2 3000
2 Lucy B1230 3000 %##B
3
4 snow B1230 3000 Engg
5 Lily 3000 Engg
I was trying to clean the data with a regular expression so that any cell containing only special characters (no digits or letters) is replaced with a null value.
You can use this pattern r'\B([!##%&-]+)\s' and it should work for the examples you provided (except for the %##B because it contains a letter, contrary to the description you gave).
In python this would be:
import re
patt = r'\B([!##%&-]+)\s'
re.sub(patt, '', your_string)
If using pandas you can use apply: df['new_column'] = df['string_column'].apply(lambda x: re.sub(patt, '', x))
Here is a solution:
(?<!\S)[^\w\s]+(?!\S)
Using lookarounds, this regex matches exactly the unwanted strings. This might help preserve the formatting of the text (e.g. replace each match with " ").
The problem with \W* or \W+ in this case is that it will match all non-word characters even if they are adjacent to word characters, so we need to be a little more specific.
EDIT: !\S in the negative lookarounds above cannot be replaced with \s and positive lookarounds, because \s does not match the start and end of the string, and a more complex regex would be needed to match patterns at those positions.
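Here is a small Python sketch of the lookaround pattern, applied to whole rows written as plain strings. The rows are copied from the question; the function name blank_symbol_tokens is made up for illustration, and replacing with an empty string (rather than " ") is just one choice.

```python
import re

# A run of non-word, non-space characters that is not glued to any
# other non-space character on either side.
STANDALONE_SYMBOLS = re.compile(r"(?<!\S)[^\w\s]+(?!\S)")


def blank_symbol_tokens(row: str) -> str:
    """Remove tokens made up only of special characters; keep the spacing."""
    return STANDALONE_SYMBOLS.sub("", row)


rows = [
    "John !##2 3000 !##&",
    "Lucy B1230 3000 %##B",
    "###& ###& ###& ###&",
    "Lily #&-# 3000 Engg",
]
for row in rows:
    print(repr(blank_symbol_tokens(row)))
```

Note that the surrounding whitespace is left in place, so a removed token leaves its neighbouring spaces behind.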
You can use this regex to match text that doesn't contain at least one letter or digit and replace it with an empty string:
(?!\S*[a-zA-Z0-9]\S*)(?<!\S)\S+\s*
Here the negative lookahead (?!\S*[a-zA-Z0-9]\S*) rejects a token if it contains at least one letter or digit, then (?<!\S) ensures the match doesn't start in the middle of a token that has non-whitespace characters just before it, \S+ matches the token itself, and \s* at the end consumes the trailing space(s) after the removed token, as in your expected output.
Check out this demo
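A short Python sketch of this variant, again on rows from the question written as plain strings. The name drop_symbol_tokens is hypothetical.

```python
import re

# A whole token with no ASCII letter or digit in it, plus the spaces after it.
NON_ALNUM_TOKEN = re.compile(r"(?!\S*[a-zA-Z0-9]\S*)(?<!\S)\S+\s*")


def drop_symbol_tokens(row: str) -> str:
    """Remove tokens without any letter or digit, together with trailing spaces."""
    return NON_ALNUM_TOKEN.sub("", row)


for row in ["Lily #&-# 3000 Engg", "John !##2 3000 !##&", "###& ###& ###& ###&"]:
    print(repr(drop_symbol_tokens(row)))
```

Because only trailing whitespace is consumed, a removed token at the end of a row leaves the space before it in place.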

Match all type of numbers

I need a regular expression that extracts all numbers with different delimiters (single whitespace, comma, dot). Each number can use none or all of them.
Example:
text: 'numbers: 3.14 2 544 345,345.55 506 test 120 100 100'
output: '3.14', '2 544', '345,345.55', '506', '120 100 100'
I created this regex: \d+[(.|,|\s)\d+]+, but it does not work properly.
I assume the numbers you need to extract are separated with 2 or more whitespaces, else it would be impossible to differentiate between the end of the previous number and the start of a new one.
If you need to extract the numbers in the formats as shown above, XXX XXX.XXX or XXX,XXX,XXX.XX or XXX or XXX XXX XXX, you may use
\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b
See the regex demo
Details:
\b - leading word boundary
\d{1,3} - 1 to 3 digits
(?:[, ]\d{3})* - 0+ sequences of a comma or space ([, ]) and 3 digits (\d{3})
(?:\.\d+)? - an optional sequence of a dot followed with 1+ digits
\b - trailing word boundary
A less restrictive pattern would be the same as above, but with limiting quantifiers replaced with a +:
\b\d+(?:[, ]\d+)*(?:\.\d+)?\b
See this regex demo
It will also match numbers like 1234566 and 124354354.343344.
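Both patterns can be tried in Python with re.findall. This is a sketch under the assumption stated in the answer: the sample text below uses two spaces between separate numbers, which is not how the question's text appears here.

```python
import re

# Strict: groups of exactly three digits after the first 1-3 digits.
STRICT_NUMBER = re.compile(r"\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b")
# Loose: groups of any length.
LOOSE_NUMBER = re.compile(r"\b\d+(?:[, ]\d+)*(?:\.\d+)?\b")

# Assumption: separate numbers are divided by two spaces.
text = "numbers: 3.14  2 544  345,345.55  506  test  120 100 100"

strict_matches = STRICT_NUMBER.findall(text)
loose_matches = LOOSE_NUMBER.findall(text)
print(strict_matches)
print(loose_matches)

# Only the loose pattern accepts long runs of digits without separators.
print(STRICT_NUMBER.findall("1234566"), LOOSE_NUMBER.findall("1234566"))
```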

Regex preg_match to neutralize a pricelist, keeping only digits, dots and commas*

I am using preg_match (PHP version 5.5.*) and want to ignore all alphabetic letters [a-zA-Z] and special symbols such as $ and -, only to match numbers, commas, dots. Whitespaces between numbers such as 6 000 should be matched. Commas after a number that is not followed by another number should be ignored, such as 6, would only match 6
Note that this is used in a single string and never in a list, like the sample below. I use the list to show what input and desired output is, "per line".
Sample input:
1
1,99
1.99
10
100
5999 dollars
2 USD
$2,99
Our price 2.99
Price: $ 20
200 $
20,-
6 999 USD
Desired output:
1
1,99
1.99
10
100
5999
2
2,99
2.99
20
200
20
6 999
I have tried /([0-9.,\s]+)/ but the output of 6 999 USD becomes 6.
Edit
The code we are using looks like this:
preg_match($regex, $value, $extractions);
array_shift($extractions);
$this->persist($extractions);
Demo
Update:
If you have &nbsp; instead of spaces, you can do two things. My recommendation is to just do a str_replace() first:
str_replace('&nbsp;', ' ', $number);
The other option is to also check for &nbsp; alongside the [\s,] group:
[\d.](?:[\d.]|(?:[\s,]|&nbsp;)(?=\d))*
Example:
preg_match('/[\d.](?:[\d.]|[\s,](?=\d))*/', $number, $matches);
$number = reset($matches);
Explanation:
So I classified the valid characters (digits, spaces, commas, and periods) into two groups: [\d.] and [\s,]. A number must start with a digit or a period ($.99 == .99 != 99). Then we use a repeated non-capturing group (?:...)* to take care of our alternation and lookahead assertions. Anytime there is a [\d.] we match it with no questions asked. Otherwise (|), if it is a [\s,] we assert that it is followed by a digit using a lookahead ((?=...)).
Demo
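For readers who want to try the matching approach outside PHP, here is a rough Python port. It assumes Python's re module treats this pattern the same way PCRE does, and the helper name extract_price is made up for illustration.

```python
import re

# Start at a digit or period, then keep digits/periods, and keep a
# space or comma only when a digit follows it.
PRICE_PATTERN = re.compile(r"[\d.](?:[\d.]|[\s,](?=\d))*")


def extract_price(value: str):
    """Return the first number-like run in the string, or None if there is none."""
    match = PRICE_PATTERN.search(value)
    return match.group(0) if match else None


for sample in ["5999 dollars", "$2,99", "Price: $ 20", "20,-", "6 999 USD"]:
    print(repr(sample), "->", repr(extract_price(sample)))
```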
Example:
preg_replace('/\s*[^\d\s,.]+\s*|,(?!\d)/', '', $number);
Explanation:
[^\d\s,.]+ will match 1+ characters that are not either a digit, whitespace, a comma, or a period. We put \s* on either side to grab any extra whitespace around these unwanted characters (like in "Our price "). The only unwanted character this doesn't match is a trailing comma. We use an alternation (|), then look for a comma, and then make sure that it is not followed by a digit using a negative lookahead ((?!...)).
Demo
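The replacement approach can be sketched in Python in the same way, with re.sub standing in for preg_replace. This is a port under the assumption that the pattern behaves identically in both engines; strip_non_price is a hypothetical name.

```python
import re

# Remove runs of characters that are not digits, whitespace, commas or
# periods (with surrounding whitespace), and commas not followed by a digit.
UNWANTED = re.compile(r"\s*[^\d\s,.]+\s*|,(?!\d)")


def strip_non_price(value: str) -> str:
    """Delete everything that is not part of the number."""
    return UNWANTED.sub("", value)


for sample in ["Our price 2.99", "Price: $ 20", "200 $", "20,-", "6 999 USD"]:
    print(repr(sample), "->", repr(strip_non_price(sample)))
```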

RegEx with counting digits and allow special chars

I've done some searching but can't find the right regex.
I would like a regex for a text that only contains digits, whitespaces and plus signs.
like: [0-9 +]
But with a min/max limit for only the digits in that text.
My suggestions ended up with something like this:
^[0-9 \+](?=(.*[0-9]){5,8})$
Should be OK:
"123 456 7"
"12345"
"+ 123 456 78"
Should not be ok:
"123456789"
"+ 124 578a"
"+123456789"
Anyone got a solution that might do the trick?
Edit:
I can see that my explanation of what I'm aiming for was too short.
My regex conditions should be:
Must include between 5-8 digits
Allow whitespaces and plus signs
I'm guessing from your own regex that between 5 and 8 digits in a row without whitespace in between are allowed. If that's true, then the following regex might do the trick (example written in Python). It allows a single group of between 5 and 8 digits. If there is more than one group, each group must have exactly 3 digits, except for the last group, which can be between 1 and 3 digits long. A single plus sign on the left is optional.
Are you parsing phone numbers? :)
In [176]: regex = re.compile(r"""
    ^                       # start of string
    (?: \+\s )?             # optional plus sign followed by whitespace
    (?:
        (?: \d{3}\s )+      # one or more groups of three digits followed by whitespace
        \d{1,3}             # one group of between one and three digits
      |                     # ALTERNATIVE
        \d{5,8}             # one group of between five and eight digits
    )
    $                       # end of string
    """, flags=re.X)
# --- MATCHES ---
In [177]: regex.findall('123 456 7')
Out[177]: ['123 456 7']
In [178]: regex.findall('12345')
Out[178]: ['12345']
In [179]: regex.findall('+ 123 456 78')
Out[179]: ['+ 123 456 78']
In [200]: regex.findall('12345678')
Out[200]: ['12345678']
# --- NON-MATCHES ---
In [180]: regex.findall('123456789')
Out[180]: []
In [181]: regex.findall('+ 124 578a')
Out[181]: []
In [182]: regex.findall('+123456789')
Out[182]: []
In [198]: regex.findall('123')
Out[198]: []
In [24]: regex.findall('1234 556')
Out[24]: []
You can do something like this:
^(?:[ +]*[0-9]){5}(?:(?:[ +]*[0-9])?){3}$
See it here on Regexr
The first group (?:[ +]*[0-9]){5} matches the 5 required digits, each preceded by any number of spaces and plus signs. The second part (?:(?:[ +]*[0-9])?){3} matches up to 3 optional digits, again each preceded by any number of spaces and plus signs.
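A quick way to check this pattern against the question's examples is a small Python sketch like the one below. The helper name has_five_to_eight_digits is made up for illustration.

```python
import re

# Five required digits plus up to three optional ones, each optionally
# preceded by spaces and plus signs.
DIGIT_COUNT = re.compile(r"^(?:[ +]*[0-9]){5}(?:(?:[ +]*[0-9])?){3}$")

should_match = ["123 456 7", "12345", "+ 123 456 78"]
should_fail = ["123456789", "+ 124 578a", "+123456789"]


def has_five_to_eight_digits(text: str) -> bool:
    """Return True when the text passes the digit-count pattern."""
    return DIGIT_COUNT.search(text) is not None


for sample in should_match + should_fail:
    print(repr(sample), has_five_to_eight_digits(sample))
```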
You were very close - you need to anchor the lookahead to the start of input, and add a second negative lookahead for the upper bound of the quantity of digits:
^(?=(.*\d){5,8})(?!(.*\d){9,})[\d +]+$
Also, FYI, you don't need to escape the plus sign within the character class, and [0-9] can be written as \d
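The two-lookahead version can be checked the same way. This sketch assumes Python's re module; the function name digits_in_range is hypothetical, and the sample strings come from the question.

```python
import re

# Positive lookahead: at least 5 digits somewhere in the text.
# Negative lookahead: not 9 or more digits.
# Body: only digits, spaces and plus signs.
BOUNDED_DIGITS = re.compile(r"^(?=(.*\d){5,8})(?!(.*\d){9,})[\d +]+$")

should_match = ["123 456 7", "12345", "+ 123 456 78"]
should_fail = ["123456789", "+ 124 578a", "+123456789"]


def digits_in_range(text: str) -> bool:
    """Return True when the text has 5 to 8 digits and no other characters
    besides spaces and plus signs."""
    return BOUNDED_DIGITS.search(text) is not None


for sample in should_match + should_fail:
    print(repr(sample), digits_in_range(sample))
```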