I have downloaded a script from Microsoft which will allow us to take a string and convert it into a friendly format to display on user profiles.
The original string is tel:+441234123456;ext=3456.
What I need to do is convert it into a UK friendly format so that the converted string is 01234 123456.
The steps I think I need to take are:
Removing the tel:+44 and replacing with 0.
After first 4 digits add a space.
Finish the variable with the last 6 digits.
Remove the ;ext=3456
A similar approach was suggested for US numbers, but unfortunately, not knowing regex, it goes over my head slightly!
$tel = $LineURI -replace 'tel:(\+1)([2-9]\d{2})([2-9]\d{2})(\d{4});ext=\d{4}','$1 ($2) $3-$4;'
Here is a way using more than one -replace, to simplify things at the cost of some performance:
$tel = $LineURI -replace 'tel:\+\d\d','0' -replace ';.+' -replace '(^.{5})','$1 '
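For readers outside PowerShell, the same chain of three replacements can be sketched in Python. This is only an illustration: the function name to_uk_format_chained is made up, and like the original it assumes a two-digit country code.

```python
import re

def to_uk_format_chained(line_uri: str) -> str:
    """Hypothetical helper mirroring the three chained -replace calls."""
    s = re.sub(r'tel:\+\d\d', '0', line_uri)  # 'tel:+44' becomes '0'
    s = re.sub(r';.+', '', s)                 # drop ';ext=3456'
    return re.sub(r'^(.{5})', r'\1 ', s)      # space after the first 5 characters

print(to_uk_format_chained('tel:+441234123456;ext=3456'))  # 01234 123456
```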
A single regular expression should suffice:
PS C:\> 'tel:+441234123456;ext=3456' -replace '^tel:\+\d{2}(\d{4})(\d+);.*$', '0$1 $2'
01234 123456
Regular expression breakdown:
^tel:\+\d{2} matches a literal tel:+ followed by two digits at the beginning of the string (^).
(\d{4}) matches four subsequent digits. The parentheses group the match so that it can be referenced in the replacement as $1.
(\d+) matches the longest sequence of subsequent digits after the above, but at least one digit. This too is grouped by parentheses so that it can be referenced in the replacement as $2.
;.*$ matches the remainder of the string starting with a semicolon.
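The breakdown above carries over to other regex engines almost unchanged. A minimal Python sketch, assuming the same input shape (the names TEL_PATTERN and to_uk_format are hypothetical); note that Python writes the group references as \1 and \2 rather than $1 and $2:

```python
import re

# Same pattern as in the breakdown: two-digit country code, 4 digits, remaining digits.
TEL_PATTERN = re.compile(r'^tel:\+\d{2}(\d{4})(\d+);.*$')

def to_uk_format(line_uri: str) -> str:
    """Hypothetical helper; input that does not match is returned unchanged."""
    return TEL_PATTERN.sub(r'0\1 \2', line_uri)

print(to_uk_format('tel:+441234123456;ext=3456'))  # 01234 123456
```

Because the pattern requires a semicolon, a URI without an ;ext= part is left as it is.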
Related
I need to write a regular expression that has to replace everything except for a single group.
E.g.
IN: OK THT PHP This is it 06222021
OUT: This is it
IN: NO MTM PYT Get this content 111111
OUT: Get this content
I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))
This RegEx creates 4 groups, using the first entry as an example the groups are:
OK THT PHP
This is it
06222021
Space Character
I need a way to:
Replace Group 1,2,4 with String.Empty
OR
Get Group 3, ONLY
You don't need 4 groups, you can use a single group 1 to be in the replacement and match 6-8 digits for the last part instead of only 6.
Note that this \w{0,2} will also match an empty string, you can use \w{1,2} if there has to be at least a single word char.
^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
^ Start of string
\w{0,2}\s\w{0,3}\s\w{0,3}\s Match 3 times word characters with a quantifier and a whitespace in between
(.*?) Capture group 1, match any char as few times as possible
\s\d{6,8} Match a whitespace char and 6-8 digits
\s? Match an optional whitespace char
$ End of string
Regex demo
Example code
Dim s As String = "OK THT PHP This is it 06222021"
Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
Console.WriteLine(result)
Output
This is it
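The same single-group replacement can be sketched in Python for comparison. The pattern is the one from this answer; the names PATTERN and extract_middle are assumptions made for the example.

```python
import re

# Whole line is matched; only capture group 1 is kept in the replacement.
PATTERN = re.compile(r'^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$')

def extract_middle(line: str) -> str:
    """Hypothetical helper: replace the full match with group 1."""
    return PATTERN.sub(r'\1', line)

print(extract_middle('OK THT PHP This is it 06222021'))      # This is it
print(extract_middle('NO MTM PYT Get this content 111111'))  # Get this content
```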
My approach works without groups and without a Replace operation. The match itself yields the desired result.
It uses look-around expressions. To find a pattern between two other patterns, you can use the general form
(?<=prefix)find(?=suffix)
This will only return find as match, excluding prefix and suffix.
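To make the general form concrete, here is a small Python sketch on a made-up input string (the text and the price= prefix are purely illustrative). The prefix is fixed-width here, which Python's re module requires for lookbehind.

```python
import re

# Hypothetical sample: pull the digits that sit between 'price=' and ';'.
text = 'item=book;price=42;qty=1'
match = re.search(r'(?<=price=)\d+(?=;)', text)
found = match.group() if match else None
print(found)  # 42
```

Only the digits are part of the match; the prefix and suffix are checked but not consumed.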
If we insert your expressions, we get
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6}\s?)
where I simplified (\s|) as \s?. We can also drop it completely, since we don't care about trailing spaces.
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6})
Note that this works also if we have more than 6 digits because regex stops searching after it has found 6 digits and doesn't care about what follows.
This also gives a match if other things precede our pattern like in 123 OK THT PHP This is it 06222021. We can exclude such results by specifying that the search must start at the beginning of the string with ^.
If the exact length of the words and numbers does not matter, we simply write
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+)
If the find part can contain numbers, we must specify that we want to match until the end of the line with $ (and include a possible space again).
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+\s?$)
Finally, we use a quantifier for the 3 occurrences of word-space:
(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)
This is compact and will only return This is it or Get this content.
string result = Regex.Match(input, @"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)").Value; // input is the string to search
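One caveat when porting this: the variable-width lookbehind relies on the .NET engine. Python's built-in re module only accepts fixed-width lookbehind, so the sketch below swaps in a different technique: it consumes the three-word prefix, keeps the lookahead, and reads the middle part from a capture group. The names PATTERN and middle_part are hypothetical.

```python
import re

# Prefix is consumed instead of looked behind; the suffix stays a lookahead.
PATTERN = re.compile(r'^(?:\w+\s){3}(.*?)(?=\s\d+\s?$)')

def middle_part(line: str):
    """Hypothetical helper: text between the 3 leading words and the trailing number."""
    m = PATTERN.match(line)
    return m.group(1) if m else None

print(middle_part('OK THT PHP This is it 06222021'))     # This is it
print(middle_part('NO MTM PYT Version 2 is out 111111'))  # Version 2 is out
```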
I need a regular expression to match a string within a longer string.
Specifically I need to not match any leading zeros or the last 2 digits for the string.
For example, my input might be the following:
00009666666605
00010444444404
00007Z22222205
00033213433104
00009000G00005
And I would like to match
96666666
104444444
7Z222222
332134331
9000G000
For further information, the last 2 digits are always numbers and describe the starting point of the valid reference, after the leading zeros.
I thought I'd cracked it with something like
(?<=0000).{8}|((?<=000).{9})+? but that doesn't work as expected.
It sure takes a lot of steps, but this should do the trick:
(?<=^000)[^0].{8}|(?<=^0000).{8}
(?<= 'start lookbehind
^000 'for the beginning of the string then three zeroes
) 'end lookbehind
[^0] 'match a non-zero
.{8} 'match the remaining 8 chars
| ' OR
(?<= 'start lookbehind
^0000 'for the beginning of the string then four zeroes
) 'end lookbehind
.{8} 'match the remaining 8 chars
That said, in .NET, it will be quicker to do:
dim trimmed = line.TrimStart("0"c)
dim numberString = trimmed.Substring(0,trimmed.Length-2)
If the format of these strings is always the same, I would use:
^0*(.*).{2}$
And access your matches via $1
Regex Storm demo
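As a rough cross-check of the ^0*(.*).{2}$ pattern against the sample inputs from the question, here is a small Python sketch. The names REF_PATTERN and extract_reference are made up for the example.

```python
import re

# 0* eats the leading zeros, .{2} eats the last two characters, group 1 is the rest.
REF_PATTERN = re.compile(r'^0*(.*).{2}$')

def extract_reference(line: str):
    """Hypothetical helper: returns group 1, or None if the line is too short."""
    m = REF_PATTERN.match(line)
    return m.group(1) if m else None

samples = ['00009666666605', '00010444444404', '00007Z22222205',
           '00033213433104', '00009000G00005']
references = [extract_reference(s) for s in samples]
print(references)
```

The greedy 0* only consumes zeros at the very start, so the zeros inside 9000G000 are preserved.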
I am trying to match some strings using a regular expression in PowerShell but, due to the differing format of the original string that I'm extracting from, I'm encountering difficulty. I admittedly am not very strong with creating regular expressions.
I need to extract the numbers from each of these strings. These can vary in length but in both cases will be preceded by FOO
PC1-FOO1234567
PC2-FOO1234567/FOO98765
This works for the second example:
'PC2-FOO1234567/FOO98765' -match 'FOO(.*?)\/FOO(.*?)\z'
It lets me access the matched strings using $matches[1] and $matches[2] which is great.
It obviously doesn't work for the first example. I suspect I need some way to match on either / or the end of the string but I'm not sure how to do this and end up with my desired match.
Suggestions?
You may use
'FOO(.*?)(?:/FOO(.*))?$'
It will match FOO, then capture any 0 or more chars as few as possible into Group 1 and then will attempt to optionally match a sequence of patterns: /FOO, any 0 or more chars as many as possible captured into Group 2 and then the end of string position should follow.
See the regex demo
Details
FOO - literal substring
(.*?) - Group 1: any zero or more chars other than newline, as few as possible
(?:/FOO(.*))? - an optional non-capturing group matching 1 or 0 repetitions of:
/FOO - a literal substring
(.*) - Group 2: any 0+ chars other than newline as many as possible (* is greedy)
$ - end of string.
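The optional non-capturing group behaves the same way in other engines. A minimal Python sketch, assuming the two input shapes from the question (FOO_PATTERN and extract_numbers are hypothetical names); an unmatched optional group comes back as None and is filtered out:

```python
import re

FOO_PATTERN = re.compile(r'FOO(.*?)(?:/FOO(.*))?$')

def extract_numbers(s: str) -> list:
    """Hypothetical helper: one or two captured values, depending on the input."""
    m = FOO_PATTERN.search(s)
    if m is None:
        return []
    # Group 2 is None when the optional '/FOO...' part is absent.
    return [g for g in m.groups() if g is not None]

print(extract_numbers('PC1-FOO1234567'))           # ['1234567']
print(extract_numbers('PC2-FOO1234567/FOO98765'))  # ['1234567', '98765']
```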
[edit - removed the unneeded pipe to Where-Object. thanks to mklement0 for that! [*grin*]]
this is a somewhat different approach. it splits on the foo, then replaces the unwanted / with nothing, and finally filters out any string that contains letters.
the pure regex solutions others offered will likely be faster, but this may be slightly easier to understand - and therefore to maintain. [grin]
# fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
PC1-FOO1234567
PC2-FOO1234567/FOO98765
'@ -split [environment]::NewLine
$InStuff -split 'foo' -replace '/' -notmatch '[a-z]'
output ...
1234567
1234567
98765
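for comparison, the same split / replace / filter idea can be sketched in Python. this is an approximation rather than a translation: the function name numbers_via_split is made up, and the case-insensitive flags stand in for PowerShell's default case-insensitive operators.

```python
import re

def numbers_via_split(lines):
    """Hypothetical helper: split on 'foo', drop '/', keep pieces without letters."""
    kept = []
    for line in lines:
        for piece in re.split('foo', line, flags=re.IGNORECASE):
            piece = piece.replace('/', '')
            if not re.search('[a-z]', piece, flags=re.IGNORECASE):
                kept.append(piece)
    return kept

print(numbers_via_split(['PC1-FOO1234567', 'PC2-FOO1234567/FOO98765']))
# ['1234567', '1234567', '98765']
```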
To offer a more concise alternative with the -split operator, which obviates the need to access $Matches afterwards to extract the numbers:
PS> 'PC1-FOO1234568', 'PC2-FOO1234567/FOO98765' -split '(?:^PC\d+-|/)FOO' -ne ''
1234568 # single match from 1st input string
1234567 # first of 2 matches from 2nd input string
98765
Note: -split always returns a [string[]] array, even if only 1 string is returned; result strings from multiple input strings are combined into a single, flat array.
^PC\d+-|/ matches PC followed by 1 or more (+) digits (\d) at the start of the string (^) or (|) a / char., which matches both PC2-FOO at the beginning and /FOO.
(?:...), a non-capturing subexpression, must be used to prevent -split from including what the subexpression matched in the results array.
-ne '' filters out the empty elements that result from the input strings starting with a separator.
To learn more about the regex-based -split operator and in what ways it is more powerful than the string literal-based .NET String.Split() method, see this answer.
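Python's re.split follows the same rule about groups: text matched by a capturing group is included in the result, while a non-capturing (?:...) group is not. A minimal sketch under that assumption, with hypothetical names SEPARATOR and split_numbers:

```python
import re

# Same separator as above: 'PCn-FOO' at the start, or '/FOO' anywhere.
SEPARATOR = re.compile(r'(?:^PC\d+-|/)FOO')

def split_numbers(lines):
    """Hypothetical helper: flat list of the non-empty pieces from all inputs."""
    result = []
    for line in lines:
        # The leading separator yields an empty first element, which is dropped.
        result.extend(p for p in SEPARATOR.split(line) if p != '')
    return result

print(split_numbers(['PC1-FOO1234568', 'PC2-FOO1234567/FOO98765']))
# ['1234568', '1234567', '98765']
```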
I have the following regular expression:
/^[a-f0-9]{8}$/ --- This expression extracts an 8 character string as a md5 hash, for example: if I have the following string "hello world .305eef9f x1xxx 304ccf9f test1232" it will return "304ccf9f"
I also have the following regular expression:
/.[^.]*$/ --- This expression extracts a string after the last period (included), for example, if I have "hello world.this.is.atest.case9.23919sd3xxxs" it will return ".23919sd3xxxs"
Thing is, I've read a bit about regex but I can't join both expressions in order to find the md5 string after the last period (included), for example:
topLeftLogo.93f02a9d.controller.99f06a7s ----> must return ".99f06a7s"
Thanks in advance for your time and help!
/^[a-f0-9]{8}$/ --- This expression extracts an 8 character string as a md5 hash
Yes but it doesn't return "304ccf9f" from "hello world .305eef9f x1xxx 304ccf9f test1232" because ^ in regex means start of string. How is it possible for it to match in middle of a string?
/.[^.]*$/ --- This expression extracts a string after the last period
No. It will do so only if you escape the first dot: \.
To combine these two you have to replace ^ with \.:
\.[a-f0-9]{8}$
To match 8 characters from the range [a-f0-9] after the last dot, you might use (if supported) a negative lookahead (?!.*\.) to assert that what follows does not contain a dot:
\.[a-f0-9]{8}(?!.*\.)
Regex demo
If you want to match characters from a-z instead of a-f like 99f06a7s you could use [a-z0-9]
About the first example
This regex ^[a-f0-9]{8}$ will match one of the ranges in the character class 8 times from the start until the end of the string due to the anchors ^ and $. It would not find a match in hello world .305eef9f x1xxx 304ccf9f test1232 on the same line.
About the second example
.[^.]*$ will match any single character followed by zero or more characters that are not a dot, up to the end of the string. That would for example also match a single a; it is not bound to first matching a dot, because you have to escape the dot to match it literally.
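The points above can be checked quickly in Python. This is only a sketch using the strings from the question; the variable names are made up. Note that the example 99f06a7s contains an s, so the strict hexadecimal class finds nothing and the [a-z0-9] variant is needed.

```python
import re

sample = 'hello world .305eef9f x1xxx 304ccf9f test1232'
name = 'topLeftLogo.93f02a9d.controller.99f06a7s'

# Anchored pattern: only matches if the WHOLE string is 8 hex characters.
anchored = re.search(r'^[a-f0-9]{8}$', sample)
# Unescaped dot: any character qualifies, so a lone 'a' matches.
unescaped = re.search(r'.[^.]*$', 'a')
# Strict hex class after the last dot: fails because of the trailing 's'.
strict_hex = re.search(r'\.[a-f0-9]{8}$', name)
# Relaxed class plus negative lookahead: no dot may follow the match.
relaxed = re.search(r'\.[a-z0-9]{8}(?!.*\.)', name)

print(anchored, unescaped.group(), strict_hex, relaxed.group())
```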
I'm adding this just in case people need to solve a similar case:
Case 1: for example, we want to get the hexadecimal ([a-f0-9]) 8 char string from our filename string, located just before the file extension, in order, for example, to remove that "hashed" part:
Example:
file.name2222.controller.2567d667.js ------> returns .2567d667
We will need to use the following regex:
\.[a-f0-9]{8}(?=\.\w+$)
Case 2: for example, we want the same as above but ignoring the first period:
Example:
file.name2222.controller.2567d667.js ------> returns 2567d667
We will need to use the following regex
[a-f0-9]{8}(?=\.\w+$)
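Both cases, plus the removal mentioned in Case 1, can be sketched in Python as follows. The constant names are hypothetical; the filename is the one from the examples.

```python
import re

HASH_WITH_DOT = re.compile(r'\.[a-f0-9]{8}(?=\.\w+$)')  # Case 1
HASH_ONLY = re.compile(r'[a-f0-9]{8}(?=\.\w+$)')        # Case 2

filename = 'file.name2222.controller.2567d667.js'

with_dot = HASH_WITH_DOT.search(filename).group()
hash_only = HASH_ONLY.search(filename).group()
# Removing the hashed part: the lookahead leaves the extension untouched.
unhashed = HASH_WITH_DOT.sub('', filename)

print(with_dot, hash_only, unhashed)
# .2567d667 2567d667 file.name2222.controller.js
```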
I'm creating a regex to process the line below as read from a file.
30/05/2014 17:58:19 418087******2093 No415000345536 5,000.00
I have successfully created the regex but my issue is that the string may sometimes appear as below with a slight addition (the extra 13-digit number 0027200004750):
31/05/2014 15:06:29 410741******7993 0027200004750 No415100345732 1,500.00
Please assist in altering the pattern to ignore the integer of 13 digits that I don't need.
Below is my regex pattern
((?:(?:[0-2]?\d{1})|(?:[3][01]{1}))[-:\/.](?:[0]?[1-9]|[1][012])[-:\/.](?:(?:[1]{1}\d{1}\d{1}\d{1})|(?:[2]{1}\d{3})))(?![\d])(\s+)((?:(?:[0-1][0-9])|(?:[2][0-3])|(?:[0-9])):(?:[0-5][0-9])(?::[0-5][0-9])?(?:\s?(?:am|AM|pm|PM))?)(\s+)(\d{6})(\*{6})(\d{4})(\s+)(No)(\d+)(\s+)([+-]?\d*\.\d+)(?![-+0-9\.])
Advice and contribution will be highly appreciated.
The regular expression in question was most likely created using a regular expression builder.
Here is your regular expression reduced to its component parts, simplified and with support for both variants of valid strings.
Date with incomplete validation (invalid days in month still possible):
(?:0?[1-9]|[12]\d|3[01])[-:\/.](?:0?[1-9]|1[012])[-:\/.](?:19|20)\d\d
Whitespace(s) between date and time:
[\t ]+
\s also matches newline characters and other rarely used whitespace characters, which is the reason why I'm using [\t ]+ instead of \s.
Time with at least hour and minute, with incomplete validation (leap second, AM or PM with invalid hour):
(?:[01]?\d|2[0-3]):[0-5][0-9](?::[0-5][0-9])?(?:[\t ]?(?:am|AM|pm|PM))?
Whitespace(s), number with 6 digits, 6 asterisks, number with 4 digits, whitespace(s):
[\t ]+\d{6}\*{6}\d{4}[\t ]+
Optionally a number with 13 digits not marked for backreferencing:
(?:\d{13}[\t ]+)?
Number with undetermined number of digits, whitespace(s), optional plus or minus sign, floating point number (without exponent):
No\d+[\t ]+[+-]?[\d,.]+
And here is the entire expression with 2 additionally added pairs of parentheses to mark the strings of real interest for further processing.
((?:0?[1-9]|[12]\d|3[01])[-:\/.](?:0?[1-9]|1[012])[-:\/.](?:19|20)\d\d[\t ]+(?:[01]?\d|2[0-3]):[0-5][0-9](?::[0-5][0-9])?(?:[\t ]?(?:am|AM|pm|PM))?[\t ]+\d{6}\*{6}\d{4}[\t ]+)(?:\d{13}[\t ]+)?(No\d+[\t ]+[+-]?[\d,.]+)
The first marking group matches:
30/05/2014 17:58:19 418087******2093
31/05/2014 15:06:29 410741******7993
\1 or $1 can be used to reference this part of entire found string.
The second marking group matches:
No415000345536 5,000.00
No415100345732 1,500.00
\2 or $2 can be used to reference this part of entire found string.
Hint: (...) is a marking group. (?:...) is a non-marking group because of ?: immediately after opening parenthesis.
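To show how the parts fit together, here is a Python sketch that assembles the full expression from the component patterns above and applies it to both sample lines. The constant names (DATE, TIME, CARD, OPTIONAL_13, TAIL) and the parse_line helper are assumptions for illustration; the pattern text itself is taken unchanged from this answer.

```python
import re

DATE = r'(?:0?[1-9]|[12]\d|3[01])[-:\/.](?:0?[1-9]|1[012])[-:\/.](?:19|20)\d\d'
TIME = r'(?:[01]?\d|2[0-3]):[0-5][0-9](?::[0-5][0-9])?(?:[\t ]?(?:am|AM|pm|PM))?'
CARD = r'[\t ]+\d{6}\*{6}\d{4}[\t ]+'
OPTIONAL_13 = r'(?:\d{13}[\t ]+)?'  # matched when present, never captured
TAIL = r'No\d+[\t ]+[+-]?[\d,.]+'

LINE = re.compile('(' + DATE + r'[\t ]+' + TIME + CARD + ')' + OPTIONAL_13 + '(' + TAIL + ')')

def parse_line(line: str):
    """Hypothetical helper: returns (group 1, group 2) or None."""
    m = LINE.search(line)
    if m is None:
        return None
    # Group 1 ends with the whitespace before the next field, so trim it.
    return m.group(1).rstrip(), m.group(2)

print(parse_line('30/05/2014 17:58:19 418087******2093 No415000345536 5,000.00'))
print(parse_line('31/05/2014 15:06:29 410741******7993 0027200004750 No415100345732 1,500.00'))
```

Both lines yield the same two fields; the optional 13-digit number is skipped because it sits between the two marking groups.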