Regular expressions aren't exactly my strong suit. I got a regex for validating international phone numbers here. The validation bit works for me but I don't understand how I can take the regex result and use it to format the number. My question is how do I figure out, from the regex, what the groupings are that I can use to display?
var IntlRegexObj = /^((\+)?[1-9]{1,2})?([-\s\.])?((\(\d{1,4}\))|\d{1,4})(([-\s\.])?[0-9]{1,12}){1,2}$/;
if (IntlRegexObj.test(businessPhoneValue))
{
var formattedPhoneNumber = businessPhoneValue.replace(IntlRegexObj, "($1)");
// display formatted result
}
After simplifying that mess of a regex:
if (subject.match(/^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})([\-\s.]?\d{1,12}){1,2}$/)) {
// Successful match
}
There are now only 3 capturing groups.
First one $1 is easy, the country code with an optional +.
Then you have the local area code, basically 1-4 numbers with / without parentheses optionally prefixed by [-\s.]. That's $2
Finally you have the actual phone number, which can be from 1 to 24 digits, including an optional space, dot, or minus sign [-\s.]. That's $3, but because the group is repeated, $3 only holds the last repetition.
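A minimal sketch of reading those groups in JavaScript, to tie this back to the formatting question. `phoneGroups` and `wholeRest` are hypothetical names of mine; `wholeRest` is the same pattern with the repetition wrapped in one extra capturing group, which is an assumption about what you would want for display rather than part of the original regex.

```javascript
// Simplified pattern from above: group 3 is repeated, so it keeps only its last repetition.
const simplified = /^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})([\-\s.]?\d{1,12}){1,2}$/;

// Variant (assumption): the repetition sits inside one outer group, so group 3 is the whole remainder.
const wholeRest = /^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})((?:[\-\s.]?\d{1,12}){1,2})$/;

// Hypothetical helper: returns the three groups, or null when the input does not match.
function phoneGroups(pattern, input) {
  const m = pattern.exec(input);
  if (m === null) return null;
  // m[1] is undefined when the optional country code is absent
  return { country: m[1] || "", area: m[2], rest: m[3] };
}

console.log(phoneGroups(simplified, "+1 (555) 123-4567"));
// { country: '+1', area: '(555)', rest: '-4567' }
console.log(phoneGroups(wholeRest, "+1 (555) 123-4567"));
// { country: '+1', area: '(555)', rest: ' 123-4567' }
```

The pieces can then be trimmed and joined in whatever display format you prefer.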
More detailed explanation:
"
^ # Assert position at the beginning of the string
( # Match the regular expression below and capture its match into backreference number 1
(?: # Match the regular expression below
\+ # Match the character “+” literally
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
[1-9] # Match a single character in the range between “1” and “9”
{1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
[-\s.] # Match a single character present in the list below
# The character “-”
# A whitespace character (spaces, tabs, line breaks, etc.)
# The character “.”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?: # Match the regular expression below
\( # Match the character “(” literally
\d # Match a single digit 0..9
{1,4}# Between one and 4 times, as many times as possible, giving back as needed (greedy)
\) # Match the character “)” literally
)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\d # Match a single digit 0..9
{1,4} # Between one and 4 times, as many times as possible, giving back as needed (greedy)
)
( # Match the regular expression below and capture its match into backreference number 3
[-\s.] # Match a single character present in the list below
# The character “-”
# A whitespace character (spaces, tabs, line breaks, etc.)
# The character “.”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
{1,12} # Between one and 12 times, as many times as possible, giving back as needed (greedy)
){1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
"
This regex is woefully inadequate. Following your link, even a couple of the examples listed as non-matches will match this regex. Judging by the groupings, the regex is purely an overlap of possibilities whose groups merely happen to be capturing groups. Any hope of parsing out the real parts of the number is sadly destroyed by this regex.
Expanded, it looks like this:
^
(
(\+)?
[1-9]{1,2}
)?
([-\s\.])?
(
(
\(\d{1,4}\)
)
|
\d{1,4}
)
(
([-\s\.])?
[0-9]{1,12}
){1,2}
$
I even tried to formulate a proper capture grouping for its parts, and sadly it shows the problems.
^
(?: \+ )?
( [1-9]{1,2} |) # Capt Group 1, international code (or not)
(?| # Branch Reset
\( (\d{1,4}) \) # Capture Group 2, area code
| (\d{1,4})
)
(?:[-\s.])?
( # Capt Group 3, the rest ########-########
[0-9]{1,12}
[-\s.]?
[0-9]{1,12}?
)
$
There might be something better out there, but this is just a validation wonder that doesn't really work correctly, for the most part, even for that.
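The branch reset group `(?|...)` used above is a PCRE/Perl feature and is not available in JavaScript. A rough equivalent, assuming named groups (ES2018) are acceptable, is sketched below. The names `phoneParts`, `splitPhone`, `areaParen` and `areaPlain` are mine, and I added an optional separator after the country code, which the expanded pattern above does not have.

```javascript
// Two differently named groups stand in for the branch reset; only one of them participates.
const phoneParts = /^(?:\+?(?<country>[1-9]{1,2}))?[-\s.]?(?:\((?<areaParen>\d{1,4})\)|(?<areaPlain>\d{1,4}))[-\s.]?(?<rest>\d{1,12}(?:[-\s.]?\d{1,12})?)$/;

// Hypothetical helper: splits the input into its parts, or returns null on no match.
function splitPhone(input) {
  const m = phoneParts.exec(input);
  if (m === null) return null;
  const g = m.groups;
  return { country: g.country || "", area: g.areaParen || g.areaPlain, rest: g.rest };
}

console.log(splitPhone("+1 (555) 123-4567"));
// { country: '1', area: '555', rest: '123-4567' }

// Without separators the split is arbitrary, which is the overlap problem described above.
console.log(splitPhone("5551234567"));
// { country: '55', area: '5123', rest: '4567' }
```

The second call shows why the groups cannot be trusted for parsing: the digits are simply handed to whichever group is greedy first.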
Regular expressions are not used to format anything. They just tell you if the string you are validating abides by the regular expression's rules. Example would be in a form where a user is entering a phone number. If the string they enter into the form doesn't match the regular expression then the form's validation which uses the regular expression to check the string will say something like, "Phone number is not in correct format."
Given
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
I can use
$line -match '^\{(?<type>[a-z]+)(-\[(?<target>(C|F|CF))\])?(\[(?<tab>\d+)\])?\}_(?<string>.*)'
And $matches['tab'] will correctly have a value of 3. However, if I then want to increment that value without also affecting the [3] in the string section, things get more complicated. I can use $tabIndex = $line.indexOf("[$tab]") to get the index of the first occurrence, and I can also use $newLine = ([regex]"\[$tab\]").Replace($line, '[4]', 1) to replace only the first occurrence. But I wonder, is there a way to get at this more directly? It's not strictly necessary, as I will only ever want to replace things within the initial {}_, which has a very consistent form, so replacing the first instance works. I'm just wondering if I am missing out on a more elegant solution, which might also be needed in a different situation.
I would change the regex a bit, because mixing Named captures with Numbered captures is not recommended, so it becomes this:
'^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)'
You could then use it like below to replace the tab value:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$newTabValue = 12345
$line -replace '^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)', "{`${type}-[`${target}][$newTabValue]}_`${string}"
The result of this will be:
{initError-[cf][12345]}_Invalid nodes(s): [3]
Regex details:
^ Assert position at the beginning of the string
\{ Match the character “{” literally
(?<type> Match the regular expression below and capture its match into backreference with name “type”
[a-z] Match a single character in the range between “a” and “z”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?: Match the regular expression below
- Match the character “-” literally
\[ Match the character “[” literally
(?<target> Match the regular expression below and capture its match into backreference with name “target”
[CF] Match a single character present in the list “CF”
{1,2} Between one and 2 times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
\[ Match the character “[” literally
(?<tab> Match the regular expression below and capture its match into backreference with name “tab”
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
\} Match the character “}” literally
_ Match the character “_” literally
(?<string> Match the regular expression below and capture its match into backreference with name “string”
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
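For comparison, here is the same idea sketched in JavaScript rather than PowerShell: capture what surrounds the tab number and rebuild the string in a replacement callback, so the value can be incremented instead of hard-coded. `tabPattern`, `incrementTab`, and the group names `head` and `tail` are hypothetical; the pattern is a condensed variant of the one above.

```javascript
// head: everything up to and including the "[" before the tab number; tail: the closing "]}_"
const tabPattern = /^(?<head>\{[a-z]+(?:-\[[CF]{1,2}\])?\[)(?<tab>\d+)(?<tail>\]\}_)/i;

// Hypothetical helper: adds 1 to the tab number in the {..}_ prefix only.
function incrementTab(text) {
  return text.replace(tabPattern, (...args) => {
    // With named groups, the groups object is the last argument passed to the callback.
    const { head, tab, tail } = args[args.length - 1];
    return head + (Number(tab) + 1) + tail;
  });
}

console.log(incrementTab("{initError-[cf][3]}_Invalid nodes(s): [3]"));
// {initError-[cf][4]}_Invalid nodes(s): [3]
```

Because the pattern is anchored with ^ and stops at }_, the [3] in the message part is never touched. A line without a tab number is returned unchanged.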
An alternative way of increasing the first number in the brackets is using the -Split operator to access the number you want to change:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$NewLine = $line -split "(\d+)"
$NewLine[1] = [int]$newLine[1] + 1
-join $NewLine
Output:
{initError-[cf][4]}_Invalid nodes(s): [3]
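The split approach carries over to JavaScript, where `String.prototype.split` also keeps the text matched by a capturing group in the result. A small sketch, with `incrementFirstNumber` as a hypothetical name; like the PowerShell version, it assumes the first number in the line is the one to change.

```javascript
// Hypothetical helper: increments the first run of digits found in the line.
function incrementFirstNumber(line) {
  // The capturing group makes split() keep the numbers: [text, number, text, number, ...]
  const parts = line.split(/(\d+)/);
  if (parts.length < 2) return line; // no number present
  parts[1] = String(Number(parts[1]) + 1);
  return parts.join("");
}

console.log(incrementFirstNumber("{initError-[cf][3]}_Invalid nodes(s): [3]"));
// {initError-[cf][4]}_Invalid nodes(s): [3]
```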
I have a manifest file
Bundle-ManifestVersion: 2
Bundle-Name: BundleSample
Bundle-Version: 4
I want to change the value of Bundle-Name using -replace in Powershell.
I used this pattern Bundle-Name:(.*)
But it returns including the Bundle-Name. What would be the pattern if I want to change only the value of the Bundle-Name?
You could capture both the Bundle-Name: and its value in two separate capture groups.
Then replace like this:
$manifest = @"
Bundle-ManifestVersion: 2
Bundle-Name: BundleSample
Bundle-Version: 4
"@
$newBundleName = 'BundleTest'
$manifest -replace '(Bundle-Name:\s*)(.*)', ('$1{0}' -f $newBundleName)
# or
# $manifest -replace '(Bundle-Name:\s*)(.*)', "`$1$newBundleName"
The above will result in
Bundle-ManifestVersion: 2
Bundle-Name: BundleTest
Bundle-Version: 4
Regex details:
( Match the regex below and capture its match into backreference number 1
Bundle-Name: Match the character string “Bundle-Name:” literally (case sensitive)
\s Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
( Match the regex below and capture its match into backreference number 2
. Match any single character that is NOT a line break character (line feed)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
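The two-group technique is not specific to PowerShell. A minimal sketch in JavaScript, where `setBundleName` is a hypothetical helper and `[ \t]*` is my substitution for `\s*` so that the key group cannot run across a line break. Using a callback instead of a replacement string means a new name containing `$1` or similar is inserted literally.

```javascript
const manifest = [
  "Bundle-ManifestVersion: 2",
  "Bundle-Name: BundleSample",
  "Bundle-Version: 4",
].join("\n");

// Hypothetical helper: keeps the key (group 1) and swaps only the value.
function setBundleName(text, newName) {
  // The m flag lets ^ and $ match at line boundaries inside the multi-line string.
  return text.replace(/^(Bundle-Name:[ \t]*).*$/m, (_, key) => key + newName);
}

console.log(setBundleName(manifest, "BundleTest"));
// Bundle-ManifestVersion: 2
// Bundle-Name: BundleTest
// Bundle-Version: 4
```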
Thanks to LotPings, there is even an easier regex that can be used:
$manifest -replace '(?<=Bundle-Name:\s*).*', $newBundleName
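This uses a positive lookbehind. Note that because \s* can also match zero characters, the lookbehind already succeeds directly after the colon, so .* consumes the space in front of the value as well and the result is Bundle-Name:BundleTest. Using \S.* after the lookbehind keeps that space.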
The regex details for that are:
(?<= Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
Bundle-Name: Match the characters “Bundle-Name:” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
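Where a lookbehind first succeeds decides what `.*` goes on to consume, and this is easy to check by experiment. The sketch below uses JavaScript, which also allows variable-length lookbehind; `[ \t]*` is my substitution for `\s*`, and the variable names are hypothetical.

```javascript
const entry = "Bundle-Name: BundleSample";

// The lookbehind succeeds right after the colon, so .* also eats the space before the value.
const withoutSpace = entry.replace(/(?<=Bundle-Name:[ \t]*).*/, "BundleTest");

// Requiring a non-whitespace first character moves the match start past the space.
const withSpace = entry.replace(/(?<=Bundle-Name:[ \t]*)\S.*/, "BundleTest");

console.log(withoutSpace); // Bundle-Name:BundleTest
console.log(withSpace);    // Bundle-Name: BundleTest
```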
I have created the following expression: (.NET regex engine)
((-|\+)?\w+(\^\.?\d+)?)
hello , hello^.555,hello^111, -hello,+hello, hello+, hello^.25, hello^-1212121
It works well except that :
it captures the term 'hello+' but without the '+' : this group should not be captured at all
the last term 'hello^-1212121' as 2 groups 'hello' and '-1212121' both should be ignored
The strings to capture are as follows :
word can have a + or a - before it
or word can have a ^ that is followed by a positive number (not necessarily an integer)
words are separated by commas and any number of white spaces (both not part of the capture)
A few examples of valid strings to capture :
hello^2
hello^.2
+hello
-hello
hello
EDIT
I have found the following expression which effectively captures all these terms, it's not really optimized but it just works :
([a-zA-Z]+(?= ?,))|((-|\+)[a-zA-Z]+(?=,))|([a-zA-Z]+\^\.?\d+)
Ok, there are some issues to tackle here:
((-|+)?\w+(\^.?\d+)?)
^ ^
The + and . should be escaped like this:
((-|\+)?\w+(\^\.?\d+)?)
Now, you'll also get -1212121 there. If your string hello is always letters, then you would change \w to [a-zA-Z]:
((-|\+)?[a-zA-Z]+(\^\.?\d+)?)
\w includes letters, numbers and underscore. So, you might want to restrict it down a bit to only letters.
And finally, to keep the invalid terms from being captured at all, you'll have to use lookarounds. I don't know of any other way to get at the delimiters without hindering the matches:
(?<=^|,)\s*((-|\+)?[a-zA-Z]+(\^\.?\d+)?)\s*(?=,|$)
EDIT: If it cannot be something like -hello^2, and if another valid string is hello^9.8, then this one will fit better:
(?<=^|,)\s*((?:-|\+)?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)(?=\s*(?:,|$))
And lastly, if capturing the words is sufficient, we can remove the lookarounds:
([-+]?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)
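To see which terms the lookaround version actually keeps, it can be run over the sample line from the question. The sketch below uses JavaScript instead of .NET (both support this lookbehind); `terms` and `captured` are hypothetical names.

```javascript
// The lookaround pattern from above, with the g flag so matchAll() can walk the whole line.
const terms = /(?<=^|,)\s*((?:-|\+)?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)(?=\s*(?:,|$))/g;

const input = "hello , hello^.555,hello^111, -hello,+hello, hello+, hello^.25, hello^-1212121";

// Group 1 holds the term itself, without the surrounding whitespace.
const captured = Array.from(input.matchAll(terms), (m) => m[1]);

console.log(captured);
// [ 'hello', 'hello^.555', 'hello^111', '-hello', '+hello', 'hello^.25' ]
```

Both `hello+` and `hello^-1212121` are skipped entirely, which is what the question asked for.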
It would be better if you first state what it is you are looking to extract.
You also don't indicate which Regular Expression engine you're using, which is important since they vary in their features, but...
Assuming you want to capture only:
words that have a leading + or -
words that have a trailing ^ followed by an optional period followed by one or more digits
and that words are sequences of one or more letters
I'd use:
([a-zA-Z]+\^\.?\d+|[-+][a-zA-Z]+)
which breaks down into:
( # start capture group
[a-zA-Z]+ # one or more letters - note \w matches numbers and underscores
\^ # literal
\.? # optional period
\d+ # one or more digits
| # OR
[-+] # literal plus or minus
[a-zA-Z]+ # one or more letters
) # end of capture group
EDIT
To also capture plain words (without leading or trailing chars) you'll need to rearrange the regexp a little. I'd use:
([+-][a-zA-Z]+|[a-zA-Z]+\^(?:\.\d+|\d+\.\d+|\d+)|[a-zA-Z]+)
which breaks down into:
( # start capture group
[+-] # literal plus or minus
[a-zA-Z]+ # one or more letters - note \w matches numbers and underscores
| # OR
[a-zA-Z]+ # one or more letters
\^ # literal
(?: # start of non-capturing group
\. # literal period
\d+ # one or more digits
| # OR
\d+ # one or more digits
\. # literal period
\d+ # one or more digits
| # OR
\d+ # one or more digits
) # end of non-capturing group
| # OR
[a-zA-Z]+ # one or more letters
) # end of capture group
Also note that, per your updated requirements, this regexp captures both ordinary non-negative numbers (e.g. 0, 1, 1.2, 1.23) and those lacking a leading digit (e.g. .1, .12).
FURTHER EDIT
This regexp will only match the following patterns delimited by commas:
word
word with leading plus or minus
word with trailing ^ followed by a positive number of the form \d+, \d+\.\d+, or \.\d+
([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)
Please note that the useful match will appear in the first capture group, not the entire match.
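So, in JavaScript, you could do:
var src = "hello , hello ,hello,+hello,-hello,hello+,hello-,hello^1,hello^1.0,hello^.1",
    RE = /([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)/g,
    m;
// exec() advances lastIndex on a global regex; m[1] is the first capture group
// (this avoids the deprecated RegExp.$1 property)
while ((m = RE.exec(src)) !== null) {
    console.log(m[1]);
}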
which produces:
hello
hello
hello
+hello
-hello
hello^1
hello^1.0
hello^.1
I need a regular expression that will tell if a string is in the following format. The groups of numbers must be comma delimited. Can contain a range of numbers separated by a -
300, 200-400, 1, 250-300
The groups can be in any order.
This is what I have so far, but it's not matching the entire string. It's only matching the groups of numbers.
([0-9]{1,3}-?){1,2},?
Try this one:
^(?:\d{1,3}(?:-\d{1,3})?)(?:,\s*\d{1,3}(?:-\d{1,3})?|$)+
Since you didn't specify the number ranges, I leave this to you. In any case, you shouldn't do math with regex :)
Explanation:
"
^ # Assert position at the beginning of the string
(?: # Match the regular expression below
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
- # Match the character “-” literally
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
, # Match the character “,” literally
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
- # Match the character “-” literally
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$
This should work:
/^([0-9]{1,3}(-[0-9]{1,3})?)(,\s?([0-9]{1,3}(-[0-9]{1,3})?))*$/
You need some repetition:
(?:([0-9]{1,3}-?){1,2},?)+
To ensure that the numbers are correct, i.e. that you don't match numbers like 010, you might want to change the regex slightly. I also changed the range part of the regex, so that you don't match things like 100-200- but only 100 or 100-200, and added support for whitespaces after the comma (optional):
(?:(([1-9]{1}[0-9]{0,2})(-[1-9]{1}[0-9]{0,2})?){1,2},?\s*)+
Also, depending on what you want to capture, you might want to change the capturing brackets () to non capturing ones (?:)
UPDATE
A revised version based on the latest comments:
^\s*(?:(([1-9][0-9]{0,2})(-[1-9][0-9]{0,2})?)(?:,\s*|$))+$
([0-9-]+),\s([0-9-]+),\s([0-9-]+),\s([0-9-]+)
Try this regular expression
^(([0-9]{1,3}-?){1,2},?\s*)+$
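Whichever pattern does the format check, comparing the numbers themselves is easier outside the regex. A minimal sketch in JavaScript: `listPattern` is a variant of the patterns above limited to 1-3 digits per number, and the rule that a range must run low to high is my assumption, since the question does not say. `parseRanges` is a hypothetical name.

```javascript
// Format check only: numbers or ranges separated by a comma and optional whitespace.
const listPattern = /^\d{1,3}(?:-\d{1,3})?(?:,\s*\d{1,3}(?:-\d{1,3})?)*$/;

// Hypothetical helper: returns [{ low, high }, ...] or null if the text is malformed.
function parseRanges(text) {
  if (!listPattern.test(text)) return null;
  const ranges = text.split(/,\s*/).map((item) => {
    // A single number such as "300" becomes the range 300-300.
    const [low, high = low] = item.split("-").map(Number);
    return { low, high };
  });
  // Assumed rule: a range must not run backwards (e.g. "400-200").
  return ranges.every((r) => r.low <= r.high) ? ranges : null;
}

console.log(parseRanges("300, 200-400, 1, 250-300"));
console.log(parseRanges("100-200-")); // null
```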
Need a regular expression to validate username which:
should allow trailing spaces but not spaces in between characters
must contain at least one letter,may contain letters and numbers
7-15 characters max(alphanumeric)
cannot contain special characters
underscore is allowed
Not sure how to do this. Any help is appreciated. Thank you.
This is what I was using but it allows space between characters
"(?=.*[a-zA-Z])[a-zA-Z0-9_]{1}[_a-zA-Z0-9\\s]{6,14}"
Example: user name
No spaces are allowed in user name
Try this:
foundMatch = Regex.IsMatch(subjectString, @"^(?=.*[a-z])\w{7,15}\s*$", RegexOptions.IgnoreCase);
It also allows the use of _ since you allowed this in your attempt.
So basically I use three rules: one to check that at least one letter exists, another to check that the string consists only of alphanumerics plus _, and a last one that accepts trailing spaces and requires at least 7 and at most 15 word characters. You are on the right track. Keep it up and you will be answering questions here too :)
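The three rules can be checked quickly against a few sample names. The sketch below uses the same pattern in JavaScript rather than C#. One difference to keep in mind: in JavaScript `\w` only covers ASCII letters, digits and underscore, while in .NET it also matches other Unicode letters and digits by default. `isValidUsername` is a hypothetical name.

```javascript
// Same pattern as above; the i flag plays the role of RegexOptions.IgnoreCase.
const usernamePattern = /^(?=.*[a-z])\w{7,15}\s*$/i;

// Hypothetical helper wrapping the test.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

console.log(isValidUsername("user_name1"));  // true
console.log(isValidUsername("username   ")); // true  (trailing spaces allowed)
console.log(isValidUsername("user name"));   // false (space in between)
console.log(isValidUsername("1234567"));     // false (no letter)
console.log(isValidUsername("short1"));      // false (fewer than 7 characters)
```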
Breakdown:
"
^ # Assert position at the beginning of the string
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[a-z] # Match a single character in the range between “a” and “z”
)
\w # Match a single character that is a “word character” (letters, digits, etc.)
{7,15} # Between 7 and 15 times, as many times as possible, giving back as needed (greedy)
\s # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
"