NetBIOS Name Regular Expression - c++

I have a question.
According to this link, http://support.microsoft.com/kb/188997:
(
A computer name can be up to 15 alphanumeric characters with no blank spaces. The name must be unique on the network and can contain the following special characters:
! @ # $ % ^ & ( ) - _ ' { } . ~
The following characters are not allowed:
\ * + = | : ; " ? < > ,
)
I am developing in C++ and used the following code, but when I input a character which isn't allowed, it still matches. Why?
regex rgx("[a-zA-Z0-9]*(!|@|#|$|%|^|&|\(|\)|-|_|'|.|~|\\{|\\})*[a-zA-Z0-9]*");
string name;
cin>>name;
if (regex_match(name, rgx))
{
cout << " Matched :) " << endl;
}
else
cout << "Not Matched :(" << endl;
your help will be greatly appreciated :)

Your regular expression will match almost any string. All your quantifiers are "zero or more" (*), so even an empty string matches, and the unescaped . inside the group matches any character at all, including the ones you want to reject. You're also using an unescaped ^ within the group ((...|^|...)), which is the start-of-string anchor rather than a literal caret, so it will never match unless this position is the beginning of the string (which may happen due to the * quantifier, as explained above). The same goes for the unescaped $.
It's a lot easier to achieve what you're trying to do, though:
regex rgx("^[\\w!@#$%^&()\\-'{}\\.~]{1,15}$");
If you're using C++11, you might as well use a raw string for better readability:
regex rgx(R"(^[\w!@#$%^&()\-'{}\.~]{1,15}$)");
This should match all valid names containing between 1 and 15 of the selected characters.
\w matches any "word" character, that is A-Z, a-z, digits, and underscores (and, depending on your locale and regex engine, possibly also umlauts and accented characters). Because of this, it might be better to replace it with A-Za-z\d_ in the above expression:
regex rgx("^[A-Za-z\\d_!@#$%^&()\\-'{}\\.~]{1,15}$");
Or:
regex rgx(R"(^[A-Za-z\d_!@#$%^&()\-'{}\.~]{1,15}$)");
{a,b} is a quantifier matching the previous expression between a and b times (inclusive).
^ and $ force the regular expression to cover the whole string (since they match its beginning and end).
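As a quick cross-check of the character list, here is a minimal sketch of the same validation in Python (chosen only for brevity; the character class is the point). It assumes the allowed specials are exactly the ones quoted from the KB article in the question, including &, and is_valid_netbios_name is a hypothetical helper name, not part of any library.

```python
import re

# Letters, digits, underscore and the special characters the KB article allows; 1 to 15 of them.
# Inside a character class only '-' is escaped here; '^' is literal because it is not first.
NETBIOS_NAME = re.compile(r"[A-Za-z0-9_!@#$%^&()\-'{}.~]{1,15}")

def is_valid_netbios_name(name: str) -> bool:
    """Return True if the whole string consists of 1-15 allowed characters."""
    return NETBIOS_NAME.fullmatch(name) is not None

if __name__ == "__main__":
    for candidate in ["MY-PC_01", "bad*name", "", "A" * 16]:
        print(repr(candidate), is_valid_netbios_name(candidate))
```

fullmatch plays the role of the ^ and $ anchors, so a forbidden character anywhere in the name makes the check fail.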

Look here: http://www.cplusplus.com/reference/regex/ECMAScript/ . There you have something about special characters (with a special meaning for a regex).
For example, ^ has a special meaning in a regex, so you must escape it: \^. Other special characters are: $ \ . * + ? ( ) [ ] { } |.
Also, I think your regex will not allow names like a-b-c (multiple runs of special characters, or more than two runs of alphanumeric characters).
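Both points (unescaped metacharacters, and the fixed "letters, specials, letters" shape) can be checked quickly. This is a sketch in Python rather than C++, on the assumption that the question's pattern was meant to list @ among the specials; ., ^ and $ have the same meaning in both regex dialects.

```python
import re

# The question's pattern as the regex engine sees it: '.', '^' and '$' are unescaped.
unescaped = re.compile(r"[a-zA-Z0-9]*(!|@|#|$|%|^|&|\(|\)|-|_|'|.|~|\{|\})*[a-zA-Z0-9]*")

# Same pattern with the metacharacters escaped.
escaped = re.compile(r"[a-zA-Z0-9]*(!|@|#|\$|%|\^|&|\(|\)|-|_|'|\.|~|\{|\})*[a-zA-Z0-9]*")

# The bare '.' alternative matches any character, so a forbidden '*' slips through.
print(bool(unescaped.fullmatch("a*b")))   # True
print(bool(escaped.fullmatch("a*b")))     # False

# Even when escaped, the shape only allows one run of specials in the middle.
print(bool(escaped.fullmatch("a-b")))     # True
print(bool(escaped.fullmatch("a-b-c")))   # False
```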

Related

Get everything except special words

I need a regex which matches everything except for several words.
The input string is something like:
This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;
It should match
This Is A Test
So I want to have everything around &nbsp;, &lt; and &gt;
I've tried something like [^&nbsp;] to ignore all appearances of &nbsp;, but this excludes each of those characters individually.
/&[a-zA-Z]{2,8};/g
Breakdown:
& - match & literally
[a-zA-Z]{2,8} - match any characters in ranges a-z and A-Z from 2 to 8 times
; - until a semi colon
The longest special character that you could encounter is &thetasym; (ϑ), and so I've taken this into account in the regex.
The proper formatting replaces each of the special characters with a space, and replaces multiple spaces in a row with a single space
let regex = /&[a-zA-Z]{2,8};/g,
string = "This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;",
properlyFormatted = string.replace(regex, " ").replace(/\ +/g, " ");
console.log(properlyFormatted);
The alternative:
/&(?:lt|gt|nbsp);/g
Breakdown:
& - match & literally
(?:lt|gt|nbsp) - match any group in lt, gt, nbsp
; - directly followed by a semi colon
This regex will only take into account the specific characters you described.
let regex = /&(?:lt|gt|nbsp);/g,
string = "This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;",
properlyFormatted = string.replace(regex, " ").replace(/\ +/g, " ");
console.log(properlyFormatted);
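For comparison, a minimal Python sketch of the same two-step replacement, using the generic entity pattern. The input string is an assumption about what the question's data looks like before the browser decodes it; strip_entities is a hypothetical helper name, and the final strip() is an addition that the JavaScript above does not do.

```python
import re

# Any named entity: '&', 2-8 ASCII letters, ';' (same idea as the first regex above).
ANY_ENTITY = re.compile(r"&[a-zA-Z]{2,8};")

def strip_entities(text: str) -> str:
    """Replace each entity with a space, then collapse runs of spaces."""
    spaced = ANY_ENTITY.sub(" ", text)
    return re.sub(r" +", " ", spaced).strip()

print(strip_entities("This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;"))  # This Is A Test
```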

changing RegEx from 3 digits to 4

I'm not that great at RegEx, and have the following piece of code on my hands:
value.replace(/\s*.*(\d+[,\.]\d+)[^\d]*/m, "$1");
Now it works great at reducing this "\r\n\t\t\t\t& #36;0.05 USD\t\t\t" (please note I've intentionally left a space between the & and # as removing it converts it to a dollar sign on the site) to this "0.05". The issue I have is that if the number is a double digit (10.05 rather than 0.05) the expression removes the digit from the front and still outputs 0.05 rather than 10.05.
From what I can see in the expression, it's hard coded to pick up just 3 digits, so I was wondering if there's a way to amend it to also work in cases where there are 4 digits.
The .* after /\s* is matching the first digit if there are 2 or more digits. Remove that and see if it works...
value.replace(/\s*(\d+[,.]\d+)[^\d]/m, "$1");
Given your example of the regex:
/\s*.*(\d+[,.]\d+)[^\d]/m
And the data:
\r\n\t\t\t\t&#36;0.05 USD\t\t\t
\r\n\t\t\t\t&#36;10.05 USD\t\t\t
In the regex, the leading "/" (forward slash) and the "/" before the "m" delimit the regex and are not part of the pattern.
The "\s" in the regex is shorthand for [ \t\r\n\f] which matches whitespace (space, tab, Carriage-return, Line-feed, Form-feed). So, "\s*" will match "\r\n\t\t\t\t"
The "." (dot) in the regex matches any single character (generally any character except "\n").
The "*" following the "." says to match any 0 or more characters. So, together the ".*", matches the "&#36;" (and possibly, additionally, one or more digits... see below).
Next, the "(" in the regex starts the part of the regex that will "capture" part of your data.
The "\d" in the regex will match any 1 digit. In JavaScript "\d" matches only [0-9]; in some other engines (for example .NET) it also matches other digit characters, such as the Eastern Arabic numerals "٠١٢٣٤٥٦٧٨٩".
The "+" following the "\d" says to match any 1 or more numbers (digits).
The "[,.]" in the regex will match one of either a literal "." (dot), or a "," (comma), to match the "decimal" separator.
Another "\d+" to match any 1 or more numbers (digits).
Next, the ")" in the regex closes the part of the regex that will "capture" part of your data.
The "[^\d]" will match any 1 character that is not a number (digit). So, in this case, it will match the " " (space).
The "m" at the end of the regex (following the second "/"): "m" changes the behavior of the "^" and "$" anchors, which are not used in your regex, so the "m" should have no effect. But, if you're using Ruby, "m" changes the behavior of the "." (dot).
Now, the "problem"... the ".*" (before the "("), is in regex terms, "greedy". This means it will match as "early" as possible, and for as "long" as possible. So, if there is more than 1 digit following the ";", then the ".*" will consume some digits.
Note: Using ".*" can cause all sorts of problems, especially with "/m" under Ruby. It's best to avoid using ".*" if possible.
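To see the greedy behaviour concretely, here is a small Python sketch. It assumes the data contains the literal text &#36; as in the question, and it adds an extra capture group around .* purely to show what that part consumed.

```python
import re

sample = "\r\n\t\t\t\t&#36;10.05 USD\t\t\t"

# Same shape as the question's regex, with an extra group around '.*' to see what it ate.
greedy = re.search(r"\s*(.*)(\d+[,.]\d+)[^\d]*", sample)
print(repr(greedy.group(1)))  # '&#36;1'  -> the first digit was swallowed by '.*'
print(greedy.group(2))        # 0.05
```

The engine lets .* take as much as possible and only gives characters back until \d+ can match a single digit, which is why only "0.05" is left for the capture.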
There are 2 ways to fix this.
1) If the part before the number you want to capture is always "&#36;", then specify that in the regex instead of the ".*". So like this:
/\s*&#36;(\d+[,.]\d+)[^\d]/m
or, if it will always be "&#36;" or something very similar to that:
/\s*[^;]+;(\d+[,.]\d+)[^\d]/m
Here, "[^;]+;" means any string of 1 or more characters that does not contain a ";", followed by a ";".
2) If the part before the number you want to capture, which is shown as "&#36;", could be totally different in the data, then you just need to make sure that the part of the regex that is currently ".*" will not match a digit in the last position. So like this:
/\s[^.,]*[^\d](\d+[,.]\d+)[^\d]/m
Here, "[^.,]*[^\d]" means any string of 0 or more characters that does not contain a "." (dot) or a "," (comma), followed by one character that is not a digit.
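Both fixes can be tried against the two sample strings. The sketch below is Python rather than JavaScript and assumes the data contains the literal text &#36; before the number; the variable names are only illustrative.

```python
import re

samples = ["\r\n\t\t\t\t&#36;0.05 USD\t\t\t", "\r\n\t\t\t\t&#36;10.05 USD\t\t\t"]

# Fix 1: consume everything up to and including the first ';' instead of using '.*'.
up_to_semicolon = re.compile(r"\s*[^;]+;(\d+[,.]\d+)[^\d]")

# Fix 2: make sure the character right before the number is not a digit.
non_digit_before = re.compile(r"\s[^.,]*[^\d](\d+[,.]\d+)[^\d]")

for text in samples:
    print(up_to_semicolon.search(text).group(1),
          non_digit_before.search(text).group(1))
```

With either pattern the capture group holds the full number ("0.05" and "10.05") because nothing in front of it is allowed to swallow a digit.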
Try this
value.replace( /\s*.(\d+[,.]\d+)[^\d]/m, "$1");
The .* matches greedily and therefore matches as many characters, including digits, as it can, as long as the rest of the pattern can still match.
The rest of the pattern can still match if just one digit is left for the \d+ to match, so you only end up with one digit there.
If the semicolon in your example is always in that position in the strings you wish to match, use it as a marker like this
value.replace(/.*;(\d+[,\.]\d+).*/m, "$1");

What regular expression can verify the format of this type of string?

I need to verify that a string is in a certain format...here are the rules.
Can contain a colon and/or dot.
Both the colon and dot are optional
If a colon and/or dot is specified there must be at least one character to the left and one character to the right of the colon/dot.
The colon must be before the dot if both are specified
Only 0 or 1 colon and 0 or 1 dot is allowed
AnyString means a string of one or more unicode characters excluding colon and dot (colon and dot characters are not allowed as part of AnyString).
Examples:
Can be...
AnyString:AnyString.AnyString
AnyString:AnyString
AnyString.AnyString
AnyString
Cannot be...
AnyString:.AnyString
AnyString.AnyString:AnyString
AnyString:
AnyString.
:AnyString
.AnyString
I have tried lots of different combinations and I am just not good enough at Regular Expressions to get this one.
Thanks in advance
Well, that looks like:
It definitely starts with one or more non-colon-or-dot characters
It then optionally has a colon followed by one or more non-colon-or-dot characters
It then optionally has a dot followed by one or more non-colon-or-dot characters
If there are both "colon plus X" and "dot plus X" sections, the colon section must come first
(Note that none of your now-edited-in explanation was present when I wrote the above, so it was just based on the examples.)
So I'd expect that to be a regex like this:
^[^.:]+(?::[^.:]+)?(?:\.[^.:]+)?$
Notes:
You'd want to put all of this in a verbatim string literal to avoid having to escape the backslashes, e.g.
var regex = new Regex(@"^[^.:]+(?::[^.:]+)?(?:\.[^.:]+)?$");
^ matches the start of a string
[^.:] will match any character other than dot or colon
+ is the syntax for "at least one"
(?:<subexpression>) is the syntax for a non-capturing group
\. is an escaped dot, as . means "any character"
? is the syntax for "zero or one" (i.e. optional)
$ matches the end of a string
Test code:
using System;
using System.Text.RegularExpressions;
class Test
{
static readonly Regex regex =
new Regex(@"^[^.:]+(?::[^.:]+)?(?:\.[^.:]+)?$");
static void Main()
{
AssertValid("AnyString:AnyString.AnyString",
"AnyString:AnyString",
"AnyString.AnyString",
"AnyString");
AssertInvalid("AnyString:.AnyString",
"AnyString.AnyString:AnyString",
"AnyString:",
"AnyString:..Anystring",
"AnyString.",
":AnyString",
".AnyString");
}
static void AssertValid(params string[] inputs)
{
foreach (var input in inputs)
{
if (!regex.IsMatch(input))
{
Console.WriteLine("Expected to match but didn't: {0}",
input);
}
}
}
static void AssertInvalid(params string[] inputs)
{
foreach (var input in inputs)
{
if (regex.IsMatch(input))
{
Console.WriteLine("Expected not to match but did: {0}",
input);
}
}
}
}
/^[a-z]+[:.]?[a-z]+[:.]?[a-z]+$/i
How about that? That one doesn't include digits. What is "AnyString" allowed to contain?
This appears to meet all of your listed criteria:
^[^.:]+(:[^.:]+)?(\.[^.:]+)?$
Note that I assume AnyString can literally be anything that doesn't contain a colon or period. Also note that I added begin/end line anchors. You can remove those if you want.
This regex translates in human language to:
One or more characters that aren't a colon or period.
Optionally followed by a colon and then one or more characters that aren't a colon or period.
Optionally followed by a period and then one or more characters that aren't a colon or period.
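A short Python sketch of this capturing-group version run against the examples from the question. The lists are copied from the question; everything else is illustrative. It also shows what the two groups contain when they take part in the match.

```python
import re

# Capturing-group version from the answer above.
FORMAT = re.compile(r"^[^.:]+(:[^.:]+)?(\.[^.:]+)?$")

valid = ["AnyString:AnyString.AnyString", "AnyString:AnyString",
         "AnyString.AnyString", "AnyString"]
invalid = ["AnyString:.AnyString", "AnyString.AnyString:AnyString",
           "AnyString:", "AnyString.", ":AnyString", ".AnyString"]

for text in valid:
    match = FORMAT.match(text)
    # group(1) is the optional colon part, group(2) the optional dot part (None if absent).
    print(text, match.group(1), match.group(2))

for text in invalid:
    print(text, FORMAT.match(text))  # None for every invalid input
```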
try this
(.+):(.+)\.(.+)|(.+)\.(.+)
your match-rule is quite simple if we correctly break it down into logical pieces.
I will take the maximum of possible structure that your strings can contain
that is
TEXT:TEXT.TEXT
I'll break that structure as follows:
**TEXT** then **:CHARACTER** then **TEXT** then **.CHARACTER** then **TEXT**
This break-down implies
1. your text starts with a letter, then 0 or more series of letters follow
2. after it can either contain or not contain a colon which is immediately followed by a letter
3. then again 0 or more series of letter; pay attention here **0 or more**
4. then it can contain or not contain a dot immediately followed by a letter
5. then again 0 or more series of letters; pay attention here **0 or more**
In classic regex definition language your regular expression will look like this:
[A..Za..z]+ (:[A..Za..z]){0,1} [A..Za..z]* (\.[A..Za..z]){0,1} [A..Za..z]*
I have separated the pieces that define the points 1 to 5 above for your ease of reading.
In actual usage there should be no blank spaces in the regex.
Hope this was of assistance.
Cheers.
Here is a basic version: (?:[^:.]+:)?(?:[^:.]+\.)?[^:.]+. If you define Anystring more rigorously, this can be improved.
Your basic requirement looks like it has 3 parts. Zero or one "Anystring"s followed by a colon, then zero or one "Anystring"s followed by a dot, followed by a mandatory "Anystring". This is reflected in the structure of the regex.
Given the information I have, I am considering [^:.]+ to be a regex that matches an Anystring, since the only constraints are
that it cannot be zero length
that it cannot contain colons or dots, (which is implied by the fact that a maximum of one colon and one period is allowed)
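One caveat with this basic version is that it has no anchors, so it has to be applied to the whole input. A minimal Python sketch of the difference (fullmatch is Python's way of requiring the entire string to match, which is equivalent to wrapping the pattern in ^ and $):

```python
import re

PREFIX_STYLE = re.compile(r"(?:[^:.]+:)?(?:[^:.]+\.)?[^:.]+")

# Unanchored search finds a valid-looking piece inside an invalid string...
print(PREFIX_STYLE.search("AnyString:").group())   # AnyString
# ...so the whole input has to be matched.
print(PREFIX_STYLE.fullmatch("AnyString:"))        # None
print(bool(PREFIX_STYLE.fullmatch("AnyString:AnyString.AnyString")))  # True
```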

Need a regular expression for alphanumeric with 1 hyphen present and space in between words

Can you please provide me with a regular expression that would
Allow only alphanumeric
Have definitely only one hyphen in the entire string
Hyphen or spaces not allowed at the front and back of the string
no consecutive space or hyphens allowed.
hyphen and one space can be present near each other
Valid - "123-Abc test1","test- m e","abc slkh-hsds"
Invalid - " abc ", " -hsdj sdsd hjds- "
Thanks for helping me out on the same. Your help is much appreciated
/^([a-zA-Z0-9] ?)+-( ?[a-zA-Z0-9])+$/
EDIT:
If there can't be a space on both sides of the hyphen, then there needs to be a little more:
/^([a-zA-Z0-9] ?)+-(((?<! -) )?[a-zA-Z0-9])+$/
                    ^^^^^^^^ ^
Alternatively, if negative lookbehind assertions aren't supported (e.g. in JavaScript), then an equivalent regex:
/^([a-zA-Z0-9]( (?!- ))?)+-( ?[a-zA-Z0-9])+$/
              ^ ^^^^^^^ ^
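The three variants can be compared side by side. This is a sketch in Python, whose re module supports both the fixed-width lookbehind and the lookahead form; the input "a - b" is an added example of a hyphen with a space on both sides.

```python
import re

BASE = re.compile(r"^([a-zA-Z0-9] ?)+-( ?[a-zA-Z0-9])+$")
LOOKBEHIND = re.compile(r"^([a-zA-Z0-9] ?)+-(((?<! -) )?[a-zA-Z0-9])+$")
LOOKAHEAD = re.compile(r"^([a-zA-Z0-9]( (?!- ))?)+-( ?[a-zA-Z0-9])+$")

for text in ["123-Abc test1", "test- m e", "abc slkh-hsds", " abc ", "a - b"]:
    print(repr(text), bool(BASE.match(text)),
          bool(LOOKBEHIND.match(text)), bool(LOOKAHEAD.match(text)))
```

All three accept the valid examples and reject " abc "; only the first one accepts "a - b".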
Only alphanumeric (hyphen and space included, otherwise it'd make no sense):
^[\da-zA-Z -]+$
This is the main part that will match the string and makes sure that every character is in the given set. I.e. digits and ASCII letters as well as space and hyphen (the use of which will be restricted in the following parts).
Only one hyphen and none at the start or end of the string:
(?=^[^-]+-[^-]+$)
This is a lookahead assertion making sure that the string starts and ends with at least one non-hyphen character. A single hyphen is required in the middle.
No space at the start or end of the string:
(?=^[^ ].*[^ ]$)
Again a lookahead, similar to the one above. They could be combined into one, but it looks much messier and is harder to explain.
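No consecutive spaces (consecutive hyphens are already ruled out by the single-hyphen lookahead above). The pattern looks for two spaces in a row, written here as " {2}" so that both spaces stay visible:
(?!.* {2})
Putting it all together:
(?!.* {2})(?=^[^ ].*[^ ]$)(?=^[^-]+-[^-]+$)^[\da-zA-Z -]+$
Quick PowerShell test:
PS> $re='(?!.* {2})(?=^[^ ].*[^ ]$)(?=^[^-]+-[^-]+$)^[\da-zA-Z -]+$'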
PS> "123-Abc test1","test- m e","abc slkh-hsds"," abc ", " -hsdj sdsd hjds- " -match $re
123-Abc test1
test- m e
abc slkh-hsds
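The same composition of lookaheads can be written with one requirement per line. This is a Python sketch of the idea, not a translation of the PowerShell session; the extra inputs with a double space and with two hyphens are added examples.

```python
import re

COMBINED = re.compile(
    r"(?!.* {2})"          # no two consecutive spaces anywhere
    r"(?=^[^ ].*[^ ]$)"    # no space at the start or end
    r"(?=^[^-]+-[^-]+$)"   # exactly one hyphen, not at either end
    r"^[\da-zA-Z -]+$"     # only digits, letters, space and hyphen
)

tests = ["123-Abc test1", "test- m e", "abc slkh-hsds",
         " abc ", " -hsdj sdsd hjds- ", "ab  c-d", "a-b-c"]
for text in tests:
    print(repr(text), bool(COMBINED.match(text)))
```

Each lookahead checks one rule without consuming input, so the final character-set check still runs over the whole string.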
Use this regex:
^(.+-.+)[\da-zA-Z]+[\da-zA-Z ]*[\da-zA-Z]+$

Regular Expression - Only match Alphanumerics and a SINGLE whitespace between words

I'm new to Regular Expressions...
I've been asked for a regular expression that accepts alphanumerics, a few more characters, and only ONE whitespace between words.
For example :
This should match :
"Hello world"
This shouldn't :
"Hello  world" (two spaces between the words)
Any ideas?
This was my expression:
[\w':''.'')''(''\[''\]''{''}''-''_']+$
I already tried the \s? (the space character once or never - right? ) but I didn't get it to work.
Using Oniguruma regex syntax, you could do something like:
^[\w\.:\(\)\[\]{}\-_](?: ?[\w\.:\(\)\[\]{}\-_])*$
Assuming that the 'other characters' are . : () [] {} - _
This regex will match a string that must begin and end with a word character or one of the other allowed characters and cannot have more than one space in a row.
If you're using the x flag (ignore whitespace in regular expression) you'll need to do this instead:
^[\w\.:\(\)\[\]{}\-_](?:\ ?[\w\.:\(\)\[\]{}\-_])*$
The only difference is the \ in front of the space.
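Python's re.VERBOSE flag behaves like the x flag in this respect (layout whitespace in the pattern is ignored unless escaped or inside a character class), so the escaped-space form can be sketched as follows. This is an illustration in Python, not Oniguruma, and the underscore is left out of the class because \w already covers it.

```python
import re

# Verbose-mode version: layout whitespace is ignored, so the literal space is written as '\ '.
SINGLE_SPACED = re.compile(
    r"""
    ^ [\w.:()\[\]{}\-]            # first character
    (?: \ ? [\w.:()\[\]{}\-] )*   # optional single space, then another allowed character
    $
    """,
    re.VERBOSE,
)

for text in ["Hello world", "Hello  world", "Hello world "]:
    print(repr(text), bool(SINGLE_SPACED.match(text)))
```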
What about:
^[\w\.:\(\)\[\]{}\-]+( [\w\.:\(\)\[\]{}\-]+)*$
Matches:
^[\w\.:\(\)\[\]{}\-]+: line begins with 1 or more acceptable characters (underscore is included in \w).
( [\w\.:\(\)\[\]{}\-]+): look to include a single separator character and 1 or more acceptable characters.
*$: repeat single separator and word 0 or more times.
Tested:
Hello(space)World: TRUE
Hello(space)(space)World: FALSE
Hello: TRUE
Hello(space): FALSE
Hello(tab)World: FALSE
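The table above can be reproduced with a few lines of Python. This is a sketch using fullmatch in place of the ^ and $ anchors; the expected values are the ones listed in the answer.

```python
import re

WORDS = re.compile(r"[\w.:()\[\]{}\-]+( [\w.:()\[\]{}\-]+)*")

cases = {
    "Hello World": True,
    "Hello  World": False,   # two spaces
    "Hello": True,
    "Hello ": False,         # trailing space
    "Hello\tWorld": False,   # tab instead of space
}
for text, expected in cases.items():
    print(repr(text), WORDS.fullmatch(text) is not None, expected)
```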