RegEx for company name variations

I have a requirement to accept string values ONLY when they meet the following criteria:
1) Can start with a special character if required
2) Must start with a capital letter (even if the first character is a special character)
3) The string value must not have 2 special characters in a row (consecutive)
4) The string value must not have 2 spaces in a row (consecutive)
5) Accented characters are allowed (e.g. fadas)
6) Enclosed values at the start or end of the string are valid but must be inside parentheses (e.g. (Ltd))
7) Numerics are allowed anywhere in the string value
I have the following regex value: ^(\(([^)]+)\))?[\@\#\$\%\&\*\(\)\-\_\+\]\[\'\;\:\?\.\,\!]?\p{Lu}+[\s'-]?\p{L}+(?:[\s'-]\p{L}+)+(\(([^)]+)\))*$
This works for the following test values:
Éast-Shipping-ltd
Éast-Shipping(LTD)
But it fails for the following example:
Éast-123Shipping(LTD)
Is there any way to allow numerics mid-string?
I have tried [0-9] variations, [A-Za-z0-9] variations and \p{N} variations, but to no avail.
Many thanks for your time.

This is a REALLY nasty pattern, but I was able to simplify it a bit and do what you wanted:
^(\(([^)]+)\))?[[:punct:]]?\p{Lu}+(?:[\s'-]?[\p{L}\d]+)+(\(([^)]+)\))*$
There are lots of useful shorthand character classes, including the POSIX class [[:punct:]], which I used to replace your massive punctuation character class. To add the ability to include numbers, I put \p{L} in a character class together with the \d token, which matches any digit (with Unicode matching enabled, in engines that support it, this includes digits from other scripts).
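The simplified pattern can be tried outside PCRE with a short JavaScript sketch, assuming JavaScript as the host language. JavaScript has no [[:punct:]], so it is approximated here by the ASCII punctuation ranges !-/, :-@, [-` and {-~ (an assumption about which special characters are wanted). The u flag is required for \p{Lu} and \p{L}, and in JavaScript \d matches only ASCII 0-9 even with that flag. The name COMPANY_NAME is illustrative.

```javascript
// Sketch of the simplified company-name pattern in JavaScript (u flag needed for \p{...}).
// [[:punct:]] is not supported, so it is approximated by the ASCII punctuation
// ranges !-/  :-@  [-`  {-~  (assumed to be the wanted "special characters").
const COMPANY_NAME =
  /^(\(([^)]+)\))?[!-\/:-@\[-`{-~]?\p{Lu}+(?:[\s'-]?[\p{L}\d]+)+(\(([^)]+)\))*$/u;

for (const name of ["Éast-Shipping-ltd", "Éast-Shipping(LTD)", "Éast-123Shipping(LTD)"]) {
  console.log(name, COMPANY_NAME.test(name));
}
```

Because [\s'-]? is optional and must be followed by at least one letter or digit, two spaces in a row (rule 4) are rejected, e.g. "Acme  Ltd" does not match.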

Here is a pattern that accepts the characters commonly found in company names:
^[0-9A-Za-zÀ-ÿ\s,._+;()*~'@#!?&-]+$

Related

character 0: character set expected

I want to define a table name with a regular expression, following the rules defined here:
Always begin a name with a letter, an underscore character (_), or a backslash (\). Use letters, numbers, periods, and underscore characters for the rest of the name.
Exceptions: You can’t use "C", "c", "R", or "r" for the name, because they’re already designated as a shortcut for selecting the column or row for the active cell when you enter them in the Name or Go To box.
let lex_valid_characters_0 = ['a'-'z' 'A'-'Z' '_' '\x5C'] ['a'-'z' 'A'-'Z' '0'-'9' '.' '_']+
let haha = ['C' 'c' 'R' 'r']
let lex_table_name = lex_valid_characters_0 # haha
But it gives me the error "character 0: character set expected". Could anyone help?
Here is the description of # from the manual:
regexp1 # regexp2
(difference of character sets) Regular expressions regexp1 and regexp2 must be character sets defined with [… ] (or a single character expression or underscore _). Match the difference of the two specified character sets.
The description says the two sets must be character sets defined with [ ... ], but your definition of lex_valid_characters_0 is far more complex than that.
The idea of # is that it defines a pattern that matches exactly one character from a set specified as the difference of two one-character patterns. So it doesn't make sense to apply it to lex_valid_characters_0, which matches strings of arbitrary length.
Update
Here is my thinking on the problem, for what it's worth. There are no extra restrictions on names that are 2 or more characters long (as I read the spec). So it shouldn't be too difficult to specify a regular expression for these names. And it also wouldn't be that hard to come up with a regular expression that defines all the valid 1-character names. The full set of names is the union of these two sets.
You could also use the fact that ocamllex picks the longest match and, among matches of equal length, the first rule. I.e., you could put rules for the 4 special cases before the general rule.
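The union described above (one-character names minus the four exceptions, plus all names of two or more characters) can be checked with a short JavaScript sketch; JavaScript is used only for illustration and is not ocamllex. In ocamllex itself, the one-character part should be expressible with # precisely because both operands are then character sets, e.g. ['a'-'z' 'A'-'Z' '_' '\\'] # ['C' 'c' 'R' 'r']. In the regex below, that difference is spelled out as ranges that skip C, c, R and r. The names TABLE_NAME and isValidTableName are hypothetical.

```javascript
// Hypothetical validator for the naming rules quoted in the question.
// Branch 1: a one-character name, i.e. [A-Za-z_\\] minus C, c, R and r
//           (the ranges A-B, D-Q, S-Z skip C and R; likewise for lowercase).
// Branch 2: an allowed first character followed by one or more letters,
//           digits, periods or underscores; no extra restrictions apply here.
const TABLE_NAME = /^(?:[A-BD-QS-Za-bd-qs-z_\\]|[A-Za-z_\\][A-Za-z0-9._]+)$/;

function isValidTableName(name) {
  return TABLE_NAME.test(name);
}

console.log(["a", "C", "Cx", "_t.1", "1abc"].map(isValidTableName));
// [ true, false, true, true, false ]
```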

Regex: not all BLANKS but allow certain characters, with limit

I'm trying to come up with a regex, or combination of regexes, that returns false if the user a) entered only blanks, or b) entered "non-legal" characters. Lastly, the number of characters has a set limit.
The closest I have so far is below. It fails because it does not count leading spaces: only the non-blank characters are counted toward the limit. I'm using JS.
const reg = /^([ ]*[!-~\u2018-\u201d\u2013\u2014]){1,10}$/;
EDIT: I think the above is incorrect, and I meant to post this:
const re4 = /^(?!\s*$)[!-~\u2018-\u201d\u2013\u2014]{1,10}$/;
EDIT 2: this has less clutter; allow space and all other 'standard' keyboard chars:
const re5 = /^(?!\s*$)[!-~]{1,10}$/;
So, this says you can enter a bunch of spaces, and must include at least 1 other character from the list following; but the {1,10} only counts the non-spaces and so I can end up with too many in total.
EDIT:
So, using re5 above --
s = ' '; // should fail
s = ' blah blah'; // should pass
s = '  blah blah'; // should fail, as there are 11 characters
Try ^(?:\s*\S){1,10}\s*$
This allows 1-10 non-whitespace characters; change \S to a class of the allowed characters.
Update 2: After learning that you cannot invert the match result in code, here's one last suggestion using negative lookahead (like you already tried yourself).
This regex matches only strings of 1-10 non-banned characters that are not all whitespace:
const re4 = /^(?!\s+$)[^\!-\~\u2018-\u201d\u2013\u2014]{1,10}$/
Update 1: Use this regex to match all-whitespace string OR strings longer than 10 chars OR strings containing bad characters:
const re4 = /(^\s+$|^.{11,}$|[\!-\~\u2018-\u201d\u2013\u2014])/
I understand that you want to impose a length restriction via regex. I would suggest against that and recommend using str.length instead.
This regex will match whitespace-only strings and strings containing one or more bad characters:
const re4 = /(^\s+$|[\!-\~\u2018-\u201d\u2013\u2014])/;
Regarding prohibition of all-whitespace strings: instead of packing it into a regex, you might consider something more explicit like if (s.trim().length == 0). IMO this makes your intention clearer and your code probably more readable, leaving you with this easy-to-read regex:
// matches any string containing a *bad* character
const re4 = /[\!-\~\u2018-\u201d\u2013\u2014]/;
If you use trim for the all-whitespace check, you might turn your regex into a positive match, even with a length restriction:
// matches any string consisting of 1-10 characters not considered *bad*
const re4 = /^[^\!-\~\u2018-\u201d\u2013\u2014]{1,10}$/;
To match the input when it’s 1 to 10 chars long and not all blanks, use a negative lookahead to assert that it is not all blanks:
^(?! *$).{1,10}$
If you want to restrict allowable chars, change the dot to a suitable character class of allowable chars.
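A minimal JavaScript sketch of both approaches discussed above, checked against the examples from the question. It assumes the allowed characters are printable ASCII including the space, written [ -~] (the EDIT 2 class [!-~] starts at ! and so excludes the space). In the question's first attempt, {1,10} repeats the group ([ ]*[...]), so each repetition can absorb any number of spaces and only the non-space characters are counted; applying {1,10} to a single character class counts every character. The names LIMITED and isValidExplicit are illustrative.

```javascript
// Assumption: allowed characters are printable ASCII, space (0x20) through tilde (0x7E).
// (?!\s*$) rejects all-whitespace input; {1,10} counts every character, spaces included.
const LIMITED = /^(?!\s*$)[ -~]{1,10}$/;

// The same rules with the whitespace and length checks done in code,
// leaving the regex to check only the character set.
function isValidExplicit(s) {
  return s.trim().length > 0 && s.length <= 10 && /^[ -~]*$/.test(s);
}

for (const s of [" ", " blah blah", "  blah blah"]) {
  console.log(JSON.stringify(s), LIMITED.test(s), isValidExplicit(s));
}
```

To also allow the curly quotes and dashes from the question, add \u2018-\u201d\u2013\u2014 to both character classes.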

Regex for string representation of a method call

I have a string that follows a specific pattern like so:
operator(field,value)
and I'd like to use regex to extract all three of operator, field and value. I'm struggling to come up with the syntax for capturing these. In this case the value can be alphanumeric as well, for example:
"contains(name, Joe)"
or "lt(quantity, 2.5)"
Use something like this to capture the groups. You may want to limit the accepted characters with []. Note the use of backticks (`), which make a raw string literal, and of \ to escape the parentheses within the regexp:
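package main
import (
	"fmt"
	"regexp"
)
func main() {
	re := regexp.MustCompile(`(.+)\((.+),\s?(.+)\)`) // raw string literal: backslashes reach the regexp unchanged
	tests := []string{"contains(field, value)", "contains(name, Joe)", "lt(quantity, 2.5)", "plus(no,44)"}
	for _, t := range tests {
		fmt.Println("result", re.FindStringSubmatch(t))
	}
}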
https://play.golang.org/p/43YLTafgQt
output:
result [contains(field, value) contains field value]
result [contains(name, Joe) contains name Joe]
result [lt(quantity, 2.5) lt quantity 2.5]
result [plus(no,44) plus no 44]
Depending on how strict you want to be, you could use [a-z]+ or similar instead of .+ to match only certain characters, but if you are not worried about bogus values, this would probably be fine.
I don't know golang, but I do know regexes, so I'll do what I can here.
You probably want a group each for the "operator", "field", and "value". I'm going to assume for now that each of these can be represented as any combination of alphabetic, numeric, or underscore characters, with a length of at least one character. In regex, we have a shortcut for that: \w represents a single alphanumeric or underscore character, and the + modifier means "one or more". So \w+ means one or more such characters in a row. If you want a more complex definition of what these fields can be named, I'll let you specify that in your question.
You say that you want to support "operator(field,value)". I'll start without whitespace anywhere, because it's simpler and you can easily remove all whitespace yourself before running the regex. We'll later add some whitespace support to the regex if you want it, but it'll make life difficult.
To do this, we want three groups, "1(2,3)" where 1 is the operator name, 2 is the field name, and 3 is the value name. Each of these, as given above, will be \w+ in our regex. We'll want to match the open and close parentheses as well as the comma, but we'll throw them away because they're really just delimiters. The parentheses will need to be escaped in the regex, since regexes have a special meaning for parentheses. The result looks like:
(\w+)\((\w+),(\w+)\)
\ 1 /  \ 2 / \ 3 /
Where the second line shows you where the groups are each defined.
If you want to support some whitespace, you'll need to add \s* in all such locations. This gets hairy, but you can do it as such:
(\w+)\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)
\ 1 /        \ 2 /       \ 3 /
You give an example of wanting to support floating point values, and I presume other kinds of values too. You can accomplish this using the "or" pipe, |. For example, group 3, instead of just being \w+, could be defined as
[a-zA-Z_]\w*|\d+\.?|\d*\.\d+
This string will support alphanumeric+underscore strings where the first character must be alphabetic or underscore, OR integers, OR floating point (defined as an integer string with a period at the beginning, middle, or end). Clearly, this can go on and on to support more complex string values, but you get the idea.
So the final regex might look like:
(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w*|\d+\.?|\d*\.\d+)\s*\)
Sorry for not giving any golang help; I hope someone else can edit my answer and fill in that major gap.
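To see what the three groups of the final regex capture, here is a short JavaScript sketch (JavaScript, not Go). It writes the identifier alternative as [a-zA-Z_]\w*, matching the value definition given earlier, so one-letter values are accepted, and it adds ^ and $ anchors so the whole input must have the operator(field, value) shape; both choices are assumptions. parseCall is a hypothetical helper.

```javascript
// Final pattern from the answer, anchored with ^...$ (an assumption).
// Group 1: operator, group 2: field, group 3: value (identifier, integer or decimal).
const CALL = /^(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w*|\d+\.?|\d*\.\d+)\s*\)$/;

// Hypothetical helper: returns { operator, field, value }, or null if the input does not match.
function parseCall(s) {
  const m = CALL.exec(s);
  return m ? { operator: m[1], field: m[2], value: m[3] } : null;
}

console.log(parseCall("contains(name, Joe)"));
console.log(parseCall("lt(quantity, 2.5)"));
console.log(parseCall("plus(no,44)"));
```

For "lt(quantity, 2.5)" the alternative \d+\.? first tries "2." and fails at "5", so the engine backtracks into \d*\.\d+, which captures "2.5".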

Regular Expression for the Pattern?

I'm required to write a regular expression that has the following rules:
Digits between 1 and 4
A hyphen (only one, and it can occur at any position)
Length of text must be less than or equal to 6 (including the potential hyphen)
May end with a letter or a number, but not a hyphen.
Some valid examples are:
1-3411
12-413
123-2A
11-1
These examples are invalid:
12--11 (since it contains two hyphens)
1-2345 (since it contains the digit 5)
11-2311 (since its length is more than 6)
The RegEx that I wrote is:
^[1-4]-[1-4]{4}|^[1-4]{2}-[1-4]{3}|^[1-4]{3}-[1-4]{2}|^[1-4]{4}-[1-4]
However, this does not seem to be working, and it doesn't handle the case of a single character (a letter) at the end.
Can someone please help me determine a way of handling this?
If a letter occurs in the last position, the character before it must be a digit, not a hyphen.
e.g. 11-a (must fail)
11-1a (must pass)
^(?!(?:[^-\n]*-){2})(?:[1-4-]{1,5}[1-4]|[1-4-]{1,5}[a-zA-Z])$
You can handle that using a lookahead. See demo:
https://regex101.com/r/tS1hW2/16
If you have such a complex requirement, it is often easiest to use lookarounds to form an AND pattern that checks each condition at the same time. Sometimes you need to split ONE condition into two:
Base match: 6 or fewer characters: ^.{1,6}$
(AND) Only 1-4, hyphen and letters: ^[1-4a-z\-]+$ (not accurate on its own, requires the next line)
(AND) First 1...5 elements contain NO letter: ^[1-4\-]{1,5}[1-4a-z]$
(AND) No double hyphen and not at the end: ^[^-]*-[^-]+$
Putting all together leads to:
(?=^[1-4\-]{1,5}[1-4a-z]$)(?=^[^-]*-[^-]*$)(?=^[1-4a-z\-]+$)^.{1,6}$
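Combining the lookahead ideas from both answers, here is a minimal JavaScript sketch checked against every example in the question, including the 11-a / 11-1a rule from the edit. It rests on assumptions not stated outright in the question: exactly one hyphen is required (all the valid examples have one), the only letter allowed is a single final letter in either case (as in 123-2A), and that letter must follow a digit. The constant name CODE is hypothetical.

```javascript
// (?=.{1,6}$)       total length 1-6, hyphen included
// (?=[^-]*-[^-]*$)  exactly one hyphen (assumption: the hyphen is required)
// [1-4-]*[1-4]      digits 1-4 and the hyphen; the last of them must be a digit,
//                   so the string cannot end in a hyphen and a letter never follows one
// [A-Za-z]?         optional single trailing letter
const CODE = /^(?=.{1,6}$)(?=[^-]*-[^-]*$)[1-4-]*[1-4][A-Za-z]?$/;

const valid = ["1-3411", "12-413", "123-2A", "11-1", "11-1a"];
const invalid = ["12--11", "1-2345", "11-2311", "11-a", "12-"];
console.log(valid.every((s) => CODE.test(s)), invalid.some((s) => CODE.test(s)));
// true false
```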

Regular expression for password (at least 2 digits and one special character and minimum length 8)

I have been searching for a regular expression that accepts passwords with at least two digits and one special character and a minimum length of 8. So far I have done the following: [0-9a-zA-Z!@#$%0-9]*[!@#$%0-9]+[0-9a-zA-Z!@#$%0-9]*
Something like this should do the trick.
^(?=(.*\d){2})(?=.*[a-zA-Z])(?=.*[!@#$%])[0-9a-zA-Z!@#$%]{8,}$
(?=(.*\d){2}) - uses lookahead (?=) and says the password must contain at least 2 digits
(?=.*[a-zA-Z]) - uses lookahead and says the password must contain an alpha
(?=.*[!@#$%]) - uses lookahead and says the password must contain 1 or more special characters which are defined
[0-9a-zA-Z!@#$%] - dictates the allowed characters
{8,}$ - says the password must be at least 8 characters long; the $ anchor makes the allowed-character rule apply to the whole string
It might need a little tweaking, e.g. specifying exactly which special characters you need, but it should do the trick.
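For a quick check of the lookahead approach, here is a JavaScript sketch. It assumes the special characters are !@#$% and keeps the closing $ so the allowed-character class covers the whole password; without that anchor, disallowed characters after the first eight would still be accepted. The name PASSWORD is illustrative.

```javascript
// (?=(?:.*\d){2})         at least two digits, anywhere (not necessarily adjacent)
// (?=.*[a-zA-Z])          at least one letter
// (?=.*[!@#$%])           at least one special character (assumed set: ! @ # $ %)
// [0-9a-zA-Z!@#$%]{8,}$   only allowed characters, at least 8 of them
const PASSWORD = /^(?=(?:.*\d){2})(?=.*[a-zA-Z])(?=.*[!@#$%])[0-9a-zA-Z!@#$%]{8,}$/;

for (const pw of ["abc1def2!", "abcdefg1!", "ab1cd2!", "abc1def2!?"]) {
  console.log(pw, PASSWORD.test(pw));
}
```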
There is no reason, whatsoever, to implement all rules in a single regex.
Consider doing it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern[] pwdrules = new Pattern[] {
    Pattern.compile("........"),                  // at least 8 chars
    Pattern.compile("\\d.*\\d"),                  // at least 2 digits (backslashes doubled inside a Java string)
    Pattern.compile("[-!\"§$%&/()=?+*~#'_:.,;]")  // 1 special char (the " is escaped for the Java string)
};
String password = ......;
boolean passed = true;
for (Pattern p : pwdrules) {
    Matcher m = p.matcher(password);
    if (m.find()) continue;  // find() searches anywhere in the password
    System.err.println("Rule " + p + " violated.");
    passed = false;
}
if (passed) { .. ok case.. }
else { .. not ok case ... }
This has the added benefit that password rules can be added, removed or changed without effort. They can even reside in some resource file.
In addition, it is just more readable.
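The same rule-list idea, sketched in JavaScript rather than Java: each rule is a separate regex, and the function reports which ones fail instead of printing. The rule descriptions, the object shape and the name checkPassword are assumptions for illustration; the three patterns mirror the Java ones.

```javascript
// Each rule is tested independently; RegExp.prototype.test searches anywhere
// in the string, like Matcher.find() in the Java version.
const PASSWORD_RULES = [
  { description: "at least 8 characters", pattern: /.{8}/ },
  { description: "at least 2 digits", pattern: /\d.*\d/ },
  { description: "at least 1 special character", pattern: /[-!"§$%&\/()=?+*~#'_:.,;]/ },
];

// Hypothetical helper: returns the descriptions of all violated rules (empty array = OK).
function checkPassword(password) {
  return PASSWORD_RULES
    .filter((rule) => !rule.pattern.test(password))
    .map((rule) => rule.description);
}

console.log(checkPassword("abc1def2!"));
console.log(checkPassword("short"));
```

Adding or removing a rule only touches the PASSWORD_RULES array, which is the maintainability benefit described above.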
Try this one:
^(?=.*\d{2,})(?=.*[$-/:-?{-~!"^_`\[\]]{1,})(?=.*\w).{8,}$
Here's a short explanation of how it works:
(?=.*\d{2,}) this part expects at least 2 digits (as written, they must be adjacent; see the UPD below)
(?=.*[$-/:-?{-~!"^_`\[\]]{1,}) these are special characters, at least 1
(?=.*\w) requires at least one word character (equivalent to [A-Za-z0-9_])
.{8,}$ this one says at least 8 characters, in addition to all the previous rules.
UPD
The regexp should look like this: ^(?=(.*\d){2,})(?=.*[$-\/:-?{-~!"^_'\[\]]{1,})(?=.*\w).{8,}$
Check out comments for more details.
Try this regex. It uses lookaheads to verify that there are at least two digits and at least one of the special characters you listed.
^(?=.*?[0-9].*?[0-9])(?=.*[!@#$%])[0-9a-zA-Z!@#$%0-9]{8,}$
EXPLANATION
^ #Match start of line.
(?=.*?[0-9].*?[0-9]) #Look ahead and see if you can find at least two digits. Expression will fail if not.
(?=.*[!@#$%]) #Look ahead and see if you can find at least one of the characters in brackets []. Expression will fail if not.
[0-9a-zA-Z!@#$%0-9]{8,} #Match at least 8 of the characters inside brackets [] to be successful.
$ # Match end of line.
Regular expressions define a structure on the string you're trying to match. Unless you define a spatial structure on your regex (e.g. at least two digits followed by a special char, followed by ...) you cannot use a regex to validate your string.
Try this: ^.*(?=.{8,15})(?=.*\d)(?=.*\d)[a-zA-Z0-9!@#$%]+$
Please read the link below on writing a password-policy regular expression:
Regex expression for password rules