Regex confusion - regex

I know that there are a lot of topics like this one. I've spent many hours checking expressions to make my code work. I don't really understand how regexes work, so I hope you can help me out.
I want to validate these inputs (I hope I am not pushing it):
Only letters (with latin characters too)
Address (including dots, commas, colon, number sign and hyphen)
Telephone (numbers and hyphen)
like:
/[a-zA-ZÑñÁáÉéÍíÓóÚú]+$/ /* Only letters */
/[a-zA-Z0-9\sñáéíóúü .,:#-]+$/ /* Address */
/^[\d-]+$/ /* Telephone */
They work fine when I include a special character at the end of the string, but if I enter that special character between accepted characters, the validation does not catch it. Allow me to give an example:
For the "Only letters" expression:
ab[(% - Does not pass
a[(%b - It passes and it shouldn't!
Thanks a lot for your time, any help will be appreciated!

You forgot the ^ start-of-string anchor at the beginning of the first two patterns.
See demo 1:
^[a-zA-ZÑñÁáÉéÍíÓóÚú]+$
^
Same with the second regex. There, you also have both a literal space and \s; since \s already matches a space, the literal space can be removed:
^[a-zA-Z0-9\sñáéíóúü.,:#-]+$
^
See demo 2
And as for your third regex, it is not optimal since it will match ----1123.
Use
/^(?:\d+-)+\d+$/
See demo 3. Here, we match sequences of digits and hyphen (with (?:\d+-)+) and then a sequence of digits, from beginning till end.
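To compare the two telephone patterns side by side, here is a minimal sketch. It assumes Python's re module (the question itself uses JavaScript-style literals), and the sample numbers are made up for illustration.

```python
import re

# Original pattern: any mix of digits and hyphens.
loose = re.compile(r"^[\d-]+$")
# Suggested pattern: digit groups separated by single hyphens.
strict = re.compile(r"^(?:\d+-)+\d+$")

# Hypothetical sample inputs, not taken from the question.
samples = ["555-1234", "----1123", "555--1234", "5551234"]
results = {s: (bool(loose.search(s)), bool(strict.search(s))) for s in samples}

for s, (loose_ok, strict_ok) in results.items():
    print(f"{s!r}: loose={loose_ok} strict={strict_ok}")
```

Note that the stricter pattern requires at least one hyphen, so a plain digit string such as 5551234 is rejected. If that should pass, (?:\d+-)*\d+ between the anchors is one option.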

The expression /[..]+$/ says that the test subject must end with one or more of the characters (..). $ symbolises the end of the string. The beginning of the string does not have to match. If you want to enforce that for the entire string, use the beginning anchor as well:
/^[..]+$/
This now says the string must consist of one or more of the characters (..) from its beginning to its end, and there's no room for anything else.
You're already doing this for the telephone regex.
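The effect of the missing ^ can be reproduced with the two inputs from the question. This is a small sketch that assumes Python's re module; the variable names are only for illustration.

```python
import re

letters = "a-zA-ZÑñÁáÉéÍíÓóÚú"
end_only = re.compile(f"[{letters}]+$")     # anchored at the end only
both_ends = re.compile(f"^[{letters}]+$")   # anchored at both ends

# For each input: (result without ^, result with ^)
outcome = {
    text: (bool(end_only.search(text)), bool(both_ends.search(text)))
    for text in ["ab[(%", "a[(%b", "abc"]
}

for text, (end_ok, both_ok) in outcome.items():
    print(f"{text!r}: end-only={end_ok} both-ends={both_ok}")
```

Without ^, the input a[(%b passes because the trailing b alone satisfies the pattern; with both anchors it is rejected.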

Related

Regular expression to match a word that contains ONLY one colon

I am new to regex. Basically, I'd like to check whether a word has ONLY one colon or not.
If it has two or more colons, return nothing.
If it has one colon, return it as it is. (The colon must be in the middle of the string, not at the end or beginning.)
(1)
a:bc:de #return nothing or error.
a:bc #return a:bc
a.b_c-12/:a.b_c-12/ #return a.b_c-12/:a.b_c-12/
(2)
My thinking is below, but this seems too complicated.
^[^:]*(\:[^:]*){1}$
^[-\w.\/]*:[-\w\/.]* #this will not throw error when there are 2 colons.
Any directions would be helpful, thank you!
This will find such "words" within a larger sentence:
(?<= |^)[^ :]+:[^ :]+(?= |$)
See live demo.
If you just want to test the whole input:
^[^ :]+:[^ :]+$
To restrict to only alphanumeric, underscore, dashes, dots, and slashes:
^[\w./-]+:[\w./-]+$
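As a rough check of these patterns against the samples from the question, here is a sketch assuming Python's re module. One caveat: Python's re only accepts fixed-width lookbehind, so (?<= |^) is swapped for the equivalent (?<![^ ]) ("not preceded by a non-space"), and (?= |$) for (?![^ ]). The sentence passed to findall is made up.

```python
import re

# Whole-input test, restricted to word characters, dots, slashes and dashes.
whole = re.compile(r"^[\w./-]+:[\w./-]+$")

# Python-compatible rewrite of (?<= |^)[^ :]+:[^ :]+(?= |$).
in_text = re.compile(r"(?<![^ ])[^ :]+:[^ :]+(?![^ ])")

inputs = ["a:bc:de", "a:bc", "a.b_c-12/:a.b_c-12/", ":bc", "a:"]
checks = {s: bool(whole.search(s)) for s in inputs}

# Hypothetical sentence containing one valid and two invalid "words".
found = in_text.findall("keep a:bc but skip a:bc:de and x:")

print(checks)
print(found)
```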
I saw this as a good opportunity to brush up on my regex skills - so might not be optimal but it is shorter than your last solution.
This is the regex pattern: /^[^:]*:[^:]*$/gm and these are the strings I am testing against: 'oneco:on' (match) and 'one:co:on', 'oneco:on:', ':oneco:on' (these should all not match)
To explain what is going on, the ^ matches the beginning of the string, the $ matches the end of the string.
The [^:] bit says that any character that is not a colon will be matched.
In summary, ^[^:]* matches zero or more non-colon characters from the start of the string, : matches a single colon, and [^:]*$ matches zero or more (*) non-colon characters from there to the end of the string.
Because the pattern is anchored at both the beginning and the end of the string and allows exactly one colon in between, only the first string, 'oneco:on', is a match.
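One detail worth checking is what * allows at the edges. The sketch below (assuming Python's re module) runs the pattern from this answer against its four test strings plus two extra inputs, ':on' and 'oneco:', which are added here for illustration. The + variant is not from the answer; it is shown because the question asks for the colon to sit in the middle.

```python
import re

star = re.compile(r"^[^:]*:[^:]*$")   # pattern from the answer
plus = re.compile(r"^[^:]+:[^:]+$")   # variant requiring text on both sides

tests = ["oneco:on", "one:co:on", "oneco:on:", ":oneco:on", ":on", "oneco:"]
star_hits = [t for t in tests if star.search(t)]
plus_hits = [t for t in tests if plus.search(t)]

print("with *:", star_hits)
print("with +:", plus_hits)
```

Since * also accepts zero characters, a single colon at the very start or end still matches the original pattern; with + it does not.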

REGEX : Must include only letter,number or character

I have the following REGEX
/^(?!.* )(?=.*[!##$\.%^&])(?=.*\d)(?=.*[A-Za-z])$/
It should not allow whitespace, and the input must contain a letter, a digit and a special character.
But I would like to have the following:
the input must not contain spaces, and may contain any of [!##$\.%^&], digits and letters, so aaaaaaaa or !!!!!!!! would work.
But I can't find how to validate all of that.
With your shown samples, could you please try the following? Here is an online regex demo:
^[!#$a-z.%^&\dA-Z#]+$
Explanation: The pattern looks only for the characters !#$a-z.%^&\dA-Z# from the start to the end of the string. If the string contains nothing but these characters, it matches; if anything else appears, it does not.
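A quick way to try this whitelist approach, sketched with Python's re module (an assumption, since the question does not name a language). The sample inputs other than aaaaaaaa and !!!!!!!! are made up.

```python
import re

# Only the listed characters may appear, from start to end.
allowed = re.compile(r"^[!#$a-z.%^&\dA-Z#]+$")

samples = ["aaaaaaaa", "!!!!!!!!", "abc123!", "abc 123", ""]
valid = [s for s in samples if allowed.search(s)]

print(valid)
```

The string with a space and the empty string are both rejected, the latter because + requires at least one character.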

Is there a regular expression that allows trailing spaces for a word with fixed size?

We have some fields with user input; and so far our rule to validate this content was [A-Z][A-Z0-9]{0,7}.
Meaning: any uppercase word with at least one character, starting with a letter, and up to 8 characters for the whole word.
Now I am told that we should accept "trailing" spaces as well; but of course - only trailing spaces. Update; as the first answer got that wrong: the maximum length of the whole word is still 8 characters! Because that is exactly the point that caused me to ask this question.
I guess this can be checked with TWO expressions:
a) [A-Z][A-Z0-9 ]{0,7}... must match the input and
b) [ ][A-Z0-9] must not match the input
(the second expression simply finding any "non-trailing" space)
But is there also a SINGLE regular expression that I could use to check for this condition?
Or is this one of the occasions where, well, tough luck - regular expressions only accept context free grammars?!
If you want to allow trailing spaces only then use:
^[A-Z][A-Z0-9]{0,7} *$
Or:
^[A-Z][A-Z0-9]{0,7}\h*$
Here \h* matches zero or more horizontal whitespace characters (spaces or tabs), at the end only.
EDIT: Based on edited question you can use this lookahead based regex:
^(?=[A-Z0-9\h]{1,8}$)[A-Z][A-Z0-9]*\h*$
RegEx Demo
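The lookahead version can be exercised as follows. This is a sketch assuming Python's re module, which has no \h, so [ \t] is used in its place; is_valid is a hypothetical helper name.

```python
import re

# [ \t] stands in for \h, which Python's re module does not support.
pattern = re.compile(r"^(?=[A-Z0-9 \t]{1,8}$)[A-Z][A-Z0-9]*[ \t]*$")

def is_valid(value: str) -> bool:
    """True if value is a word plus optional trailing spaces, 8 chars at most."""
    return pattern.search(value) is not None

for value in ["ABC", "ABC" + " " * 5, "ABC" + " " * 6, "A BC", " ABC", "ABCDEFGHI"]:
    print(repr(value), is_valid(value))
```

The lookahead caps the total length at 8, while the rest of the pattern ensures that spaces appear only after the word.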

combination "+" with "$" in regex

Thanks to everyone who has replied.
I think I have to tweak my first question a little bit.
I'm a little bit confused by the definition of the $ sign.
It just asserts that there are between 6 and 10 word chars at the very end of the string.
That's it! Right? Then, It has to be matched with my test string "123a56A781231231231241" in my opinion. Because it doesn't break the rule! 6-10 word chars at the very beginning of string, and at the very end of string. Perfect, isn't it?
Plus, I want to know the difference between ^(?=\w{6,10}$) and ^(?=\w{6,10})$.
One more, Casimir et Hippolyte you said The + doesn't change anything, this means only that the quantifier ( {6,10} here) is possessive and doesn't allow backtracks.
Does that mean the + sign disables the $ sign?
Thank you guys in advance.
Before I go any further, I want you guys to know that it's been only 2 days since I started studying regex. I'm a total newbie.
First. ^(?=\w{6,10}$) This is the pattern. Why does the dollar sign have to be inside the ()? I know it's a dumb question but I'm curious. I tried placing the dollar sign outside the (), but it didn't work as I expected.
Second. I found several tutorial sites, and they say the dollar sign means
"$ may appear at the end of a pattern to require the match to occur at the very end of a line. For example, abc$ matches 123abc but not abc123."
So $ is used to assert that the matched part of string is at the very end of a line. Right?
If that is true, why can't this pattern: "^(?=\w{6,10}$)" match my test string: "123a56A781231231231241"?
As you see, my test string contains 6~10 word characters at the very beginning of a line and 6~10 word characters at the very end of a line.
Third. As I mentioned earlier, this pattern: ^(?=\w{6,10}$) can't match my test string: "123a56A781231231231241". But if I add a + sign after \w{6,10}, like ^(?=\w{6,10}+$),
it works.
Is it because the + sign is possessive? As far as I know, the + sign tells the engine not to backtrack once a match has been made. So my guess is that the $ sign doesn't do its job because no backtracking happens (I'm not sure about this, of course, as I don't know how the $ sign works behind the scenes). Is that right?
If that's your whole regex, you don't need a look-ahead. ie these two regexes are equivalent:
^(?=\w{6,10}$)
^\w{6,10}$
Why does the $ need to be inside the brackets? That's because the (anchored) look ahead ^(?=\w{6,10}) just asserts that there are between 6 and 10 word chars at the front of the input. But it will also succeed if there are more than 10 word chars at the front of the input.
By putting the $ inside the look ahead, it will only succeed if there are 6-10 word chars in the whole input.
You would only use a look ahead if you also wanted to have another restriction. For example, to match
6-10 word chars, and "a" appears before "b"
you would use the regex:
^(?=\w{6,10}$).*a.*b
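The placement of $ can be checked directly with the test string from the question. This sketch assumes Python's re module; the inputs passed to the combined pattern are made up.

```python
import re

test = "123a56A781231231231241"   # longer than 10 word characters

# $ inside the lookahead: the whole input must be 6-10 word chars.
inside = re.search(r"^(?=\w{6,10}$)", test)
# No $: only the first 6-10 chars are checked, the rest is ignored.
no_end = re.search(r"^(?=\w{6,10})", test)
# $ outside: ^ and $ with nothing consumed in between demand an empty
# string, while the lookahead demands 6+ chars, so this never matches.
outside = re.search(r"^(?=\w{6,10})$", "abcdef")

# Length check combined with "a appears before b".
combined = re.compile(r"^(?=\w{6,10}$).*a.*b")

print(inside, no_end is not None, outside)
print(bool(combined.search("xxaxxb")), bool(combined.search("xxbxxa")))
```

This also covers the difference between ^(?=\w{6,10}$) and ^(?=\w{6,10})$ asked about in the question: the lookahead consumes nothing, so a $ placed after it is tested at position 0.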
The (?=..) is a lookahead. It's a zero-width assertion, which means that it is just a check and consumes no characters. In other words, a lookahead means "followed by".
The pattern ^(?=\w{6,10}$) means:
beginning of the string followed by between 6 and 10 word characters until the end of the string.
Note that no character is matched, since everything is inside a lookahead except the ^, which is zero-width too.
A match function can only return an empty string as the match result, but will return true if the condition is met (otherwise false).
The + doesn't change anything, this means only that the quantifier ( {6,10} here) is possessive and doesn't allow backtracks. More information about this feature here: www.regular-expressions.info/possessive.html
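The zero-width behaviour is easy to observe. A minimal sketch, assuming Python's re module and a made-up six-character input:

```python
import re

m = re.search(r"^(?=\w{6,10}$)", "abc123")

succeeded = m is not None   # the condition is met
matched_text = m.group()    # but nothing was consumed
span = m.span()             # start and end of the match coincide

print(succeeded, repr(matched_text), span)
```

The match succeeds, yet the matched text is the empty string at position 0.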
I can't help you with this because I don't know what you mean. Are you trying to match against the test string in 2 and 3?
^(?=\w{6,10}$) is trying to match the beginning of the string, followed by 6-10 word characters and the end of the string. Your string is longer than 10 characters, so that won't match.
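How the + behaves depends on the regex engine. If the engine reads \w{6,10}+ as an ordinary repetition, it matches one or more instances of the 6-10 character run, which is why your longer test string can match.
If the engine reads it as a possessive quantifier, adding the + should still not match, because either way you are looking to match a string exactly 6-10 chars long, but your test string is longer. Making it possessive won't change the match in this instance.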
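The two readings of the trailing + can be compared in a sketch like the one below. It assumes Python's re module: recent versions (3.11 and later) parse \w{6,10}+ as possessive, while older ones reject it, hence the try/except. The explicit (?:...)+ form spells out the "one or more runs" reading; which reading the asker's engine used is not stated in the thread.

```python
import re

test = "123a56A781231231231241"

try:
    # Possessive reading: grab up to 10 chars and never give any back.
    possessive = re.compile(r"^(?=\w{6,10}+$)")
except re.error:
    possessive = None   # this Python version has no possessive quantifiers

# Repetition reading: one or more runs of 6-10 word chars.
repeated = re.compile(r"^(?=(?:\w{6,10})+$)")

if possessive is not None:
    print("possessive matches:", possessive.search(test) is not None)
print("repeated matches:", repeated.search(test) is not None)
```

Under the repetition reading the long test string matches, because it can be split into several runs of 6 to 10 characters; under the possessive reading it does not.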

Limiting RegEx to match only a string of 1-254 characters length

This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings less than 255 characters.
I've tried adding a lookahead at the start of the RegEx but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
You need the $ in the lookahead to make sure it's only up to 254. Otherwise, the lookahead will match even when there are more than 254.
(?=.{1,254}$)
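The role of the $ can be seen with a much simpler pattern. In this sketch (assuming Python's re module), \w+ is only a stand-in for the real expression:

```python
import re

# \w+ is a placeholder for the full pattern from the question.
with_end = re.compile(r"^(?=.{1,254}$)\w+$")
without_end = re.compile(r"^(?=.{1,254})\w+$")

short, too_long = "a" * 254, "a" * 255

print(bool(with_end.search(short)))        # within the limit
print(bool(with_end.search(too_long)))     # over the limit
print(bool(without_end.search(too_long)))  # lookahead without $ lets it through
```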
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that the dash must be first (or last, or escaped) in the character class to be a literal dash, and the caret must not be first.
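To illustrate the dash and caret rules, here is a small sketch assuming Python's re module; the sample strings are invented.

```python
import re

# Unescaped punctuation is literal inside a class; the leading dash is
# literal too, and the caret is literal because it is not first.
atom = re.compile(r"[-\w!#$%&'*+/=`{|}~?^]+")

# A caret in first position negates the class instead.
not_dash = re.compile(r"[^-]+")

print(bool(atom.fullmatch("a-b^c+d")))   # dash, caret and plus are accepted
print(bool(atom.fullmatch("a.b")))       # the dot is not in the class
print(bool(not_dash.fullmatch("a^b")), bool(not_dash.fullmatch("a-b")))
```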
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
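For anyone who wants to try the simplified pattern outside Regex Hero, here is a sketch assuming Python's re module. The pattern is copied as written above, including the # separator; is_valid and the sample inputs are hypothetical.

```python
import re

# Pattern copied from the answer; "#" is the separator exactly as it appears there.
ADDRESS = re.compile(
    r"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*"
    r"#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
)

def is_valid(value: str) -> bool:
    """True if value fits the pattern and is at most 254 characters long."""
    return ADDRESS.search(value) is not None

for value in ["first.last#example.com", "user#10.0.0.1",
              ".user#example.com", "user..name#example.com"]:
    print(value, is_valid(value))
print(is_valid("a" * 250 + "#example.com"))   # too long overall
```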
This depends on what language you are working in. In Python, for example, you can use a regex to split a text into separate strings, and then use len() to remove strings of 255 characters or longer.
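A minimal sketch of that idea, assuming whitespace is the separator (the question does not say how the text is delimited) and using a made-up input:

```python
import re

# Hypothetical input: three short tokens and one 300-character token.
text = "short another " + "x" * 300 + " last"

pieces = re.split(r"\s+", text.strip())
kept = [p for p in pieces if len(p) < 255]   # drop anything 255+ chars long

print(kept)
```

This keeps the length check out of the regex entirely, at the cost of a second pass over the pieces.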
I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.