Limiting RegEx to match only a string of 1-254 characters length

This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings shorter than 255 characters.
I've tried adding a lookahead at the start of the RegEx, but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"

You need the $ in the lookahead to make sure the whole string is at most 254 characters long. Otherwise, the lookahead succeeds as soon as it finds 1 to 254 characters, even when more characters follow.
(?=.{1,254}$)
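A quick way to see the difference is to run both lookaheads against a string that is too long. The sketch below uses Python's re module rather than the asker's engine; the test strings and variable names are made up for illustration, and the rest of the email pattern is replaced by a plain .+ to keep the focus on the lookahead.

```python
import re

# Without "$" the lookahead only requires that 1 to 254 characters exist
# after the start; it says nothing about where the string ends.
unbounded = re.compile(r"^(?=.{1,254}).+$")
# With "$" the lookahead must reach the end of the string within 254 characters.
bounded = re.compile(r"^(?=.{1,254}$).+$")

too_long = "a" * 255
just_fits = "a" * 254

print(bool(unbounded.match(too_long)))  # True: the limit is not enforced
print(bool(bounded.match(too_long)))    # False
print(bool(bounded.match(just_fits)))   # True
```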
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that the dash must be first (or last) in the character class to be a literal dash without escaping, and the caret must not be first.
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
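For anyone who wants to check the pattern outside Regex Hero, here is a small sketch in Python. It assumes the pattern behaves the same under Python's re module (it uses no engine-specific syntax as far as I can tell), and the sample addresses are invented. The separator is written as # here because that is how the pattern on this page is written.

```python
import re

# The complete pattern from the answer, split over two raw strings for readability.
PATTERN = re.compile(
    r"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*"
    r"#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
)

samples = {
    "user.name#example.com": True,           # ordinary address
    "user#192.168.0.1": True,                # IP address after the separator
    ".user#example.com": False,              # leading period
    "user..name#example.com": False,         # two periods in a row
    "a" * 242 + "#example.com": True,        # exactly 254 characters
    "a" * 250 + "#example.com": False,       # 262 characters, over the limit
}

for text, expected in samples.items():
    matched = PATTERN.match(text) is not None
    print(len(text), matched, expected)
```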

This depends on what language you are working in. In Python, for example, you can use a regex to split a text into separate strings, and then use len() to discard any string longer than the 254 characters you want.
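A minimal sketch of that idea, assuming the candidates are separated by whitespace (the question does not say how they are delimited). The function name and sample text are hypothetical.

```python
import re

MAX_LEN = 254  # the question asks for strings of 1-254 characters


def candidates_within_limit(text):
    """Split text on whitespace and keep only pieces of 1..MAX_LEN characters."""
    pieces = re.split(r"\s+", text.strip())
    return [p for p in pieces if 1 <= len(p) <= MAX_LEN]


sample = "short#example.com " + "x" * 300 + "#example.com"
print(candidates_within_limit(sample))  # ['short#example.com']
```

The length check runs separately from the pattern, so the format regex itself needs no lookahead in this approach.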

I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.

Related

Regular Expression for email formatting without hyphen at first and last

I have created the regular expression which will take the email address as in following format:
abc#xyz.com.in
Regular Expression
/^(?!-)[\w-\.]+#([\w-]+\.)+[\w-]{2,4}/
I want to reject email addresses that have a hyphen at the start or at the end.
Invalid format
-abc#xyz.com
abc#xyz.com-
Valid format
abc#xyz.com
abc#xyz.com.in

Your regex can be edited in a simple way (see a demo at Regex101):
/^[\w\.]+[\w\.\-]*#[\w\.]+\.[\w\.]{2,4}$/
^: This is the beginning of the line
[\w\.]+: This is the first part of the email before # can have only word characters (\w) or dot (\.) at least once.
[\w\.\-]*: After that, the same characters plus the dash (\-) can occur any number of times. Remember that a dash between two other items inside [ and ] denotes a range, so escape it (or put it first or last) when you mean the dash itself.
#: This matches itself.
[\w\.]+: After the # character, there must be at least one character from the list.
\.: Then followed by the dot literally.
[\w\.]{2,4}: Finally the last 2-4 characters.
$: And the end of a line.
The difference between this and your Regex is just a little:
/^[\w\.]+[\w\.\-]*#[\w\.]+\.[\w\.]{2,4}$/
/^(?!-)[\w-\.]+#([\w-]+\.)+[\w-]{2,4}/
I avoided the negative look-ahead and instead specified (whitelisted) the characters that can occur at each position; I generally prefer that to blacklisting unless blacklisting is really needed. The rest of the Regex is quite similar, except that you should escape the dash - character between the list brackets [ and ].
Finally, I omitted the capturing groups ( and ) and leave it up to you to place them wherever you need them.
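The four sample addresses from the question can be run against this pattern as a sanity check. This is a sketch in Python rather than JavaScript, so the / delimiters are dropped; the variable names are only for illustration.

```python
import re

# Pattern from the answer, without the JavaScript-style / delimiters.
EMAIL = re.compile(r"^[\w\.]+[\w\.\-]*#[\w\.]+\.[\w\.]{2,4}$")

samples = ["abc#xyz.com", "abc#xyz.com.in", "-abc#xyz.com", "abc#xyz.com-"]
results = {s: bool(EMAIL.match(s)) for s in samples}

for address, ok in results.items():
    print(address, ok)
```

The first two addresses should be accepted and the two with a leading or trailing hyphen rejected.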

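Add \w to each end of your regex, and include the end anchor $. Lower the neighbouring quantifiers by one so that the added \w does not raise the minimum lengths (otherwise abc#xyz.com.in would no longer match):
^\w[\w.-]*#([\w-]+\.)+[\w-]{1,3}\w$
Note also that the dot doesn't need escaping within a character class.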

A complete email RegEx:
/^(([^<>()[\]\\.,;:\s#"]+(\.[^<>()[\]\\.,;:\s#"]+)*)|(".+"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

Check array syntax with Regex

I'm trying to create a regex that checks if a string is a valid path for Firestore document.
I want a regex that tests whether a string:
starts with a lowercase letter: ^([a-z]{1})
after the first character, contains only letters/digits and/or dots: \w*(.?\w+){0,}
may end with an array index: (\[{1}\d+\]{1})?$
The first and second points work well, but the last group doesn't. When I test a string like data.images[11, the regex returns true.

First of all, you can shorten some quantifiers in your regex:
{1} -> can be ignored completely
{0,} -> *
Your second part could be expressed like this, which is also more readable:
[\w.]* means: take any character inside the brackets zero or more times. A bracket expression also supports predefined classes, so we are using \w here. The dot inside the brackets doesn't need to be escaped; there it simply means a literal dot.
So your parts would be:
^([a-z])
[\w.]*
(\[\d+\])?$
I hope this helps. According to regexpal, it matches data.images[11] but not data.images[11. It also seems to meet all your requirements.
EDIT:
Your second part doesn't work because (as Asocia stated in their answer) you would need to escape the dot. An unescaped dot is a metacharacter meaning "any character" (depending on the regex engine and settings, sometimes even line breaks), so it also matches the [. As you mean the dot as a literal character, you need to escape it.
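The behaviour described above can be reproduced with a short sketch. It uses Python's re module as a stand-in for whatever engine the question runs on, and joins the three simplified parts into one pattern; the variable names are made up.

```python
import re

# Original pattern from the question: the unescaped "." matches any character,
# including "[", so an unclosed index slips through.
original = re.compile(r"^([a-z]{1})\w*(.?\w+){0,}(\[{1}\d+\]{1})?$")
# The three simplified parts from this answer, joined together.
simplified = re.compile(r"^([a-z])[\w.]*(\[\d+\])?$")

print(bool(original.match("data.images[11")))     # True: the reported bug
print(bool(simplified.match("data.images[11")))   # False
print(bool(simplified.match("data.images[11]")))  # True
print(bool(simplified.match("data.images")))      # True
```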

regex not working as it should

I'm trying to catch up on regex and I have made one as below;
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ ,\d]{1};(\d){8};(\d){1};(\d){1}; $
and the sample is;
ä;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;
As far as I've read
. is all chars, \d is digits, {n} and variations indicates n time and depending on variation, more repetitions.
What could be the problem?

A few suggestions/observations:
You can remove all {1}s, they don't do anything.
[A,K] means "A", "," or "K". If you want to match any letter between A and K, use [A-K].
You should place the capturing group around the repetitions: (\d{7,8}) captures a 7-8 digit number; (\d){7,8} will only capture the last digit.
[ ,\d]{1} fails on your string because there are two characters (space and 0) at that point in the string.
You might need to remove the space before the final $, unless there actually is a space in your string after the last semicolon.
Here's a version that matches (and captures each element in a separate group):
^(.);(\d{4});(\d{8});([A-K]);(\d{7,8});(\d{8});([A-Z ]+);([ ,\d]+);(\d{8});(\d);(\d); *$
See it in action on regex101.com.
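For readers without access to regex101, the same check can be sketched in Python. The pattern and sample row are the ones from this thread; the leading "ä" is written as an escape so the snippet does not depend on file encoding, and the variable names are only illustrative.

```python
import re

ROW = re.compile(
    r"^(.);(\d{4});(\d{8});([A-K]);(\d{7,8});(\d{8});"
    r"([A-Z ]+);([ ,\d]+);(\d{8});(\d);(\d); *$"
)
# "\u00e4" is the character "ä" from the sample row.
sample = "\u00e4;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;"

row_match = ROW.match(sample)
print(row_match.groups())

# Group placement matters: with the quantifier outside the group,
# only the last repetition is kept.
whole = re.match(r"(\d{7,8})", "12345678").group(1)
last_only = re.match(r"(\d){7,8}", "12345678").group(1)
print(whole)      # 12345678
print(last_only)  # 8
```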

Please, don't abuse regexps for everything.
Your format is a CSV format, so just split at ; and then validate the individual parts properly. This is perfectly valid, usually similarly efficient, and easier to debug.
With regexps, make sure you escape properly (i.e. double-escape where needed). In most programming languages, \ is an escape character in string literals, so you need to write \\ to get a single backslash into the pattern.
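A rough sketch of the split-and-validate approach in Python follows. The per-field rules are assumptions taken from the question's regex, not a known specification of the format, and the function name is hypothetical. Python's raw strings (r"...") sidestep the double-escaping issue mentioned above.

```python
import re

# One rule per field, in order. These mirror the question's regex and are
# assumptions about what each column is supposed to hold.
FIELD_RULES = [
    r".", r"\d{4}", r"\d{8}", r"[A-K]", r"\d{7,8}", r"\d{8}",
    r"[A-Z ]+", r"[ \d]+", r"\d{8}", r"\d", r"\d",
]


def validate_row(line):
    """Return (index, value) pairs for every field that fails its rule."""
    fields = line.rstrip().split(";")
    if fields and fields[-1] == "":
        fields.pop()  # the sample row ends with a trailing ";"
    if len(fields) != len(FIELD_RULES):
        return [(-1, line)]  # wrong number of fields
    return [
        (i, value)
        for i, (rule, value) in enumerate(zip(FIELD_RULES, fields))
        if re.fullmatch(rule, value) is None
    ]


# "\u00e4" is the character "ä" from the sample row.
sample = "\u00e4;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;"
print(validate_row(sample))                        # []
print(validate_row(sample.replace(";K;", ";Z;")))  # [(3, 'Z')]
```

Because each field is checked on its own, a failure points at the offending column instead of just reporting that the whole line did not match.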

Try this:
^(.){1};(\d){4};(\d){8};[A-K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ \d]{2};(\d){8};(\d){1};(\d){1};$
Here is what was wrong with your regex:
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ ,\d]{1};(\d){8};(\d){1};(\d){1}; $
You have an extra space before $ at the end.
To specify a range use - and not a comma; your range should be [A-K].
In the [ ,\d] class you have restricted the match to 1 character with {1}; it should be {2}, one for the space and one for the digit.
Additional: you don't need to specify {1}, as a token matches exactly once by default.

If yours does not work, you can try this one :
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};( \d){1};(\d){8};(\d){1};(\d){1};$

Regex to check input is not empty plus it's alphanumeric?

I need a single regex that checks that the input is not empty and contains alphanumeric characters only.
I know the alphanumeric part, ^[\s+0-9a-zA-Z]+$, but I am not sure about the not empty requirement.
I can only use a single expression and I can't use any language function.

Simply use this regex to match a non-empty alphanumeric string:
^[a-zA-Z0-9]+$
Details
^ - string start
[a-zA-Z0-9]+ - one or more letters or digits
$ - string end.
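The asker cannot call language functions, so the following is only a way to verify the pattern, sketched in Python with invented inputs.

```python
import re

ALNUM = re.compile(r"^[a-zA-Z0-9]+$")

inputs = ["abc123", "", "abc 123", "   "]
results = [bool(ALNUM.match(value)) for value in inputs]

for value, ok in zip(inputs, results):
    print(repr(value), ok)
```

Only the first input should be accepted: the + quantifier already requires at least one character, which is what rules out the empty string. One engine-specific caveat: in Python, $ also matches just before a trailing newline, so re.fullmatch is the stricter choice there.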

I'm going to assume that by "not empty" you mean "not only white space"; otherwise you already have the answer you want, since + means one or more.
^[a-zA-Z0-9][a-zA-Z0-9\s]*$
will make sure that the string has something other than white space in it.
Additionally, if \s is valid then I assume \w is as well (note that \w also matches the underscore), meaning that this could more easily be written as
^\w(?:\w|\s)*$
The ?: in the ( ) makes it a non-capturing group. If you don't care about captures, it can be omitted, making the pattern very terse:
^\w(\w|\s)*$
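The terse form can be checked with a few invented inputs. This sketch uses Python, where \w on text strings also matches the underscore and non-ASCII letters, which may or may not be acceptable for the asker's definition of alphanumeric.

```python
import re

NOT_BLANK = re.compile(r"^\w(\w|\s)*$")

checks = {
    "abc 123": True,   # letters, digits and an inner space
    "a_b": True,       # underscore counts as \w
    "": False,         # empty input
    "   ": False,      # white space only
    " abc": False,     # must start with a \w character
    "abc!": False,     # punctuation is rejected
}

for value, expected in checks.items():
    print(repr(value), bool(NOT_BLANK.match(value)), expected)
```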

Perl matching characters bigger than a given length

I have been struggling to write a regex that matches words longer than a given length within parentheses. At first I thought I could do this with \(\w{a,}\), but I realized that it doesn't match words containing white space, such as (ab cd ef). All I want to do is find any run of characters within parentheses that is longer than, for instance, 3 characters. How can I resolve this problem?

What is a word with white space?
If you want to match any character, then use .
\(.{3,}\)
. matches any character except newlines
But be careful, this is greedy. It will also match, for example:
(a)123(b)
To avoid this you could do something like
\([^)]{3,}\)
See it here online on Regexr
[^)] means any character except a )
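The difference between the two patterns can be seen in a short sketch. The question is about Perl, but these constructs are written the same way in Python's re module, which is used here; the second test string is made up.

```python
import re

greedy = re.compile(r"\(.{3,}\)")
negated = re.compile(r"\([^)]{3,}\)")

text = "(a)123(b)"
greedy_hit = greedy.search(text)
negated_hit = negated.search(text)
print(greedy_hit.group())  # (a)123(b)
print(negated_hit)         # None

found = negated.findall("(ab) and (ab cd ef)")
print(found)  # ['(ab cd ef)']
```

The greedy dot runs across the first closing parenthesis, while the negated class can never cross one, so each match stays inside a single pair of parentheses.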

You could use a character class that includes both \w and \s:
\([\w\s]{a,}\)

Maybe you mean this?
\([\w\s]{a,}\)

If it has a space in it, it's not a word anymore.
Is matching any character fine, as in \(.{a,}\)? Or do you just need word characters plus whitespace, as in \((\w|\s){a,}\)?
