Regex for Lines Under 50 Characters - regex

I have tried (^[.*]{1,50}$)/gm but it simply does not work.
I'd like a line made up of any characters to match this regex.
Qwertyuiop
$$%%^^89e7hbequdwanjk
etc should all match, including this line
However, lines over 50 characters long should not match.

You are specifying a string of 1-50 occurrences of either . or *. If you want a string of any characters, the [...] character class is wrong (it enumerates literal characters you want to match); you are looking for . without square brackets, which matches any one character.
The regular expression for that is
^.{1,50}$
Some languages require you to specify a separator such as /.../ around your regex, but it's hard to tell from your example whether yours is one of them; in this case, you are missing the beginning separator.
The /g flag only makes sense if you need to find more than one match in the same input. The /m flag makes sense if the ^ and $ anchors should match at the start and end of every line in multi-line text, rather than only at the start and end of the whole string.
If the title of your question is correct, and you want strictly under 50 characters, change the 50 to 49 (and maybe the 1 to 0, if empty lines should match too).
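A quick way to check the pattern is to run it against a few sample lines. The sketch below assumes Python's re module (the question does not say which engine is in use); the sample text and the names LINE_UP_TO_50, sample and matches are made up for illustration.

```python
import re

# ^.{1,50}$ with MULTILINE: ^ and $ anchor at every line, not just the whole text.
LINE_UP_TO_50 = re.compile(r"^.{1,50}$", re.MULTILINE)

sample = "\n".join([
    "Qwertyuiop",
    "$$%%^^89e7hbequdwanjk",
    "x" * 50,   # exactly 50 characters: still matches
    "y" * 51,   # 51 characters: too long, no match
])

# findall returns every line that fits within the 1-50 character limit.
matches = LINE_UP_TO_50.findall(sample)
print(len(matches))
```

Changing {1,50} to {0,49} in this sketch would give the "strictly under 50" behaviour and would let empty lines match as well.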

Your character class [.*] matches only literal dots and asterisks, since inside [] both are treated literally. Try
/^.{1,50}$/gm
It'll match between 1 and 50 of anything. If you also want to capture it, add back the parentheses
/(^.{1,50}$)/gm

Related

How to allow spaces in between words?

EDIT: I've been experimenting, and it seems like putting this:
\(\w{1,12}\s*\)$
works, however, it only allows space at the end of the word.
example,
Matches
(stuff )
(stuff )
Does not
(st uff)
Regexp:
\(\w{1,12}\)
This matches the following:
(stuff)
But not:
(stu ff)
I want to be able to match spaces too.
I've tried putting \s but it just broke the whole thing, nothing would match. I saw one post on here that said to enclose the whole thing in a ^[]*$ with space in there. That only made the regex match everything.
This is for Google Forms validation if that helps. I'm completely new to regex, so go easy on me. I looked up my problem but could not find anything that worked with my regex. (Is it because of the parentheses?)
For matching text like (st uff) or (st uff some more) you will need to write your regex like this,
\(\w{1,12}(?:\s+\w{1,12})*\)
Regex explanation:
\( - Literal start parenthesis
\w{1,12} - Match a word of length 1 to 12 like you wanted
(?:\s+\w{1,12})* - Matches one or more whitespace characters followed by a word of length 1 to 12; the whole group may repeat zero or more times
\) - Literal closing parenthesis
Now if you want to optionally also allow spaces just after starting parenthesis and ending parenthesis, you can just place \s* in the regex like this,
\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)
  ^^^                        ^^^
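The two patterns above can be compared side by side. This is only a sketch in Python's re module (Google Forms uses its own engine, so treat the behaviour as indicative); the names STRICT, PADDED and accepts are hypothetical helpers, not part of the answer.

```python
import re

# Words of 1-12 word characters separated by whitespace, inside literal parentheses.
STRICT = re.compile(r"\(\w{1,12}(?:\s+\w{1,12})*\)")
# Same, but also tolerating whitespace just inside the parentheses.
PADDED = re.compile(r"\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)")

def accepts(pattern, text):
    """Return True when the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None

for text in ["(stuff)", "(st uff)", "(st uff some more)", "(stuff )", "( stuff )"]:
    print(text, accepts(STRICT, text), accepts(PADDED, text))
```

Only the padded variant accepts "(stuff )" and "( stuff )", because the strict one requires a word character directly next to each parenthesis.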
If you are trying to get up to 12 characters between parentheses:
\([^\)]{1,12}\)
The [^\)] segment is a character class that represents all characters that aren't closing parentheses (^ inverts the class).
If you want some specific characters, like alphanumeric and spaces, group that into the character class instead:
\([\w ]{1,12}\)
Or
\([\w\s]{1,12}\)
If you want 12 word characters with an arbitrary number of spaces anywhere in between:
\(\s*(?:\w\s*){1,12}\)
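The difference between counting all characters and counting only word characters shows up once spaces are involved. A small sketch in Python's re module, with made-up sample strings and hypothetical names TOTAL_12 and WORDCHARS_12:

```python
import re

# Up to 12 characters in total, each a word character or a plain space.
TOTAL_12 = re.compile(r"\([\w ]{1,12}\)")
# Up to 12 word characters, with any amount of whitespace around them.
WORDCHARS_12 = re.compile(r"\(\s*(?:\w\s*){1,12}\)")

short = "(st uff)"
spaced = "(a b c d e f g h)"   # 8 word characters but 15 characters in total

print(bool(TOTAL_12.fullmatch(short)), bool(WORDCHARS_12.fullmatch(short)))
print(bool(TOTAL_12.fullmatch(spaced)), bool(WORDCHARS_12.fullmatch(spaced)))
```

The second string is rejected by the first pattern because the spaces count toward the limit of 12, while the last pattern only counts the word characters.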

Limiting RegEx to match only a string of 1-254 characters length

This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings less than 255 characters.
I've tried adding a lookahead at the start of the RegEx, but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
You need the $ in the lookahead to make sure the string is only up to 254 characters long. Otherwise, the lookahead will succeed even when there are more than 254 characters.
(?=.{1,254}$)
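The effect of the $ inside the lookahead can be seen in isolation. In this sketch (Python's re module, as an assumption about the engine) the trailing .+$ is just a placeholder for the rest of the real pattern, and the names LOOSE, BOUNDED, ok and too_long are illustrative.

```python
import re

# Without $, the lookahead only checks that at least one character follows.
LOOSE = re.compile(r"^(?=.{1,254}).+$")
# With $, the lookahead requires the string to end within 254 characters.
BOUNDED = re.compile(r"^(?=.{1,254}$).+$")

ok = "a" * 254
too_long = "a" * 255

print(bool(LOOSE.match(too_long)))    # the unbounded lookahead lets it through
print(bool(BOUNDED.match(too_long)))  # the bounded lookahead rejects it
print(bool(BOUNDED.match(ok)))
```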
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that an unescaped dash must be first (or last) in the character class to be a literal dash, and the caret must not be first.
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
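For a check outside Regex Hero, the simplified pattern can be tried in Python's re module. This is a sketch under assumptions: the pattern is copied exactly as written above (including the # separator), the sample inputs are invented, and is_valid is a hypothetical helper. Other engines may treat some constructs differently.

```python
import re

# The simplified pattern from above, copied as written (including the # separator).
ADDRESS = re.compile(
    r"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*"
    r"#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
)

def is_valid(text):
    """Return True when text matches the simplified pattern."""
    return ADDRESS.match(text) is not None

# Invented sample inputs, purely for illustration.
samples = [
    "user#example.com",
    "first.last#10.0.0.1",
    ".user#example.com",           # leading period is rejected
    "a" * 250 + "#example.com",    # 262 characters, over the 254 limit
]
for sample in samples:
    print(len(sample), is_valid(sample))
```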
This depends on what language you are working in. In Python, for example, you can use a regex to split a text into separate strings, and then use len() to remove strings that are longer than the 254 characters you want to allow.
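A minimal sketch of that idea follows. The answer does not say what to split on, so splitting on whitespace is an assumption here, and short_pieces is a hypothetical helper name.

```python
import re

def short_pieces(text, limit=255):
    """Split text on whitespace and keep only non-empty pieces shorter than limit."""
    pieces = re.split(r"\s+", text.strip())
    return [piece for piece in pieces if piece and len(piece) < limit]

# One piece of 300 characters sits between two short ones and gets dropped.
print(short_pieces("ab " + "x" * 300 + " cd"))
```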
I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.

RegEx \D matches start and end of line as well

I need to find lines that consist of 3 digits followed by 3 other characters. I thought I could use the following RegEx:
^\d{3}\D{3}$
But take the following sample text file and run the RegEx above (the text must have the empty lines in it):
1
12
123xxx
123y
aaabb
The problem is that there are two matches: 123xxx (which is fine), but also 123y is matched!
I suspect the reason is that "y" + the end-of-line + the beginning-of-next-line are also matched.
How can I tell the regex engine to ignore line beginnings and endings with \D and match characters only, not positions?
The behavior of $ in UltraEdit changes depending on whether you have "Match Whole Word Only" checked or not. To get the behavior you want you need to make sure that that option is checked. Your regular expression doesn't need to change.
Maybe:
/^\d{3}\D{3}$/m
The m means
Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.
http://perldoc.perl.org/perlre.html
I don't know about UltraEdit exactly but I expect it will have something similar.
Try this:
^\d{3}[\S]{3}$
This matches lines with 3 digits followed by three non-whitespace characters. Since line breaks count as whitespace, \S cannot run past the end of the line the way \D can.
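The underlying effect (a newline is a non-digit, so \D can consume it) can be reproduced outside UltraEdit. The sketch below uses Python's re module, which is an assumption and may not behave exactly like UltraEdit; the sample text assumes two empty lines after 123y and plain \n line endings, since the exact blank lines in the question are not visible.

```python
import re

# Assumed sample: two empty lines after "123y", "\n" line endings.
text = "1\n12\n123xxx\n123y\n\n\naaabb"

# \D also matches newline characters, so "y" plus two newlines counts as three non-digits.
with_nondigit = re.findall(r"^\d{3}\D{3}$", text, re.MULTILINE)
# \S excludes all whitespace, including newlines, so the match stays on one line.
with_nonspace = re.findall(r"^\d{3}\S{3}$", text, re.MULTILINE)

print(with_nondigit)
print(with_nonspace)
```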

Vim RegEx: Match until blank line

I'm trying to write a RegEx that will match any line that contains ".wpd", and then match all lines after that until it reaches a blank line (including the blank line).
This is what I've tried:
/\v^.*.wpd\_.\{-}^\s*$
However, the non-greedy operator \{-} after the "all characters including new lines" character class \_. doesn't seem to work. If I use
/\v^.*.wpd\_.*
that will match the next line containing ".wpd" and then all lines after that. However, as soon as I change the * to \{-}, it doesn't match anything at all.
What am I doing wrong? Thanks!
This one seems to work:
/\v^.*\.wpd\_.{-}\n\s*\n
You cannot use the atom ^ (same for $) inside the regexp; it has its special meaning only at the front (or, for $, at the back). Elsewhere, it's taken as the literal character. Use \n to match a newline inside the regexp, as shown by perreal's answer.
(?s)[^\n\r]*\.wpd(.*?)\n{2}
(?s) - Turn on 'dot matches line breaks' to search across lines
[^\n\r]* - Starting at the beginning of a line, match anything that's not a line break
\.wpd - Match a literal '.wpd'
(.*?) - Match anything, non-greedily, including line breaks ( because we turned on (?s) previously )
\n{2} - ... until you find two newlines in a row, which would be a blank line
:)
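This PCRE-style pattern is not Vim syntax, but it can be tried directly in an engine that supports (?s). A small sketch in Python's re module, with an invented sample text and the hypothetical name BLOCK:

```python
import re

# (?s) lets . match newlines, so (.*?) can span lines up to the first empty line.
BLOCK = re.compile(r"(?s)[^\n\r]*\.wpd(.*?)\n{2}")

text = (
    "intro line\n"
    "report.wpd header\n"
    "line a\n"
    "line b\n"
    "\n"
    "after the blank line\n"
)

match = BLOCK.search(text)
print(repr(match.group(0)))
```

Note that \n{2} only recognises a completely empty line; a "blank" line containing spaces or tabs would need something like \n\s*\n instead.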
The following is a large supporting comment to perreal's answer above, as well as my own version of that answer, which I find more intuitive.
Let's dissect the following regexp based on http://vimdoc.sourceforge.net/htmldoc/pattern.html#/magic
/\v^.*\.wpd\_.{-}\n\s*\n
\v (lowercase v): This is the 'very magic' operator, which signifies that in the pattern after it all ASCII characters except '0'-'9', 'a'-'z', 'A'-'Z' and '_' have a special meaning. Therefore, characters like *, ^, $ need not be escaped in the pattern, but for _ to have special meaning (such as modifying the behaviour of . to match newline), it needs to be escaped. Hence with \v set, you need \_ for the latter to have special meaning. To truly appreciate how much very magic simplifies the expression, compare it with the same expression using very nomagic (uppercase \V): /\V\^\.\*.wpd\_\.\{-}\n\s\*\n (very nomagic) vs /\v^.*\.wpd\_.{-}\n\s*\n (very magic)
^.*\.wpd: Greedily match anything (.*) from the beginning of a line (^) till .wpd
\_. : Matches a single character, which can be any character including the newline. Note that with \v set, the pattern must have an escaped underscore as noted above.
{-} : Is the non-greedy equivalent of the * quantifier. So, where .*BLAH matches the most possible characters till BLAH, .{-}BLAH will match the least possible. (In PCRE-style engines, the equivalent is written .*?BLAH.)
\n\s*\n: Matches a blank line, which may contain zero or more spaces or tabs
\_.{-}\n\s*\n: combines the above two and means Match the least possible number of characters including newline (\_.) until a blank line (\n\s*\n)
\v^.*\.wpd\_.{-}\n\s*\n: Finally, putting it all together: set the very magic operator (to simplify the pattern by not needing to escape anything except an _ for special meaning), search for any line which contains .wpd, and match until the closest blank line.
My version using variants of end-of-line start-of-line characters
The only modification is to the expression used to signify a blank line. I find it useful to define a blank line in terms of the start-of-line ('^') and end-of-line ('$') characters, however as-is, they cannot be used anywhere in a regexp except the beginning and the end respectively.
For the above use-case, there are variants which can be used anywhere in a regex, namely \_^ and \_$ respectively. Therefore the blank line expression can be written as \_^\s*\_$ instead of \n\s*\n, thus making the complete expression:
\v^.*\.wpd\_.{-}\_^\s*\_$
This perhaps is closer to answering the OP's question about why they were unable to use the start-of-line character in their expression.
Phew!

regular expressions boost c++

I am trying to capture the characters at the start of the string and after each newline. The string is
.V/1LBOG\n.F/AV0094/08NOV/SAL/Y\n.E/0134249356001"
From the string above I need to capture .V/ and .E/. The regular expression I am using is
^.[VE]/*
But it only seems to catch .V/. Can anyone see why? I thought ^ matched after newlines as well as at the start of the string. Any help would be much appreciated, as I've had this problem for a while now. If this is not the correct way of doing this, could you propose a different one?
Regex 101:
^ means start of string. And you guessed it right. There can only be one start of string.
^.[VE]/*
means:
Match start of string, followed by any character (other than newline), followed by either a V or an E, followed by zero or more / characters (greedy).
Probably you want something like this:
\.[VE].*?(?:\\n|$)
Which means: match a literal dot, followed by V or E, and then match everything up to a literal \n (a backslash followed by n) or the end of the string.
Comment if I am wrong.
So .V/1LBOG\n.F/AV0094/08NOV/SAL/Y\n.E/0134249356001"
looks like this?
.V/1LBOG
.F/AV0094/08NOV/SAL/Y
.E/0134249356001"
If yes, then you need to change your regex a little bit:
\.[VE].*
This relies on the fact that . does not match newlines by default.
. in regular expressions matches any single character, not a literal period. If you want to match a literal period, you need to escape it (\.). * doesn't match any number of any characters (as in most shells), but instead matches zero or more instances of whatever you put before it. For example, A* will match the empty string, A, AAAA, etc., and .* will match any string that does not contain a newline.
^ means the beginning of a line. ^\.[VE]/ will match .V/ and .E/ (but only at the start of the line).
If you need .V or .E, try ^.(V|E)/*. The alternation operator | is useful here because it checks for either ^.V/* or ^.E/*.
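The difference between "start of string" and "start of line" can be illustrated with the patterns from the answers above. This sketch uses Python's re module rather than Boost.Regex, so the flag name re.MULTILINE and the default anchoring behaviour are assumptions that may not carry over to Boost unchanged; the string is assumed to contain real newline characters, and the variable names are illustrative.

```python
import re

# Assumes real newline characters between the three records.
text = ".V/1LBOG\n.F/AV0094/08NOV/SAL/Y\n.E/0134249356001"

# Default mode in Python: ^ matches only at the very start of the string.
start_only = re.findall(r"^\.[VE]/", text)
# MULTILINE mode: ^ also matches right after each newline.
every_line = re.findall(r"^\.[VE]/", text, re.MULTILINE)
# Unanchored variant that grabs the rest of the line as well.
whole_lines = re.findall(r"\.[VE].*", text)

print(start_only)
print(every_line)
print(whole_lines)
```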