Regex without \r\n and not entirely space - regex

I'm checking for valid user input for an executable; however, "executable" here also includes shell commands like del/rm and dir/ls. The input is collected through XML and is validated using XSD. I will not check for file existence, since my program submits to a server, which may or may not have access to the same files.
The only requirements then, are that it not have a new line \r or \n and it cannot be entirely white space. I think it would be valid to assume that tab \t would not be allowed either, but I am more concerned with newlines.
Thanks

Does this mean you have the limitations mentioned here:
http://www.regular-expressions.info/xml.html
If so, then you probably want something like this:
[^\r\n\t]*[^\r\n\t\s][^\r\n\t]*
The middle part means there has to be one character that is not a newline, tab, or whitespace. The rest of it means zero or more characters around that character that aren't a newline or tab (but it can be whitespace). I think you might be able to remove the \r\n\t from the middle group because they all might be encompassed in \s but I haven't tested any of this.
Remove the three occurrences of \t if you want tabs.
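As a rough check of the untested hunch above, here is a minimal JavaScript sketch comparing the pattern with a version whose middle group is reduced to \S. XSD patterns are implicitly anchored to the whole value, so ^ and $ are added by hand here. The variable names and sample strings are made up for illustration, and JavaScript's \s covers more characters than XSD's, so treat this as an approximation rather than a proof.

```javascript
// XSD patterns match the whole value implicitly; JavaScript needs explicit anchors.
const original = /^[^\r\n\t]*[^\r\n\t\s][^\r\n\t]*$/;
// Simplified middle group: \r, \n and \t are already whitespace, so \S excludes them.
const simplified = /^[^\r\n\t]*\S[^\r\n\t]*$/;

// Hypothetical sample inputs.
const samples = ["ls -la", "  dir  ", "   ", "", "rm\nfoo", "del\tfoo"];
for (const s of samples) {
  // Print each input with the verdict of both patterns side by side.
  console.log(JSON.stringify(s), original.test(s), simplified.test(s));
}
```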

I am not entirely sure what you want to do, but a regular expression for "no newline and not just whitespace" would be
[ \t]*\S[^\r\n]*
This matches zero or more spaces or tabs, followed by a non-whitespace character and an arbitrary number of characters that are not \r or \n (including spaces and tabs). It cannot match a string consisting of only whitespace (as there would be no non-whitespace character matching \S).
To prohibit tabs also, you can change this to read
[ ]*\S[^\r\n\t]*
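To try both variants outside of XSD, a small JavaScript sketch might look like the following. The function name isValidCommand and its tabsAllowed parameter are hypothetical, and the ^ and $ anchors are added because JavaScript, unlike XSD, does not anchor patterns to the whole value on its own.

```javascript
// Variant that tolerates tabs: leading spaces/tabs, one non-whitespace
// character, then anything except \r and \n.
const allowTabs = /^[ \t]*\S[^\r\n]*$/;
// Variant that also rejects tabs anywhere in the input.
const noTabs = /^[ ]*\S[^\r\n\t]*$/;

// Hypothetical helper: true when the input has no newline and is not blank.
function isValidCommand(input, tabsAllowed = true) {
  return (tabsAllowed ? allowTabs : noTabs).test(input);
}

console.log(isValidCommand("dir /w"));
console.log(isValidCommand("   "));
console.log(isValidCommand("dir\t/w", false));
```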

Related

RegExp space character

I have this regular expression: ^[a-zA-Z]\s{3,16}$
What I want is to match any name that may contain spaces, for example John Smith, and that is 3 to 16 characters long.
What am I doing wrong?
Background
There are a couple of things to note here. First, a quantifier (in this case, {3,16}) only applies to the last regex token. So what your current regex really is saying is to "Match any string that has a single alphabetical character (case-insensitive) followed by 3 to 16 whitespace characters (e.g. spaces, tabs, etc.)."
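A quick way to see this is to run the original pattern against a few strings. This is only an illustrative sketch and the sample inputs are made up.

```javascript
// The pattern from the question: one letter, then 3 to 16 whitespace characters.
const original = /^[a-zA-Z]\s{3,16}$/;

const johnSmith = original.test("John Smith"); // a real name is rejected
const letterAndSpaces = original.test("J   "); // "J" plus three spaces is accepted
const tooFewSpaces = original.test("J  ");     // only two spaces: rejected

console.log(johnSmith, letterAndSpaces, tooFewSpaces);
```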
Second, a name can have more than 2 parts (a middle name, certain ethnic names like "De La Cruz") or include special characters such as accented vowels. You should consider if this is something you need to account for in your program. These things are important and should be considered for any real application.
Assumptions and Answer
Now, let's just assume you only want a certain format for names that consists of a first name, a last name, and a space. Let's also assume you only want simple ASCII characters (i.e. no special characters or accented characters). Furthermore, both the first and last names should start with a capital character followed by only lower-case characters. Other than that, there are no restrictions on the length of the individual parts of the name. In this case, the following regex would do the trick:
^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$
Notes
The first token after the ^ character is what is called a positive lookahead. Basically a positive look ahead will match the regex between the opening (?= and closing ) without actually moving the position of the cursor that is matching the string.
Notice I removed the \s token, since you usually want only a literal space character. The space can be replaced with the \s token if tabs and other whitespace are desired there.
I also added a restriction that a name must start with a capital letter followed by only lower-case letters.
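To illustrate what the lookahead contributes, this sketch compares the regex with and without it. The names used as input are invented for the example.

```javascript
// With the lookahead: the whole string must be 3 to 16 characters long.
const withLength = /^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/;
// Without the lookahead: same shape, but no overall length limit.
const withoutLength = /^[A-Z][a-z]+ [A-Z][a-z]+$/;

const longName = "Christopher Smithson"; // 20 characters

console.log(withLength.test("John Smith"));   // shape and length both fit
console.log(withLength.test(longName));       // shape fits, length does not
console.log(withoutLength.test(longName));    // accepted once the limit is gone
console.log(withLength.test("john smith"));   // lowercase initials are rejected
```

Because the lookahead does not consume characters, the rest of the pattern still starts matching at the first character of the string.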
Crude English Translation
To help your understanding, here is a simple English translation of what the regex is really doing.
"Match any string that has 3-16 characters and starts with a capital alphabetical character followed by one or more (+) lowercase alphabetical characters followed by a single space followed by a capital alphabetical character followed by one or more (+) lowercase alphabetical characters, with nothing after them."
Tools
There are a couple of tools I like to use when I am trying to tackle a challenging regex. They are listed below in no particular order:
https://regex101.com/ - Allows you to test regex expressions in real time. It also has a nifty little library to help you along.
http://www.regular-expressions.info/ - Basically a repository of knowledge on regex.
Edit/Update
You mentioned in your comments that you are using your regex in JavaScript. JavaScript uses a forward slash surrounding the regex to determine what is a regex. For this simple case, there are 2 options for using a regex to match a string.
First, use String's match method as follows
"John Smith".match(/^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/);
Second, create a regex and use its test() method. For example,
/^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/.test("John Smith");
The latter is probably what you want as it simply returns true or false depending on whether the regex actually matches the string or not.
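The difference between the two options shows up in their return values. A minimal sketch, with variable names chosen only for this example:

```javascript
const nameRegex = /^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/;

// match() returns an array describing the match, or null when there is none.
const hit = "John Smith".match(nameRegex);
const miss = "john smith".match(nameRegex);

// test() returns a plain boolean.
const ok = nameRegex.test("John Smith");

console.log(hit && hit[0], miss, ok);
```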

CQ5 textfield validation with regex

I have a simple CQ dialog with a textfield. The authors somehow managed to paste illegal characters into it, the last two times it was a vertical tab (VT) copied from a PowerPoint file.
I played around with some regex and came up with the following to exclude anything below SPACE and DEL:
/^[^\0-\x1F\x7F]*$/
Sadly I can't really test the vertical tab as I am not able to enter this character on regex101. So I tried it with TAB and this seems to be working: https://regex101.com/r/yH0lN5/1
But if I use this in my regex property of the textfield, no matter what I enter the validation fails. Any idea what I am doing wrong?
Whitelisting isn't an option, as I need to support Unicode characters such as Chinese in the future.
You should double the backslashes to make sure they are treated as literal backslashes by the regex engine.
Also, I suggest using consistent notation, and replace \0 with \x00:
regex="/^[^\\x00-\\x1F\\x7F]*$/"
This regex matches entire strings that contain zero or more characters (due to *) other than (due to the negated character class [^...]) the ones from NUL to US ([\x00-\x1F]) and the DEL character (\x7F).
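Since a vertical tab is hard to type into an online tester, one way to check the pattern is a short JavaScript sketch that writes the character as an escape. This assumes the property value goes through one level of string unescaping before it reaches the regex engine, which is why the backslashes are doubled. The surrounding slashes are left out because the RegExp constructor takes the bare pattern, and the variable names are illustrative.

```javascript
// Doubled backslashes: the string parser turns "\\x00" into the two
// characters \x00, which the regex engine then reads as an escape.
const propertyValue = "^[^\\x00-\\x1F\\x7F]*$";
const noControlChars = new RegExp(propertyValue);

console.log(noControlChars.test("plain text"));   // ordinary text
console.log(noControlChars.test("bad\x0Bpaste")); // contains a vertical tab (VT)
console.log(noControlChars.test("tab\there"));    // contains a TAB
console.log(noControlChars.test("中文"));          // non-ASCII text
```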

In VIM, why don't you have to add back '$' in a search and replace?

I've been learning how to do more complex search and replace functions in VIM, and I ran across a use case like this:
:%s/$/|/g
This supposedly finds the end of every line and replaces it with a vertical pipe. When I was first learning this, though, I assumed you would have to add the end-of-line character in the replacement string to get the expected results. i.e.,
:%s/$/|$/g
Why does it work without it and still preserve the line break? Shouldn't it be replacing the line's terminating character with your string and removing it in the process?
The same thing could be asked with the beginning-of-line character, ^.
The $ anchor does not include the newline character. In fact, it is a zero-width token: it matches the empty position at the end of each line, just before the line break. The substitution therefore inserts text at that position and never touches the line break itself.
Similarly, ^ matches the empty position before the first character of each line.
See http://www.regular-expressions.info/anchors.html for more details.
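The same zero-width behaviour can be seen outside Vim. The sketch below uses JavaScript rather than Vim's regex engine, so it is only an analogy: the m flag makes ^ and $ match at every line, and the sample text is made up.

```javascript
const text = "one\ntwo";

// $ matches the empty position at the end of each line; the "\n" is untouched.
const piped = text.replace(/$/gm, "|");

// ^ matches the empty position at the start of each line.
const marked = text.replace(/^/gm, "> ");

console.log(JSON.stringify(piped));  // "one|\ntwo|"
console.log(JSON.stringify(marked)); // "> one\n> two"
```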

Find whitespace in end of string using wildcards or regex

I have a Resoure.resx file that I need to search to find strings ending with whitespace. I have noticed that in Visual Web Developer I can search using both regex and wildcards, but I cannot figure out how to find only strings with whitespace at the end. I tried this regex, but it didn't work:
\s$
Can you give me an example? Thanks!
I'd expect that to work, although since \s includes \n and \r, perhaps it's getting confused. Or I suppose it's possible (but really unlikely) that the flavor of regular expressions that Visual Web Developer uses (I don't have a copy) doesn't have the \s character class. Try this:
[ \f\t\v]$
...which searches for a space, formfeed, tab, or vertical tab at the end of a line.
If you're doing a search and replace and want to get rid of all of the whitespace at the end of the line, then as RageZ points out, you'll want to include a greedy quantifier (+ meaning "one or more") so that you grab as much as you can:
[ \f\t\v]+$
You were almost there. Adding the + sign means "one or more characters".
This would probably make it:
\s+$
Perhaps this would work:
^.+\s$
Using this you'll be able to find nonempty lines that end with a whitespace character.
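For comparison, here is how the explicit character class behaves in JavaScript. This is a sketch under the assumption that the lines are available as plain strings; it does not use Visual Web Developer's search dialog, and the sample data is invented.

```javascript
// A space, form feed, tab, or vertical tab at the end of the line.
const endsWithWhitespace = /[ \f\t\v]$/;

// Hypothetical sample lines.
const lines = ["ok", "trailing space ", "trailing tab\t", ""];
const offenders = lines.filter((line) => endsWithWhitespace.test(line));

// Search and replace: the greedy + grabs the whole run of trailing whitespace,
// and the m flag lets $ match at the end of every line.
const stripped = "a  \nb\t\nc".replace(/[ \f\t\v]+$/gm, "");

console.log(offenders);
console.log(JSON.stringify(stripped));
```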

How to deal with the new line character in the Silverlight TextBox

When using a multi-line TextBox (AcceptsReturn="True") in Silverlight, line feeds are recorded as \r rather than \r\n. This is causing problems when the data is persisted and later exported to another format to be read by a Windows application.
I was thinking of using a regular expression to replace any single \r characters with a \r\n, but I suck at regexes and couldn't get it to work.
Because there may be a mixture of line endings, just blindly replacing all \r with \r\n doesn't cut it.
So two questions really...
If regex is the way to go what's the correct pattern?
Is there a way to get Silverlight to respect its own Environment.NewLine character in TextBoxes and have it insert \r\n rather than just a single \r?
I don't know Silverlight, but I imagine (I hope!) there's a way to get it to respect Environment.NewLine; that would be a better approach. If there isn't, however, you can use a regex. I'll assume you have text which contains all of \r, \n, and \r\n, and never uses those as anything but line endings; you just want consistency. (If they show up as non-line-ending data, the regex solution becomes much harder, and possibly impossible.) You thus want to replace all occurrences of \r(?!\n)|(?<!\r)\n with \r\n. The first half of the regex matches any \r not followed by a \n; the second half matches a lone \n which wasn't preceded by a \r.
The fancy operators in this regex are termed lookaround: (?=...) is a positive lookahead, (?<=...) is a positive lookbehind, (?!...) is a negative lookahead, and (?<!...) is a negative lookbehind. Each of them is a zero-width assertion like ^ or $; they match successfully without consuming input if the given regex succeeds/fails (for positive/negative, respectively) to match after/before (for lookahead/lookbehind) the current location in the string.
I don't know Silverlight at all (and I find the behavior you're describing very strange), but perhaps you could try searching for \r(?!\n) and replacing that with \r\n.
\r(?!\n) means "match a \r if and only if it's not followed by \n".
If you also happen to have \n without preceding \rs and want to "normalize" those too, then search for \r(?!\n)|(?<!\r)\n and replace with \r\n.
(?<!\r)\n means "match a \n if and only if it's not preceded by \r".
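The combined pattern can be tried out with a short sketch. It is written in JavaScript rather than Silverlight's .NET regex API, the function name normalizeNewlines is hypothetical, and lookbehind needs a reasonably recent JavaScript engine (ES2018 or later).

```javascript
// Replace every lone \r and every lone \n with \r\n; existing \r\n pairs
// are left alone because neither alternative matches inside them.
function normalizeNewlines(text) {
  return text.replace(/\r(?!\n)|(?<!\r)\n/g, "\r\n");
}

const mixed = "a\rb\nc\r\nd";
const normalized = normalizeNewlines(mixed);

console.log(JSON.stringify(normalized)); // "a\r\nb\r\nc\r\nd"
```

Running the function a second time on its own output should change nothing, which is a handy sanity check for mixed input.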