Find whitespace at end of string using wildcards or regex

I have a Resoure.resx file that I need to search for strings ending with whitespace. I have noticed that in Visual Web Developer I can search using both regex and wildcards, but I cannot figure out how to find only strings with whitespace at the end. I tried this regex, but it didn't work:
\s$
Can you give me an example? Thanks!

I'd expect that to work, although since \s includes \n and \r, perhaps it's getting confused. Or I suppose it's possible (but really unlikely) that the flavor of regular expressions that Visual Web Developer uses (I don't have a copy) doesn't have the \s character class. Try this:
[ \f\t\v]$
...which searches for a space, formfeed, tab, or vertical tab at the end of a line.
If you're doing a search and replace and want to get rid of all of the whitespace at the end of the line, then as RageZ points out, you'll want to include a greedy quantifier (+ meaning "one or more") so that you grab as much as you can:
[ \f\t\v]+$
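The two patterns above can also be tried outside the editor. This is a minimal JavaScript sketch, assuming the strings are already available as an array; `values`, `endsWithWhitespace` and `stripTrailing` are hypothetical names used only for illustration and are not part of Visual Web Developer.

```javascript
// Hypothetical sample values standing in for strings from a resource file.
const values = ["Save", "Cancel ", "Name:\t", "OK"];

// [ \f\t\v]$ : a single space, form feed, tab, or vertical tab at the very end.
const endsWithWhitespace = (s) => /[ \f\t\v]$/.test(s);

// [ \f\t\v]+$ : the + grabs the whole trailing run so it can be removed in one go.
const stripTrailing = (s) => s.replace(/[ \f\t\v]+$/, "");

const flagged = values.filter(endsWithWhitespace);
const cleaned = values.map(stripTrailing);

console.log(flagged);
console.log(cleaned);
```

Here `flagged` holds only the entries that end in whitespace, and `cleaned` holds every entry with that trailing run removed.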

You were almost there. Adding the + sign means "one or more characters", with no upper limit.
This would probably make it:
\s+$

Perhaps this would work:
^.+\s$
Using this you'll be able to find nonempty lines that end with a whitespace character.

Related

What is the difference between `(\S.*\S)` and `^\s*(.*)\s*$` in regex?

I'm doing the RegexOne regex tutorial and it has a question about writing a regular expression to remove unnecessary whitespace.
The solution provided in the tutorial is
We can just skip all the starting and ending whitespace by not capturing it in a line. For example, the expression ^\s*(.*)\s*$ will catch only the content.
The setup for the question does indicate the use of the hat at the beginning and the dollar sign at the end, so it makes sense that this is the expression that they want:
We have previously seen how to match a full line of text using the hat ^ and the dollar sign $ respectively. When used in conjunction with the whitespace \s, you can easily skip all preceding and trailing spaces.
That said, using \S instead, I was able to come up with what seems like a simpler solution - (\S.*\S).
I've found this Stack Overflow solution that matches the one in the tutorial - Regex Email - Ignore leading and trailing spaces? and I've seen other guides that recommend the same format, but I'm struggling to find an explanation for why the \S is bad.
Additionally, this validates as correct in their tool... so, are there cases where this would not work as well as the provided solution? Or is the recommended version just a standard format?
The tutorial's solution of ^\s*(.*)\s*$ is wrong. The capture group .* is greedy, so it will expand as much as it can, all the way to the end of the line - it will capture trailing spaces too. The .* will never backtrack, so the \s* that follows will never consume any characters.
https://regex101.com/r/584uVG/1
Your solution is much better at actually matching only the non-whitespace content in the line, but there are a couple of odd cases in which it won't match the content correctly. (\S.*\S) needs at least two non-whitespace characters to match at all, so a line containing a single character is missed, whereas the tutorial's (.*) can capture a single character, or no characters at all if the input is composed entirely of whitespace.
But, given the problem description at your link:
Occasionally, you'll find yourself with a log file that has ill-formatted whitespace where lines are indented too much or not enough. One way to fix this is to use an editor's search a replace and a regular expression to extract the content of the lines without the extra whitespace.
From this, matching only the non-whitespace content (like you're doing) probably wouldn't remove the undesirable leading and trailing spaces. The tutorial is probably thinking to guide you towards a technique that can be used to match a whole line with a particular pattern, and then replace that line with only the captured group, like:
Match ^\s*(.*\S)\s*$, replace with $1: https://regex101.com/r/584uVG/2/
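The difference between the three patterns can be checked directly. This is a small sketch on made-up input strings, with hypothetical variable names; it uses single-line strings so that \s never has a chance to run across a newline.

```javascript
const line = "   foo bar  ";

// Tutorial pattern: the greedy (.*) runs to the end of the line,
// so the trailing \s* is left with nothing to match.
const tutorial = line.match(/^\s*(.*)\s*$/)[1];

// Adjusted pattern: forcing the group to end on \S keeps trailing spaces out.
const adjusted = line.match(/^\s*(.*\S)\s*$/)[1];

// The asker's pattern needs two non-whitespace characters to match at all.
const single = "  a  ".match(/(\S.*\S)/);

console.log(JSON.stringify(tutorial)); // "foo bar  "
console.log(JSON.stringify(adjusted)); // "foo bar"
console.log(single);                   // null
```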
Your technique would work given the problem if you had a way to make a new text file containing only the captured groups (or all the full matches), eg:
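// Sample text with stray leading and trailing whitespace
const input = ` foo
bar
baz
qux `;
// Collect each line's content, from its first to its last non-whitespace character
const newText = (input.match(/\S(?:$|.*\S)/gm) || [])
  .join('\n');
console.log(newText);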
Using \S instead of . is not bad. If one knows a particular location must be matched by a non-space character, rather than by a space, using \S is more precise, makes the intent of the pattern clearer, can make a bad match fail faster, and can avoid problems with catastrophic backtracking in some cases. These patterns don't have backtracking issues, but it's still a good habit to get into.

match first space on a line using sublime text and regular expressions

So regular expressions have always been tough for me. I'm getting frustrated trying to find a regular expression that will select the first whitespace on a line, so that I can use Sublime Text to replace it with a /
If you could give a quick explanation, that would help too.
In the spirit of @edi's answer, but with some explanation of what's happening. Match the beginning of the line with ^, then look for a sequence of characters that are not whitespace with [^\s]* or \S* (the former may work in more editors, libraries, etc than the latter), then find the first whitespace character with \s. Putting these together, you have
^[^\s]*\s
You may want to group the non-whitespace and whitespace parts, so you can do the replacement you're talking about:
^([^\s]*)(\s)
Then the replacement pattern is just \1/
You can use this regex.
^([^\s]*)\s
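The same replacement can be sketched in JavaScript. The sample text and variable names are invented for illustration, and JavaScript writes the backreference as $1 where Sublime Text uses \1. Lines are handled one at a time here, on the assumption that a line with no whitespace should be left alone; run against the whole text, \s could otherwise match the newline itself.

```javascript
// Replace only the first whitespace character on each line with "/".
const text = "alpha beta gamma\nnospace\none two";

const replaced = text
  .split("\n")
  // ^(\S*) captures the leading non-whitespace run; \s is the first whitespace after it.
  .map((line) => line.replace(/^(\S*)\s/, "$1/"))
  .join("\n");

console.log(replaced);
```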

vim regex replace multiple consecutive spaces with only one space

I often work with text files which have a variable amount of whitespaces as word separators (text processors like Word do this, to distribute fairly the whitespace amount due to different sizes of letters in certain fonts and they put this annoying variable amount of spaces even when saving as plain text).
I would like to automate the process of replacing these variable-length sequences of whitespace with single spaces. I suspect a regex could do it, but there is also whitespace at the beginning of paragraphs (usually four spaces, but not always), which I want to leave unchanged, so my regex must not touch the leading whitespace, and this adds to the complexity.
I'm using vim, so a regex in the vim regex dialect would be very useful to me, if this is doable.
My current progress looks like this:
:%s/ \+/ /g
but it doesn't work correctly.
I'm also considering writing a vim script that would parse the text line by line, process each line char by char and skip the whitespace after the first one, but I have a feeling this would be overkill.
this will replace 2 or more spaces
s/ \{2,}/ /g
or you could add an extra space before the \+ to your version, so that it only matches two or more spaces
s/  \+/ /g
This will do the trick:
%s![^ ]\zs \+! !g
Many substitutions can be done in Vim easier than with other regex dialects by using the \zs and \ze meta-sequences. What they do is to exclude part of the match from the final result, either the part before the sequence (\zs, “s” for “start here”) or the part after (\ze, “e” for “end here”). In this case, the pattern must match one non-space character first ([^ ]) but the following \zs says that the final match result (which is what will be replaced) starts after that character.
Since there is no way to have a non-space character in front of line-leading whitespace, it will not be matched by the pattern, so the substitution will not replace it. Simple.
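JavaScript has no \zs, but a lookbehind can stand in for it. The sketch below is an analogue under that assumption, not the Vim command itself, and the sample text is made up. Because JavaScript sees the whole text as one string rather than line by line, the newline is excluded from the "non-space" class so that indentation on later lines is still left alone.

```javascript
// (?<=[^ \n]) requires a non-space, non-newline character before the run of
// spaces without making that character part of the match.
const input = "    This  is   a paragraph\n    And   another.";

const collapsed = input.replace(/(?<=[^ \n]) +/g, " ");

console.log(collapsed);
```

The leading four spaces on both lines survive because nothing that satisfies the lookbehind precedes them.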
In the interests of pragmatism, I tend to just do it as a three-stage process:
:g/^    /s//XYZZYPARA/g
:g/ \+/s// /g
:g/^XYZZYPARA/s//    /g
I don't doubt that there may be a better way (perhaps using macros or even a pure regex way) but I usually find this works when I'm in a hurry. Of course, if you have lines starting with XYZZYPARA, you may want to adjust the string :-)
It's good enough to turn:
    This is a new    paragraph
spanning   two lines.
    And so   is this but on one line.
into:
    This is a new paragraph
spanning two lines.
    And so is this but on one line.
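The three stages translate fairly directly to other regex engines. This is a rough JavaScript sketch of the same idea, assuming a four-space paragraph indent and that the placeholder never occurs in the text; the sample string and variable names are hypothetical.

```javascript
const source = "    This is a new    paragraph\nspanning   two lines.";

const result = source
  .replace(/^    /gm, "XYZZYPARA")   // 1. hide the four-space indent behind a placeholder
  .replace(/ +/g, " ")               // 2. collapse every remaining run of spaces
  .replace(/^XYZZYPARA/gm, "    ");  // 3. put the indent back

console.log(result);
```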
Aside: If you're wondering why I use :g instead of :s, that's just habit mostly. :g can do everything :s can and so much more. It's actually a way to execute an arbitrary command on selected lines. The command to execute happens to be s in this case so there's no real difference but, if you want to become a vi power user, you should look into :g at some point.
There are lots of good answers here (especially Aristotle's: \zs and \ze are well worth learning). Just for completeness, you can also do this with a negative look-behind assertion:
:%s/\(^ *\)\@<! \{2,}/ /g
This says "find 2 or more spaces (' \{2,}') that are NOT preceded by 'the start of the line followed by zero or more spaces'". If you prefer to reduce the number of backslashes, you can also do this:
:%s/\v(^ *)@<! {2,}/ /g
but it only saves you two characters! You could also use ' +' instead of ' {2,}' if you don't mind it doing a load of redundant changes (i.e. changing a single space to a single space).
You could also use a positive look-behind to just check for a single non-space character:
:%s/\S\@<=\s\+/ /g
which is much the same as (a slightly modified version of Aristotle's to treat spaces and tabs as the same in order to save a bit of typing):
:%s/\S\zs \+/ /g
See:
:help \zs
:help \ze
:help \@<!
:help zero-width
:help \v
and (read it all!):
:help pattern.txt
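For comparison, the negative look-behind idea carries over to engines that allow variable-length lookbehind. The following is a sketch in JavaScript, where (?<!...) is the counterpart of Vim's \@<!; the sample text is invented, and the m flag is assumed so that ^ matches at the start of every line.

```javascript
// Two or more spaces that are NOT preceded by "start of line plus only spaces".
const sample = "    indented   text  here\nplain    line";

const fixed = sample.replace(/(?<!^ *) {2,}/gm, " ");

console.log(fixed);
```

Inside the leading indent the lookbehind always finds "start of line followed by spaces", so no match can begin there and the indent is kept.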
Answered, but I thought I'd toss my workflow in anyway.
%s/  / /g
@:@:@:@:@:@:@:@:@:@:@:@:(repeat till clean)
Fast and simple to remember. There are far more elegant solutions above, but that's just my .02.
Does this work?
%s/\([^ ]\)  */\1 /g
I like this version - it is similar to the look ahead version of Aristotle Pagaltzis, but I find it easier to understand. (Probably just my unfamiliarity with \zs)
s/\([^ ]\) \+/\1 /g
or for all whitespace
s/\(\S\)\s\+/\1 /g
I read it as "replace all occurrences of something other than a space followed by multiple spaces with the something and a single space".
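The capture-group version needs no lookbehind support, which makes it easy to port. This is a minimal JavaScript sketch with made-up input, writing the backreference as $1 instead of \1. It is applied to single lines here because \s also matches newlines in JavaScript.

```javascript
// Keep the non-whitespace character (group 1) and replace the run after it with one space.
const tidy = "one   two \t three".replace(/(\S)\s+/g, "$1 ");

// Leading whitespace has no \S in front of it, so it is left as it was.
const indented = "  lead   x".replace(/(\S)\s+/g, "$1 ");

console.log(JSON.stringify(tidy));
console.log(JSON.stringify(indented));
```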

How to deal with the new line character in the Silverlight TextBox

When using a multi-line TextBox (AcceptsReturn="True") in Silverlight, line feeds are recorded as \r rather than \r\n. This is causing problems when the data is persisted and later exported to another format to be read by a Windows application.
I was thinking of using a regular expression to replace any single \r characters with a \r\n, but I suck at regexes and couldn't get it to work.
Because there may be a mixture of line endings, just blindly replacing all \r with \r\n doesn't cut it.
So two questions really...
If regex is the way to go what's the correct pattern?
Is there a way to get Silverlight to respect its own Environment.NewLine character in TextBoxes and have it insert \r\n rather than just a single \r?
I don't know Silverlight, but I imagine (I hope!) there's a way to get it to respect Environment.NewLine; that would be a better approach. If there isn't, however, you can use a regex. I'll assume you have text which contains all of \r, \n, and \r\n, and never uses those as anything but line endings; you just want consistency. (If they show up as non-line ending data, the regex solution becomes much harder, and possibly impossible.) You thus want to replace all occurrences of \r(?!\n)|(?<!\r)\n with \r\n. The first half of the regex matches any \r not followed by a \n; the second half matches a lone \n which wasn't preceded by a \r.
The fancy operators in this regex are termed lookaround: (?=...) is a positive lookahead, (?<=...) is a positive lookbehind, (?!...) is a negative lookahead, and (?<!...) is a negative lookbehind. Each of them is a zero-width assertion like ^ or $; they match successfully without consuming input if the given regex succeeds/fails (for positive/negative, respectively) to match after/before (for lookahead/lookbehind) the current location in the string.
I don't know Silverlight at all (and I find the behavior you're describing very strange), but perhaps you could try searching for \r(?!\n) and replacing that with \r\n.
\r(?!\n) means "match a \r if and only if it's not followed by \n".
If you also happen to have \n without preceding \rs and want to "normalize" those too, then search for \r(?!\n)|(?<!\r)\n and replace with \r\n.
(?<!\r)\n means "match a \n if and only if it's not preceded by \r".
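The combined pattern can be tried on a string with mixed line endings. This sketch uses JavaScript rather than Silverlight's own regex API, purely to illustrate the lookarounds; the sample string and variable names are assumptions.

```javascript
// A lone \r, a lone \n, and an already-correct \r\n in one string.
const raw = "one\rtwo\nthree\r\nfour";

// \r(?!\n)  : a \r that is not followed by \n
// (?<!\r)\n : a \n that is not preceded by \r
const normalized = raw.replace(/\r(?!\n)|(?<!\r)\n/g, "\r\n");

console.log(JSON.stringify(normalized)); // "one\r\ntwo\r\nthree\r\nfour"
```

The existing \r\n pair is skipped by both halves of the pattern, so it is not doubled.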

How to read this command to remove all blanks at the end of a line

I happened across this page full of super useful and rather cryptic vim tips at http://rayninfo.co.uk/vimtips.html. I've tried a few of these and I understand what is happening enough to be able to parse it correctly in my head so that I can possibly recreate it later. One that I'm having a hard time getting my head wrapped around, though, is the following pair of commands to remove all spaces from the end of every line
:%s=  *$== : delete end of line blanks
:%s= \+$== : Same thing
I'm interpreting %s as string replacement on every line in the file, but after that I am getting lost in what looks like some gnarly variation of :s and regex. I'm used to seeing and using :s/regex/replacement. But the above is super confusing.
What do the above commands mean in English, step by step?
The regex delimiters don't have to be slashes, they can be other characters as well. This is handy if your search or replacement strings contain slashes. In this case I don't know why they use equal signs instead of slashes, but you can pretend that the equals are slashes:
:%s/  *$//
:%s/ \+$//
Does that make sense? The first one searches for a space followed by zero or more spaces, and the second one searches for one or more spaces. Each one is anchored at the end of the line with $. And then the replacement string is empty, so the spaces are deleted.
I understand your confusion, actually. If you look at :help :s you have to scroll down a few pages before you find this note:
*E146*
Instead of the '/' which surrounds the pattern and replacement string, you
can use any other character, but not an alphanumeric character, '\', '"' or
'|'. This is useful if you want to include a '/' in the search pattern or
replacement string. Example:
:s+/+//+
I do not know vim syntax, but it looks to me like these are sed-style substitution operators. In sed, the / (in s/REGEX/REPLACEMENT/) can be uniformly replaced with any other single character. Here it appears to be =. So if you mentally replace = with /, you'll get
:%s/  *$//
:%s/ \+$//
which should make more sense to you.
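To see that the two patterns really do the same thing, here is a small JavaScript sketch on an invented sample. JavaScript writes the quantifier as a bare + where Vim needs \+, and the m flag is assumed so that $ matches at the end of each line.

```javascript
const lines = "foo  \nbar \nbaz";

// A space followed by zero or more spaces, anchored at end of line.
const viaStar = lines.replace(/  *$/gm, "");

// One or more spaces, anchored at end of line.
const viaPlus = lines.replace(/ +$/gm, "");

console.log(JSON.stringify(viaStar));
console.log(JSON.stringify(viaPlus));
```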