How to read this command to remove all blanks at the end of a line - regex

I happened across this page full of super useful and rather cryptic vim tips at http://rayninfo.co.uk/vimtips.html. I've tried a few of these, and I understand what is happening well enough to parse them in my head so that I can possibly recreate them later. One thing I'm having a hard time wrapping my head around, though, is the following pair of commands to remove all spaces from the end of every line:
:%s= *$== : delete end of line blanks
:%s= \+$== : Same thing
I'm interpreting %s as string replacement on every line in the file, but after that I am getting lost in what looks like some gnarly variation of :s and regex. I'm used to seeing and using :s/regex/replacement. But the above is super confusing.
What do those above commands mean in english, step by step?

The regex delimiters don't have to be slashes, they can be other characters as well. This is handy if your search or replacement strings contain slashes. In this case I don't know why they use equal signs instead of slashes, but you can pretend that the equals are slashes:
:%s/ *$//
:%s/ \+$//
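Does that make sense? The first one searches for zero or more spaces (the * applies to the single space in front of it), and the second one searches for one or more spaces. Each one is anchored at the end of the line with $, and the replacement string is empty, so the spaces are deleted. The only practical difference is that the first pattern also matches the empty string at the end of a line, so it "succeeds" on every line, even on lines that have no trailing spaces.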
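A quick way to convince yourself of that difference is to replay both patterns in Python (used here purely as an illustration, not Vim itself). Python spells Vim's \+ as a plain +, and the hypothetical helper substitute_each_line applies at most one substitution per line, like :%s without the g flag:

```python
import re


def substitute_each_line(text, pattern, replacement=""):
    """Mimic Vim's :%s/pattern/replacement/ (no g flag): at most one
    substitution per line. Returns (new_text, number_of_lines_substituted)."""
    new_lines = []
    hits = 0
    for line in text.split("\n"):
        new_line, n = re.subn(pattern, replacement, line, count=1)
        new_lines.append(new_line)
        hits += n
    return "\n".join(new_lines), hits


sample = "foo   \nbar\nbaz "

print(substitute_each_line(sample, r" *$"))  # ('foo\nbar\nbaz', 3): matches on every line
print(substitute_each_line(sample, r" +$"))  # ('foo\nbar\nbaz', 2): only lines ending in spaces
```

Both produce the same text; the first just reports a (harmless, empty) substitution on "bar" as well.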
I understand your confusion, actually. If you look at :help :s you have to scroll down a few pages before you find this note:
*E146*
Instead of the '/' which surrounds the pattern and replacement string, you
can use any other character, but not an alphanumeric character, '\', '"' or
'|'. This is useful if you want to include a '/' in the search pattern or
replacement string. Example:
:s+/+//+
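To make the "any delimiter" rule concrete, here is a toy splitter in Python. parse_substitute is a hypothetical helper written for this illustration, not anything Vim provides, and it deliberately ignores escaped delimiters and other corner cases:

```python
def parse_substitute(cmd):
    """Split a simple Ex substitute command such as ':%s= *$==' into
    (range, pattern, replacement, flags). Hypothetical helper: it does not
    handle escaped delimiters or an 's' inside the range."""
    if cmd.startswith(":"):
        cmd = cmd[1:]
    s_pos = cmd.index("s")                 # everything before 's' is the range, e.g. '%'
    line_range, rest = cmd[:s_pos], cmd[s_pos + 1:]
    delim, body = rest[0], rest[1:]        # the delimiter is whatever follows 's'
    # Per :help E146: anything except alphanumerics, '\', '"' or '|'.
    if delim.isalnum() or delim in '\\"|':
        raise ValueError(f"invalid delimiter {delim!r}")
    pattern, replacement, flags = (body.split(delim, 2) + ["", ""])[:3]
    return line_range, pattern, replacement, flags


print(parse_substitute(":%s= *$=="))  # ('%', ' *$', '', '')
print(parse_substitute(":s+/+//+"))   # ('', '/', '//', '')
```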

I do not know vim syntax, but it looks to me like these are sed-style substitution operators. In sed, the / (in s/REGEX/REPLACEMENT/) can be uniformly replaced with any other single character. Here it appears to be =. So if you mentally replace = with /, you'll get
:%s/ *$//
:%s/ \+$//
which should make more sense to you.

Related

reuse last matched character of regex in sed

Many of you with a certain leaning towards proper formatting will know the pain of finding a lot of space characters instead of a tab character at the beginning of indented lines after another person has edited a file and added lines. I seem to be unable to teach my colleagues how to use vim's integrated line pasting function, so I'm searching for some simple ways to automatically correct lines beginning with a certain pattern. ;)
I'm using a regex to find the corresponding lines, but I can't work out how to "reuse" the last matched character in sed when using "find and replace". The regex matching the lines is
'^\ *[A-Z]'
I would like to replace those space characters, but keep the uppercase letter. My idea would be something like
sed 's|^\ *[A-Z]|\t$|g'
or so, but I guess that would replace the whole line with a single tab character since $ usually matches the line ending?
Is there a simple way to reuse parts of the matched regex in sed?
How about simply not including the first non-space character in the match in the first place?
This matches all spaces at the beginning of a line:
^ *
Edit (quote from the comments):
obviously I don't want to replace spaces in front of other characters than uppercase letters
A look-ahead could do that, but unfortunately sed does not support it. But you can use the next best thing, an address: an expression that determines which lines sed operates on:
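sed '\|^ *[A-Z]| s|^ *|\t|'
(An address that uses a delimiter other than / has to start with a backslash, as in \|...|; the \t in the replacement is a GNU sed extension.)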
Of course a back-reference would do it as well:
sed 's|^ *\([A-Z]\)|\t\1|'
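For comparison, here are the same ideas in Python's re module (an illustration, not sed): restricting the edit to matching lines, using a back-reference, and the look-ahead that sed lacks. The function names are made up for the example:

```python
import re


def retab_with_address(line):
    # Like sed '/^ *[A-Z]/ s/^ */\t/': only lines matching the address are edited.
    if re.match(r" *[A-Z]", line):
        return re.sub(r"^ *", "\t", line, count=1)
    return line


def retab_with_backreference(line):
    # Like sed 's/^ *\([A-Z]\)/\t\1/': capture the letter and write it back.
    return re.sub(r"^ *([A-Z])", r"\t\1", line, count=1)


def retab_with_lookahead(line):
    # Not available in sed: (?=[A-Z]) checks for the letter without consuming it.
    return re.sub(r"^ *(?=[A-Z])", "\t", line, count=1)


for line in ["    Foo bar", "  baz", "Qux"]:
    print(repr(retab_with_lookahead(line)))
```

All three leave "  baz" alone because it does not start (after the spaces) with an uppercase letter. Note that a line like "Qux" with no leading spaces still gets a tab, since ^ * also matches zero spaces.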

How can I remove all the text between matches on a line?

I have this problem:
Input text:
this is my text text text and more text
this is my text myspace this is my text
this space is my text space this is my
this is my text this is my text
this space is my text space space myspace
Let's say I want to search for "space".
I would like to have this as output:
this is my text text text and more text
space
space space
this is my text this is my text
space space space space
Matches on the same line have to be separated with a space.
Line without matches must remain as it is.
Same for all other search items.
I've been trying to do this all afternoon, without success.
Can anyone help me?
Solution:
:g/space/s/\(.*space\).*$/\1/|s/.\{-}space/ space/g|s/^ //
Explanation:
This is tricky, but it can be done. It can't be done with a single regular expression, though.
The first thing we do is get rid of anything after the last match (we actually exploit the fact that regular expressions are greedy by default here):
s/\(.*space\).*$/\1/
Then we remove anything between all the internal matches (notice we use the lazy version of * here, \{-}):
s/.\{-}space/ space/g
The previous step will leave an initial space in the result, so we get rid of that:
s/^ //
Fortunately, in vim, we can chain replacements together with the | character. So, putting it all together:
:g/space/s/\(.*space\).*$/\1/|s/.\{-}space/ space/g|s/^ //
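Here is the same three-step idea replayed in Python and checked against the sample input from the question. This is only an illustration: Python's lazy .*? stands in for Vim's .\{-}, and the `"space" not in line` test plays the role of :g/space/ (keep_only_space is a made-up name):

```python
import re


def keep_only_space(line):
    if "space" not in line:                        # :g/space/ skips other lines
        return line
    line = re.sub(r"(.*space).*$", r"\1", line)    # 1. greedy: cut after the last match
    line = re.sub(r".*?space", " space", line)     # 2. lazy .*? is Vim's .\{-}
    return re.sub(r"^ ", "", line)                 # 3. drop the leading space


print(keep_only_space("this space is my text space space myspace"))
# space space space space
```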
Is this tricky line OK for you?
:g/space/s/space/^G/g|s/[^^G]//g|s/^G/space /g
To type each ^G above, press Ctrl-V Ctrl-G.
The output of the command above is the same as your example, except for the trailing whitespace after the last pattern (space in this case). That is easy to fix, e.g. by chaining another s/ $// onto the end of the :g line.
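Kent's trick translated to Python, as a rough sketch: the BEL character \x07 plays the role of ^G, and the code assumes that character never appears in the input. It also shows why the trick only works for fixed strings: the marker replaces a literal word, not a pattern.

```python
import re

MARK = "\x07"  # the ^G (BEL) character typed with Ctrl-V Ctrl-G


def kent_trick(line, word="space"):
    if word not in line:                     # :g/space/
        return line
    line = line.replace(word, MARK)          # s/space/^G/g
    line = re.sub(f"[^{MARK}]", "", line)    # s/[^^G]//g : delete everything else
    line = line.replace(MARK, word + " ")    # s/^G/space /g
    return re.sub(" $", "", line)            # the extra s/ $// suggested above


print(kent_trick("this space is my text space this is my"))  # space space
```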
Kent's solution uses a nice trick that makes it work only for fixed strings, but it's clean and short. Ethan Brown's answer is more general, but also adds complexity with its three steps. I think the best solution can be developed based on the accepted answer in this very similar question.
Contrary to what Ethan Brown assumes, this can indeed be done with a single regular expression substitution. Here it is, in all its ugliness:
:g/space/s/\%(^\|\%(space \)*space\%( \%(.*space\)\@=\)\?\)\zs\%(\%(space \)*space\%( \%(.*space\)\@=\)\?\)\@!.\{-1,}\ze\%(\%(space \)*space\%( \%(.*space\)\@=\)\?\|$\)//g
It becomes somewhat more readable when you use the :DeleteExcept command from my PatternsOnText plugin:
:g/space/DeleteExcept/\%(space \)*space\%( \%(.*space\)\@=\)\?/
Explanation
This deletes everything except
potentially multiple sequential occurrences \%(space \)*
of the word space
including the trailing whitespace when it's not the last match in the line, i.e. there's a following match \%(.*space\)\@= so that the whitespace is not swallowed
or excluding (i.e. deleting) it \? after the last match in the line.
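Python has look-ahead too, so the keep-only pattern can be tried outside Vim. This is a sketch, not the PatternsOnText plugin: \%(...\) becomes (?:...), \@= becomes (?=...), and re.findall collects the parts to keep (delete_except is a hypothetical name):

```python
import re

# (space )* space, plus a trailing blank only when another "space" follows later.
KEEP = re.compile(r"(?:space )*space(?: (?=.*space))?")


def delete_except(line):
    if "space" not in line:      # :g/space/
        return line
    return "".join(KEEP.findall(line))


print(delete_except("this space is my text space space myspace"))
# space space space space
```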
More practical alternative
Though it's a nice challenge to come up with the above solution, in practice, I would also favor a two-step approach, just because it's way simpler:
:g/space/DeleteExcept/space\%( \|$\)/
This leaves behind trailing whitespace that can be pruned with
:%s/ $//
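The two-step alternative, sketched in Python under the same assumptions (a made-up helper name, with re.findall standing in for DeleteExcept):

```python
import re


def delete_except_simple(line):
    if "space" not in line:                              # :g/space/
        return line
    kept = "".join(re.findall(r"space(?: |$)", line))    # like DeleteExcept/space\%( \|$\)/
    return re.sub(r" $", "", kept)                       # like :%s/ $//


print(delete_except_simple("this space is my text space this is my"))  # space space
```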

Struggling with regex in yahoo pipe

I'm using Yahoo Pipes to build a scraper that would scrape our company micro-site via xPath and generate an RSS feed that I can then embed on the main site.
So far I got as far as scraping the Job title and location from the page but I can't get the items to link out to the micro-site.
Here's my pipe so far: http://pipes.yahoo.com/pipes/pipe.info?_id=2bb5b8fedd0064b64d0e8861e3fc8fd5
I think I need to extract the href link from each node and then apply regex but I really can't get my head around it.
The link looks like this in the code: www2.jobs.badenochandclark.ch/JavaScript:OpenAssignment('a960c93a-11fe-4751-bc27-83a48429c3ba',%20'/Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba');
But I'm struggling to write a regex that would turn that into a plain link to the job's details page (the /Jobs/Details/... path in the second argument).
So I'm stuck on how to extract the link and then how to build it into the pipe. Any help or nudge in the right direction would be really appreciated.
Here you go:
http://pipes.yahoo.com/pipes/pipe.info?_id=d564b802185d5777d757ed4189470941
I used slightly less complicated code in the Regex module. It is often easier to erase the code you do not want than to try to extract it and assign it to a variable.
In plx.link.href, find this -> JavaScript(.+)Jobs, replace with -> jobs
In plx.link.href, find this -> \'\); replace with -> (leave blank)
The trailing bit of code '); needs backslashes because ) is a regex control character; adding the backslash \ makes the regex read it literally as a text character. (Escaping the ' is harmless, though in most regex flavors it is not strictly necessary.)
This bit of regex, a(.+?)b, means "match (and capture) everything between a and b", and it comes in handy for this sort of thing a lot.
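Replaying the two Regex-module replacements in Python on the link from the question shows what they produce. This assumes Pipes' regex engine treats these simple patterns the way Python's re does; the lowercase jobs in the result comes straight from the replacement text above:

```python
import re

href = ("www2.jobs.badenochandclark.ch/JavaScript:OpenAssignment("
        "'a960c93a-11fe-4751-bc27-83a48429c3ba',%20"
        "'/Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba');")

step1 = re.sub(r"JavaScript(.+)Jobs", "jobs", href)  # find JavaScript(.+)Jobs -> jobs
link = re.sub(r"'\);", "", step1)                    # find '\); -> nothing
print(link)
# www2.jobs.badenochandclark.ch/jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba

# The lazy a(.+?)b idiom stops at the first b after the a.
between = re.search(r"a(.+?)b", "xxaYYbZZb").group(1)
print(between)  # YY
```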
Full-fledged URL-parsing isn't simple, but given enough constraints it becomes manageable.
For example, if you know
that JavaScript:OpenAssignment( always follows a /,
that the first argument is always a hexadecimal+dashes string in quotes,
that the second argument (at least the portion you need) is also in quotes,
and that you can discard the remainder of URL after the "function,"
then something like this might be a starting point:
\/JavaScript:OpenAssignment\([^'"]*['"][0-9a-fA-F\-]+['"][^,)]*,[^'")]*['"]([^'"]+)['"].*
Then $1 would contain the part you want to keep, here /Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba. The explanation follows.
\/ Slashes need to be escaped (usually).
JavaScript:OpenAssignment Our function of interest.
\( Parentheses need to be escaped too.
[^'"]* We're looking for a quote next, so ignore any string of non-quotes.
['"] A quote character.
[0-9a-fA-F\-]+ A hexadecimal-and-dashes string.
['"] A quote character.
[^,)]* We're looking for a comma next, so ignore any (possibly empty) string of characters other than commas and closing parentheses.
, A comma character.
[^'")]* We're looking for a quote again, so ignore any string of non-quotes, e.g. %20.
['"] A quote character.
([^'"]+) Everything up to the next quote, this time captured. (A hexadecimal-and-dashes capture would not match here, because the second argument starts with /Jobs/Details/.)
['"] A quote character.
.* The rest of the string that we don't care about.
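Checking the pattern against the sample link in Python shows why the capture group has to accept more than hex digits and dashes: the second argument begins with /Jobs/Details/, so a [0-9a-fA-F\-]+ capture finds no match at all, while capturing everything up to the next quote ([^'"]+) returns the path. This assumes the Pipes engine behaves like Python's re for these constructs:

```python
import re

href = ("www2.jobs.badenochandclark.ch/JavaScript:OpenAssignment("
        "'a960c93a-11fe-4751-bc27-83a48429c3ba',%20"
        "'/Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba');")

Q = "['\"]"                 # a quote character
NQ = "[^'\"]"               # a non-quote character
HEX = r"[0-9a-fA-F\-]+"     # hexadecimal-and-dashes string

# Everything up to and including the opening quote of the second argument.
prefix = (r"\/JavaScript:OpenAssignment\(" + NQ + "*" + Q + HEX + Q
          + r"[^,)]*," + "[^'\")]*" + Q)

hex_only = re.compile(prefix + "(" + HEX + ")" + Q + ".*")
whole_arg = re.compile(prefix + "(" + NQ + "+)" + Q + ".*")

print(hex_only.search(href))            # None
print(whole_arg.search(href).group(1))  # /Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba
```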

Find whitespace in end of string using wildcards or regex

I have a Resoure.resx file that I need to search to find strings ending with whitespace. I have noticed that in Visual Web Developer I can search using both regex and wildcards, but I cannot figure out how to find only strings with whitespace at the end. I tried this regex, but it didn't work:
\s$
Can you give me an example? Thanks!
I'd expect that to work, although since \s includes \n and \r, perhaps it's getting confused. Or I suppose it's possible (but really unlikely) that the flavor of regular expressions that Visual Web Developer uses (I don't have a copy) doesn't have the \s character class. Try this:
[ \f\t\v]$
...which searches for a space, formfeed, tab, or vertical tab at the end of a line.
If you're doing a search and replace and want to get rid of all of the whitespace at the end of the line, then as RageZ points out, you'll want to include a greedy quantifier (+ meaning "one or more") so that you grab as much as you can:
[ \f\t\v]+$
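To see the \n concern in action, here is a Python analogy (Visual Web Developer uses its own engine, so treat this as illustrative only). In multiline mode, \s+$ can swallow a line break, while the explicit character class cannot:

```python
import re

text = "a \nb \n"

# With MULTILINE, $ also matches just before each newline. Because \s includes
# \n, the last match is " \n" and the final line break disappears.
print(repr(re.sub(r"\s+$", "", text, flags=re.MULTILINE)))         # 'a\nb'
print(repr(re.sub(r"[ \f\t\v]+$", "", text, flags=re.MULTILINE)))  # 'a\nb\n'
```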
You were almost there. Adding the + sign means "one or more characters" (from one up to any number).
This would probably make it:
\s+$
Perhaps this would work:
^.+\s$
Using this you'll be able to find nonempty lines that end with a whitespace character.

vim regex replace multiple consecutive spaces with only one space

I often work with text files that have a variable amount of whitespace as word separators (text processors like Word do this to distribute whitespace evenly, since letters have different widths in certain fonts, and they keep this annoying variable amount of spaces even when saving as plain text).
I would like to automate the process of replacing these variable-length sequences of whitespace with single spaces. I suspect a regex could do it, but there is also whitespace at the beginning of paragraphs (usually four spaces, but not always) that I want to leave unchanged, so my regex must not touch the leading whitespace, and this adds to the complexity.
I'm using vim, so a regex in the vim regex dialect would be very useful to me, if this is doable.
My current progress looks like this:
:%s/ \+/ /g
but it doesn't work correctly.
I'm also considering writing a vim script that would parse the text line by line, process each line character by character, and skip the whitespace after the first space, but I have a feeling this would be overkill.
This will replace 2 or more spaces:
s/ \{2,}/ /g
or you could add an extra space before the \+ in your version:
s/  \+/ /g
This will do the trick:
%s![^ ]\zs \+! !g
Many substitutions can be done more easily in Vim than in other regex dialects by using the \zs and \ze meta-sequences. They exclude part of the match from the final result: either the part before the sequence (\zs, “s” for “start here”) or the part after it (\ze, “e” for “end here”). In this case, the pattern must first match one non-space character ([^ ]), but the following \zs says that the final match result (which is what gets replaced) starts after that character.
Since a non-space character can never precede line-leading whitespace, that whitespace will not be matched by the pattern, so the substitution will not replace it. Simple.
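Python has no \zs, but a fixed-width look-behind does the same job here. A minimal sketch (squeeze_inner_spaces is a made-up name):

```python
import re


def squeeze_inner_spaces(line):
    # [^ ]\zs \+  ->  (?<=[^ ]) +  : a run of spaces is replaced only when a
    # non-space character comes right before it, so leading indentation is skipped.
    return re.sub(r"(?<=[^ ]) +", " ", line)


print(repr(squeeze_inner_spaces("    This  is a   new paragraph")))
# '    This is a new paragraph'
```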
In the interests of pragmatism, I tend to just do it as a three-stage process:
:g/^ /s//XYZZYPARA/g
:g/ \+/s// /g
:g/^XYZZYPARA/s// /g
I don't doubt that there may be a better way (perhaps using macros or even a pure regex way) but I usually find this works when I'm in a hurry. Of course, if you have lines starting with XYZZYPARA, you may want to adjust the string :-)
It's good enough to turn:
This is a new paragraph
spanning two lines.
And so is this but on one line.
into:
This is a new paragraph
spanning two lines.
And so is this but on one line.
Aside: If you're wondering why I use :g instead of :s, that's just habit mostly. :g can do everything :s can and so much more. It's actually a way to execute an arbitrary command on selected lines. The command to execute happens to be s in this case so there's no real difference but, if you want to become a vi power user, you should look into :g at some point.
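The same marker idea as a Python sketch, assuming paragraphs are indented with exactly four spaces (an assumption for this example) and, as the answer warns, that no real line starts with XYZZYPARA:

```python
import re

MARKER = "XYZZYPARA"


def squeeze_with_marker(text, indent="    "):
    # 1. protect the paragraph indent, 2. squeeze runs of spaces, 3. restore the indent.
    text = re.sub("^" + re.escape(indent), MARKER, text, flags=re.MULTILINE)
    text = re.sub(" {2,}", " ", text)
    return re.sub("^" + MARKER, indent, text, flags=re.MULTILINE)


sample = "    This  is a   new paragraph\nspanning   two lines."
print(squeeze_with_marker(sample))
```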
There are lots of good answers here (especially Aristotle's: \zs and \ze are well worth learning). Just for completeness, you can also do this with a negative look-behind assertion:
:%s/\(^ *\)\@<! \{2,}/ /g
This says "find 2 or more spaces (' \{2,}') that are NOT preceded by 'the start of the line followed by zero or more spaces'". If you prefer to reduce the number of backslashes, you can also do this:
:%s/\v(^ *)@<! {2,}/ /g
but it only saves you two characters! You could also use ' +' instead of ' {2,}' if you don't mind it doing a load of redundant changes (i.e. changing a single space to a single space).
You could also use a positive look-behind to just check for a single non-space character in front of the run:
:%s/\S\@<=\s\+/ /g
which is much the same as (a slightly modified version of Aristotle's to treat spaces and tabs as the same in order to save a bit of typing):
:%s/\S\zs \+/ /g
See:
:help \zs
:help \ze
:help \@<!
:help zero-width
:help \v
and (read it all!):
:help pattern.txt
Already answered, but I thought I'd toss my workflow in anyway.
%s/  / /g
@:@:@:@:@:@:@:@:@:@:@:@: (repeat till clean; @: repeats the last command-line command)
It's fast and simple to remember. There are far more elegant solutions above, but that's just my 2 cents.
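The repeat-until-clean loop in Python, as a sketch (the function name is made up). Like the Vim version, it cannot tell indentation from word gaps, so leading spaces get squeezed too:

```python
def squeeze_by_repetition(text):
    # Each pass is like %s/  / /g (every pair of spaces becomes one);
    # the loop plays the role of pressing @: until nothing changes.
    passes = 0
    while "  " in text:
        text = text.replace("  ", " ")
        passes += 1
    return text, passes


print(squeeze_by_repetition("a     b"))  # ('a b', 3)
```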
Does this work?
%s/\([^ ]\)  */\1 /g
I like this version: it is similar to Aristotle Pagaltzis's \zs version, but I find it easier to understand (probably just my unfamiliarity with \zs).
s/\([^ ]\) \+/\1 /g
or for all whitespace
s/\(\S\)\s\+/\1 /g
I read it as "replace all occurrences of something other than a space followed by multiple spaces with the something and a single space".
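Both back-reference versions translate directly to Python's re (function names are made up for the example):

```python
import re


def squeeze_backref(line):
    # s/\([^ ]\) \+/\1 /g : capture the non-space character, write it back plus one space.
    return re.sub(r"([^ ]) +", r"\1 ", line)


def squeeze_backref_any_ws(line):
    # s/\(\S\)\s\+/\1 /g : the same idea for any whitespace, tabs included.
    return re.sub(r"(\S)\s+", r"\1 ", line)


print(repr(squeeze_backref("    This  is a   new paragraph")))
# '    This is a new paragraph'
```

Leading whitespace survives in both because the capture group needs a non-space character before the run.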