Vim - sed like labels or replacing only within pattern - regex

While editing some HTML I ran into a problem that needs help from a Vim master out there.
I want to achieve a simple task: I have an HTML file with mangled URLs.
Just description Just description
...
Just description
Unfortunately it's not "one URL per line".
I am aware of three approaches:
1. I would like to be able to replace only within matches of the regex '"http://[^"]*"' (similar to replacing only in matching lines, but this time only the matched text should be affected rather than the whole line).
2. Or use sed-like labels. I can do this task with sed -e :a -e 's#\("http://[^" ]*\) \([^"]*"\)#\1_\2#g;ta'
3. I also know that there is something like "\@<=", but I am not a native speaker and the Vim manual on this is beyond my comprehension.
All help is greatly appreciated.
If possible I would like to know the answer to all three problems (they are pretty interesting and would be helpful in other tasks), but any one of them will do.

Re: 1. You can replace recursively by combining vim's "evaluate replacement as an expression" feature (:h :s\=) with the substitute function (:h substitute()):
:%s!"http://[^"]*"!\=substitute(submatch(0), ' ', '_', 'g')!g
Re: 2. I don't know sed so I can't help you with that.
Re: 3. I don't see how \@<= would help here. As for what it does: It's equivalent to Perl's (?<=...) feature, also known as "positive look-behind". You can read it as "if preceded by":
:%s/\%(foo\)\@<=bar/X/g
"Replace bar by X if it's preceded by foo", i.e. turn every foobar into fooX (the \%( ... \) are just for grouping here). In Perl you'd write this as:
s/(?<=foo)bar/X/g;
More examples and explanation can be found in perldoc perlretut.
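The idea in Re: 1 (match the whole quoted URL, then rewrite only inside that match) is not specific to Vim. A minimal Python sketch of the same technique follows; the sample line with its example.com URL and the helper name fix_urls are made up for illustration:

```python
import re

# Matches one double-quoted http:// URL, spaces included.
URL_IN_QUOTES = re.compile(r'"http://[^"]*"')

def fix_urls(text):
    """Replace spaces with underscores, but only inside quoted URLs."""
    # The callable plays the role of Vim's expression replacement: it gets
    # each match and returns the replacement for that match alone.
    return URL_IN_QUOTES.sub(lambda m: m.group(0).replace(' ', '_'), text)

# Hypothetical sample line; the link text must keep its space.
sample = '<a href="http://example.com/my page.html">Just description</a>'
print(fix_urls(sample))
```

Text outside the quoted URL is never passed to the callable, so it cannot be changed by accident.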

I think what you want to do is replace all spaces in your http:// URLs with _.
To achieve that, @melpomene's solution is straightforward. You could try it in your Vim.
On the other hand, if you want to simulate your sed line, you could try the following.
:let @s=':%s#\("http://[^" ]*\)\@<= #_#g^M'
^M means Ctrl-V then Enter
then
200@s
This works the same way as your sed line (label, do the replacement, jump back to the label...), and \@<= is used as well.
One problem is that, done this way, Vim cannot detect when all matches have been replaced. That is why a relatively large count (200 in my example) is given, and in the end the error message "E486: Pattern not found..." is shown.
A script is needed to avoid the message.
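The "repeat until nothing changes" loop that the sed label expresses can also be written with an explicit stop condition instead of a fixed count. A hedged Python sketch, reusing the pattern from the sed command; the function name underscore_urls is hypothetical:

```python
import re

# Same pattern as the sed command: a quoted URL prefix without spaces,
# one space, then the rest of the URL up to the closing quote.
STEP = re.compile(r'("http://[^" ]*) ([^"]*")')

def underscore_urls(text):
    """Repeat the substitution until a pass replaces nothing."""
    while True:
        # subn returns the new text and the number of replacements made.
        text, count = STEP.subn(r'\1_\2', text)
        if count == 0:  # nothing replaced on this pass, so we are done
            return text

print(underscore_urls('see "http://a b c/d e" and "http://x y"'))
```

Each pass fixes at most one space per URL, which is why a loop is needed at all; checking the replacement count avoids both the arbitrary 200 and the final error.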

Related

VI delete everything except a pattern

I have a huge JSON output and I just need to delete everything except a small string in each line.
The string has the format
"title": "someServerName"
the "someServerName" (the section within quotes) can vary wildly.
The closest I've come is this:
:%s/\("title":\s"*"\)
But this just manages to delete
"title": "
The only thing I want left in each line is
"title": "someServerName"
EDIT to answer the posted question:
The Text I'm going to be working with will have a format similar to
{"_links": {"self": {"href": "/api/v2/servers/32", "title": "someServerName"},tons_of_other_json_crap_goes_here
All I want left at the end is:
"title": "someServerName"
* on its own only repeats the preceding atom; you need .* to match a run of any characters, or [^"]* to match up to the next quote. This does the job:
%s/^.*\("title":\s"[^"]*"\).*$/\1/
Explanation of each part:
%s/ Run the substitution on every line of the file.
^.* Skip any characters from the beginning of the line.
\("title":\s"[^"]*"\) Capture the title and server name. "[^"]*" matches the characters between the quotes and stops at the closing quote, whereas a greedy ".*" would run on to the last quote in the line.
.*$ Skip the rest of the line.
/\1/ The result of the substitution is the first captured group, i.e. the part captured by the parentheses \(...\).
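To see why the choice between .* and [^"]* inside the quotes matters, here is a small Python sketch. The input line is hypothetical (the trailing "other" key stands in for the rest of the JSON) and extract_title is an illustrative name:

```python
import re

# [^"]* stops at the next closing quote, unlike a greedy .*
TITLE = re.compile(r'"title":\s*"[^"]*"')

def extract_title(line):
    """Return the title key/value pair from a line, or None if absent."""
    match = TITLE.search(line)
    return match.group(0) if match else None

# Hypothetical input with more quoted strings after the title.
line = '{"_links": {"self": {"href": "/api/v2/servers/32", "title": "someServerName"}, "other": "x"}'
print(extract_title(line))

# For comparison: a greedy .* between the quotes runs to the last quote.
print(re.search(r'"title":\s".*"', line).group(0))
```

On lines where the title is the last quoted string both patterns agree; they only differ when more quotes follow.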
This sounds like a job for grep.
:%!grep -o '"title":\s*"[^"]*"'
For more help with Vim's filtering see :h :range!.
See man grep for more information on the -o/--only-matching flag.
It's quite convenient to break the replace command into two steps, as below. (P.S. I learned this technique recently from the excellent book Practical Vim.)
Step 1: Search for the contents that you want to keep
\v"title":\s.*"
This will match "title": "someServerName". You can refine the pattern as often as you like by opening the search command-line window with q/ and editing the regular expression (I think this is the best part).
Then add parentheses for later use, and add .* on both sides to match the parts that you want to delete:
\v^.*("title":\s.*").*$
Step 2: Replace the matched contents
:%s//\1/g
Note that the empty pattern in this substitute command reuses the last search (a very handy Vim feature), and \1 refers to the captured group, which is the part you want to keep.
Hope you find this more convenient than one long, obscure substitute command.
%s/\v.*("title":\s".*").*/\1
Explanation and source here.
I think the most elegant way to solve this is using ':g'
More comprehensive details in this link below!
Power of g
It may not fully achieve what you're looking for, but it comes damn close ;)

How to create regex search-and-replace with comments?

I have a bit of a strange problem: I have some code (it's LaTeX, but that does not matter here) that contains long lines made up of several sentences, each ending in a period.
For better version control I want to put each of these sentences on its own line.
This can be achieved via sed 's/\. /.\n/g'.
Now the problem arises if there are comments with potential periods as well.
These comments must not be altered, otherwise they will be parsed as LaTeX code and this might result in errors etc.
As a pseudo example you can use
Foo. Bar. Baz. % A. comment. with periods.
The result should be
Foo.
Bar.
Baz. % ...
Alternatively the comment might go on the next line without any problems.
Using Perl would be fine if that works out better. I tried a few ideas with different programs (sed and perl), but none did what I expected. Either the comment was also altered or only the first period was altered (perl -pe 's/^([^%]*?)\. /\1.\n/g').
Can you point me in the right direction?
This is tricky as you're essentially trying to match all occurrences of ". " that don't follow a "%". A negative look-behind would be useful here, but Perl doesn't support variable-width negative look-behind. (Though there are hideous ways of faking it in certain situations.) We can get by without it here using backtracking control verbs:
s/(?:%(*COMMIT)(*FAIL))|\.\K (?!%)/\n/g;
The (?:%(*COMMIT)(*FAIL)) forces replacement to stop the first time it sees a "%" by committing to a match and then unconditionally failing, which prevents back-tracking. The "real" match follows the alternation: \.\K (?!%) looks for a space that follows a period and isn't followed by a "%". The \K causes the period to not be included in the match so we don't have to include it in the replacement. We only match and replace the space.
Putting the comment by itself on a following line can be done with sed pretty easily, using the hold space:
sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/[^%]*%/%/;x;s/ *%.*//;s/\. /.\n/g;G'
Or if you want the comment by itself before the rest:
sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/ *%.*//;s/\. /.\n/g;x;s/[^%]*%/%/;G'
Or finally, it is possible to combine the comment with the last line also:
sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/[^%]*%/%/;x;s/ *%.*//;s/\. /.\n/g;G;s/\n\([^\n]*\)$/ \1/'
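For comparison, the variant that keeps the comment on the last sentence's line can be sketched in Python by cutting the line at the comment first and only then splitting sentences. This is a swapped-in approach rather than a translation of the Perl or sed commands; it assumes the first % starts the comment (an escaped \% is not handled), and split_sentences is a hypothetical name:

```python
import re

def split_sentences(line):
    """Break after each '. ' in the text part, leaving any % comment intact."""
    # Assumption: the first % starts the comment (escaped percent signs
    # are not handled in this sketch).
    text, percent, comment = line.partition('%')
    # Only split where another sentence follows, so the last sentence
    # stays on the same line as the comment that comes after it.
    text = re.sub(r'\. (?=\S)', '.\n', text)
    return text + percent + comment

print(split_sentences('Foo. Bar. Baz. % A. comment. with periods.'))
```

Because the comment is separated before any substitution runs, periods inside it are never examined.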

Regex that matches contents of () with nested () in it

I have a messy text file (due to the nature of the contents I cannot paste it).
In the file I want to match everything inside the outermost (unnested) parentheses.
Here is sample that includes the problem:
a(b()c((d)e)f()g)h(i)
The output that I need is:
(b()c((d)e)f()g)(i)
(basically everything in the largest parenthesis, less 'a' and 'h')
Again I cannot post the actual contents but above example illustrates the problem I have in original file.
I am working on this from bash, I am familiar with sed, grep, but not awk unfortunately.
Thanks
Since .* is greedy, the regex will find the longest possible match, so if a single span from the first ( to the last ) is enough, you can just use
\(.*\)
If you care about nesting and want to find the outermost groups, e.g. for ((a)) and (b)))) you want to find ((a)) and (b), then that's a typical example of a grammar that you technically can't match with regular expressions.
However, since you tagged your post PCRE:
grep -P -o '(?xs)(?(DEFINE) (?<c>([^()]|(?&p))) (?<p>\((?&c)*\)))((?&p))'
Ha, I know there is already a good answer, but I'm posting the one I came up with to broaden the range of ideas (demo).
(?x)
(?(DEFINE)(?<nest>(\((?:[^()]*(?1)?[^()]*)\))))
\((?:[^()]*(?&nest)?[^()]*)*\)
Of course it needs to be flattened onto the grep line.
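Where a recursive regex engine is not available, the outermost groups can be found with a plain depth counter instead. The Python sketch below uses that swapped-in technique (Python's built-in re module has no recursion); outermost_groups is a hypothetical helper name:

```python
def outermost_groups(text):
    """Return every outermost balanced (...) group found in text."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i  # a new outermost group begins here
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:  # the outermost group just closed
                groups.append(text[start:i + 1])
        # A ')' seen at depth 0 is unbalanced and is skipped.
    return groups

print(''.join(outermost_groups('a(b()c((d)e)f()g)h(i)')))
```

Joining the groups gives the single-line output asked for in the question; the list form keeps the groups separate if that is more useful.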

Is it possible to use regex for matching with a condition?

I posted this question on Super User and was advised to post it on Stack Overflow.
I really like Vim, and today I ran into an interesting problem. I think it can be solved with a regexp, but I can't work out the right one.
I've got a very big SQL file that consolidates many different queries. Its content looks something like this:
select * from hr.employees, oe.orders, oe.order_items
select * from hr.employess, oe.orders, hr.job_history
select * from oe.customers, oe.orders, hr.employees
select * from hr.employees, hr.departments, hr.locations
How can I select only those lines that contain exactly one match of hr.? In the example above that would be the first and third lines.
Sure, it is possible to match such lines. This pattern matches:
^\%(\%(hr\.\)\@!.\)*hr\.\%(\%(hr\.\)\@!.\)*$
Some people like to reduce the amount of backslash-escaping by using the very magic switch \v. Then the same pattern becomes
\v^%(%(hr\.)@!.)*hr\.%(%(hr\.)@!.)*$
(Here I used non-capturing parentheses \%(...\) but capturing parentheses \(...\) would work just as well.)
The question is: What do you want to do with these lines? Delete them?
In that case you could use the :global command:
:g/\v^%(%(hr\.)@!.)*hr\.%(%(hr\.)@!.)*$/d
More information at
:h :global
:h /\v
:h /\%(
:h /\@!
To check whether a line contains only a single occurrence of hr., use the regex pattern
^(?=.*\bhr\.)(?!.*\bhr\..*\bhr\.).* with the m modifier. I suggest using the grep -P utility.
For that, you need to combine a negative lookbehind with a negative lookahead assertion; i.e. the pattern must not match anywhere before or after the current match in the same line. In Vim, the atoms for those are \@<! and \@!, respectively.
The pattern to find a single occurrence of X is therefore this:
/\%(X.*\)\@<!X\%(.*X\)\@!/
Applied to your pattern hr.:
/\%(hr\..*\)\@<!hr\.\%(.*hr\.\)\@!/
I always find problems like this easier when I split them into multiple uses of the regex. If I were going to filter through this with grep, I'd do this:
grep "hr\." foo.sql
and that gives me all the lines with "hr.", even the ones with two.
Now I pipe that output through grep again and ask for it to ignore lines with hr. appearing twice:
grep "hr\." foo.sql | grep -v "hr\..*hr\."
I know you're talking about vim, but I'm showing alternatives that might be helpful and might be a good deal clearer.
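The lookahead pattern suggested for grep -P can be tried on the four sample lines with a short Python sketch. The list name queries and the constant ONLY_ONE_HR are made up for illustration; the sample lines are copied from the question:

```python
import re

# At least one "hr." on the line, but not two.
ONLY_ONE_HR = re.compile(r'^(?=.*\bhr\.)(?!.*\bhr\..*\bhr\.).*$')

queries = [
    'select * from hr.employees, oe.orders, oe.order_items',
    'select * from hr.employess, oe.orders, hr.job_history',
    'select * from oe.customers, oe.orders, hr.employees',
    'select * from hr.employees, hr.departments, hr.locations',
]

# Keep only the lines with exactly one occurrence of hr.
matching = [q for q in queries if ONLY_ONE_HR.match(q)]
for q in matching:
    print(q)
```

The first lookahead requires one hr. and the negative lookahead rejects a second one, which mirrors the two chained grep calls.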

Regex to insert text BEFORE a line containing a match?

I have a bunch of artists that are named in this fashion:
Killers, The
Treatment, The
Virginmarys, The
I need them to look like
The Killers
The Treatment
The Virginmarys
I'm able to match the lines with , The ((^|\n)(.*, The) is what I've used) but the more advanced syntax is eluding me. I can use regex on the replacement syntax as well (it's for a TextPipe filter so it might as well be for Notepad++ or any other Regex text editor).
You should be able to use the following:
Find: (\S+),\s\S*
Replace: The $1
Or capture the "The" as well:
Find: (\S+),\s+(\S+)
Replace: $2 $1
Depending on your editor, you may be better off using \1, \2, and so on for capture groups.
Since you need to specifically capture the title before the comma, do so:
(^|\n)(.*), The
And replace it putting the "the" in the right place:
\1The \2
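As a quick way to check this find/replace pair outside the editor, here is a minimal Python sketch of the same substitution. It anchors with ^ and $ in multiline mode instead of matching (^|\n), and move_article is a hypothetical function name:

```python
import re

def move_article(text):
    """Turn 'Name, The' into 'The Name' on every line that ends in ', The'."""
    # With re.MULTILINE, ^ and $ match at the start and end of each line.
    return re.sub(r'^(.*), The$', r'The \1', text, flags=re.MULTILINE)

artists = 'Killers, The\nTreatment, The\nVirginmarys, The'
print(move_article(artists))
```

Lines that do not end in ", The" are left unchanged.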
Regular expressions define matches but not substitutions.
How substitutions are performed is highly dependent on the application.
Most editors that provide regular expression support work on a line-by-line basis.
Some of them will allow substitutions such as
s/^(.*Banana)/INSERTED LINE\n\1/
which would then insert the specific pattern before each match. Note that others may not allow newlines in the substitution pattern at all. In VIM, you can input newlines into the command prompt using Ctrl+K Return Return. YMMV.
In Java, you would just first print the insertion text, then print the matching line.