Using vim to transform indented dirlist to full paths - regex

I have inherited a set of files with indented directory listings, like this:
DirA/
  DirA1/
  DirA2/
    DirA2.1/
  DirA3/
DirB/
DirC/
  DirC1/
    DirC1.1/
      DirC1.1.1/
    DirC1.2/
For a migration to new code and a new structure I need to change this to a full path on each line.
DirA/
DirA/DirA1/
DirA/DirA2/
DirA/DirA2/DirA2.1/
DirA/DirA3/
DirB/
DirC/
DirC/DirC1/
DirC/DirC1/DirC1.1/
DirC/DirC1/DirC1.1/DirC1.1.1/
DirC/DirC1/DirC1.2/
I'd prefer to use Vim for this (if only to improve my Vim-fu), but I haven't yet worked out how to do it (using regexes, registers and/or macros?). Could anyone please give me some pointers? I'd appreciate it.

Assuming an indent width of 2 spaces:
:%s#^\s\+#\=join(split(getline(line('.')-1),'/')[:strlen(submatch(0))/2-1],'/').'/'
Overview
The idea is to get the previous line (getline(line('.')-1)) and split it into directories. Then substitute the leading spaces with a sublist of the previous line's directories, chosen according to the indent level: everything up to index strlen(submatch(0))/2-1.
Glory of details
%s#^\s\+#{replacement} - do a substitution on lines starting with spaces
\={expr} will yield the result of {expr} as the replacement for the spaces - this is called a sub-replace-expression
getline(line('.')-1) - get the previous line
split(getline(line('.')-1), '/') - split up the previous lines into directory segments
lst[{start}:{end}] - create a sublist from list, lst, starting from {start} and ending at index {end}.
lst[:{end}] - assume start of list. e.g [0,1,2][:1] yields [0,1]
submatch(0) - inside a sub-replace-expression, submatch({n}) returns capture group number {n}
submatch(0) yields the entire match of the substitution, i.e. \0 or & in a normal substitution
strlen(submatch(0))/2 gives the current line's indentation level
strlen(submatch(0))/2-1 is one less than the indentation level. Since a Vim sublist includes its end index, this keeps exactly the part of the directory structure that the current line and the line above have in common
join({lst}, {glue}) - join a list together as a string with {glue} as the separator between list items
Add a / at the end as it got taken out by split()
It is important to know that Vim does the substitution line by line, starting from the top (as noted by @Lieven Keersmaekers). This means the previous line has already been expanded to a full path by the time we read it, so we can use it to get the common ancestors.
For more help see:
:h :s
:h sub-replace-expression
:h getline()
:h line()
:h join()
:h sublist
:h strlen()
:h split()
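For comparison outside Vim, here is a minimal Python sketch of the same idea: each indented line borrows its leading path components from the previous, already expanded line. The function name expand_paths and the indent_width parameter are made up for illustration, and it assumes space-only indentation in multiples of indent_width.

```python
def expand_paths(lines, indent_width=2):
    """Turn an indented directory listing into one full path per line."""
    result = []
    for line in lines:
        name = line.lstrip(" ")
        level = (len(line) - len(name)) // indent_width
        if level == 0 or not result:
            # Top-level entry: nothing to borrow from the previous line.
            result.append(name)
            continue
        # 'DirA/DirA2/'.split('/') leaves a trailing '', so drop empty parts,
        # then keep the first `level` components of the previous full path.
        parents = [part for part in result[-1].split("/") if part][:level]
        result.append("/".join(parents) + "/" + name)
    return result


listing = ["DirA/", "  DirA1/", "  DirA2/", "    DirA2.1/", "  DirA3/", "DirB/"]
print("\n".join(expand_paths(listing)))
```

Note that Python slices exclude the end index while Vim's include it, which is why the sketch uses [:level] where the Vim command uses [:level-1].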

Can you try this regex:
/^(.+\/)((\r?\n(\s\s){1,}\S+){0,})\r?\n\s\s(\S+)/gm
And replacing by one of these:
$1$2\n$1$5
$1$2\r\n$1$5
(if you need \n line breaks or \r\n)
Demo : https://regex101.com/r/iwXMot/2
How it works:
capture the first directory (it has no \s before it)
capture all following indented lines up to the last line that starts with \s\s; that line is the last subdirectory.
prepend the parent directory to that last subdirectory and remove its \s\s
repeat the replacement until nothing matches any more!
for the deeper levels, change the last \s\s to \s{4}, then to \s{6}, and so on
I don't know vim, but hope this regex pattern helps!

Related

VIM complex query

I am using VIM to edit a csv file that looks like this:
A00,A01,A02...
A10,A11,A12...
A20,A21,A22...
On every line, I want to delete everything from the second occurrence of "," until the end. So I would be left with:
A00,A01
A10,A11
A20,A21
I tried Ctrl-Shift-v on the second "," in the first line, then G to select all lines, then D to delete until the end. The problem is that the Aij are not necessarily of the same length, so that didn't work...
The :normal command is helpful here:
:%norm! 2f,D
does it.
If you want to golf a bit, you can record a macro, or use the x#='...' format. But I feel this way is slightly better than a regex solution for the given problem.
Another possibility with :s, where you can easily control the number of fields to keep (2 in this case):
:%s/\v^([^,]*\zs,){2}.*//
Because \zs sits inside the () group, it is set again on every repetition and the last one wins, so the match starts at the comma of the last repetition (here, the second comma).
Use :substitute, it'll work better than block-visual mode:
:%s/\v^.{-},.{-}\zs,.*//
With
\v to simplify the regex (very magic mode) (:h /\v)
^ to match the start of line
.{-},: a non greedy match on anything until ... the first comma (:h /\{)
.{-}\zs, : the same thing until the second comma, except this time we tell that the match starts at the comma (\zs -> Zone Start -> :h /\zs).
and then we replace the match (i.e. starting from the 2nd comma) with nothing.
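The same "keep the first n fields" substitution can be sketched in Python for checking the pattern against sample lines. The function name keep_first_fields and the sample data are hypothetical, and the sketch assumes plain comma-separated fields without quoting.

```python
import re


def keep_first_fields(line, n=2):
    """Drop everything from the n-th comma to the end of the line."""
    # Group 1: n comma-free fields joined by n-1 commas; the rest is discarded.
    pattern = r"^((?:[^,]*,){%d}[^,]*),.*$" % (n - 1)
    return re.sub(pattern, r"\1", line)


for row in ["A00,A01,A02,A03", "A10,A11,A12", "short,row"]:
    print(keep_first_fields(row))
```

Lines with n fields or fewer do not match the pattern and are left unchanged.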

How do I replace a newline in Atom?

In Atom, If I activate regex mode on the search-and-replace tool, it can find newlines as \n, but when I try to replace them, they're still there.
Is there no way to replace a newline-spanning string in Atom?
Looks like Atom matches newlines as \r\n but behaves inconsistently when replacing just the \n with nothing.
So newlines seem to match \s+ and \r\n, and only "half" of the line-ending matches \n.
If you replace \n with a string, nothing happens to the line-ending, but the string is appended to the next line
If you replace \r with a string, nothing happens at all, but the cursor advances.
It's a little bit late to answer, but I use the following search term, and it works with Atom v1.19.7 x64:
\r?\n|\r
BR
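To see why that pattern catches every kind of line ending, here is a small Python sketch using the same expression. The function name normalize_newlines is made up for illustration; Atom's own regex engine is not involved here.

```python
import re


def normalize_newlines(text, replacement="\n"):
    """Replace CRLF, LF, and lone CR line endings with `replacement`."""
    # \r?\n is tried first, so a CRLF pair is consumed as one line ending;
    # the trailing |\r only fires for a CR that is not followed by LF.
    return re.sub(r"\r?\n|\r", replacement, text)


print(repr(normalize_newlines("a\r\nb\nc\rd")))
```

The order of the alternatives matters: putting \r first would split a CRLF pair into two separate matches.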
None of these answers helped me.
What worked for me:
I just added a new line at the end of the file.
Shift + <- (arrow to left)
Ctrl + C
Ctrl + V in the "Replace in current buffer" line
Just copied the new line and pasted it in :D
DELETE INVISIBLE LINE BREAKS IN CODE WITH ATOM
(using the "Find in buffer" function)
(- open your code-file with the Atom-Editor)
Hit cmd(mac)/ctrl(win) + f on your keyboard for activating the Find in buffer function (a little window appears at the bottom atom-screen edge).
Mark your Code in which you want to delete the invisible Line breaks.
Click on the Markup-Mode Button and after that on the Regex-Mode (.*) Button and type into the first field: \n
After that click replace all.
[And Atom will delete all the invisible line breaks indicated by \n (if you use LF-Mode right bottom corner, for CRLF-Mode (very common on windows machines as default) use \r\n) by replacing them with nothing.]
Hope that helps.
Synaikido
You can use backreferencing:
eg. Replace triple blank lines with a single blank line
Find regex: (\r\n){3}
Replace: $1
You can indicate double blank lines with (\r\n){2} ... or any number n of blank lines with (\r\n){n}. And you can omit the $1 and leave replace blank to remove the blank lines altogether.
If you wanted to replace 3 blank lines with two, your replace string can be $1$1. Note that (\r\n){3} contains only one round-bracketed expression, so only $1 is defined: it holds a single \r\n (the last repetition), and there is no $2 or $3 to refer to. This generalizes to n blank lines.
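The backreference idea can be tried out in Python as a rough sketch. The function name collapse_crlf_runs is hypothetical, and it assumes CRLF line endings as in the answer above.

```python
import re


def collapse_crlf_runs(text, n=3, keep=1):
    """Replace every run of n CRLF line breaks with `keep` of them."""
    # (\r\n){n} is one capture group repeated n times; \1 holds one "\r\n".
    return re.sub(r"(\r\n){%d}" % n, r"\1" * keep, text)


print(repr(collapse_crlf_runs("a\r\n\r\n\r\nb")))
```

With keep=0 the replacement is empty, which corresponds to leaving the replace field blank.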
The purists will probably not like my solution, but you can also transform the find and replace inputs into a multiline text box by copying content with several line breaks and pasting it into the find/replace inputs. It will work with or without using regex.
For example, you can copy these 3 lines and paste them into both find and replace inputs:
line 1
line 2
line 3
Now that your inputs have the number of lines that you need, you can modify them as you want (and add regex if necessary).
Heh, very weird, Ctrl+Shift+F does not work too!
Workaround: open Atom Settings, then Core Packages->line-ending-selector, scroll to bottom to see tips about command to convert line endings: 'convert-to-LF'.
To convert: Cmd+Shift+P type 'line' and choose 'convert-to-LF' - done!
You could also change the 'Default line ending' option from 'OS' to 'LF'.
After that setting is changed, your new files will use 'LF' as well.
prerequisite: activate 'Use Regexp'
In my version of Atom (Linux, 1.51.0) I used the following to add 'export ' after each newline:
search '\n'
replace '\nexport '
worked like a charm
\r\n didn't match anything

find a single quote at the end of a line starting with "mySqlQueryToArray"

I'm trying to use regex to find single quotes (so I can turn them all into double quotes) anywhere in a line that starts with mySqlQueryToArray (a function that makes a query to a SQL DB). I'm doing the regex in Sublime Text 3 which I'm pretty sure uses Perl Regex. I would like to have my regex match with every single quote in a line so for example I might have the line:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name'");
I want the regex to match in that line both of the quotes around $name but no other characters in that line. I've been trying to use (?<=mySqlQueryToArray.*)' but it tells me that the look behind assertion is invalid. I also tried (?<=mySqlQueryToArray)(?<=.*)' but that's also invalid. Can someone guide me to a regex that will accomplish what I need?
To find any number of single quotes in a line starting with your keyword you can use the \G anchor ("end of last match") by replacing:
(^\h*mySqlQueryToArray|(?!^)\G)([^\n\r']*)'
With \1\2<replacement>: see demo here.
Explanation
( ^\h*mySqlQueryToArray # beginning of line: check the keyword is here
| (?!^)\G ) # if not at the BOL, check we did match sth on this line
( [^\n\r']* ) ' # capture everything until the next single quote
The general idea is to match everything until the next single quote with ([^\n\r']*)' in order to replace it with \2<replacement>, but do so only if this everything is:
right after the beginning keyword (^mySqlQueryToArray), or
after the end of the last match ((?!^)\G): in that case we know we have the keyword and are on a relevant line.
\h* accounts for any leading indent, as suggested by Xælias (\h being a shortcut for any kind of horizontal whitespace).
https://stackoverflow.com/a/25331428/3933728 is a better answer.
I'm not good enough with RegEx nor ST to do this in one step. But I can do it in two:
1/ Search for all mySqlQueryToArray strings
Open the search panel: ⌘F or Find->Find...
Make sure you have the Regex (.* ) button selected (bottom left) and the wrap selector (all other should be off)
Search for: ^\s*mySqlQueryToArray.*$
^ beginning of line
\s* any indentation
mySqlQueryToArray your call
.* whatever is behind
$ end of line
Click on Find All
This will select every occurrence of what you want to modify.
2/ Enter the replace mode
⌥⌘F or Find->Replace...
This time, make sure that wrap, Regex AND In selection are active.
Then search for '([^']*)' and replace with "\1".
' are your single quotes
(...) is the capturing block, referenced by \1 in the replace field
[^']* is for any character that is not a single quote, repeated
Then hit Replace All
I know this is a little more complex than the other answer, but this one tackles cases where your line contains several single-quoted strings. Like this:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name' and Value='1234'");
If this is too much, I guess something like find: (?<=mySqlQueryToArray)(.*?)'([^']*)'(.*?) and replace it with \1"\2"\3 will be enough.
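The two-step approach (first select the relevant lines, then swap the quotes only inside them) can be sketched in Python to test it on sample input. The function name requote_query_lines is made up for illustration, and the sketch assumes single quotes always come in pairs on a line.

```python
import re


def requote_query_lines(text, keyword="mySqlQueryToArray"):
    """Turn '...' into "..." on lines that start with `keyword`."""
    out = []
    for line in text.split("\n"):
        # Step 1: only touch lines that begin (after indentation) with the keyword.
        if re.match(r"\s*" + re.escape(keyword), line):
            # Step 2: replace each single-quoted run with a double-quoted one.
            line = re.sub(r"'([^']*)'", r'"\1"', line)
        out.append(line)
    return "\n".join(out)


sample = "echo 'untouched';\nmySqlQueryToArray($con, \"... Name='$name' and Value='1234'\");"
print(requote_query_lines(sample))
```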
You can use a regex like this:
(mySqlQueryToArray.*?)'(.*?)'(.*)
Working demo
Check the substitution section.
You can use \K, see this regex:
mySqlQueryToArray[^']*\K'(.*?)'
Here is a regex demo.

Regex / Vim: Matching everything except a pattern, where pattern is multi-line?

Is there a way in Vim to perform a similar operation to
:v/PATTERN/d
where PATTERN is a multi-line regex pattern? I'm sure there is a way to do this in script, but I am mainly curious as to if it is possible to do using only standard regex substitution or Vim commands, because at this point it is more academic than an actual need.
My example is the following:
asdf
begin
blah
end
asdf
alsdfjasf
begin
random stuff
end
...
I want to get the blocks of begin/end with the lines between them, but ignore everything outside of the blocks, ultimately ending up with
begin
blah
end
begin
random stuff
end
...
My thoughts were to do
:v/begin\_.\{-}end/d
where everything that didn't match would be deleted or even copied to a register, but obviously :v and :g only work on single lines.
Then, I started going down the path of running a substitute and substitute everything with empty string that DIDN'T match the begin\_.\{-}end pattern, but I cannot grasp how to achieve such using look-behinds or anything. The regex works perfectly fine when just searching, but I can't figure out how to tell the regex engine to find everything BUT that pattern. Any ideas?
" clear reg a
qaq
" append begin...end to reg a
:g/begin/,/end/y A
" open new tab
:tabnew
" put reg a
"ap
Flip it inside out, and delete everything delimited by the pattern:
:%s/\%(^end\n*\|\%^\)\zs\_.\{-}\ze\%(^begin\|\%$\)//
\%( ... ) - non capturing group
^end\n* - the end of your pattern
\%^ - the beginning of the file
\zs - don't include anything matched before this point in the string to be replaced
\_.\{-} - non-greedy matching of anything (including newlines)
\ze - don't include anything matched after this point in the string to be replaced
^begin - the beginning of your pattern
\%$ - the end of the file
The commands executed by g and v can themselves take ranges, so you can act on everything from "begin" to "end" with :g /begin/ .,/end/xxx where xxx is a command to execute on that range. You can't really use this for :v, but there are several ways you could do it in multiple passes, e.g.
"mark all the lines we want to keep by putting '#' at the start of the line"
:g/^begin/ .,/^end/ s/^/#/
"delete all unmarked lines"
:v/^#/ d
"remove the markers"
:%s/^#//
Of course this assumes you do not have any lines in the file starting with #.
Alternatively you could delete everything between each "end" and the next "begin"
:g/^end/ +1,/^begin/-1 d
then delete the cruft left at the start and end of the file:
:1,/^begin/-1 d
:$?^end?+1,$ d
Note the use of $?^end?+1 to search backwards from the end of the file to find the last line starting with "end".
N.B. the last two will delete too much if the file starts with "begin" or ends with "end" at that point, so check before you use them.
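The logic behind all of these (keep lines while inside a begin/end block, drop them otherwise) can be written as a small state machine. This Python sketch uses a hypothetical function name, keep_blocks, and assumes markers sit at the start of a line and blocks are not nested.

```python
def keep_blocks(lines, start="begin", end="end"):
    """Keep only the lines from each `start` line through its `end` line."""
    kept = []
    inside = False
    for line in lines:
        if not inside and line.startswith(start):
            inside = True
        if inside:
            kept.append(line)
            # Like the ^end pattern, this closes the block on any line
            # beginning with the end marker.
            if line.startswith(end):
                inside = False
    return kept


sample = ["asdf", "begin", "blah", "end", "asdf", "alsdfjasf",
          "begin", "random stuff", "end"]
print("\n".join(keep_blocks(sample)))
```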

auto indent in vim string replacement new line?

I'm using the following command to auto replace some code (adding a new code segment after an existing segment)
%s/my_pattern/\0, \r some_other_text_i_want_to_insert/
The problem is that with the \r, some_other_text_i_want_to_insert gets inserted right after the new line:
mycode(
    some_random_text my_pattern
)
would become
mycode(
    some_random_text my_pattern
some_other_text_i_want_to_insert <--- this line is NOT indented
)
instead of
mycode(
    some_random_text my_pattern
    some_other_text_i_want_to_insert <--- this line is now indented
)
i.e. the new inserted line is not indented.
Is there any option in vim or trick that I can use to indent the newly inserted line?
Thanks.
Try this:
:let @x="some_other_text_i_want_to_insert\n"
:g/my_pattern/normal "x]p
Here it is, step by step:
First, place the text you want to insert in a register...
:let @x="some_other_text_i_want_to_insert\n"
(Note the newline at the end of the string -- it's important.)
Next, use the :global command to put the text after each matching line...
:g/my_pattern/normal "x]p
The ]p normal-mode command works just like the regular p command (that is, it puts the contents of a register after the current line), but also adjusts the indentation to match.
More info:
:help ]p
:help :global
:help :normal
%s/my_pattern/\=submatch(0).", \n".matchstr(getline('.'), '^\s*').'some_other_text'/g
Note that you will have to use submatch and concatenation instead of & and \N. This answer is based on the fact that substitute command puts the cursor on the line where it does the substitution.
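The trick of reading the current line's leading whitespace and prepending it to the inserted text can be sketched in Python as well. The function name insert_indented is made up for illustration, and the sketch inserts "," before the line break rather than ", ", to avoid trailing whitespace.

```python
import re


def insert_indented(text, pattern, new_text):
    """After each match of `pattern`, add a new line carrying the same indent."""
    out = []
    for line in text.split("\n"):
        # Leading whitespace of the line that contains the match.
        indent = re.match(r"[ \t]*", line).group(0)
        out.append(re.sub(pattern,
                          lambda m: m.group(0) + ",\n" + indent + new_text,
                          line))
    return "\n".join(out)


source = "mycode(\n    some_random_text my_pattern\n)"
print(insert_indented(source, r"my_pattern", "some_other_text"))
```

As in the Vim command, anything that follows the match on the original line ends up after the inserted text.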
How about normal =``?
:%s/my_pattern/\0, \r some_other_text_i_want_to_insert/ | normal =``
<equal><backtick><backtick>: re-indent from the cursor to the position before the latest jump
(Sorry about the strange formatting, escaping backtick is really hard to use here)
To keep them as separate command you could do one of these mappings:
" Equalize and move cursor to end of change - more intuitive for me"
nnoremap =. :normal! =````<CR>
" Equalize and keeps cursor at beginning of change"
nnoremap =. :keepjumps normal! =``<CR>
I read the mapping as "equalize last change" since dot already means "repeat last change".
Or skip the mapping altogether since =`` is only 3 keys with 2 of them being repeats. Easy peasy, lemon squeezy!
References
:help =
:help mark-motions
Kind of a roundabout way of achieving the same thing: You could record a macro which finds the next occurrence of my_pattern and inserts after it a newline and your replacement string. If auto-indent is turned on, the indent level will be maintained regardless of where the occurrence of my_pattern is found.
Something like this key sequence:
q 1 # begin recording
/my_pattern/e # find my_pattern, set cursor to end of match
a # append
\nsome_other_text... # the text to append
<esc> # exit insert mode
q # stop recording
Repeated by pressing @1
You can do it in two steps. This is similar to Bill's answer but simpler and slightly more flexible, since you can use part of the original string in the replacement.
First substitute and then indent.
:%s/my_pattern/\0, \r some_other_text_i_want_to_insert/
:%g/some_other_text_i_want_to_insert/normal ==
If you use part of the original string with \0,\1, etc. just use the common part of the replacement string for the :global (second) command.
I achieved this by using \s* at the beginning of my pattern to capture the preceding whitespace.
I'm using the vim addon for VSCode, which doesn't seem to match standard vim completely, but for me,
:%s/(\s*)(existing line)/$1$2\n$1added line/g
turns this
mycode{
    existing line
}
into this
mycode{
    existing line
    added line
}
The parentheses in the search pattern define groups which are referenced by $1 and $2. In this case $1 is the white space captured by (\s*). I'm not an expert on different implementations of vim or regex, but as far as I can tell, this way of referencing regex groups is specific to VSCode (or at least not general). More explanation of that here. Using \s* to capture a group of whitespace should be general, though, or at least have a close analog in your environment.