VIM complex query - regex

I am using VIM to edit a csv file that looks like this:
A00,A01,A02...
A10,A11,A12...
A20,A21,A22...
I want to delete every line, from the second occurrence of "," until the end. So i would be left with:
A00,A01
A10,A11
A20,A21
I tried Ctrl-Shift-v on the second "," in first line, then G to pick all lines, then D to delete until the end. the problem is that Aij are not necessarily of the same length so that didn't work...

normal command here is helpful:
:%norm! 2f,D
does it.
If you want to golf a bit, you can record macro, or use x#='...' format. But I feel this way is slightly better than regex solution for the given problem.

Another possibility with :s, where you can easily control the number of fields to keep (2 in occurence):
:%s/\v^([^,]*\zs,){2}.*//
\zs is included inside the () group, then its position is defined relative to the last found group.

Use :substitute, it'll work better than block-visual mode:
:%s/\v^.{-},.{-}\zs,.*//
With
\v to simply the regex (very magic mode) (:h /\v)
^ to match the start of line
.{-},: a non greedy match on anything until ... the first comma (:h /\{)
.{-}\zs, : the same thing until the second comma, except this time we tell that the match starts at the comma (\zs -> Zone Start -> :h /\zs).
and then we replace the match (i.e. starting from the 2nd comma) with nothing.

Related

How can I delete this part of the text with regex?

I have a problem that I really hope that somebody could help me. So, I want to delete some parts of text from a notepad++ document using Regex. If there's another software that I can use to delete this part of text, let me know please, I am really really noob with regex
So, my document its like this:
1
00:00:00,859 --> 00:00:03,070
text over here
2
00:00:03,070 --> 00:00:09,589
text over here
3
00:00:09,589 --> 00:00:10,589
some numbers here
4
00:00:10,589 --> 00:00:12,709
Text over here
5
00:00:12,709 --> 00:00:18,610
More text with numbers here
What I want to learn is how can I delete the first 2 lines of numbers in all the document? So I could get only the text parts (the "text over here" parts)
I would really appreciate any kind of help!
My solution:
^[\s\S]{1,5}\d{1,3}:\d{1,3}:\d{1,3},\d{1,5}\s-->\s*?\d{1,3}:\d{1,3}:\d{1,3},\d{1,5}\s
This solution match both types: either all data in one line, or numbers in one line and data in the second.
Demo: https://regex101.com/r/nKD0DQ/1/
Simplest solution;
\d+(\r\n|\r|\n)\d{2}:\d{2}.*(\r\n|\r|\n)
Get line with some number \d+ with its line break (\r\n|\r|\n)
Also the next line that starts with two 2-digit numbers and a colon \d{2}:\d{2} with the rest .* and its line break. No need to match all since we already are in the correct line, since subtitle file is defined well with its predictable structure.
Put this as Find what: value in Search -> Replace.. in Notepad++, with Seach Mode: Regular Expression and with replace value (Replace with:) of empty space. Will get you the correct result, lines of expected text with empty line in between each.
to see it on action on regex101
Subtitles, for accuracy you can use this:
\d+(\r\n|\n|\r)(\d\d:){2}\d\d,\d{3}\s*-->\s*(\d\d:){2}\d\d,\d{3}(\r\n|\n|\r)
Check Regular Expression, Find what with this and Replace with empty would do.
Regxe Demo
srt subtitles are basically ordered. And it's better accurate than lose texts.
\d : a single digit.
+ : one or more of occurances of the afore character or group.
\r\n: carriage and return. (newline)
* : zero or more of occurances of the afore character or group.
| : Or, match either one.
{3}: Match afore character or group three times.
I'm going for a less specific regex:
^[0-9]*\n[0-9:,]*\s-->\s[0-9:,]*
Demo # regex101

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is the User row can also contain a |, to make it even worse.
I can use a macro to fix this, but the file is a few millions lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open for suggestions on how to do this with another program, command line utility, either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Repalce $1
Basically, Anchor to the end of the line, and remove columns [-1] and [-2] (I assume columns can't be empty. Replace + with * if they can)
If you need finer detail then that, I'd recommend writing a Java or Python script to manual parse and rewrite the file for you.
I've captured three groups and given them names. If you use a replace utility like sed or vimregex, you can replace remove with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
Demo
From Notepad++ hit ctrl + h then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex Explain:
\|\d+ : match 1st string that starts with | followed by number
\|\d+ : match 2nd string that starts with | followed by number
(\|[0-9a-z]+): match and capture the string after the 2nd number.
$ : This is will force regex search to match the end of the string.
Replacement:
$1 : replace the found string with whatever we have between the captured group which is whatever we have between the parentheses (\|[0-9a-z]+)

What is the regexpression for fixed words and a variable

I am using sublime to search and replace text in a code.
This is what I want to find
pre + <anyvariable> + post
and then replace it like this
sanitize(<anyvariable>)
I don't know how to come up with regexp after finding "pre + "
Came up with something like this
/(\pre \+)(?=.*)/g
Try this:
/^(.*\+)\s*(\S+)\s*(\+.*)$/
This will give you everything in three capture groups, starting from the beginning of the line between the first and second +, and then everything following to the end of the line. If there's other content such as more +'s, then this probably won't work blanketly.
Without a computer, and not sure a out what you want, but if you search for:
pre \+ (.*?) \+ post
And replace with:
sanityze($1) (or \1 I never remember)
It should do what you want :-)
If you want to be able to have linebreaks, replace .* by (?:.|\n)

find a single quote at the end of a line starting with "mySqlQueryToArray"

I'm trying to use regex to find single quotes (so I can turn them all into double quotes) anywhere in a line that starts with mySqlQueryToArray (a function that makes a query to a SQL DB). I'm doing the regex in Sublime Text 3 which I'm pretty sure uses Perl Regex. I would like to have my regex match with every single quote in a line so for example I might have the line:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name'");
I want the regex to match in that line both of the quotes around $name but no other characters in that line. I've been trying to use (?<=mySqlQueryToArray.*)' but it tells me that the look behind assertion is invalid. I also tried (?<=mySqlQueryToArray)(?<=.*)' but that's also invalid. Can someone guide me to a regex that will accomplish what I need?
To find any number of single quotes in a line starting with your keyword you can use the \G anchor ("end of last match") by replacing:
(^\h*mySqlQueryToArray|(?!^)\G)([^\n\r']*)'
With \1\2<replacement>: see demo here.
Explanation
( ^\h*mySqlQueryToArray # beginning of line: check the keyword is here
| (?!^)\G ) # if not at the BOL, check we did match sth on this line
( [^\n\r']* ) ' # capture everything until the next single quote
The general idea is to match everything until the next single quote with ([^\n\r']*)' in order to replace it with \2<replacement>, but do so only if this everything is:
right after the beginning keyword (^mySqlQueryToArray), or
after the end of the last match ((?!^)\G): in that case we know we have the keyword and are on a relevant line.
\h* accounts for any started indent, as suggested by Xælias (\h being shortcut for any kind of horizontal whitespace).
https://stackoverflow.com/a/25331428/3933728 is a better answer.
I'm not good enough with RegEx nor ST to do this in one step. But I can do it in two:
1/ Search for all mySqlQueryToArray strings
Open the search panel: ⌘F or Find->Find...
Make sure you have the Regex (.* ) button selected (bottom left) and the wrap selector (all other should be off)
Search for: ^\s*mySqlQueryToArray.*$
^ beginning of line
\s* any indentation
mySqlQueryToArray your call
.* whatever is behind
$ end of line
Click on Find All
This will select every occurrence of what you want to modify.
2/ Enter the replace mode
⌥⌘F or Find->Replace...
This time, make sure that wrap, Regex AND In selection are active .
Them search for '([^']*)' and replace with "\1".
' are your single quotes
(...) si the capturing block, referenced by \1 in the replace field
[^']* is for any character that is not a single quote, repeated
Then hit Replace All
I know this is a little more complex that the other answer, but this one tackles cases where your line would contain several single-quoted string. Like this:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name' and Value='1234'");
If this is too much, I guess something like find: (?<=mySqlQueryToArray)(.*?)'([^']*)'(.*?) and replace it with \1"\2"\3 will be enough.
You can use a regex like this:
(mySqlQueryToArray.*?)'(.*?)'(.*)
Working demo
Check the substitution section.
You can use \K, see this regex:
mySqlQueryToArray[^']*\K'(.*?)'
Here is a regex demo.

auto indent in vim string replacement new line?

I'm using the following command to auto replace some code (adding a new code segment after an existing segment)
%s/my_pattern/\0, \r some_other_text_i_want_to_insert/
The problem is that with the \r, some_other_text_i_want_to_insert gets inserted right after the new line:
mycode(
some_random_text my_pattern
)
would become
mycode(
some_random_text my_pattern
some_other_text_i_want_to_insert <--- this line is NOT indented
)
instead of
mycode(
some_random_text my_pattern
some_other_text_i_want_to_insert <--- this line is now indented
)
i.e. the new inserted line is not indented.
Is there any option in vim or trick that I can use to indent the newly inserted line?
Thanks.
Try this:
:let #x="some_other_text_i_want_to_insert\n"
:g/my_pattern/normal "x]p
Here it is, step by step:
First, place the text you want to insert in a register...
:let #x="some_other_text_i_want_to_insert\n"
(Note the newline at the end of the string -- it's important.)
Next, use the :global command to put the text after each matching line...
:g/my_pattern/normal "x]p
The ]p normal-mode command works just like the regular p command (that is, it puts the contents of a register after the current line), but also adjusts the indentation to match.
More info:
:help ]p
:help :global
:help :normal
%s/my_pattern/\=submatch(0).", \n".matchstr(getline('.'), '^\s*').'some_other_text'/g
Note that you will have to use submatch and concatenation instead of & and \N. This answer is based on the fact that substitute command puts the cursor on the line where it does the substitution.
How about normal =``?
:%s/my_pattern/\0, \r some_other_text_i_want_to_insert/ | normal =``
<equal><backtick><backtick>: re-index position before latest jump
(Sorry about the strange formatting, escaping backtick is really hard to use here)
To keep them as separate command you could do one of these mappings:
" Equalize and move cursor to end of change - more intuitive for me"
nnoremap =. :normal! =````<CR>
" Equalize and keeps cursor at beginning of change"
nnoremap =. :keepjumps normal! =``<CR>
I read the mapping as "equalize last change" since dot already means "repeat last change".
Or skip the mapping altogether since =`` is only 3 keys with 2 of them being repeats. Easy peasy, lemon squeezy!
References
:help =
:help mark-motions
Kind of a round-about way of achieving the same thing: You could record a macro which finds the next occurance of my_pattern and inserts after it a newline and your replacement string. If auto-indent is turned on, the indent level will be maintained reagardless of where the occurance of my_pattern is found.
Something like this key sequence:
q 1 # begin recording
/my_pattern/e # find my_pattern, set cursor to end of match
a # append
\nsome_other_text... # the text to append
<esc> # exit insert mode
q # stop recording
Repeated by pressing #1
You can do it in two steps. This is similar to Bill's answer but simpler and slightly more flexible, since you can use part of the original string in the replacement.
First substitute and then indent.
:%s/my_pattern/\0, \r some_other_text_i_want_to_insert/
:%g/some_other_text_i_want_to_insert/normal ==
If you use part of the original string with \0,\1, etc. just use the common part of the replacement string for the :global (second) command.
I achieved this by using \s* at the beginning of my pattern to capture the preceding whitespace.
I'm using the vim addon for VSCode, which doesn't seem to match standard vim completely, but for me,
:%s/(\s*)(existing line)/$1$2\n$1added line/g
turns this
mycode{
existing line
}
into this
mycode{
existing line
added line
}
The parentheses in the search pattern define groups which are referenced by $1 and $2. In this case $1 is the white space captured by (\s*). I'm not an expert on different implementations of vim or regex, but as far as I can tell, this way of referencing regex groups is specific to VSCode (or at least not general). More explanation of that here. Using \s* to capture a group of whitespace should be general, though, or at least have a close analog in your environment.