Remove lines from buffer that match the selected text - regex

When analyzing large log files, I often remove lines containing text I find irrelevant:
:g/whatever/d
Sometimes I find text that spans multiple lines, like stack traces. For that, I record the steps taken (search, go to start anchor, delete to end anchor) and replay that macro with 100000@q. I'm looking for a function or feature Vim already includes that lets me mark text and remove all lines containing that text. Ideally this would also work for block selection.

If I understood your problem right, this command should do what you want:
:g/NullPointer/,/omitt/d
Example:
Before:
1
2
3
NullPointerException1
4
5
6
omitted
7
NullPointerException2
8
9
omitted
10
After:
1
2
3
7
10
Please read :h edit-paragraph-join; it explains this form of the command well. Your case just replaces join with d.
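For anyone who wants to check the effect outside Vim, here is a minimal Python sketch of what a start/end range delete does to a list of lines. The name delete_ranges is made up for illustration, and keeping a start line when no end anchor follows it is an assumption of this sketch.

```python
import re

def delete_ranges(lines, start_pattern, end_pattern):
    """Drop every block that runs from a start-anchor line to the next end-anchor line."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    kept = []
    i = 0
    while i < len(lines):
        if start_re.search(lines[i]):
            # Look for the end anchor strictly after the start line.
            end = next(
                (j for j in range(i + 1, len(lines)) if end_re.search(lines[j])),
                None,
            )
            if end is not None:
                i = end + 1  # skip the whole block, both anchors included
                continue
        # Assumption: a start line without a following end anchor is kept.
        kept.append(lines[i])
        i += 1
    return kept
```

Called with the "Before" buffer above and the patterns NullPointer and omitt, it should leave the same five lines as the "After" listing.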

:g/whatever/d2
will delete a line with whatever and the line after it. If you can find text that always happens in the first line, you can strip out all of the following text if it has the same number of lines by changing 2 to whatever you need.
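As a rough illustration of the fixed-count variant, here is a small Python sketch. The function name delete_with_following and its parameters are hypothetical; count plays the role of the number after d.

```python
import re

def delete_with_following(lines, pattern, count=2):
    """Drop each matching line plus the lines after it, `count` lines in total."""
    regex = re.compile(pattern)
    kept = []
    skip = 0  # how many upcoming lines still belong to a deleted block
    for line in lines:
        if skip > 0:
            skip -= 1
            continue
        if regex.search(line):
            skip = count - 1
            continue
        kept.append(line)
    return kept
```

This only works when every unwanted block has the same length, which is the same limitation the command has.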

You could actually just use some normal-mode commands inside a global command to achieve what you want. Looking at your example (I hope I understood it more or less right):
someText
NullPointerException
...
omitted
You want to delete from the line above the NPE up to the line with omitted, right?
Just use the following:
:g/NullPointerException/execute "normal! kddd/omitted\<cr>dd"
It may look complex, but it isn't. It is not better than a macro [1], but I like commands more, because I always make errors when recording macros.
Since it only uses normal Vim movements, it is easy to adapt. If, for example, you do not know where your previous anchor is, you could use ?anchor\<cr> instead of kd. For a better demonstration you will have to submit a realistic example.
[1] You could argue that this only needs to be run once, but that is also true for a recursive macro: http://vim.wikia.com/wiki/Record_a_recursive_macro

Thanks to the answers here, I was able to write a very handy function. The code below lets you select text and remove all lines with the same (or similar) text from the current buffer. It works with both in-line and multiline selections. As I said, I was looking for something that makes me faster at analyzing log files. Log files typically contain dates and times, and these change all the time, so it's a good idea to have something that lets us ignore numbers. I'm using these two mappings:
vnoremap d :<C-U>echo RemoveSelectionFromBuffer(0)<CR>
vnoremap D :<C-U>echo RemoveSelectionFromBuffer(1)<CR>
Typical usage:
Remove similar lines ignoring numbers: Shift+v, then Shift+d
Remove same matches (single line): Mark text inline (leaving out dates and times), then d
Remove same matches (multiline): Mark text across lines (leaving out dates and times), then d
Here's the source code:
" Removes lines matching the selected text from buffer.
function! RemoveSelectionFromBuffer(ignoreNumbers)
  let lines = GetVisualSelection() " selected lines
  " Escape backslashes and slashes (delimiters)
  call map(lines, {k, v -> substitute(v, '\\\|/', '\\&', 'g')})
  if a:ignoreNumbers == 1
    " Substitute all numbers with \s*\d\+\s* - in formatted output matching
    " lines may have whitespace instead of numbers. All backslashes need
    " to be escaped because \V (very nomagic) will be used.
    call map(lines, {k, v -> substitute(v, '\s*\d\+\s*', '\\s\\*\\d\\+\\s\\*', 'g')})
  endif
  let blc = line('$') " number of lines in buffer (before deletion)
  let vlc = len(lines) " number of selected lines
  let pattern = join(lines, '\_.') " support multiline patterns
  let cmd = ':g/\V' . pattern . '/d_' . vlc " delete matching lines (d_3)
  let pos = getpos('v') " save position
  execute "silent " . cmd
  call setpos('.', pos) " restore position
  let dlc = blc - line('$') " number of deleted lines
  let dmc = dlc / vlc " number of deleted matches
  let cmd = substitute(cmd, '\(.\{50\}\).*', '\1...', '') " command output
  let lout = dlc . ' line' . (dlc == 1 ? '' : 's')
  let mout = '(' . dmc . ' match' . (dmc == 1 ? '' : 'es') . ')'
  return printf('%s removed: %s', (vlc == 1 ? lout : lout . ' ' . mout), cmd)
endfunction
I took the GetVisualSelection() code from this answer.
function! GetVisualSelection()
  if mode() == "v"
    let [line_start, column_start] = getpos("v")[1:2]
    let [line_end, column_end] = getpos(".")[1:2]
  else
    let [line_start, column_start] = getpos("'<")[1:2]
    let [line_end, column_end] = getpos("'>")[1:2]
  end
  if (line2byte(line_start)+column_start) > (line2byte(line_end)+column_end)
    let [line_start, column_start, line_end, column_end] =
          \ [line_end, column_end, line_start, column_start]
  end
  let lines = getline(line_start, line_end)
  if len(lines) == 0
    return ''
  endif
  let lines[-1] = lines[-1][: column_end - 1]
  let lines[0] = lines[0][column_start - 1:]
  return lines
endfunction
Thanks, aepksbuck, DoktorOSwaldo and Kent.
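To make the pattern-building idea easier to follow, here is a Python sketch of the same approach: escape the selected text literally, optionally swap every number for a "number plus surrounding whitespace" wildcard, join multi-line selections with a newline, and drop every block of lines that matches. The names line_to_regex and remove_selection are invented for this sketch, and it works on a list of strings rather than a Vim buffer, so treat it as an illustration and not a port.

```python
import re

NUMBER_RUN = r"\s*\d+\s*"  # a number plus any whitespace around it

def line_to_regex(line, ignore_numbers):
    """Turn one selected line into a regex that matches it literally."""
    if not ignore_numbers:
        return re.escape(line)
    # Escape the text between numbers, then glue the pieces back
    # together with the number wildcard.
    literal_parts = re.split(NUMBER_RUN, line)
    return NUMBER_RUN.join(re.escape(part) for part in literal_parts)

def remove_selection(lines, selection, ignore_numbers=False):
    """Drop every run of len(selection) lines that matches the selection."""
    if not selection:
        return list(lines)
    pattern = re.compile(
        "\n".join(line_to_regex(s, ignore_numbers) for s in selection)
    )
    span = len(selection)
    kept = []
    i = 0
    while i < len(lines):
        window = lines[i:i + span]
        if len(window) == span and pattern.search("\n".join(window)):
            i += span  # delete the whole matching block
        else:
            kept.append(lines[i])
            i += 1
    return kept
```

With ignore_numbers=True, a selection such as "10:02 heartbeat 1" also removes "10:04 heartbeat 2", which is the behaviour the D mapping is meant to give.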

Related

REGEX - Automatic text selection and restructuring

I am kinda new to AHK, I've written some scripts. But with my latest script, I'm kind of stuck with REGEX in AHK.
I want to make the report of a structure of texts I make.
To do this I've set up a system:
1. Sentences ending in '.' are the important sentences, to be prefixed with "-" (variable 'Vimportant'), BUT WITHOUT the sentences containing the words listed for 'Vanecdotes2' or 'Vdelete2', cf. point 4.
2. Sentences ending in '.*' are the anecdotes (variable 'Vanecdotes1'), where I've put a star manually after the period.
3. Sentences ending in '.!' are irrelevant sentences that need to be deleted (variable 'Vdelete1'), where I've put an exclamation mark manually after the period.
4. An extra option I want to implement is a list of words to detect in a sentence, so that the sentence is automatically added to the variable 'Vanecdotes2' or 'Vdelete2'.
A random example would be this (I have already put ! and * after the sentences (why is not important), and "acquisition" is an example of a Vanecdotes2 word from point 4 above):
Last procedure on 19/8/2019.
Normal structure x1.!
Normal structure x2.!
Abberant structure x3, needs follow-up within 2 months.
Structure x4 is lower in activity, but still above p25.*
Abberant structure x4, needs follow-up within 6 weeks.
Normal structure x5.!
Good aqcuisition of x6.
So the output of the Regex in the variables should be
Last procedure on 19/8/2019.
Normal structure x1.! --> regex '.!' --> Vdelete1
Normal structure x2.! --> regex '.!' --> Vdelete1
Abberant structure x3, needs follow-up within 2 months. --> Regex '.' = Vimportant
Structure x4 is lower in activity, but still above p25.* --> regex '.*' = Vanecdote1
Abberant structure x4, needs follow-up within 6 weeks. --> Regex '.' = Vimportant
Normal structure x5.! --> regex '.!' --> Vdelete1
Good aqcuisition of x6. --> Regex 'sentence with the word acquisition' = Vanecdote2
And the output should be:
- Last procedure on 19/8/2019.
- Abberant structure x3, needs follow-up within 2 months.
- Abberant structure x4, needs follow-up within 6 weeks.
. Structure x4 is lower in activity, but still above p25.
. Good aqcuisition of x6.
But I have been having a lot of trouble with the regex, especially with selecting the sentences ending in * or !. The exclusion criteria don't work either.
Because AHK doesn't have a really good tester, I first tested it in another regex tester and was planning to 'translate' it to AHK code later, but it just doesn't work. (So I know that in the script below I'm using the AHK language with non-AHK regex; I've just put the two together for illustration.)
This is what I have now:
Send ^c
clipwait, 1000
Temp := Clipboard
Regexmatch(Temp, "^.*[.]\n(?!^.*\(Anecdoteword1|Anecdoteword2|deletewordX|deletewordY)\b.*$)", Vimportant)
Regexmatch(Temp, "^.*[.][*]\n")", Vanecdotes1)
Regexmatch(Temp, "^.*[.][!]\n")", Vdelete1)
Regexmatch(Temp, "^.*\b(Anecdoteword1|Anecdoteword2)\b.*$")", Vanecdotes2)
Regexmatch(Temp, "^.*\b(deletewordX|deletewordY)\b.*$")", Vdelete2)
Vanecdotes_tot := Vanecdotes1 . Vanecdotes2
Vdelete_tot := Vdelete1 . Vdelete2
Vanecdotes_ster := "* " . StrReplace(Vanecdotes_tot, "`r`n", "`r`n* ")
Vimportant_stripe := "- " . StrReplace(Vimportant, "`r`n", "`r`n- ")
Vresult := Vimportant_stripe . "`n`n" . Vanecdotes_ster
For "translation to AHK" I tried to make ^.*\*`n from the working (non-AHK) regex ^.*[.][*]\n.
There isn't really such a thing as AHK regex. AHK pretty much uses PCRE, apart from the options.
So don't try to turn a linefeed \n into an AHK linefeed `n.
And there seem to be some syntax errors in your regexes. Not quite sure what those extra ") in there are supposed to be. Also, instead of [.][*], the more conventional form is \.\*. The \ escapes the special meaning of those characters (any character, and match between zero and unlimited times). A single-character class such as [.] matches the same literal, so this part is a matter of style.
[] is to match any character in that group, like if you wanted to match either . or * you'd do [.*].
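A quick way to see the difference between escaped and unescaped . and * is to try them on one of the sample sentences. The sketch below uses Python's re module purely for illustration (it is not AHK); for these basic constructs it behaves the same way as PCRE.

```python
import re

line = "Structure x4 is lower in activity, but still above p25.*"

# Backslash-escaped: a literal period followed by a literal star.
escaped = re.search(r"\.\*$", line)

# Single-character classes: the same two literals, written differently.
bracketed = re.search(r"[.][*]$", line)

# Unescaped: "any character, zero or more times", which matches anything.
unescaped = re.search(r".*$", "no star or period here")
```

Both escaped and bracketed find the trailing ".*", while the unescaped pattern happily matches a line that contains neither character.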
And seems like you got the idea of using capture groups, but just in case, here's a minimal example about them:
RegexMatch("TestTest1233334Test", "(\d+)", capture)
MsgBox, % capture
And lastly, about your approach to the problem, I'd recommend looping through the input line by line. It'll be much better and easier. Use e.g. Loop, Parse.
Minimal example for it as well:
inp := "
(
this is
a multiline
textblock
we're going
to loop
through it
line by line
)"
Loop, Parse, inp, `n, `r
MsgBox, % "Line " A_Index ":`n" A_LoopField
Hope this was of help.
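To show what the line-by-line approach could look like end to end, here is a sketch in Python rather than AHK. It only illustrates the classification logic from the question; the word lists ANECDOTE_WORDS and DELETE_WORDS and the function names are placeholders, not anything taken from the original script.

```python
import re

# Hypothetical keyword lists, standing in for the asker's point 4.
ANECDOTE_WORDS = ("acquisition",)
DELETE_WORDS = ()

def has_word(line, words):
    """True if any of the words occurs in the line as a whole word."""
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", line, re.IGNORECASE)
        for word in words
    )

def classify(text):
    """Split the text into (important, anecdotes); deleted lines are dropped."""
    important, anecdotes = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(".!") or has_word(line, DELETE_WORDS):
            continue  # irrelevant sentence
        if line.endswith(".*"):
            anecdotes.append(line[:-1])  # drop the manual star
        elif has_word(line, ANECDOTE_WORDS):
            anecdotes.append(line)
        else:
            important.append(line)
    return important, anecdotes

def report(text):
    """Build the final report: '- ' for important lines, '* ' for anecdotes."""
    important, anecdotes = classify(text)
    return "\n".join(
        ["- " + s for s in important] + ["* " + s for s in anecdotes]
    )
```

Checking the line ending first and the keyword lists second keeps each rule to a single simple test, which is the main advantage over one large regex applied to the whole clipboard.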
This is where I am up till now; nothing works (I will try the suggested loop once the regex is working):
^m::
BlockInput, On
MouseGetPos, , ,TempID, control
WinActivate, ahk_id %TempID%
if WinActive("Pt.")
Send ^c
clipwait, 1000
Temp := Clipboard
Regexmatch(Temp, "(^(?:..\n)((?! PAX|PAC|Normaal|Geen).)$)", Vimportant)
Vimportant := Vimportant.1
Regexmatch(Temp, "(^..*\n)", Vanecdotes1_ster)
Regexmatch(Temp, "(^..!\n)" , Vdelete1_uitroep)
Regexmatch(Temp, "(^.\b(PAX|PAC)\b.$)", Vanecdotes2)
Regexmatch(Temp, "(^.\b(Normaal|Geen)\b.$)", Vdelete2)
Vanecdotes1 := StrReplace(Vanecdotes1_ster, ".*", ".")
Vdelete1 := StrReplace(Vdelete1_uitroep, ".!", ".")
Vanecdotes_tot := Vanecdotes1 . Vanecdotes2
Vdelete_tot := Vdelete1 . Vdelete2
Vanecdotes_ster := "* " . StrReplace(Vanecdotes_tot, "`r`n", "`r`n* ")
Vimportant_stripe := "- " . StrReplace(Vimportant, "`r`n", "`r`n- ")
Vresult := Vimportant_stripe . "`n`n" . Vanecdotes_ster
Clipboard := Vresult
Send ^v
return

Have Tabularize ignore some lines and align the others

I want Tabularize to ignore lines that do not contain a particular character and align only the remaining lines:
text1_temp = text_temp;
temporary_line;
text2 = text_temp;
In the end I would like the following:
text1_temp = text_temp;
temporary_line;
text2      = text_temp;
// The 2nd "=" is spaced/tabbed with relation to the first "="
If I run ":Tabularize /=" on the 3 lines together, I get:
text1_temp      = text_temp;
temporary_line;
text2           = text_temp;
Here the two lines with "=" are aligned with respect to the length of the middle line.
Any suggestions?
PS: I edited the post to explain the need better.
I am not sure how to do this with Tabular directly. You might be able to use Christian Brabandt's NrrwRgn plugin to collect only the lines containing = with :NRP and then open them with :NRM. This gives you a new buffer with only the lines containing =, so you can run :Tabularize /=/ there and then save the buffer (:w, :x, etc.).
:g/=/NRP
:NRM
:Tabularize /=/
:x
The easiest option is probably vim-easy-align, which seems to support this behavior out of the box. Example (assuming ga is your EasyAlign mapping):
gaip=
What about a simple replace, like :g/=/s/\t/ /g ?
If that doesn't work, you can try this too: :g/=/s/ \+= \+/ = /g
Explanation:
The :g/=/ part finds all the lines that contain '=' and runs the substitution on them.
s/\t/ /g then replaces tabs with spaces. Combined, these two should do what you need.
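For reference, the behaviour asked for in the question (align only the lines that contain the delimiter and leave every other line alone) can be sketched outside Vim like this. The function align_on is a made-up name and the code is plain Python, not a Tabular or EasyAlign feature.

```python
def align_on(lines, delimiter="="):
    """Align the first `delimiter` on the lines that contain it; skip the rest."""
    # Split only the lines that actually contain the delimiter.
    parts = [
        line.split(delimiter, 1) if delimiter in line else None
        for line in lines
    ]
    # Width of the widest left-hand side among the lines being aligned.
    width = max((len(p[0].rstrip()) for p in parts if p), default=0)
    aligned = []
    for line, split in zip(lines, parts):
        if split is None:
            aligned.append(line)  # untouched: no delimiter on this line
        else:
            left, right = split
            aligned.append(f"{left.rstrip():<{width}} {delimiter} {right.strip()}")
    return aligned
```

The key point is that the padding width is computed from the delimiter lines only, so the long middle line no longer pushes the "=" signs to the right.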

Join lines after specific word till another specific word

I have a .txt file of a transcript that looks like this
MICHAEL: blablablabla.
further talk by Michael.
more talk by Michael.
VALERIE: blublublublu.
Valerie talks more.
MICHAEL: blibliblibli.
Michael talks again.
........
All in all this pattern goes on for up to 4000 lines and not just two speakers but with up to seven different speakers, all with unique names written with upper-case letters (as in the example above).
For some text mining I need to rearrange this .txt file in the following way:
1. Join the lines following one speaker, but only the ones that still belong to them, so that the above file looks like this:
MICHAEL: blablablabla. further talk by Michael. more talk by Michael.
VALERIE: blublublublu. Valerie talks more.
MICHAEL: blibliblibli. Michael talks again.
2. Sort the now properly joined lines in the .txt file alphabetically, so that all lines spoken by one speaker are together. However, the sort should not reorder the sentences spoken by one speaker (after grouping each speaker's lines together).
I know some basic Vim commands, but not enough to figure this out, especially the first step. I do not know what kind of pattern I can use in Vim so that it only joins the lines of each speaker.
Any help would be greatly appreciated!
Alright, first the answer:
:g/^\u\+:/,/\n\u\+:\|\%$/join
And now the explanation:
g stands for global and executes the following command on every line that matches
/^\u\+:/ is the pattern :g searches for: ^ is the start of the line, \u is an uppercase character, \+ means one or more matches, and : is, unsurprisingly, :
then comes the tricky bit: we make the executed command a range, from the match to some other pattern match. /\n\u\+:\|\%$/ has two parts separated by the alternation \| . \n\u\+: is a newline followed by the previous pattern, i.e. it matches the line before the next speaker. \%$ is the end of the file
join does what it says on the tin
So to put it together: For each speaker, join until the line before the next speaker or the end of the file.
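The joining rule described above can also be written as a short Python sketch, which may help when checking the result on a small sample. The name join_turns is invented here, and the speaker pattern assumes names consist only of uppercase ASCII letters followed by a colon, as in the example transcript.

```python
import re

# Assumption: a speaker label is one or more uppercase letters, then a colon.
SPEAKER = re.compile(r"^[A-Z]+:")

def join_turns(lines):
    """Join every continuation line onto the speaker line that precedes it."""
    turns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if SPEAKER.match(line) or not turns:
            turns.append(line)        # a new speaker starts a new turn
        else:
            turns[-1] += " " + line   # continuation of the current turn
    return turns
```

Any continuation line that happens to start with an uppercase word and a colon would be mistaken for a new speaker, so the pattern may need tightening for real transcripts.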
The closest to the sorting I know of is
:sort /\u\+:/ r
which will only sort by speaker name and reverse the other line so it isn't really what you are looking for
Well, I don't know much about Vim, but I was able to match the lines belonging to a particular speaker, and here is the regex for that.
Regex: /([A-Z]+:)([A-Za-z\s\.]+)(?!\1)$/gm
Explanation:
([A-Z]+:) captures the speaker's name which contains only capital letters.
([A-Za-z\s\.]+) captures the dialogue.
(?!\1)$ backreferences to the Speaker's name and compares if the next speaker was same as the last one. If not then it matches till the new speaker is found.
I hope this will help you with matching at least.
In vim you might take a two step approach, first replace all newlines.
:%s/\n\+/ /g
Then insert a new line before the terms UPPERCASE: except the first one:
:%s/ \([[:upper:]]\+:\)/\r\1/g
For the sorting you can leverage the UNIX sort program:
:%!sort
You can combine them using a pipe symbol:
:%s/\n\+/ /g | %s/ \([[:upper:]]\+:\)/\r\1/g | %!sort
and map them to a key in your vimrc file:
:nnoremap <F5> :%s/\n\+/ /g \| %s/ \([[:upper:]]\+:\)/\r\1/g \| %!sort<CR>
If you press F5 in normal mode, the transformation happens. Note that the | needs to get escaped in the nnoremap command.
Here is a script solution to your problem.
It's not well tested, so I added some comments so you can fix it easily.
To make it run, just:
fill the g:speakers var in the top of the script with the uppercase names you need;
source the script (ex: :sav /tmp/script.vim|so %);
run :call JoinAllSpeakLines() to join the lines by speakers;
run :call SortSpeakLines() to sort
You may adapt the different patterns to better fit your needs, for example adding some space tolerance (\u\{2,}\s*\ze:).
Here is the code:
" Fill the following array with all the speakers names:
let g:speakers = [ 'MICHAEL', 'VALERIE', 'MATHIEU' ]
call sort(g:speakers)
function! JoinAllSpeakLines()
  " In the whole file, join all the lines between two uppercase speaker names
  " followed by ':', first inclusive:
  silent g/\u\{2,}:/call JoinSpeakLines__()
endf

function! SortSpeakLines()
  " Sort the whole file by speaker, keeping the order for
  " each speaker.
  " Must be called after JoinAllSpeakLines().
  " Create a new dict, with one key for each speaker:
  let speakerlines = {}
  for speaker in g:speakers
    let speakerlines[speaker] = []
  endfor
  " For each line in the file:
  for line in getline(1,'$')
    let speaker = GetSpeaker__(line)
    if speaker == ''
      continue
    endif
    " Add the line to the right speaker:
    call add(speakerlines[speaker], line)
  endfor
  " Delete everything in the current buffer:
  normal gg"_dG
  " Add the sorted lines, speaker by speaker:
  for speaker in g:speakers
    call append(line('$'), speakerlines[speaker])
  endfor
  " Delete the first (empty) line in the buffer:
  normal gg"_dd
endf

function! GetOtherSpeakerPattern__(speaker)
  " Returns a pattern which matches all speaker names, except the
  " one given as a parameter.
  " Create an new list with a:speaker removed:
  let others = copy(g:speakers)
  let idx = index(others, a:speaker)
  if idx != -1
    call remove(others, idx)
  endif
  " Create and return the pattern list, which looks like
  " this : "\v<MICHAEL>|<VALERIE>..."
  call map(others, 'printf("<%s>:",v:val)')
  return '\v' . join(others, '|')
endf

function! GetSpeaker__(line)
  " Returns the uppercase name followed by a ':' in a line
  return matchstr(a:line, '\u\{2,}\ze:')
endf

function! JoinSpeakLines__()
  " When cursor is on a line with an uppercase name, join all the
  " following lines until another uppercase name.
  let speaker = GetSpeaker__(getline('.'))
  if speaker == ''
    return
  endif
  normal V
  " Search for other names after the cursor line:
  let srch = search(GetOtherSpeakerPattern__(speaker), 'W')
  echo srch
  if srch == 0
    " For the last one only:
    normal GJ
  else
    normal kJ
  endif
endf
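The grouping step in SortSpeakLines() boils down to "one bucket per speaker, filled in file order". A compact Python sketch of that idea is below; sort_turns_by_speaker is a hypothetical name, and silently dropping lines whose speaker is not in the list is a choice made for this sketch.

```python
def sort_turns_by_speaker(turns, speakers):
    """Group already-joined lines by speaker, keeping each speaker's order."""
    # One bucket per known speaker, in alphabetical order of the names.
    buckets = {name: [] for name in sorted(speakers)}
    for turn in turns:
        name, colon, _ = turn.partition(":")
        # Assumption: lines without a known speaker label are left out.
        if colon and name in buckets:
            buckets[name].append(turn)
    # Dicts keep insertion order, so this walks the speakers alphabetically.
    return [turn for name in buckets for turn in buckets[name]]
```

Because each bucket is filled in the order the lines appear, the sentences of one speaker are never reordered, which is the requirement from the question.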

findall function grabbing the wrong info

I am trying to write a piece of Python to read my files. The code is below:
import re, os
captureLevel = [] # capture read scale.
captureQID = [] # capture questionID.
captureDesc = [] # capture description.
file=open(r'E:\Grad\LIS\LIS590 Text mining\Final_Project\finalproject_data.csv','rt')
newfile=open('finalwordlist.csv','w')
mytext=file.read()
for row in mytext.split('\n'):
    grabLevel=re.findall(r'(\d{1})+\n',row)
    captureLevel.append(grabLevel)
    grabQID=re.findall(r'(\w{1}\d{5})',row)
    captureQID.append(grabQID) #ERROR LINE.
    grabDesc=re.findall(r'\,+\s+(\w.+)',row)
    captureDesc.append(grabDesc)
    lineCount = 0
    wordCount = 0
    lines = ''.join(grabDesc).split('.')
    for line in lines:
        lineCount +=1
        for word in line.split(' '):
            wordCount +=1
            newfile.write(''.join(grabLevel) + '|' + ''.join(grabQID) + '|' + str(lineCount) + '|' + str(wordCount) + '|' + word + '\n')
newfile.close()
Here are three lines of my data:
a00004," another oakstr eetrequest, helped student request item",2
a00005, asked retiree if he used journal on circ list,2
a00006, asked scientist about owner of some archival notes,2
Here is the result:
22|a00002|1|1|a00002,
22|a00002|1|2|
22|a00002|1|3|scientist
22|a00002|1|4|looking
22|a00002|1|5|for
The first column of the result should be just one number, but why is it printing out a two digit number?
Any idea what is the problem here? Thanks.
It is the tab versus space difference again. You need to be careful about this, especially in Python: spaces are not treated as equivalent to tabs. Here is a helpful link discussing the difference: http://legacy.python.org/dev/peps/pep-0008/. In brief, that guide recommends spaces for indentation. However, I find tabs work fine for indentation too. What matters is keeping the indentation consistent, so if you use tabs, use them throughout.
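If you suspect mixed indentation, one way to check a snippet is to let Python 3 compile it: it raises TabError when tabs and spaces are combined inconsistently. The helper name has_inconsistent_indentation below is made up for this sketch.

```python
def has_inconsistent_indentation(source):
    """Return True if Python 3 rejects the source for mixing tabs and spaces."""
    try:
        compile(source, "<snippet>", "exec")
    except TabError:
        return True
    return False

# Second line is indented with a tab, third line with eight spaces.
mixed = "for i in range(2):\n\tx = i\n        y = i\n"

# Same code, indented with four spaces throughout.
consistent = "for i in range(2):\n    x = i\n    y = i\n"
```

Most editors can also display whitespace characters or convert tabs to spaces, which is usually the quicker fix once the problem has been spotted.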

Substituting zero-width match in vim script

I have written this script that replaces many spaces around the cursor with one space. This however doesn't work when I use it with no spaces around the cursor. It seems to me that Vim doesn't replace on a zero-width match.
function JustOneSpace()
  let save_cursor = getpos(".")
  let pos = searchpos(' \+', 'bc')
  s/\s*\%#\s*/ /e
  let save_cursor[2] = pos[1] + 1
  call setpos('.', save_cursor)
endfunction
nmap <space> :call JustOneSpace()<cr>
Here are a few examples (pipe | is cursor):
This line
hello | world
becomes
hello |world
But this line
hello wo|rld
doesn't become
hello wo |rld
Update: By changing the function to the following it works for the examples above.
function JustOneSpace()
  let save_cursor = getpos(".")
  let pos = searchpos(' *', 'bc')
  s/\s*\%#\s*/ /e
  let save_cursor[2] = pos[1] + 1
  call setpos('.', save_cursor)
endfunction
This line
hello |world
becomes
hello w|orld
The problem is that the cursor moves to the next character. It should stay in the same place.
Any pointers and/or tips?
I think that the only problem with your script is that the position saving doesn't seem correct. You can essentially do what you are trying to do with:
:s/\s*\%#\s*/ /e
which is identical to the (correct) code in your question. You could simply map this with:
:nmap <space> :s/\s*\%#\s*/ /e<CR>
If you want to save the position, it gets a little more complicated. Probably the best bet is to use something like this:
function! JustOneSpace()
  " Get the current contents of the current line
  let current_line = getline(".")
  " Get the current cursor position
  let cursor_position = getpos(".")
  " Generate a match using the column number of the current cursor position
  let matchRE = '\(\s*\)\%' . cursor_position[2] . 'c\s*'
  " Find the number of spaces that precede the cursor
  let isolate_preceding_spacesRE = '^.\{-}' . matchRE . '.*$'
  let preceding_spaces = substitute(current_line, isolate_preceding_spacesRE, '\1', "")
  " Modify the line by replacing with one space
  let modified_line = substitute(current_line, matchRE, " ", "")
  " Modify the cursor position to handle the change in string length
  let cursor_position[2] -= len(preceding_spaces) - 1
  " Set the line in the window
  call setline(".", modified_line)
  " Reset the cursor position
  call setpos(".", cursor_position)
endfunction
Most of that is comments, but the key thing is that you look at the length of the line before and after the substitution and decide on the new cursor position accordingly. You could do this with your method by comparing len(getline(".")) before and after if you prefer.
Edit
If you want the cursor to end after the space character, modify the line:
let cursor_position[2] -= len(current_line) - len(modified_line)
such that it looks like this:
let cursor_position[2] -= (len(current_line) - len(modified_line)) - 1
Edit (2)
I've changed the script above to consider your comments such that the cursor position is only adjusted by the number of spaces before the cursor position. This is done by creating a second regular expression that extracts the spaces preceding the cursor (and nothing else) from the line and then adjusting the cursor position by the number of spaces.
I don't use vim, but if you want to match zero or more spaces, shouldn't you be using ' *' instead of ' \+' ?
EDIT: re the cursor positioning problem: what you're doing now is setting the position at the beginning of the whitespace before you do the substitution, then moving it forward one position so it's after the space. Try setting it at the end of the match instead, like this:
search(' *', 'bce')
That way, any additions or removals will occur before the cursor position. In most editors, the cursor position automatically moves to track such changes. You shouldn't need to do any of that getpos/setpos stuff.
This function is based on Al's answer.
function JustOneSpace()
  " Get the current contents of the current line
  let current_line = getline(".")
  " Get the current cursor position
  let cursor_position = getpos(".")
  " Generate a match using the column number of the current cursor position
  let matchre = '\s*\%' . cursor_position[2] . 'c\s*'
  let pos = match(current_line, matchre) + 2
  " Modify the line by replacing with one space
  let modified_line = substitute(current_line, matchre, " ", "")
  " Modify the cursor position to handle the change in string length
  let cursor_position[2] = pos
  " Set the line in the window
  call setline(".", modified_line)
  " Reset the cursor position
  call setpos(".", cursor_position)
endfunction
Instead of using the difference between the original and the modified line, I find the position of the first space that will match the regular expression of the substitution. Then I set the cursor position to that position + 1.
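The cursor arithmetic is easier to verify on plain strings. The Python sketch below follows the same idea (find where the whitespace run around the cursor starts, replace the run with one space, put the cursor right after that space). The function name just_one_space and the 0-based cursor index are choices made for this sketch; Vim columns are 1-based.

```python
def just_one_space(line, cursor):
    """Collapse the whitespace around a 0-based cursor index into one space.

    Returns the new line and the new cursor index, which points at the
    character right after the single space.
    """
    # Walk left to the start of the whitespace run that ends at the cursor.
    start = cursor
    while start > 0 and line[start - 1].isspace():
        start -= 1
    # Walk right to the end of the run (the run may be empty).
    end = cursor
    while end < len(line) and line[end].isspace():
        end += 1
    new_line = line[:start] + " " + line[end:]
    return new_line, start + 1
```

Because the run is allowed to be empty, the "hello wo|rld" case from the question inserts a space instead of doing nothing, which was the zero-width situation that caused trouble originally.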
This simple one I use does almost the same:
nnoremap <leader>6 d/\S<CR>
Put the cursor where the removal should start; it deletes everything from the cursor up to the next non-whitespace character, i.e. all the spaces between the cursor and the following text.