Capturing preceding whitespace in Julia - regex

I have a very long piece of code that I need to add to, and I would prefer to do it with a script rather than by hand, for fear of introducing errors.
I have commands that look like this
rename oldname newname
rename oldname2 newname2
Whenever I see a "rename" command, I want to add a "note" command after it:
rename oldname newname
note newname: "A Note"
rename oldname2 newname2
note newname2: "A Note"
I am using Julia's read and write features to do this, and it has been very easy so far.
f = open("renaming.txt") # open input file
g = open("renaming_output.txt", "w") # open output file
for ln in eachline(f)
    write(g, "$ln \n") # write the command, no matter what it is
    stripped = lstrip("$ln") # Strip whitespace so I can use "startswith" command
    if startswith(stripped, "ren")
        words = split("$ln", " ") # split on space to get the "newvar" name
        println("$ln \n") #check that I am working with a rename command
        println("note ", words[3]":") # check that the note function prints
        note_command = string("note ", words[3], ": \n") # construct the note command
        write(g, note_command) #write it to the output file.
    end
end
My issue is with the indentation. The above code writes the "note" command at the far left, without any indentation. Ideally, I would like the note command to be indented one level further than the rename command, but I can't figure out how to capture all the preceding whitespace.
I presume that the answer involves using match and m.match, but I can't get it to work.
Any help would be appreciated.

On Julia 0.7 the simplest change in your code would be to replace
println("note ", words[3]":")
with
println(first(ln, search(ln, 'r')-1), " note ", words[3]":")
Using regular expressions you can write rx = r"(?:(?!r).)*" at the start of your code and then:
println(match(rx, ln).match, " note ", words[3]":")
In both cases we take care to retain the start of ln up to 'r' in its original form.
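For comparison, here is a minimal sketch of the same idea in Python 3 rather than Julia: capture the leading whitespace of each rename line with a regex group and reuse it for the note line. The function name add_notes, the indent parameter, and the fixed "A Note" text are assumptions made for illustration; they are not part of the original script.

```python
import re

def add_notes(lines, indent="\t"):
    """Return the lines with a note command added after each rename command."""
    out = []
    for line in lines:
        out.append(line)
        # Group 1 is the leading whitespace, group 2 is the new name.
        m = re.match(r"(\s*)rename\s+\S+\s+(\S+)", line)
        if m:
            # Reuse the captured indentation and add one extra level.
            out.append(f'{m.group(1)}{indent}note {m.group(2)}: "A Note"')
    return out

result = add_notes(["\trename oldname newname", "x = 1"])
```

Because the pattern requires whitespace right after rename, other commands that merely start with "ren" are left alone.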

With Julia 0.6.1, my solution, with the help of the answer above, is as follows:
if startswith(stripped, "ren") & !startswith(stripped, "renvars")
    leftpad = " " ^ search("$ln", 'r')
    words = split(stripped, " ")
    varname = string(leftpad, " note ", words[3], ": ", words[2], " \n")
    print(varname)
    write(g, varname)
end
The key addition is leftpad = " " ^ search("$ln", 'r'). Given that the left padding of my code is always tabs, I just insert as many padding characters as the position of the first r. Since search returns the index of that r, leftpad is one character longer than the existing indentation. This solution works in 0.6.1, but search may behave differently in 0.7.

Remove lines from buffer that match the selected text

When analyzing large log files, I often remove lines containing text I find irrelevant:
:g/whatever/d
Sometimes I find text that spans multiple lines, like stack traces. For that, I record the steps taken (search, go to start anchor, delete to end anchor) and replay that macro with 100000@q. I'm looking for a function or built-in Vim feature that lets me mark text and remove all lines containing this text. Ideally this would also work for block selection.
If I understood your problem right, this command should do what you want:
:g/NullPointer/,/omitt/d
Example:
Before:
1
2
3
NullPointerException1
4
5
6
omitted
7
NullPointerException2
8
9
omitted
10
After:
1
2
3
7
10
Please read :h edit-paragraph-join; it has a good explanation of the command. Your case just changes join into d.
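If you ever need the same range deletion outside Vim, the logic can be sketched in Python 3 roughly as follows. The function name delete_blocks is made up for this example, and unlike Vim this sketch silently deletes through the end of the input when no end anchor follows a start anchor.

```python
import re

def delete_blocks(lines, start, end):
    """Drop every run of lines from a `start` match through the next `end` match."""
    kept = []
    skipping = False
    for line in lines:
        if skipping:
            # The end anchor is searched only on lines after the start line.
            if re.search(end, line):
                skipping = False
            continue
        if re.search(start, line):
            skipping = True
            continue
        kept.append(line)
    return kept

before = ["1", "2", "3", "NullPointerException1", "4", "5", "6", "omitted",
          "7", "NullPointerException2", "8", "9", "omitted", "10"]
after = delete_blocks(before, r"NullPointer", r"omitt")
```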
:g/whatever/d2
will delete a line with whatever and the line after it. If you can find text that always happens in the first line, you can strip out all of the following text if it has the same number of lines by changing 2 to whatever you need.
You could actually just use normal-mode commands inside a global command to achieve what you want. Looking at your example (I hope I understood it more or less correctly):
someText
NullPointerException
...
omitted
You want to delete from the line above the NPE through the line with omitted, right?
Just use the following:
:g/NullPointerException/execute "normal! kddd/omitted\<cr>dd"
It may look complex, but it isn't. It is not better than a macro [1], but I like commands more, because I always make errors when recording macros.
Since it only uses normal Vim movements, it is easy to adapt. If, for example, you do not know where your previous anchor is, you could use ?anchor\<cr> instead of kd. For a better demonstration you will have to submit a realistic example.
[1] You could argue, that this only needs to be run once, but that is also true for a recursive macro http://vim.wikia.com/wiki/Record_a_recursive_macro
Thanks to the answers here, I was able to write a very handy function. The source below lets you select text and remove all lines with the same (or similar) text from the current buffer. It works with both in-line and multiline selections. As I said, I was looking for something that makes me faster at analyzing log files. Log files typically contain dates and times, and these change all the time, so it's a good idea to have something that lets us ignore numbers. I'm using these two mappings:
vnoremap d :<C-U>echo RemoveSelectionFromBuffer(0)<CR>
vnoremap D :<C-U>echo RemoveSelectionFromBuffer(1)<CR>
Typical usage:
Remove similar lines ignoring numbers: Shift+v, then Shift+d
Remove same matches (single line): Mark text inline (leaving out dates and times), then d
Remove same matches (multiline): Mark text across lines (leaving out dates and times), then d
Here's the source code:
" Removes lines matching the selected text from buffer.
function! RemoveSelectionFromBuffer(ignoreNumbers)
    let lines = GetVisualSelection() " selected lines
    " Escape backslashes and slashes (delimiters)
    call map(lines, {k, v -> substitute(v, '\\\|/', '\\&', 'g')})
    if a:ignoreNumbers == 1
        " Substitute all numbers with \s*\d\s* - in formatted output matching
        " lines may have whitespace instead of numbers. All backslashes need
        " to be escaped because \V (very nomagic) will be used.
        call map(lines, {k, v -> substitute(v, '\s*\d\+\s*', '\\s\\*\\d\\+\\s\\*', 'g')})
    endif
    let blc = line('$') " number of lines in buffer (before deletion)
    let vlc = len(lines) " number of selected lines
    let pattern = join(lines, '\_.') " support multiline patterns
    let cmd = ':g/\V' . pattern . '/d_' . vlc " delete matching lines (d_3)
    let pos = getpos('v') " save position
    execute "silent " . cmd
    call setpos('.', pos) " restore position
    let dlc = blc - line('$') " number of deleted lines
    let dmc = dlc / vlc " number of deleted matches
    let cmd = substitute(cmd, '\(.\{50\}\).*', '\1...', '') " command output
    let lout = dlc . ' line' . (dlc == 1 ? '' : 's')
    let mout = '(' . dmc . ' match' . (dmc == 1 ? '' : 'es') . ')'
    return printf('%s removed: %s', (vlc == 1 ? lout : lout . ' ' . mout), cmd)
endfunction
I took the GetVisualSelection() code from this answer.
function! GetVisualSelection()
    if mode() == "v"
        let [line_start, column_start] = getpos("v")[1:2]
        let [line_end, column_end] = getpos(".")[1:2]
    else
        let [line_start, column_start] = getpos("'<")[1:2]
        let [line_end, column_end] = getpos("'>")[1:2]
    end
    if (line2byte(line_start)+column_start) > (line2byte(line_end)+column_end)
        let [line_start, column_start, line_end, column_end] =
        \ [line_end, column_end, line_start, column_start]
    end
    let lines = getline(line_start, line_end)
    if len(lines) == 0
        return ''
    endif
    let lines[-1] = lines[-1][: column_end - 1]
    let lines[0] = lines[0][column_start - 1:]
    return lines
endfunction
Thanks, aepksbuck, DoktorOSwaldo and Kent.

Python Search File For Specific Word And Find Exact Match And Print Line

I wrote a script to print the lines containing a specific word from a Bible text file. The problem is that I can't match the exact word; instead it prints every line containing any variation of the word.
For example, if I search for "am" it prints sentences with words that contain "am", such as "lame", "name", etc.
Instead I want it to print only the sentences containing the word "am" itself,
i.e. "I am your saviour", "Here I am", etc.
Here is the code i use:
import re
text = raw_input("enter text to be searched:")
shakes = open("bible.txt", "r")
for line in shakes:
    if re.match('(.+)' +text+ '(.+)', line):
        print line
This is another approach to take to complete your task, it may be helpful although it doesn't follow your current approach very much.
The test.txt file I fed as input had four sentences:
This is a special cat. And this is a special dog. That's an average cat. But better than that loud dog.
When you run the program, include the text file. In command line, that'd look something like:
python file.py test.txt
This is the accompanying file.py:
import sys
key = raw_input("Please enter the word you wish to search for: ")
#print "You've selected: ", key, " as your key-word."
# Read the file named on the command line as one string
with open(sys.argv[1]) as f:
    content = f.read()
#print "This is the CONTENT", content
list_of_sentences = content.split(".")
for sentence in list_of_sentences:
    words = sentence.split(" ")
    for word in words:
        if word == key:
            print sentence
For the keyword "cat", this returns:
This is a special cat
That's an average cat
(note the periods are no longer there).
I think if you, in the strings outside text, put spaces like this:
'(.+) ' + text + ' (.+)'
That would do the trick, if I correctly understand what is going on in the code.
re.findall may be useful in this case:
print re.findall(r"([^.]*?" + text + "[^.]*\.)", shakes.read())
Or even without regex:
print [sentence + '.' for sentence in shakes.read().split('.') if text in sentence]
reading this text file:
I am your saviour. Here I am. Another sentence.
Second line.
Last line. One more sentence. I am done.
both give same results:
['I am your saviour.', ' Here I am.', ' I am done.']
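Note that a plain substring test such as text in sentence still matches "name" or "lame". One way to match only the whole word is a regex with \b word boundaries. The following is a small sketch in Python 3 syntax (unlike the Python 2 snippets above); the function name lines_with_word and the sample sentences are made up for illustration.

```python
import re

def lines_with_word(lines, word):
    """Return the lines that contain `word` as a whole word."""
    # re.escape keeps any special characters in the search term literal;
    # \b matches only at a boundary between word and non-word characters.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [line for line in lines if pattern.search(line)]

sample = ["I am your saviour", "The lame man", "Here I am", "What is your name"]
matches = lines_with_word(sample, "am")
```

The match is case-sensitive; passing re.IGNORECASE to re.compile would also find "Am" and "AM".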

Dealing with Spaces and NA's when Uniting Multiple Columns with Tidyr

So using the simple dataframe below, I want to create a new column that has all the days for each person, separated by a semi-colon.
For example, using Doug, it should look like - Monday; Wednesday; Friday
I would like to use Tidyr's Unite function for this but when I use it, I get - Monday;;Wednesday;;Friday, because of the NA's, which also could be blank spaces as well. Sometimes there are semi-colons at the beginning and end as well. So I'm hoping there's a way to keep using "unite" but enhanced with a regular expression so that I end up with each day of the week separated by one semi-colon, and no semi-colons at the beginning or end.
I would also like to stick with Tidyr, Dplyr, Stringr, etc.
Names<-c("Doug","Ken","Erin","Yuki","John")
Monday<-c("Monday"," "," ","Monday","Monday")
Tuesday<-c(" ","Tuesday","Tuesday"," ","Tuesday")
Wednesday<-c(" ","Wednesday","Wednesday","Wednesday"," ")
Thursday<-c(" "," "," "," ","Thursday")
Friday<-c(" "," "," "," ","Friday")
Days<-data.frame(Monday,Tuesday,Wednesday,Thursday,Friday)
Days<-Days%>%unite(BestDays,Monday,Tuesday,Wednesday,Thursday,Friday,sep="; ",remove=FALSE)
You can try :
Names<-c("Doug","Ken","Erin","Yuki","John")
Monday<-c("Monday",NA,NA,"Monday","Monday")
Tuesday<-c(NA,"Tuesday","Tuesday",NA,"Tuesday")
Wednesday<-c(NA,"Wednesday","Wednesday","Wednesday",NA)
Thursday<-c(NA,NA,NA,NA,"Thursday")
Friday<-c(NA,NA,NA,NA,"Friday")
Days<-data.frame(Monday,Tuesday,Wednesday,Thursday,Friday)
concat_str = function(str) str %>% na.omit %>% paste(collapse = "; ")
Days$BestDaysConcat = apply(Days[,c("Monday","Tuesday","Wednesday","Thursday","Friday")], 1, concat_str)
From getAnywhere("unite_.data.frame"), unite calls do.call("paste", c(data[from], list(sep = sep))) under the hood, and paste, as far as I know, doesn't provide a way to omit NAs unless you implement it manually.
Nevertheless, you can use a regular expression with gsub from base R to clean up the result column:
gsub("^\\s;\\s|;\\s{2}", "", Days$BestDays)
# [1] "Monday" "Tuesday; Wednesday"
# [3] "Tuesday; Wednesday" "Monday; Wednesday"
# [5] "Monday; Tuesday; Thursday; Friday"
This removes either the ^\\s;\\s pattern or the ;\\s{2} pattern. The former handles the case where the string starts with a blank entry, where we remove the space and the ;\\s that follows it. The latter removes ;\\s{2}, which handles blank entries both in the middle of the string and at the end of the string.
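The underlying idea in both answers is to drop the empty entries before joining, rather than cleaning up separators afterwards. As a language-neutral illustration, here is that logic as a short Python 3 sketch; it is not tidyr code, and the function name join_days and the sample row are assumptions made for the example.

```python
def join_days(values, sep="; "):
    """Join the values that are neither missing (None) nor blank."""
    kept = [v.strip() for v in values if v is not None and v.strip()]
    return sep.join(kept)

# A row mixing real values, a blank string and a missing value.
doug = join_days(["Monday", " ", "Wednesday", None, "Friday"])
```

Filtering first means no separator can ever appear at the beginning, at the end, or doubled in the middle.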

Join lines after specific word till another specific word

I have a .txt file of a transcript that looks like this
MICHAEL: blablablabla.
further talk by Michael.
more talk by Michael.
VALERIE: blublublublu.
Valerie talks more.
MICHAEL: blibliblibli.
Michael talks again.
........
All in all this pattern goes on for up to 4000 lines and not just two speakers but with up to seven different speakers, all with unique names written with upper-case letters (as in the example above).
For some text mining I need to rearrange this .txt file in the following way
Join the lines following one speaker - but only the ones that still belong to him - so that the above file looks like this:
MICHAEL: blablablabla. further talk by Michael. more talk by Michael.
VALERIE: blublublublu. Valerie talks more.
MICHAEL: blibliblibli. Michael talks again.
Sort the now properly joined lines in the .txt file alphabetically, so that all lines spoken by a speaker are together. However, the sort should not reorder the sentences spoken by one speaker (after having grouped each speaker's lines together).
I know some basic vim commands, but not enough to figure this out. Especially, the first one. I do not know what kind of pattern I can implement in vim so that it only joins the lines of each speaker.
Any help would be greatly appreciated!
Alright, first the answer:
:g/^\u\+:/,/\n\u\+:\|\%$/join
And now the explanation:
g stands for global and executes the following command on every line that matches
/^\u\+:/ is the pattern :g searches for: ^ is the start of the line, \u is an uppercase character, \+ means one or more matches and : is, unsurprisingly, :
then comes the tricky bit: we make the executed command operate on a range, from the match to some other pattern match. /\n\u\+:\|\%$/ has two parts separated by the alternation \| . \n\u\+: is a newline followed by the previous pattern, i.e. it matches the line before the next speaker. \%$ is the end of the file
join does what it says on the tin
So to put it together: For each speaker, join until the line before the next speaker or the end of the file.
The closest thing to the sorting I know of is
:sort /\u\+:/ r
which sorts only by speaker name but reverses the order of the other lines, so it isn't really what you are looking for.
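If doing this outside Vim is an option, both steps can be sketched in a few lines of Python 3. This is only an illustration under the assumption that a speaker line starts with uppercase letters followed by a colon; the names SPEAKER, join_turns and sort_by_speaker are made up for the example.

```python
import re

# Assumption: a speaker label is one or more uppercase letters followed by ':'.
SPEAKER = re.compile(r"^[A-Z]+:")

def join_turns(lines):
    """Join each speaker line with the continuation lines that follow it."""
    turns = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if SPEAKER.match(text) or not turns:
            turns.append(text)          # a new speaker starts a new turn
        else:
            turns[-1] += " " + text     # continuation of the current turn
    return turns

def sort_by_speaker(turns):
    """Group turns by speaker name, keeping each speaker's turns in order."""
    # sorted() is stable, so turns with the same key keep their original order.
    return sorted(turns, key=lambda t: t.split(":", 1)[0])

transcript = [
    "MICHAEL: blablablabla.", "further talk by Michael.", "more talk by Michael.",
    "VALERIE: blublublublu.", "Valerie talks more.",
    "MICHAEL: blibliblibli.", "Michael talks again.",
]
turns = join_turns(transcript)
ordered = sort_by_speaker(turns)
```

Sorting on the speaker name only, with a stable sort, is what keeps each speaker's sentences in their original sequence.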
Well, I don't know much about Vim, but I was able to match the lines corresponding to a particular speaker, and here is the regex for that.
Regex: /([A-Z]+:)([A-Za-z\s\.]+)(?!\1)$/gm
Explanation:
([A-Z]+:) captures the speaker's name which contains only capital letters.
([A-Za-z\s\.]+) captures the dialogue.
(?!\1)$ backreferences to the Speaker's name and compares if the next speaker was same as the last one. If not then it matches till the new speaker is found.
I hope this will help you with matching at least.
In vim you might take a two step approach, first replace all newlines.
:%s/\n\+/ /g
Then insert a new line before the terms UPPERCASE: except the first one:
:%s/ \([[:upper:]]\+:\)/\r\1/g
For the sorting you can leverage the UNIX sort program:
:%!sort
You can combine them using a pipe symbol:
:%s/\n\+/ /g | %s/ \([[:upper:]]\+:\)/\r\1/g | %!sort
and map them to a key in your vimrc file:
:nnoremap <F5> :%s/\n\+/ /g \| %s/ \([[:upper:]]\+:\)/\r\1/g \| %!sort<CR>
If you press F5 in normal mode, the transformation happens. Note that the | needs to be escaped in the nnoremap command.
Here is a script solution to your problem.
It's not well tested, so I added some comments so you can fix it easily.
To make it run, just:
fill the g:speakers var in the top of the script with the uppercase names you need;
source the script (ex: :sav /tmp/script.vim|so %);
run :call JoinAllSpeakLines() to join the lines by speakers;
run :call SortSpeakLines() to sort
You may adapt the different patterns to better fit your needs, for example adding some space tolerance (\u\{2,}\s*\ze:).
Here is the code:
" Fill the following array with all the speakers names:
let g:speakers = [ 'MICHAEL', 'VALERIE', 'MATHIEU' ]
call sort(g:speakers)
function! JoinAllSpeakLines()
    " In the whole file, join all the lines between two uppercase speaker names
    " followed by ':', first inclusive:
    silent g/\u\{2,}:/call JoinSpeakLines__()
endf
function! SortSpeakLines()
    " Sort the whole file by speaker, keeping the order for
    " each speaker.
    " Must be called after JoinAllSpeakLines().
    " Create a new dict, with one key for each speaker:
    let speakerlines = {}
    for speaker in g:speakers
        let speakerlines[speaker] = []
    endfor
    " For each line in the file:
    for line in getline(1,'$')
        let speaker = GetSpeaker__(line)
        if speaker == ''
            continue
        endif
        " Add the line to the right speaker:
        call add(speakerlines[speaker], line)
    endfor
    " Delete everything in the current buffer:
    normal gg"_dG
    " Add the sorted lines, speaker by speaker:
    for speaker in g:speakers
        call append(line('$'), speakerlines[speaker])
    endfor
    " Delete the first (empty) line in the buffer:
    normal gg"_dd
endf
function! GetOtherSpeakerPattern__(speaker)
    " Returns a pattern which matches all speaker names, except the
    " one given as a parameter.
    " Create a new list with a:speaker removed:
    let others = copy(g:speakers)
    let idx = index(others, a:speaker)
    if idx != -1
        call remove(others, idx)
    endif
    " Create and return the pattern list, which looks like
    " this : "\v<MICHAEL>|<VALERIE>..."
    call map(others, 'printf("<%s>:",v:val)')
    return '\v' . join(others, '|')
endf
function! GetSpeaker__(line)
    " Returns the uppercase name followed by a ':' in a line
    return matchstr(a:line, '\u\{2,}\ze:')
endf
function! JoinSpeakLines__()
    " When cursor is on a line with an uppercase name, join all the
    " following lines until another uppercase name.
    let speaker = GetSpeaker__(getline('.'))
    if speaker == ''
        return
    endif
    normal V
    " Search for other names after the cursor line:
    let srch = search(GetOtherSpeakerPattern__(speaker), 'W')
    echo srch
    if srch == 0
        " For the last one only:
        normal GJ
    else
        normal kJ
    endif
endf

Scite Lua - escaping right bracket in regex?

Bumped into a somewhat weird problem... I want to turn the string:
a\left(b_{d}\right)
into
a \left( b_{d} \right)
in Scite using a Lua script.
So, I made the following Lua script for Scite:
function SpaceTexEquations()
  editor:BeginUndoAction()
  local sel = editor:GetSelText()
  local cln3 = string.gsub(sel, "\\left(", " \\left( ")
  local cln4 = string.gsub(cln3, "\\right)", " \\right) ")
  editor:ReplaceSel(cln4)
  editor:EndUndoAction()
end
The cln3 line works fine, however, cln4 crashes with:
/home/user/sciteLuaFunctions.lua:49: invalid pattern capture
>Lua: error occurred while processing command
I think this is because bracket characters () are reserved characters in Lua; but then, how come the cln3 line works without escaping? By the way I also tried:
-- using backslash \ as escape char:
local cln4 = string.gsub(cln3, "\\right\)", " \\right) ") -- crashes all the same
-- using percent sign % as escape char
local cln4 = string.gsub(cln3, "\\right%)", " \\right) ") -- does not crash, but does not match either
Could anyone tell me what would be the correct way to do this?
Thanks,
Cheers!
The correct escape character in Lua patterns is %, so what you tried should work. I just tried
local sel = [[a\left(b_{d}\right)]]
local cln3 = string.gsub(sel, "\\left%(", " \\left( ")
local cln4 = string.gsub(cln3, "\\right%)", " \\right) ")
print (cln4)
and got
a \left( b_{d} \right)
so this worked for me. What did you get as a match when you tried %?