Emacs compilation-mode regex for multiple lines

So I have a tool that lints Python changes I've made and produces errors and warnings. I would like this to be usable in compile mode with Emacs, but I have an issue. The file name is output only once at the beginning, and then only line numbers appear with the errors and warnings. Here's an example:
Linting file.py
E0602: 37: Undefined variable 'foo'
C6003: 42: Unnecessary parens after 'print' keyword
2 new errors, 2 total errors in file.py.
It's very similar to pylint, but there's no output-format=parseable option. I checked the documentation for compilation-error-regexp-alist, and found something promising:
If FILE, LINE or COLUMN are nil or that index didn't match, that
information is not present on the matched line. In that case the
file name is assumed to be the same as the previous one in the
buffer, line number defaults to 1 and column defaults to
beginning of line's indentation.
So I tried writing a regexp that would optionally match the file line and pull it out in a group, and then the rest would match the other lines. I assumed that it would first match
Linting file.py
E0602: 37: Undefined variable 'foo'
and be fine. Then it would continue and match
C6003: 42: Unnecessary parens after 'print' keyword
with no file. Since there was no file, it should use the file name from the previous match right? Here's the regexp I'm using:
(add-to-list 'compilation-error-regexp-alist 'special-lint)
(add-to-list 'compilation-error-regexp-alist-alist
'(special-lint
"\\(Linting \\(.*\\)\\n\\)?\\([[:upper:]][[:digit:]]+:\\s-+\\([[:digit:]]\\)+\\).*"
2 4 nil nil 3))
I've checked it with re-builder and manually in the scratch buffer, and it behaves as expected: the 2nd group is the file name, the 4th is the line number, and the 3rd is what I want highlighted. Whenever I try this, I get the error:
signal(error ("No match 2 in highlight (2 compilation-error-face)"))
I have a workaround for this that involves transforming the output before the compile module looks at it, but I'd prefer to get rid of that and have a "pure" solution. I would appreciate any advice or pointing out any dumb mistakes I may have made.
EDIT
Thomas' pseudo code below worked quite well. He mentioned that doing a backwards re search could mess up the match data, and it did. But that was solved by adding the save-match-data special form before save-excursion.
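The workaround mentioned in the question (transforming the output before compile mode sees it) could look roughly like the following Python sketch. The function name `to_parseable`, the two patterns, and the `file:line: [code] message` output shape are assumptions made for illustration; they are not part of the lint tool or of Emacs.

```python
import re

# Hypothetical patterns for the lint output shown in the question.
LINT_HEADER = re.compile(r"^Linting (?P<file>.+)$")
LINT_MESSAGE = re.compile(r"^(?P<code>[A-Z]\d+):\s+(?P<line>\d+):\s*(?P<msg>.*)$")


def to_parseable(lines):
    """Prefix every message line with the most recently seen file name."""
    current_file = None
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        header = LINT_HEADER.match(line)
        if header:
            # Remember the file name; later message lines belong to it.
            current_file = header.group("file")
            out.append(line)
            continue
        message = LINT_MESSAGE.match(line)
        if message and current_file is not None:
            out.append(
                f"{current_file}:{message.group('line')}: "
                f"[{message.group('code')}] {message.group('msg')}"
            )
        else:
            # Summary lines and anything unrecognised pass through unchanged.
            out.append(line)
    return out
```

Each rewritten line then carries its own file name and line number, so no multi-line regexp is needed on the Emacs side.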

FILE can also have the form (FILE
FORMAT...), where the FORMATs (e.g.
"%s.c") will be applied in turn to the
recognized file name, until a file of
that name is found. Or FILE can also
be a function that returns (FILENAME)
or (RELATIVE-FILENAME . DIRNAME). In
the former case, FILENAME may be
relative or absolute.
You could try to write a regex that doesn't match the file name at all, only the line number. Then for the file, write a function that searches backwards for the file. Perhaps not as efficient, but it should have the advantage that you can move upwards through the error messages and it will still identify the correct file when you cross file boundaries.
I don't have the necessary stuff installed to try this out, but take the following pseudo-code as an inspiration:
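;; Activate the entry defined below.
(add-to-list 'compilation-error-regexp-alist 'special-lint)
(add-to-list 'compilation-error-regexp-alist-alist
             '(special-lint
               "^\\S-+\\s-+\\([0-9]+\\):.*" ;; is .* necessary?
               ;; FILE is a function here (no extra quote inside the quoted list);
               ;; group 1 is the line number.
               special-lint-backward-search-filename 1))
(defun special-lint-backward-search-filename ()
  ;; Keep the match data of the compilation regexp intact while searching.
  (save-match-data
    (save-excursion
      (when (re-search-backward "^Linting \\(.*\\)$" (point-min) t)
        (list (match-string 1))))))
(Using a search function inside special-lint-backward-search-filename would otherwise clobber the sub-group match data of the compilation-error-regexp, which is why the body is wrapped in save-match-data.)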

I don't think you can make compilation do what you want here, because it won't assume that a subsequent error relates to a previously-seen filename. But here's an alternative; write a flymake plugin. Flymake always operates on the current file, so you only need to tell it how to find line (and, optionally, column) numbers.
Try hacking something like this, and you'll likely be pleasantly surprised.

Related

Notepad++ - Selecting or Highlighting multiple sections of repeated text IN 1 LINE

I have a text file in Notepad++ that contains about 66,000 words all in 1 line, and it is a set of 200 "lines" of output that are all unique and placed in 1 line in the basic JSON form {output:[{output1},{output2},...]}.
There is a set of characters matching the RegEx expression "id":.........,"kind":"track" that occurs about 285 times in total, and I am trying to either single them out, or copy all of them at once.
Basically, without some super complicated RegEx terms, I am stuck because I can't figure out how to highlight all of them at once, and also the Remove Unbookmarked Lines feature does not apply because this is all in one line. I have only managed to be able to Mark every single occurrence.
So does this require a large number of steps to get the file into multiple lines and work from there, or is there something else I am missing?
Edit: I have come up with a set of Macro schemes that make the process of doing this manually work much faster. It's another alternative but still takes a few steps and quite some time.
Edit 2: I intended there to be an answer for actually just highlighting the different sections all at once, but I guess that is not possible. The answer here turns out to be more useful in my case, allowing me to have a list of IDs without everything else.

You seem to already have a regex which matches single instances of your pattern, so assuming it works and that we must use Notepad++ for this:
Replace .*?("id":.........,"kind":"track").*?(?="id":.........,"kind":"track"|$) with \1.
If this textfile is valid JSON, this opens you up to other, non-notepad++ options, like using Python with the json module.
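As a rough sketch of the non-Notepad++ route, the same pattern can be applied from Python without parsing the JSON at all. The name `extract_track_ids` is hypothetical, and the nine-character width of the id is taken from the nine dots in the question's regex; whether that holds for the real file is an assumption.

```python
import re

# Same shape as the question's pattern: nine arbitrary characters after "id":
# followed immediately by ,"kind":"track". The nine characters are captured.
TRACK_ID = re.compile(r'"id":(.{9}),"kind":"track"')


def extract_track_ids(text):
    """Return the id of every track entry found in a single-line blob."""
    return TRACK_ID.findall(text)
```

Calling `"\n".join(extract_track_ids(text))` would give one id per line, which is the list the asker ended up wanting.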
Edited to remove unnecessary steps

Vim: How to apply external command only to lines matching pattern

Two of my favorite Vim features are the ability to apply standard operators to lines matching a regex, and the ability to filter a selection or range of lines through an external command. But can these two ideas be combined?
For example, I have a text file that I use as a lab notebook, with notes from different dates separated by a line of dashes. I can do something like delete all the dash-lines with something like :% g/^-/d. But let's say I wanted to resize all the actual text lines, without touching those dash lines.
For a single paragraph, this would be something like {!}fmt. But how can this be applied to all the non-dash paragraphs? When I try what seems the logical thing, and just chain these two together with :% v/^-/!fmt, that doesn't work. (In fact, it seems to crash Vim...)
Is there a way to connect these two ideas, and only pass lines (not) matching a pattern into an external command like fmt?

Consider how the :global command works.
:global (and :v) make two passes through the buffer,
first marking each line that matches,
then executing the given command on the marked lines.
Thus if you can come up with a command – be it an Ex command or a command-line tool – and an associated range that can be applied to each matching line (and range), you have a winner.
For example, assuming that your text is soft-wrapped and your paragraphs are simply lines that don't begin with minus, here's how to reformat the paragraphs:
:v/^-/.!fmt -72
Here we used the range . "current line" and thus filtered every matching line through fmt. More complicated ranges work, too. For instance, if your text were hard-wrapped and paragraphs were defined as "from a line beginning with minus, up until the next blank line" you could instead use this:
:g/^-/.,'}!fmt -72
Help topics:
:h multi-repeat
:h :range!
:h :range
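For readers who would rather do the filtering outside Vim, the effect of `:v/^-/.!fmt -72` on soft-wrapped text can be approximated in Python. This is only an analogy under stated assumptions: `textwrap` stands in for `fmt`, and `reformat_non_dash_lines` is a hypothetical helper name.

```python
import textwrap


def reformat_non_dash_lines(lines, width=72):
    """Re-wrap every line that does not start with '-', leaving the rest alone."""
    out = []
    for line in lines:
        if line.startswith("-") or not line.strip():
            # Separator lines and blank lines are passed through untouched.
            out.append(line)
        else:
            # Each matching line is filtered on its own, like the '.' range.
            out.extend(textwrap.wrap(line, width=width))
    return out
```

As with the `.` range in the Ex command, every non-dash line is wrapped independently of its neighbours.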

One way to do it may be applying the command to the lines matching the pattern 'not containing only dashes'
The solution I would try is something like (not tested):
:g/\v^(-+)@!/normal V!fmt
EDIT I was doing some experiments and I think a recursive macro should work for you
first of all set nowrapscan:
set nowrapscan
To prevent the recursive macro executing more than you want.
Then you make a search:
/\v^(-+)@!
Test if pressing n and N works with your pattern and tune it up if needed
After that, start recording the macro
qqn:.!awk '{print $2}'^M$
In this case I use awk as an example .! means filter current line with an external program
Then to make the macro recursive just append the string '@q' to the register q
let @q .= '@q'
And move to the beginning of the buffer to apply the recursive macro and make the modifications:
gg@q
Then you are done. Hope this helps

Regex return file name, remove path and file extension

I have a data.frame that contains a text column of file names. I would like to return the file name without the path or the file extension. Typically, my file names have been numbered, but they don't have to be. For example:
df<-data.frame(data=c("a","b"),fileNames=c("C:/a/bb/ccc/NAME1.ext","C:/a/bb/ccc/d D2/name2.ext"))
I would like to return the equivalent of
df<-data.frame(data=c("a","b"),fileNames=c("NAME","name"))
but I cannot figure out the slick regular expression to do this with gsub. For example, I can get rid of the extension with (provided the file name ends with a number):
gsub('([0-9]).ext','',df[,"fileNames"])
Though I've been trying various patterns (by reading the regex help files and similar solutions on this site), I can't get a regex to return the text between the last "/" and the first ".". Any thoughts or forwards to similar questions are much appreciated!
The best I have gotten is:
gsub('*[[:graph:]_]/|*[[:graph:]_].ext','',df[,"fileNames"])
But this 1) doesn't get rid of all the leading path characters and 2) is dependent on a specific file extension.

Perhaps this will get you closer to your solution:
library(tools)
basename(file_path_sans_ext(df$fileNames))
# [1] "NAME1" "name2"
The file_path_sans_ext function is from the "tools" package (which I believe usually comes with R), and that will extract the path up to (but not including) the extension. The basename function will then get rid of your path information.
Or, to take from file_path_sans_ext and modify it a bit, you can try:
sub("(.*\\/)([^.]+)(\\.[[:alnum:]]+$)", "\\2", df$fileNames)
# [1] "NAME1" "name2"
Here, I've "captured" all three parts of the "fileNames" variables, so if you wanted just the file paths, you would change "\\2" to "\\1", and if you wanted just the file extensions, you would change it to "\\3".

First of all, to get rid of the "leading path", you can use basename. To remove the extension, you can use sub similar to your description in your question:
filenames <- sub("\\.[[:alnum:]]+$", "", basename(as.character(df$fileNames)))
Note that you should use sub instead of gsub here, because the file extension can only occur once for each filename. Also, you should use \\. which matches a dot instead of . which matches any symbol. Finally, you should append $ to the pattern to make sure you are removing the extension only if it is at the end of the filename.
Edit: the function file_path_sans_ext suggested in Ananda Mahto's solution works via sub("([^.]+)\\.[[:alnum:]]+$", "\\1", x), i.e. instead of removing the extension as above, the non-extension part of the filename is retained. I can't see any specific advantage or disadvantage of either method in the OP's case.
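The same two-step idea (drop the path first, then strip a trailing extension) carries over to other languages. Here is a minimal Python sketch for comparison; `stem` is a hypothetical helper name, and `posixpath` is used on the assumption that the paths are written with forward slashes as in the question.

```python
import posixpath
import re


def stem(path):
    """Return the file name without its directory part or final extension."""
    name = posixpath.basename(path)  # everything after the last "/"
    # Remove one trailing ".ext" only when it sits at the end of the name.
    return re.sub(r"\.[A-Za-z0-9]+$", "", name)
```

Applied to the two paths from the question this gives `NAME1` and `name2`, the same values as the R answers.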

Vim: Invert string (by words)

This is my string:
"this is my sentence"
I would like to have this output:
"sentence my is this"
I would like to select a few words on a line (in a buffer) and reverse it word by word.
Can anyone help me?

It's not totally clear what the context is here: you could be talking about text in a line in a buffer or about a string stored in a VimScript variable.
note: Different interpretations of the question led to various approaches and solutions.
There are some "old updates" that start about halfway through that have been rendered more or less obsolete by a plugin mentioned just above that section. I've left them in because they may provide useful info for some people.
full line replacement
So to store the text from the current line in the current buffer in a vimscript variable, you do
let words = getline('.')
And then to reverse their order, you just do
let words = join(reverse(split(words)))
If you want to replace the current line with the reversed words, you do
call setline('.', words)
You can do it in one somewhat inscrutable line with
call setline('.', join(reverse(split(getline('.')))))
or even define a command that does that with
command! ReverseLine call setline('.', join(reverse(split(getline('.')))))
partial-line (character-wise) selections
As explained down in the "old updates" section, running general commands on a character- or block-wise visual selection — the former being what the OP wants to do here — can be pretty complicated. Ex commands like :substitute will be run on entire lines even if only part of the line is selected using a character-wise visual select (as initiated with an unshifted v).
I realized after the OP commented below that reversing the words in a partial-line character-wise selection can be accomplished fairly easily with
:s/\%V.*\%V./\=join(reverse(split(submatch(0))))/
wherein
\%V within the RE matches some part of the visual selection. Apparently this does not extend after the last character in the selection: leaving out the final . will exclude the last selected character.
\= at the beginning of the replacement indicates that it is to be evaluated as a vimscript expression, with some differences.
submatch(0) returns the entire match. This works a bit like perl's $&, $1, etc., except that it is only available when evaluating the replacement. I think this means that it can only be used in a :substitute command or in a call to substitute()
So if you want to do a substitution on a single-line selection, this will work quite well. You can even pipe the selection through a system command using ...\=system(submatch(0)).
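Stripped of the Vim machinery, the transformation itself is just split, reverse, join, applied either to a whole line or to a slice of it. A small Python sketch of both cases follows; the function names and the 0-based inclusive indices are choices made for this illustration.

```python
def reverse_words(text):
    """Reverse the order of whitespace-separated words."""
    return " ".join(reversed(text.split()))


def reverse_words_in_span(line, start, end):
    """Reverse the words in line[start..end] (0-based, inclusive) only.

    This mirrors a single-line character-wise selection: text before and
    after the span is left untouched.
    """
    return line[:start] + reverse_words(line[start:end + 1]) + line[end + 1:]
```

Note that, like the Vimscript version built on split() and join(), this collapses runs of whitespace inside the span to single spaces.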
multiple-line character-wise selections
This seems to also work on a multiple-line character-wise selection, but you have to be careful to delete the range (the '<,'> that vim puts at the beginning of a command when coming from visual mode). You want to run the command on just the line where your visual selection starts. You'll also have to use \_.* instead of .* in order to match across newlines.
block-wise selections
For block-wise selections, I don't think there's a reasonably convenient way to manipulate them. I have written a plugin that can be used to make these sorts of edits less painful, by providing a way to run arbitrary Ex commands on any visual selection as though it were the entire buffer contents.
It is available at https://github.com/intuited/visdo. Currently there's no packaging, and it is not yet available on vim.org, but you can just git clone it and copy the contents (minus the README file) into your vimdir.
If you use vim-addon-manager, just clone visdo in your vim-addons directory and you'll subsequently be able to ActivateAddons visdo. I've put in a request to have it added to the VAM addons repository, so at some point you will be able to dispense with the cloning and just do ActivateAddons visdo.
The plugin adds a :VisDo command that is meant to be prefixed to another command (similarly to the way that :tab or :silent work). Running a command with VisDo prepended will cause that command to run on a buffer containing only the current contents of the visual selection. After the command completes, the buffer's contents are pasted into the original buffer's visual selection, and the temp buffer is deleted.
So to complete the OP's goal with VisDo, you would do (with the words to be reversed selected, and with the above-defined ReverseLine command available):
:'<,'>VisDo ReverseLine
old updates
...previous updates follow... warning: verbose, somewhat obsolete, and mostly unnecessary...
The OP's edit makes it more clear that the goal here is to be able to reverse the words contained in a visual selection, and specifically a character-wise visual selection.
This is decidedly not a simple task. The fact that vim does not make this sort of thing easy really confused me when I first started using it. I guess this is because its roots are still very much in the line-oriented editing functionality of ed and its predecessors and descendants. For example, the substitute command :'<,'>s/.../.../ will always work on entire lines even if you are in character-wise or block-wise (Ctrl-V) visual selection mode.
Vimscript does make it possible to find the column number of any 'mark', including the beginning of the visual selection ('<) and the end of the visual selection ('>). That is, as far as I can tell, the limit of its support. There is no direct way to get the contents of the visual selection, and there is no way to replace the visual selection with something else. Of course, you can do both of those things using normal-mode commands (y and p), but this clobbers registers and is kind of messy. But then you can save the initial registers and then restore them after the paste...
So basically you have to go to sort of extreme lengths to do with parts of lines what can easily be done with entire lines. I suspect that the best way to do this is to write a command that copies the visual selection into a new buffer, runs some other command on it, and then replaces the original buffer's visual selection with the results, deleting the temp buffer. This approach should theoretically work for both character-wise and block-wise selections, as well as for the already-supported linewise selections. However, I haven't done that yet.
This 40-line code chunk declares a command ReverseCharwiseVisualWords which can be called from visual mode. It will only work if the character-wise visual selection is entirely on a single line. It works by getting the entire line containing the visual selection (using getline()), running a parameterized transformation function (ReverseWords) on the selected part of it, and pasting the whole partly-transformed line back. In retrospect, I think it's probably worth going the y/p route for anything more featureful.
" Return 1-based column numbers for the start and end of the visual selection.
function! GetVisualCols()
  return [getpos("'<")[2], getpos("'>")[2]]
endfunction

" Convert a 0-based string index to an RE atom using 1-based column index
" :help /\%c
function! ColAtom(index)
  return '\%' . string(a:index + 1) . 'c'
endfunction

" Replace the substring of a:str from a:start to a:end (inclusive)
" with a:repl
function! StrReplace(str, start, end, repl)
  let regexp = ColAtom(a:start) . '.*' . ColAtom(a:end + 1)
  return substitute(a:str, regexp, a:repl, '')
endfunction

" Replace the character-wise visual selection
" with the result of running a:Transform on it.
" Only works if the visual selection is on a single line.
function! TransformCharwiseVisual(Transform)
  let [startCol, endCol] = GetVisualCols()
  " Column numbers are 1-based; string indexes are 0-based
  let [startIndex, endIndex] = [startCol - 1, endCol - 1]
  let line = getline("'<")
  let visualSelection = line[startIndex : endIndex]
  let transformed = a:Transform(visualSelection)
  let transformed_line = StrReplace(line, startIndex, endIndex, transformed)
  call setline("'<", transformed_line)
endfunction

function! ReverseWords(words)
  return join(reverse(split(a:words)))
endfunction

" Use -range to allow range to be passed
" as by default for commands initiated from visual mode,
" then ignore it.
command! -range ReverseCharwiseVisualWords
      \ call TransformCharwiseVisual(function('ReverseWords'))
update 2
It turns out that doing things with y and p is a lot simpler, so I thought I'd post that too. Caveat: I didn't test this all too thoroughly, so there may be edge cases.
This function replaces TransformCharwiseVisual (and some of its dependencies) in the previous code block. It should theoretically work for block-wise selections too — it's up to the passed Transform function to do appropriate things with line delimiters.
function! TransformYankPasteVisual(Transform)
  let original_unnamed = [getreg('"'), getregtype('"')]
  try
    " Reactivate and yank the current visual selection.
    normal gvy
    let @" = a:Transform(@")
    normal gvp
  finally
    call call(function('setreg'), ['"'] + original_unnamed)
  endtry
endfunction
So then you can just add a second command declaration
command! -range ReverseVisualWords
\ call TransformYankPasteVisual(function('ReverseWords'))
tangentially related gory detail
Note that the utility of a higher-level function like the ones used here is somewhat limited by the fact that there is no (easy or established) way to declare an inline function or block of code in vimscript. This wouldn't be such a limitation if the language weren't meant to be used interactively. You could write a function which substitutes its string argument into a dictionary function declaration and returns the function. However, dictionary functions cannot be called using the normal invocation syntax and have to be passed to call call(dictfunct, args, {}).
note: A more recent update, given above, obsoletes the above code. See the various sections preceding old updates for a cleaner way to do this.

Maybe:
:s/\v(.*) (.*) (.*) (.*)/\4 \3 \2 \1/
Of course you probably need to be more specific in the first part to find that particular sentence. Generally you can refer to match groups as \number with \0 being the whole match.

Here's a way to do it by calling out to Ruby. After selecting the line you want to reverse, you can do this in command mode to replace it:
!ruby -e 'puts ARGF.read.strip.split(/\b/).reverse.join'

I found the solution myself thanks to your answers and a lot of trying :)
This works:
function! Test()
  exe 'normal ' . 'gv"ay'
  let r = join(reverse(split(getreg('a'))))
  let @a = r
  exe 'normal ' . 'gv"ap'
endfunction
I didn't think that I would be able to write such a function :)

StackOverflowError with Checkstyle 4.4 RegExp check

Hello,
Background:
I'm using Checkstyle 4.4.2 with a RegExp checker module to detect when the file name in our Java source headers does not match the name of the class or interface in which they reside. This can happen when a developer copies a header from one class to another and does not modify the "File:" tag.
The regular expression used in the RegExp checker has been through many incarnations and (though it is possibly overkill at this point) looks like this:
File: (\w+)\.java\n(?:.*\n)*?(?:[\w|\s]*?(?: class | interface )\1)
The basic form of files I am checking (though greatly simplified) looks like this
/*
*
* Copyright 2009
* ...
* File: Bar.java
* ...
*/
package foo
...
import ..
...
/**
* ...
*/
public class Bar
{...}
The Problem:
When no match is found (i.e. when a header containing "File: Bar.java" is copied into file Bat.java), I receive a StackOverflowError on very long files (my test case is about 1300 lines).
I have experimented with several visual regular expression testers and can see that in the non-matching case, when the regex engine passes the line containing the class or interface name, it starts searching again on the next line and does some backtracking, which probably causes the StackOverflowError.
The Question:
How to prevent the StackOverflowError by modifying the regular expression
Is there some way to modify my regular expression such that in the non-matching case (i.e. when a header containing "File: Bar.java" is copied into file Bat.java ) that the matching would stop once it examines the line containing the interface or class name and sees that "\1" does not match the first group.
Alternatively, if that cannot be done, is it possible to minimize the searching and matching that takes place after it examines the line containing the interface or class, thus minimizing processing and (hopefully) avoiding the StackOverflowError?

Try
File: (\w+)\.java\n.*^[\w \t]+(?:class|interface) \1
in dot-matches-all mode. Rationale:
[\w\s] (the | doesn't belong there) matches anything, including line breaks. This results in a lot of backtracking back up into the lines that the previous part of the regex had matched.
If you let the greedy dot gobble up everything up to the end of the file (quick) and then backtrack until you find a line that starts with words or spaces/tabs (but no newlines) and then class or interface and \1, then that doesn't require as much stack space.
A different, and probably even better solution would be to split the problem into parts.
First match the File: (\w+)\.java part. Then do a second search with ^[\w \t]+(?:class|interface) plus the \1 match from the first search on the same file.
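The two-step approach can be sketched outside Checkstyle as well. The Python below is an illustration only, not Checkstyle configuration: `header_matches_declaration` is a hypothetical name, and the trailing `\b` is an addition not present in the regex above, put there so that a header naming `Bar` does not accept a class called `Barn`.

```python
import re

# Step 1: find the file name recorded in the header comment.
FILE_TAG = re.compile(r"File: (\w+)\.java")


def header_matches_declaration(source):
    """Return True/False for a match, or None when there is no File: tag."""
    tag = FILE_TAG.search(source)
    if tag is None:
        return None
    # Step 2: a second, independent search for the declaration line, with the
    # captured name inserted literally instead of using a backreference.
    name = re.escape(tag.group(1))
    declaration = re.compile(
        r"^[\w \t]+(?:class|interface) " + name + r"\b", re.MULTILINE
    )
    return declaration.search(source) is not None
```

Because neither search has to span the lines between the header and the declaration, there is no long stretch of text for the engine to backtrack over.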

Follow up:
I plugged in Tim Pietzcher's suggestion above and his greedy solution did indeed fail faster and without a StackOverflowError when no match was found. However, in the positive case, the StackOverflowError still occurred.
I took a look at the source code of RegexpCheck.java. The class's pattern is constructed in multiline mode such that the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. Then it reads the entire class file into a string and does a recursive search for the pattern (see findMatch()). That is undoubtedly the source of the StackOverflowError.
In the end I didn't get it to work (and gave up). Since Maven 2 released the maven-checkstyle-plugin-2.4/Checkstyle 5.0 about 6 weeks ago, we've decided to upgrade our tools. This may not solve the StackOverflowError problem, but it will give me something else to work on until someone decides that we need to pursue this again.
