Search for multiline regex

I have several source files that contain function definitions of the following two forms:
ReturnType ClassName::
FunctionName(FunctionArgs...)
{
....
}
ReturnType ClassName::NestedClassName::
FunctionName(FunctionArgs...)
{
....
}
I want to grep through the files and list all functions of the first type separately from all functions of the second type. Is there a way to do this in Emacs?
Note: I have tried C-q C-j as suggested in [https://emacs.stackexchange.com/questions/9548/what-is-regex-to-match-newline-character], but it didn't work.

There seem to be two interrelated pieces to this: the regexps to use and the search method. I am making the following assumptions (sorry for not clarifying these beforehand; I don't have enough rep to comment):
You are interested in collecting just the function signature (the first two lines), not what's inside the braces.
Colons never appear in return types or class names.
No line ever ends in a double colon inside a function body.
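Within Emacs, I can distinguish the two cases with the regexps:
^[^:\n]+::\n.+$
^[^:\n]+::[^:\n]+::\n.+$
(where you would replace each \n with an actual newline (via C-q C-j, for instance) in interactive use). The \n inside the brackets matters: in Emacs, a negated set like [^:] also matches newlines, so without it a match could start several lines above the signature.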
If all of the files are open in Emacs buffers, you can now just use multi-occur with these regexps. Otherwise, you can call out to grep with M-x grep, or use M-! to run grep directly.
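If you're using grep, then you need a different approach: grep matches one line at a time, so a regexp cannot span a line break. If you drop the second line of each regexp and use the -A 1 switch (which tells grep to print the line following each match), everything seems to work properly. You also need to escape the + operator, because grep uses basic regular expressions by default, in which a bare + is an ordinary character (\+ is a GNU extension; with grep -E you can use a plain + instead). Here are the resulting commands: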
grep -A 1 "^[^:]\+::$" files
grep -A 1 "^[^:]\+::[^:]\+::$" files
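To check the two regexps outside Emacs, here is a minimal Python sketch; Python's re module stands in for Emacs and grep here and is not part of the answer above. Because it reads the whole file as one string, the \n in each pattern can match across the line break, which grep cannot do. It assumes the same constraints as above (no colons in return types or class names), and split_signatures is a hypothetical helper name.

```python
import re

# The two regexps from above, with [^:\n] instead of [^:] so that the part
# before "::" stays on one line (Python's [^:] would also match a newline).
# re.MULTILINE lets ^ and $ match at every line boundary.
TOP_LEVEL = re.compile(r"^[^:\n]+::\n.+$", re.MULTILINE)
NESTED = re.compile(r"^[^:\n]+::[^:\n]+::\n.+$", re.MULTILINE)


def split_signatures(source):
    """Return (top_level, nested): the two-line signatures of each kind."""
    return TOP_LEVEL.findall(source), NESTED.findall(source)


sample = """ReturnType ClassName::
FunctionName(FunctionArgs...)
{
....
}
ReturnType ClassName::NestedClassName::
FunctionName(FunctionArgs...)
{
....
}
"""

top_level, nested = split_signatures(sample)
for signature in top_level:
    print("top-level:", signature.replace("\n", " "))
for signature in nested:
    print("nested:   ", signature.replace("\n", " "))
```

To use it on real files, pass the contents of each file (e.g. open(path).read()) instead of sample.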


Using egrep to find a multi-line definition in a file, matching all content within {}

I know that Stack Exchange is full of helpful answers about doing things similar to what I want to do, but I'm afraid I'm not skilled enough to extrapolate from them to solve my problem.
I'm trying to write a script that searches for specific LaTeX definitions in one or more .sty files and then returns the entire definition. The definition is enclosed in curly brackets, but there may be many more curly brackets inside it.
For example, following this thread I tried the command
sed -n '/\\def\\propref\>/,/}$/p' *.sty
but because the sed range stops at the first line ending in }, this returns only
\def\propref#1{%
\IfBeginWith*{#1}{prop:}
But I want it to return the entire definition, delimited by {}, i.e.,
\def\propref#1{%
\IfBeginWith*{#1}{prop:}
{Prop.~\ref{#1}}%
{Prop.~\ref{prop:#1}}}
So the hard part for me is locating the closing delimiter that matches the opening one. A second issue, if the solution is to use sed, is that I'd like the command to return the file name as well as the matching text, just as grep does, when searching through multiple files. Specifically, I'd like the first line of the returned output to look grep-like:
my_oneLineDefs.sty:\def\propref#1{%
Here's a snippet of the .sty file containing the definition.
\def\propref#1{%
\IfBeginWith*{#1}{prop:}
{Prop.~\ref{#1}}%
{Prop.~\ref{prop:#1}}}
\def\thmref#1{%
\IfBeginWith*{#1}{thm:}
{Thm.~\ref{#1}}%
{Thm.~\ref{thm:#1}}}
\def\secref#1{%
\IfBeginWith*{#1}{sec:}
{\S\ref{#1}}%
{\S\ref{sec:#1}}}

Finding and modifying function definitions (C++) via bash-script

Currently I am working on a fairly large project. To increase the quality of our code, we decided to enforce handling of the return values (error codes) of every function. GCC can warn when a function's return value is ignored, but the function has to be marked with the following attribute:
static __attribute__((warn_unused_result)) ErrorCode test() { /* code goes here */ }
I want to write a bash script that parses the entire source code and issues a warning whenever the
__attribute__((warn_unused_result))
is missing.
Note that all functions that require this kind of modification return a type called ErrorCode.
Do you think this is possible with a bash script?
Maybe you can use sed with regular expressions. The following worked for me on a couple of test files I tried:
sed -r 's/ErrorCode\s+\w+\s*\(.*\)\s*\{/__attribute__((warn_unused_result)) &/g' test.cpp
If you're not familiar with regex, the pattern basically translates into:
ErrorCode, some whitespace, some alphanumerics (function name), maybe some whitespace, open parenthesis, anything (arguments), close parenthesis, maybe some whitespace, open curly brace.
If this pattern is found, it is prefixed with __attribute__((warn_unused_result)). Note that this only works if the opening curly brace is always on the same line as the arguments and your function declarations contain no line breaks.
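Since the question asks for a warning rather than an automatic edit, here is a minimal Python sketch (Python in place of bash) that uses the pattern described above to report definitions lacking the attribute instead of rewriting them. It shares the single-line limitation just mentioned; missing_attribute is a hypothetical helper name.

```python
import re
import sys

# ErrorCode, the function name, the parenthesised arguments and the
# opening brace, all on one line (same limitation as the sed version).
DEFINITION = re.compile(r"\bErrorCode\s+\w+\s*\(.*\)\s*\{")
ATTRIBUTE = "__attribute__((warn_unused_result))"


def missing_attribute(lines):
    """Yield (line_number, line) for ErrorCode definitions without the attribute."""
    for number, line in enumerate(lines, start=1):
        if DEFINITION.search(line) and ATTRIBUTE not in line:
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Usage: python check_attr.py file1.cpp file2.cpp ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as source:
            for number, line in missing_attribute(source):
                print(f"{path}:{number}: warning: missing {ATTRIBUTE}: {line}")
```

The file:line: warning format is chosen so that editors which understand compiler output can jump to each hit.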
An easy way I can imagine is via ctags. You create a tags file over all your source code, and then parse the tags file. However, I'm not quite sure about the format of the tags file. The variant I'm using here (Exuberant Ctags 5.8) seems to put an "f" in the fourth column if the tag represents a function. So in this case I would use awk to keep only the tags that represent functions, and then grep -v to throw away all lines that already contain __attribute__((warn_unused_result)).
So, in a nutshell, first you do
$ ctags **/*.c
This creates a file called "tags" in the current directory. The command might also be ctags-exuberant, depending on your variant. The **/*.c is a recursive glob pattern that works in zsh, and in bash after shopt -s globstar; if it doesn't work in your shell, you have to supply your source files in another way (look at the ctags options, e.g. -R to recurse into directories).
Then filter the functions:
$ awk -F '\t' '$4 == "f"' tags | grep -v -F "__attribute__((warn_unused_result))"
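The awk | grep -v pipeline above can also be sketched in Python, e.g. if it needs to grow into a larger checker. This assumes the Exuberant Ctags layout described above, with the kind ("f") in the fourth tab-separated field; untagged_functions is a hypothetical name. Like the grep -v version, it only sees the attribute if it sits on the same line as the function name, since that line is what ctags stores as the search pattern.

```python
import sys

ATTRIBUTE = "__attribute__((warn_unused_result))"


def untagged_functions(tag_lines):
    """Return function tags whose search pattern lacks the attribute.

    Mirrors: awk -F '\\t' '$4 == "f"' tags | grep -v -F ATTRIBUTE
    """
    result = []
    for line in tag_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Field 4 holds the tag kind; header lines (!_TAG_...) have fewer fields.
        if len(fields) >= 4 and fields[3] == "f" and ATTRIBUTE not in line:
            result.append(line)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python untagged.py tags
    with open(sys.argv[1], encoding="utf-8") as tags_file:
        for tag in untagged_functions(tags_file):
            print(tag)
```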
No, it is not possible in the general case. The C++ grammar is the most complex of all the languages I know of, and C++ is not parsable via regular expressions in the general case. You might succeed if you limit yourself to a very narrow set of uses, but I am not sure how feasible it is in your case.
I also do not think the exercise is worth the effort, since sometimes ignoring the result of a function is perfectly acceptable.

How to use VI to remove occurrences of a character on lines matching a regex?

I'm trying to change the case of method names for some functions from lowercase_with_underscores to lowerCamelCase for lines that begin with public function get_method_name(). I'm struggling to get this done in a single step.
So far I have used the following
:%s/\(get\)\([a-zA-Z]*\)_\(\w\)/\1\2\u\3/g
However, this only replaces one _ character at a time. What I would like is a search and replace that does something like the following:
Identify all lines containing the string public function [gs]et.
On these lines, perform the following search and replace :s/_\(\w\)/\u\1/g
EDIT:
Suppose I have lines get_method_name() and set_method_name($variable_name) and I only want to change the case of the method name, not the variable name. How might I do that? The get_method_name() case is simpler, of course, but I'd like a solution that works for both in a single command. I've been able to use :%g/public function [gs]et/ . . . as per the solution listed below to solve the get_method_name() case, but unfortunately not the set_method_name($variable_name) case.
If I've understood you correctly, I don't know why the things you've tried haven't worked, but you can use :g to run an Ex command on every line matching a pattern.
Your example would be something like:
:%g/public function [gs]et/:s/_\(\w\)/\u\1/g
Update:
To match only the method names, we can use the fact that there will only be method names before the first $, as this looks to be PHP.
To do that, we can use a negative lookbehind, \@<!:
:%g/public function [gs]et/:s/\(\$.\+\)\@<!_\(\w\)/\u\2/g
The \@<! makes _\(\w\) match only where it is not preceded by a $ followed by one or more characters, i.e. only before the first $ on the line.
Bonus points(?):
To do this for multiple buffers, put bufdo in front of the %g.
You want to use a substitute with an expression (:h sub-replace-expression)
Match the complete string you want to process, then pass that string to the substitute() function to actually change it:
:%s/\(get\|set\)\zs_\w\+/\=substitute(submatch(0), '_\([A-Za-z]\)', '\U\1', 'g')
Running the above on
get_method_name($variable_name)
set_method_name($variable_name)
returns
getMethodName($variable_name)
setMethodName($variable_name)
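The same two-level idea (an outer match, with a second substitution applied only to the matched text) carries over to other tools. Here is a minimal Python sketch in which re.sub with a function plays the role of Vim's \=; camel_case_accessors is a hypothetical name, and like the Vim command it would also touch a variable whose name starts with get_ or set_.

```python
import re

# Outer match: "get"/"set" plus the snake_case tail of the name. \w stops at
# "(", so a $variable_name argument is left alone.
ACCESSOR = re.compile(r"\b([gs]et)(_\w+)")


def camel_case_accessors(line):
    """Turn get_method_name / set_method_name into getMethodName / setMethodName."""

    def to_camel(match):
        # Inner substitution, like Vim's \=substitute(submatch(0), ...):
        # drop each underscore and upper-case the letter after it.
        tail = re.sub(r"_([A-Za-z])", lambda m: m.group(1).upper(), match.group(2))
        return match.group(1) + tail

    return ACCESSOR.sub(to_camel, line)


print(camel_case_accessors("public function set_method_name($variable_name)"))
# prints: public function setMethodName($variable_name)
```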
To have vi replace sad with happy on every line of a file:
:1,$s/sad/happy/g
(It is the :1,$ range before the s command that instructs vi to execute it on every line in the file; % is shorthand for the same range.)

export filenames to temp file bash

I have a lot of files in multiple directories that all have the following setup for the filename:
prob123456_01
I want to delete the trailing "_01" off of each file name and export them to a temp file. How exactly would I delete the trailing "_01" as well as export? I am rather new to scripting so any help would be greatly appreciated!
As you've tagged this with bash, I'll assume that you can use globstar (bash 4.0 or later):
shopt -s globstar # enable globstar
for f in **/*_[0-9][0-9]; do echo "${f%_*}"; done > tmp
With globstar enabled, the pattern **/*_[0-9][0-9] matches any file whose name ends in _ followed by two digits, in the current directory and any subdirectories. (The ** only recurses when it forms a whole path component, as in **/, which is why the pattern is not written **_[0-9][0-9].) ${f%_*} removes the last underscore and everything after it, using bash's built-in parameter expansion.
Better yet, as Charles Duffy suggests (thanks), you can use an array instead of a loop:
files=( **/*_[0-9][0-9] ); printf '%s\n' "${files[@]%_*}" > tmp
The array is filled with the filenames that match the same pattern as before. ${files[@]%_*} removes the last part from each element of the array and passes them all as arguments to printf, which prints each result on a separate line.
Either of these approaches is likely to be quicker than using find, as everything is done in the shell without executing any separate processes.
Previously I had suggested using the pattern **_{00..99}, although this is not ideal for a couple of reasons. It is less efficient, as it expands to **_00, **_01, **_02, ..., **_99. Also, any of those 100 patterns that don't match will be included literally in the output unless another option, nullglob, is enabled.
It's up to you whether you use [0-9] or [[:digit:]]. The advantage of the latter is that it always means exactly the digits 0 to 9, whereas a range such as [0-9] can, in some shells and locales, follow the locale's collation order. If this isn't a concern, I would go with the former.
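For comparison, here is the same task as a minimal Python sketch (not bash, so no globstar is needed): Path.rglob does the recursive matching that **/ does above, and rsplit("_", 1) does the job of ${f%_*}. The function name strip_suffixes and the tmp.lst output name are hypothetical.

```python
from pathlib import Path


def strip_suffixes(root, out_path):
    """Write each file name under root that ends in _NN, minus that suffix.

    Returns the written names (relative to root), one per line in out_path.
    """
    names = []
    # rglob searches root and every subdirectory, like **/ with globstar.
    for path in sorted(Path(root).rglob("*_[0-9][0-9]")):
        if path.is_file():
            relative = str(path.relative_to(root))
            names.append(relative.rsplit("_", 1)[0])  # like ${f%_*}
    Path(out_path).write_text("".join(name + "\n" for name in names))
    return names
```

Calling strip_suffixes(".", "tmp.lst") would write the stripped names for the current directory tree into tmp.lst.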
If I understand you correctly, you want a list of the filenames without the trailing _01. The following would do that:
find . -type f -name '*_01' | sed 's/_01$//' > tmp.lst
find . -type f -name '*_01' searches the current directory and all its subdirectories for files whose names end in _01.
| is the so-called pipe, handing the output of the left-hand command to the right-hand command.
sed 's/_01$//' removes the _01 from the end of each filename.
> tmp.lst writes the result into the file tmp.lst.
These are all pretty basic parts of working with bash and similar shells, so it might be a good idea to look at a tutorial or two and familiarize yourself with those and a few others ;)

Apply regular expression substitution globally to many files with a script

I want to apply a certain regular expression substitution globally to about 40 JavaScript files in and under a directory. I'm a vim user, but doing this by hand can be tedious and error-prone, so I'd like to automate it with a script.
I tried sed, but handling more than one line at a time is awkward, especially if there is no limit to how many lines the pattern might match.
I also tried this script (on a single file, for testing):
ex $1 <<EOF
gs/,\(\_\s*[\]})]\)/\1/
EOF
The pattern will eliminate a trailing comma in any Perl/Ruby-style list, so that "[a, b, c,]" will come out as "[a, b, c]", in order to satisfy Internet Explorer, which, alone among browsers, chokes on such lists.
The pattern works beautifully in vim but does nothing if I run it in ex, as per the above script.
Can anyone see what I might be missing?
You asked for a script, but you mentioned that you are a vim user. I tend to do project-wide find and replace inside of vim, like so:
:args **/*.js | argdo %s/,\(\_\s*[\]})]\)/\1/ge | update
This is very similar to the :bufdo solution mentioned by another commenter, but it uses your argument list rather than your buffer list (and thus requires neither a brand-new vim session nor care about closing buffers you don't want touched).
:args **/*.js - sets your arglist to contain all .js files in this directory and subdirectories
| - pipe is vim's command separator, letting us have multiple commands on one line
:argdo - run the following command(s) on all arguments. It will "swallow" subsequent pipes
% - a range representing the whole file
:s - substitute command, which you already know about
:s_flags, ge - global (substitute as many times per line as possible) and suppress errors (i.e. "No match")
| - this pipe is "swallowed" by the :argdo, so the following command also operates once per argument
:update - like :write but only when the buffer has been modified
This pattern will obviously work for any vim command which you want to run on multiple files, so it's a handy one to keep in mind. For example, I like to use it to remove trailing whitespace (%s/\s\+$//), set uniform line-endings (set ff=unix) or file encoding (set fileencoding=utf8), and retab my files.
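If you'd rather have a standalone script than a Vim session, here is a minimal Python sketch of the same substitution. Python's \s already matches newlines, so it plays the role of Vim's \_s, and reading each file whole avoids the line-at-a-time problem you hit with sed. Like the Vim pattern, it knows nothing about strings or comments, so a ", ]" inside a string literal would be changed too; strip_trailing_commas and fix_tree are hypothetical names.

```python
import re
from pathlib import Path

# A comma followed by any whitespace (newlines included) and then a
# closing ], } or ). The closing character is kept via group 1.
TRAILING_COMMA = re.compile(r",(\s*[\]})])")


def strip_trailing_commas(text):
    """Remove commas that directly precede a closing bracket, brace or paren."""
    return TRAILING_COMMA.sub(r"\1", text)


def fix_tree(root):
    """Rewrite every .js file under root in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.js")):
        original = path.read_text(encoding="utf-8")
        fixed = strip_trailing_commas(original)
        if fixed != original:  # like :update, only write modified files
            path.write_text(fixed, encoding="utf-8")
            changed.append(path)
    return changed
```

fix_tree(".") rewrites every .js file under the current directory, so keep a backup or commit first.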
1) Open all the files with vim:
bash$ vim $(find . -name '*.js')
2) Apply the substitute command to all files (with 'hidden' set, :bufdo can move on from a buffer it has just modified):
:set hidden
:bufdo %s/,\(\_\s*[\]})]\)/\1/ge
3) Save all the files and quit:
:wall
:qa
I think you'll need to recheck your search pattern; it doesn't look right. Where you have \_\s* I think you should have \_s* instead.
Edit: You should also use the /ge options for the :s... command (I've added these above).
You can automate the actions of both vi and ex by passing the argument +'command' from the command line, which enables them to be used as text filters.
In your situation, the following command should work fine:
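find /path/to/dir -name '*.js' -exec ex +'%s/,\(\_\s*[\]})]\)/\1/ge' +'wq!' {} \;
(Running one ex per file matters: with xargs, ex would receive all the files at once, but the +'command' arguments only apply to the first file it loads.)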
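You can use a combination of the find command and sed (note that this only handles a trailing comma before ] on the same line, and it keeps a .bak copy of each file):
find /path -type f -iname "*.js" -exec sed -i.bak 's/,[ \t]*]/]/g' {} +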
If you are on Windows, Notepad++ allows you to run simple regexes on all open files.
Search for ,\s*\] and replace with ]
should work for the type of lists you describe.