Command line Perl regex to find dangling JavaScript commas

Hello, I'm looking for a Perl one-liner, if possible, to scan all of our JavaScript files for so-called "rogue commas": commas that come at the end of an array or object data structure, and therefore immediately before either a ']' or a '}' character.
The main challenge I'm encountering is how to make the regex that checks for ] or } non-greedy. The regex needs to span multiple lines, since the comma could end one line, followed by the } or ] on the next line, but I've figured out how to do that with the help of the book Minimal Perl.
Also, I'd like to be able to pipe a number of files to this Perl regex (via find/xargs), and so I'd like to print the name of the input file, and the line number within that file.
Below are various attempts of mine, straight from my bash history, none of which is particularly close to working. Thanks in advance:
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,\s+$/ and print $_;'
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,\s+/ and print $_;'
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,\s+\]/ and print $_;'
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,\s+[\]\}]/ and print $_;'
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,\s+[\]\}]/ and print $_;' | wc -l
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,\s+}/ and print $_;' | wc -l
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,\s+}?/ and print $_;' | wc -l
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,\s+}+?/ and print $_;' | wc -l
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,$/' and print $_;'
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/,$/ and print $_;'
find winhome/workspace/SsuExt4Zoura/quotetool/js -name "*.js" | xargs perl -00 -wnl -e '/\,$/ and print $_;'

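With the -00 switch you put Perl into paragraph mode: each record runs up to the next blank line, so a comma at the end of one line and the } or ] on the next line land in the same record and can be matched together. (To read each whole file as a single record, use -0777 instead.) However, it also makes print $_ print the whole paragraph rather than a single line. What you probably want is to print the file name: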
print $ARGV if /,\s*[\]\}]/;

Most of these look like a decent approach to the problem, with one small issue: you probably want ,\s*(?:$|[\]\}]) rather than ,\s+(?:$|[\]\}]), as there may not be even one whitespace character. Your + quantifier will miss forms like ,].
Having said that, JavaScript can be pretty subtle, and you might well encounter comments and other constructs that legitimately end with a comma before something unexpected, such as the end of the file or a }. A cheap solution might be to use a Perl s/// to remove all the comments before applying your tests.
If you're handling JSON, JSON::XS enforces validity by default and rejects trailing commas; its relaxed option is the one that makes it accept them.
If you need real validation, something like JSLint is probably the way to go. I've had a lot of success using Rhino to embed JavaScript (a bit less with Perl and SpiderMonkey), and using it to run a set of tests against your JavaScript code would be a nice way to ensure reliability over time.

An easy solution to this problem is to use comma-first style. Since commas never come at the end of a line, there is never a 'trailing comma'.
For example:
var myObj = { foo: 1
            , bar: 2
            , baz: 4
            }
You can easily detect a missing comma, it's obvious which elements belong to which set of braces, and there's never a 'trailing comma' problem.
See also https://gist.github.com/357981

Related

Perl regex is not matching

I'm trying to pipe the output of a find command to a Perl one-liner that replaces a line ending with ?> with RedefineForDocker::standardizeXmlmc(), but for some reason the value isn't being replaced. I've checked the output of the find command and it behaves as expected, and I've double-checked my regex, which should match.
find . -name *.php -exec ggrep -Ezl 'class XmlMethodCall.*([?]>)$' {} \; \
| xargs perl -ewpn -i.bak2 \
"s/[?]>\s*?$/RedefineForDocker::standardizeXmlmc()\n/gm"
I get no warnings and no indication that it isn't working; the backups are created, but the files remain unchanged. The list of files matched by the find command is below.
./swsupport/clisupp/trending/services/data.helpers.php
./swsupport/clisupp/_bpmui/arch/service/data.helpers.php
./swsupport/clisupp/_bpmui/itsm/service/data.helpers.php
./swsupport/clisupp/_bpmui/itsm_default/service/data.helpers.php
./webclient_code/php/session.php
./webclient_code/service/storedquery/helpers.php
./php/_phpinclude/itsm/xmlmc/xmlmc.php
./php/_phpinclude/itsmf/xmlmc/xmlmc.php
./php/_phpinclude/itsm_default/xmlmc/xmlmc.php
Here is an example of one of the files it should match:
https://regex101.com/r/BUoCif/1
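Run your perl command like this:
perl -i.bak2 -wpe 's/\?>\h*$/RedefineForDocker::standardizeXmlmc()\n/gm'
The order of the command-line options is important here: -e takes the rest of its switch cluster as the program text, so with -ewpn the program is just wpn, and -w, -p and -n are never turned on. Your s/// string then lands in @ARGV alongside the file names instead of being run as code.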
Full pipeline should be like this:
find . -name '*.php' -exec ggrep -PZzl '(?ms)class XmlMethodCall.*\?>\h*$' {} + |
xargs -0 perl -i.bak2 -wpe 's/\?>\h*$/RedefineForDocker::standardizeXmlmc()\n/gm'
Note the -Z option to grep and the -0 option to xargs: together they handle file names containing whitespace and other special characters.

Got Error 'repetition-operator operand invalid' with negative look behind regex (?<!(Log\())@"[^"]+"

I want to find all hardcoded strings in my project except those that start with Log(.
I'm using this regex to do so, but I get the error mentioned above.
KEYWORDS='(?<!(Log\())@"[^"]+"'
find "${SRCROOT}" \( -name "*.h" -or -name "*.m" \) -print0 | xargs -0 egrep --with-filename "($KEYWORDS).*\$"
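Is there an alternative regex or script that gets the same result?
egrep uses POSIX extended regular expressions, which don't support lookbehind: the ? right after ( has nothing to repeat, hence the error. You might just filter out what you don't want to see:
xargs -0 grep -EH '@"[^"]+"' | grep -v 'Log(@"'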
If you want to stick with your regular expression, use Perl, which does support lookbehind:
xargs -0 perl -ne 'print "$ARGV: $_" if /(?<!Log\()@".+?"/'

Replace multiline string in all files in terminal

I have a couple of files in a directory, in which I have a piece of text between two separators.
Text to keep
//###==###
Text to remove
//###==###
Text to keep
After an extensive search, I found the following Mac OS X Terminal command, with which I can remove the separators themselves.
perl -pi -w -e 's|//###==###||g' `find . -type f`
However, I need a regex that removes not only the separators but also the text between them. Something like this, although this line doesn't do anything:
perl -pi -w -e 's|//###==###(.*)//###==###||g' `find . -type f`
EDIT AFTER DUPLICATE FLAG
I see something similar here, using the scalar range operator, but I cannot make it work for me. Failed attempts include:
perl -pi -w -e 's|//###==###..//###==###||g' `find . -type f`
perl -pi -w -e 's|//###==###(..)//###==###||g' `find . -type f`
perl -pi -w -e 's|//###==###[..]//###==###||g' `find . -type f`
SOLUTION
With the help of dawg below, the following oneliner will do exactly what I want:
$ perl -0777 -p -i -e 's/(^\s*^\/\/###==###.*?\/\/###==###\s*)//gms' `find . -type f -name "index.php"`
You can use:
s/(^\s*^\/\/###==###.*?\/\/###==###\s*)//gms
Then, in the terminal with Perl, given:
$ echo "$tgt"
Text to keep
//###==###
Text to remove
//###==###
Text to keep
Use the -0777 command flag to slurp the whole file and then:
$ echo "$tgt" | perl -0777 -ple 's/(^\s*^\/\/###==###.*?\/\/###==###\s*)//gms'
Text to keep
Text to keep
Or, you can use the range operator. If done this way, you cannot remove the leading and trailing blank lines if that is your intent:
$ echo "$tgt" | perl -lne 'print unless (/\/\/###==###/ ... /\/\/###==###/)'
Text to keep
Text to keep

Combine different regex together in xargs command

I have these two regexes:
find ... | xargs perl -pi -e 's/\t/    /g'
find ... | xargs perl -pi -e 's/[^\S\n]+$//g'
The first changes each tab to 4 spaces, and the second removes any trailing whitespace at the end of each line.
I'm tempted to combine the two but don't want to break anything. Besides, they do different things: one adds spaces, the other removes them. Is there a safe way to merge them, or should I just leave them as they are?
You can do this:
find ... | xargs perl -l -pi -e 's/\t/    /g; s/\s+$//'
Since the second substitution operates on the result of the first one, it's safe to perform both in succession in a single perl invocation.
I would leave the expressions separate, but you can perform them both with a single call to perl:
find ... | xargs perl -pi -e 's/\t/    /g;' -e 's/[^\S\n]+$//g;'

How to match once per file in grep?

Is there a grep option that lets me control the total number of matches but stops at the first match in each file?
Example:
If I run grep -ri --include '*.coffee' 're' . I get this:
./app.coffee:express = require 'express'
./app.coffee:passport = require 'passport'
./app.coffee:BrowserIDStrategy = require('passport-browserid').Strategy
./app.coffee:app = express()
./config.coffee: session_secret: 'nyan cat'
And if I do grep -ri -m2 --include '*.coffee' 're' ., I get this:
./app.coffee:config = require './config'
./app.coffee:passport = require 'passport'
But, what I really want is this output:
./app.coffee:express = require 'express'
./config.coffee: session_secret: 'nyan cat'
-m1 doesn't work either; grep -ri -m1 --include '*.coffee' 're' . gives me only this:
./app.coffee:express = require 'express'
I also tried avoiding grep; for example, find . -name '*.coffee' -exec awk '/re/ {print;exit}' {} \; produced:
config = require './config'
session_secret: 'nyan cat'
UPDATE: As noted below, GNU grep's -m option counts matches per file, whereas BSD grep's -m is a global match count.
So, using grep, you just need the -l (--files-with-matches) option: it prints only the name of each matching file (not the matching line) and stops scanning that file at its first match.
All those answers about find, awk or shell scripts miss the point of the question.
I think you can just do something like
grep -ri -m1 --include '*.coffee' 're' . | head -n 2
to, for example, take the first match from each file and at most two matches in total.
Note that this requires your grep to treat -m as a per-file match limit; GNU grep does do this, but BSD grep apparently treats it as a global match limit.
I would do this in awk instead.
find . -name \*.coffee -exec awk '/re/ {print FILENAME ":" $0;exit}' {} \;
If you didn't need to recurse, you could just do it with awk:
awk '/re/ {print FILENAME ":" $0;nextfile}' *.coffee
Or, if you're using bash 4.0 or later, you can use globstar:
shopt -s globstar
awk '/re/ {print FILENAME ":" $0;nextfile}' **/*.coffee
Using find and xargs: find every .coffee file and run grep -m1 on each of them.
find . -name '*.coffee' -print0 | xargs -0 grep -m1 -i 're'
Test, without -m1:
linux# find . -name '*.txt'|xargs grep -ri 'oyss'
./test1.txt:oyss
./test1.txt:oyss1
./test1.txt:oyss2
./test2.txt:oyss1
./test2.txt:oyss2
./test2.txt:oyss3
With -m1:
linux# find . -name '*.txt'|xargs grep -m1 -ri 'oyss'
./test1.txt:oyss
./test2.txt:oyss1
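find . -name \*.coffee -exec grep -H -m1 -i 're' {} \;
find's -exec option runs the command once for each matched file (unless you use + instead of \;, which makes it act like xargs), so -m1 applies per file whichever grep you have. -H makes grep print the file name even though it sees only one file at a time.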
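A sketch on two invented .coffee files shaped like the question's data. Because each grep process sees a single file, -m1 behaves per file whether grep is GNU or BSD; -H is added so the file name is printed even though grep sees one file at a time.

```shell
#!/bin/sh
# Sketch with two invented files shaped like the question's data.
dir=$(mktemp -d)
printf '%s\n' "express = require 'express'" "passport = require 'passport'" > "$dir/app.coffee"
printf '%s\n' "  session_secret: 'nyan cat'" "more = 're'" > "$dir/config.coffee"

# One grep per file, so -m1 stops after the first match in that file;
# -H prints the file name even though each grep sees only one file.
result=$(find "$dir" -name '*.coffee' -exec grep -H -m1 -i 're' {} \; | sort)
printf '%s\n' "$result"
```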
You can do this easily in Perl, with no messy cross-platform issues!
use strict;
use warnings;
use autodie;
my $match = shift;
# Compile the match so it will run faster
my $match_re = qr{$match};
FILES: for my $file (@ARGV) {
    open my $fh, "<", $file;
    FILE: while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ $match_re) {
            print "$file: $line\n";
            # Stop reading this file after its first match
            last FILE;
        }
    }
}
The only difference is that you have to use Perl-style regular expressions instead of GNU-style ones; they're not much different.
You can do the recursive part in Perl using File::Find, or use find to feed it files:
find /some/path -name '*.coffee' -print0 | xargs -0 perl /path/to/your/program