grep - search for "<?\n" at start of a file - regex

I have a hunch that I should probably be using ack or egrep instead, but what should I use to basically look for
<?
at the start of a file? I'm trying to find all files that contain the php short open tag since I migrated a bunch of legacy scripts to a relatively new server with the latest php 5.
I know the regex would probably be '/^<\?\n/'

I RTFM and ended up using:
grep -RlIP '^<\?\n' *
The -P option enables Perl-compatible regular expressions (PCRE).

If you're looking for all php short tags, use a negative lookahead
/<\?(?!php)/
will match <? but will not match <?php
[meder ~/project]$ grep -rP '<\?(?!php)' .

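find . -name "*.php" | xargs grep -nHo "<?[^px]"
The x in the bracket expression excludes the XML start tag (<?xml). Only the leading ^ negates the set; a second ^ inside the brackets would be treated as a literal caret.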

If you are worried about Windows line endings, just add \r? to the pattern.
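For anyone without grep -P, here is a minimal sketch of the same idea using only extended regular expressions. The function name find_short_tags is made up for illustration, and it assumes the scripts end in .php. It lists files containing a <? that is either at the end of a line or followed by something other than p or x, so <?php and <?xml are skipped. A carriage return after <? counts as "something other than p or x", so CRLF files are covered as well.

```shell
# Hypothetical helper: list .php files under directory $1 that contain a
# short open tag, i.e. "<?" at end of line or followed by anything but p/x.
find_short_tags() {
    find "$1" -type f -name '*.php' \
        -exec grep -lE '<\?($|[^px])' {} + | sort
}
```

Usage would be something like find_short_tags . from the project root. Note that this is a rough filter: it would also skip a hypothetical tag such as <?print, which starts with p.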

grep '^<?$' filename
Don't know if that is showing up correctly. Should be
grep ' ^ < ? $ ' filename

Do you mean a literal "backslash n" or do you mean a newline?
For the former:
grep '^<?\\n' [files]
For the latter:
grep '^<?$' [files]
Note that grep will search all lines, so if you want to find matches just at the beginning of the file, you'll need to either filter each file down to its first line, or ask grep to print out line numbers and then only look for line-1 matches.
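A rough sketch of the "filter each file down to its first line" approach, in case it helps. The function name is hypothetical, and it assumes you want files whose first line is exactly <? (with or without a trailing carriage return):

```shell
# Hypothetical helper: print each file whose first line is exactly "<?".
# tr strips a carriage return so files with CRLF endings match too.
first_line_is_short_tag() {
    for f in "$@"; do
        if head -n 1 "$f" | tr -d '\r' | grep -qxF '<?'; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like first_line_is_short_tag *.php would then print only the files that start with the short tag, ignoring any <? lines further down.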

Related

Can grep show only result i want

I have data as this
tatusx2.atc?beginnum=0;8pctgRB Mwdf fgEio"text1"text4"text
tatqsx3.atc?beginnum=1;8pctgRBwsaNezxio"text2
tatssx4.atc?beginnum=2;8pctgsvMALNejkio"data2
tatksx4.atc?beginnum=1;8pctgxdfALNebfio"text3
tatzsx5.atc?beginnum=3;8pwerRBMALNetior"datac
How to get only data between ; and "
I have tried grep -oP ';.*?"' file and got output :
;8pctgRBMwdffgEio"
;8pctgRBwsaNezxio"
;8pctgsvMALNejkio"
;8pctgxdfALNebfio"
;8pwerRBMALNetior"
But my desired output is:
8pctgRB Mwdf fgEio
8pctgRBwsaNezxio
8pctgsvMALNejkio
8pctgxdfALNebfio
8pwerRBMALNetior
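You need to use lookahead and lookbehind assertions:
grep -oP '(?<=;)[\w ]*(?=")' file
The character class includes a space because the first line of your data contains spaces, which \w alone does not match.
I suggest you play around with RegExr to learn more about regular expressions. Check out their cheatsheet.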
A much more readable way to write the expression you need is:
grep -oP '(?<=;)[^"]*(?=")' file
which gives the desired result. [^"]* stops at the first double quote, whereas a greedy .* would run on to the last double quote on the line. grep's Perl regexes are apparently experimental, but patterns like this one work without issues.
The following options are being used:
-o --only-matching: print only the matched parts of a matching line
-P --perl-regexp: interpret the pattern as a Perl-compatible regular expression
The lookbehind (?<=;) requires a ; immediately before the match without including it in the output. Similarly, the lookahead (?=") requires a " immediately after the match without including it.
Here is suggested additional reading.
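If your grep has no -P, the same extraction can be sketched with plain sed. This is only an illustration (the function name extract_field is made up), and it assumes each line has the wanted text between the first ; and the next ":

```shell
# Hypothetical helper: print the text between the first ';' and the
# following '"' on each input line; lines without both are skipped.
extract_field() {
    sed -n 's/^[^;]*;\([^"]*\)".*/\1/p' "$@"
}
```

It reads the named files, or standard input when called without arguments, so extract_field file does the job here.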

Remove character occurring after _ from all the files excluding file extension (.png)

I am looking for a Unix command or shell script that removes the characters after _ from the names of all files, leaving the file extension in place.
Example:
b6d28-insurance-renewal-shop_6b5c74fa3d4b96f7557c3fd66f2555af.png
should be renamed to
b6d28-insurance-renewal-shop.png
I have searched online but was not able to find a quick, optimal solution.
Please note that those extra characters are random and differ for each file.
Thanks in Advance!
You can use sed like this using a negated character class:
f='b6d28-insurance-renewal-shop_6b5c74fa3d4b96f7557c3fd66f2555af.png'
sed 's/_[^_.]*//' <<< "$f"
b6d28-insurance-renewal-shop.png
[^_.] matches any character except DOT or underscore.
If you're using bash then you can do this in shell itself using:
echo "${f%_*}.png"
You could also use cut for the result like this:
file="b6d28-insurance-renewal-shop_6b5c74fa3d4b96f7557c3fd66f2555af.png"
new_file=$(echo "$file" | cut -d'_' -f1).$(echo "$file" | cut -d'.' -f2)
echo "New file name: ${new_file}"
Output:
New file name: b6d28-insurance-renewal-shop.png
Regex pattern:
(\_[\d\w]+)(?=(\.\w{2,3}))
to find every _akfgasfhsgfhha before .ext[ension]
Assuming that f holds the original filename,
${f%_*}.${f##*.}
would give you the transformed filename.
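To apply that expansion to every file, a loop along these lines might work. This is a sketch, not a tested production script: the function names are hypothetical, it assumes plain file names in the current directory (no directory part), and it only prints the mv commands so you can check them first.

```shell
# Hypothetical helper: print the new name for one file name.
# Names without an "_" followed later by a "." are printed unchanged.
strip_suffix() {
    case $1 in
        *_*.*) printf '%s\n' "${1%_*}.${1##*.}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

# Dry run over the current directory: prints the mv commands instead of
# running them. Remove "echo" once the output looks right.
rename_dry_run() {
    for f in *_*.png; do
        [ -e "$f" ] || continue   # pattern matched nothing
        echo mv -- "$f" "$(strip_suffix "$f")"
    done
}
```

Since the random parts differ per file, two files could end up with the same target name, so it is worth checking the dry-run output for duplicates before renaming anything.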

Match with regex any string except a string provided

I am struggling with a regex for an Apache configuration. My goal is to match any string except a provided one. I know this has been asked a couple of times on Stack Overflow, but I have not been able to get it working so far.
The regex should match
/home/www/dir1/*
/home/www/dir_wl/example1/*
/home/www/dir_wl/example2/*
It should not match
/home/www/dir1/MEW/*
/home/www/dir_wl/example1/MEW/*
/home/www/dir_wl/example2/MEW*
Here is the entire line:
<Directory ~ "^/home/www/(dir1|dir_wl/.*(?!MEW))/(?!MEW/)">
Any help is greatly appreciated!
I got it with "grep -P". "-P" means "Interpret PATTERN as a Perl regular expression". The program you are working with must support this.
I created the file "dirs.txt" with the following content:
/home/www/dir1/*
/home/www/dir_wl/example1/*
/home/www/dir_wl/example2/*
/home/www/dir1/MEW/*
/home/www/dir_wl/example1/MEW/*
/home/www/dir_wl/example2/MEW*
Then I ran this command (single quotes keep an interactive shell from treating ! as history expansion):
grep -P '^/home/www/(dir1|dir_wl)/(?!MEW)(((?!/MEW).)*$)' dirs.txt
It returns:
/home/www/dir1/*
/home/www/dir_wl/example1/*
/home/www/dir_wl/example2/*
This does the same in a much simpler way:
grep -E '^/home/www/(dir1|dir_wl)/' dirs.txt | grep -v '/MEW'
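For checking the two-step filter against the sample paths from the question, a small wrapper like this could be used. The function name filter_dirs is made up, and the sketch assumes one path per line on standard input:

```shell
# Hypothetical helper: keep paths under dir1 or dir_wl, then drop any
# path that contains "/MEW". Reads paths from standard input.
filter_dirs() {
    grep -E '^/home/www/(dir1|dir_wl)/' | grep -v '/MEW'
}
```

Running filter_dirs < dirs.txt should print only the three paths that are meant to match. This only tests the logic with grep; whether the same split is possible in your Apache setup is a separate question.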

Regular expression to extract text from XML-ish data using GNU sed

I have a file full of lines extracted from an XML file using "gsed regexp -i FILENAME". Each line in the file has one of these two formats:
<field number='1' name='Account' type='STRING'W/>
<field number='2' name='AdvId' type='STRING'W>
I've inserted a 'W' in the end which represents optional whitespace. The order and number of properties are not necessarily the same in all lines throughout the file although "number" is always before "type".
What I'm searching for is a regular expression "regexp" that I can give to gnu sed so that this command:
gsed regexp -i FILENAME
gives me a file with lines looking like this:
1 STRING
2 STRING
I don't care about the amount of whitespace in the result as long as there is some after the number and a newline at the end of each line.
I'm sure it is possible, but I just can't figure out how in a reasonable amount of time. Can anyone help?
Thanks a lot,
jules
Using xsh, a Perl wrapper around XML::LibXML:
open file.xml ;
for //field echo @number @type ;
I'm sure this can be optimized, but it works for me and answers your question:
sed "s/^.*number='\([0-9]*\)'.*type='\(.*\)'.*$/\1 \2/" <filename>
Saying that, I think the others are right, if you have an XML-file you should use an XML-parser.
I think you're much better off using a command line XML tool such as XMLStarlet. That will integrate well with the shell and let you perform XPath searches. It's XML-aware so it'll handle character encodings, whitespace correctly etc.
Simple cut should work for you:
cut -f2,6 -d"'" --output-delimiter=" "
If you really want sed:
sed -r "s/.*number='([^']*)'.*type='([^']*)'.*/\1 \2/"
You can use this:
sed -r "s/<field [^>]*number='([0-9]+)'[^>]*type='([^']+)'[^>]*>/\1 \2/"
You would be better off using an XML parser, but if you had to use sed (which has no lazy quantifiers, so negated character classes stand in for .*?):
sed -E "s/<field number='([^']*)'.*type='([^']*)'.*/\1 \2/"
sed -ni "/<field .*>/s#^.*[[:space:]]number='\\([^']\\+\\).*[[:space:]]type='\\([^']\\+\\).*#\1 \2#p" FILENAME
Or if you don't mind contents of number and type to be optional:
sed -ni "/<field .*>/s#^.*[[:space:]]number='\\([^']*\\).*[[:space:]]type='\\([^']*\\).*#\1 \2#p" FILENAME
Just change from [^']\\+ to [^']* at your preference.
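Pulling the sed suggestions together, here is one possible sketch wrapped in a function so it can be tried on sample lines. The name extract_number_type is hypothetical, and it assumes single-quoted attribute values with number appearing before type, as stated in the question:

```shell
# Hypothetical helper: for each <field ...> line, print the values of
# the number and type attributes separated by a space. Lines that do
# not match are dropped (-n plus the p flag).
extract_number_type() {
    sed -nE "s/.*<field[[:space:]][^>]*number='([^']*)'[^>]*[[:space:]]type='([^']*)'.*/\1 \2/p" "$@"
}
```

Called as extract_number_type FILENAME it prints to standard output rather than editing in place, which makes it easier to check the result before adding -i.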

Using sed to remove all console.log from javascript file

I'm trying to remove all my console.log, console.dir etc. from my JS file before minifying it with YUI (on osx).
The regex I got for the console statements looks like this:
console.(log|debug|info|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*)\);?
and it works if I test it with the RegExr.
But it won't work with sed.
What do I have to change to get this working?
sed 's/___???___//g' <$RESULT >$RESULT_STRIPPED
update
After getting the first answer I tried
sed 's/console.log(.*)\;//g' <test.js >result.js
and this works, but when I add an OR
sed 's/console.\(log\|dir\)(.*)\;//g' <test.js >result.js
it doesn't replace the "logs":
Your original expression looks fine. You just need to pass the -E flag to sed, for extended regular expressions:
sed -E 's/console.(log|debug|info|...|count)\((.*)\);?//g'
The difference between these types of regular expressions is explained in man re_format.
To be honest I have never read that page, but instead simply tack on an -E when things don't work as expected. =)
You must escape ( (for grouping) and | (for oring) in sed's regex syntax. E.g.:
sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
UPDATE example:
$ sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
console.log # <- input line, no match, so it is printed unchanged on the next line
console.log
console.log() # <- input line, matches, no printing
console.log(blabla); # <- input line, matches, no printing
console.log(blabla) # <- input line, matches, no printing
console.debug(); # <- input line, matches, no printing
console.debug(BAZINGA) # <- input line, matches, no printing
DATA console.info(ditto); DATA2 # <- input line, matches, printing of expected data
DATA DATA2
HTH
I am also looking for a way to remove all console.log calls,
and I am trying to do it with Python,
but my regex does not work.
I wrote it like this:
var re=/^console.log(.*);?$/;
but it also matches the whole of the following string:
'console.log(23);alert(234dsf);'
Does it work with this?
"s/console.(log|debug|info|...|count)((.*));?//g"
I tried this:
sed -E 's/console.(log|debug|info)( ?| +)\([^;]*\);//g'
See the test:
Regex Tester
Here's my implementation
for i in $(find ./dir -name "*.js")
do
sed -E 's/console\.(log|warn|error|assert..timeEnd)\((.*)\);?//g' "$i" > "${i}.copy" && mv "${i}.copy" "$i"
done
I took the sed expression from GitHub.
I was feeling lazy and hoping to find a script to copy & paste. Alas there wasn't one, so for the lazy like me, here is mine. It goes in a file named something like 'minify.sh' in the same directory as the files to minify. It will overwrite the original file and it needs to be executable.
#!/bin/bash
for f in *.js
do
sed -Ei 's/console.(log|debug|info)\((.*)\);?//g' "$f"
yui-compressor "$f" -o "$f"
done
I'd just like to add here that I was running into issues with namespaced console.logs such as window.console.log. Also Tweenmax.js has some interesting uses of console.log in some parts such as
window.console&&console.log(t)
So I used this
sed -i.bak 's/[^&a-zA-Z0-9.]console.log(/\/\//g' js/combined.js
The regex effectively says replace all console.logs that don't start with &, alphanumerics, and . with a '//' comment, which uglify later takes out.
Rodrigocorsi's works with nested parentheses. I added a ? after the ; because yuicompressor was omitting some semicolons.
The likely reason this is not working is that the regex does not stop at the closing parenthesis:
(.*) is greedy, so it also matches any ) in the method parameters and runs on to the last ) on the line.
Try this regular expression:
console\.(log|trace|error)\(([^)]+)\);
Remember to include the rest of your method names in the capture group.
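As a rough sketch of that suggestion in sed form: the function name strip_console is made up, only a handful of method names are listed, and [^)]* is used instead of [^)]+ so that empty calls like console.log() are removed too.

```shell
# Hypothetical helper: delete console.<method>(...) calls, with an
# optional trailing semicolon. [^)]* stops at the first ")", so other
# statements on the same line are left alone.
strip_console() {
    sed -E 's/console\.(log|debug|info|warn|error)\([^)]*\);?//g' "$@"
}
```

For example, strip_console test.js > result.js. The trade-off is that calls with nested parentheses, such as console.log(foo(1)), are only partly removed, because the match ends at the first closing parenthesis.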