There are files like this:
Report.cfg
Report.cfg.1
Report.cfg.2
Report.cfg.3
I want to fetch the max suffix, if it exists (i.e. 3 here), using egrep.
If I try simple egrep:
ls | egrep Report.cfg.*
I get the full file name and the whole list, not the suffix only.
What could be an optimized egrep?
You can use this awk command to find the greatest number in a list of files ending with a dot and a single digit:
printf '%s\n' *.cfg.[0-9] | awk -F '.' '$NF > max{max = $NF} END{print max}'
3
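Since the question asks for egrep specifically: -o is not in POSIX grep, but GNU (or BSD) grep can extract just the trailing digits, and sort can then pick the largest numerically. A sketch, assuming only the rotated Report.cfg files end in digits:
ls | grep -oE '[0-9]+$' | sort -n | tail -n 1
Unlike the single-digit [0-9] glob above, this also copes with multi-digit suffixes such as a hypothetical Report.cfg.10.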
I have a list of file names (name plus extension) and I want to extract the name only without the extension.
I'm using
ls -l | awk '{print $9}'
to list the file names and then
ls -l | awk '{print $9}' | awk /(.+?)(\.[^.]*$|$)/'{print $1}'
But I get an error on the (:
-bash: syntax error near unexpected token `('
The regex (.+?)(\.[^.]*$|$) to isolate the name has a capture group and I think it is correct; what I don't get is why it is not working within the awk syntax.
My list of files is like this ABCDEF.ext in the root folder.
Your specific error is caused by the fact that your awk command is incorrectly quoted. The single quotes should go around the whole command, not just the { action } block.
However, you cannot use capture groups like that in awk. $1 refers to the first field, as defined by the input field separator (which in this case is the default: one or more "blank" characters). It has nothing to do with the parentheses in your regex.
Furthermore, you shouldn't start from ls -l to process your files. I think that in this case your best bet would be to use a shell loop:
for file in *; do
printf '%s\n' "${file%.*}"
done
This uses the shell's built-in capability to expand * to the list of everything in the current directory and removes the .* from the end of each name using a standard parameter expansion.
If you really really want to use awk for some reason, and all your files have the same extension .ext, then I guess you could do something like this:
printf '%s\0' * | awk -v RS='\0' '{ sub(/\.ext$/, "") } 1'
This prints all the paths in the current directory, and uses awk to remove the suffix. Each path is followed by a null byte \0 - this is the safe way to pass lists of paths, which in principle could contain any other character.
Slightly less robust but probably fine in most cases would be to trust that no filenames contain a newline, and use \n to separate the list:
printf '%s\n' * | awk '{ sub(/\.ext$/, "") } 1'
Note that the standard tool for simple substitutions like this one would be sed:
printf '%s\n' * | sed 's/\.ext$//'
(.+?) is a PCRE construct. awk uses EREs, not PCREs. Also you have the opening script delimiter ' in the middle of the script AFTER the condition instead of where it belongs, before the start of the script.
The syntax for any such command (awk, sed, grep, whatever) is command 'script', so this should be awk 'condition{action}', not awk condition'{action}'.
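For example, with the quoting fixed and an ERE that awk actually supports (no lazy quantifiers), stripping the last extension from the ninth ls column would look like this; a sketch only, still not recommended since it parses ls:
ls -l | awk '{print $9}' | awk '{ sub(/\.[^.]*$/, "") } 1'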
But, in any case, as mentioned by @Aaron in the comments - don't parse the output of ls; see http://mywiki.wooledge.org/ParsingLs
Try this.
ls -l | awk '{ s=""; for (i=9;i<=NF;i++) { s = s" "$i }; sub(/\.[^.]+$/,"",s); print s}'
Notes:
reading the ls -l output is weird
it doesn't check the items (are they files? directories? ...) and strips extensions everywhere
Read the other answers :D
If the extension is always the same pattern, try a sed replacement:
ls -l | awk '{print $9}' | sed 's/\.ext$//'
I am trying to compare two files and then return one of the files' columns upon a match. The code that I am using right now excludes non-matching patterns and just prints out matching patterns. I need to print all results, both matching and non-matching, using grep.
File 1:
A,42.4,-72.2
B,47.2,-75.9
Z,38.3,-70.7
C,41.7,-95.2
File 2:
F
A
B
Z
C
P
E
Current Result:
A,42.4,-72.2
B,47.2,-75.9
Z,38.3,-70.7
C,41.7,-95.2
Expected Result:
F
A,42.4,-72.2
B,47.2,-75.9
Z,38.3,-70.7
C,41.7,-95.2
P
E
Bash Code:
while IFS=',' read point lat lon; do
check=`grep "${point} /home/aaron/file2 | awk '{print $1}'`
echo "${check},${lat},${lon}"
done < /home/aaron/file1
In awk:
$ awk -F, 'NR==FNR{a[$1]=$0;next}{print ($1 in a?a[$1]:$1)}' file1 file2
F
A,42.4,-72.2
B,47.2,-75.9
Z,38.3,-70.7
C,41.7,-95.2
P
E
Explained:
$ awk -F, ' # field separator to ,
NR==FNR { # file1
a[$1]=$0 # hash record to a, use field 1 as key
next
}
{
print ($1 in a?a[$1]:$1) # print match if found, else nonmatch
}
' file1 file2
If you don't care about order, there's a join binary in GNU coreutils that does just what you need:
$ sort file1 > sortedFile1
$ sort file2 > sortedFile2
$ join -t, -a 2 sortedFile1 sortedFile2
A,42.4,-72.2
B,47.2,-75.9
C,41.7,-95.2
E
F
P
Z,38.3,-70.7
It relies on files being sorted and will not work otherwise.
Now will you please get out of my /home/?
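If you'd rather not create the intermediate sorted files, bash process substitution can do the sorting on the fly (a sketch assuming bash):
join -t, -a 2 <(sort file1) <(sort file2)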
Another join-based solution, preserving the order:
f() { nl -nln -s, -w1 "$1" | sort -t, -k2; }; join -t, -j2 -a2 <(f file1) <(f file2) |
sort -t, -k2 |
cut -d, -f2 --complement
F
A,42.4,-72.2,2
B,47.2,-75.9,3
Z,38.3,-70.7,4
C,41.7,-95.2,5
P
E
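Note the trailing field on the matched lines: it is the leftover line number from file2's decoration, which cut kept because it only removed file1's number. Assuming no real data line ends in a bare integer (true for these coordinates), one more stage strips it:
... | cut -d, -f2 --complement | sed 's/,[0-9]*$//'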
It cannot beat the awk solution, but here is another alternative utilizing the Unix toolchain, based on the decorate-undecorate pattern.
Problems with your current solution:
1. You are missing a double-quote in grep "${point} /home/aaron/file2.
2. You should start with the other file in order to print all the lines in that file:
while IFS=',' read point; do
echo "${point}$(grep "${point}" /home/aaron/file1 | sed 's/[^,]*,/,/')"
done < /home/aaron/file2
3. The grep can give more than one result. Which one do you want (head -1)?
An improvement would be
while IFS=',' read point; do
echo "${point}$(grep "^${point}," /home/aaron/file1 | sed -n '1s/[^,]*,/,/p')"
done < /home/aaron/file2
4. Using while is the wrong approach.
For small files it will get the work done, but you will get stuck with larger files. The reason is that you will call grep for each line in file2, reading file1 a lot of times.
Better is using awk or some other solution.
Another solution is using sed with the output of another sed command:
sed -r 's#([^,]*),(.*)#s/^\1$/\1,\2/#' /home/aaron/file1
This will give commands for the second sed.
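For the sample file1 above, the generated script would be:
s/^A$/A,42.4,-72.2/
s/^B$/B,47.2,-75.9/
s/^Z$/Z,38.3,-70.7/
s/^C$/C,41.7,-95.2/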
sed -f <(sed -r 's#([^,]*),(.*)#s/^\1$/\1,\2/#' /home/aaron/file1) /home/aaron/file2
I have a number of files and I want to filter out the ones that contain 2 patterns. However, these patterns are on different lines. I've tried it using grep and awk, but in both cases they only seem to match patterns on the same line. I know grep is line-based, but I'm less familiar with awk. Here's what I came up with, but it only prints lines that match both strings:
awk '/string1/ && /string2/' file
Grep will easily handle this using xargs:
grep -l string1 * | xargs grep -l string2
Use this command in the directory where the files are located, and resulting matches will be displayed.
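If any filenames contain spaces or quotes, the unquoted list will trip up xargs. With GNU grep and xargs, a null-delimited variant is safer (a sketch assuming GNU tools):
grep -lZ string1 * | xargs -0 grep -l string2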
Depending on whether you really want to search for regexps:
gawk -v RS='^$' '/regexp1/ && /regexp2/ {print FILENAME}' file
or for strings:
gawk -v RS='^$' 'index($0,"string1") && index($0,"string2") {print FILENAME}' file
The above uses GNU awk for multi-char RS to read the whole file as a single record.
You can do it with find:
find -type f -exec bash -c "grep -q string1 {} && grep -q string2 {} && echo {}" ";"
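Embedding {} inside the bash -c string is fragile with unusual filenames; passing the name as a positional parameter is safer (a sketch):
find . -type f -exec bash -c 'grep -q string1 "$1" && grep -q string2 "$1" && printf "%s\n" "$1"' _ {} \;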
You could do it like this with GNU awk:
awk '/foo/{seenFoo++} /bar/{seenBar++} seenFoo&&seenBar{print FILENAME;seenFoo=seenBar=0;nextfile}' file*
That says: if you see foo, increment the variable seenFoo; likewise, if you see bar, increment the variable seenBar. If, at any point, you have seen both foo and bar, print the name of the current file and skip to the next input file, ignoring all remaining lines in the current file; before you start the next file, clear the flags to say we have seen neither foo nor bar in the new file.
I want to write a bash script that will take the output of a log file and extract the relevant content to another log file, which I will use to do statistical analysis of the time it takes to send a file, as an example:
The content is as follows:
FileSize TimeStamp MD5 Full Path to File
4824597 2013-06-21 11:26 5a264...c11 ...45/.../.../ITAM.xml
4824597 2013-06-20 23:18 5a264...c11 ...48/.../.../1447_rO8iKD.TMP.ITAM.xml
I am trying to extract the TimeStamp and the Full Path to the File.
I am a beginner in scripting but so far I have tried:
cat "/var/log/Customer.log" | grep '2013* *11' >> test.txt
Are there other methods I'm missing? Thank you very much.
If you want to extract the TimeStamp and the Full Path for all entries, then this should work:
awk 'NR>1{print $2,$3,$NF}' inputFile > outputFile
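NR>1 skips the header line, and $NF only works here because the sample paths contain no spaces. If they might contain some, a sketch that prints everything from field 5 onward instead (assuming the first four whitespace-separated columns are fixed, and accepting that runs of whitespace inside the path collapse to single spaces):
awk 'NR>1{ printf "%s %s", $2, $3; for (i=5; i<=NF; i++) printf " %s", $i; print "" }' inputFile > outputFile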
Code for GNU sed:
sed -nr '2,$ {s/\S+\s+(\S+)\s+(\S+)\s+\S+\s+(.*)/\1 \2\t\3/;p}' file
$ cat file
FileSize TimeStamp MD5 Full Path to File
4824597 2013-06-21 11:26 5a264...c11 ...45/.../.../ITAM.xml
4824597 2013-06-20 23:18 5a264...c11 ...48/.../.../1447_rO8iKD.TMP.ITAM.xml
$ sed -nr '2,$ {s/\S+\s+(\S+)\s+(\S+)\s+\S+\s+(.*)/\1 \2\t\3/;p}' file
2013-06-21 11:26 ...45/.../.../ITAM.xml
2013-06-20 23:18 ...48/.../.../1447_rO8iKD.TMP.ITAM.xml
Looks like this is what you want:
awk '$2 ~ /^2013/ && $4 ~ /11$/ { print $2, $3, $NF; }' /var/log/Customer.log > test.txt
$2 ~ /^2013/ matches dates beginning with 2013
$4 ~ /11$/ matches MD5 ending with 11
print $2, $3, $NF prints fields 2 (date), 3 (time), and the last field (pathname)
If these regular expressions are confusing to you, go to Regular-Expressions.info and read the tutorial.
Assuming the columns are tab-separated, you can just use cut:
cut -f2,4 /var/log/Customer.log | grep -v MD5 >> test.txt
will append columns 2 and 4 (counting starts at 1) into the test.txt. Lines containing MD5 will be removed by the grep invocation.
You can do it like this:
awk 'NR!=1 {print $2 " " $3 "\t" $5}' Customer.log > stat.txt
I've got the following files:
create_file_1.sql
create_file_2.sql
create_file_3.sql
create_file_4.sql
I'm iterating those files in a loop.
Now I want to get the number inside those files. I want to store the 1, 2, 3, … inside a variable in the loop.
How can I achieve this? How can I cut out this number?
P. S.: I want to achieve this with an AIX command.
Using sed:
[jaypal:~/Temp] echo "create_file_1.sql" | sed 's/.*_\([0-9]\+\)\.sql/\1/'
1
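Note that \+ is a GNU sed extension; since the question mentions AIX, the portable POSIX form may be needed (a sketch):
echo "create_file_1.sql" | sed 's/.*_\([0-9][0-9]*\)\.sql/\1/'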
Using bash:
[jaypal:~/Temp] var="create_file_1.sql"
[jaypal:~/Temp] tmp=${var%.*} # Removes the extension
[jaypal:~/Temp] var=${tmp##*_} # Removes portion till the last underscore
[jaypal:~/Temp] echo $var
1
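Inside the loop from the question, the same expansions avoid spawning any external process, which also sidesteps tool-portability worries on AIX (a sketch):
for filename in create_file_*.sql
do
tmp=${filename%.*}    # strip the .sql extension
number=${tmp##*_}     # keep only what follows the last underscore
# use $number here
done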
Using awk:
[jaypal:~/Temp] echo "create_file_1.sql" | awk -v FS="[_.]" '{print $(NF-1)}'
1
Well ... It depends on how flexible you want it to be. If you can assume that the number is "the part between the second underscore and the first period after the second underscore", you can simply use:
NUMBER=$(echo $FILENAME | cut -d_ -f3 | cut -d. -f1)
assuming that $FILENAME holds the current filename, of course.
This uses cut to first take the string after the second underscore, and then cuts that down to the string leading up to the first period.
This, admittedly, does not use regular expressions which maybe you want based on your tags, but I find the above a bit easier to read for a simple case like this.
for filename in create_file_1.sql create_file_2.sql create_file_3.sql create_file_4.sql
do
i=$(echo $filename | cut -d_ -f3 | cut -d. -f1)
# do something with $i
done
If the only number in the file name is the one that you want to get, this will also work:
for filename in create_file_1.sql create_file_2.sql create_file_3.sql create_file_4.sql ; do
number=$(echo "$filename" | grep -o '[0-9]*')
done