How to count the number of lines of a file group? - regex

I want to count the total number of lines across all the log files for a specified month. So far I get those files with the following command:
ls localhost_access_log.[0-9][0-9]-11-11*
An example of this file name is localhost_access_log.10-10-11.log
I tried to use
ls localhost_access_log.[0-9][0-9]-11-11* | wc -l
But this gives me the number of files matched by ls, not what I want: the sum of the lines in those files. Thanks

This will give you the total number of lines in all files -
wc -l localhost_access_log.[0-9][0-9]-11-11* | awk 'END{print $1}'
Even better -
awk 'END{print NR}' localhost_access_log.[0-9][0-9]-11-11*
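As a quick sketch (with throwaway files), both approaches give the same total: with two or more files, wc -l prints a final "total" line whose first field the awk END block picks up, while awk's NR is the cumulative line count across all input files.

```shell
# Demo with throwaway files matching the pattern from the question.
cd "$(mktemp -d)"
printf 'a\nb\n' > localhost_access_log.10-11-11.log   # 2 lines
printf 'c\n'    > localhost_access_log.11-11-11.log   # 1 line

# wc -l prints per-file counts plus a final "3 total" line; awk grabs its first field.
wc -l localhost_access_log.[0-9][0-9]-11-11* | awk 'END{print $1}'   # prints 3

# awk's NR counts every record it has read, across all input files.
awk 'END{print NR}' localhost_access_log.[0-9][0-9]-11-11*           # prints 3
```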

Try:
wc -l localhost_access_log.[0-9][0-9]-11-11*

Related

Find all PDFs in a folder with a given year in the filename

I have a folder with thousands of PDFs named by date, like 20100820.pdf or 20000124.pdf, etc.
On the command line, in other projects I have searched for all PDFs in a folder and piped them into another command, like so: ls | grep -E "\.pdf$" | [command here]. Now I would like to search only those PDFs in a given folder from, for example, the year 2010. How can I achieve that?
Wow, it was so easy in the end.
This solves the problem:
ls | grep -E "\.pdf$" | grep -E "2010"
ls | grep -E "\.pdf$" | awk -F "." '{if ($1>20100000) print $1}'
This command takes all the PDFs and splits each filename on the dot; the digits are then compared with 20100000, and if greater, the first field (the filename without the .pdf extension) is printed.
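One caveat worth noting: grep -E "2010" matches 2010 anywhere in the name, not just in the year position. Anchoring the pattern to the start of the name avoids false matches. A sketch with made-up file names:

```shell
cd "$(mktemp -d)"
# 19992010.pdf contains "2010" but is not from the year 2010.
touch 20100820.pdf 20000124.pdf 20101231.pdf 19992010.pdf

# Anchor the year to the start; [0-9]{4} covers the month and day digits.
ls | grep -E '^2010[0-9]{4}\.pdf$'
# 20100820.pdf
# 20101231.pdf
```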

How can I cut and gather statistics on strings in a plain-text document?

I have a large plain-text file (its contents were shown as a picture in the original post). I tried:
cat textplain.txt | grep '^\.[\/[:alpha:]]*[\.\:][[:alpha:]]*'
I want the output result like below :
./external/selinux/libsepol/src/mls.c
./external/selinux/libsepol/src/handle.c
./external/selinux/libsepol/src/constraint.c
./external/selinux/libsepol/src/sidtab.c
./external/selinux/libsepol/src/nodes.c
./external/selinux/libsepol/src/conditiona.c
Question:
What should I do?
Just regenerate the file with
grep -lr des ./android/source/code
-l only lists the files with matches without showing their contents
-r is still needed to search subdirectories
-n has no influence on -l, so it can be omitted. -c instead of -l would append the number of matching lines to each file name, but then you'll probably want to add | grep -v :0 to skip files with zero matches.
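A minimal sketch of the -l behavior, with fabricated file names and contents (the real tree from the question is not reproduced here):

```shell
cd "$(mktemp -d)"
mkdir -p src
printf 'uses the des cipher\n' > src/mls.c
printf 'nothing relevant\n'    > src/other.c

# -l lists only the names of files containing a match;
# -r recurses into subdirectories.
grep -lr des .
# ./src/mls.c
```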
Or, use cut and sort -u:
cut -d: -f1 textplain.txt | sort -u
-d: delimit columns by :
-f1 only output the first column
-u output unique lines
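A sketch with fabricated grep-style lines (path:line:match) showing how the pipeline extracts the unique file paths:

```shell
cd "$(mktemp -d)"
printf '%s\n' \
  './external/selinux/libsepol/src/mls.c:10:des' \
  './external/selinux/libsepol/src/mls.c:42:des' \
  './external/selinux/libsepol/src/handle.c:7:des' > textplain.txt

# Keep everything before the first ":", then collapse duplicates.
cut -d: -f1 textplain.txt | sort -u
# ./external/selinux/libsepol/src/handle.c
# ./external/selinux/libsepol/src/mls.c
```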

How to remove both matching lines while removing duplicates

I have a large text file containing a list of emails called "main", and I have sent mails to some of them. I have a list of 'sent' emails. Now, I want to remove the 'sent' emails from the list "main".
In other words, I want to remove both matching rows from the text file while removing duplicates. Example:
I have:
email@email.com
test@test.com
email@email.com
I want:
test@test.com
Is there any easier way to achieve this? Please suggest a tool or method to do this, but please consider the text file is larger than 10MB.
In terminal:
cat test | sort | uniq -c | awk '{if ($1 == 1) print $2}'
I use cygwin a lot for such tasks, as the unix command line is incredibly powerful.
Here's how to achieve what you want:
cat main.txt | sort -u | grep -Fvxf sent.txt
sort -u will remove duplicates (by sorting the main.txt file first), and grep will take care of removing the unwanted addresses.
Here's what the grep options mean:
-F plain text search
-v invert results
-x will force the whole line to match the pattern
-f read patterns from the specified file
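A small sketch of the whole pipeline with made-up addresses:

```shell
cd "$(mktemp -d)"
printf 'a@example.com\nb@example.com\na@example.com\nc@example.com\n' > main.txt
printf 'b@example.com\n' > sent.txt

# Deduplicate main.txt, then drop every address listed in sent.txt.
sort -u main.txt | grep -Fvxf sent.txt
# a@example.com
# c@example.com
```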
Oh, and if your files are in the Windows format (CR LF newlines) you'll rather have to do this:
cat main.txt | dos2unix | sort -u | grep -Fvxf <(cat sent.txt | dos2unix)
Just like with the Windows command line, you can simply add:
> output.txt
at the end of the command line to redirect the output to a text file.

Why did I get an 'en' file while using awk to search for the 'em' regex keyword?

I was learning awk regex search.
Under the /etc path I ran this command:
ls -l | awk '/em/ { print $9}'
And the output is this:
aptdaemon
at.deny
emacs
gnome-settings-daemon
systemd
I'm wondering why the 'at.deny' file was shown in the result.
Shouldn't /em/ search only for file names containing 'em'?
It is matching some other column; try searching the desired one only:
ls -l | awk '$9 ~ /em/ { print $9}'
It might be better to use find though:
find . -maxdepth 1 -name '*em*'
Or just:
ls *em* 2>/dev/null
THE PROBLEM
ls -l prints more than just the names of the files in the current directory; if you run ls -l at.deny, I'm sure you will find an 'em' somewhere in that line.
Your current awk command is searching the entire line, not just what happens to be in column $9, and then printing the contents of column $9.
THE SOLUTION
If you just want to search, and print, the specified column, do so with the following:
ls -l | awk '($9 ~ /em/) { print $9 }'
CONCERNS
As expressed by @EdMorton:
you can't guarantee that $9 is the location of the [full] file name in ls output. $NF would be closer but still wouldn't account for file names that contain spaces. The better response is that you shouldn't try to parse the output of ls (google it).
The snippet in question suffers from flaws beyond the most obvious one: if you are trying to find the files in the current directory that match a certain pattern, you should use something like find.
find . -maxdepth 1 -name '*em*'
The above outputs all files in the current directory (.) having em in their name (-maxdepth 1 prevents find from descending into subdirectories). If you'd like to use regular expressions, see the -regex switch in man find.
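A quick sketch (file names made up) showing that find tests the name alone, unlike grepping ls -l output:

```shell
cd "$(mktemp -d)"
touch emacs systemd at.deny

# -maxdepth 1 keeps find in the current directory; -name matches the
# file name only, so at.deny ('en', not 'em') is correctly excluded.
find . -maxdepth 1 -name '*em*' | sort
# ./emacs
# ./systemd
```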

How do I select files 1-88

I have files in a directory named OIS001_OD_EYE_MIC.png through OIS176_OD_EYE_MIC.png.
Extracting numbers 1-99 is easy, as shown by the regex below.
I want 1-88, to divide the directory in half.
Why? So I can have two even sets of files to compress.
ls | sed -n '/^[A-Z]*0[0-9][0-9].*EYE_MIC.png/p'
Here is my attempt at getting 0-99. Can you help me get 1-88, and perhaps 89-176?
You can use a brace-expansion range, {start..end}, like this:
echo OIS00{0..88}_OD_EYE_MIC.png
will expand to
OIS000_OD_EYE_MIC.png OIS001_OD_EYE_MIC.png [...] OIS0087_OD_EYE_MIC.png OIS0088_OD_EYE_MIC.png
Note that a plain {0..88} does not zero-pad, so the two-digit values expand with the wrong width (OIS0010 ... OIS0088 rather than OIS010 ... OIS088); use a zero-padded range such as {001..088} to match the real file names. Look for "Brace Expansion" in bash's man page.
With a new-enough bash:
ls OIS0{01..88}_OD_EYE_MIC.png
With regexes you have to think about how the strings in a given number range look (you can't match numeric ranges directly). For 1-88:
/^[A-Z]*(00[1-9]|0[1-7][0-9]|08[0-8]).*EYE_MIC.png/
For 89-176:
/^[A-Z]*(089|09[0-9]|1[0-6][0-9]|17[0-6]).*EYE_MIC.png/
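A sketch verifying the 1-88 alternation against a few boundary names (using sed -E for extended-regex syntax, which both GNU and BSD sed support):

```shell
cd "$(mktemp -d)"
touch OIS001_OD_EYE_MIC.png OIS088_OD_EYE_MIC.png \
      OIS089_OD_EYE_MIC.png OIS176_OD_EYE_MIC.png

# 00[1-9] covers 1-9, 0[1-7][0-9] covers 10-79, 08[0-8] covers 80-88;
# 089 and 176 fall outside every branch and are excluded.
ls | sed -nE '/^[A-Z]*(00[1-9]|0[1-7][0-9]|08[0-8]).*EYE_MIC\.png/p'
# OIS001_OD_EYE_MIC.png
# OIS088_OD_EYE_MIC.png
```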
Here's a piped parallel alternative:
ls -v | columns --by-columns -c2 | tr -s ' ' \
| tee >(cut -d' ' -f1 | tar cf part1.tar -T -) \
>(cut -d' ' -f2 | tar cf part2.tar -T -) > /dev/null
This method needs more work if the files have whitespace in their names.
The idea is to columnate the file-list and use tee to multiplex it into separate compression processes.
The columns program comes with the autogen package (at least in Debian).
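If the columns tool isn't available, the zero-padded brace ranges from the earlier answer can split the set directly. A sketch, with the 176 files fabricated on the spot (bash 4+ for zero-padded ranges):

```shell
cd "$(mktemp -d)"
# Create 176 dummy files; seq -w zero-pads to the width of 176 (001..176).
for i in $(seq -w 1 176); do touch "OIS${i}_OD_EYE_MIC.png"; done

tar cf part1.tar OIS0{01..88}_OD_EYE_MIC.png   # OIS001 .. OIS088
tar cf part2.tar OIS{089..176}_OD_EYE_MIC.png  # OIS089 .. OIS176

# Each archive holds exactly half of the 176 files.
tar tf part1.tar | wc -l
tar tf part2.tar | wc -l
```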