How to sort output of "s3cmd ls" - regex

Amazon "s3cmd ls" takes like this output:
2010-02-20 21:01 1458414588 s3://file1.tgz.00
2010-02-20 21:10 1458414527 s3://file1.tgz.01
2010-02-20 22:01 1458414588 s3://file2.tgz.00
2010-02-20 23:10 1458414527 s3://file2.tgz.01
2010-02-20 23:20 1458414588 s3://file2.tgz.02
How can I select all the parts of an archive (ending in 00 ... XX) belonging to the fileset with the latest date?
The date and time are not sorted.
Bash? A regexp?
Thanks!

s3cmd ls s3://bucket/path/ | sort -k1,2
This will sort by date ascending.

DATE=$(s3cmd ls | sort -n | tail -n 1 | awk '{print $1}')
s3cmd ls | grep "$DATE"
Sorting as a number should put the newest dates last. tail -n 1 takes the last line, and awk cuts out the first word, which is the date. Use that to get all entries of that date.
But maybe I didn't understand the question, so you may have to rephrase it. You tell us the date and time are not sorted, yet you provide an example where they are sorted; you ask for the latest date, but all entries have the same date.
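If the goal is every part of the newest fileset rather than just the lines sharing the newest date, one approach is to take the newest object's name, strip its numeric part suffix, and match on that prefix. A sketch (s3://bucket/path/ is a placeholder; it assumes the ISO-style date and time columns sort correctly as plain text and that the object name is the fourth field):
# Newest object, e.g. s3://file2.tgz.02
LATEST=$(s3cmd ls s3://bucket/path/ | sort -k1,2 | tail -n 1 | awk '{print $4}')
# Strip the part suffix to get the fileset base, e.g. s3://file2.tgz
BASE=${LATEST%.*}
# List every part of that fileset (literal prefix match on field 4)
s3cmd ls s3://bucket/path/ | awk -v base="$BASE" 'index($4, base ".") == 1'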

Try
The command syntax is s3cmd ls s3://path/of/the/bucket/files | sort
s3cmd ls s3://file1.tgz.00 | sort
It will sort by date, with the latest date at the end.

Simply append | sort to your s3cmd ls command to sort by date. The most recent file will appear right above your command line.
s3cmd ls s3://path/to/file | sort

Related

Filter a list of folders obtained via ls using a regular expression

Context
I have an ls command which gives me a list of folder basenames, as follows:
INPUT:
bash$ ls -d /nfs_archivedbuilds/build/mx/${VERSION_NAME}/${OPERATING_SYSTEM}/* | xargs -n1 basename
OUTPUT:
4750070-190311-0913-3603182
4761979-190319-SHELVE-3617880
4763232-190319-2049-3618496
4763232-190320-SHELVE-3619115
4764259-190320-1402-3619606
4764259-190320-cifx-6274238
4764339-190320-2049-3620637
4764339-190320-SHELVE-3620115
4764339-190320-cifx-6274274
These folders are ordered from the oldest (first result) to the newest (last result).
I have logic in place which starts by checking the newest: if it's good (I make some checks on the content of the folder) I want to keep it; otherwise I want to analyze the second-newest (and so on).
In order to do this, I start by getting the tail -1:
SETUPS_CONTROL=1
MY_SETUP=$(ls -d /nfs_archivedbuilds/build/mx/${VERSION_NAME}/${OPERATING_SYSTEM}/* | xargs -n1 basename | tail -${SETUPS_CONTROL})
... doing some stuff to check
... and if it ends up not being good, I increase SETUPS_CONTROL and fetch MY_SETUP again, this time with the tail one line deeper.
Question
I would like to keep the same logic, but be able to filter out, already in my ls command, every folder whose name is not made only of numbers and dashes.
For example:
The folder 4750070-190311-0913-3603182 would be good for me, because it's only numbers and dashes
The folder 4761979-190319-SHELVE-3617880 would not be good for me, because it contains a word (SHELVE) and it's not only made of numbers and dashes.
Currently, inside my loop, I'm forced to run a regex on each result I obtain to determine whether it's good or not:
if [[ ${MY_SETUP} =~ ^[0-9-]+$ ]]
then
#my setup might be good
else
#my setup is not good already, no need to further my checks
fi
This works, but I was wondering whether I could filter the list directly in my ls command instead of fetching everything and looping to work out whether each entry is good or not.
My attempt
I have tried to pipe a grep "my regex expression" into the command:
ls -d /nfs_archivedbuilds/build/mx/${VERSION_NAME}/${OPERATING_SYSTEM}/* | xargs -n1 basename | grep "^[0-9-]+$"
... but it returns an empty result.
Same with single quote:
ls -d /nfs_archivedbuilds/build/mx/${VERSION_NAME}/${OPERATING_SYSTEM}/* | xargs -n1 basename | grep '^[0-9-]+$'
Can anyone please help?
You are missing -E: grep uses basic regular expressions by default, where + is a literal character rather than a repetition operator. Please add grep -E (or escape it as \+).
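For example, the original pipeline works once -E is added:
ls -d /nfs_archivedbuilds/build/mx/${VERSION_NAME}/${OPERATING_SYSTEM}/* \
    | xargs -n1 basename | grep -E '^[0-9-]+$'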

Find all PDFs in a folder with a given year in the filename

I have a folder with thousands of pdf named according to dates like 20100820.pdf or 20000124.pdf etc.
On the command line, I have used the following command in other projects to find all PDFs in a folder and attach a command to them, like so: ls | grep -E "\.pdf$" | [command here]. Now I would like it to match only the PDFs in a given folder from, for example, the year 2010. How can I achieve that?
Wow, it was so easy in the end.
This solves the problem:
ls | grep -E "\.pdf$" | grep -E "2010"
ls | grep -E "\.pdf$" | awk -F "." '{if ($1>20100000) print $1}'
This command takes all the PDFs and splits each filename on the dot; the leading digits are then compared with 20100000. If greater, it prints the name (note that this matches 2010 and every later year, and prints the name without the .pdf extension).
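If the names follow the YYYYMMDD.pdf pattern exactly, anchoring the year avoids accidental matches elsewhere in the name. A sketch:
ls | grep -E '^2010[0-9]{4}\.pdf$'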

How do I select files 1-88

I have files in a directory named OIS001_OD_EYE_MIC.png through OIS176_OD_EYE_MIC.png.
Extracting numbers 1-99 is easy, as shown by the regex below.
I want 1-88, to divide the directory in half.
Why? So I can have two even sets of files to compress.
ls | sed -n '/^[A-Z]*0[0-9][0-9].*EYE_MIC.png/p'
That is my attempt at getting 0-99. Can you help me get 1-88, and perhaps 89-176?
You can use a range: {start..end} like this:
echo OIS{001..088}_OD_EYE_MIC.png
will expand to
OIS001_OD_EYE_MIC.png OIS002_OD_EYE_MIC.png [...] OIS087_OD_EYE_MIC.png OIS088_OD_EYE_MIC.png
(the leading zeros in the range force zero-padded results)
Look for Brace expansion in bash's man page
With a new-enough bash:
ls OIS0{01..88}_OD_EYE_MIC.png
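Either expansion can feed the compression step directly. A sketch, splitting the set into the two halves the question asks for (the tar invocations are illustrative):
tar czf part1.tar.gz OIS0{01..88}_OD_EYE_MIC.png
tar czf part2.tar.gz OIS{089..176}_OD_EYE_MIC.png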
With regexes you have to think about how the strings of certain number ranges look (you can't just match specific number ranges directly). 1-88:
/^[A-Z]*(00[1-9]|0[1-7][0-9]|08[0-8]).*EYE_MIC.png/
For 89 - 176:
/^[A-Z]*(089|09[0-9]|1[0-6][0-9]|17[0-6]).*EYE_MIC.png/
Here are some more examples.
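Plugged back into the question's sed pipeline, these become (note the -E: the alternations need extended regular expressions):
ls | sed -nE '/^[A-Z]*(00[1-9]|0[1-7][0-9]|08[0-8]).*EYE_MIC\.png/p'   # 1-88
ls | sed -nE '/^[A-Z]*(089|09[0-9]|1[0-6][0-9]|17[0-6]).*EYE_MIC\.png/p'   # 89-176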
Here's a piped parallel alternative:
ls -v | columns --by-columns -c2 | tr -s ' ' \
| tee >(cut -d' ' -f1 | tar cf part1.tar -T -) \
>(cut -d' ' -f2 | tar cf part2.tar -T -) > /dev/null
This method needs more work if the files have whitespace in their names.
The idea is to columnate the file-list and use tee to multiplex it into separate compression processes.
The columns program comes with the autogen package (at least in Debian).

Git log stats with regular expressions

I would like to do some stats on my git log to get something like:
10 Daniel Schmidt
5 Peter
1 Klaus
The first column is the count of commits and the second the committer.
I already got as far as this:
git log --raw |
grep "^Author: " |
sort |
uniq -c |
sort -nr |
less -FXRS
The interesting part is the
grep "^Author: "
which I wanted to modify with a nice regex to exclude the mail address.
With Rubular something like this http://rubular.com/r/mEzP2hFjGb worked, but if I insert it into the grep (or pipe it through another one), it doesn't give me the right output.
Side question: Is there a way to get the count and the author separated by something other than whitespace while staying in this pipe-command style? I would like a nicer separator between the two so I can use column later (and maybe some color ^^)
Thanks a lot for your help!
Google git-extras. It has a git summary that does this.
git shortlog -n -s gets you the same data. On the git repository itself, for example (piped to head to show just the top entries):
$ git shortlog -n -s | head -4
11129 Junio C Hamano
1395 Shawn O. Pearce
1103 Linus Torvalds
896 Jeff King
To get a different delimiter, you could pipe it to awk:
$ git shortlog -n -s | awk 'BEGIN{OFS="|";} { $1=$1; print $0 }' | head -4
11129|Junio|C|Hamano
1395|Shawn|O.|Pearce
1103|Linus|Torvalds
896|Jeff|King
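The $1=$1 rebuild replaces every field separator, which is why the name gets split too. To change only the separator between the count and the name, a sketch (assuming git shortlog separates them with a tab):
git shortlog -n -s | sed -e 's/^ *//' -e 's/\t/|/' | head -4
which yields lines like 11129|Junio C Hamano.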
You can get the full power of PCRE (which should match your experiments with Rubular) with a perl one-liner:
perl -ane 'print if /^Author: /'
Just extend that pattern as necessary.
To reformat you can use awk (e.g. awk '{printf "%5d\t%s\n", $1, $2}')
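Putting the pieces together, a sketch that stays in the pipe style, strips the email address, and emits | as the separator (it assumes the standard Author: Name <email> line format in git log output):
git log |
perl -ne 'print "$1\n" if /^Author: (.*) </' |
sort | uniq -c | sort -nr |
awk '{count=$1; $1=""; sub(/^ +/,""); printf "%d|%s\n", count, $0}' |
less -FXRS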

How to count the number of lines of a file group?

I want to count the total number of lines in all the log files for a specified month. So far I get those files with the following command:
ls localhost_access_log.[0-9][0-9]-11-11*
An example of this file name is localhost_access_log.10-10-11.log
I tried to use
ls localhost_access_log.[0-9][0-9]-11-11* | wc -l
but this gives me the number of files matched by ls, not what I want. I want the sum of all the lines in these files. Thanks
This will give you the total number of lines in all files -
wc -l localhost_access_log.[0-9][0-9]-11-11* | awk 'END{print $1}'
Even better, let awk count the lines itself (NR at the end is the total number of records read across all the input files) -
awk 'END{print NR}' localhost_access_log.[0-9][0-9]-11-11*
Try:
wc -l localhost_access_log.[0-9][0-9]-11-11*
This prints a count per file followed by a combined total line.
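If only the grand total is needed, concatenating the files first avoids the per-file breakdown. A sketch:
cat localhost_access_log.[0-9][0-9]-11-11* | wc -l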