How do I select files 1-88 with a regex?

I have files in a directory named OIS001_OD_EYE_MIC.png through OIS176_OD_EYE_MIC.png.
I want to select files 1-88 to divide the directory in half. Why? So I can have two even sets of files to compress.
Matching numbers 000-099 is easy, as shown by my attempt:
ls | sed -n '/^[A-Z]*0[0-9][0-9].*EYE_MIC.png/p'
Can you help me get 1-88, and perhaps 89-176?

You can use a brace-expansion range, {start..end}, like this:
echo OIS00{0..88}_OD_EYE_MIC.png
which expands to
OIS000_OD_EYE_MIC.png OIS001_OD_EYE_MIC.png [...] OIS0087_OD_EYE_MIC.png OIS0088_OD_EYE_MIC.png
Note that {0..88} is not zero-padded, so the two-digit numbers expand to OIS0010… rather than OIS010…; pad the endpoints as in the next answer to match your three-digit names. Look for "Brace Expansion" in bash's man page.

With a new-enough bash (4.0 or later, which zero-pads the expansion when an endpoint has a leading zero):
ls OIS0{01..88}_OD_EYE_MIC.png
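Since the goal is two even halves to compress, here is a minimal sketch that builds both archives directly with zero-padded brace expansion (assuming bash 4+, that all 176 files exist, and hypothetical archive names part1.tar/part2.tar):
tar cf part1.tar OIS{001..088}_OD_EYE_MIC.png
tar cf part2.tar OIS{089..176}_OD_EYE_MIC.png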

With regexes you have to think about what the strings for a given number range look like (you can't match numeric ranges directly). For 1-88:
/^[A-Z]*(00[1-9]|0[1-7][0-9]|08[0-8]).*EYE_MIC.png/
For 89-176:
/^[A-Z]*(089|09[0-9]|1[0-6][0-9]|17[0-6]).*EYE_MIC.png/
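Note the alternation needs extended regexes, so use sed -E (or grep -E). A hedged sketch that feeds the first half straight into tar (assuming GNU tar's -T - to read the file list from stdin):
ls | sed -nE '/^[A-Z]*(00[1-9]|0[1-7][0-9]|08[0-8])_OD_EYE_MIC\.png$/p' | tar cf part1.tar -T -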

Here's a piped parallel alternative:
ls -v | columns --by-columns -c2 | tr -s ' ' \
| tee >(cut -d' ' -f1 | tar cf part1.tar -T -) \
>(cut -d' ' -f2 | tar cf part2.tar -T -) > /dev/null
This method needs more work if the file names contain whitespace. The idea is to columnate the file list and use tee to multiplex it into separate archiving processes.
The columns program comes with the autogen package (at least in Debian).
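A whitespace-safer sketch in plain bash, assuming no file name contains a newline: glob into an array and split it in half by index.
files=( OIS*_OD_EYE_MIC.png )          # globs sort, and zero-padding keeps numeric order
half=$(( ${#files[@]} / 2 ))
tar cf part1.tar -- "${files[@]:0:half}"
tar cf part2.tar -- "${files[@]:half}"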


How to find specific text in a text file, and append it to the filename?

I have a collection of plain-text files named yymmdd_nnnnnnnnnn.txt, and I want to append another number sequence to each filename, so that each becomes yymmdd_nnnnnnnnnn_iiiiiiiii.txt instead, where iiiiiiiii is taken from the one line in each file that ends with the text "GST: 123456789⏎" (or similar). While I am sure there will be only one such matching line in each file, I don't know which line it will be on.
I need an elegant one-liner that I can run from a bash script over the collection of files in a folder, renaming each file by appending its specific GST number, as found within the file itself.
Before even getting to the renaming stage, I have encountered a problem. Here is what I tried, which didn't work:
# awk '/\d+$/' | grep -E 'GST: ' 150101_2224567890.txt
The grep command alone works perfectly to find the relevant line within the file, but the awk doesn't return just the final group of digits: it fails with the warning "regexp escape sequence \d is not a known regexp operator". I had assumed that this regex would return any run of digits at the end of the line. The text file in question contains a line which ends with "GST: 112060340⏎". Can someone please show me how to make this work, and maybe also help with the appropriate code to move the collection of files to their new filenames? Thanks.
Thanks to a comment from @Renaud, I now have the following code working to obtain just the GST registration number from within a text file, which puts me a step closer to a workable solution.
awk '/GST: / {printf $NF}' 150101_2224567890.txt
I still need to loop this over the collection instead of specifying one filename, and to use the output from @Renaud's contribution to rename the files. I'm getting closer to a working solution, thanks!
This awk should work for you:
awk '$1=="GST:" {fn=FILENAME; sub(/\.txt$/, "", fn); print "mv", FILENAME, fn "_" $2 ".txt"; nextfile}' *_*.txt | sh
To make it more readable:
awk '$1 == "GST:" {
fn = FILENAME
sub(/\.txt$/, "", fn)
print "mv", FILENAME, fn "_" $2 ".txt"
nextfile
}' *_*.txt | sh
Remove | sh from above to see all mv commands together.
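For the sample file from the question, the emitted command would look like this (hypothetical, using the GST number quoted above):
mv 150101_2224567890.txt 150101_2224567890_112060340.txt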
You may try:
for f in *_*.txt; do echo mv "$f" "${f%.txt}_$(sed '/.*GST: /!d; s///; q' "$f").txt"; done
Here the empty s/// reuses the preceding /.*GST: / address as its pattern, stripping everything up to the number, and q quits after printing that first match. Drop the echo if you're satisfied with the output.
As you are sure there is only one matching line, you can try:
$ n=$(awk '/GST:/ {print $NF}' 150101_2224567890.txt)
$ mv 150101_2224567890.txt "150101_2224567890_$n.txt"
Or, for all .txt files:
for f in *.txt; do
    n=$(awk '/GST:/ {print $NF}' "$f")
    if [[ -z "$n" ]]; then
        printf '%s: GST not found\n' "$f"
        continue
    fi
    mv "$f" "${f%.txt}_$n.txt"
done
Another solution to consider, though perhaps not so elegant:
for original_filename in *_*.txt; do
    new_filename=${original_filename%.txt}_$(
        grep -E 'GST: ' "$original_filename" |
        sed -E 's/.*GST//g; s/[^0-9]//g'
    ).txt &&
    mv "$original_filename" "$new_filename"
done
Output:
150101_2224567890_123456789.txt
If you are open to a multi-line script:
#!/bin/sh
for f in *.txt; do
    prefix=$(echo "${f}" | sed 's#\.txt$##')
    cp "${f}" f1
    sed -i 's#GST#%GST#' ./f1          # mark GST so it can be split onto its own line
    tr '%' '\n' < ./f1 > f2            # break the line just before GST
    number=$(sed -n '/GST/p' ./f2 | cut -d':' -f2 | tr -d ' ')
    newname="${prefix}_${number}.txt"
    mv -v "${f}" "${newname}"
    rm -v ./f1 ./f2
done
In general, if you want to make your files easy to work with, leave as many places as possible where they can be split on newlines. It is much easier to alter files by putting what you want to delete or print on its own line than it is to search for things horizontally with regular expressions.
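As a tiny illustration of that idea (hypothetical input): once each token sits on its own line, grep can pick out the number without any horizontal regex work.
printf 'Invoice total due. GST: 112060340\n' | tr ' ' '\n' | grep '^[0-9][0-9]*$'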

Using sed to trim the beginning of stdout

I'm writing a small script to list all the directories being shared on a macOS system. macOS has a simple tool called sharing; sharing -l lists the shares, and combining it with grep as sharing -l | grep path gives just the path lines. The problem is the output looks like this:
path: /Volumes/Storage A/File Server/
and I need it to look like this instead
/Volumes/Storage\ A/File\ Server/
So the white spaces need to be escaped, and the leading path: together with the whitespace after it needs to be trimmed. I've been messing about with sed for hours now, but I just don't know enough about it to do this all in one command. I'm hoping to append something to the end of sharing -l | grep path.
You may use this:
sharing -l | sed -En '/^path:/{ s/^path:[[:blank:]]*//; s/[[:blank:]]+/\\&/g; p;}'
You could also try the following awk; note that it hardwires the two embedded spaces of the sample path (it appends a backslash to fields 2 and 3), so paths with a different number of spaces would need adjusting:
sharing -l | awk '{$2=$2"\\";$3=$3"\\";sub(/^path: +/,"")} 1'
If you don't need the white spaces escaped:
$ sharing -l | sed -n 's/^path:[[:space:]]*//p'
/Volumes/Storage A/File Server/
and if you do:
$ sharing -l | awk 'sub(/^path:[[:space:]]*/,""){gsub(/[[:space:]]/,"\\\\&"); print}'
/Volumes/Storage\ A/File\ Server/
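A pure-bash sketch using parameter expansion instead of sed/awk (assuming one path: line per share with a single space after the colon, as in the sample):
while IFS= read -r line; do
    p=${line#path: }               # strip the leading "path: " prefix
    printf '%s\n' "${p// /\\ }"    # escape every remaining space
done < <(sharing -l | grep '^path:')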

How can I cut and count strings in a plain text document?

I have a large plain-text document (its content was shown in a picture, not reproduced here); each line begins with a file path. My attempt:
cat textplain.txt | grep '^\.[\/[:alpha:]]*[\.\:][[:alpha:]]*'
I want the output to look like this:
./external/selinux/libsepol/src/mls.c
./external/selinux/libsepol/src/handle.c
./external/selinux/libsepol/src/constraint.c
./external/selinux/libsepol/src/sidtab.c
./external/selinux/libsepol/src/nodes.c
./external/selinux/libsepol/src/conditiona.c
Question:
What should I do?
Just regenerate the list with:
grep -lr des ./android/source/code
-l only lists the files with matches, without showing their contents
-r is still needed to search subdirectories
-n has no influence on -l, so it can be omitted. -c instead of -l would add the number of matching lines to each file name, but then you'll probably want to pipe through grep -v to skip the zero-count files, as sketched below.
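For instance, the count variant from the last point might look like this (a sketch, assuming the same des search as above):
grep -rc des ./android/source/code | grep -v ':0$'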
Or, use cut and sort -u:
cut -d: -f1 textplain.txt | sort -u
-d: delimit columns by :
-f1 only output the first column
-u output unique lines
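An equivalent awk sketch (assuming the paths themselves never contain a colon), which also preserves the order of first appearance:
awk -F: '!seen[$1]++ {print $1}' textplain.txt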

How to split, map, and join in Bash?

I want to create a simple regular expression to match some files. The command npm ls --dev --parseable prints out a bunch of files, for example:
/Users/chetcorcos/code/dev-tool/node_modules/fsevents/node_modules/tough-cookie
/Users/chetcorcos/code/dev-tool/node_modules/fsevents/node_modules/tunnel-agent
/Users/chetcorcos/code/dev-tool/node_modules/fsevents/node_modules/rimraf
/Users/chetcorcos/code/dev-tool/node_modules/fsevents/node_modules/rimraf/node_modules/glob
/Users/chetcorcos/code/dev-tool/node_modules/fsevents/node_modules/rimraf/node_modules/glob/node_modules/inflight
/Users/chetcorcos/code/dev-tool/node_modules/fsevents/node_modules/rimraf/node_modules/glob/node_modules/inflight/node_modules/wrappy
I want to get back a string that looks something like this:
tough-cookie|tunnel-agent|rimraf|inflight|wrappy
To get this, I want to "split by newline, map over basename, and join with a pipe". In JavaScript with Ramda, I'd do something like this:
R.pipe(R.split('\n'), R.map(R.split('/')), R.map(R.nth(-1)), R.join('|'))
Any ideas how to do something like this in bash? What's the idiomatic way of doing it?
Bash doesn't have functional programming primitives built in. It's possible to build them in a hundred lines of code or so, but it's not particularly worth it for this kind of use case.
Consider:
content=$(npm ls --dev --parseable | sed -e 's#.*/##' | paste -s -d '|')
echo "$content"
...this routes the stdout of NPM into sed, telling it to replace everything up to the last slash in each line with an empty string, and then routing the stdout of sed into paste, using that to combine all lines into a single string with | separating them.
Alternately, to use no tools not built into bash itself (other than your data source, npm):
#!/bin/bash
# note that this requires bash 4.0 or later
mapfile -t lines < <(npm ls --dev --parseable) # read content into array
lines=( "${lines[#]##*/}" ) # trim everything prior to last / in each
(IFS='|'; printf '%s\n' "${lines[*]}") # emit array as a single string with |s
You could just pipe that thing to awk and have awk pick off the last element:
npm ls --dev --parseable | awk -F"/" '{output=output $NF "|"} END { sub(/[|]+$/, "", output); print output }'
That awk script splits incoming records on /, appends the last element $NF to the variable output with a pipe as delimiter, then, once complete, strips the trailing pipe using sub and prints the result.
You already have a 'list' of strings, separated by '\n'. Just map basename over each item (using xargs); then you'll get a list of basenames separated by '\n', plus a final '\n'. Then replace each '\n' with a '|' symbol:
anycmd | xargs -r -n1 basename | tr '\n' '|'
You may then remove the trailing '|' with either sed or a second xargs:
anycmd | xargs -r -n1 basename | tr '\n' '|' | sed 's/|$//'
or
anycmd | xargs -r -n1 basename | xargs | tr ' ' '|'
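If your basename comes from GNU coreutils, its -a flag accepts multiple names at once, which avoids spawning one process per path; a sketch under that assumption:
npm ls --dev --parseable | xargs -r basename -a | paste -sd'|'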

Using sed/awk and regex to process logs

I have thousands of log files generated by a very verbose PHP script. The general structure is as follows:
###Unknown number of lines, which I want to ignore###
=================================================
$insert_vars['cdr_pkey']=17568
$id<TAB>$g1<TAB>$i1<TAB>$rating1<TAB>$g2<TAB>$i2<TAB>$rating2 #<TAB>more $gX,$iX,$ratingX
#numerical values of $id $g1 $i1 etc., separated by tabs
#numerical values of ---""---
#I do not know how many lines there will be (the unique column is $id)
=================================================
###Unknown number of lines, which I want to ignore###
I have to process these log files, create an Excel-readable sheet (I am thinking CSV, or rather tab-separated, format), and report the data back. I am really bad at Excel, but I thought of outputting something like:
cdr_pkey<TAB>id<TAB>g1<TAB>i1<TAB>rating1<TAB>g2<TAB>i2<TAB>rating2 #and so on
17568<TAB>1349<TAB>0.0004532<TAB>0.01320<TAB>2.014E-4<TAB>...#rest of numerical values
17568<TAB>1364<TAB>...#values for id=1364
17568<TAB>1321<TAB>...#values for id=1321
...
17569<TAB>1048<TAB>...#values for id=1048
17569<TAB>1426<TAB>...#values for id=1426
...
...
So cdr_pkey is the unique column in the sheet, and for each $cdr_pkey there are multiple $ids, each having its own set of $g1, $i1, $rating1, and so on.
After testing, such a format can be read by Excel. Now I just want to extend it to all those thousands of files.
I am just not sure how to proceed further. What's the next step?
The following bash script does something that might be related to what you want. It is parameterized by what you meant when you said <TAB>: I assume you mean the ASCII tab character, but if your logs are so verbose that they spell out <TAB>, you will need to modify the variable $WHAT_DID_YOU_MEAN_BY_TAB accordingly. Note that there is very little about this script that does The Right Thing™; it reads the entire file into a string variable, which might not even be possible depending on how big your log files are. On the up side, the script could easily be modified to make two passes instead, if you think that's better.
#!/bin/bash
WHAT_DID_YOU_MEAN_BY_TAB='\t'
if [[ $# -ne 1 ]] ; then echo "Requires one argument: the file to process" ; exit 1 ; fi
FILENAME="$1"
# Grab the block between the ==== separators, then drop the separators themselves
# (head -n -1 is a GNU extension).
RELEVANT=$(sed -n '/^==*$/,/^==*$/p' "$FILENAME" | sed '1d' | head -n '-1')
# Pull the value after = on the $insert_vars['cdr_pkey'] line.
CDR_PKEY=$(echo "$RELEVANT" | \
    grep '$insert_vars\['"'cdr_pkey'\]" | \
    sed 's/.*=\(.*\)/\1/')
# Drop the cdr_pkey line and the header line, then prefix every data row with the key
# (\0 in a GNU sed replacement stands for the whole match, like &).
echo "$RELEVANT" | sed '1,2d' | \
    sed "s/.*/${CDR_PKEY}$WHAT_DID_YOU_MEAN_BY_TAB\0/"
The following find command is an example use, but your case will depend on how your logs are organized:
find . -name 'LOG_PATTERN' -exec THIS_SCRIPT '{}' \;
Lastly, I have ignored the issue of putting the CSV headers on the output; this is easily done out-of-band.
(Edit: updated the script to reflect discussion in the comments.)
EDIT: James tells me that changing the sed in the last echo from '1d' to '1,2d' and dropping the grep -v 'id' should do the trick. Confirmed that it works, so I've changed it below. Thanks again to James Wilcox.
Based on @James's script, this is what I came up with; the earlier version just piped the final echo to grep -v 'id' (kept below as a comment).
WHAT_DID_YOU_MEAN_BY_TAB='\t'
if [[ $# -lt 1 ]] ; then echo "Requires at least one argument: the files to process" ; exit 1 ; fi
echo -e "key\tid\tg1\ti1\td1\tc1\tr1\tg2\ti2\td2\tc2\tr2\tg3\ti3\td3\tc3\tr3"
for i in "$@"
do
    FILENAME="$i"
    RELEVANT=$(sed -n '/^==*$/,/^==*$/p' "$FILENAME" | sed '1d' | head -n '-1')
    CDR_PKEY=$(echo "$RELEVANT" | \
        grep '$insert_vars\['"'cdr_pkey'\]" | \
        sed 's/.*=\(.*\)/\1/')
    echo "$RELEVANT" | sed '1,2d' | \
        sed "s/.*/${CDR_PKEY}$WHAT_DID_YOU_MEAN_BY_TAB\0/"
    # the version with grep looked like:
    # echo "$RELEVANT" | sed '1d' | \
    #     sed "s/.*/${CDR_PKEY}$WHAT_DID_YOU_MEAN_BY_TAB\0/" | grep -v 'id'
done
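A hypothetical invocation, assuming the script above is saved as process_logs.sh and the logs end in .log:
./process_logs.sh logs/*.log > report.tsv
Excel opens tab-separated files directly, so the <TAB>-delimited output imports without a separate CSV conversion step.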