Linux Command Line Zip with Regex

I have thousands of jpg files that are all called 1.jpg, 2.jpg, 3.jpg and so on. I need to zip up a range of them and I thought I could do this with regex, but so far haven't had any luck.
Here is the command
zip images.zip '[66895-105515]'.jpg
Does anyone have any ideas?

I am fairly sure it is not possible to match number ranges like this with regular expressions (digit ranges, yes, but not whole multi-digit numbers), because regular expressions work at the character level: a bracket expression like [66895-105515] matches a single character, not a number range. However, you can use the "seq" command to generate the list of files and use "xargs" to pass them to "zip":
seq --format %g.jpg 66895 105515 | xargs zip images.zip
I tested the command with a bunch of dummy files under Linux and it works fine.
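If some numbers in that range have no corresponding file, zip will print a warning for each missing name. A small variation, sketched here under the assumption that your zip is Info-ZIP (whose -@ option reads file names from standard input), filters the list down to files that actually exist:
seq --format %g.jpg 66895 105515 | while read -r f; do [ -e "$f" ] && printf '%s\n' "$f"; done | zip images.zip -@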

Use ls in conjunction with the bash range ({m..n}) operator, like this:
ls {66895..105515}".jpg" 2>/dev/null | zip jpegs -@

You need to pipe some stuff - list the files, filter by the regex, zip up each listed file.
ls | grep [66895-10551] | xargs zip images.zip
Edit: Whoops, didn't test with multi-digit numbers. As denisw mentions, this method won't work.

Related

Random sample from regex

I would like to test a tool on a small number of files from a directory. To run the tool on all files in the directory, I would run:
./my-tool input/*.test
However, the tool takes a long time to run and I would like to test it only on a subset of the files in input/. Currently, I am copying a random subset to another folder and using the regex to grab all files from that folder
My question is: is there any way to limit the matches? I.e., a way to run ./my-tool input/[PATTERN].test, where [PATTERN] is a pattern that will expand to only n matches. Even better, is there a way to do that and randomize which ones are returned?
On GNU/Linux you can easily and robustly select a subset of files with shuf:
shuf -ze -n 10 input/*.test | xargs -0 ./my-tool
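For reference, here is the same command with its flags annotated (shuf is part of GNU coreutils):
# -e    : treat each command-line argument as an input line
# -z    : NUL-terminate the output entries, so they pair safely with xargs -0
# -n 10 : emit 10 randomly chosen entries
shuf -ze -n 10 input/*.test | xargs -0 ./my-tool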

Match the string using regex

I have a few files like 123.iso, 234.isoaa, 456.isoab, sajdhsjf.isoaf.
I want to extract all the files except those that end with exactly .iso.
For example, I should get 234.isoaa, 456.isoab, sajdhsjf.isoaf.
Assuming you meant "all the files with suffix beginning with .iso except those...", this works:
ls -1 | egrep "\.iso.+"
Try this (a lookahead that matches any name not ending in exactly .iso):
^(?!.*\.iso$).+$
As Tim Pietzcker noted, you didn't say which shell you need a solution for, but in zsh you could do
setopt local_options extended_glob
echo *^*.iso(N)
If you are happy to get only files which have .isoX at the end (with any X), this should work in bash, zsh and ksh:
echo *.iso?*
Note that this second solution, unlike the first one, would not list files such as abc.txt.
Of course you can use ls -1 instead of the echo; it depends on what you want to do with the result.
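If bash is an option as well, a similar effect is possible with extended globbing; this is a sketch that, like the first zsh pattern, lists every name not ending in .iso (including files such as abc.txt):
shopt -s extglob
printf '%s\n' !(*.iso)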

How to remove lines from a file that don't match a regex?

I have a big file that looks like this:
7f0c41d6-f9c6-47aa-a034-d40bc629c973.csv
159890
159891
24faaed6-62ee-4175-8430-5d73b09911c8.csv
159907
5bad221f-25ef-44fa-9086-fd152e697928.csv
642e4ac3-3d46-4b4c-b5c8-aa2fa54d0b04.csv
d0e145a5-ceb8-4d4b-ae47-11e0c9a6548d.csv
159929
ba678cbd-af57-493b-a69e-e7504b4bc328.csv
7750840f-9bf9-4a68-9f25-a2ba0968d481.csv
159955
159959
And I'm only interested in the *.csv files; can someone show me how to remove the lines that do not end with .csv?
Thank you.
grep "\.csv$" file
will pull out only those lines ending in .csv
Then, if you want to put them in a different file:
grep "\.csv$" file > newfile
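An equivalent with awk, if you prefer it (awk prints every line matching the pattern by default):
awk '/\.csv$/' file > newfile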
sed is your friend:
sed -i.bak '/\.csv$/!d' file
-i.bak : edit in place, creating a backup file with a .bak extension
/\.csv$/!d : delete (!d) every line that does not match /\.csv$/
([0-9a-zA-Z-]*\.csv$)
This is a regex that selects only the filenames ending with the .csv extension.
Hope this will help you.
If you are familiar with the vim text editor (vim or vi is typically installed on many linux boxes), use the following vim Ex mode command to remove lines that don't match a particular pattern:
:v/<pattern>/d
For example, if I wanted to delete all lines that didn't contain "column", I would run:
:v/column/d
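Applied to the file in this question, the command would be the following one-liner, which deletes every line that does not end in .csv:
:v/\.csv$/d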
Hope this helps.
If you do not want to have to save the names of files in another file just to remove the unwanted ones, this may also be a useful solution (understanding that this is an old question).
This single-line for loop reuses the grep "\.csv" approach from the answer above, so you don't need to manage file-name lists saved here or there:
for f in *; do if [ ! "$(echo "${f}" | grep -Eo '\.csv$')" == ".csv" ]; then rm "${f}"; fi; done
And here is a slightly shorter version of the single-line command:
for f in *; do if [ ! "$(echo "${f}" | grep -o '\.csv$')" ]; then rm "${f}"; fi; done
The purpose of using such a loop with a conditional is to guarantee that you only rid yourself of the files you want gone (the non-csv files), and only in the current working directory, without parsing the ls command.
Hopefully this helps you and anyone else looking for a similar solution.

Bash: go through a list of dirs and generate md5 sums

What would be the bash script that:
Goes through a directory and puts all the sub-directories in an array
For each dir, generates an md5 sum of a file inside that dir
Also, the file whose md5sum has to be generated doesn't always have the same name and path. However, the pattern is always the same:
/var/mobile/Applications/{ the dir name here is taken from the array }/{some name}.app/{ binary, whose name is the same as its parent dir, but without the .app extension }
I've never worked with bash before (and have never needed to), so this may be something really simple and nooby. Anybody got an idea? As can be seen from the path, this is designed to be run on an iDevice.
for dir in /var/mobile/Applications/*; do
    for app in "$dir"/*.app; do
        appdirname=${app##*/}        # e.g. "Foo.app" (leading path stripped)
        appname=${appdirname%.app}   # e.g. "Foo" (.app suffix stripped)
        binary="$app/$appname"
        if [ -f "$binary" ]; then
            echo "I: dir=$dir appdirname=$appdirname binary=$binary"
        fi
    done
done
Try this; I hope the code is straightforward. The two things worth explaining are:
${app##*/}, which uses the ## operator to strip off the longest prefix matching the expression */.
${appdirname%.app}, which uses the % operator to strip off the shortest suffix matching the expression .app. (You could have also used %% (strip longest suffix) instead of %, since the pattern .app is always four characters long.)
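To actually generate the checksums instead of just printing the paths, you could swap the echo line for something like this (a sketch, assuming an md5sum binary is installed on the device):
md5sum "$binary"   # prints "<hash>  /var/mobile/Applications/<dir>/<name>.app/<name>"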
Try something like:
ls -1 /Applications/*/Contents/Info.plist | while read name; do md5 -r "$name"; done
The above will show the md5 checksum of every application's Info.plist file, like:
d3bde2b76489e1ac081b68bbf18a7c29 /Applications/Address Book.app/Contents/Info.plist
6a093349355d20d4af85460340bc72b2 /Applications/Automator.app/Contents/Info.plist
f1c120d6ccc0426a1d3be16c81639ecb /Applications/Calculator.app/Contents/Info.plist
Bash is very easy, but you need to know the CLI tools of your system.
To print the md5 hash of all files in a directory recursively:
find /yourdirectory/ -type f | xargs md5sum
If you only want to list the tree of directories:
find /tmp/ -type d
You can generate a list with:
MYLIST=$( find /tmp/ -type d )
Use "for" for iterate the list:
for i in $MYLIST; do
echo $i;
done
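Note that word-splitting a command substitution like this breaks on directory names containing spaces. A more robust sketch, assuming GNU find and bash, streams NUL-delimited names instead:
find /tmp/ -type d -print0 | while IFS= read -r -d '' dir; do
    echo "$dir"
done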
If you are a newbie in bash:
http://tldp.org/LDP/Bash-Beginners-Guide/html/
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html

Apply regular expression substitution globally to many files with a script

I want to apply a certain regular expression substitution globally to about 40 Javascript files in and under a directory. I'm a vim user, but doing this by hand can be tedious and error-prone, so I'd like to automate it with a script.
I tried sed, but handling more than one line at a time is awkward, especially if there is no limit to how many lines the pattern might match.
I also tried this script (on a single file, for testing):
ex $1 <<EOF
gs/,\(\_\s*[\]})]\)/\1/
EOF
The pattern will eliminate a trailing comma in any Perl/Ruby-style list, so that "[a, b, c,]" will come out as "[a, b, c]" in order to satisfy Internet Explorer, which alone among browsers, chokes on such lists.
The pattern works beautifully in vim but does nothing if I run it in ex, as per the above script.
Can anyone see what I might be missing?
You asked for a script, but you mentioned that you are vim user. I tend to do project-wide find and replace inside of vim, like so:
:args **/*.js | argdo %s/,\(\_\s*[\]})]\)/\1/ge | update
This is very similar to the :bufdo solution mentioned by another commenter, but it will use your args list rather than your buflist (and thus doesn't require a brand new vim session nor for you to be careful about closing buffers you don't want touched).
:args **/*.js - sets your arglist to contain all .js files in this directory and subdirectories
| - pipe is vim's command separator, letting us have multiple commands on one line
:argdo - run the following command(s) on all arguments. It will "swallow" subsequent pipes
% - a range representing the whole file
:s - substitute command, which you already know about
:s_flags, ge - global (substitute as many times per line as possible) and suppress errors (i.e. "No match")
| - this pipe is "swallowed" by the :argdo, so the following command also operates once per argument
:update - like :write but only when the buffer has been modified
This pattern will obviously work for any vim command which you want to run on multiple files, so it's a handy one to keep in mind. For example, I like to use it to remove trailing whitespace (%s/\s\+$//), set uniform line-endings (set ff=unix) or file encoding (set fileencoding=utf-8), and retab my files.
1) Open all the files with vim:
bash$ vim $(find . -name '*.js')
2) Apply substitute command to all files:
:bufdo %s/,\(\_\s*[\]})]\)/\1/ge
3) Save all the files and quit:
:wall
:q
I think you'll need to recheck your search pattern; it doesn't look right. Where you have \_\s* you should have \_s* instead.
Edit: You should also use the /ge options for the :s... command (I've added these above).
You can automate the actions of both vi and ex by passing the argument +'command' from the command line, which enables them to be used as text filters.
In your situation, the following command should work fine:
find /path/to/dir -name '*.js' | xargs ex +'%s/,\(\_\s*[\]})]\)/\1/g' +'wq!'
You can use a combination of the find command and sed:
find /path -type f -iname "*.js" -exec sed -i.bak 's/,[ \t]*]/]/' "{}" +
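Note that this sed expression only removes the comma when the closing bracket is on the same line. If your sed is GNU sed 4.2.2 or later, the -z option reads the whole file as a single NUL-delimited record, which lets the pattern span newlines; here is a sketch that also covers ], }, and ):
find /path -type f -iname "*.js" -exec sed -z -i.bak 's/,\([[:space:]]*[]})]\)/\1/g' "{}" +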
If you are on Windows, Notepad++ allows you to run simple regexes on all opened files.
Search for ,\s*\] and replace with ]; that should work for the type of lists you describe.