export filenames to temp file bash - regex

I have a lot of files in multiple directories that all have the following setup for the filename:
prob123456_01
I want to delete the trailing "_01" off of each file name and export them to a temp file. How exactly would I delete the trailing "_01" as well as export? I am rather new to scripting so any help would be greatly appreciated!

As you've tagged the question with bash, I'll assume that you can use globstar:
shopt -s globstar # enable globstar
for f in **_[0-9][0-9]; do echo "${f%_*}"; done > tmp
With globstar enabled, the pattern **_[0-9][0-9] matches any file name ending in _ followed by two digits, in the current directory and any subdirectories. ${f%_*} strips the shortest match of _* from the end of each name, using bash's built-in string manipulation.
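For example, with a name matching the question's pattern:
f=prob123456_01
echo "${f%_*}"   # prints prob123456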
Better yet, as Charles Duffy suggests (thanks), you can use an array instead of a loop:
files=( **_[0-9][0-9] ); printf '%s\n' "${files[@]%_*}"
The array is filled with the filenames that match the same pattern as before. ${files[@]%_*} removes the last part from each element of the array and passes them all as arguments to printf, which prints each result on a separate line.
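To see the per-element expansion in isolation (illustrative names, not real files):
files=( prob111111_01 prob222222_02 )
printf '%s\n' "${files[@]%_*}"   # prints prob111111 and prob222222, one per line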
Either of these approaches is likely to be quicker than using find, as everything is done in the shell, without executing any separate processes.
Previously I had suggested the pattern **_{00..99}, although this is not ideal for a couple of reasons. It is less efficient, as it expands to **_00, **_01, **_02, ..., **_99. Also, any of those 100 patterns that don't match will be included literally in the output unless another option, nullglob, is enabled.
It's up to you whether you use [0-9] or [[:digit:]] but the advantage of the latter is that it matches all characters defined to be a digit, which may vary depending on your locale. If this isn't a concern, I would go with the former.

If I understand you correctly, you want a list of the filenames without the trailing _01. The following would do that:
find . -type f -name '*_01' | sed 's/_01$//' > tmp.lst
find . -type f -name '*_01' searches the current directory, and its descendant directories, for files with names ending in _01.
| is the so-called pipe, handing the results of the left-hand call to the right-hand call.
sed 's/_01$//' removes the _01 from the end of each filename.
> tmp.lst writes the result into the file tmp.lst
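As a quick illustration, given a hypothetical tree containing prob123456_01 and sub/prob654321_01, tmp.lst would end up containing:
./prob123456
./sub/prob654321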
These are all pretty basic parts of working with bash and its kin, so it might be a good idea to look at a tutorial or two and familiarize yourself with these and a few others ;)

Related

For the love of BASH, regex, locate & find - contains A not B

Goal: Regex pattern for use with find and locate that "Contains A but not B"
So I have a bash script that manipulates a few video files.
In its current form, I create a variable to act on later with a for loop that works well:
if [ "$USE_FIND" = true ]; then
vid_files=$(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)")
else
vid_files=$(locate -ir "${DIR}.*\.\(mkv\|avi\|ts\|mp4\|m2ts\)")
fi
So "contains A" is any one of the listed extensions.
I'd like to add a condition where, if a certain string (B) is contained, the file isn't added to the array (it can appear in a directory name or a filename).
I've spent some time with lookaheads trying to implement this, to no avail. So, taking "Robot" as an example of "not contains B", I've used different forms of .*(?!Robot).*
e.g. ".*\(\?\!Robot\).*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" for find but it doesn't work.
I've sort of exhausted regex101.com, the terminal and chmod +x at this point and would welcome some help. I think it's the fact that it's called through a bash script that's causing me the difficulty.
One of my many sources of reference in trying to sort this:
Ref: Is there a regex to match a string that contains A but does not contain B
You may want to avoid using command substitution ($(find ...)) to build a list of files because, while this is admittedly rare, filenames can contain newlines.
You could use an array, which will handle file names without issues (assuming the array is later expanded properly).
declare -a vid_files=()
while IFS= read -r -d '' file
do
! [[ "$file" =~ Robot ]] || continue
vid_files+=("$file")
done < <(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" -print0)
The -print0 option of find generates a null byte to separate the file names, and the -d '' option of read allows a null byte to be used as a record separator (both obviously go together).
You can get the list of files using "${vid_files[@]}" (double quotes are important to prevent word splitting). You can also iterate over the list easily:
for file in "${vid_files[@]}"
do
echo "$file"
done
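As an aside, if the exclusion is a fixed string like Robot, find itself can filter it out before the shell ever sees the names (! negates the following test, and -path matches against the whole path; the regex is the same as above):
find "${DIR}" -type f ! -path '*Robot*' -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" -print0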

bulk file renaming in bash, to remove name with spaces, leaving trailing digits

Can a bash/shell expert help me with this? Each time I use Adobe PDF to split a large PDF file (say its name is X.pdf) into separate pages, where each page is one PDF file, it creates files with this pattern:
"X 1.pdf"
"X 2.pdf"
"X 3.pdf" etc...
The file name "X" above is the original file name, which can be anything. It then adds one space after the name, then the page number. Page numbers always start from 1 and up to how many pages. There is no option in adobe PDF to change this.
I need to run a shell command to simply remove/strip out all the "X " part, and just leave the digits, like this
1.pdf
2.pdf
3.pdf
....
100.pdf ...etc..
Not being good at pattern matching, I'm not sure what regular expression I need.
I know I need something like
for i in *.pdf; do mv "$i" ........; done
And it is the ....... part I do not know how to do.
This only needs to run on Linux/Unix system.
Use sed:
for i in *.pdf; do mv "$i" "$(sed 's/.*[[:blank:]]//' <<< "$i")"; done
And it would be simple with rename:
rename 's/.*\s//' *.pdf
You can remove everything up to (and including) the last space in the variable with this:
${i##* }
That's "star space" after the double hash, meaning "anything followed by space". ${i#* } would remove up to the first space.
So run this to check:
for i in *.pdf; do echo mv -i -- "$i" "${i##* }" ; done
and remove the echo if it looks good. The -i suggested by Gordon Davisson will prompt you before overwriting, and -- signifies end of options, which prevents things from blowing up if you ever have filenames starting with -.
If you just want to do bulk renaming of files (or directories) and don't mind using external tools, then here's mine: rnm
The command to do what you want would be:
rnm -rs '/.*\s//' *.pdf
.*\s selects the part before (and including) the last whitespace and replaces it with an empty string.
Note:
It doesn't overwrite any existing files (it throws a warning if it finds an existing file with the target name).
The operation is also failsafe: you can get back the changes made by the last rnm command with rnm -u.
Here's a list of documents for rnm.

Search for multiline regex

I have several source file which have function definitions as follows.
ReturnType ClassName::
FunctionName(FunctionArgs...)
{
....
}
ReturnType ClassName::NestedClassName::
FunctionName(FunctionArgs...)
{
....
}
I want to grep through the files and list all the functions of the first type separately and all of the second type separately. Is there a way to do it in Emacs?
Note: I have tried C-q C-j from https://emacs.stackexchange.com/questions/9548/what-is-regex-to-match-newline-character. It didn't work.
There seem to be two inter-related pieces to this: the regexps to use, and the search method. I am making the following assumptions (sorry for not clarifying these beforehand; I don't have enough rep to comment):
You are interested in collecting just the function signature (the first two lines), not what's inside the braces.
Colons never appear in return types or class names.
No line ever ends in a double colon inside a function body.
Within Emacs, I can distinguish the two cases with the regexps
^[^:]+::\n.+$
^[^:]+::[^:]+::\n.+$
(where you would replace \n with an actual newline (via C-q C-j, for instance) in an interactive usage).
If you have all of the buffers opened in Emacs, you can just use multi-occur now. Otherwise, you can call out to grep using the grep command, or just use M-! to call grep directly.
If you're using grep, then you need a different approach (or at least I couldn't find an appropriate regexp). If you drop the second line and use the -A 1 switch (which tells grep to print the line following each match), everything seems to work properly. You also need to escape the + operator, because grep uses basic regular expressions by default, in which an unescaped + matches a literal plus sign. Here are the resulting commands:
grep -A 1 "^[^:]\+::$" files
grep -A 1 "^[^:]\+::[^:]\+::$" files

find and replace within file

I have a requirement to search for a pattern which is something like :
timeouts = {default = 3.0; };
and replace it with
timeouts = {default = 3000.0;.... };
i.e. multiply the timeout by a factor of 1000.
Is there any way to do this for all files in a directory?
EDIT:
Please note that some of the files in the directory are symlinks. Is there any way to get this done for symlinks also?
Please note that timeouts also exists as a substring elsewhere in the files, so I want to make sure that only this line gets replaced. Any solution using sed, awk, or perl is acceptable.
Give this a try:
for f in *
do
sed -i 's/\(timeouts = {default = [0-9]\+\)\(\.[0-9]\+;\)\( };\)/\1000\2....\3/' "$f"
done
It will make the replacements in place for each file in the current directory. Some versions of sed require a backup extension after the -i option. You can supply one like this:
sed -i .bak ...
Some versions don't support in-place editing. You can do this:
sed '...' "$f" > tmpfile && mv tmpfile "$f"
Note that this is obviously not actually multiplying by 1000, so if the number is 3.1 it would become "3000.1" instead of 3100.0.
You can do this:
perl -pi -e 's/(timeouts\s*=\s*\{default\s*=\s*)([0-9.-]+)/$1 . $2*1000/e' *
One suggestion for whichever solution above you decide to use - it may be worth it to think through how you could refactor to avoid having to modify all of these files for a change like this again.
Do all of these scripts have similar functionality?
Can you create a module that they would all use for shared subroutines?
In the module, could you have a single line that would allow you to have a multiplier?
For me, anytime I need to make similar changes in more than one file, it's the perfect time to be lazy to save myself time and maintenance issues later.
$ perl -pi.bak -e 's/\w+\s*=\s*{\s*\w+\s*=\s*\K(-?[0-9.]+)/sprintf "%0.1f", 1000 * $1/eg' *
Notes:
The regex matches just the number (see \K in perlre)
The /e means the replacement is evaluated
I include a sprintf in the replacement just in case you need finer control over the formatting
Perl's -i can operate on a bunch of files
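As a quick sanity check on a made-up line (with the literal brace escaped as \{, since recent perls are stricter about unescaped braces in patterns):
$ echo 'timeouts = {default = 3.0; };' | perl -pe 's/\w+\s*=\s*\{\s*\w+\s*=\s*\K(-?[0-9.]+)/sprintf "%0.1f", 1000 * $1/eg'
timeouts = {default = 3000.0; };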
EDIT
It has been pointed out that some of the files are symbolic links. Given that this process is not idempotent (running it twice on the same file is bad), you had better generate a unique list of files in case one of the links points to a file that appears elsewhere in the list. Here is an example with find, though the code for a pre-existing list should be obvious.
$ find -L . -type f -exec realpath {} \; | sort -u | xargs -d '\n' perl ...
(Assumes none of your filenames contain a newline!)

Apply regular expression substitution globally to many files with a script

I want to apply a certain regular expression substitution globally to about 40 Javascript files in and under a directory. I'm a vim user, but doing this by hand can be tedious and error-prone, so I'd like to automate it with a script.
I tried sed, but handling more than one line at a time is awkward, especially if there is no limit to how many lines the pattern might match.
I also tried this script (on a single file, for testing):
ex $1 <<EOF
gs/,\(\_\s*[\]})]\)/\1/
EOF
The pattern will eliminate a trailing comma in any Perl/Ruby-style list, so that "[a, b, c,]" will come out as "[a, b, c]" in order to satisfy Internet Explorer, which, alone among browsers, chokes on such lists.
The pattern works beautifully in vim but does nothing if I run it in ex, as per the above script.
Can anyone see what I might be missing?
You asked for a script, but you mentioned that you are a vim user. I tend to do project-wide find and replace inside of vim, like so:
:args **/*.js | argdo %s/,\(\_\s*[\]})]\)/\1/ge | update
This is very similar to the :bufdo solution mentioned by another commenter, but it will use your args list rather than your buflist (and thus doesn't require a brand new vim session nor for you to be careful about closing buffers you don't want touched).
:args **/*.js - sets your arglist to contain all .js files in this directory and subdirectories
| - pipe is vim's command separator, letting us have multiple commands on one line
:argdo - run the following command(s) on all arguments. It will "swallow" subsequent pipes
% - a range representing the whole file
:s - substitute command, which you already know about
:s_flags, ge - global (substitute as many times per line as possible) and suppress errors (i.e. "No match")
| - this pipe is "swallowed" by the :argdo, so the following command also operates once per argument
:update - like :write but only when the buffer has been modified
This pattern will obviously work for any vim command which you want to run on multiple files, so it's a handy one to keep in mind. For example, I like to use it to remove trailing whitespace (%s/\s\+$//), set uniform line-endings (set ff=unix) or file encoding (set fileencoding=utf-8), and retab my files.
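For instance, a hypothetical whitespace-cleanup pass over the same set of files would be:
:args **/*.js | argdo %s/\s\+$//e | update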
1) Open all the files with vim:
bash$ vim $(find . -name '*.js')
2) Apply substitute command to all files:
:bufdo %s/,\(\_\s*[\]})]\)/\1/ge
3) Save all the files and quit:
:wall
:q
I think you'll need to recheck your search pattern, it doesn't look right. I think where you have \_\s* you should have \_s* instead.
Edit: You should also use the /ge options for the :s... command (I've added these above).
You can automate the actions of both vi and ex by passing the argument +'command' from the command line, which enables them to be used as text filters.
In your situation, the following command should work fine:
find /path/to/dir -name '*.js' | xargs ex +'%s/,\(\_\s*[\]})]\)/\1/g' +'wq!'
You can use a combination of the find command and sed:
find /path -type f -iname "*.js" -exec sed -i.bak 's/,[ \t]*]/]/' {} +
If you are on windows, Notepad++ allows you to run simple regexes on all opened files.
Searching for ,\s*\] and replacing with ] should work for the type of lists you describe.