script to rename filenames containing backslash - regex

I am trying to write a shell script to rename the files extracted from 7-zipped folders. The resulting filenames contain a backslash (\).
I wrote a simple script:
#! /bin/sh
for n in *; do
OldName=$n
NewName=`echo "$n" | tr -s '\' "#" | tr -s " " "_"`
echo $NewName
mv "$OldName" "$NewName"
done
The problem I have is that \01 is interpreted by echo, and my files are :
FLD\01.02.2015 thefile.pdf
Thus, echo "FLD\01.02.2015 thefile.pdf" returns FLD?.2015.
I have tried various replacement approaches: s/\/#/g, sed, tr.
I also tried using printf instead of echo.
I have searched all over the net without finding a valid solution.
Nothing works. I need a solution that works on Unix and Mac OS X.
The only "working" solution would be to run
ls > liste.txt
sed -e 's/\\/,/g' liste.txt
and then parse liste.txt, escape the backslash, generate a rename.sh and execute it. But this seems really dirty to me.
Does anyone have a suggestion?

If you care about literal data, don't use echo. Its results are not well defined.
You can instead use printf, also POSIX, which doesn't mangle data.
NewName=$(printf '%s' "$n" | tr -s '\\' '#' | tr -s ' ' '_')
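Put together, the printf approach could look like the sketch below. The helper names sanitize_name and rename_all are made up for this illustration, and it assumes you want # for backslashes and _ for spaces, as in the question. It sticks to POSIX sh features, so it should behave the same on Linux and Mac OS X.

```shell
#!/bin/sh
# sanitize_name and rename_all are hypothetical helper names, not from the question.

# Print the cleaned-up version of a filename without going through echo,
# so sequences such as \01 are never interpreted.
sanitize_name() {
    printf '%s' "$1" | tr -s '\\' '#' | tr -s ' ' '_'
}

# Rename every entry in the current directory whose cleaned name differs.
rename_all() {
    for old in *; do
        [ -e "$old" ] || continue            # glob matched nothing
        new=$(sanitize_name "$old")
        [ "$old" = "$new" ] && continue      # nothing to change
        mv -- "$old" "$new"
    done
}
```

Call rename_all from inside the extracted folder. Putting echo in front of mv first gives a dry run.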

Instead of using echo to rename the filename you can do this (in Bash):
NewName="${OldName//\\/}"
Example:
OldName="FLD\01.02.2015 thefile.pdf"
NewName="${OldName//\\/}"
echo "$NewName"
It will print out:
FLD01.02.2015 thefile.pdf
Note: You will need to change the shebang to #!/bin/bash to use bash parameter expansion.


How to find specific text in a text file, and append it to the filename?

I have a collection of plain text files which are named as yymmdd_nnnnnnnnnn.txt, which I want to append another number sequence to the filenames, so that they each become named as yymmdd_nnnnnnnnnn_iiiiiiiii.txt instead, where the iiiiiiiii is taken from the one line in each file which contains the text "GST: 123456789⏎" (or similar) at the end of the line. While I am sure that there will only be one such matching line within each file, I don't know exactly which line it will be on.
I need an elegant one-liner solution that I can run over the collection of files in a folder, from a bash script file, to rename each file in the collection by appending the specific GST number for each filename, as found within the files themselves.
Before even getting to the renaming stage, I have encountered a problem with this. Here is what I tried, which didn't work...
# awk '/\d+$/' | grep -E 'GST: ' 150101_2224567890.txt
The grep command alone works perfectly to find the relevant line within the file, but the awk doesn't return just the final digits group. It fails with the error "warning: regexp escape sequence \d is not a known regexp operator". I had assumed that this regex should return any number of digits which are at the end of the line. The text file in question contains a line which ends with "GST: 112060340⏎". Can someone please show me how to make this work, and maybe also to help with the appropriate coding to move the collection of files to the new filenames? Thanks.
Thanks to a comment from @Renaud, I now have the following code working to obtain just the GST registration number from within a text file, which puts me a step closer towards a workable solution.
awk '/GST: / {printf $NF}' 150101_2224567890.txt
I still need to loop this over the collection instead of just specifying one filename. I also need to be able to use the output from @Renaud's contribution to rename the files. I'm getting closer to a working solution, thanks!
This awk should work for you:
awk '$1=="GST:" {fn=FILENAME; sub(/\.txt$/, "", fn); print "mv", FILENAME, fn "_" $2 ".txt"; nextfile}' *_*.txt | sh
To make it more readable:
awk '$1 == "GST:" {
fn = FILENAME
sub(/\.txt$/, "", fn)
print "mv", FILENAME, fn "_" $2 ".txt"
nextfile
}' *_*.txt | sh
Remove | sh from above to see all mv commands together.
You may try
for f in *_*.txt; do echo mv "$f" "${f%.txt}_$(sed '/.*GST: /!d; s///; q' "$f").txt"; done
Drop the echo if you're satisfied with the output.
As you are sure there is only one matching line, you can try:
$ n=$(awk '/GST:/ {print $NF}' 150101_2224567890.txt)
$ mv 150101_2224567890.txt "150101_2224567890_$n.txt"
Or, for all .txt files:
for f in *.txt; do
n=$(awk '/GST:/ {print $NF}' "$f")
if [[ -z "$n" ]]; then
printf '%s: GST not found\n' "$f"
continue
fi
mv "$f" "${f%.txt}_$n.txt"
done
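For a variant that also runs under plain sh, something along these lines might do. The function names gst_number and gst_rename are invented for this sketch, and it assumes the number is a run of digits at the very end of the "GST: " line, as described in the question.

```shell
#!/bin/sh
# gst_number and gst_rename are hypothetical helper names.

# Print the digits that end the first "GST: <digits>" line of a file.
# [[:space:]]* tolerates trailing blanks or a carriage return.
gst_number() {
    sed -n 's/.*GST: *\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Append the GST number to every yymmdd_nnnnnnnnnn.txt style file.
gst_rename() {
    for f in *_*.txt; do
        [ -f "$f" ] || continue
        n=$(gst_number "$f")
        if [ -z "$n" ]; then
            printf '%s: GST not found\n' "$f" >&2
            continue
        fi
        mv -- "$f" "${f%.txt}_$n.txt"
    done
}
```

Note that running gst_rename a second time would append the number again, because the renamed files still match *_*.txt.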
Another one-line solution to consider, although perhaps not so elegant.
for original_filename in *_*.txt; do \
new_filename=${original_filename%'.txt'}_$(
grep -E 'GST: ' "$original_filename" | \
sed -E 's/.*GST//g; s/[^0-9]//g'
)'.txt' && \
mv "$original_filename" "$new_filename"; \
done
Output:
150101_2224567890_123456789.txt
If you are open to a multi line script:-
#!/bin/sh
for f in *.txt; do
prefix=$(echo "${f}" | sed s'#\.txt##')
cp "${f}" f1
sed -i s'#GST#%GST#' "./f1"
cat "./f1" | tr '%' '\n' > f2
number=$(cat "./f2" | sed -n '/GST/'p | cut -d':' -f2 | tr -d ' ')
newname="${prefix}_${number}.txt"
mv -v "${f}" "${newname}"
rm -v "./f1"
rm -v "./f2"
done
In general, if you want to make your files easy to work with, leave as many places as possible where they can be split with newlines. It is much easier to alter files by putting what you want to delete or print on its own line than it is to search for things horizontally with regular expressions.

replace string with underscore and dots using sed or awk

I have a bunch of files with filenames composed of underscore and dots, here is one example:
META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
I want to remove the part that contains .bed.nodup.sortedbed.roadmap.sort.fgwas.gz. so the expected filename output would be META_ALL_whrAdjBMI_GLOBAL_August2016.r0-ADRL.GLND.FET-EnhA.out.params
I am using these sed commands but neither one works:
stringZ=META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
echo $stringZ | sed -e 's/\([[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.\)//g'
echo $stringZ | sed -e 's/\[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.//g'
Any solution in sed or awk would help a lot.
Don't use external utilities and regexes for such a simple task! Use parameter expansions instead.
stringZ=META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
echo "${stringZ/.bed.nodup.sortedbed.roadmap.sort.fgwas.gz}"
To perform the renaming of all the files containing .bed.nodup.sortedbed.roadmap.sort.fgwas.gz, use this:
shopt -s nullglob
substring=.bed.nodup.sortedbed.roadmap.sort.fgwas.gz
for file in *"$substring"*; do
echo mv -- "$file" "${file/"$substring"}"
done
Note. I left echo in front of mv so that nothing is going to be renamed; the commands will only be displayed on your terminal. Remove echo if you're satisfied with what you see.
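The ${file/"$substring"} form is a Bash feature. If the script has to run under plain sh, the same removal can probably be built from the POSIX prefix and suffix expansions. A minimal sketch, with remove_first being a name made up here:

```shell
#!/bin/sh
# remove_first is a hypothetical helper: print $1 with the first
# occurrence of the literal string $2 removed.
remove_first() {
    case $1 in
        *"$2"*)
            # ${1%%"$2"*} keeps the text before the first occurrence,
            # ${1#*"$2"} keeps the text after it.
            printf '%s\n' "${1%%"$2"*}${1#*"$2"}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

remove_first 'report.tmp.bak.txt' '.tmp.bak'
```

Quoting "$2" inside the expansions makes the shell treat it as a literal string rather than a glob pattern, which matters here because the substring contains dots and could contain other special characters.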
Your regex doesn't really feel too much more general than the fixed pattern would be, but if you want to make it work, you need to allow for more than one lower case character between each dot. Right now you're looking for exactly one, but you can fix it with \+ after each [[:lower:]] like
printf '%s' "$stringZ" | sed -e 's/\([[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.\)//g'
which with
stringZ="META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params"
gives me the output
META_ALL_whrAdjBMI_GLOBAL_August2016.r0-ADRL.GLND.FET-EnhA.out.params
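One caveat: \+ in a basic regex is a GNU extension, so the command above may not work with the sed that ships with OS X. With -E (extended regexes, accepted by both GNU and BSD sed as far as I know) the seven repeated groups can be written with a counted repetition. A sketch, where strip_lower_run is just a name chosen for the example:

```shell
#!/bin/sh
# strip_lower_run is a hypothetical helper: remove the first run of seven
# "lowercase-word followed by a dot" groups from the given string.
strip_lower_run() {
    printf '%s\n' "$1" | sed -E 's/([[:lower:]]+\.){7}//'
}

stringZ="META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params"
strip_lower_run "$stringZ"
```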
Try this:
#!/bin/bash
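# Loop over the glob directly instead of parsing ls, and quote the names
for line in META*; do
# Escape the dots so they match literally rather than any character
f2=$(printf '%s\n' "$line" | sed 's/\.bed\.nodup\.sortedbed\.roadmap\.sort\.fgwas\.gz//')
mv -- "$line" "$f2"
done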

How to extract a substring using sed on OS X?

I'm trying to iterate over each file and folder inside a directory and extract part of the file name into a variable, but I can't make sed work correctly. I either get all of the file name or none of it.
This version of the script should capture the entire file name:
#!/bin/bash
for f in *
do
substring=`echo $f | sed -E -n 's/(.*)/\1/'`
echo "sub: $substring"
done
But instead I get nothing:
sub:
sub:
sub:
sub:
...
This version should give me just the first character in the filename:
#!/bin/bash
for f in *
do
substring=`echo $f | sed -E 's/^([a-zA-Z])/\1/'`
echo "sub: $substring"
done
But instead I get the whole file name:
sub: Adlm
sub: Applications
sub: Applications (Parallels)
sub: Desktop
...
I've tried numerous iterations of it and what it basically boils down to is that if I use -n I get nothing and if I don't I get the whole file name.
Can someone show me how to get just the first character?
Or, my overall goal is to be able to extract a substring and store it into a variable, if anybody has a better approach to it, that would be appreciated as well.
Thanks in advance.
If you want to modify a shell parameter, you probably want to use a parameter expansion.
for f in *; do
# This version should expand to the whole parameter
echo "$f"
# This version should expand to the first character in the filename
echo "${f::1}"
done
Parameter expansions are not as powerful as sed, but they are built in to the shell (no launching a separate process or subshell necessary) and there are expansions for:
Substrings (as above)
Replacing and substituting characters
Altering the case of strings (bash 4+)
and more.
This version of the script should capture the entire file name:
sed -E -n 's/(.*)/\1/'
But instead I get nothing.
You used -n so naturally it won't yield anything. Perhaps you should remove -n or add p:
sed -E -n 's/(.*)/\1/p'
This version should give me just the first character in the filename:
sed -E 's/^([a-zA-Z])/\1/'
But instead I get the whole file name,
You didn't replace anything there. Perhaps what you wanted was
sed -E 's/^([a-zA-Z]).*/\1/'
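To see the three behaviours side by side, here is a small sketch. The sample name is taken from the question's output, and the variable names are only for illustration.

```shell
#!/bin/sh
name='Applications (Parallels)'

# -n suppresses automatic printing and there is no p flag: nothing comes out.
none=$(printf '%s\n' "$name" | sed -E -n 's/(.*)/\1/')

# -n plus the p flag prints only lines on which a substitution happened.
whole=$(printf '%s\n' "$name" | sed -E -n 's/(.*)/\1/p')

# Matching the rest of the line with .* lets the replacement drop it.
first=$(printf '%s\n' "$name" | sed -E 's/^(.).*/\1/')

printf 'none=[%s] whole=[%s] first=[%s]\n' "$none" "$whole" "$first"
# prints: none=[] whole=[Applications (Parallels)] first=[A]
```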
Also I suggest quoting your arguments well:
substring=`echo "$f" | sed ...`
Finally the simpler method is to use substring expansion if you're using Bash as suggested by kojiro.
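If the script has to run under plain sh rather than Bash, the first character can still be had without sed by combining two POSIX expansions. A sketch, with first_char being a made-up helper name:

```shell
#!/bin/sh
# first_char is a hypothetical helper: print the first character of $1.
first_char() {
    rest=${1#?}                     # everything after the first character
    printf '%s\n' "${1%"$rest"}"    # strip that tail, leaving the head
}

for f in *; do
    printf 'sub: %s\n' "$(first_char "$f")"
done
```

Whether ? matches a whole multibyte character or a single byte depends on the shell and locale, so treat this as safe for ASCII names only.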
You forgot to add .* after the capturing group in sed:
$ for i in *; do substring=`echo $i | sed -E 's/^(.).*$/\1/'`; echo "sub: $substring"; done
It's better to use . instead of [a-zA-Z], because [a-zA-Z] fails to match when the filename starts with a digit or any special character.
I prefer awk to sed. It seems to be easier for me to understand.
#!/bin/bash
#set -x
for f in *
do
substring=`echo $f | awk '{print substr($1,1,1)}'`
echo "sub: $substring"
done

How can I reorganize nested quotes within sed regex in a bash script that triggers an "unterminated substitute pattern" error?

The following command is throwing an unterminated substitute pattern error in bash:
eval $(echo "sed '" "s,#\("{a..u}{a..z}"\),\n\n\1,;" "'")
But not for everyone. Linux apparently works fine. Mac throws the unterminated substitute pattern error.
How can I reorganize to make this work?
Here's the entire bash command (the goal is to cleanly output current MySQL settings into my.cnf) :
{
# Print version, user, host and time
echo -e "# MYSQL VARIABLES {{{1\n##\n# MYSQL `
mysql -V | sed 's,^.*\(V.*\)\, for.*,\1,'
` - By: `logname`@`hostname -f` on `date +%c`\n##"
for l in {a..z}; do
# Get mysql global variables starting with $l
echo '#'; mysql -NBe "SHOW GLOBAL VARIABLES LIKE '${l}%'" |
# Transform it
sed 's,\t,^= ,' |
column -ts^ |
tr "\n" '#' |
eval $(echo "sed '" "s,#\("{a..u}{a..z}"\),\n\n\1,;" "'") |
eval $(echo "sed '" "s,#\(innodb_"{a..z}{a..z}"\),\n\n\1,;" "'") |
tr '#' "\n" |
sed 's,^,# ,g'
done
echo -e "#\n##\n# MYSQL VARIABLES }}}1";
} | tee ~/mysql-variables.log
The default sed in OS X is a BSD version of sed. Just tested:
the above fails in OS X's default sed,
and works with the GNU version (gsed, installed from MacPorts).
So the BSD version probably doesn't handle such a long series of substitution commands.
You can try the following:
eval $(echo "perl -ple '" "s,#("{a..u}{a..z}"),\n\n\1,;" "'")
Maybe I didn't understand your goal correctly, but what is wrong with something much simpler?
sed 's/#\([a-u][a-z]\)/\n\n\1/' #or
sed 's/#\("[a-u][a-z]"\)/\n\n\1/'
EDIT
Once again I focused only on the first line of code and not the whole solution, so I created a bash/perl version that works without problems on OS X (with the default OS X tools).
The following code
MYSQLCMD=/usr/local/mysql-5.6.16-osx10.7-x86_64/bin/mysql #your path to mysql command
printf "# MYSQL VARIABLES {{{1\n##\n# MYSQL %s " "$($MYSQLCMD -V | sed 's/.*\(Ver .*\),.*/\1/')"
printf " - By: %s@%s on %s\n" "$(logname)" "$(hostname -f)" "$(date +%c)"
perl -e "\$s=qx($MYSQLCMD -NBe 'SHOW GLOBAL VARIABLES');" \
-e 'for("aa".."uz"){$s=~s/^($_)/#\n$1/m;$s=~s/^(innodb_$_)/#\n$1/m};' \
-e '$s=~s/(.*)\t(.*)/sprintf "# %-55s= %s",$1,$2/gem;print $s'
printf "#\n##\n# MYSQL VARIABLES }}}1\n";
does roughly the same as the original code.
Try breaking the command up into multiple commands:
eval "$(printf "sed "; echo "-e 's,#\("{a..z}{a..z}"\),\n\n\1,'")"
But note that sed on OSX also probably doesn't like \n as a newline, so you'll have to do:
$ nl='
'
$ eval "$(printf "sed "; echo "-e 's,#\("{a..z}{a..z}"\),\\$nl\\$nl\1,'")"
I would strongly recommend finding a better solution. Probably via perl.
For a quick fix, try (see below for a preferable alternative that doesn't use eval):
eval "$(echo "sed '" "s,#\("{a..u}{a..z}"\),"$'\\\n\\\n'"\1,"$'\n' "'")"
As @jm666 hints at, FreeBSD sed (at least the version that comes with OS X 10.9.4) has a limit on the size of individual lines in a script (command string) - 4096 bytes - and the large single-line string that results from your use of bash's brace (range) expansion ({a..u}{a..z}) exceeds that limit.
The above works around that by putting each s call on its own line by appending $'\n' (which in bash expands to an actual newline - see below) rather than ; to the string to be brace-expanded.
Also note that \n\n was replaced with spliced-in $'\\\n\\\n', because FreeBSD sed doesn't support \n escapes in replacement strings (treats them as literal n chars). $'\\\n\\\n' inserts actual newlines - escaped with \ - using a bash feature called ANSI C-quoting.
(Similarly, FreeBSD sed doesn't support the escape sequence \t in regexes to represent tab chars, so your sed 's,\t,^= ,' command must be replaced with sed 's,'$'\t'',^= ,'.)
Note that the entire string passed to eval must then be double-quoted so as to ensure that the newlines are passed through to sed.
Note that you could in theory still hit a limit: the max. length of a command line, but that limit is much higher: a little less than 256 KB on OS X.
Also, you may pass long sed scripts via a file, by using the -f option.
Generally, it's better to avoid use of eval, so here's an alternative:
sed "$(printf %s "s,#\("{a..u}{a..z}"\),"$'\\\n\\\n'"\1,"$'\n')"
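For the -f route mentioned above, the script file can be generated with two nested loops instead of brace expansion, which also keeps everything within plain sh. This is only a sketch: make_pair_script and split_before_pairs are names invented here, and it assumes mktemp is available (it is on Linux and OS X, although POSIX does not require it).

```shell
#!/bin/sh
# make_pair_script and split_before_pairs are hypothetical helper names.

# Emit one s command per letter pair aa..uz, each on its own lines, so no
# single script line gets long. A backslash followed by a real newline is
# the portable way to put a newline into a sed replacement.
make_pair_script() {
    for a in a b c d e f g h i j k l m n o p q r s t u; do
        for b in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
            printf 's,#\\(%s%s\\),\\\n\\\n\\1,\n' "$a" "$b"
        done
    done
}

# Filter stdin through the generated script, passed to sed with -f.
split_before_pairs() {
    script=$(mktemp) || return 1
    make_pair_script > "$script"
    sed -f "$script"
    status=$?
    rm -f "$script"
    return "$status"
}
```

In the pipeline from the question, split_before_pairs would take the place of the first eval line. The innodb_ variant would need a second generator along the same lines.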

Why is sed not recognizing \t as a tab?

sed "s/\(.*\)/\t\1/" $filename > $sedTmpFile && mv $sedTmpFile $filename
I am expecting this sed script to insert a tab in front of every line in $filename however it is not. For some reason it is inserting a t instead.
Not all versions of sed understand \t. Just insert a literal tab instead (press Ctrl-V then Tab).
Using Bash you may insert a TAB character programmatically like so:
TAB=$'\t'
echo 'line' | sed "s/.*/${TAB}&/g"
echo 'line' | sed 's/.*/'"${TAB}"'&/g' # use of Bash string concatenation
@sedit was on the right path, but it's a bit awkward to define a variable.
Solution (bash specific)
The way to do this in bash is to put a dollar sign in front of your single quoted string.
$ echo -e '1\n2\n3'
1
2
3
$ echo -e '1\n2\n3' | sed 's/.*/\t&/g'
t1
t2
t3
$ echo -e '1\n2\n3' | sed $'s/.*/\t&/g'
1
2
3
If your string needs to include variable expansion, you can put quoted strings together like so:
$ timestamp=$(date +%s)
$ echo -e '1\n2\n3' | sed "s/.*/$timestamp"$'\t&/g'
1491237958 1
1491237958 2
1491237958 3
Explanation
In bash $'string' causes "ANSI-C expansion". And that is what most of us expect when we use things like \t, \r, \n, etc. From: https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html#ANSI_002dC-Quoting
Words of the form $'string' are treated specially. The word expands
to string, with backslash-escaped characters replaced as specified by
the ANSI C standard. Backslash escape sequences, if present, are
decoded...
The expanded result is single-quoted, as if the dollar sign had not
been present.
Solution (if you must avoid bash)
I personally think most efforts to avoid bash are silly because avoiding bashisms does NOT* make your code portable. (Your code will be less brittle if you shebang it to bash -eu than if you try to avoid bash and use sh [unless you are an absolute POSIX ninja].) But rather than have a religious argument about that, I'll just give you the BEST* answer.
$ echo -e '1\n2\n3' | sed "s/.*/$(printf '\t')&/g"
1
2
3
* BEST answer? Yes, because one example of what most anti-bash shell scripters would do wrong in their code is use echo '\t' as in @robrecord's answer. That will work for GNU echo, but not BSD echo. That is explained by The Open Group at http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16 And this is an example of why trying to avoid bashisms usually fails.
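Wrapped up for the original use case (a tab in front of every line, written back to the same file), the printf approach might look like this. The function names are invented for the sketch, and mktemp is assumed to be available.

```shell
#!/bin/sh
# prepend_tab and prepend_tab_inplace are hypothetical helper names.

# Put a tab in front of every line of the given files (or of stdin).
prepend_tab() {
    tab=$(printf '\t')      # a real tab character, no reliance on sed's \t
    sed "s/^/${tab}/" "$@"
}

# Same, but write the result back to the file via a temporary file.
prepend_tab_inplace() {
    tmp=$(mktemp) || return 1
    if prepend_tab "$1" > "$tmp"; then
        mv -- "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Be aware that the file ends up with the temporary file's permissions after the mv, so a chmod may be needed afterwards.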
I've used something like this with a Bash shell on Ubuntu 12.04 (LTS):
To append a new line with tab,second when first is matched:
sed -i '/first/a \\t second' filename
To replace first with tab,second:
sed -i 's/first/\\t second/g' filename
Use $(echo '\t'). You'll need quotes around the pattern.
Eg. To remove a tab:
sed "s/$(echo '\t')//"
You don't need sed to do a substitution when all you actually want is to insert a tab in front of each line. Substitution is an expensive operation compared to just printing the line, especially when you are working with big files. It's easier to read too, as it's not a regex.
eg using awk
awk '{print "\t"$0}' $filename > temp && mv temp $filename
I used this on Mac:
sed -i '' $'$i\\\n\\\thello\n' filename
Not every sed supports \t, or other escape sequences like \n for that matter (GNU sed does, the BSD sed shipped with OS X does not). The only way I've found to do it portably was to actually insert the tab character in the sed script.
That said, you may want to consider using Perl or Python. Here's a short Python script I wrote that I use for all stream regex'ing:
#!/usr/bin/env python3
import re
import sys

def main(args):
    if len(args) < 2:
        print('Usage: <search-pattern> <replace-expr>', file=sys.stderr)
        raise SystemExit(1)
    # MULTILINE and DOTALL let the pattern work across the whole input
    p = re.compile(args[0], re.MULTILINE | re.DOTALL)
    s = sys.stdin.read()
    # Write the result as-is, without adding a trailing newline
    sys.stdout.write(p.sub(args[1], s))

if __name__ == '__main__':
    main(sys.argv[1:])
Instead of BSD sed, I use perl:
ct@MBA45:~$ python -c "print('\t\t\thi')" |perl -0777pe "s/\t/ /g"
hi
I think others have clarified this adequately for other approaches (sed, AWK, etc.). However, my bash-specific answers (tested on macOS High Sierra and CentOS 6/7) follow.
1) If OP wanted to use a search-and-replace method similar to what they originally proposed, then I would suggest using perl for this, as follows. Notes: backslashes before parentheses for regex shouldn't be necessary, and this code line reflects how $1 is better to use than \1 with perl substitution operator (e.g. per Perl 5 documentation).
perl -pe 's/(.*)/\t$1/' $filename > $sedTmpFile && mv $sedTmpFile $filename
2) However, as pointed out by ghostdog74, since the desired operation is actually to simply add a tab at the start of each line before changing the tmp file to the input/target file ($filename), I would recommend perl again but with the following modification(s):
perl -pe 's/^/\t/' $filename > $sedTmpFile && mv $sedTmpFile $filename
## OR
perl -pe $'s/^/\t/' $filename > $sedTmpFile && mv $sedTmpFile $filename
3) Of course, the tmp file is superfluous, so it's better to just do everything 'in place' (adding -i flag) and simplify things to a more elegant one-liner with
perl -i -pe $'s/^/\t/' $filename
TAB=$(printf '\t')
sed "s/${TAB}//g" input_file
This works for me on Red Hat and removes all tabs from the input file.
If you know that certain characters are not used, you can translate "\t" into something else.
tr '\t' ',' < my_file | sed 's/\(.*\)/,\1/'