Regex match for file and rename + overwrite old file - regex

I'm trying to write a bash script that renames files matching my regex. If a file matches, I want to rename it using the regex and overwrite the old existing file.
I want to do this because I have a file on computer 1 and change it on computer 2. When I later go back to computer 1, the sync reports an "Example conflict" and saves both copies.
Example file:
acl_cam.MYI
Example file after conflict:
acl_cam (Example conflit with .... on 2015-08-20).MYI
I tried a lot of things, like rename, mv and a couple of other scripts, but none of them worked.
The regex I think I should use:
(.*)/s\(.*\)\.(.*)
Then rename the file to value1 followed by value2 (the captured groups), replacing the old file (acl_cam.MYI), and do this for all files and directories below the starting directory.
Can you help me with this one?

The issue you have, if I understand your question correctly, has two parts: (1) what is the correct regex that will match the error string and produce a filename, and (2) how do you use the returned filename to move or remove the offending file?
If the string at issue is:
acl_cam (Example conflit with .... on 2015-08-20).MYI
and you need to return the MySQL file name, then a regex similar to the following will work:
[ ][(].*[)]
The stream editor sed is about as good as anything else to return the filename from your string. Example:
$ printf "acl_cam (Example conflit with .... on 2015-08-20).MYI\n" | \
sed -e 's/[ ][(].*[)]//'
acl_cam.MYI
(shown with line continuation above)
Then it is up to you how you move or delete the file. The remaining question is where is the information (the error string) currently stored and how do you have access to it? If you have a file full of these errors, then you could do something like the following:
while read -r line; do
victim=$( printf "%s\n" "$line" | sed -e 's/[ ][(].*[)]//' )
## to move the file to /path/to/old
[ -e "$victim" ] && mv "$victim" /path/to/old
done < "$myerrorfilename"
(you could also feed the string to sed as a here-string, but omitted for simplicity)
You could also just delete the file if that suits your purpose. However, more information is needed to clarify how/where that information is stored and what exactly you want to do with it to provide any more specifics. Let me know if you have further questions.
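The question asked for the captured pieces (the base name and the extension) to be joined back together. A minimal sketch of that idea with bash's own regex matching follows. The function name original_name and the exact pattern are illustrative assumptions, not part of the answer above, and the pattern assumes names of the form "base (any text).ext".

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the original file name for a conflict copy.
# Assumes names look like "base (any text).ext"; returns 1 otherwise.
original_name() {
    local name=$1
    # Group 1: text before " (". Group 2: optional extension after ")".
    local re='^(.*) \(.*\)(\.[^.]*)?$'
    if [[ $name =~ $re ]]; then
        printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

# Usage:
#   original_name "acl_cam (Example conflit with .... on 2015-08-20).MYI"
# prints acl_cam.MYI
```

Keeping the regex in a variable avoids quoting problems on the right-hand side of =~.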

Final solution for this question for people who are interested:
for i in *; do
# Wildcard check: does the current file name contain "(Exemplaar"?
if [[ $i == *"(Exemplaar"* ]]
then
# Derive the original name (without the Exemplaar conflict suffix)
NewFileName=$(echo "$i" | sed -E -e 's/[ ][(].*[)]//')
# Remove the original file (-f: no error if it does not exist)
rm -f -- "$NewFileName";
# Copy the conflict file to the original file name
cp -a -- "$i" "$NewFileName";
# Delete the conflict file
rm -- "$i";
echo "Replaced file: $NewFileName with: $i";
fi
done
I used this code to replace the database conflict files that Dropbox created when syncing between different computers.
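For comparison, here is a sketch of the same loop that derives the name with parameter expansion and overwrites with a single mv -f instead of rm, cp, rm. The function name resolve_conflicts and its two parameters are assumptions made for illustration. It only looks at one directory level, like the loop above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace each original file with its conflict copy.
# $1 = directory to scan, $2 = marker text that follows " (" in conflict names.
resolve_conflicts() {
    local dir=$1 marker=$2 conflict base ext original
    for conflict in "$dir"/*" ($marker"*; do
        [ -e "$conflict" ] || continue      # the glob matched nothing
        base=${conflict%% \("$marker"*}     # text before " (marker"
        ext=${conflict##*\)}                # text after the last ")"
        original=$base$ext
        mv -f -- "$conflict" "$original"    # overwrite in one step
        printf 'Replaced %s with %s\n' "$original" "$conflict"
    done
}

# Usage (in the directory holding the files):
#   resolve_conflicts . "Exemplaar"
```

Try it on a copy of the directory first, since the overwrite cannot be undone.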

Related

How to rename multiple files with multiple letters and numbers combinations and sizes using bash or regex?

I've been trying to rename a list of files, but it's been quite difficult...
The 41 filenames are:
BEIII_S29_pear_derep.fasta
BEII_S15_pear_derep.fasta
BEI_S1_pear_derep.fasta
MB211III_S30_pear_derep.fasta
MB211II_S16_pear_derep.fasta
MB211I_S2_pear_derep.fasta
...
and I need to rename to:
BEIII.fas
BEII.fas
BEI.fas
MB211III.fas
MB211II.fas
MB211I.fas
I tried using a for loop:
for i in *_S[0-9]{1,2}_pear_derep.fasta; do newfile="$(basename $i _S[0-9]{1,2}_pear_derep.fasta)"; echo $newfile; cp ${newfile}_S[0-9]{1,2}_pear_derep.fasta ${newfile}.fas; done;
It didn't work, then
rename 's/([A-Z]*[0-9]*[I]{1,4})_[A-Z][0-9]_[a-z]_[a-z]{1,5}(\.fasta).*/$1$2/g' *
It didn't work
then
for file in *.fas; do newfile=$(echo "$file" | sed -re 's/S_[0-9][0-9](\.)/\./g') mv -v $file $newfile; done;
None of them worked.
The thing here is that I have to use a regex to KEEP a variable beginning, which varies along the lines of
[A-Z]{2}[0-9]{3}[I]{1,3}
then everything else is dropped
S[0-9]{1,2}_[a-z]{4}_[a-z]{5} and the extension changes from .fasta to .fas
Could someone help me please?
Thank you Guys
You should make sure that *.fasta targets every file you need. Either echo the mv command first, or create a copy of the directory and try it there.
for i in *.fasta; do
mv -- "$i" "${i/_*/}.fas"
done
The substitution ${i/_*/} removes the first _ and everything after it.
The regexp in your rename attempt is missing a bunch of quantifiers. Also, it doesn't change the extension from .fasta to .fas. You should also anchor it to the beginning and end of the filename. There's no need for the g modifier, since you're only doing one replacement per name.
rename 's/^([A-Z]*[0-9]*I{1,4})_[A-Z][0-9]*_[a-z]*_[a-z]{1,5}\.fasta$/$1.fas/' *
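If rename is not available, the same check can be sketched with bash's =~ operator, which also rejects names that don't fit the expected shape. The function name fas_name is made up for this example. The pattern assumes the _pear_derep.fasta suffix shown in the question and allows the digits to be absent, since BEIII has none.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map "BEIII_S29_pear_derep.fasta" to "BEIII.fas".
# Returns 1 for names that do not fit the expected shape.
fas_name() {
    local name=$1
    local re='^([A-Z]+[0-9]*I{1,3})_S[0-9]{1,2}_pear_derep\.fasta$'
    if [[ $name =~ $re ]]; then
        printf '%s.fas\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Dry run: print the mv commands instead of executing them.
#   for f in *.fasta; do
#       new=$(fas_name "$f") && echo mv -- "$f" "$new"
#   done
```

Drop the echo once the printed commands look right.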

sed / awk - remove space in file name

I'm trying to remove whitespace in file names and replace them.
Input:
echo "File Name1.xml File Name3 report.xml" | sed 's/[[:space:]]/__/g'
However the output
File__Name1.xml__File__Name3__report.xml
Desired output
File__Name1.xml File__Name3__report.xml
You named awk in the title of the question, didn't you?
$ echo "File Name1.xml File Name3 report.xml" | \
> awk -F'.xml *' '{for(i=1;i<=NF;i++){gsub(" ","_",$i); printf i<NF?$i ".xml ":"\n" }}'
File_Name1.xml File_Name3_report.xml
$
-F'.xml *' instructs awk to split on a regex, the requested extension plus 0 or more spaces
the loop {for(i=1;i<=NF;i++) is executed for all the fields into which the input line(s) is (are) split (note that the last field is empty, being what follows the last extension, but we are going to take that into account...)
the body of the loop
gsub(" ","_", $i) substitutes all the occurrences of space to underscores in the current field, as indexed by the loop variable i
printf i<NF?$i ".xml ":"\n" output different things, if i<NF it's a regular field, so we append the extension and a space, otherwise i equals NF, we just want to terminate the output line with a newline.
It's not perfect, it appends a space after the last filename. I hope that's good enough...
▶    A D D E N D U M    ◀
I'd like to address:
the little buglet of the last space...
some of the issues reported by Ed Morton
generalize the extension provided to awk
To reach these goals, I've decided to wrap the scriptlet in a shell function; since it changes spaces into underscores, it is named s2u
$ s2u () { awk -F'\.'$1' *' -v ext=".$1" '{
> NF--;for(i=1;i<=NF;i++){gsub(" ","_",$i);printf "%s",$i ext (i<NF?" ":"\n")}}'
> }
$ echo "File Name1.xml File Name3 report.xml" | s2u xml
File_Name1.xml File_Name3_report.xml
$
It's a bit different (better?) because it does not special-case printing the last field but instead special-cases the delimiter appended to each field; the idea of splitting on the extension remains.
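The question was asked with sed and a double underscore, so here is a sed-only sketch of the same "the extension marks the end of a name" idea: replace every space first, then restore the separators that directly follow the extension. The function name underscore_names is an assumption. Like the awk version, it misfires on a name that contains the extension followed by a space in the middle.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn spaces inside file names into "__", but keep
# the single space that separates one name from the next.
# $1 = extension without the dot; the list is read from standard input.
underscore_names() {
    local ext=$1
    # Step 1: every whitespace character becomes "__".
    # Step 2: "__" directly after ".ext" was a separator, so restore it.
    sed -E "s/[[:space:]]/__/g; s/\.${ext}__/.${ext} /g"
}

# Usage:
#   echo "File Name1.xml File Name3 report.xml" | underscore_names xml
# prints File__Name1.xml File__Name3__report.xml
```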
This seems a good start if the filenames aren't delimited:
((?:\S.*?)?\.\w{1,})\b
( // start of captured group
(?: // non-captured group
\S.*? // a non-white-space character, then 0 or more any character
)? // 0 or 1 times
\. // a dot
\w{1,} // 1 or more word characters
) // end of captured group
\b // a word boundary
You'll have to look up how a PCRE pattern converts to a shell pattern. Alternatively, it can be run from a Python/Perl/PHP script.
Assuming you are asking how to rename file names, and not remove spaces in a list of file names that are being used for some other reason, this is the long and short way. The long way uses sed. The short way uses rename. If you are not trying to rename files, your question is quite unclear and should be revised.
If the goal is to simply get a list of xml file names and change them with sed, the bottom example is how to do that.
directory contents:
ls -w 2
bob is over there.xml
fred is here.xml
greg is there.xml
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do
echo "${a_glob[i]}";
done
shopt -u nullglob
# output
bob is over there.xml
fred is here.xml
greg is there.xml
# then rename them
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do
# I prefer 'rename' for such things
# rename 's/[[:space:]]/_/g' "${a_glob[i]}";
# but sed works, can't see any reason to use it for this purpose though
mv -- "${a_glob[i]}" "$(sed 's/[[:space:]]/_/g' <<< "${a_glob[i]}")";
done
shopt -u nullglob
result:
ls -w 2
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml
Globbing is what you want here because of the spaces in the names.
However, this is really a complicated solution, when actually all you need to do is:
cd [your space containing directory]
rename 's/[[:space:]]/_/g' *.xml
and that's it, you're done.
If, on the other hand, you are trying to create a list of file names, you'd certainly want the globbing method. With a small modification it does what you want there too: just use sed to change the output file name.
If your goal is to change the filenames for output purposes, and not rename the actual files:
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do
echo "${a_glob[i]}" | sed 's/[[:space:]]/_/g';
done
shopt -u nullglob
# output:
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml
You could use rename:
rename --nows *.xml
This will replace all the spaces of the xml files in the current folder with _.
Some versions of rename come without the --nows option; in that case you can use a search and replace:
rename 's/[[:space:]]/__/g' *.xml
You can also use --dry-run if you want to just print the filenames without changing them.
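When no rename command is installed at all, plain bash can do the same job. This is only a sketch: the function name rename_spaces is invented here, it handles just the *.xml files of one directory, and it relies on mv -n so that an existing target is never overwritten.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace whitespace with "_" in the names of the
# *.xml files in directory $1, using only bash and mv.
rename_spaces() {
    local dir=$1 path name new
    for path in "$dir"/*.xml; do
        [ -e "$path" ] || continue          # the glob matched nothing
        name=${path##*/}                    # strip the directory part
        new=${name//[[:space:]]/_}          # every whitespace char -> "_"
        [ "$name" = "$new" ] && continue    # nothing to change
        mv -n -- "$path" "$dir/$new"        # -n: never overwrite a file
    done
}

# Usage:
#   rename_spaces .
```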

Changing file extensions with sed [duplicate]

If the arguments are files, I want to change their extensions to .file.
That's what I got:
#!/bin/bash
while [ $# -gt 0 ]
do
if [ -f $1 ]
then
sed -i -e "s/\(.*\)\(\.\)\(.*\)/\1\2file" $1
fi
shift
done
The script runs, but it doesn't do anything. Another problem: if a file has no extension, my sed command will not work, right? Please help.
sed is for manipulating the contents of files, not the filename itself.
Option 1, taken from this answer by John Smith:
filename="file.ext1"
mv "${filename}" "${filename/%ext1/ext2}"
Option 2, taken from this answer by chooban:
rename 's/\.ext/\.newext/' ./*.ext
Option 3, taken from this answer by David W.:
$ find . -name "*.ext1" -print0 | while IFS= read -r -d '' file
do
mv -- "$file" "${file%.*}.ext2"
done
and more is here.
UPDATE (a comment asked what % and {} are doing):
"${variable}other_chars": the braces delimit the variable name, so you can expand a variable directly next to other characters in a string. Inside the braces, %.* means: take the value of the variable and strip the shortest match of the pattern .* from the end of the value. For example, if your variable holds filename.txt, "${variable%.*}" returns just filename.
Using a shell function to wrap a sed evaluate (e) command:
mvext ()
{
ext="$1";
while shift && [ "$1" ]; do
sed 's/.*/mv -iv "&" "&/
s/\(.*\)\.[^.]*$/\1/
s/.*/&\.'"${ext}"'"/e' <<< "$1";
done
}
Tests, given files bah and boo, and the extension should be .file, which is then changed to .buzz:
mvext file bah boo
mvext buzz b*.file
Output:
'bah' -> 'bah.file'
'boo' -> 'boo.file'
'bah.file' -> 'bah.buzz'
'boo.file' -> 'boo.buzz'
How it works:
The first arg is the file extension, which is stored in $ext.
The while loop handles each file name separately, since a name might include spaces and whatnot. If the filenames were certain not to contain spaces, the while loop could probably be avoided.
sed reads standard input, provided by a bash here string <<< "$1".
The sed code changes each name foo.bar (or even just plain foo) into the string mv -iv "foo.bar" "foo.file", then runs that string with the evaluate command. The -iv options show what has been moved and prompt before an existing file would be overwritten.
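The e flag used above is a GNU sed extension, so as a portable comparison here is a sketch that computes the new name with parameter expansion only. It also covers the case raised in the question, a file with no extension at all. The function name new_name is hypothetical, and treating a leading-dot name such as .bashrc as "no extension" is a design choice of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print path $1 with its extension replaced by $2.
# A name without an extension simply gets ".$2" appended.
new_name() {
    local path=$1 ext=$2 base
    base=${path##*/}    # look only at the last path component
    case $base in
        ?*.*) printf '%s.%s\n' "${path%.*}" "$ext" ;;  # strip the old extension
        *)    printf '%s.%s\n' "$path" "$ext" ;;       # no extension to strip
    esac
}

# Usage inside the question's loop:
#   [ -f "$1" ] && mv -i -- "$1" "$(new_name "$1" file)"
```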

Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)

I have a bunch of daily printer logs in CSV format and I'm writing a script to keep track of how much paper is being used and save the info to a database, but I've come across a small problem
Essentially, some of the document names in the logs include commas in them (which are all enclosed within double quotes), and since it's in comma separated format, my code is messing up and pushing everything one column to the right for certain records.
From what I've been reading, it seems like the best way to go about fixing this would be using awk or sed, but I'm unsure which is the best option for my situation, and how exactly I'm supposed to implement it.
Here's a sample of my input data:
2015-03-23 08:50:22,Jogn.Doe,1,1,Ineo 4000p,"MicrosoftWordDocument1",COMSYRWS14,A4,PCL6,,,NOT DUPLEX,GRAYSCALE,35kb,
And here's what I have so far:
#!/bin/bash
#Get today's file name
yearprefix="20"
currentdate=$(date +"%m-%d-%y");
year=${currentdate:6};
year="$yearprefix$year"
month=${currentdate:0:2};
day=${currentdate:3:2};
filename="papercut-print-log-$year-$month-$day.csv"
echo "The filename is: $filename"
# Remove commas in between quotes.
#Loop through CSV file
OLDIFS=$IFS
IFS=,
[ ! -f $filename ] && { echo "$Input file not found"; exit 99; }
while read time user pages copies printer document client size pcl blank1 blank2 duplex greyscale filesize blank3
do
#Remove headers
if [ "$user" != "" ] && [ "$user" != "User" ]
then
#Remove any file name with an apostrophe
if [[ "$document" =~ "'" ]];
then
document="REDACTED"; # Lazy. Need to figure out a proper solution later.
fi
echo "$time"
#Save results to database
mysql -u username -p -h localhost -e "USE printerusage; INSERT INTO printerlogs (time, username, pages, copies, printer, document, client, size, pcl, duplex, greyscale, filesize) VALUES ('$time', '$user', '$pages', '$copies', '$printer', '$document', '$client', '$size', '$pcl', '$duplex', '$greyscale', '$filesize');"
fi
done < $filename
IFS=$OLDIFS
Which option is more suitable for this task? Will I have to create a second temporary file to get this done?
Thanks in advance!
As I wrote in another answer:
Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas (with say |) instead:
s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g
And then splitting on | (assuming none of your data has | in it).
(That answer was given to the question "Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern".)
There is probably an easier way using sed alone, but this should work. Loop over the file; for each line, match the quoted strings with grep -o, then replace the commas inside them with spaces (or whatever you would like to use to get rid of the commas; if you want to preserve the data, you can use a non-printable character and convert it back to commas afterward).
i=1 && IFS=$(echo -en "\n\b") && for a in $(< test.txt); do
var="${a}"
for b in $(sed -n ${i}p test.txt | grep -o '"[^"]*"'); do
repl="$(sed "s/,/ /g" <<< "${b}")"
var="$(sed "s#${b}#${repl}#" <<< "${var}")"
done
let i+=1
echo "${var}"
done
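To answer the title question directly, awk can do this in one pass without a temporary file. When a line is split on the double quote character, the even-numbered fields are the ones that were inside quotes, so only those need their commas removed. This is a sketch under the assumption that quotes are always balanced; the function name strip_quoted_commas and the sample document name in the usage line are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete commas that sit inside double-quoted CSV
# fields. Reads the files given as arguments, or standard input.
strip_quoted_commas() {
    # With '"' as the field separator, fields 2, 4, 6, ... are the quoted
    # parts. The final "1" prints every line, modified or not.
    awk -F'"' -v OFS='"' '{ for (i = 2; i <= NF; i += 2) gsub(/,/, "", $i) } 1' "$@"
}

# Usage: feed the cleaned lines straight into the existing read loop, e.g.
#   strip_quoted_commas "$filename" | while IFS=, read -r time user pages ...
# A line such as   1,"Report, final, v2",A4
# comes out as     1,"Report final v2",A4
```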

Regex to remove lines in file(s) that ending with same or defined letters

I need a bash script for Mac OS X that works this way:
./script.sh * folder/to/files/
#
# or #
#
./script.sh xx folder/to/files/
This script should:
read a list of files
open each file and read each line
if a line ends with the same letters ('*' mode) or with the given letters ('xx'), remove the line and re-save the file
back up the original file
My first approach to do this:
#!/bin/bash
# ck init params
if [ $# -le 0 ]
then
echo "Usage: $0 <letters>"
exit 0
fi
# list files in current dir
list=`ls BRUTE*`
for i in $list
do
# prepare regex
case $1 in
"*") REGEXP="^.*(.)\1+$";;
*) REGEXP="^.*[$1]$";;
esac
FILE=$i
# backup file
cp $FILE $FILE.bak
# removing line with same letters
sed -Ee "s/$REGEXP//g" -i '' $FILE
cat $FILE | grep -v "^$"
done
exit 0
But it doesn't work as I want.
What's wrong?
How can I fix this script?
Example:
$cat BRUTE02.dat BRUTE03.dat
aa
ab
ac
ad
ee
ef
ff
hhh
$
If I use '*', I want all files cleaned of the lines that end with the same letters.
If I use 'ff', I want all files cleaned of the lines that end with 'ff'.
Ah, it's on Mac OS X. Remember that its sed is a little different from the classic Linux sed.
man sed
sed [-Ealn] command [file ...]
sed [-Ealn] [-e command] [-f command_file] [-i extension] [file ...]
DESCRIPTION
The sed utility reads the specified files, or the standard input if no files are specified, modifying the input as specified by a list of commands. The input is then written to the standard output.
A single command may be specified as the first argument to sed. Multiple commands may be specified by using the -e or -f options. All commands are applied to the input in the order they are specified regardless of their origin.
The following options are available:
-E Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's). The re_format(7) manual page fully describes both formats.
-a The files listed as parameters for the ``w'' functions are created (or truncated) before any processing begins, by default. The -a option causes sed to delay opening each file until a command containing the related ``w'' function is applied to a line of input.
-e command
Append the editing commands specified by the command argument to the list of commands.
-f command_file
Append the editing commands found in the file command_file to the list of commands. The editing commands should each be listed on a separate line.
-i extension
Edit files in-place, saving backups with the specified extension. If a zero-length extension is given, no backup will be saved. It is not recommended to give a zero-length extension when in-place editing files, as you risk corruption or partial content in situations where disk space is exhausted, etc.
-l Make output line buffered.
-n By default, each line of input is echoed to the standard output after all of the commands have been applied to it. The -n option suppresses this behavior.
The form of a sed command is as follows:
[address[,address]]function[arguments]
Whitespace may be inserted before the first address and the function portions of the command.
Normally, sed cyclically copies a line of input, not including its terminating newline character, into a pattern space, (unless there is something left after a ``D'' function), applies all of the commands with addresses that select that pattern space, copies the pattern space to the standard output, appending a newline, and deletes the pattern space.
Some of the functions use a hold space to save all or part of the pattern space for subsequent retrieval.
Anything else?
Is my problem clear?
Thanks.
I don't know bash shell too well so I can't evaluate what the failure is.
This is just an observation of the regex as understood (this may be wrong).
The * mode regex looks ok:
^.*(.)\1+$ that ended with same letters..
But the literal mode might not do what you think.
current: ^.*[$1]$ that ended with 'literal string'
This shouldn't use a character class.
Change it to: ^.*$1$
Realize, though, that the string in $1 should be escaped before it goes into the regex, in case it contains any regex metacharacters.
Otherwise, do you intend to have a character class?
perl -ne '
BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
/$re/ && next || print
'
Example:
echo "aa
ab
ac
ad
ee
ef
ff" | perl -ne '
BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
/$re/ && next || print
' '*'
produces
ab
ac
ad
ef
A possible issue:
When you put * on the command line, the shell replaces it with the name of all the files in your directory. Your $1 will never equal *.
And some tips:
You can replace:
This:
# list files in current dir
list=`ls BRUTE*`
for i in $list
With:
for i in BRUTE*
And:
This:
cat $FILE | grep -v "^$"
With:
grep -v "^$" $FILE
Besides the possible issue, I can't see anything jumping out at me. What do you mean clean? Can you give an example of what a file should look like before and after and what the command would look like?
This is the problem!
grep '\(.\)\1[^\r\n]$' *
On Mac OS X, ( ) { }, etc. must be escaped with a backslash!
Solved, thanks.
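For anyone who would rather sidestep the regex escaping differences altogether, the filtering can be sketched in plain bash, which behaves the same on Mac OS X and Linux. The function name filter_lines is an assumption. Here "same letters" is taken to mean that the last two characters of the line are identical, and the custom letters are compared literally rather than as a regex.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop lines from file $2 according to mode $1.
#   '*'        drop lines whose last two characters are identical
#   otherwise  drop lines that end with the literal text in $1
# The untouched original is kept as "$2.bak".
filter_lines() {
    local mode=$1 file=$2 line
    cp -- "$file" "$file.bak" || return
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$mode" = '*' ]; then
            if [ "${#line}" -ge 2 ] && [ "${line: -1}" = "${line: -2:1}" ]; then
                continue
            fi
        else
            case $line in *"$mode") continue ;; esac
        fi
        printf '%s\n' "$line"
    done < "$file.bak" > "$file"
}

# Usage (quote the star so the shell does not expand it to file names):
#   for f in BRUTE*; do filter_lines '*' "$f"; done
```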