Bash string manipulation regex or string indexing - regex

I have two possible artifacts, which I need to modify with a bash script.
The artifacts before manipulation are the following:
tesla-server-1.1.1-develop#34.tgz
tesla-server-1.1.1-master#34.tgz
After I run my regex on them, they should look like the following:
tesla-server-1.1.1-develop.tgz
tesla-server-1.1.1.tgz
What I have is the following
#!/bin/env bash
branch=master or develop
if [[ "${branch}" == "develop" ]]; then
artifact="tesla-server-1.1.1-develop#34.tgz"
new_artifact `expr match "$artifact" '(.+develop|.tgz)'`
cp artifact new_artifact
elif [[ "${branch}" == "master" ]]; then
artifact="tesla-server-1.1.1-master#34.tgz"
new_artifact `expr match "$artifact" '(.+master|.tgz)'`
cp artifact new_artifact
fi
Any help would be greatly appreciated, either using regex or string indexing.

Using sed you can do:
file='tesla-server-1.1.1-master#34.tgz'
cp "$file" "$(sed -E 's/(-master)?#[0-9]+//' <<< "$file")"
This will copy the given file to tesla-server-1.1.1.tgz. For the develop artifact, the same command yields tesla-server-1.1.1-develop.tgz.
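Since the question also asks about string indexing, here is a minimal sketch that does the same thing with parameter expansion only, without calling sed. The function name strip_build is an assumption made for illustration, and the sketch assumes the build number always follows the first '#' in the name.

```bash
#!/usr/bin/env bash
# strip_build is a hypothetical helper name, not part of the original answer.
strip_build() {
    local name=$1
    local ext=${name##*.}      # extension after the last dot, e.g. "tgz"
    local base=${name%%#*}     # drop everything from the first '#' onward
    base=${base%-master}       # drop a trailing "-master" if present
    printf '%s.%s\n' "$base" "$ext"
}

for artifact in 'tesla-server-1.1.1-develop#34.tgz' 'tesla-server-1.1.1-master#34.tgz'; do
    echo "$artifact -> $(strip_build "$artifact")"
done
```

This avoids a subprocess per file, at the cost of being tied to this exact naming scheme.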

Related

BASH script to extract UUID/PARTUUID from Variable

I'm trying to develop a bash setup script that includes mounting and migrating a boot drive. I've got most of it working, but would like to populate my /boot/cmdline.txt and fstab files with drive UUID and PARTUUID numbers.
I basically set a variable with the output of blkid:
disk=$(blkid)
echo "${disk}"
RESULT:
/dev/mmcblk0p1: LABEL_FATBOOT="boot" LABEL="boot" UUID="69D5-9B27" TYPE="vfat" PARTUUID="d9b3f436-01"
/dev/mmcblk0p2: LABEL="rootfs" UUID="24eaa08b-10f2-49e0-8283-359f7eb1a0b6" TYPE="ext4" PARTUUID="d9b3f436-02"
/dev/sda1: LABEL="usbfs" UUID="493b6467-7b7b-4291-a86d-dea5e842780b" TYPE="ext4" PARTUUID="83122dbb-cacf-4612-9be2-4301a03e8093"
/dev/mmcblk0: PTUUID="d9b3f436" PTTYPE="dos"
My goal is to set one variable to capture the /dev/sda1 value for UUID and another for the same drive's PARTUUID. My basic premise is to do something like this (based on being able to do this in Python):
# pseudocode #
Disk=diskInfo
While line in Disk; do
If Line contains /dev/sda1
Then
Do some Regex to set vUUID = "493b6467-7b7b-4291-a86d-dea5e842780b"
Do some Regex to set vPARTUUID = "83122dbb-cacf-4612-9be2-4301a03e8093"
I think I want something like this, but I can't get it to work:
disk=$(blkid)
while read line; do
if [[ $line == '/dev/sda1'* ]]; then
if [[ $line =~ UUID=(["'])(?:(?=(\\?))\2.)*?\1 ]]; then #captures too much
vUUID=${BASH_REMATCH[1]}
fi
if [[ $line =~ PARTUUID=(["'])(?:(?=(\\?))\2.)*?\1 ]]; then #captures too much
vPARTUUID=${BASH_REMATCH[1]}
fi
fi
done <<< "$disk"
You don't need a loop here.
$ IFS=\" read -r _ vUUID _ vPARTUUID _ < <(blkid /dev/sda1 -s UUID -s PARTUUID)
$
$ echo $vUUID
9099-AD46
$
$ echo $vPARTUUID
90afc43c-5b4d-4721-b82a-000e585fef62
If there is no such disk, read will silently fail with a non-zero exit status, so you can use it as the condition of an if statement.
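A minimal sketch of using read as the condition follows. To keep it runnable without a real disk, it replaces blkid with a stand-in function, fake_blkid, which is purely hypothetical; get_ids is likewise an illustrative name. In a real script you would call blkid as shown above.

```bash
#!/usr/bin/env bash
# fake_blkid is a hypothetical stand-in for: blkid "$1" -s UUID -s PARTUUID
# It prints one line for /dev/sda1 and nothing for any other device.
fake_blkid() {
    [[ $1 == /dev/sda1 ]] || return 2    # any non-zero status, no output
    printf '%s\n' '/dev/sda1: UUID="493b6467-7b7b-4291-a86d-dea5e842780b" PARTUUID="83122dbb-cacf-4612-9be2-4301a03e8093"'
}

# Sets vUUID and vPARTUUID; returns non-zero when there was no line to read.
get_ids() {
    IFS=\" read -r _ vUUID _ vPARTUUID _ < <(fake_blkid "$1")
}

if get_ids /dev/sda1; then
    echo "UUID=$vUUID PARTUUID=$vPARTUUID"
else
    echo "no such disk" >&2
fi
```

When read hits end of input without reading a line, it returns non-zero and leaves both variables empty, so the else branch can handle the missing disk.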
You are almost there.
Would you please try:
pat='^/dev/sda1.* UUID="([^"]+)".* PARTUUID="([^"]+)"'
while IFS= read -r line; do
if [[ $line =~ $pat ]]; then
vUUID="${BASH_REMATCH[1]}"
vPARTUUID="${BASH_REMATCH[2]}"
fi
done < <(blkid)
result:
echo "vUUID=$vUUID"
vUUID=493b6467-7b7b-4291-a86d-dea5e842780b
echo "vPARTUUID=$vPARTUUID"
vPARTUUID=83122dbb-cacf-4612-9be2-4301a03e8093
Hope this helps.

ShellScript - IF handling filename pattern

I have a directory in UNIX which has thousands of .TGZ compressed files; they follow this pattern:
01.red.something.tgz
02.red.something.tgz
03.red.anything.tgz
04.red.something.tgz
01.blue.something.tgz
02.blue.everything.tgz
03.blue.something.tgz
04.blue.something.tgz
01.yellow.something.tgz
02.yellow.blablathing.tgz
03.yellow.something.tgz
04.yellow.something.tgz
They take up a large amount of space on the filesystem, and I need to list their contents without extracting the files themselves. That will take some time, so I believe a shell script fits the need. I'm fairly new to shell scripting and still learning, so I wrote this .sh:
$pattern = "red"
for file in *.tgz
do
if [[ ${file} == '...${pattern}.*.tgz' ]]; then
echo" ==> ${file} match the pattern and the output dir is : out/"
tar -tf $file > ./out/$file
else
echo "${file} Doesn't match the pattern"
fi
done
But I've done something wrong in the if part: even when the pattern matches, I get the 'Doesn't match the pattern' message.
I know it's a fairly simple if, but I can't understand why it doesn't work. I'd be thankful if you could explain why.
Thank you.
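Watch out for spaces when you create variables in bash: an assignment takes no leading $ and no spaces around =. In the if, the pattern must not be wrapped in ' (single quotes) or " (double quotes) if you want it treated as a pattern. Note that == inside [[ ]] does glob matching, not regex matching. Use: if [[ ${file} == ${regEx} ]];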
Test:
$ ls *.tgz
01.red.something.tgz 01.yellow.something.tgz
$ ./t.sh
==> 01.red.something.tgz match the pattern and the output dir is : out/
01.yellow.something.tgz Doesn't match the pattern
$ cat t.sh
#!/bin/bash
pattern="red"
regEx="*.${pattern}.*.tgz"
for file in *.tgz
do
if [[ ${file} == ${regEx} ]]; then
echo " ==> ${file} match the pattern and the output dir is : out/"
#tar -tf $file > ./out/$file
else
echo "${file} Doesn't match the pattern"
fi
done
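The effect of quoting can be seen in isolation with a short sketch. The helper name has_color is an assumption made for illustration; it quotes only the literal part of the pattern and leaves the wildcards unquoted.

```bash
#!/usr/bin/env bash
# has_color is a hypothetical helper name used only for this illustration.
has_color() {
    local file=$1 color=$2
    # The literal part is quoted, the wildcards are not.
    [[ $file == *".${color}."*.tgz ]]
}

file='01.red.something.tgz'
glob='*.red.*.tgz'

if [[ $file == $glob ]]; then echo "unquoted: pattern match"; fi
if [[ $file == "$glob" ]]; then echo "quoted: match"; else echo "quoted: compared literally, no match"; fi
if has_color "$file" red; then echo "has_color: match"; fi
```

Quoting the right-hand side turns the comparison into a plain string equality test, which is why the original script always took the else branch.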

Find a string in a file name (shell script)

I am trying to use regex to match a file name and extract only a portion of the file name. My file names have this pattern: galax_report_for_Sample11_8757.xls, and I want to extract the string Sample11 in this case. I have tried the following regex, but it does not work for me, could someone help with the correct regex?
name=galax_report_for_Sample11_8757.xls
sampleName=$([[ "$name" =~ ^[^_]+_([^_]+) ]] && echo ${BASH_REMATCH[2]})
edit:
just found this works for me:
sampleName=$([[ "$name" =~ ^[^_]+_([^_]+)_([^_]+)_([^_]+) ]] && echo ${BASH_REMATCH[3]})
In a simple case like this, where you essentially have just a list of values separated by a single instance of a separator character each, consider using cut to extract the field of interest:
sampleName=$(echo 'galax_report_for_Sample11_8757.xls' | cut -d _ -f 4)
If you're using bash or zsh or ksh, you can make it a little more efficient:
sampleName=$(cut -d _ -f 4 <<< 'galax_report_for_Sample11_8757.xls')
Here is a slightly shorter alternative to the approach you used:
sampleName=$([[ "$name" =~ ^([^_]+_){3}([^_]+) ]] && echo ${BASH_REMATCH[2]})
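If you would rather avoid both the regex and the external cut process, a bash-only sketch is to split the name on "_" into an array with read. The array name parts is an illustrative choice, and the sketch assumes the sample name is always the fourth underscore-separated field.

```bash
#!/usr/bin/env bash
name='galax_report_for_Sample11_8757.xls'

# Split on "_" into an array; the array name "parts" is illustrative.
IFS=_ read -r -a parts <<< "$name"
sampleName=${parts[3]}    # fields are zero-indexed, so this is the 4th one

echo "$sampleName"
```

Setting IFS only as a prefix to read keeps the change local to that one command.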

Bash Script sed command not working correctly with file passed through command line

Problem
I am trying to write a script that renames a large number of files according to a regex. The command works when I run it directly in iTerm2, but the same command fails inside the script.
Also, some of my file names include Chinese and Korean characters (I don't know whether that is the problem or not).
code
My code takes three inputs: the old regex, the new regex, and the files that need to be renamed.
Here is my code:
#!/bin/bash
# we have less than 3 arguments. Print the help text:
if [ $# -lt 3 ] ; then
cat << HELP
ren -- renames a number of files using sed regular expressions
USAGE: ren 'regexp' 'replacement' files...
EXAMPLE: rename all *.HTM files into *.html:
ren 'HTM' 'html' *.HTM
HELP
exit 0
fi
OLD="$1"
NEW="$2"
# The shift command removes one argument from the list of
# command line arguments.
shift
shift
# "$@" now contains all the files:
for file in "$@"; do
if [ -f "$file" ] ; then
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
if [ -f "$newfile" ]; then
echo "ERROR: $newfile exists already"
else
echo "renaming $file to $newfile ..."
mv "$file" "$newfile"
fi
fi
done
I register the bash command in the .profile as:
alias ren="bash /pathtothefile/ren.sh"
Test
The original file name is "제01과.mp3" and I want it to become "第01课.mp3".
So with my script I use:
$ ren "제\([0-9]*\)과" "第\1课" *.mp3
And it seems that the sed in the script has not worked successfully.
But the following, which is effectively the same, does replace the name:
$ echo "제01과.mp3" | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
Any thoughts? Thx
Print the result
I have made the following change in the script so that it prints progress information:
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
echo "The ${file} is changed to ${newfile}"
And the result for my test is:
The 제01과.mp3 is changed into 제01과.mp3
ERROR: 제01과.mp3 exists already
So there is no format problem.
Updating(all done under bash 4.2.45(2), Mac OS 10.9)
Testing
When I try to execute the command directly from bash, I mean with the for loop, something interesting happens. I first stored all the names in a files.txt file using:
$ ls | grep mp3 > files.txt
Then I run sed on them. A single command in interactive bash like:
$ file="제01과.mp3"
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
gives
第01课.mp3
While the following, also in interactive mode:
files=`cat files.txt`
for file in $files
do
echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
done
gives no changes!
And by now:
echo $file
gives:
$ 제30과.mp3
(There are only 30 files)
Problem Part
And I tried the first command which worked before:
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
It gives no changes as:
$ 제30과.mp3
So I created a new variable newfile and tried again:
$ newfile="제30과.mp3"
$ echo $newfile | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
And it gives correctly:
$第30课.mp3
WOW ORZ... Why! Why ! Why! And I try to see whether file and newfile are the same, and of course, they are not:
if [[ $file == $newfile ]]; then
echo True
else
echo False
fi
gives:
False
My guess
I guess there is some encoding problem, but I have found no reference. Could anyone help? Thanks again.
Update 2
I seem to understand that there is a huge difference between a string and a file name. To be specific, if I directly use a variable like:
file="제30과.mp3"
in the script, the sed works fine. However, if the variable was passed in via "$@" or set like:
file=./*mp3
then the sed fails to work. I don't know why. By the way, Mac sed has no -r option, and on Ubuntu -r does not solve the problem I mention above.
Several errors combined:
To use unescaped ( ) groups in a regex, you need extended regex: -r in GNU sed (-E on Mac/BSD sed), -E in grep. In basic regex, groups must be written as \( \).
Escaping correctly is a beast :)
Example
files="제2과.mp3 제30과.mp3"
for file in $files
do
echo $file | sed -r 's/제([0-9]*)과\.mp3/第\1课.mp3/g'
done
outputs
第2课.mp3
第30课.mp3
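Before renaming anything, it can help to preview what sed would produce. The sketch below is only an illustration: new_name and preview are hypothetical helper names, the regexes are assumed not to contain "/" (the s-command delimiter used here), and sed -E is used because both GNU and Mac/BSD sed accept it.

```bash
#!/usr/bin/env bash
# new_name is a hypothetical helper: prints the name that results from
# applying s/OLD/NEW/g (extended regex) to one file name.
new_name() {
    local old=$1 new=$2 file=$3
    sed -E "s/${old}/${new}/g" <<< "$file"
}

# Dry run: show what would be renamed without touching any file.
preview() {
    local old=$1 new=$2 file
    shift 2
    for file in "$@"; do    # "$@" keeps each file name as one word
        printf '%s -> %s\n' "$file" "$(new_name "$old" "$new" "$file")"
    done
}

preview 'track([0-9]+)' 'song\1' track01.mp3 track30.mp3
```

If a previewed name comes back unchanged, the regex did not match that file name as the shell actually passed it, which narrows the problem down before any mv runs.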
If you are not doing this as a programming project, but want to skip ahead to the part where it just works, I found these resources listed at http://www.tldp.org/LDP/GNU-Linux-Tools-Summary/html/x4055.htm:
MMV (and MCP, MLN, ...) utilities use a specialized syntax to perform bulk file operations on paths. (http://linux.maruhn.com/sec/mmv.html)
mmv before\*after.mp3 Before\#1After.mp3
Esomaniac, a Java alternative that also works on Windows, is apparently dead (home page is parked).
rename is a perl script you can download from CPAN: https://metacpan.org/release/File-Rename
rename 's/\.JPG$/.jpg/' *.JPG

Parse date timestamp in SQL filename using RegEx

I am building a bash script for backing up databases. I've already set up a cron job to run this script daily, and I can already dump the .sql files with names in the following format:
YYYYMMDD_HHMMSS-databasename.sql
Given the timestamp-formatted name, I want to build another bash script that parses the YYYYMMDD part of the filename and selects all the daily files of the last week. This new script will run weekly.
How can I parse these numbers into a date using regex?
Selecting the date part from a filename with RegEx:
^(20[12]\d)(0[1-9]|1[012])(0[1-9]|[12]\d|3[01])_\d+-\w+\.sql$
Explained regex here: http://regex101.com/r/iU7wL5
Update with correct time validation also:
^(20[12]\d)(0[1-9]|1[012])(0[1-9]|[12]\d|3[01])_([01]\d|2[0-3])[0-5]\d[0-5]\d-\w+\.sql$
Explained demo: http://regex101.com/r/yV1dD7
Note: this works for dates in the 2010-2029 range and validates the filename against your output format.
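The \d and \w shorthands in the patterns above are PCRE syntax, while bash's =~ operator uses POSIX extended regex, where they are not reliably supported. As a sketch under that assumption, the same pattern can be rewritten with bracket expressions and used directly in bash. The function name parse_dump_date and the variables year, month and day are illustrative choices.

```bash
#!/usr/bin/env bash
# Same idea as the pattern above, rewritten with POSIX ERE classes.
# parse_dump_date is a hypothetical helper name.
re='^(20[12][0-9])(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])_([01][0-9]|2[0-3])[0-5][0-9][0-5][0-9]-[[:alnum:]_]+\.sql$'

# On success, sets year, month and day from the file name; otherwise returns 1.
parse_dump_date() {
    [[ $1 =~ $re ]] || return 1
    year=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} day=${BASH_REMATCH[3]}
}

if parse_dump_date '20130321_145907-databasename.sql'; then
    echo "$year-$month-$day"
fi
```

Keeping the pattern in a variable and leaving $re unquoted on the right of =~ is what makes bash treat it as a regex.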
Here is a complete solution.
Without regex, using a substring offset (assuming all files follow the format of your example, as they would when written by a cron script):
cd /path/to/dumps
str='20130321_145907-databasename.sql'
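for i in {7..14}; do
# one -d argument: the date from the filename, then the relative offset
curfile=$(date -d "${str:0:8} $i days ago" '+%Y%m%d')*
for f in $curfile; do # unquoted on purpose so the glob expands
if [[ -s $f ]]; then
: # do something with "$f"
fi
done
done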
If you really need a regex :
cd /path/to/dumps
str='20130321_145907-databasename.sql'
if [[ $str =~ ^([0-9]{8})_[0-9]{6} ]]; then
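for i in {7..14}; do
# one -d argument: the captured date, then the relative offset
curfile=$(date -d "${BASH_REMATCH[1]} $i days ago" '+%Y%m%d')*
for f in $curfile; do # unquoted on purpose so the glob expands
if [[ -s $f ]]; then
: # do something with "$f"
fi
done
done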
fi
Note
note the final glob * on the curfile= line
With bash 3+:
$ file=20130321_foo.log
$ [[ $file =~ ^[0-9]{8} ]]
$ echo ${BASH_REMATCH[0]}
20130321
$
What about this?
LAST_WEEK_BEG=$(date --date="-14 days" +%Y%m%d)
LAST_WEEK_END=$(date --date="-7 days" +%Y%m%d)
if [ "$YOUR_DATE" -ge "$LAST_WEEK_BEG" ] && [ "$YOUR_DATE" -le "$LAST_WEEK_END" ]; then
: # do things
fi
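The range check can be combined with the filename prefix to select dump files directly. This is a sketch only: in_range is a hypothetical helper name, the file names are made up, and fixed bounds are used so the example is reproducible. In the weekly job the bounds would come from date, as in the snippet above.

```bash
#!/usr/bin/env bash
# in_range is a hypothetical helper: succeeds when the YYYYMMDD prefix of a
# dump file name lies between two YYYYMMDD bounds (inclusive).
in_range() {
    local file=$1 beg=$2 end=$3
    local day=${file:0:8}
    [[ $day =~ ^[0-9]{8}$ ]] || return 1     # not a dated file name
    [ "$day" -ge "$beg" ] && [ "$day" -le "$end" ]
}

# Illustrative file names; only the middle one falls inside the bounds.
for f in 20130313_010000-db.sql 20130318_010000-db.sql 20130322_010000-db.sql; do
    if in_range "$f" 20130314 20130321; then
        echo "selected: $f"
    fi
done
```

Because YYYYMMDD sorts the same way numerically and chronologically, a plain integer comparison is enough and no date arithmetic is needed per file.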