Find and replace date/time in string - regex

I have already made a regex which should work, but it doesn't work.
echo "FileName.17:09:2010 4.16.PM.720p.mp4" | sed -E 's/\d{2}:\d{2}:\d{4}\ (\d|\d{2})\.(\d{2}|\d)\.((AM)|(PM))//g'
Should output: FileName..720p.mp4
But instead outputs the same "FileName.17:09:2010 4.16.PM.720p.mp4".

Is \d a valid character class in sed? Try replacing it with [0-9] or [[:digit:]] (the [:digit:] class is only valid inside a bracket expression). See re_format.
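A minimal sketch of that suggestion, assuming a sed that accepts -E (GNU and BSD sed both do). The variable names f and cleaned are made up for illustration, and {1,2} stands in for the question's (\d|\d{2}) alternation.

```shell
f='FileName.17:09:2010 4.16.PM.720p.mp4'

# Same expression as in the question, with \d spelled as [0-9].
cleaned=$(printf '%s\n' "$f" |
    sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{4} [0-9]{1,2}\.[0-9]{1,2}\.(AM|PM)//')

printf '%s\n' "$cleaned"   # prints FileName..720p.mp4
```

This leaves the two dots that surrounded the timestamp, which is the output the question asked for.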

#! /bin/sh
f="FileName.17:09:2010 4.16.PM.720p.mp4"
echo ${f%%.*}${f##*[AP]M}
This works for any variable containing a string matching the pattern. See Parameter Expansion for shell variables.
With sed, just delete from the first dot to the AM or PM. Or, if the filename could have extraneous dots, delete from the first number followed by ':' up to [AP]M: 's/\.[0-9]\+:.*[AP]M//'
I think the sed way might be better, because when the pattern does not match it returns the original string. The shell expression fails for names without AM or PM: the second expansion then removes nothing, so part of the name is printed twice. Error checking is easy to add, though.
echo FileName.17:09:2010 4.16.PM.720p.mp4|sed -e 's/\..*[AP]M\././'
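One way the error checking for the parameter-expansion version could look, as a sketch only. The function name strip_stamp is hypothetical, and the case pattern assumes the timestamp always sits between a dot and ".AM." or ".PM." as in the example.

```shell
# Print the name without the timestamp, or unchanged if no timestamp is found.
strip_stamp() {
    case $1 in
        *.*[AP]M.*) printf '%s\n' "${1%%.*}${1##*[AP]M}" ;;
        *)          printf '%s\n' "$1" ;;
    esac
}

strip_stamp 'FileName.17:09:2010 4.16.PM.720p.mp4'   # prints FileName.720p.mp4
strip_stamp 'plain.mp4'                              # prints plain.mp4
```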

bash$ echo "FileName.17:09:2010 4.16.PM.720p.mp4"|ruby -e 's=gets.chomp.split(".");puts [s[0],s[-2,2]].join(".")'
FileName.720p.mp4

Related

Deleting everything after a pattern in Unix

I have a string replenishment_category string,Date string, I want to delete everything starting with Date (including it), also the comma before it if it is present.
I have the string to be replaced stored in a variable:
PARTITION_COLUMN='Date'
I tried sed to replace everything after the variable PARTITION_COLUMN
echo "replenishment_category string,Date string" | sed "s/"$PARTITION_COLUMN".* //g"
but the output still has the string that follows the date:
replenishment_category string,string
How do I remove the string part and also the comma preceding the Date?
Try this:
echo "replenishment_category string,Date string" | sed "s/$PARTITION_COLUMN.*//"
Notice the space removed after .* and the double quote around the entire command.
You could do this with shell parameter expansion alone, assuming you have extended globs enabled (shopt -s extglob):
$ var='replenishment_category string,Date string'
$ part_column=Date
$ echo "${var%%?(,)"$part_column"*}"
replenishment_category string
The ${word%%pattern} expansion works without extended globs and removes the longest match of pattern from the end of $word.
The ?(pattern) extended pattern matches zero or one occurrences of pattern and is used to remove the comma if present.
"$part_column"* matches any string that begins with the expansion of $part_column. Quoting it is not required in the example, but good practice to prevent glob characters from expanding.
Since you want to remove the variable's value as well as the comma before it, the following sed may help:
echo "replenishment_category string,Date string" | sed "s/,$PARTITION_COLUMN.*//g"
sed would obviously work here:
echo "replenishment_category string,Date string" | sed "s/\b,$PARTITION_COLUMN.*//"
Output:
replenishment_category string
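For the "comma if it is present" part, a sed sketch that makes the comma optional with a POSIX interval. The function name strip_from_column is made up, and the sketch assumes the column name contains no regex metacharacters and no slash.

```shell
PARTITION_COLUMN='Date'

# Delete an optional comma, the column name, and everything after it.
strip_from_column() {
    printf '%s\n' "$1" | sed "s/,\{0,1\}${PARTITION_COLUMN}.*//"
}

strip_from_column 'replenishment_category string,Date string'
# prints: replenishment_category string
```

If the string starts with the column name, so there is no comma to remove, the result is an empty line.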

How can I use sed to regex string and number in bash script

I want to separate string and number in a file to get a specific number in bash script, such as:
Branches executed:75.38% of 1190
I want to get only the number
75.38
I tried the code below
$new_value=value | sed -r 's/.*_([0-9]*)\..*/\1/g'
but it was incorrect and failed.
How should it work? Thank you in advance for your help.
You can use the following regex to extract the first number in a line:
^[^0-9]*\([0-9.]*\).*$
Usage:
% echo 'Branches executed:75.38% of 1190' | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
75.38
Give this a try:
value=$(sed "s/^Branches executed:\([0-9][.0-9]*[0-9]*\)%.*$/\1/" afile)
This assumes afile contains only that line: sed prints non-matching lines unchanged, so any other lines in the file would end up in value as well.
The value is stored in the value variable.
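If the input has other lines too, a possible variant is to suppress default output with -n and print only the line where the substitution succeeded (the p flag). In this sketch a printf pipeline stands in for afile, and the "Lines executed" line is invented for the example.

```shell
# Two input lines; only the "Branches executed" one should be captured.
value=$(printf '%s\n' 'Lines executed:80.00% of 2000' 'Branches executed:75.38% of 1190' |
    sed -n 's/^Branches executed:\([0-9][0-9.]*\)%.*$/\1/p')

printf '%s\n' "$value"   # prints 75.38
```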
There are several things here that we could improve. One is that in sed's default (basic) syntax the grouping parentheses must be escaped, \(...\); with -r, as in your command, they must not be.
Another one is that it would be good to have a full specification of the input strings as well as a good script that can help us to play with this.
Anyway, this is my first attempt:
Update: I added a little more bash around this regex so it'll be easier to play with:
value='Branches executed:75.38% of 1190'
new_value=`echo $value | sed -e 's/[^0-9]*\([0-9]*\.[0-9]*\).*/\1/g'`
echo $new_value
Update 2: as john pointed out, it will match only numbers that contain a decimal dot. We can fix it with an optional group: \(\.[0-9]\+\)\?.
An explanation for the optional group:
\(...\) is a group.
\(...\)\? is a group that appears zero or one times (mind the escaped question mark).
\.[0-9]\+ is the pattern for a dot and one or more digits.
Putting all together:
value='Branches executed:75.38% of 1190'
new_value=`echo $value | sed -e 's/[^0-9]*\([0-9]\+\(\.[0-9]\+\)\?\).*/\1/g'`
echo $new_value
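The same expression can also be written with extended regular expressions, which avoids the backslashes and the GNU-only \+ and \?. A sketch, assuming a sed with -E; the function name first_number and the second sample line are made up.

```shell
# Print the first integer or decimal number found in the argument.
first_number() {
    printf '%s\n' "$1" | sed -E 's/[^0-9]*([0-9]+(\.[0-9]+)?).*/\1/'
}

first_number 'Branches executed:75.38% of 1190'   # prints 75.38
first_number 'Calls executed:42 of 50'            # prints 42
```

A line without any digit does not match and is printed unchanged.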

unix sed command regular expression

Can anyone explain to me how the regular expression works in this sed substitute command?
$ cat path.txt
/usr/kbos/bin:/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/local/sbin:/sbin:/bin/:/usr/sbin:/usr/bin:/opt/omni/bin:
/opt/omni/lbin:/opt/omni/sbin:/root/bin
$ sed 's/\(\/[^:]*\).**/\1/g' path.txt
/usr/kbos/bin
/usr/local/sbin
/opt/omni/lbin
The above sed command uses a back reference and the save operator.
Can anyone explain how the regular expression, especially /[^:]*, works in the substitute command to get only the first path in each line?
I think you wrote an extra asterisk * in your sed code, so it should be like this:
$ sed 's/\(\/[^:]*\).*/\1/g' file
/usr/kbos/bin
/usr/local/sbin
/opt/omni/lbin
Changing the delimiter helps to understand it a little better:
sed 's#\(/[^:]*\).*#\1#g'
The s#something#otherthing#g is a basic sed command that looks for something and replaces it with otherthing on every line of the file.
If you do s#\(something\)#\1#g then you "save" that something and can print it back with \1.
Hence, what it is doing is to match a pattern like /[^:]* and then print it back. /[^:]* means / followed by any run of characters other than :. So it will take / plus everything up to the first colon :. The trailing .* matches the rest of the line, so the whole line is replaced by the saved piece.
Small examples:
# get the leading lowercase letters
$ echo "hello123bye" | sed 's#\([a-z]*\).*#\1#g'
hello
# get everything until it finds the number 3
$ echo "hello123bye" | sed 's#\([^3]*\).*#\1#g'
hello12
[^:]*
in a regex matches any run of characters other than :, so it would match up to the end of this:
/usr/kbos/bin
also it would match these,
/usr/local/bin
/usr/jbin
/usr/bin
/usr/sas/bin
since each of these consists only of characters that are not :
.* matches any character, zero or more times.
Thus the regex [^:]*.* could match all of these expressions:
/usr/kbos/bin:/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/jbin:/usr/bin:/usr/sas/bin
/usr/bin:/usr/sas/bin
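However, you get only the first field (i.e. /usr/kbos/bin, via the back reference in sed), because sed takes the leftmost match, which starts at the beginning of the line, and makes it as long as possible: the group captures the first field and .* swallows the rest.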
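To see the role of the trailing .*, it may help to compare the command with and without it. This is only an illustration: the variable names are made up and the input is a shortened version of the first line of path.txt.

```shell
line='/usr/kbos/bin:/usr/local/bin:/usr/jbin'

# Without .*, the g flag lets the group match every path in turn.
each=$(printf '%s\n' "$line" | sed 's#\(/[^:]*\)#[\1]#g')

# With .*, the first match consumes the whole line, so only the first path survives.
first=$(printf '%s\n' "$line" | sed 's#\(/[^:]*\).*#\1#')

printf '%s\n' "$each"    # prints [/usr/kbos/bin]:[/usr/local/bin]:[/usr/jbin]
printf '%s\n' "$first"   # prints /usr/kbos/bin
```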

Extract string located after or between matched pattern(s)

Given a string "pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words"
Is it possible to extract string between "crop=" and the following space using bash and grep?
So if I match "crop=" how can I extract anything after it and before the following white space?
Basically, I need "720:568:0:4" to be printed.
I'd do it this way:
grep -o -E 'crop=[^ ]+' | sed 's/crop=//'
It uses sed which is also a standard command. You can, of course, replace it with another sequence of greps, but only if it's really needed.
I would use sed as follows:
echo "pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words" | sed 's/.*crop=\([0-9.:]*\)\(.*\)/\1/'
Explanation:
s/ : substitute
.*crop= : everything up to and including "crop="
\([0-9.:]*\) : match only digits, '.' and ':', zero or more times - I call this the backslash-bracketed expression
\(.*\) : match 'everything else' (probably not needed)
/\1/ : and replace with the first backslash-bracketed expression you found
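awk has no \1 back reference in a pattern, but match() sets RSTART and RLENGTH, which substr() can use (5 is the length of "crop="):
awk 'match($0, /crop=[0-9:]+/) { print substr($0, RSTART + 5, RLENGTH - 5) }'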
yet another way with bash pattern substitution
PAT="pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words"
RES=${PAT#*crop=}
echo ${RES%% *}
first remove everything up to and including the first crop= (# removes the shortest matching prefix)
then remove everything from the first space onward (%% removes the longest matching suffix)
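The two expansions can be wrapped in a small function that also reports when there is no crop= in the input, since otherwise the expansions would return the first word of the unchanged string. A sketch; the function name extract_crop and the helper variable rest are made up.

```shell
# Print the value after "crop=" up to the next space; return 1 if absent.
extract_crop() {
    case $1 in
        *crop=*)
            rest=${1#*crop=}               # drop everything through the first "crop="
            printf '%s\n' "${rest%% *}"    # drop everything from the first space on
            ;;
        *)
            return 1
            ;;
    esac
}

extract_crop 'pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words'
# prints 720:568:0:4
```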

How to retain the first instance of a match with sed

I have a set of tokens in data and wish to strip off the trailing ".[0-9]", but I cannot figure out how to quote the regexp properly. The first match should be everything up to the ., and the second the . and a number. I intend for the first match to be retained.
data="thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5"
data=`echo $data | sed s/\([a-zA-Z0-9_]+\)\(\.[0-9]\)/\1/g`
echo $data
Actual output:
thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5
Desired output:
thing thing__aaa thing__bbb thing__ccc other_aaa other_bbb other_ccc
The idea is that the unquoted ([a-zA-Z0-9_]+) is the first matching group, and the (\.[0-9]) matches the .number. the \1 should replace both groups with the first group.
How about just
echo $data | sed 's/\.[0-9]//g'
or if number may contain more digits, then
echo $data | sed 's/\.[0-9]\+//g'
It looks like you just want to delete all strings of the form \.[0-9]. So why not just do:
sed 's/\.[0-9]\+\b//g'
(This relies on GNU sed's \b and \+ extensions. For other sed you can do:
sed -e 's/\.[0-9][0-9]* / /g' -e 's/\.[0-9][0-9]*$//'
I normally don't encourage the use of shell specific extensions, but if you are using bash you might be happy using an array:
bash$ data=(thing thing__aaa.0 thing__bbb.3)
bash$ echo "${data[@]%.[0-9]*}"
Note that this will also delete extensions that are not all digits (ie foo.34bb), but perhaps is adequate for your needs.)
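With extended regular expressions the suffix can be required to be all digits and to end a token, which leaves something like foo.34bb alone. A sketch, assuming a sed with -E; the variable stripped is made up, and foo.34bb is appended to the question's data only to show that case.

```shell
data="thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5 foo.34bb"

# Remove ".digits" only when followed by a space or the end of the line;
# \1 puts the space (or nothing) back.
stripped=$(printf '%s\n' "$data" | sed -E 's/\.[0-9]+( |$)/\1/g')

printf '%s\n' "$stripped"
# prints: thing thing__aaa thing__bbb thing__ccc other_aaa other_bbb other_ccc foo.34bb
```

Keeping the sed script in single quotes matters here: in the question the unquoted backslashes were consumed by the shell before sed saw them.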