sed - get only text without extension - regex

How do I remove the extension in this SED statement?
Through
sed 's/.* - //'
File content
2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4
Actual
Filename.mp4
Desired
Filename

Based only on your shown samples, this can be done with simple commands in awk, sed and perl, as follows.
1st solution: Using sed, perform two simple substitutions to get the desired output.
sed 's/.*- //;s/\.mp4$//' Input_file
2nd solution: Using awk is even simpler: set a custom field separator and print the second-to-last field.
awk -F'- |[.]mp4' '{print $(NF-1)}' Input_file
3rd solution: Using substitution method in awk to get the required value as per OP's requirement.
awk '{gsub(/.*- |\.mp4$/,"")} 1' Input_file
4th solution: With a perl one-liner, we can grab the needed value by setting the field separators to a dash followed by whitespace and to .mp4, as follows:
perl -a -F'-\s+|\.mp4' -ne 'print "$F[$#F-1]\n";' Input_file
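If the extension is not always .mp4, the sed substitution from the 1st solution can be loosened to strip whatever follows the last dot. This is only a minimal sketch, assuming every line contains " - " and the file name ends in a single extension; the variable names are illustrative and not part of the original answer.

```shell
line='2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4'
# 1st expression: drop everything up to and including the last " - "
# 2nd expression: drop the final dot and whatever follows it
name=$(printf '%s\n' "$line" | sed 's/.* - //; s/\.[^.]*$//')
printf '%s\n' "$name"
```

For the sample line this prints Filename.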

The Bash way (which works in most similar shells such as zsh, sh, ksh) is:
fn="2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4"
base=${fn%.*}
ext=${fn#$base.}
echo "$base"
echo "$ext"
Prints:
2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename
mp4
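The expansion above keeps the prefix before " - ". To get only Filename, as the question asks, a second expansion can strip that prefix first. A sketch, assuming the separator is always the literal " - "; name and stem are hypothetical variable names.

```shell
fn='2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4'
name=${fn##* - }   # remove the longest prefix ending in " - "    -> Filename.mp4
stem=${name%.*}    # remove the shortest suffix starting at a dot -> Filename
printf '%s\n' "$stem"
```

Both expansions are plain POSIX parameter expansion, so no external process is started.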

You can use
#!/bin/bash
s='2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4'
sed -n 's/.* - \([^.]*\).*/\1/p' <<< "$s"
# => Filename
See the online demo.
Details:
-n - suppress default line output
s/ - substitute found pattern
.* - \([^.]*\).* - any text, space, -, space, then any zero or more chars other than a dot captured into Group 1, and then any text
/\1/ - replace found matches with Group 1 value
p - print the result of the substitution.
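Because [^.]* stops at the first dot, a name that itself contains dots is cut there. The sketch below contrasts this with a variant that removes only the final extension; the sample name My.Clip.mp4 and the variable names are made up for illustration.

```shell
s='2021-04-21_x.mp4 - My.Clip.mp4'
# Capture up to the first dot after " - "
first_dot=$(printf '%s\n' "$s" | sed -n 's/.* - \([^.]*\).*/\1/p')
# Capture everything after " - " except the last dot-suffix
last_dot=$(printf '%s\n' "$s" | sed -n 's/.* - \(.*\)\.[^.]*$/\1/p')
printf '%s\n%s\n' "$first_dot" "$last_dot"
```

This prints My and then My.Clip, so the choice depends on whether dots inside the name should be kept.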

Using gnu awk you can also use a capture group to get the filename
awk 'match($0, /.* - ([^.]+)\.mp4$/, a) {print a[1]}' file
Regex explanation
.* - Match any text up to the last occurrence of space, -, space
( Capture group 1 (Referred to by a[1] in the awk example)
[^.]+ Match 1+ times any char except a dot
) Close group 1
\.mp4$ Match .mp4 at the end of the string
Awk explanation
awk '
match($0, /.* - ([^.]+)\.mp4$/, a) { # Test if the line using $0 matches the pattern
print a[1] # Print the value of group 1
}
' file

Yet another awk:
awk '{sub(/\.[^.]+$/, ""); print $NF}' file
Filename

gawk/mawk/mawk2 'BEGIN { FS = "( \- |[.][^. ]+$)"
} NF > 2 { print $(NF-1) }'
No substr(), index(), match(), or sub() is needed. If you're VERY certain " - " can only occur once, then
awk 'BEGIN { FS = "(^.* \- |[.][^. ]+$)"; OFS = "" } --NF'

Related

Sed: cannot replace part of the string without replacing all of it

I am trying to replace part of the string, but can not find a proper regex for sed to execute it properly.
I have a string
/abc/foo/../bar
And I would like to achieve the following result:
/abc/bar
I have tried to do it using this command:
echo $string | sed 's/\/[^:-]*\..\//\//'
But as result I am getting just /bar.
I understand that I must use a group, but I just do not get it.
Could you please help me figure out which group to use?
You can use
#!/bin/bash
string='/abc/foo/../bar'
sed -nE 's~^(/[^/]*)(/.*)?/\.\.(/[^/]*).*~\1\3~p' <<< "$string"
See the online demo. Details:
-n - suppresses default line output
E - enables POSIX ERE regex syntax
^ - start of string
(/[^/]*) - Group 1: a / and then zero or more chars other than /
(/.*)? - an optional group 2: a / and then any text
/\.\. - a /.. fixed string
(/[^/]*) - Group 3: a / and then zero or more chars other than /
.* - the rest of the string.
\1\3 replaces the match with Group 1 and 3 values concatenated
p only prints the result of successful substitution.
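The expression above keeps the first and last path components, which fits the sample. For paths with several "name/.." pairs, one option is to delete one pair at a time in a loop. This is a sketch only, assuming an absolute path without leading ".." components and a sed that supports -E; collapse_dotdot is a hypothetical helper name.

```shell
collapse_dotdot() {
  # Delete "/<component>/.." when it is followed by "/" or the end of the line,
  # keeping that trailing "/" (group 1). "t a" jumps back to label "a" for as
  # long as a substitution succeeded.
  printf '%s\n' "$1" | sed -E -e ':a' -e 's~/[^/]+/\.\.(/|$)~\1~' -e 't a'
}
one=$(collapse_dotdot '/abc/foo/../bar')
two=$(collapse_dotdot '/a/b/../c/../d')
printf '%s\n%s\n' "$one" "$two"
```

This prints /abc/bar and /a/d. It does not resolve symlinks or validate the path; it only rewrites the text.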
You can use a capture group for the first part and then match until the last / to remove.
As you are using / to match in the pattern, you can opt for a different delimiter.
#!/bin/bash
string="/abc/foo/../bar"
sed 's~\(/[^/]*/\)[^:-]*/~\1~' <<< "$string"
The pattern in parts:
\( Capture group 1
/[^/]*/ Match from the first till the second / with any char other than / in between
\) Close group 1
[^:-]*/ Match optional chars other than : and - then match /
Output
/abc/bar
Using sed
$ sed 's#^\(/[^/]*\)/.*\(/\)#\1\2#' input_file
/abc/bar
or
$ sed 's#[^/]*/[^/]*/##2' input_file
/abc/bar
Using awk
string='/abc/foo/../bar'
awk -F/ '{print "/"$2"/"$NF}' <<< "$string"
#or
awk -F/ 'BEGIN{OFS=FS}{print $1,$2,$NF}' <<< "$string"
/abc/bar
Using bash
string='/abc/foo/../bar'
echo "${string%%/${string#*/*/}}/${string##*/}"
/abc/bar
Using any sed:
$ echo "$string" | sed 's:\(/[^/]*/\).*/:\1:'
/abc/bar

Match from beginning to word as long as there are no . in between: Convert grep -Po command to sed

I have made the following command to be able to match the string from the beginning of the line until the first occurrence of ".enabled" as long as there are no "." in between.
grep -Po '^\K[\w-]*?(?=\.enabled)'
input:
a-b-c.a.enabled.xxx.xx
a-b-c.a.b.enabled.xxx.xx
a-b-c.enabled.xxx.xx
output:
a-b-c
It runs properly on my local env with grep v3.1 but on Busybox v1.28.4 it says "grep: unrecognized option: P"
For that reason, I would like to convert this command to sed. Any input would be really helpful.
You can use
awk -F'.' '$2 == "enabled"{print $1}' file
sed -n 's/^\([^.]*\)\.enabled.*/\1/p' file
See the online demo.
Details:
awk:
-F'.' - the field separator is set to a .
$2 == "enabled" - if the Field 2 value is enabled, then
{print $1} - print Field 1 value
sed:
-n - suppresses default line output in the sed command
s/^\([^.]*\)\.enabled.*/\1/p - finds any zero or more chars other than . at the start of string (placing them into Group 1, \1), then a .enabled and then the rest of the string and replaces with the Group 1 value, and prints the resulting value.
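To check both commands against the three sample lines from the question, they can be run on the same input. A small sketch; the variable names are illustrative.

```shell
input=$(printf '%s\n' \
  'a-b-c.a.enabled.xxx.xx' \
  'a-b-c.a.b.enabled.xxx.xx' \
  'a-b-c.enabled.xxx.xx')
# Only the third line has "enabled" directly after the first dot
with_awk=$(printf '%s\n' "$input" | awk -F'.' '$2 == "enabled" {print $1}')
with_sed=$(printf '%s\n' "$input" | sed -n 's/^\([^.]*\)\.enabled.*/\1/p')
printf 'awk: %s\nsed: %s\n' "$with_awk" "$with_sed"
```

Both print a-b-c exactly once, because the first two lines have another field between the prefix and .enabled.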
You may use this equivalent sed of your grep -P command:
sed -nE 's/^([-_[:alnum:]]+)\.enabled.*/\1/p' file
a-b-c
Details:
-n: Suppresses normal output
-E: Enables extended regex mode
([-_[:alnum:]]+): [-_[:alnum:]] is equivalent to [-\w] or [-_a-zA-Z0-9]. It matches 1+ of these characters and captures them in group #1
\.enabled.*: matches .enabled followed by 0 or more of any character
\1: is the replacement string that puts the value captured in group #1 back into the output
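The two sed patterns differ in how strict the prefix is: [^.]* accepts any character except a dot, while [-_[:alnum:]]+ mirrors the original [\w-] class. The sketch below shows the difference on a made-up line containing a space and "!"; it assumes a sed with -E support.

```shell
line='a b!c.enabled.x'
# Accepts any non-dot characters before ".enabled"
loose=$(printf '%s\n' "$line" | sed -n 's/^\([^.]*\)\.enabled.*/\1/p')
# Accepts only "-", "_" and alphanumerics before ".enabled"
strict=$(printf '%s\n' "$line" | sed -nE 's/^([-_[:alnum:]]+)\.enabled.*/\1/p')
printf 'loose=[%s] strict=[%s]\n' "$loose" "$strict"
```

The loose pattern prints the prefix a b!c, while the strict one prints nothing, which is closer to what the grep -P command does.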
With your shown samples, you could try following.
awk -F'\\.enabled' '$1~/^[-_[:alnum:]]+$/{print $1}' Input_file
Explanation: Set the field separator to .enabled for all lines. Then, in the main program, check whether the 1st field consists only of -, _ or alphanumeric characters; if so, print the 1st field.

How to extract text between first 2 dashes in the string using sed or grep in shell

I have the string like this feature/test-111-test-test.
I need to extract the string up to the second dash and change the forward slash to a dash as well.
I have to do it in a Makefile using shell syntax, and some regular expressions that could help in this case do not work for me there.
In the end I need to get something like this:
input - feature/test-111-test-test
output - feature-test-111- or at least feature-test-111
feature/test-111-test-test | grep -oP '\A(?:[^-]++-??){2}' | sed -e 's/\//-/g')
But grep -oP doesn't work in my case. This regexp doesn't work as well - (.*?-.*?)-.*.
Another sed solution using a capture group and regex/pattern iteration (same thing Socowi used):
$ s='feature/test-111-test-test'
$ sed -E 's/\//-/;s/^(([^-]*-){3}).*$/\1/' <<< "${s}"
feature-test-111-
Where:
-E - enable extended regex support
s/\//-/ - replace / with -
s/^....*$/ - match start and end of input line
(([^-]*-){3}) - capture group #1 that consists of 3 sets of anything not - followed by -
\1 - print just the capture group #1 (this will discard everything else on the line that's not part of the capture group)
To store the result in a variable:
$ url=$(sed -E 's/\//-/;s/^(([^-]*-){3}).*$/\1/' <<< "${s}")
$ echo $url
feature-test-111-
You can use awk keeping in mind that in Makefile the $ char in awk command must be doubled:
url=$(shell echo 'feature/test-111-test-test' | awk -F'-' '{gsub(/\//, "-", $$1);print $$1"-"$$2"-"}')
echo "$url"
# => feature-test-111-
See the online demo. Here, -F'-' sets the field delimiter as -, gsub(/\//, "-", $1) replaces / with - in Field 1 and print $1"-"$2"-" prints Fields 1 and 2, each followed by a hyphen.
Or, with a regex as a field delimiter:
url=$(shell echo 'feature/test-111-test-test' | awk -F'[-/]' '{print $$1"-"$$2"-"$$3"-"}')
echo "$url"
# => feature-test-111-
The -F'[-/]' option sets the field separator to - and /.
The '{print $1"-"$2"-"$3"-"}' part prints the first, second and third value with a separating hyphen.
See the online demo.
To get the nth occurrence of a character C you don't need fancy perl regexes. Instead, build a regex of the form "(anything that isn't C, then C) for n times":
grep -Eo '([^-]*-){2}' | tr / -
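The same idea can be wrapped in a small function that takes the count as a parameter. A sketch, assuming a grep that supports -o (GNU and BSD grep do); upto_nth_dash is a hypothetical name, and the ^ anchor is an addition so that only the leading part of the string is reported.

```shell
upto_nth_dash() {
  # $1 = input string, $2 = how many dash-terminated chunks to keep
  printf '%s\n' "$1" | grep -Eo "^([^-]*-){$2}" | tr / -
}
two=$(upto_nth_dash 'feature/test-111-test-test' 2)
three=$(upto_nth_dash 'feature/test-111-test-test' 3)
printf '%s\n%s\n' "$two" "$three"
```

This prints feature-test-111- and feature-test-111-test-.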
With sed and cut
echo feature/test-111-test-test| cut -d'-' -f-2 |sed 's/\//-/'
Output
feature-test-111
echo feature/test-111-test-test| cut -d'-' -f-2 |sed 's/\//-/;s/$/-/'
Output
feature-test-111-
You can use the simple regex form "not something, then that something", here [^-]*-, to match all characters other than - up to and including a -.
This works:
echo 'feature/test-111-test-test' | sed -nE 's/^([^/]*)\/([^-]*-[^-]*-).*/\1-\2/p'
feature-test-111-
Another idea using parameter expansions/substitutions:
s='feature/test-111-test-test'
tail="${s//\//-}" # replace '/' with '-'
# split first field from rest of fields ('-' delimited); do this 3x times
head="${tail%%-*}" # pull first field
tail="${tail#*-}" # drop first field
head="${head}-${tail%%-*}" # pull first field; append to previous field
tail="${tail#*-}" # drop first field
head="${head}-${tail%%-*}-" # pull first field; append to previous fields; add trailing '-'
$ echo "${head}"
feature-test-111-
A short sed solution, without extended regular expressions:
sed 's|\(.*\)/\([^-]*-[^-]*\).*|\1-\2|'

Do various substitutions but only before a character

I am doing something like this:
echo 'foo_bar_baz=foo_bar_baz' | sed -r 's/_([[:alnum:]])/\U\1/g'
and getting as result:
fooBarBaz=fooBarBaz
Is there a way of getting fooBarBaz=foo_bar_baz instead?
I tried to do this, non-greedy:
echo 'foo_bar_baz=foo_bar_baz' | sed -r 's/([^=].*?)_([[:alnum:]])/\1\U\2/g'
but the result is this:
foo_bar_baz=foo_barBaz
What I need is to convert from:
foo_bar_baz=foo_bar_baz
to:
fooBarBaz=foo_bar_baz
EDIT: Adding a more generic solution that also works for more than 3 values before =.
awk '
BEGIN{
FS=OFS="="
}
{
num=split($1,array,"_")
for(i=2;i<=num;i++){
val=(val?val:"")toupper(substr(array[i],1,1)) substr(array[i],2)
}
$1=array[1] val
val=""
}
1
' Input_file
This should be an easy task for awk.
echo 'foo_bar_baz=foo_bar_baz' | awk '
BEGIN{
FS=OFS="="
}
{
split($1,array,"_")
$1=array[1] toupper(substr(array[2],1,1)) substr(array[2],2) toupper(substr(array[3],1,1)) substr(array[3],2)
}
1'
To simply remove _ in the first part, use this (it will not capitalize the letters):
echo 'foo_bar_baz=foo_bar_baz' | awk 'BEGIN{FS=OFS="="}{gsub(/_/,"",$1)} 1'
You may use
s='foo_bar_baz=foo_bar_baz'
sed -E ':a;s/^([^=_]*)_([[:alnum:]])/\1\U\2/g; ta' <<< "$s"
# => fooBarBaz=foo_bar_baz
See the online sed demo
Details
:a - define an a label to jump to if the substitution is a success
s/^([^=_]*)_([[:alnum:]])/\1\U\2/g - find
^ - start of string
([^=_]*) - Group 1 (\1 in the replacement pattern): any 0+ chars other than = and _
_ - an underscore
([[:alnum:]]) - Group 2 (\2 in the replacement pattern): an alphanumeric char
\1\U\2 - Group 1 value and then an uppercased Group 2 value
ta - t is a branch command making sed go back to the a label and repeat matching.
This might work for you (GNU sed):
sed -E 'h;s/_(.)/\u\1/g;G;s/=.*=/=/' file
Make a copy of the current line. Remove all _'s and uppercase the following characters. Append the copy and replace everything between ='s with a single =.
An alternative:
sed -E ':a;s/_(.*=)/\u\1/;ta' file
With GNU awk for the 3rd arg to match():
$ echo 'foo_bar_baz=foo_bar_baz' |
awk '{while (match($0,/(.*)_(.)(.*=.*)/,a)) $0 = a[1] toupper(a[2]) a[3]} 1'
fooBarBaz=foo_bar_baz
Note that the above solution is not restricted to any specific number of _s nor any specific letter following the underscores:
$ echo 'wee_sleekit_cowrin_timrous_beastie=foo_bar_baz' |
awk '{while (match($0,/(.*)_(.)(.*=.*)/,a)) $0 = a[1] toupper(a[2]) a[3]} 1'
weeSleekitCowrinTimrousBeastie=foo_bar_baz
Change _(.) to _([[:lower:]]) if you only want the underscores removed when followed by a lower case letter.
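The three-argument match() used above is a GNU awk extension. Where only a POSIX awk (for example mawk) is available, a similar loop can be built from index(), two-argument match(), RSTART and substr(). This is a sketch under that assumption, not the original author's code; it treats everything before the first = as the key.

```shell
camel=$(printf '%s\n' 'wee_sleekit_cowrin_timrous_beastie=foo_bar_baz' |
awk '{
  eq   = index($0, "=")          # position of the first "=", 0 if absent
  key  = substr($0, 1, eq - 1)   # text before "="
  rest = substr($0, eq)          # "=" and everything after it
  # Replace one "_x" in the key with "X" per iteration until none is left
  while (match(key, /_./))
    key = substr(key, 1, RSTART - 1) toupper(substr(key, RSTART + 1, 1)) substr(key, RSTART + 2)
  print key rest
}')
printf '%s\n' "$camel"
```

This prints weeSleekitCowrinTimrousBeastie=foo_bar_baz. Lines without = pass through unchanged, because the key is then empty.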

How to print matched regex pattern using awk?

Using awk, I need to find a word in a file that matches a regex pattern.
I only want to print the word matched with the pattern.
So if in the line, I have:
xxx yyy zzz
And pattern:
/yyy/
I want to only get:
yyy
EDIT:
thanks to kurumi i managed to write something like this:
awk '{
for(i=1; i<=NF; i++) {
tmp=match($i, /[0-9]..?.?[^A-Za-z0-9]/)
if(tmp) {
print $i
}
}
}' $1
and this is what i needed :) thanks a lot!
This is the most basic form:
awk '/pattern/{ print $0 }' file
This asks awk to search for pattern using //, then print the line, which by default is called a record and is denoted by $0. The awk documentation covers this in more detail.
If you only want to print the matched word:
awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file
It sounds like you are trying to emulate GNU's grep -o behaviour. This will do that providing you only want the first match on each line:
awk 'match($0, /regex/) {
print substr($0, RSTART, RLENGTH)
}
' file
Here's an example, using GNU's awk implementation (gawk):
awk 'match($0, /a.t/) {
print substr($0, RSTART, RLENGTH)
}
' /usr/share/dict/words | head
act
act
act
act
aft
ant
apt
art
art
art
Read about match, substr, RSTART and RLENGTH in the awk manual.
After that you may wish to extend this to deal with multiple matches on the same line.
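One way to handle multiple matches on the same line is to keep calling match() on the remainder of the line. A minimal sketch in POSIX awk; the sample input and the [0-9]+ regex are placeholders.

```shell
all=$(printf '%s\n' 'a1 b22 c333' |
awk '{
  s = $0
  # Print each match, then continue searching in the text after it.
  # The regex must not match the empty string, or the loop would never end.
  while (match(s, /[0-9]+/)) {
    print substr(s, RSTART, RLENGTH)
    s = substr(s, RSTART + RLENGTH)
  }
}')
printf '%s\n' "$all"
```

This prints 1, 22 and 333 on separate lines, similar to what grep -o does.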
gawk can get the matching part of every line using this as action:
{ if (match($0,/your regexp/,m)) print m[0] }
match(string, regexp [, array])
If array is present, it is cleared,
and then the zeroth element of array is set to the entire portion of
string matched by regexp. If regexp contains parentheses, the
integer-indexed elements of array are set to contain the portion of
string matching the corresponding parenthesized subexpression.
http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions
If Perl is an option, you can try this:
perl -lne 'print $1 if /(regex)/' file
To implement case-insensitive matching, add the i modifier
perl -lne 'print $1 if /(regex)/i' file
To print everything AFTER the match:
perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile
To print the match and everything after the match:
perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile
If you are only interested in the last line of input and you expect to find only one match (for example a part of the summary line of a shell command), you can also try this very compact code, adapted from How to print regexp matches using `awk`?:
$ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
yyy
Or the more complex version with a partial result:
$ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
b
Warning: the awk match() function with three arguments only exists in gawk, not in mawk
Here is another solution that uses a lookbehind regex in grep instead of awk. It does not need gawk, but it does require a grep with PCRE support (the -P option):
$ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
b
Slightly off topic: this can also be done with grep. Posting it here in case anyone is looking for a grep solution:
echo 'xxx yyy zzze ' | grep -oE 'yyy'
Using sed can also be elegant in this situation. Example (replace line with matched group "yyy" from line):
$ cat testfile
xxx yyy zzz
yyy xxx zzz
$ cat testfile | sed -r 's#^.*(yyy).*$#\1#g'
yyy
yyy
Relevant manual page: https://www.gnu.org/software/sed/manual/sed.html#Back_002dreferences-and-Subexpressions
If you know what column the text/pattern you're looking for (e.g. "yyy") is in, you can just check that specific column to see if it matches, and print it.
For example, given a file with the following contents, (called asdf.txt)
xxx yyy zzz
to only print the second column if it matches the pattern "yyy", you could do something like this:
awk '$2 ~ /yyy/ {print $2}' asdf.txt
Note that this will also match basically any line where the second column has a "yyy" in it, like these:
xxx yyyz zzz
xxx zyyyz
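If the second column must be exactly "yyy", a string comparison or an anchored regex avoids those extra matches. A short sketch using the three sample lines above; the variable names are illustrative.

```shell
exact=$(printf '%s\n' 'xxx yyy zzz' 'xxx yyyz zzz' 'xxx zyyyz' |
  awk '$2 == "yyy" { print $2 }')      # plain string equality
anchored=$(printf '%s\n' 'xxx yyy zzz' 'xxx yyyz zzz' 'xxx zyyyz' |
  awk '$2 ~ /^yyy$/ { print $2 }')     # regex anchored at both ends
printf '%s\n%s\n' "$exact" "$anchored"
```

Each command prints yyy once, for the first line only.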
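A helper function that wraps match() and substr(), with optional trimming of the left and right ends of the match:
echo "abc123def" | awk '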
function MATCH(haystack, needle, ltrim, rtrim)
{
if(ltrim == 0 && !length(ltrim))
ltrim = 0;
if(rtrim == 0 && !length(rtrim))
rtrim = 0;
return substr(haystack, match(haystack, needle) + ltrim, RLENGTH - ltrim - rtrim);
}
{
print $0 " - " MATCH($0, "123"); # 123
print $0 " - " MATCH($0, "[0-9]*d", 0, 1); # 123
print $0 " - " MATCH($0, "1234"); # no match, so the extracted part is empty
}'