Do various substitutions but only before a character - regex

I am doing something like this:
echo 'foo_bar_baz=foo_bar_baz' | sed -r 's/_([[:alnum:]])/\U\1/g'
and getting as result:
fooBarBaz=fooBarBaz
Is there a way of getting fooBarBaz=foo_bar_baz instead?
I tried this non-greedy version:
echo 'foo_bar_baz=foo_bar_baz' | sed -r 's/([^=].*?)_([[:alnum:]])/\1\U\2/g'
but the result is this:
foo_bar_baz=foo_barBaz
What I need is to convert from:
foo_bar_baz=foo_bar_baz
to:
fooBarBaz=foo_bar_baz

EDIT: Adding a more generic solution, which also works for more than 3 values before the =.
awk '
BEGIN{
FS=OFS="="
}
{
num=split($1,array,"_")
for(i=2;i<=num;i++){
val=(val?val:"")toupper(substr(array[i],1,1)) substr(array[i],2)
}
$1=array[1] val
val=""
}
1
' Input_file
This should be an easy task for awk.
echo 'foo_bar_baz=foo_bar_baz' | awk '
BEGIN{
FS=OFS="="
}
{
split($1,array,"_")
$1=array[1] toupper(substr(array[2],1,1)) substr(array[2],2) toupper(substr(array[3],1,1)) substr(array[3],2)
}
1'
To simply remove the underscores in the first part (without capitalizing any letters), use:
echo 'foo_bar_baz=foo_bar_baz' | awk 'BEGIN{FS=OFS="="}{gsub(/_/,"",$1)} 1'
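As a commented sketch of the generic awk approach above, the program can be wrapped in a shell function and fed several lines at once. The function name camel_key and the second sample line are assumptions made for illustration; only POSIX awk features (split, toupper, substr) are used.

```shell
# camel_key is a hypothetical helper name, not part of the answers above.
# It camel-cases only the text before the first "=" on each input line.
camel_key() {
  awk '
  BEGIN { FS = OFS = "=" }
  {
    n = split($1, part, "_")        # break the key on underscores
    key = part[1]                   # the first piece stays as it is
    for (i = 2; i <= n; i++)        # capitalise the first letter of each later piece
      key = key toupper(substr(part[i], 1, 1)) substr(part[i], 2)
    $1 = key                        # fields after the first "=" are left untouched
  }
  1'
}

printf '%s\n' 'foo_bar_baz=foo_bar_baz' 'wee_sleekit=a_b=c_d' | camel_key
# prints:
# fooBarBaz=foo_bar_baz
# weeSleekit=a_b=c_d
```

Because only $1 is reassigned, any further = signs in the value are preserved when awk rebuilds the line with OFS.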

You may use
s='foo_bar_baz=foo_bar_baz'
sed -E ':a;s/^([^=_]*)_([[:alnum:]])/\1\U\2/g; ta' <<< "$s"
# => fooBarBaz=foo_bar_baz
Details
:a - define an a label to jump to if the substitution is a success
s/^([^=_]*)_([[:alnum:]])/\1\U\2/g - find
^ - start of string
([^=_]*) - Group 1 (\1 in the replacement pattern): any 0+ chars other than = and _
_ - an underscore
([[:alnum:]]) - Group 2 (\2 in the replacement pattern): an alphanumeric char
\1\U\2 - Group 1 value and then an uppercased Group 2 value
ta - t is a conditional branch: if the substitution succeeded, sed jumps back to the a label and tries the match again; the loop stops once the pattern no longer matches.
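To see what the label and branch contribute, the following sketch runs the substitution once and then in a loop. It assumes GNU sed, since \U in the replacement is a GNU extension; the function names once and looped are made up for this comparison. The g flag is dropped here because a pattern anchored with ^ can match at most once per pass.

```shell
# Requires GNU sed: \U (uppercase conversion) is a GNU extension.
# once and looped are hypothetical names used only for this comparison.
once()   { sed -E 's/^([^=_]*)_([[:alnum:]])/\1\U\2/'; }
looped() { sed -E ':a;s/^([^=_]*)_([[:alnum:]])/\1\U\2/;ta'; }

echo 'foo_bar_baz=foo_bar_baz' | once     # fooBar_baz=foo_bar_baz (only the first underscore)
echo 'foo_bar_baz=foo_bar_baz' | looped   # fooBarBaz=foo_bar_baz
echo 'foo=bar_baz' | looped               # foo=bar_baz (no underscore before the =)
```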

This might work for you (GNU sed):
sed -E 'h;s/_(.)/\u\1/g;G;s/=.*=/=/' file
Make a copy of the current line. Remove all _'s and uppercase the following characters. Append the copy and replace everything between ='s with a single =.
An alternative:
sed -E ':a;s/_(.*=)/\u\1/;ta' file

With GNU awk for the 3rd arg to match():
$ echo 'foo_bar_baz=foo_bar_baz' |
awk '{while (match($0,/(.*)_(.)(.*=.*)/,a)) $0 = a[1] toupper(a[2]) a[3]} 1'
fooBarBaz=foo_bar_baz
Note that the above solution is not restricted to any specific number of _s nor any specific letter following the underscores:
$ echo 'wee_sleekit_cowrin_timrous_beastie=foo_bar_baz' |
awk '{while (match($0,/(.*)_(.)(.*=.*)/,a)) $0 = a[1] toupper(a[2]) a[3]} 1'
weeSleekitCowrinTimrousBeastie=foo_bar_baz
Change _(.) to _([[:lower:]]) if you only want the underscores removed when followed by a lower case letter.

Related

sed - get only text without extension

How do I remove the extension in this SED statement?
Through
sed 's/.* - //'
File content
2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4
Actual
Filename.mp4
Desired
Filename
This is based on your shown samples only. It can be done with simple commands in awk, sed and perl as follows.
1st solution: Using sed, perform two simple substitutions to get the desired output.
sed 's/.*- //;s/\.mp4$//' Input_file
2nd solution: Using awk is even simpler: set a field separator that matches either "- " or ".mp4" and print the second-to-last field.
awk -F'- |[.]mp4' '{print $(NF-1)}' Input_file
3rd solution: Using substitution in awk to remove everything except the required value.
awk '{gsub(/.*- |\.mp4$/,"")} 1' Input_file
4th solution: With a perl one-liner, split each line on a dash followed by whitespace or on .mp4, and print the second-to-last field:
perl -a -F'-\s+|\.mp4' -ne 'print "$F[$#F-1]\n";' Input_file
The Bash way (which works in most similar shells such as zsh, sh, ksh) is:
fn="2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4"
base=${fn%.*}
ext=${fn#$base.}
echo "$base"
echo "$ext"
Prints:
2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename
mp4
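Building on the same parameter expansions, a sketch that also strips everything up to the last " - " yields the Filename value the question asks for. The function name strip_prefix_and_ext is hypothetical; the expansions themselves are POSIX shell.

```shell
# strip_prefix_and_ext is a hypothetical helper name.
strip_prefix_and_ext() {
  name=${1##*" - "}   # drop the longest prefix ending in " - "
  name=${name%.*}     # drop the shortest suffix starting with "."
  printf '%s\n' "$name"
}

strip_prefix_and_ext '2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4'
# prints: Filename
```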
You can use
#!/bin/bash
s='2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4'
sed -n 's/.* - \([^.]*\).*/\1/p' <<< "$s"
# => Filename
Details:
-n - suppress default line output
s/ - substitute found pattern
.* - \([^.]*\).* - any text, space, -, space, then any zero or more chars other than a dot captured into Group 1, and then any text
/\1/ - replace found matches with Group 1 value
p - print the result of the substitution.
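To illustrate the pieces described above, this sketch feeds the same command three sample lines. The second and third samples are invented; they show that only the text up to the first dot after the last " - " is kept, and that lines without " - " print nothing.

```shell
# extract_name is a hypothetical wrapper around the sed command shown above.
extract_name() { sed -n 's/.* - \([^.]*\).*/\1/p'; }

printf '%s\n' \
  '2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4' \
  'clip.mov - Archive.tar.gz' \
  'no-separator-here.mp4' | extract_name
# prints:
# Filename
# Archive
```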
Using GNU awk, you can also use a capture group to get the filename:
awk 'match($0, /.* - ([^.]+)\.mp4$/, a) {print a[1]}' file
Regex explanation
.* - (followed by a space) Match everything up to and including the last occurrence of " - "
( Capture group 1 (Referred to by a[1] in the awk example)
[^.]+ Match 1+ times any char except a dot
) Close group 1
\.mp4$ Match .mp4 at the end of the string
Awk explanation
awk '
match($0, /.* - ([^.]+)\.mp4$/, a) { # Test if the line using $0 matches the pattern
print a[1] # Print the value of group 1
}
' file
Yet another awk:
awk '{sub(/\.[^.]+$/, ""); print $NF}' file
Filename
awk 'BEGIN { FS = "( - |[.][^. ]+$)"
} NF > 2 { print $(NF-1) }'
No substr(), index(), match(), or sub() needed; this works in gawk, mawk, and mawk2. If you're VERY certain " - " can only occur once, then:
awk 'BEGIN { FS = "(^.* - |[.][^. ]+$)"; OFS = "" } --NF'

How to extract text between first 2 dashes in the string using sed or grep in shell

I have the string like this feature/test-111-test-test.
I need to extract string till the second dash and change forward slash to dash as well.
I have to do it in a Makefile using shell syntax, and some regular expressions that could help in this case don't work for me there.
In the end I need to get something like this:
input - feature/test-111-test-test
output - feature-test-111- or at least feature-test-111
I tried:
echo 'feature/test-111-test-test' | grep -oP '\A(?:[^-]++-??){2}' | sed -e 's/\//-/g'
But grep -oP doesn't work in my case. This regexp doesn't work as well - (.*?-.*?)-.*.
Another sed solution using a capture group and regex/pattern iteration (same thing Socowi used):
$ s='feature/test-111-test-test'
$ sed -E 's/\//-/;s/^(([^-]*-){3}).*$/\1/' <<< "${s}"
feature-test-111-
Where:
-E - enable extended regex support
s/\//-/ - replace / with -
s/^....*$/ - match start and end of input line
(([^-]*-){3}) - capture group #1 that consists of 3 sets of anything not - followed by -
\1 - print just the capture group #1 (this will discard everything else on the line that's not part of the capture group)
To store the result in a variable:
$ url=$(sed -E 's/\//-/;s/^(([^-]*-){3}).*$/\1/' <<< "${s}")
$ echo "$url"
feature-test-111-
You can use awk keeping in mind that in Makefile the $ char in awk command must be doubled:
url=$(shell echo 'feature/test-111-test-test' | awk -F'-' '{gsub(/\//, "-", $$1);print $$1"-"$$2"-"}')
echo "$url"
# => feature-test-111-
Here, -F'-' sets the field delimiter to -, gsub(/\//, "-", $1) replaces / with - in Field 1, and print $1"-"$2"-" prints Fields 1 and 2, each followed by a -.
Or, with a regex as a field delimiter:
url=$(shell echo 'feature/test-111-test-test' | awk -F'[-/]' '{print $$1"-"$$2"-"$$3"-"}')
echo "$url"
# => feature-test-111-
The -F'[-/]' option sets the field separator to - and /.
The '{print $1"-"$2"-"$3"-"}' part prints the first, second and third value with a separating hyphen.
To get the nth occurrence of a character C you don't need fancy perl regexes. Instead, build a regex of the form "(anything that isn't C, then C) for n times":
grep -Eo '([^-]*-){2}' | tr / -
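The "n times" idea can be parameterised. In this sketch the function name through_nth_dash is hypothetical, a ^ anchor is added so that only the leading match is reported, and grep -o is assumed to be available (GNU and BSD grep support it, but POSIX does not require it).

```shell
# through_nth_dash N (hypothetical helper): print each input line up to and
# including its Nth "-", then turn "/" into "-".
# Lines with fewer than N dashes print nothing.
through_nth_dash() {
  grep -Eo "^([^-]*-){$1}" | tr / -
}

echo 'feature/test-111-test-test' | through_nth_dash 2   # feature-test-111-
echo 'feature/test-111-test-test' | through_nth_dash 3   # feature-test-111-test-
```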
With sed and cut
echo feature/test-111-test-test| cut -d'-' -f-2 |sed 's/\//-/'
Output
feature-test-111
echo feature/test-111-test-test| cut -d'-' -f-2 |sed 's/\//-/;s/$/-/'
Output
feature-test-111-
You can use the simple regex form "not something, then that something" (valid in both BRE and ERE), which here is [^-]*-, to get all characters other than - up to a -.
This works:
echo 'feature/test-111-test-test' | sed -nE 's/^([^/]*)\/([^-]*-[^-]*-).*/\1-\2/p'
feature-test-111-
Another idea using parameter expansions/substitutions:
s='feature/test-111-test-test'
tail="${s//\//-}" # replace '/' with '-'
# split first field from rest of fields ('-' delimited); do this 3x times
head="${tail%%-*}" # pull first field
tail="${tail#*-}" # drop first field
head="${head}-${tail%%-*}" # pull first field; append to previous field
tail="${tail#*-}" # drop first field
head="${head}-${tail%%-*}-" # pull first field; append to previous fields; add trailing '-'
$ echo "${head}"
feature-test-111-
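The three manual rounds above can be folded into a loop. This is a sketch under assumptions: the function name first_fields and its argument order are invented, and tr replaces the bash-only ${s//\//-} so that the code also runs in a plain POSIX sh.

```shell
# first_fields STRING N (hypothetical helper): keep the first N "-"-separated
# fields of STRING, each followed by "-", after turning "/" into "-".
first_fields() {
  rest=$(printf '%s' "$1" | tr / -)
  head=
  i=0
  # stop after N fields, or earlier when no "-" is left in the remainder
  while [ "$i" -lt "$2" ] && [ "$rest" != "${rest#*-}" ]; do
    head="${head}${rest%%-*}-"   # append the current first field
    rest=${rest#*-}              # drop it from the remainder
    i=$((i + 1))
  done
  printf '%s\n' "$head"
}

first_fields 'feature/test-111-test-test' 3   # feature-test-111-
```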
A short sed solution, without extended regular expressions:
sed 's|\(.*\)/\([^-]*-[^-]*\).*|\1-\2|'

How to match a regex 1 to 3 times in a sed command?

Problem
I want to get any text that consists of 1 to three digits followed by a % but without the % using sed.
What I tried
So I guess the following regex should match the right pattern: [0-9]{1,3}%.
Then I can use this sed command to capture the digits and print only them:
sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
Example
However when i run it, it shows :
$ echo "100%" | sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
0
instead of
100
Obviously, there's something wrong with my sed command, and I think the problem comes from here:
[0-9]{1,3}
which apparently doesn't do what I want it to do.
edit:
Solution
The .* at the start of sed -nE 's/.*([0-9]{1,3})%.*/\1/p' "ate" the two first digits.
The right way to write it, according to Wicktor's answer, is:
sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
The .* grabs all digits leaving just the last of the three digits in 100%.
Use
sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
Details
(.*[^0-9])? - (Group 1) an optional sequence of any 0 or more chars up to the non-digit char including it
([0-9]{1,3}) - (Group 2) one to three digits
% - a % char
.* - the rest of the string.
The match is replaced with the Group 2 contents, and that is the only value printed, since -n suppresses the default line output.
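The difference between the two patterns is easiest to see side by side. In this sketch, greedy wraps the command from the question and fixed wraps the corrected one; both function names and the second and third sample inputs are assumptions added for illustration.

```shell
# greedy is the command from the question; fixed is the corrected command.
# Both function names are hypothetical wrappers for the comparison.
greedy() { sed -nE 's/.*([0-9]{1,3})%.*/\1/p'; }
fixed()  { sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'; }

for input in '100%' 'abc 42%' 'load: 7% of 250'; do
  printf '%s -> greedy=%s fixed=%s\n' "$input" \
    "$(printf '%s\n' "$input" | greedy)" \
    "$(printf '%s\n' "$input" | fixed)"
done
# prints:
# 100% -> greedy=0 fixed=100
# abc 42% -> greedy=2 fixed=42
# load: 7% of 250 -> greedy=7 fixed=7
```

With a single digit before the % both patterns agree; the leading .* only causes trouble when there is more than one digit for it to swallow.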
It will be easier to use a cut + grep option:
echo "abc 100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100
echo "100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100
Or else you may use this awk:
echo "100%" | awk 'match($0, /[0-9]{1,3}%/){print substr($0, RSTART, RLENGTH-1)}'
100
Or, if you have GNU grep (installed as ggrep on some systems, such as macOS with Homebrew), use the -P (PCRE) option:
echo "abc 100%" | ggrep -oP '[0-9]{1,3}(?=%)'
100
This might work for you (GNU sed):
sed -En 's/.*\<([0-9]{1,3})%.*/\1/p' file
This is a filtering exercise, so use the -n option.
Use a back reference to capture 1 to 3 digits, followed by % and print the result if successful.
N.B. The \< ensures the digits start on a word boundary, \b could also be used. The -E option is employed to reduce the number of back slashes which would normally be necessary to quote (,),{ and } metacharacters.

sed Back-references used to replace

There is a string a_b_c_d. I want to replace _ with - in the part of the string between a_ and _d. Below is my attempt.
echo "a_b_c_d" | sed -E 's/(.+)_(.+)_(.+)/\1`s/_/-/g \2`\3/g'
But it does not work. How can I reuse \2 and replace its content?
Perl allows code in the replacement section with the e modifier:
$ echo 'a_b_c_d' | perl -pe 's/a_\K.*(?=_d)/$&=~tr|_|-|r/e'
a_b-c_d
$ echo 'x_a_b_c_y' | perl -pe 's/x_\K.*(?=_y)/$&=~tr|_|-|r/e'
x_a-b-c_y
$&=~tr|_|-|r here $& is the matched portion, and tr is applied on that to replace _ to -
a_\K this will match a_ but won't be part of matched portion
(?=_d) positive lookahead to match _d but won't be part of matched portion
With sed (tested on GNU sed 4.2.2, not sure of syntax for other versions)
$ echo 'a_b_c_d' | sed -E ':a s/(a_.*)_(.*_d)/\1-\2/; ta'
a_b-c_d
$ echo 'x_a_b_c_y' | sed -E ':a s/(x_.*)_(.*_y)/\1-\2/; ta'
x_a-b-c_y
:a label a
s/(a_.*)_(.*_d)/\1-\2/ substitute one _ with - between a_ and _d
ta go to label a as long as the substitution succeeds
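The same loop can be written without hard-coding a_ and _d by anchoring on the first and last underscore-separated fields. This is a sketch, not part of the answer above: the function name dash_inner is hypothetical, and the script is split across -e options so that the label is terminated portably.

```shell
# dash_inner (hypothetical name): replace every "_" with "-" except the first
# and the last one on each line.
dash_inner() {
  sed -E -e ':a' \
         -e 's/^([^_]*_.*)_(.*_[^_]*)$/\1-\2/' \
         -e 'ta'
}

echo 'a_b_c_d'   | dash_inner   # a_b-c_d
echo 'x_a_b_c_y' | dash_inner   # x_a-b-c_y
echo 'a_d'       | dash_inner   # a_d (nothing to replace)
```

Each pass needs three underscores to match (one inside each group plus the one being replaced), so the loop ends when only the first and last underscores remain.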
gnu sed:
$ sed -r 's/_/-/g;s/(^[^-]+)-/\1_/;s/-([^-]+$)/_\1/' <<<'x_a_b_c_y'
x_a-b-c_y
The idea is, replacing all _ by -, then restoring the ones you want to keep.
update
If the fields separated by _ contain -, we can make use of the g and e flags of GNU sed:
sed -r 's/(^[^_]+_)(.*)(_[^_]+$)/echo "\1"$(echo "\2"\|sed "s|_|-|g")"\3"/ge'
For example we want ----_f-o-o_b-a-r_---- to be ----_f-o-o-b-a-r_----:
sed -r 's/(^[^_]+_)(.*)(_[^_]+$)/echo "\1"$(echo "\2"\|sed "s|_|-|g")"\3"/ge' <<<'----_f-o-o_b-a-r_----'
----_f-o-o-b-a-r_----
Following Kent's suggestion, and if you do not need a general solution, this works:
$ echo 'a_b_c+d_x' | tr '_' '-' | sed -E 's/^([a-z]+)-(.+)-([a-z]+)$/\1_\2_\3/g'
a_b-c+d_x
The character classes should be adjusted to match the leading and trailing parts of your input string. Fails, of course, if a or x contain the '-' character.

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
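A quick check of the edge cases from the question: cut passes lines that contain no delimiter through unchanged (unless -s is given), so inputs with zero or one dash come out as they went in. The wrapper name keep_two and the extra sample lines are assumptions added for illustration.

```shell
# keep_two is a hypothetical wrapper around the cut command shown above.
keep_two() { cut -d '-' -f 1,2; }

printf '%s\n' 'After-u-math-how-however' 'After-u' 'After' | keep_two
# prints:
# After-u
# After-u
# After
```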
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1",1)}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After
This will do it in awk (the i<=NF test avoids printing a trailing - when there is no second field):
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'