sed search and replace only before a char - regex

Is there a way to use sed (with potential other command) to transform all the keys in a file that lists key-values like that :
a.key.one-example=a_value_one
a.key.two-example=a_value_two
and I want that
A_KEY_ONE_EXAMPLE=a_value_one
A_KEY_TWO_EXAMPLE=a_value_two
What I did so far :
sed -e 's/^[^=]*/\U&/'
it produced this :
A.KEY.ONE-EXAMPLE=a_value_one
A.KEY.TWO-EXAMPLE=a_value_two
But I still need to replace the "." and "-" on left part of the "=". I don't think it is the right way to do it.

This is easily done in awk. IMHO awk is the better tool for this task; it keeps things simple.
awk 'BEGIN{FS=OFS="="} {$1=toupper($1);gsub(/[.-]/,"_",$1)} 1' Input_file
Simple explanation:
Set both the field separator (FS) and the output field separator (OFS) to =.
Use awk's built-in toupper function to uppercase $1 (the first field) and store the result back in $1.
Use gsub to replace every . or - in $1 with _, as required.
The trailing 1 is the idiomatic way to print the line in awk.
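A small sketch of how this behaves on less tidy input, assuming the file may also contain values with their own = and lines with no = at all (both sample lines are made up). The NF>1 guard is an addition, not part of the command above; without it, a line that has no = would be uppercased as a whole.

```shell
# Hypothetical sample: a value that contains '=' and a line without any '='.
result=$(printf '%s\n' 'a.key.one-example=x=y.z' '# a comment-line' |
  awk 'BEGIN{FS=OFS="="}
       # Only touch lines that really have a key=value shape.
       NF>1 {$1=toupper($1); gsub(/[.-]/,"_",$1)}
       1')
printf '%s\n' "$result"
```

Because FS and OFS are the same character, rebuilding the record after changing $1 leaves the extra = in the value where it was.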

This might work for you (GNU sed):
sed -E 'h;y/.-/__/;s/.*/\U&/;G;s/=.*=/=/' file
Make a copy of the current line.
Translate . and - to _.
Capitalize the whole line.
Append the copy.
Remove the centre portion.
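One caveat with the last step: =.*= is greedy, so a value that itself contains = appears to lose everything up to its last =. The sketch below, assuming GNU sed (for \U and for \n in the regex) and a made-up sample line, compares the command above with a variant that stops at the = of the appended copy.

```shell
line='a.key.one-example=x=y.z'   # hypothetical value containing '='

# Command as given above: the greedy =.*= runs to the last '=' in the copy.
original=$(printf '%s\n' "$line" | sed -E 'h;y/.-/__/;s/.*/\U&/;G;s/=.*=/=/')

# Variant: remove from the first '=' through the newline and the copied key.
variant=$(printf '%s\n' "$line" | sed -E 'h;y/.-/__/;s/.*/\U&/;G;s/=.*\n[^=]*=/=/')

printf '%s\n' "$original" "$variant"
```

For values without = the two commands give the same result.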

You can use
sed ':a;s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g;ta' file > newfile
Details:
:a - sets an a label
s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g - replaces ^\([^=]*\)[.-]\([^=]*\) pattern that matches
^ - start of string
\([^=]*\) - Group 1 (\1): any zero or more chars other than =
[.-] - a dot or hyphen
\([^=]*\) - Group 2 (\2): any zero or more chars other than =
and replaces the match with \U\1_\2, i.e. Group 1 + _ + Group 2, uppercased
ta - jumps back to the a label after a successful replacement
See the online demo:
#!/bin/bash
s='a.key.one-example=a_value_one
a.key.two-example=a_value_two'
sed ':a;s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g;ta' <<< "$s"
Output:
A_KEY_ONE_EXAMPLE=a_value_one
A_KEY_TWO_EXAMPLE=a_value_two
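Note that the uppercasing here happens only as a side effect of a successful replacement, so a key with no . or - would stay lowercase. A hedged sketch of a two-step variant, assuming GNU sed and a made-up second line: uppercase the key first, then loop over the separators.

```shell
# Second line is hypothetical: its key has no '.' or '-', but its value does.
input_lines() {
  printf '%s\n' 'a.key.one-example=a_value_one' 'plainkey=a.value-two'
}

# Command from the answer above, for comparison.
original=$(input_lines | sed ':a;s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g;ta')

# Variant: uppercase everything before the first '=', then replace separators.
variant=$(input_lines | sed 's/^[^=]*/\U&/;:a;s/^\([^=]*\)[.-]/\1_/;ta')

printf '%s\n' "$original" "$variant"
```

In both commands the value is left alone, because [^=]* cannot cross the first =.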

Related

Convert regex positive look ahead to sed operation

I would like to use sed to find and replace every occurrence of - with _ but only before the first occurrence of = on every line.
Here is a dataset to work with:
ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"
In the end the dataset should look like this:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
I found this regex will match properly.
\-(?=.*=)
However, the regex uses a positive lookahead, and it appears that sed (even with -E, -e or -r) does not know how to work with positive lookaheads.
I tried the following but keep getting Invalid preceding regular expression
cat dataset.txt | sed -r "s/-(?=.*=)/_/g"
Is it possible to convert this in a usable way with sed?
Note, I do not want to use perl. However I am open to awk.
You can use
sed ':a;s/^\([^=]*\)-/\1_/;ta' file
See the online demo:
#!/bin/bash
s='ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"'
sed ':a; s/^\([^=]*\)-/\1_/;ta' <<< "$s"
Output:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
Details:
:a - setting a label named a
s/^\([^=]*\)-/\1_/ - matches zero or more chars other than = from the start of the string (capturing them into Group 1 (\1)), then a - char, and replaces the match with the Group 1 value (\1) followed by _ (which replaces the matched - char)
ta - jumps to the label a upon a successful replacement; otherwise, stops.
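To see the loop at work, the sketch below adds the p flag to the s command and runs sed with -n, so the pattern space is printed after every successful pass. The sample line is made up. Since [^=]* is greedy, each pass replaces the last remaining - before the first =.

```shell
# Hypothetical sample line with two dashes in the key and one in the value.
trace=$(printf '%s\n' 'ke-y_0-1="foo-bar"' |
  sed -n ':a;s/^\([^=]*\)-/\1_/p;ta')
printf '%s\n' "$trace"
```

The last printed line is the final result; the dash in the value is never reached.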
You might also use awk setting the field separator to = and replace all - with _ for the first field.
To print only the replaced lines:
awk 'BEGIN{FS=OFS="="}gsub("-", "_", $1)' file
Output
ke_y_0_1="foo"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
If you want to print all lines:
awk 'BEGIN{FS=OFS="="}{gsub("-", "_", $1);print}' file

can sed replace words in pattern substring match in one line?

original line in file sed.txt:
outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string
I only need to replace PATTERN with pattern inside the parentheses. This is not just lowercasing; the replacement could be any other word.
expect result:
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
I could use the ([^)]*) pattern to find the substrings in which some words should be replaced. But I can't use this pattern to restrict the replacement to those substrings; it replaces every PATTERN in the whole line with pattern.
:/tmp$ sed 's/([^)]*)/---/g' sed.txt
outer_string_PATTERN_string---PATTERN_outer_string---_outer_string
:/tmp$ sed '/([^)]*)/s/PATTERN/pattern/g' sed.txt
outer_string_pattern_string(pattern_And_pattern_pattern_i)pattern_outer_string(i_pattern_inner)_outer_string
I also tried to use the regex group in sed to capture and replace the words, but I can't figure out the command.
Can sed implement that? And how to achieve that? THX.
Can sed implement that?
It can be done using GNU sed and basic regular expressions
(BRE):
sed '
s/)/)\n/g
:1
s/\(([^)]*\)PATTERN\([^)]*)\n\)/\1pattern\2/
t1
s/\n//g
' < file
where
1st s inserts a newline after each )
2nd s replaces the last (* is greedy) PATTERN inside ()s with pattern
t loops back if a substitution was made
3rd s strips all inserted newlines
EDIT
2nd substitute command edited according to OP's suggestion
since there is no need to match \n inside ().
Can sed implement that?
Yes. But you do not want to do it in sed. Use other programming language, like Python, Perl, or awk.
how to achieve that?
Implementing a non-greedy regex is not simple in sed. Generally, it consists of:
taking chunk of the input
process the chunk
put it in hold space
shuffle hold space with pattern space - extract what has already been processed and what has not
repeat
shuffle with hold space
output
Anyway, the following script:
#!/bin/bash
sed <<<'outer_string_PATTERN_string(PATTERN_i_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string' '
:loop;
/\([^(]*\)\(([^)]*)\)\(.*\)/{
# Lowercase the second part.
s//\1\L\2\E\n\3/;
# Mix with hold space.
G;
s/\(.*\)\n\(.*\)\n\(.*\)/\3\1\n\2/;
# Put processed stuff into hold space
h; s/\n.*//; x;
# Process the other stuff again.
s/.*\n//;
bloop;
};
# Is hold space empty?
x; /^$/!{
# Pattern space has trailing stuff - add it.
G; s/\n//;
# We will print it.
h;
# Clear hold space
s/.*//
};x;
'
outputs:
outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
As an alternative, it is easier to do this in gnu awk with RS that matches (...) substring:
awk -v RS='\\([^)]+)' '{gsub(/PATTERN/, "pattern", RT); ORS=RT} 1' file
outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
Steps:
RS='\\([^)]+)' captures a (...) string as record separator
gsub function then replaces PATTERN with pattern in matched text i.e. RT
ORS=RT sets ORS as the new modified RT
1 prints each record to stdout
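RT is specific to GNU awk. Where only a POSIX awk is available, roughly the same idea can be sketched with match(), RSTART and RLENGTH, editing one (...) segment at a time. The variable names out, rest and seg are made up for this sketch.

```shell
line='outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string'

result=$(printf '%s\n' "$line" | awk '{
  out = ""; rest = $0
  # Find the next (...) segment in the part not yet processed.
  while (match(rest, /\([^)]*\)/)) {
    seg = substr(rest, RSTART, RLENGTH)
    gsub(/PATTERN/, "pattern", seg)                # edit only inside the parentheses
    out = out substr(rest, 1, RSTART - 1) seg      # keep the text before it unchanged
    rest = substr(rest, RSTART + RLENGTH)
  }
  print out rest                                   # append the trailing text
}')
printf '%s\n' "$result"
```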
Another alternative solution using lookahead assertion in a perl regex:
perl -pe 's/PATTERN(?=[^()]*\))/pattern/g' file
Solved by this:
:/tmp$ sed 's/(/\n(/g' sed.txt | sed 's/)/)\n/g' | sed '/([^)]*)/s/PATTERN/pattern/g' | sed ':a;N;$!ba;s/\n//g'
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
make pattern () in a new line
find the () lines and replace the PATTERN to pattern
merge multiple lines in one line
thanks for How can I replace a newline (\n) using sed?

Regex, select the line that starts with my condition, but take only the characters after space

I have a file with content similar to the following:
ptrn: 435324kjlkj34523453
Note1: rtewqtiojdfgkasdktewitogaidfks
Note2: t4rwe3tewrkterqwotkjrekqtrtlltre
I am trying to get the characters after the space on the line that starts with "ptrn:". I am trying the command below:
>>> cat daily.txt | grep '^p.*$' > dailynew.txt
and I am getting the result in the new file:
ptrn: 435324kjlkj34523453
But I want only the characters after space, which are " 435324kjlkj34523453" to be written in the new file without "ptrn:" at the beginning.
The result should be like:
435324kjlkj34523453
How can I achieve this with an efficient regex?
You can use
grep -oP '^ptrn:\s*\K.*' daily.txt > dailynew.txt
awk '/^ptrn:/{print $2}' daily.txt > dailynew.txt
sed -n 's/^ptrn:[[:space:]]*\(.*\)/\1/p' daily.txt > dailynew.txt
See the online demo. All output 435324kjlkj34523453.
In the grep PCRE regex (enabled with -P option) the patterns match
^ - the start of string
ptrn: - a ptrn: substring
\s* - zero or more whitespaces
\K - match reset operator that discards the text matched so far, so only what follows is output
.* - the rest of the line.
In the awk command, ^ptrn: regex is used to find the line starting with ptrn: and then {print $2} prints the value after the first whitespace, from the second "column" (since the default field separator in awk is whitespace).
In sed, the command means
-n - suppresses the default line output
s - substitution command is used
^ptrn:[[:space:]]*\(.*\) - start of string, ptrn:, zero or more whitespace, and the rest of the line captured into Group 1
\1 - replaces the match with group 1 value
p - prints the result of the substitution.
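The awk and sed commands agree on the sample file, but they differ once the value itself contains whitespace: $2 is only the first whitespace-separated word, while the sed capture keeps the rest of the line. A small sketch with a made-up two-word value:

```shell
# Hypothetical input: the value after "ptrn:" has two words.
sample() { printf '%s\n' 'ptrn: first second' 'Note1: ignored'; }

awk_out=$(sample | awk '/^ptrn:/{print $2}')
sed_out=$(sample | sed -n 's/^ptrn:[[:space:]]*\(.*\)/\1/p')

printf 'awk: %s\nsed: %s\n' "$awk_out" "$sed_out"
```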
You can use this sed:
sed -nE 's/^ptrn: (.*)/\1/p' file > output_file.txt

extract substring with SED

I have the following strings, for example:
input1 = abc-def-ghi-jkl
input2 = mno-pqr-stu-vwy
I want to extract the first word between "-" characters.
For the first string I want to get: def
If the input is the second string, I want to get: pqr
I want to use sed. Could you help me, please?
Use
sed 's,^[^-]*-\([^-]*\).*,\1,' file
The string after the first - will be captured up to the second - and the rest will be matched, then the matched line will be replaced with the group text.
With bash:
var='input1 = abc-def-ghi-jkl'
var=${var#*-} # remove shortest prefix `*-`, this removes `input1 = abc-`
echo "${var%%-*}" # remove longest suffix `-*`, this removes `-ghi-jkl`
Or with awk:
awk -F'-' '{print $2}' <<<'input1 = abc-def-ghi-jkl'
Use - as input field separator and print the second field.
Or with cut:
cut -d'-' -f2 <<<'input1 = abc-def-ghi-jkl'
When you want to use sed, you can choose between solutions like
# Double processing
echo "$input1" | sed 's/[^-]*-//;s/-.*//'
# Normal approach
echo "$input1" | sed -r 's/^[^-]*-([^-]*)(-.*)?/\1/'
# Funny alternative
echo "$input1" | sed -r 's/(^[^-]*-|-.*)//g'
The obvious "external" tool would be cut. You can also look at a Bash builtin solution like
[[ ${input1} =~ ([^-]*)-([^-]*) ]] && printf %s "${BASH_REMATCH[2]}"
grep solution (in my opinion this is the most natural approach, as you are only trying to find matches to a regular expression - you are not looking to edit anything, so there should be no need for the more advanced command sed)
grep -oP '^[^-]*-\K[^-]*(?=-)' << EOF
> abc-qrs-bobo-the-clown
> 123-45-6789
> blah-blah-blah
> no dashes here
> mahi-mahi
> EOF
Output
qrs
45
blah
Explanation
Look at the inputs first, included here for completeness as a heredoc (more likely you would name your file as the last argument to grep.) The solution requires at least two dashes to be present in the string; in particular, for mahi-mahi it will find no match. If you want to find the second mahi as a match, you can remove the lookahead assertion at the end of the regular expression (see below).
The regular expression does this. First note the command options: -o to return only the matched substring, not the entire line; and -P to use Perl extensions. Then, the regular expression: start from the beginning of the line (^); look for zero or more non-dash characters followed by dash, and then (\K) discard this part of the required match from the substrings found to match the pattern. Then look for zero or more non-dash characters again - this will be returned by the command. Finally, require a dash following this pattern, but do not include it in the match. This is done with a lookahead (marked by (?= ... )).
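Dropping the lookahead would presumably give grep -oP '^[^-]*-\K[^-]*'. The sketch below shows the same relaxed rule with plain sed instead, so it does not depend on grep being built with -P support; this swaps tools and is not the grep command from the answer. It uses the same five sample lines as above.

```shell
# Relaxed rule: one dash is enough, so mahi-mahi now yields a result.
# (grep form without the lookahead: grep -oP '^[^-]*-\K[^-]*')
result=$(printf '%s\n' \
    'abc-qrs-bobo-the-clown' '123-45-6789' 'blah-blah-blah' \
    'no dashes here' 'mahi-mahi' |
  sed -n 's/^[^-]*-\([^-]*\).*/\1/p')
printf '%s\n' "$result"
```

Lines without any dash are still skipped, because the substitution fails and -n suppresses the default output.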

Use sed to replace patterns that are not at the start or end of lines

Let's say I have input:
/a/b/c/d/e/
/a/b/c/d/e
a/b/c/d/e/
a/b/c/d/e
I'd like to replace all / that are not at the edges with + so the output is:
/a+b+c+d+e/
/a+b+c+d+e
a+b+c+d+e/
a+b+c+d+e
I've tried this command:
sed -e "s#\(.\)/\(.\)#\1+\2#g"
which is close but not quite:
/a+b/c+d/e/
/a+b/c+d/e
a+b/c+d/e/
a+b/c+d/e
presumably because the \(.\) overlap between successive / characters.
I don't believe sed has a null match operator for beginning or end of line. So, how is this done?
You can translate all slashes to + and then replace + (at the beginning or at the end) with a slash:
sed 'y/\//+/;s/^+\|+$/\//g;'
or if the OR operator isn't available:
sed 'y/\//+/;s/^+/\//;s/+$/\//;'
It is better to change the delimiter so that you do not have to escape every literal slash:
sed 'y~/~+~;s~^+\|+$~/~g;'
or if the OR operator isn't available:
sed 'y~/~+~;s~^+~/~;s~+$~/~;'
(where ^ is an anchor for the start of the line and $ for the end)
Other way: you can protect the slashes you want to preserve using a placeholder:
sed 's~^/~{`%{~;s~/$~{`%{~;y~/~+~;s~{`%{~/~g;'
If you have perl you can use lookarounds for this:
perl -pe 's~(?<!^)/(?!$)~+~g' file
Output:
/a+b+c+d+e/
/a+b+c+d+e
a+b+c+d+e/
a+b+c+d+e
Otherwise you can use this sed with 2 substitutes:
sed -r 's~(.)/(.)~\1+\2~g; s~(.)/(.)~\1+\2~g' file
Or this sed with labeling and looping:
sed -r ':a;s|(.)/(.)|\1+\2|g;ta' file
Here is a sed command that gives your output:
sed -r 's=(.)/\b=\1+=g;' file
usually / is used as the separator for the s command, but here we use =
the / is matched where there is something (.) before it and we are at a word boundary
initially I tried (.)/(.) but that did not work:
The second dot was consumed and the next match would only start after it,
i.e. in x/y/z the second match would only see /z and not y/z
with \b the first match does not consume the y and the second match sees y/
This is the common and extremely useful sed idiom for doing jobs like this:
$ sed 's:a:aA:g; s:^/\|/$:aB:g; s:/:+:g; s:aB:/:g; s:aA:a:g' file
/a+b+c+d+e/
/a+b+c+d+e
a+b+c+d+e/
a+b+c+d+e
The 1st sub changes all as to aA. At that point there is no letter a in the input that is not followed by the letter A (we need to do this first to ensure that after our 2nd sub the only aBs in the input are as a result of that 2nd sub)
The 2nd sub changes all /s at the start or end of a line to aB. At that point the only aBs in the input are where there were originally /s at the start or end of the line.
The 3rd sub changes all remaining /s (i.e. those that were not at the start or end of the line) to +s.
The 4th sub restores the aBs back to the original front/end /s.
The 5th sub restores the aAs back to the original as.
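The five substitutions can be run one at a time to watch the placeholders appear and disappear. A sketch on the first sample line, assuming GNU sed for the \| alternation; the variables s1 to s5 are only for illustration.

```shell
line='/a/b/c/d/e/'
s1=$(printf '%s\n' "$line" | sed 's:a:aA:g')        # escape every a as aA
s2=$(printf '%s\n' "$s1"   | sed 's:^/\|/$:aB:g')   # mark the edge slashes as aB
s3=$(printf '%s\n' "$s2"   | sed 's:/:+:g')         # remaining slashes become +
s4=$(printf '%s\n' "$s3"   | sed 's:aB:/:g')        # restore the edge slashes
s5=$(printf '%s\n' "$s4"   | sed 's:aA:a:g')        # restore the original a
printf '%s\n' "$s1" "$s2" "$s3" "$s4" "$s5"
```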
This might work for you (GNU sed):
sed ':a;s/\([^\/]\)\/\([^\/]\)/\1+\2/g;ta' file
Or visually easier:
sed -r ':a;s#([^/])/([^/])#\1+\2#g;ta' file
It is really the same regexp twice:
sed 's/\([^\/]\)\/\([^\/]\)/\1+\2/g;s/\([^\/]\)\/\([^\/]\)/\1+\2/g' file
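A sketch of why the second pass is needed, and why it seems to be enough: in one pass with the g flag, a slash is skipped only when the character before it was already consumed by the previous match, so no two skipped slashes are left with a single character between them. The variables once and twice are only for illustration.

```shell
# One pass: every other interior slash is skipped.
once=$(printf '%s\n' '/a/b/c/d/e/' | sed 's#\([^/]\)/\([^/]\)#\1+\2#g')

# Second pass over the result picks up the skipped slashes.
twice=$(printf '%s\n' "$once" | sed 's#\([^/]\)/\([^/]\)#\1+\2#g')

printf '%s\n' "$once" "$twice"
```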