Sed - replace value in file with regex match in another file - regex

I am trying to code a bash script in a build process where we only have a few tools (like grep, sed, awk), and I am trying to replace a value in an ini file with a value from a regular expression match in another file.
I am matching something like "^export ADDRESS=VALUE" in the file export_vars.h and putting VALUE into an ini file called config.ini, in a line containing "ADDRESS=[REPLACE]". So I am trying to replace [REPLACE] with VALUE with one command in bash.
I have come across the fact that sed can take an entire file and insert it into another with a command like
sed -i -e "/[REPLACE]/r export_vars.h" config.ini
I need to somehow refine this command to only read the pattern match from export_vars.h. Does anyone know how to do this?

sed is for simple substitutions on individual lines, that is all. You need to be looking at awk for what you're trying to do. Something like:
awk '
BEGIN { FS=OFS="=" }
NR==FNR {
    if ( $1 == "export ADDRESS" ) {
        value = $2
    }
    next
}
{ sub(/\[REPLACE\]/,value); print }
' export_vars.h config.ini
Untested, of course, since you didn't provide testable sample input/output.
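A rough illustration with made-up file contents (the real values were not posted), just to show the intended behaviour:
$ cat export_vars.h
export ADDRESS=10.0.0.5
$ cat config.ini
ADDRESS=[REPLACE]
$ awk 'BEGIN{FS=OFS="="} NR==FNR{if ($1=="export ADDRESS") value=$2; next} {sub(/\[REPLACE\]/,value); print}' export_vars.h config.ini
ADDRESS=10.0.0.5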

Another in awk:
$ awk '/ADDRESS/{if(a!="")$0=a;else a=$NF}NR>FNR' export_vars.h config.ini
ADDRESS=VALUE
Explained:
$ awk '
/ADDRESS/ {          # when ADDRESS is found in a record
    if(a!="") $0=a   # if a is set (from the first file), use it
    else a=$NF }     # otherwise set a from the last field
NR>FNR               # print all records of the second file
' export_vars.h config.ini # mind the order
This solution does not tolerate space around = since $0 is replaced with $NF from the other file.
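If spaces around = did have to be tolerated, one possible variation (an untested sketch in the spirit of the first answer; the regex for the export line is an assumption) would be to strip everything up to the = instead of relying on field splitting:
awk '
NR==FNR {
    # first file: capture whatever follows "export ADDRESS =", ignoring optional spaces
    if ($0 ~ /^export[[:space:]]+ADDRESS[[:space:]]*=/) {
        sub(/^export[[:space:]]+ADDRESS[[:space:]]*=[[:space:]]*/, "")
        value = $0
    }
    next
}
{ sub(/\[REPLACE\]/, value); print }
' export_vars.h config.ini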

Related

Insert text into line if that line doesn't contain another string using sed

I am merging a number of text files on a linux server but the lines in some differ slightly and I need to unify them.
For example, some files will have a line like
id='1244' group='american' name='fred',american
Other files will be like
id='2345' name='frank', english
Finally, others will be like
id='7897' group='' name='maria',scottish
What I need to do is: if group='' or group is not in the string at all, I need to add it somewhere before the comma, setting it to the text after the comma, so in the 2nd example above the line would become:
id='2345' name='frank' group='english',english
and the same in the last example which would become
id='7897' name='maria' group='scottish',scottish
This is going into a bash script. I can't actually delete the line and add to the end of the file as it relates to the following line.
I've used the following:
sed -i.bak 's#group=""##' file
which deletes the group="" string, so the lines will either contain group='something' or won't contain it at all, and that works.
Then I tried to add the group if it doesn't exist using the following:
sed -i.bak '/group/! s#,(.*$)#group="\1",\1#' file
but that throws up the error
sed: -e expression #1, char 38: invalid reference \1 on `s' command's RHS
EDIT by Ed Morton to create a single sample input file and expected output:
Sample Input:
id='1244' group='american' name='fred',american
foo
id='2345' name='frank', english
bar
id='7897' group='' name='maria',scottish
Expected Output:
id='1244' group='american' name='fred',american
foo
id='2345' name='frank' group='english',english
bar
id='7897' name='maria' group='scottish',scottish
sed -r "
/group=''/ s/// # group is empty, remove it
/group=/! s/,[[:blank:]]*(.+)/ group='\\1',\\1/ # group is missing, add it
" file
id='1244' group='american' name='fred',american
foo
id='2345' name='frank' group='english',english
bar
id='7897' name='maria' group='scottish',scottish
The foo and bar lines are untouched because the s/// command did not match a comma followed by characters.
Something like:
sed '
/^[^,]*group[^,]*,/ ! {
s/, *\(.*\)/ group='\''\1'\'', \1/
}
/^[^,]*group='\'\''/ {
s/group='\'\''\([^,]*\), *\(.*\)/group='\''\2'\''\1, \2/
}
'
This GNU awk may help:
awk -v sq="'" '
BEGIN{RS="[ ,\n]+"; FS="="; found=0}
$1=="group"{
if($2==sq sq)
{next}
else
{found=1}
}
NF>1{
printf "%s=%s ",$1,$2
}
NF==1{
if(!found)
{printf "group=%s",$1}
print ","$1
found=0
}
' file
The script relies on the record separator RS which is set to get all key='value' pairs.
If the key group isn't found or is empty, it is printed when reaching a record with only one field.
Note that the variable sq holds the single quote character and is used to detect an empty group field.
Sed can be pretty ugly. And your data format appears to be somewhat inconsistent. This MIGHT work for you:
$ sed -e "/group='[a-z]/b e" -e "s/group='' *//" -e "s/,\([a-z]*\)$/ group='\1', /" -e ':e' input.txt
Broken out for easier reading, here's what we're doing:
/group='[a-z]/b e - If the line contains a valid group, branch to the end.
s/group='' *// - Remove any empty group,
s/,\([a-z]*\)$/ group='\1', / - add a new group based on your specs
:e - branch label for the first command.
And then the default action is to print the line.
I really don't like manipulating data this way. It's prone to error, and you'll be further ahead reading this data into something that accurately stores its data structure, then prints the data according to a new structure. A more robust solution would likely be tied directly to whatever is producing or consuming this data, and would not sit in the middle like this.

Extract Filename before date Bash shellscript

I am trying to extract a part of the filename - everything before the date and suffix. I am not sure of the best way to do it in a bash script. Regex?
The names are part of the filename. I am trying to store it in a shell script variable. The prefixes will not contain strange characters. The suffix will be the same. The files are stored in a directory - I will use a loop to extract the portion of the filename for each file.
Expected input files:
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
Expected Extract:
EXAMPLE_FILE
EXAMPLE_FILE_2
Attempt:
filename=$(basename "$file")
folder=sed '^s/_[^_]*$//)' $filename
echo 'Filename:' $filename
echo 'Foldername:' $folder
$ cat file.txt
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
$
$ cat file.txt | sed 's/_[0-9]*-[0-9]*-[0-9]*\.out$//'
EXAMPLE_FILE
EXAMPLE_FILE_2
$
No need for useless use of cat, expensive forks and pipes. The shell can cut strings just fine:
$ file=EXAMPLE_FILE_2_2017-10-12.out
$ echo ${file%%_????-??-??.out}
EXAMPLE_FILE_2
Read all about how to use the %%, %, ## and # operators in your friendly shell manual.
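For reference (not from the original answer), a quick sketch of how those four operators differ on the same name:
$ file=EXAMPLE_FILE_2_2017-10-12.out
$ echo "${file%.out}"      # % removes the shortest matching suffix
EXAMPLE_FILE_2_2017-10-12
$ echo "${file%%_*}"       # %% removes the longest matching suffix
EXAMPLE
$ echo "${file#*_}"        # # removes the shortest matching prefix
FILE_2_2017-10-12.out
$ echo "${file##*_}"       # ## removes the longest matching prefix
2017-10-12.out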
Bash itself has regex capability so you do not need to run a utility. Example:
for fn in *.out; do
[[ $fn =~ ^(.*)_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} ]]
cap="${BASH_REMATCH[1]}"
printf "%s => %s\n" "$fn" "$cap"
done
With the example files, output is:
EXAMPLE_FILE_2017-09-12.out => EXAMPLE_FILE
EXAMPLE_FILE_2_2017-10-12.out => EXAMPLE_FILE_2
Using Bash itself will be faster and more efficient than spawning sed, awk, etc. for each file name.
Of course in use, you would want to test for a successful match:
for fn in *.out; do
if [[ $fn =~ ^(.*)_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} ]]; then
cap="${BASH_REMATCH[1]}"
printf "%s => %s\n" "$fn" "$cap"
else
echo "$fn no match"
fi
done
As a side note, you can use Bash parameter expansion rather than a regex if you only need to trim the string after the last _ in the file name:
for fn in *.out; do
cap="${fn%_*}"
printf "%s => %s\n" "$fn" "$cap"
done
And then test $cap against $fn. If they are equal, the parameter expansion did not trim the file name after _ because it was not present.
The regex allows a test that a date-like string \d\d\d\d-\d\d-\d\d is after the _. Up to you which you need.
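A sketch of that check, building on the loop above (untested):
for fn in *.out; do
  cap="${fn%_*}"
  if [[ $cap == "$fn" ]]; then
    echo "$fn has no _ to trim"
  else
    printf "%s => %s\n" "$fn" "$cap"
  fi
done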
Code
^\w+(?=_)
Results
Input
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
Output
EXAMPLE_FILE
EXAMPLE_FILE_2
Explanation
^ Assert position at start of line
\w+ Match any word character (a-zA-Z0-9_) between 1 and unlimited times
(?=_) Positive lookahead ensuring what follows is an underscore _ character
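Since sed and bash's ERE matching do not support lookaheads, one way to actually run this regex from the shell is GNU grep with PCRE support, assuming -P is available and using the file.txt from the earlier answer:
$ grep -oP '^\w+(?=_)' file.txt
EXAMPLE_FILE
EXAMPLE_FILE_2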
Simply with sed:
sed 's/_[^_]*$//' file
The output:
EXAMPLE_FILE
EXAMPLE_FILE_2
----------
In case of iterating through the list of files with extension .out - bash solution:
for f in *.out; do echo "${f%_*}"; done
This awk decrements NF, which (in GNU awk at least) rebuilds $0 with OFS=_ and so drops the last _-separated field; the non-zero result of the expression makes the shortened record print:
awk -F_ 'NF-=1' OFS=_ file
EXAMPLE_FILE
EXAMPLE_FILE_2
Could you please try the following awk solution too, which will take care of all the .out files; note this has been written and tested in GNU awk.
awk --re-interval 'FNR==1{if(val){close(val)};split(FILENAME, array,"_[0-9]{4}-[0-9]{2}-[0-9]{2}");print array[1];val=FILENAME;nextfile}' *.out
Also, my awk version is old, so I am using --re-interval; if you have the latest version of awk you may not need it.
Explanation and non-one-liner form of solution: adding a non-one-liner form of the solution here too, with an explanation.
awk --re-interval '  ##Using --re-interval to support ERE intervals in my OLD awk version; with a newer awk it can be removed.
FNR==1{              ##When the very first line of any Input_file is being read, do the following actions.
  if(val){           ##If variable val is NOT NULL, then do the following.
    close(val)       ##Close the Input_file whose name is stored in val, so that we never hit a "too many open files" error; each file is closed once we are done with it.
  };
  split(FILENAME, array, "_[0-9]{4}-[0-9]{2}-[0-9]{2}");  ##Split FILENAME (the current Input_file name) into the array named array, using a 4-digit-2-digit-2-digit separator; this takes care of the YYYY-MM-DD part and leaves the file name part in array[1].
  print array[1];    ##Print the 1st element of array.
  val=FILENAME;      ##Store the current Input_file name in val so that we can close it later.
  nextfile           ##nextfile, as its name suggests, skips the remaining lines of the current file and jumps to the next file, saving some CPU cycles.
}
' *.out              ##Mentioning all *.out Input_file(s) here.

How can I use sed to find a line starting with AAA but NOT ending with BBB

I'm trying to create a script to append oracleserver to /etc/hosts as an alias of localhost. Which means I need to:
Locate the line that matches ^127.0.0.1 and NOT oracleserver$
Then, append oracleserver to this line
I know the best practice is probably using a negative lookahead. However, sed does not have a lookaround feature: What's wrong with my lookahead regex in GNU sed?. Can anyone provide me some possible solutions?
sed -i '/oracleserver$/! s/^127\.0\.0\.1.*$/& oracleserver/' filename
/oracleserver$/! - on lines not ending with oracleserver
^127\.0\.0\.1.*$ - replace the whole line if it is starting with 127.0.0.1
& oracleserver - with the line plus a space separator ' ' (required) and oracleserver after that
Just use awk with && to combine the two conditions:
awk '/^127\.0\.0\.1/ && !/oracleserver$/ { $0 = $0 " oracleserver" } 1' file
This appends the string when the first pattern is matched but the second one isn't. The 1 at the end is always true, so awk prints each line (the default action is { print }).
I wouldn't use sed but instead perl:
Locate the line that matches ^127.0.0.1 and NOT oracleserver$
perl -pe 'if ( m/^127\.0\.0\.1/ and not m/oracleserver$/ ) { s/$/ oracleserver/ }'
Should do the trick. You can add -i.bak to edit in place too.
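A sketch of the full in-place invocation, assuming the target is /etc/hosts as in the question:
perl -i.bak -pe 'if ( m/^127\.0\.0\.1/ and not m/oracleserver$/ ) { s/$/ oracleserver/ }' /etc/hosts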

Replace fields with AWK by using a different file as a translation list

I am using awk in Windows. I have a script called test.awk.
This script should read a file and replace a certain field (key) with a value.
The key->value list is in a file called translate.txt.
Its structure is like this:
e;Emil
f;Friedrich
g;Gustaf
h;Heinrich
i;Ida
In a simple example, my input file would be
e,111
f,222
g,333
h,444
i,555
..
so the output should be
Emil,111
Friedrich,222
Gustaf,333
Heinrich,444
Ida,555
..
The script I have uses a user function key2value to do the replacement, but I don't succeed in giving this function another file, translate.txt, as a source. See my code:
{
FS=","
d=key2value($1)
print d "," $2
}
function key2value(b)
{
#this should use another file, not the currently processed one
FILENAME="translate.txt"
begin
{
FS=";"
if ($1=b)
{
return $2
}
end
}
Another thing: the FS is buggy too, it starts working from the second line only.
This simple one-liner will do the trick:
awk 'FNR==NR{a[$1]=$2;next}{print a[$1],$2}' FS=',|;' OFS=',' translate input
Emil,111
Friedrich,222
Gustaf,333
Heinrich,444
Ida,555
In script form:
BEGIN { # The BEGIN block is executed before the files are read
FS="[,;]" # Set the FS to be either a comma or semi-colon
OFS="," # Set the OFS (output field separator) to be a comma
}
FNR==NR { # FNR==NR only true when reading the first file
key2value[$1]=$2; # Create associative array of key,value pairs
next # Grab the next line in the first file
}
{ # Now in the second file, print looked up value and $2
print key2value[$1],$2
}
Run like:
awk -f translate.awk translate.txt input.txt
There are numerous errors in your script; you should take a read of Effective AWK Programming.
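For completeness, a minimal sketch of the asker's original idea of reading translate.txt from inside the script itself (loading the map with getline in a BEGIN block; file names assumed, untested against the real data):
BEGIN {
    FS = ","
    # read the key;value pairs from translate.txt into an array up front
    while ((getline line < "translate.txt") > 0) {
        split(line, kv, ";")
        map[kv[1]] = kv[2]
    }
    close("translate.txt")
}
{ print map[$1] "," $2 }
Run it as awk -f test.awk input.txt; the FNR==NR approach above is still the more idiomatic way.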
Code for GNU sed (Windows quoting):
sed -r "s#(\S+);(\S+)#/^\1,/s/.*,(\\S+)/\2,\\1/#" file1|sed -rf - file2
Shell session:
>type file1 file2
file1
e;Emil
f;Friedrich
g;Gustaf
h;Heinrich
i;Ida
file2
e,111
f,222
g,333
h,444
i,555
>sed -r "s#(\S+);(\S+)#/^\1,/s/.*,(\\S+)/\2,\\1/#" file1|sed -rf - file2
Emil,111
Friedrich,222
Gustaf,333
Heinrich,444
Ida,555
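To see why this works: the first sed turns each key;value pair of file1 into a small sed command, and the generated script (fed to the second sed via -f -) looks roughly like this for the sample data:
/^e,/s/.*,(\S+)/Emil,\1/
/^f,/s/.*,(\S+)/Friedrich,\1/
/^g,/s/.*,(\S+)/Gustaf,\1/
/^h,/s/.*,(\S+)/Heinrich,\1/
/^i,/s/.*,(\S+)/Ida,\1/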

sed: remove strings between two patterns leaving the 2nd pattern intact (half inclusive)

I am trying to filter out text between two patterns. I've seen a dozen examples but didn't manage to get exactly what I want:
Sample input:
START LEAVEMEBE text
data
START DELETEME text
data
more data
even more
START LEAVEMEBE text
data
more data
START DELETEME text
data
more
SOMETHING that doesn't start with START
# sometimes it starts with characters that needs to be escaped...
I want to stay with:
START LEAVEMEBE text
data
START LEAVEMEBE text
data
more data
SOMETHING that doesn't start with START
# sometimes it starts with characters that needs to be escaped...
I tried running sed with:
sed '/^START DELETEME/,/^[^ ]/d'
And got an inclusive removal, I tried adding "exclusions" (not sure if I really understand this syntax well):
sed '/^START DELETEME/,/^[^ ]/{/^[^ ]/!d}'
But my "START DELETEME" line is still there (yes, I can grep it out, but that's ugly :) and besides - it DOES remove the empty line in this sample as well and I'd like to leave empty lines if they are my end pattern intact )
I am wondering if there is a way to do it with a single sed command.
I have an awk script that does this well:
BEGIN { flag = 0 }
{
    if ($0 ~ "^START DELETEME")
        flag = 1
    else if ($0 !~ "^ ")
        flag = 0
    if (flag != 1)
        print $0
}
But as you know "A is for awk which runs like a snail". It takes forever.
Thanks in advance.
Dave.
Using a loop in sed:
sed -n '/^START DELETEME/{:l n; /^[ ]/bl};p' input
GNU sed
sed '/LEAVEMEBE/,/DELETEME/!d;{/DELETEME/d}' file
I would stick with awk:
awk '
/LEAVE|SOMETHING/{flag=1}
/DELETE/{flag=0}
flag' file
But if you still prefer sed, here's another way:
sed -n '
/LEAVE/,/DELETE/{
    /DELETE/b
    p
}
' file