Linux Replace With Variable Containing Double Quotes - regex

I have read the following:
How Do I Use Variables In A Sed Command
How can I use variables when doing a sed?
Sed replace variable in double quotes
I have learned that I can use sed "s/STRING/$var1/g" to replace a string with the contents of a variable. However, I'm having a hard time finding out how to replace with a variable that contains double quotes, brackets and exclamation marks.
Then, hoping to escape the quotes, I tried piping my result through sed 's/\"/\\\"/g', which gave me another error: sed: -e expression #1, char 7: unknown command: `E'. I was hoping to escape the problematic characters and then do the variable replacement with sed "s/STRING/$var1/g", but I couldn't get that far either.
I figured you guys might know a better way to replace a string with a variable that contains quotes.
File1.txt:
Example test here
<tag>"Hello! [world]" This line sucks!</tag>
End example file
Variable:
var1=$(cat file1.txt)
Example:
echo "STRING" | sed "s/STRING/$var1/g"
Desired output:
Example test here
<tag>"Hello! [world]" This line sucks!</tag>
End example file

using awk
$ echo "STRING" | awk -v var="$var1" '{ gsub(/STRING/,var,$0); print $0}'
Example test here
<tag>"Hello! [world]" This line sucks!</tag>
End example file
-v var="$var1": passes the shell variable into awk as the awk variable var.
gsub(/STRING/,var,$0): globally substitutes every occurrence of "STRING" in the whole record $0 with var.
Special case: if var contains an & (say, at the beginning of a line), gsub will misbehave, because & in the replacement has a special meaning: it refers to the matched text.
To deal with this, escape each & first:
$ echo "STRING" | awk -v var="$var1" '{ gsub(/&/,"\\\\&",var); gsub(/STRING/,var,$0);print $0}'
&Example test here
<tag>"Hello! [world]" This line sucks!</tag>
End example file
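To see why the extra gsub matters, here is a minimal sketch with a made-up replacement value (Tom & Jerry is not from the question). It assumes an awk that follows the POSIX rules for & and backslashes in gsub replacements, such as gawk.

```shell
# Hypothetical replacement text containing '&' (illustration only).
var1='Tom & Jerry'

# Naive: '&' in the replacement expands to the matched text "STRING".
naive=$(echo "STRING" | awk -v var="$var1" '{ gsub(/STRING/, var); print }')

# Escaped: turn each '&' into '\&' first, so gsub treats it literally.
escaped=$(echo "STRING" | awk -v var="$var1" \
    '{ gsub(/&/, "\\\\&", var); gsub(/STRING/, var); print }')

printf '%s\n' "$naive" "$escaped"
```

The first line printed shows the matched text leaking into the result; the second is the value unchanged. Note that awk -v also interprets backslash escape sequences in the assigned value, so a variable containing backslashes needs separate care.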

The problem isn't the quotes. You're missing the "s" command, leading sed to treat /STRING/ as a line address, and the value of $var1 as a command to execute on matching lines. Also, $var1 has unescaped newlines and a / character that'll cause trouble in the substitution. So add the "s", and escape the relevant characters in $var1:
var1escaped="$(echo "$var1" | sed 's#[\/&]#\\&#g; $ !s/$/\\/')"
echo "STRING" | sed "s/STRING/$var1escaped/"
...but realistically, @batMan's answer (using awk) is probably a better solution.
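The escaping step can be wrapped in a small function. This is only a sketch: escape_sed_replacement is a hypothetical helper name, the sample value is invented, and the delimiter of the later s command is assumed to be /.

```shell
# escape_sed_replacement: hypothetical helper that makes a string safe to use
# as the replacement part of s/.../.../ (delimiter assumed to be "/").
escape_sed_replacement() {
    # Backslash-escape every \, / and &, and end every line except the last
    # with a backslash so embedded newlines survive.
    printf '%s\n' "$1" | sed -e 's#[\/&]#\\&#g' -e '$!s/$/\\/'
}

# Invented sample value with quotes, brackets, slashes, '&' and a newline.
var1='<a href="x/y">"Hello! [world]" & more</a>
second line'

var1escaped=$(escape_sed_replacement "$var1")
result=$(echo "STRING" | sed "s/STRING/$var1escaped/")
printf '%s\n' "$result"
```

The g flag on the escaping substitution matters: without it only the first special character on each line would be escaped.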

Here is an awk command that reads the replacement text from a file, which may contain all kinds of special characters such as & or \:
awk -v pat="STRING" 'ARGV[1] == FILENAME {
    # read replacement text from first file in arguments
    a = (a == "" ? "" : a RS) $0
    next
}
{
    # now run a loop using index function and use substr to get the replacements
    s = ""
    while( p = index($0, pat) ) {
        s = s substr($0, 1, p-1) a
        $0 = substr($0, p+length(pat))
    }
    $0 = s $0
} 1' File1.txt <(echo "STRING")
To handle all kinds of special characters properly, this command avoids any regex-based functions. It uses plain-text functions such as index and substr instead.
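A self-contained way to try the literal-replacement idea, assuming temporary files in place of File1.txt and of the process substitution (so it also runs in plain sh). The sample strings are invented, and FNR == NR is swapped in for the ARGV[1] == FILENAME test.

```shell
# Temporary files stand in for the replacement file and the input.
repl=$(mktemp) || exit 1
input=$(mktemp) || exit 1
printf '%s\n' 'A&B \1 "quoted" /slash/' > "$repl"
printf '%s\n' 'before STRING after STRING' > "$input"

out=$(awk -v pat="STRING" '
    FNR == NR { a = (NR == 1 ? "" : a RS) $0; next }  # slurp the replacement file
    {
        s = ""
        while ((p = index($0, pat)) > 0) {            # literal search, no regex
            s = s substr($0, 1, p - 1) a
            $0 = substr($0, p + length(pat))
        }
        print s $0
    }' "$repl" "$input")

rm -f "$repl" "$input"
printf '%s\n' "$out"
```

Because index and substr never interpret their arguments as patterns, the &, \1 and / in the replacement come through untouched.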

Related

How to use 'sed' to add dynamic prefix to each number in integer list?

How can I use sed to add a dynamic prefix to each number in an integer list?
For example:
I have a string "A-1,2,3,4,5", I want to transform it to string "A-1,A-2,A-3,A-4,A-5" - which means I want to add prefix of first integer i.e. "A-" to each number of the list.
If I have string like "B-1,20,300" then I want to transform it to string "B-1,B-20,B-300".
I am not able to use RegEx Capturing Groups because for global match they do not retain their value in subsequent matches.
When it comes to looping constructs in sed, I like to use newlines as markers for the places I have yet to process. This makes matching much simpler, and I know they're not in the input because my input is a text line.
For example:
$ echo A-1,2,3,4,5 | sed 's/,/\n/g;:a s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/; ta'
A-1,A-2,A-3,A-4,A-5
This works as follows:
s/,/\n/g # replace all commas with newlines (insert markers)
:a # label for looping
s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/ # replace the next marker with a comma followed
# by the prefix
ta # loop unless there's nothing more to do.
The approach is similar to @potong's, but I find the regex much more readable -- \([^0-9]*\) captures the prefix, \([^\n]*\) captures everything up to the next marker (i.e. everything that's already been processed), and then it's just a matter of reassembling it in the substitution.
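The same loop can be written with one -e per command, which avoids any ambiguity about where the label ends. A sketch, assuming GNU sed (it relies on \n in both the regex and the replacement); add_prefix is a hypothetical helper name.

```shell
# add_prefix: hypothetical wrapper around the marker-and-loop technique.
add_prefix() {
    printf '%s\n' "$1" | sed \
        -e 's/,/\n/g' \
        -e ':a' \
        -e 's/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/' \
        -e 'ta'
}

add_prefix 'A-1,2,3,4,5'
add_prefix 'B-1,20,300'
```

Each pass through the loop consumes exactly one newline marker, so the loop ends as soon as no markers are left.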
Don't use sed, just use the other standard UNIX text manipulation tool, awk:
$ echo 'A-1,2,3,4,5' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
A-1,A-2,A-3,A-4,A-5
$ echo 'B-1,20,300' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
B-1,B-20,B-300
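The substr($0,1,2) call assumes a two-character prefix. A possible generalization, assuming the prefix is everything up to and including the first hyphen (add_prefix_awk is a hypothetical name):

```shell
# add_prefix_awk: hypothetical helper; the prefix is taken to be everything
# up to and including the first "-" in the line.
add_prefix_awk() {
    printf '%s\n' "$1" | awk '{
        p = substr($0, 1, index($0, "-"))   # empty if there is no "-"
        gsub(/,/, "&" p)                    # "&" re-inserts the matched comma
        print
    }'
}

add_prefix_awk 'AB-1,2,30'
```

A prefix that itself contains & or a backslash would be interpreted by gsub, so this sketch only suits plain alphanumeric prefixes.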
This might work for you (GNU sed):
sed -E ':a;s/^((([^-]+-)[^,]+,)+)([0-9])/\1\3\4/;ta' file
Uses pattern matching and a loop to replace a number following a comma by the first column prefix and that number.
Assuming this is for shell scripting (csh/tcsh syntax shown), you can do so with two sed calls:
set string = "A-1,2,3,4,5"
set prefix = `echo $string | sed 's/^\([A-Z]\).*/\1/'`
echo $string | sed 's/,\([0-9]\)/,'$prefix'-\1/g'
Output is
A-1,A-2,A-3,A-4,A-5
With
set string = "B-1,20,300"
Output is
B-1,B-20,B-300
Could you please try the following (if you are OK with awk)? Note that it hard-codes the prefix A-.
awk '
BEGIN{
    FS=OFS=","
}
{
    for(i=1;i<=NF;i++){
        if($i !~ /^A/&&$i !~ /\"A/){
            $i="A-"$i
        }
    }
}
1' Input_file
If your data is in a file named d, this was tested with GNU sed:
sed -E 'h;s/^(\w-).+/\1/;x;G;:s s/,([0-9]+)(.*\n(.+))/,\3\1\2/;ts; s/\n.+//' d

Extract Filename before date Bash shellscript

I am trying to extract a part of the filename - everything before the date and suffix. I am not sure the best way to do it in bashscript. Regex?
The names are part of the filename. I am trying to store it in a shellscript variable. The prefixes will not contain strange characters. The suffix will be the same. The files are stored in a directory - I will use loop to extract the portion of the filename for each file.
Expected input files:
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
Expected Extract:
EXAMPLE_FILE
EXAMPLE_FILE_2
Attempt:
filename=$(basename "$file")
folder=sed '^s/_[^_]*$//)' $filename
echo 'Filename:' $filename
echo 'Foldername:' $folder
$ cat file.txt
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
$
$ cat file.txt | sed 's/_[0-9]*-[0-9]*-[0-9]*\.out$//'
EXAMPLE_FILE
EXAMPLE_FILE_2
$
No need for useless use of cat, expensive forks and pipes. The shell can cut strings just fine:
$ file=EXAMPLE_FILE_2_2017-10-12.out
$ echo ${file%%_????-??-??.out}
EXAMPLE_FILE_2
Read all about how to use the %%, %, ## and # operators in your friendly shell manual.
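As a quick illustration of the four operators on the file name from the question (the variable names are made up for this sketch):

```shell
file=EXAMPLE_FILE_2_2017-10-12.out

short_suffix=${file%_*}     # remove the shortest suffix matching _*
long_suffix=${file%%_*}     # remove the longest suffix matching _*
short_prefix=${file#*_}     # remove the shortest prefix matching *_
long_prefix=${file##*_}     # remove the longest prefix matching *_

printf '%s\n' "$short_suffix" "$long_suffix" "$short_prefix" "$long_prefix"
```

This prints EXAMPLE_FILE_2, EXAMPLE, FILE_2_2017-10-12.out and 2017-10-12.out, one per line. These expansions are part of POSIX sh, so they are not limited to bash.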
Bash itself has regex capability so you do not need to run a utility. Example:
for fn in *.out; do
    [[ $fn =~ ^(.*)_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} ]]
    cap="${BASH_REMATCH[1]}"
    printf "%s => %s\n" "$fn" "$cap"
done
With the example files, output is:
EXAMPLE_FILE_2017-09-12.out => EXAMPLE_FILE
EXAMPLE_FILE_2_2017-10-12.out => EXAMPLE_FILE_2
Using Bash itself will be faster, more efficient than spawning sed, awk, etc for each file name.
Of course in use, you would want to test for a successful match:
for fn in *.out; do
    if [[ $fn =~ ^(.*)_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} ]]; then
        cap="${BASH_REMATCH[1]}"
        printf "%s => %s\n" "$fn" "$cap"
    else
        echo "$fn no match"
    fi
done
As a side note, you can use Bash parameter expansion rather than a regex if you only need to trim the string after the last _ in the file name:
for fn in *.out; do
    cap="${fn%_*}"
    printf "%s => %s\n" "$fn" "$cap"
done
And then test $cap against $fn. If they are equal, the parameter expansion did not trim the file name after _ because it was not present.
The regex allows a test that a date-like string \d\d\d\d-\d\d-\d\d is after the _. Up to you which you need.
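If bash's =~ is not available, a middle ground is possible in plain sh: a case glob can check for the date-like shape before parameter expansion trims it. This is a different technique from the regex above, shown only as a sketch; strip_date is a hypothetical function name and the .out suffix is assumed to be fixed.

```shell
# strip_date: hypothetical helper. Prints the part of the name before
# _YYYY-MM-DD.out, or returns non-zero if the name lacks that shape.
strip_date() {
    case $1 in
        *_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].out)
            printf '%s\n' "${1%_????-??-??.out}" ;;
        *)
            return 1 ;;
    esac
}

for fn in EXAMPLE_FILE_2017-09-12.out EXAMPLE_FILE_2_2017-10-12.out notes.out; do
    if cap=$(strip_date "$fn"); then
        printf '%s => %s\n' "$fn" "$cap"
    else
        printf '%s no match\n' "$fn"
    fi
done
```

The glob only checks that the positions hold digits; like the regex, it does not validate that the digits form a real date.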
Code
^\w+(?=_)
Results
Input
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
Output
EXAMPLE_FILE
EXAMPLE_FILE_2
Explanation
^ Assert position at start of line
\w+ Match any word character (a-zA-Z0-9_) between 1 and unlimited times
(?=_) Positive lookahead ensuring what follows is an underscore _ character
Simply with sed:
sed 's/_[^_]*$//' file
The output:
EXAMPLE_FILE
EXAMPLE_FILE_2
----------
In case of iterating through the list of files with extension .out - bash solution:
for f in *.out; do echo "${f%_*}"; done
awk -F_ 'NF-=1' OFS=_ file
EXAMPLE_FILE
EXAMPLE_FILE_2
Could you please try this awk solution too, which takes care of all the .out files? Note that it has been written and tested in GNU awk.
awk --re-interval 'FNR==1{if(val){close(val)};split(FILENAME, array,"_[0-9]{4}-[0-9]{2}-[0-9]{2}");print array[1];val=FILENAME;nextfile}' *.out
My awk version is old, so I am using --re-interval; with a recent version of awk you may not need it.
Explanation and non-one-liner form of the solution:
awk --re-interval '##Using --re-interval for supporting ERE in my OLD awk version, if OP has new version of awk it could be removed.
FNR==1{ ##Checking here condition that when very first line of any Input_file is being read then do following actions.
if(val){ ##Checking here if variable named val value is NOT NULL then do following.
close(val) ##close the Input_file named which is stored in variable val, so that we will NOT face problem of TOO MANY FILES OPENED, so it will be like one file read close it in background then.
};
split(FILENAME, array,"_[0-9]{4}-[0-9]{2}-[0-9]{2}");##Splitting FILENAME(which will have Input_file name in it) into array named array only, whose separator is a 4 digits-2 digits- then 2 digits, actually this will take care of YYYY-MM-DD format in Input_file(s) and it will be easier for us to get the file name part.
print array[1]; ##Printing array 1st element here.
val=FILENAME; ##Storing FILENAME variable value which will have current Input_file name in it to variable named val, so that we could close it in background.
nextfile ##nextfile, as its name suggests, skips all the remaining lines in the current file and jumps to the next file, saving some CPU cycles.
}
' *.out ##Mentioning all *.out Input_file(s) here.

Remove \n newline if string contains keyword

I'd like to know if I can remove a \n (newline) only if the current line has one or more keywords from a list; for instance, I want to remove the \n if it contains the words hello or world.
Example:
this is an original
file with lines
containing words like hello
and world
this is the end of the file
And the result would be:
this is an original
file with lines
containing words like hello and world this is the end of the file
I'd like to use sed, or awk and, if needed, grep, wc or whatever commands work for this purpose. I want to be able to do this on a lot of files.
Using awk you can do:
awk '/hello|world/{printf "%s ", $0; next} 1' file
this is an original
file with lines
containing words like hello and world this is the end of the file
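If the keyword list changes from run to run, the pattern can be passed in as a variable instead of being hard-coded. A minimal sketch: join_lines and kw are assumed names, and the keywords are given as an ERE alternation.

```shell
# join_lines: hypothetical helper; $1 is an ERE alternation such as 'hello|world'.
join_lines() {
    awk -v kw="$1" '$0 ~ kw { printf "%s ", $0; next } 1'
}

out=$(printf '%s\n' \
        'this is an original' \
        'file with lines' \
        'containing words like hello' \
        'and world' \
        'this is the end of the file' | join_lines 'hello|world')
printf '%s\n' "$out"
```

As with the original one-liner, if the very last line of the input contains a keyword, the output ends with a space rather than a newline.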
Here is a simple one using sed:
sed -r ':a;$!{N;ba};s/((hello|world)[^\n]*)\n/\1 /g' file
Explanation
:a;$!{N;ba} reads the whole file into the pattern space, like this: this is an original\nfile with lines\ncontaining words like hello\nand world\nthis is the end of the file$
s/((hello|world)[^\n]*)\n/\1 /g searches for the keywords hello or world and replaces the next \n with a space.
The g flag on the sed substitute command applies the replacement to all matches of the regexp, not just the first.
A non-regex approach:
awk '
    BEGIN {
        # define the word list
        w["hello"]
        w["world"]
    }
    {
        printf "%s", $0
        for (i=1; i<=NF; i++)
            if ($i in w) {
                printf " "
                next
            }
        print ""
    }
'
or a perl one-liner
perl -pe 'BEGIN {@w = qw(hello world)} s/\n/ / if grep {$_ ~~ @w} split'
To edit the file in-place, do:
awk '...' filename > tmpfile && mv tmpfile filename
perl -i -pe '...' filename
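Since the question mentions doing this on a lot of files, the temp-file pattern can be wrapped in a loop. A sketch under stated assumptions: join_keyword_lines is a hypothetical name, the regex awk one-liner stands in for whichever program you prefer, and mktemp is assumed to be available.

```shell
# join_keyword_lines: hypothetical helper that rewrites each file argument in place.
join_keyword_lines() {
    for f in "$@"; do
        [ -f "$f" ] || continue        # skip anything that is not a regular file
        tmp=$(mktemp) || return 1
        if awk '/hello|world/ { printf "%s ", $0; next } 1' "$f" > "$tmp"; then
            cat "$tmp" > "$f"          # overwrite the contents, keep inode and permissions
        fi
        rm -f "$tmp"
    done
}

# Usage (file names are illustrative):
# join_keyword_lines ./*.txt
```

Copying the result back with cat instead of mv keeps the original file's permissions and hard links, at the cost of not being atomic.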
This might work for you (GNU sed):
sed -r ':a;/^.*(hello|world).*\'\''/M{$bb;N;ba};:b;s/\n/ /g' file
This checks whether the last line of the (possibly multi-line) pattern space contains one of the required strings and, if so, reads another line, until end-of-file or until the last line no longer contains any of them. The newlines are then replaced with spaces and the line is printed.
$ awk '{ORS=(/hello|world/?FS:RS)}1' file
this is an original
file with lines
containing words like hello and world this is the end of the file
sed -n '
:beg
/hello/ b keep
/world/ b keep
H;s/.*//;x;s/\n/ /g;p;b
: keep
H;s/.*//
$ b beg
' YourFile
This is a bit harder because the check on the current line may include a previous hello or world that has already been held.
Principle:
On every pattern match, keep the string in the hold buffer.
Otherwise, load the hold buffer, remove the \n characters (swapping and emptying the current line, because of the limited buffer operations available), and print the content.
Add a special case for a pattern match on the last line (normally held, so it would otherwise not be printed).

how to replace the next string after match (every) two blank lines?

Is there a way to do this kind of substitution in awk, sed, ...?
I have a text file with sections divided by two blank lines:
section1_name_x
dklfjsdklfjsldfjsl
section2_name_x
dlskfjsdklfjsldkjflkj
section_name_X
dfsdjfksdfsdf
I would like to replace every "section_name_x" with "#section_name_x"; that is, how do I replace the next string after every match of two blank lines?
Thanks,
Steve,
awk '
    (NR==1 || blank==2) && $1 ~ /^section/ {sub(/section/, "#&")}
    {
        print
        if (length)
            blank = 0
        else
            blank ++
    }
' file
#section1_name_x
dklfjsdklfjsldfjsl
#section2_name_x
dlskfjsdklfjsldkjflkj
#section_name_X
dfsdjfksdfsdf
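The blank-line counter is easier to follow with input that visibly contains the blank lines. A small sketch with invented sample data, where the last "section" line follows only one blank line and should therefore be left alone:

```shell
# Invented sample: two blank lines before section2, one before the last line.
out=$(printf '%s\n' \
        'section1_name_x' 'body one' '' '' \
        'section2_name_x' 'body two' '' \
        'section_after_one_blank' |
    awk '
        # Prefix a section line only at the start or right after two blank lines.
        (NR == 1 || blank == 2) && /^section/ { $0 = "#" $0 }
        # Print every line, then update the count of consecutive blank lines.
        { print; blank = (length($0) ? 0 : blank + 1) }
    ')
printf '%s\n' "$out"
```

Only section1_name_x and section2_name_x gain the # prefix; the counter resets on every non-empty line, so a single blank line never triggers the substitution.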
hm....
Given your example data why not just
sed 's/^section[0-9]*_name.*/#&/' file > newFile && mv newFile file
Some seds support sed -i (GNU) or sed -i "" (BSD) to overwrite the existing file, avoiding the && mv ... shown above.
The regex says that section must be at the beginning of the line and may optionally be followed by a number.
IHTH
In gawk you can use the RT builtin variable:
gawk '{$1="#"$1; print $0 RT}' RS='\n\n' file
* Update *
Thanks to @EdMorton I realized that my first version was incorrect.
What happens:
Assigning to $1 causes the record to be rebuilt, which is not good in this case, since any sequence of white space between fields is replaced by a single space, and white space at the beginning and end of the record is removed.
Using print adds an additional newline to the output.
The correct version:
gawk '{printf "%s", "#" $0 RT}' RS='\n\n\n' file

Replacing empty fields in delimited text file with dummy value

I am working on a project that takes a delimited set of data of the form:
field1~field2~field3~.....~fieldn
Having empty fields is a possibility, so
field1~~~field4~~field6
is perfectly acceptable.
This file gets translated using an in-house translator program that leaves a little to be desired. Specifically, it doesn't deal with empty fields well. My solution was to stick some dummy value in there, like a space or an @ sign. I've tried:
sed -r 's/~~/~ ~/g'
and
awk '{gsub(/\~\~/,"~ ~")}; 1' file > file.SPACE
but both of these fall short in replacing MULTIPLE fields. So if I input
field1~field2~~~field3
it'll output:
field1~field2~ ~~field3
I'd like to just script this if I could, as I can't change the code of the translator. I can change the code in the program that creates the delimited file, but I'd rather not. Is there some workaround, or is coming up with an expression for this just one of the inherent limitations in a regular language?
EDIT: Wow thanks for the quick response everyone, all your solutions worked so I upvoted all of them. I think I'm going to accept Janito's because of the explanation.
Also why the downvote?
You could try:
sed -e ':a;s/~~/~ ~/;ta'
This creates a label "a" with the ":" command, then replaces one occurrence of ~~ with ~ ~, and then uses the "t" test command to jump back to the "a" label if the previous substitute command succeeded.
Hope this helps =)
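The difference between a single global pass and the loop can be checked side by side. A minimal sketch using the input line from the question (the variable names are made up):

```shell
line='field1~field2~~~field3'

# One pass with g: matches cannot overlap, so the second ~~ is missed.
once=$(printf '%s\n' "$line" | sed 's/~~/~ ~/g')

# Loop: repeat the substitution until no ~~ remains.
looped=$(printf '%s\n' "$line" | sed -e ':a' -e 's/~~/~ ~/' -e 'ta')

printf '%s\n' "$once" "$looped"
```

The single pass consumes both tildes of each match, so the tilde it just wrote out can never start the next match; re-scanning from the beginning of the line is what the loop adds.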
awk '{for( i=1; i<=NF; i++ ) if( $i ~ /^$/ ) $i = " " } 1' FS='~' OFS='~' input
or:
awk '/^$/{ $0 = " " } 1' ORS='~' RS='~' input
or:
awk '{ while( gsub( "~~", "~ ~" )); }1' input
sed -e ':loop' -e 's/~~/~ ~/g' -e 't loop' file
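One difference between the approaches above is worth checking if your data can start or end with an empty field: the ~~-based loops only fill gaps between two delimiters, while a field-by-field awk loop also fills a leading or trailing empty field. A sketch with an invented input line; fill_empty is a hypothetical name and the loop starts at field 1.

```shell
# fill_empty: hypothetical helper that replaces every empty ~-separated field with a space.
fill_empty() {
    awk -F'~' -v OFS='~' '{
        for (i = 1; i <= NF; i++)
            if ($i == "") $i = " "    # assigning a field rebuilds $0 using OFS
        print
    }'
}

line='~a~~b~'
by_field=$(printf '%s\n' "$line" | fill_empty)
by_loop=$(printf '%s\n' "$line" | sed -e ':a' -e 's/~~/~ ~/' -e 'ta')

printf '[%s]\n' "$by_field" "$by_loop"
```

Whether leading and trailing empty fields need a dummy value depends on the translator program, which the question does not specify.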
You can use Perl
perl -pe 's/~(?=~)/~ /g'
...which says replace each "~" followed by "~" with "~ "
To store result(s) to file.SPACE use
perl -pe 's/~(?=~)/~ /g' file >file.SPACE