Find and Replace value using Regular Expressions in Sed

I need to search and replace the value of CustomerEMailID in an XML file using sed's s/// command. I was able to extract the value of CustomerEMailID; how do I replace it with emailID using sed?
//Print the value
sed -n 's/.*CustomerEMailID="\([^"]*\).*/\1/p' xmltoconvert.xml
//xml file:
<Order
CustomerEMailID="XXXX"

Try this:
sed 's/\(.*CustomerEMailID="\)[^"]*\(.*\)/\1emailID\2/' xmltoconvert.xml
It replaces the value and prints the whole file to the terminal; the file itself is left unchanged.
Input file:
<Order
CustomerEMailID="XXXX">
Output file:
<Order
CustomerEMailID="emailID">
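If the new address comes from a shell variable rather than the literal emailID, it has to be escaped before it is spliced into the sed expression. A minimal sketch, assuming a hypothetical helper named set_customer_email and a value that contains no newlines:

```shell
# Hypothetical helper (not part of the answer above): print FILE with the
# CustomerEMailID attribute value replaced by NEW_VALUE.
set_customer_email() {
    new_value=$1
    file=$2
    # Escape the characters that are special in a sed replacement (& and \)
    # plus the | that is used as the s-command delimiter below.
    escaped=$(printf '%s\n' "$new_value" | sed 's/[&\\|]/\\&/g')
    sed 's|\(CustomerEMailID="\)[^"]*|\1'"$escaped"'|' "$file"
}

# Usage on a throwaway copy of the sample input:
sample=$(mktemp)
printf '%s\n' '<Order' 'CustomerEMailID="XXXX">' > "$sample"
set_customer_email 'jane&co@example.com' "$sample"
rm -f "$sample"
```

Without the escaping step, an & in the value would be expanded by sed to the matched text instead of being inserted literally.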

With awk you could use the following:
awk -v email="emailID" 'BEGIN{FS=OFS="\""}{for(i=1;i<=NF;i++) if($i ~ /CustomerEMailID=/) $(i+1)=email}1' file
It first sets the field separators FS and OFS to ".
Then it loops over the fields, looks for one matching the pattern /CustomerEMailID=/, and replaces the following field with the string stored in the awk variable email. The trailing 1 is an always-true pattern, so every line is printed.
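To see why the value ends up in the field after the match, it can help to print how awk splits the sample input on ". A small sketch, assuming a hypothetical helper named show_quote_fields:

```shell
# Hypothetical helper: print each field of every input line, split on the
# double-quote character, together with its line and field number.
show_quote_fields() {
    awk 'BEGIN { FS = "\"" }
         { for (i = 1; i <= NF; i++) printf "line %d, field %d: [%s]\n", NR, i, $i }'
}

printf '%s\n' '<Order' 'CustomerEMailID="XXXX">' | show_quote_fields
# line 1, field 1: [<Order]
# line 2, field 1: [CustomerEMailID=]
# line 2, field 2: [XXXX]
# line 2, field 3: [>]
```

The attribute name is field 1 of the second line and its value is field 2, which is the $(i+1) that the one-liner overwrites.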

Related

Removing rows that contains "(null)" value from a text file

I would like to remove any row within a .txt file that contains "(null)". The (null) value is always in the 3rd column. I would like to add this to a script that I already have.
Txt file example:
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
In this example I would like to remove the third row.
I'm guessing it's an awk -F, but I'm not sure where to go from there.
You are on the right track with using -F.
$ awk -F '|' '$3 != "(null)"' file.txt
39|1411|XXYZ
40|1416|XXX
You set the field separator to |, then print all lines where the third field is not equal to (null). This uses awk's default of "print the line" if there's no action associated with a pattern.
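Since the question asks about adding this to an existing script, here is one way it might be wrapped so that the file is rewritten in place. This is only a sketch: the function name drop_null_rows is hypothetical, and it assumes mktemp is available.

```shell
# Hypothetical helper: rewrite FILE without the rows whose third
# |-separated field is exactly "(null)".
drop_null_rows() {
    file=$1
    tmp=$(mktemp) || return 1
    # { print } is the action awk applies implicitly when a pattern has none.
    if awk -F '|' '$3 != "(null)" { print }' "$file" > "$tmp"; then
        # Copy back with cat so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a throwaway sample file:
sample=$(mktemp)
printf '%s\n' '39|1411|XXYZ' '40|1416|XXX' '41|1420|(null)' > "$sample"
drop_null_rows "$sample"
cat "$sample"
rm -f "$sample"
```

Writing to a temporary file first avoids truncating the input while awk is still reading it.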
If you relax the requirement to specifically test the third field, and there is no other place for the "(null)" substring to occur, you can get the same result with
grep -vF '(null)' file.txt
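The difference between the two approaches only shows up when "(null)" occurs outside the third column. A short sketch with made-up sample data (the first row is an assumption, not taken from the question):

```shell
# Sample with "(null)" in the second column of the first row.
sample='39|(null)|XXYZ
40|1416|XXX
41|1420|(null)'

by_field()     { awk -F '|' '$3 != "(null)"'; }   # tests only the third field
by_substring() { grep -vF '(null)'; }             # tests the whole line

printf '%s\n' "$sample" | by_field       # keeps rows 39 and 40
printf '%s\n' "$sample" | by_substring   # keeps only row 40
```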
With awk:
awk '-F|' '$3 != "(null)"' < input-file
Here is a sed:
$ sed '/(null)$/d' file
39|1411|XXYZ
40|1416|XXX
The $ ensures that (null) is at the end of the line. If you want to ensure that (null) is the entire final column (a bare | is a literal pipe in sed's basic regular expressions, whereas GNU sed treats \| as alternation):
$ sed '/|(null)$/d' file
And if you want to be extra sure that it is the third column:
$ sed '/^[^|]*|[^|]*|(null)$/d' file
Or with grep:
$ grep -v '^[^|]*|[^|]*|(null)$' file
(But instead of this last one, just use awk...)
Use grep:
grep -v '|.*|(null)' in_file
Here, grep uses option -v : print lines that do not match.
Or use Perl:
perl -F'[|]' -lane 'print if $F[2] ne "(null)";' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F'[|]' : Split into @F on literal |, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
I would like to remove any row within a .txt file that contains "(null)"
If you wish to do that using AWK let file.txt content be
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
then
awk '!index($0,"(null)")' file.txt
will output
39|1411|XXYZ
40|1416|XXX
Explanation: index returns the position of the first occurrence of the substring ((null) in this case), or 0 if it is not found. Negating the result gives true for 0 and false for anything else, and AWK prints the lines for which the result is true.
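The values involved can be printed directly. A small sketch, using a hypothetical helper named show_index:

```shell
# Hypothetical helper: show what index() and its negation return per line.
show_index() {
    awk '{ printf "%s -> index=%d, !index=%d\n", $0, index($0, "(null)"), !index($0, "(null)") }'
}

printf '%s\n' '39|1411|XXYZ' '41|1420|(null)' | show_index
# 39|1411|XXYZ -> index=0, !index=1
# 41|1420|(null) -> index=9, !index=0
```

Because index does a plain substring search, the parentheses need no escaping, unlike in a regex match.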

replace a pipe delimiter with a space using awk or sed

I have a pipe-delimited file with sample lines like the ones below:
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct|23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct|23|14:04|957147508
Is there a way for awk or sed to transform the lines into the output below, where the pipe between the month and the day is replaced by a space?
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct 23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct 23|14:04|957147508
With GNU sed:
sed -E 's/(\|[A-Z][a-z]{2})\|([0-9]{1,2}\|)/\1 \2/' file
Output:
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct 23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct 23|14:04|957147508
If you want to edit file "in place" add sed's option -i.
Yes, it is possible to replace a "|" with a space.
The real problem is identifying which field(s) to change.
Are they always the 6th and 7th? If so, this works (the [|] matters: a bare | in the dynamic regex would mean alternation, so only the month would be matched):
awk -F'|' '{sub($6"[|]"$7,$6" "$7)}1' file
Or is it always a three-letter word (upper-lower-lower) followed by 1 or 2 digits?
If so, this works instead:
gawk '{c="[|]([[:upper:]][[:lower:]]{2})[|]([0-9]{1,2})[|]";print gensub(c,"|\\1 \\2|",1,$0)}' file
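If the month and day are always the 6th and 7th fields, another option is to rebuild the line field by field and avoid regular expressions entirely. A sketch under that assumption (merge_month_day is a hypothetical name):

```shell
# Hypothetical helper: join fields 6 and 7 with a space and every other
# pair of neighbouring fields with the original | separator.
merge_month_day() {
    awk -F '|' '{
        out = $1
        for (i = 2; i <= NF; i++) {
            sep = (i == 7) ? " " : "|"
            out = out sep $i
        }
        print out
    }' "$@"
}

printf '%s\n' '/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct|23|14:04|3919089454' |
    merge_month_day
```

Lines with fewer than seven fields pass through unchanged, and field contents are never interpreted as a pattern.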

How to pass a variable from bash to regex

I'm trying to print only lines that contain the variable $foo. I've tried using double quotes and curly braces to no avail. What is the proper way to pass a shell variable to a regex in sed?
sed -n 's:\("${foo}".*$\):\1:p' file.txt
sed is overkill if you don't actually need to modify the matching lines. Just use grep:
grep "$foo" file.txt
Try the below sed command to print the lines which contains value assigned to the variable foo.
sed -n "/$foo/p" file
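Note that with double quotes the contents of the variable are interpreted as a regular expression (and a / in it would end the sed address early). If the variable should match literally, grep -F is one alternative. A short sketch with made-up sample data; the function names are hypothetical:

```shell
# $1 is treated as a regex by sed and as a fixed string by grep -F.
match_regex()   { sed -n "/$1/p"; }
match_literal() { grep -F -e "$1"; }

foo='a.c'
printf '%s\n' 'abc' 'a.c' | match_regex "$foo"     # prints both lines: . matches any character
printf '%s\n' 'abc' 'a.c' | match_literal "$foo"   # prints only a.c
```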
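You can also keep the sed script in single quotes and close them around the variable so that the shell expands it. -E is needed for the unescaped group; in the session below, lines are typed on standard input and echoed back, with matches wrapped in <>:
$ pattern='foo'
$ sed -E 's/('"$pattern"'.*$)/<\1>/g'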
fo
fo
foo
<foo>

SED replace expression "within" a regular expression

I have to change a CSV file column (the date) which is written in the following format:
YYYY-MM-DD
and I would like it to be
YYYY.MM.DD
I can write a succession of 2 sed rules piped one to the other, like:
sed 's/-/./' file.csv | sed 's/-/./'
but this is not clean. My question is: is there a way of assigning variables in sed, telling it that YYYY-MM-DD should be parsed as year=YYYY ; month=MM ; day=DD, and then telling it to
write $year.$month.$day
or something similar? Maybe with awk?
You could use groups and access the year, month, and day directly via backreferences:
sed 's#\([0-9][0-9][0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\)#\1.\2.\3#g'
Here's an alternative solution with awk:
awk 'BEGIN { FS=OFS="," } { gsub("-", ".", $1); print }' file.csv
BEGIN { FS=OFS="," } tells awk to break the input lines into fields by , (variable FS, the [input] Field Separator), as well as to also use , when outputting modified input lines (variable OFS, the Output Field Separator).
gsub("-", ".", $1) replaces all - instances with . in field 1
The assumption is that the data is in the 1st field, $1; if the field index is a different one, replace the 1 in $1 accordingly.
print simply outputs the modified input line, terminated with a newline.
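When the date is not in the first column, the field number can be passed in as an awk variable instead of editing the program. A sketch assuming a hypothetical wrapper named dots_in_column and made-up sample data:

```shell
# Hypothetical wrapper: replace every - with . in comma-separated column COL
# of the given files (or of standard input when no file is given).
dots_in_column() {
    col=$1
    shift
    awk -v col="$col" 'BEGIN { FS = OFS = "," } { gsub(/-/, ".", $col); print }' "$@"
}

printf '%s\n' 'id-1,2014-05-15,x-y' | dots_in_column 2
# id-1,2014.05.15,x-y
```

Only the selected field is touched, so hyphens in other columns survive.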
What you are doing is equivalent to supplying the "global" replacement flag:
sed 's/-/./g' file.csv
sed has no variables, but it does have numbered groups:
sed -r 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\1.\2.\3/g' file.csv
or, if your sed has no -r:
sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1.\2.\3/g' file.csv
You may try this sed command also,
sed 's/\([0-9]\{4\}\)\-\([0-9]\{2\}\)\-\([0-9]\{2\}\)/\1.\2.\3/g' file
Example:
$ (echo '2056-05-15'; echo '2086-12-15'; echo 'foo-bar-go') | sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1.\2.\3/g'
2056.05.15
2086.12.15
foo-bar-go

how do i replace the first 100 characters of all lines in a file using awk

How do I replace the first 100 characters of all lines in a file using awk? There is no field delimiter in this file. All fields are fixed width. And given the variation in the data, I cannot use a search and replace.
How about sed? To replace the first 100 characters with say A:
$ sed -r 's/.{100}/A/' file
If you're happy with the results rewrite the file using -i:
$ sed -ri 's/.{100}/A/' file
awk '{print "replacing text..." substr($0,101)}' file
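awk's substr counts characters from 1, so dropping the first n characters means starting at position n + 1. The arithmetic is easy to check on a short string. A sketch with a hypothetical helper named replace_prefix (note that awk -v interprets backslash escapes in the replacement text):

```shell
# Hypothetical helper: replace the first N characters of every input line
# with TEXT.
replace_prefix() {
    n=$1
    text=$2
    awk -v n="$n" -v text="$text" '{ print text substr($0, n + 1) }'
}

printf '%s\n' 'abcdefgh' | replace_prefix 3 'X'
# Xdefgh
```

For the question as asked, the call would be replace_prefix 100 'replacing text...' < file.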
Use pure shell.
#!/usr/bin/env bash
# read each line into shell variable REPLY
while read -r ; do
    echo "REPLACE text ... ${REPLY:100}"
done <file
Explanation
REPLY is a shell variable that is set to the line of input read by the read builtin command when no arguments are supplied; see http://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html.
${REPLY:100} expands to the rest of the line after the first 100 characters.