replace a pipe delimiter with a space using awk or sed - regex

I have a pipe-delimited file with sample lines like these:
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct|23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct|23|14:04|957147508
Is there a way for awk or sed to transform the lines into the output below, where the pipe between the month and the day is replaced by a space?
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct 23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct 23|14:04|957147508

With GNU sed:
sed -E 's/(\|[A-Z][a-z]{2})\|([0-9]{1,2}\|)/\1 \2/' file
Output:
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct 23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct 23|14:04|957147508
If you want to edit the file in place, add sed's -i option.
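Here is a minimal sketch of the in-place edit, assuming a throwaway test file created with mktemp. Both GNU sed and BSD/macOS sed accept -i with an attached backup suffix (-i.bak), which keeps a copy of the original next to the edited file.

```bash
#!/usr/bin/env bash
# Hypothetical test file holding the first sample line from the question.
tmp=$(mktemp)
printf '%s\n' '/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct|23|14:04|3919089454' > "$tmp"

# Edit in place; the unmodified content is kept in "$tmp.bak".
sed -i.bak -E 's/(\|[A-Z][a-z]{2})\|([0-9]{1,2}\|)/\1 \2/' "$tmp"

edited=$(cat "$tmp")
backup=$(cat "$tmp.bak")
printf '%s\n' "$edited"
rm -f "$tmp" "$tmp.bak"
```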

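Yes, it is possible to replace a "|" with a space.
The real problem is identifying which field(s) to change.
Are those always the 6th and 7th? If so, this works:
awk -F'|' '{sub($6"[|]"$7, $6" "$7)}1' file
(The pipe is written as [|] because sub() treats its first argument as a regular expression, where a bare | means "or".)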
Or are they a three-letter word (one uppercase letter followed by two lowercase letters) followed by a 1- or 2-digit number?
If so, this alternative works (gensub is gawk-only):
gawk '{c="[|]([[:upper:]][[:lower:]]{2})[|]([0-9]{1,2})[|]";print gensub(c,"|\\1 \\2|",1,$0)}' file
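If you prefer to choose the fields by position without building a regex from their contents, the record can also be rebuilt field by field. This is a sketch in portable awk wrapped in a shell function; the name merge_fields and its field-number argument are hypothetical.

```bash
#!/usr/bin/env bash
# merge_fields N (hypothetical helper): read pipe-delimited lines on stdin and
# join field N and field N+1 with a space, keeping every other pipe.
merge_fields() {
  awk -F'|' -v n="$1" '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if (i == n + 1) continue          # already appended to field n
      f = $i
      if (i == n && i < NF) f = $i " " $(i + 1)
      if (i == 1) out = f
      else out = out "|" f
    }
    print out
  }'
}

result=$(printf '%s\n' \
  '/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct|23|14:04|3919089454' |
  merge_fields 6)
printf '%s\n' "$result"
```

Because it never interprets field contents as a pattern, this also works when the fields hold characters that are special in regular expressions.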

Related

Bash Comma Delimited List Extracting Last Column

I'm working in bash and have a comma-delimited list in a .txt file that looks like this:
name1,org2,enabled,email
name2,org1,enabled,email
name3,org3,enabled,
name4,org4,enabled,email
name5,org5,enabled,
I want a command that extracts the rows of people who are missing their e-mail address. What command will do that? Thanks
awk -<Flag> <don't know the syntax>
In awk:
$ awk -F, '$4==""' file
name3,org3,enabled,
name5,org5,enabled,
-F, sets FS, the input field separator, to a comma
$4=="" prints the records whose 4th field is empty
grep:
$ grep ",$" file
name3,org3,enabled,
name5,org5,enabled,
,$ matches lines that end with a comma, i.e. whose last field is empty
I assume that your file contains lines like:
name1,org2,enabled,email@domain.com and not name1,org2,enabled,email
Based on that, you can use grep -v (invert match) to print the lines that contain no @:
grep -v '@' file
Output:
name3,org3,enabled,
name5,org5,enabled,
This can be done with an awk command like the one below:
awk -F, '$4 == ""' file
This code assumes:
each line is a comma-separated string
the 4th field may be empty
When the 4th field is empty, the condition is true and awk prints the whole line.
Edit:
Earlier I shared the shorter form !$4, but that is not a good approach: !$4 is also true when the 4th field is "0", because a field that looks like a number is tested by its numeric value. For details, see the discussion in the comments on my post.
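A small comparison on hypothetical data shows the difference between the two forms. The sample includes a line whose 4th field is the string "0".

```bash
#!/usr/bin/env bash
# Hypothetical sample: an empty 4th field, a 4th field holding "0", and a normal one.
sample='name3,org3,enabled,
name6,org6,enabled,0
name1,org2,enabled,email'

# Explicit string test: only the line whose 4th field is really empty.
strict=$(printf '%s\n' "$sample" | awk -F, '$4 == ""')

# Short form: "0" looks like a number and evaluates as false, so !$4 is true too.
short=$(printf '%s\n' "$sample" | awk -F, '!$4')

printf 'strict:\n%s\nshort:\n%s\n' "$strict" "$short"
```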
grep approach:
grep -Eo '([^[:space:]]*,){3}$' file
The output:
name3,org3,enabled,
name5,org5,enabled,
sed approach (GNU sed, since \S is a GNU extension):
sed -n '/\(\S*,\)\{3\}$/p' file

Get specific Text between Specific Tags

At the top of my HTML files, I have...
<H2>City</H2>
<P>Liverpool</P>
or
<H2>City</H2>
<P>Dublin</P>
I want to output the text between the tags immediately after each <H2>City</H2>. So in the examples above, which are separate files, I want to print Liverpool for the first and Dublin for the second.
Following this thread, I tried:
sed -e 's/City\(.*\)\/P/\1/'
which I hoped would get me halfway there, but it just prints the entire file. Any ideas?
awk to the rescue! You need multi-character RS support, though (gawk has it):
$ awk -F'[<>]' -v RS='<H2>City</H2>' 'NF{print $3}' file
Another approach:
$ awk 'c&&c--{gsub(/<[^>]*>/,""); print} /<H2>City<\/H2>/{c=1}' file
This finds the line right after the City heading and strips every tag from it (gsub, unlike sub, removes all matches rather than only the first).
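A quick check of that awk approach on a hypothetical temporary file built from the first example. It uses gsub so that both the opening and the closing tag are stripped; sub would remove only the first tag on the line.

```bash
#!/usr/bin/env bash
# Hypothetical test file built from the question's first example.
tmp=$(mktemp)
printf '<H2>City</H2>\n<P>Liverpool</P>\n' > "$tmp"

# The heading sets c=1; on the next line "c && c--" is true once (and resets c),
# gsub strips every <...> tag, and the remaining text is printed.
city=$(awk 'c && c-- { gsub(/<[^>]*>/, ""); print } /<H2>City<\/H2>/ { c = 1 }' "$tmp")

printf '%s\n' "$city"
rm -f "$tmp"
```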
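Try using the following PCRE regex:
(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)
sed does not support (?s), lookarounds or lazy quantifiers, so run it with a PCRE-capable tool such as perl instead:
perl -0777 -ne 'print "$&\n" while /(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)/g' file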
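I checked, and \s does not seem to work for the line break; use the newline character \n instead. Because sed reads one line at a time, first append the next line with N:
sed -n '/<H2>City<\/H2>/{N;s/.*\n<P>\(.*\)<\/P>.*/\1/p;}' file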
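There is no need to use a lookbehind (as above); that is overkill.
With sed, you can use the n command to read the next line after your pattern. Then just remove the tags to output your content:
sed -n '/<H2>City<\/H2>/{n;s/ *<\/*P> *//gp;}' file
(The braces make n and the substitution apply only after a heading match, so other <P> lines in the file are not printed.)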
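Since the question's examples are separate files, here is a sketch that runs the n-based sed command over two hypothetical temporary files. The braces limit n and the substitution to the line that follows a heading match.

```bash
#!/usr/bin/env bash
# Two hypothetical files, one per city, like the separate files in the question.
dir=$(mktemp -d)
printf '<H2>City</H2>\n<P>Liverpool</P>\n' > "$dir/a.html"
printf '<H2>City</H2>\n<P>Dublin</P>\n' > "$dir/b.html"

# On the heading line, n loads the next line; s strips <P> and </P>, p prints it.
result=$(for f in "$dir/a.html" "$dir/b.html"; do
  sed -n '/<H2>City<\/H2>/{n;s/ *<\/*P> *//gp;}' "$f"
done)

printf '%s\n' "$result"
rm -rf "$dir"
```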
I think this should work on your Mac:
printf '<H2>City</H2>\n<P>Dublin</P>\n' | awk -F'[<>]' '/City/{getline; print $3}'
Dublin

How do I replace the first 100 characters of all lines in a file using awk

How do I replace the first 100 characters of all lines in a file using awk? There is no field delimiter in this file. All fields are fixed width. And given the variation in the data, I cannot use a search and replace.
How about sed? To replace the first 100 characters with, say, A:
$ sed -r 's/.{100}/A/' file
If you're happy with the results, rewrite the file in place using -i:
$ sed -ri 's/.{100}/A/' file
awk '{print "replacing text..." substr($0,100)}'
Use pure shell.
#!/usr/bin/env bash
# read each line into shell variable REPLY
while read -r ; do
echo "REPLACE text ... ${REPLY:100}"
done <file
Explanation
REPLY is a shell variable (see http://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html). It is set to the line of input read by the read builtin when no variable names are supplied.
${REPLY:100} expands to the part of the line after the first 100 characters (the offset is zero-based).
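The approaches differ on lines shorter than 100 characters: the sed substitution finds no match and leaves such a line unchanged, while ${REPLY:100} expands to an empty string, so only the replacement text is printed. A small sketch with hypothetical input:

```bash
#!/usr/bin/env bash
# Hypothetical input: one 104-character line (100 x's + TAIL) and one short line.
long="$(printf 'x%.0s' {1..100})TAIL"
input=$(printf '%s\n' "$long" "short")

# sed: .{100} needs 100 characters, so the short line is left untouched.
sed_out=$(sed -E 's/.{100}/A/' <<< "$input")

# bash: ${REPLY:100} is empty for the short line, leaving just the "A" prefix.
bash_out=$(while read -r; do echo "A${REPLY:100}"; done <<< "$input")

printf 'sed:\n%s\nbash:\n%s\n' "$sed_out" "$bash_out"
```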

Replace strings with double quotes in a XML file

I have a huge XML file with long lines (5,000-10,000 characters per line) containing the following text:
Pattern="abc"
and I want to replace it with
Pattern="def"
As the lines are huge, I have no choice but to use awk. Please suggest how this can be achieved. I tried the following, but it does not work:
CMD="{sub(\"Pattern=\"abc\"\",\"Pattern=\"def\"\"); print}"
echo "$CMD"
awk "$CMD" "Some File Name.xml"
Any help is highly appreciated.
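One suggestion with awk, assuming Pattern="abc" holds the first quoted value on the line:
awk 'BEGIN {FS = OFS = "\""} /Pattern="abc"/ {$2 = "def"} 1' file
Setting OFS to a double quote as well keeps the line's other quotes intact when awk rebuilds it after $2 changes.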
I don't understand why you said "As the line sizes are huge, I have no choice but to use awk". AFAIK, sed is no more limited on line length than awk is, and since this is a simple substitution on a single line, sed is the better tool:
$ cat file
Pattern="abc"
$ sed -r 's/(Pattern=")[^"]+/\1def/' file
Pattern="def"
If the pattern occurs multiple times on the line, add the g flag to the end of the s command (s/.../.../g).
Since you mention in your comment being stuck with a sed that can't handle long lines, let's assume you can't install GNU tools, so you'll need a non-GNU awk solution like this:
$ awk '{sub(/Pattern="[^"]+/,"Pattern=\"def")}1' file
Pattern="def"
If you LITERALLY mean you only want to replace Pattern="abc", then just do:
$ awk '{sub(/Pattern="abc"/,"Pattern=\"def\"")}1' file
Pattern="def"
If you have bash, you can try this:
Create a test file with long lines (over 10,000 characters):
s=
for ((i = 0; i < 2500; ++i)); do s="x$s"; done
l="${s}Pattern=\"abc\"$s"
for i in {1..5}; do echo "$l$l"; done > infile
The script:
while IFS= read -r x; do printf '%s\n' "${x//Pattern=\"abc\"/Pattern=\"def\"}"; done < infile
This replaces every occurrence of Pattern="abc" with Pattern="def" on each line. (IFS= and -r keep leading whitespace and backslashes intact.)
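XML is often indented, and a plain read x (without IFS= and -r) strips leading whitespace and treats backslashes as escapes. A hedged comparison on a hypothetical indented line; the pattern and replacement are kept in variables here only to simplify the quoting.

```bash
#!/usr/bin/env bash
# Hypothetical indented XML line.
input='    <node Pattern="abc"/>'
old='Pattern="abc"'
new='Pattern="def"'

# read x: leading blanks are stripped (and backslashes would be interpreted).
plain=$(while read x; do echo "${x//"$old"/$new}"; done <<< "$input")

# IFS= read -r x: the line is kept exactly as read.
kept=$(while IFS= read -r x; do printf '%s\n' "${x//"$old"/$new}"; done <<< "$input")

printf '[%s]\n[%s]\n' "$plain" "$kept"
```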

Removing the very first token of a file only

I am not very familiar with scripting, so this may be a very easy problem. I want to remove the first token of every file.
file 1
1 this is good
file 2
2 this is another file.
I would like to remove 1 and 2 from file 1 and file 2, respectively. How would I do it? Is there a bash command for it?
Or with awk:
$ awk '{if (NR==1) {$1="";print $0;} else print $0}' input_file
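(This leaves a space at the start of the line, because awk rebuilds the line from the now-empty first field followed by the output separator. To process several files in one run, use FNR==1 instead of NR==1, since NR keeps counting across files.)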
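Using sed and assuming you don't want to preserve a leading space:
sed '1{s/^\s*\w*\s*//}' input_file
This works on the very first line (1{}) and uses the substitute command (s/pattern/replace/) to delete any leading white space, the first run of word characters and the white space after it (^\s*\w*\s*). Word characters are [a-zA-Z0-9_]. \s and \w are GNU sed extensions; with several files, GNU sed needs -s (or -i) to treat each file's first line separately.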
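The question asks about every file, so here is a sketch that handles several files in one awk run, assuming two hypothetical temporary files like the ones in the question. FNR restarts at 1 for each input file (NR does not), and sub() removes the token plus the space after it without rebuilding the rest of the line.

```bash
#!/usr/bin/env bash
# Two hypothetical files matching the question's examples.
dir=$(mktemp -d)
printf '1 this is good\n' > "$dir/file1"
printf '2 this is another file.\n' > "$dir/file2"

# On the first line of each file (FNR == 1), remove leading blanks, the first
# token and the blanks after it; the trailing "1" prints every line.
result=$(awk 'FNR == 1 { sub(/^[ \t]*[^ \t]+[ \t]*/, "") } 1' \
  "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
rm -rf "$dir"
```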
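With GNU sed, the 0,/regexp/ address ends the range at the first line that matches, even when that is line 1:
$ sed '0,/1/{s/1 //}' f1
this is good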