Removing rows that contain a "(null)" value from a text file - regex

I would like to remove any row within a .txt file that contains "(null)". The (null) value is always in the 3rd column. I would like to add this to a script that I already have.
Txt file example:
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
In this example I would like to remove the third row.
I'm guessing it's an awk -F, but I'm not sure where to go from there.

You are on the right track with using -F.
$ awk -F '|' '$3 != "(null)"' file.txt
39|1411|XXYZ
40|1416|XXX
You set the field separator to |, then print all lines where the third field is not equal to (null). This uses awk's default of "print the line" if there's no action associated with a pattern.
If you relax the requirement to specifically test the third field, and there is no other place for the "(null)" substring to occur, you can get the same result with
grep -vF '(null)' file.txt
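To see the difference, take a hypothetical line where (null) appears in the second column instead of the third: the field test keeps it, while the plain grep drops it.
$ printf '39|1411|XXYZ\n42|(null)|ZZZ\n41|1420|(null)\n' | awk -F '|' '$3 != "(null)"'
39|1411|XXYZ
42|(null)|ZZZ
$ printf '39|1411|XXYZ\n42|(null)|ZZZ\n41|1420|(null)\n' | grep -vF '(null)'
39|1411|XXYZ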

With awk:
awk '-F|' '$3 != "(null)"' < input-file

Here is a sed:
$ sed '/(null)$/d' file
39|1411|XXYZ
40|1416|XXX
The $ ensures that the (null) is at the end of the line. If you want to ensure that (null) is the final column (| is not special in a basic regular expression, so it is left unescaped):
$ sed '/|(null)$/d' file
And if you want to be extra sure that it is the third column:
$ sed '/^[^|]*|[^|]*|(null)$/d' file
Or with grep:
$ grep -v '^[^|]*|[^|]*|(null)$' file
(But instead of this last one, just use awk...)

Use grep:
grep -v '|.*|(null)' in_file
Here, grep uses option -v : print lines that do not match.
Or use Perl:
perl -F'[|]' -lane 'print if $F[2] ne "(null)";' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in the -F option.
-F'[|]' : Split into @F on literal |, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
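As a quick sanity check of the splitting (not part of the original answer), the third field should land in $F[2]:
$ echo '41|1420|(null)' | perl -F'[|]' -lane 'print $F[2]'
(null)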

I would like to remove any row within a .txt file that contains "(null)"
If you wish to do that using AWK let file.txt content be
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
then
awk '!index($0,"(null)")' file.txt
will output
39|1411|XXYZ
40|1416|XXX
Explanation: index returns the position of the first occurrence of the substring ((null) in this case), or 0 if it is not found. I negate the returned value, getting true for 0 and false for anything else, and AWK prints the lines where the result is true.
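For example (a small illustration, not part of the original answer), index returns 9 here because (null) starts at the ninth character, and 0 when the substring is absent:
$ awk 'BEGIN{print index("41|1420|(null)", "(null)"), index("40|1416|XXX", "(null)")}'
9 0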

Related

Remove hostnames from a single line that follow a pattern in bash script

I need to cat a file and edit a single line with multiple domain names, removing any domain name that contains a certain 4-letter pattern, e.g. ozar.
This will be used in a bash script so the number of domain names can range, I will save this to a csv later on but right now returning a string is fine.
I tried multiple commands, loops, and if statements, but sending the output to a variable I can use further in the script proved to be another difficult task.
Example file
$ cat file.txt
ozarkzshared.com win.ad.win.edu win_fl.ozarkzsp.com ap.allk.org allk.org ozarkz.com website.com
What I attempted (that was close)
domains_1=$(cat /tmp/file.txt | sed 's/ozar*//g')
domains_2=$( cat /tmp/file.txt | printf '%s' "${string##*ozar}")
Goal
echo domain_x
win.ad.win.edu ap.allk.org allk.org website.com
If all the domains are on a single line separated by spaces, this might work:
awk '/ozar/ {next} 1' RS=" " file.txt
This sets RS, your record separator, then skips any record that matches the keyword. If you wanted to be able to skip a substring provided in a shell variable, you could do something like this:
$ s=ozar
$ awk -v re="$s" '$0 ~ re {next} 1' RS=" " file.txt
Note that the ~ operator is comparing a regular expression, not precisely a substring. You could leverage the index() function if you really want to check a substring:
$ awk -v s="$s" 'index($0,s) {next} 1' RS=" " file.txt
Note that all of the above is awk, which isn't what you asked for. If you'd like to do this with bash alone, the following might be for you:
while read -r -a a; do
    for i in "${a[@]}"; do
        [[ "$i" = *"$s"* ]] || echo "$i"
    done
done < file.txt
This assigns each line of input to the array a, then steps through that array, testing each element for a substring match and printing it if there is none. Text processing in bash is MUCH less efficient than in a more specialized tool like awk or sed. YMMV.
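For example, against the sample file.txt above and with s=ozar as before, the loop should print each surviving domain on its own line:
$ s=ozar
$ while read -r -a a; do for i in "${a[@]}"; do [[ "$i" = *"$s"* ]] || echo "$i"; done; done < file.txt
win.ad.win.edu
ap.allk.org
allk.org
website.com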
If you want to delete each matching word up to the next space delimiter:
$ sed 's/ozar[^ ]*//g' file
win.ad.win.edu win_fl. ap.allk.org allk.org website.com

unix sed not backtracking to finish the job

I'm trying to make a script to convert postgres CSV dumps into Oracle csv dumps. Aka, I'm trying to replace "true" with "Y" and "false" with "N".
So I want a script called to_oracle like this:
echo "false,false,false,true" | to_oracle
N,N,N,Y
So here is my attempt:
sed -E -e 's:(,|^)true(,|$):\1Y\2:g' -e 's:(,|^)false(,|$):\1N\2:g' "$@"
The logic is that a field in a CSV file either starts with beginning of line or a comma "," and it ends with either the end of line or a comma ","
The problem with this script is that it greedily absorbs the comma and thus every second field doesn't work:
echo "false,false,false,true" | to_oracle
N,false,N,Y
Now I suppose I could pipe it to the script twice, and that would do the job, but I'm wondering is there a more elegant solution?
An awk version:
echo "false,false,false,true" | awk -F, -v OFS=, '{for(i=1;i<=NF;i++) $i=$i=="true"?"Y":"N"}1'
N,N,N,Y
It tests the fields one by one: if a field is true, it becomes Y, otherwise N.
If you'd like to test explicitly for false as well:
echo "false,false,false,true" | awk -F, -v OFS=, '{for(i=1;i<=NF;i++) $i=($i=="true"?"Y":($i=="false"?"N":"other"))}1'
N,N,N,Y
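For example, a field that is neither true nor false falls through to the final branch (a quick check, not from the original answer):
$ echo "false,maybe,true" | awk -F, -v OFS=, '{for(i=1;i<=NF;i++) $i=($i=="true"?"Y":($i=="false"?"N":"other"))}1'
N,other,Y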
With GNU sed, you may use
sed -E ':a;s/(,|^)false(,|$)/\1N\2/;ta; :b;s/(,|^)true(,|$)/\1Y\2/;tb'
Details
-E will enable POSIX ERE syntax
:a;s/(,|^)false(,|$)/\1N\2/;ta will repeatedly replace false in between commas or start/end of string with N
:b;s/(,|^)true(,|$)/\1Y\2/;tb will repeatedly replace true in between commas or start/end of string with Y.
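Putting it together (a quick check, assuming GNU sed), the loops keep substituting until no untranslated field is left:
$ echo "false,false,false,true" | sed -E ':a;s/(,|^)false(,|$)/\1N\2/;ta; :b;s/(,|^)true(,|$)/\1Y\2/;tb'
N,N,N,Y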

How can I print 2 lines if the second line contains the same match as the first line?

Let's say I have a file with several million lines, organized like this:
#1:N:0:ABC
XYZ
#1:N:0:ABC
ABC
I am trying to write a one-line grep/sed/awk matching function that returns both lines if the sequence after the last colon on the first line (ABC here) is found in the second line.
When I try to use grep -A1 -P and pipe the matches with a match like '(?<=:)[A-Z]{3}', I get stuck. I think my creativity is failing me here.
With awk
$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
#1:N:0:ABC
ABC
-F: use : as delimiter, makes it easy to get last column
s=$NF; p=$0 save last column value and entire line for printing later
NF==1 if line doesn't contain :
$0 ~ s if line contains the last column data saved previously
if the search data can contain regex metacharacters, use index($0,s) instead to search literally (see the sketch after this list)
note that this code assumes input file having line containing : followed by line which doesn't have :
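A literal-match variant of the same idea might look like this (a sketch, only checked against the sample ip.txt):
$ awk -F: 'NF==1 && index($0,s){print p ORS $0} {s=$NF; p=$0}' ip.txt
#1:N:0:ABC
ABC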
With GNU sed (might work with other versions too, syntax might differ though)
$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
#1:N:0:ABC
ABC
/:/ if line contains :
N add next line to pattern space
/.*:(.*)\n.*\1/ capture string after last : and check if it is present in next line
again, this assumes input like that shown in the question; it won't work for cases like
#1:N:0:ABC
#1:N:0:XYZ
XYZ
This might work for you (GNU sed):
sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file
The -n option turns off automatic printing, so lines are only printed explicitly with p. Read two lines into the pattern space and print both if they meet the requirement. Always delete the first line and repeat.
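For what it's worth, the sliding window created by D means this version should also handle the consecutive-header case flagged above (a sketch; file2 is a hypothetical input):
$ cat file2
#1:N:0:ABC
#1:N:0:XYZ
XYZ
$ sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file2
#1:N:0:XYZ
XYZ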
If your actual Input_file is the same as the shown example, then the following may help you too.
awk -v FS="[: \n]" -v RS="" '$(NF-1)==$NF' Input_file
EDIT: Adding one more solution, as per Sundeep's suggestion.
awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' Input_file

Bash replace '\n\n}' string in file

I've got files repeatedly containing the string \n\n} and I need to replace such string with \n} (removing one of the two newlines).
Since such files are dynamically generated through a bash script, I need to embed replacing code inside the script.
I tried with the following commands, but it doesn't work:
cat file.tex | sed -e 's/\n\n}/\n}/g' # it doesn't work!
cat file.tex | perl -p00e 's/\n\n}/\n}/g' # it doesn't work!
cat file.tex | awk -v RS="" '{gsub (/\n\n}/, "\n}")}1' # it does work, but not for large files
You didn't provide any sample input and expected output so it's a guess but maybe this is what you're looking for:
$ cat file
a
b
c

}
d
$ awk '/^$/{f=1;next} f{if(!/^}/)print "";f=0} 1' file
a
b
c
}
d
a way with sed:
sed -i -n ':a;N;$!ba;s/\n\n}/\n}/g;p' file.tex
details:
:a # defines the label "a"
N # append the next line to the pattern space
$!ba # if it is not the last line, go to label a
s/\n\n}/\n}/g # replace all \n\n} with \n}
p # print
The i parameter will change the file in place.
The n parameter prevents the lines from being printed automatically.
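To check the substitution without modifying the file (a sketch, assuming GNU sed), drop -i and feed it some sample text:
$ printf 'text\n\n}\n' | sed -n ':a;N;$!ba;s/\n\n}/\n}/g;p'
text
}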
This Perl command will do as you ask
perl -i -0777 -pe's/\n(?=\n})//g' file.tex
This should work:
cat file.tex | sed -e 's/\\n\\n}/\\n}/g'
if \n\n} is written as a raw string (literal backslash-n characters).
Or if it's a real newline:
cat file.tex | sed -e ':a;N;$!ba;s/\n\n}/\n}/g'
Another method:
if the first \n is any new line:
text=$(< file.tex)
text=${text//$'\n\n}'/$'\n}'}
printf "%s\n" "$text" #> file
If the first \n is an empty line:
text=$(< file.tex)
text=${text//$'\n\n\n}'/$'\n\n}'}
printf "%s\n" "$text" #> file
*nix-style line filters process the file line by line. Thus, you have to do something extra to process an expression which spans lines.
As mentioned by others, '\n\n' is simply an empty line and matches the regular expression /^$/. Perhaps the most efficient thing to do is to save each empty line until you know whether or not the next one will contain a close bracket at the beginning of the line.
cat file.tex | perl -ne 'if ( $b ) { print $b unless m/^\}/; undef $b; } if ( m/^$/ ) { $b=$_; } else { print; } END { print $b if $b; }'
And to clean it all up we add an END block, to process the case that the last line in the file is blank (and we want to keep it).
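A rough check of the buffering logic (a sketch, not part of the original answer):
$ printf 'text\n\n}\n' | perl -ne 'if ( $b ) { print $b unless m/^\}/; undef $b; } if ( m/^$/ ) { $b=$_; } else { print; } END { print $b if $b; }'
text
}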
If you have access to node you can use rexreplace
npm install -g rexreplace
and then run
rexreplace '\n\n\}' '\n\}' myfile.txt
Of if you have more files in a dir data you can do
rexreplace '\n\n\}' '\n\}' data/*.txt

bash replace part of text separated with marker that includes a keyword

I would like to replace everything between colons if there's a keyword in it
Having
TEXT="/Something/like-this:/How/can-one-replace/text/separated/with/colon/that-includes/a/keyword?:There/may-be/multiple/keywords:/Thanks:/keyword"
with:
sed -e 's/regex here that searches for keyword/\/some\/path/g' <<< $TEXT
To get:
/Something/like-this:/some/path:/some/path:/Thanks:/some/path
P.S.
Another example to make it more clear: how can paths that include hello be replaced with another path?
/opt/hello/bin:/bin:/home/user/hello:/home/user/bin:/media/hello
=>
/some/path:/bin:/some/path:/home/user/bin:/some/path
My apologies for the unclear question.
I think you need this,
$ sed -r 's~^([^:]+):.*:([^:]+):(.*)$~\1:/Replacement:/Replacement:\2:/Replacement~g' file
/Something/like-this:/Replacement:/Replacement:/Thanks:/Replacement
Or something like this,
$ sed -r 's~^([^:]+):.*:([^:]+):(.*)$~\1:/*Replacement*:/*Replacement*:\2:/*Replacement*~g' file
/Something/like-this:/*Replacement*:/*Replacement*:/Thanks:/*Replacement*
Or
it may be like this, if you assign some path to Replacement variable,
$ Replacement=/foo/bar
$ sed -r "s~^([^:]+):.*:([^:]+):(.*)$~\1:/*$Replacement*:/*$Replacement*:\2:/*$Replacement*~g" file
/Something/like-this:/*/foo/bar*:/*/foo/bar*:/Thanks:/*/foo/bar*
Or
You may try this also,
awk -v RS=: -v var=/path -v ORS=: '{sub (/.*hello.*/,var)}1' file
Example:
$ echo '/opt/hello/bin:/bin:/home/user/hello:/home/user/bin:/media/hello' | awk -v RS=: -v var=/foo/bar -v ORS=: '{sub (/.*hello.*/,var)}1'
/foo/bar:/bin:/foo/bar:/home/user/bin:/foo/bar:
Explanation:
Awk's built-in variables RS (record separator) and ORS (output record separator) are set to :. So awk breaks the string whenever it finds : in the input and treats the text after each : as the next record.
ORS is set to :, so awk prints the records with : as the separator.
-v var=/foo/bar , Replacement string is assigned to a variable var.
sub (/.*hello.*/,var), if the record matches this regex, it replaces the whole record with the value in the variable var.
1, to print all the records.
My version:
sed 's/:/::/g;s/^/:/;s/$/:/;s/:[^:]*keyword[^:]*:/:REPLACEMENT:/g;s/^://;s/:$//;s/::/:/g'
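The idea is to double every colon and wrap the line in colons so that each field gets its own pair of delimiters, replace any field containing the keyword, and then undo the padding. For example, with keyword set to hello and REPLACEMENT to /some/path (escaped for sed), this should give:
$ echo '/opt/hello/bin:/bin:/home/user/hello:/home/user/bin:/media/hello' | sed 's/:/::/g;s/^/:/;s/$/:/;s/:[^:]*hello[^:]*:/:\/some\/path:/g;s/^://;s/:$//;s/::/:/g'
/some/path:/bin:/some/path:/home/user/bin:/some/path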
With bash
IFS=: read -ra arr <<<'/opt/hello/bin:/bin:/home/user/hello:/home/user/bin:/media/hello'
v=$(IFS=:; printf "%s\n" "${arr[*]/*hello*/\/some\/path}")
echo $v
/some/path:/bin:/some/path:/home/user/bin:/some/path