Bash Comma Delimited List Extracting Last Column - regex

I have a comma delimited list in a txt file in bash that looks like this:
name1,org2,enabled,email
name2,org1,enabled,email
name3,org3,enabled,
name4,org4,enabled,email
name5,org5,enabled,
I want a command that extracts the rows of the people whose e-mail is missing. What command will do that? Thanks
awk -<Flag> <don't know the syntax>

In awk:
$ awk -F, '$4==""' file
name3,org3,enabled,
name5,org5,enabled,
-F, sets FS, the input field separator, to a comma
$4=="" outputs records where 4th field is empty
grep:
$ grep ",$" file
name3,org3,enabled,
name5,org5,enabled,
,$ matches lines that end with a comma, i.e. records where the last field is empty

I assume that your file contains lines like:
name1,org2,enabled,email@domain.com and not name1,org2,enabled,email
Based on that, you can use grep -v (invert), i.e.:
grep -v '@' file
Output:
name3,org3,enabled,
name5,org5,enabled,

This could be awk command similar to the code below:
awk -F, '$4 == ""'
This code assumes:
1. each line is a comma-separated string
2. the 4th field may be empty
3. if item 2 is true, the whole line is printed
Edit:
Earlier I shared a shorter version using !$4, but that is not a good approach: !$4 is also true when the fourth field is 0, not only when it is empty. For details, see the discussion in the comments on my post.
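A minimal sketch of why the explicit comparison is the safer test. The sample rows are hypothetical (they are not from the question); the second row has a literal 0 in the e-mail column:

```shell
# Hypothetical sample data: row 2 has the value 0 in field 4, row 3 is empty there.
data='name1,org1,enabled,email
name2,org2,enabled,0
name3,org3,enabled,'

# Explicit string comparison: only the truly empty field matches.
strict=$(printf '%s\n' "$data" | awk -F, '$4 == ""')

# Negation: a field that looks like the number 0 is also false, so it matches too.
loose=$(printf '%s\n' "$data" | awk -F, '!$4')

printf 'strict:\n%s\n\nloose:\n%s\n' "$strict" "$loose"
```

Here strict holds only the name3 row, while loose also picks up the name2 row.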

grep approach:
grep -Eo '([^[:space:]]*,){3}$' file
The output:
name3,org3,enabled,
name5,org5,enabled,
sed approach:
sed -n '/\(\S*,\)\{3\}$/p' file

Related

Removing rows that contains "(null)" value from a text file

I would like to remove any row within a .txt file that contains "(null)". The (null) value is always in the 3rd column. I would like to add this to a script that I already have.
Txt file example:
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
In this example I would like to remove the third row.
I'm guessing it's an awk -F, but I'm not sure where to go from there.
You are on the right track with using -F.
$ awk -F '|' '$3 != "(null)"' file.txt
39|1411|XXYZ
40|1416|XXX
You set the field separator to |, then print all lines where the third field is not equal to (null). This uses awk's default of "print the line" if there's no action associated with a pattern.
If you relax the requirement to specifically test the third field, and there is no other place for the "(null)" substring to occur, you can get the same result with
grep -vF '(null)' file.txt
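As a rough illustration of when the two commands differ, here is a sketch with hypothetical input in which "(null)" also appears outside the third column (the last row is an assumption added for the example):

```shell
# Hypothetical input: "(null)" shows up in column 2 of the last row, not column 3.
data='39|1411|XXYZ
41|1420|(null)
42|(null)|ABC'

# Field test: drops only rows whose third field is exactly "(null)".
by_field=$(printf '%s\n' "$data" | awk -F '|' '$3 != "(null)"')

# Substring test: drops every row that contains "(null)" anywhere.
by_substring=$(printf '%s\n' "$data" | grep -vF '(null)')

printf 'awk keeps:\n%s\n\ngrep keeps:\n%s\n' "$by_field" "$by_substring"
```

The awk version keeps rows 39 and 42. The grep version keeps only row 39, so it is equivalent only when "(null)" cannot occur in other columns.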
With awk:
awk '-F|' '$3 != "(null)"' < input-file
Here is a sed:
$ sed '/(null)$/d' file
39|1411|XXYZ
40|1416|XXX
The $ ensures that (null) is at the end of the line. If you want to ensure that (null) is the entire final column:
$ sed '/|(null)$/d' file
And if you want to be extra sure that it is the third column:
$ sed '/^[^|]*|[^|]*|(null)$/d' file
Or with grep:
$ grep -v '^[^|]*|[^|]*|(null)$' file
(But instead of this last one, just use awk...)
Use grep:
grep -v '|.*|(null)' in_file
Here, grep uses option -v : print lines that do not match.
Or use Perl:
perl -F'[|]' -lane 'print if $F[2] ne "(null)";' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F'[|]' : Split into @F on literal |, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
I would like to remove any row within a .txt file that contains "(null)"
If you wish to do that using AWK let file.txt content be
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
then
awk '!index($0,"(null)")' file.txt
will output
39|1411|XXYZ
40|1416|XXX
Explanation: index returns the position of the first occurrence of the substring ((null) in this case), or 0 if it is not found. Negating the result gives true for 0 and false for anything else, and AWK prints the lines for which the result is true.

Extract the string matched in a regex, not the line, with awk

This should not be too difficult but I could not find a solution.
I have a HTML file, and I want to extract all URLs with a specific pattern.
The pattern is /users/<USERNAME>/ - I actually only need the USERNAME.
I got only to this:
awk '/users\/.*\//{print $0}' file
But this gives me the complete line, which I don't want.
Even just the whole URL is fine (e.g. get /users/USERNAME/), but I really only need the USERNAME....
If you want to do this in single awk then use match function:
awk -v s="/users/" 'match($0, s "[^/[:blank:]]+") {
print substr($0, RSTART+length(s), RLENGTH-length(s))
}' file
Or else this grep + cut will do the job:
grep -Eo '/users/[^/[:blank:]]+' file | cut -d/ -f3
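A small sketch of both variants on a hypothetical HTML fragment (the markup and the user names alice and bob are made up for illustration). The cut field is 3 because the matched text starts with a slash, so the first field is empty:

```shell
# Hypothetical HTML fragment with one /users/<USERNAME>/ link per line.
html='<a href="/users/alice/profile">Alice</a>
<p>no link here</p>
<a href="/users/bob/">Bob</a>'

# awk: match() sets RSTART/RLENGTH; skip the "/users/" prefix to keep the name only.
with_awk=$(printf '%s\n' "$html" | awk -v s="/users/" 'match($0, s "[^/[:blank:]]+") {
    print substr($0, RSTART + length(s), RLENGTH - length(s))
}')

# grep -o prints only the matched text; cut then takes the third /-separated field.
with_grep=$(printf '%s\n' "$html" | grep -Eo '/users/[^/[:blank:]]+' | cut -d/ -f3)

printf '%s\n' "$with_awk"
```

Note that match() reports only the first match on each line, whereas grep -o prints every match, so the two differ when a line contains several such URLs.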
Set the delimiter to / and do a literal match on the second field, then print the third.
$ awk -F/ '$2=="users"{print $3}' file
Assuming your statement gives you the entire line of something like
/users/USERNAME/garbage/otherStuff/
You could pipe this result through head assuming you always know that it will be
/users/USERNAME/....
After piping through head, you can also use cut commands to remove more of the end text until you have only the piece you want.
The command will look something like this
awk '/users\/.*\//{print $0}' file | head (options) | cut (options)

awk strings for git

I'm trying to do an awk to retrieve the directory for certain git repos.
Current
git@ssh.gitlab.org:repo1/dir/dir/file1.git
git@ssh.gitlab.org:repo1/dir/dir/file2.git
git@ssh.gitlab.org:repo1/dir/dir/file3.git
git@ssh.gitlab.org:repo1/dir/dir/file4.git
I have this below using a field separator, but I'm unsure how to remove .git
awk -F':' '{print $2}' file
repo1/dir/dir/file1.git
repo1/dir/dir/file2.git
repo1/dir/dir/file3.git
repo1/dir/dir/file4.git
Desired result
repo1/dir/dir/file1
repo1/dir/dir/file2
repo1/dir/dir/file3
repo1/dir/dir/file4
You may use
awk -F':' '{sub(/\.[^.\/]*$/, "", $2); print $2;}' file
Using -F':' you will split all records (lines) into colon-separated fields. You access the second item only using $2, but before printing it, you need to remove the final . and any 0 or more chars other than . and / up to the end of the field value, which is done with sub(/\.[^.\/]*$/, "", $2).
See the online demo
With this solution, you may handle files and folders that may have any amount of dots in their names.
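To illustrate the point about dots, here is a sketch with a hypothetical remote (host and repository name are invented) whose repository name contains a dot. Splitting on every : and . is shown only for comparison:

```shell
# Hypothetical remote whose repository name itself contains a dot.
url='git@example.org:repo1/dir/my.project.git'

# Strip only the final ".something" from the second colon-separated field.
trimmed=$(printf '%s\n' "$url" | awk -F':' '{sub(/\.[^.\/]*$/, "", $2); print $2}')

# For comparison: splitting on every ":" and "." keeps only the last piece before ".git".
split_on_dots=$(printf '%s\n' "$url" | awk -F'[:.]' '{print $(NF-1)}')

printf 'sub():      %s\nsplit [:.]: %s\n' "$trimmed" "$split_on_dots"
```

The sub() version yields repo1/dir/my.project, while the split on dots yields just project.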
Could you please try following.
awk -F'[:.]' '{print $(NF-1)}' input_file
2nd solution: In case you don't want to hard code field number then try following.
awk 'match($0,/:[^.]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
With sed
$ sed 's/^[^:]*://; s/\.git$//' file
repo1/dir/dir/file1
repo1/dir/dir/file2
repo1/dir/dir/file3
repo1/dir/dir/file4
s/^[^:]*:// remove up to first : from start of line
s/\.git$// remove .git from end of line
you can also use sed -E 's/^[^:]*:|\.git$//g' to do it with single substitution
With regex, you can use:
(?<=:)[a-z0-9\/]*
Match anything composed of lowercase letters, digits, and slashes after the colon, so the match stops at the dot.
Or directly match everything between : and . with
(?<=:).*(?=\.)

How can I print 2 lines if the second line contains the same match as the first line?

Let's say I have a file with several million lines, organized like this:
#1:N:0:ABC
XYZ
#1:N:0:ABC
ABC
I am trying to write a one-line grep/sed/awk matching function that returns both lines if the NCCGGAGA-like string at the end of the first line (ABC in the sample above) is found in the second line.
When I try to use grep -A1 -P and pipe the matches with a match like '(?<=:)[A-Z]{3}', I get stuck. I think my creativity is failing me here.
With awk
$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
#1:N:0:ABC
ABC
-F: use : as delimiter, makes it easy to get last column
s=$NF; p=$0 save last column value and entire line for printing later
NF==1 if line doesn't contain :
$0 ~ s if line contains the last column data saved previously
if search data can contain regex meta characters, use index($0,s) instead to search literally
note that this code assumes input file having line containing : followed by line which doesn't have :
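A short sketch of the index() variant mentioned above, next to the regex version. The input is hypothetical: the first header ends in A.C, where the dot is a regex metacharacter that also matches B:

```shell
# Hypothetical input: "A.C" as a regex matches "ABC", but as a literal string it does not.
data='#1:N:0:A.C
ABC
#2:N:0:ABC
XABCX'

# Regex match: the saved last column is interpreted as a pattern.
regex_out=$(printf '%s\n' "$data" | awk -F: 'NF==1 && $0 ~ s {print p ORS $0} {s=$NF; p=$0}')

# Literal match: index() returns 0 unless the exact substring occurs.
literal_out=$(printf '%s\n' "$data" | awk -F: 'NF==1 && index($0, s) {print p ORS $0} {s=$NF; p=$0}')

printf 'regex:\n%s\n\nliteral:\n%s\n' "$regex_out" "$literal_out"
```

The regex version prints both pairs, while the index() version prints only the second pair.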
With GNU sed (might work with other versions too, syntax might differ though)
$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
#1:N:0:ABC
ABC
/:/ if line contains :
N add next line to pattern space
/.*:(.*)\n.*\1/ capture string after last : and check if it is present in next line
again, this assumes input like shown in question.. this won't work for cases like
#1:N:0:ABC
#1:N:0:XYZ
XYZ
This might work for you (GNU sed):
sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file
Use grep-like option -n to explicitly print lines. Read two lines into the pattern space and print both if they meet the requirements. Always delete the first and repeat.
If your actual Input_file is the same as the example shown, then the following may also help.
awk -v FS="[: \n]" -v RS="" '$(NF-1)==$NF' Input_file
EDIT: Adding 1 more solution as per Sundeep suggestion too here.
awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' Input_file

replace a pipe delimiter with a space using awk or sed

I have a pipe-delimited file with sample lines like the ones below:
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct|23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct|23|14:04|957147508
Is there a way for awk or sed to transform the lines into the output below, where the pipe between the month and the day is replaced by a space?
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct 23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct 23|14:04|957147508
With GNU sed:
sed -E 's/(\|[A-Z][a-z]{2})\|([0-9]{1,2}\|)/\1 \2/' file
Output:
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct 23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct 23|14:04|957147508
If you want to edit file "in place" add sed's option -i.
Yes, it is possible to replace a "|" with a space.
The real problem is identifying which field(s) to change.
Are they always the 6th and 7th? If so, this works:
awk -v FS='|' '{sub($6"[|]"$7,$6" "$7)}1' file
Are they a word in Upper-lower-lower form followed by 1 or 2 digits?
If so, this one works:
gawk '{c="[|]([[:upper:]][[:lower:]]{2})[|]([0-9]{1,2})[|]";print gensub(c,"|\\1 \\2|",1,$0)}' file
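If the fields to join are always the 6th and 7th, a purely positional variant avoids regular expressions altogether. This is only a sketch under that assumption; it rebuilds each line and uses a space instead of a pipe in front of field 7:

```shell
# First sample line from the question.
line='/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct|23|14:04|3919089454'

# Rebuild the record field by field; only the separator before field 7 becomes a space.
joined=$(printf '%s\n' "$line" | awk -F'|' '{
    out = $1
    for (i = 2; i <= NF; i++)
        out = out (i == 7 ? " " : "|") $i
    print out
}')

printf '%s\n' "$joined"
```

Because no field content is ever used as a pattern, this does not depend on what the month or day fields contain.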