(This example is edited, following a user's recommendation, to fix a mistake in my table display.)
I have a .csv table from where I need certain info. My table looks like this:
Name, Birth
James,2001/02/03 California
Patrick,2001/02/03 Texas
Sarah,2000/03/01 Alabama
Sean,2002/02/01 New York
Michael,2002/02/01 Ontario
From here, I need to print only the unique birthdates, in ascending order, like this:
2000/03/01
2001/02/03
2002/02/01
I have thought of a regular expression to identify the dates, such as:
awk '/[0-9]{4}/[0-9]{2}/[0-9]/{2}/' students.csv
However, I'm getting a syntax error in the regex, and I wouldn't know how to follow from this step.
Any hints?
Use cut and sort with the -u option to print the unique values (skipping the header line first; note that the date is in the second comma-separated field, before the space):
tail -n +2 students.csv | cut -d, -f2 | cut -d' ' -f1 | sort -u > out_file
You can also use grep instead of cut:
grep -Po '\d\d\d\d/\d\d/\d\d' students.csv | sort -u > out_file
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
perlre - Perl regular expressions
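To see the pipeline end to end, here is a minimal sketch that recreates the sample table from the question and runs the command (assuming GNU grep for the -P option):

```shell
# Recreate the sample table from the question
cat > students.csv <<'EOF'
Name, Birth
James,2001/02/03 California
Patrick,2001/02/03 Texas
Sarah,2000/03/01 Alabama
Sean,2002/02/01 New York
Michael,2002/02/01 Ontario
EOF

# Extract the dates and de-duplicate; sort -u also orders them ascending
grep -Po '\d\d\d\d/\d\d/\d\d' students.csv | sort -u
```

This prints the three unique dates, one per line, in ascending order.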
Here is a GNU awk solution to get this done in a single command (sorted_in values start with @, and splitting on both comma and space puts the date in $2):
awk -F'[, ]' 'NR > 1 && !seen[$2]++{} END {
PROCINFO["sorted_in"]="@ind_str_asc"; for (i in seen) print i}' file
2000/03/01
2001/02/03
2002/02/01
Using any awk, regardless of whether the names have one word or more and whether or not blank chars follow the commas:
$ awk -F', *' 'NR>1{sub(/ .*/,"",$2); print $2}' file | sort -u
2000/03/01
2001/02/03
2002/02/01
With your shown samples, please try the following. Written and tested in GNU awk; it should work in any awk though. (A for (i in array) loop does not guarantee order, so pipe the output through sort if you need the dates in ascending order.)
awk '
match($0,/[0-9]{4}(\/[0-9]{2}){2}/){
arrDate[substr($0,RSTART,RLENGTH)]
}
END{
for(i in arrDate){
print i
}
}
' Input_file
Explanation: a detailed, commented version of the above.
awk ' ##Start the awk program.
match($0,/[0-9]{4}(\/[0-9]{2}){2}/){ ##Use the match function to find the YYYY/MM/DD date pattern in the line.
arrDate[substr($0,RSTART,RLENGTH)] ##Use the matched sub-string as an index of array arrDate.
}
END{ ##Start the END block of this awk program.
for(i in arrDate){ ##Traverse the indexes of arrDate.
print i ##Print each index (a unique date).
}
}
' Input_file ##Mention the Input_file name.
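As a quick illustration of how match() sets RSTART and RLENGTH, here is a one-line sketch (the bracketed form of the regex avoids {n} intervals, which some older awks lack):

```shell
echo 'x 2001/02/03 y' |
awk 'match($0, /[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/) {
  # RSTART is the 1-based position of the match, RLENGTH its length
  print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
}'
# prints: 3 10 2001/02/03
```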
I have a text file with lot of SQL queries those look something like this...
select * from sometable where customernos like '%67890%';
select name, city from sometable where customernos like '%67890%';
select * from othertable where customernos like '%12345%';
I can get the count for one customer number using a command like this...
grep 67890 file.txt | wc -l
But is there any way I can get a count of all customer numbers, as a report like...
12345 1
67890 2
Please try the following.
awk '
match($0,/%[0-9]{5}/){
val[substr($0,RSTART+1,RLENGTH-1)]++
}
END{
for(i in val){
print i,val[i]
}
}' Input_file
For the shown samples, the output will be as follows.
12345 1
67890 2
Explanation: a commented version of the above.
awk ' ##Start the awk program.
match($0,/%[0-9]{5}/){ ##Use the match function to find a % followed by 5 digits (e.g. %67890).
val[substr($0,RSTART+1,RLENGTH-1)]++ ##Count the 5 digits (the match minus the leading %) in array val.
}
END{ ##Start the END block of this program.
for(i in val){ ##Traverse the val array.
print i,val[i] ##Print each customer number and its count.
}
}' Input_file ##Mention the Input_file name.
This might work for you (GNU grep, sort, uniq and awk):
grep -Eo '\b[0-9]{5}\b' file | sort -n | uniq -c | awk '{print $2,$1}'
Find the 5-digit numbers, sort them, count the unique occurrences and then reverse the columns.
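Putting the pipeline together on the sample queries (a sketch; \b word boundaries are a GNU grep extension):

```shell
# Recreate the sample file from the question
cat > queries.sql <<'EOF'
select * from sometable where customernos like '%67890%';
select name, city from sometable where customernos like '%67890%';
select * from othertable where customernos like '%12345%';
EOF

# uniq -c prefixes each number with its count; awk swaps the columns
grep -Eo '\b[0-9]{5}\b' queries.sql | sort -n | uniq -c | awk '{print $2, $1}'
```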
Just for fun, here is a sed solution:
sed -nE 'H;$!d;x;s/[^0-9]/ /g;s/ +/ /g;
:a;x;s/.*/1/;x;tb;
:b;s/^(( \S+\b).*)\2\b/\1/;Tc;x;s/.*/expr & + 1/e;x;tb;
:c;G;s/^ (\S+)(.*)\n(.*)/\1 \3\n\2/;/^[0-9]{5} /P;s/.*\n//;/\S/ba' file
Slurp the file into memory.
Space separate numbers.
Reduce multiple occurrences of the first number to one and count the occurrences.
Print the first number and its occurrences if it fits the criteria.
Repeat with all other numbers.
I am trying to grep the folder name that appears after /share/volume_repository, e.g.:
c20_testprd_108
from input lines like these:
/share/volume_repository/c20_testprd_108_2018-01-0912:15:51.469
/share/volume_repository/test_testprd_20_2019-03-0504:03:24.24
/share/volume_repository/c20_testprd_109_2018-01-0912:11:32.915
/share/volume_repository/hp_testprd_2003_2018-10-2917:51:24.724
/share/volume_repository/hp_testprd_3335_2019-01-2220:00:17.139
/share/volume_repository/hp_testprd_2002_2018-10-2917:49:15.605
/share/shared_volume_repository/fnolan_ha_testprd_02_2018-06-2621:31:23.405
I tried a combination of cut & awk, but if I use _20 as the field separator in awk, it truncates some of the folder names:
cat abc | cut -d '/' -f4 | awk -F '_20' '{print $1}'
Output:
c20_testprd_108
test_testprd
c20_testprd_109
hp_testprd
hp_testprd_3335
hp_testprd
fnolan_ha_testprd_02
The expected output is
c20_testprd_108
test_testprd_20
c20_testprd_109
hp_testprd_2003
hp_testprd_3335
hp_testprd_2002
fnolan_ha_testprd_02
Please try the following. Written and tested with the shown samples.
awk '
match($0,/\/share\/(shared_)?volume_repository\/[^:]*/){
value=substr($0,RSTART,RLENGTH)
gsub(/.*\/|_[0-9]+-[0-9]+-[0-9]+$/,"",value)
print value
}
' Input_file
Explanation: a detailed, commented version of the above.
awk ' ##Start the awk program.
match($0,/\/share\/(shared_)?volume_repository\/[^:]*/){ ##Match from /share/ up to (but not including) the first colon.
value=substr($0,RSTART,RLENGTH) ##Save the matched sub-string of the current line in value.
gsub(/.*\/|_[0-9]+-[0-9]+-[0-9]+$/,"",value) ##Remove everything up to the last / and the trailing date stamp from value.
print value ##Print value.
}
' Input_file ##Mention the Input_file name.
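A one-line check of the match()/gsub() approach on a single sample path (a sketch; any POSIX awk should do):

```shell
printf '%s\n' '/share/volume_repository/c20_testprd_108_2018-01-0912:15:51.469' |
awk 'match($0, /\/share\/(shared_)?volume_repository\/[^:]*/) {
  value = substr($0, RSTART, RLENGTH)              # path up to the first ":"
  gsub(/.*\/|_[0-9]+-[0-9]+-[0-9]+$/, "", value)   # drop leading dirs and the trailing date stamp
  print value
}'
# prints: c20_testprd_108
```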
With GNU awk:
awk '{print $4}' FS='/|_....-..-....' file
Output:
c20_testprd_108
test_testprd_20
c20_testprd_109
hp_testprd_2003
hp_testprd_3335
hp_testprd_2002
fnolan_ha_testprd_02
I'm trying to find a one-liner that prints everything before a relevant symbol and keeps just one character after it:
Input:
thisis#atest
thisisjust#anothertest
just#testing
Desired output:
thisis#a
thisisjust#a
just#t
awk -F"#" '{print $1 "#" }' almost gives me what I want, but I need a way to also print the character right after the #. Any ideas?
With sed, you can substitute everything after the first character following # with nothing:
sed 's/\(#.\).*/\1/'
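For example, on the sample input (portable sed, no extensions needed):

```shell
printf '%s\n' 'thisis#atest' 'thisisjust#anothertest' 'just#testing' |
sed 's/\(#.\).*/\1/'
# prints:
# thisis#a
# thisisjust#a
# just#t
```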
You could use grep:
$ grep -o '[^#]*#.' infile
thisis#a
thisisjust#a
just#t
This matches a sequence of characters other than #, followed by # and any character. The -o option retains only the match itself.
With the special RT variable in GNU awk, you can do:
awk 'BEGIN{RS="#.|\n"}RT!="\n"{print $0 RT}'
Get the index of the '#', then pull out the substring.
$ awk '{print substr($0,1,index($0,"#")+1);}' in.txt
thisis#a
thisisjust#a
just#t
1st Solution: Could you please try following.
awk 'match($0,/[^#]*#./){print substr($0,RSTART,RLENGTH)}' Input_file
The above prints the matching part only for lines that contain a # and skips lines that don't; if you want those lines printed in full as well, use this instead:
awk 'match($0,/[^#]*#./){print substr($0,RSTART,RLENGTH);next} 1' Input_file
2nd solution:
awk 'BEGIN{FS=OFS="#"} {print $1,substr($2,1,1)}' Input_file
A small variation on RavinderSingh13's 2nd example:
awk -F# '{print $1"#"substr($2,1,1)}' file
awk -F# '{print $1FS substr($2,1,1)}' file
Another grep variation (shortest posted so far):
grep -oP '.+?#.' file
o print only matching
P Perl regex (due to +?)
. any character
+ and more
? but stop with:
#
. plus one more character
If we do not add ?, the line test#one#two becomes test#one#t instead of test#o, due to the greedy +.
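To see the difference side by side (assuming GNU grep; note that with -o the lazy pattern keeps matching in the leftover text, so a line with several # can yield several matches):

```shell
echo 'test#one#two' | grep -oP '.+#.'    # greedy: prints test#one#t
echo 'test#one#two' | grep -oP '.+?#.'   # lazy: prints test#o, then ne#t from the leftover text
```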
If you want to use awk, the cleanest way to do this is with index, which finds the position of a character:
awk 'n=index($0,"#") { print substr($0,1,n+1) }' file
There are, however, shorter and more dedicated tools for this. See the other answers.
I have a text file in the format
1=23 2=44 15=17:31:37.640 5=abc 15=17:31:37.641 4=23 15=17:31:37.643 15=17:31:37.643
I need a regex to extract all the values for key 15 for a multiline text file
output should be
17:31:37.640 17:31:37.641 17:31:37.643 17:31:37.643
Sorry, I should have stated that the values I'm trying to extract are timestamps in the form 17:31:37.643
You can use GNU grep to extract the substrings.
grep -Po '\b15=\K\S+' file | tr '\n' ' '
-P option interprets the pattern as a Perl regular expression.
-o option shows only the matching part that matches the pattern.
\K throws away everything that it has matched up to that point.
Output
17:31:37.640 17:31:37.641 17:31:37.643 17:31:37.643
You can use sed:
sed 's/15=\([^ ]*\)/\1/g;s/[0-9]\+[^ ]\+ //g' input.file
I gave the following answer before the OP added the expected output; it works too, but puts each value on its own line:
If you have GNU grep, you can use a lookbehind assertion that comes with perl compatible regex mode:
grep -oP '(?<=15=)[^ ]*' <<< '1=23 2=44 15=xyz 5=abc 15=yyy 4=23 15=omnet 15=that'
Output:
xyz
yyy
omnet
that
Using awk:
awk -F'=' -v RS=' ' -v ORS=' ' '$1==15 { print $2 }' file
xyz yyy omnet that
Set the input and output record separators to space and the input field separator to =. Test whether column 1 is 15; if it is, print the second column.
As suggested by Ed Morton in the comments, this leaves a trailing blank char and possibly no final newline. If that's a concern, you can use the following, with GNU awk for its multi-char RS.
gawk -F'=' -v RS='[[:space:]]+' '$1==15{ printf "%s%s", (c++?OFS:""), $2 } END{print ""}' file
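If GNU awk is not available, a portable sketch (any POSIX awk) is to loop over the fields of each line instead of changing RS:

```shell
printf '%s\n' '1=23 2=44 15=17:31:37.640 5=abc 15=17:31:37.641 4=23 15=17:31:37.643 15=17:31:37.643' |
awk '{
  sep = ""
  for (i = 1; i <= NF; i++)      # walk the space-separated key=value pairs
    if ($i ~ /^15=/) {           # keep only key 15
      sub(/^15=/, "", $i)        # strip the "15=" prefix
      printf "%s%s", sep, $i
      sep = " "
    }
  print ""
}'
# prints: 17:31:37.640 17:31:37.641 17:31:37.643 17:31:37.643
```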
I was analyzing logs that contain information like the following:
y1e","email":"","money":"100","coi
I want to fetch the value of money. I used awk like this:
grep pay action.log | awk '/"money":"([0-9]+)"/'
Then how can I get the value matched by the sub-expression ([0-9]+)?
If you have GNU AWK (gawk):
awk '/pay/ {match($0, /"money":"([0-9]+)"/, a); print substr($0, a[1, "start"], a[1, "length"])}' action.log
If not:
awk '/pay/ {match($0, /"money":"([0-9]+)"/); split(substr($0, RSTART, RLENGTH), a, /[":]/); print a[5]}' action.log
The result of either is 100. And there's no need for grep.
Offered as an alternative, assuming the data format stays the same once the lines are grep'ed, this extracts the money field without using a regular expression:
awk -v FS=\" '{print $9}' data.txt
assuming data.txt contains
y1e","email":"","money":"100","coin.log
yielding:
100
I.e., your field separator is set to " and you print out field 9.
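A quick check of the field counting (with FS set to ", every quote is a separator, so the value 100 lands in field 9):

```shell
printf '%s\n' 'y1e","email":"","money":"100","coi' | awk -F'"' '{print $9}'
# prints: 100
```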
You need to reference group 1 of the regex
I'm not fluent in awk but here are some other relevant questions
awk extract multiple groups from each line
GNU awk: accessing captured groups in replacement text
Hope this helps
If money can come in at different positions, then it may not be a good idea to hard-code the field number.
You can try something like this -
$ awk -v FS='[,:"]' '{ for (i=1;i<=NF;i++) if ($i~/money/) print $(i+3) }' inputfile
Using GNU awk's gensub to extract the capture group:
grep pay action.log | awk -F "\n" 'm=gensub(/.*money":"([0-9]+)".*/, "\\1", "g", $1) {print m}'