Ok this is driving me crazy. I have a text file with the following content:
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
etc.
I want to filter the lines where the second date is in December. I tried things like:
grep '.*(\d{4}-\d{2}-\d{2}).*(2020-12-).*' > output.txt
grep '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
grep -P '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
But nothing seems to work. Is there any way to accomplish this with either grep, egrep, sed or awk?
You need to use the -P option of grep to enable Perl-compatible regular expressions. Could you please try the following, written and tested with your shown samples.
grep -P '^("\d+",){4}"[a-zA-Z]+","\d{4}-\d{2}-\d{2}","2020-12-\d{2}"' Input_file
Explanation: the following is for explanation purposes only.
grep ##Start the grep command here.
-P ##Mention the -P option to enable PCRE regex in grep.
'^("\d+",){4} ##From the start of the line, match " digits " comma, four times.
"[a-zA-Z]+", ##Then match the quoted text field.
"\d{4}-\d{2}-\d{2}", ##Then skip over the first quoted date field.
"2020-12-\d{2}" ##Then match the second date only when it is in December 2020, which is what the OP needs.
' Input_file ##Mention the Input_file name here.
I suggest awk as an alternative, since the input data is structured in rows and columns with a common delimiter:
awk -F, '$7 ~ /-12-/' file
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
You can use either grep -P or plain egrep, since the pattern only needs ERE features:
$ cat test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
$
$ grep -P '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
$
$ egrep '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
Explanation:
^" - expect a " to start
([^"]*","){6} - scan over all chars other than ", followed by ","; repeat that 6 times
2020-12- - expect 2020-12-
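As a quick sanity check that the second date really is the 7th field (which the {6} repetition relies on), you could count the fields first; a minimal sketch, assuming no commas are embedded inside the quoted values:
$ awk -F',' '{print NF}' test.txt | sort -u
8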
The problem is in:
egrep '.*\d{4}-\d{2}-\d{2}.2020-12-.' > output.txt
^ HERE
The . just matches a single character, but you want to skip ",", so change to:
egrep '.*\d{4}-\d{2}-\d{2}.+2020-12-.' > output.txt
^^ HERE
The . becomes a .+.
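One more caveat: \d is a PCRE escape and is not reliably supported by plain egrep/ERE, so even the corrected pattern needs either grep -P or an explicit digit class. A sketch along the same lines using [0-9], assuming the input is in test.txt as above (only verified against the shown sample):
egrep '[0-9]{4}-[0-9]{2}-[0-9]{2}.+2020-12-' test.txt > output.txt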
$ acpi
Battery 0: Charging, 18%, 01:37:09 until charged
How can I grep the battery level value without the percentage character (i.e. just 18)?
This should do it but I'm getting an empty result:
acpi | grep -e '(?<=, )(.*)(?=%)'
Your regex is correct, but it needs the -P (Perl-compatible regex) option of GNU grep. You will also need -o to show only the matching text.
The correct command would be:
grep -oP '(?<=, )\d+(?=%)'
However, if you don't have gnu grep then you can also use sed like this:
sed -nE 's/.*, ([0-9]+)%.*/\1/p' file
18
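For example, fed straight from acpi (using the output shown in the question):
$ acpi | grep -oP '(?<=, )\d+(?=%)'
18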
Could you please try the following, written and tested at https://ideone.com/nzSGKs:
your_command | awk 'match($0,/Charging, [0-9]+%/){print substr($0,RSTART+10,RLENGTH-11)}'
Explanation: the following is for explanation purposes only.
your_command | ##Run the OP's command and pass its output to awk as standard input.
awk ' ##Start the awk program here.
match($0,/Charging, [0-9]+%/){ ##Use the match function to find the regex Charging, [0-9]+% in the line.
print substr($0,RSTART+10,RLENGTH-11) ##Print the matched substring, skipping the first 10 characters ("Charging, ") and dropping the trailing "%".
}'
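If your awk is GNU awk, match() also accepts a third array argument that captures the parenthesised group directly, which avoids the substr arithmetic; a minimal sketch, assuming gawk is available:
your_command | gawk 'match($0, /Charging, ([0-9]+)%/, arr){print arr[1]}'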
Using awk:
awk -F"," '{print $2+0}'
Using GNU sed:
sed -rn 's/.*, *([0-9]+)%,.*/\1/p'
You can use sed:
$ acpi | sed -nE 's/.*Charging, ([[:digit:]]*)%.*/\1/p'
18
Or, if Charging is not always in the string, you can key on the first comma instead:
$ acpi | sed -nE 's/[^,]*, ([[:digit:]]*)%.*/\1/p'
Using bash:
s='Battery 0: Charging, 18%, 01:37:09 until charged'
res="${s#*, }"
res="${res%%%*}"
echo "$res"
Result: 18.
res="${s#*, }" removes text from the beginning to the first comma+space and "${res%%%*}" removes all text from end till (and including) the last occurrence of %.
so I have a file with the following:
username=jsmith
api=3434kjklj23j4l3kj4l34j3l4j
I would like to return "jsmith" and "3434kjklj23j4l3kj4l34j3l4j" using a regular expression.
I know the regular expression for it is:
(username=)(.*) > \2
(api=)(.*) > \2
However, with grep, sed or awk I can't seem to figure out how to do it without returning the entire line.
How would you go about doing that on the command line?
awk is made for this task:
awk -F= '{print$2}' file
If the file has other entries, you can limit the output with a condition:
awk -F= '$1=="username"||$1=="api"{print$2}' file
Here is one using grep with PCRE and positive lookbehind (where supported):
$ grep -Po "((?<=^username=)|(?<=^api=)).*" file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e. output everything that is preceded by username= or api= at the start of a line.
And one in awk:
$ awk 'sub(/^(username|api)=/,""){print}' file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e. print the lines from which a leading username= or api= was removed first.
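Since sub() returns the number of substitutions made, the explicit {print} can even be dropped and awk's default action prints the modified line; a slightly shorter equivalent:
$ awk 'sub(/^(username|api)=/,"")' file
jsmith
3434kjklj23j4l3kj4l34j3l4j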
Since you want to see chess for an input like game=chess, here are some solutions that do not match on username= or api= at all:
cut -d"=" -f2- file
# or
sed -n 's/[^=]*=//p' file
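For example, with a key that neither command has to know about in advance:
$ echo "game=chess" | cut -d"=" -f2-
chess
$ echo "game=chess" | sed -n 's/[^=]*=//p'
chess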
Here's the answer that worked on macOS and RHEL 7.
awk -F= '$1=="username"{print$2}' testfile.txt
awk -F= '$1=="api"{print$2}' testfile.txt
testfile.txt
username=user1
api=pass1
username=user2
api =pass2
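Note that the last line of testfile.txt has a space before the = (api =pass2); if that is intentional rather than a typo, a whitespace-tolerant variant might look like this (a sketch, not specifically tested on macOS):
awk -F= '{gsub(/ /,"",$1)} $1=="api"{print $2}' testfile.txt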
Given a file jungle.txt with the following text ...
A lion sleeps in the jungle
A lion sleeps tonight
A tiger awakens in the swamp
The parrot observes
Wimoweh, wimoweh, wimoweh, wimoweh
... one could perform a grep search ...
$ grep lion jungle.txt
... or a sed search ...
$ sed -n "/lion/p" jungle.txt
... to find occurrences of a pattern ("lion" in this case).
Is there some easy way to get the number of returned lines? Or at least to know that more than one was found? As always, I've googled a lot first, but surprisingly found no answer.
Thanks!
grep can count matching lines:
grep -c 'lion' file
Output:
2
From the grep manual:
-c: Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option, count non-matching lines. (-c is specified by POSIX.)
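Note that -c counts matching lines, not individual matches; if a line could contain lion more than once and every occurrence should be counted, a common approach is:
grep -o 'lion' file | wc -l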
This might work for you (GNU sed):
sed '/lion/!d' file | sed '$=;d'
or if you prefer:
sed -n '/lion/p' file | sed -n '$='
N.B. if the file is empty or the first sed command finds nothing, the result of the second sed command is blank.
You can use awk:
awk '/lion/ {a++} END {print a+0}' file
2
But I would say that the best solution is the one posted by Cyros: grep -c 'lion' file
Just pass the grep output to the wc -l command to count the number of returned lines:
$ grep 'lion' file | wc -l
2
From wc --help
-l, --lines print the newline counts
Please refer to the file contents below.
#HD VN:1.0 SO:unsorted
#SQ SN:Chr1 LN:30427680
#PG ID:bowtie2 PN:bowtie2 VN:2.1.0
How can I extract just the number 30427680 using awk or any other Unix command?
Using sed
sed -n 's/.*LN://p' < input.txt
This erases everything up to and including LN:, and prints what's left, but only if a substitution actually took place.
Using awk
awk -v FS=: '/LN:/ { print $3; }' < input.txt
This will match lines that contain LN:, use : as field separator, and print the 3rd column.
Using grep
grep -o '[0-9]\{3,\}' < input.txt
This will match sequences of 3 or more digits, and print only the matched pattern thanks to the -o.
Depending on other cases not included in your question, you might have to make the patterns more strict.
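For example, a stricter variant that only looks at the #SQ header line and only accepts digits after LN: (assuming the headers really look like the sample above):
sed -n '/^#SQ/s/.*LN:\([0-9]*\).*/\1/p' < input.txt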
Using grep with PCRE:
grep -oP 'LN:\K.*' filename
Here \K drops the already matched LN: from the reported match, so only the value after it is printed.
Just use grep:
grep -o 30427680 file
-o, --only-matching
Prints only the matching part of the lines.
Using perl:
perl -ne 'print $& if /LN:\K.*/' filename
or
perl -ne 'print $1 if /LN:(.*)/' filename
Another awk, using LN: itself as the field separator (NF>1 is only true for lines that contain LN:):
awk -F"LN:" 'NF>1 {print $2}' file