How to egrep the month from the date? - regex

I want to ask how I can use egrep to extract only the month section of a date in the form of
mm/dd/yyyy at hh:mm:ss
I've tried the positive lookbehind assertion but it didn't seem to work. The context of this code is: I'm looking at multiple files and gathering the dates from each of the files into timestamp.txt. In the original files, all the dates are located after "TimeStamp: " (note the space after the colon).
I'm not too great with regular expressions so I know I'm missing the expression to block out the / after the first two digits as well. If anybody can help me with that, that would be awesome :D
egrep "(?<=TimeStamp:\s)" $CURFILE | sort >> ../timestamp.txt
Thank you!

Not sure if this is possible with egrep but it is with perl
echo "TimeStamp: 12/10/2012"| perl -n -e 'print $1 if m#: (..)/#'
Here's another way of doing this
echo "TimeStamp: 12/10/2012"| awk -F/ '{print $1}' | awk '{print $2}'

echo "TimeStamp: 12/10/2012"| grep TimeStamp: | cut -d ' ' -f 2 |cut -d '/' -f 1
The code is untested and might have argument escaping problems.
The idea is to first split the input by space (or colon) and then by slash. If there are more spaces in the line you might need to manipulate -f values or add more splits.
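The same extraction can also be done in a single sed call. This is only a minimal sketch: it assumes each matching line holds one "TimeStamp: mm/dd/yyyy" entry with a two-digit month, and extract_month is a hypothetical helper name that does not come from the question.

```shell
#!/bin/sh
# Hypothetical helper: read lines on stdin and print the two-digit month
# that follows "TimeStamp: " on each line that has one.
extract_month() {
    # -n suppresses the default output; the trailing p prints only the
    # lines where the substitution actually matched.
    sed -n 's/.*TimeStamp: \([0-9][0-9]\)\/.*/\1/p'
}

printf '%s\n' 'TimeStamp: 12/10/2012 at 10:11:12' | extract_month
```

Lines without a TimeStamp entry are skipped, so the helper can be fed whole files, for example extract_month < "$CURFILE" | sort >> ../timestamp.txt.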

Related

sed regex cut string after match

I tested a regex on http://regexr.com/ and it works like expected.
How can I run this by using sed?
/^.*?OU=([^,]*)/g
The test string looks like:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local
And the output is:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
So it should cut the string before the second OU= starts.
Thanks
sed is not the best tool for this case when you have to deal with text that contains "columns" and can be split. Here are two possibilities, one with sed and the other with awk:
s="mario.test;Mario Test;Mario;Test;123;+001122334455,CN=Mario Test,OU=AT-Linz,OU=Tese Sites,DC=Test,DC=local;test.local"
echo $s | sed 's/OU=/й/' | sed 's/\([^й]*\)й\([^,]*\).*/\1OU=\2/'
echo $s | awk -F",OU=" '{print $1 ",OU=" $2}'
The awk solution splits with ,OU= substring and then joins the first and second column with the separator (since it is hardcoded, it is easy to put it back).
sed uses 2 passes: 1) add an otherwise unused character to mark the border of our match (ideally a control character; a Cyrillic letter is used here for better "visibility"), 2) match everything we do not need, and match and capture what we need to keep with the help of capturing groups and backreferences.
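If only the text up to the end of the first OU= value has to survive, a single substitution may be enough. A minimal sketch, assuming the first OU= value is followed by a comma; keep_first_ou is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: keep everything up to the end of the first OU= value.
keep_first_ou() {
    # sed picks the leftmost match, so "OU=" here is the first one on the line.
    # [^,]* captures its value; the comma and everything after it is dropped.
    sed 's/\(OU=[^,]*\),.*/\1/'
}

s='mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local'
printf '%s\n' "$s" | keep_first_ou
```

Because sed has no lazy quantifier, the pattern starts at the literal OU= instead of using ^.*? as in the regexr expression.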
Your question isn't clear but from reading your comments, are either of these what you're looking for?
$ awk -F, '{print $1 FS $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
$ awk -F'CN=[^,]+,OU=|,' '{print $1 $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;AT-Test

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is most elegant and simple method to achieve this; sed, awk or just unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
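The zero-occurrence case from the question is covered as well, because cut passes lines that contain no delimiter through unchanged (unless -s is given). A small sketch to check the edge cases; first_two_fields is a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper around the cut command shown above.
first_two_fields() {
    cut -d '-' -f 1,2
}

# Four, one, and zero dashes respectively.
printf '%s\n' 'After-u-math-how-however' 'After-u' 'After' | first_two_fields
```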
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
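The same logic can be generalized to "keep the first n fields". This is a sketch under the assumption that the delimiter is a single dash; keep_fields and the awk variable n are illustrative names, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper: keep_fields N prints the first N dash-separated
# fields of every input line, rejoined with the same separator.
keep_fields() {
    awk -F '-' -v n="$1" '{
        out = $1
        # Append fields 2..n, but never more fields than the line has.
        for (i = 2; i <= n && i <= NF; i++) out = out FS $i
        print out
    }'
}

printf '%s\n' 'After-u-math-how-however' 'After' | keep_fields 2
```

The i <= NF guard is what keeps a line without any dash from gaining a trailing separator.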
This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After
This will do it in awk:
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'

how to extract substring and numbers only using grep/sed

I have a text file containing both text and numbers, I want to use grep to extract only the numbers I need for example, given a file as follow:
miss rate 0.21
ipc 222
stalls n shdmem 112
So say I only want to extract the data for miss rate which is 0.21. How do I do it with grep or sed? Plus, I need more than one number, not only the one after miss rate. That is, I may want to get both 0.21 and 112. A sample output might look like this:
0.21 222 112
Cause I need the data for later plot.
If you really want to use only grep for this, then you can try:
grep "miss rate" file | grep -oe '\([0-9.]*\)'
It will first find the line that matches, and then only output the digits.
Sed might be a bit more readable, though:
sed -n 's#miss rate ##p' file
Use awk instead:
awk '/^miss rate/ { print $3 }' yourfile
To do it with just grep, you need non-standard extensions like here with GNU grep using PCRE (-P) with positive lookbehind (?<=..) and match only (-o):
grep -Po '(?<=miss rate ).*' yourfile
Using the special look around regex trick \K with pcre engine with grep :
grep -oP 'miss rate \K.*' file.txt
or with perl :
perl -lne 'print $& if /miss rate \K.*/' file.txt
The grep-and-cut solution would look like:
to get the 3rd field for every successful grep use:
grep "^miss rate " yourfile | cut -d ' ' -f 3
or to get the 3rd field and the rest use:
grep "^miss rate " yourfile | cut -d ' ' -f 3-
Or if you use bash and "miss rate" only occurs once in your file you can also just do:
a=( $(grep -m 1 "miss rate" yourfile) )
echo ${a[2]}
where ${a[2]} is your result.
If "miss rate" occurs more than once you can loop over the grep output reading only what you need. (in bash)
You can use:
grep -P "miss rate \d+(\.\d+)?" file.txt
or:
grep -E "miss rate [0-9]+(\.[0-9]+)?"
Both of those commands will print out miss rate 0.21. If you want to extract the number only, why not use Perl, Sed or Awk?
If you really want to avoid those, maybe this will work?
grep -E "miss rate [0-9]+(\.[0-9]+)?" g | xargs basename | tail -n 1
I believe
sed 's|[^0-9]*\([0-9\.]*\)|\1 |g' filename
will do the trick. However every entry will be on its own line if that is ok. I am sure there is a way for sed to produce a comma or space delimited list but I am not a super master of all things sed.
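For the space-delimited list the question asks for, awk is one option. A minimal sketch, assuming the wanted number is always the last whitespace-separated field of each line; join_last_fields is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the last field of every input line,
# joined by single spaces on one output line.
join_last_fields() {
    # sep is empty for the first line and a space afterwards,
    # so no leading or trailing space is produced.
    awk '{ printf "%s%s", sep, $NF; sep = " " } END { print "" }'
}

printf '%s\n' 'miss rate 0.21' 'ipc 222' 'stalls n shdmem 112' | join_last_fields
```

With the sample file from the question this prints 0.21 222 112 on one line.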

Regular expression to extract a percentage

I have strings like the following: blabla a13724bla-bla244 35%
Notice that there is always a space before the percentage. I would like to extract the percentage number (so, without the %) from these strings using the Linux shell.
Assuming you have GNU grep:
$ grep -oP '\d+(?=%)' <<< "blabla a13724bla-bla244 35%"
35
Using sed:
echo blabla a13724bla-bla244 35% | sed 's/.*[ \t][ \t]*\([0-9][0-9]*\)%.*/\1/'
If you expect to have multiple percentages in a line then:
echo blabla 20% a13724bla-bla244 35% | \
sed -e 's/[^%0-9 ]*//g;s/  */\n/g' | sed -n '/%/p'
You can try this
echo "blabla a13724bla-bla244 35%" | cut -d' ' -f3 | sed 's/\%//g'
NOTE: Assumption is the input is always in this format and percentage is 3rd token separated by space.
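If the percentage is always the last space-separated token rather than the third, plain shell parameter expansion avoids the external commands. A minimal sketch under that assumption; percent_of is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the number in front of a trailing "%",
# assuming the percentage is the last space-separated token of $1.
percent_of() {
    last=${1##* }      # drop everything up to the last space
    num=${last%"%"}    # drop the trailing percent sign
    printf '%s\n' "$num"
}

percent_of 'blabla a13724bla-bla244 35%'
```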
You may try this regular expression:
/\s(\d+%)/
Use this regular expression:
\s(\d{1,3})%
If you need it in shell, you can use sed or this perl one-liner:
echo "blah 35%" | perl -pe 's/.*\s(\d{1,3})%/$1/'
35
If you always have a number of continuous columns maybe you should try with awk instead of a regular expression.
cat file.txt |awk '{print $3}' |cut -d "%" -f 1
With this code you obtain the third column.

bash script regex matching

In my bash script, I have an array of filenames like
files=( "site_hello.xml" "site_test.xml" "site_live.xml" )
I need to extract the characters between the underscore and the .xml extension so that I can loop through them for use in a function.
If this were python, I might use something like
re.match("site_(.*)\.xml")
and then extract the first matched group.
Unfortunately this project needs to be in bash, so -- How can I do this kind of thing in a bash script? I'm not very good with grep or sed or awk.
Something like the following should work
files2=("${files[@]#site_}") #Strip the leading site_ from each element
files3=("${files2[@]%.xml}") #Strip the trailing .xml
EDIT: After correcting those two typos, it does seem to work :)
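The same two expansions also work on one name at a time, which fits the loop the question describes. A minimal sketch in plain sh; strip_site_name is a hypothetical helper name, and the file names are the ones from the question:

```shell
#!/bin/sh
# Hypothetical helper: print the part of "site_NAME.xml" between
# the "site_" prefix and the ".xml" suffix.
strip_site_name() {
    name=${1#site_}     # remove the leading "site_"
    name=${name%.xml}   # remove the trailing ".xml"
    printf '%s\n' "$name"
}

for f in site_hello.xml site_test.xml site_live.xml; do
    strip_site_name "$f"
done
```

In a bash script the loop header could instead iterate over the array with "${files[@]}"; no external process is started either way.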
xbraer@NO01601 ~
$ VAR=`echo "site_hello.xml" | sed -e 's/.*_\(.*\)\.xml/\1/g'`
xbraer@NO01601 ~
$ echo $VAR
hello
xbraer@NO01601 ~
$
Does this answer your question?
Just run the variables through sed in backticks (``)
I don't remember the array syntax in bash, but I guess you know that well enough yourself, if you're programming bash ;)
If it's unclear, don't hesitate to ask again. :)
I'd use cut to split the string.
for i in site_hello.xml site_test.xml site_live.xml; do echo $i | cut -d'.' -f1 | cut -d'_' -f2; done
This can also be done in awk:
for i in site_hello.xml site_test.xml site_live.xml; do echo $i | awk -F'.' '{print $1}' | awk -F'_' '{print $2}'; done
If you're using arrays, you probably should not be using bash.
A more appropriate example would be
ls site_*.xml | sed 's/^site_//' | sed 's/\.xml$//'
This produces output consisting of the parts you wanted. Backtick or redirect as needed.