Make reference to a file in a regular expression - regex

I have two files. One is a SALESORDERLIST, which goes like this
ProductID;ProductDesc
1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.
4,bottles of beer 40 gal
(ProductID;ProductDesc) header is actually not in the file, so disregard it.
In another file, POSSIBLEUNITS, I have -you guessed- the possible units, and their equivalencies:
u;u.;un;un.;unit
k;k.;kg;kg.,kilograms
This is my first day with regular expressions and I would like to know how can I get the entries in SALESORDERLIST, whose units appear in POSSIBLEUNITS. In my example, I would like to exclude entry 4 since 'gal' is not listed in POSSIBLEUNITS file.
I say regex, since I have a further criteria that needs to be matched:
egrep "^[0-9]+;{1}[^; ][a-zA-Z ]+" SALESORDERLIST
From those resultant entries, I want to get those ending in valid units.
Thanks!

One way of achieving what you want is:
cat SALESORDERLIST | egrep "\b(u|u\.|un|un\.|unit|k|k\.|kg|kg\.|kilograms)\b"
1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.
The metacharacter \b is an anchor that allows you to perform a "whole words only" search using
a regular expression in the form of \bword\b.
http://www.regular-expressions.info/wordboundaries.html

One way would be to create a bash script, say called findunit.sh:
while read line
do
match=$(egrep -E "^[0-9]+,{1}[^, ][a-zA-Z ]+" <<< $line)
name=${match##* }
# echo "$name..."
found=$(egrep "$name" /pathtofile/units.txt)
# echo "xxx$found"
[ -n "$found" ] && echo $line
done < $1
Then run with:
findunit.sh SALESORDERLIST
My output from this is:
1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.

An example of doing it completely in bash:
declare -A units
while read line; do
while [ -n "$line" ]; do
i=`expr index $line ";"`
if [[ $i == 0 ]]; then
units[$line]=1
break
fi
units[${line:0:$((i-1))}]=1
line=${line#*;}
done
done < POSSIBLEUNITS
while read line; do
unit=${line##* }
if [[ ${units[$unit]} == 1 ]]; then
echo $line
fi
done < SALESORDERLIST

Related

Bash regex overwrite line if multiple match

I have a bash script where I have 3 regular expressions. I would like to, through conditional if, to find the match of the first pattern in the file.
If there is a match, then look for a match in the second pattern but only with the lines that have matched the first pattern.
Finally, to check the third pattern only with the lines that have matched the second pattern (which are also the ones that had already matched the first pattern).
I have the following code but I don't know how to tell that if there is a match to overwrite the "line" value to decrease the number of total lines to only the ones matching.
#!/bin/bash
pattern1= egrep '^([^,]*,){31}[1-9][0-9].*'
pattern2= egrep '^([^,]*,){16}[0-1].[3-9].*'
pattern3= egrep '^([^,]*,){32}[2-9][0-9].*'
while read line
do
if [[$line == $pattern1]];then
newline == $pattern1
if [[$newline == $pattern2 ]];then
newline2 == $pattern2
if [[$newline2 == $pattern3 ]]; then
echo $pattern3
fi
done < mj1.csv #this is the input file
I will call this script like ./b1.sh <filename>.
Some input data:
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1,3,2,5,5,2,1,3,4,21,19.4
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1,2,2,4,5,3,1,6,5,25,14.7
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.75,3,2,5,5,1,1,2,4,17,13.2
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
1985,8,8,11/10/1984,21,267,21.73100616,CHI,0,IND,1,2,1,42,9,22,0.409,0,0,,9,12,0.75,2,7,9,4,2,5,3,4,27,21.2
1985,9,9,11/13/1984,21,270,21.73921971,CHI,1,SAS,1,3,1,43,18,27,0.667,1,1,1,8,11,0.727,2,8,10,4,3,2,4,4,45,37.5
1985,10,10,11/15/1984,21,272,21.74469541,CHI,1,BOS,0,-20,1,33,12,24,0.5,0,1,0,3,3,1,0,2,2,2,2,1,1,4,27,17.1
1985,11,11,11/17/1984,21,274,21.75017112,CHI,1,PHI,0,-9,1,44,4,17,0.235,0,0,,8,8,1,0,5,5,7,5,2,4,5,16,12.5
1985,12,12,11/19/1984,21,276,21.75564682,CHI,1,IND,0,-17,1,39,11,26,0.423,0,3,0,12,16,0.75,2,3,5,2,2,1,3,3,34,20.8
1985,13,13,11/21/1984,21,278,21.76112252,CHI,0,MIL,0,-10,1,42,11,22,0.5,0,0,,13,14,0.929,4,9,13,2,2,2,6,3,35,26.7
1985,14,14,11/23/1984,21,280,21.76659822,CHI,0,SEA,1,19,1,30,9,13,0.692,0,0,,5,6,0.833,0,4,4,3,4,1,4,4,23,19.5
1985,15,15,11/24/1984,21,281,21.76933607,CHI,0,POR,0,-10,1,41,10,24,0.417,0,1,0,10,10,1,3,3,6,8,3,1,4,4,30,23.9
1985,16,16,11/27/1984,21,284,21.77754962,CHI,0,GSW,0,-6,1,24,6,10,0.6,0,0,,1,1,1,0,2,2,3,3,2,4,1,13,11.1
1985,17,17,11/29/1984,21,286,21.78302533,CHI,0,PHO,0,-5,1,30,9,17,0.529,1,1,1,3,4,0.75,1,2,3,2,2,0,2,5,22,14
1985,18,18,11/30/1984,21,287,21.78576318,CHI,0,LAC,1,4,1,37,9,15,0.6,0,0,,2,4,0.5,2,3,5,5,3,0,4,4,20,15.5
1985,19,19,12/2/1984,21,289,21.79123888,CHI,0,LAL,1,1,1,42,7,13,0.538,0,0,,6,8,0.75,2,0,2,3,1,1,4,3,20,12.9
1985,20,20,12/4/1984,21,291,21.79671458,CHI,1,NJN,1,15,1,35,7,13,0.538,0,0,,6,6,1,1,2,3,6,1,0,3,3,20,16
1985,21,21,12/7/1984,21,294,21.80492813,CHI,1,NYK,1,2,1,43,8,16,0.5,0,1,0,5,7,0.714,1,1,2,3,2,0,6,5,21,9.3
1985,22,22,12/8/1984,21,295,21.80766598,CHI,1,DAL,1,2,1,35,10,23,0.435,0,0,,0,0,,4,3,7,2,0,2,2,3,20,11.2
1985,23,23,12/11/1984,21,298,21.81587953,CHI,1,DET,0,-7,1,37,13,28,0.464,0,1,0,1,3,0.333,1,7,8,6,2,0,3,4,27,16.2
1985,24,24,12/12/1984,21,299,21.81861739,CHI,0,DET,0,-7,1,30,6,17,0.353,0,2,0,9,10,0.9,0,1,1,2,2,1,1,5,21,12.5
1985,25,25,12/14/1984,21,301,21.82409309,CHI,0,NJN,0,-2,1,44,12,25,0.48,0,0,,10,10,1,2,6,8,8,1,0,0,4,34,29.5
1985,26,26,12/15/1984,21,302,21.82683094,CHI,1,PHI,0,-12,1,27,7,16,0.438,0,0,,0,0,,1,1,2,2,1,0,1,2,14,7.2
1985,27,27,12/18/1984,21,305,21.83504449,CHI,1,HOU,0,-8,1,45,8,20,0.4,0,1,0,2,4,0.5,1,2,3,8,3,0,1,2,18,14.5
1985,28,28,12/20/1984,21,307,21.84052019,CHI,0,ATL,1,3,1,41,12,22,0.545,0,0,,10,16,0.625,4,4,8,7,5,1,7,5,34,26.6
To make things easier, pattern1 matches all rows where column PTS is higher than 10, pattern 2 matches the rows where column FG_PCT is higher than 0.3, and pattern 3 matches all rows where column GmSc is higher than 19.
While an awk solution is going to be a bit faster ... we'll focus on a bash solution per OP's request.
First issue is regex matching uses the =~ operator and not the == operator.
Second issue is that to keep a row if only all 3 regexes match means we want to and (&&) the results of all 3 regex matches.
Third issue addresses some basic syntax issues with OP's current code (eg, space after [[ and before ]]; improper assignments of regex patterns to the pattern* variables).
One bash idea:
pattern1='^([^,]*,){31}[1-9][0-9].*'
pattern2='^([^,]*,){16}[0-1].[3-9].*'
pattern3='^([^,]*,){32}[2-9][0-9].*'
head -1 mj1.csv > mj1.new.csv
while read -r line
do
if [[ "${line}" =~ $pattern1 && "${line}" =~ $pattern2 && "${line}" =~ $pattern3 ]]
then
# do whatever with $line, eg:
echo "${line}"
fi
done < mj1.csv >> mj1.new.csv
This generates:
$ cat mj1.new.csv
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
1985,8,8,11/10/1984,21,267,21.73100616,CHI,0,IND,1,2,1,42,9,22,0.409,0,0,,9,12,0.75,2,7,9,4,2,5,3,4,27,21.2
1985,9,9,11/13/1984,21,270,21.73921971,CHI,1,SAS,1,3,1,43,18,27,0.667,1,1,1,8,11,0.727,2,8,10,4,3,2,4,4,45,37.5
1985,12,12,11/19/1984,21,276,21.75564682,CHI,1,IND,0,-17,1,39,11,26,0.423,0,3,0,12,16,0.75,2,3,5,2,2,1,3,3,34,20.8
1985,13,13,11/21/1984,21,278,21.76112252,CHI,0,MIL,0,-10,1,42,11,22,0.5,0,0,,13,14,0.929,4,9,13,2,2,2,6,3,35,26.7
1985,15,15,11/24/1984,21,281,21.76933607,CHI,0,POR,0,-10,1,41,10,24,0.417,0,1,0,10,10,1,3,3,6,8,3,1,4,4,30,23.9
1985,25,25,12/14/1984,21,301,21.82409309,CHI,0,NJN,0,-2,1,44,12,25,0.48,0,0,,10,10,1,2,6,8,8,1,0,0,4,34,29.5
1985,28,28,12/20/1984,21,307,21.84052019,CHI,0,ATL,1,3,1,41,12,22,0.545,0,0,,10,16,0.625,4,4,8,7,5,1,7,5,34,26.6
NOTE: OP hasn't (yet) provided the expected output so at this point I have to assume OP's regexes are correct

Regular Expression to search for a number between two

I am not very familiar with Regular Expressions.
I have a requirement to extract all lines that match an 8 digit number between any two given numbers (for example 20200628 and 20200630) using regular expression. The boundary numbers are not fixed, but need to be parameterized.
In case you are wondering, this number is a timestamp, and I am trying to extract information between two dates.
HHHHH,E.164,20200626113247
HHHHH,E.164,20200627070835
HHHHH,E.164,20200628125855
HHHHH,E.164,20200629053139
HHHHH,E.164,20200630125855
HHHHH,E.164,20200630125856
HHHHH,E.164,20200626122856
HHHHH,E.164,20200627041046
HHHHH,E.164,20200628125856
HHHHH,E.164,20200630115849
HHHHH,E.164,20200629204531
HHHHH,E.164,20200630125857
HHHHH,E.164,20200630125857
HHHHH,E.164,20200626083628
HHHHH,E.164,20200627070439
HHHHH,E.164,20200627125857
HHHHH,E.164,20200628231003
HHHHH,E.164,20200629122857
HHHHH,E.164,20200630122237
HHHHH,E.164,20200630122351
HHHHH,E.164,20200630122858
HHHHH,E.164,20200630122857
HHHHH,E.164,20200630084722
Assuming the above data is stored in a file named data.txt, the idea is to sort it on the 3rd column delimited by the comma (i.e. sort -nk3), and then pass the sorted output through this perl filter, as demonstrated by this find_dates.sh script:
#!/bin/bash
[ $# -ne 3 ] && echo "Expects 3 args: YYYYmmdd start, YYYYmmdd end, and data filename" && exit
DATE1=$1
DATE2=$2
FILE=$3
echo "$DATE1" | perl -ne 'exit 1 unless /^\d{8}$/'
[ $? -ne 0 ] && echo "ERROR: First date is invalid - $DATE1" && exit
echo "$DATE2" | perl -ne 'exit 1 unless /^\d{8}$/'
[ $? -ne 0 ] && echo "ERROR: Second date is invalid - $DATE2" && exit
[ ! -r "$FILE" ] && echo "ERROR: File not found - $FILE" && exit
cat $FILE | sort -t, -nk3 | perl -ne '
BEGIN { $date1 = shift; $date2 = shift }
print if /164,$date1/ .. /164,$date2/;
print if /164,$date2/;
' $DATE1 $DATE2 | sort -u
Running the command find_dates.sh 20200627 20200629 data.txt will produce the result:
HHHHH,E.164,20200627041046
HHHHH,E.164,20200627070439
HHHHH,E.164,20200627070835
HHHHH,E.164,20200627125857
HHHHH,E.164,20200628125855
HHHHH,E.164,20200628125856
HHHHH,E.164,20200628231003
HHHHH,E.164,20200629053139
HHHHH,E.164,20200629122857
HHHHH,E.164,20200629204531
For the example you gave, between 20200628 and 20200630, you may try:
\b202006(?:2[89]|30)
Demo
I might be tempted to make the general comment that regex is not very suitable for finding numerical ranges (whereas application programming languages are). However, in the case of parsing a text log file, regex is what would be easily available.

Using regex in Bash with mapfile

Edit 2:
Minimal input file: input/input.txt
#-----------
snapshot=83
#-----------
time=30142088
mem_heap_B=20224
mem_heap_extra_B=8
mem_stacks_B=240480
heap_tree=empty
#-----------
snapshot=84
#-----------
time=30408368
mem_heap_B=20224
mem_heap_extra_B=8
mem_stacks_B=240552
heap_tree=empty
#-----------
snapshot=85
#-----------
time=30674648
mem_heap_B=20224
mem_heap_extra_B=8
mem_stacks_B=240464
heap_tree=empty
#-----------
snapshot=86
#-----------
Actual output:
input.txt/*
time, heap, stack
input/input.txt
time, heap, stack
30674648, 20224, 240464
input/input.txt
time, heap, stack
input/input.txt
time, heap, stack
input/input.txt
time, heap, stack
30674648, 20224, 240464
Expected output:
input.txt
time, heap, stack,
30142088, 20224, 240480
30408368, 20224, 240552
30674648, 20224, 240464
Edit: Originally, the problem may have been due to Bash's regex's lack of multiline capability. However, after stripping newlines from the text, the problem remains, with the exception that the output now has between one to five lines instead of zero.
I'm trying to write a Bash script to parse a text file into a desirable CSV file with the needed information.
As part of the script, I iterate through n files. Each of the files contains m matches for a given regex, and each match contains three capture groups.
I want to format the three capture groups into a CSV row, then concatenate all the rows of all the matches of all the files and write them to a *.csv file.
I'm quite comfortable using Regex in high level languages such as Kotlin or C#, however I have no experience with Regex in Bash. I used this answer as a starting point, however it doesn't seem to be working for me (mapfile -t matches < <( format_row "$text" "$regex" ) doesn't do anything.
Here's the full code with the relevant portion noted:
#!/bin/bash
# RELEVANT CODE BELOW
regex="time=([0-9]+)\nmem_heap_B=([0-9]+)\n.*\nmem_stacks_B=([0-9]+)"
format_row() {
local s=$1 regex=$2
while [[ $s =~ $regex ]]
do
time="${BASH_REMATCH[1]}"
heap="${BASH_REMATCH[2]}"
stack="${BASH_REMATCH[3]}"
echo "${time}, ${heap}, ${stack}"
echo ""
s=${s#*"${BASH_REMATCH[3]}"}
done
}
for file in $1/*
do
echo "Parsing ${file}..."
echo $file >> $2
echo "time, heap, stack" >> $2
text=$(<${file})
mapfile -t matches < <( format_row "$text" "$regex" )
printf "%s\n" "${matches[#]}" >> $2
echo "" >> $2
done
echo ""
echo "Done"
Thanks!
There are two problems here:
Although bash's =~ operator can match newlines, it does not understand the \n escape sequence. You have to use actual newlines in your regex. This can also be achieved by C-style strings $'\n'.
The regex quantifier * is greedy. When matching ...
[[ "a=1,b=1 a=2,b=2 a=3,b=3" =~ a=(.).*b=(.) ]]
... you end up with BATCH_REMATCH=(1 3) instead of (1 1).
In other regex dialects like PCRE you could use the non-greedy quantifier *?. However, in bash we have to use a workaround and have to replace .* with something that cannot match more than wanted, for instance
[[ "a=1,b=1 a=2,b=2 a=3,b=3" =~ a=(.)[^=]*b=(.) ]]
In your case we have to make sure that the next mem_stacks is not matched
As you didn't post any example input and expected output, I can only guess. However, I assume the following regex could work for you:
regex=$'time=([0-9]+)
mem_heap_B=([0-9]+)
([^\n]*\n){TODO set number of lines allowed here}
mem_stacks_B=([0-9]+)'
Note that now you have to use BASH_REMATCH[4] instead of [3].
At the marked location you have to insert the numbers of lines allowed between mem_heap and mem_stacks. The number can be constant (e.g. {5}) or a range (e.g. {1,10}). In case of ranges you have to make sure that the maximum bound is not so high that you could accidentally skip the next mem_stacks and match another mem_stacks instead. Thus, in case of ranges it might be more appropriate to use two matches. Something like
regex1='time=([0-9]+)
mem_heap_B=([0-9]+)'
regex2='mem_stacks_B=([0-9]+)'
while
[[ "$s" =~ $regex1 ]] &&
time="${BASH_REMATCH[1]}" &&
heap="${BASH_REMATCH[2]}" &&
[[ "$s" =~ $regex2 ]] &&
stack="${BASH_REMATCH[1]}"
do
echo "$time, $heap, $stack"
s="${s#*$stack}"
done >> "$2"
By the way:
https://www.shellcheck.net/ helps you to make your script more robust.
First and foremost: quote your variables.
You can use do cmd1; cmd2 done >> file instead of do cmd1 >> file; cmd2 >> file; done.
mapfile -t matches < <(format_row "$text" "$regex")
printf "%s\n" "${matches[#]}" >> "$2"
could be written as just
format_row "$text" "$regex" >> "$2"

Regular expression Bash issue

I have to match a string composed of only lowercase characters repeated 2 times , for example ballball or printprint. For example the word ball is not accepted because is not repeated 2 time.
For this reason I have this code:
read input
expr='^(([a-z]*){2})$'
if [[ $input =~ $expr ]]; then
echo "OK string"
exit 0
fi
exit 10
but it doesn't work , for example if I insert ball the script prints "OK string".
What do I wrong?
Not all Bash versions support backreferences in regexes natively. If yours doesn't, you can use an external tool such as grep:
read input
re='^\([a-z]\+\)\1$'
if grep -q "$re" <<< "$input"; then
echo "OK string"
exit 0
fi
exit 1
grep -q is silent and has a successful exit status if there was a match. Notice how (, + and ) have to be escaped for grep. (grep -E would understand () without escaping.)
Also, I've replaced your * with + so we don't match the empty string.
Alternatively: your requirement means that a matching string has two identical halves, so we can check for just that, without any regexes:
read input
half=$(( ${#input} / 2 ))
if (( half > 0 )) && [[ ${input:0:$half} = ${input:$half} ]]; then
echo "OK string"
fi
This uses Substring Expansion; the first check is to make sure that the empty string doesn't match.
Your requirement is to match strings made of two repeated words. This is easy to do by just checking if the first half of your string is equal to the remaining part. No need to use regexps...
$ var="byebye" && len=$((${#var}/2))
$ test ${var:0:$len} = ${var:$len} && { echo ok ; } || echo no
ok
$ var="abcdef" && len=$((${#var}/2))
$ test ${var:0:$len} = ${var:$len} && { echo ok ; } || echo no
no
The regex [a-z]* will match any alphanumeric or empty string.
([a-z]*){2} will match any two of those.
Ergo, ^(([a-z]*){2})$ will match any string containing zero or more alphanumeric characters.
Using the suggestion from #hwnd (replacing {2} with \1) will enforce a match on two identical strings.
N.B: You will need a fairly recent version of bash. Tested in bash 4.3.11.

sed regex to match ['', 'WR' or 'RN'] + 2-4 digits

I'm trying to do some conditional text processing on Unix and struggling with the syntax. I want to acheive
Find the first 2, 3 or 4 digits in the string
if 2 characters before the found digits are 'WR' (could also be lower case)
Variable = the string we've found (e.g. WR1234)
Type = "work request"
else
if 2 characters before the found digits are 'RN' (could also be lower case)
Variable = the string we've found (e.g. RN1234)
Type = "release note"
else
Variable = "WR" + the string we've found (Prepend 'WR' to the digits)
Type = "Work request"
fi
fi
I'm doing this in a Bash shell on Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Thanks in advance,
Karl
I'm not sure how you read in your strings but this example should help you get there. I loop over 4 example strings, WR1234 RN456 7890 PQ2342. You didn't say what to do if the string doesn't match your expected format (PQ2342 in my example), so my code just ignores it.
#!/bin/bash
for string in "WR1234 - Work Request Name.doc" "RN5678 - Release Note.doc"; do
[[ $string =~ ^([^0-9]*)([0-9]*).*$ ]]
case ${BASH_REMATCH[1]} in
"WR")
var="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
type="work request"
echo -e "$var\t-- $type"
;;
"RN")
var="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
type="release note"
echo -e "$var\t-- $type"
;;
"")
var="WR${BASH_REMATCH[2]}"
type="work request"
echo -e "$var\t-- $type"
;;
esac
done
Output
$ ./rematch.sh
WR1234 -- work request
RN5678 -- release note
I like to use perl -pe instead of sed because PERL has such expressive regular expressions. The following is a bit verbose for the sake of instruction.
example.txt:
WR1234 - Work Request name.doc
RN456
rn456
WR7890 - Something else.doc
wr789
2456
script.sh:
#! /bin/bash
# search for 'WR' or 'RN' followed by 2-4 digits and anything else, but capture
# just the part we care about
records="`perl -pe 's/^((WR|RN)([\d]{2,4})).*/\1/i' example.txt`"
# now that you've filtered out the records, you can do something like replace
# WR's with 'work request'
work_requests="`echo \"$records\" | perl -pe 's/wr/work request /ig' | perl -pe 's/rn/release note /ig'`"
# or add 'WR' to lines w/o a listing
work_requests="`echo \"$work_requests\" | perl -pe 's/^(\d)/work request \1/'`"
# or make all of them uppercase
records_upper=`echo $records | tr '[:lower:]' '[:upper:]'`
# or count WR's
wr_count=`echo "$records" | grep -i wr | wc -l`
echo count $wr_count
echo "$work_requests"
#!/bin/bash
string="RN12344 - Work Request Name.doc"
echo "$string" | gawk --re-interval '
{
if(match ($0,/(..)[0-9]{4}\>/,a ) ){
if (a[1]=="WR"){
type="Work release"
}else if ( a[1] == "RN" ){
type = "Release Notes"
}
print type
}
}'