Bash: how to extract a number from a string? (regular expression maybe)

I want to get the number of characters in a file.
wc -c f1.txt | grep [0-9]
But this prints the whole line in which grep found digits. I want to return only 38. How?

You can use awk:
wc -c f1.txt | awk '{print $1}'
Or using grep -o:
wc -c f1.txt | grep -o "[0-9]\+"
Or using bash regex capabilities:
re="^ *([0-9]+)" && [[ "$(wc -c f1.txt)" =~ $re ]] && echo "${BASH_REMATCH[1]}"

Pass the data to wc on stdin instead of naming the file, so that wc prints only the count: nchars=$(wc -c < f1.txt)
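A minimal sketch of the stdin approach, wrapped in a function. The name count_bytes and the temporary file are made up for illustration. Some wc implementations pad the number with spaces, so the sketch normalizes it with arithmetic expansion; whether your wc pads is an assumption to check.

```shell
#!/bin/bash
# count_bytes FILE: print only the byte count of FILE (hypothetical helper).
count_bytes() {
    local n
    # Reading from stdin keeps the file name out of wc's output.
    n=$(wc -c < "$1") || return 1
    # Arithmetic expansion drops any padding spaces around the number.
    echo "$((n))"
}

tmp=$(mktemp)
printf 'hello world\n' > "$tmp"
count_bytes "$tmp"    # prints 12 (11 characters plus the newline)
rm -f "$tmp"
```

Note that wc -c counts bytes; for multibyte text, wc -m counts characters instead.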

Related

Sed command to search by regex in file

I need to get the version number from a file. My version file looks like this:
#define MINOR_VERSION_NUMBER 1
I tried this sed command:
VERSION_MINOR=`sed -i -e 'MINOR_VERSION_NUMBER\s+\([0-9]+\).*/\1/p' $WORKSPACE/project/common/version.h`
but I get an error:
sed: -e expression #1, char 2: extra characters after command
The "address" that selects matching lines needs to be enclosed in /.../ (or \X...X for any X).
sed -ne '/MINOR_VERSION_NUMBER/{ s/[^0-9]*\([0-9][0-9]*\).*/\1/; p; }'
Don't use -i, it changes the file in place and doesn't output anything.
The more common way would be to use awk to find the line and extract the wanted column:
awk '(/MINOR_VERSION_NUMBER/){print$3}'
Using grep:
grep MINOR_VERSION_NUMBER | grep -o '[0-9]*$'
Demo:
$ echo "#define MINOR_VERSION_NUMBER 1" | grep -o '[0-9]*$'
1
$ echo "#define MINOR_VERSION_NUMBER 1123" | grep -o '[0-9]*$'
1123
Here is a correction of your attempt. Change your line:
VERSION_MINOR=`sed -i -e 'MINOR_VERSION_NUMBER\s+\([0-9]+\).*/\1/p' $WORKSPACE/project/common/version.h`
into:
VERSION_MINOR=`sed -n -e '/^#define\s\+MINOR_VERSION_NUMBER\s\+\([0-9]\+\).*/ s//\1/p' $WORKSPACE/project/common/version.h`
This can be made more readable with GNU sed's -r option:
VERSION_MINOR=`sed -n -r -e '/^#define\s+MINOR_VERSION_NUMBER\s+([0-9]+).*/ s//\1/p' $WORKSPACE/project/common/version.h`
As stated by choroba, awk would be more suited than sed for this kind of processing (see his answer).
However, here is another solution using bash's read builtin, together with GNU grep:
read x x VERSION_MINOR x < <(grep -F -w -m1 MINOR_VERSION_NUMBER $WORKSPACE/project/common/version.h)
VERSION_MINOR=$(echo "#define MINOR_VERSION_NUMBER 1" | tr -s ' ' | cut -d' ' -f3)
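The awk suggestion above can be sketched as a small reusable function. The name get_define and the sample header are hypothetical. The sketch assumes the line has exactly the form "#define NAME value", with no space between # and define, and compares the macro name exactly so that a name that merely contains MINOR_VERSION_NUMBER is not picked up.

```shell
#!/bin/bash
# get_define FILE NAME: print the value from the first "#define NAME value" line
# (hypothetical helper). Returns non-zero if no such line exists.
get_define() {
    awk -v name="$2" '
        $1 == "#define" && $2 == name { print $3; found = 1; exit }
        END { exit !found }
    ' "$1"
}

header=$(mktemp)
printf '%s\n' '#define MAJOR_VERSION_NUMBER 2' \
              '#define MINOR_VERSION_NUMBER 17' > "$header"
VERSION_MINOR=$(get_define "$header" MINOR_VERSION_NUMBER)
echo "$VERSION_MINOR"    # prints 17
rm -f "$header"
```

Because nothing is written back to the file, there is no need for anything like sed's -i here.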

Pulling Single digit out of string --bash

Example
./test.sh R19
echo "$1" > test.txt
cat test.txt | grep -o ^[A-Z] > model.txt
cat test.txt | grep -o [0-9] > num1.txt
cat test.txt | grep -o [0-9]$ > num2.txt
echo "$(cat model.txt)00$(cat num1.txt)00$(cat num2.txt)"
I'm expecting to see R001009, but what I get is
R001
9009
So how can I make num1.txt receive only the middle digit and not both?
That's because grep -o '[0-9]' is returning all the digits on separate lines.
The painful way would be cat test.txt | grep -o [0-9] | head -1 > num1.txt
But don't do that: you're doing way too much file I/O. Use a regex in bash:
if [[ $1 =~ ^([A-Z])([0-9])([0-9])$ ]]; then
printf "%s00%d00%d\n" "${BASH_REMATCH[@]:1}"
fi
Make sure you're using #!/bin/bash as your shebang line.
$ set -- R19
$ if [[ $1 =~ ^([A-Z])([0-9])([0-9])$ ]]; then
> printf "%s00%d00%d\n" "${BASH_REMATCH[@]:1}"
> fi
R001009
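For use inside a script, the regex approach can be wrapped in a function that also reports input it does not understand. This is only a sketch: the name format_model and the error message are made up, and the pattern assumes exactly one uppercase letter followed by two digits, as in the question.

```shell
#!/bin/bash
# format_model CODE: turn e.g. R19 into R001009 (hypothetical helper).
format_model() {
    local re='^([A-Z])([0-9])([0-9])$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match; elements 1..3 are the capture groups.
        printf '%s00%d00%d\n' "${BASH_REMATCH[@]:1}"
    else
        echo "unexpected input: $1" >&2
        return 1
    fi
}

format_model R19    # prints R001009
```

Keeping the pattern in a variable avoids quoting surprises inside [[ ... ]], and no temporary files are involved.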

How do I grep for all words that contain two consecutive e’s, and also contains two y’s

I want to find the set of words that contain two consecutive e’s, and also contains two y’s.
So far I got to /eeyy/
Alternation with ERE:
$ echo evyyree | grep -E '.*ee.*yy|.*yy.*ee'
evyyree
$ echo eveeryy | grep -E '.*ee.*yy|.*yy.*ee'
eveeryy
If the match needs to be in the same word, you can do:
$ echo "eee yyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee' # no match
$ echo "eeeyyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyyyy
Then only that word:
$ echo 'eeeyy heelo' | grep -Eo 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyy
Pipe it:
$ echo eennmmyy | grep ee | grep yy
eennmmyy
awk approach to match all words that contain both ee and yy:
s="eennmmyy heello thees-whyy someyy"
echo $s | awk '{for(i=1;i<=NF;i++) if($i~/ee/ && $i~/yy/) print $i}'
The output:
eennmmyy
thees-whyy
The only sensible and extensible way to do this is with awk:
awk '/ee/&&/yy/' file
Imagine trying to do it the grep way if you also had to find zz. Here's awk:
awk '/ee/&&/yy/&&/zz/' file
and here's grep:
grep -E 'ee.*yy.*zz|ee.*zz.*yy|yy.*ee.*zz|yy.*zz.*ee|zz.*yy.*ee|zz.*ee.*yy' file
Now add a 4th additional string to search for and see what that looks like!
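The same "all of these, in any order" idea can be sketched in pure bash at word level. The function name words_with_all is hypothetical, and the sketch treats each pattern as a fixed substring rather than a regular expression, which is a simplification compared with the awk and grep versions above.

```shell
#!/bin/bash
# words_with_all TEXT SUB1 [SUB2 ...]: print each whitespace-separated word of TEXT
# that contains every given substring (hypothetical helper).
words_with_all() {
    local text=$1 word sub ok
    local -a words
    shift
    read -ra words <<< "$text"    # split one line of text on whitespace
    for word in "${words[@]}"; do
        ok=1
        for sub in "$@"; do
            # Quoting "$sub" makes it a literal substring, not a glob pattern.
            [[ $word == *"$sub"* ]] || { ok=0; break; }
        done
        if (( ok )); then
            printf '%s\n' "$word"
        fi
    done
}

words_with_all "eennmmyy heello thees-whyy someyy" ee yy
# prints:
# eennmmyy
# thees-whyy
```

Adding a third or fourth substring is just another argument, so the call does not grow the way the grep alternation does.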

How can I extract the timestamp from the end of a shell variable when the format isn't fixed?

I'm trying to extract the timestamp from the end of a shell variable like this:
Input=AEXP_CSTONE_EU_prpbdp_sourcefile_yyyymmddhhmmss.txt
TimeStamp=`echo $Input | awk -F"_" '{print $6}'`
This works for this particular case, but the format of the string can change. For example, it could also be:
Input=AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_yyyymmddhhmmss.txt
The variable will always end with yyyymmddhhmmss.txt. How can I extract the timestamp consistently?
Given:
$ echo $Input
AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_20151116141111.txt
You can use sed:
$ echo $Input | sed -n 's|.*_\([0-9]\{14\}\)\.txt|\1|p'
20151116141111
Or nested grep:
$ echo $Input | grep -Eo '_[0-9]{14}\.txt' | grep -Eo '[0-9]{14}'
20151116141111
awk:
$ echo $Input | awk -F_ '{split($NF, a, "."); print a[1]}'
20151116141111
Perl
$ echo $Input | perl -ne 'print $1 if /_(\d{14})\.txt/'
20151116141111
cut and rev:
$ echo $Input | rev | cut -d'_' -f 1 | rev | cut -d'.' -f 1
20151116141111
Bash:
$ last=${Input##*_}
$ echo $last
20151116141111.txt
$ ts=${last%.*}
$ echo $ts
20151116141111
In summary, lots of ways...
If you don't want to lose the .txt part, it is even easier:
$ echo $Input | sed -n 's|.*_\([0-9]\{14\}\.txt\)|\1|p'
20151116141111.txt
$ echo $Input | grep -Eo '[0-9]{14}\.txt$'
20151116141111.txt
$ echo $Input | awk -F_ '{print $NF}'
20151116141111.txt
$ echo $Input | perl -ne 'print $1 if /_(\d{14}\.txt)/'
20151116141111.txt
$ echo $Input | rev | cut -d'_' -f 1 | rev
20151116141111.txt
$ last=${Input##*_}
$ echo $last
20151116141111.txt
You need to match the part that will not change then:
TimeStamp=$(echo $Input | perl -pe 's/.*(\d{14})\.txt/$1/')
You are extracting the 6th field separated by _, yet it seems you really want to extract the last field. You can do that with parameter expansion:
timestamp=${Input##*_}
timestamp=${timestamp%.txt}
See BashFAQ 100 for more on string manipulation in bash.
In awk, you'd use $NF to reference the last field, though awk is overkill for this.
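The parameter-expansion approach can be sketched as a function that also checks that what is left really is 14 digits. The name extract_timestamp is made up, and the check assumes the timestamp is always exactly 14 digits followed by .txt, as stated in the question.

```shell
#!/bin/bash
# extract_timestamp NAME: print the 14-digit timestamp at the end of NAME
# (hypothetical helper). Returns non-zero if the last field is not 14 digits.
extract_timestamp() {
    local ts=${1##*_}        # drop everything up to and including the last "_"
    local re='^[0-9]{14}$'
    ts=${ts%.txt}            # drop the trailing ".txt"
    [[ $ts =~ $re ]] || return 1
    printf '%s\n' "$ts"
}

extract_timestamp AEXP_CSTONE_EU_prpbdp_sourcefile_20151116141111.txt
extract_timestamp AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_20151116141111.txt
# both calls print 20151116141111
```

Because only the last underscore matters, the number of fields before the timestamp can change freely.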

Print a part of string regex bash

From this content (in a file):
myspecificBhost.fqdn.com myspecificaBhost.fqdn.com myspecificzBhost.fqdn.com
I need to print the next 4 characters from the "B":
Bhost
I tried:
echo ${var:position1:length}
but the position of the B is not the same in every hostname.
Using BASH regex:
s='myspecificBhost.fqdn.com myspecificaBhost.fqdn.com myspecificzBhost.fqdn.com'
[[ "$s" =~ (B[a-z][a-z][a-z][a-z]) ]] && echo "${BASH_REMATCH[1]}"
Bhost
Try this sed command (fragile: it depends on the characters around the B in the sample line):
sed -nr '/.*c(.{4,6}).*/s//\1/p' input.txt | cut -c2-6
RESULT:
Bhost
With grep command:
cat input.txt | grep -o B.... | head -1
RESULT:
Bhost
Try with:
cat file | grep -o B....
Bash using parameter substitution. Outputs the 4 characters
after the first 'B':
text='myspecificBhost.fqdn.com myspecificaBhost.fqdn.com myspecificzBhost.fqdn.com'
text=${text#*B}
text=${text:0:4}
echo "${text}"
Output:
host
To get the leading 'B' use
echo "B${text}"
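To get the match for every hostname rather than only the first, the bash regex can be applied in a loop. This is a sketch under the assumption that the wanted text is always an uppercase B followed by four lowercase letters; the function name extract_all is hypothetical.

```shell
#!/bin/bash
# extract_all TEXT: print every "B" plus four lowercase letters found in TEXT,
# one per line (hypothetical helper).
extract_all() {
    local rest=$1 re='B[a-z]{4}'
    while [[ $rest =~ $re ]]; do
        printf '%s\n' "${BASH_REMATCH[0]}"
        # Cut off everything up to and including this match, then search again.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

s='myspecificBhost.fqdn.com myspecificaBhost.fqdn.com myspecificzBhost.fqdn.com'
extract_all "$s"
# prints Bhost three times, once per hostname
```

Each pass shortens the remaining text, so the loop stops once no further match is left.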