Using html input pattern in awk


echo 8d07\'54.520\"W | awk '{ if ($1 ~ /[-+]?[0-9]*[.]?[0-9]+/) print $1; else print "erro" }'
I'm trying to check whether it's a number, but it's not working. I use this same regex in an HTML
input text field, and it works there.
In this case I was expecting "erro", but the value is printed instead.
My final goal is to apply 3 different pattern matches to 3 fields: $1, $2 and $3...

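Not 100% sure of the requirement, but you probably need anchors. An HTML pattern attribute must match the entire value, whereas an awk regex succeeds if it matches anywhere in the field, and 8d07'54.520"W does contain digits. Anchor both ends with ^ and $:
$ echo 8d07\'54.520\"W | awk '{ if ($1 ~ /^[-+]?[0-9]*[.]?[0-9]+$/) print $1; else print "erro" }'
erro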
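The question's final goal is to apply a different pattern to each of three fields. A minimal sketch of that, assuming (hypothetically) that $1 must be an integer, $2 a decimal number and $3 a plain word; the function name check_fields and this field layout are illustrative only. Every pattern is anchored with ^ and $, which is what the HTML pattern attribute does implicitly:

```shell
#!/bin/sh
# Hypothetical helper: validate three fields per input line with anchored EREs.
# Assumed layout: $1 = integer, $2 = decimal number, $3 = alphabetic word.
check_fields() {
  awk '{
    ok = ($1 ~ /^[-+]?[0-9]+$/) &&
         ($2 ~ /^[-+]?[0-9]*[.]?[0-9]+$/) &&
         ($3 ~ /^[A-Za-z]+$/)
    # Print the line when every field matches, otherwise "erro" as in the question
    print (ok ? $0 : "erro")
  }'
}

printf '%s\n' '12 3.5 abc' | check_fields
printf '%s\n' '12 3.5x abc' | check_fields
```

Adjust the three regexes to your real field formats; the important part is that each one is anchored at both ends.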

Related

Bash AWK and Regex Apply on specific Column

I have the following dataset
Name,quantity,unit
car,6,6
plane,7,5
ship,2,3.44
bike,8,7.66
I want to print only the names whose unit is a whole number.
I tried the following, but it does not give the expected result:
#!/bin/bash
awk 'BEGIN {
FS=","
}
/^[0-9]*$/ {
print "Has Whole numbers: " $1
}
' file.csv
The result should be
Has Whole numbers: car
Has Whole numbers: plane

Added a couple of lines to your test data:
Name,quantity,unit
car,6,6
plane,7,5
ship,2,3.44
bike,8,7.66
Starship,1,1.0
Super Heavy,2,0
null,0,
And awk:
$ awk -F, 'int($3)==$3 ""' file
Output:
car,6,6
plane,7,5
Super Heavy,2,0
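int($3) truncates $3 to an integer, and $3 "" turns $3 into a string, so the comparison is done as strings: it is true only when the field's text is exactly that integer (which rejects 1.0, 3.44, the empty field and the header).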
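To get the exact output format the question asks for, the same int() test can be combined with a header skip and the print statement from the original attempt. A sketch, assuming the header is always the first line of the input; the function name whole_units is hypothetical:

```shell
#!/bin/sh
# Prints "Has Whole numbers: <name>" for each row whose 3rd column is an integer.
# NR > 1 skips the header line (assumed to be the first line of input).
# int($3) == $3 "" compares as strings, so "1.0", "3.44" and "" are rejected.
whole_units() {
  awk -F, 'NR > 1 && int($3) == $3 "" { print "Has Whole numbers: " $1 }' "$@"
}

# Usage: whole_units file.csv   (or pipe the data in on stdin)
```

With the extended test data above, this prints lines for car, plane and Super Heavy.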

If you are sure the 3rd column is a number:
awk -F, '(NR != 1 && $3 !~ /\./){print "Has Whole numbers:", $1}' file.csv
Or, closer to the way you did it, apply an anchored regex to the third field:
awk -F, '$3 ~ /^[0-9]+$/{print "Has Whole numbers:", $1}' file.csv

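In your attempt, the regex is tested against the whole line ($0), which never consists of digits only. Apply it to the third field instead: change /^[0-9]*$/ to $3 ~ /^[0-9]+$/ (with + rather than *, so that an empty field doesn't count as a whole number) and it should work.
If you DO NOT want to hard-code the field number and would rather find the unit column automatically from the header, try the following:
awk -F, -v field_val="unit" '
FNR==1{
  # find the column whose header equals field_val
  for(j=1;j<=NF;j++){
    if($j==field_val){
      field_number=j
    }
  }
  next    # do not test the header line itself
}
$field_number ~ /^[0-9]+$/{
  print "Has whole numbers: " $1
}' Input_file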

How to create awk regex to match only one "space" between two words?

I have a string of the form 2016-23-12 90-34-23 and want to create an awk script to match it.
a.awk
$1 ~ /^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}[[:space:]][[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}/{
ts = $1 " " $2
print
}
Run using:
awk -f a.awk --posix
2016-23-12 90-34-23
Output:
Nothing

I assume your intention is to match the whole line. In that case $1 is incorrect: with the default field separator, $1 holds only 2016-23-12, so a pattern that also requires the space and the second group can never match it. Use $0 instead.
Another option is a dynamic regular expression: store the pattern in a string and use that string on the right-hand side of ~, without the // delimiters. Assign it in a BEGIN block (a bare assignment outside any action would be treated as a pattern), with your script being:
BEGIN {
  dynamicRegex = "^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}[[:space:]][[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}$"
}
$0 ~ dynamicRegex {
  print "match success"
}
Running the script as
echo "2016-23-12 90-34-23" | awk --posix -f a.awk
match success
Quoting from the GNU Awk manual:
[..]The righthand side of a ~ or !~ operator need not be a regexp constant (i.e., a string of characters between slashes). It may be any expression. The expression is evaluated and converted to a string if necessary; the contents of the string are used as the regexp. A regexp that is computed in this way is called a dynamic regexp [..]
Another way is to use a regexp constant with ordinary ranges such as [0-9] instead of the POSIX character classes, as below:
$0 ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}-[0-9]{2}-[0-9]{2}$/ {
print "match success"
}
Note that \s is a GNU Awk extension, so with this regex the script is no longer POSIX compatible and running it with --posix won't work. Run it as:
echo "2016-23-12 90-34-23"| awk -f a.awk
match success
To also print the matched line on success, add
print $1 FS $2
after the earlier print command (print $0 would print the whole line as well).
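Since the right-hand side of ~ can be any string expression, the pattern can also be handed to awk from the shell with -v. A sketch under two assumptions: the names match_line and ts_re are hypothetical, and each {n} interval is spelled out as repeated [0-9], because some older awk implementations do not support interval expressions. The ^ and $ anchors force the whole line to match, with exactly one space in the middle:

```shell
#!/bin/sh
# Anchored pattern: 4 digits, then two 2-digit groups, exactly one space,
# then three 2-digit groups. Digits are spelled out instead of {4}/{2}.
d2='[0-9][0-9]'
ts_re="^[0-9][0-9]${d2}-${d2}-${d2} ${d2}-${d2}-${d2}\$"

# Hypothetical helper: $1 is the line to test; the regex is a dynamic regexp.
match_line() {
  printf '%s\n' "$1" |
    awk -v re="$ts_re" '$0 ~ re { print "match success"; next } { print "no match" }'
}

match_line "2016-23-12 90-34-23"    # match success
match_line "2016-23-12  90-34-23"   # no match (two spaces)
match_line "2016-23-121 190-34-23"  # no match
```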

Try this, with the regex anchored at both ends:
$ echo "2016-23-12 90-34-23" | awk '{if($0 ~ /^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}[[:space:]][[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}$/) {print $0}}'
2016-23-12 90-34-23
$ echo "2016-23-121 190-34-23" | awk '{if($0 ~ /^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}[[:space:]][[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}$/) {print $0}}'
(no output)

Trying to write a regex in bash

I am new to regex and I am trying to write a regex in a bash script.
I am trying to match a line with a regex that has to return the second word in the line.
regex = "commit\s+(.*)"
line = "commit 5456eee"
if [$line =~ $regex]
then
echo $2
else
echo "No match"
fi
When I run this, I get the following errors:
man.sh: line 1: regex: command not found
man.sh: line 2: line: command not found
I am new to bash scripting.
Can anyone please help me fix this?
I just want to write a regex to capture the word that follows commit.

You don't want a regex, you want parameter expansion/substring extraction:
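line="commit 5456eee"
first="${line% *}"    # remove the shortest trailing " *" -> commit
second="${line#* }"   # remove the shortest leading "* " -> 5456eee
if [[ $first == commit ]]
then
echo "$second"
else
echo "No match"
fi
$first is 'commit' and $second is '5456eee'. Bash provides all the tools you need.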

If you really only need the second word, you could also do it with awk:
line="commit 5456eee"
echo "$line" | awk '{ print $2 }'
or, if you have a file:
awk '{ print $2 }' filename
Even though it's not a pure Bash solution, awk is present on practically every Linux system.

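You should remove the spaces around the equals sign; otherwise bash thinks you want to execute a command named regex with = and "commit\s+(.*)" as arguments.
In the if condition, on the other hand, the spaces are required: use [[ ]] (the =~ operator does not exist in [ ]), put spaces around [[, ]] and =~, and leave the regex variable unquoted, because a quoted right-hand side is matched as a literal string. [[:space:]] is also more portable than \s:
$ regex="commit[[:space:]]+(.*)"
$ line="commit 5456eee"
$ if [[ $line =~ $regex ]]
> then
> echo "Match"
> else
> echo "No match"
> fi
Match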

Maybe you didn't start your script with a shebang line such as
#!/bin/bash
to define the interpreter you're using? It must be the very first line. Use bash rather than #!/bin/sh here, because [[ ]] and =~ are Bash features.
Then be careful with spaces, which are significant in bash. Your "if" statement should be:
if [[ $line =~ $regex ]]
Check this and tell us more about any errors you still get.

If you put this script in a file such as test.sh and execute it like this:
./test.sh commit aaa bbb ccc
then $0 is ./test.sh, $1 is commit, $2 is aaa, $3 is bbb and $4 is ccc, so you can get the arguments easily with $1, $2, and so on.

A simple way to get the capture group that was matched (if there is one) is BASH_REMATCH, an array in which Bash stores the results of the last [[ =~ ]] match:
regex="commit (.*)"
line="commit 5456eee"
if [[ $line =~ $regex ]]
then
match=${BASH_REMATCH[1]}
echo "$match"
else
echo "No match"
fi
BASH_REMATCH[0] holds the whole match, and since there is only one capture group, it is stored in BASH_REMATCH[1]. In the above example I've assigned the variable $match to BASH_REMATCH[1], which contains:
5456eee
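Putting the anchored regex, the unquoted =~ operand and BASH_REMATCH together, a minimal sketch might look like this. The function name commit_hash is hypothetical, and the pattern assumes the hash is the first run of non-space characters after commit:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the word after "commit", or "No match".
# [[ ... =~ ... ]] and BASH_REMATCH are Bash features (not POSIX sh).
commit_hash() {
  # Keep the regex in a variable and leave $regex unquoted on the right of =~
  local regex='^commit[[:space:]]+([^[:space:]]+)'
  if [[ $1 =~ $regex ]]; then
    # BASH_REMATCH[0] is the whole match, BASH_REMATCH[1] the first group
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    echo "No match"
  fi
}

commit_hash "commit 5456eee"   # 5456eee
commit_hash "merge 5456eee"    # No match
```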

Regexp dynamic variable

I have a crontab file named mycronfile:
#30 07 03 09 RAB root bash /media/data/test1.sh
#* * * * * root bash /media/data/test2.sh
30 07 * * * root bash /media/data/test3.sh
I am trying to add new lines to it, but only if they don't already exist. This is my code:
while read $line; do
com1=$(echo $line | awk '{ print $8 }')
com2=$(echo $line | awk '{ print $7 }')
fullCom=$(echo "$com2 $com1")
fixMin=$(echo $line | awk '{ print $1 }')
fixHour=$(echo $line | awk '{ print $2 }')
fixDate=$(echo $line | awk '{ print $3 }')
fixMonth=$(echo $line | awk '{ print $4 }')
actv=`echo "$fixMin $fixHour $fixDate $fixMonth $fixDay $user $fullCom"`
if grep "$actv" tempcron; then
echo "data in tempcron exist"
echo "$actv" > /dev/null
else
echo "data input into file"
echo "$actv >> tempcron"
fi
done < myfilecron
Every time I execute the script, the data in tempcron is duplicated. I know I need to grep from mycronfile with the right pattern to avoid duplication, but how do I grep it with a regular expression? The problem comes when a line contains an asterisk (*).

A rewrite:
while read -r min hr date mon day user cmd; do
actv="$min $hr $date $mon $day $user $cmd" # only purpose I can see here
# is to fix the spaces
if grep -Fxq -- "$actv" tempcron; then
echo "data in tempcron exist"
else
echo "data input into file"
echo "$actv" >> tempcron
fi
done < myfilecron
The main problem is that the $actv string contains * characters, which are regular expression quantifiers. You weren't telling grep to search for a plain string; you were giving it a regex pattern that didn't match. grep -F treats the pattern as a fixed string, and -x requires the whole line to match.
Other big problems with your script: in while read $line you must give read variable names, not variable values (while read line, not while read $line); and echo "$actv >> tempcron" only prints the text, because the >> redirection is inside the quotes. It must be echo "$actv" >> tempcron.
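The fixed-string check can be packaged as a small reusable helper. This is a sketch rather than a drop-in replacement for the loop above: the function name append_unique is hypothetical, and it assumes an existing line must match in full (-x) to count as a duplicate:

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it is not already there.
# -F: treat the line as a fixed string, so "*" is literal, not a quantifier
# -x: the whole line must match; -q: no output, just the exit status
append_unique() {
  local line="$1" file="$2"
  touch "$file"
  if grep -Fxq -- "$line" "$file"; then
    echo "data in $file exist"
  else
    printf '%s\n' "$line" >> "$file"
    echo "data input into file"
  fi
}
```

Calling it twice with 30 07 * * * root bash /media/data/test3.sh leaves a single copy of that line in the file.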

How to print matched regex pattern using awk?

Using awk, I need to find a word in a file that matches a regex pattern.
I only want to print the word matched with the pattern.
So if in the line, I have:
xxx yyy zzz
And pattern:
/yyy/
I want to only get:
yyy
EDIT:
Thanks to kurumi, I managed to write something like this:
awk '{
for(i=1; i<=NF; i++) {
tmp=match($i, /[0-9]..?.?[^A-Za-z0-9]/)
if(tmp) {
print $i
}
}
}' $1
and this is what I needed :) Thanks a lot!

This is the very basic form:
awk '/pattern/{ print $0 }' file
It asks awk to search for the pattern between the slashes and then print the line, which awk calls a record and denotes by $0. It's worth at least reading the documentation.
If you only want to print the matched word:
awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file

It sounds like you are trying to emulate GNU's grep -o behaviour. This will do that providing you only want the first match on each line:
awk 'match($0, /regex/) {
print substr($0, RSTART, RLENGTH)
}
' file
Here's an example, using GNU's awk implementation (gawk):
awk 'match($0, /a.t/) {
print substr($0, RSTART, RLENGTH)
}
' /usr/share/dict/words | head
act
act
act
act
aft
ant
apt
art
art
art
Read about match, substr, RSTART and RLENGTH in the awk manual.
After that you may wish to extend this to deal with multiple matches on the same line.
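One way to sketch the multiple-matches extension in plain POSIX awk (no gawk extensions) is a loop: after each match, keep searching in the remainder of the line. The function name awk_grep_o is hypothetical, and the empty-match handling is an assumption about how you want patterns such as x* to behave:

```shell
#!/bin/sh
# Hypothetical helper emulating grep -o with POSIX awk: prints every
# non-overlapping match of the ERE in $1 on each input line.
# Note: -v processes backslash escapes, so write \\. for a literal dot.
awk_grep_o() {
  awk -v re="$1" '{
    s = $0
    while (match(s, re)) {
      if (RLENGTH == 0) {            # empty match: step past it to avoid looping forever
        if (RSTART > length(s)) break
        s = substr(s, RSTART + 1)
        continue
      }
      print substr(s, RSTART, RLENGTH)
      s = substr(s, RSTART + RLENGTH)  # continue searching after this match
    }
  }'
}

printf 'cat bat\n' | awk_grep_o '[cb]at'
```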

gawk can get the matching part of every line using this as action:
{ if (match($0,/your regexp/,m)) print m[0] }
match(string, regexp [, array])
If array is present, it is cleared,
and then the zeroth element of array is set to the entire portion of
string matched by regexp. If regexp contains parentheses, the
integer-indexed elements of array are set to contain the portion of
string matching the corresponding parenthesized subexpression.
http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions

If Perl is an option, you can try this:
perl -lne 'print $1 if /(regex)/' file
To implement case-insensitive matching, add the i modifier
perl -lne 'print $1 if /(regex)/i' file
To print everything AFTER the match:
perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile
To print the match and everything after the match:
perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile

If you are only interested in the last line of input and you expect to find only one match (for example part of the summary line of a shell command), you can also try this very compact code, adapted from the question "How to print regexp matches using `awk`?":
$ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
yyy
Or the more complex version with a partial result:
$ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
b
Warning: the match() function with three arguments exists only in gawk, not in mawk.
Here is another nice solution using a lookbehind regex in grep instead of awk. It needs GNU grep with PCRE support (-P) rather than gawk:
$ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
b
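Because the three-argument match() is gawk-only, a portable way to get the text after yyy= is to match the whole yyy=... token and then cut off the known-length prefix with substr(). A sketch, assuming the key is always the literal yyy= (4 characters); the function name value_of_yyy is hypothetical. Unlike the END version above, it prints a value for every matching line:

```shell
#!/bin/sh
# Portable (POSIX awk, e.g. mawk) alternative to match($0, "yyy=([^ ]+)", a):
# match the whole "yyy=value" text, then drop the 4-character "yyy=" prefix.
value_of_yyy() {
  awk '{
    if (match($0, /yyy=[^ ]+/))
      print substr($0, RSTART + 4, RLENGTH - 4)
  }'
}

echo "xxx=a yyy=b zzz=c" | value_of_yyy   # b
```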

Slightly off topic, but this can also be done with grep; posting it here in case anyone is looking for a grep solution:
echo 'xxx yyy zzze ' | grep -oE 'yyy'

Using sed can also be elegant in this situation. Example (replace each matching line with the matched group "yyy"; -n together with the p flag suppresses lines that don't match):
$ cat testfile
xxx yyy zzz
yyy xxx zzz
$ sed -nr 's#^.*(yyy).*$#\1#p' testfile
yyy
yyy
Relevant manual page: https://www.gnu.org/software/sed/manual/sed.html#Back_002dreferences-and-Subexpressions

If you know what column the text/pattern you're looking for (e.g. "yyy") is in, you can just check that specific column to see if it matches, and print it.
For example, given a file called asdf.txt with the following contents:
xxx yyy zzz
to only print the second column if it matches the pattern "yyy", you could do something like this:
awk '$2 ~ /yyy/ {print $2}' asdf.txt
Note that this will also match any line whose second column merely contains "yyy", like these:
xxx yyyz zzz
xxx zyyyz
Use $2 ~ /^yyy$/ (or $2 == "yyy") if you need an exact match.

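A reusable function wrapping match() and substr(), with optional ltrim/rtrim counts of characters to trim from either end of the match:
echo "abc123def" | awk '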
function MATCH(haystack, needle, ltrim, rtrim)
{
if(ltrim == 0 && !length(ltrim))
ltrim = 0;
if(rtrim == 0 && !length(rtrim))
rtrim = 0;
return substr(haystack, match(haystack, needle) + ltrim, RLENGTH - ltrim - rtrim);
}
{
print $0 " - " MATCH($0, "123"); # 123
print $0 " - " MATCH($0, "[0-9]*d", 0, 1); # 123
print $0 " - " MATCH($0, "1234"); # no match: MATCH returns an empty string, so only "abc123def - " is printed
}'
