compare with one of the 5 options - regex

I am trying to find words where the second or third character is one of "aeiou".
# cat t.txt
test
platform
axis
welcome
option
I tried this, but the words "platform" and "axis" are missing from the output.
# awk 'substr($0,2,1) == "e" {print $0}' t.txt
test
welcome

You may use this awk solution that matches 1 or 2 of any char followed by one of the vowels:
awk '/^.{1,2}[aeiou]/' file
test
platform
axis
welcome
Or use the substr function to extract the 2nd and 3rd characters and test whether that substring contains a vowel:
awk 'substr($0,2,2) ~ /[aeiou]/ ' file
test
platform
axis
welcome
As per the comment below, the OP wants the string with the vowels in the 2nd or 3rd position removed. Here is a solution for that:
awk '{
s=substr($0,2,2)
gsub(/[aeiou]+/, "", s)
print substr($0,1,1) s substr($0, 4)
}' file
tst
pltform
axs
wlcome
option
PS: For the sample input, this sed is a shorter way to do the replacement:
sed -E 's/^(.[^aeiou]{0,1})[aeiou]{1,2}/\1/' file
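One caveat with the sed above: when the 2nd character is a consonant and the 3rd and 4th are both vowels (for example "bxae"), `[aeiou]{1,2}` also removes the vowel in 4th position, which the awk version does not. A minimal sketch that only touches positions 2 and 3, assuming a sed that supports `-E`; the function name `strip_vowels_2_3` and the extra word "bxae" are made up for illustration:

```shell
# Hypothetical helper: delete vowels only in positions 2 and 3 of each line.
# Alternative 1: one or two vowels directly after the first character.
# Alternative 2: a consonant (kept via \3) followed by a single vowel.
strip_vowels_2_3() {
  sed -E 's/^(.)([aeiou]{1,2}|([^aeiou])[aeiou])/\1\3/'
}

printf '%s\n' test platform axis welcome option bxae | strip_vowels_2_3
```

For the sample words this gives the same result as the awk command, and "bxae" becomes "bxe" rather than "bx".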

With your shown samples only, please try the following awk code. It was written and tested in GNU awk. In brief: setting the field separator to the empty string makes every character of the line its own field, so the program prints a line when its 2nd or 3rd field (that is, character) is one of a, e, i, o, u. Note that POSIX leaves the behavior of an empty FS unspecified, so this is not guaranteed to work in every awk.
awk -v FS="" '$2~/^[aeiou]$/ || $3~/^[aeiou]$/' Input_file

This might work for you (GNU sed):
sed '/\<..\?[aeiou]/I!d' file
If the second or third character counted from the start of any word is a, e, i, o or u (in either case), the line is not deleted.

I would use GNU AWK for this task in the following way. Let the content of file.txt be
test
platform
axis
welcome
option
then
awk 'BEGIN{FPAT="."}index("aeiou", $2)||index("aeiou", $3)' file.txt
gives output
test
platform
axis
welcome
Explanation: I tell GNU AWK, using FPAT, that a field is any single character (.). Then I filter lines with the index function: if the 2nd field (the 2nd character) occurs anywhere inside aeiou, index returns a value greater than zero, which is treated as true in a boolean context. The same test is applied to the 3rd field (the 3rd character), and the two results are combined with logical OR (||).
(tested in gawk 4.2.1)

Related

grep regex how to get only results with one preceding word?

My string is :
www.abc.texas.com
mail.texas.com
subdomain.xyz.cc.texas.com
www2.texas.com
I am trying to get only the results with "one" word before texas.com. What I expect from a regex grep:
mail.texas.com
www2.texas.com
So mail and www2 are the "one" word that I'm talking about. I tried:
grep "*.texas.com", but I get all of them in the results. Can someone please help?
You can use
grep '^[^.]*\.texas\.com'
Details:
^ - start of string
[^.]* - zero or more chars other than a . char
\.texas\.com - .texas.com string (literal . char must be escaped in the regex pattern).
See the online demo:
#!/bin/bash
s='www.abc.texas.com
mail.texas.com
subdomain.xyz.cc.texas.com
www2.texas.com'
grep '^[^.]*\.texas\.com' <<< "$s"
Output:
mail.texas.com
www2.texas.com
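The pattern above is anchored only at the start, and `[^.]*` also accepts an empty first label. If that matters, a stricter variant can anchor the end as well and require at least one character. This is only a sketch; the function name `one_label` and the two extra host names are invented for the example:

```shell
# Hypothetical helper: keep host names with exactly one non-empty label
# in front of .texas.com and nothing after it.
one_label() {
  grep -E '^[^.]+\.texas\.com$'
}

printf '%s\n' \
  www.abc.texas.com \
  mail.texas.com \
  subdomain.xyz.cc.texas.com \
  www2.texas.com \
  mail.texas.com.au \
  .texas.com | one_label
```

Only mail.texas.com and www2.texas.com pass; the unanchored pattern would also let mail.texas.com.au and .texas.com through.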
With awk:
awk 'BEGIN{FS=OFS="."} /\.texas\.com$/ && NF==3' file
Output:
mail.texas.com
www2.texas.com
Set one dot as input and output field separator, check for texas.com at the end ($) of your line and check for three fields.
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
With your shown samples, please try following awk code.
awk -F'.' 'NF==3 && $2=="texas" && $3=="com"' Input_file
Explanation: the field separator is set to . for all lines. The main program then checks three conditions: NF==3 (the current line has exactly 3 fields), the 2nd field is texas, and the 3rd field is com. If all 3 conditions are met, the line is printed.

Is there a way to use regex with awk to execute a command only when the pattern is matched?

I'm trying to write a bash script that gets user input, checks a .txt for the line that contains that input then plugs that into a wget statement to commence a download.
In testing the functionality awk seems to print out every line, not just pattern matched lines.
chosen=DSC01985
awk -v c="$chosen" 'BEGIN {FS="/"; /c/}
{print $8, "found", c}
END{print " done"}' ./imgLink.txt
The above should read imgLink.txt, search for the pattern and report that the pattern was found. Instead it prints the 8th field of every line in the file.
I have tried moving /c/ out of the BEGIN block, but to no avail.
What's going on here?
Example input:
https://xxxx/xxxx/xxxx/xxxx/xxx/DSC01533.jpg
https://xxxx/xxxx/xxxx/xxxx/xxx/DSC01536.jpg
https://xxxx/xxxx/xxxx/xxxx/xxx/DSC01543.jpg
https://xxxx/xxxx/xxxx/xxxx/xxx/DSC01558.jpg
https://xxxx/xxxx/xxxx/xxxx/xxx/DSC01565.jpg
etc.
Example output:
...
DSC02028.jpg found DSC01985
DSC02030.jpg found DSC01985
DSC02032.jpg found DSC01985
DSC02038.jpg found DSC01985
DSC02042.jpg found DSC01985
etc.
You were close. A variable cannot be used inside a regex literal: /c/ searches for the letter c, not for the contents of the variable c, so a dynamic match such as $0 ~ c is needed. Please try the following, assuming the string you are looking for occurs in the URL values that you have masked with x in your post.
awk -v c="$chosen" -F'/' '$0 ~ c{print $NF " found " c}' Input_file
I am not sure why you print done in your END block; you can add it here if you need it. Also, $NF is the last field of the current line, so you can print a different field instead if you prefer.
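A side note on `$0 ~ c`: the value of c is used as a regular expression, so a character such as . in the user's input would match any character. If a plain substring test is enough, index() avoids that. A minimal sketch, assuming the file name is always the last /-separated field; the function name `find_image` and the second sample URL are made up for the example:

```shell
chosen=DSC01985

# Hypothetical helper: print the last path component of every URL on stdin
# that contains the literal text given as $1 (no regex interpretation).
find_image() {
  awk -v c="$1" -F'/' 'index($NF, c) { print $NF, "found", c }'
}

printf '%s\n' \
  'https://xxxx/xxxx/xxxx/xxxx/xxx/DSC01533.jpg' \
  'https://xxxx/xxxx/xxxx/xxxx/xxx/DSC01985.jpg' | find_image "$chosen"
# prints: DSC01985.jpg found DSC01985
```

Because the condition sits in front of the action block, the print runs only for matching lines, which is the behavior the original script was missing.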

awk Regular Expression (REGEX) get phone number from file

The following is what I have written that would allow me to display only the phone numbers
in the file. I have posted the sample data below as well.
As I understand (read from left to right):
Using awk command delimited by "," if the first char is an Int and then an int preceded by [-,:] and then an int preceded by [-,:]. Show the 3rd column.
I used "www.regexpal.com" to validate my expression. I want to learn more and an explanation would be great not just the answer.
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
awk -F "," '/^(\d)+([-,:*]\d+)+([-,:*]\d+)*$/ {print $3}' bashuser.csv
bashuser.csv
Jordon,New York,630-150,7234
Jaremy,New York,630-250-7768
Jordon,New York,630*150*7745
Jaremy,New York,630-150-7432
Jordon,New York,630-230,7790
Expected Output:
6301507234
6302507768
....
You could just remove every non-digit character:
awk '{gsub(/[^[:digit:]]/, "")}1' file.csv
gsub replaces every match.
[^[:digit:]]: the leading ^ negates the bracket expression, so it matches any character that is not a digit; without the ^ it would match the digits instead.
"" is the replacement; an empty string deletes whatever matched.
1 is an always-true pattern with the default action, a shortcut for print.
In sed:
sed 's/[^[:digit:]]*//g' file.csv
Since your desired output always appears to start on field #3, you can simplify your regex considerably using the following:
awk -F '[*,-]' '{print $3$4$5}'
Proof of concept
$ awk -F '[*,-]' '{print $3$4$5}' < ./bashuser.csv
6301507234
6302507768
6301507745
6301507432
6302307790
Explanation
-F '[*,-]': Use a character class to set the field separators to * OR , OR -.
print $3$4$5: Concatenate the 3rd through 5th fields.
awk is not very suitable here, because the comma occurs inside the phone number as well as between fields; sed gives better results:
sed 's/[^,]\+,[^,]\+,//;s/[^0-9]//g;' bashuser.csv
The first part, s/[^,]\+,[^,]\+,//, removes the first two fields.
The second part, s/[^0-9]//g, removes all remaining non-numeric characters.
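As for why the original attempt printed nothing: awk uses POSIX extended regular expressions, which have no \d shorthand for digits ([0-9] or [[:digit:]] is the portable spelling), and the pattern was anchored with ^ against the whole line, which starts with a name rather than a digit. The following is only a sketch of how the original idea could be kept, validating the part after the second comma before stripping the separators; the function name `phones` and the row `Bad,Row,abc` are invented for illustration:

```shell
# Hypothetical helper: print the phone number of each CSV line on stdin.
phones() {
  awk '{
    rest = $0
    sub(/^[^,]*,[^,]*,/, "", rest)            # drop the first two comma-separated fields
    if (rest ~ /^[0-9]+([-,:*][0-9]+)+$/) {   # digits, then one or more separator+digits groups
      gsub(/[^0-9]/, "", rest)                # remove the separators
      print rest
    }
  }'
}

printf '%s\n' \
  'Jordon,New York,630-150,7234' \
  'Jaremy,New York,630-250-7768' \
  'Jordon,New York,630*150*7745' \
  'Bad,Row,abc' | phones
```

The three valid rows print as 6301507234, 6302507768 and 6301507745; the last row is skipped because it does not look like a phone number.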

Replace last occurrence of a character in a field with awk

I'm trying to replace the last occurrence of a character in a field with awk. Given is a file like this one:
John,Doe,Abc fgh 123,Abc
John,Doe,Ijk-nop 45D,Def
John,Doe,Qr s Uvw 6,Ghi
I want to replace the last space " " with a comma ",", basically splitting the field into two. The result is supposed to look like this:
John,Doe,Abc fgh,123,Abc
John,Doe,Ijk-nop,45D,Def
John,Doe,Qr s Uvw,6,Ghi
I've tried to create a variable with the number of occurrences of spaces in the field with
{var1=gsub(/ /,"",$3)}
and then integrate it in
{var2=gensub(/ /,",",var1,$4); print var2}
but the how-argument in gensub does not allow any characters besides numbers and G/g.
I've found a similar thread here but wasn't able to adapt the solution to my problem.
I'm fairly new to this so any help would be appreciated!
With GNU awk for gensub():
$ awk 'BEGIN{FS=OFS=","} {$3=gensub(/(.*) /,"\\1,",1,$3)}1' file
John,Doe,Abc fgh,123,Abc
John,Doe,Ijk-nop,45D,Def
John,Doe,Qr s Uvw,6,Ghi
Get the book Effective Awk Programming by Arnold Robbins.
Very well-written question btw!
Here is a short awk
awk '{$NF=RS$NF;sub(" "RS,",")}1' file
John,Doe,Abc fgh,123,Abc
John,Doe,Ijk-nop,45D,Def
John,Doe,Qr s Uvw,6,Ghi
Updated due to Ed's comment.
Or you can use the rev tool:
rev file | sed 's/ /,/' | rev
John,Doe,Abc fgh,123,Abc
John,Doe,Ijk-nop,45D,Def
John,Doe,Qr s Uvw,6,Ghi
Reverse the line, replace the first space with a comma, then reverse it again.
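If gensub() is not available (it is a GNU awk extension), the same idea can be sketched with match() and substr(), which exist in POSIX awk. The regex ` [^ ]*$` can only match at the last space of the field, because nothing but non-space characters may follow it. The function name `split_last_space` is made up for this example:

```shell
# Hypothetical helper: replace the last space of field 3 with a comma.
split_last_space() {
  awk 'BEGIN { FS = OFS = "," }
  {
    pos = match($3, / [^ ]*$/)   # position of the last space in field 3, 0 if there is none
    if (pos) $3 = substr($3, 1, pos - 1) "," substr($3, pos + 1)
  }
  1'
}

printf '%s\n' \
  'John,Doe,Abc fgh 123,Abc' \
  'John,Doe,Ijk-nop 45D,Def' \
  'John,Doe,Qr s Uvw 6,Ghi' | split_last_space
```

Unlike the rev pipeline, this only looks at the third field, so a space in the fourth field would be left alone.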

Grep Regex: List all lines except

I'm trying to automagically remove all lines from a text file that contain a letter "T" that is not immediately followed by an "H". I've been using grep and sending the output to another file, but I can't come up with the magic regex that will help me do this.
I don't mind using awk, sed, or some other linux tool if grep isn't the right tool to be using.
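That should do it:
grep -Ev 'T([^H]|$)'
-E: use extended regular expressions, so ( | ) need no backslashes
-v: print only the lines that do not match
[^H]: matches any character but H
|$: also matches a T at the very end of the line, where no character follows it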
You can do:
grep -v 'T[^H]' input
-v is grep's inverse-match option: it suppresses the lines that match the pattern.
The regex used is T[^H], which matches any line that has a T followed by any character other than H. A T at the very end of a line has no following character, so this pattern does not catch it.
To read lines from a file while excluding empty lines and lines starting with #:
grep -v '^$\|^#' folderlist.txt
folderlist.txt
# This is list of folders
folder1/test
folder2
# This is comment
folder3
folder4/backup
folder5/backup
Results will be:
folder1/test
folder2
folder3
folder4/backup
folder5/backup
Adding 2 awk solutions to the mix here.
1st solution (simpler): plain awk, works in any version of awk.
awk '!/T/ || /TH/' Input_file
It checks 2 conditions:
the line doesn't contain T, OR
the line contains TH.
If either condition is TRUE, the line is printed. Note that one TH is enough to keep a line, even when another T in the same line is not followed by H.
2nd solution (GNU awk specific): uses the match function with the regex (T)(.|$) and its ability to store the capture groups in an array.
awk '
!/T/{
print
next
}
match($0,/(T)(.|$)/,arr) && arr[1]=="T" && arr[2]=="H"
' Input_file
Explanation: first, if a line doesn't contain T, print it and move on to the next line. Otherwise, use awk's match function to match a T followed by any character OR by the end of the line. The two capture groups are stored in the array arr, so the line is printed when arr[1] is T and arr[2] is H. Only the first T in the line is examined.
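If every T in a line has to be followed by H, including a T at the very end of the line, a stricter filter can be sketched by negating a single regex. This is a reading of the question, not a correction endorsed by the answers above; the function name `keep_th_only` and the sample lines are invented:

```shell
# Hypothetical helper: drop every line that contains a T which is followed
# by something other than H, or by nothing at all (end of line).
keep_th_only() {
  awk '!/T([^H]|$)/'
}

printf '%s\n' 'THE END' 'CAT' 'plain' 'THAT' 'TREE' 'BATH' | keep_th_only
```

This keeps "THE END", "plain" and "BATH". "THAT" is dropped because its final T has nothing after it, whereas `!/T/ || /TH/` would keep it.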