Passing shell variable to awk in for-loop - regex

I'm writing a script to print the column and row numbers of cells that match a given string, and to output them to a text file. The individual awk commands work fine in the terminal and I've resolved other syntax issues, but the .txt file that is output still comes up empty. I think I have a problem with passing shell variables to awk.
#!/bin/bash
echo Literal or regex string to find:
read string
echo File path to find string match in:
read filename
echo "Matches for $string were found in the following cells:" > results.txt
for string in filename
do
awk -v awkvar="$string" -F"," '{for(i=1;i<=NF;i++){if ($i ~ /awkvar/){print i}}}' $filename >> results.txt | echo -e "\n" >> results.txt
awk -v awkvar="$string" '/awkvar/{print NR}' $filename >> results.txt | echo -e "\n" >> results.txt
done
Problem Resolved
I've rewritten the script as follows:
#!/bin/bash
# Prompt for input: 1. enter file name or path that you want searched; 2. enter the literal or regex string
echo File name or path to find matches in:
read file
echo Literal or regex string to find:
read string
# Define a variable and test whether any matches exist; if not, a notification is sent to the terminal.
# If matches exist, their row numbers (as summary rows) and individual column numbers are written to a .txt file in the home directory.
# NB: you need to escape a minus symbol with brackets, [-], so that it's not mistaken for an invalid grep option!
matchesFound=$(grep -E -c "$string" "$file")
if [ "$matchesFound" -eq 0 ]
then
echo "No matches exist."
else
printf "Summary Row No: \n`awk -v awkvar="$string" '$0 ~ awkvar{print NR}' $file`" > results_for_$string.txt
printf "\nInstance Column No: \n`awk -v awkvar="$string" -F"," '{for(i=1;i<=NF;i++){if ($i ~ awkvar){print i}}}' $file`" >> results_for_$string.txt
fi
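For illustration, a hypothetical session (the script name find_matches.sh, the file data.csv and the match positions are all made up here):
$ ./find_matches.sh
File name or path to find matches in:
data.csv
Literal or regex string to find:
foo
$ cat results_for_foo.txt
Summary Row No: 
2
Instance Column No: 
3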

You can't use an awk variable inside a /.../ regexp pattern; try the following instead. You could use awk's index function, or, for the ~ check, use the variable directly rather than the /../ form.
awk -v awkvar="$string" -F"," '{for(i=1;i<=NF;i++){if ($i ~ awkvar){print i}}}' "$filename" >> results.txt; echo -e "\n" >> results.txt
awk -v awkvar="$string" 'index($0,awkvar){print NR}' "$filename" >> results.txt; echo -e "\n" >> results.txt
This answer only deals with fixing the awk code shown by the OP, as per the question.
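A quick illustration of the difference, with made-up input: inside /.../ the text awkvar is taken as a literal regexp, while $0 ~ awkvar uses the variable's value.
$ printf 'foo\nawkvar\n' | awk -v awkvar="foo" '/awkvar/'
awkvar
$ printf 'foo\nawkvar\n' | awk -v awkvar="foo" '$0 ~ awkvar'
foo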

Related

Remove a line from file that starts with a number in bash

I am trying to create a simple CSV editor in bash,
and I'm struggling with removing a line. The user passes in the ID of
the line to remove (each row is identified by an ID in the first column).
This is an example file structure:
ID,Name,Surname
0,Mark,Twain
1,Cristopher,Jones
So, having the id saved in a variable and the file name in another variable (say it's file.csv), I attempt to remove it from bash with this line:
read -p "Pass the object's ID: " idtoremove
fname=file.csv
sed -i -e "'/^$idtoremove*,/d'" $fname
However, this has no effect on the file. What could be wrong with this line?
Also, how can I replace a line starting with a given ID with a string from a variable? This is another problem I will have to face, but I have no idea how to approach it.
The following script could help you. It asks the user to enter an id.
cat script.ksh
echo "Please enter the id to be removed:"
read value
awk -v val="$value" -F, '$1!=val' Input_file
In case you want to save the output into Input_file itself, append > tmp_file && mv tmp_file Input_file to the above awk code.
With sed:
cat script.ksh
echo "Please enter the id to be removed:"
read value
sed "/^$value,/d" Input_file
Use the -i.bak option with the above sed to save the output into Input_file itself while also keeping a backup of the original Input_file.
This is best done in awk:
awk -v id="$idtoremove" -F, '$1 != id' file.csv
If you're using GNU awk then you can also save in place:
awk -i inplace -v id="$idtoremove" -F, '$1 != id' file.csv
For other awk versions use:
awk -v id="$idtoremove" -F, '$1 != id' file.csv > $$.csv &&
mv $$.csv file.csv
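For example, with the sample file.csv from the question:
$ awk -v id=1 -F, '$1 != id' file.csv
ID,Name,Surname
0,Mark,Twain
As for the follow-up question (replacing the matching line), a minimal sketch along the same lines, assuming $newrow holds the full replacement CSV line (note that -v interprets backslash escapes in its value):
awk -v id="$idtoremove" -v repl="$newrow" -F, '$1 == id {print repl; next} 1' file.csv > $$.csv &&
mv $$.csv file.csv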

How to concatenate lines by unix commands

I have the following file, myfile.txt:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","va
l2","va
l3"
"field4","val1","val2","val3"
I want to bring this file back into its normal form, like this:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
So, I am trying to do that with the following commands:
filename=myfile.txt
while read line
do
found=$(grep '^[^"]')
if [ "$found" ]; then
#think here must be command "paste"
fi
done < $filename
but something is wrong. Please help me, I am not a guru in unix commands.
Try this:
filename=$1
while read -r line
do
found=$found$(echo "$line" | grep '[^"]')
if [[ -n $found && $found == *\" ]]; then
echo "$found"
found=''
fi
done < "$filename"
The variable $found is appended to itself on each iteration; this way you'll join the "broken lines".
The if then checks that $found is not empty (-n does just that) and that $found ends with a quote, as suggested by @Barmar.
If it does end with a quote, that's the end of the record, so you echo $found and set the variable back to empty.
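Assuming the loop above is saved as join_lines.sh (name made up here), a sample run against the question's file:
$ bash join_lines.sh myfile.txt
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"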
sed solution:
sed -Ez 's/[[:space:]]+//g; s/""/","/g; s/(([^,]+,){3})([^,]+),/\1\3\n/g; $a\\' myfile.txt
-z - treat the input as lines separated by the null (zero) character instead of newlines
s/[[:space:]]+//g - remove whitespace between/within lines
s/""/","/g - separate adjacent fields which were wrapped/broken
s/(([^,]+,){3})([^,]+),/\1\3\n/g - set a linebreak (record separator) after each 4th field
$a\\ - append the final newline at the end of the content
The output:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
Without knowing the number of fields in the input, you can use this gnu-awk solution using FPAT and gensub:
awk -v RS= -v FPAT='("[^"]*"|[^,"]+),?' -v OFS= '{
for (h=1; h<=NF; h++) $h = gensub(/([^"])\n[[:blank:]]*/, "\\1", "g", $h); } 1' file
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
To save changes back to file use:
awk -i inplace -v RS= -v FPAT='("[^"]*"|[^,"]+),?' -v OFS= '{
for (h=1; h<=NF; h++) $h = gensub(/([^"])\n[[:blank:]]*/, "\\1", "g", $h); } 1' file

Find regular expression in a file matching a given value

I have some basic knowledge of using regular expressions with grep (bash).
But I want to use regular expressions the other way around.
For example I have a file containing the following entries:
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
Now I want to use bash to figure out which line a particular number matches.
For example:
grep 8 file
should return:
line_three=[7-9]
Note: I am aware that the example of "grep 8 file" doesn't make sense, but I hope it helps to understand what I am trying to achieve.
Thanks for your help,
Marcel
As others have pointed out, awk is the right tool for this:
awk -F'=' '8~$2{print $0;}' file
... and if you want this tool to feel more like grep, a quick bash wrapper:
#!/bin/bash
awk -F'=' -v seek_value="$1" 'seek_value~$2{print $0;}' "$2"
Which would run like:
./not_exactly_grep.sh 8 file
line_three=[7-9]
My first impression is that this is not a task for grep, maybe for awk.
Trying to do it with grep, I only see this:
for line in $(cat file); do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done
Using while to read the file (following the comments):
while IFS= read -r line; do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done < file
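With the sample file, either loop prints the matching value followed by the key:
$ while IFS= read -r line; do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done < file
8
line_three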
This can be done in native bash using the syntax [[ $value =~ $regex ]] to test:
find_regex_matching() {
local value=$1
while IFS= read -r line; do # read from input line-by-line
[[ $line = *=* ]] || continue # skip lines not containing an =
regex=${line#*=} # prune everything before the = for the regex
if [[ $value =~ $regex ]]; then # test whether we match...
printf '%s\n' "$line" # ...and print if we do.
fi
done
}
...used as:
find_regex_matching 8 <file
...or, to test it with your sample input inline:
find_regex_matching 8 <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF
...which properly emits:
line_three=[7-9]
You could replace printf '%s\n' "$line" with printf '%s\n' "${line%%=*}" to print only the key (contents before the =), if so inclined. See the bash-hackers page on parameter expansion for a rundown on the syntax involved.
This is not built-in functionality of grep, but it's easy to do with awk, with a change in syntax:
/[0-3]/ { print "line one" }
/[4-6]/ { print "line two" }
/[7-9]/ { print "line three" }
If you really need to, you could programmatically change your input file to this syntax, if it doesn't contain any characters that need escaping (mainly / in the regex or " in the string):
sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#'
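For example, writing the generated program to a file (prog.awk is a made-up name here) and running it:
$ sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#' file > prog.awk
$ cat prog.awk
/[0-3]/ { print "line_one" }
/[4-6]/ { print "line_two" }
/[7-9]/ { print "line_three" }
$ echo 8 | awk -f prog.awk
line_three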
As I understand it, you are looking for a range that includes some value.
You can do this in gawk:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
$ awk -v n=8 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]+0<=n && a[2]+0>=n) print $0 }' /tmp/file
line_three=[7-9]
Since the digits are being treated as numbers (vs a regex) it supports larger ranges:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[75-95]
line_four=[55-105]
$ awk -v n=92 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]+0<=n && a[2]+0>=n) print $0 }' /tmp/file
line_three=[75-95]
line_four=[55-105]
If you are just looking to interpret the right hand side of the = as a regex, you can do:
$ awk -F= -v tgt=8 'tgt~$2' /tmp/file
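which, with the same sample file, prints:
$ awk -F= -v tgt=8 'tgt~$2' /tmp/file
line_three=[7-9]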
You could do something like
grep -Ef <(cut -d= -f2 file) <(echo 8)
This will grep what you want, but will not display where the match was found.
With grep you can show some message:
echo "8" | sed -n '/[7-9]/ s/.*/Found it in line_three/p'
Now you would like to transform your regexp file into such commands:
sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file
Store these generated commands in a virtual command file and you get:
echo "8" | sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file)

print line that matches first field (bash)

I'm trying to read user input, have it match the first field of a csv file, and print out the entire line. Here's what I've come up with:
#!/bin/bash
echo "enter number: "
read USERINPUT
LINENUMBER=$(awk -v FS=',' '{print $1}' < test.csv | grep -n "$USERINPUT")
FULLLINE=$(sed -n $LINENUMBER\p test.csv)
echo $FULLLINE
The problem I'm running into is that if I set USERINPUT=4 but my csv file has several lines like 4, 421, 444, etc., I match all of them. How do I make
grep -n "$USERINPUT"
only match exactly what it is set to and nothing else?
Instead of printing the first column of every line, then using grep, you should just do the whole thing in awk:
line_number=$(awk -F, -v s="$number" '$1==s{print NR}' test.csv)
If you just want to print the line, that's simple:
awk -F, -v s="$number" '$1==s' test.csv
By the way, instead of using an echo followed by a read, you can use read -p which allows you to specify a prompt:
read -p "enter number: " number
#!/bin/bash
read -p "enter number: " num
grep "^$num," test.csv
The -o grep option prints only what matches the regular expression.
E.g.
grep -o ".*$USERINPUT.*"
or
grep -o "^$USERINPUT.*"
etc.
#!/bin/bash
echo "enter number: "
read USERINPUT
# for a var assignation and print content
FULLLINE=$(egrep "^${USERINPUT%% *}," test.csv )
echo $FULLLINE
# for only a print
egrep "^${USERINPUT%% *}," test.csv
Use of egrep to include the delimiters (a start-of-line anchor before and a comma after the input)
Use of a small input cleanup removing anything after the first space via ${VarName%% *}
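For example, if the user types 4 followed by some trailing text, the expansion strips everything after the first space before the pattern is built:
$ USERINPUT='4 extra'
$ echo "^${USERINPUT%% *},"
^4,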

Extract all numbers from a text file and store them in another file

I have a text file which has lots of lines. I want to extract all the numbers from that file.
The file contains text and numbers, and each line contains only one number.
How can I do it using sed or awk in a bash script?
I tried
#! /bin/bash
sed 's/\([0-9.0-9]*\).*/\1/' <myfile.txt >output.txt
but this didn't work.
grep can handle this:
grep -Eo '[0-9.]+' myfile.txt
-o tells grep to print only the matches, and [0-9.]+ is a regular expression that matches numbers (inside a bracket expression the dot needs no escaping).
To put all numbers on one line and save them in output.txt:
echo $(grep -Eo '[0-9.]+' myfile.txt) >output.txt
Text files should normally end with a newline character. The use of echo above ensures that this happens.
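For example, with a hypothetical myfile.txt (content made up here):
$ cat myfile.txt
price is 3.14 dollars
count 42 items
$ grep -Eo '[0-9.]+' myfile.txt
3.14
42
$ echo $(grep -Eo '[0-9.]+' myfile.txt) >output.txt
$ cat output.txt
3.14 42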
Non-GNU grep:
If your grep does not support the -o flag, try:
echo $(tr ' ' '\n' <myfile.txt | grep -E '[0-9.]+') >output.txt
This uses tr to replace all spaces with newlines (so each number appears separately on a line) and then uses grep to search for numbers.
tr -sc '0-9.' ' ' < "$file"
Will transform every string of non-digit-or-period characters into a single space.
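With the same hypothetical myfile.txt, all the numbers end up space-separated on a single line (with no trailing newline, so you may want to add one):
$ tr -sc '0-9.' ' ' < myfile.txt
 3.14 42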
You can also use Bash:
while IFS= read -r line; do
    if [[ $line =~ [0-9.]+ ]]; then
        echo "$BASH_REMATCH"
    fi
done <myfile.txt >output.txt
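With the same hypothetical myfile.txt, this writes one number per line:
$ cat output.txt
3.14
42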