I have the following file, myfile.txt:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","va
l2","va
l3"
"field4","val1","val2","val3"
I want to restore the file to its normal form, like this:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
So I am trying to do that with the following commands:
filename=myfile.txt
while read line
do
found=$(grep '^[^"]')
if [ "$found" ]; then
#think here must be command "paste"
fi
done < $filename
but something is wrong. Please help me; I am no guru with Unix commands.
Try this:
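filename=$1
found=''
while IFS= read -r line
do
    # Append the current line to whatever has been collected so far.
    found=$found$(printf '%s\n' "$line" | grep '[^"]')
    # A record is complete once the collected text ends with a quote.
    if [[ -n $found && $found == *\" ]]; then
        printf '%s\n' "$found"
        found=''
    fi
done < "$filename"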
Each line is appended to $found, which is how the "broken lines" get joined.
The if first checks that $found is not empty (that is what -n tests) and then that it ends with a quote, as suggested by @Barmar.
If it does end with a quote, the record is complete, so $found is printed and then reset to an empty string.
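The "ends with a quote" test can be fooled when a fragment in the middle of a record happens to end with a quote. A possible alternative, sketched here under the assumption that line breaks only ever occur inside quoted values, is to keep joining lines until the collected text contains an even number of quotes. The function name join_broken_lines is made up for this illustration.

```bash
#!/bin/bash
# Join physical lines until the buffered record holds an even number of
# double quotes, i.e. every quoted value that was opened has been closed.
# Assumption: line breaks only occur inside quoted values.
join_broken_lines() {
    local line buf='' quotes
    while IFS= read -r line || [[ -n $line ]]; do
        buf+=$line
        quotes=${buf//[^\"]/}        # delete everything except the quotes
        if (( ${#quotes} % 2 == 0 )); then
            printf '%s\n' "$buf"
            buf=''
        fi
    done
    # Print an unterminated record rather than silently dropping it.
    if [[ -n $buf ]]; then
        printf '%s\n' "$buf"
    fi
}

# Usage: join_broken_lines < myfile.txt
```

A break that falls between two fields (right after a comma) leaves the quote count even, so this sketch would not repair that case.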
sed solution:
sed -Ez 's/[[:space:]]+//g; s/""/","/g; s/(([^,]+,){3})([^,]+),/\1\3\n/g; $a\\' myfile.txt
-z - (GNU sed) treat the input as records separated by the NUL (zero) character instead of newlines, so the whole file is processed as a single record
s/[[:space:]]+//g - remove all whitespace between and within lines (this also removes any spaces inside values)
s/""/","/g - separate adjacent fields that were joined when the line breaks were removed
s/(([^,]+,){3})([^,]+),/\1\3\n/g - put a line break (record separator) after every 4th field
$a\\ - append the final newline at the end of the content
The output:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
If the number of fields in the input is not known in advance, you can use this GNU awk solution based on FPAT and gensub:
awk -v RS= -v FPAT='("[^"]*"|[^,"]+),?' -v OFS= '{
for (h=1; h<=NF; h++) $h = gensub(/([^"])\n[[:blank:]]*/, "\\1", "g", $h); } 1' file
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
To save the changes back to the file, use the inplace extension (available in gawk 4.1 and later):
awk -i inplace -v RS= -v FPAT='("[^"]*"|[^,"]+),?' -v OFS= '{
for (h=1; h<=NF; h++) $h = gensub(/([^"])\n[[:blank:]]*/, "\\1", "g", $h); } 1' file
I wrote a bash script that reads the text file given as its argument, processes the text, and sends invalid lines to an error file and all other output to list.txt.
#!/bin/bash
filename="$1"
while read line; do
a=$(echo $line | awk "{print NF}")
if [ "$a" = "3" ]
then
first=$(echo $line | awk -F' ' '{print $1}')
last=$(echo $line | awk -F' ' '{print $2}')
email=$(echo $line | awk -F' ' '{print $3}')
if [[ $first =~ ^[a-zA-Z]+$ && $last =~ ^[a-zA-Z]+$ ]]
then
if [[ $email =~ '<([\w\.\-_]+)?\w+#[\w-_]+(\.\w+){1,}>' ]]
then
echo "$first $last $email" | cat >>list.txt
elif [[ $email =~ '([\w\.\-_]+)?\w+#[\w-_]+(\.\w+){1,}' ]]
then
echo "$first $last <$email>" | cat >>list.txt
else
echo "$first $last $email" | cat >&2
fi
else
echo "$first $last $email" | cat >&2
fi
else
echo "$line" | cat >&2
fi
done < $filename
I run the script as ./script.sh argumentfile.txt 2>error.txt
My argument file contains the following:
Joe cable cable#ablecorp.com
Bob Brown <bob_baker#bakerandsons.com>
Jim Hass hass#bigcorp.com
mike_lupo#mou.east.com
Edison jones jones#inl.net.gov
pirate.coe.su.com pirate people
The input file is intentionally poorly formatted. Ideally, each line of the output should have the form:
lastname firstname <email>
But every line ends up in the error file:
Joe cable cable#ablecorp.com
Bob Brown <bob_baker#bakerandsons.com>
Jim Hass hass#bigcorp.com
mike_lupo#mou.east.com
Edison jones jones#inl.net.gov
pirate.coe.su.com pirate people
You could just do this entirely with awk:
#!/bin/bash
gawk '{
name_re = "^[[:alpha:]]+$"
mail_re = "<?[[:alnum:]_.%+-]+#[[:alnum:]_.-]+\\.[[:alpha:]]{2,6}>?"
# check for 3 fields with suitable regexp matching for all
if (NF == 3 && $1 ~ name_re && $2 ~ name_re && $3 ~ mail_re) {
# put brackets around the address if needed
email = ($3 ~ /^<.*>$/) ? $3 : "<" $3 ">"
# output to the good list
print $1 " " $2 " " email > "list.txt"
# move to the next one
next
}
# output to the bad list
print > "error.txt"
}' "$1"
Tested with the BSD and GNU versions of awk.
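If you would rather stay in bash, a likely reason the original script sends every line to the error file is that the right-hand side of =~ is quoted, which makes bash compare it as a literal string, and that \w is not part of POSIX extended regular expressions. The sketch below stores the patterns in variables and uses them unquoted. The function name classify and the "ok:"/"bad:" prefixes are made up for this illustration, and the address pattern is borrowed from the awk answer above, including the # separator used in the sample data.

```bash
#!/bin/bash
# Patterns are kept in variables and used UNQUOTED on the right of =~,
# otherwise bash would treat them as literal strings.
name_re='^[[:alpha:]]+$'
addr='[[:alnum:]_.%+-]+#[[:alnum:]_.-]+\.[[:alpha:]]{2,6}'
mail_re="^(<$addr>|$addr)\$"      # either bracketed or bare, nothing in between

classify() {
    local first last email extra
    # Split the line into words; anything beyond the third lands in "extra".
    read -r first last email extra <<< "$1"
    if [[ -n $email && -z $extra &&
          $first =~ $name_re && $last =~ $name_re && $email =~ $mail_re ]]; then
        # Add angle brackets only when they are missing.
        [[ $email == "<"*">" ]] || email="<$email>"
        printf 'ok: %s %s %s\n' "$first" "$last" "$email"
    else
        printf 'bad: %s\n' "$1"
    fi
}

# Usage: while IFS= read -r line; do classify "$line"; done < argumentfile.txt
```

Splitting on the prefix afterwards (or printing the two cases to different files) would reproduce the list.txt / error.txt separation of the original script.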
I am new to regex and I am trying to write one in a bash script.
I am trying to match a line against a regex that should return the second word in the line.
regex = "commit\s+(.*)"
line = "commit 5456eee"
if [$line =~ $regex]
then
echo $2
else
echo "No match"
fi
When I run this, I get the following errors:
man.sh: line 1: regex: command not found
man.sh: line 2: line: command not found
I am new to bash scripting.
Can anyone please help me fix this?
I just want to write a regex to capture the word that follows commit.
You don't want a regex, you want parameter expansion/substring extraction:
line="commit 5456eee"
first="${line% *}"
regex="${line#* }"
if [[ $line =~ $regex ]]
then
echo "$regex"
else
echo "No match"
fi
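${line% *} removes the shortest suffix matching ' *' and ${line#* } removes the shortest prefix matching '* ', so $first == 'commit' and $regex == '5456eee'. Bash provides all the tools you need.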
If you really only need the second word, you could also do it with awk:
line="commit 5456eee"
echo "$line" | awk '{ print $2 }'
or, if you have a file:
awk '{ print $2 }' filename
Although this is not a bash-only solution, awk should be present on most Linux systems.
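You should remove the spaces around the equals sign; otherwise bash thinks you want to execute a command named regex with = and "commit\s+(.*)" as arguments.
In the if condition, on the other hand, the spaces must stay, and you need [[ ... ]] rather than [ ... ], because only [[ supports the =~ operator. Leave $regex unquoted so that it is treated as a regex rather than a literal string. [[:space:]] is used here instead of \s because \s is not part of POSIX extended regular expressions:
$ regex="commit[[:space:]]+(.*)"
$ line="commit 5456eee"
$ if [[ $line =~ $regex ]]
> then
> echo "Match"
> else
> echo "No match"
> fi
Match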
Maybe you didn't start your script with
#!/bin/sh
or
#!/bin/bash
to define the interpreter you're using? (The =~ operator needs bash, so use the second one.)
It must be the first line.
Also be careful: spaces are significant in bash. Your "if" statement should be:
if [[ $line =~ $regex ]]
Check this and tell us more about the errors you get.
If you save this script to a file such as test.sh
and execute it like this:
test.sh commit aaa bbb ccc
$0      $1     $2  $3  $4
you can easily get the arguments through $0, $1, and so on.
A simple way to get the capture group that was matched (if there is one) is to use BASH_REMATCH, an array that bash fills with the match results:
regex="commit (.*)"
line="commit 5456eee"
if [[ $line =~ $regex ]]
then
    match=${BASH_REMATCH[1]}
    echo "$match"
else
    echo "No match"
fi
BASH_REMATCH[0] holds the whole match, and since you have only one capture group, its text is in BASH_REMATCH[1]. In the above example I've assigned BASH_REMATCH[1] to the variable match, which prints:
5456eee
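As a small extension of the same idea, each additional pair of parentheses gets the next index in BASH_REMATCH. The sketch below captures both the keyword and the hash; the function name parse_commit and the stricter pattern (letters, whitespace, hexadecimal digits) are assumptions made for this illustration, not something the question requires.

```bash
#!/bin/bash
# Capture two groups: the leading keyword and the hexadecimal id after it.
parse_commit() {
    local regex='^([[:alpha:]]+)[[:space:]]+([[:xdigit:]]+)$'
    if [[ $1 =~ $regex ]]; then
        # Index 1 is the first group, index 2 the second.
        printf 'keyword=%s hash=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        echo "No match"
        return 1
    fi
}

parse_commit "commit 5456eee"
```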
If I have a string:
s='path/to/my/foo.txt'
and an array
declare -a include_files=('foo.txt' 'bar.txt');
how can I check the string for matches in my array efficiently?
You could loop through the array and use a bash substring check
for file in "${include_files[@]}"
do
    if [[ $s = *"${file}" ]]; then
        printf "%s\n" "$file"
    fi
done
Alternatively, if you want to avoid the loop and you only care whether a file name matches or not, you could use the @(...) form of bash extended globbing. The following example assumes that the file names in the array do not contain |.
shopt -s extglob
declare -a include_files=('foo.txt' 'bar.txt');
s='path/to/my/foo.txt'
printf -v pat "%s|" "${include_files[@]}"
pat="${pat%|}"
printf "%s\n" "${pat}"
#prints foo.txt|bar.txt
if [[ ${s##*/} = @(${pat}) ]]; then echo yes; fi
For an exact match to the file name:
#!/bin/bash
s="path/to/my/foo.txt";
ARR=('foo.txt' 'bar.txt');
for str in "${ARR[@]}";
do
    # if [ "$(echo "$s" | awk -F"/" '{print $NF}')" = "$str" ]; then
    if [ "$(basename "$s")" = "$str" ]; then # A better option than awk for sure...
        echo "match";
    else
        echo "no match";
    fi;
done
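If the list is long or the check runs many times, another option is to load the names into an associative array once and then test membership with a single lookup. This is only a sketch and assumes Bash 4 or later; the names include_set and is_included are made up for this illustration.

```bash
#!/bin/bash
# Requires Bash 4+ (associative arrays).
declare -a include_files=('foo.txt' 'bar.txt')

# Build the lookup table once: the file names are the keys.
declare -A include_set=()
for f in "${include_files[@]}"; do
    include_set["$f"]=1
done

# Succeeds when the last path component is one of the keys.
is_included() {
    local base=${1##*/}
    [[ -n $base && -n ${include_set[$base]+x} ]]
}

if is_included 'path/to/my/foo.txt'; then echo yes; else echo no; fi
```

Unlike the pattern-based version, this one places no restriction on characters such as | in the file names.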
echo 8d07\'54.520\"W | awk '{ if ($1 ~ /[-+]?[0-9]*[.]?[0-9]+/) print $1; else print "erro" }'
I'm trying to check whether the field is a number, but it's not working. I use this same regex in an HTML text input, and it works there.
In this case I was expecting "erro", but the field is printed instead.
My final goal is to apply 3 different patterns to the 3 fields $1, $2 and $3.
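Not 100% sure of the requirement, but you probably need to anchor the regex. Without anchors, ~ succeeds as soon as any substring matches, and here the leading 8 is enough.
$ echo 8d07\'54.520\"W | awk '{ if ($1 ~ /^[-+]?[0-9]*[.]?[0-9]+$/) print $1; else print "erro" }'
erro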
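For the stated goal of checking three fields, one possible shape is to give each field its own fully anchored pattern and combine the results. In the sketch below only the number pattern for $1 comes from this thread; the patterns for $2 (letters only) and $3 (a single compass letter) and the function name check_fields are placeholders invented for the illustration.

```bash
#!/bin/bash
# Validate three whitespace-separated fields, one anchored pattern each.
# The patterns for $2 and $3 are hypothetical placeholders.
check_fields() {
    awk '{
        ok = ($1 ~ /^[-+]?[0-9]*[.]?[0-9]+$/) &&
             ($2 ~ /^[A-Za-z]+$/) &&
             ($3 ~ /^[NSEW]$/)
        print (ok ? $0 : "erro")
    }'
}

# Usage: check_fields < data.txt
```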