I have the following file, myfile.txt:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","va
l2","va
l3"
"field4","val1","val2","val3"
I want to bring this file back to a normal view, like this:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
So I am trying to do that with the following commands:
filename=myfile.txt
while read line
do
found=$(grep '^[^"]')
if [ "$found" ]; then
# I think the "paste" command must go here
fi
done < $filename
but something is wrong. Please help me, I am not a guru with Unix commands.
Try this:
filename=$1
while read -r line
do
found=$found$(echo $line | grep '[^"]')
if [[ -n $found && $found == *\" ]]; then
echo $found;
found=''
fi
done < "$filename"
The variable $found is appended to itself on every iteration; this way you'll join the "broken" lines.
The if then checks that $found is not empty (that is what -n does) and that $found ends with a quote, as suggested by @Barmar.
If it does end with a quote, that's the end of the record, so you echo $found and set the variable back to empty.
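For example (a quick sketch, saving the script under a hypothetical name such as join_lines.sh), running it against the sample file should reproduce the desired output:
$ bash join_lines.sh myfile.txt
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"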
sed solution:
sed -Ez 's/[[:space:]]+//g; s/""/","/g; s/(([^,]+,){3})([^,]+),/\1\3\n/g; $a\\' myfile.txt
-z - treat the input as lines separated by null(zero) character instead of newlines
s/[[:space:]]+//g - remove whitespace between/within lines
s/""/","/g - put the "," separator back between adjacent quoted values that were run together
s/(([^,]+,){3})([^,]+),/\1\3\n/g - insert a line break (record separator) after every 4th field
$a\\ - append the final newline at the end of the content
The output:
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
Without knowing the number of fields in the input, you can use this gnu-awk solution using FPAT and gensub:
awk -v RS= -v FPAT='("[^"]*"|[^,"]+),?' -v OFS= '{
for (h=1; h<=NF; h++) $h = gensub(/([^"])\n[[:blank:]]*/, "\\1", "g", $h); } 1' file
"field1","val1","val2","val3"
"field2","val1","val2","val3"
"field3","val1","val2","val3"
"field4","val1","val2","val3"
To save the changes back to the file, use:
awk -i inplace -v RS= -v FPAT='("[^"]*"|[^,"]+),?' -v OFS= '{
for (h=1; h<=NF; h++) $h = gensub(/([^"])\n[[:blank:]]*/, "\\1", "g", $h); } 1' file
I have this line
UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS
I am trying to print the last letter of each word to make a string, using this awk command:
awk '{ print substr($1,6) substr($2,6) substr($3,6) substr($4,6) substr($5,6) substr($6,6) }'
In case I don't know how many characters a word contains, what is the correct command to print the last character of $column? And instead of the repeating substr commands, how can I use it only once to print specific characters in different columns?
If you have just this one single line to handle you can use
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($i))} END{print r}' file
If you have multiple lines in the input:
awk '{r=""; for (i=1;i<=NF;i++) r = r "" substr($i,length($i)); print r}' file
Details:
{for (i=1;i<=NF;i++) r = r "" substr($i,length($i))} - iterates over all fields in the current record; i is the field ID, $i is the field value, and the last char of each field (retrieved with substr($i,length($i))) is appended to the r variable
END{print r} prints the r variable once the awk script finishes processing.
In the second solution, r is cleared at the start of each line, and its value is printed after processing all fields in the current record.
See the online demo:
#!/bin/bash
s='UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS'
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($1))} END{print r}' <<< "$s"
Output:
GMUCHOS
Using GNU awk and gensub:
$ gawk '{print gensub(/([^ ]+)([^ ])( |$)/,"\\2","g")}' file
Output:
GMUCHOS
1st solution: With GNU awk you could try the following awk program, written and tested with the shown samples.
awk -v RS='.([[:space:]]+|$)' 'RT{gsub(/[[:space:]]+/,"",RT);val=val RT} END{print val}' Input_file
Explanation: Set the record separator to any character followed by spaces OR the end of the line. Then, as per the OP's requirement, remove unnecessary newlines/spaces from the fetched value (RT) and keep appending it to val; finally, when the awk program is done reading the whole Input_file, print the value of that variable.
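As a quick check (assuming Input_file holds the sample line from the question), this should print:
GMUCHOS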
2nd solution: Set the record separator to null and use the match function to match the regex (.[[:space:]]+)|(.$), which captures the last letter of each word; with each match found, keep adding the matched value to a variable, and at last, in the END block of the awk program, print the variable's value.
awk -v RS= '
{
while(match($0,/(.[[:space:]]+)|(.$)/)){
val=val substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
}
END{
gsub(/[[:space:]]+/,"",val)
print val
}
' Input_file
Simple substitutions on individual lines is the job sed exists to do:
$ sed 's/[^ ]*\([^ ]\) */\1/g' file
GMUCHOS
using many tools
$ tr -s ' ' '\n' <file | rev | cut -c1 | paste -sd'\0'
GMUCHOS
Separate the words onto lines, reverse them so that we can pick the first char easily, and finally paste them back together without a delimiter. Not the shortest solution, but I think the simplest one...
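To see how the pipeline builds up (a sketch against the sample line), the first three stages leave one letter per line, which paste then glues back together:
$ tr -s ' ' '\n' <file | rev | cut -c1
G
M
U
C
H
O
S
$ tr -s ' ' '\n' <file | rev | cut -c1 | paste -sd'\0'
GMUCHOS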
I would harness GNU AWK for this as follows. Let file.txt content be
UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS
then
awk 'BEGIN{FPAT="[[:alpha:]]\\>";OFS=""}{$1=$1;print}' file.txt
output
GMUCHOS
Explanation: Inform AWK to treat any alphabetic character at the end of a word as a field, and to use the empty string as the output field separator. $1=$1 is used to trigger rebuilding of the line using the specified OFS. If you want to know more about start/end of word, read GNU Regexp Operators.
(tested in gawk 4.2.1)
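If you want to see what that FPAT actually captures, here is a small ad-hoc check (the echo input is just illustrative); each field is the word-final letter:
$ echo 'UDACBG UYAZAM' | gawk 'BEGIN{FPAT="[[:alpha:]]\\>"}{print NF, $1, $2}'
2 G M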
Another solution with GNU awk:
awk '{$0=gensub(/[^[:space:]]*([[:alpha:]])/, "\\1","g"); gsub(/\s/,"")} 1' file
GMUCHOS
gensub() gets here the characters and gsub() removes the spaces between them.
or using patsplit():
awk 'n=patsplit($0, a, /[[:alpha:]]\>/) { for (i in a) printf "%s", a[i]} i==n {print ""}' file
GMUCHOS
An alternate approach with GNU awk is to use FPAT to define the content to keep rather than what to split on:
gawk 'BEGIN{FPAT="\\S\\>"}
{ s=""
for (i=1; i<=NF; i++) s=s $i
print s
}' file
GMUCHOS
Or, more tersely and idiomatically:
gawk 'BEGIN{FPAT="\\S\\>";OFS=""}{$1=$1}1' file
GMUCHOS
(Thanks Daweo for this)
You can also use gensub with:
gawk '{print gensub(/\S*(\S\>)\s*/,"\\1","g")}' file
GMUCHOS
The advantage of both here is that single-letter "words" are handled properly:
s2='SINGLE X LETTER Z'
gawk 'BEGIN{FPAT="\\S\\>";OFS=""}{$1=$1}1' <<< "$s2"
EXRZ
gawk '{print gensub(/\S*(\S\>)\s*/,"\\1","g")}' <<< "$s2"
EXRZ
Where the accepted answer and most here do not:
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($1))} END{print r}' <<< "$s2"
ER # WRONG
gawk '{print gensub(/([^ ]+)([^ ])( |$)/,"\\2","g")}' <<< "$s2"
EX RZ # WRONG
I have some basic knowledge of using regular expressions with grep (bash).
But I want to use regular expressions the other way around.
For example I have a file containing the following entries:
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
Now I want to use bash to figure out which line a particular number matches.
For example:
grep 8 file
should return:
line_three=[7-9]
Note: I am aware that the example of "grep 8 file" doesn't make sense, but I hope it helps to understand what I am trying to achieve.
Thanks for your help,
Marcel
As others have pointed out, awk is the right tool for this:
awk -F'=' '8~$2{print $0;}' file
... and if you want this tool to feel more like grep, a quick bash wrapper:
#!/bin/bash
awk -F'=' -v seek_value="$1" 'seek_value~$2{print $0;}' "$2"
Which would run like:
./not_exactly_grep.sh 8 file
line_three=[7-9]
My first impression is that this is not a task for grep, maybe for awk.
Trying to do things with grep I only see this:
for line in $(cat file); do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done
Using while for file reading (following comments):
while IFS= read -r line; do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done < file
This can be done in native bash using the syntax [[ $value =~ $regex ]] to test:
find_regex_matching() {
local value=$1
while IFS= read -r line; do # read from input line-by-line
[[ $line = *=* ]] || continue # skip lines not containing an =
regex=${line#*=} # prune everything before the = for the regex
if [[ $value =~ $regex ]]; then # test whether we match...
printf '%s\n' "$line" # ...and print if we do.
fi
done
}
...used as:
find_regex_matching 8 <file
...or, to test it with your sample input inline:
find_regex_matching 8 <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF
...which properly emits:
line_three=[7-9]
You could replace printf '%s\n' "$line" with printf '%s\n' "${line%%=*}" to print only the key (contents before the =), if so inclined. See the bash-hackers page on parameter expansion for a rundown on the syntax involved.
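For example, with that one substitution made, the same call should print just the key:
$ find_regex_matching 8 <file
line_three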
This is not built-in functionality of grep, but it's easy to do with awk, with a change in syntax:
/[0-3]/ { print "line one" }
/[4-6]/ { print "line two" }
/[7-9]/ { print "line three" }
If you really need to, you could programmatically change your input file to this syntax, if it doesn't contain any characters that need escaping (mainly / in the regex or " in the string):
sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#'
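For example (a sketch relying on bash process substitution), you can feed the generated program straight to awk and pipe the number in as the text to match:
$ echo 8 | awk -f <(sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#' file)
line_three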
As I understand it, you are looking for a range that includes some value.
You can do this in gawk:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
$ awk -v n=8 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[7-9]
Since the digits are being treated as numbers (vs a regex) it supports larger ranges:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[75-95]
line_four=[55-105]
$ awk -v n=92 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[75-95]
line_four=[55-105]
If you are just looking to interpret the right hand side of the = as a regex, you can do:
$ awk -F= -v tgt=8 'tgt~$2' /tmp/file
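With the same /tmp/file, that should print just the matching line:
line_three=[7-9]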
You could do something like:
grep -Ef <(cut -d= -f2 file) <(echo 8)
This will grep what you want, but will not show where it matched.
With sed you can show a message instead:
echo "8" | sed -n '/[7-9]/ s/.*/Found it in line_three/p'
Now you want to transform your regexp file into such commands:
sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file
Store these commands in a virtual command file and you will have
echo "8" | sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file)
I have a CSV file in which every column contains unnecessary extra spaces added to it before the actual value. I want to create a new CSV file by removing all the spaces.
For example
One line in input CSV file
123, ste hen, 456, out put
Expected output CSV file
123,ste hen,456,out put
I tried using awk to trim each column but it didn't work.
This sed should work:
sed -i.bak -E 's/(^|,)[[:blank:]]+/\1/g; s/[[:blank:]]+(,|$)/\1/g' file.csv
This will remove leading spaces, trailing spaces and spaces around commas.
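To preview the result without modifying the file (a sketch that simply drops -i.bak and pipes a sample line through), you should get:
$ echo "123, ste hen, 456, out put" | sed -E 's/(^|,)[[:blank:]]+/\1/g; s/[[:blank:]]+(,|$)/\1/g'
123,ste hen,456,out put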
Update: Here is an awk command to do the same:
awk -F '[[:blank:]]*,[[:blank:]]*' -v OFS=, '{
gsub(/^[[:blank:]]+|[[:blank:]]+$/, ""); $1=$1} 1' file
awk is your friend.
Input
$ cat 38609590.txt
Ted Winter, Evelyn Salt, Peabody
Ulrich, Ethan Hunt, Wallace
James Bond, Q, M
(blank line)
Script
$ awk '/^$/{next}{sub(/^[[:blank:]]*/,"");gsub(/[[:blank:]]*,[[:blank:]]*/,",")}1' 38609590.txt
Output
Ted Winter,Evelyn Salt,Peabody
Ulrich,Ethan Hunt,Wallace
James Bond,Q,M
Note
This one removes the blank lines too - /^$/{next}.
See the awk manual for more information.
To remove leading blank chars with sed:
$ sed -E 's/(^|,) +/\1/g' file
123,ste hen,456,out put
With GNU awk:
$ awk '{$0=gensub(/(^|,) +/,"\\1","g")}1' file
123,ste hen,456,out put
With other awks:
$ awk '{sub(/^ +/,""); gsub(/, +/,",")}1' file
123,ste hen,456,out put
To remove blank chars before and after the values with sed:
$ sed -E 's/ *(^|,|$) */\1/g' file
123,ste hen,456,out put
With GNU awk:
$ awk '{$0=gensub(/ *(^|,|$) */,"\\1","g")}1' file
123,ste hen,456,out put
With other awks:
$ awk '{gsub(/^ +| +$/,""); gsub(/ *, */,",")}1' file
123,ste hen,456,out put
Change " " (a single blank char) to [[:blank:]] if you can have tabs as well as blank chars.
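For example, the last sed above would then read (a sketch of that one substitution, behaviour otherwise unchanged):
$ sed -E 's/[[:blank:]]*(^|,|$)[[:blank:]]*/\1/g' file
123,ste hen,456,out put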
echo " 123, ste hen, 456, out put" | awk '{sub(/^ +/,""); gsub(/, /,",")}1'
123,ste hen,456,out put
Another way to do it with awk, removing multiple leading white-spaces, is as below:
$ awk 'BEGIN{FS=OFS=","} {s = ""; for (i = 1; i <= NF; i++) gsub(/^[ \t]+/,"",$i);} 1' <<< "123, ste hen, 456, out put"
123,ste hen,456,out put
FS=OFS="," sets the input and output field separator to ,
s = ""; for (i = 1; i <= NF; i++) loops across each column entry up to the end (i.e. from $1,$2...NF) and the gsub(/^[ \t]+/,"",$i) trims only the leading white-space and not anywhere else (one ore more white-space, note the +) from each column.
If you want to do this for an entire file, I suggest using a simple script like the one below:
#!/bin/bash
# Output written to the file 'output.csv' in the same path
while IFS= read -r line || [[ -n "$line" ]]; do # Not setting IFS here, all done in 'awk', || condition for handling empty lines
awk 'BEGIN{FS=OFS=","} {s = ""; for (i = 1; i <= NF; i++) gsub(/^[ \t]+/,"",$i);} 1' <<< "$line" >> output.csv
done <input.csv
$ cat > test.in
123, ste hen, 456, out put
$ awk -F',' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(/^ +| +$/,"",$i); print $0}' test.in
123,ste hen,456,out put
or written out loud:
BEGIN {
FS="," # set the input field separator
OFS="," # and the output field separator
}
{
for (i=1;i<=NF;i++) # loop thru every field on record
gsub(/^ +| +$/,"",$i) # remove leading and trailing spaces
print $0 # print out the trimmed record
}
Run with:
$ awk -f test.awk test.in
awk -F' *, *' '$1=$1' OFS=, file_path
You could try the following (with your file at ~/path/file.csv):
cat ~/path/file.csv | tr -d "\ "
sed "s/, /,/g" ~/path/file.csv
I'm trying to process my text.
For example I got:
asdf asdf get.this random random get.that
get.it this.no also.this.no
My desired output is:
get.this get.that
get.it
So the regexp should catch only this pattern (get.\w), but it has to do it repeatedly because of multiple occurrences in one line, so the easiest way with sed,
sed 's/.*(REGEX).*/\1/'
does not work (it shows only the first occurrence).
Probably the right way is to use grep -o, but I have an old version of grep and the -o flag is not available.
This grep may give what you need:
grep -o "get[^ ]*" file
Try awk:
awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
You might need to tweak the regex between the slashes for your specific issue. Sample output:
$ awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
get.this
get.that
get.it
With awk:
awk -v patt="^get" '{
for (i=1; i<=NF; i++)
if ($i ~ patt)
printf "%s%s", $i, OFS;
print ""
}' <<< "$text"
bash
while read -a words; do
for word in "${words[#]}"; do
if [[ $word == get* ]]; then
echo -n "$word "
fi
done
echo
done <<< "$text"
perl
perl -lane 'print join " ", grep {$_ =~ /^get/} @F' <<< "$text"
This might work for you (GNU sed):
sed -r '/\bget\.\S+/{s//\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1 /g;s/ $//}' file
or if you want one per line:
sed -r '/\n/!s/\bget\.\S+/\n&\n/g;/^get/P;D' file
Could someone please help me write a script to translate the third word in each line, with words being separated by tabs.
Sample input:
Hello how Are You
Iam Fine how about
Sample output:
Hello how Ziv You
Iam Fine sld about
The third word in each line should be translated as if using: tr '[abcdefghijklmnopqrstuvqxyz]' '[zyxwvutsrqponmlkjihgfedcba]'
Given the following:
[somedude@dev7 ~]# cat so.txt
Hello how Are You
Iam Fine how about
[somedude@dev7 ~]#
I'd run:
[somedude@dev7 ~]# cat so.sh
#!/bin/bash
_INPUT="Hello how Are You
Iam Fine how about"
# read each line from config file
while read -r l
do
_GET_THIRD_WORD=$(echo "$l" | awk '{print $3}')
echo "$_GET_THIRD_WORD" | sed "s,$_GET_THIRD_WORD,SOMETHINGTOTRANSLATEWITH,"
done < so.txt
[somedude@dev7 ~]#
This will echo out each of your translated lines to standard out.
Hope this helps!
Just bash:
#!/bin/bash
while read -ra A; do
printf "%s\t%s" "${A[0]}" "${A[1]}"
printf "\t%s" "$(echo "${A[2]}" | tr '[ABCDEFGHIJKLMNOPQRSTUVQXYZabcdefghijklmnopqrstuvqxyz]' '[ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba]')" "${A[#]:3}"
echo
done
Run as:
bash script.sh < input_file
Output:
Hello how Aiv You
Iam Fine slw about
If you change \t to a space:
Hello how Ziv You
Iam Fine slw about
Another version:
#!/bin/bash
F=({A..Z} {a..z}) R=({Z..A} {z..a})
while read -ra A; do
printf "%s\t%s" "${A[0]}" "${A[1]}"
printf "\t%s" "$(IFS=''; echo "${A[2]}" | tr "[${F[*]}]" "[${R[*]}]")" "${A[#]:3}"
echo
done
This is very kludgy but gets the job done (in the bash shell). It uses sed's y transliterate operator on the entire input file. This is passed via process substitution to awk, and the third field is stored in an array. Awk then loops through the original file and replaces each line's third field with the transliterated value.
awk -F'\t' -v OFS='\t' 'NR == FNR{a[NR]=$3; next};{$3=a[FNR]; print}' \
<(sed -e 'y/abcdefghijklmnopqrstuvqxyz/zyxwvutsrqponmlkjihgfedcba/' \
-e 'y/ABCDEFGHIJKLMNOPQRSTUVQXYZ/ZYXWVUTSRQPONMLKJIHGFEDCBA/' file) file
An AWK script, something like this:
#!/usr/bin/awk -f
BEGIN{
IFS="\t" #input field separator as tab
CHARSET = "abcdefghijklmnopqrstuvwxyz"
}
{
rep_str="" #replacement string
# loop through each char of the third word
for(i=1;i<=length($3);i++){
char = substr($3,i,1)
loc = index(CHARSET,tolower(char))
#check to see if the character is actually an alphabetic character
if(loc>0){
#get the reverse location of char in the CHARSET
rep_char = substr(CHARSET,27-loc,1)
#change the replacement character to upper case if the original char is uppercase
if(char~/[A-Z]/){
rep_char = toupper(rep_char)
}
}else{
rep_char = char
}
rep_str=rep_str rep_char #final replacement string formed by concatenating the replaced char rep_char
}
$3 = rep_str
print $0
}
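A possible way to run it (a sketch; translate.awk and so.txt are just placeholder names), assuming the input file is tab-separated as in the question:
$ awk -f translate.awk so.txt
Hello how Ziv You
Iam Fine sld about
Note that the rebuilt line is joined with the default OFS (a single space); set OFS="\t" in the BEGIN block if you want tabs in the output as well.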
perl -F'\t' -lane '$F[2] =~ tr/ABCDEFGHIJKLMNOPQRSTUVQXYZabcdefghijklmnopqrstuvqxyz/ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba/ ; print "@F"' Filename