Work only with lines that have 4 words in SED - regex

How to work with lines that have only 4 words in them with SED?
That’s what I managed to do, but it’s not working:
sed -e '/[ ]*[^ ]+[ ]*[^ ]+[ ]*[^ ]+[ ]*[^ ]+[ ]*/!d' -e 'other commands...' fileName

This should be portable:
sed -n '
/^[[:blank:]]*\([[:alpha:]]\{1,\}[[:blank:]]\{1,\}\)\{3\}[[:alpha:]]\{1,\}[[:blank:]]*$/ {
# capture the 1st word of every 4 word line and print it 3 times
s/^[[:blank:]]*\([[:alpha:]]\{1,\}\).*/\1 \1 \1/
p
}
' > temp-file

AWK may be the easier tool for your task. Just check the number of fields in a line is equal to four using the awk built-in variable NF.
awk 'NF==4' filename
would be a good starting point. if you wish to write the changes to file, you can use the inplace edit option of the GNU AWK like below
gawk -i inplace 'NF==4' filename

Related

Find all text between $...$ delimiters using bash script

I have a text file, and I'm trying to get an array of strings containing between $..$ delimiters (LaTeX formulas) using bash script. My current code doesn't work, result is empty:
#!/bin/bash
array=($(grep -o '\$([^\$]*)\$' test.txt))
echo ${array[#]}
I tested this regex here, it finds the matches. I use the following test string:
b5f1e7$bfc2439c621353$d1ce0$629f$b8b5
Expected result is
bfc2439c621353 629f
But echo returns empty. Although if I use '[0-9]\+' it works:
5 1 7 2439 621353 1 0 629 8 5
What do I do wrong?
How about:
grep -o '\$[^$]*\$' test.txt | tr -d '$'
This is basically performing your original grep (but without the brackets, which were causing it to not match), then removing the first/last characters from each match.
You may use awk with input field separator as $:
s='b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
awk -F '$' '{for (i=2; i<=NF; i+=2) print $i}' <<< "$s"
Note that this awk command doesn't validate input. If you want awk to allow for only valid inputs then you may use this gnu awk command with FPAT:
awk -v FPAT='\\$[^$]*\\$' '{for (i=1; i<=NF; i++) {gsub(/\$/, "", $i); print $i}}' <<< "$s"
bfc2439c621353
629f
What about this?
grep -Eo '\$[^$]+\$' a.txt | sed 's/\$//g'
I'm using sed to replace the $.
Try escaping your braces:
tst> grep -o '\$\([^\$]*\)\$' test.txt
$bfc2439c621353$
$629f$
of course, you then have to strip out the $ signs (-o prints the entire match). You can try sed instead:
tst> sed 's/[^\$]*\$\([^\$]*\)\$[^\$]*/\1\n/g' test.txt
bfc2439c621353
629f
Why is your expected output given b5f1e7$bfc2439c621353$d1ce0$629f$b8b5 the two elements bfc2439c621353 629f rather than the three elements bfc2439c621353 d1ce0 629f?
Here's a single grep command to extract those:
$ grep -Po '\$\K[^\$]*(?=\$)' <<<'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
bfc2439c621353
d1ce0
629f
(This requires GNU grep as compiled with libpcre for -P)
This uses \$\K (equivalent to (?<=\$)to look behind at the first $ and (?=\$) to look ahead to the next $. Since these are lookarounds, they are not absorbed by grep in the process and therefore d1ce0 is available to be found.
Here's a single POSIX sed command to extract those:
$ sed 's/^[^$]*\$//; s/\$[^$]*$//; s/\$/\n/g' \
<<<'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
bfc2439c621353
d1ce0
629f
This does not use any GNU notation and should work on any POSIX-compatible system (such as OS X). It removes the leading and trailing portions that aren't wanted, then replaces each $ with a newline.
Using bash regex:
var="b5f1e7\$bfc2439c621353\$d1ce0\$629f\$b8b5" # string to var
while [[ $var =~ ([^$]*\$)([^$]*)\$(.*) ]] # matching
do
echo -n "${BASH_REMATCH[2]} " # 2nd element has the match
var="${BASH_REMATCH[3]}" # 3rd is the rest of the string
done
echo # trailing newline
bfc2439c621353 629f

How to display words as per given number of letters?

I have created this basic script:
#!/bin/bash
file="/usr/share/dict/words"
var=2
sed -n "/^$var$/p" /usr/share/dict/words
However, it's not working as required to be (or still need some more logic to put in it).
Here, it should print only 2 letter words but with this it is giving different output
Can anyone suggest ideas on how to achieve this with sed or with awk?
it should print only 2 letter words
Your sed command is just searching for lines with 2 in text.
You can use awk for this:
awk 'length() == 2' file
Or using a shell variable:
awk -v n=$var 'length() == n' file
What you are executing is:
sed -n "/^2$/p" /usr/share/dict/words
This means: all lines consisting in exactly the number 2, nothing else. Of course this does not return anything, since /usr/share/dict/words has words and not numbers (as far as I know).
If you want to print those lines consisting in two characters, you need to use something like .. (since . matches any character):
sed -n "/^..$/p" /usr/share/dict/words
To make the number of characters variable, use a quantifier {} like (note the usage of \ to have sed's BRE understand properly):
sed -n "/^.\{2\}$/p" /usr/share/dict/words
Or, with a variable:
sed -n '/^.\{'"$var"'\}$/p' /usr/share/dict/words
Note that we are putting the variable outside the quotes for safety (thanks Ed Morton in comments for the reminder).
Pure bash... :)
file="/usr/share/dict/words"
var=2
#building a regex
str=$(printf "%${var}s")
re="^${str// /.}$"
while read -r word
do
[[ "$word" =~ $re ]] && echo "$word"
done < "$file"
It builds a regex in a form ^..$ (the number of dots is variable). So doing it in 2 steps:
create a string of the desired length e.g: %2s. without args the printf prints only the filler spaces for the desired length e.g.: 2
but we have a variable var, therefore %${var}s
replace all spaces in the string with .
but don't use this solution. It is too slow, and here are better utilities for this, best is imho grep.
file="/usr/share/dict/words"
var=5
grep -P "^\w{$var}$" "$file"
Try awk-
awk -v var=2 '{if (length($0) == var) print $0}' /usr/share/dict/words
This can be shortened to
awk -v var=2 'length($0) == var' /usr/share/dict/words
which has the same effect.
To output only lines matching 2 alphabetic characters with grep:
grep '^[[:alpha:]]\{2\}$' /usr/share/dict/words
GNU awk and mawk at least (due to empty FS):
$ awk -F '' 'NF==2' /usr/share/dict/words #| head -5
aa
Ab
ad
ae
Ah
Empty FS separates each character on its own field so NF tells the record length.

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is most elegant and simple method to achieve this; sed, awk or just unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
#EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After
This will do it in awk:
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2; i++) printf"-%s",$i}'

Using sed to find and replace within matched substrings

I'd like to use sed to process a property file such as:
java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace
I'd like to replace the .'s and -'s with _'s but only up to the ='s token. The output would be
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
I've tried various approaches including using addresses but I keep failing. Does anybody know how to do this?
What about...
$ echo foo.bar=/bla/bla-bla | sed -e 's/\([^-.]*\)[-.]\([^-.]*=.*\)/\1_\2/'
foo_bar=/bla/bla-bla
This won't work for the case where you have more than 1 dot or dash one the left, though. I'll have to think about it further.
awk makes life easier in this case:
awk -F= -vOFS="=" '{gsub(/[.-]/,"_",$1)}1' file
here you go:
kent$ echo "java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace"|awk -F= -vOFS="=" '{gsub(/[.-]/,"_",$1)}1'
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
if you really want to do with sed (gnu sed)
sed -r 's/([^=]*)(.*)/echo -n \1 \|sed -r "s:[-.]:_:g"; echo -n \2/ge' file
same example:
kent$ echo "java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace"|sed -r 's/([^=]*)(.*)/echo -n \1 \|sed -r "s:[-.]:_:g"; echo -n \2/ge'
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
In this case I would use AWK instead of sed:
awk -F"=" '{gsub("\\.|-","_",$1); print $1"="$2;}' <file.properties>
Output:
java_home/usr/bin/java
groovy_home/usr/lib/groovy
workspace_home/build/me/my-workspace
This might work for you (GNU sed):
sed -r 's/=/\n&/;h;y/-./__/;G;s/\n.*\n//' file
"You wait ages for a bus..."
This works with any number of dots and hyphens in the line and does not require GNU sed:
sed 'h; s/.*=//; x; s/=.*//; s/[.-]/_/g; G; s/\n/=/' < data
Here's how:
h: save a copy of the line in the hold space
s: throw away everything before the equal sign in the pattern space
x: swap the pattern and hold
s: blow away everything after the = in the pattern
s: replaces dots and hyphens with underscores
G: join the pattern and hold with a newline
s: replace that newline with an equal to glue it all back together
Other way using sed
sed -re 's/(.*)([.-])(.*)=(.*)/\1_\3=\4/g' temp.txt
Output
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
In case there are more than .- on left hand side then this
sed -re ':a; s/^([^.-]+)([\.-])(.*)=/\1_\3=/1;t a' temp.txt

How to seek forward and replace selected characters with sed

Can I use sed to replace selected characters, for example H => X, 1 => 2, but first seek forward so that characters in first groups are not replaced.
Sample data:
"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";
How it should be after sed:
"Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
What I have tried:
Nothing really, I would try but everything I know about sed expressions seems to be wrong.
Ok, I have tried to capture ([^;]+) and "skip" (get em back using ´\1\2´...) first groups separated by ;, this is working fine but then comes problem, if I use capturing I need to select whole group and if I don't use capturing I'll lose data.
This is possible with sed, but is kinda tedious. To do the translation if field number $FIELD you can use the following:
sed 's/\(\([^;]*;\)\{'$((FIELD-1))'\}\)\([^;]*;\)/\1\n\3\n/;h;s/[^\n]*\n\([^\n]*\).*/\1/;y/H1/X2/;G;s/\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)/\2\1\4/'
Or, reducing the number of brackets with GNU sed:
sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
Example:
$ FIELD=3
$ echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
"Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
$ FIELD=2
$ echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
"Hello World";"Number 2 is there";"tH1s-Has,1,HHunKnownData";
There may be a simpler way that I didn't think of, though.
If awk is ok for you:
awk -F";" '{gsub("H","X",$3);gsub("1","2",$3);}1' OFS=";" file
Using -F, the file is split with semi-colon as delimiter, and hence now the 3rd field($3) is of our interest. gsub function substitutes all occurences of H with X in the 3rd field, and again 1 to 2.
1 is to print every line.
[UPDATE]
(I just realized that it could be shorter. Perl has an auto-split mode):
$F[2] =~ s/H/X/g; $F[2] =~ s/1/2/g; $_=join(";",#F)
Perl is not known for being particularly readable, but in this case I suspect the best you can get with sed might not be as clear as with Perl:
echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' |
perl -F';' -ape '$F[2] =~ s/H/X/g; $F[2] =~ s/1/2/g; $_=join(";",#F)'
Taking apart the Perl code:
# your groups are in #F, accessed as $F[$i]
$F[2] =~ s/H/X/g; # Do whatever you want with your chosen (Nth) group.
$F[2] =~ s/1/2/g;
$_ = join(";", #F) # Put them back together.
perl -pe is like sed. (sort of.)
and perl -F';' -ape means use auto-splitting (-a) and set the field separator to ';'. Then your groups are accessible via $F[i] - so it works slightly like awk, too.
So it would also work like perl -F';' -ape '/*your code*/' < inputfile
I know you asked for a sed solution - I often find myself switching to Perl (though I do still like sed) for one-liners.
awk -F";" '{gsub("H","X",$3);gsub("1","2",$3);}1' Your_file
This might work for you (GNU sed):
sed 's/H/X/2g;s/1/2/2g' file
This changes all but the first occurrence of H or 1 to X or 2 respectively
If it's by fields separated by ;'s, use:
sed 's/H[^;]*;/&\n/;h;y/H/X/;H;g;s/\n.*\n//;s/1[^;]*;/&\n/;h;y/1/2/;H;g;s/\n.*\n//' file
This can be mutated to cater for many values, so:
echo -e "H=X\n1=2"|
sed -r 's|(.*)=(.*)|s/\1[^;]*;/\&\\n/;h;y/\1/\2/;H;g;s/\\n.*\\n//|' |
sed -f - file