Inserting a "," in a particular position of a text

(I have included the exact text and the command I executed, so it may look a bit messy.)
I have a .TXT file that looks like this:
11111111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111111
The output I am looking for is:
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
The command I tried is:
sed -i 's/\(.\{14\}\)\(.\{7\}\)\(.\{2\}\)\(.\{1\}\)\(.\{3\}\)\(.\{13\}\)\(.\{1\}\)\(.\{8\}\)\(.\{16\}\)\(.\{3\}\)/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,/' SOME.TXT
and the output I got is:
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111
I have no idea why these 0s suddenly pop up, or why the ',' doesn't appear in the positions I specified, even though it works halfway.
Is this a bug in sed, or something wrong with my command?

It prints 0 in the output because sed supports only nine back-references (\1 through \9), so \10 is interpreted as \1 followed by a literal 0.
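A quick way to see this for yourself: with just two groups, a replacement of \10 yields group 1 followed by a literal 0 (a minimal demonstration of the behavior, not part of the fix):

```shell
#!/bin/sh
# \10 is read as back-reference \1 followed by a literal "0".
echo ab | sed 's/\(a\)\(b\)/\10/'   # prints a0
```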
You can solve it easily using the FIELDWIDTHS feature of GNU awk (the trailing * field, which captures the rest of the line, needs gawk 4.2 or later):
awk -v OFS=, 'BEGIN { FIELDWIDTHS = "14 7 2 1 3 13 1 8 16 3 *" } {$1 = $1} 1' file
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
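If gawk isn't available (FIELDWIDTHS is a gawk extension, so mawk and the BSD/macOS awk don't have it), a minimal sketch with plain POSIX substr() can do the same job. The helper name split_fixed is hypothetical; the widths are passed as a space-separated string, and whatever is left after the last width becomes one final field, like the trailing * above:

```shell
#!/bin/sh
# split_fixed (hypothetical helper): insert OFS between fixed-width fields
# using only POSIX awk features (split, substr), so it also runs on mawk/BSD awk.
# $1: space-separated field widths; any remaining characters form a last field.
split_fixed() {
    awk -v widths="$1" -v OFS=, '
    BEGIN { n = split(widths, w, " ") }
    {
        out = ""
        pos = 1
        for (i = 1; i <= n; i++) {
            out = out substr($0, pos, w[i]) OFS   # next fixed-width field
            pos += w[i]
        }
        print out substr($0, pos)               # remainder of the line
    }'
}

printf 'abcdefghij\n' | split_fixed '3 2 4'    # prints abc,de,fghi,j
# For the question's data: split_fixed '14 7 2 1 3 13 1 8 16 3' < SOME.TXT
```

Passing the widths as data rather than hard-coding them into a regex also sidesteps the nine-group limit entirely.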
Just as an academic exercise, here is a working sed command that solves this using two substitutions:
sed -E 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.+)/\1,\2,\3,\4,\5,\6,\7,\8,\9/; s/(.+,.{16})(.{3})(.*)/\1,\2,\3/' file

sed can't reference capture groups beyond \9, but Perl can:
perl -i -pe 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.{16})(.{3})/$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,/' SOME.TXT

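If you insist on using sed, you can insert each comma with a numbered s command (s/./&,/N adds a comma after the Nth character), working from the rightmost position to the leftmost so that earlier insertions don't shift the later positions: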
sed 's/./&,/68;s/./&,/65;s/./&,/49;s/./&,/41;s/./&,/40;s/./&,/27;s/./&,/24;s/./&,/23;s/./&,/21;s/./&,/14' test.txt
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
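Working out those positions by hand is error-prone. As a sketch (widths_to_sed is a hypothetical helper, not a standard tool), awk can compute the running totals of the field widths and print the s commands in descending order:

```shell
#!/bin/sh
# widths_to_sed (hypothetical helper): turn field widths into a sed script
# that appends a comma after each cumulative position, rightmost first.
widths_to_sed() {
    printf '%s\n' "$1" | awk '{
        pos = 0
        for (i = 1; i <= NF; i++) {   # running totals: 14, 21, 23, ...
            pos += $i
            p[i] = pos
        }
        script = ""
        for (i = NF; i >= 1; i--) {   # emit positions in descending order
            sep = (i < NF) ? ";" : ""
            script = script sep "s/./&,/" p[i]
        }
        print script
    }'
}

widths_to_sed '14 7 2 1 3 13 1 8 16 3'   # prints the same script as the sed command above
printf 'abcdefghij\n' | sed "$(widths_to_sed '3 2 4')"   # prints abc,de,fghi,j
```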

Related

Work only with lines that have 4 words in SED

How can I work only with the lines that have exactly 4 words in them, using sed?
This is what I managed to do, but it's not working:
sed -e '/[ ]*[^ ]+[ ]*[^ ]+[ ]*[^ ]+[ ]*[^ ]+[ ]*/!d' -e 'other commands...' fileName

This should be portable:
sed -n '
/^[[:blank:]]*\([[:alpha:]]\{1,\}[[:blank:]]\{1,\}\)\{3\}[[:alpha:]]\{1,\}[[:blank:]]*$/ {
# capture the 1st word of every 4 word line and print it 3 times
s/^[[:blank:]]*\([[:alpha:]]\{1,\}\).*/\1 \1 \1/
p
}
' fileName > temp-file

AWK may be the easier tool for this task: just check that the number of fields in a line equals four, using awk's built-in variable NF.
awk 'NF==4' filename
would be a good starting point. If you want to write the changes back to the file, you can use GNU awk's in-place editing extension:
gawk -i inplace 'NF==4' filename
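To see how the two answers differ, here is a small comparison on made-up sample lines (the function names are hypothetical). The sed regex only accepts purely alphabetic words, while NF == 4 counts any four whitespace-separated tokens, so a line like a b c 4 passes the awk test but not the sed one:

```shell
#!/bin/sh
# Hypothetical helpers comparing the two answers on the same input.
four_words_sed() {
    # Portable BRE from the answer above: exactly four alphabetic words.
    sed -n '/^[[:blank:]]*\([[:alpha:]]\{1,\}[[:blank:]]\{1,\}\)\{3\}[[:alpha:]]\{1,\}[[:blank:]]*$/p'
}
four_words_awk() {
    # NF counts any whitespace-separated fields, alphabetic or not.
    awk 'NF == 4'
}

sample='one two three four
one two three
  alpha beta gamma delta
a b c 4'

printf '%s\n' "$sample" | four_words_sed   # 2 lines: the 1st and 3rd sample lines
printf '%s\n' "$sample" | four_words_awk   # 3 lines: the same two plus "a b c 4"
```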

How to display words as per given number of letters?

I have created this basic script:
#!/bin/bash
file="/usr/share/dict/words"
var=2
sed -n "/^$var$/p" /usr/share/dict/words
However, it's not working as required (or it still needs more logic).
It should print only 2-letter words, but it gives different output.
Can anyone suggest ideas on how to achieve this with sed or with awk?

it should print only 2 letter words
Your sed command only matches lines consisting of the text 2.
You can use awk for this:
awk 'length() == 2' file
Or using a shell variable:
awk -v n="$var" 'length() == n' file

What you are executing is:
sed -n "/^2$/p" /usr/share/dict/words
This means: all lines consisting of exactly the number 2, nothing else. Of course this does not return anything, since /usr/share/dict/words contains words, not numbers (as far as I know).
If you want to print the lines consisting of two characters, you need something like .. (since . matches any character):
sed -n "/^..$/p" /usr/share/dict/words
To make the number of characters variable, use the interval quantifier {} (note that in sed's basic regular expressions the braces must be escaped as \{ and \}):
sed -n "/^.\{2\}$/p" /usr/share/dict/words
Or, with a variable:
sed -n '/^.\{'"$var"'\}$/p' /usr/share/dict/words
Note that the sed script is kept in single quotes and only the variable is expanded, inside double quotes, for safety (thanks to Ed Morton in the comments for the reminder).
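A sketch wrapping the variable version in a function (words_of_length is a hypothetical name). Because the number is spliced into the sed script, it is checked to be all digits first; with no file argument, the function reads standard input:

```shell
#!/bin/sh
# words_of_length (hypothetical helper): print lines of exactly N characters.
# $1: the length N; $2: optional file (reads standard input if omitted).
words_of_length() {
    case $1 in
        ''|*[!0-9]*)   # refuse anything but digits before splicing into sed
            echo "words_of_length: not a number: $1" >&2
            return 1 ;;
    esac
    sed -n '/^.\{'"$1"'\}$/p' ${2:+"$2"}
}

# words_of_length 2 /usr/share/dict/words
printf 'a\nab\nabc\nxy\n' | words_of_length 2   # prints ab and xy
```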

Pure bash... :)
file="/usr/share/dict/words"
var=2
#building a regex
str=$(printf "%${var}s")
re="^${str// /.}$"
while read -r word
do
[[ "$word" =~ $re ]] && echo "$word"
done < "$file"
It builds a regex of the form ^..$ (the number of dots is variable) in two steps:
1. Create a string of spaces of the desired length. With a format such as %2s and no arguments, printf prints only the padding spaces (2 of them here); since the length is in the variable var, the format is %${var}s.
2. Replace every space in the string with . (the "match any character" metacharacter).
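For shells without the ${var// /.} substitution, the same two steps can be sketched in POSIX sh, with tr doing the replacement (dot_regex is a hypothetical helper name, and it assumes a length of at least 1):

```shell
#!/bin/sh
# dot_regex (hypothetical helper): build ^..$ with N dots in POSIX sh.
dot_regex() {
    # Step 1: printf pads an empty argument to $1 spaces.
    # Step 2: tr turns each space into ".", which matches any character.
    printf '^%s$\n' "$(printf "%${1}s" '' | tr ' ' '.')"
}

dot_regex 2                                          # prints ^..$
printf 'a\nab\nabc\n' | grep -- "$(dot_regex 2)"     # prints ab
```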
But don't use this solution: it is too slow, and there are better utilities for the job; the best, IMHO, is grep (the -P option below requires GNU grep):
file="/usr/share/dict/words"
var=5
grep -P "^\w{$var}$" "$file"

Try awk:
awk -v var=2 '{if (length($0) == var) print $0}' /usr/share/dict/words
This can be shortened to
awk -v var=2 'length($0) == var' /usr/share/dict/words
which has the same effect.

To output only lines matching 2 alphabetic characters with grep:
grep '^[[:alpha:]]\{2\}$' /usr/share/dict/words

With GNU awk and mawk at least (this relies on an empty FS, whose behavior POSIX leaves unspecified):
$ awk -F '' 'NF==2' /usr/share/dict/words #| head -5
aa
Ab
ad
ae
Ah
An empty FS puts each character in its own field, so NF gives the record length.

Using sed to replace tab with spaces

I'm trying to replace each tab with 4 spaces using sed, but it is not working.
Here is my code:
sed -i '{s/\t/ \{4\}/g}' filename
Any suggestions are appreciated.

In sed, the replacement is not a regex, so use (the $'...' is bash's ANSI-C quoting, which turns \t into a literal tab before sed sees it):
sed -i.bak $'s/\t/    /g' filename
With GNU sed, even this works:
sed -i.bak 's/\t/    /g' filename

There is already an accepted answer, but it does a hardcoded, basic tab expansion. Tabs have a variable width, since each one advances to the next tab stop so that text lines up, and the previous answer does not take this into account. For example:
12\tabcd
1234\tabcd
should expand to the correctly aligned:
12      abcd
1234    abcd
but the given sed command will incorrectly expand it to this misaligned output:
% printf "12\tabcd\n1234\tabcd\n" | sed 's/\t/    /g'
12    abcd
1234    abcd
The correct way to do it is to use the standard expand command, which is specified by POSIX and available on virtually all Unix-like systems:
% printf "12\tabcd\n1234\tabcd\n" | expand
12      abcd
1234    abcd
If you want to use tab stops of size 4, pass -t 4.
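To illustrate what expand does differently, here is a minimal awk sketch of tab-stop-aware expansion (expand_tabs is a hypothetical name, and it assumes single-byte characters): each tab is replaced by just enough spaces to reach the next multiple of the tab width. For real use, expand itself is the better choice.

```shell
#!/bin/sh
# expand_tabs (hypothetical helper): expand tabs to the next tab stop.
# $1: tab width (defaults to 8, like expand).
expand_tabs() {
    awk -v w="${1:-8}" '{
        out = ""
        col = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\t") {
                n = w - col % w              # spaces to the next tab stop
                out = out sprintf("%" n "s", "")
                col += n
            } else {
                out = out c
                col++
            }
        }
        print out
    }'
}

printf '12\tabcd\n1234\tabcd\n' | expand_tabs      # aligned like expand
printf '12\tabcd\n1234\tabcd\n' | expand_tabs 4    # aligned like expand -t 4
```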

How to seek forward and replace selected characters with sed

Can I use sed to replace selected characters (for example H => X and 1 => 2), but first seek forward so that characters in the first groups are not replaced?
Sample data:
"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";
How it should be after sed:
"Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
What I have tried:
Nothing really; I would try, but everything I know about sed expressions seems to be wrong.
OK, I have tried capturing ([^;]+) and "skipping" the first groups separated by ; (getting them back using \1\2...). This works fine, but then comes the problem: if I use capturing I need to select the whole group, and if I don't use capturing I lose data.

This is possible with sed, but it is rather tedious. To do the translation in field number $FIELD, you can use the following:
sed 's/\(\([^;]*;\)\{'$((FIELD-1))'\}\)\([^;]*;\)/\1\n\3\n/;h;s/[^\n]*\n\([^\n]*\).*/\1/;y/H1/X2/;G;s/\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)/\2\1\4/'
Or, with fewer backslashes, using GNU sed's extended regular expressions (-r):
sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
Example:
$ FIELD=3
$ echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
"Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
$ FIELD=2
$ echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
"Hello World";"Number 2 is there";"tH1s-Has,1,HHunKnownData";
There may be a simpler way that I didn't think of, though.

If awk is ok for you:
awk -F";" '{gsub("H","X",$3);gsub("1","2",$3);}1' OFS=";" file
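With -F, each line is split using the semicolon as the delimiter, so the 3rd field ($3) is the one of interest. The gsub function substitutes all occurrences of H with X in the 3rd field, and likewise 1 with 2. Setting OFS=";" makes awk rejoin the modified line with semicolons.
The trailing 1 is an always-true condition, so every line is printed.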
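The two gsub calls handle one character pair each. As a sketch of a generalization, in the spirit of sed's y command but limited to one field (translate_field is a hypothetical helper; the source and replacement strings are assumed to be the same length):

```shell
#!/bin/sh
# translate_field (hypothetical helper): y///-style character mapping in one
# ;-separated field. $1: field number, $2: source chars, $3: replacement chars.
translate_field() {
    awk -F';' -v OFS=';' -v f="$1" -v from="$2" -v to="$3" '{
        s = $f
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            k = index(from, c)          # position of c in the source set, or 0
            out = out (k ? substr(to, k, 1) : c)
        }
        $f = out                        # rebuilds the line using OFS
        print
    }'
}

echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | translate_field 3 H1 X2
# prints "Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
```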

[UPDATE]
(I just realized that it could be shorter: Perl has an auto-split mode, which the code below uses.)
Perl is not known for being particularly readable, but in this case I suspect the best you can get with sed might not be as clear as with Perl:
echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' |
perl -F';' -ape '$F[2] =~ s/H/X/g; $F[2] =~ s/1/2/g; $_ = join(";", @F)'
Taking apart the Perl code:
# your groups are in @F, accessed as $F[$i] (indices start at 0, so $F[2] is the third group)
$F[2] =~ s/H/X/g; # Do whatever you want with your chosen (Nth) group.
$F[2] =~ s/1/2/g;
$_ = join(";", @F) # Put them back together.
perl -pe is like sed (sort of),
and perl -F';' -ape turns on auto-splitting (-a) with ';' as the field separator. Your groups are then accessible via $F[i], so it works a bit like awk, too.
So the general form is perl -F';' -ape 'your code' < inputfile
I know you asked for a sed solution, but I often find myself switching to Perl (though I do still like sed) for one-liners.

awk -F";" -v OFS=";" '{gsub("H","X",$3);gsub("1","2",$3);}1' Your_file

This might work for you (GNU sed):
sed 's/H/X/2g;s/1/2/2g' file
This changes all but the first occurrence of H or 1 to X or 2, respectively.
If it should work on whole fields separated by ; instead, use:
sed 's/H[^;]*;/&\n/;h;y/H/X/;H;g;s/\n.*\n//;s/1[^;]*;/&\n/;h;y/1/2/;H;g;s/\n.*\n//' file
This can be generalized to handle many pairs of values:
echo -e "H=X\n1=2"|
sed -r 's|(.*)=(.*)|s/\1[^;]*;/\&\\n/;h;y/\1/\2/;H;g;s/\\n.*\\n//|' |
sed -f - file

Bash: Extract Range with Regular Expression (maybe sed?)

I have a file that is similar to this:
<many lines of stuff>
SUMMARY:
<some lines of stuff>
END OF SUMMARY
I want to extract just the stuff between SUMMARY and END OF SUMMARY. I suspect I can do this with sed, but I am not sure how. I know I can modify the stuff in between with this:
sed "/SUMMARY/,/END OF SUMMARY/ s/replace/with/" fileName
(But I'm not sure how to extract just that stuff.)
I am using Bash on Solaris.

sed -n "/SUMMARY/,/END OF SUMMARY/p" fileName

If Perl is fine, you can use:
perl -e 'print $1 if(`cat FILE_NAME`=~/SUMMARY:\n(.*?)END OF SUMMARY/s);'

If you don't want to print the marker lines:
sed '1,/SUMMARY/d;/END OF SUMMARY/,$d' filename

This should work with FreeBSD sed as well:
sed -E -n -e '/^SUMMARY:/,/^END OF SUMMARY/{ /^SUMMARY:/d; /^END OF SUMMARY/d; p;}' file.txt

You can do this with awk:
$ echo 'many
lines
of
stuff
SUMMARY:
this is the summary
over two lines
END OF SUMMARY' | awk '
BEGIN {e=0}
/^END OF SUMMARY$/ {e=0}
{if (e==1) {print}}
/^SUMMARY:$/ {e=1}'
which outputs:
this is the summary
over two lines
The BEGIN clause isn't strictly required, since awk initializes variables to zero or the empty string, but I always like to include explicit initialization.
It works by using an echo flag (e) to decide whether you're in the summary section or not.
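The same flag technique can be sketched as a reusable function (extract_between is a hypothetical name). It compares whole lines with == instead of regexes, so the markers need no regex escaping, though awk -v still interprets backslash escapes in them:

```shell
#!/bin/sh
# extract_between (hypothetical helper): print the lines strictly between
# a start marker line and a stop marker line (exact whole-line matches).
extract_between() {
    awk -v start="$1" -v stop="$2" '
        $0 == stop  { inside = 0 }   # stop before printing the end marker
        inside      { print }
        $0 == start { inside = 1 }   # start after the start marker
    '
}

printf 'many\nSUMMARY:\nthis is the summary\nover two lines\nEND OF SUMMARY\nafter\n' |
    extract_between 'SUMMARY:' 'END OF SUMMARY'
# prints the two summary lines
```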

On Solaris, use nawk:
#!/bin/bash
nawk '
/SUMMARY/{
gsub(".*SUMMARY:","");
f=1
}
/END OF SUMMARY/{f=0;
gsub("END OF SUMMARY.*","")
}f' file
Output:
$ cat file
1 2 3 <many lines of stuff>
4 5 6 SUMMARY: 7 8 9
<some lines of stuff>
END OF SUMMARY blah
blah
$ ./shell.sh
7 8 9
<some lines of stuff>

Here's yet another sed version that does a multi-line print and quit (which may be suitable for extracting a range of lines from a large file, since q stops sed from reading the rest of the input):
sed -E -n -e '/^SUMMARY:$/{n;h;};/^END OF SUMMARY$/!H;/^END OF SUMMARY$/{g;p;q;}' fileName | sed 1d
For a pretty well-explained multi-line sed script, see:
http://ilfilosofo.com/blog/2008/04/26/sed-multi-line-search-and-replace/
