Regex for Replacing String with Incrementing Number - regex

Once upon a time I used UltraEdit for making this
text 1
text 2
text 3
text 4
...
from that
text REPLACE
text REPLACE
text REPLACE
text REPLACE
...
It was as easy as replacing REPLACE with \i.
Now, how can I do this with e.g. sed?
If you provide a solution, could you please add directions for padding the result with leading zeros?
Thanks

You can use awk instead of sed for this:
awk '{ sub(/REPLACE/, ++i) } 1' file
text 1
text 2
text 3
text 4
...

Is this what you want?
$ awk '{$NF=sprintf("%05d",++i)} 1' file
text 00001
text 00002
text 00003
text 00004
If not, edit your question to show some more truly representative sample input and expected output (and get rid of the ...s if they don't literally exist in your input and output as they make your example non-testable).

perl -i -pe 's/REPLACE/++$i/ge' file
For zero-padding to the minimum width (i.e. if there are 10 replacements, use field width 2):
perl -i -pe '
BEGIN {
$patt = "REPLACE";
chomp( $n = qx(grep -co "$patt" file) );
$n = length($n);   # field width = number of digits in the count (avoids floating-point log rounding)
}
s/$patt/ sprintf("%0*d", $n, ++$i) /ge
' file
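For comparison, here is a minimal Python sketch of the same idea: count the placeholders first, derive the minimum field width from that count, then substitute an incrementing, zero-padded number. The function name `number_placeholders` and its parameters are made up for illustration and are not part of any answer above.

```python
import re

def number_placeholders(text, placeholder="REPLACE", pad=True):
    """Replace each occurrence of `placeholder` with 1, 2, 3, ...

    With pad=True the numbers are zero-padded to the width of the
    largest number (e.g. width 2 when there are 10 replacements).
    """
    total = text.count(placeholder)
    width = len(str(total)) if pad else 0
    counter = 0

    def next_number(match):
        # Called once per match, in order of appearance.
        nonlocal counter
        counter += 1
        return str(counter).zfill(width)

    return re.sub(re.escape(placeholder), next_number, text)

sample = "\n".join(["text REPLACE"] * 10)
result = number_placeholders(sample)
print(result)
```

Counting first means the text is scanned twice, which is the same trade-off the perl version makes by calling grep in its BEGIN block.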

Related

extract timecode separated by a pattern and the text in the line after

I want to extract the start time, end time and subtitle text from a subtitle file. What is a good way of doing this? The subtitle file is as follows:
1
00:00:14,680 --> 00:00:23,960
on
2
00:00:24,480 --> 00:00:30,000
VERT
3
00:00:30,080 --> 00:00:38,120
UD
4
00:00:38,120 --> 00:00:39,040
REST
I want the following:
00:00:14.680 , 00:00:23.960, on
00:00:24.480 , 00:00:30.000, VERT
00:00:30.080 , 00:00:38.120, UD
00:00:38.120 , 00:00:39.040, REST
After some googling, I can extract the fields in an online regex tester with the following pattern. How do I write the extracted text to a file (and replace the , with a .)?
(\d.{11})\s-->\s(\d.{11})[\r\n](\w+)
Update: I got what I want with the following. Is there any way to also replace the , with a .?
gawk 'match($0, /([0-9].{11})\s-->\s([0-9].{11})/, a) {getline; print a[1], "\t", a[2],"\t", $0}'
This works using grep and perl:
$ cat text.txt | egrep -v '^[0-9]*$'| perl -pe 's/(:\d{2}),(\d)/$1.$2/g; s/ *--> */, /; s/(\d)\n/$1, /g;'
00:00:14.680, 00:00:23.960, on
00:00:24.480, 00:00:30.000, VERT
00:00:30.080, 00:00:38.120, UD
00:00:38.120, 00:00:39.040, REST
The egrep removes empty and digits-only lines.
The perl substitutions change the commas to dots, turn the --> arrow into a comma, and join each pair of lines with a comma.
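The same steps (match the timecode line, swap the decimal comma for a dot, append the following text line) could also be sketched in Python. This is only an illustration: the name `srt_to_rows` is hypothetical, and it assumes each cue has exactly one text line directly after the timecode line.

```python
import re

# Timecode line followed by one line of subtitle text.
CUE = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3})[ \t]*-->[ \t]*"
    r"(\d{2}:\d{2}:\d{2}),(\d{3})[ \t]*\r?\n(.*)"
)

def srt_to_rows(srt_text):
    """Return 'start, end, text' rows with dots as decimal separators."""
    rows = []
    for m in CUE.finditer(srt_text):
        start = f"{m.group(1)}.{m.group(2)}"
        end = f"{m.group(3)}.{m.group(4)}"
        rows.append(f"{start}, {end}, {m.group(5).strip()}")
    return rows

sample = (
    "1\n00:00:14,680 --> 00:00:23,960\non\n"
    "2\n00:00:24,480 --> 00:00:30,000\nVERT\n"
)
rows = srt_to_rows(sample)
print("\n".join(rows))
```

To put the result in a file, the rows can be written out with an ordinary `open(path, "w")` call, or the printed output can be redirected in the shell.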

Regexp for removing certain columns

I have an input of this format:
<apple1> <orange1> : <apple2> <orange2> : <apple3> <orange3> : ...
This input is of undefined length and consists of apple-orange pairs with varying orange and apple parts, separated by a colon.
I'd like to have this as an output:
<apple1> <orange1> : <orange2> : <orange3> : ...
I. e. all apple parts but the first removed.
Each apple part is 14 characters wide, each orange part is 19 characters wide.
I tried things like this:
sed -r 's/.{14}(.{19}):/\1:/g'
But this always ran into problems skipping the first apple part.
Can anybody provide a regexp solving this task?
Real world example input:
appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:ppppppppppppppqqqqqqqqqqqqqqqqqqq:nnnnnnnnnnnnnnttttttttttttttttttt
Output should be this:
appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:barbarbarbarbarbarb:barbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:qqqqqqqqqqqqqqqqqqq:ttttttttttttttttttt
Your regex to sed was almost correct. Just match ":_14_19" over and over and remove the 14 part. (Note: I use commas as regex delimiters below because they're easier to read.)
$ export A='appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb:xxxxxxxxxxxxxxooooooooooooooooooo:ppppppppppppppqqqqqqqqqqqqqqqqqqq:nnnnnnnnnnnnnnttttttttttttttttttt'
$ echo $A | sed -Ee 's,:.{14}(.{19}),:\1,g'
appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo:barbarbarbarbarbarb:barbarbarbarbarbarb:barbarbarbarbarbarb:ooooooooooooooooooo:qqqqqqqqqqqqqqqqqqq:ttttttttttttttttttt
This job is better suited to awk, as the input file is well structured in rows and columns using a known delimiter, i.e. a colon:
awk 'BEGIN{FS=OFS=":"} {for (i=2; i<=NF; i++) $i = substr($i, 15)} 1' file
appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:barbarbarbarbarbarb:barbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:qqqqqqqqqqqqqqqqqqq:ttttttttttttttttttt
This awk command uses : as both the input and output delimiter and, starting from the 2nd field in each record, sets each field to the substring of that field starting at the 15th position.
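The field-based approach translates directly to other languages. As a rough Python sketch (the function name `drop_repeated_apples` is hypothetical, and the widths are taken from the question):

```python
def drop_repeated_apples(line, apple_width=14):
    """Keep the first colon-separated field whole; strip the leading
    apple part (apple_width characters) from every later field."""
    first, *rest = line.split(":")
    return ":".join([first] + [field[apple_width:] for field in rest])

# Build a test line shaped like the third sample row:
# 14-character apple parts and 19-character orange parts.
line = ":".join([
    "x" * 14 + "o" * 19,
    "p" * 14 + "q" * 19,
    "n" * 14 + "t" * 19,
])
result = drop_repeated_apples(line)
print(result)
```

Because the apple part is removed by position rather than by content, this works even when every apple part is different.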
With perl:
Our input: appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo
Let's assume
a=appleappleappl (14 characters)
b=orangeorangeorangeo (19 characters)
c=appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo (the rest of the line, which is a repeating combination of a and b).
Expected output: before the first colon (:), both a and b are kept; after the first colon, only b is kept.
${a}${b}:${b}:${b}:.... (please correct me if I am wrong)
So, to recap, here are the input and the output once again.
Our input: appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo
Expected output: appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
Please try this script (as mentioned earlier, this uses perl and not shell):
%_Host@User> cat apple.pl
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
    chomp $_ ;
    my @tmp = split /:/, $_ ;
    # Keep the first field (apple + orange) unchanged
    my $str = $tmp[0] ;
    foreach my $i (1..$#tmp) {
        # Drop the 14-character apple part by position, not by content,
        # so it also works when the apple parts differ between fields
        $str .= ":" . substr($tmp[$i], 14) ;
    }
    print "$str\n" ;
}
%_Host@User>
Script output:
%_Host@User> cat td_apple |./apple.pl
appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:barbarbarbarbarbarb:barbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:qqqqqqqqqqqqqqqqqqq:ttttttttttttttttttt
Sample Data:
%_Host@User> cat td_apple
appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:ppppppppppppppqqqqqqqqqqqqqqqqqqq:nnnnnnnnnnnnnnttttttttttttttttttt
%_Host@User>
Thanks.

Replace a block of text

I have a file in this pattern:
Some text
---
## [Unreleased]
More text here
I need to replace the text between '---' and '## [Unreleased]' with something else in a shell script.
How can it be achieved using sed or awk?
Perl to the rescue!
perl -lne 'my @replacement = ("First line", "Second line");
if ($p = (/^---$/ .. /^## \[Unreleased\]/)) {
print $replacement[$p-1];
} else { print }'
The flip-flop operator .. tells you whether you're between the two strings, moreover, it returns the line number relative to the range.
This might work for you (GNU sed):
sed '/^---/,/^## \[Unreleased\]/c\something else' file
Change the lines between two regexp to the required string.
This example may help you.
$ cat f
Some text
---
## [Unreleased]
More text here
$ seq 1 5 >mydata.txt
$ cat mydata.txt
1
2
3
4
5
$ awk '/^---/{f=1; while((getline line < c) > 0)print line;close(c);next}/^## \[Unreleased\]/{f=0;next}!f' c="mydata.txt" f
Some text
1
2
3
4
5
More text here
awk -v RS="\0" 'gsub(/---\n\n## \[Unreleased\]\n/,"something")+1' file
give this line a try.
An awk solution that:
is portable (POSIX-compliant).
can deal with any number of lines between the start line and the end line of the block, and potentially with multiple blocks (although they'd all be replaced with the same text).
reads the file line by line (as opposed to reading the entire file at once).
awk -v new='something else' '
/^---$/ { f=1; next } # Block start: set flag, skip line
f && /^## \[Unreleased\]$/ { f=0; print new; next } # Block end: unset flag, print new txt
! f # Print line, if before or after block
' file
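The flag-based, line-by-line logic of this awk answer can be sketched in Python as well. The function name `replace_block` and its parameters are illustrative only, and the sample assumes a blank line sits between the two marker lines.

```python
def replace_block(lines, new_text, start="---", end="## [Unreleased]"):
    """Replace everything from the `start` line through the `end` line
    (inclusive) with `new_text`, reading one line at a time."""
    inside = False
    out = []
    for line in lines:
        if not inside and line == start:
            inside = True            # block start: set flag, skip line
        elif inside and line == end:
            inside = False           # block end: unset flag, emit new text
            out.append(new_text)
        elif not inside:
            out.append(line)         # before or after the block
    return out

sample = ["Some text", "---", "", "## [Unreleased]", "More text here"]
result = replace_block(sample, "something else")
print("\n".join(result))
```

As in the awk version, a start line with no matching end line causes the rest of the input to be dropped, so it may be worth checking for that case if the file format is not guaranteed.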

Edit line names with a new name containing an incremented value

This seems like a simple task to me but getting it to work easily is ending up more difficult than I thought:
I have a fasta file containing several million lines of text (only a few hundred individual sequence entries), and the sequence names are long. I want to replace all characters after the header > with Contig $n, where $n is an integer starting at 1 that is incremented for each replacement.
an example input sequence name:
>NODE:345643RD:Cov_456:GC47:34thgd
ATGTCGATGCGT
>NODE...
ATGCGCTTACAC
Which I then want to output like this
>Contig 1
ATGTCGATGCGT
>Contig 2
ATGCGCTTACAC
so maybe a Perl script? I know some basics but I'd like to read in a file and then output the new file with the changes, and I'm unsure of the best way to do this? I've seen some Perl one liner examples but none did what I wanted.
$n = 1
if {
s/>.*/(Contig)++$n/e
++$n
}
$ awk '/^>/{$0=">Contig "++n} 1' file
>Contig 1
ATGTCGATGCGT
>Contig 2
ATGCGCTTACAC
Try something like this:
#!/usr/bin/perl -w
use strict;
open (my $fh, '<', 'example.txt') or die "Cannot read example.txt: $!";
open (my $fh1, '>', 'example2.txt') or die "Cannot write example2.txt: $!";
my $n = 1;
# For each line of the input file
while(<$fh>) {
    # Try to update the name; if successful, increment $n
    if ($_ =~ s/^>.*/>Contig $n/) { $n++; }
    print $fh1 $_;
}
close $fh;
close $fh1;
When you use the /e modifier, Perl expects the replacement part of the substitution to be a valid Perl expression. Try something like
s/>.*/">Contig " . ++$n/e
perl -i -pe 's/>.*/">Contig " . ++$c/e;' file.txt
Output:
>Contig 1
ATGTCGATGCGT
>Contig 2
ATGCGCTTACAC
I'm no awk expert (far from it), but I solved this out of curiosity, and because sed has no variables (so its possibilities are limited).
One possible gawk solution could be
awk -v n=1 '/^>/{print ">Contig " n; n++; next}1' <file
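For completeness, the same counter-per-header idea as a small Python sketch. The generator name `rename_headers` is hypothetical; it processes one line at a time, so a file with several million lines does not need to be held in memory.

```python
def rename_headers(lines, prefix="Contig"):
    """Yield lines unchanged, except that each header line (starting
    with '>') is replaced by '>Contig 1', '>Contig 2', ..."""
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            yield f">{prefix} {n}"
        else:
            yield line

sample = [
    ">NODE:345643RD:Cov_456:GC47:34thgd",
    "ATGTCGATGCGT",
    ">NODE...",
    "ATGCGCTTACAC",
]
result = list(rename_headers(sample))
print("\n".join(result))
```

With a real file, the input could be an open file handle with trailing newlines stripped, and each yielded line written to a second file.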

Cut and copy-paste given positions of the text

My dummy text file (one continuous line) looks like this:
AAChvhkfiAFAjjfkqAPPMB
I want to:
Delete part of the text (specific range);
Copy-Paste (specific range of characters) within the file.
How I am doing this:
To cut part of the text at wanted positions (from 5 to 7 characters & from 10 to 14 characters) I use cut
echo 'AAChvhkfiAFAjjfkqAPPMB' | cut --complement -c 5-7,10-14
AAChfifkqAPPMB
But I really don't know how to copy-paste text. For example: to copy text from 15 to 18 characters and paste it after character 1 (also using previous cut command). To get the final result like this:
fkqAAAChfifkqAPPMB
So I have two questions:
How to read text (from .. to) given range using perl, awk or sed & paste this text at specific position.
How to combine this text pasting with the previous cut command as after cutting text will move to the left side, hence wrong text will be copied.
Maybe something like this:
$ echo AAChvhkfiAFAjjfkqAPPMB | awk '{ print(substr($1, 1, 14) substr($1, 18) substr($1, 15, 3)) }'
AAChvhkfiAFAjjAPPMBfkq
In Perl I think substr would be a good candidate, try eg.
$a = '1234567890';
#from pos 2, replace 3 chars with nothing, return the 3 chars
$b=substr($a,2,3,'');
print "$a\t$b\n"; #1267890 345
#in position 0 (first), replace 0 characters (ie pure insert)
#with the content of $b
substr($a,0,0,$b);
print "$a\t$b\n"; #3451267890 345
See http://perldoc.perl.org/functions/substr.html for more details.
splice() may be a candidate as well.
In perl, you can use an array slice, by splitting the string into an array:
my $string = "AAChvhkfiAFAjjfkqAPPMB1";
my @arr = split //, $string;
and slicing (print elements 5 to 7 and 10 to 14; indices are 0-based):
print @arr[5..7,10..14];
you can use splice() too to re-arrange the array.
perldoc said :
Removes the elements designated by OFFSET and LENGTH from an array, and replaces them with the elements of LIST, if any.
See http://perldoc.perl.org/perldata.html#Slices
quite straightforward with awk:
kent$ echo "AAChvhkfiAFAjjfkqAPPMB"|awk '
{for(i=5;i<=7;i++)$i="";
for(i=10;i<=14;i++)$i="";
for(i=15;i<=18;i++)t=sprintf("%s%s",t,$i);
$0=t""$0}1' OFS="" FS=""
fkqAAAChfifkqAPPMB
edit
to reverse the part of text, you just need to swap t and $i:
kent$ echo "AAChvhkfiAFAjjfkqAPPMB"|awk '
{for(i=5;i<=7;i++)$i="";
for(i=10;i<=14;i++)$i="";
for(i=15;i<=18;i++)t=sprintf("%s%s",$i,t);
$0=t""$0}1' OFS="" FS=""
AqkfAAChfifkqAPPMB
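One way to avoid the shifting-positions problem from the question is to take all ranges from the original string before anything is removed. A minimal Python sketch of that idea follows; the function name `cut_and_paste` and its parameters are invented for illustration.

```python
def cut_and_paste(text, cut_ranges, copy_range, paste_at=0):
    """Ranges are 1-based and inclusive, measured on the ORIGINAL text.

    paste_at is a 0-based offset into the text that remains after
    cutting (0 means: paste at the very beginning).
    """
    lo, hi = copy_range
    copied = text[lo - 1:hi]          # copy before anything is removed
    kept = "".join(
        ch for pos, ch in enumerate(text, start=1)
        if not any(a <= pos <= b for a, b in cut_ranges)
    )
    return kept[:paste_at] + copied + kept[paste_at:]

result = cut_and_paste("AAChvhkfiAFAjjfkqAPPMB", [(5, 7), (10, 14)], (15, 18))
print(result)
```

For the sample string this reproduces the result requested in the question, with the copied characters placed at the start of the line.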