replace all to "hello" except string that is in between double quotes - regex

$ cat file1
"rome" newyork
"rome"
rome
What do I need to fill in the blank?
$ sed ____________________ file1
I want output like
"rome" newyork
"rome"
hello
A second case: suppose my input is like this
$ cat file1
/temp/hello/ram
hello
/hello/temp/ram
If I want to change only the hello that is not between slashes (hello to happy), what should I do? The output should be:
/temp/hello/ram
happy
/hello/temp/ram

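sed -E 's/(^|[^"])rome([^"]|$)/\1hello\2/g' your_file
The pattern accepts either a non-quote character or the line boundary on each side of rome, and the back references put those neighbouring characters back.
Tested below:
> cat temp
"rome" newyork
"rome"
rome
> sed -E 's/(^|[^"])rome([^"]|$)/\1hello\2/g' temp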
"rome" newyork
"rome"
hello
>

Why is rome changed to hello but newyork is not? If I'm reading the question correctly, you're trying to replace everything not in double quotes with hello?
Depending on the exact use cases you want (what happens to the input string ""?), you probably want something like this:
sed 's/\".*\"/hello/'

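I don't see a direct way to replace everything except what is enclosed in " ".
However, a brute-force method that chains several sed calls achieves it: temporarily rename the quoted string, replace the rest, then restore the quoted string. This assumes "italy" does not otherwise occur in the file.
sed "s/\"rome\"/\"italy\"/g" file1 | sed "s/rome/hello/g" | sed "s/\"italy\"/\"rome\"/g"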

The second problem can be solved with a simple perl one-liner (assuming only one hello per line):
perl -pe 'next if /\//; s/hello/happy/;'
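For comparison, the same filter can probably be written in sed by negating an address, so that the substitution only runs on lines that contain no slash. This is a minimal sketch; the function name swap_plain_hello is a made-up wrapper for illustration, not something from the question.

```shell
# Hypothetical wrapper: change hello to happy only on lines without a slash.
swap_plain_hello() {
    # The address /\// selects lines containing "/"; the trailing ! negates it,
    # so the s command is applied only to the remaining lines.
    sed '/\//!s/hello/happy/g'
}

# Demo on the sample input from the question.
printf '%s\n' '/temp/hello/ram' 'hello' '/hello/temp/ram' | swap_plain_hello
```

Unlike the perl one-liner, the g flag here replaces every hello on a qualifying line.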
The first problem requires some internal bookkeeping to keep track of whether you are inside a string or not. This can also be solved with perl:
#!/usr/bin/perl -w
use strict;
use warnings;
my $state_outside_string = 0;
my $state_inside_string = 1;
my $state = $state_outside_string;
while (my $line = <>) {
    my @chars = split(//,$line);
    my $not_yet_printed = "";
    foreach my $char (@chars) {
        if ($char eq '"') {
            if ($state == $state_outside_string) {
                $state = $state_inside_string;
                $not_yet_printed =~ s/rome/hello/;
                print $not_yet_printed;
                $not_yet_printed = "";
            } else {
                $state = $state_outside_string;
            }
            print $char;
            next;
        }
        if ($state == $state_inside_string) {
            print $char;
        } else {
            $not_yet_printed .= $char;
        }
    }
    $not_yet_printed =~ s/rome/hello/;
    print $not_yet_printed;
}
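The inside/outside bookkeeping can also be approximated in awk by splitting each line on the double-quote character: the odd-numbered fields are then the text outside quotes. This is only a sketch under the assumption that quotes are balanced on each line and never escaped; replace_outside_quotes is a hypothetical helper name.

```shell
# Hypothetical helper: replace rome with hello outside double-quoted strings.
# Assumes balanced, unescaped double quotes on every line.
replace_outside_quotes() {
    awk 'BEGIN { FS = OFS = "\"" }
         {
             # With the quote as field separator, fields 1, 3, 5, ... hold the
             # unquoted text and fields 2, 4, ... hold the quoted text.
             for (i = 1; i <= NF; i += 2)
                 gsub(/rome/, "hello", $i)
             print
         }'
}

# Demo on the sample input from the question.
printf '%s\n' '"rome" newyork' '"rome"' 'rome' | replace_outside_quotes
```

Because it uses gsub, this variant replaces every unquoted occurrence on a line, not just the first one in each unquoted stretch.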

Related

Use sed to replace letters [a-z] and [A-Z] and ['] with underscores

...for all characters but the first letter of every word on a line excluding the first word. All text is English language.
Would like to use sed to convert input like this:
Mary had a little lamb
It's fleece was white as snow
to this:
Mary h__ a l_____ l___
It's f_____ w__ w____ a_ s___
For a project that looks at cued recall.
Looked at several intros to sed and regex. Would be using the flavor of sed on the terminal shipped with MacOS 10.14.5.
This might work for you (GNU sed):
sed -E 'h;y/'\''/x/;s/\B./_/g;G;s/\S+\s*(.*)\n(\S+\s*).*/\2\1/' file
Make a copy of the current line in the hold space. Translate apostrophes to x so that words containing an apostrophe are treated as single words, then replace every character that is not the first letter of a word with an underscore. Append the copied line and, using grouping and back references, restore the first word of the line unadulterated.
sed is for doing simple s/old/new operations on individual strings, that is all. For anything else you should be using awk, e.g. with GNU awk for the 3rd arg to match():
$ awk '{
out = $1
$1 = ""
while ( match($0,/(\S)(\S*)(.*)/,a) ) {
out = out OFS a[1] gensub(/./,"_","g",a[2])
$0 = a[3]
}
print out $0
}' file
Mary h__ a l_____ l___
It's f_____ w__ w____ a_ s___
With any awk in any shell on every UNIX box including the default awk on MacOS:
$ awk '{
out = $1
$1 = ""
while ( match($0,/[^[:space:]][^[:space:]]*/) ) {
str = substr($0,RSTART+1,RLENGTH-1)
gsub(/./,"_",str)
out = out OFS substr($0,RSTART,1) str
$0 = substr($0,RSTART+RLENGTH)
}
print out $0
}' file
Mary h__ a l_____ l___
It's f_____ w__ w____ a_ s___
Here is another awk script (it works with all awk versions) that I enjoyed creating for this question.
script.awk
{
for (i = 2; i <= NF; i++) { # for each input word starting from 2nd word
head = substr($i,1,1); # output word head is first letter from current field
tail = substr("____________________________", 1, length($i) - 1); # output word tail is computed from template word
$i = head tail; # recreate current input word from head and tail
}
print; # output the converted line
}
input.txt
Mary had a little lamb
It's fleece was white as snow
run:
awk -f script.awk input.txt
this could be also condensed into single line:
awk '{for (i = 2; i <= NF; i++) $i = substr($i,1,1) substr("____________________________", 1, length($i) - 1); print }' input.txt
output is:
Mary h__ a l_____ l___
It's f_____ w__ w____ a_ s___
I enjoyed this task.
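One limitation of the template approach is that a word longer than the template string would get too few underscores. A possible variation, sketched here, builds the tail in a loop instead; mask_words is a hypothetical function name, and like the script above it masks every character after the first, punctuation included.

```shell
# Hypothetical helper: keep the first word, mask all but the first character
# of every later word. No fixed-length template is needed.
mask_words() {
    awk '{
        for (i = 2; i <= NF; i++) {
            tail = ""
            # Append one underscore per character after the first.
            for (j = 2; j <= length($i); j++)
                tail = tail "_"
            $i = substr($i, 1, 1) tail
        }
        print
    }'
}

# Demo on the sample input from the question.
printf '%s\n' 'Mary had a little lamb' "It's fleece was white as snow" | mask_words
```

Assigning to a field makes awk rebuild the line with single spaces between words, so runs of whitespace in the input are not preserved.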

translating awk script into perl

I'm trying to translate this code into perl.
gawk '/^>c/ {OUT=substr($0,2) ".fa";print " ">OUT}; OUT{print >OUT}' your_input
Can someone help me?
Perl has a utility to do this for you called a2p. If your script is called script.awk then you would run:
$ a2p script.awk
Which produces:
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
while (<>) {
chomp; # strip record separator
if (/^>c/) {
$OUT = substr($_, (2)-1) . '.fa';
&Pick('>', $OUT) &&
(print $fh ' ');
}
;
if ($OUT) {
&Pick('>', $OUT) &&
(print $fh $_);
}
}
sub Pick {
local($mode,$name,$pipe) = @_;
$fh = $name;
open($name,$mode.$name.$pipe) unless $opened{$name}++;
}
To save this to a file, use redirection:
$ a2p script.awk > script.pl
Perl also provides a tool for converting sed scripts: s2p.
#!/usr/bin/perl
my ($outf,$OUT);
while(<>){
    if(/^>(c.*)/){
        $OUT = "$1.fa";
        close($outf) if $outf;
        open($outf,">",$OUT) or die "Cannot open $OUT: $!";
        print $outf " \n";    # write to the lexical handle that was just opened
    }
    if($outf){ print $outf $_ }
}
If the input is:
>caaa
sdf
sdff
>cbbb
ew
ew
it creates 2 files:
==> caaa.fa <==
>caaa
sdf
sdff
==> cbbb.fa <==
>cbbb
ew
ew
This perl one-liner should be equivalent to that awk command:
perl -ane 'if($F[0] =~ /^>c/){$OUT=substr($F[0],1).".fa"; open(OUT,">$OUT"); print OUT " \n"} if ($OUT){print OUT $_} END{close(OUT)}' file
Indented command line:
perl -ane 'if ($F[0] =~ /^>c/) {
    $OUT = substr($F[0], 1).".fa";
    open(OUT, ">$OUT");    # reopening OUT implicitly closes the previous file
    print OUT " \n"
}
if ($OUT) {
    print OUT $_
}
END{close(OUT)
}' file

match using regex in perl

Hi, I am trying to extract some data from a text file in perl. My file looks like this
Name:John
FirstName:Smith
Name:Alice
FirstName:Meyers
....
I want my string to look like John Smith and Alice Meyers
I tried something like this but I'm stuck and I don't know how to continue
while (<INPUT>) {
if (/^[Name]/) {
$match =~ /(:)(.*?)(\n) /
$string = $string.$2;
}
if (/^[FirstName]/) {
$match =~ /(:)(.*?)(\n)/
$string = $string.$2;
}
}
What I am trying to do is, when a line matches Name or FirstName, copy the content between : and \n, but I get confused about which is $1 and which is $2.
This will put your first and last names in a hash:
use strict;
use warnings;
use Data::Dumper;
open my $in, '<', 'in.txt';
my (%data, $names, $firstname);
while(<$in>){
chomp;
($names) = /Name:(.*)/ if /^Name/;
($firstname) = /FirstName:(.*)/ if /^FirstName/;
$data{$names} = $firstname;
}
print Dumper \%data;
With a perl one-liner:
$ perl -0777 -pe 's/(?m).*?Name:([^\n]*)\nFirstName:([^\n]*).*/$1 $2/g' file
John Smith
Alice Meyers
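If perl is not a requirement, the same pairing can be sketched with sed and paste. This assumes the file strictly alternates Name and FirstName lines with nothing in between; join_names is a hypothetical function name.

```shell
# Hypothetical helper: turn alternating "Name:x" / "FirstName:y" lines into "x y".
# Assumes the input strictly alternates the two kinds of line.
join_names() {
    # sed drops everything up to and including the first colon;
    # paste then joins each pair of consecutive lines with a space.
    sed 's/^[^:]*://' | paste -d ' ' - -
}

# Demo on the sample input from the question.
printf '%s\n' 'Name:John' 'FirstName:Smith' 'Name:Alice' 'FirstName:Meyers' | join_names
```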
while (<INPUT>) {
    /^([A-Za-z]+)\:\s*(.*)$/;    # $1 is the whole key, $2 the value
    if ($1 eq 'Name') {
        $surname = $2;
    } elsif ($1 eq 'FirstName') {
        $completeName = $2 . " " . $surname;
    } else {
        # Error
    }
}
You might want to add some error handling, e.g. make sure that a Name is always followed by a FirstName and so on.
$1, $2, $3 ... $N hold the capture results of the () groups inside the regex, numbered in the order of their opening parentheses.
If you assign the match in list context, as below, you can avoid using the $1-style variables:
my ($matched1,$matched2) = $text =~ /(.*):(.*)/;
my $names = [];
my $name = '';
while(my $row = <>){
    next unless $row =~ /:(.*)/;
    # join without a leading space so that a space means "both parts seen"
    $name = $name eq '' ? $1 : $name.' '.$1;
    if ($name =~ / /) {
        push(@$names,$name);
        $name = '';
    }
}
use Data::Dumper;
open (FH,'abc.txt');
my(%hash,@array);
map{$_=~s/.*?://g;chomp($_);push(@array,$_)} <FH>;
%hash=@array;
print Dumper \%hash;

Identifying pseudo-duplicates with Perl

I have a list that contains names. There are multiples of the same name. I want to catch the first instance of these pseudo-dupes and anchor them.
Example input
Josh Smith
Josh Smith0928340938
Josh Smith and friends
hello
hello1223
hello and goodbye.
What I want to do is identify the first occurrence of Josh Smith or hello and put an anchor such as a pipe | in front of it to validate. These are also wildcards as the list is large, so I cannot specifically look for the first match of Josh Smith and so on.
My desired output would be this:
|Josh Smith
Josh Smith0928340938
Josh Smith and friends
|hello
hello1223
hello and goodbye.
I did not provide any code. I am a little in the dark on how to go about this and was hoping maybe someone had been in a similar situation using regex or Perl.
I think based on what I understand of your requirements you are looking for something like this:
$prefix = '';
$buffered = '';
$count = 0;
while ($line = <>) {
$linePrefix = substr($line,0,length($prefix));
if ($buffered ne '' && $linePrefix eq $prefix) {
$buffered .= $line;
$count++;
} else {
if ($buffered ne '') {
print "|" if ($count > 1);
print $buffered;
}
$buffered = $line;
$prefix = $line;
chomp $prefix;
$count = 1;
}
}
if ($buffered ne '') {
if ($count > 1) {
print "|";
}
print $buffered;
}
Actually, IMO this is a rather interesting question, because you can be creative. As you do not know how to identify the root name, I have to ask if you have to? I have a feeling that you do not need a perfect solution. Therefore, I would go for something simple:
#!/usr/bin/perl -wn
$N = 4;
if (@prev) {
    $same_start = length $_ >= $N &&
        substr($prev[0], 0, $N) eq substr($_, 0, $N);
    unless ($same_start) {
        print "|", shift @prev if $#prev;
        @prev = grep { print;0 } @prev;
    }
}
push @prev, $_;
}{ print "|", shift @prev if @prev > 1; print for @prev
edit: fixed bug: <print "|", shift @prev;> to <print "|", shift @prev if $#prev;>
Sample output:
$ perl josh.pl <josh-input.txt
|Josh Smith
Josh Smith0928340938
Josh Smith and friends
|hello
hello1223
hello and goodbye.
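A simpler heuristic, sketched here in awk, treats each line as belonging to the current group whenever it starts with the group's first line. It rests on two assumptions that go beyond the question: the shortest form of a name appears first in its group, and a name that occurs only once still gets a pipe. The function name mark_roots is made up for the example.

```shell
# Hypothetical helper: prefix the first line of each group with "|".
# Assumes the shortest form of each name comes first in its group.
mark_roots() {
    awk '
        # A line that starts with the current root belongs to its group.
        root != "" && index($0, root) == 1 { print; next }
        # Any other line starts a new group and becomes the new root.
        { root = $0; print "|" $0 }
    '
}

# Demo on the sample input from the question.
printf '%s\n' 'Josh Smith' 'Josh Smith0928340938' 'Josh Smith and friends' \
    'hello' 'hello1223' 'hello and goodbye.' | mark_roots
```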

How to switch/rotate every two lines with sed/awk?

I have been doing this by hand and I just can't do it anymore-- I have thousands of lines and I think this is a job for sed or awk.
Essentially, we have a file like this:
A sentence X
A matching sentence Y
A sentence Z
A matching sentence N
This pattern continues for the entire file. I want to flip every sentence and matching sentence so the entire file will end up like:
A matching sentence Y
A sentence X
A matching sentence N
A sentence Z
Any tips?
edit: extending the initial problem
Dimitre Radoulov provided a great answer for the initial problem. This is an extension of the main problem-- some more details:
Let's say we have an organized file (due to the sed line Dimitre gave, the file is organized). However, now I want to organize the file alphabetically but only using the language (English) of the second line.
watashi
me
annyonghaseyo
hello
dobroye utro!
Good morning!
I would like to organize alphabetically via the English sentences (every 2nd sentence). Given the above input, this should be the output:
dobroye utro!
Good morning!
annyonghaseyo
hello
watashi
me
For the first part of the question, here is a one way to swap every other line with each other in sed without using regular expressions:
sed -n 'h;n;p;g;p'
The -n command-line option suppresses the automatic printing. Command h copies the current line from the pattern space to the hold space, n reads the next line into the pattern space and p prints it; g copies the first line from the hold space back into the pattern space, and p prints it.
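With an odd number of lines, n has no next line to read on the final cycle, so the last line may never be printed. One possible variation, sketched below, restricts the swap to lines that are not the last one and lets a trailing unpaired line fall through to a final p. The function name swap_pairs is hypothetical.

```shell
# Hypothetical helper: swap every pair of lines, keeping a trailing odd line.
swap_pairs() {
    # $!{...} runs the swap only when the current line is not the last one:
    #   h saves line 1, n loads line 2, p prints it, g restores line 1.
    # The final p prints line 1 of a pair, or the unpaired last line.
    sed -n '$!{h;n;p;g;};p'
}

# Demo on the sample input from the question.
printf '%s\n' 'A sentence X' 'A matching sentence Y' \
    'A sentence Z' 'A matching sentence N' | swap_pairs
```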
sed 'N;
s/\(.*\)\n\(.*\)/\2\
\1/' infile
N - append the next line of input into the pattern space
\(.*\)\n\(.*\) - save the matching parts of the pattern space
the one before and the one after the newline.
\2\
\1 - exchange the two lines (\1 is the first saved part,
\2 the second). Use escaped literal newline for portability
With some sed implementations you could use the escape sequence
\n: \2\n\1 instead.
First question:
awk '{x = $0; getline; print; print x}' filename
next question: sort by 2nd line
paste - - < filename | sort -f -t $'\t' -k 2 | tr '\t' '\n'
which outputs:
dobroye utro!
Good morning!
annyonghaseyo
hello
watashi
me
Assuming an input file like this:
A sentence X
Z matching sentence Y
A sentence Z
B matching sentence N
A sentence Z
M matching sentence N
You could do both exchange and sort with Perl:
perl -lne'
$_{ $_ } = $v unless $. % 2;
$v = $_;
END {
print $_, $/, $_{ $_ }
for sort keys %_;
}' infile
The output I get is:
% perl -lne'
$_{ $_ } = $v unless $. % 2;
$v = $_;
END {
print $_, $/, $_{ $_ }
for sort keys %_;
}' infile
B matching sentence N
A sentence Z
M matching sentence N
A sentence Z
Z matching sentence Y
A sentence X
If you want to order by the first line (before the exchange):
perl -lne'
$_{ $_ } = $v unless $. % 2;
$v = $_;
END {
print $_, $/, $_{ $_ }
for sort {
$_{ $a } cmp $_{ $b }
} keys %_;
}' infile
So, if the original file looks like this:
% cat infile1
me
watashi
hello
annyonghaseyo
Good morning!
dobroye utro!
The output should look like this:
% perl -lne'
$_{ $_ } = $v unless $. % 2;
$v = $_;
END {
print $_, $/, $_{ $_ }
for sort {
$_{ $a } cmp $_{ $b }
} keys %_;
}' infile1
dobroye utro!
Good morning!
annyonghaseyo
hello
watashi
me
This version should handle duplicate records correctly:
perl -lne'
$_{ $_, $. } = $v unless $. % 2;
$v = $_;
END {
print substr( $_, 0, rindex( $_, $; ) ), $/, $_{ $_ }
for sort {
$_{ $a } cmp $_{ $b }
} keys %_;
}' infile
And another version, inspired by the solution posted by Glenn (record exchange included and assuming the pattern _ZZ_ is not present in the text file):
sed 'N;
s/\(.*\)\n\(.*\)/\1_ZZ_\2/' infile |
sort |
sed 's/\(.*\)_ZZ_\(.*\)/\2\
\1/'