How can I substitute one substring for another in Perl? - regex

I have a file and a list of string pairs which I get from another file. I need substitute the first string of the pair with the second one, and do this for each pair.
Is there more efficient/simple way to do this (using Perl, grep, sed or other), then running a separate regexp substitution for each pair of values?

#! /usr/bin/perl
use warnings;
use strict;
my %replace = (
"foo" => "baz",
"bar" => "quux",
);
my $to_replace = qr/#{["(" .
join("|" => map quotemeta($_), keys %replace) .
")"]}/;
while (<DATA>) {
s/$to_replace/$replace{$1}/g;
print;
}
__DATA__
The food is under the bar in the barn.
The #{[...]} bit may look strange. It's a hack to interpolate generated content inside quote and quote-like operators. The result of the join goes inside the anonymous array-reference constructor [] and is immediately dereferenced thanks to #{}.
If all that seems too wonkish, it's the same as
my $search = join "|" => map quotemeta($_), keys %replace;
my $to_replace = qr/($search)/;
minus the temporary variable.
Note the use of quotemeta—thanks Ivan!—which escapes the first string of each pair so the regular-expression engine will treat them as literal strings.
Output:
The bazd is under the quux in the quuxn.
Metaprogramming—that is, writing a program that writes another program—is also nice. The beginning looks familiar:
#! /usr/bin/perl
use warnings;
use strict;
use File::Compare;
die "Usage: $0 path ..\n" unless #ARGV >= 1;
# stub
my #pairs = (
["foo" => "baz"],
["bar" => "quux"],
['foo$bar' => 'potrzebie\\'],
);
Now we generate the program that does all the s/// replacements—but is quotemeta on the replacement side a good idea?—
my $code =
"sub { while (<>) { " .
join(" " => map "s/" . quotemeta($_->[0]) .
"/" . quotemeta($_->[1]) .
"/g;",
#pairs) .
"print; } }";
#print $code, "\n";
and compile it with eval:
my $replace = eval $code
or die "$0: eval: $#\n";
To do the replacements, we use Perl's ready-made in-place editing:
# set up in-place editing
$^I = ".bak";
my #save_argv = #ARGV;
$replace->();
Below is an extra nicety that restores backups that the File::Compare module judges to have been unnecessary:
# in-place editing is conservative: it creates backups
# regardless of whether it modifies the file
foreach my $new (#save_argv) {
my $old = $new . $^I;
if (compare($new, $old) == 0) {
rename $old => $new
or warn "$0: rename $old => $new: $!\n";
}
}

There are two ways, both of them require you to compile a regex alternation on the keys of the table:
my %table = qw<The A the a quick slow lazy dynamic brown pink . !>;
my $alt
= join( '|'
, map { quotemeta } keys %table
sort { ( length $b <=> length $a ) || $a cmp $b }
)
;
my $keyword_regex = qr/($alt)/;
Then you can use this regex in a substitution:
my $text
= <<'END_TEXT';
The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.
The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.
END_TEXT
$text =~ s/$keyword_regex/$table{ $1 }/ge; # <- 'e' means execute code
Or you can do it in a loop:
use English qw<#LAST_MATCH_START #LAST_MATCH_END>;
while ( $text =~ /$keyword_regex/g ) {
my $key = $1;
my $rep = $table{ $key };
# use the 4-arg form
substr( $text, $LAST_MATCH_START[1]
, $LAST_MATCH_END[1] - $LAST_MATCH_START[1], $rep
);
# reset the position to start + new actual
pos( $text ) = $LAST_MATCH_START[1] + length $rep;
}

Build a hash of the pairs. Then split the target string into word tokens, and check each token against the keys in the hash. If it's present, replace it with the value of that key.

If eval is not a security concern:
eval $(awk 'BEGIN { printf "sed \047"} {printf "%s", "s/\\<" $1 "\\>/" $2 "/g;"} END{print "\047 substtemplate"}' substwords )
This constructs a long sed command consisting of multiple substitution commands. It's subject to potentially exceeding your maximum command line length. It expects the word pair file to consist of two words separated by whitespace on each line. Substitutions will be made for whole words only (no clbuttic substitutions).
It may choke if the word pair file contains characters that are significant to sed.
You can do it this way if your sed insists on -e:
eval $(awk 'BEGIN { printf "sed"} {printf "%s", " -e \047s/\\<" $1 "\\>/" $2 "/g\047"} END{print " substtemplate"}' substwords)

Related

multi replace in postgresql query using perl

I'm cleaning some text directly in my query, and rather than using nested replace functions, I found this bit of code that uses perl to perform multiple replacements at once: multi-replace with perl
CREATE FUNCTION pg_temp.multi_replace(string text, orig text[], repl text[])
RETURNS text
AS $BODY$
my ($string, $orig, $repl) = #_;
my %subs;
if (#$orig != #$repl) {
elog(ERROR, "array sizes mismatch");
}
if (ref #$orig[0] eq 'ARRAY' || ref #$repl[0] eq 'ARRAY') {
elog(ERROR, "array dimensions mismatch");
}
#subs{#$orig} = #$repl;
my $re = join "|",
sort { (length($b) <=> length($a)) } keys %subs;
$re = qr/($re)/;
$string =~ s/$re/$subs{$1}/g;
return $string;
$BODY$ language plperl strict immutable;
Example query:
select
name as original_name,
multi_replace(name, '{-,&,LLC$}', '{_,and,business}') as cleaned_name
from some_table
The function finds the pattern LLC at the end of the name string but removes it instead of replacing it with "business."
How can I make this work as intended?
When the strings in #$orig are to be matched literally, I'd actually use this:
my ($string, $orig, $repl) = #_;
# Argument checks here.
my %subs; #subs{ #$orig } = #$repl;
my $pat =
join "|",
map quotemeta,
sort { length($b) <=> length($a) }
#$orig;
return $string =~ s/$re/$subs{$&}/gr;
In particular, map quotemeta, was missing.
(By the way, the sort line isn't needed if you ensure that xy comes before x in #$orig when you want to replace both x(?!y) and xy.)
But you want the strings in #$orig to be treated as regex patterns. For that, you can use the following:
# IMPORTANT! Only provide strings from trusted sources in
# `#$orig` as it allows execution of arbitrary Perl code.
my ($string, $orig, $repl) = #_;
# Argument checks here.
my $re =
join "|",
map "(?:$orig->[$_])(?{ $_ })",
0..$#$orig;
{
use re qw( eval );
$re = qr/$re/;
}
return $string =~ s/$re/$repl->[$^R]/gr;
However, in your environment, I have doubts about the availability of use re qw( eval ); and (?{ }), so the above may be an unviable solution for you.
my ($string, $orig, $repl) = #_;
# Argument checks here.
my $re =
join "|",
map "(?<_$_>$orig->[$_])",
0..$#$orig;
$re = qr/$re/;
return
$string =~ s{$re}{
my ( $n ) =
map substr( $_, 1 ),
grep { $-{$_} && defined( $-{$_}[0] ) }
grep { /^_\d+\z/aa }
keys( %- );
$repl->[$n]
}egr;
While the regexp tests for LLC$ with the special meaning of the $, what gets captured into $1 is just the string LLC and so it doesn't find the look-up value to replace.
If the only thing you care about is $, then you could fix it by changing the map-building lines to:
#subs{map {my $t=$_; $t=~s/\$$//; $t} #$orig} = #$repl;
my $re = join "|",
sort { (length($b) <=> length($a)) } #$orig;
But it will be very hard to make it work more generally for every possible feature of regex.
The purpose of this plperl function in the linked blog post is to find/replace strings, not regular expressions. LLC being found with LLC$ as a search term does not happen in the original code, as the search terms go through quotemeta before being included into $re (as also sugggested in ikegami's answer)
The effect of removing the quotemeta transformation is that LLC at the end of a string is matched, but since as a key it's not found in $subs (because the key there isLLC$), then it's getting replaced by an empty string.
So how to make this work with regular expressions in the orig parameter?
The solution proposed by #ikegami does not seem usable from plperl, as it complains with this error: Unable to load re.pm into plperl.
I thought of an alternative implementation without the (?{ code }) feature: each match from the main alternation regexp can be rechecked against each regexp in orig, in a code block run with /ge. On the first match, the corresponding string in repl is selected as the replacement.
Code:
CREATE or replace FUNCTION pg_temp.multi_replace(string text, orig text[], repl text[])
RETURNS text AS
$BODY$
my ($string, $orig, $repl) = #_;
my %subs;
if (#$orig != #$repl) {
elog(ERROR, "array sizes mismatch");
}
if (ref #$orig[0] eq 'ARRAY' || ref #$repl[0] eq 'ARRAY') {
elog(ERROR, "array dimensions mismatch");
}
#subs{#$orig} = #$repl;
my $re = join "|", keys %subs;
$re = qr/($re)/;
# on each match, recheck the match individually against each regexp
# to find the corresponding replacement string
$string =~ s/$re/{ my $r; foreach (#$orig) { if ($1 =~ $_) {$r=$subs{$_}; last;} } $r;}/ge;
return $string;
$BODY$ language plperl strict immutable;
Test
=> select pg_temp.multi_replace(
'bar foo - bar & LLC',
'{^bar,-,&,LLC$}',
'{baz,_,and,business}'
);
multi_replace
----------------------------
baz foo _ bar and business

Dynamically capture regular expression match in Perl

I'm trying to dynamically catch regex matching in Perl. I've known that eval will help me do this but I may be doing something wrong.
Code:
use strict;
use warnings;
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
my $str = '1/12/2016';
foreach my $pattern (keys (%testHash)) {
my $value = $testHash{$pattern};
my $result;
eval {
local $_ = $str;
/$pattern/;
print "\$1 - $1\n";
print "\$2 - $2\n";
print "\$3 - $3\n";
eval { print "$value\n"; }
}
}
Is it also possible to store captured regex patterns in an array?
I believe what you really want is a dynamic version of the following:
say $str =~ s/(\d+)\/(\d+)\/(\d+)/$1$2$3/gr;
String::Substitution provides what we need to achieve that.
use String::Substitution qw( gsub_copy );
for my $pattern (keys(%testHash)) {
my $replacement = $testHash{$pattern};
say gsub_copy($str, $pattern, $replacement);
}
Note that $replacement can also be a callback. This permits far more complicated substitutions. For example, if you wanted to convert 1/12/2016 into 2016-01-12, you could use the following:
'(\d+)/(\d+)/(\d+)' => sub { sprintf "%d-%02d-%02d", #_[3,1,2] },
To answer your actual question:
use String::Substitution qw( interpolate_match_vars last_match_vars );
for my $pattern (keys(%testHash)) {
my $template = $testHash{$pattern};
$str =~ $pattern # Or /$pattern/ if you prefer
or die("No match!\n");
say interpolate_match_vars($template, last_match_vars());
}
I am not completely sure what you want to do here, but I don't think your program does what you think it does.
You are useing eval with a BLOCK of code. That's like a try block. If it dies inside of that eval block, it will catch that error. It will not run your string like it was code. You need a string eval for that.
Instead of explaining that, here's an alternative.
This program uses sprintf and numbers the parameters. The %1$s syntax in the pattern says _take the first argument (1$) and format it as a string (%s). You don't need to localize or assign to $_ to do a match. The =~ operator does that on other variables for you. I also use qr{} to create a quoted regular expression (essentially a variable containing a precompiled pattern) that I can use directly. Because of the {} as delimiter, I don't need to escape the slashes.
use strict;
use warnings;
use feature 'say'; # like print ..., "\n"
my %testHash = (
qr{(\d+)/(\d+)/(\d+)} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d+) nomatch} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d\d\d\d)} => '%3$4d-%2$02d-%1$02d',
qr{\d} => '%s', # no capture group
);
my $str = '1/12/2016';
foreach my $pattern ( keys %testHash ) {
my #captures = ( $str =~ $pattern );
say "pattern: $pattern";
if ($#+ == 0) {
say " no capture groups";
next;
}
unless (#captures) {
say " no match";
next;
}
# debug-output
for my $i ( 1 .. $#- ) {
say sprintf " \$%d - %s", $i, $captures[ $i - 1 ];
}
say sprintf $testHash{$pattern}, #captures;
}
I included four examples:
The first pattern is the one you had. It uses %1$s and so on as explained above.
The second one does not match. We check the number of elements in #captured by looking at it in scalar context.
The third one shows that you can also reorder the result, or even use the sprintf formatting.
The last one has no capture group. We check by looking at the index of the last element ($# as the sigil for arrays that usually have an # sigil) in #+, which holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. The first element is the end of the overall match, so if this only has one element, we don't have capture groups.
The output for me is this:
pattern: (?^:(\d+)/(\d+)/(\d\d\d\d))
$1 - 1
$2 - 12
$3 - 2016
2016-12-01
pattern: (?^:(\d+)/(\d+)/(\d+) nomatch)
no match
pattern: (?^:\d)
no capture groups
pattern: (?^:(\d+)/(\d+)/(\d+))
$1 - 1
$2 - 12
$3 - 2016
1.12.2016
Note that the order in the output is mixed up. That's because hashes are not ordered in Perl, and if you iterate over the keys in a hash without sort the order is random.
Apologies! I realized both my question and sample code were both vague. But after reading your suggestions I came of with the following code.
I haven't optimized this code yet and there is a limit to the replacement.
foreach my $key (keys %testHash) {
if ( $str =~ $key ) {
my #matchArr = ($str =~ $key); # Capture all matches
# Search and replace (limited from $1 to $9)
for ( my $i = 0; $i < #matchArr; $i++ ) {
my $num = $i+1;
$testHash{$key} =~ s/\$$num/$matchArr[$i]/;
}
$result = $testHash{$key};
last;
}
}
print "$result\n";
Evaluing the regexp in list context returns the matches. so in your example:
use Data::Dumper; # so we can see the result
foreach my $pattern (keys (%testHash)) {
my #a = ($str =~/$pattern/);
print Dumper(\#a);
}
would do the job.
HTH
Georg
Is it also possible to store captured regex patterns in an array?
Of course it is possible to store captured substrings in an array:
#!/usr/bin/env perl
use strict;
use warnings;
my #patterns = map qr{$_}, qw{
(\d+)/(\d+)/(\d+)
};
my $str = '1/12/2016';
foreach my $pattern ( #patterns ) {
my #captured = ($str =~ $pattern)
or next;
print "'$_'\n" for #captured;
}
Output:
'1'
'12'
'2016'
I do not quite understand what you are trying to do with combinations of local, eval EXPR and eval BLOCK in your code and the purpose of the following hash:
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
If you are trying to codify that this pattern should result in three captures, you can do that like this:
my #tests = (
{
pattern => qr{(\d+)/(\d+)/(\d+)},
ncaptures => 3,
}
);
my $str = '1/12/2016';
foreach my $test ( #tests ) {
my #captured = ($str =~ $test->{pattern})
or next;
unless (#captured == $test->{ncaptures}) {
# handle failure
}
}
See this answer to find out how you can automate counting the number of capture groups in a pattern. Using the technique in that answer:
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
my #tests = map +{ pattern => qr{$_}, ncaptures => number_of_capturing_groups($_) }, qw(
(\d+)/(\d+)/(\d+)
);
my $str = '1/12/2016';
foreach my $test ( #tests ) {
my #captured = ($str =~ $test->{pattern});
ok #captured == $test->{ncaptures};
}
done_testing;
sub number_of_capturing_groups {
"" =~ /|$_[0]/;
return $#+;
}
Output:
ok 1
1..1

Perl Grepping from an Array

I need to grep a value from an array.
For example i have a values
#a=('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl');
#Array = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl','branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
Now, i need to loop #a and find each value matches to #Array. For Example
It works for me with grep. You'd do it the exact same way as in the More::ListUtils example below, except for having grep instead of any. You can also shorten it to
my $got_it = grep { /$str/ } #paths;
my #matches = grep { /$str/ } #paths;
This by default tests with /m against $_, each element of the list in turn. The $str and #paths are the same as below.
You can use the module More::ListUtils as well. Its function any returns true/false depending on whether the condition in the block is satisfied for any element in the list, ie. whether there was a match in this case.
use warnings;
use strict;
use Most::ListUtils;
my $str = 'branches/Soft/a.txt';
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
my $got_match = any { $_ =~ m/$str/ } #paths;
With the list above, containing the $str, the $got_match is 1.
Or you can roll it by hand and catch the match as well
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
This does print out the match.
Note that the strings you show in your example do not contain the one to match. I added it to my list for a test. Without it in the list no match is found in either of the examples.
To test for more than one string, with the added sample
my #strings = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl',
'branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
foreach my $str (#strings) {
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
# Or, instead of the foreach loop above use
# my $match = grep { /$str/ } #paths;
# print "Matched for $str\n" if $match;
}
This prints
Found it: branches/Soft/a.txt
Found it: branches/Soft/h.cpp
Found it: branches/Main/utils.pl
When the lines with grep are uncommented and foreach ones commented out I get the corresponding prints for the same strings.
The slashes dot in $a will pose a problem so you either have to escape them it when doing regex match or use a simple eq to find the matches:
Regex match with $a escaped:
my #matches = grep { /\Q$a\E/ } #array;
Simple comparison with "equals":
my #matches = grep { $_ eq $a } #array;
With your sample data both will give an empty array #matches because there is no match.
This Solved My Question. Thanks to all especially #zdim for the valuable time and support
my #SVNFILES = ('branches/Soft/a.txt', 'branches/Soft/b.txt');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
foreach my $svn (#SVNFILES)
{
chomp ($svn);
my $m = grep { /$svn/ } (#paths);
if ( $m eq '0' ) {
print "Files Mismatch\n";
exit 1;
}
}
You should escape characters like '/' and '.' in any regex when you need it as a character.
Likewise :
$a="branches\/Soft\/a\.txt"
Retry whatever you did with either grep or perl with that. If it still doesn't work, tell us precisely what you tried.

perl match single occurence pattern in string

I have a list of names and I want to look for names containing two given letters asigned using variables.
$one = "A";
$two = "O";
Please note that I want those letters to be present anywhere in the checked names, so that I can get outputs like this:
Jason
Damon
Amo
Noma
Boam
...
But each letter must only be present once per name, meaning that this wouldn't work.
Alamo
I've tried this bit of code but it doesn't work.
foreach my $name (#list) {
if ($name =~ /$one/) {
if ($name =~ /$two/) {
print $name;
}}
else {next}; }
How about this?
for my $name (#list) {
my $ones = () = $name =~ /$one/gi;
my $twos = () = $name =~ /$two/gi;
if ($ones == 1 && $twos == 1) {
print $name;
}
}
#!/usr/bin/env perl
#
# test.pl is the name of this script
use warnings;
use strict;
my %char = map {$_ => 1} grep {/[a-z]/} map {lc($_)} split //, join '', #ARGV;
my #chars = sort keys %char; # the different characters appearing in the command line arguments
while (my $line = <STDIN>)
{
grep {$_ <=> 1} map {scalar(() = $line =~ /$_/ig )} #chars
or print $line;
}
Now:
echo hello world | test.pl fw will print nothing (w occurs exactly once in hello world, but f does not)
echo hello world | test.pl hw will print a line consisting of hello world (both h and w occur exactly once).
One way to get it all into a single regex is to use an expression within the regex pattern to search for the other letter (a or o) based on which one was found first:
#!/usr/bin/env perl
use 5.010; use strict; use warnings;
while(<DATA>){
chomp;
say if m/^
[^ao]* # anything but a or o
([ao]) # an 'a' or 'o'
[^ao]* # anything but a or o
(??{($1 and lc($1) eq 'a') ? 'o' : 'a'}) # the other 'a' or 'o'
[^ao]* $/xi; # anything but a or o
}
__DATA__
Abdi
Chelal
Jucal
Husham
Gallio
Pileser
Tekoa
Kenaz
Raphah
See the perlre section on Extended Expressions for more info.
This is my solution. You don't make it clear whether there will always be just two single-character strings to match but I have assumed that there may be more
Unfortunately the classical way of counting occurrences of a character -- tr/// -- doesn't interpolate variables into its searchlist and doesn't have a case-independent modifier /i. But the pattern-match operator m// does, so that is what I have used
I thoroughly dislike the so-called goatse operator, but there isn't a neater way that I know of that allows you to count the number of times a global regex pattern matches
I could have used a grep for the inner loop, but I went for a regular for loop and a next with a label as I believe it's more readable this way
use strict;
use warnings;
use v5.10.1;
use autodie;
my #list = do {
open my $fh, '<', 'names.txt';
<$fh>;
};
chomp #list;
my ($one, $two) = qw/ A O /;
NAME:
for my $name ( #list ) {
for ( $one, $two) {
my $count = () = $name =~ /$_/gi;
next NAME unless $count == 1;
}
say $name;
}
output
Gallio
Tekoa
Achbor
Clopas
This is the input that I used
Abdi
Chelal
Jucal
Husham
Gallio
Pileser
Tekoa
Kenaz
Raphah
Tiras
Jehudi
Bildad
Shemidah
Meshillemoth
Tabeel
Achbor
Jesus
Osee
Elnaam
Rephah
Asaiah
Er
Clopas
Penuel
Shema
Marsena
Jaare
Joseph
Shamariah
Levi
Aphses

Grepping second pattern after matching first pattern

Is there any grep/sed option which will allow me to match a pattern after matching another pattern? For example: Input file (foos are variable patterns starting with 0 mixed with random numbers preceded by # in front):
0foo1
0foo2
0foo3
\#89888
0foo4
0foo5
\#98980
0foo6
So once I try to search for a variable pattern (eg. foo2), I also want to match another pattern (eg, #number) from this pattern line number, in this case, #89888.
Therefore output for variable foo2 must be:
foo2 #89888
For variable foo5:
foo5 #98980
foos consist of every character, including which may be considered metacharacters.
I tried a basic regex match script using tcl which will first search for foo* and then search for next immediate #, but since I am working with a very large file, it will take days to finish. Any help is appreciated.
A Perl one-liner to slurp the whole file and match across any newlines for the pattern you seek would look like:
perl -000 -nle 'm{(foo2).*(\#89888)}s and print join " ",$1,$2' file
The -000 switch enables "slurp" mode which signals Perl not to split the file into chunks, but rather treat it as one large string. The s modifier lets . match any character, including a newline.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my ( %matches, $recent_foo );
while(<DATA>)
{
chomp;
( $matches{$recent_foo} ) = $1 if m/(\\#\d+)/;
( $recent_foo ) = $1 if m/(0foo\d+)/;
}
print Dumper( \%matches );
__DATA__
0foo1
0foo2
0foo3
\#89888
0foo4
0foo5
\#98980
0foo6
./perl
$VAR1 = {
'0foo5' => '\\#98980',
'0foo3' => '\\#89888'
};
If what you want is 0foo1, 0foo2 and 0foo3 to all have the same value the following will do:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my ( %matches, #recent_foo );
while(<DATA>)
{
chomp;
if (/^\\#/)
{
#matches{#recent_foo} = ($') x #recent_foo;
undef #recent_foo;
}
elsif (/^0/)
{
push #recent_foo, $';
}
}
print Dumper( \%matches );
__DATA__
0foo1
0foo2
0foo3
\#89888
0foo4
0foo5
\#98980
0foo6
gives:
$VAR1 = {
'foo2' => '89888',
'foo1' => '89888',
'foo5' => '98980',
'foo3' => '89888',
'foo4' => '98980'
};
Var='foo2'
sed "#n
/${Var}/,/#[0-9]\{1,\}/ {
H
/#[0-9]\{1,\}/ !d
s/.*//;x
s/.//;s/\n.*\\n/ /p
q
}" YourFile
Not clear as request. It take first occurence of your pattern foo2 until first #number, remove line between and print both line in 1 than quit (no other extract
A Tcl solution. The procedure runs in a little over 3 microseconds, so you'll need very large data files to have it run for days. If more than one token matches, the first match is used (it's easy to rewrite the procedure to return all matches).
set data {
0foo1
0foo2
0foo3
\#89888
0foo4
0foo5
\#98980
0foo6
}
proc find {data pattern} {
set idx [lsearch -regexp $data $pattern]
if {$idx >= 0} {
lrange $data $idx $idx+1
}
}
find $data 0foo3
# -> 0foo3 #89888
find $data 0f.*5
# -> 0foo5 #98980
Documentation: if, lrange, lsearch, proc, set
sed
sed -n '/foo2/,/#[0-9]\+/ {s/^[[:space:]]*[0\\]//; p}' file |
sed -n '1p; $p' |
paste -s
The first sed prints all the lines between the first pattern and the 2nd, removing optional leading whitespace and the leading 0 or \.
The second sed extracts only the first and last lines.
The paste command prints the 2 lines as a single line, separated with a tab.
awk
awk -v p1=foo5 '
$0 ~ p1 {found = 1}
found && /#[0-9]+/ { sub(/^\\\/, ""); print p1, $0; exit }
' file
tcl
lassign $argv filename pattern1
set found false
set fid [open $filename r]
while {[gets $fid line] != -1} {
if {[string match "*$pattern1*" $line]} {
set found true
}
if {$found && [regexp {#\d+} $line number]} {
puts "$pattern1 $number"
break
}
}
close $fid
Then
$ tclsh 2patt.tcl file foo4
foo4 #98980
Is this what you want?
$ awk -v tgt="foo2" 'index($0,tgt){f=1} f&&/#[0-9]/{print tgt, $0; exit}' file
foo2 \#89888
$ awk -v tgt="foo5" 'index($0,tgt){f=1} f&&/#[0-9]/{print tgt, $0; exit}' file
foo5 \#98980
I'm using index() above as it searches for a string not a regexp and so could not care less what RE metacharacters are in foo - they are all just literal characters in a string.
It's not clear from your question if you want to find a specific number after a specific foo or the first number after foo2 or even if you want to search for a specific foo value or all "foo"s or...