I've been racking my brain for hours on this and I'm at my wit's end. I'm beginning to think that this isn't possible for a regular expression.
The closest thing I've seen is this post: Regular expression to match a line that doesn't contain a word?, but the solution doesn't work when I replace "hede" with the number.
I want to select EACH line that DOES NOT contain: 377681 so that I can delete it.
^((?!377681).)*$
...doesn't work, along with thousands of other examples/tweaks that I've found or done.
Is this possible?
Would grep -v 377681 input_file solve your problem?
Try this one
^(?!.*377681).+$
See it here on Regexr
Important here is to use the m (multiline) modifier, so that ^ matches the start of each line and $ the end of each line; otherwise it will not work.
(Note: I realized that my regex has the same meaning as yours.)
There's probably a better way of doing this, for example iterating over each line and calling a built-in string method such as indexOf or contains, depending on the language you're using.
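As a rough sketch of that line-by-line idea in Python (the helper name split_lines and the sample text are made up for illustration), a plain substring test is enough and no regex is involved:

```python
# Split lines into those that contain the number and those that do not,
# using a plain substring test instead of a regular expression.
NEEDLE = "377681"  # the number from the question


def split_lines(text, needle=NEEDLE):
    """Return (lines containing needle, lines not containing needle)."""
    with_needle, without_needle = [], []
    for line in text.splitlines():
        (with_needle if needle in line else without_needle).append(line)
    return with_needle, without_needle


# Illustrative sample input (an assumption, not the asker's real data).
sample = "foo 377681 bar\nthis can be 3768 removed\n377681 more text\nremove me"
kept, dropped = split_lines(sample)
print(kept)
print(dropped)
```

Here dropped holds the lines the question wants to select for deletion.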
Could you give us the full example?
<?php
$lines = array(
'434343343776815456565464',
'434343343774815456565464',
'434343343776815456565464'
);
foreach ($lines as $key => $value) {
    if (!preg_match('#(377681)#is', $value)) {
        unset($lines[$key]);
    }
}
print_r($lines);
?>
You'll need to enable the m (multi-line) flag for the ^ and $ to match the start- and end-of-lines respectively. If you don't, ^ will match the start-of-input and $ will only match the end-of-input.
The following demo:
#!/usr/bin/env php
<?php
$text = 'foo 377681 bar
this can be 3768 removed
377681 more text
remove me';
echo preg_replace('/^((?!377681).)*$/m', '---------', $text);
?>
will print:
foo 377681 bar
---------
377681 more text
---------
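The same behaviour can be checked in Python, where re.MULTILINE plays the role of the m flag. This is only a sketch using the sample text from the demo; the variable names are arbitrary:

```python
import re

text = "foo 377681 bar\nthis can be 3768 removed\n377681 more text\nremove me"
pattern = r"^((?!377681).)*$"

# Without MULTILINE, ^ and $ anchor only at the start and end of the whole
# input, so the pattern finds nothing to replace in this text.
no_flag = re.sub(pattern, "---------", text)

# With MULTILINE, ^ and $ anchor at every line boundary, so each line that
# does not contain 377681 is replaced.
with_flag = re.sub(pattern, "---------", text, flags=re.MULTILINE)

print(no_flag == text)
print(with_flag)
```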
I want to grep the shortest match and the pattern should be something like:
<car ... model=BMW ...>
...
...
...
</car>
... means any character and the input is multiple lines.
You're looking for a non-greedy (or lazy) match. To get a non-greedy match in regular expressions you need to use the modifier ? after the quantifier. For example you can change .* to .*?.
By default grep doesn't support non-greedy modifiers, but you can use grep -P to use the Perl syntax.
Actually, .*? only works in Perl-compatible syntax. I am not sure what the equivalent in grep's extended regexp syntax would be. Fortunately you can use Perl syntax with grep, so grep -P would work, but grep -E (which is the same as egrep) would not: the match would be greedy.
See also: http://blog.vinceliu.com/2008/02/non-greedy-regular-expression-matching.html
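To see the difference between greedy and lazy quantifiers outside of grep, here is a small Python sketch (the sample string is invented for illustration):

```python
import re

# Illustrative input with two separate tag pairs on one line.
html = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as possible, so the match runs to the LAST </b>.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? grabs as little as possible, so each match stops at the first </b>.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)
print(lazy)
```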
grep
For non-greedy match in grep you could use a negated character class. In other words, try to avoid wildcards.
For example, to fetch all links to jpeg files from the page content, you'd use:
grep -o '"[^" ]\+\.jpg"'
To deal with multiple lines, pipe the input through xargs first. For performance, use ripgrep.
My grep that works after trying out stuff in this thread:
echo "hi how are you " | grep -shoP ".*? "
Just make sure you append a space to each one of your lines
(Mine was a line by line search to spit out words)
Sorry I am 9 years late, but this might work for the viewers in 2020.
So suppose you have a line like "Hello my name is Jello".
Now you want to find the words that start with 'H' and end with 'o', with any number of characters in between. And we don't want whole lines, we just want the words. So for that we can use the expression:
grep -o "H[^ ]*o" file
This will return all the words (the -o flag prints only the matched text rather than the whole line). The way this works is that it allows any character except the space character in between, so a match cannot span multiple words in the same line.
Now you can replace the space character with any other character you want.
Suppose the initial line was "Hello-my-name-is-Jello", then you can get words using the expression:
grep -o "H[^-]*o" file
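The negated-character-class idea is not specific to grep. A minimal Python sketch, with made-up sample lines, shows how it keeps each match inside one word while a plain wildcard does not:

```python
import re

# Illustrative lines (assumptions, not taken from a real file).
spaced = "Hello my name is Jello, Hugo said to Hippo"
dashed = "Hello-my-name-is-Jello"

# [^ ]* cannot cross a space, so each match stays inside a single word.
words = re.findall(r"H[^ ]*o", spaced)

# A plain wildcard runs from the first H to the last o on the line.
wildcard = re.findall(r"H.*o", spaced)

# Swap the excluded character when the separator is a hyphen.
dashed_words = re.findall(r"H[^-]*o", dashed)

print(words)
print(wildcard)
print(dashed_words)
```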
The short answer is using the next regular expression:
(?s)<car .*? model=BMW .*?>.*?</car>
(?s) - makes . match newlines too, so the match can span multiple lines
.*? - matches any character, any number of times, in a lazy way (minimal match)
A (little) more complicated answer is:
(?s)<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>
This makes it possible to match car1 and car2 in the following text
<car1 ... model=BMW ...>
...
...
...
</car1>
<car2 ... model=BMW ...>
...
...
...
</car2>
(..) represents a capturing group
\1 in this context matches the same text as most recently matched by capturing group number 1
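For readers who want to try the backreference version, here is a sketch in Python, where re.DOTALL stands in for (?s). The attribute values in the sample are invented; only the tag names and model=BMW come from the answer:

```python
import re

# Illustrative input: two differently named elements, both with model=BMW.
xml = """<car1 color=red model=BMW year=2010>
  ...
</car1>
<car2 color=blue model=BMW year=2012>
  ...
</car2>"""

# Group 1 captures the tag name; \1 requires the closing tag to repeat it.
pattern = re.compile(r"<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>", re.DOTALL)

matches = list(pattern.finditer(xml))
tags = [m.group(1) for m in matches]
print(tags)
```

Note that with dot-matches-newline enabled, a lazy .*? can still run past a non-matching element to find model=BMW further on, so this is best treated as a quick hack rather than a parser.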
I know that its a bit of a dead post but I just noticed that this works. It removed both clean-up and cleanup from my output.
> grep -v -e 'clean\-\?up'
> grep --version
grep (GNU grep) 2.20
I have the following input:
string='GET........ref=mp4;GET........ref=flv;GET........ref=mp4;'
It has 3 segments. I need to extract the segments ending with mp4;.
ie.
GET........ref=mp4
GET........ref=mp4
The current result will match GET........ref=mp4 and GET........ref=flv;GET........ref=mp4;.
My regular expression: GET(.*?)mp4
I don't need the long match containing flv inside, and this regex does not work: GET(.*?)(?!:flv)mp4
I don't know how to solve and any help is appreciated.
You can explode the semi-colon separated list and then use preg_grep to get only the elements that end with mp4:
$string='GET........ref=mp4;GET........ref=flv;GET........ref=mp4;';
$res = explode(";", $string);
$res = preg_grep('/mp4$/i', $res);
print_r($res);
See IDEONE demo
If there are no semi-colons, all is glued:
// NO SEMI_COLONS
$str='GET........ref=mp4GET........ref=flvGET........ref=mp4';
preg_match_all('/GET\b(?:(?!GET\b).)*mp4(?=$|GET\b)/', $str, $res);
print_r($res);
See another IDEONE demo
First things first, you need to split your string into tokens:
http://get........ref=mp4
http://get........ref=flv
http://get........ref=mp4
and then apply your regex. If you need it to start with http and end with mp4, then use "^http.*mp4$"
The ^ means beginning of the line, $ means the end of the line, and .* means match any character 0 or more times. An example using sed to split the input, for instance:
echo "http://get........ref=mp4;http://get........ref=flv;http://get........ref=mp4a;" | sed s/';'/\\n/g | grep "^http.*mp4$"
EDIT: if ';' is not your real separator, replace it with whatever is the real separator.
If you are looking for a bit cleaner approach that will work with or without ;
preg_match_all("/GET(?:(?!GET).)*=mp4/", $str, $res);
print_r($res);
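To illustrate why the (?:(?!GET).)* construct helps, here is a Python sketch comparing it with the lazy pattern from the question, using the question's input string:

```python
import re

s = "GET........ref=mp4;GET........ref=flv;GET........ref=mp4;"

# Lazy matching alone: the second match starts at the flv segment and keeps
# extending until it finally reaches an mp4.
naive = re.findall(r"GET.*?mp4", s)

# Each character is consumed only if a new GET does not start there,
# so a match can never spill over into the next segment.
tempered = re.findall(r"GET(?:(?!GET).)*=mp4", s)

print(naive)
print(tempered)
```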
I have string
Message <Network=Data Center> All Verified
I need to extract the whole string except the part in angle brackets
I tried
m/(?![^<]*\\>)/s
Not giving desired result.
Removing <..> regions
It's easier to remove the <..> parts from the string and then deal with the remaining string.
Try this oneliner:
cat file | perl -pne 's/<[^>]*?>//g;'
For your sample input, this is the output:
Message All Verified
Notice the non-greedy quantifier ? is used in the regex. Also, because this is a oneliner, the s/// search-and-replace construct is applied to the implicit variable $_ (which holds a line from standard input). So after the search & replace has run, $_ will be altered (there will be no <..> regions in it). The -p switch was used in order to print $_ after running the block of code. You can read more about Perl command-line switches in perlrun.
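The same removal can be sketched in Python. The extra whitespace clean-up step is an addition for illustration, since removing the bracketed part leaves two adjacent spaces behind:

```python
import re

line = "Message <Network=Data Center> All Verified"

# Remove every <...> region; [^>]* cannot run past the closing bracket.
stripped = re.sub(r"<[^>]*>", "", line)

# Optional extra step (not in the Perl oneliner): collapse runs of whitespace.
tidy = " ".join(stripped.split())

print(repr(stripped))
print(tidy)
```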
This is one solution. Below there is another one:
Capturing regions outside of <..>
On the other hand, you can (if you want) match the parts outside of the <..> regions.
In order to do that let's build a regex. First, we want a < or > free region. The following regex matches just that
$p = ([^<>]*).
Next, we want to match everything before <, and for that we can write (?:$p<) and everything after >, and that's (?:>$p).
Now if we assemble all those parts together we get (?:>$p)|(?:$p<).
Notice that (?:) is a non-capturing group.
So now there are two capturing groups (the two $p you see above) but only one will match at a time, so some of the captures will be undef. We'll have to filter those out.
Finally, we can assemble all the captures, and we're done.
cat file | perl -ne '$p="([^<>]*)";@x=grep{defined} m{(?:>$p)|(?:$p<)}g; print join(" ",@x)."\n";'
Parse::Yapp parser
You might think that using Parse::Yapp for this particular problem is a bit too much (usually, if you have something complicated to parse, you would use a grammar and a parser generator), but... why not. :)
Ok, so we need a grammar, here's one right here grammar_file.yp:
#header
%%
#rules
expression:
| exterior '<' interior '>' exterior
| exterior
;
exterior:
| TOK { $_[0]->YYData->{DATA} .= $_[1]; }
| expression
;
interior: TOK;
%%
#footer
sub Error { my ($parser)=shift; }
sub Lexer {
use Data::Dumper;
my($parser)=shift;
$parser->YYData->{INPUT} or return('',undef);
#$parser->YYData->{INPUT}=~s/^\s+//;
for ($parser->YYData->{INPUT}) {
return ('TOK',$1) if(s/^([^<>]+)//);
return ( $1,$1) if(s/^([<>])//);
};
}
You will notice in the grammar above that the interior is completely ignored, and only the terminals from exterior are collected.
Here's a small program, parse.pl, that uses the parser (MyParser.pm, generated from grammar_file.yp):
#!/usr/bin/env perl
use strict;
use warnings;
use MyParser;
my $parser=MyParser->new;
$parser->YYData->{INPUT} = "Message <Network=Data Center> All Verified";
my $value=$parser->YYParse(
yylex => \&MyParser::Lexer,
yyerror => \&MyParser::Error,
#yydebug => 0x1F,
);
my $nberr=$parser->YYNberr();
my $data=$parser->YYData->{DATA};
print "Result=$data"
And now a Makefile and we're done:
generate_parser_module:
	yapp -m MyParser grammar_file.yp;
run:
	perl parse.pl
all: generate_parser_module
Note
Some more parser generators:
Regexp::Grammars
Parse::RecDescent
Marpa::XS or Marpa::R2
You can do it another way: just remove the string in the angle brackets:
s#<.*>##
Or if > is not allowed:
s#<[^>]*>##
You can use sed for that:
cat yourfile | sed 's/<[^>]*>//g' > newfile
If you need perl:
perl -i -pe "s/<.*?>//g" yourfile
Here is a compact approach. The following regex will capture your strings into Group 1:
<[^>]+>|([^<>]*)
What we are interested in here is not the overall match, but just the Group 1 matches.
So we need to iterate over Group 1 matches. I don't code in Perl, but following a recipe from the perlretut tutorial, this should do it:
while ($x =~ /<[^>]+>|([^<>]*)/g) {
    print "$1","\n";
}
Please give it a try and let me know if it works for you.
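For comparison, the same "match what you don't want, capture what you do" trick can be sketched in Python. The filter on group 1 is an addition: it skips the matches where the bracketed alternative won (group 1 is None) as well as empty matches:

```python
import re

line = "Message <Network=Data Center> All Verified"

# Left alternative consumes <...> regions without capturing them;
# right alternative captures everything else into group 1.
pattern = re.compile(r"<[^>]+>|([^<>]*)")

# Keep only the non-empty group 1 captures.
parts = [m.group(1) for m in pattern.finditer(line) if m.group(1)]
print(parts)
```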
Format of log line:
Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx.
I want to extract from '_m' to the end of the line, removing the '_' before the 'm'.
New to regex...
Thanks!
If your tool/language supports look-behind, this works: match from the first _m to the end of the line, leaving out the leading _
(?<=_)m.*
test with grep:
kent$ echo "Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx."|grep -Po '(?<=_)m.*'
mxxxxxmmxx [XXX xxxx.
With sed:
sed -n 's/^.*_\(m.*$\)/\1/p' file
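The look-behind and the capture-group variants can both be sketched in Python, using the log line format from the question:

```python
import re

log_line = ("Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: "
            "xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx.")

# Look-behind: require a preceding _ without making it part of the match.
m1 = re.search(r"(?<=_)m.*", log_line)
tail_lookbehind = m1.group(0) if m1 else None

# Capture group: match the _ but return only what follows it.
m2 = re.search(r"_(m.*)", log_line)
tail_group = m2.group(1) if m2 else None

print(tail_lookbehind)
print(tail_group)
```

Both are case-sensitive, which is why the earlier _M in the line is skipped.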
It is quite easy:
This example is written in C# however the regex is quite general and will probably work anywhere:
Regex regex = new Regex(@"_(m.*)"); // If you look for _M the regex should be @"_(M.*)"
Match match = regex.Match(logLine);
if (match.Success)
    Console.WriteLine(match.Groups[1].Value);
Hope this will help you on your quest.
I have been trying to remove the text before and after a particular keyword in each line of a text. It would be very hard to do manually since it contains 5000 lines and I need to remove the text before that keyword in each line. Any software that could do it would be great, or any Perl scripts that could run on Windows. I run Perl scripts in ActivePerl, so scripts that could do this and run on ActivePerl would be helpful.
Thanks
I'd use this:
$text =~ s/ .*? (keyword) .* /$1/gx;
You don't need software, you can make this part of your existing script. Multiline regex replace along the lines of /a(b)c/ then you can backref b in the replacer with $1. Without knowing more about the text you're working with it's hard to guess what the actual pattern would be.
Presuming that you have the following:
text1 text2 keyword text3 text4 text5 keyword text6 text7
and what you want is
s/.*?keyword(.*?)keyword.*/keyword$1keyword/;
otherwise you can just replace the whole line with keyword
An example of the data may help us be clearer
I'd say, that if $text contains your whole text, you can do :
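$text =~ s/^.*(keyword1|keyword2).*$/$1/mg;
The m modifier makes ^ and $ match at the beginning and end of each line, and not only at the beginning and end of the string; the g modifier applies the substitution to every line rather than just the first match.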
Assuming you want to remove all text to the left of keyword1 and all text to the right of keyword2:
while (<>) {
    s/.*(keyword1)/$1/;
    s/(keyword2).*/$1/;
    print;
}
Put this into a perl script and run it like this:
perl fix.pl original.txt > new.txt
Or if you just want to do this inplace, perhaps on several files at once:
perl -i.bak -pe 's/.*(keyword1)/$1/; s/(keyword2).*/$1/;' original.txt original2.txt
This will do inplace editing, renaming the original to have a .bak extension, use an implicit while-loop with print and execute the search and replace pattern before each print.
To be safe, verify it without the -i option first, or at the very least on only one file...
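If Perl is not at hand, the same two-step trimming can be sketched in Python. The function name trim_line and the sample lines are hypothetical; keyword1 and keyword2 are the placeholders used above:

```python
import re


def trim_line(line, start="keyword1", end="keyword2"):
    """Drop everything left of `start` and everything right of `end`."""
    # Greedy .* consumes up to the last occurrence of start on the line.
    line = re.sub(r".*(" + re.escape(start) + r")", r"\1", line, count=1)
    # Everything after the first occurrence of end is removed.
    line = re.sub(r"(" + re.escape(end) + r").*", r"\1", line, count=1)
    return line


# Illustrative lines (assumptions, not the asker's data).
trimmed = trim_line("aaa keyword1 bbb keyword2 ccc")
untouched = trim_line("no keywords here")
print(trimmed)
print(untouched)
```

Lines that contain neither keyword pass through unchanged, which matches the behaviour of the Perl substitutions.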