How to use preg_replace for the following pattern - regex

Input: "abbbcdaa", desired output: "abcd"
With the following regex the output is abcda:
preg_replace('/(.)\\1*/', '$1', "abbbcdaa");
How can I get abcd using preg_replace?

This should do it:
$string="abbbcdaa";
echo preg_replace('/(.)(?=.*?\1)/','',$string);
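The above outputs:
bcda
Note that this keeps the last occurrence of each character, which is why the result is bcda rather than abcd.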
Alternatively, you can use:
echo count_chars($string,3);
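That would also return the unique characters, abcd. Note that count_chars() returns them sorted by byte value rather than in their original order; for this input the two happen to coincide.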
Good luck!
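For comparison, here is a minimal Python sketch of the three behaviors discussed above, using Python's re module as a stand-in for PHP's preg functions (they should behave the same for these simple patterns). The function names are hypothetical. collapse_runs is the question's (.)\1* pattern, keep_last is the answer's lookahead pattern, and keep_first applies the same lookahead trick to the reversed string so that the first occurrence of each character survives.

```python
import re

def collapse_runs(s: str) -> str:
    # The question's pattern: replace each run of a repeated character
    # with a single copy ("bbb" -> "b", "aa" -> "a").
    return re.sub(r"(.)\1*", r"\1", s)

def keep_last(s: str) -> str:
    # The answer's pattern: drop a character whenever the same character
    # occurs again later, so only its last occurrence survives.
    return re.sub(r"(.)(?=.*?\1)", "", s)

def keep_first(s: str) -> str:
    # Reverse, keep the last occurrences, reverse back: the survivors are
    # now the first occurrences, in their original order.
    return keep_last(s[::-1])[::-1]

s = "abbbcdaa"
print(collapse_runs(s))           # abcda
print(keep_last(s))               # bcda
print(keep_first(s))              # abcd
print("".join(dict.fromkeys(s)))  # abcd, same result without a regex
```

In PHP, the same reversal idea would be strrev(preg_replace('/(.)(?=.*?\1)/', '', strrev($string))), which is fine for single-byte strings such as this one.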

Related

Extracting substrings ending with "mp4"

I have the following input:
string='GET........ref=mp4;GET........ref=flv;GET........ref=mp4;'
It has 3 segments. I need to extract the segments ending with mp4;, i.e.:
GET........ref=mp4
GET........ref=mp4
Currently the matches are GET........ref=mp4 and GET........ref=flv;GET........ref=mp4.
My regular expression: GET(.*?)mp4
I don't want the long match with flv inside, and this regex does not work either: GET(.*?)(?!:flv)mp4
I don't know how to solve this; any help is appreciated.
You can explode the semi-colon separated list and then use preg_grep to get only the elements that end with mp4:
$string='GET........ref=mp4;GET........ref=flv;GET........ref=mp4;';
$res = explode(";", $string);
$res = preg_grep('/mp4$/i', $res);
print_r($res);
If there are no semicolons and everything is glued together:
// NO SEMI_COLONS
$str='GET........ref=mp4GET........ref=flvGET........ref=mp4';
preg_match_all('/GET\b(?:(?!GET\b).)*mp4(?=$|GET\b)/', $str, $res);
print_r($res);
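The glued-string pattern relies on a tempered token, (?:(?!GET\b).)*, which consumes characters only while no new GET segment starts, so a match cannot run across the flv segment. A minimal Python port, assuming Python's re as a stand-in for preg_match_all (the variable names are illustrative):

```python
import re

glued = "GET........ref=mp4GET........ref=flvGET........ref=mp4"

# GET, then any characters that do not begin another "GET" segment,
# then "mp4" immediately followed by the end of the string or the next GET.
pattern = re.compile(r"GET\b(?:(?!GET\b).)*mp4(?=$|GET\b)")

matches = pattern.findall(glued)
print(matches)  # ['GET........ref=mp4', 'GET........ref=mp4']
```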
First things first, you need to split your string into tokens:
http://get........ref=mp4
http://get........ref=flv
http://get........ref=mp4
and then apply your regex. If you need each token to start with http and end with mp4, use "^http.*mp4$".
The ^ means the beginning of the line, $ means the end of the line, and .* means any character, 0 or more times. Here is an example that uses sed (GNU sed, which understands \n in the replacement) to split the results:
echo "http://get........ref=mp4;http://get........ref=flv;http://get........ref=mp4a;" | sed 's/;/\n/g' | grep "^http.*mp4$"
EDIT: if ';' is not your real separator, replace it with whatever is the real separator.
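The split-then-anchor idea can also be sketched in Python, with str.split standing in for the sed step and the sample string taken from the sed example. The .* between http and mp4 matters: a bare . would accept exactly one character there, so nothing would match.

```python
import re

data = "http://get........ref=mp4;http://get........ref=flv;http://get........ref=mp4a;"

# Split into tokens first, then require each whole token to start with
# "http" and end with "mp4" (anchored with ^ and $).
tokens = data.split(";")
kept = [t for t in tokens if re.search(r"^http.*mp4$", t)]
print(kept)  # ['http://get........ref=mp4']

# For contrast: "." alone means exactly one character, so this finds nothing.
broken = [t for t in tokens if re.search(r"^http.mp4$", t)]
print(broken)  # []
```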
If you are looking for a somewhat cleaner approach that works with or without ;, try:
preg_match_all("/GET(?:(?!GET).)*=mp4/", $str, $res);
print_r($res);
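A quick Python check (again using re as a stand-in for PCRE) that this tempered pattern returns the same two segments whether or not the semicolons are present:

```python
import re

with_semicolons = "GET........ref=mp4;GET........ref=flv;GET........ref=mp4;"
glued = "GET........ref=mp4GET........ref=flvGET........ref=mp4"

# GET, then characters that never start another GET, then "=mp4".
pattern = re.compile(r"GET(?:(?!GET).)*=mp4")

for s in (with_semicolons, glued):
    print(pattern.findall(s))  # ['GET........ref=mp4', 'GET........ref=mp4']
```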

How to remove matching pattern?

How do I remove a matching pattern from the file?
Every time the pattern [my_id= occurs, it should be removed without replacement.
For example, the field [my_id=AB_123456789.1] should become AB_123456789.1.
I already tried the following, with no result:
sed '/\[my\_id\=/d'
awk '$(NF-1) /^[protein\_id\=/d'
Also, as an alternative, is it possible to remove the first n characters from the second-to-last field ($(NF-1))?
Thanks for any help
You can use:
sed 's/\[my_id=\([^]]*\)\]/\1/g' file
\[my_id=\([^]]*\)\] matches [my_id=, then a string not containing ] (captured with the \(...\) syntax), then the closing ]; the whole match is replaced with the captured text, \1.
Test
$ cat a
hello [my_id=AB_123456789.1] bye
adf aa [my_id=AB_123456789.1] bbb
$ sed 's/\[my_id=\([^]]*\)\]/\1/g' a
hello AB_123456789.1 bye
adf aa AB_123456789.1 bbb
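The same substitution can be written with Python's re module in case sed is not at hand (a sketch; strip_my_id and the third sample line are hypothetical). Because the capture stops at the first ], brackets that are not part of a [my_id=...] field are left alone.

```python
import re

# "[my_id=", then a captured run of characters other than "]", then "]".
MY_ID = re.compile(r"\[my_id=([^\]]*)\]")

def strip_my_id(line: str) -> str:
    # Replace each whole [my_id=...] field with the captured value (\1).
    return MY_ID.sub(r"\1", line)

for line in [
    "hello [my_id=AB_123456789.1] bye",
    "adf aa [my_id=AB_123456789.1] bbb",
    "no field here [other]",
]:
    print(strip_my_id(line))
# hello AB_123456789.1 bye
# adf aa AB_123456789.1 bbb
# no field here [other]
```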
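You can try something like this in awk (the trailing 1 prints every line, including lines with nothing to replace; note that this removes every ] on the line, not just the one closing the field):
$ cat <<test | awk '{ gsub(/\[my_id=|\]/, "") } 1'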
hello [my_id=AB_123456789.1] bye
adf aa [my_id=AB_123456789.1] bbb
test
hello AB_123456789.1 bye
adf aa AB_123456789.1 bbb

Regex to extract content from each line of a log file output from '_m' to the end of the line

Format of log line:
Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx.
I want to extract from '_m' to the end of the line, removing the '_' before the 'm'.
New to regex...
Thanks!
If your tool or language supports look-behind, this works: it matches from the first _m to the end of the line, leaving out the leading _:
(?<=_)m.*
test with grep:
kent$ echo "Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx."|grep -Po '(?<=_)m.*'
mxxxxxmmxx [XXX xxxx.
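Python's re module also supports fixed-width look-behind, so the same pattern can be tried there (a sketch using the sample line from the question):

```python
import re

line = "Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx."

# (?<=_) requires a "_" right before the "m" but leaves it out of the match;
# matching is case-sensitive, so "_Mxxx" is skipped.
match = re.search(r"(?<=_)m.*", line)
print(match.group(0))  # mxxxxxmmxx [XXX xxxx.
```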
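With sed (note that the greedy ^.*_ makes this start at the last _m on the line; that only differs from the look-behind version when a line contains _m more than once):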
sed -n 's/^.*_\(m.*$\)/\1/p' file
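The two approaches can differ on a line that contains _m more than once: the look-behind search starts at the first _m, while a greedy ^.*_ backs up only as far as the last one. A small Python comparison (the sample line is hypothetical; the lazy .*? variant has no POSIX sed equivalent):

```python
import re

line = "id_mfirst then_msecond"  # hypothetical line with two "_m"

first = re.search(r"(?<=_)m.*", line).group(0)
# Greedy ".*" runs to the end of the line and backtracks to the LAST "_m".
last = re.search(r"^.*_(m.*)$", line).group(1)
# Lazy ".*?" stops at the FIRST "_m", matching the look-behind result.
lazy = re.search(r"^.*?_(m.*)$", line).group(1)

print(first)  # mfirst then_msecond
print(last)   # msecond
print(lazy)   # mfirst then_msecond
```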
It is quite easy:
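This example is written in C#; however, the regex is quite general and should work in most regex engines:
using System;
using System.Text.RegularExpressions;
// logLine is assumed to hold one line of the log file
Regex regex = new Regex(@"_(m.*)"); // If you look for _M, the regex should be @"_(M.*)"
Match match = regex.Match(logLine);
if (match.Success)
    Console.WriteLine(match.Groups[1].Value); // group 1 is the text after the "_"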
Hope this will help you on your quest.

How to write a regular expression that matches a repeated string and specifies the number of repetitions?

For example, I want to match the string "abcabc" in a text file, where exactly two "abc" are joined together and there are no characters directly before or after "abcabc".
If I use grep -n '(abc){2}' TEST, it does not work.
Escape the parentheses and braces (in grep's default basic regular expression syntax, ( ) and { } are literal characters unless escaped):
grep -n '\(abc\)\{2\}' TEST
If you want to match the string abcabc alone on a line, as your description seems to suggest, use:
grep -n '^\(abc\)\{2\}$' TEST
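In Python's re module (used here only as an illustration), ( ) and { } are operators without backslashes, much like grep -E. This sketch contrasts the unanchored and anchored forms on a few hypothetical lines:

```python
import re

lines = ["abcabc", "abcabcabc", "test abcabc", "xabcabc"]

unanchored = re.compile(r"(?:abc){2}")  # "abcabc" anywhere in the line
anchored = re.compile(r"^(?:abc){2}$")  # the whole line is exactly "abcabc"

print([l for l in lines if unanchored.search(l)])  # all four lines
print([l for l in lines if anchored.search(l)])    # ['abcabc']
```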
This is the shortest alternative:
^abcabc$
Try:
egrep '\b(abc){2}\b' input
Check whether this fits (the lines below are typed on standard input; grep echoes back each matching line):
$ grep '\<abcabc\>'
test
abcabc
abcabc
test abcabc
test abcabc
testabcabc
Thanks, everyone; from your answers, I think I have what I need.
As I said in my question, I want to match "abcabc", i.e. exactly two repeated "abc" with no other characters directly before or after it, but "abcabc" is not the only string on that line.
So the answer is:
grep -n '\b\(abc\)\{2\}\b' filename
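The word-boundary version can be checked the same way in Python, where \b behaves like GNU grep's \b for these ASCII examples (a sketch; the sample lines are hypothetical). abcabcabc is rejected because no word boundary follows the second abc.

```python
import re

lines = ["abcabc", "test abcabc", "testabcabc", "abcabcabc", "abcabc end"]

# Exactly two "abc" with a word boundary on each side.
pattern = re.compile(r"\b(?:abc){2}\b")

matching = [l for l in lines if pattern.search(l)]
print(matching)  # ['abcabc', 'test abcabc', 'abcabc end']
```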

Regex: Line does NOT contain a number

I've been racking my brain for hours on this and I'm at my wit's end. I'm beginning to think that this isn't possible for a regular expression.
The closest thing I've seen is this post: Regular expression to match a line that doesn't contain a word?, but the solution doesn't work when I replace "hede" with the number.
I want to select EACH line that DOES NOT contain: 377681 so that I can delete it.
^((?!377681).)*$
...doesn't work, along with thousands of other examples/tweaks that I've found or done.
Is this possible?
Would grep -v 377681 input_file solve your problem?
Try this one
^(?!.*377681).+$
The important part is to use the m (multiline) modifier, so that ^ matches the start of a line and $ the end of a line; otherwise it will not work.
(Note: I realize my regex has essentially the same meaning as yours; the only difference is that .+ does not match empty lines.)
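Here is a small Python check of that pattern with the multiline flag (re.MULTILINE is Python's counterpart of the m modifier; the text is hypothetical). It also shows the one practical difference from the question's pattern: .+ skips empty lines, while ((?!377681).)* matches them too.

```python
import re

text = "keep 377681 here\ndrop this line\n\nalso drop"

answer = re.compile(r"^(?!.*377681).+$", re.MULTILINE)
# The question's pattern, with a non-capturing group so findall returns whole lines.
question = re.compile(r"^(?:(?!377681).)*$", re.MULTILINE)

print(answer.findall(text))    # ['drop this line', 'also drop']
print(question.findall(text))  # ['drop this line', '', 'also drop']
```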
There's probably a better way of doing this, for example iterating over each line and using a built-in string method such as indexOf or contains, depending on the language you're using.
Could you give us the full example?
<?php
$lines = array(
'434343343776815456565464',
'434343343774815456565464',
'434343343776815456565464'
);
foreach($lines as $key => $value){
    // Drop every line that does NOT contain 377681
    if(!preg_match('#(377681)#is', $value)){
        unset($lines[$key]);
    }
}
print_r($lines);
?>
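The non-regex alternative mentioned above can be sketched in Python with the in operator. The line list is copied from the PHP example; like the PHP loop, the first list keeps only the lines that contain 377681, and the second collects the lines the question wants to delete.

```python
lines = [
    "434343343776815456565464",
    "434343343774815456565464",
    "434343343776815456565464",
]

# Plain substring test, no regex: keep lines containing the number...
kept = [line for line in lines if "377681" in line]
# ...or, for the question's goal, collect the lines to delete.
to_delete = [line for line in lines if "377681" not in line]

print(kept)       # ['434343343776815456565464', '434343343776815456565464']
print(to_delete)  # ['434343343774815456565464']
```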
You'll need to enable the m (multi-line) flag for ^ and $ to match the start and end of each line, respectively. If you don't, ^ will only match the start of the input and $ will only match the end of the input (or just before a final newline).
The following demo:
#!/usr/bin/env php
<?php
$text = 'foo 377681 bar
this can be 3768 removed
377681 more text
remove me';
echo preg_replace('/^((?!377681).)*$/m', '---------', $text);
?>
will print:
foo 377681 bar
---------
377681 more text
---------
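A Python equivalent of this demo (re.MULTILINE plays the role of the /m modifier) also shows what happens without the flag: ^ and $ can then only match at the start and end of the whole input, and since . does not match a newline, no match can span the multi-line text, so nothing is replaced.

```python
import re

text = "foo 377681 bar\nthis can be 3768 removed\n377681 more text\nremove me"
pattern = r"^((?!377681).)*$"

with_m = re.sub(pattern, "---------", text, flags=re.MULTILINE)
without_m = re.sub(pattern, "---------", text)

print(with_m)
print(without_m == text)  # True: without the flag the text is unchanged
```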