How can I remove repeated characters but leave two of them? - regex

If a character is repeated more than twice, as in
"Hiiiiiii
My frieeend!!!!!!!"
I need it reduced to
"Hii
My frieend!!"
Please understand that in my language there are many words with double chars.
Thanks in advance
kplla

Perl / regex (and if the text isn't English, Perl has given me better luck with Unicode than PHP):
#!/usr/bin/perl
$str = "Hiiiiii My Frieeeeend!!!!!!!";
$str =~ s/(.)\1\1+/$1$1/g;
print $str;

If a PHP and regex based solution is fine you can do:
$str = "Hiiiiiii My frieeend!!!!!!!";
$str = preg_replace('#(.)\1+#','$1',$str);
echo $str; // prints Hi My friend!
$str = preg_replace('#(.)\1{2,}#','$1$1',$str);
echo $str; // prints Hii My frieend!!
You can make use of the regex used above in Perl too:
$str = "Hiiiiiii My frieeend!!!!!!!";
$str =~s/(.)\1{2,}/$1$1/g;

Here's another regex solution that uses lookahead (just for fun), in Java:
System.out.println(
"Hiiiiii My Frieeeeend!!!!!!!".replaceAll("(.)(?=\\1\\1)", "")
); // prints "Hii My Frieend!!"
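For comparison, here is a minimal Python sketch of the same back-reference idea. It is not from any of the answers above, and the function name squeeze_runs and its keep parameter are made up for illustration. Python strings are Unicode, so it should behave the same for non-English text, as long as each letter is a single code point.

```python
import re

def squeeze_runs(text, keep=2):
    """Collapse any run of more than `keep` identical characters down to `keep`."""
    # (.) captures one character; \1{keep,} demands at least `keep` more copies,
    # so only runs of keep+1 or longer are rewritten. "." does not match newlines.
    pattern = re.compile(r"(.)\1{%d,}" % keep)
    return pattern.sub(lambda m: m.group(1) * keep, text)

print(squeeze_runs("Hiiiiiii\nMy frieeend!!!!!!!"))
# Hii
# My frieend!!
```

Words that legitimately contain a doubled letter (for example "book") are left alone, because a run of exactly two never matches.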

Related

Need a regex for the following expression?

#! /usr/bin/perl
$str = "ab_cde,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)";
or
$str = "ab_cde,bc_bn,gy_ihf,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)";
Guys, the part of the string before main_xx is dynamic: there can be more elements of the form xx_xx, xxx_xx, xx_xxx or xxx_xxxx, with any number of characters before and after the underscore. So any number of elements in that format can come before main_xx. I want to match the string UP TO main_xx, because even when the elements are fetched dynamically, main_xx will be the last one I need, and I want to ignore the elements after main_xx. Please help me create a regex for this.
#!/usr/bin/perl -w
use strict;
my $str = "ab_cde,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)";
(my $result) = ($str =~ m/(.*main_xx)/);
print $result;
The output will be everything up to main_xx (given xx is just the string made of x's).
Try this
my $str = "ab_cde,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)";
my ($match)= $str =~m/(.+main_xx)/;
print $match;
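Both answers rely on a greedy quantifier running as far as it can and then backing up to the marker. A rough Python sketch of the same thing, assuming the marker is the literal text main_xx (the helper name up_to_marker is hypothetical):

```python
import re

def up_to_marker(text, marker="main_xx"):
    """Return everything from the start of `text` through the last `marker`, or None."""
    # Greedy .* consumes as much as possible, so the match ends at the LAST
    # occurrence of the marker; re.escape keeps the marker literal.
    m = re.match(r".*" + re.escape(marker), text)
    return m.group(0) if m else None

print(up_to_marker("ab_cde,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)"))
# ab_cde,efg_gh,drg_fgt,main_xx
```

If the marker can appear more than once and you want to stop at the first one, a non-greedy .*? would be the variation to try.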

Perl simple regex uppercase words separated by underscore

Consider I have string like print_this_text_in_camel_case and I want to uppercase the first word and every word after the underscore, so the result will be Print_This_Text_In_Camel_Case. The below test does not work on the first word.
#!/usr/bin/perl
my $str = "print_this_text_in_camel_case";
$str =~ s/(_.)/uc($1)/ge;
print $str, "\n";
Just modify the regex to match the first char as well:
#!/usr/bin/perl
my $str = "print_this_text_in_camel_case";
$str =~ s/(_.|^.)/uc($1)/ge;
print $str, "\n";
will print out:
Print_This_Text_In_Camel_Case
You need to add a beginning-of-string anchor as an alternative to the underscore.
For Perl 5.10+, I'd use a \K (keep) escape to emulate variable-width look-behind and only uppercase the letter. I'd also use \U to perform the uppercase in the replacement text instead of uc and the /e (eval) modifier.
$str =~ s/(?:^|_)\K(.)/\U$1/g;
If you're using an older version of Perl (without \K) you could do it this way:
$str =~ s/(^|_)(.)/$1\U$2/g;
Another alternative is using split and join instead of a regex:
$str = join '_', map { ucfirst } split /_/, $str;
It is tidiest to use a negative look-behind. This code fragment upper-cases all letters that aren't preceded by a letter.
my $str = "print_this_text_in_camel_case";
$str =~ s/ (?<!\p{alpha}) (\p{alpha}) /uc $1/xgei;
print $str, "\n";
output
Print_This_Text_In_Camel_Case
If you prefer, or if you have a very old copy of Perl that doesn't support Unicode properties, you can use [a-z] instead of \p{alpha}, like this:
$str =~ s/ (?<![a-z]) ([a-z]) /uc $1/xige;
which produces the same result.
You could also use ucfirst
use feature 'say';
my $str = "print_this_text_in_camel_case";
my @split = map(ucfirst, (split/(_)/, $str));
say @split;
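For readers who want to compare the regex and split/join approaches side by side outside Perl, here is a small Python sketch of both. The function names are invented for this example, and the look-behind plays the role that \K plays in the Perl answer.

```python
import re

def capitalize_words_regex(text):
    # Match one character that sits at the start of the string or right after
    # an underscore; the look-behind keeps the "_" itself out of the match.
    return re.sub(r"(?:^|(?<=_))(.)", lambda m: m.group(1).upper(), text)

def capitalize_words_split(text):
    # Upper-case only the first character of each piece; the rest is untouched.
    return "_".join(w[:1].upper() + w[1:] for w in text.split("_"))

print(capitalize_words_regex("print_this_text_in_camel_case"))
# Print_This_Text_In_Camel_Case
print(capitalize_words_split("print_this_text_in_camel_case"))
# Print_This_Text_In_Camel_Case
```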

Ereg_replace with a String

I have a string like this:
$str = "{gfgd}i:123;a:7:{gfgd}i:5;a:35:";
And I want to turn it into:
$str = "{gfgd},{gfgd},";
I want to use ereg_replace on it and replace phrases of this kind:
"i:[0-9]a:[0-9]:" with a "," sign.
I tried this:
$str = "i:143;a:5:{gfgd}i:123;a:7:{gfgd}i:5;a:35:";
$text = ereg_replace("/^i:[0-9]+;a:[0-9]+:+$", ",", $str);
But it doesn't work. Can you help me?
Thank you in advance
$str = "i:143;a:5:{gfgd}i:123;a:7{gfgd}i:5;a:35";
$str = ereg_replace("\}[^\{]+\{", "},{", $str); // replace between } and { with },{
$str = ereg_replace("^[^\{]+", "", $str); // remove from first
$str = ereg_replace("[^\}]+$", ",", $str); // remove from last
print $str;
Don't use ereg_replace: the function has been DEPRECATED as of PHP 5.3.0 (and was removed in PHP 7.0.0).
Use preg_replace instead. Your regex is also wrong: remove the anchors ^ and $.
$text = preg_replace('/i:[0-9]+;a:[0-9]+:?/', ",", $str);
//=> ,{gfgd},{gfgd},
Online Demo: http://ideone.com/W2P55n
It looks like you are dealing with a PHP array or object serialized into string. I recommend running:
<?php
$arrayOrObject = unserialize($theEntireStringYouGot);
print_r($arrayOrObject);
?>
That way you may not need to even deal with regex at all.
Note: it will not unserialize a fragment of the string like the one in your example; feed it the whole thing.
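To illustrate why the anchors had to go, here is a short Python sketch of the unanchored replacement. It is only an illustration of the pattern, not a PHP replacement, and collapse_markers is a made-up name.

```python
import re

def collapse_markers(text):
    # No ^ or $ here: the pattern has to match anywhere in the string, several
    # times, rather than requiring the whole string to be a single "i:N;a:N:".
    return re.sub(r"i:[0-9]+;a:[0-9]+:", ",", text)

print(collapse_markers("{gfgd}i:123;a:7:{gfgd}i:5;a:35:"))
# {gfgd},{gfgd},
```

With the longer test string from the question, which starts with an i:...;a:...: group, the result also starts with a comma.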

How to substitute whitespaces and tabs in a string with _ in perl?

$string = I am a boy
How do I replace the whitespace between words with underscores?
You need a regular expression and the substitution operator to do that.
my $string = 'I am a boy';
$string =~ s/\s/_/g;
You can learn more about regex in perlre and perlretut. A nice tool to play around with is Rubular.
Also, your code will not compile. You need to quote your string, and you need to put a semicolon at the end.
$string = 'I am a boy';
$string =~ s/ /_/g;
$string =~ tr( \t)(_); # Double underscore not necessary as per Dave's comment
This is just to show another option in Perl. I think Miguel Prz and imbabque showed smarter ways; personally, I follow the way imbabque showed.
my $str = "This is a test string";
$str =~ s/\p{Space}/_/g;
print $str."\n";
and the output is
This_is_a_test_string
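One detail the answers above don't spell out is whether a run of several whitespace characters should become one underscore or several. A quick Python sketch of the two behaviours (function names are hypothetical):

```python
import re

def underscore_each(text):
    # Every single whitespace character (space, tab, ...) becomes one "_".
    return re.sub(r"\s", "_", text)

def underscore_runs(text):
    # A whole run of whitespace collapses into a single "_".
    return re.sub(r"\s+", "_", text)

print(underscore_each("I am\t a boy"))  # I_am__a_boy
print(underscore_runs("I am\t a boy"))  # I_am_a_boy
```

The Perl substitutions shown above behave like the first function; adding + after \s would give the second.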

How do I match a Russian word in Unicode text using Perl?

I have a website I want to regexp on, say http://www.ru.wikipedia.org/wiki/perl . The site is in Russian and I want to pull out all the Russian words. Matching with \w+ doesn't work and matching with \p{L}+ retrieves everything.
How do I do it?
All those answers are overcomplicated. Use this
$text =~/\p{cyrillic}/
bam.
perl -MLWP::Simple -e 'getprint "http://ru.wikipedia.org/wiki/Perl"'
403 Forbidden <URL:http://ru.wikipedia.org/wiki/Perl>
Well, that doesn't help!
Downloading a copy first, this seems to work:
use Encode;
local $/ = undef;
my $text = decode_utf8(<>);
my @words = ($text =~ /([\x{0400}-\x{04ff}]+)/gs);
foreach my $word (@words) {
print encode_utf8($word) . "\n";
}
Okay, then try this:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $response = $ua->get("http://ru.wikipedia.org/wiki/Perl");
die $response->status_line unless $response->is_success;
my $content = $response->decoded_content;
my @russian = $content =~ /\s([\x{0400}-\x{052F}]+)\s/g;
print map { "$_\n" } @russian;
I believe that the Cyrillic character set starts at 0x0400 and the Cyrillic supplement character set ends at 0x052F, so this should get many of the words.
I'll just leave this here.
To match a specific Russian word:
use utf8;
...
utf8::decode($text);
$text =~ /привет/;
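As a cross-check of the code-point-range idea, here is a small Python sketch that pulls out every run of characters in U+0400 to U+052F. It assumes "Russian word" simply means such a run, which is a simplification, and the names CYRILLIC_WORD and cyrillic_words are invented here.

```python
import re

# Assumption: a "Russian word" is a run of code points in the Cyrillic and
# Cyrillic Supplement blocks (U+0400 through U+052F).
CYRILLIC_WORD = re.compile(r"[\u0400-\u052F]+")

def cyrillic_words(text):
    # findall returns every non-overlapping run; no surrounding whitespace is required.
    return CYRILLIC_WORD.findall(text)

print(cyrillic_words("Perl это язык программирования (programming language)"))
# ['это', 'язык', 'программирования']
```

Not requiring whitespace on both sides matters. A pattern of the form \s(...)\s consumes the trailing space, so with /g it can skip every other word when words are separated by a single space, and it also misses words next to punctuation.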