Convert regex string to URL

I'm trying to convert words into URLs. The words are separated by commas (,). My problem is that the first word is not converted, because there is no comma in front of it.
public static function convertHashtags($str){
    $regex = "/,+([a-zA-Z0-9_]+)/";
    $str = preg_replace($regex, '$0', htmlentities($str));
    return($str);
}
For example, with $str = "june,mars,april", only mars and april are converted to URLs, not june.

You can change your regex to:
$regex = '/(?<=,|^)([a-zA-Z0-9_]+)/';
This matches words preceded by either the start of the string or a comma.
Since \w is equivalent to [a-zA-Z0-9_] here, you can shorten the regex to:
$regex = '/(?<=,|^)(\w+)/';
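For comparison, here is a minimal Python sketch of the same idea (an illustration, not a port of the PHP method). Python's re module rejects variable-width lookbehinds such as (?<=,|^), so the start-of-string case is moved outside the lookbehind as (?:^|(?<=,)). html.escape stands in for htmlentities, and the function name convert_words and the link format /tag/<word> are hypothetical placeholders.

```python
import html
import re

# A word qualifies if it sits at the start of the string or right after a comma.
# (?:^|(?<=,)) keeps the comma out of the match, like (?<=,|^) does in PCRE.
WORD = re.compile(r"(?:^|(?<=,))(\w+)")


def convert_words(text: str) -> str:
    """Wrap each comma-separated word in an <a> tag.

    The href format "/tag/<word>" is a hypothetical placeholder.
    """
    escaped = html.escape(text)  # rough analogue of PHP's htmlentities
    return WORD.sub(lambda m: f'<a href="/tag/{m.group(1)}">{m.group(1)}</a>', escaped)


print(convert_words("june,mars,april"))
```

Note that \w in a Python 3 str pattern also matches non-ASCII letters; add the re.ASCII flag if you want it limited to [a-zA-Z0-9_].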

Related

How to write a regular expression in PowerShell

I need a regular expression in PowerShell to split a string on ## and remove everything up to the next semicolon (;).
I have the following string.
$temp = "admin#test.com## deliver, expand;user1#test.com## deliver, expand;group1#test.com## deliver, expand;"
I want to split this string and collect only the email IDs into a new array. My expected output is:
admin#test.com
user1#test.com
group1#test.com
To get this output, I need to split the string on ## and remove the substring up to the semicolon (;).
Can anyone help me write a regex to do this in PowerShell?
If you want to use regex-based splitting, you can use the ##[^;]*; regex with the following code, which also removes all empty values (with | ? { $_ }):
$res = [regex]::Split($temp, '##[^;]*;') | ? { $_ }
The ##[^;]*; matches:
## - double #
[^;]* - zero or more characters other than ;
; - a literal ;.
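The same split can be sketched in Python, assuming the string from the question. re.split takes the same ##[^;]*; pattern, and a list comprehension drops the empty trailing element the way | ? { $_ } does in PowerShell.

```python
import re

temp = ("admin#test.com## deliver, expand;user1#test.com## deliver, expand;"
        "group1#test.com## deliver, expand;")

# Split on "##" followed by everything up to and including the next ";".
# The final ";" leaves an empty string at the end, which the filter drops.
emails = [part for part in re.split(r"##[^;]*;", temp) if part]
print(emails)
```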
Alternatively, use [regex]::Matches to get all occurrences of a regular expression. You probably don't need to split the string first if this works for you:
\b\w+#[^#]*
PowerShell code:
[regex]::Matches($temp, '\b\w+#[^#]*') | ForEach-Object { $_.Groups[0].Value }
Output:
admin#test.com
user1#test.com
group1#test.com
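For comparison, the match-based approach in Python, as a sketch that assumes the same input string: re.findall returns every whole match when the pattern has no capture groups, so no splitting or filtering is needed.

```python
import re

temp = ("admin#test.com## deliver, expand;user1#test.com## deliver, expand;"
        "group1#test.com## deliver, expand;")

# \b\w+ is the user part, # is the separator, [^#]* runs up to the next "#".
emails = re.findall(r"\b\w+#[^#]*", temp)
print(emails)
```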

Copy the matched pattern of a line to the end of the line

I want to match a pattern in a line of text and then append the match to the end of that line. In the case below, I want to match numbers and paste them at the end of the line. When a line has two matches, I want them comma-separated.
Basically, I am looking for a way to use the matched portion as a variable.
I want to do this in Bash.
abc 123=
agdaf456ad
dfaf879:
abc123xyz12:
should become:
abc 123=123
agdaf456ad456
dfaf879:879
abc123xyz12:123,12
Something like
(\d+)(.*)$
And replace with
$1$2$1
Example in PHP:
$replace = preg_replace("/(\d+)(.*)$/", "$1$2$1", "abc 123=");
echo $replace;
=> abc 123=123
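The same capture-group replacement, sketched in Python with \1 and \2 in the re.sub template (the function name append_first_number is hypothetical). It also makes the limitation of this pattern visible: only the first run of digits is copied, so abc123xyz12: ends in 123 rather than the 123,12 the question asks for.

```python
import re


def append_first_number(line: str) -> str:
    # \1 is the first run of digits, \2 is the rest of the line after it
    return re.sub(r"(\d+)(.*)$", r"\1\2\1", line)


print(append_first_number("abc 123="))
print(append_first_number("abc123xyz12:"))
```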
To get all sequences of digits in a string, a plain \d+ regex is enough; then implode the resulting array and append it to the input string:
$str = "abc123xyz12:";
preg_match_all('/\d+/', $str, $m);
$append = implode(",", $m[0]);
echo $str . $append;
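A Python sketch of the find-all-and-join approach, applied to each line of the question's sample input. The hard-coded list and the function name append_all_numbers are assumptions; in practice you would loop over the lines of a file or standard input instead.

```python
import re


def append_all_numbers(line: str) -> str:
    # Collect every run of digits and append them, comma-separated
    return line + ",".join(re.findall(r"\d+", line))


lines = ["abc 123=", "agdaf456ad", "dfaf879:", "abc123xyz12:"]
for line in lines:
    print(append_all_numbers(line))
```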

Perl regex replace for a specific string length

I am using Perl to do some prototyping.
I need an expression that replaces e with [ee] if the string is exactly two characters long and ends in "e".
le -> l [ee]
me -> m [ee]
elle -> elle : no change
I cannot test the length of the string separately; I need a single expression to do the whole job.
I tried:
`s/(?=^.{0,2}\z).*e\z%/[ee]/g` but this is replacing the whole string
`s/^[c|d|j|l|m|n|s|t]e$/[ee]/g` same result (I listed the possible letters that could precede my "e")
`^(?<=[c|d|j|l|m|n|s|t])e$/[ee]/g` but I get no match; I'm not sure I can use ^ with a positive lookbehind
EDIT
You're amazing: after hours of searching the web, I got answers here minutes after posting.
I tried all your solutions and they work perfectly when used directly in my script, e.g. this one:
my $test2="le";
$test2=~ s/^(\S)e$/\1\[ee\]/g;
print "test2:".$test2."\n";
-> test2:l[ee]
But I am loading these regexes from a text file (I'm using Perl for prototyping; the idea is to reuse the file with any language that implements regexes).
In the text file I store, for example (I use % to separate the match from the replacement):
^(\S)e$% \1\[ee\]
and then I parse and apply all the regexes like this:
my $test="le";
while (my $row = <$fh>) {
    chomp $row;
    if ($row =~ /%/) {
        my @reg = split /%/, $row;
        #if no replacement, put empty string
        if ($#reg == 0) {
            push(@reg, "");
        }
        print "reg found, reg:".$reg[0].", replace:".$reg[1]."\n";
        push @regs, [ @reg ];
    }
}
print "orgine:".$test."\n";
for my $i (0 .. $#regs) {
    my $p=$regs[$i][0];
    my $r=$regs[$i][1];
    $test=~ s/$p/$r/g;
}
print "final:".$test."\n";
This technique works well with my other regexes, but not when the replacement contains $1 or \1. Here is what I get:
final:\1\ee\
PS: you answered the initial question; should I open another post?
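For comparison, a minimal Python sketch of the same table-driven idea, with the rules held in a list instead of read from a file (an assumption made to keep it self-contained; the helper names load_rules and apply_rules are hypothetical). Python's re.sub parses backreferences such as \1 in the replacement template even when that template comes from data, so a rule loaded this way behaves the same as one written inline. The stored rule drops the backslashes before the brackets, since brackets are not special in Python replacement templates, and it splits each row at the first % only.

```python
import re

# Each rule is "pattern%replacement", the same format as the text file in the question.
RULES = [
    r"^(\S)e$%\1 [ee]",
]


def load_rules(lines):
    rules = []
    for row in lines:
        if "%" not in row:
            continue
        # partition splits at the first "%"; a missing replacement becomes ""
        pattern, _, replacement = row.partition("%")
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(text, rules):
    for pattern, replacement in rules:
        # The replacement is parsed as a template, so \1 refers to group 1.
        text = pattern.sub(replacement, text)
    return text


rules = load_rules(RULES)
for word in ["le", "me", "elle"]:
    print(word, "->", apply_rules(word, rules))
```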
Something like s/(?i)^([a-z])e$/$1\[ee]/
Why aren't you using a capture group to do the replacement?
`s/^([cdjlmnst])e$/$1 [ee]/g`
If those are the characters you need, and each line holds exactly one word with no whitespace before or after it, this will work. (Inside a character class | is a literal character, so the list is written without the pipes.)
Here's another option, depending on what you are looking for. It matches a two-character string, one a-z letter followed by an 'e', on its own line with optional whitespace before or after, and replaces it with the letter followed by ' [ee]':
`s/^\s*([a-z])e\s*$/\1 [ee]/`
^(\S)e$
Try this, and replace with $1 [ee]. See the demo:
https://regex101.com/r/hR7tH4/28
I'd do something like this:
$word =~ s/^(\w{1})(e)$/$1$2e/;
You can use the following regex, which matches any two letters, and then replace with $1\[$2$2\]:
^([a-zA-Z])([a-zA-Z])$
In Perl:
$my_string =~ s/^([a-zA-Z])([a-zA-Z])$/$1\[$2$2\]/;
See demo https://regex101.com/r/iD9oN4/1

Perl regex in a variable string

I'm new to Perl regexes.
I'm trying to build a string for the OID 1.3.6.1.2.1.4.22.1.2.*.192.168.1.1, where * can be any number, but I'm not sure how to do it.
I tried the code below, but I get an error saying the OID is not recognized.
my $matchanyoid = "/(\d+)$/";
my $dot1dTpFdbAddress = '1.3.6.1.2.1.4.22.1.2.',$matchanyoid,'\.',$srcip;
Comma is not a concatenation operator, dot is:
my $dot1dTpFdbAddress = '1.3.6.1.2.1.4.22.1.2.' . $matchanyoid . '\.' . $srcip;
If you are trying to build a regular expression, note that the first several dots are not backslashed, so each of them matches any character. To avoid lots of backslashes, you can use the \Q ... \E construct:
my $matchanyoid = '(\d+)';
my $srcip = 12;
my $regex = qr/\Q1.3.6.1.2.1.4.22.1.2.\E$matchanyoid\.$srcip/;
print '1.3.6.1.2.1.4.22.1.2.123.12' =~ $regex;
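A rough Python equivalent of the \Q ... \E answer, as a sketch: re.escape does the job of \Q ... \E, and the source IP is escaped as well so that the dots in an address like 192.168.1.1 are literal. The helper name oid_regex is hypothetical, and using fullmatch is a design choice here: it requires the whole OID to match, whereas the Perl regex above is unanchored.

```python
import re


def oid_regex(src_ip: str):
    # re.escape plays the role of Perl's \Q...\E: the dots become literal.
    prefix = re.escape("1.3.6.1.2.1.4.22.1.2.")
    # (\d+) is the variable part of the OID; the source IP is escaped too,
    # so its dots cannot match arbitrary characters.
    return re.compile(prefix + r"(\d+)\." + re.escape(src_ip))


m = oid_regex("12").fullmatch("1.3.6.1.2.1.4.22.1.2.123.12")
print(m.group(1) if m else None)
```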

Perl - Regex to extract only the comma-separated strings

I have a question I am hoping someone could help with...
I have a variable that contains the content from a webpage (scraped using WWW::Mechanize).
The variable contains data such as these:
$var = "ewrfs sdfdsf cat_dog,horse,rabbit,chicken-pig"
$var = "fdsf iiukui aawwe dffg elephant,MOUSE_RAT,spider,lion-tiger hdsfds jdlkf sdf"
$var = "dsadp poids pewqwe ANTELOPE-GIRAFFE,frOG,fish,crab,kangaROO-KOALA sdfdsf hkew"
The only bits I am interested in from the above examples are:
@array = ("cat_dog","horse","rabbit","chicken-pig")
@array = ("elephant","MOUSE_RAT","spider","lion-tiger")
@array = ("ANTELOPE-GIRAFFE","frOG","fish","crab","kangaROO-KOALA")
The problem I am having:
I am trying to extract only the comma-separated strings from the variables and then store these in an array for use later on.
But what is the best way to make sure I also get the strings at the start (i.e. cat_dog) and end (i.e. chicken-pig) of the comma-separated list, since they are not prefixed or suffixed with a comma?
Also, since the variables contain webpage content, there will inevitably be cases where a comma is immediately followed by a space and then another word, as that is the correct way to use commas in sentences...
For example:
Saturn was long thought to be the only ringed planet, however, this is now known not to be the case.
^ ^
| |
note the spaces here and here
I am not interested in any cases where the comma is followed by a space (as shown above).
I am only interested in cases where the comma DOES NOT have a space after it (i.e. cat_dog,horse,rabbit,chicken-pig).
I have tried a number of ways of doing this but cannot work out the best way to construct the regular expression.
How about
[^,\s]+(,[^,\s]+)+
which will match one or more characters that are not a space or comma [^,\s]+ followed by a comma and one or more characters that are not a space or comma, one or more times.
Further to the comments:
To match more than one sequence, add the g modifier for global matching.
The following splits each match $& on , and pushes the results to @matches.
my $str = "sdfds cat_dog,horse,rabbit,chicken-pig then some more pig,duck,goose";
my @matches;
while ($str =~ /[^,\s]+(,[^,\s]+)+/g) {
    push(@matches, split(/,/, $&));
}
print join("\n", @matches), "\n";
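A Python sketch of the same regex-plus-split approach (the function name comma_lists is hypothetical). The group is written as non-capturing (?:...) because re.findall would otherwise return only the captured group; iterating with finditer and splitting group(0) mirrors splitting $& in the Perl loop.

```python
import re

# One or more non-comma, non-space characters, then one or more ",term" groups.
COMMA_LIST = re.compile(r"[^,\s]+(?:,[^,\s]+)+")


def comma_lists(text: str) -> list:
    items = []
    for m in COMMA_LIST.finditer(text):
        # group(0) is the whole comma-separated run, like $& in Perl
        items.extend(m.group(0).split(","))
    return items


print(comma_lists("sdfds cat_dog,horse,rabbit,chicken-pig then some more pig,duck,goose"))
```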
Though you could probably construct a single regex, a combination of regexes, split, grep and map looks decent:
my @array = map { split /,/ } grep { !/^,/ && !/,$/ && /,/ } split;
Going from right to left:
Split the line on spaces (split)
Leave only elements that have no comma at either end but have one inside (grep)
Split each such element into parts (map and split)
This way you can easily adjust individual steps; e.g. to reject two consecutive commas, add && !/,,/ inside the grep.
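The split/grep/map pipeline translates almost directly into Python. This sketch, with a hypothetical function name, assumes whitespace-separated tokens as in the examples; str.split() with no argument splits on runs of whitespace, much like Perl's bare split.

```python
def comma_lists_by_tokens(text: str) -> list:
    result = []
    # Split the line on whitespace (Perl: split)
    for token in text.split():
        # Keep tokens with an inner comma but none at either end (Perl: grep)
        if "," in token and not token.startswith(",") and not token.endswith(","):
            # Split each kept token into its parts (Perl: map and split)
            result.extend(token.split(","))
    return result


print(comma_lists_by_tokens("Another sentence, although having commas, should not confuse the regex with this: a,b,c,d"))
```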
I hope this is clear and suits your needs:
#!/usr/bin/perl
use warnings;
use strict;
my @strs = ("ewrfs sdfdsf cat_dog,horse,rabbit,chicken-pig",
            "fdsf iiukui aawwe dffg elephant,MOUSE_RAT,spider,lion-tiger hdsfds jdlkf sdf",
            "dsadp poids pewqwe ANTELOPE-GIRAFFE,frOG,fish,crab,kangaROO-KOALA sdfdsf hkew",
            "Saturn was long thought to be the only ringed planet, however, this is now known not to be the case.",
            "Another sentence, although having commas, should not confuse the regex with this: a,b,c,d");
my $regex = qr/
    \s              #From your examples, it seems as if every
                    #comma separated list is preceded by a space.
    (
        (?:
            [^,\s]+ #Now, not a comma or a space for the
                    #terms of the list
            ,       #followed by a comma
        )+
        [^,\s]+     #followed by one last term of the list
    )
/x;
my @matches = map {
    # test the match itself, so $1 is only read after a successful match
    if ($_ =~ /$regex/) {
        my $comma_sep_list = $1;
        [split ',', $comma_sep_list];
    }
    else {
        []
    }
} @strs;
$var =~ tr/ //s;
while ($var =~ /(?<!, )\b[^, ]+(?=,\S)|(?<=,)[^, ]+(?=,)|(?<=\S,)[^, ]+\b(?! ,)/g) {
    push(@arr, $&);
}
The regular expression matches three cases:
(?<!, )\b[^, ]+(?=,\S) : matches cat_dog
(?<=,)[^, ]+(?=,) : matches horse & rabbit
(?<=\S,)[^, ]+\b(?! ,) : matches chicken-pig