Find all commas between two separate characters in string - regex

I have a substring that contains commas. This substring lives inside another string that is a semicolon-delimited list. I need to match the commas in that substring. The substring has a key field "u3=" in front of it.
Example:
u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing
Regex so far:
(?<=u3)(.*)(?=;)
The regex I've been working on above matches everything between "u3" and the last ";" in the outer string. I need to match only the commas in the substring.
Any guidance would be greatly appreciated.

You didn't specify a language, so here are options for several:
C#, VB (.NET):
Using an infinite-width positive lookbehind:
(?<=u3=[^;]*),
Java:
Using a variable-length positive lookbehind:
(?<=u3=[^;]{0,9999}),
PHP (PCRE), Perl, Ruby:
Using \G along with \K token:
(?>u3=|\G(?!^))[^,;]+\K,
JavaScript:
Using two replace() methods (if you are going to substitute),
var s = 'u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing';
console.log(
  s.replace(/u3=[^;]+/, function (match) {
    return match.replace(/,/g, '*');
  })
);
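Python's built-in re module only accepts fixed-width lookbehinds, so the .NET and Java patterns above do not carry over directly. A minimal sketch of the same two-step idea as the JavaScript version follows; the helper name mask_u3_commas and the "*" replacement are assumptions made for illustration.

```python
import re


def mask_u3_commas(s, replacement="*"):
    """Replace commas only inside the value of the first u3= field."""
    # Step 1: match the whole "u3=..." field, up to the next semicolon.
    # Step 2: replace commas inside that match only.
    return re.sub(
        r"u3=[^;]*",
        lambda m: m.group(0).replace(",", replacement),
        s,
        count=1,  # mirror the non-global JavaScript replace
    )


s = "u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing"
print(mask_u3_commas(s))
```

Like the JavaScript version, this does not check that "u3=" starts a field, so a key such as "xu3=" would also match.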

Try to use this regex:
(?<=u3)[^;]+
The result is:
=cat,matt,bat,hat

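If this were PHP, I would do this:
<?php
$str = 'u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing;';
$matches = []; // stays empty if no u3 field is found
foreach (explode(';', $str) as $field) {
    // Limit to 2 parts so a value containing '=' stays intact.
    $subsplit = explode('=', $field, 2);
    if ($subsplit[0] === 'u3' && isset($subsplit[1])) {
        echo $subsplit[1], "\n";
        // PREG_OFFSET_CAPTURE records each comma's offset within the u3 value.
        preg_match_all('/,/', $subsplit[1], $matches, PREG_OFFSET_CAPTURE);
    }
}
var_dump($matches);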
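The split-based approach can also report where the commas sit in the full string, which is what a lookbehind match would give you. This is a rough Python sketch under that assumption; comma_offsets is a hypothetical helper name.

```python
def comma_offsets(s, key="u3"):
    """Return positions (in the full string) of commas inside key's value."""
    pos = 0  # offset of the current field within s
    for field in s.split(";"):
        name, sep, value = field.partition("=")
        if sep and name == key:
            value_start = pos + len(name) + 1  # skip "key="
            return [value_start + i for i, ch in enumerate(value) if ch == ","]
        pos += len(field) + 1  # +1 for the semicolon that was split away
    return []


s = "u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing"
offsets = comma_offsets(s)
print(offsets)
print(all(s[i] == "," for i in offsets))
```

Unlike a bare substring search for "u3=", this compares the whole key, so a field named "xu3" is not picked up by mistake.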


Perl regex exclude optional word from match

I have some strings and need to extract only the icnnumber/number values from them.
icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ
I need to extract the data below from the example above.
9876AB54321
987654321FR
987654321YQ
Here is my regex, but it only works for the first line of data.
(icnnumber|number):(\w+)(?:_IN)
How can I write an expression that matches all three lines?
Given your strings to extract are only upper case and numeric, why use \w when that also matches _?
How about just matching:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    # Print only when the line matches; otherwise $1 would hold a stale value.
    print "$1\n" if m/number:([A-Z0-9]+)/;
}
__DATA__
icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ
Another alternative that returns only the values as the match is to use \K, which resets the start of the reported match:
\b(?:icn)?number:\K[^\W_]+
For example
my $str = 'icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ';
while($str =~ /\b(?:icn)?number:\K[^\W_]+/g ) {
print $& . "\n";
}
Output
9876AB54321
987654321FR
987654321YQ
You may replace \w (that matches letters, digits and underscores) with [^\W_] that is almost the same, but does not match underscores:
(icnnumber|number):([^\W_]+)
If you want to make sure icnnumber and number are matched as whole words, you may add a word boundary at the start:
\b(icnnumber|number):([^\W_]+)
^^
You may even refactor the pattern a bit in order not to repeat number using an optional non-capturing group, see below:
\b((?:icn)?number):([^\W_]+)
   ^^^^^^^^
Pattern details
\b - a word boundary (immediately to the left, there must be the start of the string or a char other than a letter, digit or _)
((?:icn)?number) - Group 1: an optional sequence of icn substring and then number substring
: - a : char
([^\W_]+) - Group 2: one or more letters or digits.
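As a quick check of the pattern described above, here is a small Python sketch. Python's re syntax accepts this pattern unchanged; extract_values is a hypothetical helper name.

```python
import re

# Group 1: "number" with an optional "icn" prefix.
# Group 2: one or more letters or digits (\w without the underscore).
PATTERN = re.compile(r"\b((?:icn)?number):([^\W_]+)")


def extract_values(lines):
    """Return the Group 2 value for every line that matches."""
    values = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            values.append(m.group(2))
    return values


lines = ["icnnumber:9876AB54321_IN", "number:987654321FR", "icnnumber:987654321YQ"]
print(extract_values(lines))
```

Because the underscore is excluded from the character class, the _IN suffix is left out without naming it in the pattern.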
Just another suggestion: if your strings are always valid, you could split on a character class and take the second element of the resulting list:
my $string = "number:987654321FR";
my @part = (split /[:_]/, $string)[1];
print "@part\n";
Or for the whole array of strings:
my @Array = ("icnnumber:9876AB54321_IN", "number:987654321FR", "icnnumber:987654321YQ");
foreach (@Array)
{
    my $el = (split /[:_]/, $_)[1];
    print "$el\n";
}
Results in:
9876AB54321
987654321FR
987654321YQ
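The same split-on-a-character-class idea, sketched in Python with re.split. The helper name second_field is an assumption; as in the Perl version, the input is assumed to always contain a colon.

```python
import re


def second_field(s):
    """Split on ':' or '_' and return the piece after the first separator."""
    # Raises IndexError if the string contains no separator at all.
    return re.split(r"[:_]", s)[1]


for item in ["icnnumber:9876AB54321_IN", "number:987654321FR", "icnnumber:987654321YQ"]:
    print(second_field(item))
```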
The regular expression can treat 'icn' as optional; the part of interest is the 11 characters after the colon.
my $re = qr/(icn)?number:(.{11})/;
Test code snippet
use strict;
use warnings;
use feature 'say';
my $re = qr/(icn)?number:(.{11})/;
while(<DATA>) {
say $2 if /$re/;
}
__DATA__
icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ
Output
9876AB54321
987654321FR
987654321YQ
You already have better answers here, but here is my attempt anyway.
Read the whole input into one string:
my $str = do { local $/; <DATA> }; #print $str;
The first option captures everything after number: up to an underscore, or otherwise up to a word boundary:
my @arrs = ($str=~m/number\:((?:(?!\_).)*)(?:\b|\_)/ig);
(or)
The second option captures a run of characters that are neither non-word characters (\W) nor underscores, and collects the matches in an array:
my @arrs = ($str=~m/number\:([^\W\_]+)(?:\_|\b)/ig);
Print the output:
print join "\n", @arrs;
__DATA__
icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ

Using preg_replace with varying variable replacements

I'm trying to use preg_replace to search for a string but only replace a portion of the string, rather than the entire string, in a dynamic fashion.
For example, I am able to find the strings 'od', ':od', 'od:', '#od', and 'od ' with my code below. I want to replace only the 'od' portion with the word 'odometer' and leave the colon, hashtag, and white spaces untouched. However, the way that my current preg_replace is written would replace the colons and the hashtag in addition to the letters themselves. Any creative solutions to replace the characters only but preserve the surrounding symbols?
Thank you!
if(isset($_POST["text"]))
{
$original = $_POST["text"];
$abbreviation= array();
$abbreviation[0] = 'od';
$abbreviation[1] = 'rn';
$abbreviation[2] = 'ph';
$abbreviation[3] = 'real';
$translated= array();
$translated[0] ='odometer';
$translated[1] ='run';
$translated[2] ='pinhole';
$translated[3] ='fake';
function add_regex_finders($str){
return "/[\s:\#]" . $str . "[\s:]/i";
}
$original_parsed = array_map('add_regex_finders',$original);
preg_replace($original_parsed,$translated,$original);
}
You can add capture groups around the characters before and after the matched abbreviation, and then add the group references to the replacement string:
function add_regex_finders($str){
return "/([\s:\#])" . $str . "([\s:])/i";
}
$abbrevs_parsed = array_map('add_regex_finders', $abbreviation);
$translt_parsed = array_map(function ($v) { return '$1' . $v . '$2'; }, $translated);
echo preg_replace($abbrevs_parsed, $translt_parsed, $original);
Note you had a typo in your code: you passed $original to array_map (which applies add_regex_finders) when it should be $abbreviation.
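For comparison, the capture-and-reinsert technique looks like this as a Python sketch. The abbreviation table is taken from the question, while the function name expand and the use of \g<1>/\g<2> group references are choices made for this illustration.

```python
import re

ABBREVIATIONS = {"od": "odometer", "rn": "run", "ph": "pinhole", "real": "fake"}


def expand(text):
    """Expand abbreviations, keeping the delimiter on each side untouched."""
    for abbr, full in ABBREVIATIONS.items():
        # Group 1: the delimiter before; group 2: the delimiter after.
        pattern = r"([\s:#])" + re.escape(abbr) + r"([\s:])"
        # \g<1> and \g<2> put the captured delimiters back around the word.
        text = re.sub(pattern, r"\g<1>" + full + r"\g<2>", text, flags=re.IGNORECASE)
    return text


print(expand("check the #od: value and the :rn field"))
```

As with the PHP pattern, a delimiter is required on both sides, so an abbreviation at the very start or end of the text is not replaced.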

How to write a regular expression in PowerShell

I need a regular expression in PowerShell to split a string on the string ## and remove everything up to another character (;).
I have the following string.
$temp = "admin#test.com## deliver, expand;user1#test.com## deliver, expand;group1#test.com## deliver, expand;"
Now I want to split this string and collect only the email IDs into a new array object. My expected output looks like this:
admin#test.com
user1#test.com
group1#test.com
To get the output above, I need to split the string on ## and remove the substring up to the semicolon (;).
Can anyone help me write a regex to achieve this in PowerShell?
If you want to use regex-based splitting with your approach, you can use ##[^;]*; regex and this code that will also remove all the empty values (with | ? { $_ }):
$res = [regex]::Split($temp, '##[^;]*;') | ? { $_ }
The ##[^;]*; matches:
## - double #
[^;]* - zero or more characters other than ;
; - a literal ;.
See the regex demo
Use [regex]::Matches to get all occurrences of your regular expression. You probably don't need to split your string first if this suits you:
\b\w+#[^#]*
PowerShell code:
[regex]::Matches($temp, '\b\w+#[^#]*') | ForEach-Object { $_.Groups[0].Value }
Output:
admin#test.com
user1#test.com
group1#test.com
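Both approaches above (splitting on the unwanted part, or matching the wanted part) can be tried side by side. This Python sketch reuses the two patterns and the sample string exactly as written in this thread; the function names are hypothetical.

```python
import re

temp = ("admin#test.com## deliver, expand;user1#test.com## deliver, expand;"
        "group1#test.com## deliver, expand;")


def ids_by_split(s):
    """Split on '##...;' and drop the empty trailing piece."""
    return [part for part in re.split(r"##[^;]*;", s) if part]


def ids_by_match(s):
    """Match each ID directly instead of splitting."""
    return re.findall(r"\b\w+#[^#]*", s)


print(ids_by_split(temp))
print(ids_by_match(temp))
```

The list comprehension filter plays the same role as `| ? { $_ }` in the PowerShell version: it removes the empty string left after the final separator.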

Copy matched pattern of line at the end of it

I want to match a pattern in each line of a text and append the matched text to the end of that line. In the example below I want to match the numbers and append them to the end of the line; when a line has two matches, they should be comma-separated.
Basically, I am looking for a way to use the matched portion as a variable.
I want to do this in Bash.
abc 123=
agdaf456ad
dfaf879:
abc123xyz12:
To
abc 123=123
agdaf456ad456
dfaf879:879
abc123xyz12:123,12
Something like
(\d+)(.*)$
And replace with
$1$2$1
Example
$replace = preg_replace("/(\d+)(.*)$/", "$1$2$1", "abc 123=");
echo $replace;
=> abc 123=123
To get all sequences of digits in a given string, you can use a mere \d+ regex, and then just implode the obtained result array and append it to the input string:
$str = "abc123xyz12:";
preg_match_all('/\d+/', $str, $m);
$append = implode(",", $m[0]);
echo $str . $append;
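The find-all-then-join approach handles every line of the question's sample, including the one with two numbers. Here it is as a short Python sketch (the question asked for Bash, so treat this only as an illustration of the logic; append_numbers is a hypothetical name).

```python
import re


def append_numbers(line):
    """Append every run of digits found in the line, comma-separated."""
    numbers = re.findall(r"\d+", line)
    return line + ",".join(numbers)


for line in ["abc 123=", "agdaf456ad", "dfaf879:", "abc123xyz12:"]:
    print(append_numbers(line))
```

A line without digits comes back unchanged, because joining an empty list yields an empty string.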

Perl regex replace for specific string length

I am using Perl to do some prototyping.
I need an expression to replace e with [ee] if the string is exactly 2 chars long and ends with "e".
le -> l [ee]
me -> m [ee]
elle -> elle : no change
I cannot test the length of the string, I need one expression to do the whole job.
I tried:
`s/(?=^.{0,2}\z).*e\z%/[ee]/g` but this is replacing the whole string
`s/^[c|d|j|l|m|n|s|t]e$/[ee]/g` same result (I listed the possible letters that could precede my "e")
`^(?<=[c|d|j|l|m|n|s|t])e$/[ee]/g` but I have no match, not sure I can use ^ on a positive look behind
EDIT
Guys you're amazing, hours of search on the web and here I get answers minutes after I posted.
I tried all your solutions and they are working perfectly directly in my script, i.e. this one:
my $test2="le";
$test2=~ s/^(\S)e$/\1\[ee\]/g;
print "test2:".$test2."\n";
-> test2:l[ee]
But I am loading these regex from a text file (using Perl for proto, the idea is to reuse it with any language implementing regex):
In the text file I store for example (I used % to split the line between match and replace):
^(\S)e$% \1\[ee\]
and then I parse and apply all regex like that:
my $test="le";
while (my $row = <$fh>) {
    chomp $row;
    if( $row =~ /%/){
        my @reg = split /%/, $row;
        #if no replacement, put empty string
        if($#reg == 0){
            push(@reg,"");
        }
        print "reg found, reg:".$reg[0].", replace:".$reg[1]."\n";
        push @regs, [ @reg ];
    }
}
print "orgine:".$test."\n";
for my $i (0 .. $#regs){
    my $p=$regs[$i][0];
    my $r=$regs[$i][1];
    $test=~ s/$p/$r/g;
}
print "final:".$test."\n";
This technique is working well with my other regex, but not yet when I have a $1 or \1 in the replace... here is what I am obtaining:
final:\1\ee\
PS: you answered the initial question; should I open another post?
Something like s/(?i)^([a-z])e$/$1\[ee\]/ (the brackets are escaped so that Perl does not read $1[ee] as an array subscript)
Why aren't you using a capture group to do the replacement?
`s/^([cdjlmnst])e$/$1 [ee]/`
If those are the characters you need and if it is indeed one word to a line with no whitespace before it or after it, then this will work. (Inside a character class, | is a literal character, so the letters are listed without separators.)
Here's another option depending on what you are looking for. It will match a two-character string consisting of one a-z character followed by one 'e' on its own line, with possible whitespace before or after. It will replace this with the single a-z character followed by ' [ee]':
`s/^\s*([a-z])e\s*$/$1 [ee]/`
^(\S)e$
Try this. Replace with $1 [ee]. See the demo:
https://regex101.com/r/hR7tH4/28
I'd do something like this
$word =~ s/^(\w{1})(e)$/$1$2e/;
You can use the following regex, which matches exactly two letters, and then replace the match with $1\[$2$2\]:
^([a-zA-Z])([a-zA-Z])$
Demo :
$my_string =~ s/^([a-zA-Z])([a-zA-Z])$/$1\[$2$2\]/;
See demo https://regex101.com/r/iD9oN4/1
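On the follow-up in the question's EDIT (rules loaded from a text file): in Perl, the text held in $r is inserted as it is, and capture references inside it are not expanded a second time. Some regex libraries expand group references in a replacement string at run time, which suits rules stored as data. Below is a minimal Python sketch of that idea, assuming the same "pattern%replacement" line format; the function names are hypothetical, and the rule is written with Python's replacement syntax (\1 for the group, unescaped brackets).

```python
import re


def load_rules(lines):
    """Parse 'pattern%replacement' lines into (compiled pattern, replacement)."""
    rules = []
    for row in lines:
        row = row.rstrip("\n")
        if "%" not in row:
            continue  # not a rule line
        # A missing replacement becomes an empty string.
        pattern, _, replacement = row.partition("%")
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(text, rules):
    """Apply every rule in order; \\1 in a replacement is expanded by re."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


rules = load_rules([r"^(\S)e$%\1[ee]"])
print(apply_rules("le", rules))
print(apply_rules("elle", rules))
```

The rule text is passed to the regex engine as a replacement template rather than being evaluated as code, so the file can only describe substitutions.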