Regex: set capture to fixed string


I want to match the string a, but I want my capture group to contain b.
To capture a in a group named id, I can easily do:
(?<id>a)
but I want id to be b when the original string was just a, i.e., I want the capture to contain characters that aren't in the original string.
For example, in PHP it would look something like:
preg_match('/your_magic/', 'a', $matches);
print $matches['id'] == 'b'; // true

There is no way to get anything into a capturing group that isn't in the input string. Capturing groups are (at least in Perl) represented, in part, as start/end positions in the original input string.
If the value you want the capturing group to get is in the input string, you can do that using lookarounds. If your regex flavor has limited lookbehind (like PHP's), the desired string has to come after the match.
For example:
preg_match('/a(?=.*(?<id>b))/', 'a foo b', $matches);
print "matched '$matches[0]', id is '$matches[id]'";
Output:
matched 'a', id is 'b'
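The same idea can be sketched with Python's re module (assuming Python 3.6+; Python spells named groups (?P<name>...)). It also shows the point made above: the id group is nothing more than a start/end position pointing further into the input, outside the overall match.

```python
import re

text = "a foo b"
# The lookahead captures "b" without consuming it, so the overall match is just "a"
m = re.search(r"a(?=.*(?P<id>b))", text)
print(f"matched '{m.group(0)}', id is '{m.group('id')}'")  # matched 'a', id is 'b'

# Both the whole match and the group are offsets into the input string
print(m.span(0), m.span("id"))  # (0, 1) (6, 7)
```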

Related

How to get text that is before and after of a matched group in a regex expression

I have the following regex, with the replacement $1, that matches any number in the string and returns it in the group:
^.*[^0-9]([0-9]+).*$
Is there a way to also get the text before and after the matched group? My end goal is to reconstruct the string, replacing only the value of the matched group.
For example, for the string /this_text_appears_before/73914774/this_text_appears_after, I want to do something like $before_text[replaced_text]$after_text to produce /this_text_appears_before/[replaced_text]/this_text_appears_after

You only need a single capture group, which should capture the first part instead of the digits:
^(.*?[^0-9])[0-9]+
In the replacement, use group 1 followed by your replacement text: \1[replaced_text]
Example
import re

pattern = r"^(.*?[^0-9])[0-9]+"
s = "/this_text_appears_before/73914774/this_text_appears_after"
# Keep everything up to the first run of digits (group 1) and replace the digits
result = re.sub(pattern, r"\1[replaced_text]", s)
print(result)
Output
/this_text_appears_before/[replaced_text]/this_text_appears_after
Other options for the example data include matching up to the /:
^(.*?/)[0-9]+
Or, to match up to the first two occurrences of /:
^(/[^/]+/)[0-9]+
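If you would rather keep your original pattern, Python's match objects also expose where a group starts and ends, so the text before and after it can be sliced out directly. A minimal sketch (the variable names are only for illustration) that also checks that the single-group alternatives above give the same result on the example string:

```python
import re

s = "/this_text_appears_before/73914774/this_text_appears_after"

# Original pattern: group 1 holds the digits
m = re.match(r"^.*[^0-9]([0-9]+).*$", s)
before_text = s[:m.start(1)]  # everything left of the captured digits
after_text = s[m.end(1):]     # everything right of them
rebuilt = before_text + "[replaced_text]" + after_text
print(rebuilt)

# The single-group alternatives produce the same string on this input
for pattern in (r"^(.*?[^0-9])[0-9]+", r"^(.*?/)[0-9]+", r"^(/[^/]+/)[0-9]+"):
    assert re.sub(pattern, r"\1[replaced_text]", s) == rebuilt
```

Note that because of the greedy leading .*, the original pattern captures the last run of digits, while the lazy alternatives stop at an earlier one; the example has only one number, so they agree.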

Make regex quantifier length depend on previous capture group

I'm hoping to use a regex to parse strings which begin with an integer n. After a space, there are n characters, after which there may be more text. I'm hoping to capture n and the n characters that follow. There are no constraints on these n characters. In other words, 5 hello world should match with the capture groups 5 and hello.
I tried this regex, but it wouldn't compile because its structure depends on the input: (\d+) .{\1}.
Is there a way to get the regex compiler to do what I want, or do I have to parse this myself?
I'm using Rust's regex crate, if that matters. And if it's not possible with regex, is it possible with another, more sophisticated regex engine?
Thanks!

As @Cary Swoveland said in the comments, this is not possible with a regex in one step without hard-coding the various possible lengths.
However, it is not too difficult to take a substring of the matched string, using the matched number as its length:
use regex::Regex;

fn main() {
    let re = Regex::new(r"(\d+) (.+)").unwrap();
    let test_str = "5 hello world";
    for cap in re.captures_iter(test_str) {
        // Group 1 is the count; fall back to 0 if it doesn't parse
        let length: usize = cap[1].parse().unwrap_or(0);
        // Take `length` chars (not bytes) from group 2
        let short_match: String = cap[2].chars().take(length).collect();
        println!("{}", short_match); // hello
    }
}
If you know you'll only be dealing with ASCII characters (no Unicode, accent marks, etc.), you can use the simpler slice syntax let short_match = &cap[2][..length]; (note that this panics if length is larger than the captured text).
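For comparison, the same two-step idea sketched in Python (parse_counted is a hypothetical helper name). Python slices strings by code point rather than by byte, so the ASCII caveat above doesn't apply; this sketch also rejects inputs with fewer than n characters after the space, which the take(length) version accepts and simply returns shorter:

```python
import re

def parse_counted(s):
    """Return (n, the next n characters) for input like '5 hello world', or None."""
    m = re.match(r"(\d+) ", s)
    if m is None:
        return None
    n = int(m.group(1))
    start = m.end()  # index just after the space
    body = s[start:start + n]
    if len(body) < n:  # not enough characters left
        return None
    return n, body

print(parse_counted("5 hello world"))  # (5, 'hello')
```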

If Perl is an option, you could try:
perl -e '
$str = "5 abcdefgh";
$str =~ /(\d+) ((??{".{".($^N)."}"}))/;
print "1st capture group = $1\n";
print "2nd capture group = $2\n";
print "whole capture group = $&\n";
'
Output:
1st capture group = 5
2nd capture group = abcde
whole capture group = 5 abcde
[Explanation]
When the regex engine encounters a (??{...}) block, it runs the contents as Perl code on the fly and uses the result as a subpattern.
The special variable $^N refers to the most recently closed capture group, which here is 5.
So the code (??{".{".($^N)."}"}) evaluates to .{5}, a dot followed by a {5} quantifier, which matches exactly five characters.
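Python's re has no equivalent of (??{...}), but the effect can be approximated in two passes: read the count first, then compile a .{n} pattern from it and match at the position where the first pass stopped. A sketch under that assumption (dynamic_match is a hypothetical name; as in the Perl version, . does not match newlines):

```python
import re

def dynamic_match(s):
    # First pass: capture the count, like group 1 in the Perl regex
    head = re.match(r"(\d+) ", s)
    if head is None:
        return None
    n = int(head.group(1))
    # Second pass: build ".{n}" from the captured value and match right after the space
    body = re.compile(r".{%d}" % n).match(s, head.end())
    if body is None:
        return None
    # (1st group, 2nd group, whole match), mirroring the Perl output
    return head.group(1), body.group(0), s[:body.end()]

print(dynamic_match("5 abcdefgh"))  # ('5', 'abcde', '5 abcde')
```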

How to detect the character before a number in RegEx

I have a string test_demo_0.1.1.
In a PowerShell script, I want to add some text before the 0.1.1, for example: test_demo_shay_0.1.1.
I managed to detect the first number with a regex and add the text:
$str = "test_demo_0.1.1"
if ($str -match "(?<number>\d)")
{
    $newStr = $str.Insert($str.IndexOf($Matches.number) - 1, "_shay")
}
# $newStr = test_demo_shay_0.1.1
The problem is that sometimes my string includes a number in another location, for example test_demo2_0.1.1, and then the text is inserted in the wrong place.
So I want to detect the first digit that is preceded by _. How can I do that?
I tried "(_<number>\d)" and "([_]<number>\d)", but neither works.

What you're asking for is called a positive lookbehind (a construct that checks for the presence of some pattern immediately to the left of the current location):
"(?<=_)(?<number>\d)"
^^^^^^
However, it seems all you want is to insert shay_ before the first digit that is preceded by _. A replace operation suits this best:
$str -replace '_(\d.*)', '_shay_$1'
Result: test_demo_shay_0.1.1.
Details
_ - an underscore
(\d.*) - Capturing group #1: a digit followed by zero or more characters (other than newlines) up to the end of the line.
The $1 in the replacement pattern refers to the text captured by group #1.
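Both approaches translated to Python as a sketch (the helper names are hypothetical). Python's re supports fixed-width lookbehind like (?<=_), but its replacement syntax uses \1 rather than $1:

```python
import re

def insert_with_lookbehind(s, text="shay_"):
    # Find the first digit that is immediately preceded by "_"
    m = re.search(r"(?<=_)(?P<number>\d)", s)
    if m is None:
        return s
    return s[:m.start()] + text + s[m.start():]

def insert_with_replace(s):
    # "_" + (digit and the rest of the line) -> "_shay_" + group 1
    return re.sub(r"_(\d.*)", r"_shay_\1", s)

for s in ("test_demo_0.1.1", "test_demo2_0.1.1"):
    print(insert_with_lookbehind(s), insert_with_replace(s))
```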

Regex in PHP: take all the words after the first one in string and truncate all of them to the first character

I'm quite terrible at regexes.
I have a string that may have one or more words in it (generally 2 or 3), usually a person's name, for example:
$str1 = 'John Smith';
$str2 = 'John Doe';
$str3 = 'David X. Cohen';
$str4 = 'Kim Jong Un';
$str5 = 'Bob';
I'd like to convert each as follows:
$str1 = 'John S.';
$str2 = 'John D.';
$str3 = 'David X. C.';
$str4 = 'Kim J. U.';
$str5 = 'Bob';
My guess is that I should first match the first word, like so:
preg_match('/^([\w\-]+)/', $str1, $first_word);
Then all the words after the first one... but how do I match those? Should I call preg_match again with offset = 1 in the arguments? But is that offset in characters or bytes?
Anyway, after I've matched the words following the first one (if they exist), should I do something like this for each of them:
$second_word = substr($following_word, 0, 1) . '. ';
Or is my approach completely wrong?
Thanks
P.S. It would be a boon if the regex could keep the first two words whole when the string contains three or more words (e.g. 'Kim Jong U.').

It can be done with a single preg_replace call.
You can search using this regex:
^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+
And replace with:
$1.
Code:
$name = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);
Explanation:
(*FAIL) behaves like a failing negative assertion and is a synonym for (?!)
(*SKIP) defines a point beyond which the regex engine is not allowed to backtrack when the subpattern fails later
(*SKIP)(*FAIL) together provide a nice workaround for the restriction that you cannot use a variable-length lookbehind in the regex above.
^\w+(?:$| +)(*SKIP)(*F) matches the first word of a name and skips it (leaves it unchanged).
(\w)\w+ matches each of the other words and replaces it with its first letter and a dot.
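Python's re module doesn't support the (*SKIP)(*F) verbs. A common substitute (a different technique, not the PCRE mechanism itself) is to match the part you want to leave alone in the first alternative and return it unchanged from a replacement function. A sketch, with abbreviate as a hypothetical name:

```python
import re

# Alternative 1 matches the first word (kept as-is); alternative 2 captures
# the first letter of every later word with two or more word characters.
PATTERN = re.compile(r"^\w+(?:$| +)|(\w)\w+")

def abbreviate(name):
    def repl(m):
        if m.group(1) is None:   # first alternative: the leading word
            return m.group(0)
        return m.group(1) + "."  # later word -> initial plus a dot
    return PATTERN.sub(repl, name)

for name in ("John Smith", "John Doe", "David X. Cohen", "Kim Jong Un", "Bob"):
    print(abbreviate(name))
```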

You could use a positive lookbehind assertion.
(?<=\h)([A-Z])\w+
OR
Or, if you also want to turn 'Bob F' into 'Bob F.', use this regex:
(?<=\h)([A-Z])\w*(?!\.)
Then replace the match with \1. (the captured letter followed by a dot).
The code would look like this:
preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $string);
(?<=\h)([A-Z]) captures each uppercase letter that is preceded by a horizontal whitespace character.
\w+ matches one or more word characters.
Replacing the match with the contents of group 1 (\1) plus a dot gives the desired output.
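A Python sketch of the lookbehind approach. \h (horizontal whitespace) is PCRE syntax that Python's re rejects, so this assumes \s, which also matches newlines, is an acceptable substitute for single-line names:

```python
import re

def initials(name):
    # An uppercase letter preceded by whitespace, followed by one or more word chars
    return re.sub(r"(?<=\s)([A-Z])\w+", r"\1.", name)

def initials_with_trailing(name):
    # \w* also lets a lone capital such as the "F" in "Bob F" match;
    # (?!\.) leaves an initial that already has a dot, such as "X.", alone
    return re.sub(r"(?<=\s)([A-Z])\w*(?!\.)", r"\1.", name)

for name in ("John Smith", "John Doe", "David X. Cohen", "Kim Jong Un", "Bob"):
    print(initials(name))

print(initials_with_trailing("Bob F"))  # Bob F.
```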

A simple solution using only a lookahead and a word-boundary check:
preg_replace('~(?!^)\b(\w)\w+~', '$1.', $string);
(\w)\w+ matches a word of two or more characters in the name, capturing its first character.
(?!^)\b checks for a word boundary (\b) and makes sure the match is not at the start of the string ((?!^)).
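Python's re supports both \b and (?!^), so this pattern carries over unchanged; only the replacement uses \1 instead of $1. A quick sketch that checks it against the names from the question (abbreviate_names is a hypothetical name):

```python
import re

def abbreviate_names(name):
    # (?!^)\b: the start of a word that isn't the start of the string
    # (\w)\w+: a word of two or more characters, first character captured
    return re.sub(r"(?!^)\b(\w)\w+", r"\1.", name)

for name in ("John Smith", "John Doe", "David X. Cohen", "Kim Jong Un", "Bob"):
    print(abbreviate_names(name))
```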

Insert a character when capturing group

I want to extract a group from a given string and insert a character after the fifth character of that group.
Input String: xxx123456789yyy
Expression: ^x{3}(?<serialno>\d{5}\d{4})y{3}$
Output (serialno): 123456789
Now I want the serialno group to contain an 'A' between the 5 and the 6, so that I get '12345A6789' instead of '123456789'. The character is always an 'A', and I want to do this in one regular expression.
Is it possible to do this with match or do I have to call match and replace?

You can't alter a string with a match, so you'll need to use preg_replace:
$output = preg_replace('/^x{3}(\d{5})(\d{4})y{3}$/', '$1A$2', $input);
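The same replacement sketched with Python's re, which writes group references as \1 and \2 in the replacement string (insert_a is a hypothetical name). Because the pattern is anchored and the x and y runs sit outside the groups, the result is just the serial number with the A inserted. The second part shows the match-then-build alternative using the named group from the question:

```python
import re

def insert_a(s):
    # Group 1: first five digits, group 2: last four; "A" goes between them
    return re.sub(r"^x{3}(\d{5})(\d{4})y{3}$", r"\1A\2", s)

print(insert_a("xxx123456789yyy"))  # 12345A6789

# Alternative: match first, then build the new string from the captured serial
m = re.match(r"^x{3}(?P<serialno>\d{9})y{3}$", "xxx123456789yyy")
serial = m.group("serialno")
print(serial[:5] + "A" + serial[5:])  # 12345A6789
```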
