regular expression to find the repeating pattern +X+X+X - regex

I have a regular expression
'^[0-9]*d[0-9]+(\+[0-9]*)*$'
to limit an input in the following format
str1 = '3d8+10'
str2 = 'd8+2+4'
However, the re I have also lets the string below through:
str3 = 'd8++2'
Is there a way to write the regular expression so that the trailing part is limited to the pattern
+X+X+X...?

You need
^[0-9]*d[0-9]+(\+[0-9]+)*$
Using * on [0-9] inside the group (as in \+[0-9]*) allows a lone + to match as well.
If the string must have at least one +n term, use + (one or more) on the group at the end:
^[0-9]*d[0-9]+(\+[0-9]+)+$

It appears you are looking for
'^[0-9]*d[0-9]+(\+[0-9]+)*$'
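As a quick check, the corrected pattern can be tried against the three strings from the question. This is a minimal Python sketch; the helper name is_valid is made up for illustration and is not part of the original answers.

```python
import re

# Corrected pattern: every '+' must be followed by at least one digit.
DICE = re.compile(r'^[0-9]*d[0-9]+(\+[0-9]+)*$')

def is_valid(text):
    """Return True if text looks like NdM followed by zero or more +X terms."""
    return DICE.match(text) is not None

for candidate in ('3d8+10', 'd8+2+4', 'd8++2'):
    print(candidate, is_valid(candidate))
```

The first two strings are accepted and 'd8++2' is rejected, because the group \+[0-9]+ can no longer match a bare +.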

Related

Replacing unknown number of named groups

I am working on such a pattern:
<type>"<prefix>"<format>"<suffix>";<neg_type>"<prefix>"<format>"<suffix>"
Here are 2 examples, with and without a prefix:
n"prefix"#,##0"suffix";-"prefix"#,##0"suffix"
n#,##0"suffix";-#,##0"suffix"
I wrote the following regex to capture my groups:
raw = r"(?P<type>^.)(?:\"(?P<prefix>[^\"]*)\"){0,1}(?P<format>[^\"]*)(?:\"(?P<suffix>[^\"]*)\"){0,1};(?P<negformat>.)(?:\"(?P=prefix)\"){0,1}(?P=format)(?:\"(?P=suffix)\"){0,1}"
Now I am parsing a big text which contains such structures, and I would like to replace the prefix or suffix (only if they exist!). Because the number of captured groups is unknown (potentially zero), I do not know how to make my replacements easily (with re.sub).
Additionally, due to an implementation constraint, I process prefixes and suffixes sequentially (so I do not get the suffix replacement at the same time as the prefix replacement, even if they belong to the same sentence).
First, we can simplify your regex by using single quotes for the string. That removes the necessity of escaping the " character. Second, {0,1} can be replaced by ?:
raw = r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;(?P<negformat>.)(?:"(?P<prefix2>(?P=prefix))")?(?P=format)(?:"(?P<suffix2>(?P=suffix))")?'
Notice that I have added (?P<prefix2>) and (?P<suffix2>) named groups above for the second occurrences of the prefix and suffix.
I am working on the assumption that your pattern may be repeated within the text (if the pattern only appears once, this code will still work). In that case, the character substitutions must be made from the last to the first occurrence so that the start and end character offsets returned by the regex engine remain correct even after substitutions are made. Similarly, when we find an occurrence of the pattern, we must replace in the order suffix2, prefix2, suffix and prefix.
We use re.finditer to iterate through the text to return match objects and form these into a list, which we reverse so that we can process the last matches first:
import re
raw = r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;(?P<negformat>.)(?:"(?P<prefix2>(?P=prefix))")?(?P=format)(?:"(?P<suffix2>(?P=suffix))")?'
s = """a"prefix"format"suffix";b"prefix"format"suffix"
x"prefix_2"format_2"suffix_2";y"prefix_2"format_2"suffix_2"
"""
new_string = s
matches = list(re.finditer(raw, s, flags=re.MULTILINE))
matches.reverse()
if matches:
    for match in matches:
        if match.group('suffix2'):
            new_string = new_string[0:match.start('suffix2')] + 'new_suffix' + new_string[match.end('suffix2'):]
        if match.group('prefix2'):
            new_string = new_string[0:match.start('prefix2')] + 'new_prefix' + new_string[match.end('prefix2'):]
        if match.group('suffix'):
            new_string = new_string[0:match.start('suffix')] + 'new_suffix' + new_string[match.end('suffix'):]
        if match.group('prefix'):
            new_string = new_string[0:match.start('prefix')] + 'new_prefix' + new_string[match.end('prefix'):]
print(new_string)
Prints:
a"new_prefix"format"new_suffix";b"new_prefix"format"new_suffix"
x"new_prefix"format_2"new_suffix";y"new_prefix"format_2"new_suffix"
The above code, for demo purposes, makes the same substitutions for each occurrence of the pattern.
As far as your second concern:
There is nothing preventing you from making two passes against the text, once to replace the prefixes and once to replace the suffixes, as these become known. Obviously, you would only be checking certain groups in each pass, but you could still use the same regex. And, of course, each occurrence of the pattern can have its own substitutions. The above code shows how to find and make the substitutions.
To allow 0 to 9 instances of the prefix in the second half:
import re
raw = r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;(?P<negformat>.)(?P<prefix2>(?:"(?P=prefix)"){0,9})(?P=format)(?:"(?P<suffix2>(?P=suffix))")?'
s = """a"prefix"format"suffix";b"prefix""prefix""prefix"format"suffix"
x"prefix_2"format_2"suffix_2";y"prefix_2"format_2"suffix_2"
"""
new_string = s
matches = list(re.finditer(raw, s, flags=re.MULTILINE))
matches.reverse()
if matches:
    for match in matches:
        if match.group('suffix2'):
            new_string = new_string[0:match.start('suffix2')] + 'new_suffix' + new_string[match.end('suffix2'):]
        if match.group('prefix2'):
            start = match.start('prefix2')
            end = match.end('prefix2')
            repl = s[start:end]
            n = repl.count('"') // 2
            new_string = new_string[0:start] + (n * '"new_prefix"') + new_string[end:]
        if match.group('suffix'):
            new_string = new_string[0:match.start('suffix')] + 'new_suffix' + new_string[match.end('suffix'):]
        if match.group('prefix'):
            new_string = new_string[0:match.start('prefix')] + 'new_prefix' + new_string[match.end('prefix'):]
print(new_string)
Prints:
a"new_prefix"format"new_suffix";b"new_prefix""new_prefix""new_prefix"format"new_suffix"
x"new_prefix"format_2"new_suffix";y"new_prefix"format_2"new_suffix"
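For comparison, the same substitutions can be sketched with re.sub and a replacement function, which rebuilds each match from its group spans and so avoids tracking offsets into the whole text. This is only a sketch under assumptions: replace_groups is a hypothetical helper name, the replaced groups are assumed not to overlap, and groups that are absent or empty are skipped, as in the code above.

```python
import re

# Pattern from the answer above; prefix2/suffix2 name the second occurrences.
RAW = (r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;'
       r'(?P<negformat>.)(?:"(?P<prefix2>(?P=prefix))")?(?P=format)(?:"(?P<suffix2>(?P=suffix))")?')

def replace_groups(pattern, text, replacements, flags=0):
    """Hypothetical helper: replace the named groups given in `replacements`
    (group name -> new text) inside every match of `pattern`."""
    def rebuild(match):
        base = match.start()
        piece = match.group(0)
        # Keep only groups that took part in the match and are non-empty.
        spans = [(match.start(name), match.end(name), new)
                 for name, new in replacements.items()
                 if match.group(name)]
        # Apply right to left so earlier offsets stay valid.
        for start, end, new in sorted(spans, reverse=True):
            piece = piece[:start - base] + new + piece[end - base:]
        return piece
    return re.sub(pattern, rebuild, text, flags=flags)

s = ('a"prefix"format"suffix";b"prefix"format"suffix"\n'
     'x"prefix_2"format_2"suffix_2";y"prefix_2"format_2"suffix_2"\n')
new_names = {'prefix': 'new_prefix', 'prefix2': 'new_prefix',
             'suffix': 'new_suffix', 'suffix2': 'new_suffix'}
print(replace_groups(RAW, s, new_names, flags=re.MULTILINE))
```

Passing only the prefix groups (or only the suffix groups) in the dictionary gives the two-pass behaviour described in the question.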

Matlab: How to replace dynamic part of string with regexprep

I have strings like
#(foo) 5 + foo.^2
#(bar) bar(1,:) + bar(4,:)
and want the expression in the first group of parentheses (which could be anything) to be replaced by x in the whole string
#(x) 5 + x.^2
#(x) x(1,:) + x(4,:)
I thought this would be possible with regexprep in one step somehow, but after reading the documentation and fiddling around for quite a while, I have not found a working solution yet.
I know, one could use two commands: First, grab the string to be matched with regexp and then use it with regexprep to replace all occurrences.
However, I have the gut feeling this should be somehow possible with the functionality of dynamic expressions and tokens or the like.
Without the support of an infinite-width lookbehind, you cannot do that in one step with a single call to regexprep.
Use the first idea: extract the first word and then replace it with x when found in between word boundaries:
s = '#(bar) bar(1,:) + bar(4,:)';
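tok = regexp(s, '^#\((\w+)\)', 'tokens', 'once'); % tokens of the first match only
word = tok{1}; % MATLAB cannot index a function call result directly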
s = regexprep(s, strcat('\<',word,'\>'), 'x');
Output: #(x) x(1,:) + x(4,:)
The ^#\((\w+)\) regex matches the #( at the start of the string, then captures alphanumeric or _ chars into Group 1, and then matches a ). The 'tokens' option gives access to the captured substring, and the strcat('\<',word,'\>') part builds the whole-word regex for the regexprep command.
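For readers who want to compare with another engine, the same two-step idea (capture the name, then replace it as a whole word) can be sketched in Python. The function name rename_argument is an assumption made for this illustration; Python's \b stands in for the \< and \> word boundaries used above.

```python
import re

def rename_argument(expr, new_name='x'):
    """Replace the name inside the leading #(...) everywhere in expr."""
    head = re.match(r'#\((\w+)\)', expr)
    if head is None:
        return expr  # no leading #(name), leave the string untouched
    word = head.group(1)
    # \b keeps the replacement to whole words, so 'bar' does not hit 'barn'.
    return re.sub(r'\b' + re.escape(word) + r'\b', new_name, expr)

print(rename_argument('#(bar) bar(1,:) + bar(4,:)'))
print(rename_argument('#(foo) 5 + foo.^2'))
```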

Regex match a string and allow specific character to appear randomly

I want to extract a portion of a string, allowing for the dash character to appear randomly throughout. In my match, I want the dash character occurrences to be included.
Let's say I have a scenario like so:
haystack = "arandomse-que-nce"
needle = "sequence"
and I want to end up with a string like se-que-nce. In this case, what would the regex pattern look like?
I would split the string and then join by -*; for example, in JavaScript:
var needle = "sequence"
var regex = new RegExp(needle.split('').join('-*'))
var result = "arandomse-que-nce".match(regex) // ["se-que-nce"]
var result2 = "a-bad-sequ_ence".match(regex) // null
You could also use a regex to insert -* between each character:
var regex = new RegExp(needle.replace(/(?!$|^)/g, '-*'))
Both the split/join method and the replace method return 's-*e-*q-*u-*e-*n-*c-*e' for the regex.
If your string contains characters such as * that have special meaning in regular expressions, escape each character before joining (escaping after inserting -* would also escape the inserted - and *), like so:
var regex = new RegExp(needle.split('')
    .map(function (c) { return c.replace(/[-\\^$*+?.()|[\]{}]/, '\\$&'); })
    .join('-*'))
Then, if needle were 1+1, for example, it would give you 1-*\+-*1 for the regex.
s-*e-*q-*u-*e-*n-*c-*e-*
This assumes that multiple hyphens in a row are okay.
EDIT: Doorknob's split/join solution is good, but be aware that it only works for characters that aren't special characters (*, +, etc.)
I don't know what the specifications are, but if there are special characters, make sure to escape them (and only them: prefixing every character with a backslash would turn letters such as s into classes like \s):
new RegExp(needle.split('').map(function(c) { return c.replace(/[-\\^$*+?.()|[\]{}]/, '\\$&'); }).join('-*'))
You could try to use:
s-?e-?q-?u-?e-?n-?c-?e
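The split, escape and join approach discussed in the answers above translates directly to other languages. Here is a minimal Python sketch; the function names are invented for the example, and re.escape does the per-character escaping.

```python
import re

def dashed_pattern(needle):
    """Build a regex that matches needle with any number of '-' between characters."""
    # Escape each character on its own, then join with '-*'.
    return re.compile('-*'.join(re.escape(ch) for ch in needle))

def find_with_dashes(needle, haystack):
    """Return the matched text including dashes, or None if there is no match."""
    found = dashed_pattern(needle).search(haystack)
    return found.group(0) if found else None

print(find_with_dashes('sequence', 'arandomse-que-nce'))
print(find_with_dashes('sequence', 'a-bad-sequ_ence'))
```

Joining with '-?' instead of '-*' would allow at most one dash between characters, as in the last suggestion above.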

Simple Regular Expression matching

I'm new to regular expressions and I'm trying to use RegExp on the GWT client side. I want to do simple * wildcard matching (say, if the user enters 006*, I want to match 006...). I'm having trouble writing this. What I have is:
String input = "006*";
input = input.replaceAll("\\*", "(" + "\\" + "\\" + "S\\*" + ")");
RegExp regExp = RegExp.compile(input);
It returns true for strings like BKLFD006* too. What am I doing wrong?
Put a ^ at the start of the regex you're generating.
The ^ character means to match at the start of the source string only.
I think you are mixing two things here, namely replacement and matching.
Matching is used when you want to extract part of the input string that matches a specific pattern. In your case it seems that is what you want, and in order to get one or more digits that are followed by a star and not preceded by anything then you can use the following regex:
^[0-9]+(?=\*)
and here is a Java snippet:
String subjectString = "006*";
String ResultString = null;
Pattern regex = Pattern.compile("^[0-9]+(?=\\*)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    ResultString = regexMatcher.group();
}
On the other hand, replacement is used when you want to replace a re-occurring pattern from the input string with something else.
For example, if you want to replace all digits followed by a star with the same digits surrounded by parentheses then you can do it like this:
String input = "006*";
String result = input.replaceAll("^([0-9]+)\\*", "($1)");
Notice the use of $1 to reference the digits that were captured by the capture group ([0-9]+) in the regex pattern.
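To illustrate the advice about anchoring with ^, here is a small Python sketch that turns a user-entered wildcard such as 006* into an anchored regex. It is not GWT code; the helper names are hypothetical, and the choice of \S* for the wildcard follows the replacement used in the question.

```python
import re

def wildcard_to_regex(wildcard):
    """Translate a '*' wildcard into a regex anchored at the start of the string."""
    # Escape the literal pieces so characters like '.' or '+' stay literal.
    literal_parts = (re.escape(part) for part in wildcard.split('*'))
    return re.compile('^' + r'\S*'.join(literal_parts))

def matches(wildcard, candidate):
    return wildcard_to_regex(wildcard).search(candidate) is not None

print(matches('006*', '006123'))     # starts with 006
print(matches('006*', 'BKLFD006*'))  # 006 is not at the start
```

Without the leading ^, the second call would also succeed, which is the behaviour reported in the question.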

matlab regexprep replace nth occurence

Matlab documentation states that it is possible to replace the Nth occurrence of the pattern in regexprep. I am failing to see how to implement it and google is not returning anything useful.
http://www.weizmann.ac.il/matlab/techdoc/ref/regexprep.html
Basically the string I have is :,:,1 and I want to replace the second occurrence of : with an arbitrary number. Based on the documentation:
regexprep(':,:,4',':','AnyNumber','N')
I do not understand how the N option should be used. I have tried 'N',2 or just '2'.
Note that the position of the : could be anywhere.
I realize there are other ways of doing this other than regexprep but I don't like having a problem linger.
Thanks for the help!
regexprep(':,:,4',':','AnyNumber',2)
The above works.
According to the MATLAB documentation, the general syntax of regexprep is:
newStr = regexprep(str,expression,replace,option1,...optionM);
It searches str for matches of expression and replaces each match with replace. There are 9 available options. Eight of them are fixed strings and one is an integer. The integer N selects which match is replaced.
The following code sets up all the parameters as variables, counts the matches, and uses that count to replace only the last occurrence.
str = ':,:,4';
expression= ':';
replace = num2str(floor(rand()*10));
% generate a single digit random number converted to string
idx = regexp(str, expression); % use regexp to find the number of matches
regexprep(str, expression, replace, length(idx)); % only replace the last one
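Engines without a built-in "Nth occurrence" option can get the same effect with a counting replacement callback. The following Python sketch shows the idea; replace_nth is a made-up helper name, and the replacement is inserted literally (no backreference expansion).

```python
import re

def replace_nth(pattern, replacement, text, n):
    """Replace only the n-th (1-based) match of pattern in text."""
    seen = 0

    def swap(match):
        nonlocal seen
        seen += 1
        # Keep every match unchanged except the n-th one.
        return replacement if seen == n else match.group(0)

    return re.sub(pattern, swap, text)

text = ':,:,4'
print(replace_nth(':', '7', text, 2))
# Replace the last occurrence by counting the matches first.
print(replace_nth(':', '7', text, len(re.findall(':', text))))
```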