The MATLAB documentation states that regexprep can replace only the Nth occurrence of a pattern. I fail to see how to do this, and Google is not returning anything useful.
http://www.weizmann.ac.il/matlab/techdoc/ref/regexprep.html
Basically the string I have is :,:,1 and I want to replace the second occurrence of : with an arbitrary number. Based on the documentation:
regexprep(':,:,4',':','AnyNumber','N')
I do not understand how the N option should be used. I have tried 'N',2 and just '2'.
Note that the position of the : could be anywhere.
I realize there are other ways of doing this other than regexprep but I don't like having a problem linger.
Thanks for the help!
regexprep(':,:,4',':','AnyNumber',2)
The above works.
According to the MATLAB documentation, the general syntax of regexprep is:
newStr = regexprep(str,expression,replace,option1,...optionM);
It searches "str" for matches of "expression" and replaces each match with "replace". There are 9 available options. Eight of them are fixed strings and one is an integer. The integer N selects which match is replaced, so only the Nth occurrence is changed.
The following code sets up all the parameters as variables, counts the matches, and uses that count to replace only the last occurrence.
str = ':,:,4';
expression= ':';
replace = num2str(floor(rand()*10));
% generate a single digit random number converted to string
idx = regexp(str, expression); % use regexp to find the number of matches
regexprep(str, expression, replace, length(idx)); % only replace the last one
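For comparison outside MATLAB, here is a minimal Python sketch of the same "replace only the Nth match" idea. Python's re.sub has a count argument but no occurrence index, so the helper below (replace_nth, a hypothetical name that is not part of any library) slices the string around the chosen match. The replacement is inserted literally, without backreference expansion.

```python
import re

def replace_nth(pattern, repl, text, n):
    """Replace only the n-th (1-based) match of pattern in text."""
    matches = list(re.finditer(pattern, text))
    if n < 1 or n > len(matches):
        return text  # no such occurrence: leave the string unchanged
    m = matches[n - 1]
    # Rebuild the string around the selected match only.
    return text[:m.start()] + repl + text[m.end():]

s = ':,:,4'
second = replace_nth(':', '7', s, 2)
# Counting the matches first gives the index of the last occurrence.
last = replace_nth(':', '7', s, len(re.findall(':', s)))
```

Counting the matches before replacing mirrors the length(idx) step in the MATLAB answer.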
I have strings like
#(foo) 5 + foo.^2
#(bar) bar(1,:) + bar(4,:)
and want the expression in the first group of parentheses (which could be anything) to be replaced by x in the whole string
#(x) 5 + x.^2
#(x) x(1,:) + x(4,:)
I thought this would be possible with regexprep in one step, but after reading the documentation and experimenting for quite a while, I have not found a working solution yet.
I know, one could use two commands: First, grab the string to be matched with regexp and then use it with regexprep to replace all occurrences.
However, I have the gut feeling this should be somehow possible with the functionality of dynamic expressions and tokens or the like.
Without the support of an infinite-width lookbehind, you cannot do that in one step with a single call to regexprep.
Use the first idea: extract the first word and then replace it with x when found in between word boundaries:
s = '#(bar) bar(1,:) + bar(4,:)';
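% MATLAB cannot index a function call result directly; with 'once', regexp returns the token cell itself
tok = regexp(s, '^#\((\w+)\)', 'tokens', 'once');
word = tok{1};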
s = regexprep(s, strcat('\<',word,'\>'), 'x');
Output: #(x) x(1,:) + x(4,:)
The ^#\((\w+)\) regex matches #( at the start of the string, captures one or more alphanumeric or _ characters into Group 1, and then matches ). The tokens option gives access to the captured substring, and the strcat('\<',word,'\>') part builds the whole-word regex for the regexprep command.
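The same two-step approach (capture the name, then replace it as a whole word) can be sketched in Python. This is only an illustration under assumptions: rename_argument is a hypothetical helper name, \b plays the role of MATLAB's \< and \>, and re.escape is added as a precaution even though \w+ cannot capture special characters.

```python
import re

def rename_argument(s, new_name='x'):
    # Step 1: capture the identifier inside the leading "#( ... )".
    m = re.match(r'#\((\w+)\)', s)
    if m is None:
        return s  # not in the expected form: return unchanged
    word = m.group(1)
    # Step 2: replace the identifier wherever it appears as a whole word.
    return re.sub(r'\b' + re.escape(word) + r'\b', new_name, s)

first = rename_argument('#(foo) 5 + foo.^2')
second = rename_argument('#(bar) bar(1,:) + bar(4,:)')
```

The word boundaries keep a longer identifier such as foobar from being rewritten when the argument is foo.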
I'm trying to use Regex in VBA for analyzing a txt file.
The result should show only the first number after the search word. My pattern does not catch all the cases, and it also picks up numbers on the next line.
Thanks for the help.
This is my regex
(Sword)?\d
https://regex101.com/r/fLx1qN/1
In my syntax not all possibilities are in the result. and it also shows numbers in the next line
The question isn't very precise about which strings should match and which ones shouldn't.
Number is ambiguous. Do you mean digit, or integer?
After is ambiguous. Do you mean immediately after, any time after?
Do you want to match the number immediately after the search word? Or can it follow anywhere after the search word?
Is the search meant to be case sensitive?
For a digit use: (Sword)[^\d]*(\d), for an integer use: (Sword)[^\d]*?(-?\d+). The lazy *? matters in the integer case: a greedy [^\d]* would swallow the minus sign before -? could match it.
For immediately after (allowing for whitespace and colons) use: (Sword)[:\s]*(\d)
To make the expression case insensitive in VBA, set the IgnoreCase property (reference: https://www.regular-expressions.info/vbscript.html):
Set myRegExp = New RegExp
myRegExp.Pattern = "(Sword)[^\d]*?(-?\d+)"
myRegExp.IgnoreCase = True
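As a rough cross-check of these patterns outside VBA, here is a Python sketch. The names ANYWHERE, SAME_LINE and first_integer_after are hypothetical. Two choices here are assumptions on my part: the lazy [^\d]*? (so a leading minus sign reaches the capture group) and the SAME_LINE variant, which excludes \r and \n because the question mentions unwanted matches on the next line.

```python
import re

# Lazy [^\d]*? leaves an optional minus sign for the capture group.
ANYWHERE = re.compile(r'(Sword)[^\d]*?(-?\d+)', re.IGNORECASE)
# Variant that refuses to cross a line break between word and number.
SAME_LINE = re.compile(r'(Sword)[^\d\r\n]*?(-?\d+)', re.IGNORECASE)

def first_integer_after(text, pattern=ANYWHERE):
    """Return the first integer following the search word, or None."""
    m = pattern.search(text)
    return int(m.group(2)) if m else None

negative = first_integer_after('sword: -42 and 7')
next_line = first_integer_after('Sword\n13')
same_line_only = first_integer_after('Sword\n13', SAME_LINE)
```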
I am trying to use the value.match command in OpenRefine 2.6 to split a column in two based on a four-digit year.
A sample of the text is:
"first sentence, second sentence, third sentences, 2009"
I go to "Add column based on this column" and enter
value.match(\d{4})
but I get the error
Parsing error at offset 12: Missing number, string, identifier, regex,
or parenthesized expression
Any idea what a possible solution might be?
You need to fix 3 things to get this working:
1) As Wiktor says, you need to start and end the regular expression with a forward slash /.
2) The 'match' function requires you to match the whole string in the cell, not just the fragment you need, so your regular expression needs to match the whole string.
3) To extract part of a string with 'match' you need capture groups in your regular expression, that is, ( ) around the part of the regular expression you want to extract. The captured values are put in an array, and you need to get the string out of the array to store it in a cell.
So you'll need something like:
value.match(/.*(\d{4})/)[0]
This gets the four-digit year from the end of the string.
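The whole-string requirement and the capture-group array can be mimicked in Python, purely as an analogy and not as OpenRefine code. Here re.fullmatch stands in for the behaviour of 'match' described in points 2 and 3, and the variable names are illustrative.

```python
import re

value = 'first sentence, second sentence, third sentences, 2009'

# The bare fragment does not cover the whole string, so nothing matches.
fragment_only = re.fullmatch(r'\d{4}', value)

# Matching the whole string and capturing the year does work.
m = re.fullmatch(r'.*(\d{4})', value)
groups = list(m.groups()) if m else []
# Index 0 pulls the string out of the list of captured groups.
year = groups[0] if groups else None
```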
I am trying to get a hex value from a string with this condition "VALUE: num,num,num,HEX,num,num"
I have the following
% set STRINGTOPARSE "VALUE: 12,12,13,2,9,5271256369606C00,0,0"
% regexp {(,[0-9A-Z]+,)+} $STRINGTOPARSE result1 result2 result3
1
% puts $result1
,12,
% puts $result2
,12,
% puts $result3
I believed the pattern {(,[0-9A-Z]+,)+} would be sufficient to extract the HEX from the above string,
but instead I got ",12," as the first result and not the HEX that I want. What have I done wrong?
You might want to use split instead:
set result [lindex [split $STRINGTOPARSE ","] 5]
regexp is not giving you the result you are looking for because the first part that matches is ,12, and the match stops there and won't look for more matches.
You could use regexp to do it, but it will be more messy... one possible way would be to match each comma:
regexp {^(?:[^,]*,){5}([0-9A-F]+),} $STRINGTOPARSE -> hex
Where (?:[^,]*,){5} matches the first 5 non-comma parts with their commas, and ([0-9A-F]+) then grabs the hex value you're looking for.
I think the problem is that you seem to think [0-9A-Z] has to match at least one letter, which is not the case. It matches any character within the character class, and you get a match as soon as one character matches.
If you wanted a regex to match a series of characters containing both digits and letters, you would have to use lookaheads (using classes alone would be messier). Note that without a capture group the result goes into the first variable:
regexp {\y(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[0-9A-Z]+\y} $STRINGTOPARSE hex
But... this might look even more complex than before, so I would advise sticking to splitting instead :)
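The options discussed in this answer can be compared side by side in a short Python sketch. This is an illustration, not Tcl: Python's \b is used in place of Tcl's \y, and the lookahead pattern is my own formulation of "contains both a digit and a letter".

```python
import re

line = 'VALUE: 12,12,13,2,9,5271256369606C00,0,0'

# The pattern from the question stops at the first comma-delimited field.
naive = re.search(r'(,[0-9A-Z]+,)+', line).group(0)

# Splitting on commas and taking the sixth field is the simplest fix.
by_split = line.split(',')[5]

# Skip five comma-terminated fields, then capture the hex field.
m = re.match(r'(?:[^,]*,){5}([0-9A-F]+),', line)
by_regex = m.group(1) if m else None

# Lookaheads demand at least one digit and one letter in the same token.
mixed = re.search(r'\b(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[0-9A-Z]+\b', line)
by_lookahead = mixed.group(0) if mixed else None
```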
I have written a small program that runs through a text file and does a regex find and replace on every run of 9 digits (\d{9}). It works fine, except that what I need is a little more complicated.
I am finding the right data correctly. theFile is just a string with the text file streamread into it. I do this and then create and write it to another file.
But I need to find each string match individually, and replace that match with only the last 5 digits of that individual number (currently this is just replacing with FOUND). Keeping the file otherwise identical.
I am not sure what the best way of doing this is. Would I have to split the text into an array of strings rather than one big string? (It's quite a big file.)
Any questions let me know, thanks in advance.
Dim regexString As String = "(\d{9})"
Dim replacement1 As String = "FOUND"
Dim rgx As New Regex(regexString)
Try
theFile = rgx.Replace(theFile, replacement1)
Catch
End try
Instead of using the single pattern \d{9}, split it into two groups: the first 4 digits long, the second 5 digits long. Then, in the replacement, use only the second group, which holds the last 5 digits.
Dim k = "abcd 123456789 abcf"
Dim ptn = "(\d{4})(\d{5})"
Dim result = Regex.Replace(k, ptn, "$2")
This approach leaves sequences of fewer than 9 consecutive digits unchanged. If you also have sequences of more than 9 digits and don't want to change them, you need a pattern with word boundaries:
Dim ptn = "(\b\d{4})(\d{5}\b)"
This pins the two groups to a sequence of exactly nine digits.
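The exactly-nine-digits behaviour can be checked with a small Python sketch (an illustration with made-up sample text, not the VB.NET code). The second pattern is an assumption on my part: it swaps the word boundaries for digit lookarounds, which also handles numbers glued to letters, where \b finds no boundary.

```python
import re

text = 'abcd 123456789 abcf 1234567890 12345'

# Word boundaries: only a standalone run of exactly nine digits is rewritten.
result = re.sub(r'\b\d{4}(\d{5})\b', r'\1', text)

# Lookarounds: require only that no digit sits on either side of the run.
glued = re.sub(r'(?<!\d)\d{4}(\d{5})(?!\d)', r'\1', 'id123456789x')
```

In .NET the replacement string would be "$1" rather than Python's r'\1'.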
The question appears to ask for matches on exactly nine digits with the first four removed, i.e. to replace the nine digits with the last five.
Splitting the regular expression in the question into two parts, for the unwanted and the wanted parts gives
regexString = "\d{4}(\d{5})"
which captures the wanted five digits, so then the replacement is
replacement1 ="$1"
In some other regular expression implementations it would be replacement1 ="\1". The replace method in some regular expression systems may also have additional options (parameters) for replacing the first, the n-th, or all occurrences.
Suppose there are more than nine digits and only the final five are wanted. In this case the regular expression can be written as one of the following (as different regular expression languages support different features). The replacement expression is the same as above.
regexString = "\d{4,}(\d{5})"
regexString = "\d\d\d\d+(\d{5})"
regexString = "\d\d\d\d\d*(\d{5})"
Because regular expressions are normally "greedy", the \d{5} should always match the final 5 digits, but it may be worth finishing the regular expression with ...(\d{5})([^\d]|$) and changing the replacement to $1$2. That way it looks for a trailing non-digit or end-of-string.
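A quick Python sketch shows the greedy behaviour on runs of nine or more digits. The sample text is invented for illustration, and Python writes the replacement as \1 where .NET uses $1.

```python
import re

text = 'a 123456789 b 123456789012 c 12345678'

# Greedy \d{4,} takes as many digits as it can and gives back exactly five.
greedy = re.sub(r'\d{4,}(\d{5})', r'\1', text)

# Variant with an explicit trailing non-digit or end-of-string check.
guarded = re.sub(r'\d{4,}(\d{5})(\D|$)', r'\1\2', text)
```

The eight-digit run is left alone because the pattern needs at least nine digits in total.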