Blueprism: how to use the replace function in a calculation stage? - regex

I am reading a text from an application using BluePrism. The text has the following structure (the number varies from case to case): "Please take note of your order reference: 525". I need to be able to extract the number from the text. Looking at the calculation stage, there is a replace function: replace(text, pattern, new-text). I want to use this function to replace all alphabetic characters in my text with an empty string to return only whatever is numeric. How can I input that in the pattern?
So I want something like this:
Replace([Order confirmation text ], /^[A-z]+$/, " ")
Also, I tried to look for proper documentation for the VBOs that ship with Blue Prism, but couldn't find any. Does anyone know where we can get documentation for Blue Prism functions?

The Replace() function in a calculation stage does plain text substitution only. It does not support regular expressions.
So, if the string is always in that format, you can use:
Replace([Text],"Please take note of your order reference: ","")
If the text is not always that standard, you should use a regular expression instead. To do that, you need an object that invokes regex code.
Among the standard Blue Prism objects, you can find:
Object: Utility - Strings C#
Action: Extract Regex Values
I think there is no Regex Replace action by default, so if you want one, you have to implement it yourself. Below is the code that I am using:
' Assumes System.Text.RegularExpressions is listed in the object's namespace imports.
Dim R As New Regex(Regex_Pattern, RegexOptions.Singleline)
' Replace every match of the pattern in Text with replacement_string.
replacement_result = R.Replace(Text, replacement_string)

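Quick answer: if the leading text is constant, use Mid(). This avoids the problem with the Right() answer, which only works when the number has a fixed length. For example:
Mid("Please take note of your order reference: 525",42,6)
If you pass a length at least as large as the longest number you expect, Mid() simply stops at the end of the string. Wrap the call in Trim() if the result comes back with a leading space.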

A few things here:
-Your pattern isn't matching because it's looking for a constant string of letters from start to finish (^ anchors to the beginning of the string and $ anchors to the end).
-You're replacing the pattern with a space, not an empty string, so you'll end up with a bunch of spaces in your result even if you correct the pattern.
-You said you only want to replace alphabetic characters, but it looks like you also want to get rid of spaces and colons.
Try replacing [A-Za-z :]+ with "".
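To illustrate these points outside Blue Prism, here is a minimal Python sketch. Python's re module merely stands in for whatever regex-capable action you end up using, and the variable names are made up for the example. It shows that the anchored pattern from the question leaves the text untouched, while the corrected character class leaves only the digits.

```python
import re

text = "Please take note of your order reference: 525"

# The question's pattern is anchored at both ends, so it only matches a
# string consisting entirely of characters in the A-z range. This text
# contains spaces, a colon and digits, so nothing is replaced.
unchanged = re.sub(r"^[A-z]+$", " ", text)

# Corrected pattern: strip letters, spaces and colons, replacing with "".
digits_only = re.sub(r"[A-Za-z :]+", "", text)

# Alternative: extract the first run of digits directly.
match = re.search(r"\d+", text)
order_ref = match.group() if match else ""

print(unchanged == text)  # True
print(digits_only)        # 525
print(order_ref)          # 525
```

Note that [A-z] is also broader than it looks: the range covers the punctuation characters that sit between Z and a in ASCII, which is another reason to write [A-Za-z].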

If your goal is to retrieve the number from the string, use Right():
Right("Please take note of your order reference: 525", 3)
This returns only the numeric part, provided the number always has exactly three digits.
Regards
Vimal

Related

How do I find more than one symbol using regex?

I have a string of text and numbers which to all intents and purposes looks like a long string of random text.
I need to detect multiple + or multiple - which are either next to each other or spread out throughout the string.
So, for example I need to detect these:
abc+abc+abc-abc-
or
abc++abc--
abc could be numbers or characters. The text could contain zero or one + and zero or one - in any order, at the beginning or anywhere through.
Could someone please assist with a regex (vba compatible) which would assist in determining these?
Many thanks
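You don't need to use a single regular expression. Using other methods can be a lot clearer and more straightforward: remove matches for [^+-] to filter out the characters you don't want, then use ([+-])\1 to do the final validation. Note that ([+-])\1 only detects two identical signs that end up adjacent after filtering; to also catch a sequence such as +-+, use ([+-]).*\1 instead.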
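As a rough sketch of the filter-then-validate idea, here it is in Python rather than VBA (the function names are invented for the example; the patterns use only a character class and a backreference, which the VBScript RegExp object should also accept). The second function is an alternative single pattern that finds the same sign twice anywhere in the text, including cases where the repeats are not adjacent after filtering.

```python
import re

def has_adjacent_repeat(text):
    """Two-step approach: keep only + and -, then look for a doubled sign."""
    signs = re.sub(r"[^+-]", "", text)
    return re.search(r"([+-])\1", signs) is not None

def has_any_repeat(text):
    """Single pattern: the same sign occurs at least twice anywhere."""
    return re.search(r"([+-]).*\1", text, re.DOTALL) is not None

for sample in ["abc+abc+abc-abc-", "abc++abc--", "abc+abc-", "a+b-c+"]:
    print(sample, has_adjacent_repeat(sample), has_any_repeat(sample))
```

The two functions agree on the first three samples and differ on the last one, where the filtered signs are +-+.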

List of allowed characters from regular expression

Does anyone know of a way to extract the allowed characters from a regular expression and construct a user-friendly message?
For example, by providing regular expression
^[a-zA-Z0-9&\-\+_\.\s]{1,10}$
to get something like
a-z A-Z 0-9 & - + _ . with spaces
I am using java. I can imagine that it could be too complicated or even impossible to cover all types of regular expressions, but maybe you know about some library, tool or algorithm that could help.
Thanks
Yes. It can be done.
What you need is:
Turn your regexp body into a string.
Parse that string (with a regex for instance) that will output the desired list.
Apply possible regexp options (such as ignore case to the result).
This is tedious work if you're not VERY familiar with Regexp. I actually have code in production doing just that, but it's proprietary so I can't post it here and it's not in Java.
I guess you should first ask yourself whether there is no simpler solution for your problem. If for instance your regexp is a constant, you could associate it with a by-hand list of accepted characters.
If your input is a character-class like the one you provided, you could match it with the expression
([^\\]-[^\\]|\\.|[^^$[\]])
that will give you a list of elements like "a-z", "\+", "_" that you could then tidy up a little further, e.g., removing the "\", and then print it nicely formatted.
And you could extract the length information using (with the braces escaped so that they are treated as literals)
\{([0-9]+)(,([0-9]+))?\}
that accepts {1,10} as well as {10} with the "from" and "to" values being captured each in their own group.
That should get you started.
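The two expressions above can be tried out with a short sketch. It is written in Python rather than Java to keep it compact, and the function names and the "with spaces" wording are assumptions made for the example. It only handles a pattern containing one simple character class with no escaped ] inside it, so treat it as a starting point rather than a general solution.

```python
import re

def describe_allowed(pattern):
    # Assumption: one simple [...] class, no escaped ']' inside it.
    class_match = re.search(r"\[(.*?)\]", pattern)
    if class_match is None:
        return ""
    body = class_match.group(1)
    # Split the class body into ranges (a-z), escapes (\+) and single characters.
    tokens = re.findall(r"[^\\]-[^\\]|\\.|[^^$\[\]]", body)
    parts = []
    for token in tokens:
        if token == r"\s":
            parts.append("with spaces")
        elif token.startswith("\\"):
            parts.append(token[1:])  # drop the escaping backslash
        else:
            parts.append(token)
    return " ".join(parts)

def length_bounds(pattern):
    # Returns (minimum, maximum); a single count such as {10} yields (10, 10).
    m = re.search(r"\{([0-9]+)(,([0-9]+))?\}", pattern)
    if m is None:
        return None
    low = int(m.group(1))
    high = int(m.group(3)) if m.group(3) else low
    return low, high

pattern = r"^[a-zA-Z0-9&\-\+_\.\s]{1,10}$"
print(describe_allowed(pattern))  # a-z A-Z 0-9 & - + _ . with spaces
print(length_bounds(pattern))     # (1, 10)
```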

do we ever use regex to find regex expressions?

Let's say I have a very long string that contains regular expressions at random locations. Can I use a regex to find those regexes?
(Assuming that you are looking for a JavaScript regexp literal, delimited by /.)
It would be simple enough to just look for everything in between /, but that might not always be a regexp. For example, such a search would return /2 + 3/ of the string var myNumber = 1/2 + 3/4. This means that you will have to know what occurs before the regular expression. The regexp should be preceded by something other than a variable or number. These are the cases that I can think of:
/regex/;
var myVar = /regex/;
myFunction(/regex/,/regex/);
return /regex/;
typeof /regex/;
case /regex/;
throw /regex/;
void /regex/;
"global" in /regex/;
In some languages you can use lookbehind, which might look like this (untested!):
(?<=^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/
However, JavaScript engines that predate ES2018 do not support lookbehind. I would recommend imitating lookbehind by putting the portion of the regexp designed to match the literal itself in a capturing group and accessing that. All cases of which I am aware can be matched by this regexp:
(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/)
NOTE: This regex sometimes results in false positives in comments.
If you want to also grab modifiers (e.g. /regex/gim), use
(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/\w*)
If there are any reserved words I am missing that may be followed by a regexp literal, simply add this to the end of the first group: |\bkeyword
All that remains then is to access the capturing group, using a code similar to the following:
var codeString = "function(){typeof /regex/;}";
var searchValue = /(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/)/g;
// the global modifier is necessary!
var match = searchValue.exec(codeString); // "['typeof /regex/','/regex/']"
match = match[1]; // "/regex/"
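For comparison, the same capturing-group idea can be sketched in Python, where findall returns only the captured group when the pattern has exactly one. The function name and the sample string are made up for the example, and the caveat about false positives in comments applies here too.

```python
import re

# A non-capturing prefix describes the allowed context; the single capturing
# group holds the regex literal itself.
REGEX_LITERAL = re.compile(
    r"(?:^|\n|[^\s\w/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)"
    r"\s*(/(?:\\/|[^/*\n])(?:\\/|[^/\n])*/)"
)

def find_regex_literals(source):
    # With exactly one capturing group, findall returns just that group.
    return REGEX_LITERAL.findall(source)

sample = r"var n = 1/2 + 3/4; var re = /ab+c/; return /x\/y/;"
for literal in find_regex_literals(sample):
    print(literal)
# /ab+c/
# /x\/y/
```

The divisions in 1/2 + 3/4 are skipped because each slash there is preceded by a digit, which none of the prefix alternatives accept.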
UPDATE
I just fixed an error with the regexp concerning escaped slashes that would have caused it to get only /\/ of a regexp like /\/hello/
UPDATE 4/6
Added support for void and in. You can't blame me too much for not including this at first, as even Stack Overflow doesn't, if you look at the syntax coloring in the first code block.
What do you mean by "regular expression"? aaaa is a valid regular expression. This is also a regular expression. If you mean a regular expression literal you might need something like this: /\/(?:[^\\\/]|\\.)*\// (adapted from here).
UPDATE
slebetman makes a good point; regular-expression literals don't need to start with /. In Perl or sed, they can start with whatever you want. Essentially, what you're trying to do is risky and probably won't work for all cases.
It's not the best way to go about this.
You can attempt to do so with some degree of confidence (using EOL to break the text up into substrings and finding ones that look like regular expressions, perhaps delimited by quotation marks). However, don't forget that a very long string CAN itself be a regex, so you will never have complete confidence using this approach.
Yes, if you know whether (and how!) your regex is delimited. Say, for example, that your string is something like
aaaaa...aaa/b/aaaaa
where 'b' is the 'regular expression' delimited by the character / (this is a near-basic scenario); what you have to do is scan the string for the expected delimiter, extract whatever is between the delimiters (paying attention to escape chars) and you should be set.
This works if your delimiter is a known character and you are sure that it appears an even number of times, or you are willing to discard the rest (for example, which pairs of delimiters would you pick in the following string: aaa/b/aaa/c/aaa/d?)
If this is the case then you need to follow the same reasoning you'd do to find any substring in a given string. Once you've found the first regexp, keep parsing until you hit the end of the string or you find another regexp, and so on.
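The delimiter scan described above might look like the following Python sketch. It assumes / is the delimiter and that a backslash escapes the next character; both the pattern and the function name are illustrative. Scanning left to right and pairing delimiters greedily is one possible answer to the ambiguity mentioned earlier, and it silently discards an unpaired trailing delimiter.

```python
import re

# A '/'-delimited segment; a backslash escapes the following character,
# so an escaped delimiter does not end the segment.
DELIMITED = re.compile(r"/((?:\\.|[^/\\])*)/")

def extract_delimited(text):
    return DELIMITED.findall(text)

print(extract_delimited("aaaaa/b/aaaaa"))      # ['b']
print(extract_delimited("aaa/b/aaa/c/aaa/d"))  # ['b', 'c']
```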
I suspect, however, that you are looking for a 'general rule' to find any string that, once parsed, would result in a valid regular expression (say we're talking about POSIX regexp-- try man re_format if you're under *BSD). If that is the case you could try every possible substring of every length of the given string and feed it to a regexp parser for syntax correctness. Still, you have proven nothing of the validity of the regexp, i.e. on what they actually match.
If that is what you're trying to do I strongly recommend finding another way or explaining better what you are trying to accomplish here.

How to make a regular expression looking for a list of extensions separated by a space

I want to be able to take a string of text from the user that should be formatted like this:
.ext1 .ext2 .ext3 ...
Basically, I am looking for a dot, a string of alphanumeric characters of any length, a space, and rinse and repeat. I am a little confused about how to say "I need a period, a string of characters and a space". Also, the last extension could be followed by nothing, a space, or a series of spaces. And I guess the extensions could be separated by any number of spaces?
EDIT: I made it clearer what I was looking for.
Thanks!
Try this:
^(?:\.[A-Za-z0-9]+ +)*\.[A-Za-z0-9]+ *$
In a Java string literal you need to escape the backslashes:
"^(?:\\.[A-Za-z0-9]+ +)*\\.[A-Za-z0-9]+ *$"
(\.\w+)\s* Match this and get your results.
^((\.\w+)\s*)*$ Check this and if it's true, your String is exactly what you want.
For the last pattern thing, you can't (AFAIK) do both getting all extensions (separated) and checking that the last is followed by other things. Either you check your string, or you extract the extensions from it.
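The check-then-extract split can be sketched in two steps. This example uses Python for brevity (the function name is made up); the validation pattern is the one given in the first answer, and the extraction pattern is a simplified version of the second.

```python
import re

# Step 1: validate the whole string (pattern from the first answer).
VALID = re.compile(r"^(?:\.[A-Za-z0-9]+ +)*\.[A-Za-z0-9]+ *$")
# Step 2: pull out each extension individually.
EXTENSION = re.compile(r"\.[A-Za-z0-9]+")

def parse_extensions(text):
    """Return the list of extensions, or None if the input is malformed."""
    if not VALID.match(text):
        return None
    return EXTENSION.findall(text)

print(parse_extensions(".ext1 .ext2  .ext3  "))  # ['.ext1', '.ext2', '.ext3']
print(parse_extensions(".ext1.ext2"))            # None
```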
I'd start with something like: ^\.[a-z0-9]+([\t\n\v ]+\.[a-z0-9]+)*$ (note the escaped dots; an unescaped . would match any character).

Regular Expression to List accepted words

I need a regular expression to list accepted Version Numbers. ie. Say I wanted to accept "V1.00" and "V1.02". I've tried this "(V1.00)|(V1.01)" which almost works but then if I input "V1.002" (Which is likely due to the weird version numbers I am working with) I still get a match. I need to match the exact strings.
Can anyone help?
The reason you're getting a match on "V1.002" is because it is seeing the substring "V1.00", which is part of your regex. You need to specify that there is nothing more to match. So, you could do this:
^(V1\.00|V1\.01)$
A more compact way of getting the same result would be:
^(V1\.0[01])$
Do this:
^(V1\.00|V1\.01)$
(The . needs to be escaped; ^ means the match must start at the beginning of the text, and $ means it must end at the end of the text.)
I would use the '^' and '$' to mark the beginning and end of the string, like this:
^(V1\.00|V1\.01)$
That way the entire string must match the regex.
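A quick way to see the difference is to run the original and the anchored patterns side by side. The sketch below uses Python, and the variable and function names are made up for the example.

```python
import re

loose = re.compile(r"(V1.00)|(V1.01)")     # pattern from the question
strict = re.compile(r"^(V1\.00|V1\.01)$")  # anchored, dot escaped
compact = re.compile(r"^(V1\.0[01])$")     # same set, shorter

def is_accepted(version):
    return strict.search(version) is not None

for candidate in ["V1.00", "V1.01", "V1.002", "V1x00"]:
    print(candidate,
          loose.search(candidate) is not None,
          is_accepted(candidate),
          compact.search(candidate) is not None)
```

The loose pattern accepts all four candidates: "V1.002" because it contains "V1.00", and "V1x00" because the unescaped dot matches any character. The anchored patterns accept only the first two.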