I have a string of text and numbers which to all intents and purposes looks like a long string of random text.
I need to detect multiple + or multiple - which are either next to each other or spread out throughout the string.
So, for example I need to detect these:
abc+abc+abc-abc-
or
abc++abc--
abc could be numbers or characters. The text could contain zero or one + and zero or one - in any order, at the beginning or anywhere through.
Could someone please assist with a regex (vba compatible) which would assist in determining these?
Many thanks
You don't need to use a single regular expression. Using other methods can be a lot clearer and more straightforward: remove matches for [^+-] to filter out the characters you don't want, then use ([+-])\1 to do the final validation.
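A minimal sketch of that two-step idea, written in Python rather than VBA purely for illustration. The function name `has_repeated_sign` is hypothetical. Note one deliberate variation, which is my assumption and not part of the answer above: the check uses `([+-]).*\1` instead of `([+-])\1`, so that interleaved signs such as `+-+` (which leave no two identical signs adjacent after filtering) are still detected.

```python
import re

def has_repeated_sign(text: str) -> bool:
    """Return True if text contains more than one '+' or more than one '-'."""
    # Step 1: drop every character that is not '+' or '-'.
    signs = re.sub(r"[^+-]", "", text)
    # Step 2: look for a sign that occurs again later in the filtered string.
    return re.search(r"([+-]).*\1", signs) is not None

print(has_repeated_sign("abc+abc+abc-abc-"))  # repeated signs spread out
print(has_repeated_sign("abc+abc-"))          # at most one of each sign
```

Both patterns use only a negated character class and a backreference, features the VBScript regex engine used from VBA also provides, so the same two patterns should carry over.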
Related
I am reading a text from an application using BluePrism. The text has the following structure (the number varies from case to case): "Please take note of your order reference: 525". I need to be able to extract the number from the text. Looking at the calculation stage, there is a replace function: replace(text, pattern, new-text). I want to use this function to replace all alphabetic characters in my text with an empty string to return only whatever is numeric. How can I input that in the pattern?
So I want something like this:
Replace([Order confirmation text ], /^[A-z]+$/, " ")
Also, I tried to look for proper documentation for the VBOs that are shipped with BluePrism, but couldn't find any. Does anyone know where we can get documentation for BluePrism functions?
The Replace() function in calculate stage is the simplest possible one. It's not a regex one!
So, if the string is always in that format, then you can use:
Replace([Text],"Please take note of your order reference:","")
If the text is not always that standard, then you should use a regular expression instead. To do that, you need to use an object that invokes regex code.
In the standard blueprism objects, you can find:
Object: Utility - Strings C#
Action: Extract Regex Values
I think there is no Regex Replace action by default, so if you'd like one, you have to implement it yourself. Below is the code that I am using:
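' Regex_Pattern, Text and replacement_string are inputs of the code stage; replacement_result is its output.
Dim R As New Regex(Regex_Pattern, RegexOptions.Singleline)
' The instance method takes (input, replacement); the pattern and options are already held by R.
replacement_result = R.Replace(Text, replacement_string)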
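Quick answer: if the leading text is constant, use Mid. This avoids the problem with the Right() answer, which only works for a fixed number of digits, i.e.
Mid("Please take note of your order reference: 525",43,6)
The number starts at position 43, right after the prefix and its trailing space. If you set the length to the maximum number of digits you expect, Mid simply stops at the end of the string.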
A few things here:
-Your pattern isn't matching because it's looking for a constant string of letters from start to finish (^ anchors to the beginning of the string and $ anchors to the end).
-You're replacing the pattern with a space, not an empty string, so you'll end up with a bunch of spaces in your result even if you correct the pattern.
-You said you only want to replace alphabetic characters, but it looks like you also want to get rid of spaces and colons.
Try replacing [A-Za-z :]+ with "".
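As a rough illustration of that last suggestion, here is the same replacement done with Python's `re.sub`. This is only a sketch of the pattern's behaviour; the helper name `strip_letters` is hypothetical, and it does not show how to call a regex from a calculation stage.

```python
import re

def strip_letters(text: str) -> str:
    # Replace every run of letters, spaces and colons with an empty string
    # (not with a space), leaving only the remaining characters.
    return re.sub(r"[A-Za-z :]+", "", text)

print(strip_letters("Please take note of your order reference: 525"))  # 525
```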
If your goal is to retrieve the number from the string, then use Right():
Right("Please take note of your order reference: 525", 3)
This returns only the numeric part, as long as the number always has three digits.
Regards
Vimal
Does anyone know of a way to extract the allowed characters from a regular expression and construct a user-friendly message?
For example, by providing regular expression
^[a-zA-Z0-9&\-\+_\.\s]{1,10}$
to get something like
a-z A-Z 0-9 & - + _ . with spaces
I am using Java. I can imagine that it could be too complicated or even impossible to cover all types of regular expressions, but maybe you know of some library, tool or algorithm that could help.
Thanks
Yes. It can be done.
What you need is:
Turn your regexp body into a string.
Parse that string (with a regex for instance) that will output the desired list.
Apply possible regexp options (such as ignore case) to the result.
This is tedious work if you're not VERY familiar with Regexp. I actually have code in production doing just that, but it's proprietary so I can't post it here and it's not in Java.
I guess you should first ask yourself whether there is no simpler solution for your problem. If for instance your regexp is a constant, you could associate it with a by-hand list of accepted characters.
If your input is a character-class like the one you provided, you could match it with the expression
([^\\]-[^\\]|\\.|[^^$[\]])
that will give you a list of elements like "a-z", "\+", "_" that you could then tidy up a little further, e.g., removing the "\", and then print it nicely formatted.
And you could extract the length information using
{([0-9]+)(,([0-9]+))?}
that accepts {1,10} as well as {10} with the "from" and "to" values being captured each in their own group.
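The two expressions above can be tried out with a short Python sketch (the question is about Java, so treat this as an illustration of the idea only). The names `describe` and `length_bounds`, and the extra `CLASS_BODY` pattern that pulls the bracket contents out of the full expression, are assumptions of mine. The element pattern is the one from the answer with `^` and `[` escaped inside the final class. Treating `{10}` as "from 10 to 10" is also my assumption.

```python
import re

# Hypothetical helper pattern: the text between the first '[' and its ']'.
CLASS_BODY = re.compile(r"\[((?:\\.|[^\]\\])*)\]")
# Elements of a character class: a range, an escaped char, or a plain char.
ELEMENT = re.compile(r"[^\\]-[^\\]|\\.|[^\^$\[\]]")
# Length quantifier such as {1,10} or {10}.
LENGTH = re.compile(r"\{([0-9]+)(,([0-9]+))?\}")

def describe(pattern: str) -> str:
    body = CLASS_BODY.search(pattern)
    if body is None:
        raise ValueError("no character class found")
    parts, spaces = [], False
    for element in ELEMENT.findall(body.group(1)):
        if element == r"\s":
            spaces = True                # report whitespace in words
        elif element.startswith("\\"):
            parts.append(element[1:])    # drop the escaping backslash
        else:
            parts.append(element)
    text = " ".join(parts)
    return text + " with spaces" if spaces else text

def length_bounds(pattern: str):
    m = LENGTH.search(pattern)
    if m is None:
        return None
    low = int(m.group(1))
    high = int(m.group(3)) if m.group(3) else low
    return low, high

print(describe(r"^[a-zA-Z0-9&\-\+_\.\s]{1,10}$"))
print(length_bounds(r"^[a-zA-Z0-9&\-\+_\.\s]{1,10}$"))
```

This only handles a single simple character class; negated classes, nested constructs and other escapes such as `\d` or `\w` would need extra cases.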
That should get you started.
Over on the Excel VBA forum, someone has asked for help with matching strings like the ones below:
Examples:
ACBD,AC - Match
ACBD,CA - Match
ACBD,ADB - Match
AC,ABCD - Match
ABC, ABD - No Match
The rule is that the strings match if all of the letters in one string are contained in the other (i.e. either one of the two strings contains all the letters of the other).
So it occurred to me that a Regular expression might be the answer, but I am an absolute newbie on that so can you help please?
Is it possible to match both strings against each other?
thanks
Philip
While Regex would certainly make the check easier, I don't think this is possible without additional coding. You would need the code to do one of the following things:
1) match each character individually then see if all matches were true,
2) re-arrange the order of the characters in all possible order permutations and check each order to see if that matched
Either way, you would need to manipulate the "checking" string in order to cover all of the possible requirements of the match.
If you had asked for "any of these characters" or "all of these characters, in this order", you might be able to do it without extra logic, but since you need "all of these characters, in any order", you'll need to manipulate the inputs.
I haven't got an answer for you in VBA but can tell you the steps you need to take.
For each element, create a variable with the characters sorted into alphabetical order - you will need to search the net for a sort function to do this, as there isn't one built into VBA.
Insert a .* between each character in a copy of both variables - these are your regexes. You probably want to incorporate this step into the sort function.
Then all you need to do is match the sorted first element against the regex built from the second element, and then the sorted second element against the regex built from the first.
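Since no VBA was offered, here is a small Python sketch of those steps, as an illustration only. The function names `subsequence_regex` and `letters_match` are hypothetical, and stripping surrounding whitespace (for inputs like "ABC, ABD") is my assumption.

```python
import re

def subsequence_regex(s: str) -> str:
    # Sort the characters and put ".*" between them, e.g. "CA" -> "A.*C".
    return ".*".join(re.escape(ch) for ch in sorted(s))

def letters_match(a: str, b: str) -> bool:
    a, b = a.strip(), b.strip()
    sorted_a = "".join(sorted(a))
    sorted_b = "".join(sorted(b))
    # Match in both directions: each regex is tested against the *sorted*
    # form of the other string, so character order no longer matters.
    return bool(re.search(subsequence_regex(a), sorted_b)
                or re.search(subsequence_regex(b), sorted_a))

for pair in ["ACBD,AC", "ACBD,CA", "ACBD,ADB", "AC,ABCD", "ABC, ABD"]:
    first, second = pair.split(",")
    print(pair, "->", "Match" if letters_match(first, second) else "No Match")
```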
Working on a migrations class in php.
If I have a string like this:
create_users_roles_table
and I want to get the words between the first and the last word correctly, and also get the right word when there's only one word in between, like:
create_users_table
How do I go about that?
I've done:
(\B)_([a-zA-Z]+)_?([a-zA-Z]+)_table
and that works fine when I do create_users_roles_table
and produces users and roles.
But when only doing create_users_table it produces user and s.
Obviously I need it to produce only users.
Anyone?
I think it should read
(\B)_([a-zA-Z]+)_?([a-zA-Z]+)?_table
But this won't work if there are three words in between. I'd suggest stripping the words and then splitting them separately, since I don't think regular expressions can handle variable number of capture groups.
If you can be sure of how many words there can be, you can always hard-code this. For three or fewer words you can use
(\B)_([a-zA-Z]+)(?:_([a-zA-Z]+))?(?:_([a-zA-Z]+))?_table
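The strip-and-split suggestion from this answer can be sketched without any regex. The example is in Python rather than PHP, for illustration only, and the function name `words_between` is hypothetical. It assumes the words are always separated by underscores and that the first and last words are the ones to drop.

```python
def words_between(name: str) -> list[str]:
    # Split on underscores and drop the first and last word,
    # e.g. "create" and "table"; any number of words may remain.
    return name.split("_")[1:-1]

print(words_between("create_users_roles_table"))
print(words_between("create_users_table"))
```

PHP's `explode` and `array_slice` should allow the same approach there.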
I have a text that consists of information enclosed by a certain pattern.
The only thing I know is the pattern: "${template.start}" and "${template.end}"
To keep it simple I will substitute ${template.start} and ${template.end} with "a" in the example.
So one entry in the text would be:
aINFORMATIONHEREa
I do not know how many of these entries are concatenated in the text. So the following is correct too:
aFOOOOOOaaASDADaaASDSDADa
I want to write a regular expression to extract the information enclosed by the "a"s.
My first attempt was to do:
a(.*)a
which works as long as there is only one entry in the text. As soon as there is more than one entry it fails, because the .* matches everything. So using a(.*)a on aFOOOOOOaaASDADaaASDSDADa results in only one capturing group containing everything between the first and the last character of the text, which are both "a":
FOOOOOOaaASDADaaASDSDAD
What I want to get is something like
captureGroup(0): aFOOOOOOaaASDADaaASDSDADa
captureGroup(1): FOOOOOO
captureGroup(2): ASDAD
captureGroup(3): ASDSDAD
It would be great to be able to extract each entry from the text and, from each entry, the information that is enclosed between the "a"s. By the way, I am using the QRegExp class of Qt4.
Any hints? Thanks!
Markus
Multiple variations of this question have been seen before. Various related discussions:
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Using regular expressions how do I find a pattern surrounded by two other patterns without including the surrounding strings?
Use RegExp to match a parenthetical number then increment it
Regex for splitting a string using space when not surrounded by single or double quotes
What regex will match text excluding what lies within HTML tags?
and probably others...
Simply use non-greedy expressions, namely:
a(.*?)a
You need to match something like:
a[^a]*a
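Both suggestions can be compared with a quick Python sketch; it illustrates how the patterns behave, not QRegExp code. A capture group is added to the second pattern so that only the enclosed text is returned, and the helper name `entries` is hypothetical. As far as I know, Qt4's QRegExp does not accept the per-quantifier `*?` syntax and offers `setMinimal(true)` for the whole pattern instead, so the negated character class may be the easier option there.

```python
import re

TEXT = "aFOOOOOOaaASDADaaASDSDADa"

def entries(pattern: str, text: str = TEXT) -> list[str]:
    # findall returns the content of the single capture group for each match.
    return re.findall(pattern, text)

print(entries(r"a(.*?)a"))    # non-greedy: stop at the first closing "a"
print(entries(r"a([^a]*)a"))  # negated class: the content may not contain "a"
```

Iterating over successive matches, rather than expecting one match with several groups, is what yields one result per entry.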
You have a couple of working answers already, but I'll add a little gratuitous advice:
Using regular expressions for parsing is a road fraught with danger
Edit: To be less cryptic: for all their power, flexibility and elegance, regular expressions are not sufficiently expressive to describe any but the simplest grammars. They are adequate for the problem asked here, but are not a suitable replacement for state machines or recursive descent parsers if the input language becomes more complicated.
So, choosing to use RE for parsing input streams is a decision that should be made with care and with an eye towards the future.