For a few days I have been fighting with a regular expression without any success.
My first expression should match the following:
Brackets appear at most once; it doesn't matter where.
Text or numbers before and after the brackets are optional.
Only numbers are allowed within the brackets.
Examples of what is allowed:
[32] text1
text1 [5]
text1 [103] text2
text1
[123]
[some value [33]] (maybe too complicated; this one is not so important)
My second expression is similar, but with only numbers instead of text before and after the brackets:
[32] 11
11 [5]
11 [103] 22
11
[123]
No match:
[12] xxx [5] (brackets appear more than once)
[aa] xxx (no number within brackets)
This is what I tried, but it does not work because I don't know how to express that the brackets may appear only once:
^.*\{?[0-9]*\}.*$
In another answer I also found this one, which looks good, but I need it to accept only numbers within the brackets:
^[^\{\}]*\{[^\{\}]*\}[^\{\}]*$
As additional information, in case it matters: later I want to take the number in the brackets and replace it with other values.
Hope someone can help me. Thanks in advance!
This is what you want:
^([^\]\n]*\[\d+\])?[^[\n]*$
Update: For just numbers:
^[\d ]*(\[\d+\])?[\d ]*$
Explanation:
^ Start of line
[^...] Negated character set, e.g. [^\]] is any character except ]
* Zero or more of the preceding class/character set
\d A digit, 0-9
+ One or more of the preceding class/character set
(...)? 0 or 1 of the group
$ End of line
Note: these regexes can also match an empty line.
Thanks to #MMMahdy-PAPION! He improved the answer.
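A minimal Python sketch for checking both patterns against the examples from the question. Python is an assumption here, since the question does not name a language, and is_valid is a hypothetical helper. The "[" inside the second character class is escaped only for readability.

```python
import re

# Text variant: at most one [digits] group, arbitrary text around it.
TEXT_RE = re.compile(r"^([^\]\n]*\[\d+\])?[^\[\n]*$")
# Numeric variant: only digits and spaces are allowed around the group.
NUM_RE = re.compile(r"^[\d ]*(\[\d+\])?[\d ]*$")

def is_valid(pattern, line):
    """Return True if the line satisfies the given anchored pattern."""
    return pattern.match(line) is not None

allowed_text = ["[32] text1", "text1 [5]", "text1 [103] text2", "text1", "[123]"]
allowed_num = ["[32] 11", "11 [5]", "11 [103] 22", "11", "[123]"]
rejected = ["[12] xxx [5]", "[aa] xxx"]

for line in allowed_text + rejected:
    print(repr(line), is_valid(TEXT_RE, line))
for line in allowed_num + rejected:
    print(repr(line), is_valid(NUM_RE, line))
```

Every line in the two "allowed" lists should report True and both rejected lines False.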
My goal is to find entries in the text that have the form
[{XX} xxx]
[ xxx {XX} xxx]
[xxx {XX}]
(including the brackets)
xxx is a random value
This is what I have tried so far:
\[*(?=\{.*\}]*)([^]]+)\]
Unfortunately, it doesn't work 100% correctly.
blocked for [{XX} xxxx] [{XX xxx] [10 xxx] [xxx{XXX} xx] {X} io]
I wanted to see 2 options highlighted: [{XX} xxxx] and [xxx{XXX} xx]
You can try using the following regex:
\[[^\[]*?\s?{[^{]+}\s?[^\]]*?\]
Regex Explanation:
\[: open bracket
[^\[]*?: as few characters other than an open bracket as possible
\s?: optional space
{[^{]+}: one or more characters other than an open brace, enclosed in braces
\s?: optional space
[^\]]*?: as few characters other than a closed bracket as possible
\]: closed bracket
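As a quick check, here is a small Python sketch that runs the pattern against the sample text from the question. Python is an assumption (the question does not name a regex engine), and the braces are escaped so that they are clearly read as literals.

```python
import re

# Same pattern as in the answer, with the braces escaped as literals.
PATTERN = re.compile(r"\[[^\[]*?\s?\{[^{]+\}\s?[^\]]*?\]")

sample = "blocked for [{XX} xxxx] [{XX xxx] [10 xxx] [xxx{XXX} xx] {X} io]"
matches = PATTERN.findall(sample)
print(matches)  # ['[{XX} xxxx]', '[xxx{XXX} xx]']
```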
I have the following string:
"......(some chars) aaa bbb ###8/13/2018 ......(some chars)"
The ### in the string represents some random characters. The length of ### is unknown and it could be zero (just "aaa bbb 8/13/2018").
My goal is to find the date from the string (8/13/2018) and the starting index of ###.
I currently use the following code:
m = re.search(r'\s.*?([0-9]{1,}/[0-9]{1,}/[0-9]{2,})', str)
m.groups()[0] ## The date
m.start() ## index of ###
But the regex matches bbb ###8/13/2018 instead of ###8/13/2018.
I also tried changing the regex to:
r'\s(?!\s).*?[0-9]{1,}/[0-9]{1,}/[0-9]{2,}'
r'\s(?!\s)*?[0-9]{1,}/[0-9]{1,}/[0-9]{2,}'
But neither of them works.
I would appreciate any help or comments. Thank you.
I tend to believe you are looking for:
#*(?:\d{1,2}/){2}\d{2,4} or even \S*(?:\d{1,2}/){2}\d{2,4}
This is simply saying:
\S* matches 0 or more non-whitespace characters.
(?:\d{1,2}/){2} matches \d{1,2}/ twice without capturing it; (?:...) is a non-capturing group. This matches the month and day part, 8/13/. \d{1,2} means at least one and at most two digits.
\d{2,4} matches the year: at least 2 and at most 4 digits.
Using a part of your regex, I think you mean something like this
r'\S*([0-9]+/[0-9]+/[0-9]{2,})'
https://regex101.com/r/dxF4sT/1
The starting index is the position where the match begins (m.start()).
Note that \S* consumes all consecutive non-whitespace characters.
You can replace it with a narrower class such as [#a-zA-Z]; just add the characters you need to the class.
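A minimal Python sketch of this approach, assuming the sample strings shown in the code; find_date is a hypothetical helper name. It uses a lazy \S*? rather than the greedy \S*, because with a greedy prefix a two-digit month such as 12 would lose its first digit to the prefix (the capture would read 2/13/2018).

```python
import re

# Lazy \S*? stops the prefix from swallowing the first digit of the date.
DATE_RE = re.compile(r"\S*?(\d{1,2}/\d{1,2}/\d{2,4})")

def find_date(text):
    """Return (date, index where the ### prefix starts), or None."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    return m.group(1), m.start()

with_prefix = "xx aaa bbb ###8/13/2018 yy"
without_prefix = "aaa bbb 8/13/2018"
print(find_date(with_prefix))     # ('8/13/2018', 11)
print(find_date(without_prefix))  # ('8/13/2018', 8)
```

If the random prefix can itself end in a digit, the boundary between prefix and month is ambiguous and no regex can decide it without more information.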
I'm trying to validate that a form field contains a valid score for a volleyball match. Here's what I have, and I think it works, but I'm not an expert on regular expressions, by any means:
r'^ *([0-9]{1,2} *- *[0-9]{1,2})((( *[,;] *)|([,;] *)|( *[,;])|[,;]| +)[0-9]{1,2} *- *[0-9]{1,2})* *$'
I'm using python/django, not that it really matters for the regex match. I'm also trying to learn regular expressions, so a more optimal regex would be useful/helpful.
Here are rules for the score:
1. There can be one or more valid set (set=game) results included
2. Each result must be of the form dd-dd, where 0 <= dd <= 99
3. Each additional result must be separated by any of [ ,;]
4. Allow any number of sets >=1 to be included
5. Spaces should be allowed anywhere except in the middle of a number
So, the following are all valid:
25-10 or 25 -0 or 25- 9 or 23 - 25 (could be one or more spaces)
25-10,25-15 or 25-10 ; 25-15 or 25-10 25-15 (again, spaces allowed)
25-1 2 -25, 25- 3 ;4 - 25 15-10
Also, I need each result as a separate unit for parsing. So in the last example above, I need to be able to separately work on:
25-1
2 -25
25- 3
4 - 25
15-10
It'd be great if I could strip the spaces from within each result. I can't just strip all spaces, because a space is a valid separator between result sets.
I think this is the solution to your problem (in Python, with the input string in text):
re.sub(r"(\d{1,2})\s*-\s*(\d{1,2})", r"\1-\2", text)
How it works:
(\d{1,2}) captures a group of 1 or 2 digits.
\s* matches 0 or more whitespace characters.
- matches a literal -.
\1 inserts the content of capture group 1.
\2 inserts the content of capture group 2.
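To both validate the whole field and get each result as a separate unit, one option is to validate with one pattern and then extract with a second. This is only a sketch under the rules listed in the question; parse_score and the constant names are hypothetical.

```python
import re

# One set result: 1-2 digits, a dash, 1-2 digits, optional spaces around the dash.
RESULT = r"\d{1,2}\s*-\s*\d{1,2}"
# Separator: a comma or semicolon with optional spaces, or plain whitespace.
SEPARATOR = r"(?:\s*[,;]\s*|\s+)"
SCORE_RE = re.compile(rf"\s*{RESULT}(?:{SEPARATOR}{RESULT})*\s*")
PAIR_RE = re.compile(r"(\d{1,2})\s*-\s*(\d{1,2})")

def parse_score(text):
    """Return a list of (points, points) tuples, or None if text is invalid."""
    if SCORE_RE.fullmatch(text) is None:
        return None
    return [(int(a), int(b)) for a, b in PAIR_RE.findall(text)]

print(parse_score("25-1 2 -25, 25- 3 ;4 - 25 15-10"))
# [(25, 1), (2, 25), (25, 3), (4, 25), (15, 10)]
print(parse_score("25-100"))  # None
```

Converting to integers removes the inner spaces as a side effect, so no separate stripping step is needed.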
So, I've built a regex which follows this:
4!a2!a2!c[3!c]
which translates to
4 alpha characters followed by
2 alpha characters followed by
2 characters followed by
3 optional characters
This is the standard format for a SWIFT BIC code, e.g. HSBCGB2LXXX.
My regex to pull this out of a string is:
(?<=:32[^:]:)(([a-zA-Z]{4}[a-zA-Z]{2})[0-9][a-zA-Z]{1}[X]{3})
This targets a specific tag (32) and works. However, I'm not sure it's the cleanest approach, and it fails if there are any characters before the H.
The string being matched against is:
:32B:HsBfGB4LXXXHELLO
For that string the regex returns HsBfGB4LXXX, but this:
:32B:2HsBfGB4LXXXHELLO
returns nothing.
EDIT
For clarity: I have a string which contains multiple lines, each starting with a colon, a two-digit number, an optional letter and another colon (e.g. :58A:). I want to specify the line to start matching in and return a BIC from anywhere in that line.
EDIT
Some more example data to help:
:20:ABCDERF Z
:23B:CRED
:32A:140310AUD2120,
:33B:AUD2120,
:50K:/111222333
Mr Bank of Dad
Dads house
England
:52D:/DBEL02010987654321
address 1
address 2
:53B:/HSBCGB2LXXX
:57A://AU124040
AREFERENCE
:59:/44556677
A line which HSBCGB2LXXX contains a BIC
:70:Another line of data
:71A:Even more
So I need to pass in the tag (53 or 59) as a variable and get back only the BIC, HSBCGB2LXXX.
Your regex can be simplified, and corrected to allow a character before the H, to:
:32[^:]:.?([a-zA-Z]{6}\d[a-zA-Z]XXX)
The changes made were:
Dropped the lookbehind; the prefix is now simply part of the match
Inserted .? meaning "optional character"
([a-zA-Z]{4}[a-zA-Z]{2}) ==> [a-zA-Z]{6} (4+2=6)
[0-9] ==> \d (\d means "any digit")
[X]{3} ==> XXX (just easier to read and fewer characters)
Group 1 of the match contains your target
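A short Python sketch to confirm the behaviour on the two strings from the question. Python is an assumption, since the question does not say which regex engine is in use.

```python
import re

# Simplified pattern from this answer; group 1 holds the BIC.
BIC_RE = re.compile(r":32[^:]:.?([a-zA-Z]{6}\d[a-zA-Z]XXX)")

results = []
for line in (":32B:HsBfGB4LXXXHELLO", ":32B:2HsBfGB4LXXXHELLO"):
    m = BIC_RE.search(line)
    results.append(m.group(1) if m else None)
print(results)  # ['HsBfGB4LXXX', 'HsBfGB4LXXX']
```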
I'm not quite sure if I understand your question completely, as your regular expression does not completely match what you have described above it. For example, you mentioned 3 optional characters, but in the regexp you use 3 mandatory X-es.
However, the actual regular expression can be further cleaned:
instead of [a-zA-Z]{4}[a-zA-Z]{2}, you can simply use [a-zA-Z]{6}, and the grouping parentheses around this might be unnecessary;
the {1} can be left out without any change in the result;
the X does not need surrounding brackets.
All in all
(?<=:32[^:]:)([a-zA-Z]{6}[0-9][a-zA-Z]X{3})
is shorter and matches in the very same cases.
If you give a better description of the domain, probably further improvements are also possible.
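For the requirement added in the edits (pass in a tag such as 53 or 59 and get the BIC from anywhere in that field), one possible two-step sketch in Python follows. Several things here are assumptions rather than facts from the question: the function name find_bic, that a field runs until the next line starting with a tag, and that BICs are upper case and 8 or 11 characters long. The upper-case restriction matters, because a case-insensitive pattern would also accept an ordinary 8-letter word such as "contains".

```python
import re

# Assumption: BICs are upper case, 8 or 11 characters, per 4!a2!a2!c[3!c].
BIC_RE = re.compile(r"\b[A-Z]{6}[A-Z0-9]{2}(?:[A-Z0-9]{3})?\b")

def find_bic(message, tag):
    """Return the first BIC in the field with the given 2-digit tag, or None."""
    # Step 1: isolate the field, up to the next tag line or the end of input.
    field_re = re.compile(
        r"^:" + re.escape(tag) + r"[A-Z]?:(.*?)(?=^:\d{2}[A-Z]?:|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    field = field_re.search(message)
    if field is None:
        return None
    # Step 2: look for a BIC-shaped token anywhere inside that field.
    bic = BIC_RE.search(field.group(1))
    return bic.group(0) if bic else None

message = "\n".join([
    ":20:ABCDERF Z",
    ":53B:/HSBCGB2LXXX",
    ":57A://AU124040",
    "AREFERENCE",
    ":59:/44556677",
    "A line which HSBCGB2LXXX contains a BIC",
    ":70:Another line of data",
])
print(find_bic(message, "53"))  # HSBCGB2LXXX
print(find_bic(message, "59"))  # HSBCGB2LXXX
print(find_bic(message, "20"))  # None
```

Splitting the work into "find the field" and "find the BIC" keeps each pattern short and avoids a variable-length lookbehind.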
How can I write this as a regular expression?
tabspaceSTRINGtabspace
My data looks like this:
12345 adsadasdasdasd 30
34562 adsadasdasdasd asdadaads<adasdad 30
12313 adsadasdasdasd asdadas dsaads 313123<font="TNR">adsada 30
1232131 adsadasdasdasd asdadaads<adasdad"asdja <div>asdjaıda 30
I want to get
12345 30
34562 30
12313 30
1232131 30
\t*\t doesn't work.
Try the following regular expression:
\t.+\t
The problem there is your definition of STRING.
If you use something like the pattern suggested above, it will also match
tabspaceSTRINGtabspacetabspace
You get the picture. This might be acceptable; if not, you need to narrow your definition of STRING, for example:
\t\w+\t
or:
\t[a-zA-Z]+\t
What characters are allowed in your string?
\t\w+\t
\w allows letters, digits and the underscore (ASCII only or full Unicode, depending on your regex engine).
See it here on Regexr, a good platform to test regular expressions.
Your regex \t*\t would match 0 or more tabs followed by one tab. The * is a quantifier meaning 0 or more, and it applies to the character or group before it (here, your \t).
If your whitespace characters are not tabs, try this:
\s+.+\s+30
\s is a whitespace character (space, tab, newline (not important for Notepad++)).
If you are not sure about the strings you are looking for, except that they are separated by tabs, a good approach is to describe such a string as everything but a tab: [^\t]*
[^\t]*\t([^\t]*)\t[^\t]*
You can test it on regexpad.com.
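The "everything but a tab" idea can be tried outside the editor too. The Python sketch below assumes that the columns are separated by real tab characters, which the sample data does not show, and replaces the middle column together with its surrounding tabs by a single tab.

```python
import re

# Assumption: the columns are separated by real tab characters.
rows = [
    "12345\tadsadasdasdasd\t30",
    "34562\tadsadasdasdasd asdadaads<adasdad\t30",
]

# Replace "tab, anything that is not a tab, tab" with a single tab.
MIDDLE_RE = re.compile(r"\t[^\t]*\t")
cleaned = [MIDDLE_RE.sub("\t", row) for row in rows]
print(cleaned)  # ['12345\t30', '34562\t30']
```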