regex for excluding text at end of string

regex for excluding text at end of string - regex

I have a regular expression (built in adobe javascript) which finds string which can be of varying length.
The part I need help with is when the string is found I need to exclude the extra characters at the end, which will always end with 1 1.
This is the expression:
var re = new RegExp(/WASH\sHANDLING\sPLANT\s[-A-z0-9 ]{2,90}/);
This is the result:
WASH HANDLING PLANT SIZING STATION SERVICES SHEET 1 1 75 MOR03 MUP POS SU W ST1205 DWG 0001
I need to modify the regex to exclude the string in bold beginning with the 1 1.
Keep in mind the string searched for can be of varying length hence the {2,90}
Can anyone please advise assistance in modifying the REGEX to exclude all string from 1 1
Thank you

You may use a positive lookahead and keep the same functionality:
/WASH\sHANDLING\sPLANT\s[-A-Za-z0-9 ]{2,90}(?=\b1 1\b)/
^^^^^^^^^^^
The (?=\b1 1\b) lookahead requires 1 1 as whole "word" after your match.
See the regex demo
Also, note that [A-z] matches more than just letters.

Related

Regex - Match n occurences of substring within any m-lettered window

I am facing some issues forming a regex that matches at least n times a given pattern within m characters of the input string.
For example imagine that my input string is:
00000001100000001110111100000000000000000000000000000000000000000000000000110000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001100
I want to detect all cases where an 1 appears at least 7 times (not necessarily consecutively) in the input string, but within a window of up to 20 characters.
So far I have built this expression:
(1[^1]*?){7,}
which detects all cases where an 1 appears at least 7 times in the input string, but this now matches both the:
11000000011101111
and the
1100000001110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011
parts whereas I want only the first one to be kept, as it is within a substring composed of less than 20 characters.
It tried to combine the aforementioned regex with:
(?=(^[01]{0,20}))
to also match only parts of the string containing either an '1' or a '0' of length up to 20 characters but when I do that it stops working.
Does anyone have an idea gow to accomplish this?
I have put this example in regex101 as a quick reference.
Thank you very much!

This is not something that can be done with regex without listing out every possible string. You would need to iterate over the string instead.
You could also iterate over the matches. Example in Python:
import re
matches = re.finditer(r'(?=((1[^1]*?){7}))', string)
matches = [match.group(1) for match in matches if len(match.group(1)) <= 20]

The next Python snippet is an attempt to get the desired sequences using only the regular expression.
import re
r = r'''
(?mx)
( # the 1st capturing group will contain the desired sequence
1 # this sequence should begin with 1
(?=(?:[01]{6,19}) # let's see that there are enough 0s and 1s in a line
(.*$)) # the 2nd capturing group will contain all characters to the end of a line
(?:0*1){6}) # there must be six more 1s in the sequence
(?=.{0,13} # complement the 1st capturing group to 20 characters
\2) # the rest of a line should be 2nd capturing group
'''
s = '''
0000000
101010101010111111100000000000001
00000001100000001110111100000000000000000000000000000000000000000000000000110000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001100
1111111
111111
'''
print([m.group(1) for m in re.finditer(r, s)])
Output:
['1010101010101', '11111100000000000001', '110000000111011', '1111111']
You can find an exhaustive explanation of this regular expression on RegEx101.

Python Regular Expression: No space in between

I have the following string:
"......(some chars) aaa bbb ###8/13/2018 ......(some chars)"
The ### in the string represent some random characters. ###'s length is unknown and it could be None (just "aaa bbb 8/13/2018").
My goal is to find the date from the string (8/13/2018) and the starting index of ###.
I currently used the following code:
m = re.search(r'\s.*?([0-9]{1,}/[0-9]{1,}/[0-9]{2,})', str)
m.groups()[0] ## The date
m.start() ## index of ###
But the regex is matching bbb ###8/13/2018 instead of ###8/13/2018
I also tried change the regex to:
r'\s(?!\s).*?[0-9]{1,}/[0-9]{1,}/[0-9]{2,}'
r'\s(?!\s)*?[0-9]{1,}/[0-9]{1,}/[0-9]{2,}'
But neither of them works.
I will be appreciated for any help or comments. Thank you.

I tend to believe you are looking for:
#*(?:\d{1,2}/){2}\d{2,4} or even \S*(?:\d{1,2}/){2}\d{2,4}
This is simply saying:
\S* start with 0 or more non-space charaters.
(?:\d{1,2}/){2} find two groups of \d{1,2}/ but do not capture them. ie not capturing: (?:..).this will match the month and date part 8/13/. \d{1,2} means atleast one digit and atmost two digits
\d{2,4} match the year .Atleast 2 digits and atmost 4 digits

Using a part of your regex, I think you mean something like this
r'\S*([0-9]+/[0-9]+/[0-9]{2,})'
https://regex101.com/r/dxF4sT/1
To find the starting index, it would be where the match was found.
Note that \S will find all consecutive non-whitespace.
You can change this to other things like [#a-zA-Z] etc..., just add it to the class.

Visual Basic - RegEx - Overall Length Check regardless the number of matches

I have the following problem :
This is my RegEx-Pattern :
\d*[a-z A-Z][a-zA-Z0-9 _?!()\/\\]*
It allows anything but numbers that stand alone like : 1 , 11 , 111 or so on.
My question : How can I set the overall Length of the input regardless of the matches ?
i tried it with several options like {1,30} before each match and i put the regex in a group with ( ) and then {1,30} but it still doesnt work.
If anyone could help me i would appreciate it :).
Allowed string:
Group1
Group 1
1Group
Group!?()\/
Group !()\?!
a1 a1 a1 a1
Not Allowed:
1
11
And so on. {1,30} after a match restricts the number of how many times i can input the match. What i want to know is: How can i set the maximum length of my above RegEx, like after 30 chars the input is reached regardless of the matches?

In order to disallow a numeric string input only, you can use a negative look-ahead (?!\d+$) and to set a limit to the input, use a limiting quantifier {1,30}:
(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}
See demo
Note that if you plan to match whole strings, you'd need anchors: ^ at the beginning will anchor the regex to the beginning of string, and $ will anchor at the end.
^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$
See another demo

Regular Expressions in R

I found somewhat similar questions
R - Select string text between two values, regex for n characters or at least m characters,
but I'm still having trouble
say I have a string in r
testing_String <- "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
And I need to be able to pull anything between the first element in the string that contains 2 characters (AK) and PADK,ADK. PADK and ADK will change in character but will always be 4 and 3 characters in length respectively.
So I would need to pull
ADAK NAS
I came up with this but its picking up everything from AK to ADK
^[A-Za-z0_9_]{2}(.*?) +[A-Za-z0_9_]{4}|[A-Za-z0_9_]{3,}

If I understood your question correctly, this should do the trick:
\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b
Demo
You'll have to switch the perl = TRUE option (to use a decent regex engine).
\b means word boundary. So this pattern looks for a match starting with a 2-letter word and ending with a 4 letter word followed by a 3 letter word. Your value will be in the first group.
Alternatively, you can write the following to avoid using the capturing group:
\b[A-Z]{2}\s+\K.+?(?=\s+[A-Z]{4}\s+[A-Z]{3}\b)
But I'd prefer the first method because it's easier to read.

Lookbehind is supported for perl=TRUE, so this regex will do what you want:
(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})

What is wrong with this Regular Expression?

I am beginner and have some problems with regexp.
Input text is : something idUser=123654; nick="Tom" something
I need extract value of idUser -> 123456
I try this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, #"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but on output i get idUser=123654, I need only number
The second problem is with nick="Tom", how can I get only text Tom from this expresion.

you don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.

.*?idUser=([0-9]+).*?
That regex should work for you :o)

Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is User=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match

Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix length of digits to 6, use (?<=idUser=)\d{6}(?=($|;))

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

regex for excluding text at end of string - regex

You may use a positive lookahead and keep the same functionality: /WASH\sHANDLING\sPLANT\s[-A-Za-z0-9 ]{2,90}(?=\b1 1\b)/ ^^^^^^^^^^^ The (?=\b1 1\b) lookahead requires 1 1 as whole "word" after your match. See the regex demo Also, note that [A-z] matches more than just letters.

Related

Regex - Match n occurences of substring within any m-lettered window

Python Regular Expression: No space in between

Visual Basic - RegEx - Overall Length Check regardless the number of matches

Regular Expressions in R

What is wrong with this Regular Expression?

Categories

Resources