Regex match multiple items


I have the following string:
<button {{ $attributes->class([
'bg-blue-600 hover:bg-blue-700 text-white px-3 py-2 rounded',
'bg-blue-600 px-3 py-2 hover:bg-blue-700 text-white rounded',
])->merge([
'wire:click' => $click,
]) }}>
{{ $label }}
</button>
I'm trying to get a VS Code extension (Headwind) to match the text inside the single quotes of the class method via its custom regex setting.
I have this regex, which partly works:
class\(([^)]*)\)
However, it matches everything inside the parentheses, which makes Headwind mess up the result.
I need it to match each occurrence of the text inside the single quotes. How do I do this?

You can use
(?<=\\bclass\\(\\[\\s*'(?:[^']*'\\s*,\\s*')*)[^']+(?=')
This is the escaped form, with each backslash doubled, of (?<=\bclass\(\[\s*'(?:[^']*'\s*,\s*')*)[^']+(?=').
See the regex demo. Details:
(?<=\bclass\(\[\s*'(?:[^']*'\s*,\s*')*) - a position immediately preceded by the whole word class followed by ([, zero or more whitespace characters, a ', and then zero or more repetitions of: zero or more chars other than ', a ', a comma surrounded by optional whitespace, and another '
[^']+ - one or more chars other than '
(?=') - a location that is immediately followed with a ' char.
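
To check the idea outside the editor, here is a minimal Python sketch. Python's re module rejects variable-width lookbehinds, so it does not use the pattern above. It assumes a two-step approach instead: isolate the class([ ... ]) argument, then collect each single-quoted string inside it. The names class_strings and blade are made up for this illustration.

```python
import re

# Sample input copied from the question.
blade = """<button {{ $attributes->class([
'bg-blue-600 hover:bg-blue-700 text-white px-3 py-2 rounded',
'bg-blue-600 px-3 py-2 hover:bg-blue-700 text-white rounded',
])->merge([
'wire:click' => $click,
]) }}>
{{ $label }}
</button>"""


def class_strings(source):
    """Return every single-quoted string found inside class([ ... ])."""
    results = []
    # Step 1: grab the text between "class([" and the first closing "]".
    for block in re.finditer(r"\bclass\(\[([^\]]*)\]", source):
        # Step 2: pull out each quoted item from that block only.
        results.extend(re.findall(r"'([^']+)'", block.group(1)))
    return results


for item in class_strings(blade):
    print(item)
```

Because the second step only sees the text captured by the first, the 'wire:click' key inside merge([ ... ]) is not returned.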

Related

Regex Pattern Matching at Beginning of String with BeautifulSoup

I'm currently looking for a way to perform pattern matching via regex at the beginning of an HTML class name. The pattern I'm trying to match is:
"col-xs-.*"
Two examples of classes in the HTML page are:
<div class="col-xs-12 col-sm-12 col-lg-12">
<div class="mod-tiles__sizer col-xs-6 col-sm-4 col-lg-3">
The goal is to match only the first div, because its class attribute actually starts with "col-xs-.*", which is what I am after. With my current regex I can't seem to single that one out. Currently I'm trying to match using the following regex pattern:
regex = re.compile('^col-xs-.*$')
soup.find_all("div", class_ = regex)
Unfortunately this pattern also prints out the second class name (where "col-xs-.*" appears in the middle and not just at the start). Hopefully someone has a solution to this issue.

I think you want an attribute = value CSS selector with the starts-with operator ^ to specify the prefix to look for in the class attribute.
soup.select('[class^="col-xs-"]')
Example:
from bs4 import BeautifulSoup as bs
html = '''
<div class="col-xs-12 col-sm-12 col-lg-12">
<div class="mod-tiles__sizer col-xs-6 col-sm-4 col-lg-3">
'''
soup = bs(html, 'lxml')
classes = [' '.join(item['class']) for item in soup.select('[class^="col-xs-"]')]
print(classes)
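
As for why the question's pattern also matched the second div: as far as I understand Beautiful Soup's documented behaviour, class is a multi-valued attribute and a class_ filter is tested against each class name, so col-xs-6 on its own satisfies ^col-xs-.*$. The following standard-library sketch only imitates the two kinds of test; token_match and prefix_match are hypothetical helpers, not Beautiful Soup functions.

```python
import re

pattern = re.compile(r"^col-xs-.*$")

# The two class attribute values from the question.
class_values = [
    "col-xs-12 col-sm-12 col-lg-12",
    "mod-tiles__sizer col-xs-6 col-sm-4 col-lg-3",
]


def token_match(value):
    # Imitates a per-class test: succeeds if ANY single class name matches.
    return any(pattern.search(token) for token in value.split())


def prefix_match(value):
    # Imitates [class^="col-xs-"]: tests the start of the whole attribute value.
    return value.startswith("col-xs-")


token_hits = [v for v in class_values if token_match(v)]
prefix_hits = [v for v in class_values if prefix_match(v)]
print(token_hits)
print(prefix_hits)
```

The per-class test keeps both values, while the prefix test keeps only the first one.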

This expression may extract the desired classes:
import re
regex = r"[\"']\s*(\bcol-xs-[0-9]+\b[^\"']+?)\s*[\"']"
test_str = """
<div class="col-xs-12 col-sm-12 col-lg-12"><div class=" col-xs-12 col-sm-12 col-lg-12 ">
<div class="mod-tiles__sizer col-xs-6 col-sm-4 col-lg-3"><div class="col-xs-12 col-sm-12 col-lg-12">
<div class="mod-tiles__sizer col-xs-6 col-sm-4 col-lg-3">
"""
print(re.findall(regex, test_str, re.MULTILINE | re.IGNORECASE))
Output
['col-xs-12 col-sm-12 col-lg-12', 'col-xs-12 col-sm-12 col-lg-12', 'col-xs-12 col-sm-12 col-lg-12']
The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it; there you can also watch how it matches some sample inputs.

If you want to find them without Beautiful Soup, this is one way to do it.
It matches all div tags with a class attribute whose value begins with col-xs-.
Whitespace around the value is trimmed.
r"(?i)<div(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?(?<=\s)class\s*=\s*(?:(['\"])\s*(col-xs-(?:(?!\1)[\S\s])*?)\s*\1))\s+(?:\"[\S\s]*?\"|'[\S\s]*?'|[^>]*?)+>"
https://regex101.com/r/rsXqI9/1
Formatted:
Class value is in group 2.
(?i)
< div
(?=
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
(?<= \s )
class \s* = \s*
(?:
( ['"] ) # (1)
\s*
( # (2 start)
col-xs-
(?:
(?! \1 )
[\S\s]
)*?
) # (2 end)
\s*
\1
)
)
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>

HTML5 Regex Pattern for textbox validation: allow alphabet, spaces, and certain characters only

I am creating a form in html and for a text input, I want to have a regex validation on it that only allows the user to input: letters of the alphabet, numbers and certain characters;
/ - forward slash
- - hyphen
. - period (full stop)
& - ampersand
- spaces
I have tried this pattern but to no avail:
[a-zA-Z0-9\/\-\.\&\ ]
My HTML code:
<input type="text" id="payment_reference" name="payment_reference" maxlength="18" value="' . $payment_reference_default . '" pattern="[a-zA-Z0-9\/\-\.\&\ ]" required>
I know how to get only alphabet characters and numbers, but allowing the other characters as well is something I'm unable to manage.

You need to remove the escaping from every non-special char because in Firefox (and Chrome) the HTML5 pattern is compiled with the u modifier, which places stricter rules on the pattern. To match one or more allowed chars, add + after the character class:
pattern="[a-zA-Z0-9/.& -]+"
Note that all special chars in your pattern except - are not special inside a character class. Hence, only - can be escaped, but it is common to put it at the start/end of the character class to avoid overescaping.
input:valid {
color: black;
}
input:invalid {
color: red;
}
<form name="form1">
<input pattern="[a-zA-Z0-9/.& -]+" title="Please enter the data in correct format." />
<input type="Submit"/>
</form>
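
The same character class can be checked on the server as well. This is a minimal Python sketch, not part of the original answer; it relies on the fact that the pattern attribute has to match the whole value, which re.fullmatch mirrors. The name is_valid_reference is made up for this example.

```python
import re

# Same character class as the pattern attribute above: letters, digits,
# slash, period, ampersand, space, and a trailing (unescaped) hyphen.
ALLOWED = re.compile(r"[a-zA-Z0-9/.& -]+")


def is_valid_reference(text):
    # fullmatch requires the entire string to consist of allowed characters.
    return ALLOWED.fullmatch(text) is not None


for sample in ["INV-2019/03 & Co.", "ref#123", ""]:
    print(repr(sample), is_valid_reference(sample))
```

The empty string is rejected here because of the + quantifier; in the form, the required attribute plays that role.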

You need to escape the characters properly.
The - is not escaped here because it does not need escaping when it is the first or last character in a character class.
[a-zA-Z0-9\/.&\s-]
Demo

.Net Regex - omitting space in group capture

I've got something like this
<div class="col-md-6">
<div class="form-group">
<label class="control-label col-md-3">Expires at:</label>
<div class="col-md-9">
<p class="form-control-static">
March 3rd, 2019 </p>
</div>
I want to capture the date. The only way I know how to do it is the following, which also captures the space after 2019 that I don't need.
(?<=>Expires at:<\/label>\n<div class=\"col-md-9\">\n<p class=\"form-control-static\">\n)([^<]*)

Well your match condition is "all characters except <", which is literally what you get. You can either use some kind of trimming function in your language outside of the regex to post-process it, or write the regex you actually mean:
(?<=>Expires at:<\/label>\n<div class=\"col-md-9\">\n<p class=\"form-control-static\">\n)([ \w,]*?)\s*<
i.e., any run of spaces, word characters (letters, digits, underscore) and commas, as few as possible, followed by as much whitespace as possible (possibly none), followed by <; only the first part is captured. See it in action here.
Also, obligatory reference to this post.
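
The lazy-capture idea is not specific to .NET. Here is a minimal Python sketch of it, under the assumption that the markup looks like the snippet in the question; it uses \s* between the tags rather than a lookbehind with literal newlines, and it accepts any character except < inside the capture. The names DATE and expires are illustrative only.

```python
import re

# Markup assumed to resemble the fragment from the question.
html = """<label class="control-label col-md-3">Expires at:</label>
<div class="col-md-9">
<p class="form-control-static">
March 3rd, 2019 </p>
</div>"""

# The lazy group ([^<]*?) stops as soon as optional whitespace plus "<" can
# follow, so trailing spaces end up outside the capture.
DATE = re.compile(
    r'Expires at:</label>\s*<div class="col-md-9">\s*'
    r'<p class="form-control-static">\s*([^<]*?)\s*<'
)

match = DATE.search(html)
expires = match.group(1) if match else None
print(repr(expires))
```

Calling a trim function on the captured text afterwards, as the answer also suggests, gives the same result and keeps the pattern simpler.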

Find text between key phrases

I have a variable that contains some text:
<cfsavecontent variable="foo">
element.password_input=
<div class="holder">
<label for="$${input_id}" > $${label_text}</label>
<input name="$${input_name}" id="$${input_id}" value="$${input_value}" type="password" />
</div>
# END element.password_input
element.text_input=
<div class="ctrlHolder">
<label for="$${element_id}" > $${element_label_text}</label>
<input name="$${element_name}" id="$${element_id}"
value="$${element_value}" type="text"
class="textInput" />
</div>
# END element.text_input
</cfsavecontent>
and I am trying to parse the variable to get all of the different element types. Here is what I have so far:
ar = REMatch( "element\.+(.*=)(.*?)*", foo )
but it is only giving me this part:
element.text_input=
element.password_input=
any help will be appreciated.

Your immediate problem is that by default . doesn't include newlines - you would need to use the flag (?s) in your regex for it to do this.
However, simply enabling that flag still won't make your present regex do what you expect.
A better regex would be:
(element\.\w+)=(?:[^##]+|##(?! END \1))+(?=## END \1)
You would then do ListFirst(match[i],'=') and ListRest(match[i],'=') to get the name and value. (rematch doesn't return captured groups).
(Obviously the #s above are doubled to escape them for CF.)
The above regex dissected is:
(element\.\w+)=
Match element. and one or more word characters, place them into capture group 1, then match the = character.
(?:
[^##]+
|
##(?! END \1)
)+
Match any number of non-hash characters, or a hash not followed by the ending token (using negative lookahead (?!...)) and referencing capture group 1 (\1), repeat as many times as possible (+), using a non-capturing group ((?:...)).
(?=## END \1)
Lookahead (?=...) to confirm the variable's ending token is present.
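
For comparison, the same pattern can be tried in Python. This is only a sketch and not ColdFusion code: the # is written once because Python does not need it doubled, and a second capture group is added around the body (an assumption of this example) so the name and value can be read directly instead of splitting on =. The sample text is a shortened version of the question's variable.

```python
import re

# Shortened version of the text held in the question's variable.
foo = """element.password_input=
<div class="holder">
<label for="$${input_id}">$${label_text}</label>
</div>
# END element.password_input
element.text_input=
<div class="ctrlHolder">
<label for="$${element_id}">$${element_label_text}</label>
</div>
# END element.text_input
"""

# Group 1: element name. Group 2: everything up to the matching "# END <name>".
ELEMENT = re.compile(r"(element\.\w+)=((?:[^#]+|#(?! END \1))+)(?=# END \1)")

elements = {m.group(1): m.group(2).strip() for m in ELEMENT.finditer(foo)}

for name, body in elements.items():
    print(name)
    print(body)
```

No (?s) flag is needed here because the pattern never uses the dot; [^#] already matches newlines.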

Regex to exclude multiple strings

Could use some help with Regex searching with NetBeans 7.01's find function.
I'm trying to exclude multiple strings. Specifically, the target lines:
<div class="table_left">
<div class="table_right">
<div class="table_clear">
I need to match only the third and other Div classes that are not either table_left or table_right.
I've tried:
class="table_(((?!left).*)|((?!right).*))
and
class="table_(left|right){0}
I realized while pasting my first Regex line that I'm matching not right OR not left, which is returning both. What is the proper way to specify two conditions? The and operator?
The joys of searching for words that are also Boolean operators...

Try this pattern:
<div\s+class="(?!table_(left|right))[^"]+"
which wouldn't match:
<div class="table_left">
<div class="table_right">
but would match:
<div class="table_clear">
<div class="foo">
EDIT
The HT wrote:
I need to match only classes that begin with table, but are not right or left
Ah, okay, that would look like:
<div\s+class="table_(?!left|right)[^"]+"
or
<div\s+class="table(?!_left|_right)[^"]+"
as you already found yourself (but I included it in my answer for completeness sake).
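
The difference between the first pattern and the edited one can be checked quickly in Python, whose re module supports the same negative lookahead syntax. This is a small sketch using the sample lines from this answer; the names ANY_BUT and TABLE_ONLY are made up for illustration.

```python
import re

lines = [
    '<div class="table_left">',
    '<div class="table_right">',
    '<div class="table_clear">',
    '<div class="foo">',
]

# Any class except table_left / table_right.
ANY_BUT = re.compile(r'<div\s+class="(?!table_(left|right))[^"]+"')
# Only classes starting with table_, except table_left / table_right.
TABLE_ONLY = re.compile(r'<div\s+class="table_(?!left|right)[^"]+"')

any_but_hits = [s for s in lines if ANY_BUT.search(s)]
table_only_hits = [s for s in lines if TABLE_ONLY.search(s)]
print(any_but_hits)
print(table_only_hits)
```

The first pattern keeps table_clear and foo, while the second keeps only table_clear.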
A quick explanation of the pattern <div\s+class="table_(?!left|right)[^"]+":
<div # match '<div'
\s+ # match one or more space chars
class="table_(?!left|right) # match 'class="table_' only if it is not followed by 'left' or 'right'
[^"]+ # match one or more characters other than '"'
" # match a '"'
