How do I match these text lines in regex? - regex

I'm trying to match the three first text lines in regex, i.e. the ones ending with form.
value="something form"
value="Second cool form"
value="another silly old form"
value="blabla"
How can I do that?

I don't know what tool you are using, but the following pattern should match the first three lines:
.*form"$
Demo

You could simply use:
.*form"$
In order to work, you would have to turn on multiline mode.
The dot (.) matches any character except a newline, and the asterisk (*) repeats that zero or more times; after that comes the literal text form". The dollar sign ($) anchors the match at the end of the string, or at the end of each line when multiline mode is on.
Take a look at demo. You should learn more about regular expressions here, this is basic regex matching.
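To see the effect of multiline mode, here is a minimal sketch in Python (my choice of tool, since the question does not name one). It uses the sample lines from the question and compares the pattern with and without the flag.

```python
import re

# Sample lines copied from the question, joined into one string.
text = '''value="something form"
value="Second cool form"
value="another silly old form"
value="blabla"'''

# re.MULTILINE lets $ match at the end of every line,
# not only at the very end of the string.
with_flag = re.findall(r'.*form"$', text, re.MULTILINE)

# Without the flag, $ only matches at the end of the whole string,
# and the last line does not end with form".
without_flag = re.findall(r'.*form"$', text)

print(with_flag)
print(without_flag)
```

With the flag the three lines ending in form" come back; without it the list is empty.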

You can try using this:
\w*form\b
\w*: Allows characters in front of form
\b: Makes sure that form is at the end of a word (a word boundary), not necessarily at the end of the string.
Regex 101 demo

Actually if you want to match the 'form' as a separate word, you need something like this:
\Wform\W
\W (capital W) is any character which does not represent a word character, at least in perl-like regex.
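The difference between the last two suggestions is easier to see side by side. This is a small sketch in Python (an assumption on my part, the thread does not say which engine is used); the extra test words platform and formula are made up for illustration.

```python
import re

line = 'value="something form"'

# \w*form\b : optional word characters, then "form" ending at a word boundary.
boundary_line = re.findall(r'\w*form\b', line)
boundary_platform = re.findall(r'\w*form\b', 'platform')
boundary_formula = re.findall(r'\w*form\b', 'formula')

# \Wform\W : the surrounding non-word characters become part of the match,
# and a bare "form" with nothing around it is not matched at all.
nonword_line = re.findall(r'\Wform\W', line)
nonword_bare = re.findall(r'\Wform\W', 'form')

print(boundary_line, boundary_platform, boundary_formula)
print(nonword_line, nonword_bare)
```

Note that \W consumes a character on each side, while \b is zero-width and consumes nothing.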

Related

Regex - how do I match this?

I've been trying hard to get this Regex to work, but am simply not good enough at this stuff apparently :(
Regex - Trying to extract sources
I thought this would work... I'm trying to get all of the content where:
It starts with ds://
Ends with either carriage return or line feed
That's it! Essentially I'm going to then do a negative lookahead such that I can remove all content that is NOT conforming to above (in Notepad++) which allows for Regex search/replace.
Search for lines that contain the pattern, and mark them
Search menu > Mark
Find what: ds://.*\R
Check Regular expression
Check Mark the lines
Find all
Remove the non marked lines
Search menu > Bookmark
Remove unmarked lines
You don't need to add the \w specifier to look for a word after the ds:// in the look ahead. Removing that and altering the final specification from "zero or one carriage return, then zero or one newline" to "either a carriage return or a newline" in capture group should do it for you:
(?=ds:\/\/).*(?:\r|\n)
Update: Carriage return or Line feed group does not need to be captured.
Update 2: The following regex will actually work for your proposed use case in the comments, matching everything but the pattern you described in the question.
^(?:(?!ds:\/\/.*(?:\r|\n)).)*$
Your regex (?=ds:\w+).*\r?\n? does not match because in the content there is ds:// and \w does not match a forward slash. To make your regex work you could change it to:
(?=ds://\w+).*\r?\n? which can be shortened to ds://.*\R?
Note that you don't have to escape the forward slash.
If you want to do a find and replace to keep the lines that contain ds:// you could use a negative lookahead:
Find what
^(?!.*ds://).*\R?
Replace with
Leave empty
Explanation
^ Start of the line
(?!.*ds://) Negative lookahead to assert the string does not contain ds://
.* Match any character except a newline 0+ times
\R? An optional unicode newline sequence to also match the last line if it is not followed by a newline
See the Regex demo
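Outside Notepad++, the same find-and-replace idea can be sketched in Python. Two assumptions here: Python's re module does not accept \R, so I spell the line break out as (?:\r\n|\r|\n), and the sample text (including the ds:// paths) is invented for the example.

```python
import re

# Hypothetical input: two ds:// lines mixed with unrelated lines.
text = "header line\nds://source/one\nsome noise\nds://source/two\nfooter"

# ^            start of a line (re.MULTILINE)
# (?!.*ds://)  the line must not contain ds://
# .*           the rest of the line
# (?:...)?     an optional line break, written out because re has no \R
pattern = re.compile(r'^(?!.*ds://).*(?:\r\n|\r|\n)?', re.MULTILINE)

# Replacing with the empty string deletes every non-matching line.
kept = pattern.sub('', text)
print(repr(kept))
```

Only the lines containing ds:// survive, each with its original line break.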
Here you go, Andrew:
Regex: ds:\/\/.*
Link: https://regex101.com/r/ulO9GO/2
Let me know if you have any questions.

Regex - returning a match without a period

I'm using the below regex string to match the word "kohls" which is located in a group of other words.
\W*((?i)kohls(?-i))\W*
It works great when the word is alone, but if the word is in a url, the match includes a period on both sides.
See the below examples:
Thank you for shopping at Kohls - returns a match for kohls.
https://www.kohls.com - returns a match for .kohls.
Edit. https://www.KohlsAndMichaels.com - doesn't return any match for kohls.
I want it to only extract the exact match for kohls without periods or any other symbols/text in front or behind it. Can you tell me what I'm doing wrong?
In cases like that you can always use a site like regex101.com, which explains the regular expression and shows the matches with colors. So this is how your regular expression currently works:
As you can see in blue color, the problem with the dots is in the \W*, which matches any non-word character. In order to fix this, you can use the following regular expression:
\b((?i)kohls(?-i))\b
The \b (before and after the word you want to match) is used to assert the position at a word boundary. See how this works on that website now:
If you still have questions, look at the explanation of the regular expression provided by that website. It is worth looking.
The \W metacharacter is used to find non-word characters. So adding a star operator will match 0 or more of these non-word characters (like periods). Did you mean to add a word boundary instead?
\b(?i)kohls(?-i)\b
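For anyone trying the word-boundary version in Python (an assumption, the question does not name the engine): Python's re handles inline flags differently from PCRE, so this sketch passes re.IGNORECASE as a flag instead of writing (?i) and (?-i) inside the pattern. The three inputs are the examples from the question.

```python
import re

# Case-insensitivity is supplied as a flag rather than inline.
pattern = re.compile(r'\bkohls\b', re.IGNORECASE)

samples = [
    "Thank you for shopping at Kohls",
    "https://www.kohls.com",
    "https://www.KohlsAndMichaels.com",
]

# One list of matches per sample string.
results = [pattern.findall(s) for s in samples]
print(results)
```

The first two inputs give the bare word without periods. The third gives nothing, because Kohls is directly followed by the letter A and so there is no word boundary after it.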
Replace both \W* with [\W,\.\-]* etc.
Should be enough.

Regex to extract only text after string and before space

I want to match text after given string. In this case, the text for lines starting with "BookTitle" but before first space:
BookTitle:HarryPotter JK Rowling
BookTitle:HungerGames Suzanne Collins
Author:StephenieMeyer BookTitle:Twilight
Desired output is:
HarryPotter
HungerGames
I tried: "^BookTitle(.*)" but it's giving me matches where BookTitle: is in middle of line, and also all the stuff after white space. Anyone help?
You can use a positive lookbehind in your pattern.
(?<=BookTitle:).*?(?=\s)
For more info: Lookahead and Lookbehind Zero-Width Assertions
What language is this?
And provide some code, please; with the ^ anchor you should definitely only be matching on strings that begin with BookTitle, so something else is wrong.
If you can guarantee that all whitespace is stripped from the titles, as in your examples, then ^BookTitle:(\S+) should work in many languages.
Explanation:
^ requires the match to start at the beginning of the string, as you know.
\s - lowercase means: match on whitespace (space, tab, etc.)
\S - uppercase means the inverse: match on anything BUT whitespace.
\w is another possibility: match on word character (alphanumeric plus underscore) - but that will fail you if, for example, there's an apostrophe in the title.
+, as you know, is a quantifier meaning "at least one of".
Hope that helps.
With the 'multi-line' regex option use something like this:
^BookTitle:([^\s]+)
Without multi-line option, this:
(?:^|\n)BookTitle:([^\s]+)
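Since the language was never stated, here is a rough comparison of the suggestions in Python, using the three lines from the question. The trailing newline added for the lookbehind test is my own assumption, to show what happens when whitespace follows the last title.

```python
import re

text = """BookTitle:HarryPotter JK Rowling
BookTitle:HungerGames Suzanne Collins
Author:StephenieMeyer BookTitle:Twilight"""

# With re.MULTILINE, ^ anchors at the start of each line.
titles = re.findall(r'^BookTitle:(\S+)', text, re.MULTILINE)

# Without the flag, (?:^|\n) emulates "start of a line".
titles_no_flag = re.findall(r'(?:^|\n)BookTitle:([^\s]+)', text)

# The lookbehind version is not anchored to the line start, so it also
# picks up a mid-line BookTitle: when whitespace follows the title.
lookbehind = re.findall(r'(?<=BookTitle:).*?(?=\s)', text + "\n")

print(titles)
print(titles_no_flag)
print(lookbehind)
```

The two anchored patterns return only HarryPotter and HungerGames; the lookbehind pattern additionally returns Twilight, which the question wanted to exclude.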

regular expression no characters

I have this regular expression
([A-Z], )*
which should match something like
test, (with a space after the comma)
How do I change the regex so that if there are any characters after the space then it doesn't match?
For example if I had:
test, test
I'm looking to do something similar to
([A-Z], ~[A-Z])*
Cheers
Use the following regular expression:
^[A-Za-z]*, $
Explanation:
^ matches the start of the string.
[A-Za-z]* matches 0 or more letters (case-insensitive) -- replace * with + to require 1 or more letters.
, matches a comma followed by a space.
$ matches the end of the string, so if there's anything after the comma and space then the match will fail.
As has been mentioned, you should specify which language you're using when you ask a Regex question, since there are many different varieties that have their own idiosyncrasies.
^([A-Z]+, )?$
The difference between mine and Donut's is that his will match ", " and fail for the empty string, while mine will match the empty string and fail for ", ". His also accepts both upper and lower case; with mine you'll have to add case-insensitivity to the options of your regex function, but it's like your example.
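The difference between the two answers can be checked with a short sketch. Python is an assumption here (the question never names a language), and the variable names are just labels for the two patterns above.

```python
import re

# First answer: zero or more letters, then comma and space, then end.
first_answer = re.compile(r'^[A-Za-z]*, $')
# Second answer: an optional "letters, " group; case handled by a flag.
second_answer = re.compile(r'^([A-Z]+, )?$', re.IGNORECASE)

samples = ["test, ", "test, test", ", ", ""]

# For each sample, record whether each pattern matches.
outcome = {
    s: (bool(first_answer.search(s)), bool(second_answer.search(s)))
    for s in samples
}

for s, (a, b) in outcome.items():
    print(repr(s), a, b)
```

Both accept "test, " and reject "test, test"; they disagree only on ", " and on the empty string.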
I am not sure which regex engine/language you are using, but there is often something like a negated character class, e.g. [^a-z], meaning "any character other than a lowercase letter".

How to match any character in regex

How can I match all characters, including new lines, with a regex?
I am trying to match all characters between brackets "()". I don't want to activate Dot matches all.
I tried
\([.\n\r]*\)
But it doesn't work.
\(.*\) This doesn't work if there is a new line between the brackets.
I have been using http://regexpal.com/ to test my regular expressions. Tell me if you know something better.
I'd usually use something like \([\S\s]*\) in this situation.
The [\S\s] will match any whitespace or non-whitespace character.
The first example doesn't work because inside a character class the dot is treated literally (Matches the . character instead of all characters).
\((.|[\n\r])*\)
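A quick way to compare the three patterns from this thread, sketched in Python (my assumption, the question only mentions regexpal). The sample text is made up and has one line break between the brackets.

```python
import re

text = "before (first line\nsecond line) after"

# Inside a character class the dot is literal, so [.\n\r] only
# matches dots and line breaks: no match on this text.
literal_dot = re.search(r'\([.\n\r]*\)', text)

# [\S\s] matches any character at all, including line breaks.
any_char = re.search(r'\([\S\s]*\)', text)

# The alternation form behaves the same way on this input.
alternation = re.search(r'\((.|[\n\r])*\)', text)

print(literal_dot)
print(repr(any_char.group()))
print(repr(alternation.group()))
```

Both working patterns are greedy, so with several bracket pairs they run from the first ( to the last ); using the lazy form [\S\s]*? instead stops at the first closing bracket.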