I'm trying to create a regex to match the last word of a string, but only if the string starts with a certain pattern.
For example, I want to get the last word of a string only if the string starts with "The cat".
"The cat eats butter" -> would match "butter".
"The cat drinks milk" -> would match "milk".
"The dog eats beef" -> would find no match.
I know the following will give me the last word:
\s+\S*$
I also know that I can use a positive look behind to make sure a string starts with a certain pattern:
(?<=The cat )
But I can't figure out how to combine them.
I'll be using this in C#, and I know I could combine it with some string comparison operators, but I'd like it all to be in one regular expression, since this is one of several regex pattern strings that I'll be looping through.
Any ideas?
Use the following regex:
^The cat.*?\s+(\S+)$
Details:
^ - Start of the string.
The cat - The "starting" pattern.
.*? - A sequence of arbitrary chars, reluctant version.
\s+ - A sequence of whitespace chars.
(\S+) - A capturing group: a sequence of non-whitespace chars; this is what you want to capture.
$ - End of the string.
So the last word will be in the first capturing group.
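A minimal sketch of this pattern in Java (Java is used for the added examples because Java code already appears later in this thread; the regex itself is unchanged, and the class and method names are made up for illustration). Because of the ^ and $ anchors, find() only succeeds when the whole string fits, and the last word comes out of group 1:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CatLastWord {
    // The pattern from the answer: anchored "The cat" prefix, lazy filler,
    // whitespace, then the final run of non-whitespace captured in group 1.
    private static final Pattern LAST_WORD = Pattern.compile("^The cat.*?\\s+(\\S+)$");

    // Returns the last word, or null if the string does not fit the pattern.
    static String lastWord(String s) {
        Matcher m = LAST_WORD.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"The cat eats butter", "The cat drinks milk", "The dog eats beef"};
        for (String s : inputs) {
            System.out.println(s + " -> " + lastWord(s));
        }
    }
}
```

It should print butter and milk for the two cat sentences and null for the dog sentence. Note that "The cat" without a following \b also accepts strings such as "The catalog ...", which may or may not be wanted.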
What about this one?
^The\scat.*\s(\w+)$
My regex knowledge is quite rusty, but couldn't you simply "add" the text you are looking for at the start of \s+\S*$, if you know that will return the last word?
Something like this (plain letters must not be escaped, because sequences like \t, \e, \a and \c have special meanings; and a .* is needed to skip the words in between):
^The cat.*\s+(\S*)$
Without Regex
No need for regex. Just use C#'s StartsWith together with string.Split(' ') and LINQ's Last().
using System;
using System.Linq;
class Example {
    static void Main() {
        string[] strings = {
            "The cat eats butter",
            "The cat drinks milk",
            "The dog eats beef"
        };
        foreach (string s in strings) {
            // Only strings with the wanted prefix; Last() comes from LINQ
            if (s.StartsWith("The cat")) {
                Console.WriteLine(s.Split(' ').Last());
            }
        }
    }
}
Result:
butter
milk
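For comparison, a rough Java counterpart of this non-regex version (hypothetical class name). Instead of Split(' ').Last() it takes the text after the last space with lastIndexOf, which also sidesteps Java's String.split, since that method interprets its argument as a regex:

```java
class CatLastWordNoRegex {
    // Mirrors the C# version: check the prefix, then take the text after the
    // last space. Like Split(' '), this assumes words are separated by spaces.
    static String lastWord(String s) {
        if (!s.startsWith("The cat")) {
            return null;
        }
        String t = s.trim();
        return t.substring(t.lastIndexOf(' ') + 1);
    }

    public static void main(String[] args) {
        String[] strings = {"The cat eats butter", "The cat drinks milk", "The dog eats beef"};
        for (String s : strings) {
            String word = lastWord(s);
            if (word != null) {
                System.out.println(word);
            }
        }
    }
}
```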
With Regex
If you do prefer a regex solution, however, you can use the following.
using System;
using System.Text.RegularExpressions;
class Example {
    static void Main() {
        string[] strings = {
            "The cat eats butter",
            "The cat drinks milk",
            "The dog eats beef"
        };
        // .NET supports variable-length lookbehind, so (?<=^The cat.*) can check the prefix
        Regex regex = new Regex(@"(?<=^The cat.*)\b\w+$");
        foreach (string s in strings) {
            Match m = regex.Match(s);
            if (m.Success) {
                Console.WriteLine(m.Value);
            }
        }
    }
}
Result:
butter
milk
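The lookbehind (?<=^The cat.*) relies on .NET's support for variable-length lookbehind; Java, for instance, requires lookbehinds to have a bounded maximum length, so this exact pattern is not portable there. A portable sketch moves the prefix into the main pattern and reads the word from a named group. Since the question mentions looping over several pattern strings, the sketch assumes a hypothetical convention that every pattern names its result "word"; the second pattern is invented purely to show the loop:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternLoop {
    // Hypothetical convention: every pattern reports its result in a named
    // group "word". The second pattern is a made-up example of another entry.
    static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("^The cat.*\\b(?<word>\\w+)$"),
            Pattern.compile("^A dog.*\\b(?<word>\\w+)$"));

    // Returns the "word" group of the first pattern that matches, or null.
    static String firstMatch(String s) {
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(s);
            if (m.find()) {
                return m.group("word");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] strings = {"The cat eats butter", "A dog eats beef", "The dog eats beef"};
        for (String s : strings) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```

List.of needs Java 9 or later; on older versions Arrays.asList would do the same job.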
I want to search all the words in a line/sentence and detect any word with ize and convert it to ise except for certain words listed.
Find: ^(?!size)(?!resize)(?!Belize)(?!Bizet)(?!Brize)(?!Pfizer)(?!assize)(?!baize)(?!bedizen)(?!citizen)(?!denizen)(?!filesize)(?!maize)(?!prize)(?!netizen)(?!seize)(?!wizen)(?!outsize)(?!oversize)(?!misprize)(?!supersize)(?!undersize)(?!unsized)(?!upsize)([a-zA-Z-\s]+)ize
Replace: $1ise
So far, all I get is either the first -ize word of the line converted or the last one.
Example: "Organize to socialize whatever size."
should become "Organise to socialise whatever size."
Find: (?i)(?!size|resize|Belize|so&so|unsized|upsize)(?<!\w)(\w+)ize
Replace: $1ise
This worked as intended. I added (?i) to handle capitalisation.
The regex ([a-zA-Z-\s]+)ize contains the whitespace class \s, so it will match across word boundaries and swallow several words at once. You might want to work with \w and/or \b so that only characters from the word containing "ize" are matched. Additionally, you don't want the ^ at the beginning, since it anchors the match to the start of the string, so at most one match per line is possible.
Possible regex: (?!....your list....)(\w+)ize
Example input: "Organize to socialize whatever size."
Found matches: "Organize" and "socialize", but not "size", see https://regex101.com/r/UIfoa8/1
After that you can use your replacement $1ise to replace the found string with the captured group and "ise".
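A Java sketch of this suggestion (shortened exclusion list, hypothetical names), running the same text through the pattern as given and through a variant with a leading \b. Without the boundary, the engine can start a match in the middle of an excluded word: in "resize" the lookahead fails at "r" but passes at "e", so "esize" gets rewritten. Anchoring the match to a word start with \b (or with the (?<!\w) from the question's working version) avoids that:

```java
import java.util.regex.Pattern;

class IzeToIse {
    // Shortened exclusion list; the full list from the question fits the same slot.
    private static final String EXCLUDED = "(?!size|resize|Belize|prize|seize|citizen)";

    // As suggested in the answer: no ^, and \w+ instead of a class containing \s.
    static final Pattern WITHOUT_BOUNDARY =
            Pattern.compile(EXCLUDED + "(\\w+)ize", Pattern.CASE_INSENSITIVE);
    // Same pattern with a leading \b, so a match can only start at a word start.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("\\b" + EXCLUDED + "(\\w+)ize", Pattern.CASE_INSENSITIVE);

    static String convert(Pattern p, String text) {
        // $1 is the captured stem; "ise" replaces the matched "ize".
        return p.matcher(text).replaceAll("$1ise");
    }

    public static void main(String[] args) {
        String text = "Organize to socialize whatever size, then resize.";
        System.out.println(convert(WITHOUT_BOUNDARY, text));
        System.out.println(convert(WITH_BOUNDARY, text));
    }
}
```

Both lines convert "Organize" and "socialize" and leave "size" alone; only the second leaves "resize" untouched.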
Make a Whitelist Array
Make the excluded words (whitelist) an array of strings
.split(' ') the text being searched through (searchStr) into an array
then .map() through each word of the array
using .indexOf() to compare a word vs. the whitelist
using .test() to check whether it's an x+"ize" word, and .replace() it if so
Once the searchArray is complete, .join() it into a string (resultString).
Demo
"organize", "mesmerized", "socialize", and "baptize" were mixed into a search string of whitelist words.
var searchStr = `organize Belize Bizet mesmerized Brize Pfizer assize baize bedizen citizen denizen filesize socialize maize prize netizen seize wizen outsize baptize`;
var whitelist = ["size", "resize", "Belize", "Bizet", "Brize", "Pfizer", "assize", "baize", "bedizen", "citizen", "denizen", "filesize", "maize", "prize", "netizen", "seize", "wizen", "outsize", "oversize", "misprize", "supersize", "undersize", "unsized", "upsize"];
var searchArray = searchStr.split(' ').map(function(word) {
  var match;
  if (whitelist.indexOf(word) !== -1) {
    // whitelisted word: keep it unchanged
    match = word;
  } else if (/([a-z]+?)ize/i.test(word)) {
    match = word.replace(/([a-z]+?)ize/i, '$1ise');
  } else {
    match = word;
  }
  return match;
});
// join with a space so the result has the same shape as the input text
var resultString = searchArray.join(' ');
console.log(resultString);
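The whitelist approach translated to Java as a sketch (shortened whitelist, hypothetical names, Java 9+ for Set.of). A Set replaces the indexOf lookup, and since replaceFirst returns the word unchanged when nothing matches, the test() check and the final else branch collapse into one call:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

class WhitelistIze {
    // Shortened whitelist; like the JavaScript demo, lookups are exact and case-sensitive.
    static final Set<String> WHITELIST = Set.of("size", "resize", "Belize", "prize", "seize", "citizen");

    static String convert(String text) {
        return Arrays.stream(text.split(" "))
                // replaceFirst leaves a word without "ize" unchanged, which covers
                // both the test() branch and the final else branch of the JS version.
                .map(w -> WHITELIST.contains(w) ? w : w.replaceFirst("(?i)([a-z]+?)ize", "$1ise"))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(convert("organize Belize mesmerized prize baptize"));
    }
}
```

This prints "organise Belize mesmerised prize baptise". As in the JavaScript demo, a word with punctuation attached is not found in the whitelist: "size," would become "sise,".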
I am parsing a feed and need to exclude fields that consist of a string with the word "bad", in any combination of case.
For example "bad" or "Bad id" or "user has bAd id" would not pass the regular expression test,
but "xxx Badlands ddd" or "aaabad" would pass.
Exclude anything that matches /\bbad\b/i
The \b matches word boundaries and the i modifier makes it case insensitive.
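The same check in Java, as a sketch (hypothetical class name): Pattern.CASE_INSENSITIVE plays the role of the i modifier, and a field passes only when find() does not locate "bad" as a whole word:

```java
import java.util.regex.Pattern;

class BadWordFilter {
    // \b on both sides makes "bad" a whole word; CASE_INSENSITIVE plays the role of /i.
    private static final Pattern BAD = Pattern.compile("\\bbad\\b", Pattern.CASE_INSENSITIVE);

    // A field passes only if it does NOT contain "bad" as a standalone word.
    static boolean passes(String field) {
        return !BAD.matcher(field).find();
    }

    public static void main(String[] args) {
        String[] fields = {"bad", "Bad id", "user has bAd id", "xxx Badlands ddd", "aaabad"};
        for (String f : fields) {
            System.out.println(f + " -> " + (passes(f) ? "pass" : "excluded"));
        }
    }
}
```

For the five sample fields from the question, it reports the first three as excluded and the last two as passing.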
In JavaScript, you can put the word directly into the regex and test for a match; \b stands for a word boundary, meaning no word character is attached on that side:
/\bbad\b/i.test("Badkamer") // false; the i flag makes the test case-insensitive
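You may try this regex (with the case-insensitive flag enabled):
^(.*?(\bbad\b)[^$]*)$
Note that [^$] is a character class meaning "any character except a literal $", so a $ after the word can prevent the match.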
I think the easiest way to do this would be to split the string into words and then check each word for a match. It could be done with a function like this:
private bool containsWord(string searchstring, string matchstring)
{
    bool hasWord = false;
    // Split the input into words on single spaces
    string[] words = searchstring.Split(new char[] { ' ' });
    foreach (string word in words)
    {
        if (word.ToLower() == matchstring.ToLower())
            hasWord = true;
    }
    return hasWord;
}
The code converts everything to lowercase to ignore any case mismatches. I think you can also use RegEx for this:
static bool ExactMatch(string input, string match)
{
    return Regex.IsMatch(input.ToLower(), string.Format(@"\b{0}\b", Regex.Escape(match.ToLower())));
}
\b is a word-boundary anchor: it matches the position between a word character and a non-word character, not a character itself.
These examples are in C#, since you didn't specify the language.
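A Java counterpart of ExactMatch for an arbitrary search word, sketched with hypothetical names: Pattern.quote escapes the word the way Regex.Escape does, and CASE_INSENSITIVE replaces the ToLower() calls:

```java
import java.util.regex.Pattern;

class ExactWord {
    // Pattern.quote escapes the search word (the role of Regex.Escape), and
    // CASE_INSENSITIVE replaces the two ToLower() calls.
    static boolean exactMatch(String input, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("user has bAd id", "bad"));
        System.out.println(exactMatch("xxx Badlands ddd", "bad"));
    }
}
```

This prints true, then false. As in the C# version, \b only behaves as expected when the search word starts and ends with word characters.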
I am trying to replace a certain group with "" (an empty string) using regex.
I searched and did my best, but it's over my head.
What I want to do is:
string text = "(12je)apple(/)(jj92)banana(/)cat";
string[] resultIwant = { "apple", "banana", "cat" };
Inside the first pair of parentheses there are always exactly 4 characters, which can include digits,
and '(/)' closes each item.
Here's my code (I was using the Matches method):
string text = @"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = @"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if (mc.Count > 0)
{
    foreach (Match str in mc)
    {
        print(str.Groups["value"].Value.ToString());
    }
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The regex below captures the word characters that come right after a ):
(?<=\))(\w+)
Your C# code would be:
{
    string str = "(12je)apple(/)(jj92)banana(/)cat";
    Regex rgx = new Regex(@"(?<=\))(\w+)");
    foreach (Match m in rgx.Matches(str))
        Console.WriteLine(m.Groups[1].Value);
}
Explanation:
(?<=\)) A positive lookbehind: it requires the match to start right after a ) character, without including the ) in the match.
( ) A capturing group.
\w+ Captures all the following word characters. It stops before the next ( because ( isn't a word character.
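Since the question suggests using replace instead of Matches, here is a sketch of that alternative in Java (a different technique from the lookbehind above; names are hypothetical). It replaces every four-character (xxxx) prefix and every (/) closer with a space, then splits on whitespace, assuming the items themselves contain no spaces or parentheses:

```java
import java.util.Arrays;

class StripMarkers {
    // Replace every "(xxxx)" prefix (exactly four word characters) and every
    // "(/)" closer with a space, then split on the remaining whitespace.
    // Assumes the items themselves contain no spaces or parentheses.
    static String[] items(String text) {
        return text.replaceAll("\\(\\w{4}\\)|\\(/\\)", " ").trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(items("(12je)apple(/)(jj92)banana(/)cat")));
    }
}
```

For the sample string this prints [apple, banana, cat]; unlike the original Matches attempt, "cat" is included even though it has no closing (/).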
I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]', and I'd like a pattern that splits it into a list, but I can't figure out the pattern. Basically the comma is the separator, but [6,7,8] itself contains commas as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by an opening bracket, but keeps the bracket in the result text.
The (?=...) construct is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach (String s in Regex.Split(inputstring, @",(?=\[)"))
    System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for (String s : p.split(inputstring))
    System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
Although I believe the best approach here is to use split (as presented in @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (if that's worth something for ease of understanding what's going on):
replace every "],[" with "]#[" (assuming # never occurs in the data)
split on "#"
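The two steps above, sketched in Java with hypothetical names. String.replace works on literal text; split does take a regex in Java, but "#" has no special meaning there:

```java
import java.util.Arrays;

class SplitWithoutRegex {
    // Mark the real separators ("],[") with '#', a character assumed never to
    // occur in the data, then split on that marker.
    static String[] groups(String input) {
        return input.replace("],[", "]#[").split("#");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("[1]-[2]-[3],[4]-[5],[6,7,8],[9]")));
        System.out.println(Arrays.toString(
                groups("[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]")));
    }
}
```

It prints the four and three groups listed in the question.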
I want to match text that contains:
MyValue="{NON_SPACEs}{SPACE_ONE_OR_MORE}{NON_SPACEs}"
pattern:
MyValue="(\S*)(\s+)(\S*)"
Example of text:
sometext MyValue="val1 val2" sometext="xyz"
The problem with my pattern is that it also matches:
sometext MyValue="val1val2" sometext="xyz" (no space between val1 and val2)
I use this for tests: http://regexpal.com/
Restrict your non-space chars to also be non-quotes:
MyValue="([^\s"]*)(\s+)([^\s"]*)"
This regex won't try to span multiple quoted values.
Consider removing some or all of those parentheses, especially the ones around the spaces, unless you need the captured groups.
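A small Java sketch (hypothetical names) comparing the question's pattern with the answer's on the two sample lines. With \S, the first half can swallow the closing quote and the match continues into sometext="xyz"; [^\s"] stops at the quote:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MyValueCheck {
    // The question's pattern: \S also matches the quote, so it can run past it.
    static final Pattern ORIGINAL = Pattern.compile("MyValue=\"(\\S*)(\\s+)(\\S*)\"");
    // The answer's pattern: [^\s"] stops at whitespace AND at the closing quote.
    static final Pattern FIXED = Pattern.compile("MyValue=\"([^\\s\"]*)(\\s+)([^\\s\"]*)\"");

    static boolean matches(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String good = "sometext MyValue=\"val1 val2\" sometext=\"xyz\"";
        String bad = "sometext MyValue=\"val1val2\" sometext=\"xyz\"";
        System.out.println(matches(ORIGINAL, bad)); // true: the unwanted match
        System.out.println(matches(FIXED, bad));    // false
        Matcher m = FIXED.matcher(good);
        if (m.find()) {
            System.out.println(m.group(1) + " / " + m.group(3)); // val1 / val2
        }
    }
}
```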
This is what you are looking for:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string txt = "abc xyz";
            string re1 = ".*?";    // Non-greedy match on filler
            string re2 = "(\\s+)"; // White Space 1
            Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
            Match m = r.Match(txt);
            if (m.Success)
            {
                String ws1 = m.Groups[1].ToString();
                Console.Write("(" + ws1 + ")" + "\n");
            }
            Console.ReadKey();
        }
    }
}
Hope it Helps :)