Regex to capture first substring with the following properties - regex

I'm looking for a regex to capture the first substring with the following properties:
The substring contains no lowercase letters or symbols
The substring is immediately preceded by "..."
The substring is immediately followed by "...\n"
For example, I'd like to capture "FOO BAR" in the following
"...this is TEXT...\n that...\nI DON'T CARE ABOUT...\nbut I do care about...FOO BAR...\nNothing else matters."

Use this:
// This targets only capital letters, digits, and spaces.
// Without the g flag, exec() returns only the first occurrence.
/\.\.\.[A-Z0-9 ]+\.\.\.\n/
Here is an example usage:
var string = "...this is TEXT...\n that...\nI DON'T CARE ABOUT...\nbut I do care about...FOO BAR...\nNothing else matters.";
var regex = new RegExp(/\.\.\.[A-Z0-9 ]+\.\.\.\n/);
var res = regex.exec(string);
var result = res[0].substring(3, res[0].length - 4); // strip out the ... and \n
console.log(result);
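As a variation on the answer above, a capture group can return the inner text directly, so the leading "..." and trailing "...\n" never need to be stripped by hand. This is only a sketch: the variable names (input, pattern, found, captured) are illustrative, and the pattern is the answer's pattern with parentheses added.

```javascript
// Same pattern as the answer, with a capture group around the part to keep.
const input = "...this is TEXT...\n that...\nI DON'T CARE ABOUT...\nbut I do care about...FOO BAR...\nNothing else matters.";
const pattern = /\.\.\.([A-Z0-9 ]+)\.\.\.\n/;

// match() without the g flag returns the first match (or null if there is none).
const found = input.match(pattern);
const captured = found ? found[1] : null;

console.log(captured); // FOO BAR
```

The null check matters because match() returns null when nothing matches, in which case indexing the result would throw.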

Related

How to refine results of a regular expression

function getPrecedents(thisFormula){
var exp = /(\w+\!)?\$?[A-Z]{1,}(?:\d+)?(\:?\$?\w+)*(?!\()\b/gm;
var results=[];
var result;
while ((result=exp.exec(thisFormula))!== null){
results.push(result);
}
return results;
}
From the above code I am getting the following results
Trigger_Hires!$AA$15
AD$7
Trigger_Hires!$AC60
Trigger_Hires!$AB60
Rev
Import_Staffing!AD$16
Trigger_Hires!$AC60
Trigger_Hires!$AB60
Customers
Import_Staffing!AD$19
Trigger_Hires!$AC60
I would like to eliminate results that are just letters, like Rev and Customers, either with a modified regexp or a second loop.
I suggest adding a check before adding the match to the results array:
while (result=exp.exec(thisFormula)) {
if (!/^[A-Za-z]+$/.test(result[0]))
results.push(result[0]);
}
Note you need to access result[0] to get the whole regex match value. To check if the match value is all letters, ^[A-Za-z]+$ regex is used: ^ asserts the position at the start of the string, [A-Za-z]+ matches 1+ letters and $ asserts the position at the end of the string.
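The letters-only check can also be tried in isolation. The sketch below assumes the matches have already been collected as plain strings; the sample values are copied from the question's output, and the names matches, lettersOnly and references are illustrative.

```javascript
// Sample match strings taken from the output listed in the question.
const matches = ["Trigger_Hires!$AA$15", "AD$7", "Rev", "Import_Staffing!AD$16", "Customers"];

// ^[A-Za-z]+$ is true only when the whole string consists of letters.
const lettersOnly = /^[A-Za-z]+$/;

// Keep everything that is NOT made of letters alone.
const references = matches.filter(function (m) {
  return !lettersOnly.test(m);
});

console.log(references); // the three cell references remain; Rev and Customers are dropped
```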

Search through whole line and change words with ize to ise using regex in Notepad++

I want to search all the words in a line/sentence and detect any word with ize and convert it to ise except for certain words listed.
Find: ^(?!size)(?!resize)(?!Belize)(?!Bizet)(?!Brize)(?!Pfizer)(?!assize)(?!baize)(?!bedizen)(?!citizen)(?!denizen)(?!filesize)(?!maize)(?!prize)(?!netizen)(?!seize)(?!wizen)(?!outsize)(?!oversize)(?!misprize)(?!supersize)(?!undersize)(?!unsized)(?!upsize)([a-zA-Z-\s]+)ize
Replace: $1ise
So far, all I get is either the first or the last word with ize on the line being converted.
Example: Organize to socialize whatever size.
Expected: Organise to socialise whatever size.
Find (?i)(?!size|resize|Belize|so&so|unsized|upsize)(?<!\w)(\w+)ize
Replace $1ise
This worked as intended. I added (?i) to deal with capitalisation issues.
The regex ([a-zA-Z-\s]+)ize includes the whitespace class (\s), so it will match across word boundaries. You might want to work with \w and/or \b to match only characters from the word that contains the "ize". Additionally, you don't want the ^ at the beginning, since it anchors the match to the start of the line.
Possible regex: (?!....your list....)(\w+)ize
Example input: "Organize to socialize whatever size."
Found matches: "Organize" and "socialize", but not "size", see https://regex101.com/r/UIfoa8/1
After that you can use your replacement $1ise to replace the found string with the captured group and "ise".
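The same idea can be tried outside Notepad++. The JavaScript sketch below is an assumption-laden illustration, not the exact Notepad++ expression: the exception list is deliberately shortened, \b anchors are added so that the lookahead is only tested at the start of a word and only rejects whole words, and the names exceptions, izePattern, sentence and converted are made up for the example.

```javascript
// Hypothetical, shortened exception list; extend it with the full list from the question.
const exceptions = ["size", "resize", "Belize", "prize", "seize"];

// \b            start of a word
// (?!(?:...)\b) reject the word if it is exactly one of the exceptions
// (\w+)ize      capture everything before "ize" within the same word
const izePattern = new RegExp("\\b(?!(?:" + exceptions.join("|") + ")\\b)(\\w+)ize", "gi");

const sentence = "Organize to socialize whatever size.";
const converted = sentence.replace(izePattern, "$1ise");

console.log(converted); // Organise to socialise whatever size.
```

Because \w never matches whitespace, each match stays inside one word, which is what lets the g flag convert every qualifying word on the line.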
Make a Whitelist Array
Make the excluded words (whitelist) an array of strings
.split(' ') the text being searched through (searchStr) into an array
then .map() through each word of the array
using .indexOf() to compare a word vs. the whitelist
using .test() to see if it's a x+"ize" word to .replace()
Once the searchArray is complete, .join() it into a string (resultString).
Demo
"organize", "mesmerized", "socialize", and "baptize" were mixed into a search string otherwise made up of whitelist words
var searchStr = `organize Belize Bizet mesmerized Brize Pfizer assize baize bedizen citizen denizen filesize socialize maize prize netizen seize wizen outsize baptize`;
var whitelist = ["size", "resize", "Belize", "Bizet", "Brize", "Pfizer", "assize", "baize", "bedizen", "citizen", "denizen", "filesize", "maize", "prize", "netizen", "seize", "wizen", "outsize", "oversize", "misprize", "supersize", "undersize", "unsized", "upsize"];
var searchArray = searchStr.split(' ').map(function(word) {
var match;
if (whitelist.indexOf(word) !== -1) {
match = word;
} else if (/([a-z]+?)ize/i.test(word)) {
match = word.replace(/([a-z]+?)ize/i, '$1ise');
} else {
match = word;
}
return match;
});
var resultString = searchArray.join(', ');
console.log(resultString);

Regex for characters in specific location in string

Using Notepad++, how can I replace the dashes marked by the carets? The dashes I want to replace occur at every 7th character in the string.
11.871-2-2.737-2.00334-2
      ^       ^       ^
123456781234567812345678
It's pretty simple since it's only dashes:
(\S*?)-
Begin capture group.............................. (
Find any number of non-space chars... \S*
Lazily until...............................................?
End capture group...................................)
No capture find hyphen...........................-
Demo 1
var str = `11.871-2-2.737-2.00334-2`;
var sub = `$1`;
var rgx = /(\S*?)-/g;
var res = str.replace(rgx, sub);
console.log(res);
"There is a dash (right above 1) that I would like to preserve. This seems to get rid of all the dashes in the string"
As posted, the question does not show a dash at the "1 position", but one is possible given the pattern (n7), so Demo 2 handles that case. I don't have time to break it down, but I can refer you to a proper definition of the metacharacter \b.
Demo 2
var str = `-11.871-2-2.737-2.00334-2`;
var sub = `$1$2`;
var rgx = /\b[-]{1}(\S*?)-(\S*?)\b/g;
var res = str.replace(rgx, sub);
console.log(res);
Search for ([0-9\.-]{6,6})-
Replace with: $1MY_SEPARATOR
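To check this last suggestion outside Notepad++, here is a small JavaScript sketch. It assumes an underscore as the replacement separator purely for illustration, and the names original, sixThenDash and replaced are made up; {6,6} is written as the equivalent {6}.

```javascript
const original = "11.871-2-2.737-2.00334-2";

// Six characters drawn from digits, dots and dashes, followed by the dash to replace.
const sixThenDash = /([0-9.-]{6})-/g;

// Keep the six captured characters and swap the trailing dash for the separator.
const replaced = original.replace(sixThenDash, "$1_");

console.log(replaced); // 11.871_2-2.737_2.00334_2
```

The dash directly after the first "2" survives because it is never preceded by exactly six characters from the class followed by another dash at a position where a match can start.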

Grab first 4 characters of two words RegEx

I would like to grab the first 4 characters of two words using RegEx. I have some RegEx experience; however, a search did not yield any results.
So if I have Awesome Sauce I would like the end result to be AwesSauc
Use the Replace Text action with the following parameters:
Pattern: \W*\b(\p{L}{1,4})\w*\W*
Replacement text: $1
See the regex demo.
Pattern details:
\W* - 0+ non-word chars (trim from the left)
\b - a leading word boundary
(\p{L}{1,4}) - Group 1 (later referred to via $1 backreference) matching any 1 to 4 letters (incl. Unicode ones)
\w* - any 0+ word chars (to match the rest of the word)
\W* - 0+ non-word chars (trim from the right)
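The pattern breakdown above can be tried in JavaScript as a rough sketch. This is not the "Replace Text" action the answer refers to: JavaScript needs the u flag for \p{L} and the g flag to replace every word, and the names phrase, firstFour and shortened are illustrative.

```javascript
const phrase = "Awesome Sauce";

// Same pattern as above; "u" enables \p{L}, "g" replaces every word.
const firstFour = /\W*\b(\p{L}{1,4})\w*\W*/gu;

// Each match (surrounding non-word chars + whole word) is replaced by Group 1 only.
const shortened = phrase.replace(firstFour, "$1");

console.log(shortened); // AwesSauc
```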
I think this RegEx should do the job
string pattern = @"\b\w{4}";
var text = "The quick brown fox jumps over the lazy dog";
Regex regex = new Regex(pattern);
var match = regex.Match(text);
while (match.Captures.Count != 0)
{
foreach (var capture in match.Captures)
{
Console.WriteLine(capture);
}
match = match.NextMatch();
}
// outputs:
// quic
// brow
// jump
// over
// lazy
Alternatively you could use patterns like:
\b\w{1,4} => The, quic, brow, fox, jump, over, the, lazy, dog
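\b[\w\d]{1,4} => same matches as above, since \w already includes digits (and a | inside a character class is a literal pipe, not an alternation)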
Update:
added a full example for C# and modified the pattern slightly. Also added some alternative patterns.
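For comparison, the two patterns from this answer behave the same way in JavaScript. The sketch below uses the answer's sample sentence; the variable names pangram, exactlyFour and upToFour are illustrative.

```javascript
const pangram = "The quick brown fox jumps over the lazy dog";

// \b\w{4}: only words with at least four characters produce a match.
const exactlyFour = pangram.match(/\b\w{4}/g) || [];

// \b\w{1,4}: shorter words are kept whole.
const upToFour = pangram.match(/\b\w{1,4}/g) || [];

console.log(exactlyFour.join(" ")); // quic brow jump over lazy
console.log(upToFour.join(" "));    // The quic brow fox jump over the lazy dog
```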
One approach with LINQ:
var res = new string(input.Split().SelectMany((x => x.Where((y, i) => i < 4))).ToArray());
Try this expression
\b[a-zA-Z0-9]{1,4}
Using regex would in fact be more complex and totally unnecessary for this case. Just do it as either of the below.
var sentence = "Awesome Sau";
// With LINQ
var linqWay = string.Join("", sentence.Split(" ".ToCharArray(), options:StringSplitOptions.RemoveEmptyEntries).Select(x => x.Substring(0, Math.Min(4,x.Length))).ToArray());
// Without LINQ
var oldWay = new StringBuilder();
string[] words = sentence.Split(" ".ToCharArray(), options:StringSplitOptions.RemoveEmptyEntries);
foreach(var word in words) {
oldWay.Append(word.Substring(0, Math.Min(4, word.Length)));
}
Edit:
Updated code based on @Dai's comment. Math.Min check borrowed as is from his suggestion.

c# regex split or replace. here's my code i did

I am trying to replace a certain group to "" by using regex.
I was searching and doing my best, but it's over my head.
What I want to do is,
string text = "(12je)apple(/)(jj92)banana(/)cat";
string resultIwant = {apple, banana, cat};
Inside the first pair of parentheses there are always 4 characters, which may include digits,
and '(/)' closes each item.
Here's my code. (I was using matches function)
string text = @"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = @"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if(mc.Count > 0)
{
foreach(Match str in mc)
{
print(str.Groups["value"].Value.ToString());
}
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The regex below captures the word characters that immediately follow a ),
(?<=\))(\w+)
Your c# code would be,
{
string str = "(12je)apple(/)(jj92)banana(/)cat";
Regex rgx = new Regex(@"(?<=\))(\w+)");
foreach (Match m in rgx.Matches(str))
Console.WriteLine(m.Groups[1].Value);
}
Explanation:
(?<=\)) Positive lookbehind is used here. It sets the matching position just after the ) symbol.
() capturing groups.
\w+ Then it captures all the following word characters. It won't capture the following ( symbol because it isn't a word character.
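The lookbehind can also be tried in JavaScript as a quick sketch, assuming an engine with lookbehind support (for example a current Node.js or Chromium-based browser). The names text and words are illustrative.

```javascript
const text = "(12je)apple(/)(jj92)banana(/)cat";

// (?<=\)) requires a ")" immediately before the match; \w+ then takes the word.
// The "(" that follows "(/)" is not a word character, so nothing matches there.
const words = text.match(/(?<=\))\w+/g) || [];

console.log(words.join(", ")); // apple, banana, cat
```

Unlike the pattern in the question, this does not require a closing "(/)" after the word, which is why the trailing "cat" is picked up as well.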