Regex wrapping word - regex

Regex example
How can I exclude the first space in every match?
The same regex: (?:^|\W)#(\w+)(?!\w)

Is this what you're looking for?
http://regexr.com/3ca98
From the information you gave us until now, this regex should also be sufficient: #(\w+)(?!\w).
But maybe there's more to it than we know. What did you want to achieve with the (?:^|\W)?
Edit: Thinking about what you probably want to achieve, it occurred to me that you might want to match your pattern only if it's not in the middle of another word (e.g. test#case). You probably don't want to match this.
To exclude such cases, you have to ensure that the pattern is preceded either by a whitespace character or by nothing at all (the start of the input).
I assume you use JavaScript because regexr.com does. Sadly, JavaScript's regex implementation had no lookbehind when this was written (it was added later, in ES2018), so there was no direct way to assert that only whitespace or nothing precedes your pattern.
One solution would be to work with capture groups. Take this regex:
(?:^|\s+)(#\w+)
It matches either the start of the input or one or more whitespace characters in front of your pattern, but inside a non-capturing group. Your pattern follows as the first capture group of the whole expression.
To use this in JavaScript, create a RegExp with the g flag, call its exec method until it returns no more matches, and push the first capture group of each match onto a result array.
JS code:
var txt = text.innerHTML;
var re = /(?:^|\s+)(#\w+)/g;
var res = [];
var tmpresult = [];
while ((tmpresult = re.exec(txt)) !== null) {
    res.push(tmpresult[1]); // push first capture group to result stack
}
result.innerHTML = JSON.stringify(res, null, 2);
JSFiddle: https://jsfiddle.net/j41tw4hm/1/
Updated regexr.com: http://regexr.com/3ca9n
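
For engines that do support lookbehind (JavaScript gained it in ES2018), the same idea can be sketched without a capture group. This is only an illustration: the function name extractTags is hypothetical, and the negative lookbehind (?<!\S) is a swapped-in technique, not the regex from the answer above.

```javascript
// Hypothetical helper: returns every #tag that is preceded by
// whitespace or by the start of the input (needs ES2018 lookbehind).
function extractTags(txt) {
  // (?<!\S) succeeds only if the previous character is not a non-space,
  // so "test#case" is skipped and no leading space is consumed.
  const re = /(?<!\S)#\w+/g;
  return txt.match(re) || [];
}

console.log(extractTags("#one test#case #two, #three"));
```

Because the lookbehind is zero-width, the leading space never becomes part of the match, which is what the question asked for.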

Related

Replacing Regex expression that is not supported with Google Script

A short background of what I am trying to achieve: I have a Google Doc and a Google Sheet.
The Google Doc contains text, and the Google Sheet contains 2 columns: a word and its translation.
The function gets the body of the Google Doc and is supposed to go over the "words" column, find all occurrences of each word in the body, and replace each one with its translation, but only when the occurrence is a whole word and an exact match.
What I want is easier to explain with an example:
Let's say I have the word "pop" and it is translated to "pretty". I want the function to replace the word except in cases like:
pop's
allpop
popping
etc..
So basically, as mentioned, only if it's an exact match and a whole word.
This is the function. The regex works fine; the problem is that it is not supported by Google Apps Script. I couldn't come up with a replacement regex that both works there and meets my requirements.
I attach the code so that, if something is unclear, you can see what I meant if you're familiar with regex.
function replaceText(body, words, origin, translated) {
  for(var i=0; i<words.length; i++){
    var word = words[i][origin-1];
    var regex = RegExp("(?:\\b)" + word + "\\b(?!\\')",'gi');
    Logger.log(body.getText().match(regex));
    Logger.log(body.replaceText(regex, translation));
    var translation = words[i][translated-1];
    var foundElement = body.replaceText(regex, translation);
  }
  return body;
}
Also, if you're interested, here is the link listing which regex expressions are supported by Google Script:
https://github.com/google/re2/wiki/Syntax
First, (?:\\b) should just be \\b: the word boundary is zero-width anyway, so it does not need to be wrapped in a group.
Second, I understand that your issue is specifically with replaceText. The line body.getText().match(regex); uses the regular JavaScript string method, which supports the usual regexes. The issue is that you need replaceText, and that one is different.
Third, replaceText does not take a regular expression object as a parameter: its arguments are strings. Check the docs again.
Finally, since we don't want to treat ' as a word boundary and don't have lookahead support, a solution is to escape ' by replacing it with an alphanumeric string weird enough that it won't occur naturally. At the end, replace it back.
function translate() {
  var body = DocumentApp.getActiveDocument().getBody();
  var escape = "uJKiy5hzXNUWFDl7k2pSZoDZ8ipv6LR1ArTi6gXu"; // from https://www.random.org/strings/?num=2&len=20&digits=on&upperalpha=on&loweralpha=on&unique=on&format=html&rnd=new
  body.replaceText("'", escape);
  // the loop would begin here
  var word = "pop";
  body.replaceText("(?i)\\b" + word + "\\b", "translation");
  // loop would end here.
  body.replaceText(escape, "'");
}
Note that the case-insensitive flag is (?i), and that replacement in replaceText is always global.
And watch out for curly apostrophes: if they need special treatment too, escape them similarly but using some other random string.
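
The placeholder trick can be tried outside Google Docs on a plain string. The following is a minimal sketch in ordinary JavaScript, not Apps Script: the function name replaceWholeWord and the shortened placeholder are assumptions for illustration, and the word is assumed to contain no regex metacharacters.

```javascript
// Hypothetical helper: replaces whole-word occurrences of `word`,
// leaving words that touch an apostrophe (such as "pop's") untouched.
function replaceWholeWord(text, word, translation) {
  // Placeholder assumed never to occur in the text.
  const placeholder = "uJKiy5hzXNUWFDl7k2pS";
  // Step 1: hide apostrophes so \b no longer sees a boundary next to them.
  const hidden = text.split("'").join(placeholder);
  // Step 2: plain whole-word, case-insensitive replacement.
  const pattern = new RegExp("\\b" + word + "\\b", "gi");
  const replaced = hidden.replace(pattern, () => translation);
  // Step 3: restore the apostrophes.
  return replaced.split(placeholder).join("'");
}

console.log(replaceWholeWord("pop pop's allpop popping Pop.", "pop", "pretty"));
```

The placeholder consists of word characters only, which is what removes the word boundary between "pop" and the apostrophe.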

Regex select everything up until next match including new lines

I'm trying to capture the conversation below, but my regex only captures a single line. I want it to capture the entire phrase said by a person, up until the next person says something. If I use the /s flag, the '.+' captures everything until the end of the file instead of stopping at the next match.
I'm new to regular expressions, so sorry for any bad explanation.
This is what I've got so far:
The regex expression:
/([0-9]{2}\/[0-9]{2}\/[0-9]{2} [0-9]{2}\:[0-9]{2}\:[0-9]{2}: (.+):) (.+)/
What I want
Regex101 Fiddle
I'm going to use both \2 and \3 to capture who spoke and what was said, inside a for loop, so I can text-mine it.
Using a pattern to extract, then some LINQ to process:
var pattern = "^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}: (.+?): ((?:[^/]+(?:\n|$))+)";
var data = Regex.Matches(src, pattern, RegexOptions.Multiline)
    .Cast<Match>()
    .Select(m => new { who = m.Groups[1].Value, text = m.Groups[2].Value });
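
An alternative way to stop at the next speaker is a lazy match followed by a lookahead for the next timestamp. The sketch below is in JavaScript and is not the approach of the answer above; the function name parseChat is hypothetical and the timestamp format is assumed from the question's regex.

```javascript
// Hypothetical helper: splits a chat log into { who, text } entries.
function parseChat(src) {
  // Timestamp prefix such as "01/02/20 10:00:00: " (assumed format).
  const stamp = String.raw`\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}: `;
  // Lazily consume the message until the next timestamped line
  // or the end of the input, whichever comes first.
  const re = new RegExp(
    `^${stamp}(.+?): ([\\s\\S]*?)(?=^${stamp}|(?![\\s\\S]))`,
    "gm"
  );
  return Array.from(src.matchAll(re), (m) => ({ who: m[1], text: m[2].trim() }));
}

const log =
  "01/02/20 10:00:00: Alice: Hello\nhow are you?\n01/02/20 10:01:00: Bob: Fine";
console.log(parseChat(log));
```

Using [\s\S] instead of the s flag keeps the dot in (.+?) confined to a single line, so the speaker name cannot run across a line break.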

Regex to extract second word from URL

I want to extract a second word from my url.
Examples:
/search/acid/all - extract acid
/filter/ion/all/sss - extract ion
I tried a few approaches, such as
/.*/(.*?)/
but no luck.
A couple things:
In a JavaScript regex literal, the forward slashes / have to be escaped like this: \/
The (.*?) is lazy: it matches as few characters as possible, including zero characters.
The .* is greedy: it takes as many characters as it can, including forward slashes, so the captured segment is not guaranteed to be the second one.
A simple solution will be:
/.+?\/(.*?)\//
Update:
Since you are using JavaScript, try the following code:
var url = "/search/acid/all";
var regex = /.+?\/(.*?)\//g;
var match = regex.exec(url);
console.log(match[1]);
The variable match is an array. Its first element is the full match (everything that was matched); you can ignore that, since you are interested in the specific group we wanted to match (the part we put in parentheses in the regex).
You can see the working code here
This regex will do the trick:
(?:[^\/]*.)\/([^\/]*)\/
Proof.
I had difficulties with the answers above for URLs without a trailing forward slash:
/search/acid/all/ /* works */
/search/acid /* doesn't work */
To extract the second word from both URLs, what worked for me is
var url = "/search/acid";
var regex = /(?:[^\/]*.)\/([^\/]*)/g;
var match = regex.exec(url);
console.log(match[1]);
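
For comparison, anchoring the pattern at the start of the path removes the need for backtracking tricks. This is a minimal sketch, assuming the path always begins with a slash; the function name secondSegment is hypothetical.

```javascript
// Hypothetical helper: returns the second path segment, or null if there is none.
function secondSegment(path) {
  // ^\/[^\/]+ skips the first segment; the capture group takes the second one.
  const match = /^\/[^\/]+\/([^\/]+)/.exec(path);
  return match ? match[1] : null;
}

console.log(secondSegment("/search/acid/all"));
console.log(secondSegment("/search/acid"));
```

Since [^\/]+ cannot cross a slash, the result is the same with or without a trailing slash.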

Regex - Match parentheses without matching their contents

I just need to match parentheses around some content that has to match specific criteria. I need to match only the parentheses so that I can then do a quick replacement of only those parentheses and keep their content.
For the moment, what I have matches those specific parentheses, but unfortunately also their contents: \((?:\d{2,7})\)
The criteria for matching parentheses are as following:
only match parentheses that contain \d{2,7}
I have tried positive lookahead (\((?=\d{2,7})\)), and while it does indeed not consume whatever follows the open parenthesis, it then fails to match the closing parenthesis as it backtracks to before the content...
So yeah, any help would be appreciated :)
Pure regex pattern (for engines that support \K, such as PCRE): \((?=\d{2,7}\))|(?<=\()\d{2,7}\K\)
Update: I don't know about Swift, but according to this documentation, Template Matching Format part, $n can also be used similarly, as in
let myString = "(32) 123-323-2323"
let regex = try! NSRegularExpression(pattern: "\\((\\d{2,7})\\)")
let range = NSMakeRange(0, myString.characters.count)
regex.stringByReplacingMatchesInString(myString,
options: [],
range: range,
withTemplate: "$1")
With the assumption that you are using Java, I would suggest something as simple as
str.replaceAll("\\((\\d{2,7})\\)", "$1")
The pattern \((\d{2,7})\) matches the whole expression including the parentheses, captures the number in group 1, and replaces the match with only that number, effectively removing the surrounding parentheses.
The regex can be \((\d{2,7})\). It matches each pair of parentheses together with its content; the content is available as group 1 and can be used as the replacement string.
How to access the results of a regex is language-specific, I think.
EDIT:
Here is code which can work. It's untested and I have to warn you at first:
This is my first experience with Swift and online sandbox which I found couldn't compile it. But it couldn't compile examples from Apple website, either...
import Foundation
let text = "some input 22 with (65498) numbers (8643)) and 63546 (parenthesis)"
let regex = try NSRegularExpression(pattern: "\\((\\d{2,7})\\)", options: [])
let replacedStr = regex.stringByReplacingMatchesInString(text,
options: [],
range: NSRange(location: 0, length: text.characters.count),
withTemplate: "$1")
Are you okay with removing all parentheses regardless? Then match [()] and you're done.
Apparently you've said that's not okay, though that wasn't clear when the question was first asked.
So try capturing the number part and using it as the substitution for the match, like: in.replaceAll("\\((\\d{2,7})\\)","$1").
To put this very plainly so that any regular expression system can use it:
Match:\(([0-9]{2,7})\) means a ( a subgroup of 2 to 7 digits and a )
Substitute each match with: a reference to that match's first subgroup capture. The digits.
You can see this operating as you asked on the input:
Given some (unspecified) input that contains (1) (12) (123) etc to (1234567) and (12345678) we munge it to strip (some) parentheses.
Follow this fiddle.re link and press the green button, or see it with more automatic explanation at this regex101.com link.
Or what about replacing each match with a substring of the matched content, so you don't even need a subgroup, since you never want the first or last character of the match. Like:
Pattern p = Pattern.compile("\\(\\d{2,7}\\)");
Matcher m = p.matcher(mysteryInputSource);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Drop the first and last character of the match, i.e. the parentheses
    m.appendReplacement(sb, mysteryInputSource.substring(m.start() + 1, m.end() - 1));
}
m.appendTail(sb);
System.out.println(sb.toString());
Since you're not using Java, try translating any of the above suggestions to your language.
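
As one example of such a translation, here is a minimal JavaScript sketch of the capture-and-substitute approach. The function name stripDigitParens is hypothetical; the pattern is the one suggested in the answers above.

```javascript
// Hypothetical helper: removes parentheses that wrap 2 to 7 digits
// and keeps the digits themselves.
function stripDigitParens(text) {
  // $1 in the replacement refers to the digits captured by the group.
  return text.replace(/\((\d{2,7})\)/g, "$1");
}

console.log(stripDigitParens("(1) (12) (1234567) (12345678) (abc)"));
```

Parentheses around a single digit, around eight or more digits, or around non-digits do not match and are left in place.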

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code that does what you want:
var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only search lines that start with string1 and end with string2
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large or many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
Edit: As you added that you're trying a replacement in Notepad++, I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then perform a "Replace All" operation repeatedly until Notepad++ doesn't find any more matches. The catch is that this regex matches only one "bit" per line per pass, so it has to be applied repeatedly.
By the way, even if a regex matches the whole line, you can still replace only parts of it by using capturing groups.
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
regex101 demo. I tried it in Notepad++ as well and it works.
There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!
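
Outside Notepad++, in an engine without \G or \K, the same effect can be had in two steps: select the lines that run from string1 to string2, then replace "bit" inside each of them. The sketch below is in JavaScript and is a swapped-in technique, not the regex above; the function name replaceBetween is hypothetical.

```javascript
// Hypothetical helper: replaces every "bit" on lines that start with
// "string1_" and end with "_string2", leaving all other lines untouched.
function replaceBetween(text, replacement) {
  // With the m flag, ^ and $ match at line boundaries,
  // so each qualifying line is handled on its own.
  return text.replace(/^string1_.*_string2$/gm, (line) =>
    line.replace(/bit/g, () => replacement)
  );
}

const sample = [
  "string1_dog_bit_johny_bit_string2",
  "string3_crocodile_bit_johny_bit_string4",
].join("\n");
console.log(replaceBetween(sample, "xyz"));
```

The inner replacement uses a function so that characters such as $ in the replacement text are inserted literally.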