highlighting phrase or words problem - regex

function highlight_phrase($str, $phrase, $class='highlight')
{
    if ($str == '')
    {
        return '';
    }
    if ($phrase != '')
    {
        return preg_replace('/('.preg_quote($phrase, '/').')/Ui', '<span class="'.$class.'">'."\\1".'</span>', $str);
    }
    return $str;
}
The code above is what I use to highlight phrases in a string. I have a problem with the following:
If the phrase is "new car", it matches both "new car" and "new cars" in a string, so it highlights the "new car" part of "new cars". I do not want "new cars" to be highlighted at all.
I could check for a trailing space, but what if the phrase is followed by , . ? or ! etc.?

Use the \b pattern to match word boundaries, i.e. in your case /\b(new car)\b/ will match
"the new car is blue"
"the new car."
"new car"
but not
"all the new cars".
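The four cases above can be checked with a short sketch. It is written in Python rather than PHP, and the helper name contains_phrase is hypothetical; \b behaves the same way in both regex flavours for these inputs.

```python
import re

# The phrase is hard-coded here only to mirror the example strings above.
PATTERN = re.compile(r"\b(new car)\b", re.IGNORECASE)

def contains_phrase(text):
    """Return True if 'new car' occurs as a whole phrase in text."""
    return PATTERN.search(text) is not None
```

Note that \b only helps when the phrase itself starts and ends with a word character.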

Add (?!\w) to the regex. This makes it match only when the phrase is not followed by a word character [a-zA-Z0-9_], so it still matches before punctuation and at the end of the string.
return preg_replace('/('.preg_quote($phrase, '/').')(?!\w)/Ui', '<span class="'.$class.'">'."\\1".'</span>', $str);
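As a rough sketch of the same idea outside PHP, here is a Python version of the highlighting function. It assumes a lookbehind (?<!\w) is also wanted on the left side, which the answer above does not include; re.escape stands in for preg_quote.

```python
import re

def highlight_phrase(text, phrase, css_class="highlight"):
    """Wrap whole-phrase occurrences of `phrase` in a <span>."""
    if not text or not phrase:
        return text
    # The lookarounds reject matches glued to a word character on either side.
    pattern = r"(?<!\w)(" + re.escape(phrase) + r")(?!\w)"

    def wrap(match):
        # Build the replacement by hand so backslashes in css_class are harmless.
        return '<span class="' + css_class + '">' + match.group(1) + "</span>"

    return re.sub(pattern, wrap, text, flags=re.IGNORECASE)
```

Unlike \b, the lookarounds also work when the phrase begins or ends with a non-word character, such as "C++".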

Related

Search through whole line and change words with ize to ise using regex in Notepad++

I want to search all the words in a line/sentence and detect any word with ize and convert it to ise except for certain words listed.
Find: ^(?!size)(?!resize)(?!Belize)(?!Bizet)(?!Brize)(?!Pfizer)(?!assize)(?!baize)(?!bedizen)(?!citizen)(?!denizen)(?!filesize)(?!maize)(?!prize)(?!netizen)(?!seize)(?!wizen)(?!outsize)(?!oversize)(?!misprize)(?!supersize)(?!undersize)(?!unsized)(?!upsize)([a-zA-Z-\s]+)ize
Replace: $1ise
So far I can only get it to work on the first word of the line containing ize, or on the last one.
Example: Organize to socialize whatever size.
Should become: Organise to socialise whatever size.
Find: (?i)(?!size|resize|Belize|so&so|unsized|upsize)(?<!\w)(\w+)ize
Replace: $1ise
This worked as intended. I added (?i) to deal with capitalisation issues.
The regex ([a-zA-Z-\s]+)ize includes the whitespace class (\s), so it will match across word boundaries. You might want to work with \w and/or \b to match only characters from the word that contains "ize". Additionally, you don't want the ^ at the beginning, since it anchors the match to the start of the line.
Possible regex: (?!....your list....)(\w+)ize
Example input: "Organize to socialize whatever size."
Found matches: "Organize" and "socialize", but not "size", see https://regex101.com/r/UIfoa8/1
After that you can use your replacement $1ise to replace the found string with the captured group and "ise".
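A minimal Python sketch of this approach follows. It assumes a shortened, hypothetical exclusion list and adds \b anchors that the regex above does not have. Without an anchor at the start of the word, an excluded word such as "resize" could still be matched from its second letter.

```python
import re

# Hypothetical, shortened exclusion list; extend it with the full list from the question.
EXCLUDED = ["size", "resize", "Belize", "citizen", "prize", "seize"]

# The leading \b pins the lookahead to the start of a word; the \b inside the
# lookahead makes sure only the complete excluded word is rejected.
PATTERN = re.compile(
    r"\b(?!(?:" + "|".join(map(re.escape, EXCLUDED)) + r")\b)(\w+)ize\b",
    re.IGNORECASE,
)

def ize_to_ise(line):
    """Replace a trailing 'ize' with 'ise' in every non-excluded word."""
    return PATTERN.sub(r"\1ise", line)
```

This sketch only rewrites words that end in "ize"; forms such as "organized" would need a wider pattern.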
Make a Whitelist Array
Make the excluded words (whitelist) an array of strings
.split(' ') the text being searched through (searchStr) into an array
then .map() through each word of the array
using .indexOf() to compare a word vs. the whitelist
using .test() to see if it's a x+"ize" word to .replace()
Once the searchArray is complete, .join() it into a string (resultString).
Demo
"organize", "mesmerized", "socialize", and "baptize" were mixed into a search string made up of whitelist words.
var searchStr = `organize Belize Bizet mesmerized Brize Pfizer assize baize bedizen citizen denizen filesize socialize maize prize netizen seize wizen outsize baptize`;
var whitelist = ["size", "resize", "Belize", "Bizet", "Brize", "Pfizer", "assize", "baize", "bedizen", "citizen", "denizen", "filesize", "maize", "prize", "netizen", "seize", "wizen", "outsize", "oversize", "misprize", "supersize", "undersize", "unsized", "upsize"];
var searchArray = searchStr.split(' ').map(function(word) {
    var match;
    if (whitelist.indexOf(word) !== -1) {
        match = word;
    } else if (/([a-z]+?)ize/i.test(word)) {
        match = word.replace(/([a-z]+?)ize/i, '$1ise');
    } else {
        match = word;
    }
    return match;
});
var resultString = searchArray.join(', ');
console.log(resultString);

regular expression - replace a word by a space

I have this line, which replaces dots and underscores with a space. Now I would also like to replace the word "German" (without the quotes) with a space.
Can anybody help?
preg_replace('/\(.*?\)|\.|_/i', ' ',
Best regards
Edit:
public function parseMovieName($releasename)
{
    $cat = new Category;
    if (!$cat->isMovieForeign($releasename))
    {
        preg_match('/^(?P<name>.*)[\.\-_\( ](?P<year>19\d{2}|20\d{2})/i', $releasename, $matches);
        if (!isset($matches['year']))
            preg_match('/^(?P<name>.*)[\.\-_ ](?:dvdrip|bdrip|brrip|bluray|hdtv|divx|xvid|proper|repack|real\.proper|sub\.?fix|sub\.?pack|ac3d|unrated|1080i|1080p|720p|810p)/i', $releasename, $matches);
        if (isset($matches['name']))
        {
            $name = preg_replace('/\(.*?\)|\.|_/i', ' ', $matches['name']);
            $year = (isset($matches['year'])) ? ' ('.$matches['year'].')' : '';
            return trim($name).$year;
        }
    }
    return false;
}
The string is for example "movieName German 2015" but the output should be "movieName 2015" (without the quotes)
Solved:
I changed the line $name = preg_replace('/\(.*?\)|\.|_/i', ' ', $matches['name']); to $name = preg_replace('/\h*\bGerman\b|\([^()]*\)|[._]/', ' ', $matches['name']);
Thanks @Wiktor Stribiżew
To add an alternative to an alternation group, you just need to use
$name = preg_replace('/\h*\bGerman\b|\([^()]*\)|[._]/', ' ', $matches['name']);
                       ^^^^^^^^^^^^^^
Note that \h matches horizontal whitespace only (no linebreaks), if you need linebreaks, use \s.
The \h*\bGerman\b matches zero or more spaces followed by a whole word "German" (as \b is a word boundary, no "Germanic" word will be matched).
Also, (\.|_) matches the same text as [._], but a character class [...] is generally more efficient when matching single characters.
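For comparison, here is a hedged Python sketch of the same substitution. Python's re module has no \h, so [ \t]* is used as a stand-in, and the helper name clean_name plus the final whitespace collapse are additions for this example, not part of the answer.

```python
import re

# [ \t]* approximates PCRE's \h* (horizontal whitespace) in Python.
NOISE = re.compile(r"[ \t]*\bGerman\b|\([^()]*\)|[._]")

def clean_name(name):
    """Replace the word German, parenthesised groups, dots and underscores with spaces."""
    cleaned = NOISE.sub(" ", name)
    # Collapse the runs of spaces left behind and trim the ends.
    return " ".join(cleaned.split())
```

Because of the \b on both sides, a name such as "Germanic_Tribes" keeps the word "Germanic".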

Regular Expressions that is true if the word "bad" is not in the string

I am parsing a feed and need to exclude fields that consist of a string with the word "bad", in any combination of case.
For example "bad" or "Bad id" or "user has bAd id" would not pass the regular expression test,
but "xxx Badlands ddd" or "aaabad" would pass.
Exclude anything that matches /\bbad\b/i
The \b matches word boundaries and the i modifier makes it case insensitive.
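A small Python sketch of this check, using the sample fields from the question (the function name passes is hypothetical):

```python
import re

BAD_WORD = re.compile(r"\bbad\b", re.IGNORECASE)

def passes(field):
    """Return True when the field does not contain 'bad' as a whole word."""
    return BAD_WORD.search(field) is None
```

Fields for which passes returns False are the ones to exclude from the feed.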
For JavaScript, you can put your word in the regex and test it directly. \b stands for a word boundary, which means no word character is attached on that side:
/\bbad\b/i.test("Badkamer") // i for case-insensitive
You may try this regex:
^(.*?(\bbad\b)[^$]*)$
I think the easiest way to do this would be to split the string into words, then check each word for a match. It could be done with a function like this:
private bool containsWord(string searchstring, string matchstring)
{
    bool hasWord = false;
    string[] words = searchstring.Split(new Char[] { ' ' });
    foreach (string word in words)
    {
        if (word.ToLower() == matchstring.ToLower())
            hasWord = true;
    }
    return hasWord;
}
The code converts everything to lowercase to ignore any case mismatches. I think you can also use RegEx for this:
static bool ExactMatch(string input, string match)
{
    return Regex.IsMatch(input.ToLower(), string.Format(@"\b{0}\b", Regex.Escape(match.ToLower())));
}
\b matches a word boundary, as I understand it.
These examples are in C#. You didn't specify the language.

RegEx and split camelCase

I want to get an array of all the words with capital letters that are included in the string. But only if the line begins with "set".
For example:
- string "setUserId", result array("User", "Id")
- string "getUserId", result false
Without the "set" restriction, the regex looks like /([A-Z][a-z]+)/
$str = 'setUserId';
$rep_str = preg_replace('/^set/', '', $str);
if ($str != $rep_str) {
    $array = preg_split('/(?<=[a-z])(?=[A-Z])/', $rep_str);
    var_dump($array);
}
Your regex will also work:
$str = 'setUserId';
if (preg_match('/^set/', $str) && preg_match_all('/([A-Z][a-z]*)/', $str, $match)) {
    var_dump($match[1]);
}
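The same two-step idea (check the prefix, then collect the capitalised words) can be sketched in Python. The function name setter_words is hypothetical, and requiring a capital letter right after "set" is an extra assumption so that a name like "settings" is rejected.

```python
import re

def setter_words(name):
    """Return the capitalised words of a 'set...' name, or None otherwise."""
    # Assumption: "set" must be directly followed by a capital letter.
    if not re.match(r"set(?=[A-Z])", name):
        return None
    return re.findall(r"[A-Z][a-z]*", name[3:])
```

Here None plays the role of the false result asked for in the question.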

Regular expression to match word pairs joined with colons

I don't know regular expressions at all. Can anybody help me with a very simple one:
extracting 'word:word' from a sentence, e.g. "Java Tutorial Format:Pdf With Location:Tokyo Javascript"?
A small modification:
the first 'word' comes from a list, but the second can be anything: "word1 in [ABC, FGR, HTY]".
The situation demands a little more modification:
the matching form can be "word11:word12 word13 .. " up to the next "word21: ... ".
Things are getting complex, so I will have to learn regex :(
Thanks in advance.
You can use the regex:
\w+:\w+
Explanation:
\w - a single character that is a letter (uppercase or lowercase), a digit or an underscore.
\w+ - one or more of those characters, which is basically a word.
So \w+:\w+
matches a pair of words separated by a colon.
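As a quick illustration in Python (the function name colon_pairs is made up for this sketch), adding a capture group around each side returns the two words separately:

```python
import re

def colon_pairs(sentence):
    """Return (first, second) tuples for every word:word pair in the sentence."""
    return re.findall(r"(\w+):(\w+)", sentence)
```

With the sentence from the question this yields the pairs Format/Pdf and Location/Tokyo.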
Try \b(\S+?):(\S+?)\b. Group 1 will capture "Format" and group 2, "Pdf".
A working example:
<html>
<head>
<script type="text/javascript">
function test() {
    var re = /\b(\S+?):(\S+?)\b/g; // without 'g' matches only the first
    var text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript";
    var match = null;
    while ( (match = re.exec(text)) != null) {
        alert(match[1] + " -- " + match[2]);
    }
}
</script>
</head>
<body onload="test();">
</body>
</html>
A good reference for regexes is https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp
Use this snippet:
$str = " this is pavun:kumar hello world bk:systesm";
if (preg_match_all('/(\w+\:\w+)/', $str, $val))
{
    print_r($val);
}
else
{
    print "Not matched \n";
}
Continuing Jaú's function with your additional requirement:
function test() {
    var words = ['Format', 'Location', 'Size'],
        text = "Java Tutorial Format:Pdf With Location:Tokyo Language:Javascript",
        match = null;
    var re = new RegExp( '(' + words.join('|') + '):(\\w+)', 'g');
    while ( (match = re.exec(text)) != null) {
        alert(match[1] + " = " + match[2]);
    }
}
I am currently solving this problem in my Node.js app and found that the following seems suitable for colon-paired words:
([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))
It also matches quoted values, like a:"b" c:'d e' f:g
Example code in ES6:
const regex = /([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))/g;
const str = `category:"live casino" gsp:S1aik-UBnl aa:"b" c:'d e' f:g`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Example code in PHP:
$re = '/([\w]+:)("(([^"])*)"|\'(([^\'])*)\'|(([^\s])*))/';
$str = 'category:"live casino" gsp:S1aik-UBnl aa:"b" c:\'d e\' f:g';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
You can check/test your regex expressions using this online tool: https://regex101.com
Here's the non-regex way: in your favourite language, split on whitespace, go through the elements, check each for ":", and print it if found. E.g. in Python:
>>> s = "Java Tutorial Format:Pdf With Location:Tokyo Javascript"
>>> for i in s.split():
...     if ":" in i:
...         print(i)
...
Format:Pdf
Location:Tokyo
You can do further checks to make sure it's really "someword:someword" by splitting again on ":" and checking that there are 2 elements in the resulting list, e.g.
>>> for i in s.split():
...     if ":" in i:
...         a = i.split(":")
...         if len(a) == 2:
...             print(i)
...
Format:Pdf
Location:Tokyo
([^:]+):(.+)
Meaning: (any character except ":", one or more times), ":", (any character, one or more times)
You'll find good manuals on the net... Maybe it's time for you to learn...
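For the last variant in the question, where a value may run over several words until the next "word:" begins, one possible sketch in Python uses a lazy match with a lookahead. It assumes that a key is a run of word characters directly followed by a colon; the names PAIR and key_values are made up for this example.

```python
import re

# (.*?) grows lazily until the lookahead sees whitespace plus the next "key:"
# or the end of the string.
PAIR = re.compile(r"(\w+):(.*?)(?=\s+\w+:|$)")

def key_values(sentence):
    """Return (key, value) tuples where a value may span several words."""
    return PAIR.findall(sentence)
```

Every word after a key counts as part of its value, so restricting the keys to a known list, as in the question's second variant, would make the result more predictable.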