Delete all text after the nth occurrence of a whitespace - regex

I use this snippet to insert line breaks into text. How can I instead delete everything after the nth whitespace?
var str:String = ("This is just a test string").replace(/(( [^ ]+){2}) /, "$1\n");

This works using the regex (([^ ]* ){2}).* with the replacement pattern $1 (the nth space itself is kept):
function removeAfterNthSpace() {
var nth = parseInt($("#num").val());
var regEx = new RegExp("(([^ ]* ){" + nth + "}).*", "g")
var str = ("This is just a test string").replace(regEx, "$1");
console.log(str);
}
$('#num').change(removeAfterNthSpace);
removeAfterNthSpace();
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="number" id="num" value="2" />
See it working on regex101.
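Outside the page (no jQuery, no input field), the same pattern can be wrapped in a plain function. This is a minimal sketch: `truncateAfterNthSpace` is a hypothetical name, it assumes single-line input and a non-negative integer `n`, and the `^` anchor is an addition that makes the pattern match only from the start of the string.

```javascript
// Hypothetical helper (not from the answer): keeps the text up to and
// including the nth space and drops everything after it.
// Assumes single-line input and a non-negative integer n.
function truncateAfterNthSpace(str, n) {
  // ^ anchors at the start; ([^ ]* ){n} consumes n "word + space" runs,
  // and .* swallows the rest of the line, which the replacement discards.
  const re = new RegExp("^(([^ ]* ){" + n + "}).*");
  return str.replace(re, "$1");
}

console.log(JSON.stringify(truncateAfterNthSpace("This is just a test string", 2))); // "This is "
```

If the string has fewer than n spaces the regex does not match and the string comes back unchanged; call `.trimEnd()` on the result if the kept trailing space is unwanted.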


Why is My Regex Code not working on DART?

I fetched data from a website's body, then wrote a regular expression and applied it in Dart, but it didn't work. What is the problem?
Here is the Regex code:
</td><td align="left">(.*?)</td><td class="dataGridActive
Here is my part of the content:
</tr><tr onmouseover="mover(this);" onmouseout="mout(this);" style="background-color:White;">
<td align="left">233</td><td align="left">ÖMER EFE CIKIT</td><td class="dataGridActive" align="center">
And the dart code:
void CheckRE(String text) {
final RegExp pattern = RegExp(
r'</td><td align="left">(.*?)</td><td class="dataGridActive"',
multiLine: true,
caseSensitive: true,
);
pattern
.allMatches(text)
.forEach((RegExpMatch match) => print(match.group(1)));
}
I think what you want is the following.
I have changed your output so it prints the content of capture group 1 instead of capture group 0. Capture group 0 contains the whole matched string, while groups 1 and up contain the content of each capture group defined in your regular expression.
const input = '''
</tr><tr onmouseover="mover(this);" onmouseout="mout(this);" style="background-color:White;">
<td align="left">233</td><td align="left">ÖMER EFE CIKIT</td><td class="dataGridActive" align="center">
''';
void main() => checkRE(input); // ÖMER EFE CIKIT
void checkRE(String text) {
final RegExp pattern = RegExp(
r'</td><td align="left">(.*?)</td><td class="dataGridActive"',
multiLine: true,
caseSensitive: true,
);
pattern.allMatches(text).forEach((RegExpMatch match) => print(match[1]));
}
Also changed (.*) to the lazy (.*?), so the capture stops at the first following </td>, based on advice from @MikeM.
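To see the group 0 / group 1 distinction outside Dart, here is a rough JavaScript equivalent of the answer's code. It is a sketch, not the answer's Dart; the helper name `captureGroupOne` is hypothetical, and `String.prototype.matchAll` requires a regex with the `g` flag.

```javascript
// The HTML fragment from the question.
const input = `</tr><tr onmouseover="mover(this);" onmouseout="mout(this);" style="background-color:White;">
<td align="left">233</td><td align="left">ÖMER EFE CIKIT</td><td class="dataGridActive" align="center">`;

// Hypothetical helper: returns capture group 1 of every match.
function captureGroupOne(text) {
  // Same pattern as the Dart code; "/" is escaped inside a regex literal.
  const pattern = /<\/td><td align="left">(.*?)<\/td><td class="dataGridActive"/g;
  const values = [];
  for (const match of text.matchAll(pattern)) {
    // match[0] is group 0 (the whole matched text, tags included);
    // match[1] is group 1 (only what (.*?) captured).
    values.push(match[1]);
  }
  return values;
}

console.log(captureGroupOne(input).join("\n")); // ÖMER EFE CIKIT
```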

How to assert that a text ends with digits in protractor

I would like to assert in Protractor that a link's text has the form text-1 (where text is a variable and the number can consist of any digits).
I tried the following:
browser.wait(
ExpectedConditions.visibilityOf(
element(by.xpath(`//a[@class = 'collapsed' and starts-with(text(), '${text}') and ends-with(text(), '-(/d+)')]`))),
5000)
and
browser.wait(
ExpectedConditions.visibilityOf(
element(by.xpath(`//a[@class = 'collapsed' and starts-with(text(), '${text}') and ends-with(text(), '/^-(/d+)$/')]`))),
5000)
Unfortunately, none of the above xpaths worked.
How can I fix this?
If you change the way you declare the variable and your second predicate, you can go with:
//a[@class='collapsed'][starts-with(text(),'" + text_variable + "')][number(replace(.,'^.*-(\d+)$','$1'))*0=0]
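where [number(replace(.,'^.*-(\d+)$','$1'))*0=0] tests for a number at the end of the string: replace() reduces a value like foofighters-200 to 200 and leaves any other string unchanged, number() of a non-numeric string is NaN, and NaN*0=0 is false. Note that replace() (like the ends-with() in the question) is an XPath 2.0 function, while browsers implement only XPath 1.0, so check that your environment supports it.
For example, if you have: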
<p>
<a class="collapsed">foofighters-200</a>
<a class="collapsed">foofighters</a>
<a class="collapsed">boofighters-200</a>
<a class="collapsed">boofighters-200abc</a>
</p>
The following XPath:
//a[@class='collapsed'][starts-with(text(),'foo')][number(replace(.,'^.*-(\d+)$','$1'))*0=0]
will select one element:
<a class="collapsed">foofighters-200</a>
So in Protractor you could have (note the doubled backslash, since \d inside a JavaScript string literal would turn into a plain d):
var text = "foo";
browser.wait(ExpectedConditions.visibilityOf(element(by.xpath("//a[@class='collapsed'][starts-with(text(),'" + text + "')][number(replace(.,'^.*-(\\d+)$','$1'))*0=0]"))), 5000);
...
You can use a regexp for this:
await browser.wait(async () => {
// A regex literal keeps \d intact; inside a string it would have to be written '\\d'.
return /-\d+$/.test(await $('a.collapsed').getText());
}, 20000, 'Expected link text to end with a dash and a number');
Tune this regex here if needed:
https://regex101.com/r/9d9yaJ/1
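The check can also be written so that the variable part never goes into a regex, which avoids escaping both `\d` and any special characters in the variable. A minimal sketch, assuming the link text must be exactly the variable followed by `-` and digits; `endsWithDashNumber` is a hypothetical helper name, not a Protractor API.

```javascript
// Hypothetical helper: true when linkText is exactly `prefix`, a dash and one
// or more digits, e.g. "text-1" or "text-200" for prefix "text".
function endsWithDashNumber(linkText, prefix) {
  // startsWith compares literally, so regex metacharacters in prefix are harmless.
  if (!linkText.startsWith(prefix + "-")) {
    return false;
  }
  // Everything after "prefix-" must consist of digits only.
  return /^\d+$/.test(linkText.slice(prefix.length + 1));
}

console.log(endsWithDashNumber("text-1", "text"));    // true
console.log(endsWithDashNumber("text-12ab", "text")); // false
```

In a Protractor test it would be called on the resolved text inside the `browser.wait` callback, e.g. `return endsWithDashNumber(await $('a.collapsed').getText(), text);`.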

Regex for characters in specific location in string

Using Notepad++, how can I replace the dashes marked by the carets? The dashes I want to replace occur every 7th character in the string.
11.871-2-2.737-2.00334-2
      ^       ^       ^
123456781234567812345678
It's pretty simple since it's only dashes:
(\S*?)-
(     begin capture group
\S*   any number of non-space characters
?     ...matched lazily, until
)     end capture group
-     a hyphen, matched outside the group (so it is not kept)
Demo 1
var str = `11.871-2-2.737-2.00334-2`;
var sub = `$1`;
var rgx = /(\S*?)-/g;
var res = str.replace(rgx, sub);
console.log(res);
"There is a dash (right above 1) that I would like to preserve. This seems to get rid of all the dashes in the string"
The question clearly shows that there isn't a dash at the "1" position, but given the repeating pattern (every nth character) one could appear there. I don't have time to break it down, but I can refer you to a proper definition of the metacharacter \b.
Demo 2
var str = `-11.871-2-2.737-2.00334-2`;
var sub = `$1$2`;
var rgx = /\b[-]{1}(\S*?)-(\S*?)\b/g;
var res = str.replace(rgx, sub);
console.log(res);
Search for ([0-9\.-]{6,6})-
Replace with: $1MY_SEPARATOR
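Running the last answer's search/replace in JavaScript on the sample line is a quick way to check which dashes it touches. The sketch also shows a position-based alternative that is not from any answer: it uses a replacement callback, so it is for scripting rather than the Notepad++ dialog, and the period of 8 and the 0-based offset 6 are assumptions read off the ruler in the question.

```javascript
const str = "11.871-2-2.737-2.00334-2";

// 1) The answer's search/replace. ([0-9.-]{6}) is the same as ([0-9\.-]{6,6}):
//    exactly six digits, dots or dashes, followed by the dash to replace.
const byPattern = str.replace(/([0-9.-]{6})-/g, "$1MY_SEPARATOR");

// 2) Position-based alternative: replace a dash only when its 0-based offset
//    is 6, 14, 22, ..., i.e. the 7th character of each 8-character block.
const byPosition = str.replace(/-/g, (dash, offset) =>
  offset % 8 === 6 ? "MY_SEPARATOR" : dash
);

console.log(byPattern);  // 11.871MY_SEPARATOR2-2.737MY_SEPARATOR2.00334MY_SEPARATOR2
console.log(byPosition); // 11.871MY_SEPARATOR2-2.737MY_SEPARATOR2.00334MY_SEPARATOR2
```

Both keep the dash right after the first "2". The pattern-based version gets there because each field before a replaced dash happens to be six characters long, so it depends on the data having that shape.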

RegEX style for HTML code

Hey all, what would the regEX code be for the following:
<br/><span class=""synopsis-view-synopsis"">America's justice system comes under indictment in director <a href='/people/1035' class='actor' style='font-weight:bold'>Norman Jewison</a>'s trenchant film starring <a href='/people/1028' class='actor' style='font-weight:bold'>Al Pacino</a> as upstanding attorney Arthur Kirkland. A hard-line -- and tainted -- judge (<a href='/people/1034' class='actor' style='font-weight:bold'>John Forsythe</a>) stands accused of rape, and Kirkland (<a href='/people/1028' class='actor' style='font-weight:bold'>Al Pacino</a>) has to defend him. Kirkland has a history with the judge, who jailed one of the lawyer's clients on a technicality. When the judge confesses his guilt, Kirkland faces an ethical and legal quandary. </span>
I've tried this:
regex = New System.Text.RegularExpressions.Regex("(?<=""synopsis-view-synopsis""\>)([^<\/span><]+)")
But that only seems to get the first part of the description; Americ
Any help would be great! :o)
David
I don't see any need for lookaheads or lookbehinds here; just match the whole <span> element and use a capturing group to extract its content. Assuming there will never be any <span> elements inside the one you're matching, this should be all you need:
Regex rgx = new Regex(
@"<span\s+class=""synopsis-view-synopsis"">(.*?)</span>",
RegexOptions.IgnoreCase | RegexOptions.Singleline);
foreach (Match m in rgx.Matches(s0))
{
Console.WriteLine(m.Groups[1].Value);
}
Also, [^<\/span><]+ doesn't do what you probably think it does. What you've got there is a character class that matches any one character except <, /, s, p, a, n, or >. You may have been trying for this:
(?:(?!</span>).)+
...which matches one character at a time, after the lookahead confirms that the character isn't the beginning of the sequence </span>. It's a valid technique, but (as with the lookarounds) I don't think you need anything so fancy here.
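The difference between the question's character class, the tempered lookahead and the plain lazy match can be checked in JavaScript, whose regex syntax agrees with .NET for these constructs. This is a sketch on a shortened, hypothetical sample in the shape of the question's markup, with ordinary (not doubled) quotes around the class name.

```javascript
// Shortened sample in the shape of the question's markup.
const html =
  "<br/><span class=\"synopsis-view-synopsis\">America's justice system, " +
  "with <a href='/people/1035'>Norman Jewison</a>. </span>";

// Question's attempt: a class that excludes <, /, s, p, a, n and >.
const classAttempt = html.match(/synopsis">([^<\/span><]+)/)[1];

// Tempered token: take one character at a time while "</span>" does not start there.
const tempered = html.match(/synopsis">((?:(?!<\/span>).)+)/)[1];

// Lazy match of the whole element; the i and s flags mirror IgnoreCase and Singleline.
const lazy = html.match(/<span\s+class="synopsis-view-synopsis">(.*?)<\/span>/is)[1];

console.log(classAttempt);      // Americ
console.log(tempered === lazy); // true
```

The inner `<a>` tag does not stop either of the last two patterns, because neither excludes `<` on its own; only the sequence `</span>` ends the match.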
(?=""synopsis-view-synopsis""\>).+(?!<\/span>)
Should probably work. Try using an HTML parser instead!
In .NET there are different methods for a single match and for all matches:
re.Match(str);   // first match of regex 're' in string 'str'
re.Matches(str); // all matches of regex 're' in string 'str'
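Update:
Explanation of the regex:
(?<=regex) is a positive lookbehind
(?!regex) is a negative lookahead
.+ then matches everything after the lookbehind; because .+ is greedy it runs to the end of the line, so the trailing (?!</span>) never stops it early and </span> ends up in the match (as the output below shows)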
Raw Match Pattern:
(?<=""synopsis-view-synopsis""\>).+(?!</span>)
C#.NET Code Example:
using System;
using System.Text.RegularExpressions;
namespace myapp
{
class Class1
{
static void Main(string[] args)
{
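// Verbatim strings (@"..."): each "" is one " character, so """" matches the doubled quotes in the question's data.
String sourcestring =
@"<br/><span class=""""synopsis-view-synopsis"""">America's justice... </span>
<br/><span class=""""synopsis-view-synopsis"""">Canada's justice... </span>";
Regex re = new Regex(@"(?<=""""synopsis-view-synopsis""""\>).+(?!</span>)");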
MatchCollection mc = re.Matches(sourcestring);
int mIdx=0;
foreach (Match m in mc)
{
for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
{
Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
}
mIdx++;
}
}
}
}
Matches Found:
[0][0] = America's justice... </span>
[1][0] = Canada's justice... </span>
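The effect of the greedy .+ can be reproduced in JavaScript (lookbehind needs an ES2018 engine, such as a current Node.js). The second pattern is an alternative, not part of the answer: a lazy .+? followed by a positive lookahead stops right before </span>. The sample uses ordinary quotes instead of the doubled ones.

```javascript
const source =
  '<br/><span class="synopsis-view-synopsis">America\'s justice... </span>\n' +
  '<br/><span class="synopsis-view-synopsis">Canada\'s justice... </span>';

// As in the answer: greedy .+ runs to the end of each line, so the trailing
// (?!</span>) trims nothing and "</span>" stays in the match.
const greedy = source.match(/(?<="synopsis-view-synopsis">).+(?!<\/span>)/g);

// Alternative: lazy .+? plus a positive lookahead stops just before "</span>".
const trimmed = source.match(/(?<="synopsis-view-synopsis">).+?(?=<\/span>)/g);

console.log(greedy[0]);  // America's justice... </span>
console.log(trimmed[0]); // America's justice... (with a trailing space)
```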

Regular expression to match word pairs joined with colons

I don't know regular expressions at all. Can anybody help me with one very simple regular expression for
extracting 'word:word' from a sentence, e.g. "Java Tutorial Format:Pdf With Location:Tokyo Javascript"?
A little modification:
the first 'word' is from a list, but the second can be anything: "word1 in [ABC, FGR, HTY]".
Guys, the situation demands a little more
modification.
The matching form can be "word11:word12 word13 .. " up to the next "word21: ... ".
Things are getting more complex by the second... I have to learn regex :(
Thanks in advance.
You can use the regex:
\w+:\w+
Explanation:
\w - a single character that is a letter (uppercase or lowercase), a digit or an underscore _.
\w+ - one or more of the above characters, basically a word
so \w+:\w+
matches a pair of words separated by a colon.
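A quick JavaScript check of \w+:\w+ on the question's sentence, as a sketch; with the `g` flag, `String.prototype.match` returns every match as an array (or `null` when nothing matches).

```javascript
const text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript";

// With the g flag, match() returns an array of all matches (or null if none).
const pairs = text.match(/\w+:\w+/g);

console.log(pairs.join(", ")); // Format:Pdf, Location:Tokyo
```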
Try \b(\S+?):(\S+?)\b. Group 1 will capture "Format" and group 2, "Pdf".
A working example:
<html>
<head>
<script type="text/javascript">
function test() {
var re = /\b(\S+?):(\S+?)\b/g; // without 'g' matches only the first
var text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript";
var match = null;
while ( (match = re.exec(text)) != null) {
alert(match[1] + " -- " + match[2]);
}
}
</script>
</head>
<body onload="test();">
</body>
</html>
A good reference for regexes is https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp
Use this snippet:
<?php
$str=" this is pavun:kumar hello world bk:systesm" ;
if ( preg_match_all ( '/(\w+\:\w+)/',$str ,$val ) )
{
print_r ( $val ) ;
}
else
{
print "Not matched \n";
}
Continuing Jaú's function with your additional requirement:
function test() {
var words = ['Format', 'Location', 'Size'],
text = "Java Tutorial Format:Pdf With Location:Tokyo Language:Javascript",
match = null;
var re = new RegExp( '(' + words.join('|') + '):(\\w+)', 'g');
while ( (match = re.exec(text)) != null) {
alert(match[1] + " = " + match[2]);
}
}
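None of the answers covers the later modification, where a value can run over several words until the next listed key. One way to do it is with a lookahead. This is a hedged sketch: `extractPairs` is a hypothetical name, and the keys are assumed to be plain words without regex metacharacters.

```javascript
// Hypothetical helper: collect "key:value ..." pairs where the key comes from a
// fixed list and the value runs until the next listed key (or the end of text).
// Assumes the keys contain no regex metacharacters.
function extractPairs(text, keys) {
  const alt = keys.join("|");
  // (.*?) is lazy, so it stops at the first point where whitespace plus
  // another listed key and ":" follows, or at the end of the input.
  const re = new RegExp("\\b(" + alt + "):(.*?)(?=\\s+(?:" + alt + "):|$)", "g");
  const pairs = {};
  for (const m of text.matchAll(re)) {
    pairs[m[1]] = m[2];
  }
  return pairs;
}

const result = extractPairs(
  "Java Tutorial Format:Pdf With Location:Tokyo Javascript",
  ["Format", "Location", "Size"]
);
console.log(JSON.stringify(result)); // {"Format":"Pdf With","Location":"Tokyo Javascript"}
```

Because `.` does not match line breaks and `$` is the end of the whole input here, this assumes the text is a single line.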
I am currently solving this problem in my Node.js app and found the following, which I think is suitable for colon-paired words:
([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))
It also matches quoted values, like a:"b" c:'d e' f:g
Example code in ES6:
const regex = /([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))/g;
const str = `category:"live casino" gsp:S1aik-UBnl aa:"b" c:'d e' f:g`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Example code in PHP:
<?php
$re = '/([\w]+:)("(([^"])*)"|\'(([^\'])*)\'|(([^\s])*))/';
$str = 'category:"live casino" gsp:S1aik-UBnl aa:"b" c:\'d e\' f:g';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
You can check and test your regexes with this online tool: https://regex101.com
By the way, unless regex101.com has deleted it, you can browse that example code here
Here's the non-regex way, in your favourite language: split on whitespace, go through the elements, check for ":", and print them if found. E.g. in Python:
>>> s="Java Tutorial Format:Pdf With Location:Tokyo Javascript"
>>> for i in s.split():
...     if ":" in i:
...         print(i)
...
Format:Pdf
Location:Tokyo
You can do further checks to make sure it's really "someword:someword" by splitting again on ":" and checking that there are 2 elements in the resulting list, e.g.:
>>> for i in s.split():
...     if ":" in i:
...         a = i.split(":")
...         if len(a) == 2:
...             print(i)
...
Format:Pdf
Location:Tokyo
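The same split-based approach in JavaScript, as a sketch; like the Python version, it keeps every whitespace-separated token that splits into exactly two parts on ":".

```javascript
const s = "Java Tutorial Format:Pdf With Location:Tokyo Javascript";

// Split on runs of whitespace, then keep tokens that split into exactly two parts on ":".
const pairs = s.split(/\s+/).filter((token) => token.split(":").length === 2);

console.log(pairs.join("\n")); // Format:Pdf and Location:Tokyo, one per line
```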
([^:]+):(.+)
Meaning: (everything except : one or more times), :, (any character one or more times)
You'll find good manuals on the net... Maybe it's time for you to learn...
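Because [^:]+ also matches spaces, on a whole sentence the first group of ([^:]+):(.+) starts at the very beginning of the text and .+ runs to the end. A quick JavaScript check, as a sketch, contrasting a single token with the question's full sentence:

```javascript
const re = /([^:]+):(.+)/;

// On a single "key:value" token the groups are what you'd expect...
const token = "Format:Pdf".match(re);
console.log(token[1], token[2]); // Format Pdf

// ...but on the whole sentence [^:]+ starts at the first character
// and .+ swallows the rest of the line, including later pairs.
const sentence = "Java Tutorial Format:Pdf With Location:Tokyo Javascript".match(re);
console.log(sentence[1]); // Java Tutorial Format
console.log(sentence[2]); // Pdf With Location:Tokyo Javascript
```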