How to use TokenSequencePattern - regex

I'm just getting started with CoreNLP's TokenSequencePattern and I can't get simple matches to work. All I'm trying to do is match a token from the input text. The code below executes without errors but doesn't match anything. However, if you change the match expression to [] then it matches the two sentences.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, parse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation("This is sent 1. And here is sent 2");
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
TokenSequencePattern pattern = TokenSequencePattern.compile(env,"[ { word:\"sent\" } ]");
TokenSequenceMatcher matcher = pattern.getMatcher(sentences);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Thank you!

// Match against the document's tokens rather than its sentences.
List<CoreLabel> tokens = document.get(CoreAnnotations.TokensAnnotation.class);
TokenSequencePattern pattern = TokenSequencePattern.compile("[ { word:\"sent\" } ]");
TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
while (matcher.find()) {
    String matchedString = matcher.group();
    List<CoreMap> matchedTokens = matcher.groupNodes();
    System.out.println(matchedString + " " + matchedTokens);
}

Target child pages with regex

I need a regex that targets all child pages of a certain group of parent pages, but NOT the parent pages themselves.
To be more specific, I need an expression which targets:
/categoryA/XXX
/categoryB/YYY
/categoryC/ZZZ
But I do not want to include
/categoryA/
/categoryB/
/categoryC/
All help much appreciated!
Gustav
Try this one:
\/(\w+)\/([a-zA-Z]+)
I am assuming that the strings after the categories use letters only.
Input:
/categoryA/XXX
/categoryB/YYY
/categoryC/ZZZ
/categoryA/
/categoryB/
/categoryC/
Matches:
/categoryA/XXX
/categoryB/YYY
/categoryC/ZZZ
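As a quick check, the pattern above can be run against the sample input. This is only a minimal Python sketch: the helper name child_pages is made up for illustration, and re.fullmatch is used on the assumption that each path is tested on its own, so a path counts only when the whole string matches.

```python
import re

# Pattern from the answer above: a category segment followed by a
# letters-only child segment.
CHILD_PAGE = re.compile(r"/(\w+)/([a-zA-Z]+)")


def child_pages(paths):
    """Return only the paths that have a non-empty child segment."""
    return [p for p in paths if CHILD_PAGE.fullmatch(p)]


paths = [
    "/categoryA/XXX", "/categoryB/YYY", "/categoryC/ZZZ",
    "/categoryA/", "/categoryB/", "/categoryC/",
]
print(child_pages(paths))
```

The parent pages drop out because [a-zA-Z]+ needs at least one letter after the second slash.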
This one
([^\/]+$)
targets everything after the last slash
You could use this in an if() statement to filter out what you need, if I understand the question correctly.
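A rough Python sketch of that if-style filter, assuming each path is checked separately. The function name child_name is hypothetical, and note that this pattern only looks at what follows the last slash; it does not check which parent category the page belongs to.

```python
import re

# Everything after the last slash, as in the pattern above.
LAST_SEGMENT = re.compile(r"[^/]+$")


def child_name(path):
    """Return the text after the last slash, or None if there is none."""
    m = LAST_SEGMENT.search(path)
    return m.group(0) if m else None


for path in ["/categoryA/XXX", "/categoryA/"]:
    name = child_name(path)
    if name is not None:
        print(path, "is a child page:", name)
    else:
        print(path, "is a parent page")
```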
Or this one:
\/category[A-Z]\/(.*)
In C#
childpage = Regex.Match(target, "/category[A-Z]/(.*)").Groups[1].Value;
In JavaScript
var myregexp = /\/category[A-Z]\/(.*)/;
var match = myregexp.exec(target);
if (match != null) {
    childpage = match[1];
} else {
    childpage = "";
}
In PHP
if (preg_match('%/category[A-Z]/(.*)%', $target, $groups)) {
    $childpage = $groups[1];
} else {
    $childpage = "";
}
In PowerShell
if ($target -match '/category[A-Z]/(.*)') {
    $childpage = $matches[1]
} else {
    $childpage = ''
}
In Python
import re

match = re.search("/category[A-Z]/(.*)", target)
if match:
    childpage = match.group(1)
else:
    childpage = ""

Typescript regex exclude whole string if followed by specific string

I've been running into weird issues with regex and TypeScript. I'm trying to have my expression replace the value test, except for an instance that is followed by another test. In other words, replace test on the first two lines below, but on the third line replace only the second test.
[test]
[test].[db]
[test].[test]
Where it should look like:
[newvalue]
[newvalue].[db]
[test].[newvalue]
I've come up with lots of variations but this is the one that I thought was simple enough to solve it and regex101 can confirm this works:
\[(\w+)\](?!\.\[test\])
But when using TypeScript (in a custom task in a VSTS build), it actually replaces the values like this:
[newvalue]
[newvalue].[db]
[newvalue].[test]
Update: A regex like (test)(?!.test) also breaks when I change the test cases to remove the square brackets, which makes me think the problem is somewhere in the code. Could the problem be with the index at which the value is replaced?
Some of the TypeScript code that calls this:
var filePattern = tl.getInput("filePattern", true);
var tokenRegex = tl.getInput("tokenRegex", true);
for (var i = 0; i < files.length; i++) {
    var file = files[i];
    console.info(`Starting regex replacement in [${file}]`);
    var contents = fs.readFileSync(file).toString();
    var reg = new RegExp(tokenRegex, "g");
    // loop through each match
    var match: RegExpExecArray;
    // keep a separate var for the contents so that the regex index doesn't get messed up
    // by replacing items underneath it
    var newContents = contents;
    while ((match = reg.exec(contents)) !== null) {
        var vName = match[1];
        // find the variable value in the environment
        var vValue = tl.getVariable(vName);
        if (typeof vValue === 'undefined') {
            tl.warning(`Token [${vName}] does not have an environment value`);
        } else {
            newContents = newContents.replace(match[0], vValue);
            console.info(`Replaced token [${vName }]`);
        }
    }
}
The full code for the task I'm using this with: https://github.com/colindembovsky/cols-agent-tasks/blob/master/Tasks/ReplaceTokens/replaceTokens.ts
For me, this regex works the way you expect:
\[(test)\](?!\.\[test\])
with TypeScript code like this:
myString.replace(/\[(test)\](?!\.\[test\])/g, "[newvalue]");
The regex you are using, by contrast, would also replace the [db] part.
I've tried with this code:
class Greeter {
    myString1: string;
    myString2: string;
    myString3: string;
    greeting: string;
    constructor(str1: string, str2: string, str3: string) {
        this.myString1 = str1.replace(/\[(test)\](?!\.\[test\])/g, "[newvalue]");
        this.myString2 = str2.replace(/\[(test)\](?!\.\[test\])/g, "[newvalue]");
        this.myString3 = str3.replace(/\[(test)\](?!\.\[test\])/g, "[newvalue]");
        this.greeting = this.myString1 + "\n" + this.myString2 + "\n" + this.myString3;
    }
    greet() {
        return "Hello, these are your replacements:\n" + this.greeting;
    }
}
let greeter = new Greeter("[test]", "[test].[db]", "[test].[test]");
let button = document.createElement('button');
button.textContent = "Say Hello";
button.onclick = function() {
    alert(greeter.greet());
}
document.body.appendChild(button);
Online playground here.
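One possible explanation, in line with the asker's own suspicion about the index: the loop in the question finds each token with the regex but then replaces it by text. In JavaScript, replace() called with a plain string as its first argument replaces only the first occurrence of that string, wherever it sits, so the match position found by the regex is ignored. The sketch below reproduces that in Python and compares it with a substitution done at the matched position. It is an illustration only: the values dictionary is a hypothetical stand-in for tl.getVariable, and both function names are made up.

```python
import re

# The regex from the question.
TOKEN = re.compile(r"\[(\w+)\](?!\.\[test\])")


def replace_first_occurrence(contents, values):
    """Mimic the loop in the question: find by regex, replace by text."""
    new_contents = contents
    for m in TOKEN.finditer(contents):
        value = values.get(m.group(1))
        if value is not None:
            # Replaces the first remaining occurrence of the matched text,
            # which is not necessarily the occurrence the regex matched.
            new_contents = new_contents.replace(m.group(0), value, 1)
    return new_contents


def replace_at_match(contents, values):
    """Substitute each token at the position where the regex matched it."""
    def substitute(m):
        # Tokens without a value are left untouched.
        return values.get(m.group(1), m.group(0))
    return TOKEN.sub(substitute, contents)


contents = "[test]\n[test].[db]\n[test].[test]"
values = {"test": "[newvalue]"}  # hypothetical stand-in for tl.getVariable

print(replace_first_occurrence(contents, values))
print(replace_at_match(contents, values))
```

With this input the first function turns the last line into [newvalue].[test], as reported in the question, while the second leaves it as [test].[newvalue].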

How to remove asterisk from this spin syntax code?

Here is my code. It is a text (synonym) spinner:
public function fetchContent($keyword)
{
    $customContent = $this->getOption('custom_content_text');
    $this->_setHttpStatusCode(200);
    if (!$customContent)
    {
        $this->_setContentStatus(self::CONTENT_STATUS_NO_RESULTS);
        return false;
    }
    if (preg_match_all('/({\*)(.*?)(\*})/', $customContent, $result))
    {
        if (is_array($result[0]))
        {
            foreach ($result[0] as $index => $group_string)
            {
                //replace the first or next pattern match with a replaceable token
                $customContent = preg_replace('/(\{\*)(.*?)(\*\})/', '{#'.$index.'#}', $customContent, 1);
                $words = explode('|', $result[2][$index]);
                //clean and trim all words
                $finalPhrase = array();
                foreach ($words as $word)
                {
                    if (preg_match('/\S/', $word))
                    {
                        $word = preg_replace('/{%keyword%}/i', $keyword, $word);
                        $finalPhrase[] = trim($word);
                    }
                }
                $finalPhrase = $finalPhrase[rand(0, count($finalPhrase) - 1)];
                //now inject it back to where the token was
                $customContent = str_ireplace('{#' . $index . '#}', $finalPhrase, $customContent);
            }
            $this->_setContentStatus(self::CONTENT_STATUS_PASSED);
        }
    }
    return $customContent;
}
}
The regex expects spin groups delimited like this:
{*spin1|spin2|spin3*}
Here are the regexes from the snippet above:
if (preg_match_all('/({\*)(.*?)(\*})/', $customContent, $result))
$customContent = preg_replace('/(\{\*)(.*?)(\*\})/', '{#'.$index.'#}', $customContent, 1);
I would like to drop the * so that the format is just {spin1|spin2|spin3}, which is compatible with most spinners.
I tried some regexes that I found online, and I tried removing the * from both regexes, without success.
Thank you very much for your help.
Remove \* instead of just * – Lucas Trzesniewski
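To illustrate the asterisk-free format, here is a minimal Python sketch of a spinner that accepts {spin1|spin2|spin3}. It is not a port of the PHP above: the function name spin and its choose parameter are made up for the example, and the pattern uses [^{}]* instead of .*? so that a group cannot run across other braces. Note that a {%keyword%} placeholder would look like a one-option group to this pattern, so the sketch assumes any such placeholder is substituted before spinning.

```python
import random
import re

# Spin groups without asterisks: {spin1|spin2|spin3}.
SPIN_GROUP = re.compile(r"\{([^{}]*)\}")


def spin(text, choose=random.choice):
    """Replace every {a|b|c} group with one of its non-blank options."""
    def pick(m):
        # Trim the options and drop blank ones, as the PHP code does.
        options = [w.strip() for w in m.group(1).split("|") if w.strip()]
        # Leave the group untouched if it has no usable option.
        return choose(options) if options else m.group(0)
    return SPIN_GROUP.sub(pick, text)


print(spin("A {quick|fast|speedy} {fox|dog}."))
```

Passing a deterministic choose function (for example one that always returns the first option) makes the output predictable for testing.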

c# and regular expression

I want to get 100 and example from this string:
?connect:100/username:example/
I searched Google but could not find a regex pattern that works for my case.
Please help.
try {
    Regex RegexObj = new Regex(":(?<Number>\\d+)/.+?:(?<Text>\\w+)/");
    Match MatchResults = RegexObj.Match(SubjectString);
    while (MatchResults.Success) {
        for (int i = 1; i < MatchResults.Groups.Count; i++) {
            Group GroupObj = MatchResults.Groups[i];
            if (GroupObj.Success) {
                // matched text: GroupObj.Value
            }
        }
        MatchResults = MatchResults.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
This is the regex:
\?connect:([0-9]+)/username:([^/]*)/
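For illustration, the regex above can be applied like this. The sketch is in Python rather than C#, and the function name parse_connect is hypothetical; it returns the number and the username as a pair, or None when the string does not have the expected shape.

```python
import re

# The regex from the answer above.
CONNECT = re.compile(r"\?connect:([0-9]+)/username:([^/]*)/")


def parse_connect(s):
    """Return (number, username), or None if the string does not match."""
    m = CONNECT.search(s)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


print(parse_connect("?connect:100/username:example/"))
```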
You don't need to use a regex for this, use Linq:
var url = "?connect:100/username:example/";
var data = url.Substring(1, url.Length-2).Split('/')
    .Select(x => x.Split(':'))
    .ToDictionary(x => x[0], x => x[1]);
Console.WriteLine(data["connect"]); // 100
Console.WriteLine(data["username"]); // example
You could remove the Substring(1, url.Length-2) call if you received the string without the leading ? and trailing /.

GWT Regex and empty string

Could someone explain why this snippet:
// import com.google.gwt.regexp.shared.MatchResult;
// import com.google.gwt.regexp.shared.RegExp;
RegExp regExp = RegExp.compile("^$");
MatchResult matcher;
while ((matcher = regExp.exec("")) != null)
{
    System.out.println("match " + matcher);
}
gives an enormous number of matches? I tested the different modifiers allowed by GWT's implementation of compile(): g, i and m. It works only with m (multiline).
I just want to check for an empty string.
[EDIT] the new method
private ArrayList<MatchResult> getMatches(String input, String pattern)
{
    ArrayList<MatchResult> matches = new ArrayList<MatchResult>();
    if(null == regExp)
    {
        regExp = RegExp.compile(pattern, "g");
    }
    if(input.isEmpty())
    {
        // empty string: just check whether the pattern matches and
        // don't try to extract matches, as that would result in an
        // infinite loop.
        if(regExp.test(input))
        {
            matches.add(new MatchResult(0, "", new ArrayList<String>(0)));
        }
    }
    else
    {
        for(MatchResult matcher = regExp.exec(input); matcher != null; matcher = regExp
                .exec(input))
        {
            matches.add(matcher);
        }
    }
    return matches;
}
Your regExp.exec("") with RegExp.compile("^$") will never return null, as the empty string "" is a match for regex ^$, which reads "nothing between beginning and the end of line/string".
So your while loop never terminates.
Also, your print statement is
System.out.println("match " + matcher);
...but you probably wanted to use
System.out.println("match " + matcher.getGroup(0));
Also see GWT checking if textbox is empty.
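Since the goal is only to check for an empty string, a single test is enough and no loop over matches is needed. The sketch below shows the same idea in Python rather than GWT, so it is an analogy and not the GWT API; the function name is_empty is made up. Unlike the exec loop in the question, Python's finditer moves past an empty match by itself, so iterating terminates.

```python
import re

# Same pattern as in the question.
EMPTY = re.compile(r"^$")


def is_empty(s):
    """One test against the whole string; no loop over matches."""
    return EMPTY.fullmatch(s) is not None


print(is_empty(""))
print(is_empty("text"))
# Iterating over the matches of ^$ in an empty string ends after one match.
print(len(list(EMPTY.finditer(""))))
```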