Using $groups to define TokensRegex rules - regex

I am trying to use TokensRegex to match patterns in my data, but I keep getting errors from the expression I am using. What is the right format to match a $group followed by numbers? For example, my data may contain JIRA ticket numbers such as JSON-123 or SBP-32. I would also like to extract keywords associated with each ticket, e.g. authentication failure or NullPointerException. What tool can I use together with TokensRegex to extract these keywords as well? I looked at bootstrapped learning but am having a hard time implementing it. Any help would be greatly appreciated.
List<CoreMap> sentences = annotation.get(SentencesAnnotation.class);
List<CoreLabel> tokens = new ArrayList<CoreLabel>();
for (CoreMap sentence : sentences) {
// **using TokensRegex**
for (CoreLabel token : sentence.get(TokensAnnotation.class))
tokens.add(token);
String $PROJECTID = "/JSON|JPA|SBP/";
try {
TokenSequencePattern p1 = TokenSequencePattern
.compile('('+$PROJECTID+'\\-\\d+)');
TokenSequenceMatcher matcher = p1.getMatcher(tokens);
while (matcher.find()) {
System.out.println(matcher);
matcheData.append(matcher);
}
} catch (Exception e) {
e.printStackTrace();
}


Custom validator to ban a specific wordlist

I need a custom validator to ban a specific list of banned words from a textarea field.
I know it is not sound practice to let the user type part of a query, but this is exactly the implementation I need.
I tried a RegExp, but it behaves strangely.
My RegExp
/(drop|update|truncate|delete|;|alter|insert)+./gi
my Validator
export function forbiddenWordsValidator(sqlRe: RegExp): ValidatorFn {
return (control: AbstractControl): { [key: string]: any } | null => {
const forbidden = sqlRe.test(control.value);
return forbidden ? { forbiddenSql: { value: control.value } } : null;
};
}
my formControl:
whereCondition: new FormControl("", [
Validators.required,
forbiddenWordsValidator(this.BAN_SQL_KEYWORDS)...
It works only in certain cases, and I don't understand why the same string passes one time and fails if I delete a character and retype it, or why the validator sometimes returns OK when I type a whitespace.
There are two issues here:
The global g modifier makes RegExp#test stateful: after a successful match the regex's lastIndex advances, so the next call starts searching from that position and the same string can alternately pass and fail. It must be removed.
The . at the end requires one more character (any character other than a line break) after the keyword, so a keyword at the very end of the input is not matched. It must be removed too.
Use
/drop|update|truncate|delete|;|alter|insert/i
Or, to match the words as whole words use
/\b(?:drop|update|truncate|delete|alter|insert)\b|;/i
This way, insert in insertion and drop in dropout won't get "caught" (=matched).
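As a rough illustration of the whole-word pattern above, here is a minimal Python sketch. The function name contains_banned_word is hypothetical, and re.IGNORECASE stands in for the /i flag. Python pattern objects keep no lastIndex-style state between calls, so this only illustrates the word-boundary behaviour, not the g-flag problem.

```python
import re

# Python analogue of the whole-word pattern; re.IGNORECASE plays the role of /i.
BANNED = re.compile(
    r"\b(?:drop|update|truncate|delete|alter|insert)\b|;",
    re.IGNORECASE,
)

def contains_banned_word(text: str) -> bool:
    """Return True if text has a banned keyword as a whole word, or a semicolon."""
    return BANNED.search(text) is not None

# "insertion" and "dropout" only contain the keywords as substrings.
print(contains_banned_word("insertion into the dropout list"))
# A standalone keyword (any letter case) or a semicolon is caught.
print(contains_banned_word("id = 1; DROP TABLE users"))
```

Calling the function repeatedly with the same input gives the same answer every time, which is the behaviour the validator needs.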
It's not a great idea to give such power to the user.

Spring-Boot @RequestMapping and @PathVariable with regular expression matching

I'm attempting to use WebJars-Locator with a Spring-Boot application to map JAR resources. As per their website, I created a RequestMapping like this:
@ResponseBody
@RequestMapping(method = RequestMethod.GET, value = "/webjars-locator/{webjar}/{partialPath:.+}")
public ResponseEntity<ClassPathResource> locateWebjarAsset(@PathVariable String webjar, @PathVariable String partialPath)
{
The problem with this is that the partialPath variable is supposed to include anything after the third slash. What it ends up doing, however, is limiting the mapping itself. This URI is mapped correctly:
http://localhost/webjars-locator/angular-bootstrap-datetimepicker/datetimepicker.js
But this one is not mapped to the handler at all and simply returns a 404:
http://localhost/webjars-locator/datatables-plugins/integration/bootstrap/3/dataTables.bootstrap.css
The only real difference is the number of path components. The regular expression (".+") should handle them, but it does not appear to match when that portion contains slashes.
If it helps, this is provided in the logs:
2015-03-03 23:03:53.588 INFO 15324 --- [ main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/webjars-locator/{webjar}/{partialPath:.+}],methods=[GET],params=[],headers=[],consumes=[],produces=[],custom=[]}" onto public org.springframework.http.ResponseEntity app.controllers.WebJarsLocatorController.locateWebjarAsset(java.lang.String,java.lang.String)
Is there some type of hidden setting in Spring-Boot to enable regular expression pattern matching on RequestMappings?
The original code in the docs wasn't prepared for the extra slashes, sorry for that!
Please try this code instead:
@ResponseBody
@RequestMapping(value="/webjarslocator/{webjar}/**", method=RequestMethod.GET)
public ResponseEntity<Resource> locateWebjarAsset(@PathVariable String webjar,
WebRequest request) {
try {
String mvcPrefix = "/webjarslocator/" + webjar + "/";
String mvcPath = (String) request.getAttribute(
HandlerMapping.PATH_WITHIN_HANDLER_MAPPING_ATTRIBUTE, RequestAttributes.SCOPE_REQUEST);
String fullPath = assetLocator.getFullPath(webjar,
mvcPath.substring(mvcPrefix.length()));
ClassPathResource res = new ClassPathResource(fullPath);
long lastModified = res.lastModified();
if ((lastModified > 0) && request.checkNotModified(lastModified)) {
return null;
}
return new ResponseEntity<Resource>(res, HttpStatus.OK);
} catch (Exception e) {
return new ResponseEntity<>(HttpStatus.NOT_FOUND);
}
}
I will also provide an update for webjar docs shortly.
Updated 2015/08/05: Added If-Modified-Since handling
It appears that you cannot have a PathVariable to match "the remaining part of the url". You have to use ant-style path patterns, i.e. "**" as described here:
Spring 3 RequestMapping: Get path value
You can then get the entire URL of the request object and extract the "remaining part".
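Extracting the "remaining part" boils down to stripping the mapped prefix and splitting once on the first slash. The following Python sketch shows only that string logic; it does not use any Spring API, and the function name split_webjar_path and the default prefix are assumptions for illustration.

```python
from typing import Tuple

def split_webjar_path(path: str, prefix: str = "/webjarslocator/") -> Tuple[str, str]:
    """Split a request path into (webjar, remaining path after the webjar name)."""
    if not path.startswith(prefix):
        raise ValueError("path is not under " + prefix)
    # partition splits on the first "/" only, so later slashes stay in the remainder.
    webjar, sep, partial_path = path[len(prefix):].partition("/")
    if not sep or not partial_path:
        raise ValueError("no asset path after the webjar name")
    return webjar, partial_path

webjar, partial = split_webjar_path(
    "/webjarslocator/datatables-plugins/integration/bootstrap/3/dataTables.bootstrap.css"
)
print(webjar)
print(partial)
```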

Regular expressions and Selenium WebDriver xpath

How can I fix this code to work?
public void check(WebDriver driver) {
driver.findElement(By.xpath("//a[matches(@href,'/staff/transcript/\\d{5}//.pdf')]")).click();
}
I must find a link in which a 5-digit identifier varies.
1. Get the href attribute of the link;
2. parse that string to get the 5-digit identifier;
3. use that identifier to construct your locator and click.
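// XPath 1.0, which browsers implement, has no matches() function, so contains() is used throughout.
String href = driver.findElement(By.xpath("//a[contains(@href,'/staff/transcript/')][contains(@href,'.pdf')]")).getAttribute("href");
// getAttribute("href") may return an absolute URL, so use lastIndexOf(".") to skip dots in the host name.
String identifier = href.substring(href.lastIndexOf("/") + 1, href.lastIndexOf("."));
driver.findElement(By.xpath("//a[contains(@href,'/staff/transcript/" + identifier + ".pdf')]")).click();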
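The parsing step above can also be done with a regular expression once the href string is in hand. This is a minimal Python sketch of that step only (no Selenium calls); the function name transcript_id and the example URL are made up for illustration.

```python
import re
from typing import Optional

# Exactly five digits between "/staff/transcript/" and a trailing ".pdf".
TRANSCRIPT_HREF = re.compile(r"/staff/transcript/(\d{5})\.pdf$")

def transcript_id(href: str) -> Optional[str]:
    """Return the 5-digit identifier from a transcript link, or None if absent."""
    match = TRANSCRIPT_HREF.search(href)
    return match.group(1) if match else None

print(transcript_id("http://example.com/staff/transcript/12345.pdf"))
print(transcript_id("http://example.com/staff/transcript/1234.pdf"))
```

Because the pattern is anchored on the path segment rather than on dot positions, dots in the host name do not affect the result.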
One possible solution to your problem:
Using JavaScript, iterate through all the a tags and find the first one whose text matches your regex.
public String getLocatorByRegExp() {
JavascriptExecutor js = (JavascriptExecutor) driver;
StringBuilder stringBuilder = new StringBuilder();
// The backslash is doubled so that the JavaScript source receives \d.
stringBuilder.append("var regex = /^\\d{5}$/;");
stringBuilder.append("var x = document.getElementsByTagName('a');");
// textContent is the DOM property that holds an element's text; returns null if nothing matches.
stringBuilder.append("for (var t = 0; t < x.length; t++) { if (regex.test(x[t].textContent.trim())) return x[t].textContent.trim(); } return null;");
String res = (String) js.executeScript(stringBuilder.toString());
return res;
}
String properLinkText = getLocatorByRegExp();
driver.findElement(By.xpath("//a[contains(text(),'" + properLinkText + "')]")).click();
This is quite a complicated approach, and there is probably a simpler solution.
Is the 5-digit identifier unique on the page (that is, does only one element contain it)?
If so, it is easy to write a CSS locator or XPath for that element.
Please provide a piece of your HTML and point out the element you need to click.

Pulling multiple values from JSON response using RegEx Extractor

I'm testing a web service that returns JSON responses and I'd like to pull multiple values from the response. A typical response would contain multiple values in a list. For example:
{
"name":"#favorites",
"description":"Collection of my favorite places",
"list_id":4894636,
}
A response would contain many sections like the above example.
What I'd like to do in Jmeter is go through the JSON response and pull each section outlined above in a manner that I can tie the returned name and description as one entry to iterate over.
What I've been able to do thus far is return the name value with regular expression extractor ("name":"(.+?)") using the template $1$. I'd like to pull both name and description but can't seem to get it to work. I've tried using a regex "name":"(.+?)","description":"(.+?)" with a template of $1$$2$ without any success.
Does anyone know how I might pull multiple values using regex in this example?
You can add (?s) to the regex so that . also matches line breaks.
E.g: (?s)"name":"(.+?)","description":"(.+?)"
It works for me on assertions.
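To see how such a pattern pairs names with descriptions, here is a small Python sketch run against a made-up two-section response. The \s* between the comma and "description" is an addition of mine, on the assumption that the response has a line break there; it is not part of the expression quoted above. Whether JMeter's extractor behaves identically is not verified here.

```python
import re

# Hypothetical response body with one key per line, as in the question's example.
response = '''[
{
"name":"#favorites",
"description":"Collection of my favorite places",
"list_id":4894636
},
{
"name":"#AnotherThing",
"description":"Something to fill space",
"list_id":48265
}
]'''

# (?s) lets . match line breaks; \s* (an assumption) absorbs the break after the comma.
PAIR = re.compile(r'(?s)"name":"(.+?)",\s*"description":"(.+?)"')

pairs = PAIR.findall(response)
for name, description in pairs:
    print(name, "->", description)
```

Each tuple in pairs holds one name together with its description, which is the "one entry to iterate over" the question asks for.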
It may be worth using BeanShell scripting to process the JSON response.
So if you need to get ALL the "name/description" pairs from response (for each section) you can do the following:
1. extract all the "name/description" pairs from response in loop;
2. save extracted pairs in csv-file in handy format;
3. read saved pairs from csv-file later in code - using CSV Data Set Config in loop, e.g.
JSON response processing can be implemented using BeanShell scripting (~ java) + any json-processing library (e.g. json-rpc-1.0):
- either in BeanShell Sampler or in BeanShell PostProcessor;
- all the required BeanShell libs are currently provided in the default JMeter delivery;
- to use json-processing library place jar into JMETER_HOME/lib folder.
Schematically it will look like:
in case of BeanShell PostProcessor:
Thread Group
. . .
YOUR HTTP Request
BeanShell PostProcessor // added as child
. . .
in case of BeanShell Sampler:
Thread Group
. . .
YOUR HTTP Request
BeanShell Sampler // added separate sampler - after your
. . .
In this case it makes no difference which one you use.
You can either put the code itself into the sampler body ("Script" field) or store in external file, as shown below.
Sampler code:
import java.io.*;
import java.util.*;
import org.json.*;
import org.apache.jmeter.samplers.SampleResult;
ArrayList nodeRefs = new ArrayList();
ArrayList fileNames = new ArrayList();
String extractedList = "extracted.csv";
StringBuilder contents = new StringBuilder();
try
{
if (ctx.getPreviousResult().getResponseDataAsString().equals("")) {
Failure = true;
FailureMessage = "ERROR: Response is EMPTY.";
throw new Exception("ERROR: Response is EMPTY.");
} else {
if ((ResponseCode != null) && (ResponseCode.equals("200") == true)) {
SampleResult result = ctx.getPreviousResult();
JSONObject response = new JSONObject(result.getResponseDataAsString());
FileOutputStream fos = new FileOutputStream(System.getProperty("user.dir") + File.separator + extractedList);
if (response.has("items")) {
JSONArray items = response.getJSONArray("items");
if (items.length() != 0) {
for (int i = 0; i < items.length(); i++) {
String name = items.getJSONObject(i).getString("name");
String description = items.getJSONObject(i).getString("description");
int list_id = items.getJSONObject(i).getInt("list_id");
if (i != 0) {
contents.append("\n");
}
contents.append(name).append(",").append(description).append(",").append(list_id);
System.out.println("\t " + name + "\t\t" + description + "\t\t" + list_id);
}
}
}
byte [] buffer = contents.toString().getBytes();
fos.write(buffer);
fos.close();
} else {
Failure = true;
FailureMessage = "Failed to extract from JSON response.";
}
}
}
catch (Exception ex) {
IsSuccess = false;
log.error(ex.getMessage());
System.err.println(ex.getMessage());
}
catch (Throwable thex) {
System.err.println(thex.getMessage());
}
As well a set of links on this:
JSON in JMeter
Processing JSON Responses with JMeter and the BSF Post Processor
Upd. on 08.2017:
JMeter now has a set of built-in components (merged from third-party projects) to handle JSON without scripting:
JSON Path Extractor (contributed from ATLANTBH jmeter-components project);
JSON Extractor (contributed from UBIK Load Pack since JMeter 3.0) - see answer below.
I am assuming that JMeter uses Java-based regular expressions... This could mean no named capturing groups. Apparently, Java7 now supports them, but that doesn't necessarily mean JMeter would. For JSON that looks like this:
{
"name":"#favorites",
"description":"Collection of my favorite places",
"list_id":4894636,
}
{
"name":"#AnotherThing",
"description":"Something to fill space",
"list_id":0048265,
}
{
"name":"#SomethingElse",
"description":"Something else as an example",
"list_id":9283641,
}
...this expression:
\{\s*"name":"((?:\\"|[^"])*)",\s*"description":"((?:\\"|[^"])*)",(?:\\}|[^}])*}
...should match 3 times, capturing the "name" value into the first capturing group, and the "description" into the second capturing group, similar to the following:
1               2
--------------  --------------------------------
#favorites      Collection of my favorite places
#AnotherThing   Something to fill space
#SomethingElse  Something else as an example
Importantly, this expression supports quote escaping in the value portion, so that the JavaScript string I said, "What is your name?"! will be stored in JSON as, and parsed correctly as, I said, \"What is your name?\"!
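The expression can be tried outside JMeter. The Python sketch below applies it unchanged to a made-up sample with one escaped-quote description; Python's re module is assumed to treat this pattern the same way as the Java engine, which I have not checked against JMeter itself.

```python
import re

# The expression from above, unchanged; \\" matches an escaped quote inside a value.
OBJECT = re.compile(
    r'\{\s*"name":"((?:\\"|[^"])*)",\s*"description":"((?:\\"|[^"])*)",(?:\\}|[^}])*}'
)

# Hypothetical sample; the raw string keeps the backslashes before the inner quotes.
sample = r'''
{
"name":"#favorites",
"description":"Collection of my favorite places",
"list_id":4894636,
}
{
"name":"#quoted",
"description":"I said, \"What is your name?\"!",
"list_id":1,
}
'''

matches = OBJECT.findall(sample)
for name, description in matches:
    print(name, "|", description)
```

The captured description still contains the backslashes; unescaping them is a separate step that a real JSON parser would do for you.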
Using Ubik Load Pack plugin for JMeter which has been donated to JMeter core and is since version 3.0 available as JSON Extractor you can do it this way with following Test Plan:
namesExtractor_ULP_JSON_PostProcessor config:
descriptionExtractor_ULP_JSON_PostProcessor config:
Loop Controller to loop over results:
Counter config:
Debug Sampler showing how to use name and description in one iteration:
And here is what you get for the following JSON:
[{ "name":"#favorites", "description":"Collection of my favorite places", "list_id": 4894636 }, { "name":"#AnotherThing", "description":"Something to fill space", "list_id": 48265 }, { "name":"#SomethingElse", "description":"Something else as an example", "list_id":9283641 }]
Compared to Beanshell solution:
It is more "standard approach"
It performs much better than Beanshell code
It is more readable

Regex to parse querystring values to named groups

I have a HTML with the following content:
... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...
I would like to parse that and get a match with named groups:
match 1
group["user"]=123
group["section"]=2
match 2
group["user"]=678
group["section"]=5
I can do it if parameters always go in order, first User and then Section, but I don't know how to do it if the order is different.
Thank you!
In my case I had to parse an Url because the utility HttpUtility.ParseQueryString is not available in WP7. So, I created a extension method like this:
public static class UriExtensions
{
private static readonly Regex queryStringRegex;
static UriExtensions()
{
queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
}
public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
{
if (uri == null)
throw new ArgumentException("uri");
var matches = queryStringRegex.Matches(uri.OriginalString);
for (int i = 0; i < matches.Count; i++)
{
var match = matches[i];
yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
}
}
}
Then it's a matter of using it, for example:
var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
var userId = parameters["userId"];
var section = parameters["section"];
NOTE: I'm returning the IEnumerable instead of the dictionary directly just because I'm assuming that there might be duplicated parameter's name. If there are duplicated names, then the dictionary will throw an exception.
Why use regex to split it out?
You could first extract the query string, split it on &, and then build a map by splitting each resulting piece on =.
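A minimal Python sketch of that split-based approach follows. The function name parse_query is hypothetical, and the example URLs reuse the userId and section parameter names that appear elsewhere in this thread. The result does not depend on parameter order.

```python
from typing import Dict

def parse_query(url: str) -> Dict[str, str]:
    """Build a name -> value map from the query string of url."""
    # Everything after the first "?" is the query string.
    _, _, query = url.partition("?")
    params: Dict[str, str] = {}
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty pieces, e.g. when there is no query string
        key, _, value = pair.partition("=")
        params[key] = value  # a repeated name overwrites the earlier value
    return params

print(parse_query("file.aspx?userId=123&section=2"))
print(parse_query("file.aspx?section=5&userId=678"))
```

In real Python code the standard library's urllib.parse.parse_qs does the same job and also handles percent-decoding and repeated names.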
You didn't specify what language you are working in, but this should do the trick in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegexTest
{
class Program
{
static void Main(string[] args)
{
string subjectString = @"... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...";
Regex regexObj =
new Regex(@"<a href=""file.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&user=(?<user>.+?)""))");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
string user = matchResults.Groups["user"].Value;
string section = matchResults.Groups["section"].Value;
Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
matchResults = matchResults.NextMatch();
}
Console.ReadKey();
}
}
}
Using regex to first find the key value pairs and then doing splits... doesn't seem right.
I'm interested in a complete regex solution.
Anyone?
Check this out
\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>
You can get pairs with something like Groups["key"].Captures[i] & Groups["value"].Captures[i]
Perhaps something like this (I am rusty on regex, and wasn't good at them in the first place anyway. Untested):
/href="[^?]*([?&](userId=(?<user>\d+))|section=(?<section>\d+))*"/
(By the way, the XHTML is malformed; & should be &amp; in the attributes.)
Another approach is to put the capturing groups inside lookaheads:
Regex r = new Regex(@"<a href=""file\.aspx\?" +
@"(?=[^""<>]*?user=(?<user>\w+))" +
@"(?=[^""<>]*?section=(?<section>\w+))");
If there are only two parameters, there's no reason to prefer this way over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need another lookahead just like the two existing ones.
By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
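The lookahead idea carries over to other flavors. Here is a Python sketch of it, using Python's (?P<name>...) syntax for named groups. The sample HTML and the userId parameter name are assumptions, since the original links are not shown in the question.

```python
import re

# Each lookahead scans the rest of the attribute value independently,
# so the two parameters may appear in either order.
LINK = re.compile(
    r'<a href="file\.aspx\?'
    r'(?=[^"<>]*?userId=(?P<user>\w+))'
    r'(?=[^"<>]*?section=(?P<section>\w+))'
)

# Hypothetical input with the parameters in both orders.
html = (
    '... <a href="file.aspx?userId=123&section=2">link</a> ... '
    '<a href="file.aspx?section=5&userId=678">link</a> ...'
)

results = [(m.group("user"), m.group("section")) for m in LINK.finditer(html)]
print(results)
```

The [^"<>] class keeps each lookahead inside the current attribute value, so a parameter from the next link cannot be picked up by mistake.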
You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex
userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)
This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
a simple python implementation overcoming the ordering problem
In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')
In [3]: t = 'href="file.aspx?section=2&userId=123"'
In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]
In [5]: t = 'href="file.aspx?userId=123&section=2"'
In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]