How to get POS tagging on GATE

How can I get the features of words using GATE Embedded (Java code), as in the following example?
type=Token;
features={category=VBG, kind=word, orth=lowercase, length=7, string=lacking};
start=NodeImpl;
id=21453;

If you use the OpenNLP POS tagger, it should be something like this, assuming token is one of your Token annotations:
token.getFeatures().get("category").toString()
This should give you the string corresponding to the POS tag.

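Alternatively, a JAPE rule with a Java right-hand side can print the features of every Token:
Phase: Find_Features
Input: Token
Options: control = First
Rule: get_Token_features
(
  {Token}
):label
-->
:label
{
  // Note: this iterates over every Token in the input set, not only the matched one.
  AnnotationSet tokens = inputAS.get("Token");
  for (Annotation t : tokens) {
    FeatureMap fm = t.getFeatures();
    // Print all features of the token.
    System.out.println(fm);
    /*
      If looking for a specific feature, go for
      System.out.println(t.getFeatures().get("FeatureName").toString());
    */
    // "category" holds the POS tag.
    System.out.println(t.getFeatures().get("category").toString());
  }
}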

Convenient way to get the first char index of a given string that caused a specific text pattern not to match in Rust?

Language:
Rust
Rust regex crate: https://docs.rs/regex/1.5.4/regex/
Use case:
Printing friendly diagnostic message to user that inputs text that does not match an expected regex pattern
e.g.
if the patterns are Regex::new(r"^--(\w+)=(\w+)$").unwrap() and Regex::new(r"^-(\w+)$").unwrap()
and the user inputs "---abc",
the user can see a diagnostic like:
"---abc"
   ^ Problem with character "-" at index 2.
Expecting format "--key=value".
                    ^ Does not match expected format at index 2.
Possible solution:
Can I do something with capture groups? (They might only be relevant if there is a match). If no solution with capture groups, what else?
// "-a[bc..]" or "--key=value"
lazy_static! {
static ref SHORT_OPTION_RE: Regex = Regex::new(r"^-(\w+)$").unwrap();
static ref LONG_OPTION_RE: Regex = Regex::new(r"^--(\w+)=(\w+)$").unwrap();
}
// long option example
let caps = LONG_OPTION_RE.captures(s).ok_or(e_msg)?;
let key = caps.get(1).unwrap().as_str().to_string();
let value = caps.get(2).unwrap().as_str().to_string();
// ... use key and value ...
Issue:
Can't get exact char index that caused capture group not to match.
Alternatives:
Just manually add in if/else checks for various indexes to try to catch every error scenario ("---a", "-a=b", etc) (Essentially implement mini parser that generates diagnostic message and problematic char index without using regex)
Out of scope:
I do not need recommendations for cli program libs/frameworks (unless you're pointing to an implementation detail within one)
Edit:
Modified question to be more generic than just regex.
I would use a parser like nom.
Here is a quick and partial implementation of your use case:
use nom::{
    bytes::complete::tag, character::complete::alphanumeric1, combinator::map, sequence::tuple,
    IResult,
};

#[derive(Debug)]
struct OptPair {
    key: String,
    value: String,
}

// Parses "--key=value" into an OptPair.
fn parse_option(input: &str) -> IResult<&str, OptPair> {
    map(
        tuple((tag("--"), alphanumeric1, tag("="), alphanumeric1)),
        |(_, k, _, v): (&str, &str, &str, &str)| OptPair {
            key: k.to_owned(),
            value: v.to_owned(),
        },
    )(input)
}

fn test_parse(input: &str) {
    println!("TEST: input = \"{}\":", input);
    match parse_option(input) {
        Ok((_, opt_pair)) => println!(" Ok, {:?}", opt_pair),
        Err(err) => match err {
            nom::Err::Incomplete(_) => eprintln!(" Incomplete"),
            nom::Err::Error(err) => {
                // err.input is the unparsed remainder, a sub-slice of input,
                // so the pointer difference is the byte offset of the failure.
                let offset = err.input.as_ptr() as usize - input.as_ptr() as usize;
                eprintln!(" Error at index {}", offset);
            }
            nom::Err::Failure(_err) => println!(" Failure"),
        },
    }
}

fn main() {
    test_parse("--foo=bar");
    test_parse("---foo=bar");
    test_parse("--foo=");
    test_parse("Hello");
}
Output:
TEST: input = "--foo=bar":
Ok, OptPair { key: "foo", value: "bar" }
TEST: input = "---foo=bar":
Error at index 2
TEST: input = "--foo=":
Error at index 6
TEST: input = "Hello":
Error at index 0

AmazonCloudWatch PutMetricData request format parsing

How can I parse the PutMetricData sample request shown below?
I want to parse all the MetricData entries and store the values in a struct in Go.
https://monitoring.&api-domain;/doc/2010-08-01/
?Action=PutMetricData
&Version=2010-08-01
&Namespace=TestNamespace
&MetricData.member.1.MetricName=buffers
&MetricData.member.1.Unit=Bytes
&MetricData.member.1.Value=231434333
&MetricData.member.1.Dimensions.member.1.Name=InstanceID
&MetricData.member.1.Dimensions.member.1.Value=i-aaba32d4
&MetricData.member.1.Dimensions.member.2.Name=InstanceType
&MetricData.member.1.Dimensions.member.2.Value=m1.small
&MetricData.member.2.MetricName=latency
&MetricData.member.2.Unit=Milliseconds
&MetricData.member.2.Value=23
&MetricData.member.2.Dimensions.member.1.Name=InstanceID
&MetricData.member.2.Dimensions.member.1.Value=i-aaba32d4
&MetricData.member.2.Dimensions.member.2.Name=InstanceType
&MetricData.member.2.Dimensions.member.2.Value=m1.small
&AUTHPARAMS
I cannot work out what format this is or how to parse it. Is there a library that can generate and parse messages in this format?
If you remove the newlines, that is a URL. Start with url.Parse, then use the Query() method to get access to the URL parameters:
package main

import (
"fmt"
"log"
"net/url"
"strings"
)

func main() {
var input = `https://monitoring.&api-domain;/doc/2010-08-01/
?Action=PutMetricData
&Version=2010-08-01
&Namespace=TestNamespace
&MetricData.member.1.MetricName=buffers
&MetricData.member.1.Unit=Bytes
&MetricData.member.1.Value=231434333
&MetricData.member.1.Dimensions.member.1.Name=InstanceID
&MetricData.member.1.Dimensions.member.1.Value=i-aaba32d4
&MetricData.member.1.Dimensions.member.2.Name=InstanceType
&MetricData.member.1.Dimensions.member.2.Value=m1.small
&MetricData.member.2.MetricName=latency
&MetricData.member.2.Unit=Milliseconds
&MetricData.member.2.Value=23
&MetricData.member.2.Dimensions.member.1.Name=InstanceID
&MetricData.member.2.Dimensions.member.1.Value=i-aaba32d4
&MetricData.member.2.Dimensions.member.2.Name=InstanceType
&MetricData.member.2.Dimensions.member.2.Value=m1.small
&AUTHPARAMS`
// possibly also needs to replace \r
input = strings.ReplaceAll(input, "\n", "")
uri, err := url.Parse(input)
if err != nil {
log.Fatal(err)
}
for key, val := range uri.Query() {
fmt.Println(key, val)
}
}
From here on, it is up to you how the target struct should look.
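As one possible shape for that target struct, here is a minimal sketch that groups the MetricData.member.N.* parameters into one value per N. The type names MetricDatum and Dimension and the function parseMetricQuery are hypothetical, chosen for illustration; they are not part of any AWS library. It assumes the newlines have already been removed and that only the query string (the part after the ?) is passed in.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strconv"
	"strings"
)

// Dimension and MetricDatum are hypothetical target types for this sketch.
type Dimension struct{ Name, Value string }

type MetricDatum struct {
	MetricName, Unit string
	Value            float64
	Dimensions       []Dimension
}

// sortedKeys returns the integer keys of m in ascending order.
func sortedKeys[V any](m map[int]V) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	return keys
}

// parseMetricQuery groups the "MetricData.member.N.*" parameters of a raw
// query string into one MetricDatum per N, ordered by N.
func parseMetricQuery(rawQuery string) ([]MetricDatum, error) {
	params, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	metrics := map[int]*MetricDatum{}
	dims := map[int]map[int]*Dimension{} // metric index -> dimension index
	for key, vals := range params {
		p := strings.Split(key, ".")
		if len(p) < 4 || p[0] != "MetricData" || p[1] != "member" {
			continue // Action, Version, Namespace and so on
		}
		i, err := strconv.Atoi(p[2])
		if err != nil {
			return nil, fmt.Errorf("bad member index in %q", key)
		}
		if metrics[i] == nil {
			metrics[i], dims[i] = &MetricDatum{}, map[int]*Dimension{}
		}
		m, v := metrics[i], vals[0]
		switch {
		case len(p) == 4 && p[3] == "MetricName":
			m.MetricName = v
		case len(p) == 4 && p[3] == "Unit":
			m.Unit = v
		case len(p) == 4 && p[3] == "Value":
			if m.Value, err = strconv.ParseFloat(v, 64); err != nil {
				return nil, fmt.Errorf("bad value for %q: %v", key, err)
			}
		case len(p) == 7 && p[3] == "Dimensions" && p[4] == "member":
			j, err := strconv.Atoi(p[5])
			if err != nil {
				return nil, fmt.Errorf("bad dimension index in %q", key)
			}
			if dims[i][j] == nil {
				dims[i][j] = &Dimension{}
			}
			switch p[6] {
			case "Name":
				dims[i][j].Name = v
			case "Value":
				dims[i][j].Value = v
			}
		}
	}
	// Emit metrics and their dimensions in index order.
	out := make([]MetricDatum, 0, len(metrics))
	for _, i := range sortedKeys(metrics) {
		m := *metrics[i]
		for _, j := range sortedKeys(dims[i]) {
			m.Dimensions = append(m.Dimensions, *dims[i][j])
		}
		out = append(out, m)
	}
	return out, nil
}

func main() {
	raw := "Action=PutMetricData&Namespace=TestNamespace" +
		"&MetricData.member.1.MetricName=buffers" +
		"&MetricData.member.1.Unit=Bytes" +
		"&MetricData.member.1.Value=231434333" +
		"&MetricData.member.1.Dimensions.member.1.Name=InstanceID" +
		"&MetricData.member.1.Dimensions.member.1.Value=i-aaba32d4" +
		"&MetricData.member.2.MetricName=latency" +
		"&MetricData.member.2.Value=23"
	metrics, err := parseMetricQuery(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range metrics {
		fmt.Printf("%+v\n", m)
	}
}
```

Since uri.Query() in the answer above already returns url.Values, the same loop could run directly on that map instead of re-parsing a string; the string parameter here only keeps the sketch self-contained.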

How to display Highlighted text using Solrj

I am new to Solr and SolrJ. I am trying to use them in a desktop application, and I have text files to index and search. I want to use the highlighting feature and display the fragments with highlights, but they are not displayed with a yellow background, the way highlighted text usually is. Please let me know how to display the text with a yellow background.
Here is my code snippet:
public void TestHighLight(SolrQuery query) throws SolrServerException, IOException {
    query.set("q", "text:Pune");
    query.set("hl", "true");
    query.set("hl.snippets", "5");
    query.set("hl.fl", "*");
    QueryResponse queryResponse = client.query(query);
    SolrDocumentList docs = queryResponse.getResults();
    for (int i = 0; i < docs.size(); i++) {
        String docID = (String) docs.get(i).getFieldValue("id");
        // getHighlightedText is my own helper (not shown here).
        String highlightedText = getHighlightedText(queryResponse, "text", docID);
        System.out.println(" HighlightedText is " + highlightedText);
    }
}
The output looks like this. How do I color it?
[ for Java Developer at Pune
Thanks a lot!
Set the pre and post parameters of the highlighter. They specify the “tag” to insert before and after each highlighted term. This can be any string, but is most often an HTML or XML tag.
e.g.:
solrQueryHandler.setHighlightSimplePre("<font color=\"yellow\">");
solrQueryHandler.setHighlightSimplePost("</font>");
But note that this works only with the Original Highlighter.

Regex Phone Number Using Validation V2 Golang Package Not Working

I am having trouble using github.com/go-validator/validator to validate phone numbers against a regex. Valid numbers start with the prefix +62, 62, or 0, e.g. +628112blabla, 0822blablabla, 628796blablabla.
I have tried my regex in an online regex tester and it works there. Here is the regex:
(0|\+62|062|62)[0-9]+$
But in my Go implementation the regex does not work. This is my code:
type ParamRequest struct {
PhoneNumber string `validate:"nonzero,regexp=(0|\+62|062|62)[0-9]+$"`
ItemCode string `validate:"nonzero"`
CallbackUrl string `validate:"nonzero"`
}
func (c *TopupAlloperatorApiController) Post() {
	var v models.TopupAlloperatorApi
	interf := make(map[string]interface{})
	json.Unmarshal(c.Ctx.Input.RequestBody, &interf)
	logs.Debug(" Json Input Request ", interf)
	// Copy each field only if it is present and is a string.
	var phone, item, callback string
	if s, ok := interf["PhoneNumber"].(string); ok {
		phone = s
	}
	if s, ok := interf["ItemCode"].(string); ok {
		item = s
	}
	if s, ok := interf["CallbackUrl"].(string); ok {
		callback = s
	}
	ve := ParamRequest{
		PhoneNumber: phone,
		ItemCode:    item,
		CallbackUrl: callback,
	}
	logs.Debug(" Param Request ", ve)
	err := validator.Validate(ve)
	if err == nil {
		// success
	} else {
		// not success
	}
}
Many thanks for any help.
Because you are using regexp to check PhoneNumber, and the pattern will not match an empty value anyway, it is better to remove nonzero from the validation.
I have checked the documentation and have not found examples where nonzero and regexp are used together.
You also need to escape the backslash in your regex; otherwise the struct tag cannot be read by reflection. That means you should use (0|\\+62|062|62)[0-9]+$ in your code. The underlying problem is described here: symbol escaping in struct tags
Also, please try this regexp: ^\\+{0,1}0{0,1}62[0-9]+$ (note that it requires 62 after the optional + and 0, so a number with only the 0 prefix, such as 0822blablabla, does not match it; keep the original alternation, anchored with ^, if such numbers must pass).
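The escaping point can be checked with the standard library alone. The sketch below uses reflect.StructTag, which is how struct tags are read at run time, to show that a tag containing \+ yields an empty value while \\+ yields the intended pattern. The helper names tagRule and validPhone are hypothetical, and the pattern is the one from the question with a leading ^ added as an assumption; the validator package itself is not involved here.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
)

// tagRule returns the "validate" value of a raw struct tag as the reflect
// package sees it. The tag value is unquoted like a Go string literal, so
// an invalid escape such as \+ makes the lookup return an empty string.
func tagRule(rawTag string) string {
	return reflect.StructTag(rawTag).Get("validate")
}

// phoneRE is the pattern from the question, anchored at both ends
// (the leading ^ is an addition for this sketch).
var phoneRE = regexp.MustCompile(`^(0|\+62|062|62)[0-9]+$`)

// validPhone reports whether s starts with 0, +62, 062 or 62 and is
// followed only by digits.
func validPhone(s string) bool { return phoneRE.MatchString(s) }

func main() {
	// Single backslash: the tag value cannot be unquoted.
	fmt.Printf("%q\n", tagRule(`validate:"regexp=^(0|\+62|062|62)[0-9]+$"`))
	// Double backslash: the tag value unquotes to the intended regex.
	fmt.Printf("%q\n", tagRule(`validate:"regexp=^(0|\\+62|062|62)[0-9]+$"`))

	for _, s := range []string{"+628112345678", "0822123456", "628796123456", "abc"} {
		fmt.Println(s, validPhone(s))
	}
}
```

Testing the pattern with the regexp package directly, as validPhone does, also helps separate a wrong pattern from a wrong tag.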

How to combine multiple criteria in OR query dynamically in Play Morphia

I am trying to use a kind of builder pattern to build an OR query using multiple criteria depending upon the scenario. An example is
public class Stylist extends Model {
public String firstName;
public String lastName;
public String status;
...
}
I would like to search the Stylist collection for entries where the first name or last name matches a given string and the status matches another string. I am writing the query as follows:
MorphiaQuery query = Stylist.q();
if (some condition) {
query.or(query.criteria("status").equal("PendingApproval"), query.criteria("status").equal(EntityStatus.ACTIVE));
}
if (some other condition as well) {
query.or(query.criteria("firstName").containsIgnoreCase(name), query.criteria("lastName").containsIgnoreCase(name));
}
When both conditions are met, I see that the query contains only the criteria related to firstName and lastName, i.e. the different OR criteria are not appended but overwritten. This is quite different from filter criteria, where all the filter conditions are appended and you can easily build queries containing multiple AND conditions.
I can solve the problem by arranging my conditions and building my queries differently, but that does not seem elegant. Am I doing something wrong?
I am using Play! Framework 1.2.4 and Play Morphia module version 1.2.5a
Update
To put it more clearly, I would like to AND multiple OR queries. Concretely, in the scenario above, I would like to search for Stylists where:
firstName or lastName contains the supplied name AND
status equals ACTIVE or PENDING_APPROVAL.
I have been able to construct the query directly on Mongo shell through :
db.stylists.find({$and: [{$or : [{status: "PENDING_APPROVAL"}, {status : "ACTIVE"}]},{$or : [{firstName : { "$regex" : "test" , "$options" : "i"}}, {lastName : { "$regex" : "test" , "$options" : "i"}}]}] }).pretty();
But I have not been able to achieve the same through the Query API methods. Here is my attempt:
Query<Stylist> query = MorphiaPlugin.ds().find(Stylist.class);
CriteriaContainer or3 = query.or(query.criteria("firstName").containsIgnoreCase(name), query.criteria("lastName").containsIgnoreCase(name));
CriteriaContainer or4 = query.or(query.criteria("status").equal("PENDING_APPROVAL"), query.criteria("status").equal("ACTIVE"));
query.and(or3, or4);
query.toString() results in the following output: { "$or" : [ { "status" : "PENDING_APPROVAL"} , { "status" : "ACTIVE"}]}
I am not sure what I am missing.
I guess there are two ways to handle your case.
First, use a List<Criteria>:
MorphiaQuery query = Stylist.q();
List<Criteria> l = new ArrayList<Criteria>();
if (some condition) {
    l.add(query.criteria("status").equal("PendingApproval"));
    l.add(query.criteria("status").equal(EntityStatus.ACTIVE));
}
if (some other condition as well) {
    l.add(query.criteria("firstName").containsIgnoreCase(name));
    l.add(query.criteria("lastName").containsIgnoreCase(name));
}
// Pass a typed array; a plain toArray() would return Object[].
query.or(l.toArray(new Criteria[l.size()]));
Second, use a CriteriaContainer:
MorphiaQuery query = Stylist.q();
CriteriaContainer cc = null;
if (some condition) {
    cc = query.or(query.criteria("status").equal("PendingApproval"), query.criteria("status").equal(EntityStatus.ACTIVE));
}
if (some other condition) {
    if (null != cc) query.or(cc, query.criteria("firstName").containsIgnoreCase(name), query.criteria("lastName").containsIgnoreCase(name));
    else query.or(query.criteria("firstName").containsIgnoreCase(name), query.criteria("lastName").containsIgnoreCase(name));
}