Understanding SemCor corpus structure - linguistics

I'm learning NLP and am currently experimenting with Word Sense Disambiguation. I'm planning to use the SemCor corpus as training data, but I have trouble understanding its XML structure. I tried googling but did not find any resource describing the content structure of SemCor.
<s snum="1">
<wf cmd="ignore" pos="DT">The</wf>
<wf cmd="done" lemma="group" lexsn="1:03:00::" pn="group" pos="NNP" rdf="group" wnsn="1">Fulton_County_Grand_Jury</wf>
<wf cmd="done" lemma="say" lexsn="2:32:00::" pos="VB" wnsn="1">said</wf>
<wf cmd="done" lemma="friday" lexsn="1:28:00::" pos="NN" wnsn="1">Friday</wf>
<wf cmd="ignore" pos="DT">an</wf>
<wf cmd="done" lemma="investigation" lexsn="1:09:00::" pos="NN" wnsn="1">investigation</wf>
<wf cmd="ignore" pos="IN">of</wf>
<wf cmd="done" lemma="atlanta" lexsn="1:15:00::" pos="NN" wnsn="1">Atlanta</wf>
<wf cmd="ignore" pos="POS">'s</wf>
<wf cmd="done" lemma="recent" lexsn="5:00:00:past:00" pos="JJ" wnsn="2">recent</wf>
<wf cmd="done" lemma="primary_election" lexsn="1:04:00::" pos="NN" wnsn="1">primary_election</wf>
<wf cmd="done" lemma="produce" lexsn="2:39:01::" pos="VB" wnsn="4">produced</wf>
<punc>``</punc>
<wf cmd="ignore" pos="DT">no</wf>
<wf cmd="done" lemma="evidence" lexsn="1:09:00::" pos="NN" wnsn="1">evidence</wf>
<punc>''</punc>
<wf cmd="ignore" pos="IN">that</wf>
<wf cmd="ignore" pos="DT">any</wf>
<wf cmd="done" lemma="irregularity" lexsn="1:04:00::" pos="NN" wnsn="1">irregularities</wf>
<wf cmd="done" lemma="take_place" lexsn="2:30:00::" pos="VB" wnsn="1">took_place</wf>
<punc>.</punc>
</s>
I'm assuming wnsn means 'word sense'. Is that correct?
What does the attribute lexsn mean? How does it map to WordNet?
What does the attribute pn refer to? (third line)
How is the rdf attribute assigned? (again, third line)
In general, what are the possible attributes?

The format is described in the "doc/cxtfile.txt" file in the SemCor 1.6 archive; for some reason, documentation is not included in later versions.
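For the attributes themselves, a small sketch may help. As far as I understand the format (this is my reading, not something quoted from cxtfile.txt), lexsn is the part of a WordNet sense key that follows the lemma and a % sign, so lemma + "%" + lexsn gives the full sense key, while wnsn is the sense number of that lemma in WordNet. The function name tagged_tokens and the shortened sample are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sentence from the question.
SAMPLE = """<s snum="1">
<wf cmd="ignore" pos="DT">The</wf>
<wf cmd="done" lemma="group" lexsn="1:03:00::" pn="group" pos="NNP" rdf="group" wnsn="1">Fulton_County_Grand_Jury</wf>
<wf cmd="done" lemma="say" lexsn="2:32:00::" pos="VB" wnsn="1">said</wf>
<punc>.</punc>
</s>"""


def tagged_tokens(sentence_xml):
    """Return (surface form, assumed sense key, wnsn) for each sense-tagged word."""
    root = ET.fromstring(sentence_xml)
    rows = []
    for wf in root.iter("wf"):
        lemma, lexsn = wf.get("lemma"), wf.get("lexsn")
        # Words with cmd="ignore" carry no lemma/lexsn, so skip them.
        if wf.get("cmd") != "done" or lemma is None or lexsn is None:
            continue
        # Assumption: a WordNet sense key is "<lemma>%<lexsn>".
        rows.append((wf.text, f"{lemma}%{lexsn}", wf.get("wnsn")))
    return rows


for row in tagged_tokens(SAMPLE):
    print(row)
```

Keys built this way can then be looked up in WordNet with whatever toolkit you use. Note that the sketch relies on the attribute values being quoted, as in the snippet above; files that are not well-formed XML would need a more lenient parser.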

How can I write a python code that searches for a word in a string and then prints the string if word is present

Here is a scenario:
Given:
facility_list = ['port', 'airport']
location_list = ['new york', 'Manchester', 'lagos port', 'florida port', 'london', 'Durban airport']
Task:
For each location in location_list, if its name contains 'port' or 'airport', print:
{name} is cool!
Try:
facility_list = ["port", "airport"]
location_list = [
    "new york",
    "Manchester",
    "lagos port",
    "florida port",
    "london",
    "Durban airport",
]
for name in location_list:
    if any(facility in name for facility in facility_list):
        print(f"{name} is cool!")
Prints:
lagos port is cool!
florida port is cool!
Durban airport is cool!
OR: If you want to check separate words in location name, you can use str.split():
for name in location_list:
    if any(facility in name.split() for facility in facility_list):
        print(f"{name} is cool!")
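The difference between the two variants only shows up for names where a facility is part of a longer word. A small sketch, with helper names and the extra "export hub" entry invented for illustration:

```python
facility_list = ["port", "airport"]


def has_facility_substring(name, facilities):
    """True if any facility appears anywhere in the name (substring match)."""
    return any(facility in name for facility in facilities)


def has_facility_word(name, facilities):
    """True if any facility equals one whole whitespace-separated word."""
    words = name.split()
    return any(facility in words for facility in facilities)


for name in ["lagos port", "Durban airport", "export hub", "london"]:
    print(
        name,
        has_facility_substring(name, facility_list),
        has_facility_word(name, facility_list),
    )
```

Here "export hub" passes the substring check because "port" is inside "export", but fails the whole-word check.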

Only first when expression works in apache camel choice?

I have this route, which checks whether the body begins with "01" or "02" and calls a different bean accordingly. The problem is that only the first when works. For example, if I send a message beginning with "01" it works fine, but if my message begins with "02" the otherwise part gets executed and I get the error message with an empty body.
<route id="genericService">
<from uri="servlet:///genericService"/>
<choice>
<when>
<simple>${body} regex "^01.*$"</simple>
<bean ref="cardFacade" method="getBalance" />
</when>
<when>
<simple>${body} regex "^02.*$"</simple>
<bean ref="depositFacade" method="getBalance" />
</when>
<otherwise>
<transform>
<simple>error: ${body}</simple>
</transform>
</otherwise>
</choice>
<marshal>
<json />
</marshal>
<transform>
<simple>${body}</simple>
</transform>
</route>
The problem is that the servlet component provides the body as a stream that can only be read once. So you need to either enable stream caching or convert the message body to a non-stream type such as String or byte[].
You can find more details here:
http://camel.apache.org/why-is-my-message-body-empty.html
Also see the first box on this page:
http://camel.apache.org/servlet
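The read-once behaviour can be illustrated outside Camel. The following is only a Python analogy with made-up function names, not Camel code: each predicate reads the stream itself, so the second check sees nothing, whereas converting the body to a string first lets every check see the same data.

```python
import io


def route(body_text):
    """Mimic the two `when` branches and the `otherwise` branch on a string body."""
    if body_text.startswith("01"):
        return "card"
    if body_text.startswith("02"):
        return "deposit"
    return f"error: {body_text}"


def route_from_stream(stream):
    """Same routing, but every predicate reads the one-shot stream directly."""
    if stream.read().startswith(b"01"):
        return "card"
    if stream.read().startswith(b"02"):  # the stream is already exhausted here
        return "deposit"
    return f"error: {stream.read().decode()}"


uncached = route_from_stream(io.BytesIO(b"02-balance"))
cached = route(io.BytesIO(b"02-balance").read().decode())
print(repr(uncached), repr(cached))
```

The uncached call ends in the error branch with an empty body, which matches the symptom in the question; reading the stream once into a string gives the expected branch.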
I had the same problem in the Java DSL (not the compilation issue). What I had to do was add .endChoice() at the end of every when and otherwise branch, something like below:
from(endPointTopic)
.errorHandler(deadLetterChannel)
.log("Message from Topic is ${body} & header string is ${header.Action}" )
.choice()
.when(header("Action").isEqualTo("POST"))
.setHeader(Exchange.HTTP_METHOD, constant("POST"))
.setHeader("Content-Type", constant("application/json"))
.convertBodyTo(String.class)
.to("log:like-to-see-all?level=INFO&showAll=true&multiline=true")
.to(privateApi)
.log("POST request for " + topicName)
.endChoice()
.when(header("Action").isEqualTo("PUT"))
.setHeader(Exchange.HTTP_METHOD, constant("PUT"))
.setHeader("Content-Type", constant("application/json"))
.convertBodyTo(String.class)
.to("log:like-to-see-all?level=INFO&showAll=true&multiline=true")
.to(privateApi)
.log("PUT request for " + topicName)
.endChoice()
.when(header("Action").isEqualTo("DELETE"))
.setHeader(Exchange.HTTP_METHOD, constant("DELETE"))
.setHeader("Content-Type", constant("application/json"))
.convertBodyTo(String.class)
.to("log:like-to-see-all?level=INFO&showAll=true&multiline=true")
.to(privateApi)
.log("DELETE request for " + topicName)
.endChoice()
.otherwise()
.setHeader(Exchange.HTTP_METHOD, constant("GET"))
.setHeader("Content-Type", constant("application/json"))
.convertBodyTo(String.class)
.to("log:like-to-see-all?level=INFO&showAll=true&multiline=true")
.to(privateApi)
.log("Un-known HTTP action so posting to GET queue")
.endChoice();

Assigning a variable with dynamic data in XQuery

I have a requirement where I have to validate the incoming data against data present in a constant.xml.
Say below is my constant file:
<Constant>
<data>
<Nation>India</Nation>
<EndPointURL>customers/{$custID}/Resource</EndPointURL>
</data>
<data>
<Nation>China</Nation>
<EndPointURL>customers/{$custID}/Resource</EndPointURL>
</data>
<data>
<Nation>Russia</Nation>
<EndPointURL>customers/Resource</EndPointURL>
</data>
</Constant>
and $body is as follows:
<body>
<custID>1234</custID>
<Country>India</Country>
<ServiceURL>customers/1234/Resource</ServiceURL>
</body>
Here I have to check that $body/ServiceURL = $Constant/data/EndPointURL.
The cardinality of data is (1...infinity).
Is there a way I can pass the original custID from the input and validate against customers/{$custID}/Resource?
Presently, I am using the code below to make the check.
let $ServiceURL := $body/ServiceURL/text()
let $country := $body/Country/text()
for $service in ($Constant/data) where
($service/Nation/text() = $country)
and ($service/EndPointURL/text() = $ServiceURL)
return
<ServiceURL>{fn:concat('/REALTime/',$service/EndPointURL/text())}</ServiceURL>
};
Please let me know how I can change the data of constant.xml in XQuery.
The following query shows you how to expand the embedded templates in your EndPointURLs and only return the result if there is a match.
(: find the `data` for our `Country` :)
let $data := $Constant/data[Nation eq $body/Country]
return
  (: expand the {$var} parts of the EndPointURL :)
  let $parts := tokenize($data/EndPointURL, "/")
  return
    let $expanded-parts :=
      for $part in $parts
      return
        if (matches($part, "\{\$[^}]+\}")) then
          let $src := replace($part, "\{\$([^}]+)\}", "$1")
          return
            $body/element()[local-name(.) eq $src]
        else
          $part
    return
      (: join the expanded parts; the sample ServiceURL has no leading "/" :)
      let $expanded-endpoint-url := string-join($expanded-parts, "/")
      return
        (: the body if the ServiceURL matches the expanded EndPointURL :)
        $body[ServiceURL eq $expanded-endpoint-url]
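The same expand-then-compare idea can be sketched in Python to make the steps easier to follow. This is only an illustration under assumptions: the body and constants are modelled as plain dictionaries, and the names expand_template and service_url_is_valid are invented here.

```python
import re

# Matches placeholders of the form {$name} and captures the name.
PLACEHOLDER = re.compile(r"\{\$([^}]+)\}")


def expand_template(template, values):
    """Replace each {$name} in the template with values[name]."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


def service_url_is_valid(body, constants):
    """Check the body's ServiceURL against the expanded template for its country."""
    for entry in constants:
        if entry["Nation"] == body["Country"]:
            return expand_template(entry["EndPointURL"], body) == body["ServiceURL"]
    return False


constants = [
    {"Nation": "India", "EndPointURL": "customers/{$custID}/Resource"},
    {"Nation": "Russia", "EndPointURL": "customers/Resource"},
]
body = {"custID": "1234", "Country": "India", "ServiceURL": "customers/1234/Resource"}
print(service_url_is_valid(body, constants))
```

A placeholder whose name is missing from the body raises a KeyError in this sketch; real input would probably need that case handled explicitly.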

Elmah Filter for Nancy MVC Framework

I'm using Nancy MVC and Nancy.Elmah. Currently, there's a bug in Nancy that raises an exception for requests with accept headers of "*". Here's the Elmah log:
System.ArgumentException
inputString not in correct Type/SubType format Parameter name: *
System.ArgumentException: inputString not in correct Type/SubType format
Parameter name: *
at Nancy.Responses.Negotiation.MediaRange.FromString(String contentType)
at Nancy.Routing.DefaultRouteInvoker.<>c__DisplayClass5a.<>c__DisplayClass5c.<GetCompatibleHeaders>b__4f(MediaRange mr)
at System.Linq.Enumerable.WhereSelectListIterator`2.MoveNext()
at System.Linq.Enumerable.<SelectManyIterator>d__14`2.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at Nancy.Routing.DefaultRouteInvoker.GetCompatibleHeaders(IEnumerable`1 coercedAcceptHeaders, NancyContext context, Negotiator negotiator)
at Nancy.Routing.DefaultRouteInvoker.ProcessAsNegotiator(Object routeResult, NancyContext context)
at Nancy.Routing.DefaultRouteInvoker.InvokeRouteWithStrategy(Object result, NancyContext context)
at System.Dynamic.UpdateDelegates.UpdateAndExecute3[T0,T1,T2,TRet](CallSite site, T0 arg0, T1 arg1, T2 arg2)
at CallSite.Target(Closure , CallSite , DefaultRouteInvoker , Object , NancyContext )
at Nancy.Routing.DefaultRouteInvoker.Invoke(Route route, DynamicDictionary parameters, NancyContext context)
at Nancy.Routing.DefaultRequestDispatcher.Dispatch(NancyContext context)
at Nancy.NancyEngine.InvokeRequestLifeCycle(NancyContext context, IPipelines pipelines)
I tried filtering it with the following web.config
<errorFilter>
<test>
<or>
<regex binding="Exception" pattern="inputString not in correct Type/SubType format Parameter name: \*" />
</or>
</test>
</errorFilter>
But the errors are not filtered out.
Also tried binding="BaseException.Message" without luck.
I would like a way to keep these messages from being logged. Could someone correct my filter configuration above to do so? Thanks!
Well, after a few more hours of trial and error I discovered that the "Parameter name: *" portion is not part of the actual exception message. I'm guessing that the message in the log is a concatenation of the exception message and parameter values. I changed the filter as shown below and it works.
<errorFilter>
<test>
<regex binding="Exception.Message" pattern="inputString not in correct Type/SubType format" />
</test>
</errorFilter>
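A quick way to see why the shorter pattern is the safer one: in the stack trace above, "Parameter name: *" sits on its own line. Assuming (this is a guess, not confirmed by the thread) that the text being tested has a line break rather than a space at that point, the longer pattern cannot match. The sketch below uses Python's re module purely as a stand-in; it is not the regex engine Elmah uses.

```python
import re

# Assumed shape of the tested text: a line break before "Parameter name".
message = "inputString not in correct Type/SubType format\nParameter name: *"

long_pattern = r"inputString not in correct Type/SubType format Parameter name: \*"
short_pattern = r"inputString not in correct Type/SubType format"

# The literal space in long_pattern does not match the newline in message.
long_match = re.search(long_pattern, message) is not None
short_match = re.search(short_pattern, message) is not None
print(long_match, short_match)
```

Whether the suffix is absent from the message or merely separated by a line break, matching only the first sentence avoids the problem.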

Nagiosgraph rrd files not created (maybe because of map file)

I'm having a problem with Nagiosgraph. I have created a nagios check which monitors the traffic on a server/workstation through SNMP and the output of the check is a long string that looks like this:
OK - traffmon eth0:incoming:170KB:outgoing:1606KB eth1:incoming:1576KB:outgoing:170KB eth2:incoming:156:outgoing:0|lo;incoming;25;outgoing;25 tunl0;incoming;0;outgoing;0 gre0;incoming;0;outgoing;0 sit0;incoming;0;outgoing;0 eth0;incoming;170KB;outgoing;1606KB eth1;incoming;1576KB;outgoing;170KB eth2;incoming;156;outgoing;0
I'm interested in the first three interfaces, which is why I've separated eth0, eth1 and eth2 from the full list of interfaces (which I treat as performance data). I followed the instructions at http://www.novell.com/coolsolutions/feature/19843.html and have this in my service.cfg:
define serviceextinfo{
host_name workstation
service_description Throughput Monitor
action_url /nagiosgraph/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=eth0,incoming,outgoing,&geom=500x100&rrdopts%3D-l%200%20-u%2010000%20-t%20Traffic
}
and in my map file I have written this to match the things that interest me:
/output:.*traffmon ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+)/
and push @s, [ 'eth0',
['incoming', 'GAUGE', $2],
['outgoing', 'GAUGE', $3] ],
[ 'eth1',
['incoming', 'GAUGE', $5],
['outgoing', 'GAUGE', $6] ],
[ 'eth2',
['incoming', 'GAUGE', $8],
['outgoing', 'GAUGE', $9] ];
I wanted to create three tables (eth0, eth1, eth2) with two columns each (incoming, outgoing) and from then on try to represent them nicely. Usually my rrd files get created automatically, but for this check neither the folder with the workstation's name in the rrd folder nor the .rrd files get created. I have the feeling that it has something to do with the map file; maybe the matching is not working (I'm saying this because I don't know Perl). Any suggestion is appreciated. Thank you.
You can try this regex:
/traffmon eth0:incoming:(\d+)(?:KB):outgoing:(\d+)(?:KB) eth1:incoming:(\d+)(?:KB):outgoing:(\d+)(?:KB) eth2:incoming:(\d+):outgoing:(\d+)/
You can test it on rubular: http://rubular.com/r/vj7VXwDPPU
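The pattern can also be checked locally against the sample output. The sketch below uses Python's re module, which accepts this particular pattern unchanged; the shortened output string and the more tolerant second pattern (optional KB suffix, any ethN interface) are additions for illustration and not part of the original map file.

```python
import re

# Shortened copy of the check output from the question.
output = (
    "OK - traffmon eth0:incoming:170KB:outgoing:1606KB "
    "eth1:incoming:1576KB:outgoing:170KB "
    "eth2:incoming:156:outgoing:0|lo;incoming;25;outgoing;25"
)

# The regex from the answer, split across lines for readability.
answer_pattern = re.compile(
    r"traffmon eth0:incoming:(\d+)(?:KB):outgoing:(\d+)(?:KB) "
    r"eth1:incoming:(\d+)(?:KB):outgoing:(\d+)(?:KB) "
    r"eth2:incoming:(\d+):outgoing:(\d+)"
)
fixed = answer_pattern.search(output).groups()

# Hypothetical variant: KB is optional, so it still matches if the units change.
tolerant_pattern = re.compile(r"(eth\d+):incoming:(\d+)(?:KB)?:outgoing:(\d+)(?:KB)?")
status_text = output.split("|", 1)[0]  # only look at the part before the pipe
tolerant = tolerant_pattern.findall(status_text)

print(fixed)
print(tolerant)
```

Note that the answer's pattern requires the KB suffix on eth0 and eth1 and forbids it on eth2, so it stops matching as soon as a counter changes units. Neither pattern converts the values to a common unit.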
I'm not familiar with how your nagios system works, but if there is room for more perl code, you could also do something like:
my $res = 'OK - traffmon eth0:incoming:170KB:outgoing:1606KB eth1:incoming:1576KB:outgoing:170KB eth2:incoming:156:outgoing:0|lo;incoming;25;outgoing;25 tunl0;incoming;0;outgoing;0 gre0;incoming;0;outgoing;0 sit0;incoming;0;outgoing;0 eth0;incoming;170KB;outgoing;1606KB eth1;incoming;1576KB;outgoing;170KB eth2;incoming;156;outgoing;0';
use Data::Dumper;  # provides Dumper()
my @s;
push @s, map {
    my @f = split /:/;
    [ $f[0], [$f[1], 'GAUGE', $f[2] ], [$f[3], 'GAUGE', $f[4]] ]
} (split(/ |\|/, $res))[3..5];
print Dumper @s;
This splits the string at a space or a pipe |, takes the elements at indices 3 to 5 (which are the first three interfaces) and then loops over them. It splits each one on the colon :, builds your data structure and returns it for each interface. The returned data structures are pushed into @s.
Output:
$VAR1 = [
'eth0',
[
'incoming',
'GAUGE',
'170KB'
],
[
'outgoing',
'GAUGE',
'1606KB'
]
];
$VAR2 = [
'eth1',
[
'incoming',
'GAUGE',
'1576KB'
],
[
'outgoing',
'GAUGE',
'170KB'
]
];
$VAR3 = [
'eth2',
[
'incoming',
'GAUGE',
'156'
],
[
'outgoing',
'GAUGE',
'0'
]
];