Annotating a document with JAPE - gate

I have been searching for a solution to this for weeks, I have some documents(about 95) that I am trying to classify using GATE. I have put them in one corpus I called training_corpus, however, after ANNIE has annotated the corpus, I have to go back into each file, select all token in the document, and create an annotation called Mention, with feature type and value the class for the document. for example:
type Start End id Features
Mention 0 70000 2588 {type=neg}
Is there anyway to automatically do this with JAPE? Basically, I want to select all tokens and create a new annotation with feature(type=class). Also, the class is appended to the document. Since there are many documents, can JAPE extract the class from the document name and set it to the value of Mentions feature. Example document name is neg_data1.txt, so the annotation will be Mention.type = neg?
Any help will be greatly appreciated. Thanks

I think you answered to your question by yourself.If the class assignment based on just a token present in text - why not simply process text outside of GATE?
For example to create an xml file like:
text and then use it in training process.
Also you can create a simple JAPE rule which will:
a) will take a text within document boundaries (see gate.Utils.length methods AFAIR)
b) based on presence of your token will create a new Annotation instance with features necessary.
an abstract example:
Phase: Instance
Input: Token
Options: control = once
Rule:Instance
(
{Token}
):instance
-->
{
AnnotationSet instances = outputAS.get("INSTANCE_ANNOTATION");
FeatureMap featureMap = Factory.newFeatureMap();
if (instances!=null&&!instances.isEmpty()){
featureMap.put("features when annotation presented in doc");
}else{
featureMap.put("features when annotation not in doc");
}
outputAS.add(new Long(0), new Long(documentLength), "Mention", featureMap);
}

Related

Google Healthcare API Nodejs - Filter Appointment using start and end date

I cannot filter an Appointment from start and end date using the google healthcare API.
I am trying to recreate the query below:
Appointment?date=ge2023-02-03T04:00:00.000Z&date=le2023-02-05T04:00:00.000Z
This is my javascript code using the library:
const parent = `projects/${projectId}/locations/${cloudRegion}/datasets/${datasetId}/fhirStores/${fhirStoreId}`;
const params = {
parent,
resourceType,
date: `le${end}`,
date: `ge${start}`,
};
const resource = await healthcare.projects.locations.datasets.fhirStores.fhir.search(params);
date: `ge${start}` is ignored because you cannot duplicate the same key.
Is there any other way I can achieve this.
Thanks!
I was able to make it work.
const params = {
parent,
resourceType,
"date:end": `le${end}`,
"date:start": `ge${start}`,
};
Just change "date" to "date:start" and "date:end"
I see that you were able to make this work, but there's a better way. What you essentially want to do here is specify multiple date parameters. The solution that you've come up with does this, but by using non-existent modifiers, :start and :end. By default the FHIR store drop modifiers it does not recognize, so your query as the same as date=le...&date=ge..., which is the end result you want. But if you were using the header Prefer: handling=strict, or set defaultSearchHandlingStrict in your FHIR store config, you'll get an error back from this search.
So what's the correct thing to do? If you want to specify multiple query parameters to be ANDed together (with the Node API), all you need to do is specify an array. In the future, if you wanted to OR them together you would use a single parameter with a comma separated list. So your code becomes
const params = {
parent,
resourceType,
date: [`le${end}`, `ge${start}`],
};
As a side note, the behaviour of search with the resourceType parameter is undefined, the method you want is searchType. These look the same due to the way the code is generated, but they function slightly differently.

spring data neo4j (SDN4) - find by relationship

I'm studying spring data for Neo4J and I've seen some examples where you just define a method in the repository interface following some standards (to find by a specific attribute) and it's automatically handled by spring. Ex: findByName.
It works quite straightforward with basic attributes but it doesn't seems to work when the attribute is actually a relationship.
See this example:
public class AcceptOrganizationTask extends AbstractTask {
#Relationship(type="RELATES_TO", direction = "OUTGOING")
private OrganizationInvite invitation;
...
}
In the repository interface I've defined 3 methods (All with the same goal):
List<AcceptOrganizationTask> findAllByInvitation(OrganizationInvite invite);
#Query("MATCH (i:OrganizationInvite)<-[RELATES_TO]-(t:AcceptOrganizationTask) WHERE i={invite} RETURN t")
List<AcceptOrganizationTask> getTaskByInvitation(#Param("invite") OrganizationInvite invite);
AcceptOrganizationTask findByInvitation_Id(Long invitationId);
None of them are able to retrieve the task by its invite property. But if I use use findAll() I can get the object with the property associated to the correct invitation.
Am I missing something ?
Bellow I have the Cypher code generated for this three methods:
findAllByInvitation
MATCH (n:`AcceptOrganizationTask`)
WHERE n.`invitation` = { `invitation_0` }
WITH n MATCH p=(n)-[*0..1]-(m) RETURN p, ID(n)
with params {invitation_0={entityId=15, version=0, createdOn=1484758262374, lastChanged=1484758262374, createUser=null, lastUpdatedBy=null, email=user2#acme.com, randomKey=fc940b14-12c3-4894-b2b4-728e3a6b8036, invitedUser={entityId=11, version=0, createdOn=1484758261450, lastChanged=1484758261450, createUser=null, lastUpdatedBy=null, name=User user2#acme.com, email=user2#acme.com, credentialsNonExpired=true, lastPasswordResetDate=null, authorities=null, authoritiesInDB=[], accountNonExpired=true, accountNonLocked=true, enabled=true, id=11}, id=15}}
getTasksByInvitation
MATCH (i:OrganizationInvite)<-[RELATES_TO]-(t:AcceptOrganizationTask)
WHERE i={invite} RETURN t with params {invite=15}
findByInvitation_Id
MATCH (n:`AcceptOrganizationTask`)
MATCH (m0:`OrganizationInvite`)
WHERE m0.`id` = { `invitation_id_0` }
MATCH (n)-[:`RELATES_TO`]->(m0)
WITH n
MATCH p=(n)-[*0..1]-(m) RETURN p, ID(n)
with params {invitation_id_0=15}
All entities inherit from a common AbstractEntity with and Long id field, annotated with #GraphId.
Am I missing something ?
At the moment derived finder queries support only works to one level of nesting.
We are hoping to add this feature soon in the coming weeks though.
For now you can write a custom #Query to get around this.

Sitecore Multisite Manager and 'source' field in template builder

Is there any way to parametise the Datasource for the 'source' field in the Template Builder?
We have a multisite setup. As part of this it would save a lot of time and irritation if we could point our Droptrees and Treelists point at the appropriate locations rather than common parents.
For instance:
Content
--Site1
--Data
--Site2
--Data
Instead of having to point our site at the root Content folder I want to point it at the individual data folders, so I want to do something like:
DataSource=/sitecore/content/$sitename/Data
I can't find any articles on this. Is it something that's possible?
Not by default, but you can use this technique to code your datasources:
http://newguid.net/sitecore/2013/coded-field-datasources-in-sitecore/
You could possibly use relative paths if it fits with the rest of your site structure. It could be as simple as:
./Data
But if the fields are on random items all over the tree, that might not be helpul.
Otherwise try looking at:
How to use sitecore query in datasource location? (dynamic datasouce)
You might want to look at using a Querable Datasource Location and plugging into the getRenderingDatasource pipeline.
It's really going to depend on your use cases. The thing I like about this solution is there is no need to create a whole bunch of controls which effectively do he same thing as the default Sitecore ones, and you don't have to individually code up each datasource you require - just set the query you need to get the data. You can also just set the datasource query in the __standard values for the templates.
This is very similar to Holger's suggestion, I just think this code is neater :)
Since Sitecore 7 requires VS 2012 and our company isn't going to upgrade any time soon I was forced to find a Sitecore 6 solution to this.
Drawing on this article and this one I came up with this solution.
public class SCWTreeList : TreeList
{
protected override void OnLoad(EventArgs e)
{
if (!String.IsNullOrEmpty(Source))
this.Source = SourceQuery.Resolve(SContext.ContentDatabase.Items[ItemID], Source);
base.OnLoad(e);
}
}
This creates a custom TreeList control and passes it's Source field through to a class to handle it. All that class needs to do is resolve anything you have in the Source field into a sitecore query path which can then be reassigned to the source field. This will then go on to be handled by Sitecore's own query engine.
So for our multi-site solution it enabled paths such as this:
{A588F1CE-3BB7-46FA-AFF1-3918E8925E09}/$sitename
To resolve to paths such as this:
/sitecore/medialibrary/Product Images/Site2
Our controls will then only show items for the correct site.
This is the method that handles resolving the GUIDs and tokens:
public static string Resolve(Item item, string query)
{
// Resolve tokens
if (query.Contains("$"))
{
MatchCollection matches = Regex.Matches(query, "\\$[a-z]+");
foreach (Match match in matches)
query = query.Replace(match.Value, ResolveToken(item, match.Value));
}
// Resolve GUIDs.
MatchCollection guidMatches = Regex.Matches(query, "^{[a-zA-Z0-9-]+}");
foreach (Match match in guidMatches)
{
Guid guid = Guid.Parse(match.Value);
Item queryItem = SContext.ContentDatabase.GetItem(new ID(guid));
if (item != null)
query = query.Replace(match.Value, queryItem.Paths.FullPath);
}
return query;
}
Token handling below, as you can see it requires that any item using the $siteref token is inside an Site Folder item that we created. That allows us to use a field which contains the name that all of our multi-site content folders must follow - Site Reference. As long at that naming convention is obeyed it allows us to reference folders within the media library or any other shared content within Sitecore.
static string ResolveToken(Item root, string token)
{
switch (token)
{
case "$siteref":
string sRef = string.Empty;
Item siteFolder = root.Axes.GetAncestors().First(x => x.TemplateID.Guid == TemplateKeys.CMS.SiteFolder);
if (siteFolder != null)
sRef = siteFolder.Fields["Site Reference"].Value;
return sRef;
}
throw new Exception("Token '" + token + "' is not recognised. Please disable wishful thinking and try again.");
}
So far this works for TreeLists, DropTrees and DropLists. It would be nice to get it working with DropLinks but this method does not seem to work.
This feels like scratching the surface, I'm sure there's a lot more you could do with this approach.

Adding SignedDataObjects (and consequently add proofOfApproval property) to an enveloped signature

I'm creating an Enveloped signature with xades4j following this statements:
Element elemToSign = doc.getDocumentElement();
XadesSigner signer = new XadesTSigningProfile(...).newSigner();
new Enveloped(signer).sign(elemToSign);
But I need to put in the signature also other properties like ProofOfApprova etc...
I see that in xades4j examples the proofOfApprovalProperties are addedto enveloped signature using different statements of signature, for example:
AllDataObjsCommitmentTypeProperty globalCommitment = AllDataObjsCommitmentTypeProperty.proofOfApproval();
CommitmentTypeProperty commitment = CommitmentTypeProperty.proofOfCreation();
DataObjectDesc obj1 = new DataObjectReference('#' + elemToSign.getAttribute("Id"))
.withTransform(new EnvelopedSignatureTransform())
.withDataObjectFormat(new DataObjectFormatProperty("text/xml", "MyEncoding")
.withDescription("Isto é uma descrição do elemento raiz")
.withDocumentationUri("http://doc1.txt")
.withDocumentationUri("http://doc2.txt"))
.withIdentifier("http://elem.root"))
.withCommitmentType(commitment)
.withDataObjectTimeStamp(dataObjsTimeStamp)
SignedDataObjects dataObjs = new SignedDataObjects(obj1)
.withCommitmentType(globalCommitment);
signer.sign(dataObjs, elemToSign);
I see here that another procedure of signature is used, more specificately the statement in which I create a DataObjectreference saying that I use "Id" attibute fo root tag is unusable for me because in input I can have any kind of xml document and I cannot know what kind of attribute (if present) I can use foe define root tag.
Briefly, can I have some examp'le code where I create an Enveloped signature and put a proofOfApproval property using "new Enveloped(signer).sign(elemToSign);", or anyway whitout knowing the xml source structure?
Thanks
M.
The proofOfApproval property has to be applied to data objects being signed, hence the need to use the SignedDataObjects class.
The Enveloped class is just a helper for straightforward scenarios. If I understood correctly you want to sign the whole XML document. The XML-Signatures spec defines that an empty URI on a reference (URI="") means exactly that. If you check the code on the Enveloped class you'll see that it adds a DataObjectReference with an empty uri.
To sum up, you'll need something like:
DataObjectDesc obj1 = new DataObjectReference("")
.withTransform(new EnvelopedSignatureTransform())
.withCommitmentType(CommitmentTypeProperty.proofOfApproval());
signer.sign(new SignedDataObjects(obj1), elemToSign);

Search Informatica for text in SQL override

Is there a way to search all the mappings, sessions, etc. in Informatica for a text string contained within a SQL override?
For example, suppose I know a certain stored procedure (SP_FOO) is being called somewhere in an INFA process, but I don't know where exactly. Somewhere I think there is a Post SQL on a source or target calling it. Could I search all the sessions for Post SQL containing SP_FOO ? (Similar to what I could do with grep with source code.)
You can use Repository queries for querying REPO tables(if you have enough access) to get data related with all the mappings,transformations,sessions etc.
Please use the below link to get almost all kind of repo queries.Ur answers can be find in the below link.
https://uisapp2.iu.edu/confluence-prd/display/EDW/Querying+PowerCenter+data
select *--distinct sbj.SUBJECT_AREA,m.PARENT_MAPPING_NAME
from REP_SUBJECT sbj,REP_ALL_MAPPINGS m,REP_WIDGET_INST w,REP_WIDGET_ATTR wa
where sbj.SUBJECT_ID = m.SUBJECT_ID AND
m.MAPPING_ID = w.MAPPING_ID AND
w.WIDGET_ID = wa.WIDGET_ID
and sbj.SUBJECT_AREA in ('TLR','PPM_PNLST_WEB','PPM_CURRENCY','OLA','ODS','MMS','IT_METRIC','E_CONSENT','EDW','EDD','EDC','ABS')
and (UPPER(ATTR_VALUE) like '%PSA_CONTACT_EVENT%'
-- or UPPER(ATTR_VALUE) like '%PSA_MEMBER_CHARACTERISTIC%'
-- or UPPER(ATTR_VALUE) like '%PSA_REPORTING_HH_CHRSTC%'
-- or UPPER(ATTR_VALUE) like '%PSA_REPORTING_MEMBER_CHRSTC%'
)
--and m.PARENT_MAPPING_NAME like '%ARM%'
order by 1
Please let me know if you have any issues.
Another less scientific way to do this is to export the workflow(s) as XML and use a text editor to search through them for the stored procedure name.
If you have read access to the schema where the informatica repository resides, try this.
SELECT DISTINCT f.subj_name folder, e.mapping_name, object_type_name,
b.instance_name, a.attr_value
FROM opb_widget_attr a,
opb_widget_inst b,
opb_object_type c,
opb_attr d,
opb_mapping e,
opb_subject f
WHERE a.widget_id = b.widget_id
AND b.widget_type = c.object_type_id
AND ( object_type_name = 'Source Qualifier'
OR object_type_name LIKE '%Lookup%'
)
AND a.widget_id = b.widget_id
AND a.attr_id = d.attr_id
AND c.object_type_id = d.object_type_id
AND attr_name IN ('Sql Query')--, 'Lookup Sql Override')
AND b.mapping_id = e.mapping_id
AND e.subject_id = f.subj_id
AND a.attr_value is not null
--AND UPPER (a.attr_value) LIKE UPPER ('%currency%')
Yes. There is a small java based tool called Informatica Meta Query.
Using that tool, you can search for any information that is present in the Informatica meta data tables.
If you cannot find that tool, you can write queries directly in the Informatica Meta data tables to get the required information.
Adding few more lines to solution provided by Data Origin and Sandeep.
It is highly advised not to query repository tables directly. Rather, you can create synonyms or views and then query those objects to avoid any damage to rep tables.
In our dev/ prod environment application programmers are not granted any direct access to repo. tables.
As querying the Informatica database isn't the best idea, I would suggest you to export all the workflows in your folder into xml using Repository Manager. From Rep Mgr you can select all of them once and export them at once. Then write a java program to search the pattern from the xml's you have.
I have written a sample prog here, please modify it as per your requirement:
make a spec file with workflow names(specFileName).
main()
{
try {
File inFile = new File(specFileName);
BufferedReader reader = new BufferedReader(newFileReader(infile));
String tectToSearch = '<YourString>';
String currentLine;
while((currentLine = reader.readLine()) != null)
{
//trim newline when comparing with String
String trimmedLine = currentLine.trim();
if(currentline has the string pattern)
{
SOP(specFileName); //specfile name
}
}
reader.close();
}
catch(IOException ex)
{
System.out.println("Error reading to file '" + specFileName +"'");
}
}