GATE annotation extraction - gate

I have used GATE to annotate the documents and saved the output as XML. I need all the entities with relations to be extracted to a CSV file. Can anyone please guide me on how to do it?

You can do it by executing this Groovy script in the pipeline:
new File("../outfile.csv").withWriterAppend { out ->
    doc.getAnnotations().each { anno ->
        if (anno.getFeatures()) {
            // one CSV row per feature of the annotation
            anno.getFeatures().each { fName, fValue ->
                out.writeLine(/"${doc.getName()}","${anno.getType()}","${doc.stringFor(anno)}",${anno.start()},${anno.end()},"${fName}","${fValue}"/)
            }
        } else {
            // annotation without features: leave the last two columns empty
            out.writeLine(/"${doc.getName()}","${anno.getType()}","${doc.stringFor(anno)}",${anno.start()},${anno.end()},,/)
        }
    }
}
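If the documents have already been saved as GATE XML, similar rows can be produced outside GATE. The following is a minimal Python sketch, not a GATE API. It assumes GATE's native XML layout: a TextWithNodes element whose Node ids are character offsets, and Annotation elements with Type, StartNode, EndNode attributes and Feature children. The function names and the sample document are made up for illustration, and relations are not handled.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the assumed GATE XML layout (not real GATE output).
SAMPLE = (
    '<GateDocument>'
    '<TextWithNodes><Node id="0"/>John<Node id="4"/> lives in '
    '<Node id="14"/>Paris<Node id="19"/></TextWithNodes>'
    '<AnnotationSet>'
    '<Annotation Id="1" Type="Person" StartNode="0" EndNode="4">'
    '<Feature><Name className="java.lang.String">gender</Name>'
    '<Value className="java.lang.String">male</Value></Feature>'
    '</Annotation>'
    '<Annotation Id="2" Type="Location" StartNode="14" EndNode="19"/>'
    '</AnnotationSet>'
    '</GateDocument>'
)


def gate_xml_to_rows(xml_text, doc_name):
    """Return one row per (annotation, feature) pair, like the Groovy script."""
    root = ET.fromstring(xml_text)
    # The document text is the concatenation of the text between <Node/> markers.
    text = "".join(root.find("TextWithNodes").itertext())
    rows = []
    for anno in root.iter("Annotation"):
        start, end = int(anno.get("StartNode")), int(anno.get("EndNode"))
        base = [doc_name, anno.get("Type"), text[start:end], start, end]
        features = anno.findall("Feature")
        if not features:
            # No features: keep the row, leave the last two columns empty.
            rows.append(base + ["", ""])
        for feature in features:
            rows.append(base + [feature.findtext("Name"), feature.findtext("Value")])
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text, quoting every non-numeric field."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
    return buffer.getvalue()


rows = gate_xml_to_rows(SAMPLE, "doc1.xml")
print(rows_to_csv(rows), end="")
```

For real files, read each saved XML file and pass its content and name to gate_xml_to_rows, then write the combined rows to a single CSV file.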

Can I use a ForAll and UpdateIf within a local offline Powerapps collection?

Can anyone help? I need assistance collecting multiple records from a gallery and saving them to a local collection when offline.
When my app is connected, my script uses a ForAll to go through all the gallery items and, if the Question ID matches the ID in the gallery, patches the records to the SQL database. This part works fine.
However, when offline, I collect the items and save them to a local collection called LocalAnswers. It only saves 1 record (instead of 20) and does not pull in the Question ID. I have tried inserting a ForAll and UpdateIf within my Collect function but can't seem to get it right. Any ideas?
If(
    Connection.Connected,
    ForAll(
        Gallery2.AllItems,
        UpdateIf(
            AuditAnswers,
            ID = Value(IDGal.Text),
            {
                AuditID: IDAuditVar,
                Answer: Radio1.Selected.Value,
                Action: ActionGal.Text,
                AddToActionPlan: tglAction.Value
            }
        )
    ),
    Collect(
        LocalAnswers,
        {
            AuditID: IDAuditVar,
            Answer: Radio1.Selected.Value,
            Action: ActionGal.Text,
            AddToActionPlan: tglAction.Value
        }
    )
);
Collect only pulls a single record because you only have a single record defined (everything between the {}).
I don't typically create collections from Gallery.AllItems but rather from a data source (SharePoint, SQL, another collection, etc.), so I'm not sure whether this will work without testing.
Try something like:
ForAll(
    Gallery2.AllItems,
    Collect(
        colLocalAnswers,
        {
            AuditID: ThisRecord.AuditID, // or whatever the control's name is
            Answer: ThisRecord.Radio1.Selected.Value,
            Action: ThisRecord.ActionGal.Text,
            AddToActionPlan: ThisRecord.tglAction.Value
        }
    )
);
SaveData(colLocalAnswers, "localfile")

Cannot create index on non-empty table

I'm currently using AWS Lambda (NodeJS) with AWS QLDB.
The scenario is like this.
The first table and its indexes were created when I deployed the service. My problem is that when I later need to add a new table and its indexes, it can't create the index because there is an existing table.
My workaround for creating a new table even when a table already exists in my ledger is to query the list of tables I have.
const getTables = async (transactionExecutor: TransactionExecutor) => {
  const statement = `SELECT name FROM information_schema.user_tables`;
  return await transactionExecutor.execute(statement);
};
Then I have this condition to check if the table is already existing
const tables = JSON.stringify(result.getResultList());
if (
  !JSON.parse(tables).some((object): boolean => object.name === process.env.TABLE_NAME)
) {
  console.log('TABLE A NOT EXISTING');
  await createTable(transactionExecutor, process.env.TABLE_NAME);
}
if (
  !JSON.parse(tables).some(
    (object): boolean => object.name === process.env.TABLE_NAME_1,
  )
) {
  console.log('TABLE B NOT EXISTING');
  await createTable(transactionExecutor, process.env.TABLE_NAME_1);
}
I don't know how to do the same for indexes. I tried using SQL commands in QLDB, but that isn't working.
I hope you can help me.
Thank you
I'm not quite sure what your question is (the post title and body hint at different things), but I'm going to do my best to answer.
First, QLDB stores data in Ion, not JSON. So, please use the Ion APIs to parse data and not the JSON ones. The reason your code works at all is because Ion is a superset of JSON and the result set doesn't include types that are unknown to JSON. So, for example, if the result set was changed to include an Ion Timestamp, then your code would break.
Next, actually getting a list of tables has first class support in the driver. Simply use driver.getTableNames.
Third, I think you have a question "can I add an index to a non-empty table?". The answer is "no". This is planned functionality and I will update this answer when it is available. UPDATE: Now you can! https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-qldb-launches-index-improvements/
Finally, I think you're also asking if there is a way to list indexes on a table in the same way as you can list tables in a ledger. The answer to that is 'yes'. The documents returned in information_schema.user_tables look like this:
{
  tableId:"...",
  name:"THE_TABLE_NAME",
  indexes:[
    {
      expr:"[THE_FIELD_BEING_INDEXED]"
    }
  ],
  status:"ACTIVE"
}
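Given documents of that shape, deciding which indexes still need to be created comes down to comparing the expr values with the fields you want indexed. Here is a minimal Python sketch over plain dictionaries rather than the QLDB driver. The function names and the sample table and field names are hypothetical, and it assumes expr is always the field name wrapped in square brackets, as in the document above.

```python
def indexed_fields(table_doc):
    """Return the set of field names that already have an index."""
    fields = set()
    for index in table_doc.get("indexes", []):
        # expr looks like "[FieldName]"; strip the surrounding brackets.
        fields.add(index["expr"].strip("[]"))
    return fields


def missing_indexes(table_doc, wanted_fields):
    """Return the wanted fields that are not indexed yet, in the given order."""
    existing = indexed_fields(table_doc)
    return [field for field in wanted_fields if field not in existing]


# Hypothetical table document, shaped like a row of information_schema.user_tables.
table = {
    "tableId": "...",
    "name": "Vehicle",
    "indexes": [{"expr": "[VIN]"}],
    "status": "ACTIVE",
}
print(missing_indexes(table, ["VIN", "LicensePlate"]))  # prints ['LicensePlate']
```

Only the fields returned by missing_indexes would then need a CREATE INDEX statement.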

Annotating a document with JAPE

I have been searching for a solution to this for weeks. I have some documents (about 95) that I am trying to classify using GATE. I have put them in one corpus that I called training_corpus. However, after ANNIE has annotated the corpus, I have to go back into each file, select all the tokens in the document, and create an annotation called Mention, with a feature named type whose value is the class of the document. For example:
type     Start  End    id    Features
Mention  0      70000  2588  {type=neg}
Is there any way to do this automatically with JAPE? Basically, I want to select all tokens and create a new annotation with the feature type=class. Also, the class is part of the document name. Since there are many documents, can JAPE extract the class from the document name and set it as the value of the Mention's feature? An example document name is neg_data1.txt, so the annotation would have Mention.type = neg.
Any help will be greatly appreciated. Thanks
I think you have answered your own question. If the class assignment is based on just a token present in the text, why not simply process the text outside of GATE?
For example, to create an XML file like:
text and then use it in the training process.
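If you do process the files outside GATE, the class can be taken from the file name. Here is a minimal Python sketch, assuming every name follows the class-prefix pattern of neg_data1.txt from the question. The function name class_from_filename is made up for illustration.

```python
from pathlib import Path


def class_from_filename(filename):
    """Return the class label encoded before the first underscore of the name."""
    stem = Path(filename).stem  # "neg_data1.txt" -> "neg_data1"
    label, separator, _rest = stem.partition("_")
    if not separator:
        # Assumption violated: the name has no "<class>_" prefix.
        raise ValueError(f"no class prefix in {filename!r}")
    return label


print(class_from_filename("neg_data1.txt"))  # prints neg
```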
You can also create a simple JAPE rule which will:
a) take the text within the document boundaries (see the gate.Utils.length methods, AFAIR)
b) based on the presence of your token, create a new Annotation instance with the necessary features.
An abstract example:
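Phase: Instance
Input: Token
Options: control = once
Rule: Instance
(
  {Token}
):instance
-->
{
  // Abstract example: "INSTANCE_ANNOTATION" and the feature values are placeholders.
  AnnotationSet instances = outputAS.get("INSTANCE_ANNOTATION");
  FeatureMap featureMap = Factory.newFeatureMap();
  if (instances != null && !instances.isEmpty()) {
    featureMap.put("type", "value when the annotation is present in the doc");
  } else {
    featureMap.put("type", "value when the annotation is not in the doc");
  }
  try {
    // One Mention annotation spanning the whole document.
    outputAS.add(0L, doc.getContent().size(), "Mention", featureMap);
  } catch (InvalidOffsetException e) {
    throw new GateRuntimeException(e);
  }
}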

Search Informatica for text in SQL override

Is there a way to search all the mappings, sessions, etc. in Informatica for a text string contained within a SQL override?
For example, suppose I know a certain stored procedure (SP_FOO) is being called somewhere in an INFA process, but I don't know where exactly. Somewhere I think there is a Post SQL on a source or target calling it. Could I search all the sessions for Post SQL containing SP_FOO ? (Similar to what I could do with grep with source code.)
You can query the repository tables (if you have sufficient access) to get data about all the mappings, transformations, sessions, etc.
The link below covers almost every kind of repository query, and your answer can be found there.
https://uisapp2.iu.edu/confluence-prd/display/EDW/Querying+PowerCenter+data
select * --distinct sbj.SUBJECT_AREA,m.PARENT_MAPPING_NAME
from REP_SUBJECT sbj, REP_ALL_MAPPINGS m, REP_WIDGET_INST w, REP_WIDGET_ATTR wa
where sbj.SUBJECT_ID = m.SUBJECT_ID
  and m.MAPPING_ID = w.MAPPING_ID
  and w.WIDGET_ID = wa.WIDGET_ID
  and sbj.SUBJECT_AREA in ('TLR','PPM_PNLST_WEB','PPM_CURRENCY','OLA','ODS','MMS','IT_METRIC','E_CONSENT','EDW','EDD','EDC','ABS')
  and (UPPER(ATTR_VALUE) like '%PSA_CONTACT_EVENT%'
    -- or UPPER(ATTR_VALUE) like '%PSA_MEMBER_CHARACTERISTIC%'
    -- or UPPER(ATTR_VALUE) like '%PSA_REPORTING_HH_CHRSTC%'
    -- or UPPER(ATTR_VALUE) like '%PSA_REPORTING_MEMBER_CHRSTC%'
  )
  --and m.PARENT_MAPPING_NAME like '%ARM%'
order by 1
Please let me know if you have any issues.
Another less scientific way to do this is to export the workflow(s) as XML and use a text editor to search through them for the stored procedure name.
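That manual search can also be scripted. Here is a minimal Python sketch, assuming the workflows were exported to .xml files somewhere under one directory. The directory layout and the function name files_containing are assumptions, not part of Informatica. It does a case-insensitive substring search, much like grep -ril.

```python
from pathlib import Path


def files_containing(export_dir, needle):
    """Return the exported XML files under export_dir whose text contains needle."""
    needle = needle.lower()
    hits = []
    for path in sorted(Path(export_dir).rglob("*.xml")):
        # errors="replace" keeps the scan going if an export has odd bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            hits.append(path)
    return hits
```

For example, files_containing("exports", "SP_FOO") would list every export under a hypothetical exports directory that mentions the stored procedure.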
If you have read access to the schema where the Informatica repository resides, try this:
SELECT DISTINCT f.subj_name folder, e.mapping_name, object_type_name,
       b.instance_name, a.attr_value
  FROM opb_widget_attr a,
       opb_widget_inst b,
       opb_object_type c,
       opb_attr d,
       opb_mapping e,
       opb_subject f
 WHERE a.widget_id = b.widget_id
   AND b.widget_type = c.object_type_id
   AND ( object_type_name = 'Source Qualifier'
         OR object_type_name LIKE '%Lookup%'
       )
   AND a.widget_id = b.widget_id
   AND a.attr_id = d.attr_id
   AND c.object_type_id = d.object_type_id
   AND attr_name IN ('Sql Query')--, 'Lookup Sql Override')
   AND b.mapping_id = e.mapping_id
   AND e.subject_id = f.subj_id
   AND a.attr_value is not null
   --AND UPPER (a.attr_value) LIKE UPPER ('%currency%')
Yes. There is a small Java-based tool called Informatica Meta Query.
Using that tool, you can search for any information that is present in the Informatica meta data tables.
If you cannot find that tool, you can write queries directly in the Informatica Meta data tables to get the required information.
Adding a few more lines to the solutions provided by Data Origin and Sandeep.
It is highly advisable not to query the repository tables directly. Instead, create synonyms or views and query those objects, to avoid any damage to the repository tables.
In our dev/prod environments, application programmers are not granted any direct access to the repository tables.
As querying the Informatica database isn't the best idea, I would suggest exporting all the workflows in your folder to XML using Repository Manager. In Repository Manager you can select them all and export them in one go. Then write a Java program to search the exported XML files for the pattern.
I have written a sample program here; please modify it to suit your requirements:
Make a spec file with the workflow names (specFileName).
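import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SearchSpecFile {
    public static void main(String[] args) {
        String specFileName = args[0];        // file to scan, passed on the command line
        String textToSearch = "<YourString>"; // replace with the text to look for, e.g. SP_FOO
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(specFileName))) {
            String currentLine;
            while ((currentLine = reader.readLine()) != null) {
                if (currentLine.contains(textToSearch)) {
                    // report the file name together with the matching line
                    System.out.println(specFileName + ": " + currentLine.trim());
                }
            }
        } catch (IOException ex) {
            System.out.println("Error reading file '" + specFileName + "'");
        }
    }
}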

Does Qt Linguist offer the ability to add new entries to the editable .ts file?

I didn't find a way to do this, only a way to edit the translations of the existing entries.
If there is no way to achieve this, how should it be done (ideally automatically)? Right now I am manually adding
<message>
    <source>x</source>
    <translation>xx</translation>
</message>
blocks to my .ts file, and I assume that's not the correct way.
No, that's not the correct way :) Use tr() in the code to mark strings for translation.
For example
label->setText( tr("Error") );
Then you run lupdate for your project to extract them to a .ts file. See here for more details.
Or do you need to translate strings that are not in the source code?
I just wrote a Python script that uses ElementTree to insert new entries into the .ts file for a homegrown parser. The XML it adds isn't pretty-printed, but I believe it works just fine (so far):
from xml.etree import ElementTree as ET

tree = ET.parse(infile)
root = tree.getroot()
# iter() replaces getiterator()/getchildren(), which were removed in Python 3.9.
for context in root.iter("context"):
    # Append a new <message> to the <context> whose <name> matches target.
    if context.findtext("name") == target:
        message = ET.SubElement(context, "message")
        source = ET.SubElement(message, "source")
        source.text = newtext
        translation = ET.SubElement(message, "translation")
        translation.text = "THE_TRANSLATION"
tree.write(outfile)
Where infile is the .ts file, outfile may be the same as infile or different.
target is the context you are looking for to add a new message into,
and newtext is of course the new source text.