GATE annotation counting - gate

I am trying to figure out a way of counting some annotations in GATE e.g. if I have some annotations occurring multiple times in a text document and if I want to count it, is there some sort of plugin that can help me?
Thanks

Another option is to use a groovy script. The code is from here:
sum = 0
docs.findAll{ // get all documents currently loaded
def filteredAnnots = it.getAnnotations("Filtered")
num = filteredAnnots["Anatomy"].size()
sum += num
println it.name + " " + num // or print to a file here
}
println "total:" + " " + sum
You can also easily put this code in a groovy plugin (PR) and run it as part of the pipeline, described here.

All that is necessary is just to get these annotations and then call a .size() method. AnnotationSet in GATE extends Java collection classes.
AnnotationSet annotationSet = gateDocument.getAnnotations().get("ABC");
int size=annotationSet.size();

Related

Why won't my list.append() not append despite working in print output

Below is my code for the leet code problem: here
class Solution:
def threeSum(self, nums: List[int]) -> List[List[int]]:
tripletsList = []
uniqueNums = set(nums)
uniqueNums = list(nums)
for x in range(len(uniqueNums)):
for y in range(len(uniqueNums)):
for z in range(len(uniqueNums)):
print(uniqueNums[x],uniqueNums[y],uniqueNums[z])
if (uniqueNums[x] + uniqueNums[y] + uniqueNums[z]) == 0:
if [[uniqueNums[x],uniqueNums[y],uniqueNums[z]]] not in tripletsList:
tripletsList.append([uniqueNums[x],uniqueNums[y],uniqueNums[z]])
print(tripletsList)
I realize it isn't the optimal solution but in my print line the output shows the correct combination, yet my final triplets list is not eliminating anything which leads me to believe I am not using the "not in" function of python correctly. Any help is greatly appreciated and I must be missing something that is just trivial
I tried first printing every combination of the unique list (yes tons of overlap), then I tried limiting the triplets that made it pass the == 0 check to only triplets that were not already in the triplets list (failed)

Scoring multiple TRUES in Pythton RE Search

Background
I have a list of "bad words" in a file called bad_words.conf, which reads as follows
(I've changed it so that it's clean for the sake of this post but in real-life they are expletives);
wrote (some )?rubbish
swore
I have a user input field which is cleaned and striped of dangerous characters before being passed as data to the following script, score.py
(for the sake of this example I've just typed in the value for data)
import re
data = 'I wrote some rubbish and swore too'
# Get list of bad words
bad_words = open("bad_words.conf", 'r')
lines = bad_words.read().split('\n')
combine = "(" + ")|(".join(lines) + ")"
#set score incase no results
score = 0
#search for bad words
if re.search(combine, data):
#add one for a hit
score += 1
#show me the score
print(str(score))
bad_words.close()
Now this finds a result and adds a score of 1, as expected, without a loop.
Question
I need to adapt this script so that I can add 1 to the score every time a line of "bad_words.conf" is found within text.
So in the instance above, data = 'I wrote some rubbish and swore too' I would like to actually score a total of 2.
1 for "wrote some rubbish" and +1 for "swore".
Thanks for the help!
Changing combine to just:
combine = "|".join(lines)
And using re.findall():
In [33]: re.findall(combine,data)
Out[33]: ['rubbish', 'swore']
The problem with having the multiple capturing groups as you originally were doing is that re.findall() will return each additional one of those as an empty string when one of the words is matched.

This cell duplicates image in that cell in spreadsheet?

Very simple. Cell A1 has an image in it, which I supply through Insert:Picture:From File...
Now I want Cell A3 to automatically show the same picture. I simply can't find a way-- certainly the "=" doesn't work. At this point, I don't care if the images are "links" or embedded, I just want it to work. Can it? Thx.
Edit, 09-01-17, based on Jim K's idea, here's macro code I have installed:
REM ***** BASIC *****
Private oListener as Object
Private cellA1 as Object
Sub AddListener
Dim Doc, Sheet, Cell as Object
Doc = ThisComponent
Sheet = Doc.Sheets.getByName("Sheet1")
cellA1 = Sheet.getCellrangeByName("A1")
'create a listener
oListener = createUnoListener("Modify_","com.sun.star.util.XModifyListener")
'register the listener
cellA1.addModifyListener(oListener)
End Sub
Sub Modify_disposing(oEv)
End Sub
Sub RmvListener
cellA1.removeModifyListener(oListener)
End Sub
' macro jumps here when oListener detects modification of Sheet
Sub Modify_modified(oEv)
Doc = ThisComponent
Sheet = Doc.Sheets.getByIndex(0)
originCell = Sheet.getCellByPosition(0,0)
originValue = originCell.Value
if originValue then
print "originValue is " & originValue
else
print "originValue zero"
end if
End Sub
The problem, ironically, is that it works. It works for integers and {non-value}, I mean an empty cell.
So any integer not zero prints TRUE, zero prints FALSE, and empty cell prints FALSE.
But that's where it quits working-- any kind of string "asdf" also returns FALSE.
Maybe that could be fixed, but there's something a lot worse: When I paste an image in the cell, or use the Insert/Image/From File... menu, or Cut an existing image... Nothing happens! The Sheet Modified business does not trigger the expected routine.
Any hope? Thx.
As you discovered, the solution in my comment does not work, because the Content changed event will not trigger when images are added. I looked into other events as well, but they did not work either.
So instead, we can set up a function that runs periodically. Each time it runs, it checks the count of images, and if any have been added or removed, it calls update_copied_images() below, which currently simply reports the value of cell A1, as in your code.
As explained here, I have not gotten a timer loop to work in Basic without crashing, so we can use Python instead (a better language, so this is not a drawback in my opinion).
import time
from threading import Thread
import uno
COLUMN_A, COLUMN_B, COLUMN_C = 0, 1, 2
FIRST_ROW = 0
def start_counting_images(action_event=None):
t = Thread(target = keep_counting_images)
t.start()
def keep_counting_images():
oDoc = XSCRIPTCONTEXT.getDocument()
oSheet = oDoc.getSheets().getByIndex(0)
oDrawPage = oSheet.getDrawPage()
messageCell = oSheet.getCellByPosition(COLUMN_C, FIRST_ROW)
messageCell.setString("Starting...")
prevCount = -1
while hasattr(oDoc, 'calculateAll'): # keep going until document is closed
count = oDrawPage.Count
if prevCount == -1 or prevCount != count:
prevCount = count
messageCell.setString("Number of Images: " + str(prevCount))
update_copied_images(oSheet)
time.sleep(1)
def update_copied_images(oSheet):
originCell = oSheet.getCellByPosition(COLUMN_A, FIRST_ROW)
originString = originCell.getString()
messageCell = oSheet.getCellByPosition(COLUMN_B, FIRST_ROW)
if len(originString):
messageCell.setString("originString '" + originString + "'")
else:
messageCell.setString("originString length is zero")
g_exportedScripts = start_counting_images,
To run, go to Tools -> Macros -> Run Macro, find the .py file under My Macros where you put this code, and run start_counting_images.

print sum of duplicate numbers and product of non duplicate numbers from the list

I am new to python. I am trying to print sum of all duplicates nos and products of non-duplicates nos from the python list. for examples
list = [2,2,4,4,5,7,8,9,9]. what i want is sum= 2+2+4+4+9+9 and product=5*7*8.
There are pythonic one liners that can do this but here is an explicit way you might find easier to understand.
num_list = [2,2,4,4,5,7,8,9,9]
sum_dup = 0
product = 1
for n in num_list:
if num_list.count(n) == 1:
product *= n
else:
sum_dup += n
Also side note, don't call your list the name "list", it interferes with the builtin name of the list type.
count is useful for this. Sum is built in, but there is no built in "product", so using reduce is the easiest way to do this.
from functools import reduce
import operator
the_sum = sum([x for x in list if list.count(x)>1])
the_product = reduce(operator.mul, [x for x in lst if lst.count(x)==1])
Use a for loop to read a number from the list. create a variable and assign the number to it, read another number and compare them using an if statement. If they are the same sum them like sameNumSum+=sameNumSum else multiply them. Before for loop create these two variables and initialize them. I just gave you the algorithm to it, you can change it into your code. Hope that help though.

Notepad++ or PowerGrep regex to find, multiply and replace a price

I have an xml file that i need to find, multiply (e.g. by 1.25) and replace all prices.
Price tag looks like that:
<price><![CDATA[15.9]]></price>
The price tag should look like that after the operation:
<price><![CDATA[19.875]]></price>
Can this be done in Notepad++ or PowerGrep using a regular expression?
Thanks in advance.
As far as I know you can't use either program to preform the math but you can build a simple program in most any language of your choice to take a file use regex to find the number. Cast that string as a double do the math and put it back into the string. Later on today I could probably build something in c# but it should be relatively straight forward in most languages. You may even be able to build a shell script and use grep if you're not in a windows environment or use Powershell for windows but I have less experience with Powershell.
Edit: There is an easier way to do this http://msdn.microsoft.com/en-us/library/hcebdtae(v=vs.110).aspx
this is essentially what you want to do using the xmldocument object.
Edit2: I did this even though I couldn't get a hold of the original poster I thought someone might be able to use the info and I learned a lot. I can add the source code to github if anyone is interested.
public static void ChangePricesWork(string filepath, double multiply)
{
var document = new XmlDocument();
document.Load(filepath);
XmlNodeList nodeList = document.GetElementsByTagName("price");
foreach (XmlNode node in nodeList)
{
if (!string.IsNullOrEmpty(node.InnerText))
{
node.InnerText = Convert.ToString(multiplyPrice(multiply, node.InnerText));
}
}
string newFilePath = string.Format(#"{0}\{1}_updated.xml", Path.GetDirectoryName(filepath), Path.GetFileNameWithoutExtension(filepath));
document.Save(newFilePath);
}
private static double multiplyPrice(double multiply, string oldPrice)
{
var newPrice = new double();
if (Double.TryParse(oldPrice, out newPrice))
{
newPrice = newPrice * multiply;
}
return newPrice;
}
Notepad++ has a Pythonscript plugin that allows you to create quick Python scripts that have access to your document and Notepad++ itself.
The install and setup is described in this answer.
The API has moved on a little bit since then, you do a regular expression replace with Editor.rereplace now.
# Start a sequence of actions that is undone and redone as a unit. May be nested.
editor.beginUndoAction()
# multiply_price_cdata
from decimal import *
TWOPLACES = Decimal(10) ** -2
def multiply_price_cdata( m ):
price = Decimal( m.group(2) ) * Decimal( 1.25 )
return m.group(1) + str(price.quantize(TWOPLACES)) + m.group(3)
def cdata( m ):
return "CDATA"
# npp++ search/replace
re_price = r'(<price><!\[CDATA\[)(\d+\.\d+|\d+)(\]\]></price>)'
editor.rereplace( re_price , multiply_price_cdata )
# end the undo sequence
editor.endUndoAction()