I am building a tag search function. The user can select many tags; I collect them all in one tuple, and now I would like to find all texts that contain at least one tag from that tuple.
Schematically: text__contains__in=('asd', 'dsa')
My only idea is a loop, e.g.:
q = text.objects.all()
for t in tag_tuple:
    q = q.filter(data__contains=t)
For example:
input tuple of tags: ('car', 'cat', 'cinema')
output: all messages that contain at least one word from that tuple, e.g. "My cat is in the car", "cat is not allowed in the cinema", "i will drive my car to the cinema"
Thanks for help!
Here you go:
from django.db.models import Q

filter = Q()
for t in tag_tuple:
    filter = filter | Q(data__contains=t)
return text.objects.filter(filter)
A couple of tips:
You should name your model classes with a capital letter (i.e. Text, not text)
You may want __icontains instead if case doesn't matter
I don't know Django, so I have no idea how to apply this filter, but it seems you want a function like this one:
def contains_one_of(tags, text):
    words = text.split()  # tags should match complete words, not partial words
    return any(t in words for t in tags)
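A quick self-contained check of this approach against the question's sample messages (the function is repeated here so the snippet runs on its own; the fourth message is made up as a non-matching case):

```python
def contains_one_of(tags, text):
    words = text.split()  # tags should match complete words, not partial words
    return any(t in words for t in tags)

messages = [
    'My cat is in the car',
    'cat is not allowed in the cinema',
    'i will drive my car to the cinema',
    'nothing to see here',
]
matching = [m for m in messages if contains_one_of(('car', 'cat', 'cinema'), m)]
# the first three messages match; the last does not
```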
I'm looking for help on a tricky QRegExp that I'd like to pass to my QSortFilterProxyModel.setFilterRegExp. I've been struggling to find a solution that handles my use case.
From the sample code below, I need to capture items with two underscores (_) but ONLY if they have george or brian. I do not want items that have more or less than two underscores.
string_list = [
'john','paul','george','ringo','brian','carl','al','mike',
'john_paul','paul_george','john_ringo','george_ringo',
'john_paul_george','john_paul_brian','john_paul_ringo',
'john_paul_carl','paul_mike_brian','john_george_brian',
'george_ringo_brian','paul_george_ringo','john_george_ringo',
'john_paul_george_ringo','john_paul_george_ringo_brian','john_paul_george_ringo_brian_carl',
]
view = QListView()
model = QStringListModel(string_list)
proxy_model = QSortFilterProxyModel()
proxy_model.setSourceModel(model)
view.setModel(proxy_model)
view.show()
The first part (matching two underscores) can be accomplished with the line (simplified here, but really each token can be composed of any alphanumeric character other than _, so [a-zA-Z0-9]*):
proxy_model.setFilterRegExp('^[a-z]*_[a-z]*_[a-z]*$')
The second part can be accomplished independently with:
proxy_model.setFilterRegExp('george|brian')
To complicate matters, these additional criteria apply:
This list may grow to the realm of several thousand items,
The tokenization may reach up to 10 or so tokens
The tokenization can be in any order (so george could occur at the beginning, middle, end)
We may also want to capture georgeH and brianW35 when they occur, so long as they begin with george or brian.
We may have N names we're searching for (i.e. george|brian|jim|al), but only when they're in strings with two underscores.
To simplify them:
Lines will never begin or end with "_", and should only ever begin/end with [a-zA-Z0-9]
Do the QRegExp and QSortFilterProxyModel even have the capabilities I'm looking for, or will I need to resort to some other approach?
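For what it's worth, both criteria can be combined into one pattern: a lookahead that enforces exactly three [a-zA-Z0-9]+ tokens joined by two underscores, followed by a check for a token that starts with one of the names (note that _ is a word character, so \b cannot serve as a token boundary here). I have only verified this with Python's re module; QRegExp documents lookahead support, but treat that side as an assumption to test in your Qt version:

```python
import re

names = ['george', 'brian']
# lookahead: exactly two underscores separating three alphanumeric tokens;
# then require some token that starts with one of the listed names
pattern = r'^(?=(?:[a-zA-Z0-9]+_){2}[a-zA-Z0-9]+$).*(?:^|_)(?:%s)' % '|'.join(names)

samples = ['john_paul_george', 'john_paul_ringo', 'george_ringo_brian',
           'john_paul_george_ringo', 'brianW35_paul_ringo', 'george']
matches = [s for s in samples if re.search(pattern, s)]
# keeps 'john_paul_george', 'george_ringo_brian' and 'brianW35_paul_ringo'
```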
For very complex conditions, a regex is not very useful; in that case it is better to override the filterAcceptsRow method, where you can implement the filter function yourself, as shown in the following trivial example:
class FilterProxyModel(QSortFilterProxyModel):
    _words = None
    _number_of_underscore = -1

    def filterAcceptsRow(self, source_row, source_parent):
        text = self.sourceModel().index(source_row, 0, source_parent).data()
        if not self._words or self._number_of_underscore < 0:
            return True
        return (
            any(word in text for word in self._words)
            and text.count("_") == self._number_of_underscore
        )

    @property
    def words(self):
        return self._words

    @words.setter
    def words(self, words):
        self._words = words
        self.invalidateFilter()

    @property
    def number_of_underscore(self):
        return self._number_of_underscore

    @number_of_underscore.setter
    def number_of_underscore(self, number):
        self._number_of_underscore = number
        self.invalidateFilter()
view = QListView()
model = QStringListModel(string_list)
proxy_model = FilterProxyModel()
proxy_model.setSourceModel(model)
view.setModel(proxy_model)
view.show()
proxy_model.number_of_underscore = 2
proxy_model.words = (
"george",
"brian",
)
In an effort to make our budgeting life a bit easier and to help myself learn, I am creating a small program in Python that takes data from our exported bank CSV.
I will give you an example of what I want to do with this data. Say I want to group all of my fast food expenses together. There are many different names with different totals in the description column, but I want to see them all tabulated as one "Fast Food" expense.
For instance, the CSV is set up like this:
Date     Description            Debit  Credit
1/20/20  POS PIN BLAH BLAH ###  1.75   NaN
I figured out how to group them with an or statement:
contains = df.loc[df['Description'].str.contains('food court|whataburger', flags = re.I, regex = True)]
Ultimately, I would like it to read from a list. I would like to group all my expenses into categories and check those category variables, so that it only outputs matches from each list.
I tried something like:
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
That obviously didn't work.
If there is a better way of doing this I am wide open to suggestions.
Also I have looked through quite a few posts here on stack and have yet to find the answer (although I am sure I overlooked it)
Any help would be greatly appreciated. I am still learning.
Thanks
You can assign a new column using str.extract and then groupby:
import re
import pandas as pd

df = pd.DataFrame({"description": ['Macdonald something', 'Whataburger something', 'pizza hut something',
                                   'Whataburger something', 'Macdonald something', 'Macdonald otherthing'],
                   "debit": [1.75, 2.0, 3.5, 4.5, 1.5, 2.0]})
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
df["found"] = df["description"].str.extract(f'({"|".join(fast_food)})', flags=re.I)
print(df.groupby("found").sum())
             debit
found
Macdonald     5.25
Whataburger   6.50
pizza hut     3.50
Use dynamic pattern building:
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, fast_food)))
contains = df.loc[df['Description'].str.contains(pattern, flags = re.I, regex = True)]
The \b word boundaries find whole words, not partial words.
re.escape protects special characters so they are parsed as literal characters.
If \b does not work for you, check other approaches at Match a whole word in a string using dynamic regex
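A quick sanity check of the built pattern with plain re (the sample descriptions below are invented), showing that the word boundaries reject partial matches while re.escape keeps the multi-word entry intact:

```python
import re

fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
# escape each name, join with |, and wrap in word boundaries
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, fast_food)))

samples = ['POS PIN WHATABURGER 123', 'pizza hut #42', 'Whataburgers Inc']
hits = [s for s in samples if re.search(pattern, s, flags=re.I)]
# 'Whataburgers Inc' is rejected: the trailing \b blocks the partial-word match
```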
I have the following model:
class Address(models.Model):
full_address = models.CharField(max_length=100)
Some full_address ends with "Region". Examples:
123 Main Street, Markham, York Region
1 Bloor Street, Mississauga, Peel Region
I want to remove "Region" from any full_address field that ends with it.
Here is one possible solution, but it is slow, since it processes each Address one by one:
for i in Address.objects.filter(full_address__endswith=' Region'):
    i.full_address = i.full_address[:-7]
    i.save()
Is there some way to achieve the above function using Address.objects.update()?
You can use Query Expressions here
from django.db.models import F, Func, Value

Address.objects.filter(full_address__endswith=' Region').update(
    full_address=Func(
        F('full_address'),
        Value(' Region'), Value(''),
        function='replace',
    )
)
Note that if a full_address could contain the text ' Region' in the middle as well as at the end, this will replace both occurrences with the empty string. That seems unlikely, but if you want to be especially careful, you could use regexp_replace instead of replace, with an anchored pattern as the first Value (i.e. Value(' Region$')), to ensure you only replace the trailing occurrence.
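To illustrate that difference outside the ORM (the address below is made up): a plain replace strips every occurrence, while an anchored regex only strips the trailing one:

```python
import re

addr = '10 Region Road, Markham, York Region'
everywhere = addr.replace(' Region', '')  # strips both occurrences
trailing = re.sub(r' Region$', '', addr)  # strips only the final ' Region'
# everywhere -> '10 Road, Markham, York'
# trailing   -> '10 Region Road, Markham, York'
```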
I have been trying to search for an item that is in a text file.
The text file is like
Eg:
>HEADING
00345
XYZ
MethodName : fdsafk
Date: 23-4-2012
More text and some part containing instances of XYZ
So I did a dictionary search for XYZ initially and found the positions, but I want only the first XYZ, not the rest. A useful property of XYZ is that it always occurs between the 5-digit code and the text MethodName.
I am unable to do that.
WORDLIST ZipList = 'Zipcode.txt';
DECLARE Zip;
Document
Document{-> MARKFAST(Zip, ZipList)};
DECLARE Method;
"MethodName" -> Method;
WORDLIST typelist = 'typelist.txt';
DECLARE type;
Document{-> MARKFAST(type, typelist)};
Also how do we use REGEX in UIMA RUTA?
There are many ways to specify this. Here are some examples (not tested):
// just remove the other annotations (assuming type is the one you want)
type{-> UNMARK(type)} ANY{-STARTSWITH(Method)};
// only keep the first one: remove any annotation if there is one somewhere in front of it
// you can also specify this with POSITION or CURRENTCOUNT, but both are slow
type # #type{-> UNMARK(type)}
// just create a new annotation in between
NUM{REGEXP(".....")} #{-> type} #Method;
There are two options to use regex in UIMA Ruta:
(find) simple regex rules like "[A-Za-z]+" -> Type;
(matches) REGEXP conditions for validating the match of a rule element like
ANY{REGEXP("[A-Za-z]+")-> Type};
Let me know if something is not clear. I will extend the description then.
DISCLAIMER: I am a developer of UIMA Ruta
To make things easier, but also more complicated, I tried to implement a concept of "combined/concise tags" that expand into multiple basic tag forms.
In this case the tags consist of (one or more) "sub-tags", delimited by colons:
food:fruit:apple:sour/sweet
drink:coffee/tea:hot/cold
wall/bike:painted:red/blue
Slashes indicate "sub-tag" interchangeability.
Therefore the interpreter translates them to this:
food:fruit:apple:sour
food:fruit:apple:sweet
drink:coffee:hot
drink:coffee:cold
drink:tea:hot
drink:tea:cold
wall:painted:red
wall:painted:blue
bike:painted:red
bike:painted:blue
The code used (not perfect, but works):
import itertools

def slash_split_tag(tag):
    if '/' not in tag:
        return [tag]  # wrap in a list so callers can always iterate over tags
    subtags = tag.split(':')
    pattern, v_pattern = (), ()
    for subtag in subtags:
        if '/' in subtag:
            pattern += (None,)
            v_pattern += (tuple(subtag.split('/')),)
        else:
            pattern += (subtag,)

    def merge_pattern_and_product(pattern, product):
        ret = list(pattern)
        for e in product:
            ret[ret.index(None)] = e
        return ret

    cartesian_product = tuple(itertools.product(*v_pattern))  # http://stackoverflow.com/a/170248
    return [':'.join(merge_pattern_and_product(pattern, product)) for product in cartesian_product]

#===============================================================================
# T E S T
#===============================================================================
for tag in slash_split_tag('drink:coffee/tea:hot/cold'):
    print(tag)
print()
for tag in slash_split_tag('A1/A2:B1/B2/B3:C1/C2:D1/D2/D3/D4/EE'):
    print(tag)
Question: How can I possibly revert this process? I need this for readability reasons.
Here's a simple, first-pass attempt at such a function:
def compress_list(alist):
    """Compress a list of colon-separated strings into a more compact
    representation.
    """
    components = [ss.split(':') for ss in alist]
    # Check that every string in the supplied list has the same number of tags
    tag_counts = [len(cc) for cc in components]
    if len(set(tag_counts)) != 1:
        raise ValueError("Not all of the strings have the same number of tags")
    # For each position, gather the set of all applicable tags. The set at
    # index k of tag_possibilities holds all the possibilities for the kth tag
    tag_possibilities = list()
    for tag_idx in range(tag_counts[0]):
        tag_possibilities.append(set(cc[tag_idx] for cc in components))
    # Now turn each set of tags into a slash-separated string
    tag_possibilities_strs = ['/'.join(tt) for tt in tag_possibilities]
    # Finally, stitch these together with colons
    return ':'.join(tag_possibilities_strs)
Hopefully the comments are sufficient in explaining how it works. A few caveats, however:
It doesn't do anything sensible such as escaping slashes if it finds them in the list of tags.
This doesn't recognise if there's a more subtle division going on, or if it gets an incomplete list of tags. Consider this example:
fish:cheese:red
chips:cheese:red
fish:chalk:red
It won't realise that only cheese has both fish and chips, and will instead collapse this to fish/chips:cheese/chalk:red.
The order of the tags in the finished string is random (or at least, I don't think it has anything to do with the order of the strings in the given list). You could sort tt before you join it with slashes if that's important.
Testing with the three lists given in the question seems to work, although as I said, the order may be different to the initial strings:
food:fruit:apple:sweet/sour
drink:tea/coffee:hot/cold
wall/bike:painted:blue/red
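The second caveat can be made concrete with a condensed variant of the function above (per-position sets, sorted here for deterministic output); the incomplete fish/chips list collapses to a string whose expansion has four combinations, not the original three:

```python
def compress_tags(alist):
    # one set of observed tags per colon-position, re-joined with '/' and ':'
    cols = zip(*(s.split(':') for s in alist))
    return ':'.join('/'.join(sorted(set(col))) for col in cols)

result = compress_tags(['fish:cheese:red', 'chips:cheese:red', 'fish:chalk:red'])
# result == 'chips/fish:chalk/cheese:red' -- which expands to 4 tags, not 3
```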