I am trying to get into pyparsing. I want to create what I think is a simple grammar for, say, a shopping basket. The following lists illustrate the items:
basket=['metal basket','wicker basket','plastic basket']
fish=['haddock','plaice','dover sole']
meat=['beef','lamb','pork']
vegetable=['tomatoe','onion','cabbage','carrot']
fruit=['apple','mango','orange','strawberry']
So the rule for shopping is you must have
1 Shopping basket
1 or more vegetables
zero or more fruits
fish are optional
The resulting parser must enforce the list of requirements above. It should not matter what order the items are placed in the basket, i.e. a list of
metal basket
haddock
tomatoe
apple
cabbage
orange
is just as valid as
wicker basket
tomatoe
apple
orange
apple
The one that should fail however is
- lamb
- tomatoe
- apple
- wicker basket
- apple
Because the basket must always be first in the list. I am at a loss as to how to do this.
I have tried:
basket + OneOrMore(vegetable) + ZeroOrMore(fruit) + StringEnd()
But this doesn't seem to work. I'm using pyparsing on Python 2.7 on Windows 7. Thanks
Each is the pyparsing class for specifying "all of these things, but in any order". Think of it as a special form of And. And in fact, the operator for Each is &.
You want to define various valid combinations of basket contents, after the basket is given first.
basket + (OneOrMore(vegetable) & ZeroOrMore(fruit) & ZeroOrMore(fish))
You can leave off the StringEnd() at the end - just specify parseAll=True in your call to parseString.
Alternatively, you could just put all the ingredients into a single clump like:
basket + ZeroOrMore(vegetable | fruit | fish)
and then put the validation into a parse action. I'm actually more in favor of this second approach than of using Each in the parser itself. For one thing, a parse action, implemented in Python code, can contain much more complex rules ("more vegetables than fruit", "at least as many vegetables as fish", "oysters only in months containing an 'R'", etc.). Also, I think these rules are more likely to change over time, and all such changes will be localized to the parse action, instead of forcing changes in the parser itself.
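The counting rules themselves are easy to sketch as a plain function over the list of matched items, which is roughly what such a parse action would run. This is only an illustration under some assumptions: the name validate_basket is made up, meat is treated as not allowed (as in the grammar above), and the wiring into pyparsing is left out.

```python
# Item lists copied from the question (spelling kept as given there).
BASKETS = {'metal basket', 'wicker basket', 'plastic basket'}
FISH = {'haddock', 'plaice', 'dover sole'}
VEGETABLES = {'tomatoe', 'onion', 'cabbage', 'carrot'}
FRUITS = {'apple', 'mango', 'orange', 'strawberry'}


def validate_basket(items):
    """Return a list of rule violations; an empty list means the basket is valid."""
    problems = []
    # Rule: the basket must always be first in the list.
    if not items or items[0] not in BASKETS:
        problems.append('the basket must be the first item')
    # Rule: exactly one shopping basket.
    if sum(item in BASKETS for item in items) != 1:
        problems.append('exactly one basket is required')
    # Rule: one or more vegetables (fruit and fish need no check, they are optional).
    if not any(item in VEGETABLES for item in items):
        problems.append('at least one vegetable is required')
    # Assumption: anything outside these four lists (meat, for example) is rejected.
    allowed = BASKETS | VEGETABLES | FRUITS | FISH
    unknown = [item for item in items if item not in allowed]
    if unknown:
        problems.append('items not allowed: ' + ', '.join(unknown))
    return problems


valid_one = ['metal basket', 'haddock', 'tomatoe', 'apple', 'cabbage', 'orange']
valid_two = ['wicker basket', 'tomatoe', 'apple', 'orange', 'apple']
invalid = ['lamb', 'tomatoe', 'apple', 'wicker basket', 'apple']
```

Extra rules such as "more vegetables than fruit" would then be one more if block in this function, without touching the grammar.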
There are a lot of similar questions, but I'm only finding partial solutions.
I have a group of users stored as objects, with a name attribute (User.name). I'm hoping to do a query with a user input (Foo) such that I can (without being case sensitive) find all users where either:
foo is in User.name
User.name is in foo
As an example, I want the user to be able to type in "Jeff William II" and return "Anderson Jeff William II", "jeff william iii", as well as "Jeff Will" and "william ii"
I know I can use the Q function to combine two queries, and I can use annotate() to transform User.name like so (though I welcome edits if you notice errors in this code):
users = User.objects.annotate(name_upper=Upper('name')).filter(Q(name_upper__icontains=foo) | Q(name_upper__in=foo))
But I'm running into trouble using __in to match multiple letters within a string. So if User.name is "F" I get a hit when inputting Jeff but if User.name is "JE" then it doesn't show up. How do I match multiple letters, or is there a better way to make this query?
SIDE NOTE: I initially solved this with the following, but would prefer making a query if possible.
for u in User.objects.all():
    if u.name in foo or foo in u.name:
Please do not use Upper. It is a common misconception that converting two strings to uppercase (or lowercase) gives you a case-insensitive equality check. Certain characters, like ß, have no simple uppercase/lowercase counterpart, and more complicated rules (collation) decide whether two strings are considered equivalent. In Python one uses .casefold(…) for that.
For the database, you can simply make use of annotate, and then use two checks:
from django.db.models import CharField, F, Q, Value
foo = 'Jeff William II'
User.objects.annotate(foo=Value(foo, CharField())).filter(
    Q(name__icontains=foo) | Q(foo__icontains=F('name'))
)
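For comparison, the same two-way containment check can be written in plain Python with casefold. This is only a sketch of the logic, not a replacement for the query: the helper name matches and the sample names are made up for illustration.

```python
def matches(name, query):
    """True if either string contains the other, ignoring case."""
    name_key = name.casefold()
    query_key = query.casefold()
    return query_key in name_key or name_key in query_key


names = [
    'Anderson Jeff William II',
    'jeff william iii',
    'Jeff Will',
    'william ii',
    'Bob',
]
foo = 'Jeff William II'
hits = [name for name in names if matches(name, foo)]
```

Here hits contains the first four names and leaves out 'Bob'. Because casefold maps ß to ss, matches('STRASSE', 'straße') is also true.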
I'm using textacy's pos_regex_matches method to find certain chunks of text in sentences.
For instance, assuming I have the text: Huey, Dewey, and Louie are triplet cartoon characters., I'd like to detect that Huey, Dewey, and Louie is an enumeration.
To do so, I use the following code (on textacy 0.3.4, the version available at the time of writing):
import textacy
sentence = 'Huey, Dewey, and Louie are triplet cartoon characters.'
pattern = r'<PROPN>+ (<PUNCT|CCONJ> <PUNCT|CCONJ>? <PROPN>+)*'
doc = textacy.Doc(sentence, lang='en')
lists = textacy.extract.pos_regex_matches(doc, pattern)
for list in lists:
    print(list.text)
which prints:
Huey, Dewey, and Louie
However, if I have something like the following:
sentence = 'Donald Duck - Disney'
then the - (dash) is recognised as <PUNCT> and the whole sentence is recognised as a list -- which it isn't.
Is there a way to specify that only , and ; are valid <PUNCT> for lists?
I've looked for some reference about this regex language for matching PoS tags with no luck, can anybody help? Thanks in advance!
PS: I tried to replace <PUNCT|CCONJ> with <[;,]|CCONJ>, <;,|CCONJ>, <[;,]|CCONJ>, <PUNCT[;,]|CCONJ>, <;|,|CCONJ> and <';'|','|CCONJ> as suggested in the comments, but it didn't work...
In short, it is not possible: see this official page.
However, the merge request contains the code of the modified version described on that page, so one can recreate the functionality, although it performs worse than using spaCy's Matcher (see code and example, though I have no idea how to reimplement my problem using a Matcher).
If you want to go down this road anyway, you have to replace the line:
words.extend(map(lambda x: re.sub(r'\W', '', x), keyword_map[w]))
with the following:
words.extend(keyword_map[w])
otherwise every symbol (like , and ; in my case) will be stripped off.
Hello fellow community,
First off, thank you for looking into my question.
I have a separate text file called lions.txt which holds the following data:
The lion (Panthera leo) is one of the big cats in the genus Panthera and a member of the family Felidae. The commonly used term African lion collectively denotes the several subspecies in Africa.
With some males exceeding=250kg (550 lb) in weight,[4] it is the second-largest living cat after the tiger, the lion is an awesome cat.
The issue is that I have created two functions. Each function takes the user's requested keyword, and if both keywords are in the same sentence, Python writes all sentences that contain both keywords to a separate file called txtfile.txt. If I run only the first function, it finds ALL the sentences matching the two keywords and writes them to txtfile.txt. If I comment out the first function and run only the second, it also finds ALL the sentences matching the keywords specified by the user. However, if I uncomment both of them, only the function that is called first has ALL its sentences found and written to txtfile.txt.
Example:
import re
import os

os.chdir("C:\Python 2016 Training") # Changes directory to the following path
what_directory_am_i_in = os.getcwd() # Variable holds the directory path
print what_directory_am_i_in # Prints what directory the user is in so they can confirm they are in the correct location

patterns = open("lions.txt", "r") #Opens the file we are searching through
shep = open('txtfile.txt', "w") # Creates the file
search = raw_input("What you looking for? ") # Takes in user input that will be used to search in the searchtext() function and kev() function
print search

def searchtext():
    for line in patterns:
        if search in line and "family" in line:
            shep.write(line)
            shep.write("\n")

def kev():
    shep.write('\n')
    for id in patterns:
        match = re.search('exceeding=(\d+)', id)
        if match and search in id:
            shep.write("\n")
            shep.write("THIS IS THE SECOND FUNCTION")
            shep.write(id)

searchtext()
kev()
If I comment out searchtext() and have the user enter Keyword lion it will copy paste the following to txtfile.txt "THIS IS THE SECOND FUNCTIONWith some males lion exceeding=250kg (550 lb) in weight,[4] it is the second-largest living cat after the tiger, the lion is an awesome cat."
If I uncomment both and type lion it will print out the first sentence and not the second "The lion (Panthera leo) is one of the big cats in the genus Panthera and a member of the family Felidae. The commonly used term African lion collectively denotes the several subspecies in Africa."
Ultimately the end goal is to be able to call each function individually and if the keywords are in a sentence then Python should paste all the sentences that have those keywords in them to txtfile.txt.
Thank you so much for your help. I suspect my code is overwriting itself, but I'm not sure.
Regards,
Kevin
Use file_object.seek(0) to move the file pointer back to the beginning of the file before attempting to read from it again. – pzp
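A small self-contained sketch of why this matters (written for Python 3, with a temporary file standing in for lions.txt and shortened sample sentences): once a loop has read the file to the end, a second loop over the same file object gets nothing until the pointer is moved back.

```python
import os
import tempfile

# Write a small sample file; its two lines are stand-ins for the real data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("The lion is a member of the family Felidae.\n")
    tmp.write("Some lion males are exceeding=250kg in weight.\n")
    path = tmp.name

with open(path) as source:
    # First pass: reads every line and leaves the pointer at end of file.
    first_pass = [line for line in source if "family" in line]
    # Without seek(0) a second loop sees no lines at all.
    exhausted = [line for line in source]
    # Rewind, then the second pass can read the file again.
    source.seek(0)
    second_pass = [line for line in source if "exceeding" in line]

os.remove(path)
```

In the code from the question, this would mean calling patterns.seek(0) at the start of each function (or between the two calls).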
I'm writing an application that allows the user to configure the output using templates. For example:
Variables:
name = "BoppreH"
language = "Python"
Template:
My name is {name} and I like {language}.
Output:
My name is BoppreH and I like Python.
This works fine for simple data, like strings and numbers, but I can't find a good syntax for lists, more specifically for their delimiters.
fruits = ["banana", "apple", "watermelon"]
I like {???}.
I like banana, apple, watermelon.
In this case the desired delimiter was a comma, but how can the user specify that? Is there some template format with this feature?
I'm more concerned about making the syntax simple to understand, regardless of language or library.
Implement filters, and require their use for non-scalar types.
I like {fruits|join:", "}.
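A minimal sketch of how such a filter could be implemented, assuming a placeholder syntax of {name} or {name|filter:"argument"}. The names render, FILTERS and apply_join are made up for this example and do not belong to any particular template library.

```python
import re

# Matches {name} or {name|filter:"argument"}.
PLACEHOLDER = re.compile(r'\{(\w+)(?:\|(\w+):"([^"]*)")?\}')


def apply_join(value, argument):
    """The join filter: glue list items together with the given delimiter."""
    return argument.join(str(item) for item in value)


FILTERS = {"join": apply_join}


def render(template, variables):
    def substitute(match):
        name, filter_name, argument = match.groups()
        value = variables[name]
        if filter_name is not None:
            return FILTERS[filter_name](value, argument)
        if isinstance(value, (list, tuple)):
            # Require a filter for non-scalar values, as suggested above.
            raise TypeError("list value '%s' needs a filter" % name)
        return str(value)

    return PLACEHOLDER.sub(substitute, template)


output = render(
    'My name is {name} and I like {fruits|join:", "}.',
    {"name": "BoppreH", "fruits": ["banana", "apple", "watermelon"]},
)
```

With these inputs, output is "My name is BoppreH and I like banana, apple, watermelon."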
Typically, a list contains an unknown number of members, and sometimes variables/placeholders of its own. An example might be listing the name of a person along with their phone numbers. The desired output might look something like this:
John Doe
555-1212
555-1234
In a templating system that supported this, you'd need two types of variables: One that designated a placeholder for a value (like the curly braces you're using now), and another to denote the start and end of the list. So, something like this:
{name}
{{phone_numbers}}{phone}{{/phone_numbers}}
Your array of values might look like this:
values = {"name": "John Doe", "phone_numbers": [{"phone": "555-1212"}, {"phone": "555-1234"}]}
Each value in the "phone_numbers" array would create a new instance of everything that existed between {{phone_numbers}} and {{/phone_numbers}}, placing the values contained within those two "tags".
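A rough sketch of that expansion, assuming exactly the tag syntax shown above and no nesting of lists. The function names render and fill_scalars are invented for the example.

```python
import re

# {{name}} ... {{/name}} marks a repeated section; {name} is a single value.
SECTION = re.compile(r"\{\{(\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
SCALAR = re.compile(r"\{(\w+)\}")


def fill_scalars(text, values):
    """Replace every {name} placeholder with the matching value."""
    return SCALAR.sub(lambda match: str(values[match.group(1)]), text)


def render(template, values):
    def expand(match):
        name, body = match.groups()
        # One copy of the section body per entry in the list.
        return "".join(fill_scalars(body, item) for item in values[name])

    # Expand the list sections first, then fill the remaining placeholders.
    return fill_scalars(SECTION.sub(expand, template), values)


template = "{name}\n{{phone_numbers}}{phone}\n{{/phone_numbers}}"
values = {
    "name": "John Doe",
    "phone_numbers": [{"phone": "555-1212"}, {"phone": "555-1234"}],
}
output = render(template, values)
```

Here output is the name followed by one line per phone number.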
Let's say I have a ForeignKey from Coconut to Swallow (ie, a swallow has carried many coconuts, but each coconut has been carried by only one swallow). Now let's say that I have a ForeignKey from husk_segment to Coconut.
Now, I have a list of husk_segments and I want to find out if ALL of these have been carried by a particular swallow.
I can have Swallow.objects.filter(coconuts_carried__husk_segments__in = husk_segment_list) to show that this swallow has gripped at least one husk segment in the list. Now, how can I show that every husk segment that the swallow has ever carried is in this list?
I can have Swallow.objects.filter(coconuts_carried__husk_segments__in =
husk_segment_list) to show that this swallow has gripped at least one
husk segment in the list.
No, this is wrong, this gives you a list of swallows which have carried at least one husk segment from *husk_segment_list*.
If I've understood right, we are talking about checking for a specific swallow.
So, from your description I guess your models look something like this:
class Swallow(models.Model):
    name = models.CharField(max_length=100)

class Coconut(models.Model):
    # on_delete is required from Django 2.0 onwards; CASCADE is an assumption here
    swallow = models.ForeignKey(Swallow, related_name='coconuts_carried', on_delete=models.CASCADE)

class HuskSegment(models.Model):
    coconut = models.ForeignKey(Coconut, related_name='husk_segments', on_delete=models.CASCADE)
If you already have the husk segment list you need to check against the swallow's segments, there's no reason you need to resolve it in a query. Get the swallow's segments and check if that set is a superset of your husk segment list.
So we have:
# husk_segment_list = [<object: HuskSegment>, <object: HuskSegment>, <object: HuskSegment>...]
husk_segments_set = set(husk.pk for husk in husk_segment_list)
whitey = Swallow.objects.get(name='Neochelidon tibialis')
# flat=True makes values_list return plain ids instead of 1-tuples
wh_segments_set = set(HuskSegment.objects.filter(coconut__in=whitey.coconuts_carried.all()).values_list('id', flat=True))
whitey_has_carried_all = wh_segments_set.issuperset(husk_segments_set)
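The direction of the set comparison is easy to get backwards, so here is a tiny sketch with made-up primary keys (the numbers and the names requested_ids and carried_ids are illustrative only). issuperset answers "has the swallow carried every segment in the list?", while issubset answers the reverse question, "is everything the swallow carried in the list?".

```python
# Hypothetical primary keys, standing in for the two sets built above.
requested_ids = {11, 12, 13}          # segments in husk_segment_list
carried_ids = {10, 11, 12, 13, 14}    # segments the swallow has carried

# Every requested segment was carried by this swallow.
carried_all_requested = carried_ids.issuperset(requested_ids)

# The swallow also carried segments outside the list, so this one is False.
carried_only_requested = carried_ids.issubset(requested_ids)
```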
See the docs on queries spanning multi-valued relationships -- you should chain filter calls.
A simple way to go would be something like
queryset = Swallow.objects.all()
for coconut in coconuts:
    queryset = queryset.filter(coconuts_carried=coconut)
A fancy way to do this in one line using reduce (imported from functools on Python 3) would be
reduce(lambda q, c: q.filter(coconuts_carried=c), coconuts, Swallow.objects.all())
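How the fold narrows the result step by step can be seen in a plain-Python stand-in. This is only a sketch: the dictionary carried_by and the helper keep_carriers are hypothetical and merely mimic what each chained filter call does; they are not part of Django.

```python
from functools import reduce  # needed on Python 3

# Hypothetical data: which coconuts each swallow has carried.
carried_by = {
    "whitey": {"c1", "c2", "c3"},
    "blackie": {"c1"},
    "greyling": {"c2", "c3"},
}


def keep_carriers(swallows, coconut):
    """Mimics one chained filter call: keep swallows that carried this coconut."""
    return [name for name in swallows if coconut in carried_by[name]]


coconuts = ["c2", "c3"]
# Start from all swallows and narrow the list once per coconut.
result = reduce(keep_carriers, coconuts, sorted(carried_by))
```

After the fold, result holds only the swallows that carried every coconut in the list, here greyling and whitey.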
If I understood your altered question correctly, you should be able to compare the coconut_carried_set of the swallow with the list of coconuts that have been carried.
see docs
I'm not entirely sure that this is what you want - I guess it depends on if you know which swallow you want to check beforehand or if you want to check it against all swallows - in that case there may be a better solution.