creating a query using relational calculus (tuple relational calculus) - tuples

I have the following database schema:
student: sid
course: pid
prerequisite: cid, precid
records: sid, cid
How do I go about writing a query in relational calculus that finds all courses for which all of their prerequisites have been taken by every student who has taken the course PSY100? I want to write it with at least one universal quantifier ∀.
My idea was to find courses such that, for every course to return and for every student in records, there exists a student who took PSY100 and has also taken the prerequisite of that course.
So I have written it like this:
{x:cid | ∃ c IN course [c(cid) = x(cid) AND
∀ y IN course ∀ r record
( y(cid) = c(cid) AND r(cid) = c(cid)
→ ∃ p IN prerequisite ( r(cid) = PSY100 AND r(cid) = p(pid) )]
I am really confused about this, and I am pretty sure it is wrong. Any help would be greatly appreciated!
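To pin down what the query should return, here is a minimal Python sketch that evaluates the intended condition over in-memory sets. It assumes course is keyed by cid (the schema above lists pid, but the attempt uses c(cid)) and uses made-up sample data. It mirrors one possible tuple-calculus reading, {c | c ∈ course ∧ ∀p ∈ prerequisite (p.cid = c.cid → ∀r ∈ records (r.cid = 'PSY100' → ∃t ∈ records (t.sid = r.sid ∧ t.cid = p.precid)))}, where each ∀ becomes an all(...) and the ∃ becomes a set membership test.

```python
def courses_with_prereqs_covered(courses, prerequisite, records, target="PSY100"):
    """Courses c such that every prerequisite of c was taken by every student
    who took `target` (vacuously true if c has no prerequisites)."""
    # Students s that have some record r with r.cid = target
    took_target = {sid for sid, cid in records if cid == target}
    taken = set(records)  # (sid, cid) pairs
    return {
        c
        for c in courses
        # for all prerequisites p of c ...
        if all(
            (sid, precid) in taken     # ... there is a record t with t.sid = s, t.cid = p.precid
            for cid, precid in prerequisite
            if cid == c
            for sid in took_target     # ... for all students s who took target
        )
    }

# Hypothetical sample data, for illustration only.
courses = {"PSY100", "PSY200", "PSY300", "MATH100"}
prerequisite = {("PSY200", "PSY100"), ("PSY300", "PSY200"), ("PSY300", "MATH100")}
records = {("s1", "PSY100"), ("s1", "PSY200"), ("s1", "MATH100"),
           ("s2", "PSY100"), ("s2", "PSY200")}

print(sorted(courses_with_prereqs_covered(courses, prerequisite, records)))
# prints ['MATH100', 'PSY100', 'PSY200']
```

PSY300 is excluded because s2 took PSY100 but not MATH100. Note the vacuous cases: courses without prerequisites always qualify, and if nobody took PSY100, every course qualifies.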

Related

Prolog - searching a list inside a predicate

I have predicates for students and the sports they do, and I want to find out which students do a particular sport. I have this so far, but I can only get results if I enter the exact list of sports, and my find predicate only works for finding a sport in a list. I don't know how to put it together to find the students that do one sport:
student('Quinton Tarentino', male, 12).
student('Tom Hanks', male, 9).
student('Ed Harris', male, 11).
does_sport('Quinton Tarentino', [soccer, hockey, cricket]).
does_sport('Tom Hanks', []).
does_sport('Ed Harris', [hockey, swimming]).
sports([soccer, hockey, swimming, cricket, netball]).
find(X) :- sports(L), member(X, L).
I tried things like:
?- does_sport(X, find(soccer, L)).
This just returns false. I know I need to link my sports list to the does_sport predicate, but I'm not sure how.
Any advice appreciated :)
To find out which students do a particular sport, you could define a predicate like so:
student_sport(St,Sp) :-
does_sport(St,L), % L is a list of sports student St does
member(Sp,L). % Sp is a member of list L
Then you can query for e.g. soccer, as you seem to intend in your question, like so:
?- student_sport(St,soccer).
St = 'Quinton Tarentino' ? ;
no
Hockey on the other hand yields two results:
?- student_sport(St,hockey).
St = 'Quinton Tarentino' ? ;
St = 'Ed Harris' ? ;
no
If you want to have a list of students doing hockey instead, you can use findall/3 like so:
?- findall(St,student_sport(St,hockey),L).
L = ['Quinton Tarentino','Ed Harris']
Alternatively, use setof/3 to get a sorted list without duplicates (in case your facts happen to contain any):
?- setof(St,student_sport(St,hockey),L).
L = ['Ed Harris','Quinton Tarentino']
Note that in some Prologs you might have to load a library explicitly to use member/2, e.g. in Yap with :- use_module(library(lists)). Others, e.g. SWI, autoload it.
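As a rough analogy for readers coming from other languages, the sketch below models student_sport/2 as a Python generator that yields one solution at a time, with findall/3 and setof/3 as list collectors. This is only an illustration of the semantics, not Prolog: the function names are hypothetical, and None stands in for the failure that setof/3 reports when there are no solutions (findall/3 returns [] instead).

```python
# does_sport/2 facts from the question, as (student, sports) pairs.
DOES_SPORT = [
    ("Quinton Tarentino", ["soccer", "hockey", "cricket"]),
    ("Tom Hanks", []),
    ("Ed Harris", ["hockey", "swimming"]),
]

def student_sport(sport):
    """Yield students doing `sport` one at a time, in fact order,
    roughly like backtracking into student_sport(St, Sp)."""
    for student, sports in DOES_SPORT:   # does_sport(St, L)
        for s in sports:                 # member(Sp, L): one solution per matching element
            if s == sport:
                yield student

def findall_students(sport):
    """Like findall/3: all solutions in the order found, [] if there are none."""
    return list(student_sport(sport))

def setof_students(sport):
    """Like setof/3: sorted and deduplicated; None models setof's failure."""
    solutions = sorted(set(student_sport(sport)))
    return solutions if solutions else None

print(findall_students("hockey"))  # ['Quinton Tarentino', 'Ed Harris']
print(setof_students("hockey"))    # ['Ed Harris', 'Quinton Tarentino']
```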
EDIT:
Concerning the issues you raised in your comment, let's start with your observation that student_sport/2 produces the answers one at a time. That is intentional, as the predicate name suggests with the word student in the singular: it describes a relation between a student and a particular sport that this very student practices. That's why I added the example queries with findall/3 and setof/3, to show how you can collect the solutions in a list. You can easily define a predicate students_sport/2 that describes a relation between a particular sport and a list of all students who practice it:
students_sport(L,Sp) :-
setof(St,student_sport(St,Sp),L).
Concerning the sports-austere (students who do no sport at all), you can choose an atom to denote that case, say none, and then add a corresponding rule to student_sport/2 like so:
student_sport(St,none) :- % <- rule for the sports-austere
does_sport(St,[]). % <- succeeds if the student does no sport
student_sport(St,Sp) :-
does_sport(St,L),
member(Sp,L).
This yields the following results:
?- student_sport(St,none).
St = 'Tom Hanks' ? ;
no
?- students_sport(St,none).
St = ['Tom Hanks']
?- students_sport(St,hockey).
St = ['Ed Harris','Quinton Tarentino']
?- students_sport(St,Sp).
Sp = cricket,
St = ['Quinton Tarentino'] ? ;
Sp = hockey,
St = ['Ed Harris','Quinton Tarentino'] ? ;
Sp = none,
St = ['Tom Hanks'] ? ;
Sp = soccer,
St = ['Quinton Tarentino'] ? ;
Sp = swimming,
St = ['Ed Harris']
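The grouping that setof/3 performs when Sp is left unbound can be pictured with a small Python sketch: map each sport to the sorted list of its students, with "none" for students whose sport list is empty, and iterate the sports in sorted order. The function name students_by_sport is hypothetical; this only illustrates the result shape, not how Prolog computes it.

```python
DOES_SPORT = [
    ("Quinton Tarentino", ["soccer", "hockey", "cricket"]),
    ("Tom Hanks", []),
    ("Ed Harris", ["hockey", "swimming"]),
]

def students_by_sport(does_sport):
    """Map each sport to its sorted students, keys in sorted order,
    similar to backtracking over students_sport(St, Sp)."""
    groups = {}
    for student, sports in does_sport:
        # An empty list plays the role of the extra rule for the sports-austere.
        for sport in sports or ["none"]:
            groups.setdefault(sport, set()).add(student)
    return {sport: sorted(groups[sport]) for sport in sorted(groups)}

for sport, students in students_by_sport(DOES_SPORT).items():
    print(sport, students)
```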
And finally, concerning your assumption that your code is exactly what I wrote: there is a similarity in structure. Your predicate find/1 has a first goal (sports/1) involving a list and then uses member/2 to check for membership in that list. The second rule (the only rule before the edit) of student_sport/2 also has a first goal involving a list (but a different one: does_sport/2) and then uses member/2 to check for membership in that list. Here the similarities end. The version I provided does not use sports/1 at all, but rather the list of sports associated with a particular student in does_sport/2. Note that find/1 does not describe any connection to students whatsoever.
Furthermore, your query ?- does_sport(X, find(soccer, L)). indicates that you seem to expect some sort of return value. You can regard predicates as functions returning true or false, but that is usually not very helpful when programming in Prolog. The argument find(soccer,L) is not called, as you seem to expect, but literally passed as an argument. And since your facts do not include anything along the lines of
does_sport(*SomeStudentHere*, find(soccer,L)).
your query fails.

Retrieve an answer from a list in prolog

Hello, I am a beginner in Prolog and I am stuck on the following problem.
I have a "database" which gives me information about the school schedule,
something like this:
school(NameOfTeacher,[(Course,Day) ......]).
When asking the following
?- find(staff(NameOfTeacher,Course),Day).
the answer should be Day = (the day the course takes place). I manage to get an answer like Day = (Course,Day), but that is not what I want. Does anyone have an idea how to do this? Thank you in advance.
Remember that Prolog unification is a kind of bi-directional pattern matching, so you can use it to both create and decompose data structures:
?- Pair = (maths,monday), (_,Day) = Pair.
Pair = (maths, monday)
Day = monday
Yes
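For comparison, the decomposition step looks like this in Python, where tuple unpacking plays the role of (_,Day) = Pair. Unlike unification it only works in one direction, so it is an analogy, not an equivalent. The SCHOOL data and find_day are hypothetical; in Prolog the usual move is to let something like member((Course,Day), Schedule) match the pair so that only Day is bound.

```python
# Hypothetical facts shaped like school(Teacher, [(Course, Day), ...]).
SCHOOL = [
    ("smith", [("maths", "monday"), ("physics", "wednesday")]),
    ("jones", [("history", "tuesday")]),
]

def find_day(teacher, course):
    """Return the day `teacher` teaches `course`, or None if there is no match."""
    for name, schedule in SCHOOL:
        if name != teacher:
            continue
        for pair in schedule:
            pair_course, day = pair   # decompose the pair, like (Course, Day) = Pair
            if pair_course == course:
                return day            # only the Day part, not the whole pair
    return None

print(find_day("smith", "physics"))   # wednesday
```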

prolog - extracting information

Currently doing some PROLOG exercises - very new to this all so bear with me. I have the following knowledge base:
/* The structure of a subject teaching team takes the form:
team(Subject, Leader, Non_management_staff, Deputy).
Non_management_staff is a (possibly empty) list of teacher
structures and excludes the teacher structures for Leader and
Deputy.
teacher structures take the form:
teacher(Surname, Initial,
profile(Years_teaching,Second_subject,Club_supervision)).
Assume that each teacher has his or her team's Subject as their
main subject. */
team(computer_science,teacher(may,j,profile(20,ict,model_railways)),
[teacher(clarke,j,profile(32,ict,car_maintenance))],
teacher(hamm,p,profile(11,ict,science_club))).
team(maths,teacher(vorderly,c,profile(25,computer_science,chess)),
[teacher(o_connell,d,profile(10,music,orchestra)),
teacher(brankin,p,profile(20,home_economics,cookery_club))],
teacher(lynas,d,profile(10,pe,football))).
team(english,teacher(brewster,f,profile(30,french,french_society)),
[ ],
teacher(flaxman,j,profile(35,drama,debating_society))).
team(art,teacher(lawless,m,profile(20,english,film_club)),
[teacher(walker,k,profile(25,english,debating_society)),
teacher(brankin,i,profile(20,home_economics,writing)),
teacher(boyson,r,profile(30,english,writing))],
teacher(carthy,m,profile(20,music,orchestra))).
subject(X):- team(X,_,_,_).
leader(X) :- team(_,X,_,_).
deputy(X) :- team(_,_,_,X).
non_management(X) :-
team(_,_,Non_management,_),
member(X,Non_management).
exists(X) :-
subject(X);
leader(X);
deputy(X);
non_management(X).
I now have to write a rule (q) which returns the initials of teacher A and teacher B, where teacher A and teacher B are in different subject teams, each have Home Economics as their second subject, and each have the surname Brankin.
I'm stuck on how to compare all of the entities in the knowledge base. Prior to this, I have only extracted values from single entities (in the case of this example, single teachers). For example:
question1(Initial,Surname) :-
exists(teacher(Surname,Initial,profile(_,english,_))).
Any help much appreciated.
You don't need to compare all the entities in the knowledge base explicitly - that's implicit in the way Prolog answers queries, whether they are simple or complex. For the first few criteria, you can just say
team(W,X,Y,Z), team(J,K,L,I), W \= J.
and that will get you all possible combinations of different teams via backtracking. You can extend the query with something like
member(A,Y), member(B,L), A=teacher(...), B=teacher(...)
to process the other criteria.
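The backtracking search described above can be sketched as nested loops in Python over the same knowledge base, transcribed as tuples. This is an illustrative translation, not Prolog code; the helper names are hypothetical. Unlike the member(A,Y), member(B,L) sketch, it also checks leaders and deputies, although here both Brankins happen to be non-management staff. Each pair is found twice (once in each order), just as backtracking would produce it.

```python
# team(Subject, Leader, Non_management_staff, Deputy) as tuples; a teacher is
# (surname, initial, (years_teaching, second_subject, club_supervision)).
TEAMS = [
    ("computer_science", ("may", "j", (20, "ict", "model_railways")),
     [("clarke", "j", (32, "ict", "car_maintenance"))],
     ("hamm", "p", (11, "ict", "science_club"))),
    ("maths", ("vorderly", "c", (25, "computer_science", "chess")),
     [("o_connell", "d", (10, "music", "orchestra")),
      ("brankin", "p", (20, "home_economics", "cookery_club"))],
     ("lynas", "d", (10, "pe", "football"))),
    ("english", ("brewster", "f", (30, "french", "french_society")),
     [],
     ("flaxman", "j", (35, "drama", "debating_society"))),
    ("art", ("lawless", "m", (20, "english", "film_club")),
     [("walker", "k", (25, "english", "debating_society")),
      ("brankin", "i", (20, "home_economics", "writing")),
      ("boyson", "r", (30, "english", "writing"))],
     ("carthy", "m", (20, "music", "orchestra"))),
]

def members(team):
    """Leader, non-management staff and deputy of one team."""
    _subject, leader, staff, deputy = team
    return [leader, *staff, deputy]

def is_brankin_home_ec(teacher):
    surname, _initial, (_years, second_subject, _club) = teacher
    return surname == "brankin" and second_subject == "home_economics"

def brankin_pairs(teams):
    """Initials (A, B) for each pair of matching teachers in different teams."""
    pairs = []
    for team_a in teams:                  # like team(W, X, Y, Z)
        for team_b in teams:              # like team(J, K, L, I)
            if team_a[0] == team_b[0]:    # like W \= J: different subjects only
                continue
            for a in members(team_a):
                for b in members(team_b):
                    if is_brankin_home_ec(a) and is_brankin_home_ec(b):
                        pairs.append((a[1], b[1]))
    return pairs

print(brankin_pairs(TEAMS))   # [('p', 'i'), ('i', 'p')]
```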

Erlang: Sorting or Ordering Function for List of Tuple Lists

I have trouble sorting two related but separate lists of tuple lists. One list is made up of tuple lists representing a blog post. The other list is made up of tuple lists representing a comment post.
The problem is that I would like both lists in the same order, based on the blog id value. The list of blog posts is sorted by date, so I cannot just sort both the blog and the comment posts numerically by blog id. And I cannot just sort the comment posts by date either, because the dates of a blog post and its related comment post may differ.
I am not sure how to approach the problem, at least not in an elegant way.
Should I use lists:nth to get each tuple list and its position? Then I would get the blog id value, search the list of comment posts for that id, take that tuple list, and put it in a new list at the matching nth position.
Should I use the lists:sort function?
Any suggestions with code samples are much appreciated.
Here are two sample lists of tuple lists that can be used as a basis:
[[{<<"blog_id">>,<<"a2">>},
{<<"postDate">>,<<"2010-12-4T6:10:12">>},
{<<"message">>,<<"la di da bo di do">>}],
[{<<"blog_id">>,<<"b8">>},
{<<"postDate">>,<<"2009-12-3T10:09:33">>},
{<<"message">>,<<"that is cool">>}],
[{<<"blog_id">>,<<"a9">>},
{<<"postDate">>,<<"2009-12-2T18:12:29">>},
{<<"message">>,<<"i like san francisco">>}]]
[[{<<"comment_id">>,<<"n6">>},
{<<"related_blog_id">>,<<"b8">>},
{<<"postDate">>,<<"2010-12-5T15:10:12">>},
{<<"message">>,<<"yup really neat">>}],
[{<<"comment_id">>,<<"y2">>},
{<<"related_blog_id">>,<<"a9">>},
{<<"postDate">>,<<"2009-12-6T10:09:33">>},
{<<"message">>,<<"yes but rent is expensive">>}],
[{<<"comment_id">>,<<"x4">>},
{<<"related_blog_id">>,<<"a2">>},
{<<"postDate">>,<<"2009-12-5T16:12:29">>},
{<<"message">>,<<"sounds like a hit">>}]]
And the desired output is the following, with the first list unchanged and the second list reordered:
[[{<<"blog_id">>,<<"a2">>},
{<<"postDate">>,<<"2010-12-4T6:10:12">>},
{<<"message">>,<<"la di da bo di do">>}],
[{<<"blog_id">>,<<"b8">>},
{<<"postDate">>,<<"2009-12-3T10:09:33">>},
{<<"message">>,<<"that is cool">>}],
[{<<"blog_id">>,<<"a9">>},
{<<"postDate">>,<<"2009-12-2T18:12:29">>},
{<<"message">>,<<"i like san francisco">>}]]
[ [{<<"comment_id">>,<<"x4">>},
{<<"related_blog_id">>,<<"a2">>},
{<<"postDate">>,<<"2009-12-5T16:12:29">>},
{<<"message">>,<<"sounds like a hit">>}],
[{<<"comment_id">>,<<"n6">>},
{<<"related_blog_id">>,<<"b8">>},
{<<"postDate">>,<<"2010-12-5T15:10:12">>},
{<<"message">>,<<"yup really neat">>}],
[{<<"comment_id">>,<<"y2">>},
{<<"related_blog_id">>,<<"a9">>},
{<<"postDate">>,<<"2009-12-6T10:09:33">>},
{<<"message">>,<<"yes but rent is expensive">>}]]
Ok, new try then :)
We have:
-module(foo).
-compile(export_all).
Basic module header; export_all exports every function so the thing can be tested from the shell.
blogs() ->
[[{<<"blog_id">>,<<"a2">>},
{<<"postDate">>,<<"2010-12-4T6:10:12">>},
{<<"message">>,<<"la di da bo di do">>}],
[{<<"blog_id">>,<<"b8">>},
{<<"postDate">>,<<"2009-12-3T10:09:33">>},
{<<"message">>,<<"that is cool">>}],
[{<<"blog_id">>,<<"a9">>},
{<<"postDate">>,<<"2009-12-2T18:12:29">>},
{<<"message">>,<<"i like san francisco">>}]].
Your definition of blogs.
comments() ->
[[{<<"comment_id">>,<<"n6">>},
{<<"related_blog_id">>,<<"b8">>},
{<<"postDate">>,<<"2010-12-5T15:10:12">>},
{<<"message">>,<<"yup really neat">>}],
[{<<"comment_id">>,<<"y2">>},
{<<"related_blog_id">>,<<"a9">>},
{<<"postDate">>,<<"2009-12-6T10:09:33">>},
{<<"message">>,<<"yes but rent is expensive">>}],
[{<<"comment_id">>,<<"x4">>},
{<<"related_blog_id">>,<<"a2">>},
{<<"postDate">>,<<"2009-12-5T16:12:29">>},
{<<"message">>,<<"sounds like a hit">>}]].
Your definition of comments.
sorted_comments() ->
[[{<<"comment_id">>,<<"x4">>},
{<<"related_blog_id">>,<<"a2">>},
{<<"postDate">>,<<"2009-12-5T16:12:29">>},
{<<"message">>,<<"sounds like a hit">>}],
[{<<"comment_id">>,<<"n6">>},
{<<"related_blog_id">>,<<"b8">>},
{<<"postDate">>,<<"2010-12-5T15:10:12">>},
{<<"message">>,<<"yup really neat">>}],
[{<<"comment_id">>,<<"y2">>},
{<<"related_blog_id">>,<<"a9">>},
{<<"postDate">>,<<"2009-12-6T10:09:33">>},
{<<"message">>,<<"yes but rent is expensive">>}]].
Your definition of being sorted.
sort(Blogs, Comments) ->
%% Create list of blog id's
Bs = [proplists:get_value(<<"blog_id">>, B) || B <- Blogs],
Fetch all the blog_id values from the Blogs.
%% Create the numbering
DB = dict:from_list(lists:zip(Bs, lists:seq(1, length(Bs)))),
Number the order the blogs occur in. Stuff these into a dict for fast lookup later.
%% Sorter function:
F = fun(I, J) ->
II = proplists:get_value(<<"related_blog_id">>,
I),
JJ = proplists:get_value(<<"related_blog_id">>,
J),
dict:fetch(II, DB) =< dict:fetch(JJ, DB)
end,
This function compares two Comments, I, J to each other based on their related blog_id.
{Blogs, lists:sort(F, Comments)}.
Return what we want to return.
sort_test() ->
{blogs(), sorted_comments()} == sort(blogs(), comments()).
Tester function.
2> c(foo).
{ok,foo}
3> foo:sort_test().
true
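For comparison, the same idea (number the blogs by their position, then sort the comments by their blog's number) in Python, with each proplist written as a dict and the message fields left out for brevity. Passing a key function instead of a pairwise comparator means each comment's position is looked up once; in Erlang the same decorate-sort-undecorate pattern can be written with lists:keysort/2. The function name is hypothetical.

```python
def sort_comments_by_blog_order(blogs, comments):
    """Return blogs unchanged and comments reordered to follow the blogs."""
    # Position of each blog_id in the (date-sorted) blog list, like the dict in sort/2.
    position = {blog["blog_id"]: i for i, blog in enumerate(blogs)}
    # Key-based sort: one lookup per comment instead of two per comparison.
    ordered = sorted(comments, key=lambda c: position[c["related_blog_id"]])
    return blogs, ordered

blogs = [
    {"blog_id": "a2", "postDate": "2010-12-4T6:10:12"},
    {"blog_id": "b8", "postDate": "2009-12-3T10:09:33"},
    {"blog_id": "a9", "postDate": "2009-12-2T18:12:29"},
]
comments = [
    {"comment_id": "n6", "related_blog_id": "b8"},
    {"comment_id": "y2", "related_blog_id": "a9"},
    {"comment_id": "x4", "related_blog_id": "a2"},
]

_, ordered = sort_comments_by_blog_order(blogs, comments)
print([c["comment_id"] for c in ordered])   # ['x4', 'n6', 'y2']
```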

Is there a way to filter a django queryset based on string similarity (a la python difflib)?

I have a need to match cold leads against a database of our clients.
The leads come from a third-party provider in bulk (thousands of records) and sales is asking us to (in their words) "filter out our clients" so they don't try to sell our service to an established client.
Obviously, there are misspellings in the leads. Charles becomes Charlie, Joseph becomes Joe, etc. So I can't really just do a filter comparing lead_first_name to client_first_name, etc.
I need to use some sort of string similarity mechanism.
Right now I'm using the lovely difflib to compare the leads' first and last names to a list generated with Client.objects.all(). It works, but because of the number of clients it tends to be slow.
I know that most sql databases have soundex and difference functions. See my test of it in the update below - it doesn't work as well as difflib.
Is there another solution? Is there a better solution?
Edit:
Soundex, at least in my db, doesn't behave as well as difflib.
Here is a simple test - look for "Joe Lopes" in a table containing "Joseph Lopes":
with temp (first_name, last_name) as (
select 'Joseph', 'Lopes'
union
select 'Joe', 'Satriani'
union
select 'CZ', 'Lopes'
union
select 'Blah', 'Lopes'
union
select 'Antonio', 'Lopes'
union
select 'Carlos', 'Lopes'
)
select first_name, last_name
from temp
where difference(first_name+' '+last_name, 'Joe Lopes') >= 3
order by difference(first_name+' '+last_name, 'Joe Lopes')
The above returns "Joe Satriani" as the only match. Even reducing the similarity threshold to 2 doesn't return "Joseph Lopes" as a potential match.
But difflib does a much better job:
difflib.get_close_matches('Joe Lopes', ['Joseph Lopes', 'Joe Satriani', 'CZ Lopes', 'Blah Lopes', 'Antonio Lopes', 'Carlos Lopes'])
['Joseph Lopes', 'CZ Lopes', 'Carlos Lopes']
Edit after gruszczy's response:
Before writing my own, I looked for and found a T-SQL implementation of Levenshtein Distance in the repository of all knowledge.
In testing it, it still won't do a better matching job than difflib.
Which led me to research what algorithm is behind difflib. It seems to be a modified version of the Ratcliff-Obershelp algorithm.
Unfortunately, I can't find anyone who has already written a T-SQL implementation based on difflib's algorithm... I'll try my hand at it when I can.
If nobody else comes up with a better answer in the next few days, I'll grant it to gruszczy. Thanks, kind sir.
soundex won't help you, because it's a phonetic algorithm. Joe and Joseph aren't similar phonetically, so soundex won't mark them as similar.
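To make the phonetic point concrete, here is a small sketch of the standard American Soundex rules (SQL Server's SOUNDEX may differ in edge cases, so treat this as an illustration). Joe and Joseph get different codes because Soundex encodes consonant sounds, and Joseph has consonants that Joe lacks.

```python
# Standard American Soundex digit groups.
_CODES = {letter: digit
          for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                 "4": "l", "5": "mn", "6": "r"}.items()
          for letter in letters}

def soundex(name):
    """Return the 4-character American Soundex code of `name`."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")   # a digit equal to the first letter's is not repeated
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")      # vowels, h, w and y have no digit
        if digit and digit != prev:
            code.append(digit)
        if ch not in "hw":              # h/w do not separate equal digits; vowels do
            prev = digit
    return ("".join(code) + "000")[:4]

for n in ["Joe", "Joseph", "Lopes"]:
    print(n, soundex(n))   # Joe J000, Joseph J210, Lopes L120
```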
You can try Levenshtein distance, which PostgreSQL provides in its fuzzystrmatch extension. Your database may have it too; if not, you should be able to write a stored procedure that calculates the distance between two strings and use it in your query.
It's possible with trigram_similar lookups since Django 1.10 (they require the pg_trgm PostgreSQL extension); see the docs for PostgreSQL specific lookups and Full text search.
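To get a feel for how trigram similarity scores names, here is a rough pure-Python approximation of pg_trgm's similarity(), based on its documented behavior: the text is split into words, each word is padded with two leading spaces and one trailing space, and the score is the number of shared trigrams divided by the number of distinct trigrams in either string. Lowercasing and treating only ASCII letters and digits as word characters are simplifying assumptions; the real extension depends on locale and build settings.

```python
import re

def trigrams(text):
    """Approximate pg_trgm trigram extraction for `text`."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "          # e.g. "cat" -> "  cat "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 for two empty inputs)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(round(similarity("Joe", "Joseph"), 3))              # 0.222
print(round(similarity("Joe Lopes", "Joseph Lopes"), 3))  # 0.533
```

Under this approximation "Joe" vs "Joseph" scores 2/9, below the 0.3 default of pg_trgm.similarity_threshold that the % operator (used by trigram_similar) compares against, while the full names score 8/15, so comparing full names rather than first names alone matters.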
As per the answer of andilabs you can use the Levenshtein function to create your custom function. Postgres doc indicates that the Levenshtein function is as follows:
levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int
andilabs's answer only supports the second form. If you want a more advanced search with insertion/deletion/substitution costs, you can rewrite the function like this:
from django.db.models import Func, IntegerField, Value

class Levenshtein(Func):
    # Renders as levenshtein(<expression>, %s, %s, %s, %s). Passing the search term
    # and costs as Value() lets Django send them as query parameters instead of
    # pasting them into the SQL string (which breaks on quotes, e.g. O'Connell).
    function = 'levenshtein'
    output_field = IntegerField()

    def __init__(self, expression, search_term, ins_cost=1, del_cost=1, sub_cost=1, **extras):
        super().__init__(
            expression,
            Value(search_term),
            Value(ins_cost),
            Value(del_cost),
            Value(sub_cost),
            **extras
        )
And call the function:
from django.db.models import F
Spot.objects.annotate(
lev_dist=Levenshtein(F('name'), 'Kfaka', 3, 3, 1) # ins = 3, del = 3, sub = 1
).filter(
lev_dist__lte=2
)
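To check what the three cost arguments do before picking a threshold, here is a plain-Python dynamic-programming version with the same argument order as the PostgreSQL signature quoted above (source, target, ins_cost, del_cost, sub_cost). It is an illustration for experimenting offline, not fuzzystrmatch's actual code.

```python
def levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Weighted edit distance turning `source` into `target`."""
    # prev[j] = cost of turning source[:i-1] into target[:j] (row for the previous i)
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s_ch in enumerate(source, start=1):
        curr = [i * del_cost]                      # delete all of source[:i]
        for j, t_ch in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,                # delete s_ch
                curr[j - 1] + ins_cost,            # insert t_ch
                prev[j - 1] + (0 if s_ch == t_ch else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]

print(levenshtein("Joe Lopes", "Joseph Lopes"))           # 3
print(levenshtein("Joe Lopes", "Joseph Lopes", 3, 3, 1))  # 9
```

With ins = 3, "Joe Lopes" needs three insertions to become "Joseph Lopes", so a filter like lev_dist__lte=2 would not match it; the costs change which typos a fixed threshold tolerates.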
If you need to get there with Django and Postgres and don't want to use the trigram similarity introduced in 1.10 (https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/lookups/#trigram-similarity), you can implement it using Levenshtein like this:
The fuzzystrmatch extension is needed.
You need to add the Postgres extension to your database in psql:
CREATE EXTENSION fuzzystrmatch;
Let's define a custom function with which we can annotate a queryset. It takes just one argument, the search_term, and uses the Postgres levenshtein function (see docs):
from django.db.models import Func, IntegerField, Value

class Levenshtein(Func):
    # Renders as levenshtein(<expression>, %s); the search term is passed as a
    # query parameter via Value() rather than interpolated into the SQL string.
    function = "levenshtein"
    output_field = IntegerField()

    def __init__(self, expression, search_term, **extras):
        super().__init__(expression, Value(search_term), **extras)
Then anywhere else in the project, just import the Levenshtein function defined above and F to pass the Django field:
from django.db.models import F
Spot.objects.annotate(
lev_dist=Levenshtein(F('name'), 'Kfaka')
).filter(
lev_dist__lte=2
)