Doctrine2 - use fulltext and MyIsam - doctrine-orm

I am building an app in Symfony2, using Doctrine2 with mysql. I would like to use a fulltext search. I can't find much on how to implement this - right now I'm stuck on how to set the table engine to myisam.
It seems that it's not possible to set the table type using annotations. Also, if I did it manually by running an "ALTER TABLE" query, I'm not sure if Doctrine2 will continue to work properly - does it depend on the InnoDB foreign keys?
Is there a better place to ask these questions?

INTRODUCTION
Doctrine2 uses InnoDB which supports Foreign Keys used in Doctrine associations. But as MyISAM does not support this yet, you can not use MyISAM to manage Doctrine Entities.
On the other side, MySQL v5.6, currently in development, will bring the support of InnoDB FTS and so will enable the Full-Text search in InnoDB tables.
SOLUTIONS
So there are two solutions :
Using the MySQL v5.6 at your own risks and hacking a bit Doctrine to implement a MATCH AGAINST method : link in french... (I could translate if needed but there still are bugs and I would not recommend this solution)
As described by quickshifti, creating a MyISAM table with fulltext index just to perform the search on. As Doctrine2 allows native SQL requests and as you can map this request to an entity (details here).
EXAMPLE FOR THE 2nd SOLUTION
Consider the following tables :
table 'user' : InnoDB [id, name, email]
table 'search_user : MyISAM [user_id, name -> FULLTEXT]
Then you just have to write a search request with a JOIN and mapping (in a repository) :
<?php
public function searchUser($string) {
// 1. Mapping
$rsm = new ResultSetMapping();
$rsm->addEntityResult('Acme\DefaultBundle\Entity\User', 'u');
$rsm->addFieldResult('u', 'id', 'id');
$rsm->addFieldResult('u', 'name', 'name');
$rsm->addFieldResult('u', 'email', 'email');
// 2. Native SQL
$sql = 'SELECT u.id, u.name FROM search_user AS s JOIN user AS u ON s.user_id = u.id WHERE MATCH(s.name) AGAINST($string IN BOOLEAN MODE)> 0;
// 3. Run the query
$query = $this->_em->createNativeQuery($sql, $rsm);
// 4. Get the results as Entities !
$results = $query->getResult();
return $results;
}
?>
But the FULLTEXT index needs to stay up-to-date. Instead of using a cron task, you can add triggers (INSERT, UPDATE and DELETE) like this :
CREATE TRIGGER trigger_insert_search_user
AFTER INSERT ON user
FOR EACH ROW
INSERT INTO search_user SET user_id=NEW.id, name=NEW.name;
CREATE TRIGGER trigger_update_search_user
AFTER UPDATE ON user
FOR EACH ROW
UPDATE search_user SET name=name WHERE user_id=OLD.id;
CREATE TRIGGER trigger_delete_search_user
AFTER DELETE ON user
FOR EACH ROW
DELETE FROM search_user WHERE user_id=OLD.id;
So that your search_user table will always get the last changes.
Of course, this is just an example, I wanted to keep it simple, and I know this query could be done with a LIKE.

Doctrine ditched the fulltext Searchable feature from v1 on the move to Doctrine2. You will likely have to roll your own support for a fulltext search in Doctrine2.
I'm considering using migrations to generate the tables themselves, running the search queries w/ the native SQL query option to get sets of ids that refer to tables managed by Doctrine, then using said sets of ids to hydrate records normally through Doctrine.
Will probly cron something periodic to update the fulltext tables.

Related

Is there any way to convert this into django orm?

I needed to get all the rows in the table1 even if it is not existing in table2 and display it as zero. I got it using raw sql query but in django ORM i am getting the values existing only in table2. The only difference on my django orm is that iI am using inner join while in the raw sql query I am using left join. Is there any way to achieve this or should I use raw sql query? Thanks.
Django ORM:
total=ApplicantInfo.objects.select_related('source_type').values('source_type__source_type').annotate(total_count=Count('source_type'))
OUTPUT OF DJANGO ORM IN RAW SQL:
SELECT "applicant_sourcetype"."source_type", COUNT("applicant_applicantinfo"."source_type_id") AS "total_count" FROM "applicant_applicantinfo" INNER JOIN "applicant_sourcetype" ON ("applicant_applicantinfo"."source_type_id" = "applicant_sourcetype"."id") GROUP BY "applicant_sourcetype"."source_type"
RAW SQL:
SELECT source.source_type, count(info.source_type_id) as total_counts from applicant_sourcetype as source LEFT JOIN applicant_applicantinfo as info ON source.id = info.source_type_id GROUP BY source.id
You cannot query on ApplicantInfo if you want a left join with it. Query on SourceType instead:
qs = (SourceType.objects
.values('source_type')
.annotate(cnt=Count('applicantinfo'))
.values('source_type', 'cnt')
)
Since you haven't posted the models, you may need to adapt the field names to make it work.

T4 POCO. Foreign Key Relationships (List<T> or T?)

I created a T4 template for our POCO objects using SMO to grab the object details from SQL Server. Right now I'm trying to determine how to determine the datatype of the navigation properties. My main issue is how to determine if it should be T or List<T>.
I'm not using EF or Linq to SQL.
Any ideas on what I should be checking to accurately determine the datatype?
Depending on which version of SQL you're using, you can use the INFORMATION_SCHEMA to get just about everything you need to build out your POCOs. The following is from http://searchcode.com/codesearch/view/15361587. It lists all tables and columns along with many attributes including whether the column is a foreign key.
SELECT
--TBL.TABLE_SCHEMA,
TBL.TABLE_TYPE,
COL.TABLE_NAME,
COL.ORDINAL_POSITION,
COL.COLUMN_NAME,
COL.DATA_TYPE,
COL.IS_NULLABLE,
ISNULL(COL.CHARACTER_MAXIMUM_LENGTH,-1) AS MAXIMUM_LENGTH,
--COL.TABLE_CATALOG,
(CASE KEYUSG.CONSTRAINT_TYPE WHEN 'PRIMARY KEY' THEN 'YES' ELSE 'NO' END) PRIMARY_KEY,
(CASE KEYUSG.CONSTRAINT_TYPE WHEN 'FOREIGN KEY' THEN 'YES' ELSE 'NO' END) FOREIGN_KEY,
FK.FOREIGN_TALBE,
FK.FOREIGN_COLUMN,
KEYUSG.CONSTRAINT_NAME
FROM
INFORMATION_SCHEMA.COLUMNS COL
JOIN
INFORMATION_SCHEMA.TABLES TBL
ON
COL.TABLE_NAME=TBL.TABLE_NAME
LEFT JOIN
(
SELECT
USG.CONSTRAINT_NAME,
USG.TABLE_NAME,
USG.COLUMN_NAME,
CONST.CONSTRAINT_TYPE
FROM
INFORMATION_SCHEMA.KEY_COLUMN_USAGE USG
JOIN
INFORMATION_SCHEMA.TABLE_CONSTRAINTS CONST
ON
USG.TABLE_NAME=CONST.TABLE_NAME
AND
USG.CONSTRAINT_NAME = CONST.CONSTRAINT_NAME
)
AS KEYUSG
ON
COL.TABLE_NAME=KEYUSG.TABLE_NAME
AND
COL.COLUMN_NAME=KEYUSG.COLUMN_NAME
---FOREIGHTKEYS
LEFT OUTER JOIN
(
SELECT
USAGE.TABLE_NAME,
USAGE.COLUMN_NAME,
UNI_USAGE.TABLE_NAME FOREIGN_TALBE,
UNI_USAGE.COLUMN_NAME FOREIGN_COLUMN,
CONST.CONSTRAINT_NAME,
UNIQUE_CONSTRAINT_NAME
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS CONST
JOIN
INFORMATION_SCHEMA.KEY_COLUMN_USAGE USAGE
ON
USAGE.CONSTRAINT_NAME=CONST.CONSTRAINT_NAME
JOIN
INFORMATION_SCHEMA.KEY_COLUMN_USAGE UNI_USAGE
ON
UNI_USAGE.CONSTRAINT_NAME=CONST.UNIQUE_CONSTRAINT_NAME
)
AS FK
ON
FK.TABLE_NAME=COL.TABLE_NAME
AND
FK.COLUMN_NAME = COL.COLUMN_NAME
AND
KEYUSG.CONSTRAINT_NAME=FK.CONSTRAINT_NAME
What I was looking for is a way to determine the cardinality of a foreign key. I believe it can be done by checking the uniqueness of the primary key column(s), however, we've opted to hard code the navigation properties ourselves without having to worry about this as well as a auto-generating a name for the property.

query builder select the id from leftJoin

I have a select field that fetch from an entity
and I would like to customize completely my select by choosing the table the id is picked from
(here I would like to select t.id instead of tl.id as the select value)
return $er->createQueryBuilder('tl')
->addSelect('l')
->addSelect('t')
->leftJoin('tl.lang', 'l')
->leftJoin('tl.type', 't')
->where('l.isDefault = 1')
->orderBy('tl.name', 'ASC');
Due to my tables, I can't simply fetch the table t, I have to use tl
Your query is not according to the syntax defined in Doctrine 2 QueryBuilder: http://www.doctrine-project.org/docs/orm/2.0/en/reference/query-builder.html
Your query might work in Doctrine 1.2 but in Doctrine 2 you should build your query according to the syntax defined in the link I posted above.
For example ->addSelect('l') is not being used in Doctrine 2 anymore. It has become ->add('select', 'l').
You don't have to set different alias for your column. It'll be hydrated as column of the related entity.

Django: Distinct foreign keys

class Log:
project = ForeignKey(Project)
msg = CharField(...)
date = DateField(...)
I want to select the four most recent Log entries where each Log entry must have a unique project foreign key. I've tries the solutions on google search but none of them works and the django documentation isn't that very good for lookup..
I tried stuff like:
Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]
But this either return nothing or Log entries of the same project..
Any help would be appreciated!
Queries don't work like that - either in Django's ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you'll need to do two queries to get the actual Log entries. Something like:
id_list = Log.objects.order_by('-date').values_list('project_id').distinct()[:4]
entries = Log.objects.filter(id__in=id_list)
Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;
Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn't work on other versions/databases, so I'll do it the more complex way:
SELECT logs.*
FROM logs JOIN (
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
AND logs.date = latest.max_date;
Now, if you have access to windowing functions, it's a bit neater (I think anyway), and certainly faster to execute:
SELECT * FROM (
SELECT logs.field1, logs.field2, logs.field3, logs.date
rank() over ( partition by project_id
order by "date" DESC ) as dateorder
FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logs.date DESC LIMIT 1;
OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.
I know this is an old post, but in Django 2.0, I think you could just use:
Log.objects.values('project').distinct().order_by('project')[:4]
You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).
latest_ids_per_project = Log.objects.values_list(
'project').annotate(latest=Max('date')).order_by(
'-latest').values_list('project')
log_objects = Log.objects.filter(
id__in=latest_ids_per_project[:4]).order_by('-date')
This looks a bit convoluted, but it actually results in a surprisingly compact query:
SELECT "log"."id",
"log"."project_id",
"log"."msg"
"log"."date"
FROM "log"
WHERE "log"."id" IN
(SELECT U0."id"
FROM "log" U0
GROUP BY U0."project_id"
ORDER BY MAX(U0."date") DESC
LIMIT 4)
ORDER BY "log"."date" DESC

postgresql full text search query to django ORM

I was following the documentation on FullTextSearch in postgresql. I've created a tsvector column and added the information i needed, and finally i've created an index.
Now, to do the search i have to execute a query like this
SELECT *, ts_rank_cd(textsearchable_index_col, query) AS rank
FROM client, plainto_tsquery('famille age') query
WHERE textsearchable_index_col ## query
ORDER BY rank DESC LIMIT 10;
I want to be able to execute this with Django's ORM so i could get the objects. (A little question here: do i need to add the tsvector column to my model?)
My guess is that i should use extra() to change the "where" and "tables" in the queryset
Maybe if i change the query to this, it would be easier:
SELECT * FROM client
WHERE plainto_tsquery('famille age') ## textsearchable_index_col
ORDER BY ts_rank_cd(textsearchable_index_col, plainto_tsquery(text_search)) DESC LIMIT 10
so id' have to do something like:
Client.objects.???.extra(where=[???])
Thxs for your help :)
Another thing, i'm using Django 1.1
Caveat: I'm writing this on a wobbly train, with a headcold, but this should do the trick:
where_statement = """plainto_tsquery('%s') ## textsearchable_index_col
ORDER BY ts_rank_cd(textsearchable_index_col,
plainto_tsquery(%s))
DESC LIMIT 10"""
qs = Client.objects.extra(where=[where_statement],
params=['famille age', 'famille age'])
If you were on Django 1.2 you could just call:
Client.objects.raw("""
SELECT *, ts_rank_cd(textsearchable_index_col, query) AS rank
FROM client, plainto_tsquery('famille age') query
WHERE textsearchable_index_col ## query
ORDER BY rank DESC LIMIT 10;""")