Django-Entangled multiple data in JsonField

I'm working with a Django JSONField and using django-entangled in a form. I need the data stored in the format below. Any suggestions on how to achieve this?
[
    {
        "name": "Test 1",
        "roll": 1,
        "section": "A"
    },
    {
        "name": "Test 2",
        "roll": 2,
        "section": "A"
    }
]

With django-entangled this is not possible, because that library does not offer multiple forms of the same kind.
You can however try my next form library django-formset, which can handle multiple forms of the same kind.

Related

How to wisely combine shingles and edgeNgram to provide flexible full text search?

We have an OData-compliant API that delegates some of its full text search needs to an Elasticsearch cluster.
Since OData expressions can get quite complex, we decided to simply translate them into their equivalent Lucene query syntax and feed it into a query_string query.
We do support some text-related OData filter expressions, such as:
startswith(field,'bla')
endswith(field,'bla')
substringof('bla',field)
name eq 'bla'
The fields we're matching against can be analyzed, not_analyzed or both (i.e. via a multi-field).
The searched text can be a single token (e.g. table), only a part thereof (e.g. tab), or several tokens (e.g. table 1., table 10, etc).
The search must be case-insensitive.
Here are some examples of the behavior we need to support:
startswith(name,'table 1') must match "Table 1", "table 100", "Table 1.5", "table 112 upper level"
endswith(name,'table 1') must match "Room 1, Table 1", "Subtable 1", "table 1", "Jeff table 1"
substringof('table 1',name) must match "Big Table 1 back", "table 1", "Table 1", "Small Table12"
name eq 'table 1' must match "Table 1", "TABLE 1", "table 1"
So basically, we take the user input (i.e. the second parameter of startswith/endswith, the first parameter of substringof, or the right-hand side of eq) and try to match it exactly, whether the tokens match fully or only partially.
Right now, we're getting away with the clumsy solution outlined below, which works pretty well but is far from ideal.
In our query_string, we match against a not_analyzed field using the regular expression syntax. Since the field is not_analyzed and the search must be case-insensitive, we do our own tokenizing while preparing the regular expression that goes into the query. For example, the following is equivalent to the OData filter endswith(name,'table 8'), i.e. it matches all documents whose name ends with "table 8":
"query": {
    "query_string": {
        "query": "name.raw:/.*(T|t)(A|a)(B|b)(L|l)(E|e) 8/",
        "lowercase_expanded_terms": false,
        "analyze_wildcard": true
    }
}
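For illustration, a regular expression like the one above could be generated by a small helper along the following lines. This is only a sketch in Python: the function names (case_insensitive_pattern, odata_to_regex_query) are hypothetical, and the set of escaped reserved characters is an assumption that should be checked against the Elasticsearch version in use.

```python
# Characters assumed to be reserved in Lucene regular expressions (plus "/",
# which delimits the regex inside query_string). Verify for your ES version.
RESERVED = set('.?+*|{}[]()"\\#@&<>~/')


def case_insensitive_pattern(text):
    """Expand every letter into an (UPPER|lower) alternation."""
    parts = []
    for ch in text:
        if ch.upper() != ch.lower():
            # A cased letter: accept either case.
            parts.append("(%s|%s)" % (ch.upper(), ch.lower()))
        elif ch in RESERVED:
            # Escape characters that would otherwise be regex operators.
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


def odata_to_regex_query(op, field, text):
    """Build the query_string expression for one OData text operator."""
    pattern = case_insensitive_pattern(text)
    if op == "startswith":
        pattern = pattern + ".*"
    elif op == "endswith":
        pattern = ".*" + pattern
    elif op == "substringof":
        pattern = ".*" + pattern + ".*"
    elif op != "eq":
        raise ValueError("unsupported operator: %s" % op)
    return "%s:/%s/" % (field, pattern)


print(odata_to_regex_query("endswith", "name.raw", "table 8"))
# name.raw:/.*(T|t)(A|a)(B|b)(L|l)(E|e) 8/
```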
Even though this solution works pretty well and the performance is not too bad (which came as a surprise), we'd like to do it differently and leverage the full power of analyzers in order to shift all this burden to indexing time instead of search time. However, since reindexing all our data will take weeks, we'd like to first investigate whether there's a good combination of token filters and analyzers that would help us achieve the search requirements enumerated above.
My thinking is that the ideal solution would contain some wise mix of shingles (i.e. several tokens together) and edge-nGram (i.e. to match at the start or end of a token). What I'm not sure of, though, is whether it is possible to make them work together in order to match several tokens, where one of the tokens might not be fully input by the user. For instance, if the indexed name field is "Big Table 123", I need substringof('table 1',name) to match it, so "table" is a fully matched token, while "1" is only a prefix of the next token.
Thanks in advance for sharing your braincells on this one.
UPDATE 1: after testing Andrei's solution
=> Exact match (eq) and startswith work perfectly.
A. endswith glitches
Searching for substringof('table 112', name) yields 107 docs. Searching for a more specific case such as endswith(name, 'table 112') yields 1525 docs, while it should yield fewer docs (suffix matches should be a subset of substring matches). Checking in more depth, I've found some mismatches, such as "Social Club, Table 12" (doesn't contain "112") or "Order 312" (contains neither "table" nor "112"). I guess it's because they end with "12" and that's a valid gram for the token "112", hence the match.
B. substringof glitches
Searching for substringof('table',name) matches "Party table" and "Alex on big table", but doesn't match "Table 1", "table 112", etc. Searching for substringof('tabl',name) doesn't match anything.
UPDATE 2
It was sort of implied, but I forgot to explicitly mention that the solution will have to work with the query_string query, mainly because the OData expressions (however complex they might be) will keep getting translated into their Lucene equivalent. I'm aware that we're trading off the power of the Elasticsearch Query DSL for Lucene's query syntax, which is a bit less powerful and less expressive, but that's something that we can't really change. We're pretty d**n close, though!
UPDATE 3 (June 25th, 2019):
ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html

This is an interesting use case. Here's my take:
{
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer",
"filter": ["lowercase"]
},
"my_edge_ngram_analyzer": {
"tokenizer": "my_edge_ngram_tokenizer",
"filter": ["lowercase"]
},
"my_reverse_edge_ngram_analyzer": {
"tokenizer": "keyword",
"filter" : ["lowercase","reverse","substring","reverse"]
},
"lowercase_keyword": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "keyword"
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "25"
},
"my_edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
},
"filter": {
"substring": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 25
}
}
}
},
"mappings": {
"test_type": {
"properties": {
"text": {
"type": "string",
"analyzer": "my_ngram_analyzer",
"fields": {
"starts_with": {
"type": "string",
"analyzer": "my_edge_ngram_analyzer"
},
"ends_with": {
"type": "string",
"analyzer": "my_reverse_edge_ngram_analyzer"
},
"exact_case_insensitive_match": {
"type": "string",
"analyzer": "lowercase_keyword"
}
}
}
}
}
}
}
my_ngram_analyzer is used to split every text into small pieces; how large the pieces are depends on your use case. For testing purposes, I chose a maximum of 25 chars. The lowercase filter is used since you said case-insensitive. Basically, this is the analyzer used for substringof('table 1',name). The query is simple:
{
"query": {
"term": {
"text": {
"value": "table 1"
}
}
}
}
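To see why a plain term query can match here, it may help to approximate in Python what my_ngram_analyzer emits at index time. This is a rough sketch, not Elasticsearch code: the ngrams function is hypothetical and simply assumes that the tokenizer keeps every character (spaces included) and that lowercasing is applied.

```python
def ngrams(text, min_gram=2, max_gram=25):
    """Return all lowercased character n-grams of text, spaces included."""
    text = text.lower()
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # Slide a window of the current size over the whole string.
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


indexed = ngrams("Big Table 123")
print("table 1" in indexed)  # True: the search text is one of the grams
print("table 2" in indexed)  # False
```

Since the search text itself is one of the indexed grams, an unanalyzed term lookup is enough for the substring case.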
my_edge_ngram_analyzer is used to split the text starting from the beginning and this is specifically used for the startswith(name,'table 1') use case. Again, the query is simple:
{
"query": {
"term": {
"text.starts_with": {
"value": "table 1"
}
}
}
}
I found this the trickiest part: the one for endswith(name,'table 1'). For this I defined my_reverse_edge_ngram_analyzer, which uses a keyword tokenizer together with lowercase and an edgeNGram filter preceded and followed by a reverse filter. What this analyzer basically does is split the text into edgeNGrams, but the edge is the end of the text, not the start (as with the regular edgeNGram).
The query:
{
"query": {
"term": {
"text.ends_with": {
"value": "table 1"
}
}
}
}
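The lowercase, reverse, edgeNGram, reverse chain can be mimicked in a few lines of Python to check which terms end up in the index. Again, this is only a sketch under the assumption that the whole input is treated as a single token (as the keyword tokenizer does); the function names are made up for the illustration.

```python
def edge_ngrams(text, min_gram=2, max_gram=25):
    """Prefixes of text with lengths from min_gram up to max_gram."""
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


def reverse_edge_ngrams(text, min_gram=2, max_gram=25):
    """Mimic the filter chain: lowercase, reverse, edge n-gram, reverse."""
    reversed_text = text.lower()[::-1]
    grams = edge_ngrams(reversed_text, min_gram, max_gram)
    # Reversing each gram again turns the prefixes into suffixes.
    return [gram[::-1] for gram in grams]


print(reverse_edge_ngrams("Table 1"))
# [' 1', 'e 1', 'le 1', 'ble 1', 'able 1', 'table 1']
print("table 1" in reverse_edge_ngrams("Room 1, Table 1"))  # True
print("table 1" in reverse_edge_ngrams("Table 12"))         # False
```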
For the name eq 'table 1' case, a simple keyword tokenizer together with a lowercase filter should do it.
The query:
{
"query": {
"term": {
"text.exact_case_insensitive_match": {
"value": "table 1"
}
}
}
}
Regarding query_string, this changes the solution a bit, because I was counting on term to not analyze the input text and to match it exactly with one of the terms in the index.
But this can be "simulated" with query_string if the appropriate analyzer is specified for it.
The solution would be a set of queries like the following (always use that analyzer, changing only the field name):
{
"query": {
"query_string": {
"query": "text.starts_with:(\"table 1\")",
"analyzer": "lowercase_keyword"
}
}
}
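On the translation side, picking the sub-field per OData operator could look roughly like the sketch below. The mapping follows the fields defined in the mapping above; the helper name build_query and the quote escaping are assumptions added for the example, not something I have tested against a cluster.

```python
import json

# One sub-field of "text" per OData operator, as defined in the mapping above.
FIELD_FOR_OPERATOR = {
    "substringof": "text",
    "startswith": "text.starts_with",
    "endswith": "text.ends_with",
    "eq": "text.exact_case_insensitive_match",
}


def build_query(op, value):
    """Build a query_string body that always uses the lowercase_keyword analyzer."""
    field = FIELD_FOR_OPERATOR[op]
    # Escape backslashes and double quotes so the value stays one quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": {
            "query_string": {
                "query": '%s:("%s")' % (field, escaped),
                "analyzer": "lowercase_keyword",
            }
        }
    }


print(json.dumps(build_query("startswith", "table 1"), indent=2))
```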

Mustache section values are overridden

How to get this output:
<h1>Colors</h1>
<li><strong>red</strong></li>
<li>green</li>
<li>blue</li>
From this template:
<h1>{{header}}</h1>
{{#bug}}
{{/bug}}
{{#items}}
{{#first}}
<li><strong>{{name}}</strong></li>
{{/first}}
{{#link}}
<li>{{name}}</li>
{{/link}}
{{/items}}
{{#empty}}
<p>The list is empty.</p>
{{/empty}}
This data:
{
"header": "Colors",
"first": true,
"items": [
{"name": "red", "first": true, "url": "#Red"},
{"name": "green", "link": true, "url": "#Green"},
{"name": "blue", "link": true, "url": "#Blue"}
],
"empty": false
}
So that the first field is not overridden.
Currently I get this output:
<h1>Colors</h1>
<li><strong>red</strong></li>
<li><strong>green</strong></li>
<li>green</li>
<li><strong>blue</strong></li>
<li>blue</li>
I'm testing here.

The first key is defined at two levels: at the item level, and at the root level.
When it is not defined at the item level, the Mustache engine digs in and uses the one defined at the root level. This is called the Mustache context stack, and you have just learnt it the hard way.
Now the answer is simple, isn't it? In order to prevent the Mustache engine from digging into the context stack and looking for first at the root level, make sure first is defined at the item level, for all items. Set it to false for all items but the first.
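If the data is prepared in code before rendering, the fix can be applied there. The following Python sketch uses plain dictionaries only; lookup is a simplified, hypothetical model of the context stack (innermost context first), not the API of any Mustache library, and mark_first is a made-up helper name.

```python
def lookup(key, context_stack):
    """Resolve a key like a Mustache engine: innermost context first."""
    for context in reversed(context_stack):
        if key in context:
            return context[key]
    return None


def mark_first(items):
    """Return copies of the items with an explicit boolean 'first' on each."""
    return [dict(item, first=(index == 0)) for index, item in enumerate(items)]


data = {
    "header": "Colors",
    "first": True,
    "items": [
        {"name": "red", "first": True, "url": "#Red"},
        {"name": "green", "link": True, "url": "#Green"},
        {"name": "blue", "link": True, "url": "#Blue"},
    ],
    "empty": False,
}

green = data["items"][1]
print(lookup("first", [data, green]))  # True: falls through to the root level

fixed_items = mark_first(data["items"])
print(lookup("first", [data, fixed_items[1]]))  # False: found on the item itself
```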

Restless - "objects" wrapper

I'm working with Restless and as stated in the documentation, returning Model.objects.all() produces something like this:
{
"objects": [
{
"id": 1,
"title": "First Post!",
"author": "daniel",
"body": "This is the very first post on my shiny-new blog platform...",
"posted_on": "2014-01-12T15:23:46",
},
{
# More here...
}
]
}
This works fine. However, I don't want the "objects" wrapper to be here. My front-end code expects an array.
Is there any way of telling Restless not to wrap the array?

You can do this by overriding the Resource.wrap_list_response() method. The default implementation just wraps the data in a dictionary (under the objects key); you can change it to return the data unchanged.
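A minimal sketch of the override is shown below. To keep it runnable on its own, Resource here is a stand-in class that only mirrors the default behaviour described above; in a real project you would subclass one of the resource classes shipped with Restless instead (for Django, presumably restless.dj.DjangoResource). BareListResource is a hypothetical name.

```python
class Resource:
    """Stand-in for the Restless base class, reduced to the one relevant hook."""

    def wrap_list_response(self, data):
        # Mirrors the default described above: wrap the list under "objects".
        return {"objects": data}


class BareListResource(Resource):
    """Hypothetical resource whose list endpoint returns a plain array."""

    def wrap_list_response(self, data):
        # Return the list untouched so the client receives a JSON array.
        return data


posts = [{"id": 1, "title": "First Post!"}]
print(Resource().wrap_list_response(posts))          # {'objects': [...]}
print(BareListResource().wrap_list_response(posts))  # [...]
```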

How do Django Fixtures handle ManyToManyFields?

I'm trying to load around 30k XML files from clinicaltrials.gov into a MySQL database. Fields with multiple values (locations, keywords, etc.) are handled in separate models using ManyToManyFields.
The best way I've figured out is to read the data in using a fixture. So my question is, how do I handle the fields where the data is a pointer to another model?
I unfortunately don't know enough about how ManyToMany/ForeignKeys work, to be able to answer...
Thanks for the help. Sample code is below; the blanks (____) represent the ManyToMany fields.
{
"pk": trial_id,
"model": trials.trial,
"fields": {
"trial_id": trial_id,
"brief_title": brief_title,
"official_title": official_title,
"brief_summary": brief_summary,
"detailed_Description": detailed_description,
"overall_status": overall_status,
"phase": phase,
"enrollment": enrollment,
"study_type": study_type,
"condition": _______________,
"elligibility": elligibility,
"Criteria": ______________,
"overall_contact": _______________,
"location": ___________,
"lastchanged_date": lastchanged_date,
"firstreceived_date": firstreceived_date,
"keyword": __________,
"condition_mesh": condition_mesh,
}
}

A foreign key is simply the pk of the object you are linking to; a ManyToManyField uses a list of pks. So:
[
    {
        "pk": 1,
        "model": "farm.fruit",
        "fields": {
            "name": "Apple",
            "color": "Green"
        }
    },
    {
        "pk": 2,
        "model": "farm.fruit",
        "fields": {
            "name": "Orange",
            "color": "Orange"
        }
    },
    {
        "pk": 3,
        "model": "person.farmer",
        "fields": {
            "name": "Bill",
            "favorite": 1,
            "likes": [1, 2]
        }
    }
]
You will probably need to write a conversion script to get this done. Fixtures can be very flimsy; it's difficult to get them working, so experiment with a subset before you spend a lot of time converting the 30k records (only to find they might not import).
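Such a conversion script might be structured roughly as follows. This is a sketch only: the model label trials.keyword, its name field, the input dictionary layout and the sample ids are all assumptions made for the example, and the XML parsing step is left out entirely.

```python
import json


class PkRegistry:
    """Hands out one integer pk per distinct related value (e.g. a keyword)."""

    def __init__(self, model_label):
        self.model_label = model_label
        self._pks = {}

    def pk_for(self, name):
        # Reuse the pk if this value was seen before, otherwise allocate one.
        return self._pks.setdefault(name, len(self._pks) + 1)

    def fixture_entries(self):
        return [
            {"pk": pk, "model": self.model_label, "fields": {"name": name}}
            for name, pk in self._pks.items()
        ]


def build_fixture(trials):
    """Turn already-parsed trial dicts into a list of fixture entries."""
    keywords = PkRegistry("trials.keyword")  # hypothetical model label
    trial_entries = []
    for trial in trials:
        trial_entries.append({
            "pk": trial["trial_id"],
            "model": "trials.trial",
            "fields": {
                "brief_title": trial["brief_title"],
                # ManyToManyField: a list of pks of the related rows.
                "keyword": [keywords.pk_for(k) for k in trial["keywords"]],
            },
        })
    # Related rows first, then the trials that point at them.
    return keywords.fixture_entries() + trial_entries


sample = [
    {"trial_id": "T1", "brief_title": "Trial A", "keywords": ["asthma", "children"]},
    {"trial_id": "T2", "brief_title": "Trial B", "keywords": ["children"]},
]
print(json.dumps(build_fixture(sample), indent=2))
```

Running it on a handful of records first makes it easy to try the result with loaddata before converting everything.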

Django: Trying to organize django fixtures

I have some models for which I'd like to provide initial data. The problem is that there are several models, and I'd like to organize the data.
Currently, I have one big JSON file, initial_data.json, with the data. I was thinking I could use some comments, but JSON has no comments! I really want to use JSON.
So, the file is like:
[
{
"model": "app1.Model1",
"pk": 1,
"fields": {
"nombre": "A convenir con el vendedor"
}
},
//many more
{
"model": "app2.Model1",
"pk": 1,
"fields": {
"nombre": "A convenir con el vendedor"
}
},
//many more
{
"model": "app2.Model1",
"pk": 1,
"fields": {
"nombre": "A convenir con el vendedor"
}
},
]
So I thought I could organize them in different files and load them with some initial script. The idea is not to issue several python manage.py loaddata thisApp.Model commands. But then it would be difficult to separate out the files that are not meant to be loaded at initial time.
Here are the files as example:
+app1
    +fixtures
        model1.json
        model2.json
+app2
    +fixtures
        model1.json
        model2.json
+app3
    +fixtures
        model1.json
        model2.json
Do you have any idea how to keep it simple?

Like you said, create several files and write a script that combines them into initial_data.json and invokes the needed django.core.management command. This is what I do.
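A combining script along those lines could look something like this. It is a sketch, not my exact script: the function names and the glob pattern are assumptions, and the final loaddata call is only indicated in a comment because it needs a configured Django project.

```python
import glob
import json
import os


def combine_fixtures(pattern, exclude=None):
    """Concatenate every JSON fixture list matching the glob pattern."""
    combined = []
    for path in sorted(glob.glob(pattern)):
        # Skip the output file itself if it happens to match the pattern.
        if exclude and os.path.abspath(path) == os.path.abspath(exclude):
            continue
        with open(path) as handle:
            combined.extend(json.load(handle))
    return combined


def write_initial_data(pattern, target):
    """Write the combined fixtures to target and return the entry count."""
    entries = combine_fixtures(pattern, exclude=target)
    with open(target, "w") as handle:
        json.dump(entries, handle, indent=2)
    return len(entries)


# Example usage (paths are illustrative):
#   write_initial_data("*/fixtures/model*.json", "initial_data.json")
# and afterwards, inside a configured Django project:
#   from django.core.management import call_command
#   call_command("loaddata", "initial_data.json")
```

Files that should not be loaded initially can simply be given names that the pattern does not match.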

Call the files that contain initial data "initial_data.json" - syncdb will only load those. You can load the others manually with manage.py loaddata.
https://docs.djangoproject.com/en/dev/howto/initial-data/#automatically-loading-initial-data-fixtures
