Specifying a list of fields for search results in django-haystack - django

I'm wondering whether there is a way to specify which fields are returned for a search request to the Elasticsearch backend. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html for how to specify the list in the JSON API.
Let me explain why I need this. I have lots of articles with large text bodies. Searching is very slow in this case because Elasticsearch returns the whole text for each search result, but I only want to render the titles, not the full text.
Is there perhaps another way to do it?

There are multiple options here.
You can use the fields option in Elasticsearch to specify the list of fields whose values should be returned. This saves some latency because less data has to be transported back. However, the actual data is still stored as _source, so it has to be fetched from disk and deserialized on every call.
LINK - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
If you don't need to retrieve a field but only want it to be searchable, you can disable _source and enable store for each field whose data needs to be retrievable.
LINK , _source - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html
LINK , store - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html
Django Haystack documentation - http://django-haystack.readthedocs.org/en/latest/searchresult_api.html#SearchResult.get_additional_fields
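As a rough illustration of the first option, the request body below asks Elasticsearch to return only selected fields by filtering _source. It is a sketch rather than the query Haystack itself sends: the field names title and text and the helper build_title_only_search are assumptions made for the example.

```python
import json


def build_title_only_search(query_text, returned_fields=("title",), size=10):
    """Build an Elasticsearch search body that returns only some fields.

    Assumes a hypothetical index whose documents have `title` and `text`.
    """
    return {
        # Full-text match against the large body field.
        "query": {"match": {"text": query_text}},
        # Source filtering: only these fields come back in each hit.
        "_source": list(returned_fields),
        "size": size,
    }


body = build_title_only_search("haystack")
print(json.dumps(body, indent=2))
```

The body would be sent to the index's _search endpoint. The large text field is still searched, but only the title travels back in each hit.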

Related

Mapping user spreadsheet columns to database fields

I’m not sure where to start on this project. I know how to read the contents of the Excel spreadsheet, I know how to identify the header row, and I know how to loop over the contents. I believe I have the UX portion worked out, but I am not sure how to process the data.
I’ve googled and only found .NET solutions, but I’m looking for a ColdFusion/Lucee solution.
I have a working form allowing me to map a user's spreadsheet columns to my database values (this is being kept simple for this post; the user does not have direct access to the database).
Now that I have my data, I'm not sure how to loop over the results. I believe there will be several loops (an outer and an inner). Then of course I also need to loop over the file contents, but I think if I can get the headings mapped out, I can figure out the rest.
Any good links, tutorials, or guides would be greatly appreciated.
Some pseudo code might be enough to get me started.
User uploads form
System reads headers and content.
User is presented with a form listing the columns from their uploaded spreadsheet to match with available database fields (e.g. “column1” matches “customer name”).
User submits form.
Now what?
UPDATED
Here is what the data looks like AFTER the mapping has been done in my form. The column delimiter is the ::: and within the column the ||| indicates the ID associated with the selected column value. I've included the ID and the column value since I plan on displaying the mapping again as a confirmation. Having the ID saves a trip to the database.
If I understand correctly, your question is: how do you provide the user with a form allowing them to map their spreadsheet columns to those of the database?
Since you have their spreadsheet column names, and you have the database column names, then this problem is essentially a UI/UX problem. You need to show both lists, and allow the user to map them. I can imagine several approaches to this. My first thought would be some sort of drag/drop operation, as follows:
Create a list of boxes, one for each field in your database table, and include the field name in (or above) the box. I'll call this the db field list. Then, create another list for each column from the spreadsheet, which I'll call the spreadsheet column list. The user would drag/drop items from the spreadsheet column list to the db field list.
When a mapping has been completed by the user, you would store the column/field names as data on the DOM element of the db field list box. Then upon submission, you would acquire the mapping data by visiting each box and adding it to an array. Then you would serialize that array into JSON and send that to your form submission handler.
This could be difficult or easy, depending on your knowledge of UI implementations using JavaScript. jQuery makes this easy (if you know jQuery). There's even a jQuery UI plugin that does this: https://jqueryui.com/droppable/.
A quick search for javascript drag drop would help, and here are a few articles I found:
https://www.w3schools.com/html/html5_draganddrop.asp
https://medium.com/quick-code/simple-javascript-drag-drop-d044d8c5bed5
You would also need to submit the array of mappings using javascript. You could search for that as well, and here's an article I found:
https://codereview.stackexchange.com/questions/94493/submit-an-array-as-an-html-form-value-using-javascript
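Once the JSON mapping reaches the form handler, the "now what?" step is mostly the outer/inner loop the question anticipates. Here is a minimal sketch of that step, written in Python only to show the loop structure (the same shape should carry over to CFML). The column names, field names and the helper apply_column_mapping are hypothetical.

```python
import json


def apply_column_mapping(header, rows, mapping):
    """Turn spreadsheet rows into dicts keyed by database field name.

    header  -- list of spreadsheet column names (the header row)
    rows    -- list of data rows, each a list of cell values
    mapping -- dict of {spreadsheet column name: database field name}
    """
    # Outer step: work out which cell position feeds which database field.
    positions = {
        db_field: header.index(column)
        for column, db_field in mapping.items()
        if column in header
    }
    # Inner step: build one record per data row using those positions.
    return [
        {db_field: row[i] for db_field, i in positions.items()}
        for row in rows
    ]


# Hypothetical upload and mapping, as the form handler might receive them.
header = ["column1", "column2", "column3"]
rows = [["Acme Ltd", "acme@example.com", "ignored"],
        ["Bolt Inc", "bolt@example.com", "ignored"]]
mapping = json.loads('{"column1": "customer_name", "column2": "email"}')

records = apply_column_mapping(header, rows, mapping)
print(records)
```

Each resulting record can then be validated and passed to a parameterized insert. Unmapped columns are simply skipped.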

How to build query form to request AWS CloudSearch?

I have a SearchDomain on AWS CloudSearch. I know all the defined facets names.
I would like to build a web query form to use it, but I want to add my category values (facets) on the side, as is done on the Amazon web store.
The only way I have found to get facet values is to make a query (params query); the response then contains the facets linked to my query results.
Is there a way to fetch all the possible facet.FIELD values to build the query form?
If not (as read here), how should I design a form using facets?
You could also use the matchall keyword in a structured query. Also, since you don't need the results, you can pass size=0 so that you only get the facets, which will reduce latency.
/2013-01-01/search?q=matchall&q.parser=structured&size=0&facet.field={sort:'count',size:100}
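A small sketch of building that request for several facets at once, assuming the facet names are known up front as the question states. The names category and brand are hypothetical, and the facet option string simply reuses the form shown in the query above.

```python
from urllib.parse import urlencode


def build_facet_only_query(facet_fields, facet_size=100):
    """Build a CloudSearch search path that returns facets but no hits."""
    params = [
        ("q", "matchall"),            # match every document
        ("q.parser", "structured"),   # matchall needs the structured parser
        ("size", "0"),                # no hits wanted, only facet counts
    ]
    for name in facet_fields:
        # One facet.FIELD parameter per facet, sorted by count.
        params.append(("facet.%s" % name, "{sort:'count',size:%d}" % facet_size))
    return "/2013-01-01/search?" + urlencode(params)


url = build_facet_only_query(["category", "brand"])
print(url)
```

The path is appended to the domain's search endpoint. The facet buckets in the response can then populate the sidebar before the user has typed anything.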

django haystack SearchField with indexed False

Is there any reason to add fields with indexed=False to a SearchIndex?
The documentation mentions that additional fields should be defined for filtering or ordering results. By default a SearchIndex field has indexed=True, so what happens if I set indexed=False?
Will the data still be stored in the index but not be indexed? What happens if I set stored=False?
How does it work?
Thanks
By default, all fields in Haystack are both indexed (searchable by the engine) and stored (retained by the engine and presented in the results). By using a stored field, you can store commonly used data in such a way that you don’t need to hit the database when processing the search result to get more information. You get this advantage if you specify indexed=True and stored=True.
If you specify only indexed=True, you will be hitting the database when processing the search result to get additional information not available in the index.
The purpose of indexed=False is to cater for the scenario where you want a rendered field to follow a pre-rendered template during the indexing process. A good example is illustrated here - https://django-haystack.readthedocs.org/en/latest/searchindex_api.html#stored-indexed-fields
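To make the indexed/stored distinction concrete, here is a toy in-memory model of the two flags. It is only an illustration of the concept and is not how Haystack or any backend is implemented; the class ToyIndex and the field names are made up for the example.

```python
from collections import defaultdict


class ToyIndex:
    """Toy search index; NOT how Haystack or its backends are implemented."""

    def __init__(self, field_options):
        # field_options: {field name: {"indexed": bool, "stored": bool}}
        self.field_options = field_options
        self.inverted = defaultdict(set)   # token -> ids of matching documents
        self.stored = {}                   # document id -> retained field values

    def add(self, doc_id, document):
        self.stored[doc_id] = {}
        for name, value in document.items():
            options = self.field_options[name]
            if options["indexed"]:
                # Indexed: the field's words become searchable.
                for token in value.lower().split():
                    self.inverted[token].add(doc_id)
            if options["stored"]:
                # Stored: the value is handed back with each result.
                self.stored[doc_id][name] = value

    def search(self, word):
        matches = self.inverted.get(word.lower(), ())
        return [self.stored[doc_id] for doc_id in sorted(matches)]


index = ToyIndex({
    "text": {"indexed": True, "stored": False},
    "rendered": {"indexed": False, "stored": True},
})
index.add(1, {"text": "first note about django", "rendered": "<li>First note</li>"})

hits = index.search("django")         # found through the indexed `text` field
no_hits = index.search("<li>first")   # `rendered` is stored but not searchable
print(hits, no_hits)
```

Here the rendered field plays the role of the pre-rendered template output: it cannot be searched, but it comes back with every hit, so no database lookup is needed to display the result.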

Excluding a blob column from active record/linq query results

What's the easiest way to exclude a column from the result set in a Subsonic/ActiveRecord/Linq query?
I've got a table of images, and often I only want the metadata associated with the image (image id/name/dimensions, for example). It seems fairly wasteful to pull in the entire image data for these requests.
My current thought is to split out the image data to a separate table, but I'm wondering if there's an easier/better way.
As you can see in the docs linked below, it's possible to use LINQ and specify your query in detail that way.
http://subsonicproject.com/docs/Using_ActiveRecord
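Whatever the query layer, the underlying idea is to name the metadata columns explicitly so that the blob column is never selected. The sketch below shows that idea with Python's sqlite3 module rather than SubSonic or LINQ. The images table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, name TEXT, "
    "width INTEGER, height INTEGER, data BLOB)"
)
conn.execute(
    "INSERT INTO images (name, width, height, data) VALUES (?, ?, ?, ?)",
    ("logo.png", 64, 32, b"\x89PNG" + b"\x00" * 1024),
)

# Name the metadata columns explicitly; the BLOB column is never selected.
metadata = conn.execute(
    "SELECT id, name, width, height FROM images"
).fetchall()
print(metadata)
conn.close()
```

If the query layer cannot restrict the selected columns, moving the image data into a separate table, as the question suggests, gives the same effect.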

How can I persist a single value in Django?

My Django application retrieves an RSS feed every day. I would like to persist the time the feed was last updated somewhere in the app. I'm only retrieving one feed, it will never grow to be multiple feeds. How can I persist the last updated time?
My ideas so far:
Create a model and add a datetime field to it. This seems like overkill as it adds another table to the database, in which there will only ever be one row. Other than that, it's the most obvious and straightforward solution.
Create a settings object which just stores key/value mappings. The last updated date would just be a row in this table. This is essentially a generic version of the previous solution.
Use dbsettings/django-values, which allows you to store settings in the database. The last updated date would just be a 'setting'.
Any other ideas that I'm missing?
In spite of the fact databases regularly store many rows in any given table, having a table with only one row is not especially costly, so long as you don't have (m)any indexes, which would waste space. In fact most databases create many single row tables to implement some features, like monotonic sequences used for generating primary keys. I encourage you to create a regular model for this.
RAM is volatile, thus not persistent: memcached is not what you asked for.
XML is not the right technology to store a single value.
An RDBMS is not the right technology to store a single value.
Django cache framework will answer your question if CACHE_BACKEND is set to anything else than file://...
The filesystem is the right technology to "persist a single value".
In settings.py:
import os
# File that holds the time of the last fetch, next to settings.py
RSS_FETCH_DATETIME_PATH = os.path.join(
    os.path.abspath(os.path.dirname(__file__)),
    'rss_fetch_datetime'
)
In your rss fetch script:
import time
from django.conf import settings
# Store the current Unix timestamp as text (write() needs a string)
with open(settings.RSS_FETCH_DATETIME_PATH, 'w') as handler:
    handler.write(str(int(time.time())))
Wherever you need to read it:
from django.conf import settings
# Read the stored Unix timestamp back as an integer
with open(settings.RSS_FETCH_DATETIME_PATH) as handler:
    timestamp = int(handler.read())
But cron is the right tool if you want to "run a command every day", for example at 5AM:
0 5 * * * /path/to/manage.py runscript /path/to/retreive/script
Of course, you can still write the last update timestamp to a file at the end of the retrieve script, and use it somewhere else, if that makes sense to you.
Concluding by quoting Ken Thompson:
"One of my most productive days was throwing away 1000 lines of code."
One solution I've used in the past is to use Django's cache feature. You set a value to True with an expiration time of one day (in your case.) If the value is not set, you fetch the feed, otherwise you don't do anything.
You can see my solution here: Importing your Flickr photos with Django
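The flag-with-expiry idea can be sketched as follows. The function takes any cache object with get(key) and set(key, value, timeout) methods, which is the shape of Django's cache API. The InMemoryCache class is only a stand-in so the example runs without Django, and the key name rss_feed_fetched is made up.

```python
import time


class InMemoryCache:
    """Stand-in exposing get/set like Django's cache, for demonstration only."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires_at = self._data.get(key, (default, None))
        if expires_at is not None and expires_at <= time.time():
            self._data.pop(key, None)   # entry has expired
            return default
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, time.time() + timeout)


ONE_DAY = 60 * 60 * 24


def fetch_feed_if_due(cache, fetch, key="rss_feed_fetched"):
    """Run `fetch` unless the flag set by a previous run has not yet expired."""
    if cache.get(key):
        return False          # fetched within the last day; do nothing
    fetch()
    cache.set(key, True, ONE_DAY)
    return True


calls = []
cache = InMemoryCache()
first = fetch_feed_if_due(cache, lambda: calls.append("fetched"))
second = fetch_feed_if_due(cache, lambda: calls.append("fetched"))
print(first, second, calls)
```

Note that this records only whether a fetch happened recently, not when. With a volatile cache backend the flag is also lost on restart, which just triggers one extra fetch.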
If you need it only for caching purposes, why not store it in memcached?
On the other hand, if you use this data for other purposes (e.g. display it on the page, or to make some calculation, etc.), then I would store it in a new model - in Django, all persistence is built on top of the database, via models, and I would not try to use other "clever" solutions.
One thing I used to do when I was developing with PHP was to store the XML somewhere, but with a new tag inserted to hold the timestamp of the latest retrieval. It wasn't great, but it was quick and simple.
Keeping it simple would lead to the idea of just storing it in the file system ... why can't you do that? You could, for example, have a siteconfig module in one of your apps which held these sorts of data. This could load up data from a specific file, which could be text, JSON, ConfigParser, pickle or any suitable format. Just import siteconfig somewhere, and it can load the data and make it available to the other modules in your site. You could easily extend this to hold a dict-like object with a number of settings (e.g., if you ever have multiple feeds, but don't want to have a model just for 2-3 rows, you could easily hold the last-retrieved time for each feed in a dict keyed by feed URL).
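A minimal sketch of such a siteconfig helper, assuming a JSON file and a dict keyed by feed URL. The function names, the file name siteconfig.json and the last_fetched key are illustrative choices, not an existing module.

```python
import json
import os
import tempfile
import time


def load_site_config(path):
    """Return the stored settings, or an empty dict if nothing is saved yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def record_feed_fetch(path, feed_url, fetched_at=None):
    """Save the last-retrieved time for one feed, keyed by its URL."""
    config = load_site_config(path)
    timestamp = int(time.time() if fetched_at is None else fetched_at)
    config.setdefault("last_fetched", {})[feed_url] = timestamp
    # Write to a temporary file first so readers never see a partial file.
    temp_path = path + ".tmp"
    with open(temp_path, "w") as handle:
        json.dump(config, handle)
    os.replace(temp_path, path)
    return config


# Demonstration in a throwaway directory with a fixed, made-up timestamp.
config_path = os.path.join(tempfile.mkdtemp(), "siteconfig.json")
record_feed_fetch(config_path, "https://example.com/feed.xml", fetched_at=1700000000)
saved = load_site_config(config_path)
print(saved)
```

In a real project the path would probably come from settings.py, and the web server's user needs write access to that location.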
Create a session key that persists forever, and update the feed timestamp every time you access it.