Cloudsearch API 2013-01-01 cs.text_relevance replacement - amazon-web-services

I have a CloudSearch instance using API 2011-02-01, and I use a rank expression defined as:
rank-faq-boost = cs.text_relevance({"weights":{"keywords":5.00}, "default_weight":1})
This boosts the score of any record whose keywords field contains the search term. Records without keywords are still returned.
The cs.text_relevance() function is no longer supported in API 2013-01-01, and I can't find the correct alternative.
I have tried to use
q.options={fields:['keywords^5']}
But that doesn't return records that have no keywords field yet contain the matching search term in other fields, for example in the name field.
The only solution I can find so far is to list every field in the array, but this just feels much less flexible than the old way.
Does anybody know of the correct way to implement this?
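For reference, the "list every field" workaround mentioned above could be generated rather than typed by hand. This is only a sketch: build_q_options is a hypothetical helper, and every field name other than keywords and name is an assumption for illustration.

```python
import json

def build_q_options(fields, weights=None):
    """Build a q.options JSON value that lists every searchable field.

    fields:  all text fields to search (assumed to be known up front).
    weights: optional mapping of field name to boost, e.g. {"keywords": 5}.
    """
    weights = weights or {}
    weighted = [
        f"{name}^{weights[name]}" if name in weights else name
        for name in fields
    ]
    return json.dumps({"fields": weighted})

# "description" is a made-up field, used here only as an example.
q_options = build_q_options(["keywords", "name", "description"], {"keywords": 5})
```

Fields without an entry in weights are listed unboosted, so records that match only in name are still searched.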

Related

Cloud Spanner: Case-insensitive query goes extremely slow even when indexed

https://use-the-index-luke.com/sql/where-clause/functions/case-insensitive-search
This page describes the problem I'm having and a potential solution.
To summarize the issue, I want to get all results matching a query using where UPPER(some_column) = UPPER(#param). I have an index, and the query returns in <50ms if I don't use UPPER on some_column. The same query takes 4+ seconds with UPPER, since the table is indexed on some_column alone and not on the UPPER value of that column.
The author proposed this:
To support that query, we need an index that covers the actual search
term. That means we do not need an index on LAST_NAME but on
UPPER(LAST_NAME):
CREATE INDEX emp_up_name
ON employees (UPPER(last_name))
An index whose definition contains functions or expressions is a
so-called function-based index (FBI). Instead of copying the column
data directly into the index, a function-based index applies the
function first and puts the result into the index. As a result, the
index stores the names in all caps notation.
Does Spanner support a way to do this? If not what is a good alternative?
I've tried creating a function-based index like this, but there's a syntax error, which makes me think functions aren't allowed in the Cloud Spanner DDL:
CREATE INDEX some_index
ON Table (
UPPER(Type)
)
As you said, it's not possible to use UPPER in Cloud Spanner DDL, as it's not supported.
You can raise a feature request for that by following this link [1].
The only workaround I can think of is changing the data before it is written, so that it's already stored in uppercase.

Query DynamoDB with case-insensitive condition

We're storing organization names in a DynamoDB table on AWS, and would like to maintain official capitalization in those business names, for example in "TNT" and "FedEx".
Our use case is that users of the application can search for organizations by name, but we'd like their queries to be interpreted case-insensitively. So, queries for "FedEx", "Fedex" or "fedex" should all return the correct item in the table.
Other databases have ways to perform queries ignoring case (for example by the ILIKE key word in PostgreSQL), by expressing queries via regular expressions, or by applying functions in the condition (for example the LOWER() function).
How can this be done in DynamoDB? The documentation on Amazon DynamoDB's Query does not provide an answer.
(The best work-around seems to be storing the name twice: once with the official capitalization in effect, and once in another field with the name converted to lowercase. Searching should then be done on the latter field, with the query search term also converted to lowercase. Yes, I know it adds redundancy to the table. It's a work-around, not an optimal solution.)
Yes, exactly. When you add the new item/row, also add a new field, searchName, that holds the lowercase version of your name field (or, going further, only its letters, numbers and spaces). Then search by that searchName field.
Writing duplicate data in DynamoDB is not a good design. The best solution would be to add Elasticsearch to DynamoDB. You can connect this component out of the box using the AWS console. Then use a custom analyzer in Elasticsearch to get case-insensitive matching.
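A minimal sketch of that normalization, assuming plain ASCII names. make_search_name and build_item are hypothetical helpers, and writing the item to the table is left out.

```python
import re

def make_search_name(name: str) -> str:
    """Lowercase the name and keep only ASCII letters, digits and spaces."""
    lowered = name.casefold()
    cleaned = re.sub(r"[^a-z0-9 ]", "", lowered)
    # Collapse runs of spaces so "Fed  Ex" and "Fed Ex" normalize the same way.
    return " ".join(cleaned.split())

def build_item(name: str) -> dict:
    """Keep the official capitalization and add the searchable copy."""
    return {"name": name, "searchName": make_search_name(name)}

item = build_item("FedEx")
```

The user's search term would go through the same make_search_name call before it is compared with searchName. Note that this sketch drops non-ASCII letters, which may or may not be what you want.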

How to order django query set filtered using '__icontains' such that the exactly matched result comes first

I am writing a simple app in django that searches for records in database.
The user inputs a name in the search field, and that query is used to filter records on a particular field, like this:
Result = Users.objects.filter(name__icontains=query_from_searchbox)
E.g. -
Database consists of names- Shiv, Shivam, Shivendra, Kashiva, Varun... etc.
A search query 'shiv' returns records in the following order:
Kashiva, Shivam, Shiv and Shivendra
That is, ordered by primary key.
My question is: how can I achieve this order?
Shiv, Shivam, Shivendra and Kashiva.
I mean the most relevant result first, then the less relevant ones.
It's not possible to do that with standard Django as that type of thing is outside the scope & specific to a search app.
When you're interacting with the ORM consider what you're actually doing with the database - it's all just SQL queries.
If you wanted to rearrange the results you'd have to manipulate the queryset, check exact matches, then use regular expressions to check for partial matches.
Search isn't really the kind of thing that is best suited to the ORM, however, so you may wish to consider looking at dedicated search applications. They will usually maintain an index, which avoids database hits, and may also offer a percentage-match ordering like the one you're looking for.
A good place to start may be with Haystack
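The manual reordering described above (exact matches first, then partial matches) could look roughly like the following. It is a sketch in plain Python, not a Django feature: rank_matches is a hypothetical helper, it uses prefix checks instead of regular expressions, and it assumes the names have already been pulled out of the queryset, for example with values_list("name", flat=True).

```python
def rank_matches(names, query):
    """Order case-insensitive matches: exact, then prefix, then substring."""
    q = query.casefold()

    def sort_key(name):
        n = name.casefold()
        if n == q:
            tier = 0          # exact match
        elif n.startswith(q):
            tier = 1          # query is a prefix of the name
        else:
            tier = 2          # query appears somewhere inside the name
        return (tier, n)      # alphabetical order within each tier

    matches = [name for name in names if q in name.casefold()]
    return sorted(matches, key=sort_key)

ordered = rank_matches(["Kashiva", "Shivam", "Shiv", "Shivendra", "Varun"], "shiv")
```

This sorts in Python after the query has run, so it only makes sense for result sets small enough to hold in memory.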

Query for field existence in CloudSearch

Suppose I have an optional field called 'xyz' in the list of documents I've indexed in CloudSearch.
How do I query CloudSearch so that it returns only those documents that contain 'xyz'?
If I know up front, that it's a positive integer, I can probably do something like this to get the required list:
q=xyz:[0,}&q.parser=structured
But how do I do it if 'xyz' stores some other type, like a string or a list of ints/strings, etc.?
BTW, I've used Solr before, and there I could simply do q=xyz:* to achieve this. Does CloudSearch support such wildcard expressions?
You can query for non-empty values in a field using the * operator; in your case it's going to be xyz:*. This will only work if you are using the Lucene parser for your query to CloudSearch.
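As a sketch, the query string for such a request could be assembled like this. field_exists_query is a hypothetical helper; q and q.parser=lucene are the CloudSearch search parameters, and the domain's search endpoint is deliberately left out.

```python
from urllib.parse import urlencode, parse_qs

def field_exists_query(field: str) -> str:
    """Build a URL-encoded query string matching documents where `field` has a value."""
    params = {
        "q": f"{field}:*",      # wildcard match on any value in the field
        "q.parser": "lucene",   # the wildcard form relies on the Lucene parser
    }
    return urlencode(params)

query_string = field_exists_query("xyz")
```

The resulting string would be appended to the domain's search URL after a "?".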

SOLR query exclusions

I'm having an issue with querying an index where a common search term also happens to be part of a company name that appears throughout most of the documents. How do I exclude the business name from results without affecting the ranking of a search that includes part of the business name?
example: Bobs Automotive Supply is the business name.
How can I include relevant results when someone searches automotive or supply without returning every document in the index?
I tried "-'Bobs Automotive Supply' +'search term'", but this seems to exclude any document containing Bobs Automotive Supply and isn't very effective when searching for 'supply' or 'automotive'.
Thanks in advance.
Second answer here, based on additional clarification from first answer.
A few options.
Add the business name as stop words in the StopWordFilter. This will stop Solr from indexing them at all. Searches that use them will only really search for those words that aren't in the business name.
Rely on the inherent scoring that Solr applies based on term frequency. It sounds like these terms will appear in the index frequently. Queries for them will still return the documents, but if the user queries for other, less common terms, those will get a higher score.
Apply a low query boost (not quite negative, but lower than for other documents) to documents that contain the business name. This is covered in the Solr Relevancy FAQ http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F
Do you know that the article is tied to the business name, or can you derive this? If so, you could create another field and then just exclude entities that match on the business name using a filter query. Something like
q=search_term&fq=business_name:(NOT search_term)
It may be helpful to use subqueries for this or to just boost down rather than filter out results.
EDIT: The update to the question makes this irrelevant. Leaving it here for posterity. :)
This is why Solr documents have different fields.
In this case, it sounds like there is a "Footer" field that is separate from the "Body" field in your documents. When searches are performed, they would only be done against the Body, which won't include data from the Footer. You could even have a third field, "OriginalContent", which contains the original copy for display purposes. You wouldn't search that, just store it for later.
The important part is to create the two separate fields in your schema and make sure that you index those fields that you want to be able to search.