Solr search suggestion and results on wrong spelling - django

I am using Solr 4.8.1 with django-haystack, indexing across multiple fields. I am seeing a problem with some misspelt search queries: they come up with matches and are also put forward as spelling suggestions.
Example: I have indexed documents that contain the word 'Berkeley'. If I use the Solr admin UI and search for 'berkele', it comes up with the spelling suggestion 'berkelei', and if I then query 'berkelei' it returns 429 results (the same number as when I query 'berkeley').
I am using the example solrconfig.xml that came with Solr and generating schema.xml with django-haystack. Has anyone got an idea why this would happen?
Basically, I would like it to give the correct spelling suggestion when I query something like 'berkele', rather than another misspelt word.

I managed to resolve this issue by removing from the schema.xml file generated by django-haystack.

Related

Django using Postgres Full Text Search not recognizing certain words when setting config='english'

So I'm running into a very weird issue. I'm using Django's SearchQuery with Django 3.2.1 and the most up-to-date PostgreSQL. The issue emerges when I use a search query with search_type set to websearch and config set to english.
Here is the initial code for the instantiation of the search_query with config='english'
search_query = SearchQuery(query_string, search_type='websearch', config='english')
And this is the code for when I search for it:
agenda_items = (
    AgendaItem.objects
    .annotate(
        search=SearchVector('name', 'description', 'note'),
        search_rank=SearchRank('search', search_query),
    )
    .filter(search=search_query)
    .order_by('-search_rank')
    .distinct()
)
This works for most words, but I have found some weird outliers. For instance, when I search for "flamingo" it matches correctly.
However, when I search for the word "cupcake" it returns nothing, despite the word being present in many records in the right fields.
If I remove config='english' from the SearchQuery, it works, but without it I lose the English-language stemming support.
Does anyone have any ideas?

Google Data Studio - Custom Field REGEXP_EXTRACT

I am trying to use the REGEXP_EXTRACT custom field to pull a portion of my URL using the page dimension in Google Data Studio and cannot figure it out. The page url structure is similar to this -
website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage
I'd like to only extract the middle section "great_practiceinfo_part2". I've tried many different formulas, but nothing seems to work. Does the page dimension work in this scenario? Any help would be much appreciated.
Thanks
It seemed to work fine in Google Sheets when I used =REGEXEXTRACT(A3,B3) with your string, website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage, in A3 and the regex \/([^\/]*?)\.aspx\? in B3. I'm guessing you just need to work on how you build your regex pattern string.
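To check the pattern outside Sheets, here is a minimal sketch using Python's re module (chosen only for illustration; the helper name extract_page_name is hypothetical). Data Studio's REGEXP_EXTRACT uses RE2 syntax, so the pattern should still be confirmed there.

```python
import re

# Capture the text between the last "/" before ".aspx?" and the ".aspx?" itself.
PAGE_NAME = re.compile(r"/([^/]*?)\.aspx\?")

def extract_page_name(url):
    """Return the page name without the .aspx extension, or None if no match."""
    match = PAGE_NAME.search(url)
    return match.group(1) if match else None

url = "website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage"
print(extract_page_name(url))
```

The forward slashes are not escaped here because Python does not need that; the capture group is otherwise the same as in the Sheets formula.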

Index documents to Solr using sunburnt

I want to index a few CSV files in Solr and build a search engine using sunburnt for Solr:
from sunburnt import SolrInterface
si = SolrInterface("http://localhost:8985/solr/practice")
I get an error:
Key error: id
I am using Python 2.7.11, Solr 6.1, and sunburnt 0.6.
I found the same question here on Stack Overflow, but it had only one answer and its link is not working now.
I am stuck. Please guide me on what I should do.
I have to build a search engine that can search over multiple fields and multiple files. I found that sunburnt is the best fit for my case. Any suggestions?
Are you providing an id for your documents? In Solr's schema.xml, id is generally defined as the unique key and is a required field. CSV files may or may not have an id field in them. If it is not present, you can send the file with an id using DataImportHandler. For a quick fix, go to your schema.xml, look for the field declared as id, and remove the required=true parameter from it. Also look for the uniqueKey tag, generally defined at the top of schema.xml. Remove it completely, restart the server, and try again. If the error goes away, you can then spend time exploring how to send an id with your documents. An id is required to identify a document uniquely; otherwise the same document could be indexed by Solr multiple times, creating duplicate documents in the index, which is not desirable at all. Hope this helps :)
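As a rough sketch of the longer-term fix (sending an id with every document), the snippet below attaches an id to each CSV row before the rows are handed to whatever indexing client is used. It deliberately stops short of calling sunburnt; the helper name rows_with_ids, the id format, and the sample columns are assumptions made for illustration.

```python
import csv
import io

def rows_with_ids(csv_text, source_name):
    """Parse CSV text and make sure every row carries an 'id' value.

    The generated id combines a per-file label with the row number, so ids
    stay distinct across files and stable when the same file is re-read.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row_number, row in enumerate(reader, start=1):
        doc = dict(row)
        # Keep an id the file already provides; otherwise build one.
        if not doc.get("id"):
            doc["id"] = "{}-{}".format(source_name, row_number)
        docs.append(doc)
    return docs

sample = "title,body\nFirst,hello\nSecond,world\n"
for doc in rows_with_ids(sample, "practice"):
    print(doc["id"], doc["title"])
```

With ids built this way, the uniqueKey definition can stay in schema.xml, and re-indexing a file should replace its documents rather than duplicate them.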

AWS Cloudsearch strange issue

I uploaded a JSON document to CloudSearch in which one field is of type 'text' and searchable. It contains the word 'Residential'.
However, if I search for 'Residentia*', I get no results, while 'Residenti*' or 'Residential' works fine.
Does anyone know why? Thanks heaps!
I ran into similar issues with CloudSearch and searched everywhere for the answer. I eventually came across a piece about "Algorithmic Stemming": https://docs.aws.amazon.com/cloudsearch/latest/developerguide/configuring-analysis-schemes.html.
The default stemming level for English text is "full". I created a custom analysis scheme with stemming set to "None", applied it to most fields in the document, and it solved my problems.
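The behaviour in the question is consistent with stemming, as the toy model below tries to show. It assumes that full stemming stores 'Residential' as the stem 'residenti' and that the prefix of a wildcard query is compared against the stored terms without being stemmed. Both points are assumptions for illustration, not documented CloudSearch internals, and the function name is hypothetical.

```python
def prefix_matches(indexed_terms, query):
    """Return True if a trailing-* prefix query matches any indexed term."""
    prefix = query.rstrip("*").lower()
    return any(term.startswith(prefix) for term in indexed_terms)

# Assumed result of full stemming: only the stem is stored in the index.
stemmed_index = ["residenti"]
# Assumed result with stemming disabled: the lowercased word is stored whole.
unstemmed_index = ["residential"]

for query in ("Residenti*", "Residentia*"):
    print(query,
          prefix_matches(stemmed_index, query),
          prefix_matches(unstemmed_index, query))
```

Under these assumptions, 'Residentia*' is longer than the stored stem and so cannot match it, whereas both prefixes match once the full word is indexed.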

Solr admin shows number of indexes(numDocs) to be greater than the number of files I processed

When I process 56 files with Solr, it says 'numDoc:74'. I have no clue why more indexes would exist than files processed, but one explanation I came up with is that the indexes of a couple of the processed files are too big, so they are split up into multiple indexes (I use rich content extraction on all processed files). It was just a thought, so I don't want to take it as true right off the bat. Can anyone give an alternative explanation or confirm this one?
I am using Django + Haystack + Solr.
Many thanks
Your terminology is unfortunately all incorrect, but the troubleshooting process should be simple enough. Solr comes with an admin console, usually at http://[localhost or domain]:8983/solr/. Go there, find your collection in the drop-down (I am assuming Solr 4) and run the default query in the Query screen. That should give you all your documents and you can see what the extras are.
I suspect you may have some issues with your unique ids and/or reindexing. But with the small number of documents you can really just review what you are actually storing in Solr and figure out what is not correct.
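The same review can be scripted. The sketch below builds the match-all query (*:*, the default in the Query screen) against Solr's select handler and compares the returned ids with the files that were processed. The core name collection1, the idea that each document id corresponds to a file name, and the helper names are all assumptions; the HTTP request itself is left out so the logic can be read on its own.

```python
from urllib.parse import urlencode

def build_select_url(base_url, core, rows=100):
    """Build a URL that asks Solr for the id of every document in a core."""
    params = urlencode({"q": "*:*", "fl": "id", "rows": rows, "wt": "json"})
    return "{}/{}/select?{}".format(base_url.rstrip("/"), core, params)

def unexpected_ids(response, expected_ids):
    """Return ids present in a parsed Solr JSON response but not expected."""
    found = [doc["id"] for doc in response["response"]["docs"]]
    return sorted(set(found) - set(expected_ids))

print(build_select_url("http://localhost:8983/solr", "collection1"))

# Hand-written stand-in for a parsed JSON response, not real Solr output.
fake_response = {"response": {"numFound": 3, "docs": [
    {"id": "a.pdf"}, {"id": "b.pdf"}, {"id": "b.pdf#old"}]}}
print(unexpected_ids(fake_response, ["a.pdf", "b.pdf"]))
```

Any ids reported by unexpected_ids are the "extras" to inspect, for example documents left over from an earlier indexing run under a different id.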