AWS Cloudsearch strange issue - amazon-web-services

I uploaded a JSON document to CloudSearch with one field of type 'text' that is searchable. It contains the word 'Residential'.
However, if I search for 'Residentia*', I get no results, while 'Residenti*' or 'Residential' works fine.
Does anyone know why? Thanks heaps!

I ran into similar issues with CloudSearch and searched everywhere for the answer. I eventually came across a piece about "Algorithmic Stemming": https://docs.aws.amazon.com/cloudsearch/latest/developerguide/configuring-analysis-schemes.html.
The default stemming level for English text is "full". I created a custom analysis scheme with stemming set to "none", applied it to most fields in my documents, and that solved my problems.
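For anyone scripting this rather than using the console, here is a minimal sketch of the same fix with boto3. The domain name, field name and scheme name are placeholders I made up, and the helper functions are my own, not part of any library. Note that redefining an index field replaces its options, so anything not listed in TextOptions goes back to its default.

```python
ALLOWED_STEMMING = {"none", "minimal", "light", "full"}


def build_analysis_scheme(name, language="en", stemming="none"):
    """Build the AnalysisScheme structure passed to define_analysis_scheme."""
    if stemming not in ALLOWED_STEMMING:
        raise ValueError(f"unsupported stemming level: {stemming!r}")
    return {
        "AnalysisSchemeName": name,
        "AnalysisSchemeLanguage": language,
        "AnalysisOptions": {"AlgorithmicStemming": stemming},
    }


def apply_scheme(domain_name, field_name, scheme):
    """Create the scheme, attach it to one text field, and re-index."""
    # Imported here so the builder above can be used without boto3 installed.
    import boto3

    client = boto3.client("cloudsearch")
    client.define_analysis_scheme(DomainName=domain_name, AnalysisScheme=scheme)
    # Redefining the field replaces its options; only the scheme is set here.
    client.define_index_field(
        DomainName=domain_name,
        IndexField={
            "IndexFieldName": field_name,
            "IndexFieldType": "text",
            "TextOptions": {"AnalysisScheme": scheme["AnalysisSchemeName"]},
        },
    )
    # Configuration changes only take effect after the domain is re-indexed.
    client.index_documents(DomainName=domain_name)


scheme = build_analysis_scheme("no_stemming_en")  # placeholder scheme name
# apply_scheme("my-domain", "description", scheme)  # placeholder names, needs AWS credentials
```

The call to apply_scheme is left commented out because it needs credentials and a real domain.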

Related

Google Data Studio - Custom Field REGEXP_EXTRACT

I am trying to use the REGEXP_EXTRACT custom field to pull a portion of my URL from the page dimension in Google Data Studio and cannot figure it out. The page URL structure is similar to this:
website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage
I'd like to only extract the middle section "great_practiceinfo_part2". I've tried many different formulas, but nothing seems to work. Does the page dimension work in this scenario? Any help would be much appreciated.
Thanks
It worked fine in Google Sheets with =REGEXEXTRACT(A3,B3), using your string website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage for A3 and the regex \/([^\/]*?)\.aspx\? for B3. I'm guessing you just need to learn more about how to build your regex pattern string.
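To see what that pattern captures outside of Sheets, here is a small Python check. It uses the same regex as above, minus the escaped slashes, which Python does not need. The function name is just something I picked for the example, and how the backslashes have to be escaped inside a Data Studio formula may differ, so treat this as a way to test the pattern rather than the final formula.

```python
import re

# One capture group: everything between the last "/" and ".aspx?".
PAGE_NAME = re.compile(r"/([^/]*?)\.aspx\?")


def extract_page_name(url):
    """Return the page name before ".aspx?", or None if the URL does not match."""
    match = PAGE_NAME.search(url)
    return match.group(1) if match else None


url = "website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage"
print(extract_page_name(url))  # great_practiceinfo_part2
```

Because the pattern ends in \?, a URL without a query string will not match; drop the trailing \? if those pages should be captured too.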

AWS cloudsearch console providing different search result

In the AWS console, when I search for "authority" it returns results that contain the term "author", which is not relevant to my search term. Is there a configuration problem?
This is because of stemming, which is a trick search engines use to try to return matches for the same root word. For example, a query for "cooking" should probably also match "cook", "cooked", etc. This is accomplished by indexing the word stems, and you can control the extent to which CloudSearch stems.
The default for English is full algorithmic stemming and what you have here is a case of that algorithm not returning the desired results. Your options are to turn down stemming to light or none, or to index this as a literal field rather than text (probably not what you want but I don't know much about your use case).
Here are the docs for Configuring Text Analysis Schemes and Text Processing in CloudSearch.
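To make the effect concrete, here is a toy model of an index that stores stems instead of the original words. It is only a sketch: the stems are hard-coded assumptions chosen to match the behaviour reported in these questions, not the output of CloudSearch's actual stemmer, and the function names are my own.

```python
# Hard-coded stems, for illustration only. A real engine computes them
# with a stemming algorithm; these values are assumptions.
TOY_STEMS = {
    "authority": "author",
    "author": "author",
    "residential": "residenti",
}


def to_term(word, stemming):
    """Lower-case a word and, if stemming is on, reduce it to its toy stem."""
    word = word.lower()
    return TOY_STEMS.get(word, word) if stemming else word


def index_terms(words, stemming=True):
    """Return the set of terms the toy index would store for these words."""
    return {to_term(w, stemming) for w in words}


def term_matches(terms, query, stemming=True):
    """Whole-word query: the query is stemmed the same way as the index."""
    return to_term(query, stemming) in terms


def prefix_matches(terms, prefix):
    """Prefix query: compared against stored terms as-is, without stemming."""
    return any(t.startswith(prefix.lower()) for t in terms)


stemmed = index_terms(["author", "Residential"], stemming=True)
plain = index_terms(["author", "Residential"], stemming=False)

print(term_matches(stemmed, "authority"))               # True
print(term_matches(plain, "authority", stemming=False))  # False
print(prefix_matches(stemmed, "residentia"))             # False
print(prefix_matches(plain, "residentia"))               # True
```

With stemming on, "authority" and "author" collapse to the same stored term, and a prefix longer than the stored stem finds nothing. With stemming off, both queries behave the way the askers expected.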

How to add a new language in Sitecore 8?

I'm working on Sitecore 8 and want to add a new language, but I'm getting this message:
The spell checker dictionary does not exist.
Could you please help me?
All the dictionary files are stored in sitecore\shell\Controls\Rich Text Editor\Dictionaries\ directory.
There is no en-AU dictionary there by default (there are en-UK and en-US dictionaries). You can probably use one of them. Alternatively, search the Internet for "en-AU.tdf" and use a dictionary file from there.
I had the same issue. I was able to resolve it by editing the "InvalidItemNameChars" setting temporarily.
Our project had a patch for InvalidItemNameChars like this:
This setting is usually in web.config. If you don't find it there, check "/sitecore/admin/showconfig.aspx" for InvalidItemNameChars.
I had to remove the '-' from the value and then create the language. With that change you no longer get the error when no spell checker is selected, so you can leave that field empty.
Once you are done creating the language add the '-' back to the config (in my case it was a patch config).

Weblog comment error

I am using WeBlog for blog functionality and have run into a problem. I created an Entry named 'sitecore-mvc', and when I submit a comment on it I get the error 'End of string expected at position 39'. If I change the Entry name to 'sitecoremvc', it works fine, so the problem seems to be the '-' in the Entry name. I do want to keep the '-' in the URL. Is there a solution?
This is a known issue; it is listed in the WeBlog issues on GitHub.
See this link for a solution:
https://github.com/WeTeam/WeBlog/issues/52
You should be able to swap out the DuplicateSubmissionGuard pipeline processor to a custom implementation that escapes hyphen characters in the path.

Solr search suggestion and results on wrong spelling

I am using Solr 4.8.1 with django-haystack and indexing across multiple fields. I am seeing a problem with some misspelt search queries: they return matches and are also put forward as spelling suggestions.
Example: I have indexed documents that contain the word 'Berkeley'. If I use the Solr admin UI and search for 'berkele', it comes up with the spelling suggestion 'berkelei', and if I then query 'berkelei' it returns 429 results (the same number as when I query 'berkeley').
I am using the example solrconfig.xml that came with Solr and just generating the schema.xml using django-haystack. Does anyone have an idea why this would happen?
Basically, I would like it to give the correct spelling suggestion when I query something like 'berkele', rather than another misspelt word.
I managed to resolve this issue by removing from the schema.xml file generated by django-haystack.