Index documents to Solr using sunburnt

I want to index a few CSV files in Solr and build a search engine on top of it using sunburnt:
from sunburnt import SolrInterface
si = SolrInterface("http://localhost:8985/solr/practice")
I get an error:
KeyError: id
I am using Python 2.7.11, Solr 6.1, and sunburnt 0.6.
I found the same question here on Stack Overflow, but it had only one answer and its link no longer works.
I am stuck. Please advise what I should do.
I have to build a search engine that can search over multiple fields and multiple files, and sunburnt looked like the best fit for my case. Any suggestions?

Are you providing an id for your documents? In Solr's schema.xml, id is generally defined as the unique key and is a required field. CSV files may or may not contain an id field. If they do not, you can use DataImportHandler to send the file with an id.

For a quick fix, go to your schema.xml and look for the field declared as id. Remove the required=true attribute from this field. Also look for the uniqueKey tag in your schema.xml (it is generally defined at the top of the file), remove it completely, restart the server, and try again.

If this resolves the error, you can then spend time exploring how to send an id with your documents. A document needs an id to identify it uniquely; otherwise Solr could index the same document multiple times, creating duplicate documents in the index, which is not desirable at all. Hope this helps :)
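One way to make sure every document carries an id is to add it while reading the CSV, before anything is sent to Solr. The following is only a minimal sketch: the helper name rows_with_ids, the source_name argument, and the "file name plus row number" id scheme are assumptions made for illustration, not something sunburnt or Solr prescribes.

```python
import csv


def rows_with_ids(csv_lines, source_name, id_field="id"):
    """Yield each CSV row as a dict, filling in an id when the row has none.

    csv_lines is any iterable of text lines (an open file works).
    The id is built from source_name and the row number, so indexing the
    same file again produces the same ids instead of new duplicates.
    """
    for row_no, row in enumerate(csv.DictReader(csv_lines), start=1):
        if not row.get(id_field):
            row[id_field] = "%s-%d" % (source_name, row_no)
        yield row
```

The resulting dicts could then be handed to whatever client you use; with sunburnt that would presumably be si.add(docs) followed by si.commit(), but check this against the version you have installed.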

Related

sharepoint search doesn't work properly after change environment

I have trouble with SharePoint search.
In my dev environment I customized the results of the Core Search web part.
The results view is changed with XSLT. In the XSLT I use the HitHighlighting template to highlight search results and jQuery to replace # symbols.
In the dev environment the search site works great, but when I moved my settings to the test environment some functionality stopped working.
Search works when the query contains the title, and when it contains the title plus some properties, but not when it contains the title plus certain other properties.
I tried outputting the raw search results, and all of the searched properties are present in them. But when I use a query with the title and one of the "problem" properties, the raw result comes back empty.
Why does SharePoint search return no results for a query with a "problem" property, yet return this property in the raw result for a title-only query?
What could differ between my environments?

I resolved my problem.
In the test environment I deleted the Search service, along with its database, from the services in CA.
Then I created a new one and configured it. Search started working properly.

Solr search suggestion and results on wrong spelling

I am using Solr 4.8.1 with django-haystack and indexing across multiple fields. I am seeing a problem with some misspelled search queries: they return matches and are also offered as spelling suggestions.
Example: I have indexed documents that contain the word 'Berkeley'. If I use the Solr admin UI and search for 'berkele', it offers the spelling suggestion 'berkelei', and if I then query 'berkelei' it returns 429 results (the same number as when I query 'berkeley').
I am using the example solrconfig.xml that came with Solr and generating the schema.xml with django-haystack. Does anyone have an idea why this would happen?
Basically, I would like it to give the correct spelling suggestion when I query something like 'berkele', rather than another misspelled word.

I managed to resolve this issue by removing from the schema.xml file generated by django-haystack.

Solr admin shows number of indexes(numDocs) to be greater than the number of files I processed

When I process 56 files with Solr, it reports 'numDocs: 74'. I have no clue why more indexes would exist than files processed. One explanation I came up with is that the indexes of a couple of the processed files are too big, so they are split into multiple indexes (I use rich content extraction on all processed files). It was just a thought, so I don't want to take it as true right off the bat. Can anyone confirm this explanation or offer an alternative?
I am using Django + Haystack + Solr.
Many thanks

Your terminology is unfortunately all incorrect, but the troubleshooting process should be simple enough. Solr comes with an admin console, usually at http://[localhost or domain]:8983/solr/. Go there, find your collection in the drop-down (I am assuming Solr 4), and run the default query in the Query screen. That should give you all your documents, and you can see what the extras are.
I suspect you may have some issues with your unique ids and/or reindexing. With such a small number of documents, though, you can simply review what you are actually storing in Solr and figure out what is incorrect.
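The same "return everything" query that the admin console runs by default (q=*:*) can also be requested over HTTP from a script. The sketch below only builds the request URL; the function name match_all_url and the collection name in the usage note are placeholders, and the host and port are the usual defaults rather than values taken from the question.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def match_all_url(base_url, collection, rows=100):
    """Build a select URL that asks Solr for every document, as JSON.

    rows caps how many documents come back in one response; raise it
    (or page through results) if the collection holds more than that.
    """
    params = [("q", "*:*"), ("rows", rows), ("wt", "json")]
    return "%s/%s/select?%s" % (base_url.rstrip("/"), collection, urlencode(params))
```

For example, match_all_url("http://localhost:8983/solr", "mycollection") gives a URL you can open in a browser or fetch with any HTTP client, then compare the returned ids against the 56 files you expected.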

How do I display a field name containing the substring OMIT in ApEx?

One of the fields in my database table is named DATEOFDISCHARGEFROMITU. In any report output, this displays as DATEOFDISCHARGEFRU. I've figured out that the missing characters form the word 'OMIT', which makes me think it's related to this old problem in a previous version of ApEx (I'm using version 4.1.)
Is there a way to display the whole field name in the report header when the field name contains the string 'OMIT'?
Note: Using html character codes will allow the field name to display properly, but then when the report is exported to CSV the character codes are of course shown instead of the full field name. I need a solution that works for exports as well as displaying onscreen.

Platforms (tested): Oracle Application Express (APEX), Version 4.0.2
Note: I am not sure how the linked OTN post is relevant to your problem aside from the coincidence that their file export contains the word "OMIT" and your column title contains the word "OMIT".
It's safe to say that "OMIT" isn't an APEX or Oracle reserved word that is sabotaging your output. However, if you were talking about a scrap of SQL that attempted to create a table named "SELECT" or "WHERE",
i.e., SELECT * FROM "SELECT" WHERE...
you'll be blocked by the RDBMS from proceeding. :)
I tried an export with a query that contained a column header labeled "OMIT" (see the far right in the example.) The .csv file interpreted by Microsoft Excel looked like this:
I wrote up a separate Q&A post about creating dynamic APEX report headers to answer your follow-on question about a suitable solution for providing a clean, htmlcode-free output when a report is eventually exported to a text, comma separated (or other delimited) output.
In summary, the linked post suggests to set up a dynamic PL/SQL Function within a page item. The page item can be referenced directly in the report column header definition. This is a screenshot demonstrating a possible solution:
The link to the general explanation has more details on the APEX design tasks that lead to this final product.
Onward.

I solved this by using this solution for exporting to csv without an enclosing quote character - as that was another challenge I was faced with for the particular application I was developing. By manually creating the export file I was also able to define the column headings exactly, and the "OMIT" issue did not occur.
Technically that's not a solution for displaying a report with the required headings that can also be exported (Richard's response does that) but it does what I need it to and solves the immediate problem of the DATEOFDISCHARGEFROMITU column heading.

Sitecore language search with Lucene.NET

I am using Alex Shyba's Advanced Database Crawler to index data from Sitecore and Lucene.NET queries to run searches. I have it working solidly for the most part, but I am having issues with the _language field when I try to do a term match on, for example, en-US, zh-CN and de-DE.
It returns all results for the 'en' culture. But for the zh-CN culture, for example, it returns about 99% of the results and leaves out 2-3 articles from each set. The en and zh-CN entries are different versions of the same item. I can see the item's information for both cultures in the index via Luke.
I am using TermQuery on the language field to return data. I tried using PhraseQuery and WildcardQuery, but every time I got the same results.
I tried escaping the hyphen with a backslash, since StandardAnalyzer doesn't like hyphens, but that didn't work either.
At this point I am out of ideas. How can I have my queries return all the matching documents?
Thanks

The ADC has its own query objects to define search parameters. Simply use the Language property on the SearchParam object to search by a language.
