SolrJ: how to change the field type of a field defined by annotation?

I followed the Solr guide, created a class, and put the @Field annotation in front of the class attributes.
import org.apache.solr.client.solrj.beans.Field;

public class MyDocument {
    @Field
    public String fra_contents;
    ... // Other fields
    // No getters and setters, as shown at https://lucene.apache.org/solr/guide/7_2/using-solrj.html#java-object-binding
}
The generated "managed-schema.xml" shows that "fra_contents" is of type "text_general":
<field name="fra_contents" type="text_general"/>
However, I need to apply a different tokenizer and different filters to this field than the ones associated with "text_general". So I created a field type programmatically (based on Solr's test code) called "fra_contents_type":
<fieldType name="fra_contents_type" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/fra.txt"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
This other SO question explains how the field type is chosen based on the Java variable type, but it does not say how to change this default field type.
So how can I change the field type of this field programmatically while keeping the annotation (i.e. without editing "managed-schema.xml")?
Any help appreciated,

Here is what I found. It uses the Schema API and works a posteriori (once the field already exists in the schema), not a priori.
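import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

// fieldName, fieldTypeName and getSolrClient() are defined elsewhere (not shown)
// Retrieve the current attributes of the field
SchemaRequest.Field originalField = new SchemaRequest.Field(fieldName);
Map<String, Object> updatedFieldAttributes =
        originalField.process(getSolrClient()).getField();
// Point the field at the new field type
updatedFieldAttributes.put("type", fieldTypeName);
// Build the request that replaces the field definition
SchemaRequest.ReplaceField replaceFieldRequest =
        new SchemaRequest.ReplaceField(updatedFieldAttributes);
// Send both updates in a single request; addFieldTypeRequest is the
// request that creates the new field type (built elsewhere, not shown)
List<SchemaRequest.Update> list = new ArrayList<>();
list.add(addFieldTypeRequest);
list.add(replaceFieldRequest);
SchemaRequest.MultiUpdate multiUpdateRequest = new SchemaRequest.MultiUpdate(list);
SchemaResponse.UpdateResponse multipleUpdatesResponse =
        multiUpdateRequest.process(getSolrClient());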
There may be a cleaner way (aka "one liner" ;-) ) to do it!
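For comparison, the same pair of updates can be written as one JSON body for Solr's Schema API, which is what the SolrJ request classes wrap. The Python sketch below only builds that body. The helper name build_schema_update, the shortened analyzer chains, and the sample attributes of the existing field are assumptions made for illustration; sending the body (a POST to the collection's /schema endpoint) is left out.

```python
import json


def build_schema_update(field_attrs, field_type):
    """Return a Schema API body that adds a field type and repoints a field to it."""
    updated = dict(field_attrs)            # copy, so the caller's dict is untouched
    updated["type"] = field_type["name"]   # only the type attribute changes
    return {"add-field-type": field_type, "replace-field": updated}


# Hypothetical, shortened field type; the real one would list every filter.
fra_type = {
    "name": "fra_contents_type",
    "class": "solr.TextField",
    "indexAnalyzer": {
        "tokenizer": {"class": "solr.ClassicTokenizerFactory"},
        "filters": [{"class": "solr.ASCIIFoldingFilterFactory"}],
    },
    "queryAnalyzer": {
        "tokenizer": {"class": "solr.ClassicTokenizerFactory"},
        "filters": [{"class": "solr.ASCIIFoldingFilterFactory"}],
    },
}

# Attributes of the existing field (assumed values for the example).
current = {"name": "fra_contents", "type": "text_general"}

body = build_schema_update(current, fra_type)
print(json.dumps(body, indent=2))
```

As in the SolrJ version, the existing attributes are copied and only "type" is overwritten, because replace-field expects the full field definition rather than a partial patch.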

SailPoint IdentityIQ 8.2 - Return a list of users who have any entitlement(group) in a predetermined list of entitlements

I'm working in an environment where IdentityIQ 8.2 is deployed for access management.
I am attempting to return a list of users, based on if they have any one of the entitlements in the provided "whitelist". (i.e. "Show me any user who has entitlement1 or entitlement2 or entitlement3")
I tried to use the Advanced Analytics search function. It does let you search for identities by entitlement, but it applies AND logic: only users who have every single entitlement on the "whitelist" are returned. I haven't found a way to change this. The Advanced Search type doesn't support searching by entitlement, from what I can tell.
Is there an out of the box way to accomplish this?
You can create the entitlement search with AND and save the result as a Population. You can then use the Debug pages to change operation="AND" to operation="OR" on the CompositeFilter that groups the entitlements.
Here is an example that searches for users who have either of these two AD group memberships (this is a Population saved from Advanced Analytics):
<GroupDefinition indexed="true" name="x" private="true">
  <GroupFilter>
    <CompositeFilter operation="AND">
      <Filter operation="COLLECTION_CONDITION" property="identityEntitlements">
        <CollectionCondition>
          <CompositeFilter operation="OR">
            <CompositeFilter operation="AND">
              <Filter operation="EQ" property="application.name" value="AD"/>
              <Filter operation="EQ" property="name" value="memberOf"/>
              <Filter operation="EQ" property="value" value="{e4ca3ebf-543e-4f19-aa6d-60ebee9968a7}"/>
            </CompositeFilter>
            <CompositeFilter operation="AND">
              <Filter operation="EQ" property="application.name" value="AD"/>
              <Filter operation="EQ" property="name" value="memberOf"/>
              <Filter operation="EQ" property="value" value="{b263fcce-26e5-4fc8-9ed3-012df6b4c262}"/>
            </CompositeFilter>
          </CompositeFilter>
        </CollectionCondition>
      </Filter>
    </CompositeFilter>
  </GroupFilter>
  <Owner>
    <Reference class="sailpoint.object.Identity" name="spadmin"/>
  </Owner>
</GroupDefinition>
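The difference between the AND and OR operations can be sketched outside IdentityIQ. The Python below only illustrates the matching logic and is not SailPoint code; the function name, the sample users, and the short entitlement values are made up for the example.

```python
def matches(entitlements, whitelist, operation="OR"):
    """Check a user's entitlement values against a whitelist."""
    hits = [value in entitlements for value in whitelist]
    # OR: one whitelisted entitlement is enough; AND: every one is required.
    return any(hits) if operation == "OR" else all(hits)


whitelist = ["group-a", "group-b"]
users = {
    "alice": {"group-a"},
    "bob": {"group-a", "group-b"},
    "carol": {"group-c"},
}

with_or = sorted(name for name, ents in users.items() if matches(ents, whitelist, "OR"))
with_and = sorted(name for name, ents in users.items() if matches(ents, whitelist, "AND"))
print(with_or)   # ['alice', 'bob']
print(with_and)  # ['bob']
```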

wso2 dataservice filter null for a dataservice field

In my data service, one of the fields returned by a select statement has a null value.
It is returned like this:
<ROLLNUMBER xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
I want to write a filter and apply some logic depending on whether the value is null or not.
How can I do that in WSO2 ESB?
I tried a few XSL expressions, but nothing works.
You need to use a filter mediator with an XPath expression such as //*[local-name()='ROLLNUMBER']/text(). The filter condition is satisfied only if the ROLLNUMBER element has a value, in which case the flow goes to the then branch; otherwise it goes to the else branch.
Try the following
<filter source="boolean(get-property('yourProperty'))" regex="false">
  <then> <!-- NULL OR NON EXIST --> </then>
  <else> <!-- EXIST --> </else>
</filter>
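The decision both answers rely on (an element with xsi:nil="true" or no text counts as null, anything else as a value) can be checked outside the ESB. The Python sketch below uses the standard library XML parser; the function name has_value and the sample value 42 are assumptions for illustration, and this is not WSO2 configuration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def has_value(xml_text):
    """Return True if the element carries text and is not marked xsi:nil."""
    elem = ET.fromstring(xml_text)
    if elem.get(XSI_NIL) == "true":
        return False
    return bool(elem.text and elem.text.strip())


null_row = '<ROLLNUMBER xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>'
full_row = "<ROLLNUMBER>42</ROLLNUMBER>"

print(has_value(null_row))  # False
print(has_value(full_row))  # True
```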

Getting rid of plaintext hyperlinks before indexing a record in Solr

I have a field whose content is used to generate facets. One problem I'd like to solve is that some of my content contains plaintext hyperlinks such as http://google.com. As a result, http has started showing up as one of my top facets. How can I filter out the hyperlink content before it is indexed? With a regex filter of some sort?
I know that I can do this pre-processing part on the client side, when I add the records to Solr. Yet, I'd like to keep everything consistent, and part of the Solr pipeline, so I'd like the Solr pre-processor to do this for me if possible.
I would solve it with these components:
The solr.UAX29URLEmailTokenizerFactory tokenizer preserves the URL as a single token
The solr.PatternReplaceFilterFactory replaces the URL token with an empty string (search Stack Overflow for a suitable regex pattern)
A solr.LengthFilterFactory drops the resulting zero-length token
In schema.xml:
<analyzer type="index">
  <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
  <filter class="solr.PatternReplaceFilterFactory" pattern="..." replacement="" />
  <filter class="solr.LengthFilterFactory" min="1" max="1000" />
</analyzer>
Note that changing the tokenizer from the solr.StandardTokenizerFactory may have implications beyond what is described in this answer, so be sure to test.
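The three steps can be imitated in a few lines of Python to see what ends up in the index. This is a rough sketch, not Solr's implementation: the tokenizer is replaced by a simplified regex, and the URL pattern is an assumption chosen for the example, not the pattern elided in the schema snippet.

```python
import re

# Simplified stand-in for the tokenizer: URLs stay whole, other text splits into words.
TOKEN_RE = re.compile(r"https?://\S+|\w+")
# Assumed URL pattern, for illustration only.
URL_RE = re.compile(r"^https?://\S+$")


def analyze(text):
    tokens = TOKEN_RE.findall(text)                     # tokenizer step
    tokens = [URL_RE.sub("", t) for t in tokens]        # pattern-replace step
    return [t for t in tokens if 1 <= len(t) <= 1000]   # length-filter step


print(analyze("see http://google.com for details"))  # ['see', 'for', 'details']
```

Without the URL-preserving tokenizer step, the link would be split into pieces such as http, which is how it ends up as a facet value.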

Django Haystack Solr autocomplete with numbers not working

I almost have autocomplete working using Haystack with Solr, but it doesn't seem to work when the tag I'm trying to match starts with only one number.
I have these tags:
"8th Grade"
"9th Grade"
"10th Grade"
This is my query and Haystack definition:
tags = SearchQuerySet().models(Tag).filter(SQ(name_auto=autocomplete_string))

class TagIndex(indexes.SearchIndex, indexes.Indexable):
    name = indexes.CharField(model_attr='name', faceted=True)
    name_auto = indexes.EdgeNgramField(model_attr='name')
autocomplete_string = "10" works.
autocomplete_string = "th" works.
autocomplete_string = "8th" does NOT work.
This is part of my Schema for Solr:
<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>
It looks like "9th Grade" is somehow being split into numbers and words. The "9" part has length 1, which is shorter than minGramSize, so it never produces an n-gram and the query cannot match. How can I force "9th" to be indexed as an atomic word, so that autocompleting on "9t" works, or otherwise adjust the settings to get this working?
For some reason, I wouldn't want to decrease minGramSize to 1, but if that's the only way ..
Please check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
You may want to set splitOnNumerics to 0:
splitOnNumerics="1" causes alphabet => number transitions to generate a new part [Solr 1.3]:
"j2se" => "j" "2" "se"
default is true ("1"); set to 0 to turn off
(not a SOLR expert, I'm not 100% sure of this)
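To see why "8th" fails while "10" works, the analysis chain can be imitated in Python. This is a rough sketch, not Solr's code: it only models lowercasing, letter/digit splitting, and front edge n-grams with the sizes from the schema in the question, and it assumes every query part must be found in the index. All function names are made up for the example.

```python
import re


def word_delimiter(token, split_on_numerics=True):
    """Rough imitation of splitting a token at letter/digit transitions."""
    if split_on_numerics:
        return re.findall(r"[a-z]+|[0-9]+", token)
    return re.findall(r"[a-z0-9]+", token)


def edge_ngrams(part, min_size=2, max_size=15):
    """Front edge n-grams; parts shorter than min_size yield nothing."""
    return [part[:n] for n in range(min_size, min(len(part), max_size) + 1)]


def index_terms(text, split_on_numerics=True):
    terms = set()
    for token in text.lower().split():
        for part in word_delimiter(token, split_on_numerics):
            terms.update(edge_ngrams(part))
    return terms


def query_matches(query, text, split_on_numerics=True):
    indexed = index_terms(text, split_on_numerics)
    parts = [p for t in query.lower().split()
             for p in word_delimiter(t, split_on_numerics)]
    return all(p in indexed for p in parts)  # assumption: all parts are required


print(query_matches("8th", "8th Grade"))                           # False
print(query_matches("10", "10th Grade"))                           # True
print(query_matches("8th", "8th Grade", split_on_numerics=False))  # True
```

With splitting on, "8th" becomes "8" and "th"; the single-character "8" produces no n-gram of size 2 or more, so the query part "8" has nothing to match. With splitting off, "8th" stays whole and is indexed as "8t" and "8th".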

I want to find "Radiohead" but not "Radiohead's" with Sunspot/Solr

I'm using Solr via the Sunspot gem in a Rails project.
I am indexing scraped data.
My indexing is currently done like so:
searchable do
  text :title, :boost => 3.0 do
    title.gsub(/\'s\b/, "")
  end
  text :mentions do
    mentions.map do |mention|
      mention.title.gsub(/\'s\b/, "")
    end
  end
end
Currently, if I do:
Video.solr_search { fulltext '"Radiohead"' }
Solr will return results with:
Radiohead's
and
Radiohead
I would like to only find:
Radiohead
Is there a way to do this via Sunspot?
Check what filters you have defined in the analyzer section of the field type for your field in schema.xml (in .../solr/conf directory). Here's an example:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    ...
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>
The behaviour you're seeing is called "stemming": the indexed value is the stem of the word rather than the word itself. For example, "connect", "connected" and "connecting" would all be indexed as "connect". If the field type has a stemming filter such as Snowball, you'll get the behaviour you're seeing. Try removing the filter, restarting Solr, and then reindexing your documents.
You should do a phrase query (using double quotes):
Video.solr_search { fulltext '"Radiohead"' }
Or modify your Solr schema.xml so that "Radiohead's" is not split. I don't know your field configuration, so I can't provide more details.