Searchkick not searching multiple terms when specifying fields - ruby-on-rails-4

Can anyone provide advice on the following, please?
I'm using Searchkick / Elasticsearch and would like to search for a key term or terms across multiple fields (name, manufacturer). So, for example, if I'm looking for a product called "myproduct" made by "somemanufacturer", I'd expect to see this result appear whether I search "myproduct", "somemanufacturer" or "myproduct somemanufacturer", as both terms are included in either the name or manufacturer fields.
My problem is the following:
@products = Product.search query
This allows all the search terms listed above and returns the expected result. However, as soon as I add
@products = Product.search query, fields: [:name, :manufacturer_name]
it will only return a result for "myproduct" or "somecompany", but not "myproduct somecompany".
Now this isn't a big deal as I can remove the fields option entirely, BUT I need to utilise Searchkick's word_start for the name field. So my final query is something like this:
@products = Product.search query, fields: [{name: :word_start}, :manufacturer_name]
I'd like users to be able to search on the first part of the product name and enter a manufacturer too, e.g. "myprod somecompany". Unfortunately this returns zero results, when I was hoping it would return the product named myproduct, made by somecompany.
Am I missing something really obvious here? I can add
operator: 'or'
but really I want to be able to do a partial search on the name, add additional terms, and have a record returned only when both are present.
Here's my model code also:
class Product < ActiveRecord::Base
  searchkick word_start: [:name]
end
Thanks

If all of your fields share the same analyzer, you can use the Elasticsearch cross_fields query type. If that is not the case, you can use query_string. Unfortunately Searchkick does not support cross_fields or query_string yet, so you have to do it yourself.
Index (different analyzers)
searchkick merge_mappings: true, mappings: {
  product: {
    properties: {
      name: {
        type: 'string',
        analyzer: 'searchkick_word_start_index',
        copy_to: 'grouped'
      },
      manufacturer_name: {
        type: 'string',
        analyzer: 'default_index',
        copy_to: 'grouped'
      },
      grouped: {
        type: 'string',
        fields: {
          raw: {type: 'string', index: 'not_analyzed'}
        }
      }
    }
  }
}
Search with cross_fields
@products = Product.search(body: {
  query: {
    multi_match: {
      query: query,
      type: "cross_fields",
      operator: "and",
      fields: [
        "name",
        "manufacturer_name",
        "grouped"
      ]
    }
  }
})
Search with query_string
@products = Product.search(body: {
  query: {
    query_string: {
      query: query,
      default_operator: "AND",
      fields: [
        "name",
        "manufacturer_name",
        "grouped"
      ]
    }
  }
})
Update
- Changed my answer to use different analyzers and multi-fields, following this solution.
Unfortunately, by passing the query to Elasticsearch yourself you lose Searchkick features like highlighting and conversions, but you can still get them by adding them to the Elasticsearch query.
Adding highlight to the query
@products = Product.search(body: {
  query: {
    ...
  },
  highlight: {
    fields: {
      name: {},
      manufacturer_name: {}
    }
  }
})
Adding conversions to the query
@products = Product.search(body: {
  query: {
    bool: {
      must: {
        dis_max: {
          queries: {
            query_string: {
              ...
            }
          }
        }
      },
      should: {
        nested: {
          path: 'conversions',
          score_mode: 'sum',
          query: {
            function_score: {
              boost_mode: 'replace',
              query: {
                match: {
                  "conversions.query": query
                }
              },
              field_value_factor: {
                field: 'conversions.count'
              }
            }
          }
        }
      }
    }
  }
})

The easiest way to do this is to combine the fields into one with the search_data method.
class Product < ActiveRecord::Base
  # word_start on the combined field, matching the word_start option from the original model
  searchkick word_start: [:full_name]

  def search_data
    {
      full_name: "#{manufacturer_name} #{name}"
    }
  end
end
Be sure to reindex afterwards, then search against the full_name field. All of Searchkick's features will continue to work.

Related

AWS RDS Data API executeStatement not return column names

I'm playing with the new Data API for Amazon Aurora Serverless.
Is it possible to get the table column names in the response?
If, for example, I run the following query on a user table with the columns id, first_name, last_name, email, phone:
const sqlStatement = `
  SELECT *
  FROM user
  WHERE id = :id
`;
const params = {
  secretArn: <mySecretArn>,
  resourceArn: <myResourceArn>,
  database: <myDatabase>,
  sql: sqlStatement,
  parameters: [
    {
      name: "id",
      value: {
        "longValue": 1
      }
    }
  ]
};
let res = await this.RDS.executeStatement(params)
console.log(res);
I'm getting a response like this one, so I need to guess which column corresponds to each value:
{
  "numberOfRecordsUpdated": 0,
  "records": [
    [
      {
        "longValue": 1
      },
      {
        "stringValue": "Nicolas"
      },
      {
        "stringValue": "Perez"
      },
      {
        "stringValue": "example@example.com"
      },
      {
        "isNull": true
      }
    ]
  ]
}
I would like to have a response like this one:
{
id: 1,
first_name: "Nicolas",
last_name: "Perez",
email: "example@example.com",
phone: null
}
Update 1
I have found an npm module that wraps the Aurora Serverless Data API and simplifies development.
We decided to take the current approach because we were trying to cut down on the response size and including column information with each record was redundant.
You can explicitly choose to include column metadata in the result. See the parameter: "includeResultMetadata".
https://docs.aws.amazon.com/rdsdataservice/latest/APIReference/API_ExecuteStatement.html#API_ExecuteStatement_RequestSyntax
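For illustration, here is a minimal sketch of the same statement with that flag set, assuming the AWS SDK for JavaScript v2 and code running inside an async function; the ARN/database values are placeholders, not real ones:
// Sketch: ask the Data API to return column metadata along with the records.
const AWS = require('aws-sdk');
const rds = new AWS.RDSDataService();

const res = await rds.executeStatement({
  secretArn: process.env.SECRET_ARN,       // placeholder
  resourceArn: process.env.RESOURCE_ARN,   // placeholder
  database: process.env.DATABASE,          // placeholder
  sql: 'SELECT * FROM user WHERE id = :id',
  parameters: [{ name: 'id', value: { longValue: 1 } }],
  includeResultMetadata: true               // adds columnMetadata to the response
}).promise();

console.log(res.columnMetadata.map(c => c.name));
// e.g. [ 'id', 'first_name', 'last_name', 'email', 'phone' ]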
Agree with the consensus here that there should be an out-of-the-box way to do this from the Data Service API. Because there is not, here's a JavaScript function that will parse the response.
const parseDataServiceResponse = res => {
  let columns = res.columnMetadata.map(c => c.name);
  let data = res.records.map(r => {
    let obj = {};
    r.forEach((v, i) => {
      obj[columns[i]] = Object.values(v)[0];
    });
    return obj;
  });
  return data;
}
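A usage sketch, assuming the statement was executed with includeResultMetadata: true (columnMetadata is only present then) and reusing the hypothetical rds and params from the sketch above:
const res = await rds.executeStatement({ ...params, includeResultMetadata: true }).promise();
const rows = parseDataServiceResponse(res);
// rows => [ { id: 1, first_name: 'Nicolas', last_name: 'Perez', email: 'example@example.com', phone: true } ]
// note: an { "isNull": true } cell comes back as true rather than null (see the answer below for handling that)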
I understand the pain, but it looks like this is reasonable, given that a SELECT statement can join multiple tables and duplicate column names may exist.
Similar to the answer above from @C. Slack, but I used a combination of map and reduce to parse the response from Aurora Postgres.
// declare the column names in an array
const columns = ['a.id', 'u.id', 'u.username', 'g.id', 'g.name'];

// execute the sql statement
const params = {
  database: AWS_PROVIDER_STAGE,
  resourceArn: AWS_DATABASE_CLUSTER,
  secretArn: AWS_SECRET_STORE_ARN,
  // includeResultMetadata: true,
  sql: `
    SELECT ${columns.join()} FROM accounts a
    FULL OUTER JOIN users u ON u.id = a.user_id
    FULL OUTER JOIN groups g ON g.id = a.group_id
    WHERE u.username=:username;
  `,
  parameters: [
    {
      name: 'username',
      value: {
        stringValue: 'rick.cha',
      },
    },
  ],
};
const rds = new AWS.RDSDataService();
const response = await rds.executeStatement(params).promise();

// parse the response into a json array
const data = response.records.map((record) => {
  return record.reduce((prev, val, index) => {
    return { ...prev, [columns[index]]: Object.values(val)[0] };
  }, {});
});
Hope this code snippet helps someone.
And here is the response
[
{
'a.id': '8bfc547c-3c42-4203-aa2a-d0ee35996e60',
'u.id': '01129aaf-736a-4e86-93a9-0ab3e08b3d11',
'u.username': 'rick.cha',
'g.id': 'ff6ebd78-a1cf-452c-91e0-ed5d0aaaa624',
'g.name': 'valentree',
},
{
'a.id': '983f2919-1b52-4544-9f58-c3de61925647',
'u.id': '01129aaf-736a-4e86-93a9-0ab3e08b3d11',
'u.username': 'rick.cha',
'g.id': '2f1858b4-1468-447f-ba94-330de76de5d1',
'g.name': 'ensightful',
},
]
Similar to the other answers, but if you are using Python/Boto3:
def parse_data_service_response(res):
    columns = [column['name'] for column in res['columnMetadata']]
    parsed_records = []
    for record in res['records']:
        parsed_record = {}
        for i, cell in enumerate(record):
            key = columns[i]
            value = list(cell.values())[0]
            parsed_record[key] = value
        parsed_records.append(parsed_record)
    return parsed_records
I've added to the great answer already provided by C. Slack to deal with AWS returning { "isNull": true } in the JSON for empty nullable character fields.
Here's my function to handle this by returning an empty string value - this is what I would expect anyway.
const parseRDSdata = (input) => {
  let columns = input.columnMetadata.map(c => { return { name: c.name, typeName: c.typeName }; });

  let parsedData = input.records.map(row => {
    let response = {};
    row.map((v, i) => {
      // test the typeName in the column metadata, and also the keyName in the values -
      // we need to cater for a return value of { "isNull": true } - pflangan
      if ((columns[i].typeName == 'VARCHAR' || columns[i].typeName == 'CHAR') && Object.keys(v)[0] == 'isNull' && Object.values(v)[0] == true)
        response[columns[i].name] = '';
      else
        response[columns[i].name] = Object.values(v)[0];
    });
    return response;
  });

  return parsedData;
}

Spring Data MongoDB elemMatch Simple

{ _id: 1, results: [ "tokyo", "japan" ] }
{ _id: 2, results: [ "sydney", "australia" ] }
db.scores.find(
{ results: { $elemMatch: { $regex: *some regex* } } }
)
How do you convert this simple elemMatch example using Spring Data MongoDB Query Criteria?
If the array contains objects I can do it this way:
Criteria criteria =
Criteria.where("results").
elemMatch(
Criteria.where("field").is("tokyo")
);
But in my question, I don't have the "field".
Update:
I thought Veeram's answer was going to work after trying it out:
Criteria criteria =
Criteria.where("results").
elemMatch(
new Criteria().is("tokyo")
);
It does not return anything. Am I missing something?
When I inspect the query object, it states the following:
Query: { "setOfKeys" : { "$elemMatch" : { }}}, Fields: null, Sort: null
On the other hand, if I modify the criteria using Criteria.where("field") as above,
Query: { "setOfKeys" : { "$elemMatch" : { "field" : "tokyo"}}}, Fields: null, Sort: null
I'm getting something, but that's not how my data is structured; results is an array of strings, not objects.
I actually need to use regex; for simplicity, the above example uses .is.
You can try the below query:
Criteria criteria = Criteria.where("results").elemMatch(new Criteria().gte(80).lt(85));
Try this:
Criteria criteria = Criteria.where("results").regex(".*tokyo.*","i");

How to exclude substring in Elasticsearch regexp

I'm trying to write an Elasticsearch regexp that excludes elements that have a key containing a substring, let's say in the title of books.
The Elasticsearch docs suggest that a substring can be excluded with the following snippet:
#&~(foo.+) # anything except string beginning with "foo"
However, in my case, I've tried to create such a filter and failed.
{
  query: {
    constant_score: {
      filter: {
        bool: {
          filter: query_filters,
        },
      },
    },
  },
  size: 1_000,
}

def query_filters
  [
    { regexp: { title: "#&~(red)" } },
    # goal: exclude titles that start with "Red"
  ]
end
I've used other regexp in the same query filter that have worked, so I don't think there's a bug in the way the regexp is being passed to ES.
Any ideas? Thanks in advance!
Update:
I found a workaround: I can add a must_not clause to the filter.
{
  query: {
    constant_score: {
      filter: {
        bool: {
          filter: query_filters,
          must_not: must_not_filters,
        },
      },
    },
  },
  size: 1_000,
}

def must_not_filters
  [ { regexp: { title: "red.*" } } ]
end
Still curious if there's another idea for the original regex, though.

Strongloop Loopback Where Or filter in REST syntax

I need to query where descr like 'xxx' or short_descr like 'xxx'
I know how to do it using:
{"where": {
"or": [
{"description": {"like": "xxx"}},
{"short_description": {"like": "xxx"}}
}
}
}
but I need to add the query params in REST syntax.
I'm trying:
params['filter[where][or]'] = JSON.stringify([
{ "description": { "like": "xxx" } },
{ "short_description": { "like": "xxx" } }
])
which fails with the result: The or operator has invalid clauses.
Here is an example (I used 'desc' instead of 'description'):
http://localhost:3000/api/cats?filter[where][or][0][desc][like]=foo&filter[where][or][1][short_desc][like]=goo
So the important parts are this:
First, you need to give an index to each part of the OR clause. Note the first one is 0, then 1.
Secondly - um... I thought I had more, but that's pretty much it.
More information on WHERE filters: https://docs.strongloop.com/display/LB/Where+filter
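If you build the filter from code rather than typing the URL by hand, here is a minimal sketch of the same idea using the question's params-object approach (the field names desc / short_desc and the values foo / goo are just the example values from the URL above):
// Sketch: spell out each OR clause as its own indexed query-string parameter.
const params = {};
params['filter[where][or][0][desc][like]'] = 'foo';
params['filter[where][or][1][short_desc][like]'] = 'goo';

// Serialize to a query string for the REST call.
const qs = Object.keys(params)
  .map(k => encodeURIComponent(k) + '=' + encodeURIComponent(params[k]))
  .join('&');
// GET http://localhost:3000/api/cats?<qs>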

Search for Substring in several fields with MongoDB and Mongoose

I am so sorry, but after a day of researching and trying all different combinations and npm packages, I am still not sure how to deal with the following task.
Setup:
MongoDB 2.6
Node.JS with Mongoose 4
I have a schema like so:
var trackingSchema = mongoose.Schema({
  tracking_number: String,
  zip_code: String,
  courier: String,
  user_id: Number,
  created: { type: Date, default: Date.now },
  international_shipment: { type: Boolean, default: false },
  delivery_info: {
    recipient: String,
    street: String,
    city: String
  }
});
Now the user gives me a search string, or rather an array of strings, which will be substrings of what I want to search:
var search = ['15323', 'julian', 'administ'];
Now I want to find those documents where any of the fields tracking_number, zip_code, or the fields in delivery_info contain my search elements.
How should I do that? I get that there are indexes, but I probably need a compound index, or maybe a text index? And for the search, can I then use a regex, or the $text $search syntax?
The problem is that I have several strings to look for (my search), and several fields to look in. And due to one of those aspects, every approach failed for me at some point.
Your use case is a good fit for text search.
Define a text index on your schema over the searchable fields:
trackingSchema.index({
  tracking_number: 'text',
  zip_code: 'text',
  'delivery_info.recipient': 'text',
  'delivery_info.street': 'text',
  'delivery_info.city': 'text'
}, {name: 'search'});
Join your search terms into a single string and execute the search using the $text query operator:
var search = ['15232', 'julian'];
Test.find({$text: {$search: search.join(' ')}}, function(err, docs) {...});
Even though this passes all your search values as a single string, this still performs a logical OR search of the values.
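If you instead need every term to be present, one option is to quote each term so that $text treats it as a required phrase (MongoDB requires all quoted phrases to match, while bare terms are ORed); a minimal sketch building on the search above:
// Sketch: quote each term so $text requires all of them instead of any of them.
var search = ['15232', 'julian'];
var allTerms = search.map(function(term) { return '"' + term + '"'; }).join(' ');
Test.find({$text: {$search: allTerms}}, function(err, docs) {...});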
Why not just try:
var trackingSchema = mongoose.Schema({
  tracking_number: String,
  zip_code: String,
  courier: String,
  user_id: Number,
  created: { type: Date, default: Date.now },
  international_shipment: { type: Boolean, default: false },
  delivery_info: {
    recipient: String,
    street: String,
    city: String
  }
});
var Tracking = mongoose.model('Tracking', trackingSchema);

var search = [ "word1", "word2", ...];
var results = [];
for (var i = 0; i < search.length; i++) {
  Tracking.find({ $or: [
    { tracking_number: search[i] },
    { zip_code: search[i] },
    { courier: search[i] },
    { 'delivery_info.recipient': search[i] },
    { 'delivery_info.street': search[i] },
    { 'delivery_info.city': search[i] }
  ]}).exec(function(err, trackings) {
    trackings.forEach(function(tracking) {
      // push every unique result to the results array
      if (results.indexOf(tracking) < 0) results.push(tracking);
    });
  });
}
Okay, I came up with this.
My schema now has an extra field search with an array of all my searchable fields:
var trackingSchema = mongoose.Schema({
...
search: [String]
});
With a pre-save hook, I populate this field:
trackingSchema.pre('save', function(next) {
  this.search = [ this.tracking_number ];
  var searchIfAvailable = [
    this.zip_code,
    this.delivery_info.recipient,
    this.delivery_info.street,
    this.delivery_info.city
  ];
  for (var i = 0; i < searchIfAvailable.length; i++) {
    if (!validator.isNull(searchIfAvailable[i])) {
      this.search.push(searchIfAvailable[i].toLowerCase());
    }
  }
  next();
});
In the hope of improving performance, I also index that field (also the user_id as I limit search results by that):
trackingSchema.index({ search: 1 });
trackingSchema.index({ user_id: 1 });
Now, when searching, I first list all the substrings I want to look for in an array:
var andArray = [];
var searchTerms = searchRequest.split(" ");
searchTerms.forEach(function(searchTerm) {
  andArray.push({
    search: { $regex: searchTerm, $options: 'i' }
  });
});
I use this array in my find() and chain it with an $and:
Tracking.
find({ $and: andArray }).
where('user_id').equals(userId).
limit(pageSize).
skip(pageSize * page).
exec(function(err, docs) {
// hooray!
});
This works.