Query for field existence in CloudSearch - amazon-web-services

Suppose I have an optional field called 'xyz' in the list of documents I've indexed in CloudSearch.
How do I query CloudSearch so that it returns only those documents that contain 'xyz'?
If I know up front that it's a positive integer, I can probably do something like this to get the required list:
q=xyz:[0,}&q.parser=structured
But how do I do it if 'xyz' stores some other type, such as a string or a list of ints or strings?
BTW, I've used Solr before, and there, I could simply do q=xyz:* to achieve this. Does CloudSearch support such regular expressions?

You can query for non-empty values in a field using the * operator; in your case it's going to be xyz:*. This only works if you use the Lucene parser (q.parser=lucene) for your CloudSearch query.
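As a rough illustration, the request could be assembled like this in Python. The endpoint hostname is a placeholder, and the /2013-01-01/search path is assumed from the API version mentioned elsewhere on this page; only the q and q.parser values come from the answer.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute your own domain's search endpoint.
SEARCH_ENDPOINT = "https://search-example-domain.us-east-1.cloudsearch.amazonaws.com"


def field_exists_url(field: str) -> str:
    """Build a search URL matching documents in which `field` has any value."""
    params = {
        "q": f"{field}:*",     # wildcard match on the field, as in the answer
        "q.parser": "lucene",  # the wildcard form relies on the Lucene parser
    }
    # urlencode percent-encodes ':' and '*' so the query survives transport.
    return f"{SEARCH_ENDPOINT}/2013-01-01/search?{urlencode(params)}"


url = field_exists_url("xyz")
```

The helper name field_exists_url is made up for this sketch; the same two parameters can be sent with any HTTP client or SDK.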

Related

Lucene Syntax For More Complicated Queries

I am developing a website for my company that allows users to query a database in order to get the information they need.
Currently, the users are used to a particular form of queries, and I don't want to make them change the way they are used to. Therefore, I need to convert their query to Lucene's query syntax.
There are some cases where I'm not sure of the best way to express them in Lucene syntax, and I was wondering whether you have any better ideas:
"Current Query" : serverRole=~'(ServerOne|ServerTwo|ServerThree)'
"Lucene Suggested": (serverRole:*ServerOne* OR serverRole:*ServerTwo* OR serverRole:*ServerThree*)
Take into account that I'm using regex to convert these queries, so one of the difficulties I'm facing is how to handle a dynamic number of elements (ServerOne|ServerTwo|ServerThree...):
luceneQuery = currentQuery
.replace(/(==~|=~)('|")([a-zA-Z0-9]+)(\|)([a-zA-Z0-9]+)('|")/g, ':*$3 OR $5*')
Another query for example:
"Current Query" : OS=~'SLES1[12]'
"Lucene Suggested": (OS:*SLES11* OR OS:*SLES12*)
I would recommend looking at Lucene's BooleanQuery to build more complex queries out of wildcard, term, and fuzzy queries. You can combine them all by setting the Occur parameter as you add each clause. For example:
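import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// Each wildcard clause is optional (SHOULD), so the clauses combine as a logical OR.
Query query1 = new WildcardQuery(new Term("contents", "*ServerOne*"));
Query query2 = new WildcardQuery(new Term("contents", "*ServerTwo*"));
BooleanQuery booleanQuery = new BooleanQuery.Builder()
    .add(query1, BooleanClause.Occur.SHOULD)
    .add(query2, BooleanClause.Occur.SHOULD)
    .build();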
There are also regex queries that you can run directly, but when the indexed field is complex, finding a regex match can take a long time.
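For the dynamic number of alternatives raised in the question, one option is to capture the whole pattern and split it on | instead of writing one capture group per element. The sketch below uses Python rather than the JavaScript from the question, and the names to_lucene, CLAUSE and PLAIN are made up for illustration. It only expands alternations of plain alphanumeric words. Anything else, such as the character class in SLES1[12], is passed through as a Lucene regexp term (/.../), which is an assumption about what your index and parser accept.

```python
import re

# field =~ 'pattern' or field ==~ "pattern"; both operators appear in the question's regex.
CLAUSE = re.compile(r"""(\w+)\s*==?~\s*(['"])(.+?)\2""")
PLAIN = re.compile(r"[A-Za-z0-9]+")


def to_lucene(query: str) -> str:
    """Rewrite each field=~'a|b|c' clause as (field:*a* OR field:*b* OR field:*c*)."""

    def convert(match: re.Match) -> str:
        field, raw = match.group(1), match.group(3)
        # Drop one pair of enclosing parentheses, e.g. '(A|B)' -> 'A|B'.
        inner = raw[1:-1] if raw.startswith("(") and raw.endswith(")") else raw
        alternatives = inner.split("|")
        if all(PLAIN.fullmatch(alt) for alt in alternatives):
            return "(" + " OR ".join(f"{field}:*{alt}*" for alt in alternatives) + ")"
        # Character classes and other regex features are not expanded here;
        # they are handed over as a regexp term instead (an assumption).
        return f"{field}:/.*({raw}).*/"

    return CLAUSE.sub(convert, query)


converted = to_lucene("serverRole=~'(ServerOne|ServerTwo|ServerThree)'")
```

Note that terms starting with * are leading wildcards, which the classic Lucene QueryParser rejects unless setAllowLeadingWildcard(true) has been called, and that both leading wildcards and regexp terms can be slow on large indexes.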

Is it possible to query multiple AWS Cloudsearch fields for the same value without repeating?

Using AWS CloudSearch, I need to query 2 separate fields for the same value using a structured (compound) query, e.g.:
(and (or name:'john smith') (or curr_addr:'123 someplace' other_addr:'123 someplace'))
This query works, but I'm wondering whether it's necessary to repeat the value for each field that I want to search against. Is there some way to specify the value only once, e.g. curr_addr+other_addr:'123 someplace'?
That is the correct way to structure your compound query. From the AWS documentation, you'll see that they structure their example query the same way:
(and title:'star' (or actors:'Harrison Ford' actors:'William Shatner')(not actors:'Zachary Quinto'))
From Constructing Compound Queries
You may be able to get around this by listing the more repetitive fields in the query options (q.options) and then naming the field explicitly only for the remaining terms. The fields list is a fallback for terms in a compound query that don't specify a field. So if you list the address fields there and only specify the name field in your query, you may get close to the behavior you're looking for.
Query options
q.options={fields: ['curr_addr','other_addr']}
Query
(and (or name:'john smith') (or '123 someplace'))
But this approach would only work for one set of repetitive fields, so it's not a silver bullet by any means.
From Search API Reference (see q.options => fields)
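Put together as request parameters, the approach might look like the following sketch. The helper name compound_query_params is made up, and json.dumps emits strict JSON with double quotes rather than the relaxed form shown above; it is an assumption here that the service accepts both.

```python
import json
from urllib.parse import urlencode


def compound_query_params(query: str, fallback_fields: list[str]) -> dict[str, str]:
    """Parameters for a structured query whose unqualified terms search `fallback_fields`."""
    return {
        "q": query,
        "q.parser": "structured",
        # Terms without an explicit field fall back to this list.
        "q.options": json.dumps({"fields": fallback_fields}),
    }


params = compound_query_params(
    "(and (or name:'john smith') (or '123 someplace'))",
    ["curr_addr", "other_addr"],
)
query_string = urlencode(params)  # ready to append to the search endpoint URL
```

The address value appears only once in the query, while name stays explicitly qualified.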

Cloudsearch API 2013-01-01 cs.text_relevance replacement

I have a cloudsearch instance using API 2011-02-01, and I use a rank expression defined as
rank-faq-boost = cs.text_relevance({"weights":{"keywords":5.00}, "default_weight":1})
This boosted the score of any record whose keywords field contained the search term. Records without keywords were still returned.
The cs.text_relevance() function is no longer supported, and I can't find the correct alternative.
I have tried to use
q.options={fields:['keywords^5']}
But that doesn't return records that have no keywords field yet contain the search term in other fields, for example in the name field.
The only solution I can find so far is to list every field in the array, but this just feels much less flexible than the old way.
Does anybody know of the correct way to implement this?
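The workaround mentioned above, listing every field, can at least be generated from the old weights map instead of being maintained by hand. This is only a sketch of that workaround, not a confirmed replacement for cs.text_relevance(): the helper name weighted_fields and the description field are hypothetical, and the list of searchable fields has to be supplied by you.

```python
import json


def weighted_fields(all_fields, weights, default_weight=1.0):
    """Return a q.options value listing every field, boosting those named in `weights`."""
    entries = []
    for name in all_fields:
        weight = weights.get(name, default_weight)
        # A weight of 1 is treated as neutral here, so no ^boost suffix is added.
        entries.append(name if weight == 1 else f"{name}^{weight:g}")
    return json.dumps({"fields": entries})


# "keywords" and "name" come from the question; "description" is a made-up extra field.
q_options = weighted_fields(["keywords", "name", "description"], {"keywords": 5.00})
```

Because every field is listed, records that match only in name or description are still returned, while matches in keywords are boosted.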

RegEx for a JSON string

I'm storing a person object as JSON in my SQLite database. The table will have a few thousand person records. What I need is to query persons by the "name" attribute.
After some investigation, I settled on SQLite's GLOB operator to perform a regex-like search on the JSON elements.
My sample JSON looks like this:
{"name":"john","age":"22","father-name":"jackson"}
Now I want a regex that returns all records whose name attribute contains a given substring. It should be case-insensitive too.
For example, searching for "ohn" should fetch john's record.
While you can store JSON and search against it using regexes (which are rather limited in SQLite), it does not mean you should.
Instead, you should really consider splitting your JSON into fields and storing them in a normal SQLite table. Doing so will not only make searches easier, with no need to painfully parse the data every single time; searches will also be much faster (if you add the necessary indexes).
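A minimal sketch of that layout, using Python's built-in sqlite3 module with an in-memory database. The table name person, the column names, and the helper find_by_name are assumptions made for illustration.

```python
import json
import sqlite3

records = [
    '{"name":"john","age":"22","father-name":"jackson"}',
    '{"name":"johnny","age":"22","father-name":"smith"}',
    '{"name":"mike","age":"22","father-name":"johnson"}',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age TEXT, father_name TEXT)")
for raw in records:
    obj = json.loads(raw)  # parse once at insert time, not on every search
    conn.execute(
        "INSERT INTO person VALUES (?, ?, ?)",
        (obj["name"], obj["age"], obj["father-name"]),
    )


def find_by_name(fragment: str) -> list[str]:
    """Return names containing `fragment`; LIKE ignores case for ASCII by default."""
    rows = conn.execute(
        "SELECT name FROM person WHERE name LIKE ? ORDER BY name",
        (f"%{fragment}%",),
    )
    return [row[0] for row in rows]


matches = find_by_name("OHN")
```

Only the name column is searched, so mike is not returned even though his father's name contains "ohn". One caveat: a pattern with a leading % generally cannot use an ordinary index, so an index helps prefix or exact lookups more than substring searches.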
If you do want to go down the regex route, the following will match the record:
/\{"name":"\w*ohn\w*[^\}]+\}/i
This will match each of these:
{"name":"john","age":"22","father-name":"jackson"}
{"name":"john","age":"22","father-name":"johnson"}
{"name":"johnny","age":"22","father-name":"smith"}
but not:
{"name":"fred","age":"22","father-name":"hall"}
{"name":"mike","age":"22","father-name":"johnson"}
{"name":"bob","age":"22","father-name":"todd"}
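The pattern can be checked outside the database, for example with Python's re module. This sketch generalizes it to an arbitrary fragment; the helper name name_contains is made up, and the pattern still assumes that "name" is the first key in each JSON string.

```python
import re


def name_contains(fragment: str) -> re.Pattern:
    """Compile the pattern above for any name fragment, ignoring case."""
    return re.compile(
        r'\{"name":"\w*' + re.escape(fragment) + r'\w*[^}]+\}',
        re.IGNORECASE,
    )


pattern = name_contains("ohn")
records = [
    '{"name":"john","age":"22","father-name":"jackson"}',
    '{"name":"johnny","age":"22","father-name":"smith"}',
    '{"name":"mike","age":"22","father-name":"johnson"}',
    '{"name":"fred","age":"22","father-name":"hall"}',
]
matching = [record for record in records if pattern.search(record)]
```

Because \w* cannot cross the closing quote of the name value, "johnson" in father-name does not cause a match. To run such a pattern inside SQLite itself, the application normally has to register its own regexp function, since the REGEXP operator has no default implementation.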

Scan a dynamodb based on a list

I have a String Set (SS) attribute in a DynamoDB table. I need to scan the table for items whose set contains a given value.
Which comparison operator should I use for this scan?
For example, the table has items like this:
name
[email1, email2]
phone
I need to search for items containing a particular email, say email1 alone, without supplying the entire set.
It seems like you are looking for the CONTAINS operator of the Scan operation. It is basically the equivalent of Python's in operator.
That said, if you need to perform this often, you should probably de-normalize your data to make it faster.
For example, you could build a second table like this:
hash_key: name
range_key: email
Of course, you would have to maintain this index yourself and query it manually.
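For the scan itself, the CONTAINS check can also be written as a filter expression using the contains() function. The sketch below only builds the request parameters, so it runs without AWS access. The table name people and the attribute name emails are hypothetical, since the question does not name them.

```python
def scan_params_for_email(table_name: str, email: str) -> dict:
    """Build Scan parameters that keep items whose string set contains `email`."""
    return {
        "TableName": table_name,
        # contains() on a set attribute tests for membership of a single element.
        "FilterExpression": "contains(#emails, :email)",
        "ExpressionAttributeNames": {"#emails": "emails"},  # hypothetical attribute name
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


params = scan_params_for_email("people", "email1")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").scan(**params)
```

The filter is applied after items are read, so the scan still reads the whole table and returns results page by page. That cost is what the second table suggested above is meant to avoid.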