Group by date field in solrj

Group by date field in solrj - solrj

i want to group the output i am getting through date type. But i am storing the data in solr using datetime type. Date Format i am using is
Date format :: "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
For e.g. Date is stored in solr as "2013-03-01T20:56:45.000+00:00"
What i want as output is count of dates :: for .e.g.
Date1:: "2013-03-01T20:56:45.000+00:00"
Date2:: "2013-03-01T21:56:45.000+00:00"
Date3:: "2013-03-01T22:56:45.000+00:00"
Date3:: "2013-03-02T22:56:45.000+00:00"
Date4:: "2013-03-02T23:56:45.000+00:00"
So i want the output as two columns ::
Date Count
2013-03-01 3
2013-03-02 2
Here is the code i am using
String url = "http://192.168.0.4:8983/solr";
SolrServer server = new HttpSolrServer(url);
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.addFilterQuery("sessionStartTime:[2013-03-01T00:00:00Z TO 2013-03-04T24:00:00Z]");
query.add("group", "true");
query.add("group.field","uniqueId"); // uniqueId is grouping the data
query.add("group.main","true");
query.setRows(9999);
QueryResponse rs=server.query(query);
Iterator<SolrDocument> iter = rs.getResults().iterator();
Help is appreciated.

I know that this is an older question but I am working on something related to this so I thought I would share my solution. Since you are using grouping, rs.getResults() will likely be null. After reading through the SolrJ API and doing some testing on my end, you will find that the results are indeed grouped as you want them to be. To access them, create a variable like such:
List<Group> groupedData = rs.getGroupResponse().getValues().get(0).getValues()
Note that Group is the class org.apache.solr.client.solrj.response.Group
Then, iterate through groupedData, usinig groupedData.get(i).getResult() to get a SolrDocumentList of results for each grouped value. In your example, (assuming the data is ordered as you said it would be), groupedData.get(0) would give you a SolrDocumentList of the three matches that have the date 2013-03-01.
I understand that this is quite the chain of method calls but it does end up getting the results to you. If anyone does know a faster way to get to the data, please feel free to let me know as I would like to know as well.
Refer to the API for GroupResponse for more information
Note that this answer is working on Solr 5.4.0

The output that you are trying to achieve, I believe is better suited to Faceting over grouping. Check out the documentation on Date Faceting more specifically and SolrJ fully supports faceting, see SolrJ - Advanced Usage. For an introduction to Faceting I would recommend reading Faceted Search with Solr

Related

Bigquery struct introspection

Is there a way to get the element types of a struct? For example something along the lines of:
SELECT #TYPE(structField.y)
SELECT #TYPE(structField)
...etc
Is that possible to do? The closest I can find is via the query editor and the web call it makes to validate a query:

As I mentioned already in comments - one of the option is to mimic same very Dry Run call with query built in such a way that it will fail with exact error message that will give you the info you are looking for. Obviously this assumes your use case can be implemented in whatever scripting language you prefer. Should be relatively easy to do.
Meantime, I was looking for making this within the SQL Query.
Below is the example of another option.
It is limited to below types, which might fit or not into your particular use case
object, array, string, number, boolean, null
So example is
select
s.birthdate, json_type(to_json(s.birthdate)),
s.country, json_type(to_json(s.country)),
s.age, json_type(to_json(s.age)),
s.weight, json_type(to_json(s.weight)),
s.is_this, json_type(to_json(s.is_this)),
from (
select struct(date '2022-01-01' as birthdate, 'UA' as country, 1 as age, 2.5 as weight, true as is_this) s
)
with output

You can try the below approach.
SELECT COLUMN_NAME, DATA_TYPE
FROM `your-project.your-dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE TABLE_NAME = 'your-table-name'
AND COLUMN_NAME = 'your-struct-column-name'
ORDER BY ORDINAL_POSITION
You can check this documentation for more details using INFORMATION_SCHEMA for BigQuery.
Below is the screenshot of my testing.
DATA:
RESULT USING THE ABOVE SYNTAX:

Lucene Syntax For More Complicated Queries

I am developing a website for my company, that allows users to query a database in order to get the information they need.
Currently, the users are used to a particular form of queries, and I don't want to make them change the way they are used to. Therefore, I need to convert their query to Lucene's query syntax.
There are some cases which I'm not sure what is the best way to implement them using Lucene syntax, I was wondering maybe you have some better ideas:
"Current Query" : serverRole=~'(ServerOne|ServerTwo|ServerThree)'
"Lucene Suggested": (serverRole:*ServerOne* OR serverRole:*ServerTwo* OR serverRole:*ServerThree*)
Take into account that I'm using Regex to convert these queries, so one of the difficulties I'm facing for example, is how to do it if the number of elements (ServerOne|ServerTwo|ServerThree.....) is dynamic:
luceneQuery = currentQuery
.replace(/(==~|=~)('|")([a-zA-Z0-9]+)(\|)([a-zA-Z0-9]+)('|")/g, ':*$3 OR $5*')
Another query for example:
"Current Query" : OS=~'SLES1[12]'
"Lucene Suggested": (OS:*SLES11* OR OS:*SLES12*)

I would recomand you to check BooleanQuery() on Lucene to create more complex queries like Wildcard , Term, Fuzzy U can include all by using Occur parameter while u build your queries. As an example
Query query1 = new WildcardQuery(new Term("contents", "*ServerOne*"));
Query query2 = new WildcardQuery(new Term("contents", "*ServerTwo*"));
BooleanQuery booleanQuery = new BooleanQuery.Builder()
.add(query1, BooleanClause.Occur.SHOULD)
.add(query2, BooleanClause.Occur.SHOULD)
.build();
There is also regex queries you can directly run but when your indexed field will be complicates it taking time to find regex match

Musicbrainz querying artist and release

I am trying to get an artist and their albums. So reading this page https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2 i created the following query to get Michael Jackson's albums
http://musicbrainz.org/ws/2/artist/?query=artist:michael%20jackson?inc=releases+recordings
My understanding is to add ?inc=releases+recordings at the end of the URL which should return Michael Jackson's albums however this doesnt seem to return the correct results or i cant seem to narrow down the results? I then thought to use the {MBID} but again thats not returned in the artists query (which is why im trying to use inc in my query)
http://musicbrainz.org/ws/2/artist/?query=artist:michael%20jackson
Can anyone suggest where im going wrong with this?

You're not searching for the correct Entity. What you want is to get the discography, not artist's infos. Additionally, query fields syntax is not correct (you must use Lucene Search Syntax).
Here is what you're looking for:
http://musicbrainz.org/ws/2/release-group/?query=artist:"michael jackson" AND primarytype:"album"
We're targeting the release-group entity to get the albums, searching for a specific artist and filtering the results to limit them to albums. (accepted values are: album, single, ep, other)
There are more options to fit your needs, for example you can filter the type of albums using the secondarytype parameter. Here is the query to retrieve only live albums:
http://musicbrainz.org/ws/2/release-group/?query=artist:"michael jackson" AND primarytype:"album" AND secondarytype="live"
Here is the doc:
https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2/Search
Note that to be able to use MB's API you need to understand how it is structured, especially, the relations between release_group, release and medium.

Usage of Id='' in EAI Siebel Adapter Query

Requirement: I am suppose to use an existing Integration Object for my requirement. As this IO consists of ICs that I do not need in my requirement, I would like to avoid them in my IO query output.
I observe that passing Id = '' returns no result in Siebel 8.0. Can I use it as a feature and pass SearchSpec => [Integration Component.Id]='' to EAI Siebel Adapter query to suppress ICs that I don't want in output?
How good is this query Id=''? Will Siebel ignore this query completely? or will it attempt and return no output?
As per my understanding Siebel ignores the query where row_id is passed as ''
(Not true for siebel 6.0) Please share your opinion.

Not sure of using Id = '', when you club the condition with other conditions, Siebel might try to find actual matching records. Also , not sure if future upgrading will keep the same system.
If yours is the only code using the IO, you could straightaway inactivate the ICs you dont want.
If you are unsure of IC inactivation, best way should be to a DatMapper. Set up an EAI Datamapper, source and target IOs of same name. In this datamapper, map only the ICs you need. After querying from EAI Siebel Adapter, send your output to this DataMapper.
Siebel will keep only the ICs mapped and remove all the rest.
Since this is a non-repository change, you can modify the DataMapper in future too.
Hope this helps !

Answering it myself with my opinion..
As per my understanding, querying with Id='' still queries the database for row_id = ''. Including this in IO query reduces the query scope to the parent's context..
Though this won't improve any performance, IO query output looks cleaner.
Update: I'm using a Indexed column based field Id (ROW_ID) with search spec as "[Id] IS NULL". It's a next to impossible case in database having ROW_ID = NULL, unless it's intentionally and manually updated. Again no one would do it unless really wants to messup that data .. because without ROW_ID record is literally invalid..

Adding a null query to the IC will inherently result in an empty property set for the IC in question. But if you don't need the IC, and the ICs in the IO are not hierarchically connected (no hierarchy key )(eg- independant BCs with same base table in the same BO), you just have to remove the IC mapping in the datamap editor and the IC wont show in the IO propset

Does ##parentid attribute work in Sitecore query?

Sitecore reference talk about some attributes you can use in Query, including ##templatename, ##id and ##parentid etc.
parentid doesn't seem to work - /sitecore/content//*[##parentid!=''] never returns any result. While /sitecore/content//*[##templatename!=''] works fine. Sitecore version is 6.5 and 6.6.
Has anyone been able to query with ##parentid? ( Perhaps it uses Ancestor/Descendant table and I'm missing data?? - just a guess )

It is attempting to parse the value as a GUID and failing. Instead, try an empty GUID like so:
/sitecore/content//*[##parentid!='{00000000-0000-0000-0000-000000000000}']

##parentid only works in fast query.
In fast query you can only use ancestor not ancestor-or-self (which doesn't give an error it just does a fallback too ancestor).
Also you can't use the pipe | in fast query to concatenate results of 2 or more queries.
I can't for the life of me figure out how to do a "give me the ancestor-or-self of the current node whose parent has id={110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Group by date field in solrj - solrj

Related

Bigquery struct introspection

Lucene Syntax For More Complicated Queries

Musicbrainz querying artist and release

Usage of Id='' in EAI Siebel Adapter Query

Does ##parentid attribute work in Sitecore query?

Categories

Resources