Loading a saved search in SuiteScript doesn't include all columns (NetSuite map/reduce)

When I load a saved search in SuiteScript, the result doesn't include all columns; for example, the summed columns at the end are missing. I tried the getResults function, but because I'm loading the search in a map/reduce getInputData function, the large data volume causes the script to exceed its time limit (SSS_TIME_LIMIT_EXCEEDED).
The columns marked in the screenshot are not visible when I do this:
function getInputData(){
var mainSrch = search.load({ id: 'customsearch1000' });
return mainSrch;
}
Below is the result I get in the script:
{
"recordType": null,
"id": "16187",
"values": {
"GROUP(trandate)": "22/06/2022",
"GROUP(type)": {
"value": "VendBill",
"text": "Bill"
},
"GROUP(tranid)": "36380",
"GROUP(location)": {
"value": "140",
"text": "ACBD"
},
"GROUP(custitem_item_category.item)": {
"value": "13",
"text": "Frozen Food"
},
"GROUP(custitem_item_subcategory.item)": {
"value": "66",
"text": "Frozen Fruits & Vegetables"
},
"GROUP(itemid.item)": "MN-FGGH10271310",
"GROUP(displayname.item)": "ABC Product",
"GROUP(custcol_po_line_barcode)": "883638668390",
"GROUP(locationquantityonhand.item)": "4",
"SUM(quantity)": "1",
"SUM(totalvalue.item)": "4460.831",
"SUM(custcol_po_unit_price)": "8.00",
"SUM(formulanumeric)": "0"
}
}
Is there any way to get all the columns when loading a saved search?

I haven't seen this particular issue before, but NetSuite does have an issue sorting by any formulaX column other than the first one, so seeing this is not surprising.
If you have no selection criteria on the aggregate values you could:
modify your search to have no summary types or formula numeric columns
in the map phase group them by the original search's grouping columns (no governance cost)
in the reduce phase calculate the values for the formulanumeric columns (no governance cost)
proceed with your original reduce phase logic.
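The steps above can be sketched in plain JavaScript so the grouping logic can be tried outside NetSuite. This is only an illustration under assumptions: the row shape, the GROUP_COLUMNS list and the lineTotal formula are hypothetical stand-ins for the ungrouped search columns and the real formulanumeric column. In an actual map/reduce script the map stage would pass the key and value to context.write, and the reduce stage would read context.key and context.values instead of using the local runner shown here.

```javascript
// Hypothetical grouping columns taken from the ungrouped search rows.
const GROUP_COLUMNS = ['trandate', 'tranid', 'itemid'];

// Map phase: derive the grouping key from the original search's grouping columns.
function mapRow(row) {
  const key = GROUP_COLUMNS.map((column) => row[column]).join('|');
  return { key, value: row };
}

// Reduce phase: compute the aggregates the summary search used to provide.
function reduceGroup(key, rows) {
  const quantity = rows.reduce((sum, r) => sum + Number(r.quantity), 0);
  // Hypothetical stand-in for the formulanumeric column: quantity * unit price.
  const lineTotal = rows.reduce(
    (sum, r) => sum + Number(r.quantity) * Number(r.custcol_po_unit_price),
    0
  );
  return { key, quantity, lineTotal };
}

// Local stand-in for the map/reduce framework: collect values by key, then reduce.
function runLocally(rows) {
  const groups = new Map();
  for (const row of rows) {
    const { key, value } = mapRow(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups].map(([key, values]) => reduceGroup(key, values));
}

const sampleRows = [
  { trandate: '22/06/2022', tranid: '36380', itemid: 'MN-1', quantity: '1', custcol_po_unit_price: '8.00' },
  { trandate: '22/06/2022', tranid: '36380', itemid: 'MN-1', quantity: '2', custcol_po_unit_price: '8.00' },
  { trandate: '23/06/2022', tranid: '36381', itemid: 'MN-2', quantity: '5', custcol_po_unit_price: '2.00' }
];
console.log(runLocally(sampleRows));
```

Because the search no longer carries summary types, each reduce invocation sees all raw lines for one group and can compute whatever aggregate the formula column needed.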

As an alternative to my previous answer you can split your process into parts.
Modify your saved search to include column labels.
Use N/task to schedule your search with a map/reduce script as a dependency, using addInboundDependency.
If your search finishes successfully, the map/reduce script will be called with your search file.
Return the file from your getInputData phase. You'll have to modify your map/reduce script to handle a different format, but if your search can complete at all you'll be able to process it.
Below is a fragment of a script that does this, but it uses a scheduled script as the dependency. Map/reduce scripts are also supported.
var filePath = folderPath+ (folderPath.length ? '/' : '') + name;
var searchTask = task.create({
taskType: task.TaskType.SEARCH,
savedSearchId: searchId,
filePath: filePath
});
var dependency = task.create({
taskType:task.TaskType.SCHEDULED_SCRIPT,
scriptId:'customscript_kotn_s3_defer_transfer',
deploymentId:deferredDeployment,
params:{
custscript_kotn_deferred_s3_folder: me.getParameter({name:'custscript_kotn_s3_folder'}),
custscript_kotn_deferred_s3_file: filePath
}
});
searchTask.addInboundDependency(dependency);
var taskId = searchTask.submit();
log.audit({
title:'queued '+ name,
details: taskId
});


Pulling MarkLogic template view data

I am new to TDE. For the document below, I ended up developing 5 templates and then was able to write a JOIN query (below). I pulled all document data by linking the views via __docid fragment ID.
It works fine when run in Query Console. However, when I try to pull the same data into, say, PowerBI via ODBC, I cannot write the query because __docid is not passed through.
Here are my questions:
How can I assign __docid value to a view field?
If not possible, can I create a single template for the document?
Any other solution?
Thanks in advance.
URI: /json/2017.04.27_ID_NA_SL/chambers_2730.json
Document:
{
"class": "sanction",
"sanction": ==> Template 1
{
"batch": "2017.04.27_ID_NA_SL",
"id": "2017.04.27_IN_NA_SL/chambers_2730",
"date_board_order": "2017-04-27T00:00:00",
"date_effective": null,
"decision": null,
"reasoning": null,
"pas_code": null,
"method": "web",
"orig": "results/results_04_27_2017_04_50PM/ID_SummaryList_03_04PM_February_27_2017/ID-John_chambers- 04_27_2017_BO.pdf",
"professional": ==> Template 2
{
"name_first": "John",
"name_middle": null,
"name_last": "chambers",
"license": null,
"me": "0499999999"
}
}
,
"app":
{
"assignment": ==> Template 3
{
"me": "Jessica Hernendez",
"pas": "Jessica Hernendez"
}
,
"status": ==> Template 4
{
"state": "complete",
"me_complete": "true",
"pas_complete": "true"
}
,
"meta": ==> Template 5
{
"alert": null,
"note": null
}
}
}
Query:
SELECT t.__docid, p.name_first, p.name_middle, p.name_last, p.license, p.meta,
s.batch,s.id,s.date_order,s.orig, a.me, t.state
FROM sanction s
JOIN professional p ON s.__docid=p.__docid
JOIN assignment a ON s.__docid = a.__docid
JOIN status t ON s.__docid = t.__docid
ORDER BY p.name_last
I am not sure if you can literally insert the value of __docid into a TDE field, but you can use xdmp:node-uri(.) instead. That will return the database uri, which is guaranteed unique.
I do wonder if you need 5 templates though. Your data doesn't seem to have repeated elements, so why not create one wide view that holds all sanction data? You could consider it a special purpose view optimized for PowerBI, and save effort on unnecessary joins at runtime.
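A single wide template along those lines might look roughly like the sketch below, written as a JavaScript object that prints the template JSON. The schema name, view name and column selection are hypothetical, and the relative paths (including the parent step into "app") are assumptions that should be checked against the real documents before the template is inserted into the schemas database. The "uri" column uses xdmp:node-uri(.) as suggested above.

```javascript
// Sketch of one wide TDE template for the sanction document (names are hypothetical).
const sanctionWideView = {
  template: {
    context: '/sanction',
    rows: [
      {
        schemaName: 'reporting',     // hypothetical schema name
        viewName: 'sanction_wide',   // hypothetical view name
        columns: [
          // Document URI as a regular column, usable as a key from ODBC clients.
          { name: 'uri', scalarType: 'string', val: 'xdmp:node-uri(.)' },
          { name: 'batch', scalarType: 'string', val: 'batch', nullable: true },
          { name: 'id', scalarType: 'string', val: 'id', nullable: true },
          { name: 'name_first', scalarType: 'string', val: 'professional/name_first', nullable: true },
          { name: 'name_last', scalarType: 'string', val: 'professional/name_last', nullable: true },
          // Assumed paths: step up from "sanction" to reach the sibling "app" object.
          { name: 'assigned_me', scalarType: 'string', val: '../app/assignment/me', nullable: true },
          { name: 'state', scalarType: 'string', val: '../app/status/state', nullable: true }
        ]
      }
    ]
  }
};

console.log(JSON.stringify(sanctionWideView, null, 2));
```

With one row per document, the PowerBI query needs no joins, and the uri column can still link back to the source document.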
HTH!

MongoDB: Aggregation using $cond with $regex

I am trying to group data in multiple stages.
At the moment my query looks like this:
db.captions.aggregate([
{$project: {
"videoId": "$videoId",
"plainText": "$plainText",
"Group1": {$cond: {if: {$eq: ["plainText", {"$regex": /leave\sa\scomment/i}]},
then: "Yes", else: "No"}}}}
])
I am not sure whether it is actually possible to use the $regex operator within a $cond in the aggregation stage. I would appreciate your help very much!
Thanks in advance
UPDATE: Starting with MongoDB v4.1.11, there finally appears to be a nice solution for your problem which is documented here.
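The operator added in that release line is $regexMatch, which is presumably what the update refers to. A minimal sketch of the original $project stage rewritten with it, assuming plainText is always a string (or missing) and reusing the field names from the question:

```javascript
// Pipeline using $regexMatch inside $cond (assumes MongoDB 4.2 or later).
const pipeline = [
  {
    $project: {
      videoId: 1,
      plainText: 1,
      Group1: {
        $cond: {
          // True when plainText matches the case-insensitive pattern.
          if: { $regexMatch: { input: '$plainText', regex: /leave\sa\scomment/i } },
          then: 'Yes',
          else: 'No'
        }
      }
    }
  }
];

// In the mongo shell this would be run as: db.captions.aggregate(pipeline)
console.log(JSON.stringify(pipeline[0].$project.Group1.$cond.then));
```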
Original answer:
As I wrote in the comments above, $regex does not work inside $cond as of now. There is an open JIRA ticket for that but it's, err, well, open...
In your specific case, I would tend to suggest you solve that topic on the client side unless you're dealing with crazy amounts of input data of which you will always only return small subsets. Judging by your query, it would appear that you are always going to retrieve all documents, just bucketed into two result groups ("Yes" and "No").
If you don't want or cannot solve that topic on the client side, then here is something that uses $facet (MongoDB >= v3.4 required) - it's neither particularly fast nor overly pretty but it might help you to get started.
db.captions.aggregate([{
$facet: { // create two stages that will be processed using the full input data set from the "captions" collection
"CallToActionYes": [{ // the first stage will...
$match: { // only contain documents...
"plainText": /leave\sa\scomment/i // that are allowed by the $regex filter (which could be extended with multiple $or expressions or changed to $in/$nin which accept regular expressions, too)
}
}, {
$addFields: { // for all matching documents...
"CallToAction": "Yes" // we create a new field called "CallToAction" which will be set to "Yes"
}
}],
"CallToActionNo": [{ // similar as above except we're doing the inverse filter using $not
$match: {
"plainText": { $not: /leave\sa\scomment/i }
}
}, {
$addFields: {
"CallToAction": "No" // and, of course, we set the field to "No"
}
}]
}
}, {
$project: { // we got two arrays of result documents out of the previous stage
"allDocuments" : { $setUnion: [ "$CallToActionYes", "$CallToActionNo" ] } // so let's merge them into a single one called "allDocuments"
}
}, {
$unwind: "$allDocuments" // flatten the "allDocuments" result array
}, {
$replaceRoot: { // restore the original document structure by moving everything inside "allDocuments" up to the top
newRoot: "$allDocuments"
}
}, {
$project: { // include only the two relevant fields in the output (and the _id)
"videoId": 1,
"CallToAction": 1
}
}])
As always with the aggregation framework, it may help to remove individual stages from the end of the pipeline and run the partial query in order to get an understanding of what each individual stage does.

How can I use scan/scroll with pagination and sort in ElasticSearch?

I have an ES DB storing history records from a process I run every day. Because I want to show only 20 records per page in the history (ordered by date), I was using pagination (size + from_) combined with scroll, which worked just fine. But when I wanted to use sort in the query, it didn't work, and I found that scroll with sort doesn't work. Looking for an alternative, I tried the ES helper scan, which works fine for scrolling and sorting the results, but with this solution pagination doesn't seem to work. I don't understand why, since the API says that scan sends all the parameters to the underlying search function. So my question is whether there is any method to combine the three options.
Thanks,
Ruben
When using the elasticsearch.helpers.scan function, you need to pass preserve_order=True to enable sorting.
(Tested using elasticsearch==7.5.1)
Yes, you can combine scroll with sort, but when you sort on a string field you will need to change the mapping for it to work properly. From the documentation:
In order to sort on a string field, that field should contain one term
only: the whole not_analyzed string. But of course we still need the
field to be analyzed in order to be able to query it as full text.
The naive approach to indexing the same string in two ways would be to
include two separate fields in the document: one that is analyzed for
searching, and one that is not_analyzed for sorting.
"tweet": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
The main tweet field is just the same as before: an analyzed full-text field.
The new tweet.raw subfield is not_analyzed.
Now, or at least as soon as we have reindexed our data, we can use the
tweet field for search and the tweet.raw field for sorting:
GET /_search
{
"query": {
"match": {
"tweet": "elasticsearch"
}
},
"sort": "tweet.raw"
}

distinct value with count and condition mongo DB

I am new to MongoDB, and so far it seems like it is trying to go out of its way to make doing simple things overly complex.
I am trying to run the equivalent of the MySQL query below:
SELECT userid, COUNT(*)
FROM userinfo
WHERE userdata like '%PC%' or userdata like '%wire%'
GROUP BY userid
I have mongo version 3.0.4 and I am running MongoChef.
I tried using something like the below:
db.userinfo.group({
"key": {
"userid": true
},
"initial": {
"countstar": 0
},
"reduce": function(obj, prev) {
prev.countstar++;
},
"cond": {
"$or": [{
"userdata": /PC/
}, {
"userdata": /wire/
}]
}
});
but that did not like the OR.
When I took out the OR, thinking I'd do half at a time and combine the results in Excel, I got the error "group() can't handle more than 20000 unique keys", and the result table should be much bigger than that.
From what I can tell online, I could do this using aggregation pipelines, but I cannot find any clear examples of how to do that.
This seems like it should be a simple thing that should be built in to any DB, and it makes no sense to me that it is not.
Any help is much appreciated.
Works "sooo" much better with the .aggregate() method, as .group() is a very outmoded way of approaching this:
db.userinfo.aggregate([
{ "$match": {
"userdata": { "$in":[/PC/,/wire/] }
}},
{ "$group": {
"_id": "$userid",
"count": { "$sum": 1 }
}}
])
The $in here is a much shorter way of writing your $or condition as well.
This is native code as opposed to JavaScript translation as well, so it runs much faster.
Here is an example which counts the distinct number of first_name values for records with a last_name value of “smith”:
db.collection.distinct("first_name", {"last_name": "smith"}).length;
output
3

How to wisely combine shingles and edgeNgram to provide flexible full text search?

We have an OData-compliant API that delegates some of its full text search needs to an Elasticsearch cluster.
Since OData expressions can get quite complex, we decided to simply translate them into their equivalent Lucene query syntax and feed it into a query_string query.
We do support some text-related OData filter expressions, such as:
startswith(field,'bla')
endswith(field,'bla')
substringof('bla',field)
name eq 'bla'
The fields we're matching against can be analyzed, not_analyzed or both (i.e. via a multi-field).
The searched text can be a single token (e.g. table), only a part thereof (e.g. tab), or several tokens (e.g. table 1., table 10, etc).
The search must be case-insensitive.
Here are some examples of the behavior we need to support:
startswith(name,'table 1') must match "Table 1", "table 100", "Table 1.5", "table 112 upper level"
endswith(name,'table 1') must match "Room 1, Table 1", "Subtable 1", "table 1", "Jeff table 1"
substringof('table 1',name) must match "Big Table 1 back", "table 1", "Table 1", "Small Table12"
name eq 'table 1' must match "Table 1", "TABLE 1", "table 1"
So basically, we take the user input (i.e. what is passed into the 2nd parameter of startswith/endswith, resp. the 1st parameter of substringof, resp. the right-hand side value of the eq) and try to match it exactly, whether the tokens fully match or only partially.
Right now, we're getting away with a clumsy solution highlighted below which works pretty well, but is far from being ideal.
In our query_string, we match against a not_analyzed field using the Regular Expression syntax. Since the field is not_analyzed and the search must be case-insensitive, we do our own tokenizing while preparing the regular expression to feed into the query in order to come up with something like this, i.e. this is equivalent to the OData filter endswith(name,'table 8') (=> match all documents whose name ends with "table 8")
"query": {
"query_string": {
"query": "name.raw:/.*(T|t)(A|a)(B|b)(L|l)(E|e) 8/",
"lowercase_expanded_terms": false,
"analyze_wildcard": true
}
}
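To illustrate the kind of preprocessing described above, here is a minimal sketch of how such a case-insensitive pattern could be generated from the user input. The function names are hypothetical and the list of escaped characters is an assumption about which characters need escaping in Lucene regular expressions; it is not the actual implementation.

```javascript
// Turn each cased letter into an "(UPPER|lower)" alternation and escape
// characters that are assumed to be special in Lucene regular expressions.
function caseInsensitivePattern(text) {
  let out = '';
  for (const ch of text) {
    const upper = ch.toUpperCase();
    const lower = ch.toLowerCase();
    if (upper !== lower) {
      out += `(${upper}|${lower})`;
    } else if (/[.?+*|{}\[\]()"\\#@&<>~\/]/.test(ch)) {
      out += '\\' + ch;
    } else {
      out += ch;
    }
  }
  return out;
}

// Build the query_string expression for endswith(field, text).
function endsWithQuery(field, text) {
  return `${field}:/.*${caseInsensitivePattern(text)}/`;
}

console.log(endsWithQuery('name.raw', 'table 8'));
// prints name.raw:/.*(T|t)(A|a)(B|b)(L|l)(E|e) 8/
```

A startswith or substringof variant would only differ in where the ".*" wildcard is placed around the generated pattern.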
So, even though this solution works pretty well and the performance is not too bad (which came as a surprise), we'd like to do it differently and leverage the full power of analyzers in order to shift all this burden to indexing time instead of search time. However, since reindexing all our data will take weeks, we'd like to first investigate whether there's a good combination of token filters and analyzers that would help us achieve the search requirements enumerated above.
My thinking is that the ideal solution would contain some wise mix of shingles (i.e. several tokens together) and edge-nGram (i.e. to match at the start or end of a token). What I'm not sure of, though, is whether it is possible to make them work together in order to match several tokens where one of the tokens might not be fully input by the user. For instance, if the indexed name field is "Big Table 123", I need substringof('table 1',name) to match it, so "table" is a fully matched token, while "1" is only a prefix of the next token.
Thanks in advance for sharing your braincells on this one.
UPDATE 1: after testing Andrei's solution
=> Exact match (eq) and startswith work perfectly.
A. endswith glitches
Searching for substringof('table 112', name) yields 107 docs. Searching for a more specific case such as endswith(name, 'table 112') yields 1525 docs, while it should yield fewer docs (suffix matches should be a subset of substring matches). Checking in more depth, I've found some mismatches, such as "Social Club, Table 12" (doesn't contain "112") or "Order 312" (contains neither "table" nor "112"). I guess it's because they end with "12" and that's a valid gram for the token "112", hence the match.
B. substringof glitches
Searching for substringof('table',name) matches "Party table", "Alex on big table" but doesn't match "Table 1", "table 112", etc. Searching for substringof('tabl',name) doesn't match anything
UPDATE 2
It was sort of implied, but I forgot to explicitly mention that the solution will have to work with the query_string query, mainly due to the fact that the OData expressions (however complex they might be) will keep getting translated into their Lucene equivalent. I'm aware that we're trading off the power of the Elasticsearch Query DSL for Lucene's query syntax, which is a bit less powerful and less expressive, but that's something that we can't really change. We're pretty d**n close, though!
UPDATE 3 (June 25th, 2019):
ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html
This is an interesting use case. Here's my take:
{
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer",
"filter": ["lowercase"]
},
"my_edge_ngram_analyzer": {
"tokenizer": "my_edge_ngram_tokenizer",
"filter": ["lowercase"]
},
"my_reverse_edge_ngram_analyzer": {
"tokenizer": "keyword",
"filter" : ["lowercase","reverse","substring","reverse"]
},
"lowercase_keyword": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "keyword"
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "25"
},
"my_edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
},
"filter": {
"substring": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 25
}
}
}
},
"mappings": {
"test_type": {
"properties": {
"text": {
"type": "string",
"analyzer": "my_ngram_analyzer",
"fields": {
"starts_with": {
"type": "string",
"analyzer": "my_edge_ngram_analyzer"
},
"ends_with": {
"type": "string",
"analyzer": "my_reverse_edge_ngram_analyzer"
},
"exact_case_insensitive_match": {
"type": "string",
"analyzer": "lowercase_keyword"
}
}
}
}
}
}
}
my_ngram_analyzer is used to split every text into small pieces; how large the pieces are depends on your use case. I chose, for testing purposes, 25 chars. lowercase is used since you said case-insensitive. Basically, this is the analyzer used for substringof('table 1',name). The query is simple:
{
"query": {
"term": {
"text": {
"value": "table 1"
}
}
}
}
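To see why a plain term query can act as a substring match here, the following sketch emulates what an nGram tokenizer with min_gram 2 and max_gram 25 followed by a lowercase filter would index. It is a simplified emulation that treats the whole text as one token (spaces included) and is not Elasticsearch code; the function name is hypothetical.

```javascript
// Emit every lowercase substring whose length is between minGram and maxGram.
function ngrams(text, minGram, maxGram) {
  const lower = text.toLowerCase();
  const grams = new Set();
  for (let start = 0; start < lower.length; start++) {
    for (let len = minGram; len <= maxGram && start + len <= lower.length; len++) {
      grams.add(lower.slice(start, start + len));
    }
  }
  return grams;
}

const indexed = ngrams('Big Table 123', 2, 25);
console.log(indexed.has('table 1')); // true: "table 1" is one of the indexed grams
console.log(indexed.has('table 2')); // false: not a substring of the text
```

Since term does not analyze its input, the searched value has to be lowercased by the caller, and inputs shorter than min_gram or longer than max_gram have no matching gram.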
my_edge_ngram_analyzer is used to split the text starting from the beginning and this is specifically used for the startswith(name,'table 1') use case. Again, the query is simple:
{
"query": {
"term": {
"text.starts_with": {
"value": "table 1"
}
}
}
}
I found this the trickiest part: the one for endswith(name,'table 1'). For this I defined my_reverse_edge_ngram_analyzer, which uses a keyword tokenizer together with lowercase and an edgeNGram filter preceded and followed by a reverse filter. What this analyzer basically does is split the text into edgeNGrams, but the edge is the end of the text, not the start (as with the regular edgeNGram).
The query:
{
"query": {
"term": {
"text.ends_with": {
"value": "table 1"
}
}
}
}
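The reverse, edgeNGram, reverse chain can be emulated in a few lines to show which terms end up in the index. This is a simplified sketch of the filter chain with hypothetical function names, not Elasticsearch code, using the same min_gram 2 and max_gram 25 as the settings above.

```javascript
// Prefixes of the text with lengths from minGram up to maxGram.
function edgeNgrams(text, minGram, maxGram) {
  const grams = [];
  for (let len = minGram; len <= Math.min(maxGram, text.length); len++) {
    grams.push(text.slice(0, len));
  }
  return grams;
}

const reverse = (s) => [...s].reverse().join('');

// lowercase -> reverse -> edge n-grams -> reverse: yields suffixes of the text.
function suffixGrams(text, minGram = 2, maxGram = 25) {
  return edgeNgrams(reverse(text.toLowerCase()), minGram, maxGram).map(reverse);
}

console.log(suffixGrams('Room 1, Table 1').includes('table 1')); // true
console.log(suffixGrams('Room 1, Table 1').includes('room 1'));  // false
```

Every indexed term is a suffix of the whole lowercased text, so a term query on text.ends_with only matches when the input is exactly one of those suffixes.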
For the name eq 'table 1' case, a simple keyword tokenizer together with a lowercase filter should do it.
The query:
{
"query": {
"term": {
"text.exact_case_insensitive_match": {
"value": "table 1"
}
}
}
}
Regarding query_string, this changes the solution a bit, because I was counting on term to not analyze the input text and to match it exactly with one of the terms in the index.
But this can be "simulated" with query_string if the appropriate analyzer is specified for it.
The solution would be a set of queries like the following (always use that analyzer, changing only the field name):
{
"query": {
"query_string": {
"query": "text.starts_with:(\"table 1\")",
"analyzer": "lowercase_keyword"
}
}
}
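Since only the field name changes between operators, the translation layer could be as small as a lookup table. The sketch below assumes the field names from the mapping above; the function name and the quote escaping are hypothetical additions and would need to be adapted to the real OData translator.

```javascript
// Which sub-field of "text" serves each OData operator (from the mapping above).
const FIELD_BY_OPERATOR = {
  substringof: 'text',
  startswith: 'text.starts_with',
  endswith: 'text.ends_with',
  eq: 'text.exact_case_insensitive_match'
};

// Build the query_string body for one operator and one user-supplied value.
function buildQuery(operator, value) {
  const field = FIELD_BY_OPERATOR[operator];
  if (!field) throw new Error(`Unsupported operator: ${operator}`);
  // Escape backslashes and double quotes so the value stays one quoted phrase.
  const escaped = value.replace(/(["\\])/g, '\\$1');
  return {
    query: {
      query_string: {
        query: `${field}:("${escaped}")`,
        analyzer: 'lowercase_keyword'
      }
    }
  };
}

console.log(JSON.stringify(buildQuery('startswith', 'table 1'), null, 2));
```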