How to do a cross join / cartesian product in RavenDB? - mapreduce

I have a web application that uses RavenDB on the backend and allows the user to keep track of inventory. The three entities in my domain are:
public class Location
{
string Id
string Name
}
public class ItemType
{
string Id
string Name
}
public class Item
{
string Id
DenormalizedRef<Location> Location
DenormalizedRef<ItemType> ItemType
}
On my web app, there is a page for the user to see a summary breakdown of the inventory they have at the various locations. Specifically, it shows the location name, item type name, and then a count of items.
The first approach I took was a map/reduce index on InventoryItems:
this.Map = inventoryItems =>
from inventoryItem in inventoryItems
select new
{
LocationName = inventoryItem.Location.Name,
ItemTypeName = inventoryItem.ItemType.Name,
Count = 1
});
this.Reduce = indexEntries =>
from indexEntry in indexEntries
group indexEntry by new
{
indexEntry.LocationName,
indexEntry.ItemTypeName,
} into g
select new
{
g.Key.LocationName,
g.Key.ItemTypeName,
Count = g.Sum(entry => entry.Count),
};
That is working fine but it only displays rows for Location/ItemType pairs that have a non-zero count of items. I need to have it show all Locations and for each location, all item types even those that don't have any items associated with them.
I've tried a few different approaches but no success so far. My thought was to turn the above into a Multi-Map/Reduce index and just add another map that would give me the cartesian product of Locations and ItemTypes but with a Count of 0. Then I could feed that into the reduce and would always have a record for every location/itemtype pair.
this.AddMap<object>(docs =>
from itemType in docs.WhereEntityIs<ItemType>("ItemTypes")
from location in docs.WhereEntityIs<Location>("Locations")
select new
{
LocationName = location.Name,
ItemTypeName = itemType.Name,
Count = 0
});
This isn't working though so I'm thinking RavenDB doesn't like this kind of mapping. Is there a way to get a cross join / cartesian product from RavenDB? Alternatively, any other way to accomplish what I'm trying to do?
EDIT: To clarify, Locations, ItemTypes, and Items are documents in the system that the user of the app creates. Without any Items in the system, if the user enters three Locations "London", "Paris", and "Berlin" along with two ItemTypes "Desktop" and "Laptop", the expected result is that when they look at the inventory summary, they see a table like so:
| Location | Item Type | Count |
|----------|-----------|-------|
| London | Desktop | 0 |
| London | Laptop | 0 |
| Paris | Desktop | 0 |
| Paris | Laptop | 0 |
| Berlin | Desktop | 0 |
| Berlin | Laptop | 0 |

Here is how you can do this with all the empty locations as well:
AddMap<InventoryItem>(inventoryItems =>
from inventoryItem in inventoryItems
select new
{
LocationName = inventoryItem.Location.Name,
Items = new[]{
{
ItemTypeName = inventoryItem.ItemType.Name,
Count = 1}
}
});
)
this.AddMap<Location>(locations=>
from location in locations
select new
{
LocationName = location.Name,
Items = new object[0]
});
this.Reduce = results =>
from result in results
group result by result.LocationName into g
select new
{
LocationName = g.Key,
Items = from item in g.SelectMany(x=>x.Items)
group item by item.ItemTypeName into gi
select new
{
ItemTypeName = gi.Key,
Count = gi.Sum(x=>x.Count)
}
};

Related

Creating Dictionary from nth list items

I have a list in which each item in the list is further split into 3 fields, separated by a '| '
Suppose my list is :
[‘North America | 23 | United States’, ’South America | 12 | Brazil’,
‘Europe | 51 | Greece’………] and so on
Using this list, I want to create a dictionary that would make the first field in each item the value, and the second field in each item the key.
How can I add these list items to a dictionary using a for loop?
My expected outcome would be
{’23’:’North America’, ’12’:’South America’, ’51’:’Europe’}
How about something like this:
var myList = new List<string>() { "North America | 23 | United States", "South America | 12 | Brazil", "Europe | 51 | Greece" };
var myDict = myList.Select(x => x.Split('|')).ToDictionary(a => a[1], a => a[0]);
Assuming you're in Python, if you know what the delimiter is, you can use string.split() to break the string up into a list then go from there.
my_dict = {}
for val in my_list:
words = val.split(" | ")
my_dict[words[1]] = words[0]
For other languages, you can take an index of the first "|", substring from the beginning to give you your value, then take the index of the second line to give you the key. In Java this would look like:
Map<String, String> myDict = new DopeDataStructure<String, String>();
for(String s : myArray){
int pos = s.indexOf("|");
String val = s.substring(0, pos - 1);
String rest = s.substring(pos + 2);
String key = rest.substring(0, rest.indexOf("|") - 1);
myDict.put(key, val);
}
EDIT: there very well could be more efficient ways of solving the problem in other languages, that's just the simplest method I know off the top of my head
Using Python dict comphrehension
data = ['North America | 23 | United States', 'South America | 12 | Brazil',]
# Spliting each string in list by "|" and setting its 1st index value to dict key and 0th index value to dict value
res = {i.split(" | ")[1]: i.split(" | ")[0] for i in data}
print (res)
I hope this helps and counts!

Build data using dictionary

I am new to python and i am still learning how it all works. Its been just a week since i started.
I am trying to code a program which does this:
Reads 4 columns from a file (ref input file above)
Get date, day and count from the file
And construct an dictionary to represent date day and count.
basically i wanted to represent the data in something like below and I am stucked in the syntax.
{
"xyz" :
{"Sunday" : {
"20180101" : 72326,
"20180108" : 71120
}
"Monday" : {
"20171225" : 51954,
"20180102" : 51954
}
}
}
INPUT FILE:
DateDay value count floatex
20171225Monday | 270613| 51954|11.41|
20171226Tuesday | 133579| 46126|12.01|
20171227Wednesday| 630613| 71954|11.41|
20171228Thursday | 253779| 96126|12.01|
20171229Friday | 688613| 71054|11.41|
20171230Saturday | 633779| 66126|12.01|
20180101Sunday | 633779| 72326|12.01|
20180102Monday | 630613| 91954|11.41|
20180103Tuesday | 538779| 73326|12.01|
20180104Wednesday| 630613| 61954|11.41|
20180105Thursday | 393379| 75146|12.01|
20180106Friday | 130613| 51954|11.41|
20180107Saturday | 2643329| 70126|12.01|
20180108Sunday | 863979| 71120|12.01|
This is what i Have but its far from what i want: Infact its throwing error now.. But that is not my question. Basically i am trying to understand how do i create the nested dictionary based on the input data
def buildInputDataDictionary(file, ind):
dateCount = {}
dateDay = {}
#dictData = {}
# dateCount[dictData] = {}
with open(file) as f:
for line in f:
items = line.split("|")
date=items[0].strip()[0:8] ##strip spaces and substring to get only the date
count= items[2].strip()
day= items[0].strip()[8:]
dateCount[date] = count
dateDay[date] = day
dictData = {}
dictData[date] = {}
dictData [ind][date] = count
return dateCount,dateDay,dictData
dc,dd, di= buildInputDataDictionary(autoInqRhf, "xyz")
print dd
print dc
print di
In your current script you reset the dictionary in each for loop. You only need to initiate it once outside the loop. When adding nested data you have to make sure, that all keys above the one you want to enter already exists. You can do this like this
#initate dictionary with your identifier as first object
dictData = {ind: dict()}
with open(file) as f:
for line in f:
# extact your data (haven't tested your code)
items = line.split("|")
day = items[0].strip()[0:8]
date = items[0].strip()[0:8]
count = items[2].strip()
# add days
if not day in dictData[ind].keys():
dictData[ind][day] = dict()
dictData[ind][day][date] = count

Unique list from dynamic range table with possible blanks

I have an Excel table in sheet1 in which column A:
Name of company Company 1 Company 2 Company
3 Company 1 Company 4 Company 1 Company
3
I want to extract a unique list of company names to sheet2 also in column A. I can only do this with help of a helper column if I dont have any blanks between company names but when I do have I get one more company which is a blank.
Also, I've researched but the example was for non-dynamic tables and so it doesn't work because I don't know the length of my column.
I want in Sheet2 Column A:
Name of company Company 1 Company 2 Company 3
Company 4
Looking for the solution that requires less computational power Excel or Excel-VBA. The final order which they appear in sheet2 don't really matter.
Using a slight modification to Recorder-generated code:
Sub Macro1()
Sheets("Sheet1").Range("A:A").Copy Sheets("Sheet2").Range("A1")
Sheets("Sheet2").Range("A:A").RemoveDuplicates Columns:=1, Header:=xlYes
With Sheets("Sheet2").Sort
.SortFields.Clear
.SortFields.Add Key:=Range("A2:A" & Rows.Count) _
, SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SetRange Range("A2:A" & Rows.Count)
.Header = xlGuess
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
End Sub
Sample Sheet1:
Sample Sheet2:
The sort removes the blanks.
EDIT#1:
If the original data in Sheet1 was derived from formulas, then using PasteSpecial will remove unwanted formula copying. There is also a final sweep for empty cells:
Sub Macro1_The_Sequel()
Dim rng As Range
Sheets("Sheet1").Range("A:A").Copy
Sheets("Sheet2").Range("A1").PasteSpecial Paste:=xlPasteValues
Sheets("Sheet2").Range("A:A").RemoveDuplicates Columns:=1, Header:=xlYes
Set rng = Sheets("Sheet2").Range("A2:A" & Rows.Count)
With Sheets("Sheet2").Sort
.SortFields.Clear
.SortFields.Add Key:=rng, SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SetRange rng
.Header = xlGuess
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
Call Kleanup
End Sub
Sub Kleanup()
Dim N As Long, i As Long
With Sheets("Sheet2")
N = .Cells(Rows.Count, "A").End(xlUp).Row
For i = N To 1 Step -1
If .Cells(i, "A").Value = "" Then
.Cells(i, "A").Delete shift:=xlUp
End If
Next i
End With
End Sub
All of these answers use VBA. The easiest way to do this is to use a pivot table.
First, select your data, including the header row, and go to Insert -> PivotTable:
Then you will get a dialog box. You don't need to select any of the options here, just click OK. This will create a new sheet with a blank pivot table. You then need to tell Excel what data you're looking for. In this case, you only want the Name of company in the Rows section. On the right-hand side of Excel you will see a new section named PivotTable Fields. In this section, simply click and drag the header to the Rows section:
This will give a result with just the unique names and an entry with (blank) at the bottom:
If you don't want to use the Pivot Table further, simply copy and paste the result rows you're interested in (in this case, the unique company names) into a new column or sheet to get just those without the pivot table attached. If you want to keep the pivot table, you can right click on Grand Total and remove that, as well as filter the list to remove the (blank) entry.
Either way, you now have your list of unique results without blanks and it didn't require any formulas or VBA, and it took relatively few resources to complete (far fewer than any VBA or formula solution).
Here's another method using Excel's built-in Remove Duplicates feature, and a programmed method to remove the blank lines:
EDIT
I have deleted the code using the above methodology as it takes too long to run. I have replaced it with a method that uses VBA's collection object to compile a unique list of companies.
The first method, on my machine, took about two seconds to run; the method below: about 0.02 seconds.
Sub RemoveDups()
Dim wsSrc As Worksheet, wsDest As Worksheet
Dim rRes As Range
Dim I As Long, S As String
Dim vSrc As Variant, vRes() As Variant, COL As Collection
Set wsSrc = Worksheets("sheet1")
Set wsDest = Worksheets("sheet2")
Set rRes = wsDest.Cells(1, 1)
'Get the source data
With wsSrc
vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With
'Collect unique list of companies
Set COL = New Collection
On Error Resume Next
For I = 2 To UBound(vSrc, 1) 'Assume Row 1 is the header
S = CStr(Trim(vSrc(I, 1)))
If Len(S) > 0 Then COL.Add S, S
Next I
On Error GoTo 0
'Populate results array
ReDim vRes(0 To COL.Count, 1 To 1)
'Header
vRes(0, 1) = vSrc(1, 1)
'Companies
For I = 1 To COL.Count
vRes(I, 1) = COL(I)
Next I
'set results range
Set rRes = rRes.Resize(UBound(vRes, 1) + 1)
'Write the results
With rRes
.EntireColumn.Clear
.Value = vRes
.EntireColumn.AutoFit
'Uncomment the below line if you want
'.Sort key1:=.Columns(1), order1:=xlAscending, MatchCase:=False, Header:=xlYes
End With
End Sub
NOTE: You wrote you didn't care about the order, but if you want to Sort the results, that added about 0.03 seconds to the routine.
With two sheets named 1 and 2
Inside sheet named: 1
+----+-----------------+
| | A |
+----+-----------------+
| 1 | Name of company |
| 2 | Company 1 |
| 3 | Company 2 |
| 4 | |
| 5 | Company 3 |
| 6 | Company 1 |
| 7 | |
| 8 | Company 4 |
| 9 | Company 1 |
| 10 | Company 3 |
+----+-----------------+
Result in sheet named: 2
+---+-----------------+
| | A |
+---+-----------------+
| 1 | Name of company |
| 2 | Company 1 |
| 3 | Company 2 |
| 4 | Company 3 |
| 5 | Company 4 |
+---+-----------------+
Use this code in a regular module:
Sub extractUni()
Dim objDic
Dim Cell
Dim Area As Range
Dim i
Dim Value
Set Area = Sheets("1").Range("A2:A10") 'this is where your data is located
Set objDic = CreateObject("Scripting.Dictionary") 'use a Dictonary!
For Each Cell In Area
If Not objDic.Exists(Cell.Value) Then
objDic.Add Cell.Value, Cell.Address
End If
Next
i = 2 '2 because the heading
For Each Value In objDic.Keys
If Not Value = Empty Then
Sheets("2").Cells(i, 1).Value = Value 'Store the data in column D below the heading
i = i + 1
End If
Next
End Sub
The code return the date unsorted, just the way data appears.
if you want a sorted list, just add this code before the las line:
Dim sht As Worksheet
Set sht = Sheets("2")
sht.Activate
With sht.Sort
.SetRange Range("A:A")
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
This way the result will be always sorted.
(The subrutine would be like this)
Sub extractUni()
Dim objDic
Dim Cell
Dim Area As Range
Dim i
Dim Value
Set Area = Sheets("1").Range("A2:A10") 'this is where your data is located
Set objDic = CreateObject("Scripting.Dictionary") 'use a Dictonary!
For Each Cell In Area
If Not objDic.Exists(Cell.Value) Then
objDic.Add Cell.Value, Cell.Address
End If
Next
i = 2 '2 because the heading
For Each Value In objDic.Keys
If Not Value = Empty Then
Sheets("2").Cells(i, 1).Value = Value 'Store the data in column D below the heading
i = i + 1
End If
Next
Dim sht As Worksheet
Set sht = Sheets("2")
sht.Activate
With sht.Sort
.SetRange Range("A:A")
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
End Sub
If you have any question about the code, I will glad to explain.

Regex / subString to extract all matching patterns / groups

I get this as a response to an API hit.
1735 Queries
Taking 1.001303 to 31.856310 seconds to complete
SET timestamp=XXX;
SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
38 Queries
Taking 1.007646 to 5.284330 seconds to complete
SET timestamp=XXX;
show slave status;
6 Queries
Taking 1.021271 to 1.959838 seconds to complete
SET timestamp=XXX;
SHOW SLAVE STATUS;
2 Queries
Taking 4.825584, 18.947725 seconds to complete
use marketing;
SET timestamp=XXX;
SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
I have extracted this out of the response html and have it as a string now.I need to retrieve values as concisely as possible such that I get a map of values of this format Map(Query -> T1 to T2 seconds) Basically what this is the status of all the slow queries running on MySQL slave server. I am building an alert system over it . So from this entire paragraph in the form of String I need to separate out the queries and save the corresponding time range with them.
1.001303 to 31.856310 is a time range . And against the time range the corresponding query is :
SET timestamp=XXX; SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
This information I was hoping to save in a Map in scala. A Map of the form (query:String->timeRange:String)
Another example:
("use marketing; SET timestamp=XXX; SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified xyz ;"->"4.825584 to 18.947725 seconds")
"""###(.)###(.)\n\n(.*)###""".r.findAllIn(reqSlowQueryData).matchData foreach {m => println("group0"+m.group(1)+"next group"+m.group(2)+m.group(3)}
I am using the above statement to extract the the repeating cells to do my manipulations on it later. But it doesnt seem to be working;
THANKS IN ADvance! I know there are several ways to do this but all the ones striking me are inefficient and tedious. I need Scala to do the same! Maybe I can extract recursively using the subString method ?
If you want use scala try this:
val regex = """(\d+).(\d+).*(\d+).(\d+) seconds""".r // extract range
val txt = """
|1735 Queries
|
|Taking 1.001303 to 31.856310 seconds to complete
|
|SET timestamp=XXX; SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
|
|38 Queries
|
|Taking 1.007646 to 5.284330 seconds to complete
|
|SET timestamp=XXX; show slave status;
|
|6 Queries
|
|Taking 1.021271 to 1.959838 seconds to complete
|
|SET timestamp=XXX; SHOW SLAVE STATUS;
|
|2 Queries
|
|Taking 4.825584, 18.947725 seconds to complete
|
|use marketing; SET timestamp=XXX; SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
""".stripMargin
def logToMap(txt:String) = {
val (_,map) = txt.lines.foldLeft[(Option[String],Map[String,String])]((None,Map.empty)){
(acc,el) =>
val (taking,map) = acc // taking contains range
taking match {
case Some(range) if el.trim.nonEmpty => //Some contains range
(None,map + ( el -> range)) // add to map
case None =>
regex.findFirstIn(el) match { //extract range
case Some(range) => (Some(range),map)
case _ => (None,map)
}
case _ => (taking,map) // probably empty line
}
}
map
}
Modified ajozwik's answer to work for SQL commands over multiple lines :
val regex = """(\d+).(\d+).*(\d+).(\d+) seconds""".r // extract range
def logToMap(txt:String) = {
val (_,map) = txt.lines.foldLeft[(Option[String],Map[String,String])]((None,Map.empty)){
(accumulator,element) =>
val (taking,map) = accumulator
taking match {
case Some(range) if element.trim.nonEmpty=> {
if (element.contains("Queries"))
(None, map)
else
(Some(range),map+(range->(map.getOrElse(range,"")+element)))
}
case None =>
regex.findFirstIn(element) match {
case Some(range) => (Some(range),map)
case _ => (None,map)
}
case _ => (taking,map)
}
}
println(map)
map
}

Slickgrid Display Object in Single Column not Row

I am trying to figure out how to use slickgrid to display my object in a single column and not in a single row. Below is an example. Ideas?
Grid data looks like this
[{ fname: 'john', lname:'doe'}, {fname:'kyle', lname:'noobie'}]
This is what I get with slickgrid (row object view)
desc | fname | lanme
option1 | john | doe
option2 | kyle | noobie
This is what I want (column object view)
desc | option1 | option2
fname | john | kyle
lname | doe | noobie
So I figured this out and it's not pretty. Basically I have to pivot the data into a new row objects and then unpivot it back out to save it. It's not really that pretty but it gets the job done. The crazy thing is now you have to create a single editor and formatter for and apply that to each column. The editor/formatter look at the field names and do switch statement to determine what it should look like. Have fun. I did. :)
function getPivotedRows(origData) {
//this is a mapping between the column headings and the properties
var properties = [{ 'First Name': 'fname' }, { 'Last Name': 'lname' }];
//this holds my pivoted rows
var rows = [];
for (var i = 0; i < properties.length; i++) {
var row = {};
for (var j = 0; j < origData.length; j++) {
var key = "";
for (var p in properties[i]) {
row['Description'] = p; //set the column
key = properties[i][p];
}
row['options' + (j + 1)] = origData[j][key];
}
row[gridOptions.uniqueKey] = i;
rows.push(row);
}
return rows;
}