How to write CouchDB view to get currently active servers given start timestamp and end timestamp of each server? - mapreduce

I have set of documents which has the server name, with the start timestamp and end timestamp of that server. eg.
[
{
serverName: "Houston",
startTimestamp: "2018/03/07 17:52:13 +000",
endTimestamp: "2018/03/07 18:50:10 +000"
},
{
serverName: "Canberra",
startTimestamp: "2018/03/07 18:48:09 +000",
endTimestamp: "2018/03/07 20:10:00 +000"
},
{
serverName: "Melbourne",
startTimestamp: "2018/03/08 01:43:13 +000",
endTimestamp: "2018/03/08 12:09:10 +000"
}
]
With this data, given a Timestamp I need to get the list of active servers at that point of time.
For example. for TS="2018/03/07 18:50:00 +000" from the above data the list of active servers are ["Huston", "Canberra"]
Is it possible to achieve this using only CouchDB views. If so how to go about it?
Note: Initially I tried the following approach. In the map function I emit two documents
1 with key=doc.startTimestsamp and value={"station_add": doc.station}
1 with key=doc.startEndtsamp and value={"station_rem": doc.station}
My intention was to iterate through these in the reduce function adding stations present in "station_add" and removing stations in "stations_rem". But I found that CouchDB does not mention anything about the ordering of values in the reduce function.

If you can live with fixed periods and don't mind the extra disk space that might be needed for the view results, you can create a view of active servers per hour, for example.
Iterate over the periods between start and end and emit the time that each server was online during this period:
function(doc) {
var start = new Date(doc.startTimestamp).getTime()
var end = new Date(doc.endTimestamp).getTime()
var msPerPeriod = 60*60*1000
var msOfflineInFirstPeriod = start % msPerPeriod
var firstPeriod = start - msOfflineInFirstPeriod
var msOnlineInLastPeriod = end % msPerPeriod
var lastPeriod = end - msOnlineInLastPeriod
if (firstPeriod === lastPeriod) {
// The server was only online within one period.
emit([new Date(firstPeriod), doc.serverName], [1, msOnlineInLastPeriod - msOfflineInFirstPeriod])
} else {
// The server was online over multiple periods.
emit([new Date(firstPeriod), doc.serverName], [1,msPerPeriod - msOfflineInFirstPeriod])
for (var period = firstPeriod + msPerPeriod; period < lastPeriod; period += msPerPeriod) {
emit([new Date(period), doc.serverName], [1, msPerPeriod])
}
emit([new Date(lastPeriod), doc.serverName], [1,msOnlineInLastPeriod])
}
}
If you want the total without the server names, just add a reduce function with the built-in shortcut _sum. You'll get the number of servers online during the period as the first number and the milliseconds that the servers were online in that period as the second number.
You can play with the view if you emit the year, month and day as the first keys. Then you can use the group_level at query time to get a finer or more coarse overview.
Bear in mind that this view might get large on disk, as each row has to be stored, and also the intermediate results for each group level are stored. So you shouldn't set the period duration too small – emitting a row for each second would take a lot of disk space, for example.

Related

DynamoDB query performance is low

First of all some specs:
number of entries: 110k
total size of table: 700MB
number of columns per data set: can be up to 450
data set size: can be up to 25kB
read/write capacity: "on demand"
Problem: when trying to query for some rows by a column it takes easily up to 10 seconds or more.
The column we query by is an UUID column (index exists), not unique, used kind of like an external ID. So we say give me all records with that UUID and we expect up to ca. 1000 rows.
Even if I remove our application completely out of the equation (testing directly in the AWS management console) it makes no difference, still very poor performance (means also about 10 seconds or more).
So my question: do you have any ideas or concrete tips that I should check/test/adjust to improve the performance?
After request, here's the example code in PHP (reduced to relevant parts):
We use the official aws/aws-sdk-php package.
do {
// Marshal a native PHP array of data to a DynamoDB item.
$transformedValue = $this->getDynamoArrayFromNativeArray(
[':uuid' => $uuid]
);
$params = [
'TableName' => 'our-table-name',
'IndexName' => 'our-uuid-index',
'KeyConditionExpression' => 'our-uuid-column = :uuid',
'ExpressionAttributeValues' => $transformedValue,
];
if ($queryResult !== null) {
$lastEvaluatedKey = $queryResult['LastEvaluatedKey'];
$params['ExclusiveStartKey'] = $lastEvaluatedKey;
}
$queryResult = $this->client->query($params);
// (push results to some array)
} while ($queryResult['LastEvaluatedKey'] !== null);
Example data set:
{
"_id": "82ee23ce-d7ff-11eb-bf92-0aa84964df0a",
"_meta": {
"creation_date": 1624877797,
"uuid": "820025c0-d7ff-11eb-a5f4-0aa84964df0a"
},
"some_key.data.id": 63680,
(couple of hundred more simple key => value pairs, nothing special, no huge values or anything like that)
Read capacity chart for the index in question:
Query latency chart:

AmCharts4: Datagrouping, Tooltiptext and changing ranges

From a server, I get an entry for every hour in the range I request. If I request a range of 1 Year, I get something like 8000 Datapoints.
When rendering the graph, I want to group my data to hours(which is the raw data without grouping), days and months. However, the chart looks like this:
The tooltip does only display on the very first column, all other columns are above 1.5, but my ValueAxis does not scale automatically. I already checked if I set fixed min and max for the valueAxis, this is not the case.
Interestingly, if i use the scrollbar to zoom in until grouping kicks in, everything seems to work:
After zooming out again, it also works, but i cannot see the tooltip on the "June-Column":
And finally, if I trigger "invalidateData" the graph goes back to the state it was before.
My grouping looks as follows:
series = entry.chart.series.push(new am4charts.ColumnSeries());
dateAxis.groupData = true;
dateAxis.groupCount = 40;
dateAxis.groupIntervals.setAll([
{ timeUnit: "hour", count: 1 },
{ timeUnit: "day", count: 1 },
{ timeUnit: "month", count: 1 },
{ timeUnit: "year", count: 1 },
{ timeUnit: "year", count: 10 }
]);
series.groupFields.valueY = "sum";
I am also not very sure what I should set those values to. I want to see:
months when there is a period of 3 months or more
days when there is a period of 3 days until 3 months
hours when there is a period below 3 days
It is very difficult to do a fiddle for this, as there already is so much code and its hard to extract only the essential parts.
Maybe I am missing something obvious, please help!
Edit:
I forgot another question which is part of datagrouping:
How can I make the tooltip to show the date in a formatted matter so that:
hour-columns shows "dd hh:mm"(where mm obviously is 00 all the time)
day-columns shows: "dd.mm"
month-columns shows: "MM. YYYY"
Nevermind, I solved this issue. This was actually a bug fixed in this release:
https://github.com/amcharts/amcharts4/releases/tag/4.7.19
so updating my local amCharts4 files did the trick.
I still do not know how to change the tooltiptext and the grouping as described in my question

cloudant index: count number of unique users per time period

A very similar post was made about this issue here. In cloudant, I have a document structure storing when users access an application, that looks like the following:
{"username":"one","timestamp":"2015-10-07T15:04:46Z"}---| same day
{"username":"one","timestamp":"2015-10-07T19:22:00Z"}---^
{"username":"one","timestamp":"2015-10-25T04:22:00Z"}
{"username":"two","timestamp":"2015-10-07T19:22:00Z"}
What I want to know is to count the # of unique users for a given time period. Ex:
2015-10-07 = {"count": 2} two different users accessed on 2015-10-07
2015-10-25 = {"count": 1} one different user accessed on 2015-10-25
2015 = {"count" 2} two different users accessed in 2015
This all just becomes tricky because for example on 2015-10-07, username: one has two records of when they accessed, but it should only return a count of 1 to the total of unique users.
I've tried:
function(doc) {
var time = new Date(Date.parse(doc['timestamp']));
emit([time.getUTCFullYear(),time.getUTCMonth(),time.getUTCDay(),doc.username], 1);
}
This suffers from several issues, which are highlighted by Jesus Alva who commented in the post I linked to above.
Thanks!
There's probably a better way of doing this, but off the top of my head ...
You could try emitting an index for each level of granularity:
function(doc) {
var time = new Date(Date.parse(doc['timestamp']));
var year = time.getUTCFullYear();
var month = time.getUTCMonth()+1;
var day = time.getUTCDate();
// day granularity
emit([year,month,day,doc.username], null);
// year granularity
emit([year,doc.username], null);
}
// reduce function - `_count`
Day query (2015-10-07):
inclusive_end=true&
start_key=[2015, 10, 7, "\u0000"]&
end_key=[2015, 10, 7, "\uefff"]&
reduce=true&
group=true
Day query result - your application code would count the number of rows:
{"rows":[
{"key":[2015,10,7,"one"],"value":2},
{"key":[2015,10,7,"two"],"value":1}
]}
Year query:
inclusive_end=true&
start_key=[2015, "\u0000"]&
end_key=[2015, "\uefff"]&
reduce=true&
group=true
Query result - your application code would count the number of rows:
{"rows":[
{"key":[2015,"one"],"value":3},
{"key":[2015,"two"],"value":1}
]}

Neo4j regex string matching not returning expected results

I'm trying to use the Neo4j 2.1.5 regex matching in Cypher and running into problems.
I need to implement a full text search on specific fields that a user has access to. The access requirement is key and is what prevents me from just dumping everything into a Lucene instance and querying that way. The access system is dynamic and so I need to query for the set of nodes that a particular user has access to and then within those nodes perform the search. I would really like to match the set of nodes against a Lucene query, but I can't figure out how to do that so I'm just using basic regex matching for now. My problem is that Neo4j doesn't always return the expected results.
For example, I have about 200 nodes with one of them being the following:
( i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})
This query produces one result:
MATCH (p)-->(:group)-->(i:node)
WHERE (i.name =~ "(?i).*mosaic.*")
RETURN i
> Returned 1 row in 569 ms
But this query produces zero results even though the description property matches the expression:
MATCH (p)-->(:group)-->(i:node)
WHERE (i.description=~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 601 ms
And this query also produces zero results even though it includes the name property which returned results previously:
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
WHERE (searchText =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 487 ms
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
RETURN searchText
>
...
SotoLinear Glass Mosaic Tiles Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
...
Even more odd, if I search for a different term, it returns all of the expected results without a problem.
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
WHERE (searchText =~ "(?i).*plumbing.*")
RETURN i
> Returned 8 rows in 522 ms
I then tried to cache the search text on the nodes and I added an index to see if that would change anything, but it still didn't produce any results.
CREATE INDEX ON :node(searchText)
MATCH (p)-->(:group)-->(i:node)
WHERE (i.searchText =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 3182 ms
I then tried to simplify the data to reproduce the problem, but in this simple case it works as expected:
MERGE (i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})
WITH i, (
i.name + " " + COALESCE(i.description, "")
) AS searchText
WHERE searchText =~ "(?i).*mosaic.*"
RETURN i
> Returned 1 rows in 630 ms
I tried using the CYPHER 2.1.EXPERIMENTAL tag as well but that didn't change any of the results. Am I making incorrect assumptions on how the regex support works? Is there something else I should try or some other way to debug the problem?
Additional information
Here is a sample call that I make to the Cypher Transactional Rest API when creating my nodes. This is the actual plain text that is sent (other than some formatting for easier reading) when adding nodes to the database. Any string encoding is just standard URL encoding that is performed by Go when creating a new HTTP request.
{"statements":[
{
"parameters":
{
"p01":"lsF30nP7TsyFh",
"p02":
{
"description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
"id":"lsF3BxzFdn0kj",
"name":"Linear Glass Mosaic Tiles",
"object":"material"
}
},
"resultDataContents":["row"],
"statement":
"MATCH (p:project { id: { p01 } })
WITH p
CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material { p02 })"
}
]}
If it is an encoding issue, why does a search on name work, description not work, and name + description not work? Is there any way to examine the database to see if/how the data was encoded. When I perform searches, the text returned appears correct.
just a few notes:
probably replace create unique with merge (which works a bit differently)
for your fulltext search I would go with the lucene legacy index for performance, if your group restriction is not limiting enough to keep the response below a few ms
I just tried your exact json statement, and it works perfectly.
inserted with
curl -H accept:application/json -H content-type:application/json -d #insert.json \
-XPOST http://localhost:7474/db/data/transaction/commit
json:
{"statements":[
{
"parameters":
{
"p01":"lsF30nP7TsyFh",
"p02":
{
"description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
"id":"lsF3BxzFdn0kj",
"name":"Linear Glass Mosaic Tiles",
"object":"material"
}
},
"resultDataContents":["row"],
"statement":
"MERGE (p:project { id: { p01 } })
WITH p
CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material { p02 }) RETURN m"
}
]}
queried:
MATCH (p)-->(:group)-->(i:material)
WHERE (i.description=~ "(?i).*mosaic.*")
RETURN i
returns:
name: Linear Glass Mosaic Tiles
id: lsF3BxzFdn0kj
description: Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
object: material
What you can try to check your data is to look at the json or csv dumps that the browser offers (little download icons on the result and table-result)
Or you use neo4j-shell with my shell-import-tools to actually output csv or graphml and check those files.
Or use a bit of java (or groovy) code to check your data.
There is also the consistency-checker that comes with the neo4j-enterprise download. Here is a blog post on how to run it.
java -cp 'lib/*:system/lib/*' org.neo4j.consistency.ConsistencyCheckTool /tmp/foo
I added a groovy test script here: https://gist.github.com/jexp/5a183c3501869ee63d30
One more idea: regexp flags
Sometimes there is a multiline thing going on, there are two more flags:
multiline (?m) which also matches across multiple lines and
dotall (?s) which allows the dot also to match special chars like newlines
So could you try (?ism).*mosaic.*

GeoIP database that will Geo target by US state

I am looking for a GEO IP database (Similar to MaxMind GeoLite2 Country and City) that will allow me to identify the US state that the user is coming from in order to target specific content to that user.
Does anyone know how or where I could find such a database/service or solution?
Don't expect high accuracy, unless you're satisfied with country/city precision. It is after all IP based geolocation data and the accuracy varies. It is based on ISP providers or companies that manages the databases(commercial databases may have higher accuracy). Look at an IP location info webtool ( like http://geoipinfo.org/ ) and you'll see approximately where it finds you, and it also provided accuracy at city and country levels - percentage wise. It used the ip2location database for lookups and their precision data.
This thread is old, but as of today, I used http://api.ipstack.com and it's working perfectly. They have a VERY extensive help examples on their site, but basically you make the call, parse the data and get what you want.
First, be sure you have any/all includes (Namespace=System.Xml, System.Net, blah, blah).
Second, be sure not to test on a private network IP (192.168.x.x or 10.x.x.x) because that will always return blank/empty fields and you'll think something is coded wrong.
Third, you will need an Acess_Key from ipstack.com ... you can setup a FREE account (10,000 requests a month I think) and get your access code to enter into your string below for API call. I filled out the form and was up and running in 10 minutes for free.
This worked for me to track visitors to any page:
string IP = "";
string strHostName = "";
string strHostInfo = "";
string strMyAccessKeyForIPStack = "THEYGIVEYOUTHISWHENYOUSETUPFREEACCOUNT";
strHostName = System.Net.Dns.GetHostName();
IPHostEntry ipEntry = System.Net.Dns.GetHostEntry(strHostName);
IPAddress[] addr = ipEntry.AddressList;
IP = addr[2].ToString();
XmlDocument doc = new XmlDocument();
string strMyIPToLocate = "http://api.ipstack.com/" + IP + "?access_key=strMyAccessKeyForIPStack&output=xml";
doc.Load(strMyIPToLocate);
XmlNodeList nodeLstCity = doc.GetElementsByTagName("city");
XmlNodeList nodeLstState = doc.GetElementsByTagName("region_name");
XmlNodeList nodeLstZIP = doc.GetElementsByTagName("zip");
XmlNodeList nodeLstLAT = doc.GetElementsByTagName("latitude");
XmlNodeList nodeLstLON = doc.GetElementsByTagName("longitude");
strHostInfo = "IP is from " + nodeLstCity[0].InnerText + ", " + nodeLstState[0].InnerText + " (" + nodeLstZIP[0].InnerText + ")";
// Then I do what you want with strHostInfo, I put it in a DB myself, but whatever.