How to analyze an HTML text with compound words

I'm writing a search service based on Elasticsearch for a bunch of sites with content written in agglutinative languages like Swedish, German and Finnish.
I know that Elasticsearch offers language analyzers by default, but after some testing I found their support sloppy at best.
What I have so far is:
{
"settings":{
"analysis":{
"filter":{
"swedish_stop":{
"type": "stop",
"stopwords": "_swedish_"
},
"swedish_stemmer":{
"type":"stemmer",
"language":"swedish"
},
"swedish_words":{
"type":"dictionary_decompounder",
"word_list":["very", "long", "list", "of", "words", "almost", "13", "MB"]
}
},
"analyzer":{
"custom_swedish":{
"tokenizer": "standard",
"filter":[
"lowercase",
"swedish_stop",
"swedish_stemmer",
"swedish_words"
],
"char_filter":[
"html_strip"
]
}
}
}
}
}
Do you guys have a clue?
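For readers unfamiliar with what a dictionary_decompounder filter does, here is a rough Python sketch of the idea: scan each token for substrings that appear in the word list and emit them alongside the original token. It is a simplified illustration, not Elasticsearch's implementation; the function name, the sample words, and the size limits are assumptions chosen for the example.

```python
def decompound(token, word_list, min_word_size=5,
               min_subword_size=2, max_subword_size=15):
    """Return the token followed by every dictionary word found inside it.

    Hypothetical helper: the size limits are illustrative parameters,
    not a statement about Elasticsearch's defaults.
    """
    words = {w.lower() for w in word_list}
    token = token.lower()
    result = [token]                      # the original token is always kept
    if len(token) < min_word_size:
        return result                     # short tokens are not decomposed
    for start in range(len(token)):
        for length in range(min_subword_size, max_subword_size + 1):
            end = start + length
            if end > len(token):
                break
            part = token[start:end]
            if part in words and part not in result:
                result.append(part)
    return result


# Sample Swedish compound ("football match"); the word list is made up.
print(decompound("fotbollsmatch", ["fotboll", "match", "boll"]))
```

Because the sketch matches substrings against the word list literally, a token that a stemmer has already altered may no longer contain any dictionary word, so it may be worth testing the decompounder earlier in the filter chain.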

Related

Preg Match GF to pull data from API CALL (Podio CRM)

I am trying to pull all of the data returned by this API call made within my CRM, Podio.
The API call response is the following:
{
"status": {
"version": "1.0.0",
"code": 0,
"msg": "SuccessWithResult",
"total": 1,
"page": 1,
"pagesize": 10,
"transactionID": "ba31a62303e76d49b2063e94e2972bc6"
},
"property": [
{
"identifier": {
"Id": 34476108,
"fips": "48201",
"apn": "1288930010042",
"attomId": 34476108
},
"lot": {
"lotnum": "42",
"lotsize1": 0.2735078,
"lotsize2": 11914,
"poolind": "YES"
},
"area": {
"blockNum": "1",
"loctype": "VIEW - NONE",
"countrysecsubd": "Harris",
"countyuse1": "1001 ",
"muncode": "HA",
"munname": "HARRIS",
"subdname": "BLACKHORSE RANCH SOUTH SEC 6",
"taxcodearea": "40",
"legal1": "BLACK HORSE RANCE LOT 14 BLOCK 12 USA"
etc.
I have tried the following code to pull just the legal description, but it returns the entire API response in the comments of my CRM. I am trying to get all data points listed individually.
preg_match_gf("/legal1\.\:\s\/(.*)/ism",[(Variable) PropertyDetails], 1)
Any advice or insight is much appreciated!
Thank you,
Cody
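As an illustration of the kind of pattern that isolates a single quoted field such as legal1 from the raw response text, here is a minimal Python sketch. It does not use Podio's preg_match_gf and makes no claim about how that function behaves; extract_field is a hypothetical helper, and the sample string is a shortened copy of the response above. It only handles string values (quoted), not bare numbers such as lotsize2.

```python
import re


def extract_field(response_text, field_name):
    """Return the quoted string value of field_name, or None if absent.

    Hypothetical helper for illustration; it captures everything between
    the quotes that follow "field_name":.
    """
    pattern = re.compile(
        r'"' + re.escape(field_name) + r'"\s*:\s*"([^"]*)"',
        re.IGNORECASE,
    )
    match = pattern.search(response_text)
    return match.group(1) if match else None


sample = (
    '"subdname": "BLACKHORSE RANCH SOUTH SEC 6",\n'
    '"taxcodearea": "40",\n'
    '"legal1": "BLACK HORSE RANCE LOT 14 BLOCK 12 USA"'
)
print(extract_field(sample, "legal1"))
```

The key difference from the attempt above is that the capture group stops at the next double quote instead of using (.*) with the s flag, which lets the match run to the end of the response.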

DynamoDB LIKE '%' (contains) search over an array of objects using a key from the object, NodeJS

I am trying to use a "LIKE" search on DynamoDB where I have an array of objects using nodejs.
Looking through the documentation and other related posts I have seen this can be done using the CONTAINS parameter.
My question is: can I run a scan or query over all of my items in DynamoDB where a value in my object is LIKE "Test 2"?
This is how a row of my DynamoDB table looks as JSON:
{
"items": [
{
"description": "Test 1 Description",
"id": "86f550e3-3dee-4fea-84e9-30df174f27ea",
"image": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/86f550e3-3dee-4fea-84e9-30df174f27ea.jpg",
"live": 1,
"status": "new",
"title": "Test 1 Title"
},
{
"description": "Test 2 Description",
"id": "e17dbb45-63da-4567-941c-bb7e31476f6a",
"image": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/e17dbb45-63da-4567-941c-bb7e31476f6a.jpg",
"live": 1,
"status": "new",
"title": "Test 2 Title"
},
{
"description": "Test 3 Description",
"id": "14ad228f-0939-4ed4-aa7b-66ceef862301",
"image": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/14ad228f-0939-4ed4-aa7b-66ceef862301.jpg",
"live": 1,
"status": "new",
"title": "Test 3 Title"
}
],
"userId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
}
I am trying to perform a scan / query which will look over ALL users (every row) and look at ALL items and return ALL instances where description is LIKE "Test 2".
I have tried variations of scans as per the below:
{
"TableName": "my-table",
"ConsistentRead": false,
"ExpressionAttributeNames": {
"#items": "items"
},
"FilterExpression": "contains (#items, :itemVal)",
"ExpressionAttributeValues": {
":itemVal":
{
"M": {
"description": {
"S": "Test 2 Description"
},
"id": {
"S": "e17dbb45-63da-4567-941c-bb7e31476f6a"
},
"image": {
"S": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/e17dbb45-63da-4567-941c-bb7e31476f6a.jpg"
},
"live": {
"N": "1"
},
"status": {
"S": "new"
},
"title": {
"S": "Test 2 Title"
}
}
}
}
}
The above scan works, but as you can see I am passing in the whole object as an ExpressionAttributeValues entry. What I want to do is pass in just the description, for example something like the below (which doesn't work and returns no items):
{
"TableName": "my-table",
"ConsistentRead": false,
"ExpressionAttributeNames": {
"#items": "items.description"
},
"FilterExpression": "contains (#items, :itemVal)",
"ExpressionAttributeValues": {
":itemVal":
{
"S": "Test 2"
}
}
}
Alternatively, would it be better to create a separate table where all the items are added and they are linked via the userId? I was always under the impression there should be one table per application but in this instance I think if I had all the item data at the top level, scanning it would be a lot safer and faster.
With nearly 200 views since posting and no responses, I have come up with an alternative approach. It does not directly solve the initial problem (I honestly do not think it can be solved).
Firstly, I do not want two tables, as this seems overkill, and I do not want the AWS costs associated with two tables.
This has led me to restructure the primary keys with prefixes, which I can search over using the "BEGINS_WITH" DynamoDB selector.
Users will be added as U_{USER_ID} and items will be added as I_{USER_ID}_{ITEM_ID}, this way I only have one table to manage and pay for and this allows me to run BEGINS_WITH "U_" to get a list of users or "I_" to get a list of items.
I will then flatten the item data as strings so I can run "contains" searches on any of the item data. This also allows me to run a "contains {USER_ID}" search on the primary keys for items so I can get a list of items for a particular user.
Hope this helps anyone who might come up against the same issue.
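The key scheme and flattened "contains" search described above could be sketched roughly as follows in Python. This only builds the prefixed keys and the low-level Scan parameters as a plain dictionary; it does not call AWS. The attribute names pk and description, and all function names, are assumptions made for the example.

```python
def user_key(user_id):
    """Primary key for a user row, e.g. U_123."""
    return f"U_{user_id}"


def item_key(user_id, item_id):
    """Primary key for an item row, e.g. I_123_abc."""
    return f"I_{user_id}_{item_id}"


def build_item_scan(table_name, needle, key_attr="pk", text_attr="description"):
    """Build Scan parameters that keep item rows whose text contains needle.

    key_attr and text_attr are hypothetical attribute names.
    """
    return {
        "TableName": table_name,
        "FilterExpression": "begins_with(#pk, :prefix) AND contains(#text, :needle)",
        "ExpressionAttributeNames": {"#pk": key_attr, "#text": text_attr},
        "ExpressionAttributeValues": {
            ":prefix": {"S": "I_"},     # only item rows
            ":needle": {"S": needle},   # substring match on the flat string
        },
    }


print(item_key("42", "e17dbb45"))
print(build_item_scan("my-table", "Test 2"))
```

Note that a filter expression is applied after the rows are read, so a scan like this still reads the whole table; the prefixes narrow what is returned, not what is scanned.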

In Loopback.js, how to query on embedded models?

I'm using the MongoDB connector and have Considerate and Discussion models set up like this:
model-config.json:
{
"Considerate": {"dataSource": "db"},
"Discussion": {"dataSource": "transient"}
}
considerate.json:
{
"name": "Considerate",
"base": "PersistedModel",
"relations": {
"discussion": {"type": "embedsOne", "model": "Discussion"}
}
}
discussion.json:
{
"name": "Discussion",
"base": "Model",
"properties": {
"name": {"type": "string"}
},
"relations": {
"considerate": {"type": "belongsTo", "model": "Considerate"}
}
}
How can I query for Considerates based on Discussion's properties? For example, something like this:
Considerate.find({where: {'discussion.name': 'snow white'}})
Inspecting the data persisted in Mongo, I see that each Considerate document has a _discussion property. Consequently, Considerate.find({where: {'_discussion.name': 'snow white'}}) works. However, this is undocumented, and I'm wondering if there is a documented/reliable way to do this.
You can use filters in the REST API wherever you want data from your relations, e.g. [include][relationName].
I think you can also pass data to the fields here:
https://docs.strongloop.com/display/public/LB/Where+filter
This might help you too:
https://github.com/strongloop/loopback/issues/517

How to form Hypermedia for Groups and Lists

I am designing a REST API and, for this particular use case, trying to figure out what the hypermedia should look like.
The scenario is that the caller calls /persons?fields=lastName;filter=beginsWith=b because he wants back a list of people whose last names begin with "b".
Below is the JSON response that I'm trying to shape, including the hypermedia to associate with it.
You can see a list of persons, and you only see the name property because it's a partial representation of each person.
Then I try to add HATEOAS here, but I'm not sure what would be most useful to add or what it should reference.
I figured I could provide an href to the group (list) itself that I'm returning here. If that's the case, where? I don't think it makes sense to put it in a root object like I'm doing, because the expectation is to return a list of people, not also some root meta object, so what I have below doesn't feel right to me.
OR
Does anyone think a link to the list isn't useful or isn't really embracing HATEOAS in this particular instance, and that I should instead provide some other type of href links here?
JSON - List of person objects (representations) returned
[
"meta": {
"rel": "self",
"href": "http://ourdomain.api/persons?fields=lastName;filter=beginsWith=b"
},
{
"name": {
"last": "best"
}
},
{
"name": {
"last": "bettler"
}
},
{
"name": {
"last": "brown"
}
}
]
Well, it is really your choice here, but I think that if you exclude a representation from navigability (i.e. by not providing links to it), then it isn't HATEOAS anymore (just my opinion).
You could do something along the lines of the JSON you provided in the question, but clearly separate the result of the call from the meta-information:
{
"meta": {
"rel": "self",
"href": "http://ourdomain.api/persons?fields=lastName;filter=beginsWith=b"
},
"resource" : [
{
"name": {
"last": "best"
}
},
{
"name": {
"last": "bettler"
}
},
{
"name": {
"last": "brown"
}
}
]
}
which is not so uncommon. Another approach would be to provide links to the list in every resource you return, with a relation like parent or source (maybe source makes more sense as it is a filtered query), e.g.
[
{
"name": {
"last": "best"
},
"meta": {
"rel": "source",
"href": "http://ourdomain.api/persons?fields=lastName;filter=beginsWith=b"
}
},
{ ..and so on.. }
]
After all, the list is just a container for the results, so it would feel cleaner (at least to me) not to have any hypermedia in it.
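The two response shapes discussed in this answer can be sketched in Python as follows. The function names are hypothetical; the keys meta, rel, href and resource follow the JSON examples above.

```python
import json


def wrap_with_meta(resources, self_href):
    """Envelope style: one self link for the list, results under 'resource'."""
    return {
        "meta": {"rel": "self", "href": self_href},
        "resource": resources,
    }


def link_each_to_source(resources, source_href):
    """Per-item style: a bare list where each item links back to the query."""
    return [
        {**resource, "meta": {"rel": "source", "href": source_href}}
        for resource in resources
    ]


href = "http://ourdomain.api/persons?fields=lastName;filter=beginsWith=b"
people = [{"name": {"last": "best"}}, {"name": {"last": "brown"}}]

print(json.dumps(wrap_with_meta(people, href), indent=2))
print(json.dumps(link_each_to_source(people, href), indent=2))
```

The first shape keeps the list itself navigable at the cost of a root object; the second keeps the response a plain array and repeats the link on every item.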

JsonPath expression to filter using regex

We are using a tool which uses the Jayway library for evaluating JSONPath expressions. JavaScript does NOT seem to work with it. How can I use a regular expression in the JSONPath in such a case? For instance, in the example below I would like to filter all books whose title has the word "Sword" in it:
{
"store": {
"book": [
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
},
"expensive": 10
}
The Jayway implementation uses the Ruby regex operator:
$.store.book[?(@.title =~ /^.*Sword.*$/)]
To ignore case:
$.store.book[?(@.title =~ /^.*sword.*$/i)]
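To make the intent of these filters concrete, here is a rough Python equivalent that applies the same regular expressions to an already parsed document. It does not use Jayway or any JSONPath library; books_matching is a hypothetical helper and the sample data is a shortened version of the store document above.

```python
import re


def books_matching(document, pattern, ignore_case=False):
    """Return the books under store.book whose title matches pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [
        book
        for book in document["store"]["book"]
        if regex.search(book.get("title", ""))
    ]


sample = {
    "store": {
        "book": [
            {"title": "Sayings of the Century"},
            {"title": "Sword of Honour"},
            {"title": "Moby Dick"},
        ]
    }
}

print(books_matching(sample, r"^.*Sword.*$"))
print(books_matching(sample, r"^.*sword.*$", ignore_case=True))
```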
For the record, a workaround for conditional regex in Goessner's JavaScript JSONPath would be to write the query as follows:
$.store.book[?(/^.*sword.*$/i.test(@.title))]
Please see https://github.com/jpaquit/jsonpath/tree/0.8.5-+-regexp for the "=~" syntax in the JS lib.
You could use a capturing group or a lookbehind assertion.
"title":\s*"([^"]*\bSword\b[^"]*)"
Add case-insensitive modifier i if necessary. Grab the title string from group index 1.
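As a sketch of the capturing-group approach, the pattern above can be applied to the raw JSON text in Python like this. The function name and the sample string are made up for the example; the regular expression is the one given in this answer, with the case-insensitive flag added.

```python
import re

# Pattern from the answer above; group 1 holds the full title string.
TITLE_WITH_SWORD = re.compile(r'"title":\s*"([^"]*\bSword\b[^"]*)"', re.IGNORECASE)


def titles_with_sword(raw_json):
    """Return every title in the raw text that contains the word 'Sword'."""
    return TITLE_WITH_SWORD.findall(raw_json)


raw = (
    '{"author": "Evelyn Waugh", "title": "Sword of Honour"}, '
    '{"author": "Herman Melville", "title": "Moby Dick"}'
)
print(titles_with_sword(raw))
```

Because [^"]* cannot cross a double quote, the match stays inside a single title value and does not spill into neighbouring fields.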