Tokenizing Product Models - regex

I'm looking to match some product info, return structured data, and rewrite or look up the matched values.
Example input:
"I have a 1999 Cat (D-6) and an Ingersoll Rand Model Z for sale"
From which I want to create something like:
[ { year:1999, brand:"CATERPILLAR", model:"D6" },
{ year:null, brand:"INGERSOLL-RAND", model:"MODEL Z" } ]
Based on known data:
/\d{4}/, YEAR
...
/cat(erpill[ae]r)?/i, BRAND, "CATERPILLAR"
...
/d[\-\s]6/i, MODEL, "D6"
Can this be done with regex alone, or do I need a lexer?
I can figure out the regexes without any problem, but I'm confused about the rewriting part and about grouping the matches together.

I think you want to extract car trading details.
For this you need NLP. You can use Stanford CoreNLP to design your own NLP regex (TokensRegex), or you can train a model on a dataset.
Alternatively, Stanford NER is a pretrained model that gives you entities such as date and time, organization, location, person, percentage, and price.
Other related tools: Apache OpenNLP, AYLIEN.
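If the vocabulary of brands and models is known in advance, as in the question, a regex-only approach may be enough before reaching for NLP. Below is a minimal Python sketch under stated assumptions: the rule table is hypothetical (the Ingersoll Rand and Model Z patterns are made up, since the question elides them), and the grouping rule, where a BRAND opens a record, a YEAR attaches to the next BRAND, and a MODEL attaches to the most recent BRAND, is an assumption that happens to fit the example sentence.

```python
import re

# Hypothetical rule table in the spirit of the question: (pattern, token type, canonical value).
# None as the canonical value means "keep the matched text".
# The Ingersoll Rand and Model Z rules are assumptions; the question elides them.
RULES = [
    (r"\b\d{4}\b", "YEAR", None),
    (r"\bcat(?:erpill[ae]r)?\b", "BRAND", "CATERPILLAR"),
    (r"\bingersoll[\s-]rand\b", "BRAND", "INGERSOLL-RAND"),
    (r"\bd[-\s]?6\b", "MODEL", "D6"),
    (r"\bmodel\s+z\b", "MODEL", "MODEL Z"),
]

# One alternation of named groups, so a single left-to-right pass yields
# tokens in text order; the group name tells us which rule fired.
MASTER = re.compile(
    "|".join(f"(?P<R{i}>{pattern})" for i, (pattern, _, _) in enumerate(RULES)),
    re.IGNORECASE,
)


def tokenize(text):
    """Yield (token type, rewritten value) pairs in text order."""
    for match in MASTER.finditer(text):
        _, kind, canonical = RULES[int(match.lastgroup[1:])]
        yield kind, canonical if canonical is not None else match.group()


def group(tokens):
    """Assumed grouping rule: a BRAND opens a record, a YEAR attaches to the
    next BRAND, and a MODEL attaches to the most recent BRAND."""
    records, pending_year = [], None
    for kind, value in tokens:
        if kind == "YEAR":
            pending_year = int(value)
        elif kind == "BRAND":
            records.append({"year": pending_year, "brand": value, "model": None})
            pending_year = None
        elif kind == "MODEL" and records:
            records[-1]["model"] = value
    return records


text = "I have a 1999 Cat (D-6) and an Ingersoll Rand Model Z for sale"
print(group(tokenize(text)))
```

Running it on the example sentence prints the two records from the question. The rewriting lives in the rule table and the grouping in a small pass over the token stream, so no separate lexer is needed; free-form text with unknown brands is where an NLP tool starts to pay off.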

Related

How to filter queryset by string field containing json (Django + SQLite)

I have the following situation.
The Flight model (flights) has a field named 'airlines_codes' (TextField) in which I store data in a JSON-array-like format:
["TB", "IR", "EP", "XX"]
I need to filter the flights by 2-letter airline code (IATA format), for example 'XX', and I achieve this primitively but successfully like this:
filtered_flights = Flight.objects.filter(airlines_codes__icontains='XX')
This works, but not in every case.
I have flights where airlines_codes look like this:
["TBZ", "IR", "EP", "XXY"]
Here there are 3-letter codes (ICAO format) and obviously the query filter above will not work.
P.S. I cannot move to PostgreSQL, and I cannot alter the database in any way. This has to be achieved with a query alone.
Thanks for any ideas.
Without altering the database, you have to filter the value as a string. Your best bet might be airlines_codes__contains with the surrounding double quotes included in the search term, so that "XX" cannot match inside "XXY". Here's what I would recommend, assuming your list is always stored exactly as you show it:
Flight.objects.filter(airlines_codes__contains='"XX"')
As of Django 3.1, JSONField is supported on a wider range of database backends, including SQLite (with the JSON1 extension). For anyone building a similar system from scratch, that field would be the preferable approach.
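To see why the surrounding quotes matter, here is a small sketch using Python's standard sqlite3 module rather than the Django ORM. On SQLite, Django's contains lookup is translated into a LIKE '%...%' comparison (which SQLite treats case-insensitively for ASCII), so the raw SQL below approximates what the ORM sends. The table name and rows are hypothetical stand-ins for the Flight model.

```python
import sqlite3

# In-memory table standing in for the Flight model's TextField (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER PRIMARY KEY, airlines_codes TEXT)")
conn.executemany(
    "INSERT INTO flights (airlines_codes) VALUES (?)",
    [('["TB", "IR", "EP", "XX"]',), ('["TBZ", "IR", "EP", "XXY"]',)],
)


def ids_matching(pattern):
    """Return ids whose stored JSON text matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT id FROM flights WHERE airlines_codes LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


loose = ids_matching("%XX%")    # also hits "XXY" in the second row
exact = ids_matching('%"XX"%')  # the quotes pin the match to a whole list element
print(loose, exact)
```

The quoted pattern only works as long as every code is serialized with double quotes, which is the assumption stated in the answer.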

django query with filtered annotations from related table

Take the books and authors models as an example, where each book has one or more authors. Books have a cover_type, and authors have a country of origin (origin).
How can I list all the books with a hard cover, and their authors only if they're from France?
Books.objects.filter(cover_type='hard', authors__origin='france')
This query doesn't retrieve books with a hard cover but no French author.
I want all the books with a hard cover; this is predicate #1.
And if their authors are from France, I want them annotated; otherwise the authors field may be empty or 'None'.
e.g.:
`
Bookname, covertype, origin
The Trial, hardcover, none
Madame Bovary, hardcover, France
`
I tried many options (annotate, Q, Value, Subquery, When, Case, Exists) but couldn't come up with a solution.
With SQL this is easy:
select * from books b left join authors a on a.bookref=b.id and a.origin='france' where b.covertype='hard'
(My models are not books and authors; I picked them because they're the Django docs' example models. My models are Building and BuildingType: I want building.id=454523 together with its BuildingType where that BuildingType is active. A building might have no BuildingType, or exactly one active and zero or more passive ones.)
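The difference between the ORM filter and the SQL in the question comes down to where the origin condition sits. The sketch below uses Python's sqlite3 module with hypothetical tables and rows mirroring the question's SQL: putting the condition in the ON clause keeps every hard-cover book, while putting it in WHERE (effectively what filtering on authors__origin does) drops books without a French author.

```python
import sqlite3

# Hypothetical tables mirroring the question's SQL (books.covertype, authors.bookref).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, covertype TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, bookref INTEGER, origin TEXT);
    INSERT INTO books VALUES (1, 'The Trial', 'hard'), (2, 'Madame Bovary', 'hard');
    INSERT INTO authors VALUES (1, 1, 'other'), (2, 2, 'france');
""")

# Condition in the ON clause: every hard-cover book survives, and the author
# columns are NULL when the book has no French author.
on_clause = conn.execute("""
    SELECT b.name, a.origin FROM books b
    LEFT JOIN authors a ON a.bookref = b.id AND a.origin = 'france'
    WHERE b.covertype = 'hard' ORDER BY b.id
""").fetchall()

# Same condition in the WHERE clause: rows with a NULL origin are filtered out,
# so books without a French author disappear.
where_clause = conn.execute("""
    SELECT b.name, a.origin FROM books b
    LEFT JOIN authors a ON a.bookref = b.id
    WHERE b.covertype = 'hard' AND a.origin = 'france' ORDER BY b.id
""").fetchall()

print(on_clause)
print(where_clause)
```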
You should store the Book id in the Author table. Then your query will look like this: Author.objects.filter(origin="france", book__cover_type="hard")
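I think I solved it with Subquery, OuterRef, Exists, Case, When and CharField... too many imports for such a simple SQL query.
`
from django.db.models import Case, CharField, Exists, OuterRef, Subquery, Value, When
# [:1] keeps the subquery to one row per book; Value() stops 'none' from being read as a field name.
author = Authors.objects.filter(bookref=OuterRef('id'), origin='France').values('origin')[:1]
books = Books.objects.filter(cover_type='hard').annotate(author=Case(When(Exists(author), then=Subquery(author)), default=Value('none'), output_field=CharField())).distinct().values('name', 'cover_type', 'author')
`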

Time Series forecasting with DeepAR for multiple independent products

I want to forecast some data (say, the temperature of several countries). Is there any way to feed multiple countries' temperatures into DeepAR (a built-in algorithm in Amazon SageMaker) at once and have DeepAR forecast them independently? Is it also possible to remove a particular country's data and add another after a few days?
I am new to forecasting and wanted to try DeepAR. If anyone has already worked on this, please give me some guidelines on how to do it with DeepAR.
Link - https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html
This is a late reply to this post, but my reply could be helpful in the future to others. The answer to your first question is yes.
The page you linked to references the cat field, which lets you encode a vector representing different record groups. In your case, the cat field can just be a single value, but it can also encode more complex relationships with more dimensions in the vector.
Say you have 3 countries you want to make predictions for, and some time-series temperature training data for each. You would enter each country as one row (one JSON object per line) in the training file, like this:
Country 1:
{"start": "2019-02-09 00:00:00", "target": [T1,T2,T3,T4,...], "cat": [0]}
Country 2:
{"start": "2019-02-09 00:00:00", "target": [T1,T2,T3,T4,...], "cat": [1]}
Country 3:
{"start": "2019-02-09 00:00:00", "target": [T1,T2,T3,T4,...], "cat": [2]}
The cat field tells DeepAR that these series belong to different groups, in other words, different countries.
The frequency (time between temperature measurements) has to be the same for all series; however, the start time and the number of training points do not.
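Generating that training file from per-country data can be sketched in Python with the standard json module. The series values and country keys below are hypothetical; the start strings follow the "YYYY-MM-DD HH:MM:SS" format given in the DeepAR documentation, and each country gets a stable integer cat that you would reuse at prediction time.

```python
import json

# Hypothetical daily temperature series; lengths and start dates may differ,
# but all series must share one frequency (set via DeepAR's time_freq hyperparameter).
series = {
    "country_a": {"start": "2019-02-09 00:00:00", "target": [3.1, 2.8, 4.0, 5.2]},
    "country_b": {"start": "2019-02-09 00:00:00", "target": [21.4, 22.0, 21.7]},
    "country_c": {"start": "2019-03-01 00:00:00", "target": [15.0, 14.2, 13.9, 14.8]},
}


def to_json_lines(series):
    """One JSON object per line, each country tagged with its own integer cat."""
    lines = []
    for cat, name in enumerate(sorted(series)):
        record = dict(series[name], cat=[cat])
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


train_text = to_json_lines(series)
print(train_text)
```

Sorting the keys keeps the country-to-cat mapping deterministic, so the same country gets the same cat value every time the file is rebuilt.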
Once you have trained the model and deployed it to an endpoint, you can make a prediction for a country by passing that country's recent history as context, along with the same cat value you used for it in training.
This allows you to make a single model that will allow you to make predictions from many independent groups of data.
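A request body for one country can be sketched as below, following the JSON inference format described in the DeepAR documentation (an "instances" list plus an optional "configuration" object); check it against the current docs before relying on it. The helper name and values are hypothetical, and actually sending the payload (for example with boto3's invoke_endpoint) is left out.

```python
import json


def build_request(start, context, cat, num_samples=50, quantiles=("0.1", "0.5", "0.9")):
    """Build a DeepAR-style JSON inference body for a single country's series."""
    body = {
        # The context series plus the same cat value used for this country in training.
        "instances": [{"start": start, "target": list(context), "cat": [cat]}],
        "configuration": {
            "num_samples": num_samples,
            "output_types": ["mean", "quantiles"],
            "quantiles": list(quantiles),
        },
    }
    return json.dumps(body)


payload = build_request("2019-02-09 00:00:00", [21.4, 22.0, 21.7], cat=1)
print(payload)
```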
I'm not sure exactly what you mean by the second question. If you mean adding training data for another country later on, you would need to create a new training dataset with an additional category for that country and then retrain the model.

Django Postgres - dynamic SearchQuery object creation

I have an app that lets the user search a database of roughly 100,000 documents for keywords or sentences.
I am using Django 1.11 and the Postgres full-text search features described in the documentation.
However, I am running into the following problem and I was wondering if someone knows a solution:
I want to create a SearchQuery object for each word in the user's query, like so:
query typed in by the user in the input field: ['term1' , 'term2', 'term3']
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector, TrigramSimilarity
query = SearchQuery('term1') | SearchQuery('term2') | SearchQuery('term3')
vector = SearchVector('text')
Document.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank').annotate(similarity=TrigramSimilarity(vector, query)).filter(similarity__gt=0.3).order_by('-similarity')
The problem is that my example uses 3 terms, but I want that number to be dynamic. A user could also supply 1 or 10 terms, and I do not know how to build the query assignment for an arbitrary number of them.
I briefly thought about having the program write something like this to an empty file:
for query in terms:
    file.write(" | SearchQuery('%s')" % query)
But having a python program writing python code seems like a very convoluted solution. Does anyone know a better way to achieve this?
I've never used it, but to build a dynamic query you can just loop and combine the objects with |:
compound_statement = SearchQuery(list_of_words[0])
for term in list_of_words[1:]:
    compound_statement = compound_statement | SearchQuery(term)
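But the documentation tells us that:
By default, all the words the user provides are passed through the stemming algorithms, and then it looks for matches for all of the resulting terms.
Are you sure you need this? A single SearchQuery built from the whole input already matches all of the terms (note that this combines them with AND, whereas | gives OR).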
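The loop in the answer is a left fold over |, so functools.reduce with operator.or_ expresses the same thing and makes the empty-input case explicit. To keep the sketch runnable without PostgreSQL or Django, FakeQuery is a hypothetical stand-in that overloads | the way SearchQuery does; in a real view you would pass SearchQuery as query_cls.

```python
import operator
from functools import reduce


class FakeQuery:
    """Hypothetical stand-in for SearchQuery: records how terms were combined with |."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        return FakeQuery(f"({self.text} | {other.text})")

    def __repr__(self):
        return f"FakeQuery({self.text!r})"


def combine_or(terms, query_cls=FakeQuery):
    """OR together one query object per term; raises ValueError on an empty list."""
    if not terms:
        raise ValueError("need at least one search term")
    return reduce(operator.or_, (query_cls(term) for term in terms))


print(combine_or(["term1", "term2", "term3"]))
```

With Django this would read query = combine_or(terms, SearchQuery), and it works for 1 term or 10 without generating any code.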

Regex input validation: limit to a maximum number of commas

I am running jQuery Autocomplete with a Laravel form field.
It grabs data from my database.
Specialty Area Examples: Real Estate, Mortgage Lenders, Renovation, Buyer's Agent, Listing Agent, Relocation, Short-Sale, Consulting, Local Experts, Refinancing, Architecture, Home Building, Carpentry, Electrical, Engineering, Interior Design, Landscaping, Painting, Plumbing, Appraisal, Commercial Property, Insurance, Legal, Conveyancing
Users can type in one of the examples and the autocomplete will complete the rest in the field.
I want to limit the user to being allowed to input a maximum of 4 Specialty Area Examples into the form field. So a user can type in for example:
Real Estate, Short-Sale, Consulting, Local Experts
After that the user should not be allowed to input more data. So the maximum number of commas I need to set in the form field is 3.
Try this:
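$("#txtBox").keypress(function (e) {
    // Value the field would have after this keystroke (assumes typing at the end).
    var input = $(this).val() + String.fromCharCode(e.which);
    // More than 4 comma-separated items means a 4th comma: block the keystroke.
    if (input.split(',').length > 4) {
        e.preventDefault();
    }
});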
Demo: http://jsfiddle.net/y6eQF/
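This regex should do what you want: ^([a-zA-Z0-9\-\_\ \'\"]+\,){0,3}[a-zA-Z0-9\-\_\ \'\"]+$ (the anchors and the {0,3} quantifier allow one to four items; with {3} and no anchors, the pattern would require exactly four items and could still match inside a longer string).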
You can also do it with split(), as Vinod mentioned. In PHP you can use explode() for the same check (split() was removed in PHP 7).
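A keypress handler can be bypassed (for example by pasting), so the same limit is worth enforcing on the server. The form here is Laravel/PHP, so treat this as a language-neutral sketch in Python of both checks: the anchored regex, reusing the character class from the regex answer, and the split-based count. The function names are hypothetical.

```python
import re

# Allowed characters per item, taken from the regex answer; 1 to 4 items separated by commas.
ITEM = r"[a-zA-Z0-9\-_ '\"]+"
MAX_FOUR = re.compile(rf"(?:{ITEM},){{0,3}}{ITEM}")


def valid_with_regex(value):
    """fullmatch anchors the pattern to the whole string, like ^...$."""
    return MAX_FOUR.fullmatch(value) is not None


def valid_with_split(value, limit=4):
    """Split on commas, reject empty items, and cap the item count."""
    items = [part.strip() for part in value.split(",")]
    return 0 < len(items) <= limit and all(items)


print(valid_with_regex("Real Estate, Short-Sale, Consulting, Local Experts"))
print(valid_with_split("Real Estate, Short-Sale, Consulting, Local Experts, Legal"))
```

The split version is easier to read and to change (the limit is a parameter), while the regex version also restricts which characters may appear in each item.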