defrecord holding an incrementing `vector` / `java class` - clojure

Clojurians:
Thank you for your attention to this question!
Here is the case I'm thinking about: I want to define an immutable bank account record
(defrecord account [name balance statements])
(def cash-account (->account :cash 0.0 []))
I have a function that deposits money into that account, and a new account record should be returned:
(deposit cash-account 100.0)
;; returns a new cash-account with attributes
;; name = :cash, balance = 100.0, statements = [[(2018,1,1) 100.0]]
As more and more deposits and withdrawals happen, the statements field keeps expanding with more and more transactions inside.
My question is:
After 1000 transactions, there are 1000 elements in the statements field of the latest account returned.
When the 1001st transaction happens:
will Clojure *copy* the 1000 transactions in the statements field of the old account record, append the new transaction, and save them into a new account record?
or will Clojure just *append* the new transaction and hand back a new reference, so that it merely looks like a new account record, the way a persistent map does?
Appreciate your help & many thanks

From https://clojure.org/reference/datatypes#_deftype_and_defrecord :
defrecord provides a complete implementation of a persistent map
deftype supports mutable fields, defrecord does not
So, in your case, it will not copy the 1000 transactions. The statements vector is a persistent data structure: appending with conj returns a new vector that shares almost all of its structure with the old one (only a handful of small tree nodes on the path to the new element are copied), so it merely looks like the transaction was appended to a brand-new vector, and the old account record is untouched.
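Clojure's real vectors are 32-way trees, so an append copies at most a few small nodes, never the whole collection. As a purely conceptual illustration of structural sharing (sketched in Python, not how Clojure actually implements vectors), consider a cons-style list:

# Each "statements" value is an immutable pair: (newest_transaction, rest_of_list).
stmts = None
for i in range(1000):
    stmts = (("2018-01-01", float(i)), stmts)   # 1000 existing transactions

# Transaction 1001 allocates one small tuple; the existing 1000 transactions
# are referenced by the new value, not copied.
new_stmts = (("2018-01-02", 100.0), stmts)
assert new_stmts[1] is stmts                    # same object: shared, not copied

The old value stays fully usable, exactly like the old account record in your example.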

Here are some more docs you should also check:
https://www.braveclojure.com/functional-programming/
https://clojure.org/guides/learn/sequential_colls
https://purelyfunctional.tv/guide/clojure-collections/
https://youtu.be/lJr6ot8jGQE

Related

DynamoDB Modeling. Fetch Items once for each user

I need to design a table (I'm newbie in DynamoDB) with these entities:
"Question".
ID (int)
Order (int)
IsPublished (bool)
Data (some json)
"User"
ID (int)
Data (some json)
"QuestionResult"
UserID (int)
QuestionID (int)
Result (some json)
The QuestionResult table is populated after a user completes a question. It's going to grow.
The main query pattern is:
I need to fetch, e.g., 10 questions from the "Question" table, ordered by Order and with IsPublished = true. And most importantly: only those "Question" items the given User has never answered before. In other words, fetch from Question excluding any QuestionID that already exists in QuestionResult for that particular UserID.
Every time the client asks for a new set of questions, they should be new to that user.
Any ideas how to organize such a DB efficiently, taking into account that the QuestionResult table will grow rapidly with the number of users and answered questions?

DynamoDB one-table design - scan and filter with limit approach?

So I'm following the single-table design, and the PK values have the format below:
P#product1
P#product2
P#product3
W#warehouse1
W#warehouse2
P#product4
....
For the query pattern "get all products", I need to run a Scan to get all records where the PK begins_with "P#", and I'm not sure this is the ideal approach.
I understand Scan is resource-consuming (and I would love not to have to rely on it).
Not to mention that if I want to add a limit and pagination, the scenario becomes even more cumbersome, as the limit is applied before the filter. E.g. the first scan page with a limit of 10 may return only 3 products, the next one may return only 2, etc.
Is there a more straightforward approach? If there are, say, 87 products among 1000 records, I was hoping to still be able to get roughly 9 pages of 10 products each.
I've come across other forum topics and found this solution: we can use a DynamoDB Global Secondary Index.
Basically:
We'll set up an attribute, say entitytype (values can be product, warehouse, ...)
And create a Global Secondary Index with:
GSI PK: set to that entitytype
GSI SK: set to the original PK
We'll end up with the below in this GSI (GSI PK, GSI SK):
product      P#product1
product      P#product2
warehouse    W#warehouse1
We can then run a Query against this GSI with entitytype = product, as sketched below.
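A minimal boto3 sketch of that Query (the table name, GSI name, and page size here are assumptions for illustration, not from the original post):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("app-table")   # assumed table name

def get_products_page(limit=10, start_key=None):
    """Query the GSI for items whose entitytype is 'product', one page at a time."""
    kwargs = {
        "IndexName": "entitytype-index",                        # assumed GSI name
        "KeyConditionExpression": Key("entitytype").eq("product"),
        "Limit": limit,
    }
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key                 # pagination token from previous page
    resp = table.query(**kwargs)
    return resp["Items"], resp.get("LastEvaluatedKey")

Because every item in that index partition already matches the key condition, Limit now counts actual products, so each page really contains up to 10 products rather than 10 scanned records.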

DynamoDB query using DynamoDBMapper

Say if I had a DynamoDB table:
UserId (hash): S
BookName (range): S
BorrowedTime: S
ReturnedTime: S
UserId is the partition key (hash), and I needed to set BookName as the sort key (range) because an item added with the same UserId was overwriting the previous one.
How would I go about creating a query using DynamoDBMapper when the fields being queried are the time fields (which are non-key attributes)? For instance, say I wanted to return the UserId and BookName of any book borrowed over two weeks ago that hasn't been returned yet.
Do I need to set up a GSI on both the BorrowedTime and ReturnedTime fields?
Yes, you can make a GSI using BorrowedTime and ReturnedTime, or you can use a Scan instead of a Query. If you use Scan you don't need to make a GSI, but a Scan reads the whole table, so it is not really recommended for a large table or for frequent use.
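As a rough sketch of the Scan option (written with boto3 rather than DynamoDBMapper, and assuming the timestamps are stored as ISO-8601 strings; the table name is made up):

from datetime import datetime, timedelta, timezone
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("BorrowedBooks")   # assumed table name

# Books borrowed more than two weeks ago that have no ReturnedTime yet.
cutoff = (datetime.now(timezone.utc) - timedelta(weeks=2)).isoformat()
resp = table.scan(
    FilterExpression=Attr("BorrowedTime").lt(cutoff) & Attr("ReturnedTime").not_exists(),
    ProjectionExpression="UserId, BookName",
)
overdue = resp["Items"]

Note that the filter runs after the items are read, so the Scan still pays for the whole table; the GSI suggested above lets you Query instead of reading every item.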

Is Python's slicing syntax, used on a model queryset, executed at the database level?

model:
from django.db import models

class person(models.Model):
    name = models.CharField(max_length=100)   # max_length is required by CharField
    ...
If I use
persons = person.objects.order_by('name')[0:25]
in the code, is the slice executed at the database level (converted to SELECT * FROM person ORDER BY name LIMIT 25) or at the "code" level?
This is made very clear in the documentation (and the answer is yes, it is):
Use a subset of Python’s array-slicing syntax to limit your QuerySet to a certain number of results. This is the equivalent of SQL’s LIMIT and OFFSET clauses.
Yes, slicing gets translated to SQL's LIMIT.
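You can check what SQL a sliced QuerySet will run by printing its query attribute (the exact SQL and table name depend on your app and database backend):

qs = person.objects.order_by('name')[0:25]
print(qs.query)    # ... ORDER BY "name" ASC LIMIT 25

qs = person.objects.order_by('name')[25:50]
print(qs.query)    # ... ORDER BY "name" ASC LIMIT 25 OFFSET 25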
I think it depends on when it's executed.
Django's ORM QuerySets are "lazy", in that they don't actually run until they are iterated over. This lets you do things like this:
persons = person.objects.filter(age__gte=25)
persons = persons.filter(age__lte=50)
persons = persons.exclude(age=30)
persons = persons.order_by('name')
persons = persons[:25]
for person in persons:
    print(person.name)
Which translates to: "Get everyone over the age of 25, under the age of 50, excluding anyone who is 30, order them by name, and give me the first 25 records."
Because the QuerySet is lazy, all of that code only creates a single database call, when you actually enter the for loop.
So, yes, technically, the slice translates to a LIMIT, applied when the ORM actually runs the query as the loop starts.
However, what the ORM does behind the scenes is create a Python list of each record the database returns. So, let's say we continue on after the above:
for person in persons:   # the SQL is compiled and run here; the results are cached as a list
    print(person.name)
persons = persons[:10]   # Django just slices the list we already have in memory
It may seem trivial, or an edge case, but it's important to understand what's happening behind the scenes.
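If you want to see this for yourself, you can watch Django's query log (a sketch; connection.queries is only populated when DEBUG = True):

from django.db import connection, reset_queries

reset_queries()
persons = person.objects.order_by('name')[:25]
list(persons)                     # the query runs here
print(len(connection.queries))    # 1

first_ten = persons[:10]          # served from the cached results
print(len(connection.queries))    # still 1 - no new SQL was issued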

Deduplication / matching in CouchDB?

I have documents in CouchDB. The schema looks like this:
userId
email
personal_blog_url
telephone
I assume two users are actually the same person as long as their
email, or
personal_blog_url, or
telephone
is identical.
I have created 3 views, which basically map email / blog_url / telephone to userIds and then combine the userIds into a group under the same key, e.g.:
_view/by_email:
----------------------------------
key values
a_email@gmail.com [123, 345]
b_email@gmail.com [23, 45, 333]
_view/by_blog_url:
----------------------------------
key values
http://myblog.com [23, 45]
http://mysite.com/ss [2, 123, 345]
_view/by_telephone:
----------------------------------
key values
232-932-9088 [2, 123]
000-111-9999 [45, 1234]
999-999-0000 [1]
My questions:
How can I merge the results from the 3 different views into a final user table/view which contains no duplicates?
Or is it good practice to do such deduplication in CouchDB at all?
And what would be a good way to do deduplication in Couch, then?
P.S. In the final view, suppose that for all dupes we only keep the smallest userId.
Thanks.
Good question. Perhaps you could listen to _changes and, for each changed document, look up the fields you want to be unique in the views you suggested (by_*).
Merge the views into one (emit different fields in one map):
function (doc) {
  // Skip documents that are missing any of the three identifying fields.
  if (!doc.email || !doc.personal_blog_url || !doc.telephone) return;
  // Tag each key with a type marker so one view can serve all three lookups.
  emit([1, doc.email], [doc._id]);
  emit([2, doc.personal_blog_url], [doc._id]);
  emit([3, doc.telephone], [doc._id]);
}
Merge the lists of ids in the reduce function.
When a new doc arrives in the changes feed, you can query the view with keys=[[1, email], [2, personal_blog_url], ...] and merge the three id lists. If the smallest id found is smaller than the changed doc's id, update the changed doc's realId field; otherwise update the documents in the list with the changed doc's id.
I suggest using a separate document to store the { userId, realId } relation.
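A rough sketch of that lookup over HTTP with Python's requests (the database name, design document, and view name are assumptions):

import json
import requests

VIEW = "http://localhost:5984/users/_design/dedupe/_view/identity"  # assumed names

def smallest_matching_id(doc):
    """Query the merged view for each identity field of a changed doc and pick the minimal id."""
    keys = [[1, doc.get("email")],
            [2, doc.get("personal_blog_url")],
            [3, doc.get("telephone")]]
    # If the view also defines a reduce, add reduce=false to get the raw map rows.
    resp = requests.get(VIEW, params={"keys": json.dumps(keys)})
    ids = {i for row in resp.json()["rows"] for i in row["value"]}
    return min(ids) if ids else None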
You can't create new documents by just using a view. You'd need a task of some sort to do the actual merging.
Here's one idea.
Instead of creating 3 views, you could create one view (that indexes the data if it exists):
Key Values
--- ------
[userId, 'phone'] 777-555-1212
[userId, 'email'] username@example.com
[userId, 'url'] favorite.url.example.com
I wouldn't store anything else except the raw value, as you'd end up with lots of unnecessary duplication of data (if you stored the full object for example).
Then, to query, you could do something like:
...startkey=[userId]&endkey=[userId,{}]
That would give you all of the identity information as a series of rows for that userId. You'd still need to parse the rows to see whether there are duplicates, but this way the results are nicely merged into a single CouchDB call.
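For example, with Python's requests (database, design document, and view names assumed):

import json
import requests

VIEW = "http://localhost:5984/users/_design/dedupe/_view/by_user"  # assumed names

def identity_rows(user_id):
    """Fetch all [userId, 'phone' | 'email' | 'url'] rows for one user in a single call."""
    params = {
        "startkey": json.dumps([user_id]),
        "endkey": json.dumps([user_id, {}]),   # {} collates after any string in CouchDB key order
    }
    return requests.get(VIEW, params=params).json()["rows"]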
Here's a nice example of using arrays as keys on StackOverflow.
You'd still probably load the original "user" document if it had other data that wasn't part of the de-duplication process.
Once discovered, you could consider cleaning up the data on the fly and prevent new duplicates from occurring as new data is entered into your application.