My access pattern/query is: get the Name and Email of all friends of a given user (for example, USER#1).
I have added a GSI that inverts PK and SK, which allows me to query by SK.
Sample data is in the table below:
+--------+----------+------+---------------+
| PK     | SK       | Name | Email         |
+--------+----------+------+---------------+
| USER#1 | USER#1   | Bob  | bob@email.com |
| USER#2 | USER#2   | Rob  | rob@email.com |
| USER#3 | USER#3   | Tom  | tom@email.com |
| USER#1 | FRIEND#2 |      |               |
| USER#1 | FRIEND#3 |      |               |
+--------+----------+------+---------------+
My question: is it possible to get the friends of USER#1 in a single query?
Thanks in advance.
Option 1: Denormalization
Store the data in the following format:
+--------+----------+------+---------------+
| PK     | SK       | Name | Email         |
+--------+----------+------+---------------+
| USER#1 | USER#1   | Bob  | bob@email.com |
| USER#2 | USER#2   | Rob  | rob@email.com |
| USER#3 | USER#3   | Tom  | tom@email.com |
| USER#1 | FRIEND#2 | Rob  | rob@email.com |
| USER#1 | FRIEND#3 | Tom  | tom@email.com |
+--------+----------+------+---------------+
Pro: a single query returns each friend's name and email directly
Con: updates become expensive, since a user's name and email are duplicated on every friend item that references them
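With boto3, the Option 1 query could look like the sketch below (the table name "Users" is an assumption; the key attribute names match the sample data):

import boto3
from boto3.dynamodb.conditions import Key

# "Users" is a hypothetical table name; PK/SK match the sample data above.
table = boto3.resource("dynamodb").Table("Users")

# One query is enough: the denormalized FRIEND# items already carry Name and Email.
resp = table.query(
    KeyConditionExpression=Key("PK").eq("USER#1") & Key("SK").begins_with("FRIEND#")
)
friends = [(item["Name"], item["Email"]) for item in resp["Items"]]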
Option 2: BatchGetItem
Store the data as-is:
+--------+----------+------+---------------+
| PK     | SK       | Name | Email         |
+--------+----------+------+---------------+
| USER#1 | USER#1   | Bob  | bob@email.com |
| USER#2 | USER#2   | Rob  | rob@email.com |
| USER#3 | USER#3   | Tom  | tom@email.com |
| USER#1 | FRIEND#2 |      |               |
| USER#1 | FRIEND#3 |      |               |
+--------+----------+------+---------------+
Then use the BatchGetItem API to retrieve the name and email for each friend.
Pro: data is "normalized" (less duplication, less storage)
Con: you need a second round trip, and since BatchGetItem accepts at most 100 keys per call, you have to call it in a loop to retrieve all the friends' details
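A minimal sketch of that two-step flow with boto3 (the table name "Users" and the USER#/FRIEND# conventions are assumptions taken from the sample data):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table name

# Step 1: query the adjacency items to collect the friend ids.
resp = table.query(
    KeyConditionExpression=Key("PK").eq("USER#1") & Key("SK").begins_with("FRIEND#")
)
friend_ids = [item["SK"].replace("FRIEND#", "USER#") for item in resp["Items"]]

# Step 2: batch-get the profile items (where PK == SK == USER#n).
# BatchGetItem accepts at most 100 keys per call, so chunk the id list;
# production code should also retry any UnprocessedKeys in the response.
profiles = []
for i in range(0, len(friend_ids), 100):
    chunk = friend_ids[i:i + 100]
    resp = dynamodb.batch_get_item(
        RequestItems={
            "Users": {
                "Keys": [{"PK": fid, "SK": fid} for fid in chunk],
                # "Name" is a DynamoDB reserved word, so alias it.
                "ProjectionExpression": "#n, Email",
                "ExpressionAttributeNames": {"#n": "Name"},
            }
        }
    )
    profiles.extend(resp["Responses"]["Users"])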
I am new to Power BI. I am trying to implement the following scenario.
I have the following two tables:
Table 1:
| ExtractionId | DatasetId | UserId           |
| ------------ | --------- | ---------------- |
| E_ID1        | D_ID1     | sta@example.com  |
| E_ID2        | D_ID1     | dany@example.com |
| E_ID3        | D_ID2     | dany@example.com |
Table 2:
| DatasetId | Date       | UserId           | Status    |
| --------- | ---------- | ---------------- | --------- |
| D_ID1     | 05/30/2021 | sta@example.com  | Completed |
| D_ID1     | 05/30/2021 | dany@example.com | Completed |
| D_ID1     | 05/31/2021 | sta@example.com  | Partial   |
| D_ID1     | 05/31/2021 | dany@example.com | Completed |
| D_ID2     | 05/30/2021 | sta@example.com  | Completed |
| D_ID2     | 05/30/2021 | dany@example.com | Completed |
| D_ID2     | 05/31/2021 | sta@example.com  | Partial   |
| D_ID2     | 05/31/2021 | dany@example.com | Completed |
I am trying to create a Power BI report where, given an ExtractionId (in a slicer), we need to identify the corresponding DatasetId and UserId from Table 1 and use those fields to filter Table 2, producing a visual of user status over the given date range.
When I try to implement this, I create a many-to-many relationship between the DatasetId columns of Table 1 and Table 2, but I cannot do the same for the UserId columns at the same time, as I get the following error:
You can't create a direct active relationship between Table1 and Table2 because an active set of indirect relationship already exists.
Because of this, given an ExtractionId, I can filter on DatasetId but not on UserId, and vice versa. Could you please help me understand what mistake I am making here and how to resolve it?
Thanks in advance
As the error says, you can only have one active relationship path between two tables. Instead, merge the two columns into a single key: create a combined DatasetId + UserId column in both tables (for example, a calculated column such as Key = [DatasetId] & "|" & [UserId]), then create the relationship on that merged column.
I find it best to use an example, so here we go:
Say I have a table with chores and a table with a weekly schedule like this:
CHORES:
|----+---------------+----------+-------|
| id | name          | type     | hours |
|----+---------------+----------+-------|
| 1  | clean kitchen | cleaning | 4     |
|----+---------------+----------+-------|
| 2  | clean toilet  | cleaning | 3     |
etc
SCHEDULE:
|------+---------------+---------------+-----|
| week | monday        | tuesday       | etc |
|------+---------------+---------------+-----|
| 1    | clean kitchen | clean toilet  | etc |
|------+---------------+---------------+-----|
| 2    | clean toilet  | clean kitchen | etc |
etc
I want to make sure that for one week, you can't have duplicate cells, so this wouldn't be allowed:
SCHEDULE:
|------+---------------+--------------+-----|
| week | monday        | tuesday      | etc |
|------+---------------+--------------+-----|
| 1    | clean toilet  | clean toilet | etc |
etc
What would I have to do in my models.py to get this behaviour?
Try Django's unique_together model Meta option:
https://docs.djangoproject.com/en/1.11/ref/models/options/#unique-together
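A minimal sketch (model and field names are hypothetical). Note that unique_together constrains pairs across rows, so it fits a layout where each chore assignment is its own row rather than the wide monday/tuesday columns above:

from django.db import models

class ScheduleEntry(models.Model):
    week = models.IntegerField()
    chore = models.CharField(max_length=100)

    class Meta:
        # The database rejects a second row with the same (week, chore) pair.
        unique_together = (("week", "chore"),)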
I'd rather use a ManyToManyField with a through table, like this:
SCHEDULE:
|------+------------------------+
| week | chores                 |
|------+------------------------+
| 1    | many to many to chores |
|------+------------------------+
| 2    | many to many to chores |
And a through table like this:
THROUGH TABLE:
|---------+---------------+---------------+
| week_id | day of week   | chores_id     |
|---------+---------------+---------------+
| 1       | Monday        | clean toilet  |
|---------+---------------+---------------+
| 1       | Tuesday       | clean kitchen |
In that through table, make week_id and chores_id unique together, as sketched below.
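A sketch of that design in models.py (model and field names are my own choices):

from django.db import models

class Chore(models.Model):
    name = models.CharField(max_length=100)
    type = models.CharField(max_length=50)
    hours = models.PositiveIntegerField()

class Week(models.Model):
    number = models.PositiveIntegerField(unique=True)
    chores = models.ManyToManyField(Chore, through="Assignment")

class Assignment(models.Model):
    DAYS = [
        ("mon", "Monday"), ("tue", "Tuesday"), ("wed", "Wednesday"),
        ("thu", "Thursday"), ("fri", "Friday"), ("sat", "Saturday"),
        ("sun", "Sunday"),
    ]
    week = models.ForeignKey(Week, on_delete=models.CASCADE)
    chore = models.ForeignKey(Chore, on_delete=models.CASCADE)
    day = models.CharField(max_length=3, choices=DAYS)

    class Meta:
        # A chore may appear at most once per week, regardless of the day.
        unique_together = (("week", "chore"),)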
I am new to Django and SQL queries. I am trying some annotations with Django but am unable to get the results I want.
+---------------------+-----------+---------------------+
| email               | event     | event_date          |
|---------------------+-----------+---------------------|
| hector@example.com  | open      | 2017-01-03 13:26:13 |
| hector@example.com  | delivered | 2017-01-03 13:26:28 |
| hector@example.com  | open      | 2017-01-03 13:26:33 |
| hector@example.com  | open      | 2017-01-03 13:26:33 |
| tornedo@example.com | open      | 2017-01-03 13:34:53 |
| tornedo@example.com | click     | 2017-01-03 13:35:22 |
| tornedo@example.com | open      | 2016-09-05 00:00:00 |
| tornedo@example.com | open      | 2016-09-17 00:00:00 |
| sparrow@example.com | open      | 2017-01-03 16:05:36 |
| tornedo@example.com | open      | 2017-01-03 20:12:15 |
| hector@example.com  | open      | 2017-01-03 22:06:47 |
| sparrow@example.com | open      | 2017-01-09 19:46:26 |
| sparrow@example.com | open      | 2017-01-09 19:47:59 |
| sparrow@example.com | open      | 2017-01-09 19:48:28 |
| sparrow@example.com | delivered | 2017-01-09 19:52:24 |
+---------------------+-----------+---------------------+
I have a table like this which contains email activity. I want to find when each user most recently opened, and I also want a count of each event type. I want results exactly like:
email               | open | click  | delivered | max_open_date
hector@example.com  | 4    | <null> | 1         | 2017-01-03 22:06:47
sparrow@example.com | 3    | <null> | 1         | 2017-01-09 19:48:28
tornedo@example.com | 4    | 1      | <null>    | 2017-01-03 20:12:15
My model looks like this:
class EmailEvent(models.Model):
    event = models.TextField(blank=True, null=True)
    email = models.TextField(blank=True, null=True)
    event_date = models.DateTimeField(blank=True, null=True)
I tried the following code. It gives the correct counts for open, click, and delivered, but the wrong result for max_open_date, and I don't know why:
from django.db.models import Case, IntegerField, Max, Sum, When

EmailEvent.objects.values('email').annotate(
    max_open_date=Case(When(event='open', then=Max('event_date'))),
    open=Sum(Case(When(event='open', then=1), output_field=IntegerField())),
    click=Sum(Case(When(event='click', then=1), output_field=IntegerField())),
    delivered=Sum(Case(When(event='delivered', then=1), output_field=IntegerField())),
)
Help me get the exact results I want. Sorry for my bad English. Thanks!
I do not use Django, but you probably need something like this:
max_open_date=Max(Case(When(event='open', then='event_date')))
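That is, Max goes outside the Case, so the aggregate runs over the per-row conditional values rather than the other way around. The full queryset would then look something like this (a sketch, untested):

from django.db.models import Case, IntegerField, Max, Sum, When

EmailEvent.objects.values('email').annotate(
    open=Sum(Case(When(event='open', then=1), output_field=IntegerField())),
    click=Sum(Case(When(event='click', then=1), output_field=IntegerField())),
    delivered=Sum(Case(When(event='delivered', then=1), output_field=IntegerField())),
    max_open_date=Max(Case(When(event='open', then='event_date'))),
)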
Suppose we are given a dataset ("DATA") like:
YEAR | FIRST NAME | LAST NAME | VARIABLES
2008 | JOY        | ANDERSON  | spark|python|scala;45;w/o sports;w datascience
2008 | STEVEN     | JOHNSON   | Spark|R;90|56
2006 | NIHA       | DIVA      | w/o sports
and another dataset ("RESULT") like:
YEAR | FIRST NAME | LAST NAME
1992 | EMMA       | CENA
2008 | JOY        | ANDERSON
2008 | STEVEN     | ANDERSON
2006 | NIHA       | DIVA
and so on.
The output should be ("RESULT"):
YEAR | FIRST NAME | LAST NAME | SUBJECT | SCORE | SPORTS | DATASCIENCE
1992 | EMMA       | CENA      |         |       |        |
2008 | JOY        | ANDERSON  | SPARK   | 45    | FALSE  | TRUE
2008 | JOY        | ANDERSON  | PYTHON  | 45    | FALSE  | TRUE
2008 | JOY        | ANDERSON  | SCALA   | 45    | FALSE  | TRUE
2008 | STEVEN     | ANDERSON  |         |       |        |
2006 | NIHA       | DIVA      |         |       | FALSE  |
2008 | STEVEN     | JOHNSON   | SPARK   | 90    |        |
2008 | STEVEN     | JOHNSON   | SPARK   | 56    |        |
2008 | STEVEN     | JOHNSON   | R       | 90    |        |
2008 | STEVEN     | JOHNSON   | R       | 56    |        |
and so on.
Please note that there are some rows in DATA which are not present in RESULT, and vice versa. For example, "2008, STEVEN, JOHNSON" is not present in RESULT but is present in DATA, and entries for it should still be made in the RESULT dataset. The columns {SUBJECT, SCORE, SPORTS, DATASCIENCE} come from my intuition that "spark" refers to the SUBJECT, and so on.
Hope you understand my query. I am using spark-shell with Spark DataFrames.
Note that "Spark" and "spark" should be considered the same.
As explained in the comments, you can implement some of the tricky logic as in the answers to splitting row in multiple row in spark-shell.
Data:
val df = List(
  ("2008","JOY ","ANDERSON ","spark|python|scala;45;w/o sports;w datascience"),
  ("2008","STEVEN ","JOHNSON ","Spark|R;90|56"),
  ("2006","NIHA ","DIVA ","w/o sports")
).toDF("YEAR","FIRST NAME","LAST NAME","VARIABLE")
I will only highlight the relatively tricky parts; you can figure out the details yourself. I suggest handling the "w" and "w/o" tags separately. Furthermore, you have to explode the languages and the scores in separate statements. This gives:
val sep = "###"  // assumption: any separator token that never occurs in the data
val step1 = df
  .withColumn("backrefReplace",
    split(regexp_replace('VARIABLE,"^([A-z|]+)?;?([\\d\\|]+)?;?(w.*)?$","$1"+sep+"$2"+sep+"$3"),sep))
  .withColumn("letter",explode(split('backrefReplace(0),"\\|")))
  .select('YEAR,$"FIRST NAME",$"LAST NAME",'VARIABLE,'letter,
    explode(split('backrefReplace(1),"\\|")).as("digits"),
    'backrefReplace(2).as("tags")
  )
which gives
scala> step1.show(false)
+----+----------+---------+----------------------------------------------+------+------+------------------------+
|YEAR|FIRST NAME|LAST NAME|VARIABLE |letter|digits|tags |
+----+----------+---------+----------------------------------------------+------+------+------------------------+
|2008|JOY |ANDERSON |spark|python|scala;45;w/o sports;w datascience|spark |45 |w/o sports;w datascience|
|2008|JOY |ANDERSON |spark|python|scala;45;w/o sports;w datascience|python|45 |w/o sports;w datascience|
|2008|JOY |ANDERSON |spark|python|scala;45;w/o sports;w datascience|scala |45 |w/o sports;w datascience|
|2008|STEVEN |JOHNSON |Spark|R;90|56 |Spark |90 | |
|2008|STEVEN |JOHNSON |Spark|R;90|56 |Spark |56 | |
|2008|STEVEN |JOHNSON |Spark|R;90|56 |R |90 | |
|2008|STEVEN |JOHNSON |Spark|R;90|56 |R |56 | |
|2006|NIHA |DIVA |w/o sports | | |w/o sports |
+----+----------+---------+----------------------------------------------+------+------+------------------------+
Then you have to handle capitalisation and the tags. For the tags, you can write relatively generic code using explode and pivot, but you have to do some cleaning to match your exact result. Here is an example:
List(("a;b;c")).toDF("str")
  .withColumn("char",explode(split('str,";")))
  .groupBy('str)
  .pivot("char")
  .count
  .show()
+-----+---+---+---+
| str| a| b| c|
+-----+---+---+---+
|a;b;c| 1| 1| 1|
+-----+---+---+---+
Read more about pivot here
The final step is simply to do a left join with the second dataset (the initial "RESULT"), so that its rows are preserved even when there is no match in DATA.
I have the following Model:
class SystemMessage(Model):
    subject = TextField(default=None)
    message = TextField(default=None)
And the following output from explain:
| Field         | Type       | Null | Key | Default | Extra          |
| id            | int(11)    | NO   | PRI | NULL    | auto_increment |
| language_code | varchar(5) | NO   |     | NULL    |                |
| subject       | longtext   | NO   |     | NULL    |                |
| message       | longtext   | YES  |     | NULL    |                |
I have no outstanding migrations, and makemigrations reports no changes.
Is there any reason why message is NULLable but subject is not? I would like both to be NOT NULLable.
I provided a default value for each field to ensure an IntegrityError when saving strings with no value.