Scenario
Source -> Lkp(dim) -> Expr -> two targets (Dim table and Fact table)
Here the target dim table is cached and used as a dynamic lookup.
Whenever a row arrives from the source, it is looked up in the cache.
If the lookup succeeds, the ID is retrieved and inserted into the Fact table along with the other columns.
If the lookup fails, a new ID is generated in the Expr by a stored procedure call to the database and is inserted into both the Dim and Fact tables along with the other columns.
Issue
Since the ID is not a lookup column, it needs a value assigned to it in the dynamic cache.
I can use the associated expression to set either a static value or a sequence.
But I need this ID to be generated by the DB, which will not happen until the row passes the lookup and reaches the Expr.
If I use an Informatica sequence, the dim table and the cache will be out of sync - one getting its sequence from Informatica and the other from the DB.
I also cannot use an Informatica sequence because this mapping will have multiple instances running concurrently.
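To make the mismatch concrete, here is a minimal Python sketch of the flow (all IDs and sequence start values are made up): on a cache miss, the dynamic cache takes the associated-port value, while the dimension row gets the DB-generated ID, so a later hit on the same key returns the wrong ID.

```python
# All IDs and start values below are made up for illustration.
db_seq = iter(range(404, 10**6))        # IDs the DB stored procedure would hand out
infa_seq = iter(range(1000, 10**6))     # IDs an Informatica sequence would hand out

cache = {("h", 8): 403}                 # dynamic lookup cache: (col1, col2) -> ID
dim = dict(cache)                       # dimension table contents, keyed the same way
fact = []

for col1, col2 in [("h", 8), ("v", 6), ("v", 6)]:
    key = (col1, col2)
    if key in cache:                    # lookup hit: reuse the cached ID
        fact.append((col1, col2, cache[key]))
    else:                               # lookup miss: the cache takes the associated-port value...
        cache[key] = next(infa_seq)
        db_id = next(db_seq)            # ...while the Expr gets a DB-generated ID
        dim[key] = db_id
        fact.append((col1, col2, db_id))

print(cache[("v", 6)], dim[("v", 6)])   # 1000 404 -- cache and dim table out of sync
print(fact[2])                          # ('v', 6, 1000) -- the third row gets the wrong ID
```

The third source row hits the cache and picks up the Informatica-side value instead of the DB-generated 404, which is exactly the illustration below.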
Illustration
Dim table
ID col1 col2
401 a 1
402 b 3
403 h 8
Lookup
ID col1 col2
401 a 1
402 b 3
403 h 8
Fact Table
fcol1 fcol2 fcol3 ID(foreign key to DIM)
aa 11 1 401
bb 33 4 401
dk 44 2 403
Source
col1 col2
h 8
v 6
v 6
The first record from the source will go through smoothly: it will insert a record into the fact table with ID 403 from the lookup.
Fact Table after first record
fcol1 fcol2 fcol3 ID(foreign key to DIM)
aa 11 1 401
bb 33 4 401
dk 44 2 403
eo 23 5 403
The second record from the source doesn't find a match in the lookup and hence updates the cache immediately.
Then the Expr executes this logic: if the lookup failed, call sp(seq_gen); else use the ID fetched from the lookup.
The new ID is inserted into both the DIM and FACT tables.
The problem is that this newly generated ID is not available to the lookup cache; instead the cache holds the value from the associated port.
Fact Table after second record
fcol1 fcol2 fcol3 ID(foreign key to DIM)
aa 11 1 401
bb 33 4 401
dk 44 2 403
eo 23 5 403
fa 32 3 404
Dim table after second record
ID col1 col2
401 a 1
402 b 3
403 h 8
404 v 6
Lookup
ID col1 col2
401 a 1
402 b 3
403 h 8
<ap> v 6
where <ap> is the associated port expression. We can write any expression here. How do I bring 404 here? I cannot write the Expr's logic here because I cannot check whether the lookup failed or succeeded.
Fact Table after third record
fcol1 fcol2 fcol3 ID(foreign key to DIM)
aa 11 1 401
bb 33 4 401
dk 44 2 403
eo 23 5 403
fa 32 3 404
ep 53 6 <ap> (I expect 404)
Related
I have the following scenario. My datasource looks like this:
Order Item Type Value
1 1 A 14
1 1 B 10
1 1 C 12
1 2 A 12
2 1 C 19
2 1 D 15
2 2 B 11
Now I apply a few steps in the query editor, inter alia, a Group By (by Order and Item), so that my finished table looks like this:
Order Item Value
1 1 36
1 2 12
2 1 34
2 2 11
I am now looking for a possibility to filter my datasource table before the steps are applied (filter datasource > query steps applied > chart changes).
In my example here I would filter the datasource by Type <> B:
Order Item Type Value
1 1 A 14
1 1 C 12
1 2 A 12
2 1 C 19
2 1 D 15
And the final table (chart datasource) would be looking like this:
Order Item Value
1 1 26
1 2 12
2 1 34
I tried it with parameters. But the problem is I need the filter in Power BI online, so that the end user can apply this filter.
Thanks in advance for any ideas!
Don't apply the grouping in your query. Leave the source table as it is, create a measure which sums Value, and filter on Type.
Using Order and Item in a table visual (don't summarize), with SUM of Value for the value, which can later be filtered by Type, should give the desired result.
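For what it's worth, here is the same order of operations sketched in Python/pandas (not Power BI, just to illustrate filter-before-aggregate on the sample data): keep the source rows ungrouped, filter Type first, then aggregate.

```python
import pandas as pd

# The sample source table from the question.
src = pd.DataFrame({
    "Order": [1, 1, 1, 1, 2, 2, 2],
    "Item":  [1, 1, 1, 2, 1, 1, 2],
    "Type":  ["A", "B", "C", "A", "C", "D", "B"],
    "Value": [14, 10, 12, 12, 19, 15, 11],
})

# Filter first (Type <> B), then group -- the order the asker wants.
filtered = src[src["Type"] != "B"]
result = filtered.groupby(["Order", "Item"], as_index=False)["Value"].sum()
print(result)
#    Order  Item  Value
# 0      1     1     26
# 1      1     2     12
# 2      2     1     34
```

Note that (Order 2, Item 2) disappears entirely because its only row was Type B, matching the expected final table above.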
It seems very simple but I cannot get the graph to show the data I want.
So, I have a lot of IDs with start and end dates (LENGTH) and open items (OPEN). Each day has an availability (AVAIL) and nothing is used (USED) at day 1.
ID LENGTH OPEN USED AVAIL
1A 6 100 0 2400
I need to create a NEW_DAY column that counts up to LENGTH. In this case the result would be:
ID LENGTH NEW_DAY OPEN USED AVAIL
1A 6 1 100 0 2400
1A 6 2 100 0 2400
1A 6 3 100 0 2400
1A 6 4 100 0 2400
1A 6 5 100 0 2400
1A 6 6 100 0 2400
Note, I have hundreds of IDs, so I cannot hard-code it as 1A; it needs to be dynamic.
I am not sure, but maybe this might help you.
If you add a blank query and add this expression:
= List.Repeat({1, 2}, 3)
you will get the first argument {1, 2} repeated three times.
When you separate your ID into a new column and pass that column to the code above (the same goes for the second argument), it might work.
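Outside of Power Query, the same row expansion can be sketched in Python/pandas (column names taken from the question; this only illustrates the logic, it is not an M solution): repeat each row LENGTH times, then number the copies 1..LENGTH per ID.

```python
import pandas as pd

# One sample row; with hundreds of IDs the same code applies unchanged.
df = pd.DataFrame({"ID": ["1A"], "LENGTH": [6], "OPEN": [100],
                   "USED": [0], "AVAIL": [2400]})

# Repeat each row LENGTH times...
out = df.loc[df.index.repeat(df["LENGTH"])].copy()
# ...and number the repeats 1..LENGTH within each ID.
out["NEW_DAY"] = out.groupby("ID").cumcount() + 1
out = out.reset_index(drop=True)

print(out[["ID", "LENGTH", "NEW_DAY", "OPEN", "USED", "AVAIL"]])
```

Because the repeat count comes from the LENGTH column itself, nothing is hard-coded per ID.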
I have large dataset of a few million patient encounters that include a diagnosis, timestamp, patientID, and demographic information.
We have found that a particular type of disease is frequently comorbid with a common condition.
I would like to count the number of this type of disease that each patient has, and then create a histogram showing how many people have 1,2,3,4, etc. additional diseases.
This is the format of the data.
PatientID Diagnosis Date Gender Age
1 282.1 1/2/10 F 25
1 282.1 1/2/10 F 87
1 232.1 1/2/10 F 87
1 250.02 1/2/10 F 41
1 125.1 1/2/10 F 46
1 90.1 1/2/10 F 58
2 140 12/15/13 M 57
2 282.1 12/15/13 M 41
2 232.1 12/15/13 M 66
3 601.1 11/19/13 F 58
3 231.1 11/19/13 F 76
3 123.1 11/19/13 F 29
4 601.1 12/30/14 F 81
4 130.1 12/30/14 F 86
5 230.1 1/22/14 M 60
5 282.1 1/22/14 M 46
5 250.02 1/22/14 M 53
Generally, I was thinking of a DO loop, but I'm not sure where to start because there are duplicates in the dataset, like with patient 1 (282.1 is listed twice). I'm not sure how to account for that. Any thoughts?
Target diagnoses to count would be 282.1, 232.1, 250.02. In this example, patient 1 would have a count of 3, patient 2 would have 2, etc.
Edit:
This is what I have used, but the output is showing each PatientID on multiple lines in the output.
PROC SQL;
create table want as
select age, gender, patientID,
count(distinct diagnosis_description) as count
from dz_prev
where diagnosis in (282.1, 232.1)
group by patientID;
quit;
This is what the output table looks like. Why is this patientID showing up so many times?
Obs AGE GENDER PATIENTID count
1 55 Male 107828695 1
2 54 Male 107828695 1
3 54 Male 107828695 1
4 54 Male 107828695 1
5 54 Male 107828695 1
If you include variables that are neither grouping variables nor summary statistics, then SAS will happily re-merge your summary statistics back with all of the source records. That is why you are getting multiple records. AGE can vary if your dataset covers many years, and GENDER can also vary if your data is messy. So for a quick analysis you might try something like this:
create table want as
select patientID
, min(age) as age_at_onset
, min(gender) as gender
, count(distinct diagnosis_description) as count
from dz_prev
where diagnosis in (282.1, 232.1)
group by patientID
;
I think you can get what you want with an SQL statement
PROC SQL NOPRINT;
create table want as
select PatientID,
count(distinct Diagnosis) as count
from have
where Diagnosis in (282.1, 232.1, 250.02)
group by PatientID;
quit;
This filters to only the diagnoses you are interested in, counts the distinct times they are seen, by the PatientID, and saves the results to a new table.
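To see why COUNT(DISTINCT ...) handles the duplicate 282.1 rows, here is the same shape of query run against the sample data in SQLite from Python (diagnosis codes are stored as text here, unlike the numeric SAS version):

```python
import sqlite3

# Sample rows for patients 1, 2 and 5, including patient 1's duplicate 282.1.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dz_prev (patientID INTEGER, diagnosis TEXT)")
rows = [(1, "282.1"), (1, "282.1"), (1, "232.1"), (1, "250.02"),
        (1, "125.1"), (2, "282.1"), (2, "232.1"), (5, "282.1"), (5, "250.02")]
con.executemany("INSERT INTO dz_prev VALUES (?, ?)", rows)

# Filter to the target codes, count distinct codes per patient.
want = con.execute("""
    SELECT patientID, COUNT(DISTINCT diagnosis) AS count
    FROM dz_prev
    WHERE diagnosis IN ('282.1', '232.1', '250.02')
    GROUP BY patientID
    ORDER BY patientID
""").fetchall()
print(want)   # [(1, 3), (2, 2), (5, 2)]
```

The duplicate 282.1 for patient 1 is collapsed by DISTINCT, so patient 1 gets 3 as the question expects.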
I need to add a foreign key to a table that I have imported from a CSV:
table:("SSSSSSSSSFFFFSSSSSFSSSSSSSSSSSSSSS"; enlist ",") 0: `:table.csv
I do not want to have to redefine the whole table. Is there a way to do this?
q)p:([p:`p1`p2`p3`p4`p5`p6]name:`nut`bolt`screw`screw`cam`cog;color:`red`green`blue`red`blue`red;weight:12 17 17 14 12 19;city:`london`paris`rome`london`paris`london)
q)sp:([]s:`s1`s1`s1`s1`s4`s1`s2`s2`s3`s4`s4`s1;p:`p$`p1`p2`p3`p4`p5`p6`p1`p2`p2`p2`p4`p5;qty:300 200 400 200 100 100 300 400 200 200 300 400)
q)
q)update `p$p from `sp
`sp
q)meta sp
c | t f a
---| -----
s | s
p | s p
qty| j
Defining a foreign key is similar to enumerating/casting, and therefore an overload of $ is used.
Passing the table by name (`sp) means that the table is updated in place.
I am facing a huge memory leak on a server serving a Django (1.8) app with Apache or Nginx (the issue happens with both).
When I visit certain pages (say, the specific request below), the server's RAM goes up to 16 GB within a few seconds (with only one request) and the server freezes.
def records(request):
    """Return the last-14-days records page."""
    time = timezone.now() - timedelta(days=14)
    record = Records.objects.filter(time__gte=time)
    return render(request,
                  'record_app/records_newests.html',
                  {
                      'active_nav_tab': ["active", "", "", ""],
                      'record': record,
                  })
When I git checkout an older version, from back when there was no such problem, the problem survives and I have the same issue.
I did a memory check with Guppy (heapy) for the faulty request; here is the result:
>>> hp.heap()
Partition of a set of 7042 objects. Total size = 8588675016 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1107 16 8587374512 100 8587374512 100 unicode
1 1014 14 258256 0 8587632768 100 django.utils.safestring.SafeText
2 45 1 150840 0 8587783608 100 dict of 0x390f0c0
3 281 4 78680 0 8587862288 100 dict of django.db.models.base.ModelState
4 326 5 75824 0 8587938112 100 list
5 47 1 49256 0 8587987368 100 dict of 0x38caad0
6 47 1 49256 0 8588036624 100 dict of 0x39ae590
7 46 1 48208 0 8588084832 100 dict of 0x3858ab0
8 46 1 48208 0 8588133040 100 dict of 0x38b8450
9 46 1 48208 0 8588181248 100 dict of 0x3973fe0
<164 more rows. Type e.g. '_.more' to view.>
After a day of searching I found my answer.
While investigating I checked statistics on my DB and saw that one table was 800 MB but had only 900 rows. This table contains a TextField without a max length. Somehow one text field got a huge amount of data inserted into it, and this row was slowing everything down on every page using this model.
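If you need to hunt for such a row yourself, ordering by the text column's length will surface it. A minimal sketch against SQLite (the table and column names here are invented; with Django's ORM you could annotate with django.db.models.functions.Length instead):

```python
import sqlite3

# Hypothetical reproduction: one row of a small table holds ~1 MB of text.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE record_app_records (id INTEGER PRIMARY KEY, body TEXT)")
con.executemany("INSERT INTO record_app_records (body) VALUES (?)",
                [("short",), ("x" * 1_000_000,), ("also short",)])

# Order rows by the size of the text column to surface the culprit.
biggest = con.execute(
    "SELECT id, LENGTH(body) FROM record_app_records "
    "ORDER BY LENGTH(body) DESC LIMIT 3"
).fetchall()
print(biggest[0])   # the ~1 MB row comes first
```

A row like that explains the heap dump above: a single huge unicode object dominating the total size.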