I'm having an issue with MS Access 2000: when I enter the same field value into a query multiple times, it only displays the value once. For example, if I enter the number 8150 multiple times, it is displayed only once.
This image shows the query.
I've already checked everything in MS Access 2000 to try to resolve this issue, but I've come up with nothing suitable.
I know your data set is simplified, but looking at your data, inputs, etc., it appears your query is pulling from a single table and repeating results, so there is no join consideration.
I think the issue is the DISTINCTROW in your query, which removes duplicate rows from the results.
If you remove the "DISTINCTROW," I believe it may give you what you are expecting. In other words, change this:
SELECT DISTINCTROW Ring.[Ring Number], Ring.[Mounting Weight]
FROM Ring
To this:
SELECT Ring.[Ring Number], Ring.[Mounting Weight]
FROM Ring
For what it's worth, there may also be some strategies for simplifying how this query is run in the future (less dependence on dialog box prompts), but I know you probably want to address the issue at hand first, so let me know if this doesn't do it.
-- EDIT --
The removal of DISTINCTROW still applies, but I suddenly see the problem. The query expresses the logic as an "OR" of multiple values. Therefore, repeating a value does not mean multiple rows; it just means you've repeated a true condition.
For example, if I have:
Fruit  Count
-----  -----
Apple      1
Pear       1
Kiwi       3
and I say select where Fruit is Apple or Apple or Apple or Apple, the query is still only going to list that one row. The WHERE clause is a per-row test: each row is either included or excluded exactly once, so repeating the same condition can never add rows to the result.
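To make that concrete, here's a minimal SQL sketch (the table name Fruits is made up for this example):
SELECT [Fruit], [Count]
FROM Fruits
WHERE [Fruit] = 'Apple'
   OR [Fruit] = 'Apple'
   OR [Fruit] = 'Apple';
This still returns the single Apple row, no matter how many times the condition is repeated.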
That does not sound like what you want.
Here's what I think you need to do:
Get rid of the prompts within the query
Load your options into a separate table -- the repetition can occur here
Change your query to perform an inner join on the new table
New table (named "Selection" for the sake of example):
Entry  Ring Number  Mounting Weight
-----  -----------  ---------------
    1         8105  you get the idea...
    2         8110
    3         8110
    4         8110
    5         8115
    6         8130
    7         8130
    8         8130
    9         8130
   10         8150
New Query:
SELECT Ring.[Ring Number], Ring.[Mounting Weight]
FROM Ring
INNER JOIN Selection ON Ring.[Ring Number] = Selection.[Ring Number]
This has the added advantage of allowing more (or fewer) than 10 records.
I'm building a staff allocation sheet for our production teams so that management can see (graphically) the peaks and troughs for each department. I've anonymised the data and shared it here: https://docs.google.com/spreadsheets/d/140w_v_ApksXH2q7h_dK5Iglm0VOTF1Zk9fq9jPxF5BM/edit?usp=sharing
What I am attempting to achieve is to have the week-commencing dates across the top, the employees down the left (based on a unique list of employees from the department tabs), and the individual cells populated with the relevant show number (I've manually entered the first two employees).
I had done this successfully by having hidden proxy columns for each show that pull a start and end date from the department sheet via an INDIRECT lookup. Where this falls down is if an employee works on a production at two separate times, e.g. all of May and all of August but not between those months.
I initially attempted a series of nested ifs:
=if(and(indirect($A3&"!$B:$B")=$B:$B,
indirect($A3&"!$E:$E")<=AK$2,
indirect($A3&"!$F:$F")>AK$2,
indirect($A3&"!$C:$C")="SHOW #1"
),1,
if(and(indirect($A3&"!$B:$B")=$B:$B,
indirect($A3&"!$E:$E")<=AK$2,
indirect($A3&"!$F:$F")>AK$2,
indirect($A3&"!$C:$C")="SHOW #2"
),2,
if(and(indirect($A3&"!$B:$B")=$B:$B,
indirect($A3&"!$E:$E")<=AK$2,
indirect($A3&"!$F:$F")>AK$2,
indirect($A3&"!$C:$C")="SHOW #3"
),3,
. . .
if(and(indirect($A3&"!$B:$B")=$B:$B,
indirect($A3&"!$E:$E")<=AK$2,
indirect($A3&"!$F:$F")>AK$2,
indirect($A3&"!$C:$C")="SHOW #17"
),17,
0)))))))))))))))))
I hoped this would calculate in each cell and return the correct result; however, it did not work. That's when I discovered IF() doesn't work with ranges in this context! I've effectively hit a wall on this.
I feel like I'm missing something obvious in this and would appreciate any help!!!
I hope somebody can help me with some hints for the following analysis. Students may perform actions on courses (enrol, join, grant, ...) and also the reverse: cancel the latest action.
The first metric is to count all the actions that occurred in the system between two dates; these dates are exposed as a filter/slicer.
Some sample data :
person-id,person-name,course-name,event,event-rank,startDT,stopDT
11,John,CS101,enrol,1,2000-01-01,2000-03-31
11,John,CS101,grant,2,2000-04-01,2000-04-30
11,John,CS101,cancel,3,2000-04-01,2000-04-30
11,John,PHIL,enrol,1,2000-02-01,2000-03-31
11,John,PHIL,grant,2,2000-04-01,2000-04-30
The data set (ds) is above and I have added the following code for the count metric:
EVALUATE
SUMX(
    ADDCOLUMNS(
        ds,
        "z+", IF([event] <> "cancel", 1, 0),
        "z-", IF([event] = "cancel", -1, 0)
    ),
    [z+] + [z-]
)
The metric should display 3 subscriptions (John-CS101 = 1, John-PHIL = 2).
There are some other rules, but I don't know how to add them to the DAX code: the cancel dates are the same as those of the preceding (non-cancel) action, and the rank of the cancel action equals the rank of the non-cancel action + 1.
There is also a need to count the distinct student/course combinations, i.e. the composite key. How can I add this to the code (via SUMMARIZE, RANKX)?
Regards,
Q
This isn't technically an answer, but more of a recommendation.
It sounds like your challenge is that you have actions that may then be cancelled. There is specific logic that determines whether an action is cancelled or not (i.e. the cancellation has to be the immediate next row and the dates must match).
What I would recommend, which doesn't answer your specific question, is to adjust your data model rather than put the cancellation logic in DAX.
For example, if you could add a column to your data model that flags a row as subsequently cancelled, then all DAX has to do is check that flag to know whether an action is cancelled: a simple CALCULATE statement. You don't need lots of logic to determine whether the event was cancelled, and you entirely eliminate the need for SUMX, which can be slow when working with a lot of rows since it works row by row.
The logic for whether an action is cancelled moves to your source system (e.g. SQL, or even a calculated column in Excel) or to your ETL (e.g. the Query Editor in Power BI), which are better equipped for such tasks. The logic is applied once and then exists in your data model for all measures, instead of having to be applied each time a measure is used.
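For example, here is a minimal SQL sketch of such a flag, assuming the sample rows live in a table named ds with the columns from the question (LEAD and named windows are standard SQL; the hyphenated column names are rewritten with underscores for illustration):
-- a row counts as "cancelled" when the immediately following row for the
-- same person and course is a cancel with rank + 1 and identical dates
SELECT
    person_id,
    person_name,
    course_name,
    event,
    event_rank,
    CASE
        WHEN LEAD(event)      OVER w = 'cancel'
         AND LEAD(event_rank) OVER w = event_rank + 1
         AND LEAD(startDT)    OVER w = startDT
         AND LEAD(stopDT)     OVER w = stopDT
        THEN 1 ELSE 0
    END AS is_cancelled
FROM ds
WINDOW w AS (PARTITION BY person_id, course_name ORDER BY event_rank);
With that flag in the model, the subscription count reduces to counting non-cancel rows where is_cancelled = 0.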
I know this doesn't help you solve your logic question, but the reason I make this recommendation is that DAX is fundamentally a giant calculator. It adds things up. It's great at filters (adding some things up but not others), but it works best when everything is reduced to columns that it can sum or count. Once you go beyond that (e.g. wanting to look at the row below to adjust something about the current row), your DAX is going to get very complicated (and slow), whereas a source system or the Query Editor will likely be able to handle such requirements more easily.
I am new to RedShift and just experimenting at this stage to help with table design.
We have a very simple table with about 6 million rows and 2 integer fields.
Both integer fields are in the sort key but the plan has a warning - "very selective query filter".
The STL_Alert_Event_Log entry is:
'Very selective query filter:ratio=rows(61)/rows_pre_user_filter(524170)=0.000116'
The query we are running is:
select count(*)
from LargeNumberofRowswithUniKey r
where r.benchmarkid = 291891 and universeid = 300901
Our Table DDL is:
CREATE TABLE public.LargeNumberofRowswithUniKey
(
benchmarkid INTEGER NOT NULL DISTKEY,
UniverseID INTEGER NOT NULL
)
SORTKEY
(
benchmarkid,UniverseID
);
We have also run the following commands on the table:
Vacuum full public.LargeNumberofRowswithUniKey;
Analyze public.LargeNumberofRowswithUniKey;
A screenshot of the plan ("Query Plan Image") is attached.
My expectation was that the compound sort key on BenchmarkID and UniverseID, and the fact that both are part of the filter predicate, would ensure that the design was optimal for the sample query. This does not seem to be the case, hence the red warning symbol in the attached image. Can anyone shed light on this?
Thanks
George
Update 2017/09/07
I have some more information that may help:
If I run a much simpler query which just filters on the first column of the sort key:
select r.benchmarkid
from LargeNumberofRowswithUniKey r
where r.benchmarkid = 291891
this results in 524,170 rows being scanned, according to the actual query plan from the console. When I look at the blocks using STV_BLOCKLIST, the relevant blocks that might be required to satisfy my query are:
|slice|col|tbl |blocknum|num_values|minvalue|maxvalue|
| 1| 0|346457| 4| 262085| 291881| 383881|
| 3| 0|346457| 4| 262085| 291883| 344174|
| 0| 0|346457| 5| 262085| 291891| 344122|
So shouldn't there be 786,255 rows scanned (3 x 262,085) instead of 524,170 (2 x 262,085) as listed in the plan?
The "very selective filter" warning is returned when the rows selected vs rows scanned ratio is less than 0.05 i.e. a relatively large number of rows are scanned compared to the number of rows actually returned. This can be caused by having a large number of unsorted rows in a table, which can be resolved by running a vacuum. However, as you're already doing that I think this is happening because your query is actually very selective (you're selecting a single combination of benchmarkid and universeid) and so you can probably ignore this warning.
Side observation: if you are always selecting values using both benchmarkid and UniverseID, you should probably use DISTSTYLE EVEN.
The reason is that a benchmarkid DISTKEY distributes the data between slices based on benchmarkid: all the rows for a given benchmarkid land on the same slice. If your query always provides a single benchmarkid, then the query only utilizes one slice.
With DISTSTYLE EVEN, on the other hand, every slice can participate in the query, making it more efficient (for queries with WHERE benchmarkid = xxx).
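As a sketch, that is just one changed line in the DDL from the question (everything else stays the same):
CREATE TABLE public.LargeNumberofRowswithUniKey
(
    benchmarkid INTEGER NOT NULL,
    UniverseID INTEGER NOT NULL
)
DISTSTYLE EVEN
SORTKEY
(
    benchmarkid,UniverseID
);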
A general rule of thumb is:
Use DISTKEY for fields commonly used in JOIN or GROUP BY
Use SORTKEY for fields commonly used in WHERE
Consider the fictional data below, which illustrates my problem; the real data contains thousands of rows.
Figure 1
Each individual is characterized by values attached to A, B, C, D, E. In figure 1, I show 3 individuals for which some characteristics are missing. Do you have any idea how I can get the following completed table (figure 2)?
Figure 2
With an ID in figure 1, I could have used the carryforward command to fill in the values. But since each individual has a different number of rows, I don't know how to create the ID.
Edit: all individuals share the characteristic "A".
Edit: the existing order of observations is informative.
To detect the change of id, the idea is to compare, in each row, whether the preceding value of char is greater than or equal to the current one.
This works only if your data are ordered, but your edit says the existing order of observations is informative, so that seems safe here.
* start a new id whenever char does not increase (or at the first row)
gen id = 1 if (char[_n-1] >= char[_n]) | _n == 1
* turn the markers into a running counter: 1, 2, 3, ...
replace id = sum(id) if id == 1
* carry the id down to the remaining rows of each individual
replace id = id[_n-1] if missing(id)
* create a row for every id/char combination, then drop the marker variable
fillin id char
drop _fillin
If one individual has only the characteristics A and C and another has only the characteristics D and E, this won't work, but that case seems impossible to detect with your data.
SELECT
    a.id,
    b.url AS codingurl
FROM fact_A a
INNER JOIN dim_B b
    ON STRPOS(a.url, b.url) > 0
Records Count in Fact_A: 2 Million
Records Count in Dim_B : 1500
Time Taken to Execute : 10 Mins
No of Nodes: 2
Could someone help me understand why the above query takes so long to execute?
We have declared the distribution key on Fact_A so the records are distributed evenly across both nodes, and a sort key is created on URL in Fact_A.
Dim_B table is created with DISTRIBUTION ALL.
Redshift does not have full-text search indexes or prefix indexes, so a query like this (with strpos used in the filter) results in a full table scan, executing strpos 3 billion times (2 million rows x 1,500 rows).
Depending on which URLs are in dim_B, you might be able to optimise this by extracting prefixes into separate columns. For example, if you always compare subpaths of the form http[s]://hostname/part1/part2/part3, then you can extract "part1/part2/part3" as a separate column in both fact_A and dim_B, and make it the dist and sort keys.
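As a rough sketch of that idea (REGEXP_REPLACE is a built-in Redshift function; the url_path column name is made up here, and actually making it a dist/sort key would mean recreating the tables):
-- materialize the path portion of each URL
ALTER TABLE fact_A ADD COLUMN url_path VARCHAR(2048);
UPDATE fact_A
SET url_path = REGEXP_REPLACE(url, '^https?://[^/]+/', '');
-- after doing the same for dim_B, the join becomes a plain equality join:
SELECT a.id, b.url AS codingurl
FROM fact_A a
INNER JOIN dim_B b
    ON a.url_path = b.url_path;
An equality join like this can use the distribution and sort order of the tables, instead of testing every fact row against every dimension row.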
You can also rely on the parallelism of Redshift. If you resize your cluster from 2 nodes to 20 nodes, you should see an immediate performance improvement of roughly 8-10 times, as this kind of query can be executed by each node in parallel (for the most part).