I have a model with DimCustomer, DimSegment and FactRevenue.
In most cases each customer is associated with a single segment.
There are a handful of customers that have two segments, so in FactRevenue a few customers have two segments associated with them.
I want to "override" this and display a 'current segment' using only FactRevenue (I don't have a connection between DimCustomer and DimSegment).
Do I have any tools in DAX to achieve this?
Is it viable to have a mapped list of these exceptions and hardcode it somehow?
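Hardcoding a small exception map is viable in DAX. A minimal sketch of a measure, assuming FactRevenue carries a CustomerKey and the regular segment is reachable through DimSegment[SegmentName]; all names and key values here are illustrative, not the real model:

Current Segment =
VAR Override =
    SWITCH (
        SELECTEDVALUE ( FactRevenue[CustomerKey] ),
        101, "Segment A",    -- hypothetical exception customers
        205, "Segment B",
        BLANK ()
    )
RETURN
    COALESCE ( Override, SELECTEDVALUE ( DimSegment[SegmentName] ) )

If the exception list grows, a disconnected mapping table read with LOOKUPVALUE would scale better than a literal SWITCH.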
I have a slight issue with my tables in Power BI. In short, I have a missing link in one of my relationships. As a result, instead of returning NOTHING, which is logical and actually what I would like, it returns EVERYTHING.
A bit more detail: I have multiple tables with relationships between them. The problem is that I have a few task_group rows pointing toward shipments that do not exist. In my visualization, I am trying to access data that is linked to a shipment (a count of the number of packages linked to that shipment). The logical thing for me would be: "If there is no shipment matching the number given in the shipment table, then you cannot count the number of packages linked to that shipment."
But Power BI begs to differ. Its idea is: "If I cannot find a shipment to link the package to, I'm going to take every single package regardless of shipment." As a result, a group of tasks that does not have any packages ends up showing as having all the packages instead. How can I tell Power BI to return nothing when it doesn't find anything, instead of returning everything?
Image of my relationships
I think Power BI behaves slightly unintuitively where there are nulls on one side of a join.
Have you tried filtering to only include where shipment_id is not blank?
If the problem is that you have NULLs on one side of the relationship, the best way to tackle it is to replace the NULLs with something else. You can do that in two ways:
Edit the shipment-number NULLs to something else in Power Query while importing (some number that is unlikely to be an actual shipment, maybe 0)
Create a calculated column in DAX replacing the blanks/NULLs and use that in the relationship instead (see the sketch below)
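A minimal sketch of the second option, assuming the many-side table is called task_group and joins on shipment_id (names are illustrative):

-- calculated column: substitute a sentinel key for blank shipment ids
shipment_id_fixed =
IF ( ISBLANK ( task_group[shipment_id] ), 0, task_group[shipment_id] )

Then point the relationship at shipment_id_fixed instead of shipment_id.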
But I think you may have NULLs on both sides of the relationship; that is the only explanation I can think of for why Power BI is behaving this way. Either way, the solutions above should fix it.
We currently have a couple of authorized views in BigQuery for various teams.
We are using the partition_date column in queries to reduce the amount of data processed (reference):
#standardSQL
SELECT
  <required_fields, ...>,
  EXTRACT(DATE FROM _PARTITIONTIME) AS partition_date
FROM
  `<project-name>.<dataset-name>.<table-name>`
WHERE
  _PARTITIONTIME >= TIMESTAMP("2018-05-01")
  AND _PARTITIONTIME <= CURRENT_TIMESTAMP()
  AND <Blah-Blah-Blah>
However, due to the number of users and the amount of data we have, it's very hard to maintain the quality of BigQuery scripts, leaving us with query costs that grow with the number of users.
I see we can use --require_partition_filter (reference) when creating tables. So could someone help me address the following questions:
When I create a table with the above flag, will a view that references the table also require the partition condition, because the partition filter is enabled at the table level?
Due to the number of authorized views connected to our tables, it would take significant effort to change them to materialized views (tables). Is there an alternative way to apply something like --require_partition_filter at the view level?
FYI, for anyone who wants to update an existing table with the above flag, I see we can use the bq update command (reference), which I am planning to use for existing partitioned tables.
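For an existing partitioned table, that update should look roughly like the line below (a sketch; the table path is a placeholder, and it's worth checking bq update --help for your CLI version):

bq update --require_partition_filter <project-name>:<dataset-name>.<table-name>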
Yes, the same restriction on the tables being queried through the view applies.
There is not.
I know this is a common question, but most frequently people ask about the performance difference between the two.
What I'm asking for is use cases of CTEs and temp tables, to better understand when to use each.
With a temp table you can use constraints and indexes. You can also open a cursor on a temp table, whereas a CTE terminates at the end of the query (emphasizing: a single query).
I will answer through specific use cases with an application I've had experience with in order to aid with my point.
Common use cases in an example enterprise application I've worked with are as follows:
Temp Tables
Normally, we use temp tables to transform data before INSERTing into or UPDATEing the appropriate tables whenever that takes more than one query: gathering similar data from multiple tables so it can be manipulated and processed together.
There are different types of orders (order_type1, order_type2, order_type3), all of which live in different tables but have similar columns. We have a stored procedure that UNIONs all these tables into one #orders temp table and UPDATEs a person's suggested orders depending on their existing orders. A sketch of the pattern follows.
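A minimal T-SQL sketch of that procedure's core; all table and column names are illustrative, not the application's actual schema:

-- gather similar rows from the per-type order tables into one temp table
SELECT order_id, customer_id, amount
INTO #orders
FROM order_type1
UNION ALL
SELECT order_id, customer_id, amount FROM order_type2
UNION ALL
SELECT order_id, customer_id, amount FROM order_type3;

-- temp tables (unlike CTEs) can be indexed before further processing
CREATE INDEX ix_orders_customer ON #orders (customer_id);

-- a second statement in the same procedure reuses the temp table
UPDATE s
SET s.suggested_amount = o.amount
FROM suggested_orders AS s
JOIN #orders AS o
  ON o.customer_id = s.customer_id;

DROP TABLE #orders;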
CTEs
CTEs are awesome for readability when dealing with single queries. When creating reports that require analysis using PIVOTs, aggregates, etc. across tons of lines of code, CTEs provide readability by letting you separate a huge query into logical sections, as in the sketch below.
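A minimal sketch of that style; the report logic and names are invented for illustration:

-- each CTE is a named, self-documenting section of one report query
WITH monthly_sales AS (
    SELECT customer_id,
           DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1) AS sales_month,
           SUM(amount) AS total
    FROM orders
    GROUP BY customer_id,
             DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1)
),
top_customers AS (
    SELECT customer_id
    FROM monthly_sales
    GROUP BY customer_id
    HAVING SUM(total) > 10000
)
SELECT m.customer_id, m.sales_month, m.total
FROM monthly_sales AS m
JOIN top_customers AS t
  ON t.customer_id = m.customer_id;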
Sometimes there is a combination of both: when more than one query is required, it's still useful to break down some of those queries with CTEs.
I hope this is of some usefulness, cheers!
Question to all Cassandra experts out there.
I have a column family with about a million records.
I would like to query these records in such a way that I can perform a Not-Equal-To kind of operation.
I Googled this, and it seems I have to use some sort of MapReduce.
Can somebody tell me what options are available in this regard?
I can suggest a few approaches.
1) If you have a limited number of values that you would like to test for not-equality, consider modeling those as boolean columns (e.g. a column isEqualToUnitedStates with true or false).
2) Otherwise, consider emulating the unsupported query != X by combining the results of two separate queries, < X and > X, on the client side (see the sketch after this list).
3) If your schema cannot support either type of query above, you may have to resort to writing custom routines that do client-side filtering and construct the not-equal set dynamically. This will work if you can first narrow the search space down to manageable proportions, such that it's relatively cheap to run the query without the not-equal.
So let's say you're interested in all purchases by a particular customer of every product type except Widget. An ideal query would look something like SELECT * FROM purchases WHERE customer = 'Bob' AND item != 'Widget'; Now of course you cannot run this, but in this case you should be able to run SELECT * FROM purchases WHERE customer = 'Bob' without wasting too many resources, and filter item != 'Widget' in the client application.
4) Finally, if there is no way to restrict the data in a meaningful way before doing the scan (querying without the equality check would return too many rows to handle comfortably), you may have to resort to MapReduce. This means running a distributed job that scans all rows in the table across the cluster. Such jobs will obviously run a lot slower than native queries, and are quite complex to set up. If you want to go this way, please look into Cassandra Hadoop integration.
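A minimal CQL sketch of approach 2, assuming a purchases table with customer as the partition key and item as a clustering column:

-- two range queries replace the unsupported item != 'Widget'
SELECT * FROM purchases WHERE customer = 'Bob' AND item < 'Widget';
SELECT * FROM purchases WHERE customer = 'Bob' AND item > 'Widget';
-- the client application concatenates the two result sets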
If you want to use a not-equals operator on a specific partition key and get all the other data from the table, then you can use a combination of range queries and the TOKEN function from CQL to achieve this.
For example, if you want to fetch all rows except the ones whose partition key is 'abc', you execute the two queries below:
select <column1>,<column2> from <keyspace1>.<table1> where TOKEN(<partition_key_column_name>) < TOKEN('abc');
select <column1>,<column2> from <keyspace1>.<table1> where TOKEN(<partition_key_column_name>) > TOKEN('abc');
But beware that the result is going to be huge (depending on the size of the table and the fields you need), so you might want to use this in conjunction with a utility like dsbulk. Also note that there is no guarantee of ordering in your results. This is just a kind of data dump, most probably useful for one-time, data-migration-like scenarios.
I have a custom variable set for all visitors; for our registered users it's some value, for unregistered users, it's empty.
I can find unregistered users in an advanced segment using the settings Exclude Custom Variable (Value 02) Matching Regexp .+ -- works brilliantly.
But I need a report of unregistered visitors for a dashboard, and I tried to do the same thing with a filter. I have a metric of Visits and a dimension that all visitors will have (e.g. Browser). My filter is identical to the one in the advanced segment, but ... not brilliant: I get no visits. I have tried to Include with the regex ^$, but no love there, either.
Any ideas what I am doing wrong?
To understand your problem and the solution yourself, let me illustrate how data recording works in any collection process (Google Analytics is one of the tools used for data collection and analysis).
To record and analyse data, you first decide what you want to record, and then how. Maybe this "how" is where Google Analytics comes in for you. The data that you want to see is the metric; it can have a name and a (usually numeric) value. Each dimension is how you want to separate or drill down into the various views of the data. As an example, if you want to know how many visitors visited your site each day, and you want to be able to see which source they came through, Daily Visitor Count is your metric and Source is your dimension.
The important thing to understand here is that dimensions and metrics are not bound together. Just because you decided that Daily Visitor Count should be viewable by Source doesn't mean a source gets added to every update of the Daily Visitor Count metric. In order to view the metric by the dimension, you need to record a value for the dimension every time you record the metric.
If you don't record a dimension alongside a metric, then you cannot reach those metric values by applying a filter on that dimension. A dimension filter only gives you access to the values recorded for the dimension, not to all metrics, because dimensions don't contain values of metrics; only metrics can optionally carry values for dimensions.
So when you query "dimension matches regex .+", it works with both include and exclude, but you cannot query metrics with an empty dimension using a dimension filter. The best approach is to record a standard default value for the dimension every time you record the metric, something like (not set) or unknown, so that you can separate the two groups.
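With the classic ga.js custom-variable API, that could look roughly like the sketch below; the slot number matches the "Value 02" slot from the question, but the variable name, the isRegistered check, and the visitor-level scope are assumptions:

// assumes the standard ga.js async snippet (_gaq) is installed
// isRegistered would come from your application's session state
var isRegistered = false;
// always record slot 2, substituting a default for unregistered users
var userType = isRegistered ? 'registered' : 'unregistered';
_gaq.push(['_setCustomVar', 2, 'user_type', userType, 1]); // 1 = visitor-level scope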
Hope that helps. :)
I just hope you understand that what you were trying to do is conceptually wrong, though it could still have been made technically feasible.