I am attempting to unpivot COVID-19 data in KNIME with the Unpivoting node.
The data available from Johns Hopkins at
https://github.com/CSSEGISandData/COVID-19
is in wide format, where each new day of data is added as a new column.
I can manually make the columns with daily data be rows with the Unpivoting Node. However, each day I must reconfigure the node to account for the new column. There are 5 unpivoting nodes in my workflow where this must be done.
The Unpivoting Node has an option to use Regex to detect the columns to include or exclude but I am unable to make it work.
The available columns to include/exclude are a handful of field names such as Province/State, Country/Region, Lat, Long, plus the long list of date columns in the format m/d/yy (or m/dd/yy later in the month). The Johns Hopkins data for the US is in a similar format but has additional columns for counties, ISO codes, etc.
All of the date columns are this year (i.e. 2020).
For the top part of the Unpivoting node where Value Columns are
specified, I can do what I need by using the Wildcard setting and the
pattern */*/20
For the bottom part of the Unpivoting node, I need a wildcard or Regex
expression to specify all the other columns.
All the other columns include alphabet characters. None are of the format m/d/yy.
Therefore, some sort of Regex that includes any column with alphabetical column names, or specifies NOT m/d/yy should do the trick.
I tried using [\s\S]+ for help writing the Regex but nothing seems to work. I appreciate any help.
If other column names don't have / you can use [^/]+. Check here for more explanation.
I think it might be easier to select the other columns manually in the Retained columns section. I assume the date columns are in a single group, so you can click on the first column to retain, scroll down to the first date column you do not want to retain, Shift+click on the previous column, include those, scroll to the column after the date columns, and do the same. Please use the Enforce inclusion option so that no warnings/errors are generated when new columns are added.
This way you can easily remove columns from the retained list later.
PS: In your screenshot it seems you forgot to include the + at the end of the expression.
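To see which columns each pattern would pick up, here is a minimal Python sketch. It is not KNIME code: the column list is a hypothetical header row modelled on the names in the question, and it assumes the pattern has to match the whole column name. The negative-lookahead pattern is an extra option not mentioned above; whether the node's dialog accepts it unchanged is an assumption.

```python
import re

# Hypothetical header row, modelled on the column names in the question.
columns = ["Province/State", "Country/Region", "Lat", "Long",
           "1/22/20", "1/23/20", "3/15/20"]

DATE_COLUMN = re.compile(r"\d{1,2}/\d{1,2}/\d{2}")        # m/d/yy or m/dd/yy
NO_SLASH = re.compile(r"[^/]+")                           # pattern suggested above
NOT_A_DATE = re.compile(r"(?!\d{1,2}/\d{1,2}/\d{2}$).+")  # anything that is not a date

# fullmatch() requires the pattern to cover the entire column name.
value_columns = [c for c in columns if DATE_COLUMN.fullmatch(c)]
no_slash_columns = [c for c in columns if NO_SLASH.fullmatch(c)]
retained_columns = [c for c in columns if NOT_A_DATE.fullmatch(c)]

print(value_columns)
print(no_slash_columns)
print(retained_columns)
```

Note that [^/]+ skips Province/State and Country/Region because their names contain a slash, which is why it only helps if the other column names don't have one.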
I want to set the current quarter dynamically, e.g. [2021-01-01, 2021-04-01).
Does Superset support this? If so, how do I configure it?
The Last vs Previous and date range control in general has been a source of confusion for my users.
Last Quarter just shows the last 3 months [because it's a quarter of a year?].
It would be great to have options like Week to date, Month/Period to date, Quarter to date, etc...
Another issue is that each company may define their quarters/periods on different starting dates, depending on their fiscal calendar.
As a stop-gap, I've done the following.
Enriched the underlying dataset with additional columns such as period_start_date and fiscal_quarter_start_date.
Created a fiscal_dates table that contains one row for every day over the years I need to query. Its columns correlate with date columns in my other tables, like dob, fiscal_week_start_date, period_start_date, and fiscal_quarter_start_date. I created this table in Postgres using generate_series.
Created a new virtual dataset containing the column period_start_date, which shows the last 4 years of period start dates.
Used a value native filter to select from the list of dates.
Sorted the values descending and set the default value to "first item in list".
This allows the user to select all records that occur in the same quarter/period, with a default of the current quarter.
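The date-table idea can be sketched in Python as follows. This is only an illustration of the logic, not the Postgres table itself: `fiscal_quarter_start`, `date_table`, and the `fiscal_start_month` parameter are hypothetical names, and the fiscal year is assumed to consist of four three-month quarters beginning on the first day of a month.

```python
from datetime import date, timedelta

def fiscal_quarter_start(d, fiscal_start_month=1):
    """First day of the fiscal quarter containing d."""
    months_since_fy_start = (d.month - fiscal_start_month) % 12
    months_into_quarter = months_since_fy_start % 3
    month = d.month - months_into_quarter
    year = d.year
    if month < 1:  # the quarter began in the previous calendar year
        month += 12
        year -= 1
    return date(year, month, 1)

def date_table(first, last, fiscal_start_month=1):
    """One row per day, similar in spirit to a table built with generate_series."""
    rows = []
    for n in range((last - first).days + 1):
        day = first + timedelta(days=n)
        rows.append({
            "date": day,
            "fiscal_quarter_start_date": fiscal_quarter_start(day, fiscal_start_month),
        })
    return rows

table = date_table(date(2021, 1, 1), date(2021, 4, 2))
# Distinct quarter starts, newest first; the first item acts as the filter default.
options = sorted({r["fiscal_quarter_start_date"] for r in table}, reverse=True)
print(options[0])
```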
The tentative apache/superset#17416 pull request should remedy this problem, i.e., for the QTD you would simply specify the START as datetrunc(datetime("now"), quarter) and leave the END undefined.
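To make the quarter-to-date range concrete, here is a small Python sketch of what truncating a date to its quarter produces. It is not Superset code and does not use the syntax from the pull request; `quarter_bounds` is a hypothetical helper, and it assumes calendar quarters starting in January.

```python
from datetime import date

def quarter_bounds(today):
    """Return (start, end) of the calendar quarter containing today; end is exclusive."""
    first_month = 3 * ((today.month - 1) // 3) + 1
    start = date(today.year, first_month, 1)
    if first_month == 10:  # Q4 rolls over into the next year
        end = date(today.year + 1, 1, 1)
    else:
        end = date(today.year, first_month + 3, 1)
    return start, end

start, end = quarter_bounds(date(2021, 2, 15))
print(start, end)
```

For a quarter-to-date filter only the start is needed; the end stays open so that the range runs up to the present.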
I have joined an Excel sheet with the world population table from Wikipedia (loaded via web link) in Power BI. When I merge these two tables, it shows the population only for the United States; all other countries have null values.
I would really appreciate the help. Screenshots are provided below.
It looks like your merge isn't matching the rows as you expect.
I would try to investigate if there are "invisible" differences in the column values:
"Canada " (with a trailing space) will not match "Canada", for example. To check for this, go into the table you are merging and select Trim for the key column.
In the table you are merging into, do the same Trim operation for the key column.
Edit: Another option is to apply fuzzy matching to the merging process and to limit the amount of fuzziness by setting the maximum number of matches per row and adjusting the similarity threshold up from 0.80 to something closer to the maximum of 1.00 (= exact matching).
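The effect of stray whitespace on an exact-match join can be illustrated outside Power BI with a short Python sketch. The keys and numbers are placeholders, not real population figures, and the similarity score comes from Python's difflib, which is not necessarily the measure Power Query uses for fuzzy matching.

```python
from difflib import SequenceMatcher

# Placeholder lookup table; the values are NOT real population figures.
lookup = {"United States": 100, "Canada": 200, "Mexico": 300}

# Keys as they might arrive from the other table, some with stray spaces.
keys = ["United States", "Canada ", " Mexico"]

exact = [lookup.get(k) for k in keys]            # exact matching on the raw key
trimmed = [lookup.get(k.strip()) for k in keys]  # matching after trimming

def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 (1.0 means identical strings)."""
    return SequenceMatcher(None, a, b).ratio()

print(exact)
print(trimmed)
print(similarity("Canada ", "Canada"))
```

Only the key without stray spaces matches exactly; after trimming, all three match. The untrimmed pair still scores high on similarity, which is why a threshold below 1.00 would also pair them.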
I think the issue is the left join. Try joining the tables on the country column with a join kind that keeps all rows.
Click the dropdown, select the full outer option, and expand the merged column.
I've got an oddly set up Google Sheet as my data source for a Power BI dashboard. Right now my main stress point is a 'last 7 days' filter that needs to be applied. The problem is that there are multiple columns containing dates that could be in the last 7 days, in this case representing multiple steps in an email chain.
If any one of those columns contains a date in the last 7 days, then I need to capture the row in most of my visualizations and tables, but if I just use standard filters, Power BI assumes 'AND' and displays none of the rows, since there will almost never be a row where multiple date entries are in the last 7 days.
I'm almost certain there is a way to do this with either merged columns or calculated fields, or maybe there is even something as simple as an 'OR' filter, but thus far my googling has not turned up anything. Do you know a workaround for this?
Thanks in advance!
In Power Query Editor, create a duplicate column for each of your date fields.
Make sure each of the duplicated columns is in Date format and then calculate the "Age". You will get a time value. In the Transform pane, use the "Duration" function and convert to "Days". Do this for each of the duplicated columns.
Now the last step: Create a "conditional column" in the "Add column" pane, and pull all of these new columns that should now have integer values and set the condition to show "Yes" if less than or equal to 7, "No" if more than 7.
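The 'OR' logic behind that conditional column can be sketched in Python. This is not Power Query code: `any_within_days` is a hypothetical helper, empty cells are represented as None, and unlike a plain "age less than or equal to 7" test it also excludes dates in the future.

```python
from datetime import date, timedelta

def any_within_days(row_dates, today, days=7):
    """True if at least one non-empty date falls within the last `days` days."""
    cutoff = today - timedelta(days=days)
    # None stands for an empty cell; dates after `today` do not count.
    return any(d is not None and cutoff <= d <= today for d in row_dates)

today = date(2024, 1, 10)
recent_row = [date(2023, 11, 1), None, date(2024, 1, 5)]  # one recent step
stale_row = [date(2023, 12, 1), None, None]               # nothing recent

print(any_within_days(recent_row, today))
print(any_within_days(stale_row, today))
```

The resulting flag plays the role of the "Yes"/"No" column, so a single filter on it replaces the separate per-column filters.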
Let me know if this helps.
I have three .xlsx files full of data that has already been processed, so I mostly just need to display the data in charts. The problem is that Power BI aggregates all numeric data (using count, sum, etc.). In the community forums they suggest creating new measures, but in that case I have to create a lot of measures. I also tried converting the data to text, and Power BI still counts it.
Any help, please?
There are several ways to tackle this:
When you pull a field into the field well for a visualisation, you can click the drop-down in the field well and select "Don't summarize".
In the data model, select the column and, on the ribbon, select "Don't summarize" as the summarization option in the Properties group.
The screenshot shows the field well option on the left and the data model options on the right, one for a numeric and one for a text field.
And, yes, you never want to use the implicit measures, i.e. the automatic calculations that Power BI creates. If you want to keep on top of what is being calculated, create your own measures, and yes, there will be many.
Edit: If by "aggregating" you are referring to the fact that text values will be grouped in a table (you don't see any duplicates), then you need to add a column with unique values to the table so all the duplicates of the text values show up. This can be done in the data source by adding an Index column, then using that Index column in the table and setting it to a very narrow width to make it invisible.
I am rather new to Power BI and I have a question I can't find the answer to.
I want to import a table that has some label columns, with repeated items, and more than 15 data columns.
My desired result would be to group by the label columns, so there are no repeated items, and aggregate the values of the remaining columns.
Is there a way to do that in PQ editor or DAX ?
I appreciate any help or direction you can give me!
A sample of the table (it's much bigger, with multiple values in the first three columns)
Table Sample
Thanks a lot
Edit: From that sample, the output I want is the following:
Output Sample
The thing is, there are many different values in the first columns, and I need to aggregate all the other values while keeping their column names (because this info is already linked to other files).
Maybe the only way is to group by and add the columns, renaming them one by one?
I want to do this in a couple of files, so if you know of another way please let me know!
In the query editor, import your table. Then go to Home > Group By, choose the columns to group by and the aggregations you want, and that's it.
If you just want to remove duplicate rows, group by all the columns you don't want to aggregate; the rest can be aggregated however you like.
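As a rough illustration of what Group By does here, the following Python sketch groups rows by the label columns and sums every other column while keeping the original column names. It is not Power Query code: `group_and_sum` and the sample column names are hypothetical, and sum is assumed as the aggregation.

```python
from collections import defaultdict

def group_and_sum(rows, label_columns):
    """Group rows by the label columns and sum all remaining columns."""
    totals = {}
    for row in rows:
        key = tuple(row[c] for c in label_columns)
        bucket = totals.setdefault(key, defaultdict(int))
        for name, value in row.items():
            if name not in label_columns:
                bucket[name] += value  # data columns keep their original names
    return [dict(zip(label_columns, key), **sums) for key, sums in totals.items()]

# Hypothetical sample: two label columns, two data columns.
rows = [
    {"region": "A", "product": "x", "jan": 1, "feb": 2},
    {"region": "A", "product": "x", "jan": 3, "feb": 4},
    {"region": "B", "product": "y", "jan": 5, "feb": 6},
]
result = group_and_sum(rows, ["region", "product"])
print(result)
```

Because the data columns are discovered from each row rather than listed one by one, the same function works for a table with 15 or more value columns.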