How to make Power Query case-insensitive for duplicate removal?

In Power BI, I open Transform data and import a table.
The table has a name column with data like the following:
Product 1
product 1
When I remove duplicates, Power Query keeps both of the rows above, treating them as unique values because the comparison is case-sensitive.
How can I make Power Query case-insensitive for the purpose of duplicate removal?

Since you posted no code, I am assuming you did this from the UI. So:
Go into the Advanced Editor
Locate the line that starts with Table.Distinct
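Change the equation criteria to something like: Table.Distinct(previousStep, {"ColumnName", Comparer.OrdinalIgnoreCase}), where previousStep and ColumnName stand for your own step and column names.
Be sure to add this in the correct location.
Check the Microsoft documentation for Table.Distinct.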
If you can't figure it out, post the relevant M-Code.
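To make the intended behaviour concrete, here is a minimal Python sketch of what a case-insensitive duplicate removal does to the values from the question. It is not M code and not how Power Query is implemented; the function name distinct_ignore_case, the extra value "Product 2", and the choice to keep the first occurrence are all assumptions made for illustration, and str.casefold only approximates an ordinal ignore-case comparison.

```python
def distinct_ignore_case(values):
    """Keep the first occurrence of each value, ignoring case when comparing."""
    seen = set()
    result = []
    for value in values:
        # Case-insensitive comparison key (an approximation of OrdinalIgnoreCase).
        key = value.casefold()
        if key not in seen:
            seen.add(key)
            result.append(value)
    return result


# "Product 2" is a hypothetical extra row, added only to show that
# genuinely different values survive the de-duplication.
names = ["Product 1", "product 1", "Product 2"]
deduplicated = distinct_ignore_case(names)
print(deduplicated)
```

With a case-sensitive comparison all three values would be kept; with the case-insensitive key, "Product 1" and "product 1" collapse into one row.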

Related

How to select multiple values from a filter in Google Sheets?

I've got lots of data in a Google Sheet (I do not have Excel or Windows, as I am on a Chromebook), and I want to filter one column so that it shows only cells that contain two different words. The column might contain various values.
Example
Cell 1 Acme - Main - Location
Cell 2 Acme - Secondary - Location
Cell 3 Acme - Location - Main
Sticking with the above example, I would like to use the data filters set at the column headers to show only the cells that contain both Acme and Main.
What is the best way of doing this, please?
I tried using the Text Contains option in the data filter, but I'm not sure how to enter both words as filter criteria; it seems to match the words only exactly as they are typed. So if I type Acme Main into the filter, it works only for cells that contain the words in that exact order.
If the order of the "Acme" and "Main" combination does not matter, you could use:
=REGEXMATCH(A1:A, "Acme(.+)Main|Main(.+)Acme")
If you also want it to be case-insensitive, use:
=REGEXMATCH(LOWER(A1:A), "acme(.+)main|main(.+)acme")
In the filter options, use this custom formula:
=regexmatch(A1:A, "Acme(.+)Main")
and see if that works.
Change column reference to suit.
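As a rough check of what the order-independent pattern accepts, the following Python sketch runs it against the three example cells from the question. It assumes that Python's re.search behaves like REGEXMATCH for this simple pattern (a match anywhere in the text); the variable names are made up for the illustration.

```python
import re

# The order-independent pattern from the answer above.
PATTERN = re.compile(r"Acme(.+)Main|Main(.+)Acme")

# The three example cells from the question.
cells = [
    "Acme - Main - Location",
    "Acme - Secondary - Location",
    "Acme - Location - Main",
]

# True where the cell contains both words, in either order.
matches = [bool(PATTERN.search(cell)) for cell in cells]

# Case-insensitive variant: lower-case the text and the pattern,
# mirroring the LOWER(...) formula.
PATTERN_LOWER = re.compile(r"acme(.+)main|main(.+)acme")
matches_lower = [bool(PATTERN_LOWER.search(cell.lower())) for cell in cells]

print(matches)
```

Note that (.+) requires at least one character between the two words, so a cell containing "AcmeMain" with nothing in between would not match.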

Power BI filtering URL query Parameters

I am trying to filter Power BI reports using URL query filters. The field name I am trying to filter on contains spaces, so I am passing the parameter like this:
?filter=DW_Project/Project_x0020_Manager_x0020_Name eq 'Max Hex'
But the reports are not being filtered.
Instead, I am getting an error (posted as an image).
Can anyone please tell me what I am missing here?
The encoding looks correct. The space is indeed escaped with _x0020_, as per the documentation. Check the names of the table and the field and make sure they match exactly. Note that these names are case-sensitive, and you will get this error if they do not match. Since you posted only images, I can't check, but DW__Project looks like it contains two underscores, not one, while your URL has only one.
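To avoid hand-typing the escaped names, the filter string can be assembled programmatically. The Python sketch below only handles spaces, which is the one case discussed here; the helper names escape_name and build_filter are hypothetical, and the table and field names must still match the data model exactly, including case and the number of underscores.

```python
def escape_name(name: str) -> str:
    """Escape spaces in a table or field name as _x0020_ (spaces only)."""
    return name.replace(" ", "_x0020_")


def build_filter(table: str, field: str, value: str) -> str:
    """Build a '?filter=Table/Field eq 'value'' query string."""
    return f"?filter={escape_name(table)}/{escape_name(field)} eq '{value}'"


# Names taken from the question; adjust them to the real model names.
query = build_filter("DW_Project", "Project Manager Name", "Max Hex")
print(query)
```

If the table is really named DW__Project, passing that name to build_filter yields a URL with both underscores preserved.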

How to select columns in Hive SQL with the same prefix (beginning) or suffix (ending) or key word in the middle (including)

EDIT 1: I know some readers will ask why I don't store the different pieces of information in separate attributes, so that I have a properly relational database to query. The real case is not like the example below; the variable names are used here only for convenience.
EDIT 2: To reduce confusion about the database design, I changed the variable names in the example.
In a Hive query, I am looking for a way to select columns from one table whose names share the same prefix, share the same suffix, or contain the same keyword in the middle.
Here is an example: I have a list of variables like this:
a_A_1, a_A_2, a_B_1, a_B_2,
b_A_1, b_A_2, b_B_1, b_B_2
Exercise 1
I want to select all the attributes starting with 'a'.
Exercise 2
I want to select all the attributes ending with '1'.
Exercise 3
I want to select all the attributes including 'B'.
Much thanks in advance!
Luckily I found a way to do so and I hope it can benefit many others who are looking for the same answer.
First of all, you need to run this setting in your Hive environment:
set hive.support.quoted.identifiers=none;
See solutions below
Exercise 1
select `a.*` from test_table;
Exercise 2
select `.*1$` from test_table;
Exercise 3
select `.*B.*` from test_table;
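The three patterns can be tried outside Hive to see which names they pick. The Python sketch below applies them to the variable list from the question, assuming each pattern has to match the whole column name; select_columns is a hypothetical helper. The sketch matches case-sensitively, whereas Hive column names are case-insensitive, so the result for Exercise 3 on a real table may differ.

```python
import re

# Variable names from the question.
columns = ["a_A_1", "a_A_2", "a_B_1", "a_B_2",
           "b_A_1", "b_A_2", "b_B_1", "b_B_2"]


def select_columns(pattern, names):
    """Return the names that the pattern matches in full (case-sensitive)."""
    return [name for name in names if re.fullmatch(pattern, name)]


exercise_1 = select_columns(r"a.*", columns)    # names starting with 'a'
exercise_2 = select_columns(r".*1$", columns)   # names ending with '1'
exercise_3 = select_columns(r".*B.*", columns)  # names containing 'B'

print(exercise_1, exercise_2, exercise_3)
```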

Aqua Data Studio, compare results, column names case-sensitive

Do any Aqua Data Studio users out there know how to turn off case sensitivity when comparing results?
For example, if column 1 is called "test" in one query and "TEST" in another, Aqua Data Studio does not treat them as the same column when comparing results. How can I turn this off?
I can ignore upper/lower case in the result set, but not in the column names.
Renaming every column manually each time is a pain. Does anybody know?
For Results Compare, can't you change your SQL queries so the column names use the same case, using UPPER or an alias for the column name? For example, I used UPPER("category") AS CATEGORY, and this solved the problem you are having.
For Schema Compare, do the following:
Under File->Options, Compare, enable the Ignore Case option.
When you perform a Schema Compare, under Object Alignment, you can select Ignore Case.

How to Scan HBase Rows efficiently

I need to write a MapReduce job that gets all rows in a given date range (say, the last month). It would have been a cakewalk had my row key started with the date, but my frequent HBase queries are on the starting values of the key.
My row key is exactly A|B|C|20120121|D, where the combination of A/B/C along with the date (in YearMonthDay format) makes a unique row ID.
My HBase tables could have up to a few million rows. Should my mapper read the whole table and check whether each row falls in the given date range, or can a Scan / Filter help handle this situation?
Could someone suggest a way (or a snippet of code) to handle this situation effectively?
Thanks
-Panks
A RowFilter with a regex comparator would work, but would not be the optimal solution. Alternatively, you can try to use secondary indexes.
One more solution is to try the FuzzyRowFilter. A FuzzyRowFilter uses a kind of fast-forwarding, skipping many rows during the scan, and will thus be faster than a RowFilter scan. You can read more about it here.
Alternatively, Bloom filters might also help, depending on your schema. If your data is huge, you should do a comparative analysis of secondary indexes and Bloom filters.
You can use a RowFilter with a RegexStringComparator. You'd need to come up with a RegEx that filters your dates appropriately. This page has an example that includes setting a Filter for a MapReduce scanner.
I am just getting started with HBase, but Bloom filters might help.
You can modify the Scan that you send into the Mapper to include a filter. If your date is also the record timestamp, it's easy:
Scan scan = new Scan();
scan.setTimeRange(minTime, maxTime);
TableMapReduceUtil.initTableMapperJob("mytable", scan, MyTableMapper.class,
OutputKey.class, OutputValue.class, job);
If the date in your row key is different, you'll have to add a filter to your scan. This filter can operate on a column or a row key. I think it's going to be messy with just the row key. If you put the date in a column, you can make a FilterList where all conditions must be true and use a CompareOp.GREATER and a CompareOp.LESS. Then use scan.setFilter(filterList) to add your filters to the scan.
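Whichever filter is used, the per-row test on the key itself is simple. The Python sketch below (not HBase API code) shows the check on a row key of the form A|B|C|20120121|D; it assumes the date is always the fourth "|"-separated field and that A, B and C never contain "|". The helper names and the second sample key are hypothetical. Because the date is in YearMonthDay format, plain string comparison orders the dates correctly.

```python
def date_from_row_key(row_key: str) -> str:
    """Extract the YYYYMMDD field, assumed to be the fourth '|'-separated part."""
    return row_key.split("|")[3]


def in_date_range(row_key: str, start: str, end: str) -> bool:
    """True if the key's date lies in [start, end]; all dates are YYYYMMDD strings."""
    return start <= date_from_row_key(row_key) <= end


# First key is from the question; the second is a made-up key outside the range.
keys = ["A|B|C|20120121|D", "A|B|C|20111215|D"]
selected = [key for key in keys if in_date_range(key, "20120101", "20120131")]
print(selected)
```

A regex for a RowFilter would have to encode the same range as a pattern over the key, which is why the answer above calls the row-key-only approach messy.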