Glue Custom Classifier for CSV without Delimiter and without Quote Symbol - amazon-web-services

I have a source text file that has no header, no delimiter, and no quote symbol between the fields.
I need to define a Glue classifier and then a crawler to crawl this table's schema into the Data Catalog database. I have tried creating the classifier both with CDK v2 and from the Console, but with no luck.
Example rows:
12345678ab 12 cdefg 12345
11223344cde 34 aabb 54321
Column 1: position 1-8
Column 2: position 9-13
Column 3: position 10
Column 4: position 11-13
Column 5: position 14-20
Column 6: position 21-25
How can I customize this in the Glue Classifier?
I appreciate your help very much.
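For reference, since Glue's built-in CSV classifier needs a delimiter, fixed-width layouts like this are usually handled with a custom grok classifier whose patterns each consume a fixed number of characters. Below is a minimal CDK v2 (Python) sketch of that direction; the pattern names, column widths, and classifier name are placeholders I made up and have not verified against this exact file:

from aws_cdk import Stack
from aws_cdk import aws_glue as glue
from constructs import Construct

class FixedWidthClassifierStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Custom grok patterns: each one matches a fixed number of characters.
        # Adjust the widths to the real column positions of the file.
        custom_patterns = "\n".join([
            "COL_A .{8}",
            "COL_B .{5}",
            "COL_C .{7}",
            "COL_D .{5}",
        ])

        glue.CfnClassifier(
            self, "FixedWidthClassifier",
            grok_classifier=glue.CfnClassifier.GrokClassifierProperty(
                classification="fixedwidth",    # arbitrary label the crawler will report
                name="fixed-width-classifier",  # placeholder name
                grok_pattern="%{COL_A:col_a}%{COL_B:col_b}%{COL_C:col_c}%{COL_D:col_d}",
                custom_patterns=custom_patterns,
            ),
        )

The classifier then has to be attached to the crawler (the classifiers property of the crawler construct) so the crawler tries it before the built-in classifiers.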

Related

Power BI - CRUD analysis how?

I have 2 data sources - Source 1 (SharePoint List) and Source 2 (Cloud Source). I bring both into Power BI. Each has a key that identifies a unique instance of a record.
In Power BI I have been asked to identify the New Inserts / Deletes and Updated Records.
So is there an easy way of doing this?
Table 1
Key Column 1 Column 2 Column 3
Table 2
Key Column 1 Column 2 Column 3
You can use the Merge Queries transformation in Power Query Editor to do that. The Left Anti and Right Anti join kinds will give you the rows that exist only in the first or only in the second data source. Inner will give you the rows that exist in both (based on their key value), and you can then compare the other columns to decide whether they are modified or not.
Let's assume there are two data sources, as follows:
Source 1
Key  Column 1       Column 2       Column 3
1    initial value  initial value  initial value
2    initial value  initial value  initial value
3    initial value  initial value  initial value
Source 2
Key  Column 1        Column 2       Column 3
1    initial value   initial value  initial value
2    modified value  initial value  initial value
4    initial value   initial value  initial value
Key 1 exists in both sources and is not modified;
Key 2 exists in both sources but is modified (Column 1 has different values);
Key 3 exists only in Source 1;
Key 4 exists only in Source 2.
In Power Query Editor, in Home tab of the ribbon, in Combine group, click on Merge Queries -> Merge Queries as New, select Source 1 as the first source, Source 2 as the second source, set the join kind to be Left Anti and make sure Key columns in both sources are selected:
This will give you the rows that exist only in Source 1, i.e. only key 3 (remove the columns from Source 2 there, because they are not needed):
Do the same merge again, but swap the sources:
to get the rows that exist only in Source 2, i.e. key 4:
And then do it again, but this time set the join kind to be Inner:
Click the button in the header of the Source 2 column and add all the columns except Key, then add a conditional column (Add Column -> General -> Conditional Column) as follows (note that the screenshot is incorrect - the third comparison should be between Column 3 and Source 2.Column 3):
It will tell you whether each row is modified (key 2) or not (key 1):
If you want, you can click the button in the header of the custom column and filter the result to show only the rows where the value is Modified, which will leave only key 2 in the result:
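Since several of the screenshots referenced above are missing, here is the same three-merge logic expressed as a rough pandas sketch, for illustration only (it is not the Power Query steps themselves, and only Column 1 is carried along to keep it short):

import pandas as pd

# The example sources from above, reduced to the key and one value column
source1 = pd.DataFrame({
    "Key": [1, 2, 3],
    "Column 1": ["initial value", "initial value", "initial value"],
})
source2 = pd.DataFrame({
    "Key": [1, 2, 4],
    "Column 1": ["initial value", "modified value", "initial value"],
})

# "Left Anti": rows whose key exists only in Source 1 (deletes)
deleted = source1[~source1["Key"].isin(source2["Key"])]

# "Right Anti": rows whose key exists only in Source 2 (new inserts)
inserted = source2[~source2["Key"].isin(source1["Key"])]

# "Inner": keys present in both, then compare the other columns to flag updates
both = source1.merge(source2, on="Key", suffixes=("", " (Source 2)"))
updated = both[both["Column 1"] != both["Column 1 (Source 2)"]]

print(deleted)   # key 3
print(inserted)  # key 4
print(updated)   # key 2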

Generating multiple rows from one row

I have a data source in Power BI with the following format:
Sale# Date Value Installments
1 Jan/2020 150,00 2
2 Mar/2020 210,00 3
The Installments column is the number of payments the sale will be divided into. I need to transform the above data source into one line per payment, so if the first sale has two installments, the payment will be divided over two months:
Sale# Date Value
1 Jan/2020 75,00
1 Feb/2020 75,00
2 Mar/2020 70,00
2 Apr/2020 70,00
2 May/2020 70,00
You can do this in Power Query by following the steps below.
Step 1: Add a custom column to your table as shown below.
This will generate a list per row.
Step 2: Expand the list as shown below (right-click on the new column).
You now have the data below.
Step 3: Add a custom column as shown in the image below for the incremental date.
Step 4: Add another custom column as shown in the image below for the equal installment amount.
Step 5: You will now have the data below. Just remove the yellow-marked columns and you will get your final desired output.
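The exact formulas were in the missing screenshots, so here is the same logic as a rough pandas sketch, for illustration only: generate one row per installment, shift the date by one month for each installment, and split the value evenly. The sample data matches the question.

import pandas as pd

# Sample input from the question
sales = pd.DataFrame({
    "Sale#": [1, 2],
    "Date": pd.to_datetime(["2020-01-01", "2020-03-01"]),
    "Value": [150.0, 210.0],
    "Installments": [2, 3],
})

rows = []
for _, sale in sales.iterrows():
    n = int(sale["Installments"])
    for i in range(n):
        rows.append({
            "Sale#": sale["Sale#"],
            # one month later per installment
            "Date": sale["Date"] + pd.DateOffset(months=i),
            # equal installment amount
            "Value": sale["Value"] / n,
        })

result = pd.DataFrame(rows)
print(result)  # sale 1: Jan/Feb at 75.0; sale 2: Mar/Apr/May at 70.0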

How to load & use column names/headers > 65 char in IICS?

I have a CSV/flat file whose column names contain a number of months and a date, e.g. "Benchmark Bloomberg 6 months in year 10/30/2018".
Informatica (IICS) fails to load field names longer than 65 characters, so I loaded the header as data - the first row. Now I need to unpivot and apply logic based on the "original column names", i.e. if the month was 6 and the date was Oct 30 2018, compare with the created date and do X. My best approach is as below. Please suggest a better approach.
1) Load the column names as data as well.
2) Take out row 1 and store it as a one-row table.
3) Unpivot the table to make a one-column table, and re-pivot it to make column names.
4) Apply this to the original table in SQL (no issues with > 65 characters there).
If your fields will always be in the same sequence, edit the header names in the source transformation. Configure the source so that data starts on the second row; there may be an option to ignore the header row values. It should be clear from the UI once you edit the transformation.
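As an illustration only (outside IICS), the four-step workaround from the question - load the long headers as data, take out row 1, and unpivot against it - looks roughly like this in pandas; the file name input.csv is a placeholder:

import pandas as pd

# header=None so the >65-character column names arrive as ordinary data in row 0
df = pd.read_csv("input.csv", header=None)

header_row = df.iloc[0]                    # the original column names as a one-row table
data = df.iloc[1:].reset_index(drop=True)  # the actual data

# Unpivot the data and attach the original column name to each value,
# so downstream logic can parse the month and date out of the long header text
long = data.melt(var_name="column_index", value_name="value")
long["original_column_name"] = long["column_index"].map(header_row)

print(long.head())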

How to convert attributes with number values to text values using Case/If function in MicroStrategy Visual Insights?

I have 2 reports/data sets to create a dashboard in Visual Insight. One data set is from Teradata (directly connected to MicroStrategy). The other data set is from Google BigQuery (connected to MicroStrategy via Intelligent Cube connector). The key of these 2 data sets is Categories.
The problem is that the Categories attribute in Teradata has number values, i.e. 55, 45, 14, 29, 30, etc., while the values of Categories from the BigQuery data set are text, i.e. Food and Fashion. Food consists of the numbers 55, 45 & 14; the numbers 29 & 30 make up Fashion. I tried grouping the numbers as text under the corresponding names, but the new grouped Teradata attribute doesn't link properly with the other data set.
So my challenge is how to align these 2 data sets on the key attribute and link them properly. I'm thinking of creating a new attribute using the Case/If function but haven't figured it out. Any other suggestion would also be very much appreciated!
Thank you very much,
Willow
You need to create a new table or a view in MicroStrategy holding both CategoryDESC and CategoryID, where you will have the following:
Teradata
Column1
55
45
14
29
30
BigQuery
Column1
Food
Fashion
New table
CategoryDESC CategoryID
Food 55
Food 45
Food 14
Fashion 29
Fashion 30
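For illustration only, the way this bridge table links the numeric Teradata values to the BigQuery text values can be expressed as a simple join; here is a pandas sketch (in MicroStrategy it would be the table or view described above):

import pandas as pd

# Numeric categories coming from Teradata
teradata = pd.DataFrame({"CategoryID": [55, 45, 14, 29, 30]})

# The new bridge table mapping text categories to their numeric IDs
bridge = pd.DataFrame({
    "CategoryDESC": ["Food", "Food", "Food", "Fashion", "Fashion"],
    "CategoryID": [55, 45, 14, 29, 30],
})

# Joining through the bridge gives each Teradata row the text label used by BigQuery
labelled = teradata.merge(bridge, on="CategoryID")
print(labelled)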

AWS quicksight parseInt() returns null

I'm trying to generate a QuickSight analysis with a simple .csv file. The file contains some arbitrary data like
Yifei, 24, Male, 2
Joe, 30, Male, 3
Winston, 40, Male, 7
Emily, 18, Female, 5
Wendy, 32, Female, 4
I placed the file in an S3 bucket and then used AWS Athena to parse it into a table. The table treats all columns as strings, and I can query it properly:
SELECT * FROM users
returns
name age gender consumed
1 Yifei 24 Male 2
2 Joe 30 Male 3
3 Winston 40 Male 7
4 Emily 18 Female 5
5 Wendy 32 Female 4
Okay, so far so good. Then in QuickSight, I import the table as a dataset, and it's properly displayed under Fields with the correct values. The only remaining problem is that age and consumed are treated as strings, not numbers. So I created two calculated fields:
age_calc: parseInt({age})
consumed_calc: parseInt({consumed})
This works just fine; under Fields I can now see the newly created fields with correct values. However, once I try to create an actual visualization (for example, a pie chart of how much everyone consumed) using the field consumed_calc, the value of consumed_calc is just null.
I found the issue. Basically, CSV does not work very well with spaces: despite the calculated fields showing the correct result in preview, a field like " 23" gets an error when parsed. Removing the spaces in the original .csv file solved this issue.
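If editing the file by hand is not practical, the stray spaces can be stripped programmatically before the file is uploaded to S3; a small Python sketch (file names are placeholders). Wrapping the field in QuickSight's trim() before parseInt() may also work, though I have not verified that against this exact data.

import csv

# Read the original file, strip leading/trailing spaces from every field,
# and write a cleaned copy whose numeric fields parseInt() can handle
with open("users.csv", newline="") as src, open("users_clean.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow([field.strip() for field in row])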