Glue Classifier could not classify columns using Grok pattern

I have an S3 bucket structured using the format s3://<bucket-name>/year=<yearno>/month=<monthno>/day=<dayno>/<filename>.log. The lines in the .log files are structured like:
2020-01-06 09:05:14,450 INFO [Asterisk-Java DaemonPool-1-thread-3] handler.CallHandler (CallHandler.java:849) - Original name : harris changed to : haris . Exist? true
While the Grok pattern that I'm using for the classifier is:
[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]{12} INFO \[Asterisk-Java DaemonPool-1-thread-[0-9]{1,3}] handler.CallHandler \(CallHandler.java:849\) - Original name : %{WORD:original_name} changed to : %{WORD:transformed_name} . Exist\? %{WORD:exist_prior}
I checked my Grok pattern using a debugger web app, and it's confirmed to be correct. This is the table I expected:
+------+-------+-----+---------------+------------------+-------------+
| year | month | day | original_name | transformed_name | exist_prior |
+------+-------+-----+---------------+------------------+-------------+
| -    | -     | -   | -             | -                | -           |
+------+-------+-----+---------------+------------------+-------------+
However, the table that I've gotten is:
+------+-------+-----+------+------+------+------+
| year | month | day | col0 | col1 | col2 | col3 |
+------+-------+-----+------+------+------+------+
| -    | -     | -   | -    | -    | -    | -    |
+------+-------+-----+------+------+------+------+
Where did I go wrong?

I changed my capture regex from %{WORD:variable_name} to %{DATA:variable_name}. It then worked as expected.
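To see the difference between the two primitives, here is a minimal Python sketch using approximate regex equivalents of the standard Grok pattern definitions (WORD as `\b\w+\b`, DATA as `.*?` are assumptions mirroring the common Grok pattern files, not Glue internals):

```python
import re

# Approximate regex equivalents of the Grok primitives (assumed from the
# standard Grok pattern definitions).
WORD = r"\b\w+\b"   # %{WORD}
DATA = r".*?"       # %{DATA}

line = ("2020-01-06 09:05:14,450 INFO [Asterisk-Java DaemonPool-1-thread-3] "
        "handler.CallHandler (CallHandler.java:849) - "
        "Original name : harris changed to : haris . Exist? true")

# Tail of the classifier pattern, with %{DATA} for the name captures.
pattern = re.compile(
    r"Original name : (?P<original_name>" + DATA + r") "
    r"changed to : (?P<transformed_name>" + DATA + r") \. "
    r"Exist\? (?P<exist_prior>" + WORD + r")"
)

m = pattern.search(line)
```

This only demonstrates that the non-greedy DATA captures slice the names out of this line; why Glue's classifier rejects the WORD version is not something the regex alone shows.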

Related

PowerBI filter Table on 2 different columns whose values are obtained from another table

I am new to PowerBI. I am trying to implement the following scenario in PowerBI
I have the following 2 tables -
Table 1:
| ExtractionId | DatasetId | UserId |
| -- | --- | --- |
| E_ID1 | D_ID1 | sta#example.com |
| E_ID2 | D_ID1 | dany#example.com |
| E_ID3 | D_ID2 | dany#example.com |
Table 2:
| DatasetId | Date | UserId | Status |
| --| --- | --- | --- |
| D_ID1 | 05/30/2021 | sta#example.com | Completed |
| D_ID1 | 05/30/2021 | dany#example.com | Completed |
| D_ID1 | 05/31/2021 | sta#example.com | Partial |
| D_ID1 | 05/31/2021 | dany#example.com | Completed |
| D_ID2 | 05/30/2021 | sta#example.com | Completed |
| D_ID2 | 05/30/2021 | dany#example.com | Completed |
| D_ID2 | 05/31/2021 | sta#example.com | Partial |
| D_ID2 | 05/31/2021 | dany#example.com | Completed |
I am trying to create a PowerBI report where, given an extraction id (in a slicer), we need to identify the corresponding DatasetId and UserID from Table 1 and use those fields to filter Table 2 and provide a visual of user status on the given date range.
When implementing this, I create a many-to-many relationship between the DatasetId columns of Table 1 and Table 2, but I cannot do the same for the UserId column at the same time, as I get the following error:
You can't create a direct active relationship between Table1 and Table2 because an active set of indirect relationship already exists.
Because of this, given an ExtractionId, I can filter on DatasetId but not UserId, and vice versa. Could you please help me understand what mistake I am making here and how to resolve it?
Thanks in advance
In this case, you can merge the two columns (DatasetId and UserId) into a single combined column in each table, then create the relationship on that merged column.
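The composite-key idea can be sketched outside Power BI. Here is a minimal Python illustration (rows and column names taken from the tables above, the `merged_key` helper is hypothetical) showing how a single merged DatasetId|UserId column lets one relationship filter Table 2 by both fields at once:

```python
# A subset of Table 1 and Table 2 from the question.
table1 = [
    {"ExtractionId": "E_ID1", "DatasetId": "D_ID1", "UserId": "sta#example.com"},
    {"ExtractionId": "E_ID2", "DatasetId": "D_ID1", "UserId": "dany#example.com"},
    {"ExtractionId": "E_ID3", "DatasetId": "D_ID2", "UserId": "dany#example.com"},
]
table2 = [
    {"DatasetId": "D_ID1", "Date": "05/30/2021", "UserId": "sta#example.com", "Status": "Completed"},
    {"DatasetId": "D_ID1", "Date": "05/31/2021", "UserId": "sta#example.com", "Status": "Partial"},
    {"DatasetId": "D_ID1", "Date": "05/30/2021", "UserId": "dany#example.com", "Status": "Completed"},
]

def merged_key(row):
    # The single merged column that replaces the two separate relationships.
    return row["DatasetId"] + "|" + row["UserId"]

def filter_by_extraction(extraction_id):
    # Slicer selects an ExtractionId; the merged key propagates the filter.
    keys = {merged_key(r) for r in table1 if r["ExtractionId"] == extraction_id}
    return [r for r in table2 if merged_key(r) in keys]

rows = filter_by_extraction("E_ID1")
```

In Power BI the same merge would be done in Power Query (or as a calculated column) in both tables, with one relationship on the merged column.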

AWS Oracle DMS show full row each time

I have an Oracle RDS instance configured with DMS with an S3 target.
After full load and during ongoing replication, when I update a row with a new value, the DMS file that is created only shows the columns that were updated, but I want the whole row in its current state in the database.
Example:
| client_id | client_name | age |
| :---: | :---: | :----: |
| 1 | John Smith| 46|
| 2 | Jane Doe | 25 |
I then update John's age to 47. I would expect the DMS output to look like this:
| Op | DMS_TIMESTAMP | client_id | client_name | age |
| :---: | :----: | :---: | :---: | :---: |
| u | 2022-01-01 12:00:00 | 1 | John Smith | 47 |
However the file I receive looks like this:
| Op | DMS_TIMESTAMP | client_id | client_name | age |
| :---: | :----: | :---: | :---: | :---: |
| u | 2022-01-01 12:00:00 | 1 | null | 47 |
According to the docs the DMS row should represent the current state of the row but all of my columns that are not a primary key seem to be missing, despite the row having correct values in the database. Am I missing a configuration?
I was missing a part of the documentation that explains that if you want the values of all the columns of a row, you need to apply the following to the table:
ALTER TABLE table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
As I needed to apply this for all the tables in a schema, I created this loop to apply it.
BEGIN
    FOR I IN (
        SELECT table_name, owner
        FROM ALL_TABLES
        WHERE owner = 'SCHEMA_OWNER'
    ) LOOP
        BEGIN
            -- Print table name before attempting the alter
            DBMS_OUTPUT.PUT_LINE('Attempting to alter ' || I.table_name || ' at ' || current_timestamp);
            EXECUTE IMMEDIATE 'alter table SCHEMA_OWNER.' || I.table_name || ' ADD SUPPLEMENTAL LOG DATA (all) columns';
        EXCEPTION
            WHEN OTHERS THEN
                DBMS_OUTPUT.PUT_LINE(I.table_name || ' alteration failed at ' || current_timestamp);
        END;
    END LOOP;
END;

How to add "description" for a column in Postgres DB using the corresponding Django model?

For example, in the table below, I'd like to be able to add the "description" text at the Django ORM layer and have it reflected at the database level.
test=# \d+ django_model
         Table "public.django_model"
 Column |  Type   | Modifiers | Description
--------+---------+-----------+-------------
 i      | integer |           |
 j      | integer |           |
Indexes:
    "mi" btree (i), tablespace "testspace"
    "mj" btree (j)
Has OIDs: no
I suppose you can't do it through the ORM. Here's the corresponding feature request: https://code.djangoproject.com/ticket/13867. It was closed six years ago as "won't do".
You can still use the Postgres COMMENT statement, e.g.:
t=# create table t (i int, t text);
CREATE TABLE
Time: 12.068 ms
t=# comment on column t.i is 'some description';
COMMENT
Time: 2.994 ms
t=# \d+ t
Table "postgres.t"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+------------------
i | integer | | | | plain | | some description
t | text | | | | extended | |
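From the Django side, one way to keep such comments under version control is to emit the same COMMENT statement from a migration via `migrations.RunSQL`. A minimal sketch of the SQL-building step (the `column_comment_sql` helper name is hypothetical; table and column names are the ones from the example above):

```python
def column_comment_sql(table, column, description):
    # Build a Postgres COMMENT statement; identifiers are double-quoted and
    # single quotes in the description are escaped by doubling them.
    escaped = description.replace("'", "''")
    return 'COMMENT ON COLUMN "{}"."{}" IS \'{}\';'.format(table, column, escaped)

sql = column_comment_sql("django_model", "i", "some description")
# In a Django migration this string would go into migrations.RunSQL(sql),
# with a reverse_sql of COMMENT ON COLUMN ... IS NULL to undo it.
```

This keeps the comment applied consistently across environments, even though the model definition itself does not carry it.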

Rescale Dataset using Power BI

I'm trying to rescale a dataset using Power BI Desktop. I've imported a dataset full of raw data, but I can't use row context together with an aggregate. I'm trying to accomplish this:
Data:
+---------+-----+
| Name | Bar |
+---------+-----+
| Alfred | 0 |
| Alfred | -1 |
| Alfred | 1 |
| Burt | 1 |
| Burt | 0 |
| Charlie | 1 |
| Charlie | 1 |
| Charlie | 0 |
+---------+-----+
Calculations:
Foo: = SUM(Bar) / COUNT(Bar) GROUP BY Name
which would generate this dataset:
+---------+-----+
| Name | Foo |
+---------+-----+
| Alfred | 0 |
| Burt | .5 |
| Charlie | .67 |
+---------+-----+
Final Calculation:
Score: = (#Foo - MIN(Foo)) / (MAX(Foo)-MIN(Foo))
The goal is to grade on a curve with a set of data. I can do it in excel, but was hoping that Power BI could handle all the heavy lifting.
At this point it might be easier to do it all in SQL before bringing it into PowerBI, but that would make it significantly less dynamic (with date filters and the like). Thanks for any insight you might have!
I think you're looking for the GROUPBY DAX function. https://support.office.com/en-us/article/GROUPBY-Function-DAX-d6d064b2-fd8b-4c1b-97f8-c6d03cdf8ad0
You then would GROUPBY on the Name field and proceed from there. If you need to use the measure outside of a visual that groups by each Name (like "show me the average score after applying the curve"), then you'll need to wrap it in a calculated table where you include the names and your measure projected as a column, and then do your aggregates (min/max/average) over that calculated table.
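The two-step calculation itself (per-name average, then min-max rescale) can be checked outside DAX. A minimal Python sketch using the sample data from the question:

```python
from collections import defaultdict

# The Name/Bar data from the question.
data = [("Alfred", 0), ("Alfred", -1), ("Alfred", 1),
        ("Burt", 1), ("Burt", 0),
        ("Charlie", 1), ("Charlie", 1), ("Charlie", 0)]

# Foo: per-name average of Bar (the GROUPBY step).
totals = defaultdict(lambda: [0, 0])   # name -> [sum, count]
for name, bar in data:
    totals[name][0] += bar
    totals[name][1] += 1
foo = {name: s / n for name, (s, n) in totals.items()}

# Score: min-max rescale of Foo across names (the grading curve).
lo, hi = min(foo.values()), max(foo.values())
score = {name: (f - lo) / (hi - lo) for name, f in foo.items()}
```

This reproduces the expected intermediate table (Alfred 0, Burt .5, Charlie .67) and then curves it so the lowest Foo maps to 0 and the highest to 1.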

How to classify the whole data set in Weka

I've got a supervised data set with 6836 instances, and I need to know the predictions of my model for all the instances, not only for a test set.
I followed the train-test approach (2/3-1/3) to learn my TPR and FPR rates, and I've got the predictions for my test set (1/3), but I need the predictions for all 6836 instances.
How can I do it?
Thanks!
In the Classify tab in Weka Explorer there should be a button that says 'More options...'; if you go in there, you should be able to output predictions as plain text. If you use cross validation rather than a percentage split, you will get predictions for all instances in a table like this:
+-------+--------+-----------+-------+------------+
| inst# | actual | predicted | error | prediction |
+-------+--------+-----------+-------+------------+
| 1 | 2:no | 1:yes | + | 0.926 |
| 2 | 1:yes | 1:yes | | 0.825 |
| 1 | 2:no | 1:yes | + | 0.636 |
| 2 | 1:yes | 1:yes | | 0.808 |
| ... | ... | ... | ... | ... |
+-------+--------+-----------+-------+------------+
If you don't want to do cross validation, you can also create a data set containing all your data (training + test) and add it as test data. Then you can go to more options and show the results as Campino already answered.
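The reason cross validation yields a prediction for every instance is that the folds partition the data: each instance lands in exactly one test fold. A small Python sketch of that partitioning (the `kfold_indices` helper is illustrative, not Weka's implementation):

```python
def kfold_indices(n, k):
    # Split indices 0..n-1 into k contiguous test folds whose sizes
    # differ by at most one; together they cover every instance once.
    folds = []
    base, extra = divmod(n, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# 10-fold CV over the 6836 instances from the question.
folds = kfold_indices(6836, 10)
covered = sorted(i for fold in folds for i in fold)
```

Collecting the per-fold test predictions therefore gives a prediction for all 6836 instances, which is what the question asks for.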