ProxySQL data masking for multiple columns

ProxySQL data masking for multiple columns - regex

I want to mask sensitive information on multiple columns in a table named my_table using ProxySQL.
I've followed this tutorial to successfully mask a single column named column_name in a table using the following mysql_query_rules:
/* only show the first character in column_name */
INSERT INTO mysql_query_rules (rule_id,active,username,schemaname,match_pattern,re_modifiers,replace_pattern,apply)
VALUES (1,1,'developer','my_table','(\(?)(`?\w+`?\.)?\`?column_name\`?(\)?)([ ,\n])','caseless,global',
"\1CONCAT(LEFT(\2column_name,1),REPEAT('X',CHAR_LENGTH(column_name)-1))\3 column_name\4",1);
But when I add a second rule to mask another column called second_column_name in the same table, ProxySQL fails to mask the second column. Here's the second rule:
/* masking the last 3 characters in second_column_name */
INSERT INTO mysql_query_rules (rule_id,active,username,schemaname,match_pattern,re_modifiers,replace_pattern,apply)
VALUES (2,1,'developer','my_table','(\(?)(`?\w+`?\.)?\`?second_column_name\`?(\)?)([ ,\n])','caseless,global',
"\1CONCAT(LEFT(\2second_column_name,CHAR_LENGTH(second_column_name)-3),REPEAT('X',3))\3 second_column_name\4",1);
Here are the query results after the two rules are added:
SELECT column_name FROM my_table; returns a masked column_name.
SELECT second_column_name FROM my_table; returns a masked second_column_name.
SELECT column_name, second_column_name FROM my_table; returns data with column_name masked, but second_column_name is not masked.
SELECT second_column_name, column_name FROM my_table; also returns data with column_name masked, but second_column_name is not masked.
Does this mean that 1 query can only be matched with 1 rule?
How can I mask data in multiple columns with ProxySQL?

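The first rule had apply=1, so ProxySQL stopped processing rules as soon as it matched and never reached the second rule. Chaining the rules with flagIN, flagOUT, and apply lets me mask data in multiple columns.
Here are the final mysql_query_rules I have: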
/* only show the first character in column_name */
INSERT INTO mysql_query_rules (rule_id,active,username,schemaname,flagIN,match_pattern,re_modifiers,flagOUT,replace_pattern,apply)
VALUES (1,1,'developer','my_db',0,'(\(?)(`?\w+`?\.)?\`?column_name\`?(\)?)([ ,\n])','caseless,global',6, "\1CONCAT(LEFT(\2column_name,1),REPEAT('X',CHAR_LENGTH(column_name)-1))\3 column_name\4",0);
/* masking the last 3 characters in second_column_name */
INSERT INTO mysql_query_rules (rule_id,active,username,schemaname,flagIN,match_pattern,re_modifiers,flagOUT,replace_pattern,apply)
VALUES (2,1,'developer','my_db',6,'(\(?)(`?\w+`?\.)?\`?second_column_name\`?(\)?)([ ,\n])','caseless,global',NULL,
"\1CONCAT(LEFT(\2second_column_name,CHAR_LENGTH(second_column_name)-3),REPEAT('X',3))\3 second_column_name\4",1);
The meaning of these three fields, as described in the ProxySQL documentation:
flagIN, flagOUT, apply - these allow us to create "chains of rules"
that get applied one after the other. An input flag value is set to
0, and only rules with flagIN=0 are considered at the beginning. When
a matching rule is found for a specific query, flagOUT is evaluated
and if NOT NULL the query will be flagged with the specified flag in
flagOUT. If flagOUT differs from flagIN , the query will exit the
current chain and enters a new chain of rules having flagIN as the
new input flag. If flagOUT matches flagIN, the query will be
re-evaluate again against the first rule with said flagIN. This
happens until there are no more matching rules, or apply is set to 1
(which means this is the last rule to be applied)
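
To see why the chain matters, here is a toy Python model of the rule evaluation described above. It is only a sketch, not ProxySQL's actual engine: the `Rule` class, the `rewrite` function, and the column names `email` and `phone` are hypothetical stand-ins (chosen so that neither name is a substring of the other, since the patterns are unanchored), and Python's `re` module stands in for ProxySQL's regex engine.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    rule_id: int
    flag_in: int
    pattern: str
    replacement: str
    flag_out: Optional[int]
    apply: bool


def column_pattern(column: str) -> str:
    # Same shape as the match_pattern in the rules above, for one column.
    return r"(\(?)(`?\w+`?\.)?`?" + re.escape(column) + r"`?(\)?)([ ,\n])"


def rewrite(query: str, rules: List[Rule]) -> str:
    """Toy model: walk rules by rule_id, honoring flagIN/flagOUT/apply."""
    flag = 0  # every query starts with flag 0
    for rule in sorted(rules, key=lambda r: r.rule_id):
        if rule.flag_in != flag:
            continue  # rule belongs to a different chain
        new_query, hits = re.subn(rule.pattern, rule.replacement, query,
                                  flags=re.IGNORECASE)
        if hits == 0:
            continue  # rule did not match, try the next one
        query = new_query
        if rule.flag_out is not None:
            flag = rule.flag_out  # jump to the chain with this flagIN
        if rule.apply:
            break  # apply=1: last rule to be applied
    return query


MASK_EMAIL = r"\1CONCAT(LEFT(\2email,1),REPEAT('X',CHAR_LENGTH(email)-1))\3 email\4"
MASK_PHONE = r"\1CONCAT(LEFT(\2phone,CHAR_LENGTH(phone)-3),REPEAT('X',3))\3 phone\4"

# Both rules with flagIN=0 and apply=1, as in the question.
ORIGINAL_RULES = [
    Rule(1, 0, column_pattern("email"), MASK_EMAIL, None, True),
    Rule(2, 0, column_pattern("phone"), MASK_PHONE, None, True),
]
# Rule 1 hands over to rule 2 via flagOUT=6, as in the answer.
CHAINED_RULES = [
    Rule(1, 0, column_pattern("email"), MASK_EMAIL, 6, False),
    Rule(2, 6, column_pattern("phone"), MASK_PHONE, None, True),
]

QUERY = "SELECT email, phone FROM my_table;"
print(rewrite(QUERY, ORIGINAL_RULES))  # only email is rewritten
print(rewrite(QUERY, CHAINED_RULES))   # both columns are rewritten
```

In this toy model, a query that references only `phone` is never flagged 6 and therefore skips rule 2. It may be worth checking whether the same happens on a real ProxySQL instance with the chained rules.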

Related

I want to Assign 'Y' to the Duplicate Records and 'N' to the Unique Records, And Display those 'Y' and 'N' Flags in 'Duplicate' Column

I want to assign 'Y' to the duplicate records and 'N' to the unique records, and display those 'Y' and 'N' flags in a 'Duplicate' column, like below.
Source Table:
Name,Location
Vivek,India
Vivek,UK
Vivek,India
Vivek,USA
Vivek,Japan
Target Table:
=============
Name,Location,Duplicate
Vivek,India,Y
Vivek,India,Y
Vivek,Japan,N
Vivek,UK,N
Vivek,USA,N
How do I create such a mapping in Informatica PowerCenter?
Which logic should I use?
[See the Image for More Clarification][1]
[1]: https://i.stack.imgur.com/2F20A.png

You need to calculate a count grouped by the key columns using an Aggregator, and then join it back to the original flow on those key columns.
Use a Sorter to sort the data on the key columns (Name and Location in your example).
Use an Aggregator to calculate count(*) grouped by the key columns:
out_count= count(*)
in_out - key_column
Use a Joiner to join the Aggregator output with the Sorter output on the key columns. Drag out_count and the key columns from the Aggregator into the Joiner, drag all columns from the Sorter, and do an inner join on the key columns.
Use an Expression with an output port that derives the duplicate flag from out_count:
out_Duplicate = iif( out_count>1, 'Y','N')
The whole mapping should look like this:
SRC -->SRT ---->AGG-->\
|------------->JNR-->EXP-->TGT
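
The logic of this mapping can be sketched outside Informatica as a count per key that is joined back to every row. The following Python is only an illustration of that idea using the sample rows from the question; `flag_duplicates` is a hypothetical helper name, not an Informatica construct.

```python
from collections import Counter

SOURCE_ROWS = [
    ("Vivek", "India"),
    ("Vivek", "UK"),
    ("Vivek", "India"),
    ("Vivek", "USA"),
    ("Vivek", "Japan"),
]


def flag_duplicates(rows):
    ordered = sorted(rows)     # Sorter: order by the key columns
    counts = Counter(ordered)  # Aggregator: count(*) per key
    # Joiner + Expression: look up each row's count and derive the flag.
    return [
        (name, location, "Y" if counts[(name, location)] > 1 else "N")
        for name, location in ordered
    ]


for row in flag_duplicates(SOURCE_ROWS):
    print(",".join(row))
```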

There's another way to solve it without the Joiner, which is costly. I'll use the Name and Location sample columns from your example.
Use a Sorter on Name and Location.
Add an Expression with a variable port for each key column, e.g. v_prev_Name and v_prev_Location.
Assign the expressions accordingly:
v_prev_Name = Name
v_prev_Location = Location
Next, create another variable port v_is_duplicate with the following expression:
IIF(v_prev_Name = Name and v_prev_Location = Location, 1, 0)
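Move v_is_duplicate up the list of ports so that it comes before v_prev_Name and v_prev_Location. THIS IS IMPORTANT: ports are evaluated top to bottom, so v_is_duplicate must be computed while the v_prev_ ports still hold the previous row's values. The order needs to be: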
v_is_duplicate
v_prev_Name
v_prev_Location
Add an output port is_duplicate whose expression is simply v_is_duplicate.
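
The port ordering above amounts to comparing each sorted row with the one before it. Here is a rough Python sketch of that comparison on the question's sample rows; `flag_repeats` is a hypothetical name used only for illustration. Note what it produces: the first row of each group is flagged 0 and only the second and later occurrences are flagged 1, so reproducing the target table exactly (where both Vivek,India rows are 'Y') would need an extra step.

```python
SOURCE_ROWS = [
    ("Vivek", "India"),
    ("Vivek", "UK"),
    ("Vivek", "India"),
    ("Vivek", "USA"),
    ("Vivek", "Japan"),
]


def flag_repeats(rows):
    prev_key = None  # plays the role of v_prev_Name / v_prev_Location
    flagged = []
    for name, location in sorted(rows):  # Sorter on Name, Location
        # v_is_duplicate is evaluated BEFORE the v_prev_ ports are updated,
        # so it compares the current row with the previous one.
        is_duplicate = 1 if (name, location) == prev_key else 0
        prev_key = (name, location)
        flagged.append((name, location, is_duplicate))
    return flagged


for row in flag_repeats(SOURCE_ROWS):
    print(row)
```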

MYSQL get substring

I'm trying to get substring dynamically and group by it. So if my uri column contains records like: /uri1/uri2 and /somelongword/someotherlongword I would like to get everything up to second delimiter, namely up to second / and count it. I'm using this query but obviously it is cutting string statically (6 letters after the first one).
SELECT substr(uri, 1, 6) as URI,
COUNT(*) as COUNTER
FROM staging
GROUP BY substr(uri, 1, 6)
ORDER BY COUNTER DESC
How can I achieve that?

You can use a combination of SUBSTRING() and POSITION().
Schema:
CREATE TABLE Table1
(`uri` varchar(10))
;
INSERT INTO Table1
(`uri`)
VALUES
('some/text'),
('some/text1'),
('some/text2'),
('aa/bb'),
('aa/cc'),
('bb/cc')
;
Query:
SELECT
SUBSTRING(uri,1,POSITION('/' IN uri)-1),
COUNT(*)
FROM Table1
GROUP BY SUBSTRING(uri,1,POSITION('/' IN uri)-1);
http://sqlfiddle.com/#!9/293dd3/3/0
Edit: I found the Amazon Athena documentation here: https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html and the string function documentation here: https://prestodb.io/docs/0.217/functions/string.html
My answer above still stands, but you might need to change SUBSTRING to SUBSTR.
Edit 2: it seems Amazon Athena has a dedicated function for this, SPLIT_PART().
Query:
SELECT SPLIT_PART(uri, '/', 1), COUNT(*) FROM tbl GROUP BY SPLIT_PART(uri, '/', 1)
from docs:
split_part(string, delimiter, index) → varchar
Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than the number of fields, then null is returned.
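
The question's sample values start with a delimiter (/uri1/uri2), so the first field before the first / is empty and the wanted prefix runs up to the second /. The Python below is a small sketch of that grouping, written only to make the intended result concrete; the function names and the sample list are made up for illustration. In MySQL, `SUBSTRING_INDEX(uri, '/', 2)` should give the same prefix, but treat that as an assumption to verify on your server.

```python
from collections import Counter


def prefix_up_to_second_slash(uri):
    # "/uri1/uri2".split("/") -> ["", "uri1", "uri2"]; keep the first two fields.
    return "/".join(uri.split("/")[:2])


def count_by_prefix(uris):
    # Equivalent in spirit to GROUP BY prefix ORDER BY COUNT(*) DESC.
    return Counter(prefix_up_to_second_slash(u) for u in uris).most_common()


SAMPLE_URIS = [
    "/uri1/uri2",
    "/uri1/other",
    "/somelongword/someotherlongword",
]
print(count_by_prefix(SAMPLE_URIS))
```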

Is there a way to assert a "not" condition using tSQLt?

I am new to using tSQLt and struggling a bit with the available assert functions. The tSQLt.AssertEmptyTable method is great, but how do you apply a "not" condition to this, i.e. I want to assert that a table contains data?

That depends on what you actually want to test. Do you want to validate the content of the rows and columns in that table or just that it has one or more rows?
If the former, then tSQLt.AssertEqualsTable will let you compare the contents of one table (e.g. a #expected table populated with the values you are expecting) with the table under test: EXEC tSQLt.AssertEqualsTable '#expected', 'my_table';. One useful feature of this assertion is that only the columns in #expected are validated. So if #expected has ten columns but my_table has twelve, only the contents of those ten columns are checked; the other two are ignored by this assertion. This can be useful, for example, when those two columns are auto-populated and so harder to test, e.g. an IDENTITY column and a GETDATE() default. Obviously, if #expected has columns that do not exist on my_table, the test will fail anyway.
If you just want to check that there is any data at all in the table, you can do something like IF NOT (SELECT COUNT(*) FROM my_table) > 0 EXEC tSQLt.Fail 'my_table contains no data'

Sqlite Query to remove duplicates from one column. Removal depends on the second column

Please have a look at the following data example:
In this table, I have multiple columns. There is no PRIMARY KEY and, as the attached image shows, there are a few duplicates in STK_CODE. I want to remove the duplicate rows based on the (min) column.
According to the image, one STK_CODE has three different rows. The value in the (min) column differs across these duplicate rows, and I want to keep the row that has the minimum value in the (min) column.
I am very new to SQLite, and I am linking against it from C++ (-lsqlite3).
Is there a way to do this?

Your table has an implicit rowid that acts as its primary key.
Use it to get the rowids of the rows that you want to keep, and delete everything else:
DELETE FROM comparison
WHERE rowid NOT IN (
SELECT rowid
FROM comparison
GROUP BY STK_CODE
HAVING (COUNT(*) = 1 OR MIN(CASE WHEN min > 0 THEN min END))
)
This code uses rowid as a bare column and relies on a documented SQLite feature: when a query uses the MIN() or MAX() aggregate function, bare columns take their values from the row that contains the min or max value.
See a simplified demo.
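
For anyone who wants to try the statement locally, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The two-column table layout and the sample values are assumptions made for the sketch (the real table has more columns), and `dedupe_keep_min` is a hypothetical helper name.

```python
import sqlite3


def dedupe_keep_min(rows):
    """Load rows into an in-memory table, run the DELETE, return what is left."""
    conn = sqlite3.connect(":memory:")
    # Assumed minimal layout: only the two columns the DELETE refers to.
    conn.execute("CREATE TABLE comparison (STK_CODE TEXT, min REAL)")
    conn.executemany("INSERT INTO comparison VALUES (?, ?)", rows)
    conn.execute(
        """
        DELETE FROM comparison
        WHERE rowid NOT IN (
            SELECT rowid
            FROM comparison
            GROUP BY STK_CODE
            HAVING (COUNT(*) = 1 OR MIN(CASE WHEN min > 0 THEN min END))
        )
        """
    )
    remaining = conn.execute(
        "SELECT STK_CODE, min FROM comparison ORDER BY STK_CODE"
    ).fetchall()
    conn.close()
    return remaining


SAMPLE_ROWS = [("A", 5.0), ("A", 3.0), ("A", 7.0), ("B", 2.0)]
print(dedupe_keep_min(SAMPLE_ROWS))
```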

PowerBI custom combined column with Text.Combine and with embedded conditions leading to the insertion of different strings

I created a custom column in Power BI that concatenates columns.
I have the following:
Text.Combine({[Nip],[Nap],[Noup]]},"_")
However, I would like to insert a specific text that changes depending on whether data is present in other columns. I need to check whether there is data in any of four columns: if there is, one specific string should be inserted; if there is not, a different string should be inserted.
I am trying to insert the outcome of an "if" expression, but I can't get it to work. I have tried the following, and Power BI tells me "Token Eof expected":
If [Lapino] <> null or [Lapinou] <> null or [Werwolf] <> null or [Ciocolato] then
Text.Combine({[Nip],"Snoubadiuba",[Nap],[Noup]},"_")
else Text.Combine({[Nip],"BruttoCativo",[Nap],[Noup]},"_")

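I believe this is as simple as changing your If to lowercase if. M is case-sensitive, and that includes its keywords. Also note that the last condition, [Ciocolato], has no <> null comparison, which looks like an oversight.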
