Remove all lines that do not contain a domain listed in another file - regex

I have file A, 'Emails', with many email addresses, and file B, 'Domain', with many domains.
Example file A, 'Emails':
ctv#ymail.com
kfi#aol.in
hi#axus.cc
0#gmail.com
igp#yahoo.com
encor#mail2.com
cjang#mail.com
vn#gmail.com
87#gmail.com
ee#maoyt.com
Example file B, 'Domain':
#gmail.com
#yahoo.com
My expected result:
0#gmail.com
igp#yahoo.com
vn#gmail.com
87#gmail.com
Is there a way to do this with the two files in EmEditor? Thanks very much.

I would propose using the Join CSV function. Abimanyu's regex method may work if you have fewer than 10 or so domains; with more than that, it might take a while to process the data.
To prepare the documents for joining, right-click the CSV/Sort toolbar and edit the User-defined separated format to use # as the delimiter.
Now on both file A and file B, change the CSV mode to User-defined separated. On the CSV/Sort toolbar, there is a button called "Join CSV".
Join CSV options:
Make sure the correct documents are selected
Key Column is the email domain column (column 2 once the lines are split on #) in both files
In the list at the bottom, select the output columns, which should be columns 1 and 2 from file A
Press the Join Now button, change the CSV mode back to Normal mode, and you will get output that looks like this:
0#gmail.com
igp#yahoo.com
vn#gmail.com
87#gmail.com
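Outside EmEditor, the same idea (split each address on "#" and keep it only if its domain column appears in file B) can be sketched in Python. This is an illustrative stand-in, not EmEditor's Join CSV; it assumes one address per line with "#" as the separator, as in the sample files, and filter_by_domain is a hypothetical name. Holding the domains in a set keeps each lookup fast no matter how many domains file B contains.

```python
def filter_by_domain(emails, domains, sep="#"):
    """Keep addresses whose part after the last `sep` is one of `domains`."""
    # File B lines look like "#gmail.com"; strip the separator to get "gmail.com"
    wanted = {d.strip().lstrip(sep) for d in domains if d.strip()}
    kept = []
    for line in emails:
        line = line.strip()
        if sep not in line:
            continue  # skip blank or malformed lines
        _, domain = line.rsplit(sep, 1)
        if domain in wanted:  # exact match, so "mail.com" does not match "gmail.com"
            kept.append(line)
    return kept


emails = ["ctv#ymail.com", "kfi#aol.in", "hi#axus.cc", "0#gmail.com",
          "igp#yahoo.com", "encor#mail2.com", "cjang#mail.com",
          "vn#gmail.com", "87#gmail.com", "ee#maoyt.com"]
domains = ["#gmail.com", "#yahoo.com"]
print(filter_by_domain(emails, domains))
```

To run it on the real files, read each one with open(...).read().splitlines() and pass the two lists in.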

Maybe this will help you:
Pattern: ^.*#(?:gmail\.com|yahoo\.com)$
Matches:
Match 1: 0#gmail.com
Match 2: igp#yahoo.com
Match 3: vn#gmail.com
Match 4: 87#gmail.com
https://rubular.com/r/M3MVSoRj6qnSbl
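With more than a handful of domains, typing the alternation by hand gets tedious, and an unescaped "." silently matches any character. A minimal Python sketch that builds the anchored pattern from file B's entries, assuming the same "#" separator; build_domain_pattern is a hypothetical helper:

```python
import re


def build_domain_pattern(domains, sep="#"):
    """Build one anchored alternation regex from file B's domain lines."""
    # re.escape turns "gmail.com" into "gmail\.com" so the dot is literal
    alternatives = "|".join(
        re.escape(d.strip().lstrip(sep)) for d in domains if d.strip()
    )
    return re.compile("^.*" + re.escape(sep) + "(?:" + alternatives + ")$")


pattern = build_domain_pattern(["#gmail.com", "#yahoo.com"])
emails = ["ctv#ymail.com", "0#gmail.com", "igp#yahoo.com", "cjang#mail.com", "vn#gmail.com"]
print([e for e in emails if pattern.match(e)])
```

pattern.pattern holds the generated regex text if you want to inspect it.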

Related

How to extract a column based on its content in Power BI

I have a column in my table that looks like this:
ResourceIdentifier
------------------
arn:aws:ec2:us-east-1:7XXXXXX1:instance/i-09TYTYTY79716
arn:aws:glue:us-east-1:5XXXXXX85:devEndpoint/etl-endpoint
i-075656565f7fea3
i-02c3434343f22
qa-271111145-us-east-1-raw
prod-95756565631-us-east-1-raw
prod-957454551631-us-east-1-isin-repository
i-02XXXXXXf0
I want a new column called 'Trimmed Resource Identifier' that looks at ResourceIdentifier and, if the value starts with "arn", returns the value after the last "/"; otherwise, it returns the whole string.
For example:
arn:aws:ec2:us-east-1:7XXXXXX1:instance/i-09TYTYTY79716 --> i-09TYTYTY79716
i-02XXXXXXf0 --> i-02XXXXXXf0
How do I do this? I tried creating a new column called "first 3 letters" by extracting the first 3 letters of the ResourceIdentifier column, but I am getting stuck at the step of adding a conditional column.
Is there a way I can do all of this in one step using DAX instead of creating a new intermediate column?
Many thanks
Many Thanks
The GUI is too simple to do exactly what you want, but go ahead and use it to create the next step, which we can then modify to work properly.
Filling out the Add Conditional Column dialog so that it outputs the placeholder "output" if ResourceIdentifier begins with "arn", and ResourceIdentifier otherwise,
will produce a line of code like this (turn on the Formula Bar under the View tab in the query editor if you don't see this formula):
= Table.AddColumn(#"Name of Previous Step Here", "Custom",
each if Text.StartsWith([ResourceIdentifier], "arn") then "output" else [ResourceIdentifier])
The first-three-letters part is already handled by the operator I chose, so all that remains is to replace the "output" placeholder with what we actually want. There's a handy Text.AfterDelimiter function we can use for this.
Text.AfterDelimiter([ResourceIdentifier], "/", {0, RelativePosition.FromEnd})
The {0, RelativePosition.FromEnd} argument tells it to take the text after the first "/" counting from the end, i.e. after the last "/". Replace "output" with this expression and you should be good to go.
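The same rule in plain Python, as a quick way to check the expected outputs from the question. It is only a sketch of the logic, not Power Query or DAX code, and trim_resource_identifier is a hypothetical name. One possible difference: str.rpartition returns the whole string when there is no "/", and Text.AfterDelimiter may behave differently in that case.

```python
def trim_resource_identifier(value: str) -> str:
    """Return the text after the last "/" for ARNs; otherwise the value unchanged."""
    if value.startswith("arn"):
        # rpartition splits at the last "/"; index 2 is everything after it
        return value.rpartition("/")[2]
    return value


for v in ["arn:aws:ec2:us-east-1:7XXXXXX1:instance/i-09TYTYTY79716",
          "arn:aws:glue:us-east-1:5XXXXXX85:devEndpoint/etl-endpoint",
          "i-02XXXXXXf0",
          "prod-95756565631-us-east-1-raw"]:
    print(v, "-->", trim_resource_identifier(v))
```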

Is there any way I can do format matching within a column in Power BI? (something similar to fuzzy matching)

I have a column that looks like this:
DK060
DK705
DK715
dk681
dk724
Dk716
Dk 685 (there is a space after Dk).
This is obviously due to human error. Is there any way I can ensure the values follow the specified format, which is an uppercase DK followed by three digits?
Or am I being too ambitious?
Go to the Power Query editor, open the Advanced Editor, and paste these 2 steps:
#"Uppercase" = Table.TransformColumns(#"Source",{{"Column", Text.Upper, type text}}),
#"Replace Value" = Table.ReplaceValue(#"Uppercase"," ","",Replacer.ReplaceText,{"Column"})
Note: if your previous step is not called "Source", replace #"Source" in the Uppercase step with the name of your previous step.
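A Python sketch of the same two clean-up steps (uppercase, then remove spaces), plus one thing the steps above do not do: flagging any value that still does not fit "DK" followed by three digits after cleaning. The names CODE_FORMAT, normalise_code and is_valid_code are hypothetical.

```python
import re

# Target format from the question: "DK" followed by exactly three digits
CODE_FORMAT = re.compile(r"DK\d{3}")


def normalise_code(raw: str) -> str:
    """Mirror the two Power Query steps: uppercase, then drop spaces."""
    return raw.upper().replace(" ", "")


def is_valid_code(code: str) -> bool:
    """True only if the whole string matches the target format."""
    return CODE_FORMAT.fullmatch(code) is not None


samples = ["DK060", "DK705", "DK715", "dk681", "dk724", "Dk716", "Dk 685"]
cleaned = [normalise_code(s) for s in samples]
print(cleaned)
print([c for c in cleaned if not is_valid_code(c)])  # values still needing manual review
```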

Parsing a name from a complex string in Tableau

I have a series of values in Tableau that are long strings of intermixed letters and numbers. I am unable to control the data output, but I would like to parse the names from these strings. They follow this format:
Potato 1TByte 4.5 NFA
Board 256GByte 553 NCA
Launch 4 512GByte 4.5 NFA
Launch 4S 512GByte 4.5 NCA
From each of these, I am attempting to capture the following:
"Potato"
"Board"
"Launch 4"
"Launch 4S"
Each string follows the same format: the name, followed by the size, followed by some extra information we don't really care about.
I've tried to put together some text parsing strings, but am coming up short, and am still trying to learn regular expressions.
The Tableau calculated field I was trying to work with was something like the following:
LEFT([String], FIND([String], "Byte") - 2)
The issue is that the text and numbers preceding "Byte" can be anywhere from 2 to 4 characters long, and I need a way to identify that length.
Any help would be greatly appreciated!
One option, using a regex replacement:
REGEXP_REPLACE('Launch 4 512GByte 4.5 NFA', ' \d+[A-Z]Byte .*$', '')
This strips off the size term (such as " 512GByte") and everything to its right, leaving only the product name.
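To sanity-check the pattern against all four sample strings, here is the same regex applied with Python's re module. Tableau's REGEXP_REPLACE uses a different engine, but for a simple pattern like this one it should behave the same; product_name is a hypothetical helper.

```python
import re

# Same pattern as the REGEXP_REPLACE answer: a space, digits, one unit letter,
# "Byte", a space, then everything to the end of the string
SIZE_AND_REST = re.compile(r" \d+[A-Z]Byte .*$")


def product_name(value: str) -> str:
    """Remove the size term and everything after it."""
    return SIZE_AND_REST.sub("", value)


for s in ["Potato 1TByte 4.5 NFA", "Board 256GByte 553 NCA",
          "Launch 4 512GByte 4.5 NFA", "Launch 4S 512GByte 4.5 NCA"]:
    print(repr(product_name(s)))
```

Note that " 4 " and " 4S " in the last two names do not match, because the pattern requires a capital letter immediately followed by "Byte".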
You could try the following, which seems to work. Below are the formulas for the derived columns (your source column is called [Name]):
Step1 = LEFT([Name],FIND([Name],"Byte")-1)
Step2 = LEN([Step1])-LEN(REPLACE([Step1]," ",""))
Step3 = FINDNTH([Step1]," ",[Step2])
Step4 = LEFT([Step1],[Step3]-1)
And of course you can nest all of these in a single calculated field; I kept them as separate columns to make them easier to follow.
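For comparison, a Python sketch of the step-by-step approach. Steps 2 to 4 together amount to cutting Step1 at its last space, which removes the size token. The guard for strings without "Byte" is an addition for this sketch, and product_name_by_steps is a hypothetical name.

```python
def product_name_by_steps(name: str) -> str:
    """Text before "Byte", then everything before the last space."""
    idx = name.find("Byte")
    if idx == -1:
        return name  # guard added for this sketch: no size term found
    step1 = name[:idx]  # Step1: e.g. "Launch 4 512G"
    # Steps 2-4: locate the last space and keep what precedes it
    return step1.rpartition(" ")[0]


for s in ["Potato 1TByte 4.5 NFA", "Board 256GByte 553 NCA",
          "Launch 4 512GByte 4.5 NFA", "Launch 4S 512GByte 4.5 NCA"]:
    print(repr(product_name_by_steps(s)))
```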

HiveQL: Parse strings and count

I am using HiveQL to work with millions of rows of domain name text data stored in HDFS. The following is a hand-selected subset to illustrate lexical diversity. There are duplicate entries.
dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.
mgmtsubnet.mgmtvcn.oraclevcn.com.
asdf.mgmtvcn.oraclevcn.com.
dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.
localhost.
a.localhost.
img.pulsemgr.com.
36.136.154.156.in-addr.arpa.
accounts.spotify.com.
_dmarc.ixia-devops.com.
&eventtype=close&reason=4&duration=35.
&eventtype=close&reason=3&duration=10336.
I am trying to get a count of the number of rows based on the last two levels of the domain, where sometimes the second level is absent (e.g. localhost.). For example:
domain_root        count
oraclevcn.com.     4
localhost.         1
a.localhost.       1
pulsemgr.com.      1
in-addr.arpa.      1
spotify.com.       1
ixia-devops.com.   1
It would be nice to also see how to filter out domains where the second level is absent.
I am not sure where to start. I have seen the SPLIT() function used, but that may not be robust, since a domain name can have many levels, for example a.b.c.d.e.f.g.h.i.
Any ideas or implementations are appreciated.
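Below is a query using regexp_extract:
select A.domain_root, count(*) from (select regexp_extract(domain, '[A-Za-z0-9-]+\\.[A-Za-z0-9-]+\\.$', 0) as domain_root from your_table) A group by A.domain_root; -- "domain" and "your_table" are placeholders for your own column and table names
The regex extracts the domain root: the last two labels, made of alphanumeric characters and '-', each followed by a dot. The backslash is doubled because Hive string literals consume one level of backslash escaping, so '\\.' is what reaches the regex as '\.' (a literal dot).
Hope this helps.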
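A Python sketch of what the query computes (extract the last two labels, then group and count), run on the sample rows from the question. In Hive, regexp_extract would typically return an empty string for rows that do not match, and those rows would still appear in the GROUP BY. This sketch skips them instead, which also covers the request to filter out names with no second level, such as "localhost.". count_domain_roots is a hypothetical name.

```python
import re
from collections import Counter

# Same regex as the HiveQL answer: two labels, each ending in a dot, at the end
DOMAIN_ROOT = re.compile(r"[A-Za-z0-9-]+\.[A-Za-z0-9-]+\.$")


def count_domain_roots(names):
    """Count rows per domain root, skipping names the regex does not match."""
    counts = Counter()
    for name in names:
        m = DOMAIN_ROOT.search(name)
        if m:  # "localhost." and the "&eventtype=..." rows do not match
            counts[m.group(0)] += 1
    return counts


names = [
    "dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.",
    "mgmtsubnet.mgmtvcn.oraclevcn.com.",
    "asdf.mgmtvcn.oraclevcn.com.",
    "dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.",
    "localhost.",
    "a.localhost.",
    "img.pulsemgr.com.",
    "36.136.154.156.in-addr.arpa.",
    "accounts.spotify.com.",
    "_dmarc.ixia-devops.com.",
    "&eventtype=close&reason=4&duration=35.",
    "&eventtype=close&reason=3&duration=10336.",
]
for root, n in count_domain_roots(names).most_common():
    print(root, n)
```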

Splunk subsearch to join 2 CSV files together

I have 2 files with order data, saved in two different sourcetypes in Splunk.
One file contains an orderid, a plnum (prefix + orderid; each order number has 3 plnums) and a model (the type of the order). The second file contains the same plnums and the material numbers for those plnums.
I want to search for the top materials used for one or more models.
So I looked up how to set up a subsearch:
sourcetype=file1 [search sourcetype=file2 MODEL="something" | fields MODEL] | stats values(MATNR) by MODEL
I don't know why the subsearch doesn't work.
Run the subsearch by itself to verify that it works and produces the expected results. I suspect it is working and returns a list of PLNUMs in the form foo bar baz.... Splunk puts an implicit AND between search terms, so your main search is looking for events that contain all of the PLNUMs, which is unlikely.
Try using format in your subsearch. It returns the results in the form foo OR bar OR baz..., which should work better in the main search.
sourcetype=file1 [search sourcetype=file2 MODEL="something" | fields PLNUM | format] | stats values(MATNR) by PLNUM
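A toy model of the AND versus OR point above, in Python. Events are modelled as plain sets of tokens and the PLNUM values are made up; real Splunk term matching is richer than a set-membership test, so this only illustrates why requiring every PLNUM at once matches nothing.

```python
# Toy model only: each event is a set of tokens (hypothetical values)
events = [
    {"PL1001", "M-17"},
    {"PL1002", "M-17"},
    {"PL2001", "M-42"},
]
plnums = ["PL1001", "PL1002", "PL1003"]  # hypothetical subsearch results

# Implicit AND: an event must contain every PLNUM, which one order line never does
and_hits = [e for e in events if all(p in e for p in plnums)]
# OR (what format produces): one matching PLNUM is enough
or_hits = [e for e in events if any(p in e for p in plnums)]
print(len(and_hits), len(or_hits))  # 0 2
```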