remove after and before ip addresses - regex

I want to delete everything except IPs.
For example
1 138.68.161.60:1080 SOCKS5 HIA United States (New York NY) 138.68.161.60 (DigitalOcean, LLC) 0.143 75% (3) - 12-jan-2018 14:37 (10 minutes ago)
2 174.64.234.29:17501 SOCKS5 HIA United States wsip-174-64-234-29.sd.sd.cox.net (Cox Communications Inc.) 0.956
100% (5) - 12-jan-2018 14:36 (10 minutes ago)
3 45.79.219.154:63189 SOCKS5 HIA United States (Atlanta GA) li1318-154.members.linode.com (Linode, LLC) 6.973
90% (103) - 12-jan-2018 14:36 (11 minutes ago)
to
138.68.161.60:1080
174.64.234.29:17501
45.79.219.154:63189
I need a regex to do this conversion.

In Notepad++, it requires some finesse to delete text not containing matched strings, but you can choose Find, Mark, then check the Regular expression box and use the regex:
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}+) and Mark All to bookmark all rows containing IP addresses.
Then select Find, Replace, enter ^[0-9]\W in Find what:, and Replace All with nothing.
Then select Find, Replace, enter \w+S.+ in Find what:, and Replace All with nothing.
Then, go to Search, Bookmark, Remove Unmarked Lines.
Et Voilà!

You could use this regex in Notepad++ and replace each match with group 1 (\1):
(?s)\d (\d+\.\d+\.\d+\.\d+:\d+).*?\(\d+ minutes ago\)
The pattern matches the whole text of each of the 3 blocks in your example and uses a capturing group for the part that you want to keep. In the replacement you then use only the captured group, which holds your data.
Explanation
Inline modifier to make the dot match a line break (?s)
The leading row number and space, kept outside the group so they are dropped \d
Group 1 with the pattern that you want to capture (\d+\.\d+\.\d+\.\d+:\d+)
Match any character zero or more times non greedy .*?
The pattern that is at the end of every part \(\d+ minutes ago\)
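
For anyone who wants to check the result outside Notepad++, here is a minimal Python sketch of the same extraction. It is not the Notepad++ procedure itself: instead of deleting the surrounding text it collects the ip:port pairs directly. The names text, IP_PORT and proxies are illustrative only.

```python
import re

# Sample rows from the question; some records wrap onto a second line.
text = """1 138.68.161.60:1080 SOCKS5 HIA United States (New York NY) 138.68.161.60 (DigitalOcean, LLC) 0.143 75% (3) - 12-jan-2018 14:37 (10 minutes ago)
2 174.64.234.29:17501 SOCKS5 HIA United States wsip-174-64-234-29.sd.sd.cox.net (Cox Communications Inc.) 0.956
100% (5) - 12-jan-2018 14:36 (10 minutes ago)
3 45.79.219.154:63189 SOCKS5 HIA United States (Atlanta GA) li1318-154.members.linode.com (Linode, LLC) 6.973
90% (103) - 12-jan-2018 14:36 (11 minutes ago)"""

# Four dot-separated groups of 1-3 digits, then a colon and a port.
# Requiring the port skips the bare IP repeated later in the first row.
IP_PORT = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}:\d+\b")

proxies = IP_PORT.findall(text)
print("\n".join(proxies))
```

The pattern does not validate that each octet is at most 255; for this input that does not matter.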

Related

GSheets - remove everything *after* a word (but keep the word)

How can I remove everything after a specific word (while keeping the word)?
I want to remove everything after the word 'films'.
"George Fellini 194 films 273 169 Edit" would turn into "George Fellini 194 films"
"Rick Bathista 7 films 10 27 Edit" would turn into "Rick Bathista 7 films"
There are many posts that are similar but aren't google sheets specific, and the two google sheets specific answers I've found eliminate the word I want to keep.
(It would be a bonus if it could also keep the singular "film", but that is not necessary.)
What I've tried:
=REGEXEXTRACT(B2,"(.*) films .*") - deletes the word 'films'
=regexreplace(B2,"films ","") - also deletes the word 'films'
my sheet: https://docs.google.com/spreadsheets/d/1UL0cvdgbwJIAPSJTxajxM7_pw_pPqxq-Ofmt8uK6J6o/edit?usp=sharing
Use this formula:
=REGEXEXTRACT(B2,".*films?")
The documentation of REGEXEXTRACT says:
Extracts matching substrings according to a regular expression.
The regular expression matches any sequence of zero or more characters (.*) followed by film and an optional s (s?).
To apply it to the whole column, use:
=INDEX(IFNA(REGEXEXTRACT(B2:B, "(.+films)")))
() - capture group: the part to extract
.+ - one or more of any character
(.+films) - extracts everything up to and including films
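
The pattern can be sanity-checked outside Sheets with a short Python sketch. Sheets uses RE2 and Python uses its own re engine, but this particular pattern should behave the same in both. The helper name keep_through_films and the third row are made up for illustration.

```python
import re

rows = [
    "George Fellini 194 films 273 169 Edit",
    "Rick Bathista 7 films 10 27 Edit",
    "Jane Doe 1 film 3 4 Edit",  # hypothetical row for the singular case
]

def keep_through_films(cell):
    # Greedy .* runs to the last "film"/"films" in the cell; s? makes the s optional.
    # Returns "" when nothing matches, similar to wrapping the formula in IFNA.
    m = re.search(r".*films?", cell)
    return m.group(0) if m else ""

trimmed = [keep_through_films(r) for r in rows]
print(trimmed)
```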

Regular expression for matching a specific substring of a string

I have a log file that logs connection drops of computers in a LAN. I want to extract the name of each computer from every line of the log file, and for that I am using this: (?<=Name:)\w+|(-PC)
The target text:
`[C417] ComputerName:KCUTSHALL-PC UserID:GO kcutshall Station 9900 (locked) LanId: | (11/23 10:54:09 - 11/23 10:54:44) | Average limit (300) exceeded while pinging www.google.com [74.125.224.147] 8x
[C445] ComputerName:FRONTOFFICE UserID:YB Yenae Ball Station 7C LanId: | (11/23 17:02:00) | Client is connected to agent.`
The problem is that some computer names have -PC in them and some don't. The expression I have created matches computers without -PC in their names, but if a computer has -PC in its name, it treats that as a separate match, and I don't want that. In short, it gives me 3 matches, but I want only 2. That's why I need help here; I am a beginner in regex.
You may use
(?<=Name:)\w+(?:-PC)?
Details
(?<=Name:) - a position immediately preceded by Name:
\w+ - 1+ word chars
(?:-PC)? - an optional non-capturing group that matches 1 or 0 occurrences of the -PC substring.
Consider using a word boundary if you need to match PC as a whole word:
(?<=Name:)\w+(?:-PC\b)?
See the regex demo.
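
A small Python sketch (assuming Python's re module, which supports this fixed-width lookbehind) shows the difference between the two patterns on the two log lines from the question. The variable names are illustrative.

```python
import re

log = (
    "[C417] ComputerName:KCUTSHALL-PC UserID:GO kcutshall Station 9900 (locked) LanId: | "
    "(11/23 10:54:09 - 11/23 10:54:44) | Average limit (300) exceeded while pinging "
    "www.google.com [74.125.224.147] 8x\n"
    "[C445] ComputerName:FRONTOFFICE UserID:YB Yenae Ball Station 7C LanId: | "
    "(11/23 17:02:00) | Client is connected to agent."
)

# Original attempt: the alternation makes -PC a separate, third match.
original = [m.group(0) for m in re.finditer(r"(?<=Name:)\w+|(-PC)", log)]

# Suggested fix: -PC is an optional tail of the same match.
fixed = re.findall(r"(?<=Name:)\w+(?:-PC)?", log)

print(original)
print(fixed)
```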

How to remove specific characters in notepad++ with regex?

This is data present in my .txt file
+919000009998 SMS +919888888888
+919000009998 MMS +91988 88888 88
+919000009998 MMS abcd google
+919000009998 MMS amazon
I want to convert my .txt like this
919000009998 SMS 919888888888
919000009998 MMS 919888888888
919000009998 MMS abcd google
919000009998 MMS amazon
I want to remove the + symbol, and also the spaces in the third column, but only if that column is a number; if it is a string, nothing should be changed.
Is there a regex for this that I can use in Notepad++'s search and replace?
Ctrl+H
Find what: \+|(?<=\d)\h+(?=\d)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
\+ # + sign
| # OR
(?<=\d) # positive lookbehind, make sure we have a digit before
\h+ # 1 or more horizontal spaces
(?=\d) # positive lookahead, make sure we have a digit after
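
The same replacement can be tried in Python as a rough check. Python's re module has no \h, so this sketch substitutes [ \t] for it; that swap is an assumption and covers only spaces and tabs.

```python
import re

data = """+919000009998 SMS +919888888888
+919000009998 MMS +91988 88888 88
+919000009998 MMS abcd google
+919000009998 MMS amazon"""

# A literal +, OR spaces/tabs that sit between two digits.
PATTERN = re.compile(r"\+|(?<=\d)[ \t]+(?=\d)")

cleaned = PATTERN.sub("", data)
print(cleaned)
```

The space between the first number and SMS/MMS survives because it is followed by a letter, not a digit.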
All the previous answers will work perfectly.
However, I'm adding this in case you need it:
If for some reason you had non-phone-number text in the third column separated by spaces (a street address comes to mind, e.g. +919000009998 MMS street foo nº 123 4º-B), you may use this regex instead (it will join the numbers only as long as the third column starts with +):
Search: ^[+](\S+\s+\S+\s++)(?:([^+][^\n]*)|[+])|\G\s*(\d+)
Replace by: \1\2\3
That will avoid joining the 3 and 4 in my previous example.
You have a demo here.

Capture the latest in backreference

I have this regex
(\b(\S+\s+){1,10})\1.*MY
and I want group 1 to capture "The name" from
The name is is The name MY
I get "is" for now.
The name can be any random words of any length.
It need not be at the beginning.
It need not be only 2 or 3 words. It can be fewer than 10 words.
The only sure thing is that it will be the last set of repeating words.
Examples:
The name is Anthony is is The name is Anthony - "The name is Anthony".
India is my country All Indians are India is my country - "India is my country "
Times of India Alphabet Google is the company Alphabet Google canteen - "Alphabet Google"
You could try:
(\b\w+[\w\s]+\b)(?:.*?\b\1)
As demonstrated here
Explanation -
(\b\w+[\w\s]+\b) is the capture group 1 - which is the text that is repeated - separated by word boundaries.
(?:.*?\b\1) is a non-capturing group which tells the regex system to match the text in group 1, only if it is followed by zero-or-more characters, a word-boundary, and the repeated text.
Regex generally captures the longest leftmost match. There are no examples in your question where this would not actually be the string you want, but that could just mean you have not found good examples to show us.
With that out of the way,
((\S+\s)+)(\S+\s){0,9}\1
would appear to match your requirements as currently stated. The "longest leftmost" behavior could still get in the way if there are e.g. straddling repetitions, like
this that more words this that more words
where in the general case regex alone cannot easily be made to always prefer the last possible match and tolerate arbitrary amounts of text after it.
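
As a rough check of the pattern above, here is a Python sketch run on two strings from the question. Group 1 includes the trailing whitespace of the repeated phrase, so the sketch strips it; the helper name repeated_phrase is made up for illustration.

```python
import re

# Group 1: one or more whitespace-terminated words, then up to 9 other
# words, then the same text as group 1 again.
REPEATED = re.compile(r"((\S+\s)+)(\S+\s){0,9}\1")

def repeated_phrase(text):
    m = REPEATED.search(text)
    # Group 1 ends with whitespace, so strip it before returning.
    return m.group(1).strip() if m else None

first = repeated_phrase("The name is is The name MY")
second = repeated_phrase(
    "Times of India Alphabet Google is the company Alphabet Google canteen"
)
print(first)
print(second)
```

Because every word in the pattern must be followed by whitespace, a repetition that ends exactly at the end of the string is only matched up to its last whitespace-terminated word.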

regex return everything up to the first space after nth character

I have a list of product names and I want to shorten them (Short Name). I need a regex that will return the first word if it is more than 5 characters and the first two words if it is 5 characters or less.
Product Name              Short Name
BABY WIPES MIS /ALOE      BABY WIPES
PKU GEL PAK               PKU GEL
CA ASCORBATE TAB 500MG    CA ASCORBATE
SOD SUL/SULF CRE 10-2%    SOD SUL/SULF
ASPIRIN TAB 81MG EC       ASPIRIN
IRON TAB 325MG            IRON TAB
PEDA                      PEDA
I initially used:
^([^ \t]+).*
but it only returns the first word so BABY WIPES MIS /ALOE would be BABY. I then tried:
.....([^ \t]+)
But this appears to not work for names less than 5 characters. Any help would be greatly appreciated.
Brief
Your try is close, however, since you negated spaces and tabs, you were unable to move past the first word.
Code
See code in use here
^(\S{1,5}[ \t]*?\S+).*$
Note: The link uses the following shortened regex. \h may not work in your flavour of regex, which is why the code above is posted as well.
^(\S{1,5}\h*?\S+).*$
Super-simplified it becomes ^\S{1,5}\h*?\S+ (without capture groups and .*$ as the OP initially used.)
Results
Input
BABY WIPES MIS /ALOE
PKU GEL PAK
CA ASCORBATE TAB 500MG
SOD SUL/SULF CRE 10-2%
ASPIRIN TAB 81MG EC
IRON TAB
PEDA
Output
BABY WIPES
PKU GEL
CA ASCORBATE
SOD SUL/SULF
ASPIRIN
IRON TAB
PEDA
Explanation
^ Assert position at the start of a line
(\S{1,5}[ \t]*?\S+) Capture group doing the following
\S{1,5} Match any non-whitespace character between 1 and 5 times
[ \t]*? Match space or tab characters any number of times, but as few as possible (note in PCRE regex, this can be replaced with \h*? to make it shorter)
\S+ Match any non-whitespace character between one and unlimited times
.* Match any character (except newline character assuming s modifier is off - it should be for this problem)
$ Assert position at the end of a line
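
The [ \t] version of the pattern can be checked with a short Python sketch (Python's re does not support \h, which is why that variant is used here). Each full line is replaced by capture group 1; the names SHORT_NAME, names and short are illustrative.

```python
import re

SHORT_NAME = re.compile(r"^(\S{1,5}[ \t]*?\S+).*$")

names = [
    "BABY WIPES MIS /ALOE",
    "PKU GEL PAK",
    "CA ASCORBATE TAB 500MG",
    "SOD SUL/SULF CRE 10-2%",
    "ASPIRIN TAB 81MG EC",
    "IRON TAB",
    "PEDA",
]

# Replace the whole line with what group 1 captured.
short = [SHORT_NAME.sub(r"\1", name) for name in names]
print("\n".join(short))
```

For a first word longer than 5 characters, \S{1,5} takes the first 5 characters and \S+ takes the rest of that same word, which is how the pattern ends up keeping only one word.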
You can use a regex like this:
^\S{1,5} \S+|^\S+
or
^\S{1,5} ?\S*
Working demo
By the way, if you want to replace a full line with the shortened version, then you can use this regex instead:
(^\S{1,5} \S+|^\S+).*
or
(^\S{1,5} ?\S*).*
With the replacement string $1 or \1 depending on your regex engine.
Working demo
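
A quick Python sketch, under the assumption that the names are separated by single spaces as in the question, suggests that both variants of this answer give the same short names. The variable names are illustrative.

```python
import re

names = [
    "BABY WIPES MIS /ALOE",
    "PKU GEL PAK",
    "CA ASCORBATE TAB 500MG",
    "SOD SUL/SULF CRE 10-2%",
    "ASPIRIN TAB 81MG EC",
    "IRON TAB",
    "PEDA",
]

ALTERNATION = re.compile(r"^\S{1,5} \S+|^\S+")  # two words, else one word
OPTIONAL = re.compile(r"^\S{1,5} ?\S*")         # same idea without alternation

by_alternation = [ALTERNATION.match(n).group(0) for n in names]
by_optional = [OPTIONAL.match(n).group(0) for n in names]

# Full-line replacement form, using \1 as the replacement string.
replaced = [re.sub(r"(^\S{1,5} ?\S*).*", r"\1", n) for n in names]
print("\n".join(by_alternation))
```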