Use recognised data for replacing - regex

I have a column of dates in Notepad++:
2017-06-12
2017-06-13
2017-06-14
2017-06-15
2017-06-16
2017-06-17
2017-06-18
2017-06-19
2017-06-20
2017-06-20
2017-06-21
2017-06-22
2017-06-23
2017-06-24
2017-06-25
2017-06-26
2017-06-27
2017-06-28
2017-06-29
2017-06-30
2017-07-01
2017-07-02
2017-07-03
2017-07-04
2017-07-05
2017-07-06
2017-07-07
2017-07-08
2017-07-09
2017-07-10
I need to split it into weeks by inserting \r\n after each week, like this:
2017-06-12
2017-06-13
2017-06-14
2017-06-15
2017-06-16
2017-06-17
2017-06-18

2017-06-19
2017-06-20
2017-06-20
2017-06-21
2017-06-22
2017-06-23
2017-06-24
2017-06-25

2017-06-26
2017-06-27
2017-06-28
2017-06-29
2017-06-30
2017-07-01
2017-07-02

2017-07-03
2017-07-04
2017-07-05
2017-07-06
2017-07-07
2017-07-08
2017-07-09

2017-07-10
I am doing a regex replace. This finds 7 days:
\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n
Now I would like to add \r\n after them.
But how can I use the matched text in the replacement, i.e. replace it with itself plus \r\n?

If you are sure that the first date is a Monday, you could do this:
Ctrl+H
Find what: (?:\d{4}-\d\d-\d\d\R){7}
Replace with: $0\r\n
Replace all
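The replacement simply writes the whole match back ($0) and appends a line break. A minimal sketch of the same idea in Python, assuming its re module: Python has no \R, so \r?\n stands in for it, and \g<0> plays the role of $0. The names WEEK and split_weeks are hypothetical.

```python
import re

# Seven consecutive date lines; \r?\n replaces Notepad++'s \R (any line break)
WEEK = re.compile(r"(?:\d{4}-\d\d-\d\d\r?\n){7}")


def split_weeks(text: str) -> str:
    """Append an extra CRLF after every run of seven date lines."""
    # \g<0> is the whole match (like $0); \r\n in the template become CR LF
    return WEEK.sub(r"\g<0>\r\n", text)
```

Applied to 14 consecutive CRLF-terminated dates, split_weeks returns two blocks of seven lines, each followed by an empty line. Like the Notepad++ version, it counts lines, not calendar days, so a duplicated date shifts the week boundary.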

In your example input some lines are doubled, e.g. 2017-06-20. In your example output this line is also doubled, and its week block consists of eight lines: seven unique lines, with 2017-06-20 appearing twice. I assume that all lines in the input are sorted, so duplicate lines are adjacent to each other. I also assume that the first line is the first day of a week.
Do a regular expression find/replace like this:
Open Replace Dialog
Find What: (((.*\R)\3*){7})
Replace With: \1\r\n
Check regular expression, do not check . matches newline
Click Replace or Replace All
Explanation
Let's explain (((.*\R)\3*){7}) from the inside out, starting at the third (innermost) group. In the following, x and y stand for regex parts, not literal characters.
(.*\R): the third group is just one line, from start to end, including its line break.
(y\3*): a y followed by zero or more repetitions of the text captured by the third group. Here y is the third group itself, referenced by \3, so this matches a line followed by any number of identical copies of it; this deals with the 2017-06-20 case.
(x{7}): seven repetitions of x, i.e. seven unique rows, each of which may be repeated within the block, so a block of eight lines with one line doubled is fine. The outermost (first) group captures the whole block, which \1 writes back before the added \r\n.
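The same pattern can be tried outside Notepad++. A minimal Python sketch, assuming CRLF or LF line endings: \R is again replaced by \r?\n. This still works inside the backreferenced group because Python's . does not match \n (it does match \r, so group 3 holds the whole line including its CRLF). WEEK_WITH_DUPES and split_weeks_with_dupes are hypothetical names.

```python
import re

# Group 1: the whole block; group 2: one unique line plus its duplicates;
# group 3: a single line including its line break
WEEK_WITH_DUPES = re.compile(r"(((.*\r?\n)\3*){7})")


def split_weeks_with_dupes(text: str) -> str:
    """Insert a blank line after every seven distinct (sorted) lines."""
    return WEEK_WITH_DUPES.sub(r"\1\r\n", text)
```

With the question's 30 lines, the blocks come out as 7, 8, 7 and 7 lines, and the final 2017-07-10 is left alone because fewer than seven lines remain.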

Related

How to remove specific characters in notepad++ with regex?

This is the data in my .txt file:
+919000009998 SMS +919888888888
+919000009998 MMS +91988 88888 88
+919000009998 MMS abcd google
+919000009998 MMS amazon
I want to convert my .txt file to this:
919000009998 SMS 919888888888
919000009998 MMS 919888888888
919000009998 MMS abcd google
919000009998 MMS amazon
removing the + symbol, and also the spaces in the third column, but only if it is a number; if it is a string, no change should be made.
Is there a regex I can use in Notepad++'s search and replace to do this?
Ctrl+H
Find what: \+|(?<=\d)\h+(?=\d)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
\+ # + sign
| # OR
(?<=\d) # positive lookbehind, make sure we have a digit before
\h+ # 1 or more horizontal spaces
(?=\d) # positive lookahead, make sure we have a digit after
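The same find/replace can be sketched in Python, assuming one record per string. Python's re module does not support \h, so [ \t] (space or tab) is used instead; the lookbehind and lookahead work as explained above. clean_line is a hypothetical helper name.

```python
import re

# Either a literal "+" or a run of spaces/tabs with a digit on both sides
PLUS_OR_DIGIT_GAP = re.compile(r"\+|(?<=\d)[ \t]+(?=\d)")


def clean_line(line: str) -> str:
    """Drop every + and any spaces/tabs sitting between two digits."""
    return PLUS_OR_DIGIT_GAP.sub("", line)
```

For example, clean_line("+919000009998 MMS +91988 88888 88") gives "919000009998 MMS 919888888888", while the lines with text in the third column only lose their leading +.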
All the previous answers will work perfectly. However, I'm adding this in case you need it:
If for some reason the third column could contain non-phone numbers separated by spaces (a street address comes to mind, e.g. +919000009998 MMS street foo nº 123 4º-B), you may use this regex instead (it only joins numbers when the third column starts with +):
Search: ^[+](\S+\s+\S+\s++)(?:([^+][^\n]*)|[+])|\G\s*(\d+)
Replace by: \1\2\3
That avoids joining the 3 and the 4 of "123 4º-B" in my previous example.
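Python's built-in re module has no \G, so the regex above cannot be ported to it directly. The sketch below is a swapped-in technique, not the answer's regex: it matches the first two columns, and only when the third column starts with + does it drop the + and remove whitespace between digits. normalise_line is a hypothetical name, and the rule assumes three whitespace-separated columns as in the question.

```python
import re

# Optional leading "+", columns 1-2 with the whitespace after them, then the rest
LINE = re.compile(r"^\+?(\S+\s+\S+\s+)(.*)$")


def normalise_line(line: str) -> str:
    m = LINE.match(line)
    if not m:
        return line
    head, rest = m.groups()
    if rest.startswith("+"):
        # Phone number in column 3: drop the + and join the digit groups
        rest = re.sub(r"(?<=\d)\s+(?=\d)", "", rest[1:])
    return head + rest
```

A street address in the third column does not start with +, so its "123 4º-B" is returned untouched.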
You have a demo here.

Remove everything before and after IP addresses

I want to delete everything except IPs.
For example
1 138.68.161.60:1080 SOCKS5 HIA United States (New York NY) 138.68.161.60 (DigitalOcean, LLC) 0.143 75% (3) - 12-jan-2018 14:37 (10 minutes ago)
2 174.64.234.29:17501 SOCKS5 HIA United States wsip-174-64-234-29.sd.sd.cox.net (Cox Communications Inc.) 0.956
100% (5) - 12-jan-2018 14:36 (10 minutes ago)
3 45.79.219.154:63189 SOCKS5 HIA United States (Atlanta GA) li1318-154.members.linode.com (Linode, LLC) 6.973
90% (103) - 12-jan-2018 14:36 (11 minutes ago)
to
138.68.161.60:1080
174.64.234.29:17501
45.79.219.154:63189
I need a regex to do this conversion.
In Notepad++ it takes some finesse to delete text that does not contain the matched strings, but you can choose Find, Mark, then check the Regular expression box and use the regex:
([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}+) with Mark All to bookmark all rows containing IP addresses.
Then select Find, Replace, enter ^[0-9]\W in Find what:, and Replace All with nothing.
Then select Find, Replace, enter \w+S.+ in Find what:, and Replace All with nothing.
Then, go to Search, Bookmark, Remove Unmarked Lines.
Et Voilà!
You could use this regex in Notepad++ and replace each match with capture group 1 (\1):
(?s)\d+ (\d+\.\d+\.\d+\.\d+:\d+).*?\(\d+ minutes ago\)
This matches the whole text of each of the 3 blocks in your example and uses a capturing group for the part you want to keep. In the replacement you then use only the captured group, which holds your data.
Explanation
Inline modifier to make the dot match a line break (?s)
Row number and the space after it, matched outside the group so they are dropped \d+
Group 1 with the pattern that you want to capture (\d+\.\d+\.\d+\.\d+:\d+)
Match any character zero or more times non greedy .*?
The pattern that is at the end of every part \(\d+ minutes ago\)
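A minimal Python sketch of this replace, assuming the sample is held in one string with \n line breaks. re.DOTALL does what the inline (?s) does. Here the row number and its space are matched before group 1, so only ip:port survives the replacement. BLOCK, SAMPLE and keep_ip_ports are hypothetical names.

```python
import re

# Row number, then ip:port in group 1, then everything up to "(N minutes ago)".
# re.DOTALL lets .*? run across the wrapped lines, like (?s) in Notepad++.
BLOCK = re.compile(r"\d+ (\d+\.\d+\.\d+\.\d+:\d+).*?\(\d+ minutes ago\)", re.DOTALL)

SAMPLE = (
    "1 138.68.161.60:1080 SOCKS5 HIA United States (New York NY) 138.68.161.60 "
    "(DigitalOcean, LLC) 0.143 75% (3) - 12-jan-2018 14:37 (10 minutes ago)\n"
    "2 174.64.234.29:17501 SOCKS5 HIA United States wsip-174-64-234-29.sd.sd.cox.net "
    "(Cox Communications Inc.) 0.956\n"
    "100% (5) - 12-jan-2018 14:36 (10 minutes ago)\n"
    "3 45.79.219.154:63189 SOCKS5 HIA United States (Atlanta GA) "
    "li1318-154.members.linode.com (Linode, LLC) 6.973\n"
    "90% (103) - 12-jan-2018 14:36 (11 minutes ago)"
)


def keep_ip_ports(text: str) -> str:
    """Replace each block with its captured ip:port (group 1)."""
    return BLOCK.sub(r"\1", text)
```

The line breaks between blocks are not part of any match, so the three results stay on separate lines.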

How to create a regex pattern in VBA to extract dates from a string and exclude false matches

I am trying to use Regex to parse a series of strings to extract one or more text dates that may be in multiple formats. The strings will look something like the following:
24 Aug 2016: nno-emvirt010a/b; 16 Aug 2016 nnt-emvirt010a/b nnd-emvirt010a/b COSI-1.6.5
24.16 nno-emvirt010a/b nnt-emvirt010a/b nnd-emvirt010a/b EI.01.02.03\
9/23/16: COSI-1.6.5 Logs updated at /vobs/COTS/1.6.5/files/Status_2016-07-27.log, Status_2016-07-28.log, Status_2016-08-05.log, Status_2016-08-08.log
I am not concerned about validating the individual date fields, just extracting the date string. The part I cannot figure out is how to avoid matching number sequences that fit the pattern but aren't dates (‘1.6.5’ in ex. (1) and 01.02.03 in ex. (2)) and dates that are part of a file name (2016-07-27 in ex. (3)). In each of these exception cases in my input data, the initial numbers are preceded by a period (.), underscore (_) or dash (-), but I cannot work out how to use this in the pattern so that these strings are not matched.
The pattern I have that partially works is below. It only ignores the non-date matches when they start with a single digit, as in example 1.
/[^_\.\(\/]\d{1,4}[/\-\.\s*]([1-9]|0[1-9]|[12][0-9]|3[01]|[a-z]{3})[/\-\.\s*]\d{1,4}/ig
I am not sure about VBA, so please check whether this works. There seem to be plenty of options here: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s04.html
^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$
^(?:
# m/d or mm/dd
(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
# d/m or dd/mm
(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy or /yyyy
/(?:[0-9]{2})?[0-9]{2}$
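The free-spacing layout is only valid in a regex flavor that has a free-spacing (verbose) mode. As an illustration, Python's re.VERBOSE flag accepts it unchanged; SLASH_DATE and is_slash_date are hypothetical names. Note that ^ and $ anchor the whole string, so this validates a standalone date rather than finding dates inside longer text such as the question's examples.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a comment
SLASH_DATE = re.compile(
    r"""
    ^(?:
      # m/d or mm/dd
      (1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
    |
      # d/m or dd/mm
      (3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
    )
    # /yy or /yyyy
    /(?:[0-9]{2})?[0-9]{2}$
    """,
    re.VERBOSE,
)


def is_slash_date(s: str) -> bool:
    return SLASH_DATE.match(s) is not None
```

Both 9/23/16 (m/d) and 23/9/2016 (d/m) pass, while 13/13/16 fails because 13 is neither a valid month nor usable as both fields.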
Based on the test strings you've presented, you can use the following regex:
See this regex in use here
(?<=[^a-zA-Z\d.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})(?=[^a-zA-Z\d.])
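This regex ensures that the date has one of specific formats and is preceded either by nothing (the beginning of the string) or by a character that is not a letter, digit or dot (i.e. not a-z, A-Z, 0-9 or .). The lookahead also requires the date to be followed by such a character. The date formats that will be matched are: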
24 Aug 2016
24.16
9/23/16
The regex could be refined further to ensure that the numbers are in the proper ranges for days, months, etc.; however, I don't think that is really necessary.
Edits
Edit 1
Since VBA doesn't support lookbehinds, you can use the following. The date is in capture group 1.
(?:[^a-zA-Z\d.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})(?=[^a-zA-Z\d.])
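The idea of reading the date from group 1 rather than the whole match can be checked with a minimal Python sketch (Python is used only for illustration; VBA's RegExp object is not shown). The pattern is the regex above split across string literals; EDIT1 and find_dates are hypothetical names, and the trailing backslash of example (2) is omitted.

```python
import re

# No lookbehind: the preceding character (or ^) is consumed outside group 1
EDIT1 = re.compile(
    r"(?:[^a-zA-Z\d.]|^)"
    r"((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)"
    r"|(?:(?:\d{1,2}\/){2}\d+)"
    r"|(?:\d+(?:-\d{2}){2})"
    r"|\d{2}\.\d{2})"
    r"(?=[^a-zA-Z\d.])"
)


def find_dates(text: str) -> list[str]:
    """Return group 1 of every match, i.e. the date without its prefix."""
    return [m.group(1) for m in EDIT1.finditer(text)]
```

On the three sample strings this yields 24 Aug 2016 and 16 Aug 2016, then 24.16, then 9/23/16; 1.6.5, 01.02.03 and the .log dates are not reported.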
Edit 2
As per bulbus's comment below
(?:[^\w.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d{2,4})|(?:(?:\d{1,2}\/){2}\d{2,4})|(?:\d{2,4}(?:-\d{2}){2})|\d{2}\.\d{2})
I took the liberty of editing that a bit:
1. Replaced [^a-zA-Z\d.] with [^\w.], which has the added advantage of excluding dates preceded by an underscore, as in _2016-07-28.log.
2. Because of 1, removed the trailing condition (?=[^a-zA-Z\d.]).
3. Restricted the year digits from \d+ to \d{2,4}.
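To see what the switch from [^a-zA-Z\d.] to [^\w.] buys, here is a minimal Python comparison of the two patterns (the second with the stray zero-width characters removed) on a hypothetical fragment, Status_2016-07-28 x, where the filename date is followed by a space instead of .log. The first pattern accepts the underscore as a preceding character; the second does not. EDIT1, EDIT2 and the helper dates are hypothetical names.

```python
import re

EDIT1 = re.compile(
    r"(?:[^a-zA-Z\d.]|^)"
    r"((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)"
    r"|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})"
    r"(?=[^a-zA-Z\d.])"
)
# [^\w.] also rejects "_" as the preceding character; no trailing lookahead
EDIT2 = re.compile(
    r"(?:[^\w.]|^)"
    r"((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d{2,4})|(?:(?:\d{1,2}\/){2}\d{2,4})"
    r"|(?:\d{2,4}(?:-\d{2}){2})|\d{2}\.\d{2})"
)


def dates(pattern: re.Pattern, text: str) -> list[str]:
    return [m.group(1) for m in pattern.finditer(text)]
```

In the question's own data the first pattern already skips Status_2016-07-28.log, but only because of the lookahead rejecting the following "."; the second pattern skips it because of the underscore itself.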
Edit 3
Because of the additional requirements, I've made the following edits (improving upon both previous edits). As per the OP:
The edited pattern above works in all but 2 cases:
it does not find dates with the year first (ex. 2016/07/11)
if the date is contained within parentheses in the string, it returns the left parenthesis as part of the date (e.g. match = "(8/20/2016")
Can you provide the edit to fix these?
In the regex below, I've changed the years to \d+ so that it works for any year greater than or equal to 0.
See the code in use here
(?:[^\w.]|^)((?:\d{1,2}\s+[A-Z][a-z]{2}\s+\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:\/\d{1,2}){2})|(?:\d+(?:-\d{2}){2})|\d{2}\.\d+)
This regex adds support for dates in the XXXX/XX/XX format, where the year comes first.
The reason you are getting ( at the start of the match is the nature of the full match: it includes the character consumed by (?:[^\w.]|^) before the date. Instead, grab the value of the first capture group, not the whole regex result. See this answer on how to grab submatches from a regex pattern in VBA.
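A minimal Python sketch of the difference between the full match and group 1 for the Edit 3 pattern (group 1 corresponds to the first submatch in VBA). EDIT3 is a hypothetical name; the inputs are the OP's two problem cases.

```python
import re

# Edit 3 pattern; the character before the date is consumed outside group 1
EDIT3 = re.compile(
    r"(?:[^\w.]|^)"
    r"((?:\d{1,2}\s+[A-Z][a-z]{2}\s+\d+)"
    r"|(?:(?:\d{1,2}\/){2}\d+)"
    r"|(?:\d+(?:\/\d{1,2}){2})"
    r"|(?:\d+(?:-\d{2}){2})"
    r"|\d{2}\.\d+)"
)

m = EDIT3.search("Logs (8/20/2016) done")
print(m.group(0))  # full match, still carries the "("
print(m.group(1))  # the date only
```

The year-first date 2016/07/11 is picked up by the \d+(?:\/\d{1,2}){2} branch.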
Also note that any additional date formats you need to catch must be added to the regex explicitly. Currently, the regex supports the following date formats:
\d{1,2}\s+[A-Z][a-z]{2}\s+\d+
12 Apr 17
12 Apr 2017
(?:\d{1,2}\/){2}\d+
1/4/17
01/04/17
1/4/2017
01/04/2017
\d+(?:\/\d{1,2}){2}
17/04/01
2017/4/1
2017/04/01
17/4/1
\d+(?:-\d{2}){2}
17-04-01
2017-04-01
\d{2}\.\d+ (although I'm not sure what this date format is even used for, or how useful it can be when it's missing the month)
24.16
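A quick way to sanity-check the list above is to test each example against its sub-pattern as a full match. A minimal Python sketch; FORMATS and mismatches are hypothetical names. The branches overlap (1/4/17 also fully matches \d+(?:\/\d{1,2}){2}), so in the combined regex the order of the alternatives decides which branch reports a given date.

```python
import re

# Each sub-pattern from the list above, paired with the examples given for it
FORMATS = {
    r"\d{1,2}\s+[A-Z][a-z]{2}\s+\d+": ["12 Apr 17", "12 Apr 2017"],
    r"(?:\d{1,2}\/){2}\d+": ["1/4/17", "01/04/17", "1/4/2017", "01/04/2017"],
    r"\d+(?:\/\d{1,2}){2}": ["17/04/01", "2017/4/1", "2017/04/01", "17/4/1"],
    r"\d+(?:-\d{2}){2}": ["17-04-01", "2017-04-01"],
    r"\d{2}\.\d+": ["24.16"],
}


def mismatches() -> list[tuple[str, str]]:
    """Return (pattern, example) pairs where the example is not a full match."""
    return [
        (pattern, example)
        for pattern, examples in FORMATS.items()
        for example in examples
        if re.fullmatch(pattern, example) is None
    ]
```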

Parse a text file with no newline using RegEx

I have a text file like the one below. Every record has 12 fields separated by |, but there is no record delimiter such as a newline, and every record starts with 555. I am trying to parse it with RegEx.
555|abc|user|2|20120914055204696|20120914055204718|0||||21|33555|def|udp|2|20120914055204696|20120914055204718|0||||22|33555|abc|user|2|20120914055204696|20120914055204718|0||||23|33
I tried with 555(\|.*?\|){12}(\d\d), but it did not work. Can anyone please help me with this?
You can use
555(?:\|[^|]*){11}(?=$|555)
See demo
It will match these records in the input string:
555|abc|user|2|20120914055204696|20120914055204718|0||||21|33
555|def|udp|2|20120914055204696|20120914055204718|0||||22|33
555|abc|user|2|20120914055204696|20120914055204718|0||||23|33
The regex 555(?:\|[^|]*){11}(?=$|555) matches:
555 - literal 555
(?:\|[^|]*){11} - 11 occurrences of | followed by any number of characters other than |
(?=$|555) - up to (but not returning as part of the match) end of string or 555.
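A minimal Python sketch applying the same regex with finditer and splitting each record on |. RECORD, DATA and split_records are hypothetical names, and DATA is the question's sample written as one line.

```python
import re

# "555", then eleven "|field" groups, stopping where the next record (or the end) begins
RECORD = re.compile(r"555(?:\|[^|]*){11}(?=$|555)")

DATA = (
    "555|abc|user|2|20120914055204696|20120914055204718|0||||21|33"
    "555|def|udp|2|20120914055204696|20120914055204718|0||||22|33"
    "555|abc|user|2|20120914055204696|20120914055204718|0||||23|33"
)


def split_records(text: str) -> list[list[str]]:
    """Return each record as its list of 12 fields."""
    return [m.group(0).split("|") for m in RECORD.finditer(text)]
```

The last field of each record is glued to the next 555 in the raw text; the lookahead is what makes [^|]* give back those three characters.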
555(?:\|[^|]*?){11}\d\d
You need to remove the second \| in your group (\|.*?\|). See demo.
https://regex101.com/r/sS2dM8/31

Replacing multiple blank lines with one blank line using RegEx search and replace

I have a file that I need to reformat and remove "extra" blank lines.
I am using the Perl syntax regular expression search and replace functionality of UltraEdit and need the regular expression to put in the "Find What:" field.
Here is a sample of the file I need to re-format.
All current text
REPLACE with all the following:
Winter 2011 Class Schedule
Winter 2011 Class Registration Dates: Dec. 6, 2010 – Jan. 1, 2011
Winter 2011 Class Session Dates: Jan. 5 – Feb. 12, 2011
DANCE
Adventures in Ballet & Tap
3 – 6 years Instructor: Ann Newby
Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
Saturdays 9 - 10 a.m. Jan. 8 – Feb. 12 Six-week fees: $30
African Storytelling
3 – 6 years Instructor: Ann Newby
Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
Saturdays 10 – 11 a.m. Jan. 8 – Feb. 12 Six-week fee: $30
African Dance / Children
You'll notice that some of the double blank lines have spaces or tabs or both in them.
After the search and replace has been run I should have a file that looks like this.
All current text
REPLACE with all the following:
Winter 2011 Class Schedule
Winter 2011 Class Registration Dates: Dec. 6, 2010 – Jan. 1, 2011
Winter 2011 Class Session Dates: Jan. 5 – Feb. 12, 2011
DANCE
Adventures in Ballet & Tap
3 – 6 years Instructor: Ann Newby
Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
Saturdays 9 - 10 a.m. Jan. 8 – Feb. 12 Six-week fees: $30
African Storytelling
3 – 6 years Instructor: Ann Newby
Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
Saturdays 10 – 11 a.m. Jan. 8 – Feb. 12 Six-week fee: $30
African Dance / Children
Replacing
^(\s*\r\n){2,}
With
\r\n
Is what I ended up with.
This only selects runs of two or more blank lines and replaces them with one.
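A rough Python check of this pattern, assuming CRLF line endings. In Python, ^ only matches at every line start with re.MULTILINE, which an editor applies implicitly; UltraEdit's own engine may behave differently on edge cases. BLANK_RUN and squeeze_blank_lines are hypothetical names.

```python
import re

# ^ needs re.MULTILINE to match at the start of every line, as in an editor
BLANK_RUN = re.compile(r"^(\s*\r\n){2,}", re.MULTILINE)


def squeeze_blank_lines(text: str) -> str:
    """Collapse two or more blank (or whitespace-only) CRLF lines into one."""
    return BLANK_RUN.sub("\r\n", text)
```

A single blank line is left alone because {2,} needs at least two line breaks after the line start.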
It depends on what the line endings are. Assuming \n, replace this:
([ \t]*\n){3,}
with \n\n.
Try this Perl one-liner: perl -00pe0. If you want in-place editing, just add the -i option.
Replacing
\n\s*\n\s*
with
\n\n
should do the trick
For completeness, I want to reference here the large post "Remove / delete blank and empty lines" in the UltraEdit user forums. At the bottom, after all the explanations for newbies, it contains the solution for reducing two or more lines with nothing in them (empty lines) or only whitespace (blank lines) to one empty line, independent of the line terminator type.
And some words on what Alan Moore wrote in his answer:
UltraEdit's Perl regular expression support is not crippled by its line-based architecture. Perl regular expression engines have a flag which determines whether a dot matches all characters except newline characters like carriage return (CR) and line feed (LF), or really all characters including CR and LF. This flag decides whether a text file is treated as one large byte stream or as a sequence of lines for Perl regular expression finds/replaces. In UltraEdit the flag is set by default so that a dot in the regular expression search string does not match \r (CR) or \n (LF). But this behavior can easily be changed in UltraEdit by starting the regular expression string with (?s), which changes the value of the flag match_not_dot_newline, as posted in the UltraEdit user forums in the topic "." in Perl regular expressions doesn't include CRLFs?
A Perl regular expression replace that works for files with
carriage return + line feed (DOS/Windows) or
only line feed (Unix, Mac OS 10.0 and later versions) or
only carriage return (Mac OS 9 and previous versions)
as line ending, with optional trailing spaces and tabs at the end of a paragraph (one or more lines) and with two or more empty lines (no characters) or blank lines (only whitespace) below the paragraph, can be done with the search string \h*(\r?\n|\r)(?:\h*\1){2,} and \1\1 as the replace string.
Explanation:
\h* matches any horizontal whitespace character (as defined by Unicode) 0 or more times. This first part of the search expression matches horizontal whitespace at the end of a line, such as horizontal tabs, normal spaces, no-break spaces and some other rarely used spaces.
Using \s is not a good idea here, as this character class matches any whitespace character, including the vertical whitespace characters carriage return and line feed.
(\r?\n|\r) ... is an OR expression with two alternatives in a marking (capturing) group. The first alternative matches a line feed, optionally preceded by a carriage return, while the second matches just a carriage return. So this expression correctly matches all three common types of line termination. It is important for the rest of the search and for the replace that it always matches either CR+LF (both together), just LF or just CR.
(?:\h*\1) ... is a non-marking (non-capturing) group which matches 0 or more horizontal whitespace characters followed by the newline found before, back-referenced with \1, i.e. CR+LF, just LF or just CR. So this part of the expression finds an empty or blank line.
{2,} ... is a quantifier for the preceding non-marking group, meaning at least two times. So after the end of a paragraph there must be two or more empty or blank lines. A single empty or blank line below a paragraph is not enough for the search expression to match.
The replace string \1\1 inserts the line break captured by the first group twice.
The advantage of this regular expression compared to the others posted here is that the line ending type does not need to be known: the search expression determines it, and the found line ending is referenced in the replace string. Any trailing whitespace at the end of a paragraph, and whitespace on the following blank lines, is also removed by this replace if there are two or more empty or blank lines below the paragraph.
{2,} can be replaced by + in the search string if trailing whitespace at the end of a paragraph and on a single following empty or blank line should also be trimmed. Note, however, that in this case the replace also makes replacements that change nothing at all, wherever there is no trailing whitespace at the end of a paragraph and the next line is an empty line.
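A minimal Python sketch of the same search/replace. Python's re has no \h, so [ \t] is used as a narrower stand-in that only covers spaces and tabs (not no-break spaces). It handles CR+LF, LF and CR alike because the replacement reuses whatever group 1 captured. EXTRA_BLANKS and collapse_blank_lines are hypothetical names.

```python
import re

# [ \t] stands in for \h; group 1 captures the first line break (CRLF, LF or CR)
EXTRA_BLANKS = re.compile(r"[ \t]*(\r?\n|\r)(?:[ \t]*\1){2,}")


def collapse_blank_lines(text: str) -> str:
    """Reduce two or more empty/blank lines after a paragraph to one empty line."""
    return EXTRA_BLANKS.sub(r"\1\1", text)
```

For example, "a  \r\n\r\n \t\r\nb" becomes "a\r\n\r\nb": the trailing spaces after "a" and the whitespace on the blank line are removed along with the extra line break.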
In Vim, using
:%!cat -s
I find this the easiest way so far to delete extra empty lines.
I'm not sure what UltraEdit lets you get away with in the "replace" area, but if you cannot use a newline (I've had this problem before) but can use capture references, this might work:
Find : \s*(\r\n)\s*(\r\n)\s*\r\n
Replace : $1$2
Not tested extensively, but seems to work on the sample you provided.
See this thread for what's causing the problem. As I understand it, UltraEdit regexes are greedy at the character level (i.e., within a line), but non-greedy at the line level (roughly speaking). I don't have access to UE, but I would try writing the regex so it has to match something concrete after the last blank line. For example:
search: (\r\n[ \t]*){2,}(\S)
replace: $1$2
This matches and captures two or more instances of a line separator and any horizontal whitespace that follows it, but it only retains the last one. The \S should force it to keep matching until it finds a line with at least one non-whitespace character.
I admit that I don't have a whole lot of confidence in this solution; UltraEdit's regex support is crippled by its line-based architecture. If you want an editor that does regexes right, and you don't want to learn a whole new regex syntax (like vim's), get EditPadPro.
This should also work with spaces on blank lines:
Search - /\n^\s*\n/
Replace - \n\n
In my IntelliJ IDE, I searched for \n\n and replaced it with \n.