Merge multiple REGEX statements

I have an example string.
This should be **bold**, *indented* or ***bold and indented***.
The string is parsed using 3 regexes that run one after another to get the following result:
This should be <b>bold</b>, <i>indented</i> or <b><i>bold and indented</i></b>.
It's simple and works fine. However, I'd like to save myself a few lines (if it's possible, prettier, and more efficient, then why not?) and merge them, so that all the replacements happen in a single regex statement. Is that possible, and would it be more efficient? Or should I leave it as is? (Even if I should, I'd still like to see a possible solution.)
My matching statements:
\*\*\*(.+?)\*\*\* -> <b><i>$1</i></b>
\*\*(.+?)\*\* -> <b>$1</b>
\*(.+?)\* -> <i>$1</i>

Honestly, keeping them as 3 separate regexes is almost certainly:
1. More readable
2. Simpler
3. (Due to #1 and #2) More maintainable
Fewer lines is not always better, especially when it comes to regexes.
Also, you only actually need 2 regexes - the bold one and the italic one. Just always run the bold one first:
***foo***
becomes, after the bold regex...
*<b>foo</b>*
and then the italic regex makes that...
<i><b>foo</b></i>
Which is the correct output. (The bold one runs first because the italic one would otherwise match *** as <i>*</i>, which is wrong.)
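A minimal Python sketch of the two-pass approach, for illustration only. One assumption that is not in the answer above: the bold pattern carries a negative lookahead, (?!\*). With the question's plain \*\*(.+?)\*\*, the leftmost match on ***foo*** starts at the first asterisk, so the two passes would produce <b><i>foo</b></i> instead of the nesting shown. The names BOLD, ITALIC and render are hypothetical.

```python
import re

# Bold pattern. The (?!\*) lookahead is an added assumption: on "***foo***"
# it forces the match to start at the second asterisk rather than the first.
BOLD = re.compile(r"\*\*(?!\*)(.+?)\*\*")

# Italic pattern, unchanged from the question.
ITALIC = re.compile(r"\*(.+?)\*")


def render(text):
    """Apply the bold pass first, then the italic pass."""
    text = BOLD.sub(r"<b>\1</b>", text)
    return ITALIC.sub(r"<i>\1</i>", text)
```

Calling render("***foo***") goes through *<b>foo</b>* and ends at <i><b>foo</b></i>, matching the steps described above.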

Related

Notepad++ - Selecting or Highlighting multiple sections of repeated text IN 1 LINE

I have a text file in Notepad++ that contains about 66,000 words, all on 1 line. It is a set of 200 "lines" of output, all unique, placed on a single line in the basic JSON form {output:[{output1},{output2},...]}.
There is a set of characters matching the RegEx expression "id":.........,"kind":"track" that occurs about 285 times in total, and I am trying to either single them out, or copy all of them at once.
Basically, without some super complicated RegEx terms, I am stuck because I can't figure out how to highlight all of them at once, and also the Remove Unbookmarked Lines feature does not apply because this is all in one line. I have only managed to be able to Mark every single occurrence.
So does this require a large number of steps to get the file into multiple lines and work from there, or is there something else I am missing?
Edit: I have come up with a set of Macro schemes that make the process of doing this manually work much faster. It's another alternative but still takes a few steps and quite some time.
Edit 2: I intended there to be an answer for actually just highlighting the different sections all at once, but I guess that is not possible. The answer here turns out to be more useful in my case, allowing me to have a list of IDs without everything else.

You seem to already have a regex which matches single instances of your pattern, so assuming it works and that we must use Notepad++ for this:
Replace .*?("id":.........,"kind":"track").*?(?="id":.........,"kind":"track"|$) with \1.
If this text file is valid JSON, this opens you up to other, non-Notepad++ options, like using Python with the json module.
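As a rough sketch of a non-Notepad++ route, the same pattern can be handed to Python's re module to list every occurrence at once. The sample string and the names TRACK and find_tracks are hypothetical; the real file's layout is not shown in the question.

```python
import re

# The pattern from the question, with the nine dots written as .{9}.
TRACK = re.compile(r'"id":.{9},"kind":"track"')


def find_tracks(text):
    """Return every non-overlapping occurrence of the pattern, in order."""
    return TRACK.findall(text)


# Hypothetical one-line sample standing in for the real file contents.
sample = (
    '{"output":[{"id":"abc1234","kind":"track"},'
    '{"id":"xyz7890","kind":"album"},'
    '{"id":"def5678","kind":"track"}]}'
)

matches = find_tracks(sample)
one_per_line = "\n".join(matches)
```

Here matches holds the two "track" entries and skips the "album" one, and one_per_line puts each hit on its own line, which is the list the question was after.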
Edited to remove unnecessary steps

How to programmatically learn regexes?

My question is a continuation of this one. Basically, I have a table of words like so:
HAT18178_890909.098070313.1
HAT18178_890909.098070313.2
HAT18178_890909.143412462.1
HAT18178_890909.143412462.2
For my purposes, I do not need the terminal .1 or .2 for this set of names. I can manually write the following regex (using Python syntax):
import re
r = re.compile(r'(.*\.\d+)\.\d+')  # raw string so the backslashes reach the regex engine
However, I cannot guarantee that my next set of names will have a similar structure where the final 2 characters will be discardable - it could be 3 characters (e.g. .12) and the separator could change as well (e.g. . to _).
What is the appropriate way to either explicitly learn a regex or to determine which characters are unnecessary?

It's an interesting problem.
X                            y
HAT18178_890909.098070313.1  HAT18178_890909.098070313
HAT18178_890909.098070313.2  HAT18178_890909.098070313
HAT18178_890909.143412462.1  HAT18178_890909.143412462
HAT18178_890909.143412462.2  HAT18178_890909.143412462
The problem is that there is not a single solution but many.
Even for a human, it is not clear which regex you actually want.
Based on this data, I would think the possibilities to learn are:
Just match a fixed width of 25: .{25}
Fixed first part: HAT18178_890909.
Then:
Each position varies between only 2 digits (as you show 2 cases).
So e.g. [01] (either 0 or 1), then [94] for the next position, and so on, would be a good solution.
The obvious one would be \d+
But it could also be \d{9}
You see, there are multiple correct answers.
These regexes would still work if the second dot were an underscore instead.
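To make the "many correct answers" point concrete, here is a small Python sketch that checks several of the candidate patterns against the four examples. The helper name consistent is hypothetical, the per-position pattern is spelled out from the two digit strings in the table, and the dot in the fixed prefix is escaped, which the list above does not do.

```python
import re

# (X, y) pairs from the table above.
pairs = [
    ("HAT18178_890909.098070313.1", "HAT18178_890909.098070313"),
    ("HAT18178_890909.098070313.2", "HAT18178_890909.098070313"),
    ("HAT18178_890909.143412462.1", "HAT18178_890909.143412462"),
    ("HAT18178_890909.143412462.2", "HAT18178_890909.143412462"),
]

# Candidate patterns in the spirit of the list above.
candidates = [
    r".{25}",
    r"HAT18178_890909\.[01][94][83][04][71][02][34][16][32]",
    r"HAT18178_890909\.\d+",
    r"HAT18178_890909\.\d{9}",
]


def consistent(pattern, examples):
    """True if the pattern's match at the start of every X is exactly y."""
    for x, y in examples:
        m = re.match(pattern, x)
        if m is None or m.group(0) != y:
            return False
    return True


results = {p: consistent(p, pairs) for p in candidates}
```

Every candidate comes out consistent with the four examples, so the data alone cannot tell a learner which one was intended.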
My conclusion:
The problem is that it is much more work to prepare the data for machine learning than it is to create a regex. If you want to be sure you cover everything, you need to have complete data, so then a regex is probably less effort.

You could split on non-alphanumeric characters:
[^a-zA-Z0-9']+
That would get you, in this case, a few strings like this:
HAT18178
890909
098070313
1
From there on you can simply discard the last one if it's never needed, and continue processing the first sequences.
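A minimal sketch of that split in Python, using the character class from this answer. The names SEPARATORS and split_name are hypothetical.

```python
import re

# Split on runs of anything that is not a letter, a digit, or an apostrophe.
SEPARATORS = re.compile(r"[^a-zA-Z0-9']+")


def split_name(name):
    """Break a name into its alphanumeric pieces."""
    return SEPARATORS.split(name)


parts = split_name("HAT18178_890909.098070313.1")
kept = parts[:-1]  # discard the trailing piece
```

Note that the separators themselves are thrown away by the split, so rebuilding the shortened name would need them stored separately.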

Regex for converting file path to package/namespace

Given the following file path:
/Users/Lawrence/MyProject/some/very/interesting/Code.scala
I would like to generate the following using a single regex replace (the root can be a constant):
some.very.interesting
This is for the purpose of generating a snippet for Sublime Text which can automatically insert the correct package/namespace header for my scala/java classes :)
Sublime Text uses the following syntax for their regex replace patterns (aka 'substitutions'):
{input/regex/replace/flags}
Hence why an iterative approach cannot be taken - it has to be done in one pass! Also, substitutions cannot be nested :(

If you know the maximum number of nested folders, you can specify that in your regex.
For 1 to 3 nested folders:
Regex: /Users/Lawrence/MyProject/(\w+)/?(\w+)?/?(\w+)?/[^/]+$
Replace: $1.$2.$3
For 1 to 5 nested folders:
Regex: /Users/Lawrence/MyProject/(\w+)/?(\w+)?/?(\w+)?/?(\w+)?/?(\w+)?/[^/]+$
Replace: $1.$2.$3.$4.$5
Given the constraints, this is the only thing you can do.
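The 1-to-3 folder pattern can be tried outside Sublime Text with a short Python sketch. This is only an approximation of how the snippet engine behaves: Python writes the replacement as \1.\2.\3 rather than $1.$2.$3, and the function name to_package is hypothetical.

```python
import re

# Pattern for 1 to 3 nested folders, copied from the answer above.
PATTERN = re.compile(r"/Users/Lawrence/MyProject/(\w+)/?(\w+)?/?(\w+)?/[^/]+$")


def to_package(path):
    """Replace the whole path with the dotted folder names."""
    # Unmatched optional groups are substituted as empty strings (Python 3.5+).
    return PATTERN.sub(r"\1.\2.\3", path)


package = to_package("/Users/Lawrence/MyProject/some/very/interesting/Code.scala")
```

In this sketch, a path with fewer than three folders leaves the unused groups empty, so the dots between them remain (for example some.. for a single folder). That is worth checking in the editor before relying on the pattern.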

Input
/Users/Lawrence/MyProject/some/very/interesting/Code.scala
Regex
^/Users/Lawrence/MyProject/([^/]+)/([^/]+)/([^/]+)/Code\.scala$
or
^/[^/]+/[^/]+/[^/]+/([^/]+)/([^/]+)/([^/]+)/[^/]+$
Replace
\1.\2.\3
Update
This gets you closer, but not exactly it:
Regex
(^/Users/Lawrence/MyProject/|/Code\.scala$|/)
Replacement
.
Output would be:
.some.very.interesting.
Without multiple replacements in a single line and without recursive back references it's going to be hard.
You might have to do a second replacement, replacing something like this with an empty string (if you can):
(^\.|\.$)
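The two replacements from the update can be sketched in Python as follows, assuming a tool that allows two passes (which Sublime Text substitutions may not). The function names are hypothetical.

```python
import re


def dot_separators(path):
    """First pass: turn the root, the file name, and each slash into a dot."""
    return re.sub(r"(^/Users/Lawrence/MyProject/|/Code\.scala$|/)", ".", path)


def trim_dots(dotted):
    """Second pass: drop the leading and trailing dot."""
    return re.sub(r"(^\.|\.$)", "", dotted)


path = "/Users/Lawrence/MyProject/some/very/interesting/Code.scala"
intermediate = dot_separators(path)
package = trim_dots(intermediate)
```

Unlike the fixed-depth patterns, this pair does not depend on the number of nested folders, at the cost of needing the second pass.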

Vim different textwidth for multiline C comments?

In our C++ code base we keep 99 column lines but 79-some-odd column multiline comments. Is there a good strategy to do this automagically? I assume the modes are already known because of smart comment line-joining and leading * insertion.

Apparently both code and comments use the same textwidth option. As far as I can see, the only trick is to set this option dynamically:
:autocmd CursorMoved,CursorMovedI * :if match(getline('.'), '^\s*\*') == 0 | :setlocal textwidth=79 | :else | :setlocal textwidth=99 | :endif
Here the critical part is detecting when we are in a comment. If you only format comments this way:
/*
 * my comment
 */
my regex should work... unless you have lines in the code starting with * (which I guess can happen in C, less frequently in C++). If you use comments like this:
// comment line 1
// comment line 2
the regex is even simpler to write. If you want to cover all possible situations, including corner cases, well... I guess the best thing would be to define a separate detection function and call that from the :autocmd instead of match().

I came across this same problem and think that I have found a suitable solution.
I wanted my comments to word-wrap so that I don't have to worry about formatting text while typing. This works well with comment text, but I wasn't comfortable with having Vim format my code. So I wanted Vim to highlight everything past column x in red.
To do this for C++ code only, add the following to your ~/.vim/ftplugin/cpp.vim file (a filetype plugin, which Vim loads only for C++ buffers).
set textwidth=79
match ErrorMsg '\%>99v.\+'
Note: you may have to create the file and folders if they don't exist.
If you have problems with this make sure that you have formatoptions set to:
formatoptions=croql
You can see this by running :set formatoptions inside of vim.

RegEx - Not match inside a text?

I am working with iCal entries:
BEGIN:VEVENT
UID:944f660b-01f8-4e09-95a9-f04a352537d2
ORGANIZER;CN=******
DTSTART;TZID="America/Chicago":20100802T080000
DTEND;TZID="America/Chicago":20100822T170000
STATUS:CONFIRMED
CLASS:PRIVATE
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
TRANSP:OPAQUE
X-MICROSOFT-DISALLOW-COUNTER:TRUE
DTSTAMP:20100802T212130Z
SEQUENCE:0
END:VEVENT
BEGIN:VEVENT
UID:aa132e2b-8a8d-4ffc-9e54-b75249e78c72
RRULE:FREQ=DAILY;COUNT=12;INTERVAL=1
SUMMARY:***********
X-ALT-DESC;FMTTYPE=text/html:<html><body><div style='font-family:Times New R
oman\; font-size: 12pt\; color: #000000\;'></div></body></html>
LOCATION:Map Room
ORGANIZER;CN=*********
DTSTART;TZID="America/Chicago":20100730T080000
DTEND;TZID="America/Chicago":20100730T170000
STATUS:CONFIRMED
CLASS:PUBLIC
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
TRANSP:OPAQUE
X-MICROSOFT-DISALLOW-COUNTER:TRUE
DTSTAMP:20100727T025231Z
SEQUENCE:0
EXDATE;TZID="America/Chicago":20100810T080000
EXDATE;TZID="America/Chicago":20100807T080000
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT5M
DESCRIPTION:*********
END:VALARM
END:VEVENT
I need to parse out starting and ending times. I have a comparison function that determines whether the passed-in event falls between the two times. Because of the added complexity of calculating the times, I plan not to support recurring series. I would like to play it safe and make sure my code only reads the first event as a match and not the second. So I have the following regex, used with the single-line option:
BEGIN:VEVENT.+?
DTSTART;.+?([0-9]{8})T([0-9]{6})
DTEND;.+?([0-9]{8})T([0-9]{6}).+?
END:VEVENT
This gets me the start and end times of both entries. My thought was to only match ones that don't have FREQ= between the BEGIN:VEVENT and DTSTART. I don't understand how to do this, however. I was wondering if someone could help me out here?
I realize that at a certain point a full-blown parser is a better option, but I am unskilled with parsers and under a slight time constraint. I have tried using the negative lookahead operator (?!...) without success.

It's harder to write a regex that matches what you don't want than one that matches what you do want. Usually when I run into this situation, I find it easier and faster to do things in two steps. In this case, I'd probably find all events that do contain FREQ=, remove those events, then continue matching on the result for the start and end times I want. Could you post the regex you tried with (?!...)? Maybe it's easy to fix. Also, I assume this is in Objective-C, and I'm guessing the environment you're using supports (?!...) (but not all of them do).
UPDATE
Ok, try this one:
BEGIN:VEVENT.+?
(?<!FREQ=.+)DTSTART;.+?([0-9]{8})T([0-9]{6})
DTEND;.+?([0-9]{8})T([0-9]{6}).+?
END:VEVENT
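The two-step idea from the first paragraph of this answer can be sketched in Python. This is an illustration rather than a port of the pattern above: as far as I know, Python's re module only accepts fixed-width lookbehind, so (?<!FREQ=.+) would be rejected there. The names EVENT, TIMES and non_recurring_times are hypothetical, and the sample is trimmed from the question's data.

```python
import re

# Step 1: isolate each VEVENT block (DOTALL plays the role of single-line mode).
EVENT = re.compile(r"BEGIN:VEVENT.*?END:VEVENT", re.DOTALL)

# Step 2: the start/end date and time captures from the question's regex.
TIMES = re.compile(
    r"DTSTART;.+?([0-9]{8})T([0-9]{6}).+?DTEND;.+?([0-9]{8})T([0-9]{6})",
    re.DOTALL,
)


def non_recurring_times(ical_text):
    """Return (start_date, start_time, end_date, end_time) per non-recurring event."""
    results = []
    for event in EVENT.findall(ical_text):
        if "FREQ=" in event:
            continue  # skip events that carry a recurrence rule
        m = TIMES.search(event)
        if m:
            results.append(m.groups())
    return results


# Trimmed-down sample built from the two events in the question.
sample = "\n".join([
    "BEGIN:VEVENT",
    'DTSTART;TZID="America/Chicago":20100802T080000',
    'DTEND;TZID="America/Chicago":20100822T170000',
    "END:VEVENT",
    "BEGIN:VEVENT",
    "RRULE:FREQ=DAILY;COUNT=12;INTERVAL=1",
    'DTSTART;TZID="America/Chicago":20100730T080000',
    'DTEND;TZID="America/Chicago":20100730T170000',
    "END:VEVENT",
])

times = non_recurring_times(sample)
```

Filtering with a plain substring test keeps the time-matching regex identical to the one in the question, which makes each step easy to check on its own.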

Why not use a PHP iCalendar parser?
http://www.phpclasses.org/browse/file/16660.html
