Regular Expression help skip lines "/*X" but not "/**/

Using PCRE (PHP) regular expressions, I'm trying to parse the output of a search that the development team ran on the mainframe platform.
I need to parse out the file names so I can join them back to the other lists I keep to track the details of a system migration.
My expression so far is on Regex101.com, with sample data.
I'm certain there are more efficiencies I can introduce, but for now I'm looking to meet the requirement before I go down that road.
Here is the code for reference:
^(?#
Objective is to capture the first two columns: program name and line number. This part is easy
I'm able to skip lines with comments and start parsing at the 'QRY' string.
Challenge is I would like to skip lines like this
.*\/\*QRY.*$
and include lines like this [end comment appears before the 'QRY' string]
.*\/\*.*\*\/QRY.*$
Check for comment indicator and skip lines with comments
)(?!.*\/\*.*(?!\*\/)QRY)(?#
)(?#
Program name
)^(?<prgName>.+?)[[:blank:]](?#
QRY clause
)(?:.*QRY\((?<qryName>.*?)\))*(?#
FILE or QRYFILE clause
)(?:.*FILE\(\((?<qryFile01>.*?)\)(?:[[:blank:]]\((?<qryFile02>.*?)\)\))*)*

Try '~(?mi)(?:(?:^.*?/\*(?:(?!\*/).)*QRY.*$(*SKIP)(*FAIL))|^(?<prgName>.+?)[[:blank:]](?:.*QRY\((?<qryName>.*?)\))*(?:.*FILE\(\((?<qryFile01>.*?)\)(?:[[:blank:]]\((?<qryFile02>.*?)\)\))*)*)~'
Features:
line-oriented only; does not validate comments
skips a line if QRY appears after the start of a comment
passes a line where QRY is outside the comment
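
The same line-oriented idea can be sketched outside PCRE. Python's re module has no (*SKIP)(*FAIL), so this sketch tests each line separately for "QRY after an unclosed /*" and only then extracts the fields. The sample lines and the simplified field pattern are assumptions for illustration, not the real mainframe output or the full expression above.

```python
import re

# "/*" followed by QRY with no "*/" in between: QRY sits inside an open comment.
QRY_IN_COMMENT = re.compile(r"/\*(?:(?!\*/).)*QRY")
# Simplified field pattern: first blank-delimited column, then the QRY(...) clause.
FIELDS = re.compile(r"^(?P<prgName>\S+)[ \t].*?QRY\((?P<qryName>.*?)\)")


def parse_lines(lines):
    """Return (program, query) pairs for lines whose QRY is outside a comment."""
    results = []
    for line in lines:
        if QRY_IN_COMMENT.search(line):
            continue  # line-oriented check only; multi-line comments are not tracked
        match = FIELDS.search(line)
        if match:
            results.append((match.group("prgName"), match.group("qryName")))
    return results


# Hypothetical sample lines.
sample = [
    "PGM001 120 /*QRY(OLDQ)",
    "PGM002 45 /* note */QRY(NEWQ) FILE((LIB/FILE1))",
    "PGM003 7 QRY(PLAIN)",
]
print(parse_lines(sample))
```

The first sample line is dropped because its QRY follows an unclosed comment start; the other two are kept.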

Related

How can I create a Regex that matches and transforms a period delimited path?

I am using den4b Renamer to rename a lot of files that follow a specific pattern. The program allows me to use RegEx: (https://www.den4b.com/wiki/ReNamer:Regular_Expressions)
I am stuck trying to conjure up an expression for a specific pattern.
My current RegEx:
Expression: ^(com\.)(([\w\s]*\.){0,4})([\w\s]*)$
Replace: \L$1\L$2\u$4
Note: \L and \u change the case of the sub-expression that follows them: \L converts it to lower case and \u capitalizes its first character (the wiki page linked above has the full table).
Here are a few example strings so you can get an idea of the input:
Android File Transfer.svg
Angular Console.svg
Au.Edu.Uq.Esys.Escript.svg
Avidemux.svg
Blackmagic Fusion8.svg
Broken Sword.svg
Browser360 Beta.svg
Btsync GUI.svg
Buttercup Desktop.svg
Calc.svg
Calibre EBook Edit.svg
Calibre Viewer.svg
Call Of Duty.svg
com.GitHub.Plugarut.Pwned Checker.svg
com.GitHub.Plugarut.Wingpanel Monitor.svg
com.GitHub.Rickybas.Date Countdown.svg
com.GitHub.Spheras.Desktopfolder.svg
com.GitHub.Themix Project.Oomox.svg
com.GitHub.Unrud.Remote Touchpad.svg
com.GitHub.Unrud.Video Downloader.svg
com.GitHub.Weclaw1.Image Roll.svg
com.GitHub.Zelikos.Rannum.svg
com.Gitlab.Miridyan.Mt.svg
com.Inventwithpython.Flippy.svg
com.Neatdecisions.Detwinner.svg
com.Rafaelmardojai.Share Preview.svg
com.Rafaelmardojai.Webfont Kit Generator.svg
Distributor Logo Antix.svg
Distributor Logo Archlabs.svg
Distributor Logo Dragonflybsd.svg
DOSBox.svg
Drawio.svg
Drweb GUI.svg
For this question I am focused on the strings that begin with com.xxx.xxx.
Since I can't only target those names in Renamer, the expression has to "play nice" with the other input file names and correctly leave them alone. That's why I've prefixed my expression with ^(com\.)
What I want:
Transform the entire string to lower case except for the last period separated part of the string.
Strip white space from the entire string.
For instance:
Original: com.GitHub.Alcadica.Develop.svg
After my Regex: com.github.alcadica.Develop.svg
What I want: com.github.alcadica.Develop.svg
This specific file is correctly renamed. What I'm having trouble with are names that have spaces in any part of the string. I can't figure out how to strip whitespace:
Original: com.Belmoussaoui.Read it Later.svg
After my Regex: com.belmoussaoui.Read it Later.svg
What I want: com.belmoussaoui.ReaditLater.svg
Here is a hypothetical example because I couldn't find a file with more than four parts. I want my pattern to be robust enough to handle this:
Original: com.Shatteredpixel.Another Level.Next.Pixel Dungeon.svg
After my Regex: com.shatteredpixel.another level.next.Pixel Dungeon.svg
What I want: com.shatteredpixel.anotherlevel.next.PixelDungeon.svg
Note that since I'm not using any kind of programming language, I don't have access to common string operations like trim, etc. I can, however, stack expressions. But this would create more overhead and since I am renaming thousands of files at a time I'd ideally like to keep it to one find/replace expression.
Any help would be greatly appreciated. Please let me know if I can provide any more information to make this more clear.
Edit:
I got it to work with the following rules:
Really inefficient, but it works. (Thanks to Jeremy in the comments for the idea)
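
For reference, the desired transformation can be written down unambiguously in a few lines of Python. This is only a sketch of the target behaviour, assuming the last dot-separated part is the extension and the part before it is the one that keeps its case. It is not a ReNamer rule, and the function name is hypothetical.

```python
import re


def normalize_name(name):
    """Lower-case all but the last name part of 'com.' paths and strip whitespace."""
    if not name.startswith("com."):
        return name  # leave every other file name alone
    *head, last, ext = name.split(".")
    head = [part.lower() for part in head]
    # Remove whitespace from the whole string after rejoining the parts.
    return re.sub(r"\s+", "", ".".join(head + [last, ext]))


print(normalize_name("com.Belmoussaoui.Read it Later.svg"))
print(normalize_name("Calibre EBook Edit.svg"))
```

Names that do not start with com. are returned unchanged, which matches the "play nice" requirement above.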

How to format a WinMerge filter to ignore part of the line

I would like WinMerge to compare the full text but exclude a variable substring.
Orientation="West" PhysicalAddress="2395226" DefFieldFrmt="Uf4d0" UnitCustomText="sec"
Orientation="West" PhysicalAddress="2395230" DefFieldFrmt="Uf4d1" UnitCustomText="sec"
In the lines above I want to ignore the PhysicalAddress="xxx" and locate the changed DefFieldFrmt="Uf4d1"
I have tried adding the filter:
PhysicalAddress=".*"
However this filters the complete line.
The actual text before and after the PhysicalAddress="xxx" will vary so I need a filter that says: match prefix and match suffix but ignore target variable substring.
Help please.

According to the documentation, it is not possible to use the line filters for this:
When a rule matches any part of the line, the entire difference is ignored. Therefore, you cannot filter just part of a line.
However, since WinMerge's source code is on GitHub, it is possible to add a feature request for this to its list of issues.

Vim: How to apply external command only to lines matching pattern

Two of my favorite Vim features are the ability to apply standard operators to lines matching a regex, and the ability to filter a selection or range of lines through an external command. But can these two ideas be combined?
For example, I have a text file that I use as a lab notebook, with notes from different dates separated by a line of dashes. I can delete all the dash lines with something like :% g/^-/d. But let's say I wanted to resize all the actual text lines, without touching those dash lines.
For a single paragraph, this would be something like {!}fmt. But how can this be applied to all the non-dash paragraphs? When I try what seems the logical thing, and just chain these two together with :% v/^-/!fmt, that doesn't work. (In fact, it seems to crash Vim...)
Is there a way to connect these two ideas, and only pass lines (not) matching a pattern into an external command like fmt?

Consider how the :global command works.
:global (and :v) make two passes through the buffer,
first marking each line that matches,
then executing the given command on the marked lines.
Thus if you can come up with a command – be it an Ex command or a command-line tool – and an associated range that can be applied to each matching line (and range), you have a winner.
For example, assuming that your text is soft-wrapped and your paragraphs are simply lines that don't begin with minus, here's how to reformat the paragraphs:
:v/^-/.!fmt -72
Here we used the range . "current line" and thus filtered every matching line through fmt. More complicated ranges work, too. For instance, if your text were hard-wrapped and paragraphs were defined as "from a line beginning with minus, up until the next blank line" you could instead use this:
:g/^-/.,'}!fmt -72
Help topics:
:h multi-repeat
:h :range!
:h :range
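
The two-pass behaviour of :global and :v can be illustrated with a small Python sketch. It first marks the lines that do not match the pattern and then applies a command to each marked line. Here textwrap.fill stands in for the external fmt -72 filter, which is an assumption made so the example runs without a shell; the function name is hypothetical.

```python
import re
import textwrap


def reformat_non_matching(lines, pattern=r"^-", width=72):
    """Mimic :v/pattern/.!fmt: mark lines first, then transform only marked lines."""
    regex = re.compile(pattern)
    # Pass 1: mark every line that does NOT match (this is what :v does).
    marked = [i for i, line in enumerate(lines) if not regex.search(line)]
    # Pass 2: run the "command" on each marked line; all other lines stay untouched.
    out = list(lines)
    for i in marked:
        out[i] = textwrap.fill(lines[i], width=width)
    return out


notes = ["----------", "word " * 30, "----------"]
for line in reformat_non_matching(notes):
    print(line)
```

The dash lines pass through unchanged, while the long soft-wrapped line is re-wrapped to at most 72 columns.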

One way to do it may be to apply the command to the lines matching the pattern 'not containing only dashes'.
The solution I would try is something like this (not tested):
:g/\v^(-+)@!/normal V!fmt
EDIT: I was doing some experiments and I think a recursive macro should work for you.
First of all, set nowrapscan:
set nowrapscan
This prevents the recursive macro from executing more often than you want.
Then run a search:
/\v^(-+)@!
Test whether pressing n and N works with your pattern, and tune it if needed.
After that, start recording the macro:
qqn:.!awk '{print $2}'^M$
In this case I use awk as an example; .! means "filter the current line through an external program".
Then, to make the macro recursive, just append the string '@q' to the register q:
let @q .= '@q'
Finally, move to the beginning of the buffer and run the recursive macro to make the modifications:
gg@q
Then you are done. Hope this helps.

notepad++ regular expressions to convert lines for SPSS syntax editor

I am currently busy building a syntax document in SPSS and have a column of variable strings that consists of approximately 40 lines (it will be much more in the coming week). SPSS has a nice way of creating it (see http://vault.hanover.edu/~altermattw/methods/stats/reliable/reliability-1.html), but it can only be done one variable at a time, which should be possible to automate.
I am a total beginner (I wouldn't mind if you called me a n00b) at search and replace with regular expressions in Notepad++, but I can use the extended search function as a basic user :P
The data contains Likert scale scores (from 1 to 7), and I would like to reverse them to do some tests.
For example: my variable name on the line is q_4_SQ001 and the line in the syntax editor is COMPUTE q_4_SQ001r=8-q_4_SQ001.
My question so far is thus:
How can I convert a line containing a unique variable name into its reverse formula?
So in this case, how can I replace the following lines:
q_4_SQ001
q_4_SQ002
q_4_SQ003
q_4_SQ004
into the syntax given below:
COMPUTE q_4_SQ001r=8-q_4_SQ001.
COMPUTE q_4_SQ002r=8-q_4_SQ002.
COMPUTE q_4_SQ003r=8-q_4_SQ003.
COMPUTE q_4_SQ004r=8-q_4_SQ004.
Please note the dot at the end of each line. I did this manually to give you an impression of what I would like to achieve. My data set has different questions and different variable strings, so I would like to make my life a bit easier right now :P
I also tried recording and running a macro as described here (http://stackoverflow.com/questions/2467875/notepad-replace-all-regular-expression-start-of-the-line-and-end-of-the-line), but that is still pretty time consuming, since I have to do each line manually and clean up with extended search at the end.
Wouldn't it be easier to convert each line?
Thanks a bunch in advance :)

Funny, Notepad++ works under Wine, as I just found out ;)
New file, inserted:
q_4_SQ001
q_4_SQ002
q_4_SQ003
q_4_SQ004
Select all (CTRL+A), replace (CTRL+R).
Tick Regular Expr, stick ^(.*)$ in the "find" bit (first textbox), and COMPUTE \1r=8-\1. in the "replace" bit (second textbox). Hit the Find button, and then the Replace Rest button.
Parentheses () around a pattern cause the matched text to be "memorised"; each set of parentheses is available to the replacement pattern via \1, \2, etc.
After the replace, I got:
COMPUTE q_4_SQ001r=8-q_4_SQ001.
COMPUTE q_4_SQ002r=8-q_4_SQ002.
COMPUTE q_4_SQ003r=8-q_4_SQ003.
COMPUTE q_4_SQ004r=8-q_4_SQ004.
Which I assume is what you wanted. Enjoy.
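
The same capture-and-reuse replacement can be checked outside Notepad++ with a short Python sketch. It uses .+ rather than .* so that a trailing empty line is not turned into a stray COMPUTE statement; that tweak and the function name are assumptions of this example, not part of the steps above.

```python
import re


def to_reverse_syntax(text):
    """Turn each variable name line into its SPSS reverse-scoring COMPUTE line."""
    # \1 reuses the text captured by the parentheses, exactly as in the editor.
    return re.sub(r"^(.+)$", r"COMPUTE \1r=8-\1.", text, flags=re.MULTILINE)


variables = "q_4_SQ001\nq_4_SQ002\nq_4_SQ003\nq_4_SQ004"
print(to_reverse_syntax(variables))
```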

Use cases for regular expression find/replace

I recently discussed editors with a co-worker. He uses one of the less popular editors and I use another (I won't say which ones since it's not relevant and I want to avoid an editor flame war). I was saying that I didn't like his editor as much because it doesn't let you do find/replace with regular expressions.
He said he's never wanted to do that, which was surprising since it's something I find myself doing all the time. However, off the top of my head I wasn't able to come up with more than one or two examples. Can anyone here offer some examples of times when they've found regex find/replace useful in their editor? Here's what I've been able to come up with since then as examples of things that I've actually had to do:
Strip the beginning of a line off of every line in a file that looks like:
Line 25634 :
Line 632157 :
Taking a few dozen files with a standard header which is slightly different for each file and stripping the first 19 lines from all of them all at once.
Piping the result of a MySQL select statement into a text file, then removing all of the formatting junk and reformatting it as a Python dictionary for use in a simple script.
In a CSV file with no escaped commas, replace the first character of the 8th column of each row with a capital A.
Given a bunch of GDB stack traces with lines like
#3 0x080a6d61 in _mvl_set_req_done (req=0x82624a4, result=27158) at ../../mvl/src/mvl_serv.c:850
strip out everything from each line except the function names.
Does anyone else have any real-life examples? The next time this comes up, I'd like to be more prepared to list good examples of why this feature is useful.
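
Two of the examples above (stripping the "Line N :" prefix and pulling function names out of GDB stack traces) are easy to sketch with Python's re module. The patterns are one plausible reading of the formats shown, and the function names are hypothetical; real traces may need adjustments.

```python
import re


def strip_line_prefix(text):
    """Remove a leading 'Line <number> :' from every line."""
    return re.sub(r"^Line \d+ :[ \t]*", "", text, flags=re.MULTILINE)


def gdb_function_names(trace):
    """Extract the function name from each GDB frame line ('#3 0x... in name (...)')."""
    frame = r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \("
    return re.findall(frame, trace, flags=re.MULTILINE)


trace = ("#3 0x080a6d61 in _mvl_set_req_done (req=0x82624a4, result=27158) "
         "at ../../mvl/src/mvl_serv.c:850")
print(strip_line_prefix("Line 25634 : first\nLine 632157 : second"))
print(gdb_function_names(trace))
```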

Just last week, I used regex find/replace to convert a CSV file to an XML file.
Simple enough to do really, just chop up each field (luckily it didn't have any escaped commas) and push it back out with the appropriate tags in place of the commas.

Regexes make it easy to replace whole words using word boundaries.
(\b\w+\b)
So you can replace unwanted words in your file without disturbing words like Scunthorpe
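
A minimal Python sketch of the word-boundary idea: wrapping a specific word in \b ... \b replaces it only where it stands alone, not where it appears inside a longer word. The helper name and the sample sentence are made up for illustration.

```python
import re


def replace_word(text, word, replacement):
    """Replace 'word' only where it appears as a whole word."""
    pattern = r"\b" + re.escape(word) + r"\b"
    # A function replacement avoids backslash interpretation in 'replacement'.
    return re.sub(pattern, lambda match: replacement, text)


print(replace_word("cat concatenate cat.", "cat", "dog"))
```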

Yesterday I took a create table statement I made for an Oracle table and converted the fields to setString() method calls using JDBC and PreparedStatements. The table's field names were mapped to my class properties, so regex search and replace was the perfect fit.
Create Table text:
...
field_1 VARCHAR2(100) NULL,
field_2 VARCHAR2(10) NULL,
field_3 NUMBER(8) NULL,
field_4 VARCHAR2(100) NULL,
....
My Regex Search:
/([a-z_0-9]+) .*/
My Replacement:
pstmt.setString(1, \1);
The result:
...
pstmt.setString(1, field_1);
pstmt.setString(1, field_2);
pstmt.setString(1, field_3);
pstmt.setString(1, field_4);
....
I then went through and manually set the position int for each call and changed the method to setInt() (and others) where necessary, but that worked handy for me. I actually used it three or four times for similar field to method call conversions.
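
The field-to-method-call conversion can be reproduced in Python as a rough sketch. Unlike the editor replacement described above, this version also fills in the position with a counter, which is an addition of this example. Choosing setInt() and other methods by column type is still left as a manual step.

```python
import itertools
import re


def to_setters(ddl):
    """Turn 'field_name TYPE ...,' lines into numbered pstmt.setString() calls."""
    position = itertools.count(1)

    def build_call(match):
        return f"pstmt.setString({next(position)}, {match.group(1)});"

    # Capture the field name, then consume the rest of the line.
    return re.sub(r"^[ \t]*([a-z_0-9]+) .*$", build_call, ddl, flags=re.MULTILINE)


ddl = """field_1 VARCHAR2(100) NULL,
field_2 VARCHAR2(10) NULL,
field_3 NUMBER(8) NULL,"""
print(to_setters(ddl))
```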

I like to use regexps to reformat lists of items like this:
int item1
double item2
to
public void item1(int item1){
}
public void item2(double item2){
}
This can be a big time saver.

I use it all the time when someone sends me a list of patient visit numbers in a column (say 100-200) and I need them in a '0000000444','000000004445' format. Works wonders for me!
I also use it to pull out email addresses in an email. I send out group emails often and all the bounced returns come back in one email. So, I regex to pull them all out and then drop them into a string var to remove from the database.
I even wrote a little dialog prog to apply regex to my clipboard. It grabs the contents applies the regex and then loads it back into the clipboard.

One thing I use it for in web development all the time is stripping some text of its HTML tags. This might need to be done to sanitize user input for security, or for displaying a preview of a news article. For example, if you have an article with lots of HTML tags for formatting, you can't just do LEFT(article_text,100) + '...' (plus a "read more" link) and render that on a page at the risk of breaking the page by splitting apart an HTML tag.
Also, I've had to strip img tags in database records that link to images that no longer exist. And let's not forget web form validation. If you want to make sure a user has entered a correct email address (syntactically speaking) into a web form, this is about the only way of checking it thoroughly.

I've just pasted a long character sequence into a string literal, and now I want to break it up into a concatenation of shorter string literals so it doesn't wrap. I also want it to be readable, so I want to break only after spaces. I select the whole string (minus the quotation marks) and do an in-selection-only replace-all with this regex:
/.{20,60} /
...and this replacement:
/$0"¶ + "/
...where the pilcrow is an actual newline, and the number of spaces varies from one incident to the next. Result:
String s = "I recently discussed editors with a co-worker. He uses one "
+ "of the less popular editors and I use another (I won't say "
+ "which ones since it's not relevant and I want to avoid an "
+ "editor flame war). I was saying that I didn't like his "
+ "editor as much because it doesn't let you do find/replace "
+ "with regular expressions.";
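
The same split can be sketched in Python, where a replacement function plays the role of $0. The four-space indent and the helper name are assumptions; as in the original, breaks happen only after a space.

```python
import re


def split_literal(text, indent="    "):
    """Break a long string into concatenated literals, splitting only after spaces."""
    # Each match is 20 to 60 characters followed by a space; close the literal
    # there and open a new one on the next line.
    joiner = '"\n' + indent + '+ "'
    body = re.sub(r".{20,60} ", lambda match: match.group(0) + joiner, text)
    return '"' + body + '";'


long_text = ("I recently discussed editors with a co-worker. He uses one of the "
             "less popular editors and I use another.")
print("String s = " + split_literal(long_text))
```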

The first thing I do with any editor is try to figure out its regex oddities. I use it all the time. Nothing really crazy, but it's handy when you've got to copy/paste stuff between different types of text - SQL <-> PHP is the one I do most often - and you don't want to fart around making the same change 500 times.

Regex is very handy any time I am trying to replace a value that spans multiple lines. Or when I want to replace a value with something that contains a line break.
I also like that you can match things in a regular expression and not replace the full match using the $# syntax to output the portion of the match you want to maintain.

I agree with you on points 3, 4, and 5 but not necessarily points 1 and 2.
In some cases 1 and 2 are easier to achieve using an anonymous keyboard macro.
By this I mean doing the following:
Position the cursor on the first line
Start a keyboard macro recording
Modify the first line
Position the cursor on the next line
Stop record.
Now all that is needed to modify the next line is to repeat the macro.
I could live without support for regex but could not live without anonymous keyboard macros.
