Regex capture group with non-uniform space group - regex
I'm trying to parse the output of the Comware switch command "display interface brief" and convert it to a CSV file using RegEx. The command prints its output in the following format:
Interface Link Speed Duplex Type PVID Description
BAGG51 UP 4G(a) F(a) T 1
FGE1/0/42 DOWN auto A T 1 ### LIVRE ###
GE6/0/20 UP 100M(a) F(a) A 1 LIVRE (MGMT - [WAN8-P8]
This is proving quite challenging for me: no matter which RegEx I try, it doesn't properly handle output such as "DOWN auto" and "100M(a) F(a)", where there is only one space between the fields. I also couldn't find a way to properly handle the last field, which can contain one or more spaces; most of the RegExes I tried create a separate capture group for each space-separated word instead of treating the text as a single field.
I've tried countless approaches, and I couldn't find much about parsing non-uniform columns on the Internet or on StackOverflow.
I need to parse it into the following format, with 7 capture groups per line, respecting the end of line:
BAGG51;UP;4G(a);F(a);T;1
FGE1/0/42;DOWN;auto;A;T;1;### LIVRE ###
GE6/0/20;UP;100M(a);F(a);A;1;LIVRE (MGMT - [WAN8-P8]
The most successful RegEx I've found so far is ^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+), replaced with $1;$2;$3;$4;$5;$6;$7 in Notepad++, but it doesn't properly handle the "Description" field, which can be empty.
The following pattern seems to be working here:
^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)(?:[ ]+(.*))?
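This follows your pattern with six mandatory capture groups, followed by an optional seventh capture group. The (?:[ ]+(.*))? at the end of the pattern optionally matches one or more spaces followed by the rest of the line, so a missing description still matches and a description containing spaces is captured whole. Note that this pattern should be used in multiline mode, and in Notepad++ with ". matches newline" unchecked so that (.*) stops at the end of the line.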
Here is a working demo
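For anyone doing the conversion outside Notepad++, here is a minimal Python sketch of the same pattern. It assumes the switch output is already available as individual lines; the helper name to_csv_line is made up for illustration. Matching one line at a time also keeps \s+ from running across a line break.

```python
import re

# Six mandatory whitespace-separated fields, then an optional free-text
# description that may itself contain spaces (same pattern as above).
ROW = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)(?:[ ]+(.*))?"
)

def to_csv_line(line):
    """Return the fields of one row joined by ';', or None if it does not match."""
    match = ROW.match(line.rstrip())
    if match is None:
        return None
    # Group 7 is None when the description is missing; drop it in that case.
    return ";".join(field for field in match.groups() if field is not None)

rows = [
    "BAGG51 UP 4G(a) F(a) T 1",
    "FGE1/0/42 DOWN auto A T 1 ### LIVRE ###",
    "GE6/0/20 UP 100M(a) F(a) A 1 LIVRE (MGMT - [WAN8-P8]",
]
for row in rows:
    print(to_csv_line(row))
```

Rows without a description simply produce six fields instead of seven, which matches the output format requested in the question.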
Related
How to use Postgres Regex Replace with a capture group
As the title says, I am trying to reference capture groups in a regex replace in a Postgres query. I have read that REGEXP_REPLACE does not support using regex capture groups. The regex I am using is
r"(?:[\s\(\)\=\)\,])(username)(?:[\s\(\)\=\)\,])?"gm
The above regex almost does what I need, but I need to find out how to only allow a match if the surrounding groups also match something. A "username" should never be matched if it just happens to be a substring of a word. By ensuring it's surrounded by one of the characters above, I can be much more confident that it's a username.
An example application of the regex would be something like this in Postgres (of course I would be doing an update rather than a select):
select *, REGEXP_REPLACE(reqcontent,'(?:[\s\(\)\=\)\,])(username)(?:[\s\(\)\=\)\,])?' ,'NEW-VALUE', 'gm') from table where column like '%username%' limit 100;
If there is any more context that can be provided, please let me know. I have also found similar posts (postgresql regexp_replace: how to replace captured group with evaluated expression (adding an integer value to capture group)), but that one is more about splicing values back in and I don't think it quite answers my question.
More context and example value(s) for the regex to work against: the text below may look familiar; these are JQL filters in Jira. We are looking to update our usernames and all their occurrences in the table that contains the filters. Below are a few examples of filters.
We originally were just doing a find and replace, but that doesn't work because we have some usernames that are only two characters long, and it was matching non-usernames (e.g. the username je would put the new value where the word project is found, which completely malforms the JQL string, resulting in something like proNEW-VALUEct = blah blah).
type = bug AND status not in (Closed, Executed) AND assignee in (test, username)
assignee=username
assignee = username
Definition of Answered:
Regex that will only match a 'username' if it's surrounded by one of the special characters
A way to regex/replace that username in a Postgres query.
Capturing groups are used to keep the important bits of information matched with a regex.
Either use capturing groups around the string parts you want to keep in the result and use their placeholders in the replacement:
REGEXP_REPLACE(reqcontent,'([\s\(\)\=\)\,])username([\s\(\)\=\)\,])?' ,'\1NEW-VALUE\2', 'gm')
Or use lookarounds:
REGEXP_REPLACE(reqcontent,'(?<=[\s\(\)\=\)\,])(username)(?=[\s\(\)\=\)\,])?' ,'NEW-VALUE', 'gm')
Or, in this case, use word boundaries to ensure you only replace the username when it appears as a whole word between the special characters:
REGEXP_REPLACE(reqcontent,'\yusername\y' ,'NEW-VALUE', 'g')
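To see how the delimiter-based and word-boundary variants differ, here is a small Python sketch rather than Postgres SQL. Python spells the word boundary \b where Postgres uses \y, and its lookbehind must be fixed-width, so the sketch uses negated lookarounds that also succeed at the start and end of the string. The function names and sample filters are illustrative only.

```python
import re

# Characters allowed to surround a username (taken from the question's class).
DELIMS = r"\s()=,"

def rename_user(jql, old, new):
    """Replace `old` only where it is delimited by DELIMS or the string edges."""
    pattern = re.compile(
        rf"(?<![^{DELIMS}]){re.escape(old)}(?![^{DELIMS}])"
    )
    # A function replacement keeps any backslashes in `new` literal.
    return pattern.sub(lambda m: new, jql)

def rename_user_word_boundary(jql, old, new):
    """Simpler variant using word boundaries (\\b here, \\y in Postgres)."""
    return re.sub(rf"\b{re.escape(old)}\b", lambda m: new, jql)

print(rename_user("project = X AND assignee in (test, je)", "je", "NEW-VALUE"))
print(rename_user("assignee=je", "je", "NEW-VALUE"))
```

Because the lookarounds consume nothing, adjacent usernames that share a delimiter are both replaced. The word-boundary variant is shorter but would also rewrite je inside something like je-smith, since - is not a word character.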
Regex in Notepad++ to select on string length between specific XML tags
I'm working with Emergency Services data in the NEMSIS XSD. I have a field that is constrained to only 50 characters. I've searched this site extensively and tried many solutions, but Notepad++ rejects all of them, saying not found.
Here's an XML sample:
<E09>
  <E09_01>-5</E09_01>
  <E09_02>-5</E09_02>
  <E09_03>-5</E09_03>
  <E09_04>-5</E09_04>
  <E09_05>this one is too long Non-Emergency - PT IS BEING DISCHARGED FROM H AFTER BEING ADMITTED FOR FAILURE TO THRIVE AND ALCOHOL WITHDRAWAL</E09_05>
</E09>
<E09>
  <E09_01>-5</E09_01>
  <E09_02>-5</E09_02>
  <E09_03>-5</E09_03>
  <E09_04>-5</E09_04>
  <E09_05>this one is is okay</E09_05>
</E09>
I've tried solutions naming the E09_05 tag in different ways, using <\/E09_05> for the closing tag as I've seen in some examples, and just </E09_05> as I've seen in others. I've tried ^.{50,}$ between them, or [a-zA-Z]{50,}$ between them, and I've tried wrapping those in-between expressions in () and without. I even tried just [\s\S]*? in between the tags.
The only thing that Notepad++ finds is when I use ^.{50,}$ by itself with no XML tags ... but then I wind up hitting all the E13_01 tags (which are EMS narratives, and always > 50 characters), making for painstaking and wrist-aching clicks.
I wanted to XSLT this, but each E09_05 field needs too much individual, hands-on tweaking to automate it. Perl is not an option in this environment (and not a tool I know at all anyway).
To be truly sublime, both E09_05 and E09_08 fields with string lengths >50 need to be what is selected by the search ... but no other elements of any kind or length.
Thanks in advance. I'm sure I'm just missing some subtle \, or () or [] somewhere ... hopefully ...
The following regex will find the text content of <E09_05> elements with more than 50 characters.
(?<=<E09_05>).{51,}?(?=</E09_05>)
Explanation
(?<=<E09_05>) Start matching right after <E09_05>
.{51,}? Match 51 or more characters (in a single line). The ? makes it reluctant, so it'll stop at the first </E09_05>
(?=</E09_05>) Stop matching right before </E09_05>
For truly sublime matching, i.e. both E09_05 and E09_08 fields with string lengths >50, use:
(?<=<(E09_0[58])>).{51,}?(?=</\1>)
Explanation
<(E09_0[58])> Match <E09_05> or <E09_08>, and capture the name as group 1
</\1> Use \1 backreference to match name inside </name>
If you want to shorten the text with an ellipsis at the end, e.g. Hello World with max length 8 becomes Hello..., use:
Find what: (?<=<(E09_0[58])>)(.{47}).{4,}(?=</\1>)
Replace with: \2...
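The same check can be scripted when there are many files to review. This is a rough Python sketch, not part of the Notepad++ answer: it matches the tags directly instead of using lookarounds, uses [^<] instead of . so a match cannot run across a closing tag, and assumes the field text contains no markup. The function names are made up.

```python
import re

MAX_LEN = 50  # limit stated in the question

# Group 1 is the tag name (E09_05 or E09_08), group 2 the over-long text.
LONG_FIELD = re.compile(rf"<(E09_0[58])>([^<]{{{MAX_LEN + 1},}})</\1>")

def find_long_fields(xml):
    """Return (tag, text) pairs for every E09_05/E09_08 longer than MAX_LEN."""
    return [(m.group(1), m.group(2)) for m in LONG_FIELD.finditer(xml)]

def truncate_long_fields(xml):
    """Shorten over-long fields to MAX_LEN characters, ending in '...'."""
    def shorten(m):
        tag, text = m.group(1), m.group(2)
        return f"<{tag}>{text[:MAX_LEN - 3]}...</{tag}>"
    return LONG_FIELD.sub(shorten, xml)

sample = (
    "<E09_05>" + "x" * 60 + "</E09_05>\n"
    "<E09_05>this one is okay</E09_05>\n"
    "<E13_01>" + "y" * 80 + "</E13_01>\n"
)
print(find_long_fields(sample))
print(truncate_long_fields(sample))
```

Since the question says each field needs hands-on tweaking, find_long_fields is probably the more useful half; the truncation is only there to mirror the ellipsis replacement above.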
How to combine multiple RegEx commands for Notepad++ using capture groups and alternations?
I am converting exported SQL views as files to a different syntax using a separate specialized conversion tool. This tool can't handle certain commands and formatting, so I'm using Notepad++ with RegEx to alter the files ahead of time. So far I am getting the results that I want, but it takes three separate Find/Replace actions. I'd like to reduce these three RegEx actions down to one if possible.
Find: (.*)(CREATE VIEW.*\nGO)(.*)
Replace: \2
Find: (CREATE VIEW )(.*)(\r\nAS)
Replace: \1"\2"\3
Find: (oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace: (?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)
I'm using Notepad++ 7.7.1 64-bit, Find/Replace with Regular Expression search mode and ". matches newline" checked. You'll see in my code that I'm already using capture groups with alternation. I thought I could combine the first two RegEx steps as additional capture groups in Step 3, but it doesn't work out, possibly because they are nested. I tried referencing the nested groups by incrementing the reference number accordingly, but it doesn't work (it blanks out the result).
Here is an example SQL view file. It's not a working view because I added "oldschema2" so the RegEx would have something to find for one of the replacements, but it's representative as an example here.
garbage text beforehand
CREATE VIEW [oldschema1].[viewname]
AS
SELECT DISTINCT TOP (100) PERCENT oldschema1.TABLENAME.FIELD1, oldschema1.TABLENAME.FIELD2
FROM oldschema1.TABLENAME
WHERE (oldschema1.TABLENAME.FIELD3 = N'Z003') AND oldschema2.TABLENAME.FIELD2 = 1
ORDER BY oldschema1.TABLENAME.FIELD1
GO
garbage text after
Here are some additional details of what I'm trying to achieve with each pass.
Notepad++ RegEx
Step 1 - isolate view block from CREATE VIEW to GO
Find: (.*)(CREATE VIEW.*\nGO)(.*)
Replace: \2
Step 2 - put quotes around view name
Find: (CREATE VIEW )(.*)(\r\nAS)
Replace: \1"\2"\3
Step 3 - remove/replace various texts and insert a line at the beginning of the file
Find: (oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace: (?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)
The expected output from the above example would be:
SET SCHEMA schemaname;

CREATE VIEW "viewname"
AS
SELECT DISTINCT TABLENAME.FIELD1, TABLENAME.FIELD2
FROM TABLENAME
WHERE (TABLENAME.FIELD3 = N'Z003') AND TABLENAME.FIELD2 = 1
ORDER BY TABLENAME.FIELD1
;
which I achieve with the above three steps, but I'd like to do it in one Find/Replace if possible. I'm pretty new to RegEx, and to StackOverflow for that matter. Your help is greatly appreciated.
Step 1
I'm not sure about it, but I'm guessing we would want an expression similar to:
[\s\S]*?(CREATE VIEW[\s\S]*GO\s*)[\s\S]*
to be replaced with $1, where our desired data is in this capturing group:
(CREATE VIEW[\s\S]*GO\s*)
We can even remove \s*:
(CREATE VIEW[\s\S]*GO)
and just try:
[\s\S]*?(CREATE VIEW[\s\S]*GO)[\s\S]*
with an m flag. In the right panel of this demo, the expression is further explained, if you might be interested.
Step 2
We can likely try:
(CREATE VIEW)(.*)
and replace with:
SET SCHEMA schemaname;\n\n$1 "viewname"
Step 3
This step would probably be done with an expression similar to:
TOP \(100\) PERCENT |oldschema1\.
being replaced with an empty string.
Step 4
\s*GO
being replaced with \n; or just ;, and we would likely have the desired output, though I'm not sure.
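If running the passes from a script is an option, the steps can be chained in one function. The following is only a Python sketch of that idea, not a single Notepad++ expression: convert_view is a hypothetical helper, the schema names come from the question, and it assumes GO sits alone on its own line.

```python
import re

def convert_view(sql, schema="schemaname"):
    """Isolate the CREATE VIEW ... GO block and rewrite it for the target tool."""
    sql = sql.replace("\r\n", "\n")  # normalise Windows line endings first
    # Step 1: keep only the text from CREATE VIEW up to the first GO line.
    found = re.search(r"CREATE VIEW.*?^GO$", sql, flags=re.S | re.M)
    if found is None:
        raise ValueError("no CREATE VIEW ... GO block found")
    block = found.group(0)
    # Step 3 (part): drop old schema prefixes, square brackets and TOP (100) PERCENT.
    block = re.sub(
        r"\[oldschema1\]\.|oldschema1\.|oldschema2\.|TOP \(100\) PERCENT ?|[\[\]]",
        "",
        block,
    )
    # Step 2: put double quotes around the view name.
    block = re.sub(r'^(CREATE VIEW )(\S+)', r'\1"\2"', block)
    # Step 3 (part): turn the trailing GO into a semicolon.
    block = re.sub(r"^GO$", ";", block, flags=re.M)
    # Step 3 (part): prepend the SET SCHEMA line.
    return f"SET SCHEMA {schema};\n\n{block}"

example = (
    "garbage text beforehand\r\n"
    "CREATE VIEW [oldschema1].[viewname]\r\n"
    "AS\r\n"
    "SELECT DISTINCT TOP (100) PERCENT oldschema1.TABLENAME.FIELD1\r\n"
    "FROM oldschema1.TABLENAME\r\n"
    "GO\r\n"
    "garbage text after\r\n"
)
print(convert_view(example))
```

Stripping the schema prefix before quoting the view name means the quoting step only ever sees the bare name, which avoids the nested-group bookkeeping that made the combined Notepad++ expression awkward.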
Capture repeating group with RegEx
I am trying to parse an input line looking like this:
AC#10,N850FD,10%,WEEK,IFR,1/22:45,2/00:58,390,F,0743,KEWR,3/02:30,3/05:04,380,F,1202,KMEM,3/11:15,3/20:04,350,F,0038,LFPG,4/04:00,4/15:35,330,F,5342,ZGGG,4/19:05,4/22:50,370,F,5608,RJAA,5/13:25,5/14:45,300,F,0060,RJBB,5/18:05,6/06:35,330,F,0060,KMEM,6/20:45,0/05:42,340,F,0948,PHNL,0/07:21,0/12:24,370,F,0802,KLAX,0/14:49,0/18:09,370,F,0806,KMEM
The first 5 "fields" are the "header" ("AC#10,N850FD,10%,WEEK,IFR"), and the rest are repeating groups of 6 "fields" (e.g. "1/22:45,2/00:58,390,F,0743,KEWR"). I'm a RegEx newbie, but to do this I have come up with the following RegEx statement:
(AC#)(\d+),([a-zA-Z0-9]+),(\d+%),(WEEK|DAY),(IFR|VFR)(,\d\/\d{2}:\d{2},\d\/\d{2}:\d{2},\d+,[FR],\d+,[A-Z0-9]{3,5})+
The first groups (each "field" in the "header") are extracted fine, and I can easily access each value (group). My problem is the repeating groups: only the last of the repeating "groups" is extracted. If I remove the very last "+", only the first of the repeating "groups" is extracted (naturally). Example here: https://regex101.com/r/HsQMge/1
Here is the result I hope to get (as groups):
AC#
10
N850FD
10%
WEEK
IFR
,1/22:45,2/00:58,390,F,0743,KEWR
,3/02:30,3/05:04,380,F,1202,KMEM
,3/11:15,3/20:04,350,F,0038,LFPG
,4/04:00,4/15:35,330,F,5342,ZGGG
,4/19:05,4/22:50,370,F,5608,RJAA
,5/13:25,5/14:45,300,F,0060,RJBB
,5/18:05,6/06:35,330,F,0060,KMEM
,6/20:45,0/05:42,340,F,0948,PHNL
,0/07:21,0/12:24,370,F,0802,KLAX
,0/14:49,0/18:09,370,F,0806,KMEM
RegEx is probably not the right tool for this task. You could use it just to split the string into an array; the rest is a job for array_chunk:
// Input line: header fields followed by repeating groups of 6 fields
$str = "AC#10,N850FD,10%,WEEK,IFR,1/22:45,2/00:58,390,F,0743,KEWR,3/02:30,3/05:04,380,F,1202,KMEM,3/11:15,3/20:04,350,F,0038,LFPG,4/04:00,4/15:35,330,F,5342,ZGGG,4/19:05,4/22:50,370,F,5608,RJAA,5/13:25,5/14:45,300,F,0060,RJBB,5/18:05,6/06:35,330,F,0060,KMEM,6/20:45,0/05:42,340,F,0948,PHNL,0/07:21,0/12:24,370,F,0802,KLAX,0/14:49,0/18:09,370,F,0806,KMEM";
// Split on commas and on the # in "AC#10", so the header also yields 6 elements
$data = preg_split('/[,#]/',$str);
// Cut the flat array into rows of 6: the header first, then one row per group
$data = array_chunk($data, 6);
var_dump($data);
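The same split-then-chunk idea, sketched in Python for comparison. The function name split_and_chunk is hypothetical, and the approach relies on the header producing exactly six pieces once "AC#10" is split at the #.

```python
import re

def split_and_chunk(line, size=6):
    """Split on ',' and '#', then slice the flat list into rows of `size` fields."""
    fields = re.split(r"[,#]", line)
    return [fields[i:i + size] for i in range(0, len(fields), size)]

line = "AC#10,N850FD,10%,WEEK,IFR,1/22:45,2/00:58,390,F,0743,KEWR"
for row in split_and_chunk(line):
    print(row)
```

Unlike a full regular expression, this does no validation of the individual fields, so a malformed line would be chunked without complaint.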
I can't get it to work with one regular expression (I still think it should be possible), but I got it working in two passes. First I use the following RegEx to split the individual fields of the "header" into groups, and then grab the rest of the input line as the last group (using "(.*)" after the last comma):
(AC#)(\d+),([a-zA-Z0-9]+),(\d+%),(WEEK|DAY),(IFR|VFR),(.*)
This leaves me with the rest of the information in one single group ("1/22:45,2/00:58,390,F,0743,KEWR,3/02:30,3/05:04,380,F,1202,KMEM,3/11:15,3/20:04,350,F,0038,LFPG,4/04:00,4/15:35,330,F,5342,ZGGG,4/19:05,4/22:50,370,F,5608,RJAA,5/13:25,5/14:45,300,F,0060,RJBB,5/18:05,6/06:35,330,F,0060,KMEM,6/20:45,0/05:42,340,F,0948,PHNL,0/07:21,0/12:24,370,F,0802,KLAX,0/14:49,0/18:09,370,F,0806,KMEM"). I then parse this group with another regular expression that groups the repeating sections (without a problem, now that there is no longer a "header"):
(\d\/\d{2}:\d{2},\d\/\d{2}:\d{2},\d+,[FR],\d+,[A-Z0-9]{3,4})+
The groups are as I had hoped (even better, as "," is no longer part of the result). Odd that it's not working with the "header". Anyhow, I don't have to resort to "manually" splitting the line, and the RegEx statements can still "validate" each section.
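The reason a single expression keeps only the last section is that, in most regex engines, a capturing group inside a repetition is overwritten on every iteration. Below is a minimal Python sketch of the two-pass approach described above; parse_line is a made-up name, and the leg pattern uses {3,5} as in the original question.

```python
import re

# Pass 1: the five header values, plus everything after them as one group.
HEADER = re.compile(r"AC#(\d+),([A-Za-z0-9]+),(\d+%),(WEEK|DAY),(IFR|VFR),(.*)")
# Pass 2: one repeating section of six fields.
LEG = re.compile(r"(\d/\d{2}:\d{2}),(\d/\d{2}:\d{2}),(\d+),([FR]),(\d+),([A-Z0-9]{3,5})")

def parse_line(line):
    """Return (header_fields, list_of_leg_tuples), or None if the header is invalid."""
    match = HEADER.fullmatch(line)
    if match is None:
        return None
    *header, rest = match.groups()
    # findall applies LEG repeatedly, so every section is kept, not just the last.
    return header, LEG.findall(rest)

line = ("AC#10,N850FD,10%,WEEK,IFR,"
        "1/22:45,2/00:58,390,F,0743,KEWR,"
        "3/02:30,3/05:04,380,F,1202,KMEM")
header, legs = parse_line(line)
print(header)
for leg in legs:
    print(leg)
```

Each leg comes back as a tuple of its six fields, so both the header and every repeating section stay validated by a pattern.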
Using regex multiple capture groups to split up a string
I have a file that looks like this...
"1234567123456","V","0","0","BLAH","BLAH","BLAH","BLAH"
"1234567123456","D","TEST1 "
"1234567123456","D","TEST 2~TEST3"
"1234567123456","R","TEST4~TEST5"
"1234567123457","V","0","0","BLAH","BLAH","BLAH","BLAH"
"1234567123457","D","TEST 6"
"1234567123457","D","TEST7"
"1234567123457","R","TEST 8~TEST9~TEST,10"
All I'm trying to do is parse the D and R lines. The ~ is used in this case as a separator. So the end result would be...
"1234567123456","V","0","0","BLAH","BLAH","BLAH","BLAH"
"1234567123456","D","TEST1 "
"1234567123456","D","TEST 2"
"1234567123456","D","TEST3"
"1234567123456","R","TEST4"
"1234567123456","R","TEST5"
"1234567123457","V","0","0","BLAH","BLAH","BLAH","BLAH"
"1234567123457","D","TEST 6"
"1234567123457","D","TEST7"
"1234567123457","R","TEST 8"
"1234567123457","R","TEST9"
"1234567123457","R","TEST,10"
I'm using regex in applications like Textpad and Notepad++. I have not figured out how to use a regex like /.+/g because the applications do not like the forward slashes, so I don't think I can use things like the global modifier. I currently have the following regex...
//In a program like Textpad/Notepad++
<FIND> "(.{13})","D","([^~]*)~(.*)
<REPLACE> "\1","D","\2"\n"\1","D","\3
If I run a find and replace with the above params a few times, it works fine (for the D lines only). The problem is that the number of lines to be produced is unknown. For example...
"1234567123456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
"1234567123457","D","TEST1~TEST2~TEST3"
"1234567123458","D","TEST1~TEST2"
"1234567123459","D","TEST1~TEST2~TEST3~TEST4"
I was hoping to be able to use a MULTI capture group to make this work. I found this PAGE talking about the common mistake of confusing repeating a capturing group with capturing a repeated group. I need to capture a repeated group. For some reason I just could not make mine work right, though. Does anyone else have an idea?
Note: If I could get rid of the leading and trailing spaces (e.g. "1234567123456","D","TEST1 " ending up as "1234567123456","D","TEST1"), that would be even better, but it's not necessary.
RESOURCES:
http://www.regular-expressions.info/captureall.html
http://regex101.com/