RegEx - How to select the second comma and everything after it - regex

I'm using UltraEdit. I have a text file that contains strings like this
Workspace\\Trays\\Dialogs\\Components, Expand, kThisComputerOnly, P_BOOLEAN },
WebCommonDialog Sign_Out, Left, kThisComputerOnly, P_INTEGER_RANGE(0, 4096) },
ThreeDTextDlg, x, kThisComputerOnly, P_INTEGER_RANGE(0, 4096) },
Preferences\\Graphics, CtxDbgMaxGLVersionMajor, kThisComputerOnly, P_INTEGER },
UltraEdit allows PERL, UNIX and UltraEdit style RegEx.
I need to select the second comma and everything to the end of the line and delete it.
Using regexpal.com I've tried several different approaches but can't figure it out.
/,\s.+/ selects the first comma
/[,]\s.+/ same as above
I can't figure out how to select the second comma and beyond.
I have also searched StackOverflow and found several examples, but couldn't adapt them to work for me.
Thanks.

You may use a Perl regex option with the following pattern:
^([^,]*,[^,]*),.*
and replace with \1.
See the regex demo.
Details:
^ - start of a line
([^,]*,[^,]*) - Group 1 (later referred to with \1 backreference from the replacement pattern):
[^,]* - any 0+ chars other than a comma (to prevent overflowing across lines, add \n\r into the negated character class - [^,\n\r]*)
, - a comma
[^,]* - any 0+ chars other than a comma
, - a comma
.* - any 0+ chars other than line break chars as many as possible
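To check the pattern outside UltraEdit, here is a minimal Python sketch of the same replacement. It assumes Python's re module rather than UltraEdit's Perl engine, and it uses the [^,\n]* variant so a match cannot spill onto the next line. The function name strip_after_second_comma is made up for illustration.

```python
import re

# re.MULTILINE makes ^ anchor at the start of every line.
# Group 1 keeps everything up to (not including) the second comma.
SECOND_COMMA_RE = re.compile(r"^([^,\n]*,[^,\n]*),.*", re.MULTILINE)


def strip_after_second_comma(text: str) -> str:
    """Delete the second comma and everything after it on each line."""
    return SECOND_COMMA_RE.sub(r"\1", text)


sample = (
    "WebCommonDialog Sign_Out, Left, kThisComputerOnly, P_INTEGER_RANGE(0, 4096) },\n"
    "ThreeDTextDlg, x, kThisComputerOnly, P_INTEGER_RANGE(0, 4096) },"
)
print(strip_after_second_comma(sample))
```

Lines with fewer than two commas are left unchanged, because the pattern requires the second comma to match.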

Related

How to select the first space only, not including other characters

I want to fix some formatted strings with 'find and replace' in Visual Studio Code.
To do so, I have to select only the first space in each line, without the characters around it.
The format goes like this :
dc932a17 3919734822 5234dce7debe.mp4
e_f943 4961243553 03be639fa8b7.mp4
9cbcc2 4365389628 e741018829d6.mp4
543419d 4639618462 d0bd72c9b737.mp4
Desired outputs look like :
dc932a17-3919734822 5234dce7debe.mp4
e_f943-4961243553 03be639fa8b7.mp4
9cbcc2-4365389628 e741018829d6.mp4
543419d-4639618462 d0bd72c9b737.mp4
So what I want to select are :
dc932a17 3919734822 5234dce7debe.mp4
|------|^These spaces
^Not these characters
So I made a regex like this :
^(?:([a-zA-Z0-9_]+))\s
But this selects all the characters before the first space as well as the space itself.
dc932a17 3919734822 5234dce7debe.mp4
|-------|
^Selected
Is there anything I got wrong?
Constraint:
The number of characters before the first space varies, so I can't use Alt+Shift+drag column selection.
You can use
Find: ^(\S+)\s+
Replace: $1-
See the regex demo.
Note: If there are any leading whitespace chars on the qualifying line, you need to add \s* after ^(, i.e. ^(\s*\S+)\s+.
Details:
^ - start of a line
(\S+) - Group 1 ($1): one or more non-whitespace chars; in the (\s*\S+) variant, zero or more whitespace chars come first (but not line break chars here since the pattern does not contain \r or \n)
\s+ - one or more whitespaces (except line break chars).
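The same find-and-replace can be sketched in Python for a quick check. This is an illustration under assumptions, not VS Code's own engine: Python's \s also matches line breaks, so the sketch spells out "whitespace that is not a line break" as [^\S\r\n]. The function name hyphenate_first_space is hypothetical.

```python
import re

# ^(\S+)      : the first run of non-whitespace chars on a line (Group 1)
# [^\S\r\n]+  : one or more whitespace chars that are not line breaks
FIRST_GAP_RE = re.compile(r"^(\S+)[^\S\r\n]+", re.MULTILINE)


def hyphenate_first_space(text: str) -> str:
    """Replace the first whitespace run on each line with a hyphen."""
    return FIRST_GAP_RE.sub(r"\1-", text)


sample = (
    "dc932a17 3919734822 5234dce7debe.mp4\n"
    "e_f943 4961243553 03be639fa8b7.mp4"
)
print(hyphenate_first_space(sample))
```

Only the first gap on each line is touched, since the pattern is anchored with ^ and can match at most once per line.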

How to clear lines after the last regex match

I have a huge log of records that I need to turn into a table.
Each line has a record, preceded by date and time, something like this:
27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this
I want to completely clear the file from all the lines that I don't need.
I'm able to clear most of the lines that I don't want if I use the following regex and substitution patterns (in Notepad++, with the ". matches newline" option enabled):
.+?(?<datetime>[\d\/]+\s[\d:]+)\s-\s(?<mystuff>stuff[^\n]+)
'${datetime};${mystuff}
However, I can't clear the lines after the last match. How could I do so?
You may use
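Find What: ^(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?)
Replace With: (?{1}$1;$2)
Details
^ - start of a line
(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?) - match either
.*? - any 0+ chars other than line break chars, as few as possible (with .+? here, a date at the very start of a line would lose its first character to the dot)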
([\d/]+\h[\d:]+) - Group 1: one or more digits or /, a horizontal whitespace, one or more digits or :
\h-\h - a horizontal whitespace, - and a hor. whitespace
(stuff.*) - Group 2: stuff and the rest of the line
| - or
.* - any 0+ chars other than linebreak chars
\R? - an optional line break sequence.
The (?{1}$1;$2) replacement pattern only replaces with $1;$2 if Group 1 matches.
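The conditional replacement can be approximated in Python with a replacement function. This is a sketch under assumptions: Python's re has no \h, \R or (?{1}...) syntax, so [ \t] and (?:\r?\n)? stand in for the first two and a lambda plays the role of the conditional. It uses .*? before the date so that a date at the very start of a line is captured whole. The names LINE_RE and keep_stuff are made up for illustration.

```python
import re

# Alternative 1 captures the date/time (Group 1) and the "stuff..." text (Group 2).
# Alternative 2 swallows any other line together with its line break.
LINE_RE = re.compile(
    r"^(?:.*?([\d/]+[ \t][\d:]+)[ \t]-[ \t](stuff.*)|.*(?:\r?\n)?)",
    re.MULTILINE,
)


def keep_stuff(log: str) -> str:
    """Keep only 'stuff' records as 'datetime;text'; drop every other line."""
    return LINE_RE.sub(
        lambda m: f"{m.group(1)};{m.group(2)}" if m.group(1) else "",
        log,
    )


log = "\n".join([
    "27/11/2019 16:35 - i don't need this",
    "28/11/2019 17:25 - don't need this either",
    "30/11/2019 11:33 - stuff i'm looking for",
    "01/12/2019 08:11 - stuff that i'm also looking for",
    "03/11/2019 09:39 - don't need this",
])
print(keep_stuff(log))
```

Because unwanted lines are matched by the second alternative wherever they occur, the lines after the last wanted record are removed as well.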

Sublime Regex extract

<.*>|\n.*\s.*\sid="(\w*)".*\n+|.*>\n|\n.+
and replace $1
This regex extracts all the id values from the file.
<a href="java" class="total" id="maker" placeholder="getTheResult('local6')">master6<a>
Result is maker
How can I extract the getTheResult key name instead,
so that my result is local6?
I tried <.*>|\n.*\s.*\sgetTheResult('(\w*)').*\n+|.*>\n|\n.+ but it didn't help.
I assume that:
you have files with text like getTheResult('local6')
you may have several values like that on a line
you'd like to keep only those values, one value per line.
I suggest
getTheResult\('([^']*)'\)|(?:(?!getTheResult\(')[\s\S])*
and replace with $1\n. The \n will insert a newline between the values. You can then use ^\n regex (to replace with empty string) to remove empty lines.
Pattern details:
getTheResult\(' - matches getTheResult(' as a literal string (note the ( is escaped)
([^']*) - Group 1 capturing 0+ chars other than '
'\) - a literal ')
| - or
(?:(?!getTheResult\(')[\s\S])* - 0+ chars that are not starting chars of the getTheResult(' character sequence (this is a tempered greedy token).
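To see the tempered greedy token at work, here is a minimal Python sketch using the same pattern. It is an illustration only: it assumes Python's re module rather than Sublime Text's engine, drops blank lines with a list comprehension instead of a second ^\n replacement, and the function name extract_keys is hypothetical.

```python
import re

# Alternative 1 captures the key inside getTheResult('...').
# Alternative 2 (the tempered greedy token) consumes all text that does not
# start a new getTheResult(' sequence, so it can be thrown away.
TOKEN_RE = re.compile(
    r"getTheResult\('([^']*)'\)|(?:(?!getTheResult\(')[\s\S])*"
)


def extract_keys(text: str) -> list[str]:
    """Return every getTheResult('...') key found in text, in order."""
    # Each match becomes "<key>\n" or just "\n" when Group 1 did not match.
    replaced = TOKEN_RE.sub(lambda m: (m.group(1) or "") + "\n", text)
    # Drop the empty lines left behind by the discarded text.
    return [line for line in replaced.split("\n") if line]


html = (
    '<a href="java" class="total" id="maker" '
    "placeholder=\"getTheResult('local6')\">master6<a>"
)
print(extract_keys(html))
```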

Regex lookahead/lookbehind match for SQL script

I'm trying to analyse some SQLCMD scripts for code quality tests. I have a regex not working as expected:
^(\s*)USE (\[?)(?<![master|\$])(.)+(\]?)
I'm trying to match:
Strings that start with USE (ignore whitespace)
Followed by optional square bracket
Followed by 1 or more non-whitespace characters.
EXCEPT where that text is "master" (case insensitive)
OR EXCEPT where that text is a $ symbol
Expected results:
USE [master] - don't match
USE [$(CompiledDatabaseName)] - don't match
USE [anything_else.01234] - match
Also, the same patterns above without the [ and ] characters.
I'm using Sublime Text 2 as my RegEx search tool and referencing this cheatsheet
Your pattern - ^(\s*)USE (\[?)(?<![master|\$])(.)+(\]?) - uses a lookbehind that is variable-width (its length is not known beforehand) if you fix the character class issue inside it (i.e. replace [...] with (...) as you mean an alternative list of $ or a character sequence master) and thus is invalid in a Boost regex. Your (.)+ capturing is wrong since this group will only contain one last character captured (you could use (.+)), but this also matches spaces (while you need 1 or more non-whitespace characters). ? is the one or zero times quantifier, but you say you might have 2 opening and closing brackets (so, you need a limiting quantifier {0,2}).
You can use
^\h*USE(?!\h*\[{0,2}[^]\s]*(?:\$|(?i:master)))\h*\[{0,2}[^]\s]*]{0,2}
See regex demo
Explanation:
^ - start of a line in Sublime Text
\h* - optional horizontal whitespace (if you need to match newlines, use \s*)
USE - a literal case-sensitive character sequence USE
(?!\h*\[{0,2}[^]\s]*(?:\$|(?i:master))) - a negative lookahead that makes sure the USE is NOT followed with:
\h* - zero or more horizontal whitespace
\[{0,2} - zero, one or two [ brackets
[^]\s]* - zero or more characters other than ] and whitespace
(?:\$|(?i:master)) - either a $ or a case-insensitive master (we turn off case sensitivity with (?i:...) construct)
\h* - go on matching zero or more horizontal whitespace
\[{0,2} - zero, one or two [ brackets
[^]\s]* - zero or more characters other than ] and whitespace (when ] is the first character in a character class, it does not have to be escaped in Boost/PCRE regexps)
]{0,2} - zero, one or two ] brackets (outside of character class, the closing square bracket does not need escaping)
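The lookahead approach can be tried in Python with a small sketch. It rests on a few assumptions: Python's re has no \h, so [ \t] is used for horizontal whitespace, the scoped (?i:...) group needs Python 3.6 or later, and the function name unexpected_use_statements is invented for this example.

```python
import re

# The negative lookahead rejects USE when the database name contains "$"
# or "master" (case-insensitive); the rest of the pattern then consumes the
# optionally bracketed name.
USE_RE = re.compile(
    r"^[ \t]*USE(?![ \t]*\[{0,2}[^]\s]*(?:\$|(?i:master)))"
    r"[ \t]*\[{0,2}[^]\s]*]{0,2}",
    re.MULTILINE,
)


def unexpected_use_statements(script: str) -> list[str]:
    """Return the USE statements that target neither master nor a $ variable."""
    return [m.group(0) for m in USE_RE.finditer(script)]


script = "\n".join([
    "USE [master]",
    "USE [$(CompiledDatabaseName)]",
    "USE [anything_else.01234]",
    "USE Master",
    "USE other_db",
])
print(unexpected_use_statements(script))
```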

Regular Expressions - Greedy but stop before a string match

I have some data and I'd like to convert it into a table format.
Here's the input data
1- This is the 1st line with a
newline character
2- This is the 2nd line
Each line may contain multiple newline characters.
Output
<td>1- This the 1st line with
a new line character</td>
<td>2- This is the 2nd line</td>
I've tried the following
^(\d{1,3}-)[^\d]*
but it seems to match only till the digit 1 in 1st.
I'd like to be able to stop matching when I find another \d{1,3}\- in my string.
Any suggestions?
EDIT:
I'm using EditPad Lite.
This is for vim, and uses a zero-width positive lookahead:
/^\d\{1,3\}-\_.*[\r\n]\(\d\{1,3\}-\)\#=
Steps:
/^\d\{1,3\}- 1 to 3 digits followed by -
\_.* any number of characters including newlines/linefeeds
[\r\n]\(\d\{1,3\}-\)\#= followed by a newline/linefeed ONLY if it is followed
by 1 to 3 digits followed by - (the first condition)
EDIT: This is how it would be in pcre/ruby:
/(\d{1,3}-.*?[\r\n])(?=(?:\d{1,3}-)|\Z)/m
Note you need a string ending with a newline to match the last entry.
SEARCH: ^\d+-.*(?:[\r\n]++(?!\d+-).*)*
REPLACE: <td>$0</td>
[\r\n]++ matches one or more carriage-returns or linefeeds, so you don't have to worry about whether the file use Unix (\n), DOS (\r\n), or older Mac (\r) line separators.
(?!\d+-) asserts that the first thing after the line separator is not another line number.
I used the possessive + in [\r\n]++ to make sure it matches the whole separator. Otherwise, if the separator is \r\n, [\r\n]+ could match the \r and (?!\d+-) could match the \n.
Tested in EditPad Pro, but it should work in Lite as well.
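A rough Python equivalent of this search and replace is sketched below. It is a simplification under stated assumptions: the text is assumed to use \n line endings only (for example, a file read in text mode), so the possessive [\r\n]++ is replaced with a plain \n, which also keeps the pattern usable on Python versions older than 3.11. The names ENTRY_RE and wrap_entries are made up for illustration.

```python
import re

# ^\d+-.*            : a line that starts with a number and a dash
# (?:\n(?!\d+-).*)*  : any following lines that do NOT start a new entry
ENTRY_RE = re.compile(r"^\d+-.*(?:\n(?!\d+-).*)*", re.MULTILINE)


def wrap_entries(text: str) -> str:
    """Wrap each numbered entry, including its continuation lines, in <td> tags."""
    return ENTRY_RE.sub(lambda m: f"<td>{m.group(0)}</td>", text)


data = (
    "1- This is the 1st line with a\n"
    "newline character\n"
    "2- This is the 2nd line"
)
print(wrap_entries(data))
```

The negative lookahead is what stops each match before the next numbered entry, so digits inside the text (such as "1st") do not end the match early.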
You did not specify a language (there are many regexp implementations), but in general, what you are looking for is called "positive lookahead", which lets you add patterns that will influence the match, but will not become part of it.
Search for lookahead in the documentation of whatever language you are using.
Edit: the following sample seems to work in vim.
:%s#\v(^\d+-\_.{-})\ze(\n\d+-|%$)#<td>\1</td>
Annotation below:
% - for all lines
s# - substitute the following (you can use any delimiter, and slash is most
common, but as that will require that we escape slashes in the command
I chose to use the number sign)
\v - very magic mode, lets us use fewer backslashes
( - start group for back referencing
^ - start of line
\d+ - one or more digits (as many as possible)
- - a literal dash!
\_. - any character, including a newline
{-} - zero or more of these (as few as possible)
) - end group
\ze - end match (anything beyond this point will not be included in the match)
( - start a new group
\n - a newline
\d+ - one or more digits
- - a dash
| - or
%$ - end of file
) - end group
# - start substitute string
<td>\1</td> - a TD tag around the first matched group
(\d+-.+(\r|$)((?!^\d-).+(\r|$))?)
You can match only the separators and split on them. In C#, for example, it could be done like this:
string s = "1- This is the 1st line with a \r\nnewline character\r\n2- This is the 2nd line";
string ss = "<td>" + string.Join("</td>\r\n<td>", Regex.Split(s.Substring(3), "\r\n\\d{1,3}- ")) + "</td>";
MessageBox.Show(ss);
Would it be good for you to do it in 3 steps?
(these are perl regex):
Replace the first:
$input =~ s/^(\d{1,3})/<td>$1/;
Replace the rest
$input =~ s/\n(\d{1,3})/<\/td>\n<td>$1/gm;
Add the last:
$input .= '</td>';