Regex to find 4th value inside bracket - regex

How can I read the 4th value (the one inside the quotes, i.e. "vV0....") using a regex in the case below?
Update: is it possible to first find the word "LaunchFileUploader" and then select its 4th value? If there are multiple instances of LaunchFileUploader in the file, just select the 4th value of the first one found. I am attaching a screenshot of the file that needs to be searched (in the file the word is "LaunchFileUploader").
I tried the following, but I need the 4th value and Group 1 gives me the third:
\bLaunchFileUploader\b(\:?.*?,){3}.*?\)
Match 1
Full match 11030-11428 LaunchFileUploader("ERM-1BLX3D04R10-0001", 1662, "2ecbb644-34fa-4919-9809-a5ff47594c2d", "8dZOPyHKBK...
Group 1. n/a "2ecbb644-34fa-4919-9809-a5ff47594c2d",
I am still looking for a solution for this. Any help is appreciated.

Depending on what's available to you, there are a couple of ways to do it.
Either way, this would work better if there were no new lines in the string, just plain ("value1","value2","value3","value4") etc. It'll still work, but you may need to clean up some new lines from the resulting string.
The easy way - use code for the hard part. Grab the inner string with:
(?<=\().*?(?=\))
This will get everything that's between the 2 parentheses (using positive lookarounds). In code, you could then split/explode this string on , and take the 4th item.
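The "grab the parentheses, then split in code" idea could be sketched in Python roughly as follows. This is only an illustration: the function name fourth_argument is made up, a capture group is used instead of the lookarounds, and it assumes the values themselves contain no commas or parentheses.

```python
import re

def fourth_argument(text, name="LaunchFileUploader"):
    # Grab everything between the parentheses of the first call to `name`.
    # re.DOTALL lets "." cross the line breaks between the arguments.
    match = re.search(re.escape(name) + r"\s*\((.*?)\)", text, re.DOTALL)
    if match is None:
        return None
    # Split on commas, then trim whitespace/newlines and surrounding quotes.
    parts = [part.strip().strip('"') for part in match.group(1).split(",")]
    return parts[3] if len(parts) > 3 else None
```

Because re.search stops at the first hit, later LaunchFileUploader calls in the same file are ignored.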
If you want to do it all in regex, you could use something along the lines of:
(?<=\()(?:.*?,){3}(.*?)(?=\))
This would a) match the entire contents of the parentheses and b) capture the 4th option in a capture group. To go even deeper:
(?<=\()(?:.*?,){3}\"(.*?)\"(?=\))
would capture the contents of the "" quotation marks only.
Some tools don't allow you to use lookarounds, if this is the case let me know and I'll see what other ways there are around it.
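For what it's worth, a lookaround-free variant that also anchors on the function name might look like this in Python. Treat it as a sketch under assumptions: the names FOURTH_VALUE and fourth_value are invented here, the first three values must not contain commas, and the 4th value must be double-quoted.

```python
import re

# Skip three comma-terminated values after the opening parenthesis,
# then capture the contents of the next double-quoted value.
# [^,]* also matches newlines, so line breaks between values are fine.
FOURTH_VALUE = re.compile(r'\bLaunchFileUploader\s*\((?:[^,]*,){3}\s*"([^"]*)"')

def fourth_value(text):
    match = FOURTH_VALUE.search(text)  # search() returns the first occurrence only
    return match.group(1) if match else None
```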
EDIT Ran this in JS console on browser. This absolutely does work.
EDIT 2 I see you've updated your question with the text you're actually searching in. This pattern will include the space and the new line character as per the copy/paste of the above text.
(?<=\(\")(?:.*?,\s?\n?){3}\"(.*?)\"(?=\))
See my second image for the test in console

This works for python and PHP:
(?<=\")(.*)(?:\"\);)\Z
Demo for Python and PHP
For Java, replace \Z with $ as follows:
(?:")(.*)(?:\"\);)$
Demo for JavaScript
NOTE: Be sure to look at the captured group and not at the full match.
UPDATE:
Try this for your updated request:
"(.*)"(?:[\\);\] \/>}]*)$
Demo for updated input string
All the above regex patterns assume there is a line break after each comma.
Auto-generated Java code, wrapped in a class (RegexDemo is an arbitrary name) so that it compiles:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexDemo {
    public static void main(String[] args) {
        final String regex = "\"(.*)\"(?:[\\\\);\\] \\/>\\}]*)$";
        final String string = "\n"
            + "}$(document).ready( function(){ PathUploader\n"
            + " (\"ERM-1BLX3D04R10-0001\", \n"
            + " 1662, \n"
            + " \"1bff5c85-7a52-4cc5-86ef-a4ccbf14c5d5\", \n"
            + "\"vV0mX3VadCSPnN8FsAO7%2fysNbP5b3SnaWWHQETFy7ORSoz9QUQUwK7jqvCEr%2f8UnHkNNVLkJedu5l%2bA%2bne%2fD%2b2F5EWVlGox95BYDhl6EEkVAVFmMlRThh1sPzPU5LLylSsR9T7TAODjtaJ2wslruS5nW1A7%2fnLB%2bljZaQhaT9vZLcFkDqLjouf9vu08K9Gmiu6neRVSaISP3cEVAmSz5kxxhV2oiEF9Y0i6Y5%2f5ASaRiW21w3054SmRF0rq3IwZzBvLx0%2fAk1m6B0gs3841b%2fw%3d%3d\"); } );//]]>";
        // MULTILINE makes $ match at the end of every line, not just at the end of the input.
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            // Group 1 holds the text between the quotes.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

Related

The regex in string.format of LUA

I use Lua's string.find(str, pattern) to fetch some keywords.
local RICH_TAGS = {
"texture",
"img",
}
--\[((img)|(texture))=
local START_OF_PATTER = "\\[("
for index = 1, #RICH_TAGS - 1 do
START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[index]..")|"
end
START_OF_PATTER = START_OF_PATTER .. "("..RICH_TAGS[#RICH_TAGS].."))"
function RichTextDecoder.decodeRich(str)
local result = {}
print(str, START_OF_PATTER)
dump({string.find(str, START_OF_PATTER)})
end
output
hello[img=123] \[((texture)|(img))
dump from: [string "utils/RichTextDecoder.lua"]:21: in function 'decodeRich'
"<var>" = {
}
The output means:
str = hello[img=123]
START_OF_PATTER = \[((texture)|(img))
This regex works well in some online regex tools, but it finds nothing in Lua.
Is there anything wrong in my code?
You cannot use regular expressions in Lua. Use Lua's string patterns to match strings; they have no alternation (|), and their escape character is % rather than \.
See How to write this regular expression in Lua?
Try dump({str:find("\\%[%(")})
Also note that this loop:
for index = 1, #RICH_TAGS - 1 do
START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[index]..")|"
end
stops one element short of the end of RICH_TAGS; the last tag only ends up in the pattern because of the separate concatenation after the loop.
Edit:
But what I want is to fetch several specific words. For example, the
pattern should match any one of "[img=", "[texture=" or "[font=". With
the regex string I wrote in my question, a regex engine can do the work.
But with Lua, the way to do the job is to write code like string.find(str,
"[img=") and string.find(str, "[texture=") and string.find(str,
"[font="). I wonder whether there is a way to do the job with a single
pattern string. I tried a pattern string like "%[%a*=", but obviously it
matches a lot more strings than I need.
You cannot match several specific words with a single pattern unless they appear in the string in a specific order. The only thing you could do is put all the characters that make up those words into a class, but then you risk matching any word that can be built from those letters.
Usually you would match each word with a separate pattern, or you match any word and check whether the match is one of your words, for example with a lookup table.
So basically you do what a regex library would do in a few lines of Lua.
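The "match any word, then check a lookup table" approach is sketched below in Python rather than Lua, purely to show the shape of the logic. The function name find_rich_tag is made up, and the assumption is that a tag always looks like "[word=". In Lua the generic match would presumably use a pattern along the lines of "%[(%a+)=".

```python
import re

# Lookup table of the tags we actually care about.
RICH_TAGS = {"texture", "img"}

def find_rich_tag(text):
    # Match any "[word=" opener, then keep only words found in the table.
    for match in re.finditer(r"\[(\w+)=", text):
        if match.group(1) in RICH_TAGS:
            return match.start(), match.group(1)
    return None
```

Adding a new tag such as "font" then only means adding it to the table; the pattern stays the same.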

Backreference without matching it in the find result

Consider the text structure
(Title)[#1Title-link]
(Chapter1)[#Chapter1-link]
(Chapter2)[#Chapter2-link]
(Chapter3)[#Chapter3-link]
How can I backreference [#1Title-link] without matching it in the find result? I'm trying to change
(Chapter1)[#Chapter1-link] => (Chapter1)[#1Title-link-Chapter1-link]
(Chapter2)[#Chapter2-link] => (Chapter2)[#1Title-link-Chapter2-link]
(Chapter3)[#Chapter3-link] => (Chapter3)[#1Title-link-Chapter3-link]
I tried searching for
(\(Title\)\[(.*?)])([\s\S]*?\[)#(\D.*?\])
then replace it with
$1$3$2-$4
but the problem is that it only highlights one match per find, and I have lots of chapters, so replacing them one by one is too inefficient.
Hard-coding a constant title is no good either, because I have multiple files with that same structure.
Is this possible in regex? Any solution or alternative is welcome.
You can first do a search to get the correct substitution string and then do a subsequent replace operation with that substitution string. You did not specify what language you were using, so here is the code in Python (where that back reference to group 1 is \1 rather than the more usual $1):
import re
text = """(Title)[#1Title-link]
(Chapter1)[#Chapter1-link]
(Chapter2)[#Chapter2-link]
(Chapter3)[#Chapter3-link]"""
m = re.search(r'(?:\(Title\)\[#([^\]]*)\])', text)
assert(m) # that we have a match
substitution = m.group(1)
text = re.sub(r'\[#Chapter([^\]]*)\]', r'[#' + substitution + r'-Chapter\1' + ']', text)
print(text)
Prints:
(Title)[#1Title-link]
(Chapter1)[#1Title-link-Chapter1-link]
(Chapter2)[#1Title-link-Chapter2-link]
(Chapter3)[#1Title-link-Chapter3-link]
See Regex Demo 1 for getting the substitution string
See Regex Demo 2 for making the substitutions

Regex to find segment of string searching from end

I'm in Java and have a string that will always be in this format:
;<b>gerg(1314)</b><br> (KC)<br>
The number 461610734 will change and may be any length. I'd like to pick that number out and use it. As you can see, the number is next to a ' (the first one working backwards) and a hash # (again, the first one working backwards).
I can find the numbers after the hash by using ([^\#]+$) and I can find up to the last ' by using ([^\']+$) (but this would be on the wrong side of the '...)
I'm lost... Anyone know how to join these two together and nudge the ' along one to the left to just get the numbers?
Actually, I believe that you could simply extract "the digits that immediately follow a #".
You could then use the following regex: (?<=#)\d+.
On the other hand, if you really want to specify that your digits are following a # and followed by a ', you could (should?) make use of the look-arounds.
The following regex should be what you're looking for:
(?<=#)\d+(?=')
You can see it live by clicking this link.
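The same lookaround pattern can be tried out in Python as well. The sketch below is an assumption-laden illustration: the names DIGITS_BETWEEN and extract_number are invented, and since the question asks for the first candidate working backwards, it simply takes the last match in the string.

```python
import re

# Digits that sit directly between a '#' and a single quote.
DIGITS_BETWEEN = re.compile(r"(?<=#)\d+(?=')")

def extract_number(text):
    matches = DIGITS_BETWEEN.findall(text)
    # "Working backwards" from the end means the last match wins.
    return matches[-1] if matches else None
```

The lookarounds are zero-width, so the returned text contains only the digits, without the # or the quote.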
Try this:
String str = ";<b>gerg(1314)</b><br> (KC)<br>";
Pattern pattern = Pattern.compile("onClick=\"return CCL\\(this,'#([0-9]+)'");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(1)); // Prints 461610734
}

Notepad++ RegEx group capture syntax

I have a list of label names in a text file I'd like to manipulate using Find and Replace in Notepad++, they are listed as follows:
MyLabel_01
MyLabel_02
MyLabel_03
MyLabel_04
MyLabel_05
MyLabel_06
I want to rename them in Notepad++ to the following:
Label_A_One
Label_A_Two
Label_A_Three
Label_B_One
Label_B_Two
Label_B_Three
The Regex I'm using in the Notepad++'s replace dialog to capture the label name is the following:
((MyLabel_0)((1)|(2)|(3)|(4)|(5)|(6)))
I want to replace each capture group as follows:
\1 = Label_
\2 = A_One
\3 = A_Two
\4 = A_Three
\5 = B_One
\6 = B_Two
\7 = B_Three
My problem is that Notepad++ doesn't register the syntax of the regex above. When I hit Count in the Replace dialog, it returns 0 occurrences. I'm not sure what's missing in the syntax. And yes, I made sure the Regular Expression radio button is selected. Help is appreciated.
UPDATE:
Tried escaping the parenthesis, still didn't work:
\(\(MyLabel_0\)\((1\)|\(2\)|\(3\)|\(4\)|\(5\)|\(6\)\)\)
Ed's response has shown a working pattern since alternation isn't supported in Notepad++, however the rest of your problem can't be handled by regex alone. What you're trying to do isn't possible with a regex find/replace approach. Your desired result involves logical conditions which can't be expressed in regex. All you can do with the replace method is re-arrange items and refer to the captured items, but you can't tell it to use "A" for values 1-3, and "B" for 4-6. Furthermore, you can't assign placeholders like that. They are really capture groups that you are backreferencing.
To reach the results you've shown you would need to write a small program that would allow you to check the captured values and perform the appropriate replacements.
EDIT: here's an example of how to achieve this in C#
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
// Maps the number captured from the label to its replacement suffix.
var numToWordMap = new Dictionary<int, string>();
numToWordMap[1] = "A_One";
numToWordMap[2] = "A_Two";
numToWordMap[3] = "A_Three";
numToWordMap[4] = "B_One";
numToWordMap[5] = "B_Two";
numToWordMap[6] = "B_Three";
string pattern = @"\bMyLabel_(\d+)\b";
string filePath = @"C:\temp.txt";
string[] contents = File.ReadAllLines(filePath);
for (int i = 0; i < contents.Length; i++)
{
    contents[i] = Regex.Replace(contents[i], pattern,
        m =>
        {
            // "01" parses to 1, so the leading zero does not matter.
            int num = int.Parse(m.Groups[1].Value);
            if (numToWordMap.ContainsKey(num))
            {
                return "Label_" + numToWordMap[num];
            }
            // key not found, use original value
            return m.Value;
        });
}
File.WriteAllLines(filePath, contents);
You should be able to use this easily. Perhaps you can download LINQPad or Visual C# Express to do so.
If your files are too large this might be an inefficient approach, in which case you could use a StreamReader and StreamWriter to read from the original file and write it to another, respectively.
Also be aware that my sample code writes back to the original file. For testing purposes you can change that path to another file so it isn't overwritten.
Bar bar bar - Notepad++ thinks you're a barbarian.
(obsolete - see update below.) No vertical bars in Notepad++ regex - sorry. I forget every few months, too!
Use [123456] instead.
Update: Sorry, I didn't read carefully enough; on top of the barhopping problem, @Ahmad's spot-on - you can't do a mapping replacement like that.
Update: Version 6 of Notepad++ changed the regular expression engine to a Perl-compatible one, which supports "|". AFAICT, if you have a version 5.x, auto-update won't update to 6.x - you have to explicitly download it.
A regular expression search and replace for
MyLabel_((01)|(02)|(03)|(04)|(05)|(06))
with
Label_(?2A_One)(?3A_Two)(?4A_Three)(?5B_One)(?6B_Two)(?7B_Three)
works on Notepad++ 6.3.2
The outermost pair of brackets is for grouping, they limit the scope of the first alternation; not sure whether they could be omitted but including them makes the scope clear. The pattern searches for a fixed string followed by one of the two-digit pairs. (The leading zero could be factored out and placed in the fixed string.) Each digit pair is wrapped in round brackets so it is captured.
In the replacement expression, the clause (?4A_Three) says that if capture group 4 matched something then insert the text A_Three, otherwise insert nothing. Similarly for the other clauses. As the 6 alternatives are mutually exclusive only one will match. Thus only one of the (?...) clauses will have matched and so only one will insert text.
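Outside Notepad++, the same "insert text depending on which group matched" logic can be approximated with a replacement callback. The following Python sketch is only an analogy, not Notepad++ syntax: BY_GROUP, rename and rename_labels are names invented for the example, and the group numbers mirror the search pattern above.

```python
import re

PATTERN = re.compile(r"MyLabel_((01)|(02)|(03)|(04)|(05)|(06))")

# Text to insert when the capture group with that number took part in the match.
BY_GROUP = {2: "A_One", 3: "A_Two", 4: "A_Three",
            5: "B_One", 6: "B_Two", 7: "B_Three"}

def rename(match):
    # Exactly one of groups 2..7 is set, because the alternatives are exclusive.
    for group, text in BY_GROUP.items():
        if match.group(group) is not None:
            return "Label_" + text
    return match.group(0)  # not reached for this pattern; kept as a safe fallback

def rename_labels(text):
    return PATTERN.sub(rename, text)
```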
The easiest way to do this that I would recommend is to use AWK. If you're on Windows, look for the mingw32 precompiled binaries out there for free download (it'll be called gawk).
BEGIN {
FS = "_0";
a[1]="A_One";
a[2]="A_Two";
a[3]="A_Three";
a[4]="B_One";
a[5]="B_Two";
a[6]="B_Three";
}
{
printf("Label_%s\n", a[$2]);
}
Execute on Windows as follows:
C:\Users\Mydir>gawk -f test.awk awk.in
Label_A_One
Label_A_Two
Label_A_Three
Label_B_One
Label_B_Two
Label_B_Three

Replace using RegEx outside of text markers

I have the following sample text and I want to replace '[core].' with something else but I only want to replace it when it is not between text markers ' (SQL):
PRINT 'The result of [core].[dbo].[FunctionX]' + [core].[dbo].[FunctionX] + '.'
EXECUTE [core].[dbo].[FunctionX]
The result should be:
PRINT 'The result of [core].[dbo].[FunctionX]' + [extended].[dbo].[FunctionX] + '.'
EXECUTE [extended].[dbo].[FunctionX]
I hope someone can understand this. Can this be solved by a regular expression?
With RegLove
Kevin
Not in a single step, and not in an ordinary text editor. If your SQL is syntactically valid, you can do something like this:
First, you remove every string from the SQL and replace with placeholders. Then you do your replace of [core] with something else. Then you restore the text in the placeholders from step one:
Replace all occurrences of '(?:''|[^'])+' with 'n', where n is an index number (the number of the match). Store the matches in an array at the same index n. This will remove all SQL strings from the input and exchange them for harmless replacements without invalidating the SQL itself.
Do your replace of [core]. No regex required, normal search-and-replace is enough here.
Iterate the array, replacing the placeholder '1' with the first array item, '2' with the second, up to n. Now you have restored the original strings.
The regex, explained:
' # a single quote
(?: # begin non-capturing group
''|[^'] # either two single quotes, or anything but a single quote
)+ # end group, repeat at least once
' # a single quote
In JavaScript this would look something like this:
var sql = 'your long SQL code';
var str = [];
// step 1 - remove everything that looks like an SQL string
var newSql = sql.replace(/'(?:''|[^'])+'/g, function (m) {
  str.push(m);
  return "'" + (str.length - 1) + "'";
});
// step 2 - actual replacement (a regex with the g flag replaces every occurrence; a plain string would only replace the first)
newSql = newSql.replace(/\[core\]/g, "[new-core]");
// step 3 - restore all original strings
for (var i = 0; i < str.length; i++) {
  // a function is used so that "$" sequences inside the stored string are not interpreted
  newSql = newSql.replace("'" + i + "'", function () { return str[i]; });
}
// done.
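For comparison, the same three steps could be sketched in Python as below. This is a hedged variation, not a drop-in translation: the name replace_outside_strings is invented, the string regex uses * instead of + so that the empty SQL string '' is masked too, and the restore step runs as a single pass over the numbered placeholders. It assumes the SQL has balanced quotes and that the replacement text contains no single quotes.

```python
import re

# An SQL string literal: a quote, then doubled quotes or non-quotes, then a quote.
SQL_STRING = re.compile(r"'(?:''|[^'])*'")

def replace_outside_strings(sql, old, new):
    saved = []

    def stash(match):
        # Remember the literal and leave a numbered placeholder behind.
        saved.append(match.group(0))
        return "'%d'" % (len(saved) - 1)

    # Step 1: swap every string literal for a placeholder such as '0', '1', ...
    masked = SQL_STRING.sub(stash, sql)
    # Step 2: plain text replacement; only code outside strings is left to hit.
    masked = masked.replace(old, new)
    # Step 3: put the original literals back in one pass.
    return re.sub(r"'(\d+)'", lambda m: saved[int(m.group(1))], masked)
```

Calling replace_outside_strings(sql, "[core]", "[extended]") on the sample from the question should leave the [core] inside the PRINT string untouched and rewrite the other two.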
Here is a solution (javascript):
str.replace(/('[^']*'.*)*\[core\]/g, "$1[extended]");
See it in action