Lua: get a substring - regex

I have for example
str = "Beamer-Template!navigation symbols#\\texttt {navigation symbols}"
print(str:gsub('[^!|#%s]+#', ''))
which prints
Beamer-Template!navigation \texttt {navigation symbols}
but it should be
Beamer-Template!\texttt {navigation symbols}
How can I catch the space?
Only the foo#bar part matters. The pattern works fine for strings like
str="foo#bar!baz#foobar!nice|crazy"
-> bar!foobar!nice|crazy
but not with an additional space
str="foo#bar!baz baz#foobar!nice|crazy"
-> bar!baz foobar!nice|crazy
which should be bar!foobar!nice|crazy

To match makeindex entries it might be useful to use an LPeg grammar. This way you can split at the separators and even perform semantic actions, depending on the matched field.
local lpeg = assert(require"lpeg")
local C, S = lpeg.C, lpeg.S
local sep = S("#!|")
local str = C((1 - sep)^0)
local idx = str * ( "#" * str / function(match) return "#" .. match end
                  + "!" * str / function(match) return "!" .. match end
                  + "|" * str / function(match) return "|" .. match end)^0
print(idx:match("hello!world#foo|bar"))
$ lua test.lua
hello !world #foo |bar
Answer to the comment: Collecting the matches in a table. The matches are collected according to their prefix.
local lpeg = assert(require"lpeg")
local C, Ct, S = lpeg.C, lpeg.Ct, lpeg.S
local sep = S("#!|")
local str = C((1 - sep)^0)
local match = function(expr)
  -- build a capture handler that puts the separator back in front of the field
  local prefix = function(prefix)
    return function(match)
      return prefix .. match
    end
  end
  local idx = str * ( "#" * str / prefix("#")
                    + "!" * str / prefix("!")
                    + "|" * str / prefix("|"))^0
  return Ct(idx):match(expr)
end
for _, str in ipairs{
  "hello!world#foo|bar",
  "foo#bar!baz baz#foobar!nice|crazy",
  "foo#bar!baz#foobar!nice|crazy",
  "Beamer-Template!navigation symbols#\\texttt {navigation symbols}"
} do
  local t = match(str)
  print(table.concat(t," "))
end
$ lua test.lua
hello !world #foo |bar
foo #bar !baz baz #foobar !nice |crazy
foo #bar !baz #foobar !nice |crazy
Beamer-Template !navigation symbols #\texttt {navigation symbols}
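For the original gsub question, a smaller change may also be enough. The class `[^!|#%s]` excludes whitespace, so the match can never extend back across the space in "navigation symbols". Dropping `%s` from the class lets the run before `#` contain spaces; the Lua pattern would then be `'[^!|#]+#'`. The sketch below checks the same idea with Python's `re` module, used here only for illustration; `strip_before_hash` is a hypothetical helper name, not something from the question.

```python
import re

# Same idea as the Lua pattern '[^!|#]+#': a run of characters that are
# not separators (spaces are allowed), followed by '#'.
BEFORE_HASH = re.compile(r"[^!|#]+#")

def strip_before_hash(entry):
    """Remove every 'text#' prefix so that only the part after '#' remains."""
    return BEFORE_HASH.sub("", entry)

print(strip_before_hash(r"Beamer-Template!navigation symbols#\texttt {navigation symbols}"))
print(strip_before_hash("foo#bar!baz baz#foobar!nice|crazy"))
print(strip_before_hash("foo#bar!baz#foobar!nice|crazy"))
```

The leading "Beamer-Template" survives because it is followed by `!`, not `#`, so the pattern cannot match there.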

Related

Extract substring in between quotation marks, but skip \" and turn it into " instead in Lua

I have this string
"argument \\\" \"some argument\" \"some argument with a quotation mark \\\" in here \""
which prints out as this
argument \" "some argument" "some argument with a quotation mark \" in here"
and I am trying to extract all of it, so that at the end it gets stored like this:
> [1] = "argument",
> [2] = """,
> [3] = "some argument",
> [4] = "some argument with a quotation mark " in here"
This is the code that I have so far.
function ExtractArgs(text)
  local skip = 0
  local arguments = {}
  local curString = ""
  for i = 1, text:len() do
    if (i <= skip) then continue end
    local c = text:sub(i, i)
    if (c == "\\") and (text:sub(i+1, i+1) == "\"") then
      continue
    end
    if (c == "\"") and (text:sub(i-1, i-1) ~= "\\") then
      local match = text:sub(i):match("%b\"\"")
      if (match) and (match:sub(#match-1,#match-1) ~= "\\") then
        curString = ""
        skip = i + #match
        arguments[#arguments + 1] = match:sub(2, -2)
      else
        curString = curString..c
      end
    elseif (c == " " and curString ~= "") then
      arguments[#arguments + 1] = curString
      curString = ""
    else
      if (c == " " and curString == "") then
        continue
      end
      curString = curString..c
    end
  end
  if (curString ~= "") then
    arguments[#arguments + 1] = curString
  end
  return arguments
end
print(ExtractArgs("argument \\\" \"some argument\" \"some argument with a quotation mark \\\" in here\""))
It correctly extracts \" when it is not in between quotation marks, but not when it is.
How can this be solved properly?
This seems to work with regex \"([^\"\\]*(?:\\.[^\"\\]*)*)\" but what about Lua?
The task cannot be done with a single Lua pattern but can be achieved with a chain of a few patterns.
The text parameter must not contain bytes \0, \1 and \2 - these special characters are used for temporary substitution.
local function ExtractArgs(text)
  local arguments = {}
  for argument in
    -- \1 stands in for an escaped quote, \2 marks a quoted argument, \0 separates arguments
    ('""'..text:gsub("\\?.", {['\\"']="\1"}))
    :gsub('"(.-)"([^"]*)', function(q,n) return "\2"..q..n:gsub("%s+", "\0") end)
    :sub(2)
    :gmatch"%Z+"
  do
    argument = argument:gsub("\1", '"'):gsub("\2", ""):gsub("\\(.)", "%1")
    print(argument)
    arguments[#arguments+1] = argument
  end
  return arguments
end
ExtractArgs[[argument \"\\ "" "some argument" "some argument with a quotation mark \" in here \\"]]
Output:
argument
"\
some argument
some argument with a quotation mark " in here \
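For comparison, in an engine with alternation the tokenizing can be done with one expression in the spirit of the regex quoted in the question: either a double-quoted run in which `\x` counts as one unit, or a bare run of non-space characters. The following is a Python sketch rather than a Lua solution; `TOKEN` and `extract_args` are names made up for this example, and it assumes that any backslash-escaped character should simply lose its backslash.

```python
import re

# Alternative 1: "..." where a backslash escapes the next character.
# Alternative 2: a bare run of characters that are not whitespace or quotes
#                (a backslash again escapes the next character).
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|((?:[^\s"\\]|\\.)+)')

def extract_args(text):
    """Split text into arguments, honouring quotes and backslash escapes."""
    args = []
    for m in TOKEN.finditer(text):
        # group(1) is set (possibly to '') for quoted arguments, group(2) for bare ones
        raw = m.group(1) if m.group(1) is not None else m.group(2)
        # drop the backslash in front of every escaped character
        args.append(re.sub(r"\\(.)", r"\1", raw))
    return args

line = r'argument \" "some argument" "some argument with a quotation mark \" in here"'
print(extract_args(line))
```

Unlike the Lua chain above, this needs no reserved placeholder bytes, because the escape handling happens inside the match itself.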

Lua: How do I place something between two or more repeating characters in a string?

This question is somewhat similar to this, but my task is to place something, in my case the dash, between the repeating characters, for example the question marks, using the gsub function.
Example:
"?" = "?"
"??" = "?-?"
"???" = "?-?-?"
Try this:
function test(s)
  -- two passes: matches cannot overlap, so one pass leaves every other pair untouched
  local t=s:gsub("%?%?","?-?"):gsub("%?%?","?-?")
  print(#s,s,t)
end
for n=0,10 do
  test(string.rep("?",n))
end
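The replacement runs twice because matches never overlap: a single pass turns "???" into "?-??", and only the second pass separates the remaining pair. After the first pass no run is longer than two question marks, so two passes are always enough. Lua patterns have no lookahead, but in an engine that has one, a single pass suffices. A Python sketch of both variants (the function names are invented for this illustration):

```python
import re

def dash_two_pass(s):
    """Mirror of the Lua answer: the same non-overlapping replace, applied twice."""
    return s.replace("??", "?-?").replace("??", "?-?")

def dash_lookahead(s):
    """One pass: put a dash after every '?' that is directly followed by another '?'."""
    return re.sub(r"\?(?=\?)", "?-", s)

for n in range(6):
    s = "?" * n
    print(n, dash_two_pass(s), dash_lookahead(s))
```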
A possible solution using LPeg:
local lpeg = require 'lpeg'
local head = lpeg.C(lpeg.P'?')
local tail = (lpeg.P'?' / function() return '-?' end) ^ 0
local str = lpeg.Cs((head * tail + lpeg.P(1)) ^ 1)
for n=0,10 do
  print(str:match(string.rep("?",n)))
end
print(str:match("?????foobar???foo?bar???"))
This is what I came up with, scanning the string letter by letter:
function test(str)
  local output = ""
  local tab = {}
  for let in string.gmatch(str, ".") do
    table.insert(tab, let)
  end
  local i = 1
  while i <= #tab do
    -- a character equal to its predecessor gets a dash in front of it
    if tab[i - 1] == tab[i] then
      output = output.."-"..tab[i]
    else
      output = output..tab[i]
    end
    i = i + 1
  end
  return output
end
for n=0,10 do
  print(test(string.rep("?",n)))
end

scala - how to substitute env value for another variable using regex

I have a variable aa that contains a reference to an environment variable,
and I need to substitute the variable's value using a regex.
Name = TEMP
Value = C:\Users\asus101\AppData\Local\Temp
aa: String = "${TEMP}_Report"
Expected output:
p2: C:\Users\asus101\AppData\Local\Temp_Report
The code that I tried
import scala.collection.JavaConversions._
val aa = "${TEMP}\\Report"
for ((name,value) <- System.getenv() ) {
  val p1 = """\${XX}""".replace("XX",name).r
  val p2 = p1.replaceAllIn(aa,value)
  if(name=="TEMP") {
    println("Name = " + name)
    println("Value = " + value)
    println("p2 = " + p2 )
  }
}
I'm getting the error as
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition near index 1
\${USERDOMAIN_ROAMINGPROFILE}
^
What is wrong with the regex?
It's a little hard to tell, but I think this gets at what you're after.
import scala.util.Properties._
val pttrn = raw".*(\$$\{\s*(\w+)\s*\})".r
val strA = "${ME}:my ${HOME} is Gnome and my ${BROWSER} is fine."
val strB =
  strA.split("(?<=})").map {
    case s @ pttrn(a,b) => envOrNone(b).fold(s)(s.replace(a,_))
    case s => s
  }.mkString
//strB: String = ${ME}:my /home/jwvh is Gnome and my firefox is fine.
There is no $ME in my environment, so no substitution there, but the $HOME and $BROWSER values are pulled out and substituted.
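As for the error itself: in the pattern `\${XX}` only the `$` is escaped, so the regex engine reads `{NAME}` as a malformed repetition quantifier, which is what "Illegal repetition" refers to. Escaping the braces as well avoids it. The sketch below shows the escaped pattern with a single pass over the text, written in Python for illustration; `expand` and the `env` dictionary are stand-ins invented for this example, not the real process environment.

```python
import re

# ${NAME}: '$', '{' and '}' are all escaped so that they match literally.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def expand(text, env):
    """Replace each ${NAME} with env[NAME]; unknown names are left untouched."""
    def lookup(m):
        # A function replacement is inserted literally, so backslashes in
        # Windows paths need no extra quoting here.
        return env.get(m.group(1), m.group(0))
    return PLACEHOLDER.sub(lookup, text)

env = {"TEMP": r"C:\Users\asus101\AppData\Local\Temp"}  # hypothetical environment
print(expand("${TEMP}_Report", env))
print(expand("${MISSING}_Report", env))
```

Looking names up from the text, instead of looping over every environment variable and building one regex per name, also sidesteps variable names that contain regex metacharacters.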

How to match escaped group signs {&date:dd.\{mm\}.yyyy} but not {&date:dd.{mm}.yyyy} with vba and regex

I'm trying to create a pattern for finding placeholders within a string to be able to replace them with variables later. I'm stuck on a problem to find all these placeholders within a string according to my requirement.
I already found this post, but it only helped a little:
Regex match ; but not \;
Placeholders will look like this
{&var} --> Variable stored in a dictionary --> dict("var")
{$prop} --> Property of a class cls.prop read by CallByName and PropGet
{#const} --> Some constant values by name from a function
Generally I have this pattern and it works well
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.pattern = "\{([#\$&])([\w\.]+)\}"
For example I have this string:
"Value of foo is '{&var}' and bar is '{$prop}'"
I get 2 matches as expected
(&)(var)
($)(prop)
I also want to add a formatting part to this expression, like in .NET.
String.Format("This is a date: {0:dd.mm.yyyy}", DateTime.Now());
// This is a date: 05.07.2019
String.Format("This is a date, too: {0:dd.(mm).yyyy}", DateTime.Now());
// This is a date, too: 05.(07).2019
I extended the RegEx to get that optional formatting string
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.pattern = "\{([#\$&])([\w\.]+):{0,1}([^\}]*)\}"
RegEx.Execute("Value of foo is '{&var:DD.MM.YYYY}' and bar is '{$prop}'")
I get 2 matches as expected
(&)(var)(DD.MM.YYYY)
($)(prop)()
At this point I noticed that I have to take care of escaped "{" and "}", because I may want to have some brackets within the formatted result.
This does not work properly, because my pattern stops after "...{MM"
RegEx.Execute("Value of foo is '{&var:DD.{MM}.YYYY}' and bar is '{$prop}'")
It would be okay to add escape signs to the text before checking the regex:
RegEx.Execute("Value of foo is '{&var:DD.\{MM\}.YYYY}' and bar is '{$prop}'")
But how can I correctly add the negative lookbehind?
And second: how can this also work for variables that should not be resolved, that is, when they have the correct syntax but the outer bracket is escaped?
RegEx.Execute("This should not match '\{&var:DD.\{MM\}.YYYY\}' but this one '{&var:DD.\{MM\}.YYYY}'")
I hope my question is not confusing and someone can help me
Update 05.07.19 at 12:50
After the great help of @wiktor-stribiżew the solution is complete.
As requested, I provide some example code:
Sub testRegEx()
    Debug.Print FillVariablesInText(Nothing, "Date\\\\{$var01:DD.\{MM\}.YYYY}\\\\ Var:\{$nomatch\}{$var02} Double: {#const}{$var01} rest of string")
End Sub
Function FillVariablesInText(ByRef dict As Dictionary, ByVal txt As String) As String
    Const c_varPattern As String = "(?:(?:^|[^\\\n])(?:\\{2})*)\{([#&\$])([\w.]+)(?:\:([^}\\]*(?:\\.[^\}\\]*)*))?(?=\})"
    Dim part As String
    Dim snippets As New Collection
    Dim allMatches, m
    Dim i As Long, j As Long, x As Long, n As Long
    ' Create a RegEx object and execute pattern
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.pattern = c_varPattern
    RegEx.MultiLine = True
    RegEx.Global = True
    Set allMatches = RegEx.Execute(txt)
    ' Start at position 1 of txt
    j = 1
    n = 0
    For Each m In allMatches
        n = n + 1
        Debug.Print "(" & n & "):" & m.value
        Debug.Print " [0] = " & m.SubMatches(0) ' Type [&$#]
        Debug.Print " [1] = " & m.SubMatches(1) ' Name
        Debug.Print " [2] = " & m.SubMatches(2) ' Format
        part = "{" & m.SubMatches(0)
        ' Get offset for pre-match-string
        x = 1 ' Index to position: at least +1
        Do While Mid(m.value, x, 2) <> part
            x = x + 1
        Loop
        ' Position in txt
        i = m.FirstIndex + x
        ' Anything to add to result?
        If i <> j Then
            snippets.Add Mid(txt, j, i - j)
        End If
        ' Next start position (not index!), + 1 for the positive lookahead "}"
        j = m.FirstIndex + m.Length + 2
        ' Here a function would fetch the actual value
        ' e.g.: snippets.Add dict(m.SubMatches(1))
        ' or : snippets.Add Format(dict(m.SubMatches(1)), m.SubMatches(2))
        snippets.Add "<<" & m.SubMatches(0) & m.SubMatches(1) & ">>"
    Next m
    ' Any text at the end? (<= so that a single trailing character is kept)
    If j <= Len(txt) Then
        snippets.Add Mid(txt, j)
    End If
    ' Join snippets
    For i = 1 To snippets.Count
        FillVariablesInText = FillVariablesInText & snippets(i)
    Next
End Function
The function testRegEx gives me this result and debug print:
(1):e\\\\{$var01:DD.\{MM\}.YYYY
[0] = $
[1] = var01
[2] = DD.\{MM\}.YYYY
(2):}{$var02
[0] = $
[1] = var02
[2] =
(3): {#const
[0] = #
[1] = const
[2] =
(4):}{$var01
[0] = $
[1] = var01
[2] =
Date\\\\<<$var01>>\\\\ Var:\{$nomatch\}<<$var02>> Double: <<#const>><<$var01>> rest of string
You may use
((?:^|[^\\])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?}
To make sure the consecutive matches are found, too, turn the last } into a lookahead, and when extracting matches just append it to the result, or if you need the indices increment the match length by 1:
((?:^|[^\\])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?(?=})
^^^^^
See the regex demo and regex demo #2.
Details
((?:^|[^\\])(?:\\{2})*) - Group 1 (makes sure the { that comes next is not escaped): start of string or any char but \ followed with 0 or more double backslashes
\{ - a { char
([#$&]) - Group 2: any of the three chars
([\w.]+) - Group 3: 1 or more word or dot chars
(?::([^}\\]*(?:\\.[^}\\]*)*))? - an optional sequence of : and then Group 4:
[^}\\]* - 0 or more chars other than } and \
(?:\\.[^}\\]*)* - zero or more repetitions of a \-escaped char and then 0 or more chars other than } and \
} - a } char
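To see the lookahead variant at work outside VBA, here is a small Python sketch that runs the same expression over strings modelled on the question's examples. Python's `re` accepts this syntax unchanged; `PLACEHOLDER` and `find_placeholders` are names invented for this illustration. It returns only groups 2 to 4 (type, name, format) and ignores group 1, the prefix that proves the brace is unescaped.

```python
import re

PLACEHOLDER = re.compile(
    r"((?:^|[^\\])(?:\\{2})*)"          # group 1: start/non-backslash plus backslash pairs
    r"\{([#$&])([\w.]+)"                # groups 2 and 3: type character and name
    r"(?::([^}\\]*(?:\\.[^}\\]*)*))?"   # group 4: optional format, escaped chars allowed
    r"(?=\})"                           # the closing '}' is checked but not consumed
)

def find_placeholders(text):
    """Return (type, name, format) for every unescaped placeholder in text."""
    return [(m.group(2), m.group(3), m.group(4)) for m in PLACEHOLDER.finditer(text)]

print(find_placeholders(r"no match '\{&var:DD.\{MM\}.YYYY\}' but this one '{&var:DD.\{MM\}.YYYY}'"))
print(find_placeholders("{$a}{#b}"))
```

In the second call the `}` that closes `{$a}` is left unconsumed by the lookahead, so it can serve as the "any char but \" prefix of `{#b}`; with a consuming `}` the second placeholder would be skipped.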
Welcome to the site! If you need to only match balanced escapes, you will need something more powerful. If not --- I haven't tested this, but you could try replacing [^\}]* with [^\{\}]|\\\{|\\\}. That is, match non-braces and escaped brace sequences separately. You may need to change this depending on how you want to handle backslashes in your formatting string.

Regular expression in Scala

I want to extract a word from a string and then use that word in my regex.
My string looks like this:
val s = "null_eci_count"
I want to derive the below string from the above string:
sum(cast((eci is null or eci in ('', '0', 'null', 'NULL')) as int))
I used replaceAll and had derived a part of the above expression:
scala> s.replaceAll("null_", "sum(cast((").replaceAll("_count"," is null) as int))")
res69: String = sum(cast((eci is null) as int))
Please suggest a way to derive the whole expression.
Select the middle part of the string as a group (i.e. eci) .*?_(.*?)_.* and then return eci with the group reference \1.
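A minimal sketch of that group-reference idea, written in Python for illustration (in Scala's replaceAll the reference would be spelled `$1` rather than `\1`); the `TEMPLATE` string and `to_null_count_expr` are names made up for this example:

```python
import re

# \1 refers to the middle part captured between the two underscores.
TEMPLATE = r"sum(cast((\1 is null or \1 in ('', '0', 'null', 'NULL')) as int))"

def to_null_count_expr(column):
    """Turn a name like 'null_eci_count' into the aggregate expression for 'eci'."""
    return re.sub(r".*?_(.*?)_.*", TEMPLATE, column)

print(to_null_count_expr("null_eci_count"))
```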
How about:
val eci = s.split("_").drop(1).head
val result = s"sum(cast(($eci is null or $eci in ('', '0', 'null', 'NULL')) as int))"
I used ArrayBuffer to do this:
import scala.collection.mutable.ArrayBuffer
// collects one aggregate expression per matching column
val col = ArrayBuffer[String]()
val tgt = spark.sql("select * from ctx_monitor.xpo_click_counts")
val a = tgt.columns.slice(4, tgt.columns.length)
for (e <- a) {
  if (e contains "null") {
    val c = e.replaceFirst("null_", "")
    col += "sum(cast((" + c + " is null or " + c + " in('','0','null','NULL')) as int))"
  }
}
val cols = col.mkString(",")