To check if one or more variables (like var1, var2, ...) are set in the script, I read the script into a list of strings (line by line) and check if a line looks like
var1=...
[var1,x]=...
[var2,x]=...
[x,y,var1,z]=...
Currently I'm use the following pattern
pattern = '^([.*)?(var1|var2|var3)(.*])?=';
ix = regexp(s,pattern,'once');
It works for my purpose but I know it's not a safe pattern, because something like [x,vvvar1,y]=... also matches the pattern.
My current solution is to make separate patterns for each type of expressions, but I wonder if there is a unique pattern that can meet my needs.
Here are some examples, if I want to match any of abc or def,
pattern = '^([.*)?(abc|def)(.*])?=';
%% good examples
regexp('x=1',pattern,'once') % output []
regexp('aabc=1',pattern,'once') % output []
regexp('abc=1',pattern,'once') % output 1
regexp('[other,abc]=deal(1,2)',pattern,'once') % output 1
%% bad examples
regexp('[x,aabcc]=deal(1,2)',pattern,'once') % output 1
regexp('[x,abcc,y]=deal(1,2,3)',pattern,'once') % output 1
You want to make sure there is at least one specific variable in the string.
You can use
^\[?(\w+,)*(abc|def)(,\w+)*]?=
See the regex demo.
Details:
^ - start of string
\[? - an optional literal [ char
(\w+,)* - zero or more one or more word chars + a comma sequences
(abc|def) - either abc or def
(,\w+)* - zero or more comma + one or more word chars sequences
]? - an optional ] char
= - a = char.
Related
I want to split below string by two pipe(|| ) regex .
Input String
value1=data1||value2=da|ta2||value3=test&user01|
Expected Output
value1=data1
value2=da|ta2
value3=test&user01|
I tried ([^||]+) but its consider single pipe | also to split .
Try out my example - Regex
value2 has single pipe it should not be considered as matching.
I am using lua script like
for pair in string.gmatch(params, "([^||]+)") do
print(pair)
end
You can explicitly find each ||.
$ cat foo.lua
s = 'value1=data1||value2=da|ta2||value3=test&user01|'
offset = 1
for idx in string.gmatch(s, '()||') do
print(string.sub(s, offset, idx - 1) )
offset = idx + 2
end
-- Deal with the part after the right-most `||`.
-- Must +1 or it'll fail to handle s like "a=b||".
if offset <= #s + 1 then
print(string.sub(s, offset) )
end
$ lua foo.lua
value1=data1
value2=da|ta2
value3=test&user01|
Regarding ()|| see Lua's doc about Patterns (Lua does not have regex support) —
Captures:
A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture, and therefore has number 1; the character matching "." is captured with number 2, and the part matching "%s*" has number 3.
As a special case, the capture () captures the current string position (a number). For instance, if we apply the pattern "()aa()" on the string "flaaap", there will be two captures: 3 and 5.
the easiest way is to replace the sequence of 2 characters || with any other character (e.g. ;) that will not be used in the data, and only then use it as a separator:
local params = "value1=data1||value2=da|ta2||value3=test&user01|"
for pair in string.gmatch(params:gsub('||',';'), "([^;]+)") do
print(pair)
end
if all characters are possible, then any non-printable characters can be used, according to their codes: string.char("10") == "\10" == "\n"
even with code 1: "\1"
string.gmatch( params:gsub('||','\1'), "([^\1]+)" )
I was trying to replace/remove any string between - <branch prefix> /
Example:
String name = Application-2.0.2-bug/TEST-1.0.0.zip
expected output :
Application-2.0.2-TEST-1.0.0.zip
I tried the below regex, but it's not working accurate.
String FILENAME = Application-2.0.2-bug/TEST-1.0.0.zip
println(FILENAME.replaceAll(".+/", ""))
You can use
FILENAME.replaceAll("-[^-/]+/", "-")
See the regex demo. Details:
- - a hyphen
[^-/]+ - any one or more chars other than - and /
/ - a / char.
See the online Groovy demo:
String FILENAME = 'Application-2.0.2-bug/TEST-1.0.0.zip'
println(FILENAME.replaceAll("-[^-/]+/", "-"))
// => Application-2.0.2-TEST-1.0.0.zip
I find that using groovy closures for string replaces are most intuitive and easy to understand.
def str = "Application-2.0.2-bug/TEST-1.0.0.zip"
def newStr = str.replaceAll(/(.*-)(.*\/)(.*)/){all,group1,group2,group3 ->
println all
println group1
println group2
println group3
"${group1}${group3}" //this is the return value of the closure
}
println newStr
This is the output
Application-2.0.2-bug/TEST-1.0.0.zip
Application-2.0.2-
bug/
TEST-1.0.0.zip
Application-2.0.2-TEST-1.0.0.zip
Explanation:
If you notice in the regex that char groups are all in parentheses (). This denotes the groups in the input string. These groups can then be used in an easy way in a closure.
all - first variable will always be full string
group1 - (.*-) to indicate all chars ending with -
group2 - (.*\/) to indicate all chars ending with / (escaped with \).
group3 - (.*) all remaining chars
Now for your requirement all you need is to eliminate group2 and return a concatenation of group1 and group3.
By using this technique you can use the closure pretty powerfully, just make sure that the number of arguments in the closure (in this case 4) equal 1 more than the number of groups in the regex since the first one is always full input string. You can dynamically have any number of groups depending on your scenario
Please, try this one:
String FILENAME = "Application-2.0.2-**bug/**TEST-1.0.0.zip";
System.out.println(FILENAME.replaceAll("\\*\\*(.*)\\*\\*", ""));
Say I have this Matlab or Octave char variable:
>> filename = 'my.file.ext'
I want a regexprep command that adds a suffix, say '_old', to the file name before the extension, transforming it into 'my.file_old.ext'.
The following replaces all dots with '_old.':
>> regexprep(filename, '\.', '_old.')
ans =
'my_old.file_old.ext'
What is a regexprep command that prepends '_old' only to the last dot? (Ideally, if there is no dot (no extension), append '_old' at the very end.)
Thank you in advance!
If doing it without regular expressions is an option, you can use fileparts as follows:
filename = 'my.file.ext';
suffix = '_old';
[p, n, e] = fileparts(filename); % path, file, extension; each possibly empty
result = [p, n, suffix, e];
Example in Octave.
You may use
regexprep(filename, '^(?:(.*)(\.)|(.*))', '$3$1_old$2')
See the regex demo
Details
^ - start of string
(?:(.*)(\.)|(.*)) - a non-capturing group matching either of the two alternatives:
(.*)(\.) - Group 1 ($1 backreference refers to the value of the group): any zero or more chars as many as possible and then Group 2 ($2): a dot
| - or
(.*) - Group 3 ($3): any zero or more chars as many as possible
If an alternative is not matched, the backreference to the capturing group is an empty string. Thus, if (.*)(\.) matches, the replacement is Group 1 + _old + Group 2 value. Else, it is Group 3 + _old (just appending at the end).
I have a String which contains column names and datatypes as below:
val cdt = "header:integer|releaseNumber:numeric|amountCredit:numeric|lastUpdatedBy:numeric(15,10)|orderNumber:numeric(20,0)"
My requirement is to convert the postgres datatypes which are present as numeric, numeric(15,10) into spark-sql compatible datatypes.
In this case,
numeric -> decimal(38,30)
numeric(15,10) -> decimal(15,10)
numeric(20,0) -> bigint (This is an integeral datatype as there its precision is zero.)
In order to access the datatype in the string: cdt, I split it and created a Seq from it.
val dt = cdt.split("\\|").toSeq
Now I have a Seq of elements in which each element is a String in the below format:
Seq("header:integer", "releaseNumber:numeric","amountCredit:numeric","lastUpdatedBy:numeric(15,10)","orderNumber:numeric(20,0)")
I have the pattern matching regex: """numeric\(\d+,(\d+)\)""".r, for numeric(precision, scale) which only works if there is a
scale of two digits, ex: numeric(20,23).
I am very new to REGEX and Scala & I don't understand how to create regex pattterns for the remaining two cases & apply it on a string to match a condition. I tried it in the below way but it gives me a compilation error: "Cannot resolve symbol findFirstMatchIn"
dt.map(e => e.split("\\:")).map(e => changeDataType(e(0), e(1)))
def changeDataType(colName: String, cd:String): String = {
val finalColumns = new String
val pattern1 = """numeric\(\d+,(\d+)\)""".r
cd match {
case pattern1.findFirstMatchIn(dt) =>
}
}
I am trying to get the final output into a String as below:
header:integer|releaseNumber:decimal(38,30)|amountCredit:decimal(38,30)|lastUpdatedBy:decimal(15,10)|orderNumber:bigint
How to multiple regex patterns for different cases to check/apply pattern matching on datatype of each value in the seq and change it to my suitable datatype as mentioned above.
Could anyone let me know how can I achieve it ?
It can be done with a single regex pattern, but some testing of the match results is required.
val numericRE = raw"([^:]+):numeric(?:\((\d+),(\d+)\))?".r
cdt.split("\\|")
.map{
case numericRE(col,a,b) =>
if (Option(b).isEmpty) s"$col:decimal(38,30)"
else if (b == "0") s"$col:bigint"
else s"$col:decimal($a,$b)"
case x => x //pass-through
}.mkString("|")
//res0: String = header:integer|releaseNumber:decimal(38,30)|amountCredit:decimal(38,30)|lastUpdatedBy:decimal(15,10)|orderNumber:bigint
Of course it can be done with three different regex patterns, but I think this is pretty clear.
explanation
raw - don't need so many escape characters - \
([^:]+) - capture everything up to the 1st colon
:numeric - followed by the string ":numeric"
(?: - start a non-capture group
\((\d+),(\d+)\) - capture the 2 digit strings, separated by a comma, inside parentheses
)? - the non-capture group is optional
numericRE(col,a,b) - col is the 1st capture group, a and b are the digit captures, but they are inside the optional non-capture group so they might be null
I have the following in a data.frame in r:
example <- "Inmuebles24_|.|_Casa_|.|_Renta_|.|_NuevoLeon"
I would like to simply use stringr count and some basic grexpr functions on the string, but i'm stuck on the regex.
The delimiter is clearly (and confusingly): _|.|_
How would this be expressed with regex?
Currently trying to escape everything to no success:
str_count(string = example, pattern = "[\\_\\|\\.\\|\\_]")
Your regex does not work because you placed it into a character class (where you do not need to escape _, BTW). See my today's answer to Regex expression not working with once or none for an explanation of the issue (mainly, the characters are treated as separate symbols and not as sequences of symbols, and all the special symbols are treated as literals, too).
You can achieve what you want in two steps:
Trim the string from the delimiters with gsub
Use str_count + 1 to get the count (as the number of parts = number of delimiters inside the string + 1)
R code:
example <- "_|.|_Inmuebles24_|.|_Casa_|.|_Renta_|.|_NuevoLeon_|.|_"
str_count(string = gsub("^(_[|][.][|]_)+|(_[|][.][|]_)+$", "", example), pattern = "_\\|\\.\\|_") + 1
## => 4
Or, in case you have multile consecutive delimiters, you need another intermediate step to "contract" them into 1:
example <- "_|.|_Inmuebles24_|.|_Casa_|.|__|.|_Renta_|.|__|.|_NuevoLeon_|.|_"
example <- gsub("((_[|][.][|]_)+)", "_|.|_", example)
str_count(string = gsub("^(_[|][.][|]_)+|(_[|][.][|]_)+$", "", example), pattern = "_\\|\\.\\|_") + 1
## => 4
Notes on the regexps: _[|][.][|]_ matches _|.|_ literally as symbols in the [...] character classes lose their special meaning. ((_[|][.][|]_)+) (2) matches 1 or more (+) sequences of these delimiters. The ^(_[|][.][|]_)+|(_[|][.][|]_)+$ pattern matches 1 or more delimiters at the start (^) and end ($) of the string.
This gives you what you want for this specific example you've given: str_count(example, "\\w+")