Regex to replace a string using replaceAll() or any other method

I am trying to remove the branch prefix that sits between a - and a / in a file name.
Example:
String name = "Application-2.0.2-bug/TEST-1.0.0.zip"
Expected output:
Application-2.0.2-TEST-1.0.0.zip
I tried the regex below, but it does not produce the expected result:
String FILENAME = "Application-2.0.2-bug/TEST-1.0.0.zip"
println(FILENAME.replaceAll(".+/", ""))

You can use
FILENAME.replaceAll("-[^-/]+/", "-")
See the regex demo. Details:
- - a hyphen
[^-/]+ - one or more chars other than - and /
/ - a / char.
See the online Groovy demo:
String FILENAME = 'Application-2.0.2-bug/TEST-1.0.0.zip'
println(FILENAME.replaceAll("-[^-/]+/", "-"))
// => Application-2.0.2-TEST-1.0.0.zip
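For comparison, here is a minimal sketch of the same call from plain Java, where String.replaceAll accepts the identical pattern and replacement. The class and method names are made up for illustration.

```java
class BranchPrefixStripper {
    // Removes a "-<branch>/" segment: a hyphen, one or more chars that are
    // neither '-' nor '/', then a '/'. The whole match is replaced by one "-".
    static String strip(String fileName) {
        return fileName.replaceAll("-[^-/]+/", "-");
    }

    public static void main(String[] args) {
        System.out.println(strip("Application-2.0.2-bug/TEST-1.0.0.zip"));
    }
}
```

The attempted ".+/" fails because .+ is greedy: it consumes everything up to the last /, so only TEST-1.0.0.zip is left. Excluding - and / from the middle part keeps the match inside the branch segment.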

I find Groovy closures the most intuitive and readable way to do string replacements.
def str = "Application-2.0.2-bug/TEST-1.0.0.zip"
def newStr = str.replaceAll(/(.*-)(.*\/)(.*)/){all,group1,group2,group3 ->
println all
println group1
println group2
println group3
"${group1}${group3}" //this is the return value of the closure
}
println newStr
This is the output:
Application-2.0.2-bug/TEST-1.0.0.zip
Application-2.0.2-
bug/
TEST-1.0.0.zip
Application-2.0.2-TEST-1.0.0.zip
Explanation:
Notice that every part of the regex is wrapped in parentheses (). These are capturing groups, and the text each one matches is passed to the closure as an argument.
all - the first argument is always the whole match (here, the full string)
group1 - (.*-) everything up to and including the last - that is still followed by a /
group2 - (.*\/) everything after that, up to and including the last / (the / is escaped with \)
group3 - (.*) all remaining chars
For your requirement, all you need to do is drop group2 and return the concatenation of group1 and group3.
This technique makes closures quite powerful. Just make sure the closure takes one more argument than the number of groups in the regex (4 arguments for 3 groups here), since the first argument is always the full match. You can use as many groups as your scenario needs.
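If you are on plain Java rather than Groovy, a rough equivalent of the closure approach is Matcher.replaceAll(Function<MatchResult, String>), which needs Java 9 or later. This is a sketch assuming that Java version; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReplace {
    // Same three groups as the Groovy closure:
    // group(1) = (.*-), group(2) = (.*/), group(3) = (.*)
    private static final Pattern PARTS = Pattern.compile("(.*-)(.*/)(.*)");

    static String dropBranch(String s) {
        // The function's result is still parsed as a replacement string,
        // so quote it to keep any '$' or '\' in the input literal.
        return PARTS.matcher(s)
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group(1) + mr.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(dropBranch("Application-2.0.2-bug/TEST-1.0.0.zip"));
    }
}
```

Unlike the Groovy closure, the lambda receives a single MatchResult and reads the groups from it, so there is no argument count to keep in sync with the regex.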

Please try this one (it assumes the branch prefix is wrapped in ** markers, as in the string below):
String FILENAME = "Application-2.0.2-**bug/**TEST-1.0.0.zip";
System.out.println(FILENAME.replaceAll("\\*\\*(.*)\\*\\*", ""));

Related

MATLAB regular expression to check whether variables are set in a script

To check if one or more variables (like var1, var2, ...) are set in the script, I read the script into a list of strings (line by line) and check if a line looks like
var1=...
[var1,x]=...
[var2,x]=...
[x,y,var1,z]=...
Currently I'm using the following pattern:
pattern = '^([.*)?(var1|var2|var3)(.*])?=';
ix = regexp(s,pattern,'once');
It works for my purpose but I know it's not a safe pattern, because something like [x,vvvar1,y]=... also matches the pattern.
My current solution is to write a separate pattern for each type of expression, but I wonder whether a single pattern can meet my needs.
Here are some examples where I want to match either abc or def:
pattern = '^([.*)?(abc|def)(.*])?=';
%% good examples
regexp('x=1',pattern,'once') % output []
regexp('aabc=1',pattern,'once') % output []
regexp('abc=1',pattern,'once') % output 1
regexp('[other,abc]=deal(1,2)',pattern,'once') % output 1
%% bad examples
regexp('[x,aabcc]=deal(1,2)',pattern,'once') % output 1
regexp('[x,abcc,y]=deal(1,2,3)',pattern,'once') % output 1
You want to make sure that at least one of the specific variable names appears as a complete name in the assignment target.
You can use
^\[?(\w+,)*(abc|def)(,\w+)*]?=
See the regex demo.
Details:
^ - start of string
\[? - an optional literal [ char
(\w+,)* - zero or more sequences of one or more word chars followed by a comma
(abc|def) - either abc or def
(,\w+)* - zero or more sequences of a comma followed by one or more word chars
]? - an optional ] char
= - a = char.
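To check the pattern against the examples from the question, here is a small Java sketch (Java rather than MATLAB; the class and method names are hypothetical). find() plays roughly the role of regexp(..., 'once') returning a non-empty result, and the closing ] is escaped here only for clarity.

```java
import java.util.regex.Pattern;

class AssignedVarCheck {
    // Optional '[', any "name," items, one of the wanted names,
    // any ",name" items, an optional ']', then '='.
    private static final Pattern ASSIGNS_WANTED =
            Pattern.compile("^\\[?(\\w+,)*(abc|def)(,\\w+)*\\]?=");

    static boolean assignsWanted(String line) {
        // find(): only the left-hand side, anchored at ^, has to match
        return ASSIGNS_WANTED.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {"x=1", "aabc=1", "abc=1", "[other,abc]=deal(1,2)",
                          "[x,aabcc]=deal(1,2)", "[x,abcc,y]=deal(1,2,3)"};
        for (String line : lines) {
            System.out.println(line + " -> " + assignsWanted(line));
        }
    }
}
```

Because each list item must be a whole \w+ run delimited by [, commas, or ], names such as aabcc or abcc can no longer satisfy the (abc|def) part.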

How to use Matlab/Octave regexprep (regular expression replace) to add suffix to file name before extension

Say I have this Matlab or Octave char variable:
>> filename = 'my.file.ext'
I want a regexprep command that adds a suffix, say '_old', to the file name before the extension, transforming it into 'my.file_old.ext'.
The following replaces all dots with '_old.':
>> regexprep(filename, '\.', '_old.')
ans =
'my_old.file_old.ext'
What is a regexprep command that prepends '_old' only to the last dot? (Ideally, if there is no dot (no extension), append '_old' at the very end.)
Thank you in advance!
If doing it without regular expressions is an option, you can use fileparts as follows:
filename = 'my.file.ext';
suffix = '_old';
[p, n, e] = fileparts(filename); % path, file, extension; each possibly empty
result = fullfile(p, [n, suffix, e]); % fullfile keeps the path separator that fileparts strips from p
Example in Octave.
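Outside MATLAB/Octave there is no fileparts, but the same no-regex idea can be sketched in Java with lastIndexOf. This is an approximation, not a port of fileparts: it assumes only '/' and '\' act as path separators, and the names are illustrative.

```java
class FilePartsSuffix {
    // The extension starts at the last '.' that comes after the last
    // path separator; if there is no such dot, append the suffix at the end.
    static String addSuffix(String filename, String suffix) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        if (dot <= sep) {
            return filename + suffix; // no extension in the file-name part
        }
        return filename.substring(0, dot) + suffix + filename.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(addSuffix("my.file.ext", "_old"));
        System.out.println(addSuffix("myfile", "_old"));
    }
}
```

Comparing the dot position with the separator position keeps a dot in a directory name (such as dir.d/file) from being mistaken for an extension.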
You may use
regexprep(filename, '^(?:(.*)(\.)|(.*))', '$3$1_old$2')
See the regex demo
Details
^ - start of string
(?:(.*)(\.)|(.*)) - a non-capturing group matching either of the two alternatives:
(.*)(\.) - Group 1 ($1 refers to its value): any zero or more chars, as many as possible, so the match extends up to the last dot; then Group 2 ($2): that dot
| - or
(.*) - Group 3 ($3): any zero or more chars as many as possible
If an alternative is not matched, the backreference to the capturing group is an empty string. Thus, if (.*)(\.) matches, the replacement is Group 1 + _old + Group 2 value. Else, it is Group 3 + _old (just appending at the end).
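The same regex and replacement string also work in Java, where a group from the alternative that did not participate likewise expands to an empty string in the replacement. A minimal sketch (Java's replaceFirst standing in for regexprep; the names are illustrative):

```java
import java.util.regex.Matcher;

class SuffixBeforeExtension {
    // Alternative 1: (.*)(\.) -> everything up to the LAST dot, then the dot.
    // Alternative 2: (.*)     -> the whole name when there is no dot.
    static String addSuffix(String filename, String suffix) {
        // quoteReplacement keeps any '$' or '\' in the suffix literal
        return filename.replaceFirst("^(?:(.*)(\\.)|(.*))",
                "$3$1" + Matcher.quoteReplacement(suffix) + "$2");
    }

    public static void main(String[] args) {
        System.out.println(addSuffix("my.file.ext", "_old"));
        System.out.println(addSuffix("myfile", "_old"));
    }
}
```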

How to replace part of string using regex pattern matching in scala?

I have a String which contains column names and datatypes as below:
val cdt = "header:integer|releaseNumber:numeric|amountCredit:numeric|lastUpdatedBy:numeric(15,10)|orderNumber:numeric(20,0)"
I need to convert the Postgres datatypes numeric and numeric(precision,scale) into Spark SQL-compatible datatypes.
In this case,
numeric -> decimal(38,30)
numeric(15,10) -> decimal(15,10)
numeric(20,0) -> bigint (an integral datatype, since its scale is zero)
In order to access the datatype in the string: cdt, I split it and created a Seq from it.
val dt = cdt.split("\\|").toSeq
Now I have a Seq of elements in which each element is a String in the below format:
Seq("header:integer", "releaseNumber:numeric","amountCredit:numeric","lastUpdatedBy:numeric(15,10)","orderNumber:numeric(20,0)")
I have the regex """numeric\(\d+,(\d+)\)""".r for numeric(precision, scale), which only works when both numbers are present, e.g. numeric(20,23).
I am very new to regex and Scala, and I don't understand how to create regex patterns for the remaining two cases and apply them to a string to match a condition. I tried the code below, but it gives me a compilation error: "Cannot resolve symbol findFirstMatchIn"
dt.map(e => e.split("\\:")).map(e => changeDataType(e(0), e(1)))
def changeDataType(colName: String, cd:String): String = {
val finalColumns = new String
val pattern1 = """numeric\(\d+,(\d+)\)""".r
cd match {
case pattern1.findFirstMatchIn(dt) =>
}
}
I am trying to get the final output into a String as below:
header:integer|releaseNumber:decimal(38,30)|amountCredit:decimal(38,30)|lastUpdatedBy:decimal(15,10)|orderNumber:bigint
How can I use regex patterns for these different cases to check the datatype of each value in the Seq and change it to the matching datatype listed above?
Could anyone let me know how can I achieve it ?
It can be done with a single regex pattern, but some testing of the match results is required.
val numericRE = raw"([^:]+):numeric(?:\((\d+),(\d+)\))?".r
cdt.split("\\|")
.map{
case numericRE(col,a,b) =>
if (Option(b).isEmpty) s"$col:decimal(38,30)"
else if (b == "0") s"$col:bigint"
else s"$col:decimal($a,$b)"
case x => x //pass-through
}.mkString("|")
//res0: String = header:integer|releaseNumber:decimal(38,30)|amountCredit:decimal(38,30)|lastUpdatedBy:decimal(15,10)|orderNumber:bigint
Of course it can be done with three different regex patterns, but I think this is pretty clear.
Explanation:
raw - with the raw interpolator, backslashes (\) don't need to be escaped
([^:]+) - capture everything up to the 1st colon
:numeric - followed by the string ":numeric"
(?: - start a non-capture group
\((\d+),(\d+)\) - capture the 2 digit strings, separated by a comma, inside parentheses
)? - the non-capture group is optional
numericRE(col,a,b) - col is the 1st capture group, a and b are the digit captures, but they are inside the optional non-capture group so they might be null
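For readers on the JVM without Scala, the same single-regex approach can be sketched in Java. matches() mirrors the Scala extractor, which also requires the whole string to match, and the optional groups come back as null in the same way. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PgToSparkTypes {
    // <column>:numeric, optionally followed by (<precision>,<scale>)
    private static final Pattern NUMERIC =
            Pattern.compile("([^:]+):numeric(?:\\((\\d+),(\\d+)\\))?");

    static String convertColumn(String colDef) {
        Matcher m = NUMERIC.matcher(colDef);
        if (!m.matches()) {
            return colDef; // not a numeric column: pass through unchanged
        }
        String col = m.group(1);
        String precision = m.group(2); // null when "(p,s)" is absent
        String scale = m.group(3);
        if (scale == null) {
            return col + ":decimal(38,30)";
        } else if (scale.equals("0")) {
            return col + ":bigint";
        }
        return col + ":decimal(" + precision + "," + scale + ")";
    }

    static String convert(String cdt) {
        return Arrays.stream(cdt.split("\\|"))
                .map(PgToSparkTypes::convertColumn)
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        String cdt = "header:integer|releaseNumber:numeric|amountCredit:numeric"
                + "|lastUpdatedBy:numeric(15,10)|orderNumber:numeric(20,0)";
        System.out.println(convert(cdt));
    }
}
```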

Lua gsub - How to set max character limit in regex pattern

From strings that are similar to this string:
|cff00ccffkey:|r value
I need to remove |cff00ccff and |r to get:
key: value
The problem is that |cff00ccff is a color code. I know it always starts with |c but the next 8 characters could be anything. So I need a gsub pattern to get the next 8 characters (alpha-numeric only) after |c.
How can I do this in Lua? I have tried:
local newString = string.gsub("|cff00ccffkey:|r value", "|c%w*", "")
newString = string.gsub(newString, "|r", "")
but that will remove everything up to the first white-space and I don't know how to specify the max characters to select to avoid this.
Thank you.
Lua patterns do not support range/interval/limiting quantifiers.
You may repeat %w alphanumeric pattern eight times:
local newString = string.gsub("|cff00ccffkey:|r value", "|c%w%w%w%w%w%w%w%w", "")
newString = string.gsub(newString, "|r", "")
print(newString)
-- => key: value
See the Lua demo online.
You may also make it a bit more dynamic by building the pattern with ('%w'):rep(8):
local newString = string.gsub("|cff00ccffkey:|r value", "|c" ..('%w'):rep(8), "")
See another Lua demo.
If your strings always follow this pattern - |c<8alpnum_chars><text>|r<value> - you may also use a pattern like
local newString = string.gsub("|cff00ccffkey:|r value", "^|c" ..('%w'):rep(8) .. "(.-)|r(.*)", "%1%2")
See this Lua demo
Here, the pattern matches:
^ - start of string
|c - a literal |c
('%w'):rep(8) - eight %w items, i.e. 8 alphanumeric chars
(.-) - Group 1: any 0+ chars, as few as possible
|r - a |r substring
(.*) - Group 2: the rest of the string.
The %1 and %2 refer to the values captured into corresponding groups.
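For contrast, regex engines such as Java's do support interval quantifiers, so the "exactly 8 chars" part can be written as {8}. A minimal sketch; \p{Alnum} (ASCII letters and digits by default) is used as a rough stand-in for Lua's %w, and the names are illustrative.

```java
class ColorCodeStripper {
    // Remove "|c" + exactly 8 alphanumerics, and every "|r".
    static String strip(String s) {
        return s.replaceAll("\\|c\\p{Alnum}{8}|\\|r", "");
    }

    // Group-based variant mirroring the anchored Lua pattern and "%1%2".
    static String stripAnchored(String s) {
        return s.replaceFirst("^\\|c\\p{Alnum}{8}(.*?)\\|r(.*)", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(strip("|cff00ccffkey:|r value"));
        System.out.println(stripAnchored("|cff00ccffkey:|r value"));
    }
}
```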

Regex replace first letter of every word with second letter

My tagging system is now as follows:
- #issue
- #topic
- #subject
- #person
- #otherperson
- $company
- $othercompany
One of the apps on the Mac (DEVONthink) treats # specially, so I would like to change the tags to:
- iissue
- ttopic
- ssubject
- pperson
- ootherperson
- ccompany
- oothercompany
Thanks for your help!
I would simply use groups here. Remember group(0) is always the entire matched String, so we use group(2) and group(3) for the second letter and then the rest of the word:
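import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RetagWords {
    public static void main(String[] args) {
        String[] words = {"#issue", "#topic", "#subject", "#person",
                          "#otherperson", "$company", "$othercompany"};
        // group(1) = first char, group(2) = second char, group(3) = rest of the word
        String regex = "(.{1,1})(.{1,1})(.*)\\s*?";
        Matcher m = Pattern.compile(regex).matcher("");
        for (String word : words) {
            if (!m.reset(word).find()) {
                continue; // skip words shorter than two chars
            }
            String s = m.group(2) + m.group(2) + m.group(3);
            System.out.println(s);
        }
    }
}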
If you know that your words consist of alphanumeric characters, you can change the (.*) to a more specific character class, for example (\\w*). Keep it greedy: with find(), a lazy (\\w*?) followed by \\s*? would match an empty string.
If all the words are trimmed, the trailing \\s*? can be left out too. For example, this works just as well here: (.{1,1})(.{1,1})(\\w*).
Also, if you know for a fact that the tags start with # or $, this works too: ([#$])(.{1,1})(\\w*)
You can also replace find() with matches().
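If the tags are in one block of text, a single replaceAll can do the rewrite without a loop. This sketch assumes # and $ appear only as tag prefixes, since any other # or $ followed by a word char would be rewritten too; the names are illustrative.

```java
class TagRewriter {
    // [#$] is a character class, so '$' is literal inside it; (\w) captures the
    // letter after the sigil, and "$1$1" writes it twice in place of sigil + letter.
    static String retag(String text) {
        return text.replaceAll("[#$](\\w)", "$1$1");
    }

    public static void main(String[] args) {
        String tags = String.join("\n", "- #issue", "- #topic", "- $company");
        System.out.println(retag(tags));
    }
}
```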