Special character replacement in java - replace

I am having trouble replacing special characters in Java.
Example:
String input = "The list \\d{1,3} [A-z] is not an ordinary list, but [L] is";
The expected output is:
The list 1,3 [A-z] is not an ordinary list, but [123] is
So, basically, "\d{1,3}" should become "1,3" and "[L]" should become "[123]". For the first part I tried:
String old_param = "\\d{1,3}";
String new_param = "1,3";
String temp = input.replaceFirst("(?i)" + old_param, new_param);
Expected: The list 1,3 [A-z] is not an ordinary list, but [L] is
Actual output is:
The list \d{1,3,3} [A-z] is not an ordinary list, but [L] is
How do I fix this problem?
Thank you.
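A likely explanation: replaceFirst treats its first argument as a regular expression. So \d{1,3} matches a run of one to three digits (the 1 inside the braces) rather than the literal text \d{1,3}, and that is how \d{1,3,3} appears. Quoting the search text with Pattern.quote and the replacement with Matcher.quoteReplacement makes both literal. The sketch below is one way to do this. The helper name replaceFirstLiteral is hypothetical, and the (?i) flag is kept from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {
    // Hypothetical helper: replaces the first case-insensitive occurrence of `target`,
    // treated as plain text, with `replacement`, also treated as plain text.
    static String replaceFirstLiteral(String input, String target, String replacement) {
        return input.replaceFirst("(?i)" + Pattern.quote(target),
                Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // In Java source the backslash must be doubled to get the characters \d{1,3}
        String input = "The list \\d{1,3} [A-z] is not an ordinary list, but [L] is";
        String result = replaceFirstLiteral(input, "\\d{1,3}", "1,3");
        result = replaceFirstLiteral(result, "[L]", "[123]");
        System.out.println(result);
        // prints: The list 1,3 [A-z] is not an ordinary list, but [123] is
    }
}
```

If case-insensitivity is not needed, String.replace(CharSequence, CharSequence) already works on literal text. However, it replaces every occurrence instead of only the first.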

regex to replace a string using replaceAll() or any other method

I am trying to remove the branch prefix from a file name, i.e. whatever sits between a - and the following /.
Example:
String name = 'Application-2.0.2-bug/TEST-1.0.0.zip'
Expected output:
Application-2.0.2-TEST-1.0.0.zip
I tried the regex below, but it doesn't work correctly (the greedy .+/ removes everything up to the /, leaving only TEST-1.0.0.zip):
String FILENAME = 'Application-2.0.2-bug/TEST-1.0.0.zip'
println(FILENAME.replaceAll(".+/", ""))
You can use
FILENAME.replaceAll("-[^-/]+/", "-")
Details:
- - a hyphen
[^-/]+ - any one or more chars other than - and /
/ - a / char.
See the online Groovy demo:
String FILENAME = 'Application-2.0.2-bug/TEST-1.0.0.zip'
println(FILENAME.replaceAll("-[^-/]+/", "-"))
// => Application-2.0.2-TEST-1.0.0.zip
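Java's String.replaceAll should treat this expression the same way, because the regex uses only basic syntax. Here is a minimal Java sketch. The class and method names are hypothetical.

```java
class StripBranchPrefix {
    // Replaces "-<segment>/" with "-", where <segment> contains neither '-' nor '/'.
    // Same regex as the Groovy answer above.
    static String strip(String fileName) {
        return fileName.replaceAll("-[^-/]+/", "-");
    }

    public static void main(String[] args) {
        System.out.println(strip("Application-2.0.2-bug/TEST-1.0.0.zip"));
        // prints: Application-2.0.2-TEST-1.0.0.zip
    }
}
```

Because the segment may not contain a hyphen, a branch name such as feature-x/ is only partly removed. Application-2.0.2-feature-x/TEST-1.0.0.zip becomes Application-2.0.2-feature-TEST-1.0.0.zip.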
I find Groovy closures the most intuitive and easy-to-understand way to do string replacements.
def str = "Application-2.0.2-bug/TEST-1.0.0.zip"
def newStr = str.replaceAll(/(.*-)(.*\/)(.*)/) { all, group1, group2, group3 ->
    println all
    println group1
    println group2
    println group3
    "${group1}${group3}" // this is the return value of the closure
}
println newStr
This is the output
Application-2.0.2-bug/TEST-1.0.0.zip
Application-2.0.2-
bug/
TEST-1.0.0.zip
Application-2.0.2-TEST-1.0.0.zip
Explanation:
Notice that each part of the regex is wrapped in parentheses (). Each pair defines a capturing group, and the text captured by each group is passed to the closure as an argument.
all - the first argument is always the full match (here, the whole input string)
group1 - (.*-) to indicate all chars ending with -
group2 - (.*\/) to indicate all chars ending with / (escaped with \).
group3 - (.*) all remaining chars
For your requirement, all you need to do is drop group2 and return the concatenation of group1 and group3.
This technique makes closures quite powerful. Make sure the closure takes one more argument than the number of groups in the regex (here 4 arguments for 3 groups), because the first argument is always the full match. You can use as many groups as your scenario requires.
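Plain Java has no closure-taking replaceAll on String. Since Java 9, however, Matcher.replaceAll accepts a Function<MatchResult, String>, which plays a similar role to the Groovy closure: it receives each match and returns its replacement. Here is a minimal sketch, assuming Java 9 or later. The names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReplace {
    // Same three groups as the Groovy closure: prefix ending in '-', segment ending in '/', rest.
    private static final Pattern PARTS = Pattern.compile("(.*-)(.*/)(.*)");

    // Keeps group 1 and group 3 and drops group 2.
    static String dropMiddleGroup(String s) {
        return PARTS.matcher(s).replaceAll(
                r -> Matcher.quoteReplacement(r.group(1) + r.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(dropMiddleGroup("Application-2.0.2-bug/TEST-1.0.0.zip"));
    }
}
```

Matcher.quoteReplacement is used because the function's result is otherwise interpreted as a replacement template, where $ and \ are special.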
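If the branch part of the name is literally wrapped in ** markers, as below, you can remove it together with the markers: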
String FILENAME = "Application-2.0.2-**bug/**TEST-1.0.0.zip";
System.out.println(FILENAME.replaceAll("\\*\\*(.*)\\*\\*", ""));

Sanitize url path with regex

I'm trying to strip the following elements from a URL path:
ids (1, 14223423, 24fb3bdc-8006-47f0-a608-108f66d20af4)
filenames (things.xml, doc.v2.final.csv)
domains (covered under filenames)
emails (foo#bar.com)
Sample:
/v1/upload/dxxp-sSy449dk_rm_1debit_A_03MAY21.final.csv/email/foo#bar.com?who=knows
Desired outcome:
/upload/email
I have something that works, but I'm not proud of it (written in Ruby):
# Remove params from the path (everything after the ?)
route = req.path&.split('?')&.first
# Remove filenames with singular extensions, domains, and emails
route = route&.gsub(/\b[\w-]*#?[\w-]+\.[\w-]+\b/, '')
# Remove ids from the path (any string that contains a number)
route = "/#{route&.scan(/\b[a-z_]+\b/i)&.join('/')}".chomp('/')
I can't help but think this can be done simply with something like \/([a-z_]+)\/?, but the \/? is too loose, and \/ is too restrictive.
Perhaps you can remove the parts starting with a / and that contain at least a dot or a digit.
Replace the match with an empty string.
/[^/\d.]*[.\d][^/]*
/ Match a forward slash
[^/\d.]* Match 0+ times any char except / or . or a digit
[.\d] Match either a . or a digit
[^/]* Match 0+ times any char except /
Output
/upload/email
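Here is a minimal Java sketch of the same replacement. It assumes the pattern carries over unchanged, since Ruby and Java agree on this syntax. The names are hypothetical.

```java
class SanitizePathRegex {
    // Removes every "/segment" that contains at least one '.' or digit,
    // using the same pattern as above: /[^/\d.]*[.\d][^/]*
    static String sanitize(String path) {
        return path.replaceAll("/[^/\\d.]*[.\\d][^/]*", "");
    }

    public static void main(String[] args) {
        String sample = "/v1/upload/dxxp-sSy449dk_rm_1debit_A_03MAY21.final.csv/email/foo#bar.com?who=knows";
        System.out.println(sanitize(sample)); // prints: /upload/email
    }
}
```

In the sample, the query string disappears only because it belongs to the last segment, which contains a dot. A path like /upload?x=y would keep its query.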
In Ruby, you can also simplify the checks with a bit of code, in a similar way to what you did:
text = text.split('?').first.split('/').select{ |x| not x.match?(/\A[^#]*#\S+\z|\d/) }.join("/")
Note how much this approach simplifies the email and digit checks.
Details
text.split('?').first - split the string with ? and grab the first part
.split('/') - splits with / into subparts
.select{ |x| not x.match?(/\A[^#]*#\S+\z|\d/) } - keeps only the items that do not match the \A[^#]*#\S+\z|\d regex, i.e. either \A[^#]*#\S+\z (start of string, zero or more chars other than #, a # char, one or more non-whitespace chars, end of string) or \d (a digit anywhere in the item)
.join("/") - join the resulting items with /.
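Here is a rough Java equivalent of this split-and-filter approach, as a sketch. The names are hypothetical. The regex is copied from the answer, and Java's String.split, like Ruby's, drops trailing empty strings but keeps the leading one.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SanitizePathBlockList {
    // A segment is dropped if it looks like an email (contains '#') or contains a digit.
    private static final Pattern DROP = Pattern.compile("\\A[^#]*#\\S+\\z|\\d");

    static String sanitize(String path) {
        String withoutQuery = path.split("\\?", 2)[0];   // part before the first '?'
        return Arrays.stream(withoutQuery.split("/"))    // leading "" kept, so the result starts with "/"
                .filter(segment -> !DROP.matcher(segment).find())
                .collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        System.out.println(sanitize(
                "/v1/upload/dxxp-sSy449dk_rm_1debit_A_03MAY21.final.csv/email/foo#bar.com?who=knows"));
    }
}
```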
I think it's better to use an allow list here rather than a block list: it's more predictable to say "we only keep words made of letters and underscores".
# Keep path w/o params
route = req.path.to_s.split('?').first
# Keep words that only contain letters or _
route = route.split('/').keep_if { |chunk| chunk[/^[a-z_]+$/i] }
# Put the path back together
route = "/#{route.join('/')}".chomp('/')
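Here is a sketch of the allow-list version in Java. It makes the same assumption as the Ruby snippet: a segment is kept only if it consists entirely of ASCII letters and underscores. The names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SanitizePathAllowList {
    // Only segments made entirely of letters and underscores survive.
    private static final Pattern ALLOWED = Pattern.compile("[a-z_]+", Pattern.CASE_INSENSITIVE);

    static String sanitize(String path) {
        String withoutQuery = path.split("\\?", 2)[0];
        String kept = Arrays.stream(withoutQuery.split("/"))
                .filter(segment -> ALLOWED.matcher(segment).matches())
                .collect(Collectors.joining("/"));
        // Mirrors "/#{...}".chomp('/'): an empty result becomes "" rather than "/"
        return kept.isEmpty() ? "" : "/" + kept;
    }

    public static void main(String[] args) {
        System.out.println(sanitize(
                "/v1/upload/dxxp-sSy449dk_rm_1debit_A_03MAY21.final.csv/email/foo#bar.com?who=knows"));
    }
}
```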

Scala split and line start in the regex

I am trying to split the string into four parts: P, Q, R, S.
The string starts with P, as in the following example:
"P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK".split("[(^?P\\|)][(Q?\\|)]?[(R?\\|)]?[(S?\\|)]") foreach println
gives
VAL1|VAL2|VAL3|BLANK
VAL4|BLANK|BLANK
VAL5|BLANK|VAL6|HEL
BLANK|VAL7
|EDIT|BLANK|VAL8
DK 1.8
BLANK
whereas my expectation is:
VAL1|VAL2|VAL3|BLANK
VAL4|BLANK|BLANK
VAL5|BLANK|VAL6|HELP|BLANK|VAL7
EDIT|BLANK|VAL8|(SDK 1.8)|BLANK
However
"P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK".split("[(^P\\|)][(Q?\\|)]?[(R?\\|)]?[(S?\\|)]") (0)
Checking the first element of the split above gives:
res9: String = ""
It seems that the start of the string is not honored here. I also tried this on regex101, and it correctly matches P| at the start. However, it also matches P| in |HELP|, so my regex seems to be flawed. Still, my question is: how does the empty string above come into play?
You can use the following regex if having an empty first element of your list is not important:
\\|[QRS]\\||^P\\|
You can replace this regex with \\|[PQRS]\\||^P\\| if you expect P to also appear as a separator inside the string.
OUTPUT:
"P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK".split("\\|[QRS]\\||^P\\|");
[, VAL1|VAL2|VAL3|BLANK, VAL4|BLANK|BLANK, VAL5|BLANK|VAL6|HELP|BLANK|VAL7, EDIT|BLANK|VAL8|(SDK 1.8)|BLANK]
Otherwise, you need to do it in 2 steps:
1. match and remove the P| at the beginning of the string by replacing ^P\\| with nothing
2. split the string using the regex \\|[QRS]\\| (or \\|[PQRS]\\| if you expect P to also appear as a separator inside the string)
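As for where the empty first element comes from: calling split with a regex on a Scala String uses java.lang.String.split. Its documentation states that a positive-width match at the very beginning of the string produces an empty leading substring, while a zero-width match there does not. In the question's regex, each bracketed part is a character class that matches a single character, and a ^ that is not the first character inside a class is just a literal. So [(^P\\|)] matches the P at index 0, and the first element is "". Here ^P\\| likewise matches the two characters P| at index 0. The sketch below shows both options from this answer in Java. The method names are hypothetical.

```java
import java.util.Arrays;

class SplitSections {
    static final String INPUT =
        "P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK";

    // One step: "P|" at index 0 is a positive-width match, so split() yields a leading "".
    static String[] oneStep(String s) {
        return s.split("\\|[QRS]\\||^P\\|");
    }

    // Two steps: strip the leading "P|" first, then split on "|Q|", "|R|" or "|S|".
    static String[] twoSteps(String s) {
        return s.replaceFirst("^P\\|", "").split("\\|[QRS]\\|");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(oneStep(INPUT)));
        System.out.println(Arrays.toString(twoSteps(INPUT)));
    }
}
```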
Here's one approach that defines the delimiter as one of P, Q, R, S enclosed by word boundaries (\b), with an optional | on either side:
val s = "P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK"
s.split("""\|?\b[PQRS]\b\|?""").filter(_ != "")
// res1: Array[String] = Array(VAL1|VAL2|VAL3|BLANK, VAL4|BLANK|BLANK, VAL5|BLANK|VAL6|HELP|BLANK|VAL7, EDIT|BLANK|VAL8|(SDK 1.8)|BLANK)
Skip the filter if you want to keep the empty strings produced by the split.
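The same word-boundary delimiter works with Java's String.split, which is the method Scala calls here. Here is a hedged Java sketch. The names are hypothetical.

```java
import java.util.Arrays;

class SplitOnWordBoundary {
    // Delimiter: a lone P, Q, R or S (word boundaries on both sides), with an optional '|' on each side.
    static String[] sections(String s) {
        return Arrays.stream(s.split("\\|?\\b[PQRS]\\b\\|?"))
                .filter(part -> !part.isEmpty())   // same role as .filter(_ != "")
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String s = "P|VAL1|VAL2|VAL3|BLANK|Q|VAL4|BLANK|BLANK|R|VAL5|BLANK|VAL6|HELP|BLANK|VAL7|S|EDIT|BLANK|VAL8|(SDK 1.8)|BLANK";
        System.out.println(Arrays.toString(sections(s)));
    }
}
```

One caveat: a value that is itself a standalone P, Q, R or S would also be treated as a delimiter.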

Regular expression to find strings that start with letters and end with a slash /

I have a collection of 1000 records with a string column.
I'm using the Jongo API to query MongoDB.
I need to find the records where the string column starts with the letters "AB" and ends with a slash "/".
I need help writing a query to select them.
Thanks.
I'm going to assume you know how to query using Regular Expressions in Jongo API and are just looking for the necessary regex to do so?
If so, this regex will find any string that begins with 'AB' (case-sensitive), is followed by any number of other characters, and ends with a forward slash ('/'):
^AB.*\/$
^ - matches the start of the string
AB - matches the string 'AB' exactly
.* - matches any character ('.') any number of times ('*')
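\/ - matches the literal character '/' (the backslash escapes it; this is only required when the regex is written as a /.../ literal, and is harmless in other contexts in common regex flavors)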
$ - matches the end of the string
If you're just getting started with regex, I highly recommend the Regex 101 website, it's a fantastic sandbox to test regex in and explains each step of your expression to make debugging much simpler.
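To try the expression outside MongoDB, here is a minimal Java sketch. It uses java.util.regex on hypothetical in-memory sample values and is not a Jongo query. A compiled Pattern like this one can also be passed as a query parameter, as the Jongo examples in this thread do. MongoDB's own regex engine may differ from Java's in edge cases.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StartsWithAbEndsWithSlash {
    // Same expression as above; in a Java string literal "/" needs no escaping.
    static final Pattern AB_TO_SLASH = Pattern.compile("^AB.*/$");

    // Keeps only the values that start with "AB" and end with "/".
    static List<String> matching(List<String> values) {
        return values.stream()
                .filter(v -> AB_TO_SLASH.matcher(v).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(matching(List.of("AB/", "ABC/x/", "xAB/", "AB/x", "ab/")));
    }
}
```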
I found a solution; the following queries worked fine:
db.getCollection('employees').find({employeeName:{$regex: '^Raju.*\\/$'}})
db.getCollection('employees').find({employeeName:{$regex: '\/$'}})
getCollection().find("{employeeName:{$regex: '^Raju.*\\\\/$'}}").as(Employee.class);
getCollection().find("{employeeName:{$regex: '\\/$'}}").as(Employee.class);
getCollection().find("{employeeName:#}",
Pattern.compile("\\/$")).as(Employee.class);
getCollection().find("{"Raju": {$regex: #}}", "\\/$").as(Employee.class);
Map<String, Object> map = new HashMap<>();
map.put("employeeName", Pattern.compile("\\/$"));
DBCursor coll = getCollection().getDBCollection().find(new BasicDBObject(map));
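You could try this (note that keyword is treated as a regular expression and is not anchored to the start of the field):
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import org.jongo.MongoCursor;
// assumes collection is a Jongo MongoCollection field of the enclosing class
public List<Product> searchProducts(String keyword) {
    MongoCursor<Product> cursor = collection.find("{name:#}", Pattern.compile(keyword + ".*")).as(Product.class);
    List<Product> products = new ArrayList<Product>();
    while (cursor.hasNext()) {
        Product product = cursor.next();
        products.add(product);
    }
    return products;
}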

Trying to match a string in the format of domain\username using Lua and then mask the pattern with '#'

I am trying to match a string in the format of domain\username using Lua and then mask the pattern with #.
So if the input is sample.com\admin; the output should be ######.###\#####;. The string can end with either a ;, ,, . or whitespace.
More examples:
sample.net\user1,hello -> ######.###\#####,hello
test.org\testuser. Next -> ####.###\########. Next
I tried ([a-zA-Z][a-zA-Z0-9.-]+)\.?([a-zA-Z0-9]+)\\([a-zA-Z0-9 ]+)\b, which works perfectly on http://regexr.com/, but not in a Lua demo. What is wrong with the pattern?
Below is the code I used to check in Lua:
test_text="I have the 123 name as domain.com\admin as 172.19.202.52 the credentials"
pattern="([a-zA-Z][a-zA-Z0-9.-]+).?([a-zA-Z0-9]+)\\([a-zA-Z0-9 ]+)\b"
res=string.match(test_text,pattern)
print (res)
It is printing nil.
Lua patterns aren't regular expressions, which is why your regex doesn't work:
\b isn't supported; you can use the more powerful %f frontier pattern if needed.
In the string test_text, the \ isn't escaped, so \a is interpreted as an escape sequence (the bell character).
. is a magic character in patterns, so it needs to be escaped as %. to match a literal dot.
This code isn't exactly equivalent to your pattern; you can tweak it if needed:
test_text = "I have the 123 name as domain.com\\admin as 172.19.202.52 the credentials"
pattern = "(%a%w+)%.?(%w+)\\([%w]+)"
print(string.match(test_text,pattern))
Output: domain com admin
Once the pattern is fixed, replacing the matched parts with # is easy; string.sub or string.gsub will do.
As already mentioned, pure Lua does not have regular expressions, only patterns.
Your regex however can be matched with the following code and pattern:
--[[
sample.net\user1,hello -> ######.###\#####,hello
test.org\testuser. Next -> ####.###\########. Next
]]
s1 = [[sample.net\user1,hello]]
s2 = [[test.org\testuser. Next]]
s3 = [[abc.domain.org\user1]]
function mask_domain(s)
s = s:gsub('(%a[%a%d%.%-]-)%.?([%a%d]+)\\([%a%d]+)([%;%,%.%s]?)',
function(a,b,c,d)
return ('#'):rep(#a)..'.'..('#'):rep(#b)..'\\'..('#'):rep(#c)..d
end)
return s
end
print(s1,'=>',mask_domain(s1))
print(s2,'=>',mask_domain(s2))
print(s3,'=>',mask_domain(s3))
The last example does not end with ;, ,, . or whitespace. If such a terminator is required, simply remove the final ? from the pattern.
UPDATE: If you also need to keep the dots before the last one in the domain (e.g. abc.domain.org), replace the function above with this one:
function mask_domain(s)
s = s:gsub('(%a[%a%d%.%-]-)%.?([%a%d]+)\\([%a%d]+)([%;%,%.%s]?)',
function(a,b,c,d)
a = a:gsub('[^%.]','#')
return a..'.'..('#'):rep(#b)..'\\'..('#'):rep(#c)..d
end)
return s
end
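For comparison, Java's java.util.regex supports real regular expressions, so the masking can be written as a single regex replacement there. Here is a minimal sketch. It assumes the user part consists of letters and digits and that the last dot-separated label before the backslash is masked in full. Dots earlier in the domain are kept, as in the updated Lua function, and a trailing ;, ,, . or whitespace is left untouched rather than required. The class and method names are hypothetical. Matcher.replaceAll(Function) needs Java 9 or later, and String.repeat needs Java 11 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaskDomainUser {
    // group 1: domain labels (a lazy match, so it stops before the last ".label"),
    // group 2: last label before the backslash, group 3: user name
    private static final Pattern DOMAIN_USER =
            Pattern.compile("([A-Za-z][A-Za-z0-9.-]*?)\\.([A-Za-z0-9]+)\\\\([A-Za-z0-9]+)");

    static String mask(String text) {
        return DOMAIN_USER.matcher(text).replaceAll(r -> Matcher.quoteReplacement(
                r.group(1).replaceAll("[^.]", "#")       // mask everything except dots
                + "." + "#".repeat(r.group(2).length())
                + "\\" + "#".repeat(r.group(3).length())));
    }

    public static void main(String[] args) {
        System.out.println(mask("sample.net\\user1,hello"));
        System.out.println(mask("test.org\\testuser. Next"));
    }
}
```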