Regex replace first letter of every word with second letter - regex

My tagging system is now as follows:
- #issue
- #topic
- #subject
- #person
- #otherperson
- $company
- $othercompany
One of the apps on the Mac (DEVONthink) treats # specially, so I would like to change the tagging system to:
- iissue
- ttopic
- ssubject
- pperson
- ootherperson
- ccompany
- oothercompany
Thanks for your help!

I would simply use groups here. Remember group(0) is always the entire matched String, so we use group(2) and group(3) for the second letter and then the rest of the word:
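import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Retag {
    public static void main(String[] args) {
        String[] words = {"#issue", "#topic", "#subject", "#person",
                "#otherperson", "$company", "$othercompany"};
        // group(1): prefix char, group(2): first letter, group(3): rest of the word
        String regex = "(.{1,1})(.{1,1})(.*)\\s*?";
        Matcher m = Pattern.compile(regex).matcher("");
        for (String word : words) {
            if (m.reset(word).find()) {
                // Double the first letter in place of the prefix
                String s = m.group(2) + m.group(2) + m.group(3);
                System.out.println(s);
            }
        }
    }
}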
If you know that your words consist of alphanumeric characters, you can change the (.*) to a more specific character class, for example (\\w*).
If all the words are trimmed, the trailing \\s*? can be left out too. For example, this works just as well here: (.{1,1})(.{1,1})(\\w*).
Also, if you know for a fact that the tags start with # or $, this works too: ([#$])(.{1,1})(\\w*)
You can also replace find() with matches().
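For comparison, the same group idea can be sketched in Python as a single substitution. This is only an illustration, assuming every tag starts with # or $ followed by a word character; the function name retag is made up for this example.

```python
import re

def retag(tag):
    # Hypothetical helper: drop the leading # or $ and double the first letter.
    # Group 1 captures the first letter after the prefix.
    return re.sub(r'^[#$](\w)', r'\1\1', tag)

for tag in ["#issue", "#otherperson", "$company"]:
    print(retag(tag))
```

Strings without a # or $ prefix are returned unchanged, because the pattern is anchored at the start and simply does not match.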


regex to replace a string using replaceAll() or any other method

I was trying to replace/remove any string between - <branch prefix> /
Example:
String name = "Application-2.0.2-bug/TEST-1.0.0.zip"
expected output :
Application-2.0.2-TEST-1.0.0.zip
I tried the regex below, but it isn't working correctly.
String FILENAME = "Application-2.0.2-bug/TEST-1.0.0.zip"
println(FILENAME.replaceAll(".+/", ""))
You can use
FILENAME.replaceAll("-[^-/]+/", "-")
See the regex demo. Details:
- - a hyphen
[^-/]+ - any one or more chars other than - and /
/ - a / char.
See the online Groovy demo:
String FILENAME = 'Application-2.0.2-bug/TEST-1.0.0.zip'
println(FILENAME.replaceAll("-[^-/]+/", "-"))
// => Application-2.0.2-TEST-1.0.0.zip
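The same pattern carries over to other regex engines unchanged. A minimal Python sketch, assuming the branch prefix never contains a - or / itself:

```python
import re

filename = "Application-2.0.2-bug/TEST-1.0.0.zip"
# "-" + one or more chars that are neither "-" nor "/" + "/" collapses to "-"
cleaned = re.sub(r'-[^-/]+/', '-', filename)
print(cleaned)
```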
I find that using Groovy closures for string replacement is the most intuitive and easiest to understand.
def str = "Application-2.0.2-bug/TEST-1.0.0.zip"
def newStr = str.replaceAll(/(.*-)(.*\/)(.*)/){all,group1,group2,group3 ->
println all
println group1
println group2
println group3
"${group1}${group3}" //this is the return value of the closure
}
println newStr
This is the output
Application-2.0.2-bug/TEST-1.0.0.zip
Application-2.0.2-
bug/
TEST-1.0.0.zip
Application-2.0.2-TEST-1.0.0.zip
Explanation:
Notice that the character groups in the regex are all wrapped in parentheses (). These mark the capture groups in the input string, which can then be used easily in a closure.
all - first variable will always be full string
group1 - (.*-) to indicate all chars ending with -
group2 - (.*\/) to indicate all chars ending with / (escaped with \).
group3 - (.*) all remaining chars
Now for your requirement all you need is to eliminate group2 and return a concatenation of group1 and group3.
With this technique closures become quite powerful. Just make sure that the number of closure arguments (4 in this case) is one more than the number of groups in the regex, since the first argument is always the full match. You can use any number of groups, depending on your scenario.
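Other languages offer a similar hook. As a rough Python counterpart, re.sub accepts a function in place of the replacement string; the function name drop_branch below is made up for this sketch.

```python
import re

def drop_branch(match):
    # match.group(0) is the whole match; groups 1..3 mirror group1..group3 above
    return match.group(1) + match.group(3)

name = "Application-2.0.2-bug/TEST-1.0.0.zip"
new_name = re.sub(r'(.*-)(.*/)(.*)', drop_branch, name)
print(new_name)
```

Because (.*-) is greedy, it backtracks to the last hyphen that still leaves a / for the second group, which is why group 1 ends just before "bug/".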
Please, try this one:
String FILENAME = "Application-2.0.2-**bug/**TEST-1.0.0.zip";
System.out.println(FILENAME.replaceAll("\\*\\*(.*)\\*\\*", ""));

Sanitize url path with regex

I'm trying to sanitize a url path from the following elements
ids (1, 14223423, 24fb3bdc-8006-47f0-a608-108f66d20af4)
filenames (things.xml, doc.v2.final.csv)
domains (covered under filenames)
emails (foo#bar.com)
Sample:
/v1/upload/dxxp-sSy449dk_rm_1debit_A_03MAY21.final.csv/email/foo#bar.com?who=knows
Desired outcome:
/upload/email
I have something that works... but I'm not proud (written in Ruby)
# Remove params from the path (everything after the ?)
route = req.path&.split('?')&.first
# Remove filenames with singular extensions, domains, and emails
route = route&.gsub(/\b[\w-]*#?[\w-]+\.[\w-]+\b/, '')
# Remove ids from the path (any string that contains a number)
route = "/#{route&.scan(/\b[a-z_]+\b/i)&.join('/')}".chomp('/')
I can't help but think this can be done simply with something like \/([a-z_]+)\/?, but the \/? is too loose, and \/ is too restrictive.
Perhaps you can remove the parts starting with a / and that contain at least a dot or a digit.
Replace the match with an empty string.
/[^/\d.]*[.\d][^/]*
Rubular regex demo
/ Match a forward slash
[^/\d.]* Match 0+ times any char except / or . or a digit
[.\d] Match either a . or a digit
[^/]* Match 0+ times any char except /
Output
/upload/email
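The pattern is not Ruby-specific. Here is a small Python sketch of the same substitution, applied to the sample from the question; the helper name strip_noisy_segments is hypothetical, and the query string is dropped first as in the original code.

```python
import re

def strip_noisy_segments(path):
    # Hypothetical helper: drop the query string, then remove every
    # "/segment" that contains at least one dot or digit.
    route = path.split('?')[0]
    return re.sub(r'/[^/\d.]*[.\d][^/]*', '', route)

sample = "/v1/upload/dxxp-sSy449dk_rm_1debit_A_03MAY21.final.csv/email/foo#bar.com?who=knows"
print(strip_noisy_segments(sample))
```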
In Ruby, you can use a bit of code to simplify your checks in a similar way you did:
text = text.split('?').first.split('/').select{ |x| not x.match?(/\A[^#]*#\S+\z|\d/) }.join("/")
See the Ruby demo. Note how much this approach simplifies the email and digit checking.
Details
text.split('?').first - split the string with ? and grab the first part
.split('/') - splits with / into subparts
.select{ |x| not x.match?(/\A[^#]*#\S+\z|\d/) } - only keep the items that do not match the \A[^#]*#\S+\z|\d regex: \A[^#]*#\S+\z matches the start of string, zero or more chars other than #, a # char, then one or more non-whitespace chars and the end of string; the \d alternative matches any digit
.join("/") - join the resulting items with /.
So, I think it's better to go with the allow list here, rather than a block list. Seems like it's more predictable to say "we only keep words with letters and underscores".
# Keep path w/o params
route = req.path.to_s.split('?').first
# Keep words that only contain letters or _
route = route.split('/').keep_if { |chunk| chunk[/^[a-z_]+$/i] }
# Put the path back together
route = "/#{route.join('/')}".chomp('/')
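The allow-list idea translates directly to other languages. A minimal Python sketch, assuming segments are separated by / and only ASCII letters and underscores count as "words"; allowed_route is a made-up name:

```python
import re

def allowed_route(path):
    # Hypothetical helper: keep only segments made of letters and underscores.
    chunks = path.split('?')[0].split('/')
    kept = [c for c in chunks if re.fullmatch(r'[A-Za-z_]+', c)]
    # Mirror the Ruby version: an empty result yields "" rather than "/"
    return '/' + '/'.join(kept) if kept else ''

sample = "/v1/upload/dxxp-sSy449dk_rm_1debit_A_03MAY21.final.csv/email/foo#bar.com?who=knows"
print(allowed_route(sample))
```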

Python - how to add a new line every time there is a pattern is found in a string?

How can I add a new line every time there is a pattern of a regex-list found in a string ?
I am using python 3.6.
I got the following input:
12.13.14 Here is supposed to start a new line.
12.13.15 Here is supposed to start a new line.
Here is some text. It is written in one lines. 12.13. Here is some more text. 2.12.14. Here is even more text.
I wish to have the following output:
12.13.14
Here is supposed to start a new line.
12.13.15
Here is supposed to start a new line.
Here is some text. It is written in one lines.
12.13.
Here is some more text.
2.12.14.
Here is even more text.
My first try returns as the output the same as the input:
in_file2 = 'work1-T1.txt'
out_file2 = 'work2-T1.txt'
start_rx = re.compile('|'.join(
    ['\d\d\.\d\d\.', '\d\.\d\d\.\d\d','\d\d\.\d\d\.\d\d']))
with open(in_file2,'r', encoding='utf-8') as fin2, open(out_file2, 'w', encoding='utf-8') as fout2:
    text_list = fin2.read().split()
    fin2.seek(0)
    for string in fin2:
        if re.match(start_rx, string):
            string = str.replace(start_rx, '\n\n' + start_rx + '\n')
        fout2.write(string)
My second try returns an error 'TypeError: unsupported operand type(s) for +: '_sre.SRE_Pattern' and 'str''
in_file2 = 'work1-T1.txt'
out_file2 = 'work2-T1.txt'
start_rx = re.compile('|'.join(
    ['\d\d\.\d\d\.', '\d\.\d\d\.\d\d','\d\d\.\d\d\.\d\d']))
with open(in_file2,"r") as fin2, open(out_file2, 'w') as fout3:
    for line in fin2:
        start = False
        if re.match(start_rx, line):
            start = True
        if start == False:
            print ('do something')
        if start == True:
            line = '\n' + line ## space before position number
            line = line.replace(start_rx, start_rx + '\n')
        fout3.write(line)
First of all, to search and replace with a regex, you need to use re.sub, not str.replace.
Second, if you use a re.sub, you can't use the regex pattern inside a replacement pattern, you need to group the parts of the regex you want to keep and use backreferences in the replacement (or, if you just want to refer to the whole match, use \g<0> backreference, no capturing groups are required).
Third, when you build an unanchored alternation pattern, make sure the longer alternatives come first, i.e. start_rx = re.compile('|'.join([r'\d\d\.\d\d\.\d\d', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.'])). Note the raw string literals, which keep the backslashes intact. However, you may prefer to write a single, more precise pattern by hand.
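A quick way to see why the order of alternatives matters, and how \g<0> refers to the whole match, is the sketch below. The variable names are only illustrative.

```python
import re

# Same three alternatives, in two different orders
short_first = re.compile('|'.join([r'\d\d\.\d\d\.', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.\d\d']))
long_first = re.compile('|'.join([r'\d\d\.\d\d\.\d\d', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.']))

line = "12.13.14 Here is supposed to start a new line."
short_hit = short_first.match(line).group()  # stops at the first alternative that fits
long_hit = long_first.match(line).group()    # tries the longest alternative first

# \g<0> re-inserts the whole match, so no capturing group is needed
split_line = long_first.sub(r'\g<0>\n', line, count=1)
print(repr(short_hit), repr(long_hit), repr(split_line))
```

The regex engine takes the first alternative that succeeds at a given position, not the longest one, so with the short pattern first the trailing "14" is left behind.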
Here is how your code can be fixed:
import re

with open(in_file2, 'r', encoding='utf-8') as fin2, open(out_file2, 'w', encoding='utf-8') as fout2:
    text = fin2.read()
    # Put each numeric marker on its own line, preceded by a blank line
    fout2.write(re.sub(r'\s*(\d+(?:\.\d+)+\.?)\s*', r'\n\n\1\n', text))
See the Python demo
The pattern is
\s*(\d+(?:\.\d+)+\.?)\s*
See the regex demo
Details
\s* - 0+ whitespaces
(\d+(?:\.\d+)+\.?) - Group 1 (\1 in the replacement pattern):
\d+ - 1+ digits
(?:\.\d+)+ - 1 or more repetitions of . and 1+ digits
\.? - an optional .
\s* - 0+ whitespaces
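To try the pattern without touching any files, it can be run against an in-memory string. The sample text below is shortened from the question; the variable names are only illustrative.

```python
import re

marker = re.compile(r'\s*(\d+(?:\.\d+)+\.?)\s*')
text = "Here is some text. 12.13. Here is some more text. 2.12.14. Here is even more text."

# Surrounding whitespace is consumed by \s* and replaced with explicit newlines
result = marker.sub(r'\n\n\1\n', text)
print(result)
```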
Try this
out_file2=re.sub(r'(\d+) ', r'\1\n', in_file2)
out_file2=re.sub(r'(\w+)\.', r'\1\.\n', in_file2)

Regular Expression to find string starts with letter and ends with slash /

I have a collection with 1000 records that has a string column.
I'm using the Jongo API to query MongoDB.
I need to find the records where the column's string starts with the letters "AB" and ends with a slash "/".
I need help writing the query to select them.
Thanks.
I'm going to assume you know how to query using Regular Expressions in Jongo API and are just looking for the necessary regex to do so?
If so, this regex will find any string that begins 'AB' (case sensitive), is followed by any number of other characters and then ends with forward slash ('/'):
^AB.*\/$
^ - matches the start of the string
AB - matches the string 'AB' exactly
.* - matches any character ('.') any number of times ('*')
\/ - matches the literal character '/' (backslash is the escape character)
$ - matches the end of the string
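The expression can be checked outside MongoDB first. Below is a small Python sketch with made-up sample values; in Python the / needs no escaping, so the pattern is written as ^AB.*/$ here.

```python
import re

pattern = re.compile(r'^AB.*/$')

# Hypothetical sample values, not taken from the collection in the question
names = ["AB/", "ABcdef/", "ABcdef", "xAB/", "ab/"]
matches = [n for n in names if pattern.search(n)]
print(matches)
```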
If you're just getting started with regex, I highly recommend the Regex 101 website, it's a fantastic sandbox to test regex in and explains each step of your expression to make debugging much simpler.
I found the solution; the following worked fine.
// mongo shell
db.getCollection('employees').find({employeeName:{$regex: '^Raju.*\\/$'}})
db.getCollection('employees').find({employeeName:{$regex: '\/$'}})
// Jongo, regex inlined in the query string
getCollection().find("{employeeName:{$regex: '^Raju.*\\\\/$'}}").as(Employee.class);
getCollection().find("{employeeName:{$regex: '\\/$'}}").as(Employee.class);
// Jongo, regex passed as a # parameter
getCollection().find("{employeeName:#}", Pattern.compile("\\/$")).as(Employee.class);
getCollection().find("{employeeName: {$regex: #}}", "\\/$").as(Employee.class);
// Underlying Java driver
Map<String, Object> map = new HashMap<>();
map.put("employeeName", Pattern.compile("\\/$"));
DBCursor cursor = getCollection().getDBCollection().find(new BasicDBObject(map));
You could try this.
public List<Product> searchProducts(String keyword) {
MongoCursor<Product> cursor = collection.find("{name:#}", Pattern.compile(keyword + ".*")).as(Product.class);
List<Product> products = new ArrayList<Product>();
while (cursor.hasNext()) {
Product product = cursor.next();
products.add(product);
}
return products;
}

Simple Regular Expression matching

I'm new to regular expressions and I'm trying to use RegExp on the GWT client side. I want to do simple * matching (say, if the user enters 006*, I want to match 006...). I'm having trouble writing this. What I have is:
String input = "006*";
input = input.replaceAll("\\*", "(" + "\\" + "\\" + "S\\*" + ")");
RegExp regExp = RegExp.compile(input);
It returns true for strings like BKLFD006* too. What am I doing wrong?
Put a ^ at the start of the regex you're generating.
The ^ character means to match at the start of the source string only.
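The effect of the anchor can be seen in a short Python sketch of the same wildcard-to-regex conversion. The helper name wildcard_to_regex is made up, and escaping the rest of the user input is an extra precaution that is not part of the original code.

```python
import re

def wildcard_to_regex(user_input):
    # Hypothetical helper: escape the input, then turn each escaped "*" into \S*
    escaped = re.escape(user_input)
    # The leading ^ forces the match to begin at the start of the string
    return re.compile('^' + escaped.replace(r'\*', r'\S*'))

rx = wildcard_to_regex("006*")
print(bool(rx.search("006123")), bool(rx.search("BKLFD006*")))
```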
I think you are mixing two things here, namely replacement and matching.
Matching is used when you want to extract part of the input string that matches a specific pattern. In your case it seems that is what you want, and in order to get one or more digits that are followed by a star and not preceded by anything then you can use the following regex:
^[0-9]+(?=\*)
and here is a Java snippet:
String subjectString = "006*";
String ResultString = null;
Pattern regex = Pattern.compile("^[0-9]+(?=\\*)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group();
}
On the other hand, replacement is used when you want to replace a re-occurring pattern from the input string with something else.
For example, if you want to replace all digits followed by a star with the same digits surrounded by parentheses then you can do it like this:
String input = "006*";
String result = input.replaceAll("^([0-9]+)\\*", "($1)");
Notice the use of $1 to reference the digits that were captured by the capture group ([0-9]+) in the regex pattern.
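Both operations look much the same in other regex flavors. As a rough Python comparison (Python writes the backreference as \1 rather than $1):

```python
import re

subject = "006*"

# Matching: extract the leading digits that are followed by a literal *
m = re.search(r'^[0-9]+(?=\*)', subject)
digits = m.group() if m else None

# Replacement: wrap the captured digits in parentheses and drop the *
wrapped = re.sub(r'^([0-9]+)\*', r'(\1)', subject)
print(digits, wrapped)
```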