Regex replace from a string (Smarty) [duplicate] - regex

I'm trying to remove everything after a specific character (a colon, ":") from a string by replacing it with an empty string ("").
Example: 2017 - Alpha Romeo United kingdom : New vehicle (by abc)
I want the output to be "2017 - Alpha Romeo United kingdom".
It would be much appreciated if anyone could help me write the regex in Smarty.
Many thanks

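You could do it using the following regex (a positive lookahead followed by a capturing group):
input >> 2017 - Alpha Romeo United kingdom : New vehicle (by abc)
regex search >> (?=:):(.*)
replace with >> " "
output >> 2017 - Alpha Romeo United kingdom
Note that the lookahead (?=:) is redundant here, because the next token matches the same colon; :.* behaves identically. Replacing with " " also leaves the space before the colon in place, so the output ends in whitespace. To get exactly the requested string, search for \s*:.* and replace with an empty string.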
Smarty:
{assign var="articleTitle" value="2017 - Alpha Romeo United kingdom : New vehicle (by abc)"}
{$articleTitle|regex_replace:"/(?=:):(.*)/":" "}
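For comparison outside Smarty, here is a minimal Python sketch of the same idea. It assumes that the whitespace before the colon should be removed as well, and the function name strip_after_colon is made up for this illustration.

```python
import re

def strip_after_colon(title: str) -> str:
    """Remove the first colon, everything after it, and any whitespace before it."""
    # \s* also consumes the space(s) in front of the colon, so nothing trails behind
    return re.sub(r"\s*:.*", "", title, flags=re.DOTALL)

title = "2017 - Alpha Romeo United kingdom : New vehicle (by abc)"
print(strip_after_colon(title))  # 2017 - Alpha Romeo United kingdom
```

A string without a colon is returned unchanged, because the pattern requires a literal ":" to match.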

private void Form1_Load(object sender, EventArgs e)
{
    string str = "2017 - Alpha Romeo United kingdom : New vehicle (by abc)";
    // Requires: using System.Text.RegularExpressions;
    str = Regex.Replace(str, @":+(.*)", "");
    MessageBox.Show(str);
}


Dynamic String Masking in Scala

Is there a simple way to do data masking in Scala? Can anyone please explain? I want to dynamically replace every match of a set of patterns with X characters, keeping the length of the matched text.
Example:
patterns to mask:
Narendra\s*Modi
Trump
JUN-\d\d
Input:
Narendra Modi pm of india 2020-JUN-03
Donald Trump president of USA
Output:
XXXXXXXX XXXX pm of india 2020-XXX-XX
Donald XXXXX president of USA
Note: Only letters and digits should be masked. I want to retain any space or hyphen inside a match in the output.
So you have an input String:
val input =
"Narendra Modi of India, 2020-JUN-03, Donald Trump of USA."
Masking off a given target with a given length is trivial.
input.replaceAllLiterally("abc", "XXX")
If you have many such targets of different lengths then it becomes more interesting.
"India|USA".r.replaceAllIn(input, "X" * _.matched.length)
//res0: String = Narendra Modi of XXXXX, 2020-JUN-03, Donald Trump of XXX.
If you have a mix of masked characters and kept characters, multiple targets can still be grouped together, but they must have the same number of sub-groups and the same pattern of masked-group to kept-group.
In this case the pattern is (mask)(keep)(mask).
raw"(Narendra)(\s+)(Modi)|(Donald)(\s+)(Trump)|(JUN)([-/])(\d+)".r
  .replaceAllIn(input, { m =>
    val List(a, b, c) = m.subgroups.flatMap(Option(_))
    "X" * a.length + b + "X" * c.length
  })
//res1: String = XXXXXXXX XXXX of India, 2020-XXX-XX, XXXXXX XXXXX of USA.
Something like that?
val pattern = Seq("Modi", "Trump", "JUN")
val str = "Narendra Modi pm of india 2020-JUN-03 Donald Trump president of USA"
def mask(pattern: Seq[String], str: String): String = {
  var s = str
  for (elem <- pattern) {
    // Replace every match of elem with as many X's as elem itself has characters
    s = s.replaceAll(elem, "X" * elem.length)
  }
  s
}
print(mask(pattern,str))
out:
Narendra XXXX pm of india 2020-XXX-03 Donald XXXXX president of USA
scala> val pattern = Seq("Narendra\\s*Modi", "Trump", "JUN-\\d\\d", "Trump", "JUN")
pattern: Seq[String] = List(Narendra\s*Modi, Trump, JUN-\d\d, Trump, JUN)
scala> print(mask(pattern,str))
XXXXXXXXXXXXXXX pm of india 2020-XXXXXXXX Donald XXXXX president of USA
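Yes, it should work; try it as shown above. Note that the mask length here is the length of the pattern string, not of the matched text, which is why the regex patterns above produce masks of the wrong length and lose the space and the hyphen.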
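Since the question asks for masks that follow the matched text and keep spaces and hyphens, one way to sketch that is a replacement callback that masks only letters and digits inside each match. The Python example below is an illustration under that assumption, not a translation of any answer above; the names PATTERNS, MASK_RE and mask are made up here.

```python
import re

# Patterns taken from the question, combined into a single alternation
PATTERNS = [r"Narendra\s*Modi", r"Trump", r"JUN-\d\d"]
MASK_RE = re.compile("|".join(f"(?:{p})" for p in PATTERNS))

def mask(text: str) -> str:
    """Replace letters and digits inside each match with X; keep spaces and hyphens."""
    def mask_match(m):
        # m.group(0) is the text that actually matched, so the length is preserved
        return re.sub(r"[A-Za-z0-9]", "X", m.group(0))
    return MASK_RE.sub(mask_match, text)

print(mask("Narendra Modi pm of india 2020-JUN-03"))  # XXXXXXXX XXXX pm of india 2020-XXX-XX
print(mask("Donald Trump president of USA"))          # Donald XXXXX president of USA
```

Because the callback works on the matched text rather than on the pattern, new patterns can be added to the list without describing which parts to keep.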
The regex and the code are explained in the inline comments:
import org.apache.spark.sql.functions._
object RegExMasking {
  def main(args: Array[String]): Unit = {
    // Constant.getSparkSess is the author's own helper (not shown); it presumably returns a SparkSession
    val spark = Constant.getSparkSess
    import spark.implicits._
    // Regex to fetch a word that is surrounded by whitespace
    val regEx: String = """(\s+[A-Za-z]+\s)"""
    // Load your DataFrame
    val df = List("Narendra Modi pm of india 2020-JUN-03",
      "Donald Trump president of USA ").toDF("sentence")
    df.withColumn("valueToReplace",
        // Fetch the first match of the regex (group 0 is the whole match)
        regexp_extract(col("sentence"), regEx, 0)
      )
      .map(row => {
        val sentence = row.getString(0)
        // Trim the surrounding whitespace
        val valueToReplace: String = row.getString(1).trim
        // Create a masked string of equal length
        val replaceWith = List.fill(valueToReplace.length)("X").mkString
        // Return (sentence, masked sentence)
        (sentence, sentence.replace(valueToReplace, replaceWith))
      }).toDF("sentence", "maskedSentence")
      .show()
  }
}

How to make Regex work to find 7 letter words that start with A [duplicate]

I'm working on a Java program that grabs words from a text document that start with "A" and are 7 letters long. I am trying to use regular expressions in Java.
Could someone give me some pointers on how to do this?
`Pattern sevenLetters = Pattern.compile("^\\w{A}{6}$");`
This does not give me what I'm aiming for, unfortunately.
What is the syntax that goes into the ()?
Thanks
Maybe
\\bA[a-z]{6}\\b
might simply work: a word boundary, a capital A, exactly six lowercase letters, and another word boundary.
Test:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {
    public static void main(String[] args) {
        final String regex = "\\bA[a-z]{6}\\b";
        final String string = "Aabcdef Aabcdefg Aabcde Aabcdef \n"
                + "Aabcdef Aabcdefg Aabcde Aabcdef ";
        final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com, where you can also see how it matches against some sample inputs.
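The same pattern can be tried quickly in Python. This is only a sketch: the sample text is made up, and it keeps the answer's assumption that the six letters after the "A" are lowercase.

```python
import re

# A capital "A" followed by exactly six lowercase letters, bounded on both sides
SEVEN_LETTER_A = re.compile(r"\bA[a-z]{6}\b")

text = "Aabcdef Aabcdefg Aabcde Another apple Already"
print(SEVEN_LETTER_A.findall(text))  # ['Aabcdef', 'Another', 'Already']
```

The word boundaries are what reject the 8-letter "Aabcdefg": after "A" plus six letters the next character is still a letter, so \b does not match there.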

How to extract people whose last name starts with "S" and whose first name does not start with "S"

As the title says, how do I capture a person whose:
Last name starts with the letter "S"
First name does NOT start with the letter "S"
The expression should match the entire last name, not just the first letter, and the first name should NOT be part of the match.
Input string is like the following:
(Last name) (First name)
Duncan, Jean
Schmidt, Paul
Sells, Simon
Martin, Jane
Smith, Peter
Stephens, Sheila
This is my regular expression:
/([S].+)(?:, [^S])/
Here is the result I have got:
Schmidt, P
Smith, P
The result includes the comma, the space, and the letter "P", which should be excluded.
The ideal match would be
Schmidt
Smith
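You can try this pattern: ^S\w+(?=, [A-RT-Z]) (with multiline mode enabled if the names are separate lines of one string).
^S\w+ matches any word (a last name in your case) that starts with S at the beginning of a line.
(?=, [A-RT-Z]) is a positive lookahead. It makes sure that what follows is a comma, a space, and a capital letter other than S, so the first name cannot start with S ([A-RT-Z] includes all capitals except S). The lookahead is not part of the match.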
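As a quick check, the pattern can be run over the sample list from the question. This is a Python sketch that assumes the names arrive as one multi-line string, hence the MULTILINE flag.

```python
import re

names = """Duncan, Jean
Schmidt, Paul
Sells, Simon
Martin, Jane
Smith, Peter
Stephens, Sheila"""

# ^S\w+          : a last name starting with S at the start of a line
# (?=, [A-RT-Z]) : must be followed by ", " and a capital other than S
pattern = re.compile(r"^S\w+(?=, [A-RT-Z])", re.MULTILINE)
last_names = pattern.findall(names)
print(last_names)  # ['Schmidt', 'Smith']
```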
I did something similar to catch the initials. I've just updated the code to fit your needs. Check it:
// Requires: using System; using System.Linq;
public static void Main(string[] args)
{
    // Your code goes here
    Console.WriteLine(ValidateName("FirstName LastName", 'L'));
}

private static string ValidateName(string name, char letter)
{
    // Split name by space
    string[] names = name.Split(new string[] {" "}, StringSplitOptions.RemoveEmptyEntries);
    if (names.Count() > 0)
    {
        var firstInitial = names.First().ToUpper().First();
        var lastInitial = names.Last().ToUpper().First();
        // Return the last name only if it starts with the letter and the first name does not
        if (!firstInitial.Equals(letter) && lastInitial.Equals(letter))
        {
            return names.Last();
        }
    }
    return string.Empty;
}
In your current regex you capture the last name in a capturing group and match the rest in a non-capturing group. A non-capturing group still consumes what it matches, so that text ends up in the overall match.
If you change your non-capturing group (?: into a positive lookahead (?=, the match contains only the last name.
([S].+)(?=, [^S]) or a bit shorter S.+(?=, [^S])
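The difference between the two constructs is easy to see on a single line of the input. This is a small Python sketch; the variable names are made up for the illustration.

```python
import re

line = "Schmidt, Paul"

# A non-capturing group still consumes the text it matches
consuming = re.search(r"S.+(?:, [^S])", line)
# A lookahead only checks what follows and consumes nothing
lookahead = re.search(r"S.+(?=, [^S])", line)

print(consuming.group(0))  # Schmidt, P
print(lookahead.group(0))  # Schmidt
```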
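Your regex worked fine for me once I added a second capturing group:
$array = ["Duncan, Jean","Schmidt, Paul","Sells, Simon","Martin, Jane","Smith, Peter","Stephens, Sheila"];
foreach ($array as $el) {
    if (preg_match('/([S].+)(?:,)( [^S].+)/', $el, $matches))
        echo $matches[2]."<br/>";
}
The answer I got is
Paul
Peter
Note that $matches[2] holds the first name; the last name that the question asks for is in $matches[1] (Schmidt and Smith).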

Regex to match all " - " delimiters in filename except first and last?

I've been trying to write a regex to match all the " - " delimiters in a filename except the first and last, so I can combine all the data in the middle into one group. For example, a filename like:
Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc
Has to become:
Ann M Martin - Baby sitters Club- Baby sitters Little Sister- Super Special 04- Karen, Hannie and Nancy - The Three Musketeers.doc
So basically I'm trying to replace " - " with "- ", but not the first or last instance. The filenames can have 1 to 6 " - " delimiters, but only the ones with 3, 4, 5 or 6 " - " delimiters should be affected.
It's for use in File Renamer. The regex flavor is JavaScript. Thanks.
Can you do it without a regex? If so:
var s = "Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc";
var p = s.split(' - ');
var r = ''; // result output
var i = 0;
p.forEach(function (e) {
    switch (i) {
        case 0: r += e; break;
        case 1: case p.length - 1: r += ' - ' + e; break;
        default: r += '- ' + e;
    }
    i++;
});
console.log(r);
http://jsfiddle.net/c7zcp8z6/1/
s=Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc
r=Ann M Martin - Baby sitters Club- Baby sitters Little Sister- Super Special 04- Karen, Hannie and Nancy - The Three Musketeers.doc
This assumes that the separator is always " - " (1 space, 1 dash, 1 space). If not, you need to split on "-" only, then trim each token before reconstructing.
Two options:
1 - Do some processing of your own by iterating through the matches of
( - )
and building a new string (see this post about getting match indices).
You'll have to check that the match count is greater than 2 and skip the first and last matches.
2 - Use
.+? - ((?:.+ - )+).+ - .+
to capture the part of the string to be modified, replace the dashes inside it, and then rebuild your string (again using the indices from the regex match). The first .+? is lazy so that the group starts right after the first delimiter; with a greedy .+ the group would capture only a single segment near the end.
Thanks for the suggestions.
I got it to work this way:
I replace the first and last " - " with " ! ", so I can then do a simple find and replace of all remaining " - " with "- ", and finally change all the " ! " back to " - ".
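The same effect can be had without a placeholder by cutting the string at the first and last delimiter and rewriting only the middle part. The question targets a JavaScript-flavored renamer, so the Python sketch below is only an illustration of the logic; the function name tighten_middle_delimiters is made up, and it assumes that names with fewer than 3 delimiters stay untouched, as the question requires.

```python
def tighten_middle_delimiters(filename: str, sep: str = " - ") -> str:
    """Turn every ' - ' except the first and last into '- '."""
    if filename.count(sep) < 3:
        return filename  # only names with 3 or more delimiters are affected
    head, _, rest = filename.partition(sep)   # cut at the first delimiter
    middle, _, tail = rest.rpartition(sep)    # cut at the last delimiter
    return head + sep + middle.replace(sep, "- ") + sep + tail

name = ("Ann M Martin - Baby sitters Club - Baby sitters Little Sister - "
        "Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc")
print(tighten_middle_delimiters(name))
```

Unlike the " ! " trick, this cannot collide with a filename that already contains the placeholder text.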

Matching the pattern with foreign character

Here I build a regular expression where _pattern is the list of team names and _name is the keyword that I want to test against _pattern.
The result shows a match. I'm wondering how that is possible, because the keyword is totally different from the _pattern. I suspect that it is related to the é symbol.
string _pattern = "Ipswich Town F.C.|Ipswich Town Football Club|Ipswich|The Blues||Town|The Tractor Boys|Ipswich Town";
string _name = "Estudiantes de Mérida";
regex = new Regex(@"(" + _pattern + @")", RegexOptions.IgnoreCase);
Match m = regex.Match(_name);
if (m.Success)
{
    var g = m.Groups[1].Value;
    break;
}
It has nothing to do with the é symbol. Let's go over a few things.
Is it intentional that your pattern contains two consecutive | characters?
The Blues||Town
That creates an empty alternative, which matches the empty string at any position, so the regex succeeds on every input, including "Estudiantes de Mérida".
Also, the dot has a special meaning in a regex, so you should escape it:
Ipswich Town F\.C\.
Each alternative can be enclosed in parentheses. This is optional, but it makes the groups explicit:
(Ipswich Town F\.C\.)|(Ipswich Town Football Club)|(Ipswich)|
The outer parentheses in the following C# line are only necessary because you read Groups[1] instead of the whole match:
regex = new Regex(@"(" + _pattern + @")"
Anyway, the reason it matches is that the regex is not what you intended, not the é. (The code is C#/.NET rather than Java, but the behavior is the same.)
The regex that I would write for your purposes is:
^((Ipswich Town F\.C\.)|(Ipswich Town Football Club)|(Ipswich)|(The Blues)|(Town)|(The Tractor Boys)|(Ipswich Town))$
As you can see, there are quite a few differences.
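The effect of the empty alternative can be reproduced in a few lines. The sketch below uses Python rather than C#, with the team names from the question; filtering out empty entries and escaping each name before joining are suggestions of this example, not something the original code does.

```python
import re

team_names = ["Ipswich Town F.C.", "Ipswich Town Football Club", "Ipswich",
              "The Blues", "", "Town", "The Tractor Boys", "Ipswich Town"]

# Joining everything reproduces the "||" from the question: an empty alternative
naive = re.compile("(" + "|".join(team_names) + ")", re.IGNORECASE)
m = naive.search("Estudiantes de Mérida")
print(repr(m.group(1)))  # '' because the empty alternative matches at position 0

# Drop empty entries and escape metacharacters such as "." before joining
safe = re.compile("(" + "|".join(re.escape(n) for n in team_names if n) + ")",
                  re.IGNORECASE)
print(safe.search("Estudiantes de Mérida"))        # None
print(safe.search("Ipswich Town F.C.").group(1))   # Ipswich Town F.C.
```

Checking that the matched value is non-empty would also catch this case, but removing the empty alternative fixes it at the source.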