I am new to regular expressions. What regex would find every "DataKeyNames" whose value contains a ","? I am trying to find all such occurrences across various files, for example:
DataKeyNames="AppRevObjId, AsmtIdsOnHold"
I'm not sure exactly what you want to find, but:
1) If you want only 'AppRevObjId', use this:
import re
s = 'AppRevObjId, AsmtIdsOnHold'
re.findall(r'([^\s]+),', s)
Output:
['AppRevObjId']
2) If you want only 'AsmtIdsOnHold':
re.findall(r',\s([^\s]+)', s)  # or: re.findall(r', ([^\s]+)', s)
3) If you want both:
re.findall(r'([^\s]+), ([^\s]+)', s)  # or: re.findall(r'([^\s]+),\s([^\s]+)', s)
Output:
[('AppRevObjId', 'AsmtIdsOnHold')]
If this is not what you need, please explain in more detail or point out any problems so the code can be fixed.
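For the question as literally asked (finding every DataKeyNames attribute whose value contains a comma), a minimal sketch might look like the following. It assumes the value is always wrapped in double quotes, as in the sample line; the surrounding markup in `sample` and the name `find_multi_key_names` are made up for illustration.

```python
import re

# Matches DataKeyNames="..." only when the quoted value contains at least one comma.
# Assumption: the value is enclosed in double quotes and contains no escaped quotes.
PATTERN = re.compile(r'DataKeyNames="([^",]*,[^"]*)"')

def find_multi_key_names(text):
    """Return the quoted value of every DataKeyNames attribute that contains a comma."""
    return PATTERN.findall(text)

# Hypothetical file content, for illustration only.
sample = '''
<asp:GridView DataKeyNames="AppRevObjId, AsmtIdsOnHold" />
<asp:GridView DataKeyNames="SingleKey" />
'''
print(find_multi_key_names(sample))  # ['AppRevObjId, AsmtIdsOnHold']
```

To search several files, the same function could be applied to the text of each file in turn.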
Hi regex experts,
I have an ArrayList al1 like this (one item per line):
al1 : L1_C1_0, L1_C2_"11229", L1_C2_"CHK_CASHING"_OK, etc... L1_C100_"FR45248624892", L2_C1_0, L2_C2_"11229", L2_C2_"CHK_CASHING"_OK etc... L2_C100_"FR45248624892"_KO, L3_C1_0, L3_C2_"11229", L3_C2_"CHK_CASHING"_OK etc... L3_C100_"FR45248624892"_KO, L4_C1_0, L3_C2_"11229", L4_C2_"CHK_CASHING"_OK etc... L4_C100_"FR45248624892"_OK
I wrote this regex, but it doesn't work the way I want:
String spattern = "(L(([1-9]?[0-9])|100)_C\\d_\\W.*?L\\2_C\\d{3}_\".*?\"(?:,?\\$?))";
I want to display the list like this:
L1_C1_0, L1_C2_"11229", L1_C2_"CHK_CASHING"_OK, etc...L1_C100_"FR45248624892"
L2_C1_0, L2_C2_"11229", L2_C2_"CHK_CASHING"_OK etc...L2_C100_"FR45248624892"_KO
L3_C1_0, L3_C2_"11229", L3_C2_"CHK_CASHING"_OK etc...L3_C100_"FR45248624892"_KO
L4_C1_0, L3_C2_"11229", L4_C2_"CHK_CASHING"_OK etc...L4_C100_"FR45248624892"_OK
L5_C1_1 etc...
Can someone help me display this?
Thank you very much for your help.
I have this code for extracting the repeated ':'-separated sections matched by a regex, but it does not give me the right output.
val pattern = """([a-zA-Z]+)(:([a-zA-Z]+))*""".r
for (p <- pattern findAllIn "it:is:very:great just:because:it is") p match {
case pattern("it", pattern(is, pattern(very, great))) => println("it: "+ is + very+ great)
case pattern(it, _,rest) => println( it+" : "+ rest)
case pattern(it, is, very, great) => println(it +" : "+ is +" : "+ very +" : " + great)
case _ => println("match failure")
}
What am I doing wrong?
How can I write a case expression which allows me to extract each : separated part of the pattern regex?
What is the right syntax with which to solve this?
How can I match against an unknown number of groups to be extracted from a regex?
In this case print:
it : is : very : great
just : because : it
is
You can't use a repeated capturing group like that: it only keeps the last captured value as the group's value.
You can still get the matches you need with the regex \b[a-zA-Z]+(?::[a-zA-Z]+)*\b and then split each match on ':':
val text = "it:is:very:great just:because:it is"
val regex = """\b[a-zA-Z]+(?::[a-zA-Z]+)*\b""".r
val results = regex.findAllIn(text).map(_ split ':').toList
results.foreach { x => println(x.mkString(", ")) }
// => it, is, very, great
// just, because, it
// is
Regex details:
\b - word boundary
[a-zA-Z]+ - one or more ASCII letters
(?::[a-zA-Z]+)* - zero or more repetitions of
: - a colon
[a-zA-Z]+ - one or more ASCII letters
\b - word boundary
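The "last value only" behaviour of a repeated capturing group is not specific to Scala. As a rough illustration, here is the same experiment in Python's `re` module, followed by the match-then-split approach from the answer. The variable names are arbitrary.

```python
import re

text = "it:is:very:great just:because:it is"

# A repeated capturing group keeps only its last iteration.
m = re.match(r"([a-zA-Z]+)(:([a-zA-Z]+))*", text)
print(m.groups())  # ('it', ':great', 'great')

# Match each colon-separated run as a whole, then split it on ':'.
runs = re.findall(r"\b[a-zA-Z]+(?::[a-zA-Z]+)*\b", text)
parts = [run.split(":") for run in runs]
for p in parts:
    print(" : ".join(p))
# it : is : very : great
# just : because : it
# is
```

Because the pattern contains only a non-capturing group, `re.findall` returns the full text of each match, which is what makes the later split possible.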
I want to split the following string around +, but I couldn't succeed in getting the correct regex for this.
String input = "SOP3a'+bEOP3'+SOP3b'+aEOP3'";
I want to have a result like this
[SOP3a'+bEOP3', SOP3b'+aEOP3']
In some cases I may have the following string
c+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2
which should be split as
[c, SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2]
I have tried the following regex but it doesn't work.
input.split("(SOP[0-9](.*)EOP[0-9])*\\+((SOP)[0-9](.*)(EOP)[0-9])*");
Any help or suggestions are appreciated.
Thanks
You can use the following regex to match the string; its two captured groups then give you the expected result:
(?m)(.*?)\+(SOP.*?$)
Following is the code in Java that would work for you:
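import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; any name works.
public class SplitDemo {
    public static void main(String[] args) {
        String input = "SOP3a'+bEOP3'+SOP3b'+aEOP3'";
        // Group 1: text before the '+' that is followed by "SOP"; group 2: the rest of the line.
        String pattern = "(?m)(.*?)\\+(SOP.*?$)";
        Pattern regex = Pattern.compile(pattern);
        Matcher m = regex.matcher(input);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0)); // whole match
            System.out.println("Found value: " + m.group(1)); // left part
            System.out.println("Found value: " + m.group(2)); // right part
        } else {
            System.out.println("NO MATCH");
        }
    }
}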
m.group(1) and m.group(2) hold the values you are looking for.
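As a quick cross-check of that regex against both inputs from the question, here is a small Python sketch. The helper name `split_before_sop` is hypothetical, and the function only handles the single split point shown in the question, not arbitrary nesting.

```python
import re

# Same pattern as above: lazily capture text up to the first '+' that is
# immediately followed by "SOP", then capture the rest of the line.
PATTERN = re.compile(r"(?m)(.*?)\+(SOP.*?$)")

def split_before_sop(s):
    """Return [left, right] around the first '+' followed by 'SOP', or [s] if none."""
    m = PATTERN.search(s)
    return [m.group(1), m.group(2)] if m else [s]

print(split_before_sop("SOP3a'+bEOP3'+SOP3b'+aEOP3'"))
# ["SOP3a'+bEOP3'", "SOP3b'+aEOP3'"]
print(split_before_sop("c+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2"))
# ['c', "SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2"]
```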
Do you really need to use the split method?
And what are the rules? They are unclear to me.
Anyway, starting from the regex you provided, I only removed some unnecessary groups, and it finds what you are looking for. Instead of splitting, though, I collect the two matched groups, because splitting would generate some empty elements.
const str = "SOP1a+bEOP1+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2";
const regex = RegExp(/(SOP[0-9].*EOP[0-9])*\+(SOP[0-9].*EOP[0-9])*/)
const matches = str.match(regex);
console.log('Matches ', matches);
console.log([matches[1],matches[2]]);
While filtering and cleaning text in Hebrew, I found that
gsub("[[:punct:]]", "", txt)
actually removes a relevant character. The character is "ק" and it is located in the "E" spot on the keyboard. Interestingly, the gsub function in R removes the "ק" character and then all words get messed up. Does anyone have an idea why?
According to Regular Expressions as used in R:
Certain named classes of characters are predefined. Their
interpretation depends on the locale (see locales); the interpretation
below is that of the POSIX locale.
According to the POSIX locale, [[:punct:]] should match ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~. So you might need to adjust your regex to remove only the characters you want:
txt <- "!\"#$%&'()*+,\\-./:;<=>?@[\\\\^\\]_`{|}~"
gsub("[\\\\!\"#$%&'()*+,./:;<=>?@[\\^\\]_`{|}~-]", "", txt, perl = TRUE)
Sample program output:
[1] ""
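The same idea (spell out the ASCII punctuation explicitly instead of relying on a locale-dependent class) can be sketched in Python. Python's `re` module has no [[:punct:]] class, so this sketch builds the character class from `string.punctuation`, which is a fixed ASCII set. The Hebrew sample text and the name `strip_ascii_punct` are made up for illustration.

```python
import re
import string

# string.punctuation is the fixed ASCII set !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
# re.escape makes every character safe to place inside a character class.
ASCII_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")

def strip_ascii_punct(text):
    """Remove ASCII punctuation only; letters such as Hebrew 'ק' are left alone."""
    return ASCII_PUNCT.sub("", text)

print(strip_ascii_punct("קפה, בבקשה!"))
```

Since the class lists only ASCII characters, no Hebrew letter can be matched by it, whatever the locale.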
I have a bunch of strings with punctuation in them that I'd like to convert to spaces:
"This is a string. In addition, this is a string (with one more)."
would become:
"This is a string In addition this is a string with one more "
I can go through and do this manually with the stringr package (str_replace_all()), one punctuation symbol at a time (, / . / ! / ( / ) / etc.), but I'm curious whether there's a faster way, presumably using a regex.
Any suggestions?
x <- "This is a string. In addition, this is a string (with one more)."
gsub("[[:punct:]]", " ", x)
[1] "This is a string In addition this is a string with one more "
See ?gsub for doing quick substitutions like this, and ?regex for details on the [[:punct:]] class, i.e.
‘[:punct:]’ Punctuation characters:
‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~’.
Have a look at ?regex:
library(stringr)
str_replace_all(x, '[[:punct:]]',' ')
"This is a string In addition this is a string with one more "
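For comparison, here is a rough Python equivalent of the one-pass replacement, using a translation table instead of a regex. It assumes only ASCII punctuation needs handling. Note that each punctuation mark becomes its own space, so runs of spaces appear; the last step collapses them, which is optional and also drops the trailing space.

```python
import string

x = "This is a string. In addition, this is a string (with one more)."

# Map every ASCII punctuation character to a single space.
TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))
spaced = x.translate(TO_SPACE)

# Optional: collapse runs of whitespace left behind by the replacement.
collapsed = " ".join(spaced.split())

print(repr(spaced))
print(repr(collapsed))
```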