How can I get the contents between two delimiters using a regular expression? For example, I want to get the text between two | characters. For this input:
| This is what I want |
it should return this:
This is what I want
I already tried /^|(.*)$|/, but that returns This is what I want | instead of just This is what I want (shouldn't have the | at the end)
Try escaping the pipes. Unescaped, | is the alternation operator, so it has to be written as \| to match a literal pipe: /\|(.*?)\|/
For example, using JavaScript:
var s = '| This is what I want |';
var m = s.match(/\|\s*(.*?)\s*\|/);
m[1]; // => "This is what I want"
See example here:
https://regex101.com/r/qK6aG2/2
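To see why the unescaped attempt misbehaves in JavaScript, and how the escaped version extends to several delimited fields, here is a minimal sketch. The helper name betweenPipes and the second sample string are made up for illustration, and it assumes an engine that supports String.prototype.matchAll (ES2020).

```javascript
// Hypothetical helper: returns every trimmed chunk found between a pair of pipes.
function betweenPipes(text) {
  // \| is a literal pipe; (.*?) lazily captures up to the next pipe.
  const re = /\|\s*(.*?)\s*\|/g;
  return Array.from(text.matchAll(re), (m) => m[1]);
}

// Unescaped, | means "or", so /^|(.*)$|/ is three alternatives. The first one (^)
// matches the empty string at position 0, and the group never participates.
const unescaped = '| This is what I want |'.match(/^|(.*)$|/);

const single = betweenPipes('| This is what I want |');
const several = betweenPipes('|a| and |b c|');
```

Each match consumes its closing pipe, so in a string like |a|b| only a would be returned; if the fields share delimiters, splitting on \| is probably the simpler route.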
I need to parse an input string that has the format of
AB~11111, AB~22222, AB~33333, AB~44444
into separate strings:
AB~11111, AB~22222, AB~33333, and AB~44444
Here is my attempted Regex:
range = "([^~,\n]+~[^,]+,)?";
non_delimiter = "[^,\n;]+";
range_regex = new RegExp(this.range + this.non_delimiter, 'g');
But somehow this regex would only parse the input string into
AB~11111, AB~22222 and AB~33333, AB~44444
instead of parsing the input string into individual strings.
Maybe this is missing the boat, but from your input what about something like:
AB~\d+
This should match each of the strings from the above: https://regex101.com/r/vVFDIG/1. And if there's variation (i.e., it can be other letters) then maybe something like:
[A-Z]{2}~\d+
Or whatever it needs to be, but using negated character classes seems like quite a roundabout way of doing it. If you do want that approach, you could just do:
[^ ,]+
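As a rough sketch of how these patterns would be applied in JavaScript, String.prototype.match with the g flag returns every occurrence as an array. The variable names here are only illustrative.

```javascript
const input = 'AB~11111, AB~22222, AB~33333, AB~44444';

// Two uppercase letters, a tilde, then one or more digits; g finds every occurrence.
const specific = input.match(/[A-Z]{2}~\d+/g);

// Looser variant: any run of characters that is neither a space nor a comma.
const loose = input.match(/[^ ,]+/g);
```

For this input both calls produce the same four items; the stricter pattern simply skips anything that does not look like an AB~digits term.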
You should use a regex split here on ,\s*:
var input = "AB~11111, AB~22222, AB~33333, AB~44444";
var parts = input.split(/,\s*/);
console.log(parts);
If you also need to check that the input consists of a comma-separated list of terms like AB~11111, you can use test to assert that:
var input = "AB~11111, AB~22222, AB~33333, AB~44444";
console.log(/^[A-Z]{2}~\d{5}(?:,\s*[A-Z]{2}~\d{5})*$/.test(input));
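The two steps can be combined into one function that validates first and splits only when the input is well formed. This is a sketch under the assumption that every term is exactly two uppercase letters, a tilde, and five digits; parseTerms is a hypothetical name.

```javascript
// Hypothetical helper: returns the terms, or null when the input is not a
// well-formed list of "two letters, tilde, five digits" items.
function parseTerms(input) {
  const wellFormed = /^[A-Z]{2}~\d{5}(?:,\s*[A-Z]{2}~\d{5})*$/;
  return wellFormed.test(input) ? input.split(/,\s*/) : null;
}

const ok = parseTerms('AB~11111, AB~22222,AB~33333');
const bad = parseTerms('AB~11111; AB~22222');
```

Returning null rather than throwing is just one possible design; it lets the caller decide how to handle malformed input.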
My data set after a lot of programmatic clean up looks like this (showing partial data set here).
ABCD A M#L 79
BGDA F D#L 89
I'd like to convert this into the following for further Spark Dataframe operations
ABCD,A,M#L,79
BGDA,F,D#L,89
val reg = """/\s{2,}/"""
val cleanedRDD2 = cleanedRDD1.filter(x=> !reg.pattern.matcher(x).matches())
But this returns nothing. How do I find the whitespace and replace it with a delimiter?
Thanks!
rt
It seems you just want to replace all the non-vertical whitespaces in your string data. I suggest using replaceAll (to replace all the occurrences of the texts that match the pattern) with [\t\p{Zs}]+ regex.
Here is just a sample code:
val s = "ABCD A M#L 79\nBGDA F D#L 89"
val reg = """[\t\p{Zs}]+"""
val cleanedRDD2 = s.replaceAll(reg, ",")
print(cleanedRDD2)
// => ABCD,A,M#L,79
// BGDA,F,D#L,89
The [\t\p{Zs}]+ pattern matches one or more occurrences of a tab (\t) or any Unicode whitespace character from the Space Separator category.
To modify the contents of the RDD, just use .map:
val newRDD = yourRDD.map(elt => elt.replaceAll("""[\t\p{Zs}]+""", ","))
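For comparison outside Spark, the same character class can be tried in JavaScript, where \p{Zs} is only recognized when the regex carries the u flag. This is a small sketch on a made-up sample string, not part of the Scala solution.

```javascript
const rows = 'ABCD  A\tM#L   79\nBGDA F  D#L 89';

// \p{Zs} needs the u flag; g replaces every run, not just the first.
// The newline is neither a tab nor a Space Separator, so line breaks survive.
const csv = rows.replace(/[\t\p{Zs}]+/gu, ',');
```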
If you want to do this directly on an RDD in PySpark:
import re
rdd_clean = rdd.map(lambda line: re.sub(r"[ \t]+", ",", line))
I have urls with following formats ...
/category1/1rwr23/item
/category2/3werwe4/item
/category3/123wewe23/item
/category4/132werw3/item
/category5/12werw33/item
I would like to replace the category numbers with {id} for further processing:
/category1/{id}/item
How do I replace the category numbers with {id}? I have spent the last 4 hours on this without reaching a proper conclusion.
Assuming you'll be running the regex in JavaScript, your regex will be:
/^(\/.*?\/)([^/]+)/gm
and the replacement string should look like $1whatever:
var str = "your url strings ..."
var replStr = 'replacement';
var re = /^(\/.*?\/)([^/]+)/gm;
var result = str.replace(re, '$1'+replStr);
console.log(result);
Based on your input, it should print:
/category1/replacement/item
/category2/replacement/item
/category3/replacement/item
/category4/replacement/item
/category5/replacement/item
We divide it into 3 groups:
1. the part before the replacement
2. the part to be replaced
3. the part after the replacement
yourString.replace(/([^/]*\/[^/]+\/)([^/]+)(\/[^/]+)/g, '$1' + replacement + '$3');
Here is the demo: https://jsfiddle.net/9sL1qj87/
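Putting the pieces together for the {id} case from the question, a minimal sketch might look like the following. The function name maskSecondSegment is hypothetical, and it assumes one URL per line with the ID always in the second path segment.

```javascript
// Hypothetical helper: swaps the second path segment of each line for a placeholder.
function maskSecondSegment(urls, placeholder) {
  // ^ with the m flag anchors at each line start; [^/\n]+ keeps a segment on one line.
  // A replacer function avoids any special handling of "$" in the placeholder.
  return urls.replace(/^(\/[^/\n]+\/)[^/\n]+/gm, (_, prefix) => prefix + placeholder);
}

const masked = maskSecondSegment('/category1/1rwr23/item\n/category2/3werwe4/item', '{id}');
```

Using a replacer function instead of a '$1...' string is a design choice here: it keeps the placeholder literal even if it ever contains a dollar sign.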
The input string contains multiple key[some value] tokens, each of which needs to be replaced with key[some value],val[the same value].
Input string:
...key[102]...key[108]... key[211]...
Output string:
... key[102],val[102]...key[108],val[108]...key[211],val[211]...
Basically, I need to replace every key that has a value inside square brackets with key[value],val[same value].
E.g. key[102] → key[102],val[102], and key[108] → key[108],val[108].
You need to use capturing groups (http://www.regular-expressions.info/brackets.html):
key\[(.*?)\]
Example Java code (I couldn't test it):
String str = "...key[102]...key[108]... key[211]...";
System.out.println(str.replaceAll("key\\[(.*?)\\]", "key[$1],val[$1]"));
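The same capturing-group idea can be sketched in JavaScript, where the g flag plays the role of replaceAll. Here the lazy (.*?) is swapped for the negated class [^\]]*, which is an alternative way of stopping at the closing bracket rather than the pattern given above; the variable names are illustrative.

```javascript
const source = '...key[102]...key[108]... key[211]...';

// [^\]]* captures everything up to the closing bracket; $1 reinserts it twice.
const expanded = source.replace(/key\[([^\]]*)\]/g, 'key[$1],val[$1]');
```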
I have the following tag from an XML file:
<msg><![CDATA[Method=GET URL=http://test.de:80/cn?OP=gtm&Reset=1(Clat=[400441379], Clon=[-1335259914], Decoding_Feat=[], Dlat=[0], Dlon=[0], Accept-Encoding=gzip, Accept=*/*) Result(Content-Encoding=[gzip], Content-Length=[7363], ntCoent-Length=[15783], Content-Type=[text/xml; charset=utf-8]) Status=200 Times=TISP:270/CSI:-/Me:1/Total:271]]>
Now I try to get from this message: Clon, Dlat, Dlon and Clat.
However, I already created the following regex:
(?<=Clat=)[\[\(\d+\)\n\n][^)n]+]
The problem is that I would like to get only the numbers, without the brackets. I tried some other expressions.
Do you know how I can extend this expression to get only the values without the brackets?
Thank you very much in advance.
Best regards
The regex
(clon|dlat|dlon|clat)=\[(-?\d+)\]
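captures the field name in group 1 and the number, without the square brackets, in group 2. It has to be matched case-insensitively, since the names in the message are capitalized.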
As I stated before, if you use this regex to extract the information out of this CDATA element, that's okay. But you really want to get to the contents of that element using an XML parser.
Example usage
// requires: using System.Text.RegularExpressions;
string pattern = @"(clon|dlat|dlon|clat)=\[(-?\d+)\]";
string input = ".. here's your cdata content .. ";
foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
{
var name = match.Groups[1].Value; // will contain "Clon", "Dlat", "Dlon" or "Clat", as written in the input
var inner_value = match.Groups[2].Value; // will contain the value inside the square brackets, e.g. "400441379"
//Do something with the matches
}
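As a quick way to check the pattern against the message from the question, here is a hedged JavaScript sketch that collects the four values into an object. The msg string is a shortened copy of the CDATA text, coords is an illustrative name, and matchAll (ES2020) is assumed to be available.

```javascript
// Shortened copy of the CDATA text from the question.
const msg = 'URL=http://test.de:80/cn?OP=gtm&Reset=1(Clat=[400441379], ' +
  'Clon=[-1335259914], Decoding_Feat=[], Dlat=[0], Dlon=[0])';

const coords = {};
// i makes "clat" match "Clat"; group 1 is the name, group 2 the number without brackets.
for (const m of msg.matchAll(/(clon|dlat|dlon|clat)=\[(-?\d+)\]/gi)) {
  coords[m[1]] = Number(m[2]);
}
```

Decoding_Feat=[] is skipped because the pattern requires at least one digit between the brackets.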