Substring in DataWeave up to the occurrence of a character - regex

In DataWeave, how would I substring an input value such that the output is everything up to the occurrence of a character? My input value is something like ABCD_123 and I need to take everything up to the underscore, so my output would be ABCD. The regex that achieves this is /[^_]*/, but I can't find a way to implement this using DataWeave. Any help would be appreciated!

Based on @WiktorStribiżew's comment, the way I solved this was by declaring a function:
%function split(text) text splitBy "_"
And then in my DW mapping, I take the value as:
OUTPUT: split(payload.INPUT)[0]
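For readers who want to check the logic outside DataWeave, here is a minimal Python sketch of the same two ideas: splitting on the delimiter and keeping the first piece, and applying the [^_]* pattern from the question. The function names are hypothetical and are not part of any DataWeave API.

```python
import re


def up_to_delimiter(text: str, delimiter: str = "_") -> str:
    """Return everything before the first delimiter (the whole text if absent)."""
    # Split at most once and keep the first piece, like split(...)[0] above.
    return text.split(delimiter, 1)[0]


def up_to_underscore_regex(text: str) -> str:
    """Same result using the [^_]* pattern from the question."""
    # [^_]* always matches at position 0, possibly with an empty string.
    return re.match(r"[^_]*", text).group(0)
```

Both return "ABCD" for "ABCD_123". If the input has no underscore, both return the input unchanged.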

Related

Splitting name/value pairs with regex to ignore special characters based on surrounding characters

I have this regex that's worked well so far that splits 'name=value' pairs separated by a given character.
(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|\Z)
I know the separator, but the problem is in the example below (tab separated):
usrName=Wilma sev=4 cat=Detection CommandLine="C:\powershell.exe" -Enc 0ATQBpAG0AAcABDAHIAZQBkAHMAIgA= IOCValue= ProcessEndTime=2023-01-18 15:51:05
https://regex101.com/r/1wgVxs/5
Some keys can have an empty value, as with 'IOCValue', and that works as expected. However, for values like CommandLine I get everything up to -Enc as one match and the remainder, up to the next pair, as another.
What I'm hoping to get out from the above is:
usrName=Wilma
sev=4
cat=Detection
CommandLine="C:\powershell.exe" -Enc 0ATQBpAG0AAcABDAHIAZQBkAHMAIgA=
IOCValue=
ProcessEndTime=2023-01-18 15:51:05
But I'm getting:
usrName=Wilma
sev=4
cat=Detection
CommandLine="C:\powershell.exe" -Enc
0ATQBpAG0AAcABDAHIAZQBkAHMAIgA=
IOCValue=
ProcessEndTime=2023-01-18 15:51:05
Given that I know the separator is a tab, I think I need to look for name=value pairs only when they are at the start of the line or preceded by the separator (tab). Is this possible?
Note: I can expect a space separator too, but I have a less performant and messy non-regex version I can send those to, so presume tab.
You may use this simplified regex:
(?s)([^\s=]+)=(.*?)(?=\t|\Z)
Updated RegEx Demo
Here, the lookahead (?=\t|\Z) ensures that the value part is followed by either a tab character or the end of the input.
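As a rough check of the simplified pattern, the sketch below runs it through Python's re module. It assumes the input really is tab-separated, as stated in the question. The helper name parse_pairs is hypothetical.

```python
import re

# Simplified pattern from the answer: a value ends at the next tab or at end of input.
PAIR_RE = re.compile(r"(?s)([^\s=]+)=(.*?)(?=\t|\Z)")


def parse_pairs(line: str) -> list[tuple[str, str]]:
    """Return (name, value) tuples from a tab-separated name=value line."""
    return PAIR_RE.findall(line)


sample = "\t".join([
    "usrName=Wilma",
    "sev=4",
    'CommandLine="C:\\powershell.exe" -Enc 0ATQBpAG0AAcABDAHIAZQBkAHMAIgA=',
    "IOCValue=",
    "ProcessEndTime=2023-01-18 15:51:05",
])

for name, value in parse_pairs(sample):
    print(f"{name}={value}")
```

Because the value is only terminated by a tab, the spaces and the trailing = inside the CommandLine value stay within a single match.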

Tcl - How to Add Text after last character through regex?

I need a tip or suggestion, followed by an example, of how I can add a .txt extension after the last character of a variable's output line.
For example:
set txt " ONLINE ENGLISH COURSE - LESSON 5 "
set result [concat "$txt" .txt]
Print:
Note that there is a space at the start, in the middle and at the end of the variable's phrase (txt). The spaces at the start and in the middle must be kept, but the last space after the end of the sentence should be replaced with the extension [.txt].
With the built-in concat method of Tcl, it does not achieve the desired effect.
The expected result was something like this:
ONLINE ENGLISH COURSE - LESSON 5.txt
I know I could remove spaces with string map but I don't know how to remove just the last occurrence on the line.
And otherwise I don’t know how to remove the last space to add the text [.txt]
If anyone can point me to one or more solutions, thank you in advance.
set result "[string trimright $txt].txt"
or
set result [regsub {\s*$} $txt ".txt"]
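The same two approaches can be sketched in Python for comparison; the function names are hypothetical. One detail worth knowing: Tcl's regsub without -all replaces only the first match, so the Python regex version passes count=1 to get the same behaviour. Without it, the empty match at the very end of the string would be replaced as well.

```python
import re


def append_ext_trim(txt: str, ext: str = ".txt") -> str:
    """Drop trailing whitespace only, then append the extension."""
    return txt.rstrip() + ext


def append_ext_regex(txt: str, ext: str = ".txt") -> str:
    """Replace the run of trailing whitespace (possibly empty) with the extension."""
    # count=1 mirrors regsub without -all: only the first match is replaced.
    return re.sub(r"\s*$", ext, txt, count=1)
```

Both keep the leading and inner spaces, so " ONLINE ENGLISH COURSE - LESSON 5 " becomes " ONLINE ENGLISH COURSE - LESSON 5.txt".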

RegEx for matching a string after a string up to a comma

Here is a sample string.
"BLAH, blah, going to the store &^5, light Version 12.7(2)L6, anyway
plus other stuff Version 3.3.4.6. Then goes on an on for several lines..."
I want to capture only the first version number, without the word "Version" if possible and without the periods and parentheses. The match should stop when it encounters a comma. The result would be:
"1272L6"
I don't want it to include other instances of version in the text. Can this be done?
I've tried (?<=version)[^,]* but I know it does not remove the periods and parentheses and does not exclude the subsequent versions.
This exact regex may not be the best solution, but it might help you get 1272L6:
([0-9]{2})\.([0-9]{1})\(([0-9]{1})\)([A-Z]{1}[0-9]{1})
It creates four groups (where $1$2$3$4 is your target 1272L6) and skips over ., ( and ).
You might change {1} to other numbers of repetitions, such as {1,2}.
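Assuming the version number has a fixed format but the specific digits and letters vary, you could do this.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "this is a test 12.7(2)L6, 13.7(2)L6, 14.7(2)L6";
// Capture a version such as 12.7(2)L6 that is immediately followed by a comma.
String reg = "(\\d\\d\\.\\d\\(\\d\\)[A-Z]\\d),";
Matcher m = Pattern.compile(reg).matcher(s);
if (m.find()) { // find() is called once, so only the first match is used
    // Strip the periods and parentheses from the captured version.
    System.out.println(m.group(1).replaceAll("[.()]", ""));
}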
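If the version format is not fixed, a looser variant is to anchor on the word "Version", capture up to the next comma, and strip the punctuation afterwards. The Python sketch below assumes the word "Version" is always followed by whitespace and that the version number ends at a comma, as in the sample. The function name first_version is hypothetical.

```python
import re
from typing import Optional


def first_version(text: str) -> Optional[str]:
    """Return the first version number after 'Version', without '.', '(' or ')'."""
    # re.search stops at the first occurrence, so later versions are ignored.
    match = re.search(r"Version\s+([^,]+)", text)
    if match is None:
        return None
    return re.sub(r"[.()]", "", match.group(1))


sample = ("BLAH, blah, going to the store &^5, light Version 12.7(2)L6, anyway\n"
          "plus other stuff Version 3.3.4.6. Then goes on an on for several lines...")
print(first_version(sample))
```

This does in two steps what the lookbehind attempt could not do in one: the capture handles "up to a comma", and the second substitution removes the periods and parentheses.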

How can I replace multiple words "globally" using regexp_replace in Oracle?

I need to replace multiple words such as (dog|cat|bird) with nothing in a string where there may be multiple consecutive occurrences of a word. The actual code is to remove salutations and suffixes from a name. Unfortunately the garbage data I get sometimes contains "SNERD JR JR."
I was able to create a regular expression pattern that accomplishes my goal but only for the first occurrence. I implemented a stupid hack to get rid of the second occurrence, but I believe there has to be a better way. I just can't figure it out.
Here is my "hacked" code:
FUNCTION REMOVE_SALUTATIONS(IN_STRING VARCHAR2) RETURN VARCHAR2 DETERMINISTIC
AS
REGEX_SALUTATIONS VARCHAR2(4000) := '(^|\s)(MR|MS|MISS|MRS|DR|MD|M D|SR|SIR|PHD|P H D|II|III|IV|JR)(\.?)(\s|$)';
BEGIN
RETURN TRIM(REGEXP_REPLACE(REGEXP_REPLACE(IN_STRING,REGEX_SALUTATIONS,' '),REGEX_SALUTATIONS,''));
END REMOVE_SALUTATIONS;
I was actually proud that I was able to get this far, as regular expressions are not very regular to me. All help is appreciated.
EDIT:
As I understand it, regexp_replace does a global replace by default. But on the off chance that my DB is configured differently, I did try:
select REGEXP_REPLACE('SNERD JR JR','(^|\s)(MR|MS|MISS|MRS|DR|MD|M D|SR|SIR|PHD|P H D|II|III|IV|JR)(\.?)(\s|$)',' ',1,0) from dual;
and the results are;
SNERD JR
Use the occurrence parameter of the REGEXP_REPLACE function. The docs say:
occurrence is a nonnegative integer indicating the occurrence of the replace operation:
If you specify 0, then Oracle replaces all occurrences of the match.
If you specify a positive integer n, then Oracle replaces the nth occurrence.
https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions137.htm#SQLRF06302
It should look like:
...
REGEXP_REPLACE(IN_STRING,REGEX_SALUTATIONS,' ', 1,0 )
...
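Note that the EDIT in the question reports 'SNERD JR' even with occurrence set to 0. One plausible explanation is that the pattern consumes the trailing whitespace of the first match, so the second JR no longer has a leading (^|\s) available to match. The Python sketch below reproduces that effect and shows one way around it, using a lookahead for the trailing delimiter. This illustrates the mechanism only: to my knowledge Oracle's regular expressions do not support lookahead, so it is not a drop-in fix for the PL/SQL function, and the function names are hypothetical.

```python
import re

WORDS = r"MR|MS|MISS|MRS|DR|MD|M D|SR|SIR|PHD|P H D|II|III|IV|JR"

# Pattern as in the question: the trailing (\s|$) is consumed by each match.
CONSUMING = re.compile(rf"(^|\s)({WORDS})(\.?)(\s|$)")

# Variant: the trailing delimiter is only checked, not consumed.
LOOKAHEAD = re.compile(rf"(^|\s)({WORDS})\.?(?=\s|$)")


def remove_consuming(name: str) -> str:
    """Single global pass with the original pattern."""
    return CONSUMING.sub(" ", name).strip()


def remove_lookahead(name: str) -> str:
    """Single global pass that leaves the delimiter for the next match."""
    return LOOKAHEAD.sub(" ", name).strip()
```

With "SNERD JR JR", the first function leaves "SNERD JR" (matching the question's output), while the second returns "SNERD".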

Removing ending alpha characters from string in XSLT

I have a requirement related to XSLT.
I want to remove the trailing alphabetic characters from my final output string.
Here is an example:
Input string: 0123467AAA
Output: 0123467
i.e. no trailing letters.
I am new to XSLT, so any suggestion would be very helpful.
Thank you all in advance.
With XSLT 1.0 your only real option for this is to write a recursive template. Write a named template that takes the string as a parameter. Test whether the last character is a letter. (You can find the last character by using substring($s, string-length($s), 1), since XPath string positions start at 1, and you can test whether that character, say $last, is a letter by testing translate($last, 'ABCD..XYZ', '') = ''.) If the last character is a letter, make a recursive call to your template, passing the whole string minus the last character as the value of the parameter (again by using substring(), e.g. substring($s, 1, string-length($s) - 1)). Otherwise, return the string. Make sure that your recursion terminates if the string is zero length.
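The recursion described above can be sketched in Python to make the control flow concrete. This is not XSLT, only the same logic written as a plain function. It assumes "letter" means ASCII A-Z and a-z, and the function name is hypothetical.

```python
import string


def strip_trailing_letters(s: str) -> str:
    """Recursively drop letters from the end of s."""
    # Terminate on the empty string so the recursion cannot run forever.
    if not s:
        return s
    # Last character is a letter: recurse on the string minus that character.
    if s[-1] in string.ascii_letters:
        return strip_trailing_letters(s[:-1])
    # Last character is not a letter: the string is already clean.
    return s
```

For "0123467AAA" this returns "0123467". A string made up only of letters is reduced to the empty string, which is the case the zero-length check guards against.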