Universe OCONV argument for zero-padding - universe

I'm looking for some argument (ARG) such that this code:
A = 5
B = OCONV(A,'ARG5')
PRINT B
will print to the screen
00005
Anybody know something which will do this for me?

In Universe I would use the MR% conversion code. Just be aware that it will truncate anything longer than 5 characters.
A = 5
B = OCONV(A,'MR%5')
PRINT B
I use this a lot when I need to use EVAL in a conditional or as an aggregate function in a SQL or other TCL statement like to find the record with the most fields in a file.
SELECT MAX(EVAL "DCOUNT(#RECORD,#FM)") FROM VOC;
SELECT MAX(EVAL "OCONV(DCOUNT(#RECORD,#FM),'MR%8')") FROM VOC;
Masking aside these generally return 2 different values on our system.

I am using UniData, but looking at the commands reference manual I can't see anything quite right, in terms of one simple argument to OCONV, or similar. I came up with these (somewhat kludgy) alternatives, though:
NUMLEN=5
VALUE=5
PRINT CHANGE(SPACES(NUMLEN-LEN(VALUE))," ","0"):VALUE
Here you are using the SPACES function to create that amount of space characters and then convert them to zeros.
PRINT OCONV(VALUE,"MR":NUMLEN:"(#####)")
This is using OCONV but has to define a string with the "mask" to only shew the final 5 digits. So if NUMLEN changes then the mask string definition would have to change.
PRINT OCONV(VALUE,"MR":NUMLEN)[3,NUMLEN]
This version uses OCONV but prints starting at the 3rd character and shews the next NUMLEN characters, therefore trimming off the initial "0." that is made by using the "MR" parameter

PADDED.VALUE = VALUE 'R%5' is the simplest way to do this.

Related

Extract digits in between 2 different parameters

I have all data being imported into one cell as:
"<blank space><email address><blank space><CustomerId><blank space><(email address)><line break for next entry>"
Example:
email1#provider.com 12345678 (email1#provider.com)
email224#provider.com 23902490 (email224#provider.com)
I need to extract only the customer ID's, while separating them with a comma, so I tried the following: regexreplace(A2,"([^[:digit:]])",","), however, this also extracts the numbers associated with the emails, so it returns me:
,,,,,1,,,,,,,,,,,,,,12345678,,,,,,,1,,,,,,,,,,,,,,
,,,,,224,,,,,,,,,,,,,,23902490,,,,,,,224,,,,,,,,,,,,,,
Since the email address is set by the user, I don't have control how many digits or if only digits are used in it. I can't seem to understand how to isolate the CustomerIds alone.
Please help!
Edit1:
CustomerID: 64-bit int field, randomly assigned to a client, therefore checking by the length of the string would not work.
Edit2:
For now, I am using the formula below, but I would still be interested in a solution using Regex.
filter(transpose(split($B$4," ")),isnumber(transpose(split($B$4," "))))
If they are separated by a space you should be able to set the space to be your delimiter and extract from there.
https://zapier.com/blog/split-text-excel-zapier/
use:
=ARRAYFORMULA(TEXTJOIN(", ", 1, IFERROR(REGEXEXTRACT(A1:A2, "(?s)(\d{8})"))))

Regex Multiple rows [duplicate]

I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1), using a Google Sheets regex formula :
=REGEXEXTRACT(A1, "\d-")
My problem is that it only returns the first match... how can I get all matches?
Example text:
"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"
My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or concatenated text).
I know I could use a script or another function (like SPLIT) to achieve the desired result, but what I really want to know is how I could get a re2 regular expression to return such multiple matches in a "REGEX.*" Google Sheets formula.
Something like the "global - Don't return after first match" option on regex101.com
I've also tried removing the undesired text with REGEXREPLACE, with no success either (I couldn't get rid of other digits not preceding a hyphen).
Any help appreciated!
Thanks :)
You can actually do this in a single formula using regexreplace to surround all the values with a capture group instead of replacing the text:
=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
basically what it does is surround all instances of the \d- with a "capture group" then using regex extract, it neatly returns all the captures. if you want to join it back into a single string you can just use join to pack it back into a single cell:
You may create your own custom function in the Script Editor:
function ExtractAllRegex(input, pattern,groupId) {
return [Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId])];
}
Or, if you need to return all matches in a single cell joined with some separator:
function ExtractAllRegex(input, pattern,groupId,separator) {
return Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId]).join(separator);
}
Then, just call it like =ExtractAllRegex(A1, "\d-", 0, ", ").
Description:
input - current cell value
pattern - regex pattern
groupId - Capturing group ID you want to extract
separator - text used to join the matched results.
Edit
I came up with more general solution:
=regexreplace(A1,"(.)?(\d-)|(.)","$2")
It replaces any text except the second group match (\d-) with just the second group $2.
"(.)?(\d-)|(.)"
1 2 3
Groups are in ()
---------------------------------------
"$2" -- means return the group number 2
Learn regular expressions: https://regexone.com
Try this formula:
=regexreplace(regexreplace(A1,"[^\-0-9]",""),"(\d-)|(.)","$1")
It will handle string like this:
"A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"
with output:
1-2-2-2-3-
I wasn't able to get the accepted answer to work for my case. I'd like to do it that way, but needed a quick solution and went with the following:
Input:
1111 days, 123 hours 1234 minutes and 121 seconds
Expected output:
1111 123 1234 121
Formula:
=split(REGEXREPLACE(C26,"[a-z,]"," ")," ")
The shortest possible regex:
=regexreplace(A1,".?(\d-)|.", "$1")
Which returns 1-2-2-2-2-2-2-2-2-2-3-3- for "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq".
Explanation of regex:
.? -- optional character
(\d-) -- capture group 1 with a digit followed by a dash (specify (\d+-) multiple digits)
| -- logical or
. -- any character
the replacement "$1" uses just the capture group 1, and discards anything else
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
This seems to work and I have tried to verify it.
The logic is
(1) Replace letter followed by hyphen with nothing
(2) Replace any digit not followed by a hyphen with nothing
(3) Replace everything which is not a digit or hyphen with nothing
=regexreplace(A1,"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]","")
Result
1-2-2-2-2-2-2-2-2-2-3-3-
Analysis
I had to step through these procedurally to convince myself that this was correct. According to this reference when there are alternatives separated by the pipe symbol, regex should match them in order left-to-right. The above formula doesn't work properly unless rule 1 comes first (otherwise it reduces all characters except a digit or hyphen to null before rule (1) can come into play and you get an extra hyphen from "Patho-jour").
Here are some examples of how I think it must deal with the text
The solution to capture groups with RegexReplace and then do the RegexExctract works here too, but there is a catch.
=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
If the cell that you are trying to get the values has Special Characters like parentheses "(" or question mark "?" the solution provided won´t work.
In my case, I was trying to list all “variables text” contained in the cell. Those “variables text “ was wrote inside like that: “{example_name}”. But the full content of the cell had special characters making the regex formula do break. When I removed theses specials characters, then I could list all captured groups like the solution did.
There are two general ('Excel' / 'native' / non-Apps Script) solutions to return an array of regex matches in the style of REGEXEXTRACT:
Method 1)
insert a delimiter around matches, remove junk, and call SPLIT
Regexes work by iterating over the string from left to right, and 'consuming'. If we are careful to consume junk values, we can throw them away.
(This gets around the problem faced by the currently accepted solution, which is that as Carlos Eduardo Oliveira mentions, it will obviously fail if the corpus text contains special regex characters.)
First we pick a delimiter, which must not already exist in the text. The proper way to do this is to parse the text to temporarily replace our delimiter with a "temporary delimiter", like if we were going to use commas "," we'd first replace all existing commas with something like "<<QUOTED-COMMA>>" then un-replace them later. BUT, for simplicity's sake, we'll just grab a random character such as  from the private-use unicode blocks and use it as our special delimiter (note that it is 2 bytes... google spreadsheets might not count bytes in graphemes in a consistent way, but we'll be careful later).
=SPLIT(
LAMBDA(temp,
MID(temp, 1, LEN(temp)-LEN(""))
)(
REGEXREPLACE(
"xyzSixSpaces:[ ]123ThreeSpaces:[ ]aaaa 12345",".*?( |$)",
"$1"
)
),
""
)
We just use a lambda to define temp="match1match2match3", then use that to remove the last delimiter into "match1match2match3", then SPLIT it.
Taking COLUMNS of the result will prove that the correct result is returned, i.e. {" ", " ", " "}.
This is a particularly good function to turn into a Named Function, and call it something like REGEXGLOBALEXTRACT(text,regex) or REGEXALLEXTRACT(text,regex), e.g.:
=SPLIT(
LAMBDA(temp,
MID(temp, 1, LEN(temp)-LEN(""))
)(
REGEXREPLACE(
text,
".*?("&regex&"|$)",
"$1"
)
),
""
)
Method 2)
use recursion
With LAMBDA (i.e. lets you define a function like any other programming language), you can use some tricks from the well-studied lambda calculus and function programming: you have access to recursion. Defining a recursive function is confusing because there's no easy way for it to refer to itself, so you have to use a trick/convention:
trick for recursive functions: to actually define a function f which needs to refer to itself, instead define a function that takes a parameter of itself and returns the function you actually want; pass in this 'convention' to the Y-combinator to turn it into an actual recursive function
The plumbing which takes such a function work is called the Y-combinator. Here is a good article to understand it if you have some programming background.
For example to get the result of 5! (5 factorial, i.e. implement our own FACT(5)), we could define:
Named Function Y(f)=LAMBDA(f, (LAMBDA(x,x(x)))( LAMBDA(x, f(LAMBDA(y, x(x)(y)))) ) ) (this is the Y-combinator and is magic; you don't have to understand it to use it)
Named Function MY_FACTORIAL(n)=
Y(LAMBDA(self,
LAMBDA(n,
IF(n=0, 1, n*self(n-1))
)
))
result of MY_FACTORIAL(5): 120
The Y-combinator makes writing recursive functions look relatively easy, like an introduction to programming class. I'm using Named Functions for clarity, but you could just dump it all together at the expense of sanity...
=LAMBDA(Y,
Y(LAMBDA(self, LAMBDA(n, IF(n=0,1,n*self(n-1))) ))(5)
)(
LAMBDA(f, (LAMBDA(x,x(x)))( LAMBDA(x, f(LAMBDA(y, x(x)(y)))) ) )
)
How does this apply to the problem at hand? Well a recursive solution is as follows:
in pseudocode below, I use 'function' instead of LAMBDA, but it's the same thing:
// code to get around the fact that you can't have 0-length arrays
function emptyList() {
return {"ignore this value"}
}
function listToArray(myList) {
return OFFSET(myList,0,1)
}
function allMatches(text, regex) {
allMatchesHelper(emptyList(), text, regex)
}
function allMatchesHelper(resultsToReturn, text, regex) {
currentMatch = REGEXEXTRACT(...)
if (currentMatch succeeds) {
textWithoutMatch = SUBSTITUTE(text, currentMatch, "", 1)
return allMatches(
{resultsToReturn,currentMatch},
textWithoutMatch,
regex
)
} else {
return listToArray(resultsToReturn)
}
}
Unfortunately, the recursive approach is quadratic order of growth (because it's appending the results over and over to itself, while recreating the giant search string with smaller and smaller bites taken out of it, so 1+2+3+4+5+... = big^2, which can add up to a lot of time), so may be slow if you have many many matches. It's better to stay inside the regex engine for speed, since it's probably highly optimized.
You could of course avoid using Named Functions by doing temporary bindings with LAMBDA(varName, expr)(varValue) if you want to use varName in an expression. (You can define this pattern as a Named Function =cont(varValue) to invert the order of the parameters to keep code cleaner, or not.)
Whenever I use varName = varValue, write that instead.
to see if a match succeeds, use ISNA(...)
It would look something like:
Named Function allMatches(resultsToReturn, text, regex):
UNTESTED:
LAMBDA(helper,
OFFSET(
helper({"ignore"}, text, regex),
0,1)
)(
Y(LAMBDA(helperItself,
LAMBDA(results, partialText,
LAMBDA(currentMatch,
IF(ISNA(currentMatch),
results,
LAMBDA(textWithoutMatch,
helperItself({results,currentMatch}, textWithoutMatch)
)(
SUBSTITUTE(partialText, currentMatch, "", 1)
)
)
)(
REGEXEXTRACT(partialText, regex)
)
)
))
)

Remove space from numbers - Leave rest intact

Full disclosure: I'm no programmer (which will probably be evident in a second). I frequently copy numbers from one sheet to another and numbers bigger than 999 have a space between the first and the second number. I have tried to make script (with help from similar scripts I've found here) that remove these spaces at the click of a button but it only kinda works.
Here's a sample sheet with my code included: https://docs.google.com/spreadsheets/d/11_CWZfhBupUpBKYi7I9cwGBakSr4LikOm0bGft6YA7Q/edit?usp=sharing
I have tried my best to modify the Regex-expression to do what I want but I guess my knowledge is just too poor.
....
function Remove_space() {
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Input");
var r = sheet.getRange("E4:E50");
var v = r.getValues();
for(var k=0;k<v.length;k++)
for(var j=0;j<v[0].length;j++)
v[k][j]=v[k][j].toString().replace(/\s/g, "");
r.setValues(v);
};
....
Can anyone help me out and make the spaces go away but leave my commas intact?
It does remove the spaces but for some reason it also converts my commas (decimal separator) into periods. If there are no decimals it's no problem.
Issue:
String conversion: Javascript doesn't support , as decimal separator. So, when you getValue(), they're automatically converted to .. When you setValue(), it'll be converted back to ,. But during the execution of the function, you use .toString(), which makes the period . permanent.
Solution:
Get as String type: Use getDisplayValues() if you want to get numbers as strings with , as decimal separator intact. OR
Set as Number type: Convert the regex replaced string back to a number. You should also manually convert comma , to period . in this case to cast it back to number type.
Snippet:
r.getDisplayValues();
OR
v[k][j]=Number(v[k][j].toString().replace(/\s/g, "").replace(/\,/g,"."));
References:
Number
Range#getDisplayValues: Notice that it returns String [], whereas getValues() returns Object []

How to remove the space between the minus sign and number's in informatica

i have a issue where the there is a amount field which has data like
(- 98765.00),minus{spaces]{numbers} ?, i need to remove the space between the minus and the number and get is as (-98765.00), how do i do it in expression transformation.
field datatype is decimal (8,2).
Thanks,
Kiran
output_port: TO_DECIMAL(REPLACECHR(FALSE,input_port,' ',''))
REPLACECHR replaces the blanks with empty character, essentially removing them. The first argument can be TRUE/FALSE to specify case sensitive or not, but it is not important in this case.
You can use REG_REPLACE function to replace space
To achieve this you need to follow below steps,
* Create two variable ports
* REG_REPLACE - function requires string column, so you need to convert the decimal column to string column using TO_CHAR function
First variable port(string) - TO_CHAR(column_name)
* In previous port data is converted to string, now convert it again to decimal and apply REG_REPLACE function
Second variable port(decimal) - to_decimal(reg_replace(first_variable_port,'s+',''))
s - determines the white spaces in informatica regular expression
See the below image,
same number which you provided is used. Use the same data type and function
Debugger gives the exact result by removing white space in the below image,
May be you have the issue with other transformations which you are passing through. Debug and verify the data once.
Hope you got it, any issues feel free to ask
To have enjoy informatica, have a fun on https://etlinfromatica.wordpress.com/
If my understanding is correct, you need to replace both the spaces and the brackets. Here's the expression:
TO_DECIMAL(
REPLACECHR(0,
REPLACECHR(0, '(- 98765.00)', ' ', '') -- this part does the space replacement
, '()', '') -- this part replaces the brackets
)

VB.Net Beginner: Replace with Wildcards, Possibly RegEx?

I'm converting a text file to a Tab-Delimited text file, and ran into a bit of a snag. I can get everything I need to work the way I want except for one small part.
One field I'm working with has the home addresses of the subjects as a single entry ("1234 Happy Lane Somewhere, St 12345") and I need each broken down by Street(Tab)City(Tab)State(Tab)Zip. The one part I'm hung up on is the Tab between the State and the Zip.
I've been using input=input.Replace throughout, and it's worked well so far, but I can't think of how to untangle this one. The wildcards I'm used to don't seem to be working, I can't replace ("?? #####") with ("??" + ControlChars.Tab + "#####")...which I honestly didn't expect to work, but it's the only idea on the matter I had.
I've read a bit about using Regex, but have no experience with it, and it seems a bit...overwhelming.
Is Regex my best option for this? If not, are there any other suggestions on solutions I may have missed?
Thanks for your time. :)
EDIT: Here's what I'm using so far. It makes some edits to the line in question, taking care of spaces, commas, and other text I don't need, but I've got nothing for the State/Zip situation; I've a bad habit of wiping something if it doesn't work, but I'll append the last thing I used to the very end, if that'll help.
If input Like "Guar*###/###-####" Then
input = input.Replace("Guar:", "")
input = input.Replace(" ", ControlChars.Tab)
input = input.Replace(",", ControlChars.Tab)
input = "C" + ControlChars.Tab + strAccount + ControlChars.Tab + input
End If
input = System.Text.RegularExpressions.Regex.Replace(" #####", ControlChars.Tab + "#####") <-- Just one example of something that doesn't work.
This is what's written to input in this example
" Guar: LASTNAME,FIRSTNAME 999 E 99TH ST CITY,ST 99999 Tel: 999/999-9999"
And this is what I can get as a result so far
C 99999/9 LASTNAME FIRSTNAME 999 E 99TH ST CITY ST 99999 999/999-9999
With everything being exactly what I need besides the "ST 99999" bit (with actual data obviously omitted for privacy and professional whatnots).
UPDATE: Just when I thought it was all squared away, I've got another snag. The raw data gives me this.
# TERMINOLOGY ######### ##/##/#### # ###.##
And the end result is giving me this, because this is a chunk of data that was just fine as-is...before I removed the Tabs. Now I need a way to replace them after they've been removed, or to omit this small group of code from a document-wide Tab genocide I initiate the code with.
#TERMINOLOGY###########/##/########.##
Would a variant on rgx.Replace work best here? Or can I copy the code to a variable, remove Tabs from the document, then insert the variable without losing the tabs?
I think what you're looking for is
Dim r As New System.Text.RegularExpressions.Regex(" (\d{5})(?!\d)")
Dim input As String = rgx.Replace(input, ControlChars.Tab + "$1")
The first line compiles the regular expression. The \d matches a digit, and the {5}, as you can guess, matches 5 repetitions of the previous atom. The parentheses surrounding the \d{5} is known as a capture group, and is responsible for putting what's captured in a pseudovariable named $1. The (?!\d) is a more advanced concept known as a negative lookahead assertion, and it basically peeks at the next character to check that it's not a digit (because then it could be a 6-or-more digit number, where the first 5 happened to get matched). Another version is
" (\d{5})\b"
where the \b is a word boundary, disallowing alphanumeric characters following the digits.