How to remove £ symbol from nested array and convert string to integer - regex

I've got a nested array of prices stored as strings, and I want to remove the £ symbols and convert the strings to integers so I can use the values in a chart.js line graph.
I've been trying to use a regex replace to remove the £ sign, but I can't get it working, I think because the strings are in a nested array. I can't seem to find anything online about replacing characters in a nested array. I haven't even tried converting the strings to integers yet, but I'm wondering if it could all be handled in one go somehow?
This is my nested array, called linedata:
var linedata = [["£14.99,£14.99,£14.99"],["£34.99,£34.99,£34.99"]]
This is the code I've been playing around with:
var re = /£/g;
var newlinedata = linedata.replace(re, "");
It's not returning anything in the Chrome console, and the Ionic CLI is kicking out this error:
ERROR in src/app/home/home.page.ts(66,26): error TS2339: Property
'replace' does not exist on type 'any[]'.
Thoughts?

This expression might help you to replace that:
£([0-9.]+,?)
The key is to add everything you like to keep in the capturing group () and the pound symbol outside of the group, then simply replace it with $1.
RegEx
If this wasn't your desired expression, you can modify/change your expressions in regex101.com.
RegEx Circuit
You can also visualize your expressions in jex.im:
JavaScript Demo
const regex = /£([0-9.]+,?)/gm;
const str = `£14.99,£14.99,£14.99
£34.99,£34.99,£34.99`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);

After matching the £, you could capture the digits and an optional decimal part in a group:
£(\d+(?:\.\d+)?)
£ Match literally
( Capturing group
\d+(?:\.\d+)? Match 1+ digits with an optional part to match a dot and 1+ digits
) Close group
You should use replace on a string instead of an array.
To turn the nested array with the strings into arrays of numbers, you could use map and in the replacement refer to the capturing group using $1 which contains your value.
For example:
var linedata = [
["£14.99,£14.99,£14.99"],
["£34.99,£34.99,£34.99"]
].map(ary =>
ary[0].split(',')
.map(v => Number(v.replace(/£(\d+(?:\.\d+)?)/g, "$1")))
);
console.log(linedata);
Or if you want to keep multiple nested arrays, you could use another map.
var linedata = [
["£14.99,£14.99,£14.99"],
["£34.99,£34.99,£34.99", "£34.99,£34.99,£34.99"]
].map(ary => ary
.map(s => s.split(',')
.map(v => Number(v.replace(/£(\d+(?:\.\d+)?)/g, "$1")))
)
);
console.log(linedata);
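Because the prices include pence, Number produces floats such as 14.99 rather than integers. If whole numbers are really needed for the chart, one option is to convert each value to pence. The following is only a minimal sketch under that assumption; the helper names parsePrices and toPence are hypothetical and do not come from the question or the answers.

```javascript
// Hypothetical helper: turns "£14.99,£14.99" into [14.99, 14.99].
function parsePrices(csv) {
  return csv.split(',').map(function (s) {
    // Strip every £ sign, then convert the remaining text to a number.
    return Number(s.replace(/£/g, '').trim());
  });
}

// Hypothetical helper: converts pounds to whole pence.
// Math.round guards against floating-point noise in the multiplication.
function toPence(pounds) {
  return Math.round(pounds * 100);
}

var linedata = [["£14.99,£14.99,£14.99"], ["£34.99,£34.99,£34.99"]];

// One array of floats per inner array (each inner array holds one string).
var series = linedata.map(function (row) { return parsePrices(row[0]); });

// The same data as integers (pence).
var pence = series.map(function (row) { return row.map(toPence); });

console.log(series);
console.log(pence);
```

Whether the chart should show pounds as floats or pence as integers depends on how the axis is meant to be labelled.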

Related

Regular expression to search for digits after a decimal place

I'm trying to write a regular expression that can match a decimal (and the digits after) of a dollar value. For example, I want to match $1.00 , $1,100.89 (includes values in the thousands with commas). It cannot match any digits that are not preceded by a $ character. These values are also not the only pieces of text in this file.
So far, I've tried a few things that haven't quite gotten me there:
\.+[\d]+ (highlights the decimal and every digit after the decimal point, but not what we want because it includes non-dollar values like 1.00)
\$+[\d+\.]+ highlights the whole value of the dollar except the 1,250
(\$\d+\.+\d+)|\$\d+\,+\d+\.+\d+ highlights the whole value of anything with a dollar sign
Anyone have an idea?
I looked at your problem and I believe I have a solution.
You could use the regex below to search for the last two decimals.
^\$[\d,]+\.((?:\d){2})
Use:
^\$[\d,]+\.(\d\d)$
Explanation:
^ # beginning of string
\$ # $ sign
[\d,]+ # 1 or more digit or comma
\. # a dot
(\d\d) # group 1, 2 digits
$ # end of string
var test = [
'$100.00',
'$1,100.89',
'$123',
'123.45',
];
console.log(test.map(function (a) {
var m = a.match(/^\$[\d,]+\.(\d\d)$/);
if (m)
return a + ' : ' + m[1];
else
return a + ' : no match';
}));
You could use a non-capturing group (?:) so that only the group you want is captured. I've come up with this regex, and it seems to do what you are looking for:
^(?:\$[,\d]+)(?:\.([\d]{2}))
const regex = /^(?:\$[,\d]+)(?:\.([\d]{2}))/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => regex.exec(item)[1]);
console.log(result);
EDIT :
Here is an example of how to replace only the decimal digits.
I'm using the same concept as above, only this time I'm not capturing the decimal digits. I capture the part before the dot instead and use $1 to put that group back into the new string.
const regex = /^(\$[,\d]+)\.(?:[\d]{2})/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => item.replace(regex, '$1.50'));
console.log(result);
Notice here that the $1 in the replace function refers to the first capturing group of the regex. This way, we can get it back and "insert" it into our final string.
Here I've chosen .50 as the replacement string, but you could use whatever you like.
P.S. I know this might be confusing because we are talking about dollars, so here is an example where we replace the decimal part with a word.
const regex = /^(\$[,\d]+)\.(?:[\d]{2})/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => item.replace(regex, '$1 this is a word'));
console.log(result);
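The patterns above are anchored with ^, so they assume each dollar value is a string of its own. The question mentions that the values sit inside other text, so a global, unanchored search may be closer to what is needed. This is a hedged sketch, not one of the answers: the sample text and the variable names are made up for illustration.

```javascript
// Made-up sample text mixing dollar values with other numbers.
const text = 'Invoice total $1,100.89, ratio 1.00, fee $1.00, deposit $123';

// \$        a literal dollar sign (so "1.00" without $ is skipped)
// \d[\d,]*  the whole-dollar part, allowing thousands separators
// \.(\d+)   the dot, then the decimal digits captured in group 1
const decimalsAfterDollar = /\$\d[\d,]*\.(\d+)/g;

// matchAll yields one match object per occurrence; index 1 is group 1.
const decimals = Array.from(text.matchAll(decimalsAfterDollar), m => m[1]);

console.log(decimals); // [ '89', '00' ]
```

Values such as $123 that have no decimal part are simply not matched.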

How to replace part of string using regex pattern matching in scala?

I have a String which contains column names and datatypes as below:
val cdt = "header:integer|releaseNumber:numeric|amountCredit:numeric|lastUpdatedBy:numeric(15,10)|orderNumber:numeric(20,0)"
My requirement is to convert the postgres datatypes which are present as numeric, numeric(15,10) into spark-sql compatible datatypes.
In this case,
numeric -> decimal(38,30)
numeric(15,10) -> decimal(15,10)
numeric(20,0) -> bigint (This is an integral datatype, as its scale is zero.)
In order to access the datatype in the string: cdt, I split it and created a Seq from it.
val dt = cdt.split("\\|").toSeq
Now I have a Seq of elements in which each element is a String in the below format:
Seq("header:integer", "releaseNumber:numeric","amountCredit:numeric","lastUpdatedBy:numeric(15,10)","orderNumber:numeric(20,0)")
I have the pattern matching regex: """numeric\(\d+,(\d+)\)""".r, for numeric(precision, scale), which only works if there is a scale of two digits, e.g. numeric(20,23).
I am very new to regex and Scala, and I don't understand how to create regex patterns for the remaining two cases and apply them to a string to match a condition. I tried it in the way below, but it gives me a compilation error: "Cannot resolve symbol findFirstMatchIn"
dt.map(e => e.split("\\:")).map(e => changeDataType(e(0), e(1)))
def changeDataType(colName: String, cd:String): String = {
val finalColumns = new String
val pattern1 = """numeric\(\d+,(\d+)\)""".r
cd match {
case pattern1.findFirstMatchIn(dt) =>
}
}
I am trying to get the final output into a String as below:
header:integer|releaseNumber:decimal(38,30)|amountCredit:decimal(38,30)|lastUpdatedBy:decimal(15,10)|orderNumber:bigint
How can I use multiple regex patterns for the different cases, to check the datatype of each value in the Seq and change it to the suitable datatype as mentioned above?
Could anyone let me know how I can achieve this?
It can be done with a single regex pattern, but some testing of the match results is required.
val numericRE = raw"([^:]+):numeric(?:\((\d+),(\d+)\))?".r
cdt.split("\\|")
.map{
case numericRE(col,a,b) =>
if (Option(b).isEmpty) s"$col:decimal(38,30)"
else if (b == "0") s"$col:bigint"
else s"$col:decimal($a,$b)"
case x => x //pass-through
}.mkString("|")
//res0: String = header:integer|releaseNumber:decimal(38,30)|amountCredit:decimal(38,30)|lastUpdatedBy:decimal(15,10)|orderNumber:bigint
Of course it can be done with three different regex patterns, but I think this is pretty clear.
explanation
raw - don't need so many escape characters - \
([^:]+) - capture everything up to the 1st colon
:numeric - followed by the string ":numeric"
(?: - start a non-capture group
\((\d+),(\d+)\) - capture the 2 digit strings, separated by a comma, inside parentheses
)? - the non-capture group is optional
numericRE(col,a,b) - col is the 1st capture group, a and b are the digit captures, but they are inside the optional non-capture group so they might be null

Regexp to extract studyinstanceuid from dump

I need to capture numbers and dots between brackets on lines containing the string 0020,000d, for example:
I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ] # 38, 1 StudyInstanceUID
Using this regexp 0020,000d.*\[([\.0-9]+)\] I can match the needed value only if it doesn't have a space inside the brackets. How can I match the needed value while ignoring any other character?
Edit
If I use this regexp 0020,000d.*\[([\.0-9(\s|^\s))]+)\] I can capture numbers and dots and/or spaces. Now, if the string contains a space, how can I capture everything but the space in a group?
To clarify, I want to extract the 1.2.410.200001.1104.20160720104648421 string.
Codifying my (apparently helpful) answer from the comments:
You just need to allow zero or more spaces after the numbers-and-dots sequence before the closing bracket:
0020,000d.*\[([.0-9]+) *\]
Also, please note that you don't need to escape a dot in a character class.
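As an illustration of the pattern with the optional trailing spaces, here is a small JavaScript sketch. The question does not say which language or tool runs the regex, so the use of JavaScript and the variable names are assumptions.

```javascript
// Sample line from the question.
const line = 'I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ] # 38, 1 StudyInstanceUID';

// Group 1 holds the digits and dots; " *" lets spaces sit before the "]"
// without becoming part of the captured value.
const uidPattern = /0020,000d.*\[([.0-9]+) *\]/;

const match = line.match(uidPattern);

// match is null for lines that do not contain the 0020,000d tag.
const studyInstanceUid = match ? match[1] : null;

console.log(studyInstanceUid);
```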
Try this
let regex = /(?!\[)[.\d]+(?=[(\s)*\]])/g
let str = 'I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ]'
let result = str.match(regex);
console.log(result);

Split by regex with capturing groups in lookahead produces repeating fragments in results

I was hoping for a one-liner to insert thousands separators into string of digits with decimal separator (example: 78912345.12). My first attempt was to split the string in places where there is either 3 or 6 digits left until decimal separator:
console.log("5789123.45".split(/(?=([0-9]{3}\.|[0-9]{6}\.))/));
which gave me the following result (notice how fragments of original string are repeated):
[ '5', '789123.', '789', '123.', '123.45' ]
I found out that "problem" (please read problem here as my obvious misunderstanding) comes from using a group within lookahead expression. This simple expression works "correctly":
console.log("abcXdeYfgh".split(/(?=X|Y)/));
when executed prints:
[ 'abc', 'Xde', 'Yfgh' ]
But the moment I surround X|Y with parentheses:
console.log("abcXdeYfgh".split(/(?=(X|Y))/));
the resulting array looks like:
[ 'abc', 'X', 'Xde', 'Y', 'Yfgh' ]
Moreover, when I change the group to a non-capturing one, everything comes back to "normal":
console.log("abcXdeYfgh".split(/(?=(?:X|Y))/));
this yields again:
[ 'abc', 'Xde', 'Yfgh' ]
So, I could do the same trick (changing to a non-capturing group) within the original expression (and it indeed works), but I was hoping for an explanation of this behavior, which I cannot understand. I get identical results when trying the same thing in .NET, so it seems to be something fundamental about how regular expression lookaheads work. This is my question: why does a lookahead with capturing groups produce those "strange" results?
Capturing groups inside a regex pattern inside a regex split method/function make the captured texts appear as separate elements in the resulting array (for most of the major languages).
Here is C#/.NET reference:
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array. For example, if you split the string "plum-pear" on a hyphen placed within capturing parentheses, the returned array includes a string element that contains the hyphen.
Here is JavaScript reference:
If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
Just a note: the same behavior is observed with
PHP (with preg_split and PREG_SPLIT_DELIM_CAPTURE flag):
print_r(preg_split("/(?<=(X))/","XYZ",-1,PREG_SPLIT_DELIM_CAPTURE));
// --> [0] => X, [1] => X, [2] => YZ
Ruby (with string.split):
"XYZ".split(/(?<=(X))/) # => X, X, YZ
But it is the opposite in Java, the captured text is not part of the resulting array:
System.out.println(Arrays.toString("XYZ".split("(?<=(X))"))); // => [X, YZ]
And in Python before 3.7, with the re module, re.split cannot split on a zero-width assertion, so the string does not get split at all with the following (Python 3.7 and later do split on empty matches):
print(re.split(r"(?<=(X))","XXYZ")) # => ['XXYZ']
Here is a simple way to do it in JavaScript:
number.toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",")
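One caveat with this one-liner: it also groups digits after the decimal point when there are four or more of them, so "1234.5678" becomes "1,234.5,678". A possible workaround, sketched here with a hypothetical function name formatThousands, is to apply the same replace to the integer part only.

```javascript
// Hypothetical helper: adds thousands separators to the integer part only.
function formatThousands(value) {
  // Split off the fractional part (undefined if there is no dot).
  const [intPart, fracPart] = String(value).split('.');

  // Same lookahead as the one-liner, applied to the integer digits only.
  const grouped = intPart.replace(/\B(?=(\d{3})+(?!\d))/g, ',');

  return fracPart === undefined ? grouped : grouped + '.' + fracPart;
}

console.log(formatThousands('78912345.12')); // 78,912,345.12
console.log(formatThousands('1234.5678'));   // 1,234.5678
```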
Normally, including capture buffers could sometimes produce extra elements
if mixing with lookaheads.
You are on the right track but didn't have a natural anchor.
If you use a string where all the characters are the same type
(in your case digits) and use lookaheads, it's not good enough
to do the split incrementally based on a length of common characters.
The engine just bumps along one character at a time, splitting on that
character and including the captured ones as elements.
You could handle this by consuming the capture in the process,
like (?=(\d{3}))\1 but that not only splits at the wrong place but
injects an empty element in the array.
The solution is to use the Natural Anchor, the DOT, then split at
multiples of 3 up to the dot anchor.
This forces the engine to seek to the point at which there are multiples
away from the anchor.
Then your problem is solved, no need for captures and the split is perfect.
Regex: (?=(?:[0-9]{3})+\.)
Formatted:
(?=
(?: [0-9]{3} )+
\.
)
C#:
string[] ary = Regex.Split("51234555632454789123.45", @"(?=(?:[0-9]{3})+\.)");
int size = ary.Length;
for (int i = 0; i < size; i++)
Console.WriteLine(" {0} = '{1}' ", i, ary[i]);
Output:
0 = '51'
1 = '234'
2 = '555'
3 = '632'
4 = '454'
5 = '789'
6 = '123.45'
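The dot-anchored split can also be tried in JavaScript, which is what the question used. This is a rough sketch: the function name addThousands is hypothetical, and the input is assumed to always contain a decimal point, because the lookahead needs the dot as its anchor.

```javascript
// Hypothetical helper: splits before every run of 3n digits that ends at
// the dot, then joins the pieces with commas.
function addThousands(digits) {
  // Non-capturing group, so no captured text is spliced into the result.
  return digits.split(/(?=(?:[0-9]{3})+\.)/).join(',');
}

console.log(addThousands('5789123.45')); // 5,789,123.45
console.log(addThousands('123456.78'));  // 123,456.78
```

A string without a dot comes back unchanged, since the lookahead never matches.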

Parsing Excel reference with regular expression?

Excel returns a reference of the form
=Sheet1!R14C1R22C71junk
("junk" won't normally be there, but I want to be sure that there's no extraneous text.)
I would like to 'split' this into a VB array, where
a(0)="Sheet1"
a(1)="14"
a(2)="1"
a(3)="22"
a(4)="71"
a(5)="junk"
I'm sure it can be done easily with a regular expression, but I just can't get the hang of it.
Is there a kind soul who could help me?
Thanks
=([^!]+)!R(\d+)C(\d+)R(\d+)C(\d+)(.*)
should work.
[^!]+ matches a sequence of non-exclamation-point characters.
\d+ matches a sequence of digits.
.* matches anything.
So, in VB.NET:
Dim a As Match
a = Regex.Match(SubjectString, "=([^!]+)!R(\d+)C(\d+)R(\d+)C(\d+)(.*)")
If a.Success Then
' matched text: a.Value
' backreference n text: a.Groups(n).Value
Else
' Match attempt failed
End If
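To see what each group captures for the sample reference, the same pattern can be run in any regex engine. The sketch below uses JavaScript purely as an illustration (the question targets VB) and adds ^ and $ anchors as an assumption, so that the whole string must match; the variable names are made up.

```javascript
// Sample reference from the question, including the trailing "junk".
const input = '=Sheet1!R14C1R22C71junk';

// Same groups as the pattern above, anchored to the whole string.
const refPattern = /^=([^!]+)!R(\d+)C(\d+)R(\d+)C(\d+)(.*)$/;

const m = refPattern.exec(input);

// Drop index 0 (the full match) so the array lines up with a(0)..a(5).
const a = m ? m.slice(1) : [];

console.log(a); // [ 'Sheet1', '14', '1', '22', '71', 'junk' ]
```

If the last element is not an empty string, the reference carried extraneous text.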
A straightforward String.Split would work, provided the "junk" text wasn't there:
Dim input As String = "=Sheet1!R14C1R22C71"
Dim result = input.Split(New Char() { "="c, "!"c, "R"c, "C"c }, StringSplitOptions.RemoveEmptyEntries)
For Each item As String In result
Console.WriteLine(item)
Next
The regex gets a little tricky since you will need to go through the Groups and Captures of the nested portions to get the proper order.
EDIT: here's my regex solution. It accepts multiple occurrences of R's and C's.
Dim input As String = "=Sheet1!R14C1R22C71junk"
Dim pattern As String = "=(?<Sheet>Sheet\d+)!(?:R(?<R>\d+)C(?<C>\d+))+"
Dim m As Match = Regex.Match(input, pattern)
If m.Success Then
Console.WriteLine(m.Groups("Sheet").Value)
For i = 0 To m.Groups("R").Captures.Count - 1
Console.WriteLine(m.Groups("R").Captures(i).Value)
Console.WriteLine(m.Groups("C").Captures(i).Value)
Next
End If
Pattern explanation:
"=(?<Sheet>Sheet\d+)" : matches an = sign followed by "Sheet" and digits. Uses a named group called "Sheet".
"!(?:R(?<R>\d+)C(?<C>\d+))+" : matches the exclamation mark followed by at least one occurrence of the RxxCxx portion of the text. Named groups "R" and "C" are used.
"(?:...)+" : this part of the pattern above matches, but does not capture, the inner pattern (i.e., the R/C part) as a whole. This avoids capturing it unnecessarily, since the values are already captured by the named groups.
More general regexes for R1C1 style:
^=(?:(?<Sheet>[^!]+)!)?(?:R((?<RAbs>\d+)|(?<RRel>\[-?\d+\]))C((?<CAbs>\d+)|(?<CRel>\[-?\d+\]))){1,2}$
And A1 style:
^=(?:(?<Sheet>[^!]+)!)?(?:(?<Col1>\$?[a-z]+)(?<Row1>\$?\d+))(?:\:(?<Col2>\$?[a-z]+)(?<Row2>\$?\d+))?$
It doesn't match external references like =[Book1]Sheet1!A1 though.