I have this array of strings.
["Anyvalue", "Total", "value:", "9,999.00", "Token", " ", "|", " ", "Total", "chain", "value:", "4,948"]
and I'm trying to extract the numbers in one line of code. I tried several methods, but none gave the result I expected.
I'm using one with grep method:
array.grep(/\d+/, &:to_i) #[9, 4]
but it returns only the leading integer of each number. It seems I have to add something to the pattern, but I don't know what.
Or is there another way to grab these numbers into an array?
You can use:
array.grep(/[\d,]+\.?\d+/)
If you want integers:
array.grep(/[\d,]+\.?\d+/).map {_1.gsub(/[^0-9\.]/, '').to_i}
A faster way (about 5 to 10 times faster):
array.grep(/[\d,]+\.?\d+/).map { _1.delete("^0-9.").to_i }
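For context, the original grep(/\d+/, &:to_i) call returns [9, 4] because String#to_i stops parsing at the first character that cannot be part of an integer, which here is the comma. Below is a minimal sketch of the approach above, run against the array from the question. The helper name extract_numbers is made up for illustration, and to_f is used instead of to_i on the assumption that the decimals should be kept.

```ruby
# Input taken from the question.
TOKENS = ["Anyvalue", "Total", "value:", "9,999.00", "Token", " ", "|", " ",
          "Total", "chain", "value:", "4,948"].freeze

# Hypothetical helper: keep number-like tokens, delete every character that
# is not a digit or a decimal point, then convert to Float.
def extract_numbers(tokens)
  tokens.grep(/[\d,]+\.?\d+/).map { |s| s.delete("^0-9.").to_f }
end

p extract_numbers(TOKENS) # prints [9999.0, 4948.0]
```

Swap to_f for to_i if truncated integers are what you want.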
For data like:
%w[
,,,4
1
1.2.3.4
-2
1,2,3
9,999.00
4,948
22,956
22,536,129,336
123,456
12.]
Use:
data.grep(/^-?\d{1,3}(,\d{3})*(\.?\d+)?$/)
Output:
["1", "-2", "9,999.00", "4,948", "22,956", "22,536,129,336", "123,456"]
arr = ["Anyvalue", "Total", "value:", "9,999.00", "Token", " ", "61.4.5",
"|", "chain", "-4,948", "3,25.61", "1,234,567.899"]
rgx = /\A\-?\d{1,3}(?:,\d{3})*(?:\.\d+)?\z/
arr.grep(rgx)
#=> ["9,999.00", "-4,948", "1,234,567.899"]
Regex demo. At the link, the regular expression was evaluated with the PCRE regex engine, but the results are the same when Ruby's Onigmo engine is used. Also, at the link I've used the anchors ^ and $ (beginning and end of line) instead of \A and \z (beginning and end of string) in order to test the regex against multiple strings.
The regular expression can be broken down as follows.
/
\A # match the beginning of the string
\-? # optionally match '-'
\d{1,3} # match between 1 and 3 digits inclusively
(?: # begin a non-capture group
,\d{3} # match a comma followed by 3 digits
)* # end the non-capture group and execute 0 or more times
(?: # begin a non-capture group
\.\d+ # match a period followed by one or more digits
)? # end the non-capture and make it optional
\z # match the end of the string
/
To make the test more robust we could use the methods Kernel::Float, Kernel::Rational and Kernel::Complex, all with the optional argument :exception set to false.
arr = ["Total", "9,999.00", " ", "61.4.5", "23e4", "-234.7e-2", "1+2i",
"3/4", "|", "chain", "-4,948", "3,25.61", "1,234,567.899", "10"]
arr.select { |s| s.match?(rgx) || Float(s, exception: false) ||
Rational(s, exception: false) || Complex(s, exception: false) }
#=> ["9,999.00", "23e4", "-234.7e-2", "1+2i", "3/4", "-4,948",
# "1,234,567.899", "10"]
Note that "23e4", "-234.7e-2", "1+2i" and "3/4" are respectively the string representations of an integer, float, complex and rational number.
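If the goal is to get numeric values rather than just filter the strings, the same checks can return the converted number. This is only a sketch: parse_number is a hypothetical helper, the exception: keyword assumes Ruby 2.6 or later, and comma-grouped strings are converted to Float after the commas are removed.

```ruby
# Same comma-grouping pattern as rgx above.
GROUPED_NUMBER = /\A-?\d{1,3}(?:,\d{3})*(?:\.\d+)?\z/

# Hypothetical helper: returns a Float, Rational or Complex, or nil when the
# string is not recognised as a number.
def parse_number(str)
  return Float(str.delete(",")) if str.match?(GROUPED_NUMBER)

  Float(str, exception: false) ||
    Rational(str, exception: false) ||
    Complex(str, exception: false)
end

%w[9,999.00 23e4 3/4 1+2i 61.4.5 Total].each do |s|
  puts "#{s.inspect} => #{parse_number(s).inspect}"
end
```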
I am trying to parse all money from a string. For example, I want to extract:
['$250,000', '$3.90', '$250,000', '$500,000']
from:
'Up to $250,000………………………………… $3.90 Over $250,000 to $500,000'
The regex:
\$\ ?(\d+\,)*\d+(\.\d*)?
seems to match all money expressions as in this link. However, when I try to scan in Ruby, it fails to give me the desired result.
s # => "Up to $250,000 $3.90 Over $250,000 to $500,000, add$3.70 Over $500,000 to $1,000,000, add..$3.40 Over $1,000,000 to $2,000,000, add...........$2.25\nOver $2,000,000 add ..$2.00"
r # => /\$\ ?(\d+\,)*\d+\.?\d*/
s.scan(r)
# => [["250,"], [nil], ["250,"], ["500,"], [nil], ["500,"], ["000,"], [nil], ["000,"], ["000,"], [nil], ["000,"], [nil]]
From String#scan docs, it looks like this is because of the group. How can I parse all the money in the string?
Let's look at your regular expression, which I'll write in free-spacing mode so I can document it:
r = /
\$ # match a dollar sign
\ ? # optionally match a space (has no effect)
( # begin capture group 1
\d+ # match one or more digits
, # match a comma (need not be escaped)
)* # end capture group 1 and execute it >= 0 times
\d+ # match one or more digits
\.? # optionally match a period
\d* # match zero or more digits
/x # free-spacing regex definition mode
In non-free-spacing mode this would be written as follows.
r = /\$ ?(\d+,)*\d+\.?\d*/
When a regex is defined in free-spacing mode all spaces are stripped out before the regex is evaluated, which is why I had to escape the space. That's not necessary when the regex is not defined in free-spacing mode.
There is no need to match a space after the dollar sign anywhere in the string, so \ ? should be removed. Suppose now we have
r = /\$\d+\.?\d*/
"$2.31 cat $44. dog $33.607".scan r
#=> ["$2.31", "$44.", "$33.607"]
That works, but it is questionable whether you want to match values that do not have exactly two digits after the decimal point.
Now write
r = /\$(\d+,)*\d+\.?\d*/
"$2.31 cat $44. dog $33.607".scan r
#=> [[nil], [nil], [nil]]
To see why this result was obtained, examine the doc for String#scan, specifically the last sentence of the first paragraph: "If the pattern contains groups, each individual result is itself an array containing one entry per group."
We can avoid that problem by changing the capture group to a non-capture group:
r = /\$(?:\d+,)*\d+\.?\d*/
"$2.31 cat $44. dog $33.607".scan r
#=> ["$2.31", "$44.", "$33.607"]
Now consider this:
"$2,241.31 cat $1,2345. dog $33.607".scan r
#=> ["$2,241.31", "$1,2345.", "$33.607"]
which is still not quite right. Try the following.
r = /
\$ # match a dollar sign
\d{1,3} # match one to three digits
(?:,\d{3}) # match ',' then 3 digits in a nc group
* # execute the above nc group >=0 times
(?:\.\d{2}) # match '.' then 2 digits in a nc group
? # optionally match the above nc group
(?![\d,.]) # no following digit, ',' or '.'
/x # free-spacing regex definition mode
"$2,241.31 $2 $1,234 $3,6152 $33.607 $146.27".scan r
#=> ["$2,241.31", "$2", "$1,234", "$146.27"]
(?![\d,.]) is a negative lookahead.
In normal mode this regular expression is written as follows.
r = /\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?(?![\d,.])/
The following erroneous result would be obtained without the negative lookahead at the end of the regex.
r = /\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?/
"$2,241.31 $2 $1,234 $3,6152 $33.607 $146.27".scan r
#=> ["$2,241.31", "$2", "$1,234", "$3,615", "$33.60",
# "$146.27"]
[3] pry(main)> str = <<EOF
[3] pry(main)* Up to $250,000………………………………… $3.90 Over $250,000 to $500,000, add………………$3.70 Over $500,000 to $1,000,000, add……………..$3.40 Over $1,000,000 to $2,000,000, add……...........$2.25
[3] pry(main)* Over $2,000,000 add …..………………………$2.00
[3] pry(main)* EOF
=> "Up to $250,000………………………………… $3.90 Over $250,000 to $500,000, add………………$3.70 Over $500,000 to $1,000,000, add……………..$3.40 Over $1,000,000 to $2,000,000, add……...........$2.25\nOver $2,000,000 add …..………………………$2.00\n"
[4] pry(main)> str.scan /\$\d+(?:[,.]\d+)*/
=> ["$250,000", "$3.90", "$250,000", "$500,000", "$3.70", "$500,000", "$1,000,000", "$3.40", "$1,000,000", "$2,000,000", "$2.25", "$2,000,000", "$2.00"]
[5] pry(main)>
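The pattern in the session above is deliberately permissive: it accepts any run of digits joined by commas or periods. That is fine for well-formed input like the question's, but it will also accept amounts with misplaced separators. A small sketch of the trade-off follows; the constant names are mine, and the stricter pattern is the one from the other answer in this thread.

```ruby
PERMISSIVE_MONEY = /\$\d+(?:[,.]\d+)*/
GROUPED_MONEY    = /\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?(?![\d,.])/

AMOUNTS = "$2,241.31 $2 $1,234 $3,6152 $33.607 $146.27"

p AMOUNTS.scan(PERMISSIVE_MONEY) # keeps "$3,6152" and "$33.607"
p AMOUNTS.scan(GROUPED_MONEY)    # drops them
```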
What I'm trying to accomplish
I'm trying to create a function to use string interpolation within VBA. The issue I'm having is that I'm not sure how to replace "\n" with a vbNewLine, as long as it does not have the escape character "\" before it?
What I have found and tried
As far as I can tell from my research, the VBScript regex engine does not support negative lookbehind.
Below has two examples of Patterns that I have already tried:
Private Sub testingInjectFunction()
Dim dict As New Scripting.Dictionary
dict("test") = "Line"
Debug.Print Inject("${test}1\n${test}2 & link: C:\\notes.txt", dict)
End Sub
Public Function Inject(ByVal source As String, dict As Scripting.Dictionary) As String
Inject = source
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
' PATTERN # 1 REPLACES ALL '\n'
'regEx.Pattern = "\\n"
' PATTERN # 2 REPLACES EXTRA CHARACTER AS LONG AS IT IS NOT '\'
regEx.Pattern = "[^\\]\\n"
' REGEX REPLACE
Inject = regEx.Replace(Inject, vbNewLine)
' REPLACE ALL '${dICT.KEYS(index)}' WITH 'dICT.ITEMS(index)' VALUES
Dim index As Integer
For index = 0 To dict.Count - 1
Inject = Replace(Inject, "${" & dict.Keys(index) & "}", dict.Items(index))
Next index
End Function
Desired result
Line1
Line2 & link: C:\notes.txt
Result for Pattern # 1: (Replaces when not wanted)
Line1
Line2 & link: C:\
otes.txt
Result for Pattern # 2: (Replaces the 1 in 'Line1')
Line
Line2 & link: C:\\notes.txt
Summary question
I can easily write code that doesn't use Regular Expressions that can achieve my desired goal but want to see if there is a way with Regular Expressions in VBA.
How can I use Regular Expressions in VBA to Replace "\n" with a vbNewLine, as long as it does not have the escape character "\" before it?
Yes, you may use a regex here. Since the backslash is not used to escape itself in these strings, you may modify your solution like this:
regEx.Pattern = "(^|[^\\])\\n"
S = regEx.Replace(S, "$1" & vbNewLine)
It will match and capture any char but \ before \n and then will put it back with the $1 placeholder. As there is a chance that \n appears at the start of the string, ^ - the start of string anchor - is added as an alternative into the capturing group.
Pattern details
(^|[^\\]) - Capturing group 1: start of string (^) or (|) any char but a backslash ([^\\])
\\ - a backslash
n - an n character.
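To make the behaviour of the capture-and-restore pattern easier to check, here is a sketch of the same idea ported to Ruby (not VBA, and not using the VBScript.RegExp object). Two things it illustrates, as far as I can tell: the pattern leaves the doubled backslash in C:\\notes.txt as it is, and in a back-to-back \n\n the second \n is skipped because the character before it was consumed by the previous match. The single-pass unescape helper is my own suggestion for handling both escapes, and all names are hypothetical.

```ruby
# Ruby port of the pattern above: capture the preceding character (or the
# start of the string) and put it back in front of the newline.
def replace_unescaped_newlines(source)
  source.gsub(/(\A|[^\\])\\n/) { "#{Regexp.last_match(1)}\n" }
end

# Suggested alternative: resolve both escapes in a single left-to-right pass.
# A backslash followed by n becomes a newline; a doubled backslash becomes one.
def unescape(source)
  source.gsub(/\\([\\n])/) { Regexp.last_match(1) == "n" ? "\n" : "\\" }
end

# Single-quoted, so \n is a backslash plus n, and \\\\ is two backslashes.
input = 'Line1\nLine2 & link: C:\\\\notes.txt'

puts replace_unescaped_newlines(input)
puts unescape(input)
```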
Here is the string, a full example:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
I'd like to remove -EVERYTHING- from it except: facebook:jens.pettersson.7568
(the username slot)
And where facebook:jens.pettersson.7568 is actually 'facebook:jens.pettersson.7568', I'd like it to appear as:
facebook:jens.pettersson.7568 (see the white space there?)
Then sort my list where all 361k lines line up like so:
x x xx xcx xzx xyx xtz
All with spaces, in technically 1 line, if possible.
Or, if removing everything else and just collecting the one field I need would suffice, I could do the sorting manually, I suppose.
I'm going to read between the lines and guess that what you want is this:
BEFORE:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
^ this is username
AFTER:
facebook:humpy_electro
You could handle that with the following regex:
s/(?:[^,]*,){4}[\s'"]*([^'",]*).*/facebook:$1, /
i.e.
(?: # begin non-capturing group
[^,]*, # zero or more non-comma characters, followed by a comma
){4} # end non-capturing group, and repeat 4 times
# this skips the first 4 columns of data
[\s'"]* # matches any whitespace and the first quote
( # begin capturing group 1
[^'",]* # capture all non-comma characters until the end quote
) # end capturing group 1
.* # match rest of line
# REPLACE WITH
facebook: # literal text
$1 # capturing group 1
, # comma and a trailing space (not shown here)
And voila.
This turns this:
('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
('1416851067', '1416851067', '174.66.174.101', '98.148.244.151', 'maihym', 393885, '1473193487'),
('1416851094', '1416851094', '2607:5300:60:6097::', '92.157.2.230', 'xeosse26', 393886, '737537953'),
Into this
facebook:humpy_electro, facebook:lagbugdc, facebook:maihym, facebook:xeosse26,
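If you would rather script this than run it in an editor, the same pattern can be applied line by line. The following is a rough sketch in Ruby, assuming the username is always the fifth comma-separated field. The names usernames and FIFTH_COLUMN are made up, and the sort and the space-separated join follow what the question asked for.

```ruby
# Two rows copied from the question.
ROWS = <<~'DATA'
  ('1416851046', '1416851046', '2607:5300:60:6097::', '80.187.100.105', 'lagbugdc', 393884, '737537953'),
  ('1416851040', '1416851040', '50.62.177.118', '84.161.97.189', 'humpy_electro', 393883, '385962628'),
DATA

# Skip four comma-terminated fields, skip whitespace and the opening quote,
# then capture everything up to the closing quote or the next comma.
FIFTH_COLUMN = /(?:[^,]*,){4}[\s'"]*([^'",]*)/

# Hypothetical helper: one username per line that has at least five fields.
def usernames(text)
  text.each_line.map { |line| line[FIFTH_COLUMN, 1] }.compact
end

puts usernames(ROWS).sort.join(" ")
```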
I got it, from a friend. It was a two-step process. First step: replace ^((.*? '){4}) with nothing. Second step: replace '((.*?$){1}) with nothing.
I have a CSV file (exported data from iWork Numbers) which contains a list of users with information. What I want to do is to replace ;;;;;;;;; with ; on all lines except "Last login".
By doing so and importing the file to Numbers again the data will (hopefully) be divided in rows like this:
User 1 | Points: 1 | Registered: 2012-01-01 | Last login 2012-02-02
User 2 | Points: 2 | Registered: 2012-01-01 | Last login 2012-02-02
How the CSV file looks:
;User1;;;;;;;;;
;Points: 1;;;;;;;;;
;Registered: 2012-01-01;;;;;;;;;
;Last login: 2012-02-02;;;;;;;;;
;User2;;;;;;;;;
;Points: 2;;;;;;;;;
;Registered: 2012-01-01;;;;;;;;;
;Last login: 2012-02-02;;;;;;;;;
So my question is what Regex code should I type in the Find and Replace fields?
Thanks in advance!
See the regex in action:
Find : ^(;(?!Last).*)(;{9})
Replace: $1;
Output will be:
;User1;
;Points: 1;
;Registered: 2012-01-01;
;Last login: 2012-02-02;;;;;;;;;
;User2;
;Points: 2;
;Registered: 2012-01-01;
;Last login: 2012-02-02;;;;;;;;;
Explanation
Find:
^ # Match start of the line
( # Start of the 1st capture group
;(?!Last) # Match a semicolon (;), only if not followed by 'Last' word.
.* # Match everything
) # End of the 1st capture group
( # Start of the 2nd capture group
;{9} # Match exactly 9 semicolons
) # End of the 2nd capture group
Replace:
$1; # Leave 1st capture group as is and append a semicolon.
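For reference, the same find and replace can be tried outside the editor. This is a minimal sketch in Ruby, where ^ already anchors at the start of every line and the replacement uses \1 instead of $1. The helper name is hypothetical, and the second capture group is dropped because the replacement never uses it.

```ruby
SAMPLE_CSV = <<~DATA
  ;User1;;;;;;;;;
  ;Points: 1;;;;;;;;;
  ;Registered: 2012-01-01;;;;;;;;;
  ;Last login: 2012-02-02;;;;;;;;;
DATA

# Hypothetical helper: on every line that does not start with ";Last",
# replace the final run of nine semicolons with a single one.
def collapse_trailing_semicolons(text)
  text.gsub(/^(;(?!Last).*);{9}/, '\1;')
end

puts collapse_trailing_semicolons(SAMPLE_CSV)
```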