I have a text doc with about 9000 lines. The data is alphanumeric. Within the doc, there are approximately 150 lines I need to identify. The only common factor is that each contains a dollar amount. I've tried multiple regex searches and just can't get it right.
INVALID PAYMENT AMT
013 1887000 CRJ 0.00 03/04/2015-01222015 - Code 938
INVALID PAYMENT AMT
019 0 ,CRJ 426.72 03/06/2015-01282015 - Code 628
In the example above, I need to bookmark the line containing 426.72. I don't care about the other 3 lines. Every line I need in the document has a positive dollar amount.
Perhaps:
(([1-9][0-9]*)\.([0-9]*[1-9][0-9]|00)*)|(0\.([0-9]*[1-9][0-9]))
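A minimal Python sketch of the same idea, assuming every amount you care about is written with exactly two decimal places. It swaps in a simpler pattern of my own rather than the one above: a nonzero whole part followed by two decimals, or 0. followed by two decimals that are not both zero. The helper name lines_with_positive_amount is hypothetical.

```python
import re

# Assumption: amounts always have exactly two decimals, and 0.00 is not "positive".
# Simplified alternative pattern, not the one from the answer above.
POSITIVE_AMOUNT = re.compile(r"\b(?:[1-9]\d*\.\d{2}|0\.(?:0[1-9]|[1-9]\d))\b")


def lines_with_positive_amount(lines):
    """Return (line_number, line) pairs whose text contains a positive amount."""
    return [
        (n, line)
        for n, line in enumerate(lines, start=1)
        if POSITIVE_AMOUNT.search(line)
    ]


sample = [
    "INVALID PAYMENT AMT",
    "013 1887000 CRJ 0.00 03/04/2015-01222015 - Code 938",
    "INVALID PAYMENT AMT",
    "019 0 ,CRJ 426.72 03/06/2015-01282015 - Code 628",
]

for n, line in lines_with_positive_amount(sample):
    print(n, line)
```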
Numbers under 1 are currently displayed with a leading zero before the decimal point (example: 0.50). Because I'm working with baseball statistics (which almost never show the zero before the decimal point), I would like to remove it. I want to keep the digit before the decimal point if the number is 1 or greater, though. How would I do that?
For instance, I'm working with this measure. Is there something I can add to it?
AVG = SUM(Batter[H])/sum(Batter[AB])
Thanks. I appreciate the help.
Here is some sample data:
Name AB H
Gleyber Torres 546 152
Brett Gardner 491 123
Aaron Judge 378 103
Adam Ottavino 0 0
Aroldis Chapman 0 0
The NaN error occurs because you are dividing by 0 for players with no at-bats. You should add an IF condition to avoid that:
AVG = IF(sum(Batter[AB])=0,BLANK(),SUM(Batter[H])/sum(Batter[AB]))
To tackle the formatting issue, you can use the FORMAT function as mentioned by Andrey:
AVG = IF(sum(Batter[AB])=0,BLANK(),FORMAT(SUM(Batter[H])/sum(Batter[AB]),"###.0#"))
Hope this helps.
Unfortunately, it isn't directly possible. However, in the last step (the visualization of the data), you can convert the decimal number to text and format it as you want. For example, your measure could be like this:
AVG = FORMAT(SUM(Batter[H])/SUM(Batter[AB]), "#,###.00")
This will give you 2 decimal places (0 means a digit is always displayed at that position), while the digits before the decimal point are optional (# displays a digit but omits leading zeros), so 0.278 is shown as .28.
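Outside Power BI, the display rule being asked for is easy to illustrate. The following is a rough Python sketch, not DAX: format_avg is a hypothetical helper that returns an empty string when there are no at-bats (standing in for BLANK()) and drops the leading zero only for values below 1. It uses three decimals, the usual convention for batting averages, rather than the two in the format string above.

```python
def format_avg(hits, at_bats, places=3):
    """Format hits / at_bats like a batting average, e.g. .278 instead of 0.278."""
    if at_bats == 0:
        return ""  # plays the role of BLANK() in the DAX measure
    text = f"{hits / at_bats:.{places}f}"
    # Drop the leading zero only when the value is below 1 (e.g. "0.278" -> ".278").
    return text[1:] if text.startswith("0.") else text


batters = [
    ("Gleyber Torres", 546, 152),
    ("Brett Gardner", 491, 123),
    ("Aaron Judge", 378, 103),
    ("Adam Ottavino", 0, 0),
    ("Aroldis Chapman", 0, 0),
]

for name, ab, h in batters:
    print(name, format_avg(h, ab))
```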
I want to mask phone numbers in a resume that also contains dates in the form 2001 and 2001-03, and percentages such as 45%, 87%, 78.45% and 56.5%.
I only want to mask the phone numbers, and I don't need to mask them completely: masking 3 or 4 digits, so that the number is hard to guess, does the job. Kindly help me out.
The phone number formats are:
9876543210
98765 43210
98765-43210
9876 543 210
9876-543-210
Here is my answer:
(([0-9][- ]*){5})(([0-9][- ]*){5})
It matches exactly 10 digits, each optionally followed by hyphens or spaces.
After that, you can replace the first or the third group with ***** or anything you like.
For example:
$1*****
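A short Python sketch of this answer. Python's re.sub writes group references as \1 or \g<1> instead of $1; mask_last_five is a hypothetical name.

```python
import re

# The answer's pattern: 5 digits (group 1), then 5 more digits (group 3),
# each digit optionally followed by hyphens or spaces.
PHONE = re.compile(r"(([0-9][- ]*){5})(([0-9][- ]*){5})")


def mask_last_five(text):
    # Keep group 1 and replace everything matched by group 3 with asterisks.
    return PHONE.sub(r"\g<1>*****", text)


for sample in ["9876543210", "98765 43210", "98765-43210", "9876 543 210", "9876-543-210"]:
    print(mask_last_five(sample))
```

Because [- ]* is greedy, a space or hyphen right after the tenth digit belongs to group 3 and is replaced as well, so "98765 43210 now" becomes "98765 *****now".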
\d{4,5}[ -]?\d{3}[ -]?\d{2,3}
Strings matched:
9876543210, 98765 43210, 98765-43210, 9876 543 210, 9876-543-210
Strings not matched:
45% 87% 78.45% 56.5%
2001, 2001-03
I feel that a more complicated regex that doesn't match invalid phone numbers is not required since the requirement is to mask valid phone numbers of the above format.
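The matched and not-matched lists can be checked quickly in Python. This sketch uses fullmatch for the phone formats, so the whole string has to match, and search for the other strings, so that even a partial match would be reported:

```python
import re

PHONE = re.compile(r"\d{4,5}[ -]?\d{3}[ -]?\d{2,3}")

should_match = ["9876543210", "98765 43210", "98765-43210", "9876 543 210", "9876-543-210"]
should_not_match = ["45%", "87%", "78.45%", "56.5%", "2001", "2001-03"]

for s in should_match:
    print(s, bool(PHONE.fullmatch(s)))
for s in should_not_match:
    print(s, bool(PHONE.search(s)))
```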
Python code:
import re
def fun(m):
    # Replace the first 4-5 digits (group 1) with asterisks and keep the rest (group 2)
    return '*' * len(m.group(1)) + m.group(2)
string = "Resume of candidate abcd. His phone numbers are : 9876543210, 98765 43210, 98765-43210.Date of birth of the candidate is 23-10-2013. His percentage is 57%. One more number 9876 543 213 His percentage in grad school is 44%. Another number 9876-543-210"
re.sub(r'(\d{4,5})([ -]?\d{3}[ -]?\d{2,3})', fun, string)
Output:
'Resume of candidate abcd. His phone numbers are : *****43210, ***** 43210, *****-43210.Date of birth of the candidate is 23-10-2013. His percentage is 57%. One more number **** 543 213 His percentage in grad school is 44%. Another number ****-543-210'
More about re.sub:
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function.
Just to help you on your way: I would use Python to do this.
Use the re module to search for number-like strings:
import re
num_re = re.compile(r'[0-9 -]{5,}')
with open('/my/file', 'r') as f:
    for l in f:
        for s in num_re.findall(l):
            # Do some additional testing here, like 'not starting with' or anything else
            l = l.replace(s, '!!!MASKED!!!')  # strings are immutable, so keep the result
        print(l, end='')
I'm not saying that this code is finished, but it should help you on your way.
By the way, here is why I would use this approach:
You can easily add any tests you like to fix false positives.
It's readable.
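As an example of such an extra test, here is a sketch that keeps the same broad candidate pattern but only masks candidates containing exactly 10 digits. The 10-digit rule and the is_phone_like and mask_line helpers are assumptions for illustration, chosen to fit the formats in the question:

```python
import re

# Same broad candidate pattern as above: runs of 5+ digits, spaces or hyphens.
NUM_RE = re.compile(r"[0-9 -]{5,}")


def is_phone_like(candidate):
    # Hypothetical extra test: accept only candidates with exactly 10 digits,
    # which covers every format listed in the question.
    return sum(ch.isdigit() for ch in candidate) == 10


def mask_line(line):
    def repl(m):
        s = m.group(0)
        if not is_phone_like(s):
            return s  # false positive (date, run of spaces, ...): leave it alone
        core = s.strip()  # keep the surrounding spaces, mask only the number
        return s.replace(core, "!!!MASKED!!!")
    return NUM_RE.sub(repl, line)


print(mask_line("Call 98765 43210 now, born 2001-03."))
```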
I'm trying to use the regex from this site
/^([+]39)?((38[{8,9}|0])|(34[{7-9}|0])|(36[6|8|0])|(33[{3-9}|0])|(32[{8,9}]))([\d]{7})$/
for Italian mobile phone numbers, but a simple number such as 3491234567 is reported as invalid.
(I don't care about spaces, as I'll trim them.)
Should pass:
349 1234567
+39 349 1234567
TODO: 0039 349 1234567
TODO: (+39) 349 1234567
TODO: (0039) 349 1234567
regex101 and RegExr both pass the validation... what's wrong?
UPDATE:
To clarify:
The regex should match any number that starts with either
388/389/380 (38[{8,9}|0])|
or
347/348/349/340 (34[{7-9}|0])|
or
366/368/360 (36[6|8|0])|
or
333/334/335/336/337/338/339/330 (33[{3-9}|0])|
328/329 (32[{8,9}])
plus 7 digits ([\d]{7})
and the +39 at the start optionally ([+]39)?
The following regex appears to fulfill your requirements. I took out the syntax errors and guessed a bit, and added the missing parts to cover your TODO comments.
^(\((00|\+)39\)|(00|\+)39)?(38[890]|34[7-90]|36[680]|33[3-90]|32[89])\d{7}$
Demo: https://regex101.com/r/yF7bZ0/1
Your test cases fail to cover many of the variations captured by the regex; perhaps you'll want to beef up the test set to make sure it does what you want.
The beginning allows for an optional international prefix with or without the parentheses. The basic pattern is (00|\+)39 and it is repeated with or without parentheses around it. (Perhaps a better overall approach would be to trim parentheses and punctuation as well as whitespace before processing begins; you'll want to keep the plus as significant, of course.)
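Following the suggestion to beef up the test set, a minimal Python harness for this regex. It strips spaces first, as the question says it will, and adds two invented negative cases; the list names are hypothetical:

```python
import re

# The pattern from the answer above, unchanged.
ITALIAN_MOBILE = re.compile(
    r"^(\((00|\+)39\)|(00|\+)39)?(38[890]|34[7-90]|36[680]|33[3-90]|32[89])\d{7}$"
)


def normalize(number):
    # The question trims spaces before validating; do the same here.
    return number.replace(" ", "")


should_pass = ["349 1234567", "+39 349 1234567", "0039 349 1234567",
               "(+39) 349 1234567", "(0039) 349 1234567"]
should_fail = ["320 1234567",  # 320 is not one of the listed prefixes
               "349 123456"]   # only 6 digits after the prefix

for number in should_pass + should_fail:
    print(number, bool(ITALIAN_MOBILE.match(normalize(number))))
```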
Updated with information from Edoardo's answer; wrapped for legibility and added comments:
^ # beginning of line
(\((00|\+)39\)|(00|\+)39)? # country code +39 or 0039, with or without parentheses
( # followed by one of the following
32[89]| # 328 or 329
33[013-9]| # 33x where x != 2
34[04-9]| # 34x where x not in 1,2,3
35[01]| # 350 or 351
36[068]| # 360 or 366 or 368
37[019]| # 370 or 371 or 379
38[089]) # 380 or 388 or 389
\d{6,7} # ... followed by 6 or 7 digits
$ # and end of line
There are obvious accidental gaps which will probably also get filled over time. Generalizing this further is likely to improve resilience toward future changes, but of course may at the same time increase the risk of false positives. Make up your mind about which is worse.
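Because this commented layout is what Python's re.VERBOSE mode expects (whitespace is ignored and # starts a comment), it can be compiled almost verbatim. Note that every alternative except the last needs a trailing |, including 37[019]; without it, the 37x and 38x lines would be concatenated into a single alternative. A sketch with a few invented test numbers:

```python
import re

ITALIAN_MOBILE_V = re.compile(r"""
    ^                            # beginning of string
    (\((00|\+)39\)|(00|\+)39)?   # optional +39 or 0039, with or without parentheses
    (                            # followed by one of the following
        32[89]|                  # 328 or 329
        33[013-9]|               # 33x where x != 2
        34[04-9]|                # 34x where x not in 1, 2, 3
        35[01]|                  # 350 or 351
        36[068]|                 # 360, 366 or 368
        37[019]|                 # 370, 371 or 379
        38[089])                 # 380, 388 or 389
    \d{6,7}                      # followed by 6 or 7 digits
    $                            # end of string
""", re.VERBOSE)

for number in ["3511234567", "+393701234567", "3321234567", "3201234567"]:
    print(number, bool(ITALIAN_MOBILE_V.match(number)))
```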
I found this and updated it with the new operators and MVNO prefixes (Iliad, ho.):
^(\((00|\+)39\)|(00|\+)39)?(38[890]|34[4-90]|36[680]|33[13-90]|32[89]|35[01]|37[019])\d{6,7}$
I improved the regex by adding cases to handle spaces between the digit groups and after the prefix:
^((\((00|\+)39\)|(00|\+)39)\s?)?(38[890]|34[4-90]|36[680]|33[13-90]|32[89]|35[01]|37[019])(\s?\d{3}\s?\d{3,4}|\d{6,7})$
so, for example, it matches phone numbers like (0039) 349 123 4567 or 349 123 4567.
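A quick Python check of the spaced version. This sketch assumes an optional \s? directly after the international prefix, inside the optional prefix group; without it, "(0039) 349 123 4567" (with a space after the parentheses) would not match. The last test string is an invented negative case:

```python
import re

ITALIAN_MOBILE_SPACED = re.compile(
    r"^((\((00|\+)39\)|(00|\+)39)\s?)?"                            # optional prefix, optional space after it
    r"(38[890]|34[4-90]|36[680]|33[13-90]|32[89]|35[01]|37[019])"  # operator code
    r"(\s?\d{3}\s?\d{3,4}|\d{6,7})$"                               # subscriber number, optionally split
)

for number in ["(0039) 349 123 4567", "349 123 4567", "+39 3491234567", "349 12 34567"]:
    print(number, bool(ITALIAN_MOBILE_SPACED.match(number)))
```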
Based on this document:
https://it.qaz.wiki/wiki/Telephone_numbers_in_Italy
A simple regex for Italian mobile numbers without special characters is:
/^3[0-9]{8,9}$/
It matches a string starting with the digit 3 followed by 8 or 9 more digits, e.g.:
3345678103
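You can then add an Italian prefix such as '+39 ' or '0039 '. Note that the + must be escaped and the zeros written as plain digits (\0 would start an escape sequence):
/^\+39 3[0-9]{8,9}$/ --- match --> +39 3345678103
/^0039 3[0-9]{8,9}$/ --- match --> 0039 3345678103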
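The same three checks in Python, as a minimal sketch. In these patterns the + is escaped and the 0039 prefix is written as plain digits, since an unescaped ^+ has nothing to repeat and \0 starts an escape sequence rather than matching a literal zero:

```python
import re

MOBILE = re.compile(r"^3[0-9]{8,9}$")              # 3 followed by 8 or 9 digits
MOBILE_PLUS39 = re.compile(r"^\+39 3[0-9]{8,9}$")  # "+" escaped to match literally
MOBILE_0039 = re.compile(r"^0039 3[0-9]{8,9}$")    # plain digits, no backslash

print(bool(MOBILE.match("3345678103")))
print(bool(MOBILE_PLUS39.match("+39 3345678103")))
print(bool(MOBILE_0039.match("0039 3345678103")))
```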
I have a text where I need to find 3 groups of strings.
I tried the expression \r?\n\r?\n\r?[0-9A-Z].*\d{7}, but it finds only 2 strings instead of 3.
I need to highlight 00170784, HEDINV and 00173575, but I only get 00170784 and 00173575.
This is the text:
BUY
USM4
200 contracts
04/28/2014 15:50
00170784
56
contracts
HEDINV
64
contracts
00173575
80
contracts
At average price of USD 134.375
SELL
USM4
200 contracts
04/28/2014 15:50
00170784
56
contracts
HEDINV
64
contracts
00173575
80
contracts
At average price of USD 134.5938
May I suggest using this instead?
^\d{8}$|^[A-Z]{6}$
It has two alternatives: one matches an 8-digit sequence that makes up a whole line, the other a 6-letter uppercase sequence that makes up a whole line (this relies on ^ and $ matching at the start and end of each line, i.e. multiline mode). That grabs what you're looking for, unless there's a specific reason you're using all those line-break matches.
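Outside an editor, ^ and $ match at line boundaries only in multiline mode. A Python sketch on the first block of the sample text, with re.MULTILINE turned on:

```python
import re

text = """BUY
USM4
200 contracts
04/28/2014 15:50
00170784
56
contracts
HEDINV
64
contracts
00173575
80
contracts
At average price of USD 134.375"""

# re.MULTILINE makes ^ and $ match at the start and end of every line.
whole_line_ids = re.compile(r"^\d{8}$|^[A-Z]{6}$", re.MULTILINE)
print(whole_line_ids.findall(text))
```

If the text is read from a file with Windows line endings, opening it in text mode (the default) turns \r\n into \n; otherwise $ would not match before the \r.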
I have some large text files containing data like below:
2.086
0.019
2.181
0.004
2.308
0.005
2.165
0.023
2.113
0.004
2.022
0.005
0.013
0.033
0.005
0.026
0.009
0.037
I would like to select every 13th line and swap consecutive lines with one another up to the 18th line (13 with 14, 15 with 16, 17 with 18). The required output should look like this:
2.086
0.019
2.181
0.004
2.308
0.005
2.165
0.023
2.113
0.004
2.022
0.005
0.033
0.013
0.026
0.005
0.037
0.009
I was trying to construct a suitable regex for this operation, but I'm not sure how to start with selecting the 13th to 18th lines.
EDIT: After discussion in comments and in chat, here is the solution first. The explanation follows.
Search: (([^\n]*\n){12})((?2))((?2))((?2))((?2))((?2))((?2))
Replace: \1\4\3\6\5\8\7
Explanation
Here is the general recipe for selecting 5 lines:
(?:[^\n]*\n){5}
The [^\n]*\n selects any number of non-newline characters, followed by a newline character. We do that 5 times. You can add anchors, lookarounds and so on to do more.
For instance, this regex swaps the first 3 lines with the next 5:
\A((?:[^\n]*\n){3})((?:[^\n]*\n){5})
The key to understand here is that the groups of lines are captured to Groups 1 and 2, by enclosing each expression in capturing parentheses. In the replacement, these groups can be referenced by number: \1 for Group 1, and so on, so replacing with \2\1 swaps the two blocks.
You want to target the 13th line? Match 12, then 1. You want to do something with further lines? Add lines, and switch the capture groups around as needed.
\A((?:[^\n]*\n){12})((?:[^\n]*\n))((?:[^\n]*\n){2})
In this example, the first group of parentheses captures 12 lines, the next captures the 13th line, the next captures the following two lines. You can switch the 13th and the following two lines by rearranging the groups in the replacement: \1\3\2
You want to do this multiple times in a file? Don't anchor. Here is a demo that swaps every 5th line with lines 6 and 7.
Just adjust to your needs.
EDIT
Based on your comments, this should do exactly what you need.
Search: ((?:[^\n]*\n){12})((?:[^\n]*\n))((?2))((?2))((?2))((?2))((?2))
Replace: \1\3\2\5\4\7\6
Same idea. The ((?2)) is a way to avoid repeating the same regex over and over (the second group of capturing parentheses, which captures one line). (?2) says "repeat the expression in Group 2", and the extra parentheses put these expressions into groups 3, 4, 5, 6, 7.
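Python's built-in re module does not support PCRE-style subroutine calls such as (?2), so a Python sketch of the edited search/replace has to spell out the one-line group six times. It uses the 18 sample lines from the question:

```python
import re

# One line = any run of non-newline characters followed by a newline.
LINE = r"[^\n]*\n"

# Python's built-in re module has no (?2) subroutine calls, so the one-line
# group is written out six times: group 1 = lines 1-12, groups 2-7 = lines 13-18.
SWAP = re.compile(rf"((?:{LINE}){{12}})" + rf"({LINE})" * 6)

values = ["2.086", "0.019", "2.181", "0.004", "2.308", "0.005",
          "2.165", "0.023", "2.113", "0.004", "2.022", "0.005",
          "0.013", "0.033", "0.005", "0.026", "0.009", "0.037"]
text = "\n".join(values) + "\n"  # the last line must end in a newline too

# Same replacement as in the answer: swap 13/14, 15/16 and 17/18.
swapped = SWAP.sub(r"\1\3\2\5\4\7\6", text)
print(swapped, end="")
```

The pattern is not anchored, so in a longer file it is applied again to each following block of 18 lines, as with the editor version.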