Regex Hero results differ from vb.net

Regex Hero results differ from vb.net - regex

Hi I am using Regex Hero to make a Regex. It worked as expected in Regex Hero. I then transferred it over to vb.net and now I get different results from the same exact data. I don't get it!
The Regex:
\d{10,13}.+?(?=(\bF\b|\bT\b|\bCT\b))
The .net code:
Dim strRegex As String = "\d{10,13}.+?(?=(\bF\b|\bT\b|\bCT\b))"
Dim myRegex As New Regex(strRegex, RegexOptions.None)
Dim strTargetString As String = a
For Each myMatch As Match In myRegex.Matches(strTargetString)
If myMatch.Success Then
RichTextBox1.AppendText(myMatch.Value & Environment.NewLine)
End If
Next
The Data:
" The Meijer Team appreciates your business 12/26/14 Your fast and friendly checkout was provided by ALARIA MEIJER SAVINGS SPECIALS 4.77 COUPONS 20.00 SAVINGS TOTAL 24.77 YOUR TOTAL SAVINGS SINCE 01/01/14 1,814.62 For additional savings and rewards visit mPerks.com GENERAL MERCHANDISE 7569107330 DYNO GEL THIMBL 1.49 CT 7569100487 DYNO THIMBLES 3.49 CT DRUGSTORE 2220094152 DEODORANT 1.99 T 30041667803 TOOTHBRUSH 9.99 T *70882049496 ORBIT TOOTHB was 3.69 now 2.95 T *1700006806 ANTIPERSPIRANT 1 # 2 / 6.00 was 3.99 now 3.00 T GROCERY 6414404213 CHEF BOYARDEE 2 # 1.07 2.14 F 6414404306 CHEF BOYARDEE 1.07 F 6414404315 CHEF BOYARDEE 1.07 F 6414404322 CHEF BOYARDEE 1.07 F 7680828008 SPAGHETTI 2 # 1.34 2.68 F 5100002549 PASTA SAUCE 2 # 1.97 3.94 F 4335400750 TORTILLAS 1.99 F 4400000057 SALTINES 2.69 F 4400002854 NABISCO OREOS 2 # 2.98 5.96 F 1312000484 FROZEN FRIES 2.99 F 4125002562 CHEESE SLICES 2.99 F 4125010210 MEIJER MILK 2 # 3.09 6.18 F 5150092751 PANCAKE SYRUP 3.19 F 3000032188 OATMEAL 3.29 F 3000032189 OATMEAL 3.29 F 8390000649 GOLD PEAK TEA 3.29 F 71373336283 MEIJER MILK 3.29 F 4400002734 COOKIES 3.49 F 1410007083 PEPPERDIGE FARM 2 # 3.99 7.98 F 1600027297 CEREAL 4.19 F 1600043471 CEREAL 4.19 F 4850001833 ORANGE JUICE 5.69 F 1450001420 BIRDS EYE VOILA 7.39 F *4400003113 SNACK CRACKER was 2.77 now 1.99 F *5100002526 SPAGHETTIOS 3 # 5 / 5.00 was 3.27 now 3.00 F *7192176312 FROZEN PIZZA 2 # 5.49 was 12.58 now 10.98 F *7131400331 AUNT MILLIE"S 1 # 2 / 6.00 was 3.39 now 3.00 F Total Basket Coupon => 20.00 off -20.00 Mperks # -- ********** TOTAL MI 6% Sales Tax 1.16 TOTAL TAX 1.16 TOTAL 107.09 PAYMENTS Primary Account - Debit ATM/DEBIT CARD TENDER 107.09 XXXXXXXXXXXXXXXX NUMBER OF ITEMS 42 See meijer.com or the Service Desk for current return policy. For additional savings and rewards visit mPerks.com. Tx:XXX Op:XXXXXX Tm:XX St:XX XXXXXXXXXXX How are we doing? Rate your shopping experience and you may win $1000 in Meijer gift cards! Visit us at www.meijer.com/tellmeijer or call 1-800-394-7198 Secure Code: 7800-0601-5020-3373-001 Survey should be completed within 72 hrs "

Related

Rank a column according to the Filters selected by the user

I have data consisting of route details of the customers and also their store scores.
raw data with overall ranking for all the customers :
Dist_Code|Dist_Name|State|Store_name|Route_code|Store_score|Rank
5371 ABC Chicago CG 1200 5 1
2098 HGT Kansas KK 6500 4.8 2
7680 POE Arizona QW 3300 4.2 3
3476 POE Arizona CV 3300 4 4
6272 KUN Florida ANF 7800 3.9 5
3220 ABC Chicago AF 1200 3.6 6
7266 IOR Califor LU 4500 3.2 7
3789 POE Arizona TR 3300 3 8
9383 KAR Newyork IO 5600 3 9
1583 KUN Florida BOT 7800 2.8 10
8219 ABC Chicago Bb 1200 2.5 11
3734 ABC Chicago AA 1200 2 12
6900 POE Arizona HAL 3300 1.8 13
8454 KUN Florida UYO 7800 1.5 14
Filters
Distname ALL
State ALL
Routecode ALL
This is the overall ranking for all the customers without selecting any filters. So when I select some filter like (Dist name, route code, store score) I want it to show the rank according to the selected filter. Eg :
Dist_Code|Dist_Name|State|Store_name|Route_code|Store_score|Rank
7680 POE Arizona QW 3300 4.2 1
3476 POE Arizona CV 3300 4 2
3789 POE Arizona TR 3300 3 3
6900 POE Arizona HAL 3300 1.8 4
Filter
Distname POE
State Arizona
Routecode 3300
The store score is based on some parameter which I calculated in a model using python. 
Currently it is string column in powerbi. I tried some dax but it was not successful.

Power BI: Conditional Formating Matrix Visual with data bars

I would like to create a matrix visual like below and add data bars as conditional formating to the "Sales Percentage" Column with different user defined max and min values based on the countries.
I have the following dummy data
Salesperson
Country
Product
Sales Percentage
Total Sales
Gina
Canada
City Bike
0.02
232
Gina
Canada
Mountain Bike
0.56
2800
Gina
Italy
City Bike
0.32
213
Gina
Italy
Mountain Bike
0.21
1050
Gina
USA
City Bike
0.11
122
Gina
USA
Mountain Bike
0.43
2150
John
Canada
City Bike
0.32
333
John
Canada
Mountain Bike
0.34
442
John
Italy
City Bike
0.12
2132
John
Italy
Mountain Bike
0.67
1233
John
USA
City Bike
0.22
3300
John
USA
Mountain Bike
0.45
7300
Mary
Canada
City Bike
0.21
121
Mary
Canada
Mountain Bike
0.53
2650
Mary
Italy
City Bike
0.32
213
Mary
Italy
Mountain Bike
0.12
600
Mary
USA
City Bike
0.11
123
Mary
USA
Mountain Bike
0.12
600
The matrix looks like this after showing columns as rows and putting "Sales Percentage" and "Total Sales" as values, Country as columns and Product + Salesperson as rows:
I can add databars when I right click the Sales Percentage under values but I can only enter one user defined min and max value for the whole "Sales Percentage" column. Is it possible to have different maximum value for data bars based on the Country? For example to create a target value of 35% for Canada, 40% for USA and 50% for Italy. So in other words the data bar would be full when the Sales Percentage for Canada reaches 35% and full when Sales Percentage for USA reaches 40% and so on.

This isn't possible with you current setup. The best you could do to approximate this is as follows.
Create a measure as follows:
% Canada = CALCULATE(SUM('Table'[Total Sales]), 'Table'[Country ] = "Canada")
Do the same for USA and Italy and then add them as values to your matrix.
You can now select individual targets for each country.

RegEx for matching Germany or Austria or CH Postcodes

It is about my site, it is a ad portal and 3 geodata are installed in the system: Germany, Switzerland and Austria.
When I look for an advertisement in Germany, everything works correctly, I'm looking for zip code 68259 and a radius of 30 km. The results are correct, it shows all ads from 68259 Mannheim and the radius of 30 km.
Problem: The problem exists when I search in Switzerland or Austria: I search for the postal code 6000 Lucerne 1 PF and a radius of 30 km ... the results are wrong, I also find ads from Munich or Frankfurt which correspond to 300-500 km radius! I think the mistake is somewhere in the regex postal verification! Any advice what could be wrong???
// Germany Postcode
preg_match('/\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3}))\b/is', $this->search_code, $output);
if(!empty($output[0])){
$this->search_code = $output[0];
}else{
// Switzerland, Austria Postcode
preg_match('/\d{4}/', $this->search_code, $at_ch);
if(!empty($at_ch[0])){
$this->search_code = $at_ch[0];
}
}

The following regex will match codes for DE, CH & AU:
'/\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3})|(?:\d{4}))\b/is'
Examples
68259 Mannheim -> 68259
6000 Lucerne 1 PF -> 6000
1234 Musterstadt -> 1234

Reading text into table format in pandas

I have a table in text form that I want to read into pandas
I can use \n to separate the rows, but how can I separate the columns they are in the format ( 2 x text fields, then 6 x numeric).
Is there a method using regex or similar?
table_text = '''Name AIC sector Price (last close) Price (bid) Price (offer) NAV Total assets (£m) Market cap (£m)
3i Infrastructure Plc Infrastructure GBX 296.00 2.96 2.96 254.50 2,268.700 2,638.645
Aberdeen Asian Income Fund Limited Asia Pacific Income GBX 227.50 2.26 2.29 252.51 479.110 399.796
Aberdeen Diversified Income & Growth Ord Flexible Investment GBX 95.20 0.95 0.96 115.34 379.030 294.985
Aberdeen Emerging Markets Investment Company Limited Global Emerging Markets GBX 704.00 6.98 7.10 829.47 391.268 323.595
Aberdeen Japan Investment Trust Plc Japan GBX 712.50 7.00 7.25 784.79 114.957 94.198
Aberdeen Latin American Income Latin America GBX 57.00 0.54 0.57 62.13 40.985 32.555
Aberdeen New Dawn Asia Pacific GBX 322.00 3.22 3.26 365.56 431.544 350.752
Aberdeen New India Investment Trust Plc India GBX 516.00 5.16 5.18 601.47 375.170 301.268
Aberdeen New Thai Investment Trust Plc Country Specialist GBX 445.00 4.40 4.50 516.30 92.585 71.180
Aberdeen Smaller Companies Income Trust UK Smaller Companies GBX 358.00 3.56 3.60 397.45 95.028 79.153
Aberdeen Standard Asia Focus 2025 CULS Asia Pacific Smaller Companies GBX 100.95 1.01 1.01 97.25 391.484 37.026
Aberdeen Standard Asia Focus PLC Asia Pacific Smaller Companies GBX 1,280.00 12.75 13.00 1,440.65 483.841 402.730
Aberdeen Standard Equity Inc Trust plc UK Equity Income GBX 353.00 3.50 3.56 379.60 203.368 170.598
Aberdeen Standard European Logistics Income PLC Property - Europe GBX 116.00 1.15 1.16 117.82 309.808 305.022
Aberforth Smaller Companies Trust Plc UK Smaller Companies GBX 1,496.00 14.94 15.00 1,613.41 1,513.467 1,327.297
Aberforth Split Level Income Trust Plc UK Smaller Companies GBX 80.10 0.80 0.81 91.46 228.143 152.390
Aberforth Split Level Income ZDP 2024 UK Smaller Companies GBX 111.50 1.10 1.13 113.83 227.713 53.032
Acorn Income Fund Ltd UK Equity & Bond Income GBX 351.00 3.46 3.56 415.97 100.206 55.517
Acorn Income Fund ZDP 2022 UK Equity & Bond Income GBX 161.00 1.61 1.61 162.09 34.413 34.182
AEW UK REIT Ord Property - UK Commercial GBX 92.40 0.92 0.92 97.85 194.107 146.384'''
df = pd.DataFrame([x.split(';') for x in table_text.split('\n')])
print(df)
Outputs:
0
0 Name AIC sector Price (last close) Price (bid)...
1 3i Infrastructure Plc Infrastructure GBX 296...
2 Aberdeen Asian Income Fund Limited Asia Paci...
3 Aberdeen Diversified Income & Growth Ord Fle...
4 Aberdeen Emerging Markets Investment Company...
5 Aberdeen Japan Investment Trust Plc Japan GB...
6 Aberdeen Latin American Income Latin America...
7 Aberdeen New Dawn Asia Pacific GBX 322.00 3....
8 Aberdeen New India Investment Trust Plc Indi...
9 Aberdeen New Thai Investment Trust Plc Count...
10 Aberdeen Smaller Companies Income Trust UK S...
11 Aberdeen Standard Asia Focus 2025 CULS Asia ...
12 Aberdeen Standard Asia Focus PLC Asia Pacifi...
13 Aberdeen Standard Equity Inc Trust plc UK Eq...
14 Aberdeen Standard European Logistics Income ...
15 Aberforth Smaller Companies Trust Plc UK Sma...
16 Aberforth Split Level Income Trust Plc UK Sm...
17 Aberforth Split Level Income ZDP 2024 UK Sma...
18 Acorn Income Fund Ltd UK Equity & Bond Incom...
19 Acorn Income Fund ZDP 2022 UK Equity & Bond ...
20 AEW UK REIT Ord Property - UK Commercial GBX...
EDIT:
This is my hacky way of doing it. Relies on there being a currency column populated with "GBX" though.
Would welcome any ideas on better ways of doing this?
Is there a regex way of finding three capital letters preceded by a space and with a space then number afterwards? That would find the currency without hardcoding "GBX".
def convert_rows(df):
sector_name = "GBX"
for index, row in df.iterrows():
if sector_name in row[0]:
name = row[0].split(sector_name)[0]
numbers = row[0].split(sector_name)[1]
df.at[index, ['Name']] = name
df.at[index, ['AIC sector']] = sector_name
df.at[index,['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
return df
df = convert_rows(df)

You could try this:
import re
def convert_rows(df):
for index, row in df.iterrows():
# Search for the pattern
sector_name = re.match(r".+\s([A-Z]{3})\s\d+.+", row[0])
if sector_name:
sector_name = sector_name.group(1) # GBX for instance
name = row[0].split(sector_name)[0]
numbers = row[0].split(sector_name)[1]
df.at[index, ['Name']] = name
df.at[index, ['AIC sector']] = sector_name
df.at[index,['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
return df

Create new column in dataframe by applying math operation to column values based on a match

I have the following dataframes:
df1
name phone duration(m)
Luisa 443442 1
Jack 442334 6
Matt 442212 2
Jenny 453224 1
df2
prefix charge rate
443 0.8 0.3
446 0.8 0.4
442 0.6 0.1
476 0.8 0.3
my desired output is to match each phone number with its prefix (there are more prefixes than phone numbers) and calculate how much to charge per called by multiplying the duration of call for each phone number by the corresponding prefix charge plus the corresponding rate.
output ex.
df1
name phone duration(m) bill
Luisa 443442 1 (example: 1x0.3+0.8)
Jack 442334 6 (example: 6x0.1+0.6)
Matt 442212 2
Jenny 453224 1
my idea was to convert df2 to a dictionary like so dict={'443':[0.3,0.8],'442':[0.1,0.6]...} so i could match each number with the dict key and then do the opertion with the corresponding value of that matching key. However is not working and would also like to know if there is a better alternative.

To merge with prefix of arbitrary length you can do
>> df1['phone'] = df1.phone.astype(str)
>> df2['prefix'] = df2.prefix.astype(str)
>> df1['prefix_len'] = df1.phone.apply(
lambda h: max([len(p) for p in df2.prefix if h.startswith(p)] or [0]))
>> df1['prefix'] = df1.apply(lambda s: s.phone[:s.prefix_len], axis=1)
>> df1 = df1.merge(df2, on='prefix')
>> df1['bill'] = df1['duration(m)'] * df1['rate'] + df1['charge']
>> df1
duration(m) name phone prefix_len prefix charge rate bill
0 1 Luisa 443442 3 443 0.8 0.3 1.1
1 6 Jack 442334 3 442 0.6 0.1 1.2
2 2 Matt 442212 3 442 0.6 0.1 0.8
Note that
in case of multiple prefixes I choose the one with maximum length;
in case when there are no prefixes for particular phone I fill its length with default zero value, (then s.phone[:s.prefix_len] will produce an empty prefix and pd.merge will eliminate those phones from the result).

df1 = pd.DataFrame({'name':["Louisa","Jack","Matt","Jenny"],'phone':[443442,442334,442212,453224],'duration':[1,6,2,1]})
df2 = pd.DataFrame({'prefix':[443,446,442,476],'charge':[0.8,0.8,0.6,0.8],'rate':[0.3,0.4,0.1,0.3]})
df3=pd.concat((df1,df2),axis=1)
df4=pd.DataFrame({"phone_pref":df3["phone"].astype(str).str[:3]})
df4=df4["phone_pref"].drop_duplicates()
df3["bill"]=None
for j in range(len(df4)):
for i in range(len(df3["prefix"])):
if df3.loc[i,"prefix"]==int(df4.iloc[j]):
df3.loc[i,"bill"]=df3.loc[i,"duration"]*df3.loc[i,"charge"]+df3.loc[i,"rate"]
print(df3)
duration name phone charge prefix rate bill
0 1 Louisa 443442 0.8 443 0.3 1.1
1 6 Jack 442334 0.8 446 0.4 None
2 2 Matt 442212 0.6 442 0.1 1.3
3 1 Jenny 453224 0.8 476 0.3 None
The None values in the bill column are because in your excample no phone number has the prefixes 446 or 476 and thus they are not in the df4...
Also the bill is calculated with the formula of yours given in the question

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regex Hero results differ from vb.net - regex

Related

Rank a column according to the Filters selected by the user

Power BI: Conditional Formating Matrix Visual with data bars

RegEx for matching Germany or Austria or CH Postcodes

Reading text into table format in pandas

Create new column in dataframe by applying math operation to column values based on a match

Categories

Resources