Need a regex that captures date and amount fields and ignores blank lines and other miscellaneous data

Goal:
Capture only the purchase date, amount, and purchased item name(s).
Ignore all blank lines
Ignore Reference # & SHIPPING AND TAX string
Then, repeat this on the next grouping of purchases.
I am using Google Sheets for this project.
Sample data showing 3 purchases (i.e. blocks of data)
Note: the spacing and the SHIPPING AND TAX string vary in between
01/12 P934200QXEHMBPNAD Acme MARKETPLACE SEATTLE WA $34.96 Date & Amount
Ignore (blank line)
435852496957 Ignore
BOSCH CO2135 1/8 In. x 2-3/4 I Purchased item name
BOSCH CO2131 1/16 In. x 1-7/8 Purchased item name
IZOD Men's Memory Foam Slipper Purchased item name
SHIPPING AND TAX Ignore
Ignore (blank line)
01/12 P934200QXEHMB6MQ0 Acme MARKETPLACE SEATTLE WA $48.91
492577232349
LxTek Remanufactured Ink Cartr
SHEENGO Remanufactured Ink Car
02/02 P934200AEHMB7E12 Acme MARKETPLACE SEATTLE WA $21.60
659473773469
LHXEQR Tubing Adapter for ResM
SHIPPING AND TAX
My updated attempt
=index(if(len(C26:C33),REGEXREPLACE(C26:C33,"(?Ums)(\d{2}/\d{2}) .* (\$\d{1,}\.\d{1,2}).(?:^\s+\d+$)(.*)(?:\s+SHIPPING AND TAX)","$1,$2,$3"),))
Unsuccessful results
01/12 P934200QXEHMBPNAD Acme MARKETPLACE SEATTLE WA $34.96
#VALUE!
BOSCH CO2135 1/8 In. x 2-3/4 I
BOSCH CO2131 1/16 In. x 1-7/8
IZOD Men's Memory Foam Slipper
SHIPPING AND TAX
01/12 P934200QXEHMB6MQ0 Acme MARKETPLACE SEATTLE WA $48.91
#VALUE!
LxTek Remanufactured Ink Cartr
SHEENGO Remanufactured Ink Car
02/02 P934200DJEHMB7E12 AMAZON MARKETPLACE SEATTLE WA $21.60
#VALUE!
LHXEQR Tubing Adapter for ResM
SHIPPING AND TAX
Unsuccessful:
- It did not ignore:
  - data between date & amount
  - blank lines
  - SHIPPING AND TAX
- #VALUE! issue: it does not handle the Reference # well:
  Function REGEXREPLACE parameter 1 expects text values. But '435848996957' is a number and cannot be coerced to a text.

This matches one "block" (note: turn DOTALL flag on):
(\d\d\/\d\d).*(\$[\d.]+).(?:^\s+\d+$)(.*)SHIPPING AND TAX\n\n
Capturing:
the date as group 1
the amount as group 2
the product lines as group 3
See live demo.
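As a cross-check outside Google Sheets, the same extraction can be sketched in Python. Note that this is a line-by-line parser, not the single regex above, and the names `HEADER` and `parse_purchases` are made up for illustration. It assumes every purchase starts with a line that begins with an MM/DD date and ends with a dollar amount, and that reference numbers are lines made only of digits.

```python
import re

# Assumed header shape: "MM/DD <anything> $amount" on a single line.
HEADER = re.compile(r"^(\d{2}/\d{2})\s.*?(\$\d+\.\d{1,2})\s*$")

def parse_purchases(text):
    """Group item lines under the most recent date/amount header line."""
    purchases = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, all-digit reference numbers, and the tax marker.
        if not line or line.isdigit() or line.upper() == "SHIPPING AND TAX":
            continue
        match = HEADER.match(line)
        if match:
            purchases.append(
                {"date": match.group(1), "amount": match.group(2), "items": []}
            )
        elif purchases:
            purchases[-1]["items"].append(line)
    return purchases

sample = """01/12 P934200QXEHMBPNAD Acme MARKETPLACE SEATTLE WA $34.96

435852496957
BOSCH CO2135 1/8 In. x 2-3/4 I
BOSCH CO2131 1/16 In. x 1-7/8
IZOD Men's Memory Foam Slipper
SHIPPING AND TAX

01/12 P934200QXEHMB6MQ0 Acme MARKETPLACE SEATTLE WA $48.91
492577232349
LxTek Remanufactured Ink Cartr
SHEENGO Remanufactured Ink Car
02/02 P934200AEHMB7E12 Acme MARKETPLACE SEATTLE WA $21.60
659473773469
LHXEQR Tubing Adapter for ResM
SHIPPING AND TAX
"""

for purchase in parse_purchases(sample):
    print(purchase["date"], purchase["amount"], purchase["items"])
```

Parsing per line sidesteps the varying blank lines and the optional SHIPPING AND TAX marker, at the cost of not being usable directly inside a Sheets formula.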

RegEx for matching Germany or Austria or CH Postcodes

This concerns my site, an ad portal with geodata installed for three countries: Germany, Switzerland and Austria.
When I search for an advertisement in Germany, everything works correctly: I search for zip code 68259 with a radius of 30 km, and the results correctly show all ads from 68259 Mannheim and within the 30 km radius.
Problem: when I search in Switzerland or Austria, for example postal code 6000 Lucerne 1 PF with a radius of 30 km, the results are wrong. I also get ads from Munich or Frankfurt, which are 300-500 km away. I think the mistake is somewhere in the regex postcode verification. Any advice on what could be wrong?
// Germany Postcode
preg_match('/\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3}))\b/is', $this->search_code, $output);
if (!empty($output[0])) {
    $this->search_code = $output[0];
} else {
    // Switzerland, Austria Postcode
    preg_match('/\d{4}/', $this->search_code, $at_ch);
    if (!empty($at_ch[0])) {
        $this->search_code = $at_ch[0];
    }
}
The following regex will match codes for DE, CH & AT:
'/\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3})|(?:\d{4}))\b/is'
Examples
68259 Mannheim -> 68259
6000 Lucerne 1 PF -> 6000
1234 Musterstadt -> 1234
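To try the combined pattern outside PHP, here is a minimal Python sketch. The pattern is copied from the answer; the `/is` flags are dropped because they make no difference for a digits-only pattern. `POSTCODE` and `extract_postcode` are hypothetical names used only for this illustration.

```python
import re

# Combined DE / CH / AT pattern from the answer, ported to Python syntax.
POSTCODE = re.compile(
    r"\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})"
    r"|(?:[6][013-9]\d{3})|(?:\d{4}))\b"
)

def extract_postcode(search_text):
    """Return the first 5-digit (DE) or 4-digit (CH/AT) code, or None."""
    match = POSTCODE.search(search_text)
    return match.group(1) if match else None

for query in ("68259 Mannheim", "6000 Lucerne 1 PF", "1234 Musterstadt"):
    print(query, "->", extract_postcode(query))
```

Because of the surrounding `\b` anchors, the 4-digit alternative only matches a standalone 4-digit number, so it cannot cut a 5-digit German code down to four digits.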

Power BI Matrix Visual Showing Row of Blank Values Even Though Source Data Does Not Have Blanks

I have two tables one with data about franchise locations (Franchise Profile Info) and one with Award data. Each franchise location is given a certain number of awards they are allowed to give out per year. Each franchise location rolls up to a larger group depending on where in the country they are located. These tables are in a 1 to 1 relationship using Franchise ID. I am trying to create a matrix with the number of awards, total utilized, and percentage utilized rolled up to group with the ability to expand the groups and see individual locations. For some reason when I add the value fields a blank row is created. There are not any blank rows in either of the original tables so I'm not sure where this is coming from.
Franchise Profile Info table
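| ID | Franchise Name | Group | Street Address | City | State |
|----|----------------|-------|----------------|------|-------|
| 164 | Park's | West | 12 Park Dr. | Los Angeles | CA |
| 365 | A & J | East | 243 Whiteoak Rd | Stafford | VA |
| 271 | Otto's | South | 89 Main St. | St. Augustine | FL |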
Award table
| ID | Year | TotalAwards | Utilized |
|----|------|-------------|----------|
| 164 | 2022 | 16 | 12 |
| 365 | 2022 | 5 | 5 |
| 271 | 2022 | 22 | 17 |
These tables are in a relationship with a 1 to 1 match on ID
What I want the matrix to look like
| Group | Total Awards | Utilized | %Awards Utilized |
|-------|--------------|----------|------------------|
| East | 5 | 5 | 100% |
| West | 16 | 12 | 75% |
| South | 22 | 17 | 77% |
Instead what I'm getting is this
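| Group | Total Awards | Utilized | %Awards Utilized |
|-------|--------------|----------|------------------|
| East | 5 | 5 | 100% |
| West | 16 | 12 | 75% |
| South | 22 | 17 | 77% |
|  | 0 | 0 | 0% |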
I can't for the life of me figure out where this row is coming from. I can add in the Group and Franchise name as rows but as soon as I add any of the value columns this blank row shows up.
You have a value on the many side that does not exist on the one side. You can read a full explanation here. https://www.sqlbi.com/articles/blank-row-in-dax/
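One way to hunt for the offending value is to compare the ID columns of the two tables outside the model. The following is a minimal Python sketch, assuming both ID columns have been exported to plain lists. The ID 999 is a hypothetical orphan added purely for illustration and does not come from the question's data, and `unmatched_ids` is a made-up helper name.

```python
def unmatched_ids(fact_ids, lookup_ids):
    """Return IDs used in the fact table that have no row in the lookup table."""
    return sorted(set(fact_ids) - set(lookup_ids))

franchise_ids = [164, 365, 271]      # IDs in Franchise Profile Info
award_ids = [164, 365, 271, 999]     # 999 is a made-up orphan for illustration

# Any ID printed here would be grouped under the blank row in the matrix.
print(unmatched_ids(award_ids, franchise_ids))
```

With only the three IDs shown in the question the result is empty, which suggests the real tables contain a value (possibly differing only by type or stray whitespace) that the sample does not show.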

How do I create a pivot table with weighted averages from a table in PowerBI?

I have data in the following format:
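| Building | Tenant | Type | Floor | Sq Ft | Rent | Term Length |
|----------|--------|------|-------|-------|------|-------------|
| 1 Example Way | Jeff | Renewal | 5 | 100 | 100 | 6 |
| 47 Fake Street | Tom | New | 3 | 500 | 200 | 12 |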
I need to create a visualisation in PowerBI that displays a pivot table of attributes by tenant, with a weighted-average (by square foot) column, like this:
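|  | Jeff | Tom | Weighted Average (by Sq Ft) |
|--|------|-----|-----------------------------|
| Building | 1 Example Way | 47 Fake Street | - |
| Type | Renewal | New | - |
| Floor | 5 | 3 | - |
| Sq Ft | 100 | 500 | 433.3333333 |
| Rent | 100 | 200 | 183.3333333 |
| Term Length (months) | 6 | 12 | 11 |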
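The figures in the last column follow from weighting each value by the tenant's square footage: sum(value × Sq Ft) / sum(Sq Ft). A quick Python check of that arithmetic, using the two rows from the source data (the function name `weighted_average` is just for illustration):

```python
def weighted_average(values, weights):
    """Weighted mean: sum(v * w) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

sq_ft = [100, 500]        # Jeff, Tom
rent = [100, 200]
term_months = [6, 12]

print(weighted_average(sq_ft, sq_ft))         # Sq Ft weighted by Sq Ft
print(weighted_average(rent, sq_ft))          # Rent weighted by Sq Ft
print(weighted_average(term_months, sq_ft))   # Term Length weighted by Sq Ft
```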
I have unpivoted the original data, like this:
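| Tenant | Attribute | Value |
|--------|-----------|-------|
| Jeff | Building | 1 Example Way |
| Jeff | Type | Renewal |
| Jeff | Floor | 5 |
| Jeff | Sq Ft | 100 |
| Jeff | Rent | 100 |
| Jeff | Term Length (months) | 6 |
| Tom | Building | 47 Fake Street |
| Tom | Type | New |
| Tom | Floor | 3 |
| Tom | Sq Ft | 500 |
| Tom | Rent | 200 |
| Tom | Term Length (months) | 12 |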
I can almost create what I need from the unpivoted data using a matrix (as below), but I can't calculate the weighted averages column from that matrix.
|  | Jeff | Tom |
|--|------|-----|
| Building | 1 Example Way | 47 Fake Street |
| Type | Renewal | New |
| Floor | 5 | 3 |
| Sq Ft | 100 | 500 |
| Rent | 100 | 200 |
| Term Length (months) | 6 | 12 |
I can also create a table with my attributes as headers (instead of in a column). This displays the right values and lets me calculate weighted averages (as below).
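|  | Building | Type | Floor | Sq Ft | Rent | Term Length (months) |
|--|----------|------|-------|-------|------|----------------------|
| Jeff | 1 Example Way | Renewal | 5 | 100 | 100 | 6 |
| Tom | 47 Fake Street | New | 3 | 500 | 200 | 12 |
| Weighted Average (by Sq Ft) | - | - | - | 433.3333333 | 183.3333333 | 11 |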
However, it's important that these values are displayed vertically instead of horizontally. This is pretty straightforward in Excel, but I can't figure out how to do it in PowerBI. I hope this is clear. Can anyone help?
Thanks!

PowerShell: complicated regex with multiple groups

I need some help with my regex.
My code looks like this (I haven't gotten very far):
$source_file = "\\server\minified.txt"
$sf_content = gc $source_file -raw
$sections = $sf_content | select-string -AllMatches '(?smi)(^\s+\d+:\d+\s+AM\s+\w+\s+ACCOUNT ACTIVITY\s-\s)(\w+\s+\w+$)(.+?(Start Account\s\d+)(.+?Elapsed))'
$sections
The file looks like this:
I was able to get the first and last name using my regex from the "ACCOUNT ACTIVITY - PERSON'S NAMEHERE" string circled in red at the top of the image shown above.
My end goal is to be able to regex the blue box as a match, getting all information from the date on the top left, down to the "1 Accounts worked per hour". Then I want to get the info from the 2nd red circle. I would like to get the start time at the beginning of that line and then find the last instance of the same line "Start account 54321234" so that I can take the last time minus the first time.
So for each blue box, get the info from the red circles. For each red circle containing "Start account" take the blue circle minus the green circle.
I would like to try this using regex groups. If I can't figure that out I'd like to put each of my blue box regex into an array and for each item in the array I can further do regex to get what I want.
My code is not complete. But I'm not sure how to do the regex so I'll keep updating this as I update the script and do my own research.
If anybody has pointers I'd appreciate it.
Here is the source content in text form:
05/07/20 Acme, Inc. PAGE 1
9:48 AM ABC ACCOUNT ACTIVITY - Bart Simpson
The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
DATE TIME ELAPSED ACTION
04/16/20 8:06:50 0:00 Enter Account Screen
-------------------------------------------------------------------------------
8:06:53 0:03 Start account 12345678 ROSS, BOB N
8:07:24 0:31 Finished account in 31 seconds
-------------------------------------------------------------------------------
8:07:26 0:02 Start account 54321234 DOE, JOHN
8:07:27 0:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
8:07:28 0:02 Start account 54321234 DOE, JOHN
8:10:26 0:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
05/06/20 4:55:49 5:08 Leave Account Screen 9:33 Elapsed
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
05/06/20 4:55:55 0:06 Leave Account Screen
-------------------------------------------------------------------------------
DAILY TOTALS
5:33:46 - Time on Account screen for the day.
3 Calls 1 Calls per hour
3 Contacts 1 Contacts per hour
3 Accounts worked 1 Accounts worked per hour
05/07/20 Acme, Inc. PAGE 1
9:48 AM ABC ACCOUNT ACTIVITY - Lisa Simpson
The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
DATE TIME ELAPSED ACTION
04/16/20 8:06:50 0:00 Enter Account Screen
-------------------------------------------------------------------------------
8:06:53 0:03 Start account 6543212 DOE, JANE
8:07:24 0:31 Finished account in 31 seconds
-------------------------------------------------------------------------------
8:07:26 0:02 Start account 88888888 DEER, JOHN
8:07:27 1:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
05/06/20 4:55:49 5:08 Leave Account Screen 10:33 Elapsed
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
05/06/20 4:55:55 0:06 Leave Account Screen
-------------------------------------------------------------------------------
DAILY TOTALS
5:33:46 - Time on Account screen for the day.
3 Calls 1 Calls per hour
3 Contacts 1 Contacts per hour
3 Accounts worked 1 Accounts worked per hour
You're going to struggle with a regex. It seems to fail when repeating the second capturing group. I tried for a while, adding labels for your pertinent matches, and I was only picking up the first matches using this regex. Anyone who is a "regex king", please look away.
(?smi)(^\s+\d+:\d+\s+(AM|PM)\s+\w+\s+ACCOUNT ACTIVITY\s-\s)(?<name>\w+\s+\w+$)(.+?(?<begin>\d+:\d+:\d+)(\s+\d:\d+\s+)(?<acctnumber>Start Account\s\d+)(\s+)(?<account>\w+,\s\w+(\s[A-Za-z]|))\s+(?<end>.+?\d:\d+))
You can provide a template to pick up all fields of potential interest and use ConvertFrom-String. The key is to label all the items you want uniquely in braces. You then have to mark the first item in the template with a star, so using your example from above, you'd have something like this.
$template = @"
05/07/20 Acme, Inc. PAGE 1
9:48 AM ABC ACCOUNT ACTIVITY - {customer*:Bart Simpson}
The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
DATE TIME ELAPSED ACTION
04/16/20 8:06:50 0:00 Enter Account Screen
-------------------------------------------------------------------------------
{begin1:8:06:53} 0:03 {accNum1:Start account 12345678} {name1:ROSS, BOB N}
{end1:8:07:24} 0:31 Finished account in 31 seconds
-------------------------------------------------------------------------------
{begin2:8:07:26} 0:02 {accNum2:Start account 54321234} {name2:DOE, JOHN}
{end2:8:07:27} 0:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
{begin3:8:07:28} 0:02 {accNum3:Start account 54321234} {name3:DOE, JOHN}
{end3:8:10:26} 0:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
05/06/20 4:55:49 5:08 Leave Account Screen 9:33 Elapsed
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
05/06/20 4:55:55 0:06 Leave Account Screen
-------------------------------------------------------------------------------
DAILY TOTALS
5:33:46 - Time on Account screen for the day.
3 Calls 1 Calls per hour
3 Contacts 1 Contacts per hour
3 Accounts worked 1 Accounts worked per hour
05/07/20 Acme, Inc. PAGE 1
9:48 AM ABC ACCOUNT ACTIVITY - {customer*:Lisa Simpson}
The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
DATE TIME ELAPSED ACTION
04/16/20 8:06:50 0:00 Enter Account Screen
-------------------------------------------------------------------------------
{begin1:8:06:53} 0:03 {accNum1:Start account 6543212} {name1:DOE, JANE}
{end1:8:07:24} 0:31 Finished account in 31 seconds
-------------------------------------------------------------------------------
{begin2:8:07:26} 0:02 {accNum2:Start account 88888888} {name2:DEER, JOHN}
{end2:8:07:27} 1:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
{begin3:\s} 0:02 {accNum3:\s} {name3:\s}
{end3:\s} 1:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
05/06/20 4:55:49 5:08 Leave Account Screen 10:33 Elapsed
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
05/06/20 4:55:55 0:06 Leave Account Screen
-------------------------------------------------------------------------------
DAILY TOTALS
5:33:46 - Time on Account screen for the day.
3 Calls 1 Calls per hour
3 Contacts 1 Contacts per hour
3 Accounts worked 1 Accounts worked per hour
"@
In the final example (Lisa Simpson) I have added a third set whose fields contain the regex \s, so that the second set of data is not duplicated into set three.
You can then pipe your full input through the cmdlet, using the -TemplateContent parameter to apply your template. And you should get the data out the other side.
$data = # Get your data
$data | ConvertFrom-String -TemplateContent $template
customer : Bart Simpson
begin1 : 8:06:53
accNum1 : Start account 12345678
name1 : ROSS, BOB N
end1 : 8:07:24
begin2 : 8:07:26
accNum2 : Start account 54321234
name2 : DOE, JOHN
end2 : 8:07:27
begin3 : 8:07:28
accNum3 : Start account 54321234
name3 : DOE, JOHN
end3 : 8:10:26
customer : Lisa Simpson
begin1 : 8:06:53
accNum1 : Start account 6543212
name1 : DOE, JANE
end1 : 8:07:24
begin2 : 8:07:26
accNum2 : Start account 88888888
name2 : DEER, JOHN
end2 : 8:07:27
You can then compare your data, looping through the output objects.
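For the elapsed-time part of the question (the last occurrence of a "Start account" line minus the first, per account), here is a rough Python sketch, offered as an alternative to looping over the ConvertFrom-String output rather than as part of that approach. It assumes each such line looks like `H:MM:SS M:SS Start account <number>` and that all times fall on the same day. `START_LINE` and `seconds_between_first_and_last` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed line shape: "<time> <elapsed> Start account <number> ..."
START_LINE = re.compile(
    r"(\d{1,2}:\d{2}:\d{2})\s+\d+:\d{2}\s+Start account\s+(\d+)"
)

def seconds_between_first_and_last(report):
    """Map account number -> seconds from its first to its last start line."""
    first, last = {}, {}
    for time_text, account in START_LINE.findall(report):
        stamp = datetime.strptime(time_text, "%H:%M:%S")
        first.setdefault(account, stamp)   # keep the earliest occurrence
        last[account] = stamp              # overwrite with the latest
    return {
        acct: int((last[acct] - first[acct]).total_seconds()) for acct in first
    }

report = """\
04/16/20 8:06:50 0:00 Enter Account Screen
8:06:53 0:03 Start account 12345678 ROSS, BOB N
8:07:24 0:31 Finished account in 31 seconds
8:07:26 0:02 Start account 54321234 DOE, JOHN
8:07:27 0:01 Finished account in 1 seconds
8:07:28 0:02 Start account 54321234 DOE, JOHN
8:10:26 0:01 Finished account in 1 seconds
"""

print(seconds_between_first_and_last(report))
```

An account that appears only once yields 0, since its first and last start lines are the same line.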

How do I make regex non-greedy to extract specific element

I have the following text from which I need to extract certain phrases:
Restricted Cash 951 37505 Accounts Receivable - Affiliate 31613 27539 Accounts
Receivable - Third Party 23091 2641 Crude Oil Inventory 2200 0 Other Current
Assets 2724 389
Total Current Assets 71319 86100 Property Plant and Equipment Total Property
Plant and Equipment Gross 1500609 706039 Less Accumulated
Depreciation and Amortization (79357) (44271) Total Property Plant and Equipment
Net 1421252 661768 Intangible Assets Net 310202 0 Goodwill 109734 0 Investments
82317 80461 Other Noncurrent Assets 3093 1429 Total Assets 1997917 829758
LIABILITIES Current Liabilities Accounts Payable - Affiliate 2778 1616 Accounts
Payable - Trade 92756 109893 Other Current Liabilities 9217 2876 Total Current
Liabilities 104751 114385 Long-Term Liabilities Long-Term Debt 559021 85000
Asset Retirement Obligations 17330 10416 Other Long-Term Liabilities 582 3727
Total Liabilities 681684 213528 EQUITY Partners' Equity Limited Partner
Common Units (23759 and 23712 units outstanding respectively) 699866 642616
Subordinated Units (15903 units outstanding) (130207) (168136) General Partner 2421 520
Total Partners' Equity 572080 475000 Noncontrolling Interests 744153 141230 Total
Equity 1316233 616230 Total Liabilities and Equity 1997917 829758
I need to remove all phrases that are in parentheses, i.e. (), and that contain a number together with the word outstanding or units.
Based on these conditions, I have two phrases that need to be removed:
(23759 and 23712 units outstanding respectively)
(15903 units outstanding)
I have tried the following Regex in Python:
\(\d+.+?(outstanding)+?\)
The idea was that .+? after \d+ would make the regex non-greedy (lazy). However, the regex selects a huge segment starting from (79357) (44271) Total Property Plant and Equipment up to outstanding), which looks greedy.
The unique marker here is the word outstanding; maybe there is a better approach to extracting those phrases?
You may use
\(\d[^()]*outstanding[^()]*\)
See the regex demo and the regex graph:
Details
\( - ( char
\d - a digit
[^()]* - 0+ chars other than ( and )
outstanding - a substring
[^()]* - 0+ chars other than ( and )
\) - a ) char.
Python:
re.findall(r'\(\d[^()]*outstanding[^()]*\)', s)
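The lazy `.+?` in the original attempt only limits how far a match extends. The match still begins at the leftmost `(` followed by a digit, which is why it started at (79357). Excluding parentheses with `[^()]*` keeps each match inside a single pair. Since the stated goal is to remove the phrases, the same pattern can also be passed to `re.sub`. This is a small sketch on a shortened excerpt of the text, and the whitespace clean-up step is an added assumption about the desired output.

```python
import re

PATTERN = re.compile(r"\(\d[^()]*outstanding[^()]*\)")

# Shortened excerpt of the question's text, for illustration only.
s = (
    "Depreciation and Amortization (79357) (44271) Total Property Plant and Equipment "
    "Common Units (23759 and 23712 units outstanding respectively) 699866 642616 "
    "Subordinated Units (15903 units outstanding) (130207) (168136)"
)

# List the phrases that would be removed.
print(PATTERN.findall(s))

# Remove them, then collapse the double spaces left behind.
cleaned = re.sub(r"\s{2,}", " ", PATTERN.sub("", s))
print(cleaned)
```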