Trying to scrape substrings within paragraphs using regex

Main questions I have:
What are good regex practices for matching statements that are similar but not identical, across a list of websites, when the text mixes decimal numbers and words?
Should I pull all of the text and then parse out what I want, as in the following code? Or is there a better way?
I am currently trying to extract interest rate data from different lenders using regular expressions. I want to use regex because I want to be able to add more URLs as more lenders arise. I want the extracted information to take the form:
x.xx% - xx.xx% (the low - high bands of the interest rates)
Most of the websites vary in how they present this, some examples:
APR ranges from x.xx% to xx.xx%
Rates range from x.xx% to xx.xx%
x.xx% - xx.xx%
range from x.xx% (AA) to xx.xx% (HR)
Currently I am trying to just grab the paragraph the text lives in and then make substrings off of that to create the final piece of information I need (x.xx% - xx.xx%). Not sure if this is the best method, but would like to crowdsource my issue.
import re
import requests as r
from bs4 import BeautifulSoup as bs

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                 'https://www.marcus.com/us/en/personal-loans',
                 'https://www.discover.com/personal-loans/',
                 'https://www.lightstream.com/',
                 'https://www.prosper.com/']

# Cycle through the links until an APR rate (fixed or variable) is found via regex
for link in plcompetitors:
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    paragraph = soup.body.findAll(text=re.compile('% APR'))
    # Fall back to a second pattern for pages where the first pass came up empty
    if paragraph == []:
        paragraph = soup.body.findAll(text=re.compile('% - [0-9]'))
    print(paragraph)
Which returns this:
[]
[]
[' 6.99% to 24.99% APR']
['\r\n Payment example: Monthly payments for a $10,000 loan at 3.09% APR with a term of 3 years would result in 36 monthly payments of $291.21.', 'The lender’s interest rate (APR) must not be supported by any third-party arrangements such as vehicle manufacturer subvention payments (with rates as low as 0.0% APR), other manufacturer discounts, rate buy-downs by car buying services or any other\n similar third-party subsidized rate offerings.']
['* For example, a three-year $10,000 loan with a Prosper Rating of AA would have an interest rate of 5.31% and a 2.41% origination fee for an annual percentage rate (APR) of 6.95% APR. You would receive $9,759 and make 36 scheduled monthly payments of $301.10. A five-year $10,000 loan with a Prosper Rating of A would have an interest rate of 8.39% and a 5.00% origination fee with a 10.59% APR. You would receive $9,500 and make 60 scheduled monthly payments of $204.64. Origination fees vary between 2.41%-5%. APRs through Prosper range from 6.95% (AA) to 35.99% (HR) for first-time borrowers, with the lowest rates for the most creditworthy borrowers. Eligibility for loans up to $40,000 depends on the information provided by the applicant in the application form. Eligibility is not guaranteed, and requires that a sufficient number of investors commit funds to your account and that you meet credit and other conditions. Refer to Borrower Registration Agreement for details and all terms and conditions. All loans made by WebBank, member FDIC.']
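One possible direction, offered as a rough sketch rather than a definitive answer: instead of cutting substrings out of each paragraph, a single pattern with two capture groups can cover the four phrasings listed in the question and normalise them to "x.xx% - xx.xx%". The names RATE, RATE_RANGE and extract_rate_ranges are made up for this illustration.

```python
import re

# A single rate such as "6.99%"; the lookbehind stops a match from starting mid-number.
RATE = r"(?<![\d.])(\d{1,2}(?:\.\d{1,2})?)%"

# low rate, optional label such as "(AA)", separator ("to", "-" or an en dash), high rate
RATE_RANGE = re.compile(
    RATE
    + r"\s*(?:\([^)]*\))?"
    + r"\s*(?:to|-|–)\s*"
    + RATE
)

def extract_rate_ranges(text):
    """Return every low/high pair found in text, formatted as 'x.xx% - xx.xx%'."""
    return [f"{low}% - {high}%" for low, high in RATE_RANGE.findall(text)]

print(extract_rate_ranges(" 6.99% to 24.99% APR"))
print(extract_rate_ranges("APRs through Prosper range from 6.95% (AA) to 35.99% (HR)"))
```

The function could be applied to each string in paragraph, or to soup.get_text() for the whole page. Note that it matches any percentage range, so in the Prosper paragraph above it would also pick up the origination fee range "2.41%-5%"; some extra check on the surrounding words (for example requiring "APR" or "rate" nearby) would still be needed.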

How do I include a 'between two dates' IF AND formula inside of a SUMIFS formula?

I am doing a personal budgeting project. I have 2 Excel files: one is the detail of my last 5 years of banking activity, the other is an annual monthly summary and analysis (S/A) file, with 12 worksheets inside the one workbook.
I am trying to find a way to make the S/A file more dynamic so I can copy 1 formula to the various fields that look up and sum detail activity equal to or after the first date but before the 2nd date BUT ONLY for lines that match the category found in Column B of the S/A file worksheet. This gives me the activity in my budget categories between 1 paydate and the day before the next paydate.
Example: I want to find and sum all the amounts (Detail file $F:$F) in my detail file that are for Cell Phone payments (Detail File $E:$E, matching S/A field B6) where the transaction date (Detail file $A:$A) is on or after 7/14/2017 (start Date, S/A field E5) but before 7/28/2017 (end date, S/A field E33).
This is what I have tried:
=IF(AND('[2016-Present Bank Statements.xlsx]Checking TRX'!$A$2:$A$2959>(E5-1),'[2016-Present Bank Statements.xlsx]Checking TRX'!$A$2:$A$2959<E33),SUMIF('[2016-Present Bank Statements.xlsx]Checking TRX'!$E$2:$E$2959,B6,'[2016-Present Bank Statements.xlsx]Checking TRX'!$F$1:$F$2959),0)
I was getting an error trying to put ">=" into the first component so I put the cell reference -1 to get the previous date from my starting date. So anything greater than E5-1 should include the date in E5 (because let's face it, we all go mad spending on payday).
=SUMIFS('[2016-Present Bank Statements.xlsx]Checking TRX'!$F:$F,'[2016-Present Bank Statements.xlsx]Checking TRX'!$E:$E,B6,'[2016-Present Bank Statements.xlsx]Checking TRX'!$A:$A,"E5≤x<E33")
The last part "E5≤x<E33" I found on the interwebs for comparing dates.
I have tried breaking out the greater than/equal to and less than statements into 2 criteria in a SUMIFS, but no luck.
I either get back 0.00 or #SPILL
These are the headers in my details file. Do I need to rearrange them to get either of these formulas to pick up the right amount?
Date|Account|Merchant|Description|Category|Amount
My current process is to do a SUMIF but have it reference the specific lines that relate to the start and end dates. I've gotten off somehow between yesterday and today and don't want to redo all my tabs so far.
=-SUMIF('[2016-Present Bank Statements.xlsx]Checking TRX'!$E$2076:$E$2168,B6,'[2016-Present Bank Statements.xlsx]Checking TRX'!$F$2076:$F$2168)
Please help!!!
Use SUMIFS. Something like:
=SUMIFS(sum_range,
type_range, type,
date_range, ">"&start_date,
date_range, "<"&stop_date)
Replace ">" with ">=" if the start date itself should be included, as in the question (on or after the start date, before the end date).
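For anyone who wants to check the logic outside Excel, here is a minimal Python sketch of the same three conditions (category match, date on or after the start, date before the end). The function name and the sample rows are invented for the illustration; only the dates 7/14/2017 and 7/28/2017 and the "Cell Phone" category come from the question.

```python
from datetime import date

def sum_category_between(rows, category, start, end):
    """Sum amounts for one category where start <= transaction date < end."""
    return sum(
        amount
        for tx_date, tx_category, amount in rows
        if tx_category == category and start <= tx_date < end
    )

# Hypothetical rows: (Date, Category, Amount)
rows = [
    (date(2017, 7, 13), "Cell Phone", -50.00),  # before the start date: excluded
    (date(2017, 7, 14), "Cell Phone", -60.00),  # on the start date: included
    (date(2017, 7, 20), "Groceries", -80.00),   # other category: excluded
    (date(2017, 7, 27), "Cell Phone", -15.50),  # day before the end date: included
    (date(2017, 7, 28), "Cell Phone", -40.00),  # on the end date: excluded
]

total = sum_category_between(rows, "Cell Phone", date(2017, 7, 14), date(2017, 7, 28))
print(total)
```

Each condition in the generator corresponds to one criteria pair in SUMIFS, which is why the two date bounds have to be passed as two separate criteria rather than one combined string.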

DAX measure: "Weighted distribution %"

I hope you can help me with a complicated problem.
I am trying to make a measure which calculates the “weighted distribution %” of a product.
The business definition for this calculation is:
“In a customer Chain we have a number of customers which buys a specific product(selected). We need to find out how much volume these “BUYING customers” buys of the whole PRODUCT GROUP (which this product belongs to) and compare it to the volume of the PRODUCT GROUP bought by ALL customers in the Customer Chain.”
Example (calculation):
Product (selected)=”Product 1”
Number of ALL Customers in “Chain 1” = 18
Volume of PRODUCT GROUP bought by ALL customers in chain = 10.915
Number of BUYING customers in “Chain 1” (who have bought “Product 1”) = 8
Volume of PRODUCT GROUP bought by BUYING customers in chain = 6.945
Calculation:
Weighted distribution % =
Volume (BUYING Customers in chain) / Volume (ALL Customers in chain) = 6.945 / 10.915 = 63,6%
Example (Calculation setup in PBI):
Now, my datamodel is this (simplified):
NOTE (just for info): you may ask why I make a customer count on both “D_Customer” and “F_SALES”. That is because I can make a customer count on specific transaction dates in F_SALES, and I don’t have these in D_CUSTOMER.
If I set the following filters:
Chain= “Chain 1”
Product=”Product 1”
I get the following table:
I then calculate the volume on the PRODUCT GROUP with the following measure
Volume (PRODUCT GROUP) = CALCULATE('F_SALES'[Volume];ALLEXCEPT('D_PRODUCT';'D_PRODUCT'[Product group]))
And add it to the table:
Now I have the “Volume (ALL Customers in chain)” part for my weighted distribution calculation.
My problem is how to make a measure that shows the volume for the BUYING customers only.
I have tried to make the following calculation, which brings me close:
Volume (BUYING Customers) =
VAR BuyingCustomers_=CALCULATE([Number of Customers(F_SALES)];FILTER('F_SALES';NOT(ISBLANK('Sold to trade'[Customer ID]))))
RETURN
SUMX(SUMMARIZE(D_Customer;D_Customer[Customer Chain];"volume";CALCULATE('F_SALE'[Volume];ALLEXCEPT('D_Product';'D_Product'[Product Group]);FILTER('F_SALE';NOT(ISBLANK(BuyingCustomers_)))));[volume])
Result:
But, as you can see, the volume doesn’t aggregate to the “PRODUCT GROUP” level.
What I need is this:
Which will give me the measure necessary for my calculation:
Can anyone bring me the missing part?
It will be greatly appreciated.
Br,
JayJay0306

Netsuite Saved Search a list of ID's by invoice No

I have a request to extract specific invoices over the course of a year. For the target invoices I have the following information to use: invoice number, invoice date, order ref, value and currency. I tried extracting a few months at a time, but it's too much data.
Is there a way to filter on about 200 unique invoice numbers in NetSuite?
Thank you,
Daniel
You can do this with a Formula(Numeric) filter in your criteria like this:
Filter: Formula (Numeric)
Description: is greater than 0
Formula: INSTR(',SLS00000101,SLS00000102,SLS00000103,SLS00000104,SLS00000105,', ',' || {tranid} || ',')
Note that the initial string is a comma separated list of document numbers that begins and ends with a comma.
I'm not sure if there is an upper limit on formula size, but I've used this pattern to find a large number of transactions when I know the document numbers or internal IDs.
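With around 200 document numbers, typing the list by hand is error-prone, so it may help to generate the formula text. A small Python sketch, where build_instr_formula is a made-up helper name and the output is simply pasted into the Formula field:

```python
def build_instr_formula(doc_numbers, field="{tranid}"):
    """Build the INSTR formula text for a list of document numbers."""
    for number in doc_numbers:
        # A comma would break the delimiter logic and a quote would break the literal
        if "," in number or "'" in number:
            raise ValueError(f"unsupported character in {number!r}")
    joined = ",".join(doc_numbers)
    # The list must begin and end with a comma so that whole numbers are matched
    return f"INSTR(',{joined},', ',' || {field} || ',')"

formula = build_instr_formula(["SLS00000101", "SLS00000102", "SLS00000103"])
print(formula)
```

Wrapping both the list and the field value in commas is what prevents SLS0000010 from matching inside SLS00000101.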

What is difference between Authorize.Net AIM line item and description

I'm not sure what the difference is between description and line item. Is line item a comma-separated list of the products bought? I would like to know each variable's max length and data type. I would also like to know which field appears on the user's credit card statement, description or line item. I can't find any good documentation.
The description is your description of the transaction. It can be something like, "First subscription payment" or, "Refund for transaction #123456". It may be up to 255 characters (no symbols). See page 24 of the AIM guide for more information.
Line items are for adding each item or service the user purchased. This includes name, price, quantity, etc. It's basically so you have a record of what the purchase was for. Line items have the following format:
x_line_item (optional)
Value: Any string
Format: Line item values must be delimited by a bracketed pipe <|>
Description: Itemized order information, made up of the following values in order:

Item ID<|>
    The ID assigned to an item.
    Format: Up to 31 characters
<|>item name<|>
    A short description of an item.
    Format: Up to 31 characters
<|>item description<|>
    A detailed description of an item.
    Format: Up to 255 characters
<|>itemX quantity<|>
    The quantity of an item.
    Format: Up to two decimal places. Must be a positive number.
<|>item price (unit cost)<|>
    Cost of an item per unit, excluding tax, freight, and duty.
    Format: Up to two decimal places. Must be a positive number. The dollar sign ($) is not allowed when submitting delimited information.
<|>itemX taxable
    Indicates whether the item is subject to tax.
    Format: TRUE, FALSE, T, F, YES, NO, Y, N, 1, 0
Neither is required for a transaction to be processed, and neither appears on your customers' credit card statements. Both exist to help you keep your transactions better organized in your Authorize.Net account.
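As a rough illustration of the x_line_item format described above, here is a Python sketch that assembles one value and checks the documented limits. The helper name build_line_item and the sample item are invented for the example; how the value is then posted depends on your integration.

```python
def build_line_item(item_id, name, description, quantity, unit_price, taxable):
    """Join the six line item values with the bracketed pipe delimiter."""
    if len(item_id) > 31 or len(name) > 31:
        raise ValueError("item ID and item name are limited to 31 characters")
    if len(description) > 255:
        raise ValueError("item description is limited to 255 characters")
    if quantity <= 0 or unit_price <= 0:
        raise ValueError("quantity and unit price must be positive numbers")
    return "<|>".join([
        item_id,
        name,
        description,
        f"{quantity:.2f}",    # two decimal places
        f"{unit_price:.2f}",  # two decimal places, no dollar sign
        "TRUE" if taxable else "FALSE",
    ])

line_item = build_line_item("item1", "golf balls", "a really cool item", 2, 18.95, True)
print(line_item)
```

One x_line_item value is sent per item purchased, so an order with several products would call the helper once per product.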

Currency exchange rates through a web service in MATLAB

How can I obtain the current exchange rate for two given currencies in MATLAB?
I tried this one, however it seems that the web service is no longer available.
Is there another easy way of obtaining up-to-date currency exchange rates through a web service in MATLAB?
Build a local class from a currency conversion web service using CREATECLASSFROMWSDL. You can then use the web service's operations to do the conversion using the class methods. One currency conversion web service (there are many) is available at http://www.webservicex.net/CurrencyConvertor.asmx?WSDL. Here's an example of its use:
>> converter = createClassFromWsdl('http://www.webservicex.net/CurrencyConvertor.asmx?WSDL');
Retrieving document at 'http://www.webservicex.net/CurrencyConvertor.asmx?WSDL'
>> converter = CurrencyConvertor
endpoint: 'http://www.webservicex.net/CurrencyConvertor.asmx'
wsdl: 'http://www.webservicex.net/CurrencyConvertor.asmx?WSDL'
>> ConversionRate(converter, 'CAD', 'EUR')
ans =
0.7059
>> ConversionRate(converter, 'USD', 'CAD')
ans =
0.953
Note that ConversionRate returns a char array, i.e. you still have to convert the result with str2double if you want to do calculations with the exchange rate.
A list of the currency abbreviations is available at http://www.webservicex.net/ws/wsdetails.aspx?wsid=10.
This is an old question, but thought I would update the answer. I made this currency converter function in MATLAB (exchangerate.m) that utilizes the openexchangerates.org API, which is better supported and also includes historical data. Here's a description how it works (it's very simple):
This function returns exchange rates obtained from openexchangerates.org using their API. To work correctly, one must be connected to the Internet. The default app_id is from a free account to openexchangerates.org, which has a limit of 1000 API requests/month. For more flexibility, sign up for your own free or paid account and replace the app_id value with your own id number.
Inputs:
1) base: a string denoting the base currency, which is set to have a value of 1. If an empty string '' is provided, the default 'USD' is used. See list of valid currency abbreviations below.
2) curr: a string or cell array of strings denoting the currency abbreviation to compare with the base currency. If 'all' or '' is provided as input, then all available currencies are returned. See list of valid currency abbreviations below.
3) date: an optional string containing the date desired for the exchange rate (historical data may not always be available). The input should be in the form 'YYYY-MM-DD'. To get the latest exchange rate data, use date = 'latest' or '', which is the default value. Historical data from 1999 and onward
Outputs:
1) rates: a number or vector indicating the exchange rate(s) between the desired currency (currencies), curr, and the base currency, base.
2) currencies: a cell array of the corresponding currency abbreviations in rates.
3) rate_struct: a structure with field names equal to the currency abbreviations and associated values being the rates. This output just combines rates and currencies for convenience.
Examples:
1) Get the latest exchange rate between Bitcoin and the US Dollar (Note: All country abbreviations are listed in the m-file)
[rates,currencies,rates_struct] = exchangerate('USD','BTC');
>> rates = 1.614e-3
>> currencies = 'BTC'
>> rates_struct =
BTC: 1.614e-3
2) Get latest exchange rates for all available currencies
[rates,currencies,rates_struct] = exchangerate();
3) Obtain exchange rates for Bitcoin, Indian rupee, and Euro using the US
Dollar as base currency on June 5, 2013
[rates,currencies,rates_struct] = exchangerate('USD',{'BTC','INR','EUR'},'2013-06-05');
>> rates = [8.246e-3; 5.672e1; 7.642e-1]
>> currencies = {'BTC';'INR';'EUR'}
>> rates_struct =
BTC: 8.246e-3
INR: 5.672e1
EUR: 7.642e-1
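For readers who want to see roughly what such a function does under the hood, here is a hedged Python sketch (not the MATLAB code from exchangerate.m). It assumes the openexchangerates.org endpoints latest.json and historical/YYYY-MM-DD.json and a JSON response with a "rates" object; the helper names build_url and pick_rates are invented, and the sample payload simply reuses the figures from example 3 so that no network call is needed.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://openexchangerates.org/api"  # assumed endpoint root

def build_url(app_id, date=""):
    """Return the request URL for the latest rates or for a 'YYYY-MM-DD' date."""
    path = "latest.json" if date in ("", "latest") else f"historical/{date}.json"
    return f"{API_ROOT}/{path}?{urlencode({'app_id': app_id})}"

def pick_rates(payload, currencies=None):
    """Select rates from a decoded response; None returns all of them."""
    rates = payload["rates"]
    if currencies is None:
        return dict(rates)
    return {code: rates[code] for code in currencies}

# Canned response standing in for the web service reply (assumed shape)
sample = json.loads(
    '{"base": "USD", "rates": {"BTC": 0.008246, "INR": 56.72, "EUR": 0.7642}}'
)
selected = pick_rates(sample, ["INR", "EUR"])
print(build_url("YOUR_APP_ID", "2013-06-05"))
print(selected)
```

In real use the URL would be fetched (for example with urllib.request.urlopen) and the body decoded with json.load before being passed to pick_rates.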