I have two datasets, one large and one small. While functionality is most important here, efficiency matters as well, because the large dataset could potentially hold tens of millions of records. Let's call my large dataset "Transactions" and my small one "Prices." Here's what I'm trying to do: in the Transactions file there are a number of values for 'Store'. For each 'Store', I would like to build a hash table of the relevant 'Product' values, and for each unique 'SaleDate', pull the product 'Price' into the output table, not just for the associated 'Product' but for ALL 'Products' found in the Transactions dataset for that 'Store'.
Here's a sample of the "Transactions" dataset:
Store Product SaleDate Price
A apple 1/1/2011 1.05
A apple 1/3/2011 1.02
A apple 1/4/2011 1.07
A pepper 1/2/2011 0.73
A pepper 1/3/2011 0.75
A pepper 1/6/2011 0.79
And here's a sample of the "Prices" dataset:
Product Saledate Price
apple 1/1/2011 1.05
apple 1/2/2011 1.06
apple 1/3/2011 1.02
apple 1/4/2011 1.07
...
pepper 1/1/2011 0.74
pepper 1/2/2011 0.73
pepper 1/3/2011 0.75
pepper 1/4/2011 0.75
pepper 1/5/2011 0.75
pepper 1/6/2011 0.79
...
mango 1/1/2011 2.40
mango 1/2/2011 2.42
...
And here's the desired output for this scenario (an asterisk denotes a price that was inserted from "Prices" into "Transactions" via lookup).
Store Product Saledate Price
A apple 1/1/2011 1.05
A pepper 1/1/2011 0.74 *
A apple 1/2/2011 1.06 *
A pepper 1/2/2011 0.73
A apple 1/3/2011 1.02
A pepper 1/3/2011 0.75
A apple 1/4/2011 1.07
A pepper 1/4/2011 0.75 *
A apple 1/6/2011 1.10 *
A pepper 1/6/2011 0.79
Basically, in pseudo-code: loop through each store, look up its list of distinct products and saledates, then find the corresponding prices for those products and saledates in the Prices data and insert them if they are not already there. In the example, since apple has a saledate of 1/1 in "Transactions", I need that saledate for pepper as well (from "Prices"). The same applies for 1/2, except vice versa, since the price is already there for pepper but not for apple. 1/5 has a record in "Prices", but it's not needed, since it doesn't occur for either apple or pepper in "Transactions". For another store, different products exist, so pepper may not be relevant at all, but mango is.
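The pseudo-code above can be sketched outside SAS as well. Here is a minimal pandas illustration with toy data matching the samples (apple's 1/5 and 1/6 prices are assumed, since the Prices sample elides them); the point is that the "double lookup" reduces to two cheap joins on small distinct-pair tables rather than a brute-force many-to-many scan:

```python
import pandas as pd

# Toy frames mirroring the question's samples.
trans = pd.DataFrame({
    "Store":    ["A"] * 6,
    "Product":  ["apple", "apple", "apple", "pepper", "pepper", "pepper"],
    "SaleDate": ["1/1/2011", "1/3/2011", "1/4/2011",
                 "1/2/2011", "1/3/2011", "1/6/2011"],
})
prices = pd.DataFrame({
    "Product":  ["apple"] * 6 + ["pepper"] * 6,
    "SaleDate": ["1/1/2011", "1/2/2011", "1/3/2011",
                 "1/4/2011", "1/5/2011", "1/6/2011"] * 2,
    "Price":    [1.05, 1.06, 1.02, 1.07, 1.10, 1.15,   # apple (last two assumed)
                 0.74, 0.73, 0.75, 0.75, 0.75, 0.79],  # pepper
})

# Distinct (store, date) and (store, product) pairs, cross-matched per store,
# then priced with a left merge -- the "double lookup" as two joins.
store_dates    = trans[["Store", "SaleDate"]].drop_duplicates()
store_products = trans[["Store", "Product"]].drop_duplicates()
want = (store_dates
        .merge(store_products, on="Store")          # every product/date combo per store
        .merge(prices, on=["Product", "SaleDate"], how="left")
        .sort_values(["Store", "SaleDate", "Product"])
        .reset_index(drop=True))
```

For store A this yields ten rows (five dates times two products), each carrying a price from the lookup.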
I've tried several approaches and can't get around the "double" look-up hangup that I think is necessary.
Here is a link to a previous question/answer related to this new question. The answer provides sample code to create the dummy tables. https://stackoverflow.com/a/36961795/214994
I think you can accomplish what you are looking for with a series of SQL statements.
data trans;
    format Store $1. Product $6. SaleDate mmddyy10. price best.;
    informat SaleDate mmddyy10.;
    input Store $ Product $ SaleDate price;
datalines;
A apple 1/1/2011 1.05
A apple 1/3/2011 1.02
A apple 1/4/2011 1.07
A pepper 1/2/2011 0.73
A pepper 1/3/2011 0.75
A pepper 1/6/2011 0.79
B apple 1/1/2011 1.05
B apple 1/3/2011 1.02
B apple 1/4/2011 1.07
B mango 1/2/2011 2.42
B mango 1/3/2011 2.43
B mango 1/6/2011 2.46
;
data prices;
    format Product $6. SaleDate mmddyy10. price best.;
    informat SaleDate mmddyy10.;
    input Product $ SaleDate price;
datalines;
apple 1/1/2011 1.05
apple 1/2/2011 1.06
apple 1/3/2011 1.02
apple 1/4/2011 1.07
apple 1/5/2011 1.10
apple 1/6/2011 1.15
pepper 1/1/2011 0.74
pepper 1/2/2011 0.73
pepper 1/3/2011 0.75
pepper 1/4/2011 0.75
pepper 1/5/2011 0.75
pepper 1/6/2011 0.79
mango 1/1/2011 2.40
mango 1/2/2011 2.42
mango 1/3/2011 2.43
mango 1/4/2011 2.44
mango 1/5/2011 2.45
mango 1/6/2011 2.46
;
proc sql noprint;
    /* Store and date combinations */
    create table store_dates as
        select distinct store, saledate
        from trans;

    /* Store and product combinations */
    create table store_products as
        select distinct store, product
        from trans;

    /* Full-join the combination tables, then look up the price with a left join */
    create table want as
        select a.store,
               b.product,
               a.saledate,
               c.price
        from store_dates as a
        full join store_products as b
            on a.store = b.store
        left join prices as c
            on b.product = c.product
           and a.saledate = c.saledate
        order by a.store, a.saledate, b.product;
quit;
This assumes that there are no differences in prices between TRANS and PRICES. If prices can differ, add another left join to TRANS and a coalesce() to prefer the transaction price.
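The coalesce() fallback mentioned here can be illustrated with a small pandas sketch (hypothetical frames: `lookup` stands for the joined result above, and `trans` carries a deliberately different observed price for apple on 1/1):

```python
import pandas as pd

# Hypothetical data: the lookup result, and a transaction with its own price.
lookup = pd.DataFrame({
    "Store":    ["A", "A"],
    "Product":  ["apple", "pepper"],
    "SaleDate": ["1/1/2011", "1/1/2011"],
    "Price":    [1.05, 0.74],
})
trans = pd.DataFrame({
    "Store":    ["A"],
    "Product":  ["apple"],
    "SaleDate": ["1/1/2011"],
    "Price":    [1.04],   # differs from the lookup's 1.05
})

merged = lookup.merge(trans, on=["Store", "Product", "SaleDate"],
                      how="left", suffixes=("_lookup", "_trans"))
# coalesce(trans.price, lookup.price): prefer the observed transaction price,
# fall back to the looked-up price where no transaction exists.
merged["Price"] = merged["Price_trans"].combine_first(merged["Price_lookup"])
```

Apple keeps its observed 1.04 while pepper falls back to the looked-up 0.74.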
I would like to create a matrix visual like the one below and add data bars as conditional formatting to the "Sales Percentage" column, with different user-defined max and min values depending on the country.
I have the following dummy data
Salesperson  Country  Product        Sales Percentage  Total Sales
Gina         Canada   City Bike      0.02              232
Gina         Canada   Mountain Bike  0.56              2800
Gina         Italy    City Bike      0.32              213
Gina         Italy    Mountain Bike  0.21              1050
Gina         USA      City Bike      0.11              122
Gina         USA      Mountain Bike  0.43              2150
John         Canada   City Bike      0.32              333
John         Canada   Mountain Bike  0.34              442
John         Italy    City Bike      0.12              2132
John         Italy    Mountain Bike  0.67              1233
John         USA      City Bike      0.22              3300
John         USA      Mountain Bike  0.45              7300
Mary         Canada   City Bike      0.21              121
Mary         Canada   Mountain Bike  0.53              2650
Mary         Italy    City Bike      0.32              213
Mary         Italy    Mountain Bike  0.12              600
Mary         USA      City Bike      0.11              123
Mary         USA      Mountain Bike  0.12              600
The matrix looks like this after showing columns as rows and putting "Sales Percentage" and "Total Sales" as values, Country as columns, and Product + Salesperson as rows:
I can add data bars when I right-click "Sales Percentage" under Values, but I can only enter one user-defined min and max value for the whole "Sales Percentage" column. Is it possible to have a different maximum value for the data bars based on the Country? For example, a target value of 35% for Canada, 40% for the USA and 50% for Italy. In other words, the data bar would be full when the Sales Percentage for Canada reaches 35%, full when the Sales Percentage for the USA reaches 40%, and so on.
This isn't possible with your current setup. The best you can do to approximate it is as follows.
Create a measure as follows:
% Canada = CALCULATE(SUM('Table'[Total Sales]), 'Table'[Country] = "Canada")
Do the same for USA and Italy and then add them as values to your matrix.
You can now select individual targets for each country.
Imagine a table with 3 columns:
Date        AssetType  Value
2022-01-01  A          1
2022-01-02  A          1.02
2022-01-03  A          1.05
2022-01-04  A          1.09
2022-01-05  A          1.06
2022-01-03  B          1
2022-01-04  B          1.05
2022-01-05  B          1.07
2022-01-06  B          1.09
2022-01-07  B          1.08
The first date of 2022 for each asset is different:
Asset A - 2022-01-01
Asset B - 2022-01-03
I want to create a new column or measure that returns the first date of 2022 for each asset.
So far I've tried:
= CALCULATE(STARTOFYEAR(Table[Date]), FILTER(Table, Table[AssetType] = [Asset Type]))
Note: [Asset Type] is a measure that gives me the type of asset.
But it returns the same date for both assets (2022-01-01).
Does anyone know how to get this done? Here is the expected result:
Date        AssetType  Value  FirstDate
2022-01-01  A          1      2022-01-01
2022-01-02  A          1.02   2022-01-01
2022-01-03  A          1.05   2022-01-01
2022-01-04  A          1.09   2022-01-01
2022-01-05  A          1.06   2022-01-01
2022-01-03  B          1      2022-01-03
2022-01-04  B          1.05   2022-01-03
2022-01-05  B          1.07   2022-01-03
2022-01-06  B          1.09   2022-01-03
2022-01-07  B          1.08   2022-01-03
Thx
OK. This time, create a calculated column and paste this code:
FirstDate =
CALCULATE (
MIN ( YourTable[Date] ),
ALLEXCEPT ( YourTable, YourTable[AssetType] )
)
The result:
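The effect of ALLEXCEPT here, keeping only the AssetType filter so every row sees the minimum Date within its asset, has a direct analogue in pandas. A small sketch with assumed toy data:

```python
import pandas as pd

# Toy rows mirroring the question: each asset starts on a different date.
df = pd.DataFrame({
    "Date":      ["2022-01-01", "2022-01-02", "2022-01-03",
                  "2022-01-03", "2022-01-04"],
    "AssetType": ["A", "A", "A", "B", "B"],
})

# groupby + transform("min") plays the role of
# CALCULATE(MIN(Date), ALLEXCEPT(Table, Table[AssetType])):
# the per-group minimum is broadcast back onto every row.
df["FirstDate"] = df.groupby("AssetType")["Date"].transform("min")
```

Each A row gets 2022-01-01 and each B row gets 2022-01-03, matching the expected FirstDate column.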
I have many datasets, from separate files, to be merged and arranged into one single output file. Here is an example of two datasets to be merged accordingly.
Data 1 from File 1:
9.00 2.80 13.08 12.78 0.73
10.00 -3.44 19.30 18.99 0.14
12.00 2.60 20.28 20.12 0.39
Data 2 from File 2:
2.00 -7.73 20.04 18.49 0.62
5.00 -4.82 17.07 16.38 0.59
6.00 -2.69 12.55 12.25 0.50
8.00 -3.85 18.06 17.64 0.94
9.00 -3.59 16.13 15.73 0.64
Expected output in one file:
9.00 2.80 13.08 12.78 0.73
10.00 -3.44 19.30 18.99 0.14
12.00 2.60 20.28 20.12 0.39
2.00 -7.73 20.04 18.49 0.62
5.00 -4.82 17.07 16.38 0.59
6.00 -2.69 12.55 12.25 0.50
8.00 -3.85 18.06 17.64 0.94
9.00 -3.59 16.13 15.73 0.64
For now, the script I'm using, with a Python for loop, is like this:
import numpy as np
import glob

path = './13-stat-plot-extreme-combine/'
files = glob.glob(path + '13-stat*.dat')
for x in range(len(files)):
    file1 = files[x]
    data1 = np.loadtxt(file1)
    np.savetxt("Combine-Stats.dat", data1, fmt='%9.2f')
The problem is that only one dataset ends up in the new file, since each iteration overwrites it. How can I use concatenate in this case to combine the datasets along an axis?
Like this:
arrays = [np.loadtxt(name) for name in files]
combined = np.concatenate(arrays)
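A self-contained version of the idea, using hypothetical in-memory arrays in place of the loaded files, and saving once after concatenation instead of overwriting inside the loop:

```python
import numpy as np

# Hypothetical stand-ins for two arrays loaded with np.loadtxt.
data1 = np.array([[ 9.00,  2.80, 13.08],
                  [10.00, -3.44, 19.30]])
data2 = np.array([[ 2.00, -7.73, 20.04],
                  [ 5.00, -4.82, 17.07],
                  [ 6.00, -2.69, 12.55]])

# axis=0 (the default) stacks the blocks row-wise, preserving file order.
combined = np.concatenate([data1, data2], axis=0)

# Write the combined table once, after the loop, not on every iteration.
np.savetxt("Combine-Stats.dat", combined, fmt="%9.2f")
```

The 2-row and 3-row blocks become one 5-row array, exactly the expected output layout.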
I'm looking to do a lookup that is sort of a hybrid between a join and union. I have a large number of records in my main dataset, so I'm looking to do something that wouldn't be a "brute force" method of a many-to-many matrix.
Here is my main dataset, called 'All', which already contains price for each of the products listed.
product date price
apple 1/1/2011 1.05
apple 1/3/2011 1.02
apple 1/4/2011 1.07
pepper 1/2/2011 0.73
pepper 1/3/2011 0.75
pepper 1/6/2011 0.79
My other dataset ('Prices' - not shown here, but it contains the same two keys, product and date) contains prices for all products on each possible date. The hash-table lookup I would like to create would essentially look up every date in the 'All' table and output prices for ALL products for that date, resulting in a table such as this:
product date price
apple 1/1/2011 1.05
pepper 1/1/2011 0.71 *
apple 1/2/2011 1.04 *
pepper 1/2/2011 0.73
apple 1/3/2011 1.02
pepper 1/3/2011 0.75
apple 1/4/2011 1.07
pepper 1/4/2011 0.76 *
apple 1/6/2011 1.10 *
pepper 1/6/2011 0.79
That is, as long as one product has a date and price specified in the 'All' table, all other products should pull that date in from the lookup table as well.
The asterisks indicate that the price was looked up from the prices table, and new rows containing prices for those products were essentially inserted into the new table.
If hash tables are not a great way to go about this, please let me know about alternative methods.
Well, this is far from elegant, but I'm curious whether the below gives you the desired result. Since you have multiple records per key in ALL (which I assume you want to maintain), I basically unioned ALL with the records in PRICES that have a date in ALL, but added an EXCEPT so as to exclude records that were already in ALL. No idea if this makes sense or is doing what you want. It certainly doesn't qualify as 'elegant'.
data all;
    input product :$7. date :mmddyy10. price;
    Y=1;
    format date mmddyy10.;
cards;
apple 01/01/2011 1.05
apple 01/01/2011 1.05
apple 01/03/2011 1.02
pepper 01/02/2011 0.73
pepper 01/03/2011 0.75
pepper 01/06/2011 0.79
;
run;
data prices;
    input product :$7. date :mmddyy10. price;
    format date mmddyy10.;
cards;
apple 01/01/2011 1.05
apple 01/02/2011 1.04
apple 01/03/2011 1.02
apple 01/04/2011 1.07
apple 01/05/2011 1.01
pepper 01/01/2011 0.70
pepper 01/02/2011 0.73
pepper 01/03/2011 0.75
pepper 01/04/2011 0.76
pepper 01/05/2011 0.77
pepper 01/06/2011 0.79
;
run;
proc sql;
    create table want as
    select * from all
    union corr all
    (
        (select product, date, price
           from prices
          where date in (select distinct date from all))
        except corr
        select product, date, price from all
    );
quit;
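For readers more comfortable outside SQL, the same union/except logic can be sketched in pandas with hypothetical toy data (note this glosses over EXCEPT's deduplication semantics; it only removes rows that literally appear in ALL):

```python
import pandas as pd

# Hypothetical miniature versions of ALL and PRICES.
all_df = pd.DataFrame({
    "product": ["apple", "pepper"],
    "date":    ["01/01/2011", "01/02/2011"],
    "price":   [1.05, 0.73],
})
prices = pd.DataFrame({
    "product": ["apple", "apple", "pepper", "pepper"],
    "date":    ["01/01/2011", "01/02/2011", "01/01/2011", "01/02/2011"],
    "price":   [1.05, 1.04, 0.70, 0.73],
})

# PRICES rows whose date occurs somewhere in ALL ...
candidates = prices[prices["date"].isin(all_df["date"])]

# ... minus the rows already present in ALL (the EXCEPT CORR step):
# an indicator merge flags rows found only on the PRICES side.
new_rows = (candidates.merge(all_df, how="left", indicator=True)
                      .query('_merge == "left_only"')
                      .drop(columns="_merge"))

# ... appended to ALL (the UNION CORR ALL step).
want = pd.concat([all_df, new_rows], ignore_index=True)
```

Here the two missing rows (apple on 1/2 and pepper on 1/1) are pulled in from PRICES, and the rows ALL already had are left untouched.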
Hi, I am using Regex Hero to build a regex. It worked as expected in Regex Hero, but when I transferred it over to VB.NET I get different results from the exact same data. I don't get it!
The Regex:
\d{10,13}.+?(?=(\bF\b|\bT\b|\bCT\b))
The .net code:
Dim strRegex As String = "\d{10,13}.+?(?=(\bF\b|\bT\b|\bCT\b))"
Dim myRegex As New Regex(strRegex, RegexOptions.None)
Dim strTargetString As String = a
For Each myMatch As Match In myRegex.Matches(strTargetString)
    If myMatch.Success Then
        RichTextBox1.AppendText(myMatch.Value & Environment.NewLine)
    End If
Next
The Data:
" The Meijer Team appreciates your business 12/26/14 Your fast and friendly checkout was provided by ALARIA MEIJER SAVINGS SPECIALS 4.77 COUPONS 20.00 SAVINGS TOTAL 24.77 YOUR TOTAL SAVINGS SINCE 01/01/14 1,814.62 For additional savings and rewards visit mPerks.com GENERAL MERCHANDISE 7569107330 DYNO GEL THIMBL 1.49 CT 7569100487 DYNO THIMBLES 3.49 CT DRUGSTORE 2220094152 DEODORANT 1.99 T 30041667803 TOOTHBRUSH 9.99 T *70882049496 ORBIT TOOTHB was 3.69 now 2.95 T *1700006806 ANTIPERSPIRANT 1 # 2 / 6.00 was 3.99 now 3.00 T GROCERY 6414404213 CHEF BOYARDEE 2 # 1.07 2.14 F 6414404306 CHEF BOYARDEE 1.07 F 6414404315 CHEF BOYARDEE 1.07 F 6414404322 CHEF BOYARDEE 1.07 F 7680828008 SPAGHETTI 2 # 1.34 2.68 F 5100002549 PASTA SAUCE 2 # 1.97 3.94 F 4335400750 TORTILLAS 1.99 F 4400000057 SALTINES 2.69 F 4400002854 NABISCO OREOS 2 # 2.98 5.96 F 1312000484 FROZEN FRIES 2.99 F 4125002562 CHEESE SLICES 2.99 F 4125010210 MEIJER MILK 2 # 3.09 6.18 F 5150092751 PANCAKE SYRUP 3.19 F 3000032188 OATMEAL 3.29 F 3000032189 OATMEAL 3.29 F 8390000649 GOLD PEAK TEA 3.29 F 71373336283 MEIJER MILK 3.29 F 4400002734 COOKIES 3.49 F 1410007083 PEPPERDIGE FARM 2 # 3.99 7.98 F 1600027297 CEREAL 4.19 F 1600043471 CEREAL 4.19 F 4850001833 ORANGE JUICE 5.69 F 1450001420 BIRDS EYE VOILA 7.39 F *4400003113 SNACK CRACKER was 2.77 now 1.99 F *5100002526 SPAGHETTIOS 3 # 5 / 5.00 was 3.27 now 3.00 F *7192176312 FROZEN PIZZA 2 # 5.49 was 12.58 now 10.98 F *7131400331 AUNT MILLIE"S 1 # 2 / 6.00 was 3.39 now 3.00 F Total Basket Coupon => 20.00 off -20.00 Mperks # -- ********** TOTAL MI 6% Sales Tax 1.16 TOTAL TAX 1.16 TOTAL 107.09 PAYMENTS Primary Account - Debit ATM/DEBIT CARD TENDER 107.09 XXXXXXXXXXXXXXXX NUMBER OF ITEMS 42 See meijer.com or the Service Desk for current return policy. For additional savings and rewards visit mPerks.com. Tx:XXX Op:XXXXXX Tm:XX St:XX XXXXXXXXXXX How are we doing? Rate your shopping experience and you may win $1000 in Meijer gift cards! 
Visit us at www.meijer.com/tellmeijer or call 1-800-394-7198 Secure Code: 7800-0601-5020-3373-001 Survey should be completed within 72 hrs "
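One way to sanity-check the pattern independently of both Regex Hero and the VB.NET host is to run it against a short excerpt in another engine; Python's `re` and .NET's `Regex` should behave the same for the constructs used here (digit classes, lazy quantifiers, word boundaries, lookahead). A sketch using a fragment of the receipt data above:

```python
import re

# Same pattern as the VB.NET snippet: 10-13 digits, then a lazy run of
# characters, stopped just before a standalone F, T, or CT token.
pattern = re.compile(r"\d{10,13}.+?(?=(\bF\b|\bT\b|\bCT\b))")

# A short excerpt of the receipt text (two line items).
sample = ("7569107330 DYNO GEL THIMBL 1.49 CT "
          "7569100487 DYNO THIMBLES 3.49 CT")

matches = [m.group(0) for m in pattern.finditer(sample)]
for m in matches:
    print(m)
```

If the matches here differ from what the VB.NET loop appends, the discrepancy is in the input string (e.g. line breaks or whitespace differences in `a`) rather than in the pattern itself.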