SAS Enterprise Miner Score Node giving Identical Scores

This question relates to the SAS Enterprise Miner product specifically.
I have an Enterprise Miner diagram that seems to work perfectly on training data: I run both an Impute node and a Neural Network and get back a perfectly reasonable model. When I inspect the output I can see the individual predictions from the model.
I also have data to score, created using the same queries but where the target variable was null. I attempted to generate a prediction using a Score node, but for some reason every EM_PREDICTION from the Score node is identical, even though when I open the dataset I can visually confirm that the input variables differ.
I'm at a loss as to what is causing this or how to debug it. Has anyone seen this before?
Diagram
Scored view
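One way to narrow this down (a minimal sketch; the paths and dataset names below are placeholders, not from the posted diagram) is to run the SAS score code that the Score node exports directly against the scoring table, outside the flow, and check whether the inputs and EM_PREDICTION actually vary:

libname src "/path/to/score/data";                 /* placeholder path */

data work.scored;
    set src.to_score;                              /* the table with the null target */
    %include "/path/to/exported_score_code.sas";   /* score code exported from the Score node results */
run;

proc means data=work.scored n nmiss min max std;
    var EM_PREDICTION;                             /* a zero STD here reproduces the symptom */
run;

If the raw inputs vary but the IMP_ variables created by the Impute node come out constant, the scoring table's variable names, types, or lengths probably do not match what the trained flow expects, so every row is imputed to the same values and receives the same prediction; comparing PROC CONTENTS of the training and scoring tables usually exposes that kind of mismatch.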

Related

Sentiment Analysis PowerBI AI Insights Visualization

I have a data set of online product reviews (without any grades/stars/etc.). To this data set I applied the integrated PowerBI AI-Insights Text Analytics Sentiment Analysis model and got a sentiment score for each review. Next, I transformed the score into discrete text values: POSITIVE, NEGATIVE, and NEUTRAL.
The dataset is artificially created by me, so I know the polarity of each comment. Now I want to compare the predicted value to the actual value. I've done this by adding a new column that compares the actual value with the predicted value and displays "PREDICTED" if the correct value was predicted and "NOT PREDICTED" if the prediction was false (it doesn't matter whether it was positive, negative, or neutral). My goal is to calculate some model metrics so I can evaluate the capabilities of this PowerBI integrated model and visualize the results. How can I do this? Is accuracy the first thing I should start with? If so, how can I calculate and visualize a result like accuracy?
Thank you for all your answers in advance.
Yes, consider accuracy first. If you find that 70 or 80 percent or more of the results are accurate, you can reasonably rely on the PowerBI AI-Insights Text Analytics Sentiment Analysis and then create your visuals for the sentiment data. But if predicted and not-predicted results occur roughly 50-50, you may want to try a third-party sentiment analysis service such as Google or Alchemy.
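With the PREDICTED / NOT PREDICTED column described in the question, accuracy is simply the share of reviews that were classified correctly (a sketch of the calculation; a card or matrix visual over this ratio is one way to display it):

\[
\text{accuracy} = \frac{\#\,\text{PREDICTED}}{\#\,\text{PREDICTED} + \#\,\text{NOT PREDICTED}}
\]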

In SAS, how do you create a certain number of records where the primary outcome does not occur based on the value of another variable?

I am examining the effect of passing vs running plays on injuries across a few football seasons. The way the data was collected, all injuries were recorded as well as information about the play in which the injury occurred (ie position, quarter, play type), game info (ie weather conditions, playing surface, etc), and team info (ie number of pass vs run plays in the game).
I would like to use play type as the primary exposure with the outcome as injury vs. no injury, analyzed with logistic regression, but to do so I would need to create all the records with no injury. There is a range from 0 to around 6-7 injuries in a game for a team, and the total passing and running plays are recorded, so I would need to find a way to add X (total passing plays minus injuries on passing plays) and Y (total running plays minus injuries on running plays) records that share all the details for that particular game but have no injury as the outcome. I imagine there is a way in PROC SQL to do this, but I could not find it online. How would I go about coding this?
I have attached an example of the relevant data. An example of what I would need to do is for game 1 add 30 records for passing plays and 38 records for running plays with outcome of no injury and otherwise the same data (team A, dry weather, game plays).
You can use the FREQ statement to avoid having to de-aggregate the data.
The FREQ statement identifies a variable that contains the frequency of occurrence of each observation. PROC LOGISTIC treats each observation as if it appears n times, where n is the value of the FREQ variable for the observation. If it is not an integer, the frequency value is truncated to an integer.
SAS Documentation
De-aggregating the data would require a DATA step and a DO loop; it's not recommended.
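As a rough sketch of how that fits the aggregated injury data (the dataset and variable names below are hypothetical, not the poster's actual columns), you can build one row per game, play type, and injury status with a count, and hand the count to FREQ:

data plays_long;
    set games;   /* hypothetical: one row per game/team with pass_plays, run_plays,
                    pass_injuries, run_injuries, plus covariates such as surface */
    length play_type $4;

    play_type = 'Pass';
    injury = 1; count = pass_injuries;               if count > 0 then output;
    injury = 0; count = pass_plays - pass_injuries;  if count > 0 then output;

    play_type = 'Run';
    injury = 1; count = run_injuries;                if count > 0 then output;
    injury = 0; count = run_plays - run_injuries;    if count > 0 then output;
run;

proc logistic data=plays_long;
    class play_type surface / param=ref;
    model injury(event='1') = play_type surface;
    freq count;   /* each row now stands for COUNT identical plays */
run;

This gives the same estimates as physically creating the 30 no-injury passing records and 38 no-injury running records per game, without exploding the table.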

Visual has exceeded the available resources. Tips on how to streamline / better understand limitations?

OK, so I have a relatively complex report that works well in the desktop app but is bombing out on the web portal. Apparently it is requesting 1,048,584 KB, which is just over the 1,048,576 KB limit.
This report is a matrix, built as follows:
It is connected to two primary data sources, along with some tertiary feeds and helper tables. One of these is a sales detail table, an 887 MB CSV. The other is a purchasing detail table, a 26 MB XLS.
I have filtered out portions of the sales table (by date) in the Edit Queries screen. I have also filtered out specific item divisions in the matrix. It is the second step that was allowing this visual to function previously (I took out a few not-needed divisions and it started working again, but now this no longer seems to work).
I would like not just a quick answer here, but also to better understand how Power BI allocates memory and how I can streamline. The rest of the report uses the same data, but this is the only visual that fails to load (aside from some tables that are at line level and are intended to be filtered down via slicers before displaying information). I will add that there are some relatively complex measures firing on this visual that are not used anywhere else; I presume this has a lot to do with the memory demands... right?

How to estimate monthly/daily sales of an item using the Amazon Advertising API

I have looked into the responses of the ItemSearch() and ItemLookup() functions in the Amazon Advertising API and could not find a possible way to get the daily/monthly sales of an item.
Popular product research software like JungleScout, ProfitPhonix, AMZ Tracker, etc. does display monthly sales numbers, but they all show different results.
Does Amazon provide this information? If not, then how is the above software estimating it?
I think when they fetch the ASIN information, they store something in their DB, and the next time the same ASIN is pulled, the estimated sales are roughly calculated based on the previous value/score in the DB.
Any help will be highly appreciated.
Thanks
It is not a solution, but here is a reply from UnicornSmasher that I found; it may help save time searching for something that doesn't exist.
constantine We just took all of the bulk data from the products that are being tracked in AMZ Tracker and applied a formula to it all. If you have specific products that are way off please let us know! Certain categories we had less data on. This is version 1 of the research tool, so I'm sure it will continue to improve quickly over time.
Here is the link to the question and answer:
amz forum
So, now, the question is 'What formula do they use?'
Let me know if you come up with an idea :)
Let me tell you first that if you're not part of the Amazon data team, you can't get the exact sales numbers of any product. And it's probably not easy to estimate sales using the Amazon Advertising API alone: you need to constantly track a huge number of products. Here I can explain how AMZ Insight, an Amazon tracking tool, estimates the sales of a product.
They constantly track a few thousand products from all the categories and collect massive amounts of data. Their in-house data scientists then analyze the data to form the sales-estimating algorithm. The relationship between the data points is scattered, which of course means the sales estimates are not 100 percent right.
Data is continuously gathered and analyzed by tracking the Best Seller Rank (BSR), Buy Box, reviews, and other factors. A relationship between these data points and unit sales is then fitted. Once this relationship is in place, it is much easier to estimate monthly sales and revenue for a product.
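Purely as an illustration of the kind of fit such tools make (hypothetical table and variable names, not AMZ Insight's actual algorithm), a first approximation is a log-log regression of observed unit sales on BSR for a tracked sample, with the fitted curve then applied to new ASINs:

/* Hypothetical table TRACKED with observed daily unit sales and BSR
   for a sample of tracked ASINs. */
data tracked_log;
    set tracked;
    log_sales = log(unit_sales);
    log_bsr   = log(bsr);
run;

proc reg data=tracked_log outest=fit;
    model log_sales = log_bsr;
run;

/* Apply the fitted curve to a new ASIN's observed BSR to get a
   rough daily-sales estimate (again, purely illustrative). */
data estimate;
    if _n_ = 1 then set fit(keep=intercept log_bsr rename=(log_bsr=slope));
    set new_asins;                      /* hypothetical: asin, bsr */
    est_daily_sales = exp(intercept + slope*log(bsr));
run;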

How do I choose an appropriate number of customers for cluster analysis?

I am currently doing a customer segmentation project in SAS.
I have identified 2,700 customers who have made a purchase in each of the 4 years I am analysing. For the cluster analysis, the more purchases per customer each year, the better the data quality. However, as I become more selective about the number of purchases required each year per customer, the fewer customers can be considered in the cluster analysis.
How should I go about choosing the cutoff point for the number of purchases per customer per year necessary to be included in the analysis? I am struggling with this trade-off between data quality and having enough customers for analysis.
Thanks a lot! :)
There is no correct way. It entirely depends on your data.
Clustering such data is "magic" and the results tend to be anything but statistically sound, more like random guesses.
Because of this, always try multiple parameters and carefully inspect the results. No equation will ever tell you what a good clustering is.
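One practical way to see the trade-off before clustering anything (a sketch; the table and column names cust_yearly, customer_id, year, and purchases are assumptions) is to count how many customers survive each candidate cutoff:

%macro cutoff_scan(cutoffs=1 2 3 5 10);
    %local i c;
    %do i = 1 %to %sysfunc(countw(&cutoffs));
        %let c = %scan(&cutoffs, &i);
        title "Customers with at least &c purchases in every year";
        proc sql;
            select &c as min_purchases,
                   count(*) as n_customers
            from (select customer_id
                  from cust_yearly                   /* one row per customer-year */
                  group by customer_id
                  having count(distinct year) = 4    /* active in all 4 years */
                     and min(purchases) >= &c);
        quit;
    %end;
    title;
%mend cutoff_scan;

%cutoff_scan()

Look at where the customer count collapses, then run the clustering at two or three of those cutoffs and compare whether the segments stay stable, in the spirit of the answer above.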