PCA for sentiment analysis of tweets

I am working on a research project that relates sentiment analysis of tweets to financial market indexes such as the S&P 500 and the VIX. My work is based on the Tetlock (2007) paper.
I categorized every word in every tweet according to the Harvard IV psychological dictionary and then summarised the result on a daily basis (i.e. I computed the frequency of each category per day). I then rescaled the frequencies by dividing them by the total number of words in all tweets on that day. I also selected only a few categories rather than all 180+.
The idea is to construct a factor that captures the latent sentiment in those tweets, and the obvious choice is to run a PCA on my daily category frequencies (this is also Tetlock's approach).
My issue is that categories such as Pstv and Ngtv have the same loading sign on the first factor, while I expected them to have opposite signs. Here is a screenshot of the R console showing the loadings, and here is the scree plot of the components.
Any idea why this would happen?
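For reference, the pipeline described above can be sketched in a few lines. This is only a toy illustration in plain Python, not the asker's R code: the daily frequencies below are made up, and only two categories (Pstv and Ngtv) are used. It standardizes the columns, builds the correlation matrix, and extracts the first component by power iteration. For two standardized variables, the first component loads both with the same sign whenever their correlation is positive, so one thing worth checking is whether the daily Pstv and Ngtv frequencies simply rise and fall together. Note also that the overall sign of a component is arbitrary; only the relative signs of the loadings carry meaning.

```python
import math

def standardize(rows):
    """Column-wise z-scores for a list of equal-length rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(k)]
            for i in range(k)]

def first_component(matrix, iters=500):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up daily frequencies (share of all words that day): (Pstv, Ngtv).
daily_freq = [
    (0.040, 0.020),
    (0.055, 0.031),
    (0.035, 0.018),
    (0.060, 0.035),
    (0.045, 0.026),
    (0.050, 0.027),
]

corr = correlation_matrix(standardize(daily_freq))
loadings = first_component(corr)
print("corr(Pstv, Ngtv):", round(corr[0][1], 3))
print("first-component loadings:", [round(x, 3) for x in loadings])
```

In this toy data both categories are more frequent on the same days, so the first component picks up that shared movement and both loadings come out with the same sign.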

Related

How to create an evenly spaced timeseries for forecasting

I am trying to create a forecast but this is the error that I get:
I am working with about 300,000 rows of data. Most of the report has already been built. My data just doesn't contain certain dates. How can I solve this issue?
So the issue boils down to the problem of "How to create an evenly spaced timeline". You can easily achieve this in Power Query:
1. Create a separate daily date table.
2. Outer join your observations onto the dates, which will give you "null" for the unobserved days.
3. Apply the "fill down" operation on your values column, which means that the last value is repeated until a new observation appears.
The resulting evenly spaced time series is suitable for ML forecasting, at least when it comes to predicting trends. But the real power of this feature in Power BI is in predicting seasonality, and you most likely won't get that right with the above interpolation.
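To make the logic of the three steps concrete, here is a minimal sketch in plain Python rather than Power Query. The function name `fill_daily` and the sample values are made up for illustration. It builds every calendar day between the first and last observation, looks up the observed value for each day, and repeats the last seen value where a day is missing.

```python
from datetime import date, timedelta

def fill_daily(observations):
    """Return one (day, value) pair per calendar day, carrying the last
    observed value forward over days that have no observation."""
    by_day = dict(observations)
    day, end = min(by_day), max(by_day)
    filled, last = [], None
    while day <= end:
        if day in by_day:          # the "join" hit: a real observation
            last = by_day[day]
        filled.append((day, last))  # otherwise "fill down" the last value
        day += timedelta(days=1)
    return filled

# Made-up observations with a gap on 3 and 4 January.
observed = [
    (date(2021, 1, 1), 10.0),
    (date(2021, 1, 2), 12.0),
    (date(2021, 1, 5), 9.0),
]

daily = fill_daily(observed)
for day, value in daily:
    print(day.isoformat(), value)
```

The missing days get the value from 2 January, which is exactly why this kind of fill preserves the level of the series but flattens any seasonal pattern inside the gaps.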

Uncollapse Data for Survival Analysis SAS

I have data that looks like this:
where month is the number of months that have passed, vegetable is a category of interest, and n_spoiled is the number of vegetables from that category that spoiled after that many months.
I am interested in running a survival analysis to compare the curves for these three categories (proc lifetest).
It is my understanding that to run a survival analysis in SAS we need the 'uncollapsed' version of this data, so that, for example, we would see 3289 entries with month=1 and potato, 9 entries with month=1 and onion, and so on. None of these entries would need to be censored for the analysis, as all non-completions were omitted from the aggregated data.
I would really appreciate it if someone could help me modify the data so that it runs or, alternatively, show me how to run the test without 'uncollapsing' the data.
Thank you.
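The 'uncollapsing' step itself is just repeating each aggregated row n_spoiled times. Here is a minimal sketch of that idea in plain Python, not SAS. The two month=1 counts come from the question; the month=2 row and the `event` column name are made up for illustration.

```python
def uncollapse(rows, count_key="n_spoiled"):
    """Expand aggregated rows into one record per spoiled vegetable."""
    expanded = []
    for row in rows:
        record = {k: v for k, v in row.items() if k != count_key}
        record["event"] = 1  # every unit spoiled, so nothing is censored
        expanded.extend(dict(record) for _ in range(row[count_key]))
    return expanded

collapsed = [
    {"month": 1, "vegetable": "potato", "n_spoiled": 3289},
    {"month": 1, "vegetable": "onion", "n_spoiled": 9},
    {"month": 2, "vegetable": "potato", "n_spoiled": 5},  # made-up count
]

long_format = uncollapse(collapsed)
print(len(long_format), "rows")
print(long_format[0])
```

Each output record holds the time (month), the group (vegetable), and an event indicator, which is the one-row-per-subject layout the question describes.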

Sentiment Analysis PowerBI AI Insights Visualization

I have a dataset of online product reviews (without any grades, stars, etc.). I applied the integrated PowerBI AI-Insights Text Analytics Sentiment Analysis model to this dataset and got a sentiment score for each review. Next, I transformed the score into discrete text values: POSITIVE, NEGATIVE and NEUTRAL.
I created the dataset artificially, so I know the polarity of each comment. Now I want to compare the predicted value to the actual value. I've done this by adding a new column that compares the two and displays "PREDICTED" if the correct value was predicted and "NOT PREDICTED" if the prediction was wrong (regardless of whether it is positive, negative or neutral). My goal is to calculate some model metrics so that I can evaluate the capabilities of this integrated PowerBI model and visualize the results. How can I do this? Is "accuracy" the first thing to start with? If so, how can I calculate and visualize a result like the accuracy?
Thank you for all your answers in advance.
Yes, start with accuracy. If roughly 70 to 80 percent or more of the predictions are correct, you can reasonably rely on the PowerBI AI-Insights Text Analytics Sentiment Analysis and go on to build your visuals for the sentiment data. But if correct and incorrect predictions split about 50-50, consider a third-party sentiment analysis service such as Google or Alchemy.
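As a rough illustration of what "accuracy" means here, the sketch below computes it from paired actual and predicted labels, together with per-class recall and the counts you would put in a confusion matrix. It is plain Python, not DAX, and the eight labelled reviews are made up; in the report, the same ratio is the number of "PREDICTED" rows divided by the total number of rows.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Share of reviews whose predicted label equals the actual label."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)

def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair, i.e. the confusion matrix cells."""
    return Counter(zip(actual, predicted))

def recall_per_class(actual, predicted):
    """For each actual class, the share of its reviews predicted correctly."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: hits[label] / totals[label] for label in totals}

# Made-up labels for eight reviews.
actual    = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE",
             "NEUTRAL", "NEUTRAL", "POSITIVE", "NEGATIVE"]
predicted = ["POSITIVE", "NEUTRAL", "NEGATIVE", "NEGATIVE",
             "NEUTRAL", "POSITIVE", "POSITIVE", "NEGATIVE"]

acc = accuracy(actual, predicted)
cells = confusion_counts(actual, predicted)
recall = recall_per_class(actual, predicted)
print("accuracy:", acc)
print("recall per class:", recall)
```

Accuracy alone hides which class the model gets wrong, so the per-class numbers and the (actual, predicted) counts are worth plotting as well, for example as a matrix visual.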

Latest Data Available over Time Series Dax Measure

I have data with the columns "Date","Care Home", "Accumulative Number of Deaths".
Some care homes will miss submissions for certain dates and these will likely never be filled out.
I would like to chart the data as a daily time series that, for each day, sums the most recent available value for each care home. The chart then needs to be filterable by care home.
I am completely stuck, as every attempt I make produces a graph that only sums the submissions made on that particular day, as shown below. The graph obviously should not decline at any point.
Graph Screen Capture
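The calculation being asked for can be written down outside of DAX to make the intended result explicit. The following is a minimal sketch in plain Python with made-up care homes and counts: for each calendar day it keeps the latest cumulative value seen so far for every care home and sums those, so a missed submission no longer drops a home out of the total.

```python
from datetime import date, timedelta

def running_total_by_day(submissions):
    """submissions: iterable of (day, care_home, cumulative_deaths).
    Returns (day, total) pairs where total sums each home's latest value."""
    by_day = {}
    for day, home, value in submissions:
        by_day.setdefault(day, {})[home] = value
    day, end = min(by_day), max(by_day)
    latest, totals = {}, []
    while day <= end:
        latest.update(by_day.get(day, {}))  # overwrite only homes that reported
        totals.append((day, sum(latest.values())))
        day += timedelta(days=1)
    return totals

# Made-up data: home A misses 3 January, home B misses 2 and 4 January.
submissions = [
    (date(2021, 1, 1), "A", 2), (date(2021, 1, 1), "B", 1),
    (date(2021, 1, 2), "A", 3),
    (date(2021, 1, 3), "B", 4),
    (date(2021, 1, 4), "A", 5),
]

totals = running_total_by_day(submissions)
for day, total in totals:
    print(day.isoformat(), total)
```

Filtering by care home corresponds to passing only that home's rows into the function. A measure that reproduces this needs to look back, per care home, to the last date on or before the current day that has a value, instead of summing only the current day's rows.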

How to estimate monthly/daily sales of an item using Amazon Advertising API

I have looked into the responses of the "ItemSearch()" and "lookUp()" functions in the Amazon Advertising API and could not find a way to get the daily or monthly sales of an item.
Popular product research software such as JungleScout, ProfitPhonix and AMZ tracker does display the number of monthly sales, but each tool shows different results.
Does Amazon provide this information? If not, how is the above software estimating it?
I think that when they fetch the ASIN information, they store "something" in their DB, and the next time the same ASIN is pulled, the estimated sales are roughly calculated from the previous value or score in the DB.
Any help will be highly appreciated.
Thanks
This is not a solution, but here is a reply from UnicornSmasher that I found. It may save you time searching for something that doesn't exist.
constantine We just took all of the bulk data from the products that are being tracked in AMZ Tracker and applied a formula to it all. If you have specific products that are way off please let us know! Certain categories we had less data on. This is version 1 of the research tool, so I'm sure it will continue to improve quickly over time.
Here is the link to question and answer:
amz forum
So, now, the question is 'What formula do they use?'
Let me know if you come up with an idea :)
Let me say first that unless you are part of the Amazon data team, you can't get the actual sales numbers of any product. And it's probably not easy to estimate sales using the Amazon Advertising API. You need to constantly track a huge number of products to estimate sales. Here I can explain how AMZ Insight, an Amazon tracking tool, estimates the sales of any product.
They constantly track a few thousand products from all the categories and collect massive amounts of data. Then their in-house data scientists analyze the data to build the sales estimation algorithm. The relationship between the data points forms a scatter plot, which of course means the sales estimates are not 100 percent accurate.
Data is continuously gathered and analyzed by tracking the Best Seller Rank (BSR), Buybox, reviews and other factors. The relationship between these data points is then modelled to arrive at unit sales. Once this relationship is in place, it is much easier to estimate monthly sales and revenue for a product.
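To give a feel for what "forming a relationship" between tracked data and unit sales could look like, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the answer does not say which formula any tool uses, the power-law form (sales falling as a power of BSR) is just one plausible choice, only BSR is used as a predictor, and the four tracked products are made-up, noise-free numbers. Real tracked data would be scattered around any such curve.

```python
import math

def fit_power_law(ranks, sales):
    """Least-squares fit of log(sales) = intercept + slope * log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(s) for s in sales]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_sales(rank, slope, intercept):
    """Predicted monthly units for a given Best Seller Rank."""
    return math.exp(intercept + slope * math.log(rank))

# Hypothetical tracked products: (BSR, known monthly unit sales).
tracked_ranks = [100, 400, 2500, 10000]
tracked_sales = [1000, 500, 200, 100]

slope, intercept = fit_power_law(tracked_ranks, tracked_sales)
estimate = estimate_sales(1600, slope, intercept)
print("slope:", round(slope, 3))
print("estimated monthly sales at BSR 1600:", round(estimate, 1))
```

Different tools track different products and fit different models, which is consistent with the observation in the question that they all report different numbers for the same item.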