Assume there are a number of buildings in several locations (BL). Each building has a building type (BT); for example, it can be a residential house, a hospital or a school. The choice of construction material (CM) used for a building's walls depends on BL and BT.
How can I declare CM in my models so that my app determines CM based on the selection of BL and BT? I assume a regular ForeignKey won't work in this case.
I decided to post here a request for support that I put on Statalist yesterday. I have not yet received any hints and thought it could be useful to extend the audience by posting it here.
The link to the original post is the following:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1659627-choose-the-appropriate-way-to-deal-with-weights-in-svyset?view=thread
Dear Members,
I designed a questionnaire to gather respondents' willingness to get vaccinated against COVID-19 via a discrete choice experiment. I relied on a company specialized in political opinion polls and market research to administer the survey. The company computed a weight for each respondent based on 1) the geographical location where the respondent lives (five macroareas of Italy), 2) whether the respondent has a bachelor's degree or not, and 3) the age group to which she/he belongs (five classes are considered).
The sum of the weights is equal to the number of individuals in the database. The individuals in the age classes 30-39 and 40-49 are oversampled, at our request (related to a research hypothesis). The proportion of these two classes in the sample is larger than their actual proportion in the Italian population. The weights are computed to account for this feature and to guarantee that the sample is representative of the Italian population.
I will use the data to estimate a logit model, multinomial logit models and mixed logit models.
The issue I am facing is how to properly declare the nature of the weight. I have no experience in using Stata to deal with this issue.
I am using Stata 17 on a PC with Windows 10 Pro 64 bit.
Combining the information from the video, the svyset manual and the results from the help for "weight", I tried to work out the most appropriate solution.
I also tried to add the code here multiple times, but I kept receiving an error message about how I formatted it. My apologies.
I am trying to predict the match winner based on the historical data set shown below.
The data set comprises IPL seasons, and Team_Name_id and Opponent Team are the team names in IPL. I have set the match id as Row id and created the model. When running real-time testing, the result is not as expected (shown below).
Target is set as Match_winner_id.
Am I missing any configurations? Please help
The model is working correctly. There are just two problems:
Your input data is not very good
There's no way for the model to know that only one of those two teams should win
Data Quality
A predictive model needs good quality input data on which to reverse-engineer a model that explains a given result. This input data should contain information that can be used to predict a result given a different set of input data.
For example, when predicting house prices, it would need to know the suburb (category), number of bedrooms/bathrooms/parking spaces, age of the building and selling price. It could then predict the selling price for other houses with a slightly different mix of variables.
However, based on your screenshot, you are giving the following information (and probably more) on which to make your prediction:
Teams: Not great, because you are separating Column C and Column D. The model will assume they are unrelated information. It doesn't realise that those two values could be swapped.
Match date: Useless information unless the outcome varies in proportion to time (e.g. a team continually gets better).
Season: As with Match Date, this is probably useless because you're always predicting the future -- you won't be predicting for a past season
Venue: Only relevant if a particular team always wins at a given venue
Toss Decision: Would this really influence the outcome? Also, it's only known once the game begins, so not great for predicting a future game.
Win Type: You won't know the win type until a game is over, so it's not suitable for predicting a future game.
Score: Again, not known until the actual game, so no good for future predictions.
Man of the Match: Not known for future games.
Umpire: How does an umpire influence the result of a game?
City: Yes, given that home teams often have an advantage.
You have provided very little information that could be used to predict a future game. There is really only the teams and the venue. Everything else is either part of the game itself or irrelevant.
Picking only one of the two teams
When the ML model looks at your data and tries to make a prediction, it will look at all the data you have provided. For example, it might notice that for a given venue and season, Team 8 has a higher propensity to win. Therefore, given that venue and season, it will favour a win by Team 8. The model has no concept that the only possible outcome is one of the two teams given in columns C and D.
You are predicting for two given teams, and each team can appear in either Column C or Column D. This split carries no meaning: the result would be the same if you swapped the teams between columns, but the model has no concept of this. Also, information about Team 1 vs Team 2 is totally irrelevant for Team 3 vs Team 4.
What you should do is create one dataset per team, listing all of its matches, plus a column that shows the outcome: either a boolean (Win/Lose) or a value that represents the number of runs by which the team won (where negative is a loss). You would then ask the model to predict the result for that team, given the input data, which would be win/lose or a margin above/below the other team.
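As a rough sketch of that reshaping (not a full solution), the snippet below turns match-level rows into one list of rows per team, each seen from that team's side with a Win/Lose flag. The field names (team_a, team_b, venue, winner) and the sample records are assumptions made up for illustration; they are not taken from your dataset.

```python
from collections import defaultdict

# Hypothetical match records; field names and values are illustrative assumptions.
matches = [
    {"match_id": 1, "team_a": 1, "team_b": 2, "venue": "X", "winner": 1},
    {"match_id": 2, "team_a": 3, "team_b": 1, "venue": "Y", "winner": 3},
    {"match_id": 3, "team_a": 2, "team_b": 3, "venue": "X", "winner": 2},
]


def per_team_rows(matches):
    """Return {team_id: [row, ...]} with one row per match from that team's view."""
    by_team = defaultdict(list)
    for m in matches:
        # Emit the match twice, once per side, so column order no longer matters.
        for team, opponent in ((m["team_a"], m["team_b"]), (m["team_b"], m["team_a"])):
            by_team[team].append({
                "match_id": m["match_id"],
                "opponent": opponent,
                "venue": m["venue"],
                "won": m["winner"] == team,  # boolean Win/Lose target
            })
    return dict(by_team)


team_data = per_team_rows(matches)
```

Each value in team_data could then be used as the training set for a model dedicated to that one team, with "won" as the target.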
But at the core, I think that your input data isn't rich enough to make a sensible prediction. Just ask yourself: "What data would I like to know if I were to guess which team would win?" It would probably be past results, weather conditions, which players were on each team, how many matches they played in the last week, etc. None of this information is being provided as input on each line of your input data.
I am new to prediction models. I am currently using Python 2.7 and sklearn. I would like to know a simple model that combines many features to predict one target.
To make it clearer: let's say I have 4 arrays of size 10: A, B, C, Y. I would like to use the values of A, B, C to predict the values of Y.
Thank you
I'm working with our IT group to develop an optimizer for logistics operations. The basic design is that it will look at shipments, run a search for additional shipments originating within XX miles of the previous shipment's destination, and link them together in a loop. It will continue to do this until it hits a user-defined number of shipment legs, where the loop ends at or close to the first shipment's origin.
The issue we are facing is that the materials we ship are chemicals, which can have interactions if placed in a tank that contained XX chemical before it. The obvious solution is to use a different tank or wash it out, but we also need it to compute solutions prior to that.
My problem is, currently, there is no way on the market to do that prior product optimization.
The question is: Is there some kind of logic table function I can write that will allow the optimizer to see an element in the data set (say, Product Family of 1), look it up in a product database containing predefined product families (e.g. PF 1 = Chemicals A1-B7, PF 2 = Chemicals B8-J8, etc.), and then check it against a logic table that defines a do-not-ship-with list (e.g. PF 1 cannot ship if PF 2 was on the previous leg)?
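One possible shape for such a logic table, sketched in Python purely as an illustration: a dictionary maps each chemical to its product family, and a set of (previous family, next family) pairs lists the sequences that are not allowed. All chemical codes, family numbers and function names below are hypothetical placeholders, and a real optimizer would presumably load both tables from the product database.

```python
# Hypothetical product database: chemical code -> product family (PF).
PRODUCT_FAMILY = {"A1": 1, "B7": 1, "B8": 2, "J8": 2, "K1": 3}

# Hypothetical logic table: (PF on previous leg, PF on next leg) pairs that
# must not follow each other in the same tank without a wash.
FORBIDDEN_AFTER = {(2, 1)}  # PF 1 cannot ship if PF 2 was on the previous leg


def can_follow(previous_chemical, next_chemical):
    """Return True if next_chemical may be loaded after previous_chemical."""
    prev_pf = PRODUCT_FAMILY[previous_chemical]
    next_pf = PRODUCT_FAMILY[next_chemical]
    return (prev_pf, next_pf) not in FORBIDDEN_AFTER


def loop_is_compatible(chemicals):
    """Check every consecutive pair of legs in a candidate loop."""
    return all(can_follow(a, b) for a, b in zip(chemicals, chemicals[1:]))
```

The optimizer could call can_follow while it searches for the next leg, so incompatible shipments are discarded before a loop is built, for example can_follow("B8", "A1") is rejected while can_follow("A1", "B8") is allowed.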
I want to build a website with a map based on OpenStreetMap that colors buildings based on their potential average annual yield of solar power. I have the energy data for individual houses.
My question is now, can I assign each house (identified by street name and number) a value and the house can then be colored based on this value in the browser?
I have little to no experience with OpenStreetMap and would be happy about pointers in the right direction.
So you need an OSM dataset, and you filter it for building=* ways to get the building outlines (e.g. with osmosis). Then you do a second run to filter for addr:*=* tags of nodes and merge them with the building outlines from step 1. Be aware of conflicts and that one building can have multiple addresses. Now you have a dataset with normalized addresses and need to create a lookup structure such as a hash map to get a mapping for your solar data: addr:street x addr:housenumber -> building id
(very raw idea on how to do it)
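A minimal sketch of that lookup step in Python, assuming the buildings and their addresses have already been extracted from OSM. The way ids, addresses, yield values and the normalize helper are all made up for illustration; real address normalization will need more care.

```python
def normalize(street, housenumber):
    """Normalize an address so small spelling differences do not break the lookup."""
    street_key = " ".join(street.lower().split())
    number_key = housenumber.replace(" ", "").lower()
    return (street_key, number_key)


# Hypothetical extract: (OSM way id, [(addr:street, addr:housenumber), ...]).
buildings = [
    (1001, [("Main Street", "12")]),
    (1002, [("Main Street", "14"), ("Main Street", "14 a")]),
]

# Hypothetical solar data: (street, housenumber) -> average annual yield in kWh.
solar_yield = {("main street", "12"): 4200.0, ("Main  Street", "14A"): 3900.0}

# Hash map: normalized address -> building id. If two buildings share an
# address, the later one silently wins here; real data needs conflict handling.
address_to_building = {}
for building_id, addresses in buildings:
    for street, number in addresses:
        address_to_building[normalize(street, number)] = building_id

# Attach a yield value to each building that can be matched by address.
building_yield = {}
for (street, number), kwh in solar_yield.items():
    building_id = address_to_building.get(normalize(street, number))
    if building_id is not None:
        building_yield[building_id] = kwh
```

The resulting building_yield mapping (building id -> value) is what the map front end would need in order to pick a color per building outline.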
IMHO, mixing external data sources with data under the copyleft Open Database License means that you need to license your dataset under the ODbL as well.
Also keep in mind that not every address is currently in OSM, and the existing ones can be wrong!