I am trying to build a sort of pseudo-database of football players by creating instances of a Player class. Each Player would hold certain values such as
self.age
self.height
self.weight, etc.
as well as seasonal data, which I would format something like the following:
player_year  year  yards  comp_perc  rushing
X1           2008  400    65         70
X2           2009  500    35         100
Once I have scraped everything and created an instance for each player, I would pickle it all in order to save it. My issue is that I would like a clean way of looking through the tables for each player, for example getting the 'yards' stat for 'player_year' X2 from every player. Is the cleanest way of doing this a 2D dictionary such as
dict['X1']['yards']
and if so, how would I be able to read this in neatly, row by row? I would also like to mention that 'X1' would need to be created by my code, as it is not part of the scraped data.
Thanks for the help!
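As a minimal sketch of the nested-dictionary idea from the question: the class layout, the add_season, stat_for_all and rows helpers, and the sample player are all assumptions made for illustration, not part of the original post. The outer key is the generated label (X1, X2, ...) and the inner dictionary holds that season's stats.

```python
class Player:
    """Hypothetical Player record; attribute names follow the question."""

    def __init__(self, name, age, height, weight):
        self.name = name
        self.age = age
        self.height = height
        self.weight = weight
        # Outer key: generated player_year label ("X1", "X2", ...).
        # Inner dict: the stats scraped for that season.
        self.seasons = {}

    def add_season(self, year, yards, comp_perc, rushing):
        # The label is not in the scraped data, so derive it from the
        # number of seasons stored so far.
        label = "X{}".format(len(self.seasons) + 1)
        self.seasons[label] = {
            "year": year,
            "yards": yards,
            "comp_perc": comp_perc,
            "rushing": rushing,
        }
        return label


def stat_for_all(players, player_year, stat):
    """Collect one stat for one player_year label across all players."""
    return {
        p.name: p.seasons[player_year][stat]
        for p in players
        if player_year in p.seasons
    }


def rows(player):
    """Yield (label, stats) pairs in the order the seasons were added."""
    for label, stats in player.seasons.items():
        yield label, stats


# Made-up sample data mirroring the table in the question.
qb = Player("Example QB", 27, 75, 220)
qb.add_season(2008, 400, 65, 70)
qb.add_season(2009, 500, 35, 100)
print(stat_for_all([qb], "X2", "yards"))
```

Row-by-row reading relies on dictionaries keeping insertion order, which is guaranteed from Python 3.7 onward. Because each Player only holds plain dictionaries and numbers, a list of such players can still be pickled as described in the question.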
This question is close, but doesn't quite help me with a similar issue as I am using a single data set and no related time series.
I am using AWS Forecast with a single time series dataset (no related data, just the main DS). It is a daily data set with about 10 years of data ranging from 2010-2020.
I have 3572 data points in the original data set; I manually filled missing data to ensure there were no missing days in the date range for a total of 3739 data points. I lopped off everything in 2020 to create a validation dataset and then configured the predictor for a 180 day Forecast. I keep getting the following error:
Unable to evaluate this dataset because there is missing data in the evaluation window for all items. Ensure that there is complete data for at least one item in the evaluation window starting from 2019-03-07T00:00:00 up to 2020-01-01T00:00.
There is definitely no missing data. I've double- and triple-checked the date range and the data fill, and every day between the start and end dates has a data point. I also tried adding a data point for 1/1/2020 (the data ended at 12/31/2019) and I continue to get this error. I can't figure out what it's asking me for, except that maybe I'm missing something in my math about the ForecastHorizon and BackTestWindowOffset?
Dataset example:
Brief model parameters (can share more if I'm missing something pertinent):
Total data points in training data: 3479
forecastHorizon = 180
create_predictor_response = forecast.create_predictor(
    PredictorName=predictorName,
    ForecastHorizon=forecastHorizon,
    PerformAutoML=True,
    PerformHPO=False,
    EvaluationParameters={"NumberOfBacktestWindows": 1,
                          "BackTestWindowOffset": 180},
    InputDataConfig={"DatasetGroupArn": datasetGroupArn},
    FeaturizationConfig={"ForecastFrequency": 'D'
I noticed you don't have an entry for 6/24/10 (this American date format is the worst, btw).
I faced a similar problem when I left out days just like that (assuming you're modelling at daily frequency) and had Forecast automatically fill the gaps with NaN values (as opposed to zero, which is the default). I suggest you:
pre-fill literally every date within the range of the training data (and of the forecast window, if using related data)
choose zero as the option for automatic filling of missing values. I think mean or any other float value would also work, for that matter
Let me know if that works! I am also using Forecast and it's good to keep track of possible problems and solutions.
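The pre-fill step suggested above could be sketched roughly like this in plain Python. The function names fill_missing_days and missing_days, the dictionary input format, and the sample values are assumptions for illustration; this only prepares the data locally and does not call any AWS Forecast API.

```python
from datetime import date, timedelta


def missing_days(series):
    """Return every calendar day between the first and last key
    of `series` (a dict mapping date -> value) that has no entry."""
    start, end = min(series), max(series)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in series]


def fill_missing_days(series, fill_value=0.0):
    """Return a list of (date, value) pairs covering every day from
    the first to the last date, using fill_value where a day is absent."""
    start, end = min(series), max(series)
    n_days = (end - start).days + 1
    filled = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        filled.append((day, series.get(day, fill_value)))
    return filled


# Made-up values; the gap on 2010-06-24 mirrors the one noted above.
sample = {date(2010, 6, 23): 5.0, date(2010, 6, 25): 7.0}
print(missing_days(sample))
print(fill_missing_days(sample))
```

Running missing_days on the training data before uploading is a cheap way to confirm that no day was skipped by accident.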
I'm a noob programmer currently making a small family database using C++, but I have trouble deleting a family from the list...
My list looks something like this
start-of-family 1
jim
joe
bob
sam
end-of-family 1
start-of-family 2
rob
max
end-of-family 2
start-of-family 3
sue
tom
kim
end-of-family 3
If I wanted to delete family 1, I would locate start-of-family 1 and end-of-family 1, then run a loop. But how do I locate them if the user only inputs an int to represent a family number? Also, how do I make the succeeding family numbers decrease by 1, so that family 2 becomes 1 and family 3 becomes 2?
Thanks a lot.
If I were doing this problem I would start by making each family into a vector of names. Then, I would create a vector containing those family vectors.
The result would look something like:
{
<(jim), (joe), (bob), (sam) >
<(rob), (max) >
<(sue), (tom), (kim) >
}
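Then, if the user wants to delete one of the families, you can use vector.erase(vector.begin() + n), where n is the zero-based index of the family to be removed (std::vector has no remove(n) member). Because a family's number is just its position in the outer vector, the families after it are renumbered automatically.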
This sounds like a school or text book assignment. Have you gotten to vectors yet? Where are the names coming from? Are you hard coding them into the list? Or reading them from a .txt file? What kind of list structure are you storing them in right now?
I realized that clearing the db and updating it with what I've got is way easier than modifying the db and updating my program.
First off - sorry I can't attach an excel file or CSV with the proper data, but I'll write it in here as best I can.
Basically, I want to generate a heatmap that has two fixed points layered on top. These two fixed points are generated using coordinates I've created. I've been able to do this with ONE fixed point, using a dual axis, but I'm not sure how to add the other point. That dataset looked like this:
zip    count  place1   LONG       LAT      place2        LAT2       LONG2
95020  120    MY HOME  -122.9011  37.3326  FRIENDS HOME  37.335895  -121.99833
95122  90     MY HOME  -121.9011  37.3326  FRIENDS HOME  37.335895  -121.99833
94086  66     MY HOME  -121.9011  37.3326  FRIENDS HOME  37.335895  -121.99833
95127  163    MY HOME  -121.9011  37.3326  FRIENDS HOME  37.335895  -121.99833
To generate the one fixed point I did the following:
Added LONG and LAT measure to Columns and Rows TWICE
Made both a dual axis
This created two "Marks" - LAT and LAT(2).
I added sum(COUNT) and ZIP to LAT
I made LAT(2) the "Circle" type, as opposed to "Filled Map" for LAT
Changed colors
Results: http://i.imgur.com/WJ9CRxe.png
How can I add a circle for LAT2 and LONG2?
The explanation is not clear enough to resolve the issue from. Perhaps you could share a sample workbook?
I have about 30 rasters with 4 bands each, and I am trying to create composites from them so that I can eventually bring all of the rasters together into one large raster. The first step is to create the composite rasters. I would like to do this all at once, and I found a few examples on various sites on how to do it, including ESRI's. I've pieced them together into my own code, but I keep getting error 000271: Cannot open the input datasets. I know the path is correct because arcpy.ListRasters() returns the files in the folder as one large list, so the problem is definitely with the CompositeBands tool.
I've looked up possible solutions to this problem, but I did not understand them or how they worked. If you have an answer or suggestion, could you comment your code (if you write any) or explain your answer so I know what is going on and why?
About the data: they are all ERDAS Imagine image rasters with 4 image color bands: R, G, B, and whatever N is. All but a few rasters have bands named Layer_1, Layer_2 and so on. The few are called Band_1, Band_2 and so on. Here is my code:
arcpy.env.workspace = r'\\network\folder\subfolder1\subfolder2\All_RGBN'
ws = arcpy.env.workspace
outws = r'\\network\folder\subfolder1\subfolder2\RGBN_Composit'
for ras in arcpy.ListRasters("*.img"):
    name = outws+"\\"+ras
    try:
        arcpy.CompositeBands_management("Layer_1.img;Layer_2.img;Layer_3.img,Layer_4.img", name)
    except:
        arcpy.CompositeBands_management("Band_1.img;Band_2.img;Band_3.img,Band_4.img", name)
Thanks!
If your rasters have multiple bands, they are already composite. Composite Bands should be used when your bands are distinct raster datasets that you want to merge into one raster.
If you want to merge all your rasters (composite or not) into one single dataset, you should create a Mosaic Dataset or a Raster Catalog and load your rasters into it.
And FYI, you get an error message from the Composite Bands tool because your raster bands (the inputs) are not correctly referenced. A band is addressed as a child of its raster, so you should write something like:
ras + "\\Layer_x" instead of "Layer_x.img"
But doing this will output the exact same raster as the original one.
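To make the band referencing concrete, here is a small sketch that only builds the semicolon-separated input string. The helper name band_paths, the LAYER_NAMES list, and the example path are assumptions for illustration; the arcpy call is left as a comment because it needs an ArcGIS installation and, as noted above, would simply reproduce the original raster.

```python
def band_paths(raster_path, band_names):
    """Build the input string for Composite Bands: each band is
    referenced as <raster path>\\<band name>, joined with semicolons."""
    return ";".join(raster_path + "\\" + band for band in band_names)


# Band names as described in the question for most of the rasters.
LAYER_NAMES = ["Layer_{}".format(i) for i in range(1, 5)]

# Hypothetical raster path, used only to show the resulting string.
example = band_paths(r"C:\data\tile_01.img", LAYER_NAMES)
print(example)

# With arcpy available, the string would be passed to the tool like so:
# arcpy.CompositeBands_management(band_paths(ras, LAYER_NAMES), name)
```

Note that every separator is a semicolon here; the code in the question mixes in a comma before the fourth band.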
I have a list of songs. For each song I have the artist, writer and genre.
I want to create a directed graph so I can look for patterns.
So, I want to have a node for each artist, and so I will start with clustering the songs based on artist.
Then I want to also find the composer and somehow arrange the already clustered songs so they are close to the writer.
Later I will also group based on genre, but I am stuck on the first two.
So, my first approach is to just do something like (not tested):
pos_x = 20
for x in songs:
    pos_y = 20
    artist_list = [s for s in songs if s.artist == x.artist]
    for y in range(len(artist_list)):
        artist_list[y].x = pos_x
        pos_x += 10 * len(artist_list[y].title)
        artist_list[y].y = pos_y
        pos_y += 10
I would then loop over the artist list and create the initial graph, but there is a problem when multiple artists are on one song, such as "We Are the World".
But I believe this is a horribly flawed approach in Python, because my next step would be to take the songs, keep them relatively close to their artist, and loop over them again to get the composer, making small changes to the groups so that related songs sit close together. That means the clusters of songs for each artist may have to move.
I am using pyglet to do this, so basically I am doing this in OpenGL.
The actual positioning I can do; it is the overall approach I am concerned about, as I am stuck on how I should tackle this problem.
UPDATE
What I am looking for is something like:
Song A1 Song A3 Song A2
Artist A
Artist B
Song B1 Song B2
I would have lines going from A1,A2,A3 to Artist A, and B1,B2 going to Artist B, but A3 and B1, B2 connected to Artist B, but I also want the placement of Artist B closer to A2 and B2 as these two songs have the same composer.
So the artist will be a new node, separate from the songs, but the actual placement of songs within each cluster will depend on at least one other relationship. Later I may end up showing that relationship also, which is why I am mapping in 3D at the moment.
The approach I would take is to generate a directed graph in Python that can be written into the "dot" format and rendered by Graphviz (http://www.graphviz.org). Graphviz and the dot format are established tools for defining and rendering complex graphs.
The great news is that there are Python libraries that allow you to define the graph in a Pythonic manner, and then write out the dot file with a single line of code. PyGraphviz looks like a good choice: http://networkx.lanl.gov/pygraphviz. You can create the structure of the graph in Python, which is as simple as defining edges between songs and artists, between songs and composers, and so forth. Here's a snippet from the PyGraphviz tutorial:
>>> G.add_node('a') # adds node 'a'
>>> G.add_edge('b','c') # adds edge 'b'-'c' (and also nodes 'b', 'c')
Then just write the dot file and load it into Graphviz, which will lay out the nodes in 2d space. There are a variety of layout algorithms, so you can experiment with them in order to cluster the songs in the most useful manner.
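If you would rather see what the dot file itself looks like before installing anything, here is a rough sketch that writes the DOT text by hand with no third-party library. The song dictionaries, the composer names, the songs_to_dot and quote helpers, and the dashed style for composer edges are all assumptions for illustration; PyGraphviz would produce an equivalent file for you.

```python
def quote(name):
    """Wrap a node name in double quotes, escaping any embedded quotes."""
    return '"{}"'.format(name.replace('"', '\\"'))


def songs_to_dot(songs):
    """Return DOT text with one edge per song/artist pair and one
    dashed edge from each song to its composer."""
    lines = ["digraph songs {"]
    for song in songs:
        for artist in song["artists"]:
            lines.append("    {} -> {};".format(quote(song["title"]), quote(artist)))
        lines.append("    {} -> {} [style=dashed];".format(
            quote(song["title"]), quote(song["composer"])))
    lines.append("}")
    return "\n".join(lines)


# Made-up data loosely following the UPDATE in the question:
# A3 has two artists, and A2 and B2 share a composer.
sample_songs = [
    {"title": "Song A2", "artists": ["Artist A"], "composer": "Composer X"},
    {"title": "Song A3", "artists": ["Artist A", "Artist B"], "composer": "Composer Y"},
    {"title": "Song B2", "artists": ["Artist B"], "composer": "Composer X"},
]
print(songs_to_dot(sample_songs))
```

Saving that text as songs.dot and running `dot -Tpng songs.dot -o songs.png` should render it. A song with several artists simply gets several edges, which sidesteps the "We are the world" problem, and shared composers pull their songs together during layout.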