predicted_value1 = regr1.predict(9) meaning of this statement - python-2.7

I am new to data science and I have downloaded code that is supposed to predict the viewers for the next week.
But in the following code I am not able to understand what this function does and how it predicts the values.
The data set has 7 values for each parameter. Why is 9 the value inserted into the parentheses?
regr1 = linear_model.LinearRegression()
regr1.fit(x1, y1)
predicted_value1 = regr1.predict(9)
What do these lines do?
Here is the full code:
import pandas as pd
from sklearn import linear_model  # needed for LinearRegression below

def get_data(file_name):
    data = pd.read_csv(file_name)
    flash_x_parameter = []
    flash_y_parameter = []
    arrow_x_parameter = []
    arrow_y_parameter = []
    for x1, y1, x2, y2 in zip(data['flash_episode_number'],
                              data['flash_us_viewers'],
                              data['arrow_episode_number'],
                              data['arrow_us_viewers']):
        flash_x_parameter.append([float(x1)])
        flash_y_parameter.append(float(y1))
        arrow_x_parameter.append([float(x2)])
        arrow_y_parameter.append(float(y2))
    return flash_x_parameter, flash_y_parameter, arrow_x_parameter, arrow_y_parameter

def more_viewers(x1, y1, x2, y2):
    regr1 = linear_model.LinearRegression()
    regr1.fit(x1, y1)
    predicted_value1 = regr1.predict(9)
    regr2 = linear_model.LinearRegression()
    regr2.fit(x2, y2)
    predicted_value2 = regr2.predict(9)
    print predicted_value1, "are the flash viewers"
    print predicted_value2, "are the arrow viewers"
    if predicted_value1 > predicted_value2:
        print "The Flash TV show will have more viewers next week"
    else:
        print "The Arrow TV show will have more viewers next week"

x1, y1, x2, y2 = get_data('C:\\Users\\SHIVAPRASAD\\Desktop\\test.csv')
more_viewers(x1, y1, x2, y2)

No, your data is NOT a set of 7 values; it has 9 rows:
+----------------+-------------------+----------------+------------------+
| FLASH_EPISODE | FLASH_US_VIEWERS | ARROW_EPISODE | ARROW_US_VIEWERS |
+----------------+-------------------+----------------+------------------+
| 1 | 4.83 | 1 | 2.84 |
| 2 | 4.27 | 2 | 2.32 |
| 3 | 3.59 | 3 | 2.55 |
| 4 | 3.53 | 4 | 2.49 |
| 5 | 3.46 | 5 | 2.73 |
| 6 | 3.73 | 6 | 2.6 |
| 7 | 3.47 | 7 | 2.64 |
| 8 | 4.34 | 8 | 3.92 |
| 9 | 4.66 | 9 | 3.06 |
+----------------+-------------------+----------------+------------------+
(Your code is from Dataconomy's "Linear Regression Implementation in Python".)
So the value 9 in the line
predicted_value1 = regr1.predict(9)
is OK: it asks the fitted model for the predicted viewers at episode 9.
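Two side notes: newer scikit-learn versions require a 2-D array for predict, so the call becomes regr1.predict([[9]]); and the same line fit can be sketched without scikit-learn at all. Here is a rough check using numpy.polyfit on the Flash numbers from the table above:

```python
import numpy as np

# Flash data from the table: episode numbers 1..9 and US viewers (millions).
episodes = np.arange(1, 10)
viewers = np.array([4.83, 4.27, 3.59, 3.53, 3.46, 3.73, 3.47, 4.34, 4.66])

# Fit a straight line (degree-1 polynomial) -- the same model family
# that linear_model.LinearRegression() fits.
slope, intercept = np.polyfit(episodes, viewers, 1)

# This is what regr1.predict(9) computes: the fitted line evaluated at x = 9.
prediction = slope * 9 + intercept
print(round(prediction, 4))  # 3.9527
```

So the model predicts about 3.95 million Flash viewers at episode 9.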

Create column to classify rows based on related tables DAX PowerBI

I have simplified my problem. Let's suppose I have three tables: one containing data and the specific codes that identify objects, let's say apples.
+-------------+------------+-----------+
| Data picked | Color code | Size code |
+-------------+------------+-----------+
| 1-8-2018 | 1 | 1 |
| 1-8-2018 | 1 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 2 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 3 | 3 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 5 | 3 |
| 1-8-2018 | 6 | 1 |
| 1-8-2018 | 6 | 2 |
| 1-8-2018 | 6 | 2 |
+-------------+------------+-----------+
And I have two related helper tables to make sense of the codes (their relationships are inactive in the model due to ambiguity with other tables in the real case).
+-----------+--------+
| Size code | Size |
+-----------+--------+
| 1 | Small |
| 2 | Medium |
| 3 | Large |
+-----------+--------+
and
+------------+----------------+-------+
| Color code | Color specific | Color |
+------------+----------------+-------+
| 1 | Light green | Green |
| 2 | Green | Green |
| 3 | Semi green | Green |
| 4 | Red | Red |
| 5 | Dark | Red |
| 6 | Pink | Red |
+------------+----------------+-------+
Let's say that I want to create an extra column in the original table to determine which apples are class A and which are class B, given that medium green apples are class A and large red apples are class B; the others remain blank, as in the example below.
+-------------+------------+-----------+-------+
| Data picked | Color code | Size code | Class |
+-------------+------------+-----------+-------+
| 1-8-2018 | 1 | 1 | |
| 1-8-2018 | 1 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 2 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 3 | 3 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 5 | 3 | B |
| 1-8-2018 | 6 | 1 | |
| 1-8-2018 | 6 | 2 | |
| 1-8-2018 | 6 | 2 | |
+-------------+------------+-----------+-------+
What's the proper DAX to use, given that the relationships are initially inactive? Preferably solvable without creating any further additional columns in any table. I already tried code like:
CALCULATE (
    "A" ;
    FILTER ( 'Size Table' ; 'Size Table'[Size] = "Medium" ) ;
    FILTER ( 'Color Table' ; 'Color Table'[Color] = "Green" )
)
and many variations on the same principle.
Given that the relationships are inactive, I'd suggest using LOOKUPVALUE to match ID values on the other tables. You should be able to create a calculated column as follows:
Class =
VAR Size = LOOKUPVALUE ( 'Size Table'[Size],
    'Size Table'[Size code], 'Data Table'[Size code] )
VAR Color = LOOKUPVALUE ( 'Color Table'[Color],
    'Color Table'[Color code], 'Data Table'[Color code] )
RETURN SWITCH ( TRUE (),
    ( Size = "Medium" ) && ( Color = "Green" ), "A",
    ( Size = "Large" ) && ( Color = "Red" ), "B",
    BLANK () )
If your relationships are active, then you don't need the lookups:
Class = SWITCH ( TRUE (),
    ( RELATED ( 'Size Table'[Size] ) = "Medium" )
        && ( RELATED ( 'Color Table'[Color] ) = "Green" ), "A",
    ( RELATED ( 'Size Table'[Size] ) = "Large" )
        && ( RELATED ( 'Color Table'[Color] ) = "Red" ), "B",
    BLANK () )
Or a bit more elegantly written (especially for more classes):
Class =
VAR SizeColor = RELATED ( 'Size Table'[Size] ) & " " & RELATED ( 'Color Table'[Color] )
RETURN SWITCH ( TRUE (),
    SizeColor = "Medium Green", "A",
    SizeColor = "Large Red", "B",
    BLANK () )

Django calculate percentages within group by

I have a model for which I want to perform a group-by on two values and calculate the percentages of each value per outer grouping.
Currently I just make a query to get all the rows, put them into a pandas DataFrame, and perform something similar to the answer here. Although this works, I'm sure it would be more efficient if I could make the query return the information I require directly.
I am currently running Django 2.0.5 with a backend DB on PostgreSQL 9.6.8
I think window functions could be the solution as indicated here but I cannot construct a successful combination of annotate and values to give me the desired output.
Another possible solution could be ROLLUP, introduced in PostgreSQL 9.5, if I can find a way to get the summary row as a set of extra columns for each row; but I also think it's not yet supported by Django.
Model:
class ModelA(models.Model):
    grouper1 = models.CharField()
    grouper2 = models.CharField()
    metric1 = models.IntegerField()
All rows:
grouper1 | grouper2 | metric1
---------+----------+---------
A | C | 2
A | C | 2
A | C | 2
A | D | 4
A | D | 4
A | D | 4
B | C | 5
B | C | 5
B | C | 5
B | D | 6
B | D | 4
B | D | 5
Desired output:
grouper1 | grouper2 | sum(metric1) | Percentage
---------+----------+--------------+-----------
A | C | 6 | 33.3
A | D | 12 | 66.7
B | C | 15 | 50
B | D | 15 | 50
I got close to what I expected with
ModelA.objects.all().values(
    'grouper1',
    'grouper2'
).annotate(
    SumMetric1=Window(expression=Sum('metric1'),
                      partition_by=[F('grouper1'), F('grouper2')]),
    GroupSumMetric1=Window(expression=Sum('metric1'),
                           partition_by=[F('grouper1')])
)
However, this returns a row for every original row in the database, like so:
grouper1 | grouper2 | sum(metric1) | Percentage
---------+----------+--------------+-----------
A | C | 6 | 33.3
A | C | 6 | 33.3
A | C | 6 | 33.3
A | D | 12 | 66.7
A | D | 12 | 66.7
A | D | 12 | 66.7
B | C | 15 | 50
B | C | 15 | 50
B | C | 15 | 50
B | D | 15 | 50
B | D | 15 | 50
B | D | 15 | 50
In this situation .distinct() might help; see the Django documentation on QuerySet.distinct() for more information.
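For comparison, the pandas fallback the question describes (fetch all rows, then aggregate) boils down to a grouped sum plus a transform. This is a sketch of that approach, not the linked answer verbatim, using the sample rows from the question:

```python
import pandas as pd

# The twelve sample rows from the question.
rows = ([("A", "C", 2)] * 3 + [("A", "D", 4)] * 3 +
        [("B", "C", 5)] * 3 + [("B", "D", 6), ("B", "D", 4), ("B", "D", 5)])
df = pd.DataFrame(rows, columns=["grouper1", "grouper2", "metric1"])

# Sum per (grouper1, grouper2) pair ...
out = df.groupby(["grouper1", "grouper2"], as_index=False)["metric1"].sum()
# ... then express each pair's sum as a percentage of its grouper1 total.
out["Percentage"] = (100 * out["metric1"]
                     / out.groupby("grouper1")["metric1"].transform("sum"))
print(out)
```

This yields one row per pair with the group-relative percentages, which is exactly the shape the desired output asks for.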

Weighted Cumulative Sum in Python

So I'm trying to figure out a good way of vectorizing a calculation and I'm a bit stuck.
| A | B (Calculation) | B (Value) |
|---|----------------------|-----------|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | =SUM(A1:A4)/4 | 2.5 |
| 5 | =(1/4)*A5 + (3/4)*B4 | 3.125 |
| 6 | =(1/4)*A6 + (3/4)*B5 | 3.84375 |
| 7 | =(1/4)*A7 + (3/4)*B6 | 4.6328125 |
I'm basically trying to replicate Wilder's Average True Range (without using TA-Lib). In the case of my simplified example, column A is the precomputed True Range.
Any ideas of how to do this without looping? Breaking down the equation, it's effectively a weighted cumulative sum, but it's definitely not something that the existing pandas cumsum allows out of the box.
This is indeed an ewm problem. The trick is that the first 4 rows are crammed together into a single seed row; from there, ewm takes over:
import numpy as np
import pandas as pd

a = df.A.values
d1 = pd.DataFrame(dict(A=np.append(a[:4].mean(), a[4:])), df.index[3:])
d1.ewm(adjust=False, alpha=.25).mean()
A
3 2.500000
4 3.125000
5 3.843750
6 4.632812
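A quick self-contained check that ewm(adjust=False, alpha=.25) really reproduces the spreadsheet recurrence B(n) = (1/4)*A(n) + (3/4)*B(n-1), using the column-A values 1..7 from the table:

```python
import numpy as np
import pandas as pd

a = np.arange(1.0, 8.0)           # column A: 1..7
seed = a[:4].mean()               # =SUM(A1:A4)/4 -> 2.5
d1 = pd.DataFrame({"A": np.append(seed, a[4:])})
smoothed = d1.ewm(adjust=False, alpha=0.25).mean()["A"].to_numpy()
print(smoothed.tolist())  # [2.5, 3.125, 3.84375, 4.6328125]
```

The four values match column "B (Value)" in the question exactly.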

Get data from SmartCard UEC

I already asked a question here (https://stackoverflow.com/questions/28658283/c-getslotlisttokenpresent-pslotlist-pulcount-return-pulcount-0) about my SmartCard (https://en.wikipedia.org/wiki/Universal_electronic_card), but I would like to know: is it possible to get a specific record from the smart card, knowing the PIN code and where the record is located?
The card is developed according to ISO 7816, so the APDU command must be based on the following scheme:
[CLA] [INS] [P1] [P2] [Lc field] [Data field] [Le field]
What should the APDU command look like, and which library is best to use from C++/C#, if I need the data from field 5F20?
P.S.: here is the data from the file sectors.ini:
[Sector1_11]
Icon = "IDENTIFICATION SECTOR"
BlockDescr1 = "0 | 0 | The data block for sharing"
BlockDescr2 = "0 | 0 | block public access to the PIN"
DataDescr21 = "DF27 | 1 | 6 | 0,0,0 | 1 | SNILS"
DataDescr22 = "DF2B | 4 | 8 | 0,0,0 | 1 | Number of MHI"
DataDescr23 = "5F20 | 0 | 26 | 0,0,0 | 1 | Name"
DataDescr24 = "DF23 | 0 | 100 | 0,0,0 | 1 | Address of the issuer"
DataDescr25 = "5F2B | 4 | 4 | 0,0,0 | 1 | Born"
DataDescr26 = "DF24 | 0 | 100 | 0,0,0 | 1 | Birthplace"
DataDescr27 = "5F35 | 3 | 1 | 0,0,0 | 1 | Paul"
DataDescr28 = "DF2D | 0 | 40 | 0,0,0 | 1 | Last"
DataDescr29 = "DF2E | 0 | 40 | 0,0,0 | 1 | Name"
DataDescr210 = "DF2F | 0 | 40 | 0,0,0 | 1 | Middle"
I only know that the third number indicates the length of the data in bytes.
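No answer was recorded for this question, but in generic ISO 7816-4 terms a tagged data object such as 5F20 is often retrieved with a GET DATA command (CLA 00, INS CA, P1-P2 = tag). Whether the UEC applet actually serves this field through GET DATA, and which PIN verification must precede it, is an assumption here; the sketch below only illustrates how the command bytes are laid out:

```python
# Sketch: byte layout of an ISO 7816-4 GET DATA command APDU.
# Assumption: the card exposes tag 5F20 (cardholder name) via GET DATA;
# PIN verification and file selection are not covered here.
def build_get_data_apdu(tag, le=0x00):
    cla = 0x00               # CLA: basic channel, no secure messaging
    ins = 0xCA               # INS: GET DATA
    p1 = (tag >> 8) & 0xFF   # P1: high byte of the tag
    p2 = tag & 0xFF          # P2: low byte of the tag
    return [cla, ins, p1, p2, le]   # Le = 0x00: return up to 256 bytes

apdu = build_get_data_apdu(0x5F20)
print(" ".join("%02X" % b for b in apdu))  # 00 CA 5F 20 00
```

From C++ the resulting bytes would typically be sent with the PC/SC SCardTransmit function; C# can reach the same PC/SC API through a wrapper library.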

How to write a Django query where column_value < 100?

I have a table like this in Django:
+----+---------+---------------+----------------+-----------+----------------------------+----------------+------+
| id | user_id | name          | source         | remaining | start_date                 | time_remaining | size |
+----+---------+---------------+----------------+-----------+----------------------------+----------------+------+
| 1  | 1       | ok.txt        | ngs.pradhi.com | 20        | February 05, 2013, 08:01AM | 1              | 4 MB |
| 2  | 1       | NC_008253.fna | ngs.pradhi.com | 20        | February 05, 2013, 08:02AM | 1              | 4 MB |
| 3  | 1       | test.data     | ngs.pradhi.com | 0         | February 05, 2013, 08:21AM | 1              | 4 MB |
+----+---------+---------------+----------------+-----------+----------------------------+----------------+------+
I want to retrieve the data where user_id = request.user.id and remaining < 100.
I tried:
Queue.objects.filter(user_id=request.user.id, remaining < 100)
but it didn't work.
Use the __lt ("less than") field lookup:
Queue.objects.filter(user_id=request.user.id, remaining__lt=100).exclude(remaining=0)
(The .exclude(remaining=0) additionally drops rows whose remaining is 0.) See Django Field lookups.
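The ORM call corresponds to SQL along these lines; a minimal sketch using Python's built-in sqlite3 with a made-up table mirroring the rows above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queue (id INTEGER, user_id INTEGER, name TEXT, remaining INTEGER)")
con.executemany("INSERT INTO queue VALUES (?, ?, ?, ?)",
                [(1, 1, "ok.txt", 20), (2, 1, "NC_008253.fna", 20), (3, 1, "test.data", 0)])

# Equivalent of .filter(user_id=..., remaining__lt=100).exclude(remaining=0)
rows = con.execute(
    "SELECT name FROM queue WHERE user_id = ? AND remaining < 100 AND remaining != 0",
    (1,)).fetchall()
print(rows)  # [('ok.txt',), ('NC_008253.fna',)]
```

The __lt lookup becomes the SQL < comparison, and .exclude() becomes the negated condition.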