How to compare two different data frames in pandas? - python-2.7

The data frames are not similar in any way. They do not have the same values. I want to be able to compare one column from one df with another column from the other and graph them. For example, one df has a column named "Offical poverty total" and another df has a column named "violent crime rate". I want to be able to compare these two.
I tried df['Offical Poverty_Total'].append(crime['Violent crime'])
but this isn't what I was looking for. To make it simple, I want to have a new table with the two columns and be able to analyze the new table.

You're looking for
pd.concat((df['Offical Poverty_Total'], crime['Violent crime']), axis = 1)
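This will align the two columns on their indexes, so if you've changed the row ordering of the dataframes and want to just glue them together in the order you see them in, reset the indexes first:
pd.concat((df['Offical Poverty_Total'].reset_index(drop=True), crime['Violent crime'].reset_index(drop=True)), axis = 1)
Note that passing ignore_index = True together with axis = 1 only renumbers the resulting column labels (0, 1, ...); the rows are still aligned on the index.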
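To see how index alignment differs from gluing rows together by position, here is a minimal sketch. The data and index labels are made up for illustration; only the column names come from the question. It uses reset_index(drop=True) as one way to pair rows by position.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question (values invented).
df = pd.DataFrame({"Offical Poverty_Total": [10, 20, 30]}, index=[2, 0, 1])
crime = pd.DataFrame({"Violent crime": [1.5, 2.5, 3.5]}, index=[0, 1, 2])

# Aligned on index labels: label 0 pairs poverty 20 with crime 1.5.
aligned = pd.concat((df["Offical Poverty_Total"], crime["Violent crime"]), axis=1)

# Positional: drop the old labels first, so the first rows pair up (10 with 1.5).
positional = pd.concat(
    (
        df["Offical Poverty_Total"].reset_index(drop=True),
        crime["Violent crime"].reset_index(drop=True),
    ),
    axis=1,
)

# Once both columns sit in one table they can be analyzed together,
# for example with a Pearson correlation.
corr = positional["Offical Poverty_Total"].corr(positional["Violent crime"])
```

The resulting table can then be plotted or summarized like any other dataframe.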

Related

How to create a new column from the select of the distinct values of two columns with Power Query Editor in Power Bi

I need to insert in a new column the distinct values of two columns using the Power Query Editor Power Bi.
Any ideas, guys?
Here you go:
let
Source = SomeTable,
firstCol = Source[FirstColumn],
secondCol = Source[SecondColumn],
thirdCol = List.Sort(List.RemoveNulls(List.Distinct(List.Combine( {firstCol, secondCol})))),
#"TableResult" = Table.FromColumns({firstCol, secondCol, thirdCol}, {"First","Second","Combined"})
in
#"TableResult"
This basically converts your first and second columns into lists, and combines them into one new list. Next, it transforms the new list a bit to match your requirements -- first by getting just the distinct values, then dropping any null values, and lastly sorting it in ascending order.
Once that's done, we can take advantage of Table.FromColumns and create a table from our three lists.
That should get you where you're going.
Thanks Ryan.
My requirement was to create a distinct list of IP addresses from one of the columns in a Table.
Here's the code that worked for me.
= let
Source = #"TABLENAME",
firstCol = #"TABLENAME"[Client Ip],
IP = List.Distinct(firstCol),
#"DistinctIP" = Table.FromColumns({IP}, {"IP"})
in
#"DistinctIP"

Fill columns with data from other sheet based on content from another column

Using Google Sheets, I'm trying to do something specific that is apparently un-searchable (zero results).
Sheet 1 contains data in column A. I want to import the data in Columns A and B to their respective columns on Sheet 2, but only if that same row also has (literally any) data in Column C.
Maybe try this in Sheet2!A1:
=ARRAYFORMULA(IF(LEN(Sheet1!C:C), Sheet1!A:B, ))
Or, if the check should be against column C of Sheet2 itself, perhaps this:
=ARRAYFORMULA(IF(LEN(C:C), Sheet1!A:B, ))

How to update one value in a dataframe with a dictionary?

I can't manage to update single values in a dataframe from the values of a dictionary before the next calculation is executed.
I am working with a dataframe and want to calculate values in different columns row by row. After the calculations for one row are finished, the values in the rows after it need to be changed before a new calculation can happen, because those values depend on the results of the previous row.
To approach it, I am working with a dictionary. Currently I am working in Excel, where I manage to update the value of a single cell with a dictionary. However, to make the calculation faster I want to work with a proper dataframe.
I manage to update the dictionary based on the results, but I do not manage to update my dataframe with the new values of the dictionary.
The code that works for my current model based on Excel is:
dict1={1:10,2:15.....38:29} #my dictionary
for row in range(2,sheet.max_row+1):
    #updates the table with values of the dictionary before each calculation
    sheet['F'+str(row)].value = dict1[sheet['C'+str(row)].value]
    # calculations being executed
    (.....)
    #updating the dictionary with the results of the calculations in the row
    dict1_1={sheet['C'+str(row)].value :sheet['F'+str(row)].value}
    dict1.update(dict1_1)
What I tried so far with a dataframe looks like this:
for row in df.T.itertuples():
    df.replace({"P_building_kg_y": dict1}) ##### <-----HERE IS THE PROBLEM!
    # calculations being executed
    (.....)
    #updating the dictionary with the results of the calculations in the row
    dict1_1=dict(zip(df.FacilityID, df.P_building_kg_y))
    dict1.update(dict1_1)
I only want to update the values in the dataframe based on the dictionary. If you know a way of how to do it I would really appreciate it!
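One possible way to do this, sketched under assumptions: loop over the index and use df.at to read and write a single cell, mirroring the Excel loop above. The data values and the "+ 1" calculation are placeholders invented for illustration; only the column names FacilityID and P_building_kg_y come from the question.

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are made up.
df = pd.DataFrame({
    "FacilityID": [1, 2, 1, 2],
    "P_building_kg_y": [0.0, 0.0, 0.0, 0.0],
})
dict1 = {1: 10.0, 2: 15.0}  # stand-in for the real dictionary

for idx in df.index:
    key = df.at[idx, "FacilityID"]
    # Write the current dictionary value into this one cell only.
    df.at[idx, "P_building_kg_y"] = dict1[key]
    # Placeholder for the real calculation (assumption: it just adds 1).
    result = df.at[idx, "P_building_kg_y"] + 1
    df.at[idx, "P_building_kg_y"] = result
    # Feed the result back so later rows with the same key see it.
    dict1[key] = result
```

Note that df.replace returns a new dataframe rather than changing df in place, and it replaces matching values across the whole column, which is why it is not a good fit for a cell-by-cell update.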

Power BI display one value

I am using Power BI to bring together data from several systems and display a dash board with data from all of the systems.
The dashboard has a couple of filters which are then used to display the data relating to one object across all systems.
When the dashboard is first loaded and none of the filters have been selected, the data cards display information from all rows in the table.
Is there a way to make a data card display only one row of data, or be blank if there is more than one row of data?
There's no direct way to look at the number of rows in the visual, count them, and do something different if there's more than 1.
That said, there are a few things you can do.
HASONEFILTER
If you have a specific column in your table that, when selected, filters your results to a single row, then you can check if there's a filter on that column using HASONEFILTER. (If you have multiple alternative columns, any of which filter to a single row, that's ok too.)
You could then create a measure for each column that tests HASONEFILTER. If true, return the MAX of the column. (The reason for MAX is because measures always have to aggregate, but the MAX of a 1-row column will be the same as the value in that column.) If false, return either BLANK() or an empty string, depending on your preference.
E.g.
ColumnAMeasure = IF(HASONEFILTER(Sheet1[Slicer Column]),MAX(Sheet1[COLUMN A]), "")
ColumnBMeasure = IF(HASONEFILTER(Sheet1[Slicer Column]),MAX(Sheet1[COLUMN B]), "")
where Sheet1 is the name of the table and "Slicer Column" is the name of the column being used as a slicer
HASONEVALUE
If you have multiple columns that could be used as filters in combination (meaning that having a filter applied on "Slicer Column" doesn't guarantee only 1 row in the table), then rather than testing HASONEFILTER, you can test HASONEVALUE.
ColumnAMeasure = IF(HASONEVALUE(Sheet1[COLUMN A]),MAX(Sheet1[COLUMN A]), "")
ColumnBMeasure = IF(HASONEVALUE(Sheet1[Column B]),MAX(Sheet1[COLUMN B]), "")
Notice that HASONEVALUE tests the current column you're trying to display, rather than a slicer column like HASONEFILTER.
One side-effect of HASONEVALUE is that, if you're filtered to 3 rows, but all 3 rows have the same value for column A, then column A will display that value. (Whereas with HASONEFILTER, column A would stay blank until you're filtered to one thing.)
Low Tech
Both answers above depend on a measure existing for every column you want to display, so that you can test whether to display a blank row or not. That could become a pain if you have dozens of columns.
A lower-tech alternative is to add in an additional row with blanks for each column and then sort your table so that that row always appears first. (And shorten your visual so only the top row is visible.) Technically the other rows would be underneath and there'd be a scrollbar, but at least the initial display would be blank rather than showing a random row.
Hopefully something here has helped. Other people might have better solutions too. More information:
HASONEFILTER documentation: https://msdn.microsoft.com/en-us/library/gg492135.aspx
HASONEVALUE documentation: https://msdn.microsoft.com/en-us/library/gg492190.aspx

Deleting pandas dataframe rows if value in given column not contained in a list

I have a pandas dataframe called df that contains several columns, including a df['MY STATE'] column. My goal is to remove all the rows from the dataframe that do not contain US states. I want to do this by comparing the value in the cell to a pandas series I have containing all the state abbreviations. I have seen people use something like the following to clean a dataframe:
df = df[df['COST'] <= 0]
But something like what I need (below) doesn't work
df = df[df['MY STATE'] not in states['Abbreviation'].values]
Is there a way to do this simply?
I have read that df.query() can be used to do something like this, but I have not yet found an example, and have also read that df.query() cannot be used when there is a space in the name of the column.
Thank you,
Michael
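IIUC you can use isin to build a boolean mask. To keep only the rows whose state is in the list (that is, to delete the rows that are not US states):
df = df[df['MY STATE'].isin(states['Abbreviation'].values)]
To do the opposite and keep only the rows that are not in the list, as in your not in attempt, invert the mask with the ~ operator:
df = df[~df['MY STATE'].isin(states['Abbreviation'].values)]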
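A minimal sketch of the isin mask and its inverse, using made-up rows (the column names come from the question; the values are invented for illustration):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"MY STATE": ["NY", "ON", "CA", "XX"], "COST": [1, 2, 3, 4]})
states = pd.DataFrame({"Abbreviation": ["NY", "CA", "TX"]})

# True where the row's state appears in the list of abbreviations.
mask = df["MY STATE"].isin(states["Abbreviation"])

kept = df[mask]      # rows that are US states
dropped = df[~mask]  # rows that are not US states
```

The plain Python not in operator does not work here because it tries to evaluate the whole column as a single value, whereas isin tests each row separately.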