I'm extracting the text of the title and href attributes from the HTML. The code runs fine and I can load everything into a PrettyTable. The problem now is that some titles appear to be too long for their cell, which distorts the entire table. I've tried adjusting hrules, vrules, and padding_width without success.
from bs4 import BeautifulSoup
from prettytable import PrettyTable
import urllib
r = urllib.urlopen('http://www.genome.jp/kegg-bin/show_pathway?map=hsa05215&show_description=show').read()
soup = BeautifulSoup((r), "lxml")
links = [area['href'] for area in soup.find_all('area', href=True)]
titles = [area['title'] for area in soup.find_all('area', title=True)]
k = PrettyTable()
k.field_names = ["ID", "Active Compound", "Link"]
c = 1
for i in range(len(titles)):
    k.add_row([c, titles[i], links[i]])
    c += 1
print(k)
This is how I would like the entire table to display:
print (k.get_string(start=0, end=25))
If PrettyTable can't do this, are there any other recommended modules that could?
This turned out not to be a formatting error. The table was simply so wide that the Python window could not fit all the values on the screen.
Switching to a much smaller font size confirmed this. If it helps anyone, exporting to .csv and arranging the data in Excel also worked.
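A minimal sketch of the two workarounds discussed above: shortening long titles before they are displayed, and exporting the full rows to CSV. It uses only the standard library. The sample titles and links, and the names MAX_TITLE_WIDTH, display_rows, and csv_text, are hypothetical and only stand in for the scraped data.

```python
import csv
import io
import textwrap

# Made-up stand-ins for the scraped titles and links (not real KEGG data).
titles = [
    "GENE1 (example), GENE2 (example), GENE3 (example), GENE4 (example)",
    "COMPOUND1 (example)",
]
links = ["/example/entry1", "/example/entry2"]

MAX_TITLE_WIDTH = 40  # assumed column budget; tune it to your terminal width

# Rows for on-screen display: long titles are cut and end in "...".
display_rows = [
    [i, textwrap.shorten(title, width=MAX_TITLE_WIDTH, placeholder="..."), link]
    for i, (title, link) in enumerate(zip(titles, links), start=1)
]

# Rows for export keep the full titles; written to an in-memory buffer here,
# but open("table.csv", "w", newline="") would work the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ID", "Active Compound", "Link"])
for i, (title, link) in enumerate(zip(titles, links), start=1):
    writer.writerow([i, title, link])
csv_text = buffer.getvalue()

for row in display_rows:
    print(row)
```

The rows in display_rows can be passed to k.add_row() unchanged, so the table stays narrow while the CSV keeps the complete text.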
Related
I have the following code that generates a histogram. How can I save the histogram automatically from the code? I tried what we do for other plot types, but that did not work for the histogram. a is a 'numpy.ndarray'.
a = [-0.86906864 -0.72122614 -0.18074998 -0.57190212 -0.25689268 -1.
0.68713553 0.29597819 0.45022949 0.37550592 0.86906864 0.17437203
0.48704826 0.2235648 0.72122614 0.14387731 0.94194514 ]
fig = pl.hist(a,normed=0)
pl.title('Mean')
pl.xlabel("value")
pl.ylabel("Frequency")
pl.savefig("abc.png")
This works for me:
import matplotlib.pyplot as pl
import numpy as np
a = np.array([-0.86906864, -0.72122614, -0.18074998, -0.57190212, -0.25689268 ,-1. ,0.68713553 ,0.29597819, 0.45022949, 0.37550592, 0.86906864, 0.17437203, 0.48704826, 0.2235648, 0.72122614, 0.14387731, 0.94194514])
fig = pl.hist(a,normed=0)
pl.title('Mean')
pl.xlabel("value")
pl.ylabel("Frequency")
pl.savefig("abc.png")
The a in the OP is not a NumPy array, and its format also needs to be changed (it needs commas, not spaces, as delimiters). This program successfully saves the histogram in the working directory. If it still does not work, give savefig() a full path to the location where you want the file saved, like this:
pl.savefig("/Users/atru/abc.png")
The pl.show() statement should not be placed before savefig() as it creates a new figure which makes savefig() save a blank figure instead of the desired one as explained in this post.
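The ordering point can be sketched as follows. This is one possible arrangement, not the only one: it saves through the Figure object so the result does not depend on which figure is "current". The shortened sample values, the Agg backend, and the temporary output path are assumptions made so the snippet runs without a display. The normed argument is left out because newer Matplotlib releases use density instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Shortened sample of the values from the question.
a = np.array([-0.869, -0.721, -0.181, -1.0, 0.687, 0.296, 0.942])

fig, ax = plt.subplots()
ax.hist(a)
ax.set_title("Mean")
ax.set_xlabel("value")
ax.set_ylabel("Frequency")

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "abc.png")

# Save first. With an interactive backend, call plt.show() only after this line.
fig.savefig(out_path)
plt.close(fig)
```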
I am trying to plot a line chart that includes a tooltip, but the code below displays all the values of the line in the tooltip instead of a single value for the hovered coordinates.
#Import the library
import pandas
import itertools
import bokeh
import MySQLdb
from bokeh.plotting import figure, output_file, show
from bokeh.models import HoverTool
TOOLS='hover'
wells=['F1','F2','F3','F4','F5','F6','F7','F8','F9','F10','F11','F12','G1','G2','G3','G4','G5','G6','G7','G8','G9','G10','G11','G12']
p = figure(plot_width=800, plot_height=640,x_axis_type="datetime", tools=TOOLS)
p.title.text = 'Click on legend entries to hide the corresponding lines'
# Open database connection
db = MySQLdb.connect("localhost","user","password","db" )
#palette for the lines
my_palette=bokeh.palettes.inferno(len(wells))
#create a statement to get the data
for name, color in zip(wells,my_palette):
    stmnt='select date_time,col1,wells,test_value from db where wells="%s"'%(name)
    #creating dataframe
    df=pandas.read_sql(stmnt,con=db)
    p.scatter(df['date_time'], df['test_value'], line_width=2, color=color, alpha=0.8, legend=name,)
    #Inserting tool tip
    hover = p.select(dict(type=HoverTool))
    hover.tooltips = [("Wells","@wells"),("Date","@%s"%(df['date_time'])),("Values","@%s"%(df['test_value']))]
    hover.mode = 'mouse'
#Adding a legend
p.legend.location = "top_right"
output_file("interactive_legend.html", title="interactive_legend.py example")
show(p)
Given below is the resultant screenshot
I am trying to show only one well, date_time, and test_value for the point under the mouse.
This code:
hover.tooltips = [
    ("Wells","@wells"),
    ("Date","@%s"%(df['date_time'])),
    ("Values","@%s"%(df['test_value']))
]
Does not do what you think. Let's suppose df['date_time'] has the value [10, 20, 30, 40]. Then after your string substitution, your tooltip looks like:
("Date", "@[10, 20, 30, 40]")
Which exactly explains what you are seeing. The @[10 part looks for a column named "[10" in your ColumnDataSource (because of the @ in front). There isn't a column with that name, so the tooltip prints ??? to indicate it can't find data to look up. The rest, 20, 30, 40, is just plain text, so it gets printed as-is. In your code, you are actually passing a Pandas Series and not a list, so the string substitution also puts the Name and dtype info into the tooltip text.
Since you are passing sequence literals to scatter, it creates a ColumnDataSource for you, and the default column names in that CDS are 'x' and 'y'. My best guess is that you actually want:
hover.tooltips = [
    ("Wells","@wells"),
    ("Date","@x"),
    ("Values","@y")
]
But note that you would want to do this outside the loop. As it is you are simply modifying the same hover tool over and over.
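The substitution problem can be reproduced without Bokeh or a database. In the sketch below, date_time is a hypothetical plain list standing in for df['date_time']. Bokeh reads a leading @ in a tooltip string as a reference to a column name, so the tooltip text has to stay a fixed string rather than a formatted copy of the data.

```python
# Hypothetical stand-in for df['date_time']; the real object is a pandas Series.
date_time = [10, 20, 30, 40]

# What the loop in the question builds: the whole sequence is pasted into the string.
broken = ("Date", "@%s" % date_time)
print(broken)  # ('Date', '@[10, 20, 30, 40]')

# What the answer suggests: constant column references, defined once outside the loop.
tooltips = [
    ("Wells", "@wells"),
    ("Date", "@x"),
    ("Values", "@y"),
]
print(tooltips)
```

Because tooltips never depends on the loop variable, assigning it to hover.tooltips once after the loop is enough.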
I would like to run this and collect all of the text from the title and href attributes. The code runs and prints all of the needed data, but when I try to assign the output to an array I only get the last matching attribute in the HTML.
from bs4 import BeautifulSoup
import urllib
r = urllib.urlopen('http://www.genome.jp/kegg-bin/show_pathway?map=hsa05215&show_description=show').read()
soup = BeautifulSoup((r), "lxml")
for area in soup.find_all('area', href=True):
    print area['href']
for area in soup.find_all('area', title=True):
    print area['title']
If it helps, I'm doing this because I will create a list with the data later. I'm just beginning to learn, so extra explanations are much appreciated.
You need to use list comprehensions:
links = [area['href'] for area in soup.find_all('area', href=True)]
titles = [area['title'] for area in soup.find_all('area', title=True)]
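To illustrate why the loop kept only the last value while the comprehension keeps them all, here is a small sketch that uses plain dictionaries as hypothetical stand-ins for the area tags. The third list, pairs, is an extra suggestion rather than part of the answer: if some tag had only one of the two attributes, two separate comprehensions could fall out of step, and filtering on both attributes in one pass avoids that.

```python
# Hypothetical stand-ins for the tags returned by soup.find_all('area').
areas = [
    {"href": "/entry/1", "title": "first"},
    {"href": "/entry/2"},  # this one has no title attribute
    {"href": "/entry/3", "title": "third"},
]

# Assigning inside a loop rebinds the same name on every pass,
# so only the value from the final iteration survives.
for area in areas:
    last_href = area["href"]

# A list comprehension collects every value instead.
links = [area["href"] for area in areas if "href" in area]
titles = [area["title"] for area in areas if "title" in area]

# One pass that requires both attributes keeps each title next to its own link.
pairs = [
    (area["title"], area["href"])
    for area in areas
    if "title" in area and "href" in area
]

print(last_href)
print(links)
print(pairs)
```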
I am writing a program that generates satisfiable models (connected graphs) for a specific input string. The details are not important here; the main problem is that each node has a label, and a label can be lengthy. Long labels do not fit into the figure: all the nodes are displayed, but some labels are cut off. Also, the figure window does not offer a way to zoom out, so it is impossible to capture the entire graph with full labels in one figure.
Can someone help me out and perhaps suggest a solution?
for i in range(0,len(Graphs)):
    graph = Graphs[i]
    custom_labels={}
    node_colours=['y']
    for node in graph.nodes():
        custom_labels[node] = graph.node[node]
        node_colours.append('c')
    #nx.circular_layout(Graphs[i])
    nx.draw(Graphs[i], nx.circular_layout(Graphs[i]), node_size=1500, with_labels=True, labels = custom_labels, node_color=node_colours)
    #show with custom labels
    fig_name = "graph" + str(i) + ".png"
    #plt.savefig(fig_name)
    plt.show()
Update picture added:
You could scale the figure:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_edge('a'*50,'b'*50)
nx.draw(G,with_labels=True)
plt.savefig('before.png')
l,r = plt.xlim()
print(l,r)
plt.xlim(l-2,r+2)
plt.savefig('after.png')
before
after
You could reduce the font size, using the font_size parameter:
nx.draw(Graphs[i], nx.circular_layout(Graphs[i]), ... , font_size=6)
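A further option, not taken from the answers above, is to wrap long labels onto several lines before drawing. The sketch below only prepares the label dictionary with the standard library. It assumes that the text rendering used for the labels honors newline characters. The names raw_labels and WRAP_WIDTH and the sample labels are hypothetical.

```python
import textwrap

# Hypothetical node labels standing in for graph.node[node] in the question.
raw_labels = {
    0: "a" * 50,
    1: "short label",
    2: "several words that together make up a rather long label",
}

WRAP_WIDTH = 15  # assumed maximum number of characters per line

# textwrap.fill inserts newlines so that no line is longer than WRAP_WIDTH;
# words longer than the width are split across lines.
custom_labels = {
    node: textwrap.fill(str(label), width=WRAP_WIDTH)
    for node, label in raw_labels.items()
}

for node, label in custom_labels.items():
    print(node, repr(label))
```

The resulting dictionary can be passed as labels=custom_labels in the nx.draw call from the question, and combined with a smaller font_size if the labels still overlap.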
Is there a way to freeze the header of a Pandas DataFrame, as we do in Excel? With a long DataFrame of many rows, the headers should stay visible when scrolling down. I am assuming an IPython notebook.
This function may do the trick:
from ipywidgets import interact, IntSlider
from IPython.display import display
def freeze_header(df, num_rows=30, num_columns=10, step_rows=1,
                  step_columns=1):
    """
    Freeze the headers (column and index names) of a Pandas DataFrame. A widget
    lets you slide through the rows and columns.
    Parameters
    ----------
    df : Pandas DataFrame
        DataFrame to display
    num_rows : int, optional
        Number of rows to display
    num_columns : int, optional
        Number of columns to display
    step_rows : int, optional
        Step in the rows
    step_columns : int, optional
        Step in the columns
    Returns
    -------
    Displays the DataFrame with the widget
    """
    @interact(last_row=IntSlider(min=min(num_rows, df.shape[0]),
                                 max=df.shape[0],
                                 step=step_rows,
                                 description='rows',
                                 readout=False,
                                 disabled=False,
                                 continuous_update=True,
                                 orientation='horizontal',
                                 slider_color='purple'),
              last_column=IntSlider(min=min(num_columns, df.shape[1]),
                                    max=df.shape[1],
                                    step=step_columns,
                                    description='columns',
                                    readout=False,
                                    disabled=False,
                                    continuous_update=True,
                                    orientation='horizontal',
                                    slider_color='purple'))
    def _freeze_header(last_row, last_column):
        # Show only the window of rows and columns that ends at the slider positions.
        display(df.iloc[max(0, last_row-num_rows):last_row,
                        max(0, last_column-num_columns):last_column])
Test it with:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.RandomState(seed=0).randint(low=0,
                                                        high=100,
                                                        size=[200, 50]))
freeze_header(df=df, num_rows=10)
It results in (the colors were customized in the ~/.jupyter/custom/custom.css file):
Old question but wanted to revisit it because I recently found a solution. Use the qgrid module: https://github.com/quantopian/qgrid
This will not only allow you to scroll with the headers frozen but also sort, filter, edit inline and some other stuff. Very helpful.
Try pandas' sticky headers:
import pandas as pd
import numpy as np
bigdf = pd.DataFrame(np.random.randn(16, 100))
bigdf.style.set_sticky(axis="index")
(This feature was added recently; it works for me on pandas 1.3.1 but not on 1.2.4.)
A solution that would work on any editor is to select what rows you want to look at:
df.iloc[100:110] # .ix was removed in pandas 1.0; .iloc shows rows 101 to 110 (positions 100-109) with the header on top