Pie charts in Stata - stata

I'm using the code below to draw some graphs and combine them. When I execute the entire file I get the error:
"Invalid Syntax r(198)".
And the code stops at the code segment below. However, when I run the code segment separately the program works without a flaw. Can you please help me understand what's causing this issue?
*pie chart
foreach i in "SPA" "EPD"{
graph pie billed_amount if type== "`i'", over(service_id) saving(gg`i',replace)
local gg `gg' "gg`i'"
}
local gg: subinstr local gg "ggSPA" `""ggSPA""'
gr combine `gg'
graph export "C\provider.png", as(png) replace
graph drop _all

Without any context (does the code before this segment make a difference?) or a dataset to use, how can we tell? The question lacks a minimal, complete, verifiable example. See https://stackoverflow.com/help/mcve for this and future questions.
That said, this seems to be a very roundabout way to get two pie charts side-by-side. That doesn't require a loop and it doesn't require graph combine.
graph pie billed_amount if inlist(type, "SPA", "EPD"), over(service_id) by(type)
graph export "C\provider.png", as(png) replace
Whether you want to drop all graphs afterwards is quite immaterial to the problem posed.

Related

Replicating conditional formatting Google Sheets

I am trying to create a simple table that I can just replicate over and over when needed. Although in my sheet, I have the first range, B3:D12 working exactly as I want, I am finding it a challenge to then copy the formatting across to E3:G12 and for it to work subsequently.
Is the formula wrong? Is there an easier way that I can do this to make it simple each time I copy + paste the table across?
Thanks
Google Sheet Conditional Formatting
Apply this custom formula:
=(B3=D3)*(B3<>"")
to B3:B12, C3:C12 and D3:D12 as green.
Then, as red, use:
=(B3<>D3)*(B3<>"")
on B3:B12.

bokeh - plotting shapefile map using datashader

Initially, I created an interactive map of the UK Postcode area where an individual area is color represented based on its value (e.g. population in that post code area) as following.
from bokeh.plotting import figure
from bokeh.palettes import Viridis256 as palette
from bokeh.models import LinearColorMapper
from bokeh.models import ColumnDataSource
import geopandas as gpd
import pandas as pd
shp = 'file_path_to_the_downloaded_shapefile'
#read shape file into dataframe using geopandas
df = gpd.read_file(shp)
def expandMultiPolygons(row, geometry):
    # Replace a MultiPolygon with the list of its component Polygons
    if row[geometry].geom_type == 'MultiPolygon':
        row[geometry] = list(row[geometry].geoms)
    return row
#Some rows were in MultiPolygons instead of Polygons.
#Expand MultiPolygons to multi rows of Polygons
df = df.apply(expandMultiPolygons, geometry='geometry', axis=1)
df = df.set_index('Area')['geometry'].apply(pd.Series).stack().reset_index()
#Visualize the polygons. To visualize different colors for different post areas, I added another column called 'value' which has some random integer value.
p = figure()
color_mapper = LinearColorMapper(palette=palette)
source = ColumnDataSource(df)
p.patches('x', 'y', source=source,
          fill_color={'field': 'value', 'transform': color_mapper},
          fill_alpha=1.0, line_color="black", line_width=0.05)
where df is a dataframe of four columns : post code area, x-coordinate, y-coordinate, value (i.e. population).
The above code creates an interactive map on a web browser which is great but I noticed the interactivity is not very smooth in speed. If I zoom in or move the map, it renders slowly. The size of the dataframe is only 1106 rows, so I'm quite confused why it is so slow.
As one possible solution, I came across datashader (https://datashader.readthedocs.io/en/latest/), but I find the example scripts quite complicated, and most of them use the HoloViews package in a Jupyter notebook, whereas I want to create a dashboard using bokeh.
Can anyone advise me on incorporating datashader into the above bokeh script? Do I need a different function within datashader to create the shape map instead of using bokeh's patches function?
Any suggestion would be highly appreciated!!!
Without the data file involved, I can't answer your question directly, but can offer some observations:
Datashader is unlikely to be of value for this purpose, because datashader does not currently have any support for rendering polygons. As a rule of thumb, Datashader is designed to aggregate your data, and if it's already aggregated, Datashader won't normally be of help. Here your data is aggregated by postcode, which datashader can't process, but if you had the original data per person it would be happy to render it.
If you prefer working with Bokeh directly rather than via the higher-level HoloViews/GeoViews interface, I'd recommend following Matt Rocklin's work on accelerating geopandas; his approach should be very fast for your purpose.
All that said, HoloViews and GeoViews should be a convenient way to work with Bokeh in general, whether or not you want to create a dashboard. E.g. the 2017 JupyterCon tutorial shows how to make a simple Bokeh dashboard using both libraries. It doesn't cover shape files, but those are covered in other GeoViews examples.
As mentioned in my comment, I believe that the complexity of your polygons might be causing your problem. The file you linked to contains several shapefiles of different sizes and complexities. You can simplify those, i.e. reduce the number of points in each polygon. This can change how they look: the result ranges from almost no visible difference, through a bit more "edginess", to an angular appearance, depending on the level of simplification you choose. Pick the level that suits your needs.
I know of three easy options to get this done:
GUI: Try QGIS. It is a great open-source tool for geospatial data processing. Load your shapefile as a new layer. Then use the "Simplify Geometries" tool under the Vector menu.
Command-Line: GDAL is an open-source library. It comes with a useful command-line tool. You can use it like this: ogr2ogr outfile.shp infile.shp -simplify 0.000001
Online: Visit mapshaper. Import your file. Select simplify and choose your level. Then, export the result. What I really like here is that your file is rendered instantly. Hence, you can immediately see the result of your simplification.
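To make "reducing the number of points" concrete, here is a minimal pure-Python sketch of one common simplification algorithm, Ramer-Douglas-Peucker. This is an illustration only: the function names are made up for this example, and the tools listed above may use this or a different algorithm internally. The tolerance is in the same units as the coordinates.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # start and end coincide (e.g. a closed ring): use point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def simplify_line(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified outline."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the start-end chord
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Everything in between is close enough to the chord: keep only the ends
        return [start, end]
    # Otherwise keep the farthest point and simplify both halves recursively
    left = simplify_line(points[: index + 1], tolerance)
    right = simplify_line(points[index:], tolerance)
    return left[:-1] + right


outline = [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)]
simplified = simplify_line(outline, tolerance=0.1)
print(len(outline), "->", len(simplified), "points")
```

A larger tolerance removes more points and gives the more angular look described above; a very small one leaves the shape almost unchanged.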
Other than that, you should also update your bokeh version. It gets updated regularly and there have been some performance improvements since.
Using HoloViews or GeoViews will not positively affect your performance. Thus, it is not related to your issues. I guess @James A. Bednar was just giving some side advice there.
I found a way to speed up the interactive visualization of the UK map as I move the slider.
First I created a separate 2D image for each slider value, and then updated the map using those 2D images instead of bokeh's patches function.
Since the images are in array format, updating the image as the slider value changes is much faster. One downside of this method is that I can no longer use the hover function on the UK map.
I referred to the following url to convert polygon information into arrays: https://gist.github.com/brendancol/db030013e981c46acb2886060dde607e#file-rasterio_datashader_polygons-py-L35
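As a rough illustration of what "converting polygon information into arrays" means, the sketch below rasterizes polygons onto a grid in pure Python with a ray-casting point-in-polygon test. It is not the method used in the linked gist (which relies on rasterio and datashader) and would be slow for large grids; the function names and the `(ring, value)` input format are assumptions made for this example.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon given by `ring`?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, width, height, bounds, background=0.0):
    """Return a height x width grid holding each polygon's value.

    polygons: list of (ring, value) pairs, ring being a list of (x, y) tuples.
    bounds:   (xmin, ymin, xmax, ymax) of the area covered by the grid.
    Row 0 of the result corresponds to the lowest y values.
    """
    xmin, ymin, xmax, ymax = bounds
    cell_w = (xmax - xmin) / width
    cell_h = (ymax - ymin) / height
    grid = [[background] * width for _ in range(height)]
    for row in range(height):
        y = ymin + (row + 0.5) * cell_h  # centre of the cell
        for col in range(width):
            x = xmin + (col + 0.5) * cell_w
            for ring, value in polygons:
                if point_in_polygon(x, y, ring):
                    grid[row][col] = value
                    break
    return grid


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
grid = rasterize([(square, 5.0)], width=4, height=4, bounds=(0, 0, 4, 4))
for row in grid:
    print(row)
```

One such grid per slider value can be computed up front, so that moving the slider only swaps which array is displayed.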

SegNet results of train set (test via test_segmentation.py)

I ran SegNet on my own dataset (following the SegNet tutorial). I see great results via test_segmentation.py.
My problem is that I want to see the real net output, not test_segmentation's own colorisation (by class).
For example, if I have trained the net with 2 classes, then after training I want to see not just 2 colors (one per class) but the net's real-valued segmentation output ([0.22, 0.19, 0.3, ...]), lighter and darker as the net sees it.
I hope I explained myself well. Thanks for helping.
You could use a python script to achieve what you want. Take a look at this script.
The command out = out['argmax'], extracts the raw output, so you can get a segmentation map with 'lighter and darker' values as you wanted.
When you say the 'real' net color segmentation I will assume that you mean the probability maps. Effectively the last layer will have one map for every class; and if you check the function predict in inference.py, they take the argmax; that is the channel (which represents the class) with the highest probability. If you want to get these maps, you just have to get the data without computing the argmax; something like:
predicted = net.blobs['prob'].data
I solved it. The solution is to set cmin and cmax to span 0 to 1 in the scipy saving method. For example: scipy.misc.toimage(output, cmin=0.0, cmax=1.0).save("/path/.../image.png")
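The difference between the argmax labels and the "lighter and darker" probability maps, and the effect of fixing cmin and cmax at 0 and 1, can be sketched in pure Python as follows. This is an illustration under assumptions, not SegNet's or scipy's actual code: the nested lists stand in for the `prob` blob (one map per class), and both function names are hypothetical.

```python
def argmax_labels(prob_maps):
    """Per pixel, return the index of the class with the highest probability."""
    n_classes = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [
        [max(range(n_classes), key=lambda c: prob_maps[c][r][col])
         for col in range(width)]
        for r in range(height)
    ]


def to_gray(prob_map, cmin=0.0, cmax=1.0):
    """Map values in [cmin, cmax] linearly to 0..255, clipping anything outside."""
    scale = 255.0 / (cmax - cmin)
    return [
        [int(min(255.0, max(0.0, (v - cmin) * scale)) + 0.5) for v in row]
        for row in prob_map
    ]


# Two classes, 2x2 pixels: probs[c][row][col]
probs = [
    [[0.0, 0.75], [1.0, 0.25]],  # class 0
    [[1.0, 0.25], [0.0, 0.75]],  # class 1
]
print(argmax_labels(probs))  # hard class labels, one colour per class
print(to_gray(probs[0]))     # graded grey levels for class 0
```

With the range fixed at 0 to 1, a given probability always maps to the same grey level, whereas scaling by each image's own minimum and maximum would stretch every image differently.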

How to plot the different graphs by stcurve in one chart in Stata?

I am using stcurve in Stata to plot survival probability. I need to plot the graph for all data and then for specific variables. I can generate the graphs in two different charts, but I need to have all three lines together in one chart.
I have tried the addplot() option but I get the error that stcurve is not a twoway graph. Do you have any idea how to do this?
This is the code that I have used which generates the graphs in two different charts separately:
stcurve, survival graphregion(lcolor(white) ilcolor(white) ifcolor(white) ) plotregion( lcolor(black)) title("Survival Function", size(vlarge)) ytitle("Survival probabilities", size(large)) xtitle("Time", size(large)) xlabel(,labsize(medium)) ylabel(,labsize(medium))
stcurve, survival at1( def=0) at2( def=1) graphregion(lcolor(white) ilcolor(white) ifcolor(white) ) plotregion( lcolor(black)) legend(label(1 "X Firms") label(2 "Y Firms")) legend(size(large)) lwidth(thin thick) title("Survival Function", size(vlarge)) ytitle("Survival probabilities", size(large)) xtitle("Time", size(large)) xlabel(,labsize(medium)) ylabel(,labsize(medium))
I am not sure if I understood correctly what you want. It would have been useful if you had added the stset and stcox code necessary before running stcurve.
If the Kaplan-Meier hazard graph is identical to your first stcurve, survival you can try a dirty fix by generating a variable e.g.
sts gen s2=s after running stset
then plotting it as a line against your time variable. i.e. adding this to the end of the second graph:
addplot(line s2 your_timevar, sort c(J) title("Survival probabilities"))
The equality of KM hazard and Cox hazard only holds if the first graph does not have any more predictors than failvar in the stset. So if you ran stcox, estimate after stset timevar, failure(failvar) id(idvar) it works, but if you have more variables in the stcox call this will not give you the correct plot.
edit:
As the above quick solution does not work, there is another dirty workaround: save the results from stcurve in a file (option outfile), then plot the "new" data as twoway graphs. Something like this:
stcurve, survival name("surv1") outfile(stcurve1.dta, replace)
stcurve, survival name("surv2") at1( def=0) at2( def=1) outfile(stcurve2.dta, replace)
use stcurve1.dta, clear
rename surv1 surv1_A
rename _t _tA
append using stcurve2.dta
twoway line surv1 _t, sort || line surv1_A _tA, sort
I do not know if this will work with your data: it may be that you need to manipulate the new variables in the outfiles in some way to get the desired results, and you need to add the options you want to the twoway graphs. There surely are many better and easier ways of plotting this when you have the data for the graphs in separate datafiles, but this is the first solution that sprang to mind.

How to visualize generated RNA secondary structure

I'm working on a tool to visualize RNA secondary structure. For this purpose I have implemented Nussinov's algorithm, which generates the RNA secondary structure as a list of the corresponding indices. The code can be found here [0]
[0] http://dpaste.com/596262/
But I'm really stuck on how I should visualize it (as a planar graph). The code above gives me a sequential list of the secondary structure, so can someone please suggest how I can visualize the structure? An example of such a tool can be found here [1]
[1] http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
I know there are better algorithms, but for now I just want to visualize with this one; once I understand the visualization, I will move to a better algorithm.
Visualizing the secondary structure of RNA (or any graph, for that matter) algorithmically is a difficult problem. You need to take care that there are as few overlaps as possible while maintaining consistent link lengths. As the other answers have pointed out, there are a number of existing implementations that you can already use. I'll just throw in another one that's quite easy to use and requires no downloads:
forna - nibiru.tbi.univie.ac.at/forna
Here you just need to enter a dotbracket string:
>molecule_name
CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGUG
((((((((((..((((((.........))))))......).((((((.......))))))..)))))))))
This will give you a visualization that looks something like this:
This is computed using a combination of the ViennaRNA RNAplot program and d3's force-directed graph algorithm.
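If your algorithm returns base pairs as index pairs, they first need to be turned into a dot-bracket string before being pasted into a tool like this. The sketch below shows one way to convert in both directions; it assumes 0-based indices and a structure without pseudoknots (which is what Nussinov's algorithm produces), and the function names are made up for this example.

```python
def pairs_to_dotbracket(length, pairs):
    """Build a dot-bracket string from (i, j) base pairs, 0-based indices."""
    structure = ["."] * length
    for i, j in pairs:
        left, right = min(i, j), max(i, j)
        structure[left] = "("
        structure[right] = ")"
    return "".join(structure)


def dotbracket_to_pairs(structure):
    """Recover the sorted list of (i, j) pairs from a dot-bracket string."""
    stack, pairs = [], []
    for index, char in enumerate(structure):
        if char == "(":
            stack.append(index)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % index)
            # The most recent unmatched '(' pairs with this ')'
            pairs.append((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


pairs = [(0, 5), (1, 4)]
structure = pairs_to_dotbracket(7, pairs)
print(structure)
print(dotbracket_to_pairs(structure))
```

The reverse direction is handy for checking that a string is balanced before submitting it.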
You could do this with Jmol. Jmol allows you to add arbitrary bonds and atoms to a coordinate space using its Java API or, I believe, its JavaScript API as well.
In general, of course, PDB file formats would be used for such data.
RNAviz is old but still commonly used. JalView apparently was supposed to get RNA secondary structure rendering through a GSoC project last year, but I'm not sure what the status in the program is.