Simple, non-ribbon chord diagram (Python 2.7) - python-2.7

I am looking to replicate a plot similar to this [example chord diagram image], with the following criteria:
* NON-ribbon connections between points (thin lines)
* Text outside each dot
* Only 10 or so points, instead of the 30+ shown above
* Additional colors (lines/text) not necessary
I'm aware that there are packages like Plotly, but these seem to be dedicated to ribbon-like connections instead of thin lines. I'm also aware of Circos, but it seems difficult to make something relatively simple with it.
Is there a straightforward Python 2.7 script that I can use (an online example, or an online tool)?
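For what it's worth, plain matplotlib can get close: place the points on a circle, draw thin straight lines between connected pairs, and offset the labels outward. A minimal sketch (the labels and connection list below are made-up placeholders):
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical node names and connections, purely for illustration.
labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
connections = [(0, 3), (1, 7), (2, 5), (4, 9), (6, 8)]

n = len(labels)
angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
xs, ys = np.cos(angles), np.sin(angles)

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(xs, ys, s=60, color="k", zorder=2)

# Thin, non-ribbon connection lines.
for i, j in connections:
    ax.plot([xs[i], xs[j]], [ys[i], ys[j]], color="gray", lw=0.8, zorder=1)

# Text just outside each dot.
for x, y, label in zip(xs, ys, labels):
    ax.text(1.15 * x, 1.15 * y, label, ha="center", va="center")

ax.set_aspect("equal")
ax.axis("off")
plt.show()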

Related

How to get most similar words to a document in gensim doc2vec?

I have built a gensim Doc2vec model. Let's call it doc2vec. Now I want to find the most relevant words to a given document according to my doc2vec model.
For example, I have a document about "java" with the tag "doc_about_java". When I ask for similar documents, I get documents about other programming languages and topics related to java. So my document model works well.
Now I want to find the most relevant words to "doc_about_java".
I followed the solution from the closed question How to find most similar terms/words of a document in doc2vec?, but it gives me seemingly random words; the word "java" is not even among the first 100 similar words:
docvec = doc2vec.docvecs['doc_about_java']
print doc2vec.most_similar(positive=[docvec], topn=100)
I also tried like this:
print doc2vec.wv.similar_by_vector(doc2vec["doc_about_java"])
but it didn't change anything. How can I find the most similar words to a given document?
Not all Doc2Vec modes even train word-vectors. In particular, the PV-DBOW mode (dm=0), which often works very well for doc-vector comparisons, leaves word-vectors at randomly-assigned (and unused) positions.
So that may explain why the results of your initial attempt to get a list-of-related-words seem random.
To get word-vectors, you'd need to use PV-DM mode (dm=1), or add optional concurrent word-vector training to PV-DBOW (dm=0, dbow_words=1).
(If this isn't the issue, there may be other problems in your training setup, so you should share more detail about your data source, size, and code.)
(Separately, your alternate attempt, by using doc2vec["doc_about_java"], is retrieving a word-vector for "doc_about_java" (which may not be present at all). To get the doc-vector, use doc2vec.docvecs["doc_about_java"], as in your first code block.)
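For example, retraining in PV-DBOW with concurrent word-training might look like this. A minimal sketch: `documents` stands in for your corpus of TaggedDocument objects, and the parameter names (size, iter) follow older gensim releases; newer ones use vector_size and epochs:
from gensim.models.doc2vec import Doc2Vec

# dbow_words=1 trains word-vectors alongside the PV-DBOW doc-vectors,
# so doc-to-word similarity queries become meaningful.
doc2vec = Doc2Vec(documents, dm=0, dbow_words=1, size=100,
                  window=5, min_count=5, iter=20)

docvec = doc2vec.docvecs['doc_about_java']
print doc2vec.wv.similar_by_vector(docvec, topn=100)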

TSFRESH library for python is taking way too long to process

I came across the TSfresh library as a way to featurize time series data. The documentation is great, and it seems like the perfect fit for the project I am working on.
I wanted to implement the following code, which is shared in the quick start section of the tsfresh documentation, and it seems simple enough.
from tsfresh import extract_relevant_features
feature_filtered_direct = extract_relevant_features(result, y, column_id=0, column_sort=1)
My data included 400,000 rows of sensor data, with 6 sensors each for 15 different ids. I started running the code, and 17 hours later it still had not finished. I figured this might be too large a data set to run through the relevant feature extractor, so I trimmed it down to 3,000 rows, and then further down to 300. Neither of these actions made the code run in under an hour, and I just ended up shutting it down after an hour or so of waiting. I tried the standard feature extractor as well,
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
along with trying the example dataset that tsfresh presents in its quick start section, which is very similar to my original data and has about as many data points as my reduced set.
Does anybody have any experience with this code? How would you go about making it work faster? I'm using Anaconda for python 2.7.
Update
It seems to be related to multiprocessing. Because I am on Windows, the multiprocessing code needs to be protected by:
if __name__ == "__main__":
    main()
Once I added
if __name__ == "__main__":
    extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
to my code, the example data worked. I'm still having some issues running the extract_relevant_features function and the extract_features module on my own data set; it seems as though it continues to run slowly. I have a feeling it's related to the multiprocessing freeze as well, but without any errors popping up it's impossible to tell. It's taking me about 30 minutes to extract features on less than 1% of my dataset.
Which version of tsfresh did you use? Which OS?
We are aware of the high computational cost of some feature calculators. There is little we can do about that. In the future we will implement tricks like caching to increase the efficiency of tsfresh further.
Have you tried calculating only the basic features by using MinimalFeatureExtractionSettings? It will only contain basic features such as max, min, median and so on, but should run way, way faster.
from tsfresh.feature_extraction import MinimalFeatureExtractionSettings
extracted_features = extract_features(timeseries, column_id="id", column_sort="time", feature_extraction_settings = MinimalFeatureExtractionSettings())
Also, it is probably a good idea to install the latest version from the repo via pip install git+https://github.com/blue-yonder/tsfresh. We are actively developing it, and master should contain the newest and freshest version ;).
The syntax has changed slightly (see the docs); the current approach would be:
from tsfresh.feature_extraction import EfficientFCParameters, MinimalFCParameters
extract_features(timeseries, column_id="id", column_sort="time", default_fc_parameters=MinimalFCParameters())
Or
extract_features(timeseries, column_id="id", column_sort="time", default_fc_parameters=EfficientFCParameters())
Since version 0.15.0 we have improved our bindings for Apache Spark and dask.
It is now possible to use the tsfresh feature extraction directly in your usual dask or Spark computation graph.
You can find the bindings in tsfresh.convenience.bindings, with the documentation here. For example, for dask it would look something like this (assuming df is a dask.DataFrame, for example the robot failure dataframe from our example):
from tsfresh.convenience.bindings import dask_feature_extraction_on_chunk

df = df.melt(id_vars=["id", "time"],
             value_vars=["F_x", "F_y", "F_z", "T_x", "T_y", "T_z"],
             var_name="kind", value_name="value")
df_grouped = df.groupby(["id", "kind"])
features = dask_feature_extraction_on_chunk(df_grouped, column_id="id",
                                            column_kind="kind",
                                            column_sort="time",
                                            column_value="value",
                                            default_fc_parameters=EfficientFCParameters())
# or any other parameter set
Using either dask or Spark (or anything similar) might help you with very large data, both for memory and for speed (as you can distribute the work over multiple machines). Of course, we still support the usual distributors (see the documentation) as before.
In addition to that, it is also possible to run tsfresh together with a task orchestration system, such as luigi. You can create a task to
* read in the data for only one id and kind
* extract the features
* write out the result to disk
and let luigi handle all the rest. You may find a possible implementation of this here on my blog.
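A rough sketch of such a task, assuming one input CSV per (id, kind) pair under data/ (the file layout, column names, and parameter set here are illustrative guesses, not the blog's actual code):
import luigi
import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction import EfficientFCParameters

class ExtractFeatures(luigi.Task):
    data_id = luigi.Parameter()
    kind = luigi.Parameter()

    def output(self):
        # One output file per (id, kind) chunk.
        return luigi.LocalTarget("features/%s_%s.csv" % (self.data_id, self.kind))

    def run(self):
        # Read in the data for only one id and kind, extract, write out.
        df = pd.read_csv("data/%s_%s.csv" % (self.data_id, self.kind))
        features = extract_features(df, column_id="id", column_sort="time",
                                    default_fc_parameters=EfficientFCParameters())
        with self.output().open("w") as f:
            features.to_csv(f)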
I've found, at least on a multicore machine, that a better way to distribute the extract_features calculation over independent subgroups (identified by the column_id value) is through joblib.Parallel with the Loky backend.
For example, you define your feature extraction function for a single value of column_id and apply it:
from multiprocessing import cpu_count

from joblib import Parallel, delayed
from tqdm import tqdm
from tsfresh import extract_features

# `settings` is whatever fc-parameters object you use,
# e.g. MinimalFCParameters() or EfficientFCParameters().

def map_extract_features(df):
    # n_jobs=1: parallelism is handled by joblib, not by tsfresh itself.
    return extract_features(
        timeseries_container=df,
        default_fc_parameters=settings,
        column_id="ID",
        column_sort="DATE",
        n_jobs=1,
        disable_progressbar=True,
    ).reset_index().rename({"index": "ID_CONTO"}, axis=1)

out = Parallel(n_jobs=cpu_count() - 1)(
    delayed(map_extract_features)(
        my_dataframe[my_dataframe["ID"] == id]
    ) for id in tqdm(my_dataframe["ID"].unique())
)
This method takes way less memory than specifying column_id directly in the extract_features function.

Is there a one line text input box in python 2.7?

Java has a method that opens a text box and pulls the answer:
JOptionPane.showInputDialog(frame, "What's your name?");
Is there a similar easy solution for pulling input text?
Edit: I understand that packages exist and what they are; I'm asking if you know of a package that has a one-line solution.
In Python, if you are using Tkinter, you can use the Entry() widget.
Check this link out for clearer guidelines.
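A minimal sketch of a one-field input dialog built from Entry() (Python 2.7 naming; the widget layout is just one way to do it):
import Tkinter as tk  # `tkinter` (lowercase) on Python 3

root = tk.Tk()
root.title("Input")
tk.Label(root, text="What's your name?").pack()
entry = tk.Entry(root)
entry.pack()

def submit():
    print entry.get()
    root.destroy()

tk.Button(root, text="OK", command=submit).pack()
root.mainloop()
If you want a literal one-liner closer to JOptionPane.showInputDialog, the standard tkSimpleDialog module offers tkSimpleDialog.askstring("Title", "What's your name?") (typically called after creating and withdrawing a Tk root).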
Vanilla Python has no methods for UI interfaces (I believe this is like any other programming language: Java, C#, C++, and the rest use separate libraries or frameworks to create UIs). There are a number of libraries for Python that are used to draw different user interfaces and that would allow you to do this. You could check this list, pick the one you like most, and read how to do it there:
https://wiki.python.org/moin/GuiProgramming
On the other hand, the most commonly used one is tkinter. For tkinter, you can check the examples here (there is an example of creating a whole Python program, a loan calculator, with a UI):
http://www.python-course.eu/tkinter_entry_widgets.php

Getting Variables in and out with scipy weave

I'm trying to use SciPy Weave to speed up a loop-intensive task in Python code I'm translating from MATLAB. In MATLAB I went to great lengths to vectorise it, which had it running really fast. I've already tried it in Python with loops and it is unacceptably slow, so although I could probably do what I did in MATLAB with NumPy and get it running just as fast, I want to learn to use Weave because it will be really useful in the future. I've checked that my C++ works here: http://ideone.com/E8ACaq (the variables above the first comment are the ones that I pass to weave), and I have simpler examples working in weave, but I can't get the actual code to play ball in weave!
What I need to know is what format my variables need to be in to get them in and out. Ideally I want a NumPy array out, and I want to pass in a variety of integers, NumPy arrays and doubles, as shown below.
import math

import numpy as np
from scipy import weave

def GoodManLookup(CycleData, MaterialID, Materials, CurveFit):
    # CurveFit is passed in as a parameter here; it was undefined in my original snippet.
    nCyc = len(CycleData)
    NRVals = int(Materials[MaterialID]['rvalues'])
    NLevels = np.product(Materials[MaterialID]['levels'].shape)
    GoodmanData = Materials[MaterialID]['data']
    GoodmanAmp = GoodmanData[:, 1]
    GoodmanMean = GoodmanData[:, 0]
    RValAngles = np.array([np.arctan2(GoodmanData[0:NRVals, 1], GoodmanData[0:NRVals, 0]) * 180 / math.pi])
    CycleAngles = np.arctan2(CycleData[:, 0], CycleData[:, 1]) * 180 / math.pi
    CycleAngles[CycleAngles == 180] = 179.99
    CycleAngles[CycleAngles == 90] = 89.99
    CycleAngles[CycleAngles == 0] = 0.01
    # Get sector of each cycle
    SectB = np.tile(RValAngles[0, 0:NRVals - 1], (nCyc, 1))
    SectT = np.tile(RValAngles[0, 1:NRVals], (nCyc, 1))
    Angles = np.tile(np.array([CycleAngles]).swapaxes(1, 0), (1, NRVals - 1))
    Sect = np.array([np.sum(np.bitwise_and(Angles > SectB, Angles < SectT)
                            * np.tile(np.array([range(0, NRVals - 1)]), (nCyc, 1)), axis=1)]).swapaxes(1, 0)
    Amplitude = CycleData[:, 0]
    Mean = CycleData[:, 1]
    N = Materials[MaterialID]['levels']
    if CurveFit == 1:
        PowerLaw = True
    elif CurveFit == 0:
        PowerLaw = False
    code_cpp = """//code on ideone (linked above)
    """
    P = weave.inline(code_cpp,
                     ['Sect', 'Amplitude', 'Mean', 'nCyc', 'NLevels', 'PowerLaw',
                      'N', 'NRVals', 'GoodmanMean', 'GoodmanAmp'],
                     compiler='gcc')
I'm new to Python and C++ so this is what you might call a perfect storm of incompetence!
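For reference, the usual in/out pattern with weave.inline is to allocate the output NumPy array in Python, pass it in by name, and fill it from C++; with the blitz type converters a 2-D array x is indexed as x(i, j) on the C++ side, and scalars arrive as plain ints/doubles. A minimal sketch (the row-sum computation is just a stand-in):
import numpy as np
from scipy import weave
from scipy.weave import converters

x = np.random.rand(4, 3)
n, m = int(x.shape[0]), int(x.shape[1])
out = np.zeros(n)  # output buffer, filled in place by the C++ code

code = """
for (int i = 0; i < n; ++i) {
    double s = 0.0;
    for (int j = 0; j < m; ++j) {
        s += x(i, j);
    }
    out(i) = s;
}
"""
weave.inline(code, ['x', 'out', 'n', 'm'],
             type_converters=converters.blitz, compiler='gcc')
print out  # row sums computed in C++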

Tool for drawing automaton

I am composing two automata (actually they are transducers). At the end of it, I want to represent the result visually so I can analyze it.
Which is the best tool/library for this?
People have suggested dot and graphviz to me. Which is better? I am writing code in OCaml. Does it have any library to draw this?
Here is an example of the kind of transducer I want to draw.
People have suggested me dot and graphviz. Which is better?
There is no "better": graphviz takes graphs written in the dot language as input (and can produce dot among other outputs), and it ships a layout command called dot which lays out directed graphs.
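For instance, a small transducer in dot might look like this (state names and the "input:output" edge labels are made up); render it with dot -Tpng transducer.dot -o transducer.png:
digraph transducer {
    rankdir=LR;
    node [shape=circle];
    q0 [shape=doublecircle];  // e.g. mark an accepting state
    q0 -> q1 [label="a:x"];
    q1 -> q0 [label="b:y"];
    q1 -> q1 [label="a:y"];
}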
I am writing code in OCaml. Does that have any library to draw that?
I don't know OCaml, but it looks like there is ocamlgraph, which can create dot graphs; see also this similar question.
It also looks like there are graphviz OCaml extensions available for some platforms.
This is an example transducer which I want to draw?
Not sure what the question is, but this graph looks like it has been made with graphviz.
If you want high-quality renderings, I suggest generating dot files and then trying dot2tex (never used it myself, though) to generate PGF/TikZ for use with LaTeX. Here are a few examples of TikZ automata renderings.