Matplotlib and XeLaTeX - python-2.7

I have been looking for an answer to this for some time but have not found anything that works for me. My question is: how can you use XeLaTeX to compile text in Matplotlib?
I know that there is this page:
http://matplotlib.org/users/pgf.html
However, I could not get it to work. What I have so far:
import matplotlib as mpl
mpl.use("pgf")
## TeX preamble
preamble = """
\usepackage{fontspec}
\setmainfont{Linux Libertine O}
"""
params = {"text.usetex": True,
'pgf.texsystem': 'xelatex',
'pgf.preamble': preamble}
mpl.rcParams.update(params)
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
plt.xlabel(r'\textsc{Something in small caps}', fontsize=20)
plt.ylabel(r'Normal text ...', fontsize=20)
plt.savefig('test.pdf')
Running this code produces the following warning:
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/backends/backend_pgf.py:51: UserWarning: error getting fonts from fc-list
warnings.warn('error getting fonts from fc-list', UserWarning)
An output file is created, but the font is wrong (not Linux Libertine), even though I have the font installed and am able to use it with XeLaTeX (I can produce a PDF file with xelatex that is set in Linux Libertine).
Any help would be really appreciated.

There are a few problems with your code:
You need to give LaTeX control over the fonts by setting the option 'pgf.rcfonts': False.
You should also use unicode for XeLaTeX: 'text.latex.unicode': True.
'pgf.preamble' expects a Python list of single LaTeX commands.
If you set the font to 'Linux Libertine O' you probably want serif fonts, so set 'font.family': 'serif'.
Beware of escape sequences in the preamble; you should make the entries raw strings.
Add a coding declaration at the beginning of the file and make sure the file is saved as utf-8.
Using this, your code becomes:
# -*- coding:utf-8 -*-
import matplotlib as mpl
mpl.use("pgf")
## TeX preamble
preamble = [
r'\usepackage{fontspec}',
r'\setmainfont{Linux Libertine O}',
]
params = {
'font.family': 'serif',
'text.usetex': True,
'text.latex.unicode': True,
'pgf.rcfonts': False,
'pgf.texsystem': 'xelatex',
'pgf.preamble': preamble,
}
mpl.rcParams.update(params)
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
plt.xlabel(r'\textsc{Something in small caps}', fontsize=20)
plt.ylabel(r'Normal text ...', fontsize=20)
plt.savefig('test.pdf')
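The point about escape sequences is easy to check in isolation. The sketch below is not specific to matplotlib, and the label text is only an illustration. It shows that without the r prefix Python turns the \t at the start of \textsc into a TAB character, so LaTeX never sees the command:

```python
# Without the r prefix, "\t" is parsed as a TAB, so the backslash
# that LaTeX needs never reaches it.
plain = '\textsc{Something in small caps}'
raw = r'\textsc{Something in small caps}'

print(repr(plain))
print(repr(raw))
print(plain[0] == '\t')   # the first character is a TAB
print(raw[0] == '\\')     # the first character is a backslash
```

The same applies to every string that ends up in 'pgf.preamble' or in a label, which is why raw strings are the safer default.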

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12

I'm developing a chatbot with the chatterbot library. The chatbot is in my native language, Slovene, which has a lot of strange characters (for example: š, č, ž). I'm using Python 2.7.
When I try to train the bot, the library has trouble with the characters mentioned above. For example, when I run the following code:
chatBot.set_trainer(ListTrainer)
chatBot.train([
"Koliko imam še dopusta?",
"Letos imate še 19 dni dopusta.",
])
it throws the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12: invalid start byte
I added the # -*- coding: utf-8 -*- line to the top of my file, changed the encoding of all the files involved to utf-8 in my editor (Sublime Text 3), and changed the system default encoding with the following code:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
The strings are of type unicode.
When I try to get a response containing these strange characters, it works without any issues. For example, running the following code in the same execution as the training code above (with 'š' changed to 's' and 'č' changed to 'c' in the training strings) throws no errors:
chatBot.set_trainer(ListTrainer)
chatBot.train([
"Koliko imam se dopusta?",
"Letos imate se 19 dni dopusta.",
])
chatBot.get_response("Koliko imam še dopusta?")
I can't find a solution to this issue. Any suggestions?
Thanks loads in advance. :)
EDIT: I used from __future__ import unicode_literals, to make strings of type unicode. I also checked if they really were unicode with the method type(myString)
I would also like to paste this link.
EDIT 2: @MallikarjunaraoKosuri's code works, but in my case I had one more thing inside the chatbot instance initialization, which is the following:
chatBot = ChatBot(
'Test',
trainer='chatterbot.trainers.ListTrainer',
storage_adapter='chatterbot.storage.JsonFileStorageAdapter'
)
This is the cause of my error. The JSON storage file that the chatbot creates is written in my local encoding and not in utf-8. It seems the default storage (.sqlite3) doesn't have this issue, so for now I'll just avoid the JSON storage. But I am still interested in finding a solution to this error.
The strings from your example are not of type unicode.
Otherwise Python would not throw the UnicodeDecodeError.
This type of error says that at a certain step of the program's execution Python tries to decode a byte string into unicode but for some reason fails.
In your case the reason is that:
decoding is configured to use utf-8
your source file is not in utf-8 and is almost certainly in cp1252:
import unicodedata
b = '\x9a'
# u = b.decode('utf-8') # UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a
# in position 0: invalid start byte
u = b.decode('cp1252')
print unicodedata.name(u) # LATIN SMALL LETTER S WITH CARON
print u # š
So, the 0x9a byte from your cp1252 source can't be decoded with utf-8.
The best solution is to do nothing except convert your source file to utf-8.
With Sublime Text 3 you can easily do it with: File -> Reopen with Encoding -> UTF-8.
But don't forget to Ctrl+C your source code before the conversion, because immediately afterwards all your š, č, ž characters will be replaced with ?.
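The conversion can also be done from Python instead of the editor. This is a minimal sketch under the assumption that the file really is cp1252. The helper name convert_to_utf8 and its parameters are hypothetical, not part of chatterbot. As a sanity check on the diagnosis, the š in "Koliko imam še dopusta?" sits at index 12, which matches the position in the error message.

```python
import io


def convert_to_utf8(path, source_encoding='cp1252'):
    """Rewrite the file at `path` as UTF-8.

    Assumes the file is currently encoded as `source_encoding`
    (cp1252 here is a guess based on the 0x9a byte).
    """
    # newline='' keeps the original line endings untouched.
    with io.open(path, 'r', encoding=source_encoding, newline='') as handle:
        text = handle.read()
    with io.open(path, 'w', encoding='utf-8', newline='') as handle:
        handle.write(text)
```

Keep a backup of the file first: if the guessed source encoding is wrong, the rewritten file will contain the wrong characters.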
Others have already suggested good partial solutions; I would like to combine them all into one.
The author, @gunthercox, describes some guidelines here: http://chatterbot.readthedocs.io/en/stable/encoding.html#how-do-i-fix-python-encoding-errors
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
# Create a new chat bot named Test
chatBot = ChatBot(
'Test',
trainer='chatterbot.trainers.ListTrainer'
)
chatBot.train([
"Koliko imam še dopusta?",
"Letos imate še 19 dni dopusta.",
])
Python Terminal
>>> # -*- coding: utf-8 -*-
... from chatterbot import ChatBot
>>>
>>> # Create a new chat bot named Test
... chatBot = ChatBot(
... 'Test',
... trainer='chatterbot.trainers.ListTrainer'
... )
>>>
>>> chatBot.train([
... "Koliko imam še dopusta?",
... "Letos imate še 19 dni dopusta.",
... ])
List Trainer: [####################] 100%
>>>

Displaying PNG in matplotlib.pyplot framework in python 2.7

I am pulling PNG images from Jupyter Notebooks and can display them with IPython.display.Image but not with matplotlib.pyplot. What am I missing? I use Python 2.7.
I am using the following algorithm:
To open the notebook JSON content I do:
import nbformat
notebook_ = nbformat.read(file_notebook, 4)
After retrieving the relevant cell information I pull the png information from it using:
def cell_to_image(cell, out_value_item_number=1):
    if "execution_count" in cell.keys(): # i.e version >=4
        return cell["outputs"][out_value_item_number]['data']['image/png']
    elif "prompt_number" in cell.keys(): # i.e version < 4
        return cell["outputs"][out_value_item_number]['png']
    return None
cell_image = cell_to_image(cell)
The first few characters of cell_image (which is unicode) look like:
iVBORw0KGgoAAAANSUhEUgAAA64AAAFMCAYAAADLFeHSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n
AAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xd8jef/x/HXyTjZiYQkCGrU3ruR0tr9oq2qGtGo0dbe
\nm5pVlJpFUSMoVb6UoEZ/lCpatWuPUiNEEiMDmef3R75OexonJKUO3s/HI4/mXPd1X/d1f+LRR965
\n7/u6DSaTyYSIiIiIiIiIjbJ70hMQERERERERyYiCq4iIiIiIiNg0BVcRERERERGxaQquIiIiIiIi
\nYtMUXEVERERERMSmKbiKiIiIiIiITVNwFRGRxyIkJIRixYqxfv36+24/e/YsxYoVo3jx4v/yzGxb
\naGgoderUIS4uDoBdu3bRsmVLKlasyCuvvMKgQYOIjo622CcsLIyGDRtSunRp6tSpw8KFC62OW7p0
\naRo2bJju53Lnzh1GjRrFyy+/TNmyZWnRogW//fbbQ835q6++olGjRpQvX5769eszc+ZMkpOTzdtT
\nU1OZNGkSNWrUoHTp0jRp0oTdu3enGyc2NpZOn
I can easily plot it in my Jupyter notebook using
from IPython.display import Image
Image(cell_image)
And now to my question:
How can I manipulate cell_image to be plt.subplot friendly?
(Assuming import matplotlib.pyplot as plt).
I realise that plt.imshow wouldn't work because this would require an array, which is not my case (which is a string, as far as I understand).
If you have your image string representation in a variable string_rep, the following code should work.
from io import BytesIO
import matplotlib.image as mpimage
import matplotlib.pyplot as plt
with BytesIO(string_rep.decode('base64')) as byte_rep:
    image = mpimage.imread(byte_rep)
plt.imshow(image)
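Note that str.decode('base64') only exists on Python 2. A variant built on the standard base64 module should behave the same way and also run on Python 3. This is a sketch: the helper names decode_png and show_png are hypothetical. It relies on the fact that every PNG file starts with the same eight-byte signature, which is why the data in the question begins with iVBORw0KGgo.

```python
import base64
from io import BytesIO

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def decode_png(string_rep):
    """Turn the base64 text from a notebook cell into raw PNG bytes."""
    # b64decode skips characters outside the base64 alphabet,
    # so the embedded newlines do not need to be stripped first.
    raw = base64.b64decode(string_rep)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError('decoded data does not start with a PNG signature')
    return raw


def show_png(string_rep, ax=None):
    """Draw the decoded image on `ax`, or on the current axes."""
    # Imported here so decode_png can be used without matplotlib installed.
    import matplotlib.image as mpimage
    import matplotlib.pyplot as plt
    image = mpimage.imread(BytesIO(decode_png(string_rep)), format='png')
    target = ax if ax is not None else plt.gca()
    target.imshow(image)
    return target
```

To place the image in a grid, pass an axes object, for example show_png(cell_image, ax=plt.subplot(1, 2, 1)).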

Os.walk - WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect:

I am new to Python and looking for some help with a problem I am having with os.walk. I have had a good look around and cannot find the right solution.
What the code does:
Scans a user-selected hard drive or folder and returns all the filenames, subdirectories and sizes. This is then manipulated in pandas (not in the code below) and exported to an Excel spreadsheet in the format I want.
However, in the first part of the code, in Python 2.7, I am currently experiencing the below error:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'E:\03. Work\Bre\Files\folder2\icons greyscale flatten\._Icon_18?10 Stainless Steel.psd'
I have tried using a raw string (r'') but to no avail. Perhaps I am writing it wrong.
I will note that I never get this in 3.5 or on cleanly named folders. Due to problems with pandas and PyInstaller on 3.5, I am hoping to stick with 2.7 until those problems are resolved.
import pandas as pd
import xlsxwriter
import os
from io import StringIO
#Lists for Pandas Dataframes
fpath = []
fname = []
fext = []
sizec = []
# START #Select file directory to scan
filed = raw_input("\nSelect a directory to scan: ")
#Scan the Hard-Drive and add to lists for Pandas DataFrames
print "\nGetting details..."
for root, dirs, files in os.walk(filed):
    for filename in files:
        f = os.path.abspath(root) #File path
        fpath.append(f)
        fname.append(filename) #File name
        s = os.path.splitext(filename)[1] #File extension
        s = str(s)
        fext.append(s)
        p = os.path.join(root, filename) #File size
        si = os.stat(p).st_size
        sizec.append(si)
print "\nDone!"
Any help would be greatly appreciated :)
In order to traverse filenames with unicode characters, you need to give os.walk a unicode path name.
Your path contains a unicode character, which is being displayed as ? in the exception.
If you pass in a unicode path, for example os.walk(unicode(filed)), you should not get that exception.
As noted in "Convert python filenames to unicode", you will sometimes get a byte string back if the path is "undecodable" by Python 2.
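Applied to the scanning loop from the question, the idea might look like the sketch below. The names to_text and scan are hypothetical. Decoding with the filesystem encoding is an assumption; depending on where the input comes from, the console encoding may be the better choice.

```python
import os
import sys


def to_text(path):
    """Return `path` as a text (unicode) string.

    On Python 2, raw_input() returns a byte string, which is decoded
    here. Text strings are passed through unchanged.
    """
    if isinstance(path, bytes):
        return path.decode(sys.getfilesystemencoding() or 'utf-8')
    return path


def scan(top):
    """Collect (directory, filename, extension, size) for every file."""
    rows = []
    # A unicode `top` makes os.walk yield unicode names as well.
    for root, dirs, files in os.walk(to_text(top)):
        for filename in files:
            full_path = os.path.join(root, filename)
            extension = os.path.splitext(filename)[1]
            size = os.stat(full_path).st_size
            rows.append((os.path.abspath(root), filename, extension, size))
    return rows
```

Collecting tuples in one list instead of four parallel lists also keeps each file's fields together, which makes it straightforward to hand the rows to pandas later.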

Python Scikit-learn CountVectorizer throwing ValueError: empty vocabulary

I'm trying to extract features from a text document. Here is my code:
import sklearn
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer
files = sklearn.datasets.load_files('/home/niyas/Documents/project/container', shuffle = False)
vectorizer = CountVectorizer(min_df=1)
X = vectorizer.fit_transform(files.data[1])
Y=vectorizer.get_feature_names()
I'm getting an error "ValueError: empty vocabulary; perhaps the documents only contain stop words". The code works fine when I pass a string with the exact same content of the text doc.
Help me. Thanks in advance.

How to resolve UserWarning: findfont: Could not match :family=Bitstream Vera Sans

Following this example:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
for i, label in enumerate(('A', 'B', 'C', 'D')):
    ax = fig.add_subplot(2,2,i+1)
    ax.text(0.05, 0.95, label, transform=ax.transAxes,
            fontsize=16, fontweight='bold', va='top')
plt.show()
I get this output:
Why are my labels normal weight, while the documentation shows this should create bold letters A, B, C, D?
I also get this warning:
Warning (from warnings module):
File "C:\Python27\lib\site-packages\matplotlib\font_manager.py", line 1228
UserWarning)
UserWarning: findfont: Could not match :family=Bitstream Vera Sans:style=italic:variant=normal:weight=bold:stretch=normal:size=x-small. Returning C:\Python27\lib\site-packages\matplotlib\mpl-data\fonts\ttf\Vera.ttf
OP Resolution
From a deleted answer posted by the OP on Sep 15, 2013
Ok, it was a problem with the installation of matplotlib
Try using weight instead of fontweight.
Maybe try using this -
plt.rcParams['axes.labelsize'] = 16
plt.rcParams['axes.labelweight'] = 'bold'
Do this at a global level in your program.
The example from your question works on my machine, so you definitely have a library problem. Have you considered using LaTeX to make bold text? Here is an example:
Code
import numpy as np
import matplotlib.pyplot as plt
fig, axs = plt.subplots(3, 1)
ax0, ax1, ax2 = axs
ax0.text(0.05, 0.95, 'example from question',
         transform=ax0.transAxes, fontsize=16, fontweight='bold', va='top')
ax1.text(0.05, 0.8, 'you can try \\textbf{this} using \\LaTeX', usetex=True,
         transform=ax1.transAxes, fontsize=16, va='top')
ax2.text(0.05, 0.95,
         'or $\\bf{this}$ (latex math mode with things like '
         '$x_\\mathrm{test}^2$)',
         transform=ax2.transAxes, fontsize=10, va='top')
plt.show()
Not sure if you're still having the issue. I tried your code in Anaconda/Spyder, Python 2.7. The plots appear with Bold labels (A,B,C,D). I agree the issue is probably with the library. Try replacing / updating font_manager.py or confirming font files are present:
Lib\site-packages\matplotlib\mpl-data\fonts\ttf\
I had the same problem and spent quite a few hours on it today. Here's the solution that helped me:
import matplotlib.font_manager
matplotlib.font_manager._rebuild()
With this, the font_manager's font cache is rebuilt.