ImportError with nltk_train - python-2.7

I am trying to get my ways around with nltk-trainer (https://github.com/japerk/nltk-trainer). I managed to train Dutch taggers and chunkers with the commands (directly in Anaconda console):
python train_tagger.py conll2002 --fileids ned.train --classifier IIS --filename ~/nltk_data/taggers/conll2002_ned_IIS.pickle
python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename ~/nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle
Then I run a little script to test the tagger and the chunker:
import nltk
from nltk.corpus import conll2002
# Loading training pickles
tokenizer = nltk.data.load('tokenizers/punkt/dutch.pickle')
tagger = nltk.data.load('taggers/conll2002_ned_IIS.pickle')
chunker = nltk.data.load('chunkers/conll2002_ned_NaiveBayes.pickle')
# Testing
test_sents = conll2002.tagged_sents(fileids="ned.testb")[0:1000]
print "tagger accuracy on test-set: " + str(tagger.evaluate(test_sents))
test_sents = conll2002.chunked_sents(fileids="ned.testb")[0:1000]
print "tagger accuracy on test-set: " + str(chunker.evaluate(test_sents))
This works fine from the nltk-trainer-master folder, but when I move the script elsewhere, I get an import error:
ImportError: No module named nltk_trainer.chunking.chunkers
How can I make this work outside the nltk-trainer-master folder, without copying the nltk_trainer folder?
(Python 2.7, nltk 3.2.1)

Related

Failed to find data source: delta in Python environment

Following: https://docs.delta.io/latest/quick-start.html#python
I have installed delta-spark and run:
from delta import *
builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
spark = spark = configure_spark_with_delta_pip(builder).getOrCreate()
However when I run:
data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")
the error states: delta not recognised
& if I run
DeltaTable.isDeltaTable(spark, "packages/tests/streaming/data")
It states: TypeError: 'JavaPackage' object is not callable
It seemed that I could run these commands locally (such as unit tests) without Maven or running it in a pyspark shell? It would be good to just see if I am missing a dependency?
You can just install delta-spark PyPi package using pip install delta-spark (it will pull pyspark as well), and then refer to it.
Or you can add a configuration option that will fetch Delta package. It's .config("spark.jars.packages", "io.delta:delta-core_2.12:<delta-version>"). For Spark 3.1 Delta versions is 1.0.0 (see releases mapping docs for more information).
I have an example of using Delta tables in unit tests (please note, that import statement is in the function definition because Delta package is loaded dynamically):
import pyspark
import pyspark.sql
import pytest
import shutil
from pyspark.sql import SparkSession
delta_dir_name = "/tmp/delta-table"
#pytest.fixture
def delta_setup(spark_session):
data = spark_session.range(0, 5)
data.write.format("delta").save(delta_dir_name)
yield data
shutil.rmtree(delta_dir_name, ignore_errors=True)
def test_delta(spark_session, delta_setup):
from delta.tables import DeltaTable
deltaTable = DeltaTable.forPath(spark_session, delta_dir_name)
hist = deltaTable.history()
assert hist.count() == 1
environment is initialized via pytest-spark:
[pytest]
filterwarnings =
ignore::DeprecationWarning
spark_options =
spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.jars.packages: io.delta:delta-core_2.12:1.0.0
spark.sql.catalogImplementation: in-memory

How to launch a Python Spyder session through script / shortcut?

I have this code to launch a Spyder IDE, in Anaconda 2, Python 2.7 :
from spyderlib import start_app
main1= start_app.main()
main1.load_session('/project27/_test01_.session.tar')
'''
from spyderlib.utils.iofuncs import load_session
load_session(filename+'.session.tar')
'''
Code method to load session is here: https://github.com/jromang/spyderlib/blob/master/spyderlib/spyder.py
#---- Sessions
def load_session(self, filename=None):
"""Load session"""
if filename is None:
self.redirect_internalshell_stdio(False)
filename, _selfilter = getopenfilename(self, _("Open session"),
getcwd(), _("Spyder sessions")+" (*.session.tar)")
self.redirect_internalshell_stdio(True)
if not filename:
return
if self.close():
self.next_session_name = filename
the 1st part comes from Anaconda Scripts where Spyder script.
It seems not working to load session.
Spyder sessions were removed in Spyder 3.0. Now the same functionality is provided by Projects (which also save the list of open files in the Editor), so please update to that version.
Besides, Spyder 3.1 will come with a new option called --project to load a project at startup (Spyder 3.1 will be released on January 17/2017).
For people still using only Spyder 2.0 (....), there is a small hack possible to create shortcut of session (SPyder session launched directly with shortcut).
Here, the code :
# -*- coding: utf-8 -*-
import sys, time, os
file_session= ''
if len(sys.argv) > 1 :
file_session= sys.argv[1]
print file_session
sys.argv= sys.argv[:1]
from spyderlib import start_app
if file_session != '' :
main1= start_app.main( file_session)
else :
main1= start_app.main()

python + wx & uno to fill libreoffice using ubuntu 14.04

I collected user data using a wx python gui and than I used uno to fill this data into an openoffice document under ubuntu 10.xx
user + my-script ( +empty document ) --> prefilled document
After upgrading to ubuntu 14.04 uno doesn't work with python 2.7 anymore and now we have libreoffice instead of openoffice in ubuntu. when I try to run my python2.7 code, it says:
ImportError: No module named uno
How could I bring it back to work?
what I tried:
installed https://pypi.python.org/pypi/unotools v0.3.3
sudo apt-get install libreoffice-script-provider-python
converted the code to python3 and got uno importable, but wx is not importable in python3 :-/
ImportError: No module named 'wx'
googled and read python3 only works with wx phoenix
so tried to install: http://wxpython.org/Phoenix/snapshot-builds/
but wasn't able to get it to run with python3
is there a way to get the uno bridge to work with py2.7 under ubuntu 14.04?
Or how to get wx to run with py3?
what else could I try?
Create a python macro in LibreOffice that will do the work of inserting the data into LibreOffice and then in your python 2.7 code envoke the macro.
As the macro is running from with LibreOffice it will use python3.
Here is an example of how to envoke a LibreOffice macro from the command line:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
##
# a python script to run a libreoffice python macro externally
# NOTE: for this to run start libreoffice in the following manner
# soffice "--accept=socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;" --writer --norestore
# OR
# nohup soffice "--accept=socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;" --writer --norestore &
#
import uno
from com.sun.star.connection import NoConnectException
from com.sun.star.uno import RuntimeException
from com.sun.star.uno import Exception
from com.sun.star.lang import IllegalArgumentException
def uno_directmacro(*args):
localContext = uno.getComponentContext()
localsmgr = localContext.ServiceManager
resolver = localsmgr.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext )
try:
ctx = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
except NoConnectException as e:
print ("LibreOffice is not running or not listening on the port given - ("+e.Message+")")
return
msp = ctx.getValueByName("/singletons/com.sun.star.script.provider.theMasterScriptProviderFactory")
sp = msp.createScriptProvider("")
scriptx = sp.getScript('vnd.sun.star.script:directmacro.py$directmacro?language=Python&location=user')
try:
scriptx.invoke((), (), ())
except IllegalArgumentException as e:
print ("The command given is invalid ( "+ e.Message+ ")")
return
except RuntimeException as e:
print("An unknown error occurred: " + e.Message)
return
except Exception as e:
print ("Script error ( "+ e.Message+ ")")
print(e)
return
return(None)
uno_directmacro()
And this is the corresponding macro code within LibreOffice called "directmacro.py" and stored in the User area for libreOffice macros (which would normally be $HOME/.config/libreoffice/4/user/Scripts/python :
#!/usr/bin/python
from com.sun.star.awt.MessageBoxButtons import BUTTONS_OK, BUTTONS_OK_CANCEL, BUTTONS_YES_NO, BUTTONS_YES_NO_CANCEL, BUTTONS_RETRY_CANCEL, BUTTONS_ABORT_IGNORE_RETRY
from com.sun.star.awt.MessageBoxButtons import DEFAULT_BUTTON_OK, DEFAULT_BUTTON_CANCEL, DEFAULT_BUTTON_RETRY, DEFAULT_BUTTON_YES, DEFAULT_BUTTON_NO, DEFAULT_BUTTON_IGNORE
from com.sun.star.awt.MessageBoxType import MESSAGEBOX, INFOBOX, WARNINGBOX, ERRORBOX, QUERYBOX
def directmacro(*args):
import socket, time
class FontSlant():
from com.sun.star.awt.FontSlant import (NONE, ITALIC,)
#get the doc from the scripting context which is made available to all scripts
desktop = XSCRIPTCONTEXT.getDesktop()
model = desktop.getCurrentComponent()
text = model.Text
tRange = text.End
cursor = desktop.getCurrentComponent().getCurrentController().getViewCursor()
doc = XSCRIPTCONTEXT.getDocument()
parentwindow = doc.CurrentController.Frame.ContainerWindow
# your cannot insert simple text and text into a table with the same method
# so we have to know if we are in a table or not.
# oTable and oCurCell will be null if we are not in a table
oTable = cursor.TextTable
oCurCell = cursor.Cell
insert_text = "This is text inserted into a LibreOffice Document\ndirectly from a macro called externally"
Text_Italic = FontSlant.ITALIC
Text_None = FontSlant.NONE
cursor.CharPosture=Text_Italic
if oCurCell == None: # Are we inserting into a table or not?
text.insertString(cursor, insert_text, 0)
else:
cell = oTable.getCellByName(oCurCell.CellName)
cell.insertString(cursor, insert_text, False)
cursor.CharPosture=Text_None
return None
You will of course need to adapt the code to either accept data as arguments, read it from a file or whatever.
Ideally I would say use python 3, because python 2 is becoming outdated. The switch requires quite a bit of new coding changes, but better sooner than later. So I tried:
sudo pip3 install -U --pre \
-f http://wxpython.org/Phoenix/snapshot-builds/ \
wxPython_Phoenix
However this gave me errors, and I didn't want to spend the next couple of days working through them. Probably the pre-release versions are not ready for prime time yet.
So instead, what I recommend is to switch to AOO for now. See https://stackoverflow.com/a/27980255/5100564 for instructions. AOO does not have all the latest features that LO has, but it is a good solid Office product.
Apparently it is also possible to rebuild LibreOffice with python 2 using this script: https://gist.github.com/hbrunn/6f4a007a6ff7f75c0f8b

pywin32 and pyttsx error, trouble combining the two

i have pywin32 in my site packages and my pyttsx is in a separate folder. Is this the reason why i am getting the following error?
import win32api, sys, os
ImportError: DLL load failed: The specified module could not be found
The code is as follows,
import pyttsx
def onStart(name):
print 'starting', name
def onWord(name, location, length):
print 'word', name, location, length
def onEnd(name, completed):
print 'finishing', name, completed
engine = pyttsx.init()
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
engine.say('The quick brown fox jumped over the lazy dog.')
engine.runAndWait()
from here, http://pyttsx.readthedocs.org/en/latest/engine.html#examples
My pywin32 is from here,
http://sourceforge.net/projects/pywin32/files/pywin32/Build%20219/
for Py 2.7
The problem was that the file
pywintypes27.dll
was not in the right directory. It had to be in
'C:\Windows\System32'
#CristiFati
Use the pyttsx3 module instead . It supports both python3 and python2.
To install:
pip install pyttsx3.
It automatically installs those win32 and other dependencies.

Downloading a file in Python 2.x vs 3.x

I'm working on a script that checks multiple domain names if they are registered or not. It loops through a list of the domains read from a file, and the registration check is done using enom's API. My problem is the code accessing the API in Python 2:
import urllib2
import xml.etree.ElementTree as ET
...
response = urllib2.urlopen(url)
content = ET.fromstring(response.read())
...
return content.find("RRPCode").text
generates the error: 'NoneType' object has no attribute 'text' in nearly 30% of the checks. While the Python 3 code:
import urllib.request
import xml.etree.ElementTree as ET
...
response = urllib.request.urlopen(url)
content = ET.fromstring(response.read())
...
return content.find("RRPCode").text
works fine. I should also mention that the number of errors returned are random and not related to specific domain names.
What could be the cause of these errors?
I am by the way using Python 2.7.3 and Python 3.2.3 on a VPS running Ubuntu 12.04 server.
Thanks.