How to store the triples in 4store - django

File "<console>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/django_gstudio-0.3.dev-py2.7.egg/gstudio/testing1.py", line 129, in rdf_description
store.add(self,(subject, predicate, object),context)
File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.0-py2.7.egg/rdflib/plugins/memory.py", line 298, in add
Store.add(self, triple, context, quoted)
File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.0-py2.7.egg/rdflib/store.py", line 177, in add
def add(self, (subject, predicate, object), context, quoted=False):
in
store.add(self, (subject, predicate, object), context, quoted=False)

AFAIK, rdflib does not support 4store. But you can easily assert the triples using curl and Python against the 4store SPARQL server. Here is an example:
import subprocess

command = ["curl", "-s",
           "-T", "/some/file/with/triples",
           "-H", "Content-Type: application/x-turtle",
           "http://localhost:port/data/http://graph.to/save/triples"]
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate()
ret = p.poll()
if ret != 0:
    raise Exception("Error asserting triples")
In this example the content type is turtle but you can use any of the other RDF serializations (ntriples, rdfxml).
If you do not want to deal with subprocesses you can also translate this call into a urllib/urllib2 function.
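For instance, a minimal urllib2 sketch of the same request (the file path and endpoint URL are placeholders, as in the curl example; note that curl -T uploads via an HTTP PUT):

import urllib2

with open('/some/file/with/triples') as f:
    data = f.read()

request = urllib2.Request('http://localhost:port/data/http://graph.to/save/triples', data)
request.add_header('Content-Type', 'application/x-turtle')
request.get_method = lambda: 'PUT'  # mimic curl -T, which issues a PUT
response = urllib2.urlopen(request)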
There are more examples in the 4store SparqlServer documentation. And, optionally, you can use any of the Python 4store client libraries.

Related

Python RPy2 function with multiple input arguments

I am looking to call an rPy2 function with multiple input parameters. The R function I am trying to use is write.csv, and I need to specify more than one of its parameters.
If I use it without the optional parameters row.names and column.names, it works like this:
r("write.csv")(d,file='myfilename.csv')
For my requirements, I must issue this command with the optional parameters row.names and column.names. So, I tried:
r('write.csv')(d, file='myfilename.csv', row.names=FALSE, column.names=FALSE)
but I got this error message:
File "/home/UserName/test.py", line 12
r("write.csv")(d,file='myfilename.csv',row.names=FALSE, column.names=FALSE)
SyntaxError: keyword can't be an expression
[Finished in 0.0s with exit code 1]
[shell_cmd: python -u "/home/UserName/test.py"]
[dir: /home/UserName]
[path: /home/UserName/bin:/home/UserName/.local/bin:/usr/local/sbin:/usr/local/bin:
.../usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]
How can I achieve write.csv with row.names=FALSE and column.names=FALSE, in rPy2?
You can use Python's **.
See the note here: http://rpy2.readthedocs.io/en/version_2.8.x/robjects_functions.html#callable
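For example, a minimal sketch (assuming d is the data frame from the question): R argument names containing dots are not valid Python keywords, so pass them in a dict and unpack it with **.

from rpy2.robjects import r

# 'row.names' cannot be written as a literal keyword argument,
# but it can be passed through ** unpacking:
r('write.csv')(d, **{'file': 'myfilename.csv', 'row.names': False})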
One of my mistakes was that I should have replaced . with _, as shown in the docs here:
from rpy2.robjects.packages import importr
base = importr('base')
base.rank(0, na_last = True)
so I would analogously need row_names = False. However, the . in write.csv() itself still remained, so this only solved part of the question. So I tried a few things to get an answer:
Generating sample data:
from rpy2.robjects import r, globalenv
from rpy2.robjects import IntVector, DataFrame
d = {'a': IntVector((1,2,3)), 'b': IntVector((4,5,6))}
dataf = DataFrame(d)
Attempts follow - 1. did not work, 2. and 3. did work:
1.
r('write_csv')(x=dataf,file='testing.csv',row_names=False)
Traceback (most recent call last):
  File "C:\Users\UserName\FileD\test.py", line 18, in <module>
    r('write_csv')(x=dataf,file='testing.csv',row_names=False)
  File "C:\Python27\lib\site-packages\rpy2\robjects\__init__.py", line 321, in __call__
    res = self.eval(p)
  File "C:\Python27\lib\site-packages\rpy2\robjects\functions.py", line 178, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "C:\Python27\lib\site-packages\rpy2\robjects\functions.py", line 106, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in eval(expr, envir, enclos) : object 'write_csv' not found
Error in eval(expr, envir, enclos) : object 'write_csv' not found
2.
r('''
write_csv <- function(x, verbose=FALSE)
    write.csv(x, file='testing.csv', row.names=FALSE)
''')
r['write_csv'](dataf)
3.
globalenv['dataf'] = dataf
r("write.csv(dataf,file='testing2.csv',row.names=FALSE)")
I was really hoping attempt 1 would have worked. It seemed I had reproduced the example in the docs, base.rank(0, na_last = True), but the two cases differ: the . to _ translation is performed by rpy2 when it exposes R functions as Python objects (as importr does), while the string passed to r() is evaluated as R code. So r('write_csv') asks R for an object literally named write_csv, which does not exist until you define one, as in attempt 2.

Python csv writer "AttributeError: __exit__" issue

I have three variables I want to write in a tab delimited .csv, appending values each time the script iterates over a key value from the dictionary.
Currently the script calls a command, regexes the stdout (as out), and then assigns the three defined regex groups to individual variables for writing to the .csv, labeled first, second and third. I get an AttributeError: __exit__ error when I run the script below.
Note: I've read up on csv.writer and I'm still confused as to whether I can actually write multiple variables to a row.
Thanks for any help you can provide.
import csv, re, subprocess

for k in myDict:
    run_command = "".join(["./aCommand", " -r data -p ", str(k)])
    process = subprocess.Popen(run_command,
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    errcode = process.returncode
    pattern = re.compile('lastwrite|(\d{2}:\d{2}:\d{2})|alert|trust|Value')
    grouping = re.compile('(?P<first>.+?)(\n)(?P<second>.+?)([\n]{2})(?P<rest>.+[\n])',
                          re.MULTILINE | re.DOTALL)
    if pattern.findall(out):
        match = re.search(grouping, out)
        first = match.group('first')
        second = match.group('second')
        rest = match.group('rest')
        with csv.writer(open(FILE, 'a')) as f:
            writer = csv.writer(f, delimiter='\t')
            writer.writerow(first, second, rest)
Edit: I was asked in the comments to post the entire traceback. Note that the line numbers in the traceback will not match the code above, as this is not the entire script.
Traceback (most recent call last):
  File "/mydir/pyrr.py", line 60, in <module>
    run_rip()
  File "/mydir/pyrr.py", line 55, in run_rip
    with csv.writer(open('/mydir/ntuser.csv', 'a')) as f:
AttributeError: __exit__
Answer: Using the comment below, I was able to write it as follows.
f = csv.writer(open('/mydir/ntuser.csv', 'a'),
               dialect=csv.excel,
               delimiter='\t')
f.writerow((first, second, rest))
The error is pretty clear. The with statement takes a context manager, i.e., an object with an __enter__ and an __exit__ method, such as the object returned by open. csv.writer does not provide such an object. You are also attempting to create the writer twice:
with open(FILE, 'a') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow((first, second, rest))
The with open(...) as f: statement is like a try...finally that guarantees f is closed no matter what happens, except you don't have to type it out. open(...) returns a context manager whose __exit__ method is called in that finally block you don't have to type; that is what your exception was complaining about. open returns an object that has __exit__ properly defined and can therefore handle both normal exit and exceptions raised in the with block. csv.writer does not return such an object, so you can't use it in the with statement itself. You have to create the writer inside the with block, as shown above.
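To illustrate the protocol, here is a minimal, hypothetical context manager (not from the original answer); anything you put in a with statement must look roughly like this:

class Managed(object):
    def __enter__(self):
        # acquire the resource; the return value is what "as" binds
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        # always called, whether the block exited normally or raised
        return False  # returning False lets any exception propagate

with Managed() as m:
    pass  # legal, because Managed defines both __enter__ and __exit__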

scipy.integrate.quad ctypes function error "quadpack.error: quad: first argument is a ctypes function pointer with incorrect signature"

I am trying to use scipy.integrate.nquad with a ctypes function. I followed the instructions in Faster integration using Ctypes exactly.
ctypes integration can be done in a few simple steps:
Write an integrand function in C with the function signature double f(int n, double args[n]), where args is an array containing the arguments of the function f.
//testlib.c
double f(int n, double args[n])
{
    return args[0] - args[1] * args[2]; //corresponds to x0 - x1 * x2
}
Now compile this file to a shared/dynamic library (a quick search will help with this as it is OS-dependent). The user must link any math libraries, etc. used. On Linux this looks like:
$ gcc -shared -o testlib.so -fPIC testlib.c
The output library will be referred to as testlib.so, but it may have a different file extension. A library has now been created that can be loaded into Python with ctypes.
Load shared library into Python using ctypes and set restypes and argtypes - this allows Scipy to interpret the function
correctly:
>>> import ctypes
>>> from scipy import integrate
>>> lib = ctypes.CDLL('/**/testlib.so') # Use absolute path to testlib
>>> func = lib.f # Assign specific function to name func (for simplicity)
>>> func.restype = ctypes.c_double
>>> func.argtypes = (ctypes.c_int, ctypes.c_double)
Note that the argtypes will always be (ctypes.c_int, ctypes.c_double) regardless of the number of parameters, and restype will always be ctypes.c_double.
Now integrate the library function as normally, here using nquad:
>>> integrate.nquad(func, [[0,10],[-10,0],[-1,1]])
(1000.0, 1.1102230246251565e-11)
However, at the final step, I didn't get the result of the integral, but the following errors instead:
>>> integrate.nquad(func,[[0,1.0],[-2.0,3.0],[1.0,2.0]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 618, in nquad
    return _NQuad(func, ranges, opts).integrate(*args)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 670, in integrate
    value, abserr = quad(f, low, high, args=args, **opt)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 254, in quad
    retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 319, in _quad
    return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 670, in integrate
    value, abserr = quad(f, low, high, args=args, **opt)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 254, in quad
    retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 319, in _quad
    return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 670, in integrate
    value, abserr = quad(f, low, high, args=args, **opt)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 254, in quad
    retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points)
  File "/home/bfn097/apps/scipy/0.13.3_mkl-11.1.2_gcc-4.4.7/lib64/python/scipy/integrate/quadpack.py", line 319, in _quad
    return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
quadpack.error: quad: first argument is a ctypes function pointer with incorrect signature
I am using gcc-4.4.7, python 2.6.6, numpy-1.7.1, scipy-0.13.3.
The type of the second argument should be ctypes.POINTER(ctypes.c_double), not ctypes.c_double.
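Spelled out against step 2 above (only the argtypes line changes):

func.restype = ctypes.c_double
func.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_double))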
But have you considered using cffi? Besides being faster than ctypes, you also don't have to hand-write the argument stuff, just copy the C declarations and let cffi parse them.
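For illustration, a minimal cffi sketch (the library path is a placeholder, and whether your SciPy version accepts cffi callables for integration depends on the release):

import cffi

ffi = cffi.FFI()
ffi.cdef("double f(int n, double *args);")  # paste the C declaration; cffi parses it
lib = ffi.dlopen('/absolute/path/to/testlib.so')
# lib.f is now callable from Python with the declared signature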

Different behaviour for io in pickle with string content

When working with pickled data I encountered different behavior between io.open and __builtin__.open. Consider the following simple example:
import pickle
payload = 'foo'
fn = 'test.pickle'
pickle.dump(payload, open(fn, 'w'))
a = pickle.load(open(fn, 'r'))
This works as expected. But running this code here:
import pickle
import io
payload = 'foo'
fn = 'test.pickle'
pickle.dump(payload, io.open(fn, 'w'))
a = pickle.load(io.open(fn, 'r'))
gives the following Traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
    execfile(filename, namespace)
  File "D:/**.py", line 15, in <module>
    pickle.dump(payload, io.open(fn, 'w'))
  File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 1370, in dump
    Pickler(file, protocol).dump(obj)
  File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 488, in save_string
    self.write(STRING + repr(obj) + '\n')
TypeError: must be unicode, not str
As I want to be future-compatible, how can I circumvent this misbehavior? Or, what else am I doing wrong here?
I stumbled over this when dumping dictionaries with keys of type string.
My python version is:
'2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]'
The difference is not surprising, because io.open() explicitly deals with Unicode strings when using text mode. The documentation is quite clear about this:
Note: Since this module has been designed primarily for Python 3.x, you have to be aware that all uses of “bytes” in this document refer to the str type (of which bytes is an alias), and all uses of “text” refer to the unicode type. Furthermore, those two types are not interchangeable in the io APIs.
and
Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as unicode strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
You need to open files in binary mode. The fact that it worked with the built-in open() at all is actually more luck than wisdom; if your pickles contained data with \n and/or \r bytes, loading the pickle may well fail. The default Python 2 pickle protocol happens to be text-based, but the output should still be treated as binary.
In all cases, when writing pickle data, use binary mode:
pickle.dump(payload, open(fn, 'wb'))
a = pickle.load(open(fn, 'rb'))
or
pickle.dump(payload, io.open(fn, 'wb'))
a = pickle.load(io.open(fn, 'rb'))
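As a side note (not part of the original answer), wrapping these calls in with statements also guarantees the files are closed promptly:

with io.open(fn, 'wb') as f:
    pickle.dump(payload, f)
with io.open(fn, 'rb') as f:
    a = pickle.load(f)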

How to use list of strings as training data for svm using scikit.learn?

I am using scikit-learn to train an SVM based on data where each observation (X) is a list of words. The tags for each observation (Y) are floating point values. I have tried following the example given in the scikit-learn documentation (http://scikit-learn.org/stable/modules/svm.html) for multi-class classification.
Here is my code:
from __future__ import division
from sklearn import svm
import os.path
import numpy
import re

'''
The stanford-postagger was included to see how it tags the words and to see
if it would help in getting just the names of the ingredients. Turns out it's pointless.
'''
#from nltk.tag.stanford import POSTagger

mainDirectory = './nyu/PROJECTS/Epicurious/DATA/ingredients'
#st = POSTagger('/usr/share/stanford-postagger/models/english-bidirectional-distsim.tagger','/usr/share/stanford-postagger/stanford-postagger.jar')

'''
This is where we read each line of the file and then run a regex match on it to get
all the words before the first tab. (These are the names of the ingredients. Some of
them may have adjectives like fresh, peeled, cut etc. Not sure what to do about them yet.)
'''
def getFileDetails(_filename, _fileDescriptor):
    rankingRegexMatch = re.match('([0-9](?:\_)[0-9]?)', _filename)
    if len(rankingRegexMatch.group(0)) == 2:
        ranking = float(rankingRegexMatch.group(0)[0])
    else:
        ranking = float(rankingRegexMatch.group(0)[0] + '.' + rankingRegexMatch.group(0)[2])
    _keywords = []
    for line in _fileDescriptor:
        m = re.match('(\w+\s*\w*)(?=\t[0-9])', line)
        if m:
            _keywords.append(m.group(0))
    return [_keywords, ranking]

'''
Open each file in the directory and pass the name and file descriptor to getFileDetails
'''
def this_is_it(files):
    _allKeywords = []
    _allRankings = []
    for eachFile in files:
        fullFilePath = mainDirectory + '/' + eachFile
        f = open(fullFilePath)
        XandYForThisFile = getFileDetails(eachFile, f)
        _allKeywords.append(XandYForThisFile[0])
        _allRankings.append(XandYForThisFile[1])
    #_allKeywords = numpy.array(_allKeywords, dtype=object)
    svm_learning(_allKeywords, _allRankings)

def svm_learning(x, y):
    clf = svm.SVC()
    clf.fit(x, y)

'''
This just prints the directory path and then calls the callback x on files
'''
def print_files(x, dir_path, files):
    print dir_path
    x(files)

'''
code starts here
'''
os.path.walk(mainDirectory, print_files, this_is_it)
When the svm_learning(x,y) method is called, it throws me an error:
Traceback (most recent call last):
  File "scan for files.py", line 72, in <module>
    os.path.walk(mainDirectory, print_files, this_is_it)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.py", line 238, in walk
    func(arg, top, names)
  File "scan for files.py", line 68, in print_files
    x(files)
  File "scan for files.py", line 56, in this_is_it
    svm_learning(_allKeywords,_allRankings)
  File "scan for files.py", line 62, in svm_learning
    clf.fit(x,y)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/svm/base.py", line 135, in fit
    X = atleast2d_or_csr(X, dtype=np.float64, order='C')
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/utils/validation.py", line 116, in atleast2d_or_csr
    "tocsr")
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/utils/validation.py", line 96, in _atleast2d_or_sparse
    X = array2d(X, dtype=dtype, order=order, copy=copy)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/utils/validation.py", line 80, in array2d
    X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
  File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_bbcfcf6_20130307-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 331, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
Can anyone help? I am new to scikit and could not find any help in the documentation.
You should take a look at Text feature extraction. You are going to want to use either a TfidfVectorizer, a CountVectorizer, or a HashingVectorizer (if your data is very large). These components take your text in and output feature matrices that are acceptable to classifiers. Be advised that they work on lists of strings, with one string per example, so if you have a list of lists of strings (i.e., you have already tokenized), you may need to either join() the tokens to get a list of strings or skip tokenization.
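A minimal sketch of that pipeline, reusing the question's names (_allKeywords and _allRankings are assumed to be built as above; SVC is kept from the question, though with floating-point tags a regressor such as svm.SVR may be the better fit):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import svm

# join each tokenized observation back into one string per example
docs = [' '.join(keywords) for keywords in _allKeywords]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse numeric feature matrix

clf = svm.SVC()
clf.fit(X, _allRankings)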