Error Import Parquet in Windows - python-2.7

I'm trying to install Parquet file format to use it with Apache Spark. I learned that I had to install Thrift, ThriftPy, and Python-Snappy in order to fully install Parquet.
I install Thrift using the command
pip install thrift
Then I installed python-snappy manually through a wheel file found here. This was because I was unable to install python-snappy automatically. Anyways, python-snappy was successfully installed.
I also installed ThrifPy using the similar command
pip install ThriftPy
And finally, I used pip to install parquet which was successful. After installing, when I try to import parquet, it raises the error as
---------------------------------------------------------------------------
ThriftParserError Traceback (most recent call last)
<ipython-input-55-942008defa53> in <module>()
----> 1 import parquet
C:\anaconda2\lib\site-packages\parquet\__init__.py in <module>()
17 from thriftpy.protocol.compact import TCompactProtocolFactory
18
---> 19 from . import encoding
20 from . import schema
21 from .converted_types import convert_column
C:\anaconda2\lib\site-packages\parquet\encoding.py in <module>()
17
18 THRIFT_FILE = os.path.join(os.path.dirname(__file__), "parquet.thrift")
---> 19 parquet_thrift = thriftpy.load(THRIFT_FILE,
module_name=str("parquet_thrift")) # pylint: disable=invalid-name
20
21 logger = logging.getLogger("parquet") # pylint: disable=invalid-name
C:\anaconda2\lib\site-packages\thriftpy\parser\__init__.pyc in load(path,
module_name, include_dirs, include_dir)
28 real_module = bool(module_name)
29 thrift = parse(path, module_name, include_dirs=include_dirs,
---> 30 include_dir=include_dir)
31
32 if real_module:
C:\anaconda2\lib\site-packages\thriftpy\parser\parser.pyc in parse(path,
module_name, include_dirs, include_dir, lexer, parser, enable_cache)
494 raise ThriftParserError('ThriftPy does not support generating
module '
495 'with path in protocol \'{}\''.format(
--> 496 url_scheme))
497
498 if module_name is not None and not module_name.endswith('_thrift'):
ThriftParserError: ThriftPy does not support generating module with path in
protocol 'c'
Would someone tell me what I'm doing wrong ?
For reference, I'm using anaconda Python 2.7 on a Jupyter notebook. My OS is Windows 7 , and I'm using Spark on a single cluster.

Modify the parser code in the windows side. It is in site-packages of python.
For Example:- C:\Anaconda\python27\Lib\site-packages\thriftpy\parser\parser.py
Modify the 488 line:
#if url_scheme == '':
if len(url_scheme) <= 1:
Then try again.

Related

Error in reading html to data frame in Python “html5lib not found”

I've come accross the following error about html5lib when trying to read an html data frame.
Here is the code:
!pip install html5lib
!pip install lxml
!pip install beautifulSoup4
import html5lib
import lxml
from bs4 import BeautifulSoup
table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")
This is the error:
ImportError Traceback (most recent call last)
<ipython-input-68-e24654a0a301> in <module>()
----> 1 table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")
/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
913 thousands=thousands, attrs=attrs, encoding=encoding,
914 decimal=decimal, converters=converters, na_values=na_values,
--> 915 keep_default_na=keep_default_na)
/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
737 retained = None
738 for flav in flavor:
--> 739 parser = _parser_dispatch(flav)
740 p = parser(io, compiled_match, attrs, encoding)
741
/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parser_dispatch(flavor)
680 if flavor in ('bs4', 'html5lib'):
681 if not _HAS_HTML5LIB:
--> 682 raise ImportError("html5lib not found, please install it")
683 if not _HAS_BS4:
684 raise ImportError(
ImportError: html5lib not found, please install it
Any help would be much appreciated.
Thanks
If you read the error message, you don't have html5lib installed. Do:
pip install html5lib
in your terminal.
If you are calling from jupyter notebook (just like you did with !), try to restart the kernel in order to have the packages loaded.
I had this exact error show up while trying to read a saved .htm file using Spyder IDE.
This code displayed html5lib error:
import pandas as pd
df = pd.read_html("F:\xxxx\xxxxx\xxxxx\aaaa.htm")
I knew I had html5lib installed and working correctly because I had other scripts that worked.
For whatever reason, file path needed to be a string literal (putting an r in front of the file path).
This code works for me:
import pandas as pd
df = pd.read_html(r"F:\xxxx\xxxxx\xxxxx\aaaa.htm")
I ran into this error when I gave the wrong path to the local file I was trying to open. So also be sure that you're pointing to the right place!

Running Python script through command, Syntax Error

I use a Raspberry Pi 3, and I'm trying to run this Python script through the command terminal:
import RPi.GPIO as GPIO
from time import sleep
GPIO.setmode(GPIO.BOARD)
Motor1A = 16
Motor1B = 18
Motor1E = 22
GPIO.setup(Motor1A,GPIO.OUT)
GPIO.setup(Motor1B,GPIO.OUT)
GPIO.setup(Motor1E,GPIO.OUT)
print "Turning motor on"
GPIO.output(Motor1A,GPIO.HIGH)
GPIO.output(Motor1B,GPIO.LOW)
GPIO.output(Motor1E,GPIO.HIGH)
sleep(2)
print "Stopping motor"
GPIO.output(Motor1E,GPIO.LOW)
GPIO.cleanup()
The file is called "motor1.py". It is located in the "/home/pi" directory.
When I type sudo python motor1.py, the terminal responds like so:
pi#raspberrypi:~ $ sudo python motor1.py
File "motor.py", line 1
Python 2.7.9 (default, Sep 17 2016, 20:26:04)
^
SyntaxError: invalid syntax
Can anyone tell me why? Thank a lot.
Johnny

Python 2: Type error "only integer scalar arrays can be converted to a scalar index" using pd.read() with neo.Spike2IO

I have code to load in Spike2 .smr files and read them in Jupyter. My code was working fine 2 days ago and now, with absolutely no change on either the file that is loaded in or the code that loads it in, it is not working. The problem code is as follows...
Cell 1 Input (to show the versions of my packages):
import sys
print("Python version: {}\n\nPackages versions: ".format(sys.version))
# which package versions are installed?
import pip
all_packages = pip.get_installed_distributions()
used_packages = ["matplotlib", "neo", "numpy", "OpenElectrophy", "os", "pandas",
"pylab", "scipy"]
for entry in used_packages:
for p in all_packages:
if entry in str(p):
print(str(p))
Cell 1 Output:
Python version: 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Packages versions:
matplotlib 1.4.3
matplotlib-venn 0.11.3
neo 0.3.3
numpy 1.12.0
pycosat 0.6.1
nose 1.3.7
backports.ssl-match-hostname 3.5.0.1
pandas 0.19.2
scipy 0.15.1
Cell 2 Input (load in my modules):
import pylab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as st
import os
import tables
import neo
import scipy.signal as sg
from scipy import interpolate as inter
import h5py as h
import quantities as q
plt.style.use('ggplot')
pd.options.display.max_rows = 999
%matplotlib inline
Now, I load in the Spike2 .smr file with:
r = neo.Spike2IO("Rawdata/143-16/nerve.smr").read()[0]
and get the following type error:
TypeError Traceback (most recent call last)
<ipython-input-3-f81fd520a4c5> in <module>()
----> 1 r = neo.Spike2IO("Rawdata/143-16/nerve.smr").read()[0]
/home/wolverine/anaconda/lib/python2.7/site-packages/neo/io/baseio.pyc in read(self, lazy, cascade, **kargs)
107 if not cascade:
108 return bl
--> 109 seg = self.read_segment(lazy=lazy, cascade=cascade, **kargs)
110 bl.segments.append(seg)
111 create_many_to_one_relationship(bl)
/home/wolverine/anaconda/lib/python2.7/site-packages/neo/io/spike2io.pyc in read_segment(self, take_ideal_sampling_rate, lazy, cascade)
120 if channelHeader.kind in [1, 9]:
121 #~ print 'analogChanel'
--> 122 anaSigs = self.readOneChannelContinuous( fid, i, header, take_ideal_sampling_rate, lazy = lazy)
123 #~ print 'nb sigs', len(anaSigs) , ' sizes : ',
124 for anaSig in anaSigs :
/home/wolverine/anaconda/lib/python2.7/site-packages/neo/io/spike2io.pyc in readOneChannelContinuous(self, fid, channel_num, header, take_ideal_sampling_rate, lazy)
240
241 anaSigs = [ ]
--> 242 if channelHeader.unit in unit_convert:
243 unit = pq.Quantity(1, unit_convert[channelHeader.unit] )
244 else:
/home/wolverine/anaconda/lib/python2.7/site-packages/neo/io/spike2io.pyc in __getattr__(self, name)
444 else:
445 l = np.fromstring(self.array[name][0], 'u1')
--> 446 return self.array[name][1:l+1]
447 else:
448 return self.array[name]
TypeError: only integer scalar arrays can be converted to a scalar index
The "neo.Spike2IO("filename.smr") works fine, but as soon as I add the "read()[0]" part, that is when I get the TypeError. I read up on this type error and the only answers I saw were that the file could be corrupt. I deleted my local file and re-downloaded it and also downloaded another similar file just in case the master file for the other one was corrupt. I retried my code on these two new files and received the Type Error code for both. As stated before, the code was working flawlessly just two days ago and now it won't load any .smr file. I went through and updated all of my modules and pip and anaconda, all of this did not help.
Here is a link to a short sample .smr file (only 3.1 MB) that I cut for sharing purposes. It also gives the Type Error. Any ideas? Thank you.
I solved this issue by further updating my modules and Anaconda itself (and all of its respective modules). Something must have reverted to an older version.
The code to update every package in Anaconda is:
conda update --all
Further help can be found here at the Conda homepage. Shutting down, then restarting your computer can also help to ensure that all of these updates are implemented.

Python 2.7 pickle won't recognize numpy multiarray

I need to load a set of pickled data from a collaborator. Problem is, it seems I need multiarray for this. My code is as below:
f = open('data.p', 'rb')
a = pickle.load(f)
And here is the error message.
ImportError Traceback (most recent call last)
<ipython-input-3-17918c47ae2d> in <module>()
----> 1 a = pk.load(f)
/usr/lib/python2.7/pickle.pyc in load(file)
1382
1383 def load(file):
-> 1384 return Unpickler(file).load()
1385
1386 def loads(str):
/usr/lib/python2.7/pickle.pyc in load(self)
862 while 1:
863 key = read(1)
--> 864 dispatch[key](self)
865 except _Stop, stopinst:
866 return stopinst.value
/usr/lib/python2.7/pickle.pyc in load_global(self)
1094 module = self.readline()[:-1]
1095 name = self.readline()[:-1]
-> 1096 klass = self.find_class(module, name)
1097 self.append(klass)
1098 dispatch[GLOBAL] = load_global
/usr/lib/python2.7/pickle.pyc in find_class(self, module, name)
1128 def find_class(self, module, name):
1129 # Subclasses may override this
-> 1130 __import__(module)
1131 mod = sys.modules[module]
1132 klass = getattr(mod, name)
ImportError: No module named multiarray
I thought it was the problem of the compiled numpy in my computer. So I uninstalled the numpy from my Arch Linux repo and installed the numpy through
sudo -H pip2 install numpy
Yet the problem persist. I have checked the folder $PACKAGE-SITE/numpy/core, multiarray.so is in it. And I have no idea why pickle can't load the module.
How can I solve the problem? What else do I need to do?
PS1. I am using Arch Linux. And tried all versions of python 2.7 since last year October. None of them works.
PS2. Since the problem is with the loading step. I suspect the problem being more likely from internal conflicts of python rather than from the data file.
Thanks to #MikeMcKems, the problem is now solved.
The issue is caused by different special symbols used MS Windows and Linux(eg. end of line symbol). My collaborator was using Windows machine, and saved the data with
pickle.dump(obj, 'filename', 'w')
The data was saved in plain text with a lot of special symbols in it. And when I load the data with my Linux machine, the symbols were misintepreted hence causing the problem.
The easiest way to solve it is to find a Windows machine, load the data with
a=pickle.load(open('filename_in', 'r'))
Then output with binary form
pickle.dump(a, open('filename_out', 'wb'))
Since binary data is universally recognized as long as you use pickle to read it, the file filename_out is easily recognizable by Python in linux.

Python easygui , global name 'Tk' is not defined

I am trying to execute the following code snippet:
import easygui
from Tkinter import *
easygui.msgbox('Hello')
but it returns the following error:
NameError Traceback (most recent call last)
<ipython-input-35-28d6ffa54e48> in <module>()
----> 1 easygui.msgbox('Hello')
/usr/local/lib/python2.7/dist-packages/easygui/boxes/derived_boxes.pyc in msgbox(msg, title, ok_button, image, root)
214 root=root,
215 default_choice=ok_button,
--> 216 cancel_choice=ok_button)
217
218
/usr/local/lib/python2.7/dist-packages/easygui/boxes/base_boxes.pyc in buttonbox(msg, title, choices, image, root, default_choice, cancel_choice)
64 boxRoot.withdraw()
65 else:
---> 66 boxRoot = Tk()
67 boxRoot.withdraw()
68
NameError: global name 'Tk' is not defined
I tried troubleshooting with various combinations of importing Tkinter -
import Tkinter as Tk
import Tkinter
from Tkinter import *
but none of them work. I have the latest version of both packages installed. What is wrong?
Maybe this helps.
Having a file named 'Tkinter' in the same directory as your script causes to import this file rather than Tkinter itself
If you are using 3+ Python then its "tkinter" not "Tkinter"
If thats not the problem then check your path to tkinter which should be
"\python??\lib\tkinter"
?? being the version you have and tkinter will be a folder of many py files.
Python is case sensitive.