pandas has no attribute read_html on Raspberry Pi - python-2.7

import pandas as pd
f_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
The script above works fine when I call it directly in the Python shell:
>>> import pandas as pd
>>> f_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
But it does not work when run as python script.py; it fails with AttributeError: 'module' object has no attribute 'read_html'.
It is the same script called in two different ways, so why does one work but not the other?

You need to update pandas, use:
pip install pandas==1.3
(Note that pandas 1.3 requires Python 3; on Python 2.7 the newest installable release is from the 0.24 series.)
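If the error persists after upgrading, it is worth checking whether python script.py picks up a different interpreter or pandas installation than your interactive shell. A quick diagnostic sketch:
import pandas as pd

# read_html has existed since pandas 0.12; an older version, or a local
# pandas.py shadowing the real package, would explain the AttributeError.
print(pd.__version__)
print(pd.__file__)
print(hasattr(pd, 'read_html'))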


Failed to find data source: delta in Python environment

Following: https://docs.delta.io/latest/quick-start.html#python
I have installed delta-spark and run:
import pyspark
from delta import *

builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
spark = configure_spark_with_delta_pip(builder).getOrCreate()
However when I run:
data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")
the error states: delta not recognised
And if I run
DeltaTable.isDeltaTable(spark, "packages/tests/streaming/data")
It states: TypeError: 'JavaPackage' object is not callable
I was under the impression that I could run these commands locally (for example, in unit tests) without Maven or a pyspark shell. Am I missing a dependency?
You can just install the delta-spark PyPI package using pip install delta-spark (it will pull in pyspark as well), and then refer to it.
Or you can add a configuration option that will fetch the Delta package: .config("spark.jars.packages", "io.delta:delta-core_2.12:<delta-version>"). For Spark 3.1 the matching Delta version is 1.0.0 (see the releases mapping docs for more information).
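For example, a minimal sketch assuming Spark 3.1 (hence Delta 1.0.0; adjust the version to match your Spark):
from pyspark.sql import SparkSession

# Spark downloads the Delta jar at session startup; no manual Maven step needed.
spark = SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()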
I have an example of using Delta tables in unit tests (note that the import statement is inside the function definition, because the Delta package is loaded dynamically):
import pytest
import shutil

delta_dir_name = "/tmp/delta-table"

@pytest.fixture
def delta_setup(spark_session):
    data = spark_session.range(0, 5)
    data.write.format("delta").save(delta_dir_name)
    yield data
    shutil.rmtree(delta_dir_name, ignore_errors=True)

def test_delta(spark_session, delta_setup):
    from delta.tables import DeltaTable

    deltaTable = DeltaTable.forPath(spark_session, delta_dir_name)
    hist = deltaTable.history()
    assert hist.count() == 1
The test environment is initialized via pytest-spark:
[pytest]
filterwarnings =
    ignore::DeprecationWarning
spark_options =
    spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
    spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog
    spark.jars.packages: io.delta:delta-core_2.12:1.0.0
    spark.sql.catalogImplementation: in-memory
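With pytest-spark installed (pip install pytest-spark), the spark_session fixture used above is injected automatically, so the test runs as ordinary pytest (the file name here is hypothetical):
pytest test_delta.py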

How to pass the current date when using os.system to copy files to a GCS location in Python 2.7.5

I have a Python script that moves a file from a local directory to a gs:// bucket using os.system. I need to put today's date into the filename in the GCS bucket.
Here is the script:
#!/usr/bin/python
import time
import requests
import csv
import json
import os
from datetime import date
#current_date = date.today()
def uploadfile2GCSraw():
    current_date = date.today()
    os.system('gsutil cp /u/y/XXXX/abcd.json gs://XXXX/XX/XX/CRE_DT=current_date')
I'm very new to Python. When I run the above, the file is created literally as CRE_DT=current_date; it's not taking the date from date.today(). Can someone help? Thanks.
When you write current_date inside the string on that final line, it is treated as the literal text current_date; Python does not substitute the variable's value into a plain string.
Try using an f-string (Python 3.6+), like this:
os.system(f"gsutil cp /u/y/XXXX/abcd.json gs://XXXX/XX/XX/CRE_DT={current_date}")
For Python 2, use % formatting instead:
os.system("gsutil cp /u/y/XXXX/abcd.json gs://XXXX/XX/XX/CRE_DT=%s" % (date.today()))
(And then upgrade to Python 3.)
That should do what you want.
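If you also want control over the date format, date.strftime lets you choose one explicitly, and passing an argument list to subprocess avoids shell-quoting surprises. A sketch that works on Python 2.7 and 3 alike (the gs:// path is the placeholder from the question, and '%Y%m%d' is an assumed format):
import subprocess
from datetime import date

stamp = date.today().strftime('%Y%m%d')  # pick whatever format the bucket layout expects
subprocess.call(['gsutil', 'cp', '/u/y/XXXX/abcd.json',
                 'gs://XXXX/XX/XX/CRE_DT=' + stamp])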

Why can't I import WD_ALIGN_PARAGRAPH from docx.enum.text?

I transferred some code from IDLE 3.5 (64-bit) to PyCharm (Python 2.7). Most of the code still works; for example, I can import WD_LINE_SPACING from docx.enum.text, but for some reason I can't import WD_ALIGN_PARAGRAPH.
At first nearly none of the imports worked, but after I ran
pip install python-docx
instead of
pip install docx
most of the imports worked, except for WD_ALIGN_PARAGRAPH.
# works
from __future__ import print_function
import xlrd
import xlwt
import os
import subprocess
from calendar import monthrange
import datetime
from docx import Document
from datetime import datetime
from datetime import date
from docx.enum.text import WD_LINE_SPACING
from docx.shared import Pt
# does not work
from docx.enum.text import WD_ALIGN_PARAGRAPH
I don't get any error messages, but PyCharm marks the line as an error:
"Cannot find reference 'WD_ALIGN_PARAGRAPH' in 'text.py'".
You can use this instead:
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
and then substitute WD_PARAGRAPH_ALIGNMENT wherever WD_ALIGN_PARAGRAPH would have appeared before.
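For example, a minimal sketch (out.docx is a hypothetical output name):
from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT

doc = Document()
para = doc.add_paragraph('centered text')
para.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # the same member WD_ALIGN_PARAGRAPH.CENTER aliases
doc.save('out.docx')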
The reason this happens is that the actual enum object is named WD_PARAGRAPH_ALIGNMENT, and a decorator is applied that also allows it to be referenced as WD_ALIGN_PARAGRAPH (which is a little shorter, and possibly clearer). The syntax checker in PyCharm apparently only looks at direct module attributes and doesn't pick up the alias, which is resolved at runtime.
Interestingly, your code should work fine either way, but to get rid of the annoying message you can use the base name.
If you use pylint, the message can easily be suppressed with # pylint: disable=E0611 added at the end of the import line.

Shebang command to call script from existing script - Python

I am running a Python script on my Raspberry Pi, and at the end I want it to call a second Python script in the same directory. I call it using os.system() as shown in the snippet below, but I get import errors. I understand this is because the system interprets the script name as a shell command and needs to be told to run it with Python, via a shebang line at the beginning of the second script:
#!/usr/bin/env python
However, doing so does not resolve the errors.
Here is the ending snippet from the first script:
# Time to Predict E
end3 = time.time()
prediction_time = end3-start3
print ("\nPrediction time: ", prediction_time, "seconds")
i = i+1
print (i)
script = '/home/pi/piNN/exampleScript.py'
os.system('"' + script + '"')
and here is the beginning of my second script:
'#!usr/bin/env python'
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
#from picamera import PiCamera
import argparse
import sys
import time
import numpy as np
import tensorflow as tf
import PIL.Image as Image
Any help is greatly appreciated :)
Since you have not posted the actual errors that you get when you run your code, this is my best guess. First, ensure that exampleScript.py is executable:
chmod +x /home/pi/piNN/exampleScript.py
Second, fix the shebang in exampleScript.py: it needs a leading slash, and it must be a bare comment on the very first line rather than a quoted string (a string literal is never seen by the kernel). That is, change
'#!usr/bin/env python'
to
#!/usr/bin/env python
The setup that you have here is not ideal.
Consider simply importing your other script (make sure both files are in the same directory). Importing it executes all top-level Python code in the script that is not wrapped in if __name__ == "__main__":. Should you need to protect some code from running on import, place it inside that guard, as sketched below.
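A minimal sketch of that approach (main is a hypothetical entry point; any name works):
# b.py
def main():
    print('Script b')

if __name__ == '__main__':
    main()

# a.py
import b   # top-level code in b.py runs here, except the guarded block
b.main()   # invoke the entry point explicitly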
I have 2 Python files, a.py and b.py, and I set the execute permission on b.py with:
chmod a+x b.py
Below is my sample:
a.py
#!/usr/bin/python
print 'Script a'
import os
script = './b.py'
os.system('"' + script + '"')
b.py
#!/usr/bin/python
print 'Script b'
Execute "python a.py", and the result is:
Script a
Script b

Upgraded to pandas 0.13.1, getting DeprecationWarning with sklearn

I'm using Python 2.7.3. I just upgraded from Pandas 0.12.0 to 0.13.1, and made no other changes. I upgraded to be able to use the new eval() method in the DataFrame class.
The following code runs perfectly, with no errors or warnings:
from numpy.random import randn
from pandas import DataFrame
df = DataFrame(randn(10, 2), columns=list('ab'))
df.eval('a + b')
However, if I import any class from scikit-learn (version 0.14.1), the same code gives me a DeprecationWarning. The following code:
from sklearn import naive_bayes
from numpy.random import randn
from pandas import DataFrame
df = DataFrame(randn(10, 2), columns=list('ab'))
df.eval('a + b')
gives me the following warning:
/usr/local/lib64/python2.7/site-packages/pandas/computation/ops.py:62: DeprecationWarning: object.__new__() takes no parameters
  return supr_new(klass, name, env, side=side, encoding=encoding)
I'm using numpy version 1.6.2.
What am I doing wrong?