I am working on a project to create a stream-processing prediction engine on GCP. I am mostly learning from this repo. However, when I try to execute the script blogposts/got_sentiment/4_streaming_pipeline/streaming_tweet.py, I keep getting the error
NameError: name 'estimate' is not defined [while running 'generatedPtransform-129']
My script looks as follows:
from __future__ import absolute_import
import argparse
import datetime
import json
import logging
import numpy as np
import apache_beam as beam
import apache_beam.transforms.window as window
from apache_beam.io.gcp.bigquery import parse_table_schema_from_json
from apache_beam.options.pipeline_options import StandardOptions, GoogleCloudOptions, SetupOptions, PipelineOptions
from apache_beam.transforms.util import BatchElements
from googleapiclient import discovery
def init():
    ........

def estimate_cmle():
    init()
    .....

def estimate(instances):
    estimate_cmle()
    ......

def run(argv=None):
    ....
    output = (lines
              | 'assign window key' >> beam.WindowInto(window.FixedWindows(10))
              | 'batch into n batches' >> BatchElements(min_batch_size=49, max_batch_size=50)
              | 'predict sentiment' >> beam.FlatMap(lambda messages: estimate(messages))
              )
    .....

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
This is where Beam seems unable to recognize the estimate function, even though I am creating it in the same script.
Edit
Trying beam.FlatMap(estimate) gave the error
name 'estimate_cmle' is not defined [while running 'generatedPtransform-1208']
Look at these 2 parts:
Function definition
def estimate(instances):
......
Function call
beam.FlatMap(lambda messages: estimate(messages,estimate_cmle))
Your call expects a function with two parameters, but your declared function has only one. Your Python script only contains an estimate with one parameter; the function with two parameters is not defined.
In the repo's examples, the call passes only one parameter, and thus it works. Fix this and it should work.
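A minimal sketch of the corrected transform, keeping the one-parameter signature of estimate defined in the script:

output = (lines
          | 'assign window key' >> beam.WindowInto(window.FixedWindows(10))
          | 'batch into n batches' >> BatchElements(min_batch_size=49, max_batch_size=50)
          | 'predict sentiment' >> beam.FlatMap(estimate)  # the batch of messages is the only argument
          )

If a NameError for a module-level function still shows up when the pipeline runs on Dataflow, it is often because the main session was not shipped to the workers; assuming your options object is named pipeline_options, setting pipeline_options.view_as(SetupOptions).save_main_session = True before running is the usual remedy.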
I am trying to understand how to import code from one file to another. I have two files file1.py and file2.py. I am running code in the first file, and have many variables and functions defined in the second file. I am using from file2 import * to import the code into file1.py. I have no problem using variables defined in file2.py in file1.py, but with functions I am getting NameError: name 'myfunc' is not defined when I try to use the function in file1.py. I can fix this problem by writing from file2 import myfunc, but I thought writing * would import everything from that file. What is the difference for functions versus variables?
I have tried to recreate the setup you have described and it is working OK for me. Hopefully this will give you an idea of how to get it to work.
# file1.py #####################################
import sys
sys.path.append("/home/neko/test/")
import file2

if __name__ == "__main__":
    file2.testfunc()

# file2.py ######################################
testvar = 'hello'

def testfunc(): print testvar
For this test I was using Python version 2.6.6.
Both file1.py and file2.py are in /home/neko/test/
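One thing worth checking, assuming file2.py is otherwise importable: if file2.py defines __all__, then from file2 import * only binds the names listed there, which would explain variables working while a function comes up undefined. A minimal sketch (the __all__ line is hypothetical, not from your code):

# file2.py
__all__ = ['testvar']  # hypothetical: only testvar is exported by 'from file2 import *'

testvar = 'hello'

def myfunc():  # not listed in __all__, so 'from file2 import *' skips it
    return testvar

With that __all__ in place, from file2 import * makes testvar available in file1.py but leaves myfunc undefined; adding 'myfunc' to the list, or importing it by name, fixes it. Without __all__, * imports every name not starting with an underscore.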
I would like to import a module from one location, unload it, and then import a module of the same name from another location in Python. Something like:
sys.path.append(module_location_1)
import module
unload module
....
sys.path.append(module_location_2)
import module
I tried the following approach, but have had no luck:

sys.path.insert(0, '/path1')
import my_module
print my_module  # <module 'my_module' from '/path1/__init__.pyc'>

sys.path.insert(0, '/path2')
import my_module
print my_module  # still gives: <module 'my_module' from '/path1/__init__.pyc'>
Unfortunately, after the second import I see that the module is still loaded from the original location I added to my path. I have tried:
- removing the first location from sys.path altogether between imports
- imp.reload(my_module)
- both appending and prepending to the path
Thanks!
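For reference, a minimal sketch of one common approach, assuming the same /path1 and /path2 layout as above: import returns the cached entry from sys.modules regardless of sys.path, so the cached module has to be evicted before the second import.

import sys

sys.path.insert(0, '/path1')
import my_module
print(my_module.__file__)  # resolved from /path1

# Drop the cached module (and any submodules) so the next import re-resolves it.
for name in list(sys.modules):
    if name == 'my_module' or name.startswith('my_module.'):
        del sys.modules[name]

sys.path.remove('/path1')
sys.path.insert(0, '/path2')
import my_module
print(my_module.__file__)  # now resolved from /path2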
I am trying to replicate a namespace kind of pattern in D. I would like to have one class per file and be able to use a single import statement to include all classes from a group of files.
In my root folder I have moduleExp.d:
import foo.module;
import std.stdio;

void main()
{
    writeln("hello world");
    auto quakk = new Quakk();
    quakk.write();
}
In my foo folder I have module1.d:
module foo.module1;
import std.stdio;

class Quakk
{
    public string name = "thename";

    public void write()
    {
        writeln(this.name);
    }
}
Also in the foo folder, I have module.d, which I thought I could use to publicly import all my class modules so they can be used in moduleExp.d in the root folder.
module foo.module;
public import foo.module1;
When I try to compile on Windows from the root folder with rdmd moduleExp, I get errors:
moduleExp.d(1): Error: identifier expected following package
moduleExp.d(1): Error: ';' expected
Failed: ["dmd", "-v", "-o-", "moduleExp.d", "-I."]
I'm not sure what I'm doing wrong here. If I change line 1 in moduleExp.d to import foo.module1;, then everything works.
module is a reserved keyword that you shouldn't use as a module name. What you are trying to do exists as a feature in D called package modules; see http://dlang.org/module.html for the documentation.
So basically, you add a module under the folder foo whose file name should be package.d, and in it you publicly import the modules within foo and any other modules that you want. Example:
module foo; // notice the module name is used
// for the package module and not foo.package
public import foo.module1;
public import foo.module2;
...
And now when you want to import the package, just write import foo.
If I run the following command:
>python manage.py test
Django looks at tests.py in my application, and runs any doctests or unit tests in that file. It also looks at the __test__ dictionary for extra tests to run. So I can link doctests from other modules like so:
# tests.py
from myapp.module1 import _function1, _function2

__test__ = {
    "_function1": _function1,
    "_function2": _function2
}
If I want to include more doctests, is there an easier way than enumerating them all in this dictionary? Ideally, I just want to have Django find all doctests in all modules in the myapp application.
Is there some kind of reflection hack that would get me where I want to be?
I solved this for myself a while ago:
import sys

from django.conf import settings

apps = settings.INSTALLED_APPS
for app in apps:
    try:
        a = app + '.test'
        __import__(a)
        m = sys.modules[a]
    except ImportError:  # no test jobs for this module, continue to the next one
        continue
    # run your tests using the imported module m
This allowed me to put per-module tests in their own test.py file, so they didn't get mixed up with the rest of my application code. It would be easy to modify this to just look for doctests in each of your modules and run them if it finds any, as sketched below.
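The change is small; a sketch, reusing the loop's m and a from above and assuming the imported module contains docstring tests:

import doctest

# Inside the loop, after m has been imported:
results = doctest.testmod(m)
if results.failed:
    print("%s: %d of %d doctests failed" % (a, results.failed, results.attempted))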
Use django-nose, since nose automatically finds all tests recursively.
Here are the key elements of the solution:
tests.py:
import doctest
import imp
import os
import re
import unittest

import myapp.tests

def find_modules(package):
    """Return a list of imported modules from the given package."""
    files = [re.sub(r'\.py$', '', f) for f in os.listdir(os.path.dirname(package.__file__))
             if f.endswith(".py") and os.path.basename(f) not in ('__init__.py', 'test.py')]
    return [imp.load_module(file, *imp.find_module(file, package.__path__)) for file in files]

def suite(package=None):
    """Assemble a test suite for Django's default test loader."""
    if not package: package = myapp.tests  # Default argument required for Django test runner
    return unittest.TestSuite([doctest.DocTestSuite(m) for m in find_modules(package)])
To add recursion, use os.walk() to traverse the module tree and find Python packages.
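A sketch of that recursion, assuming a package is any directory containing __init__.py (find_packages is a hypothetical helper, not part of the answer above):

import os

def find_packages(root_dir):
    """Yield dotted package names for every package under root_dir."""
    parent = os.path.dirname(root_dir)
    for dirpath, dirnames, filenames in os.walk(root_dir):
        if '__init__.py' in filenames:
            yield os.path.relpath(dirpath, parent).replace(os.sep, '.')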
Thanks to Alex and Paul. This is what I came up with:
# tests.py
import sys, settings, re, os, doctest, unittest, imp

# import your base Django project
import myapp

# Django already runs these, don't include them again
ALREADY_RUN = ['tests.py', 'models.py']

def find_untested_modules(package):
    """Get all modules not already included in Django's test suite."""
    files = [re.sub(r'\.py$', '', f)
             for f in os.listdir(os.path.dirname(package.__file__))
             if f.endswith(".py")
             and os.path.basename(f) not in ALREADY_RUN]
    return [imp.load_module(file, *imp.find_module(file, package.__path__))
            for file in files]

def modules_callables(module):
    return [m for m in dir(module) if callable(getattr(module, m))]

def has_doctest(docstring):
    return ">>>" in docstring

__test__ = {}
for module in find_untested_modules(myapp.module1):
    for method in modules_callables(module):
        docstring = str(getattr(module, method).__doc__)
        if has_doctest(docstring):
            print "Found doctest(s) " + module.__name__ + "." + method

            # import the method itself, so doctest can find it
            _temp = __import__(module.__name__, globals(), locals(), [method])
            locals()[method] = getattr(_temp, method)

            # Django looks in __test__ for doctests to run
            __test__[method] = getattr(module, method)
I'm not up to speed on Django's testing, but as I understand it, it uses automatic unittest discovery, just like python -m unittest discover and nose.
If so, just put the following file somewhere the discovery will find it (usually just a matter of naming it test_doctest.py or similar).
Change your_package to the package to test. All modules (including subpackages) will be doctested.
import doctest
import pkgutil

import your_package as root_package

def load_tests(loader, tests, ignore):
    modules = pkgutil.walk_packages(root_package.__path__, root_package.__name__ + '.')
    for _, module_name, _ in modules:
        try:
            suite = doctest.DocTestSuite(module_name)
        except ValueError:
            # Presumably a "no docstrings" error. That's OK.
            pass
        else:
            tests.addTests(suite)
    return tests
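Saved as e.g. test_doctest.py inside the package tree, this should be picked up by python -m unittest discover; load_tests is the standard unittest hook for customizing the tests a module contributes, so any runner built on unittest discovery should honor it.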