Trying to figure out why I can't get Lambda to work with writing a file.
Here is my code:
list = (i*2 for i in range(10))
file = open("text.txt", "a")
lambda i: (file.write(str(i) + "\n")), list
and I don't even get to close the file before I get following error message:
(<function <lambda> at 0x021E1B30>, <generator object <genexpr> at 0x021EC418>)
Based on your comment, I guess what you're trying to do is:
map(lambda i: file.write(str(i) + "\n"), list)
However, I think the code you were apparently trying to avoid would be more readable:
for i in list:
file.write(str(i) + "\n")
(Also, take Peter's note to heart... using list as a variable will come back to haunt you someday when you try to use the list() built-in function and find it's been overridden.)
Related
I'm currently using a not-very-Scala-like approach to parse large Unix mailbox files. I'm still learning the language and would like to challenge myself to find a better way, however, I do not believe I have a solid grasp on just what can be done with an Iterator and how to effectively use it.
I'm currently using org.apache.james.mime4j, and I use the org.apache.james.mime4j.mboxiterator.MboxIterator to get a java.util.Iterator from a file, as so:
// registers an implementation of a ContentHandler that
// allows me to construct an object representing an email
// using callbacks
val handler: ContentHandler = new MyHandler();
// creates a parser that parses a SINGLE email from a given InputStream
val parser: MimeStreamParser = new MimeStreamParser(configBuilder.build());
// register my handler
parser.setContentHandler(handler);
// Get a java.util.Iterator
val iterator = MboxIterator.fromFile(fileName).build();
// For each email, process it using above Handler
iterator.forEach(p => parser.parse(p.asInputStream(Charsets.UTF_8)))
From my understanding, the Scala Iterator is much more robust, and probably a lot more capable of handling something like this, especially because I won't always be able to fit the full file in memory.
I need to construct my own version of the MboxIterator. I dug through the source for MboxIterator and was able to find a good RegEx pattern to use to determine the beginning of individual email messages with, however, I'm drawing a blank from now on.
I created the RegEx like so:
val MESSAGE_START = Pattern.compile(FromLinePatterns.DEFAULT, Pattern.MULTILINE);
What I want to do (based on what I know so far):
Build a FileInputStream from an MBOX file.
Use Iterator.continually(stream.read()) to read through the stream
Use .takeWhile() to continue to read until the end of the stream
Chunk the Stream using something like MESSAGE_START.matcher(someString).find(), or use it to find the indexes the separate the message
Read the chunks created, or read the bits in between the indexes created
I feel like I should be able to use map(), find(), filter() and collect() to accomplish this, but I'm getting thrown off by the fact that they only give me Ints to work with.
How would I accomplish this?
EDIT:
After doing some more thinking on the subject, I thought of another way to describe what I think I need to do:
I need to keep reading from the stream until I get a string that matches my RegEx
Maybe group the previously read bytes?
Send it off to be processed somewhere
Remove it from the scope somehow so it doesn't get grouped the next time I run into a match
Continue to read the stream until I find the next match.
Profit???
EDIT 2:
I think I'm getting closer. Using a method like this gets me an iterator of iterators. However, there are two issues: 1. Is this a waste of memory? Does this mean everything gets read into memory? 2. I still need to figure out a way to split by the match, but still include it in the iterator returned.
def split[T](iter: Iterator[T])(breakOn: T => Boolean):
Iterator[Iterator[T]] =
new Iterator[Iterator[T]] {
def hasNext = iter.hasNext
def next = {
val cur = iter.takeWhile(!breakOn(_))
iter.dropWhile(breakOn)
cur
}
}.withFilter(l => l.nonEmpty)
If I understand correctly, you want to lazily chunk a large file delimited by a regex recognizable pattern.
You could try to return an Iterator for each request but the correct iterator management would not be trivial.
I'd be inclined to hide all file and iterator management from the client.
class MBox(filePath :String) {
private val file = io.Source.fromFile(filePath)
private val itr = file.getLines().buffered
private val header = "From .+ \\d{4}".r //adjust to taste
def next() :Option[String] =
if (itr.hasNext) {
val sb = new StringBuilder()
sb.append(itr.next() + "\n")
while (itr.hasNext && !header.matches(itr.head))
sb.append(itr.next() + "\n")
Some(sb.mkString)
} else {
file.close()
None
}
}
testing:
val mbox = new MBox("so.txt")
mbox.next()
//res0: Option[String] =
//Some(From MAILER-DAEMON Fri Jul 8 12:08:34 2011
//some text AAA
//some text BBB
//)
mbox.next()
//res1: Option[String] =
//Some(From MAILER-DAEMON Mon Jun 8 12:18:34 2012
//small text
//)
mbox.next()
//res2: Option[String] =
//Some(From MAILER-DAEMON Tue Jan 8 11:18:14 2013
//some text CCC
//some text DDD
//)
mbox.next() //res3: Option[String] = None
There is only one Iterator per open file and only the safe methods are invoked on it. The file text is realized (loaded) only on request and the client gets just what's requested, if available. Instead of all lines in one long String you could return each line as part of a collection, Seq[String], if that's more applicable.
UPDATE: This can be modified for easy iteration.
class MBox(filePath :String) extends Iterator[String] {
private val file = io.Source.fromFile(filePath)
private val itr = file.getLines().buffered
private val header = "From .+ \\d{4}".r //adjust to taste
def next() :String = {
val sb = new StringBuilder()
sb.append(itr.next() + "\n")
while (itr.hasNext && !header.matches(itr.head))
sb.append(itr.next() + "\n")
sb.mkString
}
def hasNext: Boolean =
if (itr.hasNext) true else {file.close(); false}
}
Now you can .foreach(), .map(), .flatMap(), etc. But you can also do dangerous things like .toList which will load the entire file.
Custom Keyword written in python 2.7:
#keyword("Update ${filename} with ${properties}")
def set_multiple_test_properties(self, filename, properties):
for each in values.split(","):
each = each.replace(" ", "")
key, value = each.split("=")
self.set_test_properties(filename, key, value)
When we send paremeters in a single line as shown below, its working as expected:
"Update sample.txt with "test.update=11,timeout=20,delay.seconds=10,maxUntouchedTime=10"
But when we modify the above line with a new lines (for better readability) it's not working.
Update sample.txt with "test.update = 11,
timeout=20,
delay.seconds=10,
maxUntouchedTime=10"
Any clue on this please?
I am not very sure whether it will work or not, but please try like this
Update sample.txt with "test.update = 11,
... timeout=20,
... delay.seconds=10,
... maxUntouchedTime=10"
Your approach is not working, cause the 2nd line is considered a call to a keyword (called "timeout=20,"), the 3rd another one, and so on. The 3 dots don't work cause they are "cell separators" - delimiter b/n arguments.
If you are going for readability, you can use the Catenate kw (it's in the Strings library):
${props}= Catenate SEPARATOR=${SPACE}
... test.update = 11,
... timeout=20,
... delay.seconds=10,
... maxUntouchedTime=10
, and then call your keyword with that variable:
Update sample.txt with "${props}"
btw, a) I think your keyword declaration in the decorator is without the double quotes - i.e. called like that ^ they'll be treated as part of the argument's value, b) there seems to be an error in the py method - the argument's name is "properties" while the itterator uses "values", and c) you might want to consider using named varargs (**kwargs in python, ${kwargs} in RF syntax) for this purpose (sorry, offtopic, but couldn't resist :)
I am new to Python and am trying to get my ahead around parsing SSE client code. I am using the SSE Client library. My code is very basic and follows the sample exactly. Here it is:
from sseclient import SSEClient
devID = "xxx"
AToken = "xxx"
sparkURL = 'https://api.spark.io/v1/devices/' + devID + '/events/?access_token=' + AToken
messages = SSEClient(sparkURL)
for msg in messages:
print(msg)
print(type(msg))
The code runs without a problem and I see some blank lines and SSE data coming through. Here is the sample output:
<class 'sseclient.Event'>
{"data":"0 days, 0:54:43","ttl":"60","published_at":"2015-04-09T22:43:52.084Z","coreid":"xxxx"}
<class 'sseclient.Event'>
<class 'sseclient.Event'>
{"data":"0 days, 0:55:3","ttl":"60","published_at":"2015-04-09T22:44:12.092Z","coreid":"xxx"}
<class 'sseclient.Event'>
The actual output above looks like a dictionary, but its type is "sseclient.Event". I am trying to figure out how to parse the output so I can pull out one of the fields and nothing I have tried has worked.
Sorry if this is basic questions, but can someone provide some simple guidance on how I would either convert the entire output to a dictionary or perhaps just pull out one of the fields?
Thank you in advance!
I figured this out. In case anyone else experiences the same problem, here is how I got it to work. The key was using msg.data and not just msg. I then converted the out using the JSON library and am good to go.
messages = SSEClient(sparkURL)
for msg in messages:
outputMsg = msg.data
if type(outputMsg) is not str:
outputJS = json.loads(outputMsg)
FilterName = "data"
#print( FilterName, outputJS[FilterName] )
print(outputJS[FilterName])
I am trying to write a query that simply drops a table.
let drop_table dbh table_name =
let query = String.concat " " ["drop table"; table_name] in
PGSQL(dbh) query
I am receiving the following error from the query
File "save.ml", line 37, characters 10-11:
Parse error: STRING _ expected after ")" (in [expr])
File "save.ml", line 1:
Error: Preprocessor error
Why am I getting this error? It appears that this function is valid Ocaml syntax.
Thanks guys!
You cannot construct query when using PG'OCaml's syntax extension. You must provide a literal string. This is the tradeoff for getting PG'Ocaml's compile time query validation. If query could be any OCaml expression, PG'OCaml wouldn't know how to validate it at compile time.
Personally, I've stopped using the syntax extension completely. My feeling is it doesn't scale to large projects. Instead I call prepare and execute directly. For example, this function will create a new database connection (assuming the connection parameters are previously defined), run the given query, and close the connection:
let exec query =
let db = PGOCaml.connect ~host ~user ~database ~port ~password ()
PGOCaml.prepare db ~query ();
let ans = PGOCaml.execute db ~params:[] () in
PGOCaml.close db;
ans
Of course, this isn't a robust implementation and shouldn't be used in production code. It doesn't handle errors and isn't asynchronous.
So I'm doing something wrong in this python script, but it's becoming convoluted and I'm losing sight of what I'm doing wrong.
I want a script to go through a file, find all the function definitions, and then pull out the name, return type, and parameters of the function, and output a "doxygen" style comment like this:
/******************************************************************************/
/*!
\brief
Main function for the file
\return
The exit code for the program
*/
/******************************************************************************/
But I'm doing something wrong with the regular expression in trying to parse the parameters... Here is the script so far:
import re
import sys
f = open(sys.argv[1])
functions = []
for line in f:
match = re.search(r'([\w]+)\s+([\S]+)\(([\w+\s+\w+])+\)',line)
if line.find("\\fn") < 0:
if match:
returntype = match.group(1)
funcname = match.group(2)
print '/********************************************************************'
print " \\fn " + match.group()
print ''
print ' \\brief'
print ' Function description for ' + funcname
print ''
if len(match.groups()) > 2:
params = []
count = len(match.groups()) - 2
while count > 0:
matchingstring = match.group(count + 2)
if matchingstring.find("void") < 0:
params.append(matchingstring)
count -= 1
for parameter in params:
print " \\param " + parameter
print ' Description of ' + parameter
print ''
print ' \\return'
print ' ' + returntype
print '********************************************************************/'
print ''
Any help would be appreciated. Thanks
The grammar of C++ is far too complex to be handled by simple
regular expressions. You'll need at least a minimal parser.
I've found that for restricted cases, where I'm not concerned
with C++ in general, but only my own style, I can often get away
with a flex based tokenizer and a simple state machine. This
will fail in many cases of legal C++—for starters, of
course, if someone uses the pre-processor to modify the syntax;
but also because < can have different meanings, depending on
what precedes it names a template or not. But it's often
adequate for a specific job.
I've used a PEG parser with great success when trying to do simple format parsing. pyPeg is a very simple implementation of such a parser written in Python.
Example Python code for C++ function parser:
EDIT: Address template parameters. Tested with input from SK-logic and output is correct.
import pyPEG
from pyPEG import parseLine
import re
def symbol(): return re.compile(r"[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&*][\w:]+")
def type(): return symbol
def functionName(): return symbol
def templatedType(): return symbol, "<", -1, [templatedType, symbol, ","], ">"
def parameter(): return [templatedType, type], symbol
def template(): return "<", -1, [symbol, template], ">"
def function(): return [type, templatedType], functionName, -1, template, "(", -1, [",", parameter], ")" # -1 -> zero or more repetitions.
sourceCode = "std::string foobar(std::vector<int> &A, std::map<std::string, std::vector<std::string> > &B)"
results = parseLine(sourceCode, function(), [], packrat=True)
When this is executed results is:
([(u'type', [(u'symbol', 'std::string')]), (u'functionName', [(u'symbol', 'foobar')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'int')]), (u'symbol', '&A')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::map'), (u'symbol', 'std::string'), (u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'std::string')])]), (u'symbol', '&B')])], '')
C++ cannot really be parsed by a (sane) regular expression: they are a nightmare as soon as nesting is concerned.
There is another concern too, determining when to parse and when not to. A function may be declared:
at file scope
in a namespace
in a class
And the two last can be nested at arbitrary depths.
I would propose to use CLang here. It's a real C++ front-end with a full-featured parser and there are:
a C API, with (notably) an API to the Indexing Library
Python bindings on top of the C API
The C API and Python bindings are far from fully exposing the underlying C++ model, but for a task as simple as listing functions it should be enough.
That said, I would question the usefulness of the project: if the documentation can be generated by a simple parser, then it is redundant with the code. And redundancy is at best, useless, and worst dangerous: it introduces the potential threat of desynchronization...
If the function is tricky enough that its use requires documentation, then a developer, who knows the limitations and al, has to write this documentation.