Arrow Java ListVector writeBatch and read get empty list - apache-arrow

I have a ListVector holding the value [[1,2,3,4,5]] in a VectorSchemaRoot, and I can see the value in the IDEA debugger.
I use the following code to write the VectorSchemaRoot and get the byte array:
val out = new ByteArrayOutputStream()
val writer = new ArrowStreamWriter(vectorSchemaRoot, null, out)
writer.start()
writer.writeBatch()
writer.end()
out.close()
val byteArr = out.toByteArray
And I read it back like this:
val allocator = new RootAllocator(Int.MaxValue)
val reader = new ArrowStreamReader(new ByteArrayInputStream(byteArr), allocator)
while (reader.loadNextBatch()) {
  val schemaRoot = reader.getVectorSchemaRoot
  schemaRoot
}
The schema is correct, but the list is empty: [].
However, when I use other value types, such as char or bit, the result read back from byteArr is correct (non-empty).
How can I fix the empty ListVector issue?

In the end I used only the basic vector classes.
StructVector and ListVector are complex classes and, according to my tests, they bring no speed or memory benefit over the basic classes. There is also very little documentation for the complex classes.
So I recommend the basic classes. Building the schema from a List of Fields still gives you a structured set of vectors.

Related

list of lists of dictionaries?

I need to create a structure, in my mind similar to an array of linked lists (where a python list = array and dictionary = linked list). I have a list called blocks, and this is something like what I am looking to make:
blocks[0] = {dictionary},{dictionary},{dictionary},...
blocks[1] = {dictionary},{dictionary},{dictionary},...
etc..
Currently I build the blocks like this:
blocks = []
blocks.append[()]
blocks.append[()]
blocks.append[()]
blocks.append[()]
I know that must look ridiculous. I just cannot see in my head what that just made, which is part of my problem. I assign to a block from a different list of dictionary items. Here is a brief overview of how a single block is created...
hold = {}
hold['file']=file
hold['count']=count
hold['mass']=mass_lbs
mg1.append(hold)
##this append can happen several times to mg1
blocks[i].append(mg1[j])
##where i is an index for the block I want to append to, and j is the list index corresponding to whichever dictionary item of mg1 I want to grab.
The reason I want these four main indices in blocks is so that I have shorter code with just the one list instead of block1 block2 block3 block4, which would just make the code way longer than it is now.
Okay, going off of what was discussed in the comments, you're looking for a simple way to create a structure that is a list of four items where each item is a list of dictionaries, and all the dictionaries in one of those lists have the same keys but not necessarily the same values. However, if you know exactly what keys each dictionary will have and that never changes, then it might be worth it to consider making them classes that wrap dictionaries and have each of the four lists be a list of objects. This would be easier to keep in your head, and a bit more Pythonic in my opinion. You also gain the advantage of ensuring that the keys in the dictionary are static, plus you can define helper methods. And by emulating the methods of a container type, you can still use dictionary syntax.
class BlockA:
    def __init__(self):
        self.dictionary = {'file':None, 'count':None, 'mass':None }
    def __len__(self):
        return len(self.dictionary)
    def __getitem__(self, key):
        return self.dictionary[key]
    def __setitem__(self, key, value):
        if key in self.dictionary:
            self.dictionary[key] = value
        else:
            raise KeyError
    def __repr__(self):
        return str(self.dictionary)
block1 = BlockA()
block1['file'] = "test"
block2 = BlockA()
block2['file'] = "other test"
Now, you've got a guarantee that all instances of your first block object will have the same keys and no additional keys. You can make similar classes for your other blocks, or some general class, or some mix of the two using inheritance. Now to make your data structure:
blocks = [ [block1, block2], [], [], [] ]
print(blocks) # Or "print blocks" if you're not using Python 3.x
blocks[0][0]['file'] = "some new file"
print(blocks)
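For comparison, the plain-dictionary approach from the question can be sketched as below. This is only an illustration under assumptions: the helper name make_entry and the sample values are made up, and mg1 stands in for the list of dictionaries built earlier. The main point is that each of the four slots needs its own list object.

```python
def make_entry(file, count, mass_lbs):
    """Build one dictionary with the three keys used in the question."""
    return {'file': file, 'count': count, 'mass': mass_lbs}

# Four independent empty lists, one per block.
blocks = [[] for _ in range(4)]

# mg1 stands in for the list of dictionaries built earlier (sample data).
mg1 = [make_entry("a.txt", 3, 1.5), make_entry("b.txt", 7, 2.25)]

# Append the j-th dictionary of mg1 to the i-th block.
blocks[0].append(mg1[1])
blocks[2].append(mg1[0])

# Pitfall: [[]] * 4 repeats one and the same list object four times,
# so an append through any slot is visible through all of them.
aliased = [[]] * 4
aliased[0].append("x")
```

With the list comprehension, blocks[1] and blocks[3] stay empty after the two appends, while every slot of aliased ends up holding "x".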
It might also be worthwhile to have a class for this blocks container, with specific methods for adding blocks of each type and accessing blocks of each type. That way you wouldn't trip yourself up with accidentally adding the wrong kind of block to one of the four lists or similar issues. But depending on how much you'll be using this structure, that could be overkill.
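One way such a container might look is sketched below. The class name Blocks, its method names, and the fixed key set are all assumptions made for illustration. To stay self-contained, it stores plain dictionaries and checks their keys instead of using the BlockA class above.

```python
class Blocks:
    """Hypothetical container holding a fixed number of lists of dictionaries."""

    # Assumed key set, taken from the keys used in the question.
    REQUIRED_KEYS = frozenset({'file', 'count', 'mass'})

    def __init__(self, size=4):
        # One independent list per block.
        self._lists = [[] for _ in range(size)]

    def add(self, index, entry):
        """Append entry to block `index`, rejecting unexpected or missing keys."""
        if set(entry) != self.REQUIRED_KEYS:
            raise KeyError("entry keys must be exactly %s" % sorted(self.REQUIRED_KEYS))
        self._lists[index].append(entry)

    def __getitem__(self, index):
        # Allows blocks[i][j]['file'] just like the nested-list version.
        return self._lists[index]

    def __len__(self):
        return len(self._lists)


blocks = Blocks()
blocks.add(0, {'file': "test", 'count': 1, 'mass': 2.0})
blocks[0][0]['file'] = "some new file"
```

Because __getitem__ hands back the underlying list, existing code that indexes blocks[i][j] keeps working; only appends have to go through add to get the key check.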

Convert a Java List of Lists to Scala without O(n) iteration?

Answers to this question do a good job of explaining how to use Scala's Java Converters to change a Java List into a Scala List. Unfortunately, I need to convert a List of Lists from Java to Scala types, and that solution doesn't work:
// pseudocode
java.util.List[java.util.List[String]].asScala
-> scala.collection.immutable.List[java.util.List[String]]
Is there a way to do this conversion without an O(N) iteration over the Java object?
You need to convert the nested lists as well, and that requires the O(n) work up front:
import scala.collection.JavaConverters._
val javaListOfLists = List(List("a", "b", "c").asJava, List("d", "e", "f").asJava).asJava
val scalaListOfLists = javaListOfLists.asScala.toList.map(_.asScala.toList)
Alternatively, you could convert the outer list into a Stream[List[T]], which applies the conversion cost only as you access each item:
val scalaStreamOfLists = javaListOfLists.asScala.toStream.map(_.asScala.toList)
If you don't want to pay the conversion cost at all, you could write a wrapper around java.util.List that gives you a Scala collection interface. A rough shot at that would be:
def wrap[T](javaIterator: java.util.Iterator[T]): Stream[T] = {
  if (javaIterator.hasNext)
    javaIterator.next #:: wrap(javaIterator)
  else
    Stream.empty
}
val outerWrap = wrap(javaListOfLists.iterator).map(inner => wrap(inner.iterator()))
Alternatively, you can use the scalaj-collection library, which I wrote specifically for this purpose:
import com.daodecode.scalaj.collection._
val listOfLists: java.util.List[java.util.List[String]] = ...
val s: mutable.Seq[mutable.Seq[String]] = listOfLists.deepAsScala
That's it. It will convert all nested Java collections and primitive types to their Scala versions. You can also convert directly to immutable data structures using deepAsScalaImmutable (with some copying overhead, of course).

Scala: Create new list of same type

I'm stuck and the solutions Google offered me (not that many) didn't work somehow. It sounds trivial but kept me busy for two hours now (maybe I should go for a walk...).
I've got a list of type XY, oldList: List[XY] with elements in it. All I need is a new, empty List of the same type.
I've already tried stuff like:
newList[classOf(oldList[0])]
newList = oldList.clone()
newList.clear()
But it either didn't work somehow or requires a MutableList, which I don't like. :/
Is there a best (or any working) practice to create a new List of a certain type?
Grateful for any advice,
Teapot
P.S. please don't be too harsh if it's simple, I'm new to Scala. :(
Because empty lists do not contain anything, the type parameter doesn't actually matter - an empty List[Int] is the same as an empty List[String]. And in fact, they are exactly the same object, Nil. Nil has type List[Nothing], and can be upcast to any other kind of List.
In general, though, when working with types that take type parameters, the type parameter of the old value will be in scope, and it can be used to create a new instance. Here is a generic method for a mutable collection, where the type parameter matters even when the collection is empty:
def processList[T](oldList: collection.mutable.Buffer[T]) = {
  val newList = collection.mutable.Buffer[T]()
  // Do something with oldList and newList
}
There are probably nicer solutions, but off the top of my head:
def nilOfSameType[T](l: List[T]) = List.empty[T]
val xs = List(1,2,3)
nilOfSameType(xs)
// List[Int] = List()
If you want it to be empty, all you should have to do is:
val newList = List[XY]()
That's really it, as long as I'm understanding your question.

Scala - Mutable ListBuffer or Immutable list to choose?

I am writing a simple Scala program that calculates the moving average of a list of quotes over a defined window size, say 100.
Quotes will arrive at a rate of approximately 5-6 per second.
1) Is it good to keep the quotes in an immutable Scala List, where I guess a new list will be created each time a quote arrives? Is that going to take too much unnecessary memory?
OR
2) Is it good to keep the quotes in a mutable list like ListBuffer, where I remove the oldest quote and push the new quote each time one arrives?
Current code:
package com.example.csv

import scala.io.Source
import scala.collection.mutable.ListBuffer

object CsvFileParser {
  val WINDOW_SIZE = 25;
  var quotes = ListBuffer(0.0);

  def main(args: Array[String]) = {
    val src = Source.fromFile("GBP_USD_Week1.csv");
    //drop header and split the comma separated tokens
    val iter = src.getLines().drop(1).map(_.split(","));
    // Sliding window reads ahead // remove it
    val index = 0;
    while(iter.hasNext) {
      processRecord(iter.next)
    }
    src.close()
  }

  def processRecord(record: Array[String]) = {
    if(quotes.length < WINDOW_SIZE){
      quotes += record(4).toDouble;
    }else {
      val movingAverage = quotes.sum / quotes.length
      quotes.map(_ + " ").foreach(print)
      println("\nMoving Average " + movingAverage)
      quotes = quotes.tail;
      quotes += record(4).toDouble;
    }
  }

  /*def simpleMovingAverage(values: ListBuffer[Double], period: Int): ListBuffer[Double] = {
    ListBuffer.fill(period - 1)(0.0) ++ (values.sliding(period).map(_.sum).map(_ / period))
  }*/
}
This depends on whether you keep the items in reverse order or not. A List prepends elements in constant time (:: does not create an entirely new list), while ListBuffer#+= creates a node and appends it at the end of the list.
There should be little or no performance difference, nor memory footprint difference, between using List and ListBuffer -- internally these are the same data-structures. The only question is will you need to reverse the list at the end -- if you will, this will require creating a second list, so it may be slower.
In your case the decision to use list buffers was correct -- you need to remove the first element, and append to the other side of the collection, which is something that ordinary functional lists would not allow you to do efficiently.
However, in your code you call tail on the ListBuffer. This copies the contents of the list buffer rather than giving you a cheap tail, so it is an O(WINDOW_SIZE) operation. Call quotes.remove(0, 1) instead to remove the first entry; this just mutates the current buffer, an O(1) operation.
For very fast arrivals of quotes, you might consider using a customized data structure -- for list buffer you will pay the cost of boxing.
However, in the case that there are 5-6 quotes per second and the WINDOW_SIZE is around 100, you don't have to worry -- even calling tail on the list buffer should be more than acceptable.
Immutable structures in Scala use a technique called structural sharing.
For Lists it's also mentioned in the Scaladocs:
The functional list is characterized by persistence and structural sharing, thus offering considerable performance and space consumption benefits in some scenarios.
So:
The immutable List will take up more space, but not much more, and will blend in better with other Scala constructs.
The mutable version is prone to concurrency issues, but will be a little faster and use a little less space.
As for the code:
If you use a mutable structure you can get rid of the var; keep the var only if you want to reassign an immutable one.
It would be considered good style to have a method return the List instead of manipulating a mutable structure or a global var.
If you process the file as in the snippet below, the lines are processed in parallel. This only fits if processRecord does not depend on the order of the records or on shared mutable state, and note that the Source is never closed explicitly.
def processLines(path: String): Unit = for {
  // On Scala 2.13+, .par needs the scala-parallel-collections module
  // and import scala.collection.parallel.CollectionConverters._
  line <- Source.fromFile(path).getLines().drop(1).toVector.par
  record = line.split(",")
} processRecord(record)

Scala and JPA Results Lists

Scala noob, I'm afraid:
I have the following class variable declared, which will hold the objects I read from the database:
val options = mutable.LinkedList[DivisionSelectOption]()
I then use JPA to get a List of all rows from a table:
val divisionOptions = em.createNamedQuery("SelectOption.all", classOf[SelectOption]) getResultList
/* Wrap java List in Scala List */
val wrappedOptions = JListWrapper.apply(divisionOptions)
/* Store the wrappedOptions in the class variable */
options += wrappedOptions
However, the last line has an error:
Type Expected: String, actual JListWrapper[SelectOption]
Can anyone help with what I am doing wrong? I'm just trying to populate the options object with the result of the DB call.
Thanks
What is (probably) happening is that a JListWrapper[SelectOption] isn't a DivisionSelectOption, so the method += isn't applicable to it. That being the case, the compiler tries other things, and gives the final error on this:
options = options + wrappedOptions
That is a rewriting Scala can do to make things like x += 1 work for var x. The + method is present on all objects, but it takes a String as parameter -- that's so one can write stuff like options + ":" and have that work as in Java. But since wrappedOptions isn't a String, it complains.
Roundabout and confusing, I know, and even Odersky regrets his decision with regard to +. Let that be a lesson: if you think of adding a method to Any, think really hard before doing it.