Split an RDD column into several columns in Scala - list

I have an RDD of this form:
org.apache.spark.rdd.RDD[(String, Int, Array[String])]
This is the first element of the RDD:
(001, 5, Array(a, b, c))
I want to split that array into several columns. The expected output would be:
(001, 5, a, b, c)
Any help?
SOLUTION:
I finally resolved the problem.
I joined the array into a single string with:
mkString(",")
Then I converted the RDD to a DataFrame, which let me split the string into columns with the withColumns method.

I think you just need to take the values from the array one by one and put them into a tuple. Try this (where rdd is your RDD and every array has at least three elements):
val result = rdd.map(x => (x._1, x._2, x._3(0), x._3(1), x._3(2)))
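The same positional-indexing idea, sketched in Python on plain in-memory rows rather than a Spark RDD (the `rows` data and the `flatten_row` name are illustrative assumptions). Like the Scala one-liner, it assumes every array has at least three elements.

```python
# Illustrative rows shaped like (String, Int, Array[String]) from the question.
rows = [("001", 5, ["a", "b", "c"])]

def flatten_row(x):
    # Keep the first two fields, then take the first three array elements by position;
    # raises IndexError if the array has fewer than three elements.
    return (x[0], x[1], x[2][0], x[2][1], x[2][2])

result = [flatten_row(x) for x in rows]
print(result)  # [('001', 5, 'a', 'b', 'c')]
```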

If you have something like this,
RDD[(String, Int, List[String])]
in general you should not try to generate an RDD with the elements of that List as columns.
The reason is that Scala is a statically typed language, and your RDD[T] needs to be an RDD of a single type T.
Now let's say your RDD had only the following two "rows" (elements), with lists of different lengths:
("001", 5, List("a", "b", "c"))
("002", 5, List("a", "b", "c", "d"))
As you can see, the first row would need an RDD[(String, Int, String, String, String)], but the second would need an RDD[(String, Int, String, String, String, String)].
The element type of the generated RDD therefore widens to a common supertype such as Any, so you end up with something like an RDD[Any]. That type further restricts what you can do with the elements, because of type erasure at runtime.
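To see why no single tuple type fits, here is a Python sketch using the two sample rows from the text. Python is dynamically typed, so it simply allows rows of different widths, which is exactly what one static tuple type in Scala cannot express.

```python
rows = [("001", 5, ["a", "b", "c"]), ("002", 5, ["a", "b", "c", "d"])]

# Splice each list into its row: (s, i, *items) builds a tuple whose length depends on the list.
flattened = [(s, i, *items) for s, i, items in rows]

# The rows now have different widths, so no single fixed tuple type describes both.
print([len(row) for row in flattened])  # [5, 6]
```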
The special case where you can do this without problems is when you know that every list has the same, known length (let's say 3):
val yourRdd = rdd.map {
  // binds the first three list elements; a list shorter than 3 throws a MatchError
  case (s, i, s1 :: s2 :: s3 :: _) => (s, i, s1, s2, s3)
}
If it is not this special case and your lists can have different, unknown sizes, then even if you really want to, converting a list of unspecified length into a tuple is not easy. At least, I cannot think of an easy way to do it.
I would advise you to avoid trying it without a very solid reason.
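For comparison, a Python sketch of the fixed-length special case. Star-unpacking `s1, s2, s3, *_ = items` plays the role of Scala's `s1 :: s2 :: s3 :: _`; the `ValueError` Python raises for a short list stands in for Scala's `MatchError`. The `flatten_three` name is hypothetical.

```python
def flatten_three(row):
    s, i, items = row
    # Bind the first three items and ignore the rest, like s1 :: s2 :: s3 :: _ in Scala;
    # fewer than three items raises ValueError (Scala would throw a MatchError).
    s1, s2, s3, *_ = items
    return (s, i, s1, s2, s3)

rows = [("001", 5, ["a", "b", "c"]), ("002", 5, ["a", "b", "c", "d"])]
print([flatten_three(r) for r in rows])
```

Note that, like the Scala pattern, this silently drops any elements beyond the third.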

Related

How can I iterate over a list in Elixir to make another list?

I have a list of tuples in the following format:
[{"string1",1},{"string2",2}...]
I'm going to use it in another function, but I realized it's too difficult to operate on the list in that format, so my solution was to transform it into the following:
["string1", "string2",...]
But I'm not sure how to do this, as I'm still learning how Elixir works.
My attempt was this:
for x <- words do
text ++ elem(words,0)
end
Here "text" is an empty list and "words" is the list of tuples. But of course this doesn't work, and I'm not really sure why.
If you want to do it using for, you need to understand that for is not a loop construct for iterating over things, as in other programming languages. In Elixir, for is used for comprehensions, meaning that the result is a new data structure created from an enumerable, like your list of tuples.
You also need to understand that even if you fixed your code (elem(x, 0) rather than elem(words, 0)), text ++ elem(x, 0) wouldn't actually update text, and ++ doesn't work the way you think it does either. ++ is for concatenating lists (including keyword lists), not for adding a single element to a list. For that you could write list ++ ["string"], or ["string"] ++ list, which is faster; or, even simpler, ["string" | list].
The reason it wouldn't update your list is that each "iteration" would just produce the concatenation without assigning it anywhere. Data in Elixir is immutable, so ++ returns a new list rather than updating an existing one.
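Python lists are mutable, unlike Elixir's, but Python's `+` behaves like `++` in the way that matters here: it returns a new list and leaves both operands unchanged. This sketch (variable names borrowed from the question) shows that evaluating a concatenation without assigning it has no effect. The rebinding in the second loop is a Python idiom only; in Elixir, a variable rebound inside a `for` body does not survive past the comprehension, which is why the answer builds the result with `for` itself.

```python
text = []
words = [("string1", 1), ("string2", 2)]

for x in words:
    # The concatenation is computed and then immediately discarded.
    text + [x[0]]

print(text)  # []

for x in words:
    # Rebinding the name keeps each newly built list.
    text = text + [x[0]]

print(text)  # ['string1', 'string2']
```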
Now, in order to correctly create what you want using for, you could do it like this:
iex> list = [{"string1", 1}, {"string2", 2}]
[{"string1", 1}, {"string2", 2}]
iex> for tup <- list, do: elem(tup, 0)
["string1", "string2"]
iex> for {string, _} <- list, do: string
["string1", "string2"]
This basically means: from this list, create a new one, keeping only the first element of each tuple, since a list is the default output of for. You could also change the resulting data structure by adding into: %{} or something else. Comprehensions are very powerful; the Elixir documentation on comprehensions explains them in more detail.
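A Python sketch of the same two ideas: a list comprehension with tuple unpacking mirrors `for {string, _} <- list, do: string`, and a dict comprehension is a rough analog of collecting with `into: %{}` (only an approximation, since Elixir's `into:` accepts any collectable).

```python
pairs = [("string1", 1), ("string2", 2)]

# Like `for {string, _} <- list, do: string`: unpack each tuple and keep the first element.
strings = [string for string, _ in pairs]
print(strings)  # ['string1', 'string2']

# Roughly like `for {k, v} <- list, into: %{}, do: {k, v}`: collect into a dict instead.
as_map = {k: v for k, v in pairs}
print(as_map)  # {'string1': 1, 'string2': 2}
```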
Another popular way to solve it is Enum.map, which maps a given enumerable to something else, meaning that it transforms each element of the enumerable. So you could also do it like this:
iex> list = [{"string1", 1}, {"string2", 2}]
[{"string1", 1}, {"string2", 2}]
iex> Enum.map(list, fn {string, _} -> string end)
["string1", "string2"]
Here the transformation is done by the anonymous function, which takes each tuple, matches its first element into a variable called string, and "returns" just that.
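In Python, the built-in `map` takes the function first and returns a lazy iterator, so a `list(...)` call is needed to get a list back. In this sketch `operator.itemgetter(0)` stands in for the pattern-matching anonymous function.

```python
from operator import itemgetter

pairs = [("string1", 1), ("string2", 2)]

# itemgetter(0) returns the first element of each tuple, like fn {string, _} -> string end.
strings = list(map(itemgetter(0), pairs))
print(strings)  # ['string1', 'string2']
```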
Another simple way to do it is with Enum.unzip/1, which takes a list of two-element tuples, like yours, and produces a tuple of two lists: the first holds the first element of each of your tuples, and the second holds the second element of each.
So, you could do:
iex> list = [{"string1",1},{"string2",2}]
[{"string1", 1}, {"string2", 2}]
iex> {strings, _} = Enum.unzip(list)
{["string1", "string2"], [1, 2]}
iex> strings
["string1", "string2"]
This leaves you with a strings variable containing the list you want. However, this only works with two-element tuples; if any tuple has more than two elements, it won't work. Besides, you wouldn't be using the second list, so the for comprehension may suit you better.
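In Python, `zip(*pairs)` is the closest analog of `Enum.unzip/1`: it transposes the pairs, yielding tuples rather than lists. Unlike `Enum.unzip/1` it also handles longer tuples, but this sketch assumes a non-empty input, since unpacking the empty result into two names raises `ValueError`.

```python
pairs = [("string1", 1), ("string2", 2)]

# zip(*pairs) transposes the pairs: all first elements, then all second elements.
strings, numbers = zip(*pairs)
print(list(strings))  # ['string1', 'string2']
print(list(numbers))  # [1, 2]
```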
The simplest way to solve this is just to use Enum.map: https://hexdocs.pm/elixir/Enum.html#map/2
[{"string1", 1}, {"string2", 2}]
|> Enum.map(fn {string, _} -> string end)
To learn Elixir, just keep doing what you are doing: read the docs, and keep asking here as well.

How to add sequential numbers to a list of tuples

I have a question that looks quite simple, but I haven't found an acceptable answer yet. It looks like variations of it have been asked here several times, but none of the answers helped me.
Here it is:
I have a list of tuples, as follows:
reflist = [("Author1", 1900, "Some reference"), ("Author2", 1901, "Another reference"), ("Author3", 1902, "Yet another reference")]
What I want is to add a sequential number to each tuple in the list, so that I get:
reflist = [(1, "Author1", 1900, "Some reference"), (2, "Author2", 1901, "Another reference"), (3, "Author3", 1902, "Yet another reference")]
This looks silly and a list comprehension should do the trick, but I cannot discern just how :-(
Thanks in advance for any assistance you can provide.
enumerate() runs over a sequence and generates (index, value) pairs. You can't merge the index directly into your tuples (because tuples are immutable, you can't change their length), but one way you could do it is to convert each tuple into a list, make the index number a list, concatenate the two lists, and convert the result back to a tuple:
reflist2 = [tuple([index+1] + list(ref)) for index, ref in enumerate(reflist)]
(I've edited it to index+1 because enumerate starts counting from 0)
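A quick Python check of the list-to-tuple round trip on the sample data from the question; the variable names follow the answer, and the printed rows are shown as comments.

```python
reflist = [("Author1", 1900, "Some reference"),
           ("Author2", 1901, "Another reference"),
           ("Author3", 1902, "Yet another reference")]

# Convert each tuple to a list, prepend its 1-based index, and convert back to a tuple.
reflist2 = [tuple([index + 1] + list(ref)) for index, ref in enumerate(reflist)]

for row in reflist2:
    print(row)
# (1, 'Author1', 1900, 'Some reference')
# (2, 'Author2', 1901, 'Another reference')
# (3, 'Author3', 1902, 'Yet another reference')
```

The original tuples are left untouched; the comprehension builds new ones.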
f = [(i,) + elem for i, elem in enumerate(reflist, start=1)]
What this list comprehension does is: for each original entry in reflist, take its sequential number from enumerate (counting from 1), wrap the number in a one-element tuple, and concatenate it with the entry to build a new tuple; the results are collected into a new list. (Converting each entry to a list and calling insert(0, i) on it does not work inline, because list.insert() modifies the list in place and returns None.)

How to read each element within a tuple from a list

I want to write a program that reads a list of tuples, where each tuple contains two elements: the first can be an object, and the second is the quantity of that object. For example: Mylist([{Object1, Numbers}, {Object2, Numbers}]).
Then I want to read the numbers, print each related object that many times, and store the copies in a list.
So for Mylist([{lol, 3},{lmao, 2}]), I should get [lol, lol, lol, lmao, lmao] as the final result.
My idea is to first unzip the tuples (imagine there are more than two) into two lists: the first containing the objects and the second containing the quantities.
After that, I would read the numbers from the second list and print the related object from the first list exactly that many times. But I don't know how to do this. Thanks for any help!
A list comprehension can do that:
lists:flatten([lists:duplicate(N,A) || {A, N} <- L]).
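The list comprehension translates directly to Python; a nested comprehension replaces `lists:duplicate/2` plus `lists:flatten/1`. The atoms `lol` and `lmao` are represented as strings in this sketch, which is an assumption. (In the Erlang version, lists:flatten/1 flattens deeply, so if the objects were themselves lists, such as strings, they would be flattened too; lists:append/1 removes only one level.)

```python
pairs = [("lol", 3), ("lmao", 2)]

# For each (item, count) pair, emit item count times, all into one flat list.
expanded = [item for item, count in pairs for _ in range(count)]
print(expanded)  # ['lol', 'lol', 'lol', 'lmao', 'lmao']
```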
If you really want printing too, use recursion:
p([]) -> [];
p([{A,N}|T]) ->
    FmtString = string:join(lists:duplicate(N,"~p"), " ")++"\n",
    D = lists:duplicate(N,A),
    io:format(FmtString, D),
    D++p(T).
This code creates a format string for io:format/2 using lists:duplicate/2 to replicate the "~p" format specifier N times, joins them with a space with string:join/2, and adds a newline. It then uses lists:duplicate/2 again to get a list of N copies of A, prints those N items using the format string, and then combines the list with the result of a recursive call to create the function result.
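A recursive Python sketch mirroring `p/1`: it prints each group of copies on one line, separated by spaces, and returns all the copies. `str` is used in place of `~p`, which only approximates Erlang's formatting (for an atom like `lol` the printed text is the same); for long inputs an iterative loop would avoid Python's recursion limit.

```python
def p(pairs):
    # Base case: an empty list yields an empty result.
    if not pairs:
        return []
    (item, count), rest = pairs[0], pairs[1:]
    copies = [item] * count
    # Print the copies on one line, separated by spaces, like the io:format call.
    print(" ".join(map(str, copies)))
    # Combine this pair's copies with the result for the remaining pairs.
    return copies + p(rest)

result = p([("lol", 3), ("lmao", 2)])
```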

Haskell - Convert x number of tuples into a list

I have a question about tuples and lists in Haskell. I know how to add input into a tuple a specific number of times. Now I want to add tuples into a list an unknown number of times; it's up to the user to decide how many tuples they want to add.
How do I add tuples into a list x number of times when I don't know x beforehand?
There are a lot of things you could mean. For example, if you want a few copies of a single value, you can use replicate from the Prelude, which behaves roughly like this:
replicate :: Int -> a -> [a]
replicate 0 _ = []
replicate n x
  | n < 0     = undefined  -- simplified: GHC's Prelude version returns [] for negative n
  | otherwise = x : replicate (n - 1) x
In ghci:
Prelude> replicate 4 ("Haskell", 2)
[("Haskell",2),("Haskell",2),("Haskell",2),("Haskell",2)]
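In Python, list repetition `[value] * n` gives the same result as `replicate n value`. Like GHC's `replicate`, it returns an empty list for zero or negative counts rather than failing. Sharing one object across the copies is harmless here because tuples are immutable.

```python
value = ("Haskell", 2)

# [value] * 4 builds a list holding four references to the same tuple.
copies = [value] * 4
print(copies)

# Zero or negative counts give an empty list.
print([value] * -1)  # []
```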
Alternatively, perhaps you actually want to do some IO to determine the list. Then a simple recursive loop will do:
getListFromUser :: Read a => IO [a]
getListFromUser = do
  putStrLn "keep going?"
  s <- getLine
  case s of
    'y':_ -> do
      putStrLn "enter a value"
      v <- readLn
      vs <- getListFromUser
      return (v:vs)
    _ -> return []
In ghci:
*Main> getListFromUser :: IO [(String, Int)]
keep going?
y
enter a value
("Haskell",2)
keep going?
y
enter a value
("Prolog",4)
keep going?
n
[("Haskell",2),("Prolog",4)]
Of course, this is a particularly crappy user interface -- I'm sure you can come up with a dozen ways to improve it! But the pattern, at least, should shine through: you can use values like [] and functions like : to construct lists. There are many, many other higher-level functions for constructing and manipulating lists, as well.
P.S. There's nothing particularly special about lists of tuples (as compared to lists of other things); the above functions display that by never mentioning them. =)
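A Python analog of `getListFromUser`, written as a loop rather than recursion and sketched under two assumptions: values are typed as Python literals and parsed with `ast.literal_eval` (standing in for `readLn`), and the prompt function is passed in as the hypothetical `ask` parameter so the loop can be driven without a terminal.

```python
import ast

def get_list_from_user(ask=input):
    # Keep asking until the answer does not start with "y", like the Haskell loop.
    values = []
    while ask("keep going? ").startswith("y"):
        # literal_eval parses text such as ("Haskell", 2) into a Python tuple.
        values.append(ast.literal_eval(ask("enter a value ")))
    return values

# Scripted answers replay the GHCi session above.
answers = iter(["y", '("Haskell", 2)', "y", '("Prolog", 4)', "n"])
result = get_list_from_user(lambda _prompt: next(answers))
print(result)  # [('Haskell', 2), ('Prolog', 4)]
```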
Sorry, you can't (but see footnote 1 below). There are fundamental differences between tuples and lists:
A tuple always has a fixed number of elements, known at compile time. Tuples with different numbers of elements are actually different types.
A list can have as many elements as it likes. The number of elements in a list doesn't need to be known at compile time.
A tuple can have elements of arbitrary types. Since the ways you can use tuples always ensure that there is no type mismatch, this is safe.
On the other hand, all elements of a list must have the same type. Haskell is a statically typed language; that basically means that all types are known at compile time.
For these reasons, you can't: if it isn't known how many elements the tuple will hold, you can't give it a type.
I guess that the input you get from your user is actually a string like "(1,2,3)". Try to turn it directly into a list, without making it a tuple first. You could use pattern matching for this, but here is a slightly sneaky approach: I just remove the opening and closing parentheses from the string and put square brackets in their place, and voilà, it becomes a list.
tuplishToList :: String -> [Int]
tuplishToList str = read ('[' : tail (init str) ++ "]")
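The same bracket-swapping trick in Python, as a sketch assuming the input is a well-formed tuple literal; `ast.literal_eval` plays the role of `read`. Unlike the Haskell signature `String -> [Int]`, nothing here enforces that the elements are integers.

```python
import ast

def tuplish_to_list(s):
    # s[1:-1] drops the outer parentheses, like tail (init str) in the Haskell version.
    return ast.literal_eval("[" + s[1:-1] + "]")

print(tuplish_to_list("(1,2,3)"))  # [1, 2, 3]
```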
Edit
Sorry, I did not see your latest comment. What you are trying to do is not that difficult. I use these simple functions for the task:
words str splits str into a list of the words that were separated by whitespace. The output is a list of Strings. Caution: this only works if the strings inside your tuples contain no whitespace. Implementing a better solution is left as an exercise for the reader.
map f lst applies f to each element of lst.
read is a "magic" function that parses a value of some data type from a String. It only works if the type of the result is known in advance, for example from a type signature. If you really want to understand how it works, consider implementing read for your specific use case.
And here you go:
tuplish2List :: String -> [(String,Int)]
tuplish2List str = map read (words str)
1 As others may point out, it may be possible using Template Haskell and other hacks, but I don't consider that a real solution.
When doing functional programming, it is often better to think about composition of operations instead of individual steps. So instead of thinking about it like adding tuples one at a time to a list, we can approach it by first dividing the input into a list of strings, and then converting each string into a tuple.
Assuming each tuple is written on its own line, we can split the input using lines and then use read to parse each tuple. To apply this to the entire list, we use map.
main = do input <- getContents
          let tuples = map read (lines input) :: [(String, Integer)]
          print tuples
Let's try it.
$ runghc Tuples.hs
("Hello", 2)
("Haskell", 4)
Here I press Ctrl+D (or Ctrl+Z on Windows) to send EOF to the program, and it prints the result.
[("Hello",2),("Haskell",4)]
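The same split-then-parse composition as a Python sketch: `splitlines` plays the role of `lines`, and `ast.literal_eval` stands in for `read`. To read standard input like the Haskell program, you would call it as `parse_tuples(sys.stdin.read())`; the sample string below is used instead so the example runs on its own.

```python
import ast

def parse_tuples(text):
    # Split the input into lines, then parse each line as a tuple literal.
    return [ast.literal_eval(line) for line in text.splitlines()]

sample = '("Hello", 2)\n("Haskell", 4)\n'
print(parse_tuples(sample))  # [('Hello', 2), ('Haskell', 4)]
```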
If you want something more interactive, you will probably have to do your own recursion. See Daniel Wagner's answer for an example of that.
One simple solution to this would be to use a list comprehension, like so (done in GHCi):
Prelude> let fstMap tuplist = [fst x | x <- tuplist]
Prelude> fstMap [("String1",1),("String2",2),("String3",3)]
["String1","String2","String3"]
Prelude> :t fstMap
fstMap :: [(t, b)] -> [t]
This will work for an arbitrary number of tuples - as many as the user wants to use.
To use this in your code, you would just write:
fstMap :: Eq a => [(a,b)] -> [a]
fstMap tuplist = [fst x | x <- tuplist]
The example I gave is just one possible solution. As the name implies, of course, you can just write:
fstMap' :: Eq a => [(a,b)] -> [a]
fstMap' = map fst
This is an even simpler solution.
I'm guessing that, since this is for a class and you've been studying Haskell for less than a week, you don't actually need to do any input/output. That's probably a bit beyond where you are yet. So:
As others have said, map fst will take a list of tuples, of arbitrary length, and return the first elements. You say you know how to do that. Fine.
But how do the tuples get into the list in the first place? Well, if you have a list of tuples and want to add another, (:) does the trick. Like so:
oldList = [("first", 1), ("second", 2)]
newList = ("third", 2) : oldList
You can do that as many times as you like. And if you don't have a list of tuples yet, your list is [].
Does that do everything that you need? If not, what specifically is it missing?
Edit: With the corrected type:
Eq a => [(a, b)]
That's not the type of a function. It's the type of a list of tuples. Just have the user type yourFunctionName followed by [ ("String1", val1), ("String2", val2), ... ("LastString", lastVal)] at the prompt.
