Building a list of tuples from a list in Erlang

I am trying to read content from a file and then organize it into a list of tuples. I have read the file into a list of numbers; however, it seems to skip numbers immediately after newlines. How do I prevent this behaviour?
I am guaranteed a file with an even number of characters.
-module(brcp).
-export([parse_file/1]).
parse_file(Filename) ->
    read_file(Filename).

read_file(Filename) ->
    {ok, File} = file:read_file(Filename),
    Content = unicode:characters_to_list(File),
    build_tuples([begin {Int, _} = string:to_integer(Token), Int end
                  || Token <- string:tokens(Content, " /n/r")]).

build_tuples(List) ->
    case List of
        [] -> [];
        [E1, E2 | Rest] -> [{E1, E2}] ++ build_tuples(Rest)
    end.
Here is a sample input file:
1 7 11 0
1 3 5 0 7 0
1 8 10 0 1 11
99 0

-module(tuples).
-export([parse/0]).
parse() ->
    {ok, File} = file:read_file("tuples.txt"),
    List = binary:split(File, [<<" ">>, <<"\t">>, <<"\n">>], [global, trim_all]),
    io:format("~p~n", [List]),
    build_tuples(List, []).

build_tuples([X, Y | T], Acc) ->
    build_tuples(T, [{X, Y} | Acc]);
build_tuples([X | T], Acc) ->
    build_tuples(T, [{X, undefined} | Acc]);
build_tuples([], Acc) ->
    lists:reverse(Acc).
The text file I used is almost the same as yours, but I added tabs and multiple spaces to make it more realistic:
1 7 11 0
1 3 5 0 7 0
1 8 10 0 1 11
99 0
You can of course convert the binaries to integers when adding them to the tuples with erlang:binary_to_integer/1. The binary:split/3 call used in the code splits on the whitespace delimiters (tabs, spaces, newlines); consecutive delimiters produce empty binaries, and the trim_all option drops those. You can skip that option if your input is always well-formed. Result:
14> tuples:parse().
[<<"1">>,<<"7">>,<<"11">>,<<"0">>,<<"1">>,<<"3">>,<<"5">>,<<"0">>,<<"7">>,<<"0">>,<<"1">>,<<"8">>,<<"10">>,<<"0">>,<<"1">>,<<"11">>,<<"99">>,<<"0">>]
[{<<"1">>,<<"7">>},{<<"11">>,<<"0">>},{<<"1">>,<<"3">>},{<<"5">>,<<"0">>},{<<"7">>,<<"0">>},{<<"1">>,<<"8">>},{<<"10">>,<<"0">>},{<<"1">>,<<"11">>},{<<"99">>,<<"0">>}]

Related

Function I defined is not cleaning my list properly

Here is my minimal working example:
list1 = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] #len = 21
list2 = [1,1,1,0,1,0,0,1,0,1,1,0,1,0,1,0,0,0,1,1,0] #len = 21
list3 = [0,0,1,0,1,1,0,1,0,1,0,1,1,1,0,1,0,1,1,1,1] #len = 21
list4 = [1,0,0,1,1,0,0,0,0,1,0,1,1,1,1,0,1,0,1,0,1] #len = 21
I have four lists and I want to "clean" list1 using the following rule: if any of list2[i], list3[i] or list4[i] is equal to zero, then I want to eliminate item i from list1. So basically I only keep those elements of list1 where the other lists all have ones.
Here is the function I wrote to solve this:
def clean(list1, list2, list3, list4):
    for i in range(len(list2)):
        if (list2[i] == 0 or list3[i] == 0 or list4[i] == 0):
            list1.pop(i)
    return list1
However, it doesn't work. If you apply it, it gives the error:
Traceback (most recent call last):
  line 68, in clean
    list1.pop(i)
IndexError: pop index out of range
What am I doing wrong? Also, I was told Pandas is really good at dealing with data. Is there a way I can do it with Pandas? Each of these lists is actually a column (after removing the header) of a CSV file.
EDIT
For example, at the end I would like to get: list1 = [4, 9, 12, 18]
I think the main problem is that at each iteration, when I pop out an element, the indices of all the successors of that element change! And also, the overall length of the list changes, so the index passed to pop() ends up too large. So hopefully there is another strategy or function that I can use.
This is definitely a job for pandas:
import pandas as pd
df = pd.DataFrame({
    'l1': list1,
    'l2': list2,
    'l3': list3,
    'l4': list4
})
no_zeroes = df.loc[(df['l2'] != 0) & (df['l3'] != 0) & (df['l4'] != 0)]
Where df.loc[...] takes the full dataframe, then filters it by the criteria provided. In this example, your criteria are that you only keep the items where l2, l3, and l4 are not zero (!= 0).
Gives you a pandas dataframe:
l1 l2 l3 l4
4 4 1 1 1
9 9 1 1 1
12 12 1 1 1
18 18 1 1 1
or if you need just list1:
list1 = df['l1'].tolist()
if you want the criteria to be where all other columns are 1, then use:
all_ones = df.loc[(df['l2'] == 1) & (df['l3'] == 1) & (df['l4'] == 1)]
Note that I'm creating new dataframes for no_zeroes and all_ones and that the original dataframe stays intact if you want to further manipulate the data.
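Since you mention the lists are really columns of a CSV file, you could also let pandas read the file directly instead of building the lists by hand. This is just a sketch; the file name data.csv and the column names l1 through l4 are assumptions, so adjust them to match your actual header:
import pandas as pd

# Read the CSV; assumes the header row names the columns l1, l2, l3, l4 (hypothetical names)
df = pd.read_csv('data.csv')

# Keep only the rows where l2, l3 and l4 are all non-zero
no_zeroes = df.loc[(df['l2'] != 0) & (df['l3'] != 0) & (df['l4'] != 0)]

# Pull the surviving l1 values back out as a plain list
list1 = no_zeroes['l1'].tolist()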
Update:
Per Divakar's answer (far more elegant than my original answer), much the same can be done in pandas:
df = pd.DataFrame([list1, list2, list3, list4])
list1 = df.loc[0, (df[1:] != 0).all()].astype(int).tolist()
Here's one approach with NumPy -
import numpy as np
mask = (np.asarray(list2)==1) & (np.asarray(list3)==1) & (np.asarray(list4)==1)
out = np.asarray(list1)[mask].tolist()
Here's another way with NumPy that stacks those lists into rows to form a 2D array and thus simplifies things quite a bit -
arr = np.vstack((list1, list2, list3, list4))
out = arr[0,(arr[1:] == 1).all(0)].tolist()
Sample run -
In [165]: arr = np.vstack((list1, list2, list3, list4))
In [166]: print arr
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
[ 1 1 1 0 1 0 0 1 0 1 1 0 1 0 1 0 0 0 1 1 0]
[ 0 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 0 1 1 1 1]
[ 1 0 0 1 1 0 0 0 0 1 0 1 1 1 1 0 1 0 1 0 1]]
In [167]: arr[0,(arr[1:] == 1).all(0)].tolist()
Out[167]: [4, 9, 12, 18]
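For completeness, here is a dependency-free sketch of the same filtering using zip and a list comprehension; it is not from the answers above, but it gives the same [4, 9, 12, 18] result on the sample lists:
# Keep list1[i] only where list2[i], list3[i] and list4[i] are all 1
cleaned = [x1 for x1, x2, x3, x4 in zip(list1, list2, list3, list4)
           if x2 == 1 and x3 == 1 and x4 == 1]
print(cleaned)  # [4, 9, 12, 18]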

Calling numbers from a list to check whether they are in a text file or not

I have a text file in which there are 3 numbers on each line.
I also have a list of numbers, like: lists = [1,2,3,4,5,6]
I would like to find the lines in the text file for which all 3 numbers come from the list. For example:
text file:
11 20 6
3 5 1
30 20 12
I want to find this line: 3 5 1
What is the fastest way to do so?
Using split() and set():
l = [1, 2, 3, 4, 5, 6]

with open('data.txt') as file:
    for i, line in enumerate(file):
        if set(map(int, line.split())).issubset(l):
            print("Line %d has all numbers from the list" % i)
With an example file data.txt like so:
11 20 6
3 5 1
30 20 12
Output:
Line 1 has all numbers from the list
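If you want the matching lines themselves rather than their indices (your example asks for the line 3 5 1), a small variation of the above works; building the set of allowed numbers once is a minor speed-up, and the file name data.txt is the same assumption as before:
allowed = set(l)

with open('data.txt') as file:
    for line in file:
        numbers = [int(n) for n in line.split()]
        # Print the line when it is non-empty and every number on it is in the allowed set
        if numbers and all(n in allowed for n in numbers):
            print(line.strip())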

Random.randint on lists in Python

I want to create a list and fill it with 15 zeros, then change the 0 to 1 in 5 random spots of the list, so it has 10 zeros and 5 ones. Here is what I tried:
import random, time
dasos = []
for i in range(1, 16):
    dasos.append(0)
for k in range(1, 6):
    dasos[random.randint(0, 15)] = 1
Sometimes I would get anywhere from 0 to 5 ones, but I want exactly 5 ones.
If I add:
print(dasos)
...to see my list I get:
IndexError: list assignment index out of range
I think the best solution would be to use random.sample, which picks 5 distinct indices; your loop can hit the same index more than once, and random.randint(0, 15) can return 15, which is out of range for a 15-element list:
my_lst = [0 for _ in range(15)]
for i in random.sample(range(15), 5):
    my_lst[i] = 1
You could also consider using random.shuffle and use the first 5 entries:
my_lst = [0 for _ in range(15)]
candidates = list(range(15))
random.shuffle(candidates)
for i in candidates[0:5]:
    my_lst[i] = 1
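A third option, not covered above but worth sketching: build the five ones and ten zeros first, then shuffle the whole list in place:
import random  # already imported in your code

# Start with exactly 5 ones and 10 zeros, then randomise their positions
my_lst = [1] * 5 + [0] * 10
random.shuffle(my_lst)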
TL;DR: Read the Python random documentation; this can be done in multiple ways.

Printing element of List in a different way

I need to print a List of Lists using Scala and the function toString, where every occurrence of 0 needs to be replaced by an '_'. This is my attempt so far. The commented code represents my different attempts.
override def toString() = {
  // grid.map(i => if(i == 0) '_' else i)
  // grid map{case 0 => '_' case a => a}
  // grid.updated(0, "_")
  // grid.map{ case 0 => "_"; case x => x}
  grid.map(_.mkString(" ")).mkString("\n")
}
My output should look something like this, but with an underscore instead of the zeros:
0 0 5 0 0 6 3 0 0
0 0 0 0 0 0 4 0 0
9 8 0 7 4 0 0 0 5
1 0 0 0 7 0 9 0 0
0 0 9 5 0 1 6 0 0
0 0 8 0 2 0 0 0 7
6 0 0 0 1 8 0 9 3
0 0 1 0 0 0 0 0 0
Thanks in advance.
Just put an extra map in there to change 0 to _
grid.map(_.map(_ match {case 0 => "_"; case x => x}).mkString(" ")).mkString("\n")
Nothing special:
def toString(xs: List[List[Int]]) = xs.map { ys =>
  ys.map {
    case 0 => "_"
    case x => String.valueOf(x)
  }.mkString(" ")
}.mkString("\n")
Although the other solutions are functionally correct, I believe this shows more explicitly what happens and as such is better suited for a beginner:
def gridToString(grid: List[List[Int]]): String = {
  def replaceZero(i: Int): Char =
    if (i == 0) '_'
    else i.toString charAt 0

  val lines = grid map { line =>
    line map replaceZero mkString " "
  }

  lines mkString "\n"
}
First we define a method for converting the digit into a character, replacing zeroes with underscores. (It is assumed from your example that all the Int elements are < 10.)
Then we take each line of the grid, run each of the digits in that line through our conversion method, and assemble the resulting characters into a string.
Then we take the resulting line-strings and join them into the final string.
The whole thing could be written shorter, but it wouldn't necessarily be more readable.
It is also good Scala style to use small inner methods like replaceZero in this example instead of writing all the code inline, as the name of a method helps indicate what it does, and as such enhances readability.
There's always room for another solution. ;-)
A grid:
type Grid[T] = List[List[T]]
Print a grid:
def print[T](grid: Grid[T]) = grid map(_ mkString " ") mkString "\n"
Replace all zeroes:
for (row <- grid) yield row.collect {
  case 0 => "_"
  case anything => anything
}

R - regex index of start position and then add it to a string?

So far I have been able to merge two files and get the following dataframe (df1):
ID someLength someLongerSeq someSeq someMOD someValue
A 16 XCVBNMHGFDSTHJGF NMH T3(P) 7
A 16 XCVBNMHGFDSTHJGF NmH M3(O); S4(P); S6(P) 1
B 24 HDFGKJSDHFGKJSDFHGKLSJDF HFGKJSDFH S9(P) 5
C 22 QIOWEURQOIWERERQWEFFFF RQoIWERER Q16(D); S19(P) 7
D 19 HSEKDFGSFDKELJGFZZX KELJ S7(P); C9(C); S10(P) 1
I am looking for a way to do a regex match based on the "someSeq" column: look for that substring in the "someLongerSeq" column, get the start location of the match, and then add that number to the whole numbers that are attached to the characters, such as T3(P).
Example:
For the second row ("ID":"A"), "someSeq":"NmH" matches starting at location 4 of someLongerSeq (after upper-case conversion of NmH). So I want to add that number 4 to the someMOD fields M3(O); S4(P); S6(P) so that I get M7(O); S8(P); S10(P), and then overwrite the new value in the someMOD column.
And do that for each row; the regex is on a per-row basis.
Any help is really appreciated. Thanks.
First of all, I should mention that it is hard to read your data. I slightly modified it (I removed the spaces from the someMOD column) in order to read it. This is not a problem since you already have your data in a data.frame. So I read the data like this:
dat <- read.table(text='ID someLength someLongerSeq someSeq someMOD someValue
A 16 XCVBNMHGFDSTHJGF NMH T3(P) 7
A 16 XCVBNMHGFDSTHJGF NmH M3(O);S4(P);S6(P) 1
B 24 HDFGKJSDHFGKJSDFHGKLSJDF HFGKJSDFH S9(P) 5
C 22 QIOWEURQOIWERERQWEFFFF RQoIWERER Q16(D);S19(P) 7
D 19 HSEKDFGSFDKELJGFZZX KELJ S7(P);C9(C);S10(P) 1',header=TRUE)
Then the idea is:
- process row by row using apply
- use gregexpr to get the index of someSeq within someLongerSeq
- use gsubfn to add that index to the digits of someMOD
Here is the whole solution:
library(gsubfn)
res <- t(apply(dat, 1, function(x) {
    idx <- gregexpr(x['someSeq'], x['someLongerSeq'],
                    ignore.case = TRUE)[[1]][1]
    x[['someMOD']] <- gsubfn("[[:digit:]]+",
                             function(x) as.numeric(x) + idx,
                             x[['someMOD']])
    x
}))
as.data.frame(res)
ID someLength someLongerSeq someSeq someMOD someValue
1 A 16 XCVBNMHGFDSTHJGF NMH T8(P) 7
2 A 16 XCVBNMHGFDSTHJGF NmH M8(O);S9(P);S11(P) 1
3 B 24 HDFGKJSDHFGKJSDFHGKLSJDF HFGKJSDFH S18(P) 5
4 C 22 QIOWEURQOIWERERQWEFFFF RQoIWERER Q23(D);S26(P) 7
5 D 19 HSEKDFGSFDKELJGFZZX KELJ S18(P);C20(C);S21(P) 1