Reading in TSP file Python

Reading in TSP file Python - python-2.7

I need to figure out how to read in this data of the filename 'berlin52.tsp'
This is the format I'm using
NAME: berlin52
TYPE: TSP
COMMENT: 52 locations in Berlin (Groetschel)
DIMENSION : 52
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 565.0 575.0
2 25.0 185.0
3 345.0 750.0
4 945.0 685.0
5 845.0 655.0
6 880.0 660.0
7 25.0 230.0
8 525.0 1000.0
9 580.0 1175.0
10 650.0 1130.0
And this is my current code
# Open input file
infile = open('berlin52.tsp', 'r')
# Read instance header
Name = infile.readline().strip().split()[1] # NAME
FileType = infile.readline().strip().split()[1] # TYPE
Comment = infile.readline().strip().split()[1] # COMMENT
Dimension = infile.readline().strip().split()[1] # DIMENSION
EdgeWeightType = infile.readline().strip().split()[1] # EDGE_WEIGHT_TYPE
infile.readline()
# Read node list
nodelist = []
N = int(intDimension)
for i in range(0, int(intDimension)):
x,y = infile.readline().strip().split()[1:]
nodelist.append([int(x), int(y)])
# Close input file
infile.close()
The code should read in the file, output out a list of tours with the values "1, 2, 3..." and more while the x and y values are stored to be calculated for distances. It can collect the headers, at least. The problem arises when creating a list of nodes.
This is the error I get though
ValueError: invalid literal for int() with base 10: '565.0'
What am I doing wrong here?

This is a file in TSPLIB format. To load it in python, take a look at the python package tsplib95, available through PyPi or on Github
Documentation is available on https://tsplib95.readthedocs.io/
You can convert the TSPLIB file to a networkx graph and retrieve the necessary information from there.

You are feeding the string "565.0" into nodelist.append([int(x), int(y)]).
It is telling you it doesn't like that because that string is not an integer. The .0 at the end makes it a float.
So if you change that to nodelist.append([float(x), float(y)]), as just one possible solution, then you'll see that your problem goes away.
Alternatively, you can try removing or separating the '.0' from your string input.

There are two problem with the code above.I have run the code and found the following problem in lines below:
Dimension = infile.readline().strip().split()[1]
This line should be like this
`Dimension = infile.readline().strip().split()[2]`
instead of 1 it will be 2 because for 1 Dimension = : and for 2 Dimension = 52.
Both are of string type.
Second problem is with line
N = int(intDimension)
It will be
N = int(Dimension)
And lastly in line
for i in range(0, int(intDimension)):
Just simply use
for i in range(0, N):
Now everything will be alright I think.

nodelist.append([int(x), int(y)])
int(x)
function int() cant convert x(string(565.0)) to int because of "."
add
x=x[:len(x)-2]
y=y[:len(y)-2]
to remove ".0"

Related

Questions about Tuples

So I was able to run part of a program doing below (using tuples)
def reverse_string():
string_in = str(input("Enter a string:"))
length = -int(len(string_in))
y = 0
print("The reverse of your string is:")
while y != length:
print(string_in[y-1], end="")
y = y - 1
reverse_string()
The output is:
Enter a string:I Love Python
The reverse of your string is:
nohtyP evoL I
I am still thinking how for the program to reverse the position of the words instead of per letter.
The desired output will be "Phython Love I"
Is there anyway that I will input a string and then convert it to a tuple similar below:
So If I enter I love Phyton, a certain code will do as variable = ("I" ,"Love", "Python") and put additional codes from there...
Newbie Programmer,
Mac

TypeError: list indices must be integers, not unicode in python code

I used the split() function to convert string to a list time = time.split() and this is how my output looks like :
[u'1472120400.107']
[u'1472120399.999']
[u'1472120399.334']
[u'1472120397.633']
[u'1472120397.261']
[u'1472120394.328']
[u'1472120393.762']
[u'1472120393.737']
Then I tried accessing the contents of the list using print time[1] which gives the index out of range error (cause only a single value is stored in one list). I checked questions posted by other people and used print len(time). This is the output for that:
1
[u'1472120400.107']
1
[u'1472120399.999']
1
[u'1472120399.334']
1
[u'1472120397.633']
1
[u'1472120397.261']
1
[u'1472120394.328']
1
[u'1472120393.762']
1
[u'1472120393.737']
I do this entire thing inside a for loop because I get logs dynamically and have to extract out just the time.
This is part of my code:
line_collect = lines.collect() #spark function
for line in line_collect :
a = re.search(rx1,line)
time = a.group()
time = time.split()
#print time[1] #index out of range error which is why I wrote another for below
for k in time :
time1 = time[k]#trying to put those individual list values into one variable but get type error
print len(time1)
I get the following error :
time1 = time[k]
TypeError: list indices must be integers, not unicode
Can someone tell me how to read each of those single list values into just one list so I can access each of them using a single index[value]. I'm new to python.
My required output:
time =['1472120400.107','1472120399.999','1472120399.334','1472120397.633','1472120397.261','1472120394.328','1472120393.762','1472120393.737']
so that i can use time[1] to give 1472120399.999 as result.

Update: I misunderstood what you wanted. You have the correct output already and it's a string. The reason you have a u before the string is because it's a unicode string that has 16 bits. u is a python flag to distinguish it from a normal string. Printing it to the screen will give you the correct string. Use it normally as you would any other string.
time = [u'1472120400.107'] # One element just to show
for k in time:
print(k)

Looping over a list using a for loop will give you one value at a time, not the index itself. Consider using enumerate:
for k, value in enumerate(time):
time1 = value # Or time1 = time[k]
print(time1)
Or just getting the value itself:
for k in time:
time1 = k
print(time1)
--
Also, Python is zero based language, so to get the first element out of a list you probably want to use time[0].

Thanks for your help. I finally got the code right:
newlst = []
for line in line_collect :
a = re.search(rx1,line)
time = a.group()
newlst.append(float(time))
print newlst
This will put the whole list values into one list.
Output:
[1472120400.107, 1472120399.999, 1472120399.334, 1472120397.633,
1472120397.261, 1472120394.328, 1472120393.762, 1472120393.737]

Reading In Integers in Python

So, my question is simple. I'm simply struggling with syntax here. I need to read in a set of integers, 3, 11, 2, 4, 4, 5, 6, 10, 8, -12. What I want to do with those integers is place them in a list as I'm reading them. n = n x n array in which these will be presented. so if n = 3, then i will be passed something like this 3 \n 11 2 4 \n 4 5 6 \n 10 8 -12 ( \n symbolizing a new line in input file)
n = int(raw_input().strip())
a = []
for a_i in xrange(n):
value = int(raw_input().strip())
a.append(value)
print(a)
I receive this error from the above code code:
value = int(raw_input().strip())
ValueError: invalid literal for int() with base 10: '11 2 4'
The actual challenge can be found here, https://www.hackerrank.com/challenges/diagonal-difference .
I have already completed this in Java and C++, simply trying to do in Python now but I suck at python. If someone wants to, they don't have too, seeing the proper way to read in an entire line, say " 11 2 4 ", creating a new list out that line, and adding it to an already existing list. So then all I have to do is search said index of list[ desiredInternalList[ ] ].

You can split the string at white space and convert the entries into integers.
This gives you one list:
for a_i in xrange(n):
a.extend([int(x) for x in raw_input().split()])
and this a list of lists:
for a_i in xrange(n):
a.append([int(x) for x in raw_input().split()]):

You get this error because you try to give all inputs in one line. To handle this issue you may use this code
n = int(raw_input().strip())
a = []
while len(a)< n*n:
x=raw_input().strip()
x = map(int,x.split())
a.extend(x)
print(a)

Assinging 1 character from 1 list to another python

Hi i am making a decryption machine for my school project but i cant get it to work can you guys help me out?
Thanks already.
the error is: line 17, IndexError: list index out of range
The length of zin = 86 just so you know
this is what is in the file i need to decrypt: KEIGO N JIDOUBANEUOFIDNEIESUN IRAEI ESTIGIVNKMUEEER RDONAEOIW ENEZAEE NAML VN NILLRA
with open('something.txt', 'r') as fhandle:
key = 3
#reading the file
zin = list(fhandle.readline())
#setting up solution to which we will output
solution = list(" ")*86
solution[0] = zin[0]
#while loop in which we use the key to decrypt the message
i = 1
while i < len(zin):
solution[i] = zin[key] #this is where i get the error
i += 1
key += key
if i > 86:
break
print(solution)

Since you are accessing zin[key], you need to verify length of zin is at least key+1.

Need help in improving the speed of my code for duplicate columns removal in Python

I have written a code to take a text file as input and print only the variants which repeat more than once. By variants I mean, chr positions in the text file.
The input file looks like this:
chr1 1048989 1048989 A G intronic C1orf159 0.16 rs4970406
chr1 1049083 1049083 C A intronic C1orf159 0.13 rs4970407
chr1 1049083 1049083 C A intronic C1orf159 0.13 rs4970407
chr1 1113121 1113121 G A intronic TTLL10 0.13 rs12092254
As you can see, rows 2 and 3 repeat. I'm just taking the first 3 columns and seeing if they are the same. Here, chr1 1049083 1049383 repeat in both row2 and row3. So I print out saying that there is one duplicate and it's position.
I have written the code below. Though it's doing what I want, it's quite slow. It takes me about 5 min to run on a file which have 700,000 rows. I wanted to know if there is a way to speed things up.
Thanks!
#!/usr/bin/env python
""" takes in a input file and
prints out only the variants that occur more than once """
import shlex
import collections
rows = open('variants.txt', 'r').read().split("\n")
# removing the header and storing it in a new variable
header = rows.pop()
indices = []
for row in rows:
var = shlex.split(row)
indices.append("_".join(var[0:3]))
dup_list = []
ind_tuple = collections.Counter(indices).items()
for x, y in ind_tuple:
if y>1:
dup_list.append(x)
print dup_list
print len(dup_list)
Note: In this case the entire row2 is a duplicate of row3. But this is not necessarily the case all the time. Duplicate of chr positions (first three columns) is what I'm looking for.
EDIT:
Edited the code as per the suggestion of damienfrancois. Below is my new code:
f = open('variants.txt', 'r')
indices = {}
for line in f:
row = line.rstrip()
var = shlex.split(row)
index = "_".join(var[0:3])
if indices.has_key(index):
indices[index] = indices[index] + 1
else:
indices[index] = 1
dup_pos = 0
for key, value in indices.items():
if value > 1:
dup_pos = dup_pos + 1
print dup_pos
I used, time to see how long both the code takes.
My original code:
time run remove_dup.py
14428
CPU times: user 181.75 s, sys: 2.46 s,total: 184.20 s
Wall time: 209.31 s
Code after modification:
time run remove_dup2.py
14428
CPU times: user 177.99 s, sys: 2.17 s, total: 180.16 s
Wall time: 222.76 s
I don't see any significant improvement in the time.

Some suggestions:
do not read the whole file at once ; read line by line and process it on the fly ; you'll save memory operations
let indices be a default dict and increment the value at key "_".join(var[0:3]) ; this saves the costly (guessing here, should use a profiler) collections.Counter(indices).items() step
try pypy or a python compiler
split your data in as many subsets as your computer has cores, apply the program to each subset in parallel then merge the results
HTH

A big time sink is probably the if..has_key() portion of the code. In my experience, try-except is a lot faster...
f = open('variants.txt', 'r')
indices = {}
for line in f:
var = line.split()
index = "_".join(var[0:3])
try:
indices[index] += 1
except KeyError:
indices[index] = 1
f.close()
dup_pos = 0
for key, value in indices.items():
if value > 1:
dup_pos = dup_pos + 1
print dup_pos
Another option there would be replace the four try except lines with:
indices[index] = 1 + indices.get(index,0)
This approach only tells how many lines of the lines are duplicated, and not how many times they are repeated. (So if one line is duped 3x, then it will say one...)
If you are only trying to count the duplicates and not delete or note them, you could tally the lines of the file as you go, and compare this to the length of the indices dictionary, and the difference is the number of dupe lines (instead of looping back through and re-counting). This might save a little time, but gives a different answer:
#!/usr/bin/env python
f = open('variants.txt', 'r')
indices = {}
total_len=0
for line in f:
total_len +=1
var = line.split()
index = "_".join(var[0:3])
indices[index] = 1 + indices.get(index,0)
f.close()
print "Number of duplicated lines:", total_len - len(indices.keys())
I'd be curious to hear what your benchmarks are for code that does not include the has_key() test...

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Reading in TSP file Python - python-2.7

This is a file in TSPLIB format. To load it in python, take a look at the python package tsplib95, available through PyPi or on Github Documentation is available on https://tsplib95.readthedocs.io/ You can convert the TSPLIB file to a networkx graph and retrieve the necessary information from there.

nodelist.append([int(x), int(y)]) int(x) function int() cant convert x(string(565.0)) to int because of "." add x=x[:len(x)-2] y=y[:len(y)-2] to remove ".0"

Related

Questions about Tuples

TypeError: list indices must be integers, not unicode in python code

Reading In Integers in Python

Assinging 1 character from 1 list to another python

Need help in improving the speed of my code for duplicate columns removal in Python

Categories

Resources