How to search elements of a list in a file

How to search elements of a list in a file - python-2.7

Please check the code for searching list elements in file.
f=open("a.txt","r")
p=open("b.txt","r")
disk=[]
for line in p:
line = line.strip()
disk.append(line)
for line in f.readlines():
for word in disk[0]:
if word in line:
print line
The list is below:
>>> disk
['5000cca025884d5', '5000cca025a1ee6']
I want to search this list elements in file below, but I am not getting the output for index 0.
c0t5000CCA025A1EE6Cd0 <preSUN30G-A2B0-279.40GB> /scsi_vhci/disk#g5000cca025a1ee6c
1. c0t5000CCA025A28FECd0 <preSUN30G-A2B0-279.40GB> i/disk#g5000cca025a28fec
2. c0t5000CCA0258BA1DCd0 <HsdfdsSUN30G-A2B0 cyl 46873 alt 2 hd 20 sec 625> i/disk#g5000cca0258ba1dc
3. c0t5000CCA025884D5Cd0 <UN300G cyl 46873 alt 2 hd 20 sec 625> solaris i/disk#g5000cca025884d5c
4. c0t5000CCA02592705Cd0 <UN300G cyl 46873 alt 2 hd 20 sec 625> solaris i/disk#g5000cca02592705c

The only error that presents itself in your code is this:
for word in disk[0]:
As I mentioned in the comments, what this does is grab the first string in the disk list and start iterating over the individual characters. This will lead to most of the lines in a.txt getting printed multiple times.
Another possible problem would be getting the two files backwards. I did this accidentally when I was trying to duplicate your problem. When the files are backwards, nothing gets printed, because none of the lines in a.txt are in b.txt (in fact, most of them are much longer).
Here is a project on repl.it that shows the program working.

Related

Reading records from a file in FORTRAN66 using stdin adding extra unwanted junk

I'm trying to read a file in the format specified below using FORTRAN 66.
1000
MS 1 - Join Grps Group Project 5 5
Four Programs Programming 15 9
Quiz 1 Quizzes 10 7
FORTRAN Programming 25 18
Quiz 2 Quizzes 10 9
HW 1 - Looplang Homework 20 15
I execute and read the file like so:
program < grades.txt
The first line is the total number of points that can be earned in a class
The rest of the lines are assignments in a class
Each line is formatted as such: Assignment name(20 chars) category (20 chars) possible points(14 chars) earned points(14 chars)
For some reason, when the code runs and reads the file, starting at the first assignment record, I get error 5006, and cannot find an explanation of the error code. The output of the program while debugging looks like this:
$ file < grades.txt
MS 1 - Join Grps Group Project 5 6417876
NOT EOF
EOF 5006
NAME CATEGORY POSSIBLE EARNED
My goal is to be able to read each line and put each column into it's appropriate array, then reference those arrays later on to print a report for each category, with each assignment, points possible, earned, and total percentage for the category, then loop, etc.
I do not understand where the "6417876" in the output is coming from, it is definitely not part of the file that's being piped into stdin while the program reads.
The code for the program is as follows:
CHARACTER*20 ASSIGNMENTT(100)
CHARACTER*20 CATEGORY(100)
INTEGER POSSIBLE(100)
INTEGER EARNED(100)
INTEGER TOTALPTS
INTEGER REASON
INTEGER I, N
READ(5,50)TOTALPTS
50 FORMAT(I4)
c Read the arrays in
I=1
100 READ(5,110,IOSTAT=REASON)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
110 FORMAT(2A20x,2I14x)
WRITE(*,110)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
I=I+1
IF (REASON < 0) GOTO 120
WRITE(*,*)"NOT EOF"
IF (I<100 .AND. REASON == 0) GOTO 100
WRITE(*,*)"EOF", REASON
c Get the number of items (For some reason stdin adds an extra item that's not in the file, so I subtract 2 instead of 1
120 N=I-2
c Display the Names and Ages
WRITE(*,200)
200 FORMAT("NAME",T20,"CATEGORY",T40,"POSSIBLE",T54,"EARNED",T68)
DO 300 I=1,N
210 FORMAT(A20,A20,I14,I14)
300 WRITE(*,210)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
END
What could be causing the read issues I'm facing?

The line to read the file contents was too long, so I shortened the names of the variables to save some space and the problem was solved.

In python insert one space after every 5th Character in each line of a text file

I am reading a text file in python(500 rows) and it seems like:
File Input:
0082335401
0094446049
01008544409
01037792084
01040763890
I wanted to ask that is it possible to insert one space after 5th Character in each line:
Desired Output:
00823 35401
00944 46049
01008 544409
01037 792084
01040 763890
I have tried below code
st = " ".join(st[i:i + 5] for i in range(0, len(st), 5))
but the below output was returned on executing it:
00823 35401
0094 44604 9
010 08544 409
0 10377 92084
0104 07638 90
I am a novice in Python. Any help would make a difference.

There seems to be two issues here - By running your provided code, you seem to be reading the file into one single string. It would be much preferable (in your case) to read the file in as a list of strings, like the following (assuming your input file is input_data.txt):
# Initialize a list for the data to be stored
data = []
# Iterate through your file to read the data
with open("input_data.txt") as f:
for line in f.readlines():
# Use .rstrip() to get rid of the newline character at the end
data.append(line.rstrip("\r\n"))
Then, to operate on the data you obtained in a list, you could use a list comprehension similar to the one you have tried to use.
# Assumes that data is the result from the above code
data = [i[:5] + " " + i[5:] if len(i) > 5 else i for i in data]
Hope this helped!

If your only requirement is to insert a space after the fifth character than you could use the following simple version:
#!/usr/bin/env python
with open("input_data") as data:
for line in data.readlines():
line = line.rstrip()
if len(line) > 5:
print(line[0:5]+" "+line[5:])
else:
print(line)
If you don't mind if lines with less than five characters get a space at the end, you could even omit the if-else-statement and go with the print-function from the if-clause:
#!/usr/bin/env python
with open("input_data") as data:
for line in data.readlines():
line = line.rstrip()
print(line[0:5]+" "+line[5:])

Codeeval Challenge 230: Football, Answer Only Partially Correct

I am working on a relatively new challenge in CodeEval called 'Football.' The description is listed in the following link:
https://www.codeeval.com/open_challenges/230/
Inputs are lines of a file read by Python, and within each line there are lists separated by '|', with each list representing a country: the first being country "1", second being country "2", and so on.
1 2 3 4 | 3 1 | 4 1
19 11 | 19 21 23 | 31 39 29
Outputs are also lines in response to each line read from the file.
1:1,2,3; 2:1; 3:1,2; 4:1,3;
11:1; 19:1,2; 21:2; 23:2; 29:3; 31:3; 39:3;
so country 1 supports team 1, 2, and 3 as shown in the first line of output: 1:1,2,3.
Below is my solution, and since I have no clue why the solution only works for the two sample cases lited in the description link, I'd like to ask anyone for comments and hints on how to correct my code. Thank you very much for your time and assistance ahead of time.
import sys
def football(string):
countries = map(str.split, string.split('|'))
teams = sorted(list(set([i[j] for i in countries for j in range(len(i))])))
results = []
for i in range(len(teams)):
results.append([teams[i]+':'])
for j in range(len(countries)):
if teams[i] in countries[j]:
results[i].append(str(j+1))
for i in range(len(results)):
results[i] = results[i][0]+','.join(results[i][1:])
return '; '.join(results) + '; '
if __name__ == '__main__':
lines = [line.rstrip() for line in open(sys.argv[1])]
for line in lines:
print football(line)

After deliberately failing an attempt to checkout the complete test input and my output, I found the problem. The line:
teams = sorted(list(set([i[j] for i in countries for j in range(len(i))])))
will make the output problematic in terms of sorting. For example here's a sample input:
10 20 | 43 23 | 27 | 25 | 11 1 12 43 | 33 18 3 43 41 | 31 3 45 4 36 | 25 29 | 1 19 39 | 39 12 16 28 30 37 | 32 | 11 10 7
and it produces the output:
1:5,9; 10:1,12; 11:5,12; 12:5,10; 16:10; 18:6; 19:9; 20:1; 23:2; 25:4,8; 27:3; 28:10; 29:8; 3:6,7; 30:10; 31:7; 32:11; 33:6; 36:7; 37:10; 39:9,10; 4:7; 41:6; 43:2,5,6; 45:7; 7:12;
But the challenge expects the output teams to be sorted by numbers in ascending order, which is not achieved by the above-mentioned code as the numbers are in string format, not integer format. Therefore the solution is simply adding a key to sort the teams list by ascending order for integer:
teams = sorted(list(set([i[j] for i in countries for j in range(len(i))])), key=lambda x:int(x))
With a small change in this line, the code passes through the tests. A sample output looks like:
1:5,9; 3:6,7; 4:7; 7:12; 10:1,12; 11:5,12; 12:5,10; 16:10; 18:6; 19:9; 20:1; 23:2; 25:4,8; 27:3; 28:10; 29:8; 30:10; 31:7; 32:11; 33:6; 36:7; 37:10; 39:9,10; 41:6; 43:2,5,6; 45:7;
Please let me know if you have a better and more efficient solution to the challenge. I'd love to read better codes or great suggestions on improving my programming skills.

Here's how I solved it:
import sys
with open(sys.argv[1]) as test_cases:
for test in test_cases:
if test:
team_supporters = {}
for nation, nation_teams in enumerate(test.strip().split("|"), start=1):
for team in map(int, nation_teams.split()):
team_supporters.setdefault(team, []).append(nation)
print(*("{}:{};".format(team, ",".join(map(str, sorted(nations))))
for team, nations in sorted(team_supporters.items())))
The problem is not very complicated. We're given a mapping from nation (implicitly numbered by their order in the input) to a list of teams. We need to reverse that to create an output that maps from a team to a list of nations.
It seems natural to use a dictionary that maps in the same way as the desired output. We can use enumerate to give numbers to the nations as we iterate over them. The setdefault method of the dict adds empty lists to the dictionary as they are needed (using a collections.defaultdict instead of a regular dictionary would be another way to deal with this). We don't need to care about the order of the input, nor the order things are stored in the dictionary's inner lists.
The output we build using str.format calls and the default space separator of the print function. If the final semicolon wasn't desired, I'd have used print("; ".join("{}:{}.format(...))) instead. Since the output needs to be sorted by team at the top level, and by nation in the inner lists, we make some sorted calls where necessary.
Sorting the inner lists is probably not even be necessary, since the nations were processed in order, with their numbers derived from the order they had in the input line. Fortunately, Python's Timsort algorithm is very fast on already-sorted input, so even with a bit of unnecessary sorting, our code is still fast enough.

Compare two CSV files in Python

I have two CSV files as follows:
CSV1:
**ID Name Address Ph**
1 Mr.C dsf 142
2 Ms.N asd 251
4 Mr.V fgg 014
12 Ms.S trw 547
CSV2:
**ID Name Service Day**
1 Mr.C AAA Mon
2 Ms.N AAA Mon
2 Ms.N BBB Tue
2 Ms.N AAA Sat
As you can see very quickly CSV1 file is unique in having only 1 instance of every ID whilst CSV2 has repeats.
I am trying to match two CSV files based on ID and then wherever they match adding to CSV2 file the Address and Ph fields from CSV1. This is then saved as a new output file leaving the two original CSV files intact.
I have written a code but here's what's happening:
Either all the entries from CSV1 get added against the last row of CSV2
Or all the entries from CSV2 get the same address details appended against them
Here's what I have done so far.
import csv
csv1=open('C:\csv1file.csv')
csv2=open('C:\csv2file.csv')
csv1reader=csv.reader(csv1)
csv2reader=csv.reader(csv2)
outputfile=open('C:\mapped.csv', 'wb')
csvwriter=csv.writer(outputfile)
counter=0
header1=csv1reader.next()
header2=csv2reader.next()
csvwriter.writerow(header2+header1[2:4])
for row1 in csv1reader:
for row2 in csv2reader:
if row1[0]==row2[0]:
counter=counter+1
csvwriter.writerow(row2+row1[2:4])
I am running this code in Python 2.7. As you might have guessed the two different results that I am getting are based on the indentation of the csvwriter statement in the above code. I feel I am quite close to the answer and understand the logic but somehow the loop doesn't loop very well.
Can any one of you please assist?
Thanks.

The problem arises because the inner loop only works once. the reason for that is, because csv2reader will be empty after you run the loop once
a way to fix this would be to make a copy of the rows in the second file and use that copy in the loop
csvwriter.writerow(header2+header1[2:4])
csv2copy=[]
for row2 in csv2reader: csv2copy.append(row2)
for row1 in csv1reader:
for row2 in csv2copy:
print row1,row2,counter
if row1[0]==row2[0]:
counter=counter+1
csvwriter.writerow(row2+row1[2:4])

Fortran95 -- Reading from a formatted text file

I need to read some values from a table. These are the first five rows, to give you some idea of what it should look like:
1 + 3 98 96 1
2 + 337 2799 2463 1
3 + 2801 3733 933 1
4 + 3734 5020 1287 1
5 + 5234 5530 297 1
My interest is in the first four columns of each row. I need to read these into arrays. I used the following code:
program ----
implicit none
integer, parameter :: totbases = 4639675, totgenes = 4395
integer :: codtot, ks
integer, dimension(totgenes) :: ngene, lend, rend
character :: genome*4639675, sign*4
open(1,file='e_coli_g_info')
open(2,file='e_coli_g_str')
do ks = 1, totgenes
read(1,100) ngene(ks),sign(ks:ks),lend(ks), rend(ks)
end do
100 format(1x,i4,8x,a1, 2(5x,i7), 22x)
do ks = 1, 100
write(*,*) ngene(ks), sign(ks:ks),lend(ks), rend(ks)
end do
end program
The loop at the end of the program is to print the first hundred entries to test that they are being read correctly. The problem is that I am getting this garbage (the fourth row is the problem):
1 + 3 757934891
2 + 337 724249387
3 + 2801 757803819
4 + 3734 757803819
5 + 5234 757935405
Clearly, the fourth column is way off. In fact, I cannot find these values anywhere in the file that I am reading from. I am using the gfortran compiler for Ubuntu 12.04. I would greatly appreciate if somebody would point me in the right direction. I'm sure it's likely that I'm missing something very obvious because I'm new at Fortran.

Fortran formats are (traditionally, there's some newer stuff that I won't go into here) fixed format, that is, they are best suited for file formats with fixed columns. I.e. column N always starts at character position M, no ifs or buts. If your file format is more "free format"-like, that is, columns are separated by whitespace, it's often easier and more robust to read data using list formatting. That is, try to do your read loop as
do ks = 1, totgenes
read(1, *) ngene(ks), sign(ks:ks), lend(ks), rend(ks)
end do
Also, as a general advice, when opening your own files, start from unit 10 and go upwards from there. Fortran implementations typically use some of the low-numbered units for standard input, output, and error (a common choice is units 1, 5, and 6). You probably don't want to redirect those.
PS 2: I haven't tried your code, but it seems that you have a bounds overflow in the sign variable. It's declared of length 4, but then you assign to index ks which goes all the way up to totgenes. As you're using gfortran on Ubuntu 12.04 (that is, gfortran 4.6), when developing compile with options "-O1 -Wall -g -fcheck=all"

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to search elements of a list in a file - python-2.7

Related

Reading records from a file in FORTRAN66 using stdin adding extra unwanted junk

In python insert one space after every 5th Character in each line of a text file

Codeeval Challenge 230: Football, Answer Only Partially Correct

Compare two CSV files in Python

Fortran95 -- Reading from a formatted text file

Categories

Resources