I have a text file that contains a bunch of items:
One
Two
Three
Three
Four
Five
Five
and I want it to print Three and Five, since they appear more than once.
In Python it would look like this:
lines = {}
with open("file.txt", "r") as src:
    for line in src:
        line = line.rstrip("\n")
        lines[line] = lines.get(line, 0) + 1
for line_key in lines.keys():
    if lines.get(line_key, 0) > 1:
        print(line_key)
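A minimal Python 3 sketch of the same idea using `collections.Counter`. The function name `repeated_lines` is hypothetical, and it takes any iterable of lines (an open file object works) instead of a hard-coded path.

```python
from collections import Counter


def repeated_lines(lines):
    """Return the lines that occur more than once, in first-seen order."""
    # Strip the trailing newline so "Five\n" and "Five" count as the same item.
    counts = Counter(line.rstrip("\n") for line in lines)
    # Counter keeps insertion order (Python 3.7+), so results follow the input.
    return [item for item, count in counts.items() if count > 1]


if __name__ == "__main__":
    sample = ["One\n", "Two\n", "Three\n", "Three\n", "Four\n", "Five\n", "Five\n"]
    for item in repeated_lines(sample):
        print(item)
```

With a real file you would pass the file object directly, e.g. `repeated_lines(src)` inside the `with open("file.txt") as src:` block.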
Related
How can I get specific lines from a file and add them to an array?
For example, I want to get lines 200-300 and put them in an array, and at the same time count the total number of lines in the file. The file can be quite big.
File.each_line is a good reference for this:
lines = [] of String
index = 0
range = 200..300
File.each_line(file, chomp: true) do |line|
  index += 1
  if range.includes?(index)
    lines << line
  end
end
Now lines holds the lines in the range, and index holds the total number of lines in the file.
To avoid reading the entire file and allocating an array for all of its content, you can use the File.each_line iterator and stop early:
lines = [] of String
File.each_line(file, chomp: true).with_index(1) do |line, idx|
  case idx
  when 1...200 then next            # omit lines before line 200 (note the exclusive range)
  when 200..300 then lines << line  # collect lines 200-300
  else break                        # stop early, to be efficient
  end
end
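For comparison with the Python examples elsewhere on this page, here is a rough Python 3 sketch of both Crystal answers; the function names are hypothetical. The first keeps reading to the end so it can report the total line count; the second uses `itertools.islice` to stop after the last wanted line, so, like the early-`break` version, it cannot also report the total.

```python
from itertools import islice


def lines_in_range(lines, first, last):
    """Collect 1-based lines first..last (inclusive) and count all lines."""
    selected = []
    total = 0
    for total, line in enumerate(lines, start=1):
        if first <= total <= last:
            selected.append(line.rstrip("\n"))
    return selected, total


def lines_in_range_only(lines, first, last):
    """Collect 1-based lines first..last without reading past `last`."""
    # islice uses 0-based, end-exclusive indices, hence first - 1 and last.
    return [line.rstrip("\n") for line in islice(lines, first - 1, last)]


if __name__ == "__main__":
    data = ["line%d\n" % i for i in range(1, 501)]
    selected, total = lines_in_range(data, 200, 300)
    print(len(selected), total)
```

Both functions accept an open file object, which is iterated lazily, so only the wanted lines are kept in memory.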
The text file can be found at this link. I am interested in the PE score values, which appear under the column Feature2 sys.
This is my code:
def main():
    file = open("combined_scores.txt", "r")
    lines = file.readlines()
    file.close()
    count_pe = 0
    for line in lines:
        line = line.strip()
        line = line[24:31]  # problem 1: the column position is not the same on every line
        if line.find("3.19") != -1:  # problem 2: I need values >= 3.19, not only 3.19
            count_pe = count_pe + 1
    print(">=3.19: ", count_pe)  # at the end I need how many times PE >= 3.19 occurs

main()
I suggest splitting each line on the tab character (\t) and converting the column to a number before comparing it with 3.19. It should be something like this (Python 2.7):
with open('combined_scores.txt') as f:
    lines = f.readlines()[1:]  # remove the header line
# reset counter
n = 0
for line in lines:
    if float(line.split('\t')[-3]) >= 3.19:
        n = n + 1
# print total count
print 'total=', n
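A Python 3 sketch of the same approach with the parsing made explicit. It assumes, as the answer does, that the file is tab-separated, has exactly one header line, and that the PE score is the third column from the end; `count_at_least` and its `column`/`sep` parameters are hypothetical names, and blank or malformed rows are skipped rather than raising.

```python
import io


def count_at_least(lines, threshold, column=-3, sep="\t"):
    """Count data rows whose value in `column` is >= threshold."""
    rows = iter(lines)
    next(rows, None)  # skip the header line (assumed to be exactly one)
    count = 0
    for line in rows:
        fields = line.rstrip("\n").split(sep)
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            continue  # blank or malformed row
        if value >= threshold:
            count += 1
    return count


if __name__ == "__main__":
    # Made-up sample data in the assumed layout.
    sample = io.StringIO(
        "name\tFeature1\tFeature2 sys\tx\ty\n"
        "a\t1.0\t3.50\t0\t0\n"
        "b\t1.0\t3.19\t0\t0\n"
        "c\t1.0\t2.00\t0\t0\n"
    )
    print(">=3.19:", count_at_least(sample, 3.19))
```

Comparing floats rather than searching for the substring "3.19" is what makes values such as 3.50 count.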
You are given an integer N on one line. The next line contains N space-separated integers. Create a tuple of those N integers. Let's call it T.
Compute hash(T) and print it.
Note: Here, hash() is one of the functions in the __builtins__ module.
Input Format
The first line contains N. The next line contains N space-separated integers.
Output Format
Print the computed value.
Sample Input
2
1 2
Sample Output
3713081631934410656
My code
a=int(raw_input())
b=()
i=0
for i in range (0,a):
x=int(raw_input())
c = b + (x,)
i=i+1
hash(b)
Error:
invalid literal for int() with base 10: '1 2'
There are three errors that I can spot:
First, your for-loop is not indented.
Second, you should not be adding 1 to i - the for-loop does this automatically.
Third - and this is where the error is thrown - raw_input reads the entire line. If you read the line '1 2', you cannot convert it to an int.
To fix this problem, I suggest doing:
line = tuple(map(int,raw_input().split(' ')))
This takes the raw input, splits it into a list, converts each item to an int, then turns the result into a tuple.
In fact, you can scrap the entire for loop. You could answer this problem in two lines of code:
raw_input()#To get rid of the first line, which we do not need
print hash(tuple(map(int,raw_input().split(' '))))
The input format says:
next line contains N space separated integers
A line such as 1 2 3 is not an integer (because of the spaces), which is why int(raw_input()) throws an error. You should use split(' '), as the other answer suggested, to separate the integers. This will remove the error.
Also, there is no need for i=i+1, as the for-loop takes care of it.
Try the code below:
if __name__ == '__main__':
    n = int(input())
    integer_list = map(int, input().split())
    t = tuple(integer_list)
    print(hash(t))
Try this code for Python 3:
if __name__ == '__main__':
    n = int(input())
    integer_list = input().split()
    input_list = [int(x) for x in integer_list]
    t = tuple(input_list)
    print(hash(t))
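A testable Python 3 sketch of the same solution, taking the whole input as a string so it can be checked without stdin. The function name `tuple_hash` is hypothetical, and the length check against N is an extra safeguard, not part of the task. The exact number printed depends on the Python version and platform (CPython changed its tuple hash algorithm in 3.8), so it may not match the sample output.

```python
def tuple_hash(text):
    """Parse the two-line input format and return hash() of the tuple."""
    first, second = text.splitlines()[:2]
    n = int(first)
    # split() with no argument splits on any run of whitespace.
    values = tuple(int(token) for token in second.split())
    # Sanity check: the second line should hold exactly n integers.
    if len(values) != n:
        raise ValueError("expected %d integers, got %d" % (n, len(values)))
    return hash(values)


if __name__ == "__main__":
    print(tuple_hash("2\n1 2\n"))
```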
I am new to Python. I'm trying to write a script that uses the numeric columns from a file which also contains a header. Here is an example of the file:
#File_Version: 4
PROJECTED_COORDINATE_SYSTEM
#File_Version____________-> 4
#Master_Project_______->
#Coordinate_type_________-> 1
#Horizon_name____________->
sb+
#Horizon_attribute_______-> STRUCTURE
474457.83994 6761013.11978
474482.83750 6761012.77069
474507.83506 6761012.42160
474532.83262 6761012.07251
474557.83018 6761011.72342
474582.82774 6761011.37433
474607.82530 6761011.02524
I'd like to skip the header. Here is what I tried. It works, of course, if I know which characters will appear in the header, like "#" and "#". But how can I skip all lines containing any letter?
import re

in_file1 = open(input_file1_short, 'r')
out_file1 = open(output_file1_short, "w")
lines = in_file1.readlines()
x = []
y = []
for line in lines:
    if "#" not in line and "#" not in line:
        strip_line = line.strip()
        replace_split = re.split(r'[ ,|;"\t]+', strip_line)
        x = (replace_split[0])
        y = (replace_split[1])
        out_file1.write("%s\t%s\n" % (str(x), str(y)))
in_file1.close()
out_file1.close()
Thank you very much!
I think you could use some built-ins like this:
import string
for line in lines:
    if any([letter in line for letter in string.ascii_letters]):
        print "there is an ascii letter somewhere in this line"
This is only looking for ascii letters, however.
You could also do:
import unicodedata
for line in lines:
    if any([unicodedata.category(unicode(letter)).startswith('L') for letter in line]):
        print "there is a unicode letter somewhere in this line"
but only if I understand my Unicode categories correctly.
Even cleaner, using suggestions from other answers (this works for both unicode and byte strings):
for line in lines:
    if any([letter.isalpha() for letter in line]):
        print "there is a letter somewhere in this line"
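Combining the `isalpha()` test with the asker's parsing step gives the Python 3 sketch below. The generator name `numeric_rows` is hypothetical; it assumes every letter-free line with at least two fields is a data line, and it splits on whitespace only, which fits the sample file (the asker's regex split also handles commas, pipes and semicolons).

```python
def numeric_rows(lines):
    """Yield (x, y) float pairs from lines that contain no letters."""
    for line in lines:
        # Skip any line with an alphabetic character (isalpha is Unicode-aware).
        if any(ch.isalpha() for ch in line):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or incomplete line
        yield float(fields[0]), float(fields[1])


if __name__ == "__main__":
    sample = [
        "#File_Version: 4\n",
        "PROJECTED_COORDINATE_SYSTEM\n",
        "sb+\n",
        "474457.83994 6761013.11978\n",
        "474482.83750 6761012.77069\n",
    ]
    for x, y in numeric_rows(sample):
        print("%s\t%s" % (x, y))
```

Because it is a generator, you can pass the open input file directly and write each pair to the output file as it is produced.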
But, interestingly, if you do:
In [57]: u'\u2161'.isdecimal()
Out[57]: False
In [58]: u'\u2161'.isdigit()
Out[58]: False
In [59]: u'\u2161'.isalpha()
Out[59]: False
The Unicode character for the Roman numeral two (U+2161) is none of those,
but unicodedata.category(u'\u2161') returns 'Nl' (a "letter number", i.e. a numeric category), and u'\u2161'.isnumeric() is True.
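In Python 3 every `str` is Unicode, so the `unicodedata` check needs no `unicode()` call. This sketch (the helper name `has_letter` is hypothetical) reproduces the Roman-numeral observation: U+2161 is in category Nl, so it is neither a letter nor a digit, though `isnumeric()` accepts it.

```python
import unicodedata


def has_letter(line):
    """True if any character's Unicode general category starts with 'L'."""
    return any(unicodedata.category(ch).startswith("L") for ch in line)


if __name__ == "__main__":
    two = "\u2161"  # ROMAN NUMERAL TWO
    print(unicodedata.category(two))  # Nl
    print(two.isalpha(), two.isdigit(), two.isnumeric())  # False False True
    print(has_letter(two), has_letter("474457.83994 6761013.11978"), has_letter("sb+"))  # False False True
```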
This checks the first character of each line and skips all lines that don't start with a digit:
for line in lines:
    if line[0].isdigit():
        # we've got a line starting with a digit
        pass  # process the line here
Use a generator pipeline to filter your input stream.
This takes the lines from your original input, but keeps only those that contain no letters anywhere in the line.
from functools import reduce  # needed in Python 3; reduce is a built-in in Python 2
input_stream = (line for line in lines
                if reduce((lambda x, y: (not y.isalpha()) and x), line, True))
for line in input_stream:
    strip_line = ...
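A runnable Python 3 sketch of the full pipeline idea, where each stage is a generator expression so lines are read, filtered and converted lazily one at a time. The stage and function names are hypothetical, and the final stage assumes the two-column coordinate layout from the question.

```python
from functools import reduce


def no_letters(line):
    # Fold over the characters: the result stays True only while
    # no character seen so far is alphabetic (same test as the answer).
    return reduce(lambda ok, ch: ok and not ch.isalpha(), line, True)


def coordinate_pipeline(lines):
    """Chain lazy generator stages: strip -> filter -> split -> convert."""
    stripped = (line.strip() for line in lines)
    data = (line for line in stripped if line and no_letters(line))
    fields = (line.split() for line in data)
    return ((float(f[0]), float(f[1])) for f in fields if len(f) >= 2)


if __name__ == "__main__":
    sample = ["#Horizon_name____________->\n", "sb+\n", "474457.83994 6761013.11978\n"]
    for x, y in coordinate_pipeline(sample):
        print(x, y)
```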
I'm new to Python. I have a script that prints all lines in a file that contain the number 9:
#!/usr/bin/env python
import re
testFile = open("test.txt", "r")
for line in testFile:
    if re.findall("\\b9\\b", line):
        print line
Now, how can I print all lines that contains a number greater than 9?
test.txt:
number1 9
number2 10
number3 5
number4 6
number5 15
You can use regular expression grouping:
for line in testFile:
    m = re.search(r"\b(\d+)\b", line)
    if m is not None and int(m.group(1)) > 9:
        print line
The (\d+) extracts the text matched by that part of the regex into m.group(1). Then the int() converts that to an integer and compares with 9.
This will extract the first instance of a number within each line. If you want to search all numbers in a line, you will need to use something like re.finditer() in combination with the above.
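As noted above, `re.search()` only looks at the first number on a line. This Python 3 sketch uses `re.finditer()` to check every number instead; the function name and its `threshold` parameter are hypothetical.

```python
import re

# \b on both sides keeps digits glued to letters (the "1" in "number1") out.
NUMBER = re.compile(r"\b\d+\b")


def lines_with_number_above(lines, threshold=9):
    """Yield lines containing at least one integer greater than threshold."""
    for line in lines:
        # finditer walks every match on the line, not just the first one.
        if any(int(m.group()) > threshold for m in NUMBER.finditer(line)):
            yield line


if __name__ == "__main__":
    sample = ["number1 9\n", "number2 10\n", "number3 5\n", "number4 6\n", "number5 15\n"]
    for line in lines_with_number_above(sample):
        print(line, end="")
```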
This prints the line if there is any space-separated number greater than 9.
testFile = open("test.txt", "r")
for line in testFile:
    for word in line.split():
        try:
            if int(word) > 9:
                print line
                break
        except ValueError:
            pass
Or, for your example, where the number is always the second field on the line:
testFile = open("test.txt", "r")
for line in testFile:
    if int(line.split()[1]) > 9:
        print line