Reading file character by character in Python

I have a big file called random.txt containing a lot of 1 and 0. I'm trying to read each value using this script:
random_file = open("random.txt", "r")
while True:
    char = random_file.read(1)
    if not char: break
    print char
The problem is that sometimes, instead of printing a single 0 or 1, it prints 010, so it seems to read three characters at once.
I'm using Python 2.7.9.
The expected output is a lot of lines, each containing just a 0 or a 1. There should never be more than one digit on the same line.

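In Python, not 0 evaluates to True because the integer 0 is a false-y value:
>>> (not 0) is True
True
Keep in mind, though, that read(1) returns a string, and the one-character string "0" is truthy; only the empty string "" is false-y. Comparing against "" spells out the end-of-file check explicitly.
What you probably want is: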
random_file = open("random.txt", "r")
while True:
    char = random_file.read(1)
    if char == "": break
    print char
This may explain some of your problem. Not sure where the rest of it comes from.
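One way to track down the rest is to look at exactly which characters read(1) returns. The sketch below is written in Python 3 syntax (the thread uses Python 2), and it uses an in-memory io.StringIO as a hypothetical stand-in for random.txt; the helper name read_chars is also made up for illustration. Printing repr(char) makes newlines and other invisible characters visible, if the file happens to contain any.

```python
import io
from functools import partial


def read_chars(stream):
    """Yield one character at a time until read(1) returns the empty string."""
    # iter(callable, sentinel) keeps calling stream.read(1) until it returns "".
    for char in iter(partial(stream.read, 1), ""):
        yield char


# Hypothetical sample data standing in for the contents of random.txt.
sample = io.StringIO(u"01\n10")
chars = list(read_chars(sample))
for char in chars:
    # repr() shows characters such as '\n' that plain printing would hide.
    print(repr(char))
```

If the output shows characters other than '0' and '1', filtering them out before printing would be a natural next step.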

Related

Is there a pythonic way to count the number of leading matching characters in two strings?

For two given strings, is there a pythonic way to count how many consecutive characters of both strings (starting at position 0 of the strings) are identical?
For example, in aaa_Hello and aa_World the "leading matching characters" are aa, having a length of 2. In 'another' and 'example' there are no leading matching characters, which would give a length of 0.
I have written a function to achieve this, which uses a for loop and thus seems very unpythonic to me:
def matchlen(string0, string1): # Note: does not work if a string is ''
    for counter in range(min(len(string0), len(string1))):
        # run until there is a mismatch between the characters in the strings
        if string0[counter] != string1[counter]:
            # in this case the function terminates
            return(counter)
    return(counter+1)

matchlen(string0='aaa_Hello', string1='aa_World') # returns 2
matchlen(string0='another', string1='example') # returns 0

You could use zip and enumerate:
def matchlen(str1, str2):
    i = -1 # needed if you don't enter the loop (an empty string)
    for i, (char1, char2) in enumerate(zip(str1, str2)):
        if char1 != char2:
            return i
    return i+1

A function in an unexpected place, os.path.commonprefix, can help: it is not limited to file paths, so any strings work. It can also take more than two input strings. From its documentation:
Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string ('').
from os.path import commonprefix
print(len(commonprefix(["aaa_Hello","aa_World"])))
output:
2

from itertools import takewhile
common_prefix_length = sum(
    1 for _ in takewhile(lambda x: x[0]==x[1], zip(string0, string1)))
zip will pair up letters from the two strings; takewhile will yield them as long as they're equal; and sum will see how many there are.
As bobble bubble says, this indeed does exactly the same thing as your loopy thing. Its sole pro (and also its sole con) is that it is a one-liner. Take it as you will.
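To compare the approaches side by side, here is a small self-contained sketch that wraps each one in a function and runs them on the examples from the question plus an empty string. The function names (matchlen_loop, matchlen_takewhile, matchlen_prefix) are made up for this illustration, and the loop variant is a slight rearrangement of the zip idea, not any answerer's exact code.

```python
from itertools import takewhile
from os.path import commonprefix


def matchlen_loop(s0, s1):
    """Count leading matching characters with an explicit loop over zip."""
    count = 0
    for c0, c1 in zip(s0, s1):
        if c0 != c1:
            break
        count += 1
    return count


def matchlen_takewhile(s0, s1):
    """Same count, expressed with takewhile and sum."""
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(s0, s1)))


def matchlen_prefix(s0, s1):
    """Same count, via the length of os.path.commonprefix."""
    return len(commonprefix([s0, s1]))


cases = [("aaa_Hello", "aa_World"), ("another", "example"), ("", "abc")]
results = [
    (matchlen_loop(a, b), matchlen_takewhile(a, b), matchlen_prefix(a, b))
    for a, b in cases
]
print(results)
```

All three variants handle the empty-string case without special treatment, because zip and commonprefix simply produce nothing to count.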

What is the error in my python code

You are given an integer N on one line. The next line contains N space separated integers. Create a tuple of those N integers. Let's call it T.
Compute hash(T) and print it.
Note: Here, hash() is one of the functions in the __builtins__ module.
Input Format
The first line contains N. The next line contains N space separated integers.
Output Format
Print the computed value.
Sample Input
2
1 2
Sample Output
3713081631934410656
My code
a=int(raw_input())
b=()
i=0
for i in range (0,a):
x=int(raw_input())
c = b + (x,)
i=i+1
hash(b)
Error:
invalid literal for int() with base 10: '1 2'

There are three errors that I can spot:
First, your for-loop is not indented.
Second, you should not be adding 1 to i - the for-loop does this automatically.
Third - and this is where the error is thrown - raw_input reads the entire line. If you are reading the line '1 2', you cannot convert this to an int.
To fix this problem, I suggest doing:
line = tuple(map(int,raw_input().split(' ')))
This takes the raw input, splits it into a list, converts the items of this list to ints, then turns the list into a tuple.
In fact, you can scrap the entire for loop. You could answer this problem in two lines of code:
raw_input()  # To get rid of the first line, which we do not need
print hash(tuple(map(int,raw_input().split(' '))))
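The parsing step can also be sketched on its own, without a judge feeding stdin. The version below uses Python 3 syntax and a hypothetical helper name, parse_int_tuple; the two-element list named lines stands in for the two input lines. It calls split() without an argument, which is an assumption on my part that tolerating repeated spaces is desirable.

```python
def parse_int_tuple(line):
    """Split a whitespace-separated line and convert every token to int."""
    return tuple(int(token) for token in line.split())


# Hypothetical stand-in for the two lines that would arrive on stdin.
lines = ["2", "1 2"]
n = int(lines[0])
t = parse_int_tuple(lines[1])

print(t)
print(hash(t))
```

The numeric value printed by hash(t) depends on the interpreter version and build, so it may not match the sample output quoted in the question.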

The input format says that the next line contains N space separated integers, e.g. 1 2 3. Such a line is not an integer (because of the spaces), which is why your code throws an error when you try int(raw_input()). You should use split(' '), as the other answer has suggested, to separate the integers. This will remove the error.
Also, there is no need to use i=i+1, as the loop will take care of it.

Try the below code:
if __name__ == '__main__':
    n = int(input())
    integer_list = map(int, input().split())
    t = tuple(integer_list)
    print(hash(t))

Try this code for Python 3:
if __name__ == '__main__':
    n = int(input())
    integer_list = map(int, input().split())
    input_list = [int(x) for x in integer_list]
    t = tuple(input_list)
    print(hash(t))

how to skip multiple header lines using python

I am new to Python. I am trying to write a script that will use the numeric columns from a file which also contains a header. Here is an example of a file:
#File_Version: 4
PROJECTED_COORDINATE_SYSTEM
#File_Version____________-> 4
#Master_Project_______->
#Coordinate_type_________-> 1
#Horizon_name____________->
sb+
#Horizon_attribute_______-> STRUCTURE
474457.83994 6761013.11978
474482.83750 6761012.77069
474507.83506 6761012.42160
474532.83262 6761012.07251
474557.83018 6761011.72342
474582.82774 6761011.37433
474607.82530 6761011.02524
I'd like to skip the header. Here is what I tried. It works, of course, if I know which characters will appear in the header, like "#" and "#". But how can I skip all lines containing any letter character?
import re

in_file1 = open(input_file1_short, 'r')
out_file1 = open(output_file1_short,"w")
lines = in_file1.readlines()
x = []
y = []
for line in lines:
    if "#" not in line and "#" not in line:
        strip_line = line.strip()
        replace_split = re.split(r'[ ,|;"\t]+', strip_line)
        x = (replace_split[0])
        y = (replace_split[1])
        out_file1.write("%s\t%s\n" % (str(x),str(y)))
in_file1.close()
Thank you very much!

I think you could use some built-ins like this:
import string
for line in lines:
    if any([letter in line for letter in string.ascii_letters]):
        print "there is an ascii letter somewhere in this line"
This is only looking for ascii letters, however.
You could also do:
import unicodedata
for line in lines:
    if any([unicodedata.category(unicode(letter)).startswith('L') for letter in line]):
        print "there is a unicode letter somewhere in this line"
but only if I understand my unicode categories correctly.
Even cleaner (using suggestions from other answers; this works for both unicode lines and strings):
for line in lines:
    if any([letter.isalpha() for letter in line]):
        print "there is a letter somewhere in this line"
But, interestingly, if you do:
In [57]: u'\u2161'.isdecimal()
Out[57]: False
In [58]: u'\u2161'.isdigit()
Out[58]: False
In [59]: u'\u2161'.isalpha()
Out[59]: False
The unicode character for the Roman numeral "Two" is none of those, but unicodedata.category(u'\u2161') does return 'Nl', indicating a numeric (and u'\u2161'.isnumeric() is True).

This will check the first character in each line and skip all lines that don't start with a digit:
for line in lines:
    if line[0].isdigit():
        # we've got a line starting with a digit
        pass  # process the line here

Use a generator pipeline to filter your input stream.
This takes the lines from your original input, but only passes on those lines that contain no letters anywhere in the line.
input_stream = (line for line in lines if
                reduce((lambda x, y: (not y.isalpha()) and x), line, True))
for line in input_stream:
    strip_line = ...
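Putting the letter check together with the splitting step from the question gives something like the sketch below. It uses Python 3 syntax, a hypothetical function name (numeric_rows), and a hard-coded list named sample built from a few lines of the example file in place of a real file handle. It assumes the coordinates are separated by whitespace and never use exponent notation such as 1e5, which would contain a letter and be skipped.

```python
def numeric_rows(lines):
    """Yield (x, y) string pairs from lines that contain no letters."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and any line with at least one letter (the header).
        if not stripped or any(ch.isalpha() for ch in stripped):
            continue
        fields = stripped.split()
        if len(fields) >= 2:
            yield fields[0], fields[1]


# A few lines copied from the example file in the question.
sample = [
    "#File_Version: 4\n",
    "PROJECTED_COORDINATE_SYSTEM\n",
    "sb+\n",
    "#Horizon_attribute_______-> STRUCTURE\n",
    "474457.83994 6761013.11978\n",
    "474482.83750 6761012.77069\n",
]

rows = list(numeric_rows(sample))
for x, y in rows:
    print("%s\t%s" % (x, y))
```

Because numeric_rows is a generator, it could be handed an open file object directly, so the whole file would not need to be loaded with readlines() first.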

putting text,csv,excel file in pattern

I am a beginner at real programming and have the following problem.
I want to read many instances stored in a file (csv/txt/excel),
like the following:
find<S>ing<G>s<p>
Then, when I read this file, it goes through each character, starts from the sixth position and continues until the 11th position; the maximum size of a single row is 12.
-,-,-,-,-,f,i,n,d,i,n,0
-,-,-,-,f,i,n,d,i,n,g,0
-,-,-,f,i,n,d,i,n,g,s,0
-,-,f,i,n,d,i,n,g,s,-,S//there is an S value next to the letter d
-,f,i,n,d,i,n,g,s,-,-,0
f,i,n,d,i,n,g,s,-,-,-,0
i,n,d,i,n,g,s,-,-,-,-,G // there is a G value here at the end of g
n,d,i,n,g,s,-,-,-,-,-,P // there is a P value here at the end of s
Here is the code that I tried in Python, but a solution in C++, Java, or .NET would also be fine.
import sys
import os
f = open('/home/mm/exprimentdata/sample3.csv')// can be txt file
string = f.read()
a = []
b = []
i = 0
while (i < len(string)):
if (string[i] != '\n '):
n = string[i]
if (string[i] == ""):
print ' = '
if (string[i] = upper | numeric)
print rep(char).rjust(12),delimiter=','
a.append(n)
i = (i+1)
print (len(a))
print a
My question is: how can I compare each string and assign a single char at the rightmost part (position 12, like the G, P, S above)?
How can I push one step back after aligning the first row?
How can I fix the length?
Could anyone please look at the fragment and adjust it to solve the above case?

I don't understand your question.
But some advice:
Firstly, you should be closing the file after you open it.
f = open('/home/mm/exprimentdata/sample3.csv') # can be txt file
string = f.read()
f.close()
Secondly, your indentation is problematic. Whitespace matters in Python. (Maybe your real code is indented properly and it's just a StackOverflow thing.)
Thirdly, instead of using a while loop and incrementing, you should be writing:
for i in range(len(string)):
    # loop code
Fourthly, this line will never evaluate to True:
if (string[i] == ""):
string[i] will always be some character (or cause an out of bounds error).
I advise you read a Python tutorial before you try and write this program.

NZEC in python on spoj for AP2

I wrote the following two codes
FCTRL2.py
import sys;
def fact(x):
    res = 1
    for i in range (1,x+1):
        res=res*i
    return res;
t = int(raw_input());
for i in range (0,t):
    print fact(int(raw_input()));
and
AP2.py
import sys;
t = int(raw_input());
for i in range (0,t):
    x,y,z = map(int,sys.stdin.readline().split())
    n = (2*z)/(x+y)
    d = (y-x)/(n-5)
    a = x-(2*d)
    print n
    for j in range(0,n):
        sys.stdout.write(a+j*d)
        sys.stdout.write(' ')
    print' '
FCTRL2.py is accepted on SPOJ, whereas AP2.py gives an NZEC error. Both work fine on my machine, and I do not see much difference between them with regard to returning values. Please explain what the difference between the two is and how I can avoid the NZEC error for AP2.py.

There may be extra whitespace in the input. A good problem setter would ensure that the input satisfies the specified format, but since SPOJ allows almost anyone to add problems, issues like this sometimes arise. One way to mitigate whitespace issues is to read the input all at once, and then tokenize it.
import sys; # Why use ';'? It's so non-pythonic.
inp = sys.stdin.read().split() # Take whitespaces as delimiter
t = int(inp[0])
readAt = 1
for i in range (0,t):
    x,y,z = map(int,inp[readAt:readAt+3]) # Read the next three elements
    n = (2*z)/(x+y)
    d = (y-x)/(n-5)
    a = x-(2*d)
    print n
    #for j in range(0,n):
    #    sys.stdout.write(a+j*d)
    #    sys.stdout.write(' ')
    #print ' '
    print ' '.join([str(a+ti*d) for ti in xrange(n)]) # More compact and faster
    readAt += 3 # Increment the index from which to start the next read

The n in line 10 can be a float (this happens when / performs true division, as it does in Python 3), but the range function expects an integer. Hence the program exits with an exception.
I tested this on Windows with these values:
>ap2.py
23
4 7 9
1.6363636363636365
Traceback (most recent call last):
  File "C:\martin\ap2.py", line 10, in <module>
    for j in range(0,n):
TypeError: 'float' object cannot be interpreted as an integer
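Combining the two answers, a version that tokenizes the whole input and keeps every quantity an integer might look like the sketch below. It uses Python 3 syntax, a hypothetical function name (solve), and takes the input as a string argument so it can be tried without stdin. The formulas are copied from the question; the sample input is one I constructed, on the assumption (suggested by those formulas) that x and y are the third and third-last terms and z is the sum of the series.

```python
def solve(text):
    """Return the output for AP2-style input given as one string."""
    tokens = text.split()  # any run of whitespace acts as a delimiter
    t = int(tokens[0])
    pos = 1
    out = []
    for _ in range(t):
        x, y, z = (int(tok) for tok in tokens[pos:pos + 3])
        pos += 3
        # Floor division keeps n and d integers, so range(n) is valid.
        n = (2 * z) // (x + y)
        d = (y - x) // (n - 5)
        a = x - 2 * d
        out.append(str(n))
        # Convert each term to str before joining; write() needs strings too.
        out.append(" ".join(str(a + j * d) for j in range(n)))
    return "\n".join(out)


# Constructed example: the series 1..10 has third term 3, third-last term 8, sum 55.
result = solve("1\n3 8 55\n")
print(result)
```

Converting each term with str() also sidesteps passing a number to sys.stdout.write, which accepts only strings.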
