Python print() and sys.stdout.write() issues - python-2.7

Guys, see the code below (Python 2.7).
I'm having issues with the "print(ciphertext)" and "sys.stdout.write(ciphertext)" parts of the code.
When I run the code, "print(passline)" and "sys.stdout.write(passline)" come out fine, i.e. if the file has a line that says "Billz", it is shown as is. But when I try to output the ciphertext (produced by the encryptMessage(key, message) method) with either function (sys.stdout.write() or print()),
the output splits across lines depending on the "myKey" variable (see below for the code and examples).
*I understand the limitations of the transposition encryption method, but the ciphertext goes to a new line before it has finished outputting the line it started from.
I think the problem is with the encryptMessage() function and how it interacts with the enc() method, in particular the for ... in ... block of the code.
Does that make sense?
I think the answer to this can help when:
- reading data from files without overwriting those files
- writing programs related to logs and password/word lists
- understanding how for, in and .join work together
e.g. myKey = 1:
C:\Users\baawan\Desktop\Cyber Sec\COMP_lang\python>py cypher7.py
would you like to Encrypt(e) or Decrypt(d): e
Enter file name: pass.txt
Enter Key: 1
This is a list of Passwords to be encrypted
This is a list of Passwords to be encrypted
Billz786
Billz786
123456
123456
Milly
Milly
Bilklzcfvcx
Bilklzcfvcx
e.g. myKey = 2:
C:\Users\baawan\Desktop\Cyber Sec\COMP_lang\python>py cypher7.py
would you like to Encrypt(e) or Decrypt(d): e
Enter file name: pass.txt
Enter Key: 2
This is a list of Passwords to be encrypted
Ti sals fPswrst eecytdhsi ito asod ob nrpe
Billz786
Blz8
il76
123456
135
246
Milly
Mlyil
Bilklzcfvcx
Bllcvxikzfc
e.g. myKey = 4:
C:\Users\baawan\Desktop\Cyber Sec\COMP_lang\python>py cypher7.py
would you like to Encrypt(e) or Decrypt(d): e
Enter file name: pass.txt
Enter Key: 4
This is a list of Passwords to be encrypted
T asfsrtecthi t sdo reisl Pws eyds ioao bnp
Billz786
Bz
i7l8l6
123456
15263
4
Milly
Myi
ll
Bilklzcfvcx
Blvizclcxkf
e.g. myKey = 8:
C:\Users\baawan\Desktop\Cyber Sec\COMP_lang\python>py cypher7.py
would you like to Encrypt(e) or Decrypt(d): e
Enter file name: pass.txt
Enter Key: 8
This is a list of Passwords to be encrypted
Tafreth d eilPsedsia n
sstcitsors w y oobp
Billz786
B
illz786
123456
123456
Milly
Milly
Bilklzcfvcx
Bviclxklzcf
The code is:
def enc():
    myMessage = raw_input('Enter file name: ')
    myKey = int(raw_input('Enter Key: '))
    text_file = open(myMessage, "r")
    lines = text_file.readlines()
    for passline in lines:
        myMessage = passline
        ciphertext = encryptMessage(myKey, myMessage)
        print(passline)
        #sys.stdout.write(passline)
        print ciphertext
        #sys.stdout.write(ciphertext)
    text_file.close()


def encryptMessage(key, message):
    ciphertext = [''] * key
    for col in range(key):
        pointer = col
        while pointer < len(message):
            ciphertext[col] += message[pointer]
            pointer += key
    return ''.join(ciphertext)

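When you use readlines() (and most other ways of reading lines from a file), Python keeps the newline character(s) at the end of each line, so in your case passline ends with (a) newline character(s). The transposition cipher treats that newline as just another character and moves it into the middle of the ciphertext, which is why the output breaks across lines. To prevent this, use something like passline = passline.rstrip('\n\r') at the start of your for loop.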
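To see the effect, here is a minimal sketch that reuses the question's transposition logic (renamed encrypt_message; encrypt_lines is a hypothetical helper) and compares the ciphertext of a line with and without its trailing newline. It should run under both Python 2.7 and Python 3.

```python
def encrypt_message(key, message):
    # Columnar transposition: column `col` collects every key-th
    # character of the message, starting at index `col`.
    ciphertext = [''] * key
    for col in range(key):
        pointer = col
        while pointer < len(message):
            ciphertext[col] += message[pointer]
            pointer += key
    # ''.join glues the columns together with no separator.
    return ''.join(ciphertext)


def encrypt_lines(lines, key):
    """Encrypt each line after stripping its trailing newline."""
    return [encrypt_message(key, line.rstrip('\n\r')) for line in lines]


raw = "Billz786\n"  # what readlines() hands you
print(repr(encrypt_message(2, raw)))                  # 'Blz8\nil76'
print(repr(encrypt_message(2, raw.rstrip('\n\r'))))  # 'Blz8il76'
```

With the newline still attached, it lands at the end of the first column ("Blz8\n"), so printing the ciphertext starts a new line halfway through; stripping it first gives one clean line per password.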

Related

Attempted READ of key larger than file maximum key size

I'm running a program to help document what is contained in our 30+-year-old database. During this process, I am getting the following error message:
Attempted READ of record ID larger than file/table maximum record ID size of 255 characters.
My program is working like this:
LOOP WHILE I <= NUM.FILES
RECORD = ""
FILENAME = FILE.LIST<I>
ERROR = ""
DEBUG.RECORD = ""
HAVE.LOOKED = 0
OPEN 'DICT ':FILENAME TO D.FILE THEN
OPEN FILENAME TO T.FILE THEN
STATEMENT = "SSELECT ONLY DICT ":FILENAME:' BY FIELD.NO WITH FIELD.NO >= 0 AND WITH FIELD.NO <= 900 AND WITH FIELD # ".]"'
DEBUG = ""
PRINT FILENAME
EXECUTE STATEMENT RETURNING DEBUG
LOOP WHILE READNEXT FIELDNAME DO
READ FIELD.RECORD FROM D.FILE, FIELDNAME THEN
IF LEN(FIELDNAME) > BIGGEST.KEY.LEN THEN
BIGGEST.KEY = FIELDNAME
BIGGEST.KEY.LEN = LEN(FIELDNAME)
BIGGEST.KEY.FILE = "DICT ": FILENAME
PRINT FILENAME:" ":LEN(FIELDNAME):" ":FIELDNAME
END
USE.COUNT = ""
USE.LIST = ""
USE.COUNT.STATEMENT = "SELECT ":FILENAME:" WITH ":FIELDNAME:' # ""'
DEBUGS = ""
EXECUTE USE.COUNT.STATEMENT RTNLIST USE.LIST RETURNING DEBUGS
ROW = ""
ROW<1,1> = FIELD.RECORD<2> ; *Attribute Number
ROW<1,2> = FIELDNAME ; *Field Name
ROW<1,3> = FIELD.RECORD<1> ; *Field Type
ROW<1,4> = FIELD.RECORD<10> ; *Field Size
ROW<1,5> = FIELD.RECORD<12> ; *Is Multivalued: "" = no, "Y" = Multivalued, "###" = specific multivalue
ROW<1,6> = FIELD.RECORD<13> ; *Is Subvalued: "" = no, "Y" = Subvalued, "###" = specific subvalue
ROW<1,7> = FIELD.RECORD<7> ; *Automatic data output conversion
ROW<1,8> = FIELD.RECORD<8> ; *Correlative field definition
ROW<1,9> = FIELD.RECORD<11> ; *Field description
ROW<1,10> = @SELECTED ; *Number of records that don't have this field blank
RECORD<-1> = ROW
IF ROW<1,10> < 1 THEN
READ UNUSED.FIELDS FROM CHUCK.WORK, "FILE.DEBUG.UNUSED.FIELDS" ELSE
UNUSED.FIELDS = ""
END
UNUSED.FIELDS<-1> = FILENAME:VM:ROW
WRITE UNUSED.FIELDS ON CHUCK.WORK, "FILE.DEBUG.UNUSED.FIELDS"
END
IF FIELD.RECORD<2> = 0 AND @SELECTED > 0 AND HAVE.LOOKED = 0 THEN
LOOP WHILE READNEXT KEY FROM USE.LIST DO
IF LEN(KEY) > BIGGEST.KEY.LEN THEN
BIGGEST.KEY = KEY
BIGGEST.KEY.LEN = LEN(KEY)
BIGGEST.KEY.FILE = FILENAME
PRINT FILENAME:" ":LEN(KEY):" ":KEY
END
REPEAT
HAVE.LOOKED = 1
END
END
REPEAT
END ELSE
ERROR<-1> = "Failed to open file '":FILENAME:"'"
END
END ELSE
ERROR<-1> = "Failed to open file DICT '":FILENAME:"'"
END
WRITE RECORD ON CHUCK.WORK, "FILE.":FILENAME
WRITE DEBUG.RECORD ON CHUCK.WORK, "FILE.DEBUG.":FILENAME
READ CHUCK.LOG FROM CHUCK.WORK, "CHUCK.LOG" ELSE
CHUCK.LOG = ""
END
CHUCK.LOG<-1> = "FILE '":FILENAME:"' had ":DCOUNT(RECORD,AM):" fields"
IF ERROR THEN
CHUCK.LOG<-1> = ERROR
ERRORS<-1> = ERROR
END
WRITE CHUCK.LOG ON CHUCK.WORK,"CHUCK.LOG"
CLEARSELECT
I = I + 1
REPEAT
When I look at the database directly, I can't find any record IDs or keys with more than 35 characters in the file which is causing problems, and nothing longer than 70 characters in the entire database. Can anyone help identify why these records are getting flagged in this process but aren't discoverable directly?
Below is a program I wrote specifically to find the problematic records, but it can't find the culprit:
OPEN "CHUCK.WORK" TO CHUCK.WORK ELSE
PRINT "UNABLE TO OPEN CHUCK.WORK"
RETURN
END
READ FILENAME FROM CHUCK.WORK, "LISTME" ELSE
PRINT "UNABLE TO READ LISTME"
RETURN
END
NUM.FILES = DCOUNT(FILENAME,AM)
FOR I = 1 TO NUM.FILES
OPEN FILENAME<I> TO T.FILE ELSE
PRINT "UNABLE TO OPEN ":FILENAME<I>
RETURN
END
EXECUTE 'SELECT ':FILENAME<I>
LOOP WHILE READNEXT KEY DO
IF LEN(KEY) > 20 THEN
PRINT FILENAME<I>:" ":LEN(KEY):" ":KEY
END
REPEAT
NEXT I
UPDATE:
One of my coworkers identified the source of the problem, although we haven't yet worked out how to fix it:
one of our files has a multivalued field that is used as a key in a correlative. For some reason, UniVerse is trying to read the entire attribute, instead of the individual multivalue, as the key, which causes the long record IDs. Can anyone see whether I am doing something wrong in my code, or whether there is some setting in the database that we need to look at?
When you see this error, it has nothing to do with the size of the keys in the file; it simply means that the @ID you are trying to READ is longer than 255 characters. When I have seen it, it usually shows which line of the source code it happened on. If you put this right before that line, you should be able to track it down:
IF LEN(THIS.ID) GT 255 THEN
DEBUG
END
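To illustrate your coworker's finding, here is a small Python model (not UniVerse BASIC) of what happens when a whole multivalued attribute is used as a key instead of one of its values. It assumes the standard Pick/UniVerse value mark, CHAR(253); the keys and the helper names split_values and oversized_ids are hypothetical. It applies the same 255-character check as the snippet above, once to the whole attribute and once to each value:

```python
# Standard Pick/UniVerse value mark (CHAR(253)) separating multivalues.
VM = chr(253)

MAX_ID_LEN = 255  # limit quoted in the error message


def split_values(attribute):
    """Split one attribute into its individual multivalues."""
    return attribute.split(VM)


def oversized_ids(ids, limit=MAX_ID_LEN):
    """Return the IDs that would trigger the 'larger than 255' error."""
    return [record_id for record_id in ids if len(record_id) > limit]


# Hypothetical multivalued attribute holding ten 35-character keys.
keys = ["K%034d" % n for n in range(10)]
attribute = VM.join(keys)

print(len(attribute))                               # 359
print(len(oversized_ids([attribute])))              # 1: whole attribute used as one key
print(len(oversized_ids(split_values(attribute))))  # 0: every single value is short
```

Each individual key is only 35 characters, which matches what you see when you look at the file directly, but the joined attribute is well over 255.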
Edit: Apparently the error in this case does not reference a line number. I was not sure whether this was omitted for clarity or was some difference in the UniVerse flavor, but I now believe its absence is a hint that the error message is coming from the shell and not from the interpreter.
OPEN '','VOC' TO FILE.VOC ELSE STOP "CANNOT OPEN FILE VOC"
STMT = "SELECT VAL WITH ":STR("A",256):" EQ 0"
EXECUTE STMT RTNLIST USE.LIST RETURNING DEBUGS
CRT "**************************************"
READ TEST FROM FILE.VOC,STR("A",256) ELSE NULL
END
On my system, this outputs:
>RUN TEST.SC TEST.LONG.ID
Attempted READ of record ID larger than file/table maximum
record ID size of 255 characters.
RetrieVe: syntax error. Unexpected sentence without filename. Token was "".
Scanned command was SELECT 'VAL' WITH 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' EQ '0'
**************************************
Program "TEST.LONG.ID": Line 5, Attempted READ of record ID larger than file/t
able maximum
record ID size of 255 characters.
>
The first message looks like your error message and would point to one of the dynamic SELECT statements you are building. @ID is synonymous with record key, and to carry the analogy a little further, it appears that you are trying to unlock your bike with one of those comically large "Keys to the City".

Reading mailing addresses of varying length from a text file using regular expressions

I am trying to read a text file and collect addresses from it. Here's an example of one of the entries in the text file:
Electrical Vendor Contact: John Smith Phone #: 123-456-7890
Address: 1234 ADDRESS ROAD Ship To:
Suite 123 ,
Nowhere, CA United States 12345
Phone: 234-567-8901 E-Mail: john.smith@gmail.com
Fax: 345-678-9012 Web Address: www.electricalvendor.com
Acct. No: 123456 Monthly Due Date: Days Until Due
Tax ID: Fed 1099 Exempt Discount On Assets Only
G/L Liab. Override:
G/L Default Exp:
Comments:
APPROVED FOR ELECTRICAL THINGS
I cannot wrap my head around how to search for and store the address for each of these entries when the number of lines in the address varies. Currently, I have a generator that reads each line of the file. Then the get_addrs() method attempts to capture markers such as the Address: and Ship keywords in the file to signal when an address needs to be stored. Then I use a regular expression to search for zip codes in the line following a line with the Address: keyword. I think I've figured out how to successfully save the second line for all addresses using that method. However, in a few addresses there is a suite number or other piece of information that makes the address three lines instead of two. I'm not sure how to account for this; I tried expanding my save_previous() method to three lines, but I can't get it quite right. Here's the code that successfully saves all of the two-line addresses:
import re


class GetAddress:
    def __init__(self):
        self.line1 = []
        self.line2 = []
        self.s_line1 = []
        self.addr_index = 0
        self.ship_index = 0
        self.no_ship = False
        self.addr_here = False
        self.prev_line = []
        self.us_zip = ''

    # Check if there is a shipping address.
    def set_no_ship(self, line):
        try:
            self.no_ship = line.index(',') == len(line) - 1
        except ValueError:
            pass

    # Save two lines at a time to see whether or not the previous
    # line contains 'Address:' and 'Ship'.
    def save_previous(self, line):
        self.prev_line += [line]
        if len(self.prev_line) > 2:
            del self.prev_line[0]

    def get_addrs(self, line):
        self.addr_here = 'Address:' in line and 'Ship' in line
        self.po_box = False
        self.no_ship = False
        self.addr_index = 0
        self.ship_index = 0
        self.zip1_index = 0
        self.set_no_ship(line)
        self.save_previous(line)
        # Check if 'Address:' and 'Ship' are in the previous line.
        self.prev_addr = (
            'Address:' in self.prev_line[0]
            and 'Ship' in self.prev_line[0])
        if self.addr_here:
            self.po_box = 'Box' in line or 'BOX' in line
            self.addr_index = line.index('Address:') + 1
            self.ship_index = line.index('Ship')
            # Get the contents of the line between 'Address:' and
            # 'Ship' if both words are present in this line.
            # (Compare indices with != / ==, not with `is`.)
            if self.addr_index != self.ship_index:
                self.line1 += [' '.join(line[self.addr_index:self.ship_index])]
            else:
                self.line1 += ['']
        if len(self.prev_line) > 1 and self.prev_addr:
            self.po_box = 'Box' in line or 'BOX' in line
            self.us_zip = re.search(r'(\d{5}(\-\d{4})?)', ' '.join(line))
            if self.us_zip and not self.po_box:
                self.zip1_index = line.index(self.us_zip.group(1))
            if self.no_ship:
                self.line2 += [' '.join(line[:line.index(',')])]
            elif self.zip1_index and not self.no_ship:
                self.line2 += [' '.join(line[:self.zip1_index + 1])]
            elif len(self.line1) > 0 and not self.line1[-1]:
                self.line2 += ['']


# Create a generator to read each line of the file.
def read_gen(infile):
    with open(infile, 'r') as file:
        for line in file:
            yield line.split()


infile = 'Vendor List.txt'
info = GetAddress()
for i, line in enumerate(read_gen(infile)):
    info.get_addrs(line)
I am still a beginner in Python, so I'm sure a lot of my code may be redundant or unnecessary. I'd love some feedback on how I might make this simpler and shorter while capturing both two- and three-line addresses.
I also posted this question to Reddit, and u/Binary101010 pointed out that the text file is fixed-width, so it may be possible to slice each line in a way that selects only the necessary address information. Using this intuition, I added some functionality to the generator, and I was able to produce the desired effect with the following code:
infile = 'Vendor List.txt'


# Create a generator with differing modes to read the specified lines of the file.
def read_gen(infile, mode=0, start=0, end=0, rows=None):
    # Indices of the rows to yield in mode 1, computed once up front.
    # Make sure rows is formatted as [start_row, end_row];
    # the rows list should only ever be length 2.
    wanted = set(range(rows[0], rows[1])) if rows else set()
    with open(infile, 'r') as file:
        for i, line in enumerate(file):
            # Slice to the end of the current line if no end argument is given.
            stop = end if end else len(line)
            # Mode 0 gives all lines of the file.
            if mode == 0:
                yield line[start:stop]
            # Mode 1 gives only the lines whose index falls between the
            # specified rows.
            elif mode == 1 and i in wanted:
                yield line[start:stop]


class GetAddress:
    def __init__(self):
        self.address_indices = list()
        self.phone_indices = list()
        self.addresses = list()
        self.count = 0

    def get(self, i, line):
        # Search for appropriate substrings and set indices accordingly.
        if 'Address:' in line[18:26]:
            self.address_indices += [i]
        if 'Phone:' in line[18:24]:
            self.phone_indices += [i]
        # Add address to list if both necessary indices have been collected.
        if i in self.phone_indices:
            self.set_addresses()

    def set_addresses(self):
        # infile is read from module scope; no `global` statement is
        # needed because it is never reassigned here.
        self.address = list()
        start = self.address_indices[self.count]
        end = self.phone_indices[self.count]
        # Create a generator that only yields substrings for rows between
        # the given indices.
        self.generator = read_gen(
            infile,
            mode=1,
            start=40,
            end=91,
            rows=[start, end])
        # Collect each line of the address from the generator and remove
        # unnecessary spaces.
        for element in range(start, end):
            self.address += [next(self.generator).strip()]
        # This document has a header on each page and a portion of that is
        # collected in the address substring. Search for the header substring
        # and remove the corresponding elements from self.address.
        if len(self.address) > 3 and not self.address[-1]:
            self.address = self.address[:self.address.index('header text')]
        self.addresses += [self.address]
        self.count += 1


info = GetAddress()
for i, line in enumerate(read_gen(infile)):
    info.get(i, line)
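For comparison, here is a shorter sketch of the original idea that copes with two- or three-line addresses by collecting every line from the Address: line up to the next line that starts with Phone:. It assumes the whitespace-collapsed layout shown in the question (no ship-to text on the continuation lines); for the real fixed-width file you would probably combine it with the column slicing above. extract_addresses and SAMPLE are illustrative names, not part of the original code.

```python
import re

# Sample entry in the layout shown in the question (illustrative only).
SAMPLE = """Electrical Vendor Contact: John Smith Phone #: 123-456-7890
Address: 1234 ADDRESS ROAD Ship To:
Suite 123 ,
Nowhere, CA United States 12345
Phone: 234-567-8901 E-Mail: john.smith@gmail.com
Fax: 345-678-9012 Web Address: www.electricalvendor.com
"""


def extract_addresses(lines):
    """Return one list of address lines per 'Address:' block."""
    addresses = []
    current = None  # address being collected, or None outside a block
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('Address:'):
            # Drop the label and everything from 'Ship To:' onwards.
            first = re.split(r'\s*Ship To:', stripped[len('Address:'):])[0]
            current = [first.strip()]
        elif current is not None and stripped.startswith('Phone:'):
            # 'Phone:' ends the block, however many lines it had.
            addresses.append(current)
            current = None
        elif current is not None and stripped:
            # Continuation line: trim the stray ' ,' seen after 'Suite 123'.
            current.append(stripped.rstrip(' ,'))
    return addresses


print(extract_addresses(SAMPLE.splitlines()))
```

Because the block is closed by the Phone: line rather than by a fixed line count, two- and three-line addresses need no special handling.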

Deleting the last octets of an IP address

This is my code:
ip = ("192.143.234.543/23
192.143.234.5/23
192.143.234.23/23")
separateOct = (".")
ipNo4Oct = line.split(separateOct, 1) [0]
print (ipNo4Oct)
The IPs come from a text file and I have done my for loops right.
The result I get is:
192
192
192
But I want this result:
192.143.234
192.143.234
192.143.234
How do I get the result I want?
You can use almost the same code, with some slicing and join:
>>> ipNo4Oct = ip.split(separateOct)[0:3]
>>> '.'.join(ipNo4Oct)
'192.143.234'
Or for the entire string (assuming it can be split into lines, as your code suggests):
>>> for line in ip.splitlines():
...     ipNo4Oct = line.split(separateOct)[0:3]
...     '.'.join(ipNo4Oct)
...
'192.143.234'
'192.143.234'
'192.143.234'
With
ip = ("192.143.234.543/23",
      "192.143.234.5/23",
      "192.143.234.23/23")
for line in ip:
    separator = "."
    ipNo4Oct = separator.join(line.split(separator, 3)[:-1])
    print(ipNo4Oct)
you split each line at most 3 times, drop the last part (the last octet plus the /23 suffix), and re-join the remaining 3 parts using the separator.
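Another option is str.rsplit with maxsplit=1, which cuts only at the last dot, so everything after it (the last octet and the /23 suffix) is dropped in one step. This is a sketch; drop_last_octet is a hypothetical helper name. Note that the standard ipaddress module would reject these strings, since 543 is not a valid octet, so plain string handling is the simpler route here.

```python
IPS = """192.143.234.543/23
192.143.234.5/23
192.143.234.23/23"""


def drop_last_octet(line, separator="."):
    # rsplit(sep, 1) splits only at the LAST separator; [0] keeps the left part.
    return line.rsplit(separator, 1)[0]


for line in IPS.splitlines():
    print(drop_last_octet(line))
```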

Error in writing output file through AWK scripting

I have an AWK script that writes the values matching specific patterns to a .csv file.
The code is as follows:
BEGIN{print "Query Start,Query End, Target Start, Target End,Score, E,P,GC"}
/^\>g/ { Query=$0 }
/Query =/{
split($0,a," ")
query_start=a[3]
query_end=a[5]
query_end=gsub(/,/,"",query_end)
target_start=a[8]
target_end=a[10]
}
/Score =/{
split($0,a," ")
score=a[3]
score=gsub(/,/,"",score)
e=a[6]
e=gsub(/,/,"",e)
p=a[9]
p=gsub(/,/,"",p)
gc=a[12]
printf("%s,%s,%s,%s,%s,%s,%s,%s\n",query_start, query_end,target_start,target_end,score,e,p,gc)
}
The input file is as follows:
>gi|ABCDEF|
Plus strand results:
Query = 100 - 231, Target = 100 - 172
Score = 20.92, E = 0.01984, P = 4.309e-08, GC = 51
But I received the output in a .csv file as provided below:
100 0 100 172 0 0 0 51
The program failed to copy the values of:
Query end
Score
E
P
(Note: all the failed values are present before comma (,))
Any help to obtain the right output will be great.
Best regards,
Amit
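As @Jidder mentioned, you don't need to call split(), and as @jaypal mentioned, you're using gsub() incorrectly: gsub() edits its target in place and returns the number of substitutions made, so assigning its result back to the same variable replaces the value you wanted with that count. But you also don't need to call gsub() at all if you just include , in your FS.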
Try this:
BEGIN {
    FS = "[[:space:],]+"
    OFS = ","
    print "Query Start","Query End","Target Start","Target End","Score","E","P","GC"
}
/^>g/ { Query=$0 }
/Query =/ {
    query_start=$4
    query_end=$6
    target_start=$9
    target_end=$11
}
/Score =/ {
    score=$4
    e=$7
    p=$10
    gc=$13
    print query_start,query_end,target_start,target_end,score,e,p,gc
}
Does that work? Note that the field numbers are shifted by 1: when you don't use the default FS, awk no longer skips leading white space, so each of your (indented) input lines gets an empty first field before the white space.
Obviously, you are not using your Query variable so the line that populates it is redundant.
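For comparison, the same extraction can be sketched in Python with the re and csv modules. It assumes exactly the "Query = a - b, Target = c - d" and "Score = ..., E = ..., P = ..., GC = ..." layouts shown in the question; parse and to_csv are hypothetical helper names.

```python
import csv
import io
import re

QUERY_RE = re.compile(r'Query = (\S+) - (\S+), Target = (\S+) - (\S+)')
SCORE_RE = re.compile(r'Score = (\S+), E = (\S+), P = (\S+), GC = (\S+)')
HEADER = ["Query Start", "Query End", "Target Start", "Target End",
          "Score", "E", "P", "GC"]


def parse(lines):
    """Pair each 'Query =' line with the 'Score =' line that follows it."""
    rows, query = [], None
    for line in lines:
        m = QUERY_RE.search(line)
        if m:
            query = list(m.groups())
            continue
        m = SCORE_RE.search(line)
        if m and query is not None:
            rows.append(query + list(m.groups()))
    return rows


def to_csv(rows):
    """Render the header plus rows as CSV text (the regexes already drop the commas)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(HEADER)
    writer.writerows(rows)
    return out.getvalue()


sample = """>gi|ABCDEF|
Plus strand results:
Query = 100 - 231, Target = 100 - 172
Score = 20.92, E = 0.01984, P = 4.309e-08, GC = 51
"""
print(to_csv(parse(sample.splitlines())))
```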

Read fields from text file and store them in a structure

I am trying to read a file that looks as follows:
Data Sampling Rate: 256 Hz
*************************
Channels in EDF Files:
**********************
Channel 1: FP1-F7
Channel 2: F7-T7
Channel 3: T7-P7
Channel 4: P7-O1
File Name: chb01_02.edf
File Start Time: 12:42:57
File End Time: 13:42:57
Number of Seizures in File: 0
File Name: chb01_03.edf
File Start Time: 13:43:04
File End Time: 14:43:04
Number of Seizures in File: 1
Seizure Start Time: 2996 seconds
Seizure End Time: 3036 seconds
So far I have this code:
fid1= fopen('chb01-summary.txt')
data=struct('id',{},'stime',{},'etime',{},'seizenum',{},'sseize',{},'eseize',{});
if fid1 ==-1
error('File cannot be opened ')
end
tline= fgetl(fid1);
while ischar(tline)
i=1;
disp(tline);
end
I want to use regexp to find the expressions and so I did:
line1 = '(.*\d{2} (\.edf)'
data{1} = regexp(tline, line1);
tline=fgetl(fid1);
time = '^Time: .*\d{2]}: \d{2} :\d{2}' ;
data{2}= regexp(tline,time);
tline=getl(fid1);
seizure = '^File: .*\d';
data{4}= regexp(tline,seizure);
if data{4}>0
stime = '^Time: .*\d{5}';
tline=getl(fid1);
data{5}= regexp(tline,seizure);
tline= getl(fid1);
data{6}= regexp(tline,seizure);
end
I tried using a loop to find the line that starts with the file name:
for (firstline<1) || (firstline>1 )
firstline= strfind(tline, 'File Name')
tline=fgetl(fid1);
end
and now I'm stumped.
Suppose that I am at the line where the information is; how do I store that information with regexp? I got an empty array for data after running the code once...
Thanks in advance.
I find it the easiest to read the lines into a cell array first using textscan:
%// Read lines as strings
fid = fopen('input.txt', 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
and then apply regexp on it to do the rest of the manipulations:
%// Parse field names and values
C = regexp(C{:}, '^\s*([^:]+)\s*:\s*(.+)\s*', 'tokens');
C = [C{:}]; %// Flatten the cell array
C = reshape([C{:}], 2, []); %// Reshape into name-value pairs
Now you have a cell array C of field names and their corresponding (string) values, and all you have to do is plug it into struct with the correct syntax (using a comma-separated list in this case). Note that the field names contain spaces, so this needs to be taken care of before they can be used (e.g. replace them with underscores):
C(1, :) = strrep(C(1, :), ' ', '_'); %// Replace spaces with underscores
data = struct(C{:});
Here's what I get for your input file:
data =
Data_Sampling_Rate: '256 Hz'
Channel_1: 'FP1-F7'
Channel_2: 'F7-T7'
Channel_3: 'T7-P7'
Channel_4: 'P7-O1'
File_Name: 'chb01_03.edf'
File_Start_Time: '13:43:04'
File_End_Time: '14:43:04'
Number_of_Seizures_in_File: '1'
Seizure_Start_Time: '2996 seconds'
Seizure_End_Time: '3036 seconds'
Of course, it is possible to prettify it even more by converting all relevant numbers to numerical values, grouping the 'channel' fields together and such, but I'll leave this to you. Good luck!
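Because every file block reuses the same labels, the output above shows only the last file (chb01_03.edf). If you want one record per file, as in the question's struct('id',...,'eseize',...), the same "label: value" regular expression idea can group lines by File Name:. The sketch below is in Python rather than MATLAB, and the mapping of labels onto the question's field names (FIELDS) is an assumption:

```python
import re

# 'Label: value' lines; the label stops at the first colon, so times such as
# 12:42:57 stay intact in the value.
FIELD_RE = re.compile(r'^\s*([^:]+?)\s*:\s*(.+?)\s*$')

# Assumed mapping from summary labels to the question's struct field names.
FIELDS = {
    'File Start Time': 'stime',
    'File End Time': 'etime',
    'Number of Seizures in File': 'seizenum',
    'Seizure Start Time': 'sseize',
    'Seizure End Time': 'eseize',
}


def parse_summary(lines):
    """Return one dict per 'File Name:' block."""
    records = []
    for line in lines:
        m = FIELD_RE.match(line)
        if not m:
            continue  # banner lines such as '*****' or 'Channels in EDF Files:'
        label, value = m.groups()
        if label == 'File Name':
            records.append({'id': value})  # start a new record
        elif records and label in FIELDS:
            records[-1][FIELDS[label]] = value
    return records


SAMPLE = """File Name: chb01_03.edf
File Start Time: 13:43:04
File End Time: 14:43:04
Number of Seizures in File: 1
Seizure Start Time: 2996 seconds
Seizure End Time: 3036 seconds"""
print(parse_summary(SAMPLE.splitlines()))
```

Lines before the first File Name: (sampling rate, channel list) are skipped here; they could be collected into a separate header dict if needed.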