I'm trying to shorten or simplify my code.
I want to download a log file from an internal server which is updated every 10 seconds, but I'm only running my script every 10 or 15 minutes.
The log file is semicolon-separated and has many rows in it that I don't use. My workflow is as follows:
get current date in YYYYMMDD format
download the file
wait for the download to finish
trim the file to the rows I need
only process last line of the file
delete the files
I'm new to Python, and I would be thankful if you could help me shorten/simplify my code into fewer steps.
import urllib
import time
from datetime import date
today = str(date.today())
import csv

url = "http://localserver" + today + ".log"
urllib.urlretrieve(url, "output.log")
time.sleep(15)

with open("output.log", "rb") as source:
    rdr = csv.reader(source, delimiter=';')
    with open("result.log", "wb") as result:
        wtr = csv.writer(result)
        for r in rdr:
            wtr.writerow((r[0], r[1], r[2], r[3], r[4], r[5], r[15], r[38], r[39], r[42], r[54], r[90], r[91], r[92], r[111], r[116], r[121], r[122], r[123], r[124]))

with open('result.log') as myfile:
    print(list(myfile)[-1])  # how do I access certain rows here?
You could make use of the requests module, as below. The timeout can be increased depending on how long the download takes to complete. Furthermore, the two with open blocks in your code can be reduced to one, since the response can be parsed directly without saving an intermediate file. To read the response one line at a time instead of all at once, you can use the iter_lines generator; note that stream=True must be set for that to work.
from datetime import date
import csv
import requests

# Declare variables
today = str(date.today())
url = "http://localserver" + today + ".log"
outfile = 'output.log'

# Instead of waiting 15 seconds explicitly, let requests handle it
# with the timeout parameter
response = requests.get(url, timeout=15, stream=True)
if response.status_code != 200:
    print('Failed to get data:', response.status_code)
else:
    with open(outfile, 'w') as dest:
        writer = csv.writer(dest)
        # Take the last line of the response
        line = list(response.iter_lines())[-1]
        # Decode the bytes to a string and split it into lines for the csv reader
        reader = csv.reader(line.decode('utf-8').splitlines(), delimiter=';')
        # Write the selected columns of the parsed row to the output file
        for r in reader:
            writer.writerow((r[0], r[1], r[2], r[3], r[4], r[5], r[15], r[38], r[39], r[42], r[54], r[90], r[91], r[92],
                             r[111], r[116], r[121], r[122], r[123], r[124]))
    print('File written successfully: ' + outfile)
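A small caveat: list(response.iter_lines()) still materializes every line of the response, so the memory benefit of stream=True is lost. If the log grows large, a sketch like the one below (reusing the url variable and the column indices from above, which are assumptions taken from the question) keeps only the most recent line while iterating:

import csv
import requests

response = requests.get(url, timeout=15, stream=True)
response.raise_for_status()

last_line = None
for raw in response.iter_lines():
    if raw:                      # skip keep-alive / blank lines
        last_line = raw          # keep only the most recent line

if last_line is not None:
    # Parse that single line and pick out the wanted columns
    row = next(csv.reader([last_line.decode('utf-8')], delimiter=';'))
    wanted = [row[i] for i in (0, 1, 2, 3, 4, 5, 15, 38, 39, 42, 54, 90, 91, 92,
                               111, 116, 121, 122, 123, 124)]
    print(wanted)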
Related
I'm currently trying to save a file via requests; it's rather large, so I'm streaming it instead.
I'm unsure how to specifically do this, as I keep getting different errors. This is what I have so far.
def download_file(url, matte_upload_path, matte_servers, job_name, count):
    local_filename = url.split('/')[-1]
    url = "%s/static/downloads/%s_matte/%s/%s" % (matte_servers[0], job_name, count, local_filename)
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        fs = FileSystemStorage(location=matte_upload_path)
        print(matte_upload_path, 'matte path upload')
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
            fs.save(local_filename, f)
    return local_filename
but it returns
io.UnsupportedOperation: read
I'm basically trying to have requests save it to a specific location via Django; any help would be appreciated.
I was able to solve this by using a tempfile to save the streamed requests content, then saving it via FileSystemStorage:
local_filename = url.split('/')[-1]
url = "%s/static/downloads/%s_matte/%s/%s" % (matte_servers[0], job_name, count, local_filename)
response = requests.get(url, stream=True)
fs = FileSystemStorage(location=matte_upload_path)
lf = tempfile.NamedTemporaryFile()

# Read the streamed content in sections
for block in response.iter_content(1024 * 8):
    # If there is no more data then stop
    if not block:
        break
    # Write the block to the temporary file
    lf.write(block)

fs.save(local_filename, lf)
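One thing I'd double-check (an assumption on my part, not something stated in the answer above): flush and rewind the temporary file before handing it to fs.save(), otherwise the storage backend may read an incomplete or empty file:

lf.flush()   # make sure every buffered block is actually written to the temp file
lf.seek(0)   # rewind so FileSystemStorage reads the file from the beginning
fs.save(local_filename, lf)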
I am trying to create two files with the same data: one file to use for updating live web data and the other as a log. One file needs to be appended to and updated frequently. I can create the log fine, but I am struggling with how to handle the data for the second file.
I have tried using a with open statement for the log file. When I try reading this into a live web page, it shows me the data that was logged previously, and the data only updates when the file is closed.
#!/usr/bin/env python2.7
import os
import RPi.GPIO as GPIO
import time
import subprocess

#Solar Panel Script 1.0
#Set pin for Pump Relay Signal (PR = pin 29)
#Set up Pump Relay BCM5 (pin 29) as output pin in off position
GPIO.setmode(GPIO.BCM)
GPIO.setup(5, GPIO.OUT, initial=0)
GPIO.setwarnings(False)

#Load Hot Water Tank (HWT), Solar Panel (SP), and Outside Temp (OT) with OWFS
#Create CSV file for temperature data
from time import sleep, strftime, time

with open("/var/www/html/data.csv", "a") as log:
    while True:
        with open("/mnt/1wire/28.C14777910F02/temperature", "r") as myfile:
            HWT = myfile.read().replace('\n', '')
        with open("/mnt/1wire/28.390877910402/temperature", "r") as myfile2:
            SP = myfile2.read().replace('\n', '')
        log.write("{0},{1},{2}\n".format(strftime("%Y-%m-%d %H:%M:%S"), str(HWT), str(SP)))

        #Solar Hot Water Heater Module
        #Turns on PR only if SP is 10F hotter than HWT. Checks OT for freezing temps; if less than 33, PR is off.
        print('hot water: ' + HWT)
        print('solar panel: ' + SP)
        flt_HWT = float(HWT)
        flt_SP = float(SP)
        if flt_HWT > 170:
            GPIO.output(5, GPIO.LOW)   #Pump Relay Off
        if flt_SP > (flt_HWT + 10):
            GPIO.output(5, GPIO.HIGH)  #Pump Relay On
        state = GPIO.input(5)
        print state
        sleep(20)  #10 Minutes = 600
I expected the log file to allow me to collect data from it while it was open.
log.write("{0},{1},{2}\n".format(strftime("%Y-%m-%d %H:%M:%S"), str(HWT), str(SP)))
This is where you are writing the log. You can simply include another with open() statement here, opening the second file in append mode as well:

with open("secondfile.log", "a") as secfile:
    log.write("{0},{1},{2}\n".format(strftime("%Y-%m-%d %H:%M:%S"), str(HWT), str(SP)))  ## original log file write stays here
    secfile.write("{0},{1},{2}\n".format(strftime("%Y-%m-%d %H:%M:%S"), str(HWT), str(SP)))  ## and here you write the second file

However, if you are writing multiple files it would be better to stick the writes into a function of their own.
def write_file(text, filename):
    try:
        with open(filename, "a") as file:
            file.write(text)
        return True
    except:
        return False  ## include any other exception handling here
Now you can use:

success = write_file("log text", "filename.log")
if success:
    success = write_file("log2 text", "filename2.log")
if success:
    print("Yey both files have been written to")
else:
    print("Awww, there was an error writing to the file")
The purpose of the code below is to webscrape the Oxford English Dictionary for words that were "invented" in each year within a range of years. This all works as intended.
import csv
import os
import re
import requests
import urllib2

year_start = 1550
year_end = 1552
subject_search = ['Law']

for year in range(year_start, year_end + 1):
    path = '/Applications/Python 3.5/Economic'
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
    urllib2.install_opener(opener)
    user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    header = {'User-Agent': user_agent}
    resultPath = os.path.join(path, 'OED_table.csv')
    htmlPath = os.path.join(path, 'OED.html')
    request = urllib2.Request('http://www.oed.com/search?browseType=sortAlpha&case-insensitive=true&dateFilter=' + str(year) + '&nearDistance=1&ordered=false&page=1&pageSize=100&scope=ENTRY&sort=entry&subjectClass=' + str(subject_search) + '&type=dictionarysearch', None, header)
    page = opener.open(request)
    with open(resultPath, 'wb') as outputw, open(htmlPath, 'w') as outputh:
        urlpage = page.read()
        outputh.write(urlpage)
        new_words = re.findall(r'<span class=\"hwSect\"><span class=\"hw\">(.*?)</span>', urlpage)
        print new_words
        csv_writer = csv.writer(outputw)
        if csv_writer.writerow([year] + new_words):
            csv_writer.writerow([year, word])
However, when I actually run the code, the only portion that gets written to the csv file is the very last year that I call. So my csv file ends up looking like one row, like this:
1552, word1, word2, word3, etc....
I basically want to have a separate row for each year in the range of years. How do I go about this?
You keep overwriting the files inside the loop, and again every time you run the code. Open them once outside the loop, and open with a instead of w so each run of the code adds to the existing data instead of overwriting it:
with open("/Applications/Python 3.5/Economic/OED_table.csv", 'a') as outputw, open("/Applications/Python 3.5/Economic/OED.html", 'a') as outputh:
for year in range(year_start, year_end +1):
.....................
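Put together with the rest of the loop, the structure would look roughly like this (a sketch only; the request-building lines are unchanged from the question and elided here):

with open("/Applications/Python 3.5/Economic/OED_table.csv", 'a') as outputw, open("/Applications/Python 3.5/Economic/OED.html", 'a') as outputh:
    csv_writer = csv.writer(outputw)
    for year in range(year_start, year_end + 1):
        # ... build and open the request for this year exactly as in the question ...
        urlpage = page.read()
        outputh.write(urlpage)
        new_words = re.findall(r'<span class=\"hwSect\"><span class=\"hw\">(.*?)</span>', urlpage)
        csv_writer.writerow([year] + new_words)  # one row per year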
I have written the following code. I am able to print out the parsed values of lat and lon, but I am unable to write them to a file. I tried flush and I also tried closing the file, but to no avail. Can somebody point out what's wrong here?
import os
import serial

def get_present_gps():
    ser = serial.Serial('/dev/ttyUSB0', 4800)
    ser.open()
    # open a file to write gps data
    f = open('/home/iiith/Desktop/gps1.txt', 'w')
    data = ser.read(1024)  # read 1024 bytes
    f.write(data)  # write data into file
    f = open('/home/iiith/Desktop/gps1.txt', 'r')  # fetch the required file
    f1 = open('/home/iiith/Desktop/gps2.txt', 'a+')
    for line in f.read().split('\n'):
        if line.startswith('$GPGGA'):
            try:
                lat, _, lon = line.split(',')[2:5]
                lat = float(lat)
                lon = float(lon)
                print lat/100
                print lon/100
                a = [lat, lon]
                f1.write(lat + ",")
                f1.flush()
                f1.write(lon + "\n")
                f1.flush()
                f1.close()
            except:
                pass

while True:
    get_present_gps()
You're covering the error up by using except: pass. Don't do that... ever. At least log the exception.
One error it definitely covers is lat + ",", which is going to fail because adding a float and a str is not supported. But there may be more.
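For illustration, a sketch of how that inner block could be written so the values actually reach the file: format the floats explicitly, narrow the except to the parse error you expect, and move f1.close() out of the loop so later $GPGGA lines don't hit a closed file:

try:
    lat, _, lon = line.split(',')[2:5]
    lat = float(lat)
    lon = float(lon)
    f1.write("{0},{1}\n".format(lat, lon))  # format the floats instead of concatenating float + str
    f1.flush()
except ValueError:
    pass  # malformed sentence; skip it, but consider logging it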
In Python 2.7.3, I am trying to merge two files into one.
I download a file over the Internet. The entire file size is exactly 3,197,743 bytes. I download it in two parts: one part is 3,000,000 bytes in size, the second part is 197,743 bytes. Then I want to merge the two files to reconstruct the entire file.
Here is my code:
import requests
import shutil
URL = 'some_URL'
headers = {'user-agent': 'Agent'}
headers.update({'range': 'bytes=0-2999999'})
response = requests.get(URL, headers=headers)
file = open('some_file', 'wb')
file.write(response.content)
file.close()
headers2 = {'user-agent': 'Agent'}
headers2.update({'range': 'bytes=3000000-'})
response2 = requests.get(URL, headers=headers2)
file2 = open('some_file2', 'wb')
file2.write(response2.content)
file2.close()
source = open('some_file2','rb')
destination = open('some_file','ab')
shutil.copyfileobj(source,destination)
destination.close()
source.close()
At the end, I have one file ('some_file' in the example) whose size is exactly 3,197,743 bytes, but the file is corrupted. I tried this with a PDF file.
Where is the problem?
I tried to solve your problem with different approaches and used a diff tool to check whether the program retrieves the part files differently. I found no difference, so I am not really sure what's wrong.
However, I propose the following solution to resolve your use case:
import urllib2

URL = "http://traffic.org/general-reports/traffic_pub_gen19.pdf"
req = urllib2.urlopen(URL)
CHUNK = 3000000

with open("some_file.pdf", 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(chunk)
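As a follow-up on why the merged file might be corrupted in the first place: when sending a range header it is worth confirming that the server actually honoured it. A status code of 200 instead of 206 means you received the whole body (not the requested slice), and concatenating two full responses would certainly corrupt the result. A quick check, reusing the URL and headers from the question:

import requests

headers = {'user-agent': 'Agent', 'range': 'bytes=0-2999999'}
response = requests.get(URL, headers=headers)
print(response.status_code)                    # 206 means the server honoured the range request
print(response.headers.get('Content-Range'))   # e.g. bytes 0-2999999/3197743
print(len(response.content))                   # should be exactly 3,000,000 for the first part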