Making re.sub print whole input along with its modifications - regex

All I need is to perform a parameter replacement in a config file:
#!/usr/bin/python3
import re
current_config = '''
#USERNAME
[name = foo]
#-USERNAME
#DATABASE
[config = old]
#-DATABASE
#MORE_FIELDS
[moresettings]
#-MORE_FIELDS
'''
new_config ='''
#DATABASE
[config = new]
#-DATABASE
'''
final = re.sub('#DATABASE}}(.*)#-#{{#-DATABASE}}', current_config, new_config)
print(final)
The regex itself works fine, the problem is that it prints out only the (correctly) modified part, ignoring the rest of the file:
dev-sandbox#workstation:~/PycharmProjects/test$ python3 test.py
#DATABASE
[config = new]
#-DATABASE
What I want to achieve is the entire "current_config" variable including the new settings for the DATABASE block. I got it working with SED on shell, but I need this in python3.
Any help is much appreciated, the documentation for re.sub sugests it can't be done with it. Thank you!

Related

Scrapy how to save a State between spider runs (via scrapinghub)?

I have a spider that will run on schedule. Spider input is based on Date. From date of last scrape to todays date. So the question is how to save the date of last scrape within the Scrapy project? There is an option to get data from scrapy settings using pkjutil module, but i did not find any reference in the docs on how to write data in that file. Any idea? Maybe an alternative?
P.S. My other option is to use some free remote MySql DB just for this. But looks like more work if simple solution is available.
import pkgutil
class CodeSpider(scrapy.Spider):
name = "code"
allowed_domains = ["google.com.au"]
def start_requests(self):
f = pkgutil.get_data("au_go", "res/state.json")
ids = json.loads(f)
id = ids[0]['state']
yield {'state':id}
ids[0]['state'] = 'New State'
with open('./au_go/res/state.json', 'w') as f:
json.dump(ids, f)
The above solution works fine when ran locally. But I am getting no such file or directory when running the code at Scrapinghub.
File "/tmp/unpacked-eggs/__main__.egg/au_go/spiders/test_state.py", line 33, in parse
with open(savePath, 'w') as f:
IOError: [Errno 2] No such file or directory: './au_go/res/state.json'
The problem is fixed with use of Scrapinghub Colections
And scrapinghub API. Works nice now.
Here is an example code in case somebody will find it usefull.
from scrapinghub import ScrapinghubClient
client = ScrapinghubClient(Your API KEY)
project = client.get_project(Your Project ID)
collections = project.collections
last_accessed = collections.get_store('last_accessed')
last_accessed.set({'_key': 'Date', 'value': '12-54-1235'})
print last_accessed.get('Date')['value']

how to convert .docx file to html using python?

import mammoth
f = open("D:\filename.docx", 'rb')
document = mammoth.convert_to_html(f)
I am unable to get a .html file while i run this code,please help me to get it, When i converted to .html file i am not getting images inserted into word file into .html file,Can you please help me how to get images into .html from .docx?
Try this:
import mammoth
f = open("path_to_file.docx", 'rb')
b = open('filename.html', 'wb')
document = mammoth.convert_to_html(f)
b.write(document.value.encode('utf8'))
f.close()
b.close()
This is may be late to answer this question but just incase if someone still looking for the answer where word "tables/images/" should remains same after conversion to html below answer would help.
import win32com.client as win32
# Open MS Word
word = win32.gencache.EnsureDispatch('Word.Application')
wordFilePath = "C:\filename.docx"
doc = word.Documents.Open(wordFilePath)
# change to a .html
txt_path = wordFilePath.split('.')[0] + '.html'
# wdFormatFilteredHTML has value 10
# saves the doc as an html
doc.SaveAs(txt_path, 10)
doc.Close()
# noinspection PyBroadException
try:
word.ActiveDocument()
except Exception:
word.Quit()
I suggest you to try the following code
import mammoth
with open("document.docx", "rb") as docx_file:
result = mammoth.convert_to_html(docx_file)
html = result.value

Element not found in cache - Selenium (Python)

I just wrote a simple webscraping script to give me all the episode links on a particular site's page. The script was working fine, but, now it's broke. I didn't change anything.
Try this URL (For scraping ) :- http://www.crunchyroll.com/tabi-machi-late-show
Now, the script works mid-way and gives me an error stating, ' Element not found in the cache - perhaps the page has changed since it was looked up'
I looked it up on internet and people said about using the 'implicit wait' command at certain places. I did that, still no luck.
UPDATE : I tried this script in a demote desktop and it's working there without any problems.
Here's my script :-
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import time
from subprocess import Popen
#------------------------------------------------
try:
Link = raw_input("Please enter your Link : ")
if not Link:
raise ValueError('Please Enter A Link To The Anime Page. This Application Will now Exit in 5 Seconds.')
except ValueError as e:
print(e)
time.sleep(5)
exit()
print 'Analyzing the Page. Hold on a minute.'
driver = webdriver.Firefox()
driver.get(Link)
assert "Crunchyroll" in driver.title
driver.implicitly_wait(5) # <-- I tried removing this lines as well. No luck.
elem = driver.find_elements_by_xpath("//*[#href]")
driver.implicitly_wait(10) # <-- I tried removing this lines as well. No luck.
text_file = open("BatchLink.txt", "w")
print 'Fetching The Links, please wait.'
for elem in elem:
x = elem.get_attribute("href")
#print x
text_file.write(x+'\n')
print 'Links have been fetched. Just doing the final cleaning now.'
text_file.close()
CleanFile = open("queue.txt", "w")
with open('BatchLink.txt') as f:
mylist = f.read().splitlines()
#print mylist
with open('BatchLink.txt', 'r') as inF:
for line in inF:
if 'episode' in line:
CleanFile.write(line)
print 'Please Check the file named queue.txt'
CleanFile.close()
os.remove('BatchLink.txt')
driver.close()
Here's a screenshot of the error (might be of some help) :
http://i.imgur.com/SaANlsg.png
Ok i didn't work with python but know the problem
you have variable that you init -> elem = driver.find_elements_by_xpath("//*[#href]")
after that you doing some things with it in loop
before you finishing the loop try to init this variable again
elem = driver.find_elements_by_xpath("//*[#href]")
The thing is that the DOM is changes and you loosing the element collection.

Manually building a deep copy of a ConfigParser in Python 2.7

Just starting in on my Python learning curve, and hitting a snag in porting some code up to Python 2.7. It appears that in Python 2.7 it is no longer possible to perform a deepcopy() on instances of ConfigParser. It also appears that the Python team isn't terribly interested in restoring such a capability:
http://bugs.python.org/issue16058
Can someone propose an elegant solution for manually constructing a deepcopy/duplicate of an instance of ConfigParser?
Many thanks, -Pete
This is just an example implementation of Jan Vlcinsky answer written in Python 3 (I don't have enough reputation to post this as a comment to Jans answer). Many thanks to Jan for the push in the right direction.
To make a full (deep) copy of base_config into new_config just do the following;
import io
import configparser
config_string = io.StringIO()
base_config.write(config_string)
# We must reset the buffer ready for reading.
config_string.seek(0)
new_config = configparser.ConfigParser()
new_config.read_file(config_string)
Based on #Toenex answer, modified for Python 2.7:
import StringIO
import ConfigParser
# Create a deep copy of the configuration object
config_string = StringIO.StringIO()
base_config.write(config_string)
# We must reset the buffer to make it ready for reading.
config_string.seek(0)
new_config = ConfigParser.ConfigParser()
new_config.readfp(config_string)
The previous solution doesn't work in all python3 use cases. Specifically if the original parser is using Extended Interpolation the copy may fail to work correctly. Fortunately, the easy solution is to use the pickle module:
def deep_copy(config:configparser.ConfigParser)->configparser.ConfigParser:
"""deep copy config"""
rep = pickle.dumps(config)
new_config = pickle.loads(rep)
return new_config
If you need new independent copy of ConfigParser, then one option is:
have original version of ConfigParser
serialize the config file into temporary file or StringIO buffer
use that tmpfile or StringIO buffer to create new ConfigParser.
And you have it done.
If you are using Python 3 (3.2+) you can use the Mapping Protocol Access to copy (actually deep copy) the sections and options of a source configuration to another ConfigParser object.
You can use read_dict() to copy the state of a configuration parser.
Here is a demo:
import configparser
# the configuration to deep copy:
src_cfg = configparser.ConfigParser()
src_cfg.add_section("Section A")
src_cfg["Section A"]["key1"] = "value1"
src_cfg["Section A"]["key2"] = "value2"
# the destination configuration
dst_cfg = configparser.ConfigParser()
dst_cfg.read_dict(src_cfg)
dst_cfg.add_section("Section B")
dst_cfg["Section B"]["key3"] = "value3"
To display the resulting configuration, you can try:
import io
output = io.StringIO()
dst_cfg.write(output)
print(output.getvalue())
You get:
[Section A]
key1 = value1
key2 = value2
[Section B]
key3 = value3
After reading this article, I am more familiar with config.ini.
Record as follows:
import io
import configparser
def copy_config_demo():
with io.StringIO() as memory_file:
memory_file.write(str(test_config_data.__doc__)) # original_config.write(memory_file)
memory_file.seek(0)
new_config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
new_config.read_file(memory_file)
# below is just for test
for section_name, list_item in [(section_name, new_config.items(section_name)) for section_name in new_config.sections()]:
print('\n[' + section_name + ']')
for key, value in list_item:
print(f'{key}: {value}')
def test_config_data():
"""
[Common]
home_dir: /Users
library_dir: /Library
system_dir: /System
macports_dir: /opt/local
[Frameworks]
Python: >=3.2
path: ${Common:system_dir}/Library/Frameworks/
[Arthur]
name: Carson
my_dir: ${Common:home_dir}/twosheds
my_pictures: ${my_dir}/Pictures
python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python}
"""
output:
[Common]
home_dir: /Users
library_dir: /Library
system_dir: /System
macports_dir: /opt/local
[Frameworks]
python: >=3.2
path: /System/Library/Frameworks/
[Arthur]
name: Carson
my_dir: /Users/twosheds
my_pictures: /Users/twosheds/Pictures
python_dir: /System/Library/Frameworks//Python/Versions/>=3.2
hoping it is helpful to you.

Program for Backup - Python

Im trying to execute the following code in Python 2.7 on Windows7. The purpose of the code is to take back up from the specified folder to a specified folder as per the naming pattern given.
However, Im not able to get it work. The output has always been 'Backup Failed'.
Please advise on how I get resolve this to get the code working.
Thanks.
Code :
backup_ver1.py
import os
import time
import sys
sys.path.append('C:\Python27\GnuWin32\bin')
source = 'C:\New'
target_dir = 'E:\Backup'
target = target_dir + os.sep + time.strftime('%Y%m%d%H%M%S') + '.zip'
zip_command = "zip -qr {0} {1}".format(target,''.join(source))
print('This is a program for backing up files')
print(zip_command)
if os.system(zip_command)==0:
print('Successful backup to', target)
else:
print('Backup FAILED')
See if escaping the \'s helps :-
source = 'C:\\New'
target_dir = 'E:\\Backup'