Extracting JavaScript code from HTML with a regex

I want to extract the ID from this statement. How can I do this in Python?
I am a beginner in Python.
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind')

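#!/usr/bin/python2
# -*- coding: utf-8 -*-
import re
# Sample text containing two handler calls
text = """
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind')
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind2')
"""
# Raw string so the escaped parentheses reach the regex engine unchanged;
# the group captures the ID between the single quotes
m = re.findall(r"javascript:return WebForm_FireDefaultButton\(event, '([^']+)'\)", text)
print m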

Related

Parsing input with Python 2.7 argparse when the arguments are in other languages

# -*- coding: utf-8 -*-
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("name", help="Enter the name")
args = parser.parse_args()
name = args.name
print name
Command:
python work_space.py அரவிந்த்
Output:
????????
Command:
python work_space.py åäö
Output:
σΣ÷
I am not getting back the text I passed in. I also need to concatenate this input text. Which module should I use, and how do I implement this?

gTTS: changing the save destination

I'm trying to generate some .mp3 files, but I can't control where they are saved (they end up in the .py file's location).
Is it possible to change the save location?
# -*- coding: utf-8 -*-
from gtts import gTTS as gtts
from datetime import datetime as dt
# Standard gTTS usage: synthesize the text and save it as an .mp3
def audio_br(words, mp3name, language="pt"):
    teste = gtts(text=words, lang=language)
    teste.save("%s.mp3" % mp3name)
# Build the sentence to be spoken from the current time
tempo = dt.now()
begin = "Olá! são " + str(tempo.hour) + " horas e " + str(tempo.minute) + " minutos"
print begin
# Generate the .mp3
audio_br(begin, "A_BEGIN")
The problem is that the gTTS documentation doesn't say anything about the save path.
EDIT 1: Sorry for forgetting the code
# -*- coding: utf-8 -*-
from gtts import gTTS as gtts
from datetime import datetime as dt
import os
# Standard gTTS usage, now saving into a chosen folder
def audio_br(words, mp3name, language="pt"):
    teste = gtts(text=words, lang=language)
    # Replace <desired folder> with the target directory
    teste.save("%s.mp3" % os.path.join(<desired folder>, mp3name))
You need to call teste.save with the complete path to where you want the files to go. By default, if no path is provided, Python writes files to the current working directory, which is the folder the script is called from.
Using os.path.join is best because it joins paths using the system path separator, making your code more platform-independent.
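To make the path handling concrete, here is a minimal sketch of building the full target path before handing it to save. The helper name build_mp3_path and the output folder under the system temp directory are assumptions for illustration only; the gTTS call itself is left as a comment so the snippet runs without the library or a network connection.

```python
import os
import tempfile

def build_mp3_path(folder, mp3name):
    """Return '<folder>/<mp3name>.mp3', creating the folder if it is missing."""
    if not os.path.isdir(folder):
        os.makedirs(folder)
    return os.path.join(folder, "%s.mp3" % mp3name)

# Hypothetical destination folder; substitute whatever directory you need.
out_dir = os.path.join(tempfile.gettempdir(), "gtts_output")
target = build_mp3_path(out_dir, "A_BEGIN")
print(target)

# Inside audio_br you would then call:
# teste.save(target)
```

Creating the folder first matters because writing a file into a directory that does not exist fails.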

Why doesn't my text-cleaning function work without decoding to UTF-8?

I wrote the following function in Python 2.7 to clean text, but it doesn't work unless I first decode the tweet variable from UTF-8.
# -*- coding: utf-8 -*-
import re
def clean_tweet(tweet):
    # Replace every character outside the range U+0622..U+064A with a space
    tweet = re.sub(u"[^\u0622-\u064A]", ' ', tweet, flags=re.U)
    return tweet
if __name__ == "__main__":
    s="sadfas سيبس sdfgsdfg/dfgdfg ffeee منت منشس يت??بمنشس//تبي منشكسميكمنشسكيمنك ٌاإلا رًاٌااًٌَُ"
    print "not working "+clean_tweet(s)
    print "working "+clean_tweet(s.decode("utf-8"))
Could anyone explain why?
I don't want to decode, because it makes text manipulation in an SFrame in GraphLab too slow.

How do I get dictionary values printed as written in Python?

Solved with your help
#!/usr/bin/python
# -*- coding: utf-8 -*-
message = {'message1':'நாம்','message2':'செய்தி'}
a={}
# Copy every entry whose key contains "message"
for i in message.keys():
    if "message" in i:
        a[i]=message[i]
        status="success"
print a
got output:
{'message2':'செய்தி','message1':'நாம்'}
Thanks for all your help!!!!
You need to decode the value from UTF-8 in order to print or read it as written:
print message['message1'].decode('utf-8')
This will print correctly.
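As a small illustration of that decode step (a sketch, not part of the original answer; the variable names are arbitrary, and the code is written to run on both Python 2 and 3), compare the raw UTF-8 bytes with the decoded text:

```python
# -*- coding: utf-8 -*-
text = u'நாம்'

# Assumption: the source file is UTF-8, so a Python 2 byte-string literal
# would hold these same bytes.
raw = text.encode('utf-8')

# Decoding turns the bytes back into a Unicode string.
decoded = raw.decode('utf-8')

print(repr(raw))   # escaped byte values, not readable text
print(decoded)     # the readable text
```

Printing a whole container shows the repr of each item, which is why escaped bytes can appear even when printing a single decoded value works.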
Have you tried writing them as raw string literals?
message = {'message1': r'நாம்','message2':r'செய்தி'}

Can't output Russian text in Python 2.7.9

I can't get output in Russian, only Unicode escape sequences.
I use Python 2.7.9
on Microsoft Windows 8.
How can I do that with a list?
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
r = requests.get("http://fs.to/video/films/group/film_genre/")
response = r.content.decode('utf-8')
page = BeautifulSoup(response)
for tag in page.findAll('li'):
    a = tag.find('a')
    for b in a.contents:
        print (u'{0}'.format(u'○'),unicode(b.string))
The output should look like:
Аниме
Биография
...
Фэнтези
Эротика
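In Python 2, print (a, b) prints a tuple, so each item is shown as its escaped repr. Change the last line so that the two strings are printed directly:
print (u'{0}'.format(u'○')), b.string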