Can't output Russian text in Python 2.7.9 - python-2.7

I can't get the output to appear in Russian; I only get Unicode escape sequences =(
I use Python 2.7.9 on Microsoft Windows 8.
How can I do that with a list?
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
r = requests.get("http://fs.to/video/films/group/film_genre/")
response = r.content.decode('utf-8')
page = BeautifulSoup(response)
for tag in page.findAll('li'):
    a = tag.find('a')
    for b in a.contents:
        print (u'{0}'.format(u'○'),unicode(b.string))
The output should look like this:
Аниме
Биография
...
Фэнтези
Эротика

Change the last line to:
print (u'{0}'.format(u'○'),b.string)
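
One thing to watch for, as far as I can tell: in Python 2, `print (x, y)` prints a tuple, and a tuple shows the `repr` of each element, which is where the `\uXXXX` escapes come from. Below is a minimal sketch that builds a single unicode string per line instead. `format_genre` and the sample `genres` list are made up for illustration and stand in for the scraped `b.string` values:

```python
# -*- coding: utf-8 -*-

def format_genre(name, bullet=u"\u25cb"):
    """Return one unicode line such as u'○ Аниме' (hypothetical helper)."""
    return u"{0} {1}".format(bullet, name)

# Sample data standing in for the values scraped from the page (assumption).
genres = [u"Аниме", u"Биография"]
lines = [format_genre(g) for g in genres]

if __name__ == "__main__":
    for line in lines:
        # A single parenthesised argument is a plain string, not a tuple,
        # so this prints the text itself on both Python 2 and Python 3.
        print(line)
```

Whether the `○` bullet itself can be displayed depends on the console's encoding; on a Windows console it may need to be dropped or replaced.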

Related

Django-grpc-framework generates strange gRPC code

When I generate gRPC code in my django-grpc-framework project using the command:
python -m grpc_tools.protoc --proto_path=./ --python_out=./temp --grpc_python_out=./temp ./config.proto
files are generated, but config_pb2 looks almost empty:
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: config.proto
"""Generated protocol buffer code."""
from google.protobuf.internal import builder as _builder
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import symbol_database as _symbol_database
# ##protoc_insertion_point(imports)
_sym_db = _symbol_database.Default()
from google.protobuf import empty_pb2 as google_dot_protobuf_dot_empty__pb2
DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x0c\x63onfig.proto\x12\x0c\x63onfig_proto\x1a\x1bgoogle/protobuf/empty.proto\"u\n\rConfigMessage\x12\x0f\n\x07service\x18\x01 \x01(\t\x12\x0f\n\x07version\x18\x02 \x01(\t\x12\x0f\n\x07is_used\x18\x03 \x01(\x08\x1a\x31\n\x03Key\x12\x13\n\x0bservice_key\x18\x01 \x01(\t\x12\x15\n\rservice_value\x18\x02 \x01(\t\"\x1a\n\x18\x43onfigMessageListRequest\"*\n\x1c\x43onfigMessageRetrieveRequest\x12\n\n\x02id\x18\x01 \x01(\x05\x32\x88\x03\n\x10\x43onfigController\x12O\n\x04List\x12&.config_proto.ConfigMessageListRequest\x1a\x1b.config_proto.ConfigMessage\"\x00\x30\x01\x12\x44\n\x06\x43reate\x12\x1b.config_proto.ConfigMessage\x1a\x1b.config_proto.ConfigMessage\"\x00\x12U\n\x08Retrieve\x12*.config_proto.ConfigMessageRetrieveRequest\x1a\x1b.config_proto.ConfigMessage\"\x00\x12\x44\n\x06Update\x12\x1b.config_proto.ConfigMessage\x1a\x1b.config_proto.ConfigMessage\"\x00\x12#\n\x07\x44\x65stroy\x12\x1b.config_proto.ConfigMessage\x1a\x16.google.protobuf.Empty\"\x00\x62\x06proto3')
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'config_pb2', globals())
if _descriptor._USE_C_DESCRIPTORS == False:
  DESCRIPTOR._options = None
  _CONFIGMESSAGE._serialized_start=59
  _CONFIGMESSAGE._serialized_end=176
  _CONFIGMESSAGE_KEY._serialized_start=127
  _CONFIGMESSAGE_KEY._serialized_end=176
  _CONFIGMESSAGELISTREQUEST._serialized_start=178
  _CONFIGMESSAGELISTREQUEST._serialized_end=204
  _CONFIGMESSAGERETRIEVEREQUEST._serialized_start=206
  _CONFIGMESSAGERETRIEVEREQUEST._serialized_end=248
  _CONFIGCONTROLLER._serialized_start=251
  _CONFIGCONTROLLER._serialized_end=643
# ##protoc_insertion_point(module_scope)
So when I look at config_pb2_grpc, I see references like:
config__pb2.ConfigMessage
But no ConfigMessage is defined in config_pb2. What is more, config_pb2 has one underscore while this name has two, which seems rather strange to me.
Is this all right?
For example, when I generate code in my ordinary django-rest-framework project using:
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/data.proto
I get in data_pb2.py:
...
ConfigMessage = _reflection.GeneratedProtocolMessageType('ConfigMessage', (_message.Message,), {
...
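
For what it's worth, both observations look expected with newer protobuf releases, as far as I know: the `_builder.Build...` calls receive `globals()` and create the message classes when the module is imported, so `ConfigMessage` never appears literally in the file, and `config__pb2` is typically just the alias that the generated `config_pb2_grpc.py` gives to its `config_pb2` import. The sketch below is a plain-Python illustration of that mechanism only, not protobuf's real implementation; `build_messages` and `module_namespace` are hypothetical names:

```python
def build_messages(message_names, namespace):
    """Create one placeholder class per name and store it in `namespace`.

    This mimics, very loosely, how a builder can populate a module's
    globals() at import time without the class names being written
    out in the source file.
    """
    for name in message_names:
        namespace[name] = type(name, (object,), {})

# A dict standing in for the generated module's globals() (assumption).
module_namespace = {}
build_messages(["ConfigMessage", "ConfigMessageListRequest"], module_namespace)

# The class exists after the call even though no `class` statement defines it.
ConfigMessage = module_namespace["ConfigMessage"]
message = ConfigMessage()
```

A quick check in a real project would be to import `config_pb2` in a Python shell and look at `dir(config_pb2)`.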

Parsing non-English input with argparse in Python 2.7

# -*- coding: utf-8 -*-
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("name", help="Enter the name")
args = parser.parse_args()
name = args.name
print name
Command:
python work_space.py அரவிந்த்
Output:
????????
Command:
python work_space.py åäö
Output:
σΣ÷
I am not able to get the text that I need. I also need to concatenate this input text with other strings. Please let me know which module I have to use and how to implement this.
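
The second output is consistent with a code page mismatch: `åäö` encoded as cp1252 and then displayed as cp437 gives exactly `σΣ÷`. The sketch below reproduces that and decodes the raw argument with an explicit encoding. It assumes the arguments arrive as cp1252 bytes, which is a guess about this particular Windows setup; `decode_arg` is a hypothetical helper, not part of argparse:

```python
# -*- coding: utf-8 -*-

def decode_arg(raw, encoding):
    """Turn a byte-string argument into unicode; pass unicode through."""
    if isinstance(raw, bytes):  # `bytes` is `str` on Python 2
        return raw.decode(encoding)
    return raw

typed = u"åäö"
raw = typed.encode("cp1252")       # bytes as they might arrive in sys.argv (assumption)
garbled = raw.decode("cp437")      # how a cp437 console would display those bytes
name = decode_arg(raw, "cp1252")   # decoded with the matching code page
greeting = u"Hello, " + name       # concatenation is safe once everything is unicode
```

With argparse this could be wired in as `type=lambda s: decode_arg(s, sys.getfilesystemencoding())`. The Tamil example presumably prints `?` because those characters cannot be represented in that code page at all, in which case decoding cannot recover them.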

Why doesn't my text-cleaning function work without decoding to UTF-8?

I wrote the following function in Python 2.7 to clean text, but it doesn't work unless the tweet variable is first decoded from UTF-8:
# -*- coding: utf-8 -*-
import re
def clean_tweet(tweet):
    tweet = re.sub(u"[^\u0622-\u064A]", ' ', tweet, flags=re.U)
    return tweet
if __name__ == "__main__":
    s="sadfas سيبس sdfgsdfg/dfgdfg ffeee منت منشس يت??بمنشس//تبي منشكسميكمنشسكيمنك ٌاإلا رًاٌااًٌَُ"
    print "not working "+clean_tweet(s)
    print "working "+clean_tweet(s.decode("utf-8"))
Could anyone explain why?
I would rather not decode, because it makes manipulating the text in an SFrame in GraphLab too slow.
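
The likely reason: in Python 2 a plain `"..."` literal is a byte string, so the regex sees the UTF-8 bytes one at a time. Each Arabic letter is two bytes, no single byte value falls in the range `\u0622-\u064A`, and so every byte gets replaced. Here is a minimal sketch that decodes once at the boundary and precompiles the pattern. The `isinstance` check and the `ARABIC_ONLY` name are my additions, not part of the original code, and this does not avoid the decode itself:

```python
# -*- coding: utf-8 -*-
import re

# Matches any character that is not an Arabic letter in U+0622..U+064A.
ARABIC_ONLY = re.compile(u"[^\u0622-\u064A]", flags=re.U)

def clean_tweet(tweet):
    """Replace every character outside U+0622..U+064A with a space."""
    if isinstance(tweet, bytes):  # `bytes` is `str` on Python 2
        tweet = tweet.decode("utf-8")
    return ARABIC_ONLY.sub(u" ", tweet)

sample = u"abc \u0633\u064a"
cleaned_text = clean_tweet(sample)                   # unicode input
cleaned_bytes = clean_tweet(sample.encode("utf-8"))  # UTF-8 byte input
utf8_length = len(u"\u0633".encode("utf-8"))         # bytes used by one Arabic letter
```

Compiling the pattern once avoids repeating that work for every row, which may help a little with the speed concern.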

Extracting JavaScript code from HTML

I would like to extract the id from this statement. How should I proceed in Python?
I am a beginner in Python.
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind')
#!/usr/bin/python2
# -*- coding: utf-8 -*-
import re
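# Sample text to search (named `text` to avoid shadowing the built-in `input`).
text = """
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind')
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind2')
"""
# The raw string keeps the escaped parentheses intact; the group captures the quoted id.
m = re.findall(r"javascript:return WebForm_FireDefaultButton\(event, '([^']+)'\)", text)
print m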

Scrapy: encoding issue when dumping to a JSON file

Here is the website I would like to parse: [website in Russian][1]
Here is the code that extracts the info that I need:
# -*- coding: utf-8 -*-
from scrapy.spider import Spider
from scrapy.selector import Selector
from flats.items import FlatsItem
class DmozSpider(Spider):
    name = "dmoz"
    start_urls = ['http://rieltor.ua/flats-sale/?ncrnd=6510']
    def parse(self, response):
        sel=Selector(response)
        flats=sel.xpath('//*[@id="content"]')
        flats_stored_info=[]
        flat_item=FlatsItem()
        for flat in flats:
            flat_item['square']=[s.encode("utf-8") for s in sel.xpath('//div/strong[@class="param"][1]/text()').extract()]
            flat_item['rooms_floor_floors']=[s.encode("utf-8") for s in sel.xpath('//div/strong[@class="param"][2]/text()').extract()]
            flat_item['address']=[s.encode("utf-8") for s in flat.xpath('//*[@id="content"]//h2/a/text()').extract()]
            flat_item['price']=[s.encode("utf-8") for s in flat.xpath('//div[@class="cost"]/strong/text()').extract()]
            flat_item['subway']=[s.encode("utf-8") for s in flat.xpath('//span[@class="flag flag-location"]/a/text()').extract()]
            flats_stored_info.append(flat_item)
        return flats_stored_info
This is how I dump to a JSON file:
scrapy crawl dmoz -o items.json -t json
The problem is this: when I change the code above to print the extracted info to the console, like this:
flat_item['square']=sel.xpath('//div/strong[@class="param"][1]/text()').extract()
for bla in flat_item['square']:
    print bla
the script displays the information in Russian correctly.
But when I dump the scraped information using the first version of the script (with encoding to UTF-8), it writes something like this to the JSON file:
[{"square": ["2-\u043a\u043e\u043c\u043d., 16 \u044d\u0442\u0430\u0436 16-\u044d\u0442. \u0434\u043e\u043c", "1-\u043a\u043e\u043c\u043d.,
How can I dump the information into the JSON file in Russian? Thank you for your advice.
[1]: http://rieltor.ua/flats-sale/?ncrnd=6510
It is correctly encoded; it's just that the json library escapes non-ASCII characters by default.
You can load the data and use it (copying data from your example):
>>> import json
>>> print json.loads('"2-\u043a\u043e\u043c\u043d., 16 \u044d\u0442\u0430\u0436 16-\u044d\u0442. \u0434\u043e\u043c"')
2-комн., 16 этаж 16-эт. дом
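
If the file itself should contain readable Russian, the standard `json` module accepts `ensure_ascii=False`. A minimal sketch outside of Scrapy, with a made-up `item` dict standing in for a scraped item:

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical item with unicode values, similar to the scraped data.
item = {"square": [u"2-комн., 16 этаж 16-эт. дом"]}

escaped = json.dumps(item)                       # default: non-ASCII becomes \uXXXX
readable = json.dumps(item, ensure_ascii=False)  # keeps the Cyrillic characters

# Both strings decode back to the same data.
same_data = json.loads(escaped) == json.loads(readable)
```

When writing `readable` to disk, open the file with an explicit UTF-8 encoding. Newer Scrapy releases also have a `FEED_EXPORT_ENCODING` setting that can be set to `'utf-8'` for the feed exporter, if your version supports it.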