What I'm trying to do
I'm trying to enter a list of tags in Flask that should become passable as a list, but I can't figure out how to do it in Flask, nor can I find documentation for adding lists (of strings) in flask_wtf. Does anyone have experience with this?
Ideally I would like the tags to be selectively deletable after they have been entered.
The problem
Thus far my form is static. You enter stuff, hit submit, and it gets processed into a .json file. The tags list is the last element I can't figure out; I don't even know if Flask can do this.
A little demo of how I envisioned it:
How I envisioned the entry process:
The current tags are displayed, along with an entry field to add new ones.
[Tag1](x) | [Tag2](x)
Enter new Tag: [______] (add)
Hit (add)
[Tag1](x) | [Tag2](x)
Enter new Tag: [Tag3__] (add)
New Tag is added
[Tag1](x) | [Tag2](x) | [Tag3](x)
Enter new Tag: [______]
How I envisioned the deletion process:
Hitting the (x) on the side of the tag should kill it.
[Tag1](x) | [Tag2](x) | [Tag3](x)
Hit (x) on Tag2. Result:
[Tag1](x) | [Tag3](x)
The deletion is kind of icing on the cake and could probably be done once I have a list I can edit, but getting there seems quite hard.
I'm at a loss here.
I basically want to know if it's possible to enter lists in general, since there does not seem to be documentation on the topic.
Your description is not entirely clear (is Tag1 the key in the JSON, or is Tag the key and 1 the index?), but I had a similar issue recently where I wanted to submit a basic list in JSON and let WTForms handle it properly.
For instance, this:
{
    "name": "John",
    "tags": ["code", "python", "flask", "wtforms"]
}
So I had to rewrite the way FieldList works, because WTForms, for some reason, wants the list submitted as "tags-1=XXX, tags-2=xxx".
from wtforms import FieldList


class JSONFieldList(FieldList):
    def process(self, formdata, data=None):
        self.entries = []
        if data is None or not data:
            try:
                data = self.default()
            except TypeError:
                data = self.default

        self.object_data = data

        if formdata:
            # read the whole list from a single key (e.g. "tags") instead of the
            # indexed "tags-0", "tags-1" keys that FieldList normally expects
            for (index, obj_data) in enumerate(formdata.getlist(self.name)):
                self._add_entry(formdata, obj_data, index=index)
        else:
            for obj_data in data:
                self._add_entry(formdata, obj_data)

        while len(self.entries) < self.min_entries:
            self._add_entry(formdata)

    def _add_entry(self, formdata=None, data=None, index=None):
        assert not self.max_entries or len(self.entries) < self.max_entries, \
            'You cannot have more than max_entries entries in this FieldList'
        if index is None:
            index = self.last_index + 1
        self.last_index = index
        name = '%s-%d' % (self.short_name, index)
        id = '%s-%d' % (self.id, index)
        field = self.unbound_field.bind(form=None, name=name, id=id, prefix=self._prefix,
                                        _meta=self.meta, translations=self._translations)
        field.process(formdata, data)
        self.entries.append(field)
        return field
On Flask's end to handle the form:
from flask import request
from werkzeug.datastructures import ImmutableMultiDict

@app.route('/add', methods=['POST'])
def add():
    # wrap the JSON payload in a MultiDict so WTForms can consume it
    form = MyForm(ImmutableMultiDict(request.get_json()))
    # process the form; form.tags.data is a list
And the form (notice the use of JSONFieldList):
from wtforms import StringField, validators
from wtforms.validators import Optional

class MonitorForm(BaseForm):
    name = StringField(validators=[validators.DataRequired(), validators.Length(min=3, max=5)], filters=[lambda x: x or None])
    tags = JSONFieldList(StringField(validators=[validators.DataRequired(), validators.Length(min=1, max=250)], filters=[lambda x: x or None]), validators=[Optional()])
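To round this off, here is a minimal sketch of how the endpoint could be exercised; it assumes MonitorForm is the MyForm used in the /add view, that app is your Flask application, and that CSRF is disabled for this JSON endpoint:

import json

with app.test_client() as client:
    client.post(
        '/add',
        data=json.dumps({"name": "John", "tags": ["code", "python", "flask", "wtforms"]}),
        content_type='application/json',
    )
    # inside the /add view, form.tags.data == ["code", "python", "flask", "wtforms"]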
I found a viable solution in this 2015 book, where a tagging system is built for Flask as part of a blog-building exercise.
It's based on Flask-SQLAlchemy.
Entering lists is therefore possible with WTForms / Flask: submit the items to the database via, e.g., FieldList, and, in the use case of a tagging system, read them back from the database to render them in the UI.
If, however, you don't want to deal with O'Reilly's paywall (I'm sorry, I can't post copyrighted material here) and all you want is a way to add tags, check out taggle.js by Sean Coker. It's not Flask but JavaScript, and it does the job.
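For what it's worth, here is a minimal sketch of the FieldList-plus-database approach described above; the Tag model, the route and the template name are illustrative, not taken from the book:

from flask import Flask, render_template
from flask_sqlalchemy import SQLAlchemy
from flask_wtf import FlaskForm
from wtforms import FieldList, StringField
from wtforms.validators import DataRequired, Optional

app = Flask(__name__)
app.config['SECRET_KEY'] = 'dev'
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///tags.db'
db = SQLAlchemy(app)

class Tag(db.Model):  # illustrative model
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(250), unique=True, nullable=False)

class TagForm(FlaskForm):
    # one sub-field per tag; the template renders one input per entry
    tags = FieldList(StringField(validators=[DataRequired()]),
                     validators=[Optional()], min_entries=1)

@app.route('/tags', methods=['GET', 'POST'])
def tags():
    form = TagForm()
    if form.validate_on_submit():
        for name in form.tags.data:  # a plain list of strings
            if not Tag.query.filter_by(name=name).first():
                db.session.add(Tag(name=name))
        db.session.commit()
    return render_template('tags.html', form=form, tags=Tag.query.all())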
Related
Case: a paginated page with an overview of users who sent the 'receiver' a message. Once clicked, the receiver goes to the page with the actual conversation.
The overview page is paginated (this works). However, after implementing it we noticed that if user X received 2 messages from Y, the front end displays two rows for Y. So we are looking for a way to group the paginated objects, which led to the code below. That, however, brings the next challenge: since the pagination only retrieves the first X objects, if the next paginated page contains a user from page 1 (user Y), then Y will appear on page 2 as well. I tried to create a pagination object after building 'unique_group', but that is just a list and cannot be paginated.
Retrieving .all() seems inefficient, especially as the application grows.
def group_paginated_msgs(user, Message):
    # [Y, Y, Z] = simple example
    ''' grab user messages '''
    page = request.args.get('page', 1, type=int)
    messages = user.messages_received.order_by(
        Message.timestamp.desc()
    ).paginate(
        page,
        5,  # from config in prod.
        False
    )
    unique_authors = []
    unique_group = []
    # group user messages
    try:
        p = messages.items
        for m in p:
            if m.author in unique_authors:
                continue
            else:
                unique_authors.append(m.author)
                unique_group.append(m)
    except:
        ...
    # print(unique_group) -> [Y, Z]
Could you use the .distinct() method to get the unique users from your query before you paginate?
Something like this
messages = Message.query.filter(
    Message.receiver == user  # assuming a 'receiver' relationship/column on Message
).distinct(
    Message.author
).order_by(
    Message.timestamp.desc()
).paginate(
    page,
    5,  # from config in prod.
    False
)
This example works by querying the messages received by your user and then filtering for distinct message authors.
I'm not sure whether exactly what I have written will work, but I think using the distinct() method is a good place to start. You can find more info about the method in the SQLAlchemy documentation.
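If distinct() gives you trouble (passing a column to distinct() translates to DISTINCT ON, which is PostgreSQL-only), another option is to collapse the messages to the newest one per author in a subquery and paginate that. A rough sketch, assuming Message has author_id and recipient_id foreign keys (column names are assumptions):

from sqlalchemy import func

# newest message id per author for this receiver
latest = db.session.query(
    func.max(Message.id).label('msg_id')
).filter(
    Message.recipient_id == user.id
).group_by(Message.author_id).subquery()

messages = Message.query.join(
    latest, Message.id == latest.c.msg_id
).order_by(Message.timestamp.desc()).paginate(page, 5, False)

Each page then contains at most one row per author, and messages.items can be rendered directly.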
I have a ReactJS component inside a Django template where a user clicks a checkout button, posts the item_code, and gets redirected to checkout:
onCheckout = () => {
    fetch("/onCheckout/", {
        method: "POST",
        body: JSON.stringify({'item': this.props.item_info.code})
    }).then(window.location.replace("/checkout"))
}
A Django view receives the request and stores it in a session.
def onCheckout(request):
    if request.method == "POST":
        items = request.session.get('items', [])
        new_item = json.loads(request.body.decode('utf-8'))['item']
        items.append(new_item)
        request.session['items'] = items
I am having an issue with storing data in the session. The first item gets stored correctly in the array, but when I then check out a second item, the items array starts acting up:
(Pdb) items
['15130BC.ZZ.8042BC.01']
(Pdb) new_item
'5213G-001'
(Pdb) items
['15130BC.ZZ.8042BC.01']
(Pdb) items
['5213G-001']
If I try to access request.session['item'] from any other view function, I get a KeyError.
I am fairly new to Django; any help would be appreciated. Also, I would like to know if there are better alternatives to accomplish the above.
Sessions Config
settings.SESSION_ENGINE = 'django.contrib.sessions.backends.db'
settings.SESSION_CACHE_ALIAS = 'default'
settings.CACHES = {'default': {'BACKEND': 'django.core.cache.backends.locmem.LocMemCache'}}
Some reading on change detection for Django sessions: https://docs.djangoproject.com/en/2.0/topics/http/sessions/#when-sessions-are-saved
Based on your code, it appears to me that the change detection should happen. However, let's try to brute-force this: can you add the following line as the last line of your code, request.session.modified = True, and see if this fixes your issue?
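In your view that would look roughly like this (only the last line is new):

def onCheckout(request):
    if request.method == "POST":
        items = request.session.get('items', [])
        new_item = json.loads(request.body.decode('utf-8'))['item']
        items.append(new_item)
        request.session['items'] = items
        request.session.modified = True  # tell Django the session changed so it gets saved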
Update: some basic checks
Can you verify the following
Check if your db backend is configured properly
If you want to use a database-backed session, you need to add 'django.contrib.sessions' to your INSTALLED_APPS setting. Once you have configured your installation, run manage.py migrate to install the single database table that stores session data.
Check if your session Middleware is enabled
Sessions are implemented via a piece of middleware. The default settings.py created by django-admin startproject has SessionMiddleware activated. To enable session functionality, edit the MIDDLEWARE_CLASSES setting and make sure it contains 'django.contrib.sessions.middleware.SessionMiddleware'.
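For reference, the relevant pieces of settings.py look like this in a default project (on older Django versions the second setting is named MIDDLEWARE_CLASSES):

INSTALLED_APPS = [
    # ...
    'django.contrib.sessions',
]

MIDDLEWARE = [
    # ...
    'django.contrib.sessions.middleware.SessionMiddleware',
]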
Update 2: Test the session
Maybe modify an existing endpoint as follows and see if you are able to store values and persist them in the session:
test_keys = request.session.get('test_keys', [])
test_keys.append(random.randint(0, 100))  # any bounds will do; randint needs two arguments
request.session['test_keys'] = test_keys
return Response(request.session.get('test_keys', []))
You should see that each time you hit the API, you get a list with one new integer in addition to all past values. Let me know how this goes.
I am trying to scrape the webpage given at this link:
http://new-york.eat24hours.com/picasso-pizza/19053
Here I am trying to get all the possible details, like address and phone etc.
So far I have extracted the name, phone, address, reviews, and rating.
But I also want to extract the full menu of the restaurant (name of each item with its price).
So far I have no idea how to manage this data in the CSV output.
The rest of the data for a single URL is a single record, but the number of menu items will always vary.
Here below is my code so far:
import scrapy
from urls import start_urls

class eat24Spider(scrapy.Spider):
    AUTOTHROTTLE_ENABLED = True
    name = 'eat24'

    def start_requests(self):
        for x in start_urls:
            yield scrapy.Request(x, self.parse)

    def parse(self, response):
        brickset = response
        NAME_SELECTOR = 'normalize-space(.//h1[@id="restaurant_name"]/a/text())'
        ADDRESS_SELECTION = 'normalize-space(.//span[@itemprop="streetAddress"]/text())'
        LOCALITY = 'normalize-space(.//span[@itemprop="addressLocality"]/text())'
        REGION = 'normalize-space(.//span[@itemprop="addressRegion"]/text())'
        ZIP = 'normalize-space(.//span[@itemprop="postalCode"]/text())'
        PHONE_SELECTOR = 'normalize-space(.//span[@itemprop="telephone"]/text())'
        RATING = './/meta[@itemprop="ratingValue"]/@content'
        NO_OF_REVIEWS = './/meta[@itemprop="reviewCount"]/@content'
        OPENING_HOURS = './/div[@class="hours_info"]//nobr/text()'
        EMAIL_SELECTOR = './/div[@class="company-info__block"]/div[@class="business-buttons"]/a[span]/@href[substring-after(.,"mailto:")]'

        yield {
            'name': brickset.xpath(NAME_SELECTOR).extract_first().encode('utf8'),
            'pagelink': response.url,
            'address': str(brickset.xpath(ADDRESS_SELECTION).extract_first().encode('utf8') + ', ' +
                           brickset.xpath(LOCALITY).extract_first().encode('utf8') + ', ' +
                           brickset.xpath(REGION).extract_first().encode('utf8') + ', ' +
                           brickset.xpath(ZIP).extract_first().encode('utf8')),
            'phone': str(brickset.xpath(PHONE_SELECTOR).extract_first()),
            'reviews': str(brickset.xpath(NO_OF_REVIEWS).extract_first()),
            'rating': str(brickset.xpath(RATING).extract_first()),
            'opening_hours': str(brickset.xpath(OPENING_HOURS).extract_first())
        }
I am sorry if I am making this confusing, but any kind of help will be appreciated.
Thank you in advance!!
If you want to extract the full restaurant menu, first of all you need to locate the element that contains both the name and the price:
menu_items = response.xpath('//tr[@itemscope]')
After that, you can simply make a for loop and iterate over the restaurant items, appending name and price to a list:
menu = []
for item in menu_items:
    menu.append({
        'name': item.xpath('.//a[@class="cpa"]/text()').extract_first(),
        'price': item.xpath('.//span[@itemprop="price"]/text()').extract_first()
    })
Finally, you can add a new 'menu' key to your dict:
yield {'menu': menu}
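So, combined with the fields you already extract in parse(), the final yield could look roughly like this (fields abbreviated; keep your existing extraction expressions):

yield {
    'name': brickset.xpath(NAME_SELECTOR).extract_first(),
    'pagelink': response.url,
    # ... address, phone, reviews, rating, opening_hours as before ...
    'menu': menu,
}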
Also, I suggest you use scrapy Items for storing scraped data:
https://doc.scrapy.org/en/latest/topics/items.html
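A minimal Item class for this spider might look like this (field names are just suggestions):

import scrapy

class RestaurantItem(scrapy.Item):
    name = scrapy.Field()
    pagelink = scrapy.Field()
    address = scrapy.Field()
    phone = scrapy.Field()
    reviews = scrapy.Field()
    rating = scrapy.Field()
    opening_hours = scrapy.Field()
    menu = scrapy.Field()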
For outputting the data to a CSV file, use scrapy Feed exports; type in the console:
scrapy crawl yourspidername -o restaurants.csv
I need to scrape the data of each item from a website using Scrapy (http://example.com/itemview). I have a list of itemIDs, and I need to pass each one to a form on example.com.
There is no URL change for each item, so for each request in my spider the URL will always be the same, but the content will be different.
I don't want a for loop for handling each request, so I followed the steps below:
started the spider with the above URL
added item_scraped and spider_closed signals
passed through several functions
passed the scraped data to the pipeline
triggered the item_scraped signal
After this it automatically calls the spider_closed signal. But I want the above steps to continue until the list of itemIDs is finished.
class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    itemIDs = [11111, 22222, 33333]
    current_item_num = 0

    def __init__(self, itemids=None, *args, **kwargs):
        super(ExampleSpider, self).__init__(*args, **kwargs)
        dispatcher.connect(self.item_scraped, signals.item_scraped)
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.driver.quit()

    def start_requests(self):
        request = self.make_requests_from_url('http://example.com/itemview')
        yield request

    def parse(self, response):
        self.driver = webdriver.PhantomJS()
        self.driver.get(response.url)
        first_data = self.driver.find_element_by_xpath('//div[@id="itemview"]').text.strip()
        yield Request(response.url, meta={'first_data': first_data}, callback=self.processDetails, dont_filter=True)

    def processDetails(self, response):
        itemID = self.itemIDs[self.current_item_num]
        # ..form submission with the current itemID goes here...
        # ...the content of the page is updated with the given itemID...
        yield Request(response.url, meta={'first_data': response.meta['first_data']}, callback=self.processData, dont_filter=True)

    def processData(self, response):
        # ...some more scraping goes here...
        item = ExamplecrawlerItem()
        item['first_data'] = response.meta['first_data']
        yield item

    def item_scraped(self, item, response, spider):
        self.current_item_num += 1
        # i need to call the processDetails function here for the next itemID
        # and the process needs to continue till the itemIDs finish
        self.parse(response)
My pipeline:
class ExampleDBPipeline(object):
    def process_item(self, item, spider):
        MYCOLLECTION.insert(dict(item))
        return item  # pipelines should return the item (or raise DropItem)
I wish I had an elegant solution to this, but instead it's a hackish way of calling the underlying classes:
self.crawler.engine.slot.scheduler.enqueue_request(scrapy.Request(url,self.yourCallBack))
However, you can yield a request after you yield the item and have it call back to self.processDetails. Simply add this to your processData function:
yield item
self.counter += 1
yield scrapy.Request(response.url, callback=self.processDetails, dont_filter=True, meta={"your": "Dictionary"})
Also, PhantomJS can be nice and make your life easy, but it is slower than regular connections. If possible, find the request for the JSON data or whatever makes the page unparseable without JS. To do so, open Chrome, right click, click Inspect, go to the Network tab, enter the ID into the form, then look at the XHR or JS tabs for a JSON response that has the data or the next URL you want. Most of the time there will be some URL made by adding the ID; if you can find it, you can just concatenate your URLs and call that directly without paying the cost of JS rendering. Sometimes it is randomized or not there, but I've had fair success with it. You can then also use that to yield many requests at the same time without having to worry about PhantomJS trying to do two things at once or having to initialize many instances of it. You could use tabs, but that is a pain.
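If you do find such a request in the Network tab, the PhantomJS step can often be dropped entirely. A rough sketch (the endpoint URL and parameter name here are purely hypothetical; use whatever the devtools request shows):

import json
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    itemIDs = [11111, 22222, 33333]

    def start_requests(self):
        for item_id in self.itemIDs:
            # hypothetical endpoint discovered in the browser's Network tab
            yield scrapy.FormRequest('http://example.com/itemview/data',
                                     formdata={'itemID': str(item_id)},
                                     callback=self.parse_item)

    def parse_item(self, response):
        data = json.loads(response.text)  # or response.json() on newer Scrapy versions
        # ...build and yield your item from `data`...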
Also, I would use a Queue of your IDs to ensure thread safety. Otherwise, you could have processDetails called twice on the same ID, though in the current logic everything goes linearly, which means you aren't using the concurrency capabilities of Scrapy and your program will run more slowly. To use a Queue, add:
import Queue

# go inside the class definition and add
itemIDQueue = Queue.Queue()

# within __init__ add
[self.itemIDQueue.put(ID) for ID in self.itemIDs]

# within processDetails, replace itemID = self.itemIDs[self.current_item_num] with
itemID = self.itemIDQueue.get()
And then there is no need to increment the counter, and your program is thread safe.
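Putting the two suggestions together, processData could end up looking roughly like this (the queue is drained until it is empty, no counter or item_scraped signal needed):

def processData(self, response):
    # ...some more scraping goes here...
    item = ExamplecrawlerItem()
    item['first_data'] = response.meta['first_data']
    yield item
    if not self.itemIDQueue.empty():
        # go back for the next ID; processDetails calls get() itself
        yield Request(response.url,
                      meta={'first_data': response.meta['first_data']},
                      callback=self.processDetails,
                      dont_filter=True)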
I am developing a web application for managing customers, so I have a Customer entity which is made up of the usual fields such as first_name, last_name, age, etc.
I have a page where these customers are shown as a table. On the same page I have a search field, and I'd like to filter customers and update the table while the user is typing something in the search field, using Ajax.
Here is how it should work:
Figure 1: The main page showing all of the customers:
Figure 2: As soon as the user types the letter "b", the table is updated with the results:
Given that partial text matching is not supported in GAE, I worked around it based on what is shown here. TL;DR: I created a Customers Index that contains a Search Document for every customer (doc_id=customer_key). Each Search Document contains Atom Fields for every customer field I want to be able to search on (e.g. first_name, last_name). Each field is built like this: supposing the last_name is Berlusconi, the field is made up of the atoms "b", "be", "ber", "berl", "berlu", "berlus", "berlusc", "berlusco", "berluscon", "berlusconi".
In this way I am able to perform full text matching in a way that resembles partial text matching: if I search for "Be", the Berlusconi customer is returned.
The search is made by Ajax calls: whenever a user types in the search field (the Ajax call is delayed a little to see if the user keeps typing, to avoid sending a burst of requests), an Ajax call is made with the query string, and a JSON object is returned.
Now, things were working well in debugging, but I was testing with only a few people in the datastore. Once I put many people in, search became very slow.
This is how I create search documents. This is called every time a new customer is put to the datastore.
def put_search_document(cls, key):
    """
    Called by _post_put_hook in BaseModel
    """
    model = key.get()
    _fields = []
    if model:
        _fields.append(search.AtomField(name="empty", value=""),)  # to retrieve customers when no query string
        _fields.append(search.TextField(name="sort1", value=model.last_name.lower()))
        _fields.append(search.TextField(name="sort2", value=model.first_name.lower()))
        _fields.append(search.TextField(name="full_name", value=Customer.tokenize1(
            model.first_name.lower() + " " + model.last_name.lower()
        )),)
        _fields.append(search.TextField(name="full_name_rev", value=Customer.tokenize1(
            model.last_name.lower() + " " + model.first_name.lower()
        )),)
        # _fields.append(search.TextField(name="telephone", value=Customer.tokenize1(
        #     model.telephone.lower()
        # )),)
        # _fields.append(search.TextField(name="email", value=Customer.tokenize1(
        #     model.email.lower()
        # )),)
        document = search.Document(  # create new document with doc_id=key.urlsafe()
            doc_id=key.urlsafe(),
            fields=_fields)
        index = search.Index(name=cls._get_kind() + "Index")  # not in try-except: defer will catch and retry.
        index.put(document)
@staticmethod
def tokenize1(string):
    s = ""
    for i in range(len(string)):
        if i > 0:
            s = s + " " + string[0:i+1]
        else:
            s = string[0:i+1]
    return s
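For example, for the last name "berlusconi" this produces the single string that gets indexed:

>>> Customer.tokenize1("berlusconi")
'b be ber berl berlu berlus berlusc berlusco berluscon berlusconi'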
This is the search code:
@staticmethod
def search(ndb_model, query_phrase):
    # TODO: search returns a limited number of results (20 by default)
    # (See Search Results at https://cloud.google.com/appengine/docs/python/search/#Python_Overview)
    sort1 = search.SortExpression(expression='sort1', direction=search.SortExpression.ASCENDING,
                                  default_value="")
    sort2 = search.SortExpression(expression='sort2', direction=search.SortExpression.ASCENDING,
                                  default_value="")
    sort_opt = search.SortOptions(expressions=[sort1, sort2])
    results = search.Index(name=ndb_model._get_kind() + "Index").search(
        search.Query(
            query_string=query_phrase,
            options=search.QueryOptions(
                sort_options=sort_opt
            )
        )
    )
    print "----------------"
    res_list = []
    for r in results:
        obj = ndb.Key(urlsafe=r.doc_id).get()
        print obj.first_name + " " + obj.last_name
        res_list.append(obj)
    return res_list
Has anyone else had the same experience? If so, how did you solve it?
Thank you guys very much,
Marco Galassi
EDIT: names, emails, and phone numbers are obviously totally invented.
EDIT 2: I have now moved to TextField, which looks a little bit faster, but the problem still persists.