Regular expression to analyze a site link

I am trying to write a regex that parses a link such as:
www.example.com/page.php?u=userid&action=add&date=yyyy-MM-dd
I want these named groups:
site: the full requested link
user: the value of the u parameter
action: the value of the action parameter
For the example above the result should be:
site: www.example.com/page.php?u=userid&action=add&date=yyyy-MM-dd
user: userid
action: add

This regex gives you the named captures site, user and action:
(?=(?<site>www.*$))(?=.*u=(?<user>(?:[^&]*)))(?=.*action=(?<action>(?:[^&]*)))
You can try it here:
https://regex101.com/r/1VAgSO/1
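For reference, here is a minimal sketch of the same lookahead pattern in Python, which spells named groups as (?P<name>...) rather than (?<name>...). The names LINK_RE and parse_link are made up for this illustration, and the redundant non-capturing groups have been dropped.

```python
import re

# Same three lookaheads as the regex above, using Python's (?P<name>...) syntax.
LINK_RE = re.compile(
    r"(?=(?P<site>www.*$))"
    r"(?=.*u=(?P<user>[^&]*))"
    r"(?=.*action=(?P<action>[^&]*))"
)


def parse_link(link):
    """Return the named groups as a dict, or None if the link does not match."""
    match = LINK_RE.search(link)
    return match.groupdict() if match else None


result = parse_link("www.example.com/page.php?u=userid&action=add&date=yyyy-MM-dd")
print(result)
```

Because every part of the pattern is a lookahead, the match itself is empty and only the named groups carry information.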

In Python 3 you can use urllib.parse instead of a regex:
In[2]: from urllib.parse import parse_qs, urlparse
In[3]: url = 'www.example.com/page.php?u=userid&action=add&date=yyyy-MM-dd'
In[4]: parsed_url = urlparse(url)
In[5]: parsed_url
Out[5]: ParseResult(scheme='', netloc='', path='www.example.com/page.php', params='', query='u=userid&action=add&date=yyyy-MM-dd', fragment='')
In[6]: parsed_query = parse_qs(parsed_url.query)
In[7]: parsed_query
Out[7]: {'u': ['userid'], 'action': ['add'], 'date': ['yyyy-MM-dd']}
In[8]: {'site': url, 'user': parsed_query['u'], 'action': parsed_query['action']}
Out[8]:
{'site': 'www.example.com/page.php?u=userid&action=add&date=yyyy-MM-dd',
'user': ['userid'],
'action': ['add']}
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qs
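Note that parse_qs returns a list for every key, which is why user and action come back as ['userid'] and ['add'] above. If plain strings are wanted, a small wrapper could take the first value of each parameter. This is only a sketch; summarize_link and first are names invented here.

```python
from urllib.parse import parse_qs, urlparse


def summarize_link(url):
    """Return site, user and action, with None for a missing parameter."""
    query = parse_qs(urlparse(url).query)

    def first(name):
        # parse_qs maps each key to a list; take the first value if present.
        values = query.get(name)
        return values[0] if values else None

    return {"site": url, "user": first("u"), "action": first("action")}


summary = summarize_link("www.example.com/page.php?u=userid&action=add&date=yyyy-MM-dd")
print(summary)
```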

Related

Django POST multipart/form-data unexpectedly parsing field as array

I am testing an endpoint in the following manner:
from rest_framework.test import APIClient
from django.urls import reverse
import json
client = APIClient()
response = client.post(list_url, {'name': 'Zagyg Co'})
I find that the model object is being created with a name of [Zagyg Co] instead of Zagyg Co.
Inspecting the request object reveals the following:
self._request.META['CONTENT_TYPE']
#=> 'multipart/form-data; boundary=BoUnDaRyStRiNg; charset=utf-8'
self._request.body
#=> b'--BoUnDaRyStRiNg\r\nContent-Disposition: form-data; name="name"\r\n\r\nZagyg Co\r\n--BoUnDaRyStRiNg--\r\n'
self._request.POST
#=> <QueryDict: {'name': ['Zagyg Co']}>
Using JSON like so:
response = client.post(
list_url,
json.dumps({'name': 'Zagyg Co'}),
content_type='application/json',
)
sets the name correctly. Why is this so?
request.data is a Django QueryDict. When the data is sent as a multipart form, it allows for the same field appearing more than once by storing each field's values in a list.
Using its dict method or dictionary access syntax returns the last value stored under the relevant key:
request.data['name']
#=> 'Zagyg Co'
request.data.dict()
#=> {'name': 'Zagyg Co'}
This is fine if each key is guaranteed to have a single value. For list values there's getlist:
request.data.getlist('name')
#=> ['Zagyg Co']
For mixes of keys with single and multiple values it seems like manual parsing is required.
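One possible shape for that manual parsing is sketched below. It works on any iterable of (key, list_of_values) pairs, so it is shown with a plain dict. With Django one would presumably pass request.data.lists(), but that wiring is an assumption and is not tested here. The name unwrap_single_values is made up for this example.

```python
def unwrap_single_values(pairs):
    """Turn (key, list_of_values) pairs into a dict, unwrapping one-item lists."""
    result = {}
    for key, values in pairs:
        values = list(values)
        # Keep real multi-value fields as lists; unwrap fields sent only once.
        result[key] = values[0] if len(values) == 1 else values
    return result


# Plain-dict stand-in for a parsed form with one single and one repeated field.
form = {"name": ["Zagyg Co"], "tags": ["a", "b"]}
parsed = unwrap_single_values(form.items())
print(parsed)
```

The drawback of this approach is that the type of a field then depends on how many values the client sent, so it only suits cases where the caller knows which fields may repeat.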

How to get and save into file the full list of twitter account followers with Tweepy

I wrote this code to get the full list of twitter account followers using Tweepy:
# ... twitter connection and streaming
fulldf = pd.DataFrame()
line = {}
ids = []
try:
    for page in tweepy.Cursor(api.followers_ids, screen_name="twittername").pages():
        df = pd.DataFrame()
        ids.extend(page)
        try:
            for i in ids:
                user = api.get_user(i)
                line = [{'id': user.id,
                         'Name': user.name,
                         'Statuses Count': user.statuses_count,
                         'Friends Count': user.friends_count,
                         'Screen Name': user.screen_name,
                         'Followers Count': user.followers_count,
                         'Location': user.location,
                         'Language': user.lang,
                         'Created at': user.created_at,
                         'Time zone': user.time_zone,
                         'Geo enable': user.geo_enabled,
                         'Description': user.description.encode(sys.stdout.encoding, errors='replace')}]
                df = pd.DataFrame(line)
                fulldf = fulldf.append(df)
                del df
                fulldf.to_csv('out.csv', sep=',', index=False)
                print i, len(ids)
        except tweepy.TweepError:
            time.sleep(60 * 15)
            continue
except tweepy.TweepError as e2:
    print "exception global block"
    print e2.message[0]['code']
    print e2.args[0][0]['code']
In the end I have only 1000 lines in the CSV file. I know that keeping everything in memory (in a DataFrame) and writing it to the file in the same loop is not the best solution, but at least it is something that works. However, I am not getting the full list, just 1000 out of 15000 followers.
Any help with this would be appreciated.
Consider the following part of your code:
for page in tweepy.Cursor(api.followers_ids, screen_name="twittername").pages():
    df = pd.DataFrame()
    ids.extend(page)
    try:
        for i in ids:
            user = api.get_user(i)
Because you use extend for each page, you simply add the new set of ids onto the end of your list of ids. The way you have nested your for statements means that with every new page you call get_user for all of the ids from the previous pages first. As a result, when you reach the final page of ids, you are still looking up the first 1000 or so when you hit the rate limit and have no more pages to browse. You're also likely hitting the rate limit for your cursor, which would be why you're seeing the exception.
Let's start over a bit.
Firstly, tweepy can deal with rate limits (one of the main error sources) for you when you create your API if you use wait_on_rate_limit. This solves a whole bunch of problems, so we'll do that.
Secondly, if you use lookup_users, you can look up 100 user objects per request. I've written about this in another answer so I've taken the method from there.
Finally, we don't need to create a dataframe or export to a csv until the very end. If we get a list of user information dictionaries, this can quickly change to a DataFrame with no real effort from us.
Here is the full code - you'll need to sub in your keys and the username of the user you actually want to look up, but other than that it hopefully will work!
import tweepy
import pandas as pd
def lookup_user_list(user_id_list, api):
    """Look up full user objects in batches of 100 ids per request."""
    full_users = []
    users_count = len(user_id_list)
    try:
        # Step through the ids 100 at a time; slicing past the end is safe.
        for start in range(0, users_count, 100):
            print(start)
            full_users.extend(api.lookup_users(user_ids=user_id_list[start:start + 100]))
    except tweepy.TweepError:
        print('Something went wrong, quitting...')
    # Return whatever was fetched, even if a request failed part-way through.
    return full_users
consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Let tweepy sleep and retry by itself when a rate limit is hit.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# Collect all follower ids first, then look the users up in batches.
ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name="twittername").pages():
    ids.extend(page)
results = lookup_user_list(ids, api)
all_users = [{'id': user.id,
              'Name': user.name,
              'Statuses Count': user.statuses_count,
              'Friends Count': user.friends_count,
              'Screen Name': user.screen_name,
              'Followers Count': user.followers_count,
              'Location': user.location,
              'Language': user.lang,
              'Created at': user.created_at,
              'Time zone': user.time_zone,
              'Geo enable': user.geo_enabled,
              'Description': user.description}
             for user in results]
# Build the DataFrame once at the end and write it out in a single step.
df = pd.DataFrame(all_users)
df.to_csv('All followers.csv', index=False, encoding='utf-8')
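The batching inside lookup_user_list does not depend on tweepy, so it can be checked in isolation. Below is a minimal sketch with a made-up helper name, batches, which splits a list into slices of at most 100 items, the per-request figure mentioned above.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 250 ids should be split into requests of 100, 100 and 50.
sizes = [len(batch) for batch in batches(list(range(250)))]
print(sizes)
```

Stepping the range by the batch size means no empty batch is produced when the number of ids is an exact multiple of 100.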

NoReverseMatch while trying to match dates with regular expression in url

In my URL I expect to get 2 dates as input:
url(r'^export/range/csv/(?P<start_date>\d+)/(?P<end_date>\d+)/$', views.export_payment_range_csv, name="export_payment_range_csv"),
But I am getting this error:
NoReverseMatch at /payment/list/range/ Reverse for
'export_payment_range_csv' with arguments '()' and keyword arguments
'{u'start_date': datetime.date(2016, 2, 1), u'end_date':
datetime.date(2016, 12, 31)}' not found. 2 pattern(s) tried:
['condition/export/range/csv/(?P<start_date>\d+)/(?P<end_date>\d+)/$',
'payment/export/range/csv/(?P<start_date>\d+)/(?P<end_date>\d+)/$']
I assume this has to do with the regular expression in my URL file.
What am I doing wrong?
UPDATE:
The link I access:
<li>CSV for current range payments
where the start date and end date are passed from the view when the template is rendered.
I expect dates on the view side:
payment_list = LeasePaymentFilter(request.GET, queryset=LeasePayment.objects.filter(payment_date__range=[start_date, end_date]))
Your URL parameters expect digits (\d+), for example:
reverse("export_payment_range_csv", kwargs={
    'start_date': '123',
    'end_date': '456',
})
But you are passing datetime.date instances:
reverse("export_payment_range_csv", kwargs={
    'start_date': d1,
    'end_date': d2,
})
Check the view itself (the function views.export_payment_range_csv()) to see what format it expects for the parameters, and generate the matching string, for example:
def format_my_date(d):
    return d.strftime("%Y%m%d")
reverse("export_payment_range_csv", kwargs={
    'start_date': format_my_date(d1),
    'end_date': format_my_date(d2),
})
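If the view then needs real dates again, the digit strings have to be parsed with the same format. A minimal sketch of the round trip, assuming the "%Y%m%d" format from the example above (URL_DATE_FORMAT, format_url_date and parse_url_date are names chosen here for illustration):

```python
from datetime import date, datetime

# Assumed format: eight digits, which the \d+ groups in the URL pattern accept.
URL_DATE_FORMAT = "%Y%m%d"


def format_url_date(d):
    """Turn a date into the digit string placed in the URL."""
    return d.strftime(URL_DATE_FORMAT)


def parse_url_date(text):
    """Turn the digit string captured from the URL back into a date."""
    return datetime.strptime(text, URL_DATE_FORMAT).date()


start = format_url_date(date(2016, 2, 1))
end = parse_url_date("20161231")
print(start, end)
```

Inside the view, parse_url_date would be applied to the start_date and end_date keyword arguments before they are used in the payment_date__range filter.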

Create/Update Tag for Intercom.io User in Python

Unfortunately there's no way to create a user in Intercom.io with a tag, so I'm trying to write some code that will look for an existing tag in Intercom, and if it's there, add a user to that tag, and if it's not, create the tag and add the user to it. I've tried several different variations by looking at the docs for the python-intercom library, but there are conflicting methods (Intercom.update_tag vs. Tag.update), and nothing has worked yet.
Here's how users are created in Intercom (this works):
import time
from members.models import Member
from intercom import Intercom, Tag
Intercom.app_id = settings.INTERCOM_TEST_APP_ID
Intercom.api_key = settings.INTERCOM_TEST_API_KEY
member = Member.objects.get(email="exampleemail@example.com")
Intercom.create_user(
    email=member.email,
    user_id=member.email,
    name="%s %s" % (member.first_name, member.last_name),
    created_at=int(time.time()),
    city_name=member.city,
    last_seen_ip=member.last_ip,
)
Here's what I currently have to look for and create or update tags, which triggers no errors, but doesn't successfully tag the user:
tag = Intercom.get_tag(name=member.referral_code)
if tag['id'] != None:
    Intercom.update_tag(member.referral_code, "tag", user_ids=[member.pk])
else:
    Intercom.create_tag(tag, "tag", user_ids=[member.pk])
I've also tried variations of the following, but it raises the error "descriptor 'update' requires a 'dict' object but received a 'unicode'":
if Tag.find_by_name(member.referral_code) != 0:
    Tag.update(member.referral_code, "tag", user_ids=[member.pk])
else:
    Tag.create(member.referral_code, "tag", user_ids=[member.pk])
What do I need to change to get tagging to work?
My name's Jeff, I'm one of the customer success engineers at Intercom. Unfortunately the intercom-python library is still using our deprecated V1 API which is likely causing some of the confusion here. Until that library updates to use our newer REST API I would suggest that you use the python requests library and call our API directly. I've got minimal python experience but something like this should get you started on the right track.
import requests
import json
tags_url = 'https://api.intercom.io/tags'
app_id = 'YOUR_APP_ID'
api_key = 'YOUR_API_KEY'
headers = {'content-type': 'application/json', 'Accept': 'application/json'}
tag_name = 'My sweet tag'
# Get the existing tags to check whether ours is already there
list_tag_response_as_json = requests.get(tags_url, auth=(app_id, api_key), headers=headers).json()
tag_names = [tag['name'] for tag in list_tag_response_as_json['tags']]
if tag_name not in tag_names:
    # Create a tag; the body is JSON-encoded to match the content-type header
    tag_response = requests.post(tags_url, auth=(app_id, api_key), headers=headers, data=json.dumps({'name': tag_name}))
# Tag users
tag_and_users = {'name': tag_name, 'users': [{'email': 'abc@example.com'}, {'email': 'def@example.com'}]}
tagged_user_response = requests.post(tags_url, auth=(app_id, api_key), headers=headers, data=json.dumps(tag_and_users))
Also, feel free to give us a shout in Intercom if you're still having trouble and we can help you there.

Remove the domain name from URLs

I need to remove the domain name from URLs that may or may not include a scheme.
Example URLs:
http://www.example.org/cat1/page1
example.org/cat1/page1
https://www.example.org/cat1/page1
Desired outcome:
cat1/page1
It can be done either in a Django template or in the view.
Use the urlparse function (in Python 3 it lives in urllib.parse, in Python 2 in the urlparse module):
>>> from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
>>> o = urlparse('http://www.example.org/cat1/page1')
>>> o.path
'/cat1/page1'
Note that example.org/cat1/page1 is itself a valid path (without a scheme, urlparse treats the whole string as the path), so the domain cannot be removed from it directly. As a workaround you can add the scheme to the URL string manually:
>>> url = 'example.org/cat1/page1'
>>> if '//' not in url:
...     url = 'http://' + url
...
>>> o = urlparse(url)
>>> o.path
'/cat1/page1'
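The two steps above can be combined into one helper. This is a minimal Python 3 sketch; strip_domain is a name invented here, and it also strips the leading slash so the result matches the cat1/page1 form asked for in the question.

```python
from urllib.parse import urlparse


def strip_domain(url):
    """Return the path of a URL without the domain or the leading slash."""
    # Without a scheme, urlparse would treat the domain as part of the path.
    if "//" not in url:
        url = "http://" + url
    return urlparse(url).path.lstrip("/")


for example in (
    "http://www.example.org/cat1/page1",
    "example.org/cat1/page1",
    "https://www.example.org/cat1/page1",
):
    print(strip_domain(example))
```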
The request object also has this information:
https://docs.djangoproject.com/en/1.7/ref/request-response/#module-django.http
HttpRequest.path
A string representing the full path to the requested page, not including the domain.
Example: "/music/bands/the_beatles/"
This only gives you the path of the current page, so it might not work in your situation.