I have a PySpark dataframe that looks like the following. I want the salary column to hold float values only. Note that the values are currently strings wrapped in square brackets.
from pyspark.sql.types import StructType,StructField
from pyspark.sql.types import StringType, IntegerType, ArrayType
data = [
("Smith","OH","[55.5]"),
("Anna","NY","[33.3]"),
("Williams","OH","[939.3]"),
]
schema = StructType([
StructField('name', StringType(), True),
StructField('state', StringType(), True),
StructField('salary', StringType(), True)
])
df = spark.createDataFrame(data = data, schema= schema)
df.show(truncate=False)
Input:
+--------+-----+-------+
|name |state|salary |
+--------+-----+-------+
|Smith |OH |[55.5] |
|Anna |NY |[33.3] |
|Williams|OH |[939.3]|
+--------+-----+-------+
And the output should look like this:
+--------+-----+------------------+
|name |state|float_value_salary|
+--------+-----+------------------+
|Smith |OH |55.5 |
|Anna |NY |33.3 |
|Williams|OH |939.3 |
+--------+-----+------------------+
Thank you for any help.
You can trim the square brackets and cast to float:
import pyspark.sql.functions as F
df2 = df.withColumn('salary', F.expr("float(trim('[]', salary))"))
df2.show()
+--------+-----+------+
| name|state|salary|
+--------+-----+------+
| Smith| OH| 55.5|
| Anna| NY| 33.3|
|Williams| OH| 939.3|
+--------+-----+------+
Or you can use from_json to parse it as an array of float, and get the first array element:
df2 = df.withColumn('salary', F.from_json('salary', 'array<float>')[0])
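To see what the two expressions above compute for each value, here is a plain-Python sketch of the same per-row logic. No Spark is involved, and the helper names `strip_and_cast` and `parse_first` are made up for illustration.

```python
import json


def strip_and_cast(value):
    """Mimic trimming '[' and ']' from both ends, then casting to float."""
    return float(value.strip("[]"))


def parse_first(value):
    """Mimic parsing the string as a JSON array and taking element 0."""
    return json.loads(value)[0]


samples = ["[55.5]", "[33.3]", "[939.3]"]
print([strip_and_cast(s) for s in samples])  # [55.5, 33.3, 939.3]
print([parse_first(s) for s in samples])     # [55.5, 33.3, 939.3]
```

The two approaches differ once a string holds more than one element: for "[1.0, 2.0]" the strip-and-cast helper raises a ValueError, while the JSON helper returns the first element. In Spark, a cast that cannot be parsed typically yields null rather than raising, unless ANSI mode is enabled.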
You can use regex:
import pyspark.sql.functions as F
df.select(
    F.regexp_extract('salary', r'([\d.]+)', 1).cast('float').alias('salary')
).show()
Output:
+------+
|salary|
+------+
| 55.5|
| 33.3|
| 939.3|
+------+
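One thing to be aware of with this pattern is that it only captures digits and dots, so a leading minus sign would be dropped. The plain-Python sketch below shows the behaviour with the `re` module; the `SIGNED` pattern is an assumed stricter variant, not part of the answer above.

```python
import re

NARROW = re.compile(r"([\d.]+)")            # pattern from the answer above
SIGNED = re.compile(r"(-?\d+(?:\.\d+)?)")   # assumed stricter variant


def extract(pattern, text):
    """Return the first captured number as a float, or None if no match."""
    match = pattern.search(text)
    return float(match.group(1)) if match else None


print(extract(NARROW, "[55.5]"))    # 55.5
print(extract(NARROW, "[-55.5]"))   # 55.5  (the sign is lost)
print(extract(SIGNED, "[-55.5]"))   # -55.5
```

If the salary column can never be negative, the original pattern is sufficient.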
You need to parse the string into a float array using a UDF, and then you can explode the array to get the single value inside it.
The program would be as follows:
import json
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, FloatType
def parse_value_from_string(x):
    # "[55.5]" is valid JSON, so json.loads returns the list [55.5]
    res = json.loads(x)
    return res
parse_float_array = F.udf(parse_value_from_string, ArrayType(FloatType()))
# explode emits one row per array element; each array here holds exactly one
df = df.withColumn('float_value_salary',F.explode(parse_float_array(F.col('salary'))))
df_output = df.select('name','state','float_value_salary')
The output dataframe would look like this:
+--------+-----+------------------+
| name|state|float_value_salary|
+--------+-----+------------------+
| Smith| OH| 55.5|
| Anna| NY| 33.3|
|Williams| OH| 939.3|
+--------+-----+------------------+
I've been searching around and haven't figured out a way to restructure a dataframe's column to dynamically add new columns to the dataframe based on the array contents. I'm new to Python, so I might be searching on the wrong terms, which may be why I haven't found a clear example yet. Please let me know if this is a duplicate and link to the original. I think I just need to be pointed in the right direction.
OK, the details.
The environment is PySpark 2.3.2 and Python 2.7.
The sample column contains 2 arrays, which are correlated with each other 1 to 1. I would like to create a column for each value in the titles array and put the corresponding name (from the person array) in the respective column.
I cobbled together an example to focus on my problem with changing the dataframe.
import json
from pyspark.sql.types import ArrayType, StructType, StructField, StringType
from pyspark.sql import functions as f
input = { "sample": { "titles": ["Engineer", "Designer", "Manager"], "person": ["Mary", "Charlie", "Mac"] }, "location": "loc a"},{ "sample": { "titles": ["Engineer", "Owner"],
"person": ["Tom", "Sue"] }, "location": "loc b"},{ "sample": { "titles": ["Engineer", "Designer"], "person": ["Jane", "Bill"] }, "location": "loc a"}
a = [json.dumps(input)]
jsonRDD = sc.parallelize(a)
df = spark.read.json(jsonRDD)
This is the schema of my dataframe:
In [4]: df.printSchema()
root
|-- location: string (nullable = true)
|-- sample: struct (nullable = true)
| |-- person: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- titles: array (nullable = true)
| | |-- element: string (containsNull = true)
My dataframe data:
In [5]: df.show(truncate=False)
+--------+-----------------------------------------------------+
|location|sample |
+--------+-----------------------------------------------------+
|loc a |[[Mary, Charlie, Mac], [Engineer, Designer, Manager]]|
|loc b |[[Sue, Tom], [Owner, Engineer]] |
|loc a |[[Jane, Bill], [Engineer, Designer]] |
+--------+-----------------------------------------------------+
And what I would like my dataframe to look like:
+--------+-----------------------------------------------------+------------+-----------+---------+---------+
|location|sample |Engineer |Designer |Manager | Owner |
+--------+-----------------------------------------------------+------------+-----------+---------+---------+
|loc a |[[Mary, Charlie, Mac], [Engineer, Designer, Manager]]|Mary |Charlie |Mac | |
|loc b |[[Sue, Tom], [Owner, Engineer]] |Tom | | |Sue |
|loc a |[[Jane, Bill], [Engineer, Designer]] |Jane |Bill | | |
+--------+-----------------------------------------------------+------------+-----------+---------+---------+
I've tried to use the explode function, only to end up with more records with the array field in each record. There are some examples on Stack Overflow, but they use static column names. This dataset can have the titles in any order, and new titles can be added later.
Without explode
First convert each struct to a map:
from pyspark.sql.functions import udf
@udf("map<string,string>")
def as_dict(x):
    # x is the struct; zip(*x) pairs up the elements of its two arrays
    return dict(zip(*x)) if x else None
dfmap = df.withColumn("sample", as_dict("sample"))
Then use the method shown in "PySpark converting a column of type 'map' to multiple columns in a dataframe" to split the map into columns.
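The idea behind this approach can be sketched in plain Python, without Spark: pair the two arrays into a dictionary per row, then use the union of all keys as the dynamic column names. The two-argument `as_dict` below and the `rows` list are illustrative stand-ins, not the UDF itself. Note that which array ends up as the key depends on the order you zip them in; here the titles are the keys.

```python
rows = [
    {"location": "loc a", "person": ["Mary", "Charlie", "Mac"],
     "titles": ["Engineer", "Designer", "Manager"]},
    {"location": "loc b", "person": ["Tom", "Sue"],
     "titles": ["Engineer", "Owner"]},
]


def as_dict(person, titles):
    """Pair each title with the person at the same index."""
    return dict(zip(titles, person))


maps = [as_dict(r["person"], r["titles"]) for r in rows]

# The union of all keys gives the dynamic column names.
columns = sorted({title for m in maps for title in m})

# Missing titles become None, like nulls in a Spark result.
table = [[r["location"]] + [m.get(c) for c in columns]
         for r, m in zip(rows, maps)]

print(columns)  # ['Designer', 'Engineer', 'Manager', 'Owner']
for line in table:
    print(line)
```

Because the column list is computed from the data, new titles appearing later need no code change.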
With explode
Add a unique id using monotonically_increasing_id.
Use one of the methods shown in "Pyspark: Split multiple array columns into rows" to explode both arrays together, or explode the map created with the first method.
Pivot the result: group by the added id and any other fields you want to preserve, pivot by title, and take first(person).
@user10601094 helped me get this question answered. I'm posting the full solution below to help anyone else who might have a similar question.
I'm not very fluent in Python, so please feel free to suggest better approaches.
In [1]: import json
...: from pyspark.sql import functions as f
...:
In [2]: # define a sample data set
...: input = { "sample": { "titles": ["Engineer", "Designer", "Manager"], "person": ["Mary", "Charlie", "Mac"] }, "location": "loc a"},{ "sample": { "titles": ["Engineer", "Owner"],
...: "person": ["Tom", "Sue"] }, "location": "loc b"},{ "sample": { "titles": ["Engineer", "Designer"], "person": ["Jane", "Bill"] }, "location": "loc a"}
In [3]: # create a dataframe with the sample json data
...: a = [json.dumps(input)]
...: jsonRDD = sc.parallelize(a)
...: df = spark.read.json(jsonRDD)
...:
2018-11-03 20:48:09 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
In [4]: # Change the array in the sample column to a dictionary
...: # swap the columns so the titles are the key
...:
...: # UDF to convert 2 arrays into a map
...: @f.udf("map<string,string>")
...: def as_dict(x):
...:     return dict(zip(x[1],x[0])) if x else None
...:
In [5]: # create a new dataframe based on the original dataframe
...: dfmap = df.withColumn("sample", as_dict("sample"))
In [6]: # Convert sample column to be title columns based on the map
...:
...: # get the columns names, stored in the keys
...: keys = (dfmap
...: .select(f.explode("sample"))
...: .select("key")
...: .distinct()
...: .rdd.flatMap(lambda x: x)
...: .collect())
In [7]: # create a list of column names
...: exprs = [f.col("sample").getItem(k).alias(k) for k in keys]
...:
In [8]: dfmap.select(dfmap.location, *exprs).show()
+--------+--------+--------+-------+-----+
|location|Designer|Engineer|Manager|Owner|
+--------+--------+--------+-------+-----+
| loc a| Charlie| Mary| Mac| null|
| loc b| null| Tom| null| Sue|
| loc a| Bill| Jane| null| null|
+--------+--------+--------+-------+-----+
I was following this article to have 2 columns per field, so my custom field code looks something like this:
class GeopositionField(models.Field):
    description = "A geoposition (latitude and longitude)"
    def __init__(self, *args, **kwargs):
        kwargs['max_length'] = 42
        super(GeopositionField, self).__init__(*args, **kwargs)
    def contribute_to_class(self, cls, name):
        self.name = name
        position_longitude = DecimalField(decimal_places=6,max_digits=9,default=0,blank=True)
        cls.add_to_class("position_longitude",position_longitude)
        position_latitude = DecimalField(decimal_places=6,max_digits=8,default=0,blank=True)
        cls.add_to_class("position_latitude",position_latitude)
        setattr(cls,"position_longitude",position_longitude)
        setattr(cls,"position_latitude",position_latitude)
And my model class is Request:
class Request(models.Model):
    person = models.ForeignKey(Person)
    position = GeopositionField(null = False,default = 0)
(I'm modifying django-geoposition.) So far I have only position_latitude and position_longitude in my table (and no longer the single "position" column it had originally).
Before
+--------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| person_id | int(11) | NO | MUL | NULL | |
| creation_date | datetime | NO | | NULL | |
| position | varchar(50) | NO | | NULL | |
+--------------------+---------------+------+-----+---------+----------------+
After
+--------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| person_id | int(11) | NO | MUL | NULL | |
| creation_date | datetime | NO | | NULL | |
| position_longitude | decimal(9,6) | NO | | NULL | |
| position_latitude | decimal(8,6) | NO | | NULL | |
+--------------------+---------------+------+-----+---------+----------------+
That's good, but the problem appears in the Django admin, and also in the shell: if I create a Request object and then try to print its "position" attribute, I get an error saying the "position" attribute doesn't exist:
>>> from main.models import Request
>>> x = Request()
>>> x.position
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'Request' object has no attribute 'position'
>>>
It works if I set the position attribute in the __init__ method of the Request class, but that's not the idea.
So, back to the real problem: when I try to show it in the admin panel and refer explicitly to the "position" field, Django throws an error. Curiously, it works with "position_latitude" and "position_longitude":
class RequestAdminForm(forms.ModelForm):
    class Meta:
        model = Request
        # fields = ['position_latitude','position_longitude'] <-- this works !
        fields = ['position'] # <-- this returns error = Unknown field(s) (position) specified for Request
Is there a way to show "position_latitude" and "position_longitude" when RequestAdminForm only has "position" in its fields list? That's what I want to achieve. And why does the "undefined attribute" problem happen?
How can I see the current urlpatterns that "reverse" is looking in?
I'm calling reverse in a view with an argument that I think should work, but doesn't. Any way I can check what's there and why my pattern isn't?
If you want a list of all the URLs in your project, first install django-extensions.
You can install it with this command:
pip install django-extensions
For more information about the package, see django-extensions.
After that, add django_extensions to INSTALLED_APPS in your settings.py file like this:
INSTALLED_APPS = (
...
'django_extensions',
...
)
urls.py example:
from django.urls import path, include
from . import views
from . import health_views
urlpatterns = [
path('get_url_info', views.get_url_func),
path('health', health_views.service_health_check),
path('service-session/status', views.service_session_status)
]
Then run either of these commands in your terminal:
python manage.py show_urls
or
./manage.py show_urls
Sample output for the urls.py above:
/get_url_info django_app.views.get_url_func
/health django_app.health_views.service_health_check
/service-session/status django_app.views.service_session_status
For more information you can check the documentation.
Try this:
from django.urls import get_resolver
get_resolver().reverse_dict.keys()
Or if you're still on Django 1.*:
from django.core.urlresolvers import get_resolver
get_resolver(None).reverse_dict.keys()
Django >= 2.0 solution
I tested the other answers in this post and they were either not working with Django 2.X, incomplete or too complex. Therefore, here is my take on this:
from django.conf import settings
from django.urls import URLPattern, URLResolver
urlconf = __import__(settings.ROOT_URLCONF, {}, {}, [''])
def list_urls(lis, acc=None):
    if acc is None:
        acc = []
    if not lis:
        return
    l = lis[0]
    if isinstance(l, URLPattern):
        yield acc + [str(l.pattern)]
    elif isinstance(l, URLResolver):
        yield from list_urls(l.url_patterns, acc + [str(l.pattern)])
    yield from list_urls(lis[1:], acc)
for p in list_urls(urlconf.urlpatterns):
    print(''.join(p))
This code prints all URLs. Unlike some other solutions, it prints the full path and not only the last node, e.g.:
admin/
admin/login/
admin/logout/
admin/password_change/
admin/password_change/done/
admin/jsi18n/
admin/r/<int:content_type_id>/<path:object_id>/
admin/auth/group/
admin/auth/group/add/
admin/auth/group/autocomplete/
admin/auth/group/<path:object_id>/history/
admin/auth/group/<path:object_id>/delete/
admin/auth/group/<path:object_id>/change/
admin/auth/group/<path:object_id>/
admin/auth/user/<id>/password/
admin/auth/user/
... etc, etc
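The traversal itself does not depend on Django, so it can be tried out in isolation. The sketch below uses two hypothetical stand-in classes, `Pattern` and `Resolver`, in place of `URLPattern` and `URLResolver`, and loops over siblings instead of recursing on `lis[1:]`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Pattern:              # hypothetical stand-in for django.urls.URLPattern
    pattern: str


@dataclass
class Resolver:             # hypothetical stand-in for django.urls.URLResolver
    pattern: str
    url_patterns: List[Union["Pattern", "Resolver"]] = field(default_factory=list)


def list_urls(entries, prefix="") -> Iterator[str]:
    """Yield the full path of every leaf pattern, depth first."""
    for entry in entries:
        if isinstance(entry, Resolver):
            # Descend, carrying the accumulated prefix along.
            yield from list_urls(entry.url_patterns, prefix + entry.pattern)
        else:
            yield prefix + entry.pattern


tree = [
    Resolver("admin/", [Pattern(""), Pattern("login/"),
                        Resolver("auth/", [Pattern("group/")])]),
    Pattern("health"),
]
print(list(list_urls(tree)))
# ['admin/', 'admin/login/', 'admin/auth/group/', 'health']
```

Looping over siblings keeps the recursion depth equal to the nesting depth of the URLconf rather than to the number of patterns, which may matter for projects with very long flat URL lists.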
Django 1.11, Python 2.7.6
cd to_your_django_project
python manage.py shell
Then paste the following code.
from django.conf.urls import RegexURLPattern, RegexURLResolver
from django.core import urlresolvers
urls = urlresolvers.get_resolver()
def if_none(value):
    if value:
        return value
    return ''
def print_urls(urls, parent_pattern=None):
    for url in urls.url_patterns:
        if isinstance(url, RegexURLResolver):
            print_urls(url, if_none(parent_pattern) + url.regex.pattern)
        elif isinstance(url, RegexURLPattern):
            print(if_none(parent_pattern) + url.regex.pattern)
print_urls(urls)
Sample output:
^django-admin/^$
^django-admin/^login/$
^django-admin/^logout/$
^django-admin/^password_change/$
^django-admin/^password_change/done/$
^django-admin/^jsi18n/$
^django-admin/^r/(?P<content_type_id>\d+)/(?P<object_id>.+)/$
^django-admin/^wagtailimages/image/^$
^django-admin/^wagtailimages/image/^add/$
^django-admin/^wagtailimages/image/^(.+)/history/$
^django-admin/^wagtailimages/image/^(.+)/delete/$
^django-admin/^wagtailimages/image/^(.+)/change/$
^django-admin/^wagtailimages/image/^(.+)/$
...
In Django 3.0, it's as easy as:
from django.urls import get_resolver
print(get_resolver().url_patterns)
Prints:
[<URLPattern '' [name='home']>, <URLPattern '/testing' [name='another_url']>]
Here is a quick and dirty hack to just get the information you need without needing to modify any of your settings.
$ pip install django-extensions
$ python manage.py shell -c 'from django.core.management import call_command; from django_extensions.management.commands.show_urls import Command; call_command(Command())'
This piggybacks on @robert's answer. While that answer is correct, I didn't want to have django-extensions as a dependency, even if only for a second.
I use the following management command:
(Python 3 + Django 1.10)
from django.core.management import BaseCommand
from django.conf.urls import RegexURLPattern, RegexURLResolver
from django.core import urlresolvers
class Command(BaseCommand):
    def add_arguments(self, parser):
        pass
    def handle(self, *args, **kwargs):
        urls = urlresolvers.get_resolver()
        all_urls = list()
        def func_for_sorting(i):
            if i.name is None:
                i.name = ''
            return i.name
        def show_urls(urls):
            for url in urls.url_patterns:
                if isinstance(url, RegexURLResolver):
                    show_urls(url)
                elif isinstance(url, RegexURLPattern):
                    all_urls.append(url)
        show_urls(urls)
        all_urls.sort(key=func_for_sorting, reverse=False)
        print('-' * 100)
        for url in all_urls:
            print('| {0.regex.pattern:20} | {0.name:20} | {0.lookup_str:20} | {0.default_args} |'.format(url))
        print('-' * 100)
Usage:
./manage.py showurls
Sample output:
----------------------------------------------------------------------------------------------------
| ^(.+)/$ | | django.views.generic.base.RedirectView | {} |
| ^(.+)/$ | | django.views.generic.base.RedirectView | {} |
| ^(.+)/$ | | django.views.generic.base.RedirectView | {} |
| ^(.+)/$ | | django.views.generic.base.RedirectView | {} |
| ^(.+)/$ | | django.views.generic.base.RedirectView | {} |
| ^(.+)/$ | | django.views.generic.base.RedirectView | {} |
| ^static\/(?P<path>.*)$ | | django.contrib.staticfiles.views.serve | {} |
| ^media\/(?P<path>.*)$ | | django.views.static.serve | {'document_root': '/home/wlysenko/.virtualenvs/programmerHelper/project/media'} |
| ^(?P<app_label>polls|snippets|questions)/$ | app_list | apps.core.admin.AdminSite.app_index | {} |
| ^(?P<app_label>activity|articles|badges|books|comments|flavours|forum|marks|newsletters|notifications|opinions|polls|questions|replies|snippets|solutions|tags|testing|users|utilities|visits)/reports/$ | app_reports | apps.core.admin.AdminSite.reports_view | {} |
| ^(?P<app_label>activity|articles|badges|books|comments|flavours|forum|marks|newsletters|notifications|opinions|polls|questions|replies|snippets|solutions|tags|testing|users|utilities|visits)/statistics/$ | app_statistics | apps.core.admin.AdminSite.statistics_view | {} |
| articles/(?P<slug>[-\w]+)/$ | article | apps.articles.views.ArticleDetailView | {} |
| book/(?P<slug>[-_\w]+)/$ | book | apps.books.views.BookDetailView | {} |
| category/(?P<slug>[-_\w]+)/$ | category | apps.utilities.views.CategoryDetailView | {} |
| create/$ | create | apps.users.views.UserDetailView | {} |
| delete/$ | delete | apps.users.views.UserDetailView | {} |
| detail/(?P<email>\w+@[-_\w]+.\w+)/$ | detail | apps.users.views.UserDetailView | {} |
| snippet/(?P<slug>[-_\w]+)/$ | detail | apps.snippets.views.SnippetDetailView | {} |
| (?P<contenttype_model_pk>\d+)/(?P<pks_separated_commas>[-,\w]*)/$ | export | apps.export_import_models.views.ExportTemplateView | {} |
| download_preview/$ | export_preview_download | apps.export_import_models.views.ExportPreviewDownloadView | {} |
| ^$ | import | apps.export_import_models.views.ImportTemplateView | {} |
| result/$ | import_result | apps.export_import_models.views.ImportResultTemplateView | {} |
| ^$ | index | django.contrib.admin.sites.AdminSite.index | {} |
| ^$ | index | apps.core.views.IndexView | {} |
| ^jsi18n/$ | javascript-catalog | django.views.i18n.javascript_catalog | {'packages': ('your.app.package',)} |
| ^jsi18n/$ | jsi18n | django.contrib.admin.sites.AdminSite.i18n_javascript | {} |
| level/(?P<slug>[-_\w]+)/$ | level | apps.users.views.UserDetailView | {} |
| ^login/$ | login | django.contrib.admin.sites.AdminSite.login | {} |
| ^logout/$ | logout | django.contrib.admin.sites.AdminSite.logout | {} |
| newsletter/(?P<slug>[_\w]+)/$ | newsletter | apps.newsletters.views.NewsletterDetailView | {} |
| newsletters/$ | newsletters | apps.newsletters.views.NewslettersListView | {} |
| notification/(?P<account_email>[-\w]+@[-\w]+.\w+)/$ | notification | apps.notifications.views.NotificationDetailView | {} |
| ^password_change/$ | password_change | django.contrib.admin.sites.AdminSite.password_change | {} |
| ^password_change/done/$ | password_change_done | django.contrib.admin.sites.AdminSite.password_change_done | {} |
| ^image/(?P<height>\d+)x(?P<width>\d+)/$ | placeholder | apps.core.views.PlaceholderView | {} |
| poll/(?P<pk>\w{8}-\w{4}-\w{4}-\w{4}-\w{12})/(?P<slug>[-\w]+)/$ | poll | apps.polls.views.PollDetailView | {} |
| ^add/$ | polls_choice_add | django.contrib.admin.options.ModelAdmin.add_view | {} |
| ^(.+)/change/$ | polls_choice_change | django.contrib.admin.options.ModelAdmin.change_view | {} |
| ^$ | polls_choice_changelist | django.contrib.admin.options.ModelAdmin.changelist_view | {} |
| ^(.+)/delete/$ | polls_choice_delete | django.contrib.admin.options.ModelAdmin.delete_view | {} |
| ^(.+)/history/$ | polls_choice_history | django.contrib.admin.options.ModelAdmin.history_view | {} |
| ^add/$ | polls_poll_add | django.contrib.admin.options.ModelAdmin.add_view | {} |
| ^(.+)/change/$ | polls_poll_change | django.contrib.admin.options.ModelAdmin.change_view | {} |
| ^$ | polls_poll_changelist | django.contrib.admin.options.ModelAdmin.changelist_view | {} |
| ^(.+)/delete/$ | polls_poll_delete | django.contrib.admin.options.ModelAdmin.delete_view | {} |
| ^(.+)/history/$ | polls_poll_history | django.contrib.admin.options.ModelAdmin.history_view | {} |
| ^$ | polls_vote_changelist | django.contrib.admin.options.ModelAdmin.changelist_view | {} |
| publisher/(?P<slug>[-_\w]+)/$ | publisher | apps.books.views.PublisherDetailView | {} |
| question/(?P<slug>[-_\w]+)/$ | question | apps.questions.views.QuestionDetailView | {} |
| ^add/$ | questions_answer_add | django.contrib.admin.options.ModelAdmin.add_view | {} |
| ^(.+)/change/$ | questions_answer_change | django.contrib.admin.options.ModelAdmin.change_view | {} |
| ^$ | questions_answer_changelist | django.contrib.admin.options.ModelAdmin.changelist_view | {} |
| ^(.+)/delete/$ | questions_answer_delete | django.contrib.admin.options.ModelAdmin.delete_view | {} |
| ^(.+)/history/$ | questions_answer_history | django.contrib.admin.options.ModelAdmin.history_view | {} |
| ^add/$ | questions_question_add | django.contrib.admin.options.ModelAdmin.add_view | {} |
| ^(.+)/change/$ | questions_question_change | django.contrib.admin.options.ModelAdmin.change_view | {} |
| ^$ | questions_question_changelist | django.contrib.admin.options.ModelAdmin.changelist_view | {} |
| ^(.+)/delete/$ | questions_question_delete | django.contrib.admin.options.ModelAdmin.delete_view | {} |
| ^(.+)/history/$ | questions_question_history | django.contrib.admin.options.ModelAdmin.history_view | {} |
| ^setlang/$ | set_language | django.views.i18n.set_language | {} |
| ^add/$ | snippets_snippet_add | django.contrib.admin.options.ModelAdmin.add_view | {} |
| ^(.+)/change/$ | snippets_snippet_change | django.contrib.admin.options.ModelAdmin.change_view | {} |
| ^$ | snippets_snippet_changelist | django.contrib.admin.options.ModelAdmin.changelist_view | {} |
| ^(.+)/delete/$ | snippets_snippet_delete | django.contrib.admin.options.ModelAdmin.delete_view | {} |
| ^(.+)/history/$ | snippets_snippet_history | django.contrib.admin.options.ModelAdmin.history_view | {} |
| solution/(?P<pk>\w{8}-\w{4}-\w{4}-\w{4}-\w{12})/(?P<slug>[-_\w]+)/$ | solution | apps.solutions.views.SolutionDetailView | {} |
| suit/(?P<slug>[-\w]+)/$ | suit | apps.testing.views.SuitDetailView | {} |
| tag/(?P<name>[-_\w]+)/$ | tag | apps.tags.views.TagDetailView | {} |
| theme/(?P<slug>[-_\w]+)/$ | theme | apps.forum.views.SectionDetailView | {} |
| topic/(?P<slug>[-_\w]+)/$ | topic | apps.forum.views.TopicDetailView | {} |
| update/$ | update | apps.users.views.UserDetailView | {} |
| ^r/(?P<content_type_id>\d+)/(?P<object_id>.+)/$ | view_on_site | django.contrib.contenttypes.views.shortcut | {} |
| writer/(?P<slug>[-_\w]+)/$ | writer | apps.books.views.WriterDetailView | {} |
----------------------------------------------------------------------------------------------------
There is a recipe on ActiveState:
import urls
def show_urls(urllist, depth=0):
    for entry in urllist:
        print(" " * depth, entry.regex.pattern)
        if hasattr(entry, 'url_patterns'):
            show_urls(entry.url_patterns, depth + 1)
# the URLconf module exposes its list as "urlpatterns"
show_urls(urls.urlpatterns)
There's a plugin I use: https://github.com/django-extensions/django-extensions, it has a show_urls command that could help.
Simply type in a URL you know does not exist, and the server will return an error page with a list of URL patterns (this requires DEBUG = True in your settings).
For example, if you're running a site at http://localhost:8000/something, type in
http://localhost:8000/something/blahNonsense, and your server will return the URL search list and display it in the browser.
def get_resolved_urls(url_patterns):
    url_patterns_resolved = []
    for entry in url_patterns:
        if hasattr(entry, 'url_patterns'):
            url_patterns_resolved += get_resolved_urls(
                entry.url_patterns)
        else:
            url_patterns_resolved.append(entry)
    return url_patterns_resolved
In python manage.py shell:
import urls
get_resolved_urls(urls.urlpatterns)
Minimalist solution for django 2.0
For instance, if the URL you're looking for belongs to the first include() in your root URLconf (typically your first app), you can access it like this:
from django.urls import get_resolver
from pprint import pprint
pprint(
get_resolver().url_patterns[0].url_patterns
)
Django 1.8, Python 2.7+
Run python manage.py shell and execute the following code.
from django.conf.urls import RegexURLPattern, RegexURLResolver
from django.core import urlresolvers
urls = urlresolvers.get_resolver(None)
def if_none(value):
    if value:
        return value
    return ''
def print_urls(urls, parent_pattern=None):
    for url in urls.url_patterns:
        if isinstance(url, RegexURLResolver):
            print_urls(url, if_none(parent_pattern) + url.regex.pattern)
        elif isinstance(url, RegexURLPattern):
            print(if_none(parent_pattern) + url.regex.pattern)
print_urls(urls)
I have extended Seti's command to show namespace, all url parts, auto-adjust column widths, sorted by (namespace,name):
https://gist.github.com/andreif/263a3fa6e7c425297ffee09c25f66b20
import sys
from django.core.management import BaseCommand
from django.conf.urls import RegexURLPattern, RegexURLResolver
from django.core import urlresolvers
def collect_urls(urls=None, namespace=None, prefix=None):
    if urls is None:
        urls = urlresolvers.get_resolver()
    _collected = []
    prefix = prefix or []
    for x in urls.url_patterns:
        if isinstance(x, RegexURLResolver):
            _collected += collect_urls(x, namespace=x.namespace or namespace,
                                       prefix=prefix + [x.regex.pattern])
        elif isinstance(x, RegexURLPattern):
            _collected.append({'namespace': namespace or '',
                               'name': x.name or '',
                               'pattern': prefix + [x.regex.pattern],
                               'lookup_str': x.lookup_str,
                               'default_args': dict(x.default_args)})
        else:
            raise NotImplementedError(repr(x))
    return _collected
def show_urls():
    all_urls = collect_urls()
    all_urls.sort(key=lambda x: (x['namespace'], x['name']))
    max_lengths = {}
    for u in all_urls:
        for k in ['pattern', 'default_args']:
            u[k] = str(u[k])
        for k, v in list(u.items())[:-1]:
            # Skip app_list due to length (contains all app names)
            if (u['namespace'], u['name'], k) == \
                    ('admin', 'app_list', 'pattern'):
                continue
            max_lengths[k] = max(len(v), max_lengths.get(k, 0))
    for u in all_urls:
        sys.stdout.write(' | '.join(
            ('{:%d}' % max_lengths.get(k, len(v))).format(v)
            for k, v in u.items()) + '\n')
class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        show_urls()
Note: column order is kept in Python 3.6 and one would need to use OrderedDict in older versions.
Update: A new version with OrderedDict now lives in django-🍌s package: https://github.com/5monkeys/django-bananas/blob/master/bananas/management/commands/show_urls.py
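The note about column order matters because the command slices and joins the record's items positionally. As a minimal sketch of the `OrderedDict` variant mentioned above, the hypothetical `make_record` helper below builds a record whose key order is guaranteed on any Python version; the view path in the example is made up.

```python
from collections import OrderedDict


def make_record(namespace, name, pattern, lookup_str, default_args):
    """Build a record whose key order is guaranteed on any Python version."""
    return OrderedDict([
        ("namespace", namespace),
        ("name", name),
        ("pattern", pattern),
        ("lookup_str", lookup_str),
        ("default_args", default_args),
    ])


record = make_record("admin", "login", "^login/$",
                     "hypothetical.views.login", "{}")
print(" | ".join(record.values()))
# admin | login | ^login/$ | hypothetical.views.login | {}
print(list(record)[:-1])
# ['namespace', 'name', 'pattern', 'lookup_str']
```

With a plain dict on an older interpreter, the last key dropped by `[:-1]` would not reliably be `default_args`.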
Django >= 2.0 List Solution
adapted from @CesarCanassa
from django.conf import settings
from django.urls import URLPattern, URLResolver
URLCONF = __import__(settings.ROOT_URLCONF, {}, {}, [''])
def list_urls(patterns, path=None):
    """ recursive """
    if not path:
        path = []
    result = []
    for pattern in patterns:
        if isinstance(pattern, URLPattern):
            result.append(''.join(path) + str(pattern.pattern))
        elif isinstance(pattern, URLResolver):
            result += list_urls(pattern.url_patterns, path + [str(pattern.pattern)])
    return result
from django.urls.resolvers import RegexPattern,RoutePattern
from your_main_app import urls
def get_urls():
    url_list = []
    for url in urls.urlpatterns:
        # re_path() entries carry a regex, path() entries carry a route
        if isinstance(url.pattern, RegexPattern):
            url_list.append(url.pattern._regex)
        else:
            url_list.append(url.pattern._route)
    return url_list
Here your_main_app is the app name where your settings.py file is placed
Yet another adaptation of @Cesar Canassa's generator magic. This can be added as yourapp/management/commands/dumpurls.py in your app so that it'll be accessible as a subcommand of manage.py.
Note: I added a line to make sure it filters for only yourapp. Update or remove it accordingly if additional URLs are desired.
As a manage.py Subcommand
Deploy Path: yourapp/management/commands/dumpurls.py
from django.core.management.base import BaseCommand, CommandError
from django.conf import settings
from django.urls import URLPattern, URLResolver
def list_urls(lis, acc=None):
    if acc is None:
        acc = []
    if not lis:
        return
    l = lis[0]
    if isinstance(l, URLPattern):
        yield acc + [str(l.pattern),l.name]
    elif isinstance(l, URLResolver):
        yield from list_urls(l.url_patterns, acc + [str(l.pattern)])
    yield from list_urls(lis[1:], acc)
class Command(BaseCommand):
    help = 'List all URLs from the urlconf'
    def handle(self, *args, **options):
        urlconf = __import__(settings.ROOT_URLCONF, {}, {}, [''])
        records, glen, nlen = [], 0, 0
        for p in list_urls(urlconf.urlpatterns):
            record = [''.join(p[:2]), p[2]]
            # Update me, or add an argument
            if record[0].startswith('yourapp'):
                clen = len(record[0])
                if clen > glen: glen = clen
                clen = len(record[1])
                if clen > nlen: nlen = clen
                records.append(record)
        self.stdout.write('{:-<{width}}'.format('',width=glen+nlen))
        self.stdout.write('{:<{glen}}Name'.format('Path',glen=glen+4))
        self.stdout.write('{:-<{width}}'.format('',width=glen+nlen))
        for record in records:
            self.stdout.write('{path:<{glen}}{name}'.format(path=record[0],
                name=record[1],
                glen=glen+4))
        self.stdout.write('{:-<{width}}'.format('',width=glen+nlen))
Sample Output
(env) django@dev:myproj~> ./manage.py dumpurls
-------------------------------------------------------------------------------------------------------
Path Name
-------------------------------------------------------------------------------------------------------
yourapp/^api-key/$ api-key-list
yourapp/^api-key\.(?P<format>[a-z0-9]+)/?$ api-key-list
yourapp/^attacks/$ attack-list
yourapp/^attacks\.(?P<format>[a-z0-9]+)/?$ attack-list
yourapp/^attack-histories/$ attackhistory-list
yourapp/^attack-histories\.(?P<format>[a-z0-9]+)/?$ attackhistory-list
yourapp/^files/$ file-list
yourapp/^files\.(?P<format>[a-z0-9]+)/?$ file-list
yourapp/^modules/$ module-list
yourapp/^modules\.(?P<format>[a-z0-9]+)/?$ module-list
You can create a dynamic import to gather all URL Patterns from each application in your project with a simple method like so:
def get_url_patterns():
    import importlib
    from django.apps import apps
    list_of_all_url_patterns = list()
    for name, app in apps.app_configs.items():
        # you have a directory structure where you should be able to build the correct path
        # my example shows that apps.[app_name].urls is where to look
        mod_to_import = f'apps.{name}.urls'
        try:
            urls = getattr(importlib.import_module(mod_to_import), "urlpatterns")
            list_of_all_url_patterns.extend(urls)
        except ImportError as ex:
            # is an app without urls
            pass
    return list_of_all_url_patterns
list_of_all_url_patterns = get_url_patterns()
I recently used something like this to create a template tag to show active navigation links.
import subprocess
res = subprocess.run(
'python manage.py show_urls',
capture_output=True,
shell=True,
)
url_list = [
line.split('\t')[0]
for line in res.stdout.decode().split('\n')
]
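The parsing step above can be tried without running Django at all. The sketch below assumes the command's output is tab-separated with the URL in the first column, as the snippet above implies; `SAMPLE_OUTPUT` is made-up text standing in for `res.stdout.decode()`, and `parse_show_urls` is a hypothetical helper name.

```python
SAMPLE_OUTPUT = (
    "/get_url_info\tdjango_app.views.get_url_func\t\n"
    "/health\tdjango_app.health_views.service_health_check\t\n"
)  # made-up text standing in for res.stdout.decode()


def parse_show_urls(text):
    """Split tab-separated lines into (url, view) pairs, skipping blanks."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines such as the trailing one
        columns = line.split("\t")
        url = columns[0]
        view = columns[1] if len(columns) > 1 else ""
        pairs.append((url, view))
    return pairs


print(parse_show_urls(SAMPLE_OUTPUT))
# [('/get_url_info', 'django_app.views.get_url_func'), ('/health', 'django_app.health_views.service_health_check')]
```

Skipping blank lines avoids the empty string that a plain split on newlines leaves at the end of the list.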
In case you are using DRF, you can print all the URL patterns for a particular router by printing the urlpatterns from router.get_urls() (within your Django app's urls.py file).
Open your app's urls.py and add the print statement at the bottom of the file, so the whole file might look like this:
import pprint
from django.urls import include, path
from rest_framework import routers
from . import views
router = routers.DefaultRouter()
router.register(r"users", views.UserViewSet, basename="User")
router.register(r"auth", views.AuthenticationView, basename="Auth")
router.register(r"dummy", views.DummyViewSet, basename="Dummy")
router.register("surveys", views.SurveyViewSet, basename="survey")
urlpatterns = [
path("", include(router.urls)),
]
pprint.pprint(router.get_urls())
The patterns are then printed like this:
[<URLPattern '^users/$' [name='User-list']>,
<URLPattern '^users\.(?P<format>[a-z0-9]+)/?$' [name='User-list']>,
<URLPattern '^users/admins/$' [name='User-admins']>,
<URLPattern '^users/admins\.(?P<format>[a-z0-9]+)/?$' [name='User-admins']>,
<URLPattern '^users/current/$' [name='User-current']>,
<URLPattern '^users/current\.(?P<format>[a-z0-9]+)/?$' [name='User-current']>,
<URLPattern '^users/(?P<pk>[^/.]+)/$' [name='User-detail']>,
<URLPattern '^users/(?P<pk>[^/.]+)\.(?P<format>[a-z0-9]+)/?$' [name='User-detail']>,
<URLPattern '^auth/login/$' [name='Auth-login']>,
...
]