sitemap.xml in a wagtail-bakery Site

sitemap.xml is an important file for managing how search engines understand your website. I won't include any background on the file itself, but here are some links I find helpful:

For my static site (see Baking a Static Site from Wagtail CMS), I've decided to maintain a sitemap that includes all of my pages.


sitemap.xml in a Dynamic Wagtail Site

Wagtail comes with a thin wrapper around Django's sitemap framework; both create a sitemap dynamically on request. It's easy enough to set up: just follow the Wagtail docs and add a URL pattern that uses the built-in view:

urls.py
from django.urls import path
from wagtail.contrib.sitemaps.views import sitemap

urlpatterns = [
    # ...

    path('sitemap.xml', sitemap),

    # ...
]

Now you can query the development server for a sitemap:

$ # in one terminal, start the development server
$ ./manage.py runserver

$ # in another terminal, fetch the sitemap
$ curl http://127.0.0.1:8000/sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://www.joelsleppy.com/</loc><lastmod>2021-01-02</lastmod></url><url><loc>https://www.joelsleppy.com/blog/</loc><lastmod>2021-01-01</lastmod></url><url><loc>https://www.joelsleppy.com/blog/baking-a-static-site-from-wagtail-cms/</loc><lastmod>2021-01-02</lastmod></url><url><loc>https://www.joelsleppy.com/blog/syntax-highlighted-code-blocks-with-wagtail-cms/</loc><lastmod>2021-01-02</lastmod></url>
</urlset>

The view is smart enough to find all the Wagtail pages and include only the published ones (my unpublished draft of this blog post doesn't appear in the sitemap).
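If you want to double-check what the view returned, a few lines of Python can list the entries. This is only a sketch: `list_sitemap_entries` and `SAMPLE` are names I made up for illustration, and the sample document is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol, as seen in the <urlset> tag above.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_entries(xml_bytes):
    """Return a (loc, lastmod) tuple for every <url> element in a sitemap.

    Hypothetical helper for illustration, not part of Wagtail or Django.
    """
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


# Shortened copy of the sitemap shown above (first two entries only).
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    b"<url><loc>https://www.joelsleppy.com/</loc><lastmod>2021-01-02</lastmod></url>"
    b"<url><loc>https://www.joelsleppy.com/blog/</loc><lastmod>2021-01-01</lastmod></url>\n"
    b"</urlset>\n"
)

for loc, lastmod in list_sitemap_entries(SAMPLE):
    print(lastmod, loc)
```

Passing the raw bytes that curl saved (rather than a decoded string) lets the XML parser honor the encoding declared on the first line.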

sitemap.xml in a Static wagtail-bakery Site

The only issue is that wagtail-bakery doesn't include sitemap.xml in our static site:

$ ./manage.py build
Build started
Build finished
$ ls build/
blog  index.html  media  static

This is because, following the wagtail-bakery README, you might have this in your project settings:

settings.py
BAKERY_VIEWS = (
    'wagtailbakery.views.AllPublishedPagesView',
)

This picks up all published pages, but there is no Wagtail Page model for our sitemap, so it gets left out of the build. Are there any other views that look like they might do the trick?

$ grep -e "^class" /path/to/your/virtual/env/lib/python3.8/site-packages/wagtailbakery/views.py
class WagtailBakeryView(BuildableDetailView):
class AllPagesView(WagtailBakeryView):
class AllPublishedPagesView(AllPagesView):

None of those look promising. It turns out that the BAKERY_VIEWS setting doesn't belong to wagtail-bakery; it belongs to django-bakery. Unfortunately, django-bakery doesn't have an out-of-the-box solution or even any extension points for rolling your own sitemap support (here's the open GitHub issue for that feature). Not to worry, we can hack our way out of this.

build.sh
#!/usr/bin/env bash

echo ''
echo 'Tearing down old build'
./manage.py collectstatic --clear --no-input
./manage.py unbuild

echo ''
echo 'Making new build'
./manage.py build

echo ''
echo 'Starting development server to generate sitemap'
# --noreload runs the server without the autoreloader's child process,
# so the kill below stops it cleanly
./manage.py runserver --noreload &
# remember the PID of the background server
SERVER_PID=$!

echo 'Waiting for development server to start up'
sleep 5
# a less dirty way would be to health check the development server until it responds

echo 'Fetching sitemap'
# --fail makes curl exit non-zero on an HTTP error instead of saving the error page
curl --fail http://127.0.0.1:8000/sitemap.xml -o "build/sitemap.xml"

echo 'Killing development server'
kill "$SERVER_PID"
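
The health check mentioned in the script comments could look something like the following. It's a rough sketch using only the Python standard library; `wait_for_server` and its parameters are hypothetical names of my own, not anything provided by Django, Wagtail, or django-bakery.

```python
import time
import urllib.request


def wait_for_server(url, timeout=30.0, interval=0.5, request_timeout=5.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True once the server responds successfully, False otherwise.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, socket timeouts, and urllib's
            # URLError/HTTPError (all OSError subclasses): server not ready yet.
            pass
        time.sleep(interval)
    return False
```

The idea would be to call `wait_for_server("http://127.0.0.1:8000/sitemap.xml")` in place of the fixed sleep and abort the build if it returns False, so that a slow startup doesn't produce an empty or missing sitemap.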

Now our build includes the sitemap:

$ ./build.sh 
...
$ ls build
blog  index.html  media  sitemap.xml  static
$ cat build/sitemap.xml 
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://www.joelsleppy.com/</loc><lastmod>2021-01-02</lastmod></url><url><loc>https://www.joelsleppy.com/blog/</loc><lastmod>2021-01-01</lastmod></url><url><loc>https://www.joelsleppy.com/blog/baking-a-static-site-from-wagtail-cms/</loc><lastmod>2021-01-02</lastmod></url><url><loc>https://www.joelsleppy.com/blog/syntax-highlighted-code-blocks-with-wagtail-cms/</loc><lastmod>2021-01-02</lastmod></url>
</urlset>

Telling Google About Your sitemap.xml

Submitting the sitemap to Google couldn't be easier. They have a web interface for this (see all the options for submitting your sitemap), but to automate it, add a line like this to your deployment script:

deploy.sh
curl 'http://www.google.com/ping?sitemap=https://www.joelsleppy.com/sitemap.xml'
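
If the sitemap URL ever contains characters that need escaping, it may be safer to build the ping URL programmatically. Here is a small sketch, assuming the ping endpoint from deploy.sh is still what Google accepts; `build_ping_url` is a hypothetical helper name of mine.

```python
from urllib.parse import urlencode


def build_ping_url(sitemap_url, ping_endpoint="http://www.google.com/ping"):
    """Return the ping URL with the sitemap address percent-encoded.

    Hypothetical helper; the default endpoint is copied from deploy.sh above.
    """
    return f"{ping_endpoint}?{urlencode({'sitemap': sitemap_url})}"


print(build_ping_url("https://www.joelsleppy.com/sitemap.xml"))
```

The resulting string can be passed to curl exactly as in deploy.sh.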

Cheers!