Automating blog posts with Git

In an ideal world, reducing the friction to publish a blog post would increase the frequency of blogging! With that in mind, I've automated the deployment of my blog with a script that runs every time I git commit a post. There are lots of great posts on how to do this; the main ways my approach differs are that I build the site on a remote host, and that I use git branches to develop and review my posts before publishing them.

Automating a build using git hooks is pretty straightforward. The source of the blog (pages, posts, etc.) is stored in a git repository. When the source of a post is complete, it's committed and pushed to the repository. This kicks off a git hook script (called a post-receive hook), which builds the blog into HTML and deploys it.
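To make the mechanism concrete, here's a minimal sketch (not my real hook, which is shown in full later): on each push, git runs the post-receive script and writes one line per updated ref to its stdin, in the form "old-hash new-hash ref-name".

```python
# Minimal sketch of what a post-receive hook receives on stdin.

def parse_update(line):
    """Split one post-receive input line into (old, new, branch name)."""
    old, new, ref = line.split()
    # ref is a full name like "refs/heads/master"; keep the last component
    return old, new, ref.rsplit("/", 1)[-1]

# simulate the line git would write after a push to master
old, new, branch = parse_update(
    "e247265200d683dd92b516221447263bfd91e469 "
    "935ae6339d7bf0f18b97257701d5e560cf6d4f82 refs/heads/master")
print(branch)  # master
```

A real hook then dispatches on the branch name, which is exactly what my script does below.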

In my case, the blog's source repository is on a remote host: the git hook script runs there, uses an installation of the blog engine (Pelican) to build the source into HTML, and then rsyncs the result to a virtual server.

It doesn't really matter that the git repository is on a server, as the key technologies are the git hook and the script. The same set-up works just as well on a laptop. The advantage for me is that I have the source git repository of the blog both on my laptop and in a shell account on sdf.org: I can write posts locally on my laptop or I can SSH into SDF.

I like to test my blog posts before I finally publish them. A nice way to handle this is to have the post-receive script check the branch: if it's the devel branch the script publishes to a development virtual server; if it's the master branch it publishes to the public site.
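The branch check amounts to a small lookup from branch name to deployment target. As a sketch (the paths here are placeholders for illustration, not my real virtual-server directories):

```python
# Sketch of the post-receive branch check: map the pushed branch to a
# deployment target. The paths are placeholders, not real directories.

TARGETS = {
    'devel': '/www/placeholder/dev-site/',    # development virtual server
    'master': '/www/placeholder/live-site/',  # public site
}

def deploy_target(branch):
    """Return the rsync destination for a branch, or None to do nothing."""
    return TARGETS.get(branch)

print(deploy_target('devel'))   # /www/placeholder/dev-site/
print(deploy_target('topic'))   # None
```

Any other branch falls through to None, so experimental branches never publish anything.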

To set this up I have one installation of Pelican on my laptop, and one installation of Pelican on the remote host. I won't go through all the steps of the laptop set-up as that's pretty straightforward. The set-up on the remote host has the following:

The source of the site - all the posts and pages - lives in a normal bare git repository on the remote server; for me this is ~/repos/futurile-www.git. We commit/push content changes to it, and new posts are pushed to this central blog source repository for publishing. This repository also holds the git hook script which builds the site.

On the remote server I have a build environment. This is a directory with an installation of the blog engine (Pelican). As Pelican is written in Python, I use a virtualenv for isolation, and I have the same version of Pelican on the remote host as I use locally (on my laptop).

The build environment uses a separate checked-out copy of the blog source, which I keep in ~/workspace/futurile-www/. When the git hook script is called, it builds the site from this copy.

The git hook is called each time a push arrives at the central blog source repository. It's a Python script: it looks at the branch being pushed, activates the build environment's virtualenv, builds the site, and then rsyncs it to one location or another depending on the branch. If it's devel it publishes to a devel virtual server; if it's master it publishes to the main site's virtual server.

Central source repository

The first prerequisite for this process to work is that the source of the blog has to be in a git repository. This is our central main repository. When we're writing posts on the laptop or any other machine they're all eventually pushed up to this central repository. It's where the post-receive hook script is placed, so that the build can be automated.

I'm not going to show all the steps for this, but in my set-up I keep the main repository on the remote host. So the steps are:

# connect to my remote host
ssh sdf.org

# create a git repository for the main site
[server]$ git init --bare ~/repos/futurile-www.git

I then clone this repository to my laptop.

Laptop Pelican

The second step is to set up Pelican on the laptop. We put all the source of the blog (posts/pages) into a checked out local copy of the central git repository. See my other post on how to do this on a laptop; the main steps are:

# checkout the remote repository to a local directory on laptop
[laptop]$ git clone ssh://user@sdf.org/repos/futurile-www.git $HOME/workspace/futurile-www

# create a virtualenv for this installation
[laptop]$ mkproject --verbose Pelican4.01

# edit the virtual env so it points to the checked out blog source directory
[laptop]$ setvirtualenvproject $HOME/.virtualenvs/Pelican4.01 $HOME/workspace/futurile-www

# if the repository wasn't cloned from the server, add it as the remote
[laptop]$ workon Pelican4.01
[laptop]$ git remote add origin ssh://youruser@sdf.org/some/path/repos/futurile-www.git

# install Pelican
[laptop]$ pip install pelican

With Pelican installed locally it's easy to write posts and then to commit and push the source up to the remote server. Something like this:

[laptop]$ git commit content/posts/a-new-post.rst
[laptop]$ git push

Server source copy

We also need a clone of the source repository on the remote server. This will be the same configuration as on the laptop. We'll install Pelican on the server in a moment, and then to build the source of the blog into HTML we'll use the same sort of virtualenv set-up. To create the checked out source repository:

# connect to the server
ssh sdf.org

# clone a copy of the blog's source repository
[server]$ git clone ~/repos/futurile-www.git $HOME/workspace/futurile-www

Server Pelican

The source of the blog is on the remote server, so to deploy a new version we have to build HTML pages. This requires an installation of the blog engine on the remote server. I already have an installation of Pelican on my laptop and to keep things simple I use the same version on both laptop and server.

On the server we do:

# check that you're using python3
[server]$ which python
[server]$ python3 --version
Python 3.5.9

Set up the environment for virtualenvwrapper by adding the following to .bashrc:

export PROJECT_HOME=$HOME/workspace

export VIRTUALENV_PYTHON=/usr/bin/python3
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export VIRTUALENVWRAPPER_VIRTUALENV=/usr/bin/virtualenv
export VIRTUALENVWRAPPER_VIRTUALENV_ARGS='--no-site-packages'
source ~/.local/bin/virtualenvwrapper.sh

Having sourced .bashrc the next step is to install Pelican:

[server]$ mkproject --verbose -r futurile-www/requirements.txt Pelican4.01

I already have a repository with my content in it; this has a requirements.txt which I use to install everything. I admit that I'm using an old version of Pelican - I'm just being lazy with my updates!
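For reference, a requirements.txt for a set-up like this might look something like the following - this is an illustration rather than my actual file, and the pinned version is an assumption:

```
pelican==4.0.1
Markdown
fabric
invoke
```

Fabric and invoke are there for the hook script shown later, not for Pelican itself.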

Now we can connect the blog source repository to the virtualenv we just created. This means that when we activate the virtualenv it drops us into the repository holding our posts, where we can use Pelican to build all the content.

# connect the virtualenv to the checked out copy of the source,
# cloned earlier with: git clone ~/repos/futurile-www.git $HOME/workspace/futurile-www
[server]$ setvirtualenvproject $HOME/.virtualenvs/Pelican4.01 $HOME/workspace/futurile-www

[server]$ workon Pelican4.01

The next step is to update submodules which are used for some Pelican plugins.

[server]$ git submodule status
-c96846c201c5f2f2a9fe9434266e77672ccd3ed6 plugins

[server]$ git submodule init
Submodule 'plugins' (https://github.com/getpelican/pelican-plugins.git) registered for path 'plugins'

[server]$ git submodule update
[lots of output]

Have to admit I don't love git submodules and always find them a bit confusing! At this point there's a functioning blog with content, so running make html builds the site.

Post Receive Hook

Git hooks are very interesting as they provide all sorts of ways to automate actions when using git. The post-receive hook runs on the server side and is used to work with external services after the git repository has been updated.

The main difficulty I found with developing hook scripts is how to test the script without having to do a git commit every time. Luckily we can simulate the inputs pretty easily.

A hook script can be in any scripting language; it's placed in the repository's hooks directory. We're creating a post-receive script, so the file is simply called post-receive. When a push arrives, git runs the script and passes it (on stdin) the old hash, the new hash and the ref name for each updated branch. When developing the script locally we can simulate this by doing the following:

$ workon Pelican4.01

# see which branch we are on.
$ git branch

# get some git hashes
$ git log -2 --format=oneline --reverse
e247265200d683dd92b516221447263bfd91e469 New post on Specter nested data manipulation in Clojure.
935ae6339d7bf0f18b97257701d5e560cf6d4f82 (HEAD -> master, origin/master, origin/HEAD) Blog post on migrating from NeoBundle to Vim-plug.

# put the output into FROM_ID and TO_ID
$ export FROM_ID=e247265200d683dd92b516221447263bfd91e469
$ export TO_ID=935ae6339d7bf0f18b97257701d5e560cf6d4f82

# then play with your script as follows
$ echo "$FROM_ID $TO_ID refs/heads/master" | ./post-receive

It's much easier to develop this way, rather than putting the script into the hooks directory and then doing a commit every time you want to test it.

Python Hook Script

There are lots of blog posts showing simple hook scripts written in bash: if this set-up looks too complicated, there are simpler scripts out there to play with! Mine is a bit more involved because I'm backing up my site, interacting with a remote host, and formatting the output nicely.

#!/usr/bin/env python3

from fabric import Connection
from datetime import datetime

from invoke import Context
from invoke import run as local

import os, sys, argparse

ANSI_green = '\033[92m'
ANSI_yellow = '\033[93m'
ANSI_red = '\033[91m'
ANSI_reset = '\x1b[0m'

remote_dir = '/directory/site/is/published/to/'


def backup_sites(cn):
    ''' Backup the dev and production site content and store it on ma.sdf.org '''

    # change to the right location on the local side
    os.chdir(os.path.expanduser('~/sites/backup/'))

    # find the name of the backup file on the remote end
    backup_name = "{remote_dir}{backup_date}-futurile.gz".format(
        remote_dir=remote_dir, backup_date=datetime.today().strftime('%Y%m%d')
    )
    # the local name of the backup file
    local_backup_name = "{backup_date}-futurile.gz".format(
        backup_date=datetime.today().strftime('%Y%m%d')
    )

    # backup the site if it doesn't already exist
    if os.path.isfile(local_backup_name):
        print(ANSI_yellow + "[Warning] " + ANSI_reset, end="")
        print("backup already exists, not recreating: {}".format(backup_name))
        return
    else:
        with cn.cd(remote_dir):
            result = cn.run(
                'tar --create -v --gzip --file `date +%Y%m%d`-futurile.gz ./dev.futurile.net ./futurile.net'
            )

        if result.return_code == 0:
            print(ANSI_green + "[Success] " + ANSI_reset + "backup created.")
            print('    command: {}, return: {}'.format(result.command, result.return_code))
        else:
            # never gets to this because fabric will print it for you
            print(ANSI_red + "[Failure] backup creation failed: " + ANSI_reset, end="")
            print('command: {}, return: {}'.format(result.command, result.return_code))
            sys.exit()

        # get the remote file - no result object, it raises an exception on error
        cn.get(backup_name)
        print(ANSI_green + "[Success] " + ANSI_reset + "backup stored on local system.")

        result = cn.run('/bin/rm {}'.format(backup_name))
        if result.return_code == 0:
            print(ANSI_green + "[Success] " + ANSI_reset + "backup removed from remote.")
            print('    command: {}, return: {}'.format(result.command, result.return_code))


def build_site(lo, required_branch, local_dir, local_virtualenv, build_cmd):
    ''' Build the dev site under virtualenv '''
    os.chdir(os.path.expanduser(local_dir))

    # checkout the branch that we want - master or devel
    result = local('{} && git checkout {}'.format(local_virtualenv, required_branch))
    if result.return_code == 0:
        print(ANSI_green + "[Success] " + ANSI_reset +
                 "git branch successful, on {}.".format(required_branch))
        print('    command: {}, return: {}'.format(result.command, result.return_code))
    else:
        print(ANSI_red + "[Failure] git branch failed: " + ANSI_reset, end="")
        print('command: {}, return: {}'.format(result.command, result.return_code))
        sys.exit()

    # confirm we're on the branch we expect - master or devel
    branch = local('{} && git rev-parse --abbrev-ref HEAD'.format(local_virtualenv), hide='both')
    if branch.stdout.rstrip() == required_branch:
        print(ANSI_green + "[Success] " + ANSI_reset + "on the required branch.")
        print('    command: {}, return: {}'.format(branch.command, branch.stdout.rstrip()))

        # pull in any changes
        result = local('{} && git pull'.format(local_virtualenv), hide='stdout')
        if result.return_code == 0:
            print(ANSI_green + "[Success] " + ANSI_reset + "git pull successful, site source up to date.")
            print('    command: {}, return: {}'.format(result.command, result.return_code))
        else:
            print(ANSI_red + "[Failure] git pull failed: " + ANSI_reset, end="")
            print('command: {}, return: {}'.format(result.command, result.return_code))
            sys.exit()

        # update submodules
        result = local('{} && git submodule update'.format(local_virtualenv))
        if result.return_code == 0:
            print(ANSI_green + "[Success] " + ANSI_reset + "submodules updated.")
            print('    command: {}, return: {}'.format(result.command, result.return_code))
        else:
            print(ANSI_red + "[Failure] git submodule update failed: " + ANSI_reset, end="")
            print('command: {}, return: {}'.format(result.command, result.return_code))
            sys.exit()

        #build the site
        result = local('{0} && {1}'.format(local_virtualenv, build_cmd), hide='both')
        if result.return_code == 0:
            print(ANSI_green + "[Success] " + ANSI_reset + "site built.")
            print('    command: {}, return: {}'.format(result.command, result.return_code))
        else:
            print(ANSI_red + "[Failure] site build failed: " + ANSI_reset, end="")
            print('command: {}, return: {}'.format(result.command, result.return_code))
            sys.exit()

    else:
        print(ANSI_red, '[Failure] exiting not on the required branch {}'.format(required_branch), ANSI_reset)
        sys.exit()


def sync_site(cn, local_dir, remote_dir):
    ''' Sync the dev site from ma.sdf.org -> tty.sdf.org '''

    os.chdir(os.path.expanduser(local_dir))

    # Shell out and do the rsync
    result = local(
      'rsync -C -a -c -h --itemize-changes -e ssh $PWD/output/ tty.sdf.org:{}'.format(remote_dir)
      )

    if result.return_code == 0:
        print(ANSI_green + "[Success] " + ANSI_reset + "site deployed.")
        print('    command: {}, return: {}'.format(result.command, result.return_code))
    else:
        print(ANSI_red + "[Failure] site deployment to virtual server failed." + ANSI_reset)
        print('    command: {}, return: {}'.format(result.command, result.return_code))
        sys.exit()

    # On the remote host I have to run this command to set Web permissions
    result = cn.run('/usr/pkg/bin/setwebperms', hide="both")
    if result.return_code == 0:
        print(ANSI_green + "[Success] " + ANSI_reset + "permissions set for virtual server.")
        print('    command: {}, return: {}'.format(result.command, result.return_code))
    else:
        print(ANSI_red + "[Failure] setting permissions on virtual server failed." + ANSI_reset)
        print('    command: {}, return: {}'.format(result.command, result.return_code))
        sys.exit()


def hook(from_commit, to_commit, branch):
    ''' Co-ordinates the actions depending on the branch we're on '''

    devel_branch = 'devel'
    prod_branch = 'master'

    if branch.endswith(devel_branch):
        print(ANSI_green + "[Success] " + ANSI_reset + "devel site build path")
        print("    received branch %s, matching %s" % (branch, devel_branch))
        cn = Connection('tty.sdf.org')
        lo = Context()
        local_dir = '~/sites/dev-futurile/'
        local_virtualenv = 'source ~/.virtualenvs/pelican4.01-publish-dev/bin/activate'
        build_cmd = 'make html'
        remote_dir = '/www/af/f/futurile/dev.futurile.net/'
        backup_sites(cn)
        build_site(lo, devel_branch, local_dir, local_virtualenv, build_cmd)
        sync_site(cn, local_dir, remote_dir)
        cn.close()
    elif branch.endswith(prod_branch):
        print(ANSI_green + "[Success] " + ANSI_reset + "production site build path")
        print("    received branch %s, matching %s" % (branch, prod_branch))
        cn = Connection('tty.sdf.org')
        lo = Context()
        local_dir = '~/sites/dev-futurile/'
        local_virtualenv = 'source ~/.virtualenvs/pelican4.01-publish-dev/bin/activate'
        build_cmd = 'make publish'
        remote_dir = '/www/af/f/futurile/www.futurile.net/'
        backup_sites(cn)
        build_site(lo, prod_branch, local_dir, local_virtualenv, build_cmd)
        sync_site(cn, local_dir, remote_dir)
        cn.close()
    else:
        print(ANSI_red + "[Failure] received branch {}, must be either devel or master.".format(branch) + ANSI_reset)


if __name__ == "__main__":

    # remove the GIT_DIR environment variable as the hook environment is
    # different from the 'virtualenv' one that we're working in.
    os.environ.pop('GIT_DIR', None)

    args = sys.stdin.read().split()
    # split the args and provide them to the hook function
    hook(*args)

It's not as complicated as it appears. The main thing to know is that I'm using Fabric 2: a Connection runs commands on the remote host, while invoke's run (imported as local) runs commands locally.

The hook function co-ordinates everything depending on whether it's the devel or the master branch. For each it enters the right virtualenv, backs up the existing site, builds the new site, and rsyncs it into the correct web directory.

Posting workflow

The workflow for creating a blog post is that I write it on the devel branch and review it by publishing to the devel virtual server. It doesn't matter whether I'm on the laptop or remotely ssh'd into the server:

$ workon Pelican4.01

# see which branch we are on.
$ git branch
* devel
  master

# switch to the devel branch if needed
$ git checkout devel

# push the devel branch to the remote if needed
$ git push origin devel

# write post
$ vim content/posts/my-new-post.rst

# commit and push
$ git stage content/posts/my-new-post.rst
$ git commit
$ git push

I can then look at the post in my devel virtual server (dev.futurile.net).

When writing a post I push it to the dev server multiple times. I use vim-fugitive to commit the new changes and push them as normal.

When the blog post is ready, I do an interactive rebase (editing the commit message if needed), followed by a merge from the devel branch into the master branch.

$ git rebase -i master
$ git checkout master
$ git merge devel

Pushing the master branch triggers the hook, which builds and publishes the site:

$ git push

Finally, I can delete the devel branch with its interim commits, both locally and on the remote:

$ git push origin --delete devel
$ git branch --delete devel
$ git branch --delete --remotes origin/devel
$ git branch --all

Final Comments

Unfortunately, my personal evidence is that automating the deployment of blog posts doesn't increase the frequency of blogging! Automating the publishing of the final blog post is pretty useful. The work put in so that I can write blog posts on my remote host and review them on the devel branch hasn't been useful. The famous xkcd joke about automation applies. However, I'm happy with the set-up and enjoyed exploring this form of automation.


Posted in Pelican Saturday 26 March 2022
Tagged with tech blog pelican