Puppet dashboard - six months on
In this second article we'll be taking a look at the dashboard itself. Six months ago we were distinctly underwhelmed by the dashboard - it was more style than substance.
More recently we tried it again and found various bugs, including a lack of sorting, which resulted in this amusing graph:

To generate these numbers we imported approximately 45,000 reports, spread over 75 hosts, into the dashboard. The import took about an hour on a fairly small virtual machine. A few months ago this fell over; today it completed successfully, despite one worrying stall.
The initial homepage of the dashboard gives some overall numbers: how many nodes there are, how many were successful, and when each node last checked in. Note the duplication of "Daily Run Status" at the top. It occasionally also reports failed nodes, as well as nodes which have failed to check in within the last 30 minutes (the default Puppet run interval). We don't run our nodes that frequently in any case, and there is no way to change this threshold from within the GUI. The list is allegedly sorted by "latest report", but this wasn't the case, and there is no option to sort it alphabetically, which we'd prefer.
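The staleness threshold and sort order need very little logic to make configurable. This is a minimal Python sketch, not the dashboard's own code. The `Node` record and the `stale_nodes` and `sort_nodes` functions are hypothetical, and the sketch assumes each node exposes the time of its latest report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Node:
    name: str
    last_report: datetime  # time of the node's most recent report (assumed field)


def stale_nodes(nodes, now, threshold=timedelta(minutes=30)):
    """Return names of nodes whose last report is older than `threshold`.

    The 30-minute default mirrors the dashboard's behaviour; passing a
    different threshold is what we'd like to be able to do from the GUI.
    """
    return [n.name for n in nodes if now - n.last_report > threshold]


def sort_nodes(nodes, by="latest_report"):
    """Sort nodes by most recent report first, or alphabetically by name."""
    if by == "latest_report":
        return sorted(nodes, key=lambda n: n.last_report, reverse=True)
    if by == "name":
        return sorted(nodes, key=lambda n: n.name.lower())
    raise ValueError(f"unknown sort key: {by}")


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)  # arbitrary example time
    nodes = [
        Node("web02", now - timedelta(minutes=10)),
        Node("db01", now - timedelta(hours=2)),
        Node("app01", now - timedelta(minutes=45)),
    ]
    print(stale_nodes(nodes, now))                              # ['db01', 'app01']
    print(stale_nodes(nodes, now, threshold=timedelta(hours=1)))  # ['db01']
    print([n.name for n in sort_nodes(nodes, by="name")])         # ['app01', 'db01', 'web02']
```

With a per-site threshold, nodes that deliberately run less often than every 30 minutes would no longer be flagged as having failed to check in.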

From here you can drill down to look at a specific node. You'll see a view like this:

It's fairly self-explanatory: Total/Failed counts the resources under Puppet control and how many of them failed. Below the first ten reports there is a "View more" button, which brings up the following:

If you click on an individual report you get a fairly long (roughly two pages) page which starts with information like this:

It finishes with information like this:

This page desperately needs reordering. It is currently ordered "Time, Resources, Log", and to be frank, most people aren't going to care about timings except when they are performance tuning. I suggest displaying only the grand total time, with a "Details" button beside it.
The log occasionally shows some helpful information. Unfortunately it is often just filled with md5sums, as in the example above, which is not terribly helpful.
It seems that although progress has been made on stability, functionality is still sadly lacking. The roadmap mentions integrating puppetrun (so that you can trigger Puppet runs remotely). However, we'd like to run Puppet in --noop mode frequently, and without --noop on a less frequent (or even manual) basis.
We'd also like the dashboard to be able to break reports down by class or tag. For example, perhaps the majority of failures are in the "apache" class rather than the "LDAP" class. Linking reports back to source control would also help, so that failures can be compared with changes in the Puppet configuration. That would not guarantee that the error was caused by the change, or even that the puppetmaster was in sync with source control at the time, but it would be better than nothing.
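A per-class breakdown of failures needs little more than a count per tag. Puppet automatically tags each resource with the names of the classes that contain it, so counting failures by tag approximates counting them by class. This is a minimal Python sketch of the idea. The dictionary layout used for each resource status is a hypothetical simplification, not the real report format, and `failures_by_tag` is an illustrative name.

```python
from collections import Counter


def failures_by_tag(resource_statuses, tags_of_interest=None):
    """Count failed resources per tag (e.g. per containing class).

    resource_statuses: iterable of dicts with a boolean "failed" key and a
    "tags" list. This layout is a hypothetical stand-in for a Puppet report.
    tags_of_interest: optional set restricting which tags are counted.
    """
    counts = Counter()
    for status in resource_statuses:
        if not status["failed"]:
            continue  # only failed resources contribute to the breakdown
        for tag in status["tags"]:
            if tags_of_interest is None or tag in tags_of_interest:
                counts[tag] += 1
    return counts


if __name__ == "__main__":
    statuses = [
        {"title": "Service[httpd]", "failed": True, "tags": ["service", "apache"]},
        {"title": "Package[httpd]", "failed": True, "tags": ["package", "apache"]},
        {"title": "File[/etc/ldap.conf]", "failed": False, "tags": ["file", "ldap"]},
    ]
    # Restrict the breakdown to the class names we care about.
    print(failures_by_tag(statuses, {"apache", "ldap"}).most_common())  # [('apache', 2)]
```

Summed across all reports for a period, the same count would show at a glance which classes account for most failures.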