Dynamically generated linked dropbox menus with R and Shiny

I've been recently tasked with coming up with a time tracking solution for our team. We work on multiple projects, some of which are publicly funded, and this means we need to be able to track individual project contributions. Due to various technicalities it makes more sense to track percentage contribution instead of actual hours. In addition some projects have sub-projects for which we need to track contributions as well.

After wrestling with Redmine and various plugins I decided that "there has to be a better way". I've been using R and Shiny for another project so I thought I would use my newly acquired R chops to build a Shiny (pun intended) time tracking web application. This gives us all the power of R for any further manipulation we are likely to need in the future, combined with the simplicity of a nicely designed web interface.

The stars seemed to align: the excellent Dean Attali had recently released an article on using R and Shiny to build Google Forms-like sites for data collection. Perfect! This became the basis for the web app, except that I switched from CSV-based storage to MySQL, and I pull some of the category data (project names and user names) from the database.

All the code shown here is part of a minimal working example which I will upload later. I've noted where I normally query the MySQL database so you can go ahead and mirror that in your own code. Also note that this is my solution, not the best solution or the most concise.

Dynamically generated UI

Each user needs to be able to submit their weekly project contribution against any number of projects. There are multiple design solutions to this problem, like having add/remove buttons to add a contribution to a project. This approach has too many sources of error so I decided to simply ask the user how many projects they would like to record and then generate that many groups of contribution fields.

Each contribution requires the user to answer three questions: Which project is it? Which sub-project, if any, is it? What is the size of the contribution? So there are three fields, one for each question. One group of fields is generated for each of the N contributions a user would like to track.

The three elements are dynamically generated in server.R. The first snippet of code generates a list of input objects for each contribution to be recorded:

build_fields <- function(entries_to_add) {
  # Placeholder values; normally this is where I LOAD DATA FROM MySQL
  list_of_projects <- c("Project A", "Project B")

  # seq_len() avoids the 1:0 surprise if entries_to_add is ever zero
  lapply(seq_len(entries_to_add), function(entry) {
    list(
      column(4, selectInput(paste0("project_field", entry), "Project Name:", choices = c("", list_of_projects))),
      column(4, selectInput(paste0("subproject_field", entry), "Sub-project Name:", choices = c(""))),
      column(4, sliderInput(paste0("contrib_field", entry), "How much of week contributed:", min = 0, max = 100, step = 5, value = 5, post = "%"))
    )
  })
}

contribution_ui_generator <- eventReactive(input$add_more, {
  entries <- input$number_of_projects  # must match the slider's inputId in ui.R
  build_fields(entries)
})

output$ui_contributions_fields <- renderUI({
  contribution_ui_generator()
})

And then displayed in the ui.R:

user_form <- 
  div(id = "form",
      fluidRow(
        column(4, selectInput("user_name", "Select Name of Researcher:", choices = users)),
        column(4, sliderInput("number_of_projects", "How many contributions to add?", min = 1, max = 10, value = 2))
      ),
      fluidRow(
        column(4, actionButton("add_more", "Add Contributions"))
      ),
      hr()
)

shinyUI(fluidPage(
  titlePanel("Example Dynamic UI"),
  user_form,
  uiOutput("ui_contributions_fields")
))

Updating menu content

Next the sub-projects field needs to update depending on which project is selected. You've probably seen these kinds of dependent or linked dropbox menus, especially with country and state/county fields in registration forms. When you select USA, the state menu is populated with the American states; if you select UK you get a list of counties, and so on.

All the sub-project information is stored in the MySQL database, so I pull that information using the RMySQL package, apply a simple filter by projectID and select only the name column. This gives me the list of sub-projects associated with the selected project.

The content of a selectInput can be updated using the updateSelectInput() function as follows:

updateSelectInput(session, "subproject_field", choices = c("new", "choices"))

Creating an observer

In order to react to changes in the project name field I create an observer that updates the choices in the sub-projects field when the value of the project field changes:

observe({
  selected_project <- input$project_field
  # LOAD DATA FROM MySQL into `subprojects` (needs projectID and name columns)

  list_of_subprojects <- subprojects %>%
    filter(projectID == selected_project) %>%
    .[['name']]

  updateSelectInput(session, "subproject_field", choices = list_of_subprojects)
})

But then I ran into a problem: I can build dynamic UI elements and I can create observers for known input objects, but how does one create observers for an arbitrary number of objects?

Bringing it all together

The trick here is to realize that you can create observer objects in a loop and also nest reactive and observer objects.

So after all the fields are built, we generate an observer connecting each project dropbox to a subproject dropbox. There are a lot of parts to the following code, so let's walk through it. First we loop over all the generated entries and create an observeEvent object for each one, which reacts to changes in the project dropbox selection. When it fires, it takes the subproject data and keeps only the rows whose projectID matches the project selected in the dropbox menu. The subproject dropbox menu choices are then updated to include all the associated subprojects.

You'll note that I am using the local() function here. This binds the value of entry to the current iteration; otherwise only one of the dropbox menus is updated. Finally I return a div object containing all the contribution fields:

contribution_ui_generator <- eventReactive(input$add_more, {
  entries_to_add <- input$number_of_projects
  contribution_fields <- build_fields(input$number_of_projects)

  for(entry in 1:entries_to_add) {
    local({
      local_entry <- entry
      project_name_field <- paste0("project_field", local_entry)

      observeEvent(input[[project_name_field]], {
        local_project_field_name <- project_name_field
        subproject_field_name <- paste0("subproject_field", local_entry)

        selected_project <- input[[local_project_field_name]]

        list_of_subprojects <- subprojects %>%
          filter(projectID == selected_project) %>%
          .[['name']]

        updateSelectInput(session, subproject_field_name, choices = list_of_subprojects)
      })

    })
  }

  div(id = "list_of_contributions", contribution_fields)
})

This should set you up with dynamically generated, linked dropbox menus. This technique can be used to create observers for most types of dynamically generated UI elements.

Thank you for reading, I hope this was helpful. Let me know if you've found a better way or have any questions.

Adventures in Refactoring #1 - Is it left or right?

Adventures in refactoring is a series covering my progress in refactoring an old codebase I wrote at the end of my PhD. This is a learning exercise to improve my design and refactoring skills and an excuse to learn C++11 on a realistic code base.

As part of my PhD I had to design and implement a large analysis scheme to calibrate our b-tagger on the latest data from the ATLAS detector. As this was the first time I had the reins on a whole project from start to finish, I took time to think through the problem and implement it right. I chose the latest tools provided by the collaboration and made sure to create a large set of small classes. The road to hell is paved with good intentions.

As the project grew larger and larger, and the specifications changed or expanded, the application grew into an unrecognisable monstrosity full of scary hacks and duplicated code. Sound familiar?

I've recently devoured several of Scott Meyers' C++ books, Michael Feathers' Working Effectively with Legacy Code, Robert Martin's Clean Coder, and Martin Fowler's Refactoring. So I decided to apply some of the stuff I'd learnt to my calibration project. Since this code is not going to be used any more I am not afraid to break it.

You can check out my progress on the GitHub repository. Note that the code might not match, since I am making changes to the code as I write these posts.

Pandora's box

So the analysis scheme is mostly written in C++ with a few bash scripts to manipulate files and simplify running the analysis routines.

There were/are two major problems with the code. First, the code is not under test. This is not an uncommon situation; in fact Michael Feathers' book is dedicated to that very problem. The tactic is to take things slowly, teasing methods and classes apart and placing small portions of code under test. This slow iterative approach reduces the danger of breaking something and over time improves your ability to make significant changes with confidence.

The other problem is that there are many explicit dependencies on external packages. This is also quite common, but it requires a more considered approach. Each dependency needs to be considered separately, and as it turns out, the way in which the code is going to be used directly affects which dependencies you break and how.

A little less conversation

So instead of talking through the whole system, what the classes look like and what everything does, I am just going to show you some code and we shall decipher things as we go along. I present to you TJPsiTagSelector:


#ifndef TJPSITAGSELECTOR_H_
#define TJPSITAGSELECTOR_H_ 1

#include 
#include 
#include 

class TJPsiTagSelector {
public:
  /// Standard ctor
  TJPsiTagSelector(const std::string& val_name="TJPsiTagSelector");

public:
/// Standard dtor
  virtual ~TJPsiTagSelector();

public:
  /// Initialize
  int initialize(void);

public:
  /// Test if muon passes
  int accept(const D3PDReader::MuonD3PDObjectElement& muon);

public:
  /// Test if muon passes
  int accept(float eta,
             int combinedMuon,
             float pt,
             float d0,
             float z0,
             float d0Sig,
             float z0Sig);

public:
   /// Finalize stuff
   int finalize(void);

public:
  std::string name;

public:
  // Cut values and names
  float   etaCut;
  int     combinedMuonCut;
  float   trackMatchDrCut;
  float   ptCut;
  float   d0Cut;
  float   z0Cut;
  float   d0SigCut;
  float   z0SigCut;

  ClassDef(TJPsiTagSelector, 1);
}; // End TJPsiTagSelector

#endif // END TJPSITASELECTOR_H_

You should be vomiting profusely at this point. There are formatting problems, non-existent encapsulation, unnecessary comments, unclear method and variable names, dependencies on implementation details, and even unnecessary included headers. Can you tell what this class does? It takes in a muon object and determines whether it passes a kinematic selection. Conceptually the particle is then known as a tag. Is that clear from the code? Absolutely not!

Even this small piece of code reveals a lot of the smells present throughout the code base. The formatting is something that can be easily fixed using automated tools without the need for tests. Let's do that first.

After cleaning things up the class looks a little bit better. Did you spot the typo?



    #ifndef TJPSITAGSELECTOR_H_
    #define TJPSITAGSELECTOR_H_ 1
    
    #include 
    
    class TJPsiTagSelector {
    public:
      TJPsiTagSelector(const std::string &val_name = "TJPsiTagSelector");
      virtual ~TJPsiTagSelector();
      int initialize(void);
    
      // Test if muon passes
      int accept(const D3PDReader::MuonD3PDObjectElement &muon);
      int accept(float eta, int combinedMuon, float pt, float d0, float z0,
                 float d0Sig, float z0Sig);
    
      int finalize(void);
    
    public:
      std::string name;
      float etaCut;
      int combinedMuonCut;
      float trackMatchDrCut;
      float ptCut;
      float d0Cut;
      float z0Cut;
      float d0SigCut;
      float z0SigCut;
    };
    
    #endif

I left a single comment behind, which clarifies what the two functions called accept do. Everything else was either stupidly obvious (of course finalize does the finalizing of things) or extraneous, such as the comments at the end of the class. I would like to make the member variables private and create accessor methods instead, but that means messing with an unknown number of clients. I want to avoid that at this point, especially with no tests in place.
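To make that trade-off concrete, here is a minimal sketch of what encapsulating a single cut could look like. The class name EncapsulatedSelector and the accessors etaCut(), setEtaCut() and hasEtaCut() are made up for illustration and are not part of the code base. Every client that currently writes to the public member directly would have to switch to the setter, which is exactly the kind of churn that is risky without tests.

```cpp
#include <limits>

// Hypothetical illustration only: not the real TJPsiTagSelector.
class EncapsulatedSelector {
public:
  // Read access to the cut value.
  float etaCut() const { return etaCut_; }

  // Write access; a natural place to add validation later on.
  void setEtaCut(float value) { etaCut_ = value; }

  // True once the cut has been moved away from its "unset" sentinel.
  bool hasEtaCut() const {
    return etaCut_ != std::numeric_limits<float>::max();
  }

private:
  // Same sentinel idea as the original constructor: max() means "not set".
  float etaCut_ = std::numeric_limits<float>::max();
};
```

The upside of this shape is that the sentinel value becomes an implementation detail: callers ask hasEtaCut() instead of comparing against std::numeric_limits themselves.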

First let's try to get this class into a test harness. The constructor is fairly straightforward; it does nothing but set default values for the kinematic cuts and name:

TJPsiTagSelector::TJPsiTagSelector(const std::string &val_name)
 : name(val_name), etaCut(std::numeric_limits<float>::max()),
   combinedMuonCut(-1), ptCut(std::numeric_limits<float>::min()),
   d0Cut(std::numeric_limits<float>::max()),
   z0Cut(std::numeric_limits<float>::max()),
   d0SigCut(std::numeric_limits<float>::max()),
   z0SigCut(std::numeric_limits<float>::max()) {}

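Let's start by constructing an empty object, providing no parameters:

TEST_F(TestTagSelector, initialTestConstructingObjectWithNoParameters) {
  // No parentheses here: `TJPsiTagSelector selector();` would declare a
  // function instead of constructing an object.
  TJPsiTagSelector selector;
}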

I compile and of course the code runs.

This is not super exciting, but I like to take this approach of building the simplest version of an object first before moving on to more complex testing. Oftentimes even that is quite difficult and requires too much work. That's how I've assessed where to start the refactoring. It is no coincidence that we started with TJPsiTagSelector.
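One C++ detail is worth a quick sketch when writing a construction test like this. A declaration of the form `T x();` is parsed as a function declaration, not as a default-constructed object, so it compiles happily without ever calling the constructor. The stand-in struct Widget and the two helper names below are hypothetical; the checks only rely on the standard type_traits header.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in with a defaulted constructor argument,
// mirroring the shape of the TJPsiTagSelector constructor.
struct Widget {
  explicit Widget(const std::string &val_name = "Widget") : name(val_name) {}
  std::string name;
};

// With parentheses: this declares a function that takes no arguments
// and returns a Widget. No object is created.
Widget declaredFunction();

// Without parentheses: this really runs the constructor.
inline std::string defaultWidgetName() {
  Widget object;
  return object.name;
}

// The compiler confirms that the first declaration is a function.
static_assert(std::is_function<decltype(declaredFunction)>::value,
              "T x(); declares a function, not an object");
```

In C++11 an alternative is brace initialization, `Widget object{};`, which cannot be mistaken for a function declaration.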

Initialize methods

From other classes in the same package as TagSelector I know that initialize is actually a misnomer: it's meant to ensure that you have set the cut variables and warn you otherwise. Unfortunately TJPsiTagSelector::initialize merely returns one. This is very bad, as there is no checking at all. This is where I write my first test: initialize should return zero if any of the cuts are not set.

So I remove the empty test above and add a check for the failing case:

TEST_F(TestTagSelector, InitializeReturnsFalseIfCutsAreNotSet) {
  // A stack object avoids leaking the selector when the test ends.
  TJPsiTagSelector invalidSelector;
  EXPECT_EQ(0, invalidSelector.initialize());
}

With the test in place I make the necessary changes:

// The declaration in the header needs the same const qualifier.
int TJPsiTagSelector::initialize() const {
  if(etaCut == std::numeric_limits<float>::max()) return (0);
  if(combinedMuonCut == -1) return (0);
  if(ptCut == std::numeric_limits<float>::min()) return (0);
  if(d0Cut == std::numeric_limits<float>::max()) return (0);
  if(z0Cut == std::numeric_limits<float>::max()) return (0);
  if(d0SigCut == std::numeric_limits<float>::max()) return (0);
  if(z0SigCut == std::numeric_limits<float>::max()) return (0);

  return (1);
}

Note that this is a big chunk of code to write and you should go more slowly, adding tests for each individual cut to be set. Since there is almost no logic here I decided to test all values not being set and move on. I compile and run the test, everything passes.
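A sketch of what going more slowly might look like: one check per cut, each starting from a fully configured selector and un-setting a single value. To keep the example self-contained it uses a trimmed-down, hypothetical MiniSelector with only three of the cuts, a made-up helper called configured(), and plain functions returning bool instead of the real test fixture. The cut values are purely illustrative.

```cpp
#include <limits>

// Hypothetical, trimmed-down stand-in for TJPsiTagSelector.
struct MiniSelector {
  float etaCut = std::numeric_limits<float>::max();
  int combinedMuonCut = -1;
  float ptCut = std::numeric_limits<float>::min();

  // Returns 1 only if every cut was moved away from its sentinel value.
  int initialize() const {
    if (etaCut == std::numeric_limits<float>::max()) return 0;
    if (combinedMuonCut == -1) return 0;
    if (ptCut == std::numeric_limits<float>::min()) return 0;
    return 1;
  }
};

// A selector with every cut set to an illustrative value.
inline MiniSelector configured() {
  MiniSelector s;
  s.etaCut = 2.5f;
  s.combinedMuonCut = 1;
  s.ptCut = 4000.0f;
  return s;
}

// One focused check per cut: un-set a single value and expect a failure.
inline bool failsWithoutEtaCut() {
  MiniSelector s = configured();
  s.etaCut = std::numeric_limits<float>::max();
  return s.initialize() == 0;
}

inline bool failsWithoutCombinedMuonCut() {
  MiniSelector s = configured();
  s.combinedMuonCut = -1;
  return s.initialize() == 0;
}

inline bool failsWithoutPtCut() {
  MiniSelector s = configured();
  s.ptCut = std::numeric_limits<float>::min();
  return s.initialize() == 0;
}
```

Each of these would map onto its own small test case, so a failure points straight at the cut whose check is missing or wrong.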

I then create a TagSelector object and set it up with some cut variables. This will be the object which is used throughout the tests. This gets put in the SetUp method, with a corresponding clean-up in TearDown:

class TestTagSelector : public ::testing::Test {
  //...

  TJPsiTagSelector* selector;

  virtual void SetUp() {
    selector = new TJPsiTagSelector();
    selector->etaCut = 2.5;
    selector->combinedMuonCut = 1;
    selector->ptCut = 4000;
    selector->d0Cut = 0.3;
    selector->z0Cut = 1.5;
    selector->d0SigCut = 3.0;
    selector->z0SigCut = 3.0;
  }

  virtual void TearDown() {
    delete selector;
  }

  //...
};

By the way, these values correspond to the ones used for the real analysis. I then add a test for the case where initialize should return one given that the cuts were set:

TEST_F(TestTagSelector, InitializeReturnsOneIfCutsAreSet) {
  EXPECT_EQ(1, selector->initialize());
}

Compile and test, everything passes. Nothing crazy but every marathon starts with a first step. I am definitely not done with initialize, the name is ridiculous and I need to also test for invalid values, such as a negative pt cut. Note that there are probably more correct ways of checking the input, but once again this is a first step.
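As a rough idea of where the invalid-value checks could go, the sketch below separates "not set" from "set but nonsensical". The enum CutStatus and the function checkPtCut are hypothetical names, and treating every negative pt cut as invalid is an assumption based on the example above rather than something taken from the real analysis.

```cpp
#include <limits>

// Hypothetical result type: distinguishes the three cases a caller cares about.
enum class CutStatus { Ok, NotSet, Invalid };

// Classifies a pt cut value.
// Note: for float, std::numeric_limits<float>::min() is the smallest
// positive normalized value, not the most negative one, so negative
// values sort below the "unset" sentinel used by the constructor.
inline CutStatus checkPtCut(float ptCut) {
  if (ptCut == std::numeric_limits<float>::min()) return CutStatus::NotSet;
  if (ptCut < 0.0f) return CutStatus::Invalid;
  return CutStatus::Ok;
}
```

A richer return type like this would also make it possible to warn about which cut is wrong, instead of collapsing everything into zero or one.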

In the next instalment I will start work on accept since that's where the meat of the class is and things get way more interesting (and complicated).

Becoming a better programmer in the new year

Welcome to 2015, I hope you've made the transition well and with all of your limbs intact. This year I am focused on learning as much about programming and being a professional software developer as I can. My hope is to join a good development team that I can learn a lot from and contribute to in return.

To that end I have set up a personal Kanban board to keep track of all the things I want to learn. Oh boi! Let me tell you, there is a lot of stuff I want to learn. Since I don't have enough space to set up a proper board, the Kanban is currently on the inside of my notebook and that has been working quite well.

One of the projects I've started on now is the development of a speedrunning split timer in Python. Speedrunning is the practice of attempting to finish games as quickly as possible by exploiting glitches and clever routing through the game. Runners will set up a set of time splits throughout the game to keep track of their progress compared to world records or personal bests. There are numerous split timers out there but I've not been able to get any of them running on my Mac without crashing.

I figure this would be a nice, real-world pet project I can develop, improving my Python in the process. This is not my first exposure to the language but it has been a while.

On the list is also learning about UML, Waterfall development, structured design and analysis, more C++, more refactoring techniques, more testing techniques, and more algorithms.

One step at a time. No pressure.