The Fellowship of The Data

Hey you, how's it going? You're looking mighty fine! (Insert mandatory "haven't blogged in a while" excuse here.)

Next week I'm starting an eight-week fellowship organized by the Advanced Skills Initiative. Yes, it does sound like something out of the Bourne universe. It's not, though. I think.

The first two weeks or so will consist of workshops and lectures, and for the rest of the time I'll be working on a personal project for an organization. I don't know yet which organization that will be, but it should be interesting work regardless. In addition to the lectures, we'll have talks from people in the industry and various demo days where we will show off our work.

Apparently there will be around 20 of us in the programme, all PhD students with some kind of numerical analysis background. I am looking forward to meeting everyone, but I must admit I'm a bit intimidated.

Hobbits really are amazing creatures

The programme is said to be very intensive, so I'm using the time I have left to prepare for it. Data scientists need to develop a variety of skills:

  • Statistical data analysis
  • Machine learning
  • Programming languages
  • Business insight
  • Communication and networking skills

I want to develop as many of these skills as possible, but to hit the ground running I'm focusing on the tools of the trade: programming skills!

And my axe!

Data scientists seem to be using one of two languages: R, a statistical analysis language, and Python, the well-known general-purpose programming language. I decided to go with Python for three reasons: I have used it before and am already somewhat familiar with it; it is more general-purpose than R and therefore more valuable to know; and the industry seems to be moving in that direction anyway.

To get familiar with some of the packages involved, I am recreating the last portion of my PhD work using a combination of Python and MySQL.

Speak friend and enter

I've managed to install and set up a MySQL server and import a small sample of muon data. I was surprised by how quickly I got this done with some rudimentary Python scripts and MySQL commands. I then connected to the server from a Python script and plotted the transverse momentum distribution of some decent-quality muons. All of this took less than two days. I'm not sure why, but I think it would have taken way longer using an ATLAS-style workflow.
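For the curious, here is a rough sketch of what that kind of workflow could look like. It is not the actual script: it assumes the standard-library sqlite3 module as a stand-in for the MySQL server so that it runs anywhere, and the table name, column names, quality cut and numbers are all made up for illustration.

```python
import math
import sqlite3

# Stand-in for the MySQL server: an in-memory SQLite database.
# The schema and the values are hypothetical, not real muon data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE muons (event_id INTEGER, px REAL, py REAL, quality INTEGER)"
)
sample = [
    (1, 30.0, 40.0, 2),
    (1, 3.0, 4.0, 0),
    (2, 6.0, 8.0, 1),
    (3, 12.0, 16.0, 2),
    (4, 60.0, 80.0, 2),
]
conn.executemany("INSERT INTO muons VALUES (?, ?, ?, ?)", sample)

# Let the database do the selection: keep only decent-quality muons.
# The threshold of 1 is an assumed cut, chosen for the example.
rows = conn.execute(
    "SELECT px, py FROM muons WHERE quality >= ?", (1,)
).fetchall()

# Transverse momentum from the two transverse components.
pts = [math.hypot(px, py) for px, py in rows]


def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins covering [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for value in values:
        if lo <= value < hi:
            counts[int((value - lo) / width)] += 1
    return counts


counts = histogram(pts, n_bins=5, lo=0.0, hi=125.0)
print(counts)
```

With matplotlib installed, the hand-rolled histogram could be swapped for `plt.hist(pts, bins=5, range=(0.0, 125.0))`, and a raw string such as `r"$p_T$"` passed to `plt.xlabel` gives a math-typeset axis label.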

Next I am going to develop the analysis further, introducing the interaction between two tables of data and multiple selections. It's interesting having to think about and structure the analysis in a whole different way. Maybe once I have this developed a bit more I will talk about it.
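To give a flavour of what "two tables and multiple selections" might mean in SQL terms, here is a minimal sketch. Again it assumes sqlite3 in place of MySQL, and the events/muons schema, the column names and the cut values are all hypothetical rather than anything from the real analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per event, one row per muon,
# linked through event_id.
conn.executescript(
    """
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, n_vertices INTEGER);
    CREATE TABLE muons (
        muon_id INTEGER PRIMARY KEY,
        event_id INTEGER,
        pt REAL,
        quality INTEGER
    );
    """
)
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 3), (2, 0), (3, 5)])
conn.executemany(
    "INSERT INTO muons VALUES (?, ?, ?, ?)",
    [
        (1, 1, 45.0, 2),
        (2, 1, 8.0, 2),
        (3, 2, 60.0, 2),
        (4, 3, 30.0, 0),
        (5, 3, 25.0, 1),
    ],
)

# One event-level selection and two muon-level selections,
# combined in a single query through a join on event_id.
query = """
    SELECT m.muon_id, m.pt
    FROM muons AS m
    JOIN events AS e ON e.event_id = m.event_id
    WHERE e.n_vertices >= 1
      AND m.quality >= 1
      AND m.pt > 20.0
    ORDER BY m.muon_id
"""
selected = conn.execute(query).fetchall()
print(selected)
```

The shift in thinking is that the cuts become declarative conditions in one query, instead of nested loops over events and the muons inside them.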

That's it for now. I will leave you with the transverse momentum distribution of some fancy muons that I managed to make. Look at those LaTeX-typeset axis labels! Ooooh!

[Image: The distribution of muon momentum (/img/first_pt.png)]
Figure 1: Muons are fast!