Update: There has been some good discussion on this post at the LinkedIn group on Scientific Software Development and Management. See also the papers by Victoria Stodden and the great complementary article by Ilian Todorov.
Update 2: There is now a Part 2 to this post: The research software engineer.
Last week I attended the 2012 Collaborations Workshop at Queen’s College in Oxford. Organized by the Software Sustainability Institute, its goal was to bring together software developers and researchers and reflect upon how both groups interact and whether anything needs to be changed.
I only found out about the two-day workshop and the existence of the SSI a few weeks before but immediately signed up. It was the first time I attended a conference so relevant to my own position and work. There were about 50 attendees, all in a similar position: PhD degree, working in academia or a research lab, strong computational/software skills and working closely with researchers from at least one other scientific field (with a strong representation from biology/chemistry).
The conference went very smoothly, expertly managed and organized by Simon Hettrick and Neil Chue Hong. There were hardly any conventional talks, rather lightning talks and a whole series of break-out sessions which resulted in a lot of interesting discussions. One of the fundamental problems that kept coming up was the problem of defining ourselves as a group. What kind of species were we?
A new species?
On the one hand hardly anybody felt comfortable being called a software developer. The unease, I think, stems from the fact that the label would imply their (our) work is ‘just’ (note the quotes) about setting up websites, database apps, writing GUIs, etc., while in reality such people often have to deal with very complex algorithmic problems, obscure FORTRAN codes, big data problems, HPC resources, etc. Additionally they need to have a good understanding of the problem domain at the cutting edge of some scientific field as well as have technical writing skills (for papers/grants) and pedagogic skills (for dealing with students).
On the other hand, labeling this new species as researchers doesn’t quite fit either, as their work does not involve the classical scientific method of hypothesis generation & testing. Nor are they exclusively interested in writing papers and obtaining grants (the only thing an academic fundamentally has to care about). They may very well contribute to papers or write a couple on the tools they developed but that is typically where it ends. Their main output is code.
So if this new species is not a software developer and not a researcher, what is it then? Scientific software engineer? Research engineer? Technical consultant? Technology specialist? Research software developer? … Many terms were discussed but no real consensus was reached. Personally I think the term Microsoft uses for this kind of person in their research labs is not far off: Research Software Development Engineer. But then again neither is Computational Scientist or Computational Science Engineer. I think the exact term will come down to your individual preference, depending on where you are happiest on the programmer – scientist spectrum.
The fact that as a group we are unable to name ourselves should be a warning light or at least lead to some reflection. Do we really deserve to be put in a different category? I wonder how developers working in the financial sector deal with this problem or if it exists at all.
A career path
The naming problem brings me to another gripe that came up a number of times: the lack of recognition and of a career path. I can understand the frustration. It’s very easy to stick around in a post-doc type role, hopping from project to project, doing very cool things, but as soon as you start talking about getting married, children, or wanting to move to the next level you have a problem. To my knowledge there simply is no career path at research institutions for this type of role. One of the reasons cited for this is that the main output of this new species is software, something which is not valued at the same level as a publication. The manifesto strives for this and it’s something that is changing (e.g., beyond-impact.org and the recently launched Journal of Open Research Software) but the funding bodies need to catch up.
Are people right to feel underappreciated and underpaid? Or are we all just a bunch of moaners? Looking at the way science has evolved over the past 10-20 years shows that we should at least deserve some appreciation. The computational element in scientific research has grown enormously over the last two decades and there is no branch of science left untouched by its reach. I would even argue that day-to-day scientific research would quickly cease to function without the countless codes, data warehouses, clusters, websites and servers built, maintained, and tested by people of this new, unnamed species. Luckily I’m not the only one commenting on this: see the blog posts of Joss Winn and Paul Walk for example. Turns out there even is a manifesto you can sign online. With founding signatories including the likes of Peter Norvig it should be something to take note of.
A consequence of this is that research institutions are currently losing very valuable people. I have personally known people with great skill & knowledge being forced to move into industry due to the lack of a longer term career path. Note that in all this I do not want to imply that working in industry is boring, uninteresting or a step down. By no means. However, by definition the fact of working in a research institution means you tend to get to work on very cutting edge stuff and have more freedom, especially at a university. A lot of the people would like to stay at university for exactly this reason.
One response to all of this is that this problem will go away once researchers start learning proper software development, which some argue should be a prerequisite anyway. Then they will be able to manage their own tools & clusters. In my opinion, for small, domain specific, algorithmic projects, yes. For bigger things, no. It really needs a dedicated person properly trained in software development, testing, maintenance, deployment, continuous integration, etc. I have seen a lot of code produced by non-computer scientists over the years and I have to say I’m not worried my job will disappear any time soon 🙂
Another response I have heard is that a supporting role (e.g., similar to a lab technician) should cover this new role adequately. The objection here is that people feel they do more than simply support researchers. The tools they build typically form the core of the research activity and it’s not unusual for them to help drive the research direction forward. A counter response is then that people should just choose one of the two watersheds. Become a researcher, do the cool stuff, and live with the fact that your hacking time will be partially replaced by writing papers and looking for grant money. Alternatively go to industry, enjoy a career path and higher pay, and live with the fact that you lose some freedom and interaction with cutting edge research. If you are lucky you may even manage to land an industry job which has the best of both worlds. There is something to be said for this you-can’t-have-your-cake-and-eat-it argument: should we just grow up and make a decision?
This brings me to a next point: mobility. It is generally non-trivial to move between academia and industry, especially more than once. The ease with which you cross from one watershed to another will strongly depend on the contacts you have and how you have spent your time (this is true for any sector really). If after a 10 year career as an academic you decide you don’t want to become a professor and move to industry, I can imagine a company scratching their head, wondering what to do with you when you apply for a software engineering position. It’s very unlikely you will be able to start at a level matching your years of service. That’s fine if you progress faster (which you would assume) but it’s not an easy hit to take and the reality may turn out quite different.
Thou shalt learn
Core here is the obligation to develop yourself. Typically computer scientists working inside research groups are lone rangers (at least that has been my experience). They are seen as the center of expertise when it comes to computers & computation and typically have full control over the technologies they use and how. A disadvantage of this is that your knowledge is seldom challenged and the pressure to keep learning and up your level may wane. It also becomes hard to benchmark your skills and really know where you stand. When this happens and such people move to industry it naturally does not help the image of academia. The fact that, unlike in industry, the majority of the software developed in academia never leaves the research group or gets released in a formal manner does not help either. Though luckily there is pressure for this to change.
In that way a computer scientist in academia has exactly the same responsibility as a computer scientist or software developer anywhere else: read blogs, follow Twitter feeds, solve Project Euler problems (or similar), learn a new language, attend hackathons, etc. There should be no hiding behind research.
With that I think I have gone on long enough. This has become quite a lengthy post and I don’t think I have really solved any problems. But at least I should have given a good overview of some of the discussion that went on at the workshop and thrown it out in the open. I can only hope this will encourage more people to join the discussion, endorse the manifesto, and voice their opinions. The closest place of discussion I found so far is the LinkedIn group Scientific Software Development and Management but I would be very interested in other resources and fora. Good starting points are probably the DevCSI and LNCD communities.
Me personally? So far I have walked the line between computer scientist and researcher almost perfectly, but this issue has been on my mind for some time. Though interestingly, when I registered for the conference and was forced to choose between a red (developer) and green (researcher) badge, I chose the green one.