Implementing the Virtual Data Warehouse: August 2010

There’s tons of talk about the NIH’s Collaboratory initiative among HMORNsters. It’s still pretty undefined (RFA comes in early October last I heard) but that doesn’t stop us from having fun speculating on what it will/should entail in terms of scope and funding. Here’s my take (from my data-centric and generally geeky point of view) on what an optimal Collaboratory would look like.

I should maybe note that even though I’m located at Group Health where the cabal leading the proposal development is, I do not have any inside information about what the people actually in control are formulating—I am not part of the cabal. This is just me spouting off.

I start from the premise that NIH’s main goals are to increase the research bandwidth of the HMORN and to increase the access that non-HMORN investigators have to HMORN scientific staff and data. That may be incomplete or flawed, but that’s the impression I have. Those goals should be easy to objectively evaluate: if we did X studies/year in a pre-collaboratory world, we should be doing X + Y post-collaboratory. If our research involved people from A different external organizations pre-collab, it should involve A + B external orgs post-collab. The larger Y and B are, the more successful we have been.

Like crime, collaboration requires motive, means, and opportunity. The players who have resources (scientific & industry expertise; data) have to be motivated to collaborate. There has to be something in it for them: some good they can achieve by collaborating that they cannot achieve without. I don’t have much to say about motivation other than it is absolutely crucial, and is probably the toughest nut to crack.

So assume motivation away. If we can assume motivated players, how could a collaboratory enable their collaboration?

By creating an environment where Investigators can get to know and trust one another

It’s very tempting (particularly for us technical folk) to focus on the technical issues involved in collaboration: how do we develop the dataset we need to address our questions, and once we have it, how do we get it from point A to point B? This is because these are, by and large, fun problems with achievable technical solutions. But we humans are primitive creatures, and way before any of the fun technical issues can come up, we have to establish a basic level of trust between the people at the data sources and those at the destinations. That requires a lot of good old-fashioned schmoozing, and for that, there’s nothing like in-person meetings.

That’s particularly challenging for a distributed, virtual organization like the HMORN. Staff come together at the HMORN and other scientific conferences during the year, but those contacts are too few and far between to foster real trust relationships as quickly as we will want them to form. So we need to supplement these contacts with second-best, electronic contacts.

Specifically, the collaboratory should include a Social Networking component: a means for the community of Investigators to discover and engage with one another informally. We need a Facebook for investigators, where they can describe their interests, expertise, achievements and goals, communicate informally with one another about their current professional ideas and activities, and solicit/offer each other’s cooperation.

By creating a defined, discoverable and transparent process by which new proposals are evaluated and passed on

Right now there’s a lot of “you gotta know a guy” to playing in the HMORN. How do you get included in an HMORN grant application? You gotta know a guy. How do you get your foot in the door to initiate your own project idea? You gotta know a guy on the inside. (And I use the non-gender-specific sense of the word ‘guy’ here.)

A large part of that is the abovementioned trust issue. If only people we already know & trust can play in our reindeer games, then we don’t have to work with anybody we consider untrustworthy. Once someone is part of the informal inner circle, we can make the proper introductions and grease the skids for collaborations with them.

But some ideas are too time-sensitive or too good to put off. We can’t always wait for trust relationships to form naturally. For those ideas to have any hope of bearing fruit, we need to have an articulated process that outsiders can use to engage with the HMORN.

I quite like the way the Cancer Research Network handles this. They have a process laid out with an inquiry form to fill out on their public-facing website. The form asks who you are, what organization you’re with, what your idea is, whether you’re close to a CRN site, etc. Filling out that form results in an e-mail to CRN’s project manager (and an entry in a tracking database). The PM assigns each inquiry a “Collaboration Navigator” who is charged with helping the inquirer through the process of engaging with the CRN.

The collaboratory should use a similar process to help qualified candidates engage with the HMORN.
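For the geeks in the audience, here is a minimal sketch of what that kind of intake process might look like in code. Everything in it is my own assumption for illustration: the field names, the `InquiryTracker` class, and especially the round-robin assignment of navigators are hypothetical, and none of it describes how the CRN actually implements its process.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List, Optional


@dataclass
class Inquiry:
    """One submission of the public inquiry form (field names are hypothetical)."""
    name: str
    organization: str
    idea: str
    near_site: bool                    # is the inquirer close to a network site?
    navigator: Optional[str] = None    # filled in when the inquiry is logged


class InquiryTracker:
    """Stands in for the tracking database plus the PM's assignment step."""

    def __init__(self, navigators: List[str]) -> None:
        if not navigators:
            raise ValueError("at least one navigator is required")
        # Assumption: navigators are assigned in simple rotation.
        self._rotation = cycle(navigators)
        self.inquiries: List[Inquiry] = []

    def submit(self, inquiry: Inquiry) -> Inquiry:
        """Log the inquiry and assign it the next navigator in the rotation."""
        inquiry.navigator = next(self._rotation)
        self.inquiries.append(inquiry)
        return inquiry


tracker = InquiryTracker(navigators=["navigator_a", "navigator_b"])
logged = tracker.submit(
    Inquiry(
        name="Pat Example",
        organization="Example University",
        idea="Medication adherence after discharge",
        near_site=True,
    )
)
print(f"{logged.name} -> {logged.navigator}")
```

The point is less the code than the shape of the thing: every inquiry gets recorded, and every inquiry gets a named person responsible for it, so nobody has to know a guy.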

By making the HMORN and its scientific and data assets discoverable and documented

If we intend to collaborate with people from external organizations, those people are going to need to know what they would be getting into. Exactly what sorts of data do we have? Over what periods of time? How big is the population? Are there outstanding usability issues with the data, or is it ready to go? What sorts of uses has the data been put to in the past? Can I use my own programmer for the wrangling, or do I have to get the time of someone on the inside?

Potential collaborators will have these and a host of other questions. The more of these we can document in writing, the fewer we will have to answer (again and again and again) verbally. Because the answers to these questions will change over time, the best means for documenting them is an easily edited, publicly accessible website. The collaboratory should fund the creation of such a website.

Friday, August 27, 2010

Shoulding all over the Collaboratory
