[Andrew Williams]
Introduction and Overview of the Tutorial
Well, we can go ahead and get started anyhow. Anybody who's outside, who's waiting to come in, please come on in. We're very excited to kick this off.
Introduction to OHDSI and Real World Evidence
This is the first tutorial of its type at the annual symposiums. There are lots and lots of tutorials that cover different aspects of how to do a study. This one is going to take a slightly different angle. I'm Andrew Williams. I am at Tufts. Each of the different faculty members who are gonna be presenting today have some experience in running OHDSI studies, and they're gonna introduce themselves. We're not gonna go through a big introduction there. I'm gonna kick things off. And I'm gonna kick things off in part by timing myself. And what we're gonna do is gonna be broken up into two halves. The first part of what we're gonna be going through is what's known about the different aspects of doing a study. And it's gonna have some high-level intro for folks. It's gonna be redundant for the veterans in the room. A lot of folks who've done studies themselves are gonna be contributing to the second part, which is a panel discussion that talks about what has gone well and what hasn't gone as well in the conduct of OHDSI studies.
[00:01:10]
And then a group discussion that's gonna really leverage the great expertise in the room to identify areas we can improve and to develop some momentum using OHDSI resources to improve how studies are done. So basically, that's what this is gonna be about. What will be covered in some detail is the kinds of studies that are typically done, the study design considerations, tools that are used, community participation and what that looks like, roles on teams when you're doing studies, and responsibilities, workflows, project management issues. And then again, in that second part, after the break, we're gonna be going to a panel discussion where people bring examples of when things have really gone well and when they haven't gone to plan.
[00:02:06]
Identify opportunities for improvement. And so you'll have a better sense, if you don't already, of how the OHDSI community organizes itself and works as a group to improve things, and try and initiate some momentum coming out of that conversation for a specific improvement goal. So it should be very exciting, I think. I'm gonna leap in now to this broad overview. And again, I apologize for those who are very familiar with OHDSI already; this is gonna be very redundant. In fact, there'll be some slides that you have seen many times. But you are probably aware that we are a global open science community dedicated to generating reliable medical evidence using EHR data, trying to provide evidence to clinicians, to policy makers, to patients, to help improve their health. And this is done in a federated way: normalized data are analyzed locally and results are shared centrally.
[00:03:03]
It's a large community. We keep coming up with estimates of its size; I'm sure there's an improved estimate out there somewhere, but as of a few years ago, about 12% of the world's population had a record in an OMOP common data model. And studies are often the largest of their kind. And in many cases, they're very impactful. So they've been used to guide regulatory decisions that are very important, to develop therapeutic pathways that are being implemented at a national level in different parts of the world. And impressively, they have done things like anticipate the findings of clinical trials in the same topic area. So that's in general what OHDSI is.
Real World Evidence: Advantages and Challenges
And it is dealing with real world evidence. You're, again, probably quite familiar with this, but it is evidence generated using data from healthcare settings, which has advantages. It tends to be more representative of typical populations and of interventions and less expensive to do than prospective data collection as you would do in a trial or a massive epidemiologic study that's collecting primary data.
[00:04:15]
And it can be done on extremely large populations, as we've already alluded to. It also entails some challenges. So the data, because they are collected for non-research reasons, are subject to all kinds of data quality problems: missingness, inaccuracy, and so forth. And you're not dealing with the exact population you wish you had to answer the question. You're dealing with an enormous convenience sample, which has important methodological implications. One of the characteristic challenges that's often a bugaboo of doing this kind of research is that you don't have complete data on outcomes of interest. A lot of times you're seeing interventions, because that's what people are focused on, but you may or may not really have great confidence that the relevant outcomes associated with those exposures have been captured.
[00:05:01]
So those are some of the major challenges.
OHDSI's Approach to Real World Evidence Generation
Again, this is super high level, orienting what OHDSI is, what kind of evidence we produce, the kinds of studies that are done. In order to meet some of those challenges, there's a really robust set of activities across the community in developing and validating and empirically testing methods for doing evidence generation of these different types, and people implementing the methods against the standardized data in really excellent software that's often undergone a lot of development and maturation, so it can be used at scale. A rapid run through the kinds of questions you can ask of real world data to develop real world evidence, broadly. How is care delivered? This just characterizes the giant sea of healthcare activity out there in the world. This is what populations are exposed to, and it's hard to get a lens on what's really happening at all, so just characterizing: what is happening at scale?
[00:06:01]
What are the incidence and prevalence? Basic information. You can do something that's complementary to your traditional epidemiologic perspective and design: because you've got these huge populations, you can characterize incidence and prevalence well. You can understand what treatments work best in which clinical populations, questions that are often infeasible to answer trial by trial. You'd have to have zillions of trials to subset the populations to ask specific questions about effectiveness in each subset, whereas you can do that algorithmically when you've got standardized data the way we do. If you've also got cost data, then you can not only understand effectiveness, but cost effectiveness, in a way that's often extremely important for policymaking and other kinds of decisions. And you can do surveillance of adverse events, whether from pharmacological interventions, device interventions, and so on: what treatments may pose safety risks to populations.
[00:07:00]
And then there's a fabulous infrastructure for doing prediction models, individualized prediction models that use this rich kind of data to understand the risk of outcomes for given individuals, given their specific characteristics. And then, a little bit different from the clinical evidence generation, and getting back to this community of people: a lot of the challenges are met because there's a fantastic community of people who are doing methods research. That research isn't necessarily designed to produce clinical evidence; it's designed to produce empirical evidence about which statistical methods work best, so those are often done as well.
Federated Nature of OHDSI
And I mentioned it's a federated community. This is a slide many of you are familiar with; I stole it from George and/or Patrick, I'm not sure. On the left-hand side, you'll see the individual data holders in that big square. There are four of them listed there, each of which has their data in an OMOP common data model.
[00:08:01]
Standardized analytics are brought to that harmonized data, the source data having been brought in through an ETL, and the results are shared centrally. So they conduct the analyses locally and share them centrally. And the right-hand side shows the fact that people are continuously working on improving the data standards, the software, and the methodological research, and using it all to do evidence generation. That ends up as a pre-specified protocol that goes to the sites to be run, and then the results get shared and synthesized. And we're gonna be covering all of those different aspects in the tutorial today in more depth.
Data Standardization with OMOP Common Data Model
Data standardization, as you all undoubtedly know, is happening through the OMOP common data model. Again, super fast overview for the folks this is new for. On the left-hand side, that's the clinical data as it gets represented in individual tables
[00:09:00]
that are in a relational model, so you have persons who have visits, and at those visits, you have conditions, and drug exposures, and procedure exposures, and lab results, and all of those are related in a relational database to the person, so you get a longitudinal record for this person over this period of time. You had these visits, and each of these visits, you had these lab results, or these medications prescribed to you, and these indications of a health event, and that all gets tied together in the perhaps most important part in the middle. Those standardized vocabularies are representing both the data as it originally gets captured at a site, and as it gets mapped to a standard concept, and everything in the whole CDM is represented as a concept. The tables themselves, the variables and so on within those tables, everything is computable because it gets represented and persisted always
[00:10:00]
in the same schema and as a concept, and there are a lot of methodological advantages and software development advantages from having it represented that way. So the standardized vocabularies, many hundreds of vocabularies, and their synonyms, and descendants, and ancestors are all represented and can be computed on, and are used to define the concept sets that go, using logic, into cohort definitions, and studies are conducted with those. So that's how data gets standardized in OMOP.
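To make that concrete, here is a minimal sketch of what computing on the vocabularies looks like in practice, using the OHDSI DatabaseConnector package against the CDM's CONCEPT and CONCEPT_ANCESTOR tables. The connection details, schema name, and example concept ID are placeholders for illustration, not part of any particular study.

```r
# Find all standard descendants of one drug ingredient by walking the
# CONCEPT_ANCESTOR table. All connection settings below are placeholders.
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(
  dbms = "postgresql",
  server = "localhost/cdm",  # placeholder
  user = "user",
  password = "secret"
)
connection <- connect(connectionDetails)

sql <- "
SELECT c.concept_id, c.concept_name
FROM @cdm.concept_ancestor ca
JOIN @cdm.concept c
  ON ca.descendant_concept_id = c.concept_id
WHERE ca.ancestor_concept_id = 1308216; -- lisinopril (RxNorm), for illustration
"
descendants <- renderTranslateQuerySql(connection, sql, cdm = "cdm")
disconnect(connection)
```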
Standardized Analytics and Software Ecosystem
and then standardized analytics are applied to those data. There's a really robust open source software community developing software that either implements methods or does other work visualizing things and managing things, assuming those data are in that schema and represented using those standard concepts. This is impossible to read, but it's impressive to me. I put it there because I'm so impressed
[00:11:00]
with my software developer colleagues, who have really matured this ecosystem. So you have unit test coverage, you have all the dependencies documented, who owns it, where it's released, what the dependencies are; it's all very transparent and increasingly reliable, with regression testing to make sure it runs in standard ways regardless of which database you have, and so forth. It's all a really impressive set of standard analytics that get run against those standard data, and then the use of them
Phases of an OHDSI Study
in a given study has a structured form; again, a slide you're familiar with from previous presentations. The phases are along the top here. You'll see the distributed network, so each of those disks along the top are data sets that might be held at different sites, and going from the upper left to the bottom right are the typical phases of a study, where for each of the different phases, there'll be some characterization of whether a threshold has been met, some diagnostic.
[00:12:06]
So that happens for data quality in general: is the data quality of any of those disks high enough to contribute to the study? It goes for the cohort definitions that are developed from the data, if it is of sufficient quality; there are various tools for assessing the performance of computable phenotypes, or cohort definitions, as they're used. And then, depending on the kind of analysis you're doing, there are all sorts of assumptions about those particular analyses, and there's support for transparent inspection of how well the assumptions that are made analytically are met. And it's only after you've passed each of those three stop signs that final unblinded results are shared. And that's, I would say, still more aspirational. You know, we could talk about this. I think it really reflects best open science practices. It's meant to be occurring in a framework where you have a GitHub site for each study.
[00:13:07]
Martijn's outside, and I may be showing one of his slides here. And it's really meant to prevent p-hacking: to have this open representation of a protocol that goes through each of these steps, where whether you've passed each one or not is publicly inspectable and versionable.
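As one example of what those diagnostic gates look like in code, here is a minimal sketch of running cohort-level diagnostics with the OHDSI CohortDiagnostics package. Treat it as illustrative: the argument names reflect recent versions of the package and may differ in yours, and the schema names, cohort table, and database ID are placeholders.

```r
# Run standard cohort diagnostics and export results for central review.
# Schema, table, and ID values below are placeholders.
library(CohortDiagnostics)

executeDiagnostics(
  cohortDefinitionSet = cohortDefinitionSet,  # built earlier from your cohort JSONs
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "cdm",
  cohortDatabaseSchema = "scratch",
  cohortTableNames = CohortGenerator::getCohortTableNames(
    cohortTable = "my_study_cohorts"
  ),
  exportFolder = "diagnosticsExport",
  databaseId = "MY_SITE"
)
```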
Study Team Roles and Responsibilities
So in addition to that set of data and analytics that are standardized, and the phases of a study, there are also, you know, the components of who's actually doing the study. This isn't codified to the same extent, and this may represent more my view, but I think it's relatively safe to say you're gonna need a science lead who's in charge of defining a question that meets an important clinical or policy need and can situate the study in the context of prior research. You're also gonna need somebody, who may or may not be that same person, who understands the clinical context in which the data are developed and the decision that's being made, or other aspects of the clinical reality, either about the data or the results.
[00:14:11]
And methods experts who really know about the different HADES packages that are gonna be applied in the standard analytics; people who don't just know about them but can really make the difficult decisions about how to do things like hyperparameter tuning or selection of models, or all the other kinds of nuanced methodological decisions that are gonna be important. You're also gonna need a data expert, somebody who really understands the data model and what its limitations are, how things get represented, and that's a non-trivial thing. So somebody who's knee deep in that is gonna be an important part. How do you put all of that together into a study package? That's a special skill in and of itself. It's increasingly routinized, but somebody who's done it before, who knows how the machinery works to do that, test it, make sure it works well, is somebody else you're gonna need on your team.
[00:15:02]
And then of course you're gonna need data contributors, people who fill out the role of holding those data at sites. Finally, and this isn't often done, but I feel pretty strongly about this one: you really need a project manager. It's hard for any one person in these roles to do what is described here and define all the tasks associated with those different roles and make sure everything stays on track. So I think you really need somebody who may or may not have any of those other skills, who may not come to the community because they're a methodologist or a clinical expert, but who's really excellent at getting people to be organized and get work done. That's what I would call an essential role.
OHDSI Community Resources
Beyond all of that, so that's a lot, and I went through it really quickly intentionally because I knew it was redundant for a lot of folks. Beyond all that, there's a large set of resources. They aren't technical, they aren't infrastructure.
[00:16:00]
They are about the community, and they're helping people to understand what to do, how to do it, and how to connect with each other. There is the Book of OHDSI. There are hundreds of hours of YouTube tutorials and presentations. There's OpenCourseWare that's been completed by hundreds of people. These slides, with all of these links, will be available, so for those to whom this is new, you'll be able to follow all those. There's also a very strong set of engagement activities that keep the community talking to one another and presenting things to one another. There are regular community calls every Tuesday. There's a very vibrant public discussion forum. And then there are symposiums like this one, but there are also ones in Europe and in Asia, and there's gonna be one in Africa soon. There are really important asset libraries. The whole goal, the whole motivation of doing research in this way is in part to standardize things, not for its own sake, but because that allows you to reuse things.
[00:17:02]
So when you've got things like a phenotype library that has performance data associated with its entries, you can reuse those. You may not use one exactly as is, but it's standardized stuff that's available to reuse, and there's a library for phenotypes. There's a prediction model library. Code is preserved for almost all studies. There's been a little bit of movement in the history of OHDSI as to exactly where study code lives and so forth, but increasingly, there's a standard place where all of that lives. And overall, the code, not just for studies, but for all the other things, for the most part, lives in a single place, the OHDSI GitHub repos. And then there's this fantastic, and I think undersold, aspect of how results are managed and shared, and we're gonna hear more about that today. So there's a lot that's been published. It's increasing year on year, but there's a really interesting infrastructure at data.ohdsi.org for keeping all of the results of studies in public view, in a way that's unusual and maybe unique.
[00:18:09]
I'm not even sure there's another community that does that, and it's an extremely valuable resource.
The Importance of the OHDSI Community
Finally, the most important thing, far and away, I think, is the community. The community is extraordinary. You're part of it. You may be some of the people who are just completely committed to it and spend a lot of time on it, or you may be fairly new to it, but it is an extraordinary thing. It is a wonderful, welcoming group of people who are usually eager to help, and there's an unbelievable amount of talent. And I guess because events like this one and elsewhere really inspire a lot of people to contribute their time, people just spend a lot of time volunteering for things, and it's dependent on that enthusiasm. A lot of what it takes to get a study done is harnessing that enthusiasm and maintaining it over a long period of time, because studies don't often go extremely quickly.
[00:19:05]
But there's a wonderful pool of people who have all kinds of skills. Sometimes they have enthusiasm and willingness, but they don't have a lot of prior experience or relevant expertise. You've got people leaping into roles in a way that's good for learning, but not necessarily good for super efficient conduct of what it is they're in charge of, and you want to be managing that, right? So learning is good. Helping people grow in the community is great and important. There's no downside to that, but just being aware that you're not always getting somebody who's done something a million times is important. And people have realistic constraints. Even though they get enthusiastic and want to volunteer, they don't necessarily have an unlimited amount of free time to devote to things. So where your project deviates from its initial plan, people are gonna start to run into realistic constraints in the community.
[00:20:02]
But despite those challenges, things often get done really well.
Q&A Session Part 1
I am going to pause very briefly for questions. Again, this is a super high-level overview. The meat of what you're gonna be learning is coming from other folks. But yes. I like the slide where you talk about the different people that should be involved in the study. How many of those? Yeah, I think it's a great question. I'll give my answer. And again, this whole tutorial is gonna reflect the fact that we know some things pretty well. We know some of what I just ran through pretty well, and you're gonna hear about it in ways that are very easily implementable. Other things are evolving.
[00:21:01]
I gave my sense of what I think those roles are. So if I was gonna attach a "should" to that, I would say: before getting other people to commit what might be a large amount of time to whatever it is you want to lead in the OHDSI community, you should have an excellent idea of how those roles are gonna be filled. You may not need to have filled them, but you may wanna ask, is it really gonna be feasible for me to get the full complement of kinds of expertise to do this, before pulling the trigger? Again, that's just my two cents on it. I think, yeah, Jamie? Can you go back to the contributors slide for a second to test it?
[00:22:04]
Yeah, absolutely. Before you go and distribute your- I love that advice. You need to have some data contributors that are your close friends. It might be worth calling out as a separate thing here, rather than a subset of data contributors. A package development site or something like that, package testing site, separate from other data contributors, is really maybe a good addition there. Thanks, Jamie. Yes?
[00:23:17]
I had a little trouble hearing some of what you were saying. No, it's all right. And I think, probably for the sake of the video, we want people to be able to use this also, but thanks. So, the question is: if I want to share my code with another site for them to run, what is some advice to decrease the number of bugs in my code that would prevent other sites from running it successfully? In general, I know it sort of has to be answered by having a data partner run your code before you share it with everyone else, but if I do not have that, what is some advice? I think the advice is to do that, rather than looking for an alternative for when you haven't done that.
[00:24:00]
So I think it's basically to Jamie's point, and to what Anthony, who's gonna be going over this in a little bit more detail later, said: exactly how to test that at a site and to make sure that it is running. There is a fair amount that can be relied on that I didn't cover here. I mentioned in very brief passing that there's a lot of regression testing to make sure that all of the main packages that are used to define cohorts and run analytics run on a variety of databases, and that's updated and maintained regularly. So the kinds of bugs that might be experienced because I'm doing mine in Databricks and you're doing yours in Oracle are not very likely, because for the most part, there's really good regression testing on those sorts of things. There are other things that can occur, because they're complex study packages. And I think others should chime in, but my sense is, rather than think about what to do if you can't, just make sure you can test stuff out, so that given the standardization and the backend-agnostic way of doing studies, it's very likely that folks won't run into those.
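For readers new to the stack, the reason the Databricks-versus-Oracle class of bug is rare is that OHDSI analytics are written once in a parameterized SQL dialect and translated per platform by the SqlRender package. A minimal sketch:

```r
# One parameterized query, rendered once and translated to two dialects.
library(SqlRender)

sql <- "SELECT COUNT(*) AS n FROM @cdm.person WHERE year_of_birth >= @year;"
rendered <- render(sql, cdm = "my_cdm", year = 1980)

translate(rendered, targetDialect = "oracle")  # Oracle SQL
translate(rendered, targetDialect = "spark")   # Spark SQL (Databricks)
```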
[00:25:14]
But we are gonna be hearing a lot more about it from people who are knee deep in providing the support for the community for that.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Thank you.
[Andrew Williams]
Other questions? Maybe we can get... All the slide decks will be shared, and there's a video recording being made, and that'll be shared as well. So maybe I'll hand it off with a little bit of time to spare, so we can get to the meat of stuff here. This is very, very exciting. And so, Nicole, please take it away.
[00:26:10]
[Nicole Pratt]
Overview of Running a Network Study
Okay, well, welcome everybody to this tutorial. It's great to see so many people. There are two seats at the front; if people are sitting on the floor and wanna sit on a seat, please go ahead. Okay, so my name is Nicole Pratt. I'm from the University of South Australia. What I'm going to do today in this section is really just go over the process of how to run a network study. And some of you who have already been involved with OHDSI for a while, or have actually run a network study, might have seen this particular slide before. But really, these kind of set out the functions of designing a network study.
[00:27:01]
So the first part is what I'm gonna cover in the next 20 minutes or so, 30 minutes. And then we're going to have Ben, Sena, and Yong talking to us about actually implementing a network study. So first of all, we'll talk about the research question and how to actually develop that research question into a viable product that we might then actually run as a network study. We'll talk about the process of doing that network study. And then Chan is going to give us a bit of an overview of where things can go wrong, and some of the things that we might need to really think about when executing these network studies. You might think that this is a wonderful opportunity for developing studies, and it absolutely is. But there are some things we really need to think about to ensure that we get successful applications of our networks and really get the answers that are gonna be impactful and address the OHDSI mission.
[00:28:08]
So really quickly, before we start, I'm just gonna do a show of hands. Has anybody ever been involved in a network study? Great, wow, lots. Okay, so I might skip over a few things. Who has never been involved in a network study? Okay, hands up. Who wants to be involved in a network study? Everyone, good, excellent, excellent, good. Okay, so let's get started.
Defining the Research Question
Well, the first thing we need to think about when we wanna run a network study is of course the question. What is your question? There are a thousand questions out there in healthcare and Andrew went through a few of those considerations. So what's your question? And what's the motivation for asking that question? You know, people have questions all the time, but what is it?
[00:29:01]
Why do you wanna answer it? Why do you want an answer to that question? What motivates you to think that the answer to that question is a good piece of evidence, one that will advance healthcare? Yeah, and what is that decision? What decision is the evidence trying to inform? Okay, what is it you want to do with that piece of evidence? All of that consideration is part of this idea of a question and why you would want to access a network like OHDSI to answer that question. What you also have to think about is what it is you're trying to estimate. Okay, you've got a question. You want to be able to generate some evidence, but you need to decide, well, what does that piece of evidence look like? What is the actual estimate that I'm looking for that will then translate into some sort of healthcare outcome? You might also wanna think really carefully about what it's relative to, okay?
[00:30:01]
We're asking questions, but we wanna say: what is the risk of an adverse outcome, but compared to what? Do we wanna just know the risk, or do we wanna know whether it's greater than, less than, or equal to some other comparator group? In OHDSI, you'll see, again, this diagram very often. So there are lots of different questions you can ask, and OHDSI kind of puts them into these big buckets of things. So the first thing is around clinical characterization. And the question you're trying to ask there is: what happened to them? So you've got a population. You wanna know what's happening in that population in terms of healthcare utilization or in terms of drug utilization, or just what is happening. You might wanna look at that for different reasons. So the motivation might be because I'm wanting to do a clinical trial: is it feasible? So that might be the motivation for asking a clinical characterization question.
[00:31:04]
You might wanna understand treatment utilization because I wanna know whether we have excess off-label use of a particular medicine, or we wanna know, are people getting the appropriate treatment sequence for their disease? Again, a different motivation. We might wanna know disease history, or something about quality improvement: are people accessing the correct lines of treatment or the correct healthcare? So after we do clinical characterization, understanding the population use of a particular medicine, we might then want to say, well, what are the effects of that? So we do population level effect estimation. What are the causal effects? And this is where a lot of research is done in OHDSI, asking about the causal question, the causal effects: safety surveillance, comparative effectiveness. Then we have questions around patient level prediction.
[00:32:01]
So what's gonna happen to me if I get exposed to a particular treatment? What is my risk of getting an outcome of interest? I care about that for me. A doctor might care about that for his or her patient. This helps us ask the questions around precision medicine: what should I give a particular patient who is in a particular situation? And we might want to do that because we wanna intervene. We wanna mitigate some outcome. We wanna find patients where we believe that they have a high risk of having some adverse outcome, so we might wanna put in place some practice to mitigate that risk. So it's really important to understand those questions as well. So to answer all of those questions in the OHDSI framework, we have foundational pillars, and I'm not gonna go through those because I think everybody understands: we've got the vocabularies, we've got the standardized data network, and we have the open source tools.
[00:33:00]
And this whole infrastructure is what allows us as the OHDSI community to actually implement a network study. And we won't go through that; you can go and do a tutorial on each of those pillars and understand more about them, but we won't go through the detail today. So, some of the different types of questions. Clinical characterization: you might wanna know, amongst patients who have a particular disease, which treatments are patients exposed to? And here I've got some colors, and I'll use those colors throughout, and most of you will already probably know this. So in these clinical characterization questions, we wanna understand a target population. So in this case, we wanna understand those people with depression. That's our target population: who has depression? Amongst those people who have depression, based on our design, who has the outcome of interest? So we just might want to know the incidence rate of use of particular medicines in people with depression.
[00:34:07]
So that's pretty easy, and there are lots of different examples of those kinds of questions. And personally, in the work that I do, these kinds of clinical characterization questions are extremely important and sometimes overlooked. In lots of studies, people wanna go straight to the Holy Grail, the effect estimate. So what is the effect estimate? But to interpret that effect estimate, you really need to situate it in your clinical characterization. You really need to understand what the use of this medicine is. So if I have a really high risk of an outcome, what does that mean clinically to the population? A four times increased risk might not translate to a lot at the population level if only one in a million people get exposed to this particular medicine. So it's really important to think about that question and what it means to healthcare.
[00:35:05]
And we can do network studies on clinical characterization. It's really, really valuable to look at utilization rates across the world in different countries, so we can start to understand heterogeneity and so on. So it often gets overlooked, but I'm gonna make the plea to please always do characterization. It's fundamental to the work that we do. So, patient level prediction. We might wanna ask about new users of warfarin. So for a given patient population who start taking warfarin, what is the probability that they'll have a GI bleed? And now we need to incorporate this idea of time at risk. So when will that risk occur? In this case, we've got one year. So: start warfarin, what's the probability you'll get a GI bleed in a year?
[00:36:00]
So that's an example of a question in patient level prediction. Population level effect estimation: does exposure to ACE inhibitors, for example, a medicine for hypertension, carry a different risk of experiencing acute myocardial infarction, or heart attack, while on treatment? So time at risk is on treatment, relative to some other treatment for hypertension, in this case a thiazide diuretic. So now we're starting to incorporate the comparative effectiveness type of question, or comparative safety in this example. So that's all just to say that we have lots of questions we could ask of a network study.
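As a concrete illustration of how a time at risk like "within one year of starting warfarin" becomes a machine-readable setting, here is a minimal sketch using the PatientLevelPrediction package. The argument names reflect recent versions of that API and may vary, so treat it as indicative rather than definitive.

```r
# Time at risk for the warfarin / GI bleed example: day 1 to day 365
# after cohort start. Argument names follow recent PatientLevelPrediction
# releases and may differ in other versions.
library(PatientLevelPrediction)

populationSettings <- createStudyPopulationSettings(
  riskWindowStart = 1,
  startAnchor = "cohort start",
  riskWindowEnd = 365,
  endAnchor = "cohort start",
  removeSubjectsWithPriorOutcome = TRUE  # predict first GI bleeds only
)
```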
Assessing the Need for a Network Study
But the next thing we wanna do is really think about the implications of that question. Do we need to actually do a network study? Not every question needs to go to a network study. Is there already evidence about that particular question?
[00:37:01]
It might be something really exciting to you because you've never heard of it before, but does it need to be answered in healthcare? So is there an evidence gap? And then if there is no evidence and there is a gap, what then is your hypothesis? What do you wanna know about that particular research question? I will say also, I forgot to mention earlier that my background is as a statistician. And very often, I have clinical people coming and saying, could you run a study on, I wanna know about the risk of Y in people who take this particular medicine, go do it, right? It's really important to think about the reason why you wanna do it and give enough information to the person who's going to design your study and implement your study so that you can actually think about that with them.
[00:38:02]
So I also wanna make sure that we're all thinking about collaboration across those groups of people who are working on study designs. Lots of people can come up with a research question, but we really need a team to answer and address that question. So just keep that in mind as we go through. I quite like this hierarchy, I don't know, framework, I suppose, to think about the research question itself, okay? Really think about whether that question is practical. This is the FINER criteria: feasible, interesting, novel, ethical, relevant. So, thinking about whether it's feasible: is it practical? I've got a question, but can I answer it, okay? Do I have the resources and the time to commit? And Andrew talked about that a little bit as well. These network studies are a big investment in time and in effort, and you really have to think through, is this something that's going to be achievable?
[00:39:06]
So do we have the data available? That's also a very important question. And of course, with OHDSI, we tend to have those data available in lots of different formats as well, so that's less of an issue in a network study. Importantly, are there sufficient subjects available? So as part of designing your network study and your research question, you really need to engage a statistician to help you think through the sample size and power issues. But again, with network studies, that's one of the key advantages: we tend to have a lot more people available to us to answer questions, and we can really address those issues of power that we would have if we were only in a single database. So think about that. Now, the next one is "interesting," and it is interesting.
[00:40:00]
So is there a gap in the existing knowledge that is worthwhile to fill, right? Is it worthwhile to answer this question? And will it produce results that will address the gap? So everything is interesting to me anyway, but really think through the interest of your question. Does somebody really wanna know? And who wants to know the answer to this question? And will it advance healthcare? Is it novel? And novelty is a hard kind of scale, I think. You can have incremental novelty, or you can have completely brand new ideas. And I think both of those are very important, but worth thinking through when you actually design your study. So will it give you a fresh insight? Will it give you some new perspective on something you already know? Both of those are very important. We also need to think, is it ethical? Is it ethical for me to do this study? Not just, do I have ethics approval for the people to be included in the study, but is it ethical to do the piece of work?
[00:41:09]
And can we conduct that study with integrity, with respect, and safeguarding the confidentiality of those patients who actually contribute their healthcare data to a data source? Is it relevant? Will there be an impact? And this is the really key point about a lot of the studies we do. We don't wanna just do studies. We wanna do studies that produce an outcome that will make an impact on healthcare. And we might wanna really think through our questions and maybe give some value to those questions that will result in a really impactful outcome. And doing a study and getting an answer isn't the end of the study. We want to know whether there is a practical implication for that particular piece of evidence we have.
[00:42:04]
What are we gonna do with it? Might be interesting, it might answer a question, but what are we gonna do now that we know that piece of information?
Novelty and Existing Research
So the question, is it novel? OHDSI has a lot of tools that you can use to help answer this particular question. One thing you can do is get on data.ohdsi.org and have a look at all of the different OHDSI studies that are ongoing. So you might have a question and you think, wow, I don't know the answer to that, the literature says that there's a gap, and you wanna design your research study or a network study. Go onto the OHDSI studies Shiny app and have a look: who else is doing a study in that area? Is it similar to the question you have? Can you join with another study to answer your question?
[00:43:02]
Can you address your specific question in another study that is addressing a similar question? Okay, so you can then start collaborating with other people in the community who are already doing studies in your area, and joining with another group that is already being formed can help progress your study maybe a little bit more smoothly. But we can talk about that with Chan a bit later. Okay, again, where does the question come from?
Examples of Network Studies (SOS Challenge)
Is it novel? So here are two examples of network studies that happened earlier last year, I think. So these were community-wide studies. We called it the SOS Challenge, okay? Our Save Our Sisyphus Challenge. And so these questions were derived from across the community, about things that people were interested in. So we all had to submit our questions, and I think there were quite a few.
[00:44:05]
Sena, do you know how many? There were maybe a hundred or so questions that were submitted, all questions that could be answered in the community, all of interest to the community. And a couple of those ideas were then chosen to work through as a network study as a community. So here are the two that went all the way through study execution and publication. And both of those questions are safety questions, but they both have different origins in terms of the context, or why they were to be done. So the anti-VEGF and kidney failure one: that's a safety question where the answer was not known at the time, okay? Do these injections that you put in your eye, which sounds horrible, but you inject them straight into your eye, do they end up giving people kidney failure?
[00:45:02]
Okay, so that's one question. The fluoroquinolone study on the right there was a study I was involved with, with Chan, in Korea, and that was a question that came up because the regulators had put out a safety warning that fluoroquinolones, which are antibiotics, increase the risk of aortic aneurysm. Now, aortic aneurysm is a horrible thing, okay? If you get it, it's quite disastrous, and it has a high rate of mortality. So the regulators put out this warning. The observational studies that underpinned that safety alert had some methodological issues, okay? So we looked at those studies and thought, well, we need to address this research question with the OHDSI framework and try to really understand: is there a risk associated with these medicines, or is it potentially due to other factors, like inappropriate control for confounding, different study designs, and so on?
[00:46:10]
So that was the origin of that particular question, okay? Can we unpack this safety concern that has been highlighted in other observational studies and implement a rigorous study using the OHDSI framework to understand that risk? So, two slightly different questions, but really important and impactful research.
Designing an OHDSI Network Study
So when designing your study in the OHDSI framework, you and your study team must then make decisions about how you would define the whole study, okay? So we're in the situation now where we've got a really good question and we want to design a study to address that question. So you'll see, again, this framework in the OHDSI tool stack, I guess.
[00:47:02]
We've got certain ways of addressing each of the components of a study. You need to define who your population is: what is the disease? You need to define who your target cohort is: what exposure have you got a question about? Then look at your comparator cohort: who's gonna be the counterfactual to the target cohort? You need to define the outcome: what events are you interested in learning about? And again, in a network study, you don't always have to think about just the one outcome, okay? We have the opportunity across the OHDSI network to really look at a whole range of outcomes in our studies, because of the structure and the way we do things. You need to define the time at risk, which is the span of time post-exposure in which we look for outcomes to have occurred. And of course, we need to define our analytical approach.
[00:48:01]
So let's just have a quick look at how OHDSI defines its cohorts. And again, I'm not gonna go through too much detail, because there's lots of information out there on how you actually do this. But the idea is we start with a target cohort and a comparator cohort, and from those we get a qualifying cohort and an analytic cohort. So this is getting to the group of people that we actually want to study. We have our target and our comparator cohort, and then we have another cohort, which is the outcome cohort. And in an OHDSI study, you can think about the intersections of all those cohorts. So we have a cohort of people who are taking the target medication, a cohort of people who are taking the comparator, a cohort of people who have the outcome, and now let's combine all those cohorts together and look at the intersections. So OHDSI's definition of a cohort is a set of persons who satisfy one or more inclusion criteria for a duration of time.
[00:49:07]
And we'll go back to our ACE inhibitor and acute myocardial infarction example and just have a quick look at how we do that in OHDSI. So to define new users of ACE inhibitors, we really want to define the logic that determines whether a person belongs to that cohort. So think about who it is we actually want to include in our cohort. And I've just taken this example out of the Book of OHDSI; all of this you'll be able to understand in more detail through the book. But basically, we first get events: what's the first time patients have used an ACE inhibitor? What initial event defines the time of cohort entry? Then we apply some inclusion criteria to that. Well, we want people to be exposed to an ACE inhibitor, but we also want them to have a diagnosis of hypertension in the past year.
[00:50:08]
So there are other inclusion/exclusion criteria there. And then we want to define the cohort exit time. Okay, all of that needs to be defined in an algorithm. But what you'll notice is, I've just used the words ACE inhibitor here and hypertension here without really qualifying them. And what we can do in OHDSI to create a cohort of people who have a particular condition is come up with concept set expressions for those cohorts. For the ACE inhibitor cohort, for example, we want to get all of the people who have a record of an ACE inhibitor in the database, and we define ACE inhibitors by a group of concepts. Okay, so we've got a concept set, and that becomes the definition of the element that is used in the algorithm to define a cohort.
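To make the cohort logic concrete, here is roughly what it looks like as OHDSI-style parameterized SQL (in practice, ATLAS generates this for you from the cohort definition). The temp tables #ace_concepts and #htn_concepts stand in for the resolved concept sets, so this is a hand-written sketch, not generated study code.

```r
# First-ever ACE inhibitor exposure, restricted to people with a
# hypertension diagnosis in the prior 365 days. #ace_concepts and
# #htn_concepts are assumed to hold the resolved concept sets.
library(SqlRender)

cohortSql <- "
WITH first_ace AS (
  SELECT de.person_id,
         MIN(de.drug_exposure_start_date) AS cohort_start_date
  FROM @cdm.drug_exposure de
  JOIN #ace_concepts ac
    ON de.drug_concept_id = ac.concept_id
  GROUP BY de.person_id
)
SELECT f.person_id, f.cohort_start_date
FROM first_ace f
WHERE EXISTS (
  SELECT 1
  FROM @cdm.condition_occurrence co
  JOIN #htn_concepts hc
    ON co.condition_concept_id = hc.concept_id
  WHERE co.person_id = f.person_id
    AND co.condition_start_date
        BETWEEN DATEADD(day, -365, f.cohort_start_date)
            AND f.cohort_start_date
);"

# Translate for your platform before running, e.g.:
# translate(render(cohortSql, cdm = "cdm"), targetDialect = "postgresql")
```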
[00:51:10]
Okay, and so then you specify all your Ts, Cs, and Os based on that cohort logic. Right: time at risk and the analytical approach. Again, I'm not gonna go through this too much, but the idea is we get our target, we get our comparator, we get our time at risk, and we look at the outcomes within that time at risk. When designing your study, you need to think about your analytics. What's your outcome model? Do you care about just ever having an event (a logistic regression)? Do you wanna know how many events people have had (Poisson regression)? Or do you wanna look at time to first event and do a Cox proportional hazards model? So think about how you want to use the outcome information. Define your risk metric, define your model assumptions, define any sensitivity analyses. Think about what negative controls you're going to use, and you'll hear a lot more about negative controls throughout the course of this symposium; it's really important to think about how you're going to understand systematic error in your study, and to pre-think all of those through.
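Those three outcome-model choices map directly onto a setting in the CohortMethod package. A minimal sketch, assuming a study population already built by CohortMethod's earlier pipeline steps (argument names may vary by version):

```r
# Fit the outcome model for a comparative cohort study. 'population' is
# assumed to come from CohortMethod::createStudyPopulation() upstream.
library(CohortMethod)

outcomeModel <- fitOutcomeModel(
  population = population,
  modelType = "cox"   # time to first event; alternatives: "logistic"
                      # (ever had the event) or "poisson" (event rates)
)
```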
[00:52:17]
Okay, so you've thought all about that and you've defined all your cohorts and you've defined your analysis and you've got your study plan, now you have to write a protocol.
Writing a Study Protocol
So when you're doing a network study, you must write a detailed study protocol that provides the full specification of all the design decisions, so that the study can be reproduced, okay? So in OHDSI, we have protocols that are very specific, that define every part of the process so it can be reproduced, okay? And then we publish the protocol, and that ensures transparency. So we're at the point now where we have a protocol and are ready to do a network study.
[00:53:05]
Just really quickly, here are some examples of where you can get a template for a study protocol on the GitHub repo, and there are lots of different examples of protocols if you look through that network study list I showed you earlier. Now, here's an example of a very recent post on the forum: Fan Bu has asked for participation in a network study to look at pneumococcal conjugate vaccines, okay? So what she did is created a protocol, posted it on the forum, and asked for participation, and Ben is gonna talk a little bit more about that. This is what a research protocol kind of looks like in the OHDSI format, so you need to fill all of that in. So in the end, by clearly defining your research question and situating it within the existing evidence base, you can leverage the whole OHDSI framework and tools for the analysis.
[00:54:10]
And that will hopefully produce you a robust study that contributes valuable insights. So for me, I've got these big buckets of work you need to do to get to the point of your research protocol, but now that you've posted it on GitHub, you can look at the next step. I'm just gonna flick through that and get straight on to Ben. Oh, yes.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Q&A Session Part 2
It's hopefully a little provocative. Should your protocol, I have an opinion on this, I just wanna know yours and everyone's. Should your protocol include source code?
[Nicole Pratt]
Source code meaning?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Because it is sometimes challenging for people to read words and properly reproduce something.
[00:55:05]
Source code makes that unambiguous, but it is also burdensome to see an appendix or a link to lots and lots of source code.
[Nicole Pratt]
Okay, well, that's something. Let's put that down as a question for this afternoon's panel, because what we're hoping to do is perhaps come up with some recommendations. So should we park that for this afternoon's session?
[Andrew Williams]
Yeah, I really didn't set up the whole tutorial expectation with respect to questions very well. We have decided, because each of the half hours is jam-packed, that although ideally we'd like to have time for questions and answers after each one, we have this whole section in the afternoon where we're gonna have a very thorough discussion. So things will be brought up then. But thanks, it was a great question, Jim. Let's wait until the afternoon, yeah.
[00:56:00]
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Excellent.
[Ben Martin]
Study Execution and Launching a Network Study
Okay, hello, everyone. My name is Ben Martin. I'm a postdoctoral fellow at Johns Hopkins School of Medicine, in the Biomedical Informatics and Data Science section there. Before I get into my specific portion of the talk, I wanted to share some good advice, and I'm glad that Nicole asked for everyone to raise their hands about what your previous involvement has been in a network study. Because, this being titled "So You Want to Run a Network Study," the most helpful thing for me in running a network study was participating in a network study prior to trying to run one. Participating and helping, in my experience, has been much different from running or leading. Not having that pressure of carrying the whole distributed, federated network study across multiple sites allowed my learning and experience to be more positive.
[00:57:03]
And so, some of those lessons and insights and experience I got from just being one of the team. I took the bullets from Andrew's slide: you can fill one of these bullet points, or somewhere in between, or, also, there are many people that just help with reviewing the manuscript. So you can raise your hand to help with a network study. There is no limit to how, well, I guess I can't say that, but there are very large groups of people that are helping to run a network study, and you're also at the best place possible to find a network study to help with, whether that be actually contributing data that you've mapped to the OMOP CDM, or just lending your clinical expertise or your statistical expertise. You can do this by joining the community calls. There are calls for collaborators. You can find posts on the forum, like the example Nicole showed. Or you can go to the OHDSI Global Symposium, where there will be many posters and people who are here actively looking for people to help with their network study.
[00:58:02]
So there's just a lot less expectation on you: you can help, you can add value, you can get experience, and that has been very helpful for when it's our turn to actually run and lead a network study, because it's a big undertaking, as you've seen already. I also stole Nicole's slide while she was talking, just to kind of keep us tracking with where we're at. So we're now at section number five, first name on the list, Ben. So now we're at the execution phase. And as you saw, there was a lot of work and important discussions and meetings and considerations and questions to be answered before we get to the point where it's like, okay, send out your study package to all the data partners and run this study and get results. So we've already done a lot of work. We can acknowledge that. And I want to be clear about what the steps are. What I tried to do is create a very simple, clear, explicit set of instructions, because it's a little confusing.
[00:59:10]
In my experience at Johns Hopkins, we have lots of people who are very experienced with observational research. So, like Nicole's section, they get that stuff. You know, that all seems very rich and complex to me, but they're familiar with what it takes to define a research question that will work with observational data. They're familiar with that. And then we get to this part where it's like, time to start the network study, and they're like fish out of water: how do we ask people to join? And you say, you make a forum post, and they're like, that's it? So this is for the people who have never done this before, who've never seen this done before. I just wanted to have at least one slide that says, here's your checklist. And then I'll just provide some more detail about each one of these points. So the first thing, which as you saw is not a small amount of work, is getting your study protocol document in order.
[01:00:03]
And my best advice is similar to what Nicole said: go look at some other examples. There are like 280 repos, apparently, I always can't keep count, but there's a study protocol document in every network study repo on the GitHub. Find one that's maybe similar. I provided two examples here of recent ones. The first one I consider to be very fleshed out; they did what Nicole described as the Holy Grail, so effect estimation. These are very long documents, so I also included one that's as simple and as small as can be, if it's a little overwhelming to look through a whole long document when you're just starting. So the OHDSI Evidence Network: we ran it as a network study, but it's really just having people run the DbDiagnostics package. So it's not any crazy analysis. There are no clinical questions. We're just counting concepts. But you still have a study protocol document.
[01:01:00]
So I've got a simple example and a rich example. You can use those. But overall, and, you know, Jamie brought up a good question that I am anxious to hear others' thoughts on too, because I don't have an answer for that. I'll go ahead and say that maybe we keep the source code out; I'm affected by that burdensome effect that you talked about. So when I see this big, long study protocol document, as soon as there's code, I'm like, you know, it's a lot. So what I put in my slides is, at a bare minimum, the expectation for a study protocol document is that, whether it's difficult or easy, in theory, someone could reproduce your study by reading the words in your protocol. There may be strategies to make that easier, but someone should be able to read the description of your study population and have a definition that recreates that study population on their data. Same with the analysis or the statistical methods that you have decided will be the appropriate ones to answer your question.
[01:02:03]
That should be clear through the study document too. So this is a description of what question you want to answer and how you're going to answer it, with enough detail to be reproducible, and you post it out there before you do your actual analysis. And there's also a reason for that. So one thing: it's good to be organized. So when you're telling your potential data partners, who are like, yeah, we'd love to be involved with your study, describe your study, you can say: I can tell you in, you know, 90 seconds, and I can also provide this 12-page document that goes into as much detail as you'd like. So it's good admin, but it's also good integrity for observational research. Andrew mentioned the word p-hacking, or HARKing, which is hypothesizing after results are known. So if your hypothesis is out there publicly on GitHub before you produce the results, no one has to worry about HARKing or p-hacking.
[01:03:00]
So, not to say that anyone would be engaging in, you know, potentially, what's the word? A good word for, you know, controversial research techniques. It just removes that from the equation, and that looks good, and it's just good practice. So you have your study protocol. This is probably the largest amount of work that will be on your shoulders if you are part of the team, or leading the team, that is running a network study, because you have to go from the question that you've decided on to a fully fleshed out document describing this question and your proposed methods. Oh, sorry, wrong button. Okay, so number two: we've talked about these GitHub repositories. So this is twofold. This is additional documentation. There's a README file that is used to keep track of all of the OHDSI network studies that we have going on. So this is a computable summarization of your study and where you're at in the process: the study status, whether, you know, you're recruiting data partners or you're under IRB review. We can use computers to summarize where all of the studies currently going on in the OHDSI world are at in the process, so people can know and keep track of that.
[01:04:16]
But the most important part is that this is actually where your analysis code sits. This is an executable study package, with your cohort definitions and your different analysis modules. This is the part that Anthony Sena will cover in much better detail; just know that he and others have done a lot of work to make this part easier for you. My advice here: there's a study repo template. You can find it under Odyssey Studies; it's called the Strategus study repo template. Just copy that and use it to populate your GitHub repository. And this is where your study protocol will sit too.
[01:05:00]
So get this set up; it's something you will need to have ready, and ready to be tested as well.
Testing the Study Package
Another point Jamie made earlier: there's a lot of testing that happens right when you think you should be analyzing and producing results. We've had experiences where we think it's ready, that this time there are really no more problems, this time we really fixed all the errors. We let all the data partners know and say, all right, run it again, and then the first person to get back to us has found another problem. So do your due diligence to have the GitHub repo and study package ready to run, but just know that it's not abnormal for it to be a bit of a burden on everyone: okay, sorry, never mind, don't run it, we have to fix something; okay, now it's ready to go. But this is why we use GitHub, because managing changes, and pushing and pulling changes to the study package itself, is exactly what GitHub does. That's why Odyssey chose to go the GitHub route. So, number two: your GitHub repository.
[01:06:02]
Announcing the Study and Recruiting Data Partners
And once you have your study protocol document and your GitHub repo feeling mostly ready to go, and you've tested it at least once with your best friend, now you can make your forum post. This is what was funny: it seemed not official enough to some people. They asked, how do we declare we are running a network study and put out a call for collaborators? The forum is, again, in keeping with Odyssey's philosophy: we try to do all of our communication in public so people can see the replies, and solving issues on GitHub is all done in public too. You can email all your friends as well, but the agreed-upon way to announce your network study is to start with a forum post that has some key information: who the study lead is, with contact information for the lead or leads; a link to your GitHub repository, which you've now fleshed out; a link to your study protocol so people can get more information;
[01:07:01]
a deadline, because in large groups things move slowly, so go ahead and decide on a deadline for people to let you know that they would like to participate; and a basic summary of what is contained in your protocol document: your basic study objectives and the background and rationale behind why you want to do this study. That's it. Post it in the forum, people will reply saying they would love to contribute their data to your network study, and this then forms your set of data partners.
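To make that concrete, here is a hypothetical sketch of what such an announcement post might contain. The field labels, names, and links are illustrative placeholders, not an official Odyssey form:

```text
Title: [Network study announcement] <Study name> -- call for data partners

Study lead(s):  Jane Doe (jane.doe@example.org)            <- placeholder contact
GitHub repo:    https://github.com/ohdsi-studies/<StudyName>  <- placeholder link
Protocol:       <link to the study protocol document in the repo>
Deadline:       <date by which data partners should respond>

Objectives: one or two sentences summarizing the study objectives.
Background and rationale: a short paragraph on why this study matters.
```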
IRB and Data Governance Considerations
So this point doesn't apply to everyone. I've only worked in academic medical centers, so I thought this was a hurdle everyone has to jump through; it depends. From what I understand, if you get federal funding, and I have a very US-based perspective here, but at our academic medical centers you typically have an institutional review board
[01:08:00]
and sometimes additional data governance boards; everyone has some degree of data governance process to go through. For us, if we're leading a network study, we like to go ahead and submit to our institutional review board early. That does not cover all of the data partner organizations: when we go to Janssen, they have a different process, and a lot of the time they can just go ahead and run it; but another data partner may have their own institutional review board and have to submit their own IRB protocol as well. And this takes time. Back in the day, getting data and running analyses took so long that this wasn't something we needed to explicitly flag, but now it's one of the rate-limiting steps, because everything else is set up on rails, such that the pace of the analysis is actually outpacing the approvals to do the analysis.
[01:09:02]
So my advice is to go ahead and get the IRB ball rolling. And an important point: the word protocol is used twice here. There's your study protocol document, and then there's what the IRB likes to call an IRB protocol. These are similar but different documents. We attach the study protocol to our IRB submission, and a lot of the language in the study protocol document we literally copy and paste into our IRB protocol, but at Johns Hopkins there's a specific form, with specific questions and specific information they want, for them to review our application to do this research. So we have our example, it's an N of one, but we have an example of what our IRB wants to see, and we have created a generalized template for Odyssey-type studies going to the IRB, covering things like: the data does not get shared, the code gets shared, and the results get aggregated.
[01:10:01]
That can be new to people. To us it seems like old news, we've been doing this for what, five years now, ten years, but some people are hearing the concept of a distributed data network for the first time, so it's helpful to share resources in this sense. That's why our generalized IRB template is in the Odyssey Evidence Network GitHub repo under Odyssey Studies, for you to take whatever is helpful; it may need to be made more specific to your institution. But anyway, if you have an IRB, go ahead and get that ball rolling, because there have been many times a network study was put on pause while we were all waiting to hear back from our IRBs. And then number five: if you email Craig Sachson, you can get a shout-out on the community call with a call for collaborators. So your forum post is out there, your GitHub repository is ready, and this is just another channel for letting people know,
[01:11:01]
because this is a decentralized network, so diversifying your awareness-raising strategies is good, and the community call is a great place to just let people know. Sometimes I'm sitting there doing other work during the community call, but people hear certain clinical applications or other things that interest them in helping with or contributing to a study. So this is a good thing to do as well when launching your network study.
The Odyssey Evidence Network
And then lastly, this is a new option. If you were here last year, I think that's when the Evidence Network was officially launched, or at least announced. I should know the number better, but I think 30 to 50 data partners have already submitted their database diagnostics profile, which is basically aggregated counts for all the standard concepts in your OMOP CDM. The Odyssey Evidence Network is a place where, if you have a network study and a study package with cohort definitions
[01:12:01]
that look for patients with this concept or that concept, you can find data partners that have sufficient data around the concepts required by your network study question before going through all this trouble. It's not the end of the world otherwise, but it's nice: once you have the cohort definitions in your study package, you can come to the Odyssey Evidence Network and we can run your cohort definitions across the aggregated counts from each data partner organization in the network and tell you, hey, Johns Hopkins, Stanford, Vanderbilt, these places all have enough data. That doesn't mean it will necessarily work, there are other problems you can hit, but at least the problem of not having sufficient data to answer your question is addressed: you can make sure the people you ask to go through the trouble of running your cohort definitions at their site, and it's not that much trouble, are good candidates. This sort of fast-tracks that. And the way you would do this right now,
[01:13:03]
this will be a more formalized process in the future; it's a very new initiative. The leaders of the Odyssey Evidence Network are Clair Blacketter and Paul Nagy, two people you've probably heard of or know. Their recommendation is that you come to office hours, which are 9 a.m. Eastern every Friday, or if you can't make that, set up a time to talk to one of them; we also have others in the group who help support running the Odyssey Evidence Network. We'll talk through your study question in a meeting and then, because this is a new process, figure out how best to run your cohort definitions and concept sets across the aggregated DbDiagnostics profiles of the Odyssey Evidence Network and give you your result. What I'm trying to say is, we haven't technically done that before, but we are prepared to do it for you. So come to office hours and we can give you a set of good matches for your network study. That's a new, targeted option for launching your network study.
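For sites that want to contribute a profile, the workflow is roughly as follows. This is a minimal sketch assuming the DbDiagnostics package's executeDbProfile function; the argument names are my assumptions, so check the package documentation for the authoritative version:

```r
# Minimal sketch: produce a DbDiagnostics profile to share with the
# Odyssey Evidence Network. Argument names are assumptions; verify
# against the DbDiagnostics documentation.
library(DbDiagnostics)

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms     = "postgresql",
  server   = "myserver/ohdsi",            # hypothetical connection details
  user     = "user",
  password = Sys.getenv("CDM_PASSWORD")
)

# Emits aggregated concept counts only -- no patient-level data leaves the site
executeDbProfile(
  connectionDetails     = connectionDetails,
  cdmDatabaseSchema     = "cdm",
  resultsDatabaseSchema = "results",
  cdmSourceName         = "My Site CDM",  # hypothetical site label
  outputFolder          = "output"
)
```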
[01:14:12]
And those are my instructions for how to launch a network study, assuming you have a good study package, which Sena will now tell you all the good details about. All right, thank you. That's the USB-C converter if you need it; yeah, I can do this guy here.
[Anthony Sena]
Study Package Development with Strategus
Hello everyone, my name is Anthony Sena. You can just call me Sena which is what most people do. I'm part of the observational health data analytics team at Johnson & Johnson. I've been an Odyssey collaborator for about nine years now and have been contributing to open source solutions in our space for that time and I'm happy to take you through the package development for your network study.
[01:15:04]
So for me, this is the fun part, though for some of you it may be the intimidating part: actually getting down into the coding of your package. One of the efforts we've had going on in the Odyssey community is an R package called Strategus, which is aimed at coordinating and executing the various Hades analytics packages that are part of our standardized analytics suite. As part of Strategus, we have developed a new template that goes with the new Strategus 1.0 release, and that's what I'm going to take you through today. This is all brand new, so if you've never heard of it, you're not alone, okay. The Book of Odyssey, chapter 20, talks about the elements of an open Odyssey network study. There are a lot of elements, which you've heard about, and I'm mainly going to focus on the fact that you have to create a study package, typically with R and SQL, with code that is CDM compliant,
[01:16:03]
and if you use Strategus to do that, you've largely accomplished this task. The other aspects of running an open network study include things you've heard about: publishing your protocol to GitHub, setting up a GitHub repository, and sharing results, for example by publishing a Shiny application to review those results.
Elements of an Open Network Study
So, excuse me, under the Odyssey studies repo is the Strategus study repo template. The idea with the template is to give you a place to start so you don't have to develop all of the code from scratch. This template gets copied into whatever new network study you decide to do, and I'm going to take you through the elements of that repository and show you the resources available as part of the template, along with links to documentation not only about Strategus but, more importantly,
[01:17:00]
the Hades analytical modules that help answer the questions that Nicole and Andrew showed you earlier.
The Strategus Study Repo Template
So, one of the Odyssey GitHub admins, who will remain nameless but will probably be me or someone else, will set up your Odyssey studies repo for you, and we'll use this Strategus study template repo to give you a starting place for your study. The way that's done is that inside GitHub there's a button that says "use this template", and we can create a new repository from that template, which copies in all of the code we're about to go through. So, to start, you give it a unique name, you provide a description so people can find it, and you fill out the README as Ben was describing so the study can be found on data.odyssey.org under Odyssey studies. Just by virtue of doing that, you've gotten yourself started. Then you clone this repository to your machine and follow the instructions in the template. What the template provides is a set of R scripts, which we'll go through in more detail in a moment, as well as some instructions.
[01:18:10]
There are README documents about how to use the template and how to execute the study, plus links to other resources that will take you to the various parts of the Hades analytical packages to help you with the design. The root of the documentation is the "Using this template" markdown document, which is linked from this QR code here. Importantly, there is a package we rely on that's part of the R ecosystem, not something developed by Odyssey but used by Odyssey, called renv. renv is aimed at making your R and Python environments reproducible. An important aspect of running a network study is setting up your execution environment and making sure you have all of the packages necessary to actually perform the analytical tasks. renv makes that much easier and much better for reproducibility, and it also provides a very clear list of everything that went into developing and running the package.
[01:19:17]
So, if you use the repo template that I'm going to go through, we've already included an renv lock file, which is where all of those package details are contained, and it reflects the latest release of what we call the Hades-wide lock file. This is essentially a known configuration of Hades that will work within R.
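In practice, using the lock file is a one-liner once you've cloned the template. A minimal sketch:

```r
# From the root of the cloned study repository:
install.packages("renv")  # once, if renv is not already installed
renv::restore()           # installs the exact package versions pinned in
                          # renv.lock (the Hades-wide lock file in the template)
```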
Configuring the R Environment
So, to start, within the study package template we have a script called downloadcohorts.r. The reason it exists is that sometimes you need to download the cohorts you developed inside Atlas into your study package; you basically want a way of copying down those definitions and keeping them as artifacts of your work. The same holds true for the negative control outcomes, which are stored as a concept set inside Atlas. You want to have these inside the inst folder of your study project, and if you want to use the phenotype library, or build your own cohort, you have the flexibility to do that.
[01:20:16]
Again, these are more or less ways you can approach building out the package. It's not a prescription; it's more an example of how to use some of the Hades packages to do it.
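As an illustration of that approach, here is a minimal sketch of what a downloadcohorts.r-style script might do, using the ROhdsiWebApi and CohortGenerator packages. The WebAPI URL and cohort IDs are hypothetical, and the file layout may differ from the template's, so treat this as an example rather than the template's own code:

```r
# Sketch: pull Atlas cohort definitions into the study package as artifacts.
library(ROhdsiWebApi)

baseUrl <- "https://atlas.example.org/WebAPI"  # hypothetical Atlas WebAPI URL
cohortIds <- c(1781, 1782)                     # hypothetical Atlas cohort IDs

cohortDefinitionSet <- exportCohortDefinitionSet(
  baseUrl   = baseUrl,
  cohortIds = cohortIds
)

# Store the definitions under inst/ so the package is self-contained
CohortGenerator::saveCohortDefinitionSet(
  cohortDefinitionSet = cohortDefinitionSet,
  settingsFileName    = "inst/Cohorts.csv",
  jsonFolder          = "inst/cohorts",
  sqlFolder           = "inst/sql/sql_server"
)
```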
Defining the Analysis Specification
The next important part is defining your analysis specification. As I mentioned, Strategus is an R package for coordinating and executing the analytics provided by Hades, and we like to think of the Hades packages as building blocks you can use to design the different types of studies Nicole mentioned before. As a matter of fact, you can combine them into characterization, estimation, and prediction studies, building them up one block at a time. Each building block is a Hades package, and all Strategus is doing is enabling you to use them in a consistent manner.
[01:21:05]
So, the way you configure and run CohortGenerator and Characterization is identical, and if you want to add in prediction tasks, those follow the same pattern; you can build all the way up to all of the current modules that exist inside Strategus and run all the various Hades analytics. Within the template there is a createStrategusAnalysisSpecification.R, and that lengthy script goes through each of the supported Hades analytical packages; it is used to create the exemplar study that ships with the template. If you are familiar with Hades, you may know the synthetic OMOP CDM we use in the R space called Eunomia, and that sample OMOP CDM can be used to run the study that is part of this template.
[01:22:00]
So, if you don't have access to an OMOP CDM, or you're a developer who just wants to kick the tires, you can download this template and run it on some synthetic data to see how you would go about configuring Strategus to run your study. The building blocks I mentioned earlier are captured in the last part of the R script, which I've put here. And don't worry, there's no quiz, so you don't need to parse what it's actually saying, but as you can see from the descriptions, this lengthy list is just chaining together all of the different modules, all of the different analytic tasks you want to perform as part of your study. If you don't want some of them, take them out, comment them out, remove them; it's not going to hurt anything. Most importantly, at the very end there's a call to save these settings to a JSON file. The analytical choices you make as part of your protocol, now codified, are saved as a document that lives inside your repo. That JSON document is what powers your study through Strategus.
[01:23:01]
So, once you've made those decisions, you've captured them, you commit them to GitHub: they're codified, they're locked. You don't have to change them unless you find a problem, and you have them captured as a document. I'm not going to cover each of the methods in detail, but there are links on the Strategus documentation site, which is linked here, and the Hades packages themselves have a lot of documentation; Andrew mentioned there are a lot of resources for learning how those methods work.
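A heavily abridged sketch of that pattern follows, assuming the Strategus 1.0 builder functions as I recall them (createEmptyAnalysisSpecificiations, addSharedResources, addModuleSpecifications, and the R6 module classes); the template's createStrategusAnalysisSpecification.R is the authoritative version:

```r
# Abridged sketch: chain Hades building blocks into one analysis specification.
library(Strategus)

# Cohort definitions saved earlier by the download script
cohortDefinitionSet <- CohortGenerator::getCohortDefinitionSet(
  settingsFileName = "inst/Cohorts.csv",
  jsonFolder       = "inst/cohorts",
  sqlFolder        = "inst/sql/sql_server"
)

cgModule <- CohortGeneratorModule$new()  # each Hades module follows this pattern

analysisSpecifications <- createEmptyAnalysisSpecificiations() |>
  addSharedResources(
    cgModule$createCohortSharedResourceSpecifications(cohortDefinitionSet)
  ) |>
  addModuleSpecifications(
    cgModule$createModuleSpecifications(generateStats = TRUE)
  )
# ...additional modules (characterization, estimation, prediction) chain on
# in exactly the same way.

# The committed JSON is the codified record of your analytic choices
ParallelLogger::saveSettingsToJson(
  analysisSpecifications,
  "inst/studyAnalysisSpecification.json"
)
```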
Running the Study Package
Now to the fun part. Jamie mentioned you want to get a friend to run your study, but you want to run it yourself first; don't bother your friends until you've actually done it yourself. To do that, there's a code-to-run file, a Strategus code-to-run file, as part of the template. The steps are really just to configure the connection to your CDM in your environment. We don't know your connection details, thankfully, so you have to provide those and run it yourself. You load the analysis specification from the JSON file, as I mentioned earlier.
[01:24:00]
You create execution settings to store the results: things like where on my file system the results should be written, and the minimum cell count for my site to preserve privacy; that's configured as part of the execution settings. Then you run the execution and review the results. All of the results are produced as comma-separated value files, and they are aggregate statistics; no patient-level data is created as part of the CSV creation process. And then, when you're ready, you share those results, or maybe you phone a friend to let them know it's time to try it themselves.
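Put together, the run script boils down to something like the following sketch. The schema names, cohort table name, and folders are placeholders, and the function signatures reflect my reading of the Strategus 1.0 API, so defer to the template's own run file:

```r
# Sketch of a site running the study package end to end.
library(Strategus)

# Site-specific connection details; these never leave your environment
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms     = "postgresql",
  server   = "myserver/cdm",            # hypothetical
  user     = "user",
  password = Sys.getenv("CDM_PASSWORD")
)

# The codified analytic choices, committed to the repo
analysisSpecifications <- ParallelLogger::loadSettingsFromJson(
  "inst/studyAnalysisSpecification.json"
)

executionSettings <- createCdmExecutionSettings(
  workDatabaseSchema = "scratch",       # where cohort tables get built
  cdmDatabaseSchema  = "cdm",
  cohortTableNames   = CohortGenerator::getCohortTableNames(
    cohortTable = "my_study_cohorts"
  ),
  workFolder         = "strategusWork",
  resultsFolder      = "strategusOutput",  # aggregate CSVs land here
  minCellCount       = 5                   # suppress small counts for privacy
)

execute(
  analysisSpecifications = analysisSpecifications,
  executionSettings      = executionSettings,
  connectionDetails      = connectionDetails
)
```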
Managing and Sharing Results
So, something we don't often cover that I wanted to mention in this tutorial: once you have results and you want to work with them, how does that happen? In a lot of the previous network studies I've participated in, I've been the man behind the scenes collecting the results and uploading them, but there's no magic to it. Essentially, once you've looked at your CSV files and feel comfortable sharing them, you send them to me and I upload them to a database.
[01:25:05]
But you could very well do that yourself. What I'm showing in this depiction is: you have these aggregate results in CSV format, you upload them to a Postgres database, and you use a results viewer to look at what your site has produced in terms of evidence for the study. If you happen to be at a site with multiple CDMs, like I am, the process is the same: we run it on our different CDMs, upload to what we call a central results database, and use the same results viewer. The same holds when you're doing a multi-site, multi-OMOP-CDM study, which is why we're all here. You can look at your results individually, but ultimately we want you to share them with the study coordinator, or rather, secure-FTP them to someone like me, who uploads them to a results database that Odyssey has, and we publish the results viewer on results.odyssey.org.
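There really is no magic to the upload step. The template ships proper scripts for it, but as a bare-bones illustration of the idea, loading one aggregate CSV into a Postgres results schema can be as simple as this sketch (server, schema, and file names are placeholders):

```r
# Bare-bones illustration only; the study template's upload scripts are
# the real mechanism.
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(
  dbms     = "postgresql",
  server   = "resultsserver/ohdsi_results",  # hypothetical results database
  user     = "user",
  password = Sys.getenv("RESULTS_PASSWORD")
)
connection <- connect(connectionDetails)

# Each results CSV holds aggregate statistics only -- no patient-level data
cohortCounts <- read.csv("strategusOutput/cohort_counts.csv")

insertTable(
  connection     = connection,
  databaseSchema = "my_study_results",
  tableName      = "cohort_counts",
  data           = cohortCounts,
  createTable    = FALSE  # assumes the schema was created by the setup script
)
disconnect(connection)
```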
[01:26:01]
Along those lines, we provide scripts that help you set up your database and upload the results, and the app.R provides the Shiny results viewer that you can use, and that will be used when you are running your network study. If you'd like to get involved with the open-source development of Strategus, we have a sub-team of the Hades working group that meets once a month. We formed the team earlier this year and we've recorded all of the meetings, so if you want to get into the nitty-gritty of the design decisions behind this new version of Strategus, you're welcome to look, and we would love to have you join the team. So thank you. I'll hand it off to Yong.
[Yong Chen]
Federated Analysis Methods
Okay, so for the next 20 minutes, I'm going to talk about different ways of running federated analysis without sharing patient level data.
[01:27:03]
So this is an overview. Odyssey has expanded over many, many data partners over the years; in ten years it has grown from a few dozen people at Columbia and J&J to several thousand people. I joined Odyssey about ten years ago, and I was fascinated by this open science research community, sharing ideas, giving feedback, and motivating next-generation methodology. In the next few slides I'm going to give you a few samples of federated learning analyses and the considerations that should be taken care of, instead of diving into the technical details. I invite you to think along with me about real-world challenges, and feel free to interrupt me with questions.
Data Sharing Considerations
So the common assumption is that we have multi-site data, which can be either EHR data or claims data, and we cannot share patient-level data.
[01:28:09]
The barriers to sharing patient-level data are usually HIPAA, or GDPR in Europe, these kinds of data sharing policies, plus very, very lengthy data use agreements. In practice, I think the most important issue is also ownership of the data, because once you give away your data, you lose control. Some collaborators promise, oh, I'm only going to run a logistic regression, but once you give away your data, you lose control. The other assumption is that each site has converted its data to the common data model. Most scenarios start with this type of setting, meaning that, for example, I work at UPenn and I have access to patient-level data at UPenn,
[01:29:02]
but I have friends, through Odyssey work or personal connections, at Columbia and Stanford who are willing to collaborate, but we don't have time to establish a lengthy DUA, so they can share some aggregated data. This is a very typical setting for this kind of analysis. Usually you have a study lead, people share some aggregated information, and you work together to write a paper.
Data Sharing Protocols and Budgets
I also want to invite people to step back a little and think more broadly: what are the data sharing protocols and budgets? The pioneering work putting federated learning, originally a pure computer science concept, into real-world studies was done by Lucila at UCSD back in 2010. A research consortium was built from UC San Diego across the state of California where people could share some data through a secure research network.
[01:30:13]
They basically built an infrastructure where, through a firewall, your database could be queried repeatedly to get updated results, iteratively. So it's an iterative federated learning framework, but a key limitation is scaling it up to a broader community. For example, if you talk to Penn School of Medicine, it would be very difficult to convince the privacy officer to connect the Penn database to some online firewall and allow iterative communication. The other framework is few-shot federated learning, also from computer science. In few-shot learning, "few-shot" means few samples, but here it means a few rounds of communication.
[01:31:01]
So it only takes a few rounds of communication. Later I will show that even that runs into synchronization and stability issues, but there has been a lot of work on reducing iterative federated learning to a few rounds of communication. And the last approach is well understood in one version but less understood in another: meta-analysis. I think most of us know meta-analysis. Basically, divide and conquer, right? You share a protocol and the code, as Sena said; distribute the protocol after thinking through the clinical problem, ask each site to run the analysis on their data, and then share some aggregated results. You just do a meta-analysis: inverse-variance weighting or some type of averaging. But historically, meta-analysis is very rigid, because if you just average some point estimates, you actually run into problems.
[01:32:05]
In pharmaco-epi we are often interested in adverse events, and you may have very few events in a database, so the point estimate and standard error are extremely unstable. In some cases you don't have any events in a database, but throwing away those data sets is going to create bias. So one-shot federated learning is a popular and important concept; I will talk a little more about it. Basically, we want to design the data sharing protocol in a clever way so that you get a result as close to the pooled data analysis as possible. The gold standard is always the pooled data analysis: assuming you can get through the DUAs and put all the patient-level data together, you run one big model, right? That's the gold standard. It turns out that in some scenarios we can achieve both one-shot, just one communication of aggregated statistics, and what is called losslessness.
[01:33:08]
You get identical results as if you had the patient-level data. That happens very rarely, and when it does, I call it the uniformly best solution, because you can do any downstream task you want. So let me do a quick survey.
Types of Analyses in OHDSI
In this room, as we know, we talk about evidence generation, causal inference, and target trial emulation. But I want to quickly find out how many of us are doing one of three things. Raise your hand if you are, for example, interested in characterization of a cohort. So we have about half the room. And what about quantifying an association, for example with a regression model, or doing causal inference or target trial emulation?
[01:34:00]
Could you raise your hand? Also about half. And the last one is risk prediction: you want to use other people's data to help boost prediction performance on your own data. How many of you are doing this? About a third, right? Okay, I'm going to tailor the material a little in the rest of the talk. As you know, if you have data all collected in one particular study, whether multi-site or single-site, you can do a lot of things: characterization, latent class models, topic modeling, subphenotyping, association, causal inference, sequential target trial emulation, hypothesis generation and testing, all kinds of things. These are just some example studies our center produced in the last 12 months. So, as I said, there is a lot of freedom in the type of analysis you can do in a single study when data are centralized.
[01:35:04]
Scaling Up to Multi-Site Studies
But the whole point of this tutorial is how to scale up from a single-site study to a multi-site study. The fundamental advantage is that a multi-site study covers a broader population, but nothing is free: this comes at the price of potentially introducing population heterogeneity, and there are all kinds of other costs, right? So be careful about the potential challenges of multi-site studies. Communication efficiency is something I want to emphasize. When you share data, for example the aggregated CSV files Sena talked about, if your algorithm takes three or four iterations, you have to do that again and again, and later you will realize you run into something called a synchronization problem.
[01:36:02]
Communication Efficiency in Federated Analysis
So let's look at a very idealized situation and try to answer the question: do we really need to share patient-level data to run a model? The answer is always, it depends, right? It depends on the complexity of the model and how smart the design is. Think about the simplest model, the linear regression model. We put all the data elements, aligned in the common data model, into this regression. Ideally, if you could pool patient-level data together, there's an estimator called the ordinary least squares estimator, this guy here. It turns out this estimator can be losslessly reconstructed by sharing a simple vector and a matrix. If you have 20 predictors, then regardless of how many patients your hospital has, say 2 million, you only need to share a 20-by-20 matrix and a 20-by-1 vector, and you get a lossless result with only one round of communication.
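To make the arithmetic behind that claim explicit, here is the standard least-squares decomposition (textbook algebra, not specific to any one package):

```latex
% Site k holds (X_k, y_k). The pooled OLS estimator decomposes by site:
\hat{\beta}_{\text{pooled}}
  = \Big( \sum_{k=1}^{K} X_k^{\top} X_k \Big)^{-1}
    \Big( \sum_{k=1}^{K} X_k^{\top} y_k \Big)
% With p = 20 predictors, site k shares only the 20-by-20 matrix X_k' X_k and
% the 20-by-1 vector X_k' y_k -- one round of communication, lossless,
% no matter how many patients the site has.
```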
[01:37:05]
This is a very simple statistics-101 result from linear regression, but it's quite profound. In federated learning, especially in the Odyssey community, I feel we should push for lossless one-shot methods as much as possible, because there are all kinds of complications in communication. This slide quickly dichotomizes the existing algorithms in the literature: there are iterative algorithms, such as those developed in the pSCANNER framework, and there are non-iterative ones. So now let's talk about characterization.
Characterization and Subphenotyping
Since I just did the survey: one of the main characterizations of patients is subphenotyping, or clustering, right? You want to find out who is more similar to whom. If you have data from multiple sites, the straightforward approach is divide-and-conquer clustering: run separate clustering analyses and try to somehow combine them.
[01:38:11]
That actually runs into two issues. One is that you may end up with different numbers of clusters; how do you aggregate? The second is that a cluster identified at the second hospital may look somewhat like another hospital's first cluster, or also like its second; how do you communicate, how do you learn from each other? The solution we proposed is pretty simple, actually. First, acknowledge that the populations are different; but at the same time, postulate that these heterogeneous populations are composed of common ingredients. For example, each population comes from a mixture of the same three subpopulations, while the proportions of each subpopulation can differ across hospitals, right?
[01:39:03]
This captures the between-hospital heterogeneity while still allowing the hospitals to collaboratively learn the common subphenotypes. This is a paper still under revision at the Journal of Machine Learning Research, but we have actually applied it in multiple tasks in the pediatric long COVID study, trying to identify subtypes of long COVID.
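Schematically, the model described above looks like the following (the notation here is mine, not taken from the paper):

```latex
% Hospital k's population is a mixture of J shared subphenotype components f_j,
% with hospital-specific mixing proportions \pi_{kj}:
p_k(x) \;=\; \sum_{j=1}^{J} \pi_{kj}\, f_j(x \mid \theta_j),
\qquad \sum_{j=1}^{J} \pi_{kj} = 1 .
% The component parameters \theta_j are common across hospitals (the shared
% subphenotypes); the proportions \pi_{kj} capture between-hospital heterogeneity.
```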
[Chan]
A question: how do you determine the number of latent classes? Do you do it from just one database, or do you first do federated learning to infer how many latent classes there are?
[Yong Chen]
Yeah. First of all, testing the number of latent classes is what statisticians call a non-regular problem; there is no universal solution, it's a non-regular hypothesis testing problem. So in practice, what we do is specify a range of possible numbers of latent classes, say three, four, five, six, up to ten, run each under the same protocol, and do it together.
[01:40:12]
Then, when you get the results, you look at the heat maps and the distributions and combine the results with clinical knowledge. All of this is unsupervised learning; there's no gold standard, so you should incorporate clinical knowledge. In this setting with nine hospitals, as you can see, the composition of the different subtypes is very different across the children's hospitals. The other analysis, which I think we presented in last year's tutorial at the SOS challenge, is combining causal effects. Suppose you are in a pharmacotherapy setting studying some rare adverse event, for example acute MI; those are things that happen with very low prevalence.
[01:41:06]
This is a paper we wrote within the Odyssey community, combining different causal regression models to aggregate results from different studies. More recently, people are doing a lot of target trial emulation, so we also offered a procedure that is actually lossless and one-shot for running federated target trial emulation. It turns out that if you train different propensity score models at different hospitals and then stratify on the propensity score, the bottom line is that regardless of how big your data set is, you just need to share a number of two-by-two tables. Suppose you stratify your propensity score into five strata, which is commonly used:
[01:42:01]
you just need to share five two-by-two tables, and you are guaranteed to get identical results as if you had pooled the patient-level data together. I'm going to skip the detailed results; this has actually been implemented at Penn, the University of Florida, sites in Canada, and Yale.
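To illustrate what sharing those stratified two-by-two tables buys, one standard way to combine them is a Mantel-Haenszel-type estimator; the talk did not specify the exact estimator used, so treat this as illustrative:

```latex
% In propensity-score stratum s (counts pooled across sites), the 2x2 table of
% treatment by outcome is:
%              outcome   no outcome
%   treated      a_s        b_s
%   control      c_s        d_s       with n_s = a_s + b_s + c_s + d_s .
% A Mantel-Haenszel odds ratio over the five strata:
\widehat{OR}_{\text{MH}}
  \;=\; \frac{\sum_{s=1}^{5} a_s d_s / n_s}{\sum_{s=1}^{5} b_s c_s / n_s}
```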
Quantifying Association with Regression Models
This was actually one of our earlier efforts, because people run a lot of logistic regression and Cox regression models, so we built a large number of algorithms supporting this type of regression analysis. The very early one was called ODAL. Its development was led by my former student, Rui Duan, now at Harvard Biostatistics; she led the development of the two algorithms called ODAL and ODAC, one-shot distributed algorithms for fitting multicenter logistic regression and Cox proportional hazards models.
[01:43:02]
These two models are very popular in epidemiology and biostatistics. Let me demonstrate how it works. Basically, at your local site, you train your logistic regression or Cox model and get your model coefficients, say beta-bar. Suppose your model has 20 predictors: you send your estimate, a 20-by-1 vector, to your collaborators. Each of them just needs to share back one quantity, the gradient of their log-likelihood function evaluated at that initial value. Then you revise your target function and do the iterations. It's very easy to implement, and I think it reflects two important philosophies. First, it is fundamentally different from iterative federated learning because it takes only one round of communication; there is iteration, but all of it happens inside your local data set.
[01:44:06]
That's number one: it takes only one communication. Second, it reflects the philosophy of most of our collaborations, which is that the lead should do 90% of the work and minimize what the data partners need to do, right? Your data partner literally just needs to take the initial value, compute that quantity, send it back to you, and they're done. To put this in a concrete setting: for an ODAL model with four predictors, you only need to share the right-hand side, which is very small, literally four numbers. We have extended this beyond ODAL and ODAC, to competing risk models and so on.
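The mechanics just described correspond to a surrogate-likelihood construction; a schematic version follows (my notation, simplified from the ODAL papers):

```latex
% The lead site has local log-likelihood L_1; L_N denotes the combined
% log-likelihood over all sites. Each partner evaluates its gradient at the
% initial value \bar{\beta} and sends back that single p-by-1 vector.
% The lead then maximizes the surrogate likelihood:
\tilde{L}(\beta)
  \;=\; L_1(\beta)
  \;+\; \big\langle \nabla L_N(\bar{\beta}) - \nabla L_1(\bar{\beta}),\; \beta \big\rangle ,
\qquad
\hat{\beta}_{\text{ODAL}} \;=\; \arg\max_{\beta}\, \tilde{L}(\beta).
% One round of communication: a single gradient vector per site.
```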
Heterogeneity in Multi-Site Studies
The other thing I want to encourage people to think about is heterogeneity. When you collaborate, people tend to talk about heterogeneity in very generic terms, but you need to push your collaborators, or yourself, to think about exactly what the heterogeneity is.
[01:45:12]
There can be heterogeneity in the relationship between the outcome and the covariates; the covariate distributions can differ; the patient population can shift; and data quality can differ, right? This is a model where we actually account for heterogeneity, and we were very lucky: we found a lossless one-shot algorithm. I'm going to skip the details. I think some of us are also interested in prediction.
Prediction and Transfer Learning
I want to share a basic concept, which is the difference between federated learning, or so-called distributed learning, and transfer learning. Distributed or federated learning is the task of training something in common.
[01:46:00]
We want to train a common model that everyone can use. Transfer learning, by contrast, uses collaborators' data as source data to benefit your particular prediction task. If another hospital does transfer learning, there should be a different model tailored to that particular target population; that is the fundamental difference. In the literature, I think people often think about heterogeneity across data sets but have not thought enough about heterogeneity within a data set. So this is a framework we call latent transfer learning; it's a forthcoming paper in Patterns (a Cell Press journal), and it considers both within- and between-dataset heterogeneity. So, just one minute.
Synchronization Challenges
For synchronization, let me give you one concrete example. We launched a study with J&J, working with Jena, and we gave people three months to update the results with two iterations.
[01:47:05]
And it turned out there were people, especially in Europe, who basically said, I'm going to disappear for the entire summer; this is quite typical. But they had already shared their first round of results. Should we drop them? Should we wait for them? That put us in a difficult situation. So I think it's very important to think about how far we can push toward one-shot methods. On the left-hand side are iterative algorithms, in the middle are few-shot algorithms, and we are working very hard to push for one-shot algorithms, meaning people run the analysis with only one round of communication; people would be happy with that, right? If you did a survey here: would you rather let your algorithm run for three days but communicate only once, or let it run for one day per round but do three rounds of communication?
[01:48:09]
I think most of us would rather let the computer do the work than handle the synchronizing. So I'm going to stop here and end with my contact information. Feel free to reach out; the slides will be available online. Thank you.
[Chan]
Ensuring Study Impact and Community Engagement
Okay, hi, I'm Chan from Korea. The agenda of today's presentation is how to ensure the impact of your study, engaging with this community to assemble a team of contributors, organizing the work of conducting a study, and lastly, managing the phases of study execution and dissemination. In our culture, the most important virtue to pursue is being humble and modest, but sitting next to some of the most successful researchers in our region, I want to say that I'm also a successful researcher.
[01:49:19]
So I'd like to share my experience as a successful researcher.
Importance of Ensuring Study Impact
So why do you need to ensure the impact of your study? Because most published research findings are false. According to Ioannidis, for discovery-oriented exploratory research with massive testing, the probability of a finding being really true is about one in a thousand; only one in a thousand such studies is true. But if you conduct an adequately powered exploratory study within an Odyssey network, the positive predictive value of your study will be around 0.2, so a much higher probability of being true.
[01:50:20]
These are comments from the reviewers of my second Odyssey study, the kind of comments you will get on your observational research. The reviewer said that this "remains a hopelessly flawed observational design using claims databases". In that study, I compared the effectiveness and safety of ticagrelor versus clopidogrel in patients with acute coronary syndrome. The evidence for this comparison was published in the New England Journal of Medicine in 2009,
[01:51:03]
in a really large-scale, landmark RCT. My study was published in JAMA in 2020, and it is unusual in cardiology for studies with different conclusions to be published in prestigious journals like these. The previous RCT showed that ticagrelor is better than clopidogrel in these patients, but I showed there is no difference between the two medications. And how did I argue against the reviewer? I emphasized that our approach represents a significant advance in observational research: we balanced thousands of variables, we used over 90 falsification endpoints, or negative control outcomes, and we published our entire protocol and all our source code before running our study.
[01:52:05]
We tried to avoid p-hacking, we ran it on databases inside and outside the United States, and we ran a large set of sensitivity analyses. That's what you can achieve within an Odyssey network study.
Challenges in Conducting OHDSI Studies
But challenges remain. It's not easy to find meaningful and answerable research questions, because I think only a subset of research questions can appropriately be answered using observational studies; many questions require interventional studies, that is, RCTs. So it's not easy to find the appropriate questions. And even though the OMOP CDM seems very simple, there are a bunch of diverse data sources across the world, and the conventions are actually very complex,
[01:53:06]
and the conventions themselves are evolving, so it's not easy to capture them all. The OMOP vocabularies are very comprehensive and very powerful, but mastering them is not a simple task, so it's not your fault if you don't understand them very well yet. And I have to admit that understanding the Odyssey study pipeline involves a steep learning curve. There are very advanced concepts, such as large-scale propensity models, empirical equipoise, and empirical calibration, and I know you may have no idea about these yet. And developing an R study package, even though Sena showed how effectively you can develop your own package with Strategus, is still very challenging; you may need a deep understanding of R, SQL, and the Odyssey packages themselves.
[01:54:11]
So anyway, we may need to accept these challenges. But the simple truth is that no one can do this alone.
Engaging with the Community and Assembling a Team
So you need collaborators. And surprisingly, there are actual people out there who want to help you with your research; like unicorns, they really do exist. But beware: collaborating with a diverse, multicultural, multidisciplinary team can be tough, especially when you have no control over them; you don't pay them. And yet, miracles do happen. Through collaboration, this community has published studies that are nothing short of miraculous. As Andrew shared, you may need many different contributors: science lead, clinical expert, methods experts, data experts, development experts, data collaborators, and project manager.
[01:55:18]
So how can you engage with the community to assemble a team of contributors? You need a team, and I recommend: don't wait. Start early and start strong. You can pitch your study on platforms like the Odyssey forum, or pitch it during a community call. Make your research's importance clear and spark interest. The good news is that, once again, the Odyssey community wants to help; this community is really eager to engage with your new research efforts.
[01:56:03]
But sometimes people won't immediately jump in. That happens; it happens even to a successful researcher like me. And that's okay. It doesn't mean your research isn't important, and it doesn't mean you are not a renowned researcher; it's not about your expertise or status. It's just natural; once again, you don't pay them. So please don't get discouraged. The most important thing is to keep going; perseverance matters most. Next, how to organize the work of conducting a study. Once you've assembled your team, you need to make sure you have enough experts on it. If you have experience with Odyssey studies, you can organize the team yourself,
[01:57:01]
but if you don't, reach out to one of the Odyssey experts to help organize it, because there are essential parts, such as protocol development, study package creation, running the study on real data, and synthesizing the results. You need to cover all of these essential parts, and you may need an expert to organize your team around them. I also recommend including a mix of junior and senior scientists for a healthy blend of enthusiasm and expertise; this helps foster the next generation of leaders in the Odyssey community as well.
Managing Study Execution and Dissemination
So how do you manage the phases of study execution and dissemination? Unfortunately, starting and initiating is the easy part of Odyssey network studies.
[01:58:03]
The real challenge is keeping people engaged through execution and dissemination, and most of the time people won't move the way you expect. In my experience, the key challenge was me, and for your research, the key challenge will be you. I've led many studies, I've initiated many studies, but only some of them were completed successfully; many times I failed. The problem was me, not the team, because during study execution and dissemination I got distracted or discouraged, and so I failed. That happens. There are several practical tips for managing this, like leveraging collaborative tools such as GitHub, Teams, or others.
[01:59:04]
Make sure to manage participants and authorship effectively and fairly; I recommend reading the Odyssey authorship guidelines for organizing the author list fairly. You are the leader of your study, so maintain leadership and motivation. As a leader, you need to maintain your team's motivation, especially during tough times; team members often lose motivation when facing obstacles, and it's your job to recognize those obstacles and help them through. And remember, the community is there for you. My final remark: once again, network studies are nothing short of miraculous. Failure is the default, so success requires miracles at each stage.
[02:00:00]
And it's okay to fail. Please do not disappear, and do not give up because of a failure. There are other challenges: I thought my nationality, my language, my race, and my different time zone would be obstacles for my studies, but those turned out to be manageable. The real challenge is finishing: starting is tough, but maintaining and completing the study is even harder. Before you lead, check your own passion, perseverance, and leadership. And I really encourage you to participate in others' studies first; before expecting others' commitment, show your own. You can learn a lot by participating in others' studies, in terms of skills, management, organization, and leadership. I believe my dedication was rewarded, your dedication will be too, and the community will remember it.
[02:01:07]
Finally, we are here; we are all here for you. So please reach out to us if you need anything. Thanks. Thank you. Thank you.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
All right.
[Andrew Williams]
Break and Introduction to Panel Discussion
Well, we have a 20-minute break now and then we're going to follow on that excellent series of presentations to talk about what has worked well and what hasn't worked as well as planned with some real use cases. You can tell everybody involved has been actually leading Odyssey Studies. And then a very exciting conversation we're gonna have with you following that panel discussion where we're gonna identify some ways in which we can use the community resources to actually improve things. So all of these challenges have been identified. We're gonna zero in on a couple and really find things that are feasible to tackle in some sense going forward.
[02:02:02]
And so 20 minutes from now, please be back and we'll start off from there. Thanks.
Panel Discussion: What Works Well and What Doesn't
All right. For the next 40 minutes or so, we are going to discuss in a little more depth things that have gone well, why it makes sense to try to manage all the challenges people have raised and do all the things we've described, and things we think could benefit from improvement. Again, that's going to set up a conversation leading to some kind of action in the second part. We're going to go through each panelist one at a time, and they'll talk a little about an example of something that has gone well, maybe highlighting reasons why they think it went well, and the flip side, things that could be improved. And I'm just gonna go in order across here, starting with him.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Ben's Experience: Asking for Help and Resilience
So: one thing that has gone well in my observation and experience, and one thing I would do differently if I could do it over again, our area for improvement.
[02:03:10]
Or we can just say something that didn't go well. Again, I'm a postdoctoral fellow; I learned about Odyssey a little over two years ago, so I'm very new, but I've been thrown into the whole process and have been involved in several network studies at Johns Hopkins. I'll start with the thing that I would do differently, and will do differently, something my fellow panelists highlighted several times in the earlier presentations: willingness to ask for help and use the community. It goes back to the instinct of, no, I can do this all by myself; but it's not something to be proud of to do more on your own
[02:04:03]
if it could go better by utilizing the community and the network of help available to me. It's faster, it's more efficient, and it's less prone to mistakes. In one particular network study, there were three of us, all pretty new to Odyssey. There was a clinical lead; this was in myositis, so rheumatology. I'm not a clinical person, I work across a bunch of different clinical domains, and two of us were data, computer, informatics guys. We were trying to figure out the steps I presented in my section. Dr. Meckley had a very good idea of the question he wanted to ask and the use case for Odyssey, and in this particular study we kept it very simple, which was a smart plan: it was just to evaluate computable phenotypes, a term this room is probably pretty familiar with.
[02:05:11]
It was a series of definitions to identify patients with a certain type of myositis, dermatomyositis. We had a bunch of different cohort definitions representing different ways we could identify those patients, and we were using a new tool that Joel Swerdel developed called PheValuator. You can learn more about it elsewhere, but it basically trains a predictive model based on a highly specific cohort definition and then uses that model to evaluate the probability that your other phenotype definitions are accurately identifying patients with, in this case, dermatomyositis. So the three of us went around and around and around
[02:06:00]
trying to understand exactly how to use this, in the scheme of things, pretty simple, pretty straightforward tool, with documentation all out there. We tried to do it on our own: understand how to use the tool, how to interpret the results, how to set it up so it would be most informative for identifying these patients. And we did ask for help; we were involved with the phenotype development and evaluation working group, and they were very helpful, and Joel, the guy who wrote the package, is a member and was helping us along the way. But we ultimately went in circles, and at the very end we said, okay, it's ready, here's the package, and we sent it to our data partners, and Joel himself went to run it,
[02:07:01]
and it went okay, but there were certain errors where we finally asked, okay, what's going on here? It caused us to work all the way back and realize we had made some problematic design choices in the execution of this very simple package, which basically forced us to start over. We did a lot of work only to realize we needed to start over. The way we found this out is that we finally sat down with Joel, the guy who wrote the package, and he looked through our code with us and said, that's where you're going wrong. It was something as simple as excluding a list of concepts; I had misread the documentation, and I understand why I made the mistake, but if we had done this first rather than last, it would have saved everyone, our data partners, but mostly us three, a lot of time and effort.
[02:08:00]
So: asking for help, asking for it sooner, and not being afraid to. I think the reason we made this mistake is that it's counterintuitive; these are the people maintaining the package, and it seems too good to be true that they would be willing, and have the time, to sit down with one little study team that's running their package and look through their code. But what I came to realize is that these people are willing to help you in exactly this way. It just seemed like that wasn't accessible. I know people say, use the community, use the Odyssey network to ask for help, and in my head that was one of those things you're supposed to say because it's nice, but in reality people would be too busy to worry about my little package errors. That was not true, so next time I will do it in the reverse order,
[02:09:00]
where I ask for help first, rather than trying to do it all on our own and having to deal with it later.
[Andrew Williams]
So one thing that didn't go well was being reluctant to ask for help. Before you go on: would it be safe to say the lesson is not only don't be reluctant, but also check sooner in the process? Being reluctant prevented you from checking sooner, before the network was invested.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
I will now, just in case. It would have been better to go to Joel and say, can you review our code, and have him find no errors; maybe I'd have wasted 30 minutes of his time, versus the hours we wasted of our own. So erring on the side of being too willing to ask for help is not a bad idea, and sooner, of course. And then the thing I have observed to be a major key to success is resilience.
[02:10:01]
I think one of our goals with this session was to level-set expectations about how difficult this is, and the many ways that errors, problems, and challenges can arise when you're doing this type of federated analysis. The network studies I have seen go from start to completion have always had at least one, usually two, people committed to the death to see it through. If you've seen Cindy Cai's work in the ophthalmology group: she's at Johns Hopkins and has led two network studies, and this most recent one was my first time being involved. I was just running the network study package for one data partner organization, Johns Hopkins, but she was the network study lead, and watching her, it's like wrangling cats or juggling balls, there are so many moving parts, and I'm sure it is exhausting,
[02:11:01]
and she did it with consistent, resilient effort, and patiently. I'm sure it was very frustrating to keep that whole large group moving. Just with me personally, she had to check in so many times, because I'm busy and hadn't done my piece yet. All of that administrative burden clearly requires a degree of resilience from the project lead. If I were to lead a network study, I would need to be sure I'm ready for that. I think I could handle it, but it's much easier knowing up front the degree of commitment it's going to take, rather than thinking it will go smoothly, push button, get answer. When it turns out to be much more difficult than that, I can understand why people might want to jump ship; we all have limited time. So level-setting matters, so that people know what they're getting into and have the commitment to see it through to the end, because in both projects I've been involved with
[02:12:02]
that have gone from start to completion, there have been many moments where things just take longer, there are unforeseen challenges, and it's harder than we thought, even though we thought it was going to be hard. Having that prepared resilience to deal with those moments has been key to getting a network study to completion. So I think resilience is both a good idea and a requirement.
[Andrew Williams]
So it's very similar to the lessons learned that Chan left us with: not giving up. Not giving up. Yeah. Maybe on to Sena.
[Anthony Sena]
Sena's Experience: COVID Studies and Strategus Challenges
Yeah, so the first network study I was involved in was in March of 2020, which you might remember was kind of a big moment for all of us. We were embarking on some COVID studies to try and help answer questions about what was going on, and prior to that, I remember there were a lot of social impediments to doing network studies before the pandemic was upon us.
[02:13:08]
A lot of folks were saying, this sounds like a great idea, but I don't know that I'd ever get permission from my organization to run a network study and provide results. That was a bit of a watershed moment, in that the need for this information trumped some of the organizational bureaucracy concerns, and it allowed us to move forward as a community and really show the value of what we can do together with our data when we're allowed to do so. For me, that was an important moment in terms of what really can go well with the commitment of the community to push a network study forward, and I think since then it's snowballed; people now feel they can do a network study with support from the community and its expertise. Do you want a bad thing?
[02:14:00]
[Andrew Williams]
Yeah, I want a bad thing.
[Anthony Sena]
You want a bad thing?
[Andrew Williams]
What goes wrong that you think we could draw some kind of lesson from? So you could just complain if you want, or we could sort of think about something that we might draw a lesson from that you've observed not going well.
[Anthony Sena]
Yeah, there's no shortage of those. I would say the Save Our Sisyphus Challenge last year was tough for me personally, in that that version of Strategus, zero dot whatever three, didn't work very well at a lot of sites. We had an architecture based largely on differences in the way sites might have to set up environments to run the different Hades packages. To make a long story short, at some sites it did not work well, or at all, and that was an important learning that went into the newer version of Strategus. So even though the Save Our Sisyphus Challenge was a largely successful effort, it highlighted a lot of problems with the Strategus package that we were able to leverage and improve upon this year.
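For context, the current Strategus workflow Sena alludes to boils down to loading a JSON analysis specification that the whole network shares and executing it against the local CDM. The sketch below follows the shape of the newer Strategus API as I understand it; exact function names, arguments, and the file path are assumptions that may differ across versions.

```r
# A rough sketch of executing a Strategus study specification at one site.
# Names follow the newer Strategus API; verify against your installed version.
library(Strategus)

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "myserver/ohdsi",
  user = "user",
  password = Sys.getenv("DB_PASSWORD")
)

# The shared study package ships this JSON; every site runs the same spec.
analysisSpecifications <- ParallelLogger::loadSettingsFromJson(
  "analysisSpecification.json"  # hypothetical file name
)

executionSettings <- createCdmExecutionSettings(
  workDatabaseSchema = "scratch",
  cdmDatabaseSchema = "cdm",
  cohortTableNames = CohortGenerator::getCohortTableNames(
    cohortTable = "study_cohorts"
  ),
  workFolder = "strategusWork",
  resultsFolder = "strategusResults"
)

# One call runs every module (cohort generation, diagnostics, estimation, ...)
# locally; only the aggregate results in resultsFolder go back to the network.
execute(
  analysisSpecifications = analysisSpecifications,
  executionSettings = executionSettings,
  connectionDetails = connectionDetails
)
```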
[02:15:08]
[Andrew Williams]
So the first thing you were saying was how inspiring it was, in a key moment of global response to a crisis, that things could come together and provide value in a way that clearly mattered, and that showed the whole enterprise being very much worth all the effort that goes into it, despite its warts. And the second was more of a learning opportunity you took from an earlier effort to help automate study package generation and make that piece of the work easier for everybody. So that's great, thanks.
[Yong Chen]
Yong's Experience: Large-Scale Collaboration and Data Limitations
Yeah, so I'm going to talk about the good things first, the positive things. I have done a number of network studies, about twenty-some, some led by myself and some led by my students and postdocs.
[02:16:08]
But the largest-scale one was also at the beginning of the pandemic. I think the silver lining of the pandemic is that people started to collaborate internationally. We ran a study to identify risk factors that characterize length of stay for admitted COVID patients, and we involved as many sites as possible. We recruited data partners from the Odyssey forums; within a month, we had recruited 12 data partners, some from Spain, some from Korea. There was lots of enthusiasm, and they gave lots of interesting feedback. For example, in the US you typically run a model with race as a variable to adjust for, but we got feedback from our Korean collaborators saying, we don't have a race variable, because everyone is Korean, right?
[02:17:04]
This is actually a good example of where divide-and-conquer meta-analysis would fail, because how would you adjust for a race variable in a database where it has no variation? So federated learning actually offers a lot of advantages. We ended up writing that paper together and later published it in Nature Communications. At that point, I volunteered to serve as the point of contact, just to get a taste of the scale of communication a real Odyssey collaboration involves. It turned out to be about 300 emails, and yeah, I quickly regretted it; I wish I had passed that to my postdoc and asked him to lead. But it motivated us to build a web platform, similar to what Sena talked about, but as a centralized, SFTP-based, algorithm-specific, self-guided platform.
[02:18:12]
People can use it for free; you can build your community, build your avatar. We spent at least a quarter million dollars to build it: first the prototype, then we outsourced to a commercial company to build it from scratch, and we have done stress tests. Those are the kinds of efforts people can push forward. As I said, going from few-shot learning to one-shot learning is hopeful. I used to think one-shot could only work for simple models, like linear regression, but just recently we found that for more complex models we can be lossless in one shot. So I'm very hopeful in this regard. And the good thing in the Odyssey community is that we train people to understand the data and to develop protocols, like what Nicole said at the beginning.
[02:19:03]
Think very hard about what you want to estimate and what your target cohort is. Those are very, very promising. But step back and think end to end: what is the time budget for running an end-to-end network analysis? Thinking through the problem, developing the protocol, and making your data ready takes at least 40% of your time, right? The communication, like those 300 emails, takes another 45%. Writing the paper is actually the least challenging task, at most 10%, and revision maybe another 10%. So I think as a community, as statisticians and informaticians, we can push that federated analysis portion to be as simple as possible, like one-shot.
[02:20:00]
My dream is that after people get their data ready, we can schedule a one-hour Zoom meeting, get the analysis done, and do the troubleshooting in real time. It's totally possible, okay? So those are the positives: I'm very hopeful for more and more scalable federated learning and federated analysis. The downside, the part where I hope people will do more, is thinking about what your data can do and what it cannot do. We need more critical thinking. I know data partners have a lot of enthusiasm to join a study, but sometimes your data has limitations: fragmentation, miscapture of patients, under-reporting. Vaccines have a lot of under-reporting, the COVID vaccine for instance. Now people are working on GLP-1s, where there is a lot of over-reporting: people are prescribed a GLP-1 but, I was told, three out of ten patients cannot afford to actually take it after being prescribed, because it's very expensive.
[02:21:07]
So we need to think about what your data captures, its limitations, and the data provenance; I hope people can think more about that. I had a conversation with Patrick about this diagnostics package people are pushing. We are asking people to run a set of diagnostic metrics and then objectively evaluate whether your data is fit for purpose to answer your clinical question, and if your data is not ready, your data center should not participate in the study. Remember, in some cases introducing noise is not good; it's not the bigger the better. You want a relatively homogeneous set of databases answering the question you are targeting. Regarding authorship, I then asked him the question: for those sites who ran the diagnostic metrics but failed, meaning their data did not directly influence your final estimate, should they be included as co-authors?
[02:22:14]
He would suggest yes, because people spent effort. I think we should develop an encouraging mechanism for people to participate, and acknowledge their authorship whether or not their site got selected; this way there is no selection bias, right? Otherwise, people tend to say, oh, just include my database in your project. So I think we should do better in terms of being more inclusive, with a better publication mechanism, by building these diagnostic metrics into the evidence generation pipeline. So that's from me.
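Yong's point about lossless one-shot federation is easy to see for linear regression: if each site shares only its summary statistics X'X and X'y, the coordinating site can reconstruct exactly the estimate a pooled analysis would give, without any patient-level data moving. It also shows why a covariate with no variation at one site, like race in an all-Korean database, breaks a site-by-site meta-analysis (the coefficient is inestimable locally) but is harmless in the federated fit, where that site's statistics simply contribute no information about that coefficient. A self-contained illustration on simulated data, with no OHDSI dependencies:

```r
# Lossless one-shot federated linear regression on simulated data:
# each site shares only t(X) %*% X and t(X) %*% y, yet the combined
# estimate equals the pooled ordinary least squares fit exactly.
set.seed(42)

simulate_site <- function(n) {
  X <- cbind(1, age = rnorm(n, 60, 10), female = rbinom(n, 1, 0.5))
  y <- X %*% c(5, 0.3, -1) + rnorm(n)
  list(X = X, y = y)
}
sites <- lapply(c(200, 500, 300), simulate_site)

# Each site computes and shares its sufficient statistics (one shot).
shared <- lapply(sites, function(s) {
  list(xtx = crossprod(s$X), xty = crossprod(s$X, s$y))
})

# The coordinating site sums the statistics and solves the normal equations.
xtx <- Reduce(`+`, lapply(shared, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(shared, `[[`, "xty"))
beta_federated <- solve(xtx, xty)

# Reference: pooled analysis with all patient-level data in one place.
X_all <- do.call(rbind, lapply(sites, `[[`, "X"))
y_all <- do.call(rbind, lapply(sites, `[[`, "y"))
beta_pooled <- solve(crossprod(X_all), crossprod(X_all, y_all))

all.equal(beta_federated, beta_pooled)  # TRUE: identical up to rounding
```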
[Andrew Williams]
Great. So, inspired by some of the rapid advances and important methods research that were possible, that's one of the positive takeaways. And on the other side, the pitfalls of limited transparency about data support for particular questions, and of having a lot of effort invested, only to discover at the end that some sites don't have the data, or that most of the sites don't really capture the outcomes or the exposures in a representative way.
[02:23:19]
And I think those are really important challenges. And at the end you were starting to talk about some possible remedies for those data support transparency issues and how to plan around them. It's exciting.
[Yong Chen]
Yeah, we should identify the problem before we run the analysis, not after.
[Andrew Williams]
Right. So, building on this evidence network and the dbDiagnostics package everybody runs, to get more granular. Because maybe part of what you're saying is, you could run that and still not really see whether the data are representative in a way that would satisfy most research criteria. And also squaring engagement of sites that don't have perfect data with limiting analyses to those that have good-enough data, and how we develop those standards as a community. Is that right?
[02:24:11]
[Yong Chen]
Yep, precisely.
[Andrew Williams]
Great.
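To make the diagnostics gate concrete: the decision Yong and Andrew are circling, whether a site should run the full analysis at all, can be expressed as a simple study-level rule over each site's diagnostics output. Everything in the sketch below is hypothetical (the column names and thresholds are invented); in practice the inputs would come from tools like dbDiagnostics or study-specific diagnostics.

```r
# Hypothetical fit-for-purpose gate applied to per-site diagnostics output.
# Column names and thresholds are invented for illustration only.
site_diagnostics <- data.frame(
  site             = c("A", "B", "C", "D"),
  target_count     = c(12000, 450, 8000, 90),    # patients in target cohort
  outcome_captured = c(TRUE, TRUE, FALSE, TRUE), # site records the outcome?
  follow_up_ok     = c(TRUE, FALSE, TRUE, TRUE)  # adequate observation time?
)

passes_gate <- function(d, min_target = 500) {
  d$target_count >= min_target & d$outcome_captured & d$follow_up_ok
}

site_diagnostics$include <- passes_gate(site_diagnostics)
site_diagnostics
# Sites failing the gate are still acknowledged (e.g., in authorship, as
# Yong argues) but excluded from estimation, so they add no noise to the
# meta-analysis.
```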
[Nicole Pratt]
Nicole's Experience: Community Studies and Prioritization
Okay, so I've got a lot of examples, but just as a bit of background, my first network study was back in 2015, which was the first Legend hypertension study. That was nearly 10 years ago; wow, I can't believe it, a long time ago. One of the things that's gone really well for me is participation in large community studies. The Legend hypertension study, for those of you who might not know it, was really one of the biggest studies Odyssey had ever done to that point.
[02:25:02]
It's been followed up by the Legend diabetes study, which you'll hear about tomorrow, and there have been other big community-effort network studies I've been involved with, like the SOS challenge and a few other of the bigger network studies. What's gone really well for those is that it's been a big community effort, right? It hasn't been one person driving it; it has been the community, with a lot of people invested in getting things done. Don't underestimate, I suppose, the power of that and the ability to get things done when there's such a big community all working towards the same goal. Everyone in that group had experience in different aspects, and that was probably what worked well.
[02:26:00]
I think we made the point a few times in our slides this afternoon that you need variation in expertise and variation in experience, and having all of those people along the way is what I think drove the success of those really big studies. What's gone poorly is when I've tried to do it all by myself, as everyone has said. You don't often put yourself as a priority, right? When you're working in these big collaborative studies, you have a small piece of work to do and people are expecting something of you, so you prioritise that: you try to get it done and get it to them, that next step; you participate and put that as the priority. When you're running your own study, you don't often put yourself as the priority.
[02:27:04]
Everyone else becomes the priority. If I were to run a network study from scratch, from an idea I'd conceptualised, I would try to build into my work goals that I would force myself to achieve. Having good structure and good discipline for your own work is very important, because I can tell myself off, but so what? I don't care, I'll move on, right? So that has been a key learning for me. I also wanted to respond to what Yong said, that the publication part is the easy part. I would disagree. It may be 10% of the effort, but it seems to be a massive challenge for us to actually get to publication.
[02:28:04]
I've just seen Martijn walk into the room, and you might want to talk to him about some of the great work he's doing in helping the community actually get to that publication stage. I think that's one of our key challenges: we have this great evidence from some of these studies, but when it gets to the point of actually writing the publication and getting that evidence disseminated, we tend to drop the ball sometimes, right? We think, oh, there's something else we need to work on now, so I'm going to prioritise that and not the publication. If there are ways we can facilitate that, and get the publications out from these really important studies that we've all put so much effort into, I think that's really one of the key areas. There's one other thing I was going to say that went poorly with the SOS challenge: we had a lot of people in the community saying they wanted to run the network study we were doing on the fluoroquinolone analysis, and that was great, but when it came down to it, we needed specific databases with specific data in them.
[02:29:22]
We required people to be hospitalised for aortic dissection and aneurysm, and some of the databases that participated were community-only, so they didn't have that really critical element of the analysis, and we then didn't have an estimate from those databases. Or we had smaller databases in the Asian region that weren't able to pass diagnostics, so we weren't able to include them in the overall meta-analysis. With hindsight, perhaps we could have tried to get more people from the Asian community involved in that study, but we were also on a tight timeline to get the study done.
[02:30:14]
We really wanted to get the results out for the Australian Odyssey conference we had last year. So again, there's some management you really need to do before some of these big network studies, to make sure you've got the right data, that those data have the elements you need, and that you're able to get good representation across the world.
[Andrew Williams]
So on the positive side, another report, in this case of the landmark first Legend hypertension study, which was kind of a new thing under the sun, an amazing accomplishment. It was also the thing I referred to in my intro as a presage of the clinical trial result that came out subsequent to it, amongst the other things that are notable about it.
[02:31:02]
So that was an inspiring event for you. And then the downsides; I think you had three. One was that it's not easy to judge, as a leader, what a study is going to require of you. Another is what you were just saying: having enough time and enough transparency about data support, or in this case data representativeness from a global point of view, maybe in addition to the data elements, representation from specific regions. And the third was keeping it all going until you deliver the paper at the end.
[Nicole Pratt]
So not- Accountability and prioritization of yourself to finish the job that you started. And I think that's not an Odyssey problem.
[02:32:00]
That's not a network study problem. That's everybody's research problem. So yeah.
[Andrew Williams]
Opportunity for optimization, that's great.
[Chan]
Okay, so I've already addressed what's related to my success story and my failure story. But first of all, once again, we need very appropriate study questions. There are many study questions, and many people ask me to collaborate on Odyssey studies, and many times, even though I cannot tell them so honestly, their research questions cannot be answered by the Odyssey network, sometimes because there is not enough data captured in our current Odyssey data resources and network, and sometimes because the question cannot be answered by observational research at all.
[02:33:05]
Sometimes it requires interventional studies, such as an RCT. So selecting a very appropriate research question is the key to success. Before my first Odyssey study, I participated in several Odyssey studies, and I learned a lot from those experiences. But still, my first Odyssey study was rejected by six or seven journals without any review, and I felt really frustrated at the time. I was really ambitious then, because it was quite a large-scale study,
[02:34:01]
and I thought my research question was very clinically relevant, but the final outcome was not so successful. After that, I participated in the Odyssey Legend hypertension study, and we published the first study in the Lancet. I learned a lot from that study as well: how to organize the results within the paper, how to answer the reviewers' comments, and how to describe the study's results within the manuscript. And then my second study was published in JAMA. So I think one of the key elements of success is prior publications from the Odyssey network.
[02:35:03]
Because they can serve as a reference for your current study: you can learn a lot about how to organize your results, how to describe your study, and how to deal with comments from reviewers. So I think we need more studies and publications as references for future studies; it may facilitate more Odyssey studies. And finally, even though I initiated lots of studies, I did not continue all of them, because I got distracted and sometimes simply stopped continuing a study. So I need to check my commitment and my time before initiating a study. That's really important.
[02:36:02]
And lastly, once again, just like Nicole, I don't agree with Yong that writing the paper is really only 10%. It's easy for Yong. For me, developing a study package and protocol and recruiting data partners is getting easier and easier, even though it is still hard. But organizing and synthesizing the results across the data partners, and making a story out of them, is sometimes very tough. You cannot predict the results before executing the study package against the data partners, and sometimes the results are quite the opposite of what you expected.
[02:37:01]
In my JAMA paper, I thought that ticagrelor would be better than clopidogrel, just as the RCT showed. But it was the opposite: these two medications were not different. At the time I was shocked, and once again I felt frustration, because I wanted to show some consistency with the previous large-scale RCT, but the result from our study was quite the opposite, and I was not sure whether the reviewers or editors would believe it. But I stepped forward, because that's what an observational study can show: ticagrelor, the newer drug, is much more expensive, and its compliance is lower than the older drug's.
[02:38:02]
I think that's the key difference between RCTs and routine clinical practice, real-world data, and that's how I appealed to the reviewers and editors. So ensuring the impact of your study is very important. And because you cannot predict the results from the data partners, it can be challenging, but you have to overcome these challenges for your study. Thanks.
[Andrew Williams]
And to summarize: I think you said it's really important, and sometimes very hard, to make sure you've picked a question that the data can actually answer. It's maybe similar to being able to vet the data support, but you can also switch the perspective: rather than only asking, do you have the data for this question, ask whether you're really thinking clearly about what kinds of questions can be answered to begin with. Studies can run awry if that isn't done well.
[02:39:09]
That was one of the things you pointed out. You also reiterated, I think very eloquently, the need for persistence and bringing things home at the end, and then the inspiring experience of participating in very successful research, in this case your wonderful publication in JAMA, which really showed some of the highlights of what observational research adds to what the clinical trial literature shows. It really was a great example; I was glad you emphasized it.
Andrew's Experience: Danish Colorectal Cancer Study and Oncology NLP Study
I'm going to give a couple of my own. The first one isn't really a study I was participating in, but it's one I'm always struck by whenever I see it presented. The folks in Denmark at Zealand University Hospital, Andreas Weinberger Rosen and Ismail Gögenur, are often the people presenting on those studies and help lead those efforts.
[02:40:03]
One works on infrastructure and analytics, and the other is a lead clinician in a unit focused on colorectal cancer surgery. What always strikes me is how beautifully they characterized every aspect: what data are available, the natural history of outcomes in the patient population of interest, and how those features can be brought to bear on a decision that has to be made. In this case, it's a decision about the design of post-surgical care and how to prepare for it at the systems level in Denmark. They thought very innovatively, but very clearly, about what that decision is, what the triage mechanisms are, and what the actual options are. So both the data, the clinical situation, and the decision were very clearly articulated, and as a result, post-surgical
[02:41:04]
colorectal cancer care has changed in the whole country. It's a shining example of how to put all the pieces together in the right way, affecting the population of an entire country. So that always comes to mind when I think about what goes well, alongside these other examples that have been brought up. And the one that I am responsible for not going well, and I'm sure there are several, is a current, ongoing oncology NLP study. Here, I tried hard to really leverage all the standardization we've been hearing about, right? That's what Odyssey does. You don't do standardization for its own sake; you do it because it allows you to reuse things, to scale, and to build on prior work, because that work has been standardized and preserved in the way Odyssey allows.
[02:42:01]
In this particular study, we're looking at demonstrating the added benefit of using NLP on top of structured data for oncology research. We thought, we'll reuse a prior oncology study, do the same thing that was done originally, then add the NLP, and we'll have a very simple, clean result: look, we got all this extra data, we can do this more nuanced phenotyping, and so on and so forth. I still think that's a good way to go. But similar to what we've heard here, we picked a study, and it was a good study by good people, nothing bad about them, but I didn't do a good enough job of leading that and saying, we're going to really vet the study we're building on. A couple of months in, after monthly meetings, everybody organizing, everybody doing stuff, I realized: actually, this wasn't the right study. So it's another object lesson, and in this case it's not data support or getting the question right; it's picking the right things to reuse and build on. If you've got all these reusable assets, in some cases whole study protocols that can jumpstart your work, you still need to vet them very thoroughly, and you need to do it in this big, open, collaborative way.
[02:43:06]
And in this case, I am responsible for the early conversation where we didn't really pick the right study. We're on a good track now; everybody should join that study, it's a fantastic study. And that's the end of my little contribution there.
Discussion: Identifying Targets for Improvement
We are exactly at the time to start the most exciting part of our tutorial. This is the part where we move the whole community forward, so please don't head for the doors. I'm not singling you out, I promise; I'm just saying a mass exodus at this key moment would be a little discouraging. So, again, I've said this a couple of times: we want to have a little discussion about what we've just heard, with the idea of identifying targets, things we might improve that are really feasible to improve. We'll probably have two artifacts at the end. One will be a list of things we might improve about how studies are done; the other, a smaller set of things we want to act on, with next steps for how to act on them, based on how the Odyssey community does its business.
[02:44:10]
We organize ourselves in work groups, we have communication channels, we have strategies for getting work done collectively as a community. So we're going to have a bit of a free-ranging discussion, kicked off by our illustrious panel, probably by me and then going down the line, and then a broader discussion: what does everybody think is important, and what could we actually do? With about 20 minutes left, we'll move to: okay, what are we actually going to do? Who wants to do something, if we've got next steps defined for a feasible target? Does that sound good? Does that sound like the most thrilling thing you could possibly do at 4:20 in the afternoon? I know it does; you don't have to dissimulate, I see it in your faces. I'm going to start this off and say, I think broadly, here's a target.
[02:45:00]
Andrew's Suggestions: Project Management and Effort Planning
I think we need improved organization and task management in studies. If you're not an old hand, if you haven't done this a whole bunch of times, you need some support: what does that look like? How do you organize everybody, make sure tasks are appropriately assigned to the right people, and keep track of all of it? That's just hard. We listed all of those different expertise types, and at the bottom I put project management, and that's not a group we have a place for in the community; I think we should. So the target is improved project management, specifically organization and task management for studies, and the suggested remedy is, like we always do, to start a work group, but also maybe add an Odyssey forums channel, and in general just start to say, hey, we don't perform well without really good project management. We can't pretend: I'm a statistician and also a fantastic project manager; I'm a clinician and also a fantastic project manager. A lot of times, those things don't go together.
[02:46:00]
Some people bring those things together, but a lot of times we don't, and trying to play too many roles is not a good thing. So that's my kickoff. Actually, I'm going to start with two ideas; that's one. The second is improved planning with respect to effort. As you're thinking, I'm going to do this Odyssey study, I've got the right question, the right data, the right people involved: what's it actually going to take? How many months or years? If you had to be in charge of a budget, for a grant or your company or something, you'd have line items, with people doing certain kinds of work that take certain amounts of time, and you'd need a reasonable estimate of that, or nobody would approve it. And in a big volunteer effort like this, what do you do? One of the things we don't know how to do is help people who want to participate by saying, this is about how much of your time it's really going to take. It's probably not the same amount of time for everybody; it will vary with how much experience you have. But I think we could take a first step at the community level to improve time and effort estimates associated with different study roles.
[02:47:08]
So we went through those different study roles, and I think the thing we could do, food for thought, is create a standardized template for capturing projected and actual time for a given role. We've seen a bunch of different nice protocols for studies, so just have a little place where it lives: okay, we think this is going to take X number of hours; whoever's in that role gives an estimate at the end. I'm not asking anybody to clock in and out like a lawyer, but roughly: this took X number of hours to get done, and overall, how long do you think the study is going to take? There could be a simple way to see how far off we are when we guess it will take X hours for the person doing the study package or the phenotype development. So those are my two suggestions. I'm going to hand it off to Ben. What are yours?
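As a straw man for the standardized template Andrew proposes, it could be as small as one table per study, with a row per role holding projected and actual hours. Everything below is hypothetical, invented purely to make the shape concrete; the roles and numbers are illustrative placeholders, not community standards.

```r
# Hypothetical projected-vs-actual effort template for one network study.
effort <- data.frame(
  role            = c("study lead", "phenotype development",
                      "study package development", "data partner execution",
                      "results synthesis / writing"),
  projected_hours = c(120, 60, 80, 20, 60),
  actual_hours    = NA_real_  # filled in by each role at study close-out
)

# At close-out, one derived column shows how far off the plan was, which is
# exactly the community-level feedback loop Andrew is asking for.
effort$overrun_pct <- round(
  100 * (effort$actual_hours - effort$projected_hours) / effort$projected_hours
)
```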
[02:48:02]
Oh, I should say, while you're doing this: Ben, thankfully, thank goodness, is noting things down. So we'll have these two documents. One will be ways we might improve the world; the other is the one we'll focus on at the end, saying what we might actually agree is a good thing to do and who wants to participate in it. They'll be separate documents.
Ben's Suggestion: Intra-organizational Resource Sharing
And Ben, maybe you could say where they live or how people are gonna be able to see them afterwards.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
I guess we'll have to decide where they live. Right now, they live as "Document 2, unsaved" on my computer, but we will change that. As you hear these suggestions, think about what Andrew said: at the end, we're going to try to come up with a loose rank ordering. All of these will be important suggestions, but time is limited, so what should we focus on first? Those are two very good suggestions and will probably get my vote. But the one I brought to the table may be a little biased because of my involvement with the Evidence Network.
[02:49:03]
I also work for Paul Nagy, who has done a good job at Johns Hopkins of utilizing existing resources to build up our Odyssey community, just within the node of Johns Hopkins, and I've seen the power of that and how it lets us help and contribute to network studies. When I look at the network studies that have already been done, and I know this is a very rapidly developing community and ecosystem, but when we talk about the Odyssey network representing 12% of the world's population, and the biggest network study we've done has 18 or so CDMs, there's such a gap. I think there are a lot of reasons for that, but one thing we can do is lean on the nodes in the Odyssey network that have more experience in creating that within-node community of people, resources, processes, experience, and data.
[02:50:15]
We work on several projects where people go from no OMOP at all to having a very basic, what we refer to as vanilla, OMOP. You can stand up an OMOP instance relatively quickly and easily, but it may not be rich enough to ever really contribute to a network study; there's a wide variety in what it means to have OMOP data. Claims databases are well represented, but I think we need more EHR-based OMOP CDMs, and the sites that have been able to stand up this OMOP ecosystem on an EHR-based database
[02:51:02]
can share that experience; the word reuse has come up a lot. It's possible to share the strategy, what has worked, the different methods of building on what is already there. We have a lot of people already doing observational research, so we've converted existing research groups at Hopkins. They're already doing research, already doing great things, already have a lot of great people working for them: that's personnel, that's resources. They're now doing the same thing, but we sell them on the idea: that's a great study; wouldn't you like to run it across a network, across the world, at all these sites? And they say, yeah, we'd really like to get geographic variation and so on. As for the estimate of what it takes to go from zero to a basic OMOP, this is just a number my supervisor throws around.
[02:52:09]
Like Andrew said, we need maybe a more mathematical representation of how much time and resources are actually required. People love the idea, but they ask: what does it take? I want to participate, we don't have an OMOP CDM; how many people do I need to hire? How long will this take? And the minimum I hear is one to two FTEs and maybe $150,000. For keeping our little ecosystem going, it isn't literally one or two people; it's one or two FTEs divided across 12 people. They're already busy, they have lots of other things to do, but by having people spend some of their time and expertise contributing to this OMOP ecosystem, we've done it without hiring "an OMOP guy."
[02:53:02]
We do pay for people's time to specifically do OMOP work, but it really adds up to about two FTEs spread across about 15 people. That's a strategy I think other sites could utilize, and sharing this information is something we should do more intentionally; it sounds like we might have an opportunity to do that right now. Do you have a question?
[Andrew Williams]
The question was: what does the one to two FTEs represent in terms of a time period?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
To the point about getting specific about our numerator and denominator: that was basically over a year's time. But again, this is the degree of information we can share with each other, and we should do better about it; I think it's a big area for improvement. And to get to a specific action plan, we can do this through the Odyssey Evidence Network.
[02:54:07]
Right now, all you have to do is submit a dbDiagnostics profile and you're an official data partner. What does that get you? Not much more than not being a data partner. But there are support, resources, and things we can share and reuse beyond just the software Sena is writing: other materials, like organizational strategies for building up your OMOP ecosystem. Where people have done that well, we can share it, and I can speak for my supervisor: Dr. Nagy would love to share this with people. So coming up with a formalized way and a place, maybe it's the Evidence Network work group, and making a decision about how we're going to help each other build up OMOP within institutions, I think is something we should do.
[Andrew Williams]
I'm gonna boil it down, just to make sure I follow, because there are a couple of things there.
[02:55:00]
I want to say it was: we need estimates, throughout the community, of what it takes in terms of personnel and person-hours to build, maintain, and use a high-quality OMOP instance and to participate in Odyssey studies; that price tag and those kinds of roles.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Yeah, the equation, like you already pointed out, yes. Yep, perfect. But I also mean a place and a group of people whose responsibility is to be there for people, to say, okay, how much time can you expect? Not just a number to hand them, but people to help them through that process, rather than saying, you can expect this, good luck, bye, I hope it goes well. Actual things: a strategy and maybe a program to help people build up OMOP capacity at their institution. That's mine. Please. It's all on the same thing.
[Nicole Pratt]
I'm a data partner at Home Partners, so I'm going to say you're probably, all of you, are selling yourselves a little short.
[02:56:20]
I understand why you wanna go.
[Yong Chen]
No, we're not, we're not, we're not, we're not.
[Nicole Pratt]
The other thing is, instead of just saying, this is our proprietary data, we have to guard it, right?
[02:57:08]
[Anthony Sena]
But the other part is they're willing to put the money in now, instead of a little each year, we'll see how it goes. And you were the ones asking this very question, Andrew; we were the ones who put that question up there.
[Andrew Williams]
Yeah, it's a common question. Marty, in my shop, produces a funding opportunity digest every week, and it's on an Odyssey site. It's not quite as visible as it needs to be; I think it will be going on the website going forward. So yes, for academic centers or other folks who need grant funding, there are resources there. There's an industry group coming up with strategies for that. Governments in general are in a somewhat different boat. So it depends on which part of the community you're in, but there's a need for more transparency and communication about sustainability-enabling approaches and opportunities.
[02:58:04]
So maybe we'll go on to Sena.
[Anthony Sena]
Sena's Suggestion: Container Enablement
Yeah, my focus is generally on open-source development. With respect to network studies, one area we could focus on as a community is better container enablement. Right now, if you want to set up the Hades tool stack and walk through the project template I was going through earlier, it takes several hours to set up your system and configure everything, whereas most developers these days are accustomed to just pulling a container and having all of that ready to go, along with some data they could start using to construct their study. We have elements of that in the community with the Broadsea initiative, but I think there's a lot more we can do there, both to enable developers who want to develop for network studies, and to enable partners who are capable of running containers to execute their studies.
[Andrew Williams]
Awesome, yeah. And I have to connect you to Jared; Jared's in my shop, and that's what we're doing: it's all packaged up, and we've got our own synthetic data to work with in these different areas.
[02:59:04]
It builds on Broadsea, but it does other stuff too. So yeah, awesome.
[Yong Chen]
Yong's Suggestion: Protocol Development and Publication Facilitation
I think we may also want to think about it in reverse: what are the deliverables of these network studies? A publication is one of the main incentives, in the sense that we want to advance knowledge, push the frontiers of research forward, and get recognized. So we can think in reverse about how to make that process as efficient as possible. As I said earlier, writing up the paper is 20% of the effort. This is not saying the lead person spends only 20% of the overall effort. What I want to talk about is conceptualizing the problem, developing the protocol, and formulating it rigorously.
[03:00:07]
That first 40% is incredibly important, because I used to act very quickly: when I saw a problem, I would just develop a protocol, jump to the analysis, and then wrestle with lots of post-hoc problems. One thing I learned from a collaboration at the beginning of the pandemic, with a health econometrician and Penn colleague, David Asch, last name A-S-C-H, a wonderful econometrician: we established a collaboration with UnitedHealth Group. During the pandemic we had two meetings each week, Monday and Thursday, with homework in between. A very, very fast pace. And to my surprise, we had phenomenal data, because they brought this Optum data, incredible data.
[03:01:02]
They didn't dive into any particular problem. Instead, we spent weeks dancing around the possible problems the data was ready to resolve, and asking what the most important health policy question was that we wanted the data to answer. For me it was a big culture shock, because I would have dived into one of the problems much earlier, but David would say, okay, let's hold that thought, let's think it through. We did a lot of qualitative thinking rather than quantitative thinking. I learned a lot from that process, and nowadays, on almost every project, we spend 40% of the overall time developing the protocol: understanding what problem to answer, consulting many clinical collaborators and domain experts. I wish there were a mechanism at Odyssey where we could post a protocol to solicit comments and questions,
[03:02:00]
while, on the other hand, protecting the originality of the idea. That process, I think, is the most important one, and a lot of biostatisticians tend to overlook it; we focus too much on technical innovations and less on that qualitative thinking. To me, 40% of the effort should be spent on understanding what problem is to be solved and what the analysis protocol is. And for reproducibility, that's also very important. Nowadays, for every single clinical paper we submit to a clinical journal, we publish as an appendix, as supplementary material, the protocol we developed before touching the data. And there's always a revised version, the revised protocol you actually implement in your final analysis, because no matter how hard you try, there are certain things you don't see ahead of time.
[03:03:10]
So there's the revised protocol, and we provide a documented explanation of why we made each revision. This is a way to ensure reproducibility, to ensure we are not cherry-picking results, and I think it actually helps a lot with the publication process. That's why I say the paper feels like 20% if you have done your homework very thoroughly. So I think that's also an important lesson to share with colleagues.
[Andrew Williams]
So the target would be to get closer to what you think is the appropriate amount of time, 40%, spent on protocol development, making sure the question is right and the methodological choices are appropriate, with some community-level support that ensures that. The question I'm going to ask you is: does that look like a revised version
[03:04:02]
of a chapter in the Book of Odyssey that says, in general, you should be thinking about this amount of time, plus some workflow, following what you said, for putting a protocol in development up for public review in a way that guards people's intellectual property, so they aren't afraid someone will come and steal it? Some new piece of web or GitHub or other infrastructure with that declared purpose: like a flag planted by a team. They have done all this work, and they're putting it here not so you can steal it, but so you can reflect on it and improve it. That sounds exciting; I really like that. I think I mentioned that some of the anti-p-hacking and public revisioning of protocols is still more aspirational at this point than fully realized.
[03:05:04]
So essentially, the vision for why to do that, and a little bit of how, has already been laid out by people in this room and elsewhere, but how do we actually build the infrastructure and the standard operating procedures that help people follow it? Is that the gist of it? Great.
[Yong Chen]
I just want to end this comment with a famous quote from a very famous computer scientist: premature optimization is the root of all evil. Just something to keep in mind.
[Nicole Pratt]
Nicole's Suggestion: Resurrection of Stalled Studies
Okay, I have a kind of similar option I think we could work on together as a community, and that is to work hard on resurrection. By that I mean: lots of research is wasted. There are many studies that have gotten that 40% of the way through, that have a good research question, maybe a good protocol, maybe not so much, or even some packages already written, or at least some code for cohorts and so on.
[03:06:17]
How can we identify those studies where that hard effort has been put in, that have something on paper, and then get them across the line towards completion? That will obviously involve some of the qualitative assessment you were talking about: what is the question? Is it still an important question in healthcare? Should we prioritise it for completion? So one of the things we could get together and do is a review of studies that are sitting around half completed and just need that extra push.
[03:07:01]
And perhaps, to make that extra push happen, we could implement some sort of mentoring approach, where we pair those researchers with somebody who has experience, to help get them to the next stage and get some action happening on the things that are already half done. It would be great to get them finished if they need finishing. Some have been around for a while and are perhaps no longer relevant. But I know I have a study or two sitting there, sort of formulated, half specified in a protocol, and then the next new shiny thing comes up and off we go and do that. So yeah, that's my suggestion.
[Andrew Williams]
So: harvesting prior work, with mentorship, getting a group together to bring it home and drive it.
[Nicole Pratt]
I'll just really quickly say there is a network study going on for the APAC community, the Asia-Pacific Odyssey community.
[03:08:05]
The approach we're using for that one is to actually implement some mentorship. There's a new researcher who wants to lead a network study that we're going to do as a community, and what we're trying to do, with the help of Chan and of Jack Janetzki, who also worked on the SOS challenge, is to mentor her through it. Perhaps that mentorship might help with the goals, meeting deadlines, and being accountable. So that could be another way of doing things.
[Chan]
Chan's Suggestions: Study Protocol and Incentives
So, to tackle the challenges in executing and managing a study, we need to think about why it is really hard to manage an Odyssey study. Why? Because we lack experts on Odyssey studies.
[03:09:00]
Because there is no person who knows how to run the study, it's very hard to manage it; there is no one to enable it. So I think a feasible first step for us is creating and publishing concrete protocols for Odyssey studies. We need better framing of our Odyssey studies, even though we have some framing already, like the Legend studies: we published the protocol for the Legend study, the 10 principles. One recent success story related to this is the emulation of RCTs. Miguel Hernán's group at Harvard framed their work as clinical trial emulation,
[03:10:06]
and it became really famous, so now everyone conducts RCT emulation studies. They published reporting guidelines for emulated clinical trials and provided a kind of checklist. In Odyssey, even though our study pipeline is much more standardized compared with RCT emulation, the rationales behind our standardized framework are very fragmented: there are different papers about each piece of the rationale for this standardized study protocol, but they are not in one place.
[03:11:04]
Even though we published the Book of Odyssey, which is phenomenal, it's still not a concise set of reporting guidelines for a study. So I think we need to write and publish a more concise and more standard frame for our studies. And second, how can we build an incentive system for network studies? Once again, the basic challenge is that there are no actual incentives for participating in network studies, so we may need to build some. In my institution, Severance Hospital at Yonsei University, they charge me for my CDM studies, mostly because we use the AWS cloud system, so I have to pay for that.
[03:12:05]
Anyway, I need to pay my institution for my CDM studies. So we may want to think about incentives, a kind of standardized incentive system across the data partners, so that, with a grant, this can be sustainable.
[Andrew Williams]
Perfect, right on time. So, to summarize what Chan said: I think it's either a revision of the Book of Odyssey, or some other effort, that specifies the protocol for actually conducting a study in a much more realistic and complete way than we have so far as a community. And the last thing, and I'm going to put words in your mouth, you tell me if this is wrong: a survey of people who participate in studies in the community, to understand what their incentives are, or what rewards they would need, in order to justify to the people they work with, or the people they work for, why they're participating.
[03:13:03]
It might be different in different parts of the world or at different kinds of institutions, but some better understanding of what those returns are, so that we can figure out an economy that's sustainable with respect to people's devoted efforts. Go ahead, Yong.
[Yong Chen]
Just a very quick comment. I think Chan talked about the lack of principles, of guidelines, for running network studies. Actually, my single favorite paper published by the Odyssey community, and I'm not saying this because Martin is here, is the paper on the 10 principles for running a LEGEND study. It's as concise as a simple table, just 10 principles. I tell every collaborator and every student I work with that it is a very concise summary of the key principles.
[03:14:00]
And it sounds like, from the feedback, Martin, maybe you should write a revised version covering the other principles. But I highly encourage everyone to take a look at that paper. Personally, I think you can skip all the text; just jump to that particular Table 1 and you get the key message.
[Andrew Williams]
Prioritization and Next Steps
Great. As I said, in this last part we wanna move towards concrete steps we might take. We're gonna have Ben review what's been said so far, but there's a lot of expertise in the room, so this is gonna be a little bit painful, and I'm sorry, but we can't have voluminous descriptions of your suggestions. If somebody has a tersely worded, diamantine crystallization of an idea of what needs to be improved and how we might use Odyssey resources to do it, now is the time.
[Yong Chen]
It's not actually my idea, I have a question.
[03:15:00]
I have a question for Chan about an idea he had a long time ago. Correct me if I'm wrong, but you had this idea of an office hour where people could go for mentoring when they did a study. You did that for a while, and then it stopped. I'm just curious, could you share your experience with this?
[Chan]
So actually, I initiated a kind of nurturing committee. I set my office hour, I would be available during that time, and anyone could ask me for help if they needed some support. And what actually happened is that no one asked me. But that was about seven or eight years ago, and the committee is not really active anymore.
[03:16:00]
And actually, this morning, some of the students from Hopkins asked me about their studies, so I spent about an hour with them, and I think we made very big progress on that study. So yeah, something like that. I think you were ahead of your time. Right. No, I'm serious. So what I suggested to them, because at this year's Odyssey Symposium more than 40 people from Hopkins joined this community, but they complained to me that there are not many experts at Hopkins, is that I will be in Chicago next March, and I can spend one or two days for you guys, to assist, to be a mentor for your students.
[03:17:05]
That's an excellent addition to the list. I'm gonna cut the conversation there.
[Andrew Williams]
Painful, as I said it would be. I know there are excellent ideas in the room that we haven't gotten to, but if we're gonna make progress, we have to try and do that now. So, Ben.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Yes. Am I going through it from start to finish, or should we stop?
[Andrew Williams]
Yeah, let's talk about process for like 30 seconds. Ben is going to review what was just suggested, then we're gonna take four minutes to ask questions, and I'm gonna be just brutal about saying when those four minutes end. Then we're gonna have a really rough democratic vote, based on people's level of enthusiasm, at 4:45,
[03:18:00]
and say: this is the thing that seems feasible to do, that would be really important and have a big impact. Then in the remaining time, the remaining 12 minutes or whatever it is, we're gonna ask: okay, who's gonna be involved in this, what might it look like, how do we take next steps? We're not gonna solve all that, but getting a sense of who cares, what it might look like, and what some next steps are, that's all we're gonna shoot for in that last stretch. So, take it away, Ben.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
All right, the first one was improved organization and task management in network studies, so really taking project management and focusing on that as a discipline. Oh, I can, yeah, I can. There's a cap on that, so we're not too stressed. Thank you, thank you. Excellent idea. I'll give this to you, because now I have this; you have that. And on this one, I didn't get the specific action item, as far as what your potential suggestion was, but.
[03:19:06]
A work group and a new forum channel devoted to project management. Excellent. And that is not on the screen. There it is. Oh, well, no, I don't see it. Okay, here we go. So: work group and forum channel. It is amazingly hard to do this. Okay, I misspelled forum, but I'll deal with that later. Any other suggestions as to actionable items? Okay. So, are we discussing this one for four minutes, or am I going on? Excellent, okay, I'm on board now.
[03:20:01]
And so, the next one was improved planning with respect to effort. This is around that equation that also came up later when I was getting at my point too: capturing and estimating the time associated with each of those roles, breaking it down as detailed and as accurate as we can, but actually starting the effort of trying to capture the time. Like you said, I think you said clocking in and clocking out, not literally, but along those lines.
[Andrew Williams]
In general, misspecifying the amount of work in a project is a known problem; it's a huge area where people study it and how to improve it. You make a projection: I think it's gonna cost this much, this is the amount of time, this is the amount of effort. Then you see how much it actually takes, you see how different you were, you iterate on why it's different, and gradually you get better. So the idea is just a super lightweight projection: I think it's gonna take X amount of time from a statistician, X amount of time from an informatician, and so on. Getting that projection, and a really lightweight way to collect it, that's it, maybe associated with the study protocol itself so we have a standard way to go get it.
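To make that projection-versus-actuals loop concrete, here is a minimal sketch in Python of what a lightweight effort record attached to a study protocol could look like. The role names, field names, and the EffortLine structure are hypothetical illustrations for this tutorial, not an existing Odyssey tool or standard.

```python
from dataclasses import dataclass

@dataclass
class EffortLine:
    """One projected-vs-actual effort entry for a study role (hypothetical schema)."""
    role: str                  # e.g. "statistician", "informatician"
    projected_hours: float     # estimate recorded when the protocol is written
    actual_hours: float = 0.0  # filled in as the study runs

def effort_report(lines: list[EffortLine]) -> None:
    """Print projected vs. actual hours per role, so the next projection can improve."""
    for line in lines:
        if line.projected_hours > 0:
            ratio = line.actual_hours / line.projected_hours
            print(f"{line.role}: projected {line.projected_hours:.0f}h, "
                  f"actual {line.actual_hours:.0f}h ({ratio:.1f}x estimate)")

# Example: a template like this could live alongside the study protocol.
effort_report([
    EffortLine("statistician", projected_hours=40, actual_hours=65),
    EffortLine("informatician", projected_hours=30, actual_hours=28),
])
```

Comparing that ratio across studies is exactly the iterate-on-why-it's-different step described above.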
[03:21:09]
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Okay, and then we got to my suggestion, around either expanding an existing working group or creating a new working group for inter-institutional sharing of resource strategies. You made the point about whether there is grant funding for this type of thing. There is, so this would be about how we help each other get grants. I'm new to academia, and sometimes you'd think these are our competitors we're helping, but it seems like people are very willing to share that sort of thing. So, where would we house that? Maybe all of our suggestions are gonna be a new working group, or adding this to an existing one. I liked the idea of doing it in the evidence network working group.
[03:22:00]
So, expanding a working group to focus on helping individual institutions grow their Odyssey community and resources. Then I have documented a second mention for that equation of time and resources that came up again. And I just wrote containerization, exclamation mark. I think the voting might be biased towards how many actual software developers we have in the room, because I know that to be important, but shout out to all the software engineers; I'm sure they will vote for that one. And then there's that first 40%, which is what Yong suggested: really focusing on that first 40%. And Andrew, I think you mentioned that this could take different forms. It could be, again, another section of the forum, or the GitHub, or some sort of infrastructure in our communications where we share strategies and learnings or experience about how to really optimize that first 40%.
[03:23:14]
You also mentioned a new book chapter, but.
[Andrew Williams]
I was really just asking; it's Yong's idea, but I was thinking a publication, or a revision of the book chapter, or some other thing that really encourages adequate time spent on those important preliminary questions. That's the gist: building on one of those Odyssey assets to do it. Yep, okay.
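Coming back to the containerization item for a moment: the idea is to package a study's full execution environment so that each data partner can run it without local setup. A minimal sketch, in Python, of what running such a pre-built study image might look like at a site follows; the image name, environment variable, and paths are hypothetical placeholders, not actual Odyssey artifacts.

```python
import subprocess
from pathlib import Path

def run_containerized_study(image: str, results_dir: Path) -> None:
    """Run a pre-built study image against a local CDM, writing results to a host folder."""
    results_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "docker", "run", "--rm",
            # Connection details stay outside the image and are supplied at run time,
            # forwarded from the host environment (the variable name is hypothetical).
            "-e", "CDM_CONNECTION_STRING",
            # Only aggregate results leave the container via this mount;
            # patient-level data never does.
            "-v", f"{results_dir.resolve()}:/output",
            image,
        ],
        check=True,
    )

# Hypothetical usage at a data partner site:
# run_containerized_study("ohdsi-studies/example-study:1.0", Path("./results"))
```

The appeal behind the exclamation mark is that the same image runs identically at every site, which removes much of the environment debugging that eats into that first 40%.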
[Yong Chen]
Very quickly. You talked about this technical-level sharing through GitHub. There's a human level too, right? I think it's always about having someone with trust who can help run this. We don't want to add you to this, but we could have someone whom everyone knows and respects help coordinate it.
[03:24:02]
That would be great.
[Andrew Williams]
We might need to just name the last two.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
If we're gonna have time for any questions and for voting and next steps, we need to move fast. Of course. So, Nicole talked about the other piece: there may be studies that have done well in the first 40%, but they're out there stalled for various reasons. We can call it Nicole's extra-push initiative: some sort of program or mentorship focused on identifying those studies and learning how to give them that extra push where, when, and how it's needed. And then Chan talked about several things, but it sounded a little bit like a marketing framing: how do we frame Odyssey studies in a way that is so clear and so powerful that they become almost famous, like the example he used, so that people very intuitively know what they are? Just better framing, more structure, and clearer checklist-type explicit instructions, so that everyone can very quickly know what it means to run an Odyssey network study.
[03:25:06]
Does that summarize it well? Okay. That's what I heard.
[Andrew Williams]
And then, a revival of the mentoring office hours.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
I did, yes.
[Andrew Williams]
That's the last one. Okay, so, voting-wise, let's see. You had talked about that poll, the EV poll thing. Yes, I can set that up. Can we set that up inside of 26 seconds?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Oh, actually, no, I can't.
[Andrew Williams]
I cannot. Yeah. I'm a psychologist, and I can tell you what happens: you see people raise their hands, and even if you don't want to, you start thinking, that's what people want. Everybody's gotta go into the booth in the back and pull the lever, and then we'd know, but we don't have a booth.
[03:26:00]
We don't have the evpoll.com. My plan is falling apart quickly, but we're gonna salvage it. We're gonna say everybody is a free and independent person who cares not a whit what everybody else in the room thinks, and they're just gonna raise their hand. We're gonna free ourselves from the deeply embedded psychological biases the rest of the human race suffers from, and we're gonna do it in order. How many votes do you get? It depends on which political party you belong to. Sorry. I'm from the great state of Maine, where we have ranked-choice voting, and I believe in ranked-choice voting, but no, we're gonna keep it simple and do it fast. And actually, I know we just went through the list, but take a minute. Take these next 20 seconds and come to an absolutely firm decision: this is the one I am gonna vote for, and I am gonna vote for one.
[03:27:01]
You got 20 seconds. Do that now. Trying to zoom out. Da, da, da, da, da. My musical talents are wasted. Okay. All right. We probably shouldn't have names attached to these, but nobody's gonna take it personally if you vote for or against their thing. No, no, it's good. It was good. Don't worry, nobody is going to be offended, but we will start at the top. If you think, as you understand it, that's the best one, raise your hand. All right. The short description is: a new work group and a new forum channel devoted to project managers and all things about managing projects in Odyssey. That's the gist.
[03:28:11]
So we've got one, two, three, four, five.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Okay, got it.
[Andrew Williams]
All right. Improved planning with respect to effort. This is just projecting costs and time spent, then recording the actual time spent, as a standard practice that we have some infrastructure for and a template for recording. If you vote for this, raise your hand. Deafening silence.
[Anthony Sena]
This was my failed idea.
[Andrew Williams]
Nothing I won't recover from eventually. Okay. Ben, you want to say briefly what yours is?
[03:29:02]
This next one?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Oh yeah, true. I can do either one, but just this one, because the second one was sort of like what we just voted on. This is a concerted effort to share strategies and resources for building up intra-institutional Odyssey capacity and communities. And the vote is: raise your hand.
[Andrew Williams]
This is the one you picked in your mind.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
And I did not take that personally.
[Andrew Williams]
Okay. The tidal wave for containers is coming.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
That's next.
[Andrew Williams]
You said you were skipping this one? The next one?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Yeah, yeah.
[Andrew Williams]
All right. So, containerization, exclamation point. All right.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
I got seven? Seven. Is that seven? Wow. It is the leader.
[Andrew Williams]
All right. The leaderboard, this is tense.
[03:30:02]
I'm feeling the tension. All right. Yong, do you want to say more about what yours is, or just have people vote for it? All right. For publication facilitation, and some new infrastructure that reserves space for the 40% of time that's needed to make sure that critically important early work is done well. Please raise your hand. Okay.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
All right.
[Andrew Williams]
That's five? Nicole's resurrection of partially completed studies that have so much value, if only they could be pushed over the edge. Nicole, is that a good representation? Yes? Raise your hand. Two?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Three.
[Andrew Williams]
Three?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Excellent. This doesn't mean we can't do it, we're just prioritizing.
[Andrew Williams]
Yeah, it's just what we're gonna get going on today. Of all the world's problems, this is the one we're gonna start on today.
[03:31:02]
All right. Chan. Better protocol for studies.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Better framing. Raise your hand. People like it. Ooh, lots of people like it. That's true, there are two. I'm gonna take them separately. Yeah. So this is not the incentives. This is the framing.
[Andrew Williams]
Protocol. One, two, three, four, five, six. Okay. Has everybody only voted once?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Don't.
[Andrew Williams]
Is that seven?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
I got six. Was there a seven? Seven? Thank you. We all need to count.
[Andrew Williams]
All right. Building incentives, in general understanding how to build incentives that motivate people's work and justify it. One. Two. Oh, you can only vote once. Only one vote.
[03:32:00]
Only one vote. Yeah. Okay. Okay, got one.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
And then.
[Andrew Williams]
Okay, and the last one. Resurrecting the office hours project.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
Yeah.
[Andrew Williams]
It's a single vote. No, you're not allowed. No ballot-box stuffing. Okay.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
But you've been heard.
[Andrew Williams]
Voting Results and Call for Volunteers
Yes, I'm sure it was appreciated. All right, so I think we had a tie, right? We had two sevens?
[Multiple Audience Members/Panelists (including Jamie and Jim)]
We do have two sevens.
[Andrew Williams]
So we're out of time. There's nothing we can do.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
A vote off?
[Andrew Williams]
Well, I think it's more important to ask who wants to do something in this space. And really, it's okay to say you think you might. It's better to say: I'm really sure, this is something I would love to contribute to, and I could imagine spending some amount of time per month working on delivering some sort of strategy that actually pushes one or both of those forward.
[03:33:08]
I think we got a good enough prioritization process for our initial focus. What we need more is people who are interested in helping to lead and organize around that. I have a full plate; this isn't even all of the ridiculous, overstretched number of Odyssey things I'm involved in. So it's not gonna be me, I'll tell you that. We need somebody who wants to do this extremely important work, because this is why we're doing everything in Odyssey: it comes down to getting studies done, right? We know there are challenges, we know there's a lot to build on, and we know it's super valuable. So it would be a huge contribution. I'm feeling a little bit motivational-speaker-ish, and I'm probably being a little annoying, but we need somebody who wants to maybe contribute.
[03:34:00]
Who wants to maybe contribute? Maybe, that's all you're saying. You're saying, I'm a maybe person. That's fine; that's like nothing. Who's in that vote? I can start with a quick example. Yeah. Great. All right, so everybody who raised your hand, put it back up, and Ben, if you can take people's names down. These are the maybe people. I think volunteering people who aren't in the room is a basic principle that we should all abide by at all times, and volunteering whole organizations that are out of the room is even better than individuals that are out of the room. Yeah, you're doubling down on that basic principle there. I like it. So, Ben, do you have all the names? I don't think you do.
[Multiple Audience Members/Panelists (including Jamie and Jim)]
I do not.
[Andrew Williams]
So everybody raise your hand again and just tell Ben your name, or Ben can come around and get you.
[03:35:02]
Memorizing faces. What we will do is contact you afterwards and say: you are among the group who said maybe. We're gonna try to find a leader for this group and just schedule a meeting, right? A Teams meeting for you all to discuss what might be done and who might take on leadership. It'll be a discussion, right? There wasn't a lot of time to go through the practical considerations of taking this on. No harm, no foul. But that's the next step. I think that's as far as we could get.
Conclusion and Next Steps
We tried to get a little further, but I think we still got some momentum going, and it's exciting. Thank you all. I learned a lot from everybody else who presented today, so thank you all for attending, and I look forward to doing Odyssey studies with you in the future.