All times indicated are Central European Time (CET)

9 November 2020
14:00–16:00
Seminar | Machine Learning in clinical drug development
Markus Lange - Senior Principal Statistical Consultant at Novartis AG
Lorenz Uhlmann - Principal Biostatistician at Novartis AG
16:00–16:10
Break
16:10–18:00
Seminar | Machine Learning in clinical drug development (continued)
Markus Lange - Senior Principal Statistical Consultant at Novartis AG
Lorenz Uhlmann - Principal Biostatistician at Novartis AG
10 November 2020
10:00–10:20
Welcome
10:20–10:50
The challenges and opportunities in using (and regulating) broader data and applied analytics in guidance development and technology assessment in the UK
Thomas Lawrence - Data Scientist, Managed Access at NICE

The National Institute for Health and Care Excellence (NICE) provides national guidance and advice to improve health and social care. In January 2020 we published a Statement of Intent to increase and extend the use of data in the development and evaluation of our guidance. The statement explores:

  • what kinds of evidence NICE currently uses to develop guidance
  • what broader types of data are available
  • when and why broader types of data should be considered
  • practical considerations associated with data analytics

We have since defined a programme to develop a methods and standards framework for activities involving broader sources of data and applied analytics. The framework is organised into five key topics that are crucial to address in order to ensure best practice in conducting high-quality analyses of data: Research Governance; Data; Analysis; Results; and Dissemination. In addition to these topics, the programme will consider cross-cutting issues ranging from transparency and public trust to the validation and evaluation of artificial intelligence (AI) in digital technology.

The presentation will give an overview of this transformational journey and detail the issues we are considering, which will be crucial for any organisation seeking to ensure its approach is suitable for regulatory assessment and evaluation.

10:50–11:20
AIML and Statistics: don’t get lost in translation
Graeme Archer - VP & Head, Non-Clinical & Translational Statistics at GlaxoSmithKline Pharmaceuticals R&D

In 2001, Leo Breiman, a professor in the Department of Statistics at the University of California, Berkeley, wrote an influential – and not uncontroversial – paper in which he stated: “There are two cultures in the use of statistical modelling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.” The “data modelling” culture (so named by Professor Breiman) is preoccupied with identifying a specific form of nature inside the black box that turns inputs (of whatever form) into outputs (response values of scientific interest): it specifies parametric data-generating mechanisms, fits them to the available data, and then makes predictions and/or classifications about the output variables conditional on that fit. Data modellers are nearly always what we’d call “statisticians”.

Breiman set this up in opposition to what he called the “algorithmic modelling” culture, which (supposedly) doesn’t care about the structure within the black box, and instead identifies a function that links input to output with sufficiently high accuracy. Such an objective is indeed how the ML scientists of my acquaintance define their work.

But is the disjunction (data vs algorithm) valid? I will argue that it isn’t – that any successful modelling activity requires an understanding of the properties of both data and algorithm, and whether one calls oneself a statistician or an ML scientist is a matter to which the universe, with regard to the validity of one’s predictions, is entirely indifferent.

The real cultural separation between the statistical and ML communities (and the reason the former keeps losing talent to the latter), I will argue, isn’t to do with models for data and algorithms; it is much less tangible, and to do with disposition: “you shouldn’t do that” versus “why not try?” This is the real challenge to the statistical community: can we open up our attitudes quickly enough to remain relevant in a world that ought to be crying out for our talents? We seem to have boxed ourselves into something unfashionable; I think we can learn from our ML friends in order to change, and the best way to learn is to work together. I’ll explore all these thoughts in the talk, with some examples from my work in pharmaceutical R&D.

Reference
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199–231.

11:20–11:30
Break
11:30–12:00
Challenges and Opportunities of Machine Learning Methods in Biostatistics
Marietta Kirchner - Biostatistician at Institute of Medical Biometry and Biostatistics, University of Heidelberg

In the 21st century, data is the most valuable asset of companies and researchers. For many fields, the availability of Big Data holds the promise of answering questions that would have been out of reach just a couple of years ago. The analysis of Big Data requires expert knowledge, which is why data science is becoming an increasingly important topic. There is enormous curiosity to explore new approaches, and data analytics has gained visibility as a support for decision making. However, the focus is often on data analysis alone, while important steps such as study design, reporting, and the communication of complex concepts are not addressed. Statistics can make a decisive contribution to the successful and safe application of, for example, machine learning (ML) techniques. While workshops on data science are available, teaching the relevant techniques with real-life application to medical data is still a niche subject. To fill this gap, the Department of Medical Biometry (University of Heidelberg) offers a certified course that introduces and deepens the essentials of medical data science. The course is structured into four modules that teach the skills needed to answer clinically relevant questions by analyzing Big Data.

This talk provides insight into the programme and addresses the challenges and opportunities of applying ML methods in biostatistics through a practical prediction example involving clustered or nested data. Clustered data are mainly analyzed using generalized mixed-effects regression models, because ML algorithms such as tree-based models usually do not account for the clustered data structure. Fokkema et al. (2018) propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which accounts for the clustered data structure and automatically performs variable selection in the manner of classical tree-based algorithms; a simplified sketch of the underlying idea follows the reference below. The GLMM tree algorithm will be illustrated in the context of developing a prediction tool for tooth loss, where teeth are clustered within patients. To conclude, ML techniques can facilitate workflows, but there are situations where these methods need further development to be applicable to the particular needs of medical research. Teaching the relevant techniques for their safe application is important.

Reference
Fokkema, M., Smits, N., Zeileis, A., et al. (2018). Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees. Behavior Research Methods, 50, 2016–2034.
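The following is a minimal, illustrative Python sketch of the alternating-estimation idea behind mixed-effects tree algorithms such as the GLMM tree (the reference implementation is the R package glmertree). It assumes a continuous response and patient-level random intercepts only; all names and data are hypothetical placeholders, not the talk's actual analysis.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_mixed_tree(X, y, clusters, n_iter=10):
    """Alternate between (1) fitting a tree to the response with the current
    random intercepts subtracted and (2) re-estimating one intercept per
    cluster (patient) from the tree's residuals."""
    intercepts = {c: 0.0 for c in np.unique(clusters)}
    tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20)
    for _ in range(n_iter):
        offset = np.array([intercepts[c] for c in clusters])
        tree.fit(X, y - offset)                 # "fixed" (tree) part
        resid = y - tree.predict(X)
        for c in intercepts:                    # "random" (cluster) part
            intercepts[c] = resid[clusters == c].mean()
    return tree, intercepts

# Toy usage: teeth (rows) clustered within patients (clusters).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))                   # tooth-level predictors
clusters = rng.integers(0, 50, size=400)        # patient IDs
patient_effect = rng.normal(scale=0.5, size=50)
y = X[:, 0] + patient_effect[clusters] + rng.normal(size=400)
tree, intercepts = fit_mixed_tree(X, y, clusters)
```

The full GLMM tree algorithm fits parametric models in the tree nodes and uses model-based recursive partitioning for split selection; the sketch above only conveys the back-and-forth between the tree part and the cluster-level random effects.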

12:00–12:30
Drug candidate selection and pipeline development - a data science approach
Dimitrios Skaltsas - Co-founder & Executive Director at Intelligencia

Biotech and pharmaceutical companies alike increasingly rely on artificial intelligence, applied to expertly curated data, to identify complex patterns and thereby better understand and minimize the risk of drug development.

Example use case:

What is the probability of technical and regulatory success (PTRS) for a program in clinical development, and what are the key drivers behind it? Machine Learning techniques applied to external historical data can provide a powerful perspective. As a case study, we examine an ongoing program in Diffuse Large B-Cell Lymphoma (DLBCL) and address several related questions: (a) how does the PTRS of this program compare to historical averages, (b) how does it compare to that of competing ongoing programs, (c) what are the key drivers (e.g., outcomes, number of patients) of the Machine Learning analysis, (d) what are the historical benchmarks for such drivers, and (e) how do those benchmarks compare to current trends in ongoing clinical development programs?
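As a purely hypothetical illustration of the kind of model that could sit behind a PTRS estimate, the Python sketch below trains a classifier on synthetic stand-ins for curated historical program features and reads key drivers off the feature importances; it is not Intelligencia's actual method, data, or feature set.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-ins for curated historical development programs.
features = ["phase", "n_patients", "endpoint_type",
            "biomarker_strategy", "sponsor_track_record"]
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(500, len(features)))   # encoded program attributes
y_hist = rng.integers(0, 2, size=500)            # 1 = reached approval

model = GradientBoostingClassifier(random_state=0).fit(X_hist, y_hist)

x_new = rng.normal(size=(1, len(features)))      # an ongoing program, e.g. in DLBCL
ptrs = model.predict_proba(x_new)[0, 1]          # estimated PTRS
drivers = sorted(zip(features, model.feature_importances_),
                 key=lambda kv: -kv[1])          # ranked "key drivers"
print(f"PTRS = {ptrs:.2f}; top drivers: {drivers[:3]}")
```

Comparing the estimated PTRS against historical averages or competing programs then amounts to scoring those programs' feature vectors with the same model.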

12:30–12:40
Morning wrap-up
12:40–14:00
Lunch break
14:00–14:30
Integrating clinical and genetics data for predicting anti-epileptic drug response
Johann de Jong - Principal Scientist / A.I. Data Scientist at UCB

Johann De Jong (UCB); Ioana Cutcutache (UCB); Sami Elmoufti (UCB); Robert Power (UCB); Cynthia Dilley (UCB); Matthew Page (UCB); Martin Armstrong (UCB); Holger Froehlich (formerly UCB¹)

Epilepsy treatment and drug development are hampered by high rates of non-response. A better understanding of non-response can improve clinical trial design and disease management by focusing therapies on probable responders and by providing optimal treatment earlier in a patient’s disease course.

We collected clinical trial data for the anti-epileptic drug (AED) brivaracetam (n = 235 patients; single trial). We combined hybrid data- and knowledge-driven feature extraction with advanced machine learning to systematically integrate the available clinical and genetic data and successfully predicted drug response (cross-validated AUC = 0.76; external validation cohort AUC = 0.75). The most important predictors were a mix of clinical and genetic features, e.g. prior AED use, epileptic focus localization, and enrichment of certain classes of genetic variants. Additionally, we showed that by enriching for responders, such models can substantially reduce the sample sizes required in confirmatory studies, albeit at the expense of stricter inclusion/exclusion criteria.
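A minimal sketch of the evaluation reported above (cross-validated AUC for a classifier on an integrated clinical-plus-genetic feature matrix) is shown below, using synthetic placeholder data; the study's actual feature extraction and model are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_clinical = rng.normal(size=(235, 10))  # e.g. prior AED use, focus localization
X_genetic = rng.normal(size=(235, 30))   # e.g. variant-class burden scores
X = np.hstack([X_clinical, X_genetic])   # integrated feature matrix
y = rng.integers(0, 2, size=235)         # 1 = responder

clf = RandomForestClassifier(n_estimators=500, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC = {auc:.2f}")
```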

Our analysis is the first of its kind in epilepsy: even with a limited sample size, integrating clinical and genetic data can inform AED response prediction. Furthermore, such models can substantially impact clinical trial design. This shows that we can begin to think more systematically about personalized healthcare in epilepsy.

Funded by UCB Pharma.

1. Current affiliation: Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Business Area Bioinformatics, Schloss Birlinghoven, 53754 Sankt Augustin, Germany
Email: holger.froehlich@scai.fraunhofer.de

14:30–15:00
Recent examples of how data science is being used in clinical development: reflections on lessons learnt
David Wright - Head of Statistical Innovation at AstraZeneca

In recent years there has been a massive increase in the use of advanced analytics techniques in a wide variety of areas associated with clinical development.

This presentation will detail a number of such examples, including finding drug candidates to include in clinical trials, site selection, subgroup detection, and the analysis of sensor data. It will reflect on what has worked well and where opportunities have not been fully grasped, and will suggest new areas in clinical development where data science could be applied.

15:00–15:10
Break
15:10–15:50
ROUND TABLE | Data Science in Drug Development: are we in the presence of a revolution?
Graeme Archer - VP & Head, Non-Clinical & Translational Statistics at GlaxoSmithKline Pharmaceuticals R&D
Marietta Kirchner - Biostatistician at Institute of Medical Biometry and Biostatistics, University of Heidelberg
Johann de Jong - Principal Scientist / A.I. Data Scientist at UCB
Thomas Lawrence - Data scientist, Managed Access at NICE
Dimitrios Skaltsas - Co-founder & Executive Director at Intelligencia
David Wright - Head of Statistical Innovation at AstraZeneca

Moderator: Giacomo Mordenti – Head of Biostatistics & Data Management Europe at Daiichi Sankyo Europe GmbH

15:50–16:00
Conclusion