Learning to cope with complexity in iterated high throughput experiments

Norman Packard

Modern robotic technology enables high throughput exploration of many experiments in parallel. Each well of a micro-well plate may contain an experiment specified by several parameters (e.g. initial concentrations, temperatures, incubation times, etc.), with these parameters varying from well to well and plate to plate. Exploration of the experimental space is difficult for two basic reasons: (i) high dimensionality (each experimental parameter corresponding to a different dimension of the space), causing a combinatorial explosion in the number of possible experiments to be considered, and (ii) complexity of the experiments, i.e., the difficulty (in principle) of predicting experimental response to variations in experimental parameters, because of their strong nonlinear interactions. For high dimensional experimental spaces, each generation of an iterated high throughput experiment can sample the space only very sparsely. Consequently, the problem of designing subsequent generations of experiments on the basis of results from previous generations is generally quite difficult: what is the best choice of experimental parameters, given a sequence of results from past generations? We describe an approach to answering this question, using a learning algorithm to model the experimental response surface, sample it with myriad virtual experiments, and suggest experiments to try next. We discuss application of the technique in several contexts, including design of drug formulations, optimization of cell-free protein synthesis, and combination therapies for cancer. We will also describe extensions of the technique to include explorations of sequence spaces, where the sequences considered may be base pairs of DNA or RNA, gene sequences, or amino-acid sequences.
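The iterated loop described above (learn a model of the response surface from past generations, score myriad virtual experiments against it, and propose the next generation) can be sketched in a minimal form. Everything in this sketch is an illustrative stand-in, not the actual system: the `response` function plays the role of the unknown experimental readout, and a simple k-nearest-neighbour average plays the role of the learned surrogate model.

```python
import random

random.seed(0)

def response(x):
    # Hypothetical nonlinear response surface with parameter interactions,
    # standing in for a real assay readout; 3 parameters, each in [0, 1].
    a, b, c = x
    return a * b - (c - 0.5) ** 2 + 0.5 * a

def predict(x, data):
    # k-nearest-neighbour surrogate: average response of the 3 closest
    # previously run experiments (a toy stand-in for the learned model).
    by_dist = sorted(
        data,
        key=lambda d: sum((xi - di) ** 2 for xi, di in zip(x, d[0])),
    )
    return sum(y for _, y in by_dist[:3]) / 3

def random_point():
    return tuple(random.random() for _ in range(3))

# Generation 0: a sparse random sample of the experimental space.
data = [(x, response(x)) for x in (random_point() for _ in range(8))]

for generation in range(5):
    # Myriad virtual experiments, scored by the surrogate model.
    candidates = [random_point() for _ in range(2000)]
    candidates.sort(key=lambda x: predict(x, data), reverse=True)
    # Run the top-scoring suggestions plus a few random ones (exploration),
    # then fold the new results back into the training data.
    batch = candidates[:6] + [random_point() for _ in range(2)]
    data += [(x, response(x)) for x in batch]

best_x, best_y = max(data, key=lambda d: d[1])
print(len(data), round(best_y, 3))
```

In a real application the surrogate would be a trained statistical model and the batch size would match the plate layout; the structure of the loop (fit, virtually sample, select, run, repeat) is the point of the sketch.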