Science today is inseparable from computational analysis. All organizations, whether small biotechs, large (bio)pharma companies, or academic research groups, have a computational element to their work that is gaining in importance as automation enables bench scientists to generate ever larger and more complex data sets. To make sense of those data, cloud workstations can be accessed readily and flexibly, and new software packages almost instantaneously perform analyses that once took days or weeks.
Deep learning has advanced the state of the art in image analysis and can process larger datasets than traditional methods. This world of big data, customizable computing power, and advanced image analysis tools has removed many of the old bottlenecks that slowed innovation, but it has also created new ones, primarily around the shareability, traceability, and reproducibility of image analyses.
Here is an overview of the nine most pressing challenges organizations struggle with:
- Managing computational teams and projects
With data generation no longer the research bottleneck, workflow challenges are now rate-limiting. The days of a biologist generating data and analyzing them in a spreadsheet are over. Now teams of bioinformaticians and computational researchers crunch complex multi-omics data sets created in the lab. The challenge is to manage these different skill sets and – in the absence of established best practices – develop a methodology for standardized data analysis that can be consistently applied and avoids ad hoc, project-based approaches that need to be reinvented for every project.
Each team has its own preferred tools that it is skilled in using, from programming languages (R, Python, or MATLAB) to computing environments (GPUs, FPGAs, and CPUs). Collaborating on multidisciplinary team projects can therefore be a complex undertaking.
- Splitting up complex tasks
Computational projects are designed to help answer an important scientific question. Complex problems, such as identifying the best target or drug candidate, need to be broken down into subtasks that are assigned to different computational teams.
Organizations struggle to split up these large tasks and to enable teams to quickly access data, launch the computational tools they need, and reassemble the different components of a project.
- Handling DevOps tasks
Before computational researchers can start analyzing data, a host of DevOps tasks are required, e.g. setting up security and computing infrastructure. The smaller an organization, the bigger the problem around efficiently executing these DevOps tasks. Larger organizations can hire software engineers – an expensive solution that often slows progress as computational researchers must wait for help from scarce IT resources.
- Collaborating with colleagues and external partners
For large organizations, the main challenge is enabling collaborative work. Knowledge needs to be shared internally or externally in a way that collaborators can readily build on. Static slides or spreadsheets are no longer adequate for deep knowledge transfer, but organizations generally do not have a centralized platform that hosts curated data and analyses and allows for easy sharing and quick iteration.
- Aggregating knowledge company-wide
Computational researchers often lack deep software engineering skills: they know enough about GitHub and Docker to be dangerous, but not enough to perform tasks such as merging code or releasing it to production. Hiring software developers solves this problem but creates a new one: organizations need to find a way for these teams to work together and to aggregate their knowledge.
- Keeping knowledge in the company
A common problem organizations of all sizes struggle with is capturing the knowledge an individual creates on their own machine. A team member leaving the organization highlights this problem: often their work is lost or extremely difficult to recover or recreate.
- Reducing cycle time for sharing
Cycle time for sharing complex analyses with biologists who don’t code is high. Currently, reducing that cycle time requires the involvement of the IT department or hiring specialized staff. Organizations need approaches that allow them to quickly share results and iterate without also dramatically increasing overheads.
- Making costly decisions based on non-reproducible analyses
In biological and medical research, very expensive decisions are made on the basis of computational analyses. Trust in the results and their reproducibility are critical before a decision – such as taking a compound into the clinic – can be made. Establishing that trust often requires rerunning analyses that were done months or even years ago. While this sounds basic, the reality is that attempts to rerun old analyses almost always fail. New software versions and new dependencies turn reproducing previous work into a time-consuming, frustrating, and often futile exercise, when it should be as easy as clicking “rerun” (see the environment-pinning sketch after this list).
- Keeping research gratifying and fun
Long cycle times, difficulties collaborating and sharing results, and hard-to-reproduce analyses slow down progress and can turn the fun and excitement of cutting-edge research into a drag – something organizations intent on retaining their computational researchers in times of record-low unemployment want to avoid.
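To make the reproducibility point concrete, here is a minimal sketch of one common mitigation: freezing the analysis environment, for example in a Dockerfile that pins the base image and package versions. The image tag, package versions, and script name below are illustrative assumptions, not a prescription.

```dockerfile
# Minimal sketch: freeze an analysis environment so a rerun months later
# resolves the exact same dependencies. All versions and file names below
# are illustrative examples.
FROM python:3.10.13-slim

# Pin exact package versions instead of installing whatever is newest today.
RUN pip install --no-cache-dir \
    numpy==1.26.4 \
    pandas==2.1.4 \
    scikit-learn==1.3.2

# Ship the analysis code with its environment so the two travel together.
WORKDIR /app
COPY analysis.py .

CMD ["python", "analysis.py"]
```

Rebuilding and rerunning an image like this should yield the same environment today or six months from now, which is the property a one-click “rerun” depends on.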
Computational research is a young discipline, and growing pains are to be expected. Many of the critical pieces already exist. What is missing is a central platform that orchestrates and integrates the entire process, enables quick cycle times and easy sharing and collaboration, frees lab and computational researchers from the burden of DevOps tasks, and reliably generates the same results it did yesterday or six months ago.
Simon Adar
CEO, Code Ocean