Principal Data Scientists within RSS help drive forward R&D activity by providing deep insight into experimental data, its interpretation, and its implications for project development efforts. Principal data scientists combine skills from different domains to organize, process, and learn from data, often through the lens of models informed by domain experts that help to abstract concepts from the data, test their validity, and make predictions.
The principal data scientist must then synthesize insights from this data and communicate them to internal customers in a way that is both approachable and informative. When internal customers need routine analysis following a pilot study by the data scientist, the principal data scientist integrates the necessary analytical or inferential routines into a pipeline through a collaborative software engineering effort, resulting in a validated internal software product that experimentalists may use regularly to track the progress of their research efforts. The principal data scientist may also have scientific expertise in the field of study, often to great advantage in this role, though not necessarily at the level of the experimental scientist / customer. Alternatively, the principal data scientist may have little or no scientific expertise in that particular domain but bring an extremely advanced set of skills in areas such as machine learning, deep learning, statistical modeling, signal processing, data engineering, and software engineering, making them a valuable member of a project or subproject team regardless of the scientific domain of the problem.
Finally, in some cases, a principal data scientist may be so specialized in the research, development, or correct implementation of the most advanced versions of the skills and technical approaches listed above that their role requires little or no interaction with experimental groups and is instead focused entirely on advancing the core competencies, tool sets, infrastructure, and processes within the data science organization(s) of RSS.
* Research, design, implement and evaluate machine learning algorithms and statistical models for signal processing and/or bioinformatics applications. Generate internal implementations to achieve results on Roche applications
* Work closely with software engineering teams to drive scalable, production-ready implementations
* Collaborate with teams across the company and serve as an internal expert on technical issues
* Document technical work as part of the product development process. Support the patent application and scientific paper publishing processes
* Contribute to our evolving cloud infrastructure, data engineering pipeline, and analysis stack
* Identify technical challenges, define requirements and prioritize efforts
* Assist with defining requirements and architectures for next-generation machine learning / statistical analysis products
* Contribute to scientific software engineering efforts, utilizing professional coding standards and participating in reviewing PRs
* PhD in Bioinformatics, Computer Science, Engineering, Computational Biology, or Physics and 0 to 2 years of relevant experience, OR Master's degree (in one of the related fields above) and 3+ years of relevant work experience.
* Previous experience in Biotech or Life Science related field required. Genetics or Genomics work experience is highly preferred.
* Advanced understanding of the scientific process and the analysis of empirical data. Demonstrated ability to design experimental analyses that result in meaningful conclusions. Additionally, demonstrated ability to work with experimentalists when planning physical experimental design.
* Working knowledge of linear algebra, differential equations, calculus and/or discrete math.
* Statistics: probability distributions, classical hypothesis testing, regression. Bayesian concepts: conditional probabilities, priors, posteriors, maximum likelihood estimators. Ability to correctly apply probability theory and statistics across multiple domains with little or no guidance. Assist and mentor data scientists on the nuances of applied statistical analysis, particularly with regard to challenging problems.
* Machine Learning: classification, regression, clustering; Demonstrated ability to apply deep learning approaches to categories of machine learning problems.
* Algorithms: Fundamental data types (stacks, queues, etc.); Sorting algorithms (quicksort, mergesort, etc.); Graph traversal; Dynamic programming
* Strong communication skills and collaborative nature
* Proficiency with programming languages such as: R, Python, C++, or Java. Proficiency in ML/scientific computing libraries. Ideally some background with OOP design
* Familiarity with collaborative software engineering practices, including version control, code reviews, etc.
* Ability to architect and implement machine learning or data science solutions using the specific skills listed above; strong foundation in machine learning, mathematics, and statistics, with demonstrated professional or academic experience
* Less than 5%