The NCATS Informatics group is looking for a post-doctoral scientist to work on machine learning methods within Pharos, an interface to the Illuminating the Druggable Genome (IDG) knowledgebase. The position will involve method development in the areas of knowledge representation, recommendation algorithms and network modeling & inference.
The IDG program was initiated in 2014 with the goal of shedding light on unstudied protein targets. Over the last three years, it has integrated data on 20,000 protein targets from more than 50 data sources and types. NCATS has developed Pharos, a user interface that allows users to explore the IDG knowledge base using full-text, faceted search. More details of current capabilities can be found in Nguyen & Mathias et al., 2017.
Over the next few years we will focus on developing novel, knowledge-based representations of proteins, mining the literature to identify biomedical knowledge domains, and incorporating inference algorithms to enhance search facilities (semantic autosuggestions, recommendations), with a special focus on prioritizing unstudied proteins. We expect that research results developed by the successful candidate will be implemented and deployed within Pharos for use by the community.
Some example problems that we will be working on include:
- Construct knowledge-based representations of proteins and explore their use in clustering and classification
- Recommend related and relevant entities to a user during search
- Use unsupervised topic models to associate subsets of literature (publications, patents, Wikipedia pages, etc.) with a target or sets of targets
- Explore the use of Bayesian networks to enhance searches and filters
- Prioritize unstudied targets for screening experiments
While these are key questions we are interested in, the candidate is expected to dig into the data, take the lead on formulating related questions, and identify opportunities to develop and apply new methods.
The candidate is expected to present results of ongoing work at meetings (such as internal group meetings, IDG teleconferences and meetings or national conferences) and write up their work for publication. Depending on the problem being worked on, there will be opportunities to work with experimental scientists so that novel methods can be tested on real data.
The NCATS Informatics group plays a key role in the operation of multiple screening platforms (small molecules, RNAi, ADME, drug combinations) and large-scale infrastructure within the Division of Preclinical Innovation at NCATS. Informatics scientists are committed to research and development in all areas of molecular informatics. The group provides an intellectually stimulating environment and is interested in a diverse range of informatics topics that vary in scale from small molecules to phenotypes and other high-level biomedical data types. As part of the IDG program we collaborate with computational and experimental groups across the US and Europe.
- Ph.D. in Computer Science/Statistics/Bioinformatics with a specialization in machine learning and/or networks
- 3 or more publications during your Ph.D., with at least 1 first-author paper (or, failing that, a paper to which you were a significant contributor).
- Strong programming skills, ideally in R. If not, you should have experience with other data analysis environments (Python, MATLAB, etc.). In general you’ll likely need to munge data, internal and external, so this should not be a bottleneck.
- Strong communication skills (in written and spoken English).
- Background in biology (ideally in the area of proteins and networks)
- Experience with Bayesian networks
- Experience with text mining/modeling
- Experience with Java development
Please send a cover letter that specifically addresses why you’re interested in this position, a CV, and a bibliography (all in PDF format) to Rajarshi Guha (email@example.com).