
Dependency Mapping

Dependency mapping is an in silico mutagenesis technique that identifies co-conserved elements in a sequence. AIDO.ModelGenerator implements the procedure proposed by Tomaz da Silva et al. In the AIDO.DNA paper, we used it with the AIDO.DNA-7B and AIDO.DNA-300M models to mine functional genomic elements. The task uses the pre-trained models directly and does not require finetuning.
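To make the idea concrete, here is a minimal sketch of in silico mutagenesis for dependency mapping. It is not ModelGenerator's implementation. The scoring rule is our assumption: for each query position i, it takes the largest absolute log2-odds shift in the predicted nucleotide probabilities at every target position j, over all substitutions at i. `predict_probs` and `toy_predict_probs` are hypothetical stand-ins for a pre-trained DNA language model.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def log_odds(probs, eps=1e-9):
    """Convert probabilities to log2 odds, clipping to avoid log(0)."""
    probs = np.clip(probs, eps, 1 - eps)
    return np.log2(probs / (1 - probs))


def dependency_map(seq, predict_probs):
    """Return an (L, L) array where entry [i, j] scores how strongly a
    substitution at query position i shifts the predictions at target position j.

    predict_probs is a hypothetical callable mapping a sequence of length L to an
    (L, 4) array of nucleotide probabilities (columns ordered as NUCLEOTIDES).
    """
    ref = log_odds(predict_probs(seq))
    n = len(seq)
    dep = np.zeros((n, n))
    for i in range(n):
        for nt in NUCLEOTIDES:
            if nt == seq[i]:
                continue  # only score actual substitutions
            mutant = seq[:i] + nt + seq[i + 1:]
            alt = log_odds(predict_probs(mutant))
            # Largest absolute log-odds shift over the four nucleotides at each j,
            # keeping the maximum over all substitutions tried at position i.
            dep[i] = np.maximum(dep[i], np.abs(alt - ref).max(axis=1))
    np.fill_diagonal(dep, 0.0)  # the mutated position itself is not informative
    return dep


def toy_predict_probs(seq):
    """Toy stand-in for a DNA language model: each position favours the
    nucleotide found at the previous position."""
    probs = np.full((len(seq), 4), 0.1)
    probs[0] = 0.25
    for j in range(1, len(seq)):
        probs[j, NUCLEOTIDES.index(seq[j - 1])] = 0.7
    return probs
```

With the toy model, `dependency_map("ACGTAC", toy_predict_probs)` is non-zero only on the entries [i, i+1], because each position's prediction depends only on its left neighbour. A real model would instead reveal longer-range, co-conserved dependencies.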

To reproduce the dependency mapping results from the AIDO.DNA paper, run the following from the ModelGenerator root directory:

# Inference
mgen predict --config experiments/AIDO.DNA/dependency_mapping/config.yaml

# Plotting
python experiments/AIDO.DNA/dependency_mapping/ \
    -i predictions \
    -o plots \
    -v experiments/AIDO.DNA/dependency_mapping/DNA.txt \
    -t modelgenerator/huggingface_models/rnabert/vocab.txt 

To create new dependency maps:

  1. Gather your sequences into a .tsv file with an id and sequence column.
  2. Run mgen predict --config config.yaml, where config.yaml includes the following settings (excerpt; enclosing keys omitted):
  class_path: Inference
    backbone: <you-choose>
  class_path: DependencyMappingDataModule
    path: <path/to/your/seq/dir>  # Note: setting this to . raises an error; use ../dependency_mapping if necessary
      - <my_sequences.tsv>
    vocab_file: <vocab>.txt  # E.g. experiments/AIDO.DNA/dependency_mapping/DNA.txt
  - class_path: modelgenerator.callbacks.PredictionWriter
      output_dir: predictions
      filetype: pt
  3. Run the plotting tool:
python experiments/AIDO.DNA/dependency_mapping/ \
    -i <prediction_dir> \
    -o <output_dir> \
    -v <vocab.txt> \
    -t <tokenizer_vocab.txt>  
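Step 1 can be scripted. The sketch below builds the tab-separated file from (id, sequence) pairs. The exact header names `id` and `sequence` are our assumption based on the description above, so check them against your data module configuration. `sequences_to_tsv` is a hypothetical helper, not part of ModelGenerator.

```python
import csv
import io


def sequences_to_tsv(records):
    """Render (id, sequence) pairs as tab-separated text with a header row.

    The column names "id" and "sequence" are assumed, not taken from the
    DependencyMappingDataModule source.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "sequence"])
    for seq_id, seq in records:
        # Each id later becomes an output file name (<id>.png), so reject
        # empty ids and ids containing path separators.
        if not seq_id or "/" in seq_id:
            raise ValueError(f"unsuitable sequence id: {seq_id!r}")
        writer.writerow([seq_id, seq.strip()])
    return buffer.getvalue()
```

For example, `pathlib.Path("my_seq_dir/my_sequences.tsv").write_text(sequences_to_tsv([("seq1", "ACGT")]))` writes a file that the path and file-list settings in step 2 can then point to.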

The plotting tool writes one file per sequence, named <id>.png, to the output directory. Each file contains a heatmap of the dependencies and a sequence logo showing per-position information content.
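Sequence logos conventionally scale each position by its information content, IC_j = log2(4) - H_j, where H_j = -Σ_k p_jk log2 p_jk is the Shannon entropy of the nucleotide distribution at position j. The sketch below computes this from an (L, 4) probability array. We assume the logo is built from per-position nucleotide probabilities in this standard way. This is an assumption about the plotting tool, not a description of its code, and `information_content` and `logo_heights` are hypothetical helpers.

```python
import numpy as np


def information_content(probs):
    """Per-position information content in bits for an (L, 4) probability array."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum(axis=1, keepdims=True)  # normalise each row
    # Treat 0 * log2(0) as 0; silence the warnings from evaluating log2(0).
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log2(probs), 0.0)
    entropy = -terms.sum(axis=1)
    return np.log2(probs.shape[1]) - entropy


def logo_heights(probs):
    """Letter heights for a logo: each probability scaled by its position's IC."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs * information_content(probs)[:, None]
```

A fully determined position gets 2 bits, the maximum for DNA, and a uniform position gets 0 bits. Highly informative positions therefore stand out as tall letter stacks in the logo.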