Dependency Mapping
Dependency mapping is an in silico mutagenesis technique that identifies co-conserved elements in a sequence. AIDO.ModelGenerator implements the procedure proposed by Tomaz da Silva et al. We use this to mine functional genomic elements in the AIDO.DNA paper with the AIDO.DNA-7B and AIDO.DNA-300M models. This task uses the pre-trained models directly, and does not require finetuning.
To reproduce the dependency mapping results from the AIDO.DNA paper, run the following from the ModelGenerator root directory:
# Inference
mgen predict --config experiments/AIDO.DNA/dependency_mapping/config.yaml
# Plotting
python experiments/AIDO.DNA/dependency_mapping/plot_dependency_maps.py \
-i predictions \
-o plots \
-v experiments/AIDO.DNA/dependency_mapping/DNA.txt \
-t modelgenerator/huggingface_models/rnabert/vocab.txt
To create new dependency maps,
- Gather your sequences into a .tsv file with an
id
andsequence
column. - Run
mgen predict --config config.yaml
where
model:
class_path: Inference
init_args:
backbone: <you-choose>
data:
class_path: DependencyMappingDataModule
init_args:
path: <path/to/your/seq/dir> # Note: this errors for ., use ../dependency_mapping if necessary
test_split_files:
- <my_sequences.tsv>
vocab_file: <vocab>.txt # E.g. experiments/AIDO.DNA/dependency_mapping/DNA.txt
trainer:
callbacks:
- class_path: modelgenerator.callbacks.PredictionWriter
dict_kwargs:
output_dir: predictions
filetype: pt
- Run the plotting tool
python experiments/AIDO.DNA/dependency_mapping/plot_dependency_maps.py \
-i <prediction_dir> \
-o <output_dir> \
-v <vocab.txt> \
-t <tokenizer_vocab.txt>
The output will be files of the name <id>.png
in the output directory, with heatmaps of dependencies and logos with sequence information content.