
Exporting Models

While AIDO.ModelGenerator is CLI-driven, models created with it can also be loaded in Python scripts and exported to Hugging Face.

Exporting and Loading with CLI

As an example of a finetuned model export, see the many checkpoints available on Hugging Face, which are ready for immediate inference. The only files required to run them again with AIDO.ModelGenerator are config.yaml and <model>.ckpt.

# Download from HF
git clone https://huggingface.co/genbio-ai/dummy-ckpt

# Evaluate
mgen test --config dummy-ckpt/config.yaml \
  --ckpt_path dummy-ckpt/best_val:step=742-val_loss=0.404-train_loss=0.464.ckpt 

# Predict
mgen predict --config dummy-ckpt/config.yaml \
  --ckpt_path dummy-ckpt/best_val:step=742-val_loss=0.404-train_loss=0.464.ckpt \
  --config configs/examples/save_predictions.yaml
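The checkpoint repo can also be fetched without git-lfs by using the huggingface_hub client; a minimal sketch that downloads the same repo into a local dummy-ckpt folder:

# download_ckpt.py
from huggingface_hub import snapshot_download

# Fetches config.yaml and the .ckpt file into ./dummy-ckpt
snapshot_download(repo_id="genbio-ai/dummy-ckpt", local_dir="dummy-ckpt")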

Exporting and Loading in Python

Model checkpoints can also be used in Python scripts, notebooks, or other codebases with PyTorch Lightning.

# my_notebook.ipynb

########################################################################################
# Download the data
from huggingface_hub import snapshot_download
from pathlib import Path

my_models_path = Path.home().joinpath('my_models', 'dummy-ckpt')
my_models_path.mkdir(parents=True, exist_ok=True)
snapshot_download(repo_id="genbio-ai/dummy-ckpt", local_dir=my_models_path)

########################################################################################
# Run the model
import torch

# Check the config.yaml for the model class
from modelgenerator.tasks import SequenceClassification

ckpt_path = my_models_path.joinpath('best_val:step=742-val_loss=0.404-train_loss=0.464.ckpt')
model = SequenceClassification.load_from_checkpoint(ckpt_path)

collated_batch = model.transform({"sequences": ["ACGT", "ACGT"]})
logits = model(collated_batch)
print(logits)
print(torch.argmax(logits, dim=-1))
########################################################################################
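For pure inference it is usually worth switching the loaded task module to eval mode and disabling gradient tracking. A minimal sketch, reusing the SequenceClassification model and the torch import from above (the input sequences are placeholders):

# Inference without gradient tracking (sketch)
model.eval()
with torch.no_grad():
    collated_batch = model.transform({"sequences": ["ACGTACGT", "TTTTAAAA"]})
    logits = model(collated_batch)
probs = torch.softmax(logits, dim=-1)   # per-class probabilities
print(probs)
print(torch.argmax(probs, dim=-1))      # predicted class indices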

Exporting and Loading in Hugging Face

Checkpoints can also be converted to Hugging Face format with mgen convert and a small conversion config.

# Download from HF
git clone https://huggingface.co/genbio-ai/dummy-ckpt

# Convert to HF
mgen convert --config conversion_config.yaml

# conversion_config.yaml:
task_class: modelgenerator.tasks.SequenceClassification
ckpt_path: dummy-ckpt/best_val:step=742-val_loss=0.404-train_loss=0.464.ckpt
dest_dir: dummy-ckpt-hf
push_to_hub: false
repo_id: genbio-ai/dummy-ckpt-hf

Setting push_to_hub: true in conversion_config.yaml will also upload the converted model to the repo_id given there. Once converted, load the model in Hugging Face format:

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("./dummy-ckpt-hf", trust_remote_code=True)
# Or if pushed to hub
# model = AutoModel.from_pretrained("genbio-ai/dummy-ckpt-hf", trust_remote_code=True)

collated_batch = model.genbio_model.transform({"sequences": ["ACGT", "ACGT"]})
logits = model(collated_batch)
print(logits)
print(torch.argmax(logits, dim=-1))
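
A quick sanity check after conversion is to confirm that the HF-format model reproduces the original checkpoint's outputs; a minimal sketch, assuming both the cloned dummy-ckpt checkpoint and the dummy-ckpt-hf export from the steps above are available locally:

# Sanity check: compare Lightning-checkpoint and HF-format outputs (sketch)
import torch
from transformers import AutoModel
from modelgenerator.tasks import SequenceClassification

ckpt_model = SequenceClassification.load_from_checkpoint(
    "dummy-ckpt/best_val:step=742-val_loss=0.404-train_loss=0.464.ckpt"
).eval()
hf_model = AutoModel.from_pretrained("./dummy-ckpt-hf", trust_remote_code=True).eval()

batch = {"sequences": ["ACGT", "ACGT"]}
with torch.no_grad():
    ckpt_logits = ckpt_model(ckpt_model.transform(batch))
    hf_logits = hf_model(hf_model.genbio_model.transform(batch))

print(torch.allclose(ckpt_logits, hf_logits, atol=1e-5))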