# Backbones

## modelgenerator.backbones.GenBioBERT

Bases: HFSequenceBackbone

GenBioBERT model

Note

Models using this interface include aido_dna_7b, aido_dna_300m, dna_dummy, aido_dna_debug, aido_rna_1b600m, aido_rna_1b600m_cds, aido_rna_1m_mars, aido_rna_25m_mars, aido_rna_300m_mars, aido_rna_650m, aido_rna_650m_cds.

FSDP auto_wrap_policy is modelgenerator.distributed.fsdp.wrap.AutoWrapPolicy

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `config_overwrites` | `dict` | Optional model arguments for PretrainedConfig. Defaults to None. | *required* |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | *required* |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 512. | `None` |
| `use_peft` | `bool` | Whether to use LoRA PEFT. Defaults to False. | `False` |
| `frozen` | `bool` | Whether to freeze the encoder. Defaults to False. | `False` |
| `save_peft_only` | `bool` | Whether to save only the PEFT weights. Defaults to True. | `True` |
| `lora_r` | `int` | LoRA r parameter. Defaults to 16. | `16` |
| `lora_alpha` | `int` | LoRA alpha parameter. Defaults to 32. | `32` |
| `lora_dropout` | `float` | LoRA dropout. Defaults to 0.1. | `0.1` |
| `lora_target_modules` | `Optional[List[str]]` | LoRA target modules. Defaults to `["query", "value"]`. | `['query', 'value']` |
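
Example

A minimal instantiation sketch using only the parameters documented above, with LoRA switched on; the import path follows the section heading. The table marks `config_overwrites` and `model_init_args` as required, so the documented no-op value `None` is passed explicitly. Treat this as illustrative, not a canonical recipe:

```python
from modelgenerator.backbones import GenBioBERT

# LoRA fine-tuning setup: the pretrained encoder is adapted through
# low-rank updates on the attention "query" and "value" projections.
backbone = GenBioBERT(
    config_overwrites=None,    # no PretrainedConfig overrides
    model_init_args=None,      # no extra arguments for the model's init
    from_scratch=False,        # load pretrained weights
    max_length=512,
    use_peft=True,             # enable LoRA PEFT
    lora_r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    lora_target_modules=["query", "value"],
    save_peft_only=True,       # checkpoints keep only the adapter weights
)
```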

## modelgenerator.backbones.GenBioFM

Bases: HFSequenceBackbone

GenBioFM model

Note

Models using this interface include aido_protein_16b, aido_protein_16b_v1, aido_protein2structoken_16b, aido_protein_debug.

FSDP auto_wrap_policy is modelgenerator.distributed.fsdp.wrap.AutoWrapPolicy

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `config_overwrites` | `dict` | Optional model arguments for PretrainedConfig. Defaults to None. | *required* |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | *required* |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 512. | `None` |
| `use_peft` | `bool` | Whether to use LoRA PEFT. Defaults to False. | `False` |
| `frozen` | `bool` | Whether to freeze the encoder. Defaults to False. | `False` |
| `save_peft_only` | `bool` | Whether to save only the PEFT weights. Defaults to True. | `True` |
| `lora_r` | `int` | LoRA r parameter. Defaults to 16. | `16` |
| `lora_alpha` | `int` | LoRA alpha parameter. Defaults to 16. | `16` |
| `lora_dropout` | `float` | LoRA dropout. Defaults to 0.1. | `0.1` |
| `lora_target_modules` | `Optional[List[str]]` | LoRA target modules. Defaults to `["query", "value", "key", "dense", "router"]`. | `['query', 'value', 'key', 'dense', 'router']` |
| `lora_modules_to_save` | `Optional[List[str]]` | LoRA modules to save. Defaults to None. | `None` |
| `lora_use_rslora` | `bool` | Whether to use RSLoRA. Defaults to False. | `False` |
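
Example

The protein backbone exposes the same LoRA knobs as GenBioBERT but with different defaults (`lora_alpha` is 16 rather than 32, and the target-module set is wider), plus an opt-in for rank-stabilized LoRA. A sketch under the same assumptions as above:

```python
from modelgenerator.backbones import GenBioFM

# Protein FM with rank-stabilized LoRA over the wider default target set.
backbone = GenBioFM(
    config_overwrites=None,
    model_init_args=None,
    max_length=512,
    use_peft=True,
    lora_r=16,
    lora_alpha=16,             # note: defaults to 16 here, not 32
    lora_target_modules=["query", "value", "key", "dense", "router"],
    lora_use_rslora=True,      # rank-stabilized LoRA scaling
)
```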

## modelgenerator.backbones.Onehot

Bases: HFSequenceBackbone

Tokenizer-only model for one-hot encoding. Useful for baseline model testing (CNNs, linear, etc.).

Note

Models using this interface include dna_onehot and protein_onehot.

Does not contain any parameters, and cannot be used without an adapter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `vocab_file` | `str` | Path to the vocabulary file. Defaults to "modelgenerator/huggingface_models/rnabert/vocab.txt". | `None` |
| `max_length` | `Optional[int]` | Maximum sequence length. Defaults to 512. | `512` |
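
Example

Because this backbone is tokenizer-only and parameter-free, a sketch needs only the two documented arguments; per the note above it must be paired with an adapter (e.g., a CNN or linear head) elsewhere in the pipeline:

```python
from modelgenerator.backbones import Onehot

# Parameter-free one-hot "backbone" for baseline heads (CNNs, linear, ...).
# vocab_file is left at its documented default vocabulary.
backbone = Onehot(max_length=512)
```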

## modelgenerator.backbones.GenBioCellFoundation

Bases: HFSequenceBackbone

GenBioCellFoundation model

Note

Models using this interface include aido_cell_100m, aido_cell_10m, and aido_cell_3m.

FSDP auto_wrap_policy is modelgenerator.distributed.fsdp.wrap.AutoWrapPolicy

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `config_overwrites` | `dict` | Optional model arguments for PretrainedConfig. Defaults to None. | *required* |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | *required* |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 512. | `None` |
| `use_peft` | `bool` | Whether to use LoRA PEFT. Defaults to False. | `False` |
| `frozen` | `bool` | Whether to freeze the encoder. Defaults to False. | `False` |
| `save_peft_only` | `bool` | Whether to save only the PEFT weights. Defaults to True. | `True` |
| `lora_r` | `int` | LoRA r parameter. Defaults to 16. | `16` |
| `lora_alpha` | `int` | LoRA alpha parameter. Defaults to 16. | `16` |
| `lora_dropout` | `float` | LoRA dropout. Defaults to 0.1. | `0.1` |
| `lora_target_modules` | `Optional[List[str]]` | LoRA target modules. Defaults to `["query", "value", "key", "dense", "router"]`. | `['query', 'value', 'key', 'dense', 'router']` |
| `lora_modules_to_save` | `Optional[List[str]]` | LoRA modules to save. Defaults to None. | `None` |
| `lora_use_rslora` | `bool` | Whether to use RSLoRA. Defaults to False. | `False` |
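
Example

A sketch of a frozen-encoder configuration, e.g. for embedding extraction. The `config_overwrites` key shown is illustrative only (it is not documented above); the dict patches fields of the underlying PretrainedConfig:

```python
from modelgenerator.backbones import GenBioCellFoundation

backbone = GenBioCellFoundation(
    config_overwrites={"hidden_dropout_prob": 0.0},  # illustrative key only
    model_init_args=None,
    frozen=True,               # freeze the encoder; train only the adapter
    max_length=512,
)
```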

## modelgenerator.backbones.SCFoundation

Bases: HFSequenceBackbone

Wraps the SCFoundation model in a ModelGenerator backbone with multiple gene embedding modes.

Note

Models using this interface include aido_scfoundation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `legacy_adapter_type` | `Union[LegacyAdapterType, None]` | Type of legacy adapter | *required* |
| `default_config` | `Union[DefaultConfig, None]` | Default values set by downstream tasks | *required* |
| `max_length` |  | Maximum sequence length | *required* |
| `frozen` | `bool` | Whether to freeze the model | `False` |
| `output_type` | `str` | Type of output embedding ('cell', 'gene', 'gene_batch', 'gene_expression') | `'cell'` |
| `pool_type` | `str` | Pooling type for cell embedding ('all', 'max') | `'all'` |
| `input_type` | `str` | Input data type ('singlecell', 'bulk') | `'singlecell'` |
| `pre_normalized` | `str` | Whether input is pre-normalized ('T', 'F', 'A') | `'F'` |
| `train_last_n_layers` | `int` | Number of layers to train in the encoder | `0` |
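
Example

A sketch wiring the string-valued options together. `max_length` has no documented default, so the value below is purely illustrative; `None` is passed for the adapter and config arguments, which their `Union[..., None]` types permit:

```python
from modelgenerator.backbones import SCFoundation

backbone = SCFoundation(
    legacy_adapter_type=None,  # no legacy adapter
    default_config=None,       # no task-supplied defaults
    max_length=512,            # required; illustrative value
    output_type="cell",        # 'cell', 'gene', 'gene_batch', or 'gene_expression'
    pool_type="all",           # cell-embedding pooling: 'all' or 'max'
    input_type="singlecell",   # or 'bulk'
    pre_normalized="F",        # 'T', 'F', or 'A'
    train_last_n_layers=2,     # unfreeze the last two encoder layers
)
```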

## modelgenerator.backbones.Enformer

Bases: HFSequenceBackbone

Wraps Enformer (https://github.com/lucidrains/enformer-pytorch) in a ModelGenerator backbone.

Note: Does not support LoRA.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `legacy_adapter_type` | `(LegacyAdapterType, None)` | Type of legacy adapter; setting it to None disables it. | *required* |
| `default_config` | `(dict, None)` | Default values set by downstream tasks. Defaults to None. | *required* |
| `config_overwrites` | `dict` | Optional model arguments for PretrainedConfig. Defaults to None. | *required* |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | *required* |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 196_608. | `196608` |
| `frozen` | `bool` | Whether to freeze the model. Defaults to False. | `False` |
| `delete_crop_layer` | `bool` | Whether to delete the cropping layer. Defaults to False. | `False` |
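
Example

Since LoRA is not supported here, freezing the whole model is the main lever for limiting trainable parameters. A sketch using the documented defaults:

```python
from modelgenerator.backbones import Enformer

backbone = Enformer(
    legacy_adapter_type=None,  # None disables the legacy adapter
    default_config=None,
    config_overwrites=None,
    model_init_args=None,
    max_length=196_608,        # documented default context length
    frozen=True,               # no LoRA support, so freeze the full model
    delete_crop_layer=False,   # keep the output cropping layer
)
```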

## modelgenerator.backbones.Borzoi

Bases: HFSequenceBackbone

Wraps Borzoi (https://github.com/johahi/borzoi-pytorch) in a ModelGenerator backbone.

Note: Does not support LoRA.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `legacy_adapter_type` | `(LegacyAdapterType, None)` | Type of legacy adapter; setting it to None disables it. | *required* |
| `default_config` | `(dict, None)` | Default values set by downstream tasks. Defaults to None. | *required* |
| `config_overwrites` | `dict` | Optional model arguments for PretrainedConfig. Defaults to None. | *required* |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | *required* |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 524_288. | `524288` |
| `frozen` | `bool` | Whether to freeze the model. Defaults to False. | `False` |
| `delete_crop_layer` | `bool` | Whether to skip the cropping layer. Defaults to False. | `False` |
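
Example

Borzoi mirrors the Enformer interface at a longer context. The sketch below skips the cropping layer so outputs cover the full input window; as with Enformer, this is illustrative rather than a canonical recipe:

```python
from modelgenerator.backbones import Borzoi

backbone = Borzoi(
    legacy_adapter_type=None,  # None disables the legacy adapter
    default_config=None,
    config_overwrites=None,
    model_init_args=None,
    max_length=524_288,        # documented default context length
    delete_crop_layer=True,    # skip cropping to keep full-length outputs
)
```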