# Backbones

## modelgenerator.backbones.GenBioBERT

Bases: `HFSequenceBackbone`

GenBioBERT model.

Note: Models using this interface include `aido_dna_7b`, `aido_dna_300m`, `dna_dummy`, `aido_dna_debug`, `aido_rna_1b600m`, `aido_rna_1b600m_cds`, `aido_rna_1m_mars`, `aido_rna_25m_mars`, `aido_rna_300m_mars`, `aido_rna_650m`, and `aido_rna_650m_cds`.

FSDP `auto_wrap_policy` is `modelgenerator.distributed.fsdp.wrap.AutoWrapPolicy`.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `config_overwrites` | `dict` | Optional model arguments for `PretrainedConfig`. Defaults to None. | required |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | required |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 512. | `None` |
| `use_peft` | `bool` | Whether to use LoRA PEFT. Defaults to False. | `False` |
| `frozen` | `bool` | Whether to freeze the encoder. Defaults to False. | `False` |
| `save_peft_only` | `bool` | Whether to save only the PEFT weights. Defaults to True. | `True` |
| `lora_r` | `int` | LoRA r parameter. Defaults to 16. | `16` |
| `lora_alpha` | `int` | LoRA alpha parameter. Defaults to 32. | `32` |
| `lora_dropout` | `float` | LoRA dropout. Defaults to 0.1. | `0.1` |
| `lora_target_modules` | `Optional[List[str]]` | LoRA target modules. Defaults to `["query", "value"]`. | `['query', 'value']` |
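A minimal construction sketch, not taken from the library's own docs: it assumes the keyword arguments in the table above map directly onto the constructor and that any undocumented base-class arguments can be left at their defaults. In practice, ModelGenerator typically builds backbones from a task configuration rather than by direct instantiation.

```python
# Hedged sketch: GenBioBERT as a LoRA-adapted DNA/RNA encoder.
from modelgenerator.backbones import GenBioBERT

backbone = GenBioBERT(
    max_length=512,
    use_peft=True,        # train low-rank adapters instead of the full weights
    save_peft_only=True,  # checkpoints keep only the adapter weights
    lora_r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    lora_target_modules=["query", "value"],  # attention projections, per the default
)
```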
## modelgenerator.backbones.GenBioFM

Bases: `HFSequenceBackbone`

GenBioFM model.

Note: Models using this interface include `aido_protein_16b`, `aido_protein_16b_v1`, `aido_protein2structoken_16b`, and `aido_protein_debug`.

FSDP `auto_wrap_policy` is `modelgenerator.distributed.fsdp.wrap.AutoWrapPolicy`.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `config_overwrites` | `dict` | Optional model arguments for `PretrainedConfig`. Defaults to None. | required |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | required |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 512. | `None` |
| `use_peft` | `bool` | Whether to use LoRA PEFT. Defaults to False. | `False` |
| `frozen` | `bool` | Whether to freeze the encoder. Defaults to False. | `False` |
| `save_peft_only` | `bool` | Whether to save only the PEFT weights. Defaults to True. | `True` |
| `lora_r` | `int` | LoRA r parameter. Defaults to 16. | `16` |
| `lora_alpha` | `int` | LoRA alpha parameter. Defaults to 16. | `16` |
| `lora_dropout` | `float` | LoRA dropout. Defaults to 0.1. | `0.1` |
| `lora_target_modules` | `Optional[List[str]]` | LoRA target modules. Defaults to `["query", "value", "key", "dense", "router"]`. | `['query', 'value', 'key', 'dense', 'router']` |
| `lora_modules_to_save` | `Optional[List[str]]` | LoRA modules to save. Defaults to None. | `None` |
| `lora_use_rslora` | `bool` | Whether to use rsLoRA (rank-stabilized LoRA). Defaults to False. | `False` |
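As a hedged illustration of the extra PEFT knobs (again assuming the documented keywords map directly onto the constructor), the sketch below enables rank-stabilized LoRA:

```python
# Hedged sketch: GenBioFM with rank-stabilized LoRA (rsLoRA).
from modelgenerator.backbones import GenBioFM

backbone = GenBioFM(
    max_length=512,
    use_peft=True,
    lora_r=16,
    lora_alpha=16,
    lora_use_rslora=True,  # scales adapters by alpha/sqrt(r) rather than alpha/r
    lora_target_modules=["query", "value", "key", "dense", "router"],
)
```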
## modelgenerator.backbones.Onehot

Bases: `HFSequenceBackbone`

Tokenizer-only model for one-hot encoding. Useful for testing baseline models (CNNs, linear models, etc.).

Note: Models using this interface include `dna_onehot` and `protein_onehot`. This backbone contains no parameters and cannot be used without an adapter.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `vocab_file` | `str` | Path to the vocabulary file. Defaults to "modelgenerator/huggingface_models/rnabert/vocab.txt". | `None` |
| `max_length` | `Optional[int]` | Maximum sequence length. Defaults to 512. | `512` |
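Because this backbone only tokenizes, a usage sketch is short; it assumes the default `vocab_file` resolves inside an installed ModelGenerator package.

```python
# Hedged sketch: parameter-free one-hot backbone for baselines.
from modelgenerator.backbones import Onehot

# No weights to load or freeze; pair it with a trainable adapter (e.g. a CNN head).
backbone = Onehot(max_length=512)
```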
## modelgenerator.backbones.GenBioCellFoundation

Bases: `HFSequenceBackbone`

GenBioCellFoundation model.

Note: Models using this interface include `aido_cell_100m`, `aido_cell_10m`, and `aido_cell_3m`.

FSDP `auto_wrap_policy` is `modelgenerator.distributed.fsdp.wrap.AutoWrapPolicy`.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `config_overwrites` | `dict` | Optional model arguments for `PretrainedConfig`. Defaults to None. | required |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | required |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 512. | `None` |
| `use_peft` | `bool` | Whether to use LoRA PEFT. Defaults to False. | `False` |
| `frozen` | `bool` | Whether to freeze the encoder. Defaults to False. | `False` |
| `save_peft_only` | `bool` | Whether to save only the PEFT weights. Defaults to True. | `True` |
| `lora_r` | `int` | LoRA r parameter. Defaults to 16. | `16` |
| `lora_alpha` | `int` | LoRA alpha parameter. Defaults to 16. | `16` |
| `lora_dropout` | `float` | LoRA dropout. Defaults to 0.1. | `0.1` |
| `lora_target_modules` | `Optional[List[str]]` | LoRA target modules. Defaults to `["query", "value", "key", "dense", "router"]`. | `['query', 'value', 'key', 'dense', 'router']` |
| `lora_modules_to_save` | `Optional[List[str]]` | LoRA modules to save. Defaults to None. | `None` |
| `lora_use_rslora` | `bool` | Whether to use rsLoRA (rank-stabilized LoRA). Defaults to False. | `False` |
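A hedged sketch of the common frozen-encoder setup, under the same assumption that the documented keywords are accepted directly by the constructor:

```python
# Hedged sketch: frozen GenBioCellFoundation encoder for probing.
from modelgenerator.backbones import GenBioCellFoundation

backbone = GenBioCellFoundation(
    max_length=512,
    frozen=True,     # encoder weights stay fixed; only the downstream adapter trains
    use_peft=False,  # alternatively, set True and tune the lora_* knobs above
)
```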
## modelgenerator.backbones.SCFoundation

Bases: `HFSequenceBackbone`

Wraps the SCFoundation model in a ModelGenerator backbone with multiple gene embedding modes.

Note: Models using this interface include `aido_scfoundation`.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `legacy_adapter_type` | `Union[LegacyAdapterType, None]` | Type of legacy adapter. | required |
| `default_config` | `Union[DefaultConfig, None]` | Default values set by downstream tasks. | required |
| `max_length` | | Maximum sequence length. | required |
| `frozen` | `bool` | Whether to freeze the model. | `False` |
| `output_type` | `str` | Type of output embedding ('cell', 'gene', 'gene_batch', 'gene_expression'). | `'cell'` |
| `pool_type` | `str` | Pooling type for cell embedding ('all', 'max'). | `'all'` |
| `input_type` | `str` | Input data type ('singlecell', 'bulk'). | `'singlecell'` |
| `pre_normalized` | `str` | Whether input is pre-normalized ('T', 'F', 'A'). | `'F'` |
| `train_last_n_layers` | `int` | Number of layers to train in the encoder. | `0` |
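The sketch below is a rough illustration of the embedding-mode switches; the exact constructor signature (including how the required `max_length` is supplied) is an assumption here, not something the table confirms.

```python
# Hedged sketch: per-cell embeddings from raw single-cell counts.
from modelgenerator.backbones import SCFoundation

backbone = SCFoundation(
    legacy_adapter_type=None,  # None disables the legacy adapter
    default_config=None,
    max_length=512,            # required; value chosen only for illustration
    output_type="cell",        # or 'gene', 'gene_batch', 'gene_expression'
    pool_type="all",           # cell-embedding pooling: 'all' or 'max'
    input_type="singlecell",   # or 'bulk'
    pre_normalized="F",        # 'F' = inputs are not pre-normalized
    train_last_n_layers=0,     # keep the whole encoder frozen
)
```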
## modelgenerator.backbones.Enformer

Bases: `HFSequenceBackbone`

Wraps Enformer (https://github.com/lucidrains/enformer-pytorch) in a ModelGenerator backbone.

Note: Does not support LoRA.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `legacy_adapter_type` | `(LegacyAdapterType, None)` | Type of legacy adapter; setting it to None disables it. | required |
| `default_config` | `(dict, None)` | Default values set by downstream tasks. Defaults to None. | required |
| `config_overwrites` | `dict` | Optional model arguments for `PretrainedConfig`. Defaults to None. | required |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | required |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 196_608. | `196608` |
| `frozen` | `bool` | Whether to freeze the model. Defaults to False. | `False` |
| `delete_crop_layer` | `bool` | Whether to delete the cropping layer. Defaults to False. | `False` |
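A hedged sketch for long-range DNA inputs; since LoRA is unsupported, the main choices are freezing and cropping. Argument names follow the table above and are assumed to be accepted as keywords.

```python
# Hedged sketch: Enformer over a 196,608 bp window (no LoRA support).
from modelgenerator.backbones import Enformer

backbone = Enformer(
    legacy_adapter_type=None,  # None disables the legacy adapter
    default_config=None,
    max_length=196_608,
    frozen=True,               # train only the downstream head
    delete_crop_layer=False,   # keep Enformer's edge-cropping of outputs
)
```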
## modelgenerator.backbones.Borzoi

Bases: `HFSequenceBackbone`

Wraps Borzoi (https://github.com/johahi/borzoi-pytorch) in a ModelGenerator backbone.

Note: Does not support LoRA.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `legacy_adapter_type` | `(LegacyAdapterType, None)` | Type of legacy adapter; setting it to None disables it. | required |
| `default_config` | `(dict, None)` | Default values set by downstream tasks. Defaults to None. | required |
| `config_overwrites` | `dict` | Optional model arguments for `PretrainedConfig`. Defaults to None. | required |
| `model_init_args` | `dict` | Optional model arguments passed to its init method. Defaults to None. | required |
| `from_scratch` | `bool` | Whether to create the model from scratch. Defaults to False. | `False` |
| `max_length` | `int` | Maximum sequence length. Defaults to 524_288. | `524288` |
| `frozen` | `bool` | Whether to freeze the model. Defaults to False. | `False` |
| `delete_crop_layer` | `bool` | Whether to skip the cropping layer. Defaults to False. | `False` |
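An analogous hedged sketch for Borzoi, here dropping the crop layer to keep full-length outputs; as above, the keyword mapping is an assumption based on the documented parameters.

```python
# Hedged sketch: Borzoi over a 524,288 bp window (no LoRA support).
from modelgenerator.backbones import Borzoi

backbone = Borzoi(
    legacy_adapter_type=None,  # None disables the legacy adapter
    default_config=None,
    max_length=524_288,
    frozen=False,
    delete_crop_layer=True,    # skip the cropping layer, keeping full-length outputs
)
```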