EDDISPLICE
EDDISPLICE is a single-modality, splicing-only VAE. It uses the same missingness-aware partial encoder as SpliceVI but has no gene expression branch. It is useful as a baseline or when only splicing data is available.
When to use EDDISPLICE vs SpliceVI
| | EDDISPLICE | SpliceVI |
|---|---|---|
| Input | Splicing only | Gene expression + Splicing |
| Use case | Splicing-only datasets; ablation baselines | Paired or unpaired multimodal data |
| Latent space | Splicing-driven | Joint GE + AS |
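A splicing-only workflow might look like the sketch below. It relies only on the method signatures listed in the API reference on this page. The layer names in LAYER_KEYS and the helper fit_eddisplice are hypothetical, so replace them with whatever your AnnData object actually contains, and treat the whole block as a sketch rather than a tested recipe.

```python
# Hypothetical layer names; replace with the layers present in your AnnData object.
LAYER_KEYS = {
    "junc_ratio_layer": "junc_ratio",
    "junc_counts_layer": "junc_counts",
    "cluster_counts_layer": "cluster_counts",
    "psi_mask_layer": "psi_mask",
}


def fit_eddisplice(adata, max_epochs=200, latent_dim=10):
    """Register layers, train EDDISPLICE, and return the latent embedding."""
    # Imported lazily so this sketch can be loaded without splicevi installed.
    from splicevi import EDDISPLICE

    # Register the four splicing layers documented under setup_anndata.
    EDDISPLICE.setup_anndata(adata, **LAYER_KEYS)

    # All other constructor arguments keep their documented defaults.
    model = EDDISPLICE(adata, latent_dim=latent_dim)
    model.train(max_epochs=max_epochs)

    # Posterior means, one row per cell and one column per latent dimension.
    return model.get_latent_representation(give_mean=True)
```

With the default splice_likelihood of 'dirichlet_multinomial', the notes in the API reference indicate that adata.var["event_id"] is also needed to build the junction→ATSE mapping.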
API reference
splicevi.EDDISPLICE
Bases: VAEMixin, UnsupervisedTrainingMixin, BaseModelClass
__init__(adata=None, code_dim=16, h_hidden_dim=64, encoder_hidden_dim=128, encoder_n_layers=2, latent_dim=10, dropout_rate=0.1, learn_concentration=True, splice_likelihood='dirichlet_multinomial', encode_covariates=False, deeply_inject_covariates=True, initialize_embeddings_from_pca=True, pool_mode='mean', max_nobs=-1, latent_distribution='normal', **kwargs)
Initialize an EDDISPLICE model wrapping a PARTIALVAE with the PartialEncoderEDDIFaster encoder and a linear decoder.
This model learns a low-dimensional latent representation of junction-level splicing usage from an AnnData object set up via EDDISPLICE.setup_anndata. The encoder uses per-junction embeddings plus an EDDI-style partial observation mechanism that only aggregates observed junctions per cell. The reconstruction head supports binomial, beta-binomial, or Dirichlet–multinomial likelihoods.
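The partial observation mechanism can be illustrated with a small NumPy sketch. This is not the PartialEncoderEDDIFaster implementation: the function name pool_observed, the random embedding table, and the single tanh layer standing in for the "h" MLP are all assumptions made for illustration.

```python
import numpy as np


def pool_observed(psi, mask, embeddings, weight, pool_mode="mean"):
    """Aggregate per-junction features over observed junctions only.

    psi:        (n_cells, n_junctions) junction usage ratios
    mask:       (n_cells, n_junctions) 1 = observed, 0 = missing
    embeddings: (n_junctions, code_dim) per-junction embedding table
    weight:     (code_dim + 1, h_dim) one-layer stand-in for the "h" MLP
    """
    n_cells, n_junctions = psi.shape
    # Pair every junction embedding with that junction's PSI value in each cell.
    emb = np.broadcast_to(embeddings, (n_cells, n_junctions, embeddings.shape[1]))
    feats = np.concatenate([emb, psi[..., None]], axis=-1)
    h = np.tanh(feats @ weight)      # (n_cells, n_junctions, h_dim)
    h = h * mask[..., None]          # missing junctions contribute nothing
    pooled = h.sum(axis=1)           # pool across junctions
    if pool_mode == "mean":
        # Divide by the number of observed junctions, guarding against empty cells.
        n_obs = np.maximum(mask.sum(axis=1, keepdims=True), 1)
        pooled = pooled / n_obs
    return pooled


rng = np.random.default_rng(0)
psi = rng.uniform(size=(3, 5))
mask = np.array([[1, 1, 0, 0, 1],
                 [0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 1]], dtype=float)
embeddings = rng.normal(size=(5, 4))   # code_dim = 4
weight = rng.normal(size=(5, 8))       # (code_dim + 1) x h_dim

pooled_mean = pool_observed(psi, mask, embeddings, weight, pool_mode="mean")
pooled_sum = pool_observed(psi, mask, embeddings, weight, pool_mode="sum")
```

Because masked entries are zeroed before pooling, the PSI value stored at a missing junction has no effect on the pooled vector, and a cell with no observed junctions pools to zeros.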
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData \| None | AnnData object with layers/fields registered using :meth: | None |
| code_dim | int | Dimensionality of the per-junction embedding table used by the encoder. | 16 |
| h_hidden_dim | int | Hidden size of the shared "h" MLP that processes each observed junction embedding + PSI value before pooling across junctions. | 64 |
| encoder_hidden_dim | int | Hidden size of the final encoder MLP that maps the pooled per-cell representation to the mean and variance of the latent Gaussian. | 128 |
| encoder_n_layers | int | Number of hidden layers in the encoder MLP (passed to :class: | 2 |
| latent_dim | int | Dimensionality of the latent space :math: | 10 |
| dropout_rate | float | Dropout probability applied in the encoder networks. | 0.1 |
| learn_concentration | bool | If | True |
| splice_likelihood | Literal['binomial', 'beta_binomial', 'dirichlet_multinomial'] | Reconstruction likelihood for junction counts. One of: | 'dirichlet_multinomial' |
| encode_covariates | bool | If | False |
| deeply_inject_covariates | bool | If | True |
| initialize_embeddings_from_pca | bool | If | True |
| pool_mode | Literal['mean', 'sum'] | How to aggregate per-junction representations into a per-cell vector: | 'mean' |
| max_nobs | int | Maximum number of observed (cell, junction) pairs processed in a single chunk inside the encoder. If negative, disables chunking. Useful for controlling memory when masks are dense. | -1 |
| latent_distribution | Literal['normal', 'ln'] | Form of the latent prior/approximate posterior. | 'normal' |
| **kwargs | | Additional keyword arguments forwarded to :class: | {} |
Notes
- For splice_likelihood="dirichlet_multinomial", a junction→ATSE mapping is constructed from adata.var["event_id"] and stored as a sparse tensor module.junc2atse.
- Per-junction embeddings can optionally be initialized from PCA for faster and more stable training on high-dimensional splicing matrices.
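To make the junction→ATSE mapping concrete, here is a minimal sketch of how an indicator matrix could be built from event identifiers and used to total junction counts per event. The event_id values are made up, and a dense NumPy array is used for readability, whereas the note above describes a sparse tensor. How the model itself builds module.junc2atse is not shown here.

```python
import numpy as np

# Hypothetical event identifiers, one per junction (as in adata.var["event_id"]).
event_id = np.array(["ev_A", "ev_A", "ev_B", "ev_B", "ev_B"])

# Map each junction to the integer index of its event (ATSE).
atse_names, atse_index = np.unique(event_id, return_inverse=True)

# Indicator matrix of shape (n_junctions, n_atse): entry is 1 where the junction belongs to the ATSE.
junc2atse = np.zeros((event_id.size, atse_names.size))
junc2atse[np.arange(event_id.size), atse_index] = 1.0

# Junction counts for two cells.
junc_counts = np.array([[3, 1, 0, 2, 2],
                        [0, 0, 5, 5, 0]], dtype=float)

# Total counts per ATSE, then broadcast back to every junction of that ATSE.
atse_totals = junc_counts @ junc2atse              # (n_cells, n_atse)
per_junction_totals = atse_totals @ junc2atse.T    # (n_cells, n_junctions)
```

A grouping like this is what lets a Dirichlet-multinomial likelihood treat the junctions of one event as competing categories that share a total count.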
get_latent_representation(adata=None, indices=None, give_mean=True, batch_size=None)
Return the latent embeddings of the splicing VAE.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData \| None | AnnData for inference (defaults to init adata). | None |
| indices | Sequence[int] \| None | Cell indices to use. | None |
| give_mean | bool | If True, use posterior mean; else sample. | True |
| batch_size | int \| None | Batch size. | None |
Returns:
| Type | Description |
|---|---|
| | Array of shape (cells, latent_dim). |
get_normalized_splicing(adata=None, indices=None, use_z_mean=True, n_samples=1, batch_size=None, return_numpy=False, silent=True)
Return the decoded splicing probabilities p_nj = sigmoid(decoder_logits).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData \| None | AnnData for inference (defaults to the one used at init). | None |
| indices | Sequence[int] \| None | Which cells to pull (default: all). | None |
| use_z_mean | bool | If True, run generative with use_z_mean=True. | True |
| n_samples | int | How many posterior samples to draw (passed to inference). | 1 |
| batch_size | int \| None | Mini-batch size (defaults to scvi.settings.batch_size). | None |
| return_numpy | bool | If True, returns a (n_cells, n_junctions) numpy array; otherwise returns a DataFrame with var_names as columns. | False |
| silent | bool | If False, shows a little progress info. | True |
Returns:
| Type | Description |
|---|---|
| | Array or DataFrame of shape (cells, junctions) of decoded probabilities. |
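The relationship p_nj = sigmoid(decoder_logits) can be checked by hand with a few lines of NumPy. The helper below is only a numerically stable version of the sigmoid formula; the logits are made-up values, not model output.

```python
import numpy as np


def sigmoid(logits):
    """Numerically stable logistic function applied elementwise."""
    logits = np.asarray(logits, dtype=float)
    out = np.empty_like(logits)
    pos = logits >= 0
    # For non-negative inputs exp(-x) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-logits[pos]))
    # For negative inputs use the equivalent form exp(x) / (1 + exp(x)).
    exp_x = np.exp(logits[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out


# Made-up decoder logits for one cell and three junctions.
decoder_logits = np.array([[-2.0, 0.0, 3.0]])
p = sigmoid(decoder_logits)   # probabilities in (0, 1), same shape as the logits
```

A logit of zero corresponds to a decoded probability of 0.5, and larger logits move the probability monotonically toward 1.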
setup_anndata(adata, junc_ratio_layer, junc_counts_layer, cluster_counts_layer, psi_mask_layer, batch_key=None, size_factor_key=None, categorical_covariate_keys=None, continuous_covariate_keys=None, **kwargs)
classmethod
Set up AnnData for EDDISPLICE.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | AnnData to register. | required |
| junc_ratio_layer | str | Layer with junction usage ratios (X input). | required |
| junc_counts_layer | str | Layer with junction counts (successes). | required |
| cluster_counts_layer | str | Layer with total cluster counts (trials). | required |
| psi_mask_layer | str | Layer with binary mask (1=observed, 0=missing) per junction. | required |
| batch_key | str \| None | Column in obs for batch. | None |
| size_factor_key | str \| None | If provided, the size factor is registered but not used for splicing. | None |
| categorical_covariate_keys | list[str] \| None | | None |
| continuous_covariate_keys | list[str] \| None | | None |
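The four required layers are related to one another, which the following NumPy sketch makes explicit. It assumes that a junction counts as observed when its cluster has at least one read and that the ratio at unobserved junctions is filled with 0. Both are illustrative conventions, not documented behavior, and the dictionary keys are hypothetical layer names.

```python
import numpy as np

# Junction counts (successes) and total counts of each junction's cluster (trials).
junc_counts = np.array([[3, 1, 0, 0],
                        [0, 0, 5, 5]], dtype=float)
cluster_counts = np.array([[4, 4, 0, 0],
                           [0, 0, 10, 10]], dtype=float)

# Assumption: a junction is observed in a cell when its cluster has any reads.
observed = cluster_counts > 0
psi_mask = observed.astype(float)   # 1 = observed, 0 = missing

# Junction usage ratio = successes / trials, left at 0 where nothing was observed.
junc_ratio = np.divide(
    junc_counts,
    cluster_counts,
    out=np.zeros_like(junc_counts),
    where=observed,
)

# Hypothetical layer names under which these arrays could be stored in adata.layers.
layers = {
    "junc_ratio": junc_ratio,
    "junc_counts": junc_counts,
    "cluster_counts": cluster_counts,
    "psi_mask": psi_mask,
}
```

Each key would then be passed to setup_anndata through the matching *_layer argument.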
train(max_epochs=200, lr=0.0001, accelerator='auto', devices='auto', train_size=None, validation_size=None, shuffle_set_split=True, batch_size=512, weight_decay=0.001, eps=1e-08, early_stopping=True, save_best=True, check_val_every_n_epoch=None, n_steps_kl_warmup=None, n_epochs_kl_warmup=10, reduce_lr_on_plateau=False, lr_factor=0.6, lr_patience=30, lr_threshold=0.0, lr_min=0.0, datasplitter_kwargs=None, plan_kwargs=None, **kwargs)
Trains the model using amortized variational inference on splicing data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_epochs | int | Number of epochs to train over. | 200 |
| lr | float | Learning rate for optimization. | 0.0001 |
| accelerator | str | Hardware acceleration options. | 'auto' |
| devices | str | Hardware acceleration options. | 'auto' |
| train_size | float \| None | Proportions for splitting the data. | None |
| validation_size | float \| None | Proportions for splitting the data. | None |
| shuffle_set_split | bool | Whether to shuffle indices before splitting. | True |
| batch_size | int | Minibatch size for training. | 512 |
| weight_decay | float | Optimizer hyperparameters. | 0.001 |
| eps | float | Optimizer hyperparameters. | 1e-08 |
| early_stopping | bool | Early stopping and checkpointing options. | True |
| save_best | bool | Early stopping and checkpointing options. | True |
| check_val_every_n_epoch | int \| None | Frequency of validation checks. | None |
| n_steps_kl_warmup | int \| None | KL warmup parameters. | None |
| n_epochs_kl_warmup | int \| None | KL warmup parameters. | 10 |
| datasplitter_kwargs | dict \| None | Additional options for data splitting, training plan, and trainer. | None |
| plan_kwargs | dict \| None | Additional options for data splitting, training plan, and trainer. | None |
| **kwargs | dict \| None | Additional options for data splitting, training plan, and trainer. | None |
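The KL warmup parameters are easiest to understand through the schedule they imply. The sketch below assumes linear annealing of the KL weight from 0 to 1, with a step-based warmup taking precedence over an epoch-based one when both are set. That convention is common in scvi-tools-style training plans but is an assumption here, and kl_weight is a hypothetical helper rather than part of the documented API.

```python
def kl_weight(epoch, step, n_epochs_kl_warmup=10, n_steps_kl_warmup=None,
              max_weight=1.0, min_weight=0.0):
    """Linearly anneal the KL weight from min_weight to max_weight."""
    if n_steps_kl_warmup:
        # Step-based warmup (assumed to take precedence when provided).
        progress = step / n_steps_kl_warmup
    elif n_epochs_kl_warmup:
        # Epoch-based warmup.
        progress = epoch / n_epochs_kl_warmup
    else:
        # No warmup configured: use the full KL weight from the start.
        return max_weight
    progress = min(progress, 1.0)   # hold at max_weight once warmup is complete
    return min_weight + (max_weight - min_weight) * progress


# KL weight at every second epoch with the default 10-epoch warmup.
weights = [kl_weight(epoch=e, step=0) for e in range(0, 12, 2)]
```

Under these assumptions the reconstruction term dominates early training, and the KL term reaches full weight after n_epochs_kl_warmup epochs.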