EDDISPLICE

EDDISPLICE is a single-modality splicing-only VAE that uses the same missingness-aware partial encoder as SpliceVI but without a gene expression branch. It is useful as a baseline or when only splicing data is available.


When to use EDDISPLICE vs SpliceVI

             | EDDISPLICE                                 | SpliceVI
Input        | Splicing only                              | Gene expression + Splicing
Use case     | Splicing-only datasets; ablation baselines | Paired or unpaired multimodal data
Latent space | Splicing-driven                            | Joint GE + AS

API reference

splicevi.EDDISPLICE

Bases: VAEMixin, UnsupervisedTrainingMixin, BaseModelClass

__init__(adata=None, code_dim=16, h_hidden_dim=64, encoder_hidden_dim=128, encoder_n_layers=2, latent_dim=10, dropout_rate=0.1, learn_concentration=True, splice_likelihood='dirichlet_multinomial', encode_covariates=False, deeply_inject_covariates=True, initialize_embeddings_from_pca=True, pool_mode='mean', max_nobs=-1, latent_distribution='normal', **kwargs)

Initialize an EDDISPLICE model wrapping a PARTIALVAE with the PartialEncoderEDDIFaster encoder and a linear decoder.

This model learns a low-dimensional latent representation of junction-level splicing usage from an AnnData object set up via EDDISPLICE.setup_anndata. The encoder uses per-junction embeddings plus an EDDI-style partial observation mechanism that only aggregates observed junctions per cell. The reconstruction head supports binomial, beta-binomial, or Dirichlet–multinomial likelihoods.
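The partial observation idea can be illustrated with a minimal sketch. It is not the PartialEncoderEDDIFaster implementation: the helper name `pool_observed` and the toy vectors are assumptions for illustration, and the per-junction vectors stand in for the outputs of the shared "h" MLP. The point is only that missing junctions are excluded from the aggregate, under either pooling mode.

```python
def pool_observed(per_junction, mask, mode="mean"):
    """Aggregate per-junction vectors over the observed junctions of one cell.

    per_junction: list of equal-length vectors, one per junction.
    mask: list of 0/1 flags, 1 = junction observed in this cell.
    """
    if mode not in ("mean", "sum"):
        raise ValueError(f"unknown pool mode: {mode!r}")
    dim = len(per_junction[0])
    total = [0.0] * dim
    n_obs = 0
    for vec, observed in zip(per_junction, mask):
        if observed:  # missing junctions contribute nothing
            n_obs += 1
            for k in range(dim):
                total[k] += vec[k]
    if mode == "sum" or n_obs == 0:
        return total
    return [t / n_obs for t in total]


# Toy cell with three junctions; the third one is missing and is ignored.
vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled_mean = pool_observed(vectors, mask, "mean")  # [2.0, 3.0]
pooled_sum = pool_observed(vectors, mask, "sum")    # [4.0, 6.0]
```

With "mean" pooling the result does not scale with the number of observed junctions, whereas "sum" pooling does, which may matter when coverage varies widely between cells.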

Parameters:

Name Type Description Default
adata AnnData | None

AnnData object with layers/fields registered using EDDISPLICE.setup_anndata. If None, the module is initialized lazily at training time.

None
code_dim int

Dimensionality of the per-junction embedding table used by the encoder.

16
h_hidden_dim int

Hidden size of the shared "h" MLP that processes each observed junction embedding + PSI value before pooling across junctions.

64
encoder_hidden_dim int

Hidden size of the final encoder MLP that maps the pooled per-cell representation to the mean and variance of the latent Gaussian.

128
encoder_n_layers int

Number of hidden layers in the encoder MLP (passed to scvi.nn.FCLayers).

2
latent_dim int

Dimensionality of the latent space z.

10
dropout_rate float

Dropout probability applied in the encoder networks.

0.1
learn_concentration bool

If True, learn a global concentration parameter used in the beta-binomial or Dirichlet–multinomial reconstruction likelihood.

True
splice_likelihood Literal['binomial', 'beta_binomial', 'dirichlet_multinomial']

Reconstruction likelihood for junction counts. One of:

  • "binomial": simple binomial over junction vs. cluster counts.
  • "beta_binomial": beta-binomial with learned concentration.
  • "dirichlet_multinomial": hierarchical Dirichlet–multinomial over ATSE-level totals using a junction→ATSE mapping.
'dirichlet_multinomial'
encode_covariates bool

If True, concatenates observed covariates (batch and any extra covariates registered via setup_anndata) into the encoder input.

False
deeply_inject_covariates bool

If True, passes covariates into the decoder so that reconstruction can be conditioned on batch / covariates as well as z.

True
initialize_embeddings_from_pca bool

If True and adata is provided, initialize the per-junction embedding table with a truncated SVD (PCA) fit to the registered splicing layer instead of random initialization.

True
pool_mode Literal['mean', 'sum']

How to aggregate per-junction representations into a per-cell vector:

  • "mean": average over observed junctions per cell.
  • "sum": sum over observed junctions per cell.
'mean'
max_nobs int

Maximum number of observed (cell, junction) pairs processed in a single chunk inside the encoder. If negative, disables chunking. Useful for controlling memory when masks are dense.

-1
latent_distribution Literal['normal', 'ln']

Form of the latent prior/approximate posterior. "normal" uses a standard Gaussian; "ln" (logistic normal) applies a softmax transformation to the latent samples to produce simplex-valued representations.

'normal'
**kwargs

Additional keyword arguments forwarded to PARTIALVAE. These are rarely needed in routine use and are kept for extensibility.

{}
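As a small illustration of the "ln" option for latent_distribution, the sketch below applies a softmax to one Gaussian latent sample. The function name `softmax` and the sample values are assumptions for illustration; the model's own code is not shown here.

```python
import math


def softmax(z):
    """Map a real-valued vector onto the probability simplex."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


# Hypothetical latent sample with latent_dim = 3.
z_sample = [0.5, -1.2, 2.0]
simplex_z = softmax(z_sample)  # entries are positive and sum to 1
```

The transformed vector keeps the ordering of the original coordinates but lives on the simplex, so downstream distances behave differently from the "normal" case.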
Notes
  • For splice_likelihood="dirichlet_multinomial", a junction→ATSE mapping is constructed from adata.var["event_id"] and stored as a sparse tensor module.junc2atse.
  • Per-junction embeddings can optionally be initialized from PCA for faster and more stable training on high-dimensional splicing matrices.
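The junction→ATSE mapping mentioned in the notes can be pictured with a plain-Python sketch. The model stores the mapping as a sparse tensor; here it is a simple index list, and the helper names and the toy `event_ids` are assumptions for illustration only.

```python
def build_junc2atse(event_ids):
    """Assign each junction the integer index of its ATSE (event)."""
    atse_index = {}
    junc2atse = []
    for ev in event_ids:
        if ev not in atse_index:
            atse_index[ev] = len(atse_index)
        junc2atse.append(atse_index[ev])
    return junc2atse, list(atse_index)


def atse_totals(junction_counts, junc2atse, n_atse):
    """Sum junction counts within each ATSE for one cell."""
    totals = [0] * n_atse
    for count, atse in zip(junction_counts, junc2atse):
        totals[atse] += count
    return totals


# Hypothetical adata.var["event_id"] values for five junctions.
event_ids = ["ATSE_1", "ATSE_1", "ATSE_2", "ATSE_2", "ATSE_2"]
junc2atse, atse_names = build_junc2atse(event_ids)

# Junction counts for one cell; the per-ATSE totals act as the
# multinomial "trials" in a Dirichlet-multinomial likelihood.
counts = [3, 1, 0, 5, 5]
totals = atse_totals(counts, junc2atse, len(atse_names))  # [4, 10]
```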

get_latent_representation(adata=None, indices=None, give_mean=True, batch_size=None)

Return the latent embedding of each cell from the splicing VAE.

Parameters:

Name Type Description Default
adata AnnData | None

AnnData for inference (defaults to init adata).

None
indices Sequence[int] | None

Cell indices to use.

None
give_mean bool

If True, use posterior mean; else sample.

True
batch_size int | None

Batch size.

None

Returns:

Type Description
Array of shape (cells, latent_dim).

get_normalized_splicing(adata=None, indices=None, use_z_mean=True, n_samples=1, batch_size=None, return_numpy=False, silent=True)

Return the decoded splicing probabilities p_nj = sigmoid(decoder_logits).

Parameters:

Name Type Description Default
adata AnnData | None

AnnData for inference (defaults to the one used at init).

None
indices Sequence[int] | None

Which cells to pull (default: all).

None
use_z_mean bool

If True, decode from the posterior mean of z (the generative step is run with use_z_mean=True) rather than from a sampled z.

True
n_samples int

How many posterior samples to draw (passed to inference).

1
batch_size int | None

Mini-batch size (defaults to scvi.settings.batch_size).

None
return_numpy bool

If True, returns a (n_cells, n_junctions) numpy array; otherwise returns a DataFrame with var_names as columns.

False
silent bool

If False, displays progress information.

True

Returns:

Type Description
Array or DataFrame of shape (cells, junctions) of decoded probabilities.
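The relation p_nj = sigmoid(decoder_logits) can be checked on a toy example. The logits below are made up, and the helper `sigmoid` is written out only to keep the sketch free of dependencies.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical decoder logits for one cell and three junctions.
decoder_logits = [[-2.0, 0.0, 3.0]]
probs = [[sigmoid(v) for v in row] for row in decoder_logits]
# A logit of 0 maps to a probability of exactly 0.5.
```

Each returned entry is therefore a value strictly between 0 and 1, with larger logits giving higher decoded junction usage.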

setup_anndata(adata, junc_ratio_layer, junc_counts_layer, cluster_counts_layer, psi_mask_layer, batch_key=None, size_factor_key=None, categorical_covariate_keys=None, continuous_covariate_keys=None, **kwargs) classmethod

Set up AnnData for EDDISPLICE.

Parameters:

Name Type Description Default
adata AnnData

AnnData to register.

required
junc_ratio_layer str

Layer with junction usage ratios (X input).

required
junc_counts_layer str

Layer with junction counts (successes).

required
cluster_counts_layer str

Layer with total cluster counts (trials).

required
psi_mask_layer str

Layer with binary mask (1=observed, 0=missing) per junction.

required
batch_key str | None

Column in obs for batch.

None
size_factor_key str | None

If provided, the size factor is registered, but it is not used for splicing.

None
categorical_covariate_keys list[str] | None

Columns in obs holding categorical covariates.

None
continuous_covariate_keys list[str] | None

Columns in obs holding continuous covariates.

None
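To make the roles of the four layers concrete, here is a minimal sketch that derives a ratio layer and a mask layer from junction counts (successes) and cluster counts (trials). The convention used here (an entry is observed when its cluster count is positive, and unobserved ratios are filled with 0.0) is an assumption for illustration, not a documented requirement; the helper name `build_layers` is hypothetical.

```python
def build_layers(junc_counts, cluster_counts):
    """Return (ratio, mask) matrices from per-cell count matrices."""
    ratio, mask = [], []
    for jc_row, cc_row in zip(junc_counts, cluster_counts):
        r_row, m_row = [], []
        for jc, cc in zip(jc_row, cc_row):
            observed = cc > 0  # assumed definition of "observed"
            m_row.append(1 if observed else 0)
            r_row.append(jc / cc if observed else 0.0)
        ratio.append(r_row)
        mask.append(m_row)
    return ratio, mask


# Two cells, three junctions.
junc_counts = [[2, 0, 6], [0, 0, 1]]
cluster_counts = [[4, 0, 6], [0, 5, 1]]
ratio, mask = build_layers(junc_counts, cluster_counts)
# ratio -> [[0.5, 0.0, 1.0], [0.0, 0.0, 1.0]]
# mask  -> [[1, 0, 1], [0, 1, 1]]
```

Note that a ratio of 0.0 is ambiguous on its own: in the second cell, the second junction is observed with zero usage, while the first junction is simply missing. The mask layer is what lets the encoder tell the two apart.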

train(max_epochs=200, lr=0.0001, accelerator='auto', devices='auto', train_size=None, validation_size=None, shuffle_set_split=True, batch_size=512, weight_decay=0.001, eps=1e-08, early_stopping=True, save_best=True, check_val_every_n_epoch=None, n_steps_kl_warmup=None, n_epochs_kl_warmup=10, reduce_lr_on_plateau=False, lr_factor=0.6, lr_patience=30, lr_threshold=0.0, lr_min=0.0, datasplitter_kwargs=None, plan_kwargs=None, **kwargs)

Trains the model using amortized variational inference on splicing data.
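A typical workflow might look like the sketch below, which uses only the signatures listed on this page. The layer names in `LAYERS` and the wrapper `fit_eddisplice` are hypothetical and must be adapted to your own AnnData; the import is done lazily so that the snippet can be loaded without the package installed.

```python
# Hypothetical layer names; replace them with the layers in your AnnData.
LAYERS = {
    "junc_ratio_layer": "junc_ratio",
    "junc_counts_layer": "junc_counts",
    "cluster_counts_layer": "cluster_counts",
    "psi_mask_layer": "psi_mask",
}


def fit_eddisplice(adata, batch_key=None, max_epochs=200):
    """Register the data, train a model, and return it with its embeddings."""
    from splicevi import EDDISPLICE  # lazy import (assumes splicevi is installed)

    # Register the four splicing layers (and an optional batch column).
    EDDISPLICE.setup_anndata(adata, batch_key=batch_key, **LAYERS)

    # The Dirichlet-multinomial likelihood expects adata.var["event_id"].
    model = EDDISPLICE(
        adata,
        latent_dim=10,
        splice_likelihood="dirichlet_multinomial",
    )
    model.train(max_epochs=max_epochs, early_stopping=True)

    # Posterior-mean embedding, shape (cells, latent_dim).
    latent = model.get_latent_representation()
    return model, latent
```

The returned embedding can then be stored, for example in adata.obsm, and used for neighbor graphs or clustering like any other latent representation.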

Parameters:

Name Type Description Default
max_epochs int

Number of epochs to train over.

200
lr float

Learning rate for optimization.

0.0001
accelerator str

Hardware acceleration options.

'auto'
devices str

Hardware acceleration options.

'auto'
train_size float | None

Proportions for splitting the data.

None
validation_size float | None

Proportions for splitting the data.

None
shuffle_set_split bool

Whether to shuffle indices before splitting.

True
batch_size int

Minibatch size for training.

512
weight_decay float

Optimizer hyperparameters.

0.001
eps float

Optimizer hyperparameters.

1e-08
early_stopping bool

Early stopping and checkpointing options.

True
save_best bool

Early stopping and checkpointing options.

True
check_val_every_n_epoch int | None

Frequency of validation checks.

None
n_steps_kl_warmup int | None

KL warmup parameters.

None
n_epochs_kl_warmup int | None

KL warmup parameters.

10
datasplitter_kwargs dict | None

Additional options for data splitting, training plan, and trainer.

None
plan_kwargs dict | None

Additional options for data splitting, training plan, and trainer.

None
**kwargs dict | None

Additional options for data splitting, training plan, and trainer.

None
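The KL warmup parameters can be understood through a small sketch of a linear schedule. This assumes the schedule follows the convention of the scvi-tools training plan (a linear ramp from 0 to 1, with the epoch-based setting taking precedence over the step-based one); whether EDDISPLICE uses exactly this rule is not stated on this page, and the function `kl_weight` is illustrative.

```python
def kl_weight(epoch, step, n_epochs_kl_warmup=10, n_steps_kl_warmup=None,
              min_weight=0.0, max_weight=1.0):
    """Weight applied to the KL term at a given epoch/step (linear warmup)."""
    slope = max_weight - min_weight
    if n_epochs_kl_warmup:
        if epoch < n_epochs_kl_warmup:
            return min_weight + slope * epoch / n_epochs_kl_warmup
    elif n_steps_kl_warmup:
        if step < n_steps_kl_warmup:
            return min_weight + slope * step / n_steps_kl_warmup
    return max_weight


# With the default n_epochs_kl_warmup=10 the weight ramps over ten epochs.
w_start = kl_weight(epoch=0, step=0)   # 0.0
w_mid = kl_weight(epoch=5, step=0)     # 0.5
w_end = kl_weight(epoch=10, step=0)    # 1.0
```

A gradual ramp like this lets the reconstruction term dominate early training, which can help avoid the latent space collapsing to the prior.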