EDDISPLICE

EDDISPLICE is a single-modality splicing-only VAE that uses the same missingness-aware partial encoder as SpliceVI but without a gene expression branch. It is useful as a baseline or when only splicing data is available.

When to use EDDISPLICE vs SpliceVI

	EDDISPLICE	SpliceVI
Input	Splicing only	Gene expression + Splicing
Use case	Splicing-only datasets; ablation baselines	Paired or unpaired multimodal data
Latent space	Splicing-driven	Joint GE + AS

API reference

`splicevi.EDDISPLICE`

Bases: VAEMixin, UnsupervisedTrainingMixin, BaseModelClass

`__init__(adata=None, code_dim=16, h_hidden_dim=64, encoder_hidden_dim=128, encoder_n_layers=2, latent_dim=10, dropout_rate=0.1, learn_concentration=True, splice_likelihood='dirichlet_multinomial', encode_covariates=False, deeply_inject_covariates=True, initialize_embeddings_from_pca=True, pool_mode='mean', max_nobs=-1, latent_distribution='normal', **kwargs)`

Initialize an EDDISPLICE model wrapping a `PARTIALVAE` with the `PartialEncoderEDDIFaster` encoder and a linear decoder.

This model learns a low-dimensional latent representation of junction-level splicing usage from an AnnData object registered with `EDDISPLICE.setup_anndata`. The encoder combines learned per-junction embeddings with an EDDI-style partial-observation mechanism: for each cell, only the observed junctions are processed and aggregated, so missing junctions do not contribute to that cell's representation. The reconstruction head supports binomial, beta-binomial, or Dirichlet–multinomial likelihoods.
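The partial-observation pooling described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the `PartialEncoderEDDIFaster` implementation: the shared "h" network is assumed to be a single ReLU layer applied to each observed junction's embedding concatenated with its PSI value, the helper name `partial_encode` is hypothetical, and the chunking loop only mimics the role of `max_nobs`. The pooled vector would then go to the encoder MLP that outputs the latent mean and variance, which is omitted here.

```python
import numpy as np


def partial_encode(psi, mask, emb, W_h, b_h, pool_mode="mean", max_nobs=-1):
    """Hypothetical sketch: pool a shared "h" layer over observed (cell, junction) pairs only.

    psi:  (n_cells, n_junctions) PSI values; entries where mask == 0 are ignored
    mask: (n_cells, n_junctions) 1 = observed, 0 = missing
    emb:  (n_junctions, code_dim) per-junction embedding table
    W_h:  (code_dim + 1, h_dim) and b_h: (h_dim,) weights of an assumed one-layer "h" network
    """
    n_cells = psi.shape[0]
    cells, juncs = np.nonzero(mask)  # coordinates of observed pairs only
    pooled = np.zeros((n_cells, W_h.shape[1]))
    # max_nobs <= 0 disables chunking: all observed pairs are processed at once
    chunk = len(cells) if max_nobs <= 0 else max_nobs
    for start in range(0, len(cells), max(chunk, 1)):
        c = cells[start:start + chunk]
        j = juncs[start:start + chunk]
        # each observed pair contributes [junction embedding, PSI value]
        x = np.concatenate([emb[j], psi[c, j][:, None]], axis=1)
        h = np.maximum(x @ W_h + b_h, 0.0)  # shared ReLU layer
        np.add.at(pooled, c, h)  # scatter-add each pair into its cell's row
    if pool_mode == "mean":
        n_obs = mask.sum(axis=1, keepdims=True)
        pooled = pooled / np.maximum(n_obs, 1)  # cells with no observations stay at 0
    elif pool_mode != "sum":
        raise ValueError(f"unknown pool_mode: {pool_mode}")
    return pooled
```

Because only the coordinates returned by `np.nonzero(mask)` are visited, PSI values at masked positions never influence the output, and chunking changes peak memory but not the result.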

Parameters:

Name	Type	Description	Default
`adata`	`AnnData \| None`	AnnData object with layers and fields registered using `EDDISPLICE.setup_anndata`. If `None`, the module is initialized lazily at training time.	`None`
`code_dim`	`int`	Dimensionality of the per-junction embedding table used by the encoder.	`16`
`h_hidden_dim`	`int`	Hidden size of the shared "h" MLP that processes each observed junction's embedding together with its PSI value before pooling across junctions.	`64`
`encoder_hidden_dim`	`int`	Hidden size of the final encoder MLP that maps the pooled per-cell representation to the mean and variance of the latent Gaussian.	`128`
`encoder_n_layers`	`int`	Number of hidden layers in the encoder MLP (passed to `scvi.nn.FCLayers`).	`2`
`latent_dim`	`int`	Dimensionality of the latent space `z`.	`10`
`dropout_rate`	`float`	Dropout probability applied in the encoder networks.	`0.1`
`learn_concentration`	`bool`	If `True`, learn a global concentration parameter used in the beta-binomial or Dirichlet–multinomial reconstruction likelihood.	`True`
`splice_likelihood`	`Literal['binomial', 'beta_binomial', 'dirichlet_multinomial']`	Reconstruction likelihood for junction counts. One of: `"binomial"`: simple binomial over junction vs. cluster counts. `"beta_binomial"`: beta-binomial with learned concentration. `"dirichlet_multinomial"`: hierarchical Dirichlet–multinomial over ATSE-level totals using a junction→ATSE mapping.	`'dirichlet_multinomial'`
`encode_covariates`	`bool`	If `True`, concatenates observed covariates (batch and any extra covariates registered via `setup_anndata`) into the encoder input.	`False`
`deeply_inject_covariates`	`bool`	If `True`, passes covariates into the decoder so that reconstruction can be conditioned on batch and covariates as well as on `z`.	`True`
`initialize_embeddings_from_pca`	`bool`	If `True` and `adata` is provided, initialize the per-junction embedding table with a truncated SVD (PCA) fit to the registered splicing layer instead of random initialization.	`True`
`pool_mode`	`Literal['mean', 'sum']`	How to aggregate per-junction representations into a per-cell vector: `"mean"`: average over observed junctions per cell. `"sum"`: sum over observed junctions per cell.	`'mean'`
`max_nobs`	`int`	Maximum number of observed (cell, junction) pairs processed in a single chunk inside the encoder. If negative, disables chunking. Useful for controlling memory when masks are dense.	`-1`
`latent_distribution`	`Literal['normal', 'ln']`	Distribution of the latent variable. `"normal"` uses a Gaussian latent variable with a standard Gaussian prior; `"ln"` (logistic normal) applies a softmax transformation to the Gaussian latent samples to produce simplex-valued representations.	`'normal'`
`**kwargs`		Additional keyword arguments forwarded to `PARTIALVAE`. These are rarely needed in routine use and are kept for extensibility.	`{}`
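To make the three `splice_likelihood` options concrete, the sketch below writes out their log-probabilities with SciPy. It assumes junction counts play the role of successes `k` and cluster counts the role of trials `n` (as registered in `setup_anndata`), and that the beta-binomial uses a mean/concentration parametrization `alpha = p * c`, `beta = (1 - p) * c`; the exact parametrization inside `PARTIALVAE` may differ. All function names are hypothetical.

```python
import numpy as np
from scipy.special import betaln, gammaln


def log_binom_coef(n, k):
    # log of (n choose k) via the gamma function
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)


def binomial_logpmf(k, n, p):
    """k junction reads out of n cluster reads, with success probability p."""
    return log_binom_coef(n, k) + k * np.log(p) + (n - k) * np.log1p(-p)


def beta_binomial_logpmf(k, n, p, concentration):
    """Assumed parametrization: alpha = p * c, beta = (1 - p) * c."""
    a = p * concentration
    b = (1.0 - p) * concentration
    return log_binom_coef(n, k) + betaln(k + a, n - k + b) - betaln(a, b)


def dirichlet_multinomial_logpmf(x, alpha):
    """x: junction counts within one ATSE; alpha: their Dirichlet concentrations."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, a0 = x.sum(), alpha.sum()
    return (
        gammaln(n + 1) - gammaln(x + 1).sum()
        + gammaln(a0) - gammaln(n + a0)
        + (gammaln(x + alpha) - gammaln(alpha)).sum()
    )
```

With only two junctions in an ATSE, the Dirichlet–multinomial reduces to the beta-binomial; as the concentration grows, the beta-binomial in turn approaches the binomial.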

Notes

For `splice_likelihood="dirichlet_multinomial"`, a junction→ATSE mapping is constructed from `adata.var["event_id"]` and stored as a sparse tensor `module.junc2atse`.
Per-junction embeddings can optionally be initialized from PCA for faster and more stable training on high-dimensional splicing matrices.
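A junction→ATSE mapping of the kind described in the first note can be represented as a sparse 0/1 indicator matrix built from the `event_id` column. This sketch uses `pandas.factorize` and a SciPy CSR matrix rather than the PyTorch sparse tensor stored as `module.junc2atse`, assumes every junction has a non-missing `event_id`, and uses hypothetical helper names.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def build_junc2atse(event_ids):
    """Return a (n_junctions, n_atse) 0/1 CSR matrix and the ATSE labels.

    Assumes every junction has a non-missing event id (pd.factorize codes NaN as -1).
    """
    codes, atse_labels = pd.factorize(pd.Series(event_ids))
    n_junc = len(codes)
    junc2atse = sp.csr_matrix(
        (np.ones(n_junc), (np.arange(n_junc), codes)),
        shape=(n_junc, len(atse_labels)),
    )
    return junc2atse, pd.Index(atse_labels)


def atse_totals(counts, junc2atse):
    """Sum (n_cells, n_junctions) counts within each ATSE -> (n_cells, n_atse)."""
    return np.asarray((junc2atse.T @ np.asarray(counts).T).T)


def per_junction_totals(counts, junc2atse):
    """Broadcast each ATSE total back to its junctions -> (n_cells, n_junctions)."""
    return np.asarray((junc2atse @ atse_totals(counts, junc2atse).T).T)
```

Multiplying by the transpose of the mapping sums junction counts within each ATSE, and multiplying by the mapping itself broadcasts those totals back to the junctions, which is the bookkeeping an ATSE-level likelihood needs.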

`get_latent_representation(adata=None, indices=None, give_mean=True, batch_size=None)`

Return the latent embedding of each cell from the splicing VAE.

Parameters:

Name	Type	Description	Default
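`adata`	`AnnData \| None`	AnnData object to embed. Defaults to the AnnData used to initialize the model.	`None`
`indices`	`Sequence[int] \| None`	Indices of the cells to embed. Defaults to all cells.	`None`
`give_mean`	`bool`	If `True`, return the posterior mean; otherwise return a sample from the posterior.	`True`
`batch_size`	`int \| None`	Minibatch size used during inference.	`None`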

Returns:

Type	Description
Array	Latent embedding of shape `(n_cells, latent_dim)`.
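The effect of `give_mean` can be illustrated with a small NumPy sketch, assuming the encoder returns a diagonal Gaussian posterior (mean and variance) per cell. The helper `latent_from_posterior` is hypothetical; for `latent_distribution="ln"` it simply applies a softmax to whichever vector it returns, whereas the library may handle the transformed mean differently (for example by averaging transformed samples).

```python
import numpy as np


def latent_from_posterior(qz_mean, qz_var, give_mean=True, latent_distribution="normal", rng=None):
    """Hypothetical sketch: one latent vector per cell from a diagonal Gaussian posterior.

    qz_mean, qz_var: (n_cells, latent_dim) posterior means and variances.
    """
    if give_mean:
        z = qz_mean
    else:
        rng = np.random.default_rng() if rng is None else rng
        # reparameterized draw: z = mean + sd * eps, eps ~ N(0, I)
        z = qz_mean + np.sqrt(qz_var) * rng.standard_normal(qz_mean.shape)
    if latent_distribution == "ln":
        # softmax across latent dimensions puts each row on the simplex
        e = np.exp(z - z.max(axis=1, keepdims=True))
        z = e / e.sum(axis=1, keepdims=True)
    return z
```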

`get_normalized_splicing(adata=None, indices=None, use_z_mean=True, n_samples=1, batch_size=None, return_numpy=False, silent=True)`

Return the decoded splicing probabilities `p_nj = sigmoid(decoder_logits)` for each cell `n` and junction `j`.

Parameters:

Name	Type	Description	Default
`adata`	`AnnData \| None`	AnnData for inference (defaults to the one used at init).	`None`
`indices`	`Sequence[int] \| None`	Indices of the cells to decode. Defaults to all cells.	`None`
`use_z_mean`	`bool`	If `True`, decode from the posterior mean of `z` (the generative step is run with `use_z_mean=True`).	`True`
`n_samples`	`int`	How many posterior samples to draw (passed to inference).	`1`
`batch_size`	`int \| None`	Mini-batch size (defaults to scvi.settings.batch_size).	`None`
`return_numpy`	`bool`	If True, returns a (n_cells, n_junctions) numpy array; otherwise returns a DataFrame with var_names as columns.	`False`
`silent`	`bool`	If `False`, display progress information.	`True`

Returns:

Type	Description
Array or DataFrame	Decoded probabilities of shape `(n_cells, n_junctions)`: a NumPy array if `return_numpy=True`, otherwise a DataFrame with `var_names` as columns.
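Assuming the linear decoder maps each latent vector to one logit per junction, the decoded probabilities can be sketched as below. The weights `W` and `b`, the helper name `normalized_splicing`, and the choice to average probabilities over posterior samples are assumptions for illustration; batch and covariate inputs to the decoder are omitted.

```python
import numpy as np
import pandas as pd
from scipy.special import expit


def normalized_splicing(z_samples, W, b, var_names, return_numpy=False):
    """Hypothetical sketch: decode latent samples into per-junction probabilities.

    z_samples: (n_samples, n_cells, latent_dim) latent draws (or the mean, n_samples=1)
    W: (latent_dim, n_junctions), b: (n_junctions,) assumed linear decoder weights
    """
    logits = z_samples @ W + b  # (n_samples, n_cells, n_junctions)
    p = expit(logits).mean(axis=0)  # sigmoid, then average over samples
    if return_numpy:
        return p
    return pd.DataFrame(p, columns=pd.Index(var_names))
```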

`setup_anndata(adata, junc_ratio_layer, junc_counts_layer, cluster_counts_layer, psi_mask_layer, batch_key=None, size_factor_key=None, categorical_covariate_keys=None, continuous_covariate_keys=None, **kwargs)` `classmethod`

Register the splicing layers and covariates of an AnnData object for use with EDDISPLICE.

Parameters:

Name	Type	Description	Default
`adata`	`AnnData`	AnnData to register.	required
`junc_ratio_layer`	`str`	Layer with junction usage ratios (PSI), used as the encoder input.	required
`junc_counts_layer`	`str`	Layer with junction counts (successes).	required
`cluster_counts_layer`	`str`	Layer with total cluster counts (trials).	required
`psi_mask_layer`	`str`	Layer with binary mask (1=observed, 0=missing) per junction.	required
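`batch_key`	`str \| None`	Column in `adata.obs` containing batch labels.	`None`
`size_factor_key`	`str \| None`	Column in `adata.obs` with size factors. If provided, it is registered but not used by the splicing likelihood.	`None`
`categorical_covariate_keys`	`list[str] \| None`	Columns in `adata.obs` to register as categorical covariates.	`None`
`continuous_covariate_keys`	`list[str] \| None`	Columns in `adata.obs` to register as continuous covariates.	`None`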
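The ratio and mask layers expected by `setup_anndata` could be derived from dense junction and cluster counts along these lines. This is a sketch under two assumptions: the usage ratio is `junc_counts / cluster_counts`, and a junction counts as observed in a cell whenever its cluster has nonzero coverage. The helper name and the layer names in the commented usage lines are hypothetical.

```python
import numpy as np


def make_splicing_layers(junc_counts, cluster_counts):
    """Hypothetical sketch: derive ratio and mask layers from dense count matrices.

    Assumes a junction is observed in a cell whenever its cluster has coverage.
    """
    junc_counts = np.asarray(junc_counts, dtype=float)
    cluster_counts = np.asarray(cluster_counts, dtype=float)
    observed = cluster_counts > 0
    # usage ratio = junction reads / cluster reads; left at 0 where unobserved
    junc_ratio = np.divide(junc_counts, cluster_counts, out=np.zeros_like(junc_counts), where=observed)
    return {"junc_ratio": junc_ratio, "psi_mask": observed.astype(np.uint8)}


# Hypothetical usage with an AnnData object (layer names are placeholders):
# adata.layers.update(make_splicing_layers(adata.layers["junc_counts"], adata.layers["cluster_counts"]))
# splicevi.EDDISPLICE.setup_anndata(
#     adata,
#     junc_ratio_layer="junc_ratio",
#     junc_counts_layer="junc_counts",
#     cluster_counts_layer="cluster_counts",
#     psi_mask_layer="psi_mask",
# )
```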

`train(max_epochs=200, lr=0.0001, accelerator='auto', devices='auto', train_size=None, validation_size=None, shuffle_set_split=True, batch_size=512, weight_decay=0.001, eps=1e-08, early_stopping=True, save_best=True, check_val_every_n_epoch=None, n_steps_kl_warmup=None, n_epochs_kl_warmup=10, reduce_lr_on_plateau=False, lr_factor=0.6, lr_patience=30, lr_threshold=0.0, lr_min=0.0, datasplitter_kwargs=None, plan_kwargs=None, **kwargs)`

Train the model with amortized variational inference on the registered splicing data.

Parameters:

Name	Type	Description	Default
`max_epochs`	`int`	Number of epochs to train over.	`200`
`lr`	`float`	Learning rate for optimization.	`0.0001`
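`accelerator`	`str`	Hardware accelerator to train on (for example `'cpu'`, `'gpu'`, or `'auto'`).	`'auto'`
`devices`	`str`	Number or identifiers of the devices to train on, or `'auto'` to select them automatically.	`'auto'`
`train_size`	`float \| None`	Fraction of cells assigned to the training set.	`None`
`validation_size`	`float \| None`	Fraction of cells assigned to the validation set.	`None`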
`shuffle_set_split`	`bool`	Whether to shuffle indices before splitting.	`True`
`batch_size`	`int`	Minibatch size for training.	`512`
`weight_decay`	`float`	Weight decay (L2 penalty) applied by the optimizer.	`0.001`
`eps`	`float`	Epsilon term used by the optimizer for numerical stability.	`1e-08`
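`early_stopping`	`bool`	Whether to stop training early when the monitored validation metric stops improving.	`True`
`save_best`	`bool`	Whether to keep the model state with the best validation metric rather than the final state.	`True`
`check_val_every_n_epoch`	`int \| None`	Run validation every this many epochs.	`None`
`n_steps_kl_warmup`	`int \| None`	Number of training steps over which the KL weight is annealed.	`None`
`n_epochs_kl_warmup`	`int \| None`	Number of epochs over which the KL weight is annealed.	`10`
`reduce_lr_on_plateau`	`bool`	Whether to reduce the learning rate when the monitored validation metric plateaus.	`False`
`lr_factor`	`float`	Factor by which the learning rate is multiplied when it is reduced.	`0.6`
`lr_patience`	`int`	Number of epochs without improvement before the learning rate is reduced.	`30`
`lr_threshold`	`float`	Minimum change in the monitored metric that counts as an improvement.	`0.0`
`lr_min`	`float`	Lower bound on the learning rate.	`0.0`
`datasplitter_kwargs`	`dict \| None`	Additional keyword arguments passed to the data splitter.	`None`
`plan_kwargs`	`dict \| None`	Additional keyword arguments passed to the training plan.	`None`
`**kwargs`		Additional keyword arguments passed to the trainer.	`{}`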
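The KL warmup options control the weight on the KL term of the ELBO early in training. The sketch below assumes a linear schedule from 0 to 1, with the epoch-based setting used when it is set and the step-based setting otherwise, similar in spirit to scvi-tools' default training plan; the helper name `kl_weight` is hypothetical, and the actual schedule is whatever the training plan used by EDDISPLICE implements.

```python
def kl_weight(epoch, step, n_epochs_kl_warmup=10, n_steps_kl_warmup=None,
              min_kl_weight=0.0, max_kl_weight=1.0):
    """Hypothetical sketch: linear KL annealing; epoch-based warmup is used when set."""
    slope = max_kl_weight - min_kl_weight
    if n_epochs_kl_warmup:
        if epoch < n_epochs_kl_warmup:
            return min_kl_weight + slope * epoch / n_epochs_kl_warmup
    elif n_steps_kl_warmup:
        if step < n_steps_kl_warmup:
            return min_kl_weight + slope * step / n_steps_kl_warmup
    # warmup finished (or disabled): full KL weight
    return max_kl_weight
```

With the default `n_epochs_kl_warmup=10`, the weight would be 0.5 at epoch 5 and 1.0 from epoch 10 onward.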
