Style-Aware Blending and Prototype-Based Cross-Contrast Consistency for Semi-Supervised Medical Image Segmentation

A novel framework addressing distribution mismatch and incomplete utilization of supervisory information in SSMIS through style-guided distribution blending and prototype-based cross-contrast learning

Chaowei Chen1, Xiang Zhang1, Honglie Guo1, and Shunfang Wang1,2,*

1School of Information Science and Engineering, Yunnan University, Kunming 650504, Yunnan, China

2Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, Yunnan, China

Correspondence: sfwang_66@ynu.edu.cn

Abstract

Research Overview

Research Flow: From Phenomena to Solutions

Weak-strong consistency learning strategies are widely employed in semi-supervised medical image segmentation, but existing methods overlook critical limitations within the framework itself.

01

Empirical Distribution Mismatch Challenge

Phenomenon
Empirical Distribution Mismatch

Although labeled and unlabeled data are drawn from the same underlying distribution, the limited number of labeled samples often yields an incomplete empirical representation, causing a noticeable distribution mismatch with the unlabeled set.

Problem
Separated Training Streams

Separate, symmetric training pipelines for labeled and unlabeled images lead to confirmation bias dominated by the labeled stream, failing to exploit the full potential of the unlabeled data.

Solution
Style-Guided Distribution Blending

Novel approach that transfers statistical moments from unlabeled to labeled images, enabling effective cross-stream information interaction while preserving semantic content.

02

Consistency Utilization Challenge

Phenomenon
Weak-Strong Consistency

Current frameworks primarily enforce one-directional consistency from weak to strong augmentations, utilizing weak predictions as pseudo-labels for strong predictions.

Problem
Incomplete Supervisory Utilization

Valuable information in strongly augmented predictions is underutilized, limiting exploration of bidirectional consistency and missing opportunities for mutual learning.

Solution
Prototype-Based Cross-Contrast Learning

Confidence-guided prototype estimation with cross-view contrastive learning enables bidirectional supervision while mitigating noise from strong predictions.

Integrated Framework

Our Style-Aware Blending and Prototype-Based Cross-Contrast Consistency Learning Framework synergistically combines both solutions to achieve superior performance across multiple medical segmentation benchmarks, demonstrating effectiveness in various semi-supervised settings.

Plug-and-play architecture
State-of-the-art performance
Cross-dataset generalization
Overview

Framework Overview

Figure: Illustration of our framework, comprising two main components: a style-guided distribution blending module (solid box) and a dual-branch architecture (dashed box) with labeled and unlabeled branches. The labeled branch is trained on style-blended labeled data, while the unlabeled branch enforces both pixel-wise weak-strong consistency and prototype-based cross-contrast consistency.

Style-Guided Distribution Blending

Addresses the empirical distribution mismatch between labeled and unlabeled data by transferring statistical moments (mean and variance) from unlabeled to labeled images; a minimal code sketch follows the list below.

Statistical moment transfer
Content preservation
Style space interpolation
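As a concrete illustration, the sketch below shows one way such channel-wise moment transfer with style-space interpolation could look in PyTorch. This is a minimal sketch assuming AdaIN-style statistics and a uniform mixing coefficient (consistent with the ablation below, where uniform sampling performs best); the function name and exact formulation are illustrative, not the authors' released code.

```python
import torch

def style_blend(x_l: torch.Tensor, x_u: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Blend labeled images toward the style of unlabeled images by
    transferring channel-wise statistical moments (mean and std) while
    preserving the labeled images' semantic content.

    x_l, x_u: (B, C, H, W) batches of labeled / unlabeled images.
    """
    mu_l = x_l.mean(dim=(2, 3), keepdim=True)
    std_l = x_l.std(dim=(2, 3), keepdim=True) + eps
    mu_u = x_u.mean(dim=(2, 3), keepdim=True)
    std_u = x_u.std(dim=(2, 3), keepdim=True) + eps

    # Interpolate in style space with a uniform mixing coefficient.
    lam = torch.rand(x_l.size(0), 1, 1, 1, device=x_l.device)
    mu_mix = lam * mu_l + (1 - lam) * mu_u
    std_mix = lam * std_l + (1 - lam) * std_u

    # Normalize out the labeled style, then re-style with the blended moments.
    return (x_l - mu_l) / std_l * std_mix + mu_mix
```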

Prototype-Based Cross-Contrast

Enforces mutual consistency between weak and strong augmentations through confidence-guided prototype estimation and cross-view contrastive learning.

Confidence-weighted aggregation
Memory bank storage
Bidirectional supervision

Key Architectural Components

Labeled Branch

Processes style-blended labeled images with weak augmentations, trained using combined cross-entropy and Dice loss.

Unlabeled Branch

Enforces weak-strong consistency and prototype-based cross-contrast consistency using teacher-student paradigm.

Memory Bank

Queue-based category-wise storage for robust prototype representations across training iterations (see the sketch after this list).

Cross-View Loss

Mutual contrastive learning between weak and strong views for enhanced feature representations.
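To make the memory-bank component concrete, here is a minimal sketch of a queue-based, category-wise store; the class interface and weighting scheme are assumptions for illustration (the default capacity mirrors the best-performing K = 128 in the ablation below), not the authors' exact implementation.

```python
from collections import deque
import torch

class PrototypeMemoryBank:
    """Category-wise FIFO queues of prototype vectors and their
    confidence weights, accumulated across training iterations."""

    def __init__(self, num_classes: int, capacity: int = 128):
        self.queues = [deque(maxlen=capacity) for _ in range(num_classes)]

    def enqueue(self, c: int, prototype: torch.Tensor, weight: float) -> None:
        # maxlen makes the deque drop the oldest entry once full.
        self.queues[c].append((prototype.detach(), weight))

    def aggregate(self, c: int):
        """Confidence-weighted average of the stored class-c prototypes,
        yielding a robust prototype for contrastive learning."""
        if not self.queues[c]:
            return None  # class c not observed yet
        protos, weights = zip(*self.queues[c])
        w = torch.tensor(weights).unsqueeze(1)            # (K, 1)
        return (torch.stack(protos) * w).sum(0) / w.sum()
```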

Methodology

Style-Guided Distribution Blending

Data Distribution Analysis

The following provides an intuitive visual comparison between the labeled and unlabeled images of the Synapse dataset under the 5% split setting, demonstrating the empirical distribution mismatch challenge.

Figure: Representative labeled slices (153 samples, 5% of the data; limited, incomplete coverage) contrasted with unlabeled slices (2058 samples, 95%; abundant, complete coverage).

The labeled and unlabeled data exhibit significant distributional differences, particularly evident in organs such as the liver. This phenomenon arises from variations in CT scanning equipment and parameters, which result in different statistical distributions in the grayscale appearance of medical images. Such distribution mismatch poses a fundamental challenge for semi-supervised learning approaches.
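One simple way to quantify this mismatch is to compare the first and second intensity moments of the two subsets. The NumPy sketch below is a hypothetical analysis helper, not the authors' code:

```python
import numpy as np

def moment_summary(slices):
    """Summarize the intensity distribution of a set of 2D slices via
    per-slice first and second moments (mean and std)."""
    means = np.array([s.mean() for s in slices])
    stds = np.array([s.std() for s in slices])
    return {"mean(mu)": means.mean(), "std(mu)": means.std(),
            "mean(sigma)": stds.mean(), "std(sigma)": stds.std()}

# Comparing moment_summary(labeled_slices) against
# moment_summary(unlabeled_slices) exposes the gap in grayscale statistics.
```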

Prototype-Based Cross-Contrast Consistency

Motivation and Approach

Current weak-strong consistency frameworks primarily enforce one-directional supervision, underutilizing valuable information from strongly augmented predictions. Our prototype-based cross-contrast strategy enables bidirectional learning while mitigating noise through confidence-guided aggregation.

Pipeline Overview

Step 1: Class-wise Feature Extraction

Extract projected features from the weakly and strongly augmented views:

$$Z^* = q(f_\theta(A_*(X^u)))$$

Step 2: Confidence-Guided Aggregation

Compute category-wise prototypes weighted by prediction confidence:

$$\tilde{P}_c^* = \frac{\sum_{i,j} Z_{:,i,j}^* \cdot P_{c,i,j}^*}{\sum_{i,j} P_{c,i,j}^*}$$

Step 3: Category-wise Memory Bank Storage

Store prototypes in category-specific queues and aggregate them for robustness:

$$P_c^* = \frac{\sum_{k=1}^K \tilde{P}_{c_k}^* \cdot w_{c_k}^*}{\sum_{k=1}^K w_{c_k}^*}$$

Step 4: Cross-View Contrastive Loss

Enforce mutual consistency between the weak and strong views (a code sketch of steps 2 and 4 follows this pipeline):

$$\mathcal{L}_{\text{ctr}} = \frac{1}{2}(\mathcal{L}_{\text{ctr}}^w + \mathcal{L}_{\text{ctr}}^s)$$
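Steps 2 and 4 might be sketched in PyTorch as follows; the tensor shapes, the cosine-similarity choice, and the function names are assumptions consistent with the formulas above, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def class_prototypes(Z: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Confidence-guided prototype estimation (step 2).
    Z: (D, H, W) projected features; P: (C, H, W) softmax probabilities.
    Returns (C, D): per-class, confidence-weighted feature means."""
    Zf = Z.reshape(Z.size(0), -1)                 # (D, HW)
    Pf = P.reshape(P.size(0), -1)                 # (C, HW)
    return (Pf @ Zf.t()) / Pf.sum(dim=1, keepdim=True).clamp(min=1e-6)

def cross_contrast_loss(Z, Y_hat, protos_other, tau: float = 0.1):
    """Cross-view contrastive loss (step 4): pixels of one view are
    pulled toward the other view's prototype of their pseudo-label and
    pushed away from the remaining class prototypes.
    Y_hat: (H, W) long tensor of pseudo-labels."""
    Zf = F.normalize(Z.reshape(Z.size(0), -1).t(), dim=1)   # (HW, D)
    Pn = F.normalize(protos_other, dim=1)                   # (C, D)
    logits = Zf @ Pn.t() / tau                              # (HW, C)
    return F.cross_entropy(logits, Y_hat.reshape(-1))

# Bidirectional supervision: L_ctr = (L_ctr^w + L_ctr^s) / 2
# loss_ctr = 0.5 * (cross_contrast_loss(Z_w, Y_w, P_s) +
#                   cross_contrast_loss(Z_s, Y_s, P_w))
```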

Cross-Contrast Mechanism

Weak View

Weak Pixel Features → Strong Prototypes

Bidirectional Supervision

Strong View

Strong Pixel Features → Weak Prototypes

Loss Formulation

Weak-to-Strong Contrastive Loss
$$\mathcal{L}_{\text{ctr}}^w (Z_{i,j}^w, P^s) = -\log \left( \frac{\exp \left( \text{sim}(Z_{i,j}^w, P_{\hat{Y}_{i,j}^w}^s)/\tau \right)}{\sum_{c=1}^{\mathcal{C}} \exp \left( \text{sim}(Z_{i,j}^w, P_c^s)/\tau \right)} \right)$$
Strong-to-Weak Contrastive Loss
$$\mathcal{L}_{\text{ctr}}^s (Z_{i,j}^s, P^w) = -\log \left( \frac{\exp \left( \text{sim}(Z_{i,j}^s, P_{\hat{Y}_{i,j}^s}^w)/\tau \right)}{\sum_{c=1}^{\mathcal{C}} \exp \left( \text{sim}(Z_{i,j}^s, P_c^w)/\tau \right)} \right)$$
Combined Loss
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{sup}} + \lambda(t) \cdot (\alpha \mathcal{L}_{\text{con}} + \beta \mathcal{L}_{\text{ctr}})$$
where \(\lambda(t)\) is a time-dependent weighting function, \(\alpha\) and \(\beta\) are hyperparameters, and \(\tau\) is the temperature parameter.
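The exact form of \(\lambda(t)\) is not spelled out here; a common choice in mean-teacher style frameworks is a sigmoid-shaped ramp-up, sketched below under that assumption (the schedule length and hyperparameter values are illustrative):

```python
import math

def ramp_up_weight(t: int, t_max: int, lam_max: float = 1.0) -> float:
    """Sigmoid-shaped ramp-up (as in Laine & Aila, 2017): rises smoothly
    from 0 to lam_max over the first t_max iterations, down-weighting the
    noisy unsupervised losses early in training."""
    if t >= t_max:
        return lam_max
    phase = 1.0 - t / t_max
    return lam_max * math.exp(-5.0 * phase * phase)

# Total objective, following the formula above:
# loss_total = loss_sup + ramp_up_weight(t, t_max) * (alpha * loss_con + beta * loss_ctr)
```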
Results

Experimental Results

State-of-the-Art

Achieves new SOTA results across multiple medical segmentation benchmarks

Consistent Improvements

Demonstrates significant improvements over baseline methods

Plug-and-Play

Easy to integrate into other frameworks to enhance performance

Synapse Multi-Organ Dataset

Comprehensive evaluation on 8 abdominal organs (Aorta, Gallbladder, Left/Right Kidney, Liver, Pancreas, Spleen, Stomach) with CT scans under 5% and 10% labeled settings.

5% Labeled Data Performance

Synapse DSC ↑ 61.87% +3.90%
Synapse ASD ↓ 18.81mm Best

10% Labeled Data Performance

Synapse DSC ↑ 66.59% +6.12%
Synapse ASD ↓ 23.52mm Competitive

Detailed Comparison on Synapse Dataset

| Method | Labeled | DSC ↑ | ASD ↓ | Aorta | GB | KL | KR | Liver | PC | SP | SM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UNet (Baseline) | 1 (5%) | 32.16 | 41.57 | 50.72 | 18.78 | 27.72 | 28.88 | 73.78 | 9.60 | 36.85 | 10.98 |
| UA-MT | 1 (5%) | 34.36 | 45.59 | 64.42 | 21.88 | 26.46 | 29.91 | 73.31 | 11.75 | 34.22 | 12.89 |
| SS-Net | 1 (5%) | 36.95 | 31.20 | 65.44 | 24.51 | 39.16 | 19.78 | 86.40 | 2.04 | 51.19 | 7.11 |
| BCP | 1 (5%) | 43.09 | 62.03 | 65.31 | 12.44 | 48.18 | 46.25 | 82.79 | 8.51 | 62.34 | 18.88 |
| MCSC | 1 (5%) | 34.00 | -- | 50.90 | 13.00 | 17.60 | 54.60 | 64.30 | 5.50 | 43.10 | 23.50 |
| SCP-Net | 1 (5%) | 36.38 | 40.00 | 58.54 | 21.06 | 34.92 | 35.84 | 76.74 | 11.43 | 39.48 | 13.04 |
| ABD | 1 (5%) | 53.37 | 18.89 | 74.87 | 2.73 | 73.51 | 70.69 | 89.56 | 17.10 | 63.77 | 34.69 |
| AD-MT | 1 (5%) | 57.97 | 22.82 | 76.96 | 26.90 | 80.36 | 72.67 | 84.34 | 17.33 | 76.89 | 28.29 |
| Ours (MT) | 1 (5%) | 60.00 | 19.73 | 79.53 | 40.52 | 68.59 | 63.05 | 89.80 | 20.70 | 78.38 | 39.00 |
| Ours (BCP) | 1 (5%) | 61.87 | 18.81 | 80.07 | 38.78 | 68.30 | 63.12 | 91.60 | 24.54 | 80.65 | 47.93 |
| UNet (Baseline) | 4 (10%) | 37.75 | 43.54 | 58.06 | 23.07 | 39.33 | 27.94 | 81.00 | 10.48 | 40.71 | 21.39 |
| UA-MT | 4 (10%) | 39.94 | 42.24 | 69.96 | 26.67 | 44.14 | 27.30 | 83.46 | 4.36 | 44.72 | 18.90 |
| SS-Net | 4 (10%) | 41.46 | 24.50 | 77.94 | 31.54 | 54.01 | 12.02 | 86.25 | 3.36 | 56.98 | 9.55 |
| BCP | 4 (10%) | 51.77 | 43.11 | 45.08 | 31.34 | 62.03 | 58.05 | 91.07 | 16.34 | 78.82 | 31.44 |
| MCSC | 4 (10%) | 61.10 | -- | 73.90 | 26.40 | 69.90 | 72.70 | 90.00 | 33.20 | 79.40 | 43.00 |
| SCP-Net | 4 (10%) | 45.07 | 25.62 | 63.44 | 25.81 | 57.95 | 33.07 | 89.64 | 14.70 | 52.99 | 22.99 |
| ABD | 4 (10%) | 59.61 | 9.47 | 78.64 | 4.93 | 73.27 | 68.91 | 90.23 | 36.97 | 76.44 | 47.48 |
| AD-MT | 4 (10%) | 60.47 | 20.63 | 69.55 | 28.72 | 76.44 | 74.56 | 89.18 | 30.53 | 80.86 | 33.89 |
| Ours (MT) | 4 (10%) | 65.81 | 15.88 | 80.77 | 45.42 | 76.39 | 76.45 | 90.10 | 30.14 | 75.84 | 51.36 |
| Ours (BCP) | 4 (10%) | 66.59 | 23.52 | 81.79 | 40.09 | 76.80 | 68.64 | 90.43 | 38.80 | 75.79 | 60.39 |

DSC: Dice Similarity Coefficient (%); ASD: Average Surface Distance (mm). GB: Gallbladder, KL: Left Kidney, KR: Right Kidney, PC: Pancreas, SP: Spleen, SM: Stomach.
Figure: DSC bar charts. 5% labeled: UNet 32.16%, UA-MT 34.36%, BCP 43.09%, ABD 53.37%, AD-MT 57.97%, Ours 61.87%. 10% labeled: UNet 37.75%, UA-MT 39.94%, BCP 51.77%, ABD 59.61%, AD-MT 60.47%, Ours 66.59%.

ACDC Cardiac Dataset

Evaluation on 3 cardiac structures (Left Ventricle, Myocardium, Right Ventricle) with MRI scans under semi-supervised settings using 3 cases (5%) and 7 cases (10%) as labeled data.

5% Labeled Data Performance

ACDC DSC ↑ 88.60% +1.43%
ACDC ASD ↓ 0.61mm Best

10% Labeled Data Performance

ACDC DSC ↑ 89.80% +0.99%
ACDC ASD ↓ 0.95mm 2nd Best

Detailed Comparison on ACDC Dataset

| Method | Labeled | DSC ↑ | ASD ↓ | LV | Myo | RV |
|---|---|---|---|---|---|---|
| UNet (Baseline) | 3 (5%) | 78.51 | 2.47 | 88.81 | 77.14 | 69.58 |
| UA-MT | 3 (5%) | 56.58 | 8.04 | 59.10 | 41.40 | 69.24 |
| SS-Net | 3 (5%) | 65.82 | 2.28 | 65.66 | 57.55 | 74.26 |
| BCP | 3 (5%) | 87.59 | 0.68 | 85.97 | 85.71 | 91.09 |
| MCSC | 3 (5%) | 73.60 | -- | 79.20 | 70.00 | 71.70 |
| SCP-Net | 3 (5%) | 70.93 | 6.55 | 70.78 | 61.89 | 80.11 |
| ABD | 3 (5%) | 85.15 | 2.81 | 82.71 | 84.46 | 88.29 |
| AD-MT | 3 (5%) | 88.22 | 0.94 | 86.13 | 86.68 | 91.85 |
| Ours (MT) | 3 (5%) | 87.31 | 1.10 | 84.90 | 85.60 | 91.42 |
| Ours (BCP) | 3 (5%) | 88.60 | 0.61 | 85.51 | 87.36 | 92.93 |
| UNet (Baseline) | 7 (10%) | 80.75 | 2.75 | 80.74 | 74.84 | 86.67 |
| UA-MT | 7 (10%) | 80.60 | 2.91 | 78.79 | 77.87 | 85.13 |
| SS-Net | 7 (10%) | 86.78 | 1.40 | 84.34 | 85.36 | 90.64 |
| BCP | 7 (10%) | 88.84 | 1.17 | 86.54 | 87.68 | 92.30 |
| MCSC | 7 (10%) | 89.40 | -- | 93.60 | 87.60 | 87.10 |
| SCP-Net | 7 (10%) | 88.17 | 1.67 | 85.39 | 87.74 | 91.39 |
| ABD | 7 (10%) | 87.62 | 3.08 | 84.50 | 88.19 | 90.17 |
| AD-MT | 7 (10%) | 89.07 | 0.82 | 87.11 | 88.12 | 91.96 |
| Ours (MT) | 7 (10%) | 89.27 | 0.63 | 86.63 | 88.35 | 92.82 |
| Ours (BCP) | 7 (10%) | 89.80 | 0.95 | 86.10 | 90.36 | 92.94 |

DSC: Dice Similarity Coefficient (%); ASD: Average Surface Distance (mm). LV: Left Ventricle, Myo: Myocardium, RV: Right Ventricle.
Figure: DSC bar charts. 5% labeled: UNet 78.51%, UA-MT 84.25%, BCP 86.83%, ABD 87.09%, AD-MT 87.17%, Ours 88.60%. 10% labeled: UNet 84.31%, UA-MT 87.55%, BCP 88.12%, ABD 88.81%, AD-MT 88.81%, Ours 89.80%.

Qualitative Results

Visual Results
Visual segmentation results showing a superior balance between under-segmentation and over-segmentation.
Heatmap Analysis
Foreground class probabilities are visualized as heatmaps, with red-to-white transitions indicating increasing prediction confidence.

Ablation Studies

Component Analysis

Mean Teacher (Baseline) 55.78% Base
+ Style-Guided Distribution Blending (SDB) 62.11% +6.33%
+ Prototype-Based Cross-Contrast Learning 61.28% +5.50%
Ours (MT) 65.81% Best

Memory Bank Analysis

K = 32 64.76% +0.53%
K = 64 63.87% -0.36%
K = 128 65.81% Best
K = 256 64.78% +0.55%

Style Mixing Coefficient

Binomial 65.64% +9.86%
Beta 64.33% +8.55%
Uniform 65.81% Best

Cross-Contrast Strategy

Weak → Strong 62.35% +6.57%
Strong → Weak 62.14% +6.36%
Bidirectional (Ours) 65.81% Best
Conclusion

Key Takeaways

We propose a novel Style-Aware Blending and Prototype-Based Cross-Contrast Consistency Learning Framework for semi-supervised medical image segmentation that addresses two critical limitations in existing approaches: distribution mismatch and incomplete utilization of supervisory information.

Key Achievements

Novel Insights

Identified distribution mismatch in statistical moments and underutilization of strong predictions

Innovative Solutions

Style-guided blending and prototype-based cross-contrast for effective information exchange

Superior Performance

State-of-the-art results across multiple benchmarks with significant improvements
