Style-Aware Blending and Prototype-Based Cross-Contrast Consistency for Semi-Supervised Medical Image Segmentation

A novel framework addressing distribution mismatch and incomplete utilization of supervisory information in SSMIS through style-guided distribution blending and prototype-based cross-contrast learning

Chaowei Chen1, Xiang Zhang1, Honglie Guo1, and Shunfang Wang1,2,*

1School of Information Science and Engineering, Yunnan University, Kunming 650504, Yunnan, China

2Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, Yunnan, China

Correspondence: sfwang_66@ynu.edu.cn

Abstract

Research Overview

Research Flow: From Phenomena to Solutions

Weak-strong consistency learning strategies are widely employed in semi-supervised medical image segmentation, but existing methods overlook critical limitations within the framework itself.

01

Empirical Distribution Mismatch Challenge

Phenomenon
Empirical Distribution Mismatch

Although labeled and unlabeled data are drawn from the same underlying distribution, the limited number of labeled samples often yields an incomplete empirical representation, causing a noticeable distribution mismatch with the unlabeled set.

Problem
Separated Training Streams

Separate, symmetric training pipelines for labeled and unlabeled images lead to confirmation bias dominated by the labeled stream, failing to exploit the full potential of the unlabeled data.

Solution
Style-Guided Distribution Blending

Novel approach that transfers statistical moments from unlabeled to labeled images, enabling effective cross-stream information interaction while preserving semantic content.

02

Consistency Utilization Challenge

Phenomenon
Weak-Strong Consistency

Current frameworks primarily enforce one-directional consistency from weak to strong augmentations, utilizing weak predictions as pseudo-labels for strong predictions.

Problem
Incomplete Supervisory Utilization

Valuable information in strongly augmented predictions is underutilized, limiting exploration of bidirectional consistency and missing opportunities for mutual learning.

Solution
Prototype-Based Cross-Contrast Learning

Confidence-guided prototype estimation with cross-view contrastive learning enables bidirectional supervision while mitigating noise from strong predictions.

Integrated Framework

Our Style-Aware Blending and Prototype-Based Cross-Contrast Consistency Learning Framework synergistically combines both solutions to achieve superior performance across multiple medical segmentation benchmarks, demonstrating effectiveness in various semi-supervised settings.

Plug-and-play architecture
State-of-the-art performance
Cross-dataset generalization
Overview

Framework Overview

Figure: Illustration of our framework, comprising two main components: a style-guided distribution blending module (solid box) and a dual-branch architecture (dashed box) with labeled and unlabeled branches. The labeled branch is trained on style-blended labeled data, while the unlabeled branch enforces both pixel-wise weak-strong consistency and prototype-based cross-contrast consistency.

Style-Guided Distribution Blending

Addresses the empirical distribution mismatch between labeled and unlabeled data by transferring statistical moments (mean and variance) from unlabeled to labeled images; a minimal code sketch follows the list below.

Statistical moment transfer
Content preservation
Style space interpolation
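As a concrete illustration, the sketch below shows one way such channel-wise moment transfer with style-space interpolation could look in PyTorch. This is a minimal sketch assuming AdaIN-style statistics and a uniform mixing coefficient (consistent with the ablation below, where uniform sampling performs best); the function name and exact formulation are illustrative, not the authors' released code.

```python
import torch

def style_blend(x_l: torch.Tensor, x_u: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Blend labeled images toward the style of unlabeled images by
    transferring channel-wise statistical moments (mean and std) while
    preserving the labeled images' semantic content.

    x_l, x_u: (B, C, H, W) batches of labeled / unlabeled images.
    """
    mu_l = x_l.mean(dim=(2, 3), keepdim=True)
    std_l = x_l.std(dim=(2, 3), keepdim=True) + eps
    mu_u = x_u.mean(dim=(2, 3), keepdim=True)
    std_u = x_u.std(dim=(2, 3), keepdim=True) + eps

    # Interpolate in style space with a uniform mixing coefficient.
    lam = torch.rand(x_l.size(0), 1, 1, 1, device=x_l.device)
    mu_mix = lam * mu_l + (1 - lam) * mu_u
    std_mix = lam * std_l + (1 - lam) * std_u

    # Normalize out the labeled style, then re-style with the blended moments.
    return (x_l - mu_l) / std_l * std_mix + mu_mix
```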

Prototype-Based Cross-Contrast

Enforces mutual consistency between weak and strong augmentations through confidence-guided prototype estimation and cross-view contrastive learning.

Confidence-weighted aggregation
Memory bank storage
Bidirectional supervision

Key Architectural Components

Labeled Branch

Processes style-blended labeled images with weak augmentations, trained using combined cross-entropy and Dice loss.

Unlabeled Branch

Enforces weak-strong consistency and prototype-based cross-contrast consistency using teacher-student paradigm.

Memory Bank

Queue-based category-wise storage for robust prototype representations across training iterations (see the sketch after this list).

Cross-View Loss

Mutual contrastive learning between weak and strong views for enhanced feature representations.
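To make the memory-bank component concrete, here is a minimal sketch of a queue-based, category-wise store; the class interface and weighting scheme are assumptions for illustration (the default capacity mirrors the best-performing K = 128 in the ablation below), not the authors' exact implementation.

```python
from collections import deque
import torch

class PrototypeMemoryBank:
    """Category-wise FIFO queues of prototype vectors and their
    confidence weights, accumulated across training iterations."""

    def __init__(self, num_classes: int, capacity: int = 128):
        self.queues = [deque(maxlen=capacity) for _ in range(num_classes)]

    def enqueue(self, c: int, prototype: torch.Tensor, weight: float) -> None:
        # maxlen makes the deque drop the oldest entry once full.
        self.queues[c].append((prototype.detach(), weight))

    def aggregate(self, c: int):
        """Confidence-weighted average of the stored class-c prototypes,
        yielding a robust prototype for contrastive learning."""
        if not self.queues[c]:
            return None  # class c not observed yet
        protos, weights = zip(*self.queues[c])
        w = torch.tensor(weights).unsqueeze(1)            # (K, 1)
        return (torch.stack(protos) * w).sum(0) / w.sum()
```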

Methodology

Style-Guided Distribution Blending

Data Distribution Analysis

The following provides an intuitive visual comparison between the labeled and unlabeled images of the Synapse dataset under the 5% split setting, demonstrating the empirical distribution mismatch challenge.

Figure: Representative labeled slices (153 samples, 5% of the data; limited, incomplete coverage) contrasted with unlabeled slices (2058 samples, 95%; abundant, complete coverage).

The labeled and unlabeled data exhibit significant distributional differences, particularly evident in organs such as the liver. This phenomenon arises from variations in CT scanning equipment and parameters, which result in different statistical distributions in the grayscale appearance of medical images. Such distribution mismatch poses a fundamental challenge for semi-supervised learning approaches.
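One simple way to quantify this mismatch is to compare the first and second intensity moments of the two subsets. The NumPy sketch below is a hypothetical analysis helper, not the authors' code:

```python
import numpy as np

def moment_summary(slices):
    """Summarize the intensity distribution of a set of 2D slices via
    per-slice first and second moments (mean and std)."""
    means = np.array([s.mean() for s in slices])
    stds = np.array([s.std() for s in slices])
    return {"mean(mu)": means.mean(), "std(mu)": means.std(),
            "mean(sigma)": stds.mean(), "std(sigma)": stds.std()}

# Comparing moment_summary(labeled_slices) against
# moment_summary(unlabeled_slices) exposes the gap in grayscale statistics.
```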

Prototype-Based Cross-Contrast Consistency

Motivation and Approach

Current weak-strong consistency frameworks primarily enforce one-directional supervision, underutilizing valuable information from strongly augmented predictions. Our prototype-based cross-contrast strategy enables bidirectional learning while mitigating noise through confidence-guided aggregation.

Pipeline Overview

Step 1: Class-wise Feature Extraction

Extract projected features from the weakly and strongly augmented views:

$$Z^* = q(f_\theta(A_*(X^u)))$$

Step 2: Confidence-Guided Aggregation

Compute category-wise prototypes weighted by prediction confidence:

$$\tilde{P}_c^* = \frac{\sum_{i,j} Z_{:,i,j}^* \cdot P_{c,i,j}^*}{\sum_{i,j} P_{c,i,j}^*}$$

Step 3: Category-wise Memory Bank Storage

Store prototypes in category-specific queues and aggregate them for robustness:

$$P_c^* = \frac{\sum_{k=1}^K \tilde{P}_{c_k}^* \cdot w_{c_k}^*}{\sum_{k=1}^K w_{c_k}^*}$$

Step 4: Cross-View Contrastive Loss

Enforce mutual consistency between the weak and strong views (a code sketch of steps 2 and 4 follows this pipeline):

$$\mathcal{L}_{\text{ctr}} = \frac{1}{2}(\mathcal{L}_{\text{ctr}}^w + \mathcal{L}_{\text{ctr}}^s)$$
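Steps 2 and 4 might be sketched in PyTorch as follows; the tensor shapes, the cosine-similarity choice, and the function names are assumptions consistent with the formulas above, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def class_prototypes(Z: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Confidence-guided prototype estimation (step 2).
    Z: (D, H, W) projected features; P: (C, H, W) softmax probabilities.
    Returns (C, D): per-class, confidence-weighted feature means."""
    Zf = Z.reshape(Z.size(0), -1)                 # (D, HW)
    Pf = P.reshape(P.size(0), -1)                 # (C, HW)
    return (Pf @ Zf.t()) / Pf.sum(dim=1, keepdim=True).clamp(min=1e-6)

def cross_contrast_loss(Z, Y_hat, protos_other, tau: float = 0.1):
    """Cross-view contrastive loss (step 4): pixels of one view are
    pulled toward the other view's prototype of their pseudo-label and
    pushed away from the remaining class prototypes.
    Y_hat: (H, W) long tensor of pseudo-labels."""
    Zf = F.normalize(Z.reshape(Z.size(0), -1).t(), dim=1)   # (HW, D)
    Pn = F.normalize(protos_other, dim=1)                   # (C, D)
    logits = Zf @ Pn.t() / tau                              # (HW, C)
    return F.cross_entropy(logits, Y_hat.reshape(-1))

# Bidirectional supervision: L_ctr = (L_ctr^w + L_ctr^s) / 2
# loss_ctr = 0.5 * (cross_contrast_loss(Z_w, Y_w, P_s) +
#                   cross_contrast_loss(Z_s, Y_s, P_w))
```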

Cross-Contrast Mechanism

Weak View

Weak Pixel Features → Strong Prototypes

Bidirectional Supervision

Strong View

Strong Pixel Features → Weak Prototypes

Loss Formulation

Weak-to-Strong Contrastive Loss
$$\mathcal{L}_{\text{ctr}}^w (Z_{i,j}^w, P^s) = -\log \left( \frac{\exp \left( \text{sim}(Z_{i,j}^w, P_{\hat{Y}_{i,j}^w}^s)/\tau \right)}{\sum_{c=1}^{\mathcal{C}} \exp \left( \text{sim}(Z_{i,j}^w, P_c^s)/\tau \right)} \right)$$
Strong-to-Weak Contrastive Loss
$$\mathcal{L}_{\text{ctr}}^s (Z_{i,j}^s, P^w) = -\log \left( \frac{\exp \left( \text{sim}(Z_{i,j}^s, P_{\hat{Y}_{i,j}^s}^w)/\tau \right)}{\sum_{c=1}^{\mathcal{C}} \exp \left( \text{sim}(Z_{i,j}^s, P_c^w)/\tau \right)} \right)$$
Combined Loss
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{sup}} + \lambda(t) \cdot (\alpha \mathcal{L}_{\text{con}} + \beta \mathcal{L}_{\text{ctr}})$$
where \(\lambda(t)\) is a time-dependent weighting function, \(\alpha\) and \(\beta\) are hyperparameters, and \(\tau\) is the temperature parameter.
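The exact form of \(\lambda(t)\) is not spelled out here; a common choice in mean-teacher style frameworks is a sigmoid-shaped ramp-up, sketched below under that assumption (the schedule length and hyperparameter values are illustrative):

```python
import math

def ramp_up_weight(t: int, t_max: int, lam_max: float = 1.0) -> float:
    """Sigmoid-shaped ramp-up (as in Laine & Aila, 2017): rises smoothly
    from 0 to lam_max over the first t_max iterations, down-weighting the
    noisy unsupervised losses early in training."""
    if t >= t_max:
        return lam_max
    phase = 1.0 - t / t_max
    return lam_max * math.exp(-5.0 * phase * phase)

# Total objective, following the formula above:
# loss_total = loss_sup + ramp_up_weight(t, t_max) * (alpha * loss_con + beta * loss_ctr)
```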
Results

Experimental Results

State-of-the-Art

Achieves new SOTA results across multiple medical segmentation benchmarks

Consistent Improvements

Demonstrates significant improvements over baseline methods

Plug-and-Play

Easy to integrate into other frameworks to enhance performance

Synapse Multi-Organ Dataset

Comprehensive evaluation on 8 abdominal organs (Aorta, Gallbladder, Left/Right Kidney, Liver, Pancreas, Spleen, Stomach) with CT scans under 5% and 10% labeled settings.

5% Labeled Data Performance

Synapse DSC ↑ 61.87% +3.90%
Synapse ASD ↓ 18.81mm Best

10% Labeled Data Performance

Synapse DSC ↑ 66.59% +6.12%
Synapse ASD ↓ 23.52mm Competitive

Detailed Comparison on Synapse Dataset

| Method | Labeled | DSC ↑ | ASD ↓ | Aorta | GB | KL | KR | Liver | PC | SP | SM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UNet (Baseline) | 1 (5%) | 32.16 | 41.57 | 50.72 | 18.78 | 27.72 | 28.88 | 73.78 | 9.60 | 36.85 | 10.98 |
| UA-MT | 1 (5%) | 34.36 | 45.59 | 64.42 | 21.88 | 26.46 | 29.91 | 73.31 | 11.75 | 34.22 | 12.89 |
| SS-Net | 1 (5%) | 36.95 | 31.20 | 65.44 | 24.51 | 39.16 | 19.78 | 86.40 | 2.04 | 51.19 | 7.11 |
| BCP | 1 (5%) | 43.09 | 62.03 | 65.31 | 12.44 | 48.18 | 46.25 | 82.79 | 8.51 | 62.34 | 18.88 |
| MCSC | 1 (5%) | 34.00 | -- | 50.90 | 13.00 | 17.60 | 54.60 | 64.30 | 5.50 | 43.10 | 23.50 |
| SCP-Net | 1 (5%) | 36.38 | 40.00 | 58.54 | 21.06 | 34.92 | 35.84 | 76.74 | 11.43 | 39.48 | 13.04 |
| ABD | 1 (5%) | 53.37 | 18.89 | 74.87 | 2.73 | 73.51 | 70.69 | 89.56 | 17.10 | 63.77 | 34.69 |
| AD-MT | 1 (5%) | 57.97 | 22.82 | 76.96 | 26.90 | 80.36 | 72.67 | 84.34 | 17.33 | 76.89 | 28.29 |
| Ours (MT) | 1 (5%) | 60.00 | 19.73 | 79.53 | 40.52 | 68.59 | 63.05 | 89.80 | 20.70 | 78.38 | 39.00 |
| Ours (BCP) | 1 (5%) | 61.87 | 18.81 | 80.07 | 38.78 | 68.30 | 63.12 | 91.60 | 24.54 | 80.65 | 47.93 |
| UNet (Baseline) | 4 (10%) | 37.75 | 43.54 | 58.06 | 23.07 | 39.33 | 27.94 | 81.00 | 10.48 | 40.71 | 21.39 |
| UA-MT | 4 (10%) | 39.94 | 42.24 | 69.96 | 26.67 | 44.14 | 27.30 | 83.46 | 4.36 | 44.72 | 18.90 |
| SS-Net | 4 (10%) | 41.46 | 24.50 | 77.94 | 31.54 | 54.01 | 12.02 | 86.25 | 3.36 | 56.98 | 9.55 |
| BCP | 4 (10%) | 51.77 | 43.11 | 45.08 | 31.34 | 62.03 | 58.05 | 91.07 | 16.34 | 78.82 | 31.44 |
| MCSC | 4 (10%) | 61.10 | -- | 73.90 | 26.40 | 69.90 | 72.70 | 90.00 | 33.20 | 79.40 | 43.00 |
| SCP-Net | 4 (10%) | 45.07 | 25.62 | 63.44 | 25.81 | 57.95 | 33.07 | 89.64 | 14.70 | 52.99 | 22.99 |
| ABD | 4 (10%) | 59.61 | 9.47 | 78.64 | 4.93 | 73.27 | 68.91 | 90.23 | 36.97 | 76.44 | 47.48 |
| AD-MT | 4 (10%) | 60.47 | 20.63 | 69.55 | 28.72 | 76.44 | 74.56 | 89.18 | 30.53 | 80.86 | 33.89 |
| Ours (MT) | 4 (10%) | 65.81 | 15.88 | 80.77 | 45.42 | 76.39 | 76.45 | 90.10 | 30.14 | 75.84 | 51.36 |
| Ours (BCP) | 4 (10%) | 66.59 | 23.52 | 81.79 | 40.09 | 76.80 | 68.64 | 90.43 | 38.80 | 75.79 | 60.39 |

DSC: Dice Similarity Coefficient (%); ASD: Average Surface Distance (mm). GB: Gallbladder, KL: Left Kidney, KR: Right Kidney, PC: Pancreas, SP: Spleen, SM: Stomach.
Figure: DSC bar charts. 5% labeled: UNet 32.16%, UA-MT 34.36%, BCP 43.09%, ABD 53.37%, AD-MT 57.97%, Ours 61.87%. 10% labeled: UNet 37.75%, UA-MT 39.94%, BCP 51.77%, ABD 59.61%, AD-MT 60.47%, Ours 66.59%.

ACDC Cardiac Dataset

Evaluation on 3 cardiac structures (Left Ventricle, Myocardium, Right Ventricle) with MRI scans under semi-supervised settings using 3 cases (5%) and 7 cases (10%) as labeled data.

5% Labeled Data Performance

ACDC DSC ↑ 88.60% +1.43%
ACDC ASD ↓ 0.61mm Best

10% Labeled Data Performance

ACDC DSC ↑ 89.80% +0.99%
ACDC ASD ↓ 0.95mm 2nd Best

Detailed Comparison on ACDC Dataset

| Method | Labeled | DSC ↑ | ASD ↓ | LV | Myo | RV |
|---|---|---|---|---|---|---|
| UNet (Baseline) | 3 (5%) | 78.51 | 2.47 | 88.81 | 77.14 | 69.58 |
| UA-MT | 3 (5%) | 56.58 | 8.04 | 59.10 | 41.40 | 69.24 |
| SS-Net | 3 (5%) | 65.82 | 2.28 | 65.66 | 57.55 | 74.26 |
| BCP | 3 (5%) | 87.59 | 0.68 | 85.97 | 85.71 | 91.09 |
| MCSC | 3 (5%) | 73.60 | -- | 79.20 | 70.00 | 71.70 |
| SCP-Net | 3 (5%) | 70.93 | 6.55 | 70.78 | 61.89 | 80.11 |
| ABD | 3 (5%) | 85.15 | 2.81 | 82.71 | 84.46 | 88.29 |
| AD-MT | 3 (5%) | 88.22 | 0.94 | 86.13 | 86.68 | 91.85 |
| Ours (MT) | 3 (5%) | 87.31 | 1.10 | 84.90 | 85.60 | 91.42 |
| Ours (BCP) | 3 (5%) | 88.60 | 0.61 | 85.51 | 87.36 | 92.93 |
| UNet (Baseline) | 7 (10%) | 80.75 | 2.75 | 80.74 | 74.84 | 86.67 |
| UA-MT | 7 (10%) | 80.60 | 2.91 | 78.79 | 77.87 | 85.13 |
| SS-Net | 7 (10%) | 86.78 | 1.40 | 84.34 | 85.36 | 90.64 |
| BCP | 7 (10%) | 88.84 | 1.17 | 86.54 | 87.68 | 92.30 |
| MCSC | 7 (10%) | 89.40 | -- | 93.60 | 87.60 | 87.10 |
| SCP-Net | 7 (10%) | 88.17 | 1.67 | 85.39 | 87.74 | 91.39 |
| ABD | 7 (10%) | 87.62 | 3.08 | 84.50 | 88.19 | 90.17 |
| AD-MT | 7 (10%) | 89.07 | 0.82 | 87.11 | 88.12 | 91.96 |
| Ours (MT) | 7 (10%) | 89.27 | 0.63 | 86.63 | 88.35 | 92.82 |
| Ours (BCP) | 7 (10%) | 89.80 | 0.95 | 86.10 | 90.36 | 92.94 |

DSC: Dice Similarity Coefficient (%); ASD: Average Surface Distance (mm). LV: Left Ventricle, Myo: Myocardium, RV: Right Ventricle.
Figure: DSC bar charts. 5% labeled: UNet 78.51%, UA-MT 84.25%, BCP 86.83%, ABD 87.09%, AD-MT 87.17%, Ours 88.60%. 10% labeled: UNet 84.31%, UA-MT 87.55%, BCP 88.12%, ABD 88.81%, AD-MT 88.81%, Ours 89.80%.

Qualitative Results

Visual Results
Visual segmentation results showing a superior balance between under-segmentation and over-segmentation.
Heatmap Analysis
Foreground class probabilities are visualized as heatmaps, with red-to-white transitions indicating increasing prediction confidence.

Ablation Studies

Component Analysis

Mean Teacher (Baseline) 55.78% Base
+ Style-Guided Distribution Blending (SDB) 62.11% +6.33%
+ Prototype-Based Cross-Contrast Learning 61.28% +5.50%
Ours (MT) 65.81% Best

Memory Bank Analysis

K = 32 64.76% +0.53%
K = 64 63.87% -0.36%
K = 128 65.81% Best
K = 256 64.78% +0.55%

Style Mixing Coefficient

Binomial 65.64% +9.86%
Beta 64.33% +8.55%
Uniform 65.81% Best

Cross-Contrast Strategy

Weak → Strong 62.35% +6.57%
Strong → Weak 62.14% +6.36%
Bidirectional (Ours) 65.81% Best
Conclusion

Key Takeaways

We propose a novel Style-Aware Blending and Prototype-Based Cross-Contrast Consistency Learning Framework for semi-supervised medical image segmentation that addresses two critical limitations in existing approaches: distribution mismatch and incomplete utilization of supervisory information.

Key Achievements

Novel Insights

Identified distribution mismatch in statistical moments and underutilization of strong predictions

Innovative Solutions

Style-guided blending and prototype-based cross-contrast for effective information exchange

Superior Performance

State-of-the-art results across multiple benchmarks with significant improvements
