Fix Chinese Whispers clustering to process all nodes each iteration by SamareshSingh · Pull Request #3133 · davisking/dlib

SamareshSingh · 2026-02-02T20:26:21Z

Summary

Fixed a critical bug in the Chinese Whispers clustering algorithm where vectors within the distance threshold were not being grouped together correctly.

The Problem

The algorithm was supposed to perform num_iterations complete passes over all nodes in the graph, but instead it was using random node selection. This meant some nodes could be skipped entirely during an iteration, breaking the label propagation required for correct clustering.

For example, vectors with a distance of 0.371814 (below the 0.38 threshold) were ending up in different clusters when they should have been grouped together.
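To illustrate why random selection can leave nodes untouched, here is a minimal sketch (not dlib's code; `distinct_nodes_visited` is a hypothetical helper, and drawing indices with replacement is an assumption about what "random node selection" means here). If `n` indices are drawn uniformly with replacement from `n` nodes, the expected fraction of distinct nodes visited is 1 - (1 - 1/n)^n, which approaches 1 - 1/e (about 63%) as `n` grows, so roughly a third of the nodes would be skipped in any one "pass".

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw `n` node indices uniformly at random with
// replacement and count how many distinct nodes were touched.
std::size_t distinct_nodes_visited(std::size_t n, std::mt19937& rng)
{
    std::vector<bool> seen(n, false);
    std::size_t distinct = 0;
    for (std::size_t step = 0; step < n; ++step)
    {
        // Random selection: the same index may come up more than once.
        const std::size_t idx = rng() % n;
        if (!seen[idx])
        {
            seen[idx] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

Calling this with `n = 1000` should report a count well below 1000, since repeated draws crowd out some nodes.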

The Solution

Changed the algorithm to guarantee that every node is processed exactly once per iteration, using a Fisher-Yates shuffle:

- Each iteration shuffles all node indices
- Then processes each node sequentially in the shuffled order
- This ensures complete label propagation through the graph while maintaining randomization
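The steps above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the PR's diff and not dlib's `chinese_whispers` implementation: the `adjacency_list` type, the function name `shuffled_pass_clustering`, and the tie-breaking rule (the smallest label wins on equal weight) are all hypothetical. It relies on `std::shuffle`, which common standard library implementations realize as a Fisher-Yates shuffle.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical graph type: neighbors[i] lists (neighbor index, edge weight).
using adjacency_list = std::vector<std::vector<std::pair<std::size_t, double>>>;

// Sketch of label propagation with one shuffled full pass per iteration.
// Returns one label per node; nodes sharing a label form a cluster.
std::vector<std::size_t> shuffled_pass_clustering(
    const adjacency_list& neighbors,
    std::size_t num_iterations,
    std::mt19937& rng)
{
    const std::size_t n = neighbors.size();

    // Every node starts in its own cluster.
    std::vector<std::size_t> labels(n);
    std::iota(labels.begin(), labels.end(), std::size_t{0});

    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});

    for (std::size_t iter = 0; iter < num_iterations; ++iter)
    {
        // New random visiting order; each node appears exactly once.
        std::shuffle(order.begin(), order.end(), rng);

        for (const std::size_t idx : order)
        {
            // Sum edge weights per neighboring label.
            std::map<std::size_t, double> weight_per_label;
            for (const auto& edge : neighbors[idx])
                weight_per_label[labels[edge.first]] += edge.second;

            // Adopt the heaviest label; isolated nodes keep their own.
            double best_weight = 0.0;
            for (const auto& entry : weight_per_label)
            {
                if (entry.second > best_weight)
                {
                    best_weight = entry.second;
                    labels[idx] = entry.first;
                }
            }
        }
    }
    return labels;
}
```

Because labels are updated in place, a node visited later in a pass already sees the labels its neighbors adopted earlier in the same pass. On a small graph made of disconnected components, each component should settle on a single shared label, while an isolated node keeps its initial one.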

- Changed algorithm from random node selection to guaranteed sequential processing using Fisher-Yates shuffle per iteration
- Each iteration now shuffles all node indices and processes each sequentially, ensuring complete label propagation

davisking · 2026-02-05T00:51:28Z

Thanks, but this isn't what the algorithm is supposed to be doing. I.e. the way it's written in dlib isn't a bug. What's in this PR is a different but related algorithm.

davisking · 2026-02-05T00:55:02Z

That said, I agree this is what the original Chinese Whispers paper said to do. I forget at this point why the version in dlib deviates from that paper, but what's in dlib works really well for the applications it's used for, so I don't want to go changing it. A change might not be as good for existing users.

dlib-issue-bot · 2026-03-12T08:00:06Z

Warning: this issue has been inactive for 35 days and will be automatically closed on 2026-03-22 if there is no further activity.

If you are waiting for a response but haven't received one it's possible your question is somehow inappropriate. E.g. it is off topic, you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's official compilation instructions, dlib's API documentation, or a Google search.

SamareshSingh wants to merge 1 commit into davisking:master from SamareshSingh:fix/chinese-whispers-issue-2829

4 participants
