Enhance advisory grouping 2172#2228

Open
shivamshrma09 wants to merge 1 commit into aboutcode-org:main from
shivamshrma09:enhance-advisory-grouping-2172
@shivamshrma09

Summary

Fixes #2172

This PR enhances the advisory grouping mechanism by adding two new
heuristics to the existing ComputeToDo pipeline in
compute_advisory_todo.py.

Changes

New issue types in ISSUE_TYPE_CHOICES (models.py)

  • POTENTIALLY_RELATED_BY_ALIASES — for advisories from different
    datasources that share the same alias
  • SIMILAR_SUMMARIES — for advisories with near-identical summaries

New pipeline steps (compute_advisory_todo.py)

relate_advisories_by_aliases()

  • Iterates over all AdvisoryAlias objects
  • If 2+ advisories from different datasources share the same alias,
    creates a POTENTIALLY_RELATED_BY_ALIASES todo
  • Stores the shared alias in issue_detail
  • Advisories from the same datasource are skipped, since sharing an
    alias within one datasource is expected and not worth flagging
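
The grouping rule above can be sketched in plain Python, outside of Django. This is only an illustration of the heuristic, not the code in this PR: the `Advisory` dataclass, the `group_cross_datasource_by_alias` helper, and the sample identifiers are all hypothetical stand-ins for the real models and pipeline step.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    # Hypothetical stand-in for the real advisory model.
    advisory_id: str
    datasource: str
    aliases: tuple


def group_cross_datasource_by_alias(advisories):
    """Return {alias: [advisories]} for aliases seen in 2+ datasources."""
    by_alias = defaultdict(list)
    for advisory in advisories:
        for alias in advisory.aliases:
            by_alias[alias].append(advisory)

    related = {}
    for alias, group in by_alias.items():
        datasources = {advisory.datasource for advisory in group}
        # Groups confined to a single datasource are skipped.
        if len(datasources) >= 2:
            related[alias] = group
    return related


# Made-up sample data: ALIAS-1 spans two datasources, ALIAS-2 only one.
advisories = [
    Advisory("A-1", "source_x", ("ALIAS-1",)),
    Advisory("B-1", "source_y", ("ALIAS-1",)),
    Advisory("A-2", "source_x", ("ALIAS-2",)),
    Advisory("A-3", "source_x", ("ALIAS-2",)),
]
related = group_cross_datasource_by_alias(advisories)
```

In this sketch only `ALIAS-1` survives the filter, and each surviving key is the value that would be recorded as the shared alias in issue_detail.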

detect_similar_summaries()

  • For each alias group spanning 2+ datasources, compares every
    cross-datasource advisory pair using difflib.SequenceMatcher
  • If similarity ratio >= 0.8, creates a SIMILAR_SUMMARIES todo
  • Stores similarity_score, datasource_a, datasource_b in
    issue_detail
  • Advisories with empty summaries are excluded
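
As a rough sketch of the comparison described above, assuming advisories are reduced to `(advisory_id, datasource, summary)` tuples: the `similar_summary_pairs` helper and the sample data are hypothetical, and only `difflib.SequenceMatcher`, the 0.8 threshold, and the issue_detail keys come from this PR description.

```python
from difflib import SequenceMatcher
from itertools import combinations

SUMMARY_SIMILARITY_THRESHOLD = 0.8


def similar_summary_pairs(advisories, threshold=SUMMARY_SIMILARITY_THRESHOLD):
    """Find cross-datasource pairs whose summaries are near-identical.

    `advisories` is a list of (advisory_id, datasource, summary) tuples,
    a hypothetical simplification of the real model.
    """
    # Advisories with empty or whitespace-only summaries are excluded.
    candidates = [a for a in advisories if a[2] and a[2].strip()]

    results = []
    for (id_a, ds_a, sum_a), (id_b, ds_b, sum_b) in combinations(candidates, 2):
        if ds_a == ds_b:
            continue  # only cross-datasource pairs are compared
        score = SequenceMatcher(None, sum_a, sum_b).ratio()
        if score >= threshold:
            results.append(
                {
                    "advisories": (id_a, id_b),
                    "similarity_score": score,
                    "datasource_a": ds_a,
                    "datasource_b": ds_b,
                }
            )
    return results


# Made-up sample data for one alias group.
group = [
    ("A-1", "source_x", "Heap overflow in the image parser"),
    ("B-1", "source_y", "Heap overflow in the image parser"),
    ("C-1", "source_z", "1234567890"),
    ("D-1", "source_w", ""),
]
results = similar_summary_pairs(group)
```

Pairwise comparison is quadratic in the group size, which stays manageable here because only advisories sharing an alias are compared with each other.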

Migration (0117_add_alias_and_summary_issue_types.py)

  • Updates issue_type field choices on both AdvisoryToDo and
    AdvisoryToDoV2

Tests (test_compute_advisory_todo_v2.py) — 5 new tests

  • test_relate_advisories_by_aliases_creates_todo
  • test_relate_advisories_by_aliases_same_datasource_not_flagged
  • test_detect_similar_summaries_creates_todo
  • test_detect_similar_summaries_below_threshold_not_flagged
  • test_detect_similar_summaries_empty_summary_skipped

Approach

Both new steps follow the same pattern as the existing
detect_conflicting_advisories() step: they use LoopProgress,
bulk_create_with_m2m(), and advisories_checksum() from the
existing codebase.

The SUMMARY_SIMILARITY_THRESHOLD = 0.8 constant is defined at the
top of the file for easy adjustment.

@shivamshrma09 force-pushed the enhance-advisory-grouping-2172 branch from b3a2b09 to cfdec1c on March 20, 2026 09:25
…advisory grouping

Signed-off-by: shivamshrma09 <shivamsharma27107@gmail.com>
Development

Successfully merging this pull request may close these issues.

Enhance advisory grouping mechanism