@AliAlimohammadi
Contributor

Description

This PR adds a complete implementation of Huffman encoding to the compression module. Huffman coding is a widely-used lossless data compression algorithm that assigns variable-length codes to characters based on their frequency of occurrence.

Implementation Details

  • Single file: All functionality in src/compression/huffman_encoding.rs (~555 lines)
  • Complete algorithm: Frequency analysis, binary tree construction using min-heap, code generation, encoding/decoding
  • CLI capability: Can be compiled as a standalone tool with rustc
  • Well-tested: 12 comprehensive unit tests + 3 passing doc tests
  • Fully documented: Module and function-level documentation with examples
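
To make the four stages listed above concrete, here is a minimal sketch of frequency analysis, min-heap tree construction, code generation, and encoding/decoding. It is not the code from this PR: the names `Node`, `build_tree`, `assign_codes`, `encode`, and `decode` are hypothetical, and the sketch assumes the encoded output is a `String` of '0'/'1' characters and that a one-symbol input gets the code "0".

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Hypothetical tree type: a leaf holds a character, an internal node two children.
/// Deriving `Ord` gives the heap a deterministic tie-break between equal counts.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Node {
    Leaf(char),
    Internal(Box<Node>, Box<Node>),
}

/// Count characters, then repeatedly merge the two least frequent subtrees.
fn build_tree(text: &str) -> Option<Node> {
    let mut freq: HashMap<char, usize> = HashMap::new();
    for c in text.chars() {
        *freq.entry(c).or_insert(0) += 1;
    }
    // `BinaryHeap` is a max-heap; `Reverse` flips the order so the smallest count pops first.
    let mut heap: BinaryHeap<Reverse<(usize, Node)>> = freq
        .into_iter()
        .map(|(c, n)| Reverse((n, Node::Leaf(c))))
        .collect();
    while heap.len() > 1 {
        let Reverse((n1, a)) = heap.pop()?;
        let Reverse((n2, b)) = heap.pop()?;
        heap.push(Reverse((n1 + n2, Node::Internal(Box::new(a), Box::new(b)))));
    }
    heap.pop().map(|Reverse((_, root))| root)
}

/// Walk the tree, appending '0' for a left branch and '1' for a right branch.
fn assign_codes(node: &Node, prefix: String, codes: &mut HashMap<char, String>) {
    match node {
        Node::Leaf(c) => {
            // Assumption: a lone leaf at the root gets "0" rather than an empty code.
            let code = if prefix.is_empty() { "0".to_string() } else { prefix };
            codes.insert(*c, code);
        }
        Node::Internal(left, right) => {
            assign_codes(left, format!("{prefix}0"), codes);
            assign_codes(right, format!("{prefix}1"), codes);
        }
    }
}

/// Return the bit string and the code table; both are empty for empty input.
fn encode(text: &str) -> (String, HashMap<char, String>) {
    let mut codes = HashMap::new();
    if let Some(root) = build_tree(text) {
        assign_codes(&root, String::new(), &mut codes);
    }
    let bits: String = text.chars().map(|c| codes[&c].as_str()).collect();
    (bits, codes)
}

/// Decode by growing a candidate code bit by bit; because the codes are
/// prefix-free, the first table hit is the only possible match.
fn decode(bits: &str, codes: &HashMap<char, String>) -> String {
    let inverse: HashMap<&str, char> = codes.iter().map(|(c, s)| (s.as_str(), *c)).collect();
    let mut out = String::new();
    let mut current = String::new();
    for bit in bits.chars() {
        current.push(bit);
        if let Some(&c) = inverse.get(current.as_str()) {
            out.push(c);
            current.clear();
        }
    }
    out
}
```

Calling `encode("hello world")` and passing the result to `decode` should return the original text. A production implementation would more likely pack bits into bytes than store one character per bit.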

Features

✅ Huffman encoding and decoding functions
✅ File processing with detailed output display
✅ Unicode support (handles all UTF-8 text)
✅ Command-line tool functionality (via main() function)
✅ Comprehensive error handling and edge cases
✅ Zero unsafe code
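
As an illustration of the Unicode point above, the sketch below counts frequencies per `char` (a Unicode scalar value) instead of per byte, which is one plausible way for the alphabet to cover all UTF-8 text. The function name `char_frequencies` is hypothetical and not taken from this PR.

```rust
use std::collections::HashMap;

/// Count occurrences of each `char`, so a multi-byte character such as an
/// emoji is treated as a single symbol and not as several separate bytes.
fn char_frequencies(text: &str) -> HashMap<char, usize> {
    let mut freq = HashMap::new();
    for c in text.chars() {
        *freq.entry(c).or_insert(0) += 1;
    }
    freq
}
```

Note that `char` is a scalar value, not a grapheme cluster, so a character built from combining marks would still be counted as several symbols under this approach.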

Algorithm Complexity

  • Time: O(n + m log m) where n = text length, m = unique characters
  • Space: O(m) for the frequency map, tree, and code table, not counting the input text or the encoded output
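
One way to arrive at the time bound is to add up the stages. This is a sketch under two assumptions that the PR does not state: the heap is built in a single O(m) pass, and emitting one code per input character is counted as O(1).

```latex
\begin{aligned}
T(n, m) &= \underbrace{O(n)}_{\text{count frequencies}}
         + \underbrace{O(m)}_{\text{build heap of } m \text{ leaves}}
         + \underbrace{(m - 1)\cdot O(\log m)}_{\text{merges: two pops, one push each}}
         + \underbrace{O(n)}_{\text{emit one code per character}} \\
        &= O(n + m \log m)
\end{aligned}
```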

Public API

The module exports two main functions via pub use:

use the_algorithms_rust::compression::{huffman_encode, huffman_decode};

let text = "hello world";
let (encoded, codes) = huffman_encode(text);
let decoded = huffman_decode(&encoded, &codes);
assert_eq!(text, decoded);

Testing

All tests pass with zero warnings:

Unit Tests (12/12 passing)

test compression::huffman_encoding::tests::test_all_unique_characters ... ok
test compression::huffman_encoding::tests::test_build_frequency_map ... ok
test compression::huffman_encoding::tests::test_compression_ratio ... ok
test compression::huffman_encoding::tests::test_demonstrate_empty_file ... ok
test compression::huffman_encoding::tests::test_demonstrate_huffman_from_file ... ok
test compression::huffman_encoding::tests::test_empty_string ... ok
test compression::huffman_encoding::tests::test_encode_decode_roundtrip ... ok
test compression::huffman_encoding::tests::test_frequency_based_encoding ... ok
test compression::huffman_encoding::tests::test_simple_string ... ok
test compression::huffman_encoding::tests::test_single_character ... ok
test compression::huffman_encoding::tests::test_unicode_characters ... ok

Documentation Tests (3/3 passing)

All doc examples compile and run correctly.

Code Quality

  • ✅ Zero clippy warnings (cargo clippy)
  • ✅ Properly formatted (cargo fmt)
  • ✅ No unsafe code
  • ✅ Comprehensive error handling

Test Coverage

  • Empty string handling
  • Single character edge case
  • Round-trip encoding/decoding verification
  • Unicode character support
  • Compression ratio calculation
  • Frequency-based code length verification
  • File processing functions

Files Changed

  • src/compression/huffman_encoding.rs - New implementation file
  • src/compression/mod.rs - Added module declaration and public exports

Usage Example

use the_algorithms_rust::compression::{huffman_encode, huffman_decode};

fn main() {
    let text = "The quick brown fox jumps over the lazy dog";
    
    // Encode
    let (encoded, codes) = huffman_encode(text);
    println!("Original: {} chars", text.chars().count());
    println!("Encoded: {} bits", encoded.len());
    
    // Decode
    let decoded = huffman_decode(&encoded, &codes);
    assert_eq!(text, decoded);
}
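
The two sizes printed above can be combined into a compression ratio. The helper below is a hypothetical sketch, not part of this PR's API: it assumes the encoded length is a bit count (as the usage example prints it) and measures the original as 8 bits per UTF-8 byte.

```rust
/// Ratio of encoded size to original size; values below 1.0 mean the
/// encoding is smaller. Returns `None` for empty input to avoid dividing by zero.
fn compression_ratio(original: &str, encoded_bits: usize) -> Option<f64> {
    let original_bits = original.len() * 8; // `len()` is the UTF-8 byte length
    if original_bits == 0 {
        return None;
    }
    Some(encoded_bits as f64 / original_bits as f64)
}
```

For example, four one-byte characters encoded in four bits give 4 / 32 = 0.125. The ratio ignores the cost of storing the code table, which a real file format would also have to pay.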

Bonus: Standalone CLI Tool

The file can also be compiled as a standalone binary:

rustc src/compression/huffman_encoding.rs -o huffman
./huffman input.txt

This displays character codes, compression statistics, and verification, which is useful for educational purposes.

Design Decisions

  1. Type-safe tree structure: Uses Rust enum instead of separate classes
  2. Efficient heap operations: BinaryHeap provides O(log n) push and pop
  3. Clean public API: Re-exports in mod.rs for user convenience
  4. Single-file design: Easier to review and maintain
  5. Dual-purpose: Library module + optional standalone tool
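
The first two design decisions can be illustrated with a short sketch. The `HuffmanNode` type and both functions are hypothetical and may differ from what the PR defines. The sketch shows an enum whose `match` arms must cover both variants, and `std::cmp::Reverse` turning the standard library's max-heap `BinaryHeap` into a min-heap.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical node type: one enum with two variants in place of a class hierarchy.
enum HuffmanNode {
    Leaf { symbol: char, weight: usize },
    Internal { weight: usize, left: Box<HuffmanNode>, right: Box<HuffmanNode> },
}

impl HuffmanNode {
    /// Both variants carry a weight, so a single or-pattern reads it.
    fn weight(&self) -> usize {
        match self {
            HuffmanNode::Leaf { weight, .. } | HuffmanNode::Internal { weight, .. } => *weight,
        }
    }

    /// Collect leaf symbols left to right; the compiler rejects a `match`
    /// that forgets either variant.
    fn symbols(&self) -> Vec<char> {
        match self {
            HuffmanNode::Leaf { symbol, .. } => vec![*symbol],
            HuffmanNode::Internal { left, right, .. } => {
                let mut all = left.symbols();
                all.extend(right.symbols());
                all
            }
        }
    }
}

/// `BinaryHeap` pops the largest element first; storing `Reverse(w)` makes
/// it pop the smallest weight first, which is the order Huffman merging needs.
fn pop_ascending(weights: &[usize]) -> Vec<usize> {
    let mut heap: BinaryHeap<Reverse<usize>> = weights.iter().copied().map(Reverse).collect();
    let mut out = Vec::with_capacity(weights.len());
    while let Some(Reverse(w)) = heap.pop() {
        out.push(w);
    }
    out
}
```

An alternative to `Reverse` is a hand-written `Ord` implementation that compares weights in the opposite direction; either approach keeps the heap operations at O(log n).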

Checklist

  • Code follows repository style guidelines
  • All tests pass locally (15/15)
  • No clippy warnings
  • Code is properly formatted
  • Documentation included with examples
  • Handles edge cases (empty strings, single characters, unicode)
  • Time/space complexity documented
  • No unsafe code used


@codecov-commenter

codecov-commenter commented Dec 30, 2025

Codecov Report

❌ Patch coverage is 93.10345% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.73%. Comparing base (43299ac) to head (69f7ff2).

Files with missing lines              | Patch % | Lines
src/compression/huffman_encoding.rs   | 93.10%  | 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #981      +/-   ##
==========================================
- Coverage   95.77%   95.73%   -0.04%     
==========================================
  Files         351      352       +1     
  Lines       22934    23166     +232     
==========================================
+ Hits        21964    22178     +214     
- Misses        970      988      +18     


@AliAlimohammadi
Contributor Author

@siriak, this is ready to be merged.

@siriak siriak merged commit ea93ffd into TheAlgorithms:master Dec 30, 2025
7 checks passed
@AliAlimohammadi AliAlimohammadi deleted the add-huffman-encoding branch December 30, 2025 13:57