| 1 | +# Performance Optimization Guidelines |
| 2 | + |
| 3 | +## OpenCV LUT (Look-Up Table) Operations |
| 4 | + |
| 5 | +### Critical: Maintain float32 dtype for LUT arrays |
| 6 | + |
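| 7 | +When using `cv2.LUT()` with a floating-point lookup table, **always ensure the LUT array is float32, not float64**. The output of `cv2.LUT()` takes the dtype of the table, so a float64 table forces every downstream operation on the result to work on float64 data, which is costly for large arrays such as videos. |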
| 8 | + |
| 9 | +#### The Problem |
| 10 | + |
| 11 | +OpenCV's statistics functions (`cv2.meanStdDev`, etc.) return float64 values. When these are used in LUT creation: |
| 12 | + |
| 13 | +```python |
| 14 | +# BAD: float64 statistics promote the LUT to float64 |
| 15 | +mean, std = cv2.meanStdDev(img)  # Both are float64 arrays |
| 16 | +lut = (np.arange(0, 256, dtype=np.float32) - mean[0, 0]) / std[0, 0] |
| 17 | +# lut.dtype is now float64, not float32 |
| 18 | +``` |
| 19 | + |
| 20 | +This causes: |
| 21 | +1. `cv2.LUT()` returns a float64 array (slower operations) |
| 22 | +2. Subsequent operations (clip, etc.) are slower on float64 |
| 23 | +3. Often requires `.astype(np.float32)` on the large result array (very expensive) |
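The promotion can be reproduced with NumPy alone. The sketch below uses hand-made 1x1 float64 arrays as stand-ins for the statistics returned by `cv2.meanStdDev`; the values 127.5 and 60.0 are made up for illustration. One caveat, stated as an assumption about NumPy's promotion rules: when the operand is a NumPy float64 *scalar* such as `mean[0, 0]`, NumPy 2 should promote to float64, while NumPy 1.x value-based casting may keep float32. Array operands, as used here, promote in both.

```python
import numpy as np

# Stand-ins for cv2.meanStdDev output: float64 arrays of shape (1, 1).
# The values are made up for illustration.
mean = np.array([[127.5]], dtype=np.float64)
std = np.array([[60.0]], dtype=np.float64)

base = np.arange(0, 256, dtype=np.float32)

# Mixing a float32 array with float64 array operands promotes to float64.
lut_promoted = (base - mean) / std
print(lut_promoted.dtype)  # float64

# Casting the small table back is cheap: it has only 256 entries.
lut_f32 = lut_promoted.astype(np.float32)
print(lut_f32.dtype)  # float32

# np.result_type reports the promotion rule without doing any arithmetic.
print(np.result_type(np.float32, np.float64))  # float64
```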
| 24 | + |
| 25 | +#### The Solution |
| 26 | + |
| 27 | +Cast the LUT array to float32 after creation: |
| 28 | + |
| 29 | +```python |
| 30 | +# GOOD: Maintain float32 throughout |
| 31 | +lut = ((np.arange(0, 256, dtype=np.float32) - mean[0, 0]) / std[0, 0]).astype(np.float32) |
| 32 | +# lut.dtype is float32 |
| 33 | +``` |
| 34 | + |
| 35 | +#### Performance Impact |
| 36 | + |
| 37 | +For a video of shape (200, 256, 256, 3): |
| 38 | +- With float64 LUT: ~111 ms (includes the expensive `astype` on the result) |
| 39 | +- With float32 LUT: ~55 ms (about 2x faster) |
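Part of this gap is plausibly memory traffic, since a float64 result occupies twice as many bytes as a float32 one. The following back-of-the-envelope calculation is an illustration, not a benchmark. It computes both footprints for the video shape above without allocating the arrays.

```python
import math

import numpy as np

shape = (200, 256, 256, 3)  # video shape from the measurement above
n_elements = math.prod(shape)

# Bytes needed to hold the LUT result at each precision.
bytes_f32 = n_elements * np.dtype(np.float32).itemsize
bytes_f64 = n_elements * np.dtype(np.float64).itemsize

print(n_elements)         # 39321600
print(bytes_f32 / 2**20)  # 150.0 (MiB)
print(bytes_f64 / 2**20)  # 300.0 (MiB)
```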
| 40 | + |
| 41 | +#### Best Practices |
| 42 | + |
| 43 | +1. **For uint8 images**: A LUT needs only 256 precomputed entries, so prefer it over per-pixel floating-point arithmetic when possible |
| 44 | +2. **Always check dtype**: Call `.astype(np.float32)` on the small LUT array (256 elements), not on the large result array |
| 45 | +3. **Avoid dtype promotion**: NumPy operations on mixed dtypes promote to the higher-precision type, so float32 combined with float64 yields float64 |
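One way to follow points 2 and 3 is to convert the statistics to float32 before doing any arithmetic, so that no promotion occurs, and to assert the dtype of the finished table. The helper name `build_norm_lut` and its parameters are hypothetical and are not part of OpenCV or NumPy.

```python
import numpy as np


def build_norm_lut(mean: float, std: float, eps: float = 1e-4) -> np.ndarray:
    """Build a 256-entry float32 normalization table (hypothetical helper)."""
    # Convert the scalars first: float32 combined with float32 stays float32.
    mean32 = np.float32(mean)
    std32 = np.float32(std) + np.float32(eps)
    lut = (np.arange(0, 256, dtype=np.float32) - mean32) / std32
    # Guard against accidental promotion creeping back in.
    assert lut.dtype == np.float32, lut.dtype
    return lut


# Works the same whether the inputs are Python floats or np.float64 scalars.
lut = build_norm_lut(np.float64(127.5), np.float64(60.0))
print(lut.dtype, lut.shape)  # float32 (256,)
```

The trade-off is that the arithmetic runs in float32 instead of being computed in float64 and then rounded, so individual entries may differ in the last bits from the cast-after-creation approach.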
| 46 | + |
| 47 | +#### Example: Image Normalization with LUT |
| 48 | + |
| 49 | +```python |
| 50 | +def normalize_with_lut(img: np.ndarray) -> np.ndarray: |
| 51 | +    """Fast normalization for uint8 images using a LUT.""" |
| 52 | +    # Get statistics: float64 arrays with one row per channel |
| 53 | +    mean, std = cv2.meanStdDev(img) |
| 54 | +    mean = mean[0, 0]  # first channel only |
| 55 | +    std = std[0, 0] + 1e-4  # small epsilon avoids division by zero |
| 56 | + |
| 57 | +    # Create LUT - ensure float32! |
| 58 | +    lut = ((np.arange(0, 256, dtype=np.float32) - mean) / std).astype(np.float32) |
| 59 | + |
| 60 | +    # Apply LUT - result will be float32 |
| 61 | +    return cv2.LUT(img, lut).clip(-20, 20) |
| 62 | +``` |
| 63 | + |
| 64 | +This optimization applies to any LUT-based operation that produces floating-point output. |
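A table works for uint8 input because only 256 distinct pixel values exist, so any per-pixel function can be precomputed once. The sketch below checks this with NumPy fancy indexing (`lut[img]`) as a stand-in for `cv2.LUT`, so it runs without OpenCV. The random test image and the statistics computed with NumPy are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Statistics cast to float32 up front so nothing is promoted later.
mean = np.float32(img.mean())
std = np.float32(img.std()) + np.float32(1e-4)

# Table lookup: 256 float operations, then one gather per pixel.
lut = (np.arange(0, 256, dtype=np.float32) - mean) / std
via_lut = lut[img]

# Direct computation: one float subtraction and division per pixel.
direct = (img.astype(np.float32) - mean) / std

print(via_lut.dtype)                 # float32
print(np.allclose(via_lut, direct))  # True
```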