Commit 2ea96e6

Speedup normalize (#62)
* Added tests for batch normalize
* Speedup in normalize
* Speedup in Normalize
1 parent 2d76ea5 commit 2ea96e6

6 files changed: +333 -95 lines changed

.cursor/rules/optimizations.mdc

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# Performance Optimization Guidelines

## OpenCV LUT (Look-Up Table) Operations

### Critical: Maintain float32 dtype for LUT arrays

When using `cv2.LUT()` with floating-point lookup tables, **always ensure the LUT array is float32, not float64**. A float64 LUT can roughly double the runtime on large arrays such as videos (see Performance Impact below).

#### The Problem

OpenCV's statistics functions (`cv2.meanStdDev`, etc.) return float64 values. When these are used in LUT creation:

```python
# BAD: Creates float64 LUT due to numpy promotion
mean, std = cv2.meanStdDev(img)  # Returns float64 arrays of shape (channels, 1)
lut = (np.arange(0, 256, dtype=np.float32) - mean[0, 0]) / std[0, 0]
# lut.dtype is now float64! (NumPy 2 promotion rules; older NumPy versions may keep float32)
```

This causes:
1. `cv2.LUT()` returns a float64 array (slower operations)
2. Subsequent operations (clip, etc.) are slower on float64
3. Often requires `.astype(np.float32)` on the large result array (very expensive)

#### The Solution

Cast the LUT array to float32 after creation:

```python
# GOOD: Maintain float32 throughout
lut = ((np.arange(0, 256, dtype=np.float32) - mean[0, 0]) / std[0, 0]).astype(np.float32)
# lut.dtype is float32
```
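
The promotion described above can be checked directly. The following is a minimal sketch that uses hypothetical stand-in values (`mean_val`, `std_val`) instead of a real `cv2.meanStdDev` call. The dtype of the uncast LUT depends on the installed NumPy version, so the sketch only prints it and relies on the explicit cast for the float32 guarantee.

```python
import numpy as np

# Hypothetical stand-ins for mean[0, 0] and std[0, 0]; cv2.meanStdDev
# returns float64, so NumPy float64 scalars are used here.
mean_val = np.float64(127.5)
std_val = np.float64(64.0)

base = np.arange(0, 256, dtype=np.float32)

# Uncast LUT: its dtype depends on NumPy's promotion rules for float64 scalars.
lut_uncast = (base - mean_val) / std_val

# Cast LUT: always float32, and the cast touches only 256 elements.
lut_f32 = lut_uncast.astype(np.float32)

print("uncast LUT dtype:", lut_uncast.dtype)
print("cast LUT dtype:  ", lut_f32.dtype)
```

Casting the 256-element table is cheap regardless of which dtype the uncast expression produces, which is why the explicit `.astype(np.float32)` is a safe default.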

#### Performance Impact

For a video of shape (200, 256, 256, 3):
- With float64 LUT: ~111 ms (includes expensive astype on result)
- With float32 LUT: ~55 ms (2x faster!)
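
A comparison of this kind could be timed along the following lines. This is only a sketch under stated assumptions: NumPy fancy indexing (`lut[frames]`) stands in for `cv2.LUT`, the input is a small hypothetical array rather than the (200, 256, 256, 3) video, and the helper names are illustrative. Absolute timings will therefore not match the figures above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Small hypothetical input so the sketch runs quickly.
video = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)

# Python float constants keep the table in float32.
base = (np.arange(0, 256, dtype=np.float32) - 127.5) / 64.0
lut_f64 = base.astype(np.float64)
lut_f32 = base.astype(np.float32)


def apply_f64_then_cast(frames: np.ndarray) -> np.ndarray:
    # float64 lookup followed by a cast of the full-size result
    return lut_f64[frames].astype(np.float32)


def apply_f32(frames: np.ndarray) -> np.ndarray:
    # float32 lookup; no cast of the large result is needed
    return lut_f32[frames]


runs = 20
t64 = timeit.timeit(lambda: apply_f64_then_cast(video), number=runs)
t32 = timeit.timeit(lambda: apply_f32(video), number=runs)
print(f"float64 LUT + astype: {t64 / runs * 1e3:.3f} ms per call")
print(f"float32 LUT:          {t32 / runs * 1e3:.3f} ms per call")
```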

#### Best Practices

1. **For uint8 images**: LUT operations are extremely fast and should be preferred when possible
2. **Always check dtype**: Use `.astype(np.float32)` on small LUT arrays (256 elements) rather than large result arrays
3. **Avoid dtype promotion**: Be aware that numpy operations with mixed dtypes promote to the higher precision type

#### Example: Image Normalization with LUT

```python
import cv2
import numpy as np


def normalize_with_lut(img: np.ndarray) -> np.ndarray:
    """Fast normalization for uint8 images using LUT"""
    # Get statistics (meanStdDev returns one row per channel; [0, 0] is the first channel)
    mean, std = cv2.meanStdDev(img)
    mean = mean[0, 0]
    std = std[0, 0] + 1e-4  # small epsilon avoids division by zero

    # Create LUT - ensure float32!
    lut = ((np.arange(0, 256, dtype=np.float32) - mean) / std).astype(np.float32)

    # Apply LUT - result will be float32
    return cv2.LUT(img, lut).clip(-20, 20)
```

This optimization applies to any LUT-based operation where floating-point precision is needed.
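
The LUT approach works because a uint8 image has only 256 possible input values, so normalizing the table once gives the same result as normalizing every pixel. A minimal NumPy-only sketch of that equivalence follows. It assumes a hypothetical single-channel image, uses `img.mean()` and `img.std()` as stand-ins for the `cv2.meanStdDev` statistics, and uses `lut[img]` as a stand-in for `cv2.LUT(img, lut)`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical single-channel uint8 image.
img = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

# NumPy stand-ins for the cv2.meanStdDev statistics (assumption).
mean = float(img.mean())
std = float(img.std()) + 1e-4

# LUT path: normalize the 256 possible values once, then look them up.
lut = ((np.arange(0, 256, dtype=np.float32) - mean) / std).astype(np.float32)
via_lut = lut[img].clip(-20, 20)  # lut[img] stands in for cv2.LUT(img, lut)

# Direct path: normalize every pixel individually.
direct = ((img.astype(np.float32) - mean) / std).clip(-20, 20)

print("dtype:", via_lut.dtype)
print("matches direct normalization:", np.allclose(via_lut, direct))
```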

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ repos:
     rev: v0.12.0
     hooks:
       # Run the linter.
-      - id: ruff
+      - id: ruff-check
         exclude: '__pycache__/'
         args: [ --fix ]
       # Run the formatter.
