
feat(evals): add skia category with 20 react-native-skia evals #377

Open
andriicallstack wants to merge 1 commit into main from feat/skia-evals


@andriicallstack
Collaborator

Summary

Adds a new skia eval category with 20 evaluations covering the core @shopify/react-native-skia API surface.

Each eval includes a focused prompt, a set of atomic requirements, and a reference implementation. The category also includes a README.md with a best-practice inventory and an eval traceability matrix.
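To make that structure concrete, here is a minimal sketch of one eval case, a scoring rule, and a traceability check. Everything in it is hypothetical and for illustration only: the EvalCase field names, the scoreEval and uncoveredPractices helpers, the example requirements and path, and in particular the assumption that an eval's score is the fraction of its atomic requirements the judge marks as satisfied.

```typescript
// Hypothetical shape of one eval case; field names are illustrative only.
interface EvalCase {
  id: string; // e.g. "05-linear-gradient"
  prompt: string; // focused task handed to the solver
  requirements: string[]; // atomic requirements, each judged pass/fail
  referenceImplementation: string; // hypothetical path to the reference solution
}

// Assumed scoring rule: fraction of atomic requirements the judge marks as met.
function scoreEval(evalCase: EvalCase, verdicts: boolean[]): number {
  if (verdicts.length !== evalCase.requirements.length) {
    throw new Error(
      `expected ${evalCase.requirements.length} verdicts, got ${verdicts.length}`,
    );
  }
  if (verdicts.length === 0) return 0;
  const met = verdicts.filter((v) => v).length;
  return met / verdicts.length;
}

// Hypothetical traceability check: best practices that no eval covers.
// The matrix maps each best practice to the ids of the evals exercising it.
function uncoveredPractices(matrix: Record<string, string[]>): string[] {
  return Object.keys(matrix).filter((practice) => matrix[practice].length === 0);
}

// Hypothetical example case; requirements and path are made up.
const linearGradientEval: EvalCase = {
  id: "05-linear-gradient",
  prompt: "Fill the canvas with a diagonal linear gradient.",
  requirements: [
    "renders inside a Canvas",
    "uses LinearGradient as the shape's paint child",
    "builds start and end points with vec",
    "supplies at least two colors",
  ],
  referenceImplementation: "evals/skia/05-linear-gradient/reference.tsx", // hypothetical
};

console.log(scoreEval(linearGradientEval, [true, true, true, false])); // 0.75
console.log(uncoveredPractices({ "practice-a": ["07"], "practice-b": [] }));
```

Keeping the requirements atomic is what makes a per-requirement pass/fail verdict, and therefore a fractional score, meaningful.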

Evals added

| # | Eval | Focus |
|---|------|-------|
| 01 | canvas-fill-background | Canvas, Fill, useCanvasSize |
| 02 | shape-primitives | Rect, Circle, RoundedRect, Line |
| 03 | path-drawing | Path, Skia.Path.Make() |
| 04 | paint-stroke-fill | Paint, stroke vs fill |
| 05 | linear-gradient | LinearGradient, vec |
| 06 | radial-gradient | RadialGradient |
| 07 | image-display | Image, useImage |
| 08 | text-rendering | Text, matchFont |
| 09 | blur-filter | Blur |
| 10 | color-matrix-filter | ColorMatrix |
| 11 | reanimated-basic-animation | useSharedValue, withRepeat, withTiming |
| 12 | derived-value-animation | useDerivedValue |
| 13 | animated-color-interpolation | interpolateColors |
| 14 | gesture-pan | GestureDetector, Gesture.Pan |
| 15 | transforms | transform, Group |
| 16 | clip-rect-and-path | ClipRect, ClipPath |
| 17 | blend-mode | blendMode |
| 18 | svg-path-rendering | Skia.Path.MakeFromSVGString |
| 19 | runtime-effect-shader | Skia.RuntimeEffect.Make, SkSL (GLSL-like shader language) |
| 20 | canvas-snapshot | useCanvasRef, makeImageSnapshot |
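Eval 10 targets ColorMatrix, whose matrix prop takes 20 numbers that Skia reads as a 4x5 row-major matrix: each row computes one output channel (R, G, B, A) as a weighted sum of the input channels plus a constant offset in the fifth column. The sketch below reproduces that arithmetic on the CPU for a single color, which can help when checking matrix values in a reference implementation. It is not how Skia evaluates the filter; it assumes channels (and offsets) normalized to [0, 1], clamps its own results, and applyColorMatrix and GRAYSCALE are hypothetical names.

```typescript
// 20 numbers, row-major: four rows (R', G', B', A') of five coefficients each.
type ColorMatrix = number[];

// One color with channels normalized to [0, 1] (an assumption of this sketch).
type RGBA = [number, number, number, number];

function applyColorMatrix(m: ColorMatrix, [r, g, b, a]: RGBA): RGBA {
  if (m.length !== 20) throw new Error("color matrix must have 20 entries");
  const out: number[] = [];
  for (let row = 0; row < 4; row++) {
    const k = row * 5;
    // Weighted sum of the input channels plus the row's constant offset.
    const v = m[k] * r + m[k + 1] * g + m[k + 2] * b + m[k + 3] * a + m[k + 4];
    out.push(Math.min(1, Math.max(0, v))); // this sketch clamps to [0, 1]
  }
  return [out[0], out[1], out[2], out[3]];
}

// Luminance-based grayscale using Rec. 709 weights; alpha passes through.
const GRAYSCALE: ColorMatrix = [
  0.2126, 0.7152, 0.0722, 0, 0,
  0.2126, 0.7152, 0.0722, 0, 0,
  0.2126, 0.7152, 0.0722, 0, 0,
  0, 0, 0, 1, 0,
];

console.log(applyColorMatrix(GRAYSCALE, [1, 0, 0, 1])); // [0.2126, 0.2126, 0.2126, 1]
```

Because the weights in each color row sum to 1, white stays white and pure red maps to a dark gray, which is a quick sanity check for any grayscale matrix an eval solution passes to ColorMatrix.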

Runner fixes

  • Apply extractJsonMiddleware in both the solver and judge clients, so that JSON responses Claude wraps in markdown code fences are still parsed instead of failing with AI_NoObjectGeneratedError
  • Improve ensureOpencodeServerStarted to reuse an already-running OpenCode server instead of attempting a duplicate startup, and forward ANTHROPIC_API_KEY to newly spawned servers
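The ideas behind both runner fixes can be illustrated with a small sketch. extractJsonText, parseModelJson, planServerStartup and StartupPlan are hypothetical helpers written for illustration; they are not the PR's extractJsonMiddleware or ensureOpencodeServerStarted, and the sketch leaves out the real health check and process spawning. The first part strips a markdown code fence before JSON parsing; the second separates the reuse-or-spawn decision from I/O and forwards ANTHROPIC_API_KEY from the parent environment when a new server is needed.

```typescript
// Built at runtime so this snippet can itself sit inside a Markdown code fence.
const FENCE = "`".repeat(3);

// Hypothetical helper: strip a Markdown code fence (optionally tagged, e.g. "json")
// from a model reply; unfenced replies are returned trimmed but otherwise unchanged.
function extractJsonText(reply: string): string {
  const text = reply.trim();
  const isFenced =
    text.length >= 2 * FENCE.length && text.startsWith(FENCE) && text.endsWith(FENCE);
  if (!isFenced) return text;
  const firstNewline = text.indexOf("\n");
  if (firstNewline === -1) return text; // single-line fence: left for JSON.parse to reject
  // Drop the opening fence line (and its language tag) and the closing fence.
  return text.slice(firstNewline + 1, text.length - FENCE.length).trim();
}

function parseModelJson(reply: string): unknown {
  return JSON.parse(extractJsonText(reply));
}

// Hypothetical startup decision: reuse a running server, otherwise spawn one.
type StartupPlan =
  | { action: "reuse" }
  | { action: "spawn"; env: Record<string, string> };

function planServerStartup(
  alreadyRunning: boolean, // e.g. the result of a health-check request (not shown)
  parentEnv: Record<string, string | undefined>,
): StartupPlan {
  if (alreadyRunning) return { action: "reuse" };
  // Only the API key is shown; a real spawn would usually inherit PATH and more.
  const env: Record<string, string> = {};
  const apiKey = parentEnv["ANTHROPIC_API_KEY"];
  if (apiKey !== undefined) env["ANTHROPIC_API_KEY"] = apiKey;
  return { action: "spawn", env };
}

const fencedReply = FENCE + "json\n{\"score\": 0.75}\n" + FENCE;
console.log(parseModelJson(fencedReply)); // { score: 0.75 }
```

Keeping the decision a pure function makes the "already running" branch easy to test without starting a real server.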

Baseline scores

Solver and judge: anthropic/claude-sonnet-4-5

| Score | Evals |
|-------|-------|
| 100% | 01, 02, 04, 06, 07, 09, 10, 11, 12, 13, 16, 17 |
| 75% | 03, 05, 08, 14, 15, 18, 20 |
| 50% | 19 |

Average: ~89% (88.75%, the unweighted mean of the 20 per-eval scores)
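As a quick cross-check, the mean can be recomputed from the score table. This assumes every eval carries equal weight, which is an assumption about how the average was derived; scoreGroups and averageScore are illustrative names.

```typescript
// Score groups copied from the baseline table: [score as a fraction, eval ids].
const scoreGroups: Array<[number, string[]]> = [
  [1.0, ["01", "02", "04", "06", "07", "09", "10", "11", "12", "13", "16", "17"]],
  [0.75, ["03", "05", "08", "14", "15", "18", "20"]],
  [0.5, ["19"]],
];

// Unweighted mean over all evals: sum(score * group size) / total evals.
function averageScore(groups: Array<[number, string[]]>): number {
  let total = 0;
  let count = 0;
  for (const [score, ids] of groups) {
    total += score * ids.length;
    count += ids.length;
  }
  return count === 0 ? 0 : total / count;
}

console.log(averageScore(scoreGroups)); // 0.8875
```

That is (12 × 1.0 + 7 × 0.75 + 1 × 0.5) / 20 = 17.75 / 20 = 0.8875.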
