[ALF] Add index adjustment for UTF-8 indices #8056

Open
emilypgoogle wants to merge 7 commits into main from ep/citation-utf-16

@emilypgoogle emilypgoogle commented Apr 21, 2026

The AI Logic endpoints return citation indices measured in UTF-8 bytes, but Java and Kotlin strings are indexed in UTF-16 code units. As a result, the provided indices drift away from the actual content whenever non-ASCII characters precede them, and they can even point out of bounds, which makes them very difficult to use.

This applies to both citation metadata and grounding metadata.

Tests were added to validate that grounding indices match the provided content exactly, and all existing tests currently pass. Further tests force grounding with citations using strings whose lengths differ between UTF-8 and UTF-16 (the degree symbol and accented letters are multi-byte characters in UTF-8).

Manual testing was done to confirm that indices are accurate in practice. Exhaustive testing is difficult because citations are comparatively rare, whereas grounding is easy to force and test.
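As a rough illustration of the mismatch described above, the following sketch uses only the Kotlin standard library. The sample string and the helper name `utf8OffsetOf` are hypothetical and are not taken from this PR.

```kotlin
// Hypothetical sample: '°' and 'ü' each take 2 bytes in UTF-8 but 1 UTF-16 code unit.
val sample = "25°C in Zürich"

/**
 * Number of UTF-8 bytes that precede the given UTF-16 index in [text].
 * Assumes [utf16Index] does not split a surrogate pair.
 */
fun utf8OffsetOf(text: String, utf16Index: Int): Int =
    text.substring(0, utf16Index).toByteArray(Charsets.UTF_8).size

// sample.length is 14, while its UTF-8 encoding is 16 bytes long.
// The 'r' in "Zürich" sits at UTF-16 index 10 but at UTF-8 byte offset 12,
// so a byte-based end index of 16 would be out of bounds for the Kotlin String.
```

Every multi-byte character before a position widens the gap between the two index systems, which is why a byte-based index cannot be passed to `substring` directly.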

@gemini-code-assist

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.


github-actions Bot commented Apr 21, 2026

📝 PRs merging into main branch

Our main branch should always be in a releasable state. If you are working on a larger change, or if you don't want this change to see the light of day just yet, consider using a feature branch first, and only merge into the main branch when the code is complete and ready to be released.

@emilypgoogle

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements UTF-8 to UTF-16 index conversion for citations and grounding metadata, ensuring accurate text segment mapping in the public API. Key changes include the introduction of a convertUtf8IndexToUtf16 utility, updates to the Candidate model hierarchy, and the addition of comprehensive unit and instrumentation tests. Feedback highlights a potential out-of-bounds error when handling malformed surrogate pairs, performance inefficiencies due to redundant content scans, and compatibility issues with android.util.Log in non-Android test environments.
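The conversion and the surrogate-pair concern raised in this review can be sketched as follows. This is a minimal illustration on a plain `String`, not the PR's `convertUtf8IndexToUtf16` (which takes a `Content`). The name `utf8ToUtf16Index`, the rounding behavior, and the choice to count an unpaired surrogate as 3 bytes are all assumptions made for the example.

```kotlin
/**
 * Hypothetical sketch: maps a UTF-8 byte offset to a UTF-16 index in [text].
 * If [utf8Index] falls inside a multi-byte character, the result points just
 * past that character; offsets beyond the end are clamped to text.length.
 */
fun utf8ToUtf16Index(text: String, utf8Index: Int): Int {
    var bytes = 0 // UTF-8 bytes consumed so far
    var i = 0     // current UTF-16 index
    while (i < text.length && bytes < utf8Index) {
        val c = text[i]
        // The bounds check on i + 1 keeps a trailing high surrogate from
        // triggering an out-of-bounds read.
        val isPair = c.isHighSurrogate() &&
            i + 1 < text.length &&
            text[i + 1].isLowSurrogate()
        when {
            isPair -> { bytes += 4; i += 2 }         // supplementary code point
            c.code < 0x80 -> { bytes += 1; i += 1 }  // ASCII
            c.code < 0x800 -> { bytes += 2; i += 1 } // two-byte sequence
            // Rest of the BMP. Assumption: an unpaired surrogate is also
            // counted as 3 bytes, like U+FFFD would be.
            else -> { bytes += 3; i += 1 }
        }
    }
    return i
}
```

How the backend actually counts malformed input is not stated in this thread, so the unpaired-surrogate branch should be treated as a placeholder policy.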

Comment thread ai-logic/firebase-ai/src/main/kotlin/com/google/firebase/ai/type/Candidate.kt Outdated
Comment thread ai-logic/firebase-ai/src/main/kotlin/com/google/firebase/ai/type/Candidate.kt Outdated
@emilypgoogle emilypgoogle marked this pull request as ready for review May 5, 2026 21:40
val c = text[i].code
progress +=
when {
c < 0x80 -> 1 // ASCII

Contributor Author

Great catch. We should be able to use them for some cases, but at a quick glance there aren't methods for all of these cases. Would you prefer that we mix and match direct checking and method usage, or remain consistent?

Contributor

I think using the ones in Char and adding our own as extension functions can help with clarity. Without some decent understanding of encoding it's not easy to follow the code (IMO).
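One way the suggestion above might look is sketched here, combining the existing `Char.isSurrogate()` from the Kotlin standard library with small named extension functions. The names `isUtf8SingleByte`, `isUtf8TwoByte`, `utf8ByteShare`, and `utf8Length` are hypothetical and do not come from this PR.

```kotlin
/** True when this code unit encodes as a single UTF-8 byte (ASCII). */
fun Char.isUtf8SingleByte(): Boolean = code < 0x80

/** True when this code unit encodes as a two-byte UTF-8 sequence. */
fun Char.isUtf8TwoByte(): Boolean = code in 0x80..0x7FF

/**
 * UTF-8 bytes contributed by one UTF-16 code unit. Each half of a surrogate
 * pair counts as 2, so a full pair totals the 4 bytes of its code point.
 * Simplification: an unpaired surrogate is also counted as 2 here.
 */
fun Char.utf8ByteShare(): Int = when {
    isUtf8SingleByte() -> 1
    isUtf8TwoByte() -> 2
    isSurrogate() -> 2
    else -> 3 // remaining BMP characters
}

/** Total UTF-8 length of a string, built from the per-unit shares. */
fun String.utf8Length(): Int = fold(0) { total, c -> total + c.utf8ByteShare() }
```

Naming each range keeps the hexadecimal boundaries in one place, so the `when` reads as a description of the encoding rather than a list of magic numbers.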

Comment on lines +59 to +60
segment.startIndex shouldBeGreaterThanOrEqual 0
segment.startIndex shouldBeLessThanOrEqual segment.endIndex

Comment on lines +54 to +55
segment.partIndex shouldBeGreaterThanOrEqual 0
segment.partIndex shouldBeLessThan candidate.content.parts.size
Contributor

same as below

segment.partIndex shouldBeGreaterThanOrEqual 0
segment.partIndex shouldBeLessThan candidate.content.parts.size
val part = candidate.content.parts[segment.partIndex]
part::class shouldBe TextPart::class
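The bounds checks in these test fragments can be summarized in plain Kotlin as below. This is only a sketch: the `Segment` data class and `isValidSegment` are hypothetical stand-ins, parts are modeled as plain strings instead of `TextPart`, and the final check that `endIndex` stays within the text length is an assumption about what the full test asserts.

```kotlin
// Hypothetical stand-in for the SDK's segment type, keeping only the
// three index fields that appear in the assertions above.
data class Segment(val partIndex: Int, val startIndex: Int, val endIndex: Int)

/** Returns true when [segment] addresses a valid range inside [parts]. */
fun isValidSegment(segment: Segment, parts: List<String>): Boolean {
    // partIndex must address an existing part.
    if (segment.partIndex !in parts.indices) return false
    val text = parts[segment.partIndex]
    return segment.startIndex >= 0 &&
        segment.startIndex <= segment.endIndex &&
        // Assumption: indices are UTF-16 based, so endIndex may not exceed
        // the Kotlin String length.
        segment.endIndex <= text.length
}
```

For a part such as "25°C", an end index of 4 passes, while the UTF-8 byte length of 5 would be rejected as out of bounds.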

}
}

internal fun convertUtf8IndexToUtf16(content: Content, originalIndex: Int): Int {
Contributor

Add a comment to help understand the logic below

2 participants