Skip to content

Conversation

@lohanidamodar
Copy link
Contributor

@lohanidamodar lohanidamodar commented Dec 9, 2025

Usage

  • Automatically construct ID using md5 hash

  • Automatically aggregate metric in DB library using upsertDocumentWithIncrease and in ClickHouse using summingMergeTree engine and final select to make sure the aggregation happens

  • Database adapter (utopia-php/database)

  • ClickHouse adapter

Summary by CodeRabbit

  • New Features

    • Pluggable usage tracking library: adapters (ClickHouse, Database), SQL base, metric model, query builder, Usage facade, batch & counter logging, period/date-range queries, aggregations, multi-tenant/namespace/shared-tables support.
  • Documentation

    • Added README, CONTRIBUTING guide, Code of Conduct, and LICENSE.
  • Tests

    • Extensive unit and integration tests covering adapters, models, query parsing, and core usage flows.
  • Chores

    • CI workflows, Docker/Docker Compose support, PHPUnit config, composer scripts/requirements (PHP ≥8.0), and coding style preset.

✏️ Tip: You can customize this high-level summary in your review settings.

- Database adapter
- ClickHouse adapter
@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Warning

Rate limit exceeded

@lohanidamodar has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 10 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds a Utopia Usage subsystem with public API types: Adapter (abstract), SQL base adapter, Usage façade, Metric, Query, and concrete adapters Database and ClickHouse (multi-tenant, namespace, shared-tables, batch/counter logging, querying, aggregation, purge). Introduces extensive PHPUnit tests and a Usage test trait. Adds CI workflows (Tests, Linter, CodeQL), Dockerfile, docker-compose (mariadb, clickhouse, usage), phpunit.xml, composer.json updates, pint.json, CONTRIBUTING.md, CODE_OF_CONDUCT.md, LICENSE.md, and README.md documentation.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Feat: Usage library' directly summarizes the main change—adding a comprehensive Usage library with multiple adapters and components.
Docstring Coverage ✅ Passed Docstring coverage is 85.57% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lohanidamodar lohanidamodar requested review from eldadfux and shimonewman and removed request for eldadfux and shimonewman December 9, 2025 03:11
@lohanidamodar lohanidamodar self-assigned this Dec 9, 2025
Copy link

@shimonewman shimonewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

*
* @throws Exception
*/
public function log(string $metric, int $value, string $period = '1h', array $tags = []): bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use constants for the periods

'tags' => json_encode($tags),
];

if ($this->sharedTables) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we aware for the concept of shared vs not shared tables in this scope? We can completely abstract this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract this where? for database library, Database already abstracts it, but for Clickhouse we don't have a separate library, either we create a clickhouse client that handles these stuff, that can be shared here and in audits as well, or we duplicate this logic both here in audit library in clickhouse adapter

$tableName = $this->getTableName();
$escapedDatabaseAndTable = $this->escapeIdentifier($this->database) . '.' . $this->escapeIdentifier($tableName);

$sql = "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we protecting against SQL injections here?

foreach ($metrics as $metricData) {
$period = $metricData['period'] ?? '1h';

if (! isset(self::PERIODS[$period])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add better validation here of the structure we expect. Might even introduce a Metric class

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 737-750: The buildInsertColumns() method appends $attribute['$id']
which is currently typed as mixed; update the loop in buildInsertColumns() to
assert and handle the type (e.g., use is_string($attribute['$id']) and either
cast to string before appending or skip/throw on invalid types) so the returned
array contains only strings and matches the declared return type; reference the
getAttributes() iteration and the '$id' key in the attribute to locate where to
add the type check and handling.
- Around line 706-717: The loop uses $attribute['$id'] without guaranteeing it's
a string before calling getColumnType; cast and validate it first: retrieve the
raw id from getAttributes() into a local variable, assert/cast it to string
(e.g., (string)$rawId) and optionally handle missing/empty ids (skip or throw)
before calling getColumnType($attrId), and then use that typed $attrId for
building $attrKey, $queryParams and $valuePlaceholders so getColumnType always
receives a string.
- Around line 850-863: The CodeQL/static-analysis warnings come from untyped
array accesses in logBatchCounter (and similarly in logBatch); after
validateMetricsBatch has run, add explicit type assertions/casts when extracting
$metricData elements in logBatchCounter (cast $metric to string, $value to
int/float as appropriate, $tags to array, $period to string/int, and $tenant to
string|null when using resolveTenantFromMetric) before passing them into
buildInsertValuesForMetric; update the extraction lines in logBatchCounter (and
the analogous lines in logBatch) to perform these casts so static analysis
recognizes the narrowed types while leaving validation logic unchanged.
🧹 Nitpick comments (4)
tests/Usage/UsageBase.php (2)

302-330: Remove unused variables $aggCount and $counterCount.

These variables are assigned but never used in assertions. Either remove them or add assertions to verify the count behavior matches expectations.

Proposed fix
         $aggSum = $this->usage->sumByPeriod('agg-vs-counter', '1h');
-        $aggCount = $this->usage->countByPeriod('agg-vs-counter', '1h');
 
         // Clear for counter test
         $this->usage->purge(DateTime::now());
 
         // Log same counter metric 3 times (last one wins)
         $this->assertTrue($this->usage->logCounter('counter-vs-agg', 10, '1h', []));
         $this->assertTrue($this->usage->logCounter('counter-vs-agg', 20, '1h', []));
         $this->assertTrue($this->usage->logCounter('counter-vs-agg', 30, '1h', []));
 
         $counterSum = $this->usage->sumByPeriod('counter-vs-agg', '1h');
-        $counterCount = $this->usage->countByPeriod('counter-vs-agg', '1h');

173-195: Assertions always pass - consider strengthening.

Both testLessThanEqualQuery and testGreaterThanEqualQuery use assertGreaterThanOrEqual(0, count($results)) which always passes. If the intent is only to verify no exceptions are thrown, consider using assertIsArray($results) instead for clarity.

src/Usage/Adapter/Database.php (2)

93-140: $batchSize parameter is unused - consider implementing or documenting.

The logBatch method ignores the $batchSize parameter and processes all metrics in a single upsert operation. While this may be acceptable for the Database adapter, it differs from the ClickHouse adapter which respects the batch size for chunked inserts. Consider either:

  1. Implementing batch chunking for consistency
  2. Adding a comment explaining why batching isn't needed for this adapter

388-407: Consider using batch deletion for better performance.

The purge method deletes documents one-by-one in a loop. For large datasets, this could be slow. Consider using a batch deletion approach if the underlying database supports it.

Comment on lines 706 to 717

// Add attributes dynamically - must include ALL attributes in schema order
foreach ($this->getAttributes() as $attribute) {
$attrId = $attribute['$id'];

$attrKey = $attrId . ($paramCounter > 0 ? '_' . $paramCounter : '');
$type = $this->getColumnType($attrId);

// Use the value from map, or null if not present
$queryParams[$attrKey] = $attributeMap[$attrId] ?? null;
$valuePlaceholders[] = '{' . $attrKey . ':' . $type . '}';
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add type assertion for $attrId before calling getColumnType.

Proposed fix
         // Add attributes dynamically - must include ALL attributes in schema order
         foreach ($this->getAttributes() as $attribute) {
-            $attrId = $attribute['$id'];
+            $attrId = $attribute['$id'];
+            if (!is_string($attrId)) {
+                continue;
+            }

             $attrKey = $attrId . ($paramCounter > 0 ? '_' . $paramCounter : '');
             $type = $this->getColumnType($attrId);
🧰 Tools
🪛 GitHub Actions: CodeQL

[error] 712-712: Parameter #1 $id of method Utopia\Usage\Adapter\ClickHouse::getColumnType() expects string, mixed given.

🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 706 - 717, The loop uses
$attribute['$id'] without guaranteeing it's a string before calling
getColumnType; cast and validate it first: retrieve the raw id from
getAttributes() into a local variable, assert/cast it to string (e.g.,
(string)$rawId) and optionally handle missing/empty ids (skip or throw) before
calling getColumnType($attrId), and then use that typed $attrId for building
$attrKey, $queryParams and $valuePlaceholders so getColumnType always receives a
string.

Comment on lines 737 to 750
private function buildInsertColumns(): array
{
$insertColumns = ['id'];

foreach ($this->getAttributes() as $attribute) {
$insertColumns[] = $attribute['$id'];
}

if ($this->sharedTables) {
$insertColumns[] = 'tenant';
}

return $insertColumns;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add type assertion for attribute ID to fix return type.

$attribute['$id'] is typed as mixed, causing the return type mismatch. Add a type check.

Proposed fix
     private function buildInsertColumns(): array
     {
         $insertColumns = ['id'];

         foreach ($this->getAttributes() as $attribute) {
-            $insertColumns[] = $attribute['$id'];
+            $id = $attribute['$id'];
+            $insertColumns[] = is_string($id) ? $id : '';
         }

         if ($this->sharedTables) {
             $insertColumns[] = 'tenant';
         }

         return $insertColumns;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private function buildInsertColumns(): array
{
$insertColumns = ['id'];
foreach ($this->getAttributes() as $attribute) {
$insertColumns[] = $attribute['$id'];
}
if ($this->sharedTables) {
$insertColumns[] = 'tenant';
}
return $insertColumns;
}
private function buildInsertColumns(): array
{
$insertColumns = ['id'];
foreach ($this->getAttributes() as $attribute) {
$id = $attribute['$id'];
$insertColumns[] = is_string($id) ? $id : '';
}
if ($this->sharedTables) {
$insertColumns[] = 'tenant';
}
return $insertColumns;
}
🧰 Tools
🪛 GitHub Actions: CodeQL

[error] 749-749: Method Utopia\Usage\Adapter\ClickHouse::buildInsertColumns() should return array but returns array<int, mixed>.

🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 737 - 750, The
buildInsertColumns() method appends $attribute['$id'] which is currently typed
as mixed; update the loop in buildInsertColumns() to assert and handle the type
(e.g., use is_string($attribute['$id']) and either cast to string before
appending or skip/throw on invalid types) so the returned array contains only
strings and matches the declared return type; reference the getAttributes()
iteration and the '$id' key in the attribute to locate where to add the type
check and handling.

Comment on lines 850 to 863
foreach ($metricsBatch as $metricData) {
$period = $metricData['period'] ?? Usage::PERIOD_1H;
$metric = $metricData['metric'];
$value = $metricData['value'];
$tags = (array) ($metricData['tags'] ?? []);

// Build values for this metric
$tenant = $this->sharedTables ? $this->resolveTenantFromMetric($metricData) : null;
$result = $this->buildInsertValuesForMetric($metric, $value, $period, $tags, $tenant, $paramCounter);

$queryParams = array_merge($queryParams, $result['queryParams']);
$valueClauses[] = '(' . implode(', ', $result['valuePlaceholders']) . ')';
$paramCounter++;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add type assertions to satisfy static analysis after validation.

The batch processing in logBatchCounter extracts values from $metricData without type assertions, causing CodeQL errors. Since validateMetricsBatch already validates the types, add explicit casts to narrow the types for static analysis.

Proposed fix
             foreach ($metricsBatch as $metricData) {
                 $period = $metricData['period'] ?? Usage::PERIOD_1H;
                 $metric = $metricData['metric'];
                 $value = $metricData['value'];
                 $tags = (array) ($metricData['tags'] ?? []);
+                
+                // Type assertions (validated by validateMetricsBatch)
+                $metric = is_string($metric) ? $metric : '';
+                $value = is_int($value) ? $value : 0;
+                $period = is_string($period) ? $period : Usage::PERIOD_1H;

                 // Build values for this metric
                 $tenant = $this->sharedTables ? $this->resolveTenantFromMetric($metricData) : null;
                 $result = $this->buildInsertValuesForMetric($metric, $value, $period, $tags, $tenant, $paramCounter);

Also applies to logBatch at lines 966-974.

🧰 Tools
🪛 GitHub Actions: CodeQL

[error] 858-858: Parameter #1 $metric of method Utopia\Usage\Adapter\ClickHouse::buildInsertValuesForMetric() expects string, mixed given.


[error] 858-858: Parameter #2 $value of method Utopia\Usage\Adapter\ClickHouse::buildInsertValuesForMetric() expects int, mixed given.


[error] 858-858: Parameter #3 $period of method Utopia\Usage\Adapter\ClickHouse::buildInsertValuesForMetric() expects string, mixed given.


[error] 858-858: Parameter #4 $tags of method Utopia\Usage\Adapter\ClickHouse::buildInsertValuesForMetric() expects array<string, mixed>, array given.

🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 850 - 863, The
CodeQL/static-analysis warnings come from untyped array accesses in
logBatchCounter (and similarly in logBatch); after validateMetricsBatch has run,
add explicit type assertions/casts when extracting $metricData elements in
logBatchCounter (cast $metric to string, $value to int/float as appropriate,
$tags to array, $period to string/int, and $tenant to string|null when using
resolveTenantFromMetric) before passing them into buildInsertValuesForMetric;
update the extraction lines in logBatchCounter (and the analogous lines in
logBatch) to perform these casts so static analysis recognizes the narrowed
types while leaving validation logic unchanged.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 683-688: When PERIOD_INF is used the code sets $time to null which
yields an empty string param and lets ClickHouse coerce to 1970 implicitly;
instead set a deterministic sentinel datetime for $timestamp when
Usage::PERIOD_INF is selected. Update the block around $now/$time/$timestamp
(used with Usage::PERIODS and formatDateTime) to assign a concrete sentinel
string (e.g. '1970-01-01 00:00:00.000') and pass that through
$this->formatDateTime so $timestamp is never null, ensuring partition key
behavior is explicit and deterministic.

Comment on lines 683 to 688
$now = new \DateTime();
$time = $period === Usage::PERIOD_INF
? null
: $now->format(Usage::PERIODS[$period]);
$timestamp = $time !== null ? $this->formatDateTime($time) : null;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -type f -name "ClickHouse.php" 2>/dev/null | head -5

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# First, let's locate the file
find . -type f -name "ClickHouse.php" -path "*/Usage/Adapter/*" 2>/dev/null

Repository: utopia-php/usage

Length of output: 94


🏁 Script executed:

# Let's search from the repo root
git ls-files | grep -i clickhouse

Repository: utopia-php/usage

Length of output: 131


🏁 Script executed:

# Read the relevant file section
head -n 700 src/Usage/Adapter/ClickHouse.php | tail -n 50

Repository: utopia-php/usage

Length of output: 1626


🏁 Script executed:

# Get more context around lines 683-688
sed -n '670,700p' src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 978


🏁 Script executed:

# Search for table schema or CREATE TABLE statement
rg -i "CREATE TABLE|table_schema" src/Usage/Adapter/ClickHouse.php | head -20

Repository: utopia-php/usage

Length of output: 202


🏁 Script executed:

# Search for the column mapping/attribute map where timestamp is used
sed -n '680,750p' src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 2342


🏁 Script executed:

# Search for how attributes/columns are mapped in the insert
rg -A 30 "attributeMap.*=" src/Usage/Adapter/ClickHouse.php | head -60

Repository: utopia-php/usage

Length of output: 1246


🏁 Script executed:

# Find the complete CREATE TABLE statement
rg -B 5 -A 50 "CREATE TABLE IF NOT EXISTS" src/Usage/Adapter/ClickHouse.php | head -150

Repository: utopia-php/usage

Length of output: 2819


🏁 Script executed:

# Also check if there are any migration files or schema definitions
find . -type f \( -name "*.php" -o -name "*.sql" \) | xargs grep -l "time.*DateTime" 2>/dev/null | grep -i schema

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Search for column type definitions or schema initialization
rg "time.*DateTime|PARTITION BY" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 438


🏁 Script executed:

# Search for tests related to PERIOD_INF or this code path
rg -i "PERIOD_INF|period_inf" tests/ src/

Repository: utopia-php/usage

Length of output: 263


🏁 Script executed:

# Check if there's error handling around NULL inserts
rg -B 3 -A 3 "try|catch|Exception.*time|Exception.*NULL" src/Usage/Adapter/ClickHouse.php | head -50

Repository: utopia-php/usage

Length of output: 1853


🏁 Script executed:

# Search for how the query is actually executed with parameters
rg -B 5 -A 10 "query.*INSERT.*valuePlaceholders|execute.*INSERT" src/Usage/Adapter/ClickHouse.php | head -80

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Search for formatParamValue function
rg -A 20 "function formatParamValue|formatParamValue.*=" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 571


🏁 Script executed:

# Also check the full picture - how parameters are formatted in the HTTP request
rg -B 10 -A 15 "formatParamValue" src/Usage/Adapter/ClickHouse.php | head -80

Repository: utopia-php/usage

Length of output: 3381


🏁 Script executed:

# Check the Usage.php file for PERIODS mapping
rg -B 3 -A 10 "PERIODS.*=" src/Usage/Usage.php

Repository: utopia-php/usage

Length of output: 394


🏁 Script executed:

# Search for getColumnType function
rg -A 30 "function getColumnType" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 1070


🏁 Script executed:

# Also check how 'time' is defined as an attribute
rg -B 5 -A 15 "getAttributes.*=|'time'.*DateTime" src/Usage/Adapter/ClickHouse.php | head -60

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Search for where 'time' attribute is defined or set
rg -B 5 -A 5 "'time'" src/Usage/Adapter/ClickHouse.php | grep -A 5 -B 5 "required\|type"

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Check the attributes array or getAttributes implementation
rg -B 10 -A 30 "function getAttributes|protected.*attributes" src/Usage/Adapter/ClickHouse.php | head -100

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Look for attribute initialization in parent class or base
fd "Adapter.php" src/ --type f

Repository: utopia-php/usage

Length of output: 81


🏁 Script executed:

# Check the base Adapter class
cat -n src/Usage/Adapter.php

Repository: utopia-php/usage

Length of output: 3630


🏁 Script executed:

# Search for 'time' attribute definition specifically
rg -B 3 -A 3 "'time'" src/Usage/Adapter.php

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Search for how attributes are defined - looking for 'metric', 'value', 'period', 'time', 'tags' 
rg -B 5 -A 20 "metric.*value.*period|'metric'|'period'|'tags'" src/Usage/Adapter/ClickHouse.php | head -100

Repository: utopia-php/usage

Length of output: 3660


🏁 Script executed:

# Look at how 'metric', 'value', etc. are mapped to columns
rg -B 3 -A 3 "getAttribute.*time|'time'.*required" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Search for where the schema attributes are hardcoded or predefined
rg -B 5 -A 20 "buildInsertColumns|columnDefs.*=" src/Usage/Adapter/ClickHouse.php | head -80

Repository: utopia-php/usage

Length of output: 2698


🏁 Script executed:

# Search for getAttributes implementation
rg -A 30 "function getAttributes" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Check if 'time', 'metric', 'value', 'period', 'tags' are hardcoded or predefined
rg -B 10 "columns.*metric.*value|columns.*=.*\[\]" src/Usage/Adapter/ClickHouse.php | head -80

Repository: utopia-php/usage

Length of output: 294


🏁 Script executed:

# Look for where columns array is built and check if time is always included
rg -B 20 "time.*DateTime64|columns\[\].*time" src/Usage/Adapter/ClickHouse.php | head -100

Repository: utopia-php/usage

Length of output: 2718


🌐 Web query:

ClickHouse DateTime64 non-nullable column empty string parameter NULL behavior insert

💡 Result:

If you insert an empty string ('') into a non-nullable DateTime64 column, ClickHouse does not treat that as NULL. What happens depends on how you’re inserting (format/settings):

  • CSV input: empty fields are treated as the column default value when input_format_csv_empty_as_default=1 (this is true by default)—so an empty field for DateTime64 becomes its default (typically the “zero” datetime) rather than NULL. [2]
  • JSONEachRow input: empty strings are only replaced with defaults if you enable input_format_json_empty_as_default=1. Otherwise, the empty string is just a string and parsing can fail. [3]
  • Actual NULL into a non-nullable column throws an error unless you enable input_format_null_as_default=1, in which case NULL is replaced by the column’s default. [1][6]

How to handle “empty string means missing” safely

Option A (recommended): make the column nullable

ts Nullable(DateTime64(3))

Then send NULL when you mean missing.

Option B: convert empty string to NULL, then choose a fallback

INSERT INTO t (ts)
VALUES (coalesce(toDateTime64OrNull(nullif({p:String}, ''), 3), toDateTime64('1970-01-01 00:00:00', 3)));

toDateTime64OrNull returns NULL on invalid/empty input. [4]

Option C: convert empty/invalid directly to the default

INSERT INTO t (ts)
VALUES (toDateTime64OrDefault({p:String}, 3));

toDateTime64OrDefault returns the provided default, or 1970-01-01 00:00:00.000 if you don’t provide one. [5]

Sources: [1] Format setting input_format_null_as_default [2] CSV input_format_csv_empty_as_default [3] JSON input_format_json_empty_as_default [4] toDateTime64OrNull [5] toDateTime64OrDefault [6] non-nullable NULL insert error


Set a deterministic timestamp for PERIOD_INF instead of relying on implicit ClickHouse coercion.

The time column is non-nullable DateTime64(3) and used as a partition key. When PERIOD_INF is used, $timestamp is set to null, which becomes an empty string when sent as a parameter. ClickHouse then silently coerces the empty string to its default datetime (1970-01-01 00:00:00), making the behavior implicit and fragile.

Use an explicit sentinel timestamp (e.g., '1970-01-01 00:00:00.000') or reconsider the column design:

Proposed fix (sentinel timestamp)
-        $time = $period === Usage::PERIOD_INF
-            ? null
-            : $now->format(Usage::PERIODS[$period]);
-        $timestamp = $time !== null ? $this->formatDateTime($time) : null;
+        $time = $period === Usage::PERIOD_INF
+            ? '1970-01-01 00:00:00.000'
+            : $now->format(Usage::PERIODS[$period]);
+        $timestamp = $this->formatDateTime($time);
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 683 - 688, When PERIOD_INF is
used the code sets $time to null which yields an empty string param and lets
ClickHouse coerce to 1970 implicitly; instead set a deterministic sentinel
datetime for $timestamp when Usage::PERIOD_INF is selected. Update the block
around $now/$time/$timestamp (used with Usage::PERIODS and formatDateTime) to
assign a concrete sentinel string (e.g. '1970-01-01 00:00:00.000') and pass that
through $this->formatDateTime so $timestamp is never null, ensuring partition
key behavior is explicit and deterministic.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/Database.php`:
- Around line 482-486: The setSharedTables method currently calls itself causing
infinite recursion; change the internal call from
$this->setSharedTables($sharedTables) to call the underlying database adapter
$this->db->setSharedTables($sharedTables) (ensure $this->db is the correct
property and initialized) and then return $this to preserve fluent interface in
the setSharedTables method.
🧹 Nitpick comments (3)
src/Usage/Adapter/Database.php (3)

93-93: Unused $batchSize parameter.

The $batchSize parameter is declared but never used. Either implement batching logic to process documents in chunks of $batchSize, or remove the parameter to avoid confusion.

Option 1: Implement batching
     public function logBatch(array $metrics, int $batchSize = 1000): bool
     {
         $this->db->getAuthorization()->skip(function () use ($metrics, $batchSize) {
             $documentsById = [];
             // ... existing aggregation logic ...
 
-            if (!empty($documents)) {
-                $this->db->upsertDocumentsWithIncrease($this->collection, 'value', $documents);
-            }
+            if (!empty($documents)) {
+                foreach (array_chunk($documents, $batchSize) as $batch) {
+                    $this->db->upsertDocumentsWithIncrease($this->collection, 'value', $batch);
+                }
+            }
         });

188-188: Unused $batchSize parameter (same as logBatch).

Apply the same batching fix as suggested for logBatch(), or remove the parameter.


388-407: Purge implementation is functional but could be optimized.

The current approach deletes documents one-by-one in a loop. While this works correctly, it may be slow for large datasets. Consider using a bulk delete operation if available in the Utopia\Database library.

Comment on lines +482 to +486
public function setSharedTables(bool $sharedTables): self
{
$this->setSharedTables($sharedTables);
return $this;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Infinite recursion in setSharedTables().

Line 484 calls $this->setSharedTables($sharedTables) instead of $this->db->setSharedTables($sharedTables), causing a stack overflow at runtime.

Required fix
     public function setSharedTables(bool $sharedTables): self
     {
-        $this->setSharedTables($sharedTables);
+        $this->db->setSharedTables($sharedTables);
         return $this;
     }
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/Database.php` around lines 482 - 486, The setSharedTables
method currently calls itself causing infinite recursion; change the internal
call from $this->setSharedTables($sharedTables) to call the underlying database
adapter $this->db->setSharedTables($sharedTables) (ensure $this->db is the
correct property and initialized) and then return $this to preserve fluent
interface in the setSharedTables method.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 912-915: When Usage::PERIOD_INF is detected the code sets
$timestamp to null which relies on ClickHouse coercion into the non-nullable
DateTime64(3) time column; change the branch that sets $timestamp so it uses an
explicit sentinel timestamp (for example the epoch or another agreed sentinel)
formatted via the existing formatDateTime() helper instead of null so the
$timestamp variable written to the time column is deterministic (refer to
Usage::PERIOD_INF, $timestamp, formatDateTime(), and the time column
DateTime64(3)).
- Line 896: The prepareMetricRow function is declared to return ?array but
always returns an array; update its signature to return array (remove the
nullable) and adjust any PHPDoc or type hints that reference ?array to array;
specifically change the method declaration for prepareMetricRow(array
$metricData): ?array to prepareMetricRow(array $metricData): array and ensure
callers and static analyzers remain consistent with the non-nullable return
type.
🧹 Nitpick comments (2)
src/Usage/Adapter/ClickHouse.php (2)

1243-1256: Simplify value extraction — $singleValue only assigned conditionally but used unconditionally.

Query::getValues() always returns an array, making is_array($values) always true. However, $singleValue is only assigned inside the if (is_array($values)) block but used unconditionally at line 1248. This creates fragile code structure.

♻️ Proposed refactor
                 case Query::TYPE_LESSER_EQUAL:
                     $this->validateAttributeName($attribute);
                     $escapedAttr = $this->escapeIdentifier($attribute);
                     $paramName = 'param_' . $paramCounter++;
+                    $singleValue = $values[0] ?? null;
                     if ($attribute === 'time') {
-                        if (is_array($values)) {
-                            /** `@var` \DateTime|string|null $singleValue */
-                            $singleValue = $values[0] ?? null;
-                        }
                         $filters[] = "{$escapedAttr} <= {{$paramName}:DateTime64(3)}";
                         $params[$paramName] = $this->formatDateTime($singleValue);
                     } else {
-                        if (is_array($values)) {
-                            /** `@var` bool|float|int|string $singleValue */
-                            $singleValue = $values[0] ?? null;
-                        }
                         $filters[] = "{$escapedAttr} <= {{$paramName}:String}";
                         $params[$paramName] = $this->formatParamValue($singleValue);
                     }
                     break;

1264-1277: Same refactor needed for TYPE_GREATER_EQUAL.

Same issue as TYPE_LESSER_EQUAL$singleValue is conditionally assigned but unconditionally used.

♻️ Proposed refactor
                 case Query::TYPE_GREATER_EQUAL:
                     $this->validateAttributeName($attribute);
                     $escapedAttr = $this->escapeIdentifier($attribute);
                     $paramName = 'param_' . $paramCounter++;
+                    $singleValue = $values[0] ?? null;
                     if ($attribute === 'time') {
-                        if (is_array($values)) {
-                            /** `@var` \DateTime|string|null $singleValue */
-                            $singleValue = $values[0] ?? null;
-                        }
                         $filters[] = "{$escapedAttr} >= {{$paramName}:DateTime64(3)}";
                         $params[$paramName] = $this->formatDateTime($singleValue);
                     } else {
-                        if (is_array($values)) {
-                            /** `@var` bool|float|int|string $singleValue */
-                            $singleValue = $values[0] ?? null;
-                        }
                         $filters[] = "{$escapedAttr} >= {{$paramName}:String}";
                         $params[$paramName] = $this->formatParamValue($singleValue);
                     }
                     break;

Comment on lines +912 to +915
$time = $period === Usage::PERIOD_INF
? null
: $now->format(Usage::PERIODS[$period]);
$timestamp = $time !== null ? $this->formatDateTime($time) : null;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Null timestamp for PERIOD_INF relies on implicit ClickHouse coercion.

When period === Usage::PERIOD_INF, $timestamp becomes null (line 915), which is then assigned to the time column (line 929). Since time is a non-nullable DateTime64(3) column defined at line 489, ClickHouse will silently coerce the null/empty value to 1970-01-01 00:00:00, making the behavior implicit and fragile.

Consider using an explicit sentinel timestamp for deterministic behavior.

🐛 Proposed fix (explicit sentinel)
         // Period-aligned time
         $now = new \DateTime();
         $time = $period === Usage::PERIOD_INF
-            ? null
+            ? '1970-01-01 00:00:00.000'
             : $now->format(Usage::PERIODS[$period]);
-        $timestamp = $time !== null ? $this->formatDateTime($time) : null;
+        $timestamp = $this->formatDateTime($time);

Also applies to: 929-929

🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 912 - 915, When
Usage::PERIOD_INF is detected the code sets $timestamp to null which relies on
ClickHouse coercion into the non-nullable DateTime64(3) time column; change the
branch that sets $timestamp so it uses an explicit sentinel timestamp (for
example the epoch or another agreed sentinel) formatted via the existing
formatDateTime() helper instead of null so the $timestamp variable written to
the time column is deterministic (refer to Usage::PERIOD_INF, $timestamp,
formatDateTime(), and the time column DateTime64(3)).

…s, update parsing logic, and add comprehensive tests.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1078-1085: In the count() method after $result =
$this->query($sql, $params) and $json = json_decode($result, true), guard the
decoded value before indexing: verify that $json is an array, that
isset($json['data']) and is_array($json['data']), that isset($json['data'][0])
and is_array($json['data'][0]), and that isset($json['data'][0]['total']) before
casting and returning it; if any check fails return 0. Use the existing symbols
query(), $result, $json and the count() method to locate where to add these
checks.
- Around line 1559-1566: The JSON decode in sumByPeriod() is used without
guarding against failures or non-array shapes; update the code around $result =
$this->query($sql, $parsed['params']); and $json = json_decode($result, true) to
validate that $json is an array, contains a 'data' key that is an array, that
index 0 exists, and that 'grand_total' exists before indexing; if any check
fails return 0 (optionally also handle json_decode errors via json_last_error()
for clarity). Ensure you reference the existing query() call and the $json
variable so the checks are added right before accessing
$json['data'][0]['grand_total'].
- Around line 1330-1372: In parseResults(), guard against json_decode and mixed
types: verify $json is an array and that isset($json['data']) yields an array
before iterating (e.g., ensure $rows is an array), handle json_decode($result,
true) possibly returning null; for each $row ensure it is an array before
foreach; when handling 'tenant' cast only if is_numeric($value) or not null;
when handling 'time' ensure $value is a string before calling strpos/str_replace
and otherwise leave as null or format-safe fallback; when handling 'tags' check
is_string($value) and that json_decode($value, true) returns an array (use
is_array() fallback to empty array) and otherwise preserve arrays/objects; and
generally use is_array/is_string checks to satisfy PHPStan for mixed-offset and
type errors.

In `@tests/Usage/Adapter/ClickHouseTest.php`:
- Around line 319-326: Remove the unused local variables $now and $oldTime from
the test — delete the lines that assign $now = DateTime::now(); and $oldTime =
(new \DateTime())->modify('-2 hours'); and update or retain only the explanatory
comments around the calls to $this->usage->log('metric-A', ...),
$this->usage->log('metric-B', ...), and the comment about relying on creation
times; ensure no other code references $now or $oldTime so PHPMD warnings go
away.

Comment on lines +319 to +326
$now = DateTime::now();
// metric A: value 10, time NOW
$this->usage->log('metric-A', 10, '1h', ['category' => 'cat1']);
// metric B: value 20, time NOW
$this->usage->log('metric-B', 20, '1h', ['category' => 'cat2']);
// metric C: value 30, time NOW - 2 hours
$oldTime = (new \DateTime())->modify('-2 hours');
// We can't easily force time in log(), so we just rely on metrics created now being "newer" than this timestamp
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused locals flagged by PHPMD.

$now and $oldTime are unused; drop them to keep the test clean.

🛠️ Proposed fix
-        $now = DateTime::now();
-        // metric A: value 10, time NOW
+        // metric A: value 10, time NOW
         $this->usage->log('metric-A', 10, '1h', ['category' => 'cat1']);
         // metric B: value 20, time NOW
         $this->usage->log('metric-B', 20, '1h', ['category' => 'cat2']);
-        // metric C: value 30, time NOW - 2 hours
-        $oldTime = (new \DateTime())->modify('-2 hours');
-        // We can't easily force time in log(), so we just rely on metrics created now being "newer" than this timestamp
+        // We can't easily force time in log(), so we just rely on metrics created now being "newer" than this timestamp
🧰 Tools
🪛 PHPMD (2.15.0)

[warning] 319-319: Avoid unused local variables such as '$now'. (undefined)

(UnusedLocalVariable)


[warning] 325-325: Avoid unused local variables such as '$oldTime'. (undefined)

(UnusedLocalVariable)

🤖 Prompt for AI Agents
In `@tests/Usage/Adapter/ClickHouseTest.php` around lines 319 - 326, Remove the
unused local variables $now and $oldTime from the test — delete the lines that
assign $now = DateTime::now(); and $oldTime = (new \DateTime())->modify('-2
hours'); and update or retain only the explanatory comments around the calls to
$this->usage->log('metric-A', ...), $this->usage->log('metric-B', ...), and the
comment about relying on creation times; ensure no other code references $now or
$oldTime so PHPMD warnings go away.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1264-1278: The branch in ClickHouse.php uses $singleValue after
only assigning it inside the inner is_array($values) check, which can leave it
undefined; update the TYPE_LESS_EQUAL handling (and mirror the same change for
TYPE_GREATER_EQUAL) to initialize $singleValue = null before the is_array check
so both the time branch (used by formatDateTime) and the non-time branch (used
by formatParamValue) always have a defined variable; locate the blocks
referencing $singleValue, $paramName, $escapedAttr, and $values and ensure
$singleValue is set to null (or a sensible default) prior to the conditional
assignment.
🧹 Nitpick comments (1)
src/Usage/Adapter/ClickHouse.php (1)

388-389: Optional: Consider URL-encoding the query string for defensive consistency.

While backticks are valid in URL query strings and ClickHouse correctly handles them, the query string could be URL-encoded for defensive consistency with standard URL practices. However, the current implementation works correctly since table names are internally controlled and alphanumeric. If you want to add URL encoding for robustness:

-        $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow";
+        $query = "INSERT INTO {$escapedTable} FORMAT JSONEachRow";
+        $url = "{$scheme}://{$this->host}:{$this->port}/?query=" . urlencode($query);

This is not required for correctness but improves alignment with standard URL encoding practices.

Comment on lines +1264 to +1278
if ($attribute === 'time') {
if (is_array($values)) {
/** @var \DateTime|string|null $singleValue */
$singleValue = $values[0] ?? null;
}
$filters[] = "{$escapedAttr} <= {{$paramName}:DateTime64(3)}";
$params[$paramName] = $this->formatDateTime($singleValue);
} else {
if (is_array($values)) {
/** @var bool|float|int|string $singleValue */
$singleValue = $values[0] ?? null;
}
$filters[] = "{$escapedAttr} <= {{$paramName}:String}";
$params[$paramName] = $this->formatParamValue($singleValue);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potentially undefined variable $singleValue.

$singleValue is assigned only inside the if (is_array($values)) block but used unconditionally afterwards. While Query::getValues() always returns an array, this code structure is fragile:

if ($attribute === 'time') {
    if (is_array($values)) {
        $singleValue = $values[0] ?? null;  // Only assigned here
    }
    $filters[] = "...";
    $params[$paramName] = $this->formatDateTime($singleValue);  // Used unconditionally
Proposed fix
                 case Query::TYPE_LESSER_EQUAL:
                     $this->validateAttributeName($attribute);
                     $escapedAttr = $this->escapeIdentifier($attribute);
                     $paramName = 'param_' . $paramCounter++;
+                    $singleValue = $values[0] ?? null;
                     if ($attribute === 'time') {
-                        if (is_array($values)) {
-                            /** `@var` \DateTime|string|null $singleValue */
-                            $singleValue = $values[0] ?? null;
-                        }
                         $filters[] = "{$escapedAttr} <= {{$paramName}:DateTime64(3)}";
                         $params[$paramName] = $this->formatDateTime($singleValue);
                     } else {
-                        if (is_array($values)) {
-                            /** `@var` bool|float|int|string $singleValue */
-                            $singleValue = $values[0] ?? null;
-                        }
                         $filters[] = "{$escapedAttr} <= {{$paramName}:String}";
                         $params[$paramName] = $this->formatParamValue($singleValue);
                     }
                     break;

Apply the same pattern to TYPE_GREATER_EQUAL (lines 1281-1300).

🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 1264 - 1278, The branch in
ClickHouse.php uses $singleValue after only assigning it inside the inner
is_array($values) check, which can leave it undefined; update the
TYPE_LESS_EQUAL handling (and mirror the same change for TYPE_GREATER_EQUAL) to
initialize $singleValue = null before the is_array check so both the time branch
(used by formatDateTime) and the non-time branch (used by formatParamValue)
always have a defined variable; locate the blocks referencing $singleValue,
$paramName, $escapedAttr, and $values and ensure $singleValue is set to null (or
a sensible default) prior to the conditional assignment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 970-1017: In validateMetricsBatch(), the tenant validation reads
the wrong array key ('$tenant') causing tenant checks to be skipped or wrong;
change the lookup to use the correct key 'tenant' (e.g., $tenantValue =
$metricData['tenant']), keep the surrounding existence check
(array_key_exists('tenant', $metricData)), and retain the existing validation
logic (allow non-negative integers or numeric strings, reject negatives and
other types) so that metric-level tenant overrides are validated correctly.
🧹 Nitpick comments (1)
src/Usage/Adapter/Database.php (1)

133-180: $batchSize parameter is unused.

The logBatch method accepts $batchSize but processes all metrics in a single upsertDocumentsWithIncrease call. Either implement batching or document that the Database adapter handles all documents at once regardless of batch size.

Option 1: Document the behavior
     /**
      * Log multiple metrics in batch
      *
      * `@param`  array<array{metric: string, value: int, period?: string, tags?: array<string,mixed>}>  $metrics
-     * `@param`  int  $batchSize  Maximum number of metrics per INSERT statement
+     * `@param`  int  $batchSize  Unused - Database adapter processes all metrics at once via upsert
      */
+    #[\SuppressWarnings(PHPMD.UnusedFormalParameter)]
     public function logBatch(array $metrics, int $batchSize = 1000): bool

Comment on lines +970 to +1017
private function validateMetricsBatch(array $metrics): void
{
foreach ($metrics as $index => $metricData) {
// Validate required fields exist
if (!isset($metricData['metric'])) {
throw new Exception("Metric #{$index}: 'metric' is required");
}
if (!isset($metricData['value'])) {
throw new Exception("Metric #{$index}: 'value' is required");
}

$metric = $metricData['metric'];
$value = $metricData['value'];
$period = $metricData['period'] ?? Usage::PERIOD_1H;

// Validate types
if (!is_string($metric)) {
throw new Exception("Metric #{$index}: 'metric' must be a string, got " . gettype($metric));
}
if (!is_int($value)) {
throw new Exception("Metric #{$index}: 'value' must be an integer, got " . gettype($value));
}
if (!is_string($period)) {
throw new Exception("Metric #{$index}: 'period' must be a string, got " . gettype($period));
}

/** @var array<string, mixed> */
$tags = $metricData['tags'] ?? [];
$this->validateMetricData($metric, $value, $period, $tags, $index);

// Validate tenant when provided (metric-level tenant overrides adapter tenant)
if (array_key_exists('tenant', $metricData)) {
$tenantValue = $metricData['$tenant'];

if ($tenantValue !== null) {
if (is_int($tenantValue)) {
if ($tenantValue < 0) {
throw new Exception("Metric #{$index}: 'tenant' cannot be negative");
}
} elseif (is_string($tenantValue) && ctype_digit($tenantValue)) {
// ok numeric string
} else {
throw new Exception("Metric #{$index}: 'tenant' must be a non-negative integer, got " . gettype($tenantValue));
}
}
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Thorough batch validation but tenant key mismatch needs fixing.

The validateMetricsBatch() method performs comprehensive validation of required fields, types, and constraints. However, the tenant key mismatch issue at lines 1001-1002 (noted above) affects this validation block.

🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 970 - 1017, In
validateMetricsBatch(), the tenant validation reads the wrong array key
('$tenant') causing tenant checks to be skipped or wrong; change the lookup to
use the correct key 'tenant' (e.g., $tenantValue = $metricData['tenant']), keep
the surrounding existence check (array_key_exists('tenant', $metricData)), and
retain the existing validation logic (allow non-negative integers or numeric
strings, reject negatives and other types) so that metric-level tenant overrides
are validated correctly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1026-1041: The validation is reading the wrong array key: when
checking array_key_exists('tenant', $metricData) the code then uses
$metricData['$tenant'] (with a stray dollar sign) which yields null and skips
validation; update the access to use $metricData['tenant'] and keep the existing
null/int/string checks (including the ctype_digit branch) so that the tenant
validation in this block (referencing $metricData and $index) correctly enforces
non-negative integer values and throws the same Exception messages on invalid
input.

Comment on lines +1026 to +1041
// Validate tenant when provided (metric-level tenant overrides adapter tenant)
if (array_key_exists('tenant', $metricData)) {
$tenantValue = $metricData['$tenant'];

if ($tenantValue !== null) {
if (is_int($tenantValue)) {
if ($tenantValue < 0) {
throw new Exception("Metric #{$index}: 'tenant' cannot be negative");
}
} elseif (is_string($tenantValue) && ctype_digit($tenantValue)) {
// ok numeric string
} else {
throw new Exception("Metric #{$index}: 'tenant' must be a non-negative integer, got " . gettype($tenantValue));
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Tenant key mismatch causes validation to read wrong array key.

Line 1027 checks array_key_exists('tenant', $metricData) but line 1028 reads $metricData['$tenant'] (with dollar sign prefix). This means validation passes when the tenant key exists, but then reads from the non-existent $tenant key, returning null and bypassing all validation logic.

🐛 Proposed fix
             // Validate tenant when provided (metric-level tenant overrides adapter tenant)
-            if (array_key_exists('tenant', $metricData)) {
-                $tenantValue = $metricData['$tenant'];
+            if (array_key_exists('$tenant', $metricData)) {
+                $tenantValue = $metricData['$tenant'];
 
                 if ($tenantValue !== null) {

Or alternatively, if the key should be tenant without dollar sign:

             if (array_key_exists('tenant', $metricData)) {
-                $tenantValue = $metricData['$tenant'];
+                $tenantValue = $metricData['tenant'];
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 1026 - 1041, The validation is
reading the wrong array key: when checking array_key_exists('tenant',
$metricData) the code then uses $metricData['$tenant'] (with a stray dollar
sign) which yields null and skips validation; update the access to use
$metricData['tenant'] and keep the existing null/int/string checks (including
the ctype_digit branch) so that the tenant validation in this block (referencing
$metricData and $index) correctly enforces non-negative integer values and
throws the same Exception messages on invalid input.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 782-883: The INSERT should enable ClickHouse deduplication instead
of disabling retries: modify the SQL in the ClickHouse Adapter where $sql and
$url are built (variables $sql = "INSERT INTO {$escapedTable} FORMAT
JSONEachRow" and $url =
".../?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow") to include a
SETTINGS clause (e.g. " SETTINGS non_replicated_deduplication_window=1") and
ensure the URL encodes that same SQL (or append the equivalent settings query
parameter) so retries remain enabled but server-side deduplication prevents
duplicate aggregated rows; update any logging (buildErrorMessage/logQuery) that
references $sql if needed to reflect the added SETTINGS.

Comment on lines 782 to 883
$attempt = 0;
$lastException = null;

while ($attempt <= $this->maxRetries) {
$startTime = microtime(true);
$scheme = $this->secure ? 'https' : 'http';
$escapedTable = $this->escapeIdentifier($table);
$url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow";

// Update the database header
$this->client->addHeader('X-ClickHouse-Database', $this->database);
$this->client->addHeader('Content-Type', 'application/x-ndjson');

// Enable keep-alive for connection pooling
if ($this->enableKeepAlive) {
$this->client->addHeader('Connection', 'keep-alive');
} else {
$this->client->addHeader('Connection', 'close');
}

// Enable compression if configured
if ($this->enableCompression) {
$this->client->addHeader('Accept-Encoding', 'gzip');
}

// Track request count for statistics (only on first attempt)
if ($attempt === 0) {
$this->requestCount++;
}

// Join JSON strings with newlines
$body = implode("\n", $data);

$sql = "INSERT INTO {$escapedTable} FORMAT JSONEachRow";
$params = ['rows' => count($data), 'bytes' => strlen($body)];

try {
$response = $this->client->fetch(
url: $url,
method: Client::METHOD_POST,
body: $body
);

$httpCode = $response->getStatusCode();

if ($httpCode !== 200) {
$bodyStr = $response->getBody();
$bodyStr = is_string($bodyStr) ? $bodyStr : '';
$duration = microtime(true) - $startTime;
$rowCount = count($data);
$baseError = "ClickHouse insert failed with HTTP {$httpCode}: {$bodyStr}";
$errorMsg = $this->buildErrorMessage($baseError, $table, "INSERT INTO {$table} ({$rowCount} rows)");
$this->logQuery($sql, $params, $duration, false, $errorMsg, $attempt);

// Clean up Content-Type before retry
$this->client->removeHeader('Content-Type');

// Check if error is retryable
if ($attempt < $this->maxRetries && $this->isRetryableError($httpCode, $baseError)) {
$attempt++;
$delay = $this->retryDelay * (2 ** ($attempt - 1)); // Exponential backoff
usleep($delay * 1000); // Convert ms to microseconds
continue;
}

throw new Exception($errorMsg);
}

$duration = microtime(true) - $startTime;
$this->logQuery($sql, $params, $duration, true, null, $attempt);

// Clean up Content-Type after successful insert
$this->client->removeHeader('Content-Type');
return;
} catch (Exception $e) {
$duration = microtime(true) - $startTime;
$this->logQuery($sql, $params, $duration, false, $e->getMessage(), $attempt);
$lastException = $e;

// Clean up Content-Type before retry
$this->client->removeHeader('Content-Type');

// Check if error is retryable
if ($attempt < $this->maxRetries && $this->isRetryableError(null, $e->getMessage())) {
$attempt++;
$delay = $this->retryDelay * (2 ** ($attempt - 1)); // Exponential backoff
usleep($delay * 1000); // Convert ms to microseconds
continue;
}

$rowCount = count($data);
$baseError = "ClickHouse insert execution failed after " . ($attempt + 1) . " attempt(s): {$e->getMessage()}";
$errorMsg = $this->buildErrorMessage($baseError, $table, "INSERT INTO {$table} ({$rowCount} rows)");
throw new Exception($errorMsg, 0, $e);
}
}

// Should never reach here, but just in case
$rowCount = count($data);
$baseError = "ClickHouse insert execution failed after " . ($this->maxRetries + 1) . " attempt(s)";
$errorMsg = $this->buildErrorMessage($baseError, $table, "INSERT INTO {$table} ({$rowCount} rows)");
throw new Exception($errorMsg, 0, $lastException);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's see the full context of the method and class to understand the retry logic better
head -100 src/Usage/Adapter/ClickHouse.php | tail -50

Repository: utopia-php/usage

Length of output: 1928


🏁 Script executed:

# Let's look at the method around lines 782-883 and understand the context
sed -n '700,900p' src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 8777


🏁 Script executed:

# Check the isRetryableError method to understand what errors trigger retries
rg -A 15 "function isRetryableError" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 699


🏁 Script executed:

# Look for deduplication-related configuration or methods
rg -i "dedup|query_id|insert_dedup" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Check what maxRetries is set to
rg "maxRetries\s*=" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 133


🌐 Web query:

ClickHouse INSERT idempotency query_id deduplication retry safety

💡 Result:

What ClickHouse actually deduplicates (and what it doesn’t)

  • ClickHouse “retry safety” for INSERT is block-based deduplication for *MergeTree tables: each inserted block gets a block_id (hash of the block’s data). If the same block_id is seen again within the table’s dedup log window, the duplicate block is skipped. This is what makes retries effectively idempotent for ambiguous failures like timeouts/network drops. [1]
  • This is not primarily driven by query_id. query_id is useful for tracing/observability, but the documented retry deduplication mechanism is block_id + dedup logs (and optionally a token). [1][2]

How to make INSERT retries safe (recommended approach)

  1. Use a *MergeTree table (dedup-on-insert is only supported there). [1]
  2. Keep retries byte-for-byte and setting-for-setting identical:
    • For INSERT ... VALUES, block splitting is deterministic given the same settings, so retries should reuse the same settings. [1]
    • For INSERT ... SELECT, dedup only works if the SELECT returns the same rows in the same order; the docs recommend using ORDER BY ALL for stable ordering. [1]
  3. Ensure deduplication is enabled:
    • For Replicated*MergeTree, it’s enabled by default and governed by replicated_deduplication_window / _seconds. [1][2]
    • For non-replicated MergeTree, use non_replicated_deduplication_window (and make sure insert_deduplicate=1 is not disabled for your inserts). [1][2]

When you should use a deduplication token

  • If you may legitimately insert the “same” block content multiple times (or you want an explicit idempotency key), set insert_deduplication_token to a stable value for the logical operation you’re retrying; ClickHouse will use it to control dedup behavior (works at partition level). [1][2]

Important limitations (so you don’t assume “exactly-once”)

  • Deduplication only works within a finite deduplication window; if too many other inserts happen before your retry, duplicates can slip through. [1]
  • With materialized views, ambiguous failures can still leave you unsure which tables received data if you stop retrying before success. [1]

Asynchronous inserts caveat

  • For async_insert, deduplication is not automatically on; enable it explicitly (e.g., async_insert_deduplicate=1) and prefer wait_for_async_insert=1 for production reliability. [3][4]

Sources: ClickHouse docs on deduplicating inserts/retries and settings, plus insert-strategy guidance. [1][2][3][4]


🏁 Script executed:

# Search for table creation or engine configuration
rg -i "summing\|mergetree\|engine" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Check if there's any ClickHouse settings configuration
rg -i "insert_deduplicate|insert_deduplication_token|settings" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 199


🏁 Script executed:

# Look for table creation methods or schema setup
rg -A 10 "CREATE TABLE" src/

Repository: utopia-php/usage

Length of output: 1533


🏁 Script executed:

# Check if there are any test files that show how the insert is used
fd -e php -path "*/test*" -o -path "*/spec*" | head -20

Repository: utopia-php/usage

Length of output: 228


🏁 Script executed:

# Check the larger context of the adapter class to understand its purpose
rg -B 5 "class ClickHouse" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 305


🏁 Script executed:

# Check if insert_deduplication_token is used anywhere
rg "insert_deduplication_token|deduplicate" src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# Check the full insert method signature and if there are any headers/settings related to dedup
sed -n '775,895p' src/Usage/Adapter/ClickHouse.php | head -150

Repository: utopia-php/usage

Length of output: 5209


🏁 Script executed:

# Check if there are any tests that show the expected behavior for retries
find . -name "*.php" -type f | xargs grep -l "ClickHouse" | grep -i test | head -5

Repository: utopia-php/usage

Length of output: 100


🌐 Web query:

ClickHouse block_id deduplication SummingMergeTree default behavior insert

💡 Result:

  • SummingMergeTree uses the standard MergeTree insert-dedup mechanism: incoming data is split into blocks, each block gets a block_id = hash of the inserted block’s bytes (data + schema), and ClickHouse can skip re-inserting a block if its block_id is already in the table’s deduplication log. [1]

  • Default behavior depends on whether the table is replicated:

    • Replicated SummingMergeTree (ReplicatedSummingMergeTree): deduplication on INSERT retries is enabled by default via insert_deduplicate=1, with a default deduplication log window of 100 recent blocks per partition (and also time-limited by replicated_deduplication_window_seconds, default 1 week). [2][3]
    • Non-replicated SummingMergeTree (SummingMergeTree): deduplication is OFF by default because non_replicated_deduplication_window defaults to 0 (disabled). You must set it > 0 to enable deduplication. [3]
  • Practical implication: if you run the same INSERT again (same rows, same order, and effectively the same block splitting), ClickHouse may drop it as a duplicate only within the deduplication window. If you intentionally want to insert identical data multiple times, use insert_deduplication_token (vary it per insert) to avoid “same block_id” deduplication. [1][2]

Sources:
[1] ClickHouse “Deduplicating Inserts on Retries” (block_id definition and behavior) (clickhouse.com)
[2] ClickHouse settings: insert_deduplicate, insert_deduplication_token defaults/notes (clickhouse.com)
[3] ClickHouse MergeTree settings: replicated_deduplication_window default 100; non_replicated_deduplication_window default 0 (docs-content.clickhouse.tech)

Citations:


🏁 Script executed:

# Check if there's any configuration for non_replicated_deduplication_window
rg "non_replicated_deduplication_window|replicated_deduplication" src/

Repository: utopia-php/usage

Length of output: 42


🏁 Script executed:

# See if there's a README or documentation about the ClickHouse adapter
cat README.md | head -100

Repository: utopia-php/usage

Length of output: 3114


🏁 Script executed:

# Check the test file to understand expected behavior
cat tests/Usage/Adapter/ClickHouseTest.php | head -200

Repository: utopia-php/usage

Length of output: 6898


🏁 Script executed:

# Check if there are any settings passed with INSERT queries
rg -B 5 -A 5 "INSERT" src/Usage/Adapter/ClickHouse.php | head -100

Repository: utopia-php/usage

Length of output: 3939


🏁 Script executed:

# Check the exact URL being constructed for INSERT to see if any dedup settings are passed
sed -n '790,810p' src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 867


🏁 Script executed:

# Look at the exact INSERT URL construction
sed -n '782,815p' src/Usage/Adapter/ClickHouse.php

Repository: utopia-php/usage

Length of output: 1404


Enable ClickHouse-level deduplication for INSERT retries instead of disabling them.

INSERT retries with SummingMergeTree can produce duplicate rows if the server acknowledges the insert but the response is lost. This inflates aggregated totals. ClickHouse provides built-in deduplication via block_id (hash of inserted data), but for non-replicated MergeTree tables this is disabled by default. Set non_replicated_deduplication_window > 0 in the INSERT query settings to enable it.

Recommended fix:
Add deduplication settings to the INSERT URL:

-        $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow";
+        $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow&settings[non_replicated_deduplication_window]=100";

This preserves retry resilience while preventing duplicate aggregation. Disabling retries entirely sacrifices availability.

🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 782 - 883, The INSERT should
enable ClickHouse deduplication instead of disabling retries: modify the SQL in
the ClickHouse Adapter where $sql and $url are built (variables $sql = "INSERT
INTO {$escapedTable} FORMAT JSONEachRow" and $url =
".../?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow") to include a
SETTINGS clause (e.g. " SETTINGS non_replicated_deduplication_window=1") and
ensure the URL encodes that same SQL (or append the equivalent settings query
parameter) so retries remain enabled but server-side deduplication prevents
duplicate aggregated rows; update any logging (buildErrorMessage/logQuery) that
references $sql if needed to reflect the added SETTINGS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants