feat: support spark compatible int to timestamp cast #20555

Open

coderfender wants to merge 1 commit into apache:main from coderfender:df_int_timestamp_cast
Conversation

@coderfender commented Feb 25, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  1. Created a new cast scalar UDF to support int-to-timestamp casts (Spark compatible). The goal is to use this as an entry point for all Spark-compatible / DataFusion-incompatible casts.

Are these changes tested?

  1. Yes (through a series of unit tests covering edge cases such as overflow, null input, etc.)

Are there any user-facing changes?

  1. Yes. As a first step, this adds the ability to use Spark-compatible cast operations in DataFusion through spark_cast (see the usage sketch below). The next step is to change the planner to use the Spark-compatible cast instead of the regular cast operations.
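
For illustration, a minimal usage sketch (the rendered timestamp format and a UTC session time zone are assumptions):

SELECT spark_cast(100);
-- 1970-01-01T00:01:40
-- the input is interpreted as seconds since the Unix epoch and stored as microseconds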

@github-actions bot added the spark label Feb 25, 2026
@coderfender force-pushed the df_int_timestamp_cast branch 3 times, most recently from 339a475 to 05c433b on February 25, 2026 17:26
@coderfender marked this pull request as draft February 25, 2026 17:28
@coderfender force-pushed the df_int_timestamp_cast branch from 05c433b to 9e71267 on February 25, 2026 19:50
@github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 25, 2026
@coderfender marked this pull request as ready for review February 25, 2026 19:54
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct Cast {
    signature: Signature,
}
Member

Suggested change (whitespace-only):
}
}

}

fn return_type(&self, arg_types: &[DataType]) -> DataFusionResult<DataType> {
// for now we will be supporting int -> timestamp and keep adding more spark-compatible spark
Member

Suggested change:
-// for now we will be supporting int -> timestamp and keep adding more spark-compatible spark
+// for now we will be supporting int -> timestamp and keep adding more spark-compatible casts

?!

&self.signature
}

fn return_type(&self, arg_types: &[DataType]) -> DataFusionResult<DataType> {
Member

Consider implementing return_field_from_args() instead.
It gives better support for deciding whether the return type is nullable or not.
See datafusion/functions/src/core/coalesce.rs for inspiration.

Something like:

fn return_type(&self, _arg_types: &[DataType]) -> DataFusionResult<DataType> {
    internal_err!("return_field_from_args should be used instead")
}

fn return_field_from_args(&self, args: ReturnFieldArgs) -> DataFusionResult<FieldRef> {
    let nullable = args.arg_fields.iter().any(|f| f.is_nullable());
    Ok(Arc::new(Field::new(
        self.name(),
        DataType::Timestamp(TimeUnit::Microsecond, None),
        nullable,
    )))
}

use std::sync::Arc;
const MICROS_PER_SECOND: i64 = 1_000_000;

#[derive(Debug, PartialEq, Eq, Hash)]
Member

Please add some documentation, with a link to the Spark function that is implemented.
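
A minimal sketch of such a doc comment, assuming the integer-as-seconds semantics described in this PR (the exact wording and link target are illustrative):

/// Spark-compatible cast expression.
///
/// Currently supports casting integer values, interpreted as seconds since
/// the Unix epoch, to `Timestamp(Microsecond)`, matching Spark's
/// `CAST(<integer> AS TIMESTAMP)`:
/// <https://spark.apache.org/docs/latest/api/sql/#cast>
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct Cast {
    signature: Signature,
}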

impl Cast {
    pub fn new() -> Self {
        Self {
            signature: Signature::any(1, Volatility::Immutable),
Member

I think the signature should specify a target type.
At the moment only TimestampMicrosecond is supported, but later, when other types are needed, you will have to add a second parameter.
SELECT spark_cast(arrow_cast(0, 'Int8')); does not indicate in any way that 0_i8 will be cast to a timestamp.
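
A hedged sketch of one way the signature could carry the target type, as a second string argument (this design is illustrative, not part of the PR):

// Accept any input value plus a string naming the target type, e.g.
// SELECT spark_cast(0, 'timestamp');
signature: Signature::any(2, Volatility::Immutable),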

if arr.is_null(i) {
    builder.append_null();
} else {
    let micros = (arr.value(i).into()).saturating_mul(MICROS_PER_SECOND);
Member

Spark does not saturate.

spark-sql (default)> SELECT cast(1 AS TIMESTAMP);
1970-01-01 02:00:01
Time taken: 1.06 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT cast(987654321 AS TIMESTAMP);
2001-04-19 07:25:21
Time taken: 0.035 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT cast(987654321012 AS TIMESTAMP);
+33267-07-09 09:30:12
Time taken: 0.036 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT cast(987654321012345 AS TIMESTAMP);
+294247-01-10 06:00:54.775807
Time taken: 0.035 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT cast(9876543210123456789 AS TIMESTAMP);
+282703-12-03 02:32:57.380672
Time taken: 0.034 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT cast(98765432101234567890987654321 AS TIMESTAMP);
-68156-01-09 16:20:08.49952
Time taken: 0.04 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT cast(98765432101234567890987654321434636434636432463463462362362 AS TIMESTAMP);
[DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION] Decimal precision 59 exceeds max precision 38. SQLSTATE: 22003
org.apache.spark.SparkArithmeticException: [DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION] Decimal precision 59 exceeds max precision 38. SQLSTATE: 22003
        at org.apache.spark.sql.errors.DataTypeErrors$.decimalPrecisionExceedsMaxPrecisionError(DataTypeErrors.scala:45)
        at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:52)
        at org.apache.spark.sql.types.DecimalType$.fromDecimal(DecimalType.scala:142)
        at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:85)

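If the goal is to reproduce the overflow behavior shown above, a hedged sketch, assuming Spark's non-ANSI long-to-timestamp conversion wraps like Java's 64-bit multiplication and that the input value widens losslessly to i64:

// wrap on overflow instead of saturating, matching the Spark output above
let micros = i64::from(arr.value(i)).wrapping_mul(MICROS_PER_SECOND);
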
ColumnarValue, ScalarFunctionArgs, ScalarUDFImpl, Signature, Volatility,
};
use std::any::Any;
use std::sync::Arc;
Member

Suggested change (whitespace-only):
use std::sync::Arc;
use std::sync::Arc;

    let args = make_args(ColumnarValue::Scalar(ScalarValue::Int8(Some(100))));
    let result = cast.invoke_with_args(args).unwrap();
    assert_scalar_timestamp(result, 100_000_000);
}
Member

Suggested change:
 }
+
+#[test]
+fn test_cast_scalar_int16() {
+    let cast = Cast::new();
+    let args = make_args(ColumnarValue::Scalar(ScalarValue::Int16(Some(100))));
+    let result = cast.invoke_with_args(args).unwrap();
+    assert_scalar_timestamp(result, 100_000_000);
+}

use datafusion_expr::ScalarUDF;
use std::sync::Arc;

pub mod expr_fn {}
Member

Maybe add an export_functions! here?!
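
A hedged sketch of what that could look like, loosely following the macro pattern used elsewhere in datafusion/functions (the exact macro arguments are assumptions and would need checking against the actual macro definitions):

// expose a `spark_cast` expression builder from this module (names illustrative)
make_udf_function!(Cast, spark_cast);

pub mod expr_fn {
    export_functions!((
        spark_cast,
        "Spark-compatible cast; currently int (seconds since the epoch) to timestamp.",
        arg
    ));
}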

query P
SELECT spark_cast(NULL::bigint);
----
NULL
Member

How does it behave for SELECT spark_cast(NULL);? Let's add it.
You may need to add | DataType::Null to line 85.
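
A sketch of the corresponding .slt case (the NULL result is the expected behavior under the suggested change, not verified output):

query P
SELECT spark_cast(NULL);
----
NULL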

@coderfender (Author)

Thank you. I will address the review comments shortly.


Labels

spark, sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

Support spark compatible cast from int -> timestamp

2 participants