[SPARK-38483][PYTHON][CONNECT] Add Column._name property exposing a column's name by AgenticSpark · Pull Request #56726 · apache/spark

AgenticSpark · 2026-06-24T05:31:19Z

What changes were proposed in this pull request?

This adds a _name property to the PySpark Column class that returns the
column's name, alias, or expression as a string -- the same string shown inside
Column.__repr__ (Column<'...'>). It is implemented for both Spark Classic
(self._jc.toString()) and Spark Connect (self._expr.__repr__()).

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(2, "Alice")], ["age", "name"])
>>> df.age._name
'age'
>>> sf.col("value")._name
'value'
>>> sf.col("a").cast("int")._name
'CAST(a AS INT)'

The leading underscore intentionally avoids a collision with the existing
Column.name method, which is an alias for Column.alias.
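The collision can be illustrated with a minimal sketch. `FakeColumn` below is a hypothetical stand-in, not PySpark's `Column`; it only models the naming behavior described above, and the `AS` rendering of an alias is an assumption made for illustration.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column; models naming only."""

    def __init__(self, expr: str) -> None:
        self._expr = expr

    def alias(self, new_name: str) -> "FakeColumn":
        # Return a new column whose expression text carries the alias.
        return FakeColumn(f"{self._expr} AS {new_name}")

    # As in the description: `name` is already an alias for `alias`,
    # so a string-valued property cannot also be called `name`.
    name = alias

    @property
    def _name(self) -> str:
        # The string that would appear inside Column<'...'>.
        return self._expr

    def __repr__(self) -> str:
        return "Column<'%s'>" % self._name


col = FakeColumn("age")
renamed = col.name("years")  # `name` is a method call, not a string
print(col._name)             # age
print(renamed._name)         # age AS years
```

In this sketch `col.name` stays callable, which is why the string accessor needs a different attribute name.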

Why are the changes needed?

Requested in SPARK-38483.
Having the name available as an attribute enables convenient patterns, e.g.
re-aliasing an expression with the source column's name, or branching on a
column's name inside a helper function:

values = sf.col("values")
distinct_values = sf.array_distinct(values).alias(values._name)

def custom_function(col):
    return col.cast("int") if col._name == "my_column" else col.cast("string")

Previously the name was only obtainable by parsing repr(col).
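For comparison, the repr-parsing workaround might look roughly like the sketch below. `name_from_repr` is a hypothetical helper, not part of PySpark, and it assumes the `Column<'...'>` format quoted earlier in this description.

```python
import re
from typing import Optional

# Assumed repr layout: Column<'...'>, as quoted in the PR description.
_COLUMN_REPR = re.compile(r"Column<'(.*)'>", re.DOTALL)


def name_from_repr(text: str) -> Optional[str]:
    """Extract the quoted part of a Column repr, or None if it does not match."""
    match = _COLUMN_REPR.fullmatch(text)
    return match.group(1) if match else None


print(name_from_repr("Column<'age'>"))             # age
print(name_from_repr("Column<'CAST(a AS INT)'>"))  # CAST(a AS INT)
print(name_from_repr("not a column"))              # None
```

With the property, a caller would read `col._name` directly instead of round-tripping through `repr(col)`.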

Does this PR introduce any user-facing change?

Yes -- a new Column._name property is available. There is no change to any
existing behavior.

How was this patch tested?

Added test_name_property to ColumnTestsMixin, so it runs under both the
classic (pyspark.sql.tests.test_column) and Spark Connect parity
(pyspark.sql.tests.connect.test_parity_column) suites. It checks concrete
values and the invariant repr(col) == "Column<'%s'>" % col._name. Doctests
were also added on the new property.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: GitHub Copilot CLI (Claude Opus 4.8)


uros-b · 2026-06-24T11:12:36Z

+        name inside a helper function. The leading underscore avoids a collision
+        with the existing :func:`name` method (an alias for :func:`alias`).
+
+        .. versionadded:: 5.0.0


Suggested change

-        .. versionadded:: 5.0.0
+        .. versionadded:: 4.3.0

uros-b · 2026-06-24T11:14:10Z


+    @property
+    def _name(self) -> str:
+        return self._expr.__repr__()


Note: undocumented Classic/Connect output divergence. The docstring example cast("int") -> 'CAST(a AS INT)' happens to agree on both backends, but Spark's own CastExpression.__repr__ (connect/expressions.py:1005-1008) documents that cast("long") renders as CAST(a AS BIGINT) on Classic and CAST(a AS LONG) on Connect. _name inherits this divergence, so the PR's stated motivating use case ("branch on the name inside a helper function") is backend-dependent and unreliable.
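A minimal sketch of the failure mode, using only the two strings quoted in this comment. `is_long_cast` is a hypothetical helper of the kind the PR description motivates; no Spark session is involved.

```python
# Rendered names for sf.col("a").cast("long"), as quoted in the comment above.
CLASSIC_NAME = "CAST(a AS BIGINT)"
CONNECT_NAME = "CAST(a AS LONG)"


def is_long_cast(name: str) -> bool:
    # Hypothetical helper that branches on the exact name string,
    # written and tested against the Classic rendering only.
    return name == "CAST(a AS BIGINT)"


print(is_long_cast(CLASSIC_NAME))  # True
print(is_long_cast(CONNECT_NAME))  # False
```

The same logical column takes different branches depending on the backend, which is the unreliability flagged above.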

uros-b · 2026-06-24T11:15:07Z

        ...

+    @property
+    def _name(self) -> str:


Note: a leading-underscore name on a documented, versionadded-tagged public API is a design smell: it is private by Python convention, yet it appears in the docs and in tab completion. The collision-avoidance motivation with respect to Column.name is real, but an alternative name (col_name / expr_name) would be cleaner.

uros-b · 2026-06-24T11:15:29Z


+    @property
+    def _name(self) -> str:
+        return self._expr.__repr__()


Suggested change

-        return self._expr.__repr__()
+        return repr(self._expr)

uros-b · 2026-06-24T11:16:07Z

+    def test_name_property(self):
+        # SPARK-38483: Column._name exposes the column name/alias shown in repr
+        self.assertEqual(sf.col("a")._name, "a")
+        self.assertEqual(sf.col("a").cast("int")._name, "CAST(a AS INT)")


Beyond the plain column name, the new test has one concrete assert on an expression (CAST(a AS INT)) plus a tautological invariant loop (repr(col) == "Column<'%s'>" % col._name); please consider adding concrete expected strings for alias and arithmetic expressions (e.g. col("x").alias("y")._name == "x AS y").
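To illustrate why the invariant alone proves little, here is a sketch with a hypothetical `BuggyColumn` whose `_name` is wrong on purpose. It assumes, as the implementation appears to do, that `__repr__` is built from the same string as `_name`; the class is a stand-in and not PySpark code.

```python
class BuggyColumn:
    """Hypothetical column whose _name is deliberately wrong."""

    def __init__(self, expr: str) -> None:
        self._expr = expr

    @property
    def _name(self) -> str:
        # Deliberate bug: upper-cases the expression text.
        return self._expr.upper()

    def __repr__(self) -> str:
        # Assumption: repr is derived from the same string as _name.
        return "Column<'%s'>" % self._name


col = BuggyColumn("x AS y")

# The invariant from the test still holds, because both sides share the bug.
invariant_holds = repr(col) == "Column<'%s'>" % col._name
# A concrete expected string catches it.
concrete_holds = col._name == "x AS y"

print(invariant_holds)  # True
print(concrete_holds)   # False
```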

uros-b reviewed Jun 24, 2026

AgenticSpark wants to merge 1 commit into apache:master from AgenticSpark:agenticspark/SPARK-38483-column-name
