[SPARK-38483][PYTHON][CONNECT] Add Column._name property exposing a column's name #56726
AgenticSpark wants to merge 1 commit into
…olumn's name
Adds a `_name` property to the PySpark `Column` class that returns the column's name, alias, or expression as a string, mirroring what is shown inside `Column.__repr__`. This makes it easy to reuse a column's name (e.g. re-aliasing an expression with the source column's name) or to branch on the name inside a helper function. The leading underscore intentionally avoids a collision with the existing `Column.name` method, which is an alias for `Column.alias`. Implemented for both Spark Classic (`self._jc.toString()`) and Spark Connect (`self._expr.__repr__()`). Tested with a new case in `ColumnTestsMixin` (exercised by both the classic and Connect parity suites) plus doctests on the new property.
| name inside a helper function. The leading underscore avoids a collision | ||
| with the existing :func:`name` method (an alias for :func:`alias`). | ||
|
|
||
| .. versionadded:: 5.0.0 |
| .. versionadded:: 5.0.0 | |
| .. versionadded:: 4.3.0 |
|
|
||
| @property | ||
| def _name(self) -> str: | ||
| return self._expr.__repr__() |
Note: undocumented Classic/Connect output divergence. The docstring's `cast("int")` -> `'CAST(a AS INT)'` happens to agree, but Spark's own `CastExpression.__repr__` (connect/expressions.py:1005-1008) documents `cast("long")` -> `CAST(a AS BIGINT)` (Classic) vs `CAST(a AS LONG)` (Connect). `_name` inherits this, so the PR's stated motivating use case ("branch on the name inside a helper function") is backend-dependent and unreliable.
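To make the concern concrete, here is a minimal sketch. It uses the two strings quoted in this comment as plain literals (copied from the comment, not captured from a running Spark session) and a hypothetical helper, `is_long_cast`, that branches on the exact name.

```python
# Strings as quoted in the review comment; not produced by a live session.
CLASSIC_NAME = "CAST(a AS BIGINT)"  # Classic output, per the comment
CONNECT_NAME = "CAST(a AS LONG)"    # Connect output, per the comment


def is_long_cast(name: str) -> bool:
    """Hypothetical helper that branches on the exact column name."""
    return name == "CAST(a AS BIGINT)"


# The same logical column gives different answers depending on the backend.
classic_result = is_long_cast(CLASSIC_NAME)
connect_result = is_long_cast(CONNECT_NAME)
print(classic_result, connect_result)
```

If those strings are accurate, a helper written against one backend silently takes the other branch on the second backend.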
| ... | ||
|
|
||
| @property | ||
| def _name(self) -> str: |
Note: leading-underscore name on a documented, versionadded-tagged public API is a design smell (private by Python convention yet appears in docs/tab-completion). The collision-avoidance motivation vs Column.name is real, but an alternative (col_name / expr_name) would be cleaner.
|
|
||
| @property | ||
| def _name(self) -> str: | ||
| return self._expr.__repr__() |
| return self._expr.__repr__() | |
| return repr(self._expr) |
| def test_name_property(self): | ||
| # SPARK-38483: Column._name exposes the column name/alias shown in repr | ||
| self.assertEqual(sf.col("a")._name, "a") | ||
| self.assertEqual(sf.col("a").cast("int")._name, "CAST(a AS INT)") |
The new test has one concrete assert (CAST(a AS INT)) plus a tautological invariant loop (repr(col) == "Column<'%s'>" % col._name); please consider adding a concrete expected string for alias/arithmetic (e.g. col("x").alias("y")._name == "x AS y").
What changes were proposed in this pull request?
This adds a `_name` property to the PySpark `Column` class that returns the column's name, alias, or expression as a string -- the same string shown inside `Column.__repr__` (`Column<'...'>`). It is implemented for both Spark Classic (`self._jc.toString()`) and Spark Connect (`self._expr.__repr__()`).
The leading underscore intentionally avoids a collision with the existing `Column.name` method, which is an alias for `Column.alias`.
Why are the changes needed?
Requested in SPARK-38483.
Having the name available as an attribute enables convenient patterns, e.g.
re-aliasing an expression with the source column's name, or branching on a
column's name inside a helper function:
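A minimal sketch of the second pattern (branching on a column's name) might look like the following. Because `_name` is only proposed in this PR, the sketch uses a hypothetical stand-in class, `FakeColumn`, rather than a real `pyspark.sql.Column`; the helper `is_sensitive` and the column names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    expr: str

    @property
    def _name(self) -> str:
        # Mirrors the proposed property: the string shown inside repr().
        return self.expr


def is_sensitive(col: FakeColumn) -> bool:
    """Hypothetical helper that branches on the column's name."""
    return col._name in {"ssn", "email"}


columns = [FakeColumn("id"), FakeColumn("email")]
visible = [c._name for c in columns if not is_sensitive(c)]
print(visible)  # ['id']
```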
Previously the name was only obtainable by parsing `repr(col)`.
Does this PR introduce any user-facing change?
Yes -- a new `Column._name` property is available. There is no change to any existing behavior.
How was this patch tested?
Added `test_name_property` to `ColumnTestsMixin`, so it runs under both the classic (`pyspark.sql.tests.test_column`) and Spark Connect parity (`pyspark.sql.tests.connect.test_parity_column`) suites. It checks concrete values and the invariant `repr(col) == "Column<'%s'>" % col._name`. Doctests were also added on the new property.
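The invariant can be illustrated without Spark. The sketch below works on plain strings with two hypothetical helpers: `column_repr` applies the same format string as the test, and `name_from_repr` is the kind of `repr` parsing the description says was previously needed.

```python
import re

# Matches the Column<'...'> wrapper and captures what is inside the quotes.
_COLUMN_REPR = re.compile(r"Column<'(.*)'>", re.DOTALL)


def column_repr(name: str) -> str:
    """Build the repr string from a name, using the test's format string."""
    return "Column<'%s'>" % name


def name_from_repr(text: str) -> str:
    """Recover the name by parsing a Column repr string (hypothetical helper)."""
    match = _COLUMN_REPR.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Column repr: {text!r}")
    return match.group(1)


# Round trip: parsing the repr gives back the same name.
for name in ["a", "CAST(a AS INT)"]:
    assert name_from_repr(column_repr(name)) == name
```

Because the check only restates how the repr is built from the name, it holds for any string, which is why concrete expected values carry more weight in the test.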
Was this patch authored or co-authored using generative AI tooling?
Generated-by: GitHub Copilot CLI (Claude Opus 4.8)