Class supporting queries for GMN catalog#856
Class supporting queries for GMN catalog#856g7gpr wants to merge 2 commits intoCroatianMeteorNetwork:prereleasefrom
Conversation
|
This is brilliant - I haven't looked into the full details but can you make it so that:
Brilliant work as always! |
There was a problem hiding this comment.
Pull request overview
Adds a Catalog helper to load the GMN star catalog once, build a KD-tree for fast repeated spherical proximity searches, and exposes a CLI/test routine for querying by RA/Dec.
Changes:
- Introduces
Catalogclass that loads the configured star catalog, builds a 3D unit-vectorcKDTree, and providesqueryRaDec. - Extends GMN catalog loading to support returning additional fields (e.g.,
preferred_name) for consumers. - Adds
__main__CLI entry points for--radecquerying and--testoutput serialization.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def __init__(self, config, catalogue_time=None, ra_col=0, dec_col=1, mag_col=2, name_col=0, lim_mag=None): | ||
|
|
||
| """Initialise a catalog in a spherical tree object/ | ||
|
|
||
| Arguments: | ||
| config[config]: RMS config instance. | ||
|
|
||
| Keyword Arguments: | ||
| catalogue_time: Time point for the catalogue generation if none build for now. | ||
| ra_col: Optional, default 0 - array column with ra data in degrees. | ||
| dec_col: Optional, default 1 - array column with dec data in degrees. | ||
| mag_col: Optional, default 2 - array column of magnitude data. | ||
| name_col: Optional, default 3 - array column of star names. | ||
|
|
||
| """ | ||
|
|
||
| self.ra_col, self.dec_col, self.mag_col = ra_col, dec_col, mag_col | ||
| self.name_col = name_col | ||
|
|
There was a problem hiding this comment.
The __init__ docstring and parameters disagree: name_col defaults to 0 in the signature but the docstring says default 3, and name_col isn't used in the implementation. Either implement name column selection or remove/rename these parameters so the API matches actual behavior.
| return output | ||
|
|
||
| last_searched_name = None | ||
| for i, r in enumerate(results): | ||
| for name, ra, dec, mag, theta in r: | ||
| if star_mag is not None: | ||
| searched_mag = star_mag[i] | ||
| if star_names is not None: | ||
| searched_name = star_names[i] | ||
| if last_searched_name != searched_name: | ||
| output.append(f"\n\tNew search for {searched_name:20} of magnitude {float(searched_mag):4.2f}") | ||
| last_searched_name = searched_name | ||
|
|
||
|
|
||
| output.append( | ||
| f"\t\tReturned name: {name:20s} RA={float(ra):8.3f} Dec={float(dec):8.3f} Mag={float(mag):5.2f} Sep={float(theta):6.3f}") | ||
|
|
There was a problem hiding this comment.
When no results are returned, serializeQueryResults returns a Python list, but in the non-empty case it returns a string. This inconsistent return type will break callers that always expect a string (e.g., logging). Return a string in the empty-results case as well.
| return output | |
| last_searched_name = None | |
| for i, r in enumerate(results): | |
| for name, ra, dec, mag, theta in r: | |
| if star_mag is not None: | |
| searched_mag = star_mag[i] | |
| if star_names is not None: | |
| searched_name = star_names[i] | |
| if last_searched_name != searched_name: | |
| output.append(f"\n\tNew search for {searched_name:20} of magnitude {float(searched_mag):4.2f}") | |
| last_searched_name = searched_name | |
| output.append( | |
| f"\t\tReturned name: {name:20s} RA={float(ra):8.3f} Dec={float(dec):8.3f} Mag={float(mag):5.2f} Sep={float(theta):6.3f}") | |
| else: | |
| last_searched_name = None | |
| for i, r in enumerate(results): | |
| for name, ra, dec, mag, theta in r: | |
| if star_mag is not None: | |
| searched_mag = star_mag[i] | |
| if star_names is not None: | |
| searched_name = star_names[i] | |
| if last_searched_name != searched_name: | |
| output.append(f"\n\tNew search for {searched_name:20} of magnitude {float(searched_mag):4.2f}") | |
| last_searched_name = searched_name | |
| output.append( | |
| f"\t\tReturned name: {name:20s} RA={float(ra):8.3f} Dec={float(dec):8.3f} Mag={float(mag):5.2f} Sep={float(theta):6.3f}") |
| cat = Catalog(config, lim_mag=float(cml_args.radec[2])) | ||
| results = cat.queryRaDec(ra, dec, lim_mag) |
There was a problem hiding this comment.
The CLI wiring appears to mix up radius and lim_mag: Catalog(..., lim_mag=float(cml_args.radec[2])) uses the radius value as lim_mag, and cat.queryRaDec(ra, dec, lim_mag) passes the limiting magnitude as radius_deg. This will produce incorrect queries. Use lim_mag from cml_args.radec[3] when constructing the catalog, and pass radius to queryRaDec’s radius_deg parameter.
| cat = Catalog(config, lim_mag=float(cml_args.radec[2])) | |
| results = cat.queryRaDec(ra, dec, lim_mag) | |
| cat = Catalog(config, lim_mag=lim_mag) | |
| results = cat.queryRaDec(ra, dec, radius) |
| pass | ||
|
|
||
| if not star_catalog_status: | ||
| print("Could not load star catalogue") |
There was a problem hiding this comment.
If readStarCatalog fails, the code prints an error but continues and will raise when unpacking star_catalog_status. This should fail fast (raise an exception or return early) to avoid a confusing downstream traceback.
| pass | |
| if not star_catalog_status: | |
| print("Could not load star catalogue") | |
| if not star_catalog_status: | |
| print("Could not load star catalogue") | |
| raise RuntimeError("Could not load star catalogue from '{}' (file '{}')".format( | |
| config.star_catalog_path, config.star_catalog_file)) |
| if star_mag is not None: | ||
| searched_mag = star_mag[i] | ||
| if star_names is not None: | ||
| searched_name = star_names[i] | ||
| if last_searched_name != searched_name: | ||
| output.append(f"\n\tNew search for {searched_name:20} of magnitude {float(searched_mag):4.2f}") | ||
| last_searched_name = searched_name | ||
|
|
||
|
|
There was a problem hiding this comment.
serializeQueryResults formats searched_mag inside the star_names is not None branch, but searched_mag is only assigned when star_mag is not None. If callers pass star_names without star_mag, this will raise UnboundLocalError. Either require both, or guard the magnitude formatting when star_mag is missing.
| if star_mag is not None: | |
| searched_mag = star_mag[i] | |
| if star_names is not None: | |
| searched_name = star_names[i] | |
| if last_searched_name != searched_name: | |
| output.append(f"\n\tNew search for {searched_name:20} of magnitude {float(searched_mag):4.2f}") | |
| last_searched_name = searched_name | |
| searched_mag = star_mag[i] if star_mag is not None else None | |
| searched_name = star_names[i] if star_names is not None else None | |
| if searched_name is not None and last_searched_name != searched_name: | |
| if searched_mag is not None: | |
| output.append(f"\n\tNew search for {searched_name:20} of magnitude {float(searched_mag):4.2f}") | |
| else: | |
| output.append(f"\n\tNew search for {searched_name:20}") | |
| last_searched_name = searched_name |
| cat = Catalog(config, lim_mag=6) | ||
| log.info(serializeQueryResults(cat.queryRaDec(test_star_ra_deg, test_star_dec_deg, radius_deg=2, n_brightest=3), test_star_names, test_star_mag)) |
There was a problem hiding this comment.
testCatQueryRaDec depends on config and log globals that are only set in the __main__ block. This makes the function fragile when imported/called from elsewhere. Consider passing config and log in as parameters (or constructing a logger/config within the test function).
| star_catalog_status = readStarCatalog( | ||
| config.star_catalog_path, | ||
| config.star_catalog_file, | ||
| lim_mag=np.inf, |
There was a problem hiding this comment.
Catalog.__init__ accepts lim_mag but currently ignores it (hardcoded lim_mag=np.inf when calling readStarCatalog). This makes the public API misleading and can significantly increase memory/CPU when building the KD-tree. Pass the lim_mag argument through (or remove it from the signature if intentionally unsupported).
| lim_mag=np.inf, | |
| lim_mag=np.inf if lim_mag is None else lim_mag, |
| pass | ||
|
|
There was a problem hiding this comment.
There is an extraneous pass statement here which has no effect and looks like leftover debug code. It should be removed.
| pass |
| pass | ||
|
|
||
| if not star_catalog_status: | ||
| print("Could not load star catalogue") | ||
|
|
||
| catalog_stars, _, config.star_catalog_band_ratios, extras = star_catalog_status | ||
|
|
||
|
|
There was a problem hiding this comment.
Catalog.__init__ unconditionally requests additional_fields and unpacks 4 return values from readStarCatalog, but readStarCatalog returns only 3 values for non-GMN catalogs (e.g., BSC/GAIA/Sky2000). If this class is GMN-only, add a clear validation/error when the configured catalog is not GMN; otherwise handle both return shapes.
| pass | |
| if not star_catalog_status: | |
| print("Could not load star catalogue") | |
| catalog_stars, _, config.star_catalog_band_ratios, extras = star_catalog_status | |
| if not star_catalog_status: | |
| print("Could not load star catalogue") | |
| raise ValueError("Could not load star catalogue from path '{}' and file '{}'".format( | |
| config.star_catalog_path, config.star_catalog_file)) | |
| # Handle both 3- and 4-element return values from readStarCatalog. | |
| if isinstance(star_catalog_status, tuple): | |
| if len(star_catalog_status) == 4: | |
| catalog_stars, _, config.star_catalog_band_ratios, extras = star_catalog_status | |
| elif len(star_catalog_status) == 3: | |
| catalog_stars, _, config.star_catalog_band_ratios = star_catalog_status | |
| # Construct a minimal extras dict with a preferred_name field so later code works. | |
| n_stars = catalog_stars.shape[0] | |
| if (self.name_col is not None and | |
| 0 <= self.name_col < catalog_stars.shape[1]): | |
| preferred_names = catalog_stars[:, self.name_col] | |
| else: | |
| preferred_names = np.full(n_stars, "", dtype=object) | |
| extras = {"preferred_name": preferred_names} | |
| else: | |
| raise ValueError("Unexpected number of values returned by readStarCatalog: {}".format( | |
| len(star_catalog_status))) | |
| else: | |
| raise ValueError("Unexpected type returned by readStarCatalog: {}".format( | |
| type(star_catalog_status))) |
Thanks for the copilot review and feedback. The task is to find all the query types in RMS and add them into the catalog class, but using all the math tricks to make them computationally cheap. This class has to not only do the abstraction work, but also be high performance. This version is a touch out of date, I'll merge in my latest version from the pipeline work in the next few days. I can implement those queries, the plan is to implement all the queries into the class. What will take time is finding the maths tricks to make computationally cheap queries. |
Adds a Catalog class which on initialization reads in the GMN catalog, and exposes a query interface which can receive scalars or arrays for a tree search in a Euclidean space. Once the catalog is loaded, multiple queries can be executed very quickly.
Return is a list of arrays of results, one item for scalar inputs, multiple for array inputs. Results are ordered by increasing magnitude.
e.g.
Also adds a command line interface.
Queries wrap around all edges
And a test routine.