Skip to content

dashboard with compute_graph#264

Open
lazarusA wants to merge 26 commits into
mainfrom
la/ups_dashboard
Open

dashboard with compute_graph#264
lazarusA wants to merge 26 commits into
mainfrom
la/ups_dashboard

Conversation

@lazarusA
Copy link
Copy Markdown
Member

supersedes #235

lazarusA and others added 9 commits March 24, 2026 17:11
* git fetch-depth 0

* changelog fetch
Co-authored-by: CompatHelper Julia <compathelper_noreply@julialang.org>
* gpu, cpu devices

* pfuff, of course symbol or variable indexing will not work on gpu, that needs to be done in the outer loop, refactoring genericHybrid is needed for that

* GPU support

* All tests passing, rewrote mask/nan checks.

* gdev only on batch level, some fixes to dimdata

* dimdata works with gpu

* cleaned

* Dimensional data

* formatting

* cleaned commented code

* MultiNN Support

* Is this necessary?

* tests

* computers are magic, I guess

* to_tuple not needed anymore

* lstm gpu changes -> does break other things?

* GPUArraysCore add

* fallback for functions and callables

* toDataFrame for LSTM

* runic

* local docu

* gpu tutorial

* ne Lux

* working on gpu cluster?

* gpu example - here gpu slower

* new tune and tutorial gpu

* lets try macOS for docs build

* quotes

* we need names, not sure tune works like this

* tune api adapted

* some fixes

* smaller subset bigger NN

* runic + docu tune

* fair timing?

* save_training

* fix cond

* nothing

* tweaks

* benchmarks

* separate lines and runic

* bigger NN

* get rid of gc?

* don't mention gpu for now

---------

Co-authored-by: qfl3x <qfl3x@protonmail.com>
Co-authored-by: Bernhard Ahrens <bahrens@bgc-jena.mpg.de>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces GPU acceleration support for hybrid model training, including device-aware configurations and refactored data structures that utilize tuples for features, forcings, and masks to improve batching efficiency. It also adds new Makie-based visualization recipes and comprehensive documentation for GPU usage. Review feedback identifies that the MonitorPlot and ScatterTarget recipes have empty implementations and suggests improvements for efficiency in batch checking, robustness in the R² loss calculation, and the removal of redundant file operations during model saving.

Comment on lines +10 to +12
function Makie.plot!(p::MonitorPlot)
return p
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The Makie.plot! implementation for MonitorPlot is currently empty. This recipe should be implemented to visualize the training and validation monitors as intended for the dashboard functionality.

Comment on lines +20 to +22
function Makie.plot!(p::ScatterTarget)
return p
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The Makie.plot! implementation for ScatterTarget is empty. Please implement the plotting logic to visualize predictions versus observations.

Comment thread src/training/epoch.jl
Comment on lines +35 to 37
function isemptybatch(mask)
return all(x -> all(x .== 0), mask)
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The isemptybatch function uses x .== 0, which creates a temporary boolean array for each element in the mask. Using iszero with all is more efficient as it avoids temporary allocations and can short-circuit during evaluation.

function isemptybatch(mask)
    return all(x -> all(iszero, x), mask)
end

Comment thread src/io/save.jl
Comment on lines +11 to 12
isfile(file_name) && rm(file_name)
return jldopen(file_name, "w") do file
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The manual removal of the file via rm(file_name) is redundant because jldopen(file_name, "w") automatically truncates the file if it already exists. Removing it manually can also cause issues if the file is locked or being accessed by another process.

    return jldopen(file_name, "w") do file

Comment thread src/losses/loss_fn.jl
function loss_fn(ŷ, y, y_nan, ::Val{:r2})
r = cor(ŷ[y_nan], y[y_nan])
return r * r
return 1 - sum((y[y_nan] .- ŷ[y_nan]) .^ 2) / sum((y[y_nan] .- mean(y[y_nan])) .^ 2)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The $R^2$ implementation using y[y_nan] creates data copies. Furthermore, it is susceptible to NaN results if the mask y_nan is empty or if the observations have zero variance (leading to division by zero). Consider pre-calculating the mean of observations and using a more robust calculation that handles these edge cases.

@lazarusA
Copy link
Copy Markdown
Member Author

getting there (for the LSTM case) @BernhardAhrens

Screenshot 2026-04-30 at 11 41 26

I still need to create the other recipes, but in principle this refactor looks promising.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants