-
Notifications
You must be signed in to change notification settings - Fork 6
Convergence, bootstrap, p-values #316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Changes from all commits
e80389c
1623119
8f2eaf1
6f1246c
20c8b63
0eb04ef
5e5575b
c27706d
24a7947
aa62557
fdca879
ccf8c0d
1adb1fa
9478d55
2e37710
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,39 +1,166 @@ | ||
| """ | ||
| bootstrap( | ||
| fitted::SemFit, | ||
| specification::SemSpecification; | ||
| statistic = solution, | ||
| n_boot = 3000, | ||
| data = nothing, | ||
| engine = :Optim, | ||
| parallel = false, | ||
| fit_kwargs = Dict(), | ||
| replace_kwargs = Dict()) | ||
|
|
||
| Return bootstrap samples for `statistic`. | ||
|
|
||
| # Arguments | ||
| - `fitted`: a fitted SEM. | ||
| - `specification`: a `ParameterTable` or `RAMMatrices` object passed to `replace_observed`. | ||
| - `statistic`: any function that can be called on a `SemFit` object. | ||
| The output will be returned as the bootstrap sample. | ||
| - `n_boot`: number of boostrap samples | ||
| - `data`: data to sample from. Only needed if different than the data from `sem_fit` | ||
| - `engine`: optimizer engine, passed to `fit`. | ||
| - `parallel`: if `true`, run bootstrap samples in parallel on all available threads. | ||
| The number of threads is controlled by the `JULIA_NUM_THREADS` environment variable or | ||
| the `--threads` flag when starting Julia. | ||
| - `fit_kwargs` : a `Dict` controlling model fitting for each bootstrap sample, | ||
| passed to `fit` | ||
| - `replace_kwargs`: a `Dict` passed to `replace_observed` | ||
|
|
||
| # Example | ||
| ```julia | ||
| # 1000 boostrap samples of the minimum, fitted with :Optim | ||
| bootstrap( | ||
| fitted; | ||
| statistic = StructuralEquationModels.minimum, | ||
| n_boot = 1000, | ||
| engine = :Optim, | ||
| ) | ||
| ``` | ||
| """ | ||
| function bootstrap( | ||
| fitted::SemFit, | ||
| specification::SemSpecification; | ||
| statistic = solution, | ||
| n_boot = 3000, | ||
| data = nothing, | ||
| engine = :Optim, | ||
| parallel = false, | ||
| fit_kwargs = Dict(), | ||
| replace_kwargs = Dict(), | ||
| ) | ||
| # access data and convert to matrix | ||
| data = prepare_data_bootstrap(data, fitted.model) | ||
| start = solution(fitted) | ||
| # pre-allocations | ||
| out = [] | ||
| conv = [] | ||
| # fit to bootstrap samples | ||
| if !parallel | ||
| for _ in 1:n_boot | ||
| sample_data = bootstrap_sample(data) | ||
| new_model = replace_observed( | ||
| fitted.model; | ||
| data = sample_data, | ||
| specification = specification, | ||
| replace_kwargs..., | ||
| ) | ||
| new_fit = fit(new_model; start_val = start, engine = engine, fit_kwargs...) | ||
| sample = statistic(new_fit) | ||
| c = converged(new_fit) | ||
| push!(out, sample) | ||
| push!(conv, c) | ||
| end | ||
| else | ||
| n_threads = Threads.nthreads() | ||
| # Pre-create one independent model copy per thread via deepcopy. | ||
| model_pool = Channel(n_threads) | ||
| for _ in 1:n_threads | ||
| put!(model_pool, deepcopy(fitted.model)) | ||
| end | ||
| # fit models in parallel | ||
| lk = ReentrantLock() | ||
| Threads.@threads for _ in 1:n_boot | ||
| thread_model = take!(model_pool) | ||
| sample_data = bootstrap_sample(data) | ||
| new_model = replace_observed( | ||
| thread_model; | ||
| data = sample_data, | ||
| specification = specification, | ||
| replace_kwargs..., | ||
| ) | ||
| new_fit = fit(new_model; start_val = start, engine = engine, fit_kwargs...) | ||
| sample = statistic(new_fit) | ||
| c = converged(new_fit) | ||
| lock(lk) do | ||
| push!(out, sample) | ||
| push!(conv, c) | ||
| end | ||
| put!(model_pool, thread_model) | ||
| end | ||
| end | ||
| return Dict( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's better to use |
||
| :samples => out, | ||
| :n_boot => n_boot, | ||
| :n_converged => isempty(conv) ? 0 : sum(conv), | ||
| :converged => conv, | ||
| ) | ||
| end | ||
|
|
||
| """ | ||
| se_bootstrap( | ||
| sem_fit::SemFit; | ||
| fitted::SemFit, | ||
| specification::SemSpecification; | ||
| n_boot = 3000, | ||
| data = nothing, | ||
| specification = nothing, | ||
| parallel = false, | ||
| kwargs...) | ||
| fit_kwargs = Dict(), | ||
| replace_kwargs = Dict()) | ||
|
|
||
| Return bootstrap standard errors. | ||
|
|
||
| # Arguments | ||
| - `fitted`: a fitted SEM. | ||
| - `specification`: a `ParameterTable` or `RAMMatrices` object passed to `replace_observed`. | ||
| - `n_boot`: number of boostrap samples | ||
| - `data`: data to sample from. Only needed if different than the data from `sem_fit` | ||
| - `specification`: a `ParameterTable` or `RAMMatrices` object passed down to `replace_observed`. | ||
| Necessary for FIML models. | ||
| - `engine`: optimizer engine, passed to `fit`. | ||
| - `parallel`: if `true`, run bootstrap samples in parallel on all available threads. | ||
| The number of threads is controlled by the `JULIA_NUM_THREADS` environment variable or | ||
| the `--threads` flag when starting Julia. | ||
| - `kwargs...`: passed down to `replace_observed` | ||
| - `fit_kwargs` : a `Dict` controlling model fitting for each bootstrap sample, | ||
| passed to `sem_fit` | ||
| - `replace_kwargs`: a `Dict` passed to `replace_observed` | ||
|
|
||
| # Example | ||
| ```julia | ||
| # 1000 boostrap samples, fitted with :NLopt | ||
| using NLopt | ||
|
|
||
| se_bootstrap( | ||
| fitted; | ||
| n_boot = 1000, | ||
| engine = :NLopt, | ||
| ) | ||
| ``` | ||
| """ | ||
| function se_bootstrap( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should be possible to make |
||
| fitted::SemFit{Mi, So, St, Mo, O}; | ||
| fitted::SemFit, | ||
| specification::SemSpecification; | ||
| n_boot = 3000, | ||
| data = nothing, | ||
| specification = nothing, | ||
| engine = :Optim, | ||
| parallel = false, | ||
| kwargs..., | ||
| ) where {Mi, So, St, Mo, O} | ||
| fit_kwargs = Dict(), | ||
| replace_kwargs = Dict(), | ||
| ) | ||
| # access data and convert to matrix | ||
| data = prepare_data_bootstrap(data, fitted.model) | ||
| start = solution(fitted) | ||
| # pre-allocations | ||
| total_sum = zero(start) | ||
| total_squared_sum = zero(start) | ||
| n_failed = Ref(0) | ||
| n_conv = Ref(0) | ||
| # fit to bootstrap samples | ||
| if !parallel | ||
| for _ in 1:n_boot | ||
|
|
@@ -42,14 +169,15 @@ function se_bootstrap( | |
| fitted.model; | ||
| data = sample_data, | ||
| specification = specification, | ||
| kwargs..., | ||
| replace_kwargs..., | ||
| ) | ||
| try | ||
| sol = solution(fit(new_model; start_val = start)) | ||
| new_fit = fit(new_model; start_val = start, engine = engine, fit_kwargs...) | ||
| sol = solution(new_fit) | ||
| conv = converged(new_fit) | ||
| if conv | ||
| n_conv[] += 1 | ||
| @. total_sum += sol | ||
| @. total_squared_sum += sol^2 | ||
| catch | ||
| n_failed[] += 1 | ||
| end | ||
| end | ||
| else | ||
|
|
@@ -63,37 +191,37 @@ function se_bootstrap( | |
| lk = ReentrantLock() | ||
| Threads.@threads for _ in 1:n_boot | ||
| thread_model = take!(model_pool) | ||
| try | ||
| sample_data = bootstrap_sample(data) | ||
| new_model = replace_observed( | ||
| thread_model; | ||
| data = sample_data, | ||
| specification = specification, | ||
| kwargs..., | ||
| ) | ||
| sol = solution(fit(new_model; start_val = start)) | ||
| sample_data = bootstrap_sample(data) | ||
| new_model = replace_observed( | ||
| thread_model; | ||
| data = sample_data, | ||
| specification = specification, | ||
| replace_kwargs..., | ||
| ) | ||
| new_fit = fit(new_model; start_val = start, engine = engine, fit_kwargs...) | ||
| sol = solution(new_fit) | ||
| conv = converged(new_fit) | ||
| if conv | ||
| lock(lk) do | ||
| n_conv[] += 1 | ||
| @. total_sum += sol | ||
| @. total_squared_sum += sol^2 | ||
| end | ||
| catch | ||
| lock(lk) do | ||
| n_failed[] += 1 | ||
| end | ||
|
Comment on lines
-80
to
-82
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To avoid locking, you can also use |
||
| finally | ||
| put!(model_pool, thread_model) | ||
| end | ||
| put!(model_pool, thread_model) | ||
| end | ||
| end | ||
| # compute parameters | ||
| n_conv = n_boot - n_failed[] | ||
| n_conv = n_conv[] | ||
| sd = sqrt.(total_squared_sum / n_conv - (total_sum / n_conv) .^ 2) | ||
| if !iszero(n_failed[]) | ||
| @warn "During bootstrap sampling, "*string(n_failed[])*" models did not converge" | ||
| end | ||
| @info string(n_conv)*" models converged" | ||
| return sd | ||
| end | ||
|
|
||
| ############################################################################################ | ||
| ### Helper Functions | ||
| ############################################################################################ | ||
|
|
||
| function bootstrap_sample(data::Matrix) | ||
| nobs = size(data, 1) | ||
| index_new = rand(1:nobs, nobs) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| _doc_normal_CI = """ | ||
| (1) normal_CI(fitted, se; α = 0.05, name_lower = :ci_lower, name_upper = :ci_upper) | ||
|
|
||
| (2) normal_CI!(partable, fitted, se; α = 0.05, name_lower = :ci_lower, name_upper = :ci_upper) | ||
|
|
||
| Return normal-theory confidence intervals for all model parameters. | ||
| `normal_CI!` additionally writes the result into `partable`. | ||
|
|
||
| # Arguments | ||
| - `fitted`: a fitted SEM. | ||
| - `se`: standard errors for each parameter, e.g. from [`se_hessian`](@ref) or | ||
| [`se_bootstrap`](@ref). | ||
| - `partable`: a [`ParameterTable`](@ref) to write confidence intervals to. | ||
| - `α`: significance level. Defaults to `0.05` (95% intervals). | ||
| - `name_lower`: column name for the lower bound in `partable`. Defaults to `:ci_lower`. | ||
| - `name_upper`: column name for the upper bound in `partable`. Defaults to `:ci_upper`. | ||
|
|
||
| # Returns | ||
| - a `Dict` with keys `name_lower` and `name_upper`, each mapping to a vector of bounds | ||
| over all parameters. | ||
| """ | ||
|
|
||
| @doc "$(_doc_normal_CI)" | ||
| function normal_CI(fitted, se; α = 0.05, name_lower = :ci_lower, name_upper = :ci_upper) | ||
| qnt = quantile(Normal(0, 1), 1-α/2); | ||
| sol = solution(fitted) | ||
| return Dict(name_lower => sol - qnt*se, name_upper => sol + qnt*se) | ||
| end | ||
|
|
||
| @doc "$(_doc_normal_CI)" | ||
| function normal_CI!( | ||
| partable, | ||
| fitted, | ||
| se; | ||
| α = 0.05, | ||
| name_lower = :ci_lower, | ||
| name_upper = :ci_upper, | ||
| ) | ||
| cis = normal_CI(fitted, se; α, name_lower, name_upper) | ||
| update_partable!(partable, name_lower, fitted, cis[name_lower]) | ||
| update_partable!(partable, name_upper, fitted, cis[name_upper]) | ||
| return cis | ||
| end |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also preallocate
out = Vector{Any}(nothing, n_boot)andconv = fill(false, n_boot)(orconv = Vector{Union{Missing, Bool}}(missing, n_boot)), then you can avoid locks and just set the results directlyout[i] = ....The failed iterations should have
out[i] === nothing, you can filter them out when generating the result.If you are using
convwith true/false/missing logic, you also don't have to count failed iterations as they will haveconv[i] === missing.