sched: add a reference count to the TCB to prevent it from being deleted. #17468
hujun260 wants to merge 1 commit into apache:master
The branch was updated from f192057 to cae0ffd.
Some kernel features cannot be made optional via configuration switches, because disabling them would introduce bugs.
anchao left a comment:
I won't let this submission be merged:
- Code size will increase.
- Performance will degrade.

If you can't resolve these two issues, please stop changing the status of this PR.
Currently, however, code that uses nxsched_get_tcb() cannot be guaranteed safe even inside a critical section, and that matters more than the other factors. Furthermore, this patch shrinks the scope of critical sections, which improves the system's real-time performance. The code size increase is acceptable because the added safety justifies it.
Zephyr doesn't provide this kind of safety either. If your proposal can be shown to apply to any other RTOS, I will approve this submission.
We found an issue in NuttX, so why drag other RTOSes into this? I'm not even developing on Zephyr.
Additionally, please stop setting the status manually: GitHub sends out unnecessary push notifications and emails each time you do.
I will go ahead and submit my proposal.
If the only way to avoid the performance hit is to add a noref method, then what's the point of having ref in the first place? Should we use noref everywhere instead? Why should I bear the extra overhead of the ref implementation?
Without a refcount, a use-after-free occurs under stress testing of task/thread creation and destruction. Use-after-free is a critical bug that must be fixed. Reference counting is a well-known way to fix this type of error. If you have a better suggestion or method, please point it out.
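The sketch below shows why a reference count closes the use-after-free window. It is minimal and rests on some assumptions:
- `struct obj`, `obj_create()`, `obj_get()`, `obj_put()` and `g_freed` are hypothetical illustration names, not NuttX APIs.
- C11 atomics are used for the counter.

The deleter never frees the object directly. It only drops its own reference, so the memory is released when the last holder calls `obj_put()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a TCB. */

struct obj
{
  atomic_int refs;  /* Number of outstanding references */
  int id;           /* Payload that readers access */
};

int g_freed;        /* Counts frees so the object's lifetime is observable */

/* Create an object; the creator (owner) holds the first reference. */

struct obj *obj_create(int id)
{
  struct obj *o = malloc(sizeof(*o));

  if (o != NULL)
    {
      atomic_init(&o->refs, 1);
      o->id = id;
    }

  return o;
}

/* Take an extra reference.  The caller must already hold a valid pointer. */

struct obj *obj_get(struct obj *o)
{
  atomic_fetch_add(&o->refs, 1);
  return o;
}

/* Drop a reference; whoever drops the last one frees the object. */

void obj_put(struct obj *o)
{
  int prev = atomic_fetch_sub(&o->refs, 1);  /* Returns the old value */

  assert(prev > 0);  /* Best-effort check for an unbalanced put */

  if (prev == 1)
    {
      free(o);
      g_freed++;
    }
}
```

Consider a reader that takes a reference, after which the owner "deletes" the object by calling `obj_put()`. In that sequence `g_freed` stays 0 and the reader can still read `id`. The object is freed only at the reader's own `obj_put()`.

Note that `obj_get()` on its own does not make a pid-to-pointer lookup safe. The lookup and the increment still need a short lock, so that deletion cannot slip in between them.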
Setting aside the naming issue for now: the top priority is to ensure the kernel incurs no performance or footprint impact, right? Isn't the biggest issue here that all developers must bear the extra performance overhead for the sake of this so-called security, which many of them don't need?
|
One more update for you all:
No, I am only considering the POSIX spec, not private business logic. The POSIX spec allows applications to create and destroy tasks/threads dynamically, so we need to fix the problem. Let's highlight the inviolable rule here:
It's a bug fix. I don't understand why we would add an option to choose whether to fix a critical (memory corruption) bug.
But this is not a bug on my end: in my commercial solution, threads are never created or deleted dynamically. If you insist on merging this commit, make it configurable so that it won't block existing users. This is my final concession.
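The sketch below shows what "make it configurable" could look like. It rests on these assumptions:
- `CONFIG_SCHED_TCB_REFCOUNT` is a hypothetical Kconfig-style macro invented for this example. It is not an existing NuttX option.
- `struct tcb_sketch`, `tcb_sketch_get()` and `tcb_sketch_put()` are illustrative names.

When the macro is undefined, both the counter field and the get/put bodies disappear, so such a build pays no size or speed cost. The counterargument raised in this thread still applies: that build also keeps the use-after-free window open.

```c
#include <assert.h>

/* CONFIG_SCHED_TCB_REFCOUNT is hypothetical and used only for this sketch.
 * Leave it undefined to model a build with reference counting compiled out.
 */

struct tcb_sketch
{
  int pid;
#ifdef CONFIG_SCHED_TCB_REFCOUNT
  int refs;      /* Present only when reference counting is enabled */
#endif
};

static inline void tcb_sketch_get(struct tcb_sketch *tcb)
{
#ifdef CONFIG_SCHED_TCB_REFCOUNT
  tcb->refs++;   /* Real code would need an atomic or a lock here */
#else
  (void)tcb;     /* Disabled: no field and no work */
#endif
}

static inline void tcb_sketch_put(struct tcb_sketch *tcb)
{
#ifdef CONFIG_SCHED_TCB_REFCOUNT
  tcb->refs--;   /* Real code would free the TCB when this reaches zero */
#else
  (void)tcb;
#endif
}

#ifndef CONFIG_SCHED_TCB_REFCOUNT
/* With the option off, the structure carries no extra field. */

static_assert(sizeof(struct tcb_sketch) == sizeof(int),
              "no refcount field when the option is disabled");
#endif
```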
…deleted.

The goal is to replace the large lock with smaller ones and to reduce the large locks related to the TCB. In many scenarios we only need to guarantee that the TCB won't be released, rather than to lock it, which also reduces the possibility of lock recursion.

Signed-off-by: hujun5 <hujun5@xiaomi.com>





Summary
Implement reference counting for task control blocks (TCBs) to enable fine-grained
synchronization and reduce reliance on global task locks. This mechanism allows code
to safely access TCBs without holding large locks, reducing contention and lock
recursion while ensuring TCBs remain valid during access.
This PR should be merged together with apache/nuttx-apps#3246.
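The sketch below illustrates the idea in the Summary: a lock protects only the lookup and the reference increment, and the caller then uses the TCB with no lock held. It makes these assumptions:
- A pthread mutex stands in for the kernel's short critical section.
- `struct task_sketch`, `task_create()`, `task_get()`, `task_put()`, `task_delete()` and the pid table are hypothetical, not NuttX's `struct tcb_s` or the PR's code.

Deletion first unpublishes the entry and then drops the table's reference. A caller that already pinned the task can therefore keep reading it safely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_TASKS 8

/* Hypothetical stand-in for a TCB (not NuttX's struct tcb_s). */

struct task_sketch
{
  int pid;
  int priority;
  int refs;                              /* Protected by g_table_lock */
};

pthread_mutex_t g_table_lock = PTHREAD_MUTEX_INITIALIZER;
struct task_sketch *g_table[MAX_TASKS];  /* pid -> task, NULL if unused */
int g_released;                          /* Number of tasks freed so far */

/* Publish a new task; the table owns the initial reference. */

int task_create(int pid, int priority)
{
  struct task_sketch *t;

  if (pid < 0 || pid >= MAX_TASKS)
    {
      return -1;
    }

  t = malloc(sizeof(*t));
  if (t == NULL)
    {
      return -1;
    }

  t->pid = pid;
  t->priority = priority;
  t->refs = 1;

  pthread_mutex_lock(&g_table_lock);
  assert(g_table[pid] == NULL);
  g_table[pid] = t;
  pthread_mutex_unlock(&g_table_lock);
  return 0;
}

/* Look up and pin a task; the lock covers only the lookup + increment. */

struct task_sketch *task_get(int pid)
{
  struct task_sketch *t;

  if (pid < 0 || pid >= MAX_TASKS)
    {
      return NULL;
    }

  pthread_mutex_lock(&g_table_lock);
  t = g_table[pid];
  if (t != NULL)
    {
      t->refs++;
    }

  pthread_mutex_unlock(&g_table_lock);
  return t;
}

/* Release a reference; the last holder frees the task outside the lock. */

void task_put(struct task_sketch *t)
{
  int last;

  pthread_mutex_lock(&g_table_lock);
  assert(t->refs > 0);
  last = (--t->refs == 0);
  pthread_mutex_unlock(&g_table_lock);

  if (last)
    {
      free(t);
      g_released++;
    }
}

/* Delete: unpublish first so no new lookup can find it, then drop the
 * table's reference.
 */

void task_delete(int pid)
{
  struct task_sketch *t;

  if (pid < 0 || pid >= MAX_TASKS)
    {
      return;
    }

  pthread_mutex_lock(&g_table_lock);
  t = g_table[pid];
  g_table[pid] = NULL;
  pthread_mutex_unlock(&g_table_lock);

  if (t != NULL)
    {
      task_put(t);
    }
}
```

Because the lock is dropped before the caller touches the task, code that runs while the task is pinned never has to re-enter that lock. This is the mechanism the description credits with reducing lock recursion and contention.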
Problem Statement
The current implementation relies on large global locks to protect task data structures.
The new reference counting approach provides a lightweight alternative that only
guarantees TCB validity rather than exclusive access, reducing:
Key Changes
1. Core Implementation
nxsched_get_tcb() and nxsched_put_tcb() for reference counting
2. Scheduler Updates (72 files)
Usage Pattern
/* New: reference counting */

struct tcb_s *tcb = nxsched_get_tcb(pid);
if (tcb != NULL)
  {
    /* ... safely access tcb; the reference keeps it from being freed ... */

    nxsched_put_tcb(tcb);
  }
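The main hazard in this pattern is an early return that skips nxsched_put_tcb(). In a reference-counting scheme, a missed put typically leaks a reference, so the TCB is never freed.

The sketch below keeps get and put balanced by using a single exit label. It makes these assumptions:
- `stub_get_tcb()`, `stub_put_tcb()`, `struct tcb_stub` and `get_user_priority()` are hypothetical names.
- The stubs only count outstanding references and stand in for the real APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NTCBS 2

/* Hypothetical stand-ins for nxsched_get_tcb()/nxsched_put_tcb(); they
 * only track how many references are outstanding.
 */

struct tcb_stub
{
  int pid;
  int priority;
  int is_kernel_thread;
};

struct tcb_stub g_tcbs[NTCBS] =
{
  { 0, 0, 1 },     /* pid 0: a kernel thread */
  { 1, 100, 0 },   /* pid 1: a user task */
};

int g_outstanding; /* References taken but not yet released */

struct tcb_stub *stub_get_tcb(int pid)
{
  if (pid < 0 || pid >= NTCBS)
    {
      return NULL;
    }

  g_outstanding++;
  return &g_tcbs[pid];
}

void stub_put_tcb(struct tcb_stub *tcb)
{
  assert(tcb != NULL && g_outstanding > 0);
  g_outstanding--;
}

/* Example caller: every path that obtained a TCB releases it exactly once. */

int get_user_priority(int pid, int *prio)
{
  struct tcb_stub *tcb = stub_get_tcb(pid);
  int ret = 0;

  if (tcb == NULL)
    {
      return -ESRCH;  /* Nothing was taken, so nothing to release */
    }

  if (tcb->is_kernel_thread)
    {
      ret = -EPERM;   /* Error path still goes through the put below */
      goto out;
    }

  *prio = tcb->priority;

out:
  stub_put_tcb(tcb);
  return ret;
}
```

After the success, error and not-found calls, `g_outstanding` is back to 0. This is the balance the real get/put pair needs on every code path.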
Impact
tcb release
Test objective: nucleo-g431rb:nsh

After this patch:
size nuttx
   text    data     bss     dec     hex filename
 127964     904    8644  137512   21928 nuttx

Before this patch:
size nuttx
   text    data     bss     dec     hex filename
 127232     904    8612  136748   2162c nuttx

Flash (text) size increases by 732 bytes; bss increases by 32 bytes.
Testing
Tested on hardware:
esp32s3-devkit:nsh
user_main: scheduler lock test
sched_lock: Starting lowpri_thread at 97
sched_lock: Set lowpri_thread priority to 97
sched_lock: Starting highpri_thread at 98
sched_lock: Set highpri_thread priority to 98
sched_lock: Waiting...
sched_lock: PASSED No pre-emption occurred while scheduler was locked.
sched_lock: Starting lowpri_thread at 97
sched_lock: Set lowpri_thread priority to 97
sched_lock: Starting highpri_thread at 98
sched_lock: Set highpri_thread priority to 98
sched_lock: Waiting...
sched_lock: PASSED No pre-emption occurred while scheduler was locked.
sched_lock: Finished
End of test memory usage:
VARIABLE BEFORE AFTER
======== ======== ========
arena 5d8bc 5d8bc
ordblks 7 6
mxordblk 548a0 548a0
uordblks 5014 5014
fordblks 588a8 588a8
Final memory usage:
VARIABLE BEFORE AFTER
======== ======== ========
arena 5d8bc 5d8bc
ordblks 1 6
mxordblk 59238 548a0
uordblks 4684 5014
fordblks 59238 588a8
user_main: Exiting
ostest_main: Exiting with status 0
nsh> uname -a
NuttX 12.11.0 ef91333e3ac-dirty Dec 10 2025 16:11:04 xtensa esp32s3-devkit
nsh>