
Fixes redundant hash in ghcide cache path (getCacheDirsDefault)#4854

Merged
fendor merged 5 commits into haskell:master from adpad-13:fix-double-hash-4843
Mar 11, 2026

Conversation

@adpad-13
Contributor

@adpad-13 adpad-13 commented Feb 27, 2026

Fixes #4843

@fendor
Collaborator

fendor commented Feb 27, 2026

Hi, thank you for your PR.

You give no reasoning or justification for your fix. I encourage you to think about it in more detail or to reach out with questions if you do not understand why the code is how it is right now.
Also, commit messages should include reasoning for the change so that we understand better in the future why a change was made.

@adpad-13 adpad-13 force-pushed the fix-double-hash-4843 branch from db2a6b5 to fa2a354 on February 28, 2026 09:35
@adpad-13
Contributor Author

adpad-13 commented Feb 28, 2026

Hi, thanks for the feedback. I'm pretty new to open source and this codebase, so I really appreciate the guidance.
You were completely right to ask for the reasoning. From what I found, sometimes the upstream code passes down an ID that already has a hash in it, but the getCacheDirsDefault function doesn't check for that and blindly appends the hash again, causing the main-hash-hash duplication.
I added the isSuffixOf check just to stop it from adding a second hash if one is already there. I've also updated the commit to explain this.
I'm not familiar with the upstream cradle code yet, but if you think it's better to fix the root cause up there instead of doing this check here, I'm happy to give it a try; I just guessed that touching the upstream logic might be a much bigger and heavier change.
Let me know what you think is best.
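For context, the guard described in this comment can be sketched as follows; `appendHashUnlessPresent` is an illustrative stand-in, not the actual HLS code from the patch:

```haskell
import Data.List (isSuffixOf)

-- Illustrative stand-in for the guard described above, not the actual
-- HLS code: only append the component hash to the cache-dir prefix if
-- it is not already there.
appendHashUnlessPresent :: String -> String -> String
appendHashUnlessPresent hash prefix
  | ('-' : hash) `isSuffixOf` prefix = prefix            -- hash already appended upstream
  | otherwise                        = prefix ++ "-" ++ hash
```

As the later discussion shows, this only helps when the existing suffix is the *same* hash, which is exactly the weakness fendor points out next.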

@fendor fendor changed the title Fix #4843 : Fixes redundant hash in ghcide cache path (getCacheDirsDe… Fixes redundant hash in ghcide cache path (getCacheDirsDefault) Feb 28, 2026
@fendor
Collaborator

fendor commented Feb 28, 2026

If you validate your PR against the cache dirs on your local system, do you think your patch would work?
Iirc, there is no guarantee the hashes would be the same, and I think they aren't always identical.

Fixing upstream is a far better solution than monkey patching in HLS, imo :)
However, not appending the hash when there is a hash already can be entirely achieved in HLS, I believe.

@adpad-13
Contributor Author

adpad-13 commented Feb 28, 2026

Ah, thanks, I just realised that now: if the hashes are different, the second one will be appended no matter what.
To handle it entirely in HLS like you mentioned, we could add a check for whether a valid Cabal hash is already present in the ID and skip appending a new one. I'll have to look up exactly how Cabal formats its hashes.

@fendor
Collaborator

fendor commented Mar 1, 2026

Out of curiosity, how did you figure out that this hash is appended by cabal, or that it is a cabal hash?

@adpad-13
Contributor Author

adpad-13 commented Mar 1, 2026

I'm still learning how the different layers interact, so I mostly figured this out by looking at my local files. I was not really sure that the hash was added by cabal, but since the prefix already had an appended hash, I assumed it was. After checking the dist-newstyle cache and running a verbose build, I saw that the ids there actually do not have that hash. However, the folders being generated in ~/.cache/ghcide/ definitely have a hash attached to the prefix. This led me to deduce that the hash is being appended somewhere after the build process but before the id is passed to ghcide, so I assumed that cabal was adding those hashes.
Since I wasn't sure exactly where that upstream logic lives or how it works, I focused on adding logic to getCacheDirsDefault, as discussed earlier, to handle these pre-hashed IDs.
After looking at my local files, I noticed most of them used 40-digit hashes, so I updated the logic and set a lower limit of 32 characters for the hasHashSuffix check, just to be safe.
It checks whether the suffix of the uid given to the function contains valid hex digits.
[screenshot: IMG-20260301-WA0013.jpg]

My local tests confirm this prevents the double-hash.
[screenshot: IMG-20260301-WA0014.jpg]

Let me know if you'd like me to push the commit.
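The suffix check described above might look roughly like this; `hasHashSuffix` here is a stand-alone sketch of the idea (40 hex digits would be a SHA1, 32 the lower bound chosen above), not the exact code from the branch:

```haskell
import Data.Char (isHexDigit)

-- Stand-alone sketch of the suffix check described above, not the
-- exact patch: treat a unit id as already hashed when its last
-- '-'-separated segment is a plausible hex hash (>= 32 hex digits,
-- so both MD5-length and SHA1-length suffixes qualify).
hasHashSuffix :: String -> Bool
hasHashSuffix uid =
  let lastSeg = reverse (takeWhile (/= '-') (reverse uid))
  in  length lastSeg >= 32 && all isHexDigit lastSeg
```

For example, `hasHashSuffix "main"` is False, while a unit id ending in a 40-hex-digit segment passes the check.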

@fendor
Collaborator

fendor commented Mar 1, 2026

My local tests confirm this prevents the double-hash.

The screenshot you are sending never had any double hashes; the double hashes are only added when the unit id is main. So I am curious: what local tests did you perform?


Your reasoning seems very circular. Are you saying you assume it is a cabal hash, although cabal-generated output doesn't appear to use this hash, but you have no better explanation than it being a cabal hash?
To cut to the chase, no, this is not a cabal hash, we generate the first hash here: https://github.com/haskell/haskell-language-server/blob/master/ghcide/session-loader/Development/IDE/Session.hs#L1171C43-L1171C44

The idea of stripping the last part of the hash is wrong. Both hashes are present for a reason.
While I fully acknowledge that it is basically impossible for you to know why these hashes are necessary, you need to be careful when proposing changes you do not understand the ramifications of.

I updated the original issue description, which hopefully explains the solution in more detail, instead of simply stripping the last hash away.

@adpad-13
Contributor Author

adpad-13 commented Mar 1, 2026

Oh, I'm sorry, I meant to share these:
[screenshot: before fix]
[screenshot: after fix]

Your reasoning seems very circular. Are you saying you assume it is a cabal hash, although cabal-generated output doesn't appear to use this hash, but you have no better explanation than it being a cabal hash? To cut to the chase, no, this is not a cabal hash, we generate the first hash here: https://github.com/haskell/haskell-language-server/blob/master/ghcide/session-loader/Development/IDE/Session.hs#L1171C43-L1171C44

You are totally right that my explanation was confusing. To clarify my thought process: I initially assumed it was Cabal adding the hash, but when I checked the dist-newstyle cache and verbose build logs, I realized Cabal wasn't generating it. I deduced the hash was being appended somewhere between the build and ghcide, but since I was fixated on shortening the path by stripping the last hash, I never traced exactly where that was happening. Thank you for pointing me directly to those lines of code.

The idea of stripping the last part of the hash is wrong. Both hashes are present for a reason. While I fully acknowledge that it is basically impossible for you to know why these hashes are necessary, you need to be careful when proposing changes you do not understand the ramifications of.

Understood, I will drop the idea of stripping the last part.

I updated the original issue description, which hopefully explains the solution in more detail, instead of simply stripping the last hash away.

Thanks for the update. I see now the goal is to thread that original ByteString hash to getCacheDirsDefault and combine them into a single hash.
I'm going to take a few days to properly trace the data flow through Session.hs and work on this refactor. Thanks for the patience and guidance!

@adpad-13 adpad-13 force-pushed the fix-double-hash-4843 branch 2 times, most recently from 7e001d5 to 5901a7b on March 4, 2026 13:17
@fendor
Collaborator

fendor commented Mar 7, 2026

Thank you, this looks much better now. I still somewhat dislike the tuple; what do you say to replacing the tuple you modified with a proper datatype?

What do you think about this?

Referring to this tuple in particular: (DynFlags, [GHC.Target], Maybe B.ByteString)

A potential name could be something like HomeUnitConfig, but I am not 100% sure.

Also, please make sure you use whitespace consistently, e.g. adding a space after a comma in tuples: (a, b) instead of (a,b).

Afterwards, I think this is ready to merge :)

Note, you will also have to rebase this PR, as we have merged quite a number of changes to Session.hs recently.
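For illustration, the datatype fendor proposes could take a shape like the following; the stand-in types and field names here are guesses, since DynFlags and GHC.Target live in the GHC API and the merged code may differ:

```haskell
-- Stand-in types so the sketch compiles without the GHC API; in HLS
-- these would be GHC's DynFlags, GHC.Target, and a strict ByteString.
data DynFlags = DynFlags
data Target   = Target
type ByteString = String

-- One possible shape for the datatype proposed in place of
-- (DynFlags, [GHC.Target], Maybe B.ByteString); the name is the
-- reviewer's tentative suggestion and the field names are guesses.
data HomeUnitConfig = HomeUnitConfig
  { homeUnitDynFlags :: DynFlags
  , homeUnitTargets  :: [Target]
  , homeUnitHash     :: Maybe ByteString
  }
```

A named record makes each component self-documenting at use sites, which is the usual motivation for replacing a three-element tuple.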

@adpad-13
Contributor Author

adpad-13 commented Mar 7, 2026

Hi, thanks for the heads-up. I see all the hashing logic was moved to Ghc.hs; I agree with using a datatype named HomeUnitConfig instead of that tuple.

@adpad-13 adpad-13 force-pushed the fix-double-hash-4843 branch 3 times, most recently from fe503b8 to cb74d59 on March 7, 2026 20:37
@adpad-13
Contributor Author

adpad-13 commented Mar 10, 2026

Hey @fendor,
I've rebased the PR and pushed the updates based on your feedback. Specifically, I added a HomeUnitConfig type for the monadic tuple and also applied the hashing in Ghc.hs (since the hashing logic was moved there).
Kindly have another look whenever you're free.

Collaborator

@fendor fendor left a comment


LGTM, thank you!

Only small documentation issues and nitpicks!
Also, one more rebase, please!

, sessionLoadingOptions = newSessionLoadingOptions
}


Collaborator


Suggested change

toFlagsMap TargetDetails{..} =
[ (l, (targetEnv, targetDepends)) | l <- targetLocations]


Collaborator


Suggested change


-- | Mapping from @hie.yaml@ to all components that have been loaded
-- from this @hie.yaml@ location.

Collaborator


Suggested change

return (Map.insert k res mp, res)
Just res -> return (mp, res)


Collaborator



Comment on lines +434 to +435
getCacheDirsDefault :: String -> Maybe B.ByteString -> [String] -> IO CacheDirs
getCacheDirsDefault prefix mFirstHash opts = do
Collaborator


As this function got more complicated, let's give it some proper haddock docs!
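A possible shape for such haddock docs, with stand-in types so the fragment compiles outside HLS; the wording is only a guess at what the reviewer is asking for, not the merged documentation:

```haskell
-- Stand-ins so the sketch type-checks without the HLS codebase; in
-- ghcide, CacheDirs and the strict ByteString come from the real modules.
type ByteString = String
data CacheDirs = CacheDirs { hiCacheDir, hieCacheDir, oCacheDir :: Maybe FilePath }

-- | Compute the cache directories for a component.
--
-- The prefix comes from the component's unit id. When the optional
-- hash (generated in 'setOptions' for the @main@ unit) is present, it
-- is folded into the final hash so that only one hash suffix appears
-- in the cache path. The list of GHC options is hashed as well, so
-- distinct flag sets get distinct cache directories.
getCacheDirsDefault :: String -> Maybe ByteString -> [String] -> IO CacheDirs
getCacheDirsDefault _prefix _mFirstHash _opts =
  pure (CacheDirs Nothing Nothing Nothing)  -- stub body, for the sketch only
```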

-- | Maps cradle dependencies, such as `stack.yaml`, or `.cabal` file
-- to last modification time. See Note [Multi Cradle Dependency Info].
, rawComponentDependencyInfo :: DependencyInfo
-- | The raw ByteString for the hash generated in setoptions for the uid "main"
Collaborator


Suggested change
-- | The raw ByteString for the hash generated in setoptions for the uid "main"
-- | An optional hash seed generated in 'setOptions' for the unit id "main".

If you quote function names and definitions like this, they are linked in HLS, and you can even use gotoDefinition on setOptions from the docs.

import qualified Data.Map.Strict as Map
import Data.Maybe
import Data.Proxy

Collaborator



…ted hashBytes inside the monadic tuple (earlier, it was only returning the flags and the target).

* To thread this down, I added a new rawComponentHash field as a Maybe B.ByteString to the RawComponentInfo data type. This passes the hash from setOptions straight into getCacheDirsDefault.
* Inside getCacheDirsDefault, I added a check:
  * If we have Just mFirstHash, I set the prefix to 'main'. Then, I initiate a new hash and update it with mFirstHash using H.updates, storing this mutated context as baseCtx.
  * If it's Nothing, baseCtx just gets a standard H.init.
* Finally, for opts_hash, it takes that baseCtx and updates it with the opts string (converted via B.pack). This produces the final 40-character hex string for the SHA1 hash.

(Also updated getCacheDirs in the session loading to accept the ByteString.)
This keeps the path short and avoids the Windows failures caused by over-long paths.
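The hashing flow described in the bullets above can be sketched as follows. The real code uses SHA1 (H.init / H.updates from a SHA1 package); this sketch substitutes a toy FNV-1a hash so it runs without extra packages, and `cacheDirSuffix` is an illustrative name, not the actual function:

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Numeric (showHex)

-- Toy stand-in for the SHA1 context; the shape mirrors the PR
-- description: start from a base context, optionally fold in the
-- upstream hash, then fold in the option strings, and emit a single
-- hex suffix instead of two concatenated hashes.
type Ctx = Word

fnvInit :: Ctx
fnvInit = 14695981039346656037          -- FNV-1a 64-bit offset basis

fnvUpdate :: Ctx -> String -> Ctx
fnvUpdate = foldl' (\h c -> (h `xor` fromIntegral (ord c)) * 1099511628211)

cacheDirSuffix :: Maybe String -> [String] -> String
cacheDirSuffix mFirstHash opts =
  let baseCtx = maybe fnvInit (fnvUpdate fnvInit) mFirstHash  -- fold in upstream hash if present
  in  showHex (foldl' fnvUpdate baseCtx opts) ""              -- one combined hex suffix
```

The point of the design is that the presence of the upstream hash changes the single emitted suffix rather than appending a second one, keeping the cache path short.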
…o Ghc.hs after rebase, also updated getCacheDirsDefault to handle the Maybe B.ByteString in Session.hs
@adpad-13 adpad-13 force-pushed the fix-double-hash-4843 branch from cb74d59 to 31d55e2 on March 11, 2026 10:35
@adpad-13 adpad-13 force-pushed the fix-double-hash-4843 branch from 31d55e2 to c61dc27 on March 11, 2026 10:47
Collaborator

@fendor fendor left a comment


One copy-paste mistake

Co-authored-by: fendor <fendor@users.noreply.github.com>
@fendor fendor enabled auto-merge (squash) March 11, 2026 11:28
@adpad-13
Contributor Author

One of the checks is failing, can you please re-run it?

@fendor fendor merged commit cf36960 into haskell:master Mar 11, 2026
55 of 57 checks passed


Development

Successfully merging this pull request may close these issues.

Double hashes in ghcide cache path causes long path

2 participants