Hi, thanks for the review and the constructive feedback! I completely agree with standardizing the modding approach, but I'd like to clarify why the official

1. Why the official API fails on

Microsoft heavily strips private type information from Office PDBs. Additionally, because types like

I tested dozens of exact mangled/undecorated string variations with

2. Regarding the hardcoded Windhawk internal path:

3. Hooking

I'd love to use the official

[The Test Mod]

As you will see in the code below, the array includes the raw native mangled names, meticulously formatted undecorated names (with various spacing, struct/class, and modifier differences), bare function names, wildcard strings, and even the flawed signatures generated by IDA Pro. The goal was simple: if the symbol can be resolved by the official API in any format, this exhaustive list should catch it with at least one entry.

```cpp
// ==WindhawkMod==
// @id word-pdf-lossless-api-test-ultimate
// @name Word PDF Lossless (API Ultimate Test)
// @description The final absolute shotgun test for Windhawk SYMBOL_HOOK API.
// @version 1.0
// @author Joe Ye
// @include winword.exe
// @compilerOptions -lversion
// ==/WindhawkMod==

#include <windhawk_utils.h>
#include <windows.h>

#include <atomic>

// =============================================================
// Basic structure definitions
// =============================================================
namespace Gdiplus {
struct PointF {
    float X;
    float Y;
};
}  // namespace Gdiplus

std::atomic<bool> g_bMsoHooked{false};

// =============================================================
// Hook proxy function definitions
// =============================================================
// x86 only (the mangled names below use the 32-bit __thiscall encoding):
// a __fastcall hook with a dummy EDX parameter has the same ABI as the
// __thiscall original (this in ECX, remaining arguments on the stack).
typedef HRESULT(__thiscall* HrComputeSize_t)(void* pThis, float* p1, const Gdiplus::PointF* p2);
HrComputeSize_t pOrig_HrComputeSize = nullptr;
HRESULT __fastcall Hook_HrComputeSize(void* pThis, void* edx_dummy, float* p1, const Gdiplus::PointF* p2) {
    Wh_Log(L"[Hook] DOCEXIMAGE::HrComputeSize intercepted successfully!");
    return pOrig_HrComputeSize(pThis, p1, p2);
}

typedef HRESULT(__thiscall* HrCheckForLosslessOutput_t)(void* pThis, int arg1);
HrCheckForLosslessOutput_t pOrig_HrCheckForLosslessOutput = nullptr;
HRESULT __fastcall Hook_HrCheckForLosslessOutput(void* pThis, void* edx_dummy, int arg1) {
    Wh_Log(L"[Hook] DOCEXIMAGE::HrCheckForLosslessOutput intercepted successfully!");
    return pOrig_HrCheckForLosslessOutput(pThis, arg1);
}

// =============================================================
// Core logic: Windhawk official API ultimate test
// =============================================================
void ScanAndHookMso() {
    HMODULE hMso = GetModuleHandleW(L"mso.dll");
    if (!hMso || g_bMsoHooked.exchange(true)) return;

    Wh_Log(L"[Test Ultimate] Starting official API ultimate symbol matching test...");

    WindhawkUtils::SYMBOL_HOOK officialHook[] = {
        {
            {
                // 1. Bare function name (most extreme test)
                L"HrComputeSize",
                L"DOCEXIMAGE::HrComputeSize",
                // 2. With wildcards (testing if Windhawk supports wildcard expansion)
                L"*HrComputeSize*",
                L"*DOCEXIMAGE::HrComputeSize*",
                // 3. Native mangled name
                L"?HrComputeSize@DOCEXIMAGE@@AAEJPAMPBVPointF@Gdiplus@@@Z",
                // 4. Standard exact undecorated series (various minor space and modifier differences)
                L"private: long __thiscall DOCEXIMAGE::HrComputeSize(float *,class Gdiplus::PointF const *)",
                L"private: long __thiscall DOCEXIMAGE::HrComputeSize(float *,struct Gdiplus::PointF const *)",
                L"long __thiscall DOCEXIMAGE::HrComputeSize(float *,class Gdiplus::PointF const *)",
                L"long __thiscall DOCEXIMAGE::HrComputeSize(float *,struct Gdiplus::PointF const *)",
                L"DOCEXIMAGE::HrComputeSize(float *,class Gdiplus::PointF const *)",
                L"DOCEXIMAGE::HrComputeSize(float *,struct Gdiplus::PointF const *)",
                L"DOCEXIMAGE::HrComputeSize(float*,class Gdiplus::PointF const*)",  // No-spaces version
                // 5. IDA-generated (flawed) variants
                L"public: virtual long __cdecl DOCEXIMAGE::HrComputeSize(float *,struct Gdiplus::PointF const *)",
                L"long __cdecl DOCEXIMAGE::HrComputeSize(float *,struct Gdiplus::PointF const *)",
                // Identical to the first entry of group 4
                L"private: long __thiscall DOCEXIMAGE::HrComputeSize(float *,class Gdiplus::PointF const *)"
            },
            (void**)&pOrig_HrComputeSize,
            (void*)Hook_HrComputeSize,
            false  // optional = false: a missing symbol makes HookSymbols() fail
        },
        {
            {
                // 1. Bare function name
                L"HrCheckForLosslessOutput",
                L"DOCEXIMAGE::HrCheckForLosslessOutput",
                // 2. With wildcards
                L"*HrCheckForLosslessOutput*",
                L"*DOCEXIMAGE::HrCheckForLosslessOutput*",
                // 3. Native mangled name
                L"?HrCheckForLosslessOutput@DOCEXIMAGE@@MBEJH@Z",
                // 4. Standard exact undecorated series
                L"protected: virtual long __thiscall DOCEXIMAGE::HrCheckForLosslessOutput(int)",
                L"virtual long __thiscall DOCEXIMAGE::HrCheckForLosslessOutput(int)",
                L"long __thiscall DOCEXIMAGE::HrCheckForLosslessOutput(int)",
                L"protected: long __thiscall DOCEXIMAGE::HrCheckForLosslessOutput(int)",
                L"DOCEXIMAGE::HrCheckForLosslessOutput(int)",
                // 5. IDA-generated (flawed) variants
                L"int __thiscall DOCEXIMAGE::HrCheckForLosslessOutput(int)",
                L"int DOCEXIMAGE::HrCheckForLosslessOutput(int)",
                L"protected: virtual long __thiscall DOCEXIMAGE::HrCheckForLosslessOutput(int)const"
            },
            (void**)&pOrig_HrCheckForLosslessOutput,
            (void*)Hook_HrCheckForLosslessOutput,
            false  // optional = false
        }
    };

    bool bResult = WindhawkUtils::HookSymbols(hMso, officialHook, ARRAYSIZE(officialHook));
    // Hooks registered after Wh_ModInit has returned are only queued; apply them now.
    Wh_ApplyHookOperations();

    if (bResult) {
        Wh_Log(L"[Test Ultimate] Official API returned TRUE! Shotgun coverage successful!");
    } else {
        Wh_Log(L"[Test Ultimate] Official API still returned FALSE! Proves that no matter how the name is written, it doesn't work.");
    }

    if (pOrig_HrComputeSize) {
        Wh_Log(L"[Test Ultimate] -> DOCEXIMAGE::HrComputeSize resolved successfully!");
    } else {
        Wh_Log(L"[Test Ultimate] -> DOCEXIMAGE::HrComputeSize resolution completely failed...");
    }

    if (pOrig_HrCheckForLosslessOutput) {
        Wh_Log(L"[Test Ultimate] -> DOCEXIMAGE::HrCheckForLosslessOutput resolved successfully!");
    } else {
        Wh_Log(L"[Test Ultimate] -> DOCEXIMAGE::HrCheckForLosslessOutput resolution completely failed...");
    }
}

DWORD WINAPI ScoutThread(LPVOID lpParam) {
    // Poll until Word has loaded mso.dll.
    HMODULE hMso = nullptr;
    while (!hMso) {
        hMso = GetModuleHandleW(L"mso.dll");
        if (!hMso) Sleep(500);
    }
    Sleep(500);  // Allow time for initialization
    ScanAndHookMso();
    return 0;
}

BOOL Wh_ModInit() {
    Wh_Log(L"Word PDF Lossless API Ultimate Test Loaded");
    // Symbol resolution for mso.dll is slow, so run it on a background thread.
    // Close the returned handle right away; the thread keeps running.
    HANDLE hThread = CreateThread(nullptr, 0, ScoutThread, nullptr, 0, nullptr);
    if (hThread) {
        CloseHandle(hThread);
    }
    return TRUE;
}

void Wh_ModUninit() {}
```

[Explanation of the Results]

As you can see, despite testing every possible naming permutation,

Notice the ~1.9-second processing time between starting the test (57.496) and the failure log (59.357). This indicates that Windhawk's engine is successfully downloading and parsing the massive

[Final Proof: Dumping the Exact Strings from DbgHelp]

To completely rule out the possibility that my string array was simply missing the "correct" format, I wrote a routine to hook into

Here is the exact output from the engine when it hits the targets:

As we can see, the undecorated string output by the symbol engine matches exactly what I provided in my previous test array (

This undeniably confirms that the underlying DIA parser used by Windhawk's official API fails to reconcile these specific obfuscated/stripped entries within the Office public PDB, despite being fed the exact string. This is why the fuzzy substring matching (

I am completely on board with refactoring the code to use a
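Since everything here hinges on whether two long strings are identical, a programmatic comparison is more reliable than checking by eye. The following is a minimal, portable C++ sketch, not Windhawk code: it assumes the candidate strings and the names dumped from the symbol engine are available as plain std::string lists, and it models the lookup as an exact string comparison. The helpers StripSpaces, FindExactMatch and FindWhitespaceNearMisses are hypothetical. The second pass flags dumped names that differ from a candidate only in whitespace, which an exact comparison silently treats as a miss.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Remove every whitespace character, so "(int) const" and "(int)const"
// reduce to the same string. Used for diagnostics only.
std::string StripSpaces(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (!std::isspace(static_cast<unsigned char>(c))) out.push_back(c);
    }
    return out;
}

// Models an exact-match lookup: returns the first candidate that is
// byte-for-byte identical to one of the dumped symbol names.
std::optional<std::string> FindExactMatch(const std::vector<std::string>& symbols,
                                          const std::vector<std::string>& candidates) {
    for (const auto& cand : candidates) {
        if (std::find(symbols.begin(), symbols.end(), cand) != symbols.end()) {
            return cand;
        }
    }
    return std::nullopt;
}

// Diagnostic pass: dumped names that equal some candidate once whitespace is
// ignored, but not exactly. These are the "looks identical" misses.
std::vector<std::string> FindWhitespaceNearMisses(const std::vector<std::string>& symbols,
                                                  const std::vector<std::string>& candidates) {
    std::vector<std::string> misses;
    for (const auto& sym : symbols) {
        const std::string key = StripSpaces(sym);
        for (const auto& cand : candidates) {
            if (cand != sym && StripSpaces(cand) == key) {
                misses.push_back(sym);
                break;
            }
        }
    }
    return misses;
}
```

For example, with a dumped name "protected: virtual long __thiscall Foo::Bar(int) const" and a candidate "protected: virtual long __thiscall Foo::Bar(int)const" (illustrative strings, not the ones from this thread), FindExactMatch finds nothing while FindWhitespaceNearMisses reports the dumped name.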
I only skimmed your message and will read it carefully later, but have you tried:
Output of Windhawk Symbol Helper that contains the target function names:

Which is exactly the same as

BTW, why not embed the symbol helper in the sidebar of the mod editor? There is plenty of space. I think most people don't notice this helpful tool.
These are the decorated symbols. What are the undecorated symbols? Can you upload the DLL? If only decorated symbols can be used,

Have you tried using

That simply wasn't a priority. It could be nice, maybe one day. For now, I'll add it to the API comments.
Sure, here are all the decorated and undecorated symbols:

As I said before, they are exactly the same as what's in the test mod. Here is the DLL you asked for (along with the PDB file Windhawk downloaded, in case you ever need it): https://drive.google.com/file/d/1vp9_Oe9W09GUd8sDPlf8qOr2jluq2Gss/view?usp=drive_link
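The decorated names already encode part of what the undecorated strings have to spell out. As an illustration, here is a minimal C++ sketch (DecodeMsvcPrefix and DecodedPrefix are hypothetical names) that decodes only the prefix of a simple 32-bit MSVC-decorated function name, following the widely reverse-engineered MSVC decoration scheme: the qualified name, the access/virtual code, whether this is const, and the calling convention. It deliberately rejects templates, operators, back-references and 64-bit names rather than guessing. Applied to the two names in this thread, it shows that HrComputeSize is private, non-virtual and __thiscall (AAE), while HrCheckForLosslessOutput is protected, virtual, a const member function and __thiscall (MBE).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: the decoded prefix of a decorated function name.
struct DecodedPrefix {
    std::string qualifiedName;  // e.g. "DOCEXIMAGE::HrComputeSize"
    std::string access;         // "private", "protected", "public"; empty for globals
    bool isVirtual = false;
    bool hasThis = false;       // non-static member functions only
    bool isConstThis = false;   // 'B' qualifier on 'this' => const member function
    std::string callingConv;    // "__cdecl", "__thiscall", "__stdcall", "__fastcall"
};

// Sketch for simple 32-bit names such as "?Func@Class@@AAE...". Returns
// std::nullopt for anything it does not understand (templates, operators,
// back-references, x64 'E' markers, volatile 'this', ...).
std::optional<DecodedPrefix> DecodeMsvcPrefix(const std::string& m) {
    if (m.size() < 4 || m[0] != '?' || m[1] == '?') return std::nullopt;
    const size_t end = m.find("@@", 1);  // the name list ends with an extra '@'
    if (end == std::string::npos) return std::nullopt;

    // Fragments are stored innermost first: "?f@B@A@@" means A::B::f.
    std::vector<std::string> parts;
    for (size_t pos = 1; pos < end;) {
        const size_t at = m.find('@', pos);  // never past 'end', since m[end] == '@'
        std::string part = m.substr(pos, at - pos);
        if (part.empty() || part[0] == '?' ||
            std::isdigit(static_cast<unsigned char>(part[0]))) {
            return std::nullopt;  // template or back-reference: not handled
        }
        parts.push_back(part);
        pos = at + 1;
    }
    std::reverse(parts.begin(), parts.end());

    DecodedPrefix d;
    for (size_t k = 0; k < parts.size(); ++k) {
        if (k) d.qualifiedName += "::";
        d.qualifiedName += parts[k];
    }

    size_t i = end + 2;
    if (i >= m.size()) return std::nullopt;
    switch (m[i++]) {  // function class code (near variants only)
        case 'A': d.access = "private"; d.hasThis = true; break;
        case 'C': d.access = "private"; break;  // static
        case 'E': d.access = "private"; d.isVirtual = d.hasThis = true; break;
        case 'I': d.access = "protected"; d.hasThis = true; break;
        case 'K': d.access = "protected"; break;  // static
        case 'M': d.access = "protected"; d.isVirtual = d.hasThis = true; break;
        case 'Q': d.access = "public"; d.hasThis = true; break;
        case 'S': d.access = "public"; break;  // static
        case 'U': d.access = "public"; d.isVirtual = d.hasThis = true; break;
        case 'Y': break;  // global function
        default: return std::nullopt;
    }

    if (d.hasThis) {  // non-static members carry a cv-qualifier for 'this'
        if (i >= m.size()) return std::nullopt;
        const char cv = m[i++];
        if (cv != 'A' && cv != 'B') return std::nullopt;
        d.isConstThis = (cv == 'B');
    }

    if (i >= m.size()) return std::nullopt;
    switch (m[i]) {  // calling convention
        case 'A': d.callingConv = "__cdecl"; break;
        case 'E': d.callingConv = "__thiscall"; break;
        case 'G': d.callingConv = "__stdcall"; break;
        case 'I': d.callingConv = "__fastcall"; break;
        default: return std::nullopt;
    }
    return d;
}
```

The return and parameter types that follow the prefix (for example J for long and H for int) are left undecoded here; a real undecorator handles far more cases than this sketch.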
BTW, I'd like to point out that this method is really niche: essentially no mod has ever used this method in the whole
Could it be that the issue is simply that
mods/word-pdf-lossless-export.wh.cpp

```cpp
if (GetModuleHandleW(L"mso.dll")) {
    // If already loaded, start thread directly
    CreateThread(nullptr, 0, DelayedHookThread, nullptr, 0, nullptr);
```
The returned handle isn't closed. And why is the thread needed at all?
Interesting. Yes, adding a space solved the problem. But actually I'm still kind of favoring the

Thanks for the review! Good catch on the handle leak; I totally missed CloseHandle(). I will wrap the CreateThread calls to immediately close the returned handle, so the thread object can be released properly once the thread exits.

As for why the thread is needed: mso.dll is massive, and downloading/parsing its PDB via WindhawkUtils::HookSymbols takes several to tens of seconds. If I run ScanAndHookMso() synchronously inside Wh_ModInit or the LoadLibraryExW hook callback, it blocks that thread and causes Microsoft Word to freeze during startup while it waits for symbol resolution to finish. Offloading the symbol hooking to a background thread prevents this UI freeze and allows Word to launch smoothly.

I'll push a commit to fix the unclosed handles. Thanks for pointing that out.
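The pattern described above (start a worker, drop the handle immediately, let the worker wait for the module and then do the slow work) can be sketched portably with the C++ standard library. This is an analogy, not the mod's Win32 code: std::thread::detach() plays the role of CreateThread followed by an immediate CloseHandle, since in both cases the thread keeps running after the caller gives up its handle. RunWhenReady is a hypothetical helper; the readiness check stands in for polling GetModuleHandleW(L"mso.dll"), and the timeout is an addition that the original polling loop does not have.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: on a detached background thread, poll `isReady` every
// `interval` until it returns true (or `timeout` elapses), then run `work`.
// The caller, e.g. an init routine, returns immediately and keeps no handle.
void RunWhenReady(std::function<bool()> isReady,
                  std::function<void()> work,
                  std::chrono::milliseconds interval,
                  std::chrono::milliseconds timeout) {
    std::thread([=] {
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while (!isReady()) {
            if (std::chrono::steady_clock::now() >= deadline) {
                return;  // give up instead of polling forever
            }
            std::this_thread::sleep_for(interval);
        }
        work();  // the slow part, e.g. symbol resolution, runs off the caller's thread
    }).detach();
}
```

In a real mod, Wh_ModUninit would also need to make sure such a thread has finished (or can no longer run the mod's code) before the mod's DLL is unloaded; this sketch does not address that.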
This pull request introduces a new Windhawk mod for Microsoft Word, aiming to solve the long-standing issue of image quality loss when exporting documents to PDF. The mod hooks into Word's internal graphics pipeline to prevent image downsampling and JPEG compression, ensuring lossless export of images. It includes robust dynamic symbol scanning and adapts to both 32-bit and 64-bit Office versions.
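As a purely conceptual model of the two changes this description lists, the sketch below applies them to a hypothetical parameter struct. The real DOCEXIMAGE layout, the location of the resample flag, and how Word represents its compression choice are not shown in this PR, so ExportImageParams, Compression and ForceLossless are invented names for illustration only; the actual mod changes Word's behavior from inside the hooked functions.

```cpp
#include <cassert>

// Hypothetical model only: these types and fields are invented for
// illustration and do not describe Word's internal data structures.
enum class Compression { Jpeg, Flate };

struct ExportImageParams {
    int sourceWidth;
    int sourceHeight;
    int targetWidth;
    int targetHeight;
    bool resample;
    Compression compression;
};

// Apply the two behaviors described above: no downsampling (target size equals
// source size, resample flag cleared) and lossless FLATE instead of JPEG.
void ForceLossless(ExportImageParams& p) {
    p.targetWidth = p.sourceWidth;
    p.targetHeight = p.sourceHeight;
    p.resample = false;
    p.compression = Compression::Flate;
}
```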
Key enhancements for PDF export quality:
Image Quality Improvements
- Hooks DOCEXIMAGE::HrComputeSize to prevent downsampling by forcing the target image size to match the original and clearing the resample flag, ensuring pixel-perfect exports.
- Hooks DOCEXIMAGE::HrCheckForLosslessOutput to intercept attempts to use JPEG compression and force Word to use lossless FLATE (Zlib) compression instead, bypassing a hidden internal optimization.

Robustness and Compatibility
Documentation and User Guidance