WoW64 - aka Windows (32-bit) on Windows (64-bit) - is a subsystem that enables 32-bit Windows applications to run on 64-bit Windows. Most people today are familiar with WoW64 on Windows x64, where it lets them run x86 applications. WoW64 has been with us since Windows XP, and x64 wasn’t the only architecture where it has been available: it was also available on the IA-64 architecture, where it was responsible for emulating x86. More recently, WoW64 has also become available on ARM64, enabling emulation of both x86 and ARM32 applications.
MSDN offers a brief article on WoW64 implementation details. From it we can learn that WoW64 consists of (ignoring IA-64):
wow64.dll: translation of Nt* system calls
wow64win.dll: translation of NtUser* and other GUI-related system calls
wow64cpu.dll: support for running x86 programs on x64
wowarmhw.dll: support for running ARM32 programs on ARM64
xtajit.dll: support for running x86 programs on ARM64
Nt* system call translation, the
wow64.dll provides the core
If you have previous experience with reversing WoW64 on x64, you can notice
that it shares plenty of common code with WoW64 subsystem on ARM64. Especially
if you peeked into WoW64 of recent x64 Windows, you may have noticed that it
actually contains strings such as
SysArm32 and that some functions check
IMAGE_FILE_MACHINE_ARMNT (0x1C4) machine type:
WoW on x64 systems cannot emulate ARM32 though - it just apparently shares
common code. But
SysArm64 sound particularly interesting!
Those similarities can help anyone who is fluent in x86/x64, but not that much in ARM. Also, the Hex-Rays decompiler produces much better output for x86/x64 than for ARM32/ARM64.
Initially, my purpose with this blogpost was to get you familiar with how WoW64 works for ARM32 programs on ARM64. But because WoW64 itself changed a lot with Windows 10, and because WoW64 shares some similarities between x64 and ARM64, I decided to briefly get you through how WoW64 works in general.
Everything presented in this article is based on Windows 10 - insider preview, build 18247.
Throughout this article I’ll be using some terms I’d like to explain beforehand:
ntdll.dll - this always refers to the native ntdll.dll (x64 on Windows x64, ARM64 on Windows ARM64, …), unless stated otherwise or the context indicates otherwise.
ntdll32.dll - to make an easy distinction between native and WoW64 ntdll.dll, any WoW64 ntdll.dll will be referred to with the
emu.dll - this represents any of the emulation support DLLs (one of
module!FunctionName - refers to a symbol
module. If you’re familiar with WinDbg, you’re already familiar with this notation.
CHPE - “compiled-hybrid-PE”, a new type of PE file that looks like an x86 PE file but has ARM64 code inside it. CHPE will be tackled in more detail in the x86 on ARM64 section.
This section shows some points of interest in ntoskrnl.exe regarding WoW64 initialization. If you’re interested only in the user-mode part of WoW64, you can skip ahead to the Initialization of the WoW64 process section.
Initialization of WoW64 begins with the initialization of the kernel:
nt!PsLocateSystemDlls routine takes a pointer named
and then calls
nt!PspLocateSystemDll in a loop. Let’s figure out what’s
going on here:
nt!PspSystemDlls appears to be an array of pointers to some structure, which holds some NTDLL-related data. The order of these NTDLLs corresponds with this enum (included in the PDB):
Now, let’s see what such a structure looks like:
The nt!PspLocateSystemDll function initializes the fields of this structure. The layout of this structure unfortunately isn’t in the PDB, but you can find a reconstructed version in the appendix.
Now let’s get back to the
nt!Phase1Initialization - there’s more:
nt!PspInitializeSystemDlls routine takes a pointer named
Let’s look at it:
It looks like it’s some sort of array, again, ordered by the
Nothing unexpected - just tuples of function name and function pointer.
Did you notice the difference in the number after the NtdllExports field? On x64 there are 19, while on ARM64 there are 14. This number represents the number of items in NtdllExports - and indeed, the two sets of items differ slightly:
On the other hand, all
NtdllWow*Exports contain the same set of function names:
Notice names of second fields of these “structures”:
PsWowChpeX86SharedInformation, … If we look at the address of those fields,
we can see that they’re part of another array:
Those addresses are actually targets of the pointers in the
structure. Also, those functions combined with
give you hint that they’re related to this
enum (included in the PDB):
Notice how the order of the SharedNtdll32BaseAddress correlates with the empty field in the previous screenshot (highlighted). The set of WoW64 NTDLL functions is the same on both x64 and ARM64.
(The C representation of this data can be found in the appendix.)
Now we can tell what the
nt!PspInitializeSystemDlls function does - it gets
image base of each NTDLL (
nt!PsQuerySystemDllInfo), resolves all
Ntdll*Exports for them (
nt!RtlFindExportedRoutineByName). Also, only for
all WoW64 NTDLLs (
if ((SYSTEM_DLL_TYPE)SystemDllType > PsNativeSystemDll))
it assigns the image base to the
SharedNtdll32BaseAddress field of the
PsWow*SharedInformation array (
Let’s talk briefly about process creation. As you probably already know, the ntdll.dll is mapped as the first DLL into each created process. This applies to all architectures - x86, x64 and also ARM64. WoW64 processes aren’t an exception to this rule - they share the same initialization code path as native processes.
nt!KeInitThread // Entry-point: nt!PspUserThreadStartup
If you ever wondered how the first user-mode instruction of a newly created process gets executed, now you know the answer - a “synthetic” user-mode exception is dispatched, with
ExceptionRecord.ExceptionAddress = &PspLoaderInitRoutine,
PspLoaderInitRoutine points to the
This is the first function that is executed in every process - including WoW64
The fun part begins!
NOTE: Initialization of wow64.dll is the same on both x64 and ARM64. Any differences will be mentioned.
ntdll!LdrpLoadWow64 function is called when the
ntdll!UseWOW64 global variable is
which is set when
NtCurrentTeb()->WowTebOffset != NULL.
It constructs the full path to the
wow64.dll, loads it, and then resolves
NOTE: The resolution of these pointers is wrapped between a pair of ntdll!LdrProtectMrdata calls, responsible for protecting (1) and unprotecting (0) the .mrdata section, in which these pointers reside. MRDATA (Mutable Read Only Data) is part of the CFG (Control-Flow Guard) functionality. You can look at Alex’s slides for more information.
When these functions are successfully located, the
transfers control to the
wow64.dll by calling
Let’s go through the sequence of calls that eventually bring us to the entry-point
of the “emulated” application.
wow64!Wow64InfoPtr = (NtCurrentPeb32() + 1)
NtCurrentTeb()->TlsSlots[/* 10 */ WOW64_TLS_WOW64INFO] = wow64!Wow64InfoPtr
wow64!CpuNotifyMapViewOfSection // Process image
wow64!CpuNotifyMapViewOfSection // 32-bit NTDLL image
Wow64InfoPtr is the first initialized variable in wow64.dll. It contains data shared between the 32-bit and 64-bit execution modes. Its structure is not documented, although you can find it partially reconstructed in the appendix.
RtlWow64GetCpuAreaInfo is an internal
ntdll.dll function which is called a lot
during emulation. It is mainly used for fetching the machine type and architecture-specific
CPU context (the
CONTEXT structure) of the emulated process. This information is fetched into an undocumented
structure, which we’ll be calling
WOW64_CPU_AREA_INFO. Pointer to this structure
is then given to the
Wow64DetectMachineTypeInternal determines the machine type of the executed
process and returns it.
Wow64SelectSystem32PathInternal selects the “emulated”
System32 directory based on that machine type, e.g.
SysWOW64 for x86 processes
SysArm32 for ARM32 processes.
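The directory selection described above can be pictured as a simple lookup keyed by machine type. The following is only a minimal sketch of that idea: `select_system32_path` and the `system_root` default are hypothetical names of mine, not the actual wow64.dll code, while the machine-type constants come from the PE format.

```python
# Machine-type constants as defined by the PE/COFF format.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_ARMNT = 0x01C4

# Directory names as mentioned in the text; the table itself is illustrative.
EMULATED_SYSTEM_DIRS = {
    IMAGE_FILE_MACHINE_I386: "SysWOW64",
    IMAGE_FILE_MACHINE_ARMNT: "SysArm32",
}


def select_system32_path(machine_type, system_root="C:\\Windows"):
    """Hypothetical stand-in for Wow64SelectSystem32PathInternal."""
    try:
        directory = EMULATED_SYSTEM_DIRS[machine_type]
    except KeyError:
        raise ValueError("unsupported machine type: 0x%04X" % machine_type)
    return system_root + "\\" + directory
```

For example, `select_system32_path(0x1C4)` yields a path ending in `SysArm32`.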
You can also notice calls to
CpuNotifyMapViewOfSection function. As the name
suggests, it is also called on each “emulated” call of
NtHeaders->OptionalHeader.MajorSubsystemVersion == USER_SHARED_DATA.NtMajorVersion
NtHeaders->OptionalHeader.MinorSubsystemVersion == USER_SHARED_DATA.NtMinorVersion
If these checks pass,
CpupResolveReverseImports function is called. This function
checks if the mapped image exports the
Wow64Transition symbol and if so, it
assigns there a 32-bit pointer value returned by
Wow64Transition is mostly known to be exported by
but there are actually multiple Windows WoW DLLs which export this symbol.
You might be already familiar with the term “Heaven’s Gate” -
this is where the
Wow64Transition will point to on Windows x64 - a simple far
jump instruction which switches into long-mode (64-bit) enabled code segment.
On ARM64, the
Wow64Transition points to a “nop” function.
NOTE: Because there are no checks on the
Wow64Transition symbol is resolved for all executable images that pass the checks mentioned earlier. If you’re wondering whether Wow64Transition would be resolved for your custom executable or DLL - it indeed would!
The initialization then continues with thread-specific initialization by
ThreadInit. This is followed by pair of calls
ThunkStartupContext64TO32(CpuArea.MachineType, CpuArea.Context, NativeContext)
Wow64SetupInitialCall(&CpuArea) - these functions perform the necessary
setup of the architecture-specific WoW64
CONTEXT structure to prepare start
of the execution in the emulated environment. This is done in the exact same way as if ntoskrnl.exe had actually executed the emulated application - i.e.:
RunCpuSimulation function is called. This function just
BTCpuSimulate from the binary-translator DLL, which contains the
actual emulation loop that never returns.
wow64!Wow64ProtectMrdata // 0
ntdll!LdrLoadDll // "%SystemRoot%\system32\wow64log.dll"
wow64.dll also has its own .mrdata section and ProcessInit begins with unprotecting it. It then tries to load wow64log.dll from the constructed system directory. Note that this DLL is never present in any released Windows installation (it’s probably used internally by Microsoft for debugging of the WoW64 subsystem). Therefore, loading this DLL will normally fail. This isn’t a problem, though, because no critical functionality of the WoW64 subsystem depends on it. If the load actually succeeded, wow64.dll would try to find the following exported functions there:
If any of these functions weren’t exported, the DLL would be immediately unloaded.
If we dropped a custom wow64log.dll (which would export the functions mentioned above) into the %SystemRoot%\System32 directory, it would actually get loaded into every WoW64 process. This way we could drop in a custom logging DLL, or even inject a native DLL into every WoW64 process!
Then, certain important values are fetched from the
These contain the base address of ntdll32.dll, pointers to functions like
control flow guard information and others.
Wow64pInitializeFilePathRedirection is called, which - as the name
suggests - initializes WoW64 path redirection. The path redirection is completely
implemented in the
wow64.dll and the mechanism is basically based on string
replacement. The path redirection can be disabled and enabled by calling
function pairs. Both of these functions internally call
which directly operates on
NtCurrentTeb()->TlsSlots[/* 8 */ WOW64_TLS_FILESYSREDIR] field.
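To make the “string replacement” idea concrete, here is a minimal sketch. It is not the wow64.dll implementation: `redirect_path` is a hypothetical helper, the two directory constants are assumptions for a default installation, and the real redirector handles cases (such as exempted subdirectories) that are ignored here. The `redirection_disabled` parameter plays the role of the per-thread TLS slot.

```python
SYSTEM32 = "C:\\Windows\\System32"   # assumed native system directory
SYSWOW64 = "C:\\Windows\\SysWOW64"   # assumed x86 system directory


def redirect_path(path, redirection_disabled=False):
    """Rewrite a System32 path to SysWOW64 unless redirection is disabled."""
    if redirection_disabled:
        return path
    prefix = path[:len(SYSTEM32)]
    rest = path[len(SYSTEM32):]
    # Only rewrite when the prefix is the whole directory component,
    # so that e.g. "System32x" is left alone.
    if prefix.lower() == SYSTEM32.lower() and (rest == "" or rest.startswith("\\")):
        return SYSWOW64 + rest
    return path
```

With redirection disabled for the current thread, the path is returned untouched, which is the observable effect of flipping the TLS slot.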
The ServiceTables array is initialized. You might already be familiar with KSERVICE_TABLE_DESCRIPTOR from ntoskrnl.exe, which contains - among other things - a pointer to an array of system functions callable from user mode. ntoskrnl.exe contains 2 of these tables: one for ntoskrnl.exe itself and one for win32k.sys, aka the Windows (GUI) subsystem. wow64.dll has 4 of them!
WOW64_SERVICE_TABLE_DESCRIPTOR has the exact same structure as the
except that it is extended:
(More detailed definition of this structure is in the appendix.)
ServiceTables array is populated as follows:
ServiceTables[/* 0 */ WOW64_NTDLL_SERVICE_INDEX] = sdwhnt32
ServiceTables[/* 1 */ WOW64_WIN32U_SERVICE_INDEX] = wow64win!sdwhwin32
ServiceTables[/* 2 */ WOW64_KERNEL32_SERVICE_INDEX] = wow64win!sdwhcon
ServiceTables[/* 3 */ WOW64_USER32_SERVICE_INDEX] = sdwhbase
wow64.dll directly depends (by import table) on two DLLs: the native
wow64win.dll. This means that wow64win.dll is loaded even into “non-Windows-subsystem” processes, that wouldn’t normally load
These two symbols mentioned above are the only symbols that
Let’s have a look at
sdwhnt32 service table:
There is nothing surprising for those who already dealt with service tables in
sdwhnt32JumpTable contains an array of the system call functions, which are traditionally prefixed. WoW64 “system calls” are prefixed with
honestly I don’t have any idea what it stands for - although it might be the
case as with
Zw* prefix - it stands for nothing and is simply used as an
The job of these wh* functions is to correctly convert any arguments and return values from the 32-bit version to the native, 64-bit version. Keep in mind that this includes not only conversion of integers and pointers, but also the content of structures. Interestingly, each of the wh* functions has only one argument, which is a pointer to an array of 32-bit values. This array contains the parameters passed to the 32-bit system call.
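The “single pointer to an array of 32-bit values” convention can be illustrated with a few lines of Python. This is only a sketch of the idea: `widen_args` is a hypothetical helper, and it simply zero-extends every slot, whereas a real wh* function picks the conversion per parameter and also converts structures.

```python
import struct


def read_args32(raw, count):
    """Read `count` little-endian 32-bit values from the raw argument block."""
    return struct.unpack_from("<%dI" % count, raw, 0)


def widen_args(raw, count):
    """Re-pack the 32-bit arguments as 64-bit values (zero extension only)."""
    args = read_args32(raw, count)
    return struct.pack("<%dQ" % count, *args)
```

Three 32-bit arguments occupying 12 bytes become a 24-byte block of native-sized values with the same numeric content.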
As you could notice, in those 4 service tables there are “system calls” that
are not present in the
ntoskrnl.exe. Also, I mentioned earlier that the
Wow64Transition is resolved in multiple DLLs. Currently, these DLLs export
win32u.dll are obvious and they represent the same thing
as their native counterparts. The service tables used by
user32.dll contain functions for transformation of particular
into their 64-bit version.
It’s also worth noting that at the end of the
ntdll.dll system table, there
are several functions with
NtWow64* calls, such as
NtWow64WriteVirtualMemory64 and others. These are special functions which are
provided only to WoW64 processes.
One of these special functions is NtWow64CallFunction64. It has its own small dispatch table and callers can select which function should be called based on its index:
NOTE: I’ll be talking about one of these functions - namely Wow64CallFunctionTurboThunkControl - later in the Disabling Turbo thunks section.
This function is similar to the kernel’s
nt!KiSystemCall64 - it does the
dispatching of the system call. This function is exported by the
and imported by the emulation DLLs.
Wow64SystemServiceEx accepts 2 arguments:
The system call number isn’t just an index, but also contains index of a system
table which needs to be selected (this is also true for
This function then selects
ServiceTables[ServiceTableIndex] and calls the
wh* function based on the
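The dispatch step can be sketched as follows. Note the assumption: I split the system call value into a 12-bit call number and a 4-bit table index, mirroring the convention the kernel uses for its own service tables; the function names and the list-of-lists table layout are hypothetical and only model the lookup, not the real Wow64SystemServiceEx.

```python
def decode_service(number):
    """Split a system call value into (table index, call number).

    Assumes bits 0-11 hold the call number and bits 12-15 the table index.
    """
    return (number >> 12) & 0xF, number & 0xFFF


def system_service(service_tables, number, args):
    """Select the service table and call the handler with the 32-bit args."""
    table_index, call_number = decode_service(number)
    handler = service_tables[table_index][call_number]
    return handler(args)
```

Each handler receives the argument array as its single parameter, matching the wh* convention described earlier.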
NOTE: In case wow64log.dll has been successfully loaded, the
wow64log!Wow64LogSystemService function): once before the actual system call and once immediately after. This can be used for instrumentation of each WoW64 system call! The structure passed to Wow64LogSystemService contains all the important information about the system call - its table index, system call number, the argument list and, on the second call, even the resulting NTSTATUS! You can find the layout of this structure in the appendix (
Finally, as has been mentioned, the
KSERVICE_TABLE_DESCRIPTOR in that it contains
The code mentioned above is actually wrapped in a SEH
whService raise an exception, the
__except block calls
Wow64HandleSystemServiceError function. The function looks if the corresponding
service table which raised the exception has non-
ErrorCase and if it does,
it selects the appropriate
WOW64_ERROR_CASE for the system call. If the
NULL, the values from
ErrorCaseDefault are used. The
NTSTATUS of the
exception is then transformed according to an algorithm which can be found in the appendix.
wow64!CpuLoadBinaryTranslator // MachineType
wow64!CpuGetBinaryTranslatorPath // MachineType
ntdll!NtOpenKey // "\Registry\Machine\Software\Microsoft\Wow64\"
ntdll!NtQueryValueKey // "arm" / "x86"
ntdll!RtlGetNtSystemRoot // "arm" / "x86"
ntdll!RtlUnicodeStringPrintf // "%ws\system32\%ws"
As you’ve probably guessed, this function constructs path to the binary-translator DLL,
which is - on x64 - known as
wow64cpu.dll. This DLL will be responsible for
the actual low-level emulation.
We can see that there is no
wow64cpu.dll on ARM64. Instead, there is
used for x86 emulation and
wowarmhw.dll used for ARM32 emulation.
The CpuGetBinaryTranslatorPath function is the same on both x64 and ARM64 except for one peculiar difference: on Windows x64, if the \Registry\Machine\Software\Microsoft\Wow64\x86 key cannot be opened (is missing/was deleted), the function contains a fallback to load wow64cpu.dll. On Windows ARM64, though, it has no such fallback: if the registry key is missing, the function fails and the WoW64 process is terminated.
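The lookup and the x64-only fallback can be modeled roughly like this. It is a simplified sketch under my own assumptions: the registry is represented by a plain dictionary mapping the machine name ("x86" / "arm") to a DLL name, and `get_binary_translator_path` and its `host` parameter are hypothetical names rather than the real function signature.

```python
def get_binary_translator_path(registry, machine_key, system_root, host="x64"):
    """Build the path of the binary-translator DLL for the given machine.

    `registry` is a dict standing in for the Wow64 registry key.
    """
    dll_name = registry.get(machine_key)
    if dll_name is None:
        if host == "x64" and machine_key == "x86":
            # Fallback that exists only on Windows x64.
            dll_name = "wow64cpu.dll"
        else:
            # On ARM64 there is no fallback; the WoW64 process would be terminated.
            raise LookupError("no binary translator registered for %r" % machine_key)
    # Same shape as the "%ws\system32\%ws" format string shown above.
    return "%s\\system32\\%s" % (system_root, dll_name)
```

On an ARM64 host, a registry that maps "arm" to wowarmhw.dll produces a path to that DLL, while an empty registry raises an error instead of silently falling back.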
wow64.dll then loads the selected DLL and tries to find the following
Interestingly, not all functions need to be found - only those marked with “(!)”; the rest are optional. As a next step, the resolved
function is called, which performs binary-translator-specific process initialization.
At the end of the
wow64!Wow64ProtectMrdata(1) is called,
.mrdata non-writable again.
NtCurrentTeb32()->WOW32Reserved = BTCpuGetBopCode()
ThreadInit does a little thread-specific initialization, such as:
IdealProcessor values from 64-bit
WOW64_CPUFLAGS_SOFTWARE emulators, it calls
NtCurrentTeb32()->WOW32Reserved = BTCpuGetBopCode().
WOW64_CPUFLAGS_SOFTWARE emulators, it creates an event, which is added into AlertByThreadIdEventHashTable and set to
NtCurrentTeb()->TlsSlots. This event is used for special emulation of
WOW64_CPUFLAGS_MSFT64 (1) or the WOW64_CPUFLAGS_SOFTWARE (2) flag is stored in NtCurrentTeb()->TlsSlots[/* 10 */ WOW64_TLS_WOW64INFO], in the WOW64INFO.CpuFlags field. One of these flags is always set in the emulator’s BTCpuProcessInit function (mentioned in the section above):
RunSimulatedCode runs in a loop and performs transitions into 32-bit mode
jmp fword ptr [reg] - a “far jump” that changes not only the instruction pointer (RIP), but also the code segment register (CS). For 32-bit code this segment is usually set to 0x23, while the 64-bit code segment is 0x33
iret - called on every “state reset”
NOTE: Explanation of segmentation and “why does it work just by changing a segment register” is beyond scope of this article. If you’d like to know more about “long mode” and segmentation, you can start here.
Far jump is used most of the time for the transition, mainly because it’s faster.
iret on the other hand is more powerful, as it can change
all at once. The “state reset” occurs when
WOW64_CPURESERVED_FLAG_RESET_STATE (1) bit set. This happens during exception
Also, this flag is cleared on every emulation loop (using
btr - bit-test-and-reset).
You can see the simplest form of switching into the 32-bit mode. Also, at the beginning
you can see that
TurboThunkDispatch address is moved into the
This register stays untouched during the whole
The switch back to the 64-bit mode is very similar - it also uses far jumps. The usual situation when code wants to switch back to the 64-bit mode is upon system call:
Wow64SystemServiceCall is just a simple jump to the
If you remember, the
Wow64Transition value is resolved by the
It selects either
KiFastSystemCall2 based on the
KiFastSystemCall looks like this (used when
CpupSystemCallFast != 0):
[x86] jmp 33h:$+9 (jumps to the instruction below)
[x64] jmp qword ptr [r15+offset] (which points to
KiFastSystemCall2 looks like this (used when
CpupSystemCallFast == 0):
[x86] push 0x33
[x86] push eax
[x86] call $+5
[x86] pop eax
[x86] add eax, 12
[x86] xchg eax, dword ptr [esp]
[x86] jmp fword ptr [esp] (jumps to the instruction below)
[x64] add rsp, 8
[x64] jmp wow64cpu!CpupReturnFromSimulatedCode
KiFastSystemCall is faster, so why isn’t it used every time?
It turns out,
CpupSystemCallFast is set to 1 in the
wow64cpu!BTCpuProcessInit function if
the process is not executed with the
ProhibitDynamicCode mitigation policy
NtProtectVirtualMemory(&KiFastSystemCall, PAGE_EXECUTE_READ) succeeds.
This is because
KiFastSystemCall is in a non-executable read-only section (
KiFastSystemCall2 is in read-executable section (
But the actual reason why KiFastSystemCall is in a non-executable section by default and needs to be set as executable manually is, honestly, unknown to me. My guess would be that it has something to do with relocations, because the address in the
instruction must be somehow resolved by the loader. But maybe I’m wrong. Let me know if you know the answer!
I hope you didn’t forget about the
TurboThunkDispatch address hanging in the
r15 register. This value is used as a jump-table:
There are 32 items in the jump-table.
CpupReturnFromSimulatedCode is the first code that is always executed in the 64-bit
mode when 32-bit to 64-bit transition occurs. Let’s recapitulate the code:
eax - which contains the encoded service table index and system call number - is moved into the
ecx >> 16.
You might be confused now, because a few sections above we defined the service number like this:
…therefore, after right-shifting this value by 16 bits we should always get 0, right?
It turns out, on x64, the
WOW64_SYSTEM_SERVICE might be defined like this:
Let’s examine a few WoW64 system calls:
Based on our new definition of
WOW64_SYSTEM_SERVICE, we can conclude that:
NtMapViewOfSection uses turbo thunk with index 0 (
NtWaitForSingleObject uses turbo thunk with index 13 (
NtDeviceIoControlFile uses turbo thunk with index 27 (
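The decoding can be double-checked with a few lines of Python. The field layout here is partly my assumption: the shift by 16 for the turbo thunk number comes from the code discussed above, while the 12-bit/4-bit split of the low word and the 5-bit mask (enough for the 32 jump-table entries) are guesses that happen to fit the observed values. `decode_x64_service` is a hypothetical helper name.

```python
def decode_x64_service(eax):
    """Decode the value a WoW64 stub on x64 loads into eax."""
    return {
        "system_call_number": eax & 0xFFF,          # assumed: bits 0-11
        "service_table_index": (eax >> 12) & 0xF,   # assumed: bits 12-15
        "turbo_thunk_number": (eax >> 16) & 0x1F,   # bits 16 and up, 32 entries
    }
```

Feeding it `0x000D0004`, the value used by the 32-bit NtWaitForSingleObject stub, gives system call number 4 in table 0 with turbo thunk 13, which agrees with the list above.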
Let’s finally explain “turbo thunks” properly.
Turbo thunks are an optimization of the WoW64 subsystem - specifically on Windows x64 - that allows particular system calls to never leave wow64cpu.dll: the conversion of parameters and return value, and the syscall instruction itself, are fully performed there. The set of functions that use these turbo thunks reveals that they are usually very simple in terms of parameter conversion - they receive numerical values or handles.
The notation of
Thunk* labels is as follows:
Sp converts parameter with sign-extension
NSp converts parameter without sign-extension
ReloadState will return to the 32-bit mode using iret instead of far jump, if
DeviceIoctlFile, … are special cases
Let’s take the
NtWaitForSingleObject and its turbo thunk
as an example:
When we cross-check this information with its function prototype, it makes sense:
The sign-extension of HANDLE makes sense, because if we pass there an
which happens to be 0xFFFFFFFF (-1) on 32-bits, we don’t want to convert this value to 0x00000000FFFFFFFF, but to 0xFFFFFFFFFFFFFFFF.
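The difference between the two conversions is easy to see in code. This is a plain arithmetic sketch (the function names are mine), not the thunk code itself:

```python
def zero_extend_32_to_64(value):
    """Widen a 32-bit value by filling the upper 32 bits with zeros."""
    return value & 0xFFFFFFFF


def sign_extend_32_to_64(value):
    """Widen a 32-bit value by copying bit 31 into the upper 32 bits."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value
```

For a pseudo-handle of 0xFFFFFFFF, zero extension produces 0x00000000FFFFFFFF, which no longer means -1, while sign extension produces 0xFFFFFFFFFFFFFFFF. Small positive values such as ordinary handles come out identical either way.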
On the other hand, if the
TurboThunkNumber is 0, the call will end up in the
TurboDispatchJumpAddressEnd which in turn calls
You can consider this case as the “slow path”.
On Windows x64, the Turbo thunk optimization can be actually disabled!
In one of
the previous sections I’ve been talking about
wow64!Wow64CallFunctionTurboThunkControl functions. As with any other
NtWow64CallFunction64 is only available in the WoW64
This function can be called with an index to WoW64 function in the
Wow64FunctionDispatch64 table (mentioned earlier).
The function prototype might look like this:
NOTE: This function prototype has been reconstructed with the help of the
wow64!Wow64CallFunction64Nopfunction code, which just logs the parameters.
We can see that
wow64!Wow64CallFunctionTurboThunkControl can be called with an
index of 2. This function performs some sanity checks and then passes calls
wow64cpu!BTCpuTurboThunkControl then checks the input parameter.
TurboDispatchJumpAddressEnd (remember, this is the target that is called when
This means 2 things:
wow64cpu!BTCpuTurboThunkControl(0) disables the Turbo thunks, and every system call ends up taking the “slow path”.
With all this in mind, we can achieve disabling Turbo thunks by this call:
What might it be good for? I can think of 3 possible use-cases:
If we deploy custom
wow64log.dll, disabling Turbo thunks
guarantees that we will see every WoW64 system call in our
wow64log!Wow64LogSystemService callback. We wouldn’t see such calls if the Turbo thunks
were enabled, because they would take the “fast path” inside of the
syscall would be executed.
If we decide to hook
Nt* functions in the native
Turbo thunks guarantees that for each
Nt* function called in the
Nt* function will be called in the native
(This is basically the same point as the previous one.)
NOTE: Keep in mind that this only applies to system calls, i.e. to
Zw* functions. Other functions are not called from the 32-bit ntdll.dll to the 64-bit ntdll.dll. For example, if we hooked RtlDecompressBuffer in the native ntdll.dll of the WoW64 process, it wouldn’t be called on an ntdll32!RtlDecompressBuffer call. This is because the full implementation of the Rtl* functions is already in the
We can “harmlessly” patch the high word moved into eax in every WoW64 system call stub to 0. For example, we could see that in NtWaitForSingleObject there is mov eax, 0D0004h. If we patched the appropriate 2 bytes in that instruction so that it becomes mov eax, 4h, the system call would still work.
This approach can be used as an anti-hooking technique - if there’s a jump at the start of the function, the patch will break it. If there’s not a jump, we just disable the Turbo thunk for this function.
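The byte-level effect of that patch can be sketched on a copy of the stub bytes. This only illustrates the encoding (opcode B8 followed by a little-endian 32-bit immediate); `clear_turbo_thunk` is a hypothetical helper and does not touch live process memory.

```python
MOV_EAX_IMM32 = 0xB8  # x86 opcode for "mov eax, imm32"


def clear_turbo_thunk(stub):
    """Return a copy of the stub with the high word of the immediate zeroed."""
    if len(stub) < 5 or stub[0] != MOV_EAX_IMM32:
        # Something else (for example a hook's jump) sits at the start.
        raise ValueError("stub does not start with mov eax, imm32")
    patched = bytearray(stub)
    patched[3] = 0  # low byte of the high word
    patched[4] = 0  # high byte of the high word
    return bytes(patched)
```

Applied to `B8 04 00 0D 00` (mov eax, 0D0004h) it yields `B8 04 00 00 00` (mov eax, 4h). If the function starts with anything other than the expected opcode, the helper refuses to patch, which is the safer choice for a sketch; blindly overwriting the same two bytes is what would break an inline hook.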
Emulation of x86 applications on ARM64 is handled by actual binary translation. xtajit.dll (probably short for “x86 to ARM64 JIT”) is used for this emulation. As with other emulation DLLs, this DLL is native (ARM64).
The x86 emulation on Windows ARM64 consists also of other “XTA” components:
xtac.exe - XTA Compiler
XtaCache.exe - XTA Cache Service
Execution of x86 programs on ARM64 appears to go way beyond just emulation. It is also capable of caching already binary-translated code, so that the next execution of the same application should be faster. This cache is located in the
directory which contains files in format
These files are then mapped to the user-mode address space of the application.
If you’re asking whether you can find an actual ARM64 code in these files - indeed,
Unfortunately, Microsoft doesn’t provide symbols for any of these
xta* DLLs or executables. But if
you’re feeling adventurous, you can find some interesting artifacts, like
this array of structures inside of the
xtajit.dll, which contains name of the function and its pointer.
There are thousands of items in this array:
With a simple Python script, we can mass-rename all functions referenced in this array:
I’d like to thank Milan Boháček for providing me this script.
One thing you can observe on ARM64 is that it contains two folders used for x86
emulation. The difference between them is that
SyCHPE32 contains small subset
of DLLs that are frequently used by applications, while contents of the
folder is quite identical with the content of this folder on Windows x64.
CHPE DLLs are not pure-x86 DLLs and not even pure-ARM64 DLLs. They are
“compiled-hybrid-PE”s. What does it mean? Let’s see:
SyCHPE32\ntdll.dll, IDA will first tell us - unsurprisingly -
that it cannot download PDB for this DLL. After looking at randomly chosen
function, we can see that it doesn’t differ from what we would see in the
SysWOW64\ntdll.dll. Let’s look at some non-
We can see that it contains a regular x86 function prologue, immediately followed by an x86 function epilogue and then a jump to a location that looks like it contains just garbage. That “garbage” is actually the ARM64 code of that function.
My guess is that the reason for this prologue is probably compatibility with applications that check whether some particular functions are hooked or not - by checking if the first bytes of the function contain real x86 prologue.
NOTE: Again, if you’re feeling adventurous, you can patch the FileHeader.Machine field in the PE header to IMAGE_FILE_MACHINE_ARM64 (0xAA64) and open this file in IDA. You will see a whole lot of correctly resolved ARM64 functions. Again, I’d like to thank Milan Boháček for this tip.
If your question is “how are these images generated?”, I would answer that I don’t know,
but my bet would be on some internal version of Microsoft’s C++ compiler toolchain. This idea
appears to be supported by various occurrences of the
CHPE keyword in the ChakraCore codebase.
The loop inside of the
wowarmhw!BTCpuSimulate is fairly simple compared to
CpupSwitchTo32Bit does nothing else than saving the whole
instruction and then restoring the
I won’t be explaining here how system call dispatching works in the
Bruce Dang already did an excellent job doing it.
This section is a follow up on his article, though.
SVC instruction is sort-of equivalent of
SYSCALL instruction on ARM64 - it
basically enters the kernel mode. But there is a small difference between
SVC: while on Windows x64 the system call number is moved into
eax register, on ARM64 the system call number can be encoded directly
Let’s peek for a moment into the kernel to see how this SVC instruction is handled:
We can see that:
MRS X30, ELR_EL1 - the current interrupt-return address (stored in the ELR_EL1 system register) will be moved to the register X30 (link register -
MSR ELR_EL1, X15 - the interrupt-return address will be replaced by the value in the register X15 (which is aliased to the instruction pointer register - PC - in the 32-bit mode).
ORR X16, X16, #0b10000 - bit 4 is being set in X16, which is later moved to the SPSR_EL1 register. Setting this bit switches the execution mode to 32-bits.
Simply said, in the
X15 register, there is an address that will be
executed once we leave the kernel-mode and enter the user-mode - which happens
ERET instruction at the end.
Alright, we’re in the 32-bit ARM mode now, how exactly do we leave? Windows
solves this transition via
UND instruction - which is similar to the
instruction on Intel CPUs. If you’re not familiar with it, you just need to know that it is an instruction that is basically guaranteed to raise an “invalid instruction” exception, which the OS kernel can handle. It is a defined “undefined instruction”.
Again there is the same difference between the
UD2 instruction in
that the ARM can have any 1-byte immediate value encoded directly in the
Let’s look at the
NtMapViewOfSection system call in the
Let’s peek into the kernel again:
Keep in mind that while the 32-bit code is running, it cannot modify the value of the X30 register - it is not visible in 32-bit mode. It stays there the whole time. Upon UND #0xF8 execution, the following happens:
The KiFetchOpcodeAndEmulate function moves the value of
X24 register (not shown on the screenshot).
AND X19, X16, #0xFFFFFFFFFFFFFFC0 - bit 4 (among others) is being cleared in the X19 register, which is later moved to the SPSR_EL1 register. Clearing this bit switches the execution mode back to 64-bits.
KiExit32BitMode then moves the value of the X24 register into the ELR_EL1 register. That means when this function finishes its execution, the ERET brings us back to the 64-bit code, right after the
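The two bit manipulations on the saved program status value can be replayed in Python. This is only arithmetic on the immediates quoted above (`0b10000` for the ORR and `0xFFFFFFFFFFFFFFC0` for the AND); the function names are mine and nothing here models the rest of the kernel paths.

```python
AARCH32_STATE_BIT = 0b10000          # immediate of the ORR: bit 4
EXIT_MASK = 0xFFFFFFFFFFFFFFC0       # immediate of the AND: clears bits 0-5


def enter_32bit(spsr):
    """Set bit 4, as the ORR does before the value reaches SPSR_EL1."""
    return spsr | AARCH32_STATE_BIT


def exit_32bit(spsr):
    """Clear bits 0-5 (bit 4 among them), as the AND does on the way back."""
    return spsr & EXIT_MASK
```

Setting bit 4 on a zero value gives 0x10, and applying the exit mask to that value gives 0 again, while bits above bit 5 pass through both operations unchanged.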
NOTE: You may notice that Windows uses the UND instruction for several purposes. A common example is UND #0xFE, which is used as a breakpoint instruction (equivalent of
As you may have spotted, 3 kernel transitions are required for emulation of the system call (SVC 0xFFFF, system call itself, UND 0xF8). This is because on ARM there is no way to switch between 32-bit and 64-bit mode only
If you’re looking for “ARM Heaven’s Gate” - this is it. Put whatever function
address you like into the
X15 register and execute
Next instruction will be executed in the 32-bit ARM mode, starting with that
address. When you feel you’d like to come back into 64-bit mode, simply
UND #0xF8 and your execution will continue with the next instruction
How does one retrieve the 32-bit context of a Wow64 program from a 64-bit process on Windows Server 2003 x64?
Mixing x86 with x64 code
Windows 10 on ARM
Knockin’ on Heaven’s Gate – Dynamic Processor Mode Switching
Closing “Heaven’s Gate”