How to switch from 32-bit to PAE paging directly?

I'm developing a microkernel for my personal research. I have chosen to run my kernel at 0xf0000000, leaving 3.75 GiB for user-space programs. When my kernel starts up, it sets up 32-bit paging (with a hardcoded page directory and page tables). Then it checks whether PAE is supported on the host machine and sets up a page directory pointer table (PDPT). But the problem comes when I try to load it into %cr3. According to the Intel Software Developer's Manual:

Software can transition between 32-bit paging and PAE paging by changing the value of CR4.PAE with MOV to CR4.

So I tried to use the following code to switch to PAE paging:

movl %cr4, %eax
orl $(1 << 5), %eax
movl %eax, %cr4
movl %ebx, %cr3 // %ebx holds physical address to PDPT

Or, in Intel syntax (or NASM):

mov eax, cr4
or eax, 1 << 5
mov cr4, eax
mov cr3, ebx ; ebx holds physical address to PDPT

But it fails (on QEMU). It writes to %cr4, sets %eip to the next instruction, executes it (at least GDB says so), and resets. I tried writing to %cr3 before %cr4, but got the same result.

Then I tried to switch to PAE paging by: unset PG -> set PAE -> write to %cr3 -> set PG, and I succeeded. But I want to switch to PAE paging directly. How is that possible?



Solution 1:[1]

Then I tried to switch to PAE paging by: unset PG -> set PAE -> write to %cr3 -> set PG, and I succeeded. But I want to switch to PAE paging directly. How is that possible?

It's not possible.

If "plain paging" is already in use/enabled, then you can't atomically enable PAE and load CR3 at the same time, so (regardless of whether you load CR3 first then CR4, or load CR4 first then try to load CR3) whichever instruction happens first will make the CPU crash before the second instruction is fetched.

The only way is to temporarily disable paging.
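
For reference, the ordering of that working sequence (unset PG, set PAE, write CR3, set PG) can be modelled off-target. The sketch below is a minimal C model, not kernel code: `struct control_regs` and `switch_to_pae_paging_off` are hypothetical names, and the struct fields only stand in for the real registers, which a kernel would access with MOV to and from CR0, CR3 and CR4. On real hardware the sequence has to run from identity-mapped code, because once PG is cleared the next instruction is fetched from a physical address.

```c
#include <stdint.h>

#define CR0_PG  (1u << 31)  /* CR0.PG: paging enable */
#define CR4_PAE (1u << 5)   /* CR4.PAE: physical address extension */

/* Hypothetical stand-in for the real control registers (model only). */
struct control_regs {
    uint32_t cr0;
    uint32_t cr3;
    uint32_t cr4;
};

/*
 * Models the order of the four register writes. pdpt_phys is assumed to be
 * the 32-byte-aligned physical address of the PDPT.
 */
static void switch_to_pae_paging_off(struct control_regs *regs, uint32_t pdpt_phys)
{
    regs->cr0 &= ~CR0_PG;   /* 1. unset PG: paging is off from here on */
    regs->cr4 |= CR4_PAE;   /* 2. set PAE while paging is disabled     */
    regs->cr3 = pdpt_phys;  /* 3. point CR3 at the PDPT                */
    regs->cr0 |= CR0_PG;    /* 4. set PG: PAE paging becomes active    */
}
```

With paging off, neither the CR4 write nor the CR3 write has to agree with the currently active page tables, which is why the order of steps 2 and 3 is not critical here.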

Solution 2:[2]

Finally, I have figured out a way to switch to PAE paging directly, thanks to @Brendan and his valuable comment. To switch from 32-bit paging to PAE paging directly, I had to trick the CPU. My kernel's virtual base is at 0xf0000000, so the first 960 PDEs were unused after I had jumped to the higher half. So I copied my new PDPT (Page Directory Pointer Table) over the start of my initial 32-bit page directory. Then I set the PAE bit, and the CPU was happy because it read only the first 32 bytes of the initial page directory, which now held the PDPT.

The process was like this:

  1. memcpy (initial_page_directory, new_pdpt, 32);
  2. Enable PAE bit in %cr4. CPU happily reads the PDPT from the first 32 bytes of your initial page directory you have overwritten in step 1.
  3. Load your new PDPT into %cr3.
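
Step 1 can be sketched in C as follows. This is a minimal illustration under assumptions: `initial_page_directory` and `new_pdpt` are the names used in the steps above, but their definitions here, the `make_pdpte` helper and `stage_pdpt_in_page_directory` are hypothetical. Steps 2 and 3 are privileged register writes and appear only as comments.

```c
#include <stdint.h>
#include <string.h>

#define PDE_COUNT   1024  /* 32-bit page directory: 1024 four-byte entries */
#define PDPTE_COUNT 4     /* PAE PDPT: 4 eight-byte entries = 32 bytes     */

/* Hypothetical definitions standing in for the kernel's own tables. */
static _Alignas(4096) uint32_t initial_page_directory[PDE_COUNT];
static _Alignas(32)   uint64_t new_pdpt[PDPTE_COUNT];

/* Hypothetical helper: PDPTE = 4 KiB-aligned page directory address | P bit. */
static uint64_t make_pdpte(uint64_t page_directory_phys)
{
    return (page_directory_phys & ~0xfffULL) | 1ULL;
}

/* Step 1: overwrite the first 32 bytes (8 PDEs) of the old directory. */
static void stage_pdpt_in_page_directory(void)
{
    memcpy(initial_page_directory, new_pdpt, sizeof new_pdpt);
}

/*
 * Step 2 (not shown): set CR4.PAE. CR3 still points at
 * initial_page_directory, so the CPU loads its PDPTEs from the bytes
 * copied in step 1.
 * Step 3 (not shown): load the physical address of new_pdpt into CR3.
 */
```

The copy only touches the first eight PDEs, so the entries that map the kernel at 0xf0000000 stay intact while the CPU is still using 32-bit paging.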

Note: This process won't work if you access the first 32 MiB of mapped memory (if you have mapped it) between steps 1 and 2, since the PDEs covering that range now hold PDPT data (possibly undefined behaviour, or maybe a triple fault and reset).
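
The numbers in this answer (960 unused PDEs, 32 MiB at risk) follow from the 32-bit paging layout, assuming 4-byte PDEs that each cover 4 MiB of virtual address space. The helper names below are hypothetical and only spell out the arithmetic:

```c
#include <stdint.h>

#define KERNEL_BASE 0xf0000000u  /* kernel virtual base from the question */
#define PDE_SIZE    4u           /* bytes per 32-bit PDE                  */
#define PDE_SPAN    (4u << 20)   /* virtual range covered per PDE: 4 MiB  */
#define PDPT_SIZE   32u          /* 4 PDPTEs x 8 bytes                    */

/* Index of the PDE that maps a virtual address under 32-bit paging. */
static uint32_t pde_index(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* Number of PDEs overwritten by copying the PDPT over the directory. */
static uint32_t clobbered_pdes(void)
{
    return PDPT_SIZE / PDE_SIZE;
}

/* Size of the virtual range whose PDEs were overwritten. */
static uint32_t clobbered_span(void)
{
    return clobbered_pdes() * PDE_SPAN;
}
```

Here `pde_index(KERNEL_BASE)` is 960, `clobbered_pdes()` is 8 and `clobbered_span()` is 32 MiB, so the kernel's own mapping starts well above the overwritten entries.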

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Brendan
Solution 2