/*P:900
 * This is the Switcher: code which sits at 0xFFC00000 (or 0xFFE00000) astride
 * both the Host and Guest to do the low-level Guest<->Host switch. It is as
 * simple as it can be made, but it's naturally very specific to x86.
 *
 * You have now completed Preparation. If this has whetted your appetite; if
 * you are feeling invigorated and refreshed then the next, more challenging
 * stage can be found in "make Guest".
 :*/
/*M:012
 * Lguest is meant to be simple: my rule of thumb is that 1% more LOC must
 * gain at least 1% more performance. Since neither LOC nor performance can be
 * measured beforehand, it generally means implementing a feature then deciding
 * if it's worth it. And once it's implemented, who can say no?
 *
 * This is why I haven't implemented this idea myself. I want to, but I
 * haven't. You could, though.
 *
 * The main place where lguest performance sucks is Guest page faulting. When
 * a Guest userspace process hits an unmapped page we switch back to the Host,
 * walk the page tables, find it's not mapped, switch back to the Guest page
 * fault handler, which calls a hypercall to set the page table entry, then
 * finally returns to userspace. That's two round-trips.
 *
 * If we had a small walker in the Switcher, we could quickly check the Guest
 * page table and if the page isn't mapped, immediately reflect the fault back
 * into the Guest. This means the Switcher would have to know the top of the
 * Guest page table and the page fault handler address.
 *
 * For simplicity, the Guest should only handle the case where the privilege
 * level of the fault is 3 and probably only not present or write faults. It
 * should also detect recursive faults, and hand the original fault to the
 * Host (which is actually really easy).
 *
 * Two questions remain. Would the performance gain outweigh the complexity?
 * And who would write the verse documenting it?
 :*/
/*M:011
 * Lguest64 handles NMI. This gave me NMI envy (until I looked at their
 * code). It's worth doing though, since it would let us use oprofile in the
 * Host when a Guest is running.
 :*/
/*S:100
 * Welcome to the Switcher itself!
 *
 * This file contains the low-level code which changes the CPU to run the Guest
 * code, and returns to the Host when something happens. Understand this, and
 * you understand the heart of our journey.
 *
 * Because this is in assembler rather than C, our tale switches from prose to
 * verse. First I tried limericks:
 *
 *	There once was an eax reg,
 *	To which our pointer was fed,
 *	It needed an add,
 *	Which asm-offsets.h had
 *	But this limerick is hurting my head.
 *
 * Next I tried haikus, but fitting the required reference to the seasons in
 * every stanza was quickly becoming tiresome:
 *
 *	The %eax reg
 *	Holds "struct lguest_pages" now:
 *	Cherry blossoms fall.
 *
 * Then I started with Heroic Verse, but the rhyming requirement leeched away
 * the content density and led to some uniquely awful oblique rhymes:
 *
 *	These constants are coming from struct offsets
 *	For use within the asm switcher text.
 *
 * Finally, I settled for something between heroic hexameter and normal prose
 * with inappropriate linebreaks. Anyway, it ain't no Shakespeare.
 */

// Not all kernel headers work from assembler
// But these ones are needed: the ENTRY() define
// And constants extracted from struct offsets
// To avoid magic numbers and breakage:
// Should they change the compiler can't save us
// Down here in the depths of assembler code.
#include <linux/linkage.h>
#include <asm/asm-offsets.h>
#include <asm/page.h>
#include <asm/segment.h>
#include <asm/lguest.h>

// We mark the start of the code to copy
// It's placed in .text tho it's never run here
// You'll see the trick macro at the end
// Which interleaves data and text to effect.
.text
ENTRY(start_switcher_text)

// When we reach switch_to_guest we have just left
// The safe and comforting shores of C code
// %eax has the "struct lguest_pages" to use
// Where we save state and still see it from the Guest
// And %ebx holds the Guest shadow pagetable:
// Once set we have truly left Host behind.
ENTRY(switch_to_guest)
// We told gcc all its regs could fade,
// Clobbered by our journey into the Guest
// We could have saved them, if we tried
// But time is our master and cycles count.

// Segment registers must be saved for the Host
// We push them on the Host stack for later
	pushl %es
	pushl %ds
	pushl %gs
	pushl %fs
// But the compiler is fickle, and heeds
// No warning of %ebp clobbers
// When frame pointers are used. That register
// Must be saved and restored or chaos strikes.
	pushl %ebp
// The Host's stack is done, now save it away
// In our "struct lguest_pages" at offset
// Distilled into asm-offsets.h
	movl %esp, LGUEST_PAGES_host_sp(%eax)

// All saved and there's now five steps before us:
// Stack, GDT, IDT, TSS
// Then last of all the page tables are flipped.

// Yet beware that our stack pointer must be
// Always valid lest an NMI hits
// %edx does the duty here as we juggle
// %eax is lguest_pages: our stack lies within.
	movl %eax, %edx
	addl $LGUEST_PAGES_regs, %edx
	movl %edx, %esp

// The Guest's GDT we so carefully
// Placed in the "struct lguest_pages" before
	lgdt LGUEST_PAGES_guest_gdt_desc(%eax)

// The Guest's IDT we did partially
// Copy to "struct lguest_pages" as well.
	lidt LGUEST_PAGES_guest_idt_desc(%eax)

// The TSS entry which controls traps
// Must be loaded up with "ltr" now:
// The GDT entry that TSS uses
// Changes type when we load it: damn Intel!
// For after we switch over our page tables
// That entry will be read-only: we'd crash.
	movl $(GDT_ENTRY_TSS*8), %edx
	ltr %dx

// Look back now, before we take this last step!
// The Host's TSS entry was also marked used;
// Let's clear it again for our return.
// The GDT descriptor of the Host
// Points to the table after two "size" bytes
	movl (LGUEST_PAGES_host_gdt_desc+2)(%eax), %edx
// Clear "used" from type field (byte 5, mask 0x02)
	andb $0xFD, (GDT_ENTRY_TSS*8 + 5)(%edx)
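/*
For readers who think more easily in C, here is a hedged sketch of what the
"andb" above does to the descriptor. It is an illustration only, not lguest
code: clear_tss_busy() and its arguments are hypothetical names, and the GDT
is modelled as a plain byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: clear the "busy" bit (0x02) in the access byte
// (byte 5) of the 8-byte descriptor at position `index` in a GDT image.
static void clear_tss_busy(uint8_t *gdt, size_t index)
{
    assert(gdt != NULL);
    gdt[index * 8 + 5] &= 0xFD;   // same mask as the andb above
}
```

Clearing the bit matters because "ltr" marks the descriptor busy and refuses
to load one that is already busy.
*/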

// Once our page table's switched, the Guest is live!
// The Host fades as we run this final step.
// Our "struct lguest_pages" is now read-only.
	movl %ebx, %cr3

// The page table change did one tricky thing:
// The Guest's register page has been mapped
// Writable under our %esp (stack) --
// We can simply pop off all Guest regs.
	popl %eax
	popl %ebx
	popl %ecx
	popl %edx
	popl %esi
	popl %edi
	popl %ebp
	popl %gs
	popl %fs
	popl %ds
	popl %es

// Near the base of the stack lurk two strange fields
// Which we fill as we exit the Guest
// These are the trap number and its error
// We can simply step past them on our way.
	addl $8, %esp

// The last five stack slots hold return address
// And everything needed to switch privilege
// From Switcher's level 0 to Guest's 1,
// And the stack where the Guest had last left it.
// Interrupts are turned back on: we are Guest.
	iret

// We tread two paths to switch back to the Host
// Yet both must save Guest state and restore Host
// So we put the routine in a macro.
#define SWITCH_TO_HOST \
	/* We save the Guest state: all registers first \
	 * Laid out just as "struct lguest_regs" defines */ \
	pushl %es; \
	pushl %ds; \
	pushl %fs; \
	pushl %gs; \
	pushl %ebp; \
	pushl %edi; \
	pushl %esi; \
	pushl %edx; \
	pushl %ecx; \
	pushl %ebx; \
	pushl %eax; \
	/* Our stack and our code are using segments \
	 * Set in the TSS and IDT \
	 * Yet if we were to touch data we'd use \
	 * Whatever data segment the Guest had. \
	 * Load the lguest ds segment for now. */ \
	movl $(LGUEST_DS), %eax; \
	movl %eax, %ds; \
	/* So where are we? Which CPU, which struct? \
	 * The stack is our clue: our TSS starts \
	 * It at the end of "struct lguest_pages". \
	 * Or we may have stumbled while restoring \
	 * Our Guest segment regs while in switch_to_guest, \
	 * The fault pushed atop that part-unwound stack. \
	 * If we round the stack down to the page start \
	 * We're at the start of "struct lguest_pages". */ \
	movl %esp, %eax; \
	andl $(~((1 << PAGE_SHIFT) - 1)), %eax; \
	/* Save our trap number: the switch will obscure it \
	 * (In the Host the Guest regs are not mapped here) \
	 * %ebx holds it safe for deliver_to_host */ \
	movl LGUEST_PAGES_regs_trapnum(%eax), %ebx; \
	/* The Host GDT, IDT and stack! \
	 * All these lie safely hidden from the Guest: \
	 * We must return to the Host page tables \
	 * (Hence that was saved in struct lguest_pages) */ \
	movl LGUEST_PAGES_host_cr3(%eax), %edx; \
	movl %edx, %cr3; \
	/* As before, when we looked back at the Host \
	 * As we left and marked TSS unused \
	 * So must we now for the Guest left behind. */ \
	andb $0xFD, (LGUEST_PAGES_guest_gdt+GDT_ENTRY_TSS*8+5)(%eax); \
	/* Switch to Host's GDT, IDT. */ \
	lgdt LGUEST_PAGES_host_gdt_desc(%eax); \
	lidt LGUEST_PAGES_host_idt_desc(%eax); \
	/* Restore the Host's stack where its saved regs lie */ \
	movl LGUEST_PAGES_host_sp(%eax), %esp; \
	/* Last the TSS: our Host is returned */ \
	movl $(GDT_ENTRY_TSS*8), %edx; \
	ltr %dx; \
	/* Restore now the regs saved right at the first. */ \
	popl %ebp; \
	popl %fs; \
	popl %gs; \
	popl %ds; \
	popl %es
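/*
The macro finds "struct lguest_pages" by rounding the stack pointer down to
the start of its page. A minimal C sketch of that masking step follows. It
assumes 4 KiB pages (a shift of 12), and DEMO_PAGE_SHIFT and page_start() are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

// Assumption for illustration: 4 KiB pages, as on 32-bit x86.
#define DEMO_PAGE_SHIFT 12

// Round an address down to the start of the page that contains it.
static uint32_t page_start(uint32_t addr)
{
    return addr & ~((UINT32_C(1) << DEMO_PAGE_SHIFT) - 1);
}
```

Any stack address inside the page therefore yields the same base, which is
why the macro works no matter how much has been pushed.
*/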

// The first path is trod when the Guest has trapped:
// (Which trap it was has been pushed on the stack).
// We need only switch back, and the Host will decode
// Why we came home, and what needs to be done.
return_to_host:
	SWITCH_TO_HOST
	iret

// We are led to the second path like so:
// An interrupt, with some cause external
// Has ajerked us rudely from the Guest's code
// Again we must return home to the Host
deliver_to_host:
	SWITCH_TO_HOST
// But now we must go home via that place
// Where that interrupt was supposed to go
// Had we not been ensconced, running the Guest.
// Here we see the trickiness of run_guest_once():
// The Host stack is formed like an interrupt
// With EIP, CS and EFLAGS layered.
// Interrupt handlers end with "iret"
// And that will take us home at long long last.

// But first we must find the handler to call!
// The IDT descriptor for the Host
// Has two bytes for size, and four for address:
// %edx will hold it for us for now.
	movl (LGUEST_PAGES_host_idt_desc+2)(%eax), %edx
// We now know the table address we need,
// And saved the trap's number inside %ebx.
// Yet the pointer to the handler is smeared
// Across the bits of the table entry.
// What oracle can tell us how to extract
// From such a convoluted encoding?
// I consulted gcc, and it gave
// These instructions, which I gladly credit:
	leal (%edx,%ebx,8), %eax
	movzwl (%eax),%edx
	movl 4(%eax), %eax
	xorw %ax, %ax
	orl %eax, %edx
// Now the address of the handler's in %edx
// We call it now: its "iret" drops us home.
	jmp *%edx
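/*
The five instructions before the jump reassemble a handler address from an
8-byte IDT gate. A hedged C equivalent is sketched here. It assumes the
32-bit x86 gate layout, in which the low 16 bits of the first word and the
high 16 bits of the second word hold the two halves of the offset. The names
demo_gate and gate_handler() are hypothetical and are not taken from lguest.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical view of one 8-byte IDT gate as two little-endian words.
struct demo_gate {
    uint32_t lo;   // bits 0-15: handler offset low, bits 16-31: selector
    uint32_t hi;   // bits 0-15: type and flags, bits 16-31: offset high
};

static uint32_t gate_handler(const struct demo_gate *idt, uint32_t trapnum)
{
    const struct demo_gate *g = &idt[trapnum];  // leal (%edx,%ebx,8), %eax
    uint32_t low  = g->lo & 0x0000FFFFu;        // movzwl (%eax), %edx
    uint32_t high = g->hi & 0xFFFF0000u;        // movl 4(%eax), then xorw
    return high | low;                          // orl %eax, %edx
}
```

For a gate with lo = 0x00605678 and hi = 0x12348E00 this returns 0x12345678:
the selector and the flag bits are masked away and only the offset survives.
*/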

// Every interrupt can come to us here
// But we must truly tell each apart.
// They number two hundred and fifty six
// And each must land in a different spot,
// Push its number on stack, and join the stream.

// And worse, a mere seven of the traps stand apart
// And push on their stack an addition:
// An error number, thirty two bits long
// So we punish the other two forty-nine
// And make them push a zero so they match.

// Yet two fifty six entries is long
// And all will look most the same as the last
// So we create a macro which can make
// As many entries as we need to fill.

// Note the change to .data then .text:
// We plant the address of each entry
// Into a (data) table for the Host
// To know where each Guest interrupt should go.
.macro IRQ_STUB N TARGET
	.data; .long 1f; .text; 1:
// Trap eight, ten through fourteen and seventeen
// Supply an error number. Else zero.
 .if (\N <> 8) && (\N < 10 || \N > 14) && (\N <> 17)
	pushl $0
 .endif
	pushl $\N
	jmp \TARGET
	ALIGN
.endm
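/*
The ".if" test in IRQ_STUB is easier to check when written positively. The C
sketch below expresses the same rule from the other side: it returns true for
the traps where the CPU supplies its own error code, which are exactly the
ones where the stub skips the "pushl $0". Both function names are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

// True when the CPU itself pushes an error code for this trap number,
// so the stub must not push a placeholder zero.
static bool trap_has_error_code(unsigned int n)
{
    return n == 8 || (n >= 10 && n <= 14) || n == 17;
}

// Count how many of the 256 vectors satisfy the rule above.
static unsigned int count_error_code_traps(void)
{
    unsigned int count = 0;
    for (unsigned int n = 0; n < 256; n++)
        if (trap_has_error_code(n))
            count++;
    return count;
}
```

Padding every other stub with a zero gives all 256 entry points the same
stack layout, so one "struct lguest_regs" describes them all.
*/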

// This macro creates numerous entries
// Using GAS macros which out-power C's.
.macro IRQ_STUBS FIRST LAST TARGET
 irq=\FIRST
 .rept \LAST-\FIRST+1
	IRQ_STUB irq \TARGET
  irq=irq+1
 .endr
.endm

// Here's the marker for our pointer table
// Laid in the data section just before
// Each macro places the address of code
// Forming an array: each one points to text
// Which handles interrupt in its turn.
.data
.global default_idt_entries
default_idt_entries:
.text
// The first two traps go straight back to the Host
	IRQ_STUBS 0 1 return_to_host
// We'll say nothing, yet, about NMI
	IRQ_STUB 2 handle_nmi
// Other traps also return to the Host
	IRQ_STUBS 3 31 return_to_host
// All interrupts go via their handlers
	IRQ_STUBS 32 127 deliver_to_host
// 'Cept system calls coming from userspace
// Are to go to the Guest, never the Host.
	IRQ_STUB 128 return_to_host
	IRQ_STUBS 129 255 deliver_to_host

// The NMI, what a fabulous beast
// Which swoops in and stops us no matter that
// We're suspended between heaven and hell,
// (Or more likely between the Host and Guest)
// When in it comes! We are dazed and confused
// So we do the simplest thing which one can.
// Though we've pushed the trap number and zero
// We discard them, return, and hope we live.
handle_nmi:
	addl $8, %esp
	iret

// We are done; all that's left is Mastery
// And "make Mastery" is a journey long
// Designed to make your fingers itch to code.

// Here ends the text, the file and poem.
ENTRY(end_switcher_text)