0003-aarch64-Mitigate-SLS-for-BLR-instruction.patch

From a5e7efc40ed841934c1d913f39476afa17d8e5f7 Mon Sep 17 00:00:00 2001
From: Matthew Malcomson <matthew.malcomson@arm.com>
Date: Thu, 9 Jul 2020 09:11:59 +0100
Subject: [PATCH 3/3] aarch64: Mitigate SLS for BLR instruction

Upstream-Status: Backport
Signed-off-by: Ross Burton <ross.burton@arm.com>

This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.

This mitigation replaces BLR instructions with a BL to a stub which uses
a BR to jump to the original value.  These function stubs are then
appended with a speculation barrier to ensure no straight line
speculation happens after these jumps.

When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.

When optimising for size we use one set of stubs for all functions.
This set of stubs can have human readable names, and we are using
`__call_indirect_x<N>` for register x<N>.

When BTI branch protection is enabled the BLR instruction can jump to a
`BTI c` instruction using any register, while the BR instruction can
only jump to a `BTI c` instruction using the x16 or x17 registers.
Hence, in order to ensure this transformation is safe, we move the value
of the original register into x16 and use x16 for the BR.
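
To illustrate that BTI interaction (a sketch only; the label below is
invented for this example and does not appear in the patch):

    callee:
        BTI c        // reachable by BLR x<N> for any register x<N>,
        ...          // but by BR only via x16 or x17

This is why a stub that ends in a BR must first move its target into x16.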

As an example when optimising for size:
a
    BLR x0
instruction would get transformed to something like
    BL __call_indirect_x0
where __call_indirect_x0 labels a thunk that contains
__call_indirect_x0:
    MOV X16, X0
    BR X16
    <speculation barrier>

The first version of this patch used local symbols specific to a
compilation unit to try and avoid relocations.
This was mistaken, since functions coming from the same compilation unit
can still be in different sections, and the assembler will insert
relocations at jumps between sections.

On any relocation the linker is permitted to emit a veneer to handle
jumps between symbols that are very far apart.  The registers x16 and
x17 may be clobbered by these veneers.
Hence the function stubs cannot rely on the values of x16 and x17 being
the same as just before the function stub is called.

The same applies to the hot/cold partitioning of single functions, so
function-local stubs have the same restriction.

This updated version of the patch never emits function stubs for x16 and
x17, and instead forces other registers to be used.

Given the above, there is now no benefit to local symbols (since they
are not enough to avoid dealing with linker intricacies).  This patch
now uses global symbols with hidden visibility, each stored in their own
COMDAT section.  This means stubs can be shared between compilation
units while still avoiding the PLT indirection.

This patch also removes the `__call_indirect_x30` stub (and its
function-local equivalent), which would simply jump back to the original
location.

The function-local stubs are emitted to the assembly output file in one
chunk, which means we need not add the speculation barrier directly
after each one.
This is because we know for certain that the instructions directly after
the BR in all but the last function stub will be from another one of
these stubs and hence will not contain a speculation gadget.
Instead we add a speculation barrier at the end of the sequence of
stubs.
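
As an illustration (the label names here are schematic; the compiler
uses internal labels for the function-local stubs), a function needing
stubs for x0 and x1 would get a block like:

    .L_call_x0:
        MOV X16, X0
        BR X16
    .L_call_x1:
        MOV X16, X1
        BR X16
        <speculation barrier>

with the single barrier at the end covering every stub in the block.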

The global stubs are emitted in COMDAT/.linkonce sections by
themselves so that the linker can remove duplicates from multiple object
files.  This means they are not emitted in one chunk, and each one must
include the speculation barrier.

Another difference is that since the global stubs are shared across
compilation units we do not know that all functions will be targeting an
architecture supporting the SB instruction.
Rather than provide multiple stubs for each architecture, we provide a
stub that will work for all architectures -- using the DSB+ISB barrier.
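
Schematically, the shared stub for x0 would be emitted along these lines
(a sketch only: the exact section, alignment, and type directives depend
on the object format and target macros):

    .section .text.__call_indirect_x0,"axG",@progbits,__call_indirect_x0,comdat
    .hidden __call_indirect_x0
    .global __call_indirect_x0
    .type   __call_indirect_x0, @function
    __call_indirect_x0:
        MOV X16, X0
        BR  X16
        DSB SY
        ISB
    .size   __call_indirect_x0, . - __call_indirect_x0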

This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
  instruction.  This code sequence is part of the binary interface between
  compiler and linker.  If this BLR instruction needs to be mitigated, it'd
  probably be best to do so in the linker.  It seems that the code sequence
  for thread-local variable access is unlikely to lead to a Spectre
  Revelation Gadget.
- PLT stubs are produced by the linker and each contains a BLR instruction.
  It seems that a Spectre Revelation Gadget might appear, at most, only
  after the last PLT stub.

Testing:
  Bootstrap and regtest on AArch64
  (with BOOT_CFLAGS="-mharden-sls=retbr,blr")
  Used a temporary hack(1) in gcc-dg.exp to use these options on every
  test in the testsuite, a slight modification to emit the speculation
  barrier after every function stub, and a script to check that the
  output never emitted a BLR, or an unmitigated BR or RET instruction.
  Similar on an aarch64-none-elf cross-compiler.

1) The temporary hack emitted a speculation barrier at the end of every
stub function, and used a script to ensure that:
   a) Every RET or BR is immediately followed by a speculation barrier.
   b) No BLR instruction is emitted by the compiler.
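
For instance (an illustration of the property being checked, not output
captured from the test run), a mitigated return sequence would be
accepted as:

    RET
    DSB SY
    ISB

while a bare RET, a bare BR, or any BLR would be reported.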

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
	New declaration.
	* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
	stub registers class.
	(aarch64_class_max_nregs): Likewise.
	(aarch64_register_move_cost): Likewise.
	(aarch64_sls_shared_thunks): Global array to store stub labels.
	(aarch64_sls_emit_function_stub): New.
	(aarch64_sls_create_blr_label): New.
	(aarch64_sls_emit_blr_function_thunks): New.
	(aarch64_sls_emit_shared_blr_thunks): New.
	(aarch64_asm_file_end): New.
	(aarch64_indirect_call_asm): New.
	(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
	(TARGET_ASM_FUNCTION_EPILOGUE): Use
	aarch64_sls_emit_blr_function_thunks.
	* config/aarch64/aarch64.h (STUB_REGNUM_P): New.
	(enum reg_class): Add STUB_REGS class.
	(machine_function): Introduce `call_via` array for
	function-local stub labels.
	* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
	aarch64_indirect_call_asm to emit code when hardening BLR
	instructions.
	* config/aarch64/constraints.md (Ucr): New constraint
	representing registers for indirect calls.  Is GENERAL_REGS
	usually, and STUB_REGS when hardening BLR instruction against
	SLS.
	* config/aarch64/predicates.md (aarch64_general_reg): STUB_REGS class
	is also a general register.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
	* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.
---
 gcc/config/aarch64/aarch64-protos.h | 1 +
 gcc/config/aarch64/aarch64.c | 225 ++++++++++++++++++++-
 gcc/config/aarch64/aarch64.h | 15 ++
 gcc/config/aarch64/aarch64.md | 11 +-
 gcc/config/aarch64/constraints.md | 9 +
 gcc/config/aarch64/predicates.md | 3 +-
 .../aarch64/sls-mitigation/sls-miti-blr-bti.c | 40 ++++
 .../aarch64/sls-mitigation/sls-miti-blr.c | 33 +++
 8 files changed, 328 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ee0ffde..839f801 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -782,6 +782,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);

 const char *aarch64_sls_barrier (int);
+const char *aarch64_indirect_call_asm (rtx);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2389d49..0f7bba3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10605,6 +10605,9 @@ aarch64_label_mentioned_p (rtx x)

 enum reg_class
 aarch64_regno_regclass (unsigned regno)
 {
+  if (STUB_REGNUM_P (regno))
+    return STUB_REGS;
+
   if (GP_REGNUM_P (regno))
     return GENERAL_REGS;
@@ -10939,6 +10942,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
   unsigned int nregs, vec_flags;
   switch (regclass)
     {
+    case STUB_REGS:
     case TAILCALL_ADDR_REGS:
     case POINTER_REGS:
     case GENERAL_REGS:
@@ -13155,10 +13159,12 @@ aarch64_register_move_cost (machine_mode mode,
     = aarch64_tune_params.regmove_cost;

   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS)
+  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS
+      || to == STUB_REGS)
     to = GENERAL_REGS;

-  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS)
+  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS
+      || from == STUB_REGS)
     from = GENERAL_REGS;

   /* Make RDFFR very expensive.  In particular, if we know that the FFR
@@ -22957,6 +22963,215 @@ aarch64_sls_barrier (int mitigation_required)
	 : "";
 }

+static GTY (()) tree aarch64_sls_shared_thunks[30];
+static GTY (()) bool aarch64_sls_shared_thunks_needed = false;
+const char *indirect_symbol_names[30] = {
+    "__call_indirect_x0",
+    "__call_indirect_x1",
+    "__call_indirect_x2",
+    "__call_indirect_x3",
+    "__call_indirect_x4",
+    "__call_indirect_x5",
+    "__call_indirect_x6",
+    "__call_indirect_x7",
+    "__call_indirect_x8",
+    "__call_indirect_x9",
+    "__call_indirect_x10",
+    "__call_indirect_x11",
+    "__call_indirect_x12",
+    "__call_indirect_x13",
+    "__call_indirect_x14",
+    "__call_indirect_x15",
+    "", /* "__call_indirect_x16", */
+    "", /* "__call_indirect_x17", */
+    "__call_indirect_x18",
+    "__call_indirect_x19",
+    "__call_indirect_x20",
+    "__call_indirect_x21",
+    "__call_indirect_x22",
+    "__call_indirect_x23",
+    "__call_indirect_x24",
+    "__call_indirect_x25",
+    "__call_indirect_x26",
+    "__call_indirect_x27",
+    "__call_indirect_x28",
+    "__call_indirect_x29",
+};
+
+/* Function to create a BLR thunk.  This thunk is used to mitigate straight
+   line speculation.  Instead of a simple BLR that can be speculated past,
+   we emit a BL to this thunk, and this thunk contains a BR to the relevant
+   register.  These thunks have the relevant speculation barriers put after
+   their indirect branch so that speculation is blocked.
+
+   We use such a thunk so the speculation barriers are kept off the
+   architecturally executed path in order to reduce the performance overhead.
+
+   When optimizing for size we use stubs shared by the linked object.
+   When optimizing for performance we emit stubs for each function in the hope
+   that the branch predictor can better train on jumps specific for a given
+   function.  */
+rtx
+aarch64_sls_create_blr_label (int regnum)
+{
+  gcc_assert (STUB_REGNUM_P (regnum));
+  if (optimize_function_for_size_p (cfun))
+    {
+      /* For the thunks shared between different functions in this compilation
+         unit we use a named symbol -- this is just for users to more easily
+         understand the generated assembly.  */
+      aarch64_sls_shared_thunks_needed = true;
+      const char *thunk_name = indirect_symbol_names[regnum];
+      if (aarch64_sls_shared_thunks[regnum] == NULL)
+        {
+          /* Build a decl representing this function stub and record it for
+             later.  We build a decl here so we can use the GCC machinery for
+             handling sections automatically (through `get_named_section` and
+             `make_decl_one_only`).  That saves us a lot of trouble handling
+             the specifics of different output file formats.  */
+          tree decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+                                  get_identifier (thunk_name),
+                                  build_function_type_list (void_type_node,
+                                                            NULL_TREE));
+          DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
+                                           NULL_TREE, void_type_node);
+          TREE_PUBLIC (decl) = 1;
+          TREE_STATIC (decl) = 1;
+          DECL_IGNORED_P (decl) = 1;
+          DECL_ARTIFICIAL (decl) = 1;
+          make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+          resolve_unique_section (decl, 0, false);
+          aarch64_sls_shared_thunks[regnum] = decl;
+        }
+
+      return gen_rtx_SYMBOL_REF (Pmode, thunk_name);
+    }
+
+  if (cfun->machine->call_via[regnum] == NULL)
+    cfun->machine->call_via[regnum]
+      = gen_rtx_LABEL_REF (Pmode, gen_label_rtx ());
+  return cfun->machine->call_via[regnum];
+}
+
+/* Helper function for aarch64_sls_emit_blr_function_thunks and
+   aarch64_sls_emit_shared_blr_thunks below.  */
+static void
+aarch64_sls_emit_function_stub (FILE *out_file, int regnum)
+{
+  /* Save in x16 and branch to that function so this transformation does
+     not prevent jumping to `BTI c` instructions.  */
+  asm_fprintf (out_file, "\tmov\tx16, x%d\n", regnum);
+  asm_fprintf (out_file, "\tbr\tx16\n");
+}
+
+/* Emit all BLR stubs for this particular function.
+   Here we emit all the BLR stubs needed for the current function.  Since we
+   emit these stubs in a consecutive block we know there will be no speculation
+   gadgets between each stub, and hence we only emit a speculation barrier at
+   the end of the stub sequences.
+
+   This is called in the TARGET_ASM_FUNCTION_EPILOGUE hook.  */
+void
+aarch64_sls_emit_blr_function_thunks (FILE *out_file)
+{
+  if (! aarch64_harden_sls_blr_p ())
+    return;
+
+  bool any_functions_emitted = false;
+  /* We must save and restore the current function section since this assembly
+     is emitted at the end of the function.  This means it can be emitted *just
+     after* the cold section of a function.  That cold part would be emitted in
+     a different section.  That switch would trigger a `.cfi_endproc` directive
+     to be emitted in the original section and a `.cfi_startproc` directive to
+     be emitted in the new section.  Switching to the original section without
+     restoring would mean that the `.cfi_endproc` emitted as a function ends
+     would happen in a different section -- leaving an unmatched
+     `.cfi_startproc` in the cold text section and an unmatched `.cfi_endproc`
+     in the standard text section.  */
+  section *save_text_section = in_section;
+  switch_to_section (function_section (current_function_decl));
+  for (int regnum = 0; regnum < 30; ++regnum)
+    {
+      rtx specu_label = cfun->machine->call_via[regnum];
+      if (specu_label == NULL)
+        continue;
+
+      targetm.asm_out.print_operand (out_file, specu_label, 0);
+      asm_fprintf (out_file, ":\n");
+      aarch64_sls_emit_function_stub (out_file, regnum);
+      any_functions_emitted = true;
+    }
+  if (any_functions_emitted)
+    /* Can use the SB if needs be here, since this stub will only be used
+       by the current function, and hence for the current target.  */
+    asm_fprintf (out_file, "\t%s\n", aarch64_sls_barrier (true));
+  switch_to_section (save_text_section);
+}
+
+/* Emit shared BLR stubs for the current compilation unit.
+   Over the course of compiling this unit we may have converted some BLR
+   instructions to a BL to a shared stub function.  This is where we emit those
+   stub functions.
+   This function is for the stubs shared between different functions in this
+   compilation unit.  We share when optimizing for size instead of speed.
+
+   This function is called through the TARGET_ASM_FILE_END hook.  */
+void
+aarch64_sls_emit_shared_blr_thunks (FILE *out_file)
+{
+  if (! aarch64_sls_shared_thunks_needed)
+    return;
+
+  for (int regnum = 0; regnum < 30; ++regnum)
+    {
+      tree decl = aarch64_sls_shared_thunks[regnum];
+      if (!decl)
+        continue;
+
+      const char *name = indirect_symbol_names[regnum];
+      switch_to_section (get_named_section (decl, NULL, 0));
+      ASM_OUTPUT_ALIGN (out_file, 2);
+      targetm.asm_out.globalize_label (out_file, name);
+      /* Only emits if the compiler is configured for an assembler that can
+         handle visibility directives.  */
+      targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
+      ASM_OUTPUT_TYPE_DIRECTIVE (out_file, name, "function");
+      ASM_OUTPUT_LABEL (out_file, name);
+      aarch64_sls_emit_function_stub (out_file, regnum);
+      /* Use the most conservative target to ensure it can always be used by
+         any function in the translation unit.  */
+      asm_fprintf (out_file, "\tdsb\tsy\n\tisb\n");
+      ASM_DECLARE_FUNCTION_SIZE (out_file, name, decl);
+    }
+}
+
+/* Implement TARGET_ASM_FILE_END.  */
+void
+aarch64_asm_file_end ()
+{
+  aarch64_sls_emit_shared_blr_thunks (asm_out_file);
+  /* Since this function will be called for the ASM_FILE_END hook, we ensure
+     that what would be called otherwise (e.g. `file_end_indicate_exec_stack`
+     for FreeBSD) still gets called.  */
+#ifdef TARGET_ASM_FILE_END
+  TARGET_ASM_FILE_END ();
+#endif
+}
+
+const char *
+aarch64_indirect_call_asm (rtx addr)
+{
+  gcc_assert (REG_P (addr));
+  if (aarch64_harden_sls_blr_p ())
+    {
+      rtx stub_label = aarch64_sls_create_blr_label (REGNO (addr));
+      output_asm_insn ("bl\t%0", &stub_label);
+    }
+  else
+    output_asm_insn ("blr\t%0", &addr);
+  return "";
+}
  395. +
  396. /* Target-specific selftests. */
  397. #if CHECKING_P
  398. @@ -23507,6 +23722,12 @@ aarch64_libgcc_floating_mode_supported_p
  399. #undef TARGET_MD_ASM_ADJUST
  400. #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
  401. +#undef TARGET_ASM_FILE_END
  402. +#define TARGET_ASM_FILE_END aarch64_asm_file_end
  403. +
  404. +#undef TARGET_ASM_FUNCTION_EPILOGUE
  405. +#define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks
  406. +
  407. struct gcc_target targetm = TARGET_INITIALIZER;
  408. #include "gt-aarch64.h"

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8e0fc37..7331450 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -643,6 +643,16 @@ extern unsigned aarch64_architecture_version;
 #define GP_REGNUM_P(REGNO) \
   (((unsigned) (REGNO - R0_REGNUM)) <= (R30_REGNUM - R0_REGNUM))

+/* Registers known to be preserved over a BL instruction.  This consists of the
+   GENERAL_REGS without x16, x17, and x30.  The x30 register is changed by the
+   BL instruction itself, while the x16 and x17 registers may be used by
+   veneers which can be inserted by the linker.  */
+#define STUB_REGNUM_P(REGNO) \
+  (GP_REGNUM_P (REGNO) \
+   && (REGNO) != R16_REGNUM \
+   && (REGNO) != R17_REGNUM \
+   && (REGNO) != R30_REGNUM) \
+
 #define FP_REGNUM_P(REGNO) \
   (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM))
@@ -667,6 +677,7 @@ enum reg_class
 {
   NO_REGS,
   TAILCALL_ADDR_REGS,
+  STUB_REGS,
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
@@ -689,6 +700,7 @@ enum reg_class
 { \
   "NO_REGS", \
   "TAILCALL_ADDR_REGS", \
+  "STUB_REGS", \
   "GENERAL_REGS", \
   "STACK_REG", \
   "POINTER_REGS", \
@@ -708,6 +720,7 @@ enum reg_class
 { \
   { 0x00000000, 0x00000000, 0x00000000 }, /* NO_REGS */ \
   { 0x00030000, 0x00000000, 0x00000000 }, /* TAILCALL_ADDR_REGS */ \
+  { 0x3ffcffff, 0x00000000, 0x00000000 }, /* STUB_REGS */ \
   { 0x7fffffff, 0x00000000, 0x00000003 }, /* GENERAL_REGS */ \
   { 0x80000000, 0x00000000, 0x00000000 }, /* STACK_REG */ \
   { 0xffffffff, 0x00000000, 0x00000003 }, /* POINTER_REGS */ \
@@ -862,6 +875,8 @@ typedef struct GTY (()) machine_function
   struct aarch64_frame frame;
   /* One entry for each hard register.  */
   bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
+  /* One entry for each general purpose register.  */
+  rtx call_via[SP_REGNUM];
   bool label_is_assembled;
 } machine_function;
 #endif

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index dda04ee..43da754 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1022,16 +1022,15 @@
 )

 (define_insn "*call_insn"
-  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Usf"))
+  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "Ucr, Usf"))
	 (match_operand 1 "" ""))
    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%0
+  * return aarch64_indirect_call_asm (operands[0]);
   bl\\t%c0"
-  [(set_attr "type" "call, call")]
-)
+  [(set_attr "type" "call, call")])

 (define_expand "call_value"
   [(parallel
@@ -1050,13 +1049,13 @@

 (define_insn "*call_value_insn"
   [(set (match_operand 0 "" "")
-	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Usf"))
+	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "Ucr, Usf"))
	      (match_operand 2 "" "")))
    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%1
+  * return aarch64_indirect_call_asm (operands[1]);
   bl\\t%c1"
   [(set_attr "type" "call, call")]
 )

diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index d993268..8cc6f50 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -24,6 +24,15 @@
 (define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
   "@internal Registers suitable for an indirect tail call")

+(define_register_constraint "Ucr"
+  "aarch64_harden_sls_blr_p () ? STUB_REGS : GENERAL_REGS"
+  "@internal Registers to be used for an indirect call.
+   This is usually the general registers, but when we are hardening against
+   Straight Line Speculation we disallow x16, x17, and x30 so we can use
+   indirection stubs.  These indirection stubs cannot use the above registers
+   since they will be reached by a BL that may have to go through a linker
+   veneer.")
+
 (define_register_constraint "w" "FP_REGS"
   "Floating point and SIMD vector registers.")

diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec..1754b1e 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -32,7 +32,8 @@

 (define_predicate "aarch64_general_reg"
   (and (match_operand 0 "register_operand")
-       (match_test "REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
+       (match_test "REGNO_REG_CLASS (REGNO (op)) == STUB_REGS
+                    || REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))

 ;; Return true if OP a (const_int 0) operand.
 (define_predicate "const0_operand"

diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
new file mode 100644
index 0000000..b1fb754
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mharden-sls=blr -mbranch-protection=bti" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   Here we also check that there are no BR instructions with anything except an
+   x16 or x17 register.  This is because a `BTI c` instruction can be branched
+   to using a BLR instruction using any register, but can only be branched to
+   with a BR using an x16 or x17 register.
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not {\tblr\t} } } */
+/* { dg-final { scan-assembler-not {\tbr\tx(?!16|17)} } } */
+/* { dg-final { scan-assembler {\tbr\tx(16|17)} } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
new file mode 100644
index 0000000..88bafff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
@@ -0,0 +1,33 @@
+/* { dg-additional-options "-mharden-sls=blr -save-temps" } */
+/* Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   We only test that all BLR instructions have been removed, not that the
+   resulting code makes sense.  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not {\tblr\t} } } */
+/* { dg-final { scan-assembler {\tbr\tx[0-9][0-9]?} } } */
--
2.7.4