Writing a MachineFunctionPass in LLVM
January 27, 2017
I’ve been hacking on LLVM lately and I recently needed to write a MachineFunctionPass
to analyze some IR instructions while they got converted to assembly, since I was
working with machine-dependent representations in LLVM as opposed to machine-independent IR.
Unfortunately, LLVM’s splendid Writing an LLVM Pass doc (which has a great
introduction to IR-level passes) didn’t fully cover how to write a
MachineFunctionPass (or rather, how to get it running), at least not well
enough for noobs like me to understand. This mailing list thread was
invaluable for getting started, but I’ll elaborate a bit more.
The post below assumes you’ve read and understand the Writing an LLVM Pass doc. It also assumes you have a basic familiarity with the kind of tools LLVM offers out-of-the-box.
LLVM’s opt tool doesn’t make any machine-dependent optimizations and only runs
on IR, so it makes sense that MachineFunctionPasses don’t work in opt, since
they only run on MachineInstrs (if this doesn’t make sense to you, check out
Eli Bendersky’s Life of an instruction in LLVM post).
The cool part of opt is that you can write a pass out-of-source and
dynamically load it as a shared object library into opt without recompiling
the entire opt tool, which takes a shit-ton of time. Unfortunately, there is no
such nice modular way to write machine-dependent passes for llc. You simply
need to hack LLVM’s source to get llc to run your MachineFunctionPass when
you invoke it for the architecture you’re working on.
Enough talk, let’s dive in!
So, let’s say I want to write a MachineFunctionPass that dumps the
MachineInstrs in each MachineFunction. Let’s call our file
X86MachineInstrPrinter.cpp, living in lib/Target/X86.
#include "X86.h"
#include "X86InstrInfo.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/Support/raw_ostream.h" // for outs()
#include "llvm/Target/TargetRegisterInfo.h"

using namespace llvm;

#define X86_MACHINEINSTR_PRINTER_PASS_NAME "Dummy X86 machineinstr printer pass"

namespace {

class X86MachineInstrPrinter : public MachineFunctionPass {
public:
  static char ID;

  X86MachineInstrPrinter() : MachineFunctionPass(ID) {
    initializeX86MachineInstrPrinterPass(*PassRegistry::getPassRegistry());
  }

  bool runOnMachineFunction(MachineFunction &MF) override;

  StringRef getPassName() const override { return X86_MACHINEINSTR_PRINTER_PASS_NAME; }
};

char X86MachineInstrPrinter::ID = 0;

bool X86MachineInstrPrinter::runOnMachineFunction(MachineFunction &MF) {
  for (auto &MBB : MF) {
    outs() << "Contents of MachineBasicBlock:\n";
    outs() << MBB << "\n";
    // getBasicBlock() returns null when the block has no IR counterpart.
    if (const BasicBlock *BB = MBB.getBasicBlock()) {
      outs() << "Contents of BasicBlock corresponding to MachineBasicBlock:\n";
      // Dereference so the block itself is printed, not its address.
      outs() << *BB << "\n";
    }
  }
  return false; // we only print, so the function is left unmodified
}

} // end of anonymous namespace

INITIALIZE_PASS(X86MachineInstrPrinter, "x86-machineinstr-printer",
                X86_MACHINEINSTR_PRINTER_PASS_NAME,
                true, // is CFG only?
                true  // is analysis?
)

namespace llvm {
// The name must match the declaration in X86.h and the call in X86TargetMachine.cpp.
FunctionPass *createX86MachineInstrPrinter() { return new X86MachineInstrPrinter(); }
}
Whenever you start navigating a codebase as intimidating as LLVM’s, you often wonder,
“How the heck do you figure out how xyz works without documentation?” The clichéd
answer is simply: read the source. I ended up making friends with grep -nr "[search term]" .
and ctags, and life got a bit better.
You might want to go through the LLVM Target-Independent Code Generator doc (and optionally, the Machine IR (MIR) Format Reference Manual) now.
The crucial takeaway from the first link is that all optimizations on
MachineInstrs take the form of MachineFunctionPasses. If you follow the LLVM
Reviews page (and you should!), try to find a review involving such an
optimization. The diffs should give you an idea of the additions you need to make
to get your stuff working (and shhh, give you some sample code). My helper link was this
and my sample file was lib/Target/X86/X86EvexToVex.cpp.
Moving on, add the following to X86.h:
FunctionPass *createX86MachineInstrPrinter();
void initializeX86MachineInstrPrinterPass(PassRegistry &);
Then, in lib/Target/X86/X86TargetMachine.cpp, add the snippet below. Note
that we’ll be adding our pass under the addPreRegAlloc() function because we
want to print our MachineInstrs before register allocation takes place.
extern "C" void LLVMInitializeX86Target() {
  // ...
  PassRegistry &PR = *PassRegistry::getPassRegistry();
  // ...
  initializeX86MachineInstrPrinterPass(PR);
}

// ...

void X86PassConfig::addPreRegAlloc() {
  if (getOptLevel() != CodeGenOpt::None) {
    // ...
  }
  // ...
  addPass(createX86MachineInstrPrinter());
}
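To see why the hook you pick decides when your pass runs, here is a toy model of a pass pipeline in plain C++. It is only a sketch under loud assumptions: ToyPassConfig, ToyX86PassConfig, buildPipeline() and the pass names are all made up for illustration, and LLVM's real pipeline has far more stages. Only the hook names addPreRegAlloc() and addPostRegAlloc() are borrowed from LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a target's pass configuration.
class ToyPassConfig {
public:
  virtual ~ToyPassConfig() = default;

  // Builds the pipeline in a fixed order, calling the hooks at known points.
  std::vector<std::string> buildPipeline() {
    Pipeline.clear();
    addPass("instruction-selection");
    addPreRegAlloc();  // hook: anything added here lands before regalloc
    addPass("register-allocation");
    addPostRegAlloc(); // hook: anything added here lands after regalloc
    addPass("asm-printer");
    return Pipeline;
  }

protected:
  // Targets override these hooks to inject their own passes.
  virtual void addPreRegAlloc() {}
  virtual void addPostRegAlloc() {}

  void addPass(const std::string &Name) {
    assert(!Name.empty() && "a pass needs a name");
    Pipeline.push_back(Name);
  }

private:
  std::vector<std::string> Pipeline;
};

// Toy target: injects the printer right before register allocation.
class ToyX86PassConfig : public ToyPassConfig {
protected:
  void addPreRegAlloc() override { addPass("machineinstr-printer"); }
};
```

Calling buildPipeline() on a ToyX86PassConfig puts "machineinstr-printer" between instruction selection and register allocation. Moving the addPass() call into addPostRegAlloc() would instead make the printer run after register allocation.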
Finally, add X86MachineInstrPrinter.cpp to the CMakeLists.txt in lib/Target/X86,
and compile LLVM from your build directory. If you have a computer like mine,
you might want to get a cup of coffee even though you only need to recompile llc.
The next time you run llc, you’ll see your MachineInstrs being printed. How exciting!
Phew, that was long. Hopefully, this gave you some insight into how one goes about figuring out a large codebase like LLVM without losing one’s mind or going in too deep looking for reasoning (“To make an apple pie from scratch, you must first invent the universe.” - Carl Sagan).
Bonus observation:
You might notice that all the createXYZ() and initializeXYZPass() functions
follow the same naming scheme. You might think this is just a convention; why not
try changing one of them when you define them in X86MachineInstrPrinter.cpp?
You’ll be greeted with a bunch of cryptic error messages.
To understand them, take a look at /include/llvm/PassSupport.h. And holy shit, the
entire file is full of giant macros in the wild with all the function names
hardcoded in…
#define INITIALIZE_PASS(passName, arg, name, cfg, analysis)                    \
  static void *initialize##passName##PassOnce(PassRegistry &Registry) {        \
    PassInfo *PI = new PassInfo(                                               \
        name, arg, &passName::ID,                                              \
        PassInfo::NormalCtor_t(callDefaultCtor<passName>), cfg, analysis);     \
    Registry.registerPass(*PI, true);                                          \
    return PI;                                                                 \
  }                                                                            \
  LLVM_DEFINE_ONCE_FLAG(Initialize##passName##PassFlag);                       \
  void llvm::initialize##passName##Pass(PassRegistry &Registry) {              \
    llvm::call_once(Initialize##passName##PassFlag,                            \
                    initialize##passName##PassOnce, std::ref(Registry));       \
  }
// etc...
shudder
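If the ## token pasting above looks like black magic, here is a stripped-down imitation in plain C++ that needs no LLVM at all. Treat it as a sketch under assumptions: ToyPassInfo, ToyRegistry, TOY_INITIALIZE_PASS and HelloPrinter are all invented for this example, and a plain bool stands in for the thread-safe once-flag that the real macro uses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for LLVM's PassInfo and PassRegistry (hypothetical, simplified).
struct ToyPassInfo {
  std::string Name; // human-readable name
  std::string Arg;  // command-line style argument
  const void *ID;   // address of the pass's static ID member
};

struct ToyRegistry {
  std::vector<ToyPassInfo> Passes;
  void registerPass(const ToyPassInfo &PI) {
    assert(PI.ID && "a pass is identified by the address of its ID");
    Passes.push_back(PI);
  }
};

// Pastes passName into the generated identifiers, the way INITIALIZE_PASS does.
// The bool flag makes repeated calls harmless (not thread-safe, unlike LLVM).
#define TOY_INITIALIZE_PASS(passName, arg, name)                               \
  static bool Initialize##passName##PassFlag = false;                          \
  void initialize##passName##Pass(ToyRegistry &Registry) {                     \
    if (Initialize##passName##PassFlag)                                        \
      return;                                                                  \
    Initialize##passName##PassFlag = true;                                     \
    Registry.registerPass(ToyPassInfo{name, arg, &passName::ID});              \
  }

// A toy pass: only the static ID member matters for registration.
struct HelloPrinter {
  static char ID;
};
char HelloPrinter::ID = 0;

// Expands to: void initializeHelloPrinterPass(ToyRegistry &Registry) { ... }
TOY_INITIALIZE_PASS(HelloPrinter, "hello-printer", "Toy hello printer pass")
```

The function initializeHelloPrinterPass() is never spelled out anywhere; the macro assembles its name from the class name. Rename the class without updating the callers (or the other way round) and the compiler or linker complains about a function you never wrote by hand, which is the kind of cryptic error the bonus observation is about.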