Wednesday, May 07, 2008

Practical Applications of Static Bytecode Analysis and Transformation for the Java Platform - BOF-5839

DISCLAIMER
These are the tidied-up notes I took from the session at the JavaOne 2008 conference in San Francisco, California. There may well be mistakes and omissions. I will come back and correct them once the conference has completed. However, my first priority is to get the information published before the backlog gets too large and swamps me. All comments welcome. Enjoy!

The JVM
- proven and reliable
- built for the statically typed Java language
- but now there are more than 200 languages that can be compiled to bytecode
- Classes can be generated and analysed at runtime and compile time

The Class File Format
- Constant pool - field and method names, type descriptors, string literals
- Attributes - fields, methods, code
- debug information (line numbers etc, local variable names)
- exceptions
- User defined attributes
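
To make the layout above a little more concrete, here is a minimal sketch (not from the session) that reads only the first few fields of a class file: the magic number, the version and the constant pool count. The class name ClassFileHeader and its methods are made up for illustration, and it stops before the constant pool entries themselves, which is where the real parsing work begins.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only: reads the fixed-size start of a class file.
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;
    // Per the class file format this is the number of constant pool entries plus one.
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        // Class files are big-endian, which is also ByteBuffer's default byte order.
        ByteBuffer in = ByteBuffer.wrap(classBytes);
        if (in.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = in.getShort() & 0xFFFF; // u2 minor_version
        int major = in.getShort() & 0xFFFF; // u2 major_version
        int cpCount = in.getShort() & 0xFFFF; // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        // Assumes this class was compiled to disk and is reachable as a resource.
        try (InputStream in = ClassFileHeader.class.getResourceAsStream("ClassFileHeader.class")) {
            if (in == null) {
                System.out.println("class file not found as a resource");
                return;
            }
            ClassFileHeader h = parse(in.readAllBytes());
            System.out.println("version " + h.majorVersion + "." + h.minorVersion
                    + ", constant pool count " + h.constantPoolCount);
        }
    }
}
```

Everything after these ten bytes is variable length, which is why hand-rolled readers get tedious quickly.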

Class File Access and Modification Problems
- Lots of serialization and deserialization details
- constant pool management - managing constant pool indexes, references, missing or unused constants
- jump offsets - calculation of instruction offsets, inserting and removing instructions from the method
- Computation of stack size and StackMapTable - requires a control flow analysis
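
The jump offset problem is easy to underestimate, so here is a rough sketch of it (my own, not from the slides). It models instructions as a name, a byte size and an optional branch target, then shows how the relative offset of a branch changes when an instruction is inserted between the branch and its target. The Insn class is a hypothetical simplification; branch targets are held as object references rather than byte positions, which is similar in spirit to the label approach bytecode frameworks use.

```java
import java.util.ArrayList;
import java.util.List;

final class JumpOffsetSketch {
    // Hypothetical simplified instruction: size in bytes plus an optional symbolic target.
    static final class Insn {
        final String name;
        final int size;
        final Insn target; // null for non-branch instructions

        Insn(String name, int size, Insn target) {
            this.name = name;
            this.size = size;
            this.target = target;
        }
    }

    // Byte position of every instruction, computed as a running sum of sizes.
    static int[] byteOffsets(List<Insn> code) {
        int[] offsets = new int[code.size()];
        int pc = 0;
        for (int i = 0; i < code.size(); i++) {
            offsets[i] = pc;
            pc += code.get(i).size;
        }
        return offsets;
    }

    // Relative branch offset: target position minus the position of the branch itself.
    static int branchOffset(List<Insn> code, Insn branch) {
        int[] offsets = byteOffsets(code);
        return offsets[code.indexOf(branch.target)] - offsets[code.indexOf(branch)];
    }

    // Returns the branch offset before and after inserting a 3-byte call.
    static int[] demo() {
        Insn elseBranch = new Insn("iconst_0", 1, null);
        Insn ifeq = new Insn("ifeq", 3, elseBranch);
        List<Insn> code = new ArrayList<>();
        code.add(new Insn("iload_0", 1, null));
        code.add(ifeq);
        code.add(new Insn("iconst_1", 1, null));
        code.add(new Insn("ireturn", 1, null));
        code.add(elseBranch);
        code.add(new Insn("ireturn", 1, null));

        int before = branchOffset(code, ifeq);
        // Insert a callback-style call right after the branch instruction.
        code.add(2, new Insn("invokestatic", 3, null));
        int after = branchOffset(code, ifeq);
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("branch offset before: " + r[0] + ", after: " + r[1]);
    }
}
```

With raw byte arrays, every such insertion means finding and patching each branch that crosses the insertion point; with symbolic targets the offsets can simply be recomputed when the method is written out.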

There are several bytecode frameworks to help hide the details - ASM is an example

ASM Bytecode Framework (v3.1)
- From France Telecom in 2002
- Initial goal: Fast tool to do simple bytecode transformations
- Core - generate classes, basic transforms
- Tree and analysis - in memory representation, analysis algorithms
- Commons - renaming, advice adapter, inline JSR/RET subroutines, sort local variables, calculate serialVersionUID etc.
- Utilities - checker, decompiler, ASMifier (write code, compile to bytecode, then run ASMifier to see what ASM to write to create this)

"Visitor" Pattern Based
- Heavy use of the visitor pattern
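- ClassReader - the event producer. Takes a serialised class as input and fires visit events at the visitor passed to it
- MyClassAdapter - a visitor that can be chained in between to add, remove or change things. In this respect it is very similar to the SAX API
- ClassWriter - implements the visitor interface. Writes serialised output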
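
The reader, adapter and writer chain can be sketched without any library at all. The code below is NOT the ASM API: MemberVisitor, SimpleReader, ForwardingVisitor, RemoveMethodAdapter and SimpleWriter are all hypothetical names I made up to show the shape of the pattern. The reader fires events, the adapter forwards them by default and filters one out, and the writer turns whatever reaches it into output.

```java
import java.util.Arrays;
import java.util.List;

final class VisitorChainDemo {
    // Hypothetical, heavily simplified visitor interface (not ASM's ClassVisitor).
    interface MemberVisitor {
        void visitClass(String name);
        void visitMethod(String name);
        void visitEnd();
    }

    // Event producer: walks a class description and fires events in order.
    static final class SimpleReader {
        private final String className;
        private final List<String> methods;

        SimpleReader(String className, List<String> methods) {
            this.className = className;
            this.methods = methods;
        }

        void accept(MemberVisitor v) {
            v.visitClass(className);
            for (String m : methods) {
                v.visitMethod(m);
            }
            v.visitEnd();
        }
    }

    // Adapter base class: forwards every event unchanged to the next visitor.
    static class ForwardingVisitor implements MemberVisitor {
        private final MemberVisitor next;

        ForwardingVisitor(MemberVisitor next) { this.next = next; }

        @Override public void visitClass(String name) { next.visitClass(name); }
        @Override public void visitMethod(String name) { next.visitMethod(name); }
        @Override public void visitEnd() { next.visitEnd(); }
    }

    // A transformation: swallow the event for one method so it never reaches the writer.
    static final class RemoveMethodAdapter extends ForwardingVisitor {
        private final String toRemove;

        RemoveMethodAdapter(MemberVisitor next, String toRemove) {
            super(next);
            this.toRemove = toRemove;
        }

        @Override public void visitMethod(String name) {
            if (!name.equals(toRemove)) {
                super.visitMethod(name);
            }
        }
    }

    // Event consumer: builds the output from the events it receives.
    static final class SimpleWriter implements MemberVisitor {
        private final StringBuilder out = new StringBuilder();

        @Override public void visitClass(String name) { out.append("class ").append(name).append(" {"); }
        @Override public void visitMethod(String name) { out.append(" ").append(name).append("()"); }
        @Override public void visitEnd() { out.append(" }"); }

        String result() { return out.toString(); }
    }

    static String run() {
        SimpleWriter writer = new SimpleWriter();
        MemberVisitor chain = new RemoveMethodAdapter(writer, "debug");
        new SimpleReader("Example", Arrays.asList("run", "debug", "stop")).accept(chain);
        return writer.result();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "class Example { run() stop() }"
    }
}
```

Because each adapter only overrides the events it cares about, several small transformations can be stacked between the reader and the writer without knowing about each other.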

How Cobertura Uses ASM
- Instruments compiled classes
- Adds callbacks to the recording engine - for each method line, for each branching instruction
- Saves information about added callbacks
- Recording engine is called during test execution - records the number of invocations for each callback to mark code branches as tested, then compares which branches have been tested with which have not.
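
As a rough picture of what the instrumentation amounts to, here is a sketch written as plain source rather than generated bytecode. It is not Cobertura's actual code: the Recorder class, its method names and the line numbers are all assumptions for illustration. The method abs is shown the way it might look after callbacks have been added, and the recorder keeps a hit count per registered line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CoverageSketch {
    // Hypothetical recording engine: one hit counter per instrumented line.
    static final class Recorder {
        private static final Map<Integer, Integer> HITS = new TreeMap<>();

        // Called at instrumentation time: remember that a callback was added for this line.
        static void register(int line) { HITS.putIfAbsent(line, 0); }

        // Called at run time by the inserted callbacks.
        static void touch(int line) { HITS.merge(line, 1, Integer::sum); }

        static Map<Integer, Integer> hits() { return new TreeMap<>(HITS); }

        static void reset() { HITS.clear(); }
    }

    // The method as it might look after instrumentation, with a callback per line/branch.
    static int abs(int x) {
        Recorder.touch(1);
        if (x < 0) {
            Recorder.touch(2); // branch taken only for negative input
            return -x;
        }
        Recorder.touch(3);
        return x;
    }

    // Lines that were registered but never hit, i.e. untested code.
    static List<Integer> uncovered() {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : Recorder.hits().entrySet()) {
            if (e.getValue() == 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Simulates a "test run" that only exercises the non-negative path.
    static Map<Integer, Integer> run() {
        Recorder.reset();
        for (int line = 1; line <= 3; line++) {
            Recorder.register(line);
        }
        abs(5);
        return Recorder.hits();
    }

    public static void main(String[] args) {
        System.out.println("hits: " + run());
        System.out.println("uncovered lines: " + uncovered());
    }
}
```

Saving the list of registered callbacks is what makes the comparison possible: without it, a line that was never executed would simply be missing from the data rather than reported as untested.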

Google Singleton Detector
- Singletons are bad and make testing hard
- When you come to a project you want to get an overview - where is all the hairy stuff? Do it with ASM, based on heuristics: private constructor, hingleton (helper class enforcing a singleton), mingleton (class with a static method that takes no parameters and returns some state), fingleton (public static field singleton). All are ways to hide global state
- Find with ClassVisitor (iterate over all classes in jar), MethodVisitor (mingletons), FieldVisitor (fingletons - look for global state)
- They tested Catalina - try to run two instances of Tomcat in the same JVM on different ports and there will be shared state problems.
- Second tool - Testability Explorer - looks for global mutable state (just bad) and the lack of injectability (hard to test individual pieces in isolation). Assigns points to classes for goodness (dependency injection, interface / implementation separation) and badness (global state)
- http://testabilityexplorer.org & on code.google.com/p/testability-explorer
- Take these ideas, put them in a jar and add it to CI so that it flags when something hard to test is added
- It tells you the line number where a variable is not injected ("why I am marking this class hard to test")
- It tells you how easy (or hard) a project is to unit test. Not how covered it is
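
Going back to the singleton heuristics listed earlier, here is a sketch of how three of them could be checked. Note the swap: the Google tool walks class files with ASM visitors, whereas this sketch uses java.lang.reflect on already loaded classes because it needs no extra library. The rules are my simplified reading of the notes, not the tool's real ones, and the nested example classes (Registry, Config, Clock, Point) are made up.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class SingletonHeuristics {
    // Classic singleton: only private constructors plus a static field of its own type.
    static boolean looksLikeSingleton(Class<?> c) {
        Constructor<?>[] ctors = c.getDeclaredConstructors();
        if (ctors.length == 0) {
            return false;
        }
        for (Constructor<?> ctor : ctors) {
            if (!Modifier.isPrivate(ctor.getModifiers())) {
                return false;
            }
        }
        for (Field f : c.getDeclaredFields()) {
            if (!f.isSynthetic() && Modifier.isStatic(f.getModifiers()) && f.getType() == c) {
                return true;
            }
        }
        return false;
    }

    // "Fingleton" (simplified): declares a public static field. Plain constants are flagged too.
    static boolean isFingleton(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (!f.isSynthetic() && Modifier.isPublic(mods) && Modifier.isStatic(mods)) {
                return true;
            }
        }
        return false;
    }

    // "Mingleton" (simplified): a static method with no parameters that returns something.
    static boolean isMingleton(Class<?> c) {
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic()
                    && Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && m.getReturnType() != void.class) {
                return true;
            }
        }
        return false;
    }

    // Made-up example classes to run the heuristics against.
    static final class Registry {
        private static final Registry INSTANCE = new Registry();
        private Registry() { }
        static Registry getInstance() { return INSTANCE; }
    }

    static final class Config {
        public static String mode = "dev";
    }

    static final class Clock {
        private static long ticks;
        static long now() { return ticks; }
    }

    static final class Point {
        final int x;
        Point(int x) { this.x = x; }
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] {Registry.class, Config.class, Clock.class, Point.class}) {
            System.out.println(c.getSimpleName()
                    + " singleton=" + looksLikeSingleton(c)
                    + " fingleton=" + isFingleton(c)
                    + " mingleton=" + isMingleton(c));
        }
    }
}
```

A bytecode-based version has the advantage that it can scan every class in a jar without loading or initialising any of them, which matters when the classes under inspection have static initialisers with side effects.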
