B. Performance expectations

Psyco can compile code that uses arbitrary object types and extension modules. Operations that it does not know about will be compiled into direct calls to the C code that implements them. However, some specific operations can be optimized, and sometimes massively so -- this is the core idea around which Psyco is built, and the reason for the sometimes impressive results.

The other reason for the performance improvement is that the machine code does not have to decode the pseudo-code (``bytecode'') over and over again while interpreting it. Removing this overhead is what compilers classically do; they also simplify the frame objects, making function calls more efficient. Psyco does both. But doing only this would be ineffective with Python, because each bytecode instruction still implies a lot of run-time decoding (typically, looking up the type of the arguments in tables, invoking the corresponding operation, and building a resulting Python object).

These type-based look-ups, and the successive construction and destruction of objects for all intermediate values, are what Psyco can most successfully eliminate -- but it must be taught about a type and its operations before it can do so.

We list below the specifically optimized types and operations. The quoted performance gains are rough estimates: specialization is known to give gains that are often good but hard to predict. Remember, all operations not listed below still work fine -- they just cannot be much accelerated.

A performance killer is the use of the built-in functions map and filter: never use them with Psyco. Replace them with list comprehensions (see 2.4). The reason is that entering Psyco-compiled code from non-Psyco-accelerated (Python or C) code is quite slow -- slower than a normal Python function call. The map and filter functions typically result in a very large number of calls from C code into a lambda expression foolishly compiled by Psyco. The exception to this rule is calling map or filter with a built-in function, in which case they are typically slightly faster than a list comprehension, because the only difference is then that the loop is performed by C code instead of by Psyco-generated code. Still, I generally recommend that you forget about map and filter and use the "Pythonic" way.
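For illustration, here is a minimal sketch of the three cases above (the helper names are only illustrative; in a Psyco-enabled program you would enable compilation first, e.g. with psyco.full()):

```python
# Slow under Psyco: each element triggers a costly call from C code
# into the compiled lambda.
def double_with_map(values):
    return list(map(lambda x: x * 2, values))

# Fast under Psyco: the whole loop is compiled as a single unit.
def double_with_listcomp(values):
    return [x * 2 for x in values]

# Acceptable: map with a *built-in* function keeps the loop in C code.
def stringify_with_map(values):
    return list(map(str, values))
```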

Virtual-time objects are objects that, when used as intermediate values, are simply not built at run-time at all. The noted performance gains only apply if the object can actually remain virtual. Any unsupported operation will force the involved objects to be built normally.

Type                           Operations                                      Notes
Any built-in type              reading members and methods                     (1)
Built-in function and method   call                                            (1)
Integer                        truth testing, unary + - ~ abs(),               (2)
                               binary + - * | & << >> ^, comparison
Dictionary                     len()                                           (4)
Float                          truth testing, unary + - abs(),                 (5)
                               binary + - * /, comparison
Function                       call                                            (6)
Sequence iterators             for                                             (7)
List                           len(), item get and set, concatenation          (8)
Long                           all arithmetic operations                       (9)
Instance method                call                                            (1)
String                         len(), item get, slicing, concatenation         (10)
Tuple                          len(), item get, concatenation                  (11)
Type                           call
array.array                    item get, item set                              (15)

Built-in function              Notes
range                          (8)
xrange                         (13)
chr, ord                       (10)
len, abs, divmod
apply                          (14)
the whole math module          (16)
map, filter                    not supported (17)


(1) In the common "object.method(args)" the intermediate bound method object is never built; it is translated into a direct call to the function that implements the method. For C methods, the underlying PyMethodDef structure is decoded at compile-time. Algorithms doing repetitive calls to methods of e.g. lists or strings can see huge benefits.
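As a sketch (the helper name is only illustrative), a loop like the following is exactly the kind of method-heavy code this optimization targets:

```python
def collect_words(lines):
    # Every line.strip() and words.append(...) call would normally
    # create a temporary bound-method object; Psyco instead emits a
    # direct call to the C function implementing the method.
    words = []
    for line in lines:
        stripped = line.strip()
        if stripped:
            words.append(stripped)
    return words
```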

(2) Virtual-time integers can be 100 times faster than their regular counterparts.

(4) Complex data structures are not optimized yet, beyond (1). In a future version it is planned to allow these structures to be re-implemented differently by Psyco, with an implementation that depends on actual run-time usage.

(5) Psyco does not know about the Intel FPU instruction set. It emits calls to C functions that just add or multiply two doubles together. Virtual-time floats are still about 10 times faster than Python.
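A small sketch of the kind of float-heavy loop that benefits (the function name is only illustrative):

```python
def dot_product(xs, ys):
    # The float produced by each x * y stays virtual; only the final
    # accumulated total is ever built as a real Python float object.
    total = 0.0
    for x, y in zip(xs, ys):
        total = total + x * y
    return total
```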

(6) Virtual-time functions occur when defining a function inside another function, with some default arguments.

(7) Sequence iterators are virtual-time, making for loops over sequences as efficient as what you would write in C.

(8) Short lists and ranges of step 1 are virtualized. A for loop over a range is as efficient as the common C for loop. For the other cases of lists see (4).
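For example (the function name is only illustrative), a counting loop like this one never materializes the range as a real list:

```python
def sum_of_squares(n):
    # range(n) stays virtual: no list of n integers is built, and the
    # loop compiles down to a plain counting loop.
    total = 0
    for i in range(n):
        total = total + i * i
    return total
```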

(9) Minimal support only. Objects of this type are never virtualized. The majority of the CPU time is probably spent doing the actual operation anyway, not in the Python glue.

(10) Virtual-time strings come in many flavors: single characters implemented as a single byte; slices implemented as a pointer to a portion of the full string; concatenated strings implemented as a (possibly virtual) list of the strings this string is the join of. Text-manipulation algorithms should see massive speed-ups.
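A sketch of such an algorithm (the function name is only illustrative; it assumes lowercase ASCII input):

```python
def rot13(text):
    # Each one-character string and each concatenation below can stay
    # virtual; the complete result string is only built at the end.
    # Assumes lowercase a-z input only.
    result = ''
    for ch in text:
        result = result + chr((ord(ch) - 97 + 13) % 26 + 97)
    return result
```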

(11) Programs manipulating small tuples in local variables can see them completely virtualized away. In general, however, the gains with tuples mostly come from the various places where Python (and Psyco, which mimics it) internally manipulates tuples.
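A sketch of the "small tuples in local variables" case (the function name is only illustrative):

```python
def translate(point, dx, dy):
    # The (x, y) tuple argument and the result tuple can both remain
    # virtual when the caller builds and unpacks them locally.
    x, y = point
    return (x + dx, y + dy)
```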

(13) Psyco can optimize range well enough to make xrange useless. Indeed, with no specific support xrange would be less efficient than range! Currently xrange is almost identical to range.

(14) Without keyword argument dictionary.

(15) Type codes 'I' and 'L' are not supported. Type code 'f' does not support item assignment. The speed of a complex algorithm using an array as buffer (like manipulating an image pixel-by-pixel) should be very high; closer to C than plain Python.
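A sketch of such a pixel-by-pixel buffer loop (the function name is only illustrative):

```python
import array

def invert_pixels(pixels):
    # Type code 'B' (unsigned byte) supports both item get and item
    # set, so the loop body can run close to C speed under Psyco.
    buf = array.array('B', pixels)
    for i in range(len(buf)):
        buf[i] = 255 - buf[i]
    return buf.tolist()
```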

(16) Missing: frexp, ldexp, log, log10, modf. See note (5).

(17) Systematically avoid map and filter and replace them with list comprehensions (section 2.4).