(gmp.info.gz) Assembler Cache Handling

Info Catalog (gmp.info.gz) Assembler Carry Propagation (gmp.info.gz) Assembler Coding (gmp.info.gz) Assembler Functional Units
 
 Cache Handling
 --------------
 
 GMP aims to perform well both on operands that fit entirely in L1 cache
 and those which don't.
 
    Basic routines like `mpn_add_n' or `mpn_lshift' are often used on
 large operands, so L2 and main memory performance is important for them.
 `mpn_mul_1' and `mpn_addmul_1' are mostly used for multiply and square
 basecases, so L1 performance matters most for them, unless assembler
 versions of `mpn_mul_basecase' and `mpn_sqr_basecase' exist, in which
 case the remaining uses are mostly for larger operands.
 
    For L2 or main memory operands, memory access times will almost
 certainly be more than the calculation time.  The aim therefore is to
 maximize memory throughput, by starting a load of the next cache line
 while processing the contents of the previous one.  Clearly this is
 only possible if the chip has a lock-up free cache or some sort of
 prefetch instruction.  Most current chips have both these features.
 
    Prefetching sources combines well with loop unrolling, since a
 prefetch can be initiated once per pass through the unrolled loop (or
 more than once if a pass covers more than one cache line).
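
    As a rough illustration only, the following C sketch shows the shape
 of such a loop.  It is not GMP's code: the names `limb_t',
 `add_n_prefetch', `LIMBS_PER_LINE' and `PREFETCH_AHEAD' are hypothetical,
 the 64-byte cache line and the distance of four lines are assumptions,
 and `__builtin_prefetch' is the GCC/Clang builtin standing in for a
 CPU prefetch instruction.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;                      /* hypothetical limb type */

#define LIMBS_PER_LINE 8                      /* assumes 64-byte lines, 8-byte limbs */
#define PREFETCH_AHEAD (4 * LIMBS_PER_LINE)   /* assumed distance: 4 lines ahead */

/* Add one limb pair plus carry-in, store the sum, return carry-out. */
static limb_t add_limb(limb_t *r, limb_t a, limb_t b, limb_t carry)
{
    limb_t s = a + carry;
    limb_t c1 = s < carry;     /* wrapped only if a was all ones and carry 1 */
    limb_t t = s + b;
    limb_t c2 = t < b;         /* wrapped on the second addition */
    *r = t;
    return c1 | c2;            /* at most one of c1, c2 can be set */
}

/* rp[0..n-1] = ap[] + bp[], returning the carry out (0 or 1). */
limb_t add_n_prefetch(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t carry = 0;
    size_t i = 0;

    /* Each pass handles one cache line of each source. */
    for (; i + LIMBS_PER_LINE <= n; i += LIMBS_PER_LINE) {
        /* One prefetch per source per pass.  The guard keeps the pointer
           arithmetic inside the operands; the prefetch itself would not
           fault on a bad address. */
        if (i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(ap + i + PREFETCH_AHEAD, 0, 0);
            __builtin_prefetch(bp + i + PREFETCH_AHEAD, 0, 0);
        }
        /* Fixed trip count, standing in for a hand-unrolled body. */
        for (size_t j = 0; j < LIMBS_PER_LINE; j++)
            carry = add_limb(&rp[i + j], ap[i + j], bp[i + j], carry);
    }

    /* Leftover limbs that do not fill a whole pass. */
    for (; i < n; i++)
        carry = add_limb(&rp[i], ap[i], bp[i], carry);

    return carry;
}
```

    The unroll factor is chosen here to match the assumed line size so
 that exactly one prefetch per source is needed per pass; a longer
 unrolled body would need one prefetch per line it covers.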
 
    On CPUs without write-allocate caches, prefetching destinations will
 ensure individual stores don't go further down the cache hierarchy,
 which would limit bandwidth.  Of course for calculations which are slow
 anyway, like `mpn_divrem_1', write-throughs might be fine.
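
    A destination prefetch might look like the following sketch, which
 again uses the GCC/Clang `__builtin_prefetch' with its second argument
 set to 1 to indicate an intended write.  The routine `lshift1_prefetch_dst',
 the line size and the distance are hypothetical choices for
 illustration, and whether such a hint actually allocates the line
 depends on the CPU.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;                      /* hypothetical limb type */

#define LIMBS_PER_LINE 8                      /* assumes 64-byte lines, 8-byte limbs */
#define PREFETCH_AHEAD (4 * LIMBS_PER_LINE)   /* assumed distance: 4 lines ahead */

/* rp[0..n-1] = ap[] shifted left by one bit; returns the bit shifted out. */
limb_t lshift1_prefetch_dst(limb_t *rp, const limb_t *ap, size_t n)
{
    limb_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* Once per destination line, ask for a line further ahead to be
           brought into cache for writing (rw = 1). */
        if (i % LIMBS_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(rp + i + PREFETCH_AHEAD, 1, 3);

        limb_t a = ap[i];
        rp[i] = (a << 1) | carry;
        carry = a >> 63;       /* top bit moves into the next limb */
    }
    return carry;
}
```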
 
    The distance ahead to prefetch will be determined by memory latency
 versus throughput.  The aim of course is to have data arriving
 continuously, at peak throughput.  Some CPUs have limits on the number
 of fetches or prefetches in progress.
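
    One common back-of-envelope estimate, offered here as an assumption
 rather than as GMP's method, is that the data in flight should cover
 latency times throughput.  The hypothetical helper below turns that
 into a number of cache lines and applies an optional cap for CPUs that
 limit outstanding fetches.  The figures in the usage note are made up
 for illustration, not measurements of any chip.

```c
#include <stddef.h>

/* Estimate how many cache lines ahead to prefetch.
   latency_ns       memory latency in nanoseconds
   bytes_per_ns     sustained throughput in bytes per nanosecond
   line_bytes       cache line size in bytes (must be nonzero)
   max_outstanding  CPU limit on fetches in progress, or 0 for no limit */
size_t prefetch_distance_lines(size_t latency_ns, size_t bytes_per_ns,
                               size_t line_bytes, size_t max_outstanding)
{
    /* Bytes that must be in flight to keep data arriving continuously. */
    size_t bytes_in_flight = latency_ns * bytes_per_ns;

    /* Round up to whole cache lines. */
    size_t lines = (bytes_in_flight + line_bytes - 1) / line_bytes;

    /* Respect the limit on outstanding fetches, if there is one. */
    if (max_outstanding != 0 && lines > max_outstanding)
        lines = max_outstanding;

    return lines;
}
```

    With an 80 ns latency, 16 bytes/ns and 64-byte lines this gives
 80 * 16 / 64 = 20 lines; with a limit of 10 outstanding fetches the
 result is capped at 10, and peak throughput cannot be reached by
 prefetching alone.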
 
    If a special prefetch instruction doesn't exist then a plain load
 can be used, but in that case care must be taken not to attempt to read
 past the end of an operand, since that might produce a segmentation
 violation.
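
    The sketch below shows the kind of care meant, using a plain load as
 a stand-in for a prefetch.  The routine `com_n_touch' and the constant
 `TOUCH_AHEAD' are hypothetical; the `volatile' cast is only there to
 stop a C compiler from discarding the otherwise unused load.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;        /* hypothetical limb type */

#define LIMBS_PER_LINE 8        /* assumes 64-byte lines, 8-byte limbs */
#define TOUCH_AHEAD 32          /* assumed distance in limbs (4 lines) */

/* rp[0..n-1] = bitwise complement of ap[0..n-1]. */
void com_n_touch(limb_t *rp, const limb_t *ap, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Once per source line, touch a limb further ahead with a plain
           load.  Unlike a prefetch instruction, this load can fault, so
           it is issued only while the index is still inside the operand. */
        if (i % LIMBS_PER_LINE == 0 && i + TOUCH_AHEAD < n)
            (void) *(const volatile limb_t *) (ap + i + TOUCH_AHEAD);

        rp[i] = ~ap[i];
    }
}
```

    The bounds check costs a compare and branch per line; in assembler
 the same effect is often had by ending the prefetching loop early and
 finishing the tail with a loop that does not load ahead.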
 
    Some CPUs or systems have hardware that detects sequential memory
 accesses and initiates suitable cache movements automatically, making
 life easy.
 