Hardware design pattern lecture, Sept 24

Greg presented: LCS accelerator

The LCS (longest common subsequence) problem is the heart of diff; it computes the total length of the matched characters.
O(N^2) complexity, O(N) parallelism, O(N) space

Parallel software implementation
  OPL structural pattern involved: Master/Worker
  Blocking creates coarse-grained parallelism and allows for load balancing

Hardware implementation
  LCS accelerator: OPL structural pattern is pipeline
  HPL
    In-order pipeline
    Systolic array
    No load balancing or scheduling
    Perfect scaling
  Infrastructure
    Virtual Stream (FIFO) inputs
    Control/output registers

The LCS wavefront sweeps to the right --> each node only needs to access the x value, the y value, and the neighbor values. That is why it only needs O(N) space.

Application patterns in LCS: pipeline, dynamic programming

Machine organization
  Systolic (LCS core)
  Heterogeneous CPU + Acc/FPGA
  Feedback loops in each LCS node

PMS layer
  In-order pipeline
  FIFO/shift-register memory

Infrastructure (RCBIOS)
  Self-timed VS inputs
  Multi-stage network
  XLink comm channels

Drill down into NoC XLink, etc.

Doing a conditional in hardware --> mux structure
Parallel prefix can be a pattern
Bounded-depth applications can use a structure similar to this LCS design, but many applications have unbounded depth, which requires RAM; accessing that RAM in parallel may require multiport RAM.
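A minimal Python sketch of the wavefront idea in the LCS notes above. It is not from the lecture: the names `lcs_cell` and `lcs_length_wavefront` are hypothetical, and the loop is a software stand-in for the systolic array, not a model of the actual accelerator. It assumes the standard LCS recurrence and sweeps the table one anti-diagonal at a time. Every cell on a diagonal depends only on the two previous diagonals, so the cells are independent of each other (O(N) parallelism) and only two diagonals need to be stored (O(N) space).

```python
def lcs_cell(x_ch, y_ch, diag, up, left):
    """One LCS node: a comparator selecting between two candidates (mux-like)."""
    return diag + 1 if x_ch == y_ch else max(up, left)


def lcs_length_wavefront(x, y):
    """Length of the longest common subsequence of x and y.

    Cell (i, j) lies on anti-diagonal d = i + j. Each diagonal is stored as a
    list indexed by i; border cells (i == 0 or j == 0) stay at 0.
    """
    n, m = len(x), len(y)
    prev2 = [0] * (n + 1)  # diagonal d - 2
    prev1 = [0] * (n + 1)  # diagonal d - 1
    for d in range(2, n + m + 1):
        cur = [0] * (n + 1)
        lo = max(1, d - m)   # keep j = d - i <= m
        hi = min(n, d - 1)   # keep j = d - i >= 1
        # In hardware, every i in this range would be computed in the same step.
        for i in range(lo, hi + 1):
            j = d - i
            cur[i] = lcs_cell(
                x[i - 1], y[j - 1],
                diag=prev2[i - 1],  # cell (i-1, j-1)
                up=prev1[i - 1],    # cell (i-1, j)
                left=prev1[i],      # cell (i, j-1)
            )
        prev2, prev1 = prev1, cur
    return prev1[n]  # cell (n, m) sits on the last diagonal at index n


if __name__ == "__main__":
    print(lcs_length_wavefront("ABCBDAB", "BDCABA"))  # 4
```

The conditional expression in `lcs_cell` corresponds to the note that a conditional in hardware becomes a mux structure: both candidate values are available and the character comparison selects one.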
Applications with unbounded depth become memory-parallelism bound as the number of values that must be accessed increases.

Andrew presented: IIT video controller/processor

H.261/MPEG encoding and decoding
Still used today for low-end DVD; used for cameras, VCRs, etc.
Separated into two parts: vision controller and vision processor
  vision controller --> vision processor
The controller has a MIPS-X RISC core and uses DMA and a frame buffer to communicate with the vision processor, which runs the coder and decoder
Vision processor: programmable, with a SIMD data path and local store
This is a heterogeneous SIMD machine
Interconnects are buses or point-to-point connections; there are no networks
High-level organization looks random