Topic: Performance Evaluation


      +   Suggested reading: Heidelberger and Lavenberg, "Computer

          Performance Evaluation Methodology", IEEE Transactions on

          Computers, C-33, 12, December 1984, pp. 1195-...


      +   Performance modeling and measurement is  needed  through  the

          entire  life cycle of a system:  design, debugging, installa-

          tion, tuning, data collection for next system.


      +   Performance evaluation covers these areas:

          +   Measurement

          +   Analytic Modeling

          +   Simulation Modeling

          +   Tuning

          +   Design Improvement


      +   Good work in performance evaluation requires a good  substan-

          tive understanding of the system under study.  You can't just

          arrive with a bag of tricks (e.g.  queueing  theory)  and  do

          something useful.


      +   Measurement

          +   Advantage of dealing with something "real."  Has all  in-

              teractions, which would tend to escape the model.

          +   Disadvantage of time and effort to get data (inc. facili-

              ties needed).


          +   Not that many published measurement studies.


      +   Hardware Monitoring

          +   Use some sort of hardware monitor (e.g.  logic  analyzer)

              to collect and partially reduce data.

          +   Data is pulled off system with logic probes.

              +   Note that signals have to be  available.   (Not  from

                  middle of chip).  May have to design in probe points.

          +   Signals can be used to  count  events,  generate  traces,

              sample, etc.

          +   Hardware monitors typically have some  hard  logic  (e.g.

              counters  and  gates) backed up by some programmable con-

              trol (e.g. a minicomputer).

          +   HW monitors are difficult to get, expensive, and hard  to

              use.   Seldom  used  except by vendors and large computer

              centers.

          +   Samples should be taken at random times, not regular  in-

              tervals.  The latter will fail to get correct measures of

              events which occur at regular intervals which are  multi-

              ples or submultiples of the sampling interval.
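
          A minimal sketch of the sampling point above.  It assumes a
          hypothetical signal that is busy for the first 1 ms of every
          10 ms period, so the true busy fraction is 10%.  The function
          names and numbers are illustrative, not from any real monitor.

```python
import random

def busy(t, period=10.0, busy_len=1.0):
    """Hypothetical signal: busy during the first busy_len ms of each period."""
    return (t % period) < busy_len

def regular_estimate(interval, n, phase=0.0):
    """Busy fraction seen when sampling every `interval` ms from `phase`."""
    hits = sum(busy(phase + i * interval) for i in range(n))
    return hits / n

def random_estimate(total_time, n, seed=0):
    """Busy fraction seen when sampling at uniformly random times."""
    rng = random.Random(seed)
    hits = sum(busy(rng.uniform(0.0, total_time)) for _ in range(n))
    return hits / n

if __name__ == "__main__":
    # Sampling interval equals the event period: every sample lands on
    # the same phase, so the estimate is all-or-nothing.
    print(regular_estimate(10.0, 10_000, phase=0.5))
    print(regular_estimate(10.0, 10_000, phase=5.0))
    # Random sampling times are not locked to the event period.
    print(random_estimate(100_000.0, 10_000))
```

          With the sampling interval equal to the event period, the
          regular estimate is 1.0 or 0.0 depending only on the phase;
          the random estimate comes out close to the true 0.1.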


      +   Software Measurements

          +   Code running on system is instrumented.

          +   Can put counters or signallers in OS code,  or  compile

              them into the source code to be studied.  Can sample at

              timer interrupts (e.g. record the PC).

              +   Note that sampling won't sample something which isn't

                  interruptible.

          +   Can use built in profiling facilities  provided  by  some

              compilers.

          +   Can instrument the microcode to collect data, if  machine

              is microcoded.

          +   Automatic facilities like compilers and profilers can  do

              much of this.

          +   Can generate significant overhead  - e.g.  20%  or  more.

              (GTF)
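
          A small sketch of software instrumentation, assuming the code
          under study is a Python function.  The decorator name
          `instrumented` and the function `lookup` are hypothetical.
          Note that the wrapper itself adds work to every call, which is
          the overhead mentioned above.

```python
import time
from collections import defaultdict
from functools import wraps

call_counts = defaultdict(int)        # calls per instrumented function
elapsed_seconds = defaultdict(float)  # accumulated wall-clock time

def instrumented(func):
    """Wrap func so each call bumps a counter and accumulates elapsed time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            call_counts[func.__name__] += 1
            elapsed_seconds[func.__name__] += time.perf_counter() - start
    return wrapper

@instrumented
def lookup(table, key):
    """Stand-in for the code being studied."""
    return table.get(key)

if __name__ == "__main__":
    table = {i: i * i for i in range(1000)}
    for i in range(5000):
        lookup(table, i % 1000)
    print(call_counts["lookup"], elapsed_seconds["lookup"])
```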


      +   Hardware Counters

          +   Modern CPUs have counters built into the  hardware.   Can

              be set to count various things: branches, misses, cycles,

              instructions, etc.


      +   Multics Measurements

          +   Sampled using timer interrupts.

          +   Used hardware counter for memory cycles.

          +   Had external I/O channel which could be externally driven

              to do useful things like read certain regions of memory.

          +   Used remote terminal emulator to drive system.


      +   Diamond

          +   Diamond is an internal DEC measurement tool  -  a  hybrid

              monitor.   Hardware probes read the PC, CPU mode, channel

              and I/O device activity, and system  task  ID.   Software

              gets the user ID.  A minicomputer reads the data and does

              real time and later analysis.  Can generate traces also.


      +   IBM's GTF (General Trace Facility)

          +   IBM - generates trace records of any system events - e.g.

              I/O  interrupts,  SIOs,  opens,  traps, SVCs, dispatches,

              closes, etc.

              +   Designed for debugging.  Generates  lots  of  useless

                  data, and not as much useful data as you would like.


      +   IBM's SMF (System Management Facility)

          +   Generates records for  assorted  events  -  jobs,  tasks,

              opens, closes, etc.

              +   Designed for accounting and  management.   Also  gen-

                  erates  lots  of useless data, and not as much useful

                  data as you would like.


      +   Some mainframes have console  monitors  which  generate  real

          time  load  information and measurements (e.g. queue lengths,

          channel utilization, etc.)


      +   Workload Characterization

          +   Important part of any workload study - must know what the

              workload is.

          +   Three types of workloads for Performance Experiments:

              +   Live workloads - good for measurement  but  poor  for

                  experiments - uncontrolled.

              +   Executable workloads  consisting of real samples.


              +   Synthetic executable workloads - can be parameterized

                  versions of real workloads.

                  +   A synthetic workload may be needed as  a  projec-

                      tion of a future workload.


      +   To characterize a workload:

          +   Decide what to characterize.  (not easy - as you can see,

              many published papers are not interesting)

          +   Figure out how to characterize those items

          +   Figure out how to get the data (!!!)

          +   Get the data.

          +   Exploratory data analysis.

          +   Cluster analysis


      +   Statistical Methods

          +   Means, variances, distributions

          +   Techniques such as linear regression and factor  analysis

              quite useful.

          +   Can do statistical analysis on data  to  see  if  various

              models fit.

          +   Seldom used - little intersection between  the  class  of

              competent experimental performance analysts and competent

              statisticians.
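
          As an illustration of the regression item above, here is a
          sketch of an ordinary least-squares line fit in plain Python.
          The function names are hypothetical; in practice x might be a
          load measure and y a measured response time.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line.

    Assumes the ys are not all equal (otherwise the total sum of
    squares is zero and the ratio is undefined).
    """
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

          An r_squared value near 1 suggests the linear model fits; a
          low value suggests some other model should be tried.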


      +   Analytic Performance Modeling

          +   Build analytic model of some type of system of interest.

          +   Calculate factors of interest as function of parameters


          +   Lots of research of this type.  Some of it useful.

          +   Models tend to be  queueing  models,  stochastic  process

              models.

              +   Most of progress in queueing  theory  in  last  30-40

                  years due to computer system modeling.

                  +   Advances are:

                      +   class of queueing network  models  which  are

                          easily solved,

                      +   efficient computational algorithms for  these

                          solutions,

                      +   good approximation methods  for  systems  not

                          easily solved.


      +   Pros/Cons of Analytic Modeling

          +   They are good for capacity planning, I/O subsystem model-

              ing, preliminary design aid.

              +   Capacity planning is a  big  area  of  application  -

                  measure  and analyze current system, set up validated

                  model of current system and workload, project changes

                  in  workload, and see what sort of system design will

                  handle it.

          +   Analytic models do not capture the fine level  of  detail

              needed  for  some  things  such  as  hardware  design and

              analysis - e.g. caches, CPU pipelines.
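
          A back-of-the-envelope capacity planning sketch.  It assumes
          (this is not stated in the notes) that the bottleneck device
          behaves like an M/M/1 queue, for which the mean response time
          is R = S / (1 - rho) with utilization rho = lambda * S.  The
          function names are hypothetical.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue: R = S / (1 - lambda * S)."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("unstable: utilization must be below 1")
    return service_time / (1.0 - utilization)

def max_arrival_rate(service_time, response_target):
    """Largest arrival rate that keeps the mean response time within target.

    Solving S / (1 - lambda * S) <= R for lambda gives
    lambda <= (1 - S / R) / S.
    """
    if response_target <= service_time:
        raise ValueError("target must exceed the bare service time")
    return (1.0 - service_time / response_target) / service_time

if __name__ == "__main__":
    # Projected workload growth against a 10 ms device (assumed figures).
    for rate in (20, 50, 80, 95):
        print(rate, mm1_response_time(rate, 0.010))
```

          A validated model of the current system would replace the
          M/M/1 assumption; the projection step stays the same.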


      +   Queueing networks are a powerful technique in analytic

          modeling.


          +   Major Components of Queueing Network are:

              +   Servers

              +   Customers

              +   Routing

          +   Diagram of queueing network.

          +   Many types of queueing networks  can  be  easily  solved,

              such  as  the  following  (called "BCMP" - Baskett Chandy

              Muntz & Palacios):

              +   Customers can have a type T.

              +   Routing in system can be of form p(i,t1,j,t2),  where

                  i and j are servers, and t1 and t2 are types.

              +   Servers can be

                  +   FCFS exponential, with service rate a function of

                      queue length

                  +   Processor sharing, any distribution

                  +   Infinite server - any distribution.

                  +   LCFSPR  -  last  come,  first  serve,  preemptive

                      resume, any distribution

              +   Solution is product of terms for  each  service  sta-

                  tion.  (See JACM, 1975.)

          +   If network is not of BCMP type, probably can't be  solved

              exactly.   There  are  approximation  methods that can be

              used.
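
          One efficient algorithm for product-form networks is exact
          Mean Value Analysis (MVA).  The sketch below covers only a
          restricted case, chosen here for illustration: a closed
          network with a single customer type, fixed-rate queueing
          stations, and an optional infinite-server "think time".
          The service demand of a station is its visit count times its
          mean service time; the function name is hypothetical.

```python
def mva(service_demands, num_customers, think_time=0.0):
    """Exact single-class MVA for a closed product-form network.

    service_demands: demand (visits * mean service time) per queueing station
    num_customers:   population N of the closed network (N >= 0)
    think_time:      delay at an infinite-server station, if any
    Returns (throughput, per-station response times, per-station queue lengths).
    """
    queue = [0.0] * len(service_demands)
    resp = list(service_demands)
    throughput = 0.0
    for n in range(1, num_customers + 1):
        # An arriving customer sees the queue left by the other n - 1 customers.
        resp = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        throughput = n / (think_time + sum(resp))
        # Little's law applied to each station.
        queue = [throughput * r for r in resp]
    return throughput, resp, queue

if __name__ == "__main__":
    # Hypothetical CPU and disk demands (seconds), 10 terminals, 5 s think time.
    print(mva([0.05, 0.08], 10, think_time=5.0))
```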


      +   Simulation

          +   Types:


              +   discrete event simulation

                  +   Trace Driven

                  +   Random Number (Stochastic)

              +   continuous simulation (e.g. a diff eq) (not used  for

                  computer system modeling)

              +   Monte carlo methods (e.g. sampling)


      +   Simulation model has these components:

          +   A model of the system, which has a state.

          +   A set of events which cause changes in the state.

          +   A method for generating such a sequence of events.

          +   A measurement component, which records the statistics  of

              interest.


      +   Discrete Event general simulation model:

          +   Events come from  event  list.   (Events  can  come  from

              trace).

          +   Next event is taken off list.

          +   System state is updated.

          +   Event list is  updated.  (events  added,  deleted,  their

              times changed)

          +   Statistics are accumulated.

          +   Next event is obtained from event list.

      +   Example: simple discrete event simulation of M/G/1 queue.

      +   Events come from trace and/or random number generator.
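
          A sketch of such an M/G/1 simulation, following the event
          list steps above.  Arrivals are Poisson; the service time
          distribution is supplied by the caller.  The function and
          parameter names are hypothetical, and the statistics kept
          (mean wait in queue, utilization) are just one possible
          choice.

```python
import heapq
import random
from collections import deque

def simulate_mg1(arrival_rate, service_sampler, num_customers, seed=1):
    """Simulate an M/G/1 FCFS queue until num_customers have departed.

    service_sampler(rng) returns one service time (any distribution).
    Returns (mean wait in queue, server utilization).
    """
    if num_customers < 1:
        raise ValueError("num_customers must be at least 1")
    rng = random.Random(seed)
    event_list = []                  # heap of (time, sequence number, kind)
    seq = 0
    heapq.heappush(event_list, (rng.expovariate(arrival_rate), seq, "arrival"))

    waiting = deque()                # arrival times of queued customers
    in_service = None                # service time of the customer being served
    started = departed = 0
    total_wait = busy_time = 0.0
    now = 0.0

    while departed < num_customers:
        now, _, kind = heapq.heappop(event_list)     # next event off the list
        if kind == "arrival":                        # update the system state
            waiting.append(now)
            seq += 1                                 # schedule the next arrival
            heapq.heappush(event_list,
                           (now + rng.expovariate(arrival_rate), seq, "arrival"))
        else:                                        # departure
            busy_time += in_service
            in_service = None
            departed += 1
        if in_service is None and waiting:           # server free: start service
            total_wait += now - waiting.popleft()    # accumulate statistics
            started += 1
            in_service = service_sampler(rng)
            seq += 1
            heapq.heappush(event_list, (now + in_service, seq, "departure"))

    return total_wait / started, busy_time / now

if __name__ == "__main__":
    # Constant service time of 1.0 at arrival rate 0.5 (an M/D/1 queue).
    print(simulate_mg1(0.5, lambda rng: 1.0, 100_000))
```

          As a sanity check, the Pollaczek-Khinchine formula gives a
          mean wait of 0.5 for this M/D/1 case, and the utilization
          should be near 0.5.  A trace-driven version would read
          arrival and service times from a file instead of the random
          number generator.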


      +   There are special languages for simulation (GPSS,  SIMSCRIPT,

          GASP, SLAM, SIMULA)

      +   There are simulation modeling packages: RESQ (IBM), PAWS  (UT

          Austin), QNAP (INRIA - Potier)


      +   Analysis of simulation output

          +   Regenerative simulation -  find  regeneration  point,  do

              standard IID statistics

          +   Time Series Analysis - do time  series  analysis  (e.g.

              autocorrelation analysis) - also called Spectral method.

          +   Repeat entire simulation run

          +   Very long runs

          +   Batch means - take long samples, and consider them IID
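
          A sketch of the batch means method.  It assumes the batches
          are long enough to be treated as IID, and it uses a normal
          approximation for the confidence interval (a Student t
          quantile would be more accurate with few batches, but the
          Python standard library does not provide one).  The function
          name is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def batch_means_interval(samples, num_batches=20, confidence=0.95):
    """Return (estimate, half_width) of a confidence interval via batch means.

    The samples are split into num_batches consecutive batches of equal
    size; leftover samples at the end are dropped.
    """
    batch_size = len(samples) // num_batches
    if num_batches < 2 or batch_size < 1:
        raise ValueError("need at least 2 batches of at least 1 sample")
    batches = [mean(samples[i * batch_size:(i + 1) * batch_size])
               for i in range(num_batches)]
    estimate = mean(batches)
    std_error = stdev(batches) / num_batches ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # normal approximation
    return estimate, z * std_error
```

          If the half width does not shrink as runs get longer, the
          batches are probably too short to be independent.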


      +   Back of the Envelope methods


          Topic: Virtual Machines


          Reading:  Robert  Goldberg,  "Survey   of   Virtual   Machine

          Research", IEEE Computer, June, 1974, pp. 34-45.


      +   A virtual machine is a software supported copy of  the  basic

          (hardware) machine.

          +   Usually accomplished by allowing most instructions to run

              on real hardware.


      +   Virtual Machine Monitor (VMM) - the piece  of  software  that

          provides the pseudo-bare machine interfaces.


      +   Diagram of virtual machine idea


      +   Virtual machines run on base machines that are the same.

      +   Emulators are used to provide dissimilar bare machine  inter-

          face (i.e. different than the machine underneath).


      +   Contrasts with  operating  systems,  which  provide  extended

          machine interfaces - i.e. they are presumably better than the

          bare machine.


      +   Uses of virtual machines

          +   Run several different OS's at once (for different users)

          +   Debug versions of OS while running other users  (inc.

              diagnostics, etc.)

          +   Develop network software on single machine

          +   Run multiple OS releases

          +   Have students do systems programming.

          +   High reliability due to high  isolation  between  virtual

              machines

          +   High security (for same reason)


      +   Examples:

          +   VM/370, M44/44X (on 7044), CP-40 (on 360/40),  CP-67  (on

              360/67), Hitac 8400, UMMPS, VMware (Mendel Rosenblum)


      +   Implementation

          +   For performance reasons, let  non-sensitive  instructions

              execute directly on the bare hardware.

          +   Trap and simulate all sensitive instructions -  i.e.  any

              which could affect the VMM or any of the other VMs.

              +   If it isn't possible to trap all  sensitive  instruc-

                  tions,  then  may not be possible to build VM on that

                  machine.


      +   Memory Mapping

          +   Must map memory of VM to real machine.  Can be done  with

              page tables or base and bounds registers.

          +   The VMM itself may provide  a  paged  machine.   In  this

              case,  there  are two levels of page tables - one from OS

              to VM, and one from VM to real machine.


              +   In actual operation, will create composed page  table

                  to  map  two  tables  in one operation.  (So built in

                  mapping hardware can be used.)  This can work for ar-

                  bitrary numbers of levels.

                  +   Note that on a page fault, you have to figure out

                      who handles it.
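
          A sketch of composing the two levels of page tables, with
          each table modeled as a Python dict from page number to page
          number.  Real tables carry protection and reference bits as
          well; the function names are hypothetical.

```python
def compose_page_tables(guest_table, vmm_table):
    """Build the composed table: guest virtual page -> real page frame.

    guest_table: guest virtual page -> guest "physical" page (kept by the OS)
    vmm_table:   guest "physical" page -> real page frame (kept by the VMM)
    Pages missing at either level are left out, so touching them faults.
    """
    return {vpage: vmm_table[gpage]
            for vpage, gpage in guest_table.items()
            if gpage in vmm_table}

def who_handles_fault(vpage, guest_table, vmm_table):
    """Decide which level must service a fault on vpage (None if mapped)."""
    if vpage not in guest_table:
        return "guest OS"     # the guest OS never mapped this page
    if guest_table[vpage] not in vmm_table:
        return "VMM"          # the VMM paged out the guest's "physical" page
    return None

if __name__ == "__main__":
    guest = {0: 10, 1: 11, 2: 12}
    vmm = {10: 700, 12: 702}
    print(compose_page_tables(guest, vmm))
    print(who_handles_fault(1, guest, vmm), who_handles_fault(3, guest, vmm))
```

          The composed table must be rebuilt (or the affected entry
          invalidated) whenever either underlying table changes.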


      +   I/O

          +   It is necessary to trap and simulate I/O.

              +   Want to permit I/O only to valid areas for the VM

              +   Without interfering with other VMs

              +   The I/O code (e.g. channel programs) must be properly

                  interpreted  or  translated,  since they use real ad-

                  dresses.  (For this reason,  self  modifying  channel

                  programs  are usually prohibited, as too difficult to

                  translate.)

          +   I/O devices are usually simulated.  Each  user  is  given

              virtual  I/O devices ("mini-disks"), which look like real

              hardware devices.


      +   The VMM keeps a bit for each VM which specifies  whether  the

          VM  is  in user or supervisor state.  It can thus provide ap-

          propriate simulations of sensitive instructions.

          +   Attempts to execute sensitive instructions in user  state

              cause abends.  In supervisor state, they are executed ap-

              propriately.
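
          A toy sketch of the trap-and-simulate decision, keyed on the
          per-VM mode bit.  The instruction names in SENSITIVE and the
          class and function names are hypothetical; a real VMM would
          decode the trapped instruction and update the virtual machine
          state rather than return a string.

```python
SENSITIVE = {"load_psw", "start_io", "set_clock"}   # hypothetical opcodes

class VirtualMachine:
    def __init__(self, name):
        self.name = name
        self.supervisor = True    # virtual user/supervisor bit kept by the VMM
        self.trace = []           # record of what happened to each instruction

def vmm_execute(vm, opcode):
    """Return how the instruction is handled: direct, simulated, or abend."""
    if opcode not in SENSITIVE:
        outcome = "direct"        # runs on the bare hardware, no VMM involvement
    elif vm.supervisor:
        outcome = "simulated"     # trapped; VMM simulates it on the VM's behalf
    else:
        outcome = "abend"         # trapped; VM was in virtual user state
    vm.trace.append((outcome, opcode))
    return outcome

if __name__ == "__main__":
    vm = VirtualMachine("vm1")
    print(vmm_execute(vm, "add"), vmm_execute(vm, "start_io"))
    vm.supervisor = False         # guest OS dispatched a user program
    print(vmm_execute(vm, "start_io"))
```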


      +   Hardware Virtualizer

          +   Idea is to have hardware which dynamically maps N  levels

              of virtual machines into the actual hardware.  An associ-

              ative memory is used to keep recently evaluated mappings.

              The  hardware  virtualizer  would  therefore  avoid  most

              software simulation.

          +   Define f maps, which map from a VM interface to the machine

              below.  Define a U map, which maps from the extended machine

              (OS) to the VM below it.  U is provided by the OS.  The hardware

              virtualizer composes as many levels of f maps as necessary

              (each maps a virtual machine to the machine below it).


      +   VM Performance

          +   Will run slower than real machine due  to  simulation  of

              sensitive instructions.

          +   Specific performance degradations:

              +   support of privileged instructions

              +   maintaining  the  status  of  the  virtual  processor

                  (user/supervisor)

              +   support of paging within virtual machines

              +   console functions

              +   acceptance and reflection of interrupts to individual

                  VMs.

              +   translation of channel programs

              +   maintenance of clocks


      +   Ways to enhance performance


          +   dedicate some resources, so that they don't  have  to  be

              mapped or simulated.

          +   give certain critical VMs priority to run

          +   run virtual=real (same as #1)

          +   let VMM instead of OS do paging.  (If OS does it, it gets

              done twice.)

          +   modify OS to avoid costly (slow) features

          +   extend VMM to provide performance enhancements  (but then

              it is not truly a VM any more), e.g. VM assist on 370

          +   extend hardware to support VM


      +   Special performance problems

          +   Optimization within the OS may conflict with optimization

              within  VMM.  E.g. double paging anomaly,  buffer  paging

              problem of IMS, disk optimization where disk  is  mapped,

              spooling by VMM and also by OS.


              Topic:  Current Research in Operating Systems


      +   Most of what we have talked about in the  area  of  Operating

          Systems is not new, but goes back 20-30 years.

      +   What are people doing currently?


      +   In a recent Operating Systems Conf. Proceedings (Proc.  17th

          ACM  Symposium  on  Operating  Systems  Principles, December,

          1999), the principal topics include:

          +   Manageability, Availability and Performance in a mail

              service.

          +   Performance of Web Proxy Caching

          +   Performance of a stateless, thin-client architecture

          +   Energy aware adaptation for mobile environments.

          +   Active Networks (customized programs are executed  within

              the network).

          +   Building reliable, high  performance  communication  sys-

              tems.

          +   File system usage in Windows NT.

          +   The Elephant file system.

          +   File system security.

          +   Integrating segmentation and paging protection.

          +   Resource management on shared-memory multiprocessors.

          +   A fast capability based operating system.

          +   A naming system for dynamic networks and mobile units.

          +   A distributed virtual machine for networked computers.

          +   A modular router.


          +   Timer support for network processing.

          +   CPU priority scheduling.

          +   Scheduling for latency sensitive threads.

          +   A small, real-time microkernel.


      +   In a recent Operating Systems Conf. Proceedings (Proc.  16th

          ACM  Symposium  on  Operating  Systems  Principles,  October,

          1997), the principal topics include:

          +   Performance Analysis - profiling and distributed/parallel

              programs

          +   OS Kernels

          +   Caching in Computer Networks

          +   Transactions on Networks

          +   Security for Java

          +   Formal Analysis of Security

          +   Running Commodity OS on Scalable Multiprocessors

          +   Transparent Distributed Shared Memory

          +   Scheduler for Multimedia Applications

          +   CPU Scheduling

          +   Scalable Distributed File System

          +   Log Structured File System

          +   Reducing I/O latency

          +   File Caching and Hoarding in Mobile Systems

          +   Update Policies for Mobile Operation


      +   Some titles from 1993 SOSP:


          +   Distributed file systems

          +   RAID type file systems

          +   Synchronization and its limitations in  distributed  sys-

              tems.

          +   Distributed system design.

          +   Distributed programming

          +   Using threads.

          +   Memory Management of an object oriented language

          +   Relation between operating system  structure  and  memory

              system performance (effects on the cache of OS code).

          +   Concurrent compacting garbage collection.

          +   Improved IPC (interprocess communication)

          +   Improved fault isolation.

          +   Audio and video in a distributed system.

          +   Authentication

          +   Location info for distributed systems.


      +   From ASPLOS (Architectural Support for Programming  Languages

          and Operating Systems), 10/94, related to OS:

          +   Data and control transfer in distributed systems

          +   Scheduling and page migration for multiprocessor  compute

              servers

          +   Synchronization algorithms for multiprocessors

          +   Software overhead in message passing.

          +   Software support for exception handling.

          +   Performance monitoring.


      +   In summary:

          +   Networks and the Web

          +   Performance Issues - memory, scheduling, networks.

          +   Mobility

          +   Energy Management

          +   File Systems

          +   Protection and Security

          +   Misc: virtual machines, kernels.


      +   Personal View of Important Issues:

          +   The world is becoming one large distributed computer sys-

              tem  with file migration, process migration, load balanc-

              ing, distributed transparent file system, etc.

          +   This suggests that the important issues are:

              +   Efficient ways to write reliable OS

              +   With high performance

                  +   file migration algorithms

                  +   load balancing

                  +   distributed transparent file  system  implementa-

                      tion

                  +   wireless and mobile systems

              +   Supporting mobility

                  +   Location and naming issues

                  +   Energy management


                      Topic:  Real Time Systems


      +   Real Time Systems are Systems in which there is a  real  time

          DEADLINE

          +   Typically a mechanical system is being controlled.

              +   E.g. assembly line, anti-ballistic missile defense


      +   Real Time System must be able to:

          +   Meet all Deadlines (with 100% or 99+% probability)

          +   Handle the aggregate load.

              +   If there are N events per second,  must  be  able  to

                  handle them, whether or not each has a deadline.


      +   This implies:

          +   Deadline Scheduler

          +   Avoidance of page faults - generally must  lock  deadline

              oriented code into memory.

          +   Avoidance of I/O operations when a deadline is  near  or

              pending.  Usually must keep necessary info in electronic

              memory, and/or fetch it in advance.
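
          A sketch of one kind of deadline scheduler: preemptive
          earliest-deadline-first (EDF) on a single CPU, simulated in
          unit time steps.  EDF is an assumption made for illustration;
          the notes do not name a particular policy.  The aggregate load
          check uses the fact that periodic tasks whose total
          utilization exceeds 1.0 cannot all be served by one CPU.
          Function names are hypothetical.

```python
def total_utilization(periodic_tasks):
    """Sum of exec_time / period over (exec_time, period) pairs."""
    return sum(exec_time / period for exec_time, period in periodic_tasks)

def edf_schedule(jobs):
    """Run preemptive EDF and report which jobs miss their deadlines.

    jobs: list of (release_time, exec_time, deadline), non-negative ints.
    Returns the indices of jobs that finish after their deadline.
    """
    remaining = [exec_time for _, exec_time, _ in jobs]
    finish = [None] * len(jobs)
    t = 0
    while any(r > 0 for r in remaining):
        ready = [i for i, (release, _, _) in enumerate(jobs)
                 if release <= t and remaining[i] > 0]
        if ready:
            i = min(ready, key=lambda j: jobs[j][2])   # earliest deadline first
            remaining[i] -= 1                          # run it for one time unit
            if remaining[i] == 0:
                finish[i] = t + 1
        t += 1
    return [i for i, (_, _, deadline) in enumerate(jobs)
            if finish[i] is not None and finish[i] > deadline]

if __name__ == "__main__":
    print(total_utilization([(1, 4), (2, 8)]))
    print(edf_schedule([(0, 2, 4), (1, 1, 2)]))   # second job preempts the first
    print(edf_schedule([(0, 2, 2), (0, 2, 3)]))   # overloaded: one job is late
```

          The simulation charges nothing for page faults or I/O waits,
          which is exactly why the items above say to keep deadline
          code and data locked in memory.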


      +   This does not imply no cache memory

          +   No matter what people say....

          +   Better to have system that sometimes runs at 5X or  4.9X,

              rather than one that is guaranteed to run at 1.0X all the

              time.

