==============================================================================
Lecture on 2005-05-02
Farhad Massoudi  cs162-db
James McBryan    cs162-bj
==============================================================================

Announcement: The last midterm is 7pm, May 9, 10 Evans. It will probably be
about 2 hours long. No class on Monday, May 9.

Topic: Performance Evaluation

Performance evaluation covers these areas:
  + Measurement
  + Analytic Modeling
  + Simulation Modeling
  + Tuning
  + Design Improvement

****************************************
Measurement (continued from last lecture)
****************************************

Hardware Counters
  * Modern CPUs like Pentium processors have counters built into the
    hardware.
  * Hardware counters can be used to count various things such as:
    branches, misses, cycles, instructions, etc.

Multics System
  * Employed timer interrupts.
  * The timer went off at random intervals to take samples.
  * Used a hardware counter to count memory cycles.
  * Had an external I/O channel which could be externally driven to do
    useful things like read certain regions of memory.
  * Used a remote terminal emulator to reproduce workloads (instead of
    humans).
  * Multics was initially believed to be able to timeshare 200 users.
    On first boot it took 9 minutes to produce the first echo, and the
    system never ran more than 20 users.
  - Smith: If one is to build a whole new system, it is very hard to
    predict its performance.

Diamond
  * Diamond was an internal DEC measurement tool.
    (Digital Equipment Corporation (DEC) was bought by Compaq, which was
    in turn bought by HP.)
  * Diamond is a hybrid monitor: hardware probes read the PC, CPU mode,
    channel and I/O device activity, and the system task ID, while
    software gets the user ID. A minicomputer reads the data and does
    real-time and later analysis.
  * Diamond can also generate traces.
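The timer-interrupt sampling used on Multics survives today as statistical profiling. As an illustration (not from the lecture), the sketch below periodically samples which function the main thread is executing; all names and the workload are made up, and `sys._current_frames` is a CPython-specific facility:

```python
import collections, sys, threading, time

# Statistical PC-sampling in the spirit of the Multics timer-interrupt
# measurements: a background thread periodically samples which function
# the main thread is executing. Workload and names are hypothetical.
samples = collections.Counter()

def sampler(stop, main_ident, interval=0.001):
    # Sample the main thread's current stack frame every ~1 ms.
    while not stop.is_set():
        frame = sys._current_frames().get(main_ident)
        if frame is not None:
            samples[frame.f_code.co_name] += 1
        time.sleep(interval)

def busy_loop():
    # A CPU-bound workload to be profiled.
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total

stop = threading.Event()
t = threading.Thread(target=sampler,
                     args=(stop, threading.main_thread().ident))
t.start()
busy_loop()
stop.set()
t.join()
print(samples.most_common(3))   # most samples should land in busy_loop
```

Like the Multics sampler, this trades exactness for low overhead: the more often a function appears in the samples, the more CPU time it is consuming.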
IBM has two products: GTF & SMF.

- IBM's GTF (Generalized Trace Facility)
  * GTF was designed for debugging.
  * Involves no hardware counters.
  * Generates trace records of any system events - e.g. I/O interrupts,
    SIOs, opens, closes, traps, SVCs, dispatches, etc.
  * GTF generates lots of useless data, and not as much useful data as
    you would like.
  * Can take up to 1/3 of CPU time, which is very bad.
  * Smith: I don't know of any I/O device available on today's market
    that can absorb this data rate.

- IBM's SMF (System Management Facility)
  * SMF was designed for accounting and management.
  * Generates records for assorted events - e.g. jobs, tasks, opens,
    closes, etc.
  * SMF also generates lots of useless data, and not as much useful
    data as you would like.

- IBM's combination of GTF and SMF was the most useful.
- Some mainframes have console monitors which generate real-time load
  information and measurements (e.g. queue lengths, channel
  utilization, etc.)

*******************
How to use tracing
*******************

Workload Characterization
  * Designers always want to know what the workload of a computer is.
  * There are 3 types of workloads for performance experiments:
    1) Live workloads: real people doing real stuff at real times.
       * Good for measurement but poor for experiments, i.e.
         uncontrolled.
       * Not reproducible.
    2) Executable workloads consisting of real samples.
       * Can be replayed day after day.
       * This is very useful.
    3) Synthetic executable workloads.
       * Parameterized versions of a workload; you make up a set of
         data. If you guessed right you get good results; otherwise the
         results are meaningless.
       * A synthetic workload may be needed as a projection of a future
         workload.

To characterize a workload:
  * Decide what to characterize. (Not easy - as you can see, many
    published papers are not interesting.)
  * Figure out how to characterize those items.
  * Figure out how to get the data (!!!)
  * Get the data.
  * Exploratory data analysis.
  * Cluster analysis: cluster data into sets and analyze them.
    - Smith: For example, there was an experiment that tried to find
      human characteristics based on zip code!!!
    - Adrian: Was that done by the Census? :D

Statistical Methods
  * Things to do with data: means, variances, distributions.
  * Techniques such as linear regression and factor analysis are quite
    useful.
  * Can do statistical analysis on data to see if various models fit.
  * Seldom used - there is little intersection between the class of
    competent experimental performance analysts and the class of
    competent statisticians.

Analytic Performance Modeling
  * Build an analytic model of the system.
  * Calculate factors of interest as functions of parameters.
  * Much research along these lines.
  * Models tend to be queueing models or stochastic process models.

Pros/Cons of Analytic Modeling
  * Good for capacity planning, I/O subsystem modeling, and preliminary
    design aid.
  * Capacity planning is a big area of application: measure and analyze
    the current system, set up a validated model of the current system
    and workload, project changes in the workload, and see what sort of
    system design will handle it.
  * Analytic models do not capture the fine level of detail needed for
    some things such as hardware design and analysis - e.g. caches, CPU
    pipelines.
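As a tiny illustration of this kind of model (not from the lecture), the standard M/M/1 single-server queue gives closed-form answers to capacity-planning questions; the request rates below are made-up numbers:

```python
# Classic M/M/1 queue: Poisson arrivals at rate lam, exponential
# service at rate mu, one FCFS server. Requires lam < mu for stability.
def mm1(lam, mu):
    rho = lam / mu          # server utilization
    L = rho / (1 - rho)     # mean number of customers in the system
    W = 1 / (mu - lam)      # mean time in system (Little's law: L = lam * W)
    return rho, L, W

# Hypothetical capacity question: a disk serves 100 req/s; if load
# grows from 80 to 95 req/s, how much does latency grow?
rho, L, W = mm1(80.0, 100.0)
print(f"load 80/s:  util={rho:.2f}  in-system={L:.1f}  time={W*1000:.0f} ms")
rho, L, W = mm1(95.0, 100.0)
print(f"load 95/s:  util={rho:.2f}  in-system={L:.1f}  time={W*1000:.0f} ms")
```

The nonlinearity is the point: going from 80% to 95% utilization quadruples the mean response time (50 ms to 200 ms), which is exactly the kind of projection capacity planners need.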
Queueing network models
  * Queueing networks are a powerful technique in analytic modeling,
    used for:
    - Capacity planning
    - Modeling of I/O systems
    - Back-of-the-envelope designs
  * Major components of a queueing network are:
    - Servers
    - Customers
    - Routing

Diagram of a queueing network (customers enter server 1; from there
they are routed either to server 2, which leads out of the network, or
to server 3, which feeds back to server 1):

                  Q-1
                -----+
    ----------->|||||( )---+
           ^    -----+     |       Q-2
           |   server-1    |      -----+
           |               +----->|||||( )---------->
           |               |      -----+
           |               |     server-2
           |               |       Q-3
           |               |      -----+
           |               +----->|||||( )---+
           |                      -----+     |
           |                     server-3    |
           +---------------------------------+

  * Customers have a type t.
  * Routing is of the form p(i,t1,j,t2), where i and j are servers and
    t1 and t2 are types.
  * Servers are of type:
    - FCFS exponential, where the service rate is a function of queue
      length
    - Processor sharing (e.g. a time-shared CPU)
    - Infinite server - all customers served at the same rate
    - LCFSPR - last come, first served, preemptive resume
  * If the network is not of type BCMP, then other equations are
    needed. (Professor said this is likely not to be on the midterm.)

*************
SIMULATIONS
*************

Simulation types:
  * Discrete event simulation
    - Trace driven
    - Random number driven (stochastic): used for things like analyzing
      wind at Boeing
  * Continuous simulation (e.g. a differential equation)
    (not used for computer system modeling)
  * Monte Carlo methods (e.g. sampling) - throwing darts and using the
    fraction that land inside a region to determine its area

A simulation model has these components:
  * A model of the system, which has a state.
  * A set of events which cause changes in the state.
  * A method for generating such a sequence of events.
  * A measurement component, which records the statistics of interest.

Discrete event general simulation model:
  * Events come from the event list. (Events can come from a trace.)
  * The next event is taken off the list.
  * The system state is updated.
  * The event list is updated. (Events are added, deleted, or their
    times changed.)
  * Statistics are accumulated.
  * The next event is obtained from the event list.

Example: simple discrete event simulation of an M/G/1 queue.
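A minimal sketch of such a simulation is below (not from the lecture; the arrival rate, service distribution, and run length are made up). It exploits the single-queue structure rather than maintaining a general event list, processing arrival events in time order:

```python
import random

# Minimal discrete event simulation of an M/G/1 queue:
# Markovian (exponential) interarrival times, a General service time
# distribution (uniform here), and 1 FCFS server.
def simulate_mg1(lam, n_customers, seed=1):
    rng = random.Random(seed)
    t_arrive = 0.0          # arrival time of the current customer
    server_free_at = 0.0    # time the server finishes its current job
    total_wait = 0.0
    for _ in range(n_customers):
        t_arrive += rng.expovariate(lam)       # "M": Poisson arrivals
        service = rng.uniform(0.01, 0.09)      # "G": any distribution
        start = max(t_arrive, server_free_at)  # queue if server is busy
        total_wait += start - t_arrive         # statistic of interest
        server_free_at = start + service       # "1": a single server
    return total_wait / n_customers

print("mean wait:", simulate_mg1(lam=10.0, n_customers=10000))
```

With mean service time 0.05 here, utilization is about 0.5, and the simulated mean queueing delay can be checked against the analytic (Pollaczek-Khinchine) prediction of roughly 0.03.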
  * Events come from a trace and/or a random number generator.

M/G/1 queue:
  M - Markovian (exponential) arrival times
  G - General (arbitrary) service time distribution
  1 - one server

Special languages for simulation: GPSS, SIMSCRIPT, GASP, SLAM, SIMULA.
  * SIMULA has automatic queueing with states, which lets you write
    simulations efficiently, but it runs slowly.
There are also simulation modeling packages: RESQ (IBM), PAWS (UT
Austin), QNAP (INRIA - Potier).

Analysis of simulation output:
  * Regenerative simulation: find the regeneration points where the
    system returns to a common state.
  * Time series analysis, using statistical frequency-domain methods.
  * Repeat the entire process.
  * Run forever and look for convergence.
  * Batch means: stop periodically, take averages, and compute averages
    of the averages.

Easy to fool yourself:
  "Drinking wine is good for you" => "Drinking beer is good for you"...
  not! The studies did not include some demographics.

Back-of-the-envelope calculations (do the calculation on the back of an
envelope), e.g. how much water flows down the Mississippi? You must
redo the calculation in different ways with different parameters.

E.g. how much damage is there in a fire?
  Area of damage: 3 square miles
  Average replacement cost per house: $200k
  3 houses per acre
  640 acres per square mile
  Total = 3 * 640 * 3 * $200k ~= $1.2 billion
  The actual total was $1.5-2 billion.
  (Most likely to have an exam question on this; look at the reader.)

************************************************
Topic: Current Research in Operating Systems
************************************************

Most of what we have talked about in the area of operating systems is
not new, but goes back 20-30 years. What are people doing currently?

In a recent Operating Systems Conference Proceedings (Proc. 17th ACM
Symposium on Operating Systems Principles, December 1999), the
principal topics include:
  Manageability, availability, and performance in a mail service
  Performance of web proxy caching
  Performance of a stateless, thin-client architecture
  Energy-aware adaptation for mobile environments
  Active networks (customized programs are executed within the network)
  Building reliable, high-performance communication systems
  File system usage in Windows NT
  The Elephant file system
  File system security
  Integrating segmentation and paging protection
  Resource management on shared-memory multiprocessors
  A fast capability-based operating system
  A naming system for dynamic networks and mobile units
  A distributed virtual machine for networked computers
  A modular router
  Timer support for network processing
  CPU priority scheduling
  Scheduling for latency-sensitive threads
  A small, real-time microkernel

In a recent Operating Systems Conference Proceedings (Proc. 16th ACM
Symposium on Operating Systems Principles, October 1997), the principal
topics include:
  Performance analysis - profiling and distributed/parallel programs
  OS kernels
  Caching in computer networks
  Transactions on networks
  Security for Java
  Formal analysis of security
  Running commodity OSes on scalable multiprocessors
  Transparent distributed shared memory
  Scheduler for multimedia applications
  CPU scheduling
  Scalable distributed file system
  Log-structured file system
  Reducing I/O latency
  File caching and hoarding in mobile systems
  Update policies for mobile operation

Some titles from the 1993 SOSP:
  Distributed file systems
  RAID-type file systems
  Synchronization and its limitations in distributed systems
  Distributed system design
  Distributed programming using threads
  Memory management of an object-oriented language
  Relation between operating system structure and memory system
    performance (effects on the cache of OS code)
  Concurrent compacting garbage collection
  Improved IPC (interprocess communication)
  Improved fault isolation
  Audio and video in a distributed system
  Authentication
  Location info for distributed systems
From ASPLOS (Architectural Support for Programming Languages and
Operating Systems), 10/94, related to OS:
  Data and control transfer in distributed systems
  Scheduling and page migration for multiprocessor compute servers
  Synchronization algorithms for multiprocessors
  Software overhead in message passing
  Software support for exception handling
  Performance monitoring

In summary:
  Networks and the web
  Performance issues - memory, scheduling, networks
  Mobility
  Energy management
  File systems
  Protection and security
  Misc: virtual machines, kernels

Personal view of important issues:
  The world is becoming one large distributed computer system, with
  file migration, process migration, load balancing, distributed
  transparent file systems, etc. This suggests that the important
  issues are:
    Efficient ways to write reliable, high-performance OSes, with
      file migration algorithms
      load balancing
      distributed transparent file system implementation
    Wireless and mobile systems
      Supporting mobility
      Location and naming issues
      Energy management

Real-time systems are systems in which there is a real-time DEADLINE.
Typically a mechanical system is being controlled - e.g. an assembly
line, or anti-ballistic missile defense.

A real-time system must be able to:
  * Meet all deadlines (with 100% or 99+% probability).
  * Handle the aggregate load. If there are N events per second, it
    must be able to handle them, whether or not each has a deadline.

This implies:
  * A deadline scheduler.
  * Avoidance of page faults - generally, deadline-oriented code must
    be locked into memory.
  * Avoidance of I/O operations when a near deadline is pending.
    Usually the necessary info must be kept in electronic memory,
    and/or fetched in advance.
  * This does not imply no cache memory, no matter what people say....
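As an illustration of what a deadline scheduler does (not from the lecture), the classic earliest-deadline-first (EDF) policy always runs the ready task whose deadline is soonest; the task set below is made up, and time is modeled in unit-length ticks:

```python
import heapq

# Earliest-Deadline-First (EDF), a classic deadline-scheduling policy.
# Each task is (release_time, deadline, work_units); one work unit
# takes one time tick. The task set is hypothetical.
def edf_schedule(tasks):
    tasks = sorted(tasks)            # process releases in time order
    ready = []                       # min-heap keyed on deadline
    t, i, order, missed = 0, 0, [], []
    while i < len(tasks) or ready:
        # Move every task released by time t onto the ready heap.
        while i < len(tasks) and tasks[i][0] <= t:
            rel, dl, work = tasks[i]
            heapq.heappush(ready, (dl, rel, work))
            i += 1
        if not ready:                # CPU idle until the next release
            t = tasks[i][0]
            continue
        dl, rel, work = heapq.heappop(ready)
        t += 1                       # run earliest-deadline task 1 tick
        if work > 1:                 # preemptible: requeue leftover work
            heapq.heappush(ready, (dl, rel, work - 1))
        else:
            order.append(rel)        # task (named by release time) done
            if t > dl:
                missed.append(rel)
    return order, missed

tasks = [(0, 10, 4), (1, 4, 2), (2, 7, 2)]   # (release, deadline, work)
order, missed = edf_schedule(tasks)
print("completion order:", order, "missed:", missed)
```

Note how the long task released at time 0 is preempted by the later arrivals with tighter deadlines, yet all three tasks still finish on time; on a single CPU, EDF meets every deadline whenever any policy can.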
Better to have a system that somet...

=================
Summary of above:
=================

Current research in OSes:
  Manageability
  Performance of web proxy caching
  Thin clients
  Energy-aware adaptation for mobile environments
  Active networks
  Building reliable, high-performance communication systems
  File system usage in Windows NT
  The Elephant file system
  File system security
  Integrating segmentation and paging protection
  Resource management on shared-memory multiprocessors
  A fast capability-based operating system
  A naming system for dynamic networks and mobile units
  A distributed virtual machine
  ... distributed systems

OS research in summary:
  Networks and the web
  Performance issues - memory, scheduling, networks
    (killed and done with)
  Mobility
  Energy management
    (a battery has energy equivalent to nitroglycerin)
  File systems
    (need to put your data somewhere)
  Protection and security
    (no more stealing)
  Virtual machines and kernels
    (coming back!)

Professor Smith's personal view of important issues:
  The world is becoming one large distributed computer system, with
  file migration and process migration. This needs:
    High-performance file migration algorithms
    Load balancing
    Distributed systems
    Wireless and mobile systems
  In order to support mobility:
    Location and naming issues
    Energy management
    Security