Lecture Notes for Monday 5/2/2005
By Alan Jensen

ANNOUNCEMENTS
Reminder: Last midterm is at 7pm on 5/9 in 10 Evans.
-There will almost certainly not be a lecture on that day.
-They're trying to get everything graded by then, so it is also the deadline to contest grades.

TOPICS
-Finish performance, talk about current research.

Picking up where we left off:

+Hardware Counters
-Modern CPUs have counters built into the hardware. They can be set to count various things - branches, misses, cycles, instructions, etc. - and read out later.
-These are very useful for finding out (statistically) why certain things run slow or fast.
-Intel processors have them.

+Multics Measurements
-Sampled using timer interrupts.
-Used a hardware counter for memory cycles.
-Had an external I/O channel which could be externally driven to do useful things like read certain regions of memory.
-Used a remote terminal emulator to drive the system.
-It is always difficult to project performance predictions onto a new system with multiple users (for obvious reasons - too many unknowns).

+Diamond
-Diamond is an internal DEC measurement tool - a hybrid monitor. Hardware probes read the PC, CPU mode, channel and I/O device activity, and the system task ID. Software gets the user ID. A minicomputer reads the data and does both real-time and later analysis. It can also generate traces.

+IBM's GTF (General Trace Facility)
-Generates LARGE trace records for all sorts of system events - e.g. I/O interrupts, SIOs, opens, traps, SVCs, dispatches, closes, etc.
~Designed for debugging. Generates lots of useless data, and not as much useful data as you would like.
~No hardware counter.
~Not so good for performance studies because the records are SO large and SO slow to generate (eating CPU time) - the I/O cannot absorb the data as fast as it is generated.

+IBM's SMF (System Management Facility)
-Generates records for assorted events - jobs, tasks, opens, closes, etc.
~Designed for accounting and management. Also generates lots of useless data, and not as much useful data as you would like.
~Not designed for performance either. It glues together with GTF, but the measurements are taken at different places in the OS. Relating one to the other is problematic because all the blocks have to be maintained by data structures that may not fit in memory.
-Some mainframes have console monitors which generate real-time load information and measurements (e.g. queue lengths, channel utilization, etc.).

+Workload Characterization
-A VERY important part of any performance study - you must know what the workload is (what the system is going to be asked to do).
-Three types of workloads for performance experiments:
1) Live workloads - good for measurement but poor for experiments - uncontrolled and not reproducible.
2) Executable workloads consisting of real samples. Very useful.
3) Synthetic executable workloads - parameterized versions of real workloads.
~A synthetic workload may be needed as a projection of a future workload.
+To characterize a workload:
-Decide what to characterize. (Not easy - as you can see, many published papers are not interesting.)
-Figure out how to characterize those items.
-Figure out how to get the data (!!!).
-Get the data.
-Exploratory data analysis - "poking around".
-Cluster analysis - grouping the data into types (a small sketch follows this list).
-If someone doesn't know what they're studying, their characterization is worthless, because the numbers aren't related to what really should be measured. And if you forgot a parameter, you have to redo all the tests.
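-A minimal sketch (not from the lecture) of what the cluster-analysis step can look like: grouping job records into types with a plain k-means loop. The job data, the choice of (CPU seconds, I/O operations) as the two dimensions, and k=3 are all made-up assumptions for illustration.

    # Minimal k-means sketch for workload characterization (illustrative only).
    # Each job is summarized as (cpu_seconds, io_operations); the numbers below
    # are invented examples, not real measurements.
    import random

    random.seed(0)

    jobs = [
        (0.2, 15), (0.3, 20), (0.25, 12),       # small interactive-looking jobs
        (45.0, 300), (50.0, 280), (48.0, 350),  # CPU-heavy batch-looking jobs
        (2.0, 5000), (1.5, 6200), (2.5, 5800),  # I/O-heavy jobs
    ]

    def kmeans(points, k, iterations=20):
        """Group points into k clusters with plain Lloyd's algorithm."""
        centers = random.sample(points, k)
        for _ in range(iterations):
            # Assign each point to its nearest center (squared Euclidean distance).
            clusters = [[] for _ in range(k)]
            for p in points:
                d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
                clusters[d.index(min(d))].append(p)
            # Move each center to the mean of its cluster (keep it if the cluster is empty).
            for i, cl in enumerate(clusters):
                if cl:
                    centers[i] = (sum(p[0] for p in cl) / len(cl),
                                  sum(p[1] for p in cl) / len(cl))
        return centers, clusters

    centers, clusters = kmeans(jobs, k=3)
    for c, cl in zip(centers, clusters):
        print("cluster center (cpu, io) =", c, "with", len(cl), "jobs")

-In practice you would normalize each dimension first (CPU seconds and I/O counts are on very different scales) and try several values of k; the point is just that clustering turns a pile of job records into a handful of workload "types".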
+Statistical Methods
-Means, variances, distributions.
-Techniques such as linear regression and factor analysis are quite useful.
-Can do statistical analysis on the data to see if various models fit.
-Seldom used - there is little intersection between the class of competent experimental performance analysts and the class of competent statisticians.

+Analytic Performance Modeling
-Build an analytic model of some type of system of interest.
-Calculate the factors of interest as functions of the parameters.
-Lots of research of this type. Some of it useful.
-Simple cases are easy; you don't need to run a simulation.
-Models tend to be queueing models or stochastic process models.
-Most of the progress in queueing theory in the last 30-40 years is due to computer system modeling.
-The advances are:
~a class of queueing network models which are easily solved,
~efficient computational algorithms for these solutions,
~good approximation methods for systems not easily solved.

+Pros/Cons of Analytic Modeling
-Analytic models are good for capacity planning, I/O subsystem modeling, and as a preliminary design aid.
~Capacity planning is a big area of application - measure and analyze the current system, set up a validated model of the current system and workload, project changes in the workload, and see what sort of system design will handle it.
-Analytic models do not capture the fine level of detail needed for some things such as hardware design and analysis - e.g. caches, CPU pipelines.
-Not so good where the system doesn't fit the stochastic assumptions of the model.

+Queueing networks are a powerful technique in analytic modeling.
-The major components of a queueing network are:
~Servers - the circles/nodes in the diagram (the queue is drawn to the left of each node).
~Customers - the contents of each queue, travelling along the routing arrows.
~Routing - the arrows in the diagram, pointing to queues.
-Diagram of a queueing network (example): [in the original notes, an ASCII picture of several servers - each a queue |_|_|_| feeding a circle - connected by routing arrows, including feedback paths back to earlier queues]
-Many types of queueing networks can be easily solved, such as the following (called "BCMP" - Baskett, Chandy, Muntz & Palacios):
-Customers can have a type T.
-Routing in the system can be of the form p(i,t1,j,t2), where i and j are servers and t1 and t2 are types.
-Servers can be:
~FCFS exponential, with service rate a function of queue length
~Processor sharing, any service distribution
~Infinite server, any service distribution
~LCFS-PR (last come, first served, preemptive resume), any service distribution
-The solution is a product of terms, one for each service station. (See JACM, 1975.)
-If the network is not of BCMP type, it probably can't be solved exactly. There are approximation methods that can be used.
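-For flavor, a tiny sketch (not from the lecture) of the simplest analytic result of this kind: the closed-form measures for a single M/M/1 queue. It is a far simpler case than the BCMP networks above, but it is the same idea of computing quantities of interest directly from parameters. The arrival and service rates are arbitrary example values.

    # Closed-form results for an M/M/1 queue (Poisson arrivals, exponential
    # service, one server).  The rates below are arbitrary example values.

    def mm1(lam, mu):
        """Return utilization, mean number in system, and mean response time."""
        assert lam < mu, "queue is unstable unless arrival rate < service rate"
        rho = lam / mu                # server utilization
        n_mean = rho / (1 - rho)      # mean number of customers in the system
        r_mean = 1.0 / (mu - lam)     # mean response time (Little's law: N = lam * R)
        return rho, n_mean, r_mean

    # Example: 8 requests/sec arriving at a server that can handle 10 requests/sec.
    rho, n, r = mm1(lam=8.0, mu=10.0)
    print(f"utilization={rho:.2f}  mean number in system={n:.2f}  mean response={r:.3f}s")

-The blow-up of the mean response time as utilization approaches 1 is exactly the kind of effect that capacity-planning studies use these models to predict.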
+Simulation
-We did a discrete-event simulation for the first assignment; it was mostly trace-driven.
-Types:
~Discrete event simulation
  -Trace driven
  -Random number (stochastic)
~Continuous simulation (e.g. differential equations) - not used for computer system modeling.
~Monte Carlo methods (e.g. sampling).
-A simulation model has these components:
~A model of the system, which has a state.
~A set of events which cause changes in the state.
~A method for generating such a sequence of events.
~A measurement component, which records the statistics of interest.
-General discrete event simulation model:
~Events come from the event list. (Events can come from a trace.)
~The next event is taken off the list.
~The system state is updated.
~The event list is updated (events added, deleted, their times changed).
~Statistics are accumulated.
~The next event is obtained from the event list.
-Example: a simple discrete event simulation of an M/G/1 queue (a sketch appears after this topic).
~M = exponential (memoryless) interarrival times.
~G = general service time distribution.
~1 means only one server.
~There are other abbreviations; this is just notation.
-Events come from a trace and/or a random number generator.
-There are special languages for simulation (GPSS, SIMSCRIPT, GASP, SLAM, SIMULA).
~They all have built-in facilities like automatic queueing. They run slowly, even though writing in them is efficient.
-There are simulation modeling packages: RESQ (IBM), PAWS (UT Austin), QNAP (INRIA - Potier).
-Analysis of simulation output:
~Regenerative simulation - find a regeneration point, do standard IID statistics.
~Time series analysis (e.g. autocorrelation analysis) - also called the spectral method.
~Repeat the entire simulation run.
~Very long runs.
~Batch means - take long samples, and treat them as independent.
-Simulations can take a very long time to reach a steady state, and it is hard to get the parameters right.
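-A toy sketch (not the assignment code) of the event-loop structure just described, applied to a stochastic M/G/1 queue. The arrival rate, the uniform service distribution, and the run length are arbitrary assumptions made only for illustration.

    # Toy discrete-event simulation of an M/G/1 queue: exponential interarrival
    # times, an arbitrary (here uniform) service distribution, one server.
    # Parameters and run length are arbitrary; this is only a structural sketch.
    import heapq, random

    random.seed(1)
    LAMBDA = 0.8                                   # arrival rate (utilization 0.8)

    def service_time():
        return random.uniform(0.5, 1.5)            # "general" service distribution, mean 1.0

    events = []                                    # the event list: (time, kind)
    heapq.heappush(events, (random.expovariate(LAMBDA), "arrival"))

    now, queue, busy = 0.0, [], False
    waits = []                                     # statistic of interest: waiting times

    for _ in range(100000):                        # process a fixed number of events
        now, kind = heapq.heappop(events)          # take the next event off the list
        if kind == "arrival":
            queue.append(now)                      # remember when this job arrived
            heapq.heappush(events, (now + random.expovariate(LAMBDA), "arrival"))
            if not busy:                           # an idle server starts work at once
                busy = True
                waits.append(0.0)
                queue.pop(0)
                heapq.heappush(events, (now + service_time(), "departure"))
        else:                                      # departure: start the next job, if any
            if queue:
                waits.append(now - queue.pop(0))   # time this job spent waiting
                heapq.heappush(events, (now + service_time(), "departure"))
            else:
                busy = False

    print(f"simulated {len(waits)} jobs, mean wait = {sum(waits)/len(waits):.3f}")

-For this particular queue the Pollaczek-Khinchine formula gives the exact mean waiting time, so an analytic model can be used to sanity-check the simulation output (and vice versa).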
-It's easy to fool yourself with experimental methods, even if the results look reasonable. Example: deaths from being fat - it used to be statistically horrible to be overweight, but now being slightly plump looks best. Another example: drinking wine/beer is good for you - the latest study says it's not true, because earlier studies didn't correct for demographics (wine drinkers are typically wealthy, and therefore normally healthier anyway). We need to correct for obvious things like this. The numbers don't always mean something.

+Back of the Envelope methods
-These are calculations small enough to be done (within reasonable error) on the back of an envelope.
-He asked us how much water flows down the Mississippi each year. Students shouted out numbers for the size of the basin and the annual rainfall (minus evaporation) near the Mississippi, which gave us one answer (the exact number isn't what is important here). He then asked for another way to solve the problem, so students shouted out numbers for the cross-section at the mouth of the Mississippi and the rate of flow there (assuming uniform flow). Again, with reasonable numbers, we got an answer. The important point is that the two solutions were quite close (within a factor of 2 or 3), so we could reason that they were probably about right.
-We also went over examples for the Oakland hills fire damage ($5 billion in damages was reported until they found out it wasn't so), the capacity of a 4-lane freeway, and the volume of a six-foot person. The big point is that with reasonable parameters, we got reasonably correct answers.
~(Important)~ We can do this with computers too!
-Problem: you need the values of the parameters! We saw that the news reporter got the parameters wrong for the Oakland fire. But in each example, if we got the parameters right, we came up with reasonably good answers.

TOPIC
+Current Research
-We quickly read through some (slightly old) slides on the issues currently being researched. Topics to note were networking, security, performance of various kinds (obviously an eternal target for research), the Internet, peer-to-peer software, virtual machines, OS robustness, etc. Big point: everything we've studied in this class fits into these topics.
-Some topics have died, but some people study them anyway because they don't know the topic is dead.

+In Summary
-Networking is still very important; performance is ALWAYS significant; there will always be problems with optimization; mobility is a sprouting area of research (battery life, etc.); file systems matter (everyone wants to save stuff and have it be there when they get back); and virtual machines are a re-emerging topic.

+Personal View of Important Issues:
-The world is becoming one large distributed computer system, with file migration, process migration, load balancing, distributed transparent file systems, etc.
-This suggests that the important issues are:
~Efficient ways to write a reliable OS
~With high performance
  -file migration algorithms
  -load balancing
  -distributed transparent file system implementation
  -wireless and mobile systems
~Supporting mobility
  -location and naming issues
  -energy management

Next Lecture: on Grad School.

Reminder Again: Last midterm is at 7pm on 5/9 in 10 Evans.