CS 162 Lecture Notes for Wednesday 3/30/2005

Announcement: MIDTERM NEXT WEDNESDAY, April 6, 2005, 7pm, 10 Evans

I/O Optimization

**BLOCK SIZE OPTIMIZATION

- Small blocks:
  - Small I/O buffers. The I/O buffer is located either on the controller or on the I/O device, so all possibly relevant info can be loaded quickly into the buffer, and then what is desired can be chosen from there.
  - Are quickly transferred, since the size is smaller.
  - Require many more transfers for a fixed amount of data, since more blocks are needed to hold that data.
  - High overhead on disk - wasted bytes for every disk block (inter-record gaps, header bytes, ERC bytes).
  - More entries in the file descriptor to point to blocks (inode).
    - An inode is a structure that holds all the relevant info about a file, including pointers to the actual data in the file. (It does not include the file name.) Inodes will be discussed in a later lecture.
  - Less internal fragmentation (same idea as small page size).
  - If allocation is random, more seeks. E.g. if a file is 1 MB in size and the block size is 50 KB, you need 20 blocks to store all the data. If these blocks are randomly allocated, you may need to perform 20 random seeks. If the block size were bigger, fewer seeks would be necessary.
- Optimal block sizes tend to range from 2K to 8K bytes. This optimum is continually increasing/changing with improvements in technology.
- Berkeley Unix uses 4K blocks (now 8K?). The basic (hardware) block size on the VAX is 512 bytes.
- Berkeley Unix also uses fragments that are 1/4 the size of the logical block, to reduce the internal fragmentation associated with file allocation.

**DISK ARM SCHEDULING: in timesharing systems, it may sometimes happen that several disk I/O's are requested at the same time.

DIFFERENT SCHEDULING ALGORITHMS (a small simulation comparing them appears at the end of this section):

- First come first served (FIFO, FCFS): may result in a lot of unnecessary disk arm motion under heavy loads.
  - The arm may need to swing from one end of the disk to the other if successive requests happen to be far apart:

        x-----------------------x
                                /
        x---------------------/
         \
          \--------------------------x

- Shortest seek time first (SSTF): handle the nearest request first. This can reduce arm movement and result in greater overall disk efficiency, but some requests may have to wait a long time.
  - The problem is starvation. Imagine that the disk is heavily loaded, with 3 open files. Two of the files are located near the center of the disk, the other near the edge. The disk can stay fully busy servicing the first two files while ignoring the last one.
- Scan: works like an elevator. The arm moves in one direction, servicing requests until there are no additional requests in that direction. Then it reverses direction and continues.
  - This algorithm doesn't get hung up in any one place for very long, and it works well under heavy load. But it may not get the shortest seek times.
  - It also tends to neglect files at the periphery of the disk, since on every pass the center is passed over twice while the edges are reached only once.
- CScan (circular scan): like a one-way elevator; the arm moves in only one direction. When it finds no further requests in the scan direction, it returns immediately to the furthest request in the other direction and resumes the scan.
  - This treats all files (and tracks) equally, but has somewhat higher mean access time than Scan.
- SSTF has the best mean access time. Scan or CScan can be used if there is a danger of starvation.
- Most of the time there aren't very many disk requests in the queue, so this isn't a terribly important decision.
- Also, if contiguous allocation is used (as with OS/360), seeks are seldom required, since each file is located in one contiguous extent (or a few).
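A rough sketch (not from the lecture) of how the three main policies compare on the same queue of requests. The cylinder numbers, starting position, and initial sweep direction for Scan are all made up for illustration; the point is only that SSTF and Scan move the arm much less than FCFS on the same load.

    # Compare total arm movement (in cylinders) for three scheduling policies.
    # The request queue and starting position are invented for illustration.

    def fcfs(start, requests):
        """Serve requests in arrival order."""
        pos, moved = start, 0
        for r in requests:
            moved += abs(r - pos)
            pos = r
        return moved

    def sstf(start, requests):
        """Always serve the pending request closest to the current arm position."""
        pending, pos, moved = list(requests), start, 0
        while pending:
            nearest = min(pending, key=lambda r: abs(r - pos))
            moved += abs(nearest - pos)
            pos = nearest
            pending.remove(nearest)
        return moved

    def scan(start, requests):
        """Elevator (Scan), initially sweeping toward higher cylinder numbers."""
        ahead  = [r for r in requests if r >= start]
        behind = [r for r in requests if r < start]
        moved = 0
        turn = max(ahead) if ahead else start
        moved += turn - start              # sweep up to the farthest request above
        if behind:
            moved += turn - min(behind)    # reverse and sweep down to the lowest request
        return moved

    queue = [98, 183, 37, 122, 14, 124, 65, 67]   # made-up cylinder numbers
    start = 53
    print("FCFS:", fcfs(start, queue))   # 640 cylinders of arm movement
    print("SSTF:", sstf(start, queue))   # 236
    print("SCAN:", scan(start, queue))   # 299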
**ROTATIONAL SCHEDULING

- It is rare to have more than one request outstanding for a given cylinder. (This was more relevant when drums were used.)
- When this does happen, SRLTF (shortest rotational latency first) works well. It is the rotational counterpart of SSTF and works in much the same way.
- Rotational scheduling can also be useful for writes, if we don't have to write back to the same location (as in a log-structured file system).
- Rotational scheduling is hard to do using logical block addresses (LBAs), since the host doesn't know the rotational position or the number of blocks per track.
- Rotational and seek scheduling can be usefully combined (into "shortest time to next block") if done in the onboard disk controller, which knows both the angular and radial position.

**SKIP-SECTOR or INTERLEAVED DISK ALLOCATION

- Imagine that you are reading the blocks of a file sequentially and quickly, and the file is allocated sequentially.
- Usually you will find that you try to read a block just after its start has already passed under the head, so you must wait a full rotation.
- The solution is to allocate file blocks to alternate disk blocks or sectors. This way you have some leeway (the time it takes for the head to pass over one block) to handle the request and get ready for the next block. (A sketch of this mapping appears after the disk caching notes below.)
- Note that if all bits read are immediately placed into a semiconductor buffer, this is unnecessary: the whole track can be transferred into the buffer, and the transfer to the requestor can then occur from there.

**TRACK OFFSET FOR HEAD AND CYLINDER SWITCHING

- This is analogous to skip-sector allocation, but for jumping from one track to the next.
- It takes time to switch between heads on different tracks or cylinders. Thus we may want to skip several blocks when moving sequentially between tracks, to allow the new head to be selected. The number of blocks to offset is a function of the rotational speed of the disk (and of the head/cylinder switch time).

**FILE PLACEMENT

- Seek distances are minimized if commonly used files are located near the center (middle cylinders) of the disk. This result comes from statistical analysis of the average time it takes the head to move from an arbitrary location on the disk to the center.
- Even better results are obtained if reference patterns are analyzed and files that are frequently referenced together are placed near each other.
- The frequency of seeks, and queueing for disks, will be reduced if commonly used files (or files used at the same time) are located on different disks.
  - E.g. spread the paging data sets and operating system data sets over several disks, so that all the seeks don't end up going to the same disk.

**DISK CACHING

- Keep a cache of recently used disk blocks in main memory.
- Recently read blocks are retained in the cache until replaced.
- Writes go to the disk cache and are written back to disk later. This write-back can occur at a time when it causes little or no performance loss. (A toy write-back cache sketch appears after this section.)
- The cache would typically include the index blocks for an open file.
- The cache is also used for read-ahead and write-behind.
- Entire disk tracks can be loaded into the cache at once.
- This typically works quite well - hit ratios are in the range of 70-90%.
- Caching can also be done in the disk controller - most controllers these days have 64KB-4MB of cache/buffer on board. This is mostly useful as a buffer rather than a cache, since the main-memory cache is so much larger.
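The disk cache described above can be sketched as a toy write-back LRU cache. This is only an illustration of the idea (blocks kept in memory, writes deferred until eviction or an explicit sync), not the actual Unix buffer cache; the read_block/write_block callbacks stand in for whatever the real driver interface would be.

    from collections import OrderedDict

    class BlockCache:
        """Toy write-back cache of disk blocks with LRU replacement."""

        def __init__(self, capacity, read_block, write_block):
            self.capacity = capacity
            self.read_block = read_block      # function: block_no -> bytes
            self.write_block = write_block    # function: (block_no, bytes) -> None
            self.cache = OrderedDict()        # block_no -> (data, dirty)

        def read(self, block_no):
            if block_no in self.cache:                    # hit: no disk access
                self.cache.move_to_end(block_no)          # mark most recently used
                return self.cache[block_no][0]
            data = self.read_block(block_no)              # miss: go to the disk
            self._insert(block_no, data, dirty=False)
            return data

        def write(self, block_no, data):
            # Writes go to the cache only; the block is flushed later (write-behind).
            self._insert(block_no, data, dirty=True)

        def _insert(self, block_no, data, dirty):
            if block_no in self.cache:
                dirty = dirty or self.cache[block_no][1]
            self.cache[block_no] = (data, dirty)
            self.cache.move_to_end(block_no)
            if len(self.cache) > self.capacity:
                old_no, (old_data, old_dirty) = self.cache.popitem(last=False)  # evict LRU
                if old_dirty:
                    self.write_block(old_no, old_data)    # write back only if modified

        def sync(self):
            """Flush all dirty blocks (what a periodic update/sync daemon would do)."""
            for block_no, (data, dirty) in self.cache.items():
                if dirty:
                    self.write_block(block_no, data)
                    self.cache[block_no] = (data, False)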
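Returning to skip-sector (interleaved) allocation above: a minimal sketch of how an interleave factor spreads logically consecutive blocks around a track, so the controller gets one sector time of slack between consecutive logical blocks. The sectors-per-track count and interleave factor are made up for illustration.

    def interleaved_layout(sectors_per_track, interleave):
        """Return the order in which logical blocks are laid out on a track.
        With interleave=2, logically consecutive blocks sit one physical
        sector apart, giving the controller time to set up the next transfer."""
        layout = [None] * sectors_per_track
        physical = 0
        for logical in range(sectors_per_track):
            while layout[physical] is not None:          # skip slots already assigned
                physical = (physical + 1) % sectors_per_track
            layout[physical] = logical
            physical = (physical + interleave) % sectors_per_track
        return layout

    # Example: 8 sectors per track, interleave factor 2.
    # Physical sectors hold logical blocks [0, 4, 1, 5, 2, 6, 3, 7].
    print(interleaved_layout(8, 2))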
**PREFETCHING AND DATA REORGANIZATION

- Since disk blocks are often read (and written) sequentially, it can be very helpful to prefetch ahead of the current read point.
  - Prefetching means that either the user or the operating system knows or guesses beforehand what data will be needed, and reads it in ahead of time so that it is already available when it is requested.
- It is therefore also useful to make sure that the physical layout of the data reflects its logical organization - i.e. logically sequential blocks are also physically sequential. Thus it is useful to periodically reorganize the data on the disk.

**DATA REPLICATION

- Frequently used data can be replicated at multiple locations on the disk. When an access to that data is needed, the head has several positions to choose from and can pick the closest copy, improving performance.
- This means that on writes, the extra copies must either be updated or invalidated.

**ALIS (Automatic Locality-Improving Storage)

- A research project meant to measure the effects/tradeoffs of the various schemes above, and to find a technique that performs well in many situations.
- The best results are obtained when techniques are combined: reorganize to make data sequential, cluster related data, and replicate hot data.

**RAID (Redundant Array of Inexpensive/Independent Disks)

- Observations:
  - Small disks are cheaper than large ones (due to economies of scale).
  - The failure rate per disk is roughly constant, independent of disk size.
  - Therefore, if we replace a few large disks with lots of small disks, the overall failure rate increases.
- Solution:
  - Interleave the blocks of the file across a set of smaller disks, and add a parity disk.
  - Since we presume (a) only one disk fails at a time, and (b) we know which disk failed, we can reconstruct the failed disk.
  - Parity can be done in two directions for extra reliability.
- Advantage:
  - Improves read bandwidth.
- Problem:
  - The parity disk must be written on every write, so it becomes a bottleneck.
  - A solution: interleave on a different basis than the number of disks. The parity location then varies, and the bottleneck is spread around - in effect the parity blocks are scattered across all/some of the disks.
- Types of RAID (a sketch of parity computation and reconstruction follows this list):
  - RAID 0 - ordinary disks. Data is striped across as many disks as are in the array. This mainly speeds up reads, but also speeds up writes. There is no redundancy in this scheme, so data can easily be lost.
  - RAID 1 - replication (mirroring). All data is replicated on a second set of disks, doubling the number of disks required for a given amount of storage. Writes take longer, since both copies must be updated; reads can be serviced by either disk.
  - RAID 4 - parity disk in a fixed location. The parity is kept on one disk that accompanies the rest of the disks in the array. This provides good data protection without needing a whole second set of disks as in RAID 1. The rest of the array behaves like RAID 0. The dedicated parity disk, however, must be written on every write, and if it fails the array loses its redundancy until it is replaced.
  - RAID 5 - parity in varying locations. This scheme provides the protection of RAID 4, but avoids the dedicated parity disk and its bottleneck, because the parity blocks are spread across all of the disks in the array.
    If any one disk fails, its contents can still be reconstructed from the data and parity blocks on the remaining disks.
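A small sketch of the parity idea behind RAID 4/5: the parity block of a stripe is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt by XORing the parity with the surviving blocks. The stripe contents below are invented for illustration.

    def parity(blocks):
        """XOR the corresponding bytes of each block to form the parity block."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    def reconstruct(surviving_blocks, parity_block):
        """Rebuild the missing block of a stripe: XOR parity with all survivors."""
        return parity(list(surviving_blocks) + [parity_block])

    # One stripe across 4 data disks (made-up contents), plus its parity block.
    stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    p = parity(stripe)

    # Suppose disk 2 fails; its block is rebuilt from the other three plus parity.
    survivors = stripe[:2] + stripe[3:]
    assert reconstruct(survivors, p) == b"CCCC"
    print("reconstructed:", reconstruct(survivors, p))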