CS 162 Lecture Notes for Wednesday 3/30/2005

Announcement: MIDTERM NEXT WEDNESDAY, April 6, 2005, 7pm, 10 Evans

I/O Optimization

**BLOCK SIZE OPTIMIZATION

- Small blocks:
  - Small I/O buffers. The I/O buffer is located either on the controller or on the I/O device, so all possibly relevant info can be loaded quickly into the buffer, and then what is desired can be chosen from there.
  - Are quickly transferred, since the size is smaller.
  - Require many more transfers for a fixed amount of data, since more blocks are needed to hold that data.
  - High overhead on disk - wasted bytes for every disk block (inter-record gaps, header bytes, ERC bytes).
  - More entries in the file descriptor to point to blocks (inode).
    - An inode is a structure that holds all the relevant info about a file, including pointers to the actual data in the file. (It does not include the file name.) Inodes will be discussed in a later lecture.
  - Less internal fragmentation (same idea as small page size).
  - If allocation is random, more seeks. E.g. if a file is 1 MB in size and the block size is 50 KB, you need 20 blocks to store all the data. If these blocks are randomly allocated, you may need to perform 20 random seeks. If the block size were bigger, fewer seeks would be necessary.
- Optimal block sizes tend to range from 2K to 8K bytes. This optimum is continually increasing/changing with improvements in technology.
- Berkeley Unix uses 4K blocks (now 8K?). The basic (hardware) block size on the VAX is 512 bytes.
- Berkeley Unix also uses fragments that are 1/4 the size of the logical block, to reduce the internal fragmentation associated with file allocation.

**DISK ARM SCHEDULING: in timesharing systems, it may sometimes happen that several disk I/O's are requested at the same time.

DIFFERENT SCHEDULING ALGORITHMS (a small simulation comparing them appears at the end of this section):

- First come first served (FIFO, FCFS): may result in a lot of unnecessary disk arm motion under heavy loads.
  - The arm may need to swing from one end of the disk to the other if successive requests happen to be far apart:

        x-----------------------x
                                /
        x---------------------/
         \
          \--------------------------x

- Shortest seek time first (SSTF): handle the nearest request first. This can reduce arm movement and result in greater overall disk efficiency, but some requests may have to wait a long time.
  - The problem is starvation. Imagine that the disk is heavily loaded, with 3 open files. Two of the files are located near the center of the disk, the other near the edge. The disk can stay fully busy servicing the first two files while ignoring the last one.
- Scan: works like an elevator. The arm moves in one direction, servicing requests until there are no additional requests in that direction. Then it reverses direction and continues.
  - This algorithm doesn't get hung up in any one place for very long, and it works well under heavy load. But it may not get the shortest seek times.
  - It also tends to neglect files at the periphery of the disk, since on every pass the center is passed over twice while the edges are reached only once.
- CScan (circular scan): like a one-way elevator; the arm moves in only one direction. When it finds no further requests in the scan direction, it returns immediately to the furthest request in the other direction and resumes the scan.
  - This treats all files (and tracks) equally, but has somewhat higher mean access time than Scan.
- SSTF has the best mean access time. Scan or CScan can be used if there is a danger of starvation.
- Most of the time there aren't very many disk requests in the queue, so this isn't a terribly important decision.
- Also, if contiguous allocation is used (as with OS/360), seeks are seldom required, since each file is located in one contiguous extent (or a few).
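A rough sketch (not from the lecture) of how the three main policies compare on the same queue of requests. The cylinder numbers, starting position, and initial sweep direction for Scan are all made up for illustration; the point is only that SSTF and Scan move the arm much less than FCFS on the same load.

    # Compare total arm movement (in cylinders) for three scheduling policies.
    # The request queue and starting position are invented for illustration.

    def fcfs(start, requests):
        """Serve requests in arrival order."""
        pos, moved = start, 0
        for r in requests:
            moved += abs(r - pos)
            pos = r
        return moved

    def sstf(start, requests):
        """Always serve the pending request closest to the current arm position."""
        pending, pos, moved = list(requests), start, 0
        while pending:
            nearest = min(pending, key=lambda r: abs(r - pos))
            moved += abs(nearest - pos)
            pos = nearest
            pending.remove(nearest)
        return moved

    def scan(start, requests):
        """Elevator (Scan), initially sweeping toward higher cylinder numbers."""
        ahead  = [r for r in requests if r >= start]
        behind = [r for r in requests if r < start]
        moved = 0
        turn = max(ahead) if ahead else start
        moved += turn - start              # sweep up to the farthest request above
        if behind:
            moved += turn - min(behind)    # reverse and sweep down to the lowest request
        return moved

    queue = [98, 183, 37, 122, 14, 124, 65, 67]   # made-up cylinder numbers
    start = 53
    print("FCFS:", fcfs(start, queue))   # 640 cylinders of arm movement
    print("SSTF:", sstf(start, queue))   # 236
    print("SCAN:", scan(start, queue))   # 299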
**ROTATIONAL SCHEDULING

- It is rare to have more than one request outstanding for a given cylinder. (This was more relevant when drums were used.)
- When this does happen, SRLTF (shortest rotational latency first) works well. It is the rotational counterpart of SSTF and works in much the same way.
- Rotational scheduling can also be useful for writes, if we don't have to write back to the same location (as in a log-structured file system).
- Rotational scheduling is hard to do using logical block addresses (LBAs), since the host doesn't know the rotational position or the number of blocks per track.
- Rotational and seek scheduling can be usefully combined (into "shortest time to next block") if done in the onboard disk controller, which knows both the angular and radial position.

**SKIP-SECTOR or INTERLEAVED DISK ALLOCATION

- Imagine that you are reading the blocks of a file sequentially and quickly, and the file is allocated sequentially.
- Usually you will find that you try to read a block just after its start has already passed under the head, so you must wait a full rotation.
- The solution is to allocate file blocks to alternate disk blocks or sectors. This way you have some leeway (the time it takes for the head to pass over one block) to handle the request and get ready for the next block. (A sketch of this mapping appears after the disk caching notes below.)
- Note that if all bits read are immediately placed into a semiconductor buffer, this is unnecessary: the whole track can be transferred into the buffer, and the transfer to the requestor can then occur from there.

**TRACK OFFSET FOR HEAD AND CYLINDER SWITCHING

- This is analogous to skip-sector allocation, but for jumping from one track to the next.
- It takes time to switch between heads on different tracks or cylinders. Thus we may want to skip several blocks when moving sequentially between tracks, to allow the new head to be selected. The number of blocks to offset is a function of the rotational speed of the disk (and of the head/cylinder switch time).

**FILE PLACEMENT

- Seek distances are minimized if commonly used files are located near the center (middle cylinders) of the disk. This result comes from statistical analysis of the average time it takes the head to move from an arbitrary location on the disk to the center.
- Even better results are obtained if reference patterns are analyzed and files that are frequently referenced together are placed near each other.
- The frequency of seeks, and queueing for disks, will be reduced if commonly used files (or files used at the same time) are located on different disks.
  - E.g. spread the paging data sets and operating system data sets over several disks, so that all the seeks don't end up going to the same disk.

**DISK CACHING

- Keep a cache of recently used disk blocks in main memory.
- Recently read blocks are retained in the cache until replaced.
- Writes go to the disk cache and are written back to disk later. This write-back can occur at a time when it causes little or no performance loss. (A toy write-back cache sketch appears after this section.)
- The cache would typically include the index blocks for an open file.
- The cache is also used for read-ahead and write-behind.
- Entire disk tracks can be loaded into the cache at once.
- This typically works quite well - hit ratios are in the range of 70-90%.
- Caching can also be done in the disk controller - most controllers these days have 64KB-4MB of cache/buffer on board. This is mostly useful as a buffer rather than a cache, since the main-memory cache is so much larger.
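The disk cache described above can be sketched as a toy write-back LRU cache. This is only an illustration of the idea (blocks kept in memory, writes deferred until eviction or an explicit sync), not the actual Unix buffer cache; the read_block/write_block callbacks stand in for whatever the real driver interface would be.

    from collections import OrderedDict

    class BlockCache:
        """Toy write-back cache of disk blocks with LRU replacement."""

        def __init__(self, capacity, read_block, write_block):
            self.capacity = capacity
            self.read_block = read_block      # function: block_no -> bytes
            self.write_block = write_block    # function: (block_no, bytes) -> None
            self.cache = OrderedDict()        # block_no -> (data, dirty)

        def read(self, block_no):
            if block_no in self.cache:                    # hit: no disk access
                self.cache.move_to_end(block_no)          # mark most recently used
                return self.cache[block_no][0]
            data = self.read_block(block_no)              # miss: go to the disk
            self._insert(block_no, data, dirty=False)
            return data

        def write(self, block_no, data):
            # Writes go to the cache only; the block is flushed later (write-behind).
            self._insert(block_no, data, dirty=True)

        def _insert(self, block_no, data, dirty):
            if block_no in self.cache:
                dirty = dirty or self.cache[block_no][1]
            self.cache[block_no] = (data, dirty)
            self.cache.move_to_end(block_no)
            if len(self.cache) > self.capacity:
                old_no, (old_data, old_dirty) = self.cache.popitem(last=False)  # evict LRU
                if old_dirty:
                    self.write_block(old_no, old_data)    # write back only if modified

        def sync(self):
            """Flush all dirty blocks (what a periodic update/sync daemon would do)."""
            for block_no, (data, dirty) in self.cache.items():
                if dirty:
                    self.write_block(block_no, data)
                    self.cache[block_no] = (data, False)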
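Returning to skip-sector (interleaved) allocation above: a minimal sketch of how an interleave factor spreads logically consecutive blocks around a track, so the controller gets one sector time of slack between consecutive logical blocks. The sectors-per-track count and interleave factor are made up for illustration.

    def interleaved_layout(sectors_per_track, interleave):
        """Return the order in which logical blocks are laid out on a track.
        With interleave=2, logically consecutive blocks sit one physical
        sector apart, giving the controller time to set up the next transfer."""
        layout = [None] * sectors_per_track
        physical = 0
        for logical in range(sectors_per_track):
            while layout[physical] is not None:          # skip slots already assigned
                physical = (physical + 1) % sectors_per_track
            layout[physical] = logical
            physical = (physical + interleave) % sectors_per_track
        return layout

    # Example: 8 sectors per track, interleave factor 2.
    # Physical sectors hold logical blocks [0, 4, 1, 5, 2, 6, 3, 7].
    print(interleaved_layout(8, 2))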
**PREFETCHING AND DATA REORGANIZATION

- Since disk blocks are often read (and written) sequentially, it can be very helpful to prefetch ahead of the current read point.
  - Prefetching means that either the user or the operating system knows or guesses beforehand what data will be needed, and reads it in ahead of time so that it is already available when it is requested.
- It is therefore also useful to make sure that the physical layout of the data reflects its logical organization - i.e. logically sequential blocks are also physically sequential. Thus it is useful to periodically reorganize the data on the disk.

**DATA REPLICATION

- Frequently used data can be replicated at multiple locations on the disk. When an access to that data is needed, the head has several positions to choose from and can pick the closest copy, improving performance.
- This means that on writes, the extra copies must either be updated or invalidated.

**ALIS (Automatic Locality-Improving Storage)

- A research project meant to measure the effects/tradeoffs of the various schemes above, and to find a technique that performs well in many situations.
- The best results are obtained when techniques are combined: reorganize to make data sequential, cluster related data, and replicate hot data.

**RAID (Redundant Array of Inexpensive/Independent Disks)

- Observations:
  - Small disks are cheaper than large ones (due to economies of scale).
  - The failure rate per disk is roughly constant, independent of disk size.
  - Therefore, if we replace a few large disks with lots of small disks, the overall failure rate increases.
- Solution:
  - Interleave the blocks of the file across a set of smaller disks, and add a parity disk.
  - Since we presume (a) only one disk fails at a time, and (b) we know which disk failed, we can reconstruct the failed disk.
  - Parity can be done in two directions for extra reliability.
- Advantage:
  - Improves read bandwidth.
- Problem:
  - The parity disk must be written on every write, so it becomes a bottleneck.
  - A solution: interleave on a different basis than the number of disks. The parity location then varies, and the bottleneck is spread around - in effect the parity blocks are scattered across all/some of the disks.
- Types of RAID (a sketch of parity computation and reconstruction follows this list):
  - RAID 0 - ordinary disks. Data is striped across as many disks as are in the array. This mainly speeds up reads, but also speeds up writes. There is no redundancy in this scheme, so data can easily be lost.
  - RAID 1 - replication (mirroring). All data is replicated on a second set of disks, doubling the number of disks required for a given amount of storage. Writes take longer, since both copies must be updated; reads can be serviced by either disk.
  - RAID 4 - parity disk in a fixed location. The parity is kept on one disk that accompanies the rest of the disks in the array. This provides good data protection without needing a whole second set of disks as in RAID 1. The rest of the array behaves like RAID 0. The dedicated parity disk, however, must be written on every write, and if it fails the array loses its redundancy until it is replaced.
  - RAID 5 - parity in varying locations. This scheme provides the protection of RAID 4, but avoids the dedicated parity disk and its bottleneck, because the parity blocks are spread across all of the disks in the array.
    If any one disk fails, its contents can still be reconstructed from the data and parity blocks on the remaining disks.
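A small sketch of the parity idea behind RAID 4/5: the parity block of a stripe is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt by XORing the parity with the surviving blocks. The stripe contents below are invented for illustration.

    def parity(blocks):
        """XOR the corresponding bytes of each block to form the parity block."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    def reconstruct(surviving_blocks, parity_block):
        """Rebuild the missing block of a stripe: XOR parity with all survivors."""
        return parity(list(surviving_blocks) + [parity_block])

    # One stripe across 4 data disks (made-up contents), plus its parity block.
    stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    p = parity(stripe)

    # Suppose disk 2 fails; its block is rebuilt from the other three plus parity.
    survivors = stripe[:2] + stripe[3:]
    assert reconstruct(survivors, p) == b"CCCC"
    print("reconstructed:", reconstruct(survivors, p))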