These notes combine Smith's lecture notes with notes taken in class. Smith's notes are denoted with a "+" and notes taken in class are denoted by "-".

+ File Placement
  + Seek distances will be minimized if commonly used files are located near the center of the disk.
  + Even better results if reference patterns are analyzed and files that are frequently referenced together are placed near each other.
  + Frequency of seeks, and queueing for disks, will be reduced if commonly used files (or files used at the same time) are located on different disks.
  + E.g. spread the paging data sets and operating system data sets over several disks.
+ Disk Caching
  + Keep a cache of recently used disk blocks in main memory.
    - Can do this in main memory and in the controller.
  + Recently read blocks are retained in the cache until replaced.
  + Writes go to the disk cache, and are later written back.
  + Typically would include index blocks for an open file.
    - Most cache is in DRAM, which is volatile.
    - Linux is not a data-processing system.
  + Also use the cache for read ahead and write behind.
    - Write behind is a write to the cache.
  + Can load entire disk tracks into the cache at once.
  + Typically works quite well - hit ratios of 70-90%.
  + Can also do caching in the disk controller - most controllers these days have 64K-4MB of cache/buffer in the controller. Mostly useful as a buffer, not a cache, since the main memory cache is so much larger.
+ Prefetching and Data Reorganization
  - If you read a block, it's likely you are going to read the next few blocks as well, so you read those and put them in memory. Since they're in memory, when you actually read them, it's a lot faster.
  + Since disk blocks are often read (and written) sequentially, it can be very helpful to prefetch ahead of the current read point.
  + It is also therefore useful to make sure that the physical layout of the data reflects the logical organization of the data - i.e. logically sequential blocks are also physically sequential.
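The caching, write-behind, and read-ahead ideas above can be sketched as a tiny write-back LRU block cache. This is a minimal illustration, not a real buffer cache; the Disk class, block numbering, and the prefetch count are hypothetical stand-ins:

```python
from collections import OrderedDict

class Disk:
    """Stand-in array-of-blocks 'disk' (hypothetical, for the demo)."""
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.blocks = [b""] * nblocks
    def read(self, bn):  return self.blocks[bn]
    def write(self, bn, data): self.blocks[bn] = data

class BlockCache:
    """Write-back LRU cache of disk blocks with simple read-ahead."""
    def __init__(self, disk, capacity=8, prefetch=2):
        self.disk = disk
        self.capacity = capacity
        self.prefetch = prefetch    # blocks to read ahead of the current point
        self.cache = OrderedDict()  # block number -> data, in LRU order
        self.dirty = set()
        self.hits = self.misses = 0

    def _install(self, bn, data):
        self.cache[bn] = data
        self.cache.move_to_end(bn)                          # most recently used
        while len(self.cache) > self.capacity:
            old, old_data = self.cache.popitem(last=False)  # evict LRU block
            if old in self.dirty:       # write behind: flush only on eviction
                self.disk.write(old, old_data)
                self.dirty.discard(old)

    def read(self, bn):
        if bn in self.cache:
            self.hits += 1
            self.cache.move_to_end(bn)
            return self.cache[bn]
        self.misses += 1
        data = self.disk.read(bn)
        self._install(bn, data)
        # read ahead: sequential access makes the next blocks likely next
        for nxt in range(bn + 1, bn + 1 + self.prefetch):
            if nxt not in self.cache and nxt < self.disk.nblocks:
                self._install(nxt, self.disk.read(nxt))
        return data

    def write(self, bn, data):
        self._install(bn, data)     # write goes to the cache...
        self.dirty.add(bn)          # ...and reaches disk later

d = Disk(64)
c = BlockCache(d)
c.read(0)            # miss, but blocks 1 and 2 are prefetched
hit = 1 in c.cache   # True: a sequential read of block 1 will now hit
```

A real cache would also force dirty blocks out periodically (synch), since write-behind in volatile DRAM loses data on a crash.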
  + Thus it is useful to periodically reorganize the data on the disk.
    - Don't want a sequential file randomly scattered over the disk! Duh!
+ Data Replication
  + Frequently used data can be replicated at multiple locations on the disk.
    - If it's in more places on the disk, then the average time to reach it decreases, since it's all over the place.
  + This means that on writes, extra copies must either be updated or invalidated.
    - If you don't do this, you get lots of different versions, which is bad!
  + ALIS - automatic locality improving storage
    - Basically a smart I/O system that can reorganize itself. Does this in the background during idle time.
  + Best results obtained when techniques are combined: reorganize to make sequential, cluster, and replicate.
+ RAID - Redundant Array of Inexpensive Disks
  + Observations:
    + Small disks are cheaper than large ones (due to economies of scale)
      - This is per-byte.
    + Failure rate is constant, independent of disk size.
    + Therefore, if we replace a few large disks with lots of small disks, the failure rate increases.
  + Solution:
    + Interleave the blocks of the file across a set of smaller disks, and add a parity disk.
      - Parity in main memory is not enough, but in RAID, you know which physical disk failed, so you know which bits the parity must reconstruct.
      - If in main memory, you won't know which bit went bad.
    + Note that since we presume (a) only one disk failure, and (b) we know which disk failed, we can reconstruct the failed disk.
    + Can do parity in two directions for extra reliability.
  + Advantage:
    + Improves read bandwidth.
  + Problem:
    + This means that we have to write the parity disk on every write. It becomes a bottleneck.
      - This kills our bandwidth, which is very bad.
  + A solution - interleave on a different basis than the number of disks. That means that the parity disk varies, and the bottleneck is spread around.
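The parity scheme above can be sketched in a few lines: XOR all the data blocks in a stripe to get the parity block, and, because we know which disk failed, XOR the survivors to rebuild it. The stripe layout and disk count here are hypothetical, and the rotating-parity function is just an illustration of spreading the bottleneck, RAID-5 style:

```python
from functools import reduce

NDISKS = 5  # 4 data blocks plus 1 parity block per stripe (RAID-5 style)

def parity(blocks):
    """XOR of equal-length blocks. With one known-failed disk, the XOR of
    all the survivors (including the parity block) reproduces the missing
    block, because each surviving term cancels itself out."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def parity_disk(stripe):
    """Rotate the parity location per stripe so no single disk is the
    write bottleneck."""
    return stripe % NDISKS

# One stripe: four data blocks plus their parity.
data = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
p = parity(data)

# Disk 2 fails; we know WHICH disk failed, so we can reconstruct it.
survivors = [blk for i, blk in enumerate(data) if i != 2] + [p]
rebuilt = parity(survivors)
assert rebuilt == data[2]
```

This also shows why the fixed parity disk is a bottleneck: every data write must recompute and rewrite `p`, so rotating `parity_disk(stripe)` across the array spreads those writes around.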
    - Put the parity on different blocks, so P1 can be on D2, P2 on D3, etc.
  + Types of RAID:
    + RAID 0 - ordinary disks
      - No redundancy. This interleaves across disks but has no parity blocks.
      - You increase bandwidth, but reliability goes down.
    + RAID 1 - replication
      - This is just a mirror. So you double writes, but your cost doubles. This has no performance gain.
    + RAID 4 - parity disk in fixed location
      - See diagrams.
    + RAID 5 - parity disk in varying location

           -----    -----    -----    -----    -----
          | 1 5 |  | 2 6 |  | 3 7 |  | 4 8 |  | xor |
          |     |  |     |  |     |  |     |  |     |
          |     |  |     |  |     |  |     |  |     |
           -----    -----    -----    -----    -----

      - The above RAID setup has a single parity disk and can recover from one failure.

          O O O O O
          O O O O O
          O O O O O
          O O O O O
          O O O O O

        The last row and last column are parities. This can recover from two failures.

===========================
Topic: Directories and Other File System Topics

+ Naming:
  + How do users refer to their files?
  + How does the OS refer to the file itself?
  + How does the OS find the file, given the name?
+ A file descriptor is a data structure or record that describes the file.
  + The file descriptor information has to be stored on disk, so it will stay around even when the OS doesn't. (Note that we are assuming that disk contents are permanent.)
  + In Unix, all the descriptors are stored in a fixed-size array on disk. The descriptors also contain protection and accounting information.
  + A special area of disk is used for this (disk contains two parts: the fixed-size descriptor array, and the remainder, which is allocated for data and indirect blocks).
  + The size of the descriptor array is determined when the disk is initialized, and can't be changed. In Unix, the descriptor is called an inode (index node), and its index in the array is called its i-number. Internally, the OS uses the i-number to refer to the file.
    - The inode is the name of the file in the system. This is unique!
  + IBM calls the equivalent structure the volume table of contents (VTOC).
+ The inode is the focus of all file activity in UNIX. There is a unique inode allocated for each file, including directories. An inode is 'named' by its dev/inumber pair. (iget/iget.c)
+ Inode fields:
  + reference count (number of times open)
  + number of links to file
    - Number of directories pointing to the file; if there are zero, then get rid of the file.
  + owner's user id, owner's group id
    - Need it for protection and accounting (disk quota, and systems where you are charged for disk space).
    - Disk space and CPU time are getting really cheap, so we don't really know how to charge (haha) :P
  + number of bytes in file
  + time last accessed, time last modified, time inode last changed
    - When was this last referenced; time last modified is more useful so you know which version it is.
  + disk block addresses, indirect blocks (discussed previously)
  + flags: (inode is locked, file has been modified, some process waiting on lock)
    - These are one-bit flags.
  + file mode: (type of file: character special, directory, block special, regular, symbolic link, socket)
    - A symbolic link is another file.
    + A socket is an endpoint of a communication, referred to by a descriptor, just like a file or a pipe. Two processes can each create a socket and then connect those two endpoints to produce a reliable byte stream. (A pipe requires a common parent process; a socket does not, and the processes may be on different machines.)
  + protection info: (set user id on execution, set group id on execution, read, write, execute permissions, sticky bit)
    - In an inode, the sticky bit is the bit in an inode representing a directory that indicates whether other users can modify files in this directory.
    - So basically used for sharing a directory and sharing rights.
  + count of shared locks on inode
    - Lock used for reading.
    - How many people have it open for reading - makes sense.
  + count of exclusive locks on inode
    - Lock used for writing.
    - Why would you need this? More than one person writing? Not good!
    - This is more of a "suggestion", so we should still keep track of how many people have it open.
    - Sort of like a "warning".
  + unique identifier
  + file system associated with this inode
  + quota structure controlling this file
    - You may have a limited amount of disk space you can use; it basically tells you your quota, oh yay.
+ When a file is open, its descriptor is kept in main memory. When the file is closed, the descriptor is stored back to disk.
  - We keep it in memory because we don't always want to be going back to disk to get it.
+ There is usually a per-process table of open files.
  + In Unix, there is a process open file table, with one entry for each file open. The integer index into that table is the handle for that file open. Multiple opens of the file will get multiple entries. (Note that if a process forks, a given entry can be shared by several processes.)
  + (standard-in is #0, standard-out is #1, stderr is #2; these must be per process.)
+ Unix also has a system open file table, which points to the inode for the file (in the inode table). This table is system wide. Maps names to files.
  - We have this because it makes it faster to access an already open file.
  - So if the file is already open, we don't have to go look for the name in the directory.
+ There is also the inode table, which is a system-wide table holding active and recently used inodes.
  - When you close a file, you don't necessarily delete it from the table.
  - We need this because if something gets changed, you want all the processes to be able to see it.
+ The descriptor is kept in OS space, which is paged. So it may be necessary to take a page fault to get to the descriptor info.
  - All of these tables are stored in paged memory, so we can have a page fault.
+ Users need a way of referencing files that they leave around on disk. One approach is just to have users remember descriptor indexes. I.e. the user would have to remember something like the number of the descriptor, or some such.
  Unfortunately, this is not very user friendly.
+ Of course, users want to use text names to refer to files. Special disk structures called directories are used to tell what descriptor indices correspond to what names.
  - This basically maps names to files.
+ Approach #1: have a single directory for the whole disk. Use a special area of disk to hold the directory.
  + The directory contains <name, descriptor index> pairs.
  + Problems:
    + If one user uses a name, no one else can.
    + If you can't remember the name of a file, you may have to look through a very long list.
    + Security problem - people can see your file names (which can be dangerous).
  + Old personal computers (pre-Windows) worked this way.
+ Approach #2: have a separate directory for each user (TOPS-10 approach). This is still clumsy: names from a user's different projects get confused. Still can't remember names of files.
  - This is still a flat directory for each individual user!
  - File naming was a pain in the ass.
  + IBM's VM is similar to this. Files have a 3-part name: <name, type, location>, where location is A, B, C, etc. (i.e. which disk). Very painful. (Also, file names limited to 8 characters.)
+ Approach #3 - Unix approach: generalize the directory structure to a tree.
  + Directories are stored on disk just like regular files (i.e. file descriptor with 13 pointers, etc.).
  + User programs can manipulate directories almost like any other file. Only special system programs may write directories.
  + Each directory contains <name, descriptor index> pairs. The file pointed to by the index may be another directory. Hence, we get a hierarchical tree structure. Names have slashes separating the levels of the tree.
  + There is one special directory, called the root. This directory has no name, and is the file pointed to by descriptor 2 (descriptors 0 and 1 have other special purposes).
    + Note that we need the root. Otherwise, we would have no way to reach any files. From the root, we can get anywhere in the file system.
  + The full file name is the path name, i.e. the full name from the root.
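The tree scheme above resolves a path name by walking <name, i-number> pairs one directory at a time, starting from the root (or from the working directory for relative names). Here is a minimal sketch; the in-memory inode table, its layout, and the i-numbers are hypothetical, chosen only so the root is i-number 2 as in the notes:

```python
# Each "inode" is either a file or a directory mapping names -> i-numbers.
# Toy inode table (hypothetical layout, for illustration only).
inodes = {
    2: {"type": "dir",  "entries": {"usr": 5, "etc": 6}},   # root is i-number 2
    5: {"type": "dir",  "entries": {"bin": 7}},
    6: {"type": "dir",  "entries": {"passwd": 8}},
    7: {"type": "dir",  "entries": {"ls": 9}},
    8: {"type": "file", "data": b"root:0:0"},
    9: {"type": "file", "data": b"\x7fELF..."},
}
ROOT = 2

def namei(path, cwd=ROOT):
    """Resolve a path name to an i-number, one directory at a time.
    A leading '/' escapes to the root; otherwise start at the working dir."""
    ino = ROOT if path.startswith("/") else cwd
    for comp in path.strip("/").split("/"):
        if not comp:
            continue
        node = inodes[ino]
        if node["type"] != "dir":
            raise NotADirectoryError(comp)
        ino = node["entries"][comp]   # may raise KeyError: no such name here
    return ino

assert namei("/usr/bin/ls") == 9
assert namei("passwd", cwd=6) == 8    # relative name, working directory 6
```

Note how a full path reads only the directories along the path, never the whole list of files, which is one of the pros of the tree scheme discussed below.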
+ A directory consists of some number of blocks of DIRBLKSIZ bytes, where DIRBLKSIZ is chosen such that it can be transferred to disk in a single atomic operation (e.g. 512 bytes on most machines).
  + Each directory block contains some number of directory entry structures, which are of variable length. Each directory entry has info at the front of it, containing its inode number, the length of the entry, and the length of the name contained in the entry. These are followed by the name, padded to a 4-byte boundary with null bytes. All names are guaranteed null terminated.
+ Note that in Unix, a file name is not the name of a file. It is only a name by which the kernel can search for the file. The inode is really the "name" of the file.
+ Each pointer from a directory to a file is called a hard link.
  + In some systems, there is a distinction between a "branch" and a "link", where the link is a secondary access path, and the branch is the primary one (goes with ownership).
  + You "erase" a file by removing a link to it. In reality, a count is kept of the number of links to a file. It is only really erased when the last link is removed.
  + To really erase a file, we put the blocks of the file on the free list.
    - You basically garbage collect. If there are no links, then it is put on the free list.
    - If you created a file, and someone else has it open, then you can't delete it. However, you can overwrite it.
    - Unix says that you can't have multiple hard links to a directory, so that you won't have a loop. Recursive loops are bad bad bad.
    - A hard link is an actual i-number.
+ Symbolic Links
  + There are two ways to "link" to another directory or file. One is a direct pointer. In Unix, such links are limited to not cross "file systems" - i.e. not to another disk.
    - Recursive commands are okay in this case.
  + We can use symbolic links, by which instead of pointing to the file or directory, we have a symbolic name for that file or directory.
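The link-count bookkeeping above can be sketched directly: each hard link bumps the inode's link count, and the file's blocks go onto the free list only when the last link is removed. This is a toy model with a single flat directory; the class names and block numbers are hypothetical:

```python
free_list = []  # block numbers available for reuse

class Inode:
    """Minimal inode tracking the link count and the file's blocks."""
    def __init__(self, blocks):
        self.nlink = 0
        self.blocks = list(blocks)

directory = {}  # name -> inode (one flat directory, for illustration)

def link(name, inode):
    """Add a hard link: another <name, inode> pair pointing at the file."""
    directory[name] = inode
    inode.nlink += 1

def unlink(name):
    """'Erase' a file by dropping one link; the blocks are garbage
    collected onto the free list only when the last link goes away."""
    inode = directory.pop(name)
    inode.nlink -= 1
    if inode.nlink == 0:
        free_list.extend(inode.blocks)
        inode.blocks.clear()

ino = Inode(blocks=[10, 11, 12])
link("a", ino)
link("b", ino)          # a second hard link to the same inode
unlink("a")
assert ino.nlink == 1 and free_list == []   # still reachable via "b"
unlink("b")
assert free_list == [10, 11, 12]            # now really erased
```

Real Unix also defers the free-list step while some process still has the file open, which is the "someone else has it open" case noted above.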
+ We need to be careful not to create cycles in the directory system - otherwise recursive operations on the file system will loop (e.g. cp -r). In Unix, this is solved by not permitting hard links to existing directories (except by the superuser).
+ Pros and cons of the tree-structured directory scheme:
  + Can organize files in a logical manner. Easy to find the file you're looking for, even if you don't exactly remember its name.
  + The "name" of the file is in fact a concatenation of the path from the root. Thus the name is actually quite long - provides semantic info.
  + Can have duplicate names, if the path to the file is different.
  + Can (assuming the protection scheme permits) give away access to a subdirectory and the files under it, without giving access to all files. (Note: Unix does not permit multiple hard links to a directory, unless done by the superuser.)
  + Access to a file requires only reading the relevant directories, not the entire list of files. (My list of files prints out to a 1/2" printout - 20000 files.)
  + The structure is more complex to move around and maintain.
  + A file access may require that many directories be read, not just one.
  + It is very nice that directories and file descriptors are separate, and that directories are implemented just like files. This simplifies the implementation and management of the structure (can write "normal" programs to manipulate them as files).
  + I.e. the file descriptors are things the user shouldn't have to touch. Directories can be treated as normal files.
  - A tree directory is basically like a tree that we learned about in CS 61B. (You can look up tree structures on the wiki.) We can have a tree structure that has loops, so a child can reference a parent. You can also have two separate nodes reference the same file!
+ Working directory: it is cumbersome to constantly have to specify the full path name for all files.
  + In Unix, there is one directory per process, called the working directory, which the system remembers.
  + This is not the same as the home directory, which is where you are at log-in time, and which is in effect the root of your personal file system.
  + Every user has a search path, which is a list of directories in which to look to resolve a file name. The first element is almost always the working directory.
    - When you type the name of something, you don't want to type the full path for every file that you have.
    - There is some way of specifying the search path, which is where the system will look for the file that you want.
  + "/" is an escape to allow full path names. I.e. most names are relative file names. Ones starting with "/" are full (complete) path names.
  + Note that in Unix, the search path is maintained by the shell. If any other program wants to do the same, it has to rebuild the facilities from scratch. It should be in the OS. ("set path" in .cshrc or .login.)
    + My path is: (. ~/bin /usr/new /usr/ucb /bin /usr/bin /usr/local /usr/hosts ~/com)
  + Basically, you want to look in the working directory, then the system library directories.
  + We probably don't want a search strategy that actually searches more widely. If it did, it might find a file that wasn't really the target.
    + This is yet another example of locality.
  + Simple means to change the working directory - "cd". Can also refer to directories of other users by prefacing their logins with "~".
    - Search paths are important because you don't want to find the wrong file.
+ Operations on Files
  + Open - put a file descriptor into your table of open files. Those are the files that you can use. May require that locks be set, and a user count be incremented. (If any locking is involved, may have to check for deadlock.)
  + Close - inverse of open.
  + Create a file - sometimes done automatically by open.
  + Remove (rm) or erase - drop the link to the file. Put the blocks back on the free list if this is the last link.
  + Read - read a record from the file. (This usually means that there is an "access method" - i.e.
  I/O code - which deals with the user in terms of records, and the device in terms of physical blocks.)
  + Write - like read, but may also require disk space allocation.
  + Rename ("mv" or "move") - rename the file. Unix combines two different operations here. Rename would strictly involve changing the file name within the same directory. "Move" moves the file from one directory to another. Unix does both with one command.
    + Note that mv also destroys the old file if there is one with the new name. (Which could be considered a bug, not a feature.)
  + Seek - move to a given location in the file.
  + Synch - write blocks of the file from the disk cache back to disk.
  + Change properties (e.g. protection info, owner).
  + Link - add a link to a file.
  + Lock & Unlock - lock/unlock the file.
  + Partial erase (truncate).
+ Note that commands such as "copy", "cat", etc., are built out of the simpler commands listed above.
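The open/read/close operations above tie back to the three tables discussed earlier: a per-process open file table of small integers, a system-wide layer holding the seek offset per open, and the inode table with its reference count. A minimal sketch, with a hypothetical in-memory inode table and made-up i-number:

```python
# System-wide inode table (hypothetical contents, for illustration).
inode_table = {7: {"data": b"hello world", "refcount": 0}}

class OpenFile:
    """One entry per open: the seek offset lives here, not in the inode."""
    def __init__(self, inum):
        self.inum = inum
        self.offset = 0

class Process:
    def __init__(self):
        self.fds = {}            # fd -> OpenFile (per-process table)
        self.next_fd = 3         # 0, 1, 2 are stdin/stdout/stderr

    def open(self, inum):
        inode_table[inum]["refcount"] += 1   # user count incremented
        fd = self.next_fd
        self.next_fd += 1
        self.fds[fd] = OpenFile(inum)
        return fd                # the small-integer handle the program sees

    def read(self, fd, n):
        of = self.fds[fd]
        data = inode_table[of.inum]["data"][of.offset:of.offset + n]
        of.offset += len(data)   # seek position advances with each read
        return data

    def close(self, fd):
        of = self.fds.pop(fd)    # inverse of open
        inode_table[of.inum]["refcount"] -= 1

p = Process()
fd = p.open(7)
first = p.read(fd, 5)    # b"hello"
rest = p.read(fd, 6)     # b" world" - the offset advanced by the first read
p.close(fd)
```

In real Unix, fork shares open-file entries (and thus the offset) between parent and child, which is why the offset belongs to the open, not to the process or the inode.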