Peng Lou
4/4 Notes


Topic: Directories and Other File System  Topics


FILE DESCRIPTOR AND INODE
-------------------------

     +   A file descriptor is a data structure or record that describes
         the file.

     +   The file descriptor information has to be stored on disk,  so
         it  will stay around even when the OS doesn't.  (Note that we
         are assuming that disk contents are permanent.)

     ************
     * - Why store file descriptors on disk?  Logical and convenient approach and it's
     *   fairly fast.
     ************

         +   In Unix, all the descriptors are stored in a  fixed  size
             array  on  disk.  The descriptors also contain protection
             and accounting information.

         +   A special area of disk is used for  this  (disk  contains
             two  parts:   the  fixed-size  descriptor  array, and the
             remainder, which  is  allocated  for  data  and  indirect
             blocks).

         +   The size of the descriptor array is determined  when  the
             disk  is initialized, and can't be changed.  In Unix, the
             descriptor is called an inode (index node), and its index
             in the array is called its i-number.  Internally, the  OS
             uses the i-number to refer to the file.

     ************
     * - An Inode is not only the index of the file descriptor array, but also an index of
     *   the disk itself          
     * Q: How does the system know how big the descriptor array should be?
     * A: Doesn't matter.  Inodes are small (take up 3% of disk usually), so the system
     *    could afford to allocate more than needed and waste some.	
     ************

         +   The Inode is the focus of  all  file  activity  in  UNIX.
             There  is a unique inode allocated for each file, includ-
             ing directories.  An inode is 'named' by its  dev/inumber
             pair. (iget/iget.c)

         +   Inode fields:

             +   reference count (number of times open)
           *   number of links to file - how many directory entries (names)
               refer to this inode
             +   owner's user id, owner's group id
             +   number of bytes in file
             +   time last accessed, time  last  modified,  last  time
                 inode changed
           *   disk block addresses, indirect blocks (discussed previously) -
               the first 12 pointers are direct block addresses; the
               indirect block pointers follow them
             +   flags: (inode is locked, file has been modified, some
                 process waiting on lock)
             +   file mode: (type of file: character special, directo-
                 ry, block special, regular, symbolic link, socket),
                 +   A socket is an endpoint of a  communication,  re-
                     ferred  to by a descriptor, just like a file or a
                     pipe.  

         +   (items below not in text on 4.4BSD)
          *   protection info (~ 10 bits): (set user id on execution, set group
                id  on  execution,  read, write, execute permissions,
                sticky bit? (check))

     ************
     * sticky bit - when set on a directory, indicates that unprivileged users
     *              cannot rename or delete other users' files in that directory
     ************

     ************
     * Some OS history:
     *
     *	2 versions of Unix (pretty much the same):
     *	
     *	1) AT&T - sun OS based on this; sys V was 32 bits
     *	2) Berkeley - v. 4.1 had VM.  It was more robust early and then AT&T caught
     *                on.  Note: HP UX has a Berkeley kernel 
     ************

           *   count of shared locks on inode - read
           *   count of exclusive locks on inode - write

     ************
     * Note: Unix allows multiple processes to write to the same file at one time (Unix
     * isn't like a database system and does not guarantee consistency)
     ************

           *   unique identifier - when a file gets destroyed and a new one
               gets created using the same inode, it will have a different
               unique ID than previously.  This allows the system to
               distinguish between two different files that happen to have
               used the same inode.
             +   file sys associated with this inode
             +   quota structure controlling this file
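
     Many of the on-disk inode fields listed above can be inspected from a
     user program through the stat() call.  The following is a minimal
     sketch, assuming a Unix-like system; the helper name describe_inode and
     the file name are made up for illustration.  Fields that exist only in
     the in-memory inode (reference count, flags, lock counts) are not
     visible this way.

```python
import os
import stat
import tempfile

def describe_inode(path):
    """Return a dict of inode fields that the kernel exposes through stat()."""
    st = os.stat(path)
    return {
        "dev": st.st_dev,          # device that holds the file system
        "inumber": st.st_ino,      # index of the inode on that device
        "links": st.st_nlink,      # number of directory entries (hard links)
        "uid": st.st_uid,          # owner's user id
        "gid": st.st_gid,          # owner's group id
        "size": st.st_size,        # number of bytes in the file
        "atime": st.st_atime,      # time last accessed
        "mtime": st.st_mtime,      # time last modified
        "ctime": st.st_ctime,      # last time the inode itself changed
        "mode": stat.filemode(st.st_mode),  # file type and permission bits
    }

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.txt")
    with open(path, "w") as f:
        f.write("hello")
    info = describe_inode(path)
    for key, value in info.items():
        print(f"{key:8} {value}")
```

     The (dev, inumber) pair printed here is the same pair the notes describe
     as the 'name' of an inode.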


THE THREE DYNAMIC TABLES
------------------------

         +   When a file is open,  its  descriptor  is  kept  in  main
             memory.   When  the  file  is  closed,  the descriptor is
             stored back to disk.

             +   There is usually a per process table of open files.
                 +   In Unix, there is a process open file table, with
                     one  entry for each file open.  The integer entry
                     into that table is the handle for that file open.
                     Multiple opens for the file will get multiple en-
                     tries.  (note that if a process  forks,  a  given
                     entry can be shared by several processes.)
                     +   (standard-in is #0 and  standard-out  is  #1,
                         stderr is #2, must be per process.)

     ************
     * Note: Contains an index to the inode table
     ************

             +   Unix also has a system open file table, which  points
                 to the inode for the file (in the inode table).  This
                 table is system wide.  Maps names to files.

             +   There is also the inode table, which is a system-wide
                 table holding active and recently used inodes.

     ************
     * Q: If a file is opened multiple times, will the descriptor be loaded multiple times?
     * A: Yes in the process open file table
     *
     * Prof. Q: Why is a system open file table necessary and/or useful?
     *       A: Useful but not necessary.  Acts as a cache for the directory so the system
     *          doesn't need to search through it to perform every access.
     * However, the process open file table and the inode table are necessary b
     ************

             +   Descriptor is kept in OS space which  is  paged.   So
                 may  be  necessary  to  have  page  fault  to  get to
                 descriptor info.
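
     The difference between "multiple opens get multiple entries" and "a
     given entry can be shared" can be observed through the file offset.
     This is a small sketch, assuming a Unix-like system; os.dup is used
     here as a stand-in for the sharing that happens across fork, since
     both leave two handles referring to one open-file entry.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.txt")
    with open(path, "wb") as f:
        f.write(b"abcdefghij")

    fd_a = os.open(path, os.O_RDONLY)   # first open
    fd_b = os.open(path, os.O_RDONLY)   # second open: separate entry, own offset
    fd_c = os.dup(fd_a)                 # duplicate: shares fd_a's entry and offset

    first = os.read(fd_a, 3)            # advances the offset shared by fd_a and fd_c
    via_dup = os.read(fd_c, 3)          # continues where fd_a stopped
    via_second_open = os.read(fd_b, 3)  # starts again from the beginning

    # Both opens still lead to the same inode.
    same_inode = os.fstat(fd_a).st_ino == os.fstat(fd_b).st_ino

    for fd in (fd_a, fd_b, fd_c):
        os.close(fd)

print(first, via_dup, via_second_open, same_inode)
# prints: b'abc' b'def' b'abc' True
```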

APPROACHES TO DIRECTORY ORGANIZATION
------------------------------------

     +   Users need a way of referencing files that they leave  around
         on  disk.   One  approach  is  just  to  have  users remember
         descriptor indexes.  I.e. the user  would  have  to  remember
         something  like  the  number of the descriptor, or some such.
         Unfortunately, not very user friendly.

     +   Of course, users want to use text names to  refer  to  files.
         Special  disk  structures called directories are used to tell
         what descriptor indices correspond to what names.

     +   Approach #1:  have a single directory  for  the  whole  disk.
         Use a special area of disk to hold the directory.
         +   Directory contains <name, index> pairs.
         +   Problems:
             +   If one user uses a name, no-one else can.
             +   If you can't remember the name of  a  file,  you  may
                 have to look through a very long list.
             +   Security problem - people can  see  your  file  names
                 (which can be dangerous.)
         +   Old personal computers (pre-Windows) worked this way.

     +   Approach #2: have a separate directory for each user (TOPS-10
         approach).   This  is still clumsy:  names from a user's dif-
         ferent projects get confused.  Still can't remember names  of
         files.

         +   IBM's VM is similar to this.  Files  have  3  part  name:
             <name,  type,  location>, where location is A, B, C, etc.
             (i.e. which disk).  Very painful.  (Also, file names lim-
             ited to 8 characters.)

     ************
     * Because the first two approaches are flat, it is hard to find a specific file.
     * Not so great.
     ************

     +   #3 - Unix approach:  generalize the directory structure to  a
         tree.
         +   Directories are stored on disk just  like  regular  files
             (i.e.  file descriptor with 13 pointers, etc.).

     ************
     * Q: How can one tell a directory from a file from the name alone?
     * A: Must search through inodes and check the type bit.  Inodes contain everything but the
     *    file name.  Don't want to keep the type bit or other attributes outside of the inode
     *    because they need to be kept consistent throughout.
     ************
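
     The type check described in the Q/A above can be sketched from user
     space.  This assumes a Unix-like system; file_kind and the two path
     names are hypothetical.  The point is that the answer comes from the
     mode field of the inode, not from the name.

```python
import os
import stat
import tempfile

def file_kind(path):
    """Classify a path by the type bits in its inode's mode field."""
    mode = os.lstat(path).st_mode      # lstat: do not follow symbolic links
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISREG(mode):
        return "regular"
    return "other"

with tempfile.TemporaryDirectory() as d:
    plain = os.path.join(d, "notes")       # the name does not say "file"
    subdir = os.path.join(d, "notes.d")    # the name does not say "directory"
    open(plain, "w").close()
    os.mkdir(subdir)
    kinds = (file_kind(plain), file_kind(subdir))

print(kinds)
```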

             +   User programs can manipulate directories almost  like
                 any  other  file.  Only  special  system programs may
                 write directories.

         +   Each directory  contains  <name,  file  descriptor  index
             (inode  #)>  pairs.  The file pointed to by the index may
             be  another  directory.   Hence,  get  hierarchical  tree
             structure.   Names  have slashes separating the levels of
             the tree.

         +   There is one special directory, called  the  root.   This
             directory  has  no  name,  and  is the file pointed to by
             descriptor 2 (descriptors 0 and 1 have other special pur-
             poses).
             +   Note that we need ROOT.  Otherwise, we would have  no
                 way  to  reach any files.  From root, we can get any-
                 where in the file system.

     ************
     * ROOT is represented by / (a path name that begins with / starts at the root)
     ************
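
     Path name lookup over <name, inode #> pairs can be sketched with a toy
     in-memory "disk".  Everything here is illustrative: the inodes table,
     its names and numbers, and the resolve function are made up, and only
     the choice of descriptor 2 for the root comes from the notes.

```python
ROOT_INUMBER = 2   # the notes place the root directory at descriptor 2

# Toy "disk": i-number -> inode.  Directory inodes hold <name, i-number> pairs.
inodes = {
    2: {"type": "dir", "entries": {"usr": 3, "etc": 4}},
    3: {"type": "dir", "entries": {"bin": 5}},
    4: {"type": "dir", "entries": {"passwd": 6}},
    5: {"type": "dir", "entries": {"ls": 7}},
    6: {"type": "file", "data": b""},
    7: {"type": "file", "data": b""},
}

def resolve(path, cwd_inumber=ROOT_INUMBER):
    """Translate a path name to an i-number, one directory lookup per component."""
    # A leading slash starts at the root; anything else starts at the
    # working directory.
    current = ROOT_INUMBER if path.startswith("/") else cwd_inumber
    for component in path.split("/"):
        if component in ("", "."):
            continue                       # empty piece or "stay here"
        inode = inodes[current]
        if inode["type"] != "dir":
            raise NotADirectoryError(path)
        if component not in inode["entries"]:
            raise FileNotFoundError(path)
        current = inode["entries"][component]
    return current

print(resolve("/usr/bin/ls"))              # full path name, from the root
print(resolve("bin/ls", cwd_inumber=3))    # relative name, from /usr
```

     Note that resolving /usr/bin/ls reads three directories (root, usr,
     bin) before the file's own inode is reached.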

         +   Full file name is the path  name,  i.e.  full  name  from
             root.
             +   A directory consists of  some  number  of  blocks  of
                 DIRBLKSIZ  bytes, where DIRBLKSIZ is chosen such that
                 it can be transferred to  disk  in  a  single  atomic
                 operation (e.g. 512 bytes on most machines).
             +   Each directory block contains some number of directo-
                 ry  entry  structures,  which are of variable length.
                 Each directory entry has info at  the  front  of  it,
                 containing its inode number, the length of the entry,
                 and the length of the name contained  in  the  entry.
                 These  are  followed  by  the name padded to a 4 byte
                 boundary with null bytes.  All names  are  guaranteed
                 null terminated.
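
     The variable-length entry layout just described can be sketched as
     follows.  The field widths (4-byte inode number, 2-byte entry length,
     2-byte name length) and the little-endian byte order are assumptions
     made for this example, and the function names are hypothetical.  The
     sketch also ignores how entries are fitted into DIRBLKSIZ blocks.

```python
import struct

# Assumed header layout: inode number, entry length, name length.
HEADER = struct.Struct("<IHH")

def pack_entry(inumber, name):
    """Build one variable-length directory entry."""
    raw = name.encode("ascii") + b"\0"       # name is always null terminated
    padded_len = (len(raw) + 3) & ~3         # round up to a 4-byte boundary
    raw = raw.ljust(padded_len, b"\0")       # pad with null bytes
    reclen = HEADER.size + padded_len
    return HEADER.pack(inumber, reclen, len(name)) + raw

def unpack_entries(block):
    """Walk a packed run of entries, yielding (inumber, name) pairs."""
    offset = 0
    while offset < len(block):
        inumber, reclen, namlen = HEADER.unpack_from(block, offset)
        start = offset + HEADER.size
        yield inumber, block[start:start + namlen].decode("ascii")
        offset += reclen                     # entry length leads to the next entry

block = pack_entry(2, ".") + pack_entry(2, "..") + pack_entry(11, "notes.txt")
print(list(unpack_entries(block)))
```

     Storing the entry length in each entry is what lets a reader skip from
     one entry to the next without knowing the name lengths in advance.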

     ************
     * Optimization - place most frequently used inodes in the middle of the disk
     ************

         +   Note that in Unix, a file name is not the name of a file.
             It  is only a name by which the kernel can search for the
             file.  The inode is really the "name" of the file.
             +   Each pointer from a directory to a file is  called  a
                 hard link.
                 +   In some systems, there is a distinction between a
                     "branch" and a "link", where the link is a secon-
                     dary access path, and the branch is  the  primary
                     one (goes with ownership).
             +   You "erase" a file by removing  a  link  to  it.   In
                 reality,  a count is kept of the number of links to a
                 file.  It is only really erased when the last link is
                 removed.
                 +   To really erase a file, we put the blocks of  the
                     file on the free list.

     ************
     * Erasing a link - sys. call to decrement link count.  When count = 0, place blocks on the
     *                  free list
     ************
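
     The link count behavior can be observed directly.  This is a minimal
     sketch, assuming a Unix-like system and a file system that supports
     hard links; the file names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "report")
    second = os.path.join(d, "report-alias")
    with open(first, "w") as f:
        f.write("contents")

    os.link(first, second)                        # add a second hard link
    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    links_after_link = os.stat(first).st_nlink    # both names count

    os.unlink(first)                              # "erase" one name
    links_after_unlink = os.stat(second).st_nlink
    with open(second) as f:
        survived = f.read()                       # data is still reachable

print(same_inode, links_after_link, links_after_unlink, survived)
```

     The blocks are only freed once the remaining name is unlinked as well
     (and no process still has the file open).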

     *   Symbolic Links - note multiple hard links are possible for a file in unix
         +   There are two ways to  "link"  to  another  directory  or
             file.   One is a direct pointer.  In Unix, such links are
             limited to not cross "file systems" - i.e. not to another
             disk.
         +   We can use symbolic links, by which instead  of  pointing
             to  the  file  or  directory, we have a symbolic name for
             that file or directory.
         +   We need to be careful not to create cycles in the  direc-
             tory  system - otherwise recursive operations on the file
             system will loop.  (E.g.   cp  -r).   In  Unix,  this  is
             solved  by  not  permitting hard links to existing direc-
             tories (except by the superuser).
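
     For contrast with hard links, a symbolic link stores a name rather
     than an inode pointer.  The sketch below assumes a Unix-like system
     where the caller may create symbolic links; the file names are
     hypothetical.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target.txt")
    link = os.path.join(d, "shortcut")
    with open(target, "w") as f:
        f.write("data")

    os.symlink(target, link)                   # link holds the *name* of target
    stored_name = os.readlink(link)
    link_is_symlink = stat.S_ISLNK(os.lstat(link).st_mode)
    target_links = os.stat(target).st_nlink    # a symlink adds no hard link
    different_inodes = os.lstat(link).st_ino != os.lstat(target).st_ino

    os.unlink(target)                          # remove the target
    # The link itself still exists, but following it now fails.
    dangling = os.path.lexists(link) and not os.path.exists(link)

print(stored_name == target, link_is_symlink, target_links, different_inodes, dangling)
```

     Because the link count of the target is not affected, removing the
     target leaves the symbolic link dangling, which cannot happen with a
     hard link.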

SCHEMATIC OF INODES AND DIRECTORY
---------------------------------

		Directory x		  -------------------
	------------------------  <------ | Inode for Dir x |
	|	    |          |	  -------------------
 	------------------------
	|	    |          |
 	------------------------
	|	    |          |
 	------------------------	  ---------
	|  Name	    | Inode #  |  ------> | Inode |
 	------------------------	  ---------
	|	    |          |          /  | |  \ 
 	------------------------         /   | |   \
 					 file blocks

     ************
     * There is a static inode table that resides on disk which catalogues the physical locations
     * of the inodes. (note difference from the dynamic inode table).
     ************

     +   Pros and Cons of tree structured directory scheme
         +   Can organize files in logical manner.  Easy to  find  the
             file  you're  looking  for,  even  if  you  don't exactly
             remember its name.
         +   "Name" of the file is in fact a concatenation of the path
             from  the  root.   Thus name is actually quite long- pro-
             vides semantic info.
         +   Can have duplicate names, if path to  the  file  is  dif-
             ferent.
         +   Can (assuming protection scheme permits) give away access
             to  a subdirectory and the files under it, without giving
             access to all files.  (Note: Unix does not permit  multi-
             ple hard links to a directory, unless done by superuser.)
         +   Access to a  file  requires  only  reading  the  relevant
             directories,  not  the entire list of files.  (My list of
             files prints out to a 1/2" printout-  20000 files)
         +   Structure is more complex to move around and maintain
         +   A file access may require that many directories be  read,
             not just one.
         +   It is very nice that directories and file descriptors are
             separate,  and that directories are implemented just like
             files.  This simplifies the implementation and management
             of  the structure (can write ``normal'' programs to mani-
             pulate them as files).
             +   I.e.  the  file  descriptors  are  things  the   user
                 shouldn't  have to touch.  Directories can be treated
                 as normal files.

     +   Working directory:  it is cumbersome constantly  to  have  to
         specify the full path name for all files
         +   In Unix, there is one directory per process,  called  the
             working directory, which the system remembers.
         +   This is not the same as  the  home  directory,  which  is
             where  you are at log-in time, and which is in effect the
             root of your personal file system.

     +   Every user has a search path, which is a list of  directories
         in  which  to look to resolve a file name.  The first element
         is almost always the working directory
         +   A leading ``/'' is an escape to allow  full  path  names.
             I.e.  most  names are relative file names.  Ones that be-
             gin with "/" are full (complete) path names.
         +   Note that in Unix, the search path is maintained  by  the
             shell.  If any other program wants to do the same, it has
             to rebuild the facilities from scratch.  Should be in the
             OS.  ("set path" in .cshrc or .login.)
             +   My path is: (. ~/bin /usr/new /usr/ucb /bin  /usr/bin
                 /usr/local /usr/hosts ~/com)
             +   Basically, want to look in  working  directory,  then
                 system library directories.
             +   We probably don't want a search strategy that actual-
                 ly  searches more widely.  If it did, it might find a
                 file that wasn't really the target.
         +   This is yet another example of locality.
         +   Simple means to change working  directory  -  "cd".   Can
             also  refer  to  directories  of other users by prefacing
             their logins by "~".
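
     The search path lookup done by the shell can be sketched in a few
     lines.  This is an illustration only: the function name search and
     the directory names are made up, and a real shell also checks execute
     permission, which is omitted here.

```python
import os
import tempfile

def search(name, path_dirs):
    """Return the first match for name along path_dirs, or None."""
    if "/" in name:
        # A name containing a slash is used as given; no search is done.
        return name if os.path.exists(name) else None
    for directory in path_dirs:            # order matters: first hit wins
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

with tempfile.TemporaryDirectory() as d:
    work = os.path.join(d, "work")         # plays the working directory
    lib = os.path.join(d, "lib")           # plays a system library directory
    os.mkdir(work)
    os.mkdir(lib)
    for directory in (work, lib):
        open(os.path.join(directory, "tool"), "w").close()
    open(os.path.join(lib, "only-in-lib"), "w").close()

    found_tool = search("tool", [work, lib])
    found_lib = search("only-in-lib", [work, lib])
    found_none = search("missing", [work, lib])
    expected = (os.path.join(work, "tool"), os.path.join(lib, "only-in-lib"))

print(found_tool == expected[0], found_lib == expected[1], found_none)
```

     Since the first match wins, a file named "tool" in the working
     directory hides the one in the library directory.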

FILE OPERATIONS
---------------

         +   Open - put a file descriptor  into  your  table  of  open
             files.   Those  are  the files that you can use.  May re-
             quire that locks be set, and a user count be incremented.
             (If  any  locking  is  involved,  may  have  to check for
             deadlock.)
         +   Close - inverse of open.
         +   Create a file - sometimes done automatically by open
         +   Remove (rm) or erase - drop the link to  the  file.   Put
             the  blocks  back  on  the  free list if this is the last
             link.

     ************
     * Note: System does not know which inodes are invalid.
     ************

         +   Read - read a record from the file.  (This usually  means
             that  there is an "access method" - i.e. I/O code - which
             deals with the user in terms of records, and  the  device
             in terms of physical blocks).
         +   Write - like read, but may also require disk space  allo-
             cation.
         +   Rename ("mv" or "move") - rename the file.  Unix combines
             two different operations here.  Rename would strictly in-
             volve changing the file name within the  same  directory.
             "move"  moves  the  file  from  one directory to another.
             Unix does both with one command.
             +   Note that mv also destroys the old file if there is one
                 with the new name.  (This could be considered a bug, not
                 a feature.)
         +   Seek - move to a given location in the file.
         +   Synch - write blocks of file from disk cache back to disk
         +   Change properties (e.g. protection info, owner)
         +   Link - add a link to a file
         +   Lock & Unlock - lock/unlock the file.

     ************
     * Note: Locks in Unix are mostly advisory: they warn cooperating processes but do
     *       not stop a process that ignores them.
     ************

         +   Partial Erase (truncate)
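
     The advisory nature of Unix locks can be seen with flock.  This sketch
     assumes a Unix-like system with a local file system (the fcntl module
     is not available elsewhere); the file name is made up.  A second
     attempt to take the lock is refused, but a write that never asks for
     the lock goes through anyway.

```python
import fcntl
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "shared.txt")
    holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    other = os.open(path, os.O_RDWR)       # independent open of the same file

    fcntl.flock(holder, fcntl.LOCK_EX)     # holder takes an exclusive lock

    try:
        # A cooperating program asks for the lock first (non-blocking here).
        fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
        second_lock_refused = False
        fcntl.flock(other, fcntl.LOCK_UN)
    except BlockingIOError:
        second_lock_refused = True         # a cooperating program would back off

    # A program that ignores the lock can still write.
    written = os.write(other, b"ignored the lock")

    fcntl.flock(holder, fcntl.LOCK_UN)
    os.close(holder)
    os.close(other)

print(second_lock_refused, written)
```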

     +   Note that commands such as "copy", "cat", etc., are built out
         of the simpler commands listed above.

     ************
     * Ex: copy = read + write
     ************
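
     As an illustration of building "copy" out of the simpler operations,
     here is a minimal sketch that uses only open, read, write, and close.
     The function name copy, the chunk size, and the file names are choices
     made for this example.

```python
import os
import tempfile

def copy(src, dst, chunk_size=4096):
    """Copy src to dst using only open, read, write, and close."""
    src_fd = os.open(src, os.O_RDONLY)
    try:
        # O_CREAT: create the destination if needed (create done by open).
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            copied = 0
            while True:
                chunk = os.read(src_fd, chunk_size)
                if not chunk:              # empty read means end of file
                    break
                view = memoryview(chunk)
                while len(view) > 0:       # write may accept fewer bytes than asked
                    written = os.write(dst_fd, view)
                    view = view[written:]
                copied += len(chunk)
            return copied
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "original")
    dst = os.path.join(d, "duplicate")
    payload = b"0123456789" * 1000
    with open(src, "wb") as f:
        f.write(payload)
    copied = copy(src, dst)
    with open(dst, "rb") as f:
        result = f.read()

print(copied, result == payload)
```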