Peng Lou
4/4 Notes
Topic: Directories and Other File System Topics

FILE DESCRIPTOR AND INODE
-------------------------

+ A file descriptor is a data structure or record that describes the file.
+ The file descriptor information has to be stored on disk, so it will stay
  around even when the OS doesn't. (Note that we are assuming that disk
  contents are permanent.)

************
* - Why store file descriptors on disk? It is a logical and convenient
*   approach, and it's fairly fast.
************

+ In Unix, all the descriptors are stored in a fixed-size array on disk. The
  descriptors also contain protection and accounting information.
+ A special area of disk is used for this (the disk contains two parts: the
  fixed-size descriptor array, and the remainder, which is allocated for data
  and indirect blocks).
+ The size of the descriptor array is determined when the disk is
  initialized, and can't be changed. In Unix, the descriptor is called an
  inode (index node), and its index in the array is called its i-number.
  Internally, the OS uses the i-number to refer to the file.

************
* - An inode is not only the index of the file descriptor array, but also an
*   index of the disk itself.
* Q: How does the system know how big the descriptor array should be?
* A: Doesn't matter. Inodes are small (they usually take up about 3% of the
*    disk), so the system can afford to allocate more than needed and waste
*    some.
************

+ The inode is the focus of all file activity in UNIX. There is a unique inode
  allocated for each file, including directories. An inode is 'named' by its
  dev/inumber pair.
  (iget/iget.c)
+ Inode fields:
    + reference count (number of times open)
    * number of links to file
      - how many directory entries point to the file
    + owner's user id, owner's group id
    + number of bytes in file
    + time last accessed, time last modified, last time inode changed
    * disk block addresses, indirect blocks (discussed previously)
      - the first 12 block addresses are direct; the indirect block pointers
        follow them
    + flags: (inode is locked, file has been modified, some process waiting
      on lock)
    + file mode: (type of file: character special, directory, block special,
      regular, symbolic link, socket)
        + A socket is an endpoint of a communication, referred to by a
          descriptor, just like a file or a pipe.
    + (items below not in text on 4.4BSD)
    * protection info (~ 10 bits): (set user id on execution, set group id on
      execution, read, write, execute permissions, sticky bit? (check))

************
* sticky bit - indicates unprivileged users cannot change or delete files in
*   the directory
************

************
* Some OS history:
*
* 2 versions of Unix (pretty much the same):
*
*   1) AT&T - sun OS based on this; sys V was 32 bits
*   2) Berkeley - v. 4.1 had VM. It was more robust early and then AT&T caught
*      on. Note: HP UX has a Berkeley kernel
************

    * count of shared locks on inode - read
    * count of exclusive locks on inode - write

************
* Note: Unix allows multiple writes to the same file at one time (unix isn't
* like a data system and does not guarantee consistency)
************

    * unique identifier - when a file gets destroyed and a new one gets
      created using the same inode, it will have a different unique ID than
      previously. This allows the system to distinguish between two different
      files that happen to have used the same inode.
    + file sys associated with this inode
    + quota structure controlling this file

THE THREE DYNAMIC TABLES
------------------------

+ When a file is open, its descriptor is kept in main memory. When the file is
  closed, the descriptor is stored back to disk.
+ There is usually a per-process table of open files.
+ In Unix, there is a process open file table, with one entry for each file
  open. The integer index into that table is the handle for that open file.
  Multiple opens of the same file get multiple entries. (Note that if a
  process forks, a given entry can be shared by several processes.)
    + (standard-in is #0, standard-out is #1, stderr is #2; must be per
      process.)

************
* Note: Contains an index to the inode table
************

+ Unix also has a system open file table, which points to the inode for the
  file (in the inode table). This table is system wide. Maps names to files.
+ There is also the inode table, which is a system-wide table holding active
  and recently used inodes.

************
* Q: If a file is opened multiple times, will the descriptor be loaded
*    multiple times?
* A: Yes in the process open file table
*
* Prof. Q: Why is a system open file table necessary and/or useful?
* A: Useful but not necessary. Acts as a cache for the directory so the system
*    doesn't need to search through it to perform every access.
*    However, the process open file table and the inode table are necessary b
************

+ Descriptor is kept in OS space, which is paged. So it may be necessary to
  take a page fault to get to descriptor info.

APPROACHES TO DIRECTORY ORGANIZATION
------------------------------------

+ Users need a way of referencing files that they leave around on disk. One
  approach is just to have users remember descriptor indexes, i.e. the user
  would have to remember something like the number of the descriptor.
  Unfortunately, not very user friendly.
+ Of course, users want to use text names to refer to files. Special disk
  structures called directories are used to tell what descriptor indices
  correspond to what names.
+ Approach #1: have a single directory for the whole disk. Use a special area
  of disk to hold the directory.
    + Directory contains <name, index> pairs.
    + Problems:
        + If one user uses a name, no one else can.
        + If you can't remember the name of a file, you may have to look
          through a very long list.
        + Security problem - people can see your file names (which can be
          dangerous).
    + Old personal computers (pre-Windows) work this way.
+ Approach #2: have a separate directory for each user (TOPS-10 approach).
  This is still clumsy: names from a user's different projects get confused.
  Still can't remember names of files.
    + IBM's VM is similar to this. Files have 3 part name: <name, type,
      location>, where location is A, B, C, etc. (i.e. which disk). Very
      painful. (Also, file names limited to 8 characters.)

************
* Due to the flat structure of the first two approaches, it is very difficult
* to find specific files. Not so great.
************

+ #3 - Unix approach: generalize the directory structure to a tree.
    + Directories are stored on disk just like regular files (i.e. file
      descriptor with 13 pointers, etc.).

************
* Q: How can one tell a directory from a file from the name alone?
* A: Must search through inodes and check the type bit. Inodes contain
*    everything but the file name. Don't want to keep the type bit or other
*    attributes outside of the inode because they need to be kept consistent
*    throughout.
************

    + User programs can manipulate directories almost like any other file.
      Only special system programs may write directories.
    + Each directory contains <name, descriptor index> pairs. The file pointed
      to by the index may be another directory. Hence, get hierarchical tree
      structure. Names have slashes separating the levels of the tree.
    + There is one special directory, called the root. This directory has no
      name, and is the file pointed to by descriptor 2 (descriptors 0 and 1
      have other special purposes).
    + Note that we need ROOT. Otherwise, we would have no way to reach any
      files. From root, we can get anywhere in the file system.

************
* ROOT is represented by .
************

    + Full file name is the path name, i.e.
      full name from root.
    + A directory consists of some number of blocks of DIRBLKSIZ bytes, where
      DIRBLKSIZ is chosen such that it can be transferred to disk in a single
      atomic operation (e.g. 512 bytes on most machines).
    + Each directory block contains some number of directory entry
      structures, which are of variable length. Each directory entry has info
      at the front of it, containing its inode number, the length of the
      entry, and the length of the name contained in the entry. These are
      followed by the name padded to a 4 byte boundary with null bytes. All
      names are guaranteed null terminated.

************
* Optimization - place most frequently used inodes in the middle of the disk
************

    + Note that in Unix, a file name is not the name of a file. It is only a
      name by which the kernel can search for the file. The inode is really
      the "name" of the file.
    + Each pointer from a directory to a file is called a hard link.
        + In some systems, there is a distinction between a "branch" and a
          "link", where the link is a secondary access path, and the branch is
          the primary one (goes with ownership).
    + You "erase" a file by removing a link to it. In reality, a count is kept
      of the number of links to a file. It is only really erased when the
      last link is removed.
    + To really erase a file, we put the blocks of the file on the free list.

************
* Erasing a link - sys. call to decrement link count. When count = 0, place
* blocks on the free list
************

* Symbolic Links - note multiple hard links are possible for a file in unix
    + There are two ways to "link" to another directory or file. One is a
      direct pointer. In Unix, such links are limited to not cross "file
      systems" - i.e. not to another disk.
    + We can use symbolic links, by which instead of pointing to the file or
      directory, we have a symbolic name for that file or directory.
    + We need to be careful not to create cycles in the directory system -
      otherwise recursive operations on the file system will loop.
      (E.g. cp -r). In Unix, this is solved by not permitting hard links to
      existing directories (except by the superuser).

SCHEMATIC OF INODEs AND DIRECTORY
--------------------------------

        Directory x                -------------------
  ------------------------ <------ | Inode for Dir x |
  |          |           |         -------------------
  ------------------------
  |          |           |
  ------------------------         ---------
  |   Name   |  Inode #  | ------> | Inode |
  ------------------------         ---------
  |          |           |         / |   | \
  ------------------------        /  |   |  \
                                  file blocks

************
* There is a static inode table that resides on disk which catalogues the
* physical locations of the inodes. (Note the difference from the dynamic
* inode table.)
************

+ Pros and cons of tree-structured directory scheme
    + Can organize files in logical manner. Easy to find the file you're
      looking for, even if you don't exactly remember its name.
    + "Name" of the file is in fact a concatenation of the path from the
      root. Thus name is actually quite long - provides semantic info.
    + Can have duplicate names, if path to the file is different.
    + Can (assuming protection scheme permits) give away access to a
      subdirectory and the files under it, without giving access to all
      files. (Note: Unix does not permit multiple hard links to a directory,
      unless done by superuser.)
    + Access to a file requires only reading the relevant directories, not
      the entire list of files. (My list of files prints out to a 1/2"
      printout - 20000 files)
    + Structure is more complex to move around and maintain.
    + A file access may require that many directories be read, not just one.
+ It is very nice that directories and file descriptors are separate, and
  that directories are implemented just like files. This simplifies the
  implementation and management of the structure (can write ``normal''
  programs to manipulate them as files).
    + I.e. the file descriptors are things the user shouldn't have to touch.
      Directories can be treated as normal files.
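
The tree-structured scheme can be sketched as a toy in-memory model. This is
only an illustration under stated assumptions, not how any real kernel lays
out its data: the names `Inode`, `inodes`, `add_entry`, `resolve`, `unlink`
and `ROOT_INO` are hypothetical and do not come from these notes or from Unix
source. Each directory is an inode whose contents map names to i-numbers, a
path name is resolved by walking from the root one component at a time, and a
file is only freed when its last link is removed.

```python
from dataclasses import dataclass, field

ROOT_INO = 2  # the notes place the root directory at descriptor 2


@dataclass
class Inode:
    """Toy inode: holds everything about a file except its name."""
    is_dir: bool
    nlink: int = 0                               # number of hard links
    entries: dict = field(default_factory=dict)  # name -> i-number (directories)
    data: bytes = b""                            # contents (regular files)


# Hypothetical inode table indexed by i-number.
inodes = {ROOT_INO: Inode(is_dir=True, nlink=1)}


def add_entry(dir_ino, name, ino):
    """Add a <name, i-number> pair to a directory; this is a hard link."""
    inodes[dir_ino].entries[name] = ino
    inodes[ino].nlink += 1


def resolve(path):
    """Walk a full path name from the root and return the i-number it names."""
    ino = ROOT_INO
    for part in path.strip("/").split("/"):
        if not part:                 # the path "/" names the root itself
            continue
        node = inodes[ino]
        if not node.is_dir:
            raise NotADirectoryError(path)
        if part not in node.entries:
            raise FileNotFoundError(path)
        ino = node.entries[part]
    return ino


def unlink(dir_ino, name):
    """Remove a link; free the inode only when the last link is gone."""
    ino = inodes[dir_ino].entries.pop(name)
    inodes[ino].nlink -= 1
    if inodes[ino].nlink == 0:
        del inodes[ino]              # stands in for freeing the file's blocks


# Build /usr/notes.txt, plus a second hard link /n.txt to the same inode.
inodes[3] = Inode(is_dir=True)
inodes[4] = Inode(is_dir=False, data=b"hello")
add_entry(ROOT_INO, "usr", 3)
add_entry(3, "notes.txt", 4)
add_entry(ROOT_INO, "n.txt", 4)
```

Resolving "/usr/notes.txt" reads two directories (the root and usr), which is
the extra cost noted in the cons above, while "/n.txt" reaches the same inode
through a different name.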
+ Working directory: it is cumbersome constantly to have to specify the full
  path name for all files.
    + In Unix, there is one directory per process, called the working
      directory, which the system remembers.
    + This is not the same as the home directory, which is where you are at
      log-in time, and which is in effect the root of your personal file
      system.
    + Every user has a search path, which is a list of directories in which to
      look to resolve a file name. The first element is almost always the
      working directory.
    + ``/'' is an escape to allow full path names. I.e. most names are
      relative file names. Ones beginning with "/" are full (complete) path
      names.
    + Note that in Unix, the search path is maintained by the shell. If any
      other program wants to do the same, it has to rebuild the facilities
      from scratch. Should be in the OS. ("set path" in .cshrc or .login.)
    + My path is: (. ~/bin /usr/new /usr/ucb /bin /usr/bin /usr/local
      /usr/hosts ~/com)
    + Basically, want to look in working directory, then system library
      directories.
    + We probably don't want a search strategy that actually searches more
      widely. If it did, it might find a file that wasn't really the target.
        + This is yet another example of locality.
    + Simple means to change working directory - "cd". Can also refer to
      directories of other users by prefacing their logins by "~".

FILE OPERATIONS
---------------

+ Open - put a file descriptor into your table of open files. Those are the
  files that you can use. May require that locks be set, and a user count be
  incremented. (If any locking is involved, may have to check for deadlock.)
+ Close - inverse of open.
+ Create a file - sometimes done automatically by open.
+ Remove (rm) or erase - drop the link to the file. Put the blocks back on the
  free list if this is the last link.

************
* Note: System does not know which inodes are invalid.
************

+ Read - read a record from the file. (This usually means that there is an
  "access method" - i.e.
  I/O code - which deals with the user in terms of records, and the device in
  terms of physical blocks).
+ Write - like read, but may also require disk space allocation.
+ Rename ("mv" or "move") - rename the file. Unix combines two different
  operations here. Rename would strictly involve changing the file name
  within the same directory. "move" moves the file from one directory to
  another. Unix does both with one command.
    + Note that mv also destroys the old file if there is one with the new
      name (which could be considered a bug, not a feature).
+ Seek - move to a given location in the file.
+ Synch - write blocks of file from disk cache back to disk.
+ Change properties (e.g. protection info, owner)
+ Link - add a link to a file.
+ Lock & Unlock - lock/unlock the file.

************
* Note: Locks in Unix are mostly just warnings.
************

+ Partial Erase (truncate)
+ Note that commands such as "copy", "cat", etc., are built out of the simpler
  commands listed above.

************
* Ex: copy = read + write
************
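
As a rough illustration of the operations listed above, here is a minimal
sketch using Python's `os` module, which wraps the corresponding Unix system
calls. The helper names `copy` and `demo` and the file names "a", "b", "c"
are assumptions made for this example only. It shows copy built from read and
write, and a file surviving the removal of one of its two hard links.

```python
import os
import tempfile


def copy(src, dst, bufsize=4096):
    """Copy src to dst using only open, read, write, and close."""
    fd_in = os.open(src, os.O_RDONLY)
    fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while True:
            chunk = os.read(fd_in, bufsize)
            if not chunk:            # an empty read means end of file
                break
            view = memoryview(chunk)
            while view:              # os.write may write fewer bytes than asked
                written = os.write(fd_out, view)
                view = view[written:]
    finally:
        os.close(fd_in)
        os.close(fd_out)


def demo():
    """Exercise create, write, seek, read, link, remove, and copy."""
    with tempfile.TemporaryDirectory() as d:
        a, b, c = (os.path.join(d, name) for name in ("a", "b", "c"))

        fd = os.open(a, os.O_RDWR | os.O_CREAT, 0o644)  # open creates the file
        os.write(fd, b"hello world")
        os.lseek(fd, 6, os.SEEK_SET)                    # seek to byte offset 6
        tail = os.read(fd, 5)
        os.close(fd)

        os.link(a, b)                                   # add a second hard link
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        links_before = os.stat(a).st_nlink

        os.unlink(a)                                    # drop one link
        links_after = os.stat(b).st_nlink               # file is still there

        copy(b, c)
        with open(c, "rb") as f:
            copied = f.read()
        return tail, same_inode, links_before, links_after, copied
```

Calling `demo()` returns the bytes read after the seek, whether both names
share one i-number, the link count before and after the unlink, and the
contents of the copy.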