CS 162 Lecture Notes for Monday 3/28/2005

Topic: Device Interconnection

- Device interconnection:
  * Used to be like trees; now it's complicated.
  * In small systems, the CPU connects directly to the device
    controller, which drives the device.  The controller may be
    built into the device.
  * Multiple devices can be attached to a SCSI bus.
  * In IBM mainframe systems, there are channels.  Channels connect
    to storage control units.  Storage controllers connect to string
    controllers.  String controllers have a number of disks on them.
  * The CPU talks to channels, which talk to storage controllers,
    which talk to string controllers, which talk to device
    controllers, which talk to the device.
  * There can be multiple paths from the CPU to a disk.
  * Devices can be shared among CPUs at the level of the storage
    controller or the string controller.

- NAS and SAN:
  * NAS - network attached storage.  Storage attaches to a local
    area network (e.g. Ethernet).  Provides a "file" interface.
    Low- to mid-range product.
  * SAN - storage area network.  Separate(?) network containing
    storage.  "Block level" interface.  Mid- to high-end product;
    4-7 figure storage.
  * Storage Networking Industry Association (SNIA) - working on
    standards, so NAS and SAN are standard and can interoperate and
    attach to many types of systems.

- Storage Service Providers:
  * Storage provided by a third party.  Often connects over the
    internet, or via dedicated cables to the provider.  Expensive.
  * Used to be a very profitable idea, but now that storage is so
    cheap, most storage provided by third parties is free.
  * For example, Google gives 1 gigabyte of storage for free now.

(Picture of device interconnections shows the CPU communicating with
channels through channel adapters.)

Direct attached storage:

      ---------
      |Server |
      ---------
         ||
         \/
      ---------
      | disk  |
      ---------

SAN: described like a cloud; the cloud is the interface between the
servers and the disk.
      ---------   ---------   ---------
      |Server |   |Server |   |Server |
      ---------   ---------   ---------
          \           ||          /
           \          ||         /
          -------------------------
          |         Cloud         |
          -------------------------
                      ||
                      \/
                   --------
                   | disk |
                   --------

- There used to be disk storage the size of washing machines.
  Because of the arm size, the inertia of the arm moving from one
  section to another could move the whole disk across the room.

(Picture of IBM 3330, 100MB per spindle)
(Picture of IBM 2870 Multiplexor Channel, the size of 2
refrigerators)

- Flash Memory:
  * A kind of memory storage, like DRAM.
  * Doesn't lose its contents when there is no power.
  * Has a limit on how many times it can be rewritten until it
    breaks.  Currently that number is in the 6 figures.

Topic: File Structure, I/O Optimization

- File: a named collection of bits (usually stored on disk).
  * From the OS's standpoint, the file consists of a bunch of
    blocks stored on the device.
  * The programmer may actually see a different interface (bytes or
    records), but this doesn't matter to the file system (just pack
    bytes into blocks, and unpack them again on reading).
  * A file may have attributes and properties - e.g. name(s),
    protection, type (numeric, alphabetic, binary, C program,
    Fortran program, data, etc.), time of creation, time of last
    use, time of last modification, owner, length, link count,
    layout (format).

- How do we (or can we) use a file?
  * Sequential: information is processed in order, one piece after
    the other.  This is by far the most common mode: e.g. the
    editor writes out a new file, the compiler compiles it, etc.
  * Random access: can address any block in the file directly,
    without passing through its predecessors.  E.g. the data set
    for demand paging, libraries, databases.  Need to know what
    block we want (e.g. some sort of index or address is needed).
  * Keyed: search for blocks with particular values, e.g. hash
    table, associative database, dictionary.  Usually not provided
    by the operating system (but is provided in some IBM systems).
    Keyed access can be considered a form of random access.

- Modern file and I/O systems must address four general problems:
  1. Disk management:
     * Efficient use of disk space.
     * Fast access to files.
     * File structures.
     * Device use optimization.
     * The user has a hardware-independent view of the disk.
       (Mostly, so does the OS.)
  2. Naming: how do users refer to files?
     * This concerns directories, links, etc.
  3. Protection: all users are not equal.
     * Want to protect users from each other.
     * Want to have files from various users on the same disk.
     * Want to permit controlled sharing.
  4. Reliability:
     * Information must last safely for long periods of time.

- Disk Management:
  * How should the blocks of the file be placed on the disk?
  * How should the map used to find and access the blocks look?
  * File descriptor - a data structure that gives the file's
    attributes and contains the map which tells you where the
    blocks of your file are.  Example: inodes in Unix.
  * File descriptors are stored on disk along with the files (when
    the files are not open).

- Some system, user and file characteristics (we need to know what
  files in systems are like so we know how to handle them):
  * Most files are small.
  * In Unix, most files are very small.  Lots of files with a few
    commands in them, etc.
  * Much of the disk is allocated to large files.
  * Many of the I/O operations are made to large files.
  * Most (between 60% and 85%) of the I/Os are reads.
  * Most I/Os are sequential.
  Thus, per-file cost must be low, but large files must have good
  performance.

- File block layout and access:
  * Contiguous
  * Linked
  * Indexed or tree structured
  Note - this is just standard data structures stuff, but on disk.

- Contiguous allocation:
  * Allocate the file in a contiguous set of blocks or tracks.
  * Keep a free list of unused areas of the disk.  When creating a
    file, make the user specify its length, and allocate all the
    space at once.  The descriptor contains the location and size.
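A toy sketch of this free-list bookkeeping (all names hypothetical,
not any real file system; runs of free disk blocks are modeled as
Python (start, length) pairs, and a hypothetical first-fit
`create_file` returns the (location, size) pair the descriptor would
store):

```python
# Toy first-fit contiguous allocator (a sketch, not a real FS).
# The free list holds (start, length) runs of unused disk blocks; a
# new file takes the first run big enough, and its descriptor records
# the resulting (location, size).

def create_file(free_list, size):
    """Allocate `size` contiguous blocks; return (location, size) or None."""
    for i, (start, length) in enumerate(free_list):
        if length == size:
            free_list.pop(i)                  # run consumed exactly
            return (start, size)
        if length > size:
            free_list[i] = (start + size, length - size)
            return (start, size)
    return None                               # no run large enough

free = [(0, 4), (10, 20)]
print(create_file(free, 8))    # (10, 8); free list now [(0, 4), (18, 12)]
print(create_file(free, 5))    # (18, 5)
print(create_file(free, 13))   # None: 11 blocks free, but no run of 13
```

The last call fails even though 11 blocks remain free: external
fragmentation, the main drawback of the scheme.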
  * Advantages:
     * Easy access, both sequential and random.
     * Low overhead.
     * Simple.
     * Few seeks.
     * Very good performance for sequential access.
  * Drawbacks:
     * Horrible fragmentation will make large files impossible.
     * Hard to predict needs at file creation time.
     * May over-allocate.
     * Hard to enlarge files.

- Can improve this scheme by permitting files to be allocated in
  extents.  I.e. ask for a contiguous chunk; if it isn't enough,
  get another contiguous chunk.
  * Example: IBM OS/360 permits up to 16 extents.  Extra space in
    the last extent can be released after the file is written.

- Linked files: link the blocks of the file together as a linked
  list.  In the file descriptor, just keep a pointer to the first
  block.  In each block of the file, keep a pointer to the next
  block.

      Block -> Block -> Block

  * Advantages?  Files can be extended; no external fragmentation
    problems.  Sequential access is easy: just chase links.
  * Drawbacks?  Random access requires sequential access through
    the list.  Lots of seeking, even in sequential access.  Some
    overhead in each block for the link.  Examples: TOPS-10, sort
    of.  Alto, sort of.

- (Simple) indexed files: the simplest approach is to just keep an
  array of block pointers for each file.  The maximum file length
  must be declared when the file is created.  Allocate an array to
  hold pointers to all the blocks, but don't allocate the blocks.
  Then fill in the pointers dynamically using a free list.

      index
       ---       ----------------------
       | |------>|Block |Block |Block |
       ---       ----------------------
       | |------------------^      ^
       ---                         |
       | |--------------------------
       ---

  * Advantages?
     * Not as much space wasted by over-predicting; both sequential
       and random access are easy.  Only waste space in the index.
  * Drawbacks?
     * May still have to set a maximum file size.  (Can have an
       overflow scheme if the file is larger than the predicted
       maximum.)
     * Blocks are probably allocated randomly over the disk
       surface, so there will be lots of seeks.
     * The index array may be large, and may require a large file
       descriptor.
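The random-access difference between linked and indexed layouts can
be sketched with in-memory stand-ins for disk blocks (all names here
are hypothetical):

```python
# Looking up logical block n of a file under the two layouts.
# A dict / a list stand in for on-disk pointers; in a real system
# each pointer chased would cost a disk read (and likely a seek).

def linked_lookup(first_block, next_ptr, n):
    """Linked file: chase n links from the first block -- O(n) reads."""
    block = first_block
    for _ in range(n):
        block = next_ptr[block]       # one "disk read" per link
    return block

def indexed_lookup(index, n):
    """Indexed file: one lookup in the per-file pointer array -- O(1)."""
    return index[n]

next_ptr = {7: 12, 12: 3, 3: 9}       # file occupies blocks 7 -> 12 -> 3 -> 9
print(linked_lookup(7, next_ptr, 2))  # 3, after reading 2 intermediate blocks
print(indexed_lookup([7, 12, 3, 9], 2))  # 3, straight from the index
```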
- Multi-level indexed files: the VAX Unix solution (version 4.4).
  * In general, any sort of multi-level tree structure.  More
    specifically, we describe what Berkeley 4.3BSD Unix does:
  * File descriptors: 15 block pointers.  The first 12 point to
    data blocks; the next three point to indirect, doubly-indirect,
    and triply-indirect blocks (256 pointers in each indirect
    block).  Maximum file length is fixed, but large.  Descriptor
    space isn't allocated until needed.

    Diagram:

       file descriptor (15 pointers)
       ---------------------------------
       | 12 direct pointers   |S |D |T |
       ---------------------------------
         |    ...    |         |  |  |
         v           v         |  |  +-> index -> index -> index
        Data  ...   Data       |  |                        -> Data
                               |  +----> index -> index -> Data
                               +-------> index -> Data

       (S = singly-, D = doubly-, T = triply-indirect pointer; each
       "index" is an indirect block of 256 pointers.)

  * Advantages: simple, easy to implement, incremental expansion,
    easy access to small files.  Good random access to blocks.
    Easy to insert a block in the middle of a file.  Easy to append
    to a file.  Small file map.
  * Drawbacks:
     * The indirect mechanism doesn't provide very efficient access
       to large files: 3 descriptor operations for each real
       operation.  (When we "open" the file, we can keep the first
       level or two of the file descriptor around, so we don't have
       to read it each time.)
     * The file isn't generally allocated contiguously, so we have
       to seek between blocks.

- Block allocation:
  * If all blocks are the same size, can use a bit map solution.
     * One bit per disk block.
     * Cache parts of the bit map in memory.  Select a block at
       random (or not randomly) from the bitmap.
  * If blocks are variable size, can use a free list.
     * This requires free storage area management: fragmentation
       and compaction.
  * In Unix, free blocks are grouped for efficiency: each block on
    the free list contains pointers to many free blocks, plus a
    pointer to the next list block.  Thus there aren't many disk
    references involved in allocation or deallocation.
  * The block-by-block organization of the free list means that
    file data gets spread around the disk.
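The indirect-block arithmetic above can be sketched as follows (12
direct pointers and 256 pointers per indirect block, as in the
notes; the function name and return format are made up for
illustration):

```python
# Map a logical block number to its path through a 4.3BSD-style
# descriptor: which kind of pointer it sits behind, and which index
# to follow at each level of indirection.

NDIR = 12     # direct pointers in the descriptor
NPTR = 256    # pointers per indirect block

def classify(n):
    """Return (level, [index to follow at each level]) for block n."""
    if n < NDIR:
        return ("direct", [n])
    n -= NDIR
    if n < NPTR:
        return ("single", [n])
    n -= NPTR
    if n < NPTR ** 2:
        return ("double", [n // NPTR, n % NPTR])
    n -= NPTR ** 2
    return ("triple", [n // NPTR ** 2, n // NPTR % NPTR, n % NPTR])

print(classify(5))     # ('direct', [5])    -- no indirect reads needed
print(classify(12))    # ('single', [0])    -- first block behind an indirect
print(classify(300))   # ('double', [0, 32]) -- two indirect reads first
```

A "triple" result is what the notes mean by 3 descriptor operations
per real operation: three indirect blocks must be read before the
data block itself.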
- A more efficient solution (used in the DEMOS system built at Los
  Alamos):
  * Allocate groups of sequential blocks.  Use the multi-level
    index scheme described above, but each pointer isn't to one
    block - it is to a sequence of blocks.
  * When we need another block for a file, we attempt to allocate
    the next physical block on the track (or cylinder).
     * If we can't do it sequentially, we try to do it nearby.
     * If we have detected a pattern of sequential writing, then we
       grab a bunch of blocks at a time (and release them if
       unused).  (The size of the bunch will depend on how many
       sequential writes have occurred so far.)
  * Always keep part of the disk unallocated (as Unix does now) -
    then the probability that we can find a sequential block to
    allocate is high.
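A minimal sketch of the "next block, else nearby" policy above,
assuming a simple in-memory free-block bitmap (names are
hypothetical; a real system would also account for tracks and
cylinders):

```python
# When extending a file, prefer the next physical block; if it is
# taken, fall back to the nearest free block.  True = free.

def alloc_next(bitmap, last_block):
    """Allocate the block after last_block if free, else the nearest
    free block; return its number, or None if the disk is full."""
    want = last_block + 1
    if want < len(bitmap) and bitmap[want]:
        bitmap[want] = False
        return want
    free = [b for b, is_free in enumerate(bitmap) if is_free]
    if not free:
        return None
    best = min(free, key=lambda b: abs(b - want))  # closest to ideal spot
    bitmap[best] = False
    return best

disk = [False] * 5 + [True] * 3   # blocks 0-3: this file; block 4: another
print(alloc_next(disk, 3))        # 4 is taken, so the nearest free block: 5
print(alloc_next(disk, 5))        # 6 is free: sequential allocation succeeds
```

Keeping part of the disk unallocated makes the first branch (the
truly sequential case) succeed most of the time.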