cs162 Lecture Notes for 3/28
by Mangesh Kulkarni and Laura Pelton

Sorry for the end of Spring break, but back to work.

TODAY:
* Finish I/O devices
* Start File system

**DEVICE INTERCONNECTIVITY**

Depends on System:

Small
- Connects to CPU
- Use SCSI bus (originally 8 devices) or PCI bus
- CPU has bus and all devices connected to bus
- Talk more about buses in ISI

Mainframes are more complicated
- Historically designed for lots of I/O, storage
- Used for data processing
- Lots of mainframes out there
- Need to keep things running
- Can even get 60 km long I/O cables
- Interconnection system designed with Channels
    * write channel program
    * start I/O
    * channel looks at program and executes commands
    * commands sent to storage control --> talks to device control --> talks to device
    * device can send data back through multiple paths
- Tree structure led to bottlenecks

Net Attached Storage (NAS)
- Attached to LAN providing file system
- Low to mid range product

Storage Area Network (SAN)
- Separate network
- Broken into blocks
- High end product (4-6 figure price range)

SNIA - Storage Networking Industry Association
- sets standards
    * (aside?) Linux is open source so people don't want to use it.

Storage Service Providers
- Storage from 3rd party
- Sit on net and rent gigs of storage
- Connect via internet or dedicated cables to provider
- Used to be very profitable since disks are cheap:
    * price inflated 5x for file system
    * inflated 5x more for providers
- Now companies like Google give 1 Gig of storage for free
- Better for them to have file system used.

Size perspective: Channels used to be the size of 2 refrigerators, now they are on a corner of a chip.
Mainframes still exist for data processing b/c PCs are not as reliable.

** Figure of mainframe file system: (should be in reader)
   (sorry it was really complicated and wasn't left up long)

Q: Why do mainframes have to be different?
A: Like evolution. Don't have to be different, but came from different places.
   Mainframes came out in the 50s. Needed to be reliable. PC was built quickly, all devices connected to CPU. Hierarchy not needed. Mainframes wanted electronics in center because of cost. Only had a few channels.

Direct Attached Storage
- server is directly in contact with disk
- Limitations: create "islands of information"

** Figure of Direct Attached storage

    ___________
    | Server  |
    -----------
        ||
        \/
    ___________
    | Storage |
    -----------

SANs are clouds. Clouds are hubs, switches, routers.

** Figure of SAN

    ___________
    | Server  |
    -----------
        ||
        \/
    -----------
    <  Cloud  >
    -----------
        ||
        \/
    ___________
    | Storage |
    -----------

SCSI over IP
- CPU sees SCSI but uses IP to get to it

** Figure of SCSI over IP

                 --------    --------    --------
                 |Server|    |Server|    |Server|
                 --------    --------    --------
                     ^           ^           ^
                      \          |          /
                       \         |         /
                        V        V        V
    -----      -------------------------------------
    |CPU|  ->  |           SANS Network            |
    -----      -------------------------------------
                                 |
                                 V
                        -------------------
                        |      Disk       |
                        -------------------

Shows Picture Slides (from Reader):

* Disk Storage Device (1960s)
    - Because of arm motion and size, could write disk access code to make them rock back and forth and even walk.
* Disk Pack
    - 14 inches diameter. Held 28 MB
* Five disk storage modules
* Next generation 3300. (Late 1970s)
    - Image of 4 units + control unit
    - around 1970
    - 100 MB/spindle

Q: What was the cost of a spindle?
A: $25-50k / spindle. Not many people had them.

Reminisces about machine room size.

* Drum Storage - 2303 (Late 1960s)
    - Size of a fridge
    - Drum spins on vertical axis. Each is a row of heads.
    - 2305 next gen, also a fixed head storage device
* Fixed head storage and control. (2305)

Q: Are Drum and Fixed Head storage the same?
A: Functionally the same.

* Data cell drive - 2321
    - had strips of tape inside
    - pulled out the tape, read it, and put it back
    - 1 generation product
* Channel Box - IBM 2870 Multiplexor
    - size of 2 refrigerators

Q: Would this be sold at Ikea?
A: Could maybe find at Computer Museum. PC is better and cheaper. Probably not used anymore.

* Old Printers
    - 1403 - prints onto the stack of paper being fed in and stacks it in the front
    - 1403 N1 - enclosed version of the 1403
* Tape Drive - IBM 2401
    - Was about 6 ft high
    - long loop of tape moved up and down from reel to reel

Q: (from Adrian) Why do they loop down?
A: You weren't listening. Need the slack because of momentum of tape wheel. Slack allows tape to start moving before the wheels start turning, since it is much harder to spin a wheel because of angular momentum.

* Next generation tape drive - 3420
    - next gen after 2401
    - difference was that it had 2 loops of tape

Q: Could we tell the difference between that and a fridge?
A: We'll see (maybe on midterm).

* Slide listing various memory types:
    - Compact Flash (CF)
    - Memory Stick
    - Secure Digital (SD)
    - Smart Media
    - xD

Flash Memory
- slower than DRAM or SRAM, but maintains data after losing power.
- Works like disk. Can't rewrite repeatedly (has very long time to failure) so can't use as SDRAM
- Access time similar to HD
- Very long time before failure. Usually replace with cheaper, bigger memory by then.

Q: Why so many different technologies?
A: Different form factors. CF is 5x size of SD. Developed at different times, have different costs. Why Memory Stick? Sony has own technology, can set its own prices. Economics: they have a unique product.

Q: Why does flash die?
A: Not sure. Has limited number of rewrite cycles.

Comment: Also speed difference in types (SD is faster than CF) - could be inherent or designed.
- HD CF coming out soon
- CF was the original, now technology smaller

Digression: HP doesn't sell printers, sells cartridges. Almost cheaper to buy new printer as opposed to cartridge.
Adrian: New printers only have half full cartridges. Similar to camera batteries.
Back to ways of replacing cartridges.
Projected Sales in 2007 (from a few years ago): 315 million cards

    others               ~1%
    SmartMedia           .01%
    xD Picture Card      2%
    MultiMedia Card      2%
    Compact Flash        4%
    Memory Stick Pro     6%
    Secure Digital       8%
    Memory Stick Duo     29%
    mini SD              48%

Antique Problem:
- Mainframes: Originally no intelligence in disk drive (no buffer)
- later developed Rotational Position Sensing

next topic:

**FILE STRUCTURE, I/O OPTIMIZATION**

File: a named collection of bits (usually stored on disk)
- how the OS sees it: bunch of blocks stored on a device
- how the Programmer/User sees it: formatting and a different interface (does not matter to the file system)
- Properties: name, protection, type (how to interpret it), creation time, last use, last modification, owner, length, link count, layout, etc.

How do we use a file?

Sequential: Information processed in order, one piece after another
- most common (editor writes new file, compiler, etc.)

Random Access: can address any block in the file directly
- user must know which block to address
- e.g. databases, libraries, paging

Keyed: search for block with particular parameters
- parameters usually not given by OS (but is provided in some IBM …)
- e.g. hashtable, dictionary

Modern file & I/O Systems have 4 main issues:

Overview:
1. Disk Management
    - Efficient use of disk space (layout)
    - fast access to files
    - file structure
    - device use optimization
    - user has hardware independent view of disk
2. Naming
    - Links, Dirs, etc.
3. Protection
    - Protect users from each other
    - users have files on same disk
    - controlled sharing
4. Reliability
    - information must last safely for long periods of time

Full Descriptions:

*** Disk Management ***

Personal experience doesn't tell you very much about actual disk management.
How should the blocks of the file be placed on the disk? How should you find them once there?

- File Descriptor
    - data structure gives file attributes and map to where the file is on disk
    - and lives on disk when the file is not open.
    - Cached in main memory

- System, User and File Characteristics:
    - per-file cost must be low, large files must also have good performance
    - 10%-20% of I/O is user programs accessing user files, rest is OS
    - most files on disk are small so the OS must optimize for that
    - Most of disk is allocated to the large files
    - Most I/Os are reads (60-85%)
    - Most I/Os are sequential

Q: Is most overhead disk space or time?
A: Both

File Block Layout and Access (uses standard data structures, just on disk instead of in memory)
- contiguous
- linked
- indexed or tree-structured

Contiguous
- Allocate file in contiguous blocks on disk
- keep a free list of unused areas of disk
- user must specify the length of the file
- descriptor contains location and size
- Advantages:
    - easy access
    - low overhead
    - very simple to implement
    - few seeks
    - very good performance for sequential access
- Disadvantages:
    - bad fragmentation (dynamically created files very hard)
    - hard to predict needs of file at creation
    - may over allocate (internal fragmentation)
    - hard to enlarge the files
- How to Improve: permit files to be allocated in extents
    - ask for a block, if not big enough, ask for another one
    - E.g. IBM OS/360 permits up to 16 extents. Extra space in last extent can be released after file is written

Linked
- Link blocks of file together as a linked list. In File Descriptor have pointer to 1st block. Each block points to next.
- Advantages:
    - Files can be extended
    - no external fragmentation
    - sequential access easy, just have to follow link
- Disadvantages:
    - random access requires sequential access through list
    - lots of seeking
    - even in sequential access, some overhead in block for link
- Examples: Tops-10 (sort of), Alto (sort of).
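
Scribe sketch (not from lecture): a toy comparison of how the two layouts above find logical block i of a file. The "disk" here is just a Python dict, and the names contiguous_block, linked_block and disk are made up for illustration. It only aims to show why random access is cheap for contiguous files and costs one block read per link for linked files.

```python
# Toy model (an assumption for illustration): the disk is a dict mapping
# a block number to (data, next_block); next_block is None at end of file.

def contiguous_block(start, length, i):
    """Disk block holding logical block i of a contiguous file."""
    if not 0 <= i < length:
        raise IndexError("block index outside file")
    return start + i  # one addition, no disk reads needed


def linked_block(disk, first, i):
    """Follow i links from the first block; returns (block, reads)."""
    block, reads = first, 0
    for _ in range(i):
        _, block = disk[block]  # must read a block to learn its successor
        reads += 1
        if block is None:
            raise IndexError("block index outside file")
    return block, reads


# A 4-block linked file scattered over the disk: 7 -> 2 -> 9 -> 4
disk = {7: ("a", 2), 2: ("b", 9), 9: ("c", 4), 4: ("d", None)}

print(contiguous_block(100, 10, 3))   # descriptor holds start=100, size=10
print(linked_block(disk, 7, 3))       # descriptor holds only first block 7
```

In the linked case the number of reads grows with i, which is the "random access requires sequential access through list" disadvantage noted above.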

(Simple) Indexed Files
How: keep an array of block pointers for each file
- max length declared when created
- allocate array to hold pointers to all blocks, but don't allocate the blocks yet
- fill in the pointers dynamically, taking blocks from a free list
- Advantages:
    - not as much space wasted by over allocating
    - sequential & random access easy
    - only waste space in the index
- Drawbacks:
    - May still have to set maximum file size or have some overflow scheme
    - blocks have to be allocated carefully or else lots of calls
    - Index array may be large so file descriptor is large

*Note, the blackboard is being held for ransom by the philosophy department*

** Figure of multilevel indexed files
   [Figure not reproduced: a row of block pointers in the file descriptor. Most pointers lead directly to DATA PAGES; the last ones lead to index tables whose entries lead to DATA PAGES or to further index tables.]

Multi-level indexed files (The Unix Solution)
- In general, any sort of multilevel tree structure. Let's look at 4.3 BSD:
    - file descriptors contain 15 block pointers.
      first 12 point to data, next 3 to indirect, doubly-indirect and triply-indirect blocks
    - 256 pointers in each indirect table
    - maximum file length is fixed, but large
    - descriptor space isn't allocated until needed
- Advantages:
    - simple, easy to implement
    - incremental expansion
    - easy access to small files since never have to go to an indirect block
    - good random access time of blocks
    - easy to read/write block in the middle of file
    - easy to append
    - small file map

Q: Why can't they all be multilevel?
A: Let's go back to the slide:
- Disadvantages:
    - may have to follow 3 indirect pointers for each operation
    - really slow if done on a regular basis (normally reading sequentially is not an issue)
    - file not allocated contiguously so have to seek between blocks

Block allocation
- if all blocks are the same size, can use a bit map
    - one bit per disk block
    - cache parts of bit map in memory, select block at random (or not)
- Variable-size blocks - use free list
    - requires free storage management (fragmentation & compaction)
- in Unix blocks are joined in groups for efficiency
    - block by block organization of the free list means that file data gets spread around
- More efficient solution (DEMOS)
    - Allocate groups of sequential blocks.
    - Use multilevel indexing scheme, but each pointer points to sequence of blocks
    - when we need another block, alloc next physical block
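
Scribe sketch (not from lecture): a rough illustration of the 4.3 BSD multi-level index described earlier in these notes, using the numbers given there (12 direct pointers, then one indirect, one doubly-indirect and one triply-indirect block, 256 pointers per table). The function names locate and max_blocks are made up here, and the block size is not given in the notes, so the limit is expressed in blocks, not bytes.

```python
NDIRECT = 12      # direct pointers in the descriptor (from the notes)
PER_TABLE = 256   # pointers in each indirect table (from the notes)


def max_blocks():
    """Largest file, in blocks, that the descriptor can map."""
    return NDIRECT + PER_TABLE + PER_TABLE**2 + PER_TABLE**3


def locate(n):
    """Map logical block n to (levels of indirection, index path).

    The path lists the slot used in each table on the way to the data
    block; for a direct block it is just the slot in the descriptor.
    """
    if n < 0:
        raise ValueError("negative block number")
    if n < NDIRECT:
        return 0, [n]
    n -= NDIRECT
    span = PER_TABLE  # blocks reachable through the current level
    for level in (1, 2, 3):
        if n < span:
            path = []
            for _ in range(level):
                path.append(n % PER_TABLE)
                n //= PER_TABLE
            return level, path[::-1]
        n -= span
        span *= PER_TABLE
    raise ValueError("beyond maximum file length")


print(max_blocks())   # fixed, but large
print(locate(5))      # small file: direct pointer, no indirect block
print(locate(268))    # first block reached through the doubly-indirect block
```

The level returned is the number of extra table reads needed before the data block itself, which is why small files are cheap and why the worst case follows 3 indirect pointers.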