CS162 3/12/2007 Lecture Notes

Administrivia
1. Exams are back and will be brought to class for the next few weeks.

Last Week: Finished discussion of 9 track tapes.

How much should we remember from the data sheets in the reader?
Remember a general idea of the parameters.
--For example, suppose a room is filled with DAT tapes; how many bits can one store? To determine this, you should know the number of bits that can be stored on a DAT tape and the physical size of a DAT tape. The answer should be accurate to within one significant digit.

Tape Drive Data:

9 Track Tape Drive:
  Found in 1993 rack mount drives.
  Total Capacity: 140 MB; 6250 bits/inch linear density
  Capacity is less than the calculated capacity of 180-200 MB due to large interrecord gaps
  Speed: 125 inches/second
  MTBF (Mean Time Between Failures)

IBM Cartridges:
  Capacity: 220 MB, 18 tracks (same as a 2400' tape)
  Tape: 540' length, 1/2" width
  MTBF: 1500 hours
  Speed: 3 MB/sec
  Cartridges are sealed, which facilitates auto loading by robots in large tape libraries.
  They are also expensive.

DEC Tape:
  Fairly small and wide.
  Allows for random reads and writes.
  Slow.

DAT Tape:
  Credit card sized.
  Tape: 3/8 inch wide
  Capacity: 2 GB
  Allows random reading and writing
  Speed: (1993) 183 KB/sec

DAT originally included an audio format; however, the recording industry killed the format because they did not want their audio stored in a digital format that would facilitate digital copying of music.

DAT tapes are more sophisticated than reel-to-reel tapes. Reel-to-reel tapes originally had only 7 tracks, which stored only data bits; later they evolved to 9 tracks holding both data and ECC bits. Modern DAT tapes have 3 levels of ECC: 1 correctable error in 1e15 bits.

Tape speed is comparable to reel-to-reel speeds; it is limited by stretching and tearing of the tape. However, DAT tapes are read with a diagonal (helical) scan method: the head rotates as it reads, allowing for higher bit rates.

Tapes offer cheap, dense storage. However, they are not reliable as permanent storage.
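Going back to the 9 track figures in the tape drive data above, the gap between calculated and actual capacity can be checked with a quick estimate. This is only a sketch under stated assumptions: it takes a 2400' reel (the length the notes quote for comparison), reads 6250 bits/inch as 6250 bytes per inch of tape (each of the 9 tracks carries one bit of every byte, plus parity), and uses a hypothetical 8 KB block size and 0.3 inch interrecord gap that do not come from the notes. The function names are made up for illustration.

```python
INCHES_PER_FOOT = 12


def raw_capacity_bytes(length_feet, bytes_per_inch):
    """Capacity if every inch of tape held data (no interrecord gaps)."""
    return length_feet * INCHES_PER_FOOT * bytes_per_inch


def usable_capacity_bytes(length_feet, bytes_per_inch, block_bytes, gap_inches):
    """Capacity when every block is followed by one interrecord gap."""
    block_inches = block_bytes / bytes_per_inch  # tape length used by one block
    tape_inches = length_feet * INCHES_PER_FOOT
    blocks = int(tape_inches / (block_inches + gap_inches))  # whole blocks only
    return blocks * block_bytes


if __name__ == "__main__":
    # Assumed values: 8192-byte blocks and 0.3 inch gaps are hypothetical.
    raw = raw_capacity_bytes(2400, 6250)
    usable = usable_capacity_bytes(2400, 6250, block_bytes=8192, gap_inches=0.3)
    print(f"raw:    {raw / 1e6:.0f} MB")
    print(f"usable: {usable / 1e6:.0f} MB")
```

With these assumed values the raw figure is 180 MB, the low end of the calculated range in the notes, and the usable figure falls noticeably below it. Smaller blocks lose even more of the tape to gaps.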
Tapes will eventually lose their magnetic information. ECC can correct only so many errors. Tapes are also vulnerable to physical damage.

DLT (Digital Linear Tape):
  Linear tape system.
  Tracks run in both directions along the tape: the drive writes the length of the tape, reverses, writes another track, reverses, writes...
  Type: Cartridge
  Tape: 2000' long, 1/2" wide
  MTBF: 250k hours
  Life Expectancy: 30 years, <10% loss due to demagnetization, 1 million passes over data
  Errors: 1 bit in 1e17 correctable errors, 1 bit in 1e27 uncorrectable errors
  Speed: 36 MB/sec (burst 200 MB/sec)
  Capacity: <300 GB (with built-in compression)

Tapes are often used for automated backups in case data is lost. However, when the data is needed it is often not there because the backups failed: either the backup program never worked or the backup operator didn't show up that night.

Conclusion: Tapes are a cheap way to store data, but they are slow and deteriorate over time. Tapes also come in incompatible formats: if you need to recover data from a tape after the original manufacturer is gone, you may never be able to recover it. In the past, tapes were the cheapest way to store data: LBL uses tapes to transport terabytes of data to their supercomputers. Today, cheaper disks and high bandwidth fiber optics may succeed the tape.

Hard drives:
Data is stored on hard, flat aluminum platters coated with a thin magnetic coating. This coating used to be iron oxide, but is now aluminum nickel cobalt. Heads are located on either side of each platter (3 platters -> 6 heads). All of the platters are attached to the same spindle and all heads are attached to the same yoke, so they all move at once. A drive with 3 platters can reach 6 tracks without moving the heads.

Definitions:
  Track: circular path traced by a stationary head as the platter spins.
  Cylinder: the set of vertically aligned tracks.
  Record: logical block of data plus extra bits containing ECC and location data such as track, cylinder, and block #.
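The relationship between platters, heads, tracks, and cylinders in the definitions above can be sketched as a small data structure. The class and field names here are hypothetical, and the 10,000 cylinder count is only an illustrative value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveGeometry:
    """Toy model of a drive: platters on one spindle, heads on one yoke."""
    platters: int
    cylinders: int

    @property
    def heads(self) -> int:
        # One head on either side of each platter.
        return 2 * self.platters

    @property
    def tracks_per_cylinder(self) -> int:
        # A cylinder is the set of vertically aligned tracks, one per head.
        return self.heads

    @property
    def total_tracks(self) -> int:
        return self.cylinders * self.tracks_per_cylinder


if __name__ == "__main__":
    g = DriveGeometry(platters=3, cylinders=10_000)  # cylinder count is illustrative
    print(g.heads)         # 6
    print(g.total_tracks)  # 60000
```

Moving the yoke changes the cylinder; choosing among the tracks within a cylinder only requires selecting a different head.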
Old drives used absolute positioning to find a track. Today, with drives containing over 10k cylinders, heads use feedback to find tracks: as a head moves toward the desired track, it reads the track info along the way to determine how much further it needs to move until it reaches the track.

Unlike floppy heads, hard drive heads float over the platter like a glider. The spinning platter generates a vortex and the head rides on this vortex of air. The gap is generally on the order of micrometers. If the drive is jolted, the head may crash into the platter, which may result in loss of data.

Even though drives have N heads, typically only 1 track is read at a time:
1) Due to the density of tracks on a platter, it is very difficult to align all N heads with their tracks at once to allow for simultaneous read/write. The head must realign itself to read a different track within the same cylinder.
2) Drives often share electronics among multiple heads, so only one track can be processed at a time.

Drives are sealed to prevent contamination such as dust, which would cause the head to crash.

Cylinders are divided into sectors...
Sectors are physical data blocks of fixed size separated by interrecord gaps. Size is traditionally set to 512 bytes.

Note:
  A Record is a logical piece of information.
  A Sector is a physical unit of storage on a disk.

Software must take variable length records and pack them into sectors of fixed length. Programmers are abstracted out of this process because...
1) They do not want to deal with these details.
2) They may not be competent enough to do it well.

IBM CKD Drives
In the past, IBM made high end CKD formatted drives. Data is stored in blocks of variable size, each containing CKD data:
  C-count: block id#, track, cylinder, ECC
  K-key: used by IBM channel searches
  D-data
Between every two blocks there is an interrecord gap. Max block size is ~32KB; the bigger the block used, the more efficiently the disk is used.

Today, mainframes do not actually use CKD drives, even though programs still operate on the CKD architecture.
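The claim that bigger CKD blocks use the disk more efficiently follows from the fixed per-block overhead (count field, key field, and gaps). A minimal sketch, assuming a hypothetical overhead of 600 bytes per block; that number is made up for illustration and is not from the notes or from any IBM specification.

```python
ASSUMED_OVERHEAD_BYTES = 600  # hypothetical: count + key + gaps for one block


def ckd_efficiency(data_bytes, overhead_bytes=ASSUMED_OVERHEAD_BYTES):
    """Fraction of the track space used by one block that holds user data."""
    return data_bytes / (data_bytes + overhead_bytes)


if __name__ == "__main__":
    for size in (512, 4096, 32 * 1024):
        print(f"{size:>6} byte blocks: {ckd_efficiency(size):.1%} data")
```

Whatever the real overhead is, the fraction rises toward 100% as the block size grows toward the ~32KB maximum, because the same overhead is spread over more data.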
Mainframes instead use software that takes regular drives and emulates a CKD drive. EMC developed the idea of building a mainframe disk from cheap small disks that emulate CKD.

35 years ago, drives were removable: dishwasher sized drives with removable disks.
Zip Drive (100MB), Jaz Drive (2GB): very expensive.
Removable disks are very hard to do because of the precision required: any disk inserted into the drive must line up its tracks precisely with the head.
Floppies have a capacity of 1.44MB. Not very useful anymore.

IO Sequence:
1) Seek: move the head into position over the desired track.
2) Wait: wait for the correct sector to travel under the head.
3) Read/Write: read or write data as the sector travels by the head.

In the old days, the electronics were divided into separate discrete components because electronics were expensive. Channels used to be the size of a large refrigerator; now a channel is one square mm.

Disk -> disk controller -> storage controller -> channel -> CPU

Note that when the drive is reading, all of the electronics along this path must be available for the read to occur.
RPS miss (rotational position sensing): missing a whole rotation because the electronics are not available.

In the past, electronics were not very smart. The CPU needed to give the full address: cylinder, head, and block. Note that this required all tracks to be of the same length; otherwise the CPU would need a complex equation to find the right location. Today, the electronics take care of everything: the CPU just gives a block address and the hard drive translates it into cylinder, head, block to find the correct location. Now the outside tracks can be longer than the inner ones, giving 30% more space.

Trends and Graphs
(see reader for illustration of plots)

Considerations:
  Seek time: improving by 5-10%/year; limited by mechanics (can only make the arm lighter and the motor more powerful). 35ms -> 7ms (5x improvement over 30 years)
  Speed: increasing 9%/year
  Throughput: improved by buffers; improving by 8%/year
  Noise, Power: noise and power increase quadratically with speed.
    Large server farms are built near power plants (e.g., large hydroelectric dams in Washington).
  Linear Density: increasing 21%/year
  Track Density: increasing 24%/year
  Areal Density: increasing 49%/year (capacity doubles per year) (30000x improvement over 30 years)
  Size: shrinking: 24" -> 8" -> 5.25" -> 3.5", 1.8", 1" platters
  Cost: cost/megabyte, cost/system
  Volumetric Density: floor space in machine room; yr 2000: 2 sq ft/terabyte; size of iPod/laptop
  Head to media spacing: 2002: 10 nanometers
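Several of the trend figures above are compound annual rates, and they can be cross-checked with a few lines of arithmetic. The helper names below are made up for this sketch; the inputs are the numbers quoted in the notes.

```python
def combined_rate(*rates):
    """Annual growth rate of a product whose factors each grow at the given rates."""
    factor = 1.0
    for r in rates:
        factor *= 1.0 + r
    return factor - 1.0


def annual_rate(total_factor, years):
    """Compound annual rate that yields total_factor after the given number of years."""
    return total_factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Areal density = linear density x track density.
    areal = combined_rate(0.21, 0.24)
    # Seek time went from 35 ms to 7 ms over 30 years.
    seek = annual_rate(35 / 7, 30)
    print(f"areal density growth: {areal:.1%} per year")
    print(f"seek time improvement: {seek:.1%} per year")
```

Compounding the linear and track density figures gives about 50%/year, close to the 49%/year quoted for areal density, and the 35 ms to 7 ms seek improvement works out to about 5.5%/year, at the low end of the 5-10% range.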