CS162 Lecture Notes, 4/1/2009, by Andy Lee

I/O Devices

DVD bits are packed closer together than on CDs, which lets a DVD hold more data. The laser used to read DVDs is also of a higher frequency (shorter wavelength).

Blu-ray: 25GB per layer, max transfer rate 40Mb/s.

HD DVD vs Blu-ray: HD DVD is more compatible, Blu-ray offers more space.

For a disk access:
- overhead: 3K-25K instructions per access
- seek time: avg 2-12ms, range 0-30ms
- rotational latency: 2-8.33ms (half a rotation on average)
- total time: about 5-25ms

Device Interconnection

The CPU can connect directly to the device controller, or through a hierarchy:
CPU <-> Channel <-> Storage Controller <-> String Controller <-> Device
A CPU would have many channels, each channel many controllers, and so on. This minimized the amount of logic you'd need, since logic was really expensive back then.

NAS and SAN

NAS - network attached storage. Devices on Ethernet, mounted on a machine so they look like a directory tree. Pretty easy to set up at home, not too expensive.

SAN - storage area network. A separate network containing storage, more like the room full of servers you see at big companies. Mid- to high-end prices.

SNIA - Storage Networking Industry Association. Standardizes NAS/SAN products. Of course, everyone wants a system that can just be plugged in and played.

Storage Service Providers

Storage provided by a third party, e.g. over the internet or over dedicated cables.

Data Processing vs Computing

Data processing is far more robust: tons of redundancy ensures that data is never lost. Used by big companies: credit cards, banks, airlines, etc. Computing, on the other hand, is more for users; if it crashes, you reboot and in general you're still fine.

File Systems

"A file is just a bunch of bits with a name attached to it." An OS sees a file as a bunch of blocks with a name. Files can have many properties/attributes, such as name, protection, time of creation, time of last use, etc.

Three typical ways to use a file:
- Sequential - process information in order, one piece after another
- Random access - typical of page tables, etc.
- Keyed - blocks all have key fields; search based on the key

Four groups of concerns:
1. Disk management - efficient use of disk space, fast access to files, a hardware-independent view of the disk for the user
2. Naming - names, links, directories, etc.
3. Protection - protect users from each other, allow files from various users on one disk, controlled sharing (sharing files with the users you want, and not others)
4. Reliability - information must last for long periods of time

Disk Management

Every file is associated with a file descriptor, a data structure that holds the file's attributes and maps out where all of its physical blocks are. Descriptors are stored on disk along with the files.

Typical workload characteristics: spatial locality; lots of small files and few big files, with much of the disk allocated to the large files; most I/Os are reads; many I/O operations go to large files; most I/Os are sequential. => We can't afford much overhead per file, but must still get good performance on large files.

File Block Layout and Access

Contiguous - for contiguous allocation we need to know the size of the file up front. Overhead is minimal, implementation is simple, and access is easy for both sequential and random patterns. However, there will be a lot of fragmentation, and large files become almost impossible to allocate. It's hard to predict needs at file creation time, so one tends to overallocate, leading to more fragmentation; it's also hard to enlarge files. One improvement is to allow a file to occupy N contiguous runs of blocks instead of one big run.
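To make the contiguous trade-off concrete, here is a minimal C sketch (not from the lecture; the struct and all names are invented for illustration) of why both sequential and random access are easy: lookup is a single addition.

```c
/* Sketch of block lookup under contiguous allocation; the struct and
 * names are invented for illustration. The whole file occupies one
 * run of physical blocks, so the mapping is a single addition. */
#include <assert.h>
#include <stdio.h>

struct contig_file {
    unsigned start;    /* first physical block of the run */
    unsigned nblocks;  /* length of the run, in blocks */
};

/* O(1) for any logical block: no chain to walk, no index to read. */
static unsigned contig_lookup(const struct contig_file *f, unsigned logical)
{
    assert(logical < f->nblocks);
    return f->start + logical;
}

int main(void)
{
    struct contig_file f = { .start = 1000, .nblocks = 50 };
    printf("logical 7 -> physical %u\n", contig_lookup(&f, 7)); /* 1007 */
    return 0;
}
```

The weakness shows up on growth: extending the file past the end of its run means finding a larger free run and copying, which is why enlarging contiguous files is hard.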
Linked files - link the blocks of a file together as in a linked list; the file descriptor just keeps a pointer to the first block. Files can be extended easily, external fragmentation is minimized, and sequential access is still easy. However, random access is now harder, because you need to walk the list sequentially (a toy traversal is sketched at the end of these notes). Lots of seek time, even for sequential access.

Indexed files - an index of pointers to each block, via an array of block pointers for each file. Not as much space is wasted by overpredicting, and both sequential and random access are easy. However, a maximum file size may still have to be set; there are lots of seeks, because blocks are probably allocated randomly over the disk surface; and the file descriptor may be large because of the big array of block pointers.

Multi-level indexed files - add some kind of tree structure to the index. For example, in Berkeley 4.3BSD Unix, each file has 15 block pointers: the first 12 point to data blocks, and the last three point to indirect blocks (single, double, and triple indirect), each indirect block holding 256 pointers (the lookup is sketched at the end of these notes). There is still a maximum file size, but now it is very large. This is a simple and easy-to-implement method with efficient random access and low overhead for access to small files; it's easy to append to files, and the file map is relatively small. However, large files can be inefficiently accessed, and files are generally not allocated contiguously, so there will be plenty of seek time.

Block Allocation

If blocks are all the same size, we can use a bitmap, with one bit per disk block. If blocks are of variable size, we can use a free list instead, but that requires free-storage-area management, which invites fragmentation and compaction. DEMOS is more efficient: instead of allocating one block at a time, it allocates a group of sequential blocks at a time. This lowers seek time, since there's a higher chance we're writing on sequential blocks. Always keep part of the disk unallocated, so the probability of finding sequential blocks to allocate stays high.
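Here is a minimal C sketch of the bitmap allocator just described (all names invented; the "prefer the next sequential block" scan is only a toy nod to the DEMOS idea of favoring contiguous runs):

```c
/* Toy bitmap allocator: one bit per disk block, a set bit meaning
 * "used". All names are invented. The scan starts just past `prev`
 * so a growing file tends to get sequential blocks. */
#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 64
static uint8_t bitmap[NBLOCKS / 8];            /* 1 bit per block */

static int  block_used(unsigned b) { return bitmap[b / 8] &  (1u << (b % 8)); }
static void mark_used(unsigned b)  {        bitmap[b / 8] |= (1u << (b % 8)); }

/* Returns a newly allocated block number, or -1 if the disk is full. */
static int alloc_block(unsigned prev)
{
    for (unsigned i = 0; i < NBLOCKS; i++) {
        unsigned b = (prev + 1 + i) % NBLOCKS;
        if (!block_used(b)) { mark_used(b); return (int)b; }
    }
    return -1;
}

int main(void)
{
    mark_used(10);                 /* pretend block 10 is already taken */
    int a = alloc_block(9);
    int b = alloc_block(9);
    printf("%d %d\n", a, b);       /* prints 11 12 */
    return 0;
}
```

A real allocator would do more (e.g. track runs, balance across the disk), but the one-bit-per-block bookkeeping is exactly this cheap.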
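Backing up to linked files, here is a toy traversal (all names invented; an in-memory array stands in for the on-disk next pointers so the sketch runs as-is) showing why random access is slow:

```c
/* Toy linked-file traversal; all names invented. next[] simulates the
 * "next block" pointer stored in each block's header, so each hop in
 * the loop stands for one disk read. */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_END UINT32_MAX                   /* sentinel: end of chain */

/* Chain for one file: 0 -> 4 -> 2 -> 6 -> 1 (then end). */
static uint32_t next[8] = { 4, BLOCK_END, 6, BLOCK_END,
                            2, BLOCK_END, 1, BLOCK_END };

/* Reaching logical block k costs k "disk reads": random access is O(k). */
static uint32_t linked_lookup(uint32_t first, uint32_t logical)
{
    uint32_t b = first;
    while (logical-- > 0 && b != BLOCK_END)
        b = next[b];
    return b;
}

int main(void)
{
    printf("logical 3 -> physical %u\n", (unsigned)linked_lookup(0, 3)); /* 6 */
    return 0;
}
```

Sequential access only ever needs the current block's next pointer, which is why it stays easy; it's the k hops for random access that hurt.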
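And for the multi-level index, a sketch of the logical-to-physical mapping implied by the lecture's 4.3BSD figures. This is not real BSD source: all names are invented, and a tiny in-memory "disk" stands in for the indirect blocks so the sketch compiles and runs.

```c
/* Sketch of a 4.3BSD-style block map: 15 pointers per file descriptor,
 * 12 direct, then single/double/triple indirect blocks holding 256
 * pointers each (the lecture's figures). Names invented. */
#include <stdint.h>
#include <stdio.h>

#define NDIRECT 12
#define NPTRS   256u                        /* pointers per indirect block */

static uint32_t fake_disk[4][NPTRS];        /* pretend indirect blocks */

/* Stands in for one real disk read of entry `idx` of block `blk`. */
static uint32_t indirect_entry(uint32_t blk, uint32_t idx)
{
    return fake_disk[blk][idx];
}

static uint32_t bmap(const uint32_t ptrs[15], uint64_t logical)
{
    if (logical < NDIRECT)                           /* 0 extra reads */
        return ptrs[logical];
    logical -= NDIRECT;

    if (logical < NPTRS)                             /* 1 extra read */
        return indirect_entry(ptrs[12], (uint32_t)logical);
    logical -= NPTRS;

    if (logical < (uint64_t)NPTRS * NPTRS) {         /* 2 extra reads */
        uint32_t l1 = indirect_entry(ptrs[13], (uint32_t)(logical / NPTRS));
        return indirect_entry(l1, (uint32_t)(logical % NPTRS));
    }
    logical -= (uint64_t)NPTRS * NPTRS;              /* 3 extra reads */
    uint32_t l1 = indirect_entry(ptrs[14],
                                 (uint32_t)(logical / ((uint64_t)NPTRS * NPTRS)));
    uint32_t l2 = indirect_entry(l1, (uint32_t)((logical / NPTRS) % NPTRS));
    return indirect_entry(l2, (uint32_t)(logical % NPTRS));
}

int main(void)
{
    /* Max file size in blocks with these figures: 12 + 256 + 256^2 + 256^3. */
    uint64_t max_blocks = NDIRECT + NPTRS + (uint64_t)NPTRS * NPTRS
                        + (uint64_t)NPTRS * NPTRS * NPTRS;
    printf("max file size: %llu blocks\n", (unsigned long long)max_blocks);

    uint32_t ptrs[15] = { 0 };
    ptrs[12] = 1;                  /* single indirect lives in fake block 1 */
    fake_disk[1][0] = 777;         /* so logical block 12 maps to 777 */
    printf("logical 12 -> physical %u\n", (unsigned)bmap(ptrs, 12));
    return 0;
}
```

A small file never leaves the direct pointers (zero extra reads per block), while the biggest files cost up to three extra reads per data block; this matches the point above that small files are cheap but large files can be accessed inefficiently.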