Sharing Main Memory -- Segmentation and Paging

How do we allocate memory to processes?

1. Simple uniprogramming with a single segment per process:
   - One program in memory at a time. (Can actually multiprogram by swapping programs.)
   - Highest memory holds the OS.
   - Process is allocated memory starting at 0 (or J), up to (or from) the OS area.
   - Process always loaded at 0.
   - Examples: early batch monitors, where only one job ran at a time and all it could do was wreck the OS, which would be rebooted by an operator. Many of today's personal computers also operate in a similar fashion.
   Advantages:
   - Low overhead.
   - Simple.
   - No need to do relocation; always loaded at zero.
   Disadvantages:
   - No protection - the process can overwrite the OS, which means it can get complete control of the system.
   - Multiprogramming requires swapping entire processes in and out:
     - Overhead for swapping.
     - Idle time while swapping.
   - Process limited to size of memory.
   - No good way to share - only one process at a time. (Can't even overlap CPU and I/O, since only one process is in memory.)
   Example: CTSS ("Compatible Time-Sharing System"), which swapped users completely.

2. Relocation - load program anywhere in memory:
   - Idea is to use the loader or linker to load the program at an arbitrary memory address.
   - Note that the program can't be moved (relocated) once loaded. (WHY??)
   - This scheme (#2) is essentially the same as #1, but the ability to load at any address will be used in #3 below.

3. Simple multiprogramming with static software relocation, no protection, one segment per process:
   - Highest or lowest memory holds the OS.
   - Processes are allocated memory starting at 0 (or N), up to the OS area.
   - When a process is initially loaded, link it so that it can run in its allocated memory area.
   - Can have several programs in memory at once, each loaded at a different (non-overlapping) address.
   Advantages:
   - Allows multiprogramming without swapping processes in and out.
   - Makes better use of memory.
   - Higher CPU utilization due to more efficient multiprogramming.
   Disadvantages:
   - No protection - jobs can read or write each other's memory.
   - External fragmentation.
   - Overhead for variable-size memory allocation.
   - Still limited to size of physical memory.
   - Hard to increase the amount of memory allocated.
   - Programs are statically loaded - they are tied to fixed locations in memory. They can't be moved or expanded. If swapped out, they must be swapped back to the same location.

4. Dynamic memory relocation: instead of changing the addresses of a program before it's loaded, change the address dynamically during every reference. There are many types of relocation - to be discussed.
   - Under dynamic relocation, each program-generated address (called a logical or virtual address) is translated in hardware to a physical, or real, address. This happens as part of each memory reference.
   - Virtual (logical) address: what the program generates.
   - Virtual address space: the set of (legal) virtual addresses the program can generate.
   - Physical (real) addresses: the set of addresses in physical memory.
   - Physical address space of a program: the set of physical addresses it can get to.
   - Physical address space of the machine: the set of addresses in physical memory.
   - Dynamic relocation leads to two views of memory, called address spaces: the virtual address space and the real address space. Each process has its own virtual address space. With static relocation we force the views to coincide. In some systems, there are several levels of mapping.

   Several types of dynamic relocation.

   Base & bounds relocation:
   - Two hardware registers: a base register for the process, and a bounds register that indicates the last valid address the process may generate.
   - On each memory reference, the virtual address is compared against the bounds register; in parallel, the real address is generated by adding the virtual address to the base register. This is a form of translation.
   - Real address = base + virtual address, IF 0 <= virtual address <= bounds.
   Advantages:
   - Each process appears to have a completely private memory of size equal to the bounds register plus 1.
   - Processes are protected from each other.
   - No address relocation is necessary when a process is loaded.
   - Task switching is very cheap when done between processes in memory - just reload processor registers. (Higher overhead to load a process from disk.)
   - Compaction is possible.
   Disadvantages:
   - Still limited to size of main memory.
   - External fragmentation (between processes).
   - Overhead for allocating variable-size spaces in memory.
   - Sharing is difficult - only possible if bases & bounds overlap.
   - Only one "segment" - i.e. one region of memory.
   - New, special hardware needed for relocation, and time to do the relocation (it isn't free).

   The OS must be able to change the values of the relocation registers (why?):
   - OS loads a new process and sets its base and bounds registers.
   - OS schedules a process, and sets the base and bounds registers.
   - When tasks are switched, must be able to swap base, bounds, and PC registers simultaneously.
   These imply that the OS must run with base and bounds relocation turned off - otherwise, it would affect itself when running. (Or it would need its own set of base and bounds registers.) Use of base and bounds is controlled by a status bit, usually in the PSW or SSW, or a similar control register.

   Users must not be able to change the values of the base and bounds registers; otherwise, there is no protection between users - one can trash others or the OS.

   Problem: how does the OS regain control once it has given it up? The OS is entered on a trap (including SVC) or interrupt. When the OS is entered, use of base and bounds must be disabled (i.e. the bit in the PSW is reset). Typically, the trap handler loads new control register values.

   Base & bounds is cheap - only 2 registers - and fast - the add and compare can be done in parallel. Examples: CRAY-1, IBM 7040/7090.

   Can consider three types of systems using base and bounds registers:
   - Uniprogramming - single user region. Bring a user in, and run him.
   - Multiprogramming with Fixed Partitions (OS/MFT) - partition memory into fixed regions (may be different sizes). A user goes into a region of a given size. Not very flexible.
     IBM OS circa 1965-68.
   - Multiprogramming with Variable Partitions (OS/MVT) - partitions are dynamically variable. IBM OS circa 1967-72.
   Note that we can do any of the three above schemes without base and bounds registers - just load programs into a region at the appropriate base address.

   Task switching: we can now switch between processes very cheaply - we don't have to reload memory, just change the contents of the process control block (which now holds the values of the base and bounds registers).

   We can also run processes which are not in memory - how?
   - Find an empty area of memory in which to place the process - how?
   - Remove one or more processes from memory, if necessary, in order to find space. (I.e. copy the removed processes to space on disk.)
   - Copy the new process (from disk) into memory.
   - If only one process fits in memory, we have to wait for the swap to take place. If several processes fit in memory, we can swap one while executing another.

5. Multiple segments - Segmentation.
   - Divide the virtual address space into several "segments". This is not the same as the "segments" of linkers and loaders.
   - Use a separate base and bound for each segment, and also add protection bits (read, write, execute) and a valid bit. (Also will want a dirty bit.)
   - Each address is now a (segment, offset) pair. Each memory reference indicates the segment and offset in one or more of three ways:
     - Top bits of the address select the segment, low bits the offset. This is the most common, and the best.
     - Or, the segment is selected implicitly by the instruction (e.g. code vs. data, stack vs. data, which base register is used, or 8086 prefixes).
     - Or, the instruction specifies directly or indirectly a base register for the segment.
   - Subprograms (procedures, functions, etc.) can be separate segments.
   - Segments typically are associated with logical partitions of your process address space - e.g. code, data, stack. Or, each module or procedure can be a separate segment.
   - Need either a segment table or segment registers to hold the base and bounds for each segment.
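A segment table of this kind can be sketched as a toy model in Python (the field names, fault types, and limit convention here are illustrative, not from any real architecture):

```python
from dataclasses import dataclass

@dataclass
class STE:                 # segment table entry (illustrative fields)
    base: int              # real address where the segment starts
    limit: int             # segment length; valid offsets are 0..limit-1
    valid: bool = True     # is the segment present in memory?
    prot: str = "rwx"      # protection bits: read/write/execute

def translate(seg_table, seg_num, offset, access="r"):
    """Map (segment number, offset) to a real address, checking
    the valid bit, the limit, and the protection bits."""
    ste = seg_table[seg_num]
    if not ste.valid:
        raise LookupError("segment fault")     # trap to OS to load the segment
    if offset >= ste.limit:
        raise MemoryError("address out of bounds")
    if access not in ste.prot:
        raise PermissionError("protection violation")
    return ste.base + offset

# Example: a code segment at 0x8000 and a data segment at 0x20000.
segs = [STE(base=0x8000, limit=0x400), STE(base=0x20000, limit=0x1000)]
print(hex(translate(segs, 1, 0x10)))   # offset 0x10 within the data segment
```

In hardware the limit compare and the base add happen in parallel; the sequential checks here are only for clarity.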
   - Memory mapping procedure consists of table lookup + add + compare.
   - Example: PDP-10 with high and low segments selected by the high-order address bit.

   Address translation for segmentation:
   - Have a segment table, which maps a segment number to [segment base address, segment length (limit), protection bits, valid bit, reference bit, dirty bit]. This info is in the Segment Table Entry (STE).
   - Need some hardware to automatically map a virtual (segment number, word number) pair to a real address:
     Real address = segment_table(segment #).base + word number.
     Invalid if word_number > limit. (Note that the limit test is done on the word number alone, without waiting for the add of the base.)
     Also, the valid bit must be on, and the permission bits must permit the access.
   - Need more hardware to make it go fast.
   - Have a Segment Table Base Register (STBR) point to the base of the segment table (for the hardware to use).
   - Alternate approach: if there are a small number of segments, can have segment registers - one register per segment. Can also multiplex a small number of segment registers among a large number of segments (as with the x86 architecture).

   Advantages:
   - Each process has its own virtual address space.
   - Protection between address spaces.
   - Separate protection between segments (R/W/E).
   - Virtual space can be larger than physical memory:
     - Unused segments don't need to be loaded.
     - Can load segments as needed. An attempt to reference a missing segment is called a segment fault.
   - Can share one or more segments. (Sharing is tricky - we'll talk about this later.)
   - Segments can be placed anywhere in memory that they fit.
   - Memory compaction is easy.
   - Segment sizes can be changed independently.

   Disadvantages:
   - Each segment must be allocated contiguously, so segment size < memory size.
   - External fragmentation.
   - Overhead of allocating memory.
   - Need hardware for address translation, and overhead (time/hardware) of doing the translation.
   - More complicated.
   - Space for the segment table.

   Note that segment tables are usually 1-1 with processes. A segment table defines a process's address space. What would happen if all processes shared the segment table?
   Protection would be a problem - we would have the same problem as before, since now we would have to allocate shared virtual memory instead of shared physical memory.

   When we switch processes, we reload the STBR (segment table base register), which changes the address space.

   Managing segments:
   - Keep a copy of the segment table in the process control block (or, if the block is too small, associated with it).
   - When creating a process, define its segments in the segment table/PCB.
   - When the process is assigned memory, figure out where each segment goes, and put the base and bounds into the segment table.
   - Need a memory map, which maps memory to segments. (The segment table maps segments to memory.) Also called a core map.
   - When switching contexts, save the segment table (or a pointer to it) in the old process's PCB, and reload it from the new process's PCB.
   - When a process dies, return its segments to the free pool.
   - When there's no space to allocate a new segment:
     - Compact memory (move all segments, update bases) to get all free space together.
     - Or, swap one or more segments to disk to make space (must then check during context switching and bring the segments back in before letting the process run).
   - To enlarge a segment:
     - See if the space above the segment is free. If so, just update the bound and use that space.
     - Or, move the segment above this one to disk, in order to make the memory free.
     - Or, move this segment to disk and bring it back into a larger hole (or, maybe, just copy it to a larger hole).
     - Or, move it down, if there is space below.

   Can load segments only when needed. A segment fault is an attempt to reference a segment which is not present. Handling it:
   - Trap to the OS.
   - Find space for the segment - replace another one, if necessary.
   - Load the segment (removing other segments to make space, if necessary).
   - Set the valid bit to 1, and update the other entries in the STE.
   - Make the process ready.

Paging: the goal is to make allocation and swapping easier, and to reduce memory fragmentation.
   - Make all chunks of virtual memory the same size; call them pages. Typical sizes range from 512 bytes to 16K bytes.
   - Divide real memory into page frames, which are the same size as pages. (I will frequently be sloppy and say "page" when I mean "page frame".)
   - A virtual address now consists of N bits, partitioned as K bits of page number and N-K bits of byte offset within the page.
   - For each process, a page table defines the base address of each of that process's pages. Each page table entry contains the real address of the page plus protection, valid, reference, and dirty bits.
   - A page table base register points to the base of the page table.
   - Translation process: the page number always comes directly from the (virtual) address. Since the page size is a power of two, no comparison or addition is necessary - just do a table lookup and bit substitution. No limit field is needed or used (an address past the end of a page simply falls on the next page).
   - We will also need a table (page map or core map) telling us who owns each page frame in memory. It points back to any page table that points to that frame.
   - Not all of a process's memory has to be loaded in real memory. An attempt to reference a location not in memory causes a page fault - this condition is detected by the valid bit, just as with a segment fault.
   - Page fault - a trap condition, detected by hardware when the valid bit is off:
     - Trap to the OS (trap, not interrupt).
     - OS finds a page frame, (somehow) gets the page (reads it from disk), updates the page table, and makes the process ready.
   - Pages and paging produce a physical partitioning of the process address space and memory. There usually isn't any relation between page boundaries and what is in a page.

   Advantages:
   - Easy to allocate: keep a free list of available page frames and grab the first one.
   - No external fragmentation.
   - When combined with segmentation (discussed later): non-contiguous allocation of segments.
   - Permits a process to have a virtual space much larger than physical space.
   - Permits pages to be loaded as/when needed.

   Disadvantages:
   - Internal fragmentation: the page size doesn't match up with the information size.
   - The larger the page, the worse the internal fragmentation.
   - Hardware for address translation.
   - Time for address translation.
   - Page faults may cause considerable overhead. What happens when we have a page fault (missing page)? - to be discussed later. Need a page replacement algorithm: we need algorithms to decide when to move pages into and out of memory (discussed later).
   - Table space: if pages are small, the table space could be substantial. In fact, this is a problem even for normal page sizes: consider a 32-bit address space with 1K pages - that is 2^22 page table entries. What if the whole table has to be present at once?
     1. Partial solution: keep base and bounds for the page table, so only large processes have to have large tables.
     2. Usual solution: make the page table two-level. Map the high-order bits through the first table, and the lower-order page number bits through the second table. The first-level table can be called a page directory, or a segment table (confusing usage). The second-level table is usually called a page table.
     3. Put user page tables in OS virtual memory - then unneeded pages of the table are not allocated. Note that this also yields a two-level page table - an address is mapped through the OS page table and then the user page table.
     4. Make the page table a hash table (done by IBM and HP). Called an inverted page table.

   Efficiency of access: even small page tables are generally too large to load into fast memory in the relocation box. Instead, page tables are kept in main memory, and the relocation box holds only the page table's base address. It thus takes one overhead reference for every real memory reference. If the page table is two-level, it requires two extra references.

   Where are the page tables?
   - Page tables are referred to with either real addresses or OS virtual addresses.
   - They cannot be put where users can get to them; otherwise, users could change the values, which would bypass protection.
   - Page table entries are usually real addresses (including the addresses in the first-level page table, and the PTBR).
   - Could instead have OS virtual addresses in the entries, which means that another level of translation is needed.

   Is the OS paged? Yes - the advantages for users also apply to the OS.

   Can page tables be paged out? Sure - why not? But if page tables are in the OS's virtual memory, and page tables have OS-address-space virtual addresses in them, then translation of a user virtual address also requires OS virtual address translation. This might require a recursive page fault, which means that the OS's own page tables must be in real memory and use real addresses. (The alternative is to put all page tables in real memory and use real addresses - i.e. have V=R.)

   What can't be paged out? Such memory is called "wired down":
   - The code that brings in pages.
   - Pages for critical parts of the operating system. (Handling a page fault takes time.)
   - Some interrupt and trap handlers, including the code that starts up a process.
   - OS page tables.
   - Sensitive real-time routines.
   - Pages currently undergoing I/O (i.e. I/O buffers).

   Note how effective paging is for protection - you can only reference parts of memory which appear in your page table(s), and the only parts that appear are those that you have access to.

Paging and segmentation combined:
   [Diagram of segment table / page table mapping.]
   - In the segment table entry, put protection bits (read, write, execute) and a valid bit.
   - Each segment is broken into one or more pages.
   - Segments correspond to logical units: code, data, stack. Segments vary in size and are often large. Protection can be associated with segments.
   - Pages are for the use of the OS; they are fixed-size to make it easy to manage memory.
   - Going from paging to P+S is like going from a single segment to multiple segments, except at a higher level: instead of having a single page table, have many page tables, with a base and bound for each. Call the stuff associated with each page table a segment.

   Advantages:
   - Provides 2-level mapping (as did the page directory and page table), which makes the page table size manageable.
   - Provides both a physical unit of management (the page) and a logical unit of management (the segment).
   - Effectively produces two-dimensional addressing: [segment, address within segment].
   - Can grow and shrink segments individually, without interfering with other segments - just add pages (which can be anywhere in memory).
   - Segmentation with no compaction or fragmentation problem.
   - Bounds checks on segments are handled by having a page not be valid (quantized to page size). No page table exists for a segment which doesn't exist.
   - Can share a segment and/or a page; protection at the level of the page and/or the segment.

   Disadvantages:
   - More complicated than either segmentation or paging; overhead of both schemes.
   - Overhead of 2-level mapping (time and hardware).
   - The usual internal fragmentation problem - but if the page size is small compared to most segments, internal fragmentation is not too bad.

   Paging vs. segmentation: a page is a fixed-size, physical unit of information, used only for memory management, and not visible to the programmer. A segment is a logical unit, (usually) visible to the user, of arbitrary size. Note that the user may see (be aware of) segmentation; the user should not be aware of paging.

   Can share at two levels: a single page, or a single segment (a whole page table).
   - Does a shared region have to be at the same address in each process? No - as long as it can be found.
   - Can a shared region contain any absolute addresses (i.e. virtual addresses)? Usually not - very dangerous - the addresses may not be the same in each process. But it can contain relative addresses - e.g. offsets from certain registers or a segment base. Such registers can be loaded differently by each process. If an entire segment is shared, and addresses are relative to the start of the segment, we are okay.

   Copy on write: share pages, but with 2 separate page tables, both pointing to the same pages. The pages are made read-only; on an attempt to write, a copy is made.

   Problem: how does the operating system get information from user memory? E.g. I/O buffers, parameter blocks.
   Note that the user passes the OS a virtual address.
   1. Use real addresses - in some cases the OS just runs unmapped. Then all it has to do is read the tables and translate user addresses in software. Note: addresses that are contiguous in the virtual address space may not be contiguous physically; thus I/O operations may have to be split up into multiple blocks.
   2. Can specify (somehow) that the data addresses are to use the user page tables (would need special hardware). Note that we therefore need two active PTBRs - a user PTBR and a system PTBR.
   3. Have OS page tables point to user pages.
   4. A few machines, most notably the VAX, make both system information and user information visible at once (but you can't touch system stuff unless running with a special kernel protection bit set). This makes life easy for the kernel, although it doesn't solve the I/O problem. I.e. the OS is in everyone's address space.

   Another example: the VAX.
   - The address is 32 bits; the top two bits select the segment. Four base-bound pairs define the page tables (system, P0, P1, unused).
   - Pages are 512 bytes long.
   - Read-write protection information is contained in the page table entries, not in the segment table.
   - One segment contains operating system stuff; two contain stuff of the current user process.
   - Potential problem: page tables can get big. We don't want to have to allocate them contiguously, especially for large user processes. The solution is to use the system page table to map the user page tables, so the user page tables can be scattered:
     - System base-bound pairs are physical addresses; the system tables must be contiguous.
     - User base-bound pairs are virtual addresses in the system space. This allows the user page tables to be scattered in non-contiguous pages of physical memory.
   - The result is a two-level scheme. This is an alternative to the normal two-level scheme. If the normal two-level scheme were used, and the page tables were paged, it would actually be a four-level scheme.
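The two-level lookup described above can be sketched as a toy model. This is not the VAX's actual layout; it assumes a hypothetical 32-bit address split as 10 directory bits, 10 page-table bits, and 12 offset bits, with tables modeled as dicts:

```python
PAGE_BITS = 12                 # 4 KB pages (assumption for this sketch)
LEVEL_BITS = 10                # 10 index bits per level: 10 + 10 + 12 = 32

def walk(page_directory, vaddr):
    """Translate a 32-bit virtual address through a two-level table.
    A missing entry at either level is a page fault."""
    dir_index = (vaddr >> (PAGE_BITS + LEVEL_BITS)) & 0x3FF   # top 10 bits
    tab_index = (vaddr >> PAGE_BITS) & 0x3FF                  # middle 10 bits
    offset    =  vaddr & 0xFFF                                # low 12 bits

    page_table = page_directory.get(dir_index)
    if page_table is None:
        raise LookupError("page fault in first level (page directory)")
    frame = page_table.get(tab_index)
    if frame is None:
        raise LookupError("page fault in second level (page table)")
    return (frame << PAGE_BITS) | offset      # bit substitution, no add needed

# Map directory slot 1, page-table slot 0, to page frame 0x2A.
pd = {1: {0: 0x2A}}
print(hex(walk(pd, 0x00400004)))   # frame 0x2A, offset 4 -> 0x2a004
```

Note how an unallocated second-level table simply doesn't exist in the directory - exactly the space saving that motivates the two-level scheme.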
   System/370 example: 24-bit virtual address space; 4 bits of segment number, 8 bits of page number, and 12 bits of offset. The segment table contains the real address of the page table along with the length of the page table (a sort of bounds register for the segment). Page table entries are only 12 bits; real addresses are 24 bits.

   Inverted page table: the idea is that the page table is organized as a hash table. Hash from the virtual address into a table with a number of entries larger than the physical memory size. (The page table is shared by all processes.)

   Problem with segmentation and paging: the extra memory references to access the translation tables can slow programs down by a factor of two or three. There are obviously too many translations required to keep them all in special processor registers. (But for small machines (e.g. the PDP-11), we can have one register for every page in memory, since the machine can only address 64 Kbytes.)

   Solution: the Translation Lookaside Buffer (TLB), also called a Translation Buffer (TB) (DEC), a Directory Lookaside Table (DLAT) (IBM), or an Address Translation Cache (ATC) (Motorola). A TLB is used to store a few of the translation table entries. It's very fast, but only remembers a small number of entries.

   On each memory reference:
   - First ask the TLB if it knows about the page. If so, the reference proceeds fast.
   - If the TLB has no info for the page, the translator must go through the page and segment tables to get the info. The reference takes a long time, but the info for this page is given to the TLB so it will know it for the next reference. (The TLB must forget one of its current entries in order to record the new one.)

   TLB organization: virtual page number goes in, physical page location comes out. Similar to a cache. So what the TLB does is:
   - Accept a virtual address.
   - See if the virtual address matches an entry in the TLB.
   - If so, return the real address.
   - If not, ask the translator to provide the real address, and load the new translation into the TLB, replacing an old one (usually one not used recently). (For later: must replace an entry in the same set.)
   Will the TLB work well if it holds only a few entries and the program is very big? Yes - due to the Principle of Locality (Peter Denning):
   1. Temporal locality - information that has been used recently is likely to continue to be used. Alternate formulation: the information in use now consists mostly of the same information as was used recently.
   2. Spatial locality - information near the current locus of reference is also likely to be used in the near future.
   Example: the top of a desk is a cache for the file cabinet. If the desk is messy, the stuff on top is likely to be what you need.
   Explanation: code is either sequential or loops, and data used together is often clustered together (array elements, stack, etc.).

   In practice, TLBs work quite well - typically 96% to 99.9% of the translations are found in the TLB.

   The TLB is just a memory with some comparators. Typical size: 16-512 entries. Each entry holds a virtual page number and the corresponding physical page number. How can the memory be organized to find an entry quickly?
   - One possibility: search the whole table associatively on every reference. Hard to do for more than 32 or 64 entries.
   - A better possibility: restrict the info for any given virtual page to fall into a subset of entries in the TLB; then we only need to search that set. This is called set associative. E.g. use the low-order bits of the virtual page number as the index to select the set.
   - Real TLBs are either fully associative or set associative. If the size of the set is one, it is called direct mapped. Replacement must be in the same set.

   The translator is a piece of hardware that knows how to translate virtual to real addresses. It uses the PTBR to find the page table(s), and reads the page table to find the page.

   TLBs are a lot like hash tables, except simpler (they must be, to be implemented in hardware). Some hash functions are better than others: is it better to use low page-number bits than high ones to select the set? Is there any way to improve on the TLB hashing function?
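A set-associative TLB of the kind described above might be sketched like this (the sizes and the least-recently-used replacement rule are illustrative choices, not from any particular machine):

```python
class TLB:
    """Toy set-associative TLB: NUM_SETS sets of WAYS entries each.
    The low-order bits of the virtual page number select the set."""
    NUM_SETS = 16
    WAYS = 4

    def __init__(self):
        # Each set is a list of (virtual page number, physical frame) pairs,
        # ordered from least to most recently used.
        self.sets = [[] for _ in range(self.NUM_SETS)]

    def lookup(self, vpn):
        """Return the frame for vpn, or None on a TLB miss."""
        s = self.sets[vpn % self.NUM_SETS]       # low bits index the set
        for i, (tag, frame) in enumerate(s):     # search only this set
            if tag == vpn:
                s.append(s.pop(i))               # mark most recently used
                return frame
        return None

    def insert(self, vpn, frame):
        """On a miss, the translator loads the new translation,
        replacing the least recently used entry in the same set."""
        s = self.sets[vpn % self.NUM_SETS]
        if len(s) >= self.WAYS:
            s.pop(0)                             # evict the LRU entry
        s.append((vpn, frame))

tlb = TLB()
tlb.insert(0x123, 0x42)
print(tlb.lookup(0x123))    # hit: returns the cached frame
print(tlb.lookup(0x124))    # miss: translator must walk the tables
```

A direct-mapped TLB is the special case WAYS = 1; a fully associative one is the special case NUM_SETS = 1.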
   Must be careful to flush the TLB on each context switch. Why? Otherwise, when we switch processes, we'll still be using the old translations from virtual to real, and will be addressing the wrong part of memory.
   - Alternative: make a process identifier (PID) part of the virtual address. Have a Process Identifier Register (PIDR) which supplies that part of the address.
   Also, when we modify the page table, we must either flush the TLB or flush the entry that was modified.
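To see why stale translations are dangerous, note that the same virtual page number generally maps to different frames in different processes. A minimal single-level sketch of the table-lookup-plus-bit-substitution translation (the page size and tables here are made up for illustration):

```python
PAGE_SIZE = 4096            # assumed page size; must be a power of two

def translate(page_table, vaddr):
    """Single-level paging: table lookup plus bit substitution.
    No limit check is needed - the offset cannot exceed the page size."""
    vpn = vaddr // PAGE_SIZE            # page number: high-order bits
    offset = vaddr % PAGE_SIZE          # byte within page: low-order bits
    if vpn not in page_table:
        raise LookupError("page fault")
    return page_table[vpn] * PAGE_SIZE + offset

# Two processes map virtual page 3 to different frames.
proc_a = {3: 7}
proc_b = {3: 9}
va = 3 * PAGE_SIZE + 0x10
print(hex(translate(proc_a, va)))   # lands in frame 7
print(hex(translate(proc_b, va)))   # lands in frame 9: a stale TLB entry
                                    # left over from A would hit frame 7
```

With a PID tag in each TLB entry, A's entry for page 3 simply wouldn't match B's references, so no flush would be needed.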