Operating Systems – ENTIRE Summary
January 23, 2010
Current version is COMPLETED: 10:47pm – 24/01/10
Please note there have been BIG CHANGES to the ‘Virtual Memory’ section.
Well here are my notes for Operating Systems, if they help that’s great! I’ll continue revising them and working on the layout.
The contents of the pdf can be found in the rest of this post.
Formatting is only available in the pdf.
A Revision Summary by James Bedford
Copyright to The University of Manchester
Please feel free to contact me to discuss anything at:
Julimbob@hotmail.com – MSN/E-mail
Starscent – Skype
The more I talk about this stuff the more it goes in my head so I’m more than happy to!
Good luck in the exam! 🙂
Operating System Concepts
What does an Operating System Do?
Manages resources: processes, memory, the network.
Provides a layer of abstraction that creates protection.
• A process is a program in execution.
• It has an address space: an area of memory that the process can read and write.
• Many processes exist at the same time.
Note on the program counter: it’s very important for debugging! Viewed on its own it looks chaotic; it only makes sense if you string together the different pieces of code being run, because the CPU is running multiple processes and jumping between them.
A thread is a sequence of instructions in a sequential execution context.
Multithreading – running multiple threads in the same process.
The application can benefit from this concurrency (e.g. using a GUI).
However – if multiple threads alter information at the same time we could have dangerous side effects.
You can relocate a program in memory:
• Load a program into memory so that it doesn’t have to start at memory location zero (so now you can have multiple programs loaded into memory).
• Storing the program from memory to the disk so that it can be restored later.
• Because of this, when you run a program twice, the operating system may load the program into a different location of memory and store values into different areas of memory, depending on what’s available at the time.
The Operating System as a Virtual Machine
The operating system provides a virtual machine: a convenient abstraction (convenient because the process can assume it’s got the machine to itself). The OS provides one of these “virtual machines” for every single process.
For each process:
• The memory can be limited to a particular section of the memory.
• The file store can be shared, but details can be hidden.
This virtual machine idea provides protection for the hardware, operating system, and other processes from the application being run in a particular process.
You don’t want a process to be able to halt the machine using a halt CPU instruction. The solution is to build into the CPU a system of privileged operations, so that the “halt” instruction can only be run by the OS. If a process tries to run it, the OS terminates the program. A single bit in the hardware can indicate a “user mode” and a “system”/”kernel”/”privileged” mode for the operating system.
The same technique applies for writing to a random part of the disk drive or memory – the operating system can do that but the process can’t. Similarly with networking – the process can’t hijack the network or mess up the timing etc. The process has to ask the OS for these things. The OS also needs to prevent the process from affecting the process scheduler.
Early computer systems wanted to charge for the amount of CPU time a user uses. Users would get around this by resetting the clock, so the OS used to need to prevent this.
The process manager has to manage creation, deletion, CPU allocation.
The operating system provides an interface, known as system calls, through which a user program accesses hardware resources. System calls are a bit like procedures in that they take parameters. When a system call is made, a specific routine is run and the user application waits for the operating system’s response.
System calls vary from one operating system to another, but the idea is the same across all systems. There are things you need to ask the operating system for.
System programs provide a convenient environment for program development/execution.
Operating System Structure
Application (on top, i.e. the highest level of abstraction)
Operating System (built up out of two key sections)
• File Manager
• Memory Manager
• Process Manager
• Network Manager
Secondly, and below the above structurally:
• Device Manager
• Network Device Manager
Operating System Services
In the terminal, after the user has typed a command, the following sequence of operations is handled by the operating system:
• Read the command (command interpreter reading and breaking down strings).
• Read the program file.
• Allocate memory.
• Read file into memory.
• Find libraries.
• Start the program running in a process.
• Finish “cleanly”.
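The sequence above can be sketched at user level. Here’s a minimal Python illustration (the function name and the use of the subprocess module are my own choices; a real shell performs these steps with fork/exec system calls):

```python
import shlex
import subprocess

def run_command(line):
    """Sketch of the shell's job: parse the command string,
    then ask the OS to load and run the program in a new process."""
    args = shlex.split(line)     # command interpreter: break down the string
    if not args:
        return None
    # subprocess handles reading the program file, allocating memory,
    # finding libraries and starting the process on our behalf
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode  # finishing "cleanly" reports an exit status
```

For example, `run_command("echo hello")` runs `echo` in a child process and returns its exit status (0 on success).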
Designing an Operating System
Originally, monolithic systems were constructed, which had no structure and no design. This was quickly discovered not to be an appropriate solution; a layered approach is much more appropriate.
Large operating systems have internal kernels with great complexity, design and debugging problems. Monolithic kernels were originally used (which place the entire operating system under a privileged, kernel mode). The problem with this is that there isn’t much structure to the operating system, and the kernel can become very large, occupying a large amount of space and memory. Microkernels are an alternative approach, which minimises the functionality of the kernel, moving higher level operations up to higher levels of the operating system. This means that the kernel is smaller and there’s more structure, but the performance may be slower as more layers have to be traversed before an operation can be completed.
The Process Manager Processes & Threads
How are processes created?
There’s a system call to create a process. A process can create new processes (known as parent and children processes) by using the create process system call – this creates a process hierarchy.
There are various ways to terminate a process:
• The task/process can complete (normal exit).
• The task/process fails and so ends (error exit or fatal error).
• The process can be killed by another process. If a parent process is killed then so are the child processes.
Process State Diagram
Running – a running process would normally terminate (complete) from this state.
Ready – new processes join this state (a queue); the process could run, but it’s waiting for the CPU.
Blocked – a process waiting for an event (such as reading from a disk or using a semaphore).
New Process goes to Ready – admitted.
Ready to Running – via the scheduler’s dispatch system.
Running to Ready – an interrupt, the processor being relinquished or a time-slice has expired.
Running to Blocked – block (e.g. an I/O request).
Blocked to Ready – ready (e.g. I/O event complete).
Running to a Terminated Process – exit (process completion).
Context Switching – everything about the current process must be saved and reloaded when the CPU changes from one process to another (e.g. the information currently in the registers). This is overhead which varies between systems, but may be between 1 and 1000 microseconds. A special CPU instruction can be provided for this task, which minimises the number of instructions required and so improves performance.
Process Control Block
A table of processes, with one entry per process. An entry contains: process ID, process state, saved registers, CPU scheduling information, memory management and file I/O information.
Threads are multiple flows of control within a process (sometimes referred to as lightweight processes, as opposed to heavyweight processes). Threads allow concurrency within a particular application. E.g. in a web browser, you can be downloading whilst displaying a page. A key disadvantage of threads, however, is that context switching between threads is further overhead.
Threads need much less address space because threads share address spaces with other threads of the same processes, which means that creating and destroying threads is much faster. However, there are complications with sharing the same address space, as we shall soon discover…
Multiple CPUs can run threads at the same time for true parallelism.
Implementing threads at the user-level:
Pthreads is an example user-level-implemented thread library. The user program interacts with the thread library at the user-level, which manages which thread gets run and when. The operating system doesn’t know anything about this (because thread management is being achieved completely in the user-level), so if a single thread is blocked, the entire process is blocked (despite other threads within the same process being able to run). To get around this, thread A could query the other threads to see if any of them could run before thread A lets itself be blocked.
The main advantage of this form of implementation is that the scheduling is faster because it’s done in user-space.
Implementing threads in the kernel (known as Native or Kernel Threads):
The operating system supports and schedules individual threads. If one thread is blocked, then it doesn’t affect the other threads within the process.
However, thread creation is slower when implemented in the operating system.
The operating system uses a scheduling algorithm, which chooses which process should be run next. If the operating system supports kernel-level threads, then it is the threads that get scheduled by the scheduling algorithm.
When a process currently running relinquishes the CPU (for various reasons), or when a new process joins the list of processes in the ready state, scheduling is required in order to select the best process to be run to make the maximum use of the CPU’s time.
Typically, process execution consists of CPU execution bursts and I/O waits (I/O bursts). Processes vary in the length and number of each of these bursts: some have long CPU bursts and short, infrequent I/O bursts (known as CPU bound) or vice versa (I/O bound).
Scheduling diagrams can be drawn to show how a series of processes would be run by the CPU.
You can measure:
• The average turnaround time (the average time from a process entering the system to its completion).
• The average waiting time (the average time a process spends waiting in the ready state).
When an operating system has nothing to do it will run an ‘idle loop’ which may simply consist of checking the ready queue.
First-come-first-served (FCFS) is the simplest CPU scheduling algorithm, and a non-preemptive one (a process keeps the CPU until it terminates or is blocked).
The process that enters the ready state queue first gets the CPU first, and holds it until it’s blocked or has completed.
It’s not a very satisfactory algorithm; it’s not efficient enough. The average turnaround and waiting times are very high.
Shortest-job-first is a non-preemptive scheduling algorithm that always runs the process with the smallest CPU bursts (it gives the smallest average turnaround and average waiting times).
The problem with this algorithm is that every time the ready queue is examined the shortest job is run, which means that processes with long CPU bursts are starved of CPU time and may never be run.
Preemptive scheduling algorithms are much better. They run a process for a fixed time, and then interrupt the process and give the CPU to another process to be executed. A time slice (otherwise known as a quantum) is used, which is the maximum time a process can be run before it’s scheduled out, and is typically set to 10-100 milliseconds. The time slice must not be too long (which would make time-slicing pointless), nor too short (which would incur excessive context-switching overhead) – this is a major trade-off.
The amount of time spent making the context switch can be added onto the end of a CPU burst, when checking the efficiency of the time slice.
The scheduler goes round all the processes in the ready state, giving a time slice of the CPU to each. Round-robin scheduling is just first-come-first served with preemption. The process that’s preempted is moved to the end of the queue.
Scheduling Criteria for Evaluation
Fairness – you don’t want a process to have to wait a long time in the ready state.
CPU utilisation – always keep the CPU occupied.
Throughput – The number of processes completed per time unit.
Turnaround time – minimise the amount of time it takes to complete a process.
Waiting time – related to turnaround, the amount of time a process spends in the ready state waiting for the CPU. In real-time systems such as aircraft flying, particular processes need to be run by a deadline.
The amount of time spent context switching or scheduling.
Access to shared data (such as the main memory) by multiple processes or thread may lead to problems; inconsistencies. Mechanisms are required to work around these problems.
Hardware developers have dealt with concurrent parallel problems for a while (as electricity across a circuit runs in parallel). A race condition is where the first of two parallel flows to complete determines an outcome or result. It makes the result unpredictable and uncontrollable, and so race conditions need to be reduced or eliminated.
Synchronisation – the use of appropriate policies and mechanisms to ensure the correct operation of cooperating processes or threads.
Critical Section – a section of code in which shared data is accessed, manipulated and written by multiple processes.
Mutual Exclusion – The idea that only one thread at a time can be executing in the critical section.
If mutual exclusion is achieved, and a process crashes in the critical section, then no other process will be able to run once it gets to the critical section and everything will lock up.
When a process is waiting for another process to leave the critical section, it will ‘busy wait’ (keep checking to see if the other process has left the critical section by examining some variable that’s set). This is still taking up CPU time, which makes it an unsatisfactory solution. Semaphores are used instead.
• Dijkstra 1965
• Semaphores consist of two atomic operations; P(S) and V(S).
• P may come from the Dutch proberen (to test) and V may come from the Dutch verhogen (to increment).
• P(S) – if the variable S is not positive, the caller waits (gives up the processor); once S is positive, it is decremented and the caller proceeds.
• V(S) – increments the variable S.
• The variables are initialised to a value before the P and V operations are applied.
• For mutual exclusion, the variable should be initialised to 1 (a binary semaphore).
• Semaphores are very primitive and are similar to writing in machine code if you use them directly.
• When waiting, the CPU’s time should be used effectively, so it’s desirable to give the time to the OS.
A barrier is where we have a number of threads executing, which stop at a specific point (the barrier) that’s been set up. When all the threads get to the barrier then they can succeed/continue. You can implement a barrier using semaphores.
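As a sketch of that idea, here is a minimal one-shot barrier built only from semaphores (the class and attribute names are mine, and this is an illustration rather than the stdlib `threading.Barrier`):

```python
import threading

class Barrier:
    """A one-shot counting barrier built from two semaphores:
    a mutex protecting the arrival count, and a 'gate' that stays
    closed until all n threads have arrived."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.mutex = threading.Semaphore(1)
        self.gate = threading.Semaphore(0)   # starts closed

    def wait(self):
        self.mutex.acquire()                 # P(mutex)
        self.count += 1
        last = (self.count == self.n)
        self.mutex.release()                 # V(mutex)
        if last:
            for _ in range(self.n):
                self.gate.release()          # V(gate): open for everyone
        self.gate.acquire()                  # P(gate): block until all arrive
```

No thread gets past `wait()` until all n threads have called it, which is exactly the succeed/continue behaviour described above.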
• P(S) and V(S) are atomic operations (which means they can’t be broken down into a series of smaller operations). They occur as a single entity, like an instruction.
• The operating system implements these operations.
• You want the operating system to manage P(S): the process should be put onto a blocked queue and woken up later. If this doesn’t happen, then the CPU will be “busy waiting” (constantly polling a variable) – which is a waste of CPU time.
• There are two methods of implementing semaphores; either the operating system uses a software interrupt, or hardware-implemented instructions (P(S) and V(S) are supported by the CPU).
Problems With Semaphores
Deadlocks – where two processes are each waiting (using the P(S) operation) for a semaphore that the other process holds, so neither can ever proceed.
Starvation – when you overlook a process repeatedly. Solved by allocating the processor to the process being starved.
Livelock – where a method of preventing a deadlock doesn’t actually resolve it, e.g. a back-off-and-try-again approach that keeps retrying. Something is still being done (so don’t call it “dead”) – the lock is ‘alive’.
A computer’s memory consists of three hierarchical components:
Processor registers – fastest but has the smallest space because it’s very expensive.
Cache or main memory (RAM).
Hard Disk (HD) – slowest but has the largest space because it’s very cheap.
The main RAM memory is an important resource. One of the operating system’s job is to make efficient use of memory.
With a single user program, the operating system will load a program from the hard disk to the main memory, set the PC to the program’s location in memory, execute the program, and then return to the operating system when the program terminates. This is what Komodo or MSDOS (originally) did.
The operating system and the user program (after being loaded from the HD) are stored in main memory RAM.
Uniprogramming is very restrictive:
• Absolute addressing is used – the addresses are always fixed (to 0).
• Only one program can be run at a time.
• If a program is loaded into the memory and then starts waiting for I/O, the CPU will wait on the I/O, which is a waste of CPU processing time.
• The memory may be too small.
In multiprogramming, the memory is filled with multiple programs. The memory is partitioned, and these partitions are indexed – the partition size is fixed (for the moment). Different programs can then be loaded into these partitions. The memory addressing of a program now needs to be relocated (as the program is written using base address 0, but the partition of memory the program is allocated may be address 16000 – or any other value!)
Dynamic relocation – a loader now alters these addresses, adding the base address of the partition as an offset.
Virtual memory is a technique that gives the CPU the illusion of a large, contiguous, linear memory. The physical memory, because it’s expensive, is a lot smaller than the CPU can actually reference (a 32 bit processor can reference 4 GB of memory, and a 64 bit processor can reference a lot more!). Virtual memory allows the CPU to think it has a lot more memory available to it than it actually does. It also allows the operating system to manage multiple processes, and ensure that these processes don’t interfere with each other’s memory.
Memory Management Unit
A piece of hardware known as the memory management unit (MMU) sits between the processor and the main memory RAM (or a memory cache if there’s one present), performing translations between large virtual addresses given by the CPU to smaller physical addresses in the memory. It has a ‘Base’ register (which holds the value of the base physical address of the partition for a program) and a ‘limit’ register (which holds the size of the partition).
The MMU takes an address from the CPU, adds it to the value in the base register, and returns the new value in terms of the physical address in memory. It also checks that this new value doesn’t exceed the limit register (to prevent a process from accessing memory outside of its allocated partition) – if it does it throws an error.
The MMU uses a built-in lookup page table (maintained by the operating system) in order to load the base register with the correct base address of the memory partition of the physical memory address.
Swapping – switching information from different memory mediums (where mediums are registers, cache, main memory RAM and storage space).
Issue: “How many times should the memory be partitioned?”
Option 1 – limit the number of partitions available.
Option 2 – move memory images to and from the hard disk (very inefficient).
Issue: The program might be bigger than a partition.
Option 1 – give different sizes to the partitions (the MMU concept can still work). But finding out how big to make the partitions before a program is run can be impossible.
Issue: Fragmentation – when partitions of different sizes are used and one partition is freed up, a fragment of memory is left. The issue now is how best to use these gaps: the memory becomes fragmented.
Compaction – moving memory partitions so that they’re next to each other in memory, effectively shifting all the free fragments together at one end of the memory.
Implementing Virtual Memory
Virtual memory can be implemented two ways.
1. Paged Virtual Memory
The virtual address space is a size relative to the size of the CPU’s address. A 32 bit processor will have a virtual address size of 4 GB, and a 64 bit processor can have a much larger virtual address size. The virtual address space is divided into equally-sized, fixed pages (which are blocks of memory typically between 512 bytes and 64 KB).
The physical address space (the actual memory within the computer) is divided into page frames (not to be confused with pages, which is the term used for a partition of virtual memory as described in the previous paragraph). A page frame can hold a page, as a page frame is set to exactly the same fixed size as a page.
The MMU translates the addresses of pages to the addresses of page frames. In this sense, the MMU provides a mapping from pages to page frames.
Number of pages = Virtual Address Space Size / Page Size
Number of page frames = Physical Address Space Size / Page Size
Example: If the virtual address space size is 4 GB, and the page size is 64 KB, then there are 2^16 pages.
To calculate this, convert 4 GB into a power of two (2^32), do the same for 64 KB (which becomes 2^16), and plug these two numbers into the formula.
Tip: When you do division with powers, you just subtract the powers. So (2^32) / (2^16) = (2^16).
Remember: when finding the value in terms of powers of 2, the literal value is in bytes, so 2^16 is literally 65,536 bytes (i.e. 64 KB).
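This arithmetic is easy to check in code. A quick Python sketch (the names GB and KB are mine, treating the sizes as exact powers of two):

```python
def number_of_pages(virtual_space_bytes, page_bytes):
    # dividing powers of two subtracts the exponents: 2**32 / 2**16 = 2**16
    return virtual_space_bytes // page_bytes

GB = 2 ** 30   # binary units, so "4 GB" is exactly 2**32 bytes here
KB = 2 ** 10

pages = number_of_pages(4 * GB, 64 * KB)
print(pages)   # 65536, i.e. 2**16
```

The same function gives the number of page frames if you pass it the physical address space size instead.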
The logical (virtual) address generated by the processor is split into two bit fields; a page number and an offset.
Number of bits in the page number = log2(no. of pages in the virtual address space)
Number of bits in the offset = log2(page size)
Example: If the number of pages in the virtual memory is 2^16 (from the example above), then the number of bits required to represent the page number is log2(2^16) = 16.
Tip: log2(2^16) can be simplified to 16 * log2(2) and then log2(2) is just 1 because log2(2) means, “what power should be used to get from the number in italics to the number in brackets?” – and the answer is 1.
The MMU takes the logical (virtual) address and translates it to a page frame using a lookup table (page table), and checks to see whether the page is in memory or not. If it is in memory, a physical address is computed by replacing the page number with the page frame number (from the lookup table). If it’s not in memory, the transfer is aborted and a page fault will occur.
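The translation step can be sketched in a few lines of Python (the dictionary-based page table, the example page/frame numbers and the PageFault class are all illustrative assumptions; the 64 KB page size matches the example above):

```python
PAGE_SIZE = 64 * 1024          # 64 KB pages -> 16-bit offset
OFFSET_BITS = 16

class PageFault(Exception):
    """Raised when the referenced page is not in memory."""
    pass

def translate(virtual_address, page_table):
    """Split the virtual address into (page number, offset), then map the
    page number to a page frame number via the page table."""
    page = virtual_address >> OFFSET_BITS          # top bits: page number
    offset = virtual_address & (PAGE_SIZE - 1)     # low bits: offset within page
    if page not in page_table:                     # page not resident
        raise PageFault(page)
    frame = page_table[page]
    return (frame << OFFSET_BITS) | offset         # physical address

page_table = {0: 5, 1: 9}                          # page -> frame (hypothetical)
print(hex(translate(0x11234, page_table)))         # page 1 -> frame 9: 0x91234
```

A reference to any page missing from the table raises the page fault, standing in for the aborted transfer described above.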
Page faults – page faults are a form of interrupts that cause a page demand (see next).
Demand paging – the current process will be halted and the CPU will be given to another process. The operating system will load the page from the disk into a page frame. The page table will be changed to say it’s present, and the correct address will be generated. Once this has been completed, the memory reference will be tried again.
Page replacement policy – an algorithm or technique for replacing page frames. The old page frame must be written back first. Least recently used and first-in-first-out are commonly used. More information on this can be found below.
Page changing – moving data from the backing disk to the main memory. A write-back strategy can be used to optimise page changing. A dirty bit is part of each page table entry and indicates whether the page has been written to or not (this bit signals whether to use a ‘write back’ or ‘write through’ policy). If the page has been written to (the dirty bit is set), then the disk’s copy of the page is out of date and needs to be updated with the contents in memory when the page is unloaded from physical memory. If the page in memory has not been written to (the dirty bit is clear), there is no need to update the page’s data held on the disk. This use of a dirty bit can reduce the amount of disk writing that takes place – the disk will only be written to when it needs to be.
Each page table entry:
• Has a virtual page number.
• Has a page resident flag (which signals whether or not the page is loaded into main memory).
• Has a page dirty flag (which indicates whether or not the page has been changed).
• Has a page frame number.
Because page tables are very large they also have to be paged in!
2. Segmented Virtual Memory
The segments in the virtual memory system are of variable size, which helps the computer system in three main ways:
1. It helps the operating system by using variable segment sizes.
2. It helps the system software become more optimised.
3. It helps programmers manage processes in a multitasking operating system.
Segments have attributes:
Access rights – determines which programs can use the segment (e.g. the page table can only be accessed by the MMU).
Usage rights – determines which operations can be performed on the memory (read, write or execute).
These attributes allow the operating system to ensure processes don’t interfere with each other, or try to access bits of memory they’re not supposed to.
A CPU will pass an address through the MMU, which looks it up in a segment table to find where the segment is located in memory. The segment table holds other information about that virtual memory segment. A resident bit is set if the segment is currently loaded into the main memory, in which case a physical address is generated by adding the offset to the segment’s base address, which can be found in the segment table for each segment entry.
The segmentation process is an associative process because it associates one set of information with another set. It works roughly the same way as with paged virtual memory, and is laid out as follows:
1. A logical (virtual) address is created by the CPU.
2. This address is divided into two bit fields; a segment number and an offset.
3. The MMU takes the segment number and looks it up in the segment table. If the resident bit is set for this entry, then the segment is currently in main memory. The MMU takes the base address of the physical main memory segment (this address is stored in the segment table for each segment) and adds it to the offset in order to get the physical memory address. If the resident bit isn’t set, then a segment fault is thrown.
A segment fault is a similar idea to a page fault – see ‘page replacement policies’ later.
The operating system must manage the positioning of variable-sized segments in memory, and an algorithm is required in order to arrange the segments and avoid external fragmentation.
External fragments – are space within the physical memory that isn’t being used, and are caused by segments being swapped out for other segments. Because of the variable sizes, gaps will be formed.
Internal fragments – unused space inside an allocated unit of memory, which means that the full amount of the allocated memory isn’t used.
Avoiding External Fragmentation
Compaction is the very slow process of moving the segments together so that all the gaps are merged into one. The best way to optimise this is to reduce the number of times compaction has to occur. This is done by using a fitting algorithm that can determine the best place to insert a segment.
Best fit – scans the external fragments and decides which is the best to fit the new segment into.
First fit – fast to perform but very inefficient overall. It fits the new segment into the first external fragment it finds that will fit it.
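Both fitting algorithms can be sketched in a few lines of Python (the hole sizes and function names are my own; each function returns the index of the chosen external fragment):

```python
def first_fit(holes, size):
    """Return the index of the first hole large enough, else None."""
    for i, hole in enumerate(holes):
        if hole >= size:
            return i
    return None

def best_fit(holes, size):
    """Return the index of the smallest hole that still fits, else None."""
    candidates = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    return min(candidates)[1] if candidates else None

holes = [600, 100, 300, 200]   # external fragments, in address order
print(first_fit(holes, 150))   # 0 -> the 600-unit hole
print(best_fit(holes, 150))    # 3 -> the 200-unit hole
```

First fit grabs the 600-unit hole and leaves a large leftover; best fit scans everything and picks the 200-unit hole, wasting less space per allocation at the cost of a full scan.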
Paging vs. Segmentation
▪ Segmentation is programmer visible. Paging is not.
▪ Paging uses the same size blocks, which means it’s easy to fit blocks, with segmentation it’s harder and algorithms are required.
▪ Paging has internal page fragmentation, whilst segmentation has external page fragmentation.
▪ Paging makes efficient use of disk traffic by balancing the page size with the access and transfer times (this is similar to making efficient use of packet sizes across a network), whereas with segmentation the segments can be an inefficient size of a few bytes.
▪ Paging was invented to simulate large memories, whilst segmentation was invented to provide multiple address spaces.
Page Replacement Policies
Minimising page faults by using sophisticated replacement algorithms is worthwhile performance-wise.
Depending on which page you select to replace, you can have an effect on the performance.
The page that won’t get used for a while may be a good choice. However, with dynamic behaviour, this would be impossible to predict.
First in First Out (FIFO)
‘First in First Out’ identifies the oldest page in real memory and swaps that page out. Each page can be tagged with an integer sequence number in order to determine the first page in. The assumption is that the first page to be loaded was probably a while ago and isn’t needed much any more.
The problem is that the oldest page may actually contain a critical piece of code that’s always being accessed (but the sequence number is not updated on each access so the system would be unaware of this).
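FIFO is simple to simulate. A Python sketch (the function name is mine):

```python
from collections import deque

def fifo_faults(references, frames):
    """Count page faults with FIFO replacement: the oldest resident
    page is evicted, regardless of how recently it was used."""
    resident = deque()           # front = oldest page in memory
    faults = 0
    for page in references:
        if page in resident:
            continue             # hit: FIFO does NOT update on access
        faults += 1
        if len(resident) == frames:
            resident.popleft()   # evict the first page in
        resident.append(page)
    return faults

print(fifo_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 9
```

With 3 frames this reference string causes 9 faults; curiously, with 4 frames it causes 10 – FIFO can get worse with more memory (Belady’s anomaly).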
Least Recently Used
‘Least Recently Used’ works on the temporal locality principle. This principle states that a resource that has just been referenced is likely to be referenced again soon, so recently used pages should be kept. A timestamp (which may need 64 bits) is used for each page entry in the page table, which determines which pages have been recently used. This timestamp must be updated every time the page is accessed.
Disadvantages: this may not be the most efficient use of memory. A counter could be added as well, but this would make the hardware more expensive and triple the size of a page table entry.
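LRU can be simulated by keeping the resident pages ordered by recency, as a stand-in for the per-page timestamps (the function name is mine):

```python
def lru_faults(references, frames):
    """Count page faults with least-recently-used replacement.
    The list is kept ordered from least to most recently used."""
    resident = []
    faults = 0
    for page in references:
        if page in resident:
            resident.remove(page)   # accessed: move to most-recent end
        else:
            faults += 1
            if len(resident) == frames:
                resident.pop(0)     # evict the least recently used
        resident.append(page)
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 10
```

Note the key difference from FIFO: every access, hit or miss, moves the page to the most-recent end, which is exactly the timestamp update described above.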
Not Recently Used
The idea behind ‘Not Recently Used’ is that recently used pages need to be kept in memory as much as possible, and pages not used recently can be replaced.
Two flags are used for each page table entry, an R bit and an M bit. The R bit is set when the page is referenced, and the M bit is set when it’s been modified.
1. At fixed intervals, the referenced bit is cleared.
2. During these intervals, if R = 1 then the page has been used, and if R = 0 then it has not.
There are four classes (categories), for each of the combinations of R and M bits. If both the R bit and the M bit are set to 0, then it’s the best candidate for removal. If both bits are set to 1 then it’s the least best candidate. ‘Not Recently Used’ can prioritise the pages into these four different categories, and swap out the page with least priority.
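The class-based selection can be sketched directly (the function name, the dictionary representation and the page names are my own; each page maps to its (R, M) bit pair):

```python
def nru_victim(pages):
    """pages: dict of page name -> (R, M) bits. Pick a victim from the
    lowest non-empty class: class = 2*R + M, so (0, 0) goes first
    and (1, 1) is the least preferred candidate."""
    return min(pages, key=lambda p: 2 * pages[p][0] + pages[p][1])

table = {"a": (1, 1), "b": (0, 1), "c": (1, 0), "d": (0, 0)}
print(nru_victim(table))   # 'd': not referenced, not modified
```

Page ‘d’ (class 0) is chosen; if it weren’t resident, ‘b’ (not referenced but modified, class 1) would be chosen ahead of the recently referenced pages.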
Belady’s Optimal Replacement Algorithm
When a page flip is required, the algorithm looks ahead in time through the programs being executed to determine which of the current pages in memory is required furthest in the future – this is the page that gets removed.
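Given the full future reference string, the optimal policy can be simulated (the function name is mine; because it needs the whole future, this is a benchmark to compare real policies against, not something an OS can actually run):

```python
def optimal_faults(references, frames):
    """Belady's optimal policy: on a fault with memory full, evict the
    resident page whose next use lies furthest in the future."""
    resident = []
    faults = 0
    for i, page in enumerate(references):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = references[i + 1:]
            # distance to next use; pages never used again are evicted first
            victim = max(resident,
                         key=lambda p: future.index(p) if p in future else len(future))
            resident.remove(victim)
        resident.append(page)
    return faults

print(optimal_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 7
```

On this reference string with 3 frames the optimum is 7 faults – the lower bound that FIFO and LRU can be measured against.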
Pre-paging allows a swapped-out process to load its recently used pages before they’re required, rather than loading the pages only when they are required.
Working set – multiple recently used pages.
With pre-paging, a working set of a process is swapped into memory. The inefficiency of swapping out multiple pages has to be balanced with the effectiveness of having the pages already in memory.
This is a good way of dealing with processes that require more than one page per instruction.
Page tables are within the physical memory. They’re paged in and out because they can be very large. It’s essential for efficiency that the page table that’s required at any time is already in memory, or it’ll have to be paged in.
Processes share memory: shared user code, shared data spaces (such as Unix pipes), or shared library code (such as frameworks). Keeping multiple copies of this memory would be inefficient, so the page tables of different processes must map to the same physical page. A page table entry can carry read (R), write (W) and execute (X) permissions. However, memory pages aren’t natural units of protection, as memory holds a program’s instructions as well as mechanisms such as stacks, which are more of a functionality issue than a data storage issue.
Features of Modern CPUs
Pipelining is a technique to make processors run more efficiently. When a processor is given an instruction, it passes through various stages (also known as processor subunits) in order to execute that instruction.
Let’s say that there are 4 stages:
1. Fetch – getting the instruction from memory.
2. Decode – decoding the instruction.
3. Execute – actually performing the execution of the particular instruction.
4. Write – writes a result to some memory.
Without a pipeline, the subunit of the CPU that handles each stage is being wasted whilst it’s waiting for the information from another stage. Pipelining means that each CPU subunit for each stage is always processing some instruction at that stage.
It does take a bit of time to fill the pipeline (running the first instruction through all the subunits). However, once the pipeline is full, it completes an instruction every clock cycle – ideally as many times faster as there are stages!
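A quick back-of-the-envelope check of that speedup claim (illustrative arithmetic for an idealised pipeline with no hazards):

```python
# Ideal pipeline timing: fill the pipe once, then finish 1 instruction/cycle.

def pipelined_cycles(n_instructions, n_stages):
    """Cycles to run n instructions on an ideal n_stages-deep pipeline."""
    return n_stages + (n_instructions - 1)

def unpipelined_cycles(n_instructions, n_stages):
    """Without pipelining, each instruction occupies every stage in turn."""
    return n_instructions * n_stages

# 1000 instructions through the 4-stage pipeline above:
speedup = unpipelined_cycles(1000, 4) / pipelined_cycles(1000, 4)
```

The speedup comes out just under 4x; the fill cost amortises away as the instruction count grows, so the limit is the number of stages.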
Structural hazards – where two instructions need the same hardware resource (for example, a single memory port) in the same cycle, so one of them has to wait.
Data hazards – where an instruction needs a result that an earlier instruction still in the pipeline hasn’t written back yet, so it risks reading stale register data.
Control hazards – where instructions that alter the program counter (PC) reduce pipeline performance. When a branch is taken and alters the PC, the subunits before the execute stage have already processed the wrong instructions, because the CPU loaded the pipeline in memory order while the execution order went elsewhere. The pipeline then has to be re-filled with the correct sequence of instructions, and some clock cycles are wasted.
If the outcome of the branch can be predicted, the CPU can load the most probable path into the pipeline, so that the fetched stream more often matches the real execution order. There are two methods of predicting a branch’s outcome:
1. Statically – the compiler examines the code of the program and identifies the most probable branch outcome.
2. Dynamically – the CPU builds up statistics about whether the branch is taken or not. When it next reaches the branch, it prepares for last time’s outcome.
Branch prediction is different in each processor and CPU companies often keep their prediction algorithms secret from competitors.
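A common textbook scheme for dynamic prediction is a 2-bit saturating counter per branch – sketched below for illustration (real predictors are far more elaborate and, as noted, proprietary):

```python
# 2-bit saturating-counter branch predictor.
# Counter states 0-1 predict "not taken", 2-3 predict "taken".
# Two wrong outcomes in a row are needed to flip the prediction, so a
# single anomalous iteration (e.g. a loop exit) doesn't retrain it.

class TwoBitPredictor:
    def __init__(self):
        self.counter = 1  # start weakly "not taken" (arbitrary choice here)

    def predict(self):
        """True means 'predict taken'."""
        return self.counter >= 2

    def update(self, taken):
        """After the branch resolves, nudge the counter toward the outcome."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)
```

For a loop branch taken many times in a row, the predictor quickly saturates at "taken" and then mispredicts only once, at loop exit.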
Controlling Input and Output
A computer system will connect to a wide variety of input and output devices. These include the mass storage, but also include devices such as a keyboard and mouse.
Polling – continually checking the status register. This is a waste of CPU time: the processor could do a lot of useful work while it’s waiting for characters. An alternative is periodic programmed calling of the control program.
The processor is effectively the “brain” of the system which is connected to a system bus. The system bus interacts with other devices. The processor interacts with the memory via the memory management unit (MMU) and with input/output devices via an I/O module.
I/O modules contain at least two special-purpose registers:
First register – a data register that holds one or more bytes of data (for example, which key was pressed on a keyboard).
Second register – a status register that holds information about the state (for example, one bit of a keyboard’s status register could indicate that a key has been pressed and its data is waiting in the data register).
These two registers are mapped to two special memory addresses so that the processor can access them. The processor then reads from the status register’s memory address, and determines whether or not it needs to read from the data register in order to get new I/O information, such as reading a character from the input.
Periodic programmed I/O – calling the control program periodically: once in a while the CPU looks at each I/O module and checks its status register. If the status bit is clear, the CPU executes something else; if the status bit is set, the processor reads the data and acts accordingly. Data from the I/O device is often stored in a buffer that can be dealt with when the CPU checks the device and finds the status bit set.
Issue: if data from the I/O device is going into a buffer, the device must be checked often enough that the buffer doesn’t overflow. However, if it is checked too frequently, CPU time is wasted whenever there’s nothing new in the buffer.
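The status-check-then-read pattern can be sketched as follows (the device object and bit layout are made up for illustration – real status registers are memory-mapped hardware):

```python
# Sketch of programmed I/O: check the status register, and only read the
# data register if the "ready" bit is set.

STATUS_READY = 0x01  # hypothetical status-bit layout

class FakeDevice:
    """Stand-in for a memory-mapped I/O module (illustration only)."""
    def __init__(self):
        self.status = 0  # status register
        self.data = 0    # data register

    def key_pressed(self, byte):
        """Simulate the hardware latching a keypress."""
        self.data = byte
        self.status |= STATUS_READY

def poll_device(device):
    """Return the new byte if the device has one, else None."""
    if device.status & STATUS_READY:
        byte = device.data
        device.status &= ~STATUS_READY  # emulate hardware clearing the bit
        return byte
    return None  # nothing new: the CPU goes back to other work
```

Every call to `poll_device` that returns None is exactly the wasted work the text describes – which is what interrupts (below) eliminate.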
Interrupts are used to stop the processor executing a program, perform another task, and then return to executing the original program. Interrupts can be used to handle I/O, and can be implemented in two ways:
1. Software interrupt (also known as an exception) – normally occurs when there’s an error. A piece of software code is run when a software interrupt occurs in order to handle the error.
2. Hardware interrupt – the way devices connected to the system bus work. The processor has a number of external connections known as interrupt lines. An external device connected to one of these lines signals on it (by driving it from low to high, i.e. 0 to 1) in order to interrupt the processor and cause a hardware interrupt.
With a hardware interrupt, there’s no need to read from the status register because the interrupt indicates that there’s now information in the data register. The hardware line acts like a pseudo-“status register”.
In the keyboard example, every time a character is typed, the hardware interrupt wire goes high and the information is read from the data register. This means that the CPU doesn’t waste any time polling a status register.
The process of a hardware interrupt is as follows:
1. The I/O device signals a hardware interrupt via the interrupt line.
2. The processor finishes executing its current instruction.
3. The processor makes a special type of memory read (known as an IACK). This is an Interrupt Acknowledgement.
4. The device that interrupted the processor responds to the IACK with a value that indicates its identity, so that the processor knows which device caused the interrupt and can act accordingly. This value is known as an interrupt vector.
5. Everything that’s running on the CPU is saved to the memory (e.g. information in the registers).
6. The interrupt vector is used to index a table that holds the starting addresses of all the routines that handle interrupts.
7. This routine, known as the Interrupt Service Routine (ISR), is run.
8. At the end of the interrupt service routine, a special instruction will signal the return from the interrupt. The information that was stored to memory in step 5 is loaded back into the CPU and the program continues executing as if nothing had happened.
In the case of the keyboard, the interrupt service routine will place the typed key’s information into memory, where it can be accessed later by a program.
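The dispatch in steps 4 to 7 can be sketched like this (ISR names and vector numbers are invented for illustration; in real hardware the table holds addresses, not function objects):

```python
# Sketch of interrupt-vector dispatch: the vector delivered in the IACK
# indexes a table of ISR entry points.

def keyboard_isr():
    return "read key into buffer"

def disk_isr():
    return "handle disk completion"

# Interrupt vector table: vector number -> ISR (hypothetical numbering).
IVT = {0x21: keyboard_isr, 0x2E: disk_isr}

def handle_interrupt(vector):
    saved_state = "registers saved to memory"  # step 5
    result = IVT[vector]()                     # steps 6-7: look up and run ISR
    # step 8: a return-from-interrupt restores saved_state and resumes
    return result, saved_state
```

The key point is that the device itself tells the CPU which handler to run, so no searching or polling of devices is needed.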
Direct Memory Access (DMA)
Interrupts are even more useful in the case of mass storage devices such as a hard drive.
A DMA module is a more advanced I/O module that sits between the system bus and the I/O device module. The DMA module’s purpose is to give an I/O device direct access to the main memory (RAM), and it can be used with mass storage devices such as hard disks. Because the I/O device can now be set to access memory directly, the CPU is freed up, so it can get on with other processes.
If the processor wants to write to a disk, the following Direct Memory Access (DMA) steps are taken:
1. The processor tells the disk’s DMA I/O module to write the data. The CPU does this using the DMA module’s command register (a third register in the DMA I/O module).
2. The DMA controller starts the write process.
3. The CPU executes other processes.
4. When the DMA device is finished, it sets a hardware interrupt wire, notifying the CPU that the disk writing has completed.
This is much more efficient than having the CPU continually polling the disk’s status register.
The File Manager
Past exam questions will be harder on the file manager – there’s less to know now.
The file manager has to deal with names: the user names a file, and the operating system translates that name into where the file actually is on the disk. Besides this mapping, the file manager also has to provide protection and permissions for files, and the ability to back up and restore them. Storage management also has to keep track of fragments and other implementation issues.
The File System is layered within the operating system:
Naming Service – (deals with “open” requests, mapping a file name to a user file ID).
Storage Service – (an abstraction that presents a file as a vector of bytes; read and write commands go straight to this level).
Disk Driver – (the bottom layer that interacts with the physical hard disk and deals with blocks).
A file is a collection of related information stored on secondary storage. Files can have a file structure such as lines of text or something more complicated. Each file has attributes, such as its name, size, last update, owner, permissions etc. File operations such as open, close, create, read, write and delete can be used with files through the operating system’s naming service.
Issue: “Should the operating system recognise and support different file types?”
MS-DOS used to object to the user changing file types (the extension at the end of the file name), on the assumption that the file wouldn’t work under a different file extension.
Files can be accessed either:
Sequentially – files are read from the beginning onwards.
…or by using…
Direct access – files are read in no particular order, which can be quicker.
A directory is a structured object, which contains a list of information about files.
A directory structure can be:
• Single level directory.
• Two-level directory.
• Tree-structured directory – where a file is a leaf in the tree and directories are nodes.
Each process has a current working directory, and filenames can be given relative to it. User file IDs are indices into the table that describes the open files.
Within a tree structured directory system a path name has to be decoded by working left to right through the path.
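That left-to-right decoding can be sketched with nested dictionaries standing in for directory objects (purely illustrative – real directories are on-disk structures):

```python
# Sketch of path-name resolution in a tree-structured directory.
# Directories are nodes (dicts mapping names to entries); files are leaves.

def resolve(root, path):
    """Walk '/a/b/c' component by component and return the entry found."""
    node = root
    for component in path.strip("/").split("/"):
        node = node[component]  # descend one level per path component
    return node

# A toy directory tree (names invented for illustration):
fs = {"home": {"james": {"notes.txt": "file contents"}}}
```

Resolving `/home/james/notes.txt` descends through `home`, then `james`, and finally reaches the leaf file – one lookup per component, left to right.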
• Multiple accesses are coordinated.
• Multiple reads are ok.
• Multiple writes occurring at the same time will create a race condition. A restriction to prevent this would be to allow one writer and many readers. However, this is inflexible, and you may want multiple users to be able to write to different parts of a particular file at the same time.
File Manager Implementation
The file manager is implemented using an active file table in memory. Each entry in the table contains the attributes of the file currently in use, as well as the number of readers and writers.
An “Open file” system call will create an entry in this table.
A “Close file” system call will write the attributes to the disk.
The layout of the files is stored in the first series of blocks.
Contiguous allocation – this provides very good read and write performance, but can cause fragmentation. It’s useful in media streaming, as it may be faster to read straight across the blocks rather than jumping around the hard drive.
However, because multiple processes can run at once, different parts of the hard disk may be accessed by two different processes.
Linked allocation – the most natural implementation. Each block states where the next block is, so the blocks form a chain. Some space is wasted in each block (the pointer to the next block), and getting to a particular block can be slow, as the entire chain has to be traversed. The alternative is to keep the pointers in a separate table – the File Allocation Table (FAT). FAT16 uses 16 bits for each pointer to the next block, and FAT32 uses 32 bits, so far more blocks can be addressed, which is useful for large hard drives. The issue is that the file allocation table shouldn’t be too large – however, it can be!
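Following a FAT chain can be sketched like this (toy table, illustrative only – real FAT entries use special reserved values rather than -1 for end-of-chain):

```python
# Sketch of walking a FAT chain. Each FAT entry holds the number of the
# next block in the file; a sentinel marks the end of the chain.

END_OF_FILE = -1  # stand-in for FAT's real end-of-chain marker

def file_blocks(fat, start_block):
    """Return, in order, the block numbers that make up a file."""
    blocks = []
    block = start_block
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]  # one in-memory table lookup per block
    return blocks

# A toy FAT: the file starting at block 2 occupies blocks 2 -> 5 -> 3.
fat = {2: 5, 5: 3, 3: END_OF_FILE}
```

Because the pointers live in the table rather than in the blocks themselves, the chain can be traversed without touching the disk – which is exactly the advantage of a FAT over in-block links.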
Indexed allocation – features a table of pointers to blocks. NULLs are used for unused blocks. UNIX uses inodes (a similar idea – an inode is a fixed sized table of pointers that point to blocks which point to further blocks – a method of indirection).
Other File Manager Issues
Performance – a buffer cache can be used to store frequently accessed blocks.
Recovery – backups are a good idea. Consistency checking (such as chkdsk in MS-DOS) can be used to recover things such as the FAT or other important disk management structures, should they become damaged.
Free space management – a bit vector uses 1 bit for every block, with a 0 indicating a free block. For a 4GB disk with 512-byte blocks, the free-block bit vector needs to be 1MB in size. A linked list of free blocks could be used instead, but then the disk has to be read in order to find a free block. Alternatively, free blocks could be grouped together.
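The bit-vector idea, plus the 1MB size arithmetic, can be checked in a few lines (illustrative; a real implementation packs the bits into words and scans them with bitwise operations):

```python
# Sketch of bit-vector free-space management: one bit per block,
# 0 = free, 1 = allocated (matching the convention in the text).

def find_free_block(bitmap):
    """Return the index of the first free block, or None if the disk is full."""
    for i, bit in enumerate(bitmap):
        if bit == 0:
            return i
    return None

# Size check from the text: a 4GB disk with 512-byte blocks has
# 4 * 2**30 / 512 = 8,388,608 blocks, i.e. 8,388,608 bits of bitmap.
blocks = (4 * 2**30) // 512
bitmap_bytes = blocks // 8  # = 2**20 bytes = 1MB, as stated
```

The whole bitmap fits in 1MB of memory, so free blocks can be found without reading the disk – the advantage over a linked list of free blocks.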
THAT’S ALL FOLKS!
I’M MISSING THE CASE STUDIES – I SIMPLY DON’T HAVE ACCESS TO THEM!