TimeDigit.com: August 2011

My Operating System Study Notes for the World...

Jul 01
=====

-an OS is software that manages the hardware.
-an OS provides an environment for other programs to run efficiently.
-a good OS will be both efficient and convenient to use.

-OS design depends on the type of hardware it is going to run on.
-otherwise the fundamental role is as summarized above.

-a complete computer system = hardware + OS + non OS programs + user
-in the above, non OS programs = system programs + application programs.

-in a general sense, the OS is the one program that runs all the time: the kernel.

-note: one bit = a unit of storage (either 0 or 1); one byte = 8 bits; one word = n (bytes) where n is a variable dependent on the computer arch.

===xx==

Jul 03
=====

-bootstrap program = rom bios = firmware = first program to run on a computer when it is turned on.
-bootstrap initializes cpu registers, memory contents, device controllers and then it locates and loads kernel from disk into memory.
-after all this, the os runs its first program called init and waits for an event to happen.

-an event can be an interrupt (from hardware) or a trap (from software, eg a system call or an error).
-when an interrupt happens, the cpu stops what it is doing, saves its current state (registers, program counter) to memory and attends to the interrupt.

-the cpu and device controllers compete for memory cycles via signals that travel on a common pathway called the 'system bus'.
-the kernel (read OS) has a device driver for each device controller.
-the device driver understands the device controller intimately and presents a uniform interface to the kernel to communicate.

-normal (interrupt driven) io between cpu and device:
...io request > driver loads registers in controller > controller examines contents of registers > action (read/write..) > controller raises interrupt to tell driver 'work done' > driver gives status to kernel...

-direct memory access (dma) between device and memory:
...io request > driver loads registers in controller > controller examines contents of registers > controller moves a whole block between its data buffers and memory without the cpu > one interrupt per block tells driver 'work done' > cpu gets the data directly from memory...
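-sketch (python, added for illustration): the practical difference between the two flows above is how many interrupts the cpu has to service. the function names and the one-interrupt-per-byte assumption are made up for this example; real controllers vary.

```python
def interrupt_driven_read(block: bytes) -> tuple[bytes, int]:
    """Assumed model: the device interrupts once per byte and the cpu copies it."""
    memory = bytearray()
    interrupts = 0
    for byte in block:
        interrupts += 1          # controller signals 'one byte ready'
        memory.append(byte)      # cpu itself moves the byte into memory
    return bytes(memory), interrupts


def dma_read(block: bytes) -> tuple[bytes, int]:
    """Assumed model: the controller copies the whole block, then interrupts once."""
    memory = bytearray(block)    # transfer happens without involving the cpu
    interrupts = 1               # single 'block done' interrupt
    return bytes(memory), interrupts


if __name__ == "__main__":
    data = bytes(range(16))
    print(interrupt_driven_read(data)[1], "interrupts without dma")
    print(dma_read(data)[1], "interrupt with dma")
```

-both functions return the same data; only the interrupt count differs, which is why dma suits bulk transfers such as disk io.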

===xxx==

Jul 04 & 05
=====

-Three Benefits of multi-Processor systems                              [ Note: Multi-Processor not Multi-Processing ]
    . Parallelism     => More work in less time                       [ Note: speed-up with N procs is less than N, due to management overhead ]
    . Economy       => More power less cost
    . Reliability       => More fail over ability
-Asymmetric Multi Processor System = Master Slave architecture = one master cpu assigns tasks = less common
-Symmetric   Multi Processor System = Active-Active architecture = every cpu shares work = more common
-Clusters are different from AMP or SMP systems.
-Clusters are individual systems connected by high speed n/w.
-AMP or SMP systems are part of one big system, share memory and devices.

-Multiprogramming   => The mechanism of keeping several programs in memory and switching the cpu to another one whenever the running one has to wait (eg for io).
-Multiprogramming optimizes cpu utilization since one single program cannot practically keep the cpu busy all the time.
-Multiprogramming requires a pool of jobs in memory (virtual or real) that take turns to run on the cpu.
-Multiprogramming does not aim for user interaction. Multitasking does.

-Multitasking is a logical extension of Multiprogramming. It facilitates user interaction.
-Multitasking switches between the pool of jobs rapidly to allow the user to interact with them while jobs are executing.
-Multitasking as described above consists of one single user running multiple programs.
-Multitasking can also have multiple users running multiple programs. That is called Time sharing.

-Both Multiprogramming and Multitasking require job scheduling for the cpu.
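-sketch (python, added for illustration): one simple way to schedule a pool of jobs for multitasking is round robin, where each job gets a fixed time slice in turn. the notes do not name a scheduling policy, so round robin, the function name and the job names are assumptions.

```python
from collections import deque


def round_robin(jobs: dict[str, int], quantum: int) -> list[str]:
    """Return the order in which jobs get the cpu, one entry per time slice.

    jobs maps a (hypothetical) job name to the cpu time it still needs.
    """
    ready = deque(jobs.items())          # the pool of jobs waiting for the cpu
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        timeline.append(name)            # job runs for at most one quantum
        remaining -= quantum
        if remaining > 0:
            ready.append((name, remaining))  # not finished: back of the queue
    return timeline


if __name__ == "__main__":
    print(round_robin({"A": 3, "B": 1, "C": 2}, quantum=1))
```

-with a small quantum every job gets the cpu often, which is what makes the system feel interactive to the user.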

-the hardware has two modes of operation - kernel mode and user mode.
-in kernel mode - cpu executes OS code. The mode bit in the hardware is set to 0.
-in user   mode - cpu executes user code. The mode bit in the hardware is set to 1.
-the mode bit allows us to know if a task is executed as OS (kernel) code or user code.
-instructions that can affect the entire system may run only in kernel mode and are therefore called 'privileged instructions'.
-privileged instructions include IO, Timer & Interrupt management etc.
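-sketch (python, added for illustration): a toy model of how the mode bit guards privileged instructions. the class, the instruction names and the way a system call flips the bit are all simplifications invented for this example, not a description of any real cpu.

```python
KERNEL, USER = 0, 1                       # mode bit values as in the notes
PRIVILEGED = {"io", "set_timer", "disable_interrupts"}  # illustrative names only


class Cpu:
    def __init__(self) -> None:
        self.mode = USER                  # user programs start in user mode

    def execute(self, instruction: str) -> str:
        # hardware check: privileged instruction while mode bit says 'user' -> trap
        if instruction in PRIVILEGED and self.mode != KERNEL:
            raise PermissionError(f"trap: {instruction} attempted in user mode")
        return f"ran {instruction}"

    def system_call(self, instruction: str) -> str:
        self.mode = KERNEL                # trap into the os switches to kernel mode
        try:
            return self.execute(instruction)
        finally:
            self.mode = USER              # mode bit is reset before returning to the user
```

-calling cpu.execute("io") directly raises the trap, while cpu.system_call("io") succeeds; that is the whole point of routing privileged work through the os.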

Jul 06
=====
-A process is a program in execution.
-the cpu runs the program's instructions one after another, as pointed to by the program counter, to progress the process.
-a process may have multiple threads of simultaneous instructions. that requires one program counter per thread.

-memory is a large storage space that the cpu is able to access directly.
-in other words, instructions must be in memory for the cpu to be able to run them.
-if a program needs to be fetched and run, it follows this sequence:
    . magnetic disk > main memory > cache > cpu registers
-note: the speed of data access in the above is:
    . disk < mem < cache < register

Jul 07
=====
-system calls are an interface to the services made available by the os.
-user commands or actions invoke system calls.
-system calls are generally written in C, C++ and Assembly Language.
-even a simple action like reading a file from disk takes several system calls (eg open, read, close).
-usually programmers don't have to deal with system calls directly.
-that is because the OS typically exposes its system calls through APIs.
-programmers call the APIs in their programs. APIs make the programs portable (ie they can move between systems that support the same API).
-an API function typically wraps one or more system calls that achieve a certain task.
-APIs are OS dependent.
-3 common APIs are Win32, POSIX and the Java API.
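-sketch (python, added for illustration): reading a file once through thin wrappers that sit close to the posix system calls, and once through a higher level api. the function names are made up; os.open, os.read and os.close are python's wrappers, assumed here to run on a posix-like system.

```python
import os
import tempfile


def read_with_syscall_wrappers(path: str) -> bytes:
    """Read a file via os.open/os.read/os.close (close to open(), read(), close())."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:                 # empty result means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def read_with_api(path: str) -> bytes:
    """Same result through the portable, higher level file api."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello os")
    print(read_with_syscall_wrappers(tmp.name), read_with_api(tmp.name))
    os.remove(tmp.name)
```

-the second version is shorter and hides the file descriptor and the read loop, which is the convenience an api is meant to give.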

-interprocess communication (ipc) permits multiple processes to exchange data, share resources and co-exist.
-two common models for ipc are - message passing model and shared memory model.
-in message passing, processes exchange messages through the kernel, either directly or via a common mailbox that they send to and receive from.
-in shared memory, processes exchange data by reading and writing a common memory area; access is coordinated through locks, latches etc.
-message passing is useful for small amounts of data whereas shared memory is useful for larger amounts of data.
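-sketch (python, added for illustration): both ipc models between a parent and a forked child. this assumes a unix-like system (os.fork is not available on windows); the function names are made up, and waiting for the child to exit stands in for a real lock in the shared memory case.

```python
import mmap
import os


def message_passing(message: bytes) -> bytes:
    """Child sends a message through a kernel pipe; parent receives it."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child: the sender
        os.close(read_fd)
        os.write(write_fd, message)
        os._exit(0)
    os.close(write_fd)                    # parent: the receiver
    received = b""
    while True:
        chunk = os.read(read_fd, 1024)
        if not chunk:                     # writer closed its end: no more data
            break
        received += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    return received


def shared_memory(message: bytes) -> bytes:
    """Child writes into an anonymous shared mapping; parent reads it back."""
    region = mmap.mmap(-1, len(message))  # anonymous mapping, shared across fork
    pid = os.fork()
    if pid == 0:                          # child: writes straight into memory
        region[:len(message)] = message
        os._exit(0)
    os.waitpid(pid, 0)                    # crude synchronisation instead of a lock
    data = region[:len(message)]
    region.close()
    return data
```

-in the pipe version every byte passes through the kernel; in the mmap version the kernel only sets up the mapping, and the data itself is never copied by a system call.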

Jul 08
=====
-virtualization discussion:
-the fundamental idea behind virtualization is abstraction.
-ie the abstraction of cpu, memory, disks, network cards, ports etc into multiple execution environments.
-each vm on a given system owns one of these execution environments and has no knowledge of the other vms.
-two common abstraction techniques are a) cpu scheduling and b) virtual memory.

-vms first appeared commercially on IBM VM/370 mainframes in 1972.
-vms allow for R&D, cost savings, consolidation         (business benefits)
-vms allow for process isolation and resource efficiency     (technical benefits)
-vms require some level of hardware support.

-vms methods:
-simulation -    in this vm method, host system tricks the guest. guest runs in its own exec env and needs no modifications.
-para-vm     -    in this vm method, guest needs a modification. the exec env presented is similar to, but not identical to, the bare metal.
-examples:
    .simulated vm -    vmware                -hypervisor runs as an application in the host/physical user mode.
    .para-vm   vm -    solaris containers/zones    -hypervisor runs in host kernel mode.

Jul 09
=====
-two ways of booting a computer:
    . bios > bootstrap loader from mbr > bsl loads kernel into ram and runs it
    . bios > bootstrap loader from mbr > bsl loads a 2nd boot program from disk to mem > 2nd bp loads kernel into ram
-bios lives in rom. that's why it is also called rom-bios.
-bios lives in rom because at system startup ram status is unknown and rom is incorruptible (eg: by viruses)
-os kernel lives on secondary storage (disk) bec such os kernels are usually bigger than rom.
-some systems like cellphones, game consoles etc store entire os kernel in rom bec such os are usually small.
-rom is sometimes called firmware as its properties fall between those of firm hardware and invisible software.
-rom speed is slower than ram speed.
-as such, some devices that store the os kernel in rom first copy it into ram and then run it.

Jul 10
=====

-Process:
-a process is a running program. It is the unit of work in a system.
-a process needs cpu, ram and io for doing its work.
-a process exists in ram and is operated upon by the cpu.
-a process consists of 5 elements - text, data, heap, free space (between heap and stack) and stack.
-a process can be in 1 of 5 states - new, ready, running, waiting, terminated.
-a process can have multiple threads within it.
-processes are tracked using the process control block.

-a process scheduler does timesharing.
-timesharing or multitasking - act of switching the cpu between multiple programs frequently.
-the frequent switching allows users to interact with each program while those programs run.

-the first process in a system is sched (pid=0) (on solaris).
-the 2nd   process in a system is init (pid=1). sched creates init.
-ps -el shows all the procs.
-child process creation is a two step act.
-step 1 - parent clones itself using fork()        (two identical procs exist)
-step 2 - clone runs own binaries using exec()        (child gets its own identity and goes its own way)
-note: when the child proc ends it reports its status to the parent.
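-sketch (python, added for illustration): the two step fork() + exec() act, followed by the parent collecting the child's status. assumes a unix-like system; the function name is made up, and a second python interpreter is used as the 'new binary' only so the example does not depend on any particular program being installed.

```python
import os
import sys


def spawn_and_wait(exit_code: int) -> int:
    """fork a child, exec a new program in it, return the child's exit status."""
    pid = os.fork()                        # step 1: parent clones itself
    if pid == 0:
        # step 2: the clone replaces its image with a new program
        # (assumption: another python interpreter that just exits with exit_code)
        os.execv(sys.executable,
                 [sys.executable, "-c", f"import sys; sys.exit({exit_code})"])
    _, status = os.waitpid(pid, 0)         # child reports its status to the parent
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("child exited with", spawn_and_wait(7))
```

-between fork() and exec() the child is still a copy of the parent, which is the window a shell uses to set up redirections before the new program starts.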

Jul 11
=====
-Thread:
-a thread is actually a thread of execution within a process.
-as such a thread is a subset of a process.
-in other words, one or more threads make up a process.
-a thread is a unit of cpu utilization; a process is a unit of work in a system.
-a thread shares resources with other threads in the process.
-a thread shares the text, data and heap of a process. The stack is not shared.
-a thread has its own id, program counter, registers and stack; other resources are shared with other threads of the process.
-having threads allows a process to do multiple things at the same time.
-eg: an email process can send one mesg and display another simultaneously bec of threads.
-threads make systems more efficient in terms of resources required; they save cost and increase scalability.
-on a single core system, thread execution is interleaved; only one thread runs at any instant.
-on a multi core system, thread execution is parallel; ie the threads can run on individual cores.
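-sketch (python, added for illustration): threads of one process sharing heap data while keeping their own local (stack) variables. all names are made up. note that in standard cpython the global interpreter lock makes these threads interleave rather than run in parallel; the sharing rules are the point here.

```python
import threading

shared_counter = {"value": 0}             # heap object: visible to every thread
lock = threading.Lock()


def worker(increments: int, results: list, index: int) -> None:
    local_total = 0                       # local variable: private to this thread's stack
    for _ in range(increments):
        local_total += 1
        with lock:                        # shared data needs coordinated access
            shared_counter["value"] += 1
    results[index] = local_total


def run_threads(n_threads: int, increments: int) -> tuple[int, list]:
    shared_counter["value"] = 0
    results = [0] * n_threads
    threads = [threading.Thread(target=worker, args=(increments, results, i))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_counter["value"], results


if __name__ == "__main__":
    print(run_threads(4, 1000))
```

-the shared counter ends up with the sum of all increments, while each thread's local total only ever reflects its own work.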

-both user and kernel processes can have threads.
-user threads can be mapped to kernel threads in one-to-one, many-to-one or many-to-many model.
-linux and windows use one-to-one model.
-solaris (before version 9), HP-UX, Tru64 use a two-level model that combines many-to-many with one-to-one.

-Pthreads are a standard by POSIX.
-Pthreads define how threads are to be defined and operated.
-Linux, Unix use Pthreads and customize it accordingly.
-Windows and Java have their own thread standards (Win32 and Java API).

-In Linux, the distinction between processes and threads is blurred.
-Linux uses a generic term task for them.
-Linux has the traditional fork(), exec() functions.
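-It also has a clone() function that creates tasks.
-clone() takes flags that decide how much the child shares with its parent (address space, open files, signal handlers etc).
-sharing everything gives a thread; sharing nothing gives the equivalent of fork(). this way one call covers both cases.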

Jul 12
=====
-Computer memory consists of a series of bytes or words each with its own address.
-note:    byte=8 bits, word=n(bytes) where n=variable dependent on computer arch.
-to execute a program, cpu fetches an instruction from an addr, decodes it, fetches operands, executes and repeats.
-every running program (ie process) has a unique address space.
-the address space is delimited by a base register and a limit register.
-the base register=starting point of addr space and limit register=size of address space.
-the memory unit sees only a stream of addresses.
-it has no knowledge of multiple address spaces in memory.
-in other words, the cpu does not know that there are multiple processes existing in the memory.
-an address generated by the cpu is called a logical address.
-an address seen by the memory unit is called a physical address.
-the logical to physical mapping is done in a hardware device called memory management unit (mmu).
-in the simplest scheme, the mapping consists of adding a relocation register value to the logical address.
-physical address=logical address + relocation register
-note: a process must be in physical memory to be executed.
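-sketch (python, added for illustration): the relocation + limit check that the mmu performs in the simple scheme above. the function name and the use of MemoryError to stand for a hardware trap are assumptions.

```python
def translate(logical: int, relocation: int, limit: int) -> int:
    """Map a logical address to a physical one with a relocation and a limit register."""
    if not 0 <= logical < limit:          # limit register: size of the address space
        raise MemoryError(f"trap: address {logical} is outside limit {limit}")
    return logical + relocation           # physical = logical + relocation register


if __name__ == "__main__":
    print(translate(346, relocation=14000, limit=1000))
```

-with a relocation register of 14000, logical address 346 maps to physical address 14346; any logical address at or beyond the limit traps instead of touching another process's memory.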

-swapping - the act of moving a process address space from physical memory to disk (and back) for space management.
-for reasons of efficiency, it's better to have all the address units within an address space be contiguous.
-this is called 'contiguous memory allocation'.
-fragmentation = the state in which free memory is broken into small non-contiguous holes, so a process may not fit even though enough total memory is free.
-to avoid fragmentation, the system has to work extra (eg compaction) to fit all the address units in a contiguous fashion.
-so while contiguous mem allocation is good for memory usage, the system has to work harder to achieve that.

-paging    - the memory management scheme where the physical addresses of a process need not be contiguous.
-it avoids the need for the system to continuously try to fit address units contiguously.
-in paging scheme, both physical and logical memory are subdivided into smaller fixed-size chunks.
-frames - units of fixed size in physical memory under the paging scheme.
-pages - units of fixed size in logical memory under the paging scheme; a page is the same size as a frame.
-so cpu sees a stream of pages whereas the memory sees a stream of frames.
-usually the page size is a power of 2 (eg 2kb, 4kb, etc).
-the page table (one per process) maps pages to frames; the frame table tracks which frames are free or allocated.
Jul 13
=====
-segmentation - is an alternative to memory management based on paging and/or contiguous memory allocation (cma).
-in segmentation, the memory of a process is divided into segments that are differently sized and need not be contiguous with each other.
-notice that paging divides memory in equally sized units (but usually not contiguous)
-notice that cma    gives each process one variably sized block (placed contiguously).
-some cpu architectures (eg intel x86) support a mix and match of various memory schemes (eg paging inside of segments)
-there the translation is: logical addr > segmentation unit > linear addr > paging unit > physical addr.
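-sketch (python, added for illustration): a lookup in a segment table, where each segment has its own base and limit. the function name, the dict layout and the sample values are assumptions for the example.

```python
def translate_segmented(segment: int, offset: int,
                        segment_table: dict[int, tuple[int, int]]) -> int:
    """Translate (segment, offset) using a table of (base, limit) pairs."""
    base, limit = segment_table[segment]
    if not 0 <= offset < limit:            # each segment has its own size
        raise MemoryError(f"trap: offset {offset} beyond limit of segment {segment}")
    return base + offset


if __name__ == "__main__":
    table = {0: (1400, 1000), 2: (4300, 400), 3: (3200, 1100)}
    print(translate_segmented(2, 53, table))
```

-segment 2 starts at 4300, so offset 53 maps to 4353; offset 1222 in segment 0 traps because that segment is only 1000 units long.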

-virtual memory:
-the previously discussed memory management methods (cma, paging, segmentation) have one goal.
-the goal = keep as many processes in memory simultaneously to permit multiprogramming.
-they also have a common flaw.
-the flaw = all of them require that the entire process be present in memory before it can be executed.
-this requires more resources than necessary, since not all components of a process are needed all the time.
-virtual memory is the technique to try to solve this problem.
-virtual memory allows the execution of processes that are not entirely in memory; it also makes it easy for processes to share memory.
-advantage = the system can then run processes that are larger than the physical memory.
-virtual memory creates an abstraction between the process and the memory.
-in this abstraction, part of the secondary storage (disk) is considered as an extension of memory.
-this area of secondary storage is also organized like the actual physical memory.
-thus a process is unable to tell the difference between physical memory and its disk based extension.
-downside = disk based memory is slower; virtual memory is more complex to implement.
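-sketch (python, added for illustration): loading pages only when they are first touched, and counting the resulting page faults (each fault is a slow trip to disk). the notes do not say which page gets evicted when memory is full; fifo is assumed here only because it is the simplest policy. the function name is made up.

```python
from collections import deque


def count_page_faults(references: list[int], n_frames: int) -> int:
    """Count page faults for a reference string with n_frames physical frames.

    Assumption: fifo replacement (evict the page that was loaded earliest).
    """
    resident = set()                       # pages currently in physical memory
    load_order = deque()                   # oldest loaded page is on the left
    faults = 0
    for page in references:
        if page in resident:
            continue                       # hit: no disk access needed
        faults += 1                        # fault: page must come from disk
        if len(resident) == n_frames:
            victim = load_order.popleft()  # memory full: evict the oldest page
            resident.remove(victim)
        resident.add(page)
        load_order.append(page)
    return faults


if __name__ == "__main__":
    print(count_page_faults([1, 2, 3, 1, 2, 4, 1], n_frames=3))
```

-here a process touching 4 distinct pages runs in only 3 frames, at the price of extra faults; that trade of speed for capacity is the downside noted above.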

Jul 14 & 15
=====
-storage attachments:
-host attached storage (has):
-disk storage is directly connected through local io ports.
-the ports could be - ide, ata, sata, scsi, fibre channel.
-usually these ports use a protocol that needs cables (eg scsi ribbon, ide ribbon cable etc)
-also called das - direct attached storage.

-network attached storage (nas):
-storage accessed via a remote system.
-uses rpc interface like nfs, cifs.
-rpc is carried via tcp or udp over an ip network, usually on the same lan.
-easier to share storage between systems but lower performance than das.
-iscsi is a type of nas.
-in iscsi, scsi protocol is carried over ip n/w (instead of scsi over cables or rpc over ip).
-this way iscsi gets the simplicity of nas shared storage with performance closer to das.

-storage area network (san):
-one drawback of nas is that it increases the traffic and thereby the latency of the ip n/w lan it uses.
-this hits performance.
-san is an answer to that.
-san is a private network based on a dedicated storage protocol instead of a network protocol.
-this provides speed and flexibility.
-eg dedicated storage protocols - fibre channel, infiniband (special bus arch based on h/w and s/w)

-booting:
-bootstrap is a tiny program that locates and loads os kernel into memory.
-bootstrap usually lives in the rom
-this is convenient bec rom does not need initialization and processor knows about its location from the get go.
-also being rom, it cannot be infected by viruses.
-further, being read only means it cannot be modified, therefore it is strict and inflexible.
-for this reason, some computers put a 1st level code in the rom whose only purpose is to locate and load a 2nd boot program.
-this 2nd boot program lives on disk, is flexible and locates and loads the os kernel.
-a disk that contains the 2nd bootloader program is called a boot disk or system disk.
-mbr is the 1st sector of the boot disk; it holds the 2nd bootloader and the partition table.
-mbr = 2nd bldr + parti table.
-boot proc:
-power on -> 1st bootstrap from rom bios -> 2nd bootloader from mbr -> os kernel from 1st sector of boot partition
-boot partition != boot disk.
-boot disk = disk containing 2nd bootloader program
-boot parti= partition containing os kernel
-boot sector = 1st sector of boot partition
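-sketch (python, added for illustration): reading the partition table out of an mbr sector. the offsets below follow the classic pc mbr layout (446 bytes of boot code, four 16 byte partition entries, 2 byte signature), which is not spelled out in these notes; the function name and dict keys are made up.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446            # bytes 0-445 hold the 2nd bootloader code
ENTRY_SIZE = 16               # four partition entries follow
SIGNATURE = b"\x55\xaa"       # last two bytes of a valid mbr


def parse_mbr(sector: bytes) -> list[dict]:
    """Return the non-empty partition entries found in one mbr sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not a valid mbr sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # status byte, 3 skipped chs bytes, type byte, 3 skipped chs bytes,
        # then first sector (lba) and sector count as little-endian 32 bit ints
        status, ptype, first_lba, n_sectors = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:                     # type 0 marks an unused entry
            continue
        partitions.append({"bootable": status == 0x80, "type": ptype,
                           "first_lba": first_lba, "sectors": n_sectors})
    return partitions
```

-the entry flagged bootable is the boot partition; its first sector (first_lba) is the boot sector that the 2nd bootloader reads next.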

Jul 16,17
=====

-malicious code:
-trojan    - program that does something other than what it is expected to do.
-eg: pretends to be a system cmd like 'cd' but actually ends up deleting files
-eg: login spoofer, shows dummy login screen to retrieve passwd.

-virus    - self replicating embedded code causing harm.
-is embedded inside another good program.
-replicates and causes harm (eg deleting files, formatting disks etc).

-worm    - self replicating program.
-causes system performance problems bec it rapidly creates many copies of itself, even on remote systems.
-eg: morris worm (1988, written by a cornell univ student)

-ssl - a cryptographic protocol developed initially by netscape.
-allows web browsers to communicate securely with web servers.
-the session key is valid only during the life of the session.

-dmz=semi-secure, semi-trusted domain outside the firewall.
-what's allowed:
    . internet ->    dmz    = allowed        -bec protected by f/w
    . internet <- int n/w    = allowed        -bec protected by f/w
    . internet <- dmz    = not allowed
    . int n/w   -> dmz    = not fully allowed (controlled access only)

-security ratings - A B C D   where A is highest and D is lowest.
-C2 has higher security than C1. C2 = C1 + individual level access control.
-most unix is C1. linux is C2. windows NT is C2?

Linux:
=====
[not covering Linux History; often repeated it is.]
-linux is intimately related to the Internet.
-linux was distributed and developed over the Internet.
-linux kernel code is written entirely from scratch.
-linux system code (ie non kernel) is a mix of original and borrowed code.
-eg: linux system code borrows from GNU, MIT X, BSD etc.
-note: while linux has benefited from BSD code, recently linux code also has been borrowed by BSD & its derivatives (eg FreeBSD).
-linux is licensed under the GNU General Public License (GPL).
-linux is not public domain s/w.
-in public domain s/w the original author gives up all copyright-rights.
-in gpl, authors maintain rights to their code but allow reading, modification and sharing (rms) of their s/w.
-eg: if author A writes code a and releases under gpl, and author B improves it then
-eg:    author B cannot wipe credits to author A and claim the code as his own.
-eg:    this way copyrights of both author A and author B are retained, yet being freely available.
-eg:    this is called copyleft as a pun on traditional meaning of copyright.

-linux kernel is monolithic - ie all the kernel code shares one address space. this improves performance.
-yet linux system is modular. the kernel can load or unload modules as needed.
-modularity benefits:
    . linux kernel is guaranteed to be freely available under gpl. no one can add proprietary components and make it proprie..
    . linux kernel need not be re-compiled every time a new module is added. only the module needs to be compiled.
-loadable kernel modules run in privileged kernel mode and thus have full access to all system resources.
-loadable kernel modules are loaded into a separately allocated area of kernel memory, not into an address space of their own.

TimeLinux1

Monday, August 1, 2011

Operating System Concepts