TimeLinux1

Thursday, August 25, 2011

iEverything iResign

Yesterday, Aug 24, 2011, the ifamous CEO of Apple, Steve Jobs, resigned. This was breaking news to many, and yet it was expected. Jobs had not been keeping good health in the recent past, and for the 3rd time in 7 years he had taken medical leave from duty. On the surface there is a flutter in the media about this announcement. However, like it or not, the man was not going to be there forever. So he made the right decision.
The important thing, however, is that this is not really about health. This is about business. The announcement was made after the stock markets in the US had closed for the day, on a warm sunny day in August. August is usually the quiet time of the year for Apple. All major announcements are either at the beginning of the year (Jan) or during June. This year was a bit different. There were no big, groundbreaking announcements this year. There were product launch delays as well. Whether that was by design or circumstantial is another thing. Yet there is no denying that the company is doing quite well in business, making money and keeping up the interest in its i-everything product line. So the timing was fine. You don't want to look embarrassed when you decide to leave or your stockholders kick you out. Rather, you want to look like a winner and someone who delivered more than what was promised.
That said, look at Jobs. College dropout, fired from the same company he founded, only to return 12 years later and lead it for the next 14 years. He is not an engineer or inventor. Rather, he is a highly creative and visionary personality who has his say. That he has good sales skills doesn't hurt either. If you notice, Apple's products are neither cutting-edge inventions nor the cheapest ones, but rather a good mix of aesthetic appeal, strictly controlled hardware+software and pure marketing. And Jobs has a big hand in this. He brought life back to a dying company and converted it into a highly profitable one. If he didn't have some talent in him he would not have been able to do it. He knows his strengths and weaknesses and uses them well. That's his secret. The computer industry has definitely benefited from his thinking. And that's about it. The computer industry must now look beyond and find newer ways of re-inventing itself. There is a lot of potential yet to be tapped for sure.

Monday, August 1, 2011

Operating System Concepts

My Operating System Study Notes for the World...

Jul 01
=====

-an OS is software that manages the hardware.
-an OS provides an environment for other programs to run efficiently.
-a good OS will be both efficient and convenient to use.

-the design of an OS depends on the type of hardware it is going to run on.
-otherwise its fundamental role is as summarized above.

-a complete computer system = hardware + OS + non OS programs + user
-in the above, non OS programs = system programs + application programs.

-in a general sense, the OS is the one program that runs all the time: the kernel.

-note:  one bit = a unit of storage (either 0 or 1); one byte = 8 bits; one word = n (bytes) where n is a variable dependent on the computer arch.

===xx==

Jul 03
=====

-bootstrap program = rom bios = firmware = first program to run on a computer when it is turned on.
-bootstrap initializes cpu registers, memory contents, device controllers and then it locates and loads kernel from disk into memory.
-after all this, the os runs its first program called init and waits for an event to happen.

-an event can be an interrupt (from hardware) or a trap (from software system call).
-when an interrupt happens, the cpu stops what it is doing, saves its current state (registers, program counter) to memory and attends to the interrupt.

-the cpu and device controllers compete for memory cycles via signals that travel on a common pathway called the 'system bus'.
-the kernel (read OS) has a device driver for each device controller.
-the device driver understands the device controller intimately and presents a uniform interface to the kernel to communicate.

-normal communication between cpu and device:
...interrupt > driver loads registers in controller > controller examines contents of registers > action (read/write..) > interrupt > controller tells driver 'work done' > driver gives status to kernel...

-direct memory access (dma) between device and memory:
...interrupt > driver loads registers in controller > controller examines contents of registers > controller moves a whole block of data between its buffers and memory without involving the cpu > one interrupt per block tells the driver 'work done' > cpu gets the info directly from memory...

===xxx==

Jul 04 & 05
=====

-Three Benefits of multi-Processor systems                              [ Note: Multi-Processor  not Multi-Processing ]
    . Parallelism     => More work in less time                       [ Note: the speed-up from N procs is less than N, due to management overhead ]
    . Economy       => More power less cost
    . Reliability       => More fail over ability
-Asymmetric Multi Processor System = Master Slave architecture = one master cpu assigns tasks = less common
-Symmetric   Multi Processor System = Active-Active architecture = every cpu shares work = more common
-Clusters are different from AMP or SMP systems.
-Clusters are individual systems connected by high speed n/w.
-AMP or SMP systems are part of one big system, share memory and devices.

-Multiprogramming    => The mechanism of keeping multiple programs in memory and switching the cpu to another one whenever the running one has to wait (eg for io).
-Multiprogramming optimizes cpu utilization since one single program cannot practically keep the cpu busy all the time.
-Multiprogramming requires a pool of jobs in the memory (virtual or real) that take turns to run on the cpu.
-Multiprogramming does not aim for user interaction. Multitasking does.

-Multitasking is a logical extension of Multiprogramming. It facilitates user interaction.
-Multitasking switches between the pool of jobs rapidly to allow the user to interact with them while jobs are executing.
-Multitasking as described above consists of one single user running multiple programs.
-Multitasking can also have multiple users running multiple programs. That is called Time sharing.

-Both Multiprogramming and Multitasking require job scheduling for the cpu.
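
As a rough sketch of how a scheduler can switch the cpu between a pool of jobs, the snippet below simulates round-robin time slicing. Round-robin is only one possible policy and is not named in these notes; the function name `round_robin`, the job names and the quantum are assumptions made for the example.

```python
from collections import deque


def round_robin(jobs, quantum):
    """Simulate time slicing over a pool of jobs.

    jobs: dict mapping a (hypothetical) job name to the cpu time it needs.
    quantum: the longest slice a job may run before the cpu is switched.
    Returns the list of (job name, time run) slices in execution order.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    ready = deque(jobs.items())        # the pool of jobs waiting for the cpu
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)  # run for one slice, or until done
        timeline.append((name, ran))
        if remaining > ran:            # unfinished jobs rejoin the queue
            ready.append((name, remaining - ran))
    return timeline
```

A small quantum makes every job appear to progress at once, which is what lets a user interact with several programs while they run.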

-the OS has two modes of operation - kernel mode and user mode.
-in kernel mode - cpu executes OS code. the hardware mode bit is set to 0.
-in user   mode    - cpu executes user code. the hardware mode bit is set to 1.
-the mode bit allows us to know if a task is executed as OS (kernel) code or user code.
-instructions that can affect the entire system may only be executed in kernel mode and are therefore called 'privileged instructions'.
-privileged instructions include IO, Timer & Interrupt management etc.

Jul 06
=====
-A process is a program that is running on the cpu.
-with each clock tick, the cpu runs a program instruction to progress the process.
-a process may have multiple threads of simultaneous instructions. that requires one program counter per thread.

-memory is a large storage space that the cpu is able to access directly.
-in other words, instructions must be in memory for the cpu to be able to run them.
-if a program needs to be fetched and run, it follows this sequence:
    . magnetic disk > main memory > cache > cpu registers
-note: the speed of data access in the above is:
    . disk < mem < cache < register

Jul 07
=====
-system calls are an interface to the services made available by the os.
-user commands or actions invoke system calls.
-system calls are generally written in C, C++ and Assembly Language.
-even a simple action like reading a file from disk requires up to 10 system calls.
-usually the programmers don't have to deal with system calls directly.
-that is because the OS typically exposes its system calls through APIs.
-programmers call the APIs in their programs. APIs make the programs portable (ie they can move between two h/w platforms that offer the same API)
-an API is a set of functions; each function invokes the system calls needed to achieve a certain task.
-APIs are OS dependent.
-3 common API types are Win32, POSIX and the Java API.
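
To make the API layer a little more concrete, here is a minimal sketch that copies a file with Python's `os` module. On POSIX systems `os.open`, `os.read`, `os.write` and `os.close` are thin wrappers over the system calls of the same names. The function name `copy_file` and the chunk size are assumptions made for the example.

```python
import os


def copy_file(src, dst, chunk_size=4096):
    """Copy src to dst with low-level calls; returns the number of bytes copied."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            copied = 0
            while True:
                chunk = os.read(in_fd, chunk_size)   # one read request
                if not chunk:                        # empty result means end of file
                    break
                view = memoryview(chunk)
                while view:                          # a write may be partial
                    written = os.write(out_fd, view)
                    view = view[written:]
                copied += len(chunk)
            return copied
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
```

Even this small copy goes through open, read, write and close requests, which is why everyday code normally uses a higher-level call such as `shutil.copyfile` instead.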

-interprocess communication permits multiple processes to share resources and co-exist.
-two common models for ipc are - message passing model and shared memory model.
-in message passing, messages are exchanged through the kernel, eg via a mailbox or queue that the processes send to and receive from.
-in shared  memory,  processes exchange data by reading and writing a common memory area, coordinating access through locks, latches etc.
-message passing is useful for small amounts of data whereas shared memory is useful for larger amounts of data.
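
The two ipc models can be sketched side by side. This is a minimal, Unix-only illustration (it relies on `os.fork`): a pipe stands in for message passing and an anonymous `mmap` region stands in for shared memory. The function names and the idea of sending one small, non-empty payload are assumptions made for the example.

```python
import mmap
import os


def message_passing(payload):
    """Child sends payload to the parent through a kernel-managed pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child: the sender
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, payload)
            os.close(write_fd)
            status = 0
        finally:
            os._exit(status)              # never fall through into parent code
    os.close(write_fd)                    # parent: the receiver
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                     # sender closed its end
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


def shared_memory(payload):
    """Child writes payload into a region that the parent has mapped too."""
    region = mmap.mmap(-1, len(payload))  # anonymous mapping, shared across fork
    pid = os.fork()
    if pid == 0:                          # child: the writer
        status = 1
        try:
            region[:len(payload)] = payload
            status = 0
        finally:
            os._exit(status)
    os.waitpid(pid, 0)                    # crude synchronization: wait for the writer
    data = bytes(region[:len(payload)])
    region.close()
    return data
```

In the pipe version the kernel copies every byte from sender to receiver; in the mapped version the data is written once and both processes see it, which is why shared memory suits larger amounts of data but needs the processes to coordinate access themselves.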

Jul 08
=====
-virtualization discussion:
-the fundamental idea behind virtualization is abstraction.
-ie the abstraction of cpu, memory, disks, network cards, ports etc into multiple execution environments.
-each vm on a given system owns one of these execution environments and has no knowledge of the other vms.
-two common abstraction techniques are a) cpu scheduling and b) virtual memory.

-vms first came in IBM VM370 Mainframes in 1972.
-vms allow for R&D, cost savings, consolidation         (business benefits)
-vms allow for process isolation and resource efficiency     (technical benefits)
-vms require some level of hardware support.

-vms methods:
-simulation  -    in this vm method, host system tricks the guest. guest runs its own exec env and needs no modifications.
-para-vm     -    in this vm method, guest needs a modification. the exec env is not a baremetal, but similar env.
-examples:
    .simulated vm -    vmware                -hypervisor runs as an application in the host/physical user mode.
    .para-vm   vm -    solaris containers/zones    -hypervisor runs in host kernel mode.

Jul 09
=====
-two ways of booting a computer:
    . bios > bootstrap loader from mbr > bsl loads kernel into ram and runs it
    . bios > bootstrap loader from mbr > bsl loads a 2nd boot program from disk to mem > 2nd bp loads kernel into ram
-bios lives in rom. that's why it is also called rom-bios.
-bios lives in rom because at system startup ram status is unknown and rom is incorruptible (eg: by viruses)
-os kernel lives on 2nd storage (disk) bec such os-kernel are usually bigger than rom.
-some systems like cellphones, game consoles etc store entire os kernel in rom bec such os are usually small.
-rom is sometimes called firmware as their properties are both like firm hardware and invisible software.
-rom speed is slower than ram speed.
-as such sometimes for devices that store os kernel in rom, the os kernel is first loaded into ram and then run.

Jul 10
=====

-Process:
-a process is a running program. It is the unit of work in a system.
-a process needs cpu, ram and io for doing its work.
-a process exists in ram and is operated upon by the cpu.
-a process consists of 5 elements - text, data, heap, empty (the free space between heap and stack) and stack.
-a process can be in one of 5 states - new, ready, running, waiting, terminated.
-a process can have multiple threads within it.
-processes are tracked using the process control block.

-a process scheduler does timesharing.
-timesharing or multitasking - act of switching the cpu between multiple programs frequently.
-the frequent switching allows users to interact with each program while those programs run.

-the first process in a system is sched (pid=0).
-the 2nd   process in a system is init  (pid=1). sched creates init.
-ps -el shows all the procs.
-child process creation is a two step act.
-step 1 - parent clones itself using fork()        (two identical procs exist)
-step 2 - clone  runs own binaries using exec()        (child gets its own identity and goes its own way)
-note: when the child proc ends it reports its status to the parent.
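
The two steps above, plus the status report back to the parent, can be sketched with Python's wrappers around `fork()`, `exec()` and `wait()`. This is a Unix-only sketch; the helper name `run_child` and the exit code 127 used when exec fails are assumptions made for the example.

```python
import os
import sys


def run_child(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()                      # step 1: parent clones itself
    if pid == 0:
        try:
            os.execvp(argv[0], argv)     # step 2: clone replaces itself with a new program
        finally:
            os._exit(127)                # only reached if exec failed
    _, status = os.waitpid(pid, 0)       # parent collects the child's status
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)          # child was killed by a signal


if __name__ == "__main__":
    code = run_child([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)
```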

Jul 11
=====
-Thread:
-a thread is actually a thread of execution within a process.
-as such a thread is a subset of a process.
-in other words, one or more threads make up a process.
-a thread is a unit of cpu utilization; a process is a unit of work in a system.
-a thread shares resources with other threads in the process.
-a thread shares the text, data and heap of a process. The stack is not shared.
-a thread has its own id, stack and pointers; other resources are shared with other threads of the process.
-having threads allows a process to do multiple things at the same time.
-eg: an email process can send one mesg and display another simultaneously bec of threads.
-threads make systems more efficient in terms of resources required; they save cost and increase scalability.
-on a single core system, thread execution is interleaved; ie only one thread runs at any instant and the cpu switches between them.
-on a multi  core system, thread execution is parallel; ie the threads can run on individual cores.
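
A minimal sketch of what is shared and what is private: every thread below updates one counter that lives in the process's heap, while each keeps its own local variable on its own stack. The function name `run_workers` and the thread and increment counts are assumptions made for the example.

```python
import threading


def run_workers(n_threads=4, increments=1000):
    """Return (shared counter, list of per-thread local counts)."""
    shared = {"counter": 0}            # heap object: visible to every thread
    lock = threading.Lock()            # protects the shared data
    local_totals = []

    def worker():
        local = 0                      # local variable: private to this thread's stack
        for _ in range(increments):
            local += 1
            with lock:                 # shared data needs coordinated access
                shared["counter"] += 1
        with lock:
            local_totals.append(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"], local_totals
```

The lock is the price of sharing: without it, two threads could read and write the counter at the same moment and lose updates.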

-both user and kernel processes can have threads.
-user threads can be mapped to kernel threads in one-to-one, many-to-one or many-to-many model.
-linux and windows use one-to-one model.
-HP-UX, Tru64 and older solaris (before version 9) use the two-level model: many-to-many, plus the option to bind a user thread to one kernel thread.

-Pthreads are a standard by POSIX.
-Pthreads define how threads are to be defined and operated.
-Linux, Unix use Pthreads and customize it accordingly.
-Windows and Java have their own thread standards (Win32 and Java API).

-In Linux, the distinction between processes and threads is blurred.
-Linux uses a generic term task for them.
-Linux has the traditional fork(), exec() functions.
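-It also has a clone() function that creates tasks.
-clone() takes flags that decide how much the new task shares with its parent (eg address space, open files, signal handlers).
-with everything shared the new task behaves like a thread; with nothing shared it behaves like a fork()ed process.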

Jul 12
=====
-Computer memory consists of a series of bytes or words each with its own address.
-note:    byte=8 bits, word=n(bytes) where n=variable dependent on computer arch.
-to execute a program, cpu reads contents of an addr, decodes it, reads variables, operates and repeats.
-every running program (ie process) has an unique address space.
-the address space is delimited by a base register and a limit register.
-the base register=starting point of addr space and limit register=size of address space.
-while running a process, the cpu sees only that process's address space.
-it has no knowledge of the other address spaces in memory.
-in other words, the running program does not know that there are multiple processes existing in the memory.
-an address generated by the cpu is called a logical  address.
-an address seen by the memory unit is called a physical address.
-the logical to physical mapping is done in a hardware device called memory management unit (mmu).
-the mapping consists of adding a relocation register value to the logical address.
-physical address=logical address + relocation register
-note: a process must be in physical memory to be executed.
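
The mapping done by the mmu can be sketched in a few lines: check the logical address against the limit register, then add the relocation (base) register. The function name `relocate` and the register values used in the usage note are illustrative assumptions.

```python
def relocate(logical, relocation, limit):
    """Map a logical address to a physical one, as a simple mmu would.

    relocation: start of the process's address space in physical memory.
    limit: size of the process's address space.
    """
    if not 0 <= logical < limit:
        # real hardware would trap to the OS here (addressing error)
        raise MemoryError(f"logical address {logical} outside limit {limit}")
    return logical + relocation
```

For example, with a relocation register of 14000 and a limit of 1000, logical address 346 maps to physical address 14346, while logical address 1000 is rejected.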

-swapping - the act of moving a process address space from physical memory to disk for space management.
-for reasons of efficiency, it's better to have all the address units within an address space be contiguous.
-this is called 'contiguous memory allocation'.
-fragmentation = the state in which free memory is broken up into many small non-contiguous holes, so a process may not fit even though enough total memory is free.
-to avoid fragmentation, the system has to work extra (eg by shuffling processes around) to keep all the address units in a contiguous fashion.
-so while contiguous mem allocation is simple and fast to address, the system has to work harder to achieve it.

-paging    - the memory management scheme where non-contiguous physical addresses are permitted.
-it avoids the need for the system to continuously try to fit address units contiguously.
-in the paging scheme, both physical and logical memory are subdivided into smaller chunks of fixed size.
-frames - units of fixed size in physical address under the paging scheme.
-pages  - units of fixed size in logical  address under the paging scheme.
-so the cpu sees a stream of pages whereas the memory sees a stream of frames.
-usually the page size is a power of 2 (eg 2kb, 4kb, etc).
-the pages and frames are resp tracked in a page table and frame table.
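
A minimal sketch of paged address translation: the logical address is split into a page number and an offset, the page table maps the page to a frame, and the offset is carried over unchanged. The function name `page_translate`, the 4kb page size and the dict used as a page table are assumptions made for the example.

```python
PAGE_SIZE = 4096  # 4kb, a power of 2


def page_translate(logical, page_table, page_size=PAGE_SIZE):
    """Translate a logical address to a physical one via a page table.

    page_table: dict mapping page number -> frame number.
    Raises KeyError if the page has no frame assigned.
    """
    page, offset = divmod(logical, page_size)  # split the address into page + offset
    frame = page_table[page]                   # page table lookup
    return frame * page_size + offset          # same offset inside the frame
```

With pages 0 and 1 mapped to frames 5 and 2, logical address 4100 (page 1, offset 4) lands at physical address 2 * 4096 + 4 = 8196, even though the two frames are nowhere near each other.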

Jul 13
=====
-segmentation - is an alternative to memory management based on paging and/or contiguous memory allocation (cma).
-in segmentation, a process's memory is divided up into logical segments (eg code, data, stack) that are differently sized and need not be contiguous with each other.
-notice that paging divides memory in equally sized units (but usually not contiguous)
-notice that cma    gives each process one single contiguous block.
-most cpu architectures support a mix and match of various memory schemes (eg paging inside of segments)
-there logical addr > segmentation unit > linear addr > paging unit > physical addr.

-virtual memory:
-the previously discussed memory management methods (cma, paging, segmentation) have one goal.
-the goal = keep as many processes in memory simultaneously as possible to permit multiprogramming.
-they also have a common flaw.
-the flaw = all of them require that the entire process be present in memory before it can be executed.
-this requires more resources since all components of a process may not be changing or required all the time.
-virtual memory is the technique to try to solve this problem.
-virtual memory allows the execution of processes that are not entirely in memory; it also makes it easy for processes to share memory.
-advantage = the system can then run processes that are larger than the physical memory.
-virtual memory creates an abstraction between the process and the memory.
-in this abstraction, part of the 2nd storage (disk) is considered as an extension of memory.
-this area of 2nd storage is also organized like the actual physical memory.
-thus a process is unable to tell the difference between physical memory and its disk based extension.
-downside = disk based memory is slower; virtual memory is more complex to implement.
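
To see why a process need not be entirely in memory, here is a small simulation of demand paging: a page is brought in from disk only on its first use (a page fault), and when all frames are full an old page is pushed back out. The notes do not name a replacement policy; first-in-first-out is used here purely as an assumption, as are the function name `count_page_faults` and the sample reference strings.

```python
from collections import deque


def count_page_faults(references, n_frames):
    """Count page faults for a sequence of page references.

    references: iterable of page numbers in the order the process touches them.
    n_frames: number of physical frames available to the process.
    """
    if n_frames < 1:
        raise ValueError("need at least one frame")
    resident = set()       # pages currently in physical memory
    arrival = deque()      # order in which resident pages were loaded
    faults = 0
    for page in references:
        if page in resident:
            continue                       # hit: page already in memory
        faults += 1                        # fault: page must come from disk
        if len(resident) == n_frames:      # memory full: evict the oldest page
            resident.remove(arrival.popleft())
        resident.add(page)
        arrival.append(page)
    return faults
```

A process touching pages 1, 2, 3, 1, 4, 1 runs in only 3 frames at the cost of 5 faults; with 4 frames it takes 4 faults. Every fault is a slow disk access, which is the downside noted above.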

Jul 14 & 15
=====
-storage attachments:
-host attached storage (has):
-disk storage is directly connected through local io ports.
-the ports could be - ide, ata, sata, scsi, fibre channel.
-usually these ports use a protocol that needs cables (eg scsi ribbon, ide ribbon cable etc)
-also called das - direct attached storage.

-network attached storage (nas):
-storage accessed via a remote system.
-uses rpc interface like nfs, cifs.
-rpc is carried via tcp or udp over an ip network, usually on the same lan.
-easier to share storage between systems but lower performance than das.
-iscsi is a type of nas.
-in iscsi, scsi protocol is carried over ip n/w (instead of scsi over cables or rpc over ip).
-this way iscsi gets the simplicity of nas shared storage with performance of das.

-storage area network (san):
-one drawback of nas is that it increases the traffic and thereby the latency of the ip n/w lan it uses.
-this hits performance.
-san is an answer to that.
-san is a private network based on a dedicated storage protocol instead of a network protocol.
-this provides speed and flexibility.
-eg dedicated storage protocols - fibre channel, infiniband (special bus arch based on h/w and s/w)

-booting:
-bootstrap is a tiny program that locates and loads os kernel into memory.
-bootstrap usually lives in the rom
-this is convenient bec rom does not need initialization and the processor knows about its location from the get go.
-also being rom, it is not infected by viruses.
-further, being read only means it cannot be modified, therefore it is strict and inflexible.
-for this reason, some computers put a 1st level code in the rom whose only purpose is to locate and load a 2nd boot program.
-this 2nd boot program lives on disk, is flexible and locates and loads the os kernel.
-a disk that contains the 2nd bootloader program is called a boot disk or system disk.
-mbr is the 1st sector of the boot disk, it has the 2nd bootloader and partition table.
-mbr = 2nd bldr + parti table.
-boot proc:
-power on -> 1st bootstrap from rom bios -> 2nd bootloader from mbr -> os kernel from 1st sector of boot partition
-boot partition != boot disk.
-boot disk = disk containing 2nd bootloader program
-boot parti= partition containing os kernel
-boot sector = 1st sector of boot partition
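
As a rough illustration of 'mbr = 2nd bldr + parti table', the sketch below parses a 512-byte sector using the classic PC MBR layout: 446 bytes of boot code, four 16-byte partition entries and the 2-byte signature 0x55 0xAA. That layout goes beyond what these notes state, and the function name `parse_mbr` and the returned field names are assumptions made for the example.

```python
import struct

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446      # 2nd bootloader code lives in bytes 0..445
ENTRY_SIZE = 16           # each of the 4 partition table entries
SIGNATURE = b"\x55\xaa"   # last two bytes of a valid mbr


def parse_mbr(sector):
    """Return the non-empty partition table entries of an mbr sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not a valid mbr sector")
    partitions = []
    for index in range(4):
        start = BOOT_CODE_SIZE + index * ENTRY_SIZE
        # status byte, skip 3 bytes, type byte, skip 3 bytes, start sector, size
        status, ptype, lba_start, n_sectors = struct.unpack(
            "<B3xB3xII", sector[start:start + ENTRY_SIZE])
        if ptype != 0:                    # type 0 marks an unused entry
            partitions.append({
                "index": index,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": n_sectors,
            })
    return partitions
```

The entry flagged as bootable is the boot partition, and its first sector (`lba_start`) is the boot sector from which the os kernel load continues.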

Jul 16,17
=====

-malicious code:
-trojan    - program that does something other than what it is expected to do.
-eg: pretend to be a system cmd like 'cd' but actually ends up deleting files
-eg: login spoofer, shows dummy login screen to retrieve passwd.

-virus    - self replicating embedded code causing harm.
-is embedded inside another good program.
-replicates and causes harm (eg deleting files, formatting disks etc).

-worm    - self replicating program.
-causes system performance problems bec it rapidly creates many copies of itself, even on remote systems.
-morris worm (1988 at cornell univ)

-ssl - a cryptographic protocol developed initially by netscape.
-allows web browsers to communicate securely with web servers.
-the session key used for encryption is valid only during the life of the session.

-dmz=semi-secure, semi-trusted domain outside the firewall.
-whats in it:
    . internet  ->  dmz        = allowed            -bec protected by f/w
    . internet  <-  int n/w    = allowed            -bec protected by f/w
    . internet  <-  dmz        = not allowed
    . int n/w   ->  dmz        = not allowed fully

-security ratings - A B C D   where A is highest and D is lowest.
-C2 has higher security than C1. C2 = C1 + individual level access control.
-most unix is C1. linux is C2. windows NT is C2?

Linux:
=====
[not covering Linux History; often repeated it is.]
-linux is intimately related to the Internet.
-linux was distributed and developed over the Internet.
-linux kernel code is written entirely from scratch.
-linux system code (ie non kernel) is a mix of original and borrowed code.
-eg: linux system code borrows from GNU, MIT X, BSD etc.
-note: while linux has benefited from BSD code, recently linux code also has been borrowed by BSD & its derivatives (eg FreeBSD).
-linux is licensed under the GNU General Public License (GPL).
-linux is not a public domain s/w.
-in public domain s/w the original author gives up all copyright-rights.
-in gpl, authors maintain rights to their code but allow reading, modification and sharing (rms) of their s/w.
-eg: if author A writes code a and releases under gpl, and author B improves it then
-eg:    author B cannot wipe credits to author A and claim the code as his own.
-eg:    this way copyrights of both author A and author B are retained, yet being freely available.
-eg:    this is called copyleft as a pun on traditional meaning of copyright.

-linux kernel is monolithic - ie all the kernel code shares one address space. this improves performance.
-yet the linux system is modular. the kernel can load or unload modules as needed.
-modularity benefits:
    . linux kernel is guaranteed to be freely available under gpl. no one can add proprietary components and make it proprie..
    . linux kernel need not be re-compiled every time a new module is added. only the module needs to be compiled.
-loadable kernel modules run in privileged kernel mode and thus have full access to all system resources.
-loadable kernel modules are loaded into their own freshly allocated area of kernel memory, but once loaded they share the kernel's address space.