TimeLinux1

Sunday, October 16, 2011

Politics and IT Governance

These days, Politics and regulation are increasingly seen crossing paths with Information Technology.
A few days ago there was news of Google being summoned before a committee to explain the incidents of its mobile software tracking the geographical location of cellphone users and keeping a history of where they have been. More recently, regulators in several states have been trying to work out the effect of imposing sales tax on e-commerce. Many online retailers have threatened to move their business to other states if the sales tax were imposed, which would mean the state loses valuable employment and the state exchequer stands to lose money one way or the other. Besides, antitrust lawsuits against monopolistic businesses are not uncommon.
So no matter the actual circumstance, it is not infrequent that Politics and Technology businesses rub against each other.
The essential element here is that Tech businesses are, after all, just another business affecting people's livelihoods and the economic activity the administration oversees. Protecting the interests of the economy in general is an important political duty of the elected representatives. On the other hand, providing services to people and thereby creating economic activity is the very essence of any business. It is no surprise, then, that there is a lot of lobbying in most industries to 'buy' legislation in their favor. So there is a need for moral ground on both sides to balance need and greed. Every business exists for some economic activity, the for-profit ones more so. If their greed overtakes the need for law-abiding activity, the role of the government is clearly to step in. On the other hand, if the government turns a blind eye to illegal activity, then events like 'occupy wall street' are a natural consequence...

Thursday, October 13, 2011

Dennis Ritchie (1941 - 2011)

Another one of the important figures in the world of computing has passed away. Dennis Ritchie was the creator of the C programming language and, along with Ken Thompson, was instrumental in rewriting the Unix OS in C in 1973. They were at Bell Labs and were among the pioneers of multitasking, multiuser, multiplatform computing. The trio of Brian Kernighan, Ken Thompson and Dennis Ritchie is permanently linked to the world of computing as we know it today. All the major OSes of today - Linux, OSX, Solaris, AIX, even Windows - have their origins in the design and architecture of Unix and C. Unix provided the fundamental concepts of ACLs, filesystem hierarchy, process scheduling, memory management and threads that all of these OSes borrow. And C, of course, is the language in which these concepts are brought to reality in the programs these OSes use as their foundation. True, in a world of IT dominated by business tycoons and CEOs, it is very difficult to notice and recognize the importance of the REAL computer scientists who were instrumental in making technology a thing of everyday life. Their contribution to human evolution and knowledge sharing is immense and permanent. They are the real visionaries who actually changed the world, and the world as we know it today would be very different had they not come along and changed the landscape with their vision, hard work and inventions...

Friday, October 7, 2011

Steven Paul Jobs

Today's writing is about Steven Paul Jobs, fondly known as Steve Jobs, Apple cofounder and a visionary who changed the world in several ways.
It was two days ago, as I was walking out of the office around 5pm to go home, that someone broke the news to me. It was an immediate, piercing feeling. As I climbed down the stairwell to get out of the building and to the car, I started fiddling with my phone to see if there were any updates on Google News. Until that time, there was nothing about Jobs. The news was only about Amanda Knox or Rick Perry or the presidential elections. So I figured it was breaking news. As I drove home, the radio announcer on NPR said "Coming up we have some big news". And what followed was the news of Jobs's passing and a sheer deluge of coverage about the man.
For myself, I wish he had lived longer. People like him are rare. It is sad news that he is no more. He recreated the computer industry from an elite, geek-only place into a more appealing, simple and useful one. He was visionary, business leader, creative person, thinker, artist and poet in one. People like him are rare and this is a big loss. The world needs another Jobs. He showed the way and lived it too.
I think after a few days, when the sheer volume of news about his demise subsides, it will be a more reasonable time for reflection. Right now it is just plain shock and emotion. Yet, as difficult and heavy as it may seem, time moves on. The memories live on. And it is for us to ensure that the pioneering work does not stop. That we raise the bar higher and higher like Steve Jobs did. May he find peace..

Thursday, August 25, 2011

iEverything iResign

Yesterday, Aug 24, 2011, the ifamous CEO of Apple - Steve Jobs - resigned. This was breaking news to many and yet it was expected. Jobs had not been keeping good health in the recent past, and for the 3rd time in 7 years he had taken medical leave from duty. On the surface there is a flutter in the media about this announcement. However, like it or not, the man was not going to be there forever. So he made the right decision.
The important thing, however, is that this is not really about health. This is about business. The announcement was made after the US stock markets had closed for the day, on a warm sunny day in August. August is usually the quiet time of the year for Apple. All major announcements come either at the beginning of the year (Jan) or during June. This year was a bit different. There were no big, groundbreaking announcements, and there were product launch delays as well. Whether that was by design or circumstantial is another matter. Yet there is no denying that the company is doing quite well in business - making money and keeping up the interest in its i-everything product line. So the timing was fine. You don't want to look embarrassed if you decide to leave or your stockholders kick you out. Rather, you want to look like a winner and someone who delivered more than what was promised.
That said, look at Jobs. College dropout, fired from the same company he founded, only to return 12 years later and lead it for the next 14 years. He is not an engineer or inventor. Rather, he is a highly creative and visionary personality who has his say. That he has good sales skills doesn't hurt either. If you notice, Apple's products are neither the cutting-edge inventions nor the cheapest ones, but rather a good mix of aesthetic appeal, strictly controlled hardware+software and pure marketing. And Jobs has a big hand in this. He brought life back to a dying company and converted it into a highly profitable one. If he didn't have some talent in him he would not have been able to do it. He knows his strengths and weaknesses and uses them well. That's his secret. The computer industry has definitely benefited from his thinking. And that's about it. The computer industry must now look beyond and find newer ways of re-inventing itself. There is a lot of potential yet to be tapped for sure..

Monday, August 1, 2011

Operating System Concepts

My Operating System Study Notes for the World...

Jul 01
=====

-OS is software that manages the hardware.
-OS provides an environment for other programs to run efficiently.
-a good OS is both efficient and convenient to use.

-OS design depends on the type of hardware they are going to run upon.
-Otherwise their fundamental role is summarized above.

-a complete computer system = hardware + OS + non OS programs + user
-in the above, non OS programs = system programs + application programs.

-in a general sense, the OS is the one program that runs all the time: the kernel.

-note:  one bit = a unit of storage (either 0 or 1); one byte = 8 bits; one word = n (bytes) where n is a variable dependent on the computer arch.

===xx==

Jul 03
=====

-bootstrap program = rom bios = firmware = first program to run on a computer when it is turned on.
-bootstrap initializes cpu registers, memory contents, device controllers and then it locates and loads kernel from disk into memory.
-after all this, the os runs its first program called init and waits for an event to happen.

-an event can be an interrupt (from hardware) or a trap (from software system call).
-when an interrupt happens, the cpu stops what it is doing, offloads its current contents to memory and attends to the interrupt.

-the cpu and device controllers compete for memory cycles via signals that travel on a common pathway called the 'system bus'.
-the kernel (read OS) has a device driver for each device controller.
-the device driver understands the device controller intimately and presents a uniform interface to the kernel to communicate.

-normal communication between cpu and device:
...interrupt > driver loads registers in controller > controller examines contents of registers > action (read/write..) > interrupt > controller tells driver 'work done' > driver gives status to kernel...

-direct memory access  between cpu and device:
...interrupt > driver loads registers in controller > controller examines contents of registers > controller loads its data buffers to memory > cpu directly gets info from memory (instead of driver)...

===xxx==

Jul 04 & 05
=====

-Three Benefits of multi-Processor systems                              [ Note: Multi-Processor  not Multi-Processing ]
    . Parallelism     => More work in less time                       [ Note: N parallel procs < N procs, in terms of work done due to management overhead ]
    . Economy       => More power less cost
    . Reliability       => More fail over ability
-Asymmetric Multi Processor System = Master Slave architecture = one master cpu assigns tasks = less common
-Symmetric   Multi Processor System = Active-Active architecture = every cpu shares work = more common
-Clusters are different from AMP or SMP systems.
-Clusters are individual systems connected by high speed n/w.
-AMP or SMP systems are part of one big system, share memory and devices.

-Multiprogramming    => The mechanism of running multiple programs on a cpu by rapidly switching between them.
-Multiprogramming optimizes cpu utilization since one single program cannot practically keep the cpu busy all the time.
-Multiprogramming requires a pool of jobs in the memory (virtual or real) that take turns to run on the cpu.
-Multiprogramming does not aim for user interaction. Multitasking does.

-Multitasking is a logical extension of Multiprogramming. It facilitates user interaction.
-Multitasking switches between the pool of jobs rapidly to allow the user to interact with them while jobs are executing.
-Multitasking as described above consists of one single user running multiple programs.
-Multitasking can also have multiple users running multiple programs. That is called Time sharing.

-Both Multiprogramming and Multitasking require job scheduling for the cpu.

-the OS has two modes of operation - kernel mode and user mode.
-in kernel mode - cpu executes OS code. A mode bit = 0 is added to the hardware.
-in user   mode    - cpu executes user code. A mode bit = 1 is added to the hardware.
-the mode bit allows us to know if a task is executed as OS (kernel) code or user code.
-kernel mode executes instructions for the entire system; these are therefore called 'privileged instructions'.
-privileged instructions include IO, Timer & Interrupt management etc.

Jul 06
=====
-A process is a program that is running on the cpu.
-with each clock tick, the cpu runs a program instruction to progress the process.
-a process may have multiple threads of simultaneous instructions. that requires one program counter per thread.

-memory is a large storage space that the cpu is able to access directly.
-in other words, instructions must be in memory for the cpu to be able to run them.
-if a program needs to be fetched and run, it follows this sequence:
    . magnetic disk > main memory > cache > cpu registers
-note: the speed of data access in the above is:
    . disk < mem < cache < register

Jul 07
=====
-system calls are an interface to the services made available by the os.
-user commands or actions invoke system calls.
-system calls are generally written in C, C++ or Assembly Language.
-even a simple action like reading a file from disk requires up to 10 system calls.
-usually the programmers don't have to deal with system calls directly.
-that is because the OS typically provides a set of system calls in the form of APIs.
-programmers call the APIs in their programs. APIs make the programs portable (ie they can move between two h/w platforms).
-APIs are a collection of system calls that achieve a certain task.
-APIs are OS dependent.
-3 common API types are Win32, Posix and the Java API (a small sketch using the Posix API follows below).
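-to make the api idea concrete, here is a minimal C sketch (my own illustration, not part of the original notes); the file name "notes.txt" is just a placeholder:

/* reading a file through the POSIX API; each call below (open, read,
 * write, close) ultimately issues a system call on the program's behalf. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[256];
    int fd = open("notes.txt", O_RDONLY);       /* system call: open  */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    ssize_t n = read(fd, buf, sizeof(buf));     /* system call: read  */
    if (n > 0)
        write(STDOUT_FILENO, buf, n);           /* system call: write */
    close(fd);                                  /* system call: close */
    return 0;
}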

-interprocess communication permits multiple processes to share resources and co-exist.
-two common models for ipc are - message passing model and shared memory model.
-in message passing, messages are stored in a common repository that the processes read from.
-in shared  memory,  messages are exchanged via access to common resources like memory areas, through locks, latches etc.
-message passing is useful for small amounts of data whereas shared memory is useful for larger amounts of data (a pipe-based sketch of message passing follows below).
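-a minimal sketch of the message passing idea, using an ordinary unix pipe (my own illustration; assumes a POSIX system):

/* the parent sends a short message to the child through a pipe; the kernel
 * carries the data, no memory is shared between the two processes. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    char buf[32];

    if (pipe(fds) < 0)
        return 1;

    if (fork() == 0) {                /* child: reads the message */
        close(fds[1]);
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child got: %s\n", buf);
        }
        return 0;
    }

    close(fds[0]);                    /* parent: writes the message */
    write(fds[1], "hello", strlen("hello"));
    close(fds[1]);
    wait(NULL);
    return 0;
}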

Jul 08
=====
-virtualization discussion:
-the fundamental idea behind virtualization is abstraction.
-ie the abstraction of cpu, memory, disks, network cards, ports etc into multiple execution environments.
-each vm on a given system owns one of these execution environments and has no knowledge of the other vms.
-two common abstraction techniques are a) cpu scheduling and b) virtual memory.

-vms first came in IBM VM370 Mainframes in 1972.
-vms allow for R&D, cost savings, consolidation         (business benefits)
-vms allow for process isolation and resource efficiency     (technical benefits)
-vms require some level of hardware support.

-vms methods:
-simulation  -    in this vm method, host system tricks the guest. guest runs its own exec env and needs no modifications.
-para-vm     -    in this vm method, guest needs a modification. the exec env is not a baremetal, but similar env.
-examples:
    .simulated vm -    vmware                -hypervisor runs as an application in the host/physical user mode.
    .para-vm   vm -    solaris containers/zones    -hypervisor runs in host kernel mode.

Jul 09
=====
-two ways of booting a computer:
    . bios > bootstrap loader from mbr > bsl loads kernel into ram and runs it
    . bios > bootstrap loader from mbr > bsl loads a 2nd boot program from disk to mem > 2nd bp loads kernel into ram
-bios lives in rom; that is why it is also called rom-bios.
-bios lives in rom because at system startup the ram status is unknown and rom is incorruptible (eg: by viruses)
-the os kernel lives on secondary storage (disk) bec such kernels are usually bigger than rom.
-some systems like cellphones, game consoles etc store the entire os kernel in rom bec such os are usually small.
-rom is sometimes called firmware because its properties sit somewhere between firm hardware and invisible software.
-rom speed is slower than ram speed.
-as such, on devices that store the os kernel in rom, the kernel is sometimes first loaded into ram and then run.

Jul 10
=====

-Process:
-a process is a running program. It is the unit of work in a system.
-a process needs cpu, ram and io for doing its work.
-a process exists in ram and is operated upon by the cpu.
-a process consists of 5 elements - text, data, heap, empty space and stack.
-a process can be in 1of5 states  - new, ready, running, waiting, terminated.
-a process can have multiple threads inbuilt.
-processes are tracked using the process control block.

-a process scheduler does timesharing.
-timesharing or multitasking - act of switching the cpu between multiple programs frequently.
-the frequent switching allows users to interact with each program while those programs run.

-the first process in a system is sched (pid=0).
-the 2nd   process in a system is init  (pid=1). sched creates init.
-ps -el shows all the procs.
-child process creation is a two step act (see the sketch below).
-step 1 - parent clones itself using fork()        (two identical procs exist)
-step 2 - clone  runs own binaries using exec()        (child gets its own identity and goes its own way)
-note: when the child proc ends it reports its status to the parent.
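-a minimal C sketch of the two-step creation above (my own illustration, not from the notes); the command run here, "ls -l", is just a placeholder:

/* step 1: fork() clones the parent; step 2: the clone replaces itself with
 * its own program via exec(); the parent then collects the child's status. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();                              /* two identical procs exist     */

    if (pid == 0) {                                  /* child                         */
        execlp("ls", "ls", "-l", (char *)NULL);      /* child gets its own identity   */
        perror("execlp");                            /* reached only if exec fails    */
        return 1;
    }

    int status;
    waitpid(pid, &status, 0);                        /* child reports status to parent */
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}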

Jul 11
=====
-Thread:
-a thread is actually a thread of execution within a process.
-as such a thread is a subset of a process.
-in other words, one or more threads make up a process.
-a thread is a unit of cpu utilization; a process is a unit of work in a system.
-a thread shares resources with other threads in the process.
-a thread shares the text, data and heap of a process. The stack is not shared.
-a thread has its own id, stack and pointers; other resources are shared with other threads of the process.
-having threads allows a process to do multiple things at the same time.
-eg: an email process can send one mesg and display another simultaneously bec of threads.
-threads make systems more efficient in terms of resources required; they save cost and increase scalability.
-on a single core system, thread execution is sequential.
-on a multi  core system, thread execution is parallel; ie the threads can run on individual cores.

-both user and kernel processes can have threads.
-user threads can be mapped to kernel threads in one-to-one, many-to-one or many-to-many model.
-linux and windows use one-to-one model.
-solaris, HP-UX, Tru64 use combinations of many-to-one and many-to-many model.

-Pthreads are a threading standard defined by POSIX.
-Pthreads define how threads are to be created and operated (see the sketch below).
-Linux and Unix use Pthreads and customize it accordingly.
-Windows and Java have their own thread standards (Win32 and the Java API).
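-a minimal Pthreads sketch (my own illustration; compile with gcc file.c -pthread, the file name is a placeholder):

/* one process starts two threads; the threads share the process data
 * segment (shared_counter) but each has its own stack and id. */
#include <pthread.h>
#include <stdio.h>

static int shared_counter = 0;                        /* shared by all threads    */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);                        /* shared data needs a lock */
    shared_counter++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", shared_counter);         /* prints 2 */
    return 0;
}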

-In Linux, the distinction between processes and threads is blurred.
-Linux uses a generic term task for them.
-Linux has the traditional fork(), exec() functions.
-It also has a new clone() function that creates tasks.
-this way it combines the two-step process of creating a child into one.
-this helps in operational efficiency.

Jul 12
=====
-Computer memory consists of a series of bytes or words each with its own address.
-note:    byte=8 bits, word=n(bytes) where n=variable dependent on computer arch.
-to execute a program, cpu reads contents of an addr, decodes it, reads variables, operates and repeats.
-every running program (ie process) has a unique address space.
-the address space has a base register and a limit register.
-the base register=starting point of addr space and limit register=size of address space.
-the cpu sees only a stream of addresses in memory.
-it has no knowledge of the multiple address spaces in memory.
-in other words, the cpu does not know that there are multiple processes existing in the memory.
-the cpu refers to the address space as logical  address.
-the mem refers to the address space as physical address.
-the logical to physical mapping is done in a hardware device called memory management unit (mmu).
-the mapping consists of adding a relocation register value to the logical address.
-physical address=logical address + relocation register
-note: a process must be in physical memory to be executed (a small sketch of the mapping follows below).
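-a small sketch of what the mmu does in the relocation-register scheme above (my own illustration; the numbers are made up):

/* physical address = logical address + relocation register,
 * provided the logical address falls within the limit register. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long relocation = 0x140000;   /* where the process sits in physical memory */
    unsigned long limit      = 0x20000;    /* size of its address space                 */
    unsigned long logical    = 0x01234;    /* an address as the cpu generates it        */

    if (logical >= limit) {                /* outside the address space: trap to the os */
        fprintf(stderr, "trap: addressing error\n");
        exit(1);
    }
    printf("logical 0x%lx -> physical 0x%lx\n", logical, logical + relocation);
    return 0;
}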

-swapping - the act of moving a process address space from physical memory to disk for space management.
-for reasons of efficiency, it is better for all the address units within an address space to be contiguous.
-this is called 'contiguous memory allocation'.
-fragmentation = the state in which the address units within an address space are not contiguous.
-to avoid fragmentation, the system has to work extra to fit all the address units in a contiguous fashion.
-so while contiguous mem allocation is good for memory usage, the system has to work harder to achieve it.

-paging    - the memory management scheme where non-contiguous physical addresses are permitted.
-it avoids the need for the system to continuously try to fit address units contiguously.
-in paging scheme, both physical and logical memory addresses are further subdivided into smaller chunks.
-frames - units of fixed size in physical address under the paging scheme.
-pages  - units of fixed size in logical  address under the paging scheme.
-so the cpu sees a stream of pages whereas the memory sees a stream of frames.
-usually the page size is a power of 2 (eg 2kb, 4kb, etc).
-the pages and frames are respectively tracked in a page table and a frame table (a small address-split sketch follows below).
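-a small sketch of how a logical address splits into page number and offset (my own illustration; 4kb pages assumed):

/* the page number indexes the page table; the offset is the position
 * inside the page (and inside the mapped frame). */
#include <stdio.h>

int main(void)
{
    unsigned long page_size = 4096;                   /* 2^12 bytes                */
    unsigned long logical   = 0x3A7F1;                /* made-up logical address   */

    unsigned long page   = logical / page_size;       /* index into the page table */
    unsigned long offset = logical % page_size;       /* position inside the page  */

    printf("logical 0x%lx -> page %lu, offset 0x%lx\n", logical, page, offset);
    return 0;
}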

Jul 13
=====
-segmentation - is an alternative to memory management based on paging and/or contiguous memory allocation (cma).
-in segmentation, the memory is divided up into memory structures that are differently sized and not contiguous.
-notice that paging divides memory in equally sized units (but usually not contiguous)
-notice that cma    divides memory in equally sized units (but placed contiguously).
-most cpu architectures support a mix and match of various memory schemes (eg paging inside of segments)
-in that case, physical addr = logical addr + linear addr (for segments) + relocation register.

-virtual memory:
-the previously discussed memory management methods (cma, paging, segmentation) have one goal.
-the goal = keep as many processes in memory simultaneously as possible to permit multiprogramming.
-they also have a common flaw.
-the flaw = all of them require that the entire process be present in memory before it can be executed.
-this requires more resources since all components of a process may not be changing or required all the time.
-virtual memory is the technique to try to solve this problem.
-virtual memory allows processes to share memory and thereby allows the execution of processes that are not entirely in memory.
-advantage = the system can then run processes that are larger than the physical memory.
-virtual memory creates an abstraction between the process and the memory.
-in this abstraction, part of the 2nd storage (disk) is considered as an extension of memory.
-this area of 2nd storage is also organized like the actual physical memory.
-thus a process is unable to tell the difference between physical memory and its disk based extension.
-downside = disk based memory is slower; vm is more complex to implement.

Jul 14 & 15
=====
-storage attachments:
-host attached storage (has):
-disk storage is directly connected through local io ports.
-the ports could be - ide, ata, sata, scsi, fiber.
-usually these ports use a protocol that needs cables (eg scsi ribbon, ide ribbon cable etc)
-also called das - direct attached storage.

-network attached storage (nas):
-storage accessed via a remote system.
-uses rpc interface like nfs, cifs.
-rpc is carried via tcp or udp over an ip network, usually on the same lan.
-easier to share storage between systems but lower performance than das.
-iscsi is a type of nas.
-in iscsi, scsi protocol is carried over ip n/w (instead of scsi over cables or rpc over ip).
-this way iscsi gets the simplicity of nas shared storage with performance of das.

-storage area network (san):
-one drawback of nas is that it increases the traffic, and thereby the latency, of the ip lan it uses.
-this hits performance.
-san is an answer to that.
-san is a private network based on a dedicated storage protocol instead of a network protocol.
-this provides speed and flexibility.
-eg dedicated storage protocols - fiber channel, infiniband (special bus arch based on h/w and s/w)

-booting:
-bootstrap is a tiny program that locates and loads os kernel into memory.
-bootstrap usually lives in the rom
-this is convenient bec rom does not need initialization and the processor knows its location from the get-go.
-also being rom, it is not infected by viruses.
-further, being read only means it cannot be modified, which makes it strict and inflexible.
-for this reason, some computers put a 1st level code in the rom whose only purpose is to locate and load a secondary boot program.
-this 2nd boot program lives on disk, is flexible and locates and loads the os kernel.
-a disk that contains the 2nd bootloader program is called a boot disk or system disk.
-mbr is the 1st sector of the boot disk, it has the 2nd bootloader and partition table.
-mbr = 2nd bldr + parti table.
-boot proc:
-power on -> 1st bootstrap from rom bios -> 2nd bootloader from mbr -> os kernel from 1st sector of boot partition
-boot partition != boot disk.
-boot disk = disk containing the 2nd bootloader program
-boot parti= partition containing the os kernel
-boot sector = 1st sector of boot partition

Jul 16,17
=====

-malicious code:
-trojan    - a program that does something other than what it is expected to do.
-eg: pretend to be a system cmd like 'cd' but actually ends up deleting files
-eg: login spoofer, shows dummy login screen to retrieve passwd.

-virus    - self replicating embedded code causing harm.
-is embedded inside another good program.
-replicates and causes harm (eg deleting files, formatting disks etc).

-worm    - self replicating program.
-causes system performance problems bec it rapidly creates many copies of itself, even on remote systems.
-morris worm (1988 at cornell univ)

-ssl - a cryptographic protocol developed initially by netscape.
-allows web browsers to communicate securely with web servers.
-the cryptography key is valid only during the life of the session.

-dmz=semi-secure, semi-trusted domain outside the firewall.
-the typical rules:
    . internet  ->  dmz     = allowed        -bec protected by f/w
    . internet  <-  int n/w = allowed        -bec protected by f/w
    . internet  <-  dmz     != not allowed
    . int n/w   ->  dmz     != not allowed fully

-security ratings - A B C D   where A is highest and D is lowest.
-C2 has higher security than C1. C2 = C1 + individual level access control.
-most unix is C1. linux is C2. windows NT is C2?

Linux:
=====
[not covering Linux History; often repeated it is.]
-linux is intimately related to the Internet.
-linux was distributed and developed over the Internet.
-linux kernel code is written entirely from scratch.
-linux system code (ie non kernel) is a mix of original and borrowed code.
-eg: linux system code borrows from GNU, MIT X, BSD etc.
-note: while linux has benefited from BSD code, recently linux code also has been borrowed by BSD & its derivatives (eg FreeBSD).
-linux is licensed under GNU PL (GPL).
-linux is not a public domain s/w.
-in public domain s/w the original author gives up all copyright-rights.
-in gpl, authors maintain rights to their code but allow reading, modification and sharing (rms) of their s/w.
-eg: if author A writes code a and releases under gpl, and author B improves it then
-eg:    author B cannot wipe credits to author A and claim the code as his own.
-eg:    this way copyrights of both author A and author B are retained, yet being freely available.
-eg:    this is called copyleft as a pun on traditional meaning of copyright.

-linux kernel is monolithic - ie all the kernel code shares one address space. this improves performance.
-yet linux system is modular. the kernel can load or unload modules to the kernel as needed.
-modularity benefits:
    . linux kernel is guaranteed to be freely available under gpl. no one can add proprietary components and make it proprietary.
    . linux kernel need not be re-compiled every time a new module is added. only the module needs to be compiled.
-loadable kernel modules run in privileged kernel mode and thus have full access to all system resources.
-loadable kernel modules are compiled separately from the kernel, but once loaded they run within the kernel's address space.

       
   

Thursday, July 7, 2011

Appstore Drama

All computer users know that a computer is useless without Applications. Applications are what you use when you read your email, listen to your music, watch video, write a document, edit a spreadsheet, browse the web and so on..
No wonder Applications are what make one computer better (or easier to use) than another. Applications have been around since computers dawned on the human race, and so has the term 'Application', fondly shortened to 'App' or 'app'. It is very important to understand the distinction between the generic term 'Application' or 'App' and an Application itself. The difference is the same as the one between, let's say, the English language and an author's work or creative idea. You can copyright your own literary work, but can you copyright the whole language itself? The answer is plainly 'NO'. The language is a common public entity, whereas one person's creative use of that language to write a beautiful poem or song is 'that' person's own original work and therefore entitled to individual protection.
Extend this analogy to the computer world. The word Application is like the English language in our example, whereas one particular application, let's say a mapping service or an online game, is like the author's original literary work. Therefore, when an organization tries to bamboozle its way in and copyright/trademark the generic term 'App' itself, that is as baseless as trying to copyright the whole language.
No wonder the Federal court recently threw out Apple's claim to the words 'App' and 'App store' and the derivative 'Appstore'. Such monopolizing mentality exhibited by Apple is illegal, unethical and counterproductive to innovation. Amazon, on the other hand, who is trying to launch its own 'Appstore' for Android, has every right to do so. The court made the right decision. We want this to continue in case they try to drag the lawsuit to higher courts.

Go here for more..

Thursday, June 9, 2011

icloud sucks

Everything about icloud sucks.
This is why -- it is another attempt to limit computer users' ability to run and use their files and programs seamlessly. It is locked in to the ixyz platform. You can't sync your music directly unless you use iTunes on all your devices. You can't share your pictures unless you use all ixyz hardware. Try doing this with a non-Apple phone. Can't do it. Same goes for other types of files like documents or video content.
So what's the point, mr apple? Why do we have to buy and run only your hardware and software to use our own programs and files? To add insult to injury, you dare to charge $25 for your sync service (imatch or icatch or whatever you call it). Perhaps you need to brand your products as 'irob' your rights or 'isteal' your money.

Go here for more.

Friday, May 27, 2011

Linux in Depth - 2

... continued ...

-Linux process management is similar to Unix but there are some imp differences.
-In Unix, process creation is a two step process:
    . a new process which is exact copy of the parent is created using the fork() system call.
    . the child is then allowed to run its own program using the exec() system call.
-the two steps are required for the following reasons:
    . step 1, fork(), is reqd bec otherwise the child will cause the termination of the parent process.
    . step 2, exec(), is reqd bec otherwise the child will continue running the same process as parent.
-in step 1, fork(), the child inherits the environment of the parent. eg: signals, address space, open files etc.
-in step 2, exec(), the child changes  the environment specific to the program to be run.
-To identify and manage a process the following properties are used by the os:
    . pid        -  for identification
    . credentials    -  for permissions
-also, a process may have one or more threads of execution within it.
-So the above was the Unix model of process management.

-in Linux, things are a little different.
-Linux does not differentiate between processes and threads.
-instead it refers to them as a third term called 'task'.
-task refers to a flow of control within a program. it resembles Unix threads.
-besides providing fork(), linux has a clone() system call that creates a task (like unix threads).
-one can think of clone() as a superset of fork().
-If the parent and child have the same env (signal, addr spc, files etc) then clone is equiv to fork.
-the main reason for this new model in Linux is efficiency.
-In Linux, the clone() call causes the context of a program to be stored in independent subcontexts.
-this independence of subcontexts helps in sharing resources between multiple processes.
-this means fewer resources can support more processes, which leads to increased efficiency in process mgmt (a clone() sketch follows below).
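-a minimal clone() sketch (my own illustration; linux-specific, and the flags and stack size are just one reasonable choice):

/* the new task shares the parent's address space (CLONE_VM), filesystem info
 * (CLONE_FS) and open files (CLONE_FILES), so it behaves like a thread rather
 * than a fully separate process. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int child_fn(void *arg)
{
    printf("task sees arg = %s\n", (char *)arg);
    return 0;
}

int main(void)
{
    size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);                 /* each task needs its own stack */
    if (!stack)
        return 1;

    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD;
    pid_t pid = clone(child_fn, stack + stack_size, flags, "hello");

    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}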

-Linux filesystem behavior is similar to that of Unix.
-anything that can handle an input, output stream of data is treated as a file.
-as such, devices are treated as files as well and managed by special device files.
-device files may be of three types - block, character and network.
-block devices - allow random access to data blocks. eg hard disks, optical disks, flash etc.
-char  devices - allow serial access to data. eg keyboard, mouse, display
-netw  devices - allow access to data via kernel. eg remote servers, disks etc.

-the kernel is informed of an event via traps (by procs) and interrupts (by devices).
-the processes are informed of an event via signals.
-signals allow processes to talk to each other. this is called interprocess communication.
-signals also are used by kernel to talk to processes.
-signals do not transfer data but only info about events.
-besides signals, a pool of semaphores are also used for interprocess communication.
-like signals, semaphores do not transfer data but only info about events.
-the actual data is transferred between processes using pipes and shared memory (a small signal sketch follows below).
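-a small signal sketch (my own illustration): the process installs a handler for SIGUSR1; another process (or the kernel) can then notify it with kill(pid, SIGUSR1); note that only the event travels, no data:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int signo)
{
    (void)signo;
    got_signal = 1;                        /* just record that the event happened */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigaction(SIGUSR1, &sa, NULL);

    printf("pid %d waiting for SIGUSR1...\n", getpid());
    while (!got_signal)
        pause();                           /* sleep until a signal arrives */
    printf("got the event\n");
    return 0;
}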

---xxx--

Tuesday, May 24, 2011

Linux in Depth - 1

-linux looks and feels like unix--compatibility with unix has been a major design goal of linux.
-the linux kernel was created and released in 1991, on the intel 80386, a 32 bit processor, by Linus Torvalds.
-first linux kernel version 0.01 was released May 14, 1991. It had no networking and limited device drivers.
-version 1.0 was released Mar 14, 1994. It had networking.
-current major version of linux kernel is version 2.6 that was released in 2003.
-while the kernel code for linux was written from scratch, non kernel code was modeled on GNU, X, BSD etc.
-in recent times, even BSD has taken Linux code in return; eg: pc sound h/w devices, certain math-libraries etc.
-linux is NOT a public domain s/w.
-in public domain s/w the author gives away the copyrights.
-in linux, copyrights are still held by various authors.
-rather, linux is free s/w in that, while maintaining the authors' copyrights, it permits anyone to read, modify and redistribute it.
-while re-distributing, they have to give away the binaries plus the source code.
-note that its ok to charge a price for the redistribution.
-this means linux is free as in freedom but not necessarily in price/money.

-three components of a linux system:
    . kernel    - does h/w abstraction eg: cpu scheduling, virtual memory etc.
    . system libr    - system calls, for apps to talk to kernel
    . system utils    - individual system procs like daemons that have a specific task
-the processor has two modes - privileged and non privileged.
-the kernel code executes in the processors privileged mode.
-the other  code executes in the processors nonpriv    mode.
-the 'other' code is everything non kernel code. it could be a system code or user code.
-in privileged mode, the code has full control of system h/w.
-the switch between modes occurs due to system calls, which trigger a 'context switch'.
-context switch (cs) is the switch of the processor between two processes.
-the two processes may be in different modes altogether - eg: privileged and unprivileged modes.
-during cs, the state of the current process must be saved in PCB before next process is scheduled.
-during cs, cpu is idle. so if the time spent in cs is minimized, system performance improves.
-cs is h/w dependent. ie, by better design, cs can be made more efficient.

-linux kernel code is a single monolithic binary, ie all the code is part of a single address space.
-it includes code for scheduler, virtual mem, device drivers, filesystem and networking.
-the main reason for putting all this code in one address space is 'performance'.
-bec all kernel code is in a single addr space, no context switch is reqd during os internal tasks.
-eg when a proc calls an os func or h/w interrupt occurs etc, no context switch is reqd.
-despite being monolithic, linux kernel is modular.
-non-essential code is kept outside of the kernel in the form of modules.
-these modules can be dynamically loaded or unloaded.
-modules run in kernel mode and have full access to the device they drive.
-two imp implications of modules outside of the kernel:
    . only module code needs to be recompiled (instead of whole kernel) if any changes made
    . module code can be distributed on non-GPL proprietary licenses
-in the above, the 2nd point (proprietary modules) would not be possible if the modules were inside the kernel.
-this is because the kernel is under GPL and forces all code in it to be GPLed.
-this is important because even though the modules run in kernel mode, they need not be under GPL like the kernel code.

... continued ...

Sunday, May 22, 2011

OS Concepts - Memory Management - 3

... continued from previous discussion ...

-virtual memory:
-contiguous memory allocation, paging, segmenting all have a common goal - multiprogramming.
-all of the above use swapping of processes in and out of memory to achieve the goal of multiprogramming.
-however, the shortcoming is that they all swap the 'entire' process between mem and disk and back.
-this is where another concept of 'virtual memory' or 'vm' comes in.
-the central idea behind virtual memory is that an entire process need not be swapped in and out of memory.
-instead a process can be divided up into smaller chunks and those smaller chunks can be swapped.
-in other words, virtual memory allows execution of processes that are not completely in memory.
-in fact, vm allows the existence of programs much larger than physical memory.
-to achieve this the os considers physical memory plus a part of storage as one logical chunk of memory.
-this logical chunk of mem+disk is called virtual memory.
-vm also allows sharing of memory between processes.
-physical memory is divided up into uniform sized frames.
-the frames are mapped to the logical pages, which are what the cpu operates on.
-the logical pages to physical frames mapping is stored in the page table.
-the page table is a software construct that exists in the memory mgmt unit (mmu), which is a hardware component.
-remember, the structure of a process address space is comprised of:
    . text         -program file
    . data         -global vars, constants
    . heap        -dynamic memory location
    . stack        -temp area for functions, params, vars etc
    . empty space    -between heap and stack for growth of either
-in the above, the empty space between the heap and stack is made up of virtual memory.
-so as needed, page-frames from this empty-space are swapped in and out.
-the implementation of vm is not without its pitfalls.
-if implemented incorrectly, it can degrade system performance.
-usually this happens in the case of page thrashing.
-thrashing is the condition in which the system is spending more time swapping and less time executing.
-this happens when the scheduler is overwhelmed with the page-frame swapping requests of running processes.

---xxx--

OS Concepts - Memory Management - 2

... continued from our previous discussion ...

-fragmentation - is a condition in which address spaces in memory are not contiguous.
-in other words, the address spaces are interspersed by blocks of memory that are unused.
-these blocks are unused as they are not sufficiently large to house address spaces belonging to any process.
-fragmentation causes inefficiency in memory usage.

-paging is a method to avoid or minimize fragmentation.
-paging divides:
    .physical memory into units called 'frames'
    .logical  memory (in cpu registers) into units called 'pages'
-the size of frames and pages is defined by the hardware and is a power of 2 bytes. eg 512 bytes, 1 KB etc..
-frames and pages are usually same size.
-when a process requests data from disk, a frame is requested (from disk) and loaded as page (in memory).
-since the size of page=size of frame, the blocks fit exactly and there is no wastage of mem (ie fragmentation).
-a structure in the MMU called 'page table' keeps track of which frame is loaded as which page.
-note: MMU is a hardware device while page table is a logical construct within mmu.

-segmentation is another method of memory management.
-in segmentation, the size of a memory unit is not as important as its content (as opposed to paging).
-in a way, segmentation is a user's perspective of the memory.
-each segment is defined by a name and a length or offset.
-the segment name (by user) to physical address (in memory) is mapped in a 'segment table'.

-in certain computer architectures, both paging and segmentation can coexist.
-Intel architecture is an example where both the methods can coexist.
-on Intel arch, linux has 6 segments for kernel code, kernel data, user code, user data, tasks and a default.
-linux also has 3 level paging strategy for supporting other cpu archs.

... continued ...

Friday, May 20, 2011

Xoom - Motorola, what were you thinking?

I'm a bit late in commenting on this, but better late than never. Actually I wanted the initial hype to die down a little bit before I made my assessment.
OK. So let's talk about the Motorola Xoom.
It's a new tablet from Motorola based on Google's Honeycomb. It was launched earlier this year with a lot of fanfare and hype prior to its launch. Ads touting the Xoom to be the next thing in mobile computing, the best thing that happened since the advent of stone tablets, ads mimicking the 1984 Superbowl ad and what not..
So all in all, the folks in Motorola spent a lot of time in the Marketing Blitz.
Then came the product itself, and the next thing we know, it's a dud.
It's a dud because, despite all the hype and hoopla, it's not selling as it was expected to and as it should be.
Why not? First off, the folks who decide product design at Motorola need to be fired. What is great about a tablet that is heavier, thicker and smaller than the competition? Why tout it as the best thing to have happened since human beings adopted the use of tablets? I can't understand. Perhaps you do. No wonder they are unable to sell the product on the sheer appeal of the tablet. The competition beats it in look and appeal.
Secondly, Motorola, you need to fire the folks who set the pricing. Why on earth would you introduce a product that is more expensive, actually at least $100 more, than the similar model from the competition? Why on earth would anyone buy a tablet for $800 (Wifi + 3G) or $600 (Wifi only) that is pricier and yet has no better look or appeal than the competition? It simply does not make sense.
It looks like the Motorola management has little control over the marketing and product design aspects.
As a result, the first REAL tablet we thought could do better than the competition is actually a BIG disappointment.
Learn Motorola, Learn...

OS Concepts - Memory Management - 1

Memory Management:
-towards the goal of maximizing cpu utilization, multiple programs need to be stored in memory.
-memory consists of a large array of words or bytes, each with its own address.
-the cpu fetches instructions from the memory according to the value of the program counter.
-the cpu can fetch instructions directly only from:
    . registers built into the cpu
    . main memory.
-in other words, for instructions to be fetched by cpu, they must be brought to the above two first.
-note that cpu cannot directly access secondary storage like disks.
-since registers are built into the cpu, their access is as fast as the cpu itself.
-main memory on the other hand is accessed via the memory bus and therefore is much slower to access.
-to bridge this, a fast memory between cpu and main memory called 'cache' is used.
-apart from speed issues, the multiple processes residing in memory need to be protected from each other.
-this is done using the concept of memory address space.
-address space protection requires hardware support in the cpu.
-memory address space is like an isolated container for the processes.
-the address space is bounded by a base-register and a limit-register.
-base -register defines the smallest address for the address space.
-limit-register defines the range of addresses above the base register, available to the address space.
-the base and limit registers are part of the cpu h/w.
-the cpu compares the address generated in user mode with the registers assigned to a process.
-if there is a mismatch, then the process is trying to access memory outside its address space and that must be stopped.
-this is done by trapping to the os and sending a fatal error to the user process.
-the 'os' here is the kernel that runs in kernel mode.
-the os or kernel has unrestricted view and access to the memory assigned to the kernel and the users.
-note: the memory has a fixed area assigned for os kernel space and the remainder for user space.

-swapping is the event in which a process is moved temporarily out of memory to disk or secondary storage.
-this allows the scheduler to put another process in ready queue and available for cpu to work upon.
-address space assigned to a process is usually contiguous.
-if this becomes not possible (eg due to excessive swapping), fragmentation of memory arises.
-the cpu register does not know the absolute memory address of a process.
-this is because the cpu register is much smaller in size than main memory.
-the address used by the cpu is called 'logical address'.
-the address used by the mem is called 'physical address'.
-the mapping of logical to physical address is managed by the 'mmu' (memory management unit).
-this concept of logical address mapped to a separate physical address is central to proper memory management.

... continued ...

Thursday, May 19, 2011

CPU Scheduling Notes

cpu scheduling:
-the objective of multiprogramming is to maximize cpu utilization.
-to do this a process is run on the cpu until it must wait.
-during the wait time, another process is picked up from a pool of processes in memory to run on the cpu.
-this picking action is done by the cpu-scheduler.
-in real terms it is the kernel threads, not processes, that are being scheduled.
-thus process scheduling and thread scheduling are often used interchangeably.
-the queue of processes in memory that are ready to run is called 'ready-queue'.
-note: the ready-queue (that the scheduler chooses from) is not necessarily a fifo queue.
-the entries in the ready-queue are pcb (process control blocks) of processes.
-scheduling can be either preemptive or cooperative.
-in cooperative scheduling a process keeps running until it must wait or it completes its task.
-cooperative scheduling is also called non-preemptive scheduling.
-in preemptive  scheduling, the cpu can be taken away from a running process, eg when some process changes from the running or waiting state to the ready state.
-most contemporary os these days use preemptive scheduling.
-preemptive scheduling is more complex, requires kernel and hardware support.
-once the scheduler has selected a process to run, another component, the dispatcher, comes into play.
-the dispatcher transfers the control of cpu from the outgoing to the incoming process.

-on systems that support threads, the scheduling is done at the thread level.
-further, the scheduling is only done for the kernel threads, not user threads.
-user threads are managed and served by thread library and the kernel is unaware of them.

-on systems with multiple cpus, the scheduling is done on each cpu individually.
-this way each cpu can share some of the kernel and user processes; this is called 'symmetric multiprocessing' (smp).
-most modern os like linux, unix, windows are smp systems.
-on systems with multiple cores and multiple cpus, the cores can handle distinct threads.
-multicore systems consume less power than multiprocessor systems.
-virtualized systems behave as multiprocessor systems in terms of scheduling.
-there the vm software presents virtual cpus to each vm env.
-linux, unix and windows all use priority based preemptive scheduling.
-linux had non-preemptive scheduling prior to kernel 2.6.
-since 2.6, the kernel was made fully preemptive.

Wednesday, May 18, 2011

Linux: Interprocess Communication

-interprocess communication or ipc:
-ipc is the mechanism of sharing data between multiple processes.
-the idea behind ipc is - efficiency - system speed, resource utilization.
-two fundamental models of ipc:
    . shared memory
    . message passing
-shared mem - a region of mem is marked to be shared bet processes.
-mesg  pass - processes talk and share data via mesgs.
-shared mem is faster, more complex to implement. kernel intervention not reqd.
-mesg  pass is slower, less complex to implement. requires kernel intervention.
-certain shared mem terms in linux/unix - shmget(), shmat(), shmmax, shmmin, shmsize etc.. (a small shared memory sketch follows below).
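-a minimal shared memory sketch using the system v calls named above (my own illustration; the key 0x1234 is made up, and a real reader would be a second process attaching the same key):

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    key_t key = 0x1234;                                   /* both sides agree on this key  */
    int shmid = shmget(key, 4096, IPC_CREAT | 0600);      /* create/find a 4 kb segment    */
    if (shmid < 0) {
        perror("shmget");
        return 1;
    }

    char *mem = shmat(shmid, NULL, 0);                    /* attach into our address space */
    if (mem == (void *)-1) {
        perror("shmat");
        return 1;
    }
    strcpy(mem, "hello via shared memory");               /* another proc attaching the    */
    printf("%s\n", mem);                                  /* same key would see this data  */

    shmdt(mem);                                           /* detach                        */
    shmctl(shmid, IPC_RMID, NULL);                        /* remove the segment            */
    return 0;
}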
-while ipc can be useful for communication between processes in a system, other mechanisms are needed for remote proc comm.
-a client server system is an example of remote process communication.
-three models of client server system are:
    . sockets
    . remote procedure call (rpc)
    . pipes (ordinary or named)
-sockets are an endpoint of communication - usually a 'host-ip:port' pair.
-in a client server socket system, there is a pair of sockets on the client and server side.
-rpc is a call from a local app to a remote app, carried through the kernel's networking layer.
-pipes allow two processes to talk and share their data.
-ordinary pipes are one way - like parent child - ie child inherits the data and attributes of the parent.
-named pipes are two way - no parent child - ie procs can exist independent of each other.
-in linux, named pipes are called fifos and are created using the mkfifo() system call.
-once created they can be operated on with regular system calls like open(), read(), write(), close() (see the sketch below).
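-a minimal named pipe (fifo) sketch (my own illustration; the path /tmp/demo_fifo is a placeholder, and a separate reader process would open it for reading):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/demo_fifo";

    if (mkfifo(path, 0600) < 0)              /* create the fifo node in the filesystem  */
        perror("mkfifo");                    /* it may already exist; continue anyway   */

    int fd = open(path, O_WRONLY);           /* blocks until some reader opens the fifo */
    if (fd >= 0) {
        write(fd, "ping\n", 5);              /* ordinary write(), as with any file      */
        close(fd);
    }
    unlink(path);                            /* clean up the fifo node                  */
    return 0;
}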

OS Concepts - 3

... continued ...

-threads:
-threads are a subset of processes, ie a flow of control within a process.
-threads are the fundamental unit of cpu utilization.
-a process can have one or more threads ie the threads of a process share the same address space in mem.
-threads increase efficiency - a multithreaded process can do multiple things simultaneously.
-eg: a web browser process may have one thread displaying graphics while another gets data from disk.
-a thread can listen for client requests and spawn a new thread for the req; then go back to listening.
-multithreading becomes more efficient on multicore chips.
-in that, each thread can run independently on a different core in parallel.
-note: the os sees each core within a chip as a separate processor.
-most os kernels these days have multiple threads.
-eg: linux has a multithreaded kernel that, among other things, does free memory management.
-threads exist in both the user and kernel space.
-threads in the user space interact with threads in kernel space via system calls.
-the relationship between threads in user space and kernel space may be many-one, many-many, one-one.
-the one-one model is more common in contemporary os.
-threads are implemented via three main type of libraries.
-the three libraries are:
    . pthreads    -for linux, unix
    . win32 api    -for windows world
    . java threads    -for jvm implementations.
-in linux, the distinction between process and threads is blurry.
-instead they are treated as 'tasks'.
-for this, the clone() system call is used instead of a succession of fork() and exec() calls.
-note: when a child process is created first the fork() call is made.
-this creates two identical address spaces (ie processes).
-thereafter the exec() call is made to load a binary image of a program into memory (ie executes it).
-in this manner, the parent-child processes are able to communicate initially and then go separate ways.
-the parent then runs the wait() system call to wait for a return code from the child after the child finishes its work.
-in linux, the clone() system call creates the tasks.
-the task 'points' to the address space of the parent instead of creating a separate address space for the child.

---xxxxx--

Sunday, May 15, 2011

OS Concepts - 2

... continued ...


-system calls are internal command instructions to the kernel.
-they are an interface to the services made available by an operating system.
-user programs use system calls to interact with the kernel and in turn the hardware.
-system calls are generally routines written in C (occasionally C++ or Assembly).
-Examples of system calls - open(), read(), write(), exec(), fork() etc..
-usually the OS provides a set of APIs that programmers can use without having to dabble with system calls.
-the APIs give a uniform set of commands that internally call system calls
-Examples of APIs are three - Posix API (Linux, Unix, Apple); Win32 (Windows); Java API (for JVMs).
-Six types of system calls relate to - process, file, device, info, communication, protection.

-multitasking os usually require interprocess communication.
-interprocess communication can be one of two models - message passing model or shared memory model.
-in mpm, message tokens are passed bet two procs while they are in their own address spaces.
-in smm, two processes agree to share some of their address space in order to communicate.
-mpm is suitable for small amounts of data share, esp between two computers.
-smm is suitable for large data share and is faster.
-smm is more complicated to implement than mpm.

-virtual machines:
-the fundamental idea behind virtual machines is to abstract h/w of a single system into multiple execution env.
-vm is attained via a vm abstraction layer (called hypervisor) for the h/w and cpu scheduling.
-vm tricks the guest os into thinking that it is in full control of the h/w.
-in this case no changes to the guest os are reqd.
-paravirtualization - is another variation of vm.
-in pvm, the guest os is presented with an exec env which is similar but not identical to guest's preferred system.
-in other words, the guest must be modified to run on the pvm h/w.
-in pvm only one kernel is installed and the h/w is not virtualized (the kernel env is the global env)
-rather the host os and its devices are virtualized (the guest env are local env)
-this lets the processes in the guest os think that they are the only procs on the system.
-examples of pvm are solaris containers.
-from an implementation perspective, the host os has a kernel mode and a vm env that is in user mode.
-then within the vm env, each guest os has its own 'virtual' kernel and user modes.
-when the guest os makes a system call, it is passed to the hypervisor.
-the hypervisor then changes the register contents and program counter for the vm to simulate a real system call.
-vms require h/w support.
-in vmware, the hypervisor runs in the physical user mode.
-in other words, vmware runs as an application tool on top of the os.
-virtual machines running on top of vmware think that they are running on bare h/w.


-processes:
-a process is a program in execution. it is the unit of work in computing.
-it is a running set of resources like cpu time, memory, files and io devices.
-every process has an address space in memory, where it executes.
-the address space comprises:
    . text         -program file
    . data         -global vars, constants
    . heap        -dynamic memory location
    . stack        -temp area for functions, params, vars etc
    . empty space    -between heap and stack for growth of either
-a process changes state during execution.
-process states could be new, ready, running, waiting, terminated.
-process control block (pcb) is the info that the operating system maintains for every process.
-the process control block is how the os identifies each process and operates on it.
-the process control block contains imp info like state, register address, memory addr, account info etc.
-a process may have multiple threads of execution.
-threads increase efficiency in computing.
-if a program is invoked multiple times, the invocations are different processes and threads but might share some info.
-eg: if a browser is opened multiple times simultaneously, each instance will be a separate process but they may share some cached info.
-process scheduler is a programming construct of the os.
-the job of process scheduler is to select the process to run on the cpu from a pool of pcbs.
-each resource (cpu, mem, disk) has its own queue holding the processes that wait on that resource.
-processes can be cpu bound or io bound, ie ones that spend more time on cpu or on io operations resp.
-for efficiency a scheduler needs to pick a good mix of cpu bound and io bound procs.
-context switch is the operation during which a cpu switches between procs.
-in context switch the cpu stops and saves its current proc and starts a new one.
-the time spent in context switch is a cpu overhead as the cpu does not do any useful work during this time.

... continued ...

OS Concepts - 1

OS Concepts:

-A very general definition of OS will be that it is:
    . a big program
    . that manages the hardware of a computer
    . on behalf of its users and other programs.
-the definition of an OS varies with the viewpoint from which it is looked at.
-a user's definition of an OS will differ from that of a program or that of the system itself.

-kernel - is that part of the OS that is intimately associated with hardware control and operation.
-the kernel is the one program that is always running on the computer.
-system programs are programs that are part of the OS but not part of the kernel.
-application programs are programs that are not part of the OS and are initiated by the user.

-a general purpose computer consists of cpus and device controllers connected via a system-bus.
-the device controllers control devices like memory, disk drives, video, keyboard etc.
-the cpu and device controllers can execute concurrently, competing for memory cycles.
-the memory cycles are in turn dependent on cycles of signals on the system-bus.
-typically the os has a device driver for each device controller.

-the bootstrap program that resides in the firmware, reads and loads the OS from disk to memory.
-once booted the os waits for some event to happen from either the s/w or h/w.
-events generated by h/w are called interrupts.
-events generated by s/w are called traps. traps are typically raised via 'system calls' (or by program errors).
-the interrupts or traps are basically requests for cpu attention.
-during the event, the cpu stops what it is doing, saves its state at a fixed place in memory and attends to the event.
-once done with the event, it restores the saved state and resumes the interrupted work.

-multiprogramming    - running multiple programs from one or more users simultaneously
-multitasking        - like multiprogramming but the tasks are from a single user.
-both ideas rely on the system keeping many jobs in main memory simultaneously, in a job pool.
-then it picks up jobs from the job pool to execute on the cpu. This is done by the scheduler.
-the switching between jobs is expected to be smooth so as to permit an effective end-user experience.

-to distinguish between user code and system code, two modes of os operation are defined:
    . user-mode and
    . kernel-mode
-they are distinguished using a mode bit provided by the hardware.
-the mode bit allows the os to distinguish between a code from user or kernel.
-for user-mode,   mode bit is 1.
-for kernel-mode, mode bit is 0.
-eg: when a program run by a user requests a service from the os (via a system call), the mode bit is switched to kernel mode.
-at system boot, the h/w and os start in kernel mode.
-then the os starts user applications in user mode.
-when an event occurs (trap or interrupt), the h/w and os switch from user mode to kernel mode.
-this dual mode of operation protects the system from crashing due to runaway user programs.
-this is because the h/w allows privileged instructions to be run only in kernel mode.
-if an attempt is made by user program to run a privileged h/w instruction, then the h/w traps it to the os.
-the os shows an appropriate error message and the memory space for the user program may be dumped.
-the only way a user program can request h/w service is through a system call.
-note: intel 8088, on which dos ran, did not have the mode bit. so a runaway user program could crash the system.
-note: process failure causes memory to be dumped into a file called 'core dump'.
-note: kernel  failure causes kernel to create a file called 'crash dump'. kernel failure=crash.

-process is a program in execution.
-to become a program in execution, a process needs cpu time, memory, files and io devices.
-a process can have one or more threads in it.
-a thread is a sequence of instructions within a process.
-a single threaded process has one program counter.
-a multi  threaded process has one program counter for each thread.
-a program counter specifies the next instruction to be executed by the cpu.
-many processes can execute concurrently by multiplexing on a single cpu.

-cpu and memory are tightly integrated to work together.
-during process execution, cpu constantly reads and writes instructions from the memory.
-for a program to be executed as a process, it must be mapped and loaded into memory.
-the range of memory assigned to the process is called its address space.
-in a multiprogramming system, the address space for each process needs to be isolated from others.
-yet at the same time, the processes must be able to share resources for efficiency.
-these present design and implementation challenges to the os.

... continued ...

Tuesday, April 26, 2011

Linux: A closer look - Miscellanea

-IPC - Interprocess communication - in a multiprocessing system, many processes share fewer resources requiring ipc.
-ipc increases efficiency by allowing sharing of resources and info between processes.
-at times one process may have to wait for another process to release a shared resource.
-this wait can be arbitrarily long depending on the precise timing of resource acquisition and request.
-a race condition describes the situation where two or more processes access a shared resource concurrently and the outcome depends on the exact timing of their accesses.
-Mutex or Mutual Exclusion is a mechanism by which race conditions can be avoided.
-a mutex helps by ensuring that only one process (or thread) at a time can enter the critical section that touches the shared resource.
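-a minimal pthread sketch (Linux/Posix assumed; threads are used here for brevity, the same idea applies to processes sharing memory) of a mutex guarding a shared counter:

    /* mutex_demo.c - two threads update a shared counter under a mutex
       compile with: gcc mutex_demo.c -pthread */
    #include <stdio.h>
    #include <pthread.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);      /* only one thread in here at a time */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump, NULL);
        pthread_create(&t2, NULL, bump, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter); /* always 200000 with the mutex */
        return 0;
    }

-without the lock/unlock pair the two increments race and the final count is usually less than 200000.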

-virtualization first came in 1967 in IBM cp/cms system.
-now it's back in full force on the Intel platform.
-in future many computers will be running hypervisors on the bare metal.
-the hypervisor will create a number of vms, each with its own operating system.
-even though multicore chips are here, the os'es dont use them efficiently.
-the combination of vm and multicore chips opens new avenues.
-cloud computing enables data to be stored remotely and accessed locally.
-this is becoming more and more popular.
-The limiting factor is network speeds, not size of data.

-NT is based on DEC VMS technology.
-VMS was designed by Dave Cutler at DEC.
-NT and VMS are so strikingly similar that in the early 1990s DEC and MSFT fought lawsuits over their IP and settled out of court.
-Unix and NT (or VMS) differ mainly because of the types of computers they were designed for.
-Unix was designed in 70s on small machines with limited cpu and memory power.
-Unix has 'processes' as the unit of concurrency and composition.
-NT was designed in the early 90s when machines had more cpu and memory.
-NT has 'threads' as the unit of concurrency, dynamic libraries are unit of composition.
-NT combines what Unix does with fork and exec into a single operation.
-this means one operation creates a new process and runs another program in it, without first making a copy of the parent.

Saturday, April 23, 2011

Linux: A closer look - IO Concepts

... continued ...

-In linux all IO is treated as file operations.
-in other words, IO devices are treated like files and accessed via the open, read, write and other file system calls.
-devices like printer, disks, terminals etc are listed as special files in /dev dir; eg /dev/lp for printer.
-eg: cp afile /dev/lp will print the file 'afile'. in fact, cp is not even aware that it is printing.
-files can be regular or special files.
-special files can be block files or character files.
-block files are read one block at a time and can be accessed randomly.
-eg: it's possible to jump directly to the nth block of a block device file. usually they are used for disks.
-character files are used for devices that input or output character streams instead of blocks.
-eg: keyboards, printers, mice, etc.
-each special file has a major device number and a minor device number.
-the major device number refers to the driver and the minor device number refers to the actual device that uses the driver.
-eg: if a disk driver supports two disks, then the two disks have the same major number but different minor numbers.
-the file type, major and minor numbers can be viewed in the ls -l output.
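-a hedged C sketch (Linux assumed; any /dev path can be passed in) that reads the major/minor numbers of a special file with stat():

    /* devnum.c - print the major and minor device numbers of a special file
       usage: ./devnum /dev/sda */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>   /* major(), minor() */

    int main(int argc, char *argv[])
    {
        struct stat st;

        if (argc < 2) { fprintf(stderr, "usage: %s /dev/<file>\n", argv[0]); return 1; }
        if (stat(argv[1], &st) == -1) { perror("stat"); return 1; }

        printf("%s: major %u, minor %u\n",
               argv[1], major(st.st_rdev), minor(st.st_rdev));
        return 0;
    }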

-another example of IO is networking as pioneered by Berkeley Unix and then adopted by Linux.
-the key concept here is that of socket.
-sockets can be treated as physical mailboxes on the wall where users interact with the postal system.
-similarly sockets allow you to access network services.
-sockets can be created and destroyed dynamically.
-sockets are created on both the source and the destination.
-the sender uses the 'connect' system call; the receiver uses 'bind', 'listen' and 'accept'.
-once the connection is no longer needed, it can be closed with the 'close' system call.
-socket creation returns a file descriptor which is needed for establishing a conn, reading data, writing and releasing conn.
-sockets can be:
    . reliable conn oriented byte stream        - send and receive follow the same order of bytes
    . reliable conn oriented packet stream        - like first one but preserves packet boundaries
    . unreliable packet transmission        - random order of packets for efficient transmission
-eg of reliable conn types    - tcp
-eg of unreliable conn type    - udp
-both tcp and udp are layered on top of ip.
-all three of these originated in arpanet (us dept of defense project) and led to the Internet of today.
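-a minimal C sketch of the socket calls on the sending side (the address and port below are placeholders); the receiving side would use bind, listen and accept before reading from its own descriptor:

    /* tcp_client.c - create a socket, connect, send a few bytes, close */
    #include <stdio.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);  /* reliable, conn oriented (tcp) */
        if (fd == -1) { perror("socket"); return 1; }

        struct sockaddr_in srv = {0};
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(8080);                    /* placeholder port */
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);  /* placeholder address */

        if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) == -1) {
            perror("connect");
            close(fd);
            return 1;
        }
        write(fd, "hello\n", 6);   /* the fd behaves like any other file descriptor */
        close(fd);
        return 0;
    }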


Friday, April 22, 2011

Linux: A closer look - 4 - Memory Management

... continued ...

-every linux process has an address space in memory logically consisting of three segments: text, data and stack.
-contents of the three segments:
          . text segment    - machine instructions from the program's executable code produced by a compiler/assembler; constant size.
          . data segment   - variables, strings, arrays and other runtime data. size changes (unlike text seg above).
          . stack segment  - function call frames and local variables; initialized with the shell (environment) variables and the command text. grows and shrinks.
-text segment can be shared between processes, data and stack are not.
-depending on the process needs and variables, the address space size allocated by linux varies from process to process.
-but it is in powers of 2 and expressed in pages.  eg: 2^4 pages or 16 pages.
-the size of the page is OS dependent. eg its common to have page sizes of 512 bytes, 1 KB, 2 KB etc.
-if initially allocated address space is not enough, a process can dynamically allocate more memory in Linux.
-the available address space for a machine is dependent on the cpu bitsize.
-eg a 32 bit cpu will be able to handle an address space 2^32 bytes in size.
-eg a 64 bit cpu will be able to handle an address space 2^64 bytes in size.
-the Posix standard doesn't define memory management system calls. it leaves that burden to the C library function called 'malloc'.
-in Linux, there are memory management system calls like - 'mmap' and 'munmap'
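-a small C sketch (Linux assumed) of allocating one page with mmap and releasing it with munmap, roughly what malloc may do underneath for large requests:

    /* mmap_alloc.c - allocate one page with mmap and release it with munmap */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;   /* assume a 4 KB page for illustration */

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "backed by an anonymous mapping");
        printf("%s\n", p);

        munmap(p, len);      /* hand the pages back to the kernel */
        return 0;
    }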
-note that a process's address space is a virtual construct; the os maps its parts (text, data, stack) to physical memory as needed.
-if a process is sitting idle and memory is short to handle all the needs, it can be moved from physical memory to virtual memory on disk.
-in earlier Unix systems the entire process used to be moved (called swapping).
-in Linux, instead of moving the entire process, parts of a process are moved (this is for efficiency).
-this is done by the page daemon (process 2, that we discussed in our previous blog)
-note that process 0 is an idle process sometimes called the swapper, proc 1 is init and proc 2 is the page daemon.
-the page daemon runs infrequently -- sleeps, wakes up and sleeps again. it is usually listed as 'kswapd' in ps.

... continued ...

Thursday, April 21, 2011

Linux: A closer look - 3 - Boot Process

... continued ...

-Linux Boot Process in general:
-when powered on, the bios performs power on self test (post)
-then the master boot rec (mbr) which is the 1st sector of the boot disk is read into memory and executed
-the mbr contains a small 512 byte program that calls the bootloader (eg grub or lilo) from the boot device
-the bootloader then copies itself to a higher memory address to make space for the kernel that will come afterwards.
-the bootloader usually needs to know how the filesystem works, but not always.
-eg: grub needs to know the fs but lilo doesn't need to (it relies on disk geometry, not fs knowledge)
-the knowledge of disk geometry/fs helps the bootloader locate and load the kernel into memory
-the kernel startup code is written in assembly language and is machine dependent.
-the kernel startup job is to identify hardware, sanity checks, calling C language main procedure to start os etc.
-the C code starts services and writes messages on console.
-this includes loading drivers; modules are loaded as needed (unix preloads most, linux is more on-demand)
-once all h/w is ready, process 0 is started.
-process 0 sets up realtime clock, mounts root fs, creates process 1 (init) and process 2 (page daemon).
-process 1 checks if it is supposed to come up as single user or multi user process (depending on how boot was started by sa)
-in single user mode, process 1 forks a shell process and waits for user input.
-in multi  user mode, process 1 forks another process that runs the rc.d init scripts
-init scripts mount additional fs, start services, runs getty which starts login process on terminals.
-terminal procs are part of /etc/ttys, login procs are part of /bin/login, authentication info is in /etc/passwd
-if the login is ok, the command prompt is displayed in CUI or desktop env is started.
-thereafter the user is on his own...

... continued ...

Wednesday, April 20, 2011

Linux: A closer look - 2

... continued ...

-In linux, processes are created using the fork system call.
-fork creates an exact replica of the parent. The new process is called the child.
-initially the parent and child are exact copies but later they diverge, each with its own private memory image and variables.
-the parent is aware of the child process and the child knows its parent's pid.
-when the child process terminates it passes a status message to the parent.
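-a minimal C sketch of this: the parent forks a child, and the child exits with a status that the parent collects:

    /* fork_demo.c - parent forks a child and waits for its exit status */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = fork();          /* both processes continue from here */

        if (pid == 0) {              /* child: knows its parent via getppid() */
            printf("child %d, parent %d\n", (int)getpid(), (int)getppid());
            exit(7);                 /* status passed back to the parent */
        } else {                     /* parent: knows the child's pid */
            int status;
            waitpid(pid, &status, 0);
            printf("child %d exited with %d\n", (int)pid, WEXITSTATUS(status));
        }
        return 0;
    }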
-processes can talk to each other via one of two ways -either pipes or signals. This is called interprocess comm.
-traditionally, processes are resource containers and threads are units of execution within a process.
-in 2000, linux introduced a new system call called 'clone' that blurred the differences between processes and threads.
-the 'clone' system call is not part of any other flavor of Unix.
-what clone did is that it made the resources comprising a process -eg: open files, signal handlers, etc specific to threads.
-in other words, what used to be hitherto common to all threads within a process, now became specific to individual threads.
-also traditionally each process in Unix is identified by a pid and all threads of that process share it.
-with the advent of the clone call, linux introduced an identifier called the 'task identifier' or tid.
-the tid is basically a thread identifier.
-so if the clone system call is used to create a new process it will have a new pid and one/more new tids, sharing nothing with its parent pid.
-note, that the parent will have knowledge of both the new pid and tids.
-on the other hand, if the clone system call is used to create a new thread, it will have the same pid as parent but new tids.

... continued ...

Tuesday, April 19, 2011

Linux: A closer look - 1

-The first version of linux was 0.01 in 1991 (by Linus Torvalds).
-It had 9300 lines of C and 950 lines of assembly language code.
-'Linux' is usually meant to refer to the linux kernel.
-the linux kernel is responsible for managing the h/w and providing a system call interface to all programs.
-the system call wrappers form part of the standard library that lives in user mode.
-the standard library (fork, open, read, write) is different from the standard utilities (like the shell, X, gcc etc)
-the 'make' utility helps maintain large programs whose source code consists of multiple files.
-multiple source files can share header files using the 'include' keyword.
-so if a header file changes, the source files that include it need recompilation.
-the purpose of the make utility is to keep track of which source file depends on which header and rebuild only what is needed.

-linux kernel components:
-kernel sits on the h/w and supports the overlying programs via system calls.
-the kernel can be broadly divided into 3 layers - lowest, middle and top.

-at the lowest layer of the kernel are interrupt handlers and dispatchers.
-interrupts are the primary way of interacting with devices. interrupts cause dispatching.
-when an interrupt happens, the dispatcher stops a running program, saves its state and starts the next program.
-interrupt code is in C while dispatcher code is in assembly language.

-at the middle layer of the kernel exist IO manager, memory manager and process manager.
-IO manager deals with device drivers for filesystems, network devices and terminals.
-memory manager deals with physical to virtual memory mapping, cache maintenance etc.
-process manager deals with process creation, scheduling and termination.

-at the top layer of the kernel exists the system call interface.
-user events lead to system calls causing a trap event that leads to action on the lower and middle layers as seen above.
-as one can imagine, the three layers of the linux kernel are highly interdependent in existence and action.

... continued ...

Monday, April 18, 2011

Linux: OS Basics - 3

... continued ...

Virtualization:

-Virtualization is the concept of running multiple OS simultaneously on one physical hardware.
-virtualization was originally attained on IBM mainframes (cp/cms, later VM/370) in the late 60s but remained fairly dormant for almost 40 years.
-this was primarily due to the way the Intel chips were designed.
-Intel fixed it in 2005 with the so-called VT (virtualization technology) extensions. AMD did so around the same time.
-the fix came because, in recent times, business needs had created a renewed interest in virtualization.

-In virtualization, the hardware is abstracted by a piece of software called hypervisor.
-hypervisor is sometimes also called 'virtual machine monitor'.
-The hypervisor creates multiple abstract copies of the hardware that can co-exist in isolation.
-in a virtualized env, the hypervisor is the only s/w running in kernel mode.
-hypervisors can be of two types: type 1 hypervisors and type 2 hypervisors.
-type 1 hypervisor - runs on bare metal. Abstracts complete hardware.
-type 2 hypervisor - runs inside a host os. All guest os system calls are channeled through host os.
-type 2 hypervisors were a way to bypass the shortcomings of Intel chips until VT came along.
-VMware, which grew out of research work at Stanford Univ, was a pioneer in popularizing the type 2 hypervisor.
-as the technologies evolved, both type 1 and type 2 hypervisors exist now.
-there are no real merits or demerits of either of the above types of hypervisors.
-both have their own strengths and weaknesses.

-most failures happen not due to hardware but due to buggy software--esp the os.
-in case of a virtual machine, the hypervisor is the only s/w running on baremetal.
-and the hypervisor has far fewer lines of code than a full os--ie fewer bugs.
-this is why virtualized envs tend to be more fault-tolerant.
-benefits of virtualization include lower tco--lower cost, reliability, manageability etc.

-the hypervisor fools the overlying os by creating the illusion that the os is running in kernel mode.
-in reality, the os runs in user mode only and depends on the hypervisor for h/w access.
-all sensitive instructions issued by the guest os are turned into calls to the hypervisor.
-the hypervisor executes them on the real h/w. the guest os never actually interacts with the h/w.

-with neither type 1 nor type 2 hypervisors does the guest os need to be modified in any form.
-paravirtualization is a special type of virtualization in which the guest os is modified to make hypervisor calls.
-in this, the guest os is said to be para-virtualized.
-the difference between traditional vm (type 1/2) and paravirtualization is that:
    . in the former the guest os issues ordinary h/w instructions which the hypervisor traps and emulates--ie the guest believes it owns the h/w.
    . in the latter the guest os makes explicit calls to the hypervisor, which then drives the h/w--ie the guest knows it is virtualized.
-a paravirtualized guest os usually has better performance than full virtualization as the hypervisor doesn't have to trap and emulate sensitive instructions.
-a common example of paravirtualized hypervisor is Xen.

... now we are at a point where we can discuss Linux in a greater depth..

Sunday, April 17, 2011

Linux: OS Basics - 2

... continued ...

-process    - a program in execution.
-the program file exists on the disk. To execute, it must be read from disk to memory where the cpu operates on it.
-the process is fundamentally a container that holds all the info needed to run a program.
-every process is uniquely identifiable by its process id or pid.
-address space    - the space in memory where the process exists.
-this means every process has an associated address-space that acts as its container.
-address space values are from zero to a certain maximum.
-In a multi-programming system where multiple programs compete for cpu-time, the kernel does this:
    . listen to system calls from processes
    . upon system call traps, the kernel freezes the info in a process's address space & saves it
    . then the kernel assigns another process to run on the cpu.
    . the kernel repeats the above on and on.
-note: a process must be resumed from exactly the same state that it was saved in memory.
-the info in the address-space is:
    . the executable program    - the code
    . program's data         - the variables and temp values
    . program's stack        - function call frames, local variables etc. (resources like the program counter, open files and alarms are tracked by the kernel in the process table)
-process table    - is a table maintained by the kernel.
-process table has info about every running process and their address space.
-for a given system, the addressable space is 2^N where N is the cpu bitsize (eg 32bit or 64bit).
-if main memory is less than the addressable space, virtual memory is helpful.
-together they can map the entire addressable space.

-in contrast to processes, which have independent address-spaces, what if two lines of execution share the same address space?
-threads are exactly that. threads are execution structures having a shared address space in memory.
-threads form part of a process. in other words a process can have multiple threads.
-threads improve efficiency by parallelizing execution of instructions for a process.
-threads are especially more efficient and relevant in multicore chips.
-in a nutshell, threads are like mini-processes that exist in user space (since they are a subset of a process)
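-a short pthread sketch of two threads running inside one process and seeing the same data:

    /* threads_share.c - two threads share the address space of one process
       compile with: gcc threads_share.c -pthread */
    #include <stdio.h>
    #include <pthread.h>

    static const char *shared_msg = "visible to every thread in the process";

    static void *worker(void *arg)
    {
        printf("thread %ld sees: %s\n", (long)arg, shared_msg);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);    /* wait for both 'mini-processes' to finish */
        pthread_join(t2, NULL);
        return 0;
    }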


-System calls - a mechanism for the user programs to communicate with the kernel.
-In Posix, there are about 100 standard system calls.
-In contrast, windows has thousands of calls (usually called API calls).
-note: from previous discussion, remember, user programs exist in user-space and kernel exists in kernel-space.
-examples of system calls - fork, exit, open, read, write, close, kill etc.
-to create a new process, an existing parent process forks and creates an identical copy of itself.
-the identical copy has its own address space and pid.
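-a hedged C sketch of a few of these calls (open, read, write, close); the path below is just an example:

    /* syscalls_demo.c - open a file, read from it, write to stdout, close it */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];
        int fd = open("/etc/hostname", O_RDONLY);   /* example path */
        if (fd == -1) return 1;

        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            write(STDOUT_FILENO, buf, n);           /* echo what was read */

        close(fd);
        return 0;
    }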

-kernels can be of two types - micro-kernel and mono-lithic.
-microkernel OS are more modular and a small set of modules runs in kernel mode. All else is in user space.
-eg: device drivers are in the user-space and bugs in a driver crash only it, not the system.
-microkernel os may be modular and seemingly secure but not necessarily fast.
-eg: Minix, Symbian, PikeOS.
-monolithic kernels are not so modular.
-they have a big chunk of modules running in the kernel mode and very few in user-space.
-eg: all device drivers exist as part of the kernel.
-monolithic os may be a bit bigger but are faster, as their components call each other directly in kernel mode rather than passing messages between user-space servers.
-eg: Unix, Linux.

... continued ...

Linux: OS Basics - 1

( Prior to going into Linux Specifics, let us review some basics in OS )

Operating System (OS) Basics:
-OS is a layer of software that manages computer hardware, hides complexity from users and runs programs for them.
-users interact with the OS via a shell (command line) or desktop (graphical interface).
-the part of the OS that sits between the user-facing programs and the hardware is called the 'kernel'.
-the kernel is the essence of an OS--therefore 'kernel' and 'OS' are sometimes used interchangeably in jargon.
-the OS has two modes of operation -- kernel mode and user mode; the cpu is in exactly one of them at any instant.
-kernel mode runs the kernel part of the OS.
-user   mode runs the rest of the OS (ie all except the kernel, like the shell, gui, applications etc).
-in multi-user systems, the number of programs running in user mode are many.
-but behind those multiple programs there is only one kernel.
-the kernel creates an illusion of multi-tasking by rapidly switching between multiple user programs and serving them.
-this means that at any given instant only one program is actually accessing the hardware via the kernel.

-useful terms:
-multi-plexing        - sharing resource between multiple programs
-multi-programming    - running multiple programs simultaneously
-multi-tasking        - multi-programming
-multi-user        - serving multiple users simultaneously
-multi-processing    - running multiple processes simultaneously for the programs
-multi-threading    - running multiple threads within each process
-multi-core        - microprocessors with multiple independent cores built within.

-system call - a request made by the user program (in user mode) to receive services from the kernel (in kernel mode)
-system calls create a trap or interrupt request and thereby gain the attention of the kernel and processor.

-types of memory:
-registers    - fastest, costliest, smallest. live in the cpu. as fast as cpu (1 nanosec response)
-cache        - part of cpu, managed by hardware. L1 and L2 cache. (2 nanosec response)
-main memory    - physical ram. Large and cheaper (10 nanosec response)
-disk/virtual    - large, cheap, slow (10 millisec response)
-tapes        - largest, cheapest, slowest (100 sec response)

-virtual memory - part of the disk that serves to store temporary mem blocks and thereby extends main memory.
-mmu - memory mgmt unit is the part of the cpu that maps the virtual addresses used by programs to physical memory addresses.
-this enables a system to run more programs than can be fit in the main physical memory.
-bus - bus is the high speed communication channel on which various components of the computer talk.
-the cpu, memory, disk, peripheral are all connected to the bus.

... continued ...

Wednesday, April 13, 2011

Linux: Perl - 3

... continued ...

-Perl control structures:
-    if (expr) {...}
    unless (expr) {...}
-    if (expr) {...} else {...}
-    if (expr) {...} elsif {...} .. else {...}
-    foreach|for var (list) {...}
-    foreach|for (expr1; expr2; expr3) {...}
-    while (expr) {...}
-    until (expr) {...}

-useful functions/expressions:
-    my        - limits scope of a variable
-    die        - sends its argument to stderr and aborts program
-    chomp        - removes trailing newlines
-    say        - prints with a newline
-    print        - prints output on stdout
-    open        - opens a file
-    sort        - returns elements of an array in order
-    reverse        - returns elements of an array in reverse order
-    $!        - last system error
-    @ARGV        - array holding the arguments from the cmdline
-    %ENV        - env variables of the shell that called perl
-    split        - divides a string into substrings
- shift, push, pop, splice - array functions
-subroutines are distinct function blocks apart from the main program.
-subroutines can be called by other programs.
-this keeps the program structure modular.
-variables defined in the main prog are available to the subroutines.
-variables defined in a subroutine should be declared with 'my' to give them local scope (without 'my' they are global).
-subs are defined with the sub keyword, a name and a block {}
-eg: sub asub { ... }
-they are called by their name and ()
-eg: asub();
-params passed to a subroutine arrive in the array @_; individual elements are accessed as $_[0], $_[1] etc.
-cpan is the repository for perl docs, faqs and modules.
-perl modules come as tar files.
-in the module, first follow the 'readme' file.
-then create the config by running 'perl Makefile.PL'
-then run the 'make' on the Makefile just created to create the pkg.
-then run the 'make test' to test the module works.
-then run the 'make install' as root to install the module.
-post install, you can query perldoc to review the documentation.


Tuesday, April 12, 2011

Linux: Perl - 2

... continued ...

-Perl was designed to be close to human languages.
-For instance, perl distinguishes between singular and plural data
-also, perl 'say' cmd is like print but adds a newline and sounds more natural.
-In Perl, a variable comes into existence when you assign value to it--no need to declare first.
-But if you want, you can use 'use strict' clause to require explicit declaration.
-also, you refer the variable each time with the identifiers $ (scalar), @ (array) or % (hash)
-when you define variable with a lexical term 'my' it is recognized only within that program (ie local scope).
-this is useful when you have many subroutines and each needs local scope definition to not interfere with others.
-scalar variables begin with $ sign and hold a string or number.
-eg: $name = "Sammy" or $n1 = 5;
-Perl is smart enough to determine from context if a scalar is string or number.
-running a perl prog with -w shows runtime warnings, if any.
-running a perl prog with -c checks its syntax without executing it; -e lets you supply code to run on the command line.
-array variables hold multiple scalars.
-eg: @arrayvar = (8, 18, "Sammy"). It is a plural variable.
-a hash variable is a plural variable that holds a collection of key-value pairs.
-hashes are unordered; unlike array indices, the keys need not form a sequence (eg 0 1 3 instead of 0 1 2 3).
-eg: %hashvar = (boat => 'seastar', numfive => 5, 4 => 'fish');
-in the above, the first key is boat, the 2nd is numfive and the third is 4 (a number).
-so the keys were two strings and one number, ie neither order nor a single type is reqd; that's a hash.
... continued ...