TimeLinux1

Tuesday, April 26, 2011

Linux: A closer look - Miscellanea

-IPC - interprocess communication - in a multiprocessing system, many processes share fewer resources, which requires IPC.
-IPC increases efficiency by allowing processes to share resources and information.
-at times one process may have to wait for another process to release a shared resource.
-this wait can be arbitrarily long depending on the precise timing of resource acquisition and release.
-a race condition is the situation where the final outcome depends on the exact order in which two or more processes access a shared resource.
-a mutex (mutual exclusion) is a mechanism by which race conditions can be avoided.
-a mutex helps by ensuring that only one process or thread at a time executes the critical section that touches the shared resource.
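a minimal sketch of mutual exclusion using Python's threading module (increment_many is a hypothetical demo function, not from any library): two workers hammer a shared counter, and the lock ensures every increment survives.

```python
import threading

def increment_many(n_threads=2, n_iter=100_000):
    counter = 0
    lock = threading.Lock()          # the mutex
    def worker():
        nonlocal counter
        for _ in range(n_iter):
            with lock:               # only one thread at a time in the critical section
                counter += 1
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

with the lock held around the increment, the result is always n_threads * n_iter; without it, lost updates (the race) become possible.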

-virtualization first appeared in 1967 in IBM's CP/CMS system.
-now it is back in full force on the Intel platform.
-in future, many computers will run hypervisors on the bare metal.
-the hypervisor will create a number of vms, each with its own operating system.
-even though multicore chips are here, operating systems don't yet use them efficiently.
-the combination of vm and multicore chips opens new avenues.
-cloud computing enables data to be stored remotely and accessed locally.
-this is becoming more and more popular.
-The limiting factor is network speeds, not size of data.

-NT is based on DEC VMS technology.
-VMS was designed by Dave Cutler at DEC.
-NT and VMS are so strikingly similar that in the early 1990s DEC and MSFT fought lawsuits over their IP and settled out of court.
-Unix and NT (or VMS) differ mainly because of the types of computers they were designed for.
-Unix was designed in the 70s on small machines with limited cpu power and memory.
-Unix has 'processes' as the unit of concurrency and composition.
-NT was designed in the early 90s when machines had more cpu power and memory.
-NT has 'threads' as the unit of concurrency and dynamic libraries as the unit of composition.
-NT combines what Unix does with fork and exec into a single operation.
-this means one operation creates a new process and runs another program in it, without first making a copy of the parent.
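the Unix two-step can be sketched with Python's os wrappers (run_program is a hypothetical helper; os.waitstatus_to_exitcode needs Python 3.9+): fork copies the parent, then exec replaces the child's copy with a new program -- exactly the two operations NT fuses into one call.

```python
import os
import sys

def run_program(argv):
    # fork: duplicate the calling process
    pid = os.fork()
    if pid == 0:
        # child: replace this copy with a new program (never returns on success)
        os.execvp(argv[0], argv)
        os._exit(127)                  # reached only if exec failed
    # parent: wait for the child and report its exit code
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```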

Saturday, April 23, 2011

Linux: A closer look - IO Concepts

... continued ...

-In linux all IO is treated as file operations.
-in other words, IO devices are treated like files and accessed via the open, read, write and other file system calls.
-devices like printer, disks, terminals etc are listed as special files in /dev dir; eg /dev/lp for printer.
-eg: cp afile /dev/lp will print the file 'afile'. in fact, cp is not even aware that it is printing.
-files can be regular or special files.
-special files can be block files or character files.
-block files are read one block at a time and can be accessed randomly.
-eg: its possible to jump directly to nth block of a block device file. usually they are used for disks.
-character files are used for devices that input or output character streams instead of blocks.
-eg: keyboards, printers, mice, etc.
-each special file has a major device number and a minor device number.
-the major device number identifies the driver and the minor device number identifies the actual device that uses the driver.
-eg: if a disk driver supports two disks, the two disks have the same major number but different minor numbers.
-the file type, major and minor numbers can be viewed in the ls -l output.
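besides ls -l, the numbers can be read programmatically; a sketch in Python (device_numbers is a hypothetical helper): stat the special file and decode its st_rdev field.

```python
import os
import stat

def device_numbers(path):
    """Return (major, minor) for a block or character special file, else None."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
        # st_rdev encodes the device numbers for special files
        return os.major(st.st_rdev), os.minor(st.st_rdev)
    return None
```

on a typical Linux box, device_numbers('/dev/null') returns (1, 3): major 1 selects the memory-devices driver, minor 3 selects null.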

-another example of IO is networking as pioneered by Berkeley Unix and then adopted by Linux.
-the key concept here is that of socket.
-sockets can be treated as physical mailboxes on the wall where users interact with the postal system.
-similarly sockets allow you to access network services.
-sockets can be created and destroyed dynamically.
-sockets are created on both the source and the destination.
-the sender uses the 'connect' system call; the receiver uses 'listen' (and then 'accept').
-once the connection is no longer needed, it can be closed with the 'close' system call.
-socket creation returns a file descriptor which is needed for establishing a conn, reading data, writing and releasing conn.
-sockets can be:
    . reliable connection-oriented byte stream        - bytes are received in the order they were sent
    . reliable connection-oriented packet stream        - like the first one but preserves packet boundaries
    . unreliable packet transmission        - packets may be lost or arrive out of order, in exchange for efficiency
-eg of the reliable connection-oriented types    - tcp
-eg of the unreliable type    - udp
-both tcp and udp are layered on top of ip.
-all three of these originated in arpanet (us dept of defense project) and led to the Internet of today.
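the whole lifecycle above -- create, bind, listen, accept on one side; connect, send on the other -- fits in a few lines of Python (echo_roundtrip is a hypothetical demo over localhost tcp):

```python
import socket
import threading

def echo_roundtrip(message: bytes) -> bytes:
    # receiver side: create a socket, bind, listen
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))          # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()          # accept the incoming connection
        conn.sendall(conn.recv(1024))      # echo back whatever arrives
        conn.close()

    t = threading.Thread(target=serve)
    t.start()
    # sender side: connect, send, read the reply
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(message)
    reply = client.recv(1024)
    client.close()
    t.join()
    server.close()
    return reply
```

the socket() call returns the file descriptor that all subsequent calls (connect, send, recv, close) operate on, just as the notes describe.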


Friday, April 22, 2011

Linux: A closer look - 4 - Memory Management

... continued ...

-every linux process has an address space in memory logically consisting of three segments: text, data and stack.
-contents of the three segments:
          . text segment    - machine instructions from the program's executable code produced by a compiler/assembler; constant size.
          . data segment   - variables, strings, arrays and other runtime data. size changes (unlike the text seg above).
          . stack segment  - function call frames and local variables; initialized with the environment variables and the command line that invoked the program; grows and shrinks at runtime.
-the text segment can be shared between processes; data and stack are not.
-depending on the process's needs, the address space size allocated by linux varies from process to process.
-but it is always a whole number of pages, in powers of 2 (eg 2^4 = 16 pages).
-the size of a page is OS and hardware dependent; 4 KB is typical on x86, though sizes like 512 bytes, 1 KB and 2 KB have also been used.
-if initially allocated address space is not enough, a process can dynamically allocate more memory in Linux.
-the available address space on a machine depends on the cpu word size.
-eg: a 32-bit cpu can handle an address space 2^32 bytes (4 GB) in size.
-eg: a 64-bit cpu can handle an address space 2^64 bytes in size.
-the Posix standard doesn't define memory management system calls; it leaves that job to the C library function 'malloc'.
-in Linux, there are memory management system calls like 'mmap' and 'munmap'.
-note that the entire address space of a process is virtual; text, data and stack are all backed by physical memory page by page, as needed.
-if a process is sitting idle and memory is short, its pages can be moved from physical memory out to disk.
-in earlier Unix systems the entire process used to be moved (called swapping).
-in Linux, instead of moving the entire process, only parts (pages) of a process are moved (this is for efficiency).
-this is done by the page daemon (process 2, which we discussed in a previous post)
-note that process 0 is an idle process sometimes called the swapper, proc 1 is init and proc 2 is the page daemon.
-the page daemon runs infrequently -- sleeps, wakes up and sleeps again. it is usually listed as 'kswapd' in ps.
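the mmap/munmap calls mentioned above can be exercised from Python's mmap module (mmap_demo is a hypothetical sketch): map a one-page file into the address space, write through the mapping, read it back.

```python
import mmap
import os
import tempfile

def mmap_demo():
    # back the mapping with a temp file that is at least one page long
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\0" * mmap.PAGESIZE)
    m = mmap.mmap(fd, mmap.PAGESIZE)    # mmap: map the file into our address space
    m[:5] = b"hello"                    # write through the mapping, no write() call
    data = bytes(m[:5])
    m.close()                           # munmap
    os.close(fd)
    os.unlink(path)
    return data
```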

... continued ...

Thursday, April 21, 2011

Linux: A closer look - 3 - Boot Process

... continued ...

-Linux Boot Process in general:
-when powered on, the bios performs power on self test (post)
-then the master boot rec (mbr) which is the 1st sector of the boot disk is read into memory and executed
-the mbr contains a small 512-byte program that loads the bootloader (eg grub or lilo) from the boot device
-the bootloader then copies itself to a higher memory address to make space for the kernel that will come afterwards.
-the bootloader usually needs to know how the filesystem works, but not always.
-eg: grub needs to understand the fs but lilo doesn't (it relies on disk geometry, not fs knowledge)
-the knowledge of disk geometry/fs helps the bootloader locate and load the kernel into memory
-the kernel startup code is written in assembly language and is machine dependent.
-the kernel startup code's job is to identify the hardware, run sanity checks, call the C-language main procedure that starts the os, etc.
-the C code starts services and writes messages on console.
-this includes loading drivers; modules are loaded as needed (unix preloads most, linux is more on-demand)
-once all h/w is ready, process 0 is started.
-process 0 sets up realtime clock, mounts root fs, creates process 1 (init) and process 2 (page daemon).
-process 1 checks whether it is supposed to come up single user or multi user (depending on how the boot was started by the system administrator)
-in single-user mode, process 1 forks a shell process and waits for user input.
-in multi-user mode, process 1 forks another process that runs the rc.d init scripts
-the init scripts mount additional fs, start services and run getty, which starts the login process on terminals.
-terminal procs are listed in /etc/ttys, the login program is /bin/login, authentication info is in /etc/passwd
-if the login is ok, the command prompt is displayed in CUI or desktop env is started.
-thereafter the user is on his own...

... continued ...

Wednesday, April 20, 2011

Linux: A closer look - 2

... continued ...

-In linux, processes are created using the fork system call.
-fork creates an exact replica of the parent. The new process is called the child.
-initially the parent and child are exact copies, but later they can diverge, each with its own private memory image and variables.
-the parent is aware of the child process and the child knows its parent's pid.
-when the child process terminates it passes a status message to the parent.
-processes can talk to each other in one of two ways - pipes or signals. This is called interprocess communication.
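a pipe between parent and child is the classic instance of this; a sketch with Python's os wrappers (pipe_roundtrip is a hypothetical helper): fork, then the child writes into one end and the parent reads from the other.

```python
import os

def pipe_roundtrip(msg: bytes) -> bytes:
    r, w = os.pipe()            # r: read end, w: write end
    pid = os.fork()
    if pid == 0:
        # child: close the unused read end, write the message, exit
        os.close(r)
        os.write(w, msg)
        os._exit(0)
    # parent: close the unused write end, read what the child wrote
    os.close(w)
    data = os.read(r, 1024)
    os.close(r)
    os.waitpid(pid, 0)          # collect the child's exit status
    return data
```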
-traditionally, processes are resource containers and threads are units of execution within a process.
-in 2000, linux introduced a new system call called 'clone' that blurred the differences between processes and threads.
-the 'clone' system call is not part of any other flavor of Unix.
-what clone did was make the resources comprising a process -- eg open files, signal handlers, etc -- attachable to individual threads.
-in other words, what used to be hitherto common to all threads within a process could now be made specific to individual threads.
-also traditionally each process in Unix is identified by a pid and all threads of that process share it.
-with the advent of the clone call, linux introduced an identifier called the 'task identifier' or tid.
-the tid is basically a thread identifier.
-so if the clone system call is used to create a new process it will have a new pid and one/more new tids, sharing nothing with its parent pid.
-note, that the parent will have knowledge of both the new pid and tids.
-on the other hand, if the clone system call is used to create a new thread, it will have the same pid as parent but new tids.
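the pid/tid split is easy to observe from Python (pid_tid_pairs is a hypothetical sketch; threading.get_native_id needs Python 3.8+ and reports the kernel task id): both threads share the process pid but carry distinct tids.

```python
import os
import threading

def pid_tid_pairs():
    # collect (pid, tid) from a worker thread and from the main thread
    results = []
    def worker():
        results.append((os.getpid(), threading.get_native_id()))
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    results.append((os.getpid(), threading.get_native_id()))
    return results
```

both tuples carry the same pid, while the tids differ -- the thread-style clone case described above.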

... continued ...

Tuesday, April 19, 2011

Linux: A closer look - 1

-The first version of linux was 0.01 in 1991 (by Linus Torvalds).
-It had 9300 lines of C and 950 lines of assembly language code.
-'Linux' is usually meant to refer to the linux kernel.
-the linux kernel is responsible for managing the h/w and providing a system call interface to all programs.
-the system calls are wrapped by the standard library, which lives in user mode.
-the standard library (fork, open, read, write) is different from the standard utilities (like the shell, X, gcc etc)
-the 'make' utility helps maintain large programs whose source code consists of multiple files.
-multiple source files can share header files via the '#include' directive.
-so if a header file changes, the source files that include it need recompilation.
-the purpose of the make utility is to keep track of which source file depends on which headers.

-linux kernel components:
-kernel sits on the h/w and supports the overlying programs via system calls.
-the kernel can be broadly divided into 3 layers - lowest, middle and top.

-at the lowest layer of the kernel are interrupt handlers and dispatchers.
-interrupts are the primary way of interacting with devices. interrupts cause dispatching.
-when an interrupt happens, the dispatcher stops a running program, saves its state and starts the next program.
-interrupt code is in C while dispatcher code is in assembly language.

-at the middle layer of the kernel exist IO manager, memory manager and process manager.
-IO manager deals with device drivers for filesystems, network devices and terminals.
-memory manager deals with physical to virtual memory mapping, cache maintenance etc.
-process manager deals with process creation, scheduling and termination.

-at the top layer of the kernel exists the system call interface.
-user events lead to system calls, causing a trap that leads to action in the lower and middle layers as seen above.
-as one can imagine, the three layers of the linux kernel are highly interdependent in existence and action.

... continued ...

Monday, April 18, 2011

Linux: OS Basics - 3

... continued ...

Virtualization:

-Virtualization is the concept of running multiple operating systems simultaneously on one physical machine.
-virtualization was attained originally on the IBM VM370 in the late 60s but then lay fairly dormant for almost 40 years.
-This was primarily due to the way Intel chips were designed: sensitive instructions executed in user mode did not trap.
-Renewed business interest in virtualization led Intel to fix this in 2005 with the so-called VT (virtualization technology). AMD also did so around the same time.

-In virtualization, the hardware is abstracted by a piece of software called the hypervisor.
-the hypervisor is sometimes also called a 'virtual machine monitor'.
-The hypervisor creates multiple abstract copies of the hardware that can co-exist in isolation.
-in a virtualized env, the hypervisor is the only s/w running in kernel mode.
-hypervisors can be of two types: type 1 hypervisors and type 2 hypervisors.
-type 1 hypervisor - runs on bare metal. Abstracts complete hardware.
-type 2 hypervisor - runs inside a host os. All guest os system calls are channeled through host os.
-type 2 hypervisors were a way to bypass the shortcomings of Intel chips until VT came along.
-VMware was a pioneer in popularizing type 2 hypervisor that came out of a research work at Stanford Univ.
-as the technologies evolved, both type 1 and type 2 hypervisors exist now.
-neither type of hypervisor is strictly better than the other.
-both have their own strengths and weaknesses.

-most failures happen not due to hardware but due to buggy software--esp the os.
-in a virtual machine setup, the hypervisor is the only s/w running on bare metal.
-and the hypervisor has far fewer lines of code than a full os--ie fewer bugs.
-this is why virtualized envs are generally more fault-tolerant.
-benefits of virtualization include lower tco--lower cost, reliability, manageability etc.

-the hypervisor fools the overlying os by creating the illusion that the os is running in kernel mode.
-in reality, the os runs in user mode only and depends on the hypervisor for h/w access.
-all sensitive instructions issued by the guest os are turned into calls to the hypervisor.
-the hypervisor executes them on the real h/w; the guest os never actually interacts with the h/w.

-in neither type 1 nor type 2 virtualization does the guest os need to be modified in any form.
-paravirtualization is a special type of virtualization in which the guest os is modified to make hypervisor calls.
-such a guest os is said to be para-virtualized.
-the difference between a traditional vm (type 1/2) and a paravirtualized one is that:
    . in the former, the guest os issues hardware instructions that the hypervisor traps and emulates--ie the guest has to know the hardware.
    . in the latter, the guest os makes explicit calls to the hypervisor, and the hypervisor then drives the h/w--ie the guest os doesn't need to know the h/w.
-a paravirtualized guest os usually performs better than full virtualization because it avoids the overhead of trapping and emulating sensitive instructions.
-a common example of a paravirtualizing hypervisor is Xen.
-a common example of paravirtualized hypervisor is Xen.

... now we are at a point where we can discuss Linux in a greater depth..

Sunday, April 17, 2011

Linux: OS Basics - 2

... continued ...

-process    - a program in execution.
-the program file exists on the disk. To execute, it must be read from disk to memory where the cpu operates on it.
-the process is fundamentally a container that holds all the info needed to run a program.
-every process is uniquely identifiable by its process id or pid.
-address space    - the space in memory where the process exists.
-This means every process has an associated address space that serves as its container.
-address space values are from zero to a certain maximum.
-In a multi-programming system where multiple programs compete for cpu-time, the kernel does this:
    . listen to system calls from processes
    . upon system call traps, the kernel freezes the info in a process's address space & saves it
    . then the kernel assigns another process to run on the cpu.
    . the kernel repeats the above on and on.
-note: a process must be resumed from exactly the same state in which it was saved.
-the info in the address-space is:
    . the executable program    - the code
    . program's data         - the variables and temp values
    . program's stack        - function call frames and local variables
-process table    - a table maintained by the kernel with one entry per running process.
-the process table entry holds the rest of a process's state: registers, program counter, open files, alarms etc.
-for a given system, the addressable space is 2^N bytes, where N is the cpu bitsize (eg 32-bit or 64-bit).
-if main memory is less than the addressable space, virtual memory helps.
-together they can map the entire addressable space.

-in contrast to processes, which have independent address spaces, what if two lines of execution shared the same address space?
-threads are exactly that. threads are execution structures having a shared address space in memory.
-threads form part of a process. in other words a process can have multiple threads.
-threads improve efficiency by parallelizing execution of instructions for a process.
-threads are especially more efficient and relevant in multicore chips.
-in a nutshell, threads are like mini-processes that live inside a process's address space.


-System calls - a mechanism for the user programs to communicate with the kernel.
-In Posix, there are about 100 standard system calls.
-In contrast, windows has thousands of calls (usually called API calls).
-note: from previous discussion, remember, user programs exist in user-space and kernel exists in kernel-space.
-examples of system calls - fork, exit, open, read, write, close, kill etc.
-to create a new process, an existing parent process forks and creates an identical copy of itself.
-the identical copy has its own address space and pid.

-kernels can be of two types - microkernel and monolithic.
-microkernel OSes are more modular: only a small set of modules runs in kernel mode, all else is in user space.
-eg: device drivers run in user space, so a bug in a driver crashes only that driver, not the system.
-a microkernel os may be modular and arguably more robust, but not necessarily fast.
-eg: Minix, Symbian, PikeOS.
-monolithic kernels are not so modular.
-they have a big chunk of modules running in kernel mode and very few in user space.
-eg: all device drivers exist as part of the kernel.
-a monolithic os may be a bit bigger but is faster, as its components call each other directly instead of passing messages across protection boundaries.
-eg: Unix, Linux.

... continued ...

Linux: OS Basics - 1

( Prior to going into Linux Specifics, let us review some basics in OS )

Operating System (OS) Basics:
-OS is a layer of software that manages computer hardware, hides complexity from users and runs programs for them.
-users interact with the OS via a shell (command line) or desktop (graphical interface).
-the part of the OS that sits between the user interface and the hardware is called the 'kernel'.
-the kernel is the essence of an OS--therefore the two terms are sometimes used interchangeably in jargon.
-an OS has two modes of operation -- kernel mode and user mode -- and the cpu switches between them.
-kernel mode runs the kernel part of the OS.
-user   mode runs the rest of the OS (ie all except the kernel, like the shell, gui, applications etc).
-in multi-user systems, the programs running in user mode are many.
-but behind those multiple programs there is only one kernel.
-the kernel creates an illusion of multi-tasking by rapidly switching between multiple user programs and serving them.
-this means that at any given instant, only one program (per cpu) is accessing the hardware via the kernel.

-useful terms:
-multiplexing        - sharing a resource between multiple programs
-multiprogramming    - running multiple programs simultaneously
-multitasking        - same as multiprogramming
-multi-user        - serving multiple users simultaneously
-multiprocessing    - running multiple processes simultaneously for the programs
-multithreading    - running multiple threads within each process
-multicore        - microprocessors with multiple independent cores built in.

-system call - a request made by the user program (in user mode) to receive services from the kernel (in kernel mode)
-system calls raise a trap or interrupt request and thereby gain the attention of the kernel and processor.

-types of memory:
-registers    - fastest, costliest, smallest. live in the cpu. as fast as cpu (1 nanosec response)
-cache        - small fast memory inside or next to the cpu, holding copies of main memory. L1 and L2 cache. (2 nanosec response)
-main memory    - physical ram. Large and cheaper (10 nanosec response)
-disk/virtual    - large, cheap, slow (10 millisec response)
-tapes        - largest, cheapest, slowest (100 sec response)

-virtual memory - part of the disk that serves to store memory pages temporarily and thereby extends main memory.
-mmu - the memory mgmt unit is the part of the cpu that maps virtual memory addresses to physical memory locations.
-this enables a system to run more programs than can fit in the main physical memory.
-bus - bus is the high speed communication channel on which various components of the computer talk.
-the cpu, memory, disk, peripheral are all connected to the bus.

... continued ...

Wednesday, April 13, 2011

Linux: Perl - 3

... continued ...

-Perl control structures:
-    if (expr) {...}
    unless (expr) {...}
-    if (expr) {...} else {...}
-    if (expr) {...} elsif {...} .. else {...}
-    foreach|for var (list) {...}
-    foreach|for (expr1; expr2; expr3) {...}
-    while (expr) {...}
-    until (expr) {...}

-useful functions/expressions:
-    my        - limits scope of a variable
-    die        - sends its argument to stderr and aborts program
-    chomp        - removes trailing newlines
-    say        - prints with a newline
-    print        - prints output on stdout
-    open        - opens a file
-    sort        - returns elements of an array in order
-    reverse        - returns elements of an array in reverse order
-    $!        - last system error
-    @ARGV        - array holding the arguments from the cmdline
-    %ENV        - env variables of the shell that called perl
-    split        - divides a string into substrings
- shift, push, pop, slice - array functions
-subroutines are distinct function blocks apart from the main program.
-subroutines can be called by other programs.
-this keeps the program structure modular.
-variables defined in the main prog are available to the subroutines.
-variables in a subroutine are global by default; declare them with 'my' to give them local scope.
-subs are defined with the 'sub' keyword, a name and a block {}
-eg: sub asub {...}
-they are called by their name and ()
-eg: asub()
-params passed to a subroutine arrive in the array @_; individual elements are accessed as $_[0], $_[1] etc.
-cpan is the repository for perl docs, faqs and modules.
-perl modules come as tar files.
-in the module, first follow the 'readme' file.
-then create the config by running 'perl Makefile.PL'
-then run 'make' on the Makefile just created to build the pkg.
-then run the 'make test' to test the module works.
-then run the 'make install' as root to install the module.
-post install, you can query perldoc to review the documentation.


Tuesday, April 12, 2011

Linux: Perl - 2

... continued ...

-Perl was designed to be close to human languages.
-For instance, perl distinguishes between singular and plural data.
-also, perl's 'say' cmd is like print but adds a newline and sounds more natural.
-In Perl, a variable comes into existence when you assign value to it--no need to declare first.
-But if you want, you can use 'use strict' clause to require explicit declaration.
-also, you refer the variable each time with the identifiers $ (scalar), @ (array) or % (hash)
-when you define a variable with the lexical keyword 'my', it is recognized only within the enclosing block (ie local scope).
-this is useful when you have many subroutines and each needs local scope definition to not interfere with others.
-scalar variables begin with $ sign and hold a string or number.
-eg: $name = "Sammy" or $n1 = 5;
-Perl is smart enough to determine from context if a scalar is string or number.
-running a perl prog with -w shows runtime warnings, if any.
-running a perl prog with -c checks syntax without executing it (-e, by contrast, runs code given on the command line).
-array variables hold multiple scalars.
-eg: @arrayvar = (8, 18, "Sammy"). It is a plural variable.
-a hash variable is a plural variable that holds a set of key-value pairs.
-hashes are unordered; you need not have a sequence and can skip a key (eg 0 1 3 instead of 0 1 2 3).
-eg: %hashvar = (boat => "seastar", numfive => 5, 4 => "fish");
-in the above, the first key is boat, the 2nd is numfive and the third is 4 (a numeric).
-so the keys were 2 string types and one numeric, ie no order or common type reqd; that's a hash.
... continued ...

Monday, April 11, 2011

Linux: Perl - 1

Perl:
-A language based on awk, C, sed and many others, it was created by Larry Wall in 1987.
-Initially its main intent was text processing, now it has grown to be a full blown language.
-Perl is smart and efficient. Elegance is left out for others..
-Perl has thousands of 3rd party modules, that enhance its smartness and efficiency.
-Perl authoritative source is www.cpan.org and its home page is www.perl.com.
-The best way to learn Perl is to work with it.

-The first line of a perl script begins with #!/usr/bin/perl.
-this tells the shell to pass the script to perl for execution.
-To debug perl:
    . perl -w    -shows warnings
    . use strict    -imposes order
    . perl -d    -the perl debugger (documented in perldebug)
-perl documentation is in package perl-doc. Install it first before using Perl.
    # yum install perl-doc
-perldoc allows you to get easy documentation and also create your own.
-perldoc is similar to manual pages but specific to perl.
-Some perl cmds:
    # perl anyprog.pl        - run a program
    # perl -e 'print "hi\n"'    - runs perl code in place on the cmd line.
    # perl -v            - version
   
-Some perl terms:
    . module    - self contained chunk of code
    . block        - zero or more statements delimited by curly brackets {}
    . list        - series of scalars, scalars being a type of variable
    . array        - variable holding a list in a definite order

...continued...

Sunday, April 10, 2011

Linux: Apache - 2

... continued ...

Setting up Apache:
-first make a backup copy of /etc/httpd/conf/httpd.conf
-in RHEL/Fedora, run 'system-config-httpd' utility--it is simpler than editing httpd.conf file directly.
-you can setup Apache to listen simultaneously at multiple ip/ports by setting the 'Listen' directive.
-eg:    Listen    80                [ all interfaces, port 80 ]
    Listen    192.168.1.1:8070
-the ServerName directive sets the FQDN for the Apache webserver
-eg:    ServerName www.example.com:80
-besides the above, you can set other important directives like ServerAdmin and DocumentRoot.
-then there are multiple optional/additional directives for MaxClients, Timeout, Errorlog etc..
-Don't forget to restart the Apache httpd daemon.
    # /sbin/service httpd restart | start | stop | graceful

-Redirects:
-Apache can respond to a request for a URI by asking the client to request a different URI. This is called a 'redirect'.
-This works because redirection is part of http implementation.
-This is especially useful in cases of website-moves or new page additions.
-The ServerName and UseCanonicalName directives of the httpd.conf file are used in redirection.

-Virtual Hosts:
-Virtual hosts allow one instance of Apache to serve requests directed to multiple IPs/hostnames as if it were multiple servers.
-Each virtual host can then serve different content.
-virtual hosts can be set up by 'name' or by 'IP address'.
-virtual hosts by name are useful if you have only one shared IP address.
-virtual hosts are defined in the <VirtualHost> container of httpd.conf.
-in this container you can define new ServerName, DocumentRoot and other directives, different from the main server.
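as a sketch (hostnames and paths are hypothetical), two name-based virtual hosts sharing one address look like this in Apache 2.2 syntax:

```apache
# Apache 2.2: enable name-based matching on this address/port
NameVirtualHost *:80

<VirtualHost *:80>
    ServerName   www.example.com
    DocumentRoot /var/www/example
</VirtualHost>

<VirtualHost *:80>
    ServerName   blog.example.com
    DocumentRoot /var/www/blog
</VirtualHost>
```

Apache picks the block whose ServerName matches the request's Host: header; requests matching none fall through to the first block.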

-Troubleshooting:
-Some useful Apache Troubleshooting commands:
    # /sbin/service httpd configtest           - checks syntax of the httpd.conf file
    # /sbin/service httpd status               - checks the httpd daemon status
    # telnet <ip addr> 80                       - verifies the server is reachable from a client
    # cat .htaccess                                   - verifies authentication directives that restrict access to web pages
    # htpasswd -c .htpasswd mrinal        - sets a password for user mrinal (-c creates the file; omit it when adding more users)
Note: the typical location for .htpasswd file is /var/www/.htpasswd

Wednesday, April 6, 2011

Linux: Apache - 1

-Apache is the most popular web server on the Internet today.
-The reason for its popularity: robust and extensible + easy install, config and maintenance.
-Web servers work on client-server technology, the clients being the web browsers on users' computers.
-Web servers are oblivious to the content itself; server admin and content creation are two different functions.
-Apache design is modular. Parts of its code can be recompiled and loaded individually.
-Apache depends on three base pkgs - httpd, apr (apache portable runtime), apr-util
-other optional pkgs are mod_perl, mod_ssl, webalizer (logs) etc..
-To start/stop apache:
    # /sbin/service httpd start | stop | graceful        [graceful bounces apache without affecting current sessions]
-Apache uses tcp port 80 (or 443 for secure http).
-since Apache uses privileged ports, it is started as root.
-once started, for security the Apache processes run as user and group 'apache'.
-the root of the directory hierarchy that apache serves content from is called the document root.
-the default document root is /var/www/html.
-apache automatically serves the index.html file from this dir.
-the currently popular version of apache is 2.2
-default apache config file is /etc/httpd/conf/httpd.conf.
-three basic params in this file are:
    . servername    << name >>
    . serveradmin    << email addr >>
    . serversignature Email
-thereafter, bounce apache and point browser to http://<servername> to test

... continued ...

Tuesday, April 5, 2011

Linux: Email - 2

Concluding part of email discussion..

-configuring sendmail:
-config files reside in /etc/mail. Primary file is sendmail.cf.
-this file is not edited directly--the corresponding sendmail.mc file is edited.
-sendmail reads the corresponding *db files in the same dir.
-*db files are generated using 'makemap' cmd.
-the config files contain the 'dnl' keyword at the beginning and end of certain lines.
-dnl = delete to newline; it is an m4 macro that discards the rest of the line when sendmail.mc is processed, so dnl is used for comments.
-after editing sendmail.mc, restart sendmail to regenerate the sendmail.cf file:
    # /sbin/service    sendmail  restart
-certain other files exist in /etc/mail like:
    . mailertable    -    forwards email from one domain to another
    . access    -    controls which hosts may relay mail through this server
    . virtusertable    -    serves email for multiple domains, like an alias
-spamassassin is a tool for filtering spam, which accounts for about three quarters of all email
-spam is, more precisely, unsolicited commercial email.
-spamassassin analyzes the mail header, text, blacklists etc.
-spamassassin is accessed via the spamc cmd.
-eg:    $ echo 'hi there' | spamc
-webmail - mail accessed over the web eg: gmail
-webmail allows you to access email from anywhere on the Internet, unlike dedicated clients like thunderbird.

Monday, April 4, 2011

Linux: Email - 1

Here is the first of Email series:

-Sending and Receiving emails requires 3 pieces of s/w:
    . mua    - mail user agent    - the client used by end user - eg: thunderbird
    . mta    - mail transfer agent    - the email server - eg: sendmail=smtp server
    . mda    - mail delivery agent    - the deliverer at receiver site - eg: dovecot (runs pop, imap)..
-most mua's (eg thunderbird) don't interact directly with the mta (eg sendmail).
-they use services like pop / imap.
-This is esp useful when the sending host is a mobile device and the mua and mta are on different hosts.
-sendmail:
    . the smtp server uses port 25 by default for its work.
    . requires pkgs - sendmail (for work) and sendmail-cf (for config)
-while sendmail is predominant, it is complex. its alternatives are postfix and qmail.
-sendmail working:
    . when email is sent by mua, it is queued in /var/spool/mqueue
    . sendmail generates a df (datafile) and qf (queue file) for the msg (message).
    . sendmail then sends the file to the dest sendmail.
    . if any error occurs, sendmail creates a tf (temp file) and xf (error log).
    . on receiver side, the mda stores msgs in /var/spool/mail in mbox format.
    . in this dir, each user has a mail file.
    . sendmail logs are in /var/log/maillog.
    . to see status of outgoing mail run 'mailq' cmd or 'mailstats' cmd.
    . to read mail, run 'mail' cmd.
    . if delivery is to be made to alternate users (eg root, postmaster, webmaster), /etc/aliases can be used.
    . format of /etc/aliases is:
        alias:    whom-to
    -eg:    complaints:    root, mrinal@admin.com
        lost:        /dev/null
    . after /etc/aliases is edited, run the 'newaliases' cmd or bounce sendmail to recreate the aliases.db that sendmail reads.
    . while /etc/aliases is configured by root, individual users can redirect/forward their own mail using a .forward file.
    . the .forward file lives in the user's home dir

...continued...