Monday, October 17, 2016

Red Hat Cluster


1.            How can you define a cluster and what are its basic types?
                A cluster is two or more computers (called nodes or members) that work together to perform a task. There are four major types of clusters:
                Storage
                High availability
                Load balancing
                High performance
2.            What is a Storage Cluster?
Storage clusters provide a consistent file system image across servers in a cluster, allowing the servers to simultaneously read and write to a single shared file system.
A storage cluster simplifies storage administration by limiting the installation and patching of applications to one file system.
The High Availability Add-On provides storage clustering in conjunction with Red Hat GFS2.
3.            What is a High Availability Cluster?
High availability clusters provide highly available services by eliminating single points of failure and by failing over services from one cluster node to another in case a node becomes inoperative.
Typically, services in a high availability cluster read and write data (via read-write mounted file systems). 
A high availability cluster must maintain data integrity as one cluster node takes over control of a service from another cluster node. 
Node failures in a high availability cluster are not visible from clients outside the cluster. 
High availability clusters are sometimes referred to as failover clusters.
4.            What is a Load Balancing Cluster?
Load-balancing clusters dispatch network service requests to multiple cluster nodes to balance the request load among the cluster nodes. 
Load balancing provides cost-effective scalability because you can match the number of nodes according to load requirements. If a node in a load-balancing cluster becomes inoperative, the load-balancing software detects the failure and redirects requests to other cluster nodes. 
Node failures in a load-balancing cluster are not visible from clients outside the cluster. 
Load balancing is available with the Load Balancer Add-On.
5.            What is a High Performance Cluster?
High-performance clusters use cluster nodes to perform concurrent calculations. 
A high-performance cluster allows applications to work in parallel, therefore enhancing the performance of the applications. 
High performance clusters are also referred to as computational clusters or grid computing.
6.            How many nodes are supported in a Red Hat 6 cluster?
                A cluster configured with qdiskd supports a maximum of 16 nodes. The limit exists for scalability reasons: increasing the node count increases the amount of synchronous I/O contention on the shared quorum disk device.
7.            What is the minimum size of the Quorum Disk?
                The minimum size of the block device is 10 Megabytes.

8.            What is the order in which you will start the Red Hat Cluster services?
                In Red Hat 4 :
                # service ccsd start
                # service cman start
                # service fenced start
                # service clvmd start (if CLVM has been used to create clustered volumes)
                # service gfs start
                # service rgmanager start
                In Red Hat 5 :
                # service cman start
                # service clvmd start
                # service gfs start
                # service rgmanager start
                In Red Hat 6 :
                # service cman start
                # service clvmd start
                # service gfs2 start
                # service rgmanager start
9.            What is the order to stop the Red Hat Cluster services?
                In Red Hat 4 :
                # service rgmanager stop
                # service gfs stop
                # service clvmd stop
                # service fenced stop
                # service cman stop
                # service ccsd stop
                In Red Hat 5 :
                # service rgmanager stop
                # service gfs stop
                # service clvmd stop
                # service cman stop
                In Red Hat 6 :
                # service rgmanager stop
                # service gfs2 stop
                # service clvmd stop
                # service cman stop
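In each release listed above, the stop order is simply the start order reversed. A minimal Python sketch of that relationship follows; the `START_ORDER` table and the `service_commands` helper are illustrative names, not part of any Red Hat tool, and the sketch only builds command strings without running them.

```python
# Start order per release, copied from the answers above.
START_ORDER = {
    "Red Hat 4": ["ccsd", "cman", "fenced", "clvmd", "gfs", "rgmanager"],
    "Red Hat 5": ["cman", "clvmd", "gfs", "rgmanager"],
    "Red Hat 6": ["cman", "clvmd", "gfs2", "rgmanager"],
}


def service_commands(release, action):
    """Return the `service` commands for a release in the correct order.

    Hypothetical helper: stopping walks the start order backwards.
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    order = START_ORDER[release]
    if action == "stop":
        order = list(reversed(order))
    return ["service %s %s" % (name, action) for name in order]


if __name__ == "__main__":
    for command in service_commands("Red Hat 6", "stop"):
        print(command)
```

Keeping a single start-order table and deriving the stop order from it avoids the two lists drifting apart.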
10.          What are the performance enhancements in GFS2 as compared to GFS?
Better performance for heavy usage in a single directory
Faster synchronous I/O operations
Faster cached reads (no locking overhead)
Faster direct I/O with preallocated files (provided I/O size is reasonably large, such as 4M blocks)
Faster I/O operations in general
Faster execution of the df command, because of faster statfs calls
Improved atime mode to reduce the number of write I/O operations generated by atime when compared with GFS
GFS2 supports the following features.
extended file attributes (xattr)
the lsattr() and chattr() attribute settings via standard ioctl() calls
nanosecond timestamps
GFS2 uses less kernel memory.
GFS2 requires no metadata generation numbers.
Allocating GFS2 metadata does not require reads. Copies of metadata blocks in multiple journals are managed by revoking blocks from the journal before lock release.
GFS2 includes a much simpler log manager that knows nothing about unlinked inodes or quota changes.
The gfs2_grow and gfs2_jadd commands use locking to prevent multiple instances running at the same time.
The ACL code has been simplified for calls like creat() and mkdir().
Unlinked inodes, quota changes, and statfs changes are recovered without remounting the journal.
11.          What is the maximum file system support size for GFS2?
GFS2 is based on a 64-bit architecture, which can theoretically accommodate an 8 EB file system.
However, the current supported maximum size of a GFS2 file system for 64-bit hardware is 100 TB. 
The current supported maximum size of a GFS2 file system for 32-bit hardware for Red Hat Enterprise Linux Release 5.3 and later is 16 TB. 
NOTE: It is better to have 10 1TB file systems than one 10TB file system.
12.          What is a journaling filesystem?
A journaling filesystem is a filesystem that maintains a special file called a journal that is used to repair any inconsistencies that occur as the result of an improper shutdown of a computer.
GFS2 is a journaling file system: every time it writes metadata, the metadata is committed to the journal before it is put into place.
This ensures that if the system crashes or loses power, you will recover all of the metadata when the journal is automatically replayed at mount time.
GFS2 requires one journal for each node in the cluster that needs to mount the file system. For example, if you have a 16-node cluster but need to mount only the file system from two nodes, you need only two journals. If you need to mount from a third node, you can always add a journal with the gfs2_jadd command.
13.          What is the default size of journals in GFS2?
                When you run mkfs.gfs2 to create a GFS2 file system without specifying the journal size, a 128MB journal is created by default, which is enough for most applications.
                Reducing the journal size can severely affect performance. Suppose you reduce the journal to 32MB: it does not take much file system activity to fill a 32MB journal, and when the journal is full, performance slows because GFS2 has to wait for writes to the storage.
14.          What is a Quorum Disk?
Quorum Disk is a disk-based quorum daemon, qdiskd, that provides supplemental heuristics to determine node fitness.
With heuristics you can determine factors that are important to the operation of the node in the event of a network partition.
                In a 3-node cluster where each node has one vote, the cluster stays quorate as long as 2 of the 3 nodes are active, i.e. more than half of the votes are present. But what if the 2nd node also stops communicating with the 3rd node? Under a normal architecture the cluster would lose quorum and stop working. For mission-critical environments and such scenarios we use a quorum disk: an additional shared disk that is visible to all the nodes, with the qdiskd service running on each node and a vote value assigned to the disk.
                Suppose in the above case the quorum disk is assigned 2 votes (number of nodes minus one, a commonly used value). The expected votes are then 5 (3 nodes + 2 qdisk) and quorum needs 3 votes. Even after 2 nodes stop communicating with the 3rd node, the surviving node holds 3 votes (1 from the node + 2 from qdisk), which is still more than half of the total vote count. Both inactive nodes would then be fenced, and the 3rd node would stay up and running as part of the cluster.
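The vote arithmetic behind a quorum disk can be checked with a short sketch. It assumes one vote per node and the usual majority rule (quorum needs strictly more than half of the expected votes); the function names are illustrative and not part of cman or qdiskd.

```python
def quorum_threshold(expected_votes):
    """Votes needed for quorum: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(node_votes, qdisk_votes, expected_votes):
    """True if the votes a partition can see reach the quorum threshold."""
    return node_votes + qdisk_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    nodes = 3  # assumption: each node carries exactly one vote
    for qdisk_votes in (0, 1, 2):
        expected = nodes + qdisk_votes
        # Only one node survives and it still sees the quorum disk.
        survives = is_quorate(1, qdisk_votes, expected)
        print("qdisk votes=%d expected=%d needed=%d lone node quorate=%s"
              % (qdisk_votes, expected, quorum_threshold(expected), survives))
```

Under these assumptions a lone node in a 3-node cluster stays quorate only once the quorum disk carries 2 votes; with 0 or 1 qdisk votes it falls short of the threshold.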
15.          What is rgmanager in Red Hat Cluster and its use?
rgmanager stands for Resource Group Manager.
rgmanager manages and provides failover capabilities for collections of cluster resources called services, resource groups, or resource trees.
It allows administrators to define, configure, and monitor cluster services. In the event of a node failure, rgmanager will relocate the clustered service to another node with minimal service disruption.
16.          What is luci and ricci in Red Hat Cluster?
luci is the server component of the Conga administration utility.
Conga is an integrated set of software components that provides centralized configuration and management of Red Hat clusters and storage.
luci is a server that runs on one computer and communicates with multiple clusters and computers via ricci.

ricci is the client component of the Conga administration utility.
ricci is an agent that runs on each computer (either a cluster member or a standalone computer) managed by Conga.
The ricci service needs to be running on all the client nodes of the cluster.
17.          What is cman in Red Hat Cluster?
cman is short for Cluster Manager.
CMAN is a distributed cluster manager and runs in each cluster node. 
It is responsible for monitoring, heartbeat, quorum, voting and communication between cluster nodes.
CMAN keeps track of cluster quorum by monitoring the count of cluster nodes.
18.          What are the different port numbers used in Red Hat Cluster?
                IP Port no.     Protocol    Component
                5404, 5405      UDP         corosync/cman
                11111           TCP         ricci
                21064           TCP         dlm (Distributed Lock Manager)
                16851           TCP         modclusterd
                8084            TCP         luci
                4196, 4197      TCP         rgmanager
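As a rough illustration of how the port list above might be turned into firewall rules, the sketch below builds one iptables command string per component. The rule shape (an ACCEPT rule inserted into the INPUT chain with the multiport match) is an assumption for illustration; a real deployment would normally also restrict source and destination addresses. The sketch only builds strings and does not run iptables.

```python
# (ports, protocol, component) rows copied from the port list above.
CLUSTER_PORTS = [
    ((5404, 5405), "udp", "corosync/cman"),
    ((11111,), "tcp", "ricci"),
    ((21064,), "tcp", "dlm"),
    ((16851,), "tcp", "modclusterd"),
    ((8084,), "tcp", "luci"),
    ((4196, 4197), "tcp", "rgmanager"),
]


def iptables_rules(rows):
    """Build one ACCEPT rule per component (illustrative rule shape)."""
    rules = []
    for ports, proto, component in rows:
        dports = ",".join(str(port) for port in ports)
        rules.append(
            "iptables -I INPUT -p %s -m multiport --dports %s -j ACCEPT  # %s"
            % (proto, dports, component)
        )
    return rules


if __name__ == "__main__":
    for rule in iptables_rules(CLUSTER_PORTS):
        print(rule)
```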

19.          How does the NetworkManager service affect Red Hat Cluster?
The use of NetworkManager is not supported on cluster nodes. If you have installed NetworkManager on your cluster nodes, you should either remove it or disable it.
                                # service NetworkManager stop
                                # chkconfig NetworkManager off
The cman service will not start if NetworkManager is either running or has been configured to run with the chkconfig command.
20.          What is the command used to relocate a service to another node?
                # clusvcadm -r service_name -m node_name
21.          What is a split-brain condition in Red Hat Cluster?
We say a cluster has quorum if a majority of nodes are alive, communicating, and agree on the active cluster members. For example, in a thirteen-node cluster, quorum is only reached if seven or more nodes are communicating. If the seventh node dies, the cluster loses quorum and can no longer function.
A cluster must maintain quorum to prevent split-brain issues.
If quorum were not enforced, a communication error on that same thirteen-node cluster could cause a situation where six nodes are operating on the shared storage while another six nodes are also operating on it, independently. Because of the communication error, the two partial clusters would overwrite areas of the disk and corrupt the file system.
With quorum rules enforced, only one of the partial clusters can use the shared storage, thus protecting data integrity.
Quorum doesn't prevent split-brain situations, but it does decide who is dominant and allowed to function in the cluster.
Quorum can be determined by a combination of communicating messages via Ethernet and through a quorum disk.
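The thirteen-node example can be worked through in a few lines of Python. This is a sketch that assumes one vote per node and no quorum disk; `quorate_partitions` is an illustrative name, not a cluster API.

```python
def quorate_partitions(total_nodes, partition_sizes):
    """For each partition, report whether it holds a majority of all nodes."""
    needed = total_nodes // 2 + 1
    return [size >= needed for size in partition_sizes]


if __name__ == "__main__":
    # Thirteen nodes need seven for quorum.
    print(quorate_partitions(13, [7, 6]))  # one side may keep using the storage
    print(quorate_partitions(13, [6, 6]))  # one node is down and the rest split evenly
```

Because two disjoint partitions can never both hold a majority, at most one side is ever allowed to keep operating on the shared storage.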
22.          What are Tie-breakers in Red Hat Cluster?
Tie-breakers are additional heuristics that allow a cluster partition to decide whether or not it is quorate in the event of an even-split - prior to fencing. 
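A typical tie-breaker is an IP tie-breaker (sometimes called a ping node). With such a tie-breaker, nodes not only monitor each other, but also an upstream router that is on the same path as cluster communications. If the two nodes lose contact with each other, the one that wins is the one that can still ping the upstream router. There are cases, however, where both nodes can still see the upstream router but not each other, which results in a split brain. That is why, even when using tie-breakers, it is important to ensure that fencing is configured correctly.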
CMAN has no internal tie-breakers for various reasons. However, tie-breakers can be implemented using the API.
23.          What is fencing in Red Hat Cluster?
Fencing is the disconnection of a node from the cluster's shared storage. 
Fencing cuts off I/O from shared storage, thus ensuring data integrity. 
The cluster infrastructure performs fencing through the fence daemon, fenced.
When CMAN determines that a node has failed, it communicates to other cluster-infrastructure components that the node has failed. 
fenced, when notified of the failure, fences the failed node. 
24.          What are the various types of fencing supported by High Availability Add On?
                Power fencing: A fencing method that uses a power controller to power off an inoperable node.
                Storage fencing: A fencing method that disables the Fibre Channel port that connects storage to an inoperable node.
                Other fencing: Several other fencing methods that disable I/O or power of an inoperable node, including IBM Bladecenters, PAP, DRAC/MC, HP ILO, IPMI, IBM RSA II, and others.
25.          What are the lock states in Red Hat Cluster?
                A lock state indicates the current status of a lock request. A lock is always in one of three states:
                Granted: The lock request succeeded and attained the requested mode.
                Converting: A client attempted to change the lock mode and the new mode is incompatible with an existing lock.
                Blocked: The request for a new lock could not be granted because conflicting locks exist.
                A lock's state is determined by its requested mode and the modes of the other locks on the same resource.
26.          What is the DLM lock model?
DLM is short for Distributed Lock Manager.
A lock manager is a traffic cop who controls access to resources in the cluster, such as access to a GFS file system.
GFS2 uses locks from the lock manager to synchronize access to file system metadata (on shared storage).
CLVM uses locks from the lock manager to synchronize updates to LVM volumes and volume groups (also on shared storage).
In addition, rgmanager uses DLM to synchronize service states.

Without a lock manager, there would be no control over access to your shared storage, and the nodes in the cluster would corrupt each other's data.

Veritas Volume Manager and Veritas Cluster


1.            What is the difference between Failing and Failed?
                Failing :
                                Failing means the disk is about to fail. On a failing disk the private region is still available but the public region is not, so we can recover the data using the private region.
                Failed :
                                Failed means the disk has already failed. On a failed disk neither the private nor the public region is available, so we cannot recover the data. The only option is to replace the disk and restore the data from backup.
2.            What are the daemons of Veritas Volume Manager?
                (a)           vxconfigd :
                                (i)   This is the main daemon in Veritas Volume Manager.
                                (ii)  It maintains the Volume Manager configuration information.
                                (iii) Its configuration database always resides in the private region of the disk.
                                (iv)  It communicates with the kernel and updates the volume states in the configuration database.
                                (v)   It always starts before the root ( / ) file system is mounted.
                (b)           vxiod :
                                (i)   This is used to handle I/O (input and output) operations.
                                (ii)  It also defines how many I/O operations can run at a time.
                (c)           vxrelocd :
                                (i)   It monitors the consistency of the disks and notifies the user of failures through the vxnotifyd daemon.
                                (ii)  It also relocates data and recognizes the new disk.
                (d)           vxrecoverd :
                                (i)   It moves the lost data onto the new disk.
                                (ii)  It also notifies the administrators through the vxnotifyd daemon.
                (e)           vxnotifyd :
                                (i)   It notifies the user (administrator) about failed disks, and notifies the administrator again after recovery.
3.            How to create the root mirror?
                (i)            Bring the disk from O/S control to Veritas Volume Manager control using the menu-driven administration tool, the # vxdiskadm command (it displays options for easy administration of Veritas Volume Manager).
                (ii)           Select the 2nd option, i.e. Encapsulation, to preserve the existing data on the disk, then reboot the system for the encapsulation to take effect and modify the /etc/sysconfig file. While encapsulating, it asks for the disk name and disk group (root disk name and rootdg).
                (iii)          Back up the / (root) and /etc/sysconfig directories.
                (iv)           Take another disk and initialize it with the # vxdisksetup   -i       command.
                (v)            Add the initialized disk to the disk group, i.e. rootdg, by
                                # vxdg    -g        adddisk    mirrordisk=
                (vi)           # vxmirror    -v    -g       (disk-level mirroring)
                (vii)          For mirroring individual volumes,   # vxassist     -g         mirror        or
                                # vxrootmir    -g          command.

4.            What is the service group in Veritas Cluster?
                A service group is made up of resources and their links (dependencies), which are normally required to maintain high availability for the application.
5.            What is the use of the 'halink' command?
                The # halink command is used to link the dependencies of the resources.
6.            What are the differences between switchover and failover?
                Switchover :
                                (i)   Switchover is a manual task.
                                (ii)  We can switch over service groups from an online cluster node to an offline cluster node in case of power outage, hardware failure, scheduled shutdown or reboot.
                Failover :
                                (i)   Failover is an automatic task.
                                (ii)  Failover moves the service group to the other node when the Veritas Cluster heartbeat link is down, damaged or broken because of some disaster or system hang.

7.            Which is the main configuration file for VCS (Veritas Cluster) and where is it stored?
                'main.cf' is the main configuration file for VCS and it is located in the /etc/VRTSvcs/conf/config directory.
8.            What are the public region and the private region?
                When we bring a disk from O/S control to Volume Manager control in any format (CDS, simple or sliced), the disk is logically divided into two parts.
                (a)           Private region :
                                It contains Veritas configuration information such as disk type and name, disk group name, group ID and configdb. The default size is 2048 KB.
                (b)           Public region :
                                It contains the actual user data such as applications, databases and others.
9.            There are five disks in VxVM (Veritas Volume Manager) and all have failed. What steps do you follow to bring those disks online?
                (i)            Check the list of disks under Volume Manager control with the # vxdisk  list command.
                (ii)           If the disks are not present, bring them from O/S control to VxVM control with
                                # vxdisksetup     -i                        (if there is no data on those disks)     or execute
                                # vxdiskadm     and select the 2nd option, i.e. the encapsulation method, if the disks hold data.
                (iii)          If that is still not possible, check whether the disks are available at O/S level with the # fdisk    -l command.
                                (a)  If the disks are available, execute the above command once again.
                                (b)  If the disks are not available, make the O/S recognize them by scanning the hardware.
                (iv)           If that is still not possible, reboot the system and follow steps (i) and (ii).
10.          What is the basic difference between a private disk group and a shared disk group?
                Private disk group :
                The disk group is visible only to the host on which it was created. If the host is part of a cluster, the private disk group will not be visible to the other cluster nodes.
                Shared disk group :
                The disk group is sharable and visible to the other cluster nodes.
11.          How will you create a private disk group and a shared disk group?
                # vxdg    init    =                    (to create the private disk group)
                # vxdg    -s   init   =                (to create the shared disk group)
12.          How will you add a new disk to an existing disk group?
                We can do this in two ways.
                (i)            Run the # vxdiskadm command, which opens a menu-driven program for various disk operations. Select the add disk option and give the disk group name and disk name.
                (ii)           # vxdg    -g        adddisk    =
                                Example:  # vxdg   -g   appsdg   adddisk   disk02=/dev/sdb
13.          How will you grow or shrink the volume/file system? What is the meaning of the growby, growto, shrinkby and shrinkto options?
                (i)            We can grow the volume/file system by,
                                # vxassist    -g   appsdg   growby   appsvol   100g       (or growto instead of growby)       (or)
                                # vxresize    -g   appsdg   appsvol   +100g    alloc =
                (ii)           We can shrink the volume/file system by,
                                # vxassist    -g    appsdg    shrinkby    appsvol   20g
                                # vxassist    -g    appsdg    shrinkto    appsvol   20g                                    (or)
                                # vxresize    -g    appsdg    appsvol    -10g                    (to shrink by 10 GB)
                                # vxresize    -g    appsdg    appsvol    10g                     (to shrink to 10 GB)
                Meanings :
                                growby :
                                Grows the volume/file system by adding the given size to the existing size.
                                growto :
                                Grows the volume/file system up to the given size. The given size is the final size; it is not added to the existing one.
                                shrinkby :
                                Shrinks the volume/file system by subtracting the given size from the existing size.
                                shrinkto :
                                Shrinks the volume/file system down to the given size. The given size is the final size; it is not subtracted from the existing one.
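The difference between the relative options (growby, shrinkby) and the absolute ones (growto, shrinkto) can be sketched as plain arithmetic. The `resize` function below is a hypothetical model of the resulting size in GB, not a wrapper around vxassist or vxresize.

```python
def resize(current_gb, operation, value_gb):
    """Return the new size in GB after applying one of the four options."""
    if value_gb <= 0:
        raise ValueError("size must be positive")
    if operation == "growby":
        return current_gb + value_gb        # relative: add to the current size
    if operation == "growto":
        if value_gb <= current_gb:
            raise ValueError("growto target must exceed the current size")
        return value_gb                     # absolute: the value is the final size
    if operation == "shrinkby":
        if value_gb >= current_gb:
            raise ValueError("cannot shrink by the whole size or more")
        return current_gb - value_gb        # relative: subtract from the current size
    if operation == "shrinkto":
        if value_gb >= current_gb:
            raise ValueError("shrinkto target must be below the current size")
        return value_gb                     # absolute: the value is the final size
    raise ValueError("unknown operation: %s" % operation)


if __name__ == "__main__":
    for op, value in (("growby", 100), ("growto", 100), ("shrinkby", 20), ("shrinkto", 20)):
        print("50 GB volume, %s %d GB -> %d GB" % (op, value, resize(50, op, value)))
```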
14.          If the vxdisk list command shows the disk status as "error", what steps do you follow to bring the respective disk online?
                This issue is mainly caused by a fabric disconnection, so first execute the # vxdisk    scandisks command. If that does not help, unsetup the disks using # /etc/vx/bin/vxdiskunsetup and set them up again using the # /etc/vx/bin/vxdisksetup command.
                Note : /etc/vx/bin/vxdiskunsetup removes the private region from the disk and destroys the data, so back up the data before using this command.

15.          What are the different layouts in VxVM?
                (i)            mirror
                (ii)           stripe
                (iii)          concat (default)
                (iv)           raid 5
                (v)            stripe-mirror
                (vi)           mirror-stripe
16.          How will you setup and unsetup disks explicitly using VxVM?
                # /etc/vx/bin/vxdisksetup                                                   (to setup the disks)
                # /etc/vx/bin/vxdiskunsetup                                              (to unsetup the disks)
17.          How will you list the disks which are in different disk groups?
                # vxdisk   list    or    # vxprint                                              (to list from the current or imported disk groups)
                # vxdisk    -o    alldgs    list                                               (to list all the disks, including those in other disk groups)

18.          Define LLT and GAB. What are the commands to configure them?
                LLT :
                                (i)   LLT means Low Latency Transport protocol.
                                (ii)  It monitors the kernel-to-kernel communication.
                                (iii) It maintains and distributes the network traffic within the cluster.
                                (iv)  It uses heartbeats between the interfaces.
                GAB :
                                (i)   GAB means Group Membership Services and Atomic Broadcast.
                                (ii)  It maintains and distributes the configuration information of the cluster.
                                (iii) It uses heartbeat between the disks.
                Commands :
                                # gabconfig    -a                    (to check the status of GAB, i.e. whether GAB is running or not)
                                                If port 'a' is listed, GAB is running; otherwise GAB is not running.
                                                If port 'b' is listed, I/O fencing is enabled; otherwise I/O fencing is disabled.
                                                If port 'h' is listed, the had daemon is working; otherwise the had daemon is not working.
                                # gabconfig    -c   -n   2           (to start GAB in a 2-system cluster, where 2 is the seed number)
                                # gabconfig    -U                    (to stop GAB)
                                # cat  /etc/gabtab                   (to see the GAB configuration; it contains a line such as)
                                                gabconfig      -c    -n  x           (where x is a number, i.e. 1, 2, 3, etc.)
                                # lltconfig    -a                    (to see the status of LLT)
                                # lltconfig    -c                    (to start LLT)
                                # lltconfig    -U                    (to stop LLT)
                                # lltstat    -nvv                    (to see the traffic status between the interfaces)
                                # lltstat    -C                      (to see the cluster ID)
                                # haclus    -display                 (to see all the information on the cluster)
                                # cat   /etc/llttab                  (to see the LLT configuration; the entries are)
                                                Cluster ID, host ID, interface MAC address, etc.
                                # cat   /etc/llthosts                (to see the nodes present in the cluster)
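The port check described for gabconfig -a above can be sketched as a small parser. The sample text below is only an illustration of the typical shape of the command's output (the generation numbers and membership values are made up), and `gab_status` is a hypothetical helper that works on captured text rather than calling gabconfig itself.

```python
import re

# Illustrative sample only; real generation numbers and memberships will differ.
SAMPLE_OUTPUT = """\
GAB Port Memberships
===============================================================
Port a gen a36e0003 membership 01
Port h gen fd570002 membership 01
"""

# Port letters and what their presence indicates, as described above.
PORT_MEANING = {"a": "GAB", "b": "I/O fencing", "h": "had daemon"}


def gab_ports(output):
    """Return the set of port letters that have a membership line."""
    return set(re.findall(r"^Port\s+(\w)\s+gen\b", output, flags=re.MULTILINE))


def gab_status(output):
    """Map each known component to True (port listed) or False (not listed)."""
    ports = gab_ports(output)
    return {name: (port in ports) for port, name in PORT_MEANING.items()}


if __name__ == "__main__":
    for name, up in gab_status(SAMPLE_OUTPUT).items():
        print("%-12s %s" % (name, "running" if up else "not running"))
```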
19.          How to check the status of the Veritas Cluster?
                # hastatus     -summary
20.          Which command is used to check the syntax of the main.cf?
                # hacf    -verify    /etc/VRTSvcs/conf/config
21.          How will you check the status of the individual resources of Veritas Cluster (VCS)?
                # hares    -state    
22.          What is the use of the # hagrp command?
                The # hagrp command is used for administrative actions on service groups, such as bringing a service group online, taking it offline, switching it, etc.
23.          How to switch over the service group?
                # hagrp    -switch   
24.          How to online the service group in VCS?
                # hagrp    -online        -sys   
25.          What are the steps to switch over the application from System A to System B?
                (i)            First unmount the file system on System A.
                (ii)           Stop the volume on System A.
                (iii)          Deport the disk group from System A.
                (iv)           Import the disk group on System B.
                (v)            Start the volume on System B.
                (vi)           Finally mount the file system on System B.
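The six steps above map onto a fixed command sequence, sketched below as a Python helper that only builds the command strings. The disk group and volume names (appsdg, appsvol) are reused from the earlier examples, the /apps mount point and the VxFS file system type are assumptions, and `switchover_steps` is a hypothetical name; the sketch does not execute anything.

```python
def switchover_steps(disk_group, volume, mount_point,
                     source="System A", target="System B"):
    """Return (system, command) pairs in the order they should be run."""
    device = "/dev/vx/dsk/%s/%s" % (disk_group, volume)
    return [
        (source, "umount %s" % mount_point),                         # (i)   unmount
        (source, "vxvol -g %s stop %s" % (disk_group, volume)),      # (ii)  stop volume
        (source, "vxdg deport %s" % disk_group),                     # (iii) deport
        (target, "vxdg import %s" % disk_group),                     # (iv)  import
        (target, "vxvol -g %s start %s" % (disk_group, volume)),     # (v)   start volume
        (target, "mount -t vxfs %s %s" % (device, mount_point)),     # (vi)  mount
    ]


if __name__ == "__main__":
    for system, command in switchover_steps("appsdg", "appsvol", "/apps"):
        print("%s: # %s" % (system, command))
```

The order matters: the disk group cannot be deported while its volumes are in use, so the unmount and volume stop must complete on the source before the import on the target.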
26.          How many types of clusters are available?
                (i)            Hybrid Cluster.
                (ii)           Parallel Cluster.
                (iii)          Failover Cluster.
27.          What is meant by seeding?
                Normally, we define how many nodes must be up before the cluster starts, either at boot time or explicitly by executing the
                # gabconfig    -c   -n  2    command. Here 2 means that 2 nodes are needed to seed (start) the cluster. This number is called the seed number.
28.          What is the split brain issue in VCS and how is it resolved?
                A split brain issue means that multiple systems use the same exclusive resources, usually resulting in data corruption.
                Normally VCS is configured with multiple nodes that communicate with each other. When a system loses power or crashes, VCS assumes the system has failed and tries to move the service group to another system to maintain high availability. However, the communication (heartbeat) can also fail because of network failures.
                If the network connections between any two groups of systems fail simultaneously, a network partition occurs. When this happens, the systems on each side of the partition can restart the applications from the other side, resulting in duplicate services. The most serious problem this causes is damage to the data on the shared disks.
                This split brain issue normally occurs in VCS 3.5 to VCS 4.0 versions. From VCS 5.0 onwards, I/O fencing (a new feature) is introduced to minimize the split brain issue. If I/O fencing is enabled in a cluster, we can avoid the split brain issue.
29.          What is Admin wait   and   Stale Admin wait?
                ADMIN-WAIT :
                If VCS is started on a system with a valid configuration file and the other systems are in the ADMIN-WAIT state, the new system transitions to the ADMIN-WAIT state    (or)
                If VCS is started on a system with a stale configuration file and the other systems are in the ADMIN-WAIT state, the new system transitions to the ADMIN-WAIT state.
                STALE-ADMIN-WAIT :
                The configuration files are normally in read-only mode. To make any changes, the configuration must first be put in read-write mode. While changes to the ' main.cf ' file are in progress, a hidden ' .stale ' file exists under the configuration directory. If the system is restarted or rebooted while changes are in progress, the cluster starts with the ' .stale ' file present. So, when VCS is started on a system with a stale configuration file, the system status is STALE-ADMIN-WAIT until another system in the cluster starts with a valid configuration file, or until we execute
                # hasys    -stale    -force         (or)    # hasys     -force         to start the system forcefully with the correct or valid configuration file.
30.          What is meant by a resource and how many types are there?
                A resource is a software or hardware component managed by VCS.
                Mount points, disk groups, volumes, IP addresses, ... etc., are the software components.
                Disks, interfaces (NIC cards), ... etc., are the hardware components.
                There are two types of resources and they are,
                (i)            Persistent Resources                 (we cannot put them on-line or off-line; VCS only monitors them, for example a NIC)
                (ii)           Non-Persistent Resources        (we can put them on-line or off-line)
                If a resource is in the faulted state, then clear the fault on the service group. Resources can be critical or non-critical. If a critical resource fails, the service group fails over automatically. If a non-critical resource fails, the service group does not fail over and we have to switch it over manually to another available system.
31.          What are the dependencies between resources in a Cluster?
                If one resource depends on another resource, then there is a dependency between those resources.
                Example :   A NIC (Network Interface Card) is a hardware component, ie., a hardware resource. The IP address is a software component, ie., a software resource, and it depends on the NIC. The relationship between the NIC and the IP address is a Parent - Child relationship. The resource that depends on another one is called the Parent resource, and the resource it depends on is called the Child resource. The child is started first, so here the NIC is the child and the IP address is the parent.
32.          What are the minimum requirements for VCS?
                (i)            Minimum  two identical (same configuration)  systems.
                (ii)           Two  switches  (Optical  Fibre  Channel).
                (iii) Minimum  three  NIC  cards.   (Two  NICs for private network  and  one  NIC  for  public network).
                (iv)          One  common  storage.
                (v)           Veritas  Volume  Manager  with  license.
                (vi) Veritas  Cluster  with  license.
33.          What are the Veritas Cluster daemons?
                (i)            had :
                                (a)  It is the main daemon in Veritas Cluster for high availability.
                                (b)  It monitors the cluster configuration and the whole cluster environment.
                                (c)  It interacts with all the agents and resources.
                (ii)           hashadow :
                                (a)  It always monitors the had daemon.
                                (b)  Its main function is logging about the cluster.
35.          What are the main configuration files  in  a  Cluster?
                *   /etc/VRTSvcs/conf/config/main.cf                and
                *  /etc/VRTSvcs/conf/config/types.cf                are the main configuration files in Cluster.
36.          What are the main log files in a Cluster?
                (i)            /var/VRTSvcs/log/engine_A.log           (logs when the cluster started, when it failed, when a failover occurred, when a switchover was forced, ... etc.,)
                (ii)           /var/VRTSvcs/log/hashadow_A.log       (logs about the hashadow daemon)
                (iii)          /var/VRTSvcs/log/agent_A.log             (logs about agents)
37.          What are the Cluster  components?
                (i)            Cluster.
                (ii)           Service groups.
                (iii)          Resources.
                (iv) Agents.
                (v)           Events.
38.          What is your  role in the  Cluster?
                Normally we will get requests  like,
                (i)            Add a node.
                (ii)           Add a resource.
                (iii) Add a service group.
                (iv)          Add a resource to the existing service group.
                (v)           Add mount points.
                And sometimes we get troubleshooting issues like,
                (i)            had daemon is not running.
                (ii)           Split brain issue.
                (iii)          Resources are faulted, so we restart the service groups and move the service groups from one node to another.
                (iv)          Cluster is not running.
                (v)           Communication failed between two nodes.
                (vi)          GAB and LLT are not running.
                (vii)         Resource not started.
                (viii)        main.cf and types.cf files corrupted.
                (ix)          I/O fencing (a locking mechanism to avoid the split brain issue) is not enabled (at disk level / SAN level).
                (x)           And the locks are,
                                (a)  engine.lock
                                (b)  ha.lock
                                (c)  agent.lock
39.          What are the statuses of a service group?
                (i)            online
                (ii)           offline
                (iii)          partial
                *              If a non-critical resource has failed, then the service group may be in the partial state.
                *              If a critical resource has failed, then the service group may be in the offline state.
40.          How to move the service group  from  one node  to another node  manually?
                (i)            Stop the application.
                (ii)           Stop the database.
                (iii)          Unmount  the file system.
                (iv) Stop the volume.
                (v)           Deport the disk group.
                (vi)          Import the disk group.
                (vii) Start the volume.
                (viii) Mount the file system.
                (ix)          Start the database.
                (x)           Start the application.
41.          How to rename a disk group in VxVM, step by step?
                (i)            Stop the application.
                (ii)           Stop the database.
                (iii)          Unmount the file system.
                (iv)          Stop the volume.
                (v)           Deport the disk group.
                (vi)          Rename the disk group.
                (vii)         Import the disk group.
                (viii)        Start the volume.
                (ix)          Mount the file system.
                (x)           Start the database.
                (xi)          Start the application.
42.          How to create a volume with 4 disks?
                (i)            Bring the disks under O/S control by scanning for the new LUNs using the following command,
                                # echo    "- - -"     >    /sys/class/scsi_host/host< no. >/scan            (to rescan that SCSI host for new LUNs)
                (ii)           Bring those disks from O/S control to VxVM control.
                                (a)  If we want to preserve the data, then bring the disks to VxVM control using the encapsulation method by
                                       # vxdiskadm   (here we get the options to do this and select the 2nd option, ie., Encapsulation)
                                (b)  If we don't want to preserve the data, then bring the disks to VxVM control using the initialization method by   # vxdisksetup   -i   < disk >
                                                # vxdisksetup     -i    /dev/sda
                                                # vxdisksetup     -i    /dev/sdb
                                                # vxdisksetup     -i    /dev/sdc
                                                # vxdisksetup     -i    /dev/sdd
                                                # vxdisk   list                          (to see VxVM controlled disks)
                (iii)          Create a disk group.
                                # vxdg    init    appsdg    disk01=/dev/sda                 (for example, the diskgroup name is appsdg)
                (iv)          Add the remaining three disks to the above disk group.
                                # vxdg     -g     appsdg         adddisk      disk02=/dev/sdb
                                # vxdg     -g     appsdg         adddisk      disk03=/dev/sdc
                                # vxdg     -g     appsdg         adddisk      disk04=/dev/sdd
                                # vxdisk   -g   appsdg   list         (to see all the disks that belong to that diskgroup, for example appsdg)
                (v)           Create the volume (for the requested size and requested layout).
                                # vxassist       -g    appsdg      make    < volume >    < size >            (for example, the volume name is appsvol and the size is in TB/GB ... etc)
                (vi)          Create a file system on that volume.
                                # mkfs     -t   vxfs     /dev/vx/rdsk/appsdg/appsvol
                (vii)         Create the mount point and provide the requested permissions to that mount point.
                                # mkdir     /mnt/apps
                (viii)        Start the volume.
                                # vxvol     -g     appsdg     start    appsvol
                (ix)          Mount the file system on the above mount point.
                                # mount     -t     vxfs     -o    < rw | ro >     /dev/vx/dsk/appsdg/appsvol     /mnt/apps
                                   (where    rw  means   read-write     and     ro  means    read-only)
                (x)           Put the entry into  the    "/etc/fstab"    file  for permanent mount.
                                *    If the volume is created for cluster,  don't  put the entry in   /etc/fstab    file.
                (xi)          And finally send the mail to client  or  requested person
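                The whole procedure above can be sketched as one shell function that prints the command sequence for any number of disks. It is a dry run only: nothing is executed, the function name volume_plan is hypothetical, and the disk group, volume, size and device names are illustrative values passed in by the caller.

```shell
#!/bin/sh
# volume_plan DG VOLUME SIZE MOUNTPOINT DISK...
# Prints the commands to initialise the disks, build the disk group,
# create the volume, make a VxFS file system and mount it.
volume_plan() {
    dg=$1; vol=$2; size=$3; mnt=$4
    shift 4
    n=0
    for disk in "$@"; do
        n=$((n + 1))
        name=$(printf 'disk%02d' "$n")      # disk01, disk02, ...
        echo "vxdisksetup -i $disk"
        if [ "$n" -eq 1 ]; then
            echo "vxdg init $dg $name=$disk"        # first disk creates the group
        else
            echo "vxdg -g $dg adddisk $name=$disk"  # the rest are added to it
        fi
    done
    echo "vxassist -g $dg make $vol $size"
    echo "mkfs -t vxfs /dev/vx/rdsk/$dg/$vol"
    echo "mkdir -p $mnt"
    echo "mount -t vxfs /dev/vx/dsk/$dg/$vol $mnt"
}

volume_plan appsdg appsvol 50g /mnt/apps sda sdb sdc sdd
```

                Each disk gets a unique disk media name (disk01 to disk04), which matters because two disks in one disk group cannot share a name.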
43.          What is the difference between  Global Cluster  and Local Cluster?  Have you configured the Global  Cluster?
                Local  Cluster :
                If all the nodes in a Cluster are placed in the same location, that Cluster is called a Local Cluster.
                Global  Cluster :
                If the nodes in a Cluster are placed in different geographical locations, that Cluster is called a Global Cluster. The main advantage of a global cluster is high availability when natural calamities or disasters occur.

                *   No,  I  haven't configured the  Global  Cluster.
44.          How to start  and  stop  the  Cluster?
                # hastart                                        (to start the local node in the Cluster)
                # hastart     all                               (to start all the nodes in the Cluster)
                # hastart     -sys     < system >       (to start a specified system or node in the Cluster)
                # hasys    -force     < system >         (to forcefully start the system in the Cluster)
                # hastop     -local                            (to stop the local node in the Cluster)
                # hastop     -all                               (to stop all the systems in the Cluster)
                # hastop     -sys     < system >        (to stop the specified system or node in the Cluster)
45.          What is the  Service group  and  Resource?
                Service group :
                (i)            A collection or group of physical and logical resources is called a service group.
                (ii)           Moving a service group from one system to another means moving its resources from one system to another.
                Resources :
                (i)            A resource is a software or hardware component. Diskgroups, volumes, IP addresses and mount points are software resources; disks and NIC cards are hardware resources.
                (ii)           A property of a resource is known as an Attribute, and each attribute has a value.
                                Example :  (a)  System list is an attribute of System A or System B.
                                                     (b)  Auto start is an attribute of a System.
                Resource          Attribute              Value
                NIC               IP address             192.168.1.1
                Diskgroup         diskgroup name         appsdg
                Disk              disk name              disk01
                Interface         Interface name         eth0
               
                (iii)          There are two types of resources. Some resources we can start/stop and some other resources we cannot start or stop.
                                (a)  Persistent Resource :
                                       Those resources which we cannot start or stop are called Persistent resources.
                                       Example :  We cannot start or stop the NIC card.
                                (b)  Non - Persistent Resource :
                                       Those resources which we can start/stop are called Non - Persistent Resources.
                (iv)          Resources may be critical or non-critical. We need to designate each resource as critical or non-critical, ie., the customer decides which is critical and which is non-critical.
                (v)           Only if a critical resource fails is the service group moved automatically from one system to another, ie., failover. If a non-critical resource fails, we need to move the service group manually from one system to another, ie., switchover.
46.          What are the steps you follow to put the volume in a  Cluster?
                (i)            First create the diskgroup and the volume, create the file system, then mount and unmount it to test that the volume works before putting it in the cluster.
                (ii)           Create the service group and add the attributes to it.
                                # hagrp     -add    < service group >
                                Example:  # hagrp    -add    appssg

                Attributes :
                # hagrp   -modify   appssg   SystemList   sysA  0   sysB  1      (to add sysA and sysB to the system list of the service group; the numbers are priorities)
                # hagrp    -modify    appssg   AutoStartList   sysA                     (to start the service group automatically on sysA)
                # hagrp    -modify    appssg   Enabled    1   or   0            (1  means  start   and  0 means  not to start  automatically)
                (iii)          Create the resources, add them to the service group and specify their attributes.
                                For  file system :
                                (a)  /mnt/apps                      (the mount point)
                                (b)  appsvol                          (the volume name)
                                (c)  appsdg                           (the disk group)
                                # hares    -add   dg-apps     DiskGroup     appssg                (to add the diskgroup resource to a service group)
                                   (where   dg-apps  is the resource name,  DiskGroup  is the resource type  and  appssg  is the service group name)
                                # hares    -modify   dg-apps    DiskGroup    appsdg            (to set the diskgroup attribute of the resource)
                                # hares    -modify   dg-apps    Enabled   1                         (to enable the resource)
                                # hares    -add    dg-volume    Volume    appssg                  (to add the volume resource to a service group)
                                # hares    -modify    dg-volume    Volume   appsvol            (to set the volume attribute of the resource)
                                # hares    -modify   dg-volume    DiskGroup    appsdg        (to set the diskgroup of the volume)
                                # hares    -modify    dg-volume    Enabled   1                    (to enable the volume resource)
                                # hares    -modify    dg-volume    Critical   1                     (to make the resource critical)
                                # hares    -add    dg-mnt    Mount    appssg                        (to add the mount point resource to a service group)
                                # hares    -modify    dg-mnt    BlockDevice    /dev/vx/dsk/appsdg/appsvol       (to set the block device of the mount resource)
                                # hares    -modify    dg-mnt     FSType    vxfs                     (to set the file system type of the mount resource)
                                # hares    -modify    dg-mnt     MountPoint    /mnt/apps       (to set the mount point directory of the mount resource)
                                # hares    -modify    dg-mnt     FsckOpt    %-y    or    %-n     (to set the fsck option to either yes or no)
                (iv)          Create links between the above diskgroup, volume and mount point resources. The parent resource is the one that depends on the child resource.
                                # hares     -link    parent-res     child-res
                                # hares     -link     dg-volume     dg-apps
                                # hares     -link     dg-mnt          dg-volume
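                The link order is easy to get backwards, so here is a minimal sketch that derives the link commands from the start order of the resources. It assumes the VCS convention that hares -link takes the parent (the dependent resource) first and the child second, and that the configuration is opened with haconf -makerw and saved with haconf -dump -makero. The function name link_chain is hypothetical and the commands are only printed.

```shell
#!/bin/sh
# link_chain RES1 RES2 RES3 ...
# Resources are given in start order (RES1 comes online first).
# Each later resource is the parent of the one before it.
link_chain() {
    child=$1
    shift
    for parent in "$@"; do
        echo "hares -link $parent $child"
        child=$parent
    done
}

echo "haconf -makerw"                    # open the configuration read-write
link_chain dg-apps dg-volume dg-mnt      # diskgroup, then volume, then mount
echo "haconf -dump -makero"              # save and return to read-only
```

                Listing the resources in the order they must come online (disk group, volume, mount point) keeps the dependency direction explicit.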
47.          What is meant by freezing and unfreezing a service group with the persistent and evacuate options?
                Freezing :
                If we want to apply patches to a system in a cluster, then we have to freeze the service group first. Otherwise, when the service group is stopped and it is critical, it moves automatically to another system in the Cluster. Since we don't want the service group to move from one system to another during the maintenance, we have to freeze the service group.
                Unfreeze :
                After completing the task, the service group should be unfrozen. While it stays frozen, if the system crashes or goes down and the resources are critical, the service group cannot move from system 1 to system 2, and the application becomes unavailable. Once the service group is unfrozen after maintenance, it can move from system 1 to system 2. So, if system 1 fails, system 2 is available and the application is also available.
                Persistent option :
                If the service group is frozen with the persistent option, then we can stop, bring down or restart the system and the freeze is not lost: after the system restarts, the service group remains in the frozen state.
                Example :  # hasys     -freeze     -persistent    < system >
                                                # hasys     -unfreeze     -persistent    < system >
                Evacuate :
                If this option is used when freezing a system, all the service groups are first moved (evacuated) from system 1 to another system 2 before the freeze takes effect.
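                A patching window with a persistent freeze can be sketched as follows. This sketch uses hagrp -freeze and hagrp -unfreeze, the service group counterparts of the hasys example above, and assumes that a persistent freeze needs the configuration to be in read-write mode. The function name freeze_plan is hypothetical, appssg is the example service group, and the commands are only printed.

```shell
#!/bin/sh
# freeze_plan SERVICE_GROUP
# Prints the commands around a maintenance window.
freeze_plan() {
    grp=$1
    echo "haconf -makerw"                      # open configuration
    echo "hagrp -freeze $grp -persistent"      # freeze survives a reboot
    echo "haconf -dump -makero"                # save configuration
    echo "# ... apply patches, reboot if needed ..."
    echo "haconf -makerw"
    echo "hagrp -unfreeze $grp -persistent"    # allow failover again
    echo "haconf -dump -makero"
}

freeze_plan appssg
```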
48.          What are the layouts are available in  VxVM  and  how they will work  and  how to configure?
                (i)            There are 5 layouts available in VxVM. They are RAID-0, RAID-1, RAID-5, RAID-0+1 and RAID-1+0.
                RAID-0 :
                We can configure RAID-0 in two ways.
                (a)  Striped  (default).
                (b)  Concatenation.
                Striped :
                (i)            In this a minimum of two disks is required to configure.
                (ii)           In this the data is written to both the disks in parallel, ie., one stripe unit on the first disk, the next on the 2nd disk, ... etc.,
                (iii)          In this the data writing speed is fast.
                (iv)          In this there is no redundancy for data.
                Concatenation :
                (i)            In this a minimum of one disk is required to configure.
                (ii)           In this the data is written to the first disk, and after the first disk is full it is written to the 2nd disk.
                (iii)          In this the data writing speed is lower.
                (iv)          In this also there is no redundancy for data.
                RAID-1 :
                (i)            It is nothing but mirroring.
                (ii)           In this a minimum of 2 disks is required to configure; the example here uses 4 disks arranged as two mirrored pairs.
                (iii)          In this the same data is written on disk 1 and disk 3, and on disk 2 and disk 4.
                (iv)          If disk 1 fails, then we can recover the data from disk 3, and if disk 2 fails, then we can recover the data from disk 4. So, there is no data loss or we can minimize the data loss.
                (v)           In this half of the disk space is used for the mirror copy.
                RAID-5 :
                (i)            It is nothing but striping with distributed parity.
                (ii)           In this a minimum of 3 disks is required to configure.
                (iii)          In this one stripe unit is written on disk 1, the next on disk 2, and the parity for them on disk 3. For the following stripes the parity moves to a different disk, so it is distributed across all 3 disks. If disk 1 fails, then we can rebuild its data from the data on disk 2 and the parity on disk 3. So, in this the data is more secure.
                (iv)          In this the disk utilization is better when compared to RAID-1, ie., with 3 disks only 1/3 rd of the disk space is used for parity.
                (v)           RAID-5 is configured for critical applications like Banking, Financial, SAX and Insurance ... etc., because the data must be more secure.
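                The space trade-offs and the parity rebuild described above can be checked with shell arithmetic. This is a sketch under stated assumptions: the mirror is two-way, RAID-5 gives up one disk's worth of space to parity, the parity is a plain XOR of the data, and the function name usable_gb and the sample byte values are illustrative.

```shell
#!/bin/sh
# usable_gb LAYOUT NDISKS DISK_GB
# Prints the capacity left for data with equally sized disks.
usable_gb() {
    case $1 in
        stripe|concat) echo $(( $2 * $3 )) ;;         # no redundancy
        mirror)        echo $(( $2 * $3 / 2 )) ;;     # two-way mirror
        raid5)         echo $(( ($2 - 1) * $3 )) ;;   # one disk of parity
        *)             echo "unknown layout: $1" >&2; return 1 ;;
    esac
}

for layout in stripe mirror raid5; do
    echo "$layout: $(usable_gb "$layout" 4 100) GB usable out of 400 GB"
done

# Parity rebuild: parity is the XOR of the data units, so a lost unit
# is recovered by XOR-ing the parity with the surviving unit.
d1=165; d2=60
parity=$(( d1 ^ d2 ))
rebuilt=$(( parity ^ d2 ))
echo "parity=$parity rebuilt_d1=$rebuilt"
```

                With three 100 GB disks, usable_gb raid5 3 100 gives 200, which matches the 1/3 parity overhead mentioned above.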
                Creating a volume with a layout :
                # vxassist     -g    < diskgroup >    make    < volume >    < size >    layout=< layout >
                Example :  # vxassist    -g   appsdg    make   appsvol    50g   layout=raid5            (creates a 50 GB RAID-5 volume)

                Logs :
                *   If the layout is mirror, then the log is DRL (dirty region log).
                *   If the layout is RAID-5, then the log is the RAID-5 log.
                *   The main purpose of the log is fast recovery.
                *   We have to specify whether the log is required or not in all types of layouts except RAID-5, because logging is the default in RAID-5.
                *   If we want to configure RAID-5 without logging then,
                                # vxassist     -g    < diskgroup >    make    < volume >    50g    layout=raid5,nolog
                *   If the layout is other than RAID-5 then,
                                # vxassist     -g    < diskgroup >    make    < volume >    50g    layout=mirror,log
                *   If we want to add the log to the existing volume then,
                                # vxassist     -g    < diskgroup >    addlog    < volume >    logtype=drl   or  raid5
                *   If we want to remove the log from the existing volume then,
                                # vxassist     -g    < diskgroup >    rmlog    < volume >
49.          What is read policy and how many types of read policies are available?
                Read policy means how the plexes of a mirrored volume are read when accessing the data.
                Types of read policies :
                (i)            Select
                (ii)           Prefer
                (iii)          Round Robin
                *   By default the read policy is Round Robin.
                # vxvol     -g    < diskgroup >    rdpol    < select | prefer | round >    < volume >      (prefer also takes the name of the preferred plex)
50.          What is your role in VxVM?
                Normally, we get requests from application, development, production  and   QA  people like,
                (i)            Create  a  volume.
                (ii)           Increase the volume.
                (iii)          Decrease the volume.
                (iv)          Provide Redundancy by implementing  RAID-1   or   RAID-5.
                (v)           Provide the required permissions.
                (vi) Put the volume in the Virtual machine.
                (vii) Put the volume in the  Cluster.
                (viii) Provide high availability to the applications  and  databases.
                (ix) Sometimes destroy  or  remove the volume.
                (x)           Backup  and  restore the data whenever  necessary.
                And sometimes  we get some troubleshooting issues like,
                (i)            Volume is not started.
                (ii)           Volume is not accessible.
                (iii) Mount point deleted.
                (iv)          File system crashed.
                (v)           One disk failed in a volume.
                (vi)          Volume manager daemons are not running.
                (vii)         Volume manager configuration files are missing or deleted.
                (viii)        VxVM licensing issues.
                (ix)          Diskgroup is not deporting and not importing.
                (x)           Volume is started and daemons are running, but users cannot access the data.
                (xi)          Disks are not detected.
                (xii)         Hardware and software errors.
51.          What is meant by snap backup and how to take the snap backup?
                (i)            Snap backup means taking a backup using snapshots.
                (ii)           On servers running 24X7/365 days we normally take snap backups, ie., where no downtime is allowed.
                (iii)          The volumes used for this on such servers are called BCV (Business Continuity Volumes).
                Backup :
                (i)            First stop the Application.
                (ii)           Stop the Database.
                (iii)          Unmount the file system.
                (iv)          Stop the volume.
                (v)           Deport the diskgroup.
                (vi)          Import the diskgroup.
                (vii)         Join the snap diskgroup.
                (viii)        Sync the data.
                (ix)          Take the backup.
                (x)           Split the snap diskgroup.
                (xi)          Deport the diskgroup.
                (xii)         Import the diskgroup.
                (xiii)        Start the volume.
                (xiv)         Mount the file system.
                (xv)          Start the Database.
                (xvi)         Start the Application.
52.          What are the steps you follow to rename a  diskgroup?
                (i) Stop the Application.
                (ii)           Stop the Database.
                (iii) Unmount the file system.
                (iv)          Stop the volume.
                (v)           Deport the diskgroup.
                (vi) Rename the diskgroup.
                (vii)         Import the diskgroup under its new name with the
                                # vxdg     -n    < new name >    import    < old name >      command.
                (viii) Start the volume.
                (ix) Mount the file system.
                (x)           Start the Database.
                (xi) Start the Application.
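                The VxVM part of the rename can be sketched as a dry run. The point it illustrates is that the device path under /dev/vx/dsk changes with the disk group name, so the mount command (and any /etc/fstab entry) must use the new name. The function name rename_plan and the new name proddg are hypothetical; application and database stop/start steps are left out because they depend on the site.

```shell
#!/bin/sh
# rename_plan OLD_DG NEW_DG VOLUME MOUNTPOINT
# Prints the VxVM commands for renaming a disk group.
rename_plan() {
    old=$1; new=$2; vol=$3; mnt=$4
    echo "umount $mnt"
    echo "vxvol -g $old stopall"
    echo "vxdg deport $old"
    echo "vxdg -n $new import $old"       # rename happens at import time
    echo "vxvol -g $new startall"
    echo "mount -t vxfs /dev/vx/dsk/$new/$vol $mnt"   # path uses the new name
}

rename_plan appsdg proddg appsvol /mnt/apps
```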
53.          How to install VxVM?  What version of Veritas are you using and how to know the Veritas version?
                (i)            Install the Veritas supplied packages using the  # rpm   or  # yum   commands.
                (ii)           Execute the command   # vxinstall   to set up VxVM, ie., enable the system to use volume manager.
                (iii)          # vxinstall   will allow us to encapsulate or not encapsulate the root disk.
                (iv)          Always use option 2, ie., Custom installation, because if option 1 is used, ie., Quick installation, it takes all the disks for rootdg.

                License :
                (i)            All the licenses are stored in the  /etc/vx/licenses   directory. We can take a backup of this directory and restore it if we need to reinstall the server.
                (ii)           Removing the VxVM package will not remove the installed license.
                (iii)          To install a license, the  # vxlicinst   command is used.
                (iv)          To see the VxVM license information, use the   # vxlicrep   command.
                (v)           To remove the VxVM license, use the   # vxkeyless   set   NONE    command.
                (vi)          The license utilities, such as vxlicrep, are installed in the  /opt/VRTSvlic/bin    directory.
                (vii)         The license keys are stored in the   /etc/vx/licenses/lic     directory.
                (viii)        We can see the licenses by executing the below commands,
                                # cat   /etc/vx/licenses/lic/key    or
                                # /opt/VRTSvlic/bin/vxlicrep | grep   -i   "License Key"
                (ix)          To see the features of the license key, use the   # vxdctl   license   command.
                Version :
                (i)            We are using  VxVM 6.2.
                (ii)           To know the version of VxVM, use the  # rpm   -qa    VRTSvxvm    command.
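                The license and version checks above can be collected in one small script. This is a minimal sketch, not a Veritas supplied tool: the function names and the backup file path are hypothetical, and  tar  is just one possible way to back up the license directory. By default the script only prints the commands (dry run); set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: report VxVM license and version details, and back up the licenses.
# Function names and the backup destination are hypothetical.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

vxvm_report() {
    run /opt/VRTSvlic/bin/vxlicrep    # installed license keys and details
    run vxdctl license                # features enabled by the license
    run rpm -q VRTSvxvm               # installed VxVM package and version
}

backup_vx_licenses() {
    dest="$1"
    # Archive /etc/vx/licenses so it can be restored after a reinstall.
    run tar -czf "$dest" -C /etc/vx licenses
}
```

                Example calls (dry run):   vxvm_report      and      backup_vx_licenses   /root/vx-licenses.tar.gz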
54.          What are the available formats to take the control of disks from  O/S  to  Veritas  in VxVM?
                We can take the control of disks from  O/S  to  Veritas  in 3 formats.
                (i)            CDS  (Cross  platform  Data  Sharing,  the default  format in  VxVM).
                (ii)           Sliced.
                (iii) Simple.
                (i)            CDS :
                                (a)  We can share the data between different Unix flavours.
                                (b)  Both the private  and  public regions are in the 7th partition.
                                (c)  The entire space is in the 7th partition.
                                (d)  So, there is a risk of data loss: if partition 7 is corrupted  or  damaged, the data may be lost.
                                (e)  This is the default in Veritas  volume manager.
                (ii)           Sliced :
                                (a)  It is normally used only for the root disk;  cds  is used  for  data.
                                (b)  In this format  we cannot share the data between different Unix flavours.
                                (c)  The private region is in the  4th partition  and  the public region is in the  3rd partition.
                                (d)  So, the two regions are on separate partitions and damage to one does not automatically affect the other, which helps minimize the data loss.
                (iii) Simple :
                                (a)  This format is not widely used now because it dates from the  old  VxVM 3.5  release.
                                (b)  In this format both the private and public regions  are in the  3rd partition.
                Specifying the format  during setup :
                # vxdisksetup    -i   /dev/sda                               (to set up the disk in the default  format  ie.,  CDS  format) 
                # vxdisksetup     -i     /dev/sdb      format=sliced                     (to specify the  sliced  format;  use  format=simple  for the  simple  format)
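                The format choice can be wrapped in a small helper that rejects unknown format names before calling  vxdisksetup.  This is a minimal sketch: the function name  setup_disk  and the disk names are hypothetical, and  cdsdisk  is the  format=  value that selects the CDS format. By default the script only prints the command (dry run); set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Sketch: initialize a disk for VxVM in a chosen format.
# setup_disk and the disk names used with it are hypothetical.
# DRY_RUN=1 (the default) only prints the command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_disk() {
    disk="$1"
    fmt="${2:-cdsdisk}"      # default to the CDS format when none is given
    case "$fmt" in
        cdsdisk|sliced|simple) ;;
        *) echo "unsupported format: $fmt" >&2; return 1 ;;
    esac
    # The disk name is passed through to vxdisksetup exactly as given.
    run vxdisksetup -i "$disk" "format=$fmt"
}
```

                Example calls (dry run):   setup_disk   sda      and      setup_disk   sdb   sliced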
55.          In how many ways can we manage  VxVM?
                (i)            Command  line  tool.
                (ii)           GUI   (vea   tool)
                (iii)  # vxdiskadm     command  (it gives the options  to manage the disks)

Linux, CCNA and MCSE Questions: User Managment
