Sangam14.. Call for papers is now open..

Sangam14 – call for papers is now open.. Please visit http://www.sangam14.info and submit your abstracts…

Image

Sangam13 – just 10 days away

Along with several other key speakers (Arup Nanda, Gaja and several others ) from around the world, I will be presenting a session at Sangam13

12 Cool New features in Oracle 12c RAC

This session will discuss the 12 cool new features in Oracle 12c RAC.  After introduction into the Oracle 12c enhancements, the session will discuss in detail 12 of the main new features with examples on how and where to implement them.

Will you be attending.. hope to see you at Sangam..  Visit http://www.sangam13.info for additional information about the event.

 

 

 

 

Oracle Open World 2013

Getting ready for the next major event. Oracle Open world 2013.. a place to meet the international techies once again.. While there are several conferences around the world that brings occasions to meet many of them, OOW is one umbrella event where almost everyone comes back to visit.

Similar to the previous OOW, I will be presenting two paper at OOW. Both are back to back sessions on Sunday starting at 9:30 AM.. at the user group event. Both papers are a joint presentation with PS Janakiram
UGF2580 Oracle VM: For Enterprise DR Solutions                                             9:15 AM     Moscone West – 2000
UGF2675 Deploy Oracle RAC 12c on Oracle VM VirtualBox in 60 Minutes 10:30 AM     Moscone West – 2001

This years OOW starts with the OTN ACE director briefing sessions at the Oracle HQ on Thursday 19th and Friday 20th Sept. Its going to be a exciting 10days filled with lots of networking and opportunities for knowledge sharing and learning.

Looking forward to this..

sangam11… is just around the corner

Along with several other key speakers (Arup Nanda, Mark Rittman, Hemant Chitale, and several others ) from around the world, I will be presenting two sessions at Sangam11

Friday 9th December 2011 – Implementing Oracle 11g Enterprise Manager for Prime Time Monitoring

Oracle Enterprise Manger has evolved over the years and in its latest incarnation, 11g has become a highly efficient Enterprise Management solution. It has evolved from a Database self-management tool to a complete suite that provides Business driven IT Management capabilities. There are plug-ins available to manage Enterprise Applications like EBS, OBIEE etc., and Middleware Technologies like WLS, SOA Suite etc., Being enterprise level monitoring tool, enterprises cannot afford downtime of the repository or its notification services. In this presentation we review the implementation of EM leveraging the HA capabilities provisioned from RAC and Weblogic Clustering behind a load-balancer including DR for EM

Saturday 10th December 2011 – Using 11gR2 Result Cache Feature in a RAC Environment

The result cache feature introduced in Oracle database 11g provides considerable benefits for operations where results sets do not change quite often. This feature helps store the final result sets in memory providing high speed access to the dataset. In a RAC environment it provides an added advantage that, when multiple users execute identical queries from multiple instances in the cluster, instead of the entire dataset being transferred across the interconnect or retrieved by performing an I/O operation, only the final result set is transferred across the interconnect. This provides great benefits in speed and resource utilization providing considerable performance enhancement. This presentation will go deep into the architecture and functioning of this feature in a RAC environment.

The agenda for sangam11 along with the breakout sessions is live at www.aioug.org

Sangam11 … Early bird registration opens

Registration for the much awaited Oracle users conference in India is now open.. we have closed the paper and the paper selection is under way. The agenda for the event will be available soon.  Register now…

Sangam11Flyer001

Hope to see you at this event..

Sangam11 – Call for Papers Open

 

AIOUG established in 2007 is growing leaps and bounds every year.  Sangam (meeting of the minds) the annual users conference has been a great success in the past two years.  AIOUG is ready to host its third annual users conference Sangam11 on Friday 9th and Saturday 10th December 2011 @ Bangalore India.

sangam11

The call to submit your abstracts and take the opportunity to showcase your technical knowledge to the highly demanded Oracle users in the world.

Please submit your papers.. at www.aioug.org = > events => Sangam11 CFP

OTN Tour – South America – 2011 – Day 4

 

My last stop is Peru.  Two weeks away from home meeting with oracle users and giving presentations.  It was fun.  People in south America are very friendly and helpful.  Anyone would answer questions and will stop to show you directions.  For example the cab drivers at Ecuador for example showed me places along the way, explained and answered several questions.  He also had a business card and like it said on the card, he spoke fluent English.

IMG_1263

13th July 2011 – Day 4/Stop 4 – Peru

My sessions:

  1. Oracle 11gR2 New features in RAC
  2. Oracle tools and utilities for monitoring  a RAC environment.

IMG_1536

OTN Tour – South America – 2011 – Day 3

 

I have missed posting about my other two stops ..  you know how it is, when you return back to work after a gap of two weeks, you are flooded with stuff to get done.  

11th July 2011 – Day 3 or stop 3 was Colombia. 

at Colombia gave two  presentations,

My Sessions

  1. 11gR2 New Features in RAC
  2. Tools and Utilities for Monitoring RAC

Colombia really had a mixed but interesting audience.  We had several DBA and developers on the mix and both the tracks where well attended. 

LAotnTOUR2011-columbia

OTN tour team in Colombia

Next and final stop for me is Peru..

OTN Tour – South America – 2011 – Day 2

 

After completing San Jose moving on to Quito, Ecuador.  

Boarded the plane to Quito with a stop over at Panama city.  On landing at the Quito airport I was informed that they found my missing luggage and it was here.. Continental Airlines had found it after 4 days.. hurray.. that was good news

At the event, it is always good to meet other experts from the Oracle area… we had a great team on the OTN tour.

OTNtour-Quito2011

8th July 2011  – Quito – Ecuador

My Sessions

  1. New Features in 11gR2 RAC
  2. Oracle 11g RAC for Developers
  3. Tools and Utilities for Monitoring RAC

Audience seem to be interested however all of them where on translators so really could not understand how much of the presentation they got until the end of the conference. The Q & A to the expert panel had several technical questions.

OTNTour2011Quito

Something new that I learnt here. Ecuador uses American dollar for their currency bills but uses local coins.  I was told that they switched to this system in 2001.

OTN Tour – South America – 2011 – Day 1

 

We are part of an Oracle OTN tour covering South America.  This is one of the ways you give back to the Oracle community by educating them and informing them about the latest and greatest available in Oracle technology space. 

As part of the group of ACE directors traveling in South America, I am speaking/presenting in 4 locations.  Starting with Costa Rica and ending in Peru.  The tour goes on beyond Peru to other countries but my share is 4 countries spread over a two week period.

5th July 2011  – San Jose – Costa Rica

I presented three sessions at San Jose, 

  1. Using Oracle Result Cache Features in Oracle 11g RAC
  2. Oracle 11g RAC for Developers
  3. Tools and Utilities for Monitoring RAC

Had an excellent audience.  It was interested to note the amount of interaction that most DBA’s bring to the table in the form of questions and experiences. All the three sessions had both beginners and current RAC users and provided a mix of interest.

oracle 013OTN -SJ 5jul2011 oracle 017OTN -SJ 5jul2011 oracle 029OTN -SJ 5jul2011

Being in Costa Rica was like being at home.  This tropical country is so similar from several angles to the small state called Kerala, in India where I am from.  The plants, the fruits the way the houses are constructed are so similar that apart from the language everything was very close. I enjoyed the fruits and food. The people here are very friendly.   For a minute I felt home sick looking at the nature, I am sure if I was not busy with the presentations and the preparation I would have wanted to go to India instead of continue the tour.  

San Jose – Costa Rica is done and moving on to Quito, Ecuador

Sangam10… sessions well accepted.

Sangam10 was better compared to Sangam09.. better attendance, well organized conference and no complains on food and water.

I had presented two sessions at Sangam10..  I did not do any RAC presentation during Sangam09, but several folks  asked why I had not given a session.  I  followed the call for papers like others and was selected by the selection panel. Both my sessions were well attended.

Session 1.  Think outside the interconnect when optimizing your RAC environment.

DSC_0131  DSC_0129 DSC_0130

Kept the session of an interactive type.  so that audience can get involved through the entire session.  Was very well accepted and the received good feedback.

Session 2. 11gR2 New features for RAC.

DSC_0191 

Did not notice how fast the time went by.. when users get interactive and you get several questions time just flies by.  Could not complete the entire presentation.  I was only able to complete the new features for the clusterware.  did not cover ASM and RDBMS new features.   Maybe next time I do this topic, I will request for a longer duration.

Looking forward to Sangam11..  before that Oracle Open World 2010 is just around the corner. I will be giving two sessions at OOW this year.

1. Waits, Enqueues, and Latches in Oracle 11g RAC

2. The Oracle Database 11gR2 Result Cache in an Oracle RAC Environment

More updates on OOW later…

11gR2 – ASM redundancy levels.

 

In 11gR2 ASM diskgroups are used 1) by Grid infrastructure for the OCR, Voting disk and the ASMspfile. 2) by the database for DATA and FRA.

ASM redundancy levels, can be a bit misleading in Oracle 11gR2 compared to 10gR2.   They are called the same, Normal redundancy, High Redundancy and External Redundancy.  However, depending on what area of the 11gR2 configuration this will be used, the number of disks required to configure are not the same.  DBA’s seldom are able to take the difference into consideration when requesting disks /LUN’s from the storage administrators and are taken by surprise during installation.

Grid Infrastructure:  When configuring diskgroups for the OCR file, voting disk and ASMspfile, Normal redundancy requires 3 disks/LUNS to be part of the diskgroup.  High redundancy requires 5 disks/LUNS to be part of the diskgroup.  External redundancy requires just one disk/LUN to be part of the diskgroup. The below screen output shows the error message when only one disk/LUN is selected for the GRID diskgroup.

 grid7

Database: When configuring diskgroups for DATA and FRA, normal redundancy requires 2 disks/LUNS to be part of the diskgroup.  High redundancy requires 3 disks/LUNs or failure groups to be part of the diskgroup.  External redundancy requires just one disk/LUNs or failure groups to be part of the diskgroup

asmca2

More about configuring OCR and voting disks in the next post.

Sangam10 – Early bird registration extended

 

AIOUG has extended the early bird registrations to 10th August 2010.. Take advantage of this opportunity and register for this user conference on 3rd and 4th September at Hyderabad.

Great speakers, Jonathan Lewis, Mark Rittman, Iggy and more..  

11gR2 – having a second OCR file on a separate diskgroup

 

Oracle database versions  < 11gR2 RAC, the OUI had a screen where we could specify multiple OCR and or Voting disks.  Starting with 11gR2 since the OCR and Voting disks are stored on ASM storage we don’t have options to specify multiple OCR or for that matter multiple voting disks. Well do we need multiple disks for OCR and voting disks.  with these files stored on ASM, we could take advantage of the ASM mirroring options.  Normal Redundancy (two copies of the file), High Redundancy (three copies of the file) or External Redundancy (where ASM does not manage the redundant copy of these files, but the redundancy is maintained /protected at the storage level by mirroring disks.

If the normal redundancy option or the high redundancy option was selected when the disk group that stores the OCR/voting disks, the automatically the disks are mirrored for either normal or high redundancy.

Now what if we do not want to use ASM mirroring.. but would like the clusterware to maintain two or more copies of OCR files on physically different diskgroups.  Oracle supports this option, not while installing the Grid infrastructure but after the installation is  complete using the ocrconfig utility. Lets go through one such scenario.

OCR is a critical component of the RAC architecture.  From the point that the clusterware starts (on server start/reboot)and to start all applications running on the clusterware which includes database, listener, ASM, database services etc, the OCR is consulted by the clusterware for placement of resources.   OCR contains all the rules for high availability of these resources. The clusterware will use these definitions for placement of resources when an server / instance crashes.  In 11gR2 the OCR contains additional information such as server pool definitions etc.

The first time the clusterware starts it determines the location of the OCR file by checking the following location

1.  Request storage administrators to create a new LUN with the same size as the LUN that currently hosts the OCR and voting disks for the 11gR2 cluster

2.  Connect to the ASM instance on one for the database servers and create a new disk group and mount the disk group on all instances in the cluster using the following syntax

CREATE DISKGROUP PRD_GRID2 EXTERNAL REDUNDANCY ADD DISK ‘ ‘;

ALTER DISKGROUP PRD_GRID2 MOUNT

3. Once the diskgroup have been mounted..we can configure the diskgroup for OCR.   This configuration requires root privileges. 

Connect to the server as root. and execute ocrconfig command from the GRID_HOME /bin directory. 

[root@prddb3 bin]# ./ocrconfig -add +PRD_GRID2

Note:  if the disk group does not have the required privileges then you can get the following error.   I ran into this error while configuring OCR on PRD_GRID2 diskgroup.

PROT-30: The Oracle Cluster Registry location to be added is not accessible.

Also reported in the GRID_HOME/log/prddb3/client directory is a ocrconfig log file that contains the following errors

[root@prddb3 bin]# cat /app/grid/product/11.2.0/log/prddb3/client/ocrconfig_4000.log
Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996, 2009 Oracle. All rights reserved.
2010-07-10 20:38:18.472: [ OCRCONF][2833243664]ocrconfig starts...
2010-07-10 20:38:23.140: [  OCRCLI][2833243664]proac_replace_dev:[+PRD_GRID2]: Failed. Retval [8]
2010-07-10 20:38:23.140: [  OCRAPI][2833243664]procr_replace_dev: failed to replace device (8)
2010-07-10 20:38:23.140: [ OCRCONF][2833243664]The new OCR device [+PRD_GRID2] cannot be opened
2010-07-10 20:38:23.140: [ OCRCONF][2833243664]Exiting [status=failed]... 

What does this error mean?  by default when you create a diskgroup in 11gR2 RAC from the command prompt using SQL plus, the default compatibility attribute of the diskgroup is set to 10.1. This issue does not occur if you create it through ASMCA.

[oracle@prddb3]# sqlplus ‘ /as sysasm’

SQL> SELECT NAME,COMPATIBILITY,DATABASE_COMPATIBILITY from v$asm_diskgroup; 

NAME         COMPATIBILITY        DATABASE_COMPATIBILI VOT
------------ -------------------- -------------------- ---
PRD_DATA    11.2.0.0.0           11.2.0.0.0           N
PRD_FRA     11.2.0.0.0           11.2.0.0.0           N
PRD_GRID1   11.2.0.0.0           11.2.0.0.0           N
PRD_GRID2   10.1.0.0.0           10.1.0.0.0           N

Note The ASM compatibility and the database (RDBMS) compatibility defaults to 10.1. This needs to be changed to 11.2  in order for the clusterware to recognize that this is a 11gR2 ASM configuration.

4. Change the compatibility of the new diskgroup to 11.2 as follows:

ALTER DISKGROUP PRD_GRID2 SET ATTRIBUTE ‘COMPATIBILITY.ASM’=’11.2’;

ALTER DISKGROUP PRD_GRID2 SET ATTRIBUTE ‘COMPATIBILITY.RDBMS’=’11.2’;

5. The commands will change the compatibility levels for the PRD_GRID2 diskgroup and the new values can be verified using the following query:

SQL> COL NAME FORMAT A12
SQL> COL COMPATIBILITY FORMAT A20
SQL> COL DATABASE_COMPATIBILITY FORMAT A20

SQL> SELECT NAME,COMPATIBILITY,DATABASE_COMPATIBILITY from v$asm_diskgroup; 

NAME         COMPATIBILITY        DATABASE_COMPATIBILI
------------ -------------------- --------------------
PRD_DATA     11.2.0.0.0           11.2.0.0.0
PRD_FRA      11.2.0.0.0           11.2.0.0.0
PRD_GRID1    11.2.0.0.0           11.2.0.0.0
PRD_GRID2    11.2.0.0.0           11.2.0.0.0 

6. Once the data has been verified, attempt to configure the diskgroup for OCR.   This configuration requires ‘root’ privileges. Connect to the server as root. and execute ocrconfig command from the $GRID_HOME/bin directory. 

[root@prddb3 bin]# ./ocrconfig -add +PRD_GRID2

7.  To verify if the new OCR file is created and the /etc/oracle/ocr.loc file is updated with the new file information.

[root@prddb3 bin]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device +PRD_GRID2
ocrconfig_loc=+PRD_GRID1
ocrmirrorconfig_loc=+PRD_GRID2

8. The  ocrcheck command reflects two OCR locations

[root@prddb3 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3972
         Available space (kbytes) :     258148
         ID                       : 1288159793
         Device/File Name         : +PRD_GRID1
                                    Device/File integrity check succeeded
         Device/File Name         : +PRD_GRID2
                                    Device/File integrity check succeeded 

                                    Device/File not configured 

                                    Device/File not configured 

                                    Device/File not configured 

         Cluster registry integrity check succeeded 

         Logical corruption check succeeded 

9. Apart from the standard utilities used to very the integrity of the OCR file, the $GRID_HOME/log/prddb3/client directory also contains logs reflecting that the change was successful.

[root@prddb3 bin]# cat /app/grid/product/11.2.0/log/prddb3/client/ocrconfig_25560.log
Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996, 2009 Oracle. All rights reserved.
2010-07-10 21:01:00.652: [ OCRCONF][4224605712]ocrconfig starts...
2010-07-10 21:01:13.593: [ OCRCONF][4224605712]Successfully replaced OCR and set block 0
2010-07-10 21:01:13.593: [ OCRCONF][4224605712]Exiting [status=success]... 

10.  The clusterware alert log also has an entry indicating successful addition of the OCR disk.

/app/grid/product/11.2.0/log/prddb3/alertprddb3.log
[crsd(27512)]CRS-1007:The OCR/OCR mirror location was replaced by +PRD_GRID2.

11.   Lets check if the physical files can be found on the ASM storage.  Set the ORACLE_SID to the ASM instance on the server and using asmcmd the following can be found

ASMCMD> pwd
+PRD_GRID2/PRDCW/OCRFILE
ASMCMD> ls -lt
Type     Redund  Striped  Time             Sys  Name
OCRFILE  UNPROT  COARSE   JUL 10 21:00:00  Y    REGISTRY.255.715795989 

ASMCMD> cd +PRD_GRID1/PRDCW/OCRFILE
ASMCMD> ls -lt
Type     Redund  Striped  Time             Sys  Name
OCRFILE  UNPROT  COARSE   JUL 10 21:00:00  Y    REGISTRY.255.724021255 

10gR2., adding dependency to applications/resources managed by CRS

 

Yesterday we had back to back crashes on two our ETST servers.  After analysis we observed that the crash was due to node evictions by CRS due to communication delays with the other servers, reason: faulty network connection.  Outside this, after the crash and  during auto start of the servers, we encountered another issue.  The database would not start automatically.  The CRS attempted to start the database instance however it failed.   After looking at the log files we found that there was an attempt to start the instance while the ASM instance on that server was not completely up.  Not on just on one server but both servers that got evicted.  The instance was then started manually and all was fine.

To fix this issue for the future, after deliberating with the DBA team we decided to add a between the database instance and ASM instance on all the instances. The dependency is to be configured in the OCR file and when CRS starts these resources /applications it will apply the rule to the instance startup process.

This is what was done to implement the dependency..

1.  execute crsstat to check on all the instances configured on the cluster

[oracle@tstdb1 ~]$ crsstat
HA Resource                              Target     State
-----------                              ------     -----
ora.ETST.ETST1.inst ONLINE ONLINE on tstdb1
ora.ETST.ETST2.inst                      ONLINE     ONLINE on tstdb2
ora.ETST.ETST3.inst                      ONLINE     ONLINE on tstdb3
ora.ETST.db                              ONLINE     ONLINE on tstdb1
ora.tstdb1.ASM1.asm ONLINE ONLINE on tstdb1
ora.tstdb1.ETST_TSTDB1.lsnr ONLINE ONLINE on tstdb1
ora.tstdb1.LISTENER_TSTDB1.lsnr ONLINE ONLINE on tstdb1
ora.tstdb1.gsd                           ONLINE     ONLINE on tstdb1
ora.tstdb1.ons                           ONLINE     ONLINE on tstdb1
ora.tstdb1.vip                           ONLINE     ONLINE on tstdb1
ora.tstdb2.ASM2.asm ONLINE ONLINE on tstdb2
ora.tstdb2.ETST_TSTDB2.lsnr ONLINE ONLINE on tstdb2
ora.tstdb2.LISTENER_TSTDB2.lsnr ONLINE ONLINE on tstdb2
ora.tstdb2.gsd                           ONLINE     ONLINE on tstdb2
ora.tstdb2.ons                           ONLINE     ONLINE on tstdb2
ora.tstdb2.vip                           ONLINE     ONLINE on tstdb2
ora.tstdb3.ASM3.asm ONLINE ONLINE on tstdb3
ora.tstdb3.ETST_TSTDB3.lsnr ONLINE ONLINE on tstdb3
ora.tstdb3.LISTENER_TSTDB3.lsnr ONLINE ONLINE on tstdb3
ora.tstdb3.gsd                           ONLINE     ONLINE on tstdb3
ora.tstdb3.ons                           ONLINE     ONLINE on tstdb3
ora.tstdb3.vip                           ONLINE     ONLINE on tstdb3

2.  create a profile file using the current configuration for all instances.  You do not have to connect to the respective servers, this can be done from any server.

$ORA_CRS_HOME/bin/crs_stat -p ora.ETST.ETST1.inst >> /app/oracle/product/crs/crs/public/ora.ETST.ETST1.inst.cap
$ORA_CRS_HOME/bin/crs_stat -p ora.ETST.ETST2.inst >> /app/oracle/product/crs/crs/public/ora.ETST.ETST2.inst.cap
$ORA_CRS_HOME/bin/crs_stat -p ora.ETST.ETST3.inst >> /app/oracle/product/crs/crs/public/ora.ETST.ETST3.inst.cap

Notice the location of the ‘.cap’ file. It is important the cap file is created in that location. Depending on the type of resource there are two locations where the '.cap’ file can be located. The public folder (/app/oracle/product/crs/crs/public/) is used for all resources owned by Oracle and 'private' folder (/app/oracle/product/crs/crs/private/) is used by resources managed by CRS and owned by root.
3.  Update this profile with the dependency

$ORA_CRS_HOME/bin/crs_profile -update ora.ETST.ETST1.inst -r ora.tstdb1.ASM1.asm
$ORA_CRS_HOME/bin/crs_profile -update ora.ETST.ETST2.inst -r ora.tstdb2.ASM2.asm
$ORA_CRS_HOME/bin/crs_profile -update ora.ETST.ETST3.inst -r ora.tstdb3.ASM3.asm

After the above operation change, the profile is updated and the notice that the ‘REQUIRED_RESOURCES’ parameter is is changed with the ASM instance resource

[oracle@tstdb1 ~]$ cat /app/oracle/product/crs/crs/public/ora.ETST.ETST1.inst.cap
NAME=ora.ETST.ETST1.inst
TYPE=application
ACTION_SCRIPT=/ETST/oracle/product/10.2.0/db/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=tstdb1
OPTIONAL_RESOURCES=
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.tstdb1.ASM1.asm
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=

4.  The reminder of the steps should be performed one instance at a time to avoid downtime to the database  All these steps are performed as user ‘oracle’

   4(a)..  shutdown instance 1  using server control

  srvctl stop instance -d ETST -i ETST1

   4(b)    unregister the resource/entry for the instance from the OCR

  $ORA_CRS_HOME/bin/crs_unregister ora.ETST.ETST1.inst

   4(c)    register the resource/entry for the instance back to the OCR

  $ORA_CRS_HOME/bin/crs_register ora.ETST.EST1.inst

    4(d)   Start instance 1 using server control 

  srvctl start instance -d ETST -i ETST2

     4(e)   Verify if the change has made it to the OCR file.  The command below will list the content /definitions for the resource.  You can check for  

REQUIRED_RESOURCES=ora.tstdb1.ASM1.asm
  $ORA_CRS_HOME/bin/crs_stat -p ora.ETST.ETST1.inst

      4(f)  repeat steps 4(a) thru 4(e) for other instances in the cluster

Sangam10:: Registration opens – early bird discounts

 

SANGAM10-web

We are excited to host our second annual users conference in  India.  Many of you who attended the event in Bangalore during 2009 should have noticed how much a user group can provide in the form of information sharing and knowledge transfer.  This years conference is also on the similar lines.. we have a great list of technical sessions to be delivered by a great list of experts from around the world.  Jonathan Lewis will be primary presenter and will deliver two half day sessions on ‘Writing Optimal SQL’.  Apart from Jonathan we have other speakers including Mark Rittman, Vivek Sharma, and many others to deliver two half days of break out sessions.

Registration is now open and the early bird rates are on…

Visit www.aioug.org/sangam10.php for details  

OCR Repair – yet another scenario

Few months back I had written about an situation when the OCR file was corrupt and we had to repair the file to fix issues.  We ran into yet another scenario in our 10gR2 RAC environment.  But this time we really did not notice any accessibility issues for the CRS, because the CRS stack would get started just fine however the database, instances and or the database services would not get started by the CRS.  

While the basic solution is to restore the OCR file from a clean backup or export file, the troubleshooting and maintenance tasks maybe a bit different. Checking the status of the applications managed by the clusterware we noticed that the database, instance and database services did not start.  

HA Resource                                        Target          State
———–                                                 ——             —–
ora.TSTDB.TSTDB1.inst                    ONLINE      OFFLINE on tstdb1
ora.TSTDB.TSTDB2.inst                    OFFLINE     OFFLINE on tstdb2
ora.TSTDB.TSTDB3.inst                    OFFLINE     OFFLINE on tstdb3
ora.TSTDB.EXTVEN.TSTDB1.srv  OFFLINE     OFFLINE on tstdb1
ora.TSTDB.EXTVEN.TSTDB2.srv  OFFLINE     OFFLINE on tstdb2
ora.TSTDB.EXTVEN.TSTDB3.srv  OFFLINE     OFFLINE on tstdb3
ora.TSTDB.EXTVEN.cs                      OFFLINE     OFFLINE on tstdb1
ora.TSTDB.db                                        OFFLINE     OFFLINE on tstdb1
ora.tstdb1.ASM1.asm                         ONLINE      ONLINE  on tstdb1
……………………….
……………………….  

Repeated attempts to start the database and instances using srvctl did not help.  So we opted to do some troubleshooting.  

Check the database and instances manually:  Using SQL Plus try to start a database instance. If this works then ASM and DB are in good health and its just the CRS that is not able to start or display the status of the database. Its not able to retrieve the database/instance/service definitions from the OCR file and perform the required operations.  

As user ‘root’, shutdown crs on all servers..   

 /etc/init.d/init.crs stop 

Check if the /etc/oracle/ocr.loc file has the following entries..  

[oracle@tstdb2 mvallath]$ cat /etc/oracle/ocr.loc   
ocrconfig_loc=/dev/raw/ocr1                           <== PRIMARY COPY OF THE OCR FILE  
ocrmirrorconfig_loc=/dev/raw/ocr2            <== MIRRORED COPY OF THE OCR FILE  
local_only=FALSE  

If the cluster detects that the OCR file is corrupt it will disable the corrupt file and continue with the mirrored copy.  In which case you may see the ocr.loc file was altered by the clusterware.  Most of the time the mirrored copy should be good and the clusterware should start all applications.  However in this case the mirrored copy was only partially good.  All components of the clusterware stack and applications with the exception of the database, instances and database services were up and visible to the CRS.  

[oracle@tstdb1 ~]$ cat /etc/oracle/ocr.loc  
#Device/file /dev/raw/ocr1 being deleted   <==ALTERED BY CLUSTERWARE..   
ocrconfig_loc=/dev/raw/ocr2                          <== ALTERED BY CLUSTERWARE..   
local_only=false  

Please take note from the output above, that clusterware not only marked the primary copy as ‘deleted’ but also renamed the mirrored copy as the new primary copy and changed the local_only=false (false was in upper case earlier). Before restoring the good copy of the OCR file, the ocr.loc file needs to be changed to reflect its original state. Manually edit this file with the appropriate entries or get a good copy of the ocr.loc (if other servers have not been modified) moved from one of the other servers to this location.  

Normally under such circumstances, CSS could coredump.  The dump file is located in $ORA_CRS_HOME/log//cssd directory.  Analyzing the coredump we also found the following string “#Device/file %s being deleted  

Once the ocr.loc file is updated on all servers in the cluster, import the OCR from a latest export file as user ‘root’  

/app/oracle/product/crs/ocrconfig -import /home/oracle/mvallath/OCRexp28Apr2010.dmp  

If you do not have an good export .. you can get a good copy of the backup.. (clusterware performs an automatic backup of the OCR file every 4 hours)  and restore the OCR file..  

/app/oracle/product/crs/ocrconfig -showbackup  
tstdb1     2010/04/28 09:05:52     /app/oracle/product/crs/cdata/tstcw 

List all the backups found in the location from the above output..      

[oracle@tstdb1 ~]$ ls -ltr /app/oracle/product/crs/cdata/tstcw
total 50328
-rw-r–r– 1 root root 8572928 Apr 27 21:05 day.ocr
-rw-r–r– 1 root root 8572928 Apr 28 01:05 week.ocr
-rw-r–r– 1 root root 8572928 Apr 28 01:05 day_.ocr
-rw-r–r– 1 root root 8572928 Apr 28 01:05 backup02.ocr
-rw-r–r– 1 root root 8572928 Apr 28 05:05 backup01.ocr

8572928 Apr 28 09:05 backup00.ocr root root–r– 1 -rw-r

  

Find the best version of the OCR backup file based on when the clusterware was stable and restore using the following command:  
/app/oracle/product/crs/ocrconfig -restore /app/oracle/product/crs/cdata/tstcw/day.ocr  

Start the clusterware on all nodes as user ‘root’
/etc/init.d/init.crs start  

Note:  Occasionally once the restore or import is completed, clusterware could go into a panic mode and reboot the server.  This is fine and is normal behavior, this normally happens when crs has not completely shutdown and has a hanging daemon on one of the servers.  

This should start the clusterware and all the applications managed by the CRS.  once the clusterware components have been verified and database has been checked.. It would be a good practice to export the OCR file..as a good copy.  

Thanks to Baldev Marepally for helping trouble shoot this issue..  

coexist 10gR2 and 11gR2 RAC db on the same cluster – Part II

I accidently posted this blog entry over my previous entry on this same topic.  Thanks to Google I was able retrieve my old post from the Google cache and posted it again to my blog.

———————————–

My previous post discussed about the various stumbling blocks we encountered during our 10gR2 database installation in a 11gR2 environment. We took it a step at a time to troubleshoot and install the database documenting and fixing the issues as we go. Yesterday browsing through Metalink I noticed a very recent article on the same subject.
Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment [ID 948456.1]
Which recommends several patches and steps that could help ease the installation process

coexist 10gR2 and 11gR2 RAC db on the same cluster.. stumbling blocks

Due to project/application requirements we had to create a new 10gR2 database on a 11gR2 cluster. These are the high level steps that were attempted to complete this effort.

  1. Install 11gR2 Grid Infrastructure
  2. Create all ASM diskgroups using asmca
  3. Install 11gR2 database binaries
  4. Create the 11gR2 database using dbca from the 11gR2 DB home
  5. Install 10gR2 database binaries
  6. Create 10gR2 database using dbca from the 10gR2 DB home

Once all the prerequisites are met, 11gR2 installation is a very smooth process. Everything goes so smooth. some of us who have worked the some of the true clustered database solutions such as Rdb on VMS clusters (many of you don’t even know that Oracle owns another database called Oracle Rdb.  Oracle acquired this excellent database from Digital Equipment Corporation a.k.a DEC around 1992 and surprisingly Oracle Rdb is used by many customers even today to manage their VLDB systems), Oracle Parallel Servers (OPS) and then most recently with 9iR2 RAC, would remember how difficult it was to complete the installation. Oracle has come a long way in streamlining this process. Its so easy and the entire 11gR2 RAC configuration can be completed with little or no effort in less than 1 hour.

Once the 11gR2 environment as up and running, the next step was to configure the 10gR2 RAC database on the same cluster. We first installed the 10gR2 binaries. runInstaller was able to see that there was a cluster already installed. During the verification step, installer complained of an incompatible version of clusterware  on the server.  We ignored the error and moved on. Binaries installed successfully on all nodes in the cluster.  After installed 10.2.0.1 we complete the upgrade to 10.2.0.4

Note: When 10gR2 installer was available 11g was not, then how can it be aware of a higher version. Higher versions are almost always compatible with the lower versions.  With this idea we moved on.

Stumbling Block I

Next step was to configure the database using dbca. Its important to install the dbca from the 10gR2 /bin directory.  we noticed that the intro screen was different, it did not show the choices we normally see in a clustered database installation.  We did not get the choice to select between creating a ‘RAC’ database or a ‘single instance database’.  This indicated that something was wrong. Why did the installer see that there was a clusterware already there and this is a RAC implementation.  Why not dbca?  Searching through Oracle documentation I found this note.

When Oracle Database version 10.x or 11x is installed on a new Oracle grid infrastructure for a cluster configuration, it is configured for dynamic cluster configuration, in which some or all IP addresses are provisionally assigned, and other cluster identification information is dynamic. This configuration is incompatible with older database releases, which require fixed addresses and configuration.

You can change the nodes where you want to run the older database to create a persistent configuration. Creating a persistent configuration for a node is called pinning a node.”

We can check if the nodes are pinned using olsnodes command. You have a new switch in 11gR2 that lists the pinned status for a node. 

[prddb1] olsnodes -h
Usage: olsnodes [ [-n] [-i] [-s] [-t] [<node> | -l [-p]] | [-c] ] [-g] [-v]
        where
                -n print node number with the node name
                -p print private interconnect address for the local node
                -i print virtual IP address with the node name
                <node> print information for the specified node
                -l print information for the local node
                -s print node status – active or inactive
                –t print node type – pinned or unpinned
                -g turn on logging
                -v Run in debug mode; use at direction of Oracle Support only.
                -c print clusterware name

[prddb1] olsnodes -t
prddb1     Unpinned
prddb2     Unpinned
prddb3     Unpinned

Pinning of a node is done using the crsctl utility.  crsctl and olsnodes are both located in $GRID_HOME/bin directory.

crsctl pin css –n prddb1

check if they are pinned

[prddb1] olsnodes -t
prddb1     Pinned
prddb2     Pinned
prddb3     Pinned

Stumbling Block II

dbca was now able to see the RAC cluster and we continued.  Ran into the second stumbling block after selecting ASM as the storage manager. “ASM instance not found .. press ok to configure ASM” 

In  11gR2 listeners are driven by the SCAN feature, meaning there is one SCAN listener for each SCAN IP defined in the DNS server. Apart from the three scan listeners, there is a parent SCAN listener that listens on the various database services and connections. Listener service is named as LISTENER in 11gR2 however called LISTENER_<HOSTNAME> in Oracle 10gR2. the dbca log files located in $ORACLE_HOME/cfgtools/dbca/trace.log showed the following entries..

[AWT-EventQueue-0] [11:7:53:935] [NetworkUtilsOPS.getLocalListenerProperties:912] localNode=prddb1, localNodeVIP=prddb1-vip
[AWT-EventQueue-0] [11:7:53:935] [NetworkUtilsOPS.getLocalListenerProperties:913] local listener name = LISTENER_prddb1
[AWT-EventQueue-0] [11:7:53:939] [NetworkUtilsOPS.getLocalListenerProperties:923] No endpoint found for listener name=LISTENER_prddb1
[AWT-EventQueue-0] [11:7:53:939] [ASMAttributes.getConnection:209] getting port using fallback…
[AWT-EventQueue-0] [11:7:53:940] [ASMInstanceRAC.validateASM:609] oracle.sysman.assistants.util.CommonUtils.getListenerProperties(CommonUtils.java:455)

Metalink describes this error and suggests that we use the LISTENER name that is currently configured during database creation.  Well this also did not help.

The workaround was to add the qualified listener to the listener.ora file and to reload the listener.

Stumbling Block III

The listener fix got us through the ASM configuration and database creation. Hoping that is all the issues we could potentially run into we just moved ahead and defined database services as part of the database creation process.

Our next encounter was with the instance /database/services startup process. dbca failed with errors unable to start database, instance and database services. The entries made it to the OCR file as was evident from the crsstat output. However they would not start?  As the obvious step was to check if these resources are registered with the OCR, we checked the status of the database using srvctl

srvctl status database –d  <dbname>  there was no output. How could this be, the crsstat showed the entries but the srvctl gave no results to the check. 

Next step was to take a dump of the OCR file and see if the entries was really in it, found the entries. After considerable research determined that the entire format and structure, syntax of the srvctl utility in 11gR2 is different compared to 10gR2. Tried the srvctl utility from 10gR2 home and it did the trick

We have both 11gR2 and 10gR2 RAC on a 11gR2 clusterware/Grid Infrastructure cluster.  Both using ASM, 11gR2

Logs in 10gR2 RAC..

 

I have received several emails asking help with why oracle does not write information into the alert log files during database startup failures.. Due to practice we tend to look for instance or database related information in our standard log directories such as $ORACLE_BASE/admin/…/bdump or $ORACLE_HOME/network/log directories.  This causes panic and anxiety, searching Google, open entries on OTN forums or open an SR with Oracle support.  The Oracle documentation has also not done a good job in this area. 

Entries are not found in the alert log because the database/instance was not started using SQL*Plus entries are not added to the  db alert log. Depending on what we are trying to look for, what area of the stack is being examined, or at what state of the application is running under,  there are different kinds and flavors and locations where you could find logs.

Note:  This is not a complete list, but a start for the beginners to navigate their way through the troubleshooting process.

1.  During installation:

  • When the installation is started until such time that the screen where the orainventory location is specified the logs are written to the /tmp directory of the server from where the installer is executed.  Subsequently logs get written to $ORACLE_BASE/oraInventory/logs location.
  • When creating a database using dbca or configuring network using netca  the logs are generated to $ORACLE_HOME/cfgtoollogs/. Depending on the configuration assistant used the logs are created under specific directories.

[oracle@] cfgtoollogs]$ ls -ltr
total 40
…………………………

drwxr-x— 2 oracle dba 4096 Feb 16 16:50 oui
drwxr-x— 3 oracle dba 4096 Feb 18 13:45 emca    <===== EMCA (enterprise Manager) log files
drwxr-xr-x 2 oracle dba 4096 Feb 22 22:22 catbundle
drwxr-x— 3 oracle dba 4096 Feb 23 13:13 dbca    <=====  DBCA log files
drwxr-xr-x 2 oracle dba 4096 Feb 23 13:22 netca   <=====  NETCA log files
drwxr-xr-x 3 oracle dba 4096 Mar 18 16:47 opatch  <=====  opatch log files

2.  Clusterware startup

  • On server start or reboot, the information pertaining the various devices are written to the system logs files directory. The location of these log files can vary depending on the OS, on Linux servers they are located in  /var/log/messages.
  • When the OS starts the clusterware, messages are written to two locations, (a) to the alert log located at $ORA_CRS_HOME/log/< >alert<nodename>.log and  (b) for the various clusterware daemon processes to the respective daemon directories.
  • After the clusterware demons are up and running, the clusterware attempts to start the resources configured and managed by the CRS. This is done by reading the configuration settings from the OCR file. During this process, the CRS generates log files for each resource.
    • For the nodeapps which includes VIP, ONS and GRD the log files are located in $ORA_CRS_HOME/log/racg directory.
    • The logs related to the other resources are located under the respective home directories. For example the ASM startup log file is located in the $ASM_HOME/log/<nodename>/racg/ directory and the RBDMS database related resources will be located in the $ORACLE_HOME/log/<nodename>/racg directory.

[oracle@prddb1]$ ls -ltr $ORA_CRS_HOME/log/prddb1
total 168
drwxr-x— 2 oracle dba  4096 Feb 16 14:41 admin   <=== 
drwxr-x— 2 root   dba  4096 Feb 16 14:44 crsd    <===  cluster ready services deamon log files
drwxr-x— 2 oracle dba  4096 Mar 18 09:17 evmd    <===  event manager deamon log files
drwxr-x— 4 oracle dba  4096 Mar 23 13:12 cssd    <===  cluster synchronization services deamon log files
-rw-rw-r– 1 root   dba 48915 Mar 29 02:44 alertprddb1.log <===  clusterware alert log
drwxrwxr-t 5 oracle dba  4096 Mar 29 11:03 racg    <===  all clusterware nodeapps such as VIP, ONS, etc
drwxr-x— 2 oracle dba 98304 Mar 31 15:57 client  <===  all clusterware client log files

[oracle@prddb1 racg]$ ls -ltr $ASM_HOME/log/prddb1/racg
total 68
drwxr-xr-t 2 oracle dba  4096 Feb 16 15:53 racgmain
drwxr-xr-t 2 oracle dba  4096 Feb 16 15:53 racgeut
drwxr-xr-t 2 oracle dba  4096 Feb 16 15:53 racgmdb
drwxr-xr-t 2 oracle dba  4096 Feb 16 15:53 racgimon
-rw-r–r– 1 oracle dba  2256 Mar 23 13:04 imon.log
-rw-r–r– 1 oracle dba 22423 Mar 25 10:06 ora.prddb1.LISTENER_PRDDB1.lsnr.log <== ASM listener log
-rw-r–r– 1 oracle dba  2617 Mar 25 10:06 mdb.log
-rw-r–r– 1 oracle dba 17696 Mar 25 10:06 ora.prddb1.ASM1.asm.log  <==  asm log generated by CRS during resource startup

[oracle@prddb1 racg]$ ls -ltr $ORACLE_HOME/log/prddb1/racg/
total 940

-rw-r–r– 1 oracle dba  43669 Mar 25 10:06 ora.prddb1.RACPROD_PRDDB1.lsnr.log   <==  database listener log
-rw-r–r– 1 oracle dba   6155 Mar 29 11:00 ora.RACPROD.RMANBKUP.RACPROD1.srv.log  <== database service log
-rw-r–r– 1 oracle dba  15474 Mar 29 11:01 ora.RACPROD.RMANBKUP.cs.log   <== database composite service log
-rw-r–r– 1 oracle dba  11606 Mar 29 11:16 ora.RACPROD.RACPROD1.inst.log  <== database instance startup log

3. Database logs

Once the database is up and running then the log entries are written to their normal destinations…  for example the database logs will be written to $ORACLE_BASE/admin/<database name>/bdump/alert_<db instancename>.log

Note: Only the errors during startup are logged under $ORACLE_HOME/log/<nodename>/racg location.  If the startup is clean the log at this location only has a success entry and the remaining startup entries are  written to the alert_<instancename>.log on the respective nodes for the respective instances.

In 11gR2 this is again different.. we will discuss that in a different blog at a later time.

Sangam10 :: call for papers is on

SANGAM10-web

Interested in presenting at the Oracle users group conference in India?  AIOUG has started accepting papers for their second users conference to be held on 3rd and 4th September.

Check out www.aioug.org/sangam10.php for more details.

10gR2 RAC – POC – additional VIP to isolate network traffic

In a RAC configuration we traditional have a  VIP associated to the public address for every node in the cluster.  Since this is the only public network to the database server, all traffic is directed through this VIP.

Note:  Multiple networks in the cluster are not officially  supported for Oracle 10.2 RAC.    So this entire venture of multiple networks was at our risk.

We had a requirement to isolate application traffic received from the application tier from the regular public/user/third party application traffic.  For this an additional public IP was configured for each server in the cluster and an additional VIP was defined on this new public IP.   Since the intention was to use this new VIP only for the application tier, we called it the middle tier (mt) VIP.     So this note discuss the process followed and the outcome

Steps to add the new VIP

There are two methods of adding a VIP to CSR from the command prompt as discussed below or by creating a ‘cap’ file that contains the profile definitions.  This is a 3 node cluster, the nodes are ebspocdb1, ebspocdb2, ebspocdb3

1. Create an user VIP using the following command as user ‘oracle’

/app/oracle/product/crs/bin/crs_profile -create ora.ebspocdb1-mt.vip -t application -d ‘Mid Tier application VIP’ -a /app/oracle/product/crs/bin/usrvip -h ebspocdb1 -p favored -o as=1,ap=1,ra=0,oi=eth3,ov=112.30.1.42,on=255.255.255.0
/app/oracle/product/crs/bin/crs_profile -create ora.ebspocdb2-mt.vip -t application -d ‘Mid Tier application VIP’ -a /app/oracle/product/crs/bin/usrvip -h ebspocdb2 -p favored -o as=1,ap=1,ra=0,oi=eth3,ov=112.30.1.43,on=255.255.255.0
/app/oracle/product/crs/bin/crs_profile -create ora.ebspocdb3-mt.vip -t application -d ‘Mid Tier application VIP’ -a /app/oracle/product/crs/bin/usrvip -h ebspocdb3 -p favored -o as=1,ap=1,ra=0,oi=eth3,ov=112.30.1.44,on=255.255.255.0

2. Once the profile has been created, register the resource with CRS..

/app/oracle/product/crs/bin/crs_register ora.ebspocdb1-mt.vip
/app/oracle/product/crs/bin/crs_register ora.ebspocdb2-mt.vip
/app/oracle/product/crs/bin/crs_register ora.ebspocdb3-mt.vip

3. This resource should be owned by ‘root’.  As user ‘root’ change the ownership of the resource and then give ‘oracle’ user execute privilege for this resource.   You can execute these commands from any node in the cluster.

/app/oracle/product/crs/bin/crs_setperm ora.ebspocdb1-mt.vip -o root
/app/oracle/product/crs/bin/crs_setperm ora.ebspocdb1-mt.vip -u user:oracle:r-x

/app/oracle/product/crs/bin/crs_setperm ora.ebspocdb2-mt.vip -o root
/app/oracle/product/crs/bin/crs_setperm ora.ebspocdb2-mt.vip -u user:oracle:r-x

/app/oracle/product/crs/bin/crs_setperm ora.ebspocdb3-mt.vip -o root
/app/oracle/product/crs/bin/crs_setperm ora.ebspocdb3-mt.vip -u user:oracle:r-x

4. Start the resource as user ‘oracle’

/app/oracle/product/crs/bin/crs_start -c ebspocdb1 ora.ebspocdb1-mt.vip
/app/oracle/product/crs/bin/crs_start -c ebspocdb2 ora.ebspocdb2-mt.vip
/app/oracle/product/crs/bin/crs_start -c ebspocdb3 ora.ebspocdb3-mt.vip

5.  Verify if VIP has been configured and started

[oracle@ebspocdb1 ~]$ crsstat | grep .vip
ora.ebspocdb1.vip           ONLINE     ONLINE on ebspocdb1
ora.ebspocdb1-mt.vip        ONLINE     ONLINE on ebspocdb1
ora.ebspocdb2.vip           ONLINE     ONLINE on ebspocdb2
ora.ebspocdb2-mt.vip        ONLINE     ONLINE on ebspocdb2
ora.ebspocdb3.vip           ONLINE     ONLINE on ebspocdb3
ora.ebspocdb3-mt.vip        ONLINE     ONLINE on ebspocdb3

NOTE:  if you need to modify this resource for any reason you could use the crs_profile update command.  However you have to first stop the resource using crs_stop command and then execute the update command. For example..
crs_profile -update ora.ebspocdb1-mt.vip -t application -a /app/oracle/product/crs/bin/usrvip -h ebspocdb1 -p favored -o oi=eth3,ov=112.30.1.42,on=255.255.255.0
After executing the update command, repeat steps 2 thru 4 to implement this change.

Create second  database listener

Create a second listener on the new mt.vip  using netca on a different port.  Select the new mt VIP as the network.  This will also add the new listener to CRS.  After configuring the listener verify the status.

[oracle@ebspocdb1 ~]$ crsstat | grep .lsnr
ora.ebspocdb1.LISTENERMT_EBSPOCDB1.lsnr    ONLINE     ONLINE on ebspocdb1
ora.ebspocdb1.LISTENER_EBSPOCDB1.lsnr      ONLINE     ONLINE on ebspocdb1
ora.ebspocdb2.LISTENERMT_EBSPOCDB2.lsnr    ONLINE     ONLINE on ebspocdb2
ora.ebspocdb2.LISTENER_EBSPOCDB2.lsnr      ONLINE     ONLINE on ebspocdb2
ora.ebspocdb3.LISTENERMT_EBSPOCDB3.lsnr    ONLINE     ONLINE on ebspocdb3
ora.ebspocdb3.LISTENER_EBSPOCDB3.lsnr      ONLINE     ONLINE on ebspocdb3

Update the LOCAL_LISTENER and REMOTE_LISTENER definitions in the tnsnames.ora file.

Update the LOCAL_LISTENER and REMOTE_LISTENER definitions in the tnsnames.ora file to include the new LISTENERMT_EBSPOCDB1 listener definitions.  Once this is complete and the listener is recycled, the database services are dynamically registered with the listener.

The Results

While the default VIP was defined in the DNS server, the private VIP was only visible to the app tier. On a load test, we noticed that not all connections was established with success.  Actually every other connection failed.

The intention of having a dedicated VIP was to isolate connections from the app tier to the database tier using this private VIP.  However once the LOCAL_LISTENER parameter is configured, database services are registered with both listeners,  what’s wrong with that? When the app tier makes a connection request to the database tier using the new VIP, and when the database tier had to return the session handle back to the requestor, it could not pin it back to the same listener that received the request and almost every other time the session handle was routed to the other listener causing session death.

What would have been nice, is if the services could be mapped to a specific listener, which would allow sessions to be pinned back to the service to the listener to the requestor.

Conclusion

Its obvious why Oracle does not support multiple public networks for the database tier, since VIPs are assigned to public NIC, additional VIP did not help in this isolation.

OCR Repair..

We are in the middle of a test cycle trying to implement FAN between BEA weblogic and Oracle 10g R2 3 node RAC database on OEL 5.   As part of the configuration and setup, after adding the remote application servers to the ONS configuration.   The clusterware did not restart on reboot.

1. Checking the demons using ps -ef | grep cssd, crsd,evmd . all demons where up. however crs_stat – all or crsstat did not give any output.

2. Checking the CSSD log files I noticed the following message.. in the cssd log file

$ORA_CRS_HOME/log/prddb3/cssd/cssd.log

[CSSD]2009-12-24 19:30:36.042 [1274124608] >TRACE:   clssnmRcfgMgrThread: Local Join
[CSSD]2009-12-24 19:30:36.042 [1274124608] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk

Note:  This basically indicated that a node was locking the disk not allowing other nodes to join the cluster.  The node (prddb4) was trying to read the OCR file  (please note OCR is the first file that is accessed by the clusterware during startup) and was not able to.  This potentially indicates a bad OCR file. 

In a similar situation before, a reboot of all servers fixed the locking and the clusterware started without any hiccups.

There may have been other reasons on why this could have happened however due to the urgent nature of the problem and the time it could take to debug and or troubleshoot the situation we decided to repair the OCR file.

3. Nodes prddb3 and prddb5 was repeatable attempting to start the CRS. which generated lots of log entries.  To avoid the logs filling up the disks we requested system admins to shut down the cluster.

Now to fix the problem only one node was started prddb3.  We disabled the autostart of crs using  (this requires ROOT access)
/etc/init.d/init.crs stop  (to stop the crs stack)
/etc/init.d/init.crs disable. (to disable aiuto start on reboot)

4. Based on analysis in step 2 above, the next step was  to repair the OCR file using the following steps also as user root.

[root@prddb3 bin]# ./ocrconfig -repair ocr /dev/raw/ocr1
[root@prddb3 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     306968
         Used space (kbytes)      :      12852
         Available space (kbytes) :     294116
         ID                       :  658275539
         Device/File Name         : /dev/raw/ocr1
         Device/File integrity check succeeded

         Device/File not configured

Cluster registry integrity check succeeded

5.  ‘Device/File not configured’?  Then what is the check succeeded message. Isn’t it a bit confusing. We had configured two OCR files and then why is the second file missing?  Realized that the repair command will only repair one OCR file at a time, besides we had only repaired the primary copy.  Next step was to repair the mirror copy

[root@prddb3bin]#./ocrconfig -repair ocrmirror  /dev/raw/ocr2
[root@prddb3 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     306968
         Used space (kbytes)      :      12852
         Available space (kbytes) :     294116
         ID                       :  658275539
         Device/File Name         : /dev/raw/ocr1
         Device/File integrity check succeeded
         Device/File Name         : /dev/raw/ocr2
         Device/File integrity check succeeded

Cluster registry integrity check succeeded

6.  Now that both the OCR files are fixed we started the clusterware stack using the /etc/init.d/init.crs start command. This brought up the cluterware and the complete stack without any issues.  

7. Now we had to reset the clusterware sstartup process to auto restart on reboot. . (recollect we disabled reboot in Step 3 above).

/etc/init.d/init.crs enable
/etc/init.d/init.crs start

OOW 2009

OOW was a great success …gone are the days when this event is focused towards just marketing.  Normally this conference had more VP, CIO, etc attending to get overview of marketing fluff from various presentations.  These days presentations are filled with good technical information useful for database administrators in their day to day activities. In this open world I had the opportunity to present three papers..

1. RAC for developers:  It was a pretty to see that basic knowledge of how Oracle works lacking amongst the developers.. they did not know their query was executed by oracle and the result sets returned back to them.  this was a hard presentation as I was dragged into a rat hole on what an instance was and how the database is common against many instances in a RAC environment.  I did not do much justice on what the presentation was focused on  namely discussing FAN, FCF, load balancing..how to code and troubleshoot them.

2. Looking under the hood at the Oracle Clusterware. I had done this presentation in OOW 2008 and received good response/feedback.  This year also the response was very good, had a full room and lots of good questions at the end of the session.

IMG_7546

3.  Tools and Utilities for RAC available on Metalink.  Most popular of all the presentations, its surprising how many of us really know the various tools and utilities available on Metalink and OTN that could be used for troubleshooting and monitoring a RAC.  Yet we spend all the money we have to third party tool vendors and developers when most of what is provided by them and essential for day to day administration and troubleshooting of the RAC environment are available free of cost from Oracle.

Looking forward to presenting at future OOW’s  and conferences around the world.

_____________

16th Jan ‘10  —  this has been sitting in my draft folder for such a long time.. I understand this is old material.. but since its already done.. i decided to post it.

______________

DGB::ORA-16608:one or more sites have warnings

I am at a customer site trying to help them with a physical standby implementation in a RAC environment. The primary database is a three node RAC cluster but the standby environment is a single instance RAC database.

During setup and configuration of the dataguard broker, we had the following error . ” ORA-16608: one or more sites have warnings”. Like it says one or more errors? Even if there is only one error with the configuration or setup the misleading message can make you wondering.. Lets take at analyzing one such scenario..

1. From the command line interface of the broker when you check the configuration status using show configuration, the broker would give you the current setup at the end of the show configuration output, the broker tries to query the current status from all servers involved, this includes all instances in the RAC configuration on both the primary and physical standby side.

DGMGRL> show configuration verbose

Configuration
Name: AT_DGB
Enabled: YES
Protection Mode: MaxPerformance
Fast-Start Failover: DISABLED

Databases:
ATDB – Primary database
ATSB – Physical standby database

Current status for “AT_DGB”:
Warning: ORA-16608: one or more databases have warnings

2. Which database in the DGB configuration has this error? SHOW DATABASE VERBOSE <database name> would tell us which database has this error. The error above really does not tell which database has the error. So we may have to check the details against both the databases. Checking the status of the Primary database we observed “ORA-16809: multiple warnings detected for the database” another error? again not one specific error but potentially many.. that what the error description tells us..

DGMGRL> SHOW DATABASE VERBOSE ‘ATDB’;

Current status for “ATDB”:
Warning: ORA-16809: multiple warnings detected for the database

Properties:
InitialConnectIdentifier = ‘atdb’
ObserverConnectIdentifier = ”
LogXptMode = ‘SYNC’
Dependency = ”
DelayMins = ‘0’
Binding = ‘OPTIONAL’
MaxFailure = ‘0’
MaxConnections = ‘1’
ReopenSecs = ‘300’
NetTimeout = ‘180’
LogShipping = ‘ON’
PreferredApplyInstance = ”
ApplyInstanceTimeout = ‘0’
ApplyParallel = ‘AUTO’
StandbyFileManagement = ‘auto’
ArchiveLagTarget = ‘0’
LogArchiveMaxProcesses = ’30’
LogArchiveMinSucceedDest = ‘1’
DbFileNameConvert = ”
LogFileNameConvert = ”
FastStartFailoverTarget = ”
StatusReport = ‘(monitor)’
InconsistentProperties = ‘(monitor)’
InconsistentLogXptProps = ‘(monitor)’
SendQEntries = ‘(monitor)’
LogXptStatus = ‘(monitor)’
RecvQEntries = ‘(monitor)’
HostName(*)
SidName(*)
LocalListenerAddress(*)
StandbyArchiveLocation(*)
AlternateLocation(*)
LogArchiveTrace(*)
LogArchiveFormat(*)
LatestLog(*)
TopWaitEvents(*)

Checking the details of the other database in the configuration reveled no errors.

3. Time to drill down further to get more details. Checking the DMON process log files under ‘bdump’ destination directory we noticed the following.. (during initial configuration it easier if we tail -f on the DMON log file on both the primary and standby servers to monitor its activity)..

for example: /oracle/admin/atdb/bdump/$ tail -f drcatdb1.log

DG 2009-09-30-14:21:34 0 2 0 RSM Warning: Property ‘StandbyFileManagement’ has inconsistent values:METADATA=’auto’, SPFILE=”, DATABASE=’auto’
DG 2009-09-30-14:21:34 0 2 0 RSM0: HEALTH CHECK WARNING: ORA-16714: the value of property StandbyFileManagement is inconsistent with the database setting
DG 2009-09-30-14:21:34 0 2 0 RSM Warning: Property ‘ArchiveLagTarget’ has inconsistent values:METADATA=’0′, SPFILE=”, DATABASE=’0′
DG 2009-09-30-14:21:34 0 2 0 RSM0: HEALTH CHECK WARNING: ORA-16714: the value of property ArchiveLagTarget is inconsistent with the database setting
DG 2009-09-30-14:21:34 0 2 0 RSM Warning: Property ‘LogArchiveMinSucceedDest’ has inconsistent values:METADATA=’1′, SPFILE=”, DATABASE=’1′
DG 2009-09-30-14:21:34 0 2 0 RSM0: HEALTH CHECK WARNING: ORA-16714: the value of property LogArchiveMinSucceedDest is inconsistent with the database setting
DG 2009-09-30-14:21:35 0 2 698930610 Operation MON_VERIFY continuing with warning, status = ORA-1679

4. There are definitely so many errors in the DMON log file all related to property mismatch between dataguard broker and the database. “HEALTH CHECK WARNING: ORA-16714: the value of property ArchiveLagTarget is inconsistent with the database setting” and all these properties have a mismatch?

Checking the values on both the DGB and the instance really could not find any mismatch. After considerable searching on metalink and google had no luck with any specific reasons why this could be wrong or showing as a mismatch.

5. If you recollect one of the requirements for the DGB is to have an SPFILE configuration, the customer did have an SPFILE however instead of keeping the file on shared location, maintained three copies (one on each instance) of the SPFILE in the $ORACLE_HOME/dbs directory..

6. We created a new pfile from the spfile, disabled the init<SID>.ora file and created an spfile file on ASM.

SQL>create spfile=’+data_01_dg/atdb/spfileatdb.ora’ from pfile=’/home/oracle/pfileATB.ora

This did not solve the problem. Apparently initatdb1.ora located in $ORACLE_HOME/dbs directory was not rightly pointing to the ASM spfile. We changed this to look like the output below..

$ORACLE_HOME/dbs/cat initatdb1.ora
SPFILE=’+DATA_01_DG/atdb/spfileatdb.ora’

7. Starting the instance either using SQLplus or SRVCTL utility brought the database instance up successfully however the instance did not use the right spfile that was created on ASM but used the spfile located in the $ORACLE_HOME/dbs directory.

8. We had to add the spfile using the srvctl utility to the OCR file. Where is the SPFILE information stored in the OCR file? Every database managed by on the RAC cluster could keep the SPFILE file (optionally) in the OCR file. Lets analyze the OCR file, to read the contents of the OCR file, we take a dump of the contents into ASCII readable format.

Not seen such a behavior before.  We are continuing to investigate why this is happening?

$ORA_CRS_HOME/bin/ocrdump will create an output called OCRDUMP in the default directory where the command was executed from. Edit the OCRDUMP file (vi OCRDUMP) and search for SPFILE, against the database ATDB database the output was blank. You would think that if the database was created using DBCA that wrote the database and instance related entries to the OCR file

[DATABASE.DATABASES.atdb.SPFILE]
ORATEXT :
SECURITY:{USER_PERMISSION:PROCR_ALL_ACCESS, GROUP_PERMISSION:PROCR_WRITE,OTHER_PERMISSION: PROCR_READ, USER_NAME : oracle, GROUP_NAME : dba}

Using SRVCTL utility we added the SPFILE information to the OCR file and this fixed the startup issue to use the right SPFILE.

$ /oracle/crs/bin/srvctl modify database -d atdb -p +DATA_01_DG/ATDB/spfileatdb.ora

To make sure that the entry was in the OCR we did another dump of the OCR file and checked the entry for the SPFILE for the ATDB database. Yes it was there..

[DATABASE.DATABASES.atdb.SPFILE]
ORATEXT : +SHARED_DATA_01_DG/ARM00TDB/spfilearm00tdb.ora
SECURITY:{USER_PERMISSION:PROCR_ALL_ACCESS, GROUP_PERMISSION:PROCR_WRITE,OTHER_PERMISSION: PROCR_READ, USER_NAME : oracle, GROUP_NAME : dba}

9. This really fixed the issue, it was the DGB not able to synchronize the value of these various parameters/properties between the various instances in the cluster. BTW, in a RAC environment its a best practice to maintain the SPFILE in a shared location such as ASM storage.

The SHOW CONFIGURATION command from the DGMGRL returned error free.

DGMGRL> SHOW CONFIGURATION VERBOSE

Configuration
Name: AT_DGB
Enabled: YES
Protection Mode: MaxPerformance
Fast-Start Failover: DISABLED
Databases:

ATDB – Primary database
ATSB – Physical standby database

Current status for “AT_DGB”:
SUCCESS

Rundown of the last 1 odd years….

I had just started and after two entries disappeared?  hmmm..  yes apologize for staying away…

Lets do a rundown of the last 1 odd years..

1.  ASM book launch was good, the book is doing good in the  market its interesting to see how well ASM is being adopted in the Oracle space. Is time to update this book for the next release?   with 11gR2 released maybe something to consider.. need to get in touch with the other authors.  maybe soon..

2.  Did do several presentations at the OOW 2008:
1. Looking under the hood at the  Oracle clusterware (2). Understanding Oracle RAC   (3). Tools and utilities available on metalink . All presentations where well attended, several good questions and while sharing we also learn.  Will I be a the OOW09?  ;OOW is a great place to meet up with several of the Oracle users from all over the world.  Unlike the earlier open world shows, OOW  has become more technical in the recent years and there is a lot we learn from some the excellent topics presented in the RAC space.  Its the lack of time or session conflicts that prevent many of us from attending all of them.

3. RAC SIG..
The RAC SIG had opened elections for its officer positions.  As mentioned before, I wanted a smaller role, besides I could not stay away from not working with user groups. Stood for the ‘International Chair’ position and was elected.  This is a great role, because it allows me to helping take the RAC SIG to the next level.  We are encouraging starting smaller SIGS in the various regions of the world.  Actually China, Singapore and India have adopted this already, with several meetings already scheduled and organized we hope other regions will also do the same..

4. AIOUG
Great start, and 2008 was a great year. we had two technology nights at Bangalore and Hyderabad.  Great start, great interest from Oracle users in India.  In 2009 we have already done our first quarterly meeting in Mumbai, instead of a technight we organized a techday.  We wanted to try this based on the feedback received from the other two tech events.
We just completed our next major event ‘3 day Oracle user conference Sangam’09.  With Tom Kyte being the key speaker for the event, we also had other speakers from inside and outside Oracle.  More about it in a separate blog.

Sangam09 just completed, OOW just around the corner. . lot to write and share.. not to mention more technical content also on RAC and ASM

Back from UKOUG 2007

UKOUG is really a very well organized conference. A conference I look forward to attend every year, either as speaker or as an attendee. This years conference was no different as usual great. Unbelievable set of presenters on great topics. It was interesting to see some excellent RAC related topics this year. I did give a presentation on “Exploiting Parellel Queries in a RAC Environment” just focusing on RAC related areas. Apart from this I also had the opportunity to chair a RAC roundtable on the final day.

Julian Dykes presentation was different compared to last year. Those who attended last years UKOUG presentation, Julian had done a presentation where he requested some attendees (including myself) to act as Oracle instances to demonstrate Cache Fusion. Here are some photographs from this session last year (courtesy Connor McDonald) Picture 1 and picture 2. This year his presentation focused on Interconnects. Was excellent as usual. Here is another view from Alex’s blog on this presentation..

There were 4-5 papers from CERN on RAC topics, two of the ones I had attended are 1) Real Life experiences of RAC Scalability with a 6 Node Cluster, 2) Installing and Managing Hundreds of RAC Servers. Its interesting to see more and more companies undertaking challenging projects using Oracle RAC.

Another outstanding presentation was on Oracle Clusterware By Bernhard de Cock Buning. He explained the clusterware details with such great detail. Showing all the components of the clusterware and then drilling down into each component at a greater detail. The new component piece that I learnt from this presentation was the oclsomon, which was introduced in Oracle 10.2.0.2 and I have not covered this in my Oracle 10g RAC Grid Services and Clustering book. Great job Bernhard..

I will discuss about oclsmon later on this blog.

I hope these presenters can do a webcast on these topics for the Oracle RAC-SIG.

Back from OOW 2007

Just returned from OOW’2007. As usual this is one of the events that tend to be pretty hectic with several sessions and events that get scheduled in parallel and you end up confused on which one would be better to pick. Though several times, several of us think that OOW is Oracle’s show case to show off what is coming or some kind of marketing talk. Over the past couple of years, I have noticed that this is not completely true. There are presentations that contains product and new release overview, if you carefully select the presentation titles, you can get some good technical presentation.

Some of the good technical sessions I had attended were the performance tuning for data warehouses, performance tuning roundtable, performance tuning RAC and several of the IOUG and SIG events.

OOW started on Sunday with various SIGs presenting in their respective areas. I had presented a topic on Parallel queries in Oracle RAC for the RAC-SIG on Sunday.

ASMBook

At OOW, we also released a book on Oracle Automatic Storage Management (ASM) published by Oracle Press. This book is a combined effort of Nitin, Rich Long and myself. I am happy that the contents are pretty well covered from the basics of storage management to some internals and also to topics such as how to change the paradigm of the current of storage management implementation.

.AIOUGlogo.png

This week had another great importance, a dream had come true, and a dream I had been working on for several years had become a reality. While every country had an Oracle Users Group that provided knowledge and information sharing to Oracle users, one such group was long past due in India. I had been working on getting one started over the past several years. I am happy to say we officially went live with this group. Those interested should check the user group website at www.aioug.org