Friday, September 12, 2008

Multi-Column CTX Index (ALMOST Hack-Attack)

The Problem:

The company I work for has an Oracle table containing book information that has data in two columns that they want to perform a keyword search on, Subject_Headings - VarChar2(70) & Medical_Subject_Headings - VarChar2(70). Now, we could create two separate CTX indexes, one on Subject headings and one on Medical_Subject_Headings and then use an OR in the where clause like this:

Select *
From Items
Where (
contains(Subject_Headings, '{HEART}') > 0 OR
contains(Medical_Subject_Headings, '{HEART}') > 0
)

The problem with this approach is that:

  1. It requires two separate indexes.
  2. It requires both of those indexes to be searched every time the keyword is used.
  3. The "OR" in the where clause affects the performance of the query.

The simple solution, or so I thought, was to create a compound CTX index. (a CTX index using multiple columns) So I tried this implementation:


Create Index schema.G_CTX_Subject_IDX

On schema.Items(Subject_Headings, Medical_Subject_Headings)

IndexType Is CTXSYS.Context;

But, if you're reading this you probably already know that you can't get there from here. You get the dreaded: ORA-29851: cannot build a domain index on more than one column. So what do you do?

It l(o)(O)ks like the only way to solve this problem is with a stupid hack trick.

The Stupid Hack:

Right about now it looks like I am going to have to:

  1. Create an extra hack column, Subjects VarChar2(141), to hold the contents of Subject_Headings - VarChar2(70) + one space + the contents of Medical_Subject_Headings - VarChar2(70)
  2. Create an insert & update trigger on the table to merge the values of Subject_Headings and Medical_Subject_Headings into the hack column "Subjects"
  3. Create a CTX index on the "Subjects" column.

If you are thinking that also then hang on...

The Real Solution:

The real solution is to use the Multi_Column_DataStore. I have read elseware on the web that only the CTXSYS user can create multi_column preferences but I was able to do it using my normal user. I don't know if that is because the DBA relaxed the permissions and I can't ask him because he left the company which left us in a kind of a fix. Anyway these examples are going to be built using a normal user; if the statements don't work then you can login as ctxsys and try them.

The first thing you need to do is create datastore preference/definition.

Begin

ctx_ddl.Create_Preference('two_column_subject', -- Preference Name

'MULTI_COLUMN_DATASTORE'); -- object name

End;

Next, you need to set the attributes for that preference. In this case my datastore type is going to be columns and my columns are Subject_Headings and Medical_Subject_Headings:

Begin

cst_ddl.Set_Attribute('two_column_subject', -- pref name

'COLUMNS', -- attribute name

'Subject_Headings, Medical_Subject_Headings'); -- value = these columns

End;

Now we only need to create our CTX index using this datastore. For this example I am only demonstrating how to perform the solution so I have not assigned parameters for word list, stop words, lexer, storage, etc. You should check with your DBA for important things like those.

Create Index schema.G_CTX_Subject_IDX

On schema.Items(Subject_Headings)

IndexType Is CTXSYS.Context

Parameters('DataStore schema.two_column_subject');

In the above example you would substitue "schema" with the user that created the multi-column preference "two_column_subject".

Soup-To-Nuts Example:

Note:

I am using TOAD v9.0.0.160 and this script has been tested with that version of TOAD ONLY. If you are running this in SQL*PLUS then you may need to make some minor modifications in order to run the script.

--

-- Drop the table if it exists and then create it.

--

Begin Execute Immediate 'drop table Scott_Items'; Exception When Others Then Null; End;
/


Create Table Scott_Items
(
pk_Item Number(11) Not Null,
Title VarChar2(100),
Subject_Headings VarChar2(70),
Medical_Subject_Headings VarChar2(70)
);
/

Insert Into Scott_Items (pk_Item, Title, Subject_Headings, Medical_Subject_Headings)
Values(1, 'Title 1', '1. HEART 2.REFERENCE', '1.CARDIOVASCULAR DISEASES.');
/

Insert Into Scott_Items (pk_Item, Title, Subject_Headings, Medical_Subject_Headings)
Values(2, 'Title 2', '1. MAGNETIC RESONANCE IMAGING. 2. RADIOLOGICTECHNOLOGISTS.', 'MAGNETIC RESONANCE IMAGING.');
/
Insert Into Scott_Items (pk_Item, Title, Subject_Headings, Medical_Subject_Headings)
Values(3, 'Title 3', '1. SUBSTANCE ABUSE--PATHOPHYSIOLOGY. 2. DOPAMINE--PHYSIOLOGICAL EFFECT', '1. NEUROPHYSIOLOGY. 2.HEART EFFECT');
/
Insert Into Scott_Items (pk_Item, Title, Subject_Headings, Medical_Subject_Headings)
Values(4, 'Title 4', '1.SPORTS 2.BASKETBALL 3.CELTICS 4. REFERENCE', '');
/
Begin
Begin
ctx_ddl.drop_preference('two_column_subject');
Exception
When Others Then Null;
End;
ctx_ddl.create_preference('two_column_subject',
'MULTI_COLUMN_DATASTORE');


ctx_ddl.set_attribute('two_column_subject',
'COLUMNS',
'Subject_Headings, Medical_Subject_Headings');

End;

/

--

-- Drop existing CTX index, if it exists nad create a new one

--

Begin Execute Immediate 'drop INDEX G_CTX_SUBJECT_IDX'; Exception When Others Then Null; End;

/

CREATE INDEX G_CTX_SUBJECT_IDX ON Scott_Items(Subject_Headings)

INDEXTYPE IS CTXSYS.CONTEXT

PARAMETERS('DataStore SCHEMA.two_column_subject'); -- Change SCHEMA to your user
/

Select Title From Scott_Items

Where Contains(Subject_Headings, '{HEART}')>0;

TITLE

------------------

Title 1

Title 3

Summary:

In this hack-attack we successfully solved the problem of creating a multi-column CTX index (domain index) and avoided the expensive and ugly hack of adding a column, creating triggers, and concatinating data in order to create a single searchable index that would allow a user to search two columns with one keyword statement.

I hope that this hack-attack was helpful to you in your Oracle travels.

ScottE

2 comments:

Scott Abel said...

I am running into a problem with the CTX index not syncing correctly. If the indexed column changes then the sync works but if the second column changes the CTX sync does not see the change. I am currently looking into this and will, hopefully, have the answer shortly.

Scott Abel said...

It looks like we need to resort to a different form of the "stupid hack trick." Please look for the next post: "Multi-Column CTX Index (Stupid Hack Trick)"