Monday, October 31, 2005

weighted searches

Hi Brian,
 
I didn't answer one of your questions fully enough.
 
When you search for a provider, the PRS can do a weighted search (search confidence < 100%). You configure the system to apply various weights to search fields, and these weights are then used to score how well each provider matches. You can avoid the weighted search by specifying a confidence level of 100%.
 
What it doesn't do is synonym matching or other complex matching techniques.
 
Thanks
Andrew
 

If the Search Confidence Level submitted is less than 100%, PRS will search the database and calculate a total score for each record, based on the weighting of the input criteria and on whether there is a full or partial match in the database. The details of the algorithm are:

·         Every input criterion for a given query must be assigned a weight that is stored in the PRS. Weights used for Provider searches are stored in the code table GRS_CT_PROV_CRITERION_WEIGHTS. The weights can be any non-negative number, but 50 is probably a reasonable maximum. If the weight for a given criterion is 0, that criterion is effectively not considered in the query. If the weight is 50 (and the maximum weight is 50), then that criterion must be an exact match. If the weight is any number between 1 and 49, the criterion is considered during the query but does not have to be an exact match.

·         When a query is submitted, PRS will calculate a "possible total" based on the sum of the weights of the supplied input criteria.

·         PRS will issue a search on the database for each search criterion and "keep score" of the hits, weighting them according to the weights in the code table. If a record matches a particular input criterion (whether or not the input criterion includes a wildcard character), the weight associated with that criterion will be added to the record's total score.

·         The total score will be divided by the possible total to give the record's confidence level as a percentage.

·         If the calculated confidence level of the record is greater than or equal to the Search Confidence Level, then the record should be returned.
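The scoring steps above can be sketched in Python as follows. This is a minimal illustration, not the PRS implementation: it assumes the per-criterion database searches have already been run, so `matches` (a hypothetical input) maps each supplied criterion to the set of record IDs that matched it. The function and parameter names are invented for the example.

```python
from typing import Dict, List, Set


def weighted_search(
    weights: Dict[str, float],     # criterion -> weight, as in the code table
    matches: Dict[str, Set[str]],  # supplied criterion -> IDs of matching records
    threshold_pct: float,          # Search Confidence Level, e.g. 50.0
) -> List[str]:
    """Hypothetical sketch: return IDs whose confidence meets the threshold."""
    # Possible total: sum of the weights of the supplied criteria.
    # (A supplied criterion with no hits should still appear, with an empty set.)
    possible_total = sum(weights[c] for c in matches)
    if possible_total == 0:
        return []

    # Keep score: add a criterion's weight to every record that matched it.
    # A zero-weight criterion adds nothing, so it is effectively ignored.
    scores: Dict[str, float] = {}
    for criterion, ids in matches.items():
        for record_id in ids:
            scores[record_id] = scores.get(record_id, 0) + weights[criterion]

    # Confidence = score / possible total; keep records at or above threshold.
    return sorted(
        rid for rid, score in scores.items()
        if 100 * score / possible_total >= threshold_pct
    )


weights = {"NAME": 50, "ADDRESS": 45, "PROVINCE": 10}
matches = {
    "NAME": {"00001", "00002"},
    "ADDRESS": {"00001", "00003", "00004"},
    "PROVINCE": {"00005", "00003"},
}
print(weighted_search(weights, matches, 50))  # ['00001', '00003']
```

With the weights and matches from the example below, this returns RU_IDs 00001 and 00003. At a threshold of 100% a record must match every supplied criterion that has a non-zero weight.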

For example:

·         The user supplies a Name, an Address Line 1 and a Province. The Search Confidence Level is 50%.

·         Criteria weights (from the code table) are as follows:

·         Name:  50

·         Address Line 1:  45

·         Province:  10

·         This search should not return any single-criterion matches (even Name), but should return any RU which matches any pair of criteria.

i.   Calculate the total possible score: 50 + 45 + 10 = 105. Calculate the target score: 50% of 105 = 52.5.

ii.  For each criterion, construct a query listing matches and their scores:

·         Name Query returns:

    RU_ID    Score    Criterion
    00001    50       NAME
    00002    50       NAME

·         Address Query returns:

    RU_ID    Score    Criterion
    00001    45       ADDRESS
    00003    45       ADDRESS
    00004    45       ADDRESS

·         Province Query returns a long list of RU IDs (only two are shown in this example):

    RU_ID    Score    Criterion
    00005    10       PROVINCE
    00003    10       PROVINCE

iii. Form a query presenting the union of these result sets.

iv.  Form a summary query that totals the score by RU_ID:

    RU_ID    Score
    00001    95
    00002    50
    00003    55
    00004    45
    00005    10

v.   RU_IDs 00001 and 00003 are returned: 00001 has a Name and Address Line 1 match (95), and 00003 has an Address Line 1 and Province match (55). 00002 has only a Name match, 00004 has only an Address Line 1 match, and 00005 has only a Province match, so all three are excluded.
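The union and summary steps map naturally onto SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names (`name_hits`, `address_hits`, `province_hits`) and their columns are hypothetical stand-ins for the per-criterion query results, since the actual PRS schema isn't described here. It loads the three result sets from the example, totals the scores by RU_ID, and keeps rows at or above the target score.

```python
import sqlite3

# Per-criterion query results from the example: (RU_ID, score).
# Table names are hypothetical.
HITS = {
    "name_hits": [("00001", 50), ("00002", 50)],
    "address_hits": [("00001", 45), ("00003", 45), ("00004", 45)],
    "province_hits": [("00005", 10), ("00003", 10)],
}

# UNION ALL (not UNION) so identical (ru_id, score) rows from different
# criteria are not collapsed before they are summed.
SUMMARY_SQL = """
    SELECT ru_id, SUM(score) AS total
    FROM (
        SELECT ru_id, score FROM name_hits
        UNION ALL
        SELECT ru_id, score FROM address_hits
        UNION ALL
        SELECT ru_id, score FROM province_hits
    ) AS hits
    GROUP BY ru_id
    HAVING SUM(score) >= ?
    ORDER BY ru_id
"""


def summarize(possible_total: float, threshold_pct: float):
    """Return (ru_id, total) rows whose total meets the target score."""
    conn = sqlite3.connect(":memory:")
    # Load each per-criterion result set into its own table.
    for table, rows in HITS.items():
        conn.execute(f"CREATE TABLE {table} (ru_id TEXT, score REAL)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    target = possible_total * threshold_pct / 100  # 105 * 50% = 52.5
    result = conn.execute(SUMMARY_SQL, (target,)).fetchall()
    conn.close()
    return result


print(summarize(105, 50))  # [('00001', 95.0), ('00003', 55.0)]
```

Comparing the summed score against the target score (52.5) in the HAVING clause is equivalent to dividing each total by 105 and comparing against 50%, and it keeps the filtering inside the database.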
