Monday, October 31, 2005
weighted searches
If the Search Confidence Level submitted is less than 100%, PRS will search the database and calculate a total for each record based upon the weighting of the input criteria and whether or not there is a full or partial match in the database. The details of the algorithm are:
· Every input criterion for a given query must be assigned a weight that will be stored in the PRS. Weights used for Provider searches are stored in the code table GRS_CT_PROV_CRITERION_WEIGHTS. A weight can be any non-negative number, although a maximum weight of 50 is probably reasonable. If the weight for a given criterion is 0, that criterion is not considered in the query. If the weight is 50 (and the maximum weight is 50), that criterion must be an exact match. If the weight is any number between 1 and 49, the criterion is considered during the query but does not have to be an exact match.
· When a query is submitted, PRS will calculate a "possible total" based on the sum of the weights of the supplied input criteria.
· PRS will issue a search on the database for each search criterion and "keep score" of the hits, weighting them according to the weight codes. If a record matches a particular input criterion (regardless of whether the input criterion includes a wildcard character), the weight associated with that criterion will be added to that record's total score.
· The total score will be divided by the possible total to achieve a percentage confidence level.
· If the calculated confidence level of the record is greater than or equal to the Search Confidence Level, then the record should be returned.
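The scoring steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PRS implementation: the function name weighted_search, its parameters, and the idea of passing in the per-criterion hits as sets of record IDs are all hypothetical. It also does not model the rule that a maximum-weight criterion must be an exact match.

```python
from typing import Dict, Set


def weighted_search(
    weights: Dict[str, float],
    hits_by_criterion: Dict[str, Set[str]],
    threshold_percent: float,
) -> Dict[str, float]:
    """Return {record_id: confidence_percent} for records meeting the threshold.

    weights            -- weight per criterion (hypothetical stand-in for the code table)
    hits_by_criterion  -- record IDs matched by each supplied criterion
    threshold_percent  -- the Search Confidence Level, e.g. 50 for 50%
    """
    # Criteria with weight 0 are not considered in the query.
    supplied = [c for c in hits_by_criterion if weights.get(c, 0) > 0]

    # "Possible total": sum of the weights of the supplied criteria.
    possible_total = sum(weights[c] for c in supplied)
    if possible_total == 0:
        return {}

    # "Keep score": add a criterion's weight to every record it matched.
    scores: Dict[str, float] = {}
    for criterion in supplied:
        for record_id in hits_by_criterion[criterion]:
            scores[record_id] = scores.get(record_id, 0) + weights[criterion]

    # Convert to a percentage and keep records at or above the threshold.
    result = {}
    for record_id, score in scores.items():
        confidence = 100 * score / possible_total
        if confidence >= threshold_percent:
            result[record_id] = confidence
    return result
```

Passing the hits in as plain sets keeps the sketch independent of any particular database; in practice each set would come from one per-criterion query.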
For example:
· The user supplies a Name, an Address Line 1 and a Province. Search Confidence threshold is 50%.
· Criteria weights (from the code table) are as follows:
· Name: 50
· Address Line 1: 45
· Province: 10
· This search should not return any single-criterion matches (even Name), but should return any RU which matches any pair of criteria.
i. Calculate the possible total: 50 + 45 + 10 = 105. Calculate the target score: 50% of 105 = 52.5.
ii. For each criterion, construct a query listing matches and their scores:
· Name Query returns:
RU_ID | Score | Criterion |
00001 | 50 | NAME |
00002 | 50 | NAME |
· Address Query returns:
RU_ID | Score | Criterion |
00001 | 45 | ADDRESS |
00003 | 45 | ADDRESS |
00004 | 45 | ADDRESS |
· Province Query returns a long list of RU IDs:
RU_ID | Score | Criterion |
00005 | 10 | PROVINCE |
00003 | 10 | PROVINCE |
iii. Form a query presenting the union of these result sets.
iv. Form a summary query summarizing total score by RU_ID:
RU_ID | Score |
00001 | 95 |
00002 | 50 |
00003 | 55 |
00004 | 45 |
00005 | 10 |
v. RU_IDs 00001 and 00003 are returned. 00001 has a Name and Address Line 1 match (score 95), and 00003 has an Address Line 1 and Province match (score 55). 00002 has only a Name match, 00004 has only an Address Line 1 match, and 00005 has only a Province match, so all three are excluded.
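The union and summary queries in this example can be tried out with Python's built-in sqlite3 module. This is a rough sketch under assumptions: the tables name_hits, address_hits and province_hits are hypothetical stand-ins for the three per-criterion result sets, and they contain only the rows listed above (the real Province query would return many more).

```python
import sqlite3

# Weights and threshold from the worked example.
WEIGHTS = {"NAME": 50, "ADDRESS": 45, "PROVINCE": 10}
THRESHOLD_PERCENT = 50

# Rows shown in the example's per-criterion result sets.
HITS = {
    "NAME": ["00001", "00002"],
    "ADDRESS": ["00001", "00003", "00004"],
    "PROVINCE": ["00005", "00003"],
}

conn = sqlite3.connect(":memory:")
for criterion, ru_ids in HITS.items():
    table = f"{criterion.lower()}_hits"  # hypothetical table names
    conn.execute(f"CREATE TABLE {table} (ru_id TEXT, score INTEGER, criterion TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        [(ru_id, WEIGHTS[criterion], criterion) for ru_id in ru_ids],
    )

possible_total = sum(WEIGHTS[c] for c in HITS)           # 105
target_score = possible_total * THRESHOLD_PERCENT / 100  # 52.5

# Union of the three result sets, then total score per RU_ID.
summary_sql = """
    SELECT ru_id, SUM(score) AS total
    FROM (
        SELECT ru_id, score FROM name_hits
        UNION ALL
        SELECT ru_id, score FROM address_hits
        UNION ALL
        SELECT ru_id, score FROM province_hits
    ) AS all_hits
    GROUP BY ru_id
    ORDER BY ru_id
"""
totals = conn.execute(summary_sql).fetchall()
returned = [ru_id for ru_id, total in totals if total >= target_score]

print(totals)    # [('00001', 95), ('00002', 50), ('00003', 55), ('00004', 45), ('00005', 10)]
print(returned)  # ['00001', '00003']
```

UNION ALL is used rather than UNION so that two criteria with the same weight matching the same RU are not collapsed into a single row before the scores are summed.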
About Me
- Name: Andrew Cripps
- Location: Victoria, British Columbia, Canada
Independent consultant in healthcare.