Issue with fieldPotency caused by case-sensitive comparison when counting $numberOfMatches #43

@rraadd

Hi,

I had some issues with &fieldPotency. The SimpleSearch call I used was:

[[!SimpleSearch?
&searchStyle=partial
&docFields=pagetitle,longtitle,description,introtext
&perPage=50
&fieldPotency=pagetitle:10,longtitle:1,description:1,introtext:1
]]

No matter what value I set for pagetitle, the sorted results seemed to be randomly ordered. Resources with the search term in their title were listed after those where the term appeared only in the description, even when the pagetitle fieldPotency value was many times higher than the one set for description.

The problem

When the search term is used to count the number of matches in each of the &docFields, the comparison seems to be case sensitive. Since the titles of my resources all start with a capital letter, comparing them to the search term produced no match, so the potency values for pagetitle were simply ignored. To understand this behavior better, check lines 126-132 of simplesearchdriver.php, located in model/simplesearch/driver/:

foreach ($this->search->searchArray as $term) {
    $queryTerm = preg_quote($term,'/');
    $regex = ($searchStyle == 'partial') ? "/{$queryTerm}/i" : "/\b{$queryTerm}\b/i";
    $numberOfMatches = preg_match_all($regex, $resource->{$field}, $matches);
    if (empty($this->searchScores[$resourceId])) $this->searchScores[$resourceId] = 0;
    $this->searchScores[$resourceId] += $numberOfMatches * $potency;
}
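To illustrate the effect, here is a toy reproduction of the symptom, assuming a case-sensitive comparison as described above. The function `scoreResource`, its `$modifiers` parameter, and the sample data are made up for illustration and are not part of SimpleSearch; the sketch only mirrors the "matches times potency" sum from the loop.

```php
<?php
// Hypothetical stand-in for the scoring loop: sums (matches * potency) over fields.
function scoreResource(array $fields, array $potency, array $terms, string $modifiers): int
{
    $score = 0;
    foreach ($potency as $field => $weight) {
        foreach ($terms as $term) {
            $regex = '/' . preg_quote($term, '/') . '/' . $modifiers;
            // preg_match_all() returns the match count (or false on error, cast to 0)
            $score += (int) preg_match_all($regex, $fields[$field] ?? '') * $weight;
        }
    }
    return $score;
}

$potency = ['pagetitle' => 10, 'description' => 1];
$a = ['pagetitle' => 'Sofia guide', 'description' => 'Travel tips'];
$b = ['pagetitle' => 'Travel tips', 'description' => 'sofia and more sofia'];

// Case-sensitive: the capitalized title is missed, so B outranks A.
printf("A=%d B=%d\n", scoreResource($a, $potency, ['sofia'], ''), scoreResource($b, $potency, ['sofia'], ''));   // A=0 B=2
// Case-insensitive: the title match counts with potency 10, so A outranks B.
printf("A=%d B=%d\n", scoreResource($a, $potency, ['sofia'], 'i'), scoreResource($b, $potency, ['sofia'], 'i')); // A=10 B=2
```

With a missed title match, even a very high pagetitle potency contributes nothing, which would explain why raising it did not change the order.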

In my particular case the problem was 'solved' by capitalizing the first letter of $term before checking for matches, which takes these three extra lines of code:

foreach ($this->search->searchArray as $term) {
    // Workaround: capitalize the first letter of the search term (UTF-8 aware)
    $first_letter = mb_strtoupper(mb_substr($term, 0, 1, "UTF-8"), "UTF-8");
    $term_end = mb_substr($term, 1, mb_strlen($term, "UTF-8"), "UTF-8");
    $term = $first_letter . $term_end;

    $queryTerm = preg_quote($term,'/');
    $regex = ($searchStyle == 'partial') ? "/{$queryTerm}/i" : "/\b{$queryTerm}\b/i";
    $numberOfMatches = preg_match_all($regex, $resource->{$field}, $matches);
    if (empty($this->searchScores[$resourceId])) $this->searchScores[$resourceId] = 0;
    $this->searchScores[$resourceId] += $numberOfMatches * $potency;
}
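A possibly less invasive alternative would be to leave $term alone and make the match itself case-insensitive for UTF-8 text by adding the `u` pattern modifier next to `i`. As far as I understand, without `u` PCRE compares byte by byte, so `i` reliably folds case only for ASCII letters. This is a minimal sketch under the assumption that the field values and search terms are valid UTF-8 and contain non-ASCII letters; `countMatches` is a hypothetical helper, not a SimpleSearch function.

```php
<?php
// Hypothetical helper, not part of SimpleSearch: counts case-insensitive
// matches of $term in $text, treating pattern and subject as UTF-8.
function countMatches(string $term, string $text, string $searchStyle = 'partial'): int
{
    $queryTerm = preg_quote($term, '/');
    // "i" = case-insensitive, "u" = interpret pattern and subject as UTF-8
    $regex = ($searchStyle === 'partial') ? "/{$queryTerm}/iu" : "/\\b{$queryTerm}\\b/iu";
    $count = preg_match_all($regex, $text);
    // preg_match_all() returns false on error (for example, invalid UTF-8 input)
    return $count === false ? 0 : $count;
}

echo countMatches('école', 'École de musique'), "\n";  // 1
echo countMatches('sofia', 'Sofia guide', 'word'), "\n"; // 1
```

Unlike capitalizing the first letter, this would also match terms that appear in the middle of a title or in all caps.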

I am not a programmer, so this 'solution' may turn out to be ineffective or even wrong. If you have better ideas for how this issue could be avoided, I would greatly appreciate it if you shared your knowledge. Thanks in advance.
