[Feature Request] Overlapping Ratio Threshold support #81

### Overlapping Ratio
Currently, the result of `find_overlap` counts as an overlap as soon as the two spans share a single position.

https://github.com/MantisAI/nervaluate/blob/df0e695645b9d8cd78017552a5a9cc8734f82bf8/src/nervaluate/evaluate.py#L330
https://github.com/MantisAI/nervaluate/blob/df0e695645b9d8cd78017552a5a9cc8734f82bf8/src/nervaluate/utils.py#L85-L104

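However, in most cases an `overlapping ratio threshold` would be more useful: two spans only count as overlapping when the ratio of their intersection to their union exceeds a threshold.
Something like this:
```python3
pred = {'start': 10, 'end': 15}
label = {'start': 12, 'end': 18}
threshold = 0.3  # example value, only for illustration

# calculate union and intersection
union = {'start': 10, 'end': 18}
intersection = {'start': 12, 'end': 15}

# calculate ratio: intersection length / union length
ratio = (15 - 12) / (18 - 10)  # 3 / 8

is_overlap = ratio > threshold
```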

The current `find_overlap` builds two sets and intersects them, which seems inefficient. The overlap can be computed directly from the `start` and `end` values.

Here's my implementation:
```python3
def find_overlap(self, true: dict[str, int | str], pred: dict[str, int | str]) -> bool:
    # intersection of the two spans: [start_max, end_min)
    start_max = max(true['start'], pred['start'])
    end_min = min(true['end'], pred['end'])
    if start_max >= end_min:
        return False  # the spans do not overlap at all
    # union of the two spans: [start_min, end_max)
    start_min = min(true['start'], pred['start'])
    end_max = max(true['end'], pred['end'])
    overlap_ratio = (end_min - start_max) / (end_max - start_min)
    return overlap_ratio > self.overlap_ratio_threshold
```
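To sanity-check the arithmetic, here is a standalone sketch of the same idea as plain functions, run on the example spans above. The names `overlap_ratio` and `exceeds_threshold` and the default threshold of `0.5` are hypothetical, not part of nervaluate, and the spans are assumed to be half-open (`end` excluded).

```python
def overlap_ratio(true: dict, pred: dict) -> float:
    """Intersection length divided by union length of two half-open spans."""
    start_max = max(true["start"], pred["start"])
    end_min = min(true["end"], pred["end"])
    if start_max >= end_min:
        return 0.0  # disjoint or merely touching spans
    start_min = min(true["start"], pred["start"])
    end_max = max(true["end"], pred["end"])
    return (end_min - start_max) / (end_max - start_min)


def exceeds_threshold(true: dict, pred: dict, threshold: float = 0.5) -> bool:
    """Hypothetical helper: True when the ratio is strictly above the threshold."""
    return overlap_ratio(true, pred) > threshold


pred = {"start": 10, "end": 15}
label = {"start": 12, "end": 18}

print(overlap_ratio(label, pred))           # 0.375, i.e. 3 / 8
print(exceeds_threshold(label, pred, 0.3))  # True
print(exceeds_threshold(label, pred, 0.5))  # False
```

With a threshold of `0`, this reduces to "any overlap counts" for half-open spans.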


### Last Character excluded
I wonder why the last token is taken into account, which is very counterintuitive. This comes from https://github.com/MantisAI/nervaluate/pull/32. Maybe @aflueckiger could explain this? Does `end` in your data include the last character?

I think that for most data, `start` and `end` are offsets into the original text string, i.e.
`text[start:end]`, which means the last character is excluded. `text[1:3]` and `text[3:5]` do not overlap.
https://github.com/MantisAI/nervaluate/blob/df0e695645b9d8cd78017552a5a9cc8734f82bf8/src/nervaluate/evaluate.py#L294-L296
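A small sketch of the difference, assuming spans are Python slice offsets. `overlap_inclusive` mimics the `end + 1` range construction in the lines linked above, and `overlap_exclusive` treats `end` as excluded. Both function names are made up for this illustration.

```python
text = "abcdef"
a = {"start": 1, "end": 3}  # text[1:3] == "bc"
b = {"start": 3, "end": 5}  # text[3:5] == "de"


def overlap_inclusive(x: dict, y: dict) -> set:
    # `end` is treated as part of the span, hence the + 1
    return set(range(x["start"], x["end"] + 1)) & set(range(y["start"], y["end"] + 1))


def overlap_exclusive(x: dict, y: dict) -> set:
    # `end` is excluded, matching text[start:end]
    return set(range(x["start"], x["end"])) & set(range(y["start"], y["end"]))


print(text[1:3], text[3:5])     # bc de
print(overlap_inclusive(a, b))  # {3}
print(overlap_exclusive(a, b))  # set()
```

With slice-style offsets the two substrings share no character, yet the inclusive variant still reports an overlap at position 3.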

### Any support for huggingface Evaluate?
Would the maintainers consider following the huggingface Evaluate standard? That would mean inheriting from `evaluate.Metric` and pushing the metric to the huggingface hub. Afterwards, users could directly call `metric = evaluate.load('{hub_url}')`.

Example: https://huggingface.co/spaces/evaluate-metric/glue/blob/main/glue.py

Referenced snippet (`src/nervaluate/utils.py`, lines 85-104):

	def find_overlap(true_range: range, pred_range: range) -> set:
	    """Find the overlap between two ranges

	    Find the overlap between two ranges. Return the overlapping values if
	    present, else return an empty set().

	    Examples:

	    >>> find_overlap((1, 2), (2, 3))
	    2
	    >>> find_overlap((1, 2), (3, 4))
	    set()
	    """

	    true_set = set(true_range)
	    pred_set = set(pred_range)

	    overlaps = true_set.intersection(pred_set)

	    return overlaps


Referenced snippet (`src/nervaluate/evaluate.py`, lines 294-296):

	# overlapping needs to take into account last token as well
	pred_range = range(pred["start"], pred["end"] + 1)
	true_range = range(true["start"], true["end"] + 1)
