Hi, I use this project together with sqlite_fts4 to plug a custom tokenizer and ranking function into a dictionary of SQLite engines.
The main interaction between SQLite and Python goes through
the register_functions defined in sqlite_fts4 and your register_tokenizer, which plugs in the code as your example shows.
I tested my Chinese tokenizer locally with a single engine as follows:
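```python
import sqlite3

import jieba
import sqlitefts as fts  # assumed import alias; the original snippet omitted it


class JiebaTokenizer(fts.Tokenizer):
    def tokenize(self, text):
        # jieba reports character offsets; convert them to UTF-8 byte offsets.
        for t, s, e in jieba.tokenize(text):
            l = len(t.encode("utf-8"))
            p = len(text[:s].encode("utf-8"))
            yield t, p, p + l


contents = [("これは日本語で書かれています",), (" これは 日本語の文章を 全文検索するテストです",),
            ("新兴铸管",)]
conn = sqlite3.connect(":memory:")  # assumed; connection setup was omitted from the snippet
tkj = fts.make_tokenizer_module(JiebaTokenizer())
fts.register_tokenizer(conn, "jieba_tokenizer", tkj)  # assumed; registration was omitted from the snippet
conn.execute("CREATE VIRTUAL TABLE fts USING FTS4(tokenize={})".format("jieba_tokenizer"))
c = conn
r = c.executemany("INSERT INTO fts VALUES(?)", contents)
r = c.execute("SELECT * FROM fts").fetchall()
r = c.execute("SELECT * FROM fts WHERE fts MATCH '新兴'").fetchall()
```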
The last query returns the expected row, so the tokenizer works with a single engine.
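The tokenizer depends on converting the character offsets that jieba reports into UTF-8 byte offsets. A standalone sketch of just that conversion, using only the standard library (the helper name `char_span_to_byte_span` is hypothetical and not part of jieba or this project):

```python
def char_span_to_byte_span(text, start, end):
    """Convert a [start, end) character span into a UTF-8 byte span."""
    # Bytes occupied by everything before the token.
    byte_start = len(text[:start].encode("utf-8"))
    # Bytes occupied by the token itself.
    byte_end = byte_start + len(text[start:end].encode("utf-8"))
    return byte_start, byte_end


# Each of these CJK characters takes 3 bytes in UTF-8.
span = char_span_to_byte_span("新兴铸管", 0, 2)
```

This mirrors the `p` and `p + l` values yielded by the tokenizer, so it can be used to check the offsets without SQLite involved.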
My problem appears when I use it with a dictionary of engines (name as key, engine as value)
and some more complex registration logic.
Running under gdb then gives the following error:
Program received signal SIGSEGV, Segmentation fault.
0x0000555555690253 in delete_garbage.isra.26 (
old=0x5555558c7540 <_PyRuntime+416>, collectable=0x7fffffffda30)
at /tmp/build/80754af9/python_1599203911753/work/Modules/gcmodule.c:948
948 /tmp/build/80754af9/python_1599203911753/work/Modules/gcmodule.c: No such file or directory.
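The gdb output only shows the C frame inside the garbage collector. As a possible next step, the standard-library `faulthandler` module could record the Python-level traceback for the same crash. A minimal sketch, assuming a hypothetical log path `crash.log` and a hypothetical helper name `enable_crash_log`:

```python
import faulthandler


def enable_crash_log(path="crash.log"):
    """Write Python tracebacks to `path` on SIGSEGV and other fatal signals."""
    # The file must stay open for as long as faulthandler is enabled,
    # so the handle is returned to the caller to keep it alive.
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_log()` once at startup, before the engines are created, and keeping the returned handle in a module-level variable should be enough for the traceback to be written when the segmentation fault occurs.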
The crash seems to be caused by cffi.
Related questions:
https://stackoverflow.com/questions/43079945/why-is-there-a-segmentation-fault-with-this-code
https://stackoverflow.com/questions/41577144/how-to-solve-a-sqlite-fts-segmentation-fault-in-python
Some people say that cffi has problems with nested objects and that replacing cffi with pybind11
solves this kind of problem. Could you give me some suggestions?
If needed, I can upload the whole code so that the error can be reproduced.
Thank you.