Random Segmentation fault with embedding.search after upserting #498

Vincent-liuwingsang · 2023-07-08T16:04:29Z

I'm getting segmentation faults "randomly" when using txtai with FastAPI. They usually happen after the index is updated.

3 example errors are below, from test runs with this setup:

sqlite as the db
10 embedding.search calls per second via GET (with RLock A)
embedding.upsert + embedding.persist for ~2k docs, taking roughly 5 seconds to index (with RLock A)
the seg fault usually occurs within a couple dozen embedding.search calls after the embedding update is done (e.g. 3-4 seconds after)
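
For readers following along, the locking arrangement described above can be sketched roughly as follows. This is only an illustration: `StubIndex`, `guarded_search` and `guarded_upsert` are hypothetical names, and the stub stands in for the real embeddings instance so the snippet runs without txtai installed.

```python
import threading


class StubIndex:
    """Hypothetical stand-in for the embeddings instance (not the txtai API)."""

    def __init__(self):
        self.docs = {}

    def upsert(self, documents):
        # Insert or overwrite documents keyed by id
        for uid, text in documents:
            self.docs[uid] = text

    def search(self, query, limit=3):
        # Naive substring match, only to give the stub some behaviour
        return [uid for uid, text in self.docs.items() if query in text][:limit]


index = StubIndex()
lock = threading.RLock()  # "RLock A": shared by readers and the writer


def guarded_search(query):
    # Every read takes the same lock as the writer
    with lock:
        return index.search(query)


def guarded_upsert(documents):
    # The index update holds the lock for its whole duration
    with lock:
        index.upsert(documents)
```

With a single shared lock, searches and the update never overlap in Python code, which is why a crash several seconds after the update finishes is surprising.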

I'm a little stuck, as memory usage doesn't seem too bad at 800 MB (it's an existing index + db; about 400 MB without loading the index). Any idea how I can debug this?
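
Per-thread dumps in the "most recent call first" format, like the ones below, are what Python's standard `faulthandler` module writes when the process receives a fatal signal. A minimal sketch of enabling it with output sent to a file, assuming a hypothetical helper name `enable_crash_traces` and a log path of your choosing:

```python
import faulthandler


def enable_crash_traces(log_path):
    """Write the Python stack of every thread to log_path on a fatal signal."""
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread, not only the one that crashed
    faulthandler.enable(file=log_file, all_threads=True)
    # The caller must keep this handle open for as long as the handler is enabled
    return log_file
```

Running the interpreter with `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1` enables the same handler with output on stderr. These dumps only show Python frames, so a native crash report may still be needed to see which library actually faulted.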


Thread 0x00000002996b3000 (most recent call first):
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 114 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 449 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 549 in feed_forward_chunk
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/pytorch_utils.py", line 236 in apply_chunking_to_forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 537 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 610 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 1020 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 108 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 128 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 75 in encode
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/vectors/transformers.py", line 48 in encode
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/vectors/base.py", line 136 in batchtransform
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 293 in batchtransform
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 71 in search
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 107 in dbsearch
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 53 in __call__
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 345 in batchsearch
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 329 in search


Thread 0x00000002a26b3000 (most recent call first):
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/activations.py", line 79 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 450 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 549 in feed_forward_chunk
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/pytorch_utils.py", line 236 in apply_chunking_to_forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 537 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 610 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 1020 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 108 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 128 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 75 in encode
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/vectors/transformers.py", line 48 in encode
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/vectors/base.py", line 136 in batchtransform
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 293 in batchtransform
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 71 in search
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 107 in dbsearch
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 53 in __call__
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 345 in batchsearch
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 329 in search
  
  
Thread 0x000000029fbb3000 (most recent call first):
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 114 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 462 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 550 in feed_forward_chunk
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/pytorch_utils.py", line 236 in apply_chunking_to_forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 537 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 610 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 1020 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 108 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 128 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/models/pooling.py", line 75 in encode
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/vectors/transformers.py", line 48 in encode
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/vectors/base.py", line 136 in batchtransform
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 293 in batchtransform
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 71 in search
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 107 in dbsearch
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/search.py", line 53 in __call__
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 345 in batchsearch
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/txtai/embeddings/base.py", line 329 in search
  
Thread 0x0000000297f3f000 (most recent call first):
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/functional.py", line 2515 in layer_norm
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/normalization.py", line 190 in forward
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501 in _call_impl
  File ".../.pyenv/versions/3.11.1/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 465 in forward

Vincent-liuwingsang · 2023-07-09T19:10:20Z

Tested with a fresh conda environment, an empty embedding instance and the same conditions (e.g. 10 reads/s, then an RLock-guarded upsert + persist, while the 10 reads/s continue).

I thought some of my installation could be dodgy, but it seems a bit random that the seg fault happens in tqdm 🤔

Thread 0x0000000295a13000 (most recent call first):
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/tqdm/std.py", line 434 in format_meter
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/tqdm/std.py", line 1148 in __str__
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/tqdm/std.py", line 1492 in display
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/tqdm/std.py", line 1344 in refresh
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/tqdm/std.py", line 1095 in __init__
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/tqdm/std.py", line 1521 in trange
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 159 in encode
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/vectors/transformers.py", line 48 in encode
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/vectors/base.py", line 136 in batchtransform
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/base.py", line 296 in batchtransform
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/search.py", line 71 in search
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/search.py", line 108 in dbsearch
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/search.py", line 53 in __call__
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/base.py", line 349 in batchsearch
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/base.py", line 332 in search
  
  
   File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 387 in set_truncation_and_padding
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 428 in _batch_encode_plus
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2825 in batch_encode_plus
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2634 in _call_one
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2548 in __call__
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 113 in tokenize
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 319 in tokenize
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 161 in encode
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/vectors/transformers.py", line 48 in encode
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/vectors/base.py", line 136 in batchtransform
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/base.py", line 296 in batchtransform
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/search.py", line 71 in search
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/search.py", line 108 in dbsearch
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/search.py", line 53 in __call__
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/base.py", line 349 in batchsearch
  File "/Users/wingsangvincentliu/miniconda3/lib/python3.10/site-packages/txtai/embeddings/base.py", line 332 in search

davidmezzetti · 2023-07-09T19:44:01Z

This is a tricky one; unfortunately, I don't have a good answer for you. There is a known issue between Faiss and PyTorch on macOS regarding OpenMP conflicts.

More information on this issue can be found in the following links. Perhaps one of those can give you a hint on things to try.

Some ideas from those threads:

Disable OpenMP threading: export OMP_NUM_THREADS=1
Try to build libomp from source
Downgrade PyTorch to <= 1.12
Switch the index backend from faiss to hnsw
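
Two of these ideas can also be applied from Python rather than the shell. The sketch below is an assumption-laden illustration: the model path is a placeholder, and the `backend` key reflects my understanding of the txtai configuration (the hnsw backend needs the optional hnswlib dependency), so check it against the txtai docs for your version.

```python
import os

# Must be set before torch or faiss is imported; OpenMP reads it when the
# runtime initialises, so changing it later may have no effect.
os.environ["OMP_NUM_THREADS"] = "1"

# Hypothetical embeddings configuration that swaps the ANN backend.
config = {
    "path": "sentence-transformers/all-MiniLM-L6-v2",  # placeholder model
    "content": True,    # keep document content in the database (sqlite)
    "backend": "hnsw",  # instead of the default faiss backend
}

# With txtai installed, the config would be used roughly like this:
# from txtai.embeddings import Embeddings
# embeddings = Embeddings(config)
```

An index built with one backend would need to be rebuilt after switching to another.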

I also recommend upvoting some of those other issues to attempt to get them more visibility. This issue appears to be the root cause.

davidmezzetti · 2023-07-10T17:03:27Z

I was able to get the macOS build scripts to work with the latest version of PyTorch and with the OMP_NUM_THREADS=1 variable set. This is probably the best bet.

Vincent-liuwingsang · 2023-07-12T14:36:05Z

Thanks, will give this a go tonight. Which version of PyTorch are you building with?

Vincent-liuwingsang · 2023-07-12T22:04:00Z

@davidmezzetti Thanks again for the help and advice. A bit of further digging suggests the random seg fault I'm experiencing is not related to this library or a Mac-specific setup at all.

The crash logs consistently suggest that pyobjc (as a subdependency) is the culprit. Isolating the pyobjc part seems to have resolved the issue. @davidmezzetti Cheers for looking into this!

Thread 28 Crashed:
0   libobjc.A.dylib               	       0x193a545f8 objc_release + 16
1   libobjc.A.dylib               	       0x193a5c0b4 AutoreleasePoolPage::releaseUntil(objc_object**) + 196
2   libobjc.A.dylib               	       0x193a58b7c objc_autoreleasePoolPop + 256
3   libobjc.A.dylib               	       0x193a80520 objc_tls_direct_base<AutoreleasePoolPage*, (tls_key)3, AutoreleasePoolPage::HotPageDealloc>::dtor_(void*) + 168
4   libsystem_pthread.dylib       	       0x193ded970 _pthread_tsd_cleanup + 620
5   libsystem_pthread.dylib       	       0x193df069c _pthread_exit + 84
6   libsystem_pthread.dylib       	       0x193deffb4 _pthread_start + 160
7   libsystem_pthread.dylib       	       0x193deada0 thread_start + 8

Vincent-liuwingsang · 2023-07-12T23:53:56Z

The root cause of this seemed to be that .autorelease() on objc objects doesn't work well in multithreaded environments.

The fix was to use objc.autorelease_pool() to wrap resources that will be allocated and de-allocated.
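
A rough sketch of that pattern, for anyone hitting the same crash. The worker body here is a hypothetical placeholder for whatever code creates Objective-C objects, and the fallback to `contextlib.nullcontext` exists only so the snippet also runs where PyObjC is not installed:

```python
import contextlib
import threading

try:
    import objc  # PyObjC, available on macOS

    autorelease_pool = objc.autorelease_pool
except ImportError:
    # No PyObjC: substitute a no-op context manager so the sketch still runs
    autorelease_pool = contextlib.nullcontext


def worker(results, i):
    # Give each thread its own pool scoped to the work, so autoreleased
    # objects are drained here instead of during thread teardown.
    with autorelease_pool():
        results[i] = i * 2  # placeholder for work that allocates objc objects


def run_workers(count):
    results = [None] * count
    threads = [
        threading.Thread(target=worker, args=(results, i)) for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The crash report above faults in objc_autoreleasePoolPop during _pthread_exit, which is consistent with objects being released at thread exit rather than inside an explicit pool.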

davidmezzetti · 2023-07-13T14:11:51Z

Glad you were able to resolve it.

davidmezzetti mentioned this issue Jul 10, 2023

OpenMP issues with torch 1.13+ on macOS #377

Closed

Vincent-liuwingsang closed this as completed Jul 12, 2023
