How to get all documents? #144

sfeast · 2022-03-30T20:15:56Z

Is there a way to get all documents returned as results?

For example:

miniSearch.search("")

returns an empty array, but I'm looking for a way to get the opposite, all documents.

A use case I have is that I want to only filter by a numeric range in some cases. Something like this:

// get all documents with val property >= minVal
miniSearch.search('', {
    filter: (result) => {
        return result.val >= minVal
    }
})

but that currently returns nothing since no results are given to the filter.

I know it's not the best use of this library as mentioned here - #119 (comment) however it's just one of several scenarios I'm using it for & would be great to be able to leverage it as well for this.

& awesome library btw 🙌 🙏


lucaong · 2022-03-31T13:50:36Z

Hi @sfeast ,
at the moment there is no built-in way to return all documents. I am evaluating a possibility, so it might be provided as a feature in the near future, but unfortunately not yet.

In general, it's often easier to filter outside of MiniSearch if you do not need to perform a full-text search. Something like:

documents.filter((doc) => doc.val >= minVal)

That said, if you really want to do that within MiniSearch, one way to do that is by creating a dummy field that always has a certain value:

const miniSearch = new MiniSearch({
  fields: ['dummy', 'val', /* ...your other fields here */],
  storeFields: ['val'],
  extractField: (document, fieldName) => {
    // Create a dummy field that always has the value "xxx"
    if (fieldName === 'dummy') { return 'xxx' }
    return document[fieldName]
  }
})

// Searching for "xxx" in field "dummy" should return all documents
miniSearch.search('xxx', {
  fields: ['dummy'],
  filter: (result) => result.val >= minVal
})

If, instead of xxx, you use a value that is guaranteed not to be in any other field (say, a specific random alphanumeric string), you could even avoid restricting the search to the dummy field.
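
A sketch of that variant follows. The token value and the names `MATCH_ALL_FIELD`, `MATCH_ALL_TOKEN`, and `sample` are illustrative assumptions; the token is kept lowercase and alphanumeric on the assumption that tokenization and term processing leave such a string intact as a single term.

```javascript
// Illustrative names and token: pick any string that cannot occur in real content.
const MATCH_ALL_FIELD = 'dummy'
const MATCH_ALL_TOKEN = 'zq7matchallx93kd'

// Same shape as the extractField option above: every document reports the
// sentinel token for the dummy field, and its real value for any other field.
const extractField = (document, fieldName) =>
  fieldName === MATCH_ALL_FIELD ? MATCH_ALL_TOKEN : document[fieldName]

// Quick check on a sample document (hypothetical data)
const sample = { id: 1, title: 'Moby Dick', val: 42 }
const dummyValue = extractField(sample, MATCH_ALL_FIELD)
const titleValue = extractField(sample, 'title')
```

With this `extractField` passed to the constructor and `MATCH_ALL_FIELD` listed in `fields`, calling `miniSearch.search(MATCH_ALL_TOKEN, { filter })` without a `fields` restriction should match every document, provided the token never appears in real content.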

Now, this is admittedly a little hacky, but it should get it done.

sfeast · 2022-03-31T18:33:47Z

Thanks @lucaong - both current options are workable for me & I appreciate the detailed example 🙇

One last question - with this approach:

documents.filter((doc) => doc.val >= minVal)

documents here would be my own copy of the documents right? ie there's no way to get that from MiniSearch directly? Just trying to avoid keeping my own copy if possible.

lucaong · 2022-03-31T22:56:49Z

Happy to help :)

Yes, that’s correct, in the first option that would be your own copy of the documents.

samuelstroschein · 2022-04-30T19:47:26Z

I am looking for an alternative to fuse.js that optionally returns the original list given an empty search query "". The need for such a feature seems big, see krisk/Fuse#229 (PS your chance ;) )

lucaong · 2022-05-01T08:06:39Z

Hi @samuelstroschein ,
thanks for your comment.
The way I would implement a solution for such a need is:

const documents = [/* your documents…*/]

const docsById = documents.reduce((byId, doc) => {
  byId[doc.id] = doc
  return byId
}, {})

const miniSearch = new MiniSearch({
  fields: [/* your fields… */]
})

const search = (query, options = {}) => {
  if (query == null) { return [] }
  if (query.trim().length === 0) {
    return documents
  } else {
    return miniSearch.search(query, options).map((result) => docsById[result.id])
  }
}

// Usage
search('')
// => …all documents 

search('something')
// => documents matching 'something'

As you can see, I had to make some choices that depend on the use case, such as:

A search query containing only spaces is considered empty
A null search query is not the same as an empty search query
The search function returns the matching documents in order of relevance, as opposed to search results (with match info, etc.)
When passed an empty query, the search function returns the original documents in their original order

Each of these decisions could vary depending on the specific needs of a project. Therefore, also considering that it is simple to define such a wrapper function, I think it is better that MiniSearch does not offer “natively” the option to return all results on empty search, and leaves the choice of implementation details to the developers.
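
The four choices listed above can also be made explicit as parameters of a small factory. The following is only a sketch: `createSearch`, `isEmptyQuery`, and `onEmptyQuery` are hypothetical names, not part of MiniSearch, and `engine` is assumed to be any object with a `search(query, options)` method returning results that carry an `id`. A tiny stand-in engine is used so the snippet runs on its own.

```javascript
// Hypothetical helper, not part of MiniSearch: wraps any engine exposing
// search(query, options) and makes the empty-query policy explicit.
function createSearch({
  engine,
  documents,
  // Choice 1: what counts as an "empty" query (default: only whitespace)
  isEmptyQuery = (query) => query.trim().length === 0,
  // Choice 4: what to return for an empty query (default: a copy in original order)
  onEmptyQuery = (docs) => docs.slice()
}) {
  // Index documents by id once, so results can be mapped back cheaply
  const docsById = new Map(documents.map((doc) => [doc.id, doc]))

  return (query, options = {}) => {
    // Choice 2: a null or undefined query is not an empty query
    if (query == null) { return [] }
    if (isEmptyQuery(query)) { return onEmptyQuery(documents) }
    // Choice 3: return documents in relevance order, not search results
    return engine.search(query, options).map((result) => docsById.get(result.id))
  }
}

// Stand-in engine for illustration only: naive substring match on title.
const documents = [
  { id: 1, title: 'Moby Dick' },
  { id: 2, title: 'Zen and the Art of Archery' },
  { id: 3, title: 'Neuromancer' }
]
const fakeEngine = {
  search: (query) => documents
    .filter((doc) => doc.title.toLowerCase().includes(query.toLowerCase()))
    .map((doc) => ({ id: doc.id }))
}

const search = createSearch({ engine: fakeEngine, documents })
const allDocs = search('   ')   // whitespace only: treated as empty
const zenDocs = search('zen')   // delegated to the engine
const nullDocs = search(null)   // null: no results
```

In a real application the `engine` argument would be the MiniSearch instance, and a project that wants, say, boosted ordering on empty queries could pass its own `onEmptyQuery`.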

samuelstroschein · 2022-05-01T14:06:28Z

@lucaong

I had to make some choices that depend on the use case

Implement the option with a callback instead of a boolean flag.

const search = new MiniSearch({
  fields: [/* your fields… */],
  returnOriginalDocumentsWhen: (searchQuery) => searchQuery === ""
})

I think it is better that MiniSearch does not offer “natively” the option to return all results on empty search, and leaves the choice of implementation details to the developers.

Hmm, I mean that's why people install an npm package in the first place. I don't want to write code. The discussion in the other package is quite big, indicating that a lot of people want that feature.

Edit

Actually what it really needs is just a filter instead of search function.

minisearch.filter('search query')

lucaong · 2022-05-02T08:30:10Z

Implement the option with a callback instead of a boolean flag.

It's unfortunately more complicated than that. For example, how should documents be sorted? It would seem reasonable to return them in the original order, but what if one defines a boostDocument function? Then it makes more sense to compute the boost for each document and re-sort them. But since the original list is static, a smart developer would prefer to pre-sort the list only once, and skip the search-time boosting calculation when returning all documents.
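
The pre-sorting idea can be sketched in a few lines. This assumes a static document list and a hypothetical `boost` function that depends only on the document itself; `presortByBoost` is an illustrative name, not a library function.

```javascript
// Hypothetical helper: sort a static list by a per-document boost once,
// so that "return all documents" does no scoring work at query time.
function presortByBoost(documents, boost) {
  return documents
    .map((doc, index) => ({ doc, index, weight: boost(doc) }))
    // Higher boost first; ties keep the original order
    .sort((a, b) => (b.weight - a.weight) || (a.index - b.index))
    .map((entry) => entry.doc)
}

const documents = [
  { id: 1, title: 'Old post', year: 2015 },
  { id: 2, title: 'Recent post', year: 2022 },
  { id: 3, title: 'Another old post', year: 2015 }
]

// Assumed boost: favor documents from 2020 onward
const boost = (doc) => (doc.year >= 2020 ? 2 : 1)

// Computed once, reused for every empty query
const allDocumentsSorted = presortByBoost(documents, boost)
const sortedIds = allDocumentsSorted.map((doc) => doc.id)
```

The original array is left untouched, so the application can still fall back to the original order when no boost applies.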

Similarly, since MiniSearch returns an array of SearchResult, not documents, when returning all results it would have to first map each document into a search result. But depending on the use case, developers might map results back to documents (like I did in my example before). In that case, it's a lot more efficient to avoid mapping to SearchResult[] in the first place (especially as it maps the whole collection, potentially tens of thousands of documents).
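
To make that cost concrete, here is a rough sketch of the round trip a built-in "return everything" would imply. The result shape (`id`, `score`, `terms`, `match`) is an assumption about what a search result carries and may not match the library's exact type; `toSearchResult` is a hypothetical name.

```javascript
// Hypothetical: wrap a document in a result-like object with no match info.
const toSearchResult = (doc) => ({ id: doc.id, score: 0, terms: [], match: {} })

const documents = [
  { id: 'a', title: 'First' },
  { id: 'b', title: 'Second' }
]
const docsById = new Map(documents.map((doc) => [doc.id, doc]))

// Round trip: allocate one result per document, then look each document up again
const viaResults = documents
  .map(toSearchResult)
  .map((result) => docsById.get(result.id))

// Direct path: the application already holds the documents
const direct = documents

// Both paths end with the same documents; the first one did extra work per document
const sameDocuments = viaResults.every((doc, i) => doc === direct[i])
```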

Moreover, at the moment MiniSearch does not keep a reference to the original collection of documents, so it cannot return it. This is by choice: it is possible to make some documents searchable without storing the document itself in memory.

Of course, it is theoretically possible to implement options for each of these choices, but that would make the API surface huge and hard to learn. Instead, these details are better defined in code. The reason code is better than configuration in this case is that configuration has to be learned for each and every library, while code is general purpose: for a configuration option to be ergonomic, it has to save the developer a non-trivial amount of code or cognitive load. If it generates more open questions, it is not worth it, because learning all the implications takes more effort than taking control of the issue with code.

Hmm, I mean that's why people install an npm package in the first place. I don't want to write code.

I would say, one does not want to write code at the wrong level of abstraction. What I mean is: even when using a library, one does have to write code. The point is that one normally prefers to avoid writing code that pertains to the internal details of the problem solved by the library, and instead focus on code pertaining to the higher level goal of the application.

Therefore, a library has to choose its own boundaries and goals. MiniSearch, as its design document outlines, "enables developers to build [turn-key opinionated solutions] on top of its core API, but does not provide them out of the box." MiniSearch takes care, for example, of the implementation details of the inverted index or of document scoring, but it leaves to developers the responsibility of writing the code that defines their specific full-text search problem.

It would be absolutely appropriate to build a library on top of MiniSearch that makes some of these decisions and builds a higher level of abstraction. That would save developers from writing some code, but also restrict their options. For developers that have those specific needs, such a library would facilitate things. MiniSearch itself, though, also has to enable developers that have different needs. In other words, your request is completely legitimate, it just lies outside of MiniSearch's self-assigned boundaries of abstraction.

The discussion in the other package is quite big, indicating that a lot of people want that feature.

I understand and respect the fact that many people have this need. As a matter of fact, even some of my own apps have the same need. But apart from using MiniSearch in my production applications, I do not profit from MiniSearch: my motivation in maintaining it stems from the satisfaction of what I consider a well crafted piece of software. I am happy if more people use it, because it means that it is solving more problems than it was originally conceived for, but I would not sacrifice the solidity of its design for popularity. By open-sourcing my library, I get to keep the satisfaction of crafting software the way I consider best, without having to sacrifice it to chase more users. Users, in turn, get the freedom to use my library, and to create applications or other libraries on top of it.

In sum, I do agree with you that yours is a common need. My opinion, though, is that such a need is better served by writing a thin layer of code, like the example I provided, than by adding more configuration options. But it is perfectly reasonable to disagree with that, and such a thin layer can be packaged in a library for convenience.

samuelstroschein · 2022-05-02T14:30:24Z

@lucaong

Thank you for the in-depth reply and explanation. I overlooked the stated goal of "[...] enables developers to build [turn-key opinionated solutions] on top of its core API, but does not provide them out of the box" and was looking for (expecting) a drop-in replacement for fuse.js.

On a side note, I have a question regarding your i18n workflow at megaloop. Can you send me a DM on Twitter or via email?


sfeast changed the title ~~How to return all documents~~ How to get all documents Mar 30, 2022

sfeast changed the title ~~How to get all documents~~ How to get all documents? Mar 30, 2022

lucaong closed this as completed Apr 1, 2022

gwynm added a commit to gwynm/obsidian-omnisearch that referenced this issue Dec 30, 2022

When search is empty, show all notes

0cf63f9

lucaong/minisearch#144

rszyma mentioned this issue May 30, 2023

website: improve cheat-sheet page performance ryanoasis/nerd-fonts#1252

Merged

2 tasks
