Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of code duplication between Autocomplete and Search #331

Closed
duker33 opened this issue May 16, 2019 · 14 comments · Fixed by #332
Closed

Get rid of code duplication between Autocomplete and Search #331

duker33 opened this issue May 16, 2019 · 14 comments · Fixed by #332
Assignees
Labels
2 hours some big and monolith issues. For example with hard decisions or discussions 2 performer can implement issue at his closest convenient time cleanup everything that do project stronger, flexible, reusable hard issue requires strong architecture skills, management skills, project domain area, etc Search Search and Autocomplete feature

Comments

@duker33
Copy link
Collaborator

duker33 commented May 16, 2019

Now configuring Search and Autocomplete search_entities list leads to different results.
But they should be identical

@duker33 duker33 added 2 hours some big and monolith issues. For example with hard decisions or discussions 1 burning issue hard issue requires strong architecture skills, management skills, project domain area, etc cleanup everything that do project stronger, flexible, reusable Search Search and Autocomplete feature labels May 16, 2019
@duker33 duker33 self-assigned this May 16, 2019
duker33 added a commit that referenced this issue May 16, 2019
@ArtemijRodionov
Copy link
Contributor

ArtemijRodionov commented May 16, 2019

@duker33 Title of the issue looks like you going to refactor search.views, but after I made a review it seemed to me that you are going to refactor search.search. We already have an issue for it.

What exactly are you going to do in this issue?

duker33 added a commit that referenced this issue May 17, 2019
@duker33
Copy link
Collaborator Author

duker33 commented May 17, 2019

@artemiy312 , wow, i didn't notice this issue, sorry.
It will affect search.views too, of course. Since search.views now is strong coupled to search.search one.

You hold #289 throw 05 Mar. So, let me take it plz.
If you agree, we'll mark #289 as duplicate of current issue and move along

@duker33 duker33 added the blocked issue is blocked by another issue label May 17, 2019
@ArtemijRodionov
Copy link
Contributor

@duker33 Let's specify an issue we are going to solve here.

  1. We have a mess in search.views, that should be generalized. I think we both agree on that.
  2. search.search looks ok on the background. It just need little bit refactored and cleaned up. It lift up lot of work that should be separated
  3. We have shared logic between frontend and backend:
  • Backend performs a search query and returns a response with json
  • Frontend somehow tries to parse the json with heterogeneous data and renders it

All backend logic is generalized in refarm-site, but frontend is uniquely implemented in each client site.
As i remember we have a convention to keep frontend thin

duker33 added a commit that referenced this issue May 17, 2019
@duker33
Copy link
Collaborator Author

duker33 commented May 17, 2019

@artemiy312 , i agree with most of the points, but i think that search.views.AutocompleteView is not perfect.
I've just tried to add some new search_entities to Autocomplete and it didn't work. I looked AutocompleteView, broke my brain with it's prepare_fields method.
So, i decided to redesign this strange procedural data adapting to some modular and obvious.

Bad AutocompleteView logic roots from search.search of course. That's why i started from it.
Good search.search will hide all data adapting inside. So AutocompleteView will go out in favor of simple objects composition directly in the client code.
I provided example at the related PR:

# site's side
def autocomplete(request, query):
    # we'll use django orm search based on postgresql.
    # Query sets will already contain searched entities.
    # https://docs.djangoproject.com/en/1.11/topics/db/search/
    return JsonResponse(list(
        ResultsStack([
            Results(
                CategoryQuerySet().filter(name__search=query),
            ),
            Results(
                ProductQuerySet().filter(name__search=query),
            ),
        ]).chain()
    ))

@ArtemijRodionov
Copy link
Contributor

@duker33 >Good search.search will hide all data adapting inside

It doesn't mean that we should fully rewrite it, I believe we should create a new abstraction, that will adapt data to needs.

I worry that we will lose some features and get more bugs, but if you sure let's make like this

@duker33 duker33 added 2 performer can implement issue at his closest convenient time and removed 1 burning issue labels May 19, 2019
@duker33
Copy link
Collaborator Author

duker33 commented May 19, 2019

@artemiy312 , search.search.search is the only data seaching behaviour. The rest of search.search code are strange conditions when we should use one or another search algo. We have no tests, no UI analytics about search efficiency. So, we can throw away current search realization in favor of much more ready and clear django qs based search.

Also we have enough resources to implement good arch. How we can move forward our arch discussion? That's my proposition: let's merge related PR with my arch version. And you'll propose another PR fixing my arch prototype with expreaded code comments.

Let's move?

@duker33 duker33 added discuss issue needs to finish discussion before start working and removed blocked issue is blocked by another issue labels May 19, 2019
@ArtemijRodionov
Copy link
Contributor

@duker33 >So, we can throw away current search realization in favor of much more ready and clear django qs based search.

Django qs search uses full-text search, in our case it is postgresql full-text search.
We decided to not use it because it doesn't provide fuzzy full-text search, so it can't handle any typos in a query. Instead we are using the trigram extension.

Trigram works only with textual data, so we have to determine the type of searched fields to provide ability to search by non-textual fields too. For those non-textual fields we are using our custom search function. All it does the current implementation, so we can't simply throw away it

We should document this

@duker33
Copy link
Collaborator Author

duker33 commented May 20, 2019

@artemiy312 , trigram is implemented at postgres side too.
We can choose algo like this. It can solve the problem i think:

def autocomplete(request, query):
    # we'll use django orm search based on postgresql.
    # Query sets will already contain searched entities.
    # https://docs.djangoproject.com/en/1.11/topics/db/search/
    return JsonResponse(list(
        ResultsStack([
            Results(
                CategoryQuerySet().filter(name__search=query),
                CategoryQuerySet().filter(mark__trigram=query),
                CategoryQuerySet().filter(h1__search=query),
            ),
            Results(
                ProductQuerySet().filter(name__search=query),
                ProductQuerySet().filter(mark__trigram=query),
                ProductQuerySet().filter(h1__search=query),
            ),
        ]).chain()
    ))

I'll repeat myself: we have no data if current search algo is working or not. Maybe simple name__search will make more good UX then current algo.
So, we can throw it away in favor of a new flexible system with min start features

@ArtemijRodionov
Copy link
Contributor

ArtemijRodionov commented May 20, 2019

@duker33 >We can choose algo like this. It can solve the problem i think:

We already have implemented this algo in search.search, but it makes less queries to db. Your algo will make a query per a field, but current implementation makes a query per a model. Soon or later we have to implement the same what we already have.
I want to emphasize that current implementation already use django qs search and postgresql trigram extension, but in more optimized way.

I believe it will be enough to rewrite search.views and refactor search.search

@duker33
Copy link
Collaborator Author

duker33 commented May 20, 2019

@artemiy312 , let me fix example =)

Soon or later we have to implement the same what we already have.
Of course we have. That's my aim: do the same in more clear way.

def autocomplete(request, query):
    # we'll use django orm search based on postgresql.
    # Query sets will already contain searched entities.
    # https://docs.djangoproject.com/en/1.11/topics/db/search/
    return JsonResponse(list(
        ResultsStack([
            Results(
                CategoryQuerySet().filter(
					SearchQuery(name__search=query)
					| SearchQuery(mark__trigram=query)
					| SearchQuery(h1__search=query)
				),
            ),
            Results(
                ProductQuerySet().filter(
					SearchQuery(name__search=query)
					| SearchQuery(mark__trigram=query)
					| SearchQuery(h1__search=query)
				),
            ),
        ]).chain()
    ))

We also can use sorting by SearchRank and so on.

Generally speaking our task consists of two parts:

1. Search engine
It should be implemented with django querysets. It's django's recommended way and it's have good arch i beleave.
Now we have owful procedural code like this:

if not self.trigram_fields:
	return search(term, self.qs, self.decimal_fields)
elif not self.decimal_fields:
	return _trigram_search(term)
else:
	trigram_data = _trigram_search(term)
	default_data = search(term, self.qs, self.decimal_fields)
	return trigram_data if trigram_data else default_data if default_data else None

We can relatively easily rewrite it in one queryset. If you don't believe it, just let me continue working with code.
I'll propose some real code, you'll review it and we'll discuss.

2. Results list
It has geterogeniouos nature. Every element of the list can have represent different entity.
We should provide clear mech to pack results list for rendering.

I proposed a way to do it. It's definetily not the most perfect. But it's much more good then simple geterogeniouos list merging with no interface and contracts.
Results list template simply don't know (and no one know) what fields are exists and what are not

@ArtemijRodionov
Copy link
Contributor

@duker33

do the same in more clear way.

You provided the code that uses full-text search. As I said before, full-text search doesn't support fuzzy search, it doesn't handle any typos in a search query. Also postgres can't combine the full-text search with trigram

We should provide clear mech to pack results list for rendering.

I have concern about the heterogeneous list too, but if we will make it homogeneous we have to produce such template:

{% for r in results %}
  {% if r.name %}
    {{ r.name }} 
  {% endif %}
  {% if r.price %}
    {{ r.price }} 
  {% endif %}
  {% if r.specification %}
    {{ r.specification }} 
  {% endif %}
  and so on
{% endfor %}

It looks unclear to me. Things will get worse, when we add markup.
Also we have to extends and manages Result and Results classes.

I'd prefer to use dict for this purpose:

{% if results.products %}
  {% for p in results.products %}
    {{ p.name }} {{ p.price }} {{ p.category.name }}
  {% endfor %}
{% endif %}
{% if results.series %}
  {% for s in results.series %}
    {{ s.name }} {{ p.specification }}
  {% endfor %}
{% endif %}

But it seems a matter of taste, so it will up to you.

@duker33
Copy link
Collaborator Author

duker33 commented May 25, 2019

@artemiy312 ,

Also postgres can't combine the full-text search with trigram

It doesn't clear to me why postgres can't mix trigram with the common search. On django QS level trigram is just another query filter (trigram_similar) or can be used with TrigramSimilarity/TrigramDistance queries just as common queryset
https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/search/#trigram-similarity

As I said before, full-text search doesn't support fuzzy search, it doesn't handle any typos in a search query

It's true, but trigram has pros too. I'll not going cover them - it's for the separated task. But in general, we should be able to combine different techniques for different content: trigram/search/like.

we have to produce such template: <...> It looks unclear to me.

It's true problem. To solve we can provide good default empty values at Result class. This code will be much more clear since it will use class instantiation.

Let me generalize the question you've shown: Should search template know about results nature or not?
I don't know the right answer. Only i know, that this knowledge is semantic coupling between templates and search view. So we should clarify why we use it (if we).
If we'll face this issue, we'll fix template processing somehow. It'll not be simple, but will be easy

@duker33 duker33 removed the discuss issue needs to finish discussion before start working label May 31, 2019
@duker33
Copy link
Collaborator Author

duker33 commented May 31, 2019

i close the discussion. We'll continue it at sub issue's PR. We'll discuss more concrete code proposition

duker33 added a commit that referenced this issue May 31, 2019
* #331  Draft for the search arch

* #331  Subtask new search arch implementation

* #331  Drop contexts in favor of results processing

* #331  Minor imports fix

* #331  Review#1 fixes. Approve docs
@0pdd
Copy link
Collaborator

0pdd commented May 31, 2019

@duker33 the puzzle #337 is still not solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 hours some big and monolith issues. For example with hard decisions or discussions 2 performer can implement issue at his closest convenient time cleanup everything that do project stronger, flexible, reusable hard issue requires strong architecture skills, management skills, project domain area, etc Search Search and Autocomplete feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants