
Multi databases (multi db) support #924

Open
bluetech opened this issue May 7, 2021 · 50 comments

@bluetech (Member) commented May 7, 2021

This issue replaces some historical issues: #76, #342, #423, #461, #828, #838, #839 (probably a partial list).

Background

Django supports multiple databases. This means defining multiple entries in the DATABASES setting, which then allows directing certain queries to certain databases.

One case is when an extra database is entirely independent: it has its own migrations, setup, etc.

A second case is when an extra database is read-only: it is only used for read queries and is not managed by Django.

A third case is a read-only replica; for this Django provides the MIRROR test setting.

Django allows configuring the order in which test databases are set up.
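
For reference, in settings this looks roughly like the following (aliases made up; MIRROR and DEPENDENCIES are documented Django TEST settings):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # ...
    },
    # case 3: a read-only replica that reuses default's test database
    "replica": {
        "ENGINE": "django.db.backends.postgresql",
        "TEST": {"MIRROR": "default"},
    },
    # setup order is controlled with TEST DEPENDENCIES
    "extra": {
        "ENGINE": "django.db.backends.postgresql",
        "TEST": {"DEPENDENCIES": ["default"]},
    },
}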

Django's multi-db testing support

pytest-django mostly relies on Django's underlying TransactionTestCase and TestCase classes for dealing with DB setups and such. Each pytest-django test gets run in a dynamically-generated TestCase/TransactionTestCase.

The main setting for multi-db support is TransactionTestCase.databases. This tells Django which databases to consider for the test case. By default it contains only default. It's possible to specify __all__ to include all databases.

Historical note: The TransactionTestCase.databases attribute was added in Django 2.2. Before that a multi_db attribute was used. pytest-django only supports Django>=2.2 so we happily don't need to concern ourselves with that.
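
For example, in plain Django (documented API):

from django.test import TestCase

class ReplicaTests(TestCase):
    # "replica" is a made-up alias; "__all__" would enable every configured database
    databases = {"default", "replica"}

    def test_both_connections(self):
        ...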

Previous attempts

#397 - Adds multi_db=True argument to pytest.mark.django_db(), adds django_multi_db fixture. Problem: uses the old multi_db attribute instead of the databases attribute.

#416 - Very similar to #397.

#431 - Adds django_db_testcase fixture which allows the user to completely customize the test case class, including setting databases. Rejected for being too flexible, I'd prefer direct support for multi-db.

#896 - Adds a global per-database setting for whether to add to the databases value or not. Rejected because I think it should be possible to customize per-test.

Proposed solution

IMO we want something like #397/#416, but modernized to use databases instead of multi_db. The fixture part would be a bit problematic because it's no longer just a boolean (fixture enabled/not enabled), but a list of database aliases. So some solution would be needed for that, or maybe only the mark would be supported.
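
Roughly, the mark side could look like this (a sketch of the proposed API):

import pytest

@pytest.mark.django_db(databases=["default", "replica"])
def test_spanning_both_databases():
    ...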

I'll try to work on it myself, but if for some reason I don't, PRs are definitely welcome!

@jgb commented May 8, 2021

Following this! It's a showstopper for us, preventing upgrades beyond Django 3.0. We currently have an internal monkey-patching workaround that works with Django <= 3.0, but I can't get it to work now that the multi_db parameter has been removed.

@bluetech (Member, Author) commented May 8, 2021

Initial PR in #930.

@pmourlanne (Contributor):

@jgb #397 was developed for your exact use case back in the day :) It is outdated now :zoidberg:

@jgb commented May 17, 2021

@bluetech I'm testing 4.3.0 specifically for the multi-db support. It seems to work, kind of. At least under Django 3.0 it works. However, as soon as I upgrade to Django 3.1 or 3.2, it goes wrong, because somehow my data isn't getting flushed anymore after each test?
Actually it goes wrong with all versions of Django: 3.0, 3.1 and 3.2.
For example, I run a test which creates an object in the database, and when I run the test again the object from the previous run still exists.
Any idea what might be going wrong here?

It seems like this old issue describes what I'm seeing: #76
4.3.0 does allow me to access multiple DBs, but it doesn't properly clean/flush them between test runs.

@jcushman:

Awesome! Thanks for working on this, it looks great.

I converted a multidb project over to the experimental API and it seems to be working, including flushing data between test runs. (I was previously using the workaround of TransactionTestCase.databases = TestCase.databases = set(settings.DATABASES.keys()) in a session-scoped fixture.)
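
(Spelled out, that workaround was roughly this, in conftest.py:

import pytest
from django.conf import settings
from django.test import TestCase, TransactionTestCase

@pytest.fixture(autouse=True, scope="session")
def _enable_all_databases():
    # allow every configured database in every generated test case
    TransactionTestCase.databases = TestCase.databases = set(settings.DATABASES.keys())
)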

It's also working correctly with pytest-xdist, which is very cool.

The fixture part would be a bit problematic because it's no longer just a boolean (fixture enabled/not enabled), but a list of database aliases. So some solution would be needed for that, or maybe only the mark would be supported.

FWIW I definitely struggled with the lack of a fixture version. Just to spell out the issue you're talking about, with a single-db project I would do something like this:

@pytest.fixture
def person(db):
    return Person.objects.create(...)

# no need for pytest.mark.django_db:
def test_person(person):
    ...

With multi-db I would imagine wanting to do something like this:

@pytest.fixture
def db_people(add_django_db):
    add_django_db('people')

@pytest.fixture
def db_books(add_django_db):
    add_django_db('books')

@pytest.fixture
def person(db_people):
    return Person.objects.create(...)

@pytest.fixture
def book(db_books):
    return Book.objects.create(...)

# no need for @pytest.mark.django_db(databases=['people', 'books'])
def test_reader(person, book):
    ...

(Not sure if that's possible to implement, but just as an example of how an API could work.)

This would be convenient for fixtures in general, because otherwise it's easy to forget to remove 'books' from databases when you edit which fixtures are used by test_reader later, and it's also nice just for removing a line of noise from each test. But it becomes particularly useful when using doctests:

def reader_stuff():
    """
    # no way to use pytest.mark here?
    >>> person = getfixture("person")
    >>> book = getfixture("book")
    ...
    """

The workaround I found to get my doctests working was to add an autouse fixture so db and transactional_db fixtures would by default load all databases, unless there's an explicit pytest.mark.django_db:

@pytest.fixture(autouse=True)
def database_defaults(request):
    # set default databases for `db` and `transactional_db` fixtures
    if not hasattr(request.node, '_pytest_django_databases'):
        request.node._pytest_django_databases = list(settings.DATABASES.keys())

That's a pretty handy thing to be able to opt into, so it might be nice to have a more official way to do it? But it still leaves the doctests less efficient than they could be otherwise, since they don't actually need access to all the databases, so I think a fixture version would still be useful.

One other idea I had while doing the conversion -- it would be cool if there was some flag I could use to be warned about unused databases, so if I removed the book fixture from test_reader I'd be warned that 'books' was no longer needed in the annotation. Not sure if that's possible to track, just a random brainstorm in case there's some handy way to implement it.

Thanks again for pushing this forward!

@jgb commented May 19, 2021

@bluetech so why does the flushing between each test work for @jcushman but not for me? ❓

@bluetech (Member, Author):

@jgb @jcushman Thanks for the reports! I'll be looking at this again this weekend and I'll reply then.

@jcushman:

@jgb here's what I'm using successfully in case it helps.

Packages:

Python 3.7.10
Postgres 11.11
Django==3.2.3
pytest==6.0.1
pytest-django==4.3.0
pytest-xdist==1.32.0

Databases:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        ...
    },
    'capdb': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        ...
    },
}

I started with adding just a simple test file before converting my actual tests:

@pytest.fixture
def court():
    return Court.objects.create(name='foo')  # comes from capdb

@pytest.fixture
def mailing_list():
    return MailingList.objects.create(email='foo@example.com')  # comes from default

@pytest.mark.parametrize("x", range(10))
@pytest.mark.django_db(databases=['default', 'capdb'])
def test_multi(x, court, mailing_list):
    assert Court.objects.count() == 1
    assert MailingList.objects.count() == 1

Results:

# pytest capdb/tests/test_multi.py
========================================= test session starts =========================================
platform linux -- Python 3.7.10, pytest-6.0.1, py-1.10.0, pluggy-0.13.1
django: settings: config.settings.settings_pytest (from ini)
rootdir: /app, configfile: setup.cfg
plugins: django-4.3.0, forked-1.0.1, flaky-3.6.0, celery-4.3.0, cov-2.9.0, redis-2.0.0, xdist-1.32.0
collected 10 items

capdb/tests/test_multi.py ..........                                                            [100%]

========================================= 10 passed in 4.28s ==========================================

I imagine if you can try an isolated test like that and narrow down the issue it might help bluetech when they get back to working on this. You might also look closely at your fixtures, maybe entirely disable them, and see if the isolated test starts working again -- it wouldn't be that surprising if an old multidb workaround in your fixtures is messing with this.

@gonzaloamadio:

Does this mean that if I have a multi-db project, I can not use pytest for tests?

I have a legacy DB, and I created an app with some models that correlate with tables in that legacy DB. I have also created some endpoints with Django DRF (the models have managed=False, so no migrations are done).
So basically it would be case 1 of the first comment by bluetech.

@bluetech (Member, Author):

@jcushman

FWIW I definitely struggled with the lack of a fixture version

Yes, I'm pretty sure we need some integration with fixtures here.

Your suggestion should be doable; all we really need is to know the list of databases once the test is to be executed.

I think the add_django_db API is not too great, but maybe I need to get used to it a bit.

I'll definitely mull it over. Of course if someone wants to submit a PR with a proposal that would be possible as well.

# no way to use pytest.mark here?

Right, currently there is no way to add marks directly to doctests. This is pytest-dev/pytest#5794. I can't think of any clear way to support it either.

One other idea I had while doing the conversion -- it would be cool if there was some flag I could use to be warned about unused databases

For the multi-db support, pytest-django depends almost entirely on the django.test code, so such a feature would probably have to go through Django. It might be possible to somehow track whether a connection for a database was used during a test, but I'm not sure. There are also bound to be a lot of false-positives (or rather, cases where you want to keep a DB anyway), so would definitely need to be off by default.

@bluetech (Member, Author):

@jgb

My answer is pretty much what @jcushman said (thanks!) -- it works here, so we'll need more details to help us narrow down the cause.

@bluetech (Member, Author):

@gonzaloamadio

Does this mean that if I have a multi-db project, I can not use pytest for tests?

It's supposed to be the other way around - previously you couldn't, now you can.

If you configured the legacy database in your DATABASES then it should work. If you tried it and it didn't work, we'd need to know how it failed.

@gonzaloamadio commented Jun 2, 2021

Hi, @bluetech
I have made a reusable app (let's call it API) that has some unmanaged models.
Then (in the same repo) there is another "testapp" that installs this API.

In this testapp/settings.py I have 2 databases. One is a legacy DB, the one that will be queried by the unmanaged models in the API app.

DATABASES = {
    'default': {
        'ATOMIC_REQUESTS': True,
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(str(BASE_DIR), 'db.sqlite3'),
         . . . 
    },
    'legacy_db': {
        'ENGINE': 'django.db.backends.postgresql',
        . . . 
    }
}
DATABASE_ROUTERS = ['my-project.testapp.database_routers.DatabaseRouter']

And then I have also a testapp/settings_test.py. In this file, I disable migrations, define sqlite databases and set managed=True for models.

from .settings import *
from django.test.runner import DiscoverRunner

class DisableMigrations(object):

    def __contains__(self, item):
        return True
    def __getitem__(self, item):
        return

MIGRATION_MODULES = DisableMigrations()

class UnManagedModelTestRunner(DiscoverRunner):
    """
    Test runner that automatically makes all unmanaged models in your Django
    project managed for the duration of the test run.
    """

    def setup_test_environment(self, *args, **kwargs):
        from django.apps import apps

        self.unmanaged_models = [
            m for m in apps.get_models() if not m._meta.managed
        ]
        for m in self.unmanaged_models:
            m._meta.managed = True
        super(UnManagedModelTestRunner, self).setup_test_environment(
            *args, **kwargs
        )

    def teardown_test_environment(self, *args, **kwargs):
        super(UnManagedModelTestRunner, self).teardown_test_environment(
            *args, **kwargs
        )
        # reset unmanaged models
        for m in self.unmanaged_models:
            m._meta.managed = False


DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": os.path.join(BASE_DIR, "db_test.sqlite3"),
        "TEST": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": os.path.join(BASE_DIR, "db_test.sqlite3"),
        },
    },
    'legacy_db': {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": os.path.join(BASE_DIR, "db_legacy_test.sqlite3"),
        "TEST": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": os.path.join(BASE_DIR, "db_legacy_test.sqlite3"),
        },
    },
}

# Set Django's test runner to the custom class defined above
TEST_RUNNER = "testapp.settings_test.UnManagedModelTestRunner"

If I run normal unit tests with "production" settings, it fails as expected, because the relations for unmanaged models do not exist.
./manage.py test --settings=testapp.settings

If I run using test settings , it work as expected.
./manage.py test --settings=testapp.settings_test

BUT, if I run using pytest, it fails:

    def execute(self, query, params=None):
        if params is None:
            return Database.Cursor.execute(self, query)
        query = self.convert_query(query)
>       return Database.Cursor.execute(self, query, params)
E       django.db.utils.OperationalError: no such table: ingredient

Here is a gist with more code (models, factory, test, settings): https://gist.github.com/gonzaloamadio/14f935d96809299b7f1e9fb88a6e8e94

I put a breakpoint and inspected the DB. As expected, when running with the unittest suite, the ingredient table was there.

But when running with pytest, there is no ingredient table.

@gonzaloamadio commented Jun 3, 2021

One more comment. I have run unittest with the verbose option. This is the output:

❯ ./manage.py test --settings=testapp.settings_test --verbosity=2

Using existing test database for alias 'default' ('db_test.sqlite3')...
Operations to perform:
  Synchronize unmigrated apps: account, admin, core_api, auth, contenttypes, corsheaders, django_extensions, django_filters, messages, rest_framework, runserver_nostatic, sessions, softdelete, staticfiles, test_without_migrations
  Apply all migrations: (none)

Synchronizing apps without migrations:
  Creating tables...
    Creating table auth_permission
    Creating table auth_group
     . . .  more of django core stuff tables
Running migrations:
  No migrations to apply.

Using existing test database for alias 'legacy_db' ('db_legacy_test.sqlite3')...
Operations to perform:
  Synchronize unmigrated apps: account, admin, core_api, auth, contenttypes, corsheaders, django_extensions, django_filters, messages, rest_framework, runserver_nostatic, sessions, softdelete, staticfiles, test_without_migrations
  Apply all migrations: (none)
Synchronizing apps without migrations:
  Creating tables...
    Creating table ingredient                   <--- This is the key.  
    Running deferred SQL...
Running migrations:
  No migrations to apply.

So I found this solution : https://stackoverflow.com/questions/30973481/django-test-tables-are-not-being-created/50849037#50849037

Basically do in conftest what UnManagedModelTestRunner is doing.

This solution worked for me @bluetech

$ cat conftest.py

import pytest
# Set managed=True for unmanaged models. !! Without this, tests will fail because tables won't be created in test db !!
@pytest.fixture(autouse=True, scope="session")
def __django_test_environment(django_test_environment):
    from django.apps import apps

    get_models = apps.get_models

    for m in [m for m in get_models() if not m._meta.managed]:
        m._meta.managed = True

@bluetech (Member, Author) commented Jun 3, 2021

@gonzaloamadio Right, pytest doesn't consider TEST_RUNNER (it is itself the test runner), so your tweaks are not affecting it. I'm not sure if you were eventually able to do the modification in the conftest.py or not. If not, let me know and I'll try to help.

@bepuca commented Jun 10, 2021

So I just stumbled upon this section in the docs. We are currently upgrading a multi-DB Django project from Django 1.11 to Django 3.2, and also upgrading the pytest and pytest-django packages. I was not aware of all these changes, but for us it worked out of the box without any issues. The tests are passing without problems. So thank you for that!

@AnderUstarroz commented Feb 14, 2022

Multi DB is supported then? I am facing a strange issue when trying to execute queries using cursors against multiple DBs.

Am I making bad use of the fixtures?

@peterschwarzdev:

Same problem here! Django DB fixtures don't work with the cursors generated by:

from django.db import connections

@mschoettle (Contributor):

We ran into a problem when we added a second database to the Django settings. When running pytest, it tried to create a database with a second test_ prefix (test_test_<dbname>) for the default database. It only happened for a few other developers, so I could not reproduce it myself.

We were able to fix it by specifying a specific database name for the test database in the settings (in DATABASES['default']['TEST']['NAME']).
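
i.e. something like this (the database name here is made up):

DATABASES = {
    'default': {
        # ...
        'TEST': {
            'NAME': 'test_mydb',  # an explicit name avoids the test_test_ prefix
        },
    },
}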

@AnderUstarroz commented Jul 12, 2022

What about accessing a non-default DB within a fixture? How can we do that?

I know about the db fixture, which allows access just to the default DB:

@pytest.fixture()
def db_access(db):
    # Here you can access just the "default" db.
    ...

@raianul commented Jul 20, 2022

What I did so far, in conftest.py:

from django.test import TestCase

TestCase.databases = {"default", "replica"}

However, while it creates both test databases, all of my fixtures execute against default, not the replica. Ultimately my tests fail.

Here is my fixture -

@pytest.fixture(name="test_factory")
def fixture_test_factory():

    def _test(**kwargs):
        return TestModel(name="test", **kwargs)

    return _test

Do I need to make any other changes?

@bobflorian:

Solid multi-DB support would be a huge win for us. We have a multi-tenant Django app touching a bunch of databases, and ancillary databases as well.

@erkandmp commented Oct 5, 2022

Hey guys, thanks to all for all of your work in this project!

I found my way to this thread while I was upgrading some packages and running the test suite:

AssertionError: Database connections to 'xxxx_db' are not allowed in this test. 
Add 'xxxx_db' to pytest_django.fixtures._django_db_fixture_helper.<locals>.PytestDjangoTestCase.databases 
to ensure proper test isolation and silence this failure.

Suggestion:
If there were a way to configure globally that access to the non-default database is OK, it would help us a lot.
All of the tests are marked as "talks to the database" (i.e. the db fixture fed as an argument, or @pytest.mark.django_db), and getting to pytest-django>=4.4.3 would require refactoring multiple thousands of tests for us. 😅
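
Something like this in conftest.py might do it (a sketch, not an official pytest-django API; an explicit django_db mark on a test should still take precedence, since markers added this way are appended after a test's own):

# conftest.py
import pytest

def pytest_collection_modifyitems(items):
    for item in items:
        # add a default django_db mark to every collected test
        item.add_marker(pytest.mark.django_db(databases="__all__"))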

For now I pinned pytest-django==4.2.0.

Thanks again and have a great time!

@dannyzhong:

My case is even more complicated: I have two databases, one in MySQL and one in PostgreSQL, and I'm not sure how to test them with pytest.

@christianbundy:

Is enable_db_access_for_all_tests meant to support multiple databases? Here's what I'm using:

@pytest.fixture(autouse=True)
def enable_db_access_for_all_tests(db):
    pass

@christianbundy commented Feb 3, 2023

I'm unsure how useful this will be for others, but if you're transitioning from a codebase where all of your tests inherit from a standard test class where you set databases = "__all__", or if you use that pattern often, you should know:

  • You don't need to enable_db_access_for_all_tests.
  • This pattern causes problems with pytest-xdist. You may see django.utils.connection.ConnectionDoesNotExist: The connection '_' doesn't exist., which comes from _databases_support_transactions splitting cls.databases assuming it's a set of aliases, when in reality it's the string "__all__".
    • You can solve this by setting databases = ["default", "other", "etc"] explicitly (see the sketch after this list).
    • If you just override _databases_support_transactions to return True, then your pytest-xdist workers will apparently use the same database connection.
    • I can't imagine how/why it would be related to the above, but previously I was experiencing Cache + xdist #527 and haven't seen issues fixing "__all__" and removing everything from my conftest.py.
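
A minimal sketch of the explicit-alias version (aliases made up):

from django.test import TestCase

class BaseTestCase(TestCase):
    # spell out the aliases so the set-based handling in
    # _databases_support_transactions works under pytest-xdist
    databases = {"default", "other"}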

@lachlancannon:

Is the current support meant to include support for the MIRROR setting? Everything except that was working for me, and I also couldn't write the test data directly to the second database without permission errors, e.g. User.objects.using('reporting').create(email='whatever@wherever.com')

@SHxKM commented Apr 22, 2023

Is there any way I can apply:

@pytest.mark.django_db(databases=['default'])

To all tests? I have hundreds of tests that use the DB via the "magic" db argument to the test:

def test_profile(db):
 ...
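
(Standard pytest does allow a module-level mark, shown below, but that still means touching every file:)

# at the top of a test module:
import pytest

pytestmark = pytest.mark.django_db(databases=["default"])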

Edit: Even after removing the 2nd DB from my dev.py settings file and even removing references to using(...), pytest is failing. Where is it getting its idea about the 2nd DB from now?

@smohsenmohseni:

Is the current support meant to include support for the MIRROR setting? Everything except that was working for me, and I also couldn't write the test data directly to the second database without permission errors, e.g. User.objects.using('reporting').create(email='whatever@wherever.com')

I have the same issue.

@pmaresca:

Is it possible to know which DB a test that specifies multiple DBs is intended to be running against? I.e., I've got two DBs and the test runs twice; does the test know which DB this run is for?

@julianmoji commented Nov 14, 2023

Nice work, everyone. Everything works. Is there some way to avoid the database flushing?

@hannylicious:

Everything seems to be working for me! Great feature, IMO.

@kkrasovskii commented Feb 7, 2024

Hi there all!

Please, could anybody explain how to run a test against a specific database in a multi-database configuration?

I have defined three databases in settings.py: 'default', 'postgres1' and 'postgres2'. The first one is the default Django sqlite3 DB, the second one has read-only access, and the last one has read-write access. I need to check that my model is created in the 'postgres2' database with read-write access. So I wrote the test:

import pytest
from my_app.models import MyModel


@pytest.mark.django_db(databases=['postgres2'])
def test_create_my_model():
    MyModel.objects.create()

If I run the test, I get an error saying that there is no access to the 'postgres1' database with read-only access (yep, there is no connection to postgres1, but I expect it will not be used).
What is wrong here?

Thanks in advance!

@gonzaloamadio commented Feb 7, 2024 via email

@mschoettle (Contributor):

@kkrasovskii I suspect that you need to tell the ORM which DB to use for this model. Either via a DB router (https://docs.djangoproject.com/en/dev/topics/db/multi-db/#automatic-database-routing) or manually (https://docs.djangoproject.com/en/dev/topics/db/multi-db/#manually-selecting-a-database).

Not sure why it is trying postgres1 instead of default though.

@kkrasovskii commented Feb 8, 2024

@gonzaloamadio, @mschoettle, thank you so much for the replies!

The problem doesn't seem to be routing. My apps work fine with the databases; the problem is with the tests.

django version 5.0.2,
pytest-django version 4.8.0

As I've mentioned earlier, three databases are described in settings.py: sqlite3 as 'default' and two postgres databases ('postgres1', 'postgres2'). When I run the test, there is no connection to 'postgres1'. I rewrote the test the way @gonzaloamadio suggested:

from django.test import TestCase
from my_app.models import MyModel

class MyTestCase(TestCase):
    databases = ['default', 'postgres2']

    def setUp(self):
        MyModel.objects.create()

    def test_something(self):
        pass

DB routing is done so that MyModel is written to the 'postgres2' database.

If I run the pytest command, I get an error that there is no connection to 'postgres1'. In that case I also get the warning:

RuntimeWarning: Normally Django will use a connection to the 'postgres' database to avoid running initialization queries against the production database when it's not needed (for example, when running tests). Django was unable to create a connection to the 'postgres' database and will use the first PostgreSQL database instead.

If I run the same command with 'postgres1' config removed from settings.py, the test is passing.
If I run python manage.py test, the test is always passing (with and without 'postgres1' configuration in settings.py).

@mschoettle (Contributor):

I am not entirely sure if this will fix it but we had some issues with the test databases and had to add the following to each database connection setting to force using the correct database in tests:

'TEST': {
    'NAME': f'test_{env("DATABASE_NAME")}',
},

@dferens commented Feb 14, 2024

One more solution in #1113

@lonetwin commented May 24, 2024

Hi, I ended up here when searching for a solution for using pytest with an in-memory sqlite DB backend, for tests in an application where the production DB is postgres, since the application is largely DB-engine agnostic.

The rationale being that in-memory sqlite is blazing fast. The only hiccup is that a couple of my tests do require the postgres DB due to limitations in the sqlite engine.

I naively thought this might be sufficient:

# in tests/settings.py (which imports the application settings and overrides test-specific config)
...
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": ":memory:?cache=shared",
    },
    "postgres": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "...",
        ...
    },
}
...

# in tests that invoke queries not supported by sqlite

@pytest.mark.django_db(databases=["postgres"])
def test_complex_querying(db):
    ...

Unfortunately this results in

django.test.testcases.DatabaseOperationForbidden: Database queries to 'default' are not allowed in this test.
Add 'default' to pytest_django.fixtures._django_db_helper.<locals>.PytestDjangoTestCase.databases
 to ensure proper test isolation and silence this failure.

My assumption here is that this potentially is due to the fact that the model being queried (which actually is constructed from a FactoryBoy factory) is somehow registered with the default db.

Now, before I go down the rabbit hole of attempting to address that, is there something else I should be aware of w.r.t. pytest-django's DB setup? For instance, IIUC, the DB setup is session-scoped, which means that if I create a postgres-specific db fixture then I'd have to override/duplicate all of what is being done when requesting the regular db fixture.

Is this correct? Is there a simpler way that I am not aware of?

TBH, the payoff for this "sqlite DB with pg when required" approach is significant (integration test time down from 14m to 3m, bar the 2 failing tests that fail due to the link above). So, I would really like to pursue this.

Edit: I just thought of an alternate approach: use pytest marks to mark some tests pg-specific, then invoke pytest twice -- once using settings where postgres is the default DB, limited to running the marked tests, and once using settings where sqlite is the default DB, running all the unmarked tests.

Again, is there a simpler way to express this?

@SHxKM commented Jun 2, 2024

Is the current support meant to include support for the MIRROR setting? Everything except that was working for me, and I also couldn't write the test data directly to the second database without permission errors, e.g. User.objects.using('reporting').create(email='whatever@wherever.com')

@lachlancannon any chance you've found a solution to this? I'm using two DATABASES entries just so I can have a separate connection (timeout) for slower queries, but it seems like pytest is treating it as an entirely different database.

@danielmcquillen commented Jun 20, 2024

Wondering what the status of this feature is.

I have two databases, a 'default' and a read-only DB, and would like to be able to have 'default' work as normal in a TestCase, but have the read-only database be connected to an actual, existing db in postgresql, not a dynamically-created db for the test.

(Background: I use the standard approach in Django of using a custom model class for models managed by the read-only DB, e.g.

class CustomExternalModel(models.Model):
    class Meta:
        managed = False
        abstract = True

...and then use a custom database router to know whether a model uses the 'default' or the read-only DB.)

However I'm unclear how to configure pytest and pytest-django to set up my TestCase so that it creates a test db normally for 'default' but links directly to the existing read-only database (and doesn't add "test_" to the name or try to build the db dynamically).

I feel the answer might be spread out across multiple responses above but I'm not sure what it is or whether solutions above are still valid.

Thanks for any help! (I'll remove this comment and put somewhere else if there's a support channel I'm missing. In that case, apologies for misplaced question.)

@hannylicious commented Jun 21, 2024

Wondering what the status of this feature is.

AFAIK, it's still working fine.

I have two databases, a 'default' and a read-only DB, and would like to be able to have 'default' work as normal in a TestCase, but have the read-only database be connected to an actual, existing db in postgresql, not a dynamically-created db for the test.

I've always been told that's a big 'no-no', as it defeats a lot of the ideas behind testing - that it should be very isolated, reproducible, and rely as little as possible on outside things (ideally no outside resources at all, which is why mocks exist).

Warning: lengthy reply. Skip to end for TL;DR.

It's not always that simple though, I get that. Where I work, we have a use case where we have a few db's that are not owned by us - and we have unmanaged models mapping the data out - not unlike your case. We end up having pytest spin up a 'fake test db' (similar to the db we don't own), then we have a script that creates the dummy data in that database for the tests we're running (leveraging factories / factory-boy, --no-migrations and the hypothesis library where wanted/needed). We even have scripts that setup an entire apps worth of data for end-to-end testing, too. That way, we're in control of the data in the db (even if it's randomized) and we know exactly what is going on, where and how. If you test against a live db with data already in it? You lose that control, the flexibility, the reproducibility, and there is always the potential for something to have changed in the live db beyond your control which will utterly destroy your test suite and cause problems. Not to mention, there may be instances where you want to setup a 'specific niche case' that doesn't exist currently in the live DB - and if you're testing against the live DB? Good luck. Do you really want to go through all that hassle? Realistically, no. And you wouldn't want to intentionally put potentially bad data into a live db just for the sake of testing.

There is also the undue burden on the systems. Hitting a live DB with requests might have a big impact on performance for other users currently using that db. It might cause other unforeseen issues. What if your test was written in such a way that it somehow manipulated the data from the db and saved it to the live db? No, thank you. The speed of sending those requests and the slow(er) replies when testing against a live db also are, in my opinion, reason enough to say 'no way'.

These are just some of the many reasons people recommend not testing against a live db. And there are lots more.

There are lots of solutions - you could mock the db response for the unmanaged models (giving you control, reproducibility, etc.), you could spin up randomized data using factory_boy like we do (again, bringing you all the benefits mentioned previously), you could use fixtures to fill the test_db with appropriate data similar to (if not exactly the same as) your real-life db. IMO fixtures are the most 'clunky' way to do it and get old fast - but they literally give you an exact replica of your 'real life db' (complete with the existing data that exists in the db) for your local testing. There are so many ways to spin up the exact same data that is currently in your live db in the test environment using the test DBs that it really should never be a consideration to test against the live db.

Maybe someone will come along with some examples of testing against live dbs and a perfect reason as to why they do it - but I personally have yet to ever have found a single one, honestly. I'm happy to entertain the idea though - it's just that I personally have never found one.

TL;DR: Don't test against live db's. Spin up your data in a local test_db and let pytest handle the creation/tear-down. Use scripts/libraries/fixtures to replicate your data in the local test db.

@danielmcquillen commented Jun 21, 2024

@hannylicious Thanks for your thoughtful response.

Sorry I should have clarified that the second db is read-only but doesn't need to be the live version. It's just a very static, kind of complex data store that is used as background data in a lot of operations spread widely across the main app. Also these are more of the integration tests than the unit tests...in the unit tests I think I've done a reasonable job of using patches and avoiding dependencies like this.

In the past I've tried to make a lot of MagicMock objects and patch them here...and there...and dang, over there too...but I found myself spending way too much time trying to figure out where to inject them in different places of the app, and the tests were getting brittle. I also spent a lot of time in tests trying to reconstruct just a part of this very stationary, very non-changing read-only DB.

So I wanted to just create one real database that could be used in tests and then just set the tests up to connect to it. This db could act like fixtures that don't need to be loaded (and already relational!). And yes, a step away from that is your comment about writing scripts / fixtures to populate a portion or a complete representation of it.

But if all of your read-only models are set up as unmanaged and you're using a router to decide between databases for each model, it becomes harder to populate the read-only test db in your tests. Now you need some kind of custom test runner to change things...maybe look for and iterate through this subset of models and change them back to managed, and maybe do some other things to get the test database migrated and ready. Also, your read-only Django models might only model a portion of the target db...yet there might be foundational data not reflected in those (facade-like) Django model classes that's needed in the target db for it to be relationally sound. Setting up the tests to do this makes the tests more brittle and hard to understand when you come back to them, when you really only need the unchanging, read-only db attached and ready for the integration test to run.

But it may be I'll need to do the above as it's the proper way. If so, I haven't found a reference implementation for the current best practice on how to do this. Is there a gist or other resource you'd recommend that shows the latest way to handle this? It seems like decorators have changed and some of the earlier examples I've found aren't current. I can't seem to find one place that shows how to do this...and my limited understanding prevents me from putting them together in the right way.

Thanks again for any thoughts.

@bluetech (Member, Author):

I agree with what @hannylicious said, but if you still want to do it, I think you can use the TEST setting in DATABASES, see https://docs.djangoproject.com/en/5.0/ref/settings/#test. You want to point it at your real DB and tell it not to create or migrate, etc.
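
Something along these lines (a sketch, untested; the alias and name are made up), combined with pytest-django's --reuse-db so the database is neither created nor dropped:

DATABASES["readonly"]["TEST"] = {
    "NAME": "my_real_readonly_db",  # use the existing database as the "test" database
    "MIGRATE": False,               # don't run migrations against it (Django >= 3.1)
}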

@hannylicious:

Now you need some kind of custom test runner to change things...maybe look for and iterate through this subset of models and change them back to managed, and maybe do some other things to get the test database migrated and ready.

That's not necessarily true. You could do something like this:

from django.conf import settings
from django.db import models

class UnmanagedModel(models.Model):
    id = models.IntegerField(db_column='ID', primary_key=True)
    year = models.IntegerField(db_column='year')
    thing = models.CharField(db_column='thing', max_length=255)  # a regular field; SubFactory belongs in a factory, not a model
    # ...whatever else...

    class Meta(object):
        managed = getattr(settings, 'SOME_ENV_VAR', False)
        db_table = '[database].[databaseTable]'

Given something like that, you can use 'SOME_ENV_VAR' to control whether or not it's a 'managed' model, without the need for excess test runners and things like that. You could add that to your core settings, your specific test settings, or even add it to the command at run time - however you want! Sure, it's a little wonky, but it works amazingly well. Combined with the no-migrations argument to avoid it needing migration files (in the event you don't have/run migrations against your unmanaged models), and using FactoryBoy to create huge swaths of 'test data' very simply, it becomes much simpler to spin up fully relational data in just a few lines of code. Depending on how well you set up your factories, in some cases you can write one line and have it set up all the related models in the read-only DB as well as in the full-access DB.

Even given your use case, I'm not seeing any particular reason or benefit as to why you should use a real persistent db for the read-only portion of things. IMO, if I were in your position - I would be spinning up whatever test-db as read-only (or not, doesn't matter, you just need some data for the test really...) and just using factory-boy or some scripts to create the data in there I need. Even if the data in the database is 'a lot', create a factory for your unmanaged models that contain all expected attributes and creation pieces and let it manage the creation of the data. You can get as complex or simple as you want! You can use things like SubFactory to have it automatically create all the expected relationships.

Let's assume for the sake of discussion I needed some read-only test data about 2 kids in a school system. I need different kids, different schools, different school years, but same grades recorded in 2 classes - with both kids sharing the same last name of "BRICK" to recreate a bug that someone brought to our attention that only happens in that very specific occurrence. That might sound tricky and/or painful to setup - but using factory_boy?

ReportCardFactory.create_batch(2, science_grade="A", math_grade="B", student__last_name="BRICK")

Depending on how you setup your factories, it would/could create 2 report cards, tied to 2 different students with the last name of "BRICK", at 2 different schools, during different school years, etc. each with an "A" in science and a "B" in math. When you come back to it 6 months from now? Easily readable, easily understood. It would also create any myriad of other attributes/relationships you setup in a fashion to where the data is as randomized as you want or as specific as you want. All from that one line (if you setup the factories to do so). The mix of related models being created could be managed or unmanaged, factory_boy won't care - it's just putting the data into the db, in the related tables where they should be. And yes, it's wildly fast - using an in memory db it takes fractions of a second for this data to be setup. Assuming your test is short and sweet? The entire setup, test and teardown would be in most cases way, way faster.

I wrote this article about it: https://hannylicious.com/blog/testing-django/ ; Yes, the article is out of date. Yes, the article has issues that need correcting and could be more clear. However, the over-arching ideas of quickly/easily creating test data for unmanaged models is still relevant. It's also how I personally prefer to solve the issue. In my experience, letting your test suite handle the db side of things is much faster, more maintainable, more manageable, requires less resources, has less barriers and gives you access to all the benefits that test libraries offer.

I'm all for learning better ways to do things - so if there are times where a live read-only db would be appropriate, I'd love to hear about it! But from what you're describing? Again, in my opinion, I don't see any reason you can't just spin that data up at the start of your test using whatever test db you see fit or whatever db your test suite is designed to spin up and would still heavily recommend using the test-db your suite gives you and letting it do the maintenance of that.

@meshy (Contributor) commented Jul 3, 2024

Just dropping in a quick comment here to mention that I've opened a ticket mentioning a specific issue I've had with the multi-db support.

@sidmitra commented Jul 9, 2024

Is the current support meant to include support for the MIRROR setting? Everything except that was working for me, and I also couldn't write the test data directly to the second database without permission errors, e.g. User.objects.using('reporting').create(email='whatever@wherever.com')

While multi-db with different DBs works, I haven't yet been able to figure out how MIRROR works.

I looked at the tests in the project. Looks like I was missing transaction=True:

@pytest.mark.django_db(transaction=True, databases=["default", "replica"])
def test_replica_mirrors_default_database(self, request: pytest.FixtureRequest) -> None:
    Item.objects.create(name="spam")
    Item.objects.using("replica").create(name="spam")

    assert Item.objects.count() == 2
    assert Item.objects.using("replica").count() == 2

Note that databases="__all__" also works if you want to enable all databases.
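
For reference, the settings side of MIRROR looks like this (per Django's docs):

DATABASES["replica"]["TEST"] = {"MIRROR": "default"}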

cc @lachlancannon @smohsenmohseni @SHxKM - not sure if this is what you wanted to know.

@danielmcquillen commented Jul 10, 2024

Following the advice above to work on creating fixtures for both DBs (thanks). My project uses a custom router to map models to the correct DB; for the sake of argument their names are "default" and "secondarydb".

DATABASE_ROUTERS = ["myapp.database_router.MyDatabaseRouter"]

# myapp/database_router.py (import paths assumed):
from django.conf import settings
from myapp.models import MyExternalModel

class MyDatabaseRouter(object):
    """
    A custom database router that lets us map "MyExternalModel"
    models to tables in a read-only database.
    """

    def db_for_read(self, model, **hints):
        if issubclass(model, MyExternalModel):
            return "secondarydb"
        return "default"

    def db_for_write(self, model, **hints):
        if issubclass(model, MyExternalModel):
            if settings.UNDER_TEST:
                # During tests we want to be able to write initial tables
                return "secondarydb"
            else:
                # We don't want writes to the secondarydb database
                return None
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Don't allow relations between two databases.
        return isinstance(obj1, MyExternalModel) == isinstance(
            obj2, MyExternalModel
        )

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if settings.UNDER_TEST:
            return True
        elif db == "secondarydb":
            return False
        return True

When I run pytest, it seems like my custom database router is being called, but only the allow_migrate method. The db_for_write method isn't called during setup.

...and since that method is not being used, Django is trying to create all model tables in the first db, even if the model is meant to be created in the second db.

Is there a special configuration step in pytest or pytest-django to make sure the database router's db_for_write method is called during test database setup?

Background:

I have set up the models that are related to the second database so that they inherit from a base class like:

class MyExternalModel(models.Model):
    class Meta:
        managed = False
        abstract = True
        app_label = 'some_app_label'

Unmanaged models are made manageable during tests via conftest.py:

@pytest.fixture(autouse=True, scope="session")
def django_test_environment(django_test_environment):
    from django.apps import apps

    get_models = apps.get_models

    for m in [m for m in get_models() if not m._meta.managed]:
        m._meta.managed = True

And my router makes these read-only via db_for_write ... but should allow writes when under test (but as mentioned above this method isn't being called at all during test setup when Django builds a table in the database for each model).

  def db_for_write(self, model, **hints):
        if issubclass(model, MyExternalModel):
            if settings.UNDER_TEST:
                # During tests we want to be able to write 
                return "secondarydb"
            else:
                # Normally we don't want writes to the secondary database
                return None
        return "default"

django==5.0.6
pytest==8.2.2
pytest-django==4.8.0

@sstanovnik:

I was faced with a similar problem: having multiple databases, with one of them being read-only.

My specific constraints are:

  • there are one or more databases that need to be switched to in-memory SQLite connections for testing, as is tradition
  • there is one external PostgreSQL instance which is always read-only, pre-populated with data, which is a very expensive operation
  • I want to touch non-test code as little as possible, so changing the router is out of the question
  • tests use multiple databases (with a properly configured router and settings, which is out of scope for this comment) marked with @pytest.mark.django_db(databases=["a", "b"])

As a bit of background, I was, as far as I understand, in a similar situation to @danielmcquillen, where there were many (many) mocks all over the place, and it was getting unwieldy, to put it mildly.
Tests had already used either the (testing) read-write database or the read-only database, but never both at the same time.
If the read-only database was used, the django_db_setup fixture was overridden to be a no-op, which is fine, but that doesn't work on a specific database, only on all of them at once.
For (e.g. end-to-end) tests that obviously depended on both databases, mocks were applied liberally throughout the tests, making them brittle by depending on irrelevant implementation details which ground my gears like you wouldn't believe.

So, to avoid mocking the entire application for each test case, I created a pluggable monkeypatch that leaves the database alone.

First a global test/foobar_test/conftest.py:

from collections.abc import Callable
from typing import Any

import pytest
from django.db.backends.base.creation import BaseDatabaseCreation


@pytest.fixture(scope="session", autouse=True)
def _fixture_leave_database_alone() -> None:
    """Allows running tests against multiple databases, where some databases are static and read-only.

    Depends on `settings.DATABASES[db]["TEST"]["FOOBAR_TESTING_LEAVE_DATABASE_ALONE"]` being set,
    which is also how this fixture is used.
    The fixture is always present (autouse=True).
    """

    def alone_leaver_decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        def alone_leaver(self: BaseDatabaseCreation, *args: Any, **kwargs: Any) -> Any:
            if self.connection.settings_dict.get("TEST", {}).get("FOOBAR_TESTING_LEAVE_DATABASE_ALONE", False):
                return self._get_test_db_name()  # type: ignore[attr-defined]
            return func(self, *args, **kwargs)

        return alone_leaver

    mp = pytest.MonkeyPatch()
    mp.setattr(BaseDatabaseCreation, "create_test_db", alone_leaver_decorator(BaseDatabaseCreation.create_test_db))
    mp.setattr(BaseDatabaseCreation, "destroy_test_db", alone_leaver_decorator(BaseDatabaseCreation.destroy_test_db))

And then modify the test subset of settings.py just a little bit to accommodate this fixture:

DATABASES = {
    "my-read-only-database": {
        "TEST": {
            "MIGRATE": False,
            "FOOBAR_TESTING_LEAVE_DATABASE_ALONE": True,  # new custom setting
        },
    },
}

Everything else is left as-is.

This works by intercepting database creation, and potentially more importantly, destruction, with a custom decorator, which prevents the implementation from executing if you say so in the configuration.
And because that config is a first-class citizen of Django's testing overrides, I have no qualms with modifying it a bit more to fit some more testing overrides.

The fixture will "just work" because it's autouse=True.

I believe the solution will work for all database providers, but if any override the methods themselves, the monkeypatched implementation obviously wouldn't be called, so they wouldn't take effect.

For anyone eagle-eyed enough to notice that I didn't use pytest's monkeypatch fixture but instead created it myself: notice the scope of the fixture.
It needs to be session-scoped so it happens before databases are initialized, but the monkeypatch fixture is function-scoped, so there's a mismatch and pytest gets angry.

Hope this helps anyone else reduce the number of mocks too.

Footnote with versions:

  • Django==4.2.15
  • pytest==8.3.2
  • pytest-django==4.8.0

@RemiDesgrange:

My 2cts on this.

I have 3 DBs in my project, one of which is not managed by Django.

What I do for now is:

from contextlib import contextmanager
from django.db import connections
from django.db.models import Model

@contextmanager
def unmanaged_table_for_tests(model: type[Model]):
    conn = connections["non_default"]
    model._meta.managed = True
    try:
        with conn.schema_editor() as schema_editor:
            schema_editor.create_model(model)
            yield
            schema_editor.delete_model(model)
    finally:
        model._meta.managed = False  # restore the unmanaged state

then adding pytest.mark.django_db(databases=['default', 'non_default'], transaction=True) to all my tests.

IIRC, with this fixture (pytest-django's internal _django_db_helper) it's not possible to patch databases without rewriting the whole fixture (maybe there is a way that I'm not aware of).
