Refactor Serval API to use IJob and IEngine interfaces #502

johnml1135 · 2024-10-01T17:06:26Z

This is to prepare to add a word alignment (and also aqua enhanced word alignment) engine. Adding the engines will be a separate PR.

This change is

ddaspit

Serval is very intentionally designed around the philosophies laid out in Domain Driven Design (DDD). DDD places an emphasis on clearly modeling the software around the system's domain. One of the core principles of DDD is that it is not possible to define a unified data model for a complex domain. The same basic concept might have different representations that make sense in different contexts. This makes it difficult to share the same representation of an entity across different contexts. DDD introduces the idea of bounded contexts to help alleviate these issues. You define a unified language or model within a bounded context. Common concepts are mapped between bounded contexts using defined APIs and contracts. This allows each bounded context to evolve independently of each other. DDD and bounded contexts fit well within a system that needs to grow, scale, evolve, be extended over time. It it especially useful when the system is developed incrementally and the domain model hasn't been defined yet. This is very much how we intended to develop Serval. We are staring with machine translation, but we intend to add many different types of NLP tools to Serval in the future. We don't want to redesign the domain model everytime we introduce a new type of NLP tool. I specifically designed each tool type as a bounded context, so that we can grow the domain over time. It does mean that there might be some model overlap across bounded contexts, but I'm willing to pay that small price to give us the flexibility to evolve each bounded context independently of each other.

We can still share code across bounded contexts where it makes sense, but we shouldn't share models across bounded contexts. If you look at Serval.Shared, it contains shared contracts that are used for communication between bounded contexts and reusable components/services, such as ServalControllerBase, OwnedEntityServiceBase, and ScriptureDataFileService. It does not try to unify domain models across bounded contexts.

This PR seems to be mostly about attempting to unify the models across bounded contexts, which contradicts the design goals for Serval. We should be looking for opportunities to share code/functionality.

Reviewable status: 0 of 79 files reviewed, all discussions resolved

johnml1135 · 2024-10-07T18:17:30Z

@ddaspit - I spent some time reviewing domain driven design and when it is most effective to be used: https://learn.microsoft.com/en-us/previous-versions/msp-n-p/ee658117(v=pandp.10). Here is a key idea that I found: "While Domain Driven Design provides many technical benefits, such as maintainability, it should be applied only to complex domains where the model and the linguistic processes provide clear benefits in the communication of complex information, and in the formulation of a common understanding of the domain.".
As I have reviewed the architecture, here are a few observations:

The "different meanings for the same word" occurs less - it is not like "meter" in the example used. Mostly it occurs when the data is transferred from one layer to another, not in a separate domain.
I am assuming that any processes foreseeably implemented in Serval will have these characteristics:
- There will be a collection of data arranged in a corpora and processed to create a reusable engine or to get immediate results (or both).
- This processing will be fairly intensive, which is why it is not done locally.
The "plumbing" of Serval is not really domain specific, especially the management of engines, jobs and the gathering of results. Each one may have an extra field here or there, or an extra routine (needs to be extensible), but the core job running and processing data on a built engine is the same.

Therefore, I think the following Serval plumbing can be shared with all engine types:

Message outbox (in process)
Webhooks for job status (in process)
Management of engines, jobs and results in the database (implemented here)
Creating, and managing Hangfire and ClearML jobs (not done yet)
Old file cleanup

These would then be the unique things needed for each job:

Definitions for all the parameters and Job options for engine and Job creation
Hosting the engine in the engine server for live inferencing (or on Modal or somewhere else in the future for NMT jobs)
Pre- and post- processing job routines
Main build job
Extensions to query the build model (translate, get alignment, etc.)

I think changing Serval from "welcome all microservices" to the "one pipeline with multiple types of data" will end up easing the addition of more types of processing with minimal effort.

Refactor Serval API to use IJob and IEngine interfaces

37e0b2f

johnml1135 requested a review from ddaspit October 1, 2024 17:06

ddaspit reviewed Oct 4, 2024

View reviewed changes

johnml1135 closed this Oct 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor Serval API to use IJob and IEngine interfaces #502

Refactor Serval API to use IJob and IEngine interfaces #502

johnml1135 commented Oct 1, 2024 •

edited by ddaspit

Loading

ddaspit left a comment •

edited

Loading

johnml1135 commented Oct 7, 2024

Refactor Serval API to use IJob and IEngine interfaces #502

Refactor Serval API to use IJob and IEngine interfaces #502

Conversation

johnml1135 commented Oct 1, 2024 • edited by ddaspit Loading

ddaspit left a comment • edited Loading

Choose a reason for hiding this comment

johnml1135 commented Oct 7, 2024

johnml1135 commented Oct 1, 2024 •

edited by ddaspit

Loading

ddaspit left a comment •

edited

Loading