Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Serval API to use IJob and IEngine interfaces #502

Closed
wants to merge 1 commit into from

Conversation

johnml1135
Copy link
Collaborator

@johnml1135 johnml1135 commented Oct 1, 2024

This is to prepare to add a word alignment (and also aqua enhanced word alignment) engine. Adding the engines will be a separate PR.


This change is Reviewable

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serval is very intentionally designed around the philosophies laid out in Domain Driven Design (DDD). DDD places an emphasis on clearly modeling the software around the system's domain. One of the core principles of DDD is that it is not possible to define a unified data model for a complex domain. The same basic concept might have different representations that make sense in different contexts. This makes it difficult to share the same representation of an entity across different contexts. DDD introduces the idea of bounded contexts to help alleviate these issues. You define a unified language or model within a bounded context. Common concepts are mapped between bounded contexts using defined APIs and contracts. This allows each bounded context to evolve independently of each other. DDD and bounded contexts fit well within a system that needs to grow, scale, evolve, be extended over time. It it especially useful when the system is developed incrementally and the domain model hasn't been defined yet. This is very much how we intended to develop Serval. We are staring with machine translation, but we intend to add many different types of NLP tools to Serval in the future. We don't want to redesign the domain model everytime we introduce a new type of NLP tool. I specifically designed each tool type as a bounded context, so that we can grow the domain over time. It does mean that there might be some model overlap across bounded contexts, but I'm willing to pay that small price to give us the flexibility to evolve each bounded context independently of each other.

We can still share code across bounded contexts where it makes sense, but we shouldn't share models across bounded contexts. If you look at Serval.Shared, it contains shared contracts that are used for communication between bounded contexts and reusable components/services, such as ServalControllerBase, OwnedEntityServiceBase, and ScriptureDataFileService. It does not try to unify domain models across bounded contexts.

This PR seems to be mostly about attempting to unify the models across bounded contexts, which contradicts the design goals for Serval. We should be looking for opportunities to share code/functionality.

Reviewable status: 0 of 79 files reviewed, all discussions resolved

@johnml1135
Copy link
Collaborator Author

@ddaspit - I spent some time reviewing domain driven design and when it is most effective to be used: https://learn.microsoft.com/en-us/previous-versions/msp-n-p/ee658117(v=pandp.10). Here is a key idea that I found: "While Domain Driven Design provides many technical benefits, such as maintainability, it should be applied only to complex domains where the model and the linguistic processes provide clear benefits in the communication of complex information, and in the formulation of a common understanding of the domain.".
As I have reviewed the architecture, here are a few observations:

  • The "different meanings for the same word" occurs less - it is not like "meter" in the example used. Mostly it occurs when the data is transferred from one layer to another, not in a separate domain.
  • I am assuming that any processes foreseeably implemented in Serval will have these characteristics:
    • There will be a collection of data arranged in a corpora and processed to create a reusable engine or to get immediate results (or both).
    • This processing will be fairly intensive, which is why it is not done locally.
  • The "plumbing" of Serval is not really domain specific, especially the management of engines, jobs and the gathering of results. Each one may have an extra field here or there, or an extra routine (needs to be extensible), but the core job running and processing data on a built engine is the same.

Therefore, I think the following Serval plumbing can be shared with all engine types:

  • Message outbox (in process)
  • Webhooks for job status (in process)
  • Management of engines, jobs and results in the database (implemented here)
  • Creating, and managing Hangfire and ClearML jobs (not done yet)
  • Old file cleanup

These would then be the unique things needed for each job:

  • Definitions for all the parameters and Job options for engine and Job creation
  • Hosting the engine in the engine server for live inferencing (or on Modal or somewhere else in the future for NMT jobs)
  • Pre- and post- processing job routines
  • Main build job
  • Extensions to query the build model (translate, get alignment, etc.)

I think changing Serval from "welcome all microservices" to the "one pipeline with multiple types of data" will end up easing the addition of more types of processing with minimal effort.

@johnml1135 johnml1135 closed this Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants