This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

Require less configuration #131

Open
morgo opened this issue Feb 13, 2019 · 6 comments

@morgo

morgo commented Feb 13, 2019

Feature Request

Is your feature request related to a problem? Please describe:

Currently lightning requires a lot of configuration:

  • tikv-importer address
  • mydumper data-source-dir
  • tidb host
  • pd host
  • import dir* (temporary file location for tikv-importer)

I would like to find a way for it to be used with minimal configuration. This would improve convenience and better support novice users and experiments.

Describe the feature you'd like:

I am open to ideas on implementation:

  • Can lightning discover a tidb-host and tikv-importer host from PD? That might not work perfectly, since it won't know which one is closest. I think it could absolutely discover the pd host from TiDB though (making the configuration of PD optional).
  • Could lightning be embedded/bundled as a TiDB plugin? It would be nice to load data directly with SQL-like syntax:
    LIGHTNING LOAD 's3://path/to/mydumper' (using the local tidb and learning pd from it).
  • Can tikv-importer somehow become embedded too?
  • Could the mydumper data-source-dir just be moved from configuration, to argument $1 for tidb-lightning?

Describe alternatives you've considered:

This is really only about improving convenience/usability for casual use cases, so there are many alternative implementations. There are a lot of users who don't want to edit configuration files, but would rather just have a tool set up and running with no effort.

Teachability, Documentation, Adoption, Optimization:

@morgo morgo added the feature-request This issue is a feature request label Feb 13, 2019
@kennytm kennytm self-assigned this Feb 14, 2019
@kennytm
Collaborator

kennytm commented Feb 14, 2019

Can lightning discover a tidb-host and tikv-importer host from PD? That might not work perfectly, since it won't know which one is closest. I think it could absolutely discover the pd host from TiDB though (making the configuration of PD optional).

Thanks, looks like we could find the PD address from http://tidb-ip:10080/settings

Importer isn't registered on PD though, so we can't use PD to discover Importer.
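
As a rough illustration of the PD discovery idea, here is a minimal sketch in Python. It assumes the `/settings` response is a JSON dump of the TiDB configuration in which `store` is `"tikv"` and `path` holds a comma-separated list of PD endpoints; the function names are hypothetical and this is not Lightning's actual implementation.

```python
import json
from typing import List
from urllib.request import urlopen


def pd_addrs_from_settings(settings: dict) -> List[str]:
    """Extract PD addresses from a decoded /settings response.

    Assumption: `path` is a comma-separated list of PD endpoints,
    and it is only meaningful when `store` is "tikv".
    """
    if settings.get("store") != "tikv":
        return []
    path = settings.get("path", "")
    return [addr.strip() for addr in path.split(",") if addr.strip()]


def discover_pd(tidb_host: str, status_port: int = 10080,
                timeout: float = 5.0) -> List[str]:
    """Query a TiDB server's status port and return the PD addresses it uses."""
    url = f"http://{tidb_host}:{status_port}/settings"
    with urlopen(url, timeout=timeout) as resp:
        return pd_addrs_from_settings(json.load(resp))
```

With something like this, the PD address would only need to be configured explicitly when the TiDB status port is unreachable.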

Could the mydumper data-source-dir just be moved from configuration, to argument $1 for tidb-lightning?

Yes
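
A minimal sketch of how the positional argument could coexist with the existing setting, written in Python for brevity. The program name and helper functions are hypothetical; the only thing taken from Lightning is the `data-source-dir` key under `[mydumper]`. The assumption here is that the command-line value wins and the configuration file remains a fallback.

```python
import argparse
from typing import List, Optional


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="tidb-lightning-sketch")
    # Hypothetical: optional positional argument ($1) for the mydumper directory.
    parser.add_argument("data_source_dir", nargs="?", default=None)
    parser.add_argument("--config", default=None)
    return parser.parse_args(argv)


def resolve_data_source_dir(args: argparse.Namespace,
                            file_config: dict) -> Optional[str]:
    """Prefer the positional argument; fall back to the config file value."""
    if args.data_source_dir:
        return args.data_source_dir
    return file_config.get("mydumper", {}).get("data-source-dir")
```

Keeping the configuration key as a fallback means existing deployments would continue to work unchanged.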

Could lightning be embedded/bundled as a TiDB plugin?

Can tikv-importer somehow become embedded too?

cc @jackysp?

A plugin is basically a *.so library. We could either place the Lightning/Importer code inside the plugin, or make the plugin a front end that controls Lightning/Importer on another machine.

Placing the code directly inside the plugin is the same as the "mixed deployment" strategy which we no longer recommend. Lightning/Importer are resource intensive programs and doing so may bring down the cluster due to using up all CPU and network bandwidth (making the cluster unresponsive).

+---------------+
| TiDB          |
| +-----------+ |
| | Lightning | |
| +-----------+ |  +------+
| | Importer  +----+ TiKV |
| +-----------+ |  +------+
+---------------+

Placing the code outside the plugin means the user still needs to deploy the two programs. The only difference is being able to execute the command as SQL vs command line. IMO this doesn't improve much usability 🙃.

+--------------+
| TiDB         |
| +----------+ | +-----------+
| | (plugin) +---+ Lightning |
| +----------+ | +-----+-----+
+--------------+       |
                 +-----+-----+  +------+
                 | Importer  +--+ TiKV |
                 +-----------+  +------+

@morgo
Author

morgo commented Feb 14, 2019

Importer isn't registered on PD though, so we can't use PD to discover Importer.

.. but it could be? :-) So start tikv-importer with the address of a pd server. This is a similar request to pingcap/tidb#6435

Placing the code directly inside the plugin is the same as the "mixed deployment" strategy which we no longer recommend. Lightning/Importer are resource intensive programs and doing so may bring down the cluster due to using up all CPU and network bandwidth (making the cluster unresponsive).

This is true in the case of a multi-tenant TiDB cluster, but in the common case of lightning, I think I would not be using the cluster until after the data has been restored. So resource saturation is not a problem.

@jackysp
Member

jackysp commented Feb 14, 2019

I think making lightning a plugin of TiDB is a good idea. But if tikv-importer is also embedded, we may run into some CGO issues?

@kennytm
Collaborator

kennytm commented Feb 14, 2019

@morgo

.. but it could be? :-) So start tikv-importer with the address of a pd server.

This means we also need to supply the PD address to tikv-importer, which is yet another configuration item ;)

@morgo
Author

morgo commented Feb 14, 2019

Placing the code outside the plugin means the user still needs to deploy the two programs. The only difference is being able to execute the command as SQL vs command line. IMO this doesn't improve much usability 🙃.

I think there are actually a few differences here:

  • I normally run command line programs from my bastion host, but it doesn't have the right specs to run tikv-importer/lightning. That means I need to provision a VM to run lightning (whose specs will likely be similar to TiDB's: lots of CPUs and memory). So I am actually happy to use the mixed deployment model to avoid thinking about provisioning (even if I am using cloud).
  • Using SQL to initiate means the pd/tidb locations are already known (tikv-importer is not).
  • Using SQL means avoiding local shell access (we could more easily incorporate it into the web GUI, work with various security policies, etc.). Ideally it would not rely on local file access either, but I have created "Restore from S3 compatible API?" (#69) for that.

@IANTHEREAL
Collaborator

Regarding embedding: making lightning easy to use would be of great benefit to the product.
In addition, lightning may support online import later, so we need to consider the hybrid deployment carefully.
Another idea: if lightning could exist as a special form of TiDB (serving imports only), maybe that would solve this problem.

However, this cannot be achieved immediately, so we can first focus on optimizing the configuration.
