[DOCS] Augments job validation tips (#21184)
lcawl committed Jul 27, 2018
1 parent c82e5a2 commit 9395e7c
Showing 2 changed files with 75 additions and 14 deletions.
2 changes: 1 addition & 1 deletion docs/ml/creating-jobs.asciidoc
@@ -1,6 +1,6 @@
[role="xpack"]
[[ml-jobs]]
== Creating Machine Learning Jobs
== Creating machine learning jobs

Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task.
87 changes: 74 additions & 13 deletions docs/ml/job-tips.asciidoc
@@ -1,24 +1,28 @@
[role="xpack"]
[[job-tips]]
=== Machine Learning Job Tips
=== Machine learning job tips
++++
<titleabbrev>Job Tips</titleabbrev>
<titleabbrev>Job tips</titleabbrev>
++++

When you are creating a job in {kib}, the job creation wizards can provide
advice based on the characteristics of your data. By heeding these suggestions,
you can create jobs that are more likely to produce insightful {ml} results.

[[bucket-span]]
==== Bucket Span
==== Bucket span

The bucket span is the time interval that {ml} analytics use to summarize and
model data for your job. When you create a job in {kib}, you can choose to
estimate a bucket span value based on your data characteristics. Typically, the
estimated value is between 5 minutes and 1 hour. If you choose a value that is
larger than one day or is significantly different from the estimated value, you
receive an informational message. For more information about choosing an
appropriate bucket span, see {xpack-ref}/ml-buckets.html[Buckets].
estimate a bucket span value based on your data characteristics.

NOTE: The bucket span must contain a valid time interval. For more information,
see {ref}/ml-job-resource.html#ml-analysisconfig[Analysis configuration objects].

If you choose a value that is larger than one day or is significantly different
from the estimated value, you receive an informational message. For more
information about choosing an appropriate bucket span, see
{xpack-ref}/ml-buckets.html[Buckets].
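
For example, this is a minimal sketch of where `bucket_span` fits in the job
configuration if you create the job with the {ref}/ml-put-job.html[create jobs
API] instead of the wizards. The job name, time field, and interval shown here
are hypothetical:

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/event-rate-example
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "count"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
--------------------------------------------------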

[[cardinality]]
==== Cardinality
@@ -30,16 +34,26 @@ detect when users are accessing resources differently than they usually do.

If the field that you use to split your data has many different values, the
job uses more memory resources. In particular, if the cardinality of the
`partition_field_name` is greater than 100, you are advised to consider
alternative options such as population analysis.
`by_field_name`, `over_field_name`, or `partition_field_name` is greater than
1000, you are advised that there might be high memory usage.

Likewise, if you are performing population analysis and the cardinality of the
`over_field_name` is below 10, you are advised that this might not be a suitable
field to use.

For more information, see
field to use. For more information, see
{xpack-ref}/ml-configuring-pop.html[Performing Population Analysis].
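
For example, population analysis is enabled by setting `over_field_name` in a
detector. The following sketch assumes a hypothetical high-cardinality field
named `client_ip`:

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/population-example
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "high_count",
        "over_field_name": "client_ip"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
--------------------------------------------------

With this configuration, each `client_ip` is modeled relative to the behavior
of the population as a whole, which typically requires less memory than
splitting the data with `partition_field_name`.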

[[detectors]]
==== Detectors

Each job must have one or more _detectors_. A detector applies an analytical
function to specific fields in your data. If your job does not contain a
detector or the detector does not contain a
{stack-ov}/ml-functions.html[valid function], you receive an error.

If a job contains duplicate detectors, you also receive an error. Detectors are
duplicates if they have the same `function`, `field_name`, `by_field_name`,
`over_field_name`, and `partition_field_name`.
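
For example, the following two detectors are not duplicates, because their
`function` values differ even though they share the same `field_name` and
`by_field_name`. The job and field names are hypothetical:

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/duplicate-check-example
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "mean",
        "field_name": "responsetime",
        "by_field_name": "airline"
      },
      {
        "function": "max",
        "field_name": "responsetime",
        "by_field_name": "airline"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
--------------------------------------------------

If the second detector also used the `mean` function, the job would be
rejected with a duplicate detector error.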

[[influencers]]
==== Influencers

@@ -60,3 +74,50 @@ do not need more than three. If you pick many influencers, the results can be
overwhelming and there is a small overhead to the analysis.

The job creation wizards in {kib} can suggest which fields to use as influencers.
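
For example, influencers are listed in the `influencers` property of the
analysis configuration. This sketch assumes hypothetical `user` and
`client_ip` fields that exist in the analyzed data:

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/influencer-example
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes_sent"
      }
    ],
    "influencers": [
      "user",
      "client_ip"
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
--------------------------------------------------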

[[model-memory-limits]]
==== Model memory limits

For each job, you can optionally specify a `model_memory_limit`, which is the
approximate maximum amount of memory resources that are required for analytical
processing. The default value is 1 GB. Once this limit is approached, data
pruning becomes more aggressive. Upon exceeding this limit, new entities are not
modeled.
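
For example, you can set the limit in the `analysis_limits` object when you
create the job. The job name, field names, and limit shown here are
hypothetical:

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/memory-example
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "metric",
        "field_name": "responsetime",
        "partition_field_name": "airline"
      }
    ]
  },
  "analysis_limits": {
    "model_memory_limit": "512mb"
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
--------------------------------------------------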

You can also optionally specify the `xpack.ml.max_model_memory_limit` setting.
By default, it's not set, which means there is no upper bound on the acceptable
`model_memory_limit` values in your jobs.
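
For example, a sketch of the setting in `elasticsearch.yml`; the value is
hypothetical and should reflect the memory available on your {ml} nodes:

[source,yaml]
--------------------------------------------------
xpack.ml.max_model_memory_limit: 1gb
--------------------------------------------------

With this setting in place, any attempt to create a job with a
`model_memory_limit` greater than 1 GB fails.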

TIP: If you set the `model_memory_limit` too high, it will be impossible to open
the job; jobs cannot be allocated to nodes that have insufficient memory to run
them.

If the estimated model memory limit for a job is greater than the model memory
limit for the job or the maximum model memory limit for the cluster, the job
creation wizards in {kib} generate a warning. If the estimated memory
requirement is only a little higher than the `model_memory_limit`, the job will
probably produce useful results. Otherwise, the actions you take to address
these warnings vary depending on the resources available in your cluster:

* If you are using the default value for the `model_memory_limit` and the {ml}
nodes in the cluster have lots of memory, the best course of action might be to
simply increase the job's `model_memory_limit`. Before doing this, however,
double-check that the chosen analysis makes sense. The default
`model_memory_limit` is relatively low to avoid accidentally creating a job that
uses a huge amount of memory.
* If the {ml} nodes in the cluster do not have sufficient memory to accommodate
a job of the estimated size, the only options are:
** Add bigger {ml} nodes to the cluster, or
** Accept that the job will hit its memory limit and will not necessarily find
all the anomalies it could otherwise find.

If you are using {ece} or the hosted Elasticsearch Service on Elastic Cloud,
`xpack.ml.max_model_memory_limit` is set to prevent you from creating jobs
that cannot be allocated to any {ml} nodes in the cluster. If you find that you
cannot increase `model_memory_limit` for your {ml} jobs, the solution is to
increase the size of the {ml} nodes in your cluster.

For more information about the `model_memory_limit` property and the
`xpack.ml.max_model_memory_limit` setting, see
{ref}/ml-job-resource.html#ml-analysisconfig[Analysis limits] and
{ref}/ml-settings.html[Machine learning settings].
