Use structured logs and export traces

Log to a file in ~/.porter/logs/CORRELLATION_ID.json and in colored human readable output to stderr. When stderr is not a TTY, color is omitted (which is what we need for piping to a file or running on headless CI servers). Tracing has been enabled using open-telemetry and can be configured with the standard OTEL environment variables. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#specify-protocol This is just a stub of the final tracing and log switch over, that lays down a framework for converting the rest and instrumenting more commands. For now we start a trace for the command executed. Signed-off-by: Carolyn Van Slyck <me@carolynvanslyck.com>
carolynvs · Jan 4, 2022 · 631831f · 631831f
1 parent fdeebb9
commit 631831f
Show file tree

Hide file tree

Showing 79 changed files with 1,773 additions and 280 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -14,6 +14,7 @@
   * [Makefile explained](#makefile-explained)
   * [Install mixins](#install-mixins)
   * [Preview documentation](#preview-documentation)
+  * [View a trace of a Porter command](#view-a-trace-of-a-porter-command)
   * [Write a blog post](#write-a-blog-post)
 * [Code structure and practices](#code-structure-and-practices)
   * [What is the general code layout?](#what-is-the-general-code-layout)
@@ -436,6 +437,42 @@ a new blog post and then preview it:
 Our pull request preview and the live site will not show posts with a date in
 the future. If you don't see your post, change the date to today's date.
 
+## View a trace of a Porter command
+
+Porter has an experimental feature, structured-logs, that sends trace data about the commands run to an OpenTelemetry backend.
+It can be very helpful when figuring out why a command failed because you can see the values of variables and stack traces.
+
+In development, you can use the [otel-jaeger bundle] to set up a development instance of Jaeger, which gives you a nice website to see each command run.
+
+```
+porter install --reference carolynvs/otel-jaeger:v0.1.0 --allow-docker-host-access
+```
+
+Then to turn on tracing in Porter, set the following environment variables.
+This tells Porter to turn on tracing, and connect to OpenTelemetry server that you just installed.
+
+**Posix**
+```bash
+export PORTER_EXPERIMENTAL="structured-logs"
+export PORTER_TELEMETRY_ENABLED="true"
+export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
+export OTEL_EXPORTER_OTLP_INSECURE="true"
+```
+
+**Powershell**
+```powershell
+$env:PORTER_EXPERIMENTAL="structured-logs"
+$env:PORTER_TELEMETRY_ENABLED="true"
+$env:OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
+$env:OTEL_EXPORTER_OTLP_INSECURE="true"
+```
+
+Next run a Porter command to generate some trace data, such as `porter list`.
+Then go to the Jaeger website to see your data: http://localhost:16686.
+On the Jaeger dashboard, select "porter" from the service drop down, and click "Find Traces".
+
+[otel-jaeger bundle]: https://github.com/getporter/example-bundles/tree/main/otel-jaeger
+
 ## Command Documentation
 
 Our commands are documented at <https://porter.sh/cli> and that documentation is

diff --git a/cmd/porter/bundle.go b/cmd/porter/bundle.go
@@ -63,7 +63,7 @@ func buildBundleBuildCommand(p *porter.Porter) *cobra.Command {
 			return opts.Validate(p)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.Build(opts)
+			return p.Build(cmd.Context(), opts)
 		},
 	}
 
@@ -144,7 +144,7 @@ For example, the 'debug' driver may be specified, which simply logs the info giv
 			return opts.Validate(args, p)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.InstallBundle(opts)
+			return p.InstallBundle(cmd.Context(), opts)
 		},
 	}
 
@@ -196,7 +196,7 @@ For example, the 'debug' driver may be specified, which simply logs the info giv
 			return opts.Validate(args, p)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.UpgradeBundle(opts)
+			return p.UpgradeBundle(cmd.Context(), opts)
 		},
 	}
 
@@ -249,7 +249,7 @@ For example, the 'debug' driver may be specified, which simply logs the info giv
 			return opts.Validate(args, p)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.InvokeBundle(opts)
+			return p.InvokeBundle(cmd.Context(), opts)
 		},
 	}
 
@@ -304,7 +304,7 @@ For example, the 'debug' driver may be specified, which simply logs the info giv
 			return opts.Validate(args, p)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.UninstallBundle(opts)
+			return p.UninstallBundle(cmd.Context(), opts)
 		},
 	}
 
@@ -356,7 +356,7 @@ Note: if overrides for registry/tag/reference are provided, this command only re
 			return opts.Validate(p.Context)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.Publish(opts)
+			return p.Publish(cmd.Context(), opts)
 		},
 	}
 
@@ -388,7 +388,7 @@ func buildBundleArchiveCommand(p *porter.Porter) *cobra.Command {
 			return opts.Validate(args, p)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.Archive(opts)
+			return p.Archive(cmd.Context(), opts)
 		},
 	}
 

diff --git a/cmd/porter/credentials.go b/cmd/porter/credentials.go
@@ -109,7 +109,7 @@ will then provide it to the bundle in the correct location. `,
 			return opts.Validate(args, p)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.GenerateCredentials(opts)
+			return p.GenerateCredentials(cmd.Context(), opts)
 		},
 	}
 

diff --git a/cmd/porter/explain.go b/cmd/porter/explain.go
@@ -24,7 +24,7 @@ func buildBundleExplainCommand(p *porter.Porter) *cobra.Command {
 			return opts.Validate(args, p.Context)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.Explain(opts)
+			return p.Explain(cmd.Context(), opts)
 		},
 	}
 	f := cmd.Flags()

diff --git a/cmd/porter/inspect.go b/cmd/porter/inspect.go
@@ -26,7 +26,7 @@ like parameters, credentials, outputs and custom actions available.
 			return opts.Validate(args, p.Context)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.Inspect(opts)
+			return p.Inspect(cmd.Context(), opts)
 		},
 	}
 	f := cmd.Flags()

diff --git a/cmd/porter/installations.go b/cmd/porter/installations.go
@@ -118,7 +118,7 @@ You can use the show command to create the initial file:
 			return opts.Validate(p.Context, args)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.InstallationApply(opts)
+			return p.InstallationApply(cmd.Context(), opts)
 		},
 	}
 

diff --git a/cmd/porter/main.go b/cmd/porter/main.go
@@ -1,13 +1,19 @@
 package main
 
 import (
+	"context"
 	_ "embed"
+	"fmt"
 	"os"
+	"strings"
 
 	"get.porter.sh/porter/pkg/cli"
+	"get.porter.sh/porter/pkg/experimental"
 	"get.porter.sh/porter/pkg/porter"
+	"github.com/pkg/errors"
 	"github.com/spf13/cobra"
 	"github.com/spf13/pflag"
+	"go.opentelemetry.io/otel/attribute"
 )
 
 var includeDocsCommand = false
@@ -16,13 +22,67 @@ var includeDocsCommand = false
 var usageText string
 
 func main() {
-	p := porter.New()
-	defer p.Close()
+	run := func() int {
+		p := porter.New()
+
+		rootCmd := buildRootCommandFrom(p)
+
+		// Trace the command that called porter, e.g. porter installation show
+		calledCommand, formattedCommand := getCalledCommand(rootCmd)
+		ctx, log := p.Log.StartSpan(context.Background(), calledCommand, attribute.String("command", formattedCommand))
+		defer func() {
+			// Capture panics and trace them
+			if panicErr := recover(); panicErr != nil {
+				log.Error(errors.New(fmt.Sprintf("%s", panicErr)), attribute.Bool("panic", true))
+				log.EndSpan()
+				p.Close()
+				os.Exit(1)
+			} else {
+				log.EndSpan()
+				p.Close()
+			}
+		}()
+
+		if err := rootCmd.ExecuteContext(ctx); err != nil {
+			// Ideally we log all errors in the span that generated it,
+			// but as a failsafe, always log the error a the root span as well
+			log.Error(err)
+			return 1
+		}
+		return 0
+	}
 
-	cmd := buildRootCommandFrom(p)
-	if err := cmd.Execute(); err != nil {
-		os.Exit(1)
+	// Wrapping the main run logic in a function because os.Exit will not
+	// execute defer statements
+	os.Exit(run())
+}
+
+// Returns the porter command called, e.g. porter installation list
+// and also the fully formatted command as passed with arguments/flags.
+func getCalledCommand(cmd *cobra.Command) (string, string) {
+	// Ask cobra what sub-command was called, and walk up the tree to get the full command called.
+	var cmdChain []string
+	cmd, _, err := cmd.Find(os.Args[1:])
+	if err != nil {
+		cmdChain = append(cmdChain, "porter")
+	} else {
+		for cmd != nil {
+			cmdChain = append(cmdChain, cmd.Name())
+			cmd = cmd.Parent()
+		}
+	}
+	// reverse the command from [list installations porter] to porter installation list
+	var calledCommand strings.Builder
+	for i := len(cmdChain); i > 0; i-- {
+		calledCommand.WriteString(cmdChain[i-1])
+		calledCommand.WriteString(" ")
 	}
+	calledCommandStr := calledCommand.String()[0 : calledCommand.Len()-1]
+
+	// Also figure out the full command called, with args/flags.
+	formattedCommand := fmt.Sprintf("porter %s", strings.Join(os.Args[1:], " "))
+
+	return calledCommandStr, formattedCommand
 }
 
 func buildRootCommand() *cobra.Command {
@@ -57,7 +117,17 @@ Try our QuickStart https://porter.sh/quickstart to learn how to use Porter.
 				return nil
 			default:
 				p.DataLoader = cli.LoadHierarchicalConfig(cmd)
-				return p.LoadData()
+				err := p.LoadData()
+				if err != nil {
+					return err
+				}
+
+				if p.Config.IsFeatureEnabled(experimental.FlagStructuredLogs) {
+					// When structured logging is enabled, the error is printed
+					// to the console by the logger, we don't need to re-print it again.
+					cmd.Root().SilenceErrors = true
+				}
+				return nil
 			}
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {

diff --git a/cmd/porter/parameters.go b/cmd/porter/parameters.go
@@ -109,7 +109,7 @@ will then provide it to the bundle in the correct location. `,
 			return opts.Validate(args, p)
 		},
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return p.GenerateParameters(opts)
+			return p.GenerateParameters(cmd.Context(), opts)
 		},
 	}
 

diff --git a/docs/config.toml b/docs/config.toml
@@ -207,15 +207,21 @@ pygmentsStyle = "friendly"
       [[menu.main]]
         name = "Copy a Bundle"
         identifier = "copy-bundles"
-        url = "/copy-bundles"
+        url = "/copy-bundles/"
         weight = 5
         parent = "tasks-administrators"
       [[menu.main]]
         name = "Move a bundle across an airgap"
         identifier = "archive-bundles"
-        url = "/archive-bundles"
+        url = "/archive-bundles/"
         weight = 6
         parent = "tasks-administrators"
+      [[menu.main]]
+        name = "Collect Diagnostics"
+        identifier = "diagnostics"
+        url = "/administrators/diagnostics/"
+        weight = 100
+        parent = "tasks-administrators"
 
   # Developers
   [[menu.main]]

diff --git a/docs/content/administrators/diagnostics.md b/docs/content/administrators/diagnostics.md
@@ -0,0 +1,45 @@
+---
+title: Collect Diagnostics from Porter
+description: How to configure Porter to generate logs and telemetry data for diagnostic purposes
+---
+
+When the [structured-logs experimental feature][structured-logs] is enabled, Porter generates two types of data to assist with diagnostics and troubleshooting:
+
+* [Logs](#logs)
+* [Telemetry](#telemetry)
+
+## Logs
+
+Porter can be configured to write logs to the PORTER_HOME/logs directory, for example ~/.porter/logs.
+Each time a Porter command is executed, a new log file is created, formatted in json, containing all the logs output by the command.
+The log lines are filtered by the current log level.
+
+The name of each log file is the command's _correlationId_ which can be used to find the trace for the command executed in the configured [telemetry](#telemetry) backend.
+See [Log Settings] for details on how to configure logging.
+
+## Telemetry
+
+Porter is compatible with the [OpenTelemetry] specification and generates trace data that can be sent to a [compatible services][compat].
+When the [structured-logs experimental feature][structured-logs] and [telemetry] is enabled, Porter automatically uses the standard [OpenTelemetry environment variables] to configure the trace exporter.
+
+Below is an example trace from running the porter upgrade command. You can see timings for each part of the command, and relevant variables used.
+
+![Screen shot of the Jaeger UI showing that porter upgrade was run](/administrators/jaeger-trace-example.png)
+
+If you are running a local grpc OpenTelemetry collector, for example with the [otel-jaeger bundle], you can set the following environment variables to have Porter send telemetry data to it. This turns on the [structured-logs experimental feature][structured-logs], enables telemetry, and uses standard OpenTelemetry environment variables to point to an unsecured grpc OpenTelemetry collector.
+
+* PORTER_EXPERIMENTAL: structured-logs
+* PORTER_TELEMETRY_ENABLED: true
+* OTEL_EXPORTER_OTLP_PROTOCOL: grpc
+* OTEL_EXPORTER_OTLP_ENDPOINT: 127.0.0.1:4317
+* OTEL_EXPORTER_OTLP_INSECURE: true
+
+See [Telemetry Settings][telemetry] for all the supported configuration settings.
+
+[compat]: https://opentelemetry.io/vendors/
+[OpenTelemetry environment variables]: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.8.0/specification/protocol/exporter.md
+[telemetry]: /configuration/#telemetry
+[Log Settings]: /configuration/#logs
+[structured-logs]: /configuration/#structured-logs
+[OpenTelemetry]: https://opentelemetry.io
+[otel-jaeger bundle]: https://github.com/getporter/example-bundles/tree/main/otel-jaeger
diff --git a/docs/content/administrators/jaeger-trace-example.png b/docs/content/administrators/jaeger-trace-example.png