Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker thread not instrumented using workerData #3486

Closed
amoscatelli opened this issue Dec 13, 2022 · 12 comments
Closed

Worker thread not instrumented using workerData #3486

amoscatelli opened this issue Dec 13, 2022 · 12 comments
Assignees
Labels
bug Something isn't working information-requested Bug is waiting on additional information from the user triage

Comments

@amoscatelli
Copy link

amoscatelli commented Dec 13, 2022

What happened?

Steps to Reproduce

Instrumenting Worker Thread doesn't work when using workerData api.
Using postMessage and message handlers are correctly instrumented.

So for example this works :

                            const worker = new Worker(
                                ... path to job.js
                            );
                            worker.postMessage(
                                <JobParameters>{
                                    ...
                                }
                            );
parentPort.once(
    'message',
    async (jobParameters: JobParameters) => {
        // code here is correctly instrumented
    }
);

Instrumenting doesn't work using workerData instead:

                            const worker = new Worker(
                                ... path to job.js,
                                {
                                    argv: process.argv,
                                    env: process.env,
                                    workerData: <JobParameters>{
                                         ...
                                    },
                                    execArgv: process.execArgv
                                }
                            );
import { workerData } from 'worker_threads';

const jobParameters: JobParameters = workerData;

// code here is not instrumented !

Expected Result

Node.js should be correctly instrumented in a worker thread using workerData, just like when using postMessage.

Actual Result

Node.js is not correctly instrumented, for example http api calls are not traced.

Additional Details

Probably this is a timing issue.
Using postMessage, the Worker Thread message handler, that is the lambda from the code above, has more time to be instrumented.
When using workerData, code is executed earlier, because it is executed straight away without register any handler.

Also I double checked that the issue is present both propagating and not propagating the --require tracing.js and other opentelemetry configuration to the worker thread via argv, env and execArgv as you can see in the code above.

OpenTelemetry Setup Code

import { NodeSDK } from "@opentelemetry/sdk-node";
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-http";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';

diag.setLogger(
    new DiagConsoleLogger(),
    DiagLogLevel.INFO
);

let sdk = new NodeSDK(
    {
        traceExporter: new OTLPTraceExporter(),
        metricReader: new PeriodicExportingMetricReader(
            {
                exporter: new OTLPMetricExporter()
            }
        ),
        instrumentations: [
            getNodeAutoInstrumentations()
        ]
    }
);

process.on('SIGTERM', () => sdk.shutdown());

sdk.start()
    .then(
        () => console.log('Tracing initialized')
    ).catch(
        error => console.error('Error initializing tracing ' + error)
    );

package.json

"@opentelemetry/auto-instrumentations-node": "0.35.0",
    "@opentelemetry/exporter-metrics-otlp-http": "0.33.0",
    "@opentelemetry/exporter-trace-otlp-http": "0.33.0",
    "@opentelemetry/sdk-node": "0.33.0",

Relevant log output

No output,
traces/metrics are simply missing/not sent.
@amoscatelli amoscatelli added bug Something isn't working triage labels Dec 13, 2022
@dyladan
Copy link
Member

dyladan commented Dec 14, 2022

Does your worker have its own OTel setup code or do you rely on the main setup for this? We don't do anything particular to support worker threads so I have to assume your worker has its own setup? If that's the case, you may want to try doing a manual setup using trace-base directly. There is a known issue where the SDK setup is asynchronous so you would miss any spans started before the resource detection completes.

@amoscatelli
Copy link
Author

It relies on the main setup.
I don't do anything particular.

Strange thing is that the main setup is inside a javascript used with the --require.

I don't want to manually setup the worker thread tracing, isn't there any way to 'wait' for the required javascript to be fully loaded ?

@amoscatelli
Copy link
Author

Maybe I could try to wait the sdk start ?

@legendecas
Copy link
Member

legendecas commented Dec 20, 2022

The worker environment and the main context environment are two distinct JavaScript execution environments. We can not share direct JavaScript object access between them. The SDK must be properly set up in both environments if you would like to instrument in the worker environment too.

@amoscatelli
Copy link
Author

amoscatelli commented Dec 20, 2022

When I call the WorkerThread the --require switch is used/forwarded just like for the main thread.
The environment variables are copied too.
I can see the sdk is set up :

15:19:54.216 INFO  [ 'Tracing initialized' ] 
15:19:57.205 INFO  [ 'REST JSON -> TYPED JSON : 702 milliseconds.' ] 
15:19:57.962 INFO  [ 'REST JSON -> TYPED JSON : 726 milliseconds.' ] 

The first log line is about sdk being initialized, the other two are about the computations done by the worker threads.
Yet my axios traces are NOT sent, if I use workerData.

I also tried using the await in the tracing.js setup, and this still doesn't work.

Using sendMessage the instrumentation makes the instrumentation work (and I can see my axios traces), so, again, my guess is that workerData is too straightforward and instrumentation isn't completed when axios calls are done by the worker threads.

Is there any wait to wait for the instrumentation to be completed using the --require switch with node ?

@Flarna
Copy link
Member

Flarna commented Dec 21, 2022

This is because of the async nature of NodeSDK in case resource detection is done (which is the default). see here

It also happens on main thread.

You can avoid it by using the NodeTracerProvider which starts sync.

e.g. in your tracing.js

const { ConsoleSpanExporter, SimpleSpanProcessor } = require("@opentelemetry/sdk-trace-base")
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node")

const traceExporter = new ConsoleSpanExporter();
const spanProcessor = new SimpleSpanProcessor(traceExporter);

const provider = new NodeTracerProvider();
provider.addSpanProcessor(spanProcessor);
provider.register();

Or you disable autodetection of Resource in NodeSDK.

postMessage is async therefore it happens after SDK has started.

It is vital to delay loading modules to instrument not only the actual operations. If you require the modules to instrument to early they are no instrumented. This could be improved (see #3146)

@pichlermarc pichlermarc added the information-requested Bug is waiting on additional information from the user label Jan 4, 2023
@amoscatelli
Copy link
Author

I think I'll wait for the resolution of #3502

Thank you

@pichlermarc
Copy link
Member

@amoscatelli does the latest version fix your issue? 🙂

@amoscatelli
Copy link
Author

I changed my workplace, so I'll try to engage one of my ex coworker to have this tested.

Thank you for your work !

@alessandro-tucci-visiontech

@amoscatelli @pichlermarc We will be able to try this within the next two weeks. I'll let you know as soon as I will have some useful feedback

@dyladan
Copy link
Member

dyladan commented Mar 1, 2023

Any update?

@alessandro-tucci-visiontech

@dyladan We apologize for the delay wrt the expected return time.
We can now confirm that traces are correctly sent also in the workerData case

@dyladan dyladan closed this as completed Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working information-requested Bug is waiting on additional information from the user triage
Projects
None yet
Development

No branches or pull requests

6 participants