OpenTelemetry (OTel) logging in ARAX #2186

Closed
saramsey opened this issue Oct 26, 2023 · 23 comments

@saramsey
Member

Hi all,

The latest three-month development milestones for Translator are calling for all ARAs and KPs to implement OpenTelemetry logging of web API calls, by end of December. We will need to do this for ARAX and (I suppose, insofar as it does call PloverDB via a web API) RTX-KG2.

@saramsey
Member Author

saramsey commented Nov 1, 2023

I've reached out to Yaphet to better understand the requirements, and I have learned the following:

  1. Both ARAs and KPs are expected to implement OpenTelemetry logging
  2. Both incoming and outgoing web requests are to be logged via OpenTelemetry
  3. There will be one Jaeger server per ITRB maturity level, which I guess (from within ITRB) we can address like this:
jaeger_host: "jaeger-otel-agent.sri"
jaeger_port: "6831"

It is unclear which Jaeger service we can log to from services running on arax.ncats.io (I need to find out).

@saramsey
Member Author

saramsey commented Nov 16, 2023

Still discussing with Yaphet how to get a Jaeger server (addressable on the Internet) spun up that we can use in development work.

@saramsey
Member Author

saramsey commented Nov 16, 2023

Latest info from Yaphet is:

Hi @Chris Bizon (SRI, Ranking Agent), @ramseyst, we could use the docker compose file here https://github.com/TranslatorSRI/Jaeger-demo/blob/main/jaeger-docker-compose.yaml and point the code to port 4318 as seen here (https://github.com/TranslatorSRI/Jaeger-demo/blob/main/service-C/server.py#L41), and that would work; or the docker image itself can be stood up without the docker compose file, as outlined here (https://www.jaegertracing.io/docs/1.50/getting-started/#all-in-one).

@dkoslicki
Member

@kvnthomas98 This is something that NCATS wants quite soon. Is this something you can work on, pausing your other MVP2 work?

@saramsey
Member Author

Thank you @dkoslicki

@kvnthomas98 kvnthomas98 self-assigned this Nov 17, 2023
@kvnthomas98
Collaborator

Sure @dkoslicki, I will look into this.

@kvnthomas98
Collaborator

kvnthomas98 commented Nov 27, 2023

Hi @saramsey ,

Do you have the ITRB jaeger endpoints for each maturity level?

@saramsey
Member Author

We don't yet have access to an already-running (as in, provided for us by the SRI team) Jaeger endpoint that is on the Internet (though I understand that Yaphet is researching how to set that up).

However, within an ITRB-deployed container, I am told that the following OpenTelemetry configuration should work, and the hostname should resolve:

  jaeger_host: "jaeger-otel-agent.sri"
  jaeger_port: "6831"

I have not tested that, however. And it seems (to me) not ideal if our only way to test it out is to deploy to ITRB CI.
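One way to avoid testing only in ITRB CI is to make the host and port overridable, so the same code can point at a different Jaeger during development. A minimal sketch, assuming the ITRB-style `jaeger_host`/`jaeger_port` keys above; the `JAEGER_HOST`/`JAEGER_PORT` environment variable names and the class are hypothetical. Note that 6831 is the Jaeger agent's UDP port for compact Thrift, so this config suits the Jaeger Thrift exporter rather than OTLP.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JaegerAgentConfig:
    host: str
    port: int

    @classmethod
    def from_mapping(cls, cfg: dict, env=None) -> "JaegerAgentConfig":
        """Build from ITRB-style keys; hypothetical JAEGER_HOST/JAEGER_PORT env vars win."""
        env = os.environ if env is None else env
        host = env.get("JAEGER_HOST", cfg["jaeger_host"])
        port = int(env.get("JAEGER_PORT", cfg["jaeger_port"]))  # the config quotes the port
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
        return cls(host=host, port=port)


itrb_cfg = {"jaeger_host": "jaeger-otel-agent.sri", "jaeger_port": "6831"}
itrb = JaegerAgentConfig.from_mapping(itrb_cfg, env={})
dev = JaegerAgentConfig.from_mapping(itrb_cfg, env={"JAEGER_HOST": "jaeger.rtx.ai"})
print(itrb, dev)
```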

@edeutsch
Collaborator

Note that they are working on some documentation here:
https://github.com/NCATSTranslator/TranslatorTechnicalDocumentation/pull/53/files

It would likely be useful to read that and provide feedback/comments. It is an open PR.

@saramsey
Member Author

@kvnthomas98 I think the documentation that @edeutsch linked is helpful; it explains how to run a local Jaeger, which we can use in development and testing. I was hoping that SRI would provide us with an Internet-addressable Jaeger endpoint that we could use in testing, but apparently there are some issues with that (so it is still pending). In the meantime, I think maybe we should try moving forward with using a "local Jaeger" for development and testing on arax.ncats.io. See this section of the documentation that Eric linked:
https://github.com/NCATSTranslator/TranslatorTechnicalDocumentation/blob/214dcfef8465c95c1f68b0f62549b43442c23a30/docs/deployment-guide/monitoring.md?plain=1#L25-L47

@edeutsch
Collaborator

@saramsey
Member Author

saramsey commented Nov 30, 2023

@kvnthomas98 what kind of EC2 instance would you need for hosting a Jaeger collector? Can you describe the hardware requirements? And storage requirements? Also what version of Ubuntu? I think we typically use Ubuntu 22.04?

@saramsey
Member Author

saramsey commented Nov 30, 2023

Hi all, from the Translator Release Schedule Timeline Google sheet, it's looking like we have two weeks to code this issue and get it into CI; I think the opportunity to push these updates to TEST will be on Dec. 15.

@dkoslicki
Member

Oof, Kevin is currently working on an ordering and organizing ask that also has the same deadline.

@saramsey saramsey self-assigned this Nov 30, 2023
@saramsey
Member Author

saramsey commented Dec 1, 2023

Looks like the previous commit was to the issue2186 branch (thank you @kvnthomas98 )

@kvnthomas98
Collaborator

kvnthomas98 commented Dec 1, 2023

Hi @saramsey,
Sorry I missed the message.
For hardware requirements, an m5.large should do, since the collector is lightweight and we don't have a crazy load. If we want to be cautious, an m5.xlarge should do. Please do share your thoughts.
For storage, I was thinking we could use Elasticsearch.
Regarding storage volume, I have no idea how much storage we need.

Once you've brought up the instance, please do let me know. I can work on setting up docker, jaeger collector and elastic search and testing our ARAX code.

@saramsey
Member Author

saramsey commented Dec 2, 2023

Hi @kvnthomas98, I have created an m5.large instance jaeger.rtx.ai in the us-east-1 region, with 64 GiB of EBS storage. I set up the AWS security group policy for the instance to allow ingress packets to ports 16686/tcp (Jaeger front-end) and 4318/tcp (Jaeger OTel via HTTP) from the CIDR block 35.81.149.105/32 (i.e., from arax.ncats.io). I installed your SSH RSA public key into the instance, so you should be able to log into it from the command line via

```shell
ssh -o StrictHostKeyChecking=no ubuntu@jaeger.rtx.ai
```
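A `/32` CIDR block matches exactly one IPv4 address, so the rule above admits traffic only from arax.ncats.io's address. A quick way to check which addresses such a rule covers, using Python's standard `ipaddress` module (the helper and constant names are illustrative):

```python
import ipaddress

# The ingress source used on jaeger.rtx.ai: a single-host (/32) CIDR block.
ARAX_ONLY = ipaddress.ip_network("35.81.149.105/32")


def is_allowed(ip: str, network=ARAX_ONLY) -> bool:
    """Return True if `ip` falls inside the CIDR block `network`."""
    return ipaddress.ip_address(ip) in network


print(ARAX_ONLY.num_addresses)       # 1
print(is_allowed("35.81.149.105"))   # True
print(is_allowed("35.81.149.106"))   # False
```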

I've installed docker (from docker.io) into the instance, and I've already pulled the Docker image jaegertracing/all-in-one from DockerHub via

```shell
sudo docker pull jaegertracing/all-in-one
```

You can run Jaeger locally via the command (which is adapted from the one in the installation instructions on the Jaeger website):

```shell
sudo docker run --rm --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest
```
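For reference, the sketch below lists what each published port is typically used for in the `jaegertracing/all-in-one` image (descriptions paraphrased from the Jaeger getting-started docs; worth re-checking against the version actually pulled) and rebuilds the same `docker run` argument list from that table. Only 16686 (front-end) and 4318 (OTLP over HTTP) are needed by the security group rule described above. Building the command as a list also sidesteps shell line continuations, where a stray space after a trailing backslash silently ends the command. Names here are illustrative.

```python
# (port, protocol, typical role in the all-in-one image)
JAEGER_PORTS = [
    (6831, "udp", "agent: jaeger.thrift over compact Thrift"),
    (6832, "udp", "agent: jaeger.thrift over binary Thrift"),
    (5778, "tcp", "agent: serve configs (e.g. sampling strategies)"),
    (16686, "tcp", "query: Jaeger front-end"),
    (4317, "tcp", "collector: OTLP over gRPC"),
    (4318, "tcp", "collector: OTLP over HTTP"),
    (14250, "tcp", "collector: model.proto over gRPC"),
    (14268, "tcp", "collector: jaeger.thrift directly from clients"),
    (14269, "tcp", "admin: health check and metrics"),
    (9411, "tcp", "collector: Zipkin-compatible endpoint"),
]


def docker_port_flags(ports=JAEGER_PORTS) -> list:
    """Turn the port table into `-p host:container[/udp]` arguments."""
    flags = []
    for port, proto, _role in ports:
        flags += ["-p", f"{port}:{port}" + ("/udp" if proto == "udp" else "")]
    return flags


cmd = ["sudo", "docker", "run", "--rm", "--name", "jaeger",
       "-e", "COLLECTOR_ZIPKIN_HOST_PORT=:9411",
       *docker_port_flags(), "jaegertracing/all-in-one:latest"]
print(" ".join(cmd))
```

The list could be passed directly to `subprocess.run(cmd, check=True)`.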

I've put that command in a shell-script /home/ubuntu/run-jaeger.sh. So, when I run the aforementioned shell script, the Jaeger front-end is reachable and works, as shown here:
[Screenshot 2023-12-02 at 7:47:40 AM: the Jaeger front-end]

and I can make a TCP connection from arax.ncats.io to port 16686 on the Jaeger server in us-east-1,

```
stephenr@ip-172-31-53-16:~$ nc -v jaeger.rtx.ai 16686
Connection to jaeger.rtx.ai 16686 port [tcp/*] succeeded!
```

and to port 4318 as well:

```
stephenr@ip-172-31-53-16:~$ nc -v jaeger.rtx.ai 4318
Connection to jaeger.rtx.ai 4318 port [tcp/*] succeeded!
```
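The same reachability check can be scripted in Python with only the standard library, for example as a pre-flight step before sending telemetry. A minimal sketch (the helper name is illustrative); it only applies to TCP ports such as 16686 and 4318, since a UDP port like 6831 accepts no connection that could be tested this way.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like `nc -v host port`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, unreachable, ...
        return False
```

For example, `can_connect("jaeger.rtx.ai", 4318)` should return `True` from arax.ncats.io while the instance and Jaeger are running.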

Just to minimize cost, I've opted to stop the instance until we are ready to test it.

So whenever you want to test out OpenTelemetry, simply do the following four steps:

  1. In the AWS Console, go to EC2 and start the jaeger.rtx.ai instance
  2. From your local computer, run ssh ubuntu@jaeger.rtx.ai ./run-jaeger.sh, which should start Jaeger
  3. Test OpenTelemetry as you like, pointing the OTel data stream at jaeger.rtx.ai:4318.
  4. When you are done, I suppose we can stop the jaeger.rtx.ai instance (until such time as we deploy telemetry to arax.ncats.io and we need jaeger.rtx.ai to be running all the time).
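Steps 1 and 4 could also be scripted instead of using the AWS Console. A minimal sketch, assuming a boto3 EC2 client is passed in; the instance ID is a hypothetical placeholder, not the real ID of jaeger.rtx.ai. Passing the client in keeps the helpers testable without AWS credentials.

```python
# Sketch: steps 1 and 4 scripted with a boto3 EC2 client.
# The instance ID below is a hypothetical placeholder.
JAEGER_INSTANCE_ID = "i-0123456789abcdef0"


def start_jaeger_instance(ec2, instance_id: str = JAEGER_INSTANCE_ID) -> None:
    """Start the instance and block until EC2 reports it as running."""
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


def stop_jaeger_instance(ec2, instance_id: str = JAEGER_INSTANCE_ID) -> None:
    """Stop the instance (to save cost) and block until it is stopped."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
```

Usage: `ec2 = boto3.client("ec2", region_name="us-east-1")`, then `start_jaeger_instance(ec2)`, run `ssh ubuntu@jaeger.rtx.ai ./run-jaeger.sh`, test, and finally `stop_jaeger_instance(ec2)`. The `instance_running` waiter only means EC2 reports the instance as running; SSH may take a little longer to come up.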

@saramsey
Member Author

saramsey commented Dec 3, 2023

To be clear, for security reasons, I have locked down the security group policy on jaeger.rtx.ai, though we can allow other IPs to connect as well, if need be:

[Screenshot 2023-12-02 at 4:13:22 PM: the jaeger.rtx.ai security group policy]
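If another IP does need access later, the ingress rule could be added programmatically. A minimal sketch using the boto3 EC2 client's `authorize_security_group_ingress`; the helper name, the default ports (the two opened for arax.ncats.io), and the example group ID and address in the test are hypothetical.

```python
import ipaddress


def allow_ip(ec2, group_id: str, ip: str, ports=(16686, 4318), description: str = ""):
    """Open TCP `ports` on security group `group_id` to the single address `ip` (/32)."""
    cidr = str(ipaddress.ip_network(f"{ip}/32"))  # raises ValueError for a bad address
    ip_range = {"CidrIp": cidr}
    if description:
        ip_range["Description"] = description
    permissions = [
        {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "IpRanges": [ip_range]}
        for port in ports
    ]
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=permissions)
    return permissions
```

Usage: `allow_ip(boto3.client("ec2", region_name="us-east-1"), "<security-group-id>", "<new IPv4 address>")`. Restricting each rule to a `/32` keeps the same lock-down policy as the existing arax.ncats.io rule.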

@saramsey saramsey changed the title OpenTelemetry logging in ARAX OpenTelemetry (OTel) logging in ARAX Dec 5, 2023
@saramsey
Member Author

saramsey commented Dec 6, 2023

The following simple demonstration python code snippet, run in python3.9 inside the rtx1 container on arax.ncats.io, successfully logs a message to our Jaeger server on jaeger.rtx.ai:

```python
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
#from opentelemetry.sdk.resources import SERVICE_NAME as telemetery_service_name_key
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
jaeger_host = 'jaeger.rtx.ai'
jaeger_port = 6831
trace.set_tracer_provider(TracerProvider(resource=Resource.create({'bar': 'foo'})))
jaeger_exporter = JaegerExporter(agent_host_name=jaeger_host, agent_port=jaeger_port)
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(jaeger_exporter))
tracer = trace.get_tracer("test_otel.py")
with tracer.start_as_current_span("span-name") as span:
    # do some work that 'span' will track
    print("doing some work...")
```

The above python code (which was adapted from a demo program from the ARAGORN team) requires the following packages to be installed:

```
(venv) rt@d1fd345478a0:~$ pip freeze
annotated-types==0.6.0
anyio==3.7.1
asgiref==3.7.2
backoff==2.2.1
certifi==2023.11.17
charset-normalizer==3.3.2
Deprecated==1.2.14
exceptiongroup==1.2.0
fastapi==0.104.1
googleapis-common-protos==1.59.1
grpcio==1.59.3
h11==0.14.0
httpcore==1.0.2
httpx==0.25.2
idna==3.6
importlib-metadata==6.11.0
opentelemetry-api==1.21.0
opentelemetry-exporter-jaeger==1.21.0
opentelemetry-exporter-jaeger-proto-grpc==1.21.0
opentelemetry-exporter-jaeger-thrift==1.21.0
opentelemetry-exporter-otlp==1.21.0
opentelemetry-exporter-otlp-proto-common==1.21.0
opentelemetry-exporter-otlp-proto-grpc==1.21.0
opentelemetry-exporter-otlp-proto-http==1.21.0
opentelemetry-instrumentation==0.42b0
opentelemetry-instrumentation-asgi==0.42b0
opentelemetry-instrumentation-fastapi==0.42b0
opentelemetry-instrumentation-httpx==0.42b0
opentelemetry-proto==1.21.0
opentelemetry-sdk==1.21.0
opentelemetry-semantic-conventions==0.42b0
opentelemetry-util-http==0.42b0
protobuf==4.25.1
pydantic==2.5.2
pydantic_core==2.14.5
requests==2.31.0
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
thrift==0.16.0
typing_extensions==4.8.0
urllib3==2.1.0
wrapt==1.16.0
zipp==3.17.0
```

Note that not all of the imported packages are used in the code snippet, so the code and the list of required packages could be simplified somewhat; they will also likely change for us in any event, because we use python-requests instead of httpx. But it illustrates that the OpenTelemetry SDK is working for sending spans (or messages or whatever they are called) to our Jaeger collector on jaeger.rtx.ai:

```
(venv) rt@d1fd345478a0:~$ python3 test_otel.py
/home/rt/test_otel.py:12: DeprecationWarning: Call to deprecated method __init__. (Since v1.35, the Jaeger supports OTLP natively. Please use the OTLP exporter instead. Support for this exporter will end July 2023.) -- Deprecated since version 1.16.0.
  jaeger_exporter = JaegerExporter(agent_host_name=jaeger_host, agent_port=jaeger_port)
doing some work...
```

And the view from the Jaeger frontend:

[Screenshot 2023-12-05 at 5:01:11 PM: the Jaeger front-end]

@edeutsch
Collaborator

edeutsch commented Dec 7, 2023

wow, that's really adding a lot of... complexity

@saramsey
Member Author

Thank you @kvnthomas98 for putting together this PR.

kvnthomas98 added a commit that referenced this issue Dec 13, 2023
#2186 Open Telemetry Implementation using Jaeger Exporter
@kvnthomas98
Collaborator

The telemetry instrumentation code has been added and merged to master. The Jaeger UI on both ITRB CI and jaeger.rtx.ai shows traces from the telemetry sent over. Thanks @saramsey and @edeutsch for the help!

@kvnthomas98
Collaborator

code pushed to ITRB-Test! closing!
