Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Expose OSThreadId and TimeStamp on EventWrittenEventArgs #19002

Merged
merged 13 commits into from
Aug 1, 2018

Conversation

brianrob
Copy link
Member

Expose two new public APIs on System.Diagnostics.Tracing.EventWrittenEventArgs:

  • OSThreadId: The OS thread identifier of the thread that generated the event.
  • TimeStamp: Contains the date and time when the event was generated.

This PR also contains the underlying implementation required to plumb this information into managed code.

@@ -343,6 +343,16 @@ public static new Thread CurrentThread
[MethodImplAttribute(MethodImplOptions.InternalCall), ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private static extern Thread GetCurrentThreadNative();

internal static int CurrentOSThreadId
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas, what is the best way for me to make sure that this doesn't break CoreRT? Should I just put this property in an ifdef CORECLR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add this to RuntimeThread that exists in both CoreCLR and CoreRT CoreLibs, and then implement this on RuntimeThread in both.

What the implementation should look like depends on how this is meant to work cross-plat - see my other comment.

}

/// <summary>
/// Gets a DateTime that specifies when the event was written.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please indicate that it is the UTC DateTime.

internal struct EventPipeSessionInfo
{
[StructLayout(LayoutKind.Sequential)]
internal struct SystemTime
Copy link

@vancem vancem Jul 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLease don't use SystemTime. It is inconvinient and unnecessary.
Note that you might not want to even use QueryPerfCounter stufff either. DateTime.UtcNow now uses the Window GetSystemTimePreciseAsFileTime function (or Linux equivalent), so you can probably just use DateTime everywhere (FileTime and DateTime are almost the same (they are both 100ns granularity, and only differ in an offset (See DateTIme.FromFileTimeUtc). This gets rid of a bunch of ugly unit conversions. Just use DateTime everywhere for timestamps, and use the same PAL call we use for DateTime.UtcNow, and all this gets very simple.

/// <summary>
/// Gets the identifier for the OS thread that wrote the event.
/// </summary>
public int OSThreadId
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int32 OS thread id is Windows specific concept. It does not exist on Unix. The CoreCLR Win32 PAL fakes the implementation to return something, but that implementation is less than ideal (e.g. it can return same thread ID for multiple threads in the given process).

Copy link
Member

@jkotas jkotas Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related:

https://github.com/dotnet/coreclr/issues/889
https://github.com/dotnet/coreclr/issues/13037#issuecomment-318137113

IIRC, the fake Win32 threadID has been a problem for the debugger and it required some creative workarounds there.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not certain what your guidance is here. If all you are saying is that we should expose a OSThreadId as a long in this API, (so that when we fix #889 things just work) I think that is reasonable.

Copy link
Member

@jkotas jkotas Jul 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on the expected use of this property is. This PR does not say what we expect this to be used for. Since these are new public APIs, it would be nice to use the process and template for adding public APIs so that we have information like this captured and taken into account.

I can think about two options:

  • Return the actual OS thread ID. It means that it has to be 64-bit number for portability to non-Windows OSes. The implementation should return the actual OS thread ID (i.e. what pthread_threadid_np or gettid returns), not the Win32 PAL fake thread ID.

  • Return the managed thread ID. Managed thread IDs are 32-bit, managed-code specific concept.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can give information on what this OSThreadID will be used for, as this was meant to mimic similar APIs in the Windows ETW interface. The OSThreadID's main use is as a correlator. It allows you to 'link' events that are related (in this case that they happened on the same thread). It is used pervasively.

Now either of the options above can be used for that purpose, so that does not help. It is actually helpful if the IDs are NOT REUSED AGRESSIVELY, since if they are reused commonly, the parsing logic has to go to some trouble to basically make they more globally unique again (otherwise you get false correlations if a thread dies and another thread reuses the ID). Now arguably that reuse problem is there for OSThreadID, but in practice most code simply ignores the problem (which works fine for OSThreadID most of the time).

The other value of using the OSThreadID is that it is certainly possible that the data will be persisted and correlated with data that comes from other tools (e.g. the Linux Perf tool), that does not have the concept of the managed thread id. This creates unnecessary pain trying to corelate things. (And this is another reason you want the 'true' OSThreadID now any one that was morphed by our PAL).

I think given this, the guidance should be simply to expand the OSThreadID to be a 64 bit number (presumably a long).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vancem If this is going to be used for doing similar analysis to ETW, would it make more sense to fall back to the UniqueProcessKey concept which is size_t on Windows anyways? This is what various "other" tools I am familiar with use and might be useful for them in the future as well 😄

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronRobinsonMSFT - You will note UniqueProcessKey is not what is in every event, but rather in the Process Event, which is then correlated to the (less globally unique) process key. There is a reason for this (part legacy, but mostly that uniqueness comes at a cost, and you want your system to be pay-for-play (things that don't need global uniqueness don't pay for it)).

As a practical mater, I don't think we have a similarly globally unique ID for threads, and we probably don't need it (if thread IDs are unique within a process, that would be enough), and because we want to operate on multiple OSes we likely want to stick with things that are available 'everywhere'.

I think are guidance currently si that we should use the true OS thread ID, and thus make the value 64 bit. How woudl your insight change that suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoot - sorry for the noise. When you mentioned the 64-bit size, I remembered a unique key of size_t and completely ignored the fact that this was related to threads not processes.

My basic thought here was an attempt to keep the underlying data model consistent between the two platforms (i.e. type size on Windows and type size on Non-Windows). Since thread ID in the eventing system is already 32-bit I was thinking of using a different but present ID mechanism in ETW that is already 64-bit or size_t.

In this case, there isn't one because the only thread ID-esque concept on an ETW event is 32-bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the plan that makes the most sense here is to expand the OSThreadId to be a long and expose it via the PAL in CoreCLR. I will also investigate what needs to be done to support this in CoreRT.

Also, I intend to file an issue in CoreFX to have the APIs reviewed. I was mostly looking to make sure that there weren't other issues with implementation before I went and did that, with the hope of avoiding having to go back and forth to make changes. I will do this shortly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have filed https://github.com/dotnet/corefx/issues/31401 to review the new APIs.

@brianrob
Copy link
Member Author

Thanks @vancem and @jkotas. I will be out of the office for a few days, but will pick this up when I return.

dwEstimatedRefCount,
NULL, // domain value is not interesting in CoreCLR
className.GetUnicode(), nameSpace.GetUnicode(), wszOperation, GetClrInstanceId());
EX_TRY
Copy link
Member

@AaronRobinsonMSFT AaronRobinsonMSFT Jul 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has something changed that now allows the functions in here to throw? Has this function always been capable of throwing an exception, just never had a catch around it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was caused by a previous change of mine: #17989. The uses of SString can throw.

@brianrob
Copy link
Member Author

dotnet-bot test OSX10.12 x64 Checked CoreFX Tests ('java.nio.channels.ClosedChannelException')

@brianrob
Copy link
Member Author

I believe I've addressed all of the feedback so far. I am going to remove the WIP tag. This is ready pending any additional feedback and:

@brianrob brianrob changed the title [WIP] Expose OSThreadId and TimeStamp on EventWrittenEventArgs Expose OSThreadId and TimeStamp on EventWrittenEventArgs Jul 30, 2018
@brianrob
Copy link
Member Author

dotnet-bot test OSX10.12 x64 Checked CoreFX Tests ('java.nio.channels.ClosedChannelException')

}
else
{
Debug.Assert(s_threadIds[i] > 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brianrob why are you using Debug.Assert instead of Assert? Should the tested API work only in Debug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamsitnik, thanks for pointing this out. The test should work on all flavors. I have replaced calls to Debug.Assert with calls to a test-local Assert as there isn't a platform-level Assert.

@brianrob
Copy link
Member Author

brianrob commented Aug 1, 2018

The CoreRT implementation of RuntimeThread.CurrentOSThreadId has been merged (dotnet/corert#6161). I'm going to merge this change and then will make any potential changes needed to the public surface if there are requested changes from https://github.com/dotnet/corefx/issues/31401.

@brianrob brianrob merged commit c4c1672 into dotnet:master Aug 1, 2018
@brianrob brianrob deleted the eventwritteneventargs branch August 1, 2018 18:38
brianrob added a commit to brianrob/coreclr that referenced this pull request Aug 6, 2018
* Add EventWrittenEventArgs public properties OSThreadId and TimeStamp and plumb through OSThreadId.

* Plumb ActivityId and RelatedActivityId to EventListener.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants