-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Expose OSThreadId and TimeStamp on EventWrittenEventArgs #19002
Conversation
…and plumb through OSThreadId.
…ntWrittenEventArgs.TimeStamp.
@@ -343,6 +343,16 @@ public static new Thread CurrentThread | |||
[MethodImplAttribute(MethodImplOptions.InternalCall), ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)] | |||
private static extern Thread GetCurrentThreadNative(); | |||
|
|||
internal static int CurrentOSThreadId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas, what is the best way for me to make sure that this doesn't break CoreRT? Should I just put this property in an ifdef CORECLR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add this to RuntimeThread
that exists in both CoreCLR and CoreRT CoreLibs, and then implement this on RuntimeThread
in both.
What the implementation should look like depends on how this is meant to work cross-plat - see my other comment.
} | ||
|
||
/// <summary> | ||
/// Gets a DateTime that specifies when the event was written. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please indicate that it is the UTC DateTime.
internal struct EventPipeSessionInfo | ||
{ | ||
[StructLayout(LayoutKind.Sequential)] | ||
internal struct SystemTime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PLease don't use SystemTime. It is inconvinient and unnecessary.
Note that you might not want to even use QueryPerfCounter stufff either. DateTime.UtcNow now uses the Window GetSystemTimePreciseAsFileTime function (or Linux equivalent), so you can probably just use DateTime everywhere (FileTime and DateTime are almost the same (they are both 100ns granularity, and only differ in an offset (See DateTIme.FromFileTimeUtc). This gets rid of a bunch of ugly unit conversions. Just use DateTime everywhere for timestamps, and use the same PAL call we use for DateTime.UtcNow, and all this gets very simple.
/// <summary> | ||
/// Gets the identifier for the OS thread that wrote the event. | ||
/// </summary> | ||
public int OSThreadId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int32
OS thread id is Windows specific concept. It does not exist on Unix. The CoreCLR Win32 PAL fakes the implementation to return something, but that implementation is less than ideal (e.g. it can return same thread ID for multiple threads in the given process).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related:
https://github.com/dotnet/coreclr/issues/889
https://github.com/dotnet/coreclr/issues/13037#issuecomment-318137113
IIRC, the fake Win32 threadID has been a problem for the debugger and it required some creative workarounds there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not certain what your guidance is here. If all you are saying is that we should expose a OSThreadId as a long in this API, (so that when we fix #889 things just work) I think that is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends on the expected use of this property is. This PR does not say what we expect this to be used for. Since these are new public APIs, it would be nice to use the process and template for adding public APIs so that we have information like this captured and taken into account.
I can think about two options:
-
Return the actual OS thread ID. It means that it has to be 64-bit number for portability to non-Windows OSes. The implementation should return the actual OS thread ID (i.e. what
pthread_threadid_np
orgettid
returns), not the Win32 PAL fake thread ID. -
Return the managed thread ID. Managed thread IDs are 32-bit, managed-code specific concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can give information on what this OSThreadID will be used for, as this was meant to mimic similar APIs in the Windows ETW interface. The OSThreadID's main use is as a correlator. It allows you to 'link' events that are related (in this case that they happened on the same thread). It is used pervasively.
Now either of the options above can be used for that purpose, so that does not help. It is actually helpful if the IDs are NOT REUSED AGRESSIVELY, since if they are reused commonly, the parsing logic has to go to some trouble to basically make they more globally unique again (otherwise you get false correlations if a thread dies and another thread reuses the ID). Now arguably that reuse problem is there for OSThreadID, but in practice most code simply ignores the problem (which works fine for OSThreadID most of the time).
The other value of using the OSThreadID is that it is certainly possible that the data will be persisted and correlated with data that comes from other tools (e.g. the Linux Perf tool), that does not have the concept of the managed thread id. This creates unnecessary pain trying to corelate things. (And this is another reason you want the 'true' OSThreadID now any one that was morphed by our PAL).
I think given this, the guidance should be simply to expand the OSThreadID to be a 64 bit number (presumably a long).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vancem If this is going to be used for doing similar analysis to ETW, would it make more sense to fall back to the UniqueProcessKey
concept which is size_t
on Windows anyways? This is what various "other" tools I am familiar with use and might be useful for them in the future as well 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AaronRobinsonMSFT - You will note UniqueProcessKey is not what is in every event, but rather in the Process Event, which is then correlated to the (less globally unique) process key. There is a reason for this (part legacy, but mostly that uniqueness comes at a cost, and you want your system to be pay-for-play (things that don't need global uniqueness don't pay for it)).
As a practical mater, I don't think we have a similarly globally unique ID for threads, and we probably don't need it (if thread IDs are unique within a process, that would be enough), and because we want to operate on multiple OSes we likely want to stick with things that are available 'everywhere'.
I think are guidance currently si that we should use the true OS thread ID, and thus make the value 64 bit. How woudl your insight change that suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoot - sorry for the noise. When you mentioned the 64-bit size, I remembered a unique key of size_t
and completely ignored the fact that this was related to threads not processes.
My basic thought here was an attempt to keep the underlying data model consistent between the two platforms (i.e. type size on Windows and type size on Non-Windows). Since thread ID in the eventing system is already 32-bit I was thinking of using a different but present ID mechanism in ETW that is already 64-bit or size_t
.
In this case, there isn't one because the only thread ID-esque concept on an ETW event is 32-bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that the plan that makes the most sense here is to expand the OSThreadId to be a long and expose it via the PAL in CoreCLR. I will also investigate what needs to be done to support this in CoreRT.
Also, I intend to file an issue in CoreFX to have the APIs reviewed. I was mostly looking to make sure that there weren't other issues with implementation before I went and did that, with the hope of avoiding having to go back and forth to make changes. I will do this shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have filed https://github.com/dotnet/corefx/issues/31401 to review the new APIs.
dwEstimatedRefCount, | ||
NULL, // domain value is not interesting in CoreCLR | ||
className.GetUnicode(), nameSpace.GetUnicode(), wszOperation, GetClrInstanceId()); | ||
EX_TRY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has something changed that now allows the functions in here to throw? Has this function always been capable of throwing an exception, just never had a catch around it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was caused by a previous change of mine: #17989. The uses of SString can throw.
dotnet-bot test OSX10.12 x64 Checked CoreFX Tests ('java.nio.channels.ClosedChannelException') |
I believe I've addressed all of the feedback so far. I am going to remove the WIP tag. This is ready pending any additional feedback and:
|
dotnet-bot test OSX10.12 x64 Checked CoreFX Tests ('java.nio.channels.ClosedChannelException') |
} | ||
else | ||
{ | ||
Debug.Assert(s_threadIds[i] > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brianrob why are you using Debug.Assert
instead of Assert
? Should the tested API work only in Debug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamsitnik, thanks for pointing this out. The test should work on all flavors. I have replaced calls to Debug.Assert
with calls to a test-local Assert
as there isn't a platform-level Assert
.
The CoreRT implementation of RuntimeThread.CurrentOSThreadId has been merged (dotnet/corert#6161). I'm going to merge this change and then will make any potential changes needed to the public surface if there are requested changes from https://github.com/dotnet/corefx/issues/31401. |
* Add EventWrittenEventArgs public properties OSThreadId and TimeStamp and plumb through OSThreadId. * Plumb ActivityId and RelatedActivityId to EventListener.
Expose two new public APIs on
System.Diagnostics.Tracing.EventWrittenEventArgs
:OSThreadId
: The OS thread identifier of the thread that generated the event.TimeStamp
: Contains the date and time when the event was generated.This PR also contains the underlying implementation required to plumb this information into managed code.