forked from brendangregg/perf-tools
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
2b16013
commit 967a81c
Showing
5 changed files
with
525 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../system/tpoint |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
Demonstrations of tpoint, the Linux ftrace version. | ||
|
||
Let's trace block:block_rq_issue, to see block device (disk) I/O requests: | ||
|
||
# ./tpoint block:block_rq_issue | ||
Tracing block:block_rq_issue. Ctrl-C to end. | ||
supervise-1692 [001] d... 7269912.982162: block_rq_issue: 202,1 W 0 () 17039656 + 8 [supervise] | ||
supervise-1696 [000] d... 7269912.982243: block_rq_issue: 202,1 W 0 () 12862264 + 8 [supervise] | ||
cksum-12994 [000] d... 7269913.317924: block_rq_issue: 202,1 R 0 () 9357056 + 72 [cksum] | ||
cksum-12994 [000] d... 7269913.319013: block_rq_issue: 202,1 R 0 () 2977536 + 144 [cksum] | ||
cksum-12994 [000] d... 7269913.320217: block_rq_issue: 202,1 R 0 () 2986240 + 216 [cksum] | ||
cksum-12994 [000] d... 7269913.321677: block_rq_issue: 202,1 R 0 () 620344 + 56 [cksum] | ||
cksum-12994 [001] d... 7269913.329309: block_rq_issue: 202,1 R 0 () 9107912 + 88 [cksum] | ||
cksum-12994 [001] d... 7269913.340133: block_rq_issue: 202,1 R 0 () 3147008 + 248 [cksum] | ||
cksum-12994 [001] d... 7269913.354551: block_rq_issue: 202,1 R 0 () 11583488 + 256 [cksum] | ||
cksum-12994 [001] d... 7269913.379904: block_rq_issue: 202,1 R 0 () 11583744 + 256 [cksum] | ||
[...] | ||
^C | ||
Ending tracing... | ||
|
||
Great, that was easy! | ||
|
||
perf_events can do this as well, and is better in many ways, including a more | ||
efficient buffering strategy, and multi-user access. It's not that easy to do | ||
this one-liner in perf_events, however. An equivalent for recent kernels is: | ||
|
||
perf record --no-buffer -e block:block_rq_issue -a -o - | PAGER=cat stdbuf -oL perf script -i - | ||
|
||
Older kernels, use -D instead of --no-buffer. Even better is to set the buffer | ||
page size to a sufficient grouping (using -m), to minimize overheads, at the | ||
expense of liveliness of updates. Note that stack traces (-g) don't work on | ||
my systems with this perf one-liner, however, they do work with tpoint -s. | ||
|
||
|
||
Column headings can be printed using -H: | ||
|
||
# ./tpoint -H block:block_rq_issue | ||
Tracing block:block_rq_issue. Ctrl-C to end. | ||
# tracer: nop | ||
# | ||
# entries-in-buffer/entries-written: 0/0 #P:2 | ||
# | ||
# _-----=> irqs-off | ||
# / _----=> need-resched | ||
# | / _---=> hardirq/softirq | ||
# || / _--=> preempt-depth | ||
# ||| / delay | ||
# TASK-PID CPU# |||| TIMESTAMP FUNCTION | ||
# | | | |||| | | | ||
supervise-1697 [000] d... 7270545.340856: block_rq_issue: 202,1 W 0 () 12862464 + 8 [supervise] | ||
supervise-1697 [000] d... 7270545.341256: block_rq_issue: 202,1 W 0 () 12862472 + 8 [supervise] | ||
supervise-1690 [000] d... 7270545.342363: block_rq_issue: 202,1 W 0 () 17040368 + 8 [supervise] | ||
[...] | ||
|
||
They are also documented in the Linux kernel source under: | ||
Documentation/trace/ftrace.txt. | ||
|
||
|
||
How about stacks traces for those block_rq_issue events? Adding -s: | ||
|
||
# ./tpoint -s block:block_rq_issue | ||
Tracing block:block_rq_issue. Ctrl-C to end. | ||
supervise-1691 [000] d... 7269511.079179: block_rq_issue: 202,1 W 0 () 17040232 + 8 [supervise] | ||
supervise-1691 [000] d... 7269511.079188: <stack trace> | ||
=> blk_peek_request | ||
=> do_blkif_request | ||
=> __blk_run_queue | ||
=> queue_unplugged | ||
=> blk_flush_plug_list | ||
=> blk_finish_plug | ||
=> ext4_writepages | ||
=> do_writepages | ||
=> __filemap_fdatawrite_range | ||
=> filemap_flush | ||
=> ext4_alloc_da_blocks | ||
=> ext4_rename | ||
=> vfs_rename | ||
=> SYSC_renameat2 | ||
=> SyS_renameat2 | ||
=> SyS_rename | ||
=> system_call_fastpath | ||
cksum-7428 [000] d... 7269511.331778: block_rq_issue: 202,1 R 0 () 9006848 + 208 [cksum] | ||
cksum-7428 [000] d... 7269511.331784: <stack trace> | ||
=> blk_peek_request | ||
=> do_blkif_request | ||
=> __blk_run_queue | ||
=> queue_unplugged | ||
=> blk_flush_plug_list | ||
=> blk_finish_plug | ||
=> __do_page_cache_readahead | ||
=> ondemand_readahead | ||
=> page_cache_async_readahead | ||
=> generic_file_read_iter | ||
=> new_sync_read | ||
=> vfs_read | ||
=> SyS_read | ||
=> system_call_fastpath | ||
cksum-7428 [000] d... 7269511.332631: block_rq_issue: 202,1 R 0 () 620992 + 200 [cksum] | ||
cksum-7428 [000] d... 7269511.332639: <stack trace> | ||
=> blk_peek_request | ||
=> do_blkif_request | ||
=> __blk_run_queue | ||
=> queue_unplugged | ||
=> blk_flush_plug_list | ||
=> blk_finish_plug | ||
=> __do_page_cache_readahead | ||
=> ondemand_readahead | ||
=> page_cache_sync_readahead | ||
=> generic_file_read_iter | ||
=> new_sync_read | ||
=> vfs_read | ||
=> SyS_read | ||
=> system_call_fastpath | ||
^C | ||
Ending tracing... | ||
|
||
Easy. Now I can read the ancestry to understand what actually lead to issuing | ||
a block device (disk) I/O. | ||
|
||
|
||
Here's insertion onto the block I/O queue (better matches processes): | ||
|
||
# ./tpoint -s block:block_rq_insert | ||
Tracing block:block_rq_insert. Ctrl-C to end. | ||
cksum-11908 [000] d... 7269834.882517: block_rq_insert: 202,1 R 0 () 736304 + 256 [cksum] | ||
cksum-11908 [000] d... 7269834.882528: <stack trace> | ||
=> __elv_add_request | ||
=> blk_flush_plug_list | ||
=> blk_finish_plug | ||
=> __do_page_cache_readahead | ||
=> ondemand_readahead | ||
=> page_cache_sync_readahead | ||
=> generic_file_read_iter | ||
=> new_sync_read | ||
=> vfs_read | ||
=> SyS_read | ||
=> system_call_fastpath | ||
[...] | ||
|
||
|
||
You can also add tracepoint filters. To see what variables you can use, use -v: | ||
|
||
# ./tpoint -v block:block_rq_issue | ||
name: block_rq_issue | ||
ID: 942 | ||
format: | ||
field:unsigned short common_type; offset:0; size:2; signed:0; | ||
field:unsigned char common_flags; offset:2; size:1; signed:0; | ||
field:unsigned char common_preempt_count; offset:3; size:1; signed:0; | ||
field:int common_pid; offset:4; size:4; signed:1; | ||
|
||
field:dev_t dev; offset:8; size:4; signed:0; | ||
field:sector_t sector; offset:16; size:8; signed:0; | ||
field:unsigned int nr_sector; offset:24; size:4; signed:0; | ||
field:unsigned int bytes; offset:28; size:4; signed:0; | ||
field:char rwbs[8]; offset:32; size:8; signed:1; | ||
field:char comm[16]; offset:40; size:16; signed:1; | ||
field:__data_loc char[] cmd; offset:56; size:4; signed:1; | ||
|
||
print fmt: "%d,%d %s %u (%s) %llu + %u [%s]", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), REC->rwbs, REC->bytes, __get_str(cmd), (unsigned long long)REC->sector, REC->nr_sector, REC->comm | ||
|
||
|
||
Now I'll add a filter to check that the rwbs field (I/O type) includes an "R", | ||
making it a read: | ||
|
||
# ./tpoint -s block:block_rq_insert 'rwbs ~ "*R*"' | ||
cksum-11908 [000] d... 7269839.919098: block_rq_insert: 202,1 R 0 () 736560 + 136 [cksum] | ||
cksum-11908 [000] d... 7269839.919107: <stack trace> | ||
=> __elv_add_request | ||
=> blk_flush_plug_list | ||
=> blk_finish_plug | ||
=> __do_page_cache_readahead | ||
=> ondemand_readahead | ||
=> page_cache_async_readahead | ||
=> generic_file_read_iter | ||
=> new_sync_read | ||
=> vfs_read | ||
=> SyS_read | ||
=> system_call_fastpath | ||
[...] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
.TH tpoint 8 "2014-07-20" "USER COMMANDS" | ||
.SH NAME | ||
tpoint \- trace a given tracepoint. Static tracing. Uses Linux ftrace. | ||
.SH SYNOPSIS | ||
.B tpoint | ||
[\-hHsv] [\-d secs] [\-p PID] tracepoint [filter] | ||
|
||
.B tpoint | ||
\-l | ||
.SH DESCRIPTION | ||
This will enable a given tracepoint, print events, then disable the tracepoint | ||
when the program ends. This is like a simple version of the "perf" command for | ||
printing live tracepoint events only. Wildcards are currently not supported. | ||
If for any reason tpoint(8) is insufficient, use the more powerful perf | ||
command for tracing tracepoints instead. | ||
|
||
Beware of feedback loops: tracing tcp functions over an ssh session, | ||
or writing ext4 events to an ext4 file system. For the former, tcp | ||
trace data could be redirected to a file (as in the usage message). For | ||
the latter, trace to the screen or a different file system. | ||
|
||
Since this uses ftrace, only the root user can use this tool. | ||
.SH REQUIREMENTS | ||
FTRACE CONFIG and tracepoints, which you may already have enabled and available | ||
on recent kernels. | ||
.SH OPTIONS | ||
.TP | ||
\-d seconds | ||
Set the duration of tracing, in seconds. Trace output will be buffered and | ||
printed at the end. This also reduces overheads by buffering in-kernel, | ||
instead of printing events as they occur. | ||
|
||
The ftrace buffer has a fixed size per-CPU (see | ||
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing, | ||
try increasing that size. | ||
.TP | ||
\-h | ||
Print usage message. | ||
.TP | ||
\-H | ||
Print column headers. | ||
.TP | ||
\-l | ||
List tracepoints only. | ||
.TP | ||
\-s | ||
Print kernel stack traces after each event. | ||
.TP | ||
\-v | ||
Show the tpoint format file only (do not trace), identifying possible variables | ||
for use in a custom filter. | ||
.TP | ||
\-p PID | ||
Only trace kernel functions when this process ID is on-CPU. | ||
.TP | ||
tracepoint | ||
A tracepoint name. Eg, block:block_rq_issue. See the EXAMPLES section. | ||
.TP | ||
filter | ||
An ftrace filter definition. | ||
.SH EXAMPLES | ||
.TP | ||
List tracepoints containing "open": | ||
# | ||
.B tpoint -l | grep open | ||
.TP | ||
Trace open() syscall entry: | ||
# | ||
.B tpoint syscalls:sys_enter_open | ||
.TP | ||
Trace open() syscall entry, showing column headers: | ||
# | ||
.B tpoint -H syscalls:sys_enter_open | ||
.TP | ||
Trace block I/O issue: | ||
# | ||
.B tpoint block:block_rq_issue | ||
.TP | ||
Trace block I/O issue with stack traces: | ||
# | ||
.B tpoint \-s block:block_rq_issue | ||
.SH FIELDS | ||
The output format depends on the kernel version, and headings can be printed | ||
using \-H. The format is the same as the ftrace function trace format, described | ||
in the kernel source under Documentation/trace/ftrace.txt. | ||
|
||
Typical fields are: | ||
.TP | ||
TASK-PID | ||
The process name (which could include dashes), a dash, and the process ID. | ||
.TP | ||
CPU# | ||
The CPU ID, in brackets. | ||
.TP | ||
|||| | ||
Kernel state flags. For example, on Linux 3.16 these are for irqs-off, | ||
need-resched, hardirq/softirq, and preempt-depth. | ||
.TP | ||
TIMESTAMP | ||
Time of event, in seconds. | ||
.TP | ||
FUNCTION | ||
Kernel function name. | ||
.SH OVERHEAD | ||
This can generate a lot of trace data quickly, depending on the | ||
frequency of the traced events. Such data will cause performance overheads. | ||
This also works without buffering by default, printing function events | ||
as they happen (uses trace_pipe), context switching and consuming CPU to do | ||
so. If needed, you can try the "\-d secs" option, which buffers events | ||
instead, reducing overhead. If you think the buffer option is losing events, | ||
try increasing the buffer size (buffer_size_kb). | ||
|
||
Consider using perf_events, which manages buffers differently and more | ||
efficiently, for higher frequency applications. | ||
.SH SOURCE | ||
This is from the perf-tools collection: | ||
.IP | ||
https://github.com/brendangregg/perf-tools | ||
.PP | ||
Also look under the examples directory for a text file containing example | ||
usage, output, and commentary for this tool. | ||
.SH OS | ||
Linux | ||
.SH STABILITY | ||
Unstable - in development. | ||
.SH AUTHOR | ||
Brendan Gregg | ||
.SH SEE ALSO | ||
functrace(8), funccount(8), perf(1) |
Oops, something went wrong.