Add some doc files and utilities
Scott MacVicar committed Feb 20, 2010
1 parent 95cc4a3 commit a551a6e
Showing 5 changed files with 394 additions and 0 deletions.
49 changes: 49 additions & 0 deletions bin/report_mutex.php
@@ -0,0 +1,49 @@
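<?php
// Usage: php report_mutex.php <server[:port]> [top] [translate]
//   server     admin server address, e.g. localhost:9999
//   top        number of stacks to show per table (default 20)
//   translate  non-zero to translate hex stacks via /translate

if ($argc < 2) {
  exit("Usage: php report_mutex.php <server[:port]> [top] [translate]\n");
}
$server = $argv[1];
$top = isset($argv[2]) ? (int)$argv[2] : 20;
$translate = !empty($argv[3]);
if ($top <= 0) $top = 20;

// Fetch all mutex.* keys aggregated into one key-value list.
$url = "http://$server/stats.kvp?agg=*&keys=:mutex.*:";
$ret = shell_exec('GET ' . escapeshellarg($url));
$stats = $ret ? json_decode($ret, true) : null;
if (!$stats) {
  exit("No mutex profile data was found on server\n");
}

// Keys look like mutex.<hex-stack>.hit or mutex.<hex-stack>.time.
$hits = array();
$times = array();
foreach ($stats as $name => $count) {
  if (preg_match('/mutex\.([0-9a-f:]+)\.(hit|time)/', $name, $m)) {
    $stack = $m[1];
    $type = $m[2];

    if ($type == 'hit') {
      $hits[$stack] = $count;
    } else {
      $times[$stack] = $count;
    }
  }
}

// Keep the top N of each; preserve keys so numeric-looking stacks survive.
arsort($hits);  $hits = array_slice($hits, 0, $top, true);
arsort($times); $times = array_slice($times, 0, $top, true);

print str_repeat('=', 70)."\n";
foreach ($hits as $stack => $count) {
  print $count." x sampling hits:\n";
  print $translate ? translate_stack($stack) : $stack."\n";
  print str_repeat('-', 70)."\n";
}
print str_repeat('=', 70)."\n";
foreach ($times as $stack => $count) {
  // The divisor assumes mutex times are recorded in microseconds.
  print (int)($count / 1000000)." seconds:\n";
  print $translate ? translate_stack($stack) : $stack."\n";
  print str_repeat('-', 70)."\n";
}

function translate_stack($stack) {
  global $server;
  // Quote the URL so the shell does not interpret '?' or '&'.
  $url = "http://$server/translate?stack=$stack";
  return shell_exec('GET '.escapeshellarg($url));
}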

68 changes: 68 additions & 0 deletions doc/command.admin_server
@@ -0,0 +1,68 @@
<h2>Admin Server URL Commands</h2>

When a compiled program runs as an HTTP server, by default it also runs an
admin server on a specified port. Sending an HTTP request to this port
performs certain actions. To list all available commands:

GET http://localhost:9999

This is a list of available URLs:


/stop: stop the web server
/translate: translate hex encoded stacktrace in 'stack' param
stack required, stack trace to translate
build-id optional, if specified, build ID has to match
bare optional, whether to display frame ordinates
/build-id: return the build ID passed in on the command line
/check-load: how many threads are actively handling requests
/check-mem: report quick memory statistics in the log file
/check-apc: report quick APC statistics
/status.xml: show server status in XML
/status.json: show server status in JSON
/status.html: show server status in HTML
/stats-on: main switch: enable server stats
/stats-off: main switch: disable server stats
/stats-clear: clear all server stats
/stats-web: turn on/off server page stats (CPU and gen time)
/stats-mem: turn on/off memory statistics
/stats-apc: turn on/off APC statistics
/stats-apc-key: turn on/off APC key statistics
/stats-mcc: turn on/off memcache statistics
/stats-sql: turn on/off SQL statistics
/stats-mutex: turn on/off mutex statistics
sampling optional, default 1000
/stats.keys: list all available keys
from optional, <timestamp>, or <-n> seconds ago
to optional, <timestamp>, or <-n> seconds ago
/stats.xml: show server stats in XML
/stats.json: show server stats in JSON
/stats.kvp: show server stats in key-value pairs
/stats.html: show server stats in HTML
from optional, <timestamp>, or <-n> seconds ago
to optional, <timestamp>, or <-n> seconds ago
agg optional, aggregation: *, url, code
keys optional, <key>,<key/hit>,<key/sec>,<:regex:>
url optional, only stats of this page or URL
code optional, only stats of pages returning this code
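Every command above is a plain HTTP GET, so any client can drive it. As a sketch, a command URL with its optional parameters can be assembled like this. The helper name `admin_url()` is hypothetical, the default port 9999 is taken from the examples in these docs, and it assumes the server decodes percent-encoded query values as HTTP servers normally do.

```php
<?php
// Hypothetical helper (not part of HPHP): builds an admin-server command URL.
// The default port 9999 matches the examples in these docs.
function admin_url(string $host, string $command, array $params = [], int $port = 9999): string {
  $url = "http://$host:$port/" . ltrim($command, '/');
  if ($params) {
    // http_build_query() percent-encodes values, e.g. '*' becomes %2A.
    $url .= '?' . http_build_query($params);
  }
  return $url;
}

echo admin_url('localhost', 'leak-on', ['sampling' => 500]), "\n";
// prints http://localhost:9999/leak-on?sampling=500
echo admin_url('localhost', 'stats.kvp', ['agg' => '*', 'keys' => ':mutex.*:']), "\n";
```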

If the program was compiled with GOOGLE_CPU_PROFILER, these commands become available:

/prof-cpu-on: turn on CPU profiler
/prof-cpu-off: turn off CPU profiler

If the program was compiled with GOOGLE_HEAP_PROFILER, these commands become available:

/prof-heap-on: turn on heap profiler
/prof-heap-dump: take one snapshot of the heap
/prof-heap-off: turn off heap profiler
/stats-malloc: turn on/off malloc statistics
/leak-on: start leak detection
sampling required, frequency
/leak-off: end leak detection and report leaks
cutoff optional, default 20 seconds, ignore allocations newer than this

If the program was compiled with GOOGLE_TCMALLOC, these commands become available:

/free-mem: ask tcmalloc to release memory to system
/tcmalloc-stats: get internal tcmalloc stats
62 changes: 62 additions & 0 deletions doc/debug.leak
@@ -0,0 +1,62 @@
<h2>Debugging Memory Leaks</h2>

First, we need unit tests that verify individual classes and functions
(especially extension functions) don't leak memory, by running them under
valgrind like this:

GLIBCXX_FORCE_NEW=1 \
valgrind --suppressions=../bin/valgrind.suppression --tool=memcheck \
--leak-check=full --num-callers=30 --max-stackframe=3000000 \
test/test TestExtFoo::test_ext_bar

On a live server, running valgrind or a full heap profiler is impractical,
since either slows request handling down considerably. Instead, use this
procedure to run the built-in memory leak detection against a live server:

1. Turn on heap profiler

Build the server (both HPHP and www) with these changes to rules.mk:

DEBUG=1
#GOOGLE_CPU_PROFILER = 1
GOOGLE_HEAP_PROFILER = 1

This turns off the CPU profiler and turns on the heap profiler, which gives us
malloc() hooks for our own sampling-based leak detection. DEBUG is also needed
to generate readable stack traces.

2. Turn off mt_allocator

Run the server with <b>GLIBCXX_FORCE_NEW=1</b>. This environment variable turns
off libstdc++'s mt_allocator, which does not call free() when some STL objects
are destroyed.

3. Initialize long-living objects

Let the server run for a few minutes, until APC is mostly populated. Otherwise,
APC objects may be reported as leaks.

4. Turn on leak detection

Hit the server to turn on leak detection:

GET http://[server]:9999/leak-on?sampling=500

The higher the sampling value, the less impact leak detection has on the
running server, but the longer it takes to collect leaked items. A value of
500 worked well in our debugging.
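To make the sampling trade-off concrete, here is a toy model of sampling-based leak detection. It is not HPHP's implementation: the class `LeakSampler`, the "track every Nth allocation" rule, and treating any tracked allocation older than the cutoff as a suspected leak are assumptions based on the `sampling` and `cutoff` parameters of /leak-on and /leak-off.

```php
<?php
// Toy model of sampling-based leak detection (hypothetical, not HPHP's code).
// Every $sampling-th allocation is tracked; frees remove tracked entries;
// the report lists tracked allocations older than $cutoff seconds.
class LeakSampler {
  private $sampling;
  private $counter = 0;
  private $live = [];   // allocation id => [timestamp, stack]

  public function __construct(int $sampling) {
    $this->sampling = max(1, $sampling);
  }

  // In a real system this would run inside a malloc() hook.
  public function onAlloc(int $id, string $stack, int $now): void {
    if (++$this->counter % $this->sampling === 0) {
      $this->live[$id] = [$now, $stack];
    }
  }

  // In a real system this would run inside a free() hook; untracked ids are ignored.
  public function onFree(int $id): void {
    unset($this->live[$id]);
  }

  // Returns stack => count of tracked allocations at least $cutoff seconds old.
  public function report(int $now, int $cutoff = 20): array {
    $leaks = [];
    foreach ($this->live as [$time, $stack]) {
      if ($now - $time >= $cutoff) {
        $leaks[$stack] = ($leaks[$stack] ?? 0) + 1;
      }
    }
    arsort($leaks);
    return $leaks;
  }
}

$s = new LeakSampler(2);            // track every 2nd allocation: ids 2, 4, 6
for ($id = 1; $id <= 6; $id++) {
  $s->onAlloc($id, "stack$id", 0);  // all allocated at t=0
}
$s->onFree(2);                      // tracked id 2 is freed
print_r($s->report(30));            // reports stack4 and stack6
```

A larger sampling value means fewer allocations pay the tracking cost, but a rare leak needs proportionally longer to land in a sample, which matches the trade-off described above.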

5. Report leaks

Wait minutes, or even hours, depending on how rarely the leak occurs. Then hit
the server to turn off leak detection and report leaks:

GET http://[server]:9999/leak-off > leak_report

6. Examine output

The output should list all leaked items. Some stacks may not be fully
translated; translate those manually like this:

./www --mode translate <hex-coded-stacktrace>
26 changes: 26 additions & 0 deletions doc/debug.mutex
@@ -0,0 +1,26 @@

<h2>Debugging Excessive Mutex</h2>

1. Turn on mutex stats

Hit the admin port with /stats-mutex to turn on mutex stats:

GET http://localhost:9999/stats-mutex

2. Query mutex stats

Query mutex stats like this:

GET "http://localhost:9999/stats.kvp?agg=*&keys=:mutex.*:"

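3. Pre-written script

Or run bin/report_mutex.php to report the top stacks by hits and by time. Its
arguments are the admin server address (including the admin port), the number
of stacks to show, and whether to translate stacks:

php ../bin/report_mutex.php localhost:9999 10 1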
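The core of bin/report_mutex.php splits the mutex.* keys into hit counts and times, then keeps the top N of each. Isolated as a pure function, it can be exercised without a live server. The name `top_mutex_stacks()` is hypothetical, and the key layout `mutex.<hex-stack>.hit` / `.time` is inferred from the script's regular expression.

```php
<?php
// Hypothetical pure version of the parsing step in bin/report_mutex.php.
// $stats is the decoded stats.kvp JSON: key => value.
function top_mutex_stacks(array $stats, int $top = 20): array {
  $hits = [];
  $times = [];
  foreach ($stats as $name => $count) {
    if (preg_match('/mutex\.([0-9a-f:]+)\.(hit|time)/', (string)$name, $m)) {
      if ($m[2] === 'hit') {
        $hits[$m[1]] = $count;
      } else {
        $times[$m[1]] = $count;
      }
    }
  }
  arsort($hits);
  arsort($times);
  // preserve_keys = true so numeric-looking stacks keep their keys.
  return [
    'hits'  => array_slice($hits, 0, $top, true),
    'times' => array_slice($times, 0, $top, true),
  ];
}

$stats = [
  'mutex.7f00:1a2b.hit'  => 12,
  'mutex.7f00:1a2b.time' => 3500000,
  'mutex.7f00:9c9c.hit'  => 40,
  'mutex.7f00:9c9c.time' => 900000,
  'hit'                  => 1000,   // non-mutex keys are ignored
];
print_r(top_mutex_stacks($stats, 1));
```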

4. Turn off mutex stats

Hit the admin port with /stats-mutex again to turn off mutex stats:

GET http://localhost:9999/stats-mutex
189 changes: 189 additions & 0 deletions doc/stats
@@ -0,0 +1,189 @@

<h2>Server Stats</h2>

For each page, we collect stats by time slot. Each time slot spans
StatsSlotDuration seconds, and the server internally keeps StatsMaxSlot slots.
Inside each slot, we keep a set of stats per page or URL. These stats include 3
built-in ones ("url", "code" and "hit") and many key-value pairs defined by
different parts of the system.

slot:
  time:
  pages:
    page:
      url:  original URL
      code: return code
      hit:  total counts
      details:
        key-value pair
        key-value pair
        key-value pair
        ...
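The slot layout above maps naturally onto nested arrays. The sketch below models it in PHP and shows what agg=* and agg=url or agg=code mean: summing each key across slots, either into one list or per group. The array shape and the function `aggregate_slots()` are illustrative assumptions, not the server's internal representation.

```php
<?php
// Illustrative model of stats slots (not the server's C++ structures).
// Slot: ['time' => int, 'pages' => [page, ...]]
// Page: ['url' => ..., 'code' => ..., 'hit' => int, 'details' => [key => value]]

// $by = null behaves like agg=* (one flat list);
// $by = 'url' or 'code' groups the sums by that page field.
function aggregate_slots(array $slots, ?string $by = null): array {
  $out = [];
  foreach ($slots as $slot) {
    foreach ($slot['pages'] as $page) {
      $group = $by === null ? '*' : (string)$page[$by];
      $out[$group]['hit'] = ($out[$group]['hit'] ?? 0) + $page['hit'];
      foreach ($page['details'] as $key => $value) {
        $out[$group][$key] = ($out[$group][$key] ?? 0) + $value;
      }
    }
  }
  return $by === null ? ($out['*'] ?? []) : $out;
}

$slots = [
  ['time' => 1251927300, 'pages' => [
    ['url' => '/home.php', 'code' => 200, 'hit' => 3, 'details' => ['sql.conn' => 2]],
    ['url' => '/login.php', 'code' => 302, 'hit' => 1, 'details' => ['sql.conn' => 1]],
  ]],
  ['time' => 1251927360, 'pages' => [
    ['url' => '/home.php', 'code' => 200, 'hit' => 2, 'details' => ['sql.conn' => 1, 'apc.hit' => 5]],
  ]],
];
print_r(aggregate_slots($slots));        // hit => 6, sql.conn => 4, apc.hit => 5
print_r(aggregate_slots($slots, 'url')); // separate sums for /home.php and /login.php
```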


<h2>Stats Query</h2>

To query stats, hit the admin port with a URL like this:

http://[server]:9999/stats.[fmt]?from=[t1]&to=[t2]...

from: (optional) starting time as a timestamp (e.g. 1251927393),
- use -n for n seconds ago
- when omitted or 0, it is the earliest time the server still keeps

to: (optional) ending time as a timestamp,
- use -n for n seconds ago
- when omitted or 0, it is "now"

agg: (optional) aggregation, one of these:
* aggregate all data into one list of key-value pairs
url aggregate all data by URL
code aggregate all data by response code
(omitted) default, by time slot

keys: (optional) comma-delimited keys to query, each of which can be decorated:
[key] just the key's value, e.g. "sql.conn"
[key]/hit average per page hit, e.g. "sql.conn/hit"
[key]/sec per-second rate, e.g. "sql.conn/sec"
:[regex]: keys matching the regular expression, e.g. ":sql.query..*.select:"
(omitted) all available keys

url: (optional) only output stats matching the specified URL

code: (optional) only output stats of pages that returned the specified response code

[fmt]: can be one of these:

xml XML format
json JSON format
kvp simple key-value pairs in JSON format, assuming agg=*
html HTML format
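The from/to and keys rules can be read as the client-side sketch below. `resolve_time()` and `eval_key()` are hypothetical helpers, not server code. They assume that /hit divides by the aggregated "hit" value and /sec by the length of the queried window in seconds, which is how the descriptions above read but has not been checked against the server.

```php
<?php
// Hypothetical helpers illustrating the query parameter rules.

// from/to: positive = absolute timestamp, negative = n seconds ago,
// null or 0 = $default (earliest kept time for "from", now for "to").
function resolve_time(?int $t, int $now, int $default): int {
  if ($t === null || $t === 0) return $default;
  return $t < 0 ? $now + $t : $t;
}

// Evaluates one decorated key against aggregated stats (key => value).
// Returns key => value; a :regex: spec can match several keys.
function eval_key(string $spec, array $stats, int $seconds): array {
  if (strlen($spec) > 2 && $spec[0] === ':' && substr($spec, -1) === ':') {
    // Assumes the pattern itself contains no '~' (used as the delimiter).
    $re = '~' . substr($spec, 1, -1) . '~';
    $out = [];
    foreach ($stats as $k => $v) {
      if (preg_match($re, (string)$k)) $out[$k] = $v;
    }
    return $out;
  }
  $hits = $stats['hit'] ?? 0;
  if (substr($spec, -4) === '/hit') {
    $value = $stats[substr($spec, 0, -4)] ?? 0;
    return [$spec => $hits ? $value / $hits : 0];
  }
  if (substr($spec, -4) === '/sec') {
    $value = $stats[substr($spec, 0, -4)] ?? 0;
    return [$spec => $seconds ? $value / $seconds : 0];
  }
  return [$spec => $stats[$spec] ?? 0];
}

$now = time();
$from = resolve_time(-600, $now, 0);    // ten minutes ago
$to = resolve_time(null, $now, $now);   // "now"
$stats = ['hit' => 50, 'sql.conn' => 100, 'sql.query.users.select' => 7];
print_r(eval_key('sql.conn/hit', $stats, $to - $from));          // sql.conn/hit => 2
print_r(eval_key(':sql.query..*.select:', $stats, $to - $from)); // the users SELECT count
```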


<h2>Available Keys</h2>

1. SQL Stats:

(1) Connections

sql.conn: number of connections newly created
sql.reconn_new: number of connections newly created when trying to reconnect
sql.reconn_ok: number of connections re-picked up when trying to reconnect
sql.reconn_old: number of connections dropped when trying to reconnect

(2) Queries

sql.query: number of queries executed
sql.query.[table].[verb]: per table-verb stats
sql.query.[verb]: per verb stats, where [verb] can be one of these:

- select
- insert
- update
- replace
- delete
- begin
- commit
- rollback
- unknown
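As an illustration of the per-verb key names, the hypothetical helper below derives sql.query.[verb] from a statement's leading keyword and falls back to "unknown" for anything else. How the server actually classifies queries (and extracts [table]) is not described here, so treat this only as a sketch.

```php
<?php
// Hypothetical: maps a SQL statement to its sql.query.[verb] stats key.
function sql_verb_key(string $sql): string {
  static $verbs = ['select', 'insert', 'update', 'replace', 'delete',
                   'begin', 'commit', 'rollback'];
  // First word of the statement, lowercased; leading whitespace is skipped.
  $word = strtolower(strtok(ltrim($sql), " \t\r\n(;") ?: '');
  $verb = in_array($word, $verbs, true) ? $word : 'unknown';
  return "sql.query.$verb";
}

echo sql_verb_key("SELECT id FROM users WHERE id = 1"), "\n"; // sql.query.select
echo sql_verb_key("SHOW TABLES"), "\n";                       // sql.query.unknown
```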

2. MemCache Stats:

mcc.madd: number of multi_add() calls
mcc.madd.count: total count of keys added by multi_add()
mcc.mreplace: number of multi_replace() calls
mcc.mreplace.count: total count of keys replaced by multi_replace()
mcc.set: number of set() calls
mcc.add: number of add() calls
mcc.decr: number of decr() calls
mcc.incr: number of incr() calls
mcc.delete: number of delete() calls
mcc.delete_details: number of delete_details() calls
mcc.get: number of get() calls
mcc.mget: number of multi_get() calls
mcc.mget.count: total count of keys fetched by multi_get()
mcc.replace: number of replace() calls
mcc.stats: number of stats() calls

3. APC Stats:

apc.miss: number of item misses
apc.hit: number of item hits
apc.update: number of item updates
apc.new: number of new items
apc.erased: number of successfully erased items
apc.erase: number of items that failed to erase (because they were absent)
apc.inc: number of inc() calls
apc.cas: number of cas() calls
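These counters are most useful as ratios. For example, the APC hit ratio over a queried window can be computed from apc.hit and apc.miss in stats.kvp output. The helper name `apc_hit_ratio()` is hypothetical, and it assumes each APC fetch increments exactly one of the two counters.

```php
<?php
// Hypothetical helper: APC hit ratio from decoded stats.kvp output.
// Assumes every fetch increments exactly one of apc.hit / apc.miss.
function apc_hit_ratio(array $stats): ?float {
  $hit = $stats['apc.hit'] ?? 0;
  $miss = $stats['apc.miss'] ?? 0;
  $total = $hit + $miss;
  return $total > 0 ? $hit / $total : null;  // null: no APC traffic
}

$stats = json_decode('{"apc.hit": 950, "apc.miss": 50, "hit": 1000}', true);
printf("APC hit ratio: %.1f%%\n", 100 * apc_hit_ratio($stats)); // APC hit ratio: 95.0%
```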

4. Memory Stats:

mem.[type].[size].alloc: total number of objects of this type and size allocated
mem.[type].[size].freed: total number of objects of this type and size freed

These two stats are available only when the Google heap profiler is turned on
for debugging:

mem.malloc.peak: peak malloc()-ed memory
mem.malloc.leaked: leaked malloc()-ed memory

5. Page Sections:

page.wall.[section]: wall time a page section takes
page.cpu.[section]: CPU time a page section takes
mem.[section]: SmartAllocator memory a page section takes
network.uncompressed: total bytes to be sent before compression
network.compressed: total bytes sent after compression

Section can be one of these:

- queuing
- all
- input
- invoke
- send
- psp
- rollback
- free

6. evhttp Stats:

- evhttp.hit: used a cached connection
- evhttp.hit.<address>: used a cached connection, per address
- evhttp.miss: no cached connection was available
- evhttp.miss.<address>: no cached connection was available, per address
- evhttp.close: a cached connection was closed
- evhttp.close.<address>: a cached connection was closed, per address
- evhttp.skip: not configured to use cached connections
- evhttp.skip.<address>: not configured to use cached connections, per address

7. Application Stats:

A PHP page can collect application-defined stats by calling

hphp_stats($key, $count);

where $key is an arbitrary string and $count is summed across all calls with
the same key.
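hphp_stats() exists only in HipHop-compiled binaries, so pages that call it fail under stock PHP. One option, sketched here as an assumption rather than anything this project ships, is a guarded fallback with the same per-key summing behavior. `hphp_stats_fallback_tally()` is a hypothetical name.

```php
<?php
// Hypothetical fallback for stock PHP (not shipped with HPHP): tallies
// counts per key in a static array when the hphp_stats() builtin is absent.
function hphp_stats_fallback_tally(?string $key = null, int $count = 0): array {
  static $tally = [];
  if ($key !== null) {
    $tally[$key] = ($tally[$key] ?? 0) + $count;
  }
  return $tally;
}

if (!function_exists('hphp_stats')) {
  function hphp_stats($key, $count) {
    hphp_stats_fallback_tally((string)$key, (int)$count);
  }
}

hphp_stats('feed.stories', 10);
hphp_stats('feed.stories', 5);
hphp_stats('feed.errors', 1);
print_r(hphp_stats_fallback_tally()); // feed.stories => 15, feed.errors => 1
```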

8. Special Keys:

hit: page hit
load: number of active worker threads
idle: number of idle worker threads


<h2>Example URL</h2>

GET "http://localhost:9999/stats.kvp?prefix=hphp&agg=*&keys=apc.hit/sec,hit,load,:sql.query..*.select:,network.compressed/hit,hit/sec"

This URL queries the following data:

hit: page hits
hit/sec: requests per second
apc.hit/sec: APC hits per second
load: current number of active worker threads
network.compressed/hit: sent bytes per request
:sql.query..*.select: all SELECTs on different tables
