Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: Remember recent lookups and display them in an admin endpoint #8116

Merged
merged 69 commits into from
Oct 17, 2019
Merged
Show file tree
Hide file tree
Changes from 66 commits
Commits
Show all changes
69 commits
Select commit Hold shift + click to select a range
f871f2b
initial impl.
jmarantz Sep 1, 2019
be02163
Add optional recent-lookups tracking to SymbolTable.
jmarantz Sep 1, 2019
7f404a0
Hook up recent_lookups_ maps into SymbolTable.
jmarantz Sep 1, 2019
4eeb7eb
full stack of recent-lookups admin access compiles, not tested or tried.
jmarantz Sep 2, 2019
c1a1a32
cleanup
jmarantz Sep 2, 2019
e2e384e
format
jmarantz Sep 2, 2019
33c0f88
Hold fewer locks for less time.
jmarantz Sep 2, 2019
8808258
Add test for SymbolTableImpl::getRecentLookups().
jmarantz Sep 2, 2019
8bc22f6
simplified; admin flow almost working
jmarantz Sep 2, 2019
86e9204
admin interface kind of works.
jmarantz Sep 2, 2019
cd83c62
collate fix.
jmarantz Sep 2, 2019
fd3bbbd
report the total lookup count as well.
jmarantz Sep 3, 2019
cfa1634
format and add admin tests.
jmarantz Sep 3, 2019
49dfcb4
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 3, 2019
e8498a2
add function doc to new functions.
jmarantz Sep 3, 2019
312d389
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 3, 2019
491ab32
cleanup
jmarantz Sep 3, 2019
45a4feb
Privatize some functions.
jmarantz Sep 3, 2019
90327f7
add doc.
jmarantz Sep 6, 2019
7b867b3
format
jmarantz Sep 6, 2019
52793db
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 8, 2019
3e9e3a5
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 8, 2019
e9662a0
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 14, 2019
fd23fbe
Add logging and simplify impl slightly by making it untemplated and j…
jmarantz Sep 17, 2019
1bf7773
added speed test.
jmarantz Sep 18, 2019
c949893
switch to lru
jmarantz Sep 18, 2019
14f9eae
cleanup test deque behavior.
jmarantz Sep 18, 2019
56b0fec
format
jmarantz Sep 18, 2019
8cdf5fc
format
jmarantz Sep 18, 2019
9b65b71
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 18, 2019
b920220
Update call-sites to expect the count rather than the time.
jmarantz Sep 19, 2019
2e4f122
update test to reflect we are showing counts rather than date/time.
jmarantz Sep 19, 2019
82c03cd
fix golden result for admin test.
jmarantz Sep 19, 2019
fa447d8
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 20, 2019
cb828f8
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 20, 2019
60c0673
address review comments
jmarantz Sep 20, 2019
9b92a3f
clang tidy and do a better job of collating the aggregated lookup data.
jmarantz Sep 20, 2019
c0ad91d
remove dead local variable.
jmarantz Sep 20, 2019
e08c677
Hook up reset_counters to clear recent_lookups.
jmarantz Sep 23, 2019
b38b505
cover http2 case as well.
jmarantz Sep 23, 2019
e6935fc
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 23, 2019
a8f685f
Construct StatNameSet from SymbolTable, to avoid the 'rememberSet' ha…
jmarantz Sep 23, 2019
88c5dc5
Merge branch 'remember-recent-lookups-lru' into remember-recent-lookups
jmarantz Sep 23, 2019
c536bed
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 23, 2019
67b0f49
switch to lru implementation.
jmarantz Sep 24, 2019
0760e00
Add more admin endpoints to control recent-lookups tracking.
jmarantz Sep 24, 2019
09c948c
cleanups.
jmarantz Sep 25, 2019
25cbe85
more comments.
jmarantz Sep 25, 2019
514fb31
spelling.
jmarantz Sep 25, 2019
c9dddb4
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 25, 2019
5ec6fce
Cleanup LRU and fix asan error.
jmarantz Sep 25, 2019
422d15d
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 25, 2019
049cab3
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 26, 2019
e4ea14e
Merge branch 'master' into remember-recent-lookups
jmarantz Sep 27, 2019
106d75d
Merge branch 'master' into remember-recent-lookups
jmarantz Oct 1, 2019
af6d5de
address remaining comments.
jmarantz Oct 1, 2019
6cdf862
Merge branch 'master' into remember-recent-lookups
jmarantz Oct 14, 2019
9f0b754
Merge branch 'master' into remember-recent-lookups
jmarantz Oct 14, 2019
6bca221
fix builda
jmarantz Oct 14, 2019
14cbbcf
Record the total recent lookups as a counter.
jmarantz Oct 14, 2019
d82b1df
simplify flush-stats impl for recent lookups by switching from counte…
jmarantz Oct 14, 2019
ff84c2c
update doc
jmarantz Oct 14, 2019
e8cb00c
language cleanup.
jmarantz Oct 14, 2019
faf1cfa
add release note.
jmarantz Oct 14, 2019
a70b2de
use correct method name. Awesome that we have a docs consistency chec…
jmarantz Oct 14, 2019
c18f184
quotes vs backquotes
jmarantz Oct 14, 2019
a7bb051
Merge branch 'master' into remember-recent-lookups
jmarantz Oct 16, 2019
f17bcdb
fix typos and add explicit option syntax.
jmarantz Oct 16, 2019
298210c
un-option the reference to use-fake-symbol-table, which is not doc'd.
jmarantz Oct 16, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/root/intro/version_history.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ Version history
* access log: reintroduce :ref:`filesystem <filesystem_stats>` stats and added the `write_failed` counter to track failed log writes
* admin: added ability to configure listener :ref:`socket options <envoy_api_field_config.bootstrap.v2.Admin.socket_options>`.
* admin: added config dump support for Secret Discovery Service :ref:`SecretConfigDump <envoy_api_msg_admin.v2alpha.SecretsConfigDump>`.
* admin: added :http:get:`/stats/recentlookups`, :http:post:`/stats/recentlookups/clear`,
:http:post:`/stats/recentlookups/disable`, and :http:post:`/stats/recentlookups/enable` endpoints.
* api: added ::ref:`set_node_on_first_message_only <envoy_api_field_core.ApiConfigSource.set_node_on_first_message_only>` option to omit the node identifier from the subsequent discovery requests on the same stream.
* buffer filter: the buffer filter populates content-length header if not present, behavior can be disabled using the runtime feature `envoy.reloadable_features.buffer_filter_populate_content_length`.
* config: added support for :ref:`delta xDS <arch_overview_dynamic_config_delta>` (including ADS) delivery
Expand Down
51 changes: 50 additions & 1 deletion docs/root/operations/admin.rst
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,7 @@ modify different aspects of the server:
.. http:post:: /reset_counters

Reset all counters to zero. This is useful along with :http:get:`/stats` during debugging. Note
that this does not drop any data sent to statsd. It just effects local output of the
that this does not drop any data sent to statsd. It just affects local output of the
:http:get:`/stats` command.

.. http:get:: /server_info
Expand Down Expand Up @@ -354,6 +354,55 @@ modify different aspects of the server:
Envoy has updated (counters incremented at least once, gauges changed at least once,
and histograms added to at least once)

.. http:get:: /stats/recentlookups

This endpoint helps Envoy developers debug potential contention
issues in the stats system. Initially, only the count of StatName
lookups is acumulated, not the specific names that are being looked
up. In order to see specific recent requests, you must enable the
feature by POSTing to `/stats/recentlookups/enable`. There may be
approximately 40-100 nanoseconds of added overhead per lookup.

When enabled, this endpoint emits a table of stat names that were
recently accessed as strings by Envoy. Ideally, strings should be
converted into StatNames, counters, gauges, and histograms by Envoy
code only during startup or when receiving a new configuration via
xDS. This is because when stats are looked up as strings they must
take a global symbol table lock. During startup this is acceptable,
but in response to user requests on high core-count machines, this
can cause performance issues due to mutex contention.

This option requires Envoy to be started with `use-fake-symbol-table 0`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you can use an :option ref link here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done; thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I backed this out, as use-fake-symbol-table is not docced, as it was meant to be temporary.


See :repo:`source/docs/stats.md` for more details.

Note also that actual mutex contention can be tracked via :http:get:`/contention`.

.. http:post:: /stats/recentlookups/enable

Turns on collection of recent lookup of stat-names, thus enabling
`/stats/recentlookups`.

See :repo:`source/docs/stats.md` for more details.

.. http:post:: /stats/recentlookups/disable

Turns off collection of recent lookup of stat-names, thus disabling
`/stats/recentlookups`. It also clears the list of lookups. However,
the total count, visible as stat `server.stats_recent_lookups`, is
not cleared, and continues to accumulate.

See :repo:`source/docs/stats.md` for more details.

.. http:post:: /stats/recentlookups/clear

Clears all outstanding lookups and counts. If called when recent lookup
collection is enabled, this clears all the, but collection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, and I simplified this, as I don't think it adds anything to the paragraph to discuss the two different cases.

commit coming.

continues. If called when recent lookup collection is disabled,
there is no effect, as disabling collection clears the data.

See :repo:`source/docs/stats.md` for more details.

.. _operations_admin_interface_runtime:

.. http:get:: /runtime
Expand Down
26 changes: 26 additions & 0 deletions include/envoy/stats/symbol_table.h
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,32 @@ class SymbolTable {
virtual void callWithStringView(StatName stat_name,
const std::function<void(absl::string_view)>& fn) const PURE;

using RecentLookupsFn = std::function<void(absl::string_view, uint64_t)>;

/**
* Calls the provided function with the name of the most recently looked-up
* symbols, including lookups on any StatNameSets, and with a count of
* the recent lookups on that symbol.
*
* @param iter the function to call for every recent item.
*/
virtual uint64_t getRecentLookups(const RecentLookupsFn& iter) const PURE;

/**
* Clears the recent-lookups structures.
*/
virtual void clearRecentLookups() PURE;

/**
* Sets the recent-lookup capacity.
*/
virtual void setRecentLookupCapacity(uint64_t capacity) PURE;

/**
* @return The configured recent-lookup tracking capacity.
*/
virtual uint64_t recentLookupCapacity() const PURE;

/**
* Creates a StatNameSet.
*
Expand Down
1 change: 1 addition & 0 deletions source/common/stats/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,7 @@ envoy_cc_library(
hdrs = ["symbol_table_impl.h"],
external_deps = ["abseil_base"],
deps = [
":recent_lookups_lib",
"//include/envoy/stats:symbol_table_interface",
"//source/common/common:assert_lib",
"//source/common/common:logger_lib",
Expand Down
4 changes: 4 additions & 0 deletions source/common/stats/fake_symbol_table_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,10 @@ class FakeSymbolTableImpl : public SymbolTable {
return StatNameSetPtr(new StatNameSet(*this, name));
}
void forgetSet(StatNameSet&) override {}
uint64_t getRecentLookups(const RecentLookupsFn&) const override { return 0; }
void clearRecentLookups() override {}
void setRecentLookupCapacity(uint64_t) override {}
uint64_t recentLookupCapacity() const override { return 0; }

private:
absl::string_view toStringView(const StatName& stat_name) const {
Expand Down
104 changes: 100 additions & 4 deletions source/common/stats/symbol_table_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,7 @@ void SymbolTableImpl::addTokensToEncoding(const absl::string_view name, Encoding
// ref-counts in this.
{
Thread::LockGuard lock(lock_);
recent_lookups_.lookup(name);
for (auto& token : tokens) {
symbols.push_back(toSymbol(token));
}
Expand Down Expand Up @@ -207,16 +208,94 @@ void SymbolTableImpl::free(const StatName& stat_name) {
}
}

StatNameSetPtr SymbolTableImpl::makeSet(absl::string_view name) {
uint64_t SymbolTableImpl::getRecentLookups(const RecentLookupsFn& iter) const {
uint64_t total = 0;
absl::flat_hash_map<std::string, uint64_t> name_count_map;

// We don't want to hold stat_name_set_mutex while calling the iterator, so
// buffer lookup_data.
{
Thread::LockGuard lock(stat_name_set_mutex_);
for (StatNameSet* stat_name_set : stat_name_sets_) {
total +=
stat_name_set->getRecentLookups([&name_count_map](absl::string_view str, uint64_t count) {
name_count_map[std::string(str)] += count;
});
}
}

// We also don't want to hold lock_ while calling the iterator, but we need it
// to access recent_lookups_.
{
Thread::LockGuard lock(lock_);
recent_lookups_.forEach(
[&name_count_map](absl::string_view str, uint64_t count)
NO_THREAD_SAFETY_ANALYSIS { name_count_map[std::string(str)] += count; });
total += recent_lookups_.total();
}

// Now we have the collated name-count map data: we need to vectorize and
// sort. We define the pair with the count first as std::pair::operator<
// prioritizes its first element over its second.
using LookupCount = std::pair<uint64_t, absl::string_view>;
std::vector<LookupCount> lookup_data;
lookup_data.reserve(name_count_map.size());
for (const auto& iter : name_count_map) {
lookup_data.emplace_back(LookupCount(iter.second, iter.first));
}
std::sort(lookup_data.begin(), lookup_data.end());
for (const LookupCount& lookup_count : lookup_data) {
iter(lookup_count.second, lookup_count.first);
}
return total;
}

void SymbolTableImpl::setRecentLookupCapacity(uint64_t capacity) {
{
Thread::LockGuard lock(stat_name_set_mutex_);
for (StatNameSet* stat_name_set : stat_name_sets_) {
stat_name_set->setRecentLookupCapacity(capacity);
}
}

{
Thread::LockGuard lock(lock_);
recent_lookups_.setCapacity(capacity);
}
}

void SymbolTableImpl::clearRecentLookups() {
{
Thread::LockGuard lock(stat_name_set_mutex_);
for (StatNameSet* stat_name_set : stat_name_sets_) {
stat_name_set->clearRecentLookups();
}
}
{
Thread::LockGuard lock(lock_);
recent_lookups_.clear();
}
}

uint64_t SymbolTableImpl::recentLookupCapacity() const {
Thread::LockGuard lock(lock_);
// make_unique does not work with private ctor, even though FakeSymbolTableImpl is a friend.
return recent_lookups_.capacity();
}

StatNameSetPtr SymbolTableImpl::makeSet(absl::string_view name) {
const uint64_t capacity = recentLookupCapacity();
// make_unique does not work with private ctor, even though SymbolTableImpl is a friend.
StatNameSetPtr stat_name_set(new StatNameSet(*this, name));
stat_name_sets_.insert(stat_name_set.get());
stat_name_set->setRecentLookupCapacity(capacity);
{
Thread::LockGuard lock(stat_name_set_mutex_);
stat_name_sets_.insert(stat_name_set.get());
}
return stat_name_set;
}

void SymbolTableImpl::forgetSet(StatNameSet& stat_name_set) {
Thread::LockGuard lock(lock_);
Thread::LockGuard lock(stat_name_set_mutex_);
stat_name_sets_.erase(&stat_name_set);
}

Expand Down Expand Up @@ -484,10 +563,27 @@ StatName StatNameSet::getDynamic(absl::string_view token) {
Stats::StatName& stat_name_ref = dynamic_stat_names_[token];
if (stat_name_ref.empty()) { // Note that builtin_stat_names_ already has one for "".
stat_name_ref = pool_.add(token);
recent_lookups_.lookup(token);
}
return stat_name_ref;
}
}

uint64_t StatNameSet::getRecentLookups(const RecentLookups::IterFn& iter) const {
absl::MutexLock lock(&mutex_);
recent_lookups_.forEach(iter);
return recent_lookups_.total();
}

void StatNameSet::clearRecentLookups() {
absl::MutexLock lock(&mutex_);
recent_lookups_.clear();
}

void StatNameSet::setRecentLookupCapacity(uint64_t capacity) {
absl::MutexLock lock(&mutex_);
recent_lookups_.setCapacity(capacity);
}

} // namespace Stats
} // namespace Envoy
28 changes: 26 additions & 2 deletions source/common/stats/symbol_table_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
#include "common/common/stack_array.h"
#include "common/common/thread.h"
#include "common/common/utility.h"
#include "common/stats/recent_lookups.h"

#include "absl/container/flat_hash_map.h"
#include "absl/strings/str_join.h"
Expand Down Expand Up @@ -161,6 +162,10 @@ class SymbolTableImpl : public SymbolTable {

StatNameSetPtr makeSet(absl::string_view name) override;
void forgetSet(StatNameSet& stat_name_set) override;
uint64_t getRecentLookups(const RecentLookupsFn&) const override;
void clearRecentLookups() override;
void setRecentLookupCapacity(uint64_t capacity) override;
uint64_t recentLookupCapacity() const override;

private:
friend class StatName;
Expand All @@ -176,6 +181,9 @@ class SymbolTableImpl : public SymbolTable {
// This must be held during both encode() and free().
mutable Thread::MutexBasicLockable lock_;

// This must be held while updating stat_name_sets_.
mutable Thread::MutexBasicLockable stat_name_set_mutex_;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note we do need a separate mutex for this or we get deadlock detection triggered by absl when calling getDynamic().


/**
* Decodes a vector of symbols back into its period-delimited stat name. If
* decoding fails on any part of the symbol_vec, we release_assert and crash
Expand Down Expand Up @@ -241,7 +249,9 @@ class SymbolTableImpl : public SymbolTable {
// TODO(ambuc): There might be an optimization here relating to storing ranges of freed symbols
// using an Envoy::IntervalSet.
std::stack<Symbol> pool_ GUARDED_BY(lock_);
absl::flat_hash_set<StatNameSet*> stat_name_sets_ GUARDED_BY(lock_);
RecentLookups recent_lookups_ GUARDED_BY(lock_);

absl::flat_hash_set<StatNameSet*> stat_name_sets_ GUARDED_BY(stat_name_set_mutex_);
};

/**
Expand Down Expand Up @@ -709,19 +719,33 @@ class StatNameSet {
return pool_.add(str);
}

/**
* Clears recent lookups.
*/
void clearRecentLookups();

/**
* Sets the number of names recorded in the recent-lookups set.
*
* @param capacity the capacity to configure.
*/
void setRecentLookupCapacity(uint64_t capacity);

private:
friend class FakeSymbolTableImpl;
friend class SymbolTableImpl;

StatNameSet(SymbolTable& symbol_table, absl::string_view name);
uint64_t getRecentLookups(const RecentLookups::IterFn& iter) const;

const std::string name_;
Stats::SymbolTable& symbol_table_;
Stats::StatNamePool pool_ GUARDED_BY(mutex_);
absl::Mutex mutex_;
mutable absl::Mutex mutex_;
using StringStatNameMap = absl::flat_hash_map<std::string, Stats::StatName>;
StringStatNameMap builtin_stat_names_;
StringStatNameMap dynamic_stat_names_ GUARDED_BY(mutex_);
RecentLookups recent_lookups_ GUARDED_BY(mutex_);
};

} // namespace Stats
Expand Down
Loading