
A lot of time is spent determining the size of the pending/queued transaction pool #16717

Closed
veox opened this issue May 9, 2018 · 9 comments

@veox
Contributor

veox commented May 9, 2018

EDIT: Original title: A lot of time is spent determining the length of txSortedMap

System information

Geth version: Geth/v1.8.7-stable-66432f38/linux-amd64/go1.10
OS & Version: Ubuntu 16.04.4 LTS (Xenial Xerus)

Background

I've been trying to do txpool analysis, retaining as much of the pool as possible (e.g. running with 32768 slots). My assumption was that this would mostly require copious amounts of memory, but it seemed to increase CPU load, too.

I've since dropped the slot count to 8192 and run some profiling. (See the link at the very bottom.)

Possible optimisation?..

In core/tx_pool.go, TxPool's promoteExecutables() has a lot of nested loops that iterate over various lists of transactions.

Calls are made to txSortedMap's Len() (defined in core/tx_list.go); the structure does not seem to keep a dedicated count field, so the total is determined on the fly.

In my tests, 5-10% of a "production" node's time is spent determining the length of these maps. This seems to hold for almost any txpool settings, unless the pool is severely restricted via txpool.pricelimit.

(BTW, the default for that is 1 wei, which also contributes to the issue: the txpool is pretty much guaranteed to be filled up immediately by "default policy" peers, and to remain in a constant state of churn from then on.)

Since this probably can't be avoided, perhaps it can be optimised for?..
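
For context, the hot pattern being described looks roughly like this (a paraphrase, not a verbatim quote; the summation loop is the one quoted further down, and the GlobalSlots comparison is the check it feeds):

// Paraphrased from promoteExecutables() in core/tx_pool.go (v1.8.x),
// surrounding code trimmed: the pending total is re-derived by walking
// every per-account list and calling its Len().
pending := uint64(0)
for _, list := range pool.pending {
	pending += uint64(list.Len())
}
if pending > pool.config.GlobalSlots {
	// ... evict transactions from over-allocated accounts ...
}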

Steps to reproduce the behaviour

Run geth with something like:

/usr/bin/geth --metrics \
    --datadir /home/geth/.ethereum \
    --cache 3072 \
    --txpool.pricelimit 31337000 --txpool.accountslots 4 --txpool.globalslots 8192 --txpool.accountqueue 4 \
    --lightserv 50 --lightpeers 1000 --maxpeers 1025

In console:

debug.cpuProfile('/home/geth/fiddled.prof', 3600)

Generate SVG:

go tool pprof fiddled.prof
(pprof) svg > fiddled.svg
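
(The SVG can also be produced non-interactively, assuming a pprof build that accepts output-format flags:)

go tool pprof -svg fiddled.prof > fiddled.svg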

CPU profiling graph

https://veox.pw/dump/fiddled.svg

Note: this was intentionally timed to a window with little DB flushing.


Cc: @AlexeyAkhunov - not sure if you're mostly interested in sync optimisation, or in runtime as well.

@AlexeyAkhunov
Contributor

Looks like the Len() function is getting called quite a lot on the elements of pool.pending. It looks like pool.pending is not mutated inside promoteExecutables(), so a simple assignment to a variable should do the job.
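
A rough sketch of that suggestion (just the shape of it, not a patch; the pendingLens/pendingTotal names are made up):

// Sketch only: cache the per-account lengths and the running total once
// at the top of promoteExecutables(), since pool.pending is not mutated
// while the nested loops run.
pendingLens := make(map[common.Address]int, len(pool.pending))
pendingTotal := uint64(0)
for addr, list := range pool.pending {
	n := list.Len()
	pendingLens[addr] = n
	pendingTotal += uint64(n)
}
// ... later, consult pendingLens[addr] / pendingTotal instead of
// re-invoking list.Len() inside the nested loops.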

@ryanschneider
Contributor

This seems odd: txSortedMap.Len is just a wrapper around len(m.items), which appears to be a plain value lookup on h.count in the Go runtime:

https://golang.org/src/runtime/hashmap.go#L1232

My guess is it's actually the number of operations done on pool.pending, rather than the Len calls themselves.
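
For reference, the wrapper in core/tx_list.go is (approximately) just:

// Len returns the number of transactions in the sorted map.
func (m *txSortedMap) Len() int {
	return len(m.items)
}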

@veox
Contributor Author

veox commented May 10, 2018

I've done a bit of debug-print digging; most invocations are indeed made when establishing the pending map's total length - here:

for _, list := range pool.pending {
    pending += uint64(list.Len())
}

pool.pending is a hash map:

pending map[common.Address]*txList

mapping to txLists of likely different sizes.

Perhaps it is pool.pending that could benefit from a pool.pendingcount?..
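
To make that concrete, a hypothetical sketch (the pendingCount field is made up, and the real code has many more sites that mutate pool.pending than shown): keep a running counter next to the map and adjust it wherever a transaction enters or leaves a pending list, so the total becomes an O(1) read.

type TxPool struct {
	// ... existing fields ...
	pending      map[common.Address]*txList
	pendingCount uint64 // hypothetical: running total of txs across all pending lists
}

// Every site that inserts into a pending list would do pool.pendingCount++,
// every site that drops from one would do pool.pendingCount--, and the
// global-slots check in promoteExecutables() would become:
//
//	if pool.pendingCount > pool.config.GlobalSlots {
//		// evict as before
//	}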


EDIT: This section seems related to PR #16720, which might have made it obsolete.

The profiling graph doesn't show this, but the addTxs -> addTxsLocked -> promoteExecutables call path gets hit a lot with accounts passed in one by one, although this passage in addTxsLocked suggests that's not the only supported mode of operation:

addrs := make([]common.Address, 0, len(dirty))
for addr := range dirty {
	addrs = append(addrs, addr)
}
pool.promoteExecutables(addrs)

In other words, addTxsLocked receives a txs argument that often covers just one account, although the function is capable of processing lists of transactions from many accounts.
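
For illustration, the difference between the two call patterns (using the public AddRemotes entry point; error handling elided):

// One addTxs -> addTxsLocked -> promoteExecutables pass per transaction,
// roughly what the profile shows when peers deliver txs one at a time:
for _, tx := range txs {
	pool.AddRemotes([]*types.Transaction{tx})
}

// One pass covering every dirtied account at once:
pool.AddRemotes(txs)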

@veox veox changed the title A lot of time is spent determining the length of txSortedMap A lot of time is spent determining the size of the pending transaction pool May 10, 2018
@veox
Contributor Author

veox commented May 10, 2018

P.S. Although in current network conditions this manifests mostly with pool.pending, it is likely that with an enormous txpool it would be visible instead with pool.queue.

The crux of the issue is that these are "hash maps of (non-homogeneous) lists", and their size is determined by iterating over the elements.
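
That is, a built-in len() on the outer map is cheap but counts the wrong thing (sketch; the log.Info call is just illustrative):

accounts := len(pool.pending) // O(1), but counts accounts only
txs := uint64(0)              // counting transactions needs the walk:
for _, list := range pool.pending {
	txs += uint64(list.Len())
}
log.Info("txpool size", "accounts", accounts, "transactions", txs)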

@veox veox changed the title A lot of time is spent determining the size of the pending transaction pool A lot of time is spent determining the size of the pending/queued transaction pool May 10, 2018
@stale

stale bot commented May 30, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@veox
Contributor Author

veox commented May 30, 2019

AFAIK nothing has changed between v1.8.7 and v1.8.27: pool.pending is still iterated over, and it's still a hash map keyed by common.Address, with variable-length lists as values.

I haven't run profiling on v1.8.27, but would guess the inefficiency is still present.

@karalabe
Member

karalabe commented Nov 7, 2019

The 1.9 release family changed the way transactions are shuffled within the pool from a synchronous pass after every insert to an async one that runs in the background. This means that under high churn, the shuffling (and inherently the iterating) will run much more rarely. Could you perhaps check whether the latest release still exhibits this behavior? My initial guess would be that it does not.

@veox
Contributor Author

veox commented Nov 15, 2019

I'm not able to replicate the experiment completely (the machine has been "beefed up" resource-wise). Below is with geth v1.9.6.

However, the situation seems to have gotten "worse". Running:

/usr/bin/geth --nousb --datadir /home/geth/.ethereum \
    --v5disc --cache 6144 \
    --txpool.pricelimit 31337000 --txpool.globalslots 8192 \
    --txpool.accountslots 4 --txpool.accountqueue 4 \
    --ethstats "veox-geth-lightserv:$ETHSTATS_SECRET@ethstats.net" \
    --light.serve 100 --light.maxpeers 475 --maxpeers 500 \
    --metrics

Notice the inclusion of ethstats, which constantly queries the size of the txpool. txpool.pricelimit is the same as in the original experiment, so as to guarantee constant churn.

Profiling graph: https://veox.pw/dump/fiddled-v1.9.6.svg

It's now at 18.93% (another round shows 19.20%, another one 18.00%). Most likely that's an indicator of how much everything else has improved...


I still see no solution for this other than to have a len counter on txSortedMap, incrementing it as transactions are added and decrementing it as they are removed.

But that is patently unfriendly to multi-processing, and I don't know the proper Go approach. :(
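
For what it's worth, one common Go idiom for this (a sketch only, not a patch against geth; names are made up) is to guard the counter with sync/atomic. If all mutations already happen under the pool's existing mutex, a plain field updated under that lock would also do; atomics only matter for lock-free reads.

import "sync/atomic"

// Hypothetical counter, maintained alongside pool.pending:
var pendingCount int64

func onPromote(n int) { atomic.AddInt64(&pendingCount, int64(n)) }   // txs added to pending
func onDemote(n int)  { atomic.AddInt64(&pendingCount, -int64(n)) }  // txs removed from pending
func pendingLen() int64 { return atomic.LoadInt64(&pendingCount) }   // O(1), race-free read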

@holiman
Contributor

holiman commented Sep 10, 2020

We've made a lot of improvements lately:

I believe this is mostly fixed now, so closing. Please open a new ticket if this problem is still present.

@holiman holiman closed this as completed Sep 10, 2020