
Reduce memory usage when rendering markdown #20326

Closed
lunny wants to merge 9 commits from the lunny/performance_renderer branch

Conversation

@lunny (Member) commented on Jul 12, 2022:

This PR removes at least two whole in-memory copies of the markdown content when rendering a document.

@lunny lunny added the performance/memory Performance issues affecting memory use label Jul 12, 2022
@lunny lunny added this to the 1.18.0 milestone Jul 12, 2022
@lunny lunny requested a review from zeripath July 12, 2022 05:43
@delvh (Member) left a comment:

It took me (surprisingly) way too long to understand where we reduce the memory usage, until I noticed it...

Two review comments on modules/markup/html.go (outdated, resolved).
@GiteaBot GiteaBot added the lgtm/need 1 This PR needs approval from one additional maintainer to be merged. label Jul 12, 2022
@lunny (Member, Author) commented on Jul 12, 2022:

> It took me (surprisingly) way too long to understand where we reduce the memory usage, until I noticed it...

Not only that, I also removed rawHTML, err := io.ReadAll(input), which also copied the whole content. That means there were previously at least two copies of the content in memory, and now they are gone.
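
To illustrate the shape of that change (a minimal sketch with hypothetical function names, not the actual Gitea diff): dropping the io.ReadAll call removes one full in-memory copy of the content, because later stages can read from the io.Reader directly.

package main

import (
	"io"
	"os"
	"strings"
)

// Hypothetical "before": the whole document is materialised as one extra copy.
func renderCopying(input io.Reader, output io.Writer) error {
	rawHTML, err := io.ReadAll(input) // full copy of the content in memory
	if err != nil {
		return err
	}
	_, err = output.Write(rawHTML)
	return err
}

// Hypothetical "after": no io.ReadAll; the content is streamed through instead.
func renderStreaming(input io.Reader, output io.Writer) error {
	_, err := io.Copy(output, input) // small internal buffer, never the whole document
	return err
}

func main() {
	_ = renderCopying(strings.NewReader("# some markdown"), os.Stdout)
	_ = renderStreaming(strings.NewReader("# some markdown"), os.Stdout)
}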

@delvh (Member) commented on Jul 12, 2022:

rawHTML, err := io.ReadAll(input) - yeah, that was the thing that took me so long.
But what is the other reduction I missed?
The replaceAll(buffer) vs replaceAll(input)?

@lunny (Member, Author) commented on Jul 12, 2022:

> rawHTML, err := io.ReadAll(input) - yeah, that was the thing that took me so long. But what is the other reduction I missed?

It's the difference in memory usage between converting the whole content at once and converting it piece by piece: when an io.Reader is used, only about 1 KB or 2 KB needs to be held in memory at a time while reading through the whole content.

> The replaceAll(buffer) vs replaceAll(input)?

Yes
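
A minimal sketch of that idea (a stand-in, not the actual renderer code): when post-processing reads from an io.Reader through a small fixed buffer, only a window of roughly 2 KB of the document is resident in this stage at any one time.

package main

import (
	"io"
	"os"
	"strings"
)

// streamProcess is a stand-in for a streaming post-processing step: it reads
// the input through a ~2 KB buffer and forwards each window to the output, so
// this stage never holds the whole document in memory.
func streamProcess(input io.Reader, output io.Writer) error {
	buf := make([]byte, 2*1024)
	for {
		n, err := input.Read(buf)
		if n > 0 {
			// a real implementation would transform buf[:n] here
			if _, werr := output.Write(buf[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	_ = streamProcess(strings.NewReader(strings.Repeat("a", 10*1024)), os.Stdout)
}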

The code under review in this thread:

}
n = copy(bs, tagCleaner.ReplaceAll([]byte(nulCleaner.Replace(string(original[:n]))), []byte("&lt;$1")))
A Contributor commented on this code:

I do not think this is correct. The content in the original buffer is incomplete, so tagCleaner may not work correctly. And tagCleaner is quite a complex regexp; I am not sure whether it works for incomplete content.
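
A small illustration of that concern, using a simplified hypothetical pattern rather than Gitea's real tagCleaner: applied buffer by buffer, a tag split across a buffer boundary is not matched, while the same regexp applied to the complete content catches it.

package main

import (
	"fmt"
	"regexp"
)

// simplifiedCleaner is a hypothetical stand-in for tagCleaner; the real
// expression is considerably more complex.
var simplifiedCleaner = regexp.MustCompile(`<(script)`)

func main() {
	full := "hello <script>alert(1)</script> world"

	// Applied to the complete content, the tag is found and escaped.
	fmt.Println(simplifiedCleaner.ReplaceAllString(full, "&lt;$1"))
	// hello &lt;script>alert(1)</script> world

	// Applied chunk by chunk, the tag split across the boundary is missed.
	chunk1, chunk2 := full[:9], full[9:] // "hello <sc" | "ript>alert(1)..."
	fmt.Println(simplifiedCleaner.ReplaceAllString(chunk1, "&lt;$1") +
		simplifiedCleaner.ReplaceAllString(chunk2, "&lt;$1"))
	// hello <script>alert(1)</script> world  (unescaped)
}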

The Contributor added:

And one more thing: after the replacement, the returned string may be longer than before, so it may overflow the bs buffer.
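
A second illustration under the same simplifying assumptions (a hypothetical pattern, not the real tagCleaner): escaping grows the content, so the result of ReplaceAll can be longer than the chunk that was read. In Go, copy then stops at len(bs) rather than writing past the buffer, so the tail of the processed output is silently dropped, which is still a correctness problem.

package main

import (
	"fmt"
	"regexp"
)

// hypothetical stand-in pattern; escaping "<script" to "&lt;script" adds bytes
var simplifiedCleaner = regexp.MustCompile(`<(script)`)

func main() {
	bs := make([]byte, 16)                  // fixed-size destination buffer (size chosen for the demo)
	original := []byte("<script><script>") // 16 bytes that were read into it
	n := len(original)

	out := simplifiedCleaner.ReplaceAll(original[:n], []byte("&lt;$1"))
	fmt.Println(len(original), len(out)) // 16 22: the replaced content no longer fits

	n = copy(bs, out) // copy stops at len(bs); the tail of the output is lost
	fmt.Println(n, string(bs[:n]))
	// 16 &lt;script>&lt;s
}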

@Gusted (Contributor) commented on Jul 12, 2022:

The reduced memory usage only holds for larger inputs.

1024 bytes (this PR uses more memory):

-> % benchstat old.bench new.bench 
name            old time/op    new time/op    delta
PostProcess-12     199µs ± 2%     206µs ± 3%   +3.56%  (p=0.008 n=5+5)

name            old alloc/op   new alloc/op   delta
PostProcess-12    18.7kB ± 0%    22.1kB ± 0%  +17.72%  (p=0.008 n=5+5)

name            old allocs/op  new allocs/op  delta
PostProcess-12      27.0 ± 0%      29.0 ± 0%   +7.41%  (p=0.008 n=5+5)

1024 * 8 bytes (this PR uses less memory):

-> % benchstat old.bench new.bench
name            old time/op    new time/op    delta
PostProcess-12    1.76ms ± 4%    1.72ms ± 3%     ~     (p=0.310 n=5+5)

name            old alloc/op   new alloc/op   delta
PostProcess-12     119kB ± 0%      92kB ± 0%  -22.57%  (p=0.008 n=5+5)

name            old allocs/op  new allocs/op  delta
PostProcess-12      35.0 ± 0%      36.0 ± 0%   +2.86%  (p=0.008 n=5+5)

And for reference, 1024 * 128 bytes:

-> % benchstat old.bench new.bench
name            old time/op    new time/op    delta
PostProcess-12    27.1ms ± 6%    26.5ms ± 2%     ~     (p=0.421 n=5+5)

name            old alloc/op   new alloc/op   delta
PostProcess-12    2.01MB ± 0%    1.45MB ± 0%  -27.94%  (p=0.016 n=4+5)

name            old allocs/op  new allocs/op  delta
PostProcess-12      52.0 ± 2%      62.2 ± 2%  +19.62%  (p=0.008 n=5+5)

Bench code:

diff --git a/bench/bench_test.go b/bench/bench_test.go
new file mode 100644
index 000000000..33d9279cc
--- /dev/null
+++ b/bench/bench_test.go
@@ -0,0 +1,17 @@
+package bench
+
+import (
+       "io"
+       "strings"
+       "testing"
+
+       "code.gitea.io/gitea/modules/markup"
+)
+
+func BenchmarkPostProcess(b *testing.B) {
+       input := strings.Repeat("a", 1024)
+       b.ReportAllocs()
+       for i := 0; i < b.N; i++ {
+               markup.PostProcess(&markup.RenderContext{}, strings.NewReader(input), io.Discard)
+       }
+}
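
For reference, numbers in this form can be reproduced by running the benchmark several times on each branch, for example go test -run '^$' -bench PostProcess -count 5 redirected to old.bench and new.bench respectively, and then comparing the two files with benchstat old.bench new.bench (benchstat comes from golang.org/x/perf/cmd/benchstat).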

@6543 6543 self-requested a review July 12, 2022 13:05
@lunny lunny added the pr/wip This PR is not ready for review label Jul 12, 2022
@6543 (Member) commented on Jul 13, 2022:

☝️ It would be nice to also add this benchmark to this pull request.

@lunny (Member, Author) commented on Dec 11, 2022:

Closed, as it's not always correct.

@lunny lunny closed this Dec 11, 2022
@lunny lunny deleted the lunny/performance_renderer branch December 11, 2022 13:36
@lunny lunny removed this from the 1.19.0 milestone Dec 11, 2022
lunny added a commit that referenced this pull request Dec 12, 2022
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023