
Support handle invalid logs #19

Merged
merged 20 commits into from
Jul 10, 2020

Conversation

baurine
Contributor

@baurine baurine commented Jul 3, 2020

close #17, close #16

What this PR does:

  • Support handling log files that include invalid logs at the beginning and end: try at most 10 lines to find the first valid log.
  • Support handling log files that include invalid logs in the middle: give the invalid log the previous log's time and level, with the whole line content as the message.
  • Fix the bug that read incorrect content from the end of the log file when another writer appends new content to the file.
  • Improve the performance of reading from the end of the log file: read 512 bytes at a time instead of 1 byte.

@baurine baurine marked this pull request as draft July 3, 2020 15:45
@baurine baurine force-pushed the support-handle-illegal-logs branch from ba6e2e8 to dab3e40 Compare July 3, 2020 15:57
@baurine baurine force-pushed the support-handle-illegal-logs branch from f82c3b0 to e1084ad Compare July 6, 2020 05:45
@baurine baurine marked this pull request as ready for review July 6, 2020 06:19
@baurine baurine changed the title [WIP] Support handle illegal logs Support handle illegal logs Jul 6, 2020
@baurine
Contributor Author

baurine commented Jul 6, 2020

@breeswish @crazycs520 @lonng PTAL, thanks!

search_log.go Outdated
@@ -136,20 +175,22 @@ func readLine(reader *bufio.Reader) (string, error) {
return string(line), nil
}

func readLastLine(file *os.File) string {
// Read a line from the end of a file.
// TODO: reading byte by byte is inefficient; we could improve it, for example, by reading 1024 bytes at a time
Contributor Author

Can we refine it this way? (Not sure whether it has any pitfalls.)

Collaborator

Sure, we can add a TODO here.

search_log.go Outdated
stat, _ := file.Stat()
filesize := stat.Size()
for {
cursor -= 1
cursor--
file.Seek(cursor, io.SeekEnd)
Contributor Author

@baurine baurine Jul 6, 2020

What if new logs are appended to the file after it is opened? When calling file.Seek(0, io.SeekEnd), does it work the same as file.Seek(filesize, io.SeekStart)?

Contributor

Nice catch, I think we should add some tests for these scenarios.

Contributor Author

OK, I will do some tests to check it.

Contributor Author

Hi @lonng, I added a test method TestReadAndAppendLogFile to test this case. It seems we read an unexpected result from the end when new content is being appended to the file. But we can try to resolve it in another PR.

(screenshot attached: WeCom 20200707120345)

Contributor Author

Will continue to fix it in this PR.

Contributor Author

Fixed it with file.Seek(cursor, io.SeekStart).

@baurine baurine changed the title Support handle illegal logs Support handle invalid logs Jul 6, 2020
@breezewish
Member

Support handling log files that include invalid logs at the beginning and end; skip them until a valid log is found.

Maybe we can set a threshold for this behaviour, i.e. look ahead at most xxx bytes, in case the whole big file is invalid logs, causing performance degradation.

Support handling log files that include invalid logs in the middle; treat the invalid log with the previous log's time, an unknown log level, and the whole line content as the message.

Maybe we can keep the same log time and log level as the previous one.

@baurine
Contributor Author

baurine commented Jul 6, 2020

Support handling log files that include invalid logs at the beginning and end; skip them until a valid log is found.

Maybe we can set a threshold for this behaviour, i.e. look ahead at most xxx bytes, in case the whole big file is invalid logs, causing performance degradation.

There is a tryLines parameter to control the maximum number of lines to read, but currently tryLines is set to 0, which means it may read all lines, and there is no way to configure it from outside. I will make the default value 10 lines; how about that?

Support handling log files that include invalid logs in the middle; treat the invalid log with the previous log's time, an unknown log level, and the whole line content as the message.

Maybe we can keep the same log time and log level as the previous one.

OK, updated it.
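The tryLines guard discussed above could be sketched as follows (hypothetical helper names: isValidLog stands in for the real log-line parser and only checks the timestamp prefix):

```go
package main

import (
	"fmt"
	"strings"
)

// isValidLog is a hypothetical stand-in for the real parser: here it only
// checks for the "[20..]" timestamp prefix used by TiDB-style log lines.
func isValidLog(line string) bool {
	return strings.HasPrefix(line, "[20")
}

// findFirstValidLog scans at most tryLines lines for the first valid log;
// tryLines == 0 means no limit (the behaviour this PR changes to a default of 10).
func findFirstValidLog(lines []string, tryLines int) (int, bool) {
	for i, line := range lines {
		if tryLines > 0 && i >= tryLines {
			break // threshold reached: give up instead of scanning the whole file
		}
		if isValidLog(line) {
			return i, true
		}
	}
	return -1, false
}

func main() {
	lines := []string{"garbage", "more garbage", "[2020/07/06 06:00:17.011 -04:00] [INFO] ok"}
	idx, ok := findFirstValidLog(lines, 10)
	fmt.Println(idx, ok) // 2 true
}
```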

search_log.go Outdated
stat, _ := file.Stat()
filesize := stat.Size()
fmt.Println("current filesize:", filesize)
Contributor Author

Will remove it later

fmt.Println("final content:\n", string(content))
}

// result: (may be different every time)
Contributor Author

Will remove later


stat, _ := file.Stat()
filesize := stat.Size()
fmt.Println("initial filesize: ", filesize)
Collaborator

remove this?

line := fmt.Sprintf("[2020/07/07 06:%d:17.011 -04:00] [INFO] appended logs\n", i)
_, err := file_append.WriteString(line)
c.Assert(err, IsNil, Commentf("write %s failed", line))
fmt.Print("write line: ", line)
Collaborator

ditto?

@@ -515,3 +534,102 @@ func (s *searchLogSuite) TestParseLogItem(c *C) {
c.Assert(item.Message, Equals, cas.message)
}
}

func (s *searchLogSuite) TestReadAndAppendLogFile(c *C) {
Collaborator

This test output is not expected. I think we need to re-implement the readLineReverse function.

Contributor Author

Yep, I decided to continue to re-implement it in this PR.

Contributor Author

Done!

- reading from the file with io.SeekEnd reads incorrect content when other threads append logs

cursor--
file.Seek(cursor, io.SeekStart)
Contributor Author

Replaced file.Seek(cursor, io.SeekEnd) with file.Seek(cursor, io.SeekStart); the former may read incorrect content from the end of a file when others append new logs to it.

@baurine
Contributor Author

baurine commented Jul 7, 2020

Addressed all comments and polished the PR with the following work:

  • Fixed the bug that read incorrect content from the end of the log file when another writer appends new content to the file.
  • Improved the performance of reading from the end of the log file: read 512 bytes at a time instead of 1 byte.

@breeswish @lonng @crazycs520 PTAL, thanks!

if cursor == -filesize { // stop if we are at the beginning
break

var size int64 = 512
Contributor Author

@baurine baurine Jul 7, 2020

Read 512 bytes once instead of 1 byte.

Member

I like this change very much, would you like to write a benchmark for this optimization?

Contributor Author

Sure, let me try it.

Contributor Author

@baurine baurine Jul 8, 2020

Comparison:

Old way:
searchLogSuite.BenchmarkReadLastLine        10000            124423 ns/op
searchLogSuite.BenchmarkReadLastLine        10000            126135 ns/op

New way:
searchLogSuite.BenchmarkReadLastLines      1000000              1892 ns/op
searchLogSuite.BenchmarkReadLastLines      1000000              1920 ns/op

Conclusion:

The new implementation is about 65x (126135/1920) as fast as the old one. (The result depends on the length of the last line; in this case the last line is 76 bytes long. If we double the last line's length, the old way's cost doubles too; see details in PR #20.)

search_log.go Outdated
break
}
endCursor -= int64(len(lines))
linesSlice := strings.FieldsFunc(lines, func(c rune) bool { return c == '\n' || c == '\r' })
Member

Maybe there will be problems when the line ending is \r\n.

Contributor Author

Yep, I thought about that; let me try to polish it.

Contributor Author

Polished it with strings.Split(strings.ReplaceAll(finalStr, "\r\n", "\n"), "\n") and moved it inside the readLastLines method.
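The normalization can be checked in isolation (a minimal sketch of just this step; splitLines is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines folds "\r\n" into "\n" before splitting, so a CRLF pair never
// produces a phantom empty element, while genuinely empty lines are preserved
// (strings.FieldsFunc would have dropped them, breaking byte accounting).
func splitLines(finalStr string) []string {
	return strings.Split(strings.ReplaceAll(finalStr, "\r\n", "\n"), "\n")
}

func main() {
	fmt.Println(splitLines("a\r\nb\nc")) // [a b c]
	fmt.Println(splitLines("a\n\nb"))    // the empty middle line survives
}
```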

cursor -= size

file.Seek(cursor, io.SeekStart)
chars := make([]byte, size)
Member

We can move this line out of the loop and reuse the buffer to improve performance

Contributor Author

Good idea! Will update it soon.

Contributor Author

I tried but found it doesn't work well, even when keeping the size value unchanged. Because file.Read(chars) reads up to len(chars) bytes, we need to make sure chars is allocated with exactly the size we want to read, instead of a fixed 512 bytes.

lines = append(chars, lines...)

// find first '\n' or '\r'
for i := 0; i < len(chars); i++ {
Member

We only need to check \r and \n in the scanned bytes from this round, instead of previous rounds

Contributor Author

Yes, it works as you said: len(chars) is the length of the content just read in this round.

@breezewish
Member

Very cool!

search_log.go Outdated
filesize := stat.Size()
// Read lines from the end of a file
// endCursor initial value should be the filesize
func readLastLines(file *os.File, endCursor int64) string {
Collaborator

readLastLines looks like it may return multiple lines, but the return type is string; should it be []string?

Contributor Author

I need to get the number of bytes read via len(string) in the caller method; returning []string would lose the length info, unless I add an extra readSize return value. Which way is better?

Collaborator

@crazycs520 crazycs520 Jul 8, 2020

I prefer returning lines []string, size int; it's clearer.

Contributor Author

ok!

Contributor Author

Done!

baurine and others added 2 commits July 8, 2020 11:39
Co-authored-by: Wenxuan <hi@breeswish.org>
@baurine baurine force-pushed the support-handle-illegal-logs branch from 508ad40 to 56008cf Compare July 8, 2020 06:06
breezewish
breezewish previously approved these changes Jul 9, 2020
Member

@breezewish breezewish left a comment

The rest LGTM. @crazycs520 PTAL

Co-authored-by: Wenxuan <hi@breeswish.org>
Collaborator

@crazycs520 crazycs520 left a comment

LGTM

Successfully merging this pull request may close these issues.

Support to handle illegal format lines when search log
invalid time format
4 participants