Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tcl-clock rewritten in c #2

Closed
wants to merge 22 commits into from

Conversation

sebres
Copy link
Owner

@sebres sebres commented Dec 28, 2016

This is an artificial PR, corresponds with FlightAware's Tcl-bounties program, issue
flightaware/Tcl-bounties#4

Current performance increase:

  • clock format - 10-15 times faster;
  • clock scan -format - 30-60 times (some previously extremely slow scans up to 100 times faster);
  • clock scan (freescan) - 15-20 times;
  • clock add - 30-90 times;

My current result of performance test-cases attached bellow, (running threaded, warmed up, 4 x threads working, 4 core i5-2400 @ 3.10 GHz).

Here is an excerpt from diff of total-blocks (normal system-load):

 % # Format
  
 ********************************************************************************
 Total 32 cases in 16.00 sec.:
-371.620594 µs/# 1736699 # 109263.467 #/sec 15894.599 nett-ms
+32.574084 µs/# 19007011 # 1277997.901 #/sec 14872.490 nett-ms
 Average:
-11.613144 µs/# 54271 # 109262 #/sec 496.706 nett-ms
+1.017940 µs/# 593969 # 1277999 #/sec 464.765 nett-ms
 Min:
-5.707173 µs/# 86686 # 175218 #/sec 494.732 nett-ms
+0.574267 µs/# 789155 # 1741349 #/sec 453.186 nett-ms
 Max:
-47.6873 µs/# 10472 # 20970.0 #/sec 499.381 nett-ms
+4.671148 µs/# 105698 # 214080 #/sec 493.731 nett-ms
 ********************************************************************************
  
 % # Scan

 ********************************************************************************
 Total 35 cases in 17.50 sec.:
-1831.621900 µs/# 413807 # 23678.910 #/sec 17475.762 nett-ms
+31.400022 µs/# 19301079 # 1180129.840 #/sec 16355.047 nett-ms
 Average:
-52.332054 µs/# 11823 # 23679 #/sec 499.307 nett-ms
+0.897143 µs/# 551459 # 1180129 #/sec 467.287 nett-ms
 Min:
-22.5439 µs/# 22120 # 44358.0 #/sec 498.670 nett-ms
+0.658980 µs/# 696086 # 1517495 #/sec 458.707 nett-ms
 Max:
-125.380 µs/# 3986 # 7975.7 #/sec 499.765 nett-ms
+1.916346 µs/# 253079 # 521826 #/sec 484.987 nett-ms
 ********************************************************************************

 % # FreeScan
 
 ********************************************************************************
 Total 17 cases in 8.50 sec.:
-684.758500 µs/# 321446 # 37903.163 #/sec 8480.717 nett-ms
+32.940470 µs/# 5976102 # 733668.512 #/sec 8145.507 nett-ms
 Average:
-40.279912 µs/# 18908 # 37902 #/sec 498.866 nett-ms
+1.937675 µs/# 351535 # 733668 #/sec 479.147 nett-ms
 Min:
-16.4479 µs/# 30288 # 60798.2 #/sec 498.173 nett-ms
+0.791828 µs/# 587441 # 1262901 #/sec 465.152 nett-ms
 Max:
-99.6852 µs/# 5013 # 10031.6 #/sec 499.722 nett-ms
+8.778852 µs/# 56573 # 113910 #/sec 496.646 nett-ms
 ********************************************************************************

 % # Add 

 ********************************************************************************
 Total 14 cases in 7.00 sec.:
-546.773300 µs/# 241810 # 34603.012 #/sec 6988.120 nett-ms
+9.650766 µs/# 9665868 # 1504157.966 #/sec 6426.099 nett-ms
 Average:
-39.055236 µs/# 17272 # 34603 #/sec 499.151 nett-ms
+0.689340 µs/# 690419 # 1504158 #/sec 459.007 nett-ms
 Min:
-15.3289 µs/# 32490 # 65236.4 #/sec 498.035 nett-ms
+0.554461 µs/# 814551 # 1803552 #/sec 451.637 nett-ms
 Max:
-76.5505 µs/# 6527 # 13063.3 #/sec 499.645 nett-ms
+0.899414 µs/# 521492 # 1111835 #/sec 469.037 nett-ms
 ********************************************************************************

I've made a (threaded) build of newest binaries for Windows, if someone needs it, please see #2 (comment).

…out `-gmt 1`, because the return value would be -3600):

  % clock scan "01.01.1970" -locale current -format %x
  time value too large/small to represent
…" the execution limited by fixed time (in milliseconds) instead of repetition count (more precise results, to prevent very long execution time it is no more necessary to estimate repetition count)
… to 5000 now), because otherwise sporadically stutters on some platforms on very fast iterations (<= 1µs/#)
@sebres
Copy link
Owner Author

sebres commented Dec 28, 2016

Current performance increase:

  • clock format - 10-15 times faster;
  • clock scan -format - 30-50 times (some previously extremely slow scans up to 100 times faster);
  • clock scan (freescan) - 15-20 times;

Additionally it is few CPU and memory hungry.
For example after executing of following script tcl consumes in delta 4MB fewer memory per execution (because of memory fragmentation in original trunk, that goes to tcl internal memory manager):

time {set v([incr i]) [clock format $i -format "%Y-%m-%dT%H:%M:%S %z" -gmt 1]} 100000

Because many rules are system-wide (across interpreter), it spares resources also, e. g. don't wash out the cpu-cache.

@sebres
Copy link
Owner Author

sebres commented Dec 28, 2016

Known small incompatibilities:

  • confusing precedence of some clock-formats by clock scan -format, see http://core.tcl.tk/tcl/tktview?name=e7a722cd35;
  • because of bug-fix, the free scanning by relative date with ordinal month and relative date with ordinal month and relative weekday deliver another results now (previously was not test-covered):
 % # FreeScan : relative date with ordinal month (I said January)
 % clock scan "5 years 18 months 385 days next 1 January" -base 0 -gmt 1
-Fri Jul 21 02:00:00 CEST 1978
+Sat Jan 21 01:00:00 CET 1978
 % # FreeScan : relative date with ordinal month and relative weekday (I said Fri in January)
 % clock scan "5 years 18 months 385 days next January Fri" -base 0 -gmt 1
-Sat Jul 22 02:00:00 CEST 1978
+Fri Jan 27 01:00:00 CET 1978
  • I've introduced an optional tokens, for example zone (%z resp. %Z) is optional now (can be switched off);
  • several bugs in clock format, as well as in clock scan was fixed, thus some new test-cases created, that may fail in original branch (see https://github.com/sebres/tcl/pull/2/files#diff-3f17fa8e2a8a99dd3608554003a3b001);
  • additionally scan/format token %Es introduced to parse or format local seconds (in opposition to %s for posix seconds);
  • value -now will be accepted as clock value for format or add functions, e. g. clock format -now -f %u;
  • own extensions made in ParseClockFormatFormat, ParseClockScanFormat or DateParseActions are no more effective (because the scanning and formatting is pure C-implementation now). I plan to provide a functionality for extension with own scanning/parsing rules.

Side effects:

  • because new engine does not used regexp (and all lexer rules work in utf-8) the scanning object will be not converted in string (unicode), so leaves its current internal representation untouched.

…ance counters actualized in calibration thread in UpdateTimeEachSecond;

This entails that sometimes sporadically time-drifts resp. jump-esque time-shifts occurred, what for example produces very confusing results during time measurement.
[unix] wrong cast fixed in TclpGetWideClicks: multiplication with 1000000 in long int may cause overflow
…00000; more precise threshold handling after NativeGetTime fix.
@sebres
Copy link
Owner Author

sebres commented Dec 29, 2016

I've found by testing that we had a bug in time routines (sporadically time-drifts in performance counters).
See http://core.tcl.tk/tcl/tktview/b87ad7e9146832d505f9a430d779c5313c440256

So fixed now and the results of performance test-cases are updated above (because they were blurred by the error).

@sebres
Copy link
Owner Author

sebres commented Dec 29, 2016

I've made a (threaded) build of newest binaries for Windows (if someone needs it, please find enclosed both links).

tcl8.7 win(x86) clock speedup - trunk-rewrite-clock-in-c.zip
tcl8.7 win(x86) timerate build - trunk-with-timerate.zip

After unzip of archive, just start get-library.cmd if you've a git for windows or download manually https://github.com/sebres/tcl/archive/sb/trunk-rewrite-clock-in-c.zip and unpack library folder parallel (into the same) directory contains bin folder.

Directory of .\tcl8.7-clock-speedup
<DIR>          bin
<JUNCTION>     library [.\tcl8.7-clock-speedup\tcl8.7\library]
<DIR>          tcl8.7
           143 get-library.cmd
            18 tclsh.cmd
            55 test-clock-performance.cmd

@aidanhs
Copy link

aidanhs commented Dec 29, 2016

FYI, building on linux doesn't seem to work:

tclClock.o: In function `ClockConfigureClear':
tclClock.c:(.text+0x235): undefined reference to `ClockFrmScnClearCaches'
tclClock.o: In function `ClockFormatObjCmd':
tclClock.c:(.text+0x46c3): undefined reference to `ClockFormat'
tclClock.o: In function `ClockScanObjCmd.part.10':
tclClock.c:(.text+0x4bfd): undefined reference to `ClockScan'
tclClock.o:(.data.rel.ro.local+0x1a8): undefined reference to `TclStrIdxTreeTestObjCmd'
/usr/bin/ld: libtcl8.7.so: hidden symbol `ClockFrmScnClearCaches' isn't defined
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Makefile:628: recipe for target 'libtcl8.7.so' failed

@sebres sebres added the WiP label Dec 29, 2016
@sebres
Copy link
Owner Author

sebres commented Dec 30, 2016

@aidanhs Thx for the testing!

A couple of incompatibilities

I've fixed it now and extended with many new test cases to cover this "regexp" based things... (see new test cases clock-6.22.11 - clock-6.22.20).

Although such "artificial" and very irrational "date-time" formats are theoretically imaginable, but I don't think, that it makes sense at all in the practice (or just to be 100% backwards compatible?).
So for example just try to recognize (or understand the result) of this one:

clock format [clock scan "1112 13 120" -format "%y%m%d %H%M%S" -gmt 1] -locale en -gmt 1

So which would be correct Sun Jan 02 13:12:00 GMT 2011 or Tue Dec 13 01:02:00 GMT 2011?

Better for the explanation is the usage of almost the same output format %y-%m-%d %H:%M:%S:

% clock format [clock scan "1112 13 120" -format "%y%m%d %H%M%S" -gmt 1] -format "%y-%m-%d %H:%M:%S" -gmt 1

So 11-01-02 13:12:00 or 11-12-13 01:02:00 ?

Because of greedy regex with mandatory space in-between (and optional before of each token) it is extremely NFA (or even DFA) engine dependent (I've already seen both variants).
But what you will get by the human brain here? ;) I mean it is very ambiguous and the usage of such formats would be very doubtful also.

So what. I've it ready now. Costs from 0.04µs to 0.07µs per iteration more in the performance test cases. And one sleepless night. :)
But please come no more with such bizarre cases (please real and ordinary common use cases)... :)

I can additionally implement a fallback to something using regular expressions (or small pre-compiled NFA/DFA or even more complex gramar rules in bison or yacc lexer). But I think ATM it would be breaking a fly on the wheel.

…-6.22.12), involving space count in look ahead and end distance calculation (because spaces are optional in date-time string as well as in scanning format).
…re fixed (see test cases clock-6.22.11 - clock-6.22.20), additionally involving look ahead token of known type into pre-search process.
@sebres sebres removed the WiP label Dec 30, 2016
…used vars, functions etc; types normalization;
@sebres
Copy link
Owner Author

sebres commented Dec 30, 2016

Building on *nix fixed and should work now (tested on debian jessie)

@sebres
Copy link
Owner Author

sebres commented Jan 2, 2017

[UPD] This comment is obsolete since calibrated timerate (newer test-script version) after merge with new timerate from #3.

Please note among the comparison the tcl-own overhead that consumes 0.1µs for executing of the byte code resp. the measurement overhead, explained below (does nothing, just "executes" an empty code scope):

% timerate {} 500
0.101916 µs/#, 4906001 #, 9812002 #/sec

Thus the time of 0.1µs/# in comparison to 6.14µs/# is almost neglectable, but in comparison to 0.68µs/#
it is nearby 10% - 15%....

So all performance test-cases above, that have the running time less than 1µs per iteration, are in reality more (at least 10%) faster.

@sebres sebres changed the base branch from trunk to sb/_se-back-port/timerate January 3, 2017 22:10
@sebres sebres force-pushed the sb/trunk-rewrite-clock-in-c branch from e79a484 to 4def0a2 Compare January 3, 2017 22:21
@sebres
Copy link
Owner Author

sebres commented Jan 5, 2017

Because of several fixes and improving of measurement procedure, I've actualized measure results (see above in head of the PR) and the win-binaries (see #2 (comment)).

- static used in non-static inline function;
- x64 int cast on pointer [-Wpointer-to-int-cast];
- (obscure) may be used uninitialized in this function [-Wmaybe-uninitialized];
- TclEnvEpoch initialized and declared extern;
@sebres sebres force-pushed the sb/trunk-rewrite-clock-in-c branch from 242ab9e to 1b69a0d Compare January 5, 2017 15:41
…" (and freescan)...

test-performance.tcl: test cases extended to cover "clock add"
@sebres
Copy link
Owner Author

sebres commented Jan 7, 2017

I've added a new commit with "clock add" rewritten in C.

Below you'll find a new snippet of the test-cases covered "clock add" functionality...

clock-add-test-perf.diff.txt

@sebres
Copy link
Owner Author

sebres commented Jan 10, 2017

Rebased to fossil-repository / RFE-ticket - http://core.tcl.tk/tcl/info/ddc948cff9781daa.

Further development will be done in TCT fossil-repository.

@sebres
Copy link
Owner Author

sebres commented Jan 17, 2017

As regards flightaware/Tcl-bounties#21 (comment) (ensemble overhead) the newest clock-command becomes still faster.
So compared to the original-tcl-core it is 100 to 150 times faster now.

@sebres sebres closed this Jan 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants