
Commit

Add support for the 'compact model', allowing 50% reduction of used bandwidth between client and server, under some constraints.

Kind of specialization for small embedded devices.
dfeneyrou committed Aug 12, 2021
1 parent 4d1b350 commit 6c8b6ec
Showing 21 changed files with 610 additions and 292 deletions.
281 changes: 215 additions & 66 deletions c++/palanteer.h

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions c++/test/test_instru_build.py
@@ -202,3 +202,17 @@ def test_build_instru27():
def test_build_instru28():
    """USE_PL=0 PL_VIRTUAL_THREADS=1"""
    build_target("testprogram", test_build_instru28.__doc__)


# Short date (32 bits wrapping) feature
@declare_test("build instrumentation")
def test_build_instru29():
    """USE_PL=0 PL_SHORT_DATE=1"""
    build_target("testprogram", test_build_instru29.__doc__)


# Compact model feature
@declare_test("build instrumentation")
def test_build_instru30():
    """USE_PL=0 PL_COMPACT_MODEL=1"""
    build_target("testprogram", test_build_instru30.__doc__)
30 changes: 29 additions & 1 deletion c++/test/test_instru_configuration.py
@@ -317,7 +317,7 @@ def test_short_hash_string():
def test_32bits_arch():
"""Config 32 bits architecture PL_SHORT_STRING_HASH=1 -m32"""
if sys.platform == "win32":
LOG("Skipped: 32 bit build is not applicable under Windows")
LOG("Skipped: 32 bits build is not applicable under Windows")
return
build_target(
"testprogram",
@@ -445,3 +445,31 @@ def test_external_short_string():
"The strings of the path are no more obfuscated",
)
process_stop()


@declare_test("config instrumentation")
def test_compactmodel():
    """Config Compact Model PL_COMPACT_MODEL=1"""
    build_target("testprogram", "USE_PL=1 PL_COMPACT_MODEL=1")

    data_configure_events([EvtSpec("CRASH"), spec_add_fruit])
    try:
        launch_testprogram()
        CHECK(True, "Connection established")
    except ConnectionError:
        CHECK(False, "No connection")

    events = data_collect_events(timeout_sec=1.0)
    CHECK(events, "Some events are received")
    CHECK(
        not [1 for e in events if e.path[-1] == "CRASH Stacktrace"],
        "No crash event has been received",
    )
    status, answer = program_cli("async_assert condvalue=0")
    CHECK(status == 0, "CLI to make an assert called successfully", status, answer)
    events = data_collect_events(timeout_sec=2.0)
    CHECK(
        [1 for e in events if e.path[-1] == "CRASH"] and not process_is_running(),
        "Crash event due to the assert has been received",
    )
    process_stop()
7 changes: 4 additions & 3 deletions c++/testprogram/testProgram.cpp
@@ -248,8 +248,9 @@ void
cliHandlerCreateMarker(plCliIo& cio)
{
const char* msg = cio.getParamString(0);
(void)msg; // Remove warnings when Palanteer events are not used
plMarkerDyn("test_marker", msg);
if(msg) { // In case Palanteer events are not used
plMarkerDyn("test_marker", msg);
}
}


@@ -415,7 +416,7 @@ collectInterestingData(plMode mode, const char* buildName, int durationMultiplie
// Test all the 'group' APIs
if(plgIsEnabled(TESTGROUP)) { plgFunctionDyn(TESTGROUP); }
{
int a = 0;
int a = 0; (void)a;
plgBegin(TESTGROUP, "Group begin/end test");
plgData(TESTGROUP, "Group variable a", a);
plgVar(TESTGROUP, a);
25 changes: 12 additions & 13 deletions docs/instrumentation_api_cpp.md.html
@@ -1007,14 +1007,14 @@ <h1> @@ <a href="#">C++ Instrumentation API</a> </h1>
<br>
**plFunction does not compile**

You probably hit the issue described in [plFunction](#plfunction): the (not recent) C++ compiler does not consider `__FUNCTION__` as `constexpr`. <br>
You probably hit the issue described in [plFunction](#plfunction): some "old" C++ compilers do not consider `__FUNCTION__` as `constexpr`. <br>
In such case, either:
- switch to `plFunctionDyn()`, with the drawback of the non optimal dynamic string management
- use `plScope("manually copied function name")`
- use a more recent compiler
- use a more recent compiler, if possible

<br/>
**I use clang with ASAN and the memory allocations are not logged...**
**I use `clang` with ASAN and the memory allocations are not logged...**

Overloading `new` and `delete` operators in clang with ASAN does not work, this is a known issue in clang (see clang bugzilla https://bugs.llvm.org/show_bug.cgi?id=19660 ).

@@ -1042,23 +1042,21 @@ <h1> @@ <a href="#">C++ Instrumentation API</a> </h1>
**I have some troubles with sockets under Windows**

First, ensure that "`#define _WINSOCKAPI_`" is set before including any windows header (see testProgram.cpp). <br/>
Then, if your program already manages the WSA, you have to tell Palanteer not to do it also, with `#define PL_IMPL_MANAGE_WINDOWS_SOCKET 0`
Then, if your program already manages the WSA, you have to tell Palanteer not to do it also, with `#define PL_IMPL_MANAGE_WINDOWS_SOCKET 0` (default is 1)

<br/>
**PL_IMPL_CONTEXT_SWITCH is set to 1 but I do not see any core related event**
**PL_IMPL_CONTEXT_SWITCH is set to 1 but I do not see any CPU related event**

The access to such process data is restricted on many OS for security reasons. <br/>
To be able to collect them, your program shall run in privileged mode (root on Linux, administrator on Windows).

Beware that on Linux, using `sudo` will create record files from and for the root user. The location of the records may change too.
The access to such data is restricted on many OS for security reasons. <br/>
Their collection implies that the program runs in privileged mode (root on Linux, administrator on Windows).

<br/>
**I have many threads in parallel and many "SATURATION" markers despite I allocated enough memory for the event collection buffers**

If all CPUs are saturated, the collection task cannot run regularly enough and buffers will get full whatever their allocated size. <br/>
The consequence is that event logging will start to block the threads, waiting for some space for logging, and place a "SATURATION" marker.

The presence of the viewer or scripting module on the same machine is maybe the problem, as they use also some CPU and may interfere with the program observation. <br/>
The presence of the viewer or scripting module on the same machine is maybe the problem, as they use also some CPU and may interfere with the program under observation. <br/>
Ideally, the available CPU quantity shall be: at least the required quantity for your program + 3 (instrumentation thread + viewer recording thread + viewer display thread).

Recording on a file reduces this requirement to: at least the required quantity for your program + 1 (instrumentation thread), as the processing of the raw events will be done later.
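As a rough illustration of the CPU budget described above, the arithmetic can be written as a small helper. The function `requiredCpus` and its parameters are hypothetical names chosen for this sketch; they are not part of the Palanteer API.

```cpp
// Hypothetical helper (not part of Palanteer): CPU budget from the rule of thumb above.
constexpr int requiredCpus(int programCpus, bool recordToFile)
{
    // Live viewing on the same machine adds three threads:
    //   instrumentation thread + viewer recording thread + viewer display thread.
    // Recording to a file adds only the instrumentation thread,
    // since the raw events are processed later.
    return programCpus + (recordToFile ? 1 : 3);
}
```

For a program that keeps 4 CPUs busy, this gives 7 CPUs when the viewer runs on the same machine and 5 when recording to a file.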
@@ -1075,10 +1073,11 @@ <h1> @@ <a href="#">C++ Instrumentation API</a> </h1>
On Windows, the stacktrace decoding requires some initialization which cannot be safely done at crash time.

The following points shall be checked:
- the compilation flag `USE_PL` is set to 1.
- The compilation flag `USE_PL` is set to 1.
- If only the stacktrace feature is desired, set all `USE_PL`, `PL_NOEVENT` and `PL_NOCONTROL` to 1
- the compilation flag `PL_IMPL_STACKTRACE` is not explicitely set to zero (it is set to 1 by default on Windows)
- the function `plInitAndStart` is called
- The compilation flag `PL_IMPL_STACKTRACE` is not explicitely set to zero (it is set to 1 by default on Windows)
- The function `plInitAndStart` is called
- The stacktrace decoding initialization is done at this moment
- If no event recording is desired, the mode `PL_MODE_INACTIVE` can be used.


67 changes: 65 additions & 2 deletions docs/instrumentation_configuration_cpp.md.html
@@ -403,6 +403,8 @@ <h1> @@ <a href="#">C++ Instrumentation configuration</a> </h1>
| [PL_DYN_STRING_MAX_SIZE](#pl_dyn_string_max_size) | Defines the maximum size of a dynamic string | 512 B |
| [PL_SHORT_STRING_HASH](#pl_short_string_hash) | Use 32 bits hash rather than 64 bits | 0 |
| [PL_SIMPLE_ASSERT](#pl_simple_assert) | Disables the assertion enhancements and reverts to basic ones | 0 |
| [PL_SHORT_DATE](#pl_short_date) | Use a 32 bits clock output, which implies wraps | 0 |
| [PL_COMPACT_MODEL](#pl_compact_model) | Use the "compact app model" to reduce the transferred data | 0 |

<br/>

@@ -468,9 +470,9 @@ <h1> @@ <a href="#">C++ Instrumentation configuration</a> </h1>
* the discriminative power of the hash
By default, the 64 bits version is used to ensure that virtually no collision happens. <br/>
This constant forces the usage of the 32 bit version of FNV.
This constant forces the usage of the 32 bits version of FNV.
The only 'gain' is for 32 bit systems and only when using dynamic strings or recording context switches: run-time computation speed will then be slightly improved. <br/>
The only 'gain' is for 32 bits systems and only when using dynamic strings or recording context switches: run-time computation speed will then be slightly improved. <br/>
Note that storage size and transmitted data are not reduced when using this flag.
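To make the difference between the two hash widths concrete, here is a minimal sketch of the 32 bits and 64 bits FNV hashes. It assumes the FNV-1a variant (the text above only says "FNV") and uses the standard published offset basis and prime constants. The function names `fnv1a32` and `fnv1a64` are hypothetical and are not Palanteer identifiers. The `constexpr` loops require C++14 or later.

```cpp
#include <cstdint>

// Sketch of 32 bits FNV-1a (assumed variant), usable at compile time.
constexpr uint32_t fnv1a32(const char* s)
{
    uint32_t h = 2166136261u;                       // 32 bits FNV offset basis
    while (*s) {
        h ^= static_cast<uint8_t>(*s++);            // Mix in the next byte
        h *= 16777619u;                             // 32 bits FNV prime
    }
    return h;
}

// Sketch of 64 bits FNV-1a: same structure, wider constants.
constexpr uint64_t fnv1a64(const char* s)
{
    uint64_t h = 14695981039346656037ull;           // 64 bits FNV offset basis
    while (*s) {
        h ^= static_cast<uint8_t>(*s++);
        h *= 1099511628211ull;                      // 64 bits FNV prime
    }
    return h;
}
```

On a 32 bits CPU, the 64 bits multiplication is emulated with several instructions, which is why the shorter hash can be slightly faster when strings are hashed at run time.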
The default value is:
@@ -491,6 +493,49 @@ <h1> @@ <a href="#">C++ Instrumentation configuration</a> </h1>
#define PL_SIMPLE_ASSERT 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
### PL_SHORT_DATE
This compilation flag switches the instrumentation to use a 32 bits clock, and the server side to handle its wraps.
The two main use cases are:
- The high resolution clock is indeed 32 bits (on 32 bits architectures for instance)
- The compact application model is used (see [PL_COMPACT_MODEL](#pl_compact_model) below)
If set, the server manages clock wraps and reconstitutes the unique clock date. <br/>
For proper wrap handling, the wrap period shall be at least twice as long as the event sending period (which should always be the case in theory).
The default value is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
#define PL_SHORT_DATE 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
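The wrap handling described above can be sketched as follows. This is only an illustration of one common technique (signed 32 bits difference against the last reconstructed date), not the actual server code. The class `ShortDateUnwrapper` is a hypothetical name. The sketch assumes that consecutive dates differ by less than half the wrap period, which matches the constraint stated above, and that converting an out-of-range `uint32_t` to `int32_t` wraps in two's complement (guaranteed since C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cstdint>

// Hypothetical sketch (not the Palanteer server code):
// rebuilds a monotonic 64 bits date from a wrapping 32 bits clock.
class ShortDateUnwrapper {
public:
    uint64_t unwrap(uint32_t shortDate)
    {
        if (!isInitialized_) {
            // The first sample defines the origin
            isInitialized_ = true;
            lastFullDate_  = shortDate;
            return lastFullDate_;
        }
        // Difference modulo 2^32 with the low 32 bits of the last full date,
        // interpreted as signed: a wrap shows up as a small positive delta.
        const int32_t delta =
            static_cast<int32_t>(shortDate - static_cast<uint32_t>(lastFullDate_));
        // Modular addition also handles small negative deltas (slightly out-of-order dates)
        lastFullDate_ += static_cast<uint64_t>(static_cast<int64_t>(delta));
        return lastFullDate_;
    }

private:
    bool     isInitialized_ = false;
    uint64_t lastFullDate_  = 0;
};
```

With this approach, a date that jumps from `0xFFFFFFF0` to `0x00000010` is seen as an advance of 32 ticks rather than a jump backward, so the reconstructed date keeps increasing across the wrap.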

### PL_COMPACT_MODEL

A "compact application model" is an instrumented program which transfers 50% less data to the server (12 bytes per event instead of 24). <br/>
This applies both for storage on file and for connected mode.

To be compatible with the "compact model", the following constraints shall be fulfilled:
- At most 65535 unique strings are used
- Only 32 bits types are used
- If used, `double` and 64 bits integers are truncated to their 32 bits equivalent

!!! note This mode is not restricted to small embedded devices
The biggest constraint is usually the restriction to 32 bits data types (which is even easier on 32 bits architectures). <br/>
Indeed, if the usage of formatted dynamic string is moderate, the limit of 65k unique strings should be reached only for large programs

In this mode, the following settings are forced:
- `PL_SHORT_DATE` is set to 1 (32 bits clock)
- `PL_SHORT_STRING_HASH` is set to 1 (32 bits string hashes)

The default value is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
#define PL_COMPACT_MODEL 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
!!! tip
The record sizes on the server side are unaffected by this flag. Only the `.pltraw` file size and the byte quantity sent by socket are reduced.
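The constraints and the gain of the compact model can be checked with a few lines of arithmetic. The helpers below are hypothetical and are not part of Palanteer. They assume that "truncated" means a plain conversion to `int32_t` or `float`, which the text above does not state explicitly. The event sizes (12 and 24 bytes) and the string limit (65535) come from the description above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (not part of Palanteer) to evaluate the compact model.

// True if a 64 bits integer keeps its value once reduced to 32 bits
constexpr bool fitsInInt32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

// True if a double keeps its exact value once converted to float (assumed truncation)
inline bool survivesFloat(double v)
{
    return static_cast<double>(static_cast<float>(v)) == v;
}

// True if the number of unique strings respects the compact model limit
constexpr bool stringCountFits(std::size_t uniqueStringQty)
{
    return uniqueStringQty <= 65535;
}

// Transferred bytes per second for a given event rate
constexpr uint64_t bytesPerSecond(uint64_t eventsPerSecond, bool isCompactModel)
{
    return eventsPerSecond * (isCompactModel ? 12u : 24u);
}
```

For instance, at one million events per second, the transfer drops from 24 MB/s to 12 MB/s, while a value such as `0.1` loses precision when stored on 32 bits.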
### PL_GET_CLOCK_TICK_FUNC
This macro points to the function which provides a high resolution clock stored on a uint64_t.
@@ -520,6 +565,24 @@ <h1> @@ <a href="#">C++ Instrumentation configuration</a> </h1>
#endif
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Troubleshootings

**What is the recommended setting for small embedded devices?**

Such devices usually use a 32 bits architecture and run software of moderate size. <br/>
Two settings have been designed with this use-case in mind:
- The [PL_COMPACT_MODEL](#pl_compact_model), which reduces the data transfer on socket (or any transport layer)
- The [PL_EXTERNAL_STRINGS](#pl_external_strings) feature, which zeroes the cost of using static strings in the instrumentation

Note that the compact model forces the activation of [PL_SHORT_DATE](#pl_short_date) and [PL_SHORT_STRING_HASH](#pl_short_string_hash). <br/>
For very limited resource cases, the additional flag [PL_SIMPLE_ASSERT](#pl_simple_assert) can be used.

The remote control can also be disabled (with `PL_NOCONTROL=1`), but being able to remotely control and test an embedded device is always interesting.

<br/>
**What is the impact of these settings on the memory usage?**

See the description in the [performance section](index.html#c++memoryusage).


<h1> @@ <a href="instrumentation_api_python.md.html">Python Instrumentation API</a> </h1>
2 changes: 1 addition & 1 deletion server/base/bsHashMap.h
@@ -34,7 +34,7 @@
// Simple and fast flat hash table with linear open addressing, dedicated to build a lookup
// - Hashing is internal (for u32 & u64 keys) and an external api is provided (for performance).
// If external, ensure that it is good enough to avoid clusters, and that external api is always used
// - best storage packing is for 32 bit key size
// - best storage packing is for 32 bits key size
// - single value per key (overwrite of existing value)

template <typename K, typename V>
2 changes: 1 addition & 1 deletion server/base/bsOs.h
@@ -126,7 +126,7 @@ void osSetIcon(int width, int height, const u8* pixels); // Size of p
// strcasestr does not exists on windows
const char* strcasestr(const char* s, const char* sToFind);

// ftell and fseek are limited to 32 bit offset, so the 64 bit version is used here
// ftell and fseek are limited to 32 bits offset, so the 64 bits version is used here
#define bsOsFseek _fseeki64
#define bsOsFtell _ftelli64

2 changes: 1 addition & 1 deletion server/base/bsString.cpp
@@ -37,7 +37,7 @@ bsCharUtf8ToUnicode(u8* begin, u8* end, char16_t& codepoint)
if (((*begin)&0x80)==0x00) trailingBytes = 0;
else if(((*begin)&0xE0)==0xC0) trailingBytes = 1;
else if(((*begin)&0xF0)==0xE0) trailingBytes = 2;
else return 0; // Failure, unable to support 32 bit unicode, only 16 bits
else return 0; // Failure, unable to support 32 bits unicode, only 16 bits
if(begin+trailingBytes>=end) return 0; // Failure due to corrupted input

u32 output = 0;
2 changes: 1 addition & 1 deletion server/base/bsString.h
@@ -68,7 +68,7 @@ class bsString : public bsVec<u8>
// Windows way of coding unicode
// Cons: incompatibility with ascii and all existing interfaces (cf Windows duplicated API A&W...). Subject to endianess. Also twice bigger size than UTF8 for ascii
// Pro : (almost) always 2 bytes for (almost) all characters in the world. More efficient than UTF8 for BMP characters not in latin-1
// Note : Implementation limited to 16 bits (no support for 32 bit codepoints) which is the range of the BMP.
// Note : Implementation limited to 16 bits (no support for 32 bits codepoints) which is the range of the BMP.
// This approximation allows: a fixed size (16 bit) codepoint + direct mapping between the unicode codepoint and the (limited) UTF16 value
class bsStringUtf16 : public bsVec<char16_t>
{