C++ - Understanding the overhead of lambda functions in C++11
This is touched on in "Why is C++ lambda slower than ordinary function when called multiple times?" and "C++0x Lambda overhead", but I think my example is a bit different from the discussion in the former and contradicts the result in the latter.
On the search for a bottleneck in my code, I found a recursive template function that processes a variadic argument list with a given processor function, such as copying each value into a buffer.
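```cpp
#include <functional>

// Base case: no arguments left to process.
template <typename T>
void processarguments(std::function<void(const T &)> process) {}

// Recursive case: apply process to the first argument, then recurse on the rest.
template <typename T, typename Head, typename... Tail>
void processarguments(std::function<void(const T &)> process, const Head &head, const Tail &... tail) {
    process(head);
    processarguments(process, tail...);
}
```

I compared the runtime of a program that uses this code with a lambda function against one that uses a global function, both copying the arguments into a global buffer using a moving pointer: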
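```cpp
int buffer[10];

int main(int argc, char **argv) {
    int *p = buffer;
    for (unsigned long int i = 0; i < 10e6; ++i) {
        p = buffer;  // rewind the write pointer for each batch
        processarguments<int>([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    }
}
```

Compiled with g++ 4.6 and -O3 and measured with the tool `time`, this takes more than 6 seconds on my machine, while the version with a global function

```cpp
int buffer[10];
int *p = buffer;  // global write pointer used by copyintobuffer

void copyintobuffer(const int &value) { *p++ = value; }

int main(int argc, char **argv) {
    // No local p here: a local pointer would shadow the global one, so the
    // reset below would never reach copyintobuffer.
    for (unsigned long int i = 0; i < 10e6; ++i) {
        p = buffer;  // rewind the global write pointer for each batch
        processarguments<int>(copyintobuffer, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    }
    return 0;
}
```

takes 1.4 seconds.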
I do not get what is going on behind the scenes that explains the time overhead, and I am wondering if I can change something to make use of lambda functions without paying for it in runtime.
The problem here is your usage of std::function. You pass it by copy, and are therefore copying its contents (and doing so again at every recursive call as you unwind the parameters).
Now, for a pointer to function, the contents is, well, just a pointer to function. For a lambda, the contents are at least a pointer to function plus the reference it captured. That is twice as much to copy. Plus, because of std::function's type erasure, copying any data will most likely be slower (it is not inlined).
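There are several options here, and the best would probably be to pass not a std::function, but a template parameter instead. The benefits are that your method call is more likely to be inlined, no type erasure happens through std::function, and no copying happens, so it is all good. Like this: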
```cpp
template <typename TFunc>
void processarguments(const TFunc &process) {}

template <typename TFunc, typename Head, typename... Tail>
void processarguments(const TFunc &process, const Head &head, const Tail &... tail) {
    process(head);
    processarguments(process, tail...);
}
```

Note that this version is called without the explicit `<int>`, since TFunc is now deduced from the lambda or function you pass in.

The second option is doing the same, but passing process by copy. Now copying does happen, but the call is still neatly inlined.
What's equally important is that the body of process can also be inlined, especially for a lambda. Depending on the complexity of copying the lambda object and its size, passing by copy may or may not be faster than passing by reference. It may be faster because the compiler may have a harder time reasoning about a reference than about a local copy.
```cpp
template <typename TFunc>
void processarguments(TFunc process) {}

template <typename TFunc, typename Head, typename... Tail>
void processarguments(TFunc process, const Head &head, const Tail &... tail) {
    process(head);
    processarguments(process, tail...);
}
```

The third option is, well, to try passing std::function<> by reference. This way you at least avoid the copying, but the calls will not be inlined.
Here are some perf results (using ideone's C++11 compiler). Note that, as expected, the inlined lambda body gives the best performance:
- Original function: 0.483035 s
- Original lambda: 1.94531 s
- Function via template copy: 0.094748 s
- Lambda via template copy: 0.0264867 s
- Function via template reference: 0.0892594 s
- Lambda via template reference: 0.0264201 s
- Function via std::function reference: 0.0891776 s
- Lambda via std::function reference: 0.09 s