aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/builtin-script.c
Commit message (Collapse)AuthorAgeFilesLines
* perf sample: Remove arch notion of sample parsingIan Rogers2025-07-251-1/+1
| | | | | | | | | | | | | | By definition arch sample parsing and synthesis will inhibit certain kinds of cross-platform record then analysis (report, script, etc.). Remove arch_perf_parse_sample_weight and arch_perf_synthesize_sample_weight replacing with a common implementation. Combine perf_sample p_stage_cyc and retire_lat as weight3 to capture the differing uses regardless of compiled for architecture. Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf evlist: Change env variable to sessionIan Rogers2025-07-251-1/+1
| | | | | | | | | | | | The session holds a perf_env pointer env. In UI code container_of is used to turn the env to a session, but this assumes the session header's env is in use. Rather than a dubious container_of, hold the session in the evlist and derive the env from the session with evsel__env, perf_session__env, etc. Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf session: Add accessor for session->header.envIan Rogers2025-07-251-6/+8
| | | | | | | | | | The perf_env from the header in the session is frequently accessed, add an accessor function rather than access directly. Cache the value to avoid repeated calls. No behavioral change. Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf stat: Move metric list from config to evlistIan Rogers2025-07-111-2/+1
| | | | | | | | | | | | | | The rblist of metric_event that then have a list of associated metric_expr is moved out of the stat_config and into the evlist. This is done as part of refactoring things for python, having the state split in two places complicates that implementation. The evlist is doing the harder work of enabling and disabling events, the metrics are needed to compute a value and it doesn't seem unreasonable to hang them from the evlist. Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf tools: display the new PERF_RECORD_BPF_METADATA eventBlake Jones2025-06-201-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here's some example "perf script -D" output for the new event type. The ": unhandled!" message is from tool.c, analogous to other behavior there. I've elided some rows with all NUL characters for brevity, and I wrapped one of the >75-column lines to fit in the commit guidelines. [email protected] [0x260]: event: 84 . . ... raw event: size 608 bytes . 0000: 54 00 00 00 00 00 60 02 62 70 66 5f 70 72 6f 67 T.....`.bpf_prog . 0010: 5f 31 65 30 61 32 65 33 36 36 65 35 36 66 31 61 _1e0a2e366e56f1a . 0020: 32 5f 70 65 72 66 5f 73 61 6d 70 6c 65 5f 66 69 2_perf_sample_fi . 0030: 6c 74 65 72 00 00 00 00 00 00 00 00 00 00 00 00 lter............ . 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [...] . 0110: 74 65 73 74 5f 76 61 6c 75 65 00 00 00 00 00 00 test_value...... . 0120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [...] . 0150: 34 32 00 00 00 00 00 00 00 00 00 00 00 00 00 00 42.............. . 0160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [...] 0 0x50fc8 [0x260]: PERF_RECORD_BPF_METADATA \ prog bpf_prog_1e0a2e366e56f1a2_perf_sample_filter entry 0: test_value = 42 : unhandled! Signed-off-by: Blake Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf script: Display off-cpu samples correctlyHoward Chu2025-05-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No PERF_SAMPLE_CALLCHAIN in sample_type, but 'perf script' needs to display a callchain, have to specify manually. Also, prefer displaying a callchain: gvfs-afc-volume 2267 [001] 3829232.955656: 1001115340 offcpu-time: 77f05292603f __pselect+0xbf (/usr/lib/x86_64-linux-gnu/libc.so.6) 77f052a1801c [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0) 77f052a18d45 [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0) 77f05289ca94 start_thread+0x384 (/usr/lib/x86_64-linux-gnu/libc.so.6) 77f052929c3c clone3+0x2c (/usr/lib/x86_64-linux-gnu/libc.so.6) to a raw binary BPF output: BPF output: 0000: dd 08 00 00 db 08 00 00 <DD>...<DB>... 0008: cc ce ab 3b 00 00 00 00 <CC>Ϋ;.... 0010: 06 00 00 00 00 00 00 00 ........ 0018: 00 fe ff ff ff ff ff ff .<FE><FF><FF><FF><FF><FF><FF> 0020: 3f 60 92 52 f0 77 00 00 ?`.R<F0>w.. 0028: 1c 80 a1 52 f0 77 00 00 ..<A1>R<F0>w.. 0030: 45 8d a1 52 f0 77 00 00 E.<A1>R<F0>w.. 0038: 94 ca 89 52 f0 77 00 00 .<CA>.R<F0>w.. 0040: 3c 9c 92 52 f0 77 00 00 <..R<F0>w.. 0048: 00 00 00 00 00 00 00 00 ........ 0050: 00 00 00 00 .... Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Howard Chu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Gautam Menghani <[email protected]> Tested-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Fix output type for dynamically allocated core PMU'sThomas Falcon2025-03-051-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch was originally posted here: https://lore.kernel.org/all/[email protected]/ I have rebased on top of Arnaldo's patch here: https://lore.kernel.org/all/Z2XCi3PgstSrV0SE@x1/ The original commit message: " perf script output may show different fields on different core PMU's that exist on heterogeneous platforms. For example, perf record -e "{cpu_core/mem-loads-aux/,cpu_core/event=0xcd,\ umask=0x01,ldlat=3,name=MEM_UOPS_RETIRED.LOAD_LATENCY/}:upp"\ -c10000 -W -d -a -- sleep 1 perf script: chromium-browse 46572 [002] 544966.882384: 10000 cpu_core/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7ffdf1391b0c 10268100142 \ |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 7 0 7fad7c47425d [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3) perf record -e cpu_atom/event=0xd0,umask=0x05,ldlat=3,\ name=MEM_UOPS_RETIRED.LOAD_LATENCY/upp -c10000 -W -d -a -- sleep 1 perf script: gnome-control-c 534224 [023] 544951.816227: 10000 cpu_atom/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7f0aaaa0aae0 [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3) Some fields, such as data_src, are not included by default. The cause is that while one PMU may be assigned a type such as PERF_TYPE_RAW, other core PMU's are dynamically allocated at boot time. If this value does not match an existing PERF_TYPE_X value, output_type(perf_event_attr.type) will return OUTPUT_TYPE_OTHER. Instead search for a core PMU with a matching perf_event_attr type and, if one is found, return PERF_TYPE_RAW to match output of other core PMU's. " Suggested-by: Kan Liang <[email protected]> Suggested-by: Ian Rogers <[email protected]> Signed-off-by: Thomas Falcon <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf script: Add not taken event for branch stackLeo Yan2025-03-051-7/+13
| | | | | | | | | | | The branch stack has an existed field for printing mispredict, extend the field for printing events and add support not-taken event. Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: James Clark <[email protected]> Signed-off-by: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf script: Make printing flags reliableLeo Yan2025-03-051-2/+7
| | | | | | | | | | | | | | Add a check for the generated string of flags. Print out the raw number if the string generation fails. Use the SAMPLE_FLAGS_STR_ALIGNED_SIZE macro to replace the value '21'. Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: James Clark <[email protected]> Signed-off-by: Leo Yan <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf sample: Make user_regs and intr_regs optionalIan Rogers2025-02-131-2/+8
| | | | | | | | | | | | | | | | | | | | | | | The struct dump_regs contains 512 bytes of cache_regs, meaning the two values in perf_sample contribute 1088 bytes of its total 1384 bytes size. Initializing this much memory has a cost reported by Tavian Barnes <[email protected]> as about 2.5% when running `perf script --itrace=i0`: https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/ Adrian Hunter <[email protected]> replied that the zero initialization was necessary and couldn't simply be removed. This patch aims to strike a middle ground of still zeroing the perf_sample, but removing 79% of its size by make user_regs and intr_regs optional pointers to zalloc-ed memory. To support the allocation accessors are created for user_regs and intr_regs. To support correct cleanup perf_sample__init and perf_sample__exit functions are created and added throughout the code base. Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf script: Cache the output typeArnaldo Carvalho de Melo2024-12-201-42/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now every time we need to figure out the type of an evsel for output purposes we do a quick sequence of ifs, but there are new cases where there is a need to do more complex iterations over multiple data structures, sso allow for caching this operation on a hole of 'struct evsel'. This should really be done on the evsel->priv area that 'perf script' sets up, but more work is needed to make sure that it is allocated when we need it, right now it is only used for conditionally, add some comments so that we move this to that 'perf script' specific area when the conditions are in place for that. Acked-by: Thomas Falcon <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/lkml/Z2XCi3PgstSrV0SE@x1 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Move perf_sample__sprintf_flags to trace-event-scripting.cIan Rogers2024-12-181-81/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf_sample__sprintf_flags is used in the python C code and so needs to be in the util library rather than a builtin. Signed-off-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Mark Rutland <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Howard Chu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: James Clark <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Dapeng Mi <[email protected]> Cc: Kan Liang <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Veronika Molnarova <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Move script_fetch_insn to trace-event-scripting.cIan Rogers2024-12-181-14/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add native_arch as a parameter to script_fetch_insn rather than relying on the builtin-script value that won't be initialized for the dlfilter and python Context use cases. Assume both of those cases are running natively. Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dapeng Mi <[email protected]> Cc: Howard Chu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Veronika Molnarova <[email protected]> Cc: Weilin Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Move script_spec code to trace-event-scripting.cIan Rogers2024-12-181-64/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The script_spec code is referenced in util/trace-event-scripting but the list was in builtin-script, accessed via a function that required a stub function in python.c. Move all the logic to trace-event-scripting, with lookup and foreach functions exposed for builtin-script's benefit. Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dapeng Mi <[email protected]> Cc: Howard Chu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Veronika Molnarova <[email protected]> Cc: Weilin Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf stat: Move stat_config into config.cIan Rogers2024-12-181-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stat_config is accessed by config.c via helper functions, but declared in builtin-stat. Move to util/config.c so that stub functions aren't needed in python.c which doesn't link against the builtin files. To avoid name conflicts change builtin-script to use the same stat_config as builtin-stat. Rename local variables in tests to avoid shadow declaration warnings. Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dapeng Mi <[email protected]> Cc: Howard Chu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Veronika Molnarova <[email protected]> Cc: Weilin Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Move find_scripts to browser/scripts.cIan Rogers2024-12-181-171/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only use of find_scripts is in browser/scripts.c but the definition in builtin causes linking problems requiring a stub in python.c. Move the function to allow the stub to be removed. Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dapeng Mi <[email protected]> Cc: Howard Chu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Veronika Molnarova <[email protected]> Cc: Weilin Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Use openat for directory iterationIan Rogers2024-12-181-27/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite the directory iteration to use openat so that large character arrays aren't needed. The arrays are warned about potential buffer overflows by GCC when the code exists in a single C file. Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dapeng Mi <[email protected]> Cc: Howard Chu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Veronika Molnarova <[email protected]> Cc: Weilin Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Move scripting_max_stack out of builtinIan Rogers2024-12-181-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | scripting_max_stack is used in util code which is linked into the python module. Move the variable declaration to util/trace-event-scripting.c to avoid conditional compilation. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dapeng Mi <[email protected]> Cc: Howard Chu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Veronika Molnarova <[email protected]> Cc: Weilin Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf evsel: Add/use accessor for tp_formatIan Rogers2024-12-091-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an accessor function for tp_format. Rather than search+replace uses try to use a variable and reuse it. Add additional NULL checks when accessing/using the value. Make sure the PTR_ERR is nulled out on error path in evsel__newtp_idx. Reviewed-by: Namhyung Kim <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Falcon <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yang Li <[email protected]> Cc: Ze Gao <[email protected]> Cc: Zixian Cai <[email protected]> Cc: zhaimingbing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf build: Include libtraceevent headers directly indicated by pkg-configYicong Yang2024-11-091-1/+1
| | | | | | | | | | | | | | | | | | | | | Currently the libtraceevent's found by pkg-config, which give the include path as: [root@localhost tmp]# pkg-config --cflags libtraceevent -I/usr/local/include/traceevent So we should include the libtraceevent headers directly without "traceevent/" prefix. Update all the users. Fixes: 0f0e1f445690 ("perf build: Use pkg-config for feature check for libtrace{event,fs}") Suggested-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/linux-perf-users/[email protected]/ Signed-off-by: Yicong Yang <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf arm-spe: Correctly set sample flagsGraham Woodward2024-10-291-0/+1
| | | | | | | | | | | | | Set flags on all synthesized instruction and branch samples. Signed-off-by: Graham Woodward <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf stat: Change color to threshold in print_metricIan Rogers2024-10-171-3/+3
| | | | | | | | | | | | | | | | | | | | | Colors don't mean things in CSV and JSON output, switch to a threshold enum value that the standard output can convert to a color. Updating the CSV and JSON output will be later changes. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yicong Yang <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Will Deacon <[email protected]> Cc: James Clark <[email protected]> Cc: Mike Leach <[email protected]> Cc: Leo Yan <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Tim Chen <[email protected]> Cc: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf env: Find correct branch counter info on hybridKan Liang2024-09-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No event is printed in the "Branch Counter" column on hybrid machines. For example, $ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter $ perf report --total-cycles # Branch counter abbr list: # cpu_core/branch-instructions/pp = A # cpu_core/branches/ = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter # ............... .............. ........... .......... .............. 44.54% 727.1K 0.00% 1 |+ |+ | 36.31% 592.7K 0.00% 2 |+ |+ | 17.83% 291.1K 0.00% 1 |+ |+ | The branch counter information (br_cntr_width and br_cntr_nr) in the perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS is not available on hybrid machines. Without the width information, the number of occurrences of an event cannot be calculated. For a hybrid machine, the caps information should be retrieved from the PMU_CAPS, and stored in the perf_env->pmu_caps. Add a perf_env__find_br_cntr_info() to return the correct branch counter information from the corresponding fields. Committer notes: While testing I couldn't s ee those "Branch counter" columns enabled by pressing 'B' on the TUI, after reporting it to the list Kan explained the situation: <quote Kan Liang> For a hybrid client, the "Branch Counter" feature is only supported starting from the just released Lunar Lake. Perf falls back to only "ANY" on your Raptor Lake. The "The branch counter is not available" message is expected. Here is the 'perf evlist' result from my Lunar Lake machine, # perf evlist -v cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS # </quote> Fixes: 6f9d8d1de2c61288 ("perf script: Add branch counters") Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Minimize "not reaching sample" for '-F +brstackinsn'Andi Kleen2024-09-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some situations 'perf script -F +brstackinsn' sees a lot of "not reaching sample" messages. This happens when the last LBR block before the sample contains a branch that is not in the LBR, and the instruction dumping stops. $ perf record -b emacs -Q --batch '()' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.396 MB perf.data (443 samples) ] $ perf script -F +brstackinsn ... 00007f0ab2d171a4 insn: 41 0f 94 c0 00007f0ab2d171a8 insn: 83 fa 01 00007f0ab2d171ab insn: 74 d3 # PRED 6 cycles [313] 1.00 IPC 00007f0ab2d17180 insn: 45 84 c0 00007f0ab2d17183 insn: 74 28 ... not reaching sample ... $ perf script -F +brstackinsn | grep -c reach 136 $ This is a problem for further analysis that wants to see the full code upto the sample. There are two common cases where the message is bogus: - The LBR only logs taken branches, but the branch might be a conditional branch that is not taken (that is the most common case actually) - The LBR sampling uses a filter ignoring some branches, but the perf script check checks for all branches. This patch fixes these two conditions, by only checking for conditional branches, as well as checking the perf_event_attr's branch filter attributes. For the test case above it fixes all the messages: $ ./perf script -F +brstackinsn | grep -c reach 0 Note that there are still conditions when the message is hit -- sometimes there can be a unconditional branch that misses the LBR update before the sample -- but they are much more rare now. Signed-off-by: Andi Kleen <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Add branch countersKan Liang2024-08-141-7/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's useful to print the branch counter information for each jump in the brstackinsn when it's available. Add a new field 'brcntr' to display the branch counter information. By default, the abbreviation will be used to indicate the branch counter. In the verbose mode, the real event name is shown. $ perf script -F +brstackinsn,+brcntr # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (home/sdp/test/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: AA # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: A # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 7 cycles [14] 0.43 IPC $ perf script -F +brstackinsn,+brcntr -v tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (/home/sdp/os.linux.perf.test-suite/kernels/lbr_kernel/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: branch-instructions:ppp 2 branch-misses 0 # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 7 cycles [14] 0.43 IPC Originally-by: Tinghao Zhang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Use perf_tool__init()Ian Rogers2024-08-121-35/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults(). Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf tool: Constify tool pointersIan Rogers2024-08-121-21/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be const to lower the possibilities of what could happen with a tool. Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Adrian Hunter <[email protected]> Tested-by: Leo Yan <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: add --addr2line optionMartin Liška2024-08-121-0/+2
| | | | | | | | | | Similarly to other subcommands (like report, top), it would be handy to provide a path for addr2line command. Signed-off-by: Martin Liska <[email protected]> Cc: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf mem-info: Add reference count checkingIan Rogers2024-05-071-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add reference count checking and switch 'struct mem_info' usage to use accessor functions. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf mem-info: Move mem-info out of mem-events and symbolIan Rogers2024-05-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move mem-info to its own header rather than having it split between mem-events and symbol. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf dso: Add reference count checking and accessor functionsIan Rogers2024-05-061-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add reference count checking to struct dso, this can help with implementing correct reference counting discipline. To avoid RC_CHK_ACCESS everywhere, add accessor functions for the variables in struct dso. The majority of the change is mechanical in nature and not easy to split up. Committer testing: 'perf test' up to this patch shows no regressions. But: util/symbol.c: In function ‘dso__load_bfd_symbols’: util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’ 1683 | dso__set_adjust_symbols(dso); | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from util/symbol.c:21: util/dso.h:268:20: note: declared here 268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val) | ^~~~~~~~~~~~~~~~~~~~~~~ make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1 MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/ make[6]: *** Waiting for unfinished jobs.... This was updated: - symbols__fixup_end(&dso->symbols, false); - symbols__fixup_duplicate(&dso->symbols); - dso->adjust_symbols = 1; + symbols__fixup_end(dso__symbols(dso), false); + symbols__fixup_duplicate(dso__symbols(dso)); + dso__set_adjust_symbols(dso); But not build tested with BUILD_NONDISTRO and libbfd devel files installed (binutils-devel on fedora). Add the missing argument: symbols__fixup_end(dso__symbols(dso), false); symbols__fixup_duplicate(dso__symbols(dso)); - dso__set_adjust_symbols(dso); + dso__set_adjust_symbols(dso, true); Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ahelenia Ziemiańska <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Changbin Du <[email protected]> Cc: Chengen Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dima Kogan <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Tiezhu Yang <[email protected]> Cc: Yanteng Si <[email protected]> Cc: zhaimingbing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Consolidate capstone print functionsAdrian Hunter2024-04-081-15/+28
| | | | | | | | | | | | | Consolidate capstone print functions, to reduce duplication. Amend call sites to use a file pointer for output, which is consistent with most perf tools print functions. Add print_opts with an option to print also the hex value of a resolved symbol+offset. Signed-off-by: Adrian Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Andi Kleen <[email protected]> [ Added missing inttypes.h include to use PRIx64 in util/print_insn.c ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Add capstone support for '-F +brstackdisasm'Andi Kleen2024-04-051-7/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support capstone output for the '-F +brstackinsn' branch dump. The new output is enabled with the new field 'brstackdisasm'. This was possible before with --xed, but now also allow it for users that don't have xed using the builtin capstone support. Before: perf record -b emacs -Q --batch '()' perf script -F +brstackinsn ... emacs 55778 1814366.755945: 151564 cycles:P: 7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s> intel_check_word.constprop.0+237: 00007f0ab2d1711d insn: 75 e6 # PRED 3 cycles [3] 00007f0ab2d17105 insn: 73 51 00007f0ab2d17107 insn: 48 89 c1 00007f0ab2d1710a insn: 48 39 ca 00007f0ab2d1710d insn: 73 96 00007f0ab2d1710f insn: 48 8d 04 11 00007f0ab2d17113 insn: 48 d1 e8 00007f0ab2d17116 insn: 49 8d 34 c1 00007f0ab2d1711a insn: 44 3a 06 00007f0ab2d1711d insn: 75 e6 # PRED 3 cycles [6] 3.00 IPC 00007f0ab2d17105 insn: 73 51 # PRED 1 cycles [7] 1.00 IPC 00007f0ab2d17158 insn: 48 8d 50 01 00007f0ab2d1715c insn: eb 92 # PRED 1 cycles [8] 2.00 IPC 00007f0ab2d170f0 insn: 48 39 ca 00007f0ab2d170f3 insn: 73 b0 # PRED 1 cycles [9] 2.00 IPC After (perf must be compiled with capstone): perf script -F +brstackdisasm ... emacs 55778 1814366.755945: 151564 cycles:P: 7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s> intel_check_word.constprop.0+237: 00007f0ab2d1711d jne intel_check_word.constprop.0+0xd5 # PRED 3 cycles [3] 00007f0ab2d17105 jae intel_check_word.constprop.0+0x128 00007f0ab2d17107 movq %rax, %rcx 00007f0ab2d1710a cmpq %rcx, %rdx 00007f0ab2d1710d jae intel_check_word.constprop.0+0x75 00007f0ab2d1710f leaq (%rcx, %rdx), %rax 00007f0ab2d17113 shrq $1, %rax 00007f0ab2d17116 leaq (%r9, %rax, 8), %rsi 00007f0ab2d1711a cmpb (%rsi), %r8b 00007f0ab2d1711d jne intel_check_word.constprop.0+0xd5 # PRED 3 cycles [6] 3.00 IPC 00007f0ab2d17105 jae intel_check_word.constprop.0+0x128 # PRED 1 cycles [7] 1.00 IPC 00007f0ab2d17158 leaq 1(%rax), %rdx 00007f0ab2d1715c jmp intel_check_word.constprop.0+0xc0 # PRED 1 cycles [8] 2.00 IPC 00007f0ab2d170f0 cmpq %rcx, %rdx 00007f0ab2d170f3 jae intel_check_word.constprop.0+0x75 # PRED 1 cycles [9] 2.00 IPC Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Support 32bit code under 64bit OS with capstoneAndi Kleen2024-04-051-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the DSO to resolve whether an IP is 32bit or 64bit and use that to configure capstone to the correct mode. This allows to correctly disassemble 32bit code under a 64bit OS. % cat > loop.c volatile int var; int main(void) { int i; for (i = 0; i < 100000; i++) var++; } % gcc -m32 -o loop loop.c % perf record -e cycles:u ./loop % perf script -F +disasm loop 82665 1833176.618023: 1 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax loop 82665 1833176.618029: 1 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax loop 82665 1833176.618031: 7 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax loop 82665 1833176.618034: 91 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax loop 82665 1833176.618036: 1242 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax Reviewed-by: Adrian Hunter <[email protected]> Acked-by: Thomas Richter <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf evsel: Use evsel__name_is() helperYang Jihong2024-04-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code cleanup, replace strcmp(evsel__name(evsel, {NAME})) with evsel__name_is() helper. No functional change. Committer notes: Fix this build error: trace.syscalls.events.bpf_output = evlist__last(trace.evlist); - assert(evsel__name_is(trace.syscalls.events.bpf_output), "__augmented_syscalls__"); + assert(evsel__name_is(trace.syscalls.events.bpf_output, "__augmented_syscalls__")); Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Yang Jihong <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Show also errors for --insn-trace optionAdrian Hunter2024-03-211-1/+1
| | | | | | | | | | | | | | | | | | | The trace could be misleading if trace errors are not taken into account, so display them also by adding the itrace "e" option. Note --call-trace and --call-ret-trace already add the itrace "e" option. Fixes: b585ebdb5912cf14 ("perf script: Add --insn-trace for instruction decoding") Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf: script: add raw|disasm arguments to --insn-trace optionChangbin Du2024-02-211-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | Now '--insn-trace' accept a argument to specify the output format: - raw: display raw instructions. - disasm: display mnemonic instructions (if capstone is installed). $ sudo perf script --insn-trace=raw ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: 48 89 e7 ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: e8 e8 0c 00 00 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: f3 0f 1e fa $ sudo perf script --insn-trace=disasm ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) illegal instruction ls 1443864 [006] 2275506.209908875: 7f216b426df4 _dl_start+0x4 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df5 _dl_start+0x5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df8 _dl_start+0x8 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %r15 Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* perf: script: add field 'disasm' to display mnemonic instructionsChangbin Du2024-02-211-1/+14
| | | | | | | | | | | | | | | | | | | In addition to the 'insn' field, this adds a new field 'disasm' to display mnemonic instructions instead of the raw code. $ sudo perf script -F +disasm perf-exec 1443864 [006] 2275506.209848: psb: psb offs: 0 0 [unknown] ([unknown]) perf-exec 1443864 [006] 2275506.209848: cbr: cbr: 41 freq: 4100 MHz (114%) 0 [unknown] ([unknown]) ls 1443864 [006] 2275506.209905: 1 branches:uH: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908: 1 branches:uH: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* perf: util: use capstone disasm engine to show assembly instructionsChangbin Du2024-02-211-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | Currently, the instructions of samples are shown as raw hex strings which are hard to read. x86 has a special option '--xed' to disassemble the hex string via intel XED tool. Here we use capstone as our disassembler engine to give more friendly instructions. We select libcapstone because capstone can provide more insn details. Perf will fallback to raw instructions if libcapstone is not available. The advantages compared to XED tool: * Support arm, arm64, x86-32, x86_64 (more could be supported), xed only for x86_64. * Immediate address operands are shown as symbol+offs. Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* perf tools: Make it possible to see perf's kernel and module memory mappingsAdrian Hunter2024-02-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dump kmaps if using 'perf --debug kmaps' or verbose > 2 (e.g. -vvv) for tools 'perf script' and 'perf report' if there is no browser. Example: $ perf --debug kmaps script 2>&1 >/dev/null | grep kvm.intel build id event received for /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko: 0691d75e10e72ebbbd45a44c59f6d00a5604badf [20] Map: 0-3a3 4f5d8 [kvm_intel].modinfo Map: 0-5240 5f280 [kvm_intel]__versions Map: 0-30 64 [kvm_intel].note.Linux Map: 0-14 644c0 [kvm_intel].orc_header Map: 0-5297 43680 [kvm_intel].rodata Map: 0-5bee 3b837 [kvm_intel].text.unlikely Map: 0-7e0 41430 [kvm_intel].noinstr.text Map: 0-2080 713c0 [kvm_intel].bss Map: 0-26 705c8 [kvm_intel].data..read_mostly Map: 0-5888 6a4c0 [kvm_intel].data Map: 0-22 70220 [kvm_intel].data.once Map: 0-40 705f0 [kvm_intel].data..percpu Map: 0-1685 41d20 [kvm_intel].init.text Map: 0-4b8 6fd60 [kvm_intel].init.data Map: 0-380 70248 [kvm_intel]__dyndbg Map: 0-8 70218 [kvm_intel].exit.data Map: 0-438 4f980 [kvm_intel]__param Map: 0-5f5 4ca0f [kvm_intel].rodata.str1.1 Map: 0-3657 493b8 [kvm_intel].rodata.str1.8 Map: 0-e0 70640 [kvm_intel].data..ro_after_init Map: 0-500 70ec0 [kvm_intel].gnu.linkonce.this_module Map: ffffffffc13a7000-ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko The example above shows how the module section mappings are all wrong except for the main .text mapping at 0xffffffffc13a7000. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Like Xu <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* perf script: Print source line for each jump in brstackinsnKan Liang2024-02-071-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the srcline option, the perf script only prints a source line at the beginning of a sample with call/ret from functions, but not for each jump in brstackinsn. It's useful to print a source line for each jump in brstackinsn when the end user analyze the full assembler sequences of branch sequences for the sample. The srccode option can also be used to locate the source code line. However, it's printed almost for every line and makes the output less readable. $perf script -F +brstackinsn,+srcline --xed Before the patch, tchain_edit_deb 1463275 15228549.107820: 282495 instructions:u: 401133 f3+0xd (/home/kan/os.li> tchain_edit.c:22 f3+40: tchain_edit.c:20 000000000040114e jle 0x401133 # PRED 6 cycles [6] 0000000000401133 movl -0x4(%rbp), %eax 0000000000401136 and $0x1, %eax 0000000000401139 test %eax, %eax 000000000040113b jz 0x401143 000000000040113d addl $0x1, -0x4(%rbp) 0000000000401141 jmp 0x401147 # PRED 3 cycles [9] 2.00 IPC 0000000000401147 cmpl $0x3e7, -0x4(%rbp) 000000000040114e jle 0x401133 # PRED 6 cycles [15] 0.33 IPC After the patch, tchain_edit_deb 1463275 15228549.107820: 282495 instructions:u: 401133 f3+0xd (/home/kan/os.li> tchain_edit.c:22 f3+40: tchain_edit.c:20 000000000040114e jle 0x401133 srcline: tchain_edit.c:20 # PRED 6 cycles [6] 0000000000401133 movl -0x4(%rbp), %eax 0000000000401136 and $0x1, %eax 0000000000401139 test %eax, %eax 000000000040113b jz 0x401143 000000000040113d addl $0x1, -0x4(%rbp) 0000000000401141 jmp 0x401147 srcline: tchain_edit.c:23 # PRED 3 cycles [9] 2.00 IPC 0000000000401147 cmpl $0x3e7, -0x4(%rbp) 000000000040114e jle 0x401133 srcline: tchain_edit.c:20 # PRED 6 cycles [15] 0.33 IPC Signed-off-by: Kan Liang <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
* perf: script: fix missing ',' for fields optionChangbin Du2023-10-171-1/+1
| | | | | | | | A comma is missed at the end of line. Signed-off-by: Changbin Du <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
* perf script: Print "cgroup" field on the same line as "comm"Ivan Babrou2023-08-081-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3fd7a168bf51 ("perf script: Add 'cgroup' field for output") added support for printing cgroup path in perf script output. It was okay if you didn't want any stacks: $ sudo perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup jpegtran:23f4bf 3321915 [013] 404718.587488: /idle.slice/polish.service jpegtran:23f4bf 3321915 [031] 404718.592073: /idle.slice/polish.service With stacks it gets messier as cgroup is printed after the stack: $ perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup,ip,sym jpegtran:23f4bf 3321915 [013] 404718.587488: 5c554 compress_output 570d9 jpeg_finish_compress 3476e jpegtran_main 330ee jpegtran::main 326e2 core::ops::function::FnOnce::call_once (inlined) 326e2 std::sys_common::backtrace::__rust_begin_short_backtrace /idle.slice/polish.service jpegtran:23f4bf 3321915 [031] 404718.592073: 8474d jsimd_encode_mcu_AC_first_prepare_sse2.PADDING 55af68e62fff [unknown] /idle.slice/polish.service Let's instead print cgroup on the same line as comm: $ perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup,ip,sym jpegtran:23f4bf 3321915 [013] 404718.587488: /idle.slice/polish.service 5c554 compress_output 570d9 jpeg_finish_compress 3476e jpegtran_main 330ee jpegtran::main 326e2 core::ops::function::FnOnce::call_once (inlined) 326e2 std::sys_common::backtrace::__rust_begin_short_backtrace jpegtran:23f4bf 3321915 [031] 404718.592073: /idle.slice/polish.service 8474d jsimd_encode_mcu_AC_first_prepare_sse2.PADDING 55af68e62fff [unknown] Fixes: 3fd7a168bf514979 ("perf script: Add 'cgroup' field for output") Signed-off-by: Ivan Babrou <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Remove some large stack allocationsIan Rogers2023-06-121-4/+13
| | | | | | | | | | | | | | | | | Some char buffers are stack allocated but in total they come to 24kb. Avoid Wstack-usage warnings by moving the arrays to being dynamically allocated. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf callchain: Use pthread keys for tls callchain_cursorIan Rogers2023-06-121-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pthread keys are more portable than __thread and allow the association of a destructor with the key. Use the destructor to clean up TLS callchain cursors to aid understanding memory leaks. Committer notes: Had to fixup a series of unconverted places and also check for the return of get_tls_callchain_cursor() as it may fail and return NULL. In that unlikely case we now either print something to a file, if the caller was expecting to print a callchain, or return an error code to state that resolving the callchain isn't possible. In some cases this was made easier because thread__resolve_callchain() already can fail for other reasons, so this new one (cursor == NULL) can be added and the callers don't have to explicitely check for this new condition. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf addr_location: Add init/exit/copy functionsIan Rogers2023-06-121-31/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct addr_location holds references to multiple reference counted objects. Add init/exit functions to make maintenance of those more consistent with the rest of the code and to try to avoid leaks. Modification of thread reference counts isn't included in this change. Committer notes: I needed to initialize result to sample->ip to make sure is set to something, fixing a compile time error, mostly keeping the previous logic as build_alloc_func_list() already does debugging/error prints about what went wrong if it takes the 'goto out'. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf thread: Add accessor functions for threadIan Rogers2023-06-121-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using accessors will make it easier to add reference count checking in later patches. Committer notes: thread->nsinfo wasn't wrapped as it is used together with nsinfo__zput(), where does a trick to set the field with a refcount being dropped to NULL, and that doesn't work well with using thread__nsinfo(thread), that loses the &thread->nsinfo pointer. When refcount checking is added to 'struct thread', later in this series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to check the thread pointer. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Fix allocation of evsel->priv related to per-event dump filesArnaldo Carvalho de Melo2023-06-121-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When printing output we may want to generate per event files, where the --per-event-dump option should be used, creating perf.data.EVENT.dump files instead of printing to stdout. The callback thar processes event thus expects that evsel->priv->fp should point to either the per-event FILE descriptor or to stdout. The a3af66f51bd0bca7 ("perf script: Fix crash because of missing evsel->priv") changeset fixed a case where evsel->priv wasn't setup, thus set to NULL, causing a segfault when trying to access evsel->priv->fp. But it did it for the non --per-event-dump case by allocating a 'struct perf_evsel_script' just to set its ->fp to stdout. Since evsel->priv is only freed when --per-event-dump is used, we ended up with a memory leak, detected using ASAN. Fix it by using the same method as perf_script__setup_per_event_dump(), and reuse that static 'struct perf_evsel_script'. Also check if evsel_script__new() failed. Fixes: a3af66f51bd0bca7 ("perf script: Fix crash because of missing evsel->priv") Reported-by: Ian Rogers <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Increase PID/TID width for outputNamhyung Kim2023-06-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On large systems, it's common that PID/TID is bigger than 5-digit and it makes the output unaligned. Let's increase the width to 7. Before: $ perf script ... swapper 0 [006] 1540823.803935: 1369324 cycles:P: ffffffff9c755588 ktime_get+0x18 ([kernel.kallsyms]) gvfsd-dnssd 95114 [004] 1540823.804164: 1643871 cycles:P: ffffffff9cfdca5c __get_user_8+0x1c ([kernel.kallsyms]) perf-exec 1558582 [000] 1540823.804209: 1018714 cycles:P: ffffffff9c924ab9 __slab_free+0x9 ([kernel.kallsyms]) nmcli 1558589 [007] 1540823.804384: 1859212 cycles:P: 7f70537a8ad8 __strchrnul_evex+0x18 (/usr/lib/x86_64-linux-gnu/libc.so.6> sleep 1558582 [000] 1540823.804456: 987425 cycles:P: 7fd35bb27b30 _dl_init+0x0 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2> dbus-daemon 3043 [003] 1540823.804575: 1564465 cycles:P: ffffffff9cb2bb70 llist_add_batch+0x0 ([kernel.kallsyms]) gdbus 1558592 [001] 1540823.804766: 1315219 cycles:P: ffffffff9c797b2e audit_filter_syscall+0x9e ([kernel.kallsyms]) NetworkManager 3452 [005] 1540823.805301: 1558782 cycles:P: 7fa957737748 g_bit_lock+0x58 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7400.5> After: $ perf script ... swapper 0 [006] 1540823.803935: 1369324 cycles:P: ffffffff9c755588 ktime_get+0x18 ([kernel.kallsyms]) gvfsd-dnssd 95114 [004] 1540823.804164: 1643871 cycles:P: ffffffff9cfdca5c __get_user_8+0x1c ([kernel.kallsyms]) perf-exec 1558582 [000] 1540823.804209: 1018714 cycles:P: ffffffff9c924ab9 __slab_free+0x9 ([kernel.kallsyms]) nmcli 1558589 [007] 1540823.804384: 1859212 cycles:P: 7f70537a8ad8 __strchrnul_evex+0x18 (/usr/lib/x86_64-linux-gnu/libc.so.6> sleep 1558582 [000] 1540823.804456: 987425 cycles:P: 7fd35bb27b30 _dl_init+0x0 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2> dbus-daemon 3043 [003] 1540823.804575: 1564465 cycles:P: ffffffff9cb2bb70 llist_add_batch+0x0 ([kernel.kallsyms]) gdbus 1558592 [001] 1540823.804766: 1315219 cycles:P: ffffffff9c797b2e audit_filter_syscall+0x9e ([kernel.kallsyms]) NetworkManager 3452 [005] 1540823.805301: 1558782 cycles:P: 7fa957737748 g_bit_lock+0x58 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7400.5> Reviewer notes: Adrian added: "Might be worth noting that currently the biggest PID_MAX_LIMIT is 2^22 so pids don't get bigger than 7 digits presently" $ echo $((2 ** 22)) 4194304 $ echo -n $((2 ** 22)) | wc -c 7 $ Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
* perf script: Add new output field 'dsoff' to print dso offsetChangbin Du2023-05-121-36/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new 'dsoff' field to print dso offset for resolved symbols, and the offset is appended to dso name. Default output: $ perf script ls 2695501 3011030.487017: 500000 cycles: 152cc73ef4b5 get_common_indices.constprop.0+0x155 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) ls 2695501 3011030.487018: 500000 cycles: ffffffff99045b3e [unknown] ([unknown]) ls 2695501 3011030.487018: 500000 cycles: ffffffff9968e107 [unknown] ([unknown]) ls 2695501 3011030.487018: 500000 cycles: ffffffffc1f54afb [unknown] ([unknown]) ls 2695501 3011030.487018: 500000 cycles: ffffffff9968382f [unknown] ([unknown]) ls 2695501 3011030.487019: 500000 cycles: ffffffff99e00094 [unknown] ([unknown]) ls 2695501 3011030.487019: 500000 cycles: 152cc718a8d0 __errno_location@plt+0x0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1) Display 'dsoff' field: $ perf script -F +dsoff ls 2695501 3011030.487017: 500000 cycles: 152cc73ef4b5 get_common_indices.constprop.0+0x155 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5) ls 2695501 3011030.487018: 500000 cycles: ffffffff99045b3e [unknown] ([unknown]) ls 2695501 3011030.487018: 500000 cycles: ffffffff9968e107 [unknown] ([unknown]) ls 2695501 3011030.487018: 500000 cycles: ffffffffc1f54afb [unknown] ([unknown]) ls 2695501 3011030.487018: 500000 cycles: ffffffff9968382f [unknown] ([unknown]) ls 2695501 3011030.487019: 500000 cycles: ffffffff99e00094 [unknown] ([unknown]) ls 2695501 3011030.487019: 500000 cycles: 152cc718a8d0 __errno_location@plt+0x0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0) ls 2695501 3011030.487019: 500000 cycles: ffffffff992a6db0 [unknown] ([unknown]) Signed-off-by: Changbin Du <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Hui Wang <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>