aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/base/node.c
Commit message (Collapse)AuthorAgeFilesLines
...
* drivers/base/node: Avoid manual device_create_file() callsTakashi Iwai2015-03-251-14/+20
| | | | | | | | | Instead of manual calls of multiple device_create_file() and device_remove_file(), use the static attribute groups assigned to the new device. This also fixes the possible races, too. Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* drivers/base: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo2015-02-141-1/+2
| | | | | | | | | | | | | | | printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * Line termination only requires one extra space at the end of the buffer. Use PAGE_SIZE - 1 instead of PAGE_SIZE - 2 when formatting. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* cpumask: factor out show_cpumap into separate helper functionSudeep Holla2014-11-071-10/+4
| | | | | | | | | | | | | | | | | | | | | | Many sysfs *_show function use cpu{list,mask}_scnprintf to copy cpumap to the buffer aligned to PAGE_SIZE, append '\n' and '\0' to return null terminated buffer with newline. This patch creates a new helper function cpumap_print_to_pagebuf in cpumask.h using newly added bitmap_print_to_pagebuf and consolidates most of those sysfs functions using the new helper function. Signed-off-by: Sudeep Holla <[email protected]> Suggested-by: Stephen Boyd <[email protected]> Tested-by: Stephen Boyd <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
* mm: remove noisy remainder of the scan_unevictable interfaceJohannes Weiner2014-10-101-3/+0
| | | | | | | | | | | | | | | | The deprecation warnings for the scan_unevictable interface triggers by scripts doing `sysctl -a | grep something else'. This is annoying and not helpful. The interface has been defunct since 264e56d8247e ("mm: disable user interface to manually rescue unevictable pages"), which was in 2011, and there haven't been any reports of usecases for it, only reports that the deprecation warnings are annying. It's unlikely that anybody is using this interface specifically at this point, so remove it. Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* driver/base/node: remove unnecessary kfree of node struct from ↵Yasuaki Ishimatsu2014-10-031-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unregister_one_node Commit 92d585ef067d ("numa: fix NULL pointer access and memory leak in unregister_one_node()") added kfree() of node struct in unregister_one_node(). But node struct is freed by node_device_release() which is called in unregister_node(). So by adding the kfree(), node struct is freed two times. While hot removing memory, the commit leads the following BUG_ON(): kernel BUG at mm/slub.c:3346! invalid opcode: 0000 [#1] SMP [...] Call Trace: [...] unregister_one_node [...] try_offline_node [...] remove_memory [...] acpi_memory_device_remove [...] acpi_bus_trim [...] acpi_bus_trim [...] acpi_device_hotplug [...] acpi_hotplug_work_fn [...] process_one_work [...] worker_thread [...] ? rescuer_thread [...] kthread [...] ? kthread_create_on_node [...] ret_from_fork [...] ? kthread_create_on_node This patch removes unnecessary kfree() from unregister_one_node(). Signed-off-by: Yasuaki Ishimatsu <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: [email protected] # v3.16+ Fixes: 92d585ef067d "numa: fix NULL pointer access and memory leak in unregister_one_node()" Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfacesRafael Aquini2014-08-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Historically, we exported shared pages to userspace via sysinfo(2) sharedram and /proc/meminfo's "MemShared" fields. With the advent of tmpfs, from kernel v2.4 onward, that old way for accounting shared mem was deemed inaccurate and we started to export a hard-coded 0 for sysinfo.sharedram. Later on, during the 2.6 timeframe, "MemShared" got re-introduced to /proc/meminfo re-branded as "Shmem", but we're still reporting sysinfo.sharedmem as that old hard-coded zero, which makes the "shared memory" report inconsistent across interfaces. This patch leverages the addition of explicit accounting for pages used by shmem/tmpfs -- "4b02108 mm: oom analysis: add shmem vmstat" -- in order to make the users of sysinfo(2) and si_meminfo*() friends aware of that vmstat entry and make them report it consistently across the interfaces, as well to make sysinfo(2) returned data consistent with our current API documentation states. Signed-off-by: Rafael Aquini <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* numa: fix NULL pointer access and memory leak in unregister_one_node()Xishi Qiu2014-03-091-0/+4
| | | | | | | | | | | | | | | | | | | | | When doing socket hot remove, "node_devices[nid]" is set to NULL; acpi_processor_remove() try_offline_node() unregister_one_node() Then hot add a socket, but do not echo 1 > /sys/devices/system/cpu/cpuXX/online, so register_one_node() will not be called, and "node_devices[nid]" is still NULL. If doing socket hot remove again, NULL pointer access will be happen. unregister_one_node() unregister_node() Another, we should free the memory used by "node_devices[nid]" in unregister_one_node(). Signed-off-by: Xishi Qiu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* thp: account anon transparent huge pages into NR_ANON_PAGESKirill A. Shutemov2013-09-121-6/+0
| | | | | | | | | | | | | | | | | | | | | | | We use NR_ANON_PAGES as base for reporting AnonPages to user. There's not much sense in not accounting transparent huge pages there, but add them on printing to user. Let's account transparent huge pages in NR_ANON_PAGES in the first place. Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Al Viro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: Jan Kara <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Ning Qu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* drivers/base/node.c: switch to register_hotmemory_notifier()Andrew Morton2013-04-291-2/+6
| | | | | | | | | | | Squishes a warning which my change to hotplug_memory_notifier() added. I want to keep that warning, because it is punishment for failnig to check the hotplug_memory_notifier() return value. Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* numa: add CONFIG_MOVABLE_NODE for movable-dedicated nodeLai Jiangshan2012-12-131-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | We need a node which only contains movable memory. This feature is very important for node hotplug. If a node has normal/highmem, the memory may be used by the kernel and can't be offlined. If the node only contains movable memory, we can offline the memory and the node. All are prepared, we can actually introduce N_MEMORY. add CONFIG_MOVABLE_NODE make we can use it for movable-dedicated node [[email protected]: fix Kconfig text] Signed-off-by: Lai Jiangshan <[email protected]> Tested-by: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Wen Congyang <[email protected]> Cc: Jiang Liu <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: David Rientjes <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* hugetlb: use N_MEMORY instead N_HIGH_MEMORYLai Jiangshan2012-12-131-1/+1
| | | | | | | | | | | | | | | | | | N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <[email protected]> Acked-by: Hillf Danton <[email protected]> Signed-off-by: Wen Congyang <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Lin Feng <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* drivers/base/node.c: cleanup node_state_attr[]Lai Jiangshan2012-12-121-10/+10
| | | | | | | | | | | | | use [index] = init_value use N_xxxxx instead of hardcode. Make it more readability and easier to add new state. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Wen Congyang <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: cleanup register_node()Yasuaki Ishimatsu2012-12-121-1/+1
| | | | | | | | | | | | | register_node() is defined as extern in include/linux/node.h. But the function is only called from register_one_node() in driver/base/node.c. So the patch defines register_node() as static. Signed-off-by: Yasuaki Ishimatsu <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* memory-hotplug: suppress "Device nodeX does not have a release() function" ↵Yasuaki Ishimatsu2012-12-121-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | warning When calling unregister_node(), the function shows following message at device_release(). "Device 'node2' does not have a release() function, it is broken and must be fixed." The reason is node's device struct does not have a release() function. So the patch registers node_device_release() to the device's release() function for suppressing the warning message. Additionally, the patch adds memset() to initialize a node struct into register_node(). Because the node struct is part of node_devices[] array and it cannot be freed by node_device_release(). So if system reuses the node struct, it has a garbage. Signed-off-by: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Wen Congyang <[email protected]> Cc: David Rientjes <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Minchan Kim <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* numa: convert static memory to dynamically allocated memory for per node deviceWen Congyang2012-12-121-16/+22
| | | | | | | | | | | | | | | We use a static array to store struct node. In many cases, we don't have too many nodes, and some memory will be unused. Convert it to per-device dynamically allocated memory. Signed-off-by: Wen Congyang <[email protected]> Cc: David Rientjes <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Minchan Kim <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: fix off-by-one bug in print_nodes_state()Ryota Ozaki2012-05-291-5/+3
| | | | | | | | | | | | | | | /sys/devices/system/node/{online,possible} outputs a garbage byte because print_nodes_state() returns content size + 1. To fix the bug, the patch changes the use of cpuset_sprintf_cpulist to follow the use at other places, which is clearer and safer. This bug was introduced in v2.6.24 (commit bde631a51876: "mm: add node states sysfs class attributeS"). Signed-off-by: Ryota Ozaki <[email protected]> Cc: Lee Schermerhorn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* drivers/base/memory.c: fix memory_dev_init() long delayYinghai Lu2012-02-021-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One system with 2048g ram, reported soft lockup on recent kernel. [ 34.426749] cpu_dev_init done [ 61.166399] BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] [ 61.166733] Modules linked in: [ 61.166904] irq event stamp: 1935610 [ 61.178431] hardirqs last enabled at (1935609): [<ffffffff81ce8c05>] mutex_lock_nested+0x299/0x2b4 [ 61.178923] hardirqs last disabled at (1935610): [<ffffffff81cf2bab>] apic_timer_interrupt+0x6b/0x80 [ 61.198767] softirqs last enabled at (1935476): [<ffffffff8106e59c>] __do_softirq+0x195/0x1ab [ 61.218604] softirqs last disabled at (1935471): [<ffffffff81cf359c>] call_softirq+0x1c/0x30 [ 61.238408] CPU 0 [ 61.238549] Modules linked in: [ 61.238744] [ 61.238825] Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc1-tip-yh-02076-g962f689-dirty #171 [ 61.278212] RIP: 0010:[<ffffffff810b3e3a>] [<ffffffff810b3e3a>] lock_release+0x90/0x9c [ 61.278627] RSP: 0018:ffff883f64dbfd70 EFLAGS: 00000246 [ 61.298287] RAX: ffff883f64dc0000 RBX: 0000000000000000 RCX: 000000000000008b [ 61.298690] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 61.318383] RBP: ffff883f64dbfda0 R08: 0000000000000001 R09: 000000000000008b [ 61.338215] R10: 0000000000000000 R11: 0000000000000000 R12: ffff883f64dbfd10 [ 61.338610] R13: ffff883f64dc0708 R14: ffff883f64dc0708 R15: ffffffff81095657 [ 61.358299] FS: 0000000000000000(0000) GS:ffff883f7d600000(0000) knlGS:0000000000000000 [ 61.378118] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 61.378450] CR2: 0000000000000000 CR3: 00000000024af000 CR4: 00000000000007f0 [ 61.398144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 61.417918] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 61.418260] Process swapper/0 (pid: 1, threadinfo ffff883f64dbe000, task ffff883f64dc0000) [ 61.445358] Stack: [ 61.445511] 0000000000000002 ffff897f649ba168 ffff883f64dbfe10 ffff88ff64bb57a8 [ 61.458040] 0000000000000000 0000000000000000 ffff883f64dbfdc0 ffffffff81ceb1b4 [ 61.458491] 000000000011608c ffff88ff64bb58a8 ffff883f64dbfdf0 ffffffff81c57638 [ 61.478215] Call Trace: [ 61.478367] [<ffffffff81ceb1b4>] _raw_spin_unlock+0x21/0x2e [ 61.497994] [<ffffffff81c57638>] klist_next+0x9e/0xbc [ 61.498264] [<ffffffff8148ba99>] next_device+0xe/0x1e [ 61.517867] [<ffffffff8148c0cc>] subsys_find_device_by_id+0xb7/0xd6 [ 61.518197] [<ffffffff81498846>] find_memory_block_hinted+0x3d/0x66 [ 61.537927] [<ffffffff8149887f>] find_memory_block+0x10/0x12 [ 61.538193] [<ffffffff814988b6>] add_memory_section+0x35/0x9e [ 61.557932] [<ffffffff827fecef>] memory_dev_init+0x68/0xda [ 61.558227] [<ffffffff827fec01>] driver_init+0x97/0xa7 [ 61.577853] [<ffffffff827cdf3c>] kernel_init+0xf6/0x1c0 [ 61.578140] [<ffffffff81cf34a4>] kernel_thread_helper+0x4/0x10 [ 61.597850] [<ffffffff81ceb59d>] ? retint_restore_args+0xe/0xe [ 61.598144] [<ffffffff827cde46>] ? start_kernel+0x3ab/0x3ab [ 61.617826] [<ffffffff81cf34a0>] ? gs_change+0xb/0xb [ 61.618060] Code: 10 48 83 3b 00 eb e8 4c 89 f2 44 89 fe 4c 89 ef e8 e1 fe ff ff 65 48 8b 04 25 40 bc 00 00 c7 80 cc 06 00 00 00 00 00 00 41 54 9d <5e> 5b 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5 41 57 41 89 cf [ 89.285380] memory_dev_init done Finally it takes about 55s to create 16400 memory entries. Root cause: for x86_64, 2048g (with 2g hole at [2g,4g), and TOP2 will be 2050g), will have 16400 memory block. find_memory_block/subsys_find_device_by_id will be expensive with that many entries. Actually, we don't need to find that memory block for BOOT path. Skip that finding make it get back to normal. [ 34.466696] cpu_dev_init done [ 35.290080] memory_dev_init done Also solved the delay with topology_init when sections_per_block is not 1. Signed-off-by: Yinghai Lu <[email protected]> Cc: Kay Sievers <[email protected]> Cc: Nathan Fontenot <[email protected]> Cc: Robin Holt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* Merge branch 'driver-core-next' into Linux 3.2Greg Kroah-Hartman2012-01-061-74/+80
|\ | | | | | | | | | | | | | | | | | | | | | | This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <[email protected]> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <[email protected]>
| * convert 'memory' sysdev_class to a regular subsystemKay Sievers2011-12-211-70/+76
| | | | | | | | | | | | | | | | | | | | | | | | This moves the 'memory sysdev_class' over to a regular 'memory' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
| * cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystemKay Sievers2011-12-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Paul Mundt <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Tigran Aivazian <[email protected]> Cc: Len Brown <[email protected]> Cc: Zhang Rui <[email protected]> Cc: Dave Jones <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Russell King <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: "Srivatsa S. Bhat" <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* | drivers/base/node.c: fix compilation error with older versions of gccClaudio Scordino2011-11-181-6/+8
|/ | | | | | | | | | Patch to fix the error message "directives may not be used inside a macro argument" which appears when the kernel is compiled for the cris architecture. Signed-off-by: Claudio Scordino <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* mm: per-node vmstat: show proper vmstatsKOSAKI Motohiro2011-05-251-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2ac390370a ("writeback: add /sys/devices/system/node/<node>/vmstat") added vmstat entry. But strangely it only show nr_written and nr_dirtied. # cat /sys/devices/system/node/node20/vmstat nr_written 0 nr_dirtied 0 Of course, It's not adequate. With this patch, the vmstat show all vm stastics as /proc/vmstat. # cat /sys/devices/system/node/node0/vmstat nr_free_pages 899224 nr_inactive_anon 201 nr_active_anon 17380 nr_inactive_file 31572 nr_active_file 28277 nr_unevictable 0 nr_mlock 0 nr_anon_pages 17321 nr_mapped 8640 nr_file_pages 60107 nr_dirty 33 nr_writeback 0 nr_slab_reclaimable 6850 nr_slab_unreclaimable 7604 nr_page_table_pages 3105 nr_kernel_stack 175 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 260 nr_dirtied 1050 nr_written 938 numa_hit 962872 numa_miss 0 numa_foreign 0 numa_interleave 8617 numa_local 962872 numa_other 0 nr_anon_transparent_hugepages 0 [[email protected]: no externs in .c files] Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: Michael Rubin <[email protected]> Cc: Wu Fengguang <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* memory hotplug: Update phys_index to [start|end]_section_nrNathan Fontenot2011-02-041-4/+8
| | | | | | | | | | | | | | | | | | | | | | | Update the 'phys_index' property of a the memory_block struct to be called start_section_nr, and add a end_section_nr property. The data tracked here is the same but the updated naming is more in line with what is stored here, namely the first and last section number that the memory block spans. The names presented to userspace remain the same, phys_index for start_section_nr and end_phys_index for end_section_nr, to avoid breaking anything in userspace. This also updates the node sysfs code to be aware of the new capability for a memory block to contain multiple memory sections and be aware of the memory block structure name changes (start_section_nr). This requires an additional parameter to unregister_mem_sect_under_nodes so that we know which memory section of the memory block to unregister. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robin Holt <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* thp: transparent hugepage sysfs meminfoDavid Rientjes2011-01-141-3/+18
| | | | | | | | | | Add hugepage statistics to per-node sysfs meminfo Reviewed-by: Rik van Riel <[email protected]> Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* writeback: add /sys/devices/system/node/<node>/vmstatMichael Rubin2010-10-261-0/+14
| | | | | | | | | | | | | | | | | | | For NUMA node systems it is important to have visibility in memory characteristics. Two of the /proc/vmstat values "nr_written" and "nr_dirtied" are added here. # cat /sys/devices/system/node/node20/vmstat nr_written 0 nr_dirtied 0 Signed-off-by: Michael Rubin <[email protected]> Reviewed-by: Wu Fengguang <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* driver core: Convert link_mem_sections to use find_memory_block_hinted.Robin Holt2010-10-221-3/+5
| | | | | | | | | | | | | | | | | Modify link_mem_sections() to pass in the previous mem_block as a hint to locating the next mem_block. Since they are typically added in order this results in a massive saving in time during boot of a very large system. For example, on a 16TB x86_64 machine, it reduced the total time spent linking all node's memory sections from 1 hour, 27 minutes to 46 seconds. Signed-off-by: Robin Holt <[email protected]> To: Gary Hade <[email protected]> To: Badari Pulavarty <[email protected]> To: Ingo Molnar <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* drivers/base/node.c: reduce stack usage of node_read_meminfo()KOSAKI Motohiro2010-08-101-23/+23
| | | | | | | | | | | | | | drivers/base/node.c: In function 'node_read_meminfo': drivers/base/node.c:139: warning: the frame size of 848 bytes is larger than 512 bytes Fix it by splitting the sprintf() into three parts. It has no functional change. Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: compaction: add /sys trigger for per-node memory compactionMel Gorman2010-05-251-0/+3
| | | | | | | | | | | | | | | | | Add a per-node sysfs file called compact. When the file is written to, each zone in that node is compacted. The intention that this would be used by something like a job scheduler in a batch system before a job starts so that the job can allocate the maximum number of hugepages without significant start-up cost. Signed-off-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* nodemask: include slab.h from drivers/base/node.cTejun Heo2010-04-061-1/+1
| | | | | | | | | | | | | | NODEMASK_ALLOC/FREE are mapped to kmalloc/free if NODES_SHIFT > 8. Among its several users, drivers/base/node.c wasn't including slab.h leading to build failure if NODES_SHIFT > 8. Include slab.h from drivers/base/node.c. This isn't an ideal solution but including slab.h directly from nodemask.h is not an option because nodemask.h gets included everywhere. For now, make it work by including slab.h from its users. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Ingo Molnar <[email protected]>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <[email protected]> Guess-its-ok-by: Christoph Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Lee Schermerhorn <[email protected]>
* driver core: numa: fix BUILD_BUG_ON for node_read_distanceDavid Rientjes2010-03-191-2/+5
| | | | | | | | | | | | | node_read_distance() has a BUILD_BUG_ON() to prevent buffer overruns when the number of nodes printed will exceed the buffer length. Each node only needs four chars: three for distance (maximum distance is 255) and one for a seperating space or a trailing newline. Signed-off-by: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* sysdev: Use sysdev_class attribute arrays in node driverAndi Kleen2010-03-081-15/+16
| | | | | | | | | | | | Convert the node driver to sysdev_class attribute arrays. This greatly cleans up the code and remove a lot of code. Saves ~150 bytes of code on x86-64. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* sysdev: Convert node driver class attributes to be data drivenAndi Kleen2010-03-081-47/+18
| | | | | | | | | | Using the new attribute argument convert the node driver class attributes to carry the node state. Then use a shared function to do what a lot of individual functions did before. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* sysdev: Pass attribute in sysdev_class attributes show/storeAndi Kleen2010-03-081-5/+12
| | | | | | | | | | | | | | | | | | | | | | | Passing the attribute to the low level IO functions allows all kinds of cleanups, by sharing low level IO code without requiring an own function for every piece of data. Also drivers can extend the attributes with own data fields and use that in the low level function. Similar to sysdev_attributes and normal attributes. This is a tree-wide sweep, converting everything in one go. No functional changes in this patch other than passing the new argument everywhere. Tested on x86, the non x86 parts are uncompiled. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
* mm: slab-allocate memory section nodemask for large systemsDavid Rientjes2009-12-151-4/+9
| | | | | | | | | | | | | | | | | | | Nodemasks should not be allocated on the stack for large systems (when it is larger than 256 bytes) since there is a threat of overflow. This patch causes the unregister_mem_sect_under_nodes() nodemask to be allocated on the stack for smaller systems and be allocated by slab for larger systems. GFP_KERNEL is used since remove_memory_block() can block. Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Alex Chiang <[email protected]> Signed-off-by: David Rientjes <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: add numa node symlink for cpu devices in sysfsAlex Chiang2009-12-151-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | You can discover which CPUs belong to a NUMA node by examining /sys/devices/system/node/node#/ However, it's not convenient to go in the other direction, when looking at /sys/devices/system/cpu/cpu#/ Yes, you can muck about in sysfs, but adding these symlinks makes life a lot more convenient. Signed-off-by: Alex Chiang <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg KH <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: refactor unregister_cpu_under_node()Alex Chiang2009-12-151-6/+12
| | | | | | | | | | | | | | | | | | | By returning early if the node is not online, we can unindent the interesting code by two levels. No functional change. Signed-off-by: Alex Chiang <[email protected]> Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg KH <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: refactor register_cpu_under_node()Alex Chiang2009-12-151-9/+11
| | | | | | | | | | | | | | | | | | | By returning early if the node is not online, we can unindent the interesting code by one level. No functional change. Signed-off-by: Alex Chiang <[email protected]> Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg KH <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: add numa node symlink for memory section in sysfsAlex Chiang2009-12-151-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit c04fc586c (mm: show node to memory section relationship with symlinks in sysfs) created symlinks from nodes to memory sections, e.g. /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 If you're examining the memory section though and are wondering what node it might belong to, you can find it by grovelling around in sysfs, but it's a little cumbersome. Add a reverse symlink for each memory section that points back to the node to which it belongs. Signed-off-by: Alex Chiang <[email protected]> Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Ingo Molnar <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Greg KH <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* hugetlb: offload per node attribute registrationsLee Schermerhorn2009-12-151-10/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Offload the registration and unregistration of per node hstate sysfs attributes to a worker thread rather than attempt the allocation/attachment or detachment/freeing of the attributes in the context of the memory hotplug handler. I don't know that this is absolutely required, but the registration can sleep in allocations and other mem hot plug handlers do it this way. If it turns out this is NOT required, we can drop this patch. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Nishanth Aravamudan <[email protected]> Cc: David Rientjes <[email protected]> Cc: Adam Litke <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: Eric Whitney <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* hugetlb: handle memory hot-plug eventsLee Schermerhorn2009-12-151-5/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Register per node hstate attributes only for nodes with memory. As suggested by David Rientjes. With Memory Hotplug, memory can be added to a memoryless node and a node with memory can become memoryless. Therefore, add a memory on/off-line notifier callback to [un]register a node's attributes on transition to/from memoryless state. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Nishanth Aravamudan <[email protected]> Cc: Adam Litke <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: Eric Whitney <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* hugetlb: add per node hstate attributesLee Schermerhorn2009-12-151-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the per huge page size control/query attributes to the per node sysdevs: /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/ nr_hugepages - r/w free_huge_pages - r/o surplus_huge_pages - r/o The patch attempts to re-use/share as much of the existing global hstate attribute initialization and handling, and the "nodes_allowed" constraint processing as possible. Calling set_max_huge_pages() with no node indicates a change to global hstate parameters. In this case, any non-default task mempolicy will be used to generate the nodes_allowed mask. A valid node id indicates an update to that node's hstate parameters, and the count argument specifies the target count for the specified node. From this info, we compute the target global count for the hstate and construct a nodes_allowed node mask contain only the specified node. Setting the node specific nr_hugepages via the per node attribute effectively ignores any task mempolicy or cpuset constraints. With this patch: (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB ./ ../ free_hugepages nr_hugepages surplus_hugepages Starting from: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 0 Node 2 HugePages_Free: 0 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 0 Allocate 16 persistent huge pages on node 2: (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages [Note that this is equivalent to: numactl -m 2 hugeadmin --pool-pages-min 2M:+16 ] Yields: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 16 Node 2 HugePages_Free: 16 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 16 Global controls work as expected--reduce pool to 8 persistent huge pages: (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 8 Node 2 HugePages_Free: 8 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 Signed-off-by: Lee Schermerhorn <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Nishanth Aravamudan <[email protected]> Cc: David Rientjes <[email protected]> Cc: Adam Litke <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: Eric Whitney <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: oom analysis: add shmem vmstatKOSAKI Motohiro2009-09-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Recently we encountered OOM problems due to memory use of the GEM cache. Generally a large amuont of Shmem/Tmpfs pages tend to create a memory shortage problem. We often use the following calculation to determine the amount of shmem pages: shmem = NR_ACTIVE_ANON + NR_INACTIVE_ANON - NR_ANON_PAGES however the expression does not consider isolated and mlocked pages. This patch adds explicit accounting for pages used by shmem and tmpfs. Signed-off-by: KOSAKI Motohiro <[email protected]> Acked-by: Rik van Riel <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Acked-by: Wu Fengguang <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: oom analysis: Show kernel stack usage in /proc/meminfo and OOM log outputKOSAKI Motohiro2009-09-221-0/+3
| | | | | | | | | | | | | | | | The amount of memory allocated to kernel stacks can become significant and cause OOM conditions. However, we do not display the amount of memory consumed by stacks. Add code to display the amount of memory used for stacks in /proc/meminfo. Signed-off-by: KOSAKI Motohiro <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: remove CONFIG_UNEVICTABLE_LRU config optionKOSAKI Motohiro2009-06-171-4/+0
| | | | | | | | | | | | | | | | Currently, nobody wants to turn UNEVICTABLE_LRU off. Thus this configurability is unnecessary. Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Andi Kleen <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Matt Mackall <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Lee Schermerhorn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* cpumask: replace node_to_cpumask with cpumask_of_node.Rusty Russell2009-03-131-1/+1
| | | | | | | | | | Impact: cleanup node_to_cpumask (and the blecherous node_to_cpumask_ptr which contained a declaration) are replaced now everyone implements cpumask_of_node. Signed-off-by: Rusty Russell <[email protected]>
* mm: get_nid_for_pfn() returns intRoel Kluin2009-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | get_nid_for_pfn() returns int Presumably the (nid < 0) case has never happened. We do know that it is happening on one system while creating a symlink for a memory section so it should also happen on the same system if unregister_mem_sect_under_nodes() were called to remove the same symlink. The test was actually added in response to a problem with an earlier version reported by Yasunori Goto where one or more of the leading pages of a memory section on the 2nd node of one of his systems was uninitialized because I believe they coincided with a memory hole. That earlier version did not ignore uninitialized pages and determined the nid by considering only the 1st page of each memory section. This caused the symlink to the 1st memory section on the 2nd node to be incorrectly created in /sys/devices/system/node/node0 instead of /sys/devices/system/node/node1. The problem was fixed by adding the test to skip over uninitialized pages. I suspect we have not seen any reports of the non-removal of a symlink due to the incorrect declaration of the nid variable in unregister_mem_sect_under_nodes() because - systems where a memory section could have an uninitialized range of leading pages are probably rare. - memory remove is probably not done very frequently on the systems that are capable of demonstrating the problem. - lingering symlink(s) that should have been removed may have simply gone unnoticed. [[email protected]: wrote changelog] Signed-off-by: Roel Kluin <[email protected]> Cc: Gary Hade <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* mm: show node to memory section relationship with symlinks in sysfsGary Hade2009-01-061-0/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. Signed-off-by: Gary Hade <[email protected]> Signed-off-by: Badari Pulavarty <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
* cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and ↵Rusty Russell2008-12-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | cpulist_scnprintf to take pointers. Impact: change calling convention of existing cpumask APIs Most cpumask functions started with cpus_: these have been replaced by cpumask_ ones which take struct cpumask pointers as expected. These four functions don't have good replacement names; fortunately they're rarely used, so we just change them over. Signed-off-by: Rusty Russell <[email protected]> Signed-off-by: Mike Travis <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Cc: [email protected]
* vmscan: unevictable LRU scan sysctlLee Schermerhorn2008-10-201-0/+5
| | | | | | | | | | | | | | | | | | | | | | | This patch adds a function to scan individual or all zones' unevictable lists and move any pages that have become evictable onto the respective zone's inactive list, where shrink_inactive_list() will deal with them. Adds sysctl to scan all nodes, and per node attributes to individual nodes' zones. Kosaki: If evictable page found in unevictable lru when write /proc/sys/vm/scan_unevictable_pages, print filename and file offset of these pages. [[email protected]: fix one CONFIG_MMU=n build error] [[email protected]: adapt vmscan-unevictable-lru-scan-sysctl.patch to new sysfs API] Signed-off-by: Lee Schermerhorn <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>