2008-10-14 00:24 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3
2008-10-14 01:50 -!- less(~less@145-116-238-192.uilenstede.casema.nl) has joined #tux3
2008-10-14 02:41 -!- pgquiles_(~pgquiles@249.Red-79-155-127.staticIP.rima-tde.net) has joined #tux3
2008-10-14 02:42 -!- ceatinge_(~ceatinge@veryclever.net) has joined #tux3
2008-10-14 02:54 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-10-14 02:54 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3
2008-10-14 06:31 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-10-14 06:31 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3
2008-10-14 09:36 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-10-14 12:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-14 15:44 -!- mokkpr01(~chatzilla@133-132.127-70.tampabay.res.rr.com) has joined #tux3
2008-10-14 18:30 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-14 18:57 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3
2008-10-14 19:23 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-14 19:45 ACTION is wondering in buffer.c...
2008-10-14 19:47 wandering?
2008-10-14 19:48 just saw an article in the news about McCain and YouTube and the DMCA... cute
2008-10-14 19:49 right, wandering :P
2008-10-14 19:50 damn...
I pressed the wrong button and the whole chat window was cleared :|
2008-10-14 19:52 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-14 19:52 Hi
2008-10-14 19:56 hi ralucam
2008-10-14 19:57 T-3 minutes
2008-10-14 19:59 T-1
2008-10-14 20:00 ACTION is ready
2008-10-14 20:01 ok
2008-10-14 20:02 last time we went delving into the relationship between pages and buffers
2008-10-14 20:02 let's do some more of that
2008-10-14 20:02 let's look at sb_bread
2008-10-14 20:03 http://lxr.linux.no/linux+v2.6.26.6/include/linux/buffer_head.h#L278 ?
2008-10-14 20:03 2.6.27 indexed yet?
2008-10-14 20:03 http://lxr.linux.no/linux+v2.6.27/include/linux/buffer_head.h#L278 :D
2008-10-14 20:03 right
2008-10-14 20:03 you don't think sticking to 2.6.26 is worth it?
2008-10-14 20:03 I don't know if the search works though...
2008-10-14 20:04 2.6.26 is fine
2008-10-14 20:04 we don't need to get off on lockless page cache right now
2008-10-14 20:04 ok, bread is the classic bsd way of accessing buffer cache
2008-10-14 20:05 just one parameter, the buffer, and the block to read is in the buffer struct
2008-10-14 20:05 which I will just call buffer instead of buffer_head from now on
2008-10-14 20:05 the _head is entirely fluff, doesn't mean anything
2008-10-14 20:06 struct buffer traditionally also has a size
2008-10-14 20:06 that was a stupid idea
2008-10-14 20:06 and we have largely dropped that now
2008-10-14 20:07 instead, the size is taken from a field in the superblock, which is why we now have sb_bread, taking an sb, a physical block on the device referenced by the sb, and returning a buffer
2008-10-14 20:07 sb_bread(struct super_block *sb, sector_t block)
2008-10-14 20:07 seems to take a block number as a parameter...
2008-10-14 20:07 yes
2008-10-14 20:08 oh, misinterpreted your comment
2008-10-14 20:08 (to mean the sb had a field with the block number)
2008-10-14 20:08 and my comment re the original bread was wrong, does not take a buffer
2008-10-14 20:08 http://www.ipnom.com/FreeBSD-Man-Pages/bread.3.html
2008-10-14 20:08 bread(struct uufsd *disk, ufs2_daddr_t blockno, void *data, size_t size)
2008-10-14 20:09 let's go find the old linux one just for interest
2008-10-14 20:09 2.4?
2008-10-14 20:09 yes
2008-10-14 20:10 you sure sb_bread doesn't just read the superblock from a specific block number?
2008-10-14 20:10 http://lxr.linux.no/linux-old+v2.4.31/fs/buffer.c#L1189
2008-10-14 20:10 yes, I'm sure
2008-10-14 20:10 oh, right
2008-10-14 20:11 missed the previous line ;-) someone posted a link to 278 instead of 277 ;-)
2008-10-14 20:11 struct buffer_head * bread(kdev_t dev, int block, int size) <- the legacy linux version
2008-10-14 20:11 I feel stupid...
2008-10-14 20:11 the freebsd version fell even more off the tracks
2008-10-14 20:11 ah, you just tied me for mistakes tonight ;)
2008-10-14 20:12 the trick of the pro is to make those mistakes faster than the amateur
2008-10-14 20:12 Maze: sorry :P
2008-10-14 20:12 lol
2008-10-14 20:13 ok, back to sb_bread
2008-10-14 20:13 simply calls __bread
2008-10-14 20:13 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1437
2008-10-14 20:13 which no longer needs to know anything about the sb
2008-10-14 20:14 the only reason we needed the sb was to know the blocksize and the underlying device
2008-10-14 20:14 this should have simply been called "bread"
2008-10-14 20:14 that is, the sb_bread should have been bread
2008-10-14 20:14 I'm guessing there are some weird interactions if you call these functions with non-constant size values
2008-10-14 20:15 don't do it
2008-10-14 20:15 never has worked properly
2008-10-14 20:15 never will
2008-10-14 20:15 putting the blocksize in the struct buffer was just a big mistake
2008-10-14 20:15 so size is
basically a device property then?
2008-10-14 20:15 not really
2008-10-14 20:15 has been at times
2008-10-14 20:16 has caused lots of bugs
2008-10-14 20:16 see set_block_size
2008-10-14 20:16 or some name like that
2008-10-14 20:16 again, doesn't work properly
2008-10-14 20:16 the buffer size is properly just a property of the superblock, and actually one you can ignore
2008-10-14 20:16 as long as you don't overlap buffers of different sizes
2008-10-14 20:17 there is no cache coherence in that case
2008-10-14 20:17 meta-question: if we want to use bio's for everything... why do we care about bufferheads?
2008-10-14 20:17 we're going to arrive at a bio pretty soon in this little side trip
2008-10-14 20:17 let's try __bread_slow
2008-10-14 20:17 this code path has gotten deeply messed lately
2008-10-14 20:18 with various optimizations + historical cruft
2008-10-14 20:18 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1239
2008-10-14 20:18 we see submit_bh there
2008-10-14 20:18 let's go in
2008-10-14 20:18 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L2862
2008-10-14 20:19 nothing terribly surprising
2008-10-14 20:19 and there we see some code much like you wrote for junkfs
2008-10-14 20:19 this is actually kind of a stupid arrangement
2008-10-14 20:19 the bio could have been allocated on the stack of the caller
2008-10-14 20:19 because we do a sync wait in __bread_slow
2008-10-14 20:20 ok, that is it for sb_bread
2008-10-14 20:20 anything not completely clear there?
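The call chain walked through above (sb_bread takes the block size and device from the superblock, then hands off to a bread-style function that finds or creates the buffer and reads it only if needed) can be sketched in userspace roughly as follows. This is a toy model under stated assumptions, not the kernel code: every `toy_` name, the in-memory "device", and the fixed-size cache with no eviction are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy userspace model; every toy_ name is hypothetical, not kernel API. */
enum { TOY_DEV_BYTES = 1 << 16, TOY_CACHE_SLOTS = 8 };

struct toy_dev { unsigned char data[TOY_DEV_BYTES]; };

struct toy_sb {
	struct toy_dev *dev;	/* underlying device */
	unsigned blocksize;	/* the one place the block size lives */
};

struct toy_buffer {
	struct toy_dev *dev;
	unsigned long block;
	unsigned size;
	int uptodate;		/* has the data been read from the device? */
	unsigned char *data;	/* NULL means this cache slot is unused */
};

static struct toy_buffer toy_cache[TOY_CACHE_SLOTS];

/* Find the buffer in the cache or create it; performs no I/O. */
struct toy_buffer *toy_getblk(struct toy_dev *dev, unsigned long block, unsigned size)
{
	struct toy_buffer *free_slot = NULL;

	assert(size > 0);
	for (int i = 0; i < TOY_CACHE_SLOTS; i++) {
		struct toy_buffer *b = &toy_cache[i];
		if (b->data && b->dev == dev && b->block == block && b->size == size)
			return b;		/* cache hit */
		if (!b->data && !free_slot)
			free_slot = b;
	}
	if (!free_slot)
		return NULL;			/* the toy cache never evicts */
	free_slot->data = malloc(size);
	if (!free_slot->data)
		return NULL;
	free_slot->dev = dev;
	free_slot->block = block;
	free_slot->size = size;
	free_slot->uptodate = 0;
	return free_slot;
}

/* Shaped like the legacy bread(dev, block, size): getblk, then read if needed. */
struct toy_buffer *toy_bread(struct toy_dev *dev, unsigned long block, unsigned size)
{
	struct toy_buffer *b;

	if (size == 0 || block >= TOY_DEV_BYTES / size)
		return NULL;			/* block lies outside the device */
	b = toy_getblk(dev, block, size);
	if (b && !b->uptodate) {
		/* stands in for submitting I/O and waiting for completion */
		memcpy(b->data, dev->data + block * size, size);
		b->uptodate = 1;
	}
	return b;
}

/* The sb is only consulted for the device and the block size. */
struct toy_buffer *toy_sb_bread(struct toy_sb *sb, unsigned long block)
{
	return toy_bread(sb->dev, block, sb->blocksize);
}
```

Note that the sketch keeps the historical "return NULL" error convention, so the caller cannot tell an out-of-range block from an allocation failure.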
2008-10-14 20:20 error handling ;-)
2008-10-14 20:21 hah
2008-10-14 20:21 very poor on this path
2008-10-14 20:21 these functions historically had no error report except "return NULL"
2008-10-14 20:21 much of Linux is still that way, very slowly changing
2008-10-14 20:22 if you check out my latest revs to the tux3 bio interface, there is a mechanism for sending back accurate errors there
2008-10-14 20:22 but usually we tend to drop the ball somewhere in the call chain, and not return the actual error, the higher level just guesses
2008-10-14 20:22 usually the guess is EIO or ENOMEM, randomly
2008-10-14 20:22 yeah, submit_bh returns a value, but it doesn't get checked, etc
2008-10-14 20:22 "don't be part of the problem"
2008-10-14 20:23 when you write your own kernel code
2008-10-14 20:23 you will even see stuff like that in my user space simulation
2008-10-14 20:23 C is just not very good at returning error codes
2008-10-14 20:23 anyway... I will fix it over time
2008-10-14 20:23 -!- cydork(~vihang@59.184.62.147) has joined #tux3
2008-10-14 20:23 it's the penalty you pay for having full control of exceptions...
2008-10-14 20:24 for not having?
2008-10-14 20:24 oh
2008-10-14 20:24 kind of
2008-10-14 20:25 it's more about having no good way to return multiple results from a function, one of which is an error code
2008-10-14 20:25 there's IS_ERR
2008-10-14 20:25 ever used it?
2008-10-14 20:25 painful
2008-10-14 20:25 beautiful hack if there ever was one...
2008-10-14 20:25 semantics are not completely obvious either
2008-10-14 20:25 but yeah, it's a little painful
2008-10-14 20:26 often not clear whether it wants err or -err
2008-10-14 20:26 I think part of the problem is it doesn't consider null an error
2008-10-14 20:26 IS_ERR? it wants a pointer
2008-10-14 20:26 and returns bool
2008-10-14 20:26 ERR_PTR
2008-10-14 20:26 I think of it as all one thing
2008-10-14 20:26 clumsy
2008-10-14 20:26 but what can you do?
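For readers who have not met the IS_ERR / ERR_PTR idiom discussed above, here is a minimal userspace imitation of the idea: a small negative errno is encoded in the top few thousand values of the pointer range, so one return value can carry either a valid pointer or an error. The `toy_` names and `toy_lookup` are hypothetical; the kernel's own macros live in its err.h and may differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of the idiom; toy_ names are hypothetical. */
#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value (e.g. -ENOMEM) as a pointer. */
static inline void *toy_err_ptr(long error)
{
	assert(error < 0 && error >= -TOY_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Decode the errno back out; only meaningful if toy_is_err() is true. */
static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for the top TOY_MAX_ERRNO addresses; NULL is NOT an error here. */
static inline int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

static int toy_table[4] = { 10, 20, 30, 40 };

/* Example callee: returns a valid pointer or an encoded -EINVAL. */
int *toy_lookup(int i)
{
	if (i < 0 || i >= 4)
		return toy_err_ptr(-EINVAL);
	return &toy_table[i];
}
```

The two points raised in the chat show up directly: the encoder wants the negative value (-err, not err), and a NULL return slips through the check, so a caller that may see both has to test for NULL separately.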
2008-10-14 20:27 everybody seen vecio and syncio from tux3/super.c ?
2008-10-14 20:28 ACTION did not :(
2008-10-14 20:28 linky?
2008-10-14 20:28 http://phunq.net/ddtree?p=tux3fs;a=blob;f=fs/tux3/super.c;h=1023f06407bc8752e0afc4c2c71940023a18b9f9;hb=HEAD
2008-10-14 20:29 I have to set this repo up better
2008-10-14 20:29 junkfs_fill_super - dead code?
2008-10-14 20:29 it does the only actual work
2008-10-14 20:30 oh lol
2008-10-14 20:30 called from ext3_fill_super
2008-10-14 20:30 yeah see it now
2008-10-14 20:30 tux3_...
2008-10-14 20:30 will disappear next rev, yes it was humor
2008-10-14 20:30 anyway, there you see a far more elegant way of getting a block into memory than sb_bread
2008-10-14 20:31 we only need to know about sb_bread to know how other filesystems do it
2008-10-14 20:31 well
2008-10-14 20:31 sb_bread is still important to us
2008-10-14 20:31 let's go take a look at another part of it
2008-10-14 20:32 where it enters the buffer into the buffer cache
2008-10-14 20:32 almost forgot about that, the most important thing
2008-10-14 20:32 __getblk actually creates the buffer and does this job
2008-10-14 20:33 just like in the tux3 buffer cache emulation
2008-10-14 20:33 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1403
2008-10-14 20:33 thanks
2008-10-14 20:33 you got there seconds ahead of me ;)
2008-10-14 20:34 what I'm still unsure of...
2008-10-14 20:34 also: http://tux3.org/tux3?f=a4a6f8e640c5;file=user/test/buffer.c <- search for "bread"
2008-10-14 20:34 is why we need a buffer cache
2008-10-14 20:34 don't we have a page cache already?
2008-10-14 20:34 short answer: filesystem metadata
2008-10-14 20:35 there is a page cache dedicated to the block device itself, in addition to a page cache for each inode
2008-10-14 20:35 but when block size does not match page size, it is pretty much impossible to do locking properly with page sized units
2008-10-14 20:36 so what we do instead, is use the buffer attached to the pages as our locking units
2008-10-14 20:36 we looked at that last thursday
2008-10-14 20:36 I thought we'd already put the metadata in files... ;-)
2008-10-14 20:36 but at the time did not really know what it was for
2008-10-14 20:36 heh
2008-10-14 20:36 well what about the metadata for the metadata files?
2008-10-14 20:37 ultimately we have to go cache some absolute blocks
2008-10-14 20:37 I thought that was in RAM ;-)
2008-10-14 20:37 uhm the superblock?
2008-10-14 20:37 but let's not divert the topic too much
2008-10-14 20:38 and the blocks that index the files
2008-10-14 20:38 they can't themselves be in files, unless you want a really evil result like ntfs
2008-10-14 20:38 and even then, you can't put _all_ of the file index blocks in files
2008-10-14 20:39 ok, __getblk
2008-10-14 20:39 there's a "friend of grab_cache_page"
2008-10-14 20:39 __find_get_block
2008-10-14 20:40 just a wrapper for __find_get_block_slow
2008-10-14 20:40 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1118
2008-10-14 20:40 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1370
2008-10-14 20:40 see the touch_buffer() there?
that implements the lru
2008-10-14 20:41 brings the underlying page to the hot end of the lru
2008-10-14 20:41 we should be here now http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L262
2008-10-14 20:42 now we found a real friend of grab_cache_page, as opposed to a mere hanger on
2008-10-14 20:42 find_get_page
2008-10-14 20:42 http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L630
2008-10-14 20:43 we don't need to go there just now, suffice to say that if the page isn't in the page cache it doesn't try to add it
2008-10-14 20:44 back at _slow...
2008-10-14 20:45 if we find a page in the page cache and it has buffers, then we loop across the buffer list mod the ratio of the buffer size to the page size
2008-10-14 20:46 ah, and we do some evil cruft with the buffer_mapped concept
2008-10-14 20:46 buffer_mapped meaning that the b_blocknr field in the buffer is filled in with a physical block number
2008-10-14 20:46 and a bit is set in the buffer flags to indicate this is so
2008-10-14 20:47 actually, that field is entirely redundant in the case of the buffer cache
2008-10-14 20:47 because we can always know the physical device offset from the page->index of the underlying page that stores the buffer data
2008-10-14 20:48 this code returns with a pointer to buffer in "ret"
2008-10-14 20:48 crufty stuff
2008-10-14 20:49 ok, that was the fast path
2008-10-14 20:49 if the buffer wasn't there then we fall onto the slow path
2008-10-14 20:49 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1403
2008-10-14 20:49 in __getblk
2008-10-14 20:50 see that hardsector size stuff
2008-10-14 20:50 largely legacy
2008-10-14 20:50 doesn't do much except create bugs these days
2008-10-14 20:51 lol
2008-10-14 20:51 in __getblk_slow we see an attempt at integration with the vm cache shrinking code
2008-10-14 20:51 it's not pretty
2008-10-14 20:52 but sometime go look at grow_buffers
2008-10-14 20:52 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1118 ?
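The claim above that the block number stored in the buffer is redundant for the device's own cache comes down to simple shift arithmetic: given the page index and the buffer's position on the page, the physical block number can be recomputed. A small sketch of that arithmetic, assuming 4K pages and using hypothetical `toy_` names rather than the kernel's own helpers:

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4K pages */

/* Position of a block in the device's page cache (hypothetical type). */
struct toy_slot {
	unsigned long index;	/* page index within the device mapping */
	unsigned slot;		/* which buffer on that page */
};

/* Which page holds this block, and which buffer on the page is it?
 * blkbits is log2 of the block size, e.g. 10 for 1K blocks. */
struct toy_slot toy_block_to_slot(unsigned long block, unsigned blkbits)
{
	unsigned shift;
	struct toy_slot s;

	assert(blkbits <= TOY_PAGE_SHIFT);
	shift = TOY_PAGE_SHIFT - blkbits;	/* log2(blocks per page) */
	s.index = block >> shift;
	s.slot = (unsigned)(block & ((1UL << shift) - 1));
	return s;
}

/* The inverse: the block number falls out of the page index plus the slot. */
unsigned long toy_slot_to_block(struct toy_slot s, unsigned blkbits)
{
	assert(blkbits <= TOY_PAGE_SHIFT);
	return (s.index << (TOY_PAGE_SHIFT - blkbits)) + s.slot;
}
```

With 1K blocks there are four buffers per 4K page, so block 13 lands on page index 3 as the second buffer; walking the page's buffer list to that slot is the loop described in the chat.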
2008-10-14 20:52 yes
2008-10-14 20:53 "__getblk() cannot fail - it just keeps trying." http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1396
2008-10-14 20:54 in spite of this assertion, the kernel is littered with code to take evasive action if getblk returns NULL
2008-10-14 20:54 lol
2008-10-14 20:55 should just let that segfault in the tux3 kernel port, perhaps with a pointer to the comment
2008-10-14 20:55 1400 * __getblk() will lock up the machine if grow_dev_page's try_to_free_buffers()
2008-10-14 20:55 1401 * attempt is failing. FIXME, perhaps?
2008-10-14 20:55 wheee
2008-10-14 20:55 try_to_free_buffers is the worst function in the entire kernel
2008-10-14 20:55 whee indeed
2008-10-14 20:56 it's vm, not vfs so we will not look at it right now
2008-10-14 20:56 all this buffer stuff is very fragile and arguably broken
2008-10-14 20:56 it's a credit to the bug chasing talents of people like akpm and linus that it works at all
2008-10-14 20:57 just to give everybody some sense of confidence in what we are about to do ;)
2008-10-14 20:57 you didn't come to university to have the truth softened, right?
2008-10-14 20:57 :P
2008-10-14 20:57 I'm trying to understand why we have per-file and per-metadata pagecache + bufferheads, instead of just device cache
2008-10-14 20:58 this is just device cache
2008-10-14 20:58 ah
2008-10-14 20:58 you mean why not throw away file caches?
2008-10-14 20:58 basically
2008-10-14 20:58 because we need to index cache objects by logical file offset
2008-10-14 20:58 or have filecaches just be pointers to the right pages of the device cache
2008-10-14 20:58 good idea
2008-10-14 20:59 probably a very good idea
2008-10-14 20:59 we're kind of in transition here
2008-10-14 20:59 unifying the page and buffer cache, which used to be a lot more separate
2008-10-14 21:00 linux 2.0 actually copied data between them to get something resembling coherence
2008-10-14 21:00 right, but right now we have a lot of copying of data around, right?
2008-10-14 21:00 so it's better than it was, which was really really awful
2008-10-14 21:00 we don't, no
2008-10-14 21:00 it's pretty much all done with pointers
2008-10-14 21:00 if you have disk - partition - lvm physical - lvm volume - lvm logical - filesystem - file
2008-10-14 21:00 then how many times do we copy 4KB of data in order to read 4KB?
2008-10-14 21:00 still all just pointers
2008-10-14 21:00 none usually
2008-10-14 21:01 sorry
2008-10-14 21:01 one
2008-10-14 21:01 copy_to_user
2008-10-14 21:01 one dma into the cache and one copy_to_user
2008-10-14 21:01 if it's memory mapped, then no copy_to_user
2008-10-14 21:01 just the dma
2008-10-14 21:03 right, because there's no device cache
2008-10-14 21:03 so unless there's some raid involved, it doesn't much matter
2008-10-14 21:04 it would be very nice to use the buffer cache as a device cache
2008-10-14 21:04 and we should
2008-10-14 21:04 -!- data(~data@echo489.server4you.de) has joined #tux3
2008-10-14 21:04 so, if instead of reading from userspace, we mmap, will that be just a dma to the mmapped region?
2008-10-14 21:04 but it is a lot of work, and as you can see, there is some very fragile code that _will_ break when we mess with this
2008-10-14 21:04 that will
2008-10-14 21:04 dma to the physical page, which is mapped into a process memory space
2008-10-14 21:05 will the dma happen on mmap, or when we later try to read a not-present page?
2008-10-14 21:05 I'm guessing the latter
2008-10-14 21:06 makes sense to be the latter!
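The two access paths compared above (a read that ends in a copy to the caller's buffer versus a mapping that exposes the cached page directly) look like this from userspace. It is only an illustrative sketch using standard POSIX calls; the function name and the temp file path are hypothetical, and because the demo writes the file just before reading it, the data is presumably already cached, so no device I/O is actually observed here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>	/* only needed if the caller asserts on the result */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if both access paths see the same bytes, -1 on any failure.
 * toy_compare_read_and_mmap is a hypothetical name for this demo. */
int toy_compare_read_and_mmap(void)
{
	char path[] = "/tmp/toy_mmap_XXXXXX";
	const char msg[] = "hello page cache";
	char copy[sizeof msg];
	char *map;
	int result = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* file goes away once the descriptor is closed */

	if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg)
		goto out;

	/* Path 1: pread copies the bytes into our private buffer. */
	if (pread(fd, copy, sizeof copy, 0) != (ssize_t)sizeof copy)
		goto out;

	/* Path 2: mmap only sets up the mapping; nothing is copied for us. */
	map = mmap(NULL, sizeof msg, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* The first access through the mapping is what faults the page in. */
	if (memcmp(map, copy, sizeof msg) == 0 && memcmp(copy, msg, sizeof msg) == 0)
		result = 0;

	munmap(map, sizeof msg);
out:
	close(fd);
	return result;
}
```

On a file that is not yet cached, the guess made in the chat applies: the mmap call itself returns quickly and the read from the device is deferred until the page is first touched.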
2008-10-14 21:06 you can make a big areas :P
2008-10-14 21:06 ok, just to wrap up our tour of getblk, the place where a buffer is actually created and inserted into the buffer cache is grow_buffers
2008-10-14 21:06 folks
2008-10-14 21:06 rather badly misnamed function, which is why I had to look at it half a dozen times to know where this happens
2008-10-14 21:06 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1081
2008-10-14 21:07 (interestingly a second implementation in raid5)
2008-10-14 21:07 grow_dev_page <- trying to continue the grand tradition of finding ever worse names for functions
2008-10-14 21:07 no wonder bsd guys tend to slit their wrists when forced to read linux code
2008-10-14 21:08 probably explains why there are so few bsd guys
2008-10-14 21:08 :-)
2008-10-14 21:08 yes, we all know linux leads to killer filesystems
2008-10-14 21:08 eek
2008-10-14 21:09 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1093 <- compare to #1108
2008-10-14 21:09 and promise me you will never write code like that
2008-10-14 21:10 I believe that does actually do what it's supposed to
2008-10-14 21:10 :D
2008-10-14 21:10 could use a comment though
2008-10-14 21:10 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1028 <- finally, here is where the work gets done
2008-10-14 21:10 we enter a page into the page cache, with buffers on it
2008-10-14 21:10 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1027
2008-10-14 21:11 I think this code actually came from me
2008-10-14 21:11 way back
2008-10-14 21:11 started as my hack to make htree work in the page cache of a file
2008-10-14 21:11 we seem to have a lot of ways to allocate memory...
2008-10-14 21:11 linus liked that idea and decided to use it for the buffer cache
2008-10-14 21:11 there doesn't seem to be much deallocation
2008-10-14 21:11 a good idea
2008-10-14 21:11 maze, there only needs to be deallocation in one spot
2008-10-14 21:12 shrink_caches
2008-10-14 21:12 that is indeed a magical and good thing
2008-10-14 21:12 yes
2008-10-14 21:12 are all page caches actually in one lru then?
2008-10-14 21:12 the kernel is this kind of organic, self cleaning thing
2008-10-14 21:12 has to keep moving, like a shark
2008-10-14 21:13 filling up cache with new stuff about to be used, evicting old stuff to make room for it
2008-10-14 21:13 (wouldn't that lru then be a source of lock contention on multi-way smp?)
2008-10-14 21:13 yes, all pages in the system are in one lru
2008-10-14 21:13 this actually doesn't make complete sense
2008-10-14 21:13 since dirty pages these days do not tend to be evicted via the page lru at all
2008-10-14 21:13 but by inode flushes
2008-10-14 21:14 because a filesystem can't afford to have the vm writing out random pages in orders that violate ACID constraints
2008-10-14 21:15 ok we went into bonus time
2008-10-14 21:15 questions on thursday ;)
2008-10-14 21:15 :-)
2008-10-14 21:16 ;-)
2008-10-14 21:17 how'd we do on the interesting front this time?
2008-10-14 21:17 it's complex...
2008-10-14 21:17 I feel like vast pieces of this should be avoided in new fs code
2008-10-14 21:17 we're going to wallow in it, unfortunately
2008-10-14 21:18 because using anything other than buffers to access your metadata blocks leads to worse horrors
2008-10-14 21:18 ie.
there should be no references to buffer heads at all
2008-10-14 21:18 nice idea except when your block size is smaller than a page
2008-10-14 21:18 see, the concept of 'metadata blocks'
2008-10-14 21:18 is something I have issue with ;-)
2008-10-14 21:18 fixing that problem will lead to reinventing the buffer cache
2008-10-14 21:19 I anxiously await your proposal to replace the notion of metadata
2008-10-14 21:19 not metadata... just metadata blocks
2008-10-14 21:19 "hyperdata"
2008-10-14 21:19 I see
2008-10-14 21:19 with?
2008-10-14 21:19 metadata extents?
2008-10-14 21:19 which would be... simpler?
2008-10-14 21:19 buy not having metadata blocks, you should be able to get acid with no effort (or very little additional effort)
2008-10-14 21:20 buy -> by
2008-10-14 21:20 what do you use instead of metadata blocks?
2008-10-14 21:21 hmm, that's hard to describe a not-fully thought out idea
2008-10-14 21:21 but basically a forward log
2008-10-14 21:21 what about the cache?
2008-10-14 21:21 combined with always writing to free disk space
2008-10-14 21:21 cache of what?
2008-10-14 21:21 metadata
2008-10-14 21:21 in memory structure doesn't have to have anything in common with on-disk
2008-10-14 21:22 probably some sort of tree in sparse file or something though
2008-10-14 21:22 it's very helpful if it does
2008-10-14 21:22 the buffer cache already is a tree
2008-10-14 21:22 and if you have a sparse file, you have to have metadata for that file somewhere
2008-10-14 21:22 in the tree of course ;-)
2008-10-14 21:22 where, in another file? and how does that recursion terminate?
2008-10-14 21:23 that's why you have a forward log
2008-10-14 21:23 with care ;-)
2008-10-14 21:23 but you still haven't explained how your metadata is cached
2008-10-14 21:23 the tree is in a file, the file is page cached
2008-10-14 21:23 and how are the blocks of that page cache mapped to the disk?
2008-10-14 21:24 using the tree
2008-10-14 21:24 which tree?
2008-10-14 21:24 the one stored on those blocks
2008-10-14 21:24 what if you have a cache miss on one of those blocks?
2008-10-14 21:24 yeah, it's hard to describe
2008-10-14 21:25 I should probably work it out fully...
2008-10-14 21:25 yes, and you will realize that we're not that far off with the current arrangement
2008-10-14 21:25 (a cache miss is not a big problem with a tree, so long as it's not a mere radix-tree)
2008-10-14 21:25 the part that sucks is being stuck with page size resolution, we need more flexibility than that
2008-10-14 21:26 which is what this whole creaky mess of buffer_heads is about
2008-10-14 21:26 it's a solution, just not a good solution
2008-10-14 21:26 improving it would be a good project
2008-10-14 21:26 not a summer project though
2008-10-14 21:26 right
2008-10-14 21:27 I can understand why it's done the way it is
2008-10-14 21:27 it's just duplication of code/concepts and multiple opportunities to screw up and get locking wrong in edge cases
2008-10-14 21:27 a couple of things I propose to do about it
2008-10-14 21:27 1) let struct page denote objects with sizes larger and smaller than page size, thus obviating the need for struct buffer_head
2008-10-14 21:28 larger... probably easy
2008-10-14 21:28 smaller...
ur
2008-10-14 21:28 brain fault
2008-10-14 21:28 2) unify the page and buffer cache so that a miss in a page cache then looks in the buffer cache to see if the page is there, so we use the buffer cache as a large device cache
2008-10-14 21:29 3) implement physical readahead in the unified cache
2008-10-14 21:29 4) implement active page table defragmentation so that we can realistically work with larger block sizes
2008-10-14 21:29 5) dynamically allocate struct page's ;-)
2008-10-14 21:30 that's part of (1)
2008-10-14 21:30 yeah, I was wondering if you meant to include that or not
2008-10-14 21:31 a crude form
2008-10-14 21:31 only dynamically allocate to fill in the gaps between the ones in the array
2008-10-14 21:31 gaps
2008-10-14 21:31 ?
2008-10-14 21:32 I think if you want to do dynamic allocation of struct page it's an all-or-nothing scenario
2008-10-14 21:32 yes, you have an array of 4k physical pages, but want to have 1K struct pages, so 3 1K struct pages go between each two 4K physical pages
2008-10-14 21:32 not at all
2008-10-14 21:32 currently physical page address -> struct page is a (PA>>PAGE_SHIFT)*sizeof(struct page) + base operation
2008-10-14 21:32 just dynamically allocating for the sub-physical sized pages works out ok
2008-10-14 21:33 you could get much more invasive about this, but probably not a good idea for a first try
2008-10-14 21:33 ACTION says good night (and thanks for the lecture)
2008-10-14 21:33 by virtue of what a 'page' is for the cpu, I'm not sure sub-pagesize pages are realistic
2008-10-14 21:33 night raz
2008-10-14 21:33 good night
2008-10-14 21:33 what happens when you have conflicting access permissions on two sub-pages?
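The address arithmetic mentioned above (shift the physical address down to a page frame number, then index a flat array of page descriptors) can be illustrated with a toy model. This assumes a single contiguous descriptor array and 4K pages; `toy_page`, `toy_mem_map` and the helper names are hypothetical and much simpler than the kernel's real memory models.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4K physical pages */
#define TOY_NR_PAGES 16		/* assumption: tiny toy machine */

/* Hypothetical stand-in for a page descriptor. */
struct toy_page {
	unsigned long flags;
	unsigned long index;
};

/* One descriptor per physical page frame, in frame order. */
static struct toy_page toy_mem_map[TOY_NR_PAGES];

/* Physical address -> descriptor: shift to a frame number, index the array.
 * The pointer arithmetic supplies the "* sizeof(struct page) + base" part. */
struct toy_page *toy_phys_to_page(unsigned long pa)
{
	unsigned long pfn = pa >> TOY_PAGE_SHIFT;

	return pfn < TOY_NR_PAGES ? &toy_mem_map[pfn] : NULL;
}

/* Descriptor -> physical address of the start of its frame. */
unsigned long toy_page_to_phys(const struct toy_page *page)
{
	assert(page >= toy_mem_map && page < toy_mem_map + TOY_NR_PAGES);
	return (unsigned long)(page - toy_mem_map) << TOY_PAGE_SHIFT;
}
```

In this flat layout there is exactly one descriptor per 4K frame, which is why descriptors for 1K sub-pages would have to be allocated somewhere outside the array, as proposed in the chat.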
2008-10-14 21:34 the sub-pagesize pages are basically just for locking
2008-10-14 21:34 which is the only really indispensable thing that buffer_heads do at present
2008-10-14 21:34 don't conflict
2008-10-14 21:34 subpages are not entered into page table entries
2008-10-14 21:34 they can't be
2008-10-14 21:34 in that case couldn't we just have a byte of 8 bits for 8 512 byte locks in struct page?
2008-10-14 21:35 possibly, but the page oriented code uses more fields than that
2008-10-14 21:35 for example, the ->index
2008-10-14 21:35 used to locate the page in a page cache
2008-10-14 21:35 we want all that code to continue to work
2008-10-14 21:36 otherwise we have a massive rewrite in store for everything that touches a page
2008-10-14 21:37 a change like this...
2008-10-14 21:37 it would probably end up with a massive rewrite almost any decent way you do it
2008-10-14 21:38 I don't think it'd be possible to have some sort of shim compatibility translation layer
2008-10-14 21:38 you could potentially leave buffer_heads around, until everything had been ported...
2008-10-14 21:39 but actually have the same interface... unlikely
2008-10-14 21:44 the subpage concept isn't that big a deal
2008-10-14 21:45 mostly just affects things like grab_cache_page that we looked at
2008-10-14 21:45 a whole bunch of block io library cruft goes away
2008-10-14 21:45 because we lose the list of buffers per page
2008-10-14 22:05 http://www.newmobilecomputing.com/thread?333779
2008-10-14 22:06 where?
2008-10-14 22:06 ?where?
2008-10-14 22:07 oh
2008-10-14 22:07 mention of ftux3
2008-10-14 22:07 trying to locate the location
2008-10-14 22:07 of the summit
2008-10-14 22:07 what summit?
2008-10-14 22:07 oh right yeah
2008-10-14 22:07 the one in the article
2008-10-14 22:07 was a pretty lame summit
2008-10-14 22:07 flips: you get google alerts for tux3 also i see ;)
2008-10-14 22:07 oh that summit
2008-10-14 22:07 even worse
2008-10-14 22:08 shapor, they're great
2008-10-14 22:08 NYC
2008-10-14 22:09 http://www.linux.com/feature/132203 <- joe barr's take
2008-10-14 22:09 all in all, a very bad thing
2008-10-14 22:09 for linux
2008-10-14 22:09 getting way too much corp politics in the works
2008-10-14 22:10 linux foundation... not really representing the community
2008-10-14 22:10 http://www.austinlug.org/node/259
2008-10-14 22:12 "Just out of respect for the natives of Austin, they should have made a choice not to slouch back on bureaucratic policy and instead, make an exception to that policy in order to be good guests and pay respect to the local Linux Kernel enthusiasts.
2008-10-14 22:12 Instead, they big-timed him and sent him home. That's when you know your movement has been co-opted and it's no longer a progressive social force."
2008-10-14 22:16 http://blog.internetnews.com/skerner/2008/10/no-press-at-linux-foundation-e.html
2008-10-14 22:51 hey all
2008-10-14 22:51 hi
2008-10-14 22:52 hello flips
2008-10-14 23:06 say goodnight all
2008-10-14 23:06 hey pranith
2008-10-14 23:06 hey tim_dimm
2008-10-14 23:06 past my bedtime
2008-10-14 23:07 off to sleep huh?
2008-10-14 23:07 yup
2008-10-14 23:07 hmm
2008-10-14 23:07 twins wore me out today
2008-10-14 23:07 goodnight then
2008-10-14 23:07 :)
2008-10-14 23:07 :-)
2008-10-14 23:07 hmm, lucky you
2008-10-14 23:07 boy and a girl
2008-10-14 23:07 very lucky
2008-10-14 23:07 yeah, i remember :)
2008-10-14 23:07 later guys
2008-10-14 23:46 -!- cydork(~cydoork@122.169.100.164) has joined #tux3
2008-10-14 23:48 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3