2008-09-18 02:48 -!- kbingham(~kbingham@92.10.191.55) has joined #tux3
2008-09-18 03:35 folks
2008-09-18 03:35 not much irc traffic today
2008-09-18 03:35 how's it going ?
2008-09-18 03:44 -!- data(~data@echo489.server4you.de) has joined #tux3
2008-09-18 04:03 today was pretty busy
2008-09-18 04:03 off channel action irl
2008-09-18 04:04 a sys_unfuck syscall was proposed, and useful work was also done
2008-09-18 04:05 irl ?
2008-09-18 04:05 in real life
2008-09-18 04:05 ok
2008-09-18 04:05 good, cabal meeting of sorts ?
2008-09-18 04:07 full blown
2008-09-18 04:07 oh really ? unannounced ?
2008-09-18 04:07 true
2008-09-18 04:07 who was there ?
2008-09-18 04:07 flips: are you getting private /msg ?
2008-09-18 04:07 can't say it was a cabal meeting
2008-09-18 04:07 ok
2008-09-18 04:09 regarding extents ?
2008-09-18 04:10 one thing indeed
2008-09-18 04:10 coding right now
2008-09-18 04:10 tricky
2008-09-18 04:10 yeah
2008-09-18 04:16 ok night
2008-09-18 04:17 surprised you're up this late still
2008-09-18 04:18 me too
2008-09-18 07:16 -!- Kirantpatil(~kiran@122.167.223.69) has joined #tux3
2008-09-18 07:16 -!- Kirantpatil(~kiran@122.167.223.69) has left #tux3
2008-09-18 07:57 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-09-18 08:36 -!- openblast(~quassel@static.230.173.47.78.clients.your-server.de) has joined #tux3
2008-09-18 08:57 -!- openblast(~quassel@static.230.173.47.78.clients.your-server.de) has joined #tux3
2008-09-18 09:21 -!- kbingham(~kbingham@92.20.194.187) has joined #tux3
2008-09-18 10:15 -!- tim_dimm_(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-09-18 10:20 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-09-18 10:24 -!- tim_dimm_(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-09-18 10:42 -!- kbingham(~kbingham@92.20.194.187) has joined #tux3
2008-09-18 10:47 -!- konrad(~konrad@D-128-208-53-196.dhcp4.washington.edu) has joined #tux3
2008-09-18 11:00 top
2008-09-18 11:57 -!- pgquiles(~pgquiles@50.Red-79-153-248.staticIP.rima-tde.net) has joined #tux3
2008-09-18 13:53 flips: btrfs claims to eventually have online disk checking
2008-09-18 13:54 a coworker just attended a btrfs talk
2008-09-18 16:17 dwalk_next is hard to write
2008-09-18 16:17 given some context already set up, returns the next extent from a dleaf
2008-09-18 16:18 probably will turn into a post to the list
2008-09-18 16:18 big complexity in a small corner
2008-09-18 16:18 as expected, actually
2008-09-18 16:56 hey
2008-09-18 17:02 pong
2008-09-18 17:02 how's it going ?
2008-09-18 19:07 -!- ChanServ changed mode/#tux3 -> +o flips
2008-09-18 19:07 -!- flips changed topic to "Tux3 list membership roars past 100! ~ http://tux3.org ~ Tux3 U, right here Tue and Thur 8 p.m. Pacific Time ~ Next session: bio level data transfer"
2008-09-18 19:08 -!- flips changed topic to "Tux3 list membership roars past 100! ~ http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 p.m. Pacific Time ~ Next session: bio level data transfer"
2008-09-18 19:08 maze, ping
2008-09-18 19:19 -!- flips changed topic to "Tux3 list membership roars past 100! ~ http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 p.m. Pacific Time ~ Next session: bio level data transfer ~ Seinfeld ads canned, thanks for small mercies"
2008-09-18 19:19 -!- flips changed mode/#tux3 -> -o flips
2008-09-18 19:27 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-09-18 19:31 I figure if I make myself a new cuppa dark french right now have a fighting chance of getting streaming dleaf read working by midnight
2008-09-18 19:31 maybe even write
2008-09-18 19:31 ACTION takes action on that item
2008-09-18 19:34 ACTION is browsing LDD a little...
2008-09-18 19:48 -!- BSD(~bandan@pool-71-174-177-86.bstnma.east.verizon.net) has joined #tux3
2008-09-18 19:52 -!- Kirantpatil(~kiran@122.167.219.189) has joined #tux3
2008-09-18 19:53 -!- Kirantpatil(~kiran@122.167.219.189) has left #tux3
2008-09-18 19:53 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-09-18 19:55 Um.. How do I clone the git ddtree ?
2008-09-18 19:55 tried git clone?
2008-09-18 19:55 on the url I posted?
2008-09-18 19:55 Ya I mean what's the URL ? Sorry I probably missed it :(
2008-09-18 19:56 in a message somewhere, "tux3 report: what's next"
2008-09-18 19:56 alternatively, go to phunq.net/ddtree
2008-09-18 19:57 has gitweb and everything
2008-09-18 19:58 git clone http://phunq.net/tux3fs is what I tried
2008-09-18 19:58 it would be nice it git just worked
2008-09-18 19:59 like mercurial
2008-09-18 19:59 kay
2008-09-18 19:59 hmm..
2008-09-18 19:59 a matter of getting the url right
2008-09-18 19:59 I think it gets confused by symlinks
2008-09-18 20:00 Yay I will just do it with hg, never mind :)
2008-09-18 20:00 git is just the kernel part
2008-09-18 20:00 you don't need that right now
2008-09-18 20:01 so mercurial
2008-09-18 20:01 nice nick
2008-09-18 20:01 :)
2008-09-18 20:03 I'll clean up the git cloneability later
2008-09-18 20:03 Thanks!
2008-09-18 20:03 manshack underwent a major re-arrange
2008-09-18 20:03 just another point on the "merucial rules" curve I think
2008-09-18 20:03 yummy
2008-09-18 20:03 wow
2008-09-18 20:03 we started 3 minutes ago
2008-09-18 20:04 no maze
2008-09-18 20:04 so we will take a slight change in session plan
2008-09-18 20:04 instead of doing bio transfers we will continue drilling down into generic_write
2008-09-18 20:05 ok, somebody summarize where we got to, please... mention _2copy
2008-09-18 20:06 ACTION looks at RazvanM
2008-09-18 20:06 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2063
2008-09-18 20:06 and the summary?
2008-09-18 20:07 and we got there from here: http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2319
2008-09-18 20:07 the 2copy is used when there is no support for write_begin
2008-09-18 20:08 what is happening in this function?
2008-09-18 20:08 and we use prepare_Write and commit_write
2008-09-18 20:09 the data is moved to some kernel pages and then to some user memory? :P
2008-09-18 20:09 hi all
2008-09-18 20:09 hi
2008-09-18 20:09 ACTION takes a seat at the back of the room
2008-09-18 20:09 the data is moved from user memory onto buffer pages
2008-09-18 20:10 then the buffer pages are committed to disk
2008-09-18 20:10 sorry... I got the order wrong :P
2008-09-18 20:10 2copy is the lamest name anybody could have possibly chosen :p
2008-09-18 20:10 appears to be the real thing though
2008-09-18 20:11 just where we should be reading
2008-09-18 20:11 __grab_cache_page is the heart of it
2008-09-18 20:11 other things are decoration
2008-09-18 20:11 such as fault_in_readable
2008-09-18 20:12 just a quick q: why some functions start with uppercase?
2008-09-18 20:12 attempts to deal with the many dangerous recursions
2008-09-18 20:12 with varying degrees of success in terms of robustness and readability
2008-09-18 20:12 razvanm, random hackers
2008-09-18 20:12 what is write_begin?
2008-09-18 20:12 sometimes have studly caps days
2008-09-18 20:12 hey
2008-09-18 20:13 write_begin is a hook for some specialized user I don't know about
2008-09-18 20:13 "completely general interface used inexactly one place" like as not
2008-09-18 20:13 or "homework for shapor"
2008-09-18 20:13 hey maze
2008-09-18 20:13 :)
2008-09-18 20:13 ok
2008-09-18 20:13 ok, we can return to the original session plan
2008-09-18 20:14 maze, the plan is for you to report your findings on basic bio transfers
2008-09-18 20:14 lol
2008-09-18 20:14 point to code (you might want to pastie it)
2008-09-18 20:14 uhm, lol
2008-09-18 20:14 how about I put a tar.gz up?
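The `_2copy` path summarized above works through a write one page-cache page at a time: find the page, call `->prepare_write`, copy the user data in, call `->commit_write`. Below is a userspace sketch of only that loop shape. Everything in it (`model_file`, `model_prepare_write`, `model_commit_write`, `model_write_2copy`, the 4 KiB page size, the four-page toy cache) is hypothetical, and it leaves out `fault_in_readable`, page locking and the recursion handling mentioned in the chat, so it is not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumptions for this model: 4 KiB pages and a four-page toy page cache. */
#define MODEL_PAGE_SIZE 4096
#define MODEL_PAGES 4

/* Hypothetical stand-in for one file's page cache, plus hook call counters. */
struct model_file {
    char cache[MODEL_PAGES][MODEL_PAGE_SIZE];
    int prepared, committed;
};

/* Stand-ins for ->prepare_write and ->commit_write. A real filesystem would map
 * blocks (and read in partially overwritten data) in prepare, then mark the
 * buffers dirty in commit; these only count calls and report success. */
static int model_prepare_write(struct model_file *f, size_t index, size_t from, size_t to)
{
    (void)index; (void)from; (void)to;
    f->prepared++;
    return 0;
}

static int model_commit_write(struct model_file *f, size_t index, size_t from, size_t to)
{
    (void)index; (void)from; (void)to;
    f->committed++;
    return 0;
}

/* Split [pos, pos + len) at page boundaries and run prepare / copy / commit on
 * each piece. Returns bytes written; stops early past the end of the toy cache. */
static size_t model_write_2copy(struct model_file *f, size_t pos, const char *src, size_t len)
{
    size_t written = 0;

    assert(src != NULL || len == 0);
    while (written < len) {
        size_t index = pos / MODEL_PAGE_SIZE;     /* which page-cache page */
        size_t offset = pos % MODEL_PAGE_SIZE;    /* where the write starts in it */
        size_t bytes = MODEL_PAGE_SIZE - offset;  /* room left in this page */

        if (bytes > len - written)
            bytes = len - written;
        if (index >= MODEL_PAGES)                 /* no such page in the toy cache */
            break;
        if (model_prepare_write(f, index, offset, offset + bytes))
            break;
        memcpy(&f->cache[index][offset], src + written, bytes);  /* the user copy */
        if (model_commit_write(f, index, offset, offset + bytes))
            break;
        pos += bytes;
        written += bytes;
    }
    return written;
}
```

Writing 5000 bytes at offset 4000 touches three pages (96, 4096 and 808 bytes), so each hook runs three times.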
2008-09-18 20:14 don't copy in the channel unless it's 1/2 lines
2008-09-18 20:14 that too
2008-09-18 20:14 pastie is good, use your taste
2008-09-18 20:15 if you had it checked in you could point a urls
2008-09-18 20:15 so... remember to check in next time ;)
2008-09-18 20:15 uploading
2008-09-18 20:15 since you code is so short I'd suggest just pasting the whole thing
2008-09-18 20:16 http://m.a.z.e.pl/junkfs.tar.gz
2008-09-18 20:16 lol nice domain!
2008-09-18 20:16 really
2008-09-18 20:16 leet
2008-09-18 20:16 yeah, I own z.e.pl
2008-09-18 20:17 almost as cool as cr.yp.to
2008-09-18 20:17 so I also have m.a@z.e.pl
2008-09-18 20:17 heh
2008-09-18 20:17 "opened with ark"
2008-09-18 20:17 or m@z.e.pl - whichever you prefer
2008-09-18 20:17 ok, who has got the code open, and who not?
2008-09-18 20:17 me not
2008-09-18 20:18 ok, got it open
2008-09-18 20:18 ark works pretty fscking well
2008-09-18 20:18 I'm impressed
2008-09-18 20:18 mind you - this is very rough, and mostly was debugging plus getting it working
2008-09-18 20:18 I'm still not quite sure of everything, and although I fixed the last hang bug I found
2008-09-18 20:18 I haven't since tested
2008-09-18 20:18 so I'm not sure ;-)
2008-09-18 20:18 don't worry, shapor will hurt you if you get anything wrong
2008-09-18 20:19 lol
2008-09-18 20:19 ACTION wields axe
2008-09-18 20:19 so... where does the bio read setup start?
2008-09-18 20:20 do you want me answering?
2008-09-18 20:20 yes
2008-09-18 20:20 you should have been asking ;)
2008-09-18 20:20 hmm.
2008-09-18 20:20 right
2008-09-18 20:20 so pretty much everything except super.c is either makefile or debug
2008-09-18 20:20 noticed
2008-09-18 20:21 and the bottom of super.c is pretty standard module init stuff
2008-09-18 20:21 nicely lindented
2008-09-18 20:21 for the moment we only care about the bio transfer
2008-09-18 20:21 and above that is the standard fs registering and fs_ops stuff
2008-09-18 20:22 and from there we get to junkfs_get_sb which calls into get_sb_bdev
2008-09-18 20:22 which calls junkfs_fill_super as a callback
2008-09-18 20:22 and that's were all the action is
2008-09-18 20:22 action :)
2008-09-18 20:22 get_sb_bdev also exclusively opens the block device for us, so that's nice
2008-09-18 20:22 finally, after 4 days of tux3 U
2008-09-18 20:22 at the point we enter into junkfs_fill_super, we have an exclusively opened block device
2008-09-18 20:23 which is passed in the superblock
2008-09-18 20:23 sb->s_bdev
2008-09-18 20:23 in junkfs_fill_super we then proceed to allocate memory for 3 basic objects
2008-09-18 20:23 1) memory to read in the 512 byte (SB_SIZE) superblock
2008-09-18 20:23 1 sector sb, leet
2008-09-18 20:23 2) an object to store state (in the bio->b_private field)
2008-09-18 20:24 c) a bio
2008-09-18 20:24 1 and 2 are just normal kmalloc's
2008-09-18 20:24 3 is via bio_alloc
2008-09-18 20:24 thus 1 and 2 will need to be kfree'd
2008-09-18 20:24 -!- Bushman(~marcin@c-76-23-106-132.hsd1.sc.comcast.net) has joined #tux3
2008-09-18 20:24 and 3 will need to be bio_put'ed at some point before the end of junkfs_fill_super
2008-09-18 20:24 or we'll leak
2008-09-18 20:25 anyway, standard handling of error returns on all the allocs
2008-09-18 20:25 and we get to:
2008-09-18 20:25 bio->bi_bdev = sb->s_bdev;
2008-09-18 20:25 bio->bi_sector = 0; // first sector
2008-09-18 20:25 s = bio_add_page(bio, virt_to_page(buf), SB_SIZE, offset_in_page(buf));
2008-09-18 20:25 which is most of the bio preparation stage
2008-09-18 20:25 Bushman: hi Marcin
2008-09-18 20:25 the real meat
2008-09-18 20:25 we set the bio to refer to the correct block device
2008-09-18 20:25 marcin, hi
2008-09-18 20:25 and (for now - this is all junkfs ;-) ) we just read the first sector
2008-09-18 20:26 sectors in new linux are always exactly 512 bytes
2008-09-18 20:26 that's leet nuff for us
2008-09-18 20:26 so we're saying here offset 0 * 512 into the block dev
2008-09-18 20:26 then we need to tell the bio where to store the data
2008-09-18 20:26 (or read from, since a write would be identical)
2008-09-18 20:26 right, struct bio is sector-addressed for no good reason
2008-09-18 20:26 s = bio_add_page(bio, virt_to_page(buf), SB_SIZE, offset_in_page(buf))
2008-09-18 20:26 hello Daniel
2008-09-18 20:27 this actually gives our carefully allocated memory to the bio as memory
2008-09-18 20:27 bushman, enjoy ;)
2008-09-18 20:27 note that bio_add_page takes (bio, struct page*, len, ofs)
2008-09-18 20:27 i dunno if enjoy is the right word for kernel code just before bedtime ;)
2008-09-18 20:27 so we pass in the bio, then convert the bufs address to a page via virt_to_page
2008-09-18 20:27 and you could write it out in full in about as much code as the function call takes
2008-09-18 20:27 pass the length of the block
2008-09-18 20:28 and calc the offset from the page struct for the ofs via offset_in_page
2008-09-18 20:28 bushman, then just enjoy the geek banter
2008-09-18 20:28 virt_to_page?
2008-09-18 20:28 I'm assuming at this point that a kmalloc can't give us memory split across pages
2008-09-18 20:28 - not sure if this is correct
2008-09-18 20:28 shapor, great question
2008-09-18 20:28 maze, correct
2008-09-18 20:28 so buf was kmalloc'ed, so it's a virtual kernel memory address
2008-09-18 20:29 maze, unless the kmalloc is bigger than a page
2008-09-18 20:29 virt_to_page gives us the struct page * for the kaddr we pass to it
2008-09-18 20:29 [flips: of course]
2008-09-18 20:29 maze, and why do we need the struct page?
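The allocation sequence walked through above (superblock buffer, state object, bio, all released again before `junkfs_fill_super` returns) is commonly written in kernel C with goto-based unwinding. The log doesn't show whether junkfs does it that way; the sketch below is a userspace stand-in using `malloc`/`calloc`/`free` where the kernel code would use `kmalloc`, `bio_alloc`, `kfree` and `bio_put`. `toy_state`, `toy_bio`, `model_sb_read_setup` and its `fail_at` test hook are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SB_SIZE 512   /* as in the junkfs code: a one-sector superblock */

/* Hypothetical stand-ins for the three objects the fill_super code allocates. */
struct toy_state { int done; int error; };                 /* what bi_private points at */
struct toy_bio   { char *buf; struct toy_state *private; };

/* Allocate in order, release in reverse order, one exit label per allocation.
 * fail_at (1..3) forces that allocation to fail so the error paths can be
 * exercised; 0 means nothing fails. *released counts objects freed.
 * Returns 0 or -ENOMEM. */
static int model_sb_read_setup(int fail_at, int *released)
{
    int err = -ENOMEM;
    char *buf;
    struct toy_state *st;
    struct toy_bio *bio;

    *released = 0;
    buf = (fail_at == 1) ? NULL : malloc(SB_SIZE);           /* 1) kmalloc */
    if (!buf)
        goto out;
    st = (fail_at == 2) ? NULL : calloc(1, sizeof(*st));     /* 2) kmalloc */
    if (!st)
        goto free_buf;
    bio = (fail_at == 3) ? NULL : calloc(1, sizeof(*bio));   /* 3) bio_alloc */
    if (!bio)
        goto free_state;

    bio->buf = buf;
    bio->private = st;
    memset(buf, 0, SB_SIZE);       /* placeholder for submit + wait */
    err = 0;

    free(bio);                     /* bio_put */
    (*released)++;
free_state:
    free(st);                      /* kfree */
    (*released)++;
free_buf:
    free(buf);                     /* kfree */
    (*released)++;
out:
    return err;
}
```

Each label frees exactly what was allocated before the failing step, and the success path falls through all of them, so no path leaks.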
2008-09-18 20:29 because that's what bios want
2008-09-18 20:29 if you look at what a bio is
2008-09-18 20:29 it's 3 things
2008-09-18 20:30 the struct bio
2008-09-18 20:30 which has a lot of management fields
2008-09-18 20:30 the bvec which
2008-09-18 20:30 is an array of a tiny struct with 3 fields
2008-09-18 20:30 { struct page * p; int len; int ofs; }
2008-09-18 20:31 so basically a list of where to put the next len bytes, specifying memory via page/ofs pairs
2008-09-18 20:31 this is for two reasons:
2008-09-18 20:31 [at least as far as i can tell]
2008-09-18 20:31 a) most hw (ie. stuff the blockdevice drivers care about)
2008-09-18 20:31 cares about physicall addresses and not virtual kernel addresses
2008-09-18 20:31 right
2008-09-18 20:31 ie. for dma and all that good for performance goodness
2008-09-18 20:32 b) this can also be used for data xfr into userspace
2008-09-18 20:32 and there is no guarantee userspace memory has a mapping into kernel space
2008-09-18 20:32 [high mem]
2008-09-18 20:32 the big reason: scatter gather
2008-09-18 20:32 this is a dma interface in disguise
2008-09-18 20:32 very effective one
2008-09-18 20:32 this also makes it easier to coallesce physically neighboring memory together into the bvecs
2008-09-18 20:32 precisely
2008-09-18 20:33 right, another way of saying scatter gather
2008-09-18 20:33 notice that in bio_alloc
2008-09-18 20:33 we passed in a 1
2008-09-18 20:33 that 1 is the number of bvecs in the bvec area allocated to the bio
2008-09-18 20:33 so that limits how many non-contig pieces of memory we can have in the bio
2008-09-18 20:33 ah
2008-09-18 20:33 here - all we need is 1
2008-09-18 20:33 and because you did that, you could have initialized your one bvec with a simple structure assignment
2008-09-18 20:33 instead of the function call
2008-09-18 20:33 right.
2008-09-18 20:33 which does a bunch of stuff you don't need
2008-09-18 20:34 oh well.
2008-09-18 20:34 does a bio_vec describes exactly one page?
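To make the bio/bvec description concrete, here is a userspace model: a bio carries a starting sector (512-byte units) plus a small fixed array of {page, len, offset} entries, with the array size playing the part of the count passed to `bio_alloc`. It assumes, as the discussion that follows concludes, that one entry never crosses a page boundary. All names are hypothetical, page frame numbers stand in for `struct page *`, 4 KiB pages are assumed, and `model_add_region` only mirrors the chat's reading of `bio_add_page` (bytes added on success, 0 when it cannot add).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages (the kernel's PAGE_SIZE is arch-dependent) and
 * 512-byte sectors, which is what struct bio counts in. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((uintptr_t)1 << MODEL_PAGE_SHIFT)
#define SECTOR_SHIFT     9
#define MAX_VECS         4

/* Hypothetical mirror of a bio_vec: page frame number instead of struct page *. */
struct model_vec {
    uintptr_t page;       /* page frame (what virt_to_page identifies) */
    unsigned int len;     /* bytes, like bv_len */
    unsigned int offset;  /* bytes into the page, like bv_offset */
};

/* Hypothetical mirror of the parts of struct bio discussed in the log. */
struct model_bio {
    uint64_t sector;      /* like bi_sector: start position in 512-byte units */
    int vcnt, max_vecs;   /* max_vecs plays the role of bio_alloc's vector count */
    struct model_vec vec[MAX_VECS];
};

/* Add one region lying within a single page. Returns bytes added, 0 if full. */
static unsigned int model_add_region(struct model_bio *bio, uintptr_t addr, unsigned int len)
{
    uintptr_t off = addr & (MODEL_PAGE_SIZE - 1);           /* offset_in_page */

    assert(bio->max_vecs <= MAX_VECS);
    assert(off + len <= MODEL_PAGE_SIZE);                   /* never spans pages */
    if (bio->vcnt == bio->max_vecs)
        return 0;
    bio->vec[bio->vcnt].page = addr >> MODEL_PAGE_SHIFT;     /* virt_to_page */
    bio->vec[bio->vcnt].len = len;
    bio->vec[bio->vcnt].offset = (unsigned int)off;
    bio->vcnt++;
    return len;
}

/* Scatter a virtually contiguous buffer into page-bounded entries for a
 * transfer at byte position disk_pos (sector aligned). Returns bytes queued. */
static size_t model_map_buffer(struct model_bio *bio, uint64_t disk_pos,
                               uintptr_t addr, size_t len)
{
    size_t done = 0;

    assert((disk_pos & ((1u << SECTOR_SHIFT) - 1)) == 0);
    bio->sector = disk_pos >> SECTOR_SHIFT;                  /* bytes -> sectors */
    while (done < len) {
        uintptr_t cur = addr + done;
        size_t room = MODEL_PAGE_SIZE - (cur & (MODEL_PAGE_SIZE - 1));
        unsigned int chunk = (unsigned int)(len - done < room ? len - done : room);

        if (model_add_region(bio, cur, chunk) != chunk)
            break;                      /* out of entries: caller needs another bio */
        done += chunk;
    }
    return done;
}
```

A 512-byte buffer starting 256 bytes before a page boundary needs two entries; in a bio with room for only one entry, just the first 256 bytes fit.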
2008-09-18 20:34 maze, exactly
2008-09-18 20:34 no
2008-09-18 20:34 bv_len
2008-09-18 20:34 it describes a start page with ofset and a length
2008-09-18 20:34 the length may exceed that page and cross into however many next ones
2008-09-18 20:34 the precise rules for merging are overridable
2008-09-18 20:34 it describes a data region that resides within one page
2008-09-18 20:35 so the bio interface will be quite good for extents
2008-09-18 20:35 many device drivers have limits on how many sectors they can transfer in one go (ie. 200 or so)
2008-09-18 20:35 maze, you can't cross a page with a bvec
2008-09-18 20:35 flips, you sure?
2008-09-18 20:35 sadly, or perhaps sanely
2008-09-18 20:35 I certainly ain't ;-)
2008-09-18 20:36 pretty sure
2008-09-18 20:36 but then I don't know what I'm talking about here
2008-09-18 20:36 never seen it done ;)
2008-09-18 20:36 these are still all guesses
2008-09-18 20:36 pollacks ain't sane, just ask Shap
2008-09-18 20:36 I thought they merged by themselves
2008-09-18 20:36 hmm, well, first homework I;d guess
2008-09-18 20:36 one more q: bv_len is counting bytes or sectors? :P
2008-09-18 20:37 merging happens in the physical driver
2008-09-18 20:37 good question
2008-09-18 20:37 anyway bio_add_page returns how much it successfully added (or what the current total is, not sure) in bytes
2008-09-18 20:37 bytes I think
2008-09-18 20:37 so if everything is good it should be 512 at this point
2008-09-18 20:37 hence the check
2008-09-18 20:37 it's pretty badly braindamaged i that respect, counting in different units for no good reason
2008-09-18 20:37 if it doesn't match, we've got a problem - which mind you - AFAICT - can't happen
2008-09-18 20:37 and we bio_put to free the structure and basically error out
2008-09-18 20:38 [of course here we always error out, because this is junkfs (tm)]
2008-09-18 20:38 anyway if s==512 then we're good
2008-09-18 20:38 oh bv_len is definitely bytes
2008-09-18 20:38 we setup to more fields in the bio
2008-09-18 20:38 bi_end_io is the call back for when the bio is processed (or errors out)
2008-09-18 20:39 when the disk completion interrupt fires
2008-09-18 20:39 key point
2008-09-18 20:39 bi_private is a pointer to our data (the mz struct) so that we can figure out what we're talking about in the endio handler
2008-09-18 20:39 and then we submit the bio for READ
2008-09-18 20:39 now this (ie. bios) are inherently asynchronous
2008-09-18 20:40 so at this point it might have already completed - it could have been cached and come back immediately
2008-09-18 20:40 right... it's the _only_ way to recover a memory context for a completed bio
2008-09-18 20:40 [I think]
2008-09-18 20:40 or we might need to wait some indeterminate amount of time
2008-09-18 20:40 it's much more direct than that
2008-09-18 20:40 here's where we make use of the waitqueue which we helpfully placed in the mz struct
2008-09-18 20:40 disk raises interrupt -> endio gets called
2008-09-18 20:40 in interrupt context
2008-09-18 20:40 this is as on the metal as you will get without going hypervisor
2008-09-18 20:41 oh, so basically end_io should do as little as feasibly possible
2008-09-18 20:41 preferably as simple as it is here
2008-09-18 20:41 yes
2008-09-18 20:41 again yes
2008-09-18 20:41 is it the right place to call bio_put ?
2008-09-18 20:41 though I often get excessive there ;)
2008-09-18 20:41 anyway, earlier on, we'd already initialized the waitqueue, so now we can just wait on it
2008-09-18 20:41 in the endio handler?
2008-09-18 20:42 except wait needs not only a waitqueue (wq) but also a condition
2008-09-18 20:42 [which it checks _first_]
2008-09-18 20:42 maze, _interruptible?
2008-09-18 20:42 hence mz struct also contains a boolean
2008-09-18 20:42 flips: yeah, no idea what the right choice is there, meaning to ask about this
2008-09-18 20:42 shapor, yes
2008-09-18 20:42 very important question
2008-09-18 20:42 flips, so how would it behave in a hypervisor? any changes? does it lose determinism?
2008-09-18 20:42 why does it matter?
2008-09-18 20:43 if interruptible, you better be prepared to field anything that can be thrown at you
2008-09-18 20:43 if uninterruptible, you'd better be able to prove it always completes
2008-09-18 20:43 is that the basis for atomicity then?
2008-09-18 20:43 so what could get thrown at us, and will the bio always complete?
2008-09-18 20:43 flips: what happens if there is an error
2008-09-18 20:43 bushman, we don't touch hypervisors
2008-09-18 20:43 disk io error or something
2008-09-18 20:43 if we did, it would be to implement hard realtime or something
2008-09-18 20:43 hypervisors should be transparent to the os
2008-09-18 20:43 does the endio handler get called?
2008-09-18 20:44 yes endio has err parameter
2008-09-18 20:44 bushman, there is some sense of atomicity here in the interruptible/noninterrupble distinction
2008-09-18 20:44 loose sense
2008-09-18 20:44 just to finish off this (junkfs_fill_super) function, we then dump the superblock via printk and free everything and return an error (junkfs remember.?)
2008-09-18 20:44 maze, in kernel interrupts don't just happen, you have to ask for them
2008-09-18 20:45 even with preemption
2008-09-18 20:45 ?
2008-09-18 20:45 or they get fielded on syscall exit
2008-09-18 20:45 SHOULD be transparrent, but since most of them mangle time into nonlinear, doesnt it screw up our predictions when interrupt is gonna finish?
2008-09-18 20:45 task switch is not interrupt
2008-09-18 20:45 it's caused by an interrupt
2008-09-18 20:45 oh i see you just aren't checking the err parameter in end_io_read
2008-09-18 20:45 you can get a task switch even with wait_uninterruptible
2008-09-18 20:45 probably should ;)
2008-09-18 20:45 so while in kernel space, my thread of execution is guaranteed not get interrupted by anything?
2008-09-18 20:46 right I should ;-)
2008-09-18 20:46 all that means is, an interrupt won't cause the wait to bail early
2008-09-18 20:46 you have to wrap your interruptible wait in a loop
2008-09-18 20:46 or write uninterruptible
2008-09-18 20:46 so interruptible here refers to what? can be interrupted by killing the mount process?
2008-09-18 20:46 which is probably what you want here
2008-09-18 20:46 just means the wait may bail before the wak
2008-09-18 20:46 wake
2008-09-18 20:47 so has to be in a loop, and you can't assume that what you were waiting for actually happened
2008-09-18 20:47 so i guess the big question here is how do we guarantee that the write is gonna complete?
2008-09-18 20:47 so I'd want uninterruptible? or interruptible and then on some interrupts somehow cancel and free the bio
2008-09-18 20:47 just write uninterruptible until you know kernel scheduling better ;)
2008-09-18 20:47 (read here)
2008-09-18 20:47 uninterruptable will cause it to be D too iirc
2008-09-18 20:47 bushman, it always completes
2008-09-18 20:47 D state
2008-09-18 20:48 with or without an error
2008-09-18 20:48 Bushman: it may complete with an error
2008-09-18 20:48 which gets passed to the endio handler
2008-09-18 20:48 yes, this is d state, the real thing
2008-09-18 20:48 which as written ignores all errors, and just marks the io as completed, frees the bio, and wakes the wq
2008-09-18 20:48 interruptable is not quite so severe i guess
2008-09-18 20:48 you are in d state any time you're waiting in kernel
2008-09-18 20:48 even interruptable?
2008-09-18 20:48 yes
2008-09-18 20:48 unless you're doing wait_interruptible?
2008-09-18 20:49 hmm
2008-09-18 20:49 flips: didn't we find that not to be the case
2008-09-18 20:49 with ddsnap
2008-09-18 20:49 even then I think
2008-09-18 20:49 hmm, so how could I get this to be abortable, in case for example the block device hangs on network?
2008-09-18 20:49 remember our threads were all D state
2008-09-18 20:49 you get a qualifier on your ps output
2008-09-18 20:49 until we changed it to interruptable
2008-09-18 20:50 maze, that's not your job, it's the job of the device insert/remove
2008-09-18 20:50 which of course means it's badly mismanaged ;)
2008-09-18 20:50 but...
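The submit / end_io / wait pattern discussed above maps onto a mutex plus condition variable in userspace. The context struct plays the part of the `mz` struct reached through `bi_private`; the callback only records the status, sets the flag and wakes the waiter; the waiter checks the flag before sleeping (the request may already be done) and again after every wakeup (a wakeup alone proves nothing). This is a pthreads sketch with hypothetical names (`read_ctx`, `model_end_io`, `model_device`, `model_read_and_wait`) and a thread standing in for the disk. Unlike the junkfs handler as described in the log, it also hands the error status back to the caller. Build with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical completion context in the role of the "mz" struct. */
struct read_ctx {
    pthread_mutex_t lock;
    pthread_cond_t wait;     /* stands in for the kernel waitqueue */
    int done;                /* the condition the waiter checks */
    int error;               /* 0, or a negative errno from the "device" */
    unsigned char data[512]; /* the one-sector buffer */
};

/* Stand-in for bi_end_io: record status, set the condition, wake, nothing else
 * (in the kernel this runs in interrupt context, so it must stay small). */
static void model_end_io(struct read_ctx *ctx, int error)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->error = error;
    ctx->done = 1;
    pthread_cond_signal(&ctx->wait);
    pthread_mutex_unlock(&ctx->lock);
}

/* Stand-in for the disk: fills the buffer on another thread, then "interrupts". */
static void *model_device(void *arg)
{
    struct read_ctx *ctx = arg;

    memset(ctx->data, 0xab, sizeof(ctx->data));
    model_end_io(ctx, 0);
    return NULL;
}

/* Submit, then wait in a loop on the condition. pthread_cond_wait may return
 * spuriously, much like an interruptible wait may return before the wakeup.
 * Returns the completion status, which the caller must check. */
static int model_read_and_wait(struct read_ctx *ctx)
{
    pthread_t dev;
    int rc = pthread_create(&dev, NULL, model_device, ctx);   /* "submit_bio" */

    if (rc != 0)
        return -rc;
    pthread_mutex_lock(&ctx->lock);
    while (!ctx->done)
        pthread_cond_wait(&ctx->wait, &ctx->lock);
    pthread_mutex_unlock(&ctx->lock);
    pthread_join(dev, NULL);
    return ctx->error;
}
```

For reference, the kernel-side pieces are a wait queue with `wait_event()` (or `wait_event_interruptible()`, whose return value then has to be checked) and `wake_up()`; `struct completion` with `wait_for_completion()` and `complete()` packages the same flag-plus-wait pattern.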
2008-09-18 20:50 not your problem for now
2008-09-18 20:50 well what if we're running this off of a nbd or something like that, and the network gets pulled
2008-09-18 20:50 would the bio then just (eventually) return with an error to endio?
2008-09-18 20:50 that's nbd's problem
2008-09-18 20:50 again not yours
2008-09-18 20:51 you can try to do timeouts and things, but you're risking redudancy
2008-09-18 20:51 and confusion
2008-09-18 20:51 right
2008-09-18 20:51 risking redundancy ?
2008-09-18 20:51 duplicating functionality that is better performed at some other layer
2008-09-18 20:52 constant risk with the blind leading the blind ;)
2008-09-18 20:52 yeah
2008-09-18 20:52 good point
2008-09-18 20:52 but the blind leading the deaf is ok
2008-09-18 20:52 maze, that was a great walkthrough, and the code is great too
2008-09-18 20:52 yes!
2008-09-18 20:52 not perfect, but you don't need that to be great in linux ;)
2008-09-18 20:52 I stil don't quite understand a bunch of it
2008-09-18 20:52 MaZe: thanks, i was following closely with little time to type
2008-09-18 20:53 a few warts make it more real, like a european movie
2008-09-18 20:53 hah
2008-09-18 20:53 ACTION rolls eyeballs
2008-09-18 20:53 lol
2008-09-18 20:53 maze, I am going to cut and paste your code into fs/tux3/super.c
2008-09-18 20:53 and tux3 is going to read a leet sector sized sb too
2008-09-18 20:54 heh
2008-09-18 20:54 s/junkfs/tux3/
2008-09-18 20:54 hehe
2008-09-18 20:54 exactly
2008-09-18 20:54 or s/tux3/junkfs/
2008-09-18 20:54 depending on leetness or lack of it
2008-09-18 20:54 so it seems silly for every fs to have to do this
2008-09-18 20:54 is the vfs totally useless?
2008-09-18 20:54 yes
2008-09-18 20:55 pretty much
2008-09-18 20:55 what I still haven't found is how to specify the io priority of the bio you submit
2008-09-18 20:55 pretty close
2008-09-18 20:55 not completely
2008-09-18 20:55 lame but not useless
2008-09-18 20:55 better than NT
2008-09-18 20:55 I'm assuming it inherits from the ionice'ness of the process in whose context you're running
2008-09-18 20:55 maze, completely separate
2008-09-18 20:55 it's part of the elevator abstraction
2008-09-18 20:55 oh?
2008-09-18 20:56 huh?
2008-09-18 20:56 i was wondering that too
2008-09-18 20:56 inheriting anything is completely a property of the elevator plugin
2008-09-18 20:56 shouldn't submitting a read/write request to a blockdevice be exactly when this matters?
2008-09-18 20:56 see "request queue"
2008-09-18 20:56 oh, the mysterious q parameter
2008-09-18 20:56 one of the harder code reading projects in kernel
2008-09-18 20:56 it's a mess
2008-09-18 20:56 I saw all over the place
2008-09-18 20:56 that is apparently a field in the bio struct
2008-09-18 20:57 q is a carpet under which all kinds of doggie poo is swept
2008-09-18 20:57 it's really a bag tied onto the side of the bio
2008-09-18 20:57 we'll get rid of it before next christmas
2008-09-18 20:57 I hope
2008-09-18 20:57 I just want a nice aio read/write with priority interface for my coding
2008-09-18 20:57 you got it
2008-09-18 20:57 already
2008-09-18 20:58 well s/nice/nicer than what we had before/
2008-09-18 20:58 that would be a good project.. a new aio interface
2008-09-18 20:58 right, I have the aio rw
2008-09-18 20:58 sounds like it should map easily enough....
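One way to picture "priority is part of the elevator abstraction": the bio only says where the data goes, while the elevator attached to the request queue decides dispatch order, and a queue may have no elevator at all. The sketch below is a hypothetical model (`model_req`, `model_queue`, `prio_then_sector`, `model_dispatch`), not the kernel's request queue or elevator API; real schedulers such as CFQ weigh priority, fairness and deadlines in far more involved ways.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request record: the bio supplies the sector, while the priority
 * is something the elevator attaches and interprets (lower = more urgent here). */
struct model_req {
    unsigned long sector;
    int prio;
};

/* Hypothetical request queue. Whether it has an elevator is the driver's
 * choice; NULL means requests are dispatched in submission order. */
struct model_queue {
    int (*elevator_cmp)(const void *, const void *);
};

/* One possible elevator policy: priority class first, then ascending sector. */
static int prio_then_sector(const void *a, const void *b)
{
    const struct model_req *x = a, *y = b;

    if (x->prio != y->prio)
        return (x->prio > y->prio) - (x->prio < y->prio);
    return (x->sector > y->sector) - (x->sector < y->sector);
}

/* Put pending requests into dispatch order according to the queue's elevator. */
static void model_dispatch(const struct model_queue *q, struct model_req *reqs, size_t n)
{
    if (q->elevator_cmp)
        qsort(reqs, n, sizeof(*reqs), q->elevator_cmp);
}
```

The same three requests go out in a different order depending only on which queue they land in, which is the point made in the log: the submitter does not pick the ordering policy.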
2008-09-18 20:58 bio transfer is aio at its purest
2008-09-18 20:58 yeah
2008-09-18 20:58 right, but you want prioritization in there
2008-09-18 20:58 should be easier than non aio realy
2008-09-18 20:58 and that's what I'm failing to see
2008-09-18 20:58 maze, in the elevator
2008-09-18 20:58 'scuze my newbness, but wouldnt priority be at odds with queuing that the controllers try to do?
2008-09-18 20:58 so does the bio go through the elevator?
2008-09-18 20:59 bushman, interactions, yes
2008-09-18 20:59 not all good
2008-09-18 20:59 well, you want something htb like for io
2008-09-18 20:59 best to try and harmonize with them
2008-09-18 20:59 wait a minute, what's the layering here?
2008-09-18 21:00 is the physical hw under the elevator under the bio
2008-09-18 21:00 vfs <-> bio <-> driver
2008-09-18 21:00 and where's the elevator?
2008-09-18 21:00 between bio and driver
2008-09-18 21:00 vfs <-> bio <-> elevator <-> driver
2008-09-18 21:00 right?
2008-09-18 21:00 vfs <-> bio <-> elevator <-> driver
2008-09-18 21:00 ?
2008-09-18 21:00 heh
2008-09-18 21:00 heh
2008-09-18 21:00 exactly
2008-09-18 21:00 so by choosing the request queue in the bio, I choose priority of the request with regards to other requests?
2008-09-18 21:00 and the presence/lack of the elevator is up to the driver or virtual driver even
2008-09-18 21:01 so the elevator can appear at multiple or no places in the stack
2008-09-18 21:01 so the elevator messes with fields in the bios?
2008-09-18 21:01 is this screwy? or is this just me...?
2008-09-18 21:01 and vice versa in an idiotic way... sometimes useful way
2008-09-18 21:01 maze, it's screwy
2008-09-18 21:01 not just you
2008-09-18 21:01 but better than we had in 2.4
2008-09-18 21:02 it's damn fast actually, compared to a disk
2008-09-18 21:02 we didn't have that a few years ago
2008-09-18 21:02 now it's looking slow again
2008-09-18 21:02 and people are asking me to fix it
2008-09-18 21:02 it shall be done
2008-09-18 21:03 wait a minute - what is slow?
2008-09-18 21:03 the interfaces / kernel code?
2008-09-18 21:03 this who kooky chain
2008-09-18 21:03 whole
2008-09-18 21:03 vfs <-> bio <-> elevator <-> driver
2008-09-18 21:03 layering is right
2008-09-18 21:03 implementation is faulty
2008-09-18 21:03 agreed
2008-09-18 21:04 anyway
2008-09-18 21:04 we're using the existing one for now
2008-09-18 21:04 it will work for tux3 as well as it works for anybody
2008-09-18 21:04 better, because we will use it more directly
2008-09-18 21:04 and have fewer strange waits and so on
2008-09-18 21:04 right
2008-09-18 21:04 and when we do see a strange wait, we will be able to pounce on it
2008-09-18 21:04 that's why I wanted to go all the way down to the bio on the sb read
2008-09-18 21:05 a) for practice
2008-09-18 21:05 b) because it's the way it should be done
2008-09-18 21:05 unlike if you use the... odd... vfs block io helpers
2008-09-18 21:05 well I think we are going to stay all the way down here for tux3
2008-09-18 21:06 tux3 has no use asking other subsystems to submit bios on its behalf, unless that subsystem is an lvm
2008-09-18 21:06 and even then, we just submit a bio to the lvm without caring its not a real device
2008-09-18 21:06 still have to figure out how to do mmap like stuff (ie. trigger read in, on page fault, or write out, both for kernel and userspace, and cow, etc)
2008-09-18 21:06 maze, handled for you
2008-09-18 21:06 like magic
2008-09-18 21:06 cool - assuming it does the right thing (tm)
2008-09-18 21:06 see filemap.c -> nopage
2008-09-18 21:06 kinda right
2008-09-18 21:06 some messed locking
2008-09-18 21:07 which I'm not sure it does for cache coherency netfs
2008-09-18 21:07 bottlenecks on i_mutex during fault in
2008-09-18 21:07 bad
2008-09-18 21:07 so it probably needs to be gone through with a fine comb then
2008-09-18 21:07 even nfs is cache coherent/consistent with respect to mmap
2008-09-18 21:07 as I was expecting
2008-09-18 21:07 yes
2008-09-18 21:07 right in to the danger zone
2008-09-18 21:08 speaking of which
2008-09-18 21:08 what bottlenecks on i_mutex?
2008-09-18 21:08 time to turn on the ghetto blaster
2008-09-18 21:08 and get back to coding
2008-09-18 21:08 I'm assuming the code in filemap.c which deals with page-in/outs of mmapped pages
2008-09-18 21:08 oh, right it's already 10 past 9
2008-09-18 21:08 so is that it for this time?
2008-09-18 21:08 ACTION puts on Holst's the planets, performed by korean rock band
2008-09-18 21:09 ACTION scrolls back to remember his homework
2008-09-18 21:09 that's it, nice one maze
2008-09-18 21:09 is anybody sticking around to ask lame(er) questions?
2008-09-18 21:09 next time it will be razvanm's turn
2008-09-18 21:09 :P
2008-09-18 21:09 oh, awesome, what's he doing?
2008-09-18 21:09 to explain some more of _2copy
2008-09-18 21:09 ah
2008-09-18 21:09 lame question period is officially open
2008-09-18 21:10 intelligent questions banned
2008-09-18 21:10 what's an elevator?
2008-09-18 21:10 ACTION doesn't have anything to ask this time
2008-09-18 21:10 a kernel elevator
2008-09-18 21:10 when you read/write data to a hard disk
2008-09-18 21:10 otherwise you're going to get some dumb jokes
2008-09-18 21:10 which is a spinning platter with a seeking head
2008-09-18 21:10 elevator = io scheduler
2008-09-18 21:10 then depending on the order you send out request
2008-09-18 21:10 just caught up
2008-09-18 21:11 you may need to do a small or large number of seeks
2008-09-18 21:11 like tivo for geeks
2008-09-18 21:11 yup, and it's algorithms are the same as a busy elevator in a skyscraper
2008-09-18 21:11 seeks are very expensive
2008-09-18 21:11 so you try to minimize seeks
2008-09-18 21:11 for good performance (b/w), but higher latency
2008-09-18 21:11 so are tlb misses
2008-09-18 21:11 and page cache misses
2008-09-18 21:11 you basically scan the disk from top to bottom, doing read writes at increasing lba addresses
2008-09-18 21:11 irregardless of the order they were submitted in
2008-09-18 21:11 then do the same thing going downwards
2008-09-18 21:12 somewhat downwards
2008-09-18 21:12 ok great, but from this level, can we be aware of what media we're writing to so we dont make it overinvolved in cases it doesnt matter, like solid state disks?
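The sweep described above is the classic SCAN ("elevator") algorithm. A minimal sketch that orders pending requests, given as sector numbers, relative to the current head position; `scan_order` is a hypothetical helper, and the Linux schedulers of this era add merging, deadlines and fairness on top of the basic idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_sector(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a, y = *(const unsigned long *)b;

    return (x > y) - (x < y);
}

/* Textbook SCAN ordering: from the head position, serve every request at or
 * above it in increasing sector order, then sweep back down through the rest.
 * Submission order is ignored. Sorts reqs in place; writes the order to out. */
static void scan_order(unsigned long *reqs, size_t n, unsigned long head, unsigned long *out)
{
    size_t split = 0, k = 0;

    qsort(reqs, n, sizeof(*reqs), cmp_sector);
    while (split < n && reqs[split] < head)
        split++;
    for (size_t i = split; i < n; i++)    /* upward sweep */
        out[k++] = reqs[i];
    for (size_t i = split; i > 0; i--)    /* downward sweep */
        out[k++] = reqs[i - 1];
}
```

With the head at sector 40 and requests for 90, 10, 50, 70 and 20, the upward sweep serves 50, 70, 90 and the downward sweep 20, 10. Skipping the backward sweep and jumping back to the lowest pending sector instead gives the C-SCAN variant.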
2008-09-18 21:12 right
2008-09-18 21:12 the disk doesn't like going backwards as much as forwards
2008-09-18 21:12 the consecutive read/write sectors are still upwards
2008-09-18 21:12 Bushman: you can pick an io scheduler on a per-block-device basis
2008-09-18 21:12 and sometimes you skip the backwards step entirely
2008-09-18 21:12 depends
2008-09-18 21:12 bushman, mostly we don't care, where we do care we care a lot
2008-09-18 21:12 lots of fine tuning required to get optimal performance
2008-09-18 21:13 and it heavily depends on usecases
2008-09-18 21:13 /sys/block/sda/queue/scheduler
2008-09-18 21:13 as long as it's adjustable from userspace i'm good ;)
2008-09-18 21:13 plus you can throw in individual io priorities into the mix (ie. reading this sector is more important)
2008-09-18 21:13 we try to design for whole classes of usecases, rather than one at a time
2008-09-18 21:13 and b/w per job, and hard read/write deadlines, etc
2008-09-18 21:13 and it all gets complex
2008-09-18 21:13 http://friedcpu.wordpress.com/2007/07/17/why-arent-you-using-ionice-yet/
2008-09-18 21:13 shapor, nice, i havent gotten used to the new linux, i've been bsd'ing since '03
2008-09-18 21:13 i only recently discovered ionice
2008-09-18 21:13 and the elevator is the piece of code which gets requests thrown at it
2008-09-18 21:14 i think mentioned on here
2008-09-18 21:14 does some algo mumbo jumbo to put them in the 'best' order
2008-09-18 21:14 shapor, because it doesn't work that well?
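Per-device scheduler selection goes through sysfs: reading `/sys/block/sda/queue/scheduler` lists the available elevators with the active one in square brackets (for example `noop anticipatory deadline [cfq]`), and writing a name to it as root switches that device, as in `echo deadline > /sys/block/sda/queue/scheduler`. A small Python sketch of both operations; the helper names are hypothetical and `sda` is just the device from the log.

```python
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def parse_scheduler_line(text):
    """Split the contents of <dev>/queue/scheduler into (active, available).

    The kernel marks the active elevator with square brackets,
    e.g. "noop anticipatory deadline [cfq]"."""
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def current_scheduler(dev="sda"):
    """Return (active, available) for one block device."""
    path = SYSFS_BLOCK / dev / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())


def set_scheduler(dev, name):
    """Switch the elevator for one device; requires root privileges."""
    path = SYSFS_BLOCK / dev / "queue" / "scheduler"
    path.write_text(name)
```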
2008-09-18 21:14 and throws them at the disk
2008-09-18 21:14 flips: yes but the interface is there
2008-09-18 21:14 if people use it they can report bugs
2008-09-18 21:14 sure
2008-09-18 21:14 if people dont report bugs or say it sucks on lkml it wont get fixed
2008-09-18 21:14 same problem with posix_fadvise
2008-09-18 21:14 note that for a network nic
2008-09-18 21:14 we will take it for a spin at some point
2008-09-18 21:15 you have a certain amount of b/w
2008-09-18 21:15 maze will ;)
2008-09-18 21:15 and it's all pretty easy - conceptually
2008-09-18 21:15 and shapor will make some nice charts of the event logs
2008-09-18 21:15 vfs + bio events
2008-09-18 21:15 oh i almost forgot about that
2008-09-18 21:15 sending each packet involves a fixed amount of headroom (header fields), the packet itself, and a fixed footer
2008-09-18 21:15 still no clue how to glue those together
2008-09-18 21:15 so when you send a packet you know exactly how much of the nic (ie. for how long) you're using up
2008-09-18 21:16 thus you can make very nice guarantees
2008-09-18 21:16 and this is what htb + sfq does for networking
2008-09-18 21:16 htb? sfq?
2008-09-18 21:16 you can partition your network card pretty much arbitrarily between different apps
2008-09-18 21:16 giving different apps different priorities, then different priorities different amounts of bw
2008-09-18 21:16 and the priorities don't need to be strictly linear either
2008-09-18 21:16 htb? sfq?
2008-09-18 21:16 htb
2008-09-18 21:16 oh could i get in on the testing? i've done a lot of work visualizing sequences of events in temporal OSPF loops, this should be something i could do ;)
2008-09-18 21:17 htb is basically a tree structure
2008-09-18 21:17 the nodes are where requests come in
2008-09-18 21:17 what's the tla mean?
2008-09-18 21:17 the root is where requests come out
2008-09-18 21:17 so each application (or tcp stream, or whatever you're using) gets assigned to a leaf node in this tree
2008-09-18 21:17 (Stochastic Fairness Queueing)
2008-09-18 21:18 and the network driver then (when it wants to send) always pulls from the root
2008-09-18 21:18 gah
2008-09-18 21:18 each node in this tree has a certain speed of accumulating tokens
2008-09-18 21:18 (htb = hierarchical token buckets)
2008-09-18 21:18 that it accumulates in the bucket in that node
2008-09-18 21:18 wouldnt stochastic approach that every client is equally unhappy? ;)
2008-09-18 21:19 Bushman: sfq is used in the leafs to randomly select between clients / tcp streams you consider equivalent
2008-09-18 21:19 you hang an sfq off of each leaf node in htb, so you actually throw the packets at the correct sfq, and the htb leaf pulls them from the attached sfq
2008-09-18 21:19 network peeps are always reinventing the world ;)
2008-09-18 21:20 ah, so you use the hierarchical token buckets to assign different classes of service to different apps/streams?
2008-09-18 21:20 anyway, you divide up each node's bandwidth among its children
2008-09-18 21:20 and then define how and when they can borrow/lend tokens to each other
2008-09-18 21:20 I'm not doing a very good job of defining it here
2008-09-18 21:20 but it's wicked!
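The token-bucket tree described above can be modeled in a few lines. This is a toy, assuming a leaf spends its own tokens when it has enough and otherwise borrows the whole amount from the nearest ancestor with spare tokens, and that each parent's rate is at least the sum of its children's rates. The real Linux `sch_htb` also charges ancestors for every send, separates the guaranteed `rate` from the borrowing limit `ceil`, and uses per-class quanta, none of which is modeled here. `Bucket` and its methods are hypothetical names.

```python
class Bucket:
    """One node in a toy hierarchical token bucket tree."""

    def __init__(self, name, rate, burst, parent=None):
        self.name = name
        self.rate = rate        # tokens (bytes) earned per second
        self.burst = burst      # maximum tokens the bucket can hold
        self.tokens = burst     # start with a full bucket
        self.parent = parent

    def refill(self, seconds):
        """Accumulate tokens for the elapsed time, capped at the burst size."""
        self.tokens = min(self.burst, self.tokens + self.rate * seconds)

    def try_send(self, size):
        """Send `size` bytes if this node or an ancestor can pay for it.

        Own tokens are used first; failing that, the whole amount is
        borrowed from the nearest ancestor holding enough spare tokens."""
        node = self
        while node is not None:
            if node.tokens >= size:
                node.tokens -= size
                return True
            node = node.parent
        return False


# Example tree: a 100-byte/s root split 60/40 between two leaves.
root = Bucket("root", rate=100, burst=100)
web = Bucket("web", rate=60, burst=60, parent=root)
ssh = Bucket("ssh", rate=40, burst=40, parent=root)
```

Here `web.try_send(50)` is paid from web's own bucket, a second `web.try_send(50)` borrows from the root's spare tokens, and a third `web.try_send(60)` fails until `refill` runs.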
2008-09-18 21:20 no, you're doing a great job
2008-09-18 21:20 maze, I'm getting the idea
2008-09-18 21:20 sounds wicked
2008-09-18 21:20 yea i just did a project with filtering/limiting at work, so i'm getting it
2008-09-18 21:21 it sounds a lot smarter than it is ;)
2008-09-18 21:21 well, disk layer doesn't have any such pretensions to sophistication
2008-09-18 21:21 yet
2008-09-18 21:21 heh
2008-09-18 21:21 damn academics justifying their existence
2008-09-18 21:21 anyway, basically htb + sfq is the best I've seen for networking, and would probably be awesome for other stuff as well like scheduling cpus
2008-09-18 21:21 I can imagine the mess if it did
2008-09-18 21:21 Bushman: gee filtering and limiting, i wouldn't have guessed :P
2008-09-18 21:21 except it's probably too compute intensive for that and can't take cache-heat or memory nearness into account
2008-09-18 21:21 shapor: stfu ;)
2008-09-18 21:22 :)
2008-09-18 21:22 anyway, with disk it gets tougher
2008-09-18 21:22 if it did, could be interesting as a cache coherency protocol
2008-09-18 21:22 because you can't just up and calculate how long a particular operation will take
2008-09-18 21:22 network peeps always trying to find the most obscure TLA
2008-09-18 21:22 Bushman: don't you guys use bullets for limiting? :P
2008-09-18 21:22 mot <- most obscure tla
2008-09-18 21:22 haha
2008-09-18 21:22 (with the nic, you know its line rate, you know how many bytes you're sending, the size of the pre and post-amble, the wait between packets, you thus know the _entire_ cost of sending any given packet)
2008-09-18 21:23 dont make me whip out stories about invalidating keys with thermite grenades
2008-09-18 21:23 motley cru
2008-09-18 21:23 tla?
2008-09-18 21:23 mot?
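The "entire cost of sending any given packet" argument is easy to make concrete for Ethernet. A sketch assuming full-duplex Ethernet framing: 8 bytes of preamble and start-of-frame delimiter, a 14-byte MAC header (no VLAN tag), the payload padded to at least 46 bytes, a 4-byte FCS, and a minimum interframe gap of 12 byte times. Collisions and CSMA/CD backoff on half-duplex links are not modeled, and the constant and function names are illustrative.

```python
PREAMBLE_SFD = 8   # preamble + start-of-frame delimiter, bytes
MAC_HEADER = 14    # destination MAC, source MAC, EtherType (no VLAN tag)
FCS = 4            # trailing CRC
IFG = 12           # minimum interframe gap, in byte times
MIN_PAYLOAD = 46   # shorter payloads are padded up to this


def wire_bytes(payload):
    """Byte times one frame occupies on the wire, overhead included."""
    return PREAMBLE_SFD + MAC_HEADER + max(payload, MIN_PAYLOAD) + FCS + IFG


def wire_time_ns(payload, line_rate_bps):
    """Nanoseconds the link is busy sending one frame of this payload."""
    return wire_bytes(payload) * 8 * 1e9 / line_rate_bps
```

A full 1500-byte payload occupies 1538 byte times, about 12.3 µs at 1 Gbit/s, and a minimum-size frame 84 byte times (672 ns), which is where the familiar figure of roughly 1.49 million packets per second on gigabit Ethernet comes from.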
2008-09-18 21:23 maze, and you don't know how much carrier sense backoff is going to cost ;)
2008-09-18 21:23 most obscure three letter acronym
2008-09-18 21:23 ah, so you use the hierarchical token buckets to assign different classes of service to different apps/streams? - precisely
2008-09-18 21:23 and that's where your pretensions to realtime control come crashing down
2008-09-18 21:23 which is a fla
2008-09-18 21:24 which is a tla
2008-09-18 21:24 which is a tla
2008-09-18 21:24 third time lucky
2008-09-18 21:24 for example I would give each user in my network their own sfq for local traffic to another nic (just switching), to another network via wireless, and to the internet (via the same wireless)
2008-09-18 21:24 to make delivery time guaranteed, wouldn't you have to have a full preempt kernel? (oh i miss 80s Amigas)
2008-09-18 21:24 ACTION thinks of some keys he'd like invalidated
2008-09-18 21:24 and then use htb to make sure everything was fair on the slow internet link, and on the others at the same time - worked awesome
2008-09-18 21:25 be right back in 10.
2008-09-18 21:25 was a good one
2008-09-18 21:25 so who's hungry?
2008-09-18 21:25 me?
2008-09-18 21:25 was just going to order from bruno's
2008-09-18 21:25 we could meet there instead
2008-09-18 21:25 you don't count, you're always hungry
2008-09-18 21:25 flips: i thought you were coding not slacking tonight
2008-09-18 21:25 ;-)
2008-09-18 21:25 i need to sleep, it's past midnight here damn it
2008-09-18 21:26 shapor, what do you think I was doing while maze was talking?
2008-09-18 21:26 Bushman: I'll drink a Zywiec for you :)
2008-09-18 21:26 Bushman: east coast?
2008-09-18 21:26 bushman, laterz
2008-09-18 21:26 you guys keep it too interesting
2008-09-18 21:26 heh thanks
2008-09-18 21:26 ACTION also goes to bed. Good night to everyone.
2008-09-18 21:27 shapor- you up for grub?
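SFQ, as described in the log, hashes each flow into one of a fixed number of queues and serves the queues round-robin, so flows that land in different queues share the link fairly; the hash is periodically re-salted ("perturbed") so flows that happen to collide don't stay stuck together. A toy Python sketch of one SFQ instance, assuming a flow is identified by any hashable key (say, a `(src, dst, sport, dport)` tuple) and that each turn serves one packet; Linux `sch_sfq` instead serves a byte quantum per turn and hashes packet headers itself. The `SFQ` class is a hypothetical name.

```python
import random
from collections import deque


class SFQ:
    """Toy Stochastic Fairness Queueing: flows are hashed into a fixed
    number of FIFO buckets, and backlogged buckets are served round-robin."""

    def __init__(self, nbuckets=16, seed=None):
        self.nbuckets = nbuckets
        self.rng = random.Random(seed)
        self.perturb()
        self.buckets = [deque() for _ in range(nbuckets)]
        self.active = deque()   # indices of non-empty buckets, in service order

    def perturb(self):
        """Pick a new salt so a different set of flows shares buckets."""
        self.salt = self.rng.getrandbits(32)

    def bucket_of(self, flow):
        return hash((self.salt, flow)) % self.nbuckets

    def enqueue(self, flow, packet):
        idx = self.bucket_of(flow)
        if not self.buckets[idx]:
            self.active.append(idx)     # bucket becomes backlogged
        self.buckets[idx].append(packet)

    def dequeue(self):
        """Return the next packet, or None when every bucket is empty."""
        if not self.active:
            return None
        idx = self.active.popleft()
        packet = self.buckets[idx].popleft()
        if self.buckets[idx]:
            self.active.append(idx)     # still backlogged: back of the line
        return packet
```

If one flow queues three packets and another flow, hashed to a different bucket, queues one, dequeuing alternates between them instead of draining the first flow before the second gets a turn. In the per-user setup described above, each htb leaf would feed its own such queue.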
2008-09-18 21:27 Shapor: you better have some Zywiec/Okocim handy when i invade LA again
2008-09-18 21:27 that would be "more grub"
2008-09-18 21:27 safe bet, shapor already cooked tonight
2008-09-18 21:27 tim_dimm_: yeah i already ate
2008-09-18 21:27 k, beer?
2008-09-18 21:27 flips: no, i had gDinner
2008-09-18 21:27 how about some chianti
2008-09-18 21:27 ?
2008-09-18 21:27 flips: did Shap introduce you to polish beer yet?
2008-09-18 21:27 heh
2008-09-18 21:27 don't need shap for that
2008-09-18 21:28 used to live in berlin
2008-09-18 21:28 ah yes, the spoils of war... ;)
2008-09-18 21:28 heh
2008-09-18 21:28 not sure which way that one cuts
2008-09-18 21:28 all the king's horses....
2008-09-18 21:28 couldn't stop tanks
2008-09-18 21:28 but they stopped for a beer!
2008-09-18 21:29 berlin has lots of wayward poles
2008-09-18 21:29 drinking, mostly
2008-09-18 21:29 some leggy poles
2008-09-18 21:29 drinking
2008-09-18 21:29 flips: since it's late night, how's swingers sound?
2008-09-18 21:29 even the ubermensch need a brewsky
2008-09-18 21:29 or playing with the berlin boys
2008-09-18 21:29 toying actually
2008-09-18 21:29 berlin boy toys
2008-09-18 21:29 also fun for the finhish girls
2008-09-18 21:29 finnish
2008-09-18 21:29 tim_dimm_: people who dont know LA might take that out of context ;)
2008-09-18 21:30 i thought of that as soon as I hit enter
2008-09-18 21:30 esp with the PC bunch we have in here
2008-09-18 21:30 for the record, swingers is a diner
2008-09-18 21:30 shapor: maybe i should tell them about how you behaved when i took you out to boystown in chicago ;)
2008-09-18 21:30 hahah
2008-09-18 21:30 lol
2008-09-18 21:30 tim_dimm_, 802 Broadway?
2008-09-18 21:31 yeah
2008-09-18 21:31 corner of lincoln and broadway
2008-09-18 21:31 shap?
2008-09-18 21:31 sure
2008-09-18 21:31 tim_dimm_, 22 oclock?
2008-09-18 21:31 pick u up
2008-09-18 21:31 ?
2008-09-18 21:31 i could go for a vanilla chai latte
2008-09-18 21:31 kay
2008-09-18 21:31 sure
2008-09-18 21:31 good idea
2008-09-18 21:31 you commie bastards
2008-09-18 21:31 keep those wrists safe tonight
2008-09-18 21:31 i got waffle house
2008-09-18 21:32 :)
2008-09-18 21:32 k rollin in ten
2008-09-18 21:32 you coming by here?
2008-09-18 21:32 shapor: drive by u then flips
2008-09-18 21:32 yeah
2008-09-18 21:32 good
2008-09-18 21:32 sure
2008-09-18 21:32 see you then
2008-09-18 21:32 k
2008-09-18 21:32 got 28 minutes to hack on dleaf
2008-09-18 21:32 bushman, good to meet you
2008-09-18 21:32 ACTION puts pants on
2008-09-18 21:32 nice to talk with everyone
2008-09-18 21:33 swingers, pants
2008-09-18 21:33 bushman, see you soon
2008-09-18 21:33 wtf dude
2008-09-18 21:33 haha
2008-09-18 21:33 ;-)
2008-09-18 21:33 oh if shap is putting pants on... SAY HI TO JOELLE!
2008-09-18 21:33 bushman, we need to meet up
2008-09-18 21:33 yea i know, end of fiscal year madness here, maybe this weekend we'll talk more
2008-09-18 21:33 Bushman: she says hi ;0
2008-09-18 21:33 ;) rather
2008-09-18 21:34 bushman, works for me
2008-09-18 21:35 flips: my boss has just been tasked with writing the next orange-book-like thing, so we can make our requirements whatever we want, literally
2008-09-18 21:36 this is DoD/govt wide stuff, seriously influential development for the next decade, so it's the perfect moment to sneak in all kinds of security goodness
2008-09-18 21:37 bushman, sweet
2008-09-18 21:37 means I'd better bootstrap my clue
2008-09-18 21:38 i get to be the technical ideas feeder, as they're more policy, so if you got good ideas, i'm all ears
2008-09-18 21:38 what color is this one going to be?
2008-09-18 21:38 green book?
2008-09-18 21:38 this is la after all
2008-09-18 21:38 nah, the rainbow series has been retired, dunno what it's gonna be called
2008-09-18 21:39 leetbook
2008-09-18 21:39 they've realized common criteria was an EPIC FAIL!
2008-09-18 21:39 nice
2008-09-18 21:39 onion book
2008-09-18 21:39 Bushman: isn't that what you spent a year of graduate school pissing and moaning about?
2008-09-18 21:39 or... maybe pomegranate
2008-09-18 21:39 anyway, must sleep, got some hacking certification to pass tomorrow
2008-09-18 21:39 pomegranates have excellent security... isolation...
2008-09-18 21:40 compartmentalization
2008-09-18 21:40 yes, that was the class i was forced into when you were 'visiting'
2008-09-18 21:40 robustness...
2008-09-18 21:40 Bushman: thanks again for that ;)
2008-09-18 21:40 hmm
2008-09-18 21:40 they always pick such meaningful names like dod8200.2 or dcid6/3
2008-09-18 21:42 now, now, now, not all poles drink
2008-09-18 21:42 ...heavily...
2008-09-18 21:42 no?
2008-09-18 21:42 we take breaks
2008-09-18 21:43 true
2008-09-18 21:43 to sleep...
2008-09-18 21:43 of sorts
2008-09-18 21:43 my break has been too long, i need my okocim porter damn it
2008-09-18 21:44 orange book on what?
2008-09-18 21:44 security requirements for high assurance computer systems
2008-09-18 21:44 ah
2008-09-18 21:44 govt/mil style stuff
2008-09-18 21:44 http://en.wikipedia.org/wiki/TCSEC
2008-09-18 21:45 that can be interesting
2008-09-18 21:45 red hat and suse got b2 a while back
2008-09-18 21:45 so long as you don't have to write or read it
2008-09-18 21:45 I seem to recall
2008-09-18 21:45 those documents need an interface layer
2008-09-18 21:45 heh, i forgot the official name of it, i actually got the orange covered book on my shelf ;)
2008-09-18 21:45 so did windows nt... for a rather specialized configuration
2008-09-18 21:45 (ie. come with your own personal interpreter)
2008-09-18 21:45 with the network unplugged I think it was
2008-09-18 21:45 well, you need stuff like labeled packets on a network,
2008-09-18 21:46 i totally agree, that's why we're redoing it, cuz everything up to this point sucks
2008-09-18 21:46 isolation
2008-09-18 21:46 yeah, lots of fun
2008-09-18 21:46 maze, I agree with the concept of defining the functionality rather than the interface
2008-09-18 21:46 heh, if you had citizenship i could hire you right now ;)
2008-09-18 21:46 and while you're at it, you should probably make sure to sneak in good performance
2008-09-18 21:46 defining interfaces just doesn't fly with linux kern hacks
2008-09-18 21:47 I'd make sure a decent qos and the like makes it in
2008-09-18 21:47 these people couldnt give less shit about performance
2008-09-18 21:47 barriers
2008-09-18 21:47 that's what they want
2008-09-18 21:47 MaZe: good point.. you really have to sneak in performance
2008-09-18 21:47 I'd want these to be able to interoperate
2008-09-18 21:47 can't get there from here, provably
2008-09-18 21:47 with a public network like the internet
2008-09-18 21:47 the only way to sneak in performance is to push for small code
2008-09-18 21:48 fortunately, performance and provability tend to go hand in hand
2008-09-18 21:48 that's the only spot where security and performance principles meet
2008-09-18 21:48 or make this be the standard for the backbone or something
2008-09-18 21:48 yup
2008-09-18 21:48 because to get performance you need simple
2008-09-18 21:48 less code - easier to understand
2008-09-18 21:48 NOT YOUR CODE!
2008-09-18 21:48 easier to prove correct (or believe correct)/understand
2008-09-18 21:48 Bushman: no US Citizenship, just Canadian ;-) we Canadians rule the world.
2008-09-18 21:49 i've been reading tux' code, holy crap, did you all grow up coding up demos for amigas in the 80s?
2008-09-18 21:49 imo, that's a requirement for decent coding
2008-09-18 21:49 bushman, you need to read some other filesystem code
2008-09-18 21:49 it's just as dense, but not as performant for the most part
2008-09-18 21:50 i know, i'm kidding, but i saw bitslicing and i went 'oh shnap, they didnt...'
2008-09-18 21:50 the real trick is, you want to define nice clean apis/interfaces, then stick to them without breaking through the layering
2008-09-18 21:50 ACTION thinks about cutting and pasting some vfat code
2008-09-18 21:50 while at the same time avoid layers for the sake of layers - a lot of the vfs code is just wrappers on wrappers - sad
2008-09-18 21:50 bushman, the bitfields stuff will go, mostly
2008-09-18 21:50 can't have that on the actual media
2008-09-18 21:50 and a vital point is, the apis have to be precisely and accurately defined and documented
2008-09-18 21:50 -!- tim_dimm__(~mobile@32.172.89.233) has joined #tux3
2008-09-18 21:51 too much variation between machine architectures
2008-09-18 21:51 yea i was thinking how easy that would be to do buffer overflows on
2008-09-18 21:51 maze, careful there
2008-09-18 21:51 does this function sleep? what's the input? the output? how does it deal with errors? what errors can it return? how long can it take?
2008-09-18 21:51 maze, there is a long and star-studded history of api proposals to the linux kernel core that failed
2008-09-18 21:51 hmm?
2008-09-18 21:52 Shapor: outside dude
2008-09-18 21:52 tim_dimm__: k
2008-09-18 21:52 you almost want a language where you can write constraints for before/after/during execution of each function...like Eiffel
2008-09-18 21:52 selinux only just squeaked in
2008-09-18 21:52 Bushman: agreed
2008-09-18 21:52 that should almost be a gov requirement for the code that's secure
2008-09-18 21:52 most other consortium/thinktank type apis have failed to get merged
2008-09-18 21:52 that's how stock market software is done
2008-09-18 21:52 not to say it can't happen
2008-09-18 21:52 but one has to be _very careful_
2008-09-18 21:53 careful with what?
2008-09-18 21:53 i got to sit down with the creator of eiffel a few months ago, very smart dude
2008-09-18 21:53 maze, proposing apis to linus
2008-09-18 21:53 ah
2008-09-18 21:53 better to propose functionality
2008-09-18 21:53 define the functionality and the invariants
2008-09-18 21:53 where's the problem? he doesn't like them?
2008-09-18 21:53 oh
2008-09-18 21:53 let linus and friends take it to api
2008-09-18 21:53 yea, linus seems to be a big proponent of order emergent out of chaos...
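The checklist earlier in the discussion (what goes in, what comes out, which errors) is what Eiffel's `require`/`ensure` clauses make executable. Python has no built-in contracts, so this is a minimal sketch assuming a hypothetical `contract` decorator that checks a precondition before the call and a postcondition on the result; properties like "does this function sleep?" or "how long can it take?" can't be expressed this way. The `take` function is an invented example.

```python
import functools


def contract(pre=None, post=None):
    """Hypothetical Eiffel-style contract decorator.

    pre(*args, **kwargs) plays the role of `require`; post(result, *args,
    **kwargs) plays the role of `ensure`. A violation raises AssertionError."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(f"{fn.__name__}: precondition violated")
            result = fn(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError(f"{fn.__name__}: postcondition violated")
            return result
        return wrapper
    return decorate


@contract(pre=lambda buf, n: 0 <= n <= len(buf),
          post=lambda result, buf, n: len(result) == n)
def take(buf, n):
    """Return the first n bytes of buf."""
    return buf[:n]
```

`take(b"tux3", 2)` returns `b"tu"`, while `take(b"tux3", 9)` is rejected by the precondition instead of silently returning a shorter result than the caller asked for.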
2008-09-18 21:53 perhaps with some helpful guidance
2008-09-18 21:54 yeah, I kind of consider the api to be the functionality/invariants
2008-09-18 21:54 I'm not sure I'm really aware of the difference
2008-09-18 21:54 maze, he has a healthy disrespect for anybody else's ability to design a robust api
2008-09-18 21:54 linus isn't the world's greatest either, but it's his kernel
2008-09-18 21:54 he's not the worst either, or even below 99th percentile
2008-09-18 21:54 ok, by api, I don't mean it can't be changed later - as in stable
2008-09-18 21:55 api for demo apps, sure
2008-09-18 21:55 I mean a layer below which you don't have to descend to understand what it will do
2008-09-18 21:55 yea but how do you arrive at stable without having it in real action for a while
2008-09-18 21:55 reference implementation
2008-09-18 21:55 just don't let it grow into a huge undertaking with emotional baggage
2008-09-18 21:55 bushman, true
2008-09-18 21:55 well
2008-09-18 21:55 some strange bootstrap
2008-09-18 21:56 usually best to work with incremental modifications
2008-09-18 21:56 you have to have a clear idea of where you're heading
2008-09-18 21:56 usually you get that the second or third time around the block
2008-09-18 21:56 because you've tried it yourself, yes
2008-09-18 21:56 but that still doesn't cut it with core
2008-09-18 21:56 i was a sysadmin for a long time, if i learned anything it's that long term reality beats out fuzzing/use cases anytime ;)
2008-09-18 21:56 -!- tim_dimm__(~mobile@32.172.89.233) has joined #tux3
2008-09-18 21:56 bushman, exactly
2008-09-18 21:56 and the reality is posix
2008-09-18 21:57 I'm not sure what you mean by that, especially by fuzzing
2008-09-18 21:57 yeah and postfix is broken
2008-09-18 21:57 so this will succeed to the extent it builds on that
2008-09-18 21:57 posix is nice because it's a standard
2008-09-18 21:57 but, oh boy, what a standard it is...
2008-09-18 21:57 Flips: be outside in 3 min
2008-09-18 21:57 and because linus cares about it in a backhanded way
2008-09-18 21:57 kay
2008-09-18 21:57 ACTION thinks about pants
2008-09-18 21:58 anybody gonna come pick me up?
2008-09-18 21:58 housecoat currently in case any of you were wondering ;)
2008-09-18 21:58 let anything run in real production environment long enough and it's gonna encounter more bugs than all the test cases you can predict/generate. all tests are contrived. reality is strangely objective
2008-09-18 21:58 maze, we'll send the learjet by in 3 years
2008-09-18 21:58 Bushman: I'm an SRE right now, so I know ;-)
2008-09-18 21:59 what's SRE?
2008-09-18 21:59 bushman, it's not really true of core kernel though
2008-09-18 21:59 -!- tim_dimm__(~mobile@32.172.89.233) has joined #tux3
2008-09-18 21:59 way more bugs are squeezed out before it gets into hands of users
2008-09-18 21:59 or we'd be dead
2008-09-18 21:59 site reliability engineer for google, running crawling and indexing, all the way from machines to 70% or so of the way up the stack
2008-09-18 22:00 that's true, to me kernel is something that joins the clarity of time travel with readability of alchemy ;)
2008-09-18 22:01 well, we used to have the nice 2.odd trees, now all users are beta testers :/
2008-09-18 22:01 they are also much smaller changes though
2008-09-18 22:01 alright, 1am, time to pass out, over and out, great meeting you all
2008-09-18 22:02 nice to meet you
2008-09-18 22:02 and you have the stable kernels as well - the ones in RHEL4/5 and 2.6.16.X and then you have the newer ones being tested in fedora and ubuntu and desktop distros, and then you have bleeding edge in unreleased distros (fedora 10, etc)
2008-09-18 22:02 mainline
2008-09-18 22:03 so this is something I'm not sure about, but I think the trees used by distros have gotten _MUCH_ closer to mainline
2008-09-18 22:03 Flips outside now
2008-09-18 22:03 cu
2008-09-18 22:03 bye guys
2008-09-18 22:04 I used to compile kernels from source back in 2.4 days
2008-09-18 22:05 Mobile irc = busted
2008-09-18 22:05 :)
2008-09-18 22:05 nowadays I use whatever distro provided kernel is available
2008-09-18 22:05 ie. right now I'm running fedora 9 and tracking koji (ie. running 2.6.26.5-42 now)
2008-09-18 22:06 the problem with building your own kernel is it's so freaking complex to get the right config options
2008-09-18 22:06 not to mention you end up with a config no one has tested...
2008-09-18 22:06 and you end up building so many modules you'll never use
2008-09-18 22:07 (make config could really use a detect usb/pci/etc devices present in system and enable those forcibly, disable the rest)
2008-09-18 22:07 I would guess this actually means almost everybody is running a distro provided kernel
2008-09-18 22:11 Thanks!
2008-09-18 23:29 maze, make defconfig is your friend
2008-09-18 23:29 hmm?
2008-09-18 23:29 try it
2008-09-18 23:29 oh, speaking of that, yeah, still
2008-09-18 23:30 I've got a perfectly good kernel someone else deals with
2008-09-18 23:30 and I can compile modules against it
2008-09-18 23:30 I'm happy ;-)
2008-09-18 23:30 you'll get over it
2008-09-18 23:31 that being happy thing
2008-09-18 23:31 these days you just cat your config out of proc
2008-09-18 23:32 cat /proc/config.gz | gunzip | less
2008-09-18 23:32 and lsmod
2008-09-18 23:34 lots of windows peeps reading our mailing list archives
2008-09-18 23:35 chances are, linux hacks running company laptops
2008-09-18 23:35 but you never know
2008-09-18 23:40 or bots trying to be subtle
2008-09-18 23:40 sneakbots
2008-09-18 23:41 flipz_out: do you have stats?
2008-09-18 23:41 maybe
2008-09-18 23:41 installed the stats thing
2008-09-18 23:41 didn't check it
2008-09-18 23:41 what's the command?
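`/proc/config.gz` only exists when the running kernel was built with `CONFIG_IKCONFIG_PROC`. The same information is easy to get at programmatically. A Python sketch, assuming the usual `.config` line formats (`CONFIG_FOO=y`, `CONFIG_FOO=m`, `CONFIG_FOO="string"`, and `# CONFIG_FOO is not set`) and the `/proc/modules` layout, where the module name is the first field on each line. The function names are hypothetical, and module names do not always match their `CONFIG_` symbols, so pairing the two lists still takes some judgment.

```python
import gzip

NOT_SET = " is not set"


def parse_kconfig(text):
    """Map CONFIG_* symbols to their values from .config-style text.

    '# CONFIG_FOO is not set' lines are recorded as 'n'; string values
    lose their surrounding double quotes."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value.strip('"')
        elif line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            options[line[2:-len(NOT_SET)]] = "n"
    return options


def running_kernel_config(path="/proc/config.gz"):
    """Parse the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return parse_kconfig(f.read())


def built_as_modules(options):
    """CONFIG_ symbols configured as loadable modules (=m)."""
    return sorted(key for key, value in options.items() if value == "m")


def loaded_modules(proc_modules_text):
    """Module names from /proc/modules text, the same list lsmod shows."""
    return [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]
```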
2008-09-18 23:41 webalizer
2008-09-18 23:41 it produces html output
2008-09-18 23:42 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-09-18 23:42 lame
2008-09-18 23:42 well
2008-09-18 23:42 by default /var/www/webalizer in debian i think
2008-09-18 23:42 maybe
2008-09-18 23:42 got to get it doing something more sensible
2008-09-18 23:42 only giving me per-month right now
2008-09-18 23:43 I want per-hour
2008-09-18 23:43 monthly for may???
2008-09-18 23:43 wtf
2008-09-18 23:43 Usage Statistics for tux3.org
2008-09-18 23:43 Summary Period: May 2007
2008-09-18 23:43 Generated 18-Sep-2008 23:41 PDT
2008-09-18 23:43 [Daily Statistics] [Hourly Statistics] [URLs] [Entry] [Exit] [Sites] [Referrers] [Search] [Agents] [Locations]
2008-09-18 23:43 Monthly Statistics for May 2007
2008-09-18 23:44 got more important things to do than give enemas to stats scripts
2008-09-18 23:45 1 45 58.44% slashdot.org/comments.pl
2008-09-18 23:47 microsoft seems to be crawling my site with the user-agent
2008-09-18 23:47 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)
2008-09-18 23:48 ah
2008-09-18 23:48 seen that much
2008-09-18 23:48 from diverse ip addresses
2008-09-18 23:48 evil?
2008-09-18 23:48 it's obviously a bot
2008-09-18 23:48 what makes you think it's msftbot?
2008-09-18 23:48 it's grabbing an html file
2008-09-18 23:48 msnbot
2008-09-18 23:48 and not any of the jpgs linked on it
2008-09-18 23:48 also
2008-09-18 23:49 the ip's belong to msft ;)
2008-09-18 23:49 they're not even good at sneaking
2008-09-18 23:50 65.55.109.0/24 and 65.55.110.0/24
2008-09-18 23:51 OrgName: Microsoft Corp
2008-09-18 23:51 and they set the referrer to
2008-09-18 23:51 http://search.live.com/results.aspx?q=camera
2008-09-18 23:51 to make it look like people are using live.com to find my site
2008-09-18 23:51 seems shady
2008-09-18 23:52 where "camera" is any common word which appears in my site
2008-09-18 23:53 shady
2008-09-18 23:53 ballmer style
2008-09-18 23:53 http://ekstreme.com/thingsofsorts/blogging/yell-if-microsofts-livecom-spammed-you-too
2008-09-18 23:54 with all the subtlety of a giraffe in a japanese tea house
2008-09-18 23:54 this has been going on a long time
2008-09-18 23:58 i guess it's them trying to emulate a human
2008-09-18 23:58 seems stupid
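Both clues from the log, the source address ranges and "fetches the HTML but none of the images", are easy to check against an access log. A sketch using Python's standard `ipaddress` module, assuming requests have already been parsed into `(client_ip, path)` pairs; the two networks are the ones quoted above, and the HTML-without-images test is only the heuristic from this conversation, not a reliable bot detector. Function and constant names are illustrative.

```python
import ipaddress

# The two ranges quoted in the log, which whois listed under Microsoft Corp.
SUSPECT_NETS = [
    ipaddress.ip_network("65.55.109.0/24"),
    ipaddress.ip_network("65.55.110.0/24"),
]

PAGE_SUFFIXES = (".html", ".htm")
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


def in_suspect_range(ip):
    """True if the client address falls inside one of SUSPECT_NETS."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SUSPECT_NETS)


def pages_without_images(requests):
    """Clients that fetched HTML pages but never any image.

    `requests` is an iterable of (client_ip, path) pairs."""
    pages, images = set(), set()
    for ip, path in requests:
        lower = path.lower()
        if lower.endswith(PAGE_SUFFIXES):
            pages.add(ip)
        elif lower.endswith(IMAGE_SUFFIXES):
            images.add(ip)
    return pages - images
```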