2008-09-19 00:02 but can it pogo 2008-09-19 00:15 ... 2008-09-19 00:20 Shapor: # Idle priority is VERY cautious about marking block devices idle. If your foreground tasks are using disk, then your background tasks will become noticeably slower, as they get blocked from touching the disks until Linux knows for sure your foreground tasks have all had a chance at the disk. Most of the times, you don't care about this anyway, but don't run a torrent in non-idle class and expect a 20GB copy to finish till the torrent's done! 2008-09-19 00:20 == lame 2008-09-19 00:20 http://friedcpu.wordpress.com/2007/07/17/why-arent-you-using-ionice-yet/ 2008-09-19 00:21 got to be a hint why only phb oriented vendors provide it by default 2008-09-19 00:21 course... maybe that's every vendor ;) 2008-09-19 00:32 yeah its not very fine grained 2008-09-19 00:33 that VERY is a red flag 2008-09-19 00:34 i didn't generate hard data but when i use a combination of ionice and nice compressing log files the system seems more responsive 2008-09-19 00:34 than if i dont use them 2008-09-19 00:34 i dont care how long it takes to compress my log files really 2008-09-19 00:35 so if ANY other io wants my disk let it have it 2008-09-19 00:35 if the log compression never finishes thats ok 2008-09-19 00:35 it is useful even in its current state 2008-09-19 00:36 now i just need a version of cat which puts posix_fadvise in the io event loop so it doesn't piss on my buffer cache either 2008-09-19 00:37 cat --dont-piss-on-my-buffer-cache 2008-09-19 00:37 mount -ttux3 -oloop foodev /mnt 2008-09-19 00:37 we start here. 2008-09-19 00:37 wow! 
we got here 2008-09-19 00:38 my cut n paste of mazes junk just worked 2008-09-19 00:38 not junk 2008-09-19 00:38 junkfs :) 2008-09-19 00:38 junkfs reulz 2008-09-19 00:39 unlike maze, I did not have to reboot my workstation 2008-09-19 00:39 because I ran it under uml 2008-09-19 00:39 worth getting that working 2008-09-19 00:40 heh 2008-09-19 00:40 mount: wrong fs type, bad option, bad superblock on /dev/loop0, 2008-09-19 00:40 or too many mounted file systems 2008-09-19 00:40 (aren't you trying to mount an extended partition, 2008-09-19 00:40 instead of some logical partition inside?) 2008-09-19 00:40 ok, let's get out the junk mop 2008-09-19 00:40 and make it presentable for tux3 checkin #3 2008-09-19 00:41 i thought you were working on extents ;) 2008-09-19 00:41 or #4 if you count the lame git checkin 2008-09-19 00:41 this is related to extents, just as ketchup is related to ice cream 2008-09-19 00:42 maze done good in here 2008-09-19 00:43 I particularly like the little hexdump in 7 lines 2008-09-19 00:43 ACTION cuts it down to 6 2008-09-19 01:00 flipz: where did dir.c come from 2008-09-19 01:00 looks a lot different from fs/ext2/dir.c in a recent kernel 2008-09-19 01:00 dir.c? 2008-09-19 01:00 oh 2008-09-19 01:00 it's the same 2008-09-19 01:01 just marginally cleaned up 2008-09-19 01:01 and got rid of the page wanking 2008-09-19 01:01 when back to buffer ops as god intended 2008-09-19 01:02 changed the interface a bit? 2008-09-19 01:02 not really 2008-09-19 01:02 ext2_create_entry 2008-09-19 01:02 ? 2008-09-19 01:03 pretty much the same 2008-09-19 01:03 that level isn't implemented 2008-09-19 01:03 in tux3 2008-09-19 01:03 well 2008-09-19 01:03 it's in inode.c 2008-09-19 01:03 well i dont see ext2_create_entry in the ext2/dir.c 2008-09-19 01:03 in fact lxr says it doesn't exist 2008-09-19 01:03 try namei.c 2008-09-19 01:03 mknod 2008-09-19 01:03 or something 2008-09-19 01:03 did you rename it? 
2008-09-19 01:03 buncha verbosity 2008-09-19 01:04 were they passing in a dentry before? 2008-09-19 01:04 no 2008-09-19 01:04 it's trivial 2008-09-19 01:04 um 2008-09-19 01:04 ok 2008-09-19 01:04 you want to know the name 2008-09-19 01:04 justa sec 2008-09-19 01:05 ext2_add_link 2008-09-19 01:05 dumb name 2008-09-19 01:05 yeah thats what i thought 2008-09-19 01:06 you changed the interface 2008-09-19 01:06 I really didn't change much 2008-09-19 01:06 did not want to discover new bugz 2008-09-19 01:06 because they create the dentry first 2008-09-19 01:06 and pass that 2008-09-19 01:06 rather than a filename 2008-09-19 01:06 hmm I did a little 2008-09-19 01:06 because no dentries in tux3 userspace 2008-09-19 01:06 and they call ext2_create seperately 2008-09-19 01:07 hmm 2008-09-19 01:07 caught me 2008-09-19 01:07 perhaps there should be 2008-09-19 01:07 to make kernel port easier 2008-09-19 01:07 ext2 is not an exemplary model for namespace structure 2008-09-19 01:07 hmm 2008-09-19 01:07 this is all fs internal 2008-09-19 01:07 ok 2008-09-19 01:08 might as well drop some of the braindamage 2008-09-19 01:08 good call on that though 2008-09-19 01:08 i'm just trying to fix a bug in it ;) 2008-09-19 01:08 bug! 2008-09-19 01:08 i think you did introduce one 2008-09-19 01:08 ;) 2008-09-19 01:08 happens 2008-09-19 01:15 feels like there are too many interfaces in ext2/dir.c 2008-09-19 01:15 yes 2008-09-19 01:15 a linux meme 2008-09-19 01:16 making interfaces looks confusingly like productive work 2008-09-19 01:19 now... why did maze put a wait queue inside the bio 2008-09-19 01:19 looking forward to the explanation ;) 2008-09-19 01:19 ACTION unborks 2008-09-19 01:20 it seems like a lot of stuff is landing in our inode.c 2008-09-19 01:20 sposed to put a pointer to the wait queue there, not the wait queue itself 2008-09-19 01:20 sure 2008-09-19 01:20 inode.c is a toilet 2008-09-19 01:20 heh 2008-09-19 01:20 by tradition 2008-09-19 01:20 dont flush it! 
2008-09-19 01:20 might lose something good 2008-09-19 01:22 ah so the vfs does indeed hand you a dentry 2008-09-19 01:22 not a filename 2008-09-19 01:22 man lxr is fucking slow 2008-09-19 01:23 i'm going to run my own 2008-09-19 01:23 damn europeans much be awake 2008-09-19 01:23 good luck installing it 2008-09-19 01:23 must even 2008-09-19 01:23 let me know how it works out 2008-09-19 01:23 hrm the interface is kinda crap 2008-09-19 01:24 ACTION tries not getting sidetracked making lxr not suck as much 2008-09-19 01:29 shapor, know a shell command for writing a few bytes at the beginning of a file without truncating the file? 2008-09-19 01:30 reiserfs has some weird looking shit in it 2008-09-19 01:30 you don't say 2008-09-19 01:31 take a simple idea and make it weird 2008-09-19 01:31 flipz: dd ? 2008-09-19 01:31 how bout that shell command? 2008-09-19 01:31 ah 2008-09-19 01:31 didn't know it could do that 2008-09-19 01:31 notrunc i think 2008-09-19 01:33 conv=notrunc 2008-09-19 01:33 lets you plop down data in it without truncating 2008-09-19 01:33 dd conv=notrunc if=hello of=foodev 2008-09-19 01:33 dd has a really weird command syntax 2008-09-19 01:34 root@usermode:~# ./tux3 2008-09-19 01:34 we start here. 2008-09-19 01:34 wow! we got here 2008-09-19 01:34 super = 68 65 6C 6C 6F 0A 00 00 00 00 00 00 00 00 00 00 2008-09-19 01:34 mount: Not a directory 2008-09-19 01:34 with maze's 'art' fixed 2008-09-19 01:35 the number of right things in maze's little hack _vastly_ outnumbers the wrong things 2008-09-19 01:35 but the wrong things are doozers ;) 2008-09-19 01:36 "It is rumored to have been based on IBM's JCL, and though the syntax may have been a joke[1], there seems never to have been any effort to write a more Unix-like replacement." 2008-09-19 01:36 from the wikipedia dd page 2008-09-19 01:36 http://en.wikipedia.org/wiki/Dd_(Unix) 2008-09-19 01:36 :p 2008-09-19 01:36 longest running joke in unix 2008-09-19 01:37 dd deprecated? 
2008-09-19 01:37 i think not 2008-09-19 01:37 flipz: we should fix it :) 2008-09-19 01:38 right, if only because we own the name 2008-09-19 01:38 yup 2008-09-19 01:38 ddcp 2008-09-19 01:38 nah too long 2008-09-19 01:38 and its not cp 2008-09-19 01:38 dd --oldbroken 2008-09-19 01:38 dd2 2008-09-19 01:38 dd --muchbetter 2008-09-19 01:39 ddd 2008-09-19 01:39 or how about just "d" 2008-09-19 01:40 dd with a symlink 2008-09-19 01:42 hardlink 2008-09-19 01:42 mandatory 2008-09-19 01:42 provide legacy compatability if the argv[0] is dd 2008-09-19 01:42 otherwise new hawtness 2008-09-19 01:43 root@usermode:~# ./tux3 2008-09-19 01:43 we start here. 2008-09-19 01:43 wow! we got here 2008-09-19 01:43 super = 68 65 6C 6C 6F 0A 00 00 00 00 00 00 00 00 00 00 2008-09-19 01:43 root@usermode:~# mount 2008-09-19 01:43 /dev/ubda on / type ext2 (rw) 2008-09-19 01:43 proc on /proc type proc (rw) 2008-09-19 01:43 devpts on /dev/pts type devpts (rw,gid=5,mode=620) 2008-09-19 01:43 /root/foodev on /mnt type tux3 (rw,loop=/dev/loop0) 2008-09-19 01:43 that's enough for tonight 2008-09-19 01:43 sweet 2008-09-19 01:43 almost ;) 2008-09-19 01:43 so i'm trying to get a backup of your git tree up on github.com 2008-09-19 01:43 how'd you clone it? 
2008-09-19 01:43 they already have linus's tree 2008-09-19 01:44 I failed 2008-09-19 01:44 so i forked it 2008-09-19 01:44 always forget how 2008-09-19 01:44 now i'm just trying to push your changes in to it 2008-09-19 01:44 just clone mine 2008-09-19 01:44 i dont think i can 2008-09-19 01:44 don't rebase to anything 2008-09-19 01:44 well 2008-09-19 01:44 I'll fix that 2008-09-19 01:44 tomorrow 2008-09-19 01:44 you need to have the git service running 2008-09-19 01:44 maybe you do 2008-09-19 01:44 I do 2008-09-19 01:44 it's just configged borkly 2008-09-19 01:45 git ui braindamage as much as anything 2008-09-19 01:45 nothing is obvious 2008-09-19 01:45 telnet phunq.net 9418 2008-09-19 01:45 yeah you do 2008-09-19 01:45 mercurial is altogether more usable in this and other ways 2008-09-19 01:46 we should get the whole vfs running in user space 2008-09-19 01:46 would be killer for testing 2008-09-19 01:46 I'm milding interested in doing a dentry like thing 2008-09-19 01:46 but we have fuse for that, really 2008-09-19 01:46 fuse is... 
2008-09-19 01:46 ugh 2008-09-19 01:46 we just need to use it better 2008-09-19 01:46 yeah 2008-09-19 01:46 true 2008-09-19 01:46 we're really fitting sideways into it right now 2008-09-19 01:46 yeah its gross 2008-09-19 01:46 I'm amazed anything at all works 2008-09-19 01:47 the bug i was fixing 2008-09-19 01:47 is trying to create a file with a name which is too long 2008-09-19 01:47 returns an error 2008-09-19 01:47 that its too long 2008-09-19 01:47 as it should 2008-09-19 01:47 but creates it anyway 2008-09-19 01:47 oh, bad 2008-09-19 01:47 with the name truncated 2008-09-19 01:47 and no inode 2008-09-19 01:47 naughty 2008-09-19 01:47 its fucked 2008-09-19 01:47 heh 2008-09-19 01:47 I doubt that was my idea 2008-09-19 01:47 its a case you never get in dir.c 2008-09-19 01:48 because it checks when it creates the dentry 2008-09-19 01:48 nice catch 2008-09-19 01:48 in the vfs 2008-09-19 01:48 since we dont use it 2008-09-19 01:48 lamissimo 2008-09-19 01:48 yeah 2008-09-19 01:48 "always check your inputs" 2008-09-19 01:48 yeah 2008-09-19 01:48 that one musta got quietly slipped by ted 2008-09-19 01:48 subtle due to a minor interface change 2008-09-19 01:49 and its not liek dentries have fixed sized strings in d_name 2008-09-19 01:49 so there is no hard maximum 2008-09-19 01:49 its just supposed be be checked 2008-09-19 01:49 i dunno the limit is rediculously short 2008-09-19 01:50 i think 255 bytes maybe 2008-09-19 01:50 why not allow for long filenames 2008-09-19 01:50 that's considered long 2008-09-19 01:51 i suppose 2008-09-19 01:51 silly limitation 2008-09-19 01:51 useful silly limitation 2008-09-19 01:51 of course it comes from wanting to represent the length with a byte 2008-09-19 01:51 so you can have fixed size dentries? 2008-09-19 01:51 fixed size? 2008-09-19 01:52 i'm asking 2008-09-19 01:52 is that the reason? 
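[Editor's sketch, not part of the log: the "always check your inputs" point in userspace C. The bug described above is a create that reports "name too long" but still leaves a truncated entry behind. The toy directory below validates the length before it touches anything, so a failed call has no side effects. All names (toy_dir, toy_create_entry, TOY_NAME_MAX) are made up, and the 255-byte limit is the figure guessed at in the chat, not a verified constant for any particular filesystem.]

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAME_MAX 255	/* assumed limit: the length fits in one byte */
#define TOY_DIR_SLOTS 8

struct toy_dirent {
	unsigned char name_len;		/* one-byte length field */
	char name[TOY_NAME_MAX];
};

struct toy_dir {
	struct toy_dirent slot[TOY_DIR_SLOTS];
	unsigned count;
};

/*
 * Add an entry to the toy directory.
 * Returns 0 on success or a negative errno. On any error the
 * directory is left exactly as it was.
 */
int toy_create_entry(struct toy_dir *dir, const char *name, size_t len)
{
	/* Validate first, before modifying the directory at all. */
	if (len == 0)
		return -EINVAL;
	if (len > TOY_NAME_MAX)
		return -ENAMETOOLONG;
	if (dir->count == TOY_DIR_SLOTS)
		return -ENOSPC;

	struct toy_dirent *de = &dir->slot[dir->count];
	de->name_len = (unsigned char)len;
	memcpy(de->name, name, len);
	dir->count++;
	return 0;
}
```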
2008-09-19 01:52 oh i see 2008-09-19 01:52 they certainly aren't fixed size 2008-09-19 01:53 hm one byte lengths 2008-09-19 01:53 true, useful 2008-09-19 01:53 qstr 2008-09-19 01:53 yeah 2008-09-19 01:53 was looking at that earlier 2008-09-19 01:53 len is int 2008-09-19 01:54 it's checked somewhere but you're right 2008-09-19 01:54 ext3 is violating, not checking it 2008-09-19 01:54 ext2 2008-09-19 01:54 __d_path or something 2008-09-19 01:54 playing fast and loosey goosey 2008-09-19 01:55 er no 2008-09-19 01:55 where do the dentries get created? 2008-09-19 01:55 i guess i should look top-down 2008-09-19 01:55 start with sys_open 2008-09-19 01:56 rather than bottom up 2008-09-19 01:56 somewhere in path_walk 2008-09-19 02:01 ah yeah, just got there 2008-09-19 02:01 3440 objp = ____cache_alloc(cache, flags); 2008-09-19 02:01 :p 2008-09-19 02:01 damn thats twisty 2008-09-19 02:01 guy who invented slab also invented zfs 2008-09-19 02:02 eh? 2008-09-19 02:02 course I doubt he wrote four underbars there 2008-09-19 02:02 true 2008-09-19 02:03 this is by way of checking whether kmalloc returns ERR_PTR or just NULL on error 2008-09-19 02:03 seems to be the latter 2008-09-19 02:03 http://lxr.linux.no/linux+v2.6.26.5/fs/namei.c#L869 2008-09-19 02:03 maze naively assumed otherwise, of course maze is right and we are wrong 2008-09-19 02:04 but he have to match linux fart for fart 2008-09-19 02:05 -!- kushal(~kushal@121.246.36.162) has joined #tux3 2008-09-19 02:07 so i got all the way down to http://lxr.linux.no/linux+v2.6.26.5/fs/dcache.c#L1241 2008-09-19 02:07 but i can't find the damn length check 2008-09-19 02:07 let me know ;) 2008-09-19 02:07 try get_name 2008-09-19 02:09 bah better to do this during the day when europeans are sleep and lxr is fast 2008-09-19 02:10 heh 2008-09-19 02:10 you're poking in the right place 2008-09-19 02:10 by the time lxr loads i've lost my train of thought 2008-09-19 02:10 right 2008-09-19 02:10 it would be useful to install it 2008-09-19 
02:10 then you can teach me 2008-09-19 02:11 involves postgres & mod_perl 2008-09-19 04:15 -!- kushal_(~kushal@121.246.36.194) has joined #tux3 2008-09-19 05:13 -!- kushal(~kushal@121.246.36.194) has joined #tux3 2008-09-19 07:52 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-19 08:34 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-19 09:27 -!- guile(~guile@89-159-217-245.rev.numericable.fr) has joined #tux3 2008-09-19 09:28 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-19 09:35 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-19 10:46 -!- Kirantpatil(~kiran@122.167.197.109) has joined #tux3 2008-09-19 10:46 -!- Kirantpatil(~kiran@122.167.197.109) has left #tux3 2008-09-19 11:09 actually whether I had used IS_ERR and ERR_PTR correctly was something else I'd been meaning to ask ;-) 2008-09-19 11:12 flipz: "Ext3cow was designed as a platform for regulatory compliance, and has been used to implement secure deletion, authenticated encryption, and incremental authentication. See the publications page for more details." 2008-09-19 11:13 http://www.ext3cow.com/Publications.html 2008-09-19 11:13 about idle - not lame - idle class is not meant to affect anything else using io, neither b/w wise nor latency wise - hence it has to be very conservative, if that's not what you want... don't use idle (and even then idle can still impact performance of non-idle tasks...) 2008-09-19 11:14 and ionice has more classes than just idle 2008-09-19 11:15 exactly two more 2008-09-19 11:15 although it's not as powerful as it could/should be 2008-09-19 11:15 although i think idle is the most appealing one 2008-09-19 11:16 its common to want to do io intensive tasks in the background like backups or whatnot 2008-09-19 11:16 yeah, using kvm now 2008-09-19 11:16 ionice'ing a kvm session? 
2008-09-19 11:16 mind you of course, all my printk's are non-multithreaded-printk compatible - who cares ;-) [for now] 2008-09-19 11:18 I put the wq in the bio, cause I needed something to wait on... was there something else I could wait on, and wake up from the endbio func? 2008-09-19 11:18 uhm, what's wrong with just putting the wq there? what use is the extra level of indirection? 2008-09-19 11:19 If you do install your own lxr - pass links to it ;-) 2008-09-19 11:19 we don't need all kversions 2008-09-19 11:19 dd can write bytes without trunc 2008-09-19 11:20 ah, there it is in the log - still catchingup 2008-09-19 11:20 ;) 2008-09-19 11:20 hey what did you expect... I have no bloody idea what I'm doing ;-) [about the wrong things being doozers] 2008-09-19 11:20 hrm it would be cool if lxr could be tied to a git repo 2008-09-19 11:21 and dd is weird... but it works and is everywhere... 2008-09-19 11:21 or is it already 2008-09-19 11:22 you know what is annoying is the number of clicks you need to do to download anything from sourceforge 2008-09-19 11:22 ah yeah it talks to git 2008-09-19 11:25 yeah I was thinking I should be checking both for errors and null... 2008-09-19 11:26 kvm, and ionice, no was referring to running my tests in kvm, like flips is in uml 2008-09-19 11:27 clicks - yeah agreed 2008-09-19 11:27 Ah, caught up.... 
2008-09-19 11:27 seems you guys had a productive night 2008-09-19 11:27 mine was as well 2008-09-19 11:27 first time in a long time that I'm not sleepy before noon 2008-09-19 11:27 i was going to ask what the problem was with putting the wq in the bio as well 2008-09-19 11:28 whats the difference if you put a pointer there 2008-09-19 11:28 well, I need both the wq, and a bool 2008-09-19 11:28 so I put in a pointer to a struct with both 2008-09-19 11:28 (also should probably have an error return field in there) 2008-09-19 11:30 ok, back to work 2008-09-19 11:56 + bio->bi_io_vec[bio->bi_vcnt] = (struct bio_vec){ 2008-09-19 11:56 +   .bv_page = virt_to_page(buf), 2008-09-19 11:56 +   .bv_offset = offset_in_page(buf), 2008-09-19 11:56 +   .bv_len = SB_SIZE }; 2008-09-19 11:56 + bio->bi_size = SB_SIZE; 2008-09-19 11:56 + bio->bi_end_io = end_io_read; 2008-09-19 11:56 + bio->bi_private = &mz; 2008-09-19 11:56 + bio->bi_vcnt = 1; 2008-09-19 11:57 either that should be bio->bi_io_vec[0] = ... 2008-09-19 11:57 or bio->bi_vcnt++; 2008-09-19 11:59 plus putting the wq on the stack is stack bloat - isn't that bad if we want 4k stacks? 2008-09-19 12:00 actually, adding some sort of debug stack depth tracking might be useful. 
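[Editor's sketch, not part of the log: the bi_vcnt point from 11:57 in userspace C. The pasted hunk indexes the vector array by the current count and then sets the count to 1, which only agrees if the count started at 0. Storing at the current count and incrementing in the same expression keeps index and count in step. toy_bio, toy_vec and toy_bio_add are made-up stand-ins, not the kernel's struct bio or any kernel helper.]

```c
#include <stddef.h>

#define TOY_MAX_VECS 4

struct toy_vec {
	void *base;
	unsigned len;
};

struct toy_bio {
	struct toy_vec vec[TOY_MAX_VECS];
	unsigned short vcnt;	/* number of valid entries in vec[] */
};

/* Append one segment. Returns 0 on success, -1 if the toy bio is full. */
int toy_bio_add(struct toy_bio *bio, void *base, unsigned len)
{
	if (bio->vcnt >= TOY_MAX_VECS)
		return -1;

	/* Store at the current count, then bump it: the two cannot disagree. */
	bio->vec[bio->vcnt++] = (struct toy_vec){ .base = base, .len = len };
	return 0;
}
```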
2008-09-19 12:00 record deepest spot on stack ever hit in your code 2008-09-19 12:00 hmm, maybe the kernel already does that automatically 2008-09-19 12:07 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-19 12:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-19 13:16 maze, ping 2008-09-19 13:17 maze, your ERR_PTR is mostly wrong, where you've applied it to functions that only return ptr or null 2008-09-19 13:17 entirely wrong to be precise ;) 2008-09-19 13:19 maze, putting the work queue in the bio is inherently fragile, the bio can disappear 2008-09-19 13:19 put a pointer to the work queue in the bio 2008-09-19 13:20 right, the wq is in a pointer in the bio 2008-09-19 13:20 yeah checking whether err-ptr is required or not was a todo 2008-09-19 13:20 but of course that's actually not documented anywhere 2008-09-19 13:20 useful, like at the top of said functions 2008-09-19 13:21 the way you've done it, you've got double indirection - I've got single 2008-09-19 13:21 maze, it would be very cool if lxr could be tied to a repo... think about versioned indexes :) 2008-09-19 13:21 heh 2008-09-19 13:21 you either indirect on the wait, or indirect on the complete 2008-09-19 13:22 the wait indirection will be executed more often than the complete 2008-09-19 13:23 bio->bi_vcnt++ would be an improvement 2008-09-19 13:24 the mz goes on the stack anyway 2008-09-19 13:24 if we're that tight for stack space we should not be doing 4K stacks 2008-09-19 13:24 (which people are slowly learning) 2008-09-19 13:31 Iceweasel can't find the server at m.a.z.e.pl. 2008-09-19 13:38 maze, by the way, a wait queue is tiny 2008-09-19 13:38 just a spinlock and a list 2008-09-19 13:38 yeah my university is migrating to a new building 2008-09-19 13:38 why does the mz go on stack if it's kmalloc'ed? 
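[Editor's sketch, not part of the log: why an IS_ERR-style check is the wrong test for a function that reports failure as NULL. The helpers below are userspace re-creations modeled on the kernel's ERR_PTR, PTR_ERR and IS_ERR, written in lowercase to make clear they are not the kernel's own definitions. alloc_or_null and alloc_or_err are made-up examples of the two conventions.]

```c
#include <errno.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

/* Recover the errno from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* True only for pointers in the top TOY_MAX_ERRNO values of the address space. */
static inline int is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-TOY_MAX_ERRNO;
}

/* Convention 1: failure is NULL (the kmalloc style discussed above). */
void *alloc_or_null(int fail)
{
	return fail ? NULL : malloc(16);
}

/* Convention 2: failure is an errno encoded in the returned pointer. */
void *alloc_or_err(int fail)
{
	return fail ? err_ptr(-ENOMEM) : malloc(16);
}
```

[Here is_err(alloc_or_null(1)) is false, so a caller that only checks is_err would carry on with a NULL pointer. Each function has to be checked in the convention it actually uses.]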
2008-09-19 13:39 my mz wasn't kmalloced 2008-09-19 13:39 oh 2008-09-19 13:39 can't see your original code any more 2008-09-19 13:39 good to post that kind of thing to the list 2008-09-19 13:39 it was a nice hack 2008-09-19 13:39 very nice 2008-09-19 13:39 which hack? 2008-09-19 13:39 junkfs 2008-09-19 13:39 oh 2008-09-19 13:40 too bad bio is such a bloaty interface 2008-09-19 13:40 not easy to make useful helpers for it 2008-09-19 13:42 I have ;-) 2008-09-19 13:42 int bioio(int rw, dev_t dev, sector_t sector, unsigned size, 2008-09-19 13:42 endio_t endio, void *private, unsigned vecs, struct page *page, 2008-09-19 13:42 unsigned off, unsigned len); 2008-09-19 13:42 :p 2008-09-19 13:43 yeah, should probably call it something like synchronous_bio_io 2008-09-19 13:44 can shell it as syncbio 2008-09-19 13:44 or synchbio standing for synch[ronous]_b[io]_io 2008-09-19 13:44 don't want sync since that means something else 2008-09-19 13:44 not really 2008-09-19 13:44 it's just a part of a sync 2008-09-19 13:44 well, it won't sync to disk 2008-09-19 13:44 oh, wait it will 2008-09-19 13:44 it will 2008-09-19 13:44 uhm, even if it's an lvm volume? 2008-09-19 13:44 syncbio is the one 2008-09-19 13:45 yes 2008-09-19 13:45 right it will 2008-09-19 13:45 the page cache is above this level 2008-09-19 13:45 things get screwy when virtual block devices cache 2008-09-19 13:45 which some do 2008-09-19 13:45 I'm still not quite clear on how to do barriers and permit reordering in the elevator at this level 2008-09-19 13:45 and they get screwy 2008-09-19 13:45 barriers are a big mess 2008-09-19 13:46 mostly we just close our eyes and try to do simple things 2008-09-19 13:46 but anyway 2008-09-19 13:46 as you can tell ... 
I don't do/like simple 2008-09-19 13:46 a barrier is a flag on any bio 2008-09-19 13:46 bad idea actually 2008-09-19 13:46 I like powerful - shoot yourself in the foot things ;-) 2008-09-19 13:46 barrier should be separate bio 2008-09-19 13:46 but a barrier should be more like a pointer to another bio(s) which should be first 2008-09-19 13:46 this write must happen after those writes 2008-09-19 13:46 maybe 2008-09-19 13:47 a new barrier api would be a nice contribution 2008-09-19 13:47 no reason for barriers between fs'es on two partitions on the same bdev 2008-09-19 13:47 current one is teh suck 2008-09-19 13:47 and I don't think you need barriers on read... 2008-09-19 13:47 you do 2008-09-19 13:47 I'll even remember why 2008-09-19 13:47 not badly 2008-09-19 13:47 why? something net related? 2008-09-19 13:47 but its the same as memory ops 2008-09-19 13:48 need all combinations if you look hard enough 2008-09-19 13:48 oh, but then it's a these rights must hit disk before this read 2008-09-19 13:48 s/rights/writes/ 2008-09-19 13:48 that kind of thing 2008-09-19 13:48 barriers between readS? 2008-09-19 13:48 barries between reads... hmm 2008-09-19 13:48 I'd think no 2008-09-19 13:48 probably tackling something at the wrong level 2008-09-19 13:48 I can see writes -> barrier -> writes/reads 2008-09-19 13:49 I don't see reads -> barrier -> reads, nor reads -> barrier -> writes 2008-09-19 13:49 although I guess what exactly should happen if read, write to same sector gets reordered... hmm. 2008-09-19 13:49 the arrow directions are ambiguous 2008-09-19 13:49 arrows pointing out time 2008-09-19 13:49 flow 2008-09-19 13:49 commas do that ;) 2008-09-19 13:50 will we really have to fix the bdev interface first? 2008-09-19 13:50 I can see reads/barrier/writes 2008-09-19 13:50 but not a strong case 2008-09-19 13:50 the bdev barrier interface? 
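[Editor's sketch, not part of the log: one possible reading of "a barrier should be more like a pointer to another bio(s) which should be first". Each request lists the requests that must complete before it, and anything not listed stays free to be reordered. This is a toy model for illustration only. toy_req and toy_req_ready are made-up names and do not correspond to any existing block layer interface.]

```c
#include <stddef.h>

#define TOY_MAX_DEPS 4

struct toy_req {
	int done;				/* set once the I/O has completed */
	struct toy_req *after[TOY_MAX_DEPS];	/* requests that must finish first */
	unsigned ndeps;
};

/*
 * A request may be issued once every request it depends on is done.
 * Requests with no dependencies are always ready.
 */
int toy_req_ready(const struct toy_req *req)
{
	for (unsigned i = 0; i < req->ndeps; i++)
		if (!req->after[i]->done)
			return 0;
	return 1;
}
```

[In the journalling case, the commit record would list the journal writes in after[], while an unrelated write on another partition of the same device would list nothing and so would never wait.]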
2008-09-19 13:50 yes, it's naive 2008-09-19 13:50 well, and the prio interface 2008-09-19 13:51 there is none 2008-09-19 13:51 get it all kind of nice and usable 2008-09-19 13:51 the prio ideas are just a hack in one elevator option 2008-09-19 13:51 we (I?) need a bdev interface which is aio read/write scatter/gather with priorities htb-like and barriers 2008-09-19 13:52 true 2008-09-19 13:52 be happy to work on it with you 2008-09-19 13:52 a lot of it is there 2008-09-19 13:52 a lot isn't 2008-09-19 13:52 I have plenty of apps 2008-09-19 13:52 starting with media... 2008-09-19 13:53 how exactly barriers should work is an interesting question 2008-09-19 13:53 yes 2008-09-19 13:53 you don't want them to be too strong 2008-09-19 13:53 or awkward 2008-09-19 13:53 but strong enough to implement the consistency the fs needs 2008-09-19 13:53 you want it to solve the primary problem well, which is journalling 2008-09-19 13:53 exactly 2008-09-19 13:54 and it has to take into consideration real world disks 2008-09-19 13:54 and the fact they spin/seek - something to be very aware of when working on this, since impacts priorities much 2008-09-19 13:54 and you might desire consistency x-dev 2008-09-19 13:54 need to write to journal dev before hitting base dev 2008-09-19 13:54 maze, notice there is nothing read-specific about your endio 2008-09-19 13:55 *cute* 2008-09-19 13:55 needs a different name 2008-09-19 13:55 I know 2008-09-19 13:55 hey it was a hack ;-) 2008-09-19 13:55 not any more 2008-09-19 13:55 hehe 2008-09-19 13:55 right 2008-09-19 13:56 maze, I don't know what I was going on about with your bio private pointer, your usage is fine 2008-09-19 13:56 on the stack is more leet 2008-09-19 13:57 kmallocs are bad things 2008-09-19 13:57 I don't know 2008-09-19 13:57 fragment 2008-09-19 13:57 fragile 2008-09-19 13:57 stack is small nowadays 2008-09-19 13:57 not that small 2008-09-19 13:57 yeah, I've wanted to see exactly how much stack space I actually have 2008-09-19 
13:57 for my leet new fs idea, I actually need to be very careful 2008-09-19 13:58 sure 2008-09-19 13:58 since with both a net layer and an fs layer it might get tight 2008-09-19 13:58 but this is on the other side of too careful 2008-09-19 13:58 mayhaps 2008-09-19 13:58 I'm still new ;-) 2008-09-19 13:58 you are? 2008-09-19 13:59 I think you're past 50 percentile in hacking skills of people who call themselves that 2008-09-19 13:59 kernel hacking 2008-09-19 13:59 another couple months will get you past 90 2008-09-19 13:59 I still have no idea about anything yet ;-) 2008-09-19 13:59 you think anybody else does? 2008-09-19 13:59 how'd we get all that crap in kernel if anybody had a clue? 2008-09-19 14:00 oh one thing, there are a few null statements in your code that you may not think are there 2008-09-19 14:01 extra semicolons 2008-09-19 14:05 oh, well I like semicolons 2008-09-19 14:05 think every } should be followed by a ; 2008-09-19 14:06 heading to grab lunch 2008-09-19 14:06 (except in } else {) 2008-09-19 14:06 and C/C++ just has bad syntax with semicolon 2008-09-19 14:06 s 2008-09-19 14:07 I assume they'll be gone ;) 2008-09-19 14:08 extra parents and curlies are also frowned at, but extra semicolons are cause for shouting 2008-09-19 14:08 extra parens I mean 2008-09-19 14:08 extra parents are probably ok, particularly in utah 2008-09-19 14:11 let em shout 2008-09-19 14:12 hmm, I wonder 2008-09-19 14:12 is it true that removing a semicolon will either 2008-09-19 14:12 a) result in code functioning the exact same way as before 2008-09-19 14:12 or 2008-09-19 14:12 b) result in a compile failure 2008-09-19 14:13 probably noy 2008-09-19 14:13 extra semicolons make the code more fragile 2008-09-19 14:13 you can get a big surprise if somebody adds a seemingly innocuous conditional 2008-09-19 14:14 in theory there is no effect on generated code 2008-09-19 14:14 in practice, theory and practice are different 2008-09-19 14:15 yeah while I know ';' is a statement 
seperator, I much prefer to think of them as end-of-statement markers 2008-09-19 14:15 hmm 2008-09-19 14:15 closet pascal groupie ;) 2008-09-19 14:15 maybe I got that backwards 2008-09-19 14:15 well 2008-09-19 14:15 I love pascal syntax 2008-09-19 14:15 soor 2008-09-19 14:15 sorry 2008-09-19 14:16 yes, backwards 2008-09-19 14:16 but still a closet pascaller I think 2008-09-19 14:16 whatever - everything should end with a ';' 2008-09-19 14:16 semicolons are stupid 2008-09-19 14:16 should be optional 2008-09-19 14:16 designers of c are/were stupid 2008-09-19 14:16 but since they are there, have to use them lindentally 2008-09-19 14:16 imho should be required ;-) 2008-09-19 14:17 every line should be required to have two semicolons, one at the beginning, one at the end 2008-09-19 14:17 because you need them anyway, - can't live without em 2008-09-19 14:17 nah 2008-09-19 14:17 every statement should end with a ';' 2008-09-19 14:18 might be at the end of line, might be in the middle, might extend into the next line 2008-09-19 14:18 whitespace shouldn't matter (although could cause compiler warnings) 2008-09-19 14:18 e l s e 2008-09-19 14:19 anway 2008-09-19 14:19 let's not go there ;) 2008-09-19 14:19 else should always be } else { 2008-09-19 14:19 you either have a simple if (blah) something; 2008-09-19 14:19 not considered lindenty to have curlies around single statements 2008-09-19 14:19 or an if () { ... } else { ... 
}; 2008-09-19 14:19 not saying I think that's good or bad, it's just not lindenty 2008-09-19 14:20 yeah, I know 2008-09-19 14:20 my personal opinion, is: 2008-09-19 14:20 either it's short and sweet fits on a line if (something) something; 2008-09-19 14:21 or should be the full multi-line if () {\n ...\n } else {\n ...\n };\n 2008-09-19 14:21 my personal opinion is, if it's written in C is going to look ugly and there is little you can do about it 2008-09-19 14:21 possibly without the else clause if not needed 2008-09-19 14:21 break your heart trying 2008-09-19 14:21 true 2008-09-19 14:21 folks 2008-09-19 14:22 yes, fixing C is something I've thought of, codenamed 'the language advanced', a curious mix of pascal/c/c++/java/asm/gnu-isms/lisp 2008-09-19 14:22 but besides thinking about it never got anywhere 2008-09-19 14:22 (never tried) 2008-09-19 14:22 bh: hey 2008-09-19 14:22 your brainpower is needed more badly elsewhere ;) 2008-09-19 14:22 hehe 2008-09-19 14:23 but if you write the language first, you can then write the kernel in a language which doesn't suck... 
2008-09-19 14:23 you can and nobody will care 2008-09-19 14:23 agreed 2008-09-19 14:23 still an interesting exercise 2008-09-19 14:23 a disappear for years exercise 2008-09-19 14:23 true 2008-09-19 14:24 hence the 'haven't ever tried' part 2008-09-19 14:24 save it for when you're old 2008-09-19 14:24 show those whippersnappers 2008-09-19 14:24 I'm hoping someone else will do it 2008-09-19 14:24 they will 2008-09-19 14:24 or I'll get some smart/bright friends and students to do it 2008-09-19 14:24 there's always somebody with enough time on their hands to write an os from scratch 2008-09-19 14:25 they get 15 minutes of slashdot fame and a nice job where they can stew 2008-09-19 14:25 hehe 2008-09-19 14:25 if/when I go back to school to get my phd, I've been thinking about leading a course for some of the best'n'brightest with design and implementation of a language or os as the topic 2008-09-19 14:26 you can practice here 2008-09-19 14:26 anyway, back to earth 2008-09-19 14:26 you're already TA at tux3u 2008-09-19 14:26 sure 2008-09-19 14:26 got to think about the next level for junkfs/tux3fs 2008-09-19 14:26 right 2008-09-19 14:27 right now I'm trying to think of what I want from the mm subsystem for my fs 2008-09-19 14:27 it's cool how tux3fs is both ramfs and diskfs now, hmm? 2008-09-19 14:27 hehe 2008-09-19 14:27 that's the most instructive thing so far 2008-09-19 14:27 re vfs 2008-09-19 14:28 and at which layer of the vfs (generics for most ops or not) the interfaces need to happen 2008-09-19 14:28 linux kinda gets it right, it's just warty 2008-09-19 14:28 also... 
error trapping 2008-09-19 14:29 I'd like to see a stack unwinding/resource recovery discipline 2008-09-19 14:29 also wondering if implementing reads by userspace (with appropriate aligned buffer) by unmap and map in ro cow pages from page cache or somewhere else would be appropriate and fast and/or sow 2008-09-19 14:29 slow 2008-09-19 14:29 it would be appropriate even on linux 2008-09-19 14:29 and if there's any race there 2008-09-19 14:29 there are cases where it's slow, but in general it's powerful 2008-09-19 14:29 just linux isn't orgainized that way 2008-09-19 14:30 linux has a loopy approach 2008-09-19 14:30 very naive 2008-09-19 14:30 would get zero copy reads, and most of the time you don't write over that data 2008-09-19 14:30 as in... too many loops 2008-09-19 14:30 yes, and it gets even more fun with net + disk + buffer + both ways 2008-09-19 14:30 exactly 2008-09-19 14:30 hence I'm thinking of this as a two layer fs to begin with 2008-09-19 14:30 that's why nobody has been crazy enough to attempt it 2008-09-19 14:30 look a splice 2008-09-19 14:30 simple thing 2008-09-19 14:31 big disaster 2008-09-19 14:31 I'm not actually sure where splice is atm? 2008-09-19 14:31 what happened there? last I knew there was an exploit and fix and exploit and fix... 
2008-09-19 14:31 freesearch lxr 2008-09-19 14:32 it's a feature build on a base of jello 2008-09-19 14:33 there does appear to be a way to get your own page to trigger on all sorts of page operations, so that's good 2008-09-19 14:33 oh yes 2008-09-19 14:33 it's fun 2008-09-19 14:33 meant code not page 2008-09-19 14:33 "stupid page tricks" 2008-09-19 14:34 yeah, but I'm guessing it's needed for decent cache coherency 2008-09-19 14:34 even if it will mean locking will effectively end up being page (not byte-range) based 2008-09-19 14:35 anyway, I figure it's important to know what's possible, to know what can be later implemented, and to design the possibility in from the start 2008-09-19 14:35 you'll be getting into vm soon enough 2008-09-19 14:36 you can help me with the variable page rewrite if you like 2008-09-19 14:36 linus said he would open 2.7 if I did that hack 2008-09-19 14:36 I've realized that the xattr interface can probably be used as a nice ioctl layer for the fs 2008-09-19 14:36 it can? 2008-09-19 14:36 yeah like setting fs.tux3.option = something on an inode 2008-09-19 14:37 and then reading it back 2008-09-19 14:37 have the stuff be auto-generated 2008-09-19 14:37 ioctls would not be pleasant for that 2008-09-19 14:37 and have options for stuff like type of optimizations to be used on this file or etc 2008-09-19 14:37 we have ddlink for that 2008-09-19 14:37 I think xattr is nice here - although haven't looked at ddlink 2008-09-19 14:37 reiser5 ;) 2008-09-19 14:37 hmm? 2008-09-19 14:37 ddlink is cool 2008-09-19 14:38 really cook 2008-09-19 14:38 cool 2008-09-19 14:38 reiser5? what's with reiser4? 2008-09-19 14:38 "dead" 2008-09-19 14:38 is reiser even being worked on? 2008-09-19 14:38 slowly 2008-09-19 14:38 very slowly 2008-09-19 14:39 is reiser 4 done? stable? dropped? 
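The xattr-as-ioctl idea above can be sketched from the userspace side with the standard Linux `setxattr`/`getxattr` calls. Everything tux3-specific here is an assumption: the log only floats a name like `fs.tux3.option`, and the sketch uses the `user.` namespace because that is generally the one unprivileged processes may write. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical prefix; tux3 defines no such namespace in the log. */
#define TUX3_XATTR_PREFIX "user.tux3."

/* Build "user.tux3.<option>" into buf; 0 on success, -1 if it does not fit. */
static int tux3_option_name(char *buf, size_t size, const char *option)
{
    int n = snprintf(buf, size, "%s%s", TUX3_XATTR_PREFIX, option);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Set a per-file option; 0 on success, -1 on failure (errno set by setxattr, e.g. ENOTSUP). */
static int tux3_set_option(const char *path, const char *option,
                           const char *value)
{
    char name[256];

    if (tux3_option_name(name, sizeof name, option) < 0)
        return -1;
    return setxattr(path, name, value, strlen(value), 0);
}

/* Read an option back as a NUL-terminated string; length or -1. */
static ssize_t tux3_get_option(const char *path, const char *option,
                               char *value, size_t size)
{
    char name[256];
    ssize_t n;

    assert(size > 0);               /* need room for the terminator */
    if (tux3_option_name(name, sizeof name, option) < 0)
        return -1;
    n = getxattr(path, name, value, size - 1);
    if (n >= 0)
        value[n] = '\0';
    return n;
}
```

A call such as `tux3_set_option("file", "option", "something")` followed by `tux3_get_option` would mirror the "set it, then read it back" flow described in the log, provided the underlying filesystem supports user xattrs.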
2008-09-19 14:39 quasi stable 2008-09-19 14:40 should be merged, under a different name imho 2008-09-19 14:41 chris mason was one of the big driving forces on reiser, at least reiser 3, and he's entirely devoted to btrfs now 2008-09-19 14:41 which is something like reiser 3.5 2008-09-19 15:00 ah 2008-09-19 15:00 I came up with a few interesting network fs related ideas last night 2008-09-19 15:00 was a very productive bath ;-) 2008-09-19 15:01 works for me too, showers though 2008-09-19 15:01 something about that running water 2008-09-19 15:01 settles the lame ideas, let's bouyant ones float to the top 2008-09-19 15:11 maze, 2008-09-19 15:11 while (vecs--) 2008-09-19 15:11 bio->bi_io_vec[bio->bi_vcnt++] = va_arg(args, struct bio_vec); 2008-09-19 15:13 int bio(int rw, dev_t dev, sector_t sector, bio_end_io_t endio, void *private, unsigned vecs, ...) 2008-09-19 16:57 what are you folks going to finished the file system ? 2008-09-19 16:57 what=when 2008-09-19 16:57 are were there yet ? 2008-09-19 16:58 ACTION grins 2008-09-19 17:31 gregkh is an idiot 2008-09-19 17:32 oh was that public 2008-09-19 17:32 http://dustinkirkland.wordpress.com/2008/09/18/whats-behind-gregkhs-latest-rant/ 2008-09-19 17:32 wouldn't be so bad if he could design, code or debug 2008-09-19 17:33 bh, we're getting closer 2008-09-19 17:33 the kernel port is getting a little attention 2008-09-19 17:33 needs a lot more 2008-09-19 17:56 true fact: the linux kernel makefile is 1600 lines long 2008-09-19 18:01 541 KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,) <- this is the line we kill to enable inline decls 2008-09-19 18:01 I guess we are going to do taht 2008-09-19 18:01 for now until just before merge 2008-09-19 18:09 sk8 oclock 2008-09-19 18:09 one could say sk8teen oclock 2008-09-19 18:18 -!- BSD(~bandan@70-4-203-156.area3.spcsdns.net) has joined #tux3 2008-09-19 18:26 -!- BSD(~bandan@70-4-203-156.area3.spcsdns.net) has joined #tux3 2008-09-19 19:02 -!- 
BSD(~bandan@70-4-203-156.area3.spcsdns.net) has joined #tux3 2008-09-19 19:46 -!- BSD(~bandan@68-244-245-217.area3.spcsdns.net) has joined #tux3 2008-09-19 19:57 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-19 20:42 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-19 20:51 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-19 21:20 Results 1 - 10 of about 283,000 for tux3. 2008-09-19 21:20 up 100k in a day 2008-09-19 21:20 wonder what happened 2008-09-19 21:21 100k...hits? 2008-09-19 21:21 musta been the waking post 2008-09-19 21:21 100k up in one day, yes 2008-09-19 21:21 damn 2008-09-19 21:21 the internet loves wanking I guess 2008-09-19 21:21 hey, did you guys want a faster lxr? 2008-09-19 21:21 very much 2008-09-19 21:22 ok, i just got one going on my home quad, i gotta tweak postgres and we should be ready to rock 2008-09-19 21:22 excellent 2008-09-19 21:22 of course bandwidth might be a problem 2008-09-19 21:22 your admin skillz rock 2008-09-19 21:22 shapor can fix that 2008-09-19 21:22 shap can fix anything, that bastard! :) 2008-09-19 21:23 truth 2008-09-19 21:23 do you need the free text searches? that's some extra software that i'd have to get/configure 2008-09-19 21:23 oh yes 2008-09-19 21:23 the whole enchilada 2008-09-19 21:23 freetext is essential 2008-09-19 21:23 ok, i'll go play with that 2008-09-19 21:24 thanks much 2008-09-19 21:24 have to run out before whole foods closes 2008-09-19 21:24 didnt know, i just started dicking around with it to remind myself what sysadmining was like in linux ;) 2008-09-19 21:24 or I don't get my sushi tonight 2008-09-19 21:24 go get sushi, that's a moral imperative 2008-09-19 21:24 it's a mess, isn't it. 
LXR install I mean 2008-09-19 21:24 it aint pretty, but than again i dont do normal...anything 2008-09-19 21:25 that's a good sign 2008-09-19 21:25 bbiaf 2008-09-19 21:26 lol 2008-09-19 21:43 Bushman: what kind of bandwidth do you have? 2008-09-19 21:43 ACTION puts head down in shame 2008-09-19 21:43 cable modem 2008-09-19 21:44 its good enough 2008-09-19 21:44 most requests are only a few 10's of kb probably 2008-09-19 21:44 i can 'pimp my apache' and turn on compression since it's text it should be alright 2008-09-19 21:44 by looking at the code cpu is gonna be the bottle neck first ;) 2008-09-19 21:45 yea the db design makes baby jesus cry 2008-09-19 21:45 so i gotta go through postgres configs first, which can take hours to do right, that shit is complicated 2008-09-19 21:46 i'm i can get a dual Dual Core Woodcrest 2008-09-19 21:46 for $120/mo 2008-09-19 21:46 i might do that 2008-09-19 21:47 since i'm finally more than breaking even with my current servers 2008-09-19 21:47 the database for a whole 2.6.26.5 tree is about 1.1gb, so i'll just shove the whole thing into memory 2008-09-19 21:47 how much ram does it have? 2008-09-19 21:47 braking even? what are you hosing? 2008-09-19 21:47 6gb ;) 2008-09-19 21:47 a few sites 2008-09-19 21:48 it's my Matlab cruncher ;) 2008-09-19 21:48 well a few paying sites 2008-09-19 21:48 persiankitty is back? :) 2008-09-19 21:48 haha 2008-09-19 21:48 then various freebees like zumastor.org 2008-09-19 21:49 Bushman: you installed lxrng right? 2008-09-19 21:49 thats the one running on lxr.linux.no 2008-09-19 21:49 lxrng? 
i just found lxr-devel 2008-09-19 21:49 i think that ones out of date 2008-09-19 21:49 it's like 0.9.5 2008-09-19 21:49 but if it works thats cool 2008-09-19 21:50 there are annoying bugs in the one running on lxr.linux.no 2008-09-19 21:50 like it will show the same result multiple times 2008-09-19 21:50 see http://lxr.linux.no/ at the bottom of the page 2008-09-19 21:50 step by step instructions for setting it up too ;) 2008-09-19 21:50 well i havent seen the web part of it yet, the first run through the code indexing just ended like 10 mins ago 2008-09-19 21:51 ah cool 2008-09-19 21:51 took a while 2008-09-19 21:51 too bad i dont have any phat hardware at home 2008-09-19 21:51 i have 15/15 fios 2008-09-19 21:51 should i ship you a box? :) 2008-09-19 21:52 how many watts ? ;) 2008-09-19 21:52 if i were to guess it probably sounds like a helicopter 2008-09-19 21:52 never knew you to have an quiet boxes 2008-09-19 21:52 ups says about 90 sitting idle with cpu frequency throttling, 140+ at full boogie 2008-09-19 21:53 so 100 on average 2008-09-19 21:53 it's a bulb 2008-09-19 21:53 so about $8/mon 2008-09-19 21:54 are you all green, or just being a cheapass? 2008-09-19 21:54 little from column a... 2008-09-19 21:54 dont have to answer, i know the answer ;) 2008-09-19 21:54 hm the place i used to live had free electricity and fios available 2008-09-19 21:55 should build a datacenter in it 2008-09-19 21:55 or a grow house ;) 2008-09-19 21:55 ok u turn 2008-09-19 21:55 is that a 'weeds' reference? 2008-09-19 21:55 get me a prius 2008-09-19 21:56 so i can sneak up on mofos real quiet 2008-09-19 21:56 'so i can sneak up on motherfuckers' 2008-09-19 21:56 hah 2008-09-19 21:56 that's a bit spooky 2008-09-19 21:56 annnyway 2008-09-19 21:57 bandwidth shouldnt be a problem 2008-09-19 21:57 if it is, ship it to me ;) 2008-09-19 21:57 cant we just set it up on one of your real servers? 
2008-09-19 21:57 sure 2008-09-19 21:57 but i dont want to kill the cpus 2008-09-19 21:57 we'll see how it goes on cable modem 2008-09-19 21:57 i can put it up on marcintology, but that's just a normal shared web account 2008-09-19 21:57 i can run dyndns for you if you dont have it already 2008-09-19 21:58 oh i got dns for it ;) 2008-09-19 21:58 lxr.tux3.org ? 2008-09-19 21:58 ooh could we pull from ddtree too? 2008-09-19 21:58 ok, let's do that than 2008-09-19 21:58 that would be slick! 2008-09-19 21:58 flipz: you like that idea? 2008-09-19 21:58 well first i wanna see if i can get it working nicely 2008-09-19 21:58 have the ddtree lxr'ed ? ;) 2008-09-19 21:58 flipz is doing sushi 2008-09-19 21:59 ah ok well i'm out for a bit too 2008-09-19 21:59 i sushi'ed m'self for lunch yesterday so i should be good for a day or two before i'm gonna start jonsing again 2008-09-19 21:59 good, i cant work with you making me laugh 2008-09-19 22:08 http://blogs.pcworld.com/staffblog/archives/007783.html 2008-09-19 22:08 take a look at the windows ad at the bottom... 2008-09-19 22:09 some enlightened folk at microsoft snuck in penguins... 2008-09-19 22:09 they're everywhere 2008-09-19 22:12 if needs be I can probably stick lxr on an athlon64 at my univ in poland 2008-09-19 22:12 or, maybe I could host a second copy from home off of comcast 2008-09-19 22:13 ddtree.tux3.org 2008-09-19 22:14 no functionen 2008-09-19 22:14 Host ddtree.tux3.org not found: 3(NXDOMAIN) 2008-09-19 22:15 it was just a suggestion 2008-09-19 22:15 we can make it happen any time 2008-09-19 22:15 haha 2008-09-19 22:16 nice redmond penquins 2008-09-19 22:16 are they that clueless or are there subversives inside m$? 2008-09-19 22:19 I'm guessing subversion from inside 2008-09-19 22:19 but it looks like windows out onto the future to me, penguins everywhere 2008-09-19 22:22 ugh, reading a notebook review and someone is claiming gigabit wired is overkill... 2008-09-19 22:22 what the hell are they drinking? 
2008-09-19 22:22 or smoking 2008-09-19 22:23 "The thing that gets us out of bed every day is the prospect of creating pathways above, around and through walls." msft marketdroid 2008-09-19 22:23 sheesh 2008-09-19 22:23 I'm glad I'm not him 2008-09-19 22:23 or them 2008-09-19 22:23 what gets me out of bed is sheer effort of will 2008-09-19 22:23 and the prospect of some french roast 2008-09-19 22:25 heh, for me it's usually the buzzer and a sense of duty 2008-09-19 22:25 when the family gets back tomorrow it will be a four year old jumping on me 2008-09-19 22:26 "time to get up and play daddy" 2008-09-19 22:27 "An approach dedicated to engineering the absence of anything that might stand in the way" -- my gawd who came up with that one, ballmer? 2008-09-19 22:27 ugh 2008-09-19 22:27 engineering the absence - new msft slogan 2008-09-19 22:31 i'm all for engineering the absence 2008-09-19 22:31 of msft 2008-09-19 22:32 is the stupid question hour still on? 2008-09-19 22:32 yep 2008-09-19 22:33 vm.swappiness, wanna explain it to me, what's it really do, what rules of thumb i wanna use to determine it, etc 2008-09-19 22:33 oh, one of those 2008-09-19 22:34 it's an andrewism 2008-09-19 22:34 did i pick a touchy topic? :) 2008-09-19 22:34 take a vm that just plain doesn't swap very well and bolt knobs on it 2008-09-19 22:34 akpm is a friend 2008-09-19 22:34 but the linux vm has been dire for a few years 2008-09-19 22:34 `http://kerneltrap.org/node/3000 2008-09-19 22:34 swappiness is one of the attempted bandaids 2008-09-19 22:36 what's wrong with knobs? better to have them than to not have them, isnt it? 
2008-09-19 22:36 better to have it work 2008-09-19 22:36 than to give up and offer a knob which also doesn't work 2008-09-19 22:37 like those old televisions 2008-09-19 22:37 you had a color control that ranged from "very green" to "very red" with "very blue" in between 2008-09-19 22:38 anyway 2008-09-19 22:38 I'm taking a sabbatical from vm 2008-09-19 22:38 so I am licensed to throw turds 2008-09-19 22:42 Bushman: the vm operates in a delicate balance right now with knobs pulling in 4 dimensions 2008-09-19 22:42 this breaks more than you think 2008-09-19 22:43 i've seen memory recursion deadlocks with AoE and ddsnap 2008-09-19 22:43 as well as other wacky behavior like: 2008-09-19 22:43 http://pengaru.com/~swivel/pop_comparisons/04-26-2006/ 2008-09-19 22:44 I don't even want to think about the vm 2008-09-19 22:44 let's stick with fixing fs and bdev 2008-09-19 22:44 Vito sounds like he could use to do some PCA to reduce the dimensionality in his quest for performance 2008-09-19 22:46 its not a lot to ask of a modern server to be able to parse silly protocols like pop at wire speed 2008-09-19 22:46 and cache some stuff 2008-09-19 22:47 thats a case of the os clearly pissing in your cheerios 2008-09-19 22:48 evicting >2GB of buffer cache at a time due to brain damage in the vm 2008-09-19 22:49 shapor: you're spoiled, you need to work with windows for a while ;) 2008-09-19 22:49 no thanks 2008-09-19 22:49 once i reinstalled a SCSI driver and all my fonts went bold 2008-09-19 22:49 wanna explain that one 2008-09-19 22:51 mmm the nigiri was fine, now time to look into the sake issue 2008-09-19 22:51 lol 2008-09-19 22:52 bushman, use after free? 2008-09-19 22:52 in the registry? 2008-09-19 22:53 that was a long time ago, i dont remember. i saw that and i thought i had some bad sushi or something ;) 2008-09-19 22:55 i'm still working on the freetext indexing, do we need just straight text/html parsing, or do we want more like PDF or .doc? 
2008-09-19 22:55 there's no such thing as bad sushi 2008-09-19 22:55 careful there ;) 2008-09-19 22:55 MaZe: let it sit out in the sun for few hours, then you'll experience bad sushi 2008-09-19 22:56 bushman, I didn't know lxr had that knob 2008-09-19 22:56 but straight text, yes 2008-09-19 22:56 it's not lxr, it's swish, the text indexer 2008-09-19 22:56 right, so lxr doesn't recommend a config for swish? 2008-09-19 22:56 I thought they did 2008-09-19 22:57 kinda, but if i'm doing it by hand, might as well pimp it out a bit 2008-09-19 22:57 anyway, text 2008-09-19 22:58 plus Shap tells me that i grabbed a slightly different code than what the norwiegian lxr site is running 2008-09-19 22:58 remember when it was glimpse? 2008-09-19 22:58 ıʞsʍoʞʎzɔuǝż ˙ɐ ɾǝıɔɐɯ - who can read this? 2008-09-19 22:58 a kinda sort open indexer 2008-09-19 22:58 university project 2008-09-19 22:58 the lxr manual says glimpse doesnt support all the functions it wants, so i went with swish 2008-09-19 22:58 they tried to close it up and make some dough, nobody used it, finally somebody wrote swishe and nobody remembers glimpse 2008-09-19 22:59 i've used this russian indexer/search engine called mnogosearch before, if i'm really bored i might see if i can use that here 2008-09-19 23:00 one of the things we plan to get happening with tux3 is proper incremental indexintg 2008-09-19 23:04 why are we leaking the sb inode map in inode.c test? 
2008-09-19 23:04 ==31367== 8,160 (8,040 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 7 2008-09-19 23:04 ==31367== at 0x4A1B858: malloc (vg_replace_malloc.c:149) 2008-09-19 23:04 ==31367== by 0x401E44: new_map (buffer.c:452) 2008-09-19 23:04 ==31367== by 0x40A088: new_inode (inode.c:128) 2008-09-19 23:04 ==31367== by 0x40B53C: make_tux3 (inode.c:493) 2008-09-19 23:04 ==31367== by 0x40BA21: main (inode.c:554) 2008-09-19 23:04 because it's broken ;-) 2008-09-19 23:04 shapor, we want to know 2008-09-19 23:05 does seem like the test is broken 2008-09-19 23:05 doesn't* 2008-09-19 23:06 want hints to track it down? 2008-09-19 23:06 put exit(1) somewhere and see if you put it before or after the leak 2008-09-19 23:06 smrt 2008-09-19 23:06 question: 2008-09-19 23:06 if a page-fault gets triggered 2008-09-19 23:07 due to the page being not present or write to read-only page 2008-09-19 23:07 from user space 2008-09-19 23:07 what context do you end up in the kernel? is that considered process context or interrupt context? 2008-09-19 23:07 process 2008-09-19 23:07 but not process 2008-09-19 23:07 it's non-interrupt kernel 2008-09-19 23:08 lol, so what can/can you not do - how does it differ from process and interrupt? 2008-09-19 23:08 can you sleep? 2008-09-19 23:08 any doc pointers? 2008-09-19 23:08 yes, you have to in order to read in the page 2008-09-19 23:08 right - fair point 2008-09-19 23:09 and of course another thread can trigger the same page fault again before you read it in, so you need to lock appropriately 2008-09-19 23:09 or that might happen automagically 2008-09-19 23:09 http://lxr.linux.no/linux+v2.6.26.5/arch/x86/mm/fault.c 2008-09-19 23:09 so how many different types of contexts do we have? 2008-09-19 23:10 process? kthread? fault handler? interrupt? anything else? 2008-09-19 23:10 wtf if i put an exit(1) right before the return 0; in main it doesn't detect a leak 2008-09-19 23:10 try exit(0)? 
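One plausible reading of the exit() observation, which the log does not confirm: Memcheck classifies blocks at the moment the process ends. If exit() is called inside main, main's locals are still on the stack and still point at the map, so the block counts as still reachable; once main returns, that pointer is gone and the same block is reported as definitely lost. Either way the test never releases the map. The sketch below shows a balanced version; the struct layouts, `free_map`, `free_inode`, and the `live_maps` counter are hypothetical stand-ins, not the tux3 code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the structures named in the trace. */
struct map { int dummy; };
struct inode { struct map *map; };

static int live_maps;               /* crude leak counter for the demo */

static struct map *new_map(void)
{
    struct map *map = calloc(1, sizeof *map);

    if (map)
        live_maps++;
    return map;
}

static void free_map(struct map *map)
{
    if (!map)
        return;
    free(map);
    live_maps--;
}

static struct inode *new_inode(void)
{
    struct inode *inode = calloc(1, sizeof *inode);

    if (!inode)
        return NULL;
    inode->map = new_map();
    if (!inode->map) {
        free(inode);
        return NULL;
    }
    return inode;
}

/* Releasing an inode also releases the map it owns. */
static void free_inode(struct inode *inode)
{
    if (!inode)
        return;
    free_map(inode->map);
    free(inode);
}

/* Balanced test body: returns the number of maps still allocated. */
static int run_test(void)
{
    struct inode *inode = new_inode();

    assert(inode);
    /* ... exercise the inode here ... */
    free_inode(inode);
    return live_maps;
}
```

With every `new_inode` paired with a `free_inode`, `run_test()` returns 0 and there is nothing left for Memcheck to classify, whichever way the program exits.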
2008-09-19 23:10 does it always leak the same way? 2008-09-19 23:10 maybe instead of exit(1) do goto return 0 at end of main? 2008-09-19 23:11 same 2008-09-19 23:11 cant you printf a bunch of usual suspects? 2008-09-19 23:11 exit(0) also no error 2008-09-19 23:11 maze, http://lxr.linux.no/linux+v2.6.26.5/Documentation/exception.txt#L266 2008-09-19 23:11 maybe it bypasses the check code? 2008-09-19 23:11 i'll try moving the return up instead 2008-09-19 23:12 right that's the dealing with exceptions in kernel space 2008-09-19 23:12 shapor, right, valgrind does that 2008-09-19 23:12 and using them to detect unauthorized reads, etc 2008-09-19 23:12 maze, process, kthread and fault handler are all the same 2008-09-19 23:12 shap: you got ida pro handy? 2008-09-19 23:13 oh really? so basically there's just 2: process/kthread/fault vs interrupt 2008-09-19 23:13 the only real difference with process is it has a user address space to work with 2008-09-19 23:13 mm 2008-09-19 23:13 that's actually just a flag bit 2008-09-19 23:13 and the doc you referred to is about faults triggered from the kernel 2008-09-19 23:13 like having a file table 2008-09-19 23:13 ah 2008-09-19 23:14 right, but since we don't have any pointers passed in as parameters from userspace that doesn't really much matter 2008-09-19 23:14 maze, the doc tells you about do_page_fault 2008-09-19 23:15 that it gets called? 2008-09-19 23:15 I know that ;-) 2008-09-19 23:15 but do_page_fault does special stuff if the page fault got triggered with eip in kernel space 2008-09-19 23:16 ah i see whats going on 2008-09-19 23:16 although I guess we can trigger page fault-ins for user pages from kernel space as well 2008-09-19 23:17 with the leak 2008-09-19 23:17 so the two cases shouldn't really differ 2008-09-19 23:17 how would that change anythign? if you need to get a page, gotta fetch it, regardless where EIP is pointing at, isnt it? 
2008-09-19 23:18 we don't support faults in kernel space 2008-09-19 23:18 you don't want to trigger sigsegv from kernel space 2008-09-19 23:18 only object is to oops properly 2008-09-19 23:18 used to panic on that 2008-09-19 23:18 you return EFAULT instead 2008-09-19 23:18 ok so how do you prevent that? 2008-09-19 23:19 wait, so what happens if userspace syscall writes and the buffer I passed in is swapped out? 2008-09-19 23:19 I'd always assumed the page-ins would happen via fault, do we actually map the memory in in some other way? 2008-09-19 23:23 maze, ok it took me a while to remember 2008-09-19 23:23 when a fault occurs, unlike an interrupt it isn't interrupting some random process 2008-09-19 23:23 right 2008-09-19 23:23 it faults in the process that needs to work, so the fault handler uses that context 2008-09-19 23:24 it just has to do a little register fiddling and play with the intruction pointer 2008-09-19 23:24 in some cases, parsing the instruction stream 2008-09-19 23:24 dimly remembering this from the last time I did it, many years ago 2008-09-19 23:24 fault semantics on x86 are utter crap 2008-09-19 23:24 so basically a fault triggered from userspace is almost like a syscall 2008-09-19 23:24 yes 2008-09-19 23:25 besides the fact it can happen anywhere, and needs some special asm on entry/exit to deal with the weird semantics 2008-09-19 23:25 http://lxr.linux.no/linux+v2.6.26.5/arch/sh/kernel/cpu/sh5/entry.S#L1134 2008-09-19 23:25 for example 2008-09-19 23:25 how about a fault triggered on access from within the kernel? 2008-09-19 23:26 oops if you're lucky 2008-09-19 23:26 panic if not 2008-09-19 23:26 and what if its triggered from irq context? 2008-09-19 23:26 death 2008-09-19 23:26 try it ;) 2008-09-19 23:26 it's easy 2008-09-19 23:26 so how does the kernel guarantee the userspace memory accessed by syscalls is present? 
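The "you return EFAULT instead" point above can be observed from userspace. This is a minimal sketch assuming Linux and a pipe as the write target (a pipe is used because its write path has to copy from the user buffer); `write_from_unmapped_buffer` is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hand the kernel a buffer address with no mapping behind it.
 * Returns the errno reported by write(), or 0 if write() succeeded.
 */
static int write_from_unmapped_buffer(void)
{
    int fds[2];
    int err = 0;
    long page = sysconf(_SC_PAGESIZE);
    void *bad;

    assert(page > 0);
    if (pipe(fds) < 0)
        return errno;

    /* Map a page and unmap it again so the address is known to be invalid. */
    bad = mmap(NULL, (size_t)page, PROT_READ,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (bad == MAP_FAILED) {
        err = errno;
        goto out;
    }
    munmap(bad, (size_t)page);

    /* The kernel's copy from user space fails; the process gets an error
     * code back from write() rather than a SIGSEGV. */
    if (write(fds[1], bad, 16) < 0)
        err = errno;
out:
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

On Linux the function is expected to return `EFAULT`. A swapped-out but valid buffer is a different case: the kernel pages it in and the write succeeds, as the discussion that follows works out.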
2008-09-19 23:27 it goes delving into page table entries and things 2008-09-19 23:27 do we grab and release locks on the userspace memory before and after the copy (and if not present call the pagein handlers manually?) 2008-09-19 23:27 well 2008-09-19 23:27 it doesn't need to be present 2008-09-19 23:27 because it can fault 2008-09-19 23:27 huh? 2008-09-19 23:27 well 2008-09-19 23:27 sorry 2008-09-19 23:27 not in kernel ;) 2008-09-19 23:27 double huh 2008-09-19 23:27 it does the fault by hand 2008-09-19 23:27 see get_user_pages 2008-09-19 23:28 right, so like I said above at 11:27:10 2008-09-19 23:28 don't have timestamps on 2008-09-19 23:28 (11:27:10 PM) MaZe: do we grab and release locks on the userspace memory before and after the copy (and if not present call the pagein handlers manually?) 2008-09-19 23:29 we don't grab locks on user memory 2008-09-19 23:29 not sure what the question is 2008-09-19 23:29 not on struct page *? 2008-09-19 23:29 no 2008-09-19 23:29 not for that 2008-09-19 23:29 we take ref counts 2008-09-19 23:29 ain't that the same thing? 2008-09-19 23:29 nonzero ref count holds a page in memory 2008-09-19 23:29 no 2008-09-19 23:30 but it prevents the page from disappearing from under us? right? so it's like a lock - except others can access it as well, right 2008-09-19 23:30 see why lock is an inappropriate name here 2008-09-19 23:30 I meant lock in the sense of lock into ram 2008-09-19 23:30 it's not at all like a lock 2008-09-19 23:30 it's a refcount 2008-09-19 23:31 a lock is a serializer, and recount is a don't kill me 2008-09-19 23:31 http://lxr.linux.no/linux+v2.6.26.5/mm/memory.c#L962 <- follow_page, doing the job of mm hardware by hand 2008-09-19 23:32 does refcount = 0 immediately result in memory being destroyed? 2008-09-19 23:32 yes 2008-09-19 23:32 or will it only get evicted if need be at that point? 2008-09-19 23:32 now... 
depends whether it's anon or not 2008-09-19 23:33 anon has to be swept up, page cache is immediately freed at that point 2008-09-19 23:33 don't quote me, I used to hack that stuff ;) 2008-09-19 23:33 but it's been a while 2008-09-19 23:33 so normally a processes mapping holds a refcount on it's memory? 2008-09-19 23:34 but that doesn't work 2008-09-19 23:34 check __free_page 2008-09-19 23:34 there have to be 2 layers here 2008-09-19 23:34 one virtual - what a process needs, one physical - what's in memory 2008-09-19 23:34 that puts the page back on the buddy as soon as count hits zero I believe 2008-09-19 23:34 probaby vma vs page 2008-09-19 23:34 so... treatment of inodes by the vfs is rather different 2008-09-19 23:35 there is one refcount on a page for each pointer to it, basically 2008-09-19 23:35 including one for the lru, I tried to implement, andrew never took the patch 2008-09-19 23:35 but definitely one for the page cache 2008-09-19 23:35 and one for each page table entry pointing at it 2008-09-19 23:35 there aren't really two layers in linux 2008-09-19 23:35 unlike freebsd 2008-09-19 23:36 it's all one layer 2008-09-19 23:36 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-19 23:36 vma just specifies some access rights to memory regions 2008-09-19 23:37 hmm 2008-09-19 23:37 you have lots of time to get that sorted 2008-09-19 23:37 has no real impact on fs development 2008-09-19 23:39 am I right in assuming that during kernel startup 2008-09-19 23:39 a fragment of physical memory is reserved for a big honking array 2008-09-19 23:39 of struct pages - one per page of physical memory in the system? including high-mem and all that? 
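The "refcount, not lock" distinction drawn above can be sketched in userspace C11. A refcount serializes nothing: any number of holders may take a reference at once, and the only guarantee is that the object survives until the last one lets go. The `toy_` names and the struct are hypothetical; this is not the kernel's `struct page` or its `get_page`/`put_page` implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature of a page header: just the reference count. */
struct toy_page {
    atomic_int count;
    bool freed;                     /* stands in for "back on the allocator" */
};

static void toy_page_init(struct toy_page *page)
{
    atomic_init(&page->count, 1);   /* the creator holds the first reference */
    page->freed = false;
}

/* Take a reference; holders do not exclude each other. */
static void toy_get_page(struct toy_page *page)
{
    int old = atomic_fetch_add(&page->count, 1);

    assert(old > 0);                /* caller must already see a live page */
}

/* Drop a reference; whoever drops the last one frees the page. */
static void toy_put_page(struct toy_page *page)
{
    if (atomic_fetch_sub(&page->count, 1) == 1)
        page->freed = true;
}
```

In the log's terms, each pointer to the page (page cache, each page table entry) would correspond to one `toy_get_page`, and the page goes back to the allocator only when the matching `toy_put_page` calls bring the count to zero.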
2008-09-19 23:40 possibly with some hacks for discontig mem 2008-09-19 23:40 yes 2008-09-19 23:40 it's rather crude 2008-09-19 23:40 and struct page as struct address_space * mapping in it 2008-09-19 23:40 s/as/has/ 2008-09-19 23:41 which appears to be inode related 2008-09-19 23:41 very much so 2008-09-19 23:41 we make every page header have that field even if its anon where the field has no use 2008-09-19 23:42 how big is a struct page - around 50 bytes? 2008-09-19 23:42 address_space, mapping, and page cache are different names for the same thing by the way 2008-09-19 23:42 stupidly sloppy terminology 2008-09-19 23:42 less I think 2008-09-19 23:43 it's been heavily sqzd 2008-09-19 23:43 maybe 50 on 64 bit 2008-09-19 23:43 so we basically throw 1% of memory out for accounting purposes. 2008-09-19 23:43 go into junkfs and printf(... sizeof(struct page)) 2008-09-19 23:43 much more than that 2008-09-19 23:44 and that's not even including the cpu pagetables 2008-09-19 23:44 dentry and inode cache are really extravagant 2008-09-19 23:44 it's not lean and mean 2008-09-19 23:44 only compared to even worse kernels 2008-09-19 23:45 56 2008-09-19 23:45 64 bit, right? 2008-09-19 23:45 yes 2008-09-19 23:46 multiply 56 times 1 TB / 4096 2008-09-19 23:46 kind of wicked how you can use junkfs as a code injector 2008-09-19 23:46 that's the point 2008-09-19 23:46 side door into the kernel for dodgy people 2008-09-19 23:46 14 GB 2008-09-19 23:46 so... imagine the suck when we scan that 2008-09-19 23:47 the sound of sucking is the only thing you hear from that computer 2008-09-19 23:47 scan it for what? 2008-09-19 23:47 anything 2008-09-19 23:47 freeable memory 2008-09-19 23:47 why would we want to scan it? 
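The arithmetic above checks out: 1 TB of RAM in 4096-byte pages is 2^28 pages, and at 56 bytes of header per page that is exactly 14 GiB, or about 1.4% of memory (56/4096). A small helper makes it easy to try other sizes; `page_array_bytes` is a hypothetical name, and the 56-byte figure is the one measured in the log on a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of per-page headers needed to describe `ram_bytes` of memory. */
static uint64_t page_array_bytes(uint64_t ram_bytes, uint64_t page_size,
                                 uint64_t header_size)
{
    assert(page_size > 0);
    return ram_bytes / page_size * header_size;
}
```

For example, `page_array_bytes(1ULL << 40, 4096, 56)` gives 14 GiB, and the same call with 128 GiB of RAM gives 1.75 GiB.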
2008-09-19 23:47 oh 2008-09-19 23:47 so there's no heap structure of memory or anything like that 2008-09-19 23:47 nope 2008-09-19 23:47 it's the crudest imaginable system 2008-09-19 23:47 oh, that is indeed quite a vacuum 2008-09-19 23:48 while 1 TB is still rare 2008-09-19 23:48 it was only recently that linus allow vma to be a tree isntead of a linear list 2008-09-19 23:48 32-128G is perfectly reasonable nowadays 2008-09-19 23:49 and at that point we have almos 1-2G of struct page's 2008-09-19 23:49 1 tb is right around the corner 2008-09-19 23:49 argh 2008-09-19 23:49 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-19 23:49 so basically another part that wasn't designed 2008-09-19 23:49 indeed 2008-09-19 23:50 that comment applies to almost all the parts 2008-09-19 23:50 s/almost// 2008-09-19 23:50 do other kernels get this even worse? 2008-09-19 23:50 yes 2008-09-19 23:50 unbelievable? 2008-09-19 23:50 yes 2008-09-19 23:50 but true 2008-09-19 23:50 possible exception of, oh, qnx 2008-09-19 23:51 I'd always assumed the kernel to be this awesome C/assembler layer of wicked algos and data structures 2008-09-19 23:51 haha 2008-09-19 23:51 welcome, you're a kernel hacker now 2008-09-19 23:51 evrything optimized and tuned to hell and back 2008-09-19 23:51 to hell, not back 2008-09-19 23:51 lol 2008-09-19 23:52 some bits are ok 2008-09-19 23:52 yeah 2008-09-19 23:52 some bits are pretty damm amazing 2008-09-19 23:52 and I'm assuming it is in general getting better over time 2008-09-19 23:52 but most bits are just plain crap 2008-09-19 23:52 hard to say 2008-09-19 23:52 it's getting bigger 2008-09-19 23:53 yes, I've noticed 2008-09-19 23:53 I'm not sure its getting faster, seems to be regressing a little 2008-09-19 23:53 but I've assumed that hasn't been core functionality 2008-09-19 23:53 more just new drivers 2008-09-19 23:53 new filesystems 2008-09-19 23:53 etc 2008-09-19 23:53 also core 2008-09-19 23:53 all the big iron stuff 2008-09-19 23:53 from sgi and ibm 
2008-09-19 23:54 hmm 2008-09-19 23:54 buffer.c and filemap.c get longer and longer 2008-09-19 23:54 so 2.7 happens when we get rid of bh? 2008-09-19 23:54 things like mpage.c appear 2008-09-19 23:54 linus said the variable sized page patch would be enough to open 2.7 2008-09-19 23:54 that was a while ago 2008-09-19 23:54 thing is, I'm not sure I would want 2.7 to happen 2008-09-19 23:54 2.5 was an utter mess 2008-09-19 23:55 don't know how much he treats an email as a promise ;) 2008-09-19 23:55 and 2.6. up to 2.6.7 or so was junk 2008-09-19 23:55 going through that again would be painful 2008-09-19 23:55 2.6 is kinda starting to stink 2008-09-19 23:55 it was fresh and new once 2008-09-19 23:56 well 2008-09-19 23:56 it's different 2008-09-19 23:56 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-19 23:56 in 2.3/4/5 we had a desparate situation 2008-09-19 23:56 nobody had a kernel that worked properly 2008-09-19 23:56 otoh, it does seem like there's a lot of deep in the core stuff that should be changed 2008-09-19 23:56 complaints about paging artifacts every day on lkml 2008-09-19 23:57 for years 2008-09-19 23:57 that was bad 2008-09-19 23:57 it just didn't work 2008-09-19 23:57 it's better now, where 2.6 may suck but it works 2008-09-19 23:57 that's a good base to step out and do some housecleaning 2008-09-19 23:57 how much of this is sucky code, or bad algos, and how much is it just being np-complete or unsolvable problems 2008-09-19 23:58 most of it is just sucky code 2008-09-19 23:58 nearly all 2008-09-19 23:58 we know about impossibility 2008-09-19 23:58 don't count that 2008-09-19 23:59 also, we have much better processes for bug tracking and nailing regressions 2008-09-19 23:59 back in the day it was just linus and some text mode mailer